Author:
Sandhya Rajendran
Subject:
Engineering
Material Type:
Assessment, Interactive, Module, Reading, Student Guide
Level:
Graduate / Professional
Tags:
  • ETl
  • FME®
  • Oer
  • Spatial ETL
    License:
    Creative Commons Attribution Non-Commercial Share Alike
    Language:
    English
    Media Formats:
    Audio, Downloadable docs, Interactive, Video

    Spatial ETL

    Spatial ETL

    Overview

    Course: Spatial ETL and  FME® safe software Inc 

    Intended for: Students, Data analyst, planners, decision makers, business intelligence.

     

    Introduction

    Prerequisites


    Before we dive into this course, we need to prep you up :) You must satisfy the following:
    1) You have a basic grasp about DATA

    •       Different types of data sources
    •       Spatial data vs Non-Spatial data
    •       Structured data and Unstructured data

    2) You heard about ETL( Extract, transform, load )

    If you haven't heard about ETL, fear not :) Throughout the course, we will provide some resources
    for you to look up.

    Module Description

    The entire course is constructed to slowly build your knowledge of Spatial ETL and Data Integration. The course is designed to help you by providing relevant and anxicillary resources if you are intrigued by the topic. This course doesn't proll on indepth concepts but will provide strong foundation on the basics.

    Module Overview

    CASE 1: If you know about spatial ETL and its processes and wants to dive into practical stuff, SKIP PART 1

    CASE 2: If you want to learn about spatial ETL and its processes and then proceed with practical stuff, BUCKLE UP

    Now, I welcome you all for this journey :)

    Part 1: Spatial ETL

    Learning Objectives

    From this module, you will learn about

    • Basic to Moderate knowledge in Spatial ETL
    • Processes behind Spatial ETL: Data Integration
    • Softwares for performing ETL with spatial data

    What is Spatial ETL ?

    The term Spatial ETL was coined by safe software in 1993 to achieve the data accessibility. 

    Spatial ETL

    Spatial Extract, Transform and Load (spatial ETL) is the process whereby spatial data flows from source to target systems.

    Properties:
    1. Extract the data from various data sources eg: CAD, GIS, Raster, Databases
    2. Transform the data as required to make it usable defined by the user(us)
    3. Load the content into our preferred sites/ destinations.

    Benefits:

    • Data cleansing: The removal of errors within a dataset
    • Data merging: The bringing together of multiple datasets into a common framework
    • Data verification: The comparison of multiple datasets for verification and quality assurance purposes
    • Data conversion: Conversion between different data formats.

                                                                                                                                   Quiz

    Knowledge Checker


    Attempt the Quiz here to review your understanding of this section.


     

    Spatial ETL Softwares :

    We need to deal with whether the softwares are proprietary or open source. Some proprietary softwares have
    student offers that we can use to maximize our learning exeperience.
    Proprietary Software is computer software whose source codes are not publicly available and can only be
    modified by the firm/company/organisation that generated it whereas in Open Software whose source code
    is freely available on the internet, allowing programmers to add new features and capabilities without
    incurring any costs.

    Spatial Softwares


    With proprietary comes some benefits, the organisation are experts tailored to address the needs. Many
    sophisticated tools are created in the proprietary software. Spatial ETL is one of the emerging fields so not
    many open source softwares are available with better features.

    Spatial ETL: Data Integration

    Data Integration

    The process of integrating data from several sources together to present users with a single
    perspective is known as data integration

    Without broad modifications to existing systems or data structures, data integration done correctly can
    decrease IT costs, free up resources, increase data quality, and stimulate creativity. Data needs to be
    organized to facilitate analytical reporting and provide users with a complete and unified view of all the
    information that flows through the organization.


    Data integration is based on the idea of making data more freely available, as well as easier to consume and
    process by systems and users. The processes: extraction, transformation and loading constitute the Data Integration process.

                                                                                                          Extraction

    Data Extraction                          

    The process of collecting or obtaining varied types of data from a range of sources, many of which
    are poorly organized or completely unstructured, is known as data extraction.

    Data extraction allows data to be consolidated, processed, and refined before being stored in a centralized
    location and changed. These locations could be on-premises, cloud-based, or a combination of both. Extracting
    a spatial data can be done by searching the keywords of interest, using ,metadata, by visiting some portals.

                                                                                                                                              Transformation

    Data Transformation

    Transformation as the name suggest is the process of transforming from a source to the another source by satisfying required user defined parameters.

    Because most spatial data consists of a geographic element and an attribute data, spatial ETL transformations are sometimes referred to as either geometric transformations (transformations of the geographic element) or attribute transformations (transformations of the related attribute data).

                                                                                                                                        Loading

    Data Loading

    Data Loading is the ultimate step when it comes to Spatial ETL.

    Data is typically loaded into the destination application in a format that differs from the original source location. - Technopedia

                                                                                                                            Quiz

    Knowledge Checker


    Attempt the Quiz here to review what we learned in this section.

     

    Part 2: Module Overview

     FME® (the Feature Manipulation Engine) is a data integration tool used for transforming data. We are heavily
    concentrating on this tool for our analysis for this course.


    Learning Objectives
    After completing this unit, you will be able to:

    1. Know what is  FME® and its process
    2. Benefits of  FME®
    3. How to use  FME® for an analysis


    Module Overview
    This module will teach you the basics of  FME® desktop which inturn answers what  FME® does. Throughout this
    section, you will have real life examples that are tailored to help you understand the processess better. You
    will learn how to perform basic data transformations and how to set up your  FME® workflow according to best
    practices.

    FME®

    As I have said  FME® is a data integration tool that we use for data transformations.
    In our context,

    The Feature Manipulation Engine is a collection of tools for extracting, transform and load (ETL)
    geographic data. - universitat-koblenz

    It is intended for use with geographic information system (GIS), computer-aided design (CAD) and raster
    graphics software. It allows you to break down data silos by connecting data from 400+ sources. Make your data more valuable by making it available where, when, and how it is needed. Then, using event-based workflows, eliminate the manual effort of performing complex, repetitive tasks.


     FME® Features:

    • Many tools can only deal with tabular data(non-spatial data), while  FME® handles spatial data.
    • FME® has a pretty good graphical user interface where non-coders can esaily use this software for their analysis.
    •  FME® has rich data library and it can support more than 450 data formats.

    Here is a short introductory video by safe software company regarding FME® and its need.

    FME

    How  FME® Works?
     FME® has a number of key characteristics:
    1. Centralized
     FME® is a central engine amongst a whole array of supported formats (right). Data can be read from any format
    and written to any other. This means adding support for a new format automatically adds support to convert
    that data to or from any existing format.
    2. Semantic
     FME® is an engine that has a rich data model that handles all possible geometry and attribute types. It
    supports all kinds of data formats such as XML, JPEG, JSON and many mapping formats such as GIS, CAD etc.
    3. Thick-Pipe
    The 'T' in ETL is what traditional format translators lack.  FME® provides tremendous transformation
    functionality, resulting in output that can be much greater than the sum of the inputs.


    Now let's dive into the exercise part that we can do together :)

    Exercise 1: Setting up FME® Workspace

    License Information:
    If you’d like to follow along with your own  FME® Desktop, here are the prerequisites:
    1.  FME® Platform must be installed
           1. Platform: Desktop, server, cloud
           2. System requirements: Mac, windows, linux
    2.  FME® Platform must be licensed
           1. Free license
           2. Offers provided for particular category


    Installation Procedure:
    For installing, you need to access the safe software website and download the packages tailored to your OS.
    a) For downloading
    1. Go to safe software inc website
    2. Download the official and stable  FME® desktop version 2022.0.1
    3. Select your preferred Operating System, shown in this picture and download the file.

    b) For installing
    Choose your OS and click on installation guide to follow the instructions.
    Installation Guide :

    Note: For linux users, Update your linux to the latest version before installation.

    Exercise: 2 - Extraction

    Congratulations you have successfully installed and setup the  FME® Workspace.

    Problem scenario

    You work in an insurance company. A massive flooding event will take place in United Kingdom. Some of the places will be affected by the flood. You have the boundary data of the regions that are most likely affected by the flood. You want to increase the insurance billing amount to these customers in order to tackle the disaster events as a mitigation strategy. You have a list of customer addresses but are not sure which flooded area these customers belong to. Using a dataset of the flood affected areas, you will overlay the addresses on the affected area to determine which customers belong to that area. Then after the customers have been determined, a Microsoft Excel spreadsheet will be created so someone can notify the customers.


    Data Extraction

    Instructions:
    To tackle this question
    1) Download the dataset(given below under resources) and explore.

    Data Exploration

    2) Set up a workspace to work on.
    Now, if you are a windows user like myself, then go to Start >  FME® Workbench 2022.0.
    If you are a MacOS user, then go to Applications >  FME® 2022.0 >  FME® Workbench
    It takes some time to load the workbench.

    Exploration

    Add Reader Dialogue

    Reader

     

     

     

     

     

     

     

    a) Format and Dataset
    The Format and Dataset automatically populate when data is dropped into  FME®. You should check the format
    to make sure it is proper because there could be several format types for the same format extension.

    b) Parameters
    There are particular settings that can be set for each format. The Format Parameters dialog will appear when
    you click the Parameter button to check the parameters. There are mandatory parameters that must be set if
    the Parameters button has an exclamation point (!) at the end of it. You can click the Help button in the
    Parameters dialog to find out more about the precise parameters for your format.

    c) Coordinate System
    The Coord. System part of  FME® will display "Read from Source" if it can determine the coordinate system of
    the dataset. It will say "Unknown" if it can't. You can type the coordinate system into the Coord. System box
    manually if  FME® cannot recognize it.

    d) Workflow Options
    Workflow Options control how the reader feature types appear on the canvas. Each layer will be represented
    as its own feature type in "Individual Feature Types," and all the layers will be represented as one feature type
    in "Single Merged Feature Types," which will function as a single "layer." If you want to alter data that is same
    throughout several files or layers, Single Merged is an excellent option.

    Note: The help button is available for each options that can guide you throughout the process.

    Data Visualisation

    When creating a workspace in  FME®, inspecting data is a crucial step.It is critical to understand how the data
    looks, whether it was correctly read in, and what needs to be changed before the final output.

    Instructions:
    Continue on the workspace that we created in the previous exercise.
    a) Feature Caching

    Feature Caching


    b) Visual Preview
     

    Visual Preview


    Well done guys :) You extract the data and viewed the data. you have finished the first process in
    data integration

                                                                                                                             Quiz

     

    Knowledge Checker

    Attempt the Quiz to review what we learned in this section.

    Exercise: 3 - Transformation

    Data Transformation

    A transformer is an  FME® Workbench object that performs feature transformation. Each transformer has a distinct function and can be linked together in a series to perform complex tasks.

    a) Transformation using Transformers

    Instructions:
    When we viewed the flood_area data in Visual Preview, we come to know that there is a RED status in the
    dataset. We will use the Tester transformer to only select the service areas with the RED status.
    Step: 1

    Transformation


    Step 2:

    Transformer


    Answer this Question to check your results

    Part 2: Exercise
    Solve this exercise to finish the last part of data transformation.

    Problem: Since we want to know which customers will be affected by the Flood, we will need to filter the
    customers based on their spatial relationship to the Flood area with a RED status.
    Instructions:

    1) Add SpatialFilter transformer to the canvas Hint: by typing the name of the transformer in quick add
    dialogue
    2) Connect the Customer CSV reader feature type output to the SpatialFilter candidate input port.
    3) Then connect the Passed output port on the Tester to the Filter input port on the SpatialFilter.
    4) Open the SpatialFilter parameter by double clicking. Disable the merge attribute option. Click OK.
    5) Run the workspace.

    To check: 

    Answer     

     

     

    Your end result should look like this picture. If yes, then congratulations you now know how to add
    transformers. If not, then gothrough the diagram and follow the steps once more by undoing.

     

    b) Data cleaning

    The last step before we can write out the data is to clean up the attributes. For this we need to add another
    transformer.
    Instructions:

    Data Cleaning


    Congratulations!, you have finished the data transformation.

                                                                                                                                     Quiz

     

    Knowledge Checker

    Attempt the Quiz to review what we learned in this section.

    Exercise 4 - Loading

    Data Writer and Loading

    The transformed data needs to be written out to a file using a writer. We are now ready to write the data to a Microsoft Excel spreadsheet since we have the filtered data (which customers are within the affected storm area).
    a) Adding Writer

    Instructions:

    Writer


    b) Running Workspace
    Running the workspace starts the flow of data from input through any transformers to disk writing with a writer.
    Instructions:

    Running_the_Workspace

     

    Saving_the_workspace


    Well done on completing data integration. We now have the customer data that are likely to have damage due
    to flooding.

                                                                                                                                        Quiz

     

    Knowledge Checker

    Attempt the Quiz to review what we learned in this section.

    Wrap Up

    We are at the end of this course

    Lets have a recap:

    • We learnt about what is Spatial ETL and Data Integration, along with some Spatial ETL softwares.
    • We also learnt about  FME®, data integration with  FME®.
    • We know the basic functionalities with FME® such as How to extract data, how to add a reader/writer, how to transform the data, how to visualise the data and finally, how to load the data into our desired format.

    Now this course will help you understand the basic terminology in  FME® and simple real world example to get you to the spatial ETL journey.

     

    Thank you for taking this journey with us.

     

     

     


    Data Attribution
    The data was modified from Safe Software under CC license.
    The data used here originates from data made available by the City of Surrey, British Columbia. It contains
    information licensed under the Open Government License - Surrey.

    Images, Icons, Definitions: All the images and icons are created from Canvas, safe software inc.