Sensors and Systems

In recent years there has been an acceleration in the collection and publishing of digital data about people, places, and phenomena of all kinds. Much of this big data explosion is due to the growing volume of public data offered by government agencies at all levels. This increased availability of data presents great opportunities for answering new questions and improving understanding of the world by integrating previously distinct areas such as weather, transportation, and demographics, to name a few.

However, combining such diverse data remains a challenge because of potential disparities in coverage, quality, compatibility, confidentiality, and update frequency, among others. Thus, developing strategies for handling such inconsistencies between datasets is critical for those interested in leveraging this abundance of data. It is also important for data providers, particularly in the public arena, to understand these challenges and the needs of integrators as they invest in the development of new data dissemination approaches.

Need for Data Integration

Combining multiple datasets into the same application or database for visualization and analysis has become common practice in nearly every industry today. This is typically done by centrally integrating existing public data from disparate sources, which allows new analyses to be conducted more efficiently and at a lower cost. The practice has become particularly vital for those preparing for, monitoring, and responding to emergencies, natural hazard events, and disasters.

For government officials, emergency managers, the media, and others involved in what are often rapidly changing events, timely access to wide-ranging information is critical for effective decision making. For example, access to land elevation, evacuation routes, traffic conditions, weather forecasts, building and property data, hospital and shelter locations, addresses, population density, demographics, and more may be required. However, digital representations of such data generally have widely varying characteristics affecting their use and compatibility in an integrated environment.

To successfully assess data and make decisions about their use together in today’s complex geo-analytical applications, a holistic approach can be especially effective. By evaluating the analytical, operational, and organizational implications of data integration efforts prior to implementation, more effective integration strategies can be devised that can eliminate or reduce the many challenges inherent in this activity.

Florida's Geospatial Assessment Tool for Operational Response (GATOR) brings together data from various sources to aid preparedness and response.

Analytical Considerations

Before data can be integrated for analytical purposes, there are several important questions that should be considered: What is the intended purpose of integration? Does integrating data help solve a new problem or answer a specific question? Do the datasets cover the same geographic area and scale? Will datasets interact with one another? Are there complex geographies or special cases that may present problems?

Combining digital data for spatial and geographic analyses presents a variety of challenges that need to be addressed. Consideration should be given to any facet of incoming data that may affect analyses, including:

  • Data Extents – This refers to the geographic area of extent or coverage for which there is data in a given dataset. This must be considered for all integrated datasets because it determines which datasets can be displayed together or queried against one another.
  • Geometry Type & Complexity – Geometry type refers to the form used for geographic data storage and representation (i.e. point, line, and polygon). For vector data this is important because it influences the kinds of spatial queries and analyses that are possible between the datasets being integrated. Complex shapes can present special cases that require dedicated handling and extensive testing.
  • Spatial Resolution – Spatial resolution refers to the level of precision and scale of the spatial data being represented. Does the data represent features collected and stored at the local level (such as buildings and properties) or at the regional or national level (such as temperature)? This is important to consider for all integrated datasets because it influences the type of queries and analyses that are possible between datasets and affects the quality of their results.
  • Currency & Temporal Resolution – Currency refers to when the data was collected or how up-to-date it is. Temporal resolution refers to the period of time for which the data is valid. These factors are especially important when combining datasets that represent features or events from different points in time.
  • Quality & Accuracy – Data quality and accuracy refer to how correctly the data depicts or represents real-world features or phenomena. This must be considered for all datasets being integrated because it influences the suitability of use and any results derived from queries and analyses.
  • Privacy – Privacy refers to whether the data contains confidential or personally identifiable information (PII). This can be an especially important consideration for governmental data disseminators or integrators and may dictate how the data can be used and whether measures are required to protect private information.
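As a rough illustration of the data extents consideration above, the following minimal Python sketch checks whether two datasets share any coverage before they are queried against one another. The `Extent` class, the `overlap` function, and all coordinate values are hypothetical and assume both datasets use the same coordinate system; a real integration would more likely rely on a GIS library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Extent:
    """Axis-aligned bounding box in one shared coordinate system (hypothetical)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def overlap(a: Extent, b: Extent) -> Optional[Extent]:
    """Return the extent shared by two datasets, or None if they do not overlap."""
    min_x, min_y = max(a.min_x, b.min_x), max(a.min_y, b.min_y)
    max_x, max_y = min(a.max_x, b.max_x), min(a.max_y, b.max_y)
    if min_x >= max_x or min_y >= max_y:
        return None  # no shared area, so nothing to display or query together
    return Extent(min_x, min_y, max_x, max_y)


# Made-up extents in degrees for illustration only, not real dataset bounds.
wind_area = Extent(-80.0, 35.0, -70.0, 42.0)
tracts = Extent(-75.0, 38.0, -73.0, 41.0)
far_away = Extent(-120.0, 30.0, -115.0, 35.0)

print(overlap(wind_area, tracts))    # the tract extent lies inside the wind extent
print(overlap(wind_area, far_away))  # None
```

A check like this is cheap to run at load time and can flag datasets that cannot be meaningfully combined before any heavier analysis is attempted.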

The two maps displayed in Figures 1 and 2 demonstrate the importance of scale when integrating geospatial data to determine the impact of natural disasters. Figure 1 shows a map developed to estimate the number of people affected by Hurricane Sandy by combining data representing the high wind impact area at the time of landfall (the large oval area outlined in orange) with Census population figures by tract (in blue). Because there are many tracts entirely within the wind area, a standard spatial containment query enables the retrieval of population totals limited to the area in question, yielding an accurate result. In Figure 2, by contrast, the Census tracts (in orange) extend mostly outside the tornado impact area in Moore, Oklahoma (outlined in red), so the same containment query would likely capture few, if any, tracts and would not yield a reliable population estimate at that scale.
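The effect of scale on a containment query can be sketched in a few lines of Python. The example below is not the method used by either tool shown in the figures. It is a simplified illustration that uses rectangles in place of real tract and impact geometries, invented population figures, and, for the second function, the common but strong assumption that population is spread evenly within each tract (areal weighting). The names `Rect`, `population_contained`, and `population_apportioned` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rect:
    """Rectangle standing in for a tract or impact-area polygon (simplification)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def area(self) -> float:
        return max(0.0, self.max_x - self.min_x) * max(0.0, self.max_y - self.min_y)

    def contains(self, other: "Rect") -> bool:
        return (self.min_x <= other.min_x and self.min_y <= other.min_y
                and self.max_x >= other.max_x and self.max_y >= other.max_y)

    def intersection_area(self, other: "Rect") -> float:
        width = min(self.max_x, other.max_x) - max(self.min_x, other.min_x)
        height = min(self.max_y, other.max_y) - max(self.min_y, other.min_y)
        return max(0.0, width) * max(0.0, height)


Tract = Tuple[Rect, int]  # (tract shape, tract population)


def population_contained(impact: Rect, tracts: List[Tract]) -> int:
    """Sum population of tracts lying entirely inside the impact area."""
    return sum(pop for shape, pop in tracts if impact.contains(shape))


def population_apportioned(impact: Rect, tracts: List[Tract]) -> float:
    """Areal weighting: assumes population is uniform within each tract."""
    return sum(pop * impact.intersection_area(shape) / shape.area()
               for shape, pop in tracts if shape.area() > 0)


# Invented tracts and populations, for illustration only.
tracts = [(Rect(0, 0, 10, 10), 4000), (Rect(10, 0, 20, 10), 6000)]
large_impact = Rect(-5, -5, 25, 15)  # much larger than the tracts
small_impact = Rect(8, 4, 12, 6)     # much smaller than the tracts

print(population_contained(large_impact, tracts))    # 10000
print(population_contained(small_impact, tracts))    # 0
print(population_apportioned(small_impact, tracts))  # 400.0
```

When the impact area is large relative to the tracts, containment and apportionment agree. When it is small, containment returns nothing, and the apportioned figure is only as good as the uniform-density assumption, which is why finer-grained source data is generally preferable where it exists.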

Figure 1. Map shows Census tracts (in blue) falling almost entirely within the Hurricane Sandy wind area (outlined in orange) during landfall on October 29, 2012 (from the Census OnTheMap for Emergency Management tool at http://onthemap.ces.census.gov/em.html).

Figure 2. Map shows Census tracts (in orange) extending mostly outside the tornado impact area (outlined in red) in Moore, Oklahoma (from an ArcGIS Online web map at http://bit.ly/1atvA04).

Operational Considerations

In addition to analytical aspects, there are a variety of practical operational factors that may affect the technical implementation of integration efforts. Operational considerations can help answer questions such as: How is data accessed and stored? What formats and data models are used? Are data processing steps required? How often is data updated or changed? Operational considerations include:

  • Access Method – Access method refers to the mechanism by which the data is accessed for integration. Options may include (1) access through remote services or interfaces such as web or database services and APIs, or (2) direct file access via FTP, HTTP, RSS etc. This may affect whether data is used as a reference or can be queried and interact with other datasets and whether it must be stored locally.
  • Format & Size – Format refers to the specific encoding of the data, generally known as the file format (e.g. shapefile, KML, JSON). Data file format and size can influence what technologies and methods are used for access, extraction, and analysis.
  • Data Model & Schema – The data model or schema refers to the logical internal organization of data elements. This organization governs how features are structured and accessed including sub-files, columns, rows, etc. This factor is critical to examine as it influences what analyses are possible, how queries are constructed, and whether unique identifiers are available for spatial features or temporal events.
  • Update Frequency – Update frequency refers to how often the data is changed, updated, or released with new or different contents. For example, weather and event related data often change hourly or daily, whereas administrative boundaries or demographics are typically updated on an annual cycle. The update frequency of incoming data may influence how features and events are tracked and handled over time as well as how data must be stored and archived.
  • Speed & Performance – Speed and performance refers to how quickly integrated data is able to return query results or render data displayed graphically. The performance of data can influence the method of integration and whether any additional post-processing steps or special technologies are required.
  • Stability & Reliability – Stability and reliability refer to the constancy of accessibility, format, schema, and other data characteristics. Depending on the nature of the integration being attempted, changes to these operational aspects of incoming datasets may render analyses invalid or applications inoperable, thus requiring special attention.
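One way to guard against the schema and stability issues listed above is to validate each incoming record against the structure the integration expects. The Python sketch below is a minimal illustration under stated assumptions: the field names in `EXPECTED_SCHEMA` and the `schema_problems` function are hypothetical and do not describe any particular provider's data model.

```python
from typing import Any, Dict, List

# Hypothetical expected schema: field name -> required Python type.
EXPECTED_SCHEMA: Dict[str, type] = {
    "event_id": str,
    "event_type": str,
    "updated": str,
    "geometry": dict,
}


def schema_problems(record: Dict[str, Any],
                    expected: Dict[str, type] = EXPECTED_SCHEMA) -> List[str]:
    """List the ways a record deviates from the expected schema (empty if none)."""
    problems = []
    for field, field_type in expected.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], field_type):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    # Unknown fields are reported too, since they often signal a schema change.
    for field in sorted(record.keys() - expected.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


# Invented records for illustration only.
good = {"event_id": "A1", "event_type": "flood", "updated": "2013-06-01",
        "geometry": {"type": "Point", "coordinates": [0, 0]}}
changed = {"event_id": 17, "event_type": "flood", "updated": "2013-06-01",
           "severity": "high"}

print(schema_problems(good))     # []
print(schema_problems(changed))
```

Running a check of this kind on every refresh lets an application log or quarantine unexpected records rather than failing silently when a source changes its format.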

Operational aspects are particularly central to real-time applications that rely on continuous access to data that’s always changing. When integrating data from disparate external sources, special accommodation and testing may be required to anticipate and minimize unforeseen issues due to content and formatting changes over time. For example, atypical geographic features or shapes may be encountered, schemas can change, and data services can slow down or become inactive without warning.

Figure 3 shows an example of a U.S. Census Bureau application, designed for studying the potential impacts of disaster events, that integrates natural hazard and socio-economic data from various federal sources with varying update frequencies. Here, National Weather Service data on floods and winter storms are updated on a daily cycle, hurricane and tropical storm data are released on a regular hourly schedule, and USGS wildfire data and FEMA disaster declaration data are updated on an ongoing, as-needed basis. Census statistics on the U.S. workforce and population are updated on annual and decennial timetables.
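Differences in update frequency like these can be tracked with a simple freshness check. The following Python sketch is illustrative only: the source names, the `MAX_AGE` thresholds, and the `stale_sources` function are assumptions loosely modeled on the cycles described above, not the application's actual configuration. Sources updated on an as-needed basis have no fixed cycle and are deliberately left out of the thresholds.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Hypothetical maximum acceptable ages per source (assumed values).
MAX_AGE: Dict[str, timedelta] = {
    "hurricanes": timedelta(hours=1),
    "floods": timedelta(days=1),
    "winter_storms": timedelta(days=1),
    "population": timedelta(days=365),
}


def stale_sources(last_updated: Dict[str, datetime], now: datetime,
                  max_age: Dict[str, timedelta] = MAX_AGE) -> List[str]:
    """Return names of sources whose latest update is older than allowed."""
    return sorted(name for name, stamp in last_updated.items()
                  if name in max_age and now - stamp > max_age[name])


# Invented timestamps for illustration only.
now = datetime(2013, 6, 1, 12, 0, tzinfo=timezone.utc)
last_updated = {
    "hurricanes": now - timedelta(hours=3),   # older than the assumed hourly cycle
    "floods": now - timedelta(hours=6),       # within the assumed daily cycle
    "population": now - timedelta(days=400),  # older than the assumed annual cycle
    "wildfires": now - timedelta(days=30),    # as-needed source, no threshold set
}

print(stale_sources(last_updated, now))  # ['hurricanes', 'population']
```

Keeping the thresholds in one place makes it easier to adjust them as providers change their release schedules.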

Figure 3. Screenshot of the U.S. Census Bureau’s web-based OnTheMap for Emergency Management tool (available at http://onthemap.ces.census.gov/em.html), which integrates diverse natural hazard and socio-economic data for studying potential disaster impacts.

Organizational Considerations

In addition to analytical and operational aspects, organizational or environmental factors can also greatly influence many data integration decisions. For example, many governmental organizations have policies, procedures, and protocols that direct how internal and external data and systems can be accessed or integrated. These might include standard operating procedures, change control steps, security protocols, quality assurance measures, and other administrative requirements.

Consideration of the costs associated with data integration is also important to any organization evaluating this activity. Given the complexity of many modern integrated environments, which merge internal and external data that is constantly changing, calculating and allocating costs can be challenging. Several important questions to consider include: Are there data licensing fees? Is new hardware or software required? Will new skills be required to implement and sustain data integration activities? What is the management and maintenance cost over time as databases evolve and grow?

Putting It All Together

Given the diversity of factors that influence successful geospatial integration, developing a comprehensive approach that considers the analytical, operational and organizational implications can be helpful to integrators and suppliers alike. Business tools such as decision trees, flow diagrams, and simple graphic mockups can help solidify project goals and illuminate potential pitfalls before diving in. Utilizing or establishing basic data standards can also provide a basis for ensuring analytical, operational, and organizational consistency across integration efforts over time. In-depth testing and evaluation is, of course, often the best way to uncover the particular challenges inherent in any data integration project.

Government agencies play an increasingly important role in making data available to the public that is easy to use and integrate. As more of this data becomes available and the efficiencies of integration are further realized, understanding how to evaluate and manage this activity is likely to become even more important.

About the Author

Robert Pitts is Director of Geospatial Services at New Light Technologies Inc. in Washington D.C., overseeing the provisioning of technical consultants and delivery of information solutions for a variety of government and commercial clients. Robert received his Master’s degree in Geographic Information Science from Edinburgh University (UK) and a Bachelor’s degree in Geography from the University of Denver. Robert is also a Certified Geographic Information Systems Professional (GISP) and Project Management Professional (PMP).
