I consider that this situation already exists today, since we can buy or otherwise achieve high levels of interoperability between different data sources and their file formats. Global modeling progress is challenged by matters relating to governance, semantic interoperability, language barriers, suitable data availability, data quality and our level of understanding of complex processes at different scales.
A few decades ago, when spatial information was getting off the ground and the digital tools of GIS, GPS, CAD and remote sensing were in their infancy, the problems associated with data formats were far more challenging than they are today. At that time, many people and organisations had difficulty not only in sharing information between different software packages, but in collecting suitable information to answer the spatial questions at hand.
The approach using Extract-Transform-Load (ETL) technologies is certainly functional today and represents an effective means for dealing with multiple data formats both within and between organisations. As well, standardised formats, such as those available through the Open Geospatial Consortium (OGC) are also helping to meet requirements for increased collaboration and sharing.
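To make the ETL idea concrete, here is a minimal sketch of an extract-transform-load step in plain Python, converting a hypothetical CSV of survey points into GeoJSON (an OGC-adopted format). The column names (`site_name`, `lon`, `lat`) are illustrative assumptions, not any particular product's schema:

```python
import csv
import io
import json

def etl_csv_to_geojson(csv_text):
    """Extract point records from CSV text, transform field names and
    types, and load them into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Transform: coerce coordinate strings to floats and map the
        # source columns onto the target schema.
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {"name": row["site_name"]},
        })
    # Load: emit a structure any GeoJSON-aware tool can consume.
    return {"type": "FeatureCollection", "features": features}

sample = "site_name,lon,lat\nStation A,-75.69,45.42\n"
print(json.dumps(etl_csv_to_geojson(sample)))
```

Real ETL platforms do far more (schema mapping, coordinate reprojection, validation), but the extract/transform/load shape is the same.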
Many of us understand the matters involved in modeling at the global scale because we see the same processes in our offices, organisations, cities and countries. As we expand and grow to bring more people and spatial systems together, increasing amounts of resources and effort must be coordinated and spent to integrate them.
I am not certain that a single data format is the answer. The significance of a distributed approach is that it recognises the inherent uniqueness of the individual phenomena we are attempting to model – architects do not generally consider a GIS the best modeling tool for designing buildings, and demographic analyses are not best performed with CAD software, for instance.
We use different tools for different purposes, and performance matters. However, the real issues of semantics remain, and modeling that accounts for cultural diversity, language and data availability, for instance, is essential to creating an environment conducive to integration, collaboration and sharing. Today we see many standards invoked along a geospatial chain from data capture, to data management, to processing and visualization. Beyond that lie more standards for data distribution, video and sound – all optimised to complete the task. Today’s ETL technologies maintain performance characteristics while enabling higher levels of interaction and collaboration.
In many places in the world outside of North America and Europe, basic challenges relating to the collection and processing of geoinformation remain significant. Accordingly, the first thing to be aware of when discussing a global context is that not all places around the world are at the same point on the maturity curve. As a result, building capacity in these places remains a primary goal.
It is next to impossible to create useful models without the availability of global data sets and real-time modeling will demand even greater efforts in data collection from more places. Even so, I consider the progress being made in the processing of remotely sensed imagery is crucial today as imagery is increasingly viewed as a primary data source. As such, being able to channel, or make available processed data from imagery is a solid step toward a common modeling environment, especially in a real-time sense.
Currently there is 1 km gridded data, or better, for some data sets. The Shuttle Radar Topography Mission (SRTM) has provided 30 m resolution data in some parts of the world and 90 m resolution data for most of the world. More recently, Intermap, through its NextMap series of elevation data, is providing 5 m resolution surface models for North America, parts of Europe and soon all of Europe. Other producers are entering, or are about to enter, the market to produce continental and globally available topographic information that is consistent and whose quality is assured.
For Europe, the CORINE 2007 data set provides 250 m resolution vegetation information. The GISS (Matthews) Global Vegetation Data Set available from NASA provides for a 1 degree by 1 degree data grid globally. There are many other projects underway that are global in scope and cross different disciplines. By and large, the global focus has been on physical land measurements and food production, often involving topography, geology and soils, vegetation, water and forests. A significant amount of global modeling pertains to the climate and meteorological studies.
Yet digitally available information suitable for computer modeling remains an obstacle. How can we rationalise global modeling without digital global data sets covering large segments of the world’s land and oceans? And while information may be available, like the World Soil Data, its scale is 1:1,000,000 or coarser.
Make no mistake, excellent progress is being made at the global level. Nevertheless, much of it surrounds harmonisation of policies and coordination of activities. The Global Spatial Data Infrastructure (GSDI) is committed toward the global integration of organisations and initiatives to evolve international model efforts. In addition, in Europe the INSPIRE program, while having a Directive, is now beginning to develop implementation rules.
While I don’t doubt the need for global understanding, and I pursue and support those initiatives, I question whether global modeling is better achieved from the top down or the bottom up. I believe the latter. Whether we need to know everything before beginning to model is not the concern. The task is to empower the greatest number of people to begin this effort. We need to keep the complexity, but make it available to more people – from the bottom up. Even though we are moving toward models that are more complex and sometimes include the behavior of the objects being modeled, our goal should be to make this capability available to more people.
Geographic information science is in its infancy. We are making great progress, but our global understanding is like an iceberg – mostly hidden. All that we do not know taunts us and our tools to study and learn more. We need to develop more tools, methodologies and capabilities to further the work, and we need to invest in the data to make that happen. It is clearly a case where the whole equals the sum of its parts – we progress on multiple approaches and levels, piece by piece.
Formats, Legacy and Value
OK – let’s assume that today we had one data format. You can choose GML, KML, XML, DWG, SHP, DGN or any one of the more than 250 or so formats in use – pick the one you wish. Then let’s add in the available graphics file formats; there are a few dozen or more of them. And let’s not forget the video file formats, since we will want to visualise our global data models. The list goes on and on.
Secondly, we often forget about the value of legacy data, and we need to consider that. After all, it is not as if we want to begin modeling the world only from today. I would suppose that we would want to leverage the value of past efforts and investments in spatial data for global modeling purposes as well, since that data also provides a means to compare historical events.
Now consider – would we be any further ahead today if we had one data format?
The Format in the Process
If we are to meet the challenges of food production, energy production, health and the environment, then our modeling efforts will become far more dependent on useful, quality data in the future. Having access to any data – not just because it is free, standardised or the lady next door uses it – is not enough reason to consider it for global modeling.
Many of the kinds of data that we are interested in for global modeling require that we understand the underlying processes (how water moves in soil, what controls ocean movement, where carbon is cycling, when hurricanes will occur, how to assess building energy, why the plants are diseased etc.) and the data and models associated with these processes.
We need the right tools collecting the right data for the right purpose. And… We need to be doing this much more often. As we assemble these pieces and collaborate, we will be building our local SDI, regional SDI and global SDI. However, we will also be managing landscapes, building cities and living sustainably – hopefully.
There’s much investment in large global modeling environments, such as Google Earth, Microsoft Virtual Earth, ESRI’s ArcGIS Explorer and NASA’s World Wind. The prevailing wisdom in the marketplace seems to favor multiple competing globes for different purposes. Certainly the amount of data that could be spatially referenced and modeled is daunting, but what if the concept of a digital globe were to be sold on the basis of universal access and interface to all data types?
The idea is that a large global digital model would ingest any data or model format, retain all the intelligence of the format, and become a hub for any type of building data, design data or geospatial data. The hub would also act as a model-creation medium, where visualization details are extracted and analysis tools would help gain intelligence from the combined data. A digital globe as an interoperability hub would resolve quite a few issues of redundant data collection, and would also help to improve our knowledge about our planet by allowing cross-analysis of multidisciplinary data and models.
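The ingest-anything hub described above amounts to an adapter registry: a reader per format, with the original payload kept alongside whatever is extracted so no format-specific intelligence is discarded. A minimal Python sketch of that pattern follows; the reader names and record fields are hypothetical illustrations, not any vendor's API:

```python
import csv
import io
import json

# Registry mapping a file extension to a reader function.
READERS = {}

def register(extension):
    """Decorator associating a reader with a file extension."""
    def wrap(fn):
        READERS[extension] = fn
        return fn
    return wrap

@register(".geojson")
def read_geojson(payload):
    # Extract the geometries from a GeoJSON FeatureCollection.
    doc = json.loads(payload)
    return [f["geometry"] for f in doc["features"]]

@register(".csv")
def read_csv_points(payload):
    # Treat a lon/lat CSV (hypothetical schema) as point geometries.
    rows = csv.DictReader(io.StringIO(payload))
    return [{"type": "Point",
             "coordinates": [float(r["lon"]), float(r["lat"])]}
            for r in rows]

def ingest(filename, payload):
    """Dispatch on extension; keep the raw payload next to the
    extracted geometry so the source format's detail is retained."""
    ext = filename[filename.rfind("."):]
    geometries = READERS[ext](payload)
    return {"source": filename, "raw": payload, "geometries": geometries}
```

A new format joins the hub by registering one reader; consumers see a uniform record while the untouched `raw` payload remains available to format-native tools.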
Getting consensus on such a Utopian concept is a tough thing to imagine, given the competing business interests of the many players involved, but it’s a worthy concept for discussion. The creation of the Digital Earth as originally envisioned certainly didn’t encompass multiple competing Earths.
No Standard Format
Multiple geospatial interoperability approaches exist today that provide somewhat effective solutions to the disparate systems problem. There are intermediate formats and tools that allow for the transfer of information from one system to another, but often key intelligence about the data gets left behind.
If we got to the point of a universal repository of all data types, there wouldn’t be a need to worry about data format. The centralized repository approach would require a certain amount of translation and manipulation in order to provide a base visualization capability. Allowing for the ingest of any data type would provide for more rapid build-out of the model, giving data creators the ability to share what they have without time-consuming efforts to adhere to a specific import format. Such a model would also help to preserve the tools and workflows that created the data to begin with.
The hub approach would require different tools to extract and manipulate meaning from the central model, but wouldn’t provide any barriers for visualization of the model. This way, there’s no standard format, and no lock on the model for any specific type of data. The result would be a model open for viewing by all, but with data extraction, creation and manipulation priced at a premium.
Much of the geospatial and design software business is built around proprietary data formats. There are rather complex data models that have advanced over time that are specific to individual vendor’s toolsets. These data models are tied directly to how the data can be manipulated and analyzed within these specific tools, and provide for standard workflows that add intelligence to raw observations. There’s no need to throw out the idea of individual tools and markets with this idea of a centralized model, as the centralized collection of specific data types would become separate marketplaces for both data creators and vendors with tools for specific data types.
The central model would provide a showcase for individual data capture and manipulation tools, while still maintaining these separate markets. On the data creation side, surveyors could retain a certain amount of ownership of the detailed information that they’ve collected for endeavors that require defensible legal descriptions of property. Designers of buildings and infrastructure would retain a measure of intellectual property that could reward them for any reuse of their designs. Spatial analysts and scientists would have access to rich data stores for the development of algorithms and intelligent models that they could sell as services to decision makers. Software toolmakers would retain hooks into the manipulation of their proprietary data types, and would be incentivized to develop better tools by the creation of a shared market.
The issue of model ownership would certainly be at play, with the need for an outside entity to administer the data amalgamation task and to ensure that multiple interests are maintained. Ideally, digital city models would be owned and operated by individual cities as an extension to current planning and infrastructure management departments. A public/private arrangement for the computing infrastructure could work out well for the creation and maintenance of the computing power to drive the models, similar to the Internet Service Provider (ISP) idea.
At the regional scale, the models would have inputs and maintenance from states and cross-border entities such as watersheds. The combined model would foster greater regional collaboration on issues of common interest such as natural disaster mitigation and management and transportation. The ownership of regional data would require a cooperative approach, but there are many good examples of this approach in the geospatial world already.
Inputs into a combined model at the global scale would best work for large scientific data sets. Environmental, weather, biodiversity and energy data is being collected and analyzed on the global scale already, and such a repository would allow for analysis to draw any inferences between these and other metrics. Scientific purpose would be much easier to justify than politically sensitive details such as economic data and any details related to security.
The promise of integrated models for a holistic approach to land use management is currently hampered by separate modeling environments and separate worlds. Current global visualization offerings are well suited for communicating multidisciplinary geospatial data, but they don’t offer any tools for collaboration and data sharing across toolsets. A model as an interoperability hub would resolve these issues and provide a forum for collectively increasing our knowledge about our complex world.
City and regional models are being built now with increasing fidelity that would benefit from more inputs and a broader scope. The concept of the integrated model could work at this scale, and then be built out over time. Discipline-specific digital worlds are likely a precursor to any large-scale multidisciplinary effort.
The timeframe to build such a complex and integrated model would certainly need to be at least a decade long, and the political complexity of such a global effort may make this concept out of reach. But, why not just one global modeling environment? After all, there’s just one Internet.