Spatial data quality bears on many of the processes that involve spatial applications, yet some people may be unaware of the relationship and why it is important. Vector1 Media editor Jeff Thurston interviewed Steven Ramage (left) and Graham Stickler (right) of the UK-based company 1Spatial, who are involved in spatial data quality applications around the globe. The result was a series of probing questions designed to get at the meaning of spatial data quality, why it is important, where it makes a difference and how.
Question: Different people think of data quality in different ways, and a large number of people are still grappling with what data quality involves and what it means. What does ‘data quality’ mean?
Answer: That’s a very good question. It’s interesting that the Open Geospatial Consortium (OGC) Data Quality Working Group has taken steps to try to answer it. They recently conducted a worldwide survey to explore what ‘data quality’ means to the geospatial industry; with almost 800 responses, this is obviously a topic of concern to industry professionals. There is an expectation that it means different things to different stakeholders across the spatial data supply chain. We all understand that it includes more than just geometric accuracy; it must also cover areas like logical or semantic consistency, and it must relate to fitness for purpose and the data’s intended use. As an example, if you were to receive a road network dataset that was highly accurate from a geometric standpoint but not actually connected at all nodes, then its ‘quality’ would be very poor if you intended to use it for routing purposes.
The following quote from J.M. Juran, the recently deceased ‘quality guru’, sums up the situation well:
“Data are of high quality if they are fit for their intended uses in operations, decision making and planning. Alternatively, the data are deemed of high quality if they correctly represent the real-world construct to which they refer. These two views can often be in disagreement, even about the same set of data used for the same purpose.”
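To make the road network example above concrete, here is a minimal sketch of a fitness-for-purpose check for routing. It uses the open-source networkx library; the coordinates, segment names and the idea of testing routability between two points are illustrative assumptions, not a description of any particular vendor’s validation logic.

```python
# Illustrative only: a geometrically accurate road network can still fail
# a routing-oriented quality check if segments do not share nodes.
import networkx as nx

# Each hypothetical road segment is a list of (x, y) vertices. Segment B starts
# 0.1 m from the end of segment A: positionally accurate, but not snapped.
segments = {
    "A": [(0.0, 0.0), (100.0, 0.0)],
    "B": [(100.1, 0.0), (200.0, 0.0)],
}

graph = nx.Graph()
for seg_id, coords in segments.items():
    graph.add_edge(coords[0], coords[-1], segment=seg_id)

# Routing needs topological connectivity, not just positional accuracy.
start, end = (0.0, 0.0), (200.0, 0.0)
if nx.has_path(graph, start, end):
    print("Network is routable between start and end.")
else:
    print("Quality failure for routing: segments are not connected at shared nodes.")
```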
Question: Can data quality impact work processes? How? Have you been able to see the benefits of improved data quality, and could you explain some of them, please?
Answer: Very much so. Improving data quality is not a once-only task, and implementing an ongoing quality control regime can have a major impact on work processes and benefit a business by cutting out ambiguity and improving overall operational efficiency, e.g. by limiting duplication of effort. Data quality, defined earlier in terms of fitness for purpose, plays a vitally important role in the integration of datasets. We often talk about the ‘supply chain problem’, and data quality is pivotal to this concept. A classic manufacturing supply chain derives source materials from various places, manufactures a product from these materials and then delivers the product to distributors or end customers.
In the IT world, we are attempting to do the same thing, and spatial data is playing an increasingly large role in this process. We take data from multiple sources and combine and integrate it (often into a single source) to create a product – information. We then need to distribute this product to the end customer or decision-makers. Source data may well be of high quality in the context of what it is collected for and operationally used for (fit for purpose). However, the source systems will be ‘doing things right’ only in the context of their operational role. What we are increasingly trying to achieve, as an organisation, community or enterprise, is to combine all these data so that we can undertake effective analysis, planning and business decisions – i.e. ‘doing the right thing’. This spatial data integration, or conflation, is what 1Spatial has been doing successfully for many years, and it is one of the key areas of expertise that will be highlighted at the Conference.
One organisation attempting to tackle the data quality issue is MidCoast Water in New South Wales, Australia. MidCoast Water serves an area of 7,000 square kilometres, and the authority is responsible for reticulated water supply and sewerage systems for communities in the Manning and Great Lakes regions. The delivery of services to such a vast area brings with it many challenges. Formed just under 10 years ago from the water and sewerage sections of three local authorities, MidCoast Water has quickly grown to be an industry leader, and sees its encouragement of innovation as crucial to this positioning. MidCoast Water introduced a programme to provide improved access to accurate geographical and asset information. The programme was developed in two stages:
1. Improving the efficiency and accuracy with which information is gathered and recorded
2. Improving the accessibility of this information.
The authority needed to ensure its data quality in order to provide the best possible service to its end customers. They needed to be confident in the reliability of their data at the attribute level, so that no manual checking was required and time was not wasted searching for assets in the field because of errors in the data. Topological connectivity between networks also needed to be assured, to prevent the duplication of editing tasks that could be a drain on manpower. In addition to MidCoast Water’s ongoing programme of internal improvement, from 1st July 2006 state government regulations required utility organisations to be able to accurately pinpoint their assets. MidCoast Water needed to ensure that they were fully compliant before the legislation came into force.
Following the implementation of a spatial data quality regime, MidCoast Water now enjoys the following advantages and benefits:
• Interoperability – centrally stored data is error-free and accessible via multiple applications across the company
• Enhanced productivity – significant time and cost savings have been achieved through increased query performance
• Enterprise-wide data management – business and spatial data has been centralised into a single database, reducing the duplication of effort in maintenance of that data
• Improvements in data gathering – it now takes just a few hours, instead of a week, to translate spatial data into maps
• Efficient processing of property information – there has been a 60% reduction in staff time for this task when combined with other processes
• Versatility of application – the return on investment has increased as a result of being able to apply the data in a variety of new ways and through data mining opportunities.
The return on investment for MidCoast Water has been substantial. Not only have they met their objectives to centralise business data and to accurately manage and pinpoint their assets, but they have also achieved measurable time and cost savings. As just one example, before the implementation of the new regime it appeared that two sewer stations were needed at a particular development. Under the new system, with the resultant improvement in data quality, the information supplied clearly showed that only one sewer station was required, saving MidCoast Water up to AUD 300,000.
Question: There has been a growing discussion relating to spatial data infrastructures (SDIs). It is hard to understand how all these countries and people are going to share data and information. What initiatives are you involved in that will help to make SDI a reality? What are the challenges that SDIs face, in your opinion?
Answer: 1Spatial has been involved with the Digital National Framework (DNF) group in Great Britain since its inception several years ago. The vision of the DNF is “to enable and support easy and reliable integration of business and geographic information regardless of who is responsible for its maintenance and where this is undertaken, thus achieving the goal of ‘plug and play information’”. This vision runs parallel to the idea of building a spatial data infrastructure (SDI), since it is all about providing increased access to, and being able to share, quality information across organisations. DNF is not an activity unique to ‘building SDIs’; it crosses all areas of spatial information management, and therefore supports SDI efforts as well as the wider goals of the Digital National Framework. 1Spatial is also involved in SDI research in Scandinavia and the Benelux region; one project, for example, is assessing ‘quality elements within the supply chain to deliver an SDI’.
Although not called an SDI, the work carried out as part of the VISTA project could be described as exactly that. The VISTA project is examining innovative ways of integrating, presenting and making use of utility data, with the aim of reducing the direct and indirect costs of streetworks. Inaccuracies in existing information, and the lack of methods to integrate, share, reuse and effectively communicate knowledge held by the owners of underground assets, mean that more excavations are required to locate those assets, causing unnecessary traffic congestion and increased costs for the economy. Unnecessary damage to underground assets also results in increased costs, injury or even death of workers, and loss of service to consumers, both business and domestic.
UK Water Industry Research (UKWIR) is the lead co-ordinating partner for the project, with Leeds and Nottingham Universities providing the research input and over 20 utility and other industrial organisations also involved. The partners have been working on a Global Schema to provide a framework that will unify the utility data from each of the partner domains (electricity, gas, sewer, telecoms and water). This should not require changes to the source schemas, while still providing enough flexibility to articulate the visualisation and analytical requirements of each domain.
The VISTA project industry partners have provided access to 16 datasets, each unique to a company and domain, which were used extensively in the modelling and design phases. 1Spatial has helped to define the Global Schema mapping metadata, with this complex metadata being maintained within Radius Studio. It is envisaged that Radius Studio will significantly reduce the time taken to integrate these different assets and improve the mechanisms of domain validation with the project partners. This will result in decreased development time and an increase in data quality and domain knowledge.
From our experience of working with organisations such as Fujitsu and ESRI as our partners on the Ordnance Survey of Northern Ireland GeoHub NI project, we know there are some practical issues to be addressed with an SDI. GeoHub NI is a great example of an SDI, since it aims to host and connect to spatial data from multiple sources, enable metadata searching and, via thin Web clients, allow multiple data layers to be accessed and shared. In addition to the cultural aspects of people buying in to and using the service, there are also commercial considerations around licensing the data. The main problems that 1Spatial addressed were to do with data management and included the validation of critical third-party datasets, ease of administration, data integration, security, standards compliance and robustness of the solution. It highlighted how wide-ranging an SDI can be, with its many components.
While trying to provide a secure gateway to web services and web mapping, we encountered some limitations in building these elements of SDIs owing to the limited maturity of OGC Web Feature Service (WFS) tools. The project used a modular architecture, which highlights that no single vendor can provide it all, making interoperability a key consideration when mixing and matching components and using a number of mainstream IT approaches, such as the Simple Object Access Protocol (SOAP) and Web Services Security for remote access via WFS and the Web Map Service (WMS). As experienced with GeoHub NI, the practical issues of secure, reliable access to fit-for-purpose spatial data will be significant challenges for any SDI.
This Vector1Media article covers it well: http://vector1media.com/article/feature/source-of-truth:-is-the-it-community-prepared-for-spatial-data-infrastructures?/
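As a rough illustration of the kind of machine-level access discussed above, the sketch below issues a plain HTTP GetFeature request to an OGC WFS endpoint. The service URL and feature type name are hypothetical placeholders, and a secured SDI gateway such as the one described would additionally require authentication (for example via WS-Security), which is omitted here.

```python
# A minimal, unauthenticated WFS GetFeature request (illustrative endpoint).
import requests

WFS_URL = "https://example.org/wfs"        # hypothetical service endpoint

params = {
    "service": "WFS",
    "version": "1.1.0",
    "request": "GetFeature",
    "typeName": "topo:roads",              # hypothetical feature type
    "maxFeatures": 10,
}

response = requests.get(WFS_URL, params=params, timeout=30)
response.raise_for_status()
print(response.text[:500])                 # first part of the returned GML
```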
Question: There is a trend towards web services in many organisations. In some places it appears people are still taking a desktop approach, while in others almost all of the work is web service based. Are the data quality needs of desktop and web service oriented approaches any different? Can you explain how data quality applies to web services and desktop environments, and the connection to organisational processes for each?
Answer: From a corporate standpoint, these needs are no different. Individual users may have different needs in terms of what they do to ‘their’ data (e.g. tidying up geometric accuracy on a GIS workstation), but organisations are now seeking to build on and, in some instances, move beyond this departmental approach to integrate, share and reuse data across the entire enterprise. In this scenario, and if we think back to question two and the concept of supply chains, the benefits of web services, or component architectures, seem obvious. Having the flexibility to ‘interface’ into a data quality control regime from any access point in an information workflow is vital to creating a sustainable, enterprise-wide culture and a single source of truth for spatial data. This has now been made possible by web services and the maturity of mainstream tools for areas like BPEL (Business Process Execution Language) and transaction management.
Question: It seems that data access is a big concern for many people. When you hear the term ‘access’, what does it mean to you? If someone has to transform data, what are the quality considerations involved? Are there alternative approaches that do not involve transformation?
Answer: Access can be interpreted widely, and it has many issues associated with it depending on the interpretation. However, the issue of security is perhaps becoming the most important concern, and it goes way beyond digital rights management and IT security issues. The ‘Google effect’ has opened up access to spatial data, but have we thought through the implications? As data becomes more available at ever-increasing levels of detail, are we happy for it to be available to everyone? Open source software has been around for a while, but what are the implications of open source data? You may be happy for the floor plans of your buildings to be available to the emergency services, but do you want other organisations to have unrestricted access to such data? Data transformation is all about repurposing data for alternative uses. This may involve spatial data integration considerations, or simply physical changes to the data to enable its reuse for other purposes. The quality considerations go back to the previous answers, in that the data needs to be fit for purpose. The quality challenge is how to define and describe a fitness-for-purpose measure and to report on it during the data supply stage. This is the next step for the OGC Data Quality Working Group (referred to earlier).
Question: Increasingly we see a trend toward real-time or near real-time applications. These are not all necessarily mobile based; many are field based, with static sites continually collecting information and feeding it to other parts of networks. How does one go about ensuring data quality over a real-time network?
Answer: Our approach to data quality is generic, i.e. it is a rules-based approach for automating as much of the spatial data mining, rules discovery and conformance checking as possible. If you can provide the data specifications or the business rules associated with the spatial data coming in over a real-time network, then it is still about the data validation process and reducing the round-trip engineering costs associated with carrying out data quality checks manually, either on input or once the data has been integrated into a workflow.
Real-time data validation services should provide tools to increase data accuracy at the time it is entered. Using a data validation web service such as Radius Studio, when the collected information is shared or distributed to other parts of the network the service can assess the conformance of the incoming data against minimum quality levels or service level agreements, provided the associated business rules are known. If invalid data is submitted, a system of automatic checks can be set up to prompt the provider to enter corrected information, reject the transaction, or apply business rules to improve the performance of the data collection effort. The end result is an improved workflow that cuts out unnecessary additional manual data validation stages.
Automated data validation in real time requires some thought as to the subset of rules to be applied and the source of the data. For live data streams, obvious checks such as “all vehicle locations must be within x metres of a road or ferry route” can be applied in real time. In other situations, particularly for data being input by a user or copied from another system, some rules must be deferred or ignored until enough information is present. For example, “pipes must have connectors at both ends” and “connectors must be on a pipe” should not be checked for each feature as it is input, because the data will always fail at first (a new pipe will have no connectors, and a new connector will have no pipe). Instead, a related group of inputs needs to be read before the checks can take place. Other checks, which require large amounts of contextual data, need to be applied at intervals, once enough data has been captured to allow a meaningful result.
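The sketch below illustrates the distinction just described between rules that can be applied immediately to a live stream and rules that must be deferred until a related group of inputs has been read. The rule wording follows the examples above, but the data structures and function names are invented for illustration and are not 1Spatial’s or Radius Studio’s actual API.

```python
# Illustrative rules-based validation: immediate vs. deferred checks.
from dataclasses import dataclass, field

@dataclass
class Pipe:
    pipe_id: str
    connector_ids: list = field(default_factory=list)

def vehicle_near_road(distance_to_road_m: float, tolerance_m: float = 25.0) -> bool:
    """Immediate rule: a vehicle location must be within x metres of a road."""
    return distance_to_road_m <= tolerance_m

def pipes_missing_connectors(pipes: list) -> list:
    """Deferred rule: 'pipes must have connectors at both ends', run only after
    a related group of edits has been read (a brand-new pipe has none yet)."""
    return [p.pipe_id for p in pipes if len(p.connector_ids) != 2]

# Live stream: validate each incoming position immediately.
print(vehicle_near_road(12.4))    # True  -> accept
print(vehicle_near_road(140.0))   # False -> flag or reject

# Batch of related edits: defer the topology rule until the group is complete.
edits = [Pipe("P1", ["C1", "C2"]), Pipe("P2", ["C3"])]
print(pipes_missing_connectors(edits))   # ['P2'] fails the rule
```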
Question: The architecture industry is heavily involved in the use of spatial geometry for the design, construction and operation of buildings. In some cases hundreds if not thousands of polygons, nodes and lines are involved, yet many architects are primarily concerned with building design. It seems that data quality would play an important role in building geometry, or is this not the case?
Answer: The CAD arena does indeed develop very sophisticated models and, fundamentally, they are made up of basic geometric primitives in the same way as we in the GI space put together maps in the 2D world. In the past it has been a case of ‘never the twain shall meet’, but this is changing and we are seeing a convergence of these data ‘spaces’. From a quality standpoint, the issues are no different from those mentioned above. These data are certainly fit for purpose in their source systems, but as soon as we look to integrate the data we have to consider a new set of quality rules. 1Spatial is currently involved in research projects in the USA and has been looking at how 3D models can be ‘stitched’ into the 2D space and how topology can then be built in line with the 2D data to give a uniform structure and internal intelligence – i.e. a uniform ‘quality’ measure. Our partner LSI will be presenting on this topic as part of the CAD to GIS Track at the Conference.
Question: In Europe the INSPIRE programme aims to harmonise spatial data and enable more integration at the policy level. Why should Europeans care about data quality in INSPIRE?
Answer: Policy is good for setting the path for where we want to go; perhaps we could look at data quality as one of the vehicles for getting there. If we define spatial data quality as ‘fit-for-purpose spatial data’ and follow the old adage ‘garbage in, garbage out’, then how can organisations even consider providing increased access to, sharing or migrating spatial data without knowing and understanding its quality? INSPIRE references data quality numerous times, but it is not clear who has ownership of data quality in Europe at the regional, national or European level. It is also not clear what tools exist in the marketplace to discover information on quality.
As mentioned, one of the key aims of setting up the OGC Data Quality Working Group was to address this topic specifically, i.e. data quality across the geospatial supply chain. A key aim is to find better ways to measure, communicate and report on spatial data quality by building on the work already done by ISO with the ISO 19100 series of standards. Working with organisations like ePSIplus, EuroSDR and EuroGeographics, 1Spatial is trying to connect a number of stakeholders and raise awareness of issues around spatial data reuse and data quality to help prepare for INSPIRE.
If European geospatial professionals don’t consider data quality issues now, it will be too late once spatial data is accessible through INSPIRE and used in applications, only for the data to turn out not to be fit for purpose. At that stage they will have to tackle it retrospectively and incur the huge costs, headaches and business process breakdowns associated with poor-quality data. We should learn from painful lessons in other IT data management disciplines, such as CRM (customer details and addresses) and ERP (asset information), where there is now a multi-billion-dollar industry addressing data quality problems, and avoid the geospatial industry having to spend similar amounts around INSPIRE. Expertise, methodologies and tools exist now to avoid this situation with geospatial data and to help prepare for INSPIRE.
Question: The games industry is another industry heavily based on spatial geometry, for the design of new games as well as virtual environments, and much of this work is leaning towards 3D. Can you explain how spatial data applies to games in general, and also the impact of the trend towards 3D applications?
Answer: We can learn from the gaming industry, in particular from a visualisation perspective and how individuals may interact with spatial data in the future. By combining 3D models with 2D geospatial data and providing access through gaming style platforms and interfaces, it will be possible to create authentic virtual versions of our world, rather than fictional ones, that we may explore remotely. The application possibilities for planning, emergency services and, of course, the military are also now becoming apparent. For other 3D applications it will eventually become important for analysis and decision making to have greater intelligence in the data, such as topology. Without this type of connectivity, extensive manual efforts will be necessary to try and obtain more information relating to the nature of the data and make joined-up decisions.
Question: What are some of the environmental and sustainability applications that involve your products and how have they fared?
Answer: Our products are all involved in spatial data integration or data management exercises on a worldwide basis. Some examples that will have an impact on environmental and sustainability applications are:
The Environment Agency used 1Spatial’s data re-engineering expertise to speed up spatial queries and image rendering by simplifying the data within its National Flood and Coastal Defence Database (NFCDD) system. An initial prototype exercise showed that there were significant benefits to be gained from generalising the LIDAR data using persistent topology. Rendering an image yielded a 115% increase in speed, whilst spatial queries using address points showed a 229% increase. There was also an accompanying increase in the accuracy of the query results returned.
Rural Payments under the Common Agricultural Policy have data quality challenges associated with aligning raster plots with the national topographic vector datasets. INGA, the Portuguese Ministry of Agriculture, is implementing a solution to ensure more efficient administration of the system which co-ordinates and pays EU subsidies to over 400,000 Portuguese farmers per year. This work has been done in collaboration with Intergraph, a member of the 1Spatial Community.
CEH (Centre for Ecology and Hydrology, part of the Natural Environment Research Council) initially undertook a joint feasibility study with 1Spatial and Ordnance Survey Great Britain to generalise OS MasterMap® data as a potential foundation for the Land Cover Map (LCM) 2007. This is now at production rollout stage and will form the basis of the UK’s submission for the Coordination of information on the environment (CORINE) programme.
MidCoast Water, Australia purchased Radius Topology from 1Spatial to improve asset data quality and made substantial savings, as previously mentioned in question 2.
Question: Today, more people are interested in extracting ‘value’ from all the costs involved in their data holdings. Could you elaborate on how value could be realised more fully and provide a few examples as well?
Answer: There has been a huge amount of investment made in the collection of both 2D and 3D data over the years and this process shows no signs of slowing down. In many cases it could be argued that these data represent an organisation’s biggest and most valuable asset. Software, databases and even staff will come and go, technologies will change, but the data remains. However, very often the knowledge will disappear or be isolated and the data itself will drift and slowly deteriorate.
In order to extract the maximum use from these data (maximum value from the asset) two things are needed. Firstly, we need to store all the intelligence, knowledge and wisdom about these data alongside the relevant data and not have them locked in the logic of some software application. Secondly, we must be able to communicate how these data may be used. Metadata takes us a long way towards the first of these aims, but the ability to define the quality in terms of fitness for purpose will enable us to solve the second issue; this will allow us to fully recognise or realise the value.
As an example, Pira International Ltd noted in a report in 2000 that the value of the spatial component of data held within the public sector across Europe was over €36bn, and growing rapidly. At that time, the cost of re-collecting these datasets was estimated to be in excess of €100bn – one can only imagine the total investment today. The equivalent estimate for North America at that time was $200bn. This was almost 10 years ago, and there has been a data deluge since then. By assessing the fitness for purpose of their spatial data, organisations can take large strides towards realising its value.
Question: “We have been doing it this way for a long time and it is the procedure we use to collect our information and make our decisions.” How often do you hear this? What pieces of wisdom would you add to this line of thinking?
Answer: Unfortunately it is human nature to respond in this way, either because change threatens the status quo in terms of business processes and could be seen to entail additional work, or because it can be perceived as threatening job security or highlighting problems that may have been well hidden until procedures were reviewed and adapted. So the answer is that we hear it very often.
It’s a very open statement, but change is the only constant in business life, so we would encourage all geospatial professionals to consider this fact when thinking about their spatial data management. It’s only by challenging the status quo and reviewing existing processes and procedures that you can improve them and the overall effectiveness of your operation.
We believe that quality underpins analysis, planning and decision making based on spatial data, so we would encourage you to step back from the way you have been doing it and think about how you could be doing it better, starting with an assessment of your data’s fitness for use.
Question: What is unique about your upcoming 1Spatial Conference in the UK and what might I learn by attending? What are some of the topics that will be discussed and how can I find out more about it?
Answer: The 1Spatial Conference brings together a wide range of organisations to consider industry-relevant topics, as opposed to simply being a user group or an event limited to one specific area of the market. The experiences shared and the problems addressed by the Conference are global in nature and cross into all sectors of the geospatial community, hence the marketing message that it will highlight the topics of Spatial Data Quality, Spatial Data Infrastructures and CAD to GIS convergence.
The Conference aims to provide practical advice and share experiences, for example in trying to achieve INSPIRE goals. There will be a free INSPIRE seminar and metadata workshop, and speakers from the European Commission Joint Research Centre and EuroGeographics, as well as the Chief Executive of Ordnance Survey of Northern Ireland, will provide insights on this topic.
Two useful opportunities in this area concern INSPIRE and DNF activities. Delegates can learn about tools and processes that can be used to discover metadata, which can subsequently be published in catalogues for other users to assess a dataset’s fitness for use. Published metadata consists of data quality items (qualitative and quantitative metrics) and is encoded in a standard form (ISO 19139 Metadata – XML schema implementation). The final results of conformance tests run in Radius Studio are obtained in the form of metadata that is compliant with the conceptual model of ISO 19115 Metadata and encoded in the form recommended in ISO 19139. There will also be a DNF demonstration highlighting the benefits of unique IDs and how users can profit from data providers supplying stable identifiers with lifecycles. It will also show how the spatial data supply chain can use the lifecycle information once it has been generated.
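For readers unfamiliar with what “encoded in the form recommended in ISO 19139” looks like in practice, the snippet below assembles a heavily simplified quality-report fragment in the gmd namespace. The element nesting is abbreviated for illustration and is not guaranteed to be schema-valid, and the 99.2 figure is an invented example value, not output from Radius Studio.

```python
# A simplified sketch of an ISO 19115/19139-style quantitative quality result.
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)

dq = ET.Element(f"{{{GMD}}}DQ_DataQuality")
report = ET.SubElement(dq, f"{{{GMD}}}report")
measure = ET.SubElement(report, f"{{{GMD}}}DQ_CompletenessCommission")
result = ET.SubElement(measure, f"{{{GMD}}}result")
quantitative = ET.SubElement(result, f"{{{GMD}}}DQ_QuantitativeResult")
value = ET.SubElement(quantitative, f"{{{GMD}}}value")
record = ET.SubElement(value, f"{{{GCO}}}Record")
record.text = "99.2"   # e.g. percentage of features passing a conformance rule

print(ET.tostring(dq, encoding="unicode"))
```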
The speaker line-up is exceptional and provides a comprehensive insight into a range of geospatial areas, including CAD/GIS convergence, database technologies, industry standards and best practice, Open Source and SDIs.
You can find out more, and register for the Conference, by visiting the conference web pages at www.1spatial.com/conference