Which data should we use? Is it accurate enough to be used with our model? Will it address the issues and provide us with new insight we can depend upon? How much error is too much? Will the spatial data describe the model we are using without excessive error? Many geographic information system (GIS) users wish to extend their geographic data through spatial statistical analysis. Geostatistical tools provide deeper insight into the quality and value of data, as expressed through modeling. Author Konstantin Krivoruchko writes about geostatistics from many perspectives in this book on DVD. He also discusses the relationship of modeling to data, and the benefits, disadvantages and value of applying certain techniques and methods.
Spatial Statistical Data Analysis for GIS Users
ISBN: 9781589481619 2011 894 pages
Review by Jeff Thurston
Konstantin Krivoruchko is a senior research associate on the Esri software development team who played a central role in developing ArcGIS Geostatistical Analyst. That product is connected to the ArcGIS system and is available for use through the Toolbox of the software. Those interested in spatial statistical analysis using GIS often seek more detailed insight into how it relates to modeling efforts.
The author provides a wealth of detail about these kinds of relationships, including both data and working examples on this DVD-book. It is divided into three parts. Part 1 discusses the processes of statistical analysis, including probabilistic modeling, assessing data error and modeling uncertainty. The principles of modeling diagnostics are also included. Part 2 includes discussions on modeling assumptions and requirements for statistical modeling at various scales, along with advantages and disadvantages. Part 3 shows how to use statistical software for statistical modeling.
To begin, first click on the Esri.exe file once the DVD is loaded. Some presentation screens appear with the usual legal info and so on, then a notice that Adobe Reader is needed to view the book. Either the Book or Assignments can be opened from the start – I chose the book. Since I do most of my work on a laptop, I chose to zoom in to 150% on the PDF file.
Right off the bat I knew this was going to be interesting from the wide range of examples included from around the world for the discussions. For example, temperatures between North America and Europe, thyroid cancers from Belarus, soil temperature and moisture, weed counts from a Nebraska corn field, particles and ozone analysis in California, fertilizer applications in farm fields in Illinois, fishery data from the British Isles, child mortality rates in Africa, German forestry, level of happiness in Europe and stolen cars in Redlands, California (home of Esri).
As indicated, the main themes of the book are statistics, uncertainty, spatial dependence, data distribution and probabilistic modeling. “To understand and interpret spatial information is called spatial analysis,” Krivoruchko says. In practice, geostatistics is used for decision making, but the author suggests that another role involves understanding uncertainty. How many times have you asked yourself when collecting GIS data using a GPS navigator, total station, photogrammetry or other technique, “that seems accurate and what we are seeing, but how do we know with less uncertainty?” Many of us have collected information through a variety of technologies, then sat down with a GIS attempting to integrate it and wondered whether what we think and observe is aligned with reality. Insurance companies, for example, are notorious for wanting to know about uncertainty. It forms the basis for understanding their risk and exposure when underwriting policies.
Models are representations of reality. They can be continuous, discrete or regional. This book describes these various types and uses maps to show how these differences are represented. A map of lightning strikes shows their level of polarity and uncertainty, serving to provide an indication of potential risk for wildfire occurrence. Graphs of statistical information show that amounts over time vary more widely around Redlands as compared to other cities in California (I would guess this suggests morning and night traffic patterns without much industrial contribution by comparison). Statistics and equations are included that outline uncertainty based upon kriging techniques, for example.
The case of cesium measurement in soil around Belarus provides a clue as to the strength, value and significance that detailed statistical analysis can provide. Although the actual sampled soils seemed to indicate trends, prediction errors and root-mean-square errors would suggest alternative possibilities. Another example shows the male mortality rate from lung cancers based on U.S. National Map data and U.S. Geological Survey data. The rate is much higher in the southeast part of the country, and the data even shows which counties.
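Prediction error of the kind mentioned here is commonly summarised as a root-mean-square error from leave-one-out cross-validation. The following is only a minimal sketch of that idea, not the book's method: it uses inverse distance weighting as a simple stand-in for kriging, and the sample points, function names and `power` parameter are all invented for illustration.

```python
import math

def idw_predict(x, y, samples, power=2.0):
    """Inverse-distance-weighted prediction at (x, y).

    `samples` is a list of (x, y, value) tuples. IDW is an assumed
    stand-in predictor here; it is not kriging.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # exact hit on a sample location
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

def loo_rmse(samples, power=2.0):
    """Leave each sample out in turn, predict it from the rest, return the RMSE."""
    squared_errors = []
    for i, (x, y, value) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        predicted = idw_predict(x, y, others, power)
        squared_errors.append((predicted - value) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical measurements: (x, y, value)
points = [(0, 0, 1.0), (1, 0, 2.0), (0, 1, 2.0), (1, 1, 3.0), (0.5, 0.5, 2.0)]
rmse = loo_rmse(points)
```

A smaller `rmse` suggests the predictor reproduces held-out measurements more closely. It does not by itself say whether an apparent trend in the samples is real.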
The power of geostatistical tools becomes further evident as analysis is coupled to simulations. Based on 100,000 simulations, the temperature for western Europe can be shown with respect to distribution. Gamma distributions are used for modeling soil, water, air, house prices and so on. They apply to positive values and, in many cases, follow smooth distribution curves. The distributions are defined by shape and scale parameters. An example from Sweden is given for rainfall distribution and readers may also find it in earlier ArcNews archives. In this example, the user was looking for the probability of radioactive rainfall occurring.
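To make the shape and scale parameters a little more concrete, here is a small sketch using Python's standard library. It is not taken from the book. The parameter values (shape 2.0, scale 3.0), the sample size and the helper names are arbitrary assumptions chosen only to show that samples are positive and that the mean and variance follow from the two parameters.

```python
import random
import statistics

def sample_gamma(shape, scale, n, seed=0):
    """Draw n gamma-distributed values; random.gammavariate takes (shape, scale)."""
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale) for _ in range(n)]

def gamma_mean(shape, scale):
    """Theoretical mean of a gamma distribution."""
    return shape * scale

def gamma_variance(shape, scale):
    """Theoretical variance of a gamma distribution."""
    return shape * scale ** 2

# Illustrative values only
samples = sample_gamma(shape=2.0, scale=3.0, n=20000)
sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)
```

With these assumed parameters the theoretical mean is 6 and the variance is 18, and the sample statistics should land close to those values.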
Do you want to know where the most milk cows of the total population of cows are located in the U.S.? This book explains where they are and how the map was created. Using proportional statistics, a fitted beta distribution shows that quite a few of them are in Illinois west of Lake Michigan and relate closely to the value of the crops and those that are family owned. In other words – high value crops lead to more milk cows – provided families are milking them. Taking this one step further, it would be interesting to see milk prices as a distribution based upon travel time to markets for various States.
Reading through these examples begins to provide the insight that many of us often do not pay attention to – but should. It is like we collect and manage vast quantities of spatial data, but do not spend enough time analysing it to realise first – the message, and second – the value. Without applying geostatistical analysis to spatial data, it is similar to performing earth observation without the benefits of image processing.
Krivoruchko considers whether a bird observation is valuable based upon the expert knowledge of the person making the observation. A map is drawn and viewers can see who is right on with their observation as compared to those who need to learn a bit more. Variability for grain crops was assessed and, using geostatistical techniques, the level of uncertainty for pH, cation exchange capacity and nutrient availability was determined. In principle, the author states, many agricultural variables can be analysed through simulations, and the number of simulations that must be included rises until the distributions level out. Although this sounds simple, it involves considerable analysis that is handled readily through geostatistical analysis.
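The idea of adding simulations until the distributions level out can be sketched as a simple stopping rule. This is an illustrative assumption rather than the author's procedure: `simulate_once` is a hypothetical stand-in that draws one value from a normal distribution (imagine soil pH at one location), and the rule doubles the number of runs until the mean and standard deviation change by less than a chosen tolerance.

```python
import random
import statistics

def simulate_once(rng):
    """Hypothetical stand-in for one simulation run (e.g. soil pH at one site)."""
    return rng.gauss(6.5, 0.4)

def run_until_stable(tolerance=0.01, start=100, max_runs=200_000, seed=1):
    """Double the number of runs until mean and std dev change by < tolerance."""
    rng = random.Random(seed)
    values = [simulate_once(rng) for _ in range(start)]
    previous = (statistics.fmean(values), statistics.stdev(values))
    while len(values) < max_runs:
        extra = len(values)  # doubling step
        values.extend(simulate_once(rng) for _ in range(extra))
        current = (statistics.fmean(values), statistics.stdev(values))
        mean_shift = abs(current[0] - previous[0])
        sd_shift = abs(current[1] - previous[1])
        if mean_shift < tolerance and sd_shift < tolerance:
            return len(values), current
        previous = current
    # Budget exhausted: return the latest estimates anyway
    return len(values), previous

n_runs, (mean_estimate, sd_estimate) = run_until_stable()
```

A tighter `tolerance` generally requires more runs, which matches the point that the required number of simulations rises until the summary statistics stop moving.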
Coal seam analysis is investigated using kriging techniques, and models are described for ascertaining the average depths and presence of coal based upon existing data sets. Similarly, digital elevation models (DEM) are assessed for their accuracy and the analysis shows that the greatest variability in the data is where the largest slopes occur. The entire State of California is monitored for atmospheric air quality and statistical analysis provides predictive clues for various regions and what to expect in terms of changing atmospheric patterns.
Principles of modeling regional data begins to look at regional statistics. Spatial smoothing, cluster detection, spatial regression, autoregressive modeling and Markov fields are discussed. Cadmium was evaluated for Austrian mosses to ascertain the distribution of those mosses. There are a few regions in Austria where cadmium is indeed higher in some of the mosses. Although not widely spread, it does show the variability.
Spatial dependency is discussed in later chapters in greater detail. Maps help to provide a visual understanding of the relationships of spatial dependency for numerous variables across regions of various sizes. The incidence of auto thefts is greater than the incidence of robberies in most parts of Redlands, California, based upon hierarchical modeling of point data.
The second part of the DVD includes the individual assignments that relate to the conceptual text. Data is provided for loading into ArcGIS and users will find this helpful for really understanding how the software connects to the results.
In summary, this book is very well done. It includes a unique combination of statistical analysis and spatial data, representing results through maps. I would expect that many readers will come away from this text with a greater appreciation of the potential for spatial statistical analysis using GIS and would wonder if they are gaining the full value and knowledge of their own data. This book not only explains spatial statistical analysis, but presents cases for real places and circumstances from around the world. Since the data for the presented discussions is included, readers can readily load the data and follow along, learning how their own ArcGIS can be used in this way.
It would be interesting to see a series of books based on spatial statistical analysis for individual industry sectors like transport, agriculture, housing, environment and so on. I mention this because I can almost see those sectors reading this and wanting more.
Spatial Statistical Data Analysis for GIS Users is a book that many GIS users should read. It explains how the data many people collect has value – or not. The book really goes to the heart of squeezing value from often expensive and time-consuming data collection, and shows why geostatistical analysis should be considered. Based on this book and the techniques within, I would suggest that most large organisations could extract another 15% of value or more from their spatial data archives in new ideas, decision making value and understanding their own operations.