Methods for Visually Communicating Data and Meta-Data
The communication of uncertainty has been studied by Morgan and Henrion (1990) with Harold Ibrekk, although their concern was the communication of the uncertainty of linear data. Because of this, the methods they present cannot be readily used in the communication of mapped geographic data, except for point symbolization (and even this is difficult in ARC/INFO). Their study nevertheless highlights some of the difficulties that can be encountered in the visual communication of uncertainty, and their conclusions are useful.
Their study analyzed nine methods for displaying uncertainty in linear data: a point estimate with an error bar; a discrete density function; a pie chart; a probability density function; a half-height probability density function mirrored on the x-axis; a dot density horizontal bar; a vertical line density horizontal bar; a modified Tukey box plot (minimum and maximum points are not included, and a mean point is included); and a cumulative density function (see Figure 1.9). Their recommendations include using displays that specifically show the information that is to be extracted (such as a point for a mean value, if mean values are important), and using multiple displays (in particular, displaying the cumulative density function and the probability density function one over the other, with a common horizontal axis).
Figure 1.9 The methods of graphically communicating uncertainty tested by Morgan, Henrion and Ibrekk (redrawn from Morgan and Henrion 1990, 228-9; not to scale). For single variable graphs Morgan and Henrion suggest using a combined probability density/cumulative density display, and labeling important points in the data (for example, the mean value).
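The paired display and the modified box plot described above can be sketched in Python. This is only an illustration under stated assumptions, not Morgan and Henrion's procedure: the normal distribution, the sample values, and the function names pdf_cdf_table and modified_box_summary are all hypothetical and introduced here.

```python
from statistics import NormalDist, mean, quantiles


def pdf_cdf_table(dist, n_points=9, span=3.0):
    """Return (x, pdf, cdf) rows evaluated on one shared horizontal axis."""
    lo = dist.mean - span * dist.stdev
    step = 2 * span * dist.stdev / (n_points - 1)
    xs = [lo + i * step for i in range(n_points)]
    return [(x, dist.pdf(x), dist.cdf(x)) for x in xs]


def modified_box_summary(samples):
    """Quartiles plus the mean; minimum and maximum are deliberately omitted."""
    q1, median, q3 = quantiles(samples, n=4)
    return {"q1": q1, "median": median, "q3": q3, "mean": mean(samples)}


# Hypothetical uncertain quantity: estimate 10 with standard deviation 2.
estimate = NormalDist(mu=10.0, sigma=2.0)
rows = pdf_cdf_table(estimate)
for x, density, cumulative in rows:
    print(f"{x:6.2f}  pdf={density:.4f}  cdf={cumulative:.4f}")
```

Because both curves are evaluated at the same x values, they can be plotted one over the other with a common horizontal axis, and the mean can be labeled on both.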
For two- and multidimensional uncertainty in non-spatial displays, Morgan and Henrion give several examples of methods that can be used for graphing (see Figure 1.10). These include: multiple lines in a probability density/cumulative density pair; multiple Tukey box plots; linear graphs with error bars; orthogonal displays; and triangle plots alone and in multiples.
Figure 1.10 Two- and multidimensional representations of uncertainty (redrawn from Morgan and Henrion 1990, 240-51; not to scale). Values in triangle plots are read by going from each of the bisecting lines out perpendicularly to the point of intersection; the position on the bisecting line is the value of the point for that measure, and the sum of the measures from the three bisecting lines will equal 100%. The 50% and 90% probability areas for the most likely location for a measured point indicate peaks in the trivariate probability density function.
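The reading rule for triangle plots can be illustrated with a short coordinate conversion. The following is a sketch only: the vertex layout (A at the top, B at bottom left, C at bottom right, unit side length) and the function name ternary_to_xy are assumptions made here, not details taken from Morgan and Henrion.

```python
import math


def ternary_to_xy(a, b, c):
    """Place three shares that sum to 100% inside an equilateral triangle.

    Assumed layout: A is the top vertex, B the bottom-left vertex and C the
    bottom-right vertex of a triangle with side length 1.
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("shares must have a positive sum")
    a, b, c = a / total, b / total, c / total
    # Barycentric combination of B=(0, 0), C=(1, 0) and A=(0.5, sqrt(3)/2).
    x = c + a / 2
    y = a * math.sqrt(3) / 2
    return x, y


# A hypothetical point with shares of 50%, 30% and 20%.
print(ternary_to_xy(50, 30, 20))
```

The height of the plotted point above the side opposite A, taken as a fraction of the triangle's altitude, equals the share of A; the same holds for B and C, which is why the three readings always sum to 100%.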
They conclude with a list of factors that should go into the design of uncertainty displays (Morgan and Henrion 1990, 252):
- finding a clear, uncluttered graphic style and an easily understood format,
- making decisions about what information to display,
- making decisions about what information to treat in a deterministic form and what to treat in a probabilistic form,
- making decisions about what kind of parametric sensitivities will provide key insights.
They also suggest that display design often involves the reduction of a multidimensional model into the two dimensions of a paper or monitor display (as does Tufte 1991), and that the intended audience's experience in interpreting graphs must be considered when creating a display.
For the representation of uncertainty for spatial variables, there are two possible cartographic routes. The first of these is the creation of two maps, one for displaying the data, and one for displaying the meta-data. Olson (1981) and Laurence Carstensen (1986) have tested this choropleth map arrangement against two bivariate mapping techniques (Olson: spectral encoding; Carstensen: intersecting lines) in the representation of statistical correlation.
Olson finds that map readers can initially interpret value shaded, monovariate map pairs more readily than bivariate maps. On the other hand, she reports that over half of those who could interpret bivariate maps at a significantly better than guessing level did better with a bivariate map than with two separate maps. She therefore suggests that bivariate maps may be more readily interpreted once the "cognitive hurdle" (Olson 1981, 269) of the bivariate mapping technique is overcome.
Carstensen also finds that map readers find interpreting value shaded, monovariate map pairs easier than bivariate maps. Because his test compared a classed map pair and unclassed bivariate maps, the problem he notes with the use of map pairs (poorer statistical residual scores) would not be as likely to occur if map pairs are compared with classed bivariate maps.
The map pair technique should be a useful tool for the communication of two variables when the data must be classed (as is done for pragmatic reasons in ARC/INFO). Since meta-data values may not be correlated with data values, two monovariate maps should communicate uncertainty more effectively than techniques that were designed to highlight spatial correlation.
The second cartographic route for the visual representation of uncertainty is bivariate mapping (the mapping of two variables onto the same map). This would ensure that meta-data is presented to the map reader with the data, but this mapping method generally increases the difficulty with which data can be extracted from a map. Because of the variety of visual variables, there are several possible methods of bivariate mapping, some of which have been tested for communication effectiveness; most of these have been oriented toward the representation of spatial correlation. The tested methods include the intersecting lines method (which uses texture), color maps made by the U. S. Census Bureau (which use a full hue range to achieve bivariate representations, known as spectral encoding), color maps that rely on two complementary hues, and the equiprobability ellipse (which also uses complementary hues).
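The difference between the two cartographic routes can be sketched in terms of classing. In the illustration below, equal-interval classing, the three-class legend, and the names equal_interval_class and bivariate_class are assumptions chosen for brevity; they are not drawn from ARC/INFO or from the studies cited here.

```python
def equal_interval_class(value, lo, hi, n_classes):
    """Return a class index from 0 to n_classes - 1 for a value in [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    t = (value - lo) / (hi - lo)
    return min(n_classes - 1, max(0, int(t * n_classes)))


def bivariate_class(data_class, meta_class, n_classes):
    """Combine two class indices into one cell of an n x n bivariate legend."""
    return meta_class * n_classes + data_class


# Hypothetical polygon: data value 62 on a 0-100 scale, uncertainty 0.15 on 0-1.
data_cls = equal_interval_class(62, 0, 100, 3)
meta_cls = equal_interval_class(0.15, 0.0, 1.0, 3)
print("map pair symbols:", data_cls, meta_cls)
print("bivariate legend cell:", bivariate_class(data_cls, meta_cls, 3))
```

A map pair symbolizes the two class indices on separate maps, each with its own legend of three classes. A bivariate map must symbolize all nine combined cells on one map, which is one source of the added interpretive difficulty.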
In a technique that allows for bivariate mapping in monochrome displays, Carstensen (1982) has tested the communication effectiveness of the intersecting lines method of bivariate mapping (see Figure 5.4b). These maps use horizontal and vertical lines for representing two variables. Carstensen suggests that this scheme be used in an unclassed map, but this scheme can be used for classed representations of data and meta-data. Although this technique can be used in monochrome displays, colored lines can be used to distinguish the two variables (this has not been tested for communication effectiveness, though).
This technique has one disadvantage: the method of producing the area symbolization may lead to a conflict between two visual variables. Both texture and value change with the mixing of the lines; both establish visual hierarchies with coarse textures and dark values standing out in the display. This is a problem because in the representation scheme these two points (coarseness and darkness) are at opposite ends of the data ranges. This can cause value to be used for identifying relationships even though squareness in the texture is the intended visual variable (Carstensen 1986).
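The conflict between texture and value can be made concrete with a small calculation. The sketch below assumes that line spacing shrinks linearly as a variable increases and that lines have a fixed width; these rules, the spacing limits, and the function names are hypothetical and are not Carstensen's published specification.

```python
def line_spacing(t, s_min=1.0, s_max=8.0):
    """Spacing between parallel lines for a value t scaled to 0..1.

    Assumption: higher values draw their lines closer together.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in 0..1")
    return s_max - t * (s_max - s_min)


def crossed_line_symbol(tx, ty, line_width=0.5):
    """Describe the area symbol produced by crossing two sets of lines."""
    sx = line_spacing(tx)  # spacing of the vertical lines (first variable)
    sy = line_spacing(ty)  # spacing of the horizontal lines (second variable)
    # Fraction of the area covered by ink, counting line crossings only once.
    ink = 1 - (1 - line_width / sx) * (1 - line_width / sy)
    return {
        "coarseness": (sx + sy) / 2,              # larger is coarser texture
        "darkness": ink,                          # larger is darker value
        "squareness": min(sx, sy) / max(sx, sy),  # 1.0 means square cells
    }


low = crossed_line_symbol(0.0, 0.0)
high = crossed_line_symbol(1.0, 1.0)
print(low)
print(high)
```

Under these assumptions both symbols have a squareness of 1.0, so they carry the same message about the relationship between the variables, yet the low-value symbol is the coarsest and the high-value symbol is the darkest. Each end of the data range therefore stands out for a different reason.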
The U. S. Census products, originally published for 1970 census data, that show two variables were studied for communication effectiveness by Olson (1981) (see Figures 5.5c and 5.5d). These maps use two hue patterns for representing two variables, with the part-spectral ranges of yellow to blue and yellow to red being used on the x and y axes, respectively. Olson reports that some of these maps (such as education and income) convey information well, especially for homogeneous regions. She also indicates that these maps are thought to be more authoritative and more innovative than two separate maps showing the same information. In concluding, she suggests that prominent and clear legends are necessary for accurate interpretation of bivariate maps; that both the monovariate map pair and the bivariate map be shown, with the monovariate map pair in a monochrome format; and that explanatory notes be included to describe the types of information presented. These guidelines are in keeping with both Morgan and Henrion, and Tufte, and are feasible with ARC/INFO's ability to rapidly generate small, monovariate maps.
As a response to the problems of interpreting spectrally encoded bivariate maps, Steiner (1979, in Eyton 1984) has proposed a complementary color scheme (see Figure 5.6). This color system makes use of the mixing of complementary colors (such as red and cyan) to produce a central gray region that highlights the diagonal that represents correlation in a bivariate map. This can be done for unclassed or classed representations of data, and by flipping the order of one tint, negative correlations can be shown as well as positive correlations.
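The way complementary tints produce a gray diagonal can be sketched with a simple color model. The example below assumes that the first variable controls a cyan tint, the second a red tint, and that the two tints combine multiplicatively as in overprinting; this model and the function name complementary_mix are illustrative assumptions, not Steiner's or Eyton's specification.

```python
def complementary_mix(x, y, flip_y=False):
    """Return an (r, g, b) color for two variables scaled to 0..1.

    Assumed model: x sets the strength of a cyan tint and y the strength of
    a red tint, and the two tints are overprinted (multiplied).
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must lie in 0..1")
    if flip_y:
        y = 1.0 - y  # reverse one tint to highlight negative correlation
    # Cyan absorbs red light; red absorbs green and blue light.
    r = round(255 * (1.0 - x))
    g = round(255 * (1.0 - y))
    b = round(255 * (1.0 - y))
    return r, g, b


for x, y in [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]:
    print((x, y), complementary_mix(x, y))
```

Wherever the two scaled values are equal, the three channels are equal and the result is a gray running from white to black along the diagonal. The off-diagonal corners are pure cyan and pure red. With flip_y set, the gray band falls on the opposite diagonal instead.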
Building on Steiner's proposal, J. Ronald Eyton (1984) developed the complementary color bivariate map with an equiprobability ellipse (see PEE.aml, LEE.aml and CEE.aml). These maps use a modified, 2x2 complementary color range, with an additional class that occurs in the middle of the matrix and represents the central cluster of data. When the two variables to be mapped are plotted on a scatter diagram, linearly correlated data will have an ellipse-shaped central cluster. By selecting a percentage of the total number of observations to be included in the central ellipse, a category for the central cluster can be created with the formula (Eyton 1984, 488):
[Equation not recoverable from the source text; see Eyton 1984, 488.]
When this ellipse is displayed in gray, surrounded by the four corners of the data set in white, black, red and cyan, a bivariate map that portrays the central data cluster explicitly (without having a staircase effect) can be created.
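The central-cluster category can be sketched with the standard equiprobability ellipse of a bivariate normal distribution. This form may differ in notation or detail from Eyton's published formula, and the function name ellipse_classes, the class labels, and the sample data are hypothetical. For standardized values zx and zy with correlation r, an observation falls inside the ellipse containing a fraction P of the distribution when (zx^2 - 2 r zx zy + zy^2) / (1 - r^2) <= -2 ln(1 - P).

```python
import math
from statistics import mean, pstdev


def ellipse_classes(xs, ys, fraction=0.5):
    """Label each observation 'center' or by its corner of the scatter diagram.

    Assumes the two variables are roughly bivariate normal, so the ellipse
    threshold is the chi-square quantile with two degrees of freedom.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length lists of at least 3 values")
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        raise ValueError("both variables must vary")
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    r = sum(a * b for a, b in zip(zx, zy)) / len(xs)  # Pearson correlation
    if abs(r) >= 1.0:
        raise ValueError("perfectly correlated data have no ellipse")
    threshold = -2.0 * math.log(1.0 - fraction)
    labels = []
    for a, b in zip(zx, zy):
        d2 = (a * a - 2.0 * r * a * b + b * b) / (1.0 - r * r)
        if d2 <= threshold:
            labels.append("center")  # the gray central class
        else:
            labels.append(("high" if a >= 0 else "low") + "-"
                          + ("high" if b >= 0 else "low"))
    return labels


# Hypothetical observations for two mapped variables.
sample_x = [1, 2, 3, 4, 5, 6]
sample_y = [1, 3, 2, 5, 4, 6]
print(ellipse_classes(sample_x, sample_y, fraction=0.5))
```

Because membership in the central class is decided by the ellipse rather than by rectangular class breaks, the central cluster does not take on a stepped outline in the legend.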
There are other possible methods of bivariate mapping that have been postulated as effective communicators of (un)certainty. Two of these, which have not been tested, are color intensity and focus. Neither of these methods places emphasis on a correlation of the data and meta-data variables; rather, each uses visual variables to highlight portions of the data set, as determined by the meta-data information.
Use of color intensity may prove useful for the cartographic representation of uncertainty, because it allows the highlighting of certain (or uncertain) areas by specifying intense shades for those areas, and less intense shades for others (see Figure 5.5b). Because intense colors stand out in an image, this technique would allow the creation of a distinct visual hierarchy that emphasizes certain (or uncertain) values. Generally, for maps that will be used for data communication (as in the DiBiase model), intense colors should represent certain areas and less intense (that is, more gray) colors should represent uncertain areas.
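One way to realize this scheme is to let the data set the hue and the meta-data set the saturation. The sketch below uses Python's colorsys module; the choice of the HLS color model, the fixed lightness of 0.5, and the function name certainty_color are assumptions made for illustration.

```python
import colorsys


def certainty_color(hue_degrees, certainty, lightness=0.5):
    """Return an (r, g, b) color whose intensity reflects certainty (0..1).

    The data class chooses the hue; low certainty pulls the color toward
    gray by lowering its saturation.
    """
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must lie in 0..1")
    r, g, b = colorsys.hls_to_rgb(hue_degrees / 360.0, lightness, certainty)
    return tuple(round(255 * channel) for channel in (r, g, b))


# Hypothetical data class drawn in red at three levels of certainty.
for level in (1.0, 0.5, 0.0):
    print(level, certainty_color(0, level))
```

Fully certain areas receive the pure hue, while fully uncertain areas of every data class collapse to the same mid-gray, so the most reliable values dominate the visual hierarchy.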
Focus is potentially a useful means of representing uncertainty. MacEachren (1992) presents several possible variations on focus: edge crispness (for external boundaries of points, lines and areas), fill clarity (for internal boundaries within point, line or area symbols), fog (by imposition of an interposing, translucent layer over another symbol), and resolution (point thinning for vector databases or aggregation into larger area units for raster databases). All of these involve the blending of a symbol with the surrounding parts of the image, thereby eliminating clearly defined regions. In ARC/INFO edge crispness can be accomplished by buffering regions and careful assignment of color in the buffer areas; resolution can be accomplished by thinning points manually or with the ARCEDIT command GENERALIZE.
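The resolution variation for raster data can be sketched as block aggregation. The example below is a generic illustration and does not reproduce any ARC/INFO command; the use of the block mean and the function name aggregate_raster are assumptions, and categorical data would need a different rule such as the majority value.

```python
def aggregate_raster(grid, factor):
    """Average square blocks of cells to produce a coarser raster.

    grid is a list of equal-length rows; both dimensions must be divisible
    by factor.
    """
    rows, cols = len(grid), len(grid[0])
    if factor < 1 or rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Hypothetical 4 x 4 raster reduced to 2 x 2 for an uncertain region.
fine = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(aggregate_raster(fine, 2))
```

A larger factor could be applied where the meta-data indicate greater uncertainty, so that less reliable areas are drawn with visibly coarser cells.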
Another method of bivariate mapping is the use of a fishnet (orthogonal) view to display one data set, with another data set used to color the display (see Figure 9.2). This is commonly used as a technique for displaying terrain elevation (with land use/land cover draped over the net by specifying the net's color with the land use symbolization scheme), although there is no restriction on its use for other types of data. When a data layer represents information that is known to be continuous and smoothly changing (such as elevation, air temperature, or air pressure), this type of representation is appropriate. If the uncertainty of a spatial data layer can be shown or assumed to be continuous and smoothly changing, a fishnet representation of that statistical surface (with data values used to specify the color of the net) should be an effective method of conveying meta-data.
Finally, another representation of continuous and smoothly changing data that could be used to display data and meta-data is the use of isolines (see Figure 9.1a). Isolines can represent the data, the meta-data, or both; if only one of the two is shown with isolines, the other must be shown with a different display technique. Like fishnets, isolines are often used for displaying terrain information (as in topographic maps), but they are not restricted to terrain. When both data and meta-data are drawn as isolines, different hues, values, intensities, sizes, or textures can indicate which lines represent data and which represent meta-data.