Birth Defects Research Part A: Clinical and Molecular Teratology

Volume 82, Issue 2 pp. 110-119

Original Article

Visualization and interpretation of birth defects data using linked micromap plots^†

Samson Y. Gebreab
Department of Watershed Sciences, Utah State University, 5210 Old Main Hill, Logan, Utah 84322-5210

Robert R. Gillies (Corresponding Author)
Department of Plants, Soils and Climate, Utah State University, 4280 Old Main Hill, Logan, Utah 84322-4280

Ronald G. Munger
Department of Nutrition and Food Sciences, Utah State University, 8700 Old Main Hill, Logan, Utah 84322-8700

Jürgen Symanzik
Department of Mathematics and Statistics, Utah State University, 3900 Old Main Hill, Logan, Utah 84322-3900

First published: 29 November 2007

https://doi.org/10.1002/bdra.20419

^† Presented at the Urban and Regional Information Systems Association's (URISA) GIS in Public Health Conference, May 20–23, 2007, New Orleans, Louisiana.

Abstract

BACKGROUND: Many states have implemented birth defects surveillance systems to monitor and disseminate information regarding birth defects. However, many of these states rely on tabular methods to disseminate statistical birth defects summaries. An innovative presentation technique for birth defect data that portrays the information in a joint geographical and statistical context is the linked micromap (LM) plot.

METHODS: LM plots were generated for oral cleft data at two geographical resolutions: US states and counties of Utah. The LM plots also included demographic and behavioral risk data.

RESULTS: A LM plot for the USA reveals spatial patterns indicating higher oral cleft occurrence in the southwest and the midwest and lower occurrence in the east. The LM plot also indicates relationships between oral cleft occurrence and maternal smoking rates and the proportion of American Indians and Alaskan Natives. In particular, the five states with the highest oral cleft occurrence had a higher proportion of American Indians and Alaskan Natives. Among the 15 states with the highest oral cleft occurrence, nine had a smoking rate of 16% or higher, while among the 15 states with the lowest oral cleft occurrence only one state had a smoking rate greater than 16%. The LM plot for the state of Utah shows no clear geographic pattern, due perhaps to a relatively small number of cases in a limited geographic area.

CONCLUSIONS: LM plots are effective in representing complex, large-volume birth defects data. Integrating them into birth defects surveillance systems will improve both presentation and interpretation.

Birth Defects Research (Part A), 2008. © 2007 Wiley-Liss, Inc.

INTRODUCTION

Birth defects are one of the leading causes of infant mortality and childhood morbidity in the US, where they account for 21% of all infant mortality (CDC, 1998). Most birth defects result in a range of disabilities for which the economic cost of medical treatment is substantial: according to Waitzman et al. (1994), the estimated lifetime cost of care for US children born with the 18 most common birth defects exceeds $8 billion per year. In addition to the economic effects, children born with such birth defects often experience long-lasting psychological and physical burdens. For many years there has been a continuous and concentrated effort to monitor birth defects through data collection and surveillance systems, with the ultimate goal of establishing prevention strategies. As a result, many states have developed monitoring systems that collect data on the occurrence of birth defects along with other crucial information, the underlying objectives being to catalogue and disseminate information regarding the prevalence of birth defects (Sever, 2004). These data are essential for monitoring the occurrence and trends of birth defects. Moreover, historical characteristics of the data are particularly useful in planning public health services, implementing prevention strategies such as allocating finite resources to the most affected areas, and improving health care access for affected children and families. Furthermore, these data are essential in a scientific sense, as they are often used to generate hypotheses about the risk factors that may be associated with birth defects.

As a component of public health surveillance, states collect data on 45 major birth defects and related information (National Birth Defects Prevention Network [NBDPN], 2005). In addition, these states and many US public health agencies (e.g., CDC and NBDPN) play an important role in making birth defects data accessible to the public through various media. However, they tend to depend on tables to disseminate birth defects information. For example, in its role to inform the public, the NBDPN published birth defects data for the period 1998–2002 (NBDPN, 2005). The published report contains estimates for each birth defect by state, by race/ethnicity, and, for some birth defects, by age of mother, all in tabular form with multiple rows and columns that run over many pages.

Publishing large statistical datasets in tabular form is an important way of managing data but is not particularly informative from an interpretative standpoint. It may be difficult and frustrating for a reader to observe trends, relationships, and anomalies that may be present in the data. A user is forced to scan through many pages of tables and try to build a mental picture that permits an integrated understanding of the numbers, for example, which state has the maximum number of cases in a particular year. It can be argued that tabular data are especially useful to researchers who want the raw data for their own work; however, researchers likewise need bulk tabular data converted into a visual framework, both to understand the structure of the data and to facilitate their analysis. Furthermore, there is value in reporting to the public in an informative way while also presenting data so that policy makers can make informed and timely decisions. These circumstances suggest that converting tabular data into a visual and ordered context can reveal patterns and relationships that would otherwise be elusive and, in the most practical sense, can be an efficient vehicle for disseminating information to the public and decision makers.

Visualization techniques offer a set of tools that can be used to simplify large and complex datasets into more comprehensible forms. They can transform large public health datasets such as birth defects data into a more meaningful representation of the underlying epidemiological information without overwhelming the reader. With visualization, trends, relationships, and anomalies that were not at first obvious in the tables can be revealed quickly. Moreover, visualization increases the effectiveness of communicating information to the public and enables users to evaluate the data critically, while likely reducing errors of interpretation and maintaining consistency.

Over the years, many visualization tools have been developed to convert tabular information into visual graphs or plots (e.g., Carr, 1994), but a fairly recent development in the field is the use of linked micromap (LM) plots (Carr and Pierson, 1996) as a way of displaying geographically indexed data. LM plots use multiple small maps (called micromaps) to visualize complex data structures in a geographical context. LM plots have already been used in many fields, including environmental science (Carr and Pierson, 1996; Carr et al., 1998; Symanzik et al., 1999), ecology (Carr et al., 2000a), epidemiology (Carr et al., 2000b; Symanzik et al., 2003), and federal statistical summaries (Hurst et al., 2003). However, LM plots have not been specifically applied to birth defects data. The purpose of this article is to highlight and examine the use of LM plots in presenting geographically indexed birth defects data. Specifically, it demonstrates the use of LM plots to graphically represent statistical summaries and their associated uncertainties for oral cleft occurrence (oral clefts are defined here as cleft lip and cleft palate birth defects, and occurrence refers to prevalence at birth). This is done at two geographical resolutions: (1) at the state level for the US, and (2) at the county level for the state of Utah. Furthermore, LM plots are used to graphically relate oral cleft occurrence estimates to associated demographic and behavioral data collected at the same geographical resolutions.

A final important point is that of ensuring confidentiality. All the data used in the construction of the LM plots were aggregated values and so an individual's information is kept strictly confidential. In fact, LM plots are not designed to show specific data at a particular location but more to group information into manageable units such as a statistical summary that by its very nature removes the individual from the picture.

MATERIALS AND METHODS

Data Sources, Breakdown, and Aggregation

Data on birth defects and other variables of interest (including demographics and behavioral risk factors) were obtained from various sources. National data for oral cleft occurrence and livebirths for the period 1998–2002 were obtained from the NBDPN (2005) as issued in Birth Defects Research (Part A). Thirty-five states participated in reporting up to 45 major birth defects and, of these, 31 states contributed oral cleft occurrence. The relevant data for oral cleft occurrence were compiled for each state. Next, the occurrence of oral clefts in each state was computed per 10,000 livebirths (NBDPN, 2005) for the same period.

Oral cleft occurrence for the state of Utah was obtained from a case-control study of oral cleft occurrence undertaken by the Center for Epidemiologic Studies, Utah State University, that covered the period of 1995–2004. The cases used in the study were originally obtained from the Utah Birth Defect Network, a statewide surveillance program that monitors and detects birth defects in Utah. All cases had street address information of the mother's residence at the time of birth. The street address information was transformed (geocoded) to map coordinates and then aggregated to the county level. The live births at the county level for the same period (1995–2004) were obtained from US census data (http://quickfacts.census.gov/qfd/index.html) after which the oral cleft occurrence in each county was computed per 10,000 livebirths for the period of 1995–2004.
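The occurrence calculation described above (cases per 10,000 livebirths, aggregated by state or county) can be sketched as follows. This is a minimal illustration only: the function name and the example counts are hypothetical and are not taken from the NBDPN or Utah data.

```python
def occurrence_per_10000(cases: int, live_births: int) -> float:
    """Occurrence (prevalence at birth) expressed per 10,000 livebirths."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return 10_000 * cases / live_births


# Hypothetical aggregated counts for two geographic units (not real data).
example_units = {
    "Unit A": {"cases": 177, "live_births": 100_000},
    "Unit B": {"cases": 12, "live_births": 9_500},
}

rates = {
    name: occurrence_per_10000(u["cases"], u["live_births"])
    for name, u in example_units.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 10,000 livebirths")
```

Because only aggregated counts enter the calculation, no individual-level information is carried into the plotted values.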

Details of the geocoding process are as follows. Case addresses were geocoded using the ArcView geocoding utility and Dynamap/2000 (Version 14.3). Street File Network information for the state of Utah was obtained from Geographic Data Technology, Inc. (GDT, 2004). Of the total cases, 96.6% were geocoded automatically or interactively with minor editing for spelling, street aliases, and acronyms. Certain addresses (0.5%) were unmatched and were geocoded manually with the assistance of internet mapping services such as Yahoo Maps, MapQuest, and Google Maps. A number of cases (2.7%) did not have a geocodable address and were instead geocoded to either the city or the zip code centroid. The geographic centroids were obtained from a 2004 Municipalities shapefile or a zip code shapefile available from the Utah Automated Geographic Reference Center (AGRC, 2006). The remaining cases (0.2%) were excluded from the analysis because no address could be resolved or the location lay outside the state of Utah.

Maternal smoking is a well-established risk factor for oral clefting (Khoury et al., 1989; Little et al., 2004). Therefore, state-level data on the percentage of maternal smoking during pregnancy were obtained in order to relate this risk factor to oral cleft occurrence. The data for 2002 were obtained from Mathews and Rivera (2004). They were collected from birth certificates and reported by 49 states (including the District of Columbia and New York City) to CDC's National Vital Statistics System, operated by the National Center for Health Statistics. Of the 31 states that reported oral clefts, only California had not collected maternal smoking data using the same birth certificate format as the other states. Rather than excluding California from the analysis, its smoking rate was coded as missing (data not available). As to the reliability of the maternal smoking data collection, Mathews and Rivera (2004) note:

Second, prenatal smoking is underreported on birth certificates. Underreporting might be related to the wording of the smoking question, the timing of the data collection (e.g., during prenatal care versus after the live birth), and the stigma associated with smoking during pregnancy, particularly in cases of poor birth outcome. However, despite underreporting, the trends and variations in smoking derived from birth certificate data have been confirmed with data from other sources (e.g., National Survey of Family Growth and Pregnancy Risk Assessment Monitoring System). (p. 913)

In addition, demographic factors, that is, race and ethnicity, are also understood to be risk factors for oral cleft occurrence. For example, the risk is particularly high in the American Indian and Alaskan Native (AIAN) population (Coddington and Hisnanick, 1996). Therefore, data on the proportion of AIAN in the population for the year 2000 were obtained from the U.S. Census Bureau (2000), accessible at http://www.census.gov/prod/2002pubs/c2kbr01-15.pdf.

Visualization Technique

The graphical visualization technique presented in this article is referred to as LM plots. LM plots provide an alternative to traditional choropleth maps (for a comparative discussion of their relative merits, see Symanzik and Carr, 2008) for displaying geographically indexed statistical summaries (e.g., oral cleft occurrence for each state or for counties within a state) in a corresponding spatial context (Carr and Pierson, 1996; Carr et al., 1998). LM plots combine an exploratory analysis capability with traditional statistical graphics while maintaining the geographical context.

LM plots can be displayed using the statistical software package S-plus or on the web. Before they can be programmed, LM plots require a generalized map to work from, that is, a smoothed or simplified boundary defining each geographical region. Boundaries (e.g., state or county) that exist as Geographic Information System (GIS) data layers often contain considerably more vertices than are required for micromap depiction on the display. It is therefore necessary to remove redundant vertices from each polygon, but only to the point of maintaining the essential shape and neighborhood relationships of the polygons that comprise the micromap. A generalized map for the US is available online at ftp://galaxy.gmu.edu/pub/dcarr/newsletter/micromap/. To produce a generalized map for the state of Utah, a boundary shapefile was obtained from the AGRC (2006), and the boundaries were generalized using ArcGIS, a desktop GIS package. The generalization routine applied is based on the Douglas-Peucker line simplification algorithm (Douglas and Peucker, 1973). Finally, LM plots for the US and the state of Utah were created from the generalized boundaries using the S-plus statistical software package. Sample S-plus code for creating LM plots is also obtainable from Dan Carr's ftp site at ftp://galaxy.gmu.edu/pub/dcarr/newsletter/micromap/.
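To illustrate the kind of vertex reduction involved, the following is a minimal Python sketch of Douglas-Peucker line simplification. It is not the ArcGIS routine used for the Utah map: the function names, the tolerance value, and the sample coordinates are assumptions made for illustration, and the sketch measures distance to the chord as a segment (a common variant). It also treats a single open line, so it does not by itself keep shared boundaries between neighboring polygons consistent.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate chord (e.g., a closed ring whose ends coincide).
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the chord, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Return a simplified copy of `points` (a list of (x, y) tuples)."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the chord joining the end points.
    farthest_index, farthest_distance = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > farthest_distance:
            farthest_index, farthest_distance = i, d
    # If every interior vertex is within tolerance, keep only the end points.
    if farthest_distance <= tolerance:
        return [points[0], points[-1]]
    # Otherwise keep the farthest vertex and simplify both halves recursively.
    left = douglas_peucker(points[: farthest_index + 1], tolerance)
    right = douglas_peucker(points[farthest_index:], tolerance)
    return left[:-1] + right


# Hypothetical boundary fragment: small wiggles plus one sharp corner.
boundary = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 3), (5, 0)]
simplified = douglas_peucker(boundary, tolerance=0.5)
print(simplified)
```

A larger tolerance removes more vertices; for micromaps the tolerance would be chosen so that each region remains recognizable at the small display size.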

RESULTS

Template for LM Plots

A typical template of a LM plot consists of four key features (Carr and Pierson, 1996). Figure 1 shows a hypothetical LM plot. The first feature is three or more sequences of panels arranged in parallel and linked by location. In the hypothetical case, Figure 1 shows five parallel sequences of panels. The first (leftmost) sequence of panels is the micromap panel itself, which typically contains small caricatures of map outlines of a region. The caricature map maintains the shape and neighborhood relationships while making the small subregions more visible. The second (from the left) sequence of panels is the label panel that provides the names of the geographical subregions (here, Region 1 through Region 10). The third through the fifth (from the left) sequences of panels display the statistical summaries. These panels may represent many forms of statistical summaries including box-plots, dot-plots (as shown in Fig. 1), time series plots, CIs, and so forth. The second feature is the sorting of the geographic subregions based on the statistical variable(s) of interest. Sorting improves perception between consecutive panels from the top to the bottom of the display. The third feature is the partitioning of the regions into perceptual groups of size five or less, which allows the viewer's attention to focus on a few areas at a time. The fourth feature is the use of color and location to link corresponding elements within the parallel sequences of panels; that is, the color red in the topmost panels relates to the geographic subregion in the northeast of the map, the subregion name (Region 5), and a red dot in each of the three statistical panels. The color red is reused in the next set of panels for Region 2, but there is no relationship between Region 5 and Region 2 as one might at first assume. There simply are not enough distinguishable colors to populate an entire display (with, say, 50 different subregions), so colors have to be reused in different panels.

Figure 1. Hypothetical LM plot illustrating the main features of such plots: the leftmost sequence of map panels, the second (from the left) sequence of label panels, and the third through the fifth (from the left) sequence of statistical panels.
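The sorting, grouping, and color-linking features of the template can be sketched in a few lines of Python. This is only an outline of the bookkeeping behind a LM plot, not the S-plus code referred to in the Methods: the function name, the color names, and the region values are hypothetical.

```python
from typing import Dict, List, Tuple

# Five linking colors, reused in every panel (the color names are illustrative).
LINK_COLORS = ["red", "orange", "green", "blue", "purple"]


def lm_layout(
    values: Dict[str, float], group_size: int = 5
) -> List[List[Tuple[str, float, str]]]:
    """Sort subregions by value (descending) and split them into perceptual groups.

    Each group corresponds to one micromap panel; within a panel every
    subregion receives its own linking color.
    """
    if not 1 <= group_size <= len(LINK_COLORS):
        raise ValueError("group_size must be between 1 and the number of colors")
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    panels = []
    for start in range(0, len(ranked), group_size):
        chunk = ranked[start : start + group_size]
        panels.append(
            [(name, value, LINK_COLORS[i]) for i, (name, value) in enumerate(chunk)]
        )
    return panels


# Hypothetical values for ten subregions (not taken from Figure 1).
region_values = {
    "Region 1": 4.2, "Region 2": 6.1, "Region 3": 2.5, "Region 4": 7.8,
    "Region 5": 9.9, "Region 6": 3.3, "Region 7": 8.4, "Region 8": 1.7,
    "Region 9": 5.0, "Region 10": 6.9,
}

panels = lm_layout(region_values)
for number, panel in enumerate(panels, start=1):
    print(f"Panel {number}: {panel}")
```

The same color appears once in every panel, which is why a shared color across panels carries no meaning; the link is only among the map, label, and statistical elements within one panel.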

In the hypothetical Figure 1, the rows are sorted by decreasing values with respect to the statistical panel 2. The statistical data displayed in the statistical panels 1 and 2 show a strong positive association (the correlation r calculated as 0.99), expressed in the almost parallel behavior of the dots and lines representing the values for these two variables. In contrast, the statistical data in panel 3 and 1 (as well as 3 and 2) show a strong negative association (the correlation r calculated as −0.94 for 3 and 1 and as −0.92 for 3 and 2). This negative association is seen in the movement of the dots and lines in opposite directions for these variables. Moreover, the data in panel 3 show an unusual outlier, the value for Region 1. It is this outlier that considerably reduces the almost perfect negative association otherwise present in this data. Just a simple numerical calculation of r might not be able to reveal the influence of a single subregion on the overall relationship.
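The point about a single subregion influencing r can be made concrete with a leave-one-out check. The sketch below uses made-up numbers (not the values plotted in Figure 1) in which one point departs from an otherwise perfectly linear negative relationship; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def leave_one_out_r(x, y):
    """Correlation recomputed with each observation removed in turn."""
    return [
        pearson_r(x[:i] + x[i + 1 :], y[:i] + y[i + 1 :]) for i in range(len(x))
    ]


# Hypothetical data: y falls linearly with x except for the first point.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [4, 9, 8, 7, 6, 5, 4, 3, 2, 1]

r_all = pearson_r(x, y)
r_without = leave_one_out_r(x, y)
print(f"r with all points: {r_all:.2f}")
print(f"r without the first point: {r_without[0]:.2f}")
```

The single number r hides which observation is responsible for the weaker association, whereas the outlying dot is immediately visible in a sorted statistical panel.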

The map panels of the LM plot in Figure 1 exhibit a strong geographic pattern: highest occurrences with respect to the statistical panels 1 and 2 can be found in the north and in the east; lowest occurrences can be found in the west and in the south. Additional features of LM plots exist and are described in more detail in Symanzik and Carr (2008).

US Level LM Plots

Figure 2 shows the LM plot for the 31 US states that reported on oral cleft occurrence. The figure shows five vertical columns that are linked by geographic location. The first column shows the generalized outline of the US, within which the map caricatures of the states are drawn. Alaska and Hawaii are modified in size and shifted toward the 48 contiguous states, and the island drawn to the east of Virginia represents Washington, D.C., which would otherwise not be visible. Redundant details of a state's boundaries are left out; however, the essential detail that defines the boundary shape and neighborhood relationships is preserved (other than for Washington, D.C.), while small states such as Rhode Island are enlarged so that their assigned color is evident on the map. The second column shows the state names along with a dot in the linking color. The last three columns illustrate three statistical variables. In this particular example, dot-plots represent the three variables oral cleft occurrence, maternal smoking rate, and the AIAN proportion in the population. All the corresponding micromaps, labels, and statistical panels are linked through the same color designation. Note that five distinct colors are used to distinguish the states within a particular micromap frame.

The data in Figure 2 are sorted by oral cleft occurrence from largest to smallest. The micromaps are divided into two main blocks with Texas in the middle; Texas defines the median occurrence and is plotted (and identified) in black between the states that lie above and below this median. The remaining states are partitioned into six micromaps, each containing a group of five states. Such sorting (here descending) and breaking of a long list of states into smaller groups draws the viewer's attention to a few subregions at a time. These LM plots provide a viewer with considerably more information than would be provided by a series of tables or an overall map representation (e.g., a choropleth map) alone. Viewers can easily navigate through the LM plot to a place of interest in order to review oral cleft occurrence and related statistics without having to leaf backward and forward through a collection of tables or, for that matter, a series of maps. Moreover, viewers can more easily compare the oral cleft occurrence of a particular state with a benchmark occurrence or with other states. For example, it is immediately clear from the LM plot that Alaska (ranked 1st) exhibits a much higher oral cleft occurrence than Utah (ranked 2nd). The LM plot also reveals which states had oral cleft occurrence above, below, or equal to the median and which states surpassed the national average (17.7 per 10,000 livebirths, i.e., 1.25 on a log10 scale). The national average is indicated with a vertical red line.

Figure 2 also provides a viewer with a quick overview of any spatial patterns present in oral cleft occurrence. The LM plot is very effective in revealing spatial trends. The immediate impression about spatial patterns observed in Figure 2 is of a few small groups of states that certainly raise questions about oral cleft occurrence similarities. For example, there is a noticeable elevation in the west (including Alaska) as compared to an observable low occurrence in the east-northeast.

A closer look at the series of micromaps in Figure 2 reveals further details in the spatial patterns. Light gray shading is used to distinguish states above the median occurrence (i.e., that of Texas) from those states below the median occurrence. The light gray shading draws attention to higher oral cleft occurrence in the upper half of the plot and lower oral cleft occurrence in the lower half of the plot. The state with the median occurrence (Texas) is shaded in all individual micromaps. The use of such shading provides additional spatial detail. As one can see in Figure 2, high oral cleft occurrence is primarily found in the west and the midwest, with the exception of California, while the east coast states show up as a broad area of lower oral cleft occurrence.

LM plots can also display multiple variables simultaneously, which allows the viewer to explore the relationships between these variables. As shown in Figure 2, viewers can observe the relationship between oral cleft occurrence and maternal smoking (as mentioned earlier, no comparable maternal smoking data were available for California). The plot shows that 9 of the 15 states above the median oral cleft occurrence have smoking rates above 16% (1.2 on a log10 scale), compared to only 1 of the 14 states below the median oral cleft occurrence. This difference is statistically significant (p = .0052, two-tailed Fisher's exact test) and is consistent with the well-established smoking-cleft association noted previously.

The rightmost statistical panel reveals a positive relationship between oral cleft occurrence and the proportion of AIAN in the population. In fact, 7 of the 15 states with above the median oral cleft occurrence have an AIAN population equal to or above 1.3% (0.114 on a log10 scale), while none of the 15 states with below the median oral cleft occurrence exceeds the same AIAN population level. This difference is also statistically significant (p = .00632, two-tailed Fisher's exact test).
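Both two-tailed Fisher's exact tests can be re-derived from the 2×2 counts given in the text (9 of 15 versus 1 of 14 states for smoking; 7 of 15 versus 0 of 15 states for the AIAN proportion). The sketch below is a plain-Python version that sums the probabilities of all tables no more likely than the observed one; this is a common definition of the two-sided p-value and is assumed here, since the text does not state which software or definition was used. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(k: int) -> float:
        # Hypergeometric probability of k "successes" in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    smallest, largest = max(0, col1 - row2), min(row1, col1)
    observed = table_probability(a)
    # Sum over all tables at most as probable as the observed table
    # (with a small relative tolerance for floating-point ties).
    return sum(
        table_probability(k)
        for k in range(smallest, largest + 1)
        if table_probability(k) <= observed * (1 + 1e-9)
    )


# Rows: states above / below the median oral cleft occurrence.
# Columns: states above / not above the threshold for the second variable.
p_smoking = fisher_exact_two_sided(9, 6, 1, 13)  # smoking rate above 16%
p_aian = fisher_exact_two_sided(7, 8, 0, 15)     # AIAN proportion >= 1.3%

print(f"smoking: p = {p_smoking:.4f}")
print(f"AIAN:    p = {p_aian:.5f}")
```

Under this definition the computed values agree with the reported p = .0052 and p = .00632.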

Utah Level Analysis

Figure 3 illustrates a LM plot for oral cleft occurrence by county for the state of Utah. The overall design of the LM plot in Figure 3 follows the LM template: it shows five sequence columns, the first column being a map delineating the counties of Utah, while the second column contains the county names with associated color labels. The next three columns show the statistical panels for three variables for each county: oral cleft occurrence (counts divided by number of live births), counts, and livebirths. The counties are ranked according to oral cleft occurrence from highest to lowest and are partitioned into seven micromaps. Utah has 29 counties, a number that is not evenly divisible by five. Symanzik and Carr (2009) provide suggestions on how to partition subregions into micromaps when the number of geographic units within a LM plot (in this case the counties of Utah) is not evenly divisible by the number of geographic units to be displayed in a single micromap. Here, the first three micromaps and the last three micromaps each display four counties, while the fourth (middle) micromap displays five counties. Note that in this representation of the LM plot the median is not explicitly drawn, but the first three micromaps outline counties above the median while the last three micromaps outline counties below the median. The county with the median occurrence (Garfield) is shaded in all individual micromaps. No additional counties are outlined in the fourth (middle) micromap (other than the five counties that constitute this micromap).
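One simple way to obtain a symmetric partition such as the 4, 4, 4, 5, 4, 4, 4 layout used for Utah is sketched below. The rule shown (equal base sizes, with any leftover units assigned from the middle panel outward) is an assumption made for illustration; it reproduces the Utah layout but is not claimed to be the procedure recommended by Symanzik and Carr. The function name is hypothetical.

```python
from typing import List


def symmetric_group_sizes(n_units: int, n_panels: int) -> List[int]:
    """Split n_units across n_panels, giving leftovers to the middle panels first."""
    if n_panels <= 0 or n_units < n_panels:
        raise ValueError("need at least one unit per panel")
    base, leftover = divmod(n_units, n_panels)
    sizes = [base] * n_panels
    center = (n_panels - 1) / 2
    # Panel indices ordered by distance from the center of the display.
    from_middle_outward = sorted(range(n_panels), key=lambda i: (abs(i - center), i))
    for index in from_middle_outward[:leftover]:
        sizes[index] += 1
    return sizes


utah_sizes = symmetric_group_sizes(n_units=29, n_panels=7)
print(utah_sizes)
```

With 29 counties and seven micromaps, the single leftover county lands in the middle panel, which is the panel that contains the median county.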

One supplementary statistical representation included in Figure 3 is the addition of CIs to the oral cleft occurrence panel. The CIs, indicated by connected small dots, correspond to the 95% lower and upper confidence limits. The larger colored dots refer to, as before, the oral cleft occurrence in each county. The 95% CI was calculated for each occurrence using an exact Poisson distribution (Leslie, 1992). A viewer can now appreciate that the estimated oral cleft occurrence of each county is not quite the "true" (actual) occurrence and that the CIs describe the uncertainty of the estimates, that is, the true value of the occurrence most likely falls somewhere between the limits of the CI. Moreover, readers can also observe that counties where the occurrence is calculated from limited data (i.e., is more uncertain) have wider CIs, and vice versa. As an example, consider how Daggett County (ranked 1st), with an oral cleft occurrence of 102.5 per 10,000 (2.01 on a log10 scale), compares to Salt Lake County (ranked 18th), which has an oral cleft occurrence of 12.7 per 10,000 (1.1 on a log10 scale). From the occurrence information alone, one might be tempted to infer that Daggett County has a higher oral cleft occurrence than Salt Lake County. However, the conclusion is somewhat different when the CIs of both counties are taken into account: Daggett County has a wide 95% CI, whereas Salt Lake County has a narrow 95% CI. The implication is that the oral cleft occurrence for Daggett County is less reliable, while the occurrence for Salt Lake County can be considered more representative and reliable. This is borne out by the data displayed in the counts and livebirths statistical panels.
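The effect of small counts on the width of an exact Poisson CI can be illustrated with a short sketch. It computes exact (Garwood-type) limits by bisection on the Poisson distribution function, which is one standard construction and is assumed here rather than taken from Leslie (1992). The function names are hypothetical, and the counts and livebirths are invented to mimic a sparsely and a heavily populated county; they are not the Utah data.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), with terms built in log space."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    log_mu = math.log(mu)
    return sum(math.exp(i * log_mu - mu - math.lgamma(i + 1)) for i in range(k + 1))


def _root(g, lo: float, hi: float, iterations: int = 200) -> float:
    """Bisection for a function g that changes sign once on [lo, hi]."""
    lo_is_positive = g(lo) > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == lo_is_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_poisson_ci(count: int, alpha: float = 0.05):
    """Exact confidence limits for the mean of a Poisson count."""
    half = alpha / 2
    if count == 0:
        lower = 0.0
    else:
        # Lower limit solves P(X >= count | mu) = alpha / 2.
        lower = _root(lambda mu: 1 - poisson_cdf(count - 1, mu) - half, 0.0, float(count))
    # Upper limit solves P(X <= count | mu) = alpha / 2.
    bracket = count + 10 * math.sqrt(count + 1) + 10
    upper = _root(lambda mu: poisson_cdf(count, mu) - half, float(count), bracket)
    return lower, upper


def rate_ci_per_10000(count: int, live_births: int, alpha: float = 0.05):
    """Occurrence per 10,000 livebirths with its exact Poisson CI."""
    lower, upper = exact_poisson_ci(count, alpha)
    scale = 10_000 / live_births
    return count * scale, lower * scale, upper * scale


# Hypothetical counties: 1 case in 100 births versus 200 cases in 160,000 births.
small_county = rate_ci_per_10000(1, 100)
large_county = rate_ci_per_10000(200, 160_000)
print("small county (rate, lower, upper):", [round(v, 1) for v in small_county])
print("large county (rate, lower, upper):", [round(v, 1) for v in large_county])
```

The sparsely populated county has the higher point estimate but an interval spanning orders of magnitude, which is the same qualitative contrast drawn above between Daggett and Salt Lake counties.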

The addition of counts and livebirths to the statistical panels in Figure 3 provides a viewer with a more complete picture of the statistical assessment of oral cleft data at the county level. Viewers can appreciate the importance of these two variables simply by comparing the oral cleft occurrence and counts in the statistical panels. As indicated in Figure 3, counties such as Salt Lake, Utah, Cache, Davis, Weber, and Box Elder have sizeable numbers in the counts and livebirths categories (a direct result of these counties being more heavily populated). This translates to narrow CIs. In contrast, counties such as Daggett, Garfield, Kane, Millard, and Sanpete exhibit wide CIs, a direct result of a sparser population base. Overall, this demonstrates the interdependence of occurrence, counts, and livebirth numbers and implies that both the number of counts and the number of livebirths determine the reliability of the oral cleft occurrence.

DISCUSSION

This article demonstrates the use of LM plots for the display of geographically indexed oral cleft occurrence at two geographical levels: the state and the county level. It is important to note that there are inherent limitations in the data used in this article. To begin with, all birth defects data (including oral clefts) were collected at the state level and compiled by the NBDPN; that is, the NBDPN only maintains the network of state and population-based programs for birth defects. Thus, there may be differences among states in the standards used when gathering birth defects data and in the level of ascertainment; this may produce extremes in the variability of oral cleft occurrence among states that obscure the true differences among them. Maternal smoking and AIAN data are also not without limitations, as they were only available for a single year (2002 and 2000, respectively) and do not cover the same period as the oral cleft data. Despite these limitations, we judge that the differences in the state and census data are not so extreme as to preclude the visualization and analysis presented here. The data can still provide important insights into patterns and relationships in birth defects. However, readers should be alert to these limitations and use caution when interpreting the results derived from these data. Hence, our intent is not to draw definitive conclusions from the LM plots but rather to show how the visualizations can order the data such that an easier interpretation is possible. From experience in the use of micromapping, the authors believe that LM plots may well have an important role to play in birth defect surveillance because of the many advantages a LM plot offers over tabular or other graphical methods of representation. These advantages are elaborated below.

The first advantage is that LM plots provide an improved way of viewing and communicating information about birth defects. By sorting and breaking the datasets into a series of micromaps, LM plots simplify the visual appearance by encouraging selective focus. Viewers can immediately spot their home state or county to review the status of birth defects and, in this manner, engage in meaningful discussion. Moreover, LM plots allow viewers to make rapid and meaningful comparisons between different regions. Viewers can compare the rate of a particular state of interest with benchmark values (median or national average) or with other states in a stratified environment. This kind of profiling of states or counties (above or below a central tendency) is valuable information for planning public health services and for subsequent decisions such as resource allocation.
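The sorting and grouping step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `micromap_panels`, the panel size of five, and the sample rates are all hypothetical, chosen only to show how sorted regions are split into small panels and compared with a median benchmark.

```python
from statistics import median


def micromap_panels(rates, panel_size=5):
    """Sort regions by rate (highest first) and split them into panels.

    rates: dict mapping region name -> occurrence rate.
    panel_size: regions per micromap panel (5 is an assumption here).
    Returns (panels, benchmark), where benchmark is the median rate.
    """
    ordered = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    panels = [ordered[i:i + panel_size] for i in range(0, len(ordered), panel_size)]
    return panels, median(rates.values())


# Hypothetical rates per 10,000 livebirths, for illustration only.
example_rates = {
    "A": 21.0, "B": 9.5, "C": 14.2, "D": 17.8, "E": 11.1,
    "F": 12.6, "G": 19.3, "H": 8.7, "I": 15.0, "J": 10.4, "K": 13.3,
}

panels, benchmark = micromap_panels(example_rates)
for number, panel in enumerate(panels, start=1):
    labels = [
        f"{name} ({'above' if rate > benchmark else 'at/below'} median)"
        for name, rate in panel
    ]
    print(f"panel {number}: {', '.join(labels)}")
```

Each panel would then be drawn as one small map with its own row of statistical panels, so that a viewer compares only a handful of regions at a time.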

The second advantage of LM plots is that they present statistical summaries and estimates of birth defects in a spatial context. Unlike traditional statistical graphical methods, LM plots combine both exploratory analysis and traditional statistical graphics while maintaining the spatial context; this is very important in birth defects epidemiology because of the intrinsic spatial nature of the events. LM plots are also very effective at describing the spatial elements of the oral cleft occurrence, that is, the varied geographical distribution of the oral clefts as well as their spatial clustering. LM plots are particularly effective in highlighting subregions in a series of micromaps, and in doing so, they reveal detailed spatial patterns that otherwise might not have been detected from data tables alone. As was illustrated in Figure 2, as one moved from the high to median oral cleft occurrence and from the median to the low oral cleft occurrence, a spatial pattern emerged. High oral cleft occurrence tended to be in the western and midwestern states, while the east coast (especially the northeast) revealed a region of low oral cleft occurrence. Such insights are as valuable for hypothesis generation as for identifying areas of high or low risk.

A third advantage of LM plots lies in the efficacy of micromapping in handling multiple variables. It is well known that the causes of birth defects are, by nature, multivariate, which argues for linking birth defects data with potential risk factors so that underlying patterns and relationships can be investigated. LM plots facilitate this by displaying multiple variables alongside one another. This capability allows readers to quickly view associations between variables and to pinpoint any anomalous relationships that may exist between them. Figure 2 illustrates this by displaying maternal smoking and AIAN alongside the oral cleft occurrence. In particular, the association observed between oral cleft occurrence and AIAN was immediately evident for the 15 states in the top three micromaps when compared with the remaining states.

Also shown was the capability of LM plots to display the uncertainties of the oral cleft occurrence estimates. Reporting uncertainties along with the occurrence is particularly helpful to the viewer because it permits an assessment of the reliability of the data. Viewers can appreciate that the big dots (Fig. 3) are point estimates rather than the true values, and that the CIs indicate a range within which the true occurrence is likely to fall. The viewer can also note that states or counties with small counts and few livebirths produce less reliable information on the occurrence, as exhibited by wider CIs, while states or counties with large counts and many livebirths yield an occurrence that is more reliable, as evidenced by narrower CIs.
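The link between counts, livebirths, and CI width can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions rather than the procedure used for Figure 3: it assumes exact Poisson limits on the count (found by bisection on the Poisson distribution function) and expresses occurrence per 10,000 livebirths. The function names and the example counts are hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed in log space for stability."""
    if k < 0:
        return 0.0
    if lam <= 0:
        return 1.0
    total = sum(
        math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        for i in range(k + 1)
    )
    return min(1.0, total)


def _bisect(f, lo, hi, iterations=200):
    """Root of an increasing function f on [lo, hi], by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_limits(count, alpha=0.05):
    """Two-sided exact limits for a Poisson mean given an observed count."""
    if count == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= count) equals alpha / 2.
        lower = _bisect(
            lambda lam: (1.0 - poisson_cdf(count - 1, lam)) - alpha / 2,
            0.0,
            float(count),
        )
    # Upper limit: the mean at which P(X <= count) equals alpha / 2.
    hi = float(count) + 1.0
    while poisson_cdf(count, hi) > alpha / 2:
        hi *= 2.0
    upper = _bisect(
        lambda lam: alpha / 2 - poisson_cdf(count, lam), float(count), hi
    )
    return lower, upper


def occurrence_with_ci(count, livebirths, per=10_000, alpha=0.05):
    """Occurrence per `per` livebirths with its exact Poisson interval."""
    lower, upper = exact_poisson_limits(count, alpha)
    scale = per / livebirths
    return count * scale, lower * scale, upper * scale


# Hypothetical counties with the same occurrence but different sizes.
for label, count, livebirths in [
    ("small county", 2, 1_000),
    ("large county", 200, 100_000),
]:
    rate, low, high = occurrence_with_ci(count, livebirths)
    print(f"{label}: {rate:.1f} per 10,000 (95% CI {low:.1f} to {high:.1f})")
```

Both hypothetical counties have the same point estimate, yet the interval for the small county is far wider, which is the pattern the CI bars in an LM plot make visible.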

In addition to the LM plot templates described earlier, an ample set of templates is available that offers readers considerable flexibility in visualizing their data via LM plots. For example, the statistical panels of LM plots can take many different forms, such as box plots, bar plots, histograms, or time series plots. These alternate statistical plots offer additional avenues for querying the underlying structure of the data and for examining patterns and relationships in the data. For example, Carr et al. (1998) used LM plots to effectively depict time series data for per capita carbon dioxide emissions. One could imagine a similar time series LM plot that would examine the trend of NTDs before and after mandatory fortification of cereal grain products with folic acid. One can also manipulate the colors by using a different set of colors or hues. Furthermore, the beauty of LM plots is that they are not limited to static representations of summary statistics; web-based LM plots can provide users with real-time data to interactively and dynamically query, sort, and compare different regions at different resolutions, for example, at the state or county level. Such web-based LM plots also permit dynamic links between databases and automatic updates of data. In this capacity, Symanzik et al. (1999) developed web-based interactive LM plots for the US Environmental Protection Agency, and in a similar fashion, Wang et al. (2002) developed web-based LM plots for the National Cancer Institute micromap website (National Cancer Institute, 2003), accessible at http://statecancerprofiles.cancer.gov/micromaps/.

A final interesting aspect of the national cleft data pertains to the eastern states: the oral cleft occurrences in these states all fall below the median occurrence. This is notable because the northeastern states are generally high in cancer rates (Hao et al., 2006) and many (Zhu et al., 2002; Mili et al., 1993a,b; Windham et al., 1985) have suggested that cancer and birth defects may share common causes linked to location. These data, at least for clefts, do not support that notion.

In conclusion, LM plots provide a constructive geographic representation coupled to a statistical visualization tool that also has an exploratory capability. For the monitoring of birth defects, there is certainly scope, if not tremendous advantage, in using LM plots to augment the presentation of birth defect data. Further, the application of LM plots has distinct merit in enhancing data analysis, generating scientific hypotheses, and integrating data of various forms (e.g., census and environmental data). These aspects, when linked together, can facilitate the planning of public health services toward such aims as targeting limited resources to places with the greatest need.

Acknowledgements

We would like to thank Sara H. Riordan, Genetic Counselor, from Arizona Teratology Information Program, for providing us with the Birth Defects Research (Part A) issue 73(10). We are also grateful to Sam LeFevre, Environmental Epidemiologist, from the Utah Department of Health, for geocoding the oral cleft occurrence for the state of Utah as well as Marcia Feldkamp and Amy Nance (both from the Utah Birth Defects Network, Utah Department of Health) for their assistance.

REFERENCES

Carr DB. 1994. Converting tables to plots. Technical Report 101, Center for Computational Statistics, Fairfax, VA: George Mason University.
Carr DB, Olsen AR, Courbois JP, et al. 1998. Linked micromap plots: named and described. Stat Comput Stat Graph Newslet 9: 24–32.
Carr DB, Olsen AR, Pierson SM, et al. 2000a. Using linked micromap plots to characterize Omernik ecoregions. Data Mining and Knowledge Discovery 4: 43–67.
10.1023/A:1009828700017
Carr DB, Pierson SM. 1996. Emphasizing statistical summaries and showing spatial context with micromaps. Stat Comput Stat Graph Newslet 7: 16–23.
Carr DB, Wallin JF, Carr DA. 2000b. Two new templates for epidemiology applications: linked micromap plots and conditioned choropleth maps. Stat Med 19: 2521–2538.
10.1002/1097-0258(20000915/30)19:17/18<2521::AID-SIM585>3.0.CO;2-K
Centers for Disease Control and Prevention (CDC). 1998. Trends in infant mortality attributed to birth defects—United States, 1980–1995. MMWR 47: 773–778.
Coddington DA, Hisnanick JJ. 1996. Midline congenital anomalies: the estimated occurrence among American Indian and Alaska Native infants. Clin Genet 50: 74–77.
10.1111/j.1399-0004.1996.tb02351.x
Douglas D, Peucker T. 1973. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Can Cartographer 10: 112–122.
10.3138/FM57-6770-U75U-7727
Geographic Data Technology (GDT). 2004. Dynamap/2000 v14.3 street network file in ArcView shapefile format, county tile for the state of Utah, on digital optical disk (CD).
Hao Y, Ward EM, Jemal A, et al. 2006. U.S. congressional district cancer death rates. Int J Health Geog 5: 28.
10.1186/1476-072X-5-28
Hurst J, Symanzik J, Gunter L. 2003. Interactive federal statistical data on the web using ViZn. Comput Sci Stat 35: CD.
Khoury MJ. 1989. Does maternal cigarette smoking during pregnancy cause cleft lip and palate offspring? Am J Diseases Children 143: 333–337.
Leslie D. 1992. Simple SAS macros for the calculation of exact binomial and Poisson confidence limits. Comput Biol Med 22: 351–361.
10.1016/0010-4825(92)90023-G
Little J, Cardy A, Munger RG. 2004. Tobacco smoking and oral clefts: a meta-analysis. Bull WHO 82: 213–218.
Mathews TJ, Rivera CC. 2004. Smoking during pregnancy: United States, 1990–2002. MMWR 53: 911–915.
Mili F, Khoury MJ, Flanders WD, et al. 1993a. Risk of childhood cancer for infants with birth defects. I. A record-linkage study, Atlanta, Georgia, 1968–1988. Am J Epidemiol 137: 629–638.
10.1093/oxfordjournals.aje.a116720
Mili F, Lynch CF, Khoury MJ, et al. 1993b. Risk of childhood cancer for infants with birth defects. II. A record-linkage study, Iowa, 1983–1989. Am J Epidemiol 137: 639–644.
10.1093/oxfordjournals.aje.a116721
National Birth Defects Prevention Network (NBDPN). 2005. Birth defects surveillance data from selected states, 1998–2002. Birth Defects Res A Clin Mol Teratol 73: 758–853.
10.1002/bdra.20214
National Cancer Institute (NCI). 2003. State Cancer Profiles. Dynamic views of cancer statistics for prioritizing cancer control efforts in the nation, states, and counties. http://statecancerprofiles.cancer.gov/micromaps/
Sever LE. 2004. Guidelines for conducting birth defect surveillance. 2nd ed. Atlanta, GA: National Birth Defects Prevention Network, Inc.
Symanzik J, Axelrad DA, Carr DB, et al. 1999. HAPs, micromaps and GPL—visualization of geographically referenced statistical summaries on the world wide web. In: Annual Proceedings (ACSM-WFPS-PLSO-LSAW 1999 Conference CD). American Congress on Surveying and Mapping, Bethesda, MD.
Symanzik J, Carr DB. 2008. Interactive linked micromap plots for the display of geographically referenced statistical data. In: C-H Chen, W Härdle, A Unwin, editors. Handbook of Data Visualization, New York: Springer. (in press).
10.1007/978-3-540-33037-0_12
Symanzik J, Gebreab S, Gillies R, et al. 2003. Visualizing the spread of West Nile Virus, 2003 Proceedings. Alexandria, VA: American Statistical Association, CD.
U.S. Census Bureau. 2000. The American Indian and Alaska Native population (AIAN). Census 2000 Brief. http://www.census.gov/prod/2002pubs/c2kbr01-15.pdf
Utah Automated Geographic Reference Center (AGRC). 2006. http://agrc.utah.gov/agrc_gisservices/gisservicesintro.html
Waitzman NJ, Romano PS, Scheffler RM. 1994. Estimates of the economic costs of birth defects. Inquiry 31: 188–205.
Wang X, Chen JX, Carr DB, et al. 2002. Geographic statistics visualization: web-based linked micromap plots. Comput Sci Eng 4: 90–94.
10.1109/5992.998645
Windham GC, Bjerkedal T, Langmark F. 1985. A population-based study of cancer incidence in twins and in children with congenital malformations or low birth weight, Norway, 1967–1980. Am J Epidemiol 121: 49–56.
10.1093/oxfordjournals.aje.a113982
Zhu JL, Basso O, Hasle H, et al. 2002. Do parents of children with congenital malformations have a higher cancer risk? A nationwide study in Denmark. Br J Cancer 87: 524–528.
10.1038/sj.bjc.6600488

Volume 82, Issue 2, February 2008, Pages 110-119