Volume 16, Issue 2 pp. 361-371
Hazard/Risk Assessment

Demonstration of a toxicological risk ranking method to correlate measures of ambient toxicity and fish community diversity

S. Ian Hartwell

Maryland Department of Natural Resources, Tidewater Administration, Chesapeake Bay Research & Monitoring Division, Annapolis, Maryland 21401, USA

First published: 26 October 2009

Abstract

The goal of this study was to assess a new toxicological risk ranking model, field validate it with results from a battery of sediment and water column bioassays, and identify correlations of model output with fish community and population metrics. The model has five components: severity of effect, degree of response, bioassay variability, consistency, and number of measured endpoints. The model can reliably reduce an array of ambient toxicity data into a site-specific metric that is appropriate for comparisons with other metrics, such as Index of Biotic Integrity (IBI) or community diversity indices. The model is tolerant of variable amounts of data between stations. It does not generate probability limits without repeated sampling. The model can identify trends between sampling stations and document where chemical contamination is contributing to community impacts as well as where toxicological impacts are not likely to be contributing to observed population level impairment. The model was evaluated with field/laboratory data. Test sites were located in tributaries of Chesapeake Bay watersheds that are impacted by industrial, urban, and agricultural land use patterns. The toxicological risk scores correlate with fish community health metrics. The strongest correlations were between sediment toxicity risk and bottom trawl fish community diversity index.

INTRODUCTION

The goal of this study was to assess a new toxicological risk ranking model, field validate it with results from a battery of sediment and water column bioassays, and identify correlations of model output with fish community metrics. This project represents the initial attempt by the Maryland Department of Natural Resources (DNR) to integrate an environmental ambient toxicity testing scheme with a biological community assessment approach. The toxicological approach was designed to be applicable to assessing tributary-wide contaminant impacts or site-specific assessments. This paper describes the toxicological risk ranking model developed specifically for evaluating the results of tributary-specific ambient toxicity testing. The fish community metrics employed were measures of diversity and Index of Biotic Integrity (IBI). The ambient toxicity tests were laboratory-based bioassays using standard and/or local species. The details of the bioassays and metrics are not dealt with here.

The current approach was undertaken to further our understanding of how toxic contaminants may be affecting habitat quality and resource populations in Chesapeake Bay. The ultimate question to address is: To what extent do toxicants affect living resource populations in Chesapeake Bay? This is a very difficult question to answer in any coastal system with the current state of knowledge. A major difficulty exists in separating natural population fluctuations from anthropogenic factors.

Many species that depend on the Chesapeake Bay habitat for reproduction are in a more advanced state of decline than those that spawn outside the Bay [1]. Clearly, this effect is due to overharvest and/or loss of habitat or access to spawning grounds in some cases. It is also clear that some areas are severely contaminated and others demonstrate localized ambient toxicity [2-4]. It is unknown if localized toxic contaminant effects can influence populations in the Bay as a whole, or if low-level but widespread contamination is a greater problem, or if a combination of the two is important.

As a starting point in addressing the impact of toxic contamination on populations, the risk ranking method was developed to quantify ambient toxicological effects for statistical contrasts with biological community response parameters. The risk ranking model output can be contrasted with any type of community health metric, not just fish community metrics. Confirmation of deleterious effects at the community level is an inherent confirmation that population level effects are occurring. Positive correlation of toxicological impact and depressed community metrics implies cause and effect, but does not substantiate it without a demonstrable mechanism.

This approach is built upon efforts to demonstrate the presence of toxic contaminants in specific sites and regions. An ambient toxicity pilot program, sponsored by the DNR and the United States Environmental Protection Agency (EPA) Chesapeake Bay Program, field validated a suite of sensitive lethal and sublethal bioassays for resident aquatic organisms [3, 4]. The approach assumes that biologically significant environmental contamination is not necessarily predictable based solely on chemical analysis. Contaminants may be present but not bioavailable, or biological impacts may be occurring due to unmeasured constituents and synergistic interactions. It has been demonstrated that the bioassays have the ability to detect the presence of toxic effects in contaminated areas, in areas of unknown quality, and in areas previously thought to be pristine.

Map of Chesapeake Bay showing the location of rivers selected for joint ambient toxicity/Index of Biotic Integrity (IBI) sampling in 1993.

MATERIALS AND METHODS

Site selection

Previous studies [5-8] developed an estuarine fish IBI. Based on these studies, four tidal tributaries were selected for paired ambient toxicity/fish IBI sampling (Fig. 1).

Curtis Creek is a tributary of the Patapsco River in Anne Arundel County near Baltimore, Maryland, USA. The watershed is dominated by urban and commercial development. It was selected as an example of a polluted area to allow field assessment of existing test methods for ambient toxicity testing and the toxicological risk ranking model. Adverse biological effects expected at this site provided a positive control.

Rock Creek is also a tributary of the Patapsco River. The watershed is dominated by urban development with forested areas near the headwaters. Like Curtis Creek, it is densely populated, but it lacks the heavily industrialized areas found there.

Fishing Bay is located in a lightly developed area of the eastern shore. The watershed is more than 70% forest and wetlands. It is the least urbanized of the study areas. Fishing Bay was thought to be a relatively unspoiled environment; however, the number of species caught in Fishing Bay had a similar pattern to that of stressed tributaries [6].

The Wicomico River is a tributary to the Potomac River, located between Charles and St. Mary's counties. This watershed contains the largest proportion of agricultural land of the four tributaries (approximately 30%, crop and pasture). There is little urban development, and forest and wetland occupy more than 50% of the watershed. The Wicomico served as the saltwater reference tributary in Carmichael et al. [5, 6]. The Wicomico River was selected to represent a relatively unimpacted reference area based upon consistently high IBI scores.

Field sampling and bioassays

Fish IBI sampling methods were designed to assess the fish community at its peak diversity in late summer. Tributaries were sampled three times each summer with beach seines and with bottom trawls deployed near mid-channel. Two to five evenly spaced stations were established along the axis of each tributary, from its mouth to near the head of tide, and were located beyond the direct influence of point sources. Ambient toxicity testing was initiated before the fish community IBI sampling began, both to assess the potential impact of toxic contamination as the fish communities matured to peak development and to detect any short-term spikes in toxic effects during the late spring and summer. Depth-integrated water samples were collected monthly from the central trawl station in each tributary. Water column ambient toxicity bioassays were conducted monthly from May through September 1993 using a 7-d sheepshead minnow (Cyprinodon variegatus) survival and growth test and a 7-d grass shrimp (Palaemonetes pugio) survival and growth test at the DNR's Aquatic Toxicology Laboratory (Glen Burnie, MD, USA). Copepod (Eurytemora affinis) life-cycle survival and reproduction tests were conducted at the University of Maryland, Chesapeake Biological Laboratory (CBL) in July.

Sediment samples were collected with a petite ponar grab sampler at each fish community assessment trawl sampling station. The top 2 cm were retained for testing. Five sites were sampled in the Wicomico River and Fishing Bay. Due to their small size, Curtis Creek and Rock Creek only had two sampling locations. In these systems, two discrete samples were taken at the upstream sites, and three at the downstream sites. Sample sites were separated by ∼ 100 m on a linear or triangular sampling pattern. The sampling plan does not normally provide for true field replication for statistical purposes, but does allow a contrast of upstream versus downstream in the two smaller creeks. The rationale was the assumption of low temporal variation in sediment relative to the water column, and for the purpose of examining sediment contamination effects on a system-wide basis, which is consistent with the IBI community approach. This approach also allowed for an assessment of sediment contamination impacts on bottom communities and could be contrasted with benthic community metrics as well as bottom trawl survey data. Samples were segregated throughout the collection and toxicological tests.

Sediment toxicity bioassays were conducted using the following tests: 10-d sheepshead minnow (C. variegatus) embryo-larval survival and teratogenicity test; 20-d amphipod (Lepidactylus dytiscus) survival, growth, and reburial test; 20-d amphipod (Leptocheirus plumulosus) survival, growth, and reburial test; and 20-d polychaete worm (Streblospio benedicti) survival and growth test. Sediment bioassays were conducted by Old Dominion University, Applied Marine Research Laboratory (AMRL).

Culture and water column bioassay procedures for grass shrimp and sheepshead minnow were adapted from standard methods from the American Public Health Association (APHA) and the EPA [9, 10]. Copepod bioassays used methods modified from Hartwell et al. [11]. Culture and sediment bioassay procedures for the amphipods, polychaetes, and fish eggs were adapted from methods contained in Lambertson and Swartz [12] and DeWitt et al. [13]. Statistical evaluations for L. dytiscus were adjusted for particle size effects [3].

Heavy metals and acid and base/neutral extractable semivolatile organic compounds were analyzed on August water samples and bulk composite sediment samples using standard EPA methods. Total petroleum hydrocarbons in sediment were also measured by EPA method 3550 [14]. Inorganic contaminants were evaluated concurrently with sediment toxicity tests on a composite of the five discrete samples from each river system. Sediments were analyzed for acid-volatile sulfides and simultaneously extractable metals (AVS/SEM) using the method of DiToro et al. [15] and total organic carbon (TOC). Pore-water samples were extracted by squeezing with a nitrogen press and analyzed for ammonia, nitrite, and sulfides. Detailed methods and results of the bioassays can be found in Hartwell et al. [16]. The data were used here to test the risk ranking model.

Ranking model

At the inception of the ambient toxicity program, a ranking scheme was proposed to evaluate the toxicological results on a site-by-site basis [17]. The goal of the ranking system was to quantify the toxicological risk to populations due to the presence of toxic contamination, not merely to catalog presence or absence of toxic effects. The word risk is used here in the sense of jeopardy. If the toxicological risk score is high, due to high impact or high uncertainty, that implies resource populations in the test area are in more danger than areas with low impact or uncertainty. The model is not designed to replace a classical risk assessment, but may be a useful component of, or a companion to, the hazard or exposure assessment components. It specifically addresses ecological impact, without the necessity for demonstrating detailed, chemical-specific cause-and-effect relationships, which is one of the greatest difficulties in ecological risk assessment.

This scheme has five components: severity of effect, degree of response, test variability, site consistency, and number of measured endpoints. Each site was ranked by the following scheme. For each bioassay, endpoint severity was multiplied by the percent response of the test organisms for each bioassay endpoint, and the coefficient of variation for that test endpoint. This product from all bioassays was summed for each test site. The sum was adjusted by a site consistency factor and divided by the square root of the number of test endpoints (N) for each site, to compensate for bias between different sites where different amounts of data may be present.

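$$\text{Risk score} = \frac{\sum\left[(\text{severity})(\%\ \text{response})(\text{CV})\right] + \text{consistency}}{\sqrt{N}}$$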

Severity refers to the degree of effect that the bioassay endpoints measure. Mortality is considered the most severe response, followed by impaired reproduction and impaired growth. Other endpoints could be included in the list. The magnitude of the values and the relationship between them (e.g., linear, quadratic) needs further evaluation with empirical data. However, to evaluate the behavior of the model, the values of the endpoints were arbitrarily set as integers of mortality = 3, reduced fecundity = 2, and reduced growth = 1, and used consistently throughout.

Degree of response is the measure of the proportion of organisms responding in each bioassay regardless of statistical significance (e.g., 5% mortality, 45% growth inhibition, etc.). Low-level impacts may have significant population level ramifications if present over widespread areas or for long time periods. In this regard, it is as important to know what percentage of the organisms responded as it is to know whether the value was “statistically significant”. The response values were adjusted for mean control values in their calculation formulas. Negative values were assigned a value of zero in the model. The following equations were used to calculate degree of response:
[Equation image: degree-of-response formulas, not available in this text]
where test = endpoint datum from ambient media.

Variability was expressed as the coefficient of variation (CV) of response for each set of laboratory replicates for each sample site. This parameter reflects the bioassay-specific variability for each endpoint and sample period. Thus, high variability may result in increased risk scores to a similar extent as positive toxic responses. Data were pooled by river or sample date in some cases for this purpose. It is recognized that this measure of variability may incorporate both experimental and site-specific variation between discrete samples. Experiments are currently underway to address this issue.
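The variability term can be illustrated with a minimal sketch. The function name and the replicate values below are hypothetical, and two choices are assumptions rather than details given in the text: the sample standard deviation (n − 1 denominator) is used, and a set of replicates with a zero mean is assigned a CV of zero.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """Return SD / mean for one set of laboratory replicates.

    Assumptions (not stated in the source): sample standard deviation
    with an n - 1 denominator, and a CV of 0 when the mean is 0.
    """
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    m = mean(replicates)
    if m == 0:
        return 0.0
    return stdev(replicates) / abs(m)


# Hypothetical percent-response values from four replicate test chambers.
example_replicates = [20.0, 30.0, 25.0, 25.0]
example_cv = coefficient_of_variation(example_replicates)
print(f"CV = {example_cv:.3f}")
```

Because the CV multiplies severity and response directly, a noisy bioassay raises the subscore as much as a proportionally stronger toxic response would.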

Severity, response, and variability are characteristics of the individual bioassays conducted at each site whereas consistency and the number of endpoints measured are site-specific attributes. Consistency refers to the agreement between the various bioassay endpoints measured at a site. If the results from all tests and/or species agree, consistency is high, and confidence in predicting toxic impacts (or lack of effect) is high. If half of the results are positive and half are negative, consistency and certainty of toxic impacts (or lack of impact) are lower.

Consistency was calculated as the cube of the difference between one-half the number of endpoints and the number of statistically nonsignificant responses at each site. Statistical significance in this instance refers to typical pairwise comparison tests of control versus experimental data, not a statistical test of the calculated response values from the equations above.

$$\text{Consistency} = \left(\frac{N}{2} - X\right)^{3}$$
where N = total number of endpoints and X = number of statistically nonsignificant (P = 0.05) endpoints.

When bioassay endpoint values tend to be nonsignificant (N/2 < X), the function is negative. When half of the endpoints are significant and half are nonsignificant (N/2 = X), the function is zero. When more than half of the endpoint values are statistically different from control values (N/2 > X), the function is positive. The absolute value is dependent on the amount of data available. Large data sets (high N) will have higher extremes. This polynomial function is an additive factor in the equation. The function reduces the risk score of a station when most of the test results were not significantly different from controls but increases the risk score when more than half the tests are significant.
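The sign behavior of the consistency factor can be checked with a short sketch that follows the verbal definition above (the cube of one-half the number of endpoints minus the number of nonsignificant endpoints). The function name is hypothetical. The (N, X) pairs are taken from the worked tables in this paper.

```python
def consistency(n_endpoints, n_nonsignificant):
    """Cube of (N/2 - X), where X is the count of nonsignificant endpoints."""
    if not 0 <= n_nonsignificant <= n_endpoints:
        raise ValueError("X must lie between 0 and N")
    return (n_endpoints / 2 - n_nonsignificant) ** 3


# (N, X) pairs that appear in Tables 2 and 4.
for n, x in [(6, 5), (4, 4), (14, 9), (12, 11)]:
    print(n, x, consistency(n, x))

# Half significant, half not: the factor vanishes.
balanced = consistency(8, 4)
# All eight sediment endpoints significant: the positive extreme for N = 8.
extreme = consistency(8, 0)
```

The four table pairs give −8, −8, −8, and −125, matching the tabulated consistency values.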

Table 1. Summary data of bottom fish diversity index, resident fish diversity index, and river Index of Biotic Integrity (IBI) scores versus toxicological risk scores for water, sediment, and combined water and sediment from four test stations in the Chesapeake Bay in 1993

| River | Bottom index | Resident index | IBI | Water risk | Sediment risk | Combined risk |
| --- | --- | --- | --- | --- | --- | --- |
| Curtis Creek | 0^a | 0.865 | 29.0 | 21.04 | 137.20 | 158.45 |
| Rock Creek | 1.428 | 1.761 | 35.0 | 17.90 | 76.48 | 47.02 |
| Fishing Bay | 1.760 | 1.086 | 28.5 | 10.46 | 58.24 | 24.73 |
| Wicomico River | 1.609 | 1.243 | 33.8 | 15.20 | 56.09 | 42.28 |

^a No fish were caught.

The number of endpoints measured at each site refers to the number of bioassays (species) and measured parameters (survival, growth, etc.) that are monitored. For statistical and experimental reasons, the number of tests run at each site ideally should be the same. However, given the uncertainties of experimental work, this is not always possible. For example, if mortality is very high, it may not be possible to measure growth.

A simple toxicity score can also be calculated for each discrete sample site. This is the sum of the products of endpoint severity and percent response divided by √N. Toxicity score = {Σ [(severity)(% response)]}/√N.

This score is a useful technique for comparing individual sites and for examining spatial trends in sediment or temporal trends in water samples. These calculations are also instructive in examining the response of the full risk ranking model and its response to inclusion of the interrelated factors of consistency and variation.
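As an illustration, the simple toxicity score can be recomputed from the severity and response values listed for station 1 in Table 3. The function and variable names are hypothetical, and endpoints reported as ND are assumed to be left out of both the sum and N.

```python
from math import sqrt


def toxicity_score(endpoints):
    """Sum of severity * percent response, divided by sqrt(N).

    `endpoints` is a list of (severity, percent_response) pairs; endpoints
    with no data are assumed to be excluded before calling.
    """
    total = sum(severity * response for severity, response in endpoints)
    return total / sqrt(len(endpoints))


# Curtis Creek, station 1 (eight endpoints, Table 3).
curtis_creek = [(3, 58.21), (1, 0), (3, 6), (1, 43.6),
                (3, 51), (1, 30.88), (3, 4), (3, 4)]
# Fishing Bay, station 1 (seven endpoints; one growth endpoint was ND).
fishing_bay = [(3, 0), (3, 0), (1, 0), (3, 0),
               (1, 38.16), (3, 44), (3, 0)]

print(round(toxicity_score(curtis_creek), 2))  # 157.02
print(round(toxicity_score(fishing_bay), 2))   # 64.31
```

Both values agree with the scores tabulated for these stations.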

Data analysis

Three risk ranking scores may be calculated from the resultant data: water only, sediment only, or water and sediment combined. Because water column bioassays are replicated in the laboratory, a risk score can be calculated for each sampling month and the response scores can be averaged by river over months. This approach allows for an assessment of water column contamination effects on pelagic communities or possibly correlation to the abundance of specific species. Sediment samples were collected and tested as discrete samples without laboratory replication due to budgetary constraints. Therefore, a sediment risk score can only be calculated by pooling the data by tributary to obtain the CV and consistency factors.

Plot of mean score generated by 1,000 random simulations of Σ [(severity)(response)(coefficient of variation)]. Insets show cumulative frequency distribution of potential summed scores at selected values of N (number of endpoints).

Sediment and water data may be combined by river system to calculate a toxicity risk factor for the whole system. This calculation allows an assessment of toxic contamination on the entire river system with equal weight given to sediment and water column (assuming equal data availability). It also has the advantage of combining the data into larger subsets, which tends to dampen individual spikes in the data set without eliminating them. To pool the water data by river, the calculated response results were averaged over months. The CV of the mean responses was used in the risk calculation, rather than the mean CV value. Consistency was calculated as before.

The risk scores were contrasted to fish community IBI values. In addition, the Margalef [18] diversity index for bottom trawl community data and resident species (estuarine spawners) was calculated from the fish data.
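For readers unfamiliar with the Margalef index, a minimal sketch follows. It uses the standard form d = (S − 1)/ln N, where S is the number of species and N the total number of individuals. The catch counts are hypothetical. Returning zero for an empty or single-individual catch is an assumption made here, chosen to mirror the zero reported in Table 1 for the station where no fish were caught.

```python
from math import log


def margalef_index(counts):
    """Margalef richness index, (S - 1) / ln(N).

    `counts` holds the number of individuals caught per species.
    Assumption: catches with N <= 1 are scored as 0, because the
    formula is undefined there.
    """
    present = [c for c in counts if c > 0]
    n_species = len(present)
    n_individuals = sum(present)
    if n_individuals <= 1:
        return 0.0
    return (n_species - 1) / log(n_individuals)


# Hypothetical trawl catch: four species, 100 individuals in total.
example_index = margalef_index([50, 30, 15, 5])
print(f"{example_index:.3f}")
```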

Pearson correlation coefficients were calculated for every combination of toxicological risk score (water, sediment, and combined) and IBI, bottom trawl species diversity, and resident species diversity (Table 1). The resident species data included both bottom trawl and beach seine data but the value is dominated by species captured in the beach seines. The IBI score effectively incorporates all resident and migratory species in both the trawl and beach seine data. Calculation of an IBI score with only the trawl data would not be effective because it would incorporate an incomplete set of species, relative to the number of metrics in the IBI derivation. The IBI is designed to reflect the diversity and trophic structure of the entire fish community. This also means that the IBI score should respond to a variety of factors in the habitat, including, but by no means limited to, toxic impacts.
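The correlation step can be sketched directly from the four tributary rows of Table 1. This is an illustration only: the helper name is hypothetical, and coefficients recomputed from the rounded values in Table 1 need not equal those reported by the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Table 1 rows, ordered Curtis Creek, Rock Creek, Fishing Bay, Wicomico River.
metrics = {
    "bottom": [0.0, 1.428, 1.760, 1.609],
    "resident": [0.865, 1.761, 1.086, 1.243],
    "ibi": [29.0, 35.0, 28.5, 33.8],
}
risks = {
    "water": [21.04, 17.90, 10.46, 15.20],
    "sediment": [137.20, 76.48, 58.24, 56.09],
    "combined": [158.45, 47.02, 24.73, 42.28],
}

# Every combination of risk score and community metric.
correlations = {
    (risk, metric): pearson_r(risks[risk], metrics[metric])
    for risk in risks
    for metric in metrics
}
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

With only four tributaries, each coefficient rests on two degrees of freedom, so even large values should be read cautiously.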

Plot of the range of possible consistency parameter values versus the number of nonsignificant endpoints for N = 5–10 (where N is number of endpoints).

RESULTS

The behavior of the model relative to the components of the summed products of response, severity, and variability is illustrated in Figure 2. Computer simulations were run to determine the potential range and distribution of values resulting from random combinations of response (0-100%), coefficient of variation (0-1.0), and severity (1, 2, or 3). Simulations were run at a number of endpoints (N) of 4, 8, 12, 16, and 20 to assess how the model responds to different amounts of data. One thousand simulations were run at each level and the cumulative frequencies of resulting products were tabulated. As the number of endpoints increases, the mean of the summed components increases linearly. The lower bound of values at N = 20 falls well within the range of even the N = 4 results. Thus, a small difference of one or two endpoints between stations is not likely to greatly bias results toward stations with more data.
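A simulation of this kind can be sketched as follows. The text specifies only the ranges of the inputs, so uniform sampling within each range is an assumption, as are the function name and the fixed seed. Under that assumption each endpoint contributes an expected 2 × 50 × 0.5 = 50 to the sum, so the mean should rise by roughly 50 per added endpoint.

```python
import random


def simulate_mean_sum(n_endpoints, n_runs=1000, seed=0):
    """Mean of sum(severity * response * CV) over random simulations.

    Assumption: response ~ U(0, 100), CV ~ U(0, 1), and severity drawn
    with equal probability from 1, 2, 3.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        total = 0.0
        for _ in range(n_endpoints):
            severity = rng.choice((1, 2, 3))
            response = rng.uniform(0.0, 100.0)
            cv = rng.uniform(0.0, 1.0)
            total += severity * response * cv
        totals.append(total)
    return sum(totals) / n_runs


means = {n: simulate_mean_sum(n) for n in (4, 8, 12, 16, 20)}
for n, value in means.items():
    print(n, round(value, 1))
```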

For a given response scenario, the behavior of the consistency factor is illustrated in Figure 3. In the example of the sediment bioassays, using four species and measuring two endpoints for each (N = 8), the maximum value of the parameter is ± 64. This maximum value, relative to the extremes of the potential toxicity scores, is relatively small. If all bioassays were statistically significant and response values were in the range of 80 to 90%, the sum of the toxicity scores could reach into the range of 1,000+. Conversely, where average responses are in the range of 10%, the sum of toxicity scores would be less than 200. A consistency adjustment of 64 would be potentially significant relative to summary scores from other sites in this case. In the case of combined water and sediment scores (with seven test species in this case, N = 14) the maximum consistency score is ± 343. At low-level responses, the consistency factor could be larger than the summed response scores. The consistency score increases rapidly as the number of endpoints increases. At an N of 20 the consistency value maximum is ± 1,000. Even at high response values (summed response values between 3,000 and 4,000), the consistency value may have a larger proportional impact on the final score than with smaller data sets.

Plot of the effect of dividing the summed score by √ N for N = 2–12 (where N is number of endpoints).

The impact of division by √ N is illustrated in Figure 4. The influence is straightforward. All things being equal, sites with more measured endpoints will tend to have higher scores. Division by √ N partially corrects for this bias. If two stations had the same summed score, but a differing number of endpoints, the reason must be due to higher response and/or variability levels in the bioassay data from the site with less data. It is logical, therefore, for that site to have a risk score that reflects the risk of greater toxic impact. Division by √ N results in the final site risk score being higher for the site with fewer endpoints in this situation. Although this factor is primarily intended to counteract bias due to differing amounts of data at different stations, it also tends to moderate scores driven by extreme consistency factors. Extremely large (positive or negative) consistency factors by necessity will come from a high number of endpoints.
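Both effects of the √N divisor can be shown with a few lines of arithmetic. The summed score of 300 is hypothetical and the function names are illustrative. The maximum consistency values are those implied by the cube definition given earlier.

```python
from math import sqrt


def normalize(summed_score, n_endpoints):
    """Divide a summed score by the square root of the number of endpoints."""
    return summed_score / sqrt(n_endpoints)


def max_consistency(n_endpoints):
    """Largest possible consistency value: every endpoint significant."""
    return (n_endpoints / 2) ** 3


# Same hypothetical summed score, different amounts of data.
fewer_endpoints = normalize(300.0, 6)
more_endpoints = normalize(300.0, 12)
print(round(fewer_endpoints, 2), round(more_endpoints, 2))

# Contribution of the extreme consistency value after division by sqrt(N).
moderated = {n: normalize(max_consistency(n), n) for n in (8, 14, 20)}
for n, value in moderated.items():
    print(n, max_consistency(n), round(value, 1))
```

The site with six endpoints ends up with the higher final score, and the extreme consistency value of 1,000 at N = 20 contributes roughly 224 once divided.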

The behavior of the model was tested with toxicity data from the laboratory bioassays. Examples of bioassay results are included in Tables 2 to 4. These are included for illustrative purposes only, to show how the data were manipulated by the ranking scheme. The tables only reflect a portion of the database.

Briefly, the bioassay results demonstrated that statistically significant mortality or reduced growth did not occur in the water column bioassays. Growth rates were variable between sampling months at a given station, but with no apparent pattern. In the July copepod assays, survival to adult stages was significantly reduced in Curtis Creek relative to controls and the other stations. The Fishing Bay assay had very low survival and reproduction in both ambient and reference water, which was attributed to relatively high test salinities (11 ppt), to which the stock cultures were not adequately acclimated. Reproduction was lowest in Curtis Creek and Rock Creek.

Table 2. Typical response data and calculated toxicological risk ranking score for water bioassays. Data are from July 1993 for two of four tributaries sampled in 1993 (CC = Curtis Creek, FB = Fishing Bay, CV = coefficient of variation, ND = no data)

| Site | Species | Endpoint | Severity | % Response | CV | Significant | Subscore | No. endpoints | No. nonsignificant | Consistency | Risk score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CC | Eurytemora affinis | Mortality | 3 | 86 | 0.22 | Yes | 55.73 | 6 | 5 | −8 | 57.18 |
| CC | E. affinis | Reduced reproduction | 2 | 79.2 | 0.53 | | 83.16 | | | | |
| CC | Cyprinodon variegatus | Mortality | 3 | 0 | 0 | | 0 | | | | |
| CC | C. variegatus | Reduced growth | 1 | 0 | 0.24 | | 0 | | | | |
| CC | Palaemonetes pugio | Mortality | 3 | 0 | 0.87 | | 0 | | | | |
| CC | P. pugio | Reduced growth | 1 | 23.97 | 0.38 | | 9.18 | | | | |
| FB | E. affinis | Mortality | 3 | ND | | | | 4 | 4 | −8 | 6.65 |
| FB | E. affinis | Reduced reproduction | 2 | ND | | | | | | | |
| FB | C. variegatus | Mortality | 3 | 3.3 | 1.41 | | 13.99 | | | | |
| FB | C. variegatus | Reduced growth | 1 | 5.8 | 1.26 | | 7.29 | | | | |
| FB | P. pugio | Mortality | 3 | 0 | 1.73 | | 0 | | | | |
| FB | P. pugio | Reduced growth | 1 | 0 | 1.67 | | 0 | | | | |
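The risk scores in Table 2 can be approximately reproduced from the tabulated subscores. The sketch below treats consistency as an additive term and divides by √N, as described in the ranking model section. Function and variable names are hypothetical, and endpoints with no data are assumed to be excluded from N.

```python
from math import sqrt


def risk_score(subscores, n_nonsignificant):
    """(sum of subscores + consistency) / sqrt(N).

    `subscores` holds severity * % response * CV for each endpoint with
    data; consistency is the cube of (N/2 - X).
    """
    n = len(subscores)
    consistency = (n / 2 - n_nonsignificant) ** 3
    return (sum(subscores) + consistency) / sqrt(n)


# July 1993 water bioassay subscores from Table 2.
curtis_creek_july = [55.73, 83.16, 0, 0, 0, 9.18]  # N = 6, X = 5
fishing_bay_july = [13.99, 7.29, 0, 0]             # N = 4, X = 4 (copepod ND)

cc_score = risk_score(curtis_creek_july, 5)
fb_score = risk_score(fishing_bay_july, 4)
print(round(cc_score, 2), round(fb_score, 2))
```

The Curtis Creek value comes out at 57.18, as tabulated. The Fishing Bay value comes out at about 6.64 against the tabulated 6.65, a difference that is presumably due to rounding in the published subscores.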

Chemical analyses of water samples did not reveal detectable semivolatile organic compounds, chlorinated pesticides, or polychlorinated biphenyls (PCBs) at any station. Of the metals, only silver was detected at elevated levels in Curtis Creek, Rock Creek, and Wicomico River samples (26, 21, and 28 μg/L [total], respectively).

Sediment bioassay results from Curtis Creek, Rock Creek, and the Wicomico River demonstrated elevated mortality of L. dytiscus. Leptocheirus plumulosus and S. benedicti exhibited increased mortality in Curtis Creek sediments. The sheepshead minnow egg test produced significant egg and larval mortality at both Curtis Creek and Fishing Bay. No significant reduction in growth was observed for L. dytiscus for any of the test sites. Leptocheirus plumulosus displayed significant reduction in growth rate at Curtis Creek and Wicomico River sites, whereas S. benedicti demonstrated reduced growth at Rock Creek.

Chemical concentrations on composited sediment samples were below detection for all organic analytes in the Wicomico River and Fishing Bay samples. Curtis Creek and Rock Creek sediments were contaminated with high molecular weight polycyclic aromatic hydrocarbons (PAHs) (Table 5). Petroleum hydrocarbons were found at the highest levels in Curtis and Rock Creeks, with elevated levels in Wicomico River sediments also, relative to Fishing Bay. All pore water SEM heavy metal levels in Curtis Creek and Rock Creek were high relative to Fishing Bay and Wicomico River (Table 6). Heavy metal concentrations in pore waters were an order of magnitude lower in Fishing Bay sediments and below detection in Wicomico River sediments, except for low levels of zinc.

Table 3. Typical response data and calculated simple toxicity score for sediment bioassays. Data are from station 1 for two of four tributaries sampled in 1993 (CC = Curtis Creek, FB = Fishing Bay, ND = no data)

| Site | Species | Endpoint | Severity | Response | Subscore | N | Score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CC | Lepidactylus dytiscus | Mortality | 3 | 58.21 | 174.63 | 8 | 157.02 |
| CC | L. dytiscus | Reduced growth | 1 | 0 | 0 | | |
| CC | Leptocheirus plumulosus | Mortality | 3 | 6 | 18 | | |
| CC | L. plumulosus | Reduced growth | 1 | 43.6 | 43.6 | | |
| CC | Streblospio benedicti | Mortality | 3 | 51 | 153 | | |
| CC | S. benedicti | Reduced growth | 1 | 30.88 | 30.88 | | |
| CC | Cyprinodon variegatus | Mortality | 3 | 4 | 12 | | |
| CC | C. variegatus | Reduced hatch | 3 | 4 | 12 | | |
| FB | L. dytiscus | Mortality | 3 | 0 | 0 | 7 | 64.31 |
| FB | L. dytiscus | Reduced growth | 1 | ND | | | |
| FB | L. plumulosus | Mortality | 3 | 0 | 0 | | |
| FB | L. plumulosus | Reduced growth | 1 | 0 | 0 | | |
| FB | S. benedicti | Mortality | 3 | 0 | 0 | | |
| FB | S. benedicti | Reduced growth | 1 | 38.16 | 38.16 | | |
| FB | C. variegatus | Mortality | 3 | 44 | 132 | | |
| FB | C. variegatus | Reduced hatch | 3 | 0 | 0 | | |

The water column risk scores for each sampling period are shown in Figure 5. Values varied from slightly negative to above 50 depending on month. The spike in Curtis Creek in July is primarily due to E. affinis mortality and reproduction effects. The spike in June in the Wicomico River is entirely due to fish growth data. Pooled risk values are shown in Figure 6. With a larger database, the overall Wicomico River risk score becomes a much lower value due to a lower consistency value.

The sediment risk scores are shown in Figure 7. Again, the Curtis Creek score is higher than any other station. The individual toxicity scores for discrete sediment samples are shown in Figure 8. The values vary widely from site to site. Recalling that Curtis Creek and Rock Creek were sampled at only an upstream and a downstream location, it is obvious that the downstream Curtis Creek location accounts for the vast majority of the response score. A t test of the toxicity scores from the upstream versus downstream Curtis Creek sites was significant in a one-tailed test (t = 2.47, 3 d.f., p = 0.05).

Table 4. Final response data and calculated toxicological risk ranking scores for combined water (pooled by month) and sediment (pooled by river) bioassays. Data are for two of four tributaries sampled in 1993 (CC = Curtis Creek, FB = Fishing Bay, ND = no data)
Site  Medium    Species                  Endpoint              Severity  % Response  CV    Significant  Subscore  No. endpoints  No. nonsignificant  Consistency  Risk score
CC    Sediment  Lepidactylus dytiscus    Mortality             3         52.08       0.26  Yes          40.62     14             9                   -8           149.65
CC    Sediment  L. dytiscus              Reduced growth        1         0           0                  0
CC    Sediment  Leptocheirus plumulosus  Mortality             3         22          1.02  Yes          67.32
CC    Sediment  L. plumulosus            Reduced growth        1         44.67       0.63  Yes          28.14
CC    Sediment  Cyprinodon variegatus    Mortality             3         44          0.94  Yes          124.08
CC    Sediment  C. variegatus            Reduced hatch         3         7.2         0.8                17.28
CC    Sediment  Streblospio benedicti    Mortality             3         19.6        1.41               82.91
CC    Sediment  S. benedicti             Reduced growth        1         26.13       1.01               26.39
CC    Water     Palaemonetes pugio       Mortality             3         4.58        1.65               22.75
CC    Water     P. pugio                 Reduced growth        1         13.84       0.57               7.93
CC    Water     C. variegatus            Mortality             3         3.17        1.22               11.63
CC    Water     C. variegatus            Reduced growth        1         0           2.68               0
CC    Water     Eurytemora affinis       Mortality             3         86          0.22  Yes          55.73
CC    Water     E. affinis               Reduced reproduction  2         79.2        0.53               83.16
FB    Sediment  L. dytiscus              Mortality             3         8.87        1.82               48.43     12             11                  -125         23.03
FB    Sediment  L. dytiscus              Reduced growth        1         3.16        1.41               4.46
FB    Sediment  L. plumulosus            Mortality             3         9.8         0.54               15.88
FB    Sediment  L. plumulosus            Reduced growth        1         1.5         1.24               1.86
FB    Sediment  C. variegatus            Mortality             3         31.2        0.77  Yes          72.07
FB    Sediment  C. variegatus            Reduced hatch         3         3.6         1.51               16.31
FB    Sediment  S. benedicti             Mortality             3         2.6         1.62               12.64
FB    Sediment  S. benedicti             Reduced growth        1         19.86       1.11               22.04
FB    Water     P. pugio                 Mortality             3         0.21        9.95               6.21
FB    Water     P. pugio                 Reduced growth        1         0           3.69               0
FB    Water     C. variegatus            Mortality             3         2.99        0.54               4.89
FB    Water     C. variegatus            Reduced growth        1         0           1.47               0
FB    Water     E. affinis               Mortality             3         ND
FB    Water     E. affinis               Reduced reproduction  2         ND
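
The tabulated values in Table 4 appear to be numerically consistent with a simple arithmetic. Each subscore matches severity × % response × CV to within rounding, the consistency term matches (N/2 − number nonsignificant) cubed, and the risk score matches (sum of subscores + consistency) divided by the square root of N, where N is the number of endpoints. These relationships are inferred here from the table itself and are an assumption, not a quotation of the model definition. The function and variable names in the following sketch are illustrative only.

```python
import math

def risk_score(subscores, n_nonsignificant):
    """Pooled risk score using formulas inferred from Table 4 (not quoted from the paper)."""
    n = len(subscores)                             # number of measured endpoints
    consistency = (n / 2 - n_nonsignificant) ** 3  # inferred consistency term
    score = (sum(subscores) + consistency) / math.sqrt(n)
    return consistency, score

# Subscores transcribed from Table 4 (endpoints with no data are omitted).
curtis_creek = [40.62, 0, 67.32, 28.14, 124.08, 17.28, 82.91, 26.39,
                22.75, 7.93, 11.63, 0, 55.73, 83.16]
fishing_bay = [48.43, 4.46, 15.88, 1.86, 72.07, 16.31, 12.64, 22.04,
               6.21, 0, 4.89, 0]

cc_consistency, cc_score = risk_score(curtis_creek, n_nonsignificant=9)
fb_consistency, fb_score = risk_score(fishing_bay, n_nonsignificant=11)
print(cc_consistency, round(cc_score, 2))
print(fb_consistency, round(fb_score, 2))

# First Curtis Creek row: severity * % response * CV, compared with subscore 40.62.
print(round(3 * 52.08 * 0.26, 2))
```

Under these assumptions the sketch reproduces the tabulated consistency values (-8 and -125) and risk scores (149.65 and 23.03) for Curtis Creek and Fishing Bay.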

The risk scores for combined water and sediment data are shown in Figure 9. The basic patterns seen in the water and sediment scores are reflected in the combined scores. The score from the Curtis Creek site is more than twice as high as that of any other station. Fishing Bay had the lowest risk score. Because the risk calculations do not produce quantified measures of variability, there is no way to test one site against another statistically in the Wicomico River and Fishing Bay data.

Table 5. Results of chemical analyses of composite sediment samples, collected September 9–13, 1993, from four sites in the Chesapeake Bay for priority pollutant semivolatile acid/base neutral organic compounds. Only those compounds found above detection limits are listed. Units are in μg/kg unless otherwise noted
Chemical                        Curtis Creek  Rock Creek  Fishing Bay  Wicomico River  NOAA ER-L
Fluoranthene                    540           480         BDL          BDL             600
Pyrene                          510           420         BDL          BDL             350
Chrysene                        300           290         BDL          BDL             400
Benzo[a]anthracene              220           BDL         BDL          BDL             230
Benzo[b]fluoranthene            320           290         BDL          BDL             NA
Benzo[k]fluoranthene            300           380         BDL          BDL             NA
Benzo[a]pyrene                  250           BDL         BDL          BDL             400
Petroleum hydrocarbons (mg/kg)  509           680         180          428             NA
  • BDL = below detection limit; NA = data not available.
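
The NOAA ER-L column invites a simple screening comparison against the measured concentrations. The following sketch flags compounds whose measured value is above the listed ER-L. The comparison rule (measured greater than ER-L) and all names in the code are assumptions made for illustration. Only the two sites with detectable PAHs are included, and compounds without a listed ER-L are skipped.

```python
# Concentrations in ug/kg transcribed from Table 5; None marks BDL (below detection limit).
ER_L = {"Fluoranthene": 600, "Pyrene": 350, "Chrysene": 400,
        "Benzo[a]anthracene": 230, "Benzo[a]pyrene": 400}

measured = {
    "Curtis Creek": {"Fluoranthene": 540, "Pyrene": 510, "Chrysene": 300,
                     "Benzo[a]anthracene": 220, "Benzo[a]pyrene": 250},
    "Rock Creek": {"Fluoranthene": 480, "Pyrene": 420, "Chrysene": 290,
                   "Benzo[a]anthracene": None, "Benzo[a]pyrene": None},
}

def exceedances(site_values, thresholds):
    """Return compounds whose measured value is above the screening threshold."""
    return sorted(name for name, value in site_values.items()
                  if value is not None and name in thresholds
                  and value > thresholds[name])

for site, values in measured.items():
    print(site, exceedances(values, ER_L))
```

With the values as tabulated, only pyrene is above its ER-L, at both Curtis Creek and Rock Creek.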

The correlation coefficients for the risk scores and the fish community metrics from 1993 are shown in Table 7. The bottom trawl diversity index was strongly correlated with the sediment toxicological risk score and the combined score, which was driven by the sediment score. The resident species diversity index and the overall IBI did not have strong correlations with the toxicological risk scores. The strong relationship between all the toxicological risk scores and bottom diversity index can be seen in Figures 10–12. The resident species diversity index demonstrates a similar trend with bottom and combined risk scores, but they are not statistically significantly correlated.

Table 6. Mean pore-water simultaneously extracted metals values for sediment pore-water samples, collected September 9–13, 1993, from four sites in the Chesapeake Bay. Detection limits for each metal are also listed. Units are in μmol/g. (Data from Adolphson et al. [unpublished data])
Site              Cadmium  Lead    Copper  Nickel  Zinc    Sum
Curtis Creek      0.322    0.360   1.051   0.144   2.966   4.575
Fishing Bay       0.017    0.036   0.022   0.022   0.326   0.423
Rock Creek        0.057    0.383   0.870   0.292   4.403   6.006
Wicomico River    0.000    0.000   0.000   0.000   0.023   0.023
Detection limits  0.0003   0.0050  0.0006  0.0004  0.0005
  • Mercury values for all sites were <0.001 μmol/g.

DISCUSSION

Overall, bioassay results indicate that the assays are sensitive enough to identify biologically significant contamination. The risk results demonstrate that water column toxicity is not as severe as localized sediment toxicity but that water column effects may be more widespread and variable over time. The risk ranking procedure results in comparable data sets between sites even though not all sites have the same number of tests. For example, growth could not be assessed in some Curtis Creek sediment tests due to high levels of mortality. This site nevertheless ranked far above all other sites because of its relatively high consistency, its high response rates, and the fact that the observed effects were principally mortality, which has the highest severity factor. It is recognized that some of these values are arbitrary and need further refinement as more data become available.

Both severity and degree of response are straightforward parameters if only one type of bioassay is run at each test site. However, given a suite of tests run at each site, some system of factoring in the relative sensitivity of different species and different endpoints needs to be developed for these two parameters. Adjustment of degree of response by a simple division by the frequency of positive responses would bias the results toward those endpoints with low sensitivity. Also, a factor for sensitivity and severity must be carefully weighted, as they are likely to be confounded in response patterns. That is, endpoints that are severe (e.g., mortality) may be less sensitive than other endpoints. This may not always be the case, depending on the mode of action of the toxicants present in the environment and their synergistic interactions with the test species.

Fig. 5. Risk scores for individual water column samples from four Chesapeake Bay tributaries evaluated for ambient toxicity in 1993. Individual bars from left to right represent results from May to September for each tributary.

The consistency factor can drive the score to be negative if there are a large number of endpoints and/or the response values are very low. The scoring method for toxicity ranking demonstrates site- and/or sample-specific differences between stations and sample times. Inclusion of factors for consistency and variability provides additional information for risk ranking of toxic impact, which is sensitive to the amount of data and the agreement between results from different bioassays.

No individual model parameter can be seen to drive the model output. Table 8 lists regression statistics for each parameter versus risk score. With one exception, the coefficients of determination (r²) are low, which indicates weak relationships between final risk scores and individual parameter values. Thus, the model appears to incorporate all the parameters equally. Severity values should always be the same for all rivers, unless some endpoints are not measured in some systems. The slopes for response versus risk regressions were all significant. This is to be expected because higher response values should be reflected in higher risk scores. However, the r² values were relatively low, indicating that the final risk scores reflect more than just the level of response. Observed bioassay coefficients of variability were inversely related to risk, but did not have a strong influence on the risk scores. Consistency is a function of the number of endpoints and the number of significant endpoints; consistency and N are therefore statistically confounded. When used as the test parameter, consistency appears to correlate strongly with risk score for combined data, but there were only four data points (from four bodies of water) and consistency would be expected to track with the final risk score because it is driven by the number of significant endpoints. Consistency does not correlate well with risk score in the water column data when calculated on a month-by-month basis (N = 20), and the r² value for sediment (N = 4) is only about 35%. The consistency factor was designed to act as a counterweight to response values for the purpose of damping out rare spikes, while not influencing scores from uniformly consistent response results. The consistency factor will tend to enhance the score of highly polluted sites and lower the scores of uncontaminated sites.

The pooled risk scores for water and sediments from Curtis Creek are at least twice as high as the next nearest score. The consistency factor value for Curtis Creek was near zero. The values in the other sites were strongly negative. Stepwise regression [19] of severity, response, CV, and consistency with risk score identified response as the only significant factor in the water column data. Both response and consistency were identified as significant in the final stepwise model for the sediment data, and only consistency was included in the model for the combined water and sediment data set.

Fig. 6. Mean risk scores for water column bioassay data from four Chesapeake Bay tributaries evaluated for ambient toxicity in 1993.

Fig. 7. Risk scores for pooled sediment data from four Chesapeake Bay tributaries evaluated for ambient toxicity in 1993.

Fig. 8. Toxicity scores for individual sediment samples from four Chesapeake Bay tributaries evaluated for ambient toxicity in 1993.

Fig. 9. Risk scores for combined sediment and water data from four Chesapeake Bay tributaries evaluated for ambient toxicity in 1993.

Data are also included in Table 8 to contrast response scores versus variability. Biological response data have a tendency to have relatively higher variability at low and threshold effects levels. This does not appear to be a significant complication in this case.

The combined scores are not merely the sum of the sediment and water scores. Numerically, the combined Curtis Creek risk score is greater than the water or sediment scores. The combined Wicomico and Fishing Bay scores are in between their water and sediment values. The combined Rock Creek risk score remains at or above the individual water and sediment values, which may indicate some form of threshold value that integrates data volume and level of response in the model. Average response values were not greatly higher in Rock Creek than in Fishing Bay or the Wicomico River, but they were more consistent.

Some information is lost in the process of pooling the data to calculate a risk score on a river-by-river basis. For example, the copepod bioassays were not run as often as the other water column tests but they are given equal weight in terms of contribution to the final score because data are summed by river for each bioassay endpoint. This does have the advantage of smoothing out individual spikes in the data set. It is instructive, however, to look at the spikes by assessing risk scores (or toxicity scores in the case of sediment) on a sample-by-sample basis. This approach displays the occurrence of transient spikes in the Wicomico water column and the downstream gradient in Curtis Creek sediment toxicity.

Table 7. Pearson correlation coefficients and p values (in parentheses) for toxicological risk scores and fish community metrics from four sites in the Chesapeake Bay in 1993 (IBI = Index of Biotic Integrity)
Risk score     IBI score  Bottom diversity index  Resident diversity index
Water risk     0.1980     -0.8290                 -0.0090
               (0.8023)   (0.1706)                (0.9908)
Sediment risk  -0.3870    -0.9910                 -0.4610
               (0.6126)   (0.0092)                (0.5387)
Combined risk  -0.3790    -0.9980                 -0.5490
               (0.6213)   (0.0018)                (0.4513)
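
The p values in Table 7 can be recomputed from the correlation coefficients if a two-tailed t test on Pearson r with n − 2 = 2 degrees of freedom (four sites) is assumed. The test used is not stated with the table, so this is an assumption, and the function name in the following sketch is illustrative.

```python
import math

def pearson_p_two_tailed_n4(r: float) -> float:
    """Two-tailed p value for a Pearson r from n = 4 points (2 degrees of freedom)."""
    t = r * math.sqrt(2.0) / math.sqrt(1.0 - r * r)   # t = r*sqrt(n-2)/sqrt(1-r^2)
    cdf = 0.5 + t / (2.0 * math.sqrt(2.0 + t * t))    # closed-form t CDF for 2 d.f.
    return 2.0 * min(cdf, 1.0 - cdf)

# (r, tabulated p) pairs transcribed from Table 7.
table7 = [(0.1980, 0.8023), (-0.8290, 0.1706), (-0.0090, 0.9908),
          (-0.3870, 0.6126), (-0.9910, 0.0092), (-0.4610, 0.5387),
          (-0.3790, 0.6213), (-0.9980, 0.0018), (-0.5490, 0.4513)]

for r, p_table in table7:
    p = pearson_p_two_tailed_n4(r)
    print(f"r = {r:+.4f}  recomputed p = {p:.4f}  tabulated p = {p_table:.4f}")
```

With only four sites, the two-tailed p value reduces algebraically to 1 − |r|, so a correlation must exceed 0.95 in absolute value to reach p < 0.05. The tabulated p values agree with this to within the rounding of r.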

Fig. 10. Mean risk scores for water samples from four tributaries of Chesapeake Bay sampled in 1993 versus fish community metrics. (Bottom diversity index, BDI; resident diversity index, RDI; Index of Biotic Integrity, IBI.)

Fig. 11. Pooled risk scores for sediment samples from four tributaries of Chesapeake Bay sampled in 1993 versus fish community metrics. (Bottom diversity index, BDI; resident diversity index, RDI; Index of Biotic Integrity, IBI.)

Fig. 12. Combined risk scores for water and sediment samples from four tributaries of Chesapeake Bay sampled in 1993 versus fish community metrics. (Bottom diversity index, BDI; resident diversity index, RDI; Index of Biotic Integrity, IBI.)

It should be noted that the final risk ranking values should be interpreted with respect to specific results in laboratory data. The response scores for growth in the Wicomico River sediment bioassays can be attributed to grain size and/or TOC effects on both amphipod species. These effects tend to artificially increase the Wicomico River score slightly. In the case of Fishing Bay, there is no reproductive response score contribution from the E. affinis bioassays. This is due to the lack of acceptable control results. Therefore, the response score cannot be calculated, but reproductive failure is clearly indicated by the data for this system. Conversely, it is unknown if the fish egg mortality data are influenced by sediment ammonia levels. Pore-water levels were above lethal levels for sheepshead minnow larvae [20], but ammonia was not measured in the overlying water in the bioassays.

Finally, the sampling sites were not selected randomly. Curtis Creek is an area of known contamination problems. The toxicological ranking model clearly reflects this. Furthermore, the depauperate bottom fish community seen in the absence of a consistently anoxic condition indicates a chemical contaminant problem, especially in the sediment [6]. Water quality problems in Rock Creek may have as much to do with nutrient enrichment as with contaminants from urban runoff and marina operations [21]. The Wicomico River was considered to be relatively clean, but with potential agricultural runoff impacts. Fishing Bay has no history of contamination and the watershed is largely undeveloped.

Table 8. Regression statistics for risk ranking model parameters versus risk scores for water, sediment, and combined sediment and water from toxicity tests at four stations in the Chesapeake Bay in 1993. Also included are regression statistics for response and coefficient of variability
Parameter             Slope     r² (%)
Water
  Severity            0.61      0.17
  Response            0.33**    19.21
  CV                  -1.57     2.97
  Consistency         -0.16     -0.64
  Response versus CV  -0.02     3.52
Sediment
  Severity            -1.18     0.12
  Response            1.00**    21.87
  CV                  -22.27    11.87
  Consistency         1.81      34.90
  Response versus CV  -0.01*    18.03
Sediment and water
  Severity            -0.17     0.00
  Response            1.04**    16.93
  CV                  -23.02    5.74
  Consistency         1.00*     92.86
  Response versus CV  -0.01     23.54
  • * Slope significantly different from 0 at p = 0.05; ** slope significantly different from 0 at p = 0.01.
  • CV = coefficient of variation.

The correlations between the toxicological scores and the IBI metrics are consistent with these interpretations. The risk scores are correlated with the trawl data, particularly the sediment risk scores, as opposed to the water column risk scores (Table 7). The resident species metrics were not well correlated with the risk scores. The definition of resident species as estuarine spawners is important in this regard. This metric is dominated by species taken in the beach seines in terms of number of species and individuals, particularly in Curtis Creek, where no fish were caught in the trawls in 1993. The “resident species” are thus not living in close contact with the sediment at the bottom of the channels where the sediment samples were taken. The toxicological data clearly demonstrate that sediment toxicity is a dominant problem in Curtis Creek, but that the water column scores were marginal there, as in all systems.

The low IBI scores and the relatively low number of resident individuals taken in the Fishing Bay system do not appear to be due to toxic contamination. Low IBI scores in this area [6] may be due to habitat deficiencies, such as the absence of submerged aquatic vegetation in shallow areas. This interpretation is further supported by the fact that the bottom species diversity is relatively high in Fishing Bay (Table 1). The value of the toxicological risk ranking approach presented here is that it is equally able to indicate where toxic contamination is and is not a likely impact, in the face of indications of impaired community health. Thus, although trends between the IBI and the toxicological risk ranking scheme exist, closer statistical associations are observed between the risk scores and specific metrics in the fish community database. Based on these results, three predictions can be made:
  1. Areas with high IBI scores and/or diversity indices will always have low toxicological risk scores, unless populations have adapted to contaminated conditions.

  2. Areas with high toxicological risk scores will always have low IBI scores and/or diversity indices, unless populations have adapted to contaminated conditions.

  3. Areas with low IBI scores and/or diversity indices may or may not have high toxicological risk scores, depending on the nature of the reason for poor fish communities.

As studies progress, more sites will be included in the analyses. The IBI database, which spans several years from specific locations, will be used to test these hypotheses as more ambient toxicity data become available.

Additional work needs to be done to examine how well the toxicological risk ranking results from different years can be integrated. In addition, an assessment is needed on the importance of sampling intensity, relative to the size of the river system, on toxicological risk score sensitivity.

Acknowledgements

Sediment bioassays and pore-water chemical analyses were conducted by Ray Alden, Pete Adolphson, and Joe Winfield. Copepod bioassays were conducted by David Wright, Gena Coelho, and John Magee. Invaluable field sampling assistance was provided by Margaret McGinty, Sandy Ives, Doug Randle, and Bill Rodney, under the direction of Stephen Jordan. Additional field assistance was provided by Randy Kerhin. Tyler Hartwell and Andy Clayton assisted with computer simulations for model sensitivity assessment.
