Volume 18, Issue 6 pp. 1298-1303
Hazard/Risk Assessment

Empirical assessment of an ambient toxicity risk ranking model's ability to differentiate clean and contaminated sites

S. Ian Hartwell

Corresponding Author


Maryland Department of Natural Resources, Chesapeake Bay Research and Monitoring Division, Annapolis, Maryland 21401, USA

First published: 02 November 2009

Abstract

Ambient toxicity results were used to investigate statistical implications of sampling design options for an existing toxicological risk ranking model. A battery of water column and sediment toxicity bioassays measuring lethal and sublethal endpoints was employed with fish, invertebrates, vascular plants, and bacteria. Bioassays were conducted monthly from July through September 1995 with water from three stations in the South River estuary, Maryland, USA, and a reference station. Sediment bioassays were conducted in August with five discrete samples taken from each station. Water column assays indicated low-level contamination at the upstream stations. Sediment bioassays yielded greater toxic responses than water column assays. The toxicological risk ranking model identified a strong toxicity gradient from upstream to downstream in the South River. Toxicological risk scores for the downstream station in the South River were comparable to the reference station. Statistical analyses demonstrated that the risk ranking model does not require field sample replication for tributary-wide assessment. Characterization of an entire estuary does require broad coverage to assess the system as a whole, however. Threshold levels of toxic impact can be quantified with the model.

INTRODUCTION

Previous studies in Chesapeake Bay, Maryland, USA, have demonstrated correlations between measures of ambient toxicity and fish community indices on a tributary-wide basis [1, 2]. Those studies were also used to field validate a toxicological risk ranking model designed to express the combined influence of endpoint severity, response, variability, and interassay consistency of observed toxicological results in a single metric that could be used in coordination with biological metrics [3]. This paper describes the results of intensive sampling for ambient toxicity in a single estuarine tributary of Chesapeake Bay. The project was undertaken to address the sensitivity of the sampling design and the statistical power of the toxicological risk ranking model employed in this program.

The previous studies sampled water monthly from a central location on the assumption that, within an estuarine tributary, the water column was relatively homogeneous spatially but varied temporally. Sediment was sampled only once during the testing season, but from widely separated sites down the length of each tributary. The rationale for sampling sediment in this way was the assumption of low temporal variation in sediment relative to the water column and the aim of examining sediment contamination effects on a system-wide basis. This is consistent with the fish community sampling approach, with which the sediment results are contrasted. This approach would also allow for an assessment of sediment contamination impacts on bottom communities where data are available. Based on 1993 data from two tributaries in the Baltimore, Maryland area, where replicate upstream and downstream sediment samples were taken, these assumptions appear to be accurate, and toxicological gradients could be detected [1]. However, the number of samples did not allow for a rigorous statistical assessment of the results. Therefore, the relative sensitivity of the risk ranking model and its ability to distinguish between marginally contaminated sites within a tributary had not been quantified. Based on the first study, four tidal tributaries were selected for paired ambient toxicity/fish community sampling the following year to assess the impact of watershed urbanization on receiving stream habitat quality. Results from these studies indicated a distinct upstream to downstream gradient of contaminant effect in the South River. This system was selected for more intensive sampling to generate data for analysis of statistical power and intrasite variability in a medium-sized estuarine tributary.

The South River watershed does not have large concentrations of heavy industry or extremely dense urban areas with conspicuous point sources. However, the watershed is undergoing urban development and is typical of tributaries of the northwestern shore of Chesapeake Bay.

METHODS

Water samples were collected at three stations used in the earlier study (Fig. 1). Sample sites were located away from known point sources and mixing zones. Control samples were taken from a station in the Wicomico River estuary (a tributary to the Potomac River). Depth-integrated water samples were collected by boat in July, August, and September 1995. Samples were taken twice from each site during the course of each 7-d bioassay test to provide fresh renewal water. Samples were filtered through 37-μm mesh and adjusted to a salinity of 15 ml/L with hypersaline brine. All water was stored in amber bottles at 4°C until use. Standard water-quality parameters (salinity, dissolved oxygen, temperature, and Secchi depth) were recorded at the time of collection.

Sediment samples were taken from five sites at each station in August. The first site was located at the water sample and fish bottom trawl location. A 100-m square grid was centered on this location, and a discrete sediment sample was taken at each corner of the square. Throughout the text, ‘station’ refers to the entire sampling location in the river and ‘site’ refers to the discrete locations where sediment was taken at each station. Sediment samples were kept as discrete samples for testing and analysis, which provided for true field replication for statistical purposes. The central site was located in the channel of the tributary. Sampling at the corners of the square provided samples both upstream and downstream of the central location and away from the channel center. The absolute width of the channel varied from station to station. Samples were held out of direct sunlight at 4°C and used within 2 weeks. Sampling techniques used were the same as in previous years [1, 2]. Sediment samples were obtained from the top 2 cm of grab samples using a petite ponar sampler.

Fig. 1. Map showing South River sampling stations and location of the South and Wicomico Rivers and rivers sampled in previous years in the Chesapeake Bay region (inset).

The water column tests comprised bioassays of 7-d larval sheepshead minnow (Cyprinodon variegatus) survival and growth; 7-d juvenile grass shrimp (Palaemonetes pugio) survival and growth; copepod (Eurytemora affinis) life-cycle survival and reproduction; and bacterial luminosity (Microtox®, AZUR, Carlsbad, CA, USA). Bioassays measuring 14-d growth and reproduction were also conducted in August with the submerged aquatic vegetation (SAV) species sago pondweed (Potamogeton pectinatus). Fish and grass shrimp bioassays were conducted at the Maryland Department of Natural Resources' Aquatic Toxicology Laboratory, Glen Burnie, Maryland, USA. The copepod bioassays were conducted at the University of Maryland, Chesapeake Biological Laboratory, Solomons, Maryland, USA. The vascular plant bioassays were conducted at Anne Arundel Community College, Arnold, Maryland, USA. All culture, handling, and testing methods have been previously described in detail [2].

Culture, maintenance, and bioassay procedures for grass shrimp and sheepshead minnow used standard methods [4, 5]. Copepod methods were based on the methods of Hartwell et al. [6]. Copepod development was followed from 24-h-old nauplii through one life cycle, approximately 10 to 14 d. Numbers of surviving adults, eggs, nauplii, and subadults from the F1 generation were also counted at the end of the test. The culture and bioassay techniques used for the sago pondweed tests were based on methods developed by Fleming et al. [7] and Ailstock et al. [8]. Laboratory-propagated stocks were weighed and rooted in a nutrient agar medium. Eight replicates were submersed in 750 ml of ambient water from each station. Light levels were maintained at 70 μmol/m²/s photosynthetically active radiation on a 12 h light:12 h dark cycle. Temperature was held constant at 22°C. Filtered air supplemented with CO2 to approximately 3% was continuously pumped into each test chamber. After 4 weeks, the plants were removed and weighed. The number of rhizome tips on each plant was counted as a measure of reproduction. Dry weight was taken as a measure of growth. The Microtox® assay was performed on all water samples. The method exposes a luminescent bacterium (Photobacterium phosphoreum) to ambient water samples and measures changes in light output following incubation [9]. Microtox bioassays were run as dilution series bioassays for each sample to calculate an EC50.

Sediment toxicity tests were conducted using bioassays of 10-d sheepshead minnow (C. variegatus) embryo-larval survival and teratogenicity; 10-d amphipod (Lepidactylus dytiscus) survival and growth; 10-d amphipod (Leptocheirus plumulosus) survival and growth; 10-d polychaete (Streblospio benedicti) survival and growth; and lettuce (Lactuca sativa) seed germination. The L. plumulosus bioassays were conducted at the Maryland Department of Natural Resources' Toxic Aquatic Contaminants Laboratory. The seed tests were conducted at Anne Arundel Community College. All the other sediment bioassays were conducted by Old Dominion University, Applied Marine Research Laboratory (AMRL), Norfolk, Virginia, USA. All culture, handling, and testing methods have been previously described in detail [2]. Test methods for animal species were adapted from the American Society for Testing and Materials (ASTM), the U.S. Environmental Protection Agency (U.S. EPA), and DeWitt et al. [10-12]. The seed bioassay followed the methods described in Hartwell et al. [2]. Seeds were placed in 10 replicate porous bags, which were then buried in sediment samples from each sample site. Control tests were done in sand adjusted to field salinities. Seeds were incubated in the sediment for 3 d at 2°C. Percent germination was recorded at intervals of 2, 7, and 14 d.

For all tests, percent survival was compared to controls using the t test following arcsine transformation, or a Wilcoxon rank sum test where necessary. Measures of growth and reproduction were compared to controls using t tests or the Wilcoxon rank sum test. Differences between means were considered significant at the p = 0.05 level.
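The survival comparison described above can be sketched in a few lines. This is a minimal illustration, not the original analysis: it assumes the common arcsine square-root form of the transformation and a pooled-variance (equal variance) Student's t statistic, neither of which is specified in the text. The function names and the replicate values are hypothetical.

```python
import math
from statistics import mean, variance


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion in [0, 1] (assumed form)."""
    return math.asin(math.sqrt(p))


def pooled_t(sample, control):
    """Student's t statistic with pooled variance, plus degrees of freedom."""
    n1, n2 = len(sample), len(control)
    pooled_var = ((n1 - 1) * variance(sample)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    t = (mean(sample) - mean(control)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical replicate survival proportions (not data from the study).
site_survival = [0.60, 0.70, 0.50, 0.65]
control_survival = [0.90, 0.95, 1.00, 0.90]

t_stat, df = pooled_t([arcsine_sqrt(p) for p in site_survival],
                      [arcsine_sqrt(p) for p in control_survival])
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

The resulting statistic would then be compared with the tabulated critical value for the chosen significance level and degrees of freedom.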

The risk ranking scheme has five components, which are (1) severity of effect, (2) degree of response, (3) test variability, (4) site consistency, and (5) number of measured endpoints [3]. Severity is the degree of effect that the bioassay endpoints measure (e.g., mortality, growth). Degree of response is the proportion of organisms responding, relative to control values, in each bioassay regardless of statistical significance. Variability is the coefficient of variation (CV) of response for each individual set of laboratory or site replicates. The number of endpoints measured at each site refers to the number of bioassays (species) and measured parameters (survival, growth, etc.). Consistency refers to the agreement between the various endpoints at a station. A detailed discussion of the model can be found in Hartwell [3].

Table 1. Calculated toxicological risk ranking scores from ambient toxicity bioassay results for water, sediment, and combined water and sediment for samples from the South and Wicomico Rivers in 1995

Station     Water   Sediment   Combined
South 1      58.6      102.2      103.1
South 2b     34.2       71.3       64.0
South 4      16.5       25.8      −30.7
Wicomico     −3.1      −36.6      −32.0

Toxicological risk ranking calculations used a scheme whereby endpoint severity was multiplied by the percent response (corrected for control) of the test organisms for each bioassay endpoint and the coefficient of variation for that test endpoint. This value is referred to as the subscore. The subscores from all tests were summed for each test site. The sum was adjusted by the site consistency factor and divided by the square root of the number of test endpoints for each site. The consistency parameter was calculated as the cube of the difference between half the number of endpoints and the number of statistically nonsignificant responses at each site. Clean sites (few significant responses) tend to have negative consistency values, whereas impacted sites have positive values. Statistical significance in this instance refers to typical sample versus control comparison tests, not a statistical test of the control corrected response values. The risk scores are calculated as
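\[
\text{Risk score} = \frac{\sum \left(\text{severity} \times \%\ \text{response} \times \text{CV}\right) + \text{consistency}}{\sqrt{N}}
\]
where consistency = [(N/2) − X]^3, N = total number of endpoints, CV = coefficient of variation, and X = number of statistically nonsignificant endpoints.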
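
The calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the consistency term is added to the summed subscores (the Discussion refers to "addition" of this factor), the severity weights and the scaling of percent response and CV are hypothetical, and the function names and input tuples are invented for the example.

```python
import math


def consistency(n_endpoints, n_nonsignificant):
    """Cube of (half the endpoint count minus the nonsignificant count)."""
    return (n_endpoints / 2 - n_nonsignificant) ** 3


def risk_score(endpoints):
    """Risk score for one site.

    endpoints: list of (severity, pct_response, cv, significant) tuples,
    one per bioassay endpoint. Units and weights are assumptions.
    """
    n = len(endpoints)
    subscores = [sev * resp * cv for sev, resp, cv, _ in endpoints]
    n_nonsig = sum(1 for e in endpoints if not e[3])
    return (sum(subscores) + consistency(n, n_nonsig)) / math.sqrt(n)


# Hypothetical endpoints: (severity weight, % response, CV, significant?)
example_site = [
    (3, 40.0, 0.25, True),
    (2, 10.0, 0.50, False),
    (1, 5.0, 0.20, False),
    (3, 0.0, 0.10, False),
]
print(risk_score(example_site))
```

With few significant responses the consistency term is negative and pulls the score down, which matches the statement that clean sites tend to have negative consistency values.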

Fig. 2. Simple toxicity scores for sediment samples from discrete sites at the South River and Wicomico River stations evaluated for ambient toxicity in 1995. Numbers above bars indicate sample site designation.

Table 2. Calculated least-significant-differences (LSD) groupings for water, sediment, and combined water and sediment toxicological subscores from ambient toxicity bioassays with samples from the South and Wicomico Rivers in 1995

Matrix               LSD     Station    Mean score   Grouping
Water                20.11   South 1    19.92        A
                             South 2b   13.14        A
                             South 4    10.26        A
                             Wicomico    9.10        A
Sediment             21.81   South 1    35.81        A
                             South 2b   24.13        A B
                             Wicomico   13.95          B
                             South 4    13.36          B
Sediment and water   14.51   South 1    27.87        A
                             South 2b   18.63        A B
                             South 4    11.81          B
                             Wicomico   11.53          B

Three risk ranking scores were calculated: water column, sediment, and water and sediment combined. A water column risk score was calculated for each sampling month and averaged over months by station. Sediment risk scores were calculated for each station. To test for statistical differences between stations, the mean and variation of the subscores were used. The coefficient of variation of a set of numbers does not change when they are multiplied by a constant. Since √N is a constant for all stations, the variability of the subscores at each station was used as a surrogate for station variation.
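
The invariance invoked above follows directly from the definition of the CV. As a short check, assume a positive constant c (here 1/√N) applied to subscores x with nonzero mean x̄ and standard deviation s; these symbols are introduced only for illustration:

```latex
\[
\mathrm{CV}(cx) \;=\; \frac{s_{cx}}{\overline{cx}}
\;=\; \frac{c\,s}{c\,\bar{x}}
\;=\; \frac{s}{\bar{x}}
\;=\; \mathrm{CV}(x), \qquad c > 0 .
\]
```

The mean and standard deviation each scale by c, so only the relative variability is unchanged by the division.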

Interstation differences were tested by calculation of the least significant differences (LSDs) of the subscores. Separate LSDs were calculated for the results from water, sediment, and combined data. A simple toxicity score was also calculated for each sediment site. This is the sum of the products of endpoint severity and percent response divided by √N. It was used to assess intrasite variability within each station. The individual toxicity score for the central sediment site was contrasted to the confidence interval of the mean of all five scores from each station
\[
\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
\]
where x̄ and s are the mean and standard deviation of the n = 5 site scores at a station and t is the critical value of Student's t distribution.

RESULTS

The toxicological risk scores are shown in Table 1. Results showed a gradient in the South River from upriver to downriver. The Wicomico River had consistently lower values than the South River stations. The sediment risk values were higher than the water column values, and the sediment gradient in the South River was similar to that seen in the water column risk scores. The overall risk scores for combined water and sediment data show that stations 1 and 2B of the South River had the highest risk values, with station 4 and the Wicomico River having negative risk scores. The site-specific sediment toxicity scores are shown in Figure 2. Variability was similar among all stations.

Results of statistical testing of the risk scores using LSDs are shown in Table 2. Station South 4 and the Wicomico reference station were consistently grouped together in both sediment and combined water-sediment results. South River station 1 was significantly different from those two stations in both cases. Station 2B was intermediate and did not differ statistically from either extreme. The water risk scores did not demonstrate statistically significant differences between stations. Simple toxicity scores for the central sampling sites (site 1) at each station were within the confidence limits of the station as a whole (Table 3).
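
The Table 2 groupings can be re-derived from the reported means and LSD values. The sketch below assumes the usual LSD rule, that two station means differ significantly when their absolute difference exceeds the LSD. The function name and data layout are illustrative and are not taken from the original analysis.

```python
from itertools import combinations

# LSD values and mean subscores as reported in Table 2.
TABLE2 = {
    "Water": (20.11, {"South 1": 19.92, "South 2b": 13.14,
                      "South 4": 10.26, "Wicomico": 9.10}),
    "Sediment": (21.81, {"South 1": 35.81, "South 2b": 24.13,
                         "Wicomico": 13.95, "South 4": 13.36}),
    "Sediment and water": (14.51, {"South 1": 27.87, "South 2b": 18.63,
                                   "South 4": 11.81, "Wicomico": 11.53}),
}


def significant_pairs(lsd, means):
    """Return station pairs whose mean difference exceeds the LSD."""
    return [(a, b) for a, b in combinations(means, 2)
            if abs(means[a] - means[b]) > lsd]


for matrix, (lsd, means) in TABLE2.items():
    print(matrix, significant_pairs(lsd, means))
```

Under this rule no water column pair differs, and South 1 separates only from South 4 and the Wicomico station in the sediment and combined data, with South 2b overlapping both groups.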

Table 3. Toxicity scores for central sampling sites versus mean scores and confidence intervals from sediment bioassays with samples from the South and Wicomico Rivers evaluated for ambient toxicity in August 1995

Station     Mean     CI        Central site score
South 1     152.6    ±30.73    151.4
South 2b    132.7    ±20.85    151.6
South 4      45.7    ±10.26     54.8
Wicomico     64.3    ±13.59     70.9
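
The comparison summarized in Table 3 amounts to checking whether each central site score falls inside the interval mean ± CI. A minimal sketch using the tabulated values follows; the helper name and the "margin" quantity (distance from the central score to the nearest interval limit) are illustrative additions.

```python
# (mean, CI half-width, central site score) as reported in Table 3.
TABLE3 = {
    "South 1": (152.6, 30.73, 151.4),
    "South 2b": (132.7, 20.85, 151.6),
    "South 4": (45.7, 10.26, 54.8),
    "Wicomico": (64.3, 13.59, 70.9),
}


def within_ci(mean, half_width, score):
    """True if score lies inside mean +/- half_width."""
    return mean - half_width <= score <= mean + half_width


for station, (mean, half_width, central) in TABLE3.items():
    margin = half_width - abs(central - mean)  # distance to nearest CI limit
    print(f"{station}: inside={within_ci(mean, half_width, central)}, "
          f"margin={margin:.2f}")
```

All four central scores fall inside their station intervals, although the margins for South 2b and South 4 are small.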

DISCUSSION

Results were consistent with data gathered in 1994 [2] (Fig. 3). The upstream stations in the South River yielded greater toxicological responses than the downstream station, and the risk ranking results show a strong gradient from upstream to downstream. A gradient of effect was seen in both the water column and sediment. Sampling in 1994 could not assess the possibility of a gradient of effects in the water column due to the use of a single, centrally located water sampling station. The results of 1995 bioassays with water from discrete locations in the tributary allow a more thorough assessment of the impact of toxic contamination.

The consistency of the 1995 results in the South River with those predicted from the 1994 study indicates the need for more intense studies in the Patuxent and Severn Rivers as well. Indications of local toxic conditions were noted in the middle and upper reaches of those systems, respectively [2]. The statistical tests of inter- and intrastation differences indicate that the sampling design used in 1994 is adequate to detect localized zones of toxic contamination. The centrally located sampling site was representative of the entire station. Differences between risk values for individual stations do reflect significant gradients. Locations that exhibit highly variable conditions (e.g., grain size) could still yield false negatives depending on precise sampling location. Based on grain size analysis, station 2B appears to lie in or near the transition zone from tributary-dominated sediment dynamics to open bay dynamics, and the individual and combined risk scores for that station are intermediate between scores from stations 1 and 4 [13].

Thus, a sampling design that utilizes single samples from individual stations is adequate for ambient toxicity screening and for correlation with biological community health indices such as an index of biotic integrity (IBI) [14-16] or a diversity index. This means that field and laboratory resources can be used to survey a larger number of locations in any given sampling period rather than assessing one tributary at a time. However, it does not follow that one or two stations in a given tributary are adequate coverage for ambient toxicity testing. Clearly, if only the downstream station had been sampled in the South River, the conclusions of the 1994 study would have been totally different from the current interpretation. Ambient toxicity studies must cover the length of a tributary to be truly representative of the system. Study designs should take into account what is known about the tributary watershed. For example, the toxicological risk scores in Curtis Creek obtained in 1993 reflect the obvious difference in upstream versus downstream development and past releases into the environment [1]. The ability of a sampling scheme to identify hot spots is dependent on the size of the area relative to sampling intensity [17]. Thus, the sensitivity of a river-specific combined score requires careful application of the data. Large areas of degraded habitat could be missed if the sampling stations are too far apart. Finally, the sheer size of the river may also affect its assimilative capacity for environmental degradation.

Fig. 3. Risk scores from evaluation of ambient toxicity in nine Chesapeake Bay tributaries and/or stations from 1993 to 1995. Horizontal line represents the upper confidence interval limit of Wicomico River data. Plots are for (clockwise from upper left) water, sediment, and combined water and sediment.

Each station has a single toxicological risk score. This score is derived from the subscores, the consistency value, and √N. Unlike √N, the consistency factor is not a constant between stations, and addition of it will tend to drive the station scores apart, thus improving the distinction between clean and contaminated sites. By ignoring the consistency factor and using the individual subscores as a surrogate for statistical tests, conservative results are produced, since differences in the station risk scores will be greater than the differences between the means of the subscores. The LSD contrasts (Table 2) demonstrate that the water column risk scores were not significantly different between stations. The sediment and combined scores consistently showed significant differences between station South 1 and both South 4 and the reference station in the Wicomico River. Station South 2B was clearly a transition zone. Thus, results from the toxicological risk ranking model appear to be reliable for inferring statistical differences between stations.

In addition to a detailed biological assessment of the South River and testing the abilities of the sampling and data analysis methods to identify statistically significant differences between stations, we also needed to determine what a threshold significance value is in the risk ranking model, that is, what score indicates significant degradation of the habitat due to contamination.

The threshold sensitivity of the risk ranking model can be estimated with the reference station data from the Wicomico River. The mean of the Wicomico combined risk scores from 1993 to 1995 is −7.6, with a confidence interval of ±45.2. Given the inherently variable nature of biological response testing and field sampling, a presumptively clean site should therefore not greatly exceed a value of 40 with the current set of bioassays (Fig. 3). Further testing will allow an assessment of the variation observed in the Wicomico data. Whether the variation seen between years is normal variation given the inherent nature of the biological tests or is due to some bias from 1993 is unknown at this time. The mean risk scores and confidence intervals for the water column and sediment data were 18.3 (±26.2) and 31.1 (±35.0), respectively. The relationships between these values and the risk scores from the nine rivers and/or sites sampled over 3 years are shown in Figure 3. A combined toxicological risk score in excess of 100 clearly indicates a contaminated site. Scores above 50 indicate transitional zones or areas of marginal but significant contamination in terms of biological effects. The same logic applies to the water only and sediment only values.
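
The thresholds discussed above can be applied to the 1995 combined scores from Table 1 as a quick check. In this sketch the cut-offs of 50 and 100 come from the text; the handling of scores between the upper reference limit and 50 is not specified there, so such scores are simply labeled "not flagged". The function and label names are illustrative.

```python
# Wicomico reference statistics for combined risk scores, 1993 to 1995.
REF_MEAN, REF_CI = -7.6, 45.2
upper_reference_limit = REF_MEAN + REF_CI  # roughly the value of 40 cited in the text


def classify(score, transitional=50.0, contaminated=100.0):
    """Label a combined risk score using the cut-offs given in the text."""
    if score > contaminated:
        return "contaminated"
    if score > transitional:
        return "transitional"
    return "not flagged"


# Combined water and sediment risk scores from Table 1.
COMBINED_1995 = {"South 1": 103.1, "South 2b": 64.0,
                 "South 4": -30.7, "Wicomico": -32.0}

labels = {station: classify(score) for station, score in COMBINED_1995.items()}
for station, label in labels.items():
    print(f"{station}: {label}")
```

This reproduces the interpretation given in the text: South 1 exceeds 100, South 2b falls in the transitional range, and South 4 and the Wicomico station lie below the reference limit.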

Sample replication at specific stations differs between the 3 years of testing due to sampling design and/or river size, but the number of data points (discrete samples/replicates) was the same. Also, the number of toxicological endpoints measured in 1993 was slightly smaller (no vascular plant or Microtox assays) than in 1994 and 1995. However, the toxicological risk ranking model compensates for different sample sizes in the calculation formula [3]. The stations that are above the threshold score are obvious in Figure 3. The South River and Curtis Creek scores are far above the combined scores of other locations. The South 1 station score clearly drives the score for that river, with South 2b in the transition zone. Also, the sediment scores are seen to be the dominant factor at these three stations. The sediment scores for South 2b and Rock Creek are also above the threshold. The water scores are generally at or below the threshold. The only water column score above the threshold is South 1 (Fig. 3).

It must be emphasized that the sample sites are located away from known point sources and mixing zones, so water toxicological risk scores reflect ambient water column conditions, influenced by the integrated water flow from upstream, tidal mixing and/or sediment releases. The relative magnitude of impacts seen in the water column and in sediment could be markedly different in or near the mixing zone of a specific source.

Acknowledgements

Celia Dawson and Eric Durell performed the necessary bioassays. Sediment invertebrate bioassays were directed by Ray Alden, and vascular plant assays were directed by Steve Ailstock. Field assistance was provided by Margaret McGinty, Sandy Ives, Doug Randle, Bill Rodney, Dave Goshorn, and Randy Kerhin. This project was partially funded by the National Oceanic and Atmospheric Administration through the Maryland Department of Natural Resources, Coastal and Watershed Resources Division grant NA 470 Z 0132.
