Volume 33, Issue 9, pp. 2133–2139
Hazard/Risk Assessment

MOSAIC_SSD: A new web tool for species sensitivity distribution to include censored data by maximum likelihood

Guillaume Kon Kam King
UMR CNRS 5558–Laboratoire de Biométrie et Biologie Évolutive, Université Claude Bernard Lyon 1, Villeurbanne, France

Philippe Veber
UMR CNRS 5558–Laboratoire de Biométrie et Biologie Évolutive, Université Claude Bernard Lyon 1, Villeurbanne, France

Sandrine Charles (corresponding author; address correspondence to [email protected])
UMR CNRS 5558–Laboratoire de Biométrie et Biologie Évolutive, Université Claude Bernard Lyon 1, Villeurbanne, France
Institut Universitaire de France, Paris, France

Marie Laure Delignette-Muller
UMR CNRS 5558–Laboratoire de Biométrie et Biologie Évolutive, Université Claude Bernard Lyon 1, Villeurbanne, France
VetAgro Sup, Campus Vétérinaire de Lyon, Marcy l'Étoile, France
First published: 26 May 2014

Abstract

Censored data are seldom taken into account in species sensitivity distribution (SSD) analysis. However, they are found in virtually every dataset and sometimes represent the better part of the data. Stringent recommendations on data quality often entail discarding many of these meaningful data, resulting in datasets of reduced size that are no longer representative of any realistic community. However, it is reasonably simple to include censored data in SSD by using an extension of the standard maximum likelihood method. The authors detail this approach, based on the R package fitdistrplus, which is dedicated to the fit of parametric probability distributions. The authors present the new Web tool MOSAIC_SSD, which can fit an SSD to datasets containing any type of data, censored or not. The MOSAIC_SSD Web tool predicts any hazardous concentration and provides bootstrap confidence intervals on the predictions. Finally, the authors illustrate the added value of including censored data in SSD, taking examples from published data. Environ Toxicol Chem 2014; 33:2133–2139. © 2014 SETAC

INTRODUCTION

The species sensitivity distribution (SSD) is a central tool for risk assessment in ecotoxicology. It provides a more principled way to define hazardous concentrations for the environment than the use of arbitrary safety factors. However, the approach is still subject to heated debate regarding its optimal implementation, and no consensus has been reached. As a result, environmental regulatory bodies often advocate country- or region-specific approaches 1-5. In SSD, methodological choices based on rather theoretical arguments thus have a direct impact on legislation, with significant economic and ecological consequences. The purposes of the SSD approach are to model interspecies variability in sensitivity and to provide a protective concentration for a group of species. The SSD approach takes sensitivity data as input, such as no-effect concentrations, no-observed-effect concentrations (NOEC), or x% effective concentrations (ECx), for a group of tested species. These critical effect concentrations are obtained through acute or chronic toxicity bioassays. The SSD approach is based on the hypothesis that variability in species sensitivity follows a probability distribution. This distribution is extrapolated from the sample of tested species to infer a group-wide protective concentration, the hazardous concentration for p% of the group (HCp). In general, a parametric distribution is assumed, although assumptions about the distribution can be avoided using distribution-free methods 6.
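In other words, the HCp is simply the pth percentile of the fitted sensitivity distribution; writing $F$ for its cumulative distribution function:

$$\mathrm{HC}_p = F^{-1}\!\left(\frac{p}{100}\right)$$

For instance, the HC5 is the concentration below which only 5% of the species of the group are expected to be affected.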

The SSD approach raises many issues 7, ranging from ecological to statistical concerns. Although it is difficult to address all of them, specific aspects can be improved. Throughout the body of work on SSD, there are very few instances of missing data, nondetects, or censored data in general being properly taken into account. On those rare occasions, this is achieved in a Bayesian framework 8, which requires statistical expertise not accessible to untrained users. However, censored data contain crucial information, and ignoring or transforming them degrades the quality of the predictions based on the remaining data 9. It is possible to deal with censored data in the more familiar frequentist framework using standard maximum likelihood methods, which also offers several advantages over common SSD approaches. In the present study, we present a Web tool, MOSAIC_SSD (http://pbil.univ-lyon1.fr/software/mosaic/ssd/), to easily include censored data in an SSD analysis. The MOSAIC_SSD Web tool relies on the existing R package fitdistrplus 10.

In the following sections, we first review the methods to fit a distribution to data to position our tool among the existing SSD approaches. Then, we explain how to include censored data in SSD and present the Web tool we have designed for this purpose, MOSAIC_SSD. Finally, we illustrate the added value of including censored data in SSD using example datasets from published literature.

REVIEW OF METHODS TO FIT AN SSD

Two steps are required to fit an SSD. The first step is to choose a distribution that seems appropriate to describe the data. Possible options include a Weibull distribution 11 to emphasize the tails of the distribution; a triangular distribution 11, with finite support, when no species is more sensitive or resilient than a certain threshold value; or a multimodal distribution 12, 13 for an assemblage of several taxa, for example. Log-normal 14 and log-logistic 15 distributions are the customary choices 16, although an extensive list of distributions has been applied to SSD. Alternatively, distribution-free methods can be used to avoid the subjective choice of a distribution for the data. This approach has been used for SSD in various works 6, 8, 17. Among these distribution-free methods, the Kaplan-Meier estimator is a means to include censored data in SSD 8; however, it is restricted to certain types of censored data. One difference between parametric and distribution-free methods is that when fitting a parametric distribution, the possible shapes are restricted to a certain class of distributions, and the shape is refined by adding information from the data. In a distribution-free approach, by contrast, any kind of shape is allowed, and the result emerges solely from the data. Therefore, distribution-free methods do not use any exterior information and often require more data than parametric methods 16, 17. Consequently, when dealing with small datasets, it is more reasonable to fit a parametric distribution.

Each parametric distribution is determined by a set of parameters, so the second step is to estimate them. Parameter estimation is performed by optimizing a chosen criterion: the likelihood or some goodness-of-fit distance. The likelihood function gives the probability of observing the data given the parameters, so maximizing the likelihood means selecting the parameters for which the probability of observing the data is highest. Maximum likelihood is by far the most standard approach to distribution fitting and, more generally, to model fitting. It is backed by a substantial body of theoretical work ensuring many interesting asymptotic properties 18: the maximum likelihood estimate converges to the true value of the parameters (consistency), it is the fastest estimate to converge (efficiency), and its difference from the true value is asymptotically normally distributed (normality), which provides confidence intervals on the parameters. This approach is used by the Australian software BurrliOZ 19, for example, which fits a Burr III distribution to the toxicity data. Moreover, a natural extension of the likelihood function makes it possible to take censored data into account 20.
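As an illustration of the principle (not MOSAIC_SSD's internal code), a log-normal SSD can be fitted by numerically maximizing the log-likelihood with base R; the LC50 values below are made up:

```r
# Minimal sketch: maximum likelihood fit of a log-normal SSD by direct
# optimization of the log-likelihood (hypothetical LC50 values, mg/L).
lc50 <- c(0.8, 1.5, 2.3, 4.1, 7.9, 12.4)

negloglik <- function(par, x) {
  # par = c(meanlog, log(sdlog)); the log-parametrization keeps sdlog > 0
  -sum(dlnorm(x, meanlog = par[1], sdlog = exp(par[2]), log = TRUE))
}

start <- c(mean(log(lc50)), log(sd(log(lc50))))  # moment-based starting point
fit <- optim(start, negloglik, x = lc50)
c(meanlog = fit$par[1], sdlog = exp(fit$par[2])) # maximum likelihood estimates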

Another popular method to determine the distribution parameters that best describe the data is least-squares regression on the empirical cumulative distribution function (CDF), which minimizes the sum of the squared vertical distances between the CDF and the data; this sum is also known as the Cramér-von Mises distance 21. This approach is adopted by the software CADDIS_SSD 22 from the US Environmental Protection Agency (USEPA), which performs a least-squares regression on the CDF with a log-probit function. The software SSD MASTER 23 uses the same method but tries several other distributions: normal, logistic, Gompertz, and Fisher-Tippett. However, regression on the CDF does not lend itself easily to censored data, because constructing the CDF implies sorting the data, which is not trivial with censored data. In addition, there is no unique way to build a CDF: several possible plotting positions 1, 24 all have desirable properties, but none of them represents the data more faithfully than any other. Therefore, the resulting SSD and its predictions depend on purely arbitrary decisions regarding the plotting positions, a fact that undermines their scientific credibility.

A third common approach to determine the distribution parameters is to match the moments of the empirical distribution with those of the model. This is numerically easy when there is an analytical formula for the determination of the parameters. Moment matching is equivalent to maximum likelihood for the distributions of the exponential family, such as the normal, exponential, or gamma distributions, but not for the logistic distribution. The free software ETX 25 uses the moment matching method 14, 15. It fits a log-normal and a log-logistic distribution by moment matching and computes confidence intervals. Moment matching is sensitive to outliers and can give unrealistic results 26. However, it can be useful when the maximum likelihood computation is intractable 26. Moment determination with censored data is not trivial. Therefore, this method cannot be straightforwardly extended to include censored data.
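As a brief sketch of moment matching (with hypothetical LC50 values): on the log scale a log-normal is simply a normal distribution, so matching its first 2 moments reduces to taking the sample mean and standard deviation of the log-transformed data, which is also essentially the maximum likelihood solution for this exponential-family case:

```r
# Moment matching for a log-normal SSD (hypothetical LC50s, mg/L).
lc50 <- c(0.8, 1.5, 2.3, 4.1, 7.9, 12.4)
meanlog_hat <- mean(log(lc50))
sdlog_hat   <- sd(log(lc50))  # n - 1 denominator; maximum likelihood uses n
```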

This brief review of the classical SSD approaches shows that several methodological choices must be made to fit an SSD: whether to use a nonparametric method, which distribution to choose, and which parameter estimation method to apply. Apart from maximum likelihood, no straightforward approach exists for a nonexpert in statistics to make use of all types of censored data. Indeed, all of the available turn-key software programs for SSD fitting require noncensored data. Yet the R package fitdistrplus 10 makes it possible to fit distributions to censored data by maximum likelihood, following the scheme described in the next section.

MAXIMUM LIKELIHOOD FOR CENSORED DATA

Maximum likelihood provides a single framework to cope with both censored and noncensored data. Censored data is a general name given to data that are not in the form of fixed values but belong to an interval, bounded or not. Censored sensitivity data occur when determining a critical effect concentration for a given species is not possible: the highest concentration tested may not have any noticeable effect, a tiny amount of contaminant may already kill all the individuals, or the measurement may simply be too imprecise to be reasonably described by a single value instead of an interval. In such cases, it is only possible to give a lower bound, an upper bound, or an interval for the critical effect concentration. Such data are called right-censored, left-censored, or interval-censored, respectively. Censored data can also arise when there are multiple values for the sensitivity of 1 species to a given toxicant. When the quality of the data seems equivalent, the European Chemicals Agency's (ECHA) advice 5 is to use the geometric mean as a replacement for the different values; it might be more cautious to use these multiple values to define an interval containing the sensitivity of that species. Censored data are very different from doubtful or questionable data obtained from failed experiments: they are produced using a valid experimental procedure, and they contain information as valid as that of noncensored data. Censoring is very common, especially for rare species, for which scant data are available and no standard test procedure exists. Discarding censored data has a real cost, because these data can represent the larger part of an extended dataset. For instance, in the work by Dowse et al. 8, discarding censored data divides the number of tested species by a factor of 8.

In spite of their ubiquity, censored data appear to be largely ignored in ecotoxicology. To our knowledge, there is no example of an SSD including all types of censored data in a frequentist framework. It is possible in a Bayesian framework 24, 27, 28, but fitting a Bayesian model requires a certain statistical expertise. Censored data are typically either discarded or substituted with arbitrary values, which is a bias-prone approach in general 9. However, there is a simple method to include censored data in a frequentist framework: parameter estimation of a distribution on any type of censored data can be performed using a natural extension of the maximum likelihood method 20. The likelihood function for noncensored data is written as follows:
$$L(\theta) = \prod_{i=1}^{N} f(x_i \mid \theta) \tag{1}$$

where the $x_i$ are the $N$ sensitivity data following the distribution $f$ with parameter $\theta$. This likelihood function can be extended to censored data. Let $x_i$ be the $N_{nc}$ noncensored data, $x_j^{lc}$ the $N_{lc}$ upper bounds for left-censored data, $x_k^{rc}$ the $N_{rc}$ lower bounds for right-censored data, and $[x_l^{low}, x_l^{up}]$ the $N_{ic}$ pairs of bounds for interval-censored data. The previous likelihood function then extends to

$$L(\theta) = \prod_{i=1}^{N_{nc}} f(x_i \mid \theta) \times \prod_{j=1}^{N_{lc}} F(x_j^{lc} \mid \theta) \times \prod_{k=1}^{N_{rc}} \left[1 - F(x_k^{rc} \mid \theta)\right] \times \prod_{l=1}^{N_{ic}} \left[F(x_l^{up} \mid \theta) - F(x_l^{low} \mid \theta)\right] \tag{2}$$
where $F$ is the cumulative distribution function of distribution $f$. The likelihood function for censored data (Equation 2) is written as a product of 4 terms, the first being the likelihood of noncensored data (Equation 1) and the next 3 corresponding to the left-censored, right-censored, and interval-censored data, respectively.
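As a minimal sketch (with hypothetical values) of how this extended likelihood can be maximized in practice, the fitdistcens function of the fitdistrplus package accepts exactly this mix of data types:

```r
# Fit a log-normal SSD to a mix of censored and noncensored LC50 values
# (hypothetical data) by maximizing the extended likelihood of Equation 2.
library(fitdistrplus)

# fitdistcens() expects a data frame with columns "left" and "right":
# NA on the left = left-censored, NA on the right = right-censored,
# differing bounds = interval-censored, equal bounds = noncensored.
toxdata <- data.frame(
  left  = c(NA,  10.3, 2.5, 7.8, 1.2, 15.0),
  right = c(5.2, NA,   4.1, 7.8, 1.2, NA)
)
fit <- fitdistcens(toxdata, "lnorm")
summary(fit)  # parameter estimates and maximized log-likelihood
```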

MOSAIC_SSD

It is possible to use the method described in the previous section via the R package fitdistrplus 10; the R packages survival 29 and NADA 30 offer the same possibility. However, they require a certain fluency in the R programming language, which has prevented the widespread use of censored data in ecotoxicology. Minitab 31 is a commercial software package with a graphical user interface that fits multiple distributions to censored data rather easily, but there does not seem to be any open-source alternative.

Moreover, fitdistrplus and these other packages and programs are not specifically designed for SSD, and their versatility in the choice of distributions and fitting methods might discourage inexperienced users. Thus, we developed a Web interface, MOSAIC_SSD (http://pbil.univ-lyon1.fr/software/mosaic/ssd/), which wraps fitdistrplus into an SSD-dedicated online tool. The MOSAIC_SSD tool enables anyone to perform a simple yet statistically sound SSD analysis including censored data, without worrying about the conceptually difficult underlying statistical questions. The Web interface is accessible via any browser and simple to use: given an input dataset, it sends the calculation to a server and then returns the result. The input dataset is a text file uploaded by the user, with straightforward encoding. A noncensored dataset is given in 1 column. A censored dataset is given as 2 columns: an "NA" in the left column and a number in the right denotes left-censored data, and a number in the left column and an "NA" on the right denotes right-censored data. Two differing numbers denote interval-censored data, and 2 identical numbers represent noncensored data. It is also possible to type in the dataset by copying and pasting it into a text box.
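As an illustration (with made-up concentration values, assuming whitespace-separated columns), a censored input file mixing all 4 cases could look as follows; row 1 is left-censored (below 5.2), row 2 right-censored (above 10.3), row 3 interval-censored (between 2.5 and 4.1), and row 4 noncensored (exactly 7.8):

```
NA    5.2
10.3  NA
2.5   4.1
7.8   7.8
```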

To keep the tool user-friendly, few options are offered. The user can fit 1 or both of the log-normal and log-logistic distributions. These 2 distributions are the most widely used 16, and because they contain only 2 parameters, parameter estimation appears robust enough to accommodate most datasets. To select the distribution that best describes the data, the first step is a qualitative assessment based on the fitted curves; the value of the likelihood function for each model can then be used as a further decision criterion. The log-logistic distribution has heavier tails than the log-normal and is therefore more conservative in the determination of the 5% hazardous concentration (HC5) 15.

After clicking the run button, bootstrap 95% confidence intervals are automatically computed. They yield confidence intervals on the parameters of the distribution and on the computed HCp values. The bootstrap procedure is not guaranteed to converge, and the number of iterations required depends strongly on the dataset. Therefore, an automatic check of bootstrap convergence is implemented: the bootstrap procedure is run several times, and the magnitude of the fluctuations in the results is compared with the span of the confidence interval. This comparison determines whether the bootstrap has converged. If the bootstrap procedure fails to converge, additional computations are launched. When the bootstrap finally converges, or when the process reaches the time limit, the user is advised as to whether the confidence intervals are reliable. Calculating the confidence intervals using a bootstrap method has the advantage of providing a unified framework for every distribution.

Figure 1 shows a screenshot of the result page of the analysis with an example dataset (provided in MOSAIC_SSD) containing censored data and documenting the salinity tolerance of riverine macro-invertebrates 32 (hereafter referred to as the censored salinity dataset). The dataset contains 72-h 50% lethal concentration (LC50) values for 110 macro-invertebrate species from Australia. Data were collected using rapid toxicity testing 33 and contain noncensored, right-censored, and interval-censored data. The result page shows a graphical representation of the censored data, the distribution parameters, HCp values computed for various interesting values of p, and the bootstrap confidence intervals within brackets. Figure 1 also shows the output of an SSD analysis with a noncensored dataset, which is in fact a noncensored version of the salinity dataset. The transformation from censored to noncensored dataset follows the customary approach to censored data, which consists of discarding some types of data and transforming others (see Added value of including censored data). An analysis with noncensored data follows identical steps and yields results with the same outline, except that a traditional CDF is used to visualize the data.

The obvious difference between the outputs for the censored and the noncensored datasets is the representation of the CDF. For noncensored data, the CDF is represented using the traditional Hazen plotting positions 1. The choice of plotting positions remains arbitrary, and no perfect solution exists 1, 24, so preference was given to the most standard approach. Building a CDF implies defining an ordering of the data. Although obvious for noncensored data, such an ordering makes little sense for interval-censored data, which might be ranked according to the median, the upper bound, or the lower bound of the interval; adding left- or right-censored data further complicates matters. Within fitdistrplus, the answer to this problem is the Turnbull estimate of the CDF, a nonparametric maximum likelihood estimator of the CDF 34. This estimate can be obtained through an expectation-maximization algorithm and yields the CDF that predicts the data with the highest probability. The Turnbull estimate is represented as a stepwise curve, as in Figure 1 (top panel).
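Within fitdistrplus itself, the bootstrap step can be reproduced along the following lines (a sketch continuing from the fitdistcens fit shown earlier; the automatic convergence check is specific to the MOSAIC_SSD server and is not sketched here):

```r
# Nonparametric bootstrap of the fitted parameters, then bootstrap
# confidence intervals on the HC5 ("fit" is a fitdistcens object).
boot <- bootdistcens(fit, niter = 1001)
summary(boot)                 # bootstrap medians and 95% CIs on the parameters
quantile(boot, probs = 0.05)  # HC5 estimate with its bootstrap 95% CI
```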

Figure 1. Screenshot of the result page of MOSAIC_SSD on the censored salinity dataset. The top panel shows the output of MOSAIC_SSD on the salinity dataset, and the bottom panel shows the output on a noncensored dataset obtained from a transformation of the salinity dataset. HC = hazardous concentration.

Finally, MOSAIC_SSD can be used as a stepping stone to further analyses with fitdistrplus. The last item on the MOSAIC_SSD result page, not shown in the screenshots, is an R script that replicates the analysis with fitdistrplus through a simple copy and paste operation in R. This script is intended as an entry point into the complete fitdistrplus R package: by slightly changing some of its options, HCp values for other values of p can be computed, or an alternative distribution or a different fitting method can be used. Moreover, the script ensures transparency and traceability of the results obtained through MOSAIC_SSD.

ADDED VALUE OF INCLUDING CENSORED DATA

By changing a few parameters in the R script provided within MOSAIC_SSD, it is possible to use fitdistrplus to investigate several fundamental aspects of SSD, such as the influence of including censored data on the predictions. A customary approach when dealing with censored data is to discard or transform them; more precisely, it is common to discard left- or right-censored data and to take the middle of interval-censored data as a single value. Two datasets were analyzed to assess the effect of such data transformation on the predicted hazardous concentrations.

In the censored salinity dataset, 89 of 108 LC50s (82.4%) are censored, among which 60 (55.6%) are right-censored and 29 (26.8%) are interval-censored. Most of the censored data resulted from the testing of rare species, for which the small number of individuals captured prevented the calculation of an LC50 by fitting a concentration–effect model 32. This extensive dataset was collected to be as representative as possible of the species found in nature 32. Therefore, a first asset of taking censored data into account is that the vast majority of the data need not be discarded or altered, so the resulting SSD is more representative of the community it aims to describe. Moreover, using only noncensored data in the analysis introduces a strong selection bias toward abundant species, which is particularly problematic when rare species are likely to be among those that the environmental manager wishes to protect by carrying out an SSD analysis.

The second dataset was published by Koyama 35 and contains vertebral deformity susceptibilities of marine fishes exposed to trifluralin (hereinafter referred to as the censored trifluralin dataset). The measured endpoints are 96-h LC50s for 10 species. Four of the LC50s are censored: 2 right-censored and 2 left-censored. For this dataset, the obvious advantage of including censored data is that the SSD can be fitted on 10 species, whereas discarding the censored data reduces the dataset to only 6 species, below the ECHA recommended minimum of 10, and preferably 15, species 36.

A noncensored version of the 2 datasets (hereinafter referred to as the transformed salinity and transformed trifluralin datasets) was obtained by following the customary procedure of discarding the right- or left-censored data and taking the middle of the interval-censored data. Fitting the log-normal distribution on the censored and transformed versions of the datasets showed that discarding censored data had an adverse effect on the predicted HC5 (Figure 2). For the salinity dataset, discarding right-censored data induced a clear upward bias in the cumulative curve and therefore a smaller HC5 (Figure 2, left). The estimates of the HC5 and their 95% confidence intervals were 9.85 g L−1 (8.38–11.80 g L−1) for the censored dataset and 7.98 g L−1 (6.63–9.93 g L−1) for the transformed dataset. Underestimating the hazardous concentration might seem a harmless error, because it is more protective to use the transformed salinity dataset. However, that incorrectly low value might motivate the use of costly decontamination measures at a specific location, when efforts could be spared and distributed elsewhere.
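For illustration (with hypothetical values rather than the published datasets), the customary transformation and the comparison of the 2 fits can be sketched as follows:

```r
# Sketch of the customary (and discouraged) transformation: drop left- and
# right-censored rows, replace interval-censored rows by their arithmetic
# midpoint, then compare the censored and transformed log-normal fits.
library(fitdistrplus)

toxdata <- data.frame(left  = c(NA,  10.3, 2.5, 7.8, 1.2, 15.0),
                      right = c(5.2, NA,   4.1, 7.8, 1.2, NA))

keep        <- !is.na(toxdata$left) & !is.na(toxdata$right)
transformed <- rowMeans(toxdata[keep, ])  # midpoints; equal bounds unchanged

fit_cens  <- fitdistcens(toxdata, "lnorm")  # all data, censoring respected
fit_trans <- fitdist(transformed, "lnorm")  # degraded, noncensored version
```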

Figure 2. Fitted and empirical cumulative distributions and the 5% hazardous concentration (HC5) for the salinity dataset (left) and the trifluralin dataset (right). The dotted line corresponds to a potentially affected fraction of 5%. Vertical arrows indicate the HC5. Only the region around the HC5 is represented. The blue line corresponds to the censored dataset, the red line to the transformed dataset. A log-normal distribution was fitted to both datasets.

The influence of censored data is dataset dependent, and the bias can be in the opposite direction, as illustrated by the trifluralin dataset (Figure 2, right). Fitting the log-normal distribution yielded the following estimates and 95% confidence intervals for the HC5: 2.4 × 10−3 mg L−1 (4.7 × 10−5–2.6 × 10−2 mg L−1) for the censored dataset and 1.7 × 10−2 mg L−1 (8.9 × 10−3–4.3 × 10−2 mg L−1) for the transformed dataset.

Discarding the censored data led to underestimating the variability in the community sampled by the tested species. Therefore, the width of the distribution was underestimated, and the fifth percentile took too large a value. In other words, discarding the censored data led to an underestimation of the real toxicity of trifluralin and of its potential hazard to the environment. Another striking difference is that the span of the confidence interval was much larger when censored data were included in the SSD. This reveals that a possible effect of transforming censored datasets is to severely underestimate the width of the confidence interval and to give overconfident predictions of the hazardous concentrations.

To confirm these findings, we studied a published dataset containing 3442 contaminants and 1549 species 37. This dataset contained both censored and noncensored data. We therefore compared the HC5 obtained on the complete dataset with the HC5 obtained on the transformed dataset containing only noncensored data. To have consistent endpoints for the SSDs, we restricted the dataset to LC50 values only. We also chose to focus on chemicals for which the proportion of censored data was greater than 10%. The data were aggregated to accommodate the presence of multiple sensitivity values for the same contaminant, species, and endpoint; the aggregation procedure is detailed in the Supplemental Data. Two subsets of the dataset were considered: 1) a well-documented (data-rich) subset, containing more than 10 species per contaminant, to produce good-quality SSDs; and 2) a poorly documented (data-poor) subset, containing between 5 and 9 noncensored LC50s after aggregation. This second subset represents a common situation in which precise information on the species sensitivities to the contaminant is insufficient. On each of these subsets, we studied the ratio of the noncensored HC5 to the censored HC5. We also studied the ratio of the noncensored lower bound of the 95% confidence interval on the HC5 (HC52.5%) to the censored HC52.5% for both subsets. The HC52.5% has been proposed to derive safe concentration levels for contaminants 16. The computation of the confidence intervals is detailed in the Supplemental Data.

The study of this large ecotoxicity database revealed that it is risky to discard or transform censored data. When sufficient noncensored data are available, the value of the HC5 might prove relatively insensitive to the degradation of information induced by arbitrarily transforming censored data into noncensored data: the maximum bias introduced was a factor of 5, larger or smaller, which is comparable to the largest safety factors applied to the HC5. When few data are available, however, it appears crucially important to include censored data in the determination of the HC5. The bias on the HC5 can lead to an overestimation by a factor of 80, yielding safe concentrations that fail to protect the target communities. The bias on the HC52.5% can be even greater, reaching 5 orders of magnitude in the dataset studied. Finally, because MOSAIC_SSD provides a transparent method to deal with censored data, there remains no particular reason to transform the data. The detailed results of the analysis, along with the database, are provided in the Supplemental Data.

DISCUSSION

In the present study, we reviewed the general approach to fitting an SSD to sensitivity data and explained how maximum likelihood can be used to include censored data in SSD. We presented MOSAIC_SSD, a Web tool that enables any user to perform an SSD analysis including censored data in a few very simple steps. The MOSAIC_SSD tool is an interface to a more versatile tool, the R package fitdistrplus 10, and presents only a handful of options to simplify its use. We supported the methodological approach behind MOSAIC_SSD with several arguments and showed the added value of including censored data in the SSD: discarding or transforming censored data has been shown to alter the results of the SSD analysis. Using MOSAIC_SSD is a convenient way to take censored data into account when fitting an SSD. Moreover, its sound general statistical approach is also an asset for performing any sort of SSD. Regarding the choice of a distribution, MOSAIC_SSD provides by default 2 standard distributions, the log-normal and the log-logistic, but it encourages the use of alternative distributions by providing a stepping stone to the R package fitdistrplus. The question of which distribution best fits a dataset cannot have a general answer and must be addressed by testing several options; therefore, the possibility of trying multiple distributions is a valuable asset. For instance, a user might wish to fit a distribution that best describes the tails of the dataset, because determining an HC5 is an extreme quantile estimation problem. In that case, a heavy-tailed distribution such as the Weibull or the exponential is appropriate.

In selecting a distribution, it is important not to pick one with too many parameters. One of the easily accessible software programs for SSD is BurrliOZ 19, which fits the Burr III distribution using maximum likelihood and computes confidence intervals using the bootstrap. The Burr III distribution is very flexible 13, but it contains 1 more parameter than the log-normal or log-logistic distributions. Fitting a distribution with many parameters requires a lot of data, and the Burr III distribution is likely to suffer from strong structural correlation among its parameters 13. Therefore, convergence can be difficult, and the estimates produced are not very reliable. However, BurrliOZ is currently being developed to fit the log-logistic distribution on small datasets and to provide a comparison between at least the log-logistic and the Burr III distributions for larger datasets 3.

Because it is easily accessible and user-friendly, MOSAIC_SSD can encourage the inclusion of censored data in SSD analyses to make better use of all the data at hand. We did not address all of the methodological issues relating to the SSD approach but tried to improve the existing methods, with the aim of making the most of the available data given the cost of collecting them. Questions remain as to what might happen if the proportion of censored data is too great and the dataset is small. It is not possible to test this situation thoroughly, because there are many ways to censor data and no trivial way to choose among them. A good practice would be to consider the span of the confidence interval around the hazardous concentration of interest and to decide whether the dataset is adequate for predicting such concentrations or whether more data need to be collected. Taking censored data into account is therefore crucial to obtaining a precise assessment of the confidence interval rather than an artificially reduced estimate, as with the trifluralin dataset.

We mentioned that censored data might represent an important part of any dataset and that MOSAIC_SSD could be profitably used on many occasions. In particular, the inclusion of censored data appeared most important when few noncensored data were available for a contaminant. However, this work has an even more general scope, because fundamentally all data reported with confidence intervals can be considered interval-censored data. Indeed, the confidence interval around any critical effect concentration estimate can be considered as the range that has a 95% probability of containing the real value and could be reported as interval-censored data. Using the confidence intervals on the critical effect concentrations as censored data provides a basic way to propagate the uncertainty on the critical effect concentration into the SSD, a fundamental problem of SSD 3 that is seldom addressed 38.
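As a sketch of this idea (with hypothetical values), reported 95% confidence intervals on EC50 estimates can be re-encoded directly as interval-censored inputs:

```r
# Treat reported 95% confidence intervals on EC50 estimates as
# interval-censored data for the SSD fit (hypothetical values, mg/L).
library(fitdistrplus)

ec50_ci <- data.frame(left  = c(1.8, 0.4, 12.0),   # lower CI bounds
                      right = c(5.6, 1.1, 31.0))   # upper CI bounds
fit_ci <- fitdistcens(ec50_ci, "lnorm")
```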

Moreover, lowest-observed-effect concentration (LOEC) data, which are often reported, are in fact left-censored data: the only information a LOEC carries is that the NOEC lies below this concentration 39. Therefore, the SSD approach we propose, which includes censored data, would allow ecotoxicologists to make better use of the experimental data from which NOECs are calculated.

However, we have reached the limits of a traditional SSD based on critical effect concentrations, and a lot of information is still discarded. Indeed, a critical effect concentration is only a summary of a full concentration–effect curve. This summary sets aside several aspects of the response of a species to a pollutant, such as the slope of the curve, which describes whether the species is gradually affected or suffers a sudden drastic effect. It is possible to include all the information present in the experimental concentration–effect curves in the SSD by building a hierarchical SSD model. Such a model would describe the joint probability of all of the parameters describing a concentration–effect curve, not only the critical effect concentration as in the classical SSD 40. Moreover, this would also take proper account of the uncertainty in the modeling of the species response and propagate that uncertainty into the SSD.

SUPPLEMENTAL DATA

Table S1.

Figures S1–S2. (175 KB DOC).

Acknowledgment

Financial support for the PhD of G. Kon Kam King is provided by the Région Rhône-Alpes. The authors thank C. Poix for English proofreading.
