Development of algal interspecies correlation estimation models for chemical hazard assessment
Abstract
Web-based Interspecies Correlation Estimation (ICE) is an application developed to predict the acute toxicity of a chemical from 1 species to another taxon. Web-ICE models use the acute toxicity value for a surrogate species to predict effect values for other species, thus potentially filling in data gaps for a variety of environmental assessment purposes. Web-ICE has historically been dominated by aquatic and terrestrial animal prediction models. Web-ICE models for algal species were essentially absent and are addressed in the present study. Public and private sector–held algal toxicity data were compiled and reviewed for quality based on relevant aspects of individual studies. Interspecies correlations were constructed from the most commonly tested algal genera for a broad spectrum of chemicals. The ICE regressions were developed based on acute 72-h and 96-h endpoint values involving 1647 unique studies on 476 unique chemicals encompassing 40 genera and 70 species of green, blue-green, and diatom algae. Acceptance criteria for algal ICE models were established prior to evaluation of individual models and included a minimum sample size of 3, a statistically significant regression slope, and a slope estimation parameter ≥0.65. A total of 186 ICE models were possible at the genus level, with 21 meeting quality criteria; and 264 ICE models were developed at the species level, with 32 meeting quality criteria. Algal ICE models will have broad utility in screening environmental hazard assessments, data gap filling in certain regulatory scenarios, and as supplemental information to derive species sensitivity distributions. Environ Toxicol Chem 2016;35:2368–2378. Published 2016 Wiley Periodicals Inc. on behalf of SETAC. This article is a US government work and, as such, is in the public domain in the United States of America.
INTRODUCTION
Interspecies Correlation Estimation (ICE) models are least squares linear regressions that predict toxic effect concentrations of chemicals to untested taxa. Simply stated, ICE models are the log-linear relationship of relative sensitivity between the taxa of interest and that of a surrogate species (e.g., standard test species). Models for wildlife (birds and mammals 1-3) and aquatic animals (fish, invertebrates, amphibians 4-8) have been developed. Models for these taxonomic groups are publicly available on the US Environmental Protection Agency's (USEPA's) web-based application Web-ICE 9, 10 and now include algae models established from the present study. The ICE models are becoming an important tool for determining safe concentrations of chemical exposures for a diversity of species through either direct toxicity estimation or generation of species sensitivity distributions (SSDs) 2, 3, 5, 6. Barron et al. 11 evaluated sources of uncertainty and variability for SSDs for aquatic organisms with a focus on fish and invertebrates. Not unexpectedly, ICE models provide better predictions for closely related taxa 1, 12. Guidance for their use in hazard assessment recommends that models be selected so that the surrogate and predicted species are as closely related as possible; that is, fish surrogates should be used to predict fish toxicity and invertebrate surrogates to predict invertebrate toxicity 6, 12. In addition to the advantages of developing toxicity estimates for hazard assessments containing limited measured data, ICE models enable reduced testing of vertebrate animals in the context of improved animal welfare 13.
In addition, the use of SSDs and a hazardous concentration that is protective of 95% of laboratory-tested species based on the SSD predictions is being employed or explored by a number of regulatory authorities including the European Chemicals Agency, the USEPA, Environment Canada, and the Chinese Research Academy of Environmental Sciences for chemical registrations and the development of water quality criteria. Finally, ICE is recommended by the US National Research Council in endangered species assessments of pesticides 13-17. Web-ICE models are envisioned to supplement the development of SSDs in the future by providing estimated values for additional taxa.
Algal species were not included in Web-ICE prior to 2013 (Ver 3.2.1). This is despite the broad recognition of the importance of algae in aquatic ecosystems and the need to consider algal responses to chemical stressors in environmental risk assessment. For certain chemicals, algae are often more sensitive than fish or invertebrate test organisms 18-20. Testing strategies that target algae and Daphnia (invertebrates) can lead to large reductions in subsequent animal (fish) testing because these can be the most sensitive taxa to a diversity of chemicals 21. For example, Jeram et al. 19 found that for 1439 substances algae, Daphnia, and fish were the most sensitive group 39.3%, 23.2%, and 14.9% of the time, respectively, with 22.6% of substances being equitoxic across all taxa. These observations are consistent with observations of large proprietary, multiple-species testing programs of many chemical technologies for consumer product formulations (S. Gimeno, Procter & Gamble, internal research summary presented at SETAC-Europe, Warsaw, Poland, May 2008).
Algal toxicity data are typically required by regulatory authorities worldwide in chemical dossiers and registration processes, in addition to data on fish and invertebrates (e.g., European Chemicals Agency 22, Zeeman 23, and Ministry of Environmental Protection of the People's Republic of China 24). The development of ICE models for algae completes the domain of the 3 principal taxonomic groups (algae, invertebrate, fish) to inform aquatic environmental assessments. Reasons for the lack of algal ICE models prior to 2013 relate to difficulties in assembling reliable toxicity data with comparable test procedures and toxicity metrics. An inherent extrapolation challenge exists in relating algal toxicity to invertebrate or fish toxicity. Algal toxicity endpoints are associated with growth of the population in biomass and reflect the unique physiological process being monitored in algal toxicity tests (photosynthesis) that is not shared by fish or invertebrate test species. It is likely that algae may not have an intrinsic relationship to higher, nonphotosynthetic organisms with respect to sensitivity to chemical exposure because of the distance of their taxonomic relatedness 12. Thus, the addition of algae expands the predictive utility of Web-ICE appreciably.
The purpose of the present study was to develop species-level and genus-level algal ICE models using as many chemicals and algal taxa as the available data can support. Alternative assumptions and criteria were systematically evaluated for screening and normalizing the algal toxicity data and for cross-validating the regressions. Unlike fish and invertebrate ICE models, models were developed using both genus and species surrogates because species-specific algal identifications are often lacking in the historical algal literature. In the present study, the algal toxicity database used for constructing algal ICE models is described, followed by the development of robust algal ICE models. Improvements to algal Web-ICE models are subsequently identified.
MATERIALS AND METHODS
Algal toxicity data compilation
The approach used to develop algal ICE models at the species and genus level is outlined in Figure 1. It consisted of the following: algal toxicity data compilation, data quality screening and normalization, regression model development and statistical analysis, and ICE model development and cross-validation. Over 20 000 data records were identified from electronic searches of ECOTOX (extraction occurred May 2010 25), Procter & Gamble internal testing archives (available data as of October 2012), and high-quality USEPA data that were not included in ECOTOX (information made available November 2012). Procter & Gamble testing was conducted in the course of consumer product ingredient assessments using conventional test methods. Testing was conducted under good laboratory practice or equivalent protocols, either in-house by Procter & Gamble or through contract test laboratories. Collected literature was extensively evaluated for quality ecotoxicity data, and conclusions were verified. Discrepancies were identified and resolved prior to any further analysis. In addition, taxonomic nomenclature and chemical synonyms were harmonized.

After removal of duplicates and single-concentration test results, 17 183 algal toxicity data records were retained, comprising tests of 1379 chemicals with 520 algal species or genera.
Data quality screening and normalization
Figure 1 summarizes the data quality screening and normalization process used to construct a reliable and consistent database of algal toxicity studies for regression analysis. The data records retained were restricted to studies that used the conventional test durations of 72 h and 96 h, reported effect concentration statistics (described in detail below, Choice of toxicity metric), and identified the algal taxon to at least the genus level. This reduced the number of records by 80%; 3500 records were retained, representing 752 chemicals tested with 76 species and genera. Taxonomic information and chemical coverage for this restructured database are provided in the Supplemental Data.
The validity and quality of each study were determined by evaluating its adherence to 40-plus descriptive parameters and data quality criteria found within standard method guidelines such as Organisation for Economic Co-operation and Development (OECD) 201, USEPA 850.5400, and ASTM International E1218 26-28. Evaluation details can be found in the Supplemental Data. Acceptable studies were retained in the database and would be considered consistent with Klimisch level 1 or level 2 quality scores 29. Klimisch scoring is widely used by industry and regulators as the basis for assessing the quality of toxicological and ecotoxicological studies.
Algal taxonomic evaluation was conducted to ensure consistent species or genus designations. Synonymous species and genus names were each grouped. This was deemed necessary because of the nature of the algal data, reflecting the difficulty in precise algal taxonomy for some congeneric species, as well as historical changes in nomenclature. For example, toxicity data for Pseudokirchneriella and its former genus names, Selenastrum and Raphidocelis, were grouped together (the sole Pseudokirchneriella species was Pseudokirchneriella subcapitata). For approximately 5% of studies the alga tested was reliably identified only to the genus level; these data records were used in genus-level ICE models only. Two data sets were distinguished from the primary database based on the level of algal taxonomy: a genus data set and a species data set. Modeling was subsequently conducted on both to compare interspecies and intergenus toxicity correlation estimates.
Similarly, synonymous chemical names occurring in the primary database were identified by Chemical Abstracts Service number and grouped. Chemicals were then categorized into functional classes through the use of structural fragments as in ECOSAR 30, the OECD QSAR Toolbox 31, and expert judgment with the assistance of expert chemists. In some cases, multiple algal toxicity estimates were available for a single chemical and test organism. If the range of toxicity estimates exceeded a factor of 10, the data pair was viewed as potentially less reliable for that chemical. These potential outliers were evaluated for causes of variability and, if not resolved, were removed from the database. For ranges less than 10, geometric means and variances of toxicity values were calculated for each chemical/taxa data set; the calculated mean was subsequently used in regression modeling. For multiple, single-species toxicity estimates of divalent metals, data records were grouped for the various salts tested, irrespective of the counteranion.
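The replicate-handling rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `summarize_replicates` and the example values are hypothetical, and the manual review of flagged pairs is left to the analyst.

```python
import math

def summarize_replicates(ec50_values, max_fold_range=10.0):
    """Summarize replicate EC50s (mg/L) for one chemical/taxon pair.

    Returns (geometric_mean, needs_review). needs_review is True when the
    largest and smallest values differ by more than max_fold_range.
    """
    if not ec50_values or any(v <= 0 for v in ec50_values):
        raise ValueError("EC50 values must be a non-empty list of positive numbers")
    fold_range = max(ec50_values) / min(ec50_values)
    # Geometric mean computed as the back-transformed mean of log10 values.
    mean_log = sum(math.log10(v) for v in ec50_values) / len(ec50_values)
    return 10 ** mean_log, fold_range > max_fold_range


# Hypothetical replicate sets for illustration only.
consistent = summarize_replicates([1.0, 4.0])   # 4-fold range
divergent = summarize_replicates([0.5, 8.0])    # 16-fold range, flagged for review
```

Both hypothetical sets have the same geometric mean; only the second would be set aside for evaluation of the causes of variability before any value enters a regression.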
Choice of toxicity metric
Algal toxicity data were initially sorted and grouped to ensure consistency in exposure duration, toxicity test metrics, and taxonomic designations. Data records from tests conducted under conventional exposure durations (72 h and 96 h) were retained; data with other test durations were removed from the database. Similarly, only records citing conventional test metrics (effect concentrations [EC] relating to algal growth rate over the test duration [ErC], yield or biomass under the curve [EbC], or cell density at the termination of the study) were retained for regression analyses. Because of different preferences and regulatory needs among OECD member countries, 2 response variables are accepted. The average specific growth rate (ErC) is calculated based on the logarithmic increase in biomass during the test period, and the yield (EbC) is the biomass at the end of the test minus the starting biomass (see OECD 26 and USEPA 27 for complete endpoint descriptions). Effect concentrations causing a 50% reduction in growth rate (ErC50), biomass (EbC50), or cell density (EC50) were used in the development of ICE models (referred to collectively as EC50). The influence of the metrics on the regression variability and goodness of fit was statistically evaluated in initial methods development. Regressions of the empirical relationship between ErC50 and EbC50, which are often determined within the same algal toxicity study, were generated using larger data sets of the most commonly evaluated taxa. In this manner, it was concluded that the relationship was sufficiently predictive across a wide range of chemicals that metrics could be mixed into an ICE model regression. However, for the development of ICE models broadly across the full complement of algal species or genera, data for the same effect metric (EC50, ErC50, EbC50) for both predicted and surrogate species with the same chemical were used to develop ICE models whenever possible. Only when this was not possible were endpoint metrics mixed.
The focus in the present study was on acute toxicity only. Therefore, models do not contain no-observed-effect concentration and EC10 algal inhibition results. Web-ICE is built for the assessment of acute toxicity relationships. In general, acute toxicity values (i.e., EC50s) will have the smallest confidence interval derived from the exposure–response curve from well-conducted studies and therefore would be more likely to be associated with quality interspecies correlation estimates.
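To make the difference between the 2 response variables concrete, the following Python sketch computes the average specific growth rate and the yield for a control and a treated culture, then expresses each as percent inhibition relative to the control. The function names and biomass values are hypothetical and chosen only for illustration; the guideline documents remain the authoritative description of the endpoints.

```python
import math

def average_specific_growth_rate(biomass_start, biomass_end, days):
    """Logarithmic increase in biomass per day over the test period."""
    return (math.log(biomass_end) - math.log(biomass_start)) / days

def biomass_yield(biomass_start, biomass_end):
    """Biomass at the end of the test minus the starting biomass."""
    return biomass_end - biomass_start

def percent_inhibition(control_response, treated_response):
    """Reduction of a response variable relative to the control, in percent."""
    return 100.0 * (control_response - treated_response) / control_response


# Hypothetical 72-h test: cell densities in cells/mL.
control_rate = average_specific_growth_rate(1e4, 1e6, 3)
treated_rate = average_specific_growth_rate(1e4, 1e5, 3)
rate_inhibition = percent_inhibition(control_rate, treated_rate)

control_yield = biomass_yield(1e4, 1e6)
treated_yield = biomass_yield(1e4, 1e5)
yield_inhibition = percent_inhibition(control_yield, treated_yield)

print(f"growth rate inhibition: {rate_inhibition:.1f}%")
print(f"yield inhibition: {yield_inhibition:.1f}%")
```

In this hypothetical case the same exposure halves the growth rate but reduces the yield by roughly 91%, which illustrates why a yield-based EC50 for a given study tends to be numerically lower than the growth rate-based EC50.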
Regression model development and validation
Algal ICE regression models were developed from the standardized toxicity database using ordinary least squares regression, consistent with methods used for fish and invertebrate ICE models 10. The log10-transformed EC50 value (milligrams per liter) for the surrogate species or genus for each test chemical was designated the independent variable (x), and the respective log10-transformed EC50 value (milligrams per liter) for the predicted species or genus was included as the dependent variable (y). For chemicals that were assessed multiple times, toxicity data for each algal species or genus were summarized as a mean and standard deviation to assess for patterns of repeatability. Geometric mean values per chemical per taxon were used in ICE models so that models were not biased by repeat tests on the same chemical. All analyses were conducted using S-PLUS 8.1 statistical software (TIBCO Software).
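The regression described above can be sketched as follows. The study used S-PLUS; this standard-library Python version is only an illustrative stand-in, and the function name `fit_ice_model` and the paired EC50 values are hypothetical.

```python
import math

def fit_ice_model(surrogate_ec50, predicted_ec50):
    """Ordinary least squares fit of log10(predicted) on log10(surrogate).

    Inputs are paired EC50 values (mg/L), one pair per chemical.
    Returns (slope, intercept, r_squared) on the log10 scale.
    """
    if len(surrogate_ec50) != len(predicted_ec50) or len(surrogate_ec50) < 3:
        raise ValueError("need at least 3 paired EC50 values")
    x = [math.log10(v) for v in surrogate_ec50]
    y = [math.log10(v) for v in predicted_ec50]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("EC50 values must vary across chemicals")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


# Hypothetical data: the predicted taxon is uniformly 2-fold less sensitive.
slope, intercept, r_squared = fit_ice_model([0.1, 1.0, 10.0, 100.0],
                                            [0.2, 2.0, 20.0, 200.0])
```

For this idealized data set the slope is 1 and the intercept is log10(2), about 0.301; real algal data scatter around the line, which is what the r2 and p value columns of the model tables summarize.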
Inclusion of algal ICE regressions into the Web-ICE model platform was based on adherence to 3 criteria: statistical significance of the regression, an adequate slope coefficient, and a minimum sample size. Dyer et al. 5 applied these criteria to fish and invertebrate models at p < 0.05, slope ≥0.65, and a sample size ≥10. For algal ICE models included in Web-ICE, a minimum sample size of n = 3 was used, which is consistent with the ICE models in Raimondo et al. 1, 10 for terrestrial wildlife and aquatic animals. These 3 “robustness criteria” were selected based on preliminary statistical evaluations of the algal toxicity database. Given the form and content of the standardized algal database, these criteria were chosen to maximize data quality while maintaining reasonable breadth among the chemical and species domains that could be modeled. For the purposes of the present study, a sample size criterion of 7 (number of chemicals tested in common for a taxonomic pair) was used as a compromise between that for models used on Web-ICE and that previously used in Dyer et al. 5. The criteria for statistical significance (p < 0.05) and slope (≥0.65) were unchanged.
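The 3 criteria amount to a simple screening rule, sketched below in Python. The function and parameter names are hypothetical, and the p value is assumed to have been computed by the regression software rather than by this snippet.

```python
def meets_robustness_criteria(p_value, slope, n,
                              alpha=0.05, min_slope=0.65, min_n=7):
    """Return True when a regression passes all 3 screening criteria."""
    return p_value < alpha and slope >= min_slope and n >= min_n


# Regression statistics as reported in Table 2 for
# Pseudokirchneriella subcapitata (X) and Chlorella vulgaris (Y).
table2_example = meets_robustness_criteria(p_value=0.00301, slope=0.724, n=18)

# Hypothetical regression with only 5 chemicals in common: it fails the
# n >= 7 criterion of the present study but passes the Web-ICE minimum of 3.
small_n_study = meets_robustness_criteria(p_value=0.01, slope=0.9, n=5)
small_n_webice = meets_robustness_criteria(p_value=0.01, slope=0.9, n=5, min_n=3)
```

Keeping the minimum sample size as a parameter makes it easy to compare the n = 3 threshold used for Web-ICE with the n = 7 threshold applied here.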
The regression models that satisfied the statistical criteria for each species or genus pair were cross-validated by the “leave-one-out” approach used by Raimondo et al. 1, 10. In this approach, each pair of EC50 values (surrogate and predicted species responses for a given chemical) was removed in turn, and the regression was recalculated with the remaining data. The “n-fold” difference between each estimated and actual value was used to determine the accuracy of the estimated toxicity value. An n-fold difference of 5 has been suggested as a common level of interlaboratory variability 32, 33. The cross-validation success rate for each model was calculated as the proportion of removed data points that were predicted within 5-fold of the actual value. If the removal of an x–y data pair resulted in a model that was not significant at the p < 0.05 level, the replicate was not included in calculating the cross-validation success rate.
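A minimal Python sketch of the leave-one-out procedure follows. It is an illustration under stated assumptions rather than the code used in the study: the function names and the example data are hypothetical, and the re-check of regression significance after each removal is omitted because it needs a t distribution that the standard library does not provide.

```python
import math

def _ols(x, y):
    """Slope and intercept of an ordinary least squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x

def fold_difference(estimated, actual):
    """n-fold difference between two positive values (always >= 1)."""
    ratio = estimated / actual
    return max(ratio, 1.0 / ratio)

def loo_success_rate(surrogate_ec50, predicted_ec50, max_fold=5.0):
    """Fraction of left-out chemicals estimated within max_fold of the actual EC50."""
    x = [math.log10(v) for v in surrogate_ec50]
    y = [math.log10(v) for v in predicted_ec50]
    hits = 0
    for i in range(len(x)):
        # Refit the model without chemical i, then estimate its EC50.
        slope, intercept = _ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        estimate = 10 ** (intercept + slope * x[i])
        if fold_difference(estimate, predicted_ec50[i]) <= max_fold:
            hits += 1
    return hits / len(x)


# Hypothetical paired EC50s (mg/L); the last chemical is a strong outlier.
rate_clean = loo_success_rate([0.1, 1.0, 10.0, 100.0], [0.2, 2.0, 20.0, 200.0])
rate_outlier = loo_success_rate([1.0, 10.0, 100.0, 1000.0], [1.0, 10.0, 100.0, 1e6])
```

A single outlier lowers the success rate twice over: it is itself poorly estimated, and it distorts the refitted line when other chemicals are left out. This helps explain why models built on few chemicals can show very low success rates.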
RESULTS
The final database contained 1647 unique data records, 1041 from ECOTOX, 253 from Procter & Gamble's study archives, and 353 from the USEPA. Represented in the database were tests on 476 unique chemicals with 70 species and 40 genera of algae (Figure 1). The distribution of algal taxa tested was dominated by green algae (43%), diatoms (26%), and blue-green algae (26%).
The chemical modeling domain
Table 1 profiles the chemical domain represented in the final database. Inorganic chemicals were grouped as inorganic compounds, metals, and organo-metal compounds. Organic chemicals comprised the balance of the remaining chemical classes. A total of 62 classes of organic chemicals were represented, of which 30 classes contained only 1 or 2 unique chemicals. The neutral organic class dominated the distribution with 106 chemicals, followed by esters (39), phenols (37), aliphatic amines (31), and amides (27). Collectively, 31 surfactants in 3 classes based on charge were included in the database, reflecting Procter & Gamble's focus on high-volume cleaning product ingredients. Approximately 20 test records were included for metal ions of copper, cadmium, zinc, lead, and various oxides and salts.
Chemical class | n |
---|---|
Benzyl alcohol | 1 |
Benzyl halide | 1 |
Cationic polymer | 1 |
Chelator | 1 |
Chlorophenoxy ester | 1 |
Epoxides, mono | 1 |
Halo acid | 1 |
Halo ester | 1 |
Halo ketone | 1 |
Inorganic compound | 1 |
Keto-enol | 1 |
Methoxyacrylate | 1 |
Neonicotinoid | 1 |
Nicotinoid acid | 1 |
Nitrile | 1 |
Organometal | 1 |
Oxime carbamate ester | 1 |
Phenol amine | 1 |
Propargyl ether | 1 |
Triazole pyrimidine sulfona | 1 |
Vinyl/allyl alcohol | 1 |
Vinyl/allyl ether acid | 1 |
Thiophthalimides | 2 |
Thiazolones (iso-) | 2 |
Imides | 2 |
Halo ethers | 2 |
Halo alcohols | 2 |
Cyanides | 2 |
Benzyl nitriles | 2 |
Acrylimides | 2 |
Aldehydes | 3 |
Alkyl imidazolinium | 3 |
Polynitrophenols | 3 |
Pyridine-alpha acid | 3 |
Quinolones | 3 |
Thiocyanates | 3 |
Thioureas | 3 |
Esters (phosphate) | 4 |
Haloacetamides | 4 |
Pyrethroids | 4 |
Vinyl/allyl esters | 4 |
Carbonyl ureas | 5 |
Polynitrobenzenes | 6 |
Triazoles (nonfused) | 6 |
Vinyl/allyl ketones | 7 |
Metals | 8 |
Carbamates | 9 |
Esters, mono or dithiophosphates | 9 |
Hydrazines | 9 |
Pyrazoles | 9 |
Cationic surfactants | 10 |
Halopyridines | 10 |
Imidazoles | 10 |
Nonionic surfactants | 10 |
Sulfonyl ureas | 10 |
Anionic surfactants | 11 |
Substituted ureas | 11 |
Vinyl/allyl halides | 11 |
Phenols, poly-acid | 14 |
Thiocarbamate | 14 |
Triazines | 17 |
Anilines | 22 |
Amides | 27 |
Aliphatic amines | 31 |
Phenols | 37 |
Esters | 39 |
Neutral organics | 106 |
Use of different EC50 metrics for the purpose of effect comparisons in ICE models
The effect of utilizing ErC50 and EbC50 values as interchangeable metrics was evaluated from paired algal toxicity tests with Desmodesmus and Pseudokirchneriella across the range of chemicals tested in common. Values for the 2 metrics appear to be randomly distributed; the correlation coefficient of the pooled regression was 0.85 (r2 = 0.73).
For studies in which both ErC50 and EbC50 were calculated, EbC50 always provided the lower value. However, the relationship is highly predictive with a slope close to unity, and the intercept represents the routine difference between the 2 (generally a factor of 3.2). Regressions of P. subcapitata and Desmodesmus subspicatus ErC50s, EbC50s, and pooled data were all highly significant (p < 0.001). Clearly, EbC50 effect values were quantitatively lower than ErC50 effect values, but combining endpoints still provided reasonable comparison of sensitivity for the species evaluated.
Interspecies and intergenus regression results
Regressions of EC50 values for chemicals tested in common between 2 species were calculated for each species pair among the 70 algal species in the toxicity database for a total of 264 paired comparisons. Thirty-two of the paired species regressions satisfied all 3 criteria, and their regression statistics (p value, sample size, slope, intercept, and r2) are shown in Table 2. Among these, the species with the greatest number of “robust” regressions with other species were P. subcapitata (7 species), Skeletonema costatum (5), D. subspicatus (3), and Scenedesmus acutus (3). Figure 2A provides a representative species model for P. subcapitata and Chlorella vulgaris. The remaining interspecies regressions failed at least 1 criterion. In most cases, these regressions had a low sample size (<7) of chemicals tested in common between the 2 species or had a slope <0.65.
Species (X) | Species (Y) | p | n | Slope | Intercept | r2 |
---|---|---|---|---|---|---|
Anabaena flos-aquae | Microcystis aeruginosa | <0.00001 | 15 | 0.887 | –0.698 | 0.737 |
Anabaena flos-aquae | Microcystis flos-aquae | 0.00041 | 15 | 0.958 | –1.102 | 0.630 |
Chlorella pyrenoidosa | Scenedesmus acutus | <0.00001 | 18 | 0.796 | 0.415 | 0.697 |
Desmodesmus subspicatus | Phaeodactylum tricornutum | <0.00001 | 13 | 0.975 | 0.088 | 0.936 |
Desmodesmus subspicatus | Pseudokirchneriella subcapitata | <0.00001 | 32 | 0.876 | 0.067 | 0.969 |
Desmodesmus subspicatus | Skeletonema costatum | <0.00001 | 23 | 0.938 | –0.543 | 0.955 |
Microcystis aeruginosa | Anabaena flos-aquae | <0.00001 | 15 | 0.831 | –0.067 | 0.737 |
Microcystis aeruginosa | Microcystis flos-aquae | 0.00088 | 15 | 0.894 | –0.882 | 0.586 |
Microcystis flos-aquae | Anabaena flos-aquae | 0.00041 | 15 | 0.658 | –0.184 | 0.630 |
Microcystis flos-aquae | Microcystis aeruginosa | 0.00088 | 15 | 0.655 | –0.612 | 0.586 |
Minutocellus polymorphus | Skeletonema costatum | <0.00001 | 15 | 0.743 | 0.147 | 0.757 |
Phaeodactylum tricornutum | Skeletonema costatum | <0.00001 | 13 | 0.912 | –0.906 | 0.947 |
Phaeodactylum tricornutum | Desmodesmus subspicatus | <0.00001 | 13 | 0.960 | –0.364 | 0.936 |
Pseudokirchneriella subcapitata | Anabaena flos-aquae | 0.00065 | 20 | 0.836 | 0.271 | 0.466 |
Pseudokirchneriella subcapitata | Chlorella vulgaris | 0.00301 | 18 | 0.724 | 0.388 | 0.433 |
Pseudokirchneriella subcapitata | Scenedesmus acutus | 0.00582 | 15 | 0.716 | –0.042 | 0.455 |
Pseudokirchneriella subcapitata | Chlorella pyrenoidosa | 0.00011 | 19 | 1.028 | 0.481 | 0.593 |
Pseudokirchneriella subcapitata | Scenedesmus quadricauda | <0.00001 | 21 | 0.703 | 0.182 | 0.571 |
Pseudokirchneriella subcapitata | Skeletonema costatum | <0.00001 | 39 | 1.050 | –0.593 | 0.936 |
Pseudokirchneriella subcapitata | Desmodesmus subspicatus | <0.00001 | 32 | 1.107 | –0.110 | 0.969 |
Scenedesmus acutus | Anabaena flos-aquae | 0.04109 | 11 | 0.692 | 0.148 | 0.387 |
Scenedesmus acutus | Scenedesmus quadricauda | 0.00273 | 15 | 0.689 | –0.112 | 0.511 |
Scenedesmus acutus | Chlorella pyrenoidosa | <0.00001 | 18 | 0.875 | –0.376 | 0.697 |
Scenedesmus quadricauda | Pseudokirchneriella subcapitata | <0.00001 | 21 | 0.813 | –0.331 | 0.571 |
Scenedesmus quadricauda | Scenedesmus acutus | 0.00273 | 15 | 0.742 | –0.073 | 0.511 |
Sellaphora seminulum | Pseudokirchneriella subcapitata | 0.00826 | 12 | 0.815 | –0.446 | 0.519 |
Skeletonema costatum | Phaeodactylum tricornutum | <0.00001 | 13 | 1.038 | 0.739 | 0.947 |
Skeletonema costatum | Pseudokirchneriella subcapitata | <0.00001 | 39 | 0.891 | 0.478 | 0.936 |
Skeletonema costatum | Minutocellus polymorphus | <0.00001 | 15 | 1.019 | –0.252 | 0.757 |
Skeletonema costatum | Desmodesmus subspicatus | <0.00001 | 23 | 1.018 | 0.468 | 0.955 |
Skeletonema costatum | Thalassiosira pseudonana | <0.00001 | 19 | 0.957 | 0.175 | 0.927 |
Thalassiosira pseudonana | Skeletonema costatum | <0.00001 | 19 | 0.969 | –0.290 | 0.927 |

Intergenus regressions were calculated for all possible paired comparisons among 40 genera and evaluated by the same robustness criteria. Twenty-one paired regressions satisfied the criteria (Table 3), and Figure 2B provides a representative genus-level model for Pseudokirchneriella and Chlorella. Genera with the highest number of robust regressions with other genera were Skeletonema (5), Pseudokirchneriella (4), and Desmodesmus (3). Genera with a single robust regression were Anabaena, Chlamydomonas, Chlorella, Minutocellus, Scenedesmus, Sellaphora, and Thalassiosira. The remaining regressions failed at least 1 criterion, usually because of inadequate sample size or a nonsignificant slope. These were not considered further for the set of intergenus ICE models. Figure 3 provides examples of 4 intergenus regression estimates of toxicity (Anabaena, Chlorella, Scenedesmus, and Desmodesmus) based on Pseudokirchneriella's sensitivity as the independent variable. As the chemicals tested in common with Pseudokirchneriella differed for each of the predicted genera, differences in slopes may not be directly comparable; however, these comparisons can provide an overall impression of relative order of sensitivity. Pseudokirchneriella is nearly equisensitive to Desmodesmus (slope of 1.107, intercept of –0.107; Table 3). Chlorella is consistently less sensitive than Pseudokirchneriella (Figure 3).
Genus (X) | Genus (Y) | p | n | Slope | Intercept | r2 |
---|---|---|---|---|---|---|
Anabaena | Pseudokirchneriella | <0.00001 | 24 | 0.650 | –0.331 | 0.519 |
Chlamydomonas | Chlorella | 0.00981 | 11 | 0.862 | 0.230 | 0.542 |
Chlorella | Pseudokirchneriella | <0.00001 | 32 | 0.686 | –0.232 | 0.479 |
Desmodesmus | Pseudokirchneriella | <0.00001 | 33 | 0.876 | 0.064 | 0.969 |
Desmodesmus | Phaeodactylum | <0.00001 | 13 | 0.975 | 0.088 | 0.936 |
Desmodesmus | Skeletonema | <0.00001 | 23 | 0.938 | –0.543 | 0.955 |
Minutocellus | Skeletonema | <0.00001 | 15 | 0.743 | 0.147 | 0.757 |
Phaeodactylum | Skeletonema | <0.00001 | 13 | 0.912 | –0.906 | 0.947 |
Phaeodactylum | Desmodesmus | <0.00001 | 13 | 0.960 | –0.364 | 0.936 |
Pseudokirchneriella | Chlorella | <0.00001 | 32 | 0.699 | 0.543 | 0.479 |
Pseudokirchneriella | Desmodesmus | <0.00001 | 33 | 1.107 | –0.107 | 0.969 |
Pseudokirchneriella | Skeletonema | <0.00001 | 40 | 1.054 | –0.568 | 0.940 |
Pseudokirchneriella | Anabaena | <0.00001 | 24 | 0.798 | 0.291 | 0.519 |
Scenedesmus | Pseudokirchneriella | 0.00049 | 25 | 0.738 | –0.203 | 0.417 |
Sellaphora | Pseudokirchneriella | 0.00826 | 12 | 0.815 | –0.446 | 0.519 |
Skeletonema | Thalassiosira | <0.00001 | 20 | 0.959 | 0.183 | 0.927 |
Skeletonema | Minutocellus | <0.00001 | 15 | 1.019 | –0.252 | 0.757 |
Skeletonema | Phaeodactylum | <0.00001 | 13 | 1.038 | 0.739 | 0.947 |
Skeletonema | Desmodesmus | <0.00001 | 23 | 1.018 | 0.468 | 0.955 |
Skeletonema | Pseudokirchneriella | <0.00001 | 40 | 0.892 | 0.462 | 0.940 |
Thalassiosira | Skeletonema | <0.00001 | 20 | 0.967 | –0.296 | 0.927 |
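
Applying a tabulated model is a single back-transformation from the log10 scale. The Python sketch below uses 3 slope and intercept pairs copied from Table 3; the dictionary layout and function name are hypothetical, and the sketch returns a point estimate only, with no confidence interval.

```python
import math

# (surrogate genus, predicted genus): (slope, intercept), from Table 3.
GENUS_MODELS = {
    ("Pseudokirchneriella", "Desmodesmus"): (1.107, -0.107),
    ("Pseudokirchneriella", "Chlorella"): (0.699, 0.543),
    ("Pseudokirchneriella", "Skeletonema"): (1.054, -0.568),
}

def predict_ec50(surrogate, predicted, surrogate_ec50_mg_l):
    """Point estimate of the predicted genus EC50 (mg/L) from a surrogate EC50."""
    slope, intercept = GENUS_MODELS[(surrogate, predicted)]
    log_estimate = intercept + slope * math.log10(surrogate_ec50_mg_l)
    return 10 ** log_estimate


# Hypothetical surrogate EC50 of 10 mg/L for Pseudokirchneriella.
desmodesmus_estimate = predict_ec50("Pseudokirchneriella", "Desmodesmus", 10.0)
chlorella_estimate = predict_ec50("Pseudokirchneriella", "Chlorella", 10.0)
```

With a surrogate EC50 of 10 mg/L, the Desmodesmus estimate is about 10 mg/L (because -0.107 + 1.107 × 1 = 1.000), while the Chlorella estimate is higher, reflecting the positive intercept of that model.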

Cross-validation
Only the 58 “robust” regressions were subsequently cross-validated to evaluate their ability to accurately predict toxicity. The species-level models predicted within 5-fold for on average 70.3% of cross-validated data points and within 10-fold for 76.7% of data points (Table 4). At the 5-fold and 10-fold criteria levels, 4 and 2 models, respectively, yielded 0% success. No apparent pattern based on taxonomy was discerned, and all of the species pairs that were not predicted within 10-fold had fewer than 7 chemicals represented in the correlation. Twenty-five of the 58 species pairs had cross-validation success rates at the 5-fold criterion level that were ≥80%. Success rates at the genus level were somewhat higher, with the worst-performing genus pairs approximately 40% (Table 5). Twenty of 44 models had ≥80% success rates at the 5-fold criterion level.
Species X (surrogate) | Species Y (predicted) | Cross-validation success within 5-fold (%) | Cross-validation success within 10-fold (%) |
---|---|---|---|
Chlorella pyrenoidosa | Anabaena flos-aquae | 71 | 86 |
Microcystis aeruginosa | Anabaena flos-aquae | 93* | 93* |
Microcystis flos-aquae | Anabaena flos-aquae | 87* | 87* |
Pseudokirchneriella subcapitata | Anabaena flos-aquae | 62* | 76* |
Scenedesmus acutus | Anabaena flos-aquae | 0* | 20* |
Scenedesmus quadricauda | Anabaena flos-aquae | 78 | 78 |
Skeletonema costatum | Anabaena flos-aquae | 50 | 63 |
Desmodesmus subspicatus | Chlamydomonas reinhardtii | 50 | 50 |
Anabaena flos-aquae | Chlorella pyrenoidosa | 86 | 86 |
Pseudokirchneriella subcapitata | Chlorella pyrenoidosa | 74* | 79* |
Scenedesmus acutus | Chlorella pyrenoidosa | 83* | 83* |
Desmodesmus subspicatus | Chlorella vulgaris | 33 | 33 |
Pseudokirchneriella subcapitata | Chlorella vulgaris | 67* | 67* |
Skeletonema costatum | Chlorella vulgaris | 0 | 0 |
Chlamydomonas reinhardtii | Desmodesmus subspicatus | 0 | 100 |
Chlorella vulgaris | Desmodesmus subspicatus | 0 | 0 |
Phaeodactylum tricornutum | Desmodesmus subspicatus | 85* | 85* |
Pseudokirchneriella subcapitata | Desmodesmus subspicatus | 84* | 94* |
Scenedesmus quadricauda | Desmodesmus subspicatus | 78 | 78 |
Skeletonema costatum | Desmodesmus subspicatus | 91* | 91* |
Pseudokirchneriella subcapitata | Dunaliella tertiolecta | 67 | 67 |
Anabaena flos-aquae | Microcystis aeruginosa | 87* | 93* |
Microcystis flos-aquae | Microcystis aeruginosa | 93* | 93* |
Anabaena flos-aquae | Microcystis flos-aquae | 87* | 93* |
Microcystis aeruginosa | Microcystis flos-aquae | 93* | 93* |
Skeletonema costatum | Minutocellus polymorphus | 87* | 93* |
Desmodesmus subspicatus | Phaeodactylum tricornutum | 85* | 85* |
Pseudokirchneriella subcapitata | Phaeodactylum tricornutum | 100 | 100 |
Skeletonema costatum | Phaeodactylum tricornutum | 77* | 85* |
Anabaena flos-aquae | Pseudokirchneriella subcapitata | 48 | 57 |
Chlorella pyrenoidosa | Pseudokirchneriella subcapitata | 63 | 63 |
Chlorella vulgaris | Pseudokirchneriella subcapitata | 50 | 67 |
Desmodesmus subspicatus | Pseudokirchneriella subcapitata | 88* | 94* |
Dunaliella tertiolecta | Pseudokirchneriella subcapitata | 100 | 100 |
Phaeodactylum tricornutum | Pseudokirchneriella subcapitata | 100 | 100 |
Scenedesmus acutus | Pseudokirchneriella subcapitata | 67 | 73 |
Scenedesmus quadricauda | Pseudokirchneriella subcapitata | 67* | 76* |
Sellaphora seminulum | Pseudokirchneriella subcapitata | 42* | 58* |
Skeletonema costatum | Pseudokirchneriella subcapitata | 82* | 89* |
Anabaena flos-aquae | Scenedesmus acutus | 20 | 20 |
Chlorella pyrenoidosa | Scenedesmus acutus | 89* | 89* |
Pseudokirchneriella subcapitata | Scenedesmus acutus | 73* | 73* |
Scenedesmus quadricauda | Scenedesmus acutus | 67* | 73* |
Anabaena flos-aquae | Scenedesmus quadricauda | 89 | 89 |
Desmodesmus subspicatus | Scenedesmus quadricauda | 78 | 89 |
Pseudokirchneriella subcapitata | Scenedesmus quadricauda | 76* | 76* |
Scenedesmus acutus | Scenedesmus quadricauda | 73* | 73* |
Pseudokirchneriella subcapitata | Sellaphora seminulum | 67 | 67 |
Skeletonema costatum | Sellaphora seminulum | 63 | 63 |
Anabaena flos-aquae | Skeletonema costatum | 75 | 86 |
Chlorella vulgaris | Skeletonema costatum | 67 | 100 |
Desmodesmus subspicatus | Skeletonema costatum | 91* | 91* |
Minutocellus polymorphus | Skeletonema costatum | 80* | 87* |
Phaeodactylum tricornutum | Skeletonema costatum | 77* | 77* |
Pseudokirchneriella subcapitata | Skeletonema costatum | 82* | 82* |
Sellaphora seminulum | Skeletonema costatum | 63 | 88 |
Thalassiosira pseudonana | Skeletonema costatum | 100* | 100* |
Skeletonema costatum | Thalassiosira pseudonana | 95* | 100* |
- * Regressions meeting the more stringent criterion of n ≥ 7
Genus X: surrogate | Genus Y: predicted | Cross-validation success (%) at 5-fold | Cross-validation success (%) at 10-fold |
---|---|---|---|
Chlorella | Anabaena | 71 | 71 |
Desmodesmus | Anabaena | 50 | 63 |
Microcystis | Anabaena | 100 | 100 |
Pseudokirchneriella | Anabaena | 71* | 75* |
Skeletonema | Anabaena | 60 | 70 |
Chlorella | Chlamydomonas | 70 | 70 |
Desmodesmus | Chlamydomonas | 80 | 90 |
Anabaena | Chlorella | 71 | 71 |
Chlamydomonas | Chlorella | 60* | 60* |
Pseudokirchneriella | Chlorella | 66* | 75* |
Skeletonema | Chlorella | 56 | 56 |
Anabaena | Desmodesmus | 63 | 63 |
Chlamydomonas | Desmodesmus | 80 | 90 |
Phaeodactylum | Desmodesmus | 85* | 85* |
Pseudokirchneriella | Desmodesmus | 85* | 94* |
Scenedesmus | Desmodesmus | 56 | 67 |
Skeletonema | Desmodesmus | 91* | 91* |
Pseudokirchneriella | Dunaliella | 67 | 67 |
Anabaena | Microcystis | 100 | 100 |
Skeletonema | Minutocellus | 87* | 93* |
Desmodesmus | Phaeodactylum | 85* | 85* |
Pseudokirchneriella | Phaeodactylum | 100 | 100 |
Skeletonema | Phaeodactylum | 77* | 85* |
Anabaena | Pseudokirchneriella | 50* | 63* |
Chlorella | Pseudokirchneriella | 59* | 63* |
Desmodesmus | Pseudokirchneriella | 88* | 94* |
Dunaliella | Pseudokirchneriella | 100 | 100 |
Phaeodactylum | Pseudokirchneriella | 100 | 100 |
Scenedesmus | Pseudokirchneriella | 72* | 72* |
Sellaphora | Pseudokirchneriella | 42* | 58* |
Skeletonema | Pseudokirchneriella | 80* | 83* |
Desmodesmus | Scenedesmus | 78 | 78 |
Pseudokirchneriella | Scenedesmus | 76 | 76 |
Pseudokirchneriella | Sellaphora | 67 | 67 |
Skeletonema | Sellaphora | 63 | 63 |
Anabaena | Skeletonema | 80 | 80 |
Chlorella | Skeletonema | 44 | 44 |
Desmodesmus | Skeletonema | 91* | 91* |
Minutocellus | Skeletonema | 80* | 87* |
Phaeodactylum | Skeletonema | 77* | 77* |
Pseudokirchneriella | Skeletonema | 80* | 83* |
Sellaphora | Skeletonema | 63 | 88 |
Thalassiosira | Skeletonema | 100* | 100* |
Skeletonema | Thalassiosira | 95* | 100* |
- * Regressions meeting the more stringent criterion of n ≥ 7
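The success rates tabulated above can be illustrated with a minimal sketch. It assumes that "n-fold" success means the percentage of chemicals whose predicted value falls within a factor of 5 (or 10) of the measured value, and that each chemical is left out in turn (leave-one-out) before the log-linear regression is refit. Both assumptions, the function names, and the EC50 values are illustrative and are not taken from the study data.

```python
import math


def fit_loglog(x_vals, y_vals):
    """Least squares fit of log10(y) = intercept + slope * log10(x)."""
    lx = [math.log10(v) for v in x_vals]
    ly = [math.log10(v) for v in y_vals]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def loo_success(surrogate, predicted, fold):
    """Percentage of left-out chemicals predicted within `fold` of the measured value."""
    hits = 0
    n = len(surrogate)
    for i in range(n):
        # Refit the model without chemical i (assumed leave-one-out scheme).
        xs = surrogate[:i] + surrogate[i + 1:]
        ys = predicted[:i] + predicted[i + 1:]
        intercept, slope = fit_loglog(xs, ys)
        estimate = 10 ** (intercept + slope * math.log10(surrogate[i]))
        # Fold difference between estimate and measurement, always >= 1.
        ratio = max(estimate / predicted[i], predicted[i] / estimate)
        if ratio <= fold:
            hits += 1
    return 100.0 * hits / n


# Invented EC50 values (mg/L) for six hypothetical chemicals; not study data.
surrogate_ec50 = [0.05, 0.4, 1.2, 8.0, 35.0, 120.0]
predicted_ec50 = [0.09, 0.3, 2.5, 5.0, 60.0, 90.0]

for fold in (5, 10):
    rate = loo_success(surrogate_ec50, predicted_ec50, fold)
    print(f"{fold}-fold success: {rate:.0f}%")
```

Under this definition the 10-fold rate can never be lower than the 5-fold rate for the same model, which agrees with every row of the tables.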
DISCUSSION
The present study demonstrated that algal ICE models can be developed and can be statistically robust for a diversity of species and across multiple chemical classes. In general, it appears that ICE models for algae are in the range of variability and robustness encountered with ICE models built for invertebrates, fish 10 (supported by representative outputs from Web-ICE), and wildlife 1, 3. A number of models in Web-ICE show high prediction accuracy for both invertebrates and fish. For example, using fathead minnow as the surrogate to predict toxicity to rainbow trout or Gammarus pseudolimnaeus to predict Daphnia magna gives highly significant regressions (p < 0.00001 for both) with high R2 values (0.83 and 0.77, respectively), slopes of 0.93 and 0.76, respectively, and narrow prediction intervals. Dyer et al. 6 hypothesized that taxonomic relatedness was a key determinant of uncertainty in ICE models after noting that fish–fish models and invertebrate–invertebrate models were measurably better than invertebrate–fish or fish–invertebrate models. Raimondo et al. 12 demonstrated this facet more conclusively, resulting in recommendations that taxonomic relatedness be considered as a matter of course when utilizing ICE models for environmental assessments. Craig et al. 34 also explored this factor in hierarchical SSD models developed from interspecies correlation estimates where species exchangeability in SSD models is assessed. Algae–invertebrate and algae–fish ICE models were explored to a very minor extent by Asfaw et al. 35. In these summaries, 31 ICE models involving 4 different algal taxa consistently yielded poor results. Only 1 model (the marine diatom S. costatum and the sheepshead minnow, Cyprinodon variegatus) would be judged sufficient using the criteria employed in the present study, and no algal–algal model was useful.
It is very unlikely that useful algae–invertebrate or algae–fish relationships will result from the expanded information base used in the present study because the taxonomic distance between algae and multicellular animals is greater than that in the animal–animal relationships already assessed.
Taxonomic diversity in the algae database and models was somewhat limited because of trends in toxicity testing toward a few standard test species, whereas the chemical coverage was relatively diverse. For example, in recent years there has been substantially less testing of freshwater cyanobacteria and diatoms because standard methods have increasingly emphasized green algae, and thus, much of the data on the non–green algal species are dated. Importantly, both Pseudokirchneriella and Chlorella are members of the family Oocystaceae, and Scenedesmus and Desmodesmus are of the family Scenedesmaceae. Therefore, even for green algae, diversity of commonly tested taxa is somewhat limited. The diversity of photosynthetic biota is very large, and the number of algal species covered is paltry by comparison; the gap is smaller for fish, considering the number of known and suspected fish species worldwide. Freshwater algae are polyphyletic, span at least a dozen described phyla (or divisions), and by some estimates may number in the range of 200 000 species with approximately 25% presently described 36, 37, although some estimates are as high as 1 million 38.
A total of 457 chemicals were included in the present study's models. Neutral organics were the single largest chemical class, which is important because this class also is the largest in commerce. It has been estimated that neutral organics account for over 80% of the chemicals in international commerce by production volume 39. Approximately half of the chemical classes covered (62) were represented by only 1 or 2 individual chemicals. Based on the total chemical and species coverage, a large number of additional algal ICE models could be developed with focused testing on a relatively small number of organisms and chemicals.
Contributors to variability in algal ICE models are different from those for invertebrates and fish. First, summaries of tests on all these taxonomic groups will indicate a certain level of intertest variation on the same chemicals 40. Coefficients of variation for fish and daphnid acute toxicity studies generally range from 5% to 25% 41. Fish and daphnid acute toxicity studies utilize survival (or conversely mortality) as the primary criterion and are exceptionally well standardized for test duration 42-45. This is not the case for algal inhibition studies, whose effect conclusions are based on growth rates of exposed populations for 72 h in the case of the OECD 26 or terminal cell density or biomass after 96 h in the case of the USEPA 27. To meet the needs of these different but related test procedures, different levels of nutrient addition, light, and so on are used. When evaluating algal ICE models, several additional factors contribute to apparent test variability for any given chemical. We showed that these differences are not enormous, but they also should not be ignored. Relationships appeared to be robust when comparing like effect metrics across tests on different species (for example, the ErC50s for D. subspicatus and P. subcapitata for a given test chemical show the same relative sensitivity that their EC50s or EbC50s do). Therefore, algal ICE models could at least initially be developed that utilize an appropriate pair of inhibition endpoint metrics.
One of the 3 robustness criteria applied to the algal database was the minimum number of chemicals tested in common between species/genus pairs, that is, the chemical sample size. A sample size of 7 was employed in the present study. This limit was chosen for robustness and predictivity. Figure 4 compares the constraints on the number of records in the algal database imposed by different chemical sample size limits. The criterion of 7 appeared to be reasonable based on the form and content of the algal data. Web-ICE includes models with 3 or more chemicals in species pair combinations, yielding a total of 58 species-level and 44 genus-level algal models. Models with fewer than 7 chemicals included in Web-ICE still meet slope and statistical significance criteria; however, as discussed in the section Cross-validation, the smaller sample size also influences the cross-validation success rate.

An extension of ICE models to algae fulfills a need to incorporate photosynthetic organisms into the prediction software. An extension to include aquatic macrophytes would also be desirable. Algal–macrophyte and macrophyte–macrophyte ICE models have not been explored to a large degree and are lacking in the literature, most likely because so few plant test species are available. M.M. Gausman (2006, Master's thesis, Miami University, Oxford, OH, USA) compared the sensitivity of Lemna species (L. gibba, L. major, and L. minor) to aquatic algae across a series of broad chemical categories (pesticides, pharmaceuticals, heavy metals, surfactants). Regressions of all chemicals pooled were of similar quality to algal–algal ICE models. The log algae–log Lemna EC50 regression (algae as the surrogate taxon) had a slope of 0.66, an intercept of 0.1, and a correlation of 0.704 (n = 93 compounds). Algae were empirically more sensitive to 50 of the 93 chemicals compared with Lemna.
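As a worked illustration of how such a regression is applied, the sketch below converts an algal EC50 into a predicted Lemna EC50 using the reported slope (0.66) and intercept (0.1). The base-10 logarithm, the use of the same concentration units on both axes, and the function name are assumptions, because the text above does not state them.

```python
import math

SLOPE = 0.66      # reported slope of the log algae-log Lemna regression
INTERCEPT = 0.1   # reported intercept


def predict_lemna_ec50(algal_ec50):
    """Predicted Lemna EC50 from an algal EC50.

    Assumes base-10 logs and identical concentration units for both taxa.
    """
    log_prediction = INTERCEPT + SLOPE * math.log10(algal_ec50)
    return 10 ** log_prediction


# Hypothetical algal EC50 values, for illustration only.
for algal_ec50 in (0.1, 1.0, 10.0):
    print(algal_ec50, round(predict_lemna_ec50(algal_ec50), 3))
```

Because the slope is below 1, a 10-fold change in the algal EC50 corresponds to less than a 10-fold change in the predicted Lemna EC50. A point estimate of this kind carries no prediction interval, which would have to come from the full regression output.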
Algal ICE models are expected to have a number of possible uses in environmental hazard and risk assessment. Algae are often excluded in certain hazard-assessment procedures. For example, algae inhibition information is not a component of typical United States water quality criteria. Algae are among the more sensitive taxonomic groups for such commonly assessed chemicals as triclosan 46, zinc pyrithione 47, cationic surfactants 48, herbicides 49, and alkyl amines 50; and hazard assessments for the purpose of determining safe concentrations in the environment can therefore be improved if algae are included. Hutchinson et al. 20 reviewed pharmaceutical active ingredients and found that algal and daphnid EC50 values were lower than fish 50% lethal concentrations for 73 of 91 (80.2%) actives, leading to the development of the acute threshold approach 13. Jeram et al. 19 expanded this review of sensitivity across algae, invertebrates, and fish and identified that algae were the most sensitive taxa tested in 39.3% of 1439 substances reviewed. This strongly suggests that algae not only are essential members of a screening toxicity data set but also drive many assessments. By understanding such relationships, scientists can also reduce the number of aquatic vertebrate tests by routinely focusing on the trophic levels that are more likely to be sensitive. In addition to use in more traditional hazard assessments, algal ICE models may be useful to plug data gaps in SSDs, support the development of algal quantitative structure–activity relationships, and assist in the support of read-across strategies within chemical categories.
Although the models described in the present study represent a reasonable first step toward their use, many areas remain to be explored, such as a detailed review of marine versus freshwater species relationships. If sufficient information is available someday, more mechanistic-type assessments based on mode of action or specific subsets of chemical categories may reveal important relationships. Development of new data that would result in additional algal ICE models or supplement those presented would allow models to be developed with more diverse chemicals and species. Large databases of studies are now becoming publicly available as a result of the European Registration, Evaluation, Authorization, and Restriction of Chemicals (REACH) legislation, the OECD High Production Volume Challenge, and other large chemical management programs that could be useful to supplement existing data. In the present study, the published literature was directly supplemented by Procter & Gamble algal inhibition data reflecting the use of algal toxicity testing in a traditional industry risk-assessment setting. Additional industry-held data sets would provide added benefit. Variability of the data used to develop algal ICE models could be reduced through the full international standardization of test methodologies and endpoint metrics. The international trend toward use of the 72-h ErC50 and ErC10 growth rate inhibition metrics for acute and chronic toxicity testing versus the 96-h EC50 is evident in the literature. Algal ICE results could be used to supplement SSDs similar to those already proposed for wildlife, invertebrates, and fish. Some chemical groups may benefit greatly from their inclusion (microbiocides and herbicides in particular). Lastly, the ICE models reported in the present study have focused exclusively on acute toxicity.
Algae may represent the first route to develop chronic ICE models because the base data used in chronic growth rate inhibition are the same as for acute; the primary difference is the choice of percentage effect that is modeled (50% for acute effects vs 10% for chronic effects).
Supplemental Data
The Supplemental Data are available on the Wiley Online Library at DOI: 10.1002/etc.3375.
Acknowledgment
Numerous individuals contributed to the present study at Procter & Gamble and the USEPA. The authors especially thank the efforts of D. Versteeg, M. Gausman, A. Huerta, O. Idris, M. Kovach, C. White-Hull, C. Lilavois, and D. Vivian for data compilation, quality assurance, and manuscript critiques.
Disclaimer
The present study has been reviewed according to USEPA guidelines, but the opinions expressed are those of the authors and do not represent the policies or opinions of the USEPA.
Data availability
The data are accessible via the USEPA on the Web-ICE platform (http://epa.gov/ceampubl/fchain/webice/).