Volume 18, Issue 11 pp. 2497-2505
Environmental Toxicology

Multiple Computer-Automated structure evaluation program study of aquatic toxicity 1: Guppy

Gilles Klopman

Corresponding Author

Department of Chemistry, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio, 44106, USA

Roustem Saiakhov

Department of Chemistry, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio, 44106, USA

Herbert S. Rosenkranz

Department of Environmental and Occupational Health, University of Pittsburgh, Pittsburgh, Pennsylvania, USA

Joop L. M. Hermens

Research Institute of Toxicology, University of Utrecht, Utrecht, The Netherlands
First published: 02 November 2009
Citations: 28

Abstract

An acute fish toxicity model for guppy was constructed from a wide series of experimental data using the Multiple Computer-Automated Structure Evaluation (M-CASE) program. The resulting model has very good predictive ability: it correctly predicts acute toxicity to guppy for 80% of compounds, with an average error of only 0.63 log units in the median lethal concentration (LC50). The importance of the narcosis effect was demonstrated, and the main toxicophores, corresponding to polar narcosis and to reactive chemicals, were identified.

INTRODUCTION

The vitality of our environment, especially its aquatic resources (rivers, lakes, seas, and oceans), is continuously challenged by the influx of pollutants from a multitude of sources. Such pollution strongly affects the quality of aquatic life and human health.

The aquatic toxicity of chemicals can be measured experimentally. For fish, it is usually expressed as the median lethal concentration (LC50), the concentration that kills 50% of the test fish within a specified exposure time. Systematic measurements of LC50 for different classes of chemicals have been made for a variety of fish species. Among these, guppy, fathead minnow, goldfish, rainbow trout, bluegill, and a few others were recommended as standard test species for estimating the aquatic hazard of chemicals [1].

Experimental investigation of the aquatic toxicity of the many chemicals that may find their way into lakes and rivers is clearly not feasible: it would take many years and considerable resources. For this reason, alternative ways to estimate aquatic toxicity are desirable. Among these, methods based on quantitative structure-activity relationship (QSAR) methodologies are particularly attractive, and a number of attempts have been made to design models of fish toxicity that could predict the toxic properties of untested chemicals.

This is the first of a series of articles dealing with structure-activity studies of toxicity to a number of fish species, aimed ultimately at a better understanding of the factors involved in such toxicity and at the development of a predictive model of aquatic toxicity.

The first real application of QSAR to aquatic toxicology has to be attributed to Konemann [2]. He demonstrated that for a set of 50 so-called industrial pollutants, the acute aquatic toxicity, represented by 7- or 14-d LC50 to guppy (Poecilia reticulata), could be described with a good degree of accuracy using the log of the octanol/water partition coefficient (log Kow) of the pollutants. Two years later, Veith et al. [3] reported the 96-h log LC50 for fathead minnow (Pimephales promelas) for 56 chemicals. In that article, they showed that for a set of similar chemicals (derivatives of halogenoalkanes, alcohols, and inactivated aromatic compounds), a somewhat better description of the relation between aquatic toxicity and log Kow could be obtained using a bilinear equation rather than the simple linear relationship used previously.
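For orientation, the two model forms contrasted above can be written generically as follows. The second line is Kubinyi's general bilinear model, shown here as an assumption about the family of equations meant; the actual coefficients of Konemann [2] and Veith et al. [3] are not reproduced in this section, so a, b, c, and β are placeholders.

```latex
\log(1/\mathrm{LC50}) = a \log K_{ow} + b
\log(1/\mathrm{LC50}) = a \log K_{ow} - c \log(\beta K_{ow} + 1) + b
```

When βKow ≪ 1, the middle term of the bilinear form is close to zero and the equation reduces to the linear one. When βKow ≫ 1, log(βKow + 1) ≈ log β + log Kow, so the slope on log Kow falls from a to a − c, which lets the curve level off or turn down for very lipophilic chemicals.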

Konemann and Musch [4], Schultz et al. [5], and Veith and Broderius [6] later showed that similar QSARs could be obtained for substituted phenols and anilines, although these two classes of chemicals appeared to be more toxic overall than those studied previously.

The work of Hermens et al. [7] and Lipnick et al. [8] later showed that some chemicals exhibit higher toxicity than anticipated from their log Kow values. All these compounds were seen as particularly reactive and often possessed electrophilic properties. In addition, some of the compounds that were not electrophilic per se could be readily transformed into electrophiles enzymatically. The QSARs for these classes of chemicals needed additional terms, besides log Kow, to be predictive [9, 10].

In the late 1980s, it became clear that sorting aquatic pollutants into groups of similar structures or classes might be more than a matter of convenience and possibly reflected different mechanisms of action. Several mechanisms of toxicity of chemicals to fish were subsequently reported [7, 11]. Overall, they can be divided into two major categories: the narcosis effect and specific-interaction effects.

The narcosis effect is a nonspecific mode of toxic action that is now recognized [12] as resulting from disruption of the cytoplasmic membrane by the mere physical presence of lipophilic chemicals. It was postulated [13] that in aquatic organisms narcosis is not a specific process; consequently, compounds with identical octanol/water partition coefficients should have the same intrinsic toxicity, regardless of their structures.

Most of the QSAR studies mentioned above have followed the so-called classic QSAR approach, using Hansch-type multiple linear regression models in which log Kow plays a prominent role. All these models are based on the concept of a baseline toxicity. Accordingly, the activity of every molecule should be at or above the baseline toxicity value obtained from the relationship that exists between log Kow and the nonspecific narcosis effect.

On the other hand, it was noticed [14] that octanol is not a very good representative of the biological target (i.e., cellular membranes). Actual target/water distributions can differ from octanol/water distributions, and that difference may be species dependent.

Compounds particularly noted for exerting narcosis effects are otherwise unreactive (e.g., polychlorinated alkanes such as chloroform, aromatic halides, alcohols, ethers, and unsubstituted aliphatic and aromatic compounds).

For more active compounds, it has been postulated that electrophiles exert their toxicity by interacting with glutathione [11] and either remove or deactivate this key component of the biochemical chain in aquatic organisms. Here, the activity is believed to be produced when an electrophile conjugates with the thiol group of glutathione. A large number of aquatic pollutants are believed to act by this mechanism. Examples of structural entities leading to such activity are found in halogenated compounds activated by electron-withdrawing substituents, carbonyl compounds, activated unsaturated compounds, epoxides, and thioorganic compounds. In addition, some chemicals, such as aromatics, anilines, amines, and phenols [11, 15], can be metabolized to electrophilic species and may react by the same mechanism.

Although interaction with glutathione is a common mode of toxic action of electrophiles, some classes of electrophiles can exert their toxic effect in other ways as well. For example, it is reported that substituted phenols cause toxicity by a unique mode of action involving inhibition of oxidative phosphorylation [16]. Similarly, organophosphorus compounds are believed to be active because of their anticholinesterase properties [9, 10, 17].

Mainly through the work of McKim et al. [17] and Bradbury et al. [18], the idea that each of these classes could be more or less unequivocally linked to a separate acute toxicity syndrome became the paradigm. The toxicity syndromes recognized by these researchers were nonpolar narcosis, polar narcosis, uncoupling of oxidative phosphorylation, respiratory irritancy, acetylcholinesterase inhibition, and central nervous system seizure.

However, the exact chemical mechanism causing each of these syndromes is not yet completely understood. Nevertheless, a number of theoretical structural approaches have been proposed to rationalize these observations. Verhaar et al. [7, 15] and Lipnick [19], among others, presented a number of rules to classify the toxicity of chemicals with respect to each of the recognized toxicity syndromes.

In the meantime, a number of alternative approaches addressing different aspects of the classic model have been proposed. Schüürmann [10] used quantum chemistry-derived parameters to model fish toxicity in a QSAR study of organothiophosphorothioates. Protic and Sabljic [12], on the other hand, advocated the use of topological descriptors in QSAR construction, and Veith and Mekenyan [20] developed a response-surface model able to predict the nonpolar and polar narcosis toxicity of phenols and anilines with a single equation. The predictive ability of this QSAR was subsequently extended to estimate the toxic effect of some reactive aromatic compounds [20]. Karabunarliev et al. [21] also identified a set of simple descriptors in an attempt to develop a method capable of screening large numbers of diverse compounds.

It should be mentioned that all these models were developed for specific classes of chemicals. None can be used to predict acute fish toxicity of chemicals in general. The problem is particularly difficult for complex molecules, which cannot be classified unequivocally because of the presence of multiple functionalities. A few authors have tried to address this issue by proposing classification schemes [7, 15, 19] but usually have had little success. Hence, efforts to develop a model capable of describing the acute toxic properties of larger sets of diverse compounds using these traditional QSAR methods have been largely unsuccessful.

Our objective in this study was to show that it is possible to develop a fish toxicity model that encompasses a variety of possible mechanisms and that this model can be used to predict potential fish toxicity for practically any organic molecule. We did this by using the Multiple Computer-Automated Structure Evaluation (M-CASE) program [22]. This program has been described previously [23, 24] and has been used to create predictive models for the activity of structurally diverse chemicals in a number of different biological phenomena. The approach has been used to predict antimycobacterial activity [25], capsaicin's antiinflammatory activity [26], and reversal of multidrug resistance in cancer therapy [27] as well as to uncover attributes of activity in carcinogenicity [28], mutagenicity [29], teratogenicity [30], and other toxic biological endpoints. The program has also been used to create models for such physicochemical properties as pKa, log Kow, and water solubility [23, 24]. This unique ability of the M-CASE program to generate models enabled us to derive fish toxicity models for a number of different fish species that were recommended as standard aquatic hazard test species [1], on the basis of available toxicity data. The first attempt to devise such a model for guppy is presented in this article.

MATERIALS AND METHODS

Modeling methodology

The M-CASE program is a QSAR expert system based on a hierarchical algorithm designed for the treatment of large databases (learning sets) of structurally diverse compounds [23, 24]. The fundamental assumption of the M-CASE methodology is that a substructure unrelated to the observed activity will be distributed randomly among active and inactive compounds, whereas a substructure related to the activity will be found predominantly in active and marginally active compounds. Such a substructure is called a “biophore.” M-CASE first identifies the statistically most significant substructure in the learning set, which consists of both active and inactive chemicals. This fragment, labeled the “top biophore,” accounts for the activity of the largest possible number of active molecules. The molecules containing this biophore are then removed from the database, and the remaining ones are submitted to a new analysis, leading to identification of the next biophore. This procedure is repeated until either the activity of all molecules in the learning set has been accounted for or no further statistically significant substructures can be found. In this way, M-CASE breaks the database into subsets of molecules, each associated with a particular biophore, and it is presumed that the molecules within a subset act by the same mechanism. For each biophore subset, M-CASE identifies additional parameters, or “modulators.” These are mostly structural fragments but can include physical properties such as graph indices, molecular orbital energies, log Kow, and water solubility. Within each biophore subset, an attempt is then made to relate the presence, or magnitude, of the modulators to the toxic potency of each molecule containing the biophore. The process is fully automated and proceeds without human intervention and without bias.
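The greedy, hierarchical search described above can be sketched as follows. This is a minimal illustration, not the M-CASE implementation: molecules are assumed to come with a pre-enumerated set of hypothetical fragment labels and a binary active flag (the marginal class is folded into "active"), and a one-sided binomial test against the background fraction of actives stands in for M-CASE's own significance statistics.

```python
from math import comb
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical representation: (set of fragment labels, is_active).
Molecule = Tuple[FrozenSet[str], bool]


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k
    actives among n fragment carriers if the fragment were spread randomly."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def find_biophores(molecules: List[Molecule], alpha: float = 0.05,
                   min_support: int = 3) -> List[Tuple[str, float]]:
    remaining = list(molecules)
    biophores: List[Tuple[str, float]] = []
    while remaining:
        n_active = sum(1 for _, active in remaining if active)
        if n_active == 0:
            break  # activity of every molecule has been accounted for
        p0 = n_active / len(remaining)  # background fraction of actives
        # Count carriers and active carriers for every fragment.
        counts: Dict[str, Tuple[int, int]] = {}
        for frags, active in remaining:
            for f in frags:
                n, k = counts.get(f, (0, 0))
                counts[f] = (n + 1, k + int(active))
        best: Optional[Tuple[str, float]] = None
        for f, (n, k) in sorted(counts.items()):
            if n < min_support:
                continue  # too rare to judge
            p_value = binomial_tail(k, n, p0)
            if best is None or p_value < best[1]:
                best = (f, p_value)
        if best is None or best[1] >= alpha:
            break  # no statistically significant fragment left
        biophores.append(best)
        # Remove the molecules explained by this biophore and repeat.
        remaining = [(fr, a) for fr, a in remaining if best[0] not in fr]
    return biophores
```

The thresholds `alpha` and `min_support` are illustrative choices; the source only states that the search stops when no statistically significant substructure remains.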

Once the biophores and their modulators have been identified and used to create multiparameter models, the program can predict the potential toxicity of molecules that are not part of the training set. It first searches the molecule for a biophore. If one is found, a potency value is calculated from the relevant QSAR. If none is found, the molecule is presumed to be inactive or possibly to act by a mechanism not represented in the learning set. Based on the calculated potency, molecules can then be classified as active, marginal, or inactive using commonly accepted cutoff values.

The M-CASE program has recently been outfitted with a new feature, the baseline activity identification algorithm (BAIA) [31], which identifies baseline activity due to a specific physical attribute of the molecule (e.g., log Kow). This methodology is especially useful for fish toxicity because it can model the ability of some organic compounds to be active via the narcosis effect. When the observed toxicity of an active compound can be explained simply by the narcosis effect, there is no residual activity produced by specific chemical interactions with biotargets; hence, such compounds are considered inactive as specific toxicants. However, we also found that a large number of chemicals are more toxic than anticipated from the log Kow baseline value. These are important to identify because they probably exert their activity through some specific mechanism unrelated to the relatively mild narcosis effect; for them, the narcosis effect, or baseline toxicity, is only one component of the observed toxicity. All chemicals whose toxicity is not accounted for by BAIA are assumed to derive their activity from other factors, which can then be analyzed by M-CASE, and these compounds therefore remain active after BAIA treatment. Thus, for the purpose of the M-CASE analysis, activity is defined as the residual activity obtained by subtracting the relevant baseline activity from the observed one.
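The residual-activity definition above can be sketched in a few lines. This assumes a linear log Kow baseline with hypothetical `slope` and `intercept` values, toxicity expressed as log(1/LC50), and a hypothetical `tolerance` for deciding that a compound has "no residual activity"; the actual BAIA procedure is described in [31] and is not reproduced here.

```python
from typing import Dict, List, Tuple


def residual_activity(observed: float, log_kow: float,
                      slope: float, intercept: float) -> float:
    """Observed toxicity minus the log Kow baseline, both as log(1/LC50)."""
    baseline = slope * log_kow + intercept
    return observed - baseline


def split_by_baseline(data: Dict[str, Tuple[float, float]], slope: float,
                      intercept: float, tolerance: float = 0.0
                      ) -> Tuple[List[str], Dict[str, float]]:
    """data maps a compound name to (observed log(1/LC50), log Kow)."""
    narcosis_only: List[str] = []
    specific: Dict[str, float] = {}
    for name, (observed, log_kow) in data.items():
        r = residual_activity(observed, log_kow, slope, intercept)
        if r <= tolerance:
            narcosis_only.append(name)  # baseline explains the toxicity
        else:
            specific[name] = r  # residual activity passed on to M-CASE
    return narcosis_only, specific
```

Compounds in `narcosis_only` correspond to those relabeled inactive as specific toxicants; the values in `specific` are the residual activities that the biophore search then works on.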

The M-CASE program is therefore trained to identify compounds whose toxicity is due to factors other than narcosis. When M-CASE evaluates the activity of a new molecule, it searches its structure for a biophore. If none is found, the molecule is presumed to exert only baseline-type toxicity and can be either inactive or active depending on its log Kow value. If a biophore is found, the toxicity is calculated as the combination of the narcosis effect and the biophore-related toxic effects.
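Because residual activity is defined as observed minus baseline, a prediction can be sketched as the baseline term plus the local-QSAR contribution of any biophore found. The sketch assumes a linear baseline with hypothetical coefficients and a hypothetical mapping from biophore labels to local-QSAR functions; how M-CASE handles a molecule containing several biophores is not stated here, so the sketch simply uses the first match in the order the models are listed.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical local QSAR: maps a molecule's fragment set to a residual activity.
LocalModel = Callable[[FrozenSet[str]], float]


def predict_log_inv_lc50(fragments: FrozenSet[str], log_kow: float,
                         slope: float, intercept: float,
                         local_models: Dict[str, LocalModel]) -> float:
    baseline = slope * log_kow + intercept  # narcosis (baseline) term
    for biophore, local_qsar in local_models.items():
        if biophore in fragments:
            # Residual activity was defined as observed minus baseline,
            # so the predicted total is the baseline plus the residual.
            return baseline + local_qsar(fragments)
    return baseline  # no biophore found: baseline-type toxicity only
```

For example, with a baseline of `1.0 * logKow - 5.0` and a hypothetical `"c-NO2"` model returning 1.5, a molecule with log Kow 3 and that fragment is predicted at −0.5, while the same molecule without it stays at the baseline value of −2.0.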

Fish toxicity databases

During the past 20 years, many compounds have been tested for fish toxicity under diverse experimental conditions. We conducted a systematic search of online literature databases and assembled an extended database of toxicity data for the guppy (P. reticulata) [2, 6, 32, 33], which enabled us to develop a global QSAR expert system for that fish.

Figure 1. Distribution of compound activity (inactive, marginally active, or active) before (A) and after (B) implementation of the baseline activity identification algorithm (BAIA) utility.

Only data obtained under identical or at least very similar conditions were retained for our analyses. Indeed, toxicity data for compounds tested under diverse experimental conditions may present a serious obstacle to a good structure-activity relationship model. Toxicity data for the sets of chemicals obtained from different sources for the same fish species were combined only if a correlation exceeding r = 0.99 was obtained between the data from the two sources for those chemicals that were evaluated in both.
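The merging criterion above (combine two sources only if r exceeds 0.99 on the chemicals they share) can be sketched as follows. It assumes each source is a mapping from a chemical identifier to a log LC50 value (whether the correlation was computed on log or raw values is not stated), and the `min_shared` requirement is a hypothetical safeguard not mentioned in the text.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def can_merge(source_a: Dict[str, float], source_b: Dict[str, float],
              threshold: float = 0.99, min_shared: int = 3) -> bool:
    """True if the sources agree (r > threshold) on the chemicals in both."""
    shared = sorted(source_a.keys() & source_b.keys())
    if len(shared) < min_shared:
        return False  # too little overlap to judge agreement
    r = pearson_r([source_a[c] for c in shared], [source_b[c] for c in shared])
    return r > threshold
```

Note that a constant offset between two sources still gives r = 1, so this criterion checks consistent ranking and scaling, not identical values.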

Toxicity data for guppy were collected mainly from reports by Hermens et al. [7]. All measurements were made under the same semistatic conditions, with LC50 values obtained for an exposure period of 14 d. pH values of the aquatic solutions were in the range 6.5 to 7.5, and temperature was kept at 21 to 23°C.

All data were expressed in micromolar units (μM). Data obtained from sources other than the reports of Hermens et al. were combined with the Hermens data set. The resulting 219 molecules were represented as SMILES strings [34] and catalogued in a data file (the learning set). Each record contained the Chemical Abstracts Service (CAS) registry number, the chemical or commercial name, the SMILES string, and the LC50 value. The complete database, without SMILES codes, is available from MULTICASE, Inc. [22].
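Expressing data in micromolar units requires a molar mass when a source reports mass concentrations. A minimal sketch, assuming the source values are in mg/L (the original units of each source are not given here): dividing mg/L by g/mol gives mmol/L, and multiplying by 1,000 gives μmol/L.

```python
import math


def mg_per_l_to_micromolar(conc_mg_per_l: float, molar_mass_g_per_mol: float) -> float:
    # (mg/L) / (g/mol) = mmol/L; multiply by 1000 to obtain umol/L (uM).
    return conc_mg_per_l / molar_mass_g_per_mol * 1000.0


def log_micromolar(conc_mg_per_l: float, molar_mass_g_per_mol: float) -> float:
    """log10 of the concentration in uM, the scale used for log-unit errors."""
    return math.log10(mg_per_l_to_micromolar(conc_mg_per_l, molar_mass_g_per_mol))
```

For instance, 100 mg/L of a hypothetical compound with a molar mass of 200 g/mol corresponds to 500 μM.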

The breakpoint between toxic and nontoxic chemicals was set at an LC50 of 750 μM: chemicals are considered toxic if their LC50 is below 750 μM and very toxic if it is below 5 μM. These assignments are in accordance with guidelines used by regulatory agencies [35]. The distribution of inactive, marginally active, and active compounds with respect to the breakpoint is presented in Figure 1A. Note that the database is heavily skewed toward marginally active and active compounds, which make up nearly 94% of the database.
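The two breakpoints translate into a simple classification rule. This sketch uses only the thresholds stated above, with strict "less than" comparisons as worded; how these classes map onto the inactive, marginal, and active labels of Figure 1 is not spelled out here, so the sketch keeps the wording of the breakpoints.

```python
TOXIC_BREAKPOINT_UM = 750.0      # LC50 below this: toxic
VERY_TOXIC_BREAKPOINT_UM = 5.0   # LC50 below this: very toxic


def toxicity_class(lc50_um: float) -> str:
    """Classify a guppy LC50 (in uM) against the stated breakpoints."""
    if lc50_um < VERY_TOXIC_BREAKPOINT_UM:
        return "very toxic"
    if lc50_um < TOXIC_BREAKPOINT_UM:
        return "toxic"
    return "nontoxic"
```

A compound at exactly 750 μM therefore falls on the nontoxic side, and one at exactly 5 μM is toxic but not very toxic.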

Testing methods

The QSAR expert system model obtained by submitting this learning set to M-CASE was first challenged by the same set to test its ability to account for the activities that were used to train it.

Additional cross-validation tests were later made to gauge the ability of the model to predict the toxicity of molecules not included in the database. To this end, 10% of the compounds from the database were excluded randomly from the learning set. The M-CASE model was rebuilt from the remaining molecules and used to predict the activity of the compounds that were excluded. In this manner, the BAIA equation and relevant biophores are reevaluated for each learning set. This procedure was repeated three times. Inconclusive results were excluded from the statistical evaluation.
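The repeated 10% holdout can be sketched generically. `fit` and `predict` are hypothetical callables standing in for rebuilding and applying the whole M-CASE model (BAIA baseline plus biophores). A prediction of `None` stands in for an inconclusive result and is excluded, as in the text. The source does not say whether the three held-out samples were disjoint, so each repeat here draws independently with a seeded generator for reproducibility.

```python
import random
from typing import Any, Callable, List, Optional, Sequence, Tuple

Record = Tuple[Any, float]  # (structure, observed activity)


def repeated_holdout(data: Sequence[Record],
                     fit: Callable[[List[Record]], Any],
                     predict: Callable[[Any, Any], Optional[float]],
                     holdout_fraction: float = 0.10, repeats: int = 3,
                     seed: int = 0) -> List[List[Tuple[float, float]]]:
    rng = random.Random(seed)
    n_test = max(1, round(holdout_fraction * len(data)))
    runs: List[List[Tuple[float, float]]] = []
    for _ in range(repeats):
        held_out = set(rng.sample(range(len(data)), n_test))
        train = [rec for i, rec in enumerate(data) if i not in held_out]
        test = [rec for i, rec in enumerate(data) if i in held_out]
        model = fit(train)  # rebuild the whole model on the reduced set
        pairs: List[Tuple[float, float]] = []
        for structure, observed in test:
            predicted = predict(model, structure)
            if predicted is not None:  # None = inconclusive, excluded
                pairs.append((observed, predicted))
        runs.append(pairs)
    return runs
```

Each run returns (observed, predicted) pairs that can then be scored with the same statistics used for the fitting test.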

Figure 2. Baseline determination by the baseline activity identification algorithm (BAIA) utility.

RESULTS AND DISCUSSION

Baseline toxicity and narcosis effect

We expected that the M-CASE program, extended with the BAIA feature, would be able to identify the narcosis effect in fish. Indeed, after the guppy database was submitted to the M-CASE program, version 3.1, BAIA determined a baseline relationship for 69 chemicals:
[Equation 1: BAIA baseline relationship; equation image not available]
This very good correlation, reflected in the high coefficient of determination (rsq), the low standard error (S0), and especially the very high F-test value, demonstrates the strong influence of the narcosis effect on overall toxicity. Indeed, the toxicity of about 31.5% of the chemicals in the database (69 of 219) can be explained simply by their lipophilicity (Fig. 2).
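The statistics quoted for Equation 1 (rsq, S0, and F) are standard ordinary least-squares diagnostics for a one-descriptor regression. A minimal sketch, assuming x holds log Kow and y holds log(1/LC50) for the compounds assigned to the baseline, and assuming S0 is the residual standard error with n − 2 degrees of freedom; how BAIA selects the 69 baseline compounds is not reproduced.

```python
from math import sqrt
from typing import Dict, Sequence


def ols_with_stats(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Fit y = slope * x + intercept and return r2, S0, and F."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    ssr = sst - sse  # variance explained by the regression
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1 - sse / sst,              # coefficient of determination
        "s0": sqrt(sse / (n - 2)),        # residual standard error
        "F": ssr / (sse / (n - 2)),       # F statistic with (1, n - 2) df
    }
```

A high F with few descriptors and many compounds is what makes a baseline like Equation 1 statistically convincing.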
The baseline parameters are similar to those found previously by other authors, such as Konemann [2], who studied 50 industrial compounds whose toxicity was believed to be due to nonpolar narcosis:
[Equation 2: baseline relationship of Konemann [2]; equation image not available]
After implementation of the BAIA utility, 43 of the 143 active compounds and 12 of the 50 marginal compounds had no residual activity and hence were marked as inactive. These compounds are believed to exert their toxicity exclusively through the narcosis effect.

Consequently, implementation of the BAIA utility changed the distribution of inactive, marginally active, and active compounds in the database, as shown in Figure 1B.

Toxicophores

M-CASE identified biophores (toxicophores) for those compounds whose toxicity could not be completely accounted for by the narcosis effect. Among the toxicophores identified by the program, some characterize classes of compounds already described in the literature as producing fish toxicity [11]. In this study, however, these and the other toxicophores were identified without human intervention and without bias in a systematic and inclusive manner. They form an exhaustive list of fragments potentially responsible for acute fish toxicity due to chemicals of the type represented in our database.

A total of 39 toxicophores were identified by M-CASE. The statistical results for the top biophore for guppy are described below.

Biophore 1 consists of the fragment XYnc =, where X is nitrogen or carbon, Y is nitrogen or oxygen, c is an aromatic carbon, and n is 1 or 2. All 37 molecules containing this biophore are active, so the probability that this fragment is related to activity is 100.00%, although the activity of nine of these molecules is only marginal. The average activity of the molecules containing this biophore is 31.00. Seventy-two potential modulators are associated with this biophore; 30 of them were redundant and therefore discarded, and 23 were present in only one to three molecules and were ignored as well. The remaining 19 potential modulators were accepted for building the relational QSAR. Of these, only three were found to be significant (Table 1): water solubility, the presence of a methoxy group, and the presence of an ortho substituent. The equation is
$$\mathrm{Activity} = C + \sum_{i=1}^{N} T_i \, n_i \qquad (3)$$
where C is a constant, T_i is the value (coefficient) of the ith parameter, n_i is the occurrence of the ith parameter (or its value, for a physical property such as water solubility), and N is the number of parameters. The statistical parameters of the equation are shown in Table 1. A similar analysis was carried out for each statistically significant biophore.
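Equation 3 is a weighted sum and can be evaluated directly. The sketch below uses the constant and T(I) coefficients listed in Table 1; the dictionary keys are hypothetical labels (for example, `ortho_substituent` stands for the fragment cH=cH-cH=c-c > =), and the descriptor values in the example are made up. The units of the water-solubility descriptor and the scale of the resulting activity are not stated in this section, so the numbers are purely illustrative.

```python
from typing import Dict


def local_qsar_activity(constant: float, coefficients: Dict[str, float],
                        descriptors: Dict[str, float]) -> float:
    """Equation 3: C plus the sum of T_i * n_i over the modulators.

    Modulators missing from `descriptors` contribute n_i = 0.
    """
    return constant + sum(t * descriptors.get(name, 0.0)
                          for name, t in coefficients.items())


# Constant and T(I) values from Table 1 (keys are hypothetical labels).
TABLE1_CONSTANT = -1.56
TABLE1_COEFFICIENTS = {
    "water_solubility": -0.47,
    "O-CH3": 0.42,
    "ortho_substituent": -1.38,
}
```

With made-up inputs of −2.0 for water solubility, one O-CH3 group, and no ortho substituent, the sum is −1.56 + 0.94 + 0.42 ≈ −0.20.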

In total, 11 biophores were used to develop semiquantitative models; these are shown in Table 2. The others were not present in enough molecules to support a QSAR-type analysis. The complete list of biophores with their related modulators is available from MULTICASE [22].

As reported above, the top biophore is a potent electron-withdrawing substituent (e.g., NO2, CO, C≡N) attached to an aromatic ring. Of the 37 molecules containing this biophore, 28 are toxic (28% of the total number of active molecules) and nine are marginally toxic to guppy. Such a substituent enhances the electrophilic properties of the aromatic ring and thereby creates the potential for reaction with glutathione or other important nucleophilic biomolecules.

The most important deactivating modulator appears to be water solubility (Table 1): increasing the solubility decreases the activity. This seems reasonable and is most likely related to the extremely low water solubility of nitroaromatic compounds. The most significant activating modulator is the methoxy group attached to a (thio)phosphoryl group, found in all organophosphates belonging to this subset of compounds. This lends additional support to the hypothesis that these compounds can alkylate glutathione.

The second significant toxicophore, an NH2-substituted aromatic ring, exists in 20 substituted anilines, of which 10 are active and 10 are marginally active. This class of chemicals can readily be associated with polar narcosis. The exact mechanism of action is unclear but is believed to involve a reversible disruption of the cell membranes [6]. Polar narcotics are usually weak acids (phenols) or bases (anilines), which are capable of forming hydrogen bonds. One of the proposed mechanisms of toxic action for these compounds involves interaction with free-acid functionalities of proteins in the membrane [6]. That process can change the properties of the membranes and cause narcosis-like effects in aquatic organisms.

Table 1. Parameters of the local relational structure–activity model

Modulators^a
Parameter                  T(I)     Mean     S0      Rsq       Fregr    Fpart
Constant                   −1.56    0.16     0.16
Water solubility           −0.47    −0.53    0.13    0.5241    13.3     13.7
O-CH3                       0.42     0.44    0.17    0.6234    10.8      6.2
cH=cH-cH=c-c > =           −1.38    −0.34    0.60    0.6890     9.9      5.3
Quality of the local QSAR (minimal allowed Fcrit(3,33,0.05) ≈ 2.6): 0.82, 0.4852, 10

Quality of prediction
Type of activity    True prediction    False prediction
Actives             24                 0
Inactives           0                  0
Marginals           4                  9

a Parameter is a modulator; T(I) is the value of the parameter in Equation 3; Mean is a deviation of this value; Rsq is the index of determination; Fregr is the F-test value; S0 is the standard deviation; and Fpart is the partial F-test value for the ith parameter.

A good-quality QSAR model (see Table 2) was found for these 20 compounds with six modulators. All modulators refer to the presence of various substituents at the position ortho to the amino group. Weakly σ-electron-donating alkyl substituents are recognized as deactivating modulators, whereas σ-electron-withdrawing chlorine substituents in any position are strong activating modulators. Although such a pattern of modulators could be related to the ability of these compounds to form hydrogen bonds, it is also possible that the amino group is oxidized by cytochrome P-450 or flavin-containing monooxygenase [36], which would make these compounds reactive toward glutathione as well. It is difficult to decide which of the two mechanisms is more important for the guppy using only the limited data available, but the absence of the partition coefficient among the detected modulators might indicate a preference for the oxidation processes.

The third important toxicophore shown in Table 2 is a two-dimensional descriptor consisting of two nucleophilic centers, such as oxygen, sulfur, or nitrogen, separated by a distance of 5.2 Å. This biophore was found in 14 active, one marginally active, and two inactive chemicals, which can be subdivided into several structural types: organophosphates, containing an oxygen and another nucleophilic center (oxygen, nitrogen, or sulfur) separated by four covalent bonds; dinitroaromatics, containing two nitro groups meta to each other; and dimethoxyaromatics, containing two methoxy groups meta to each other. One may suppose that the presence of nucleophilic centers in the active molecule again favors a mechanism involving the formation of hydrogen bonds with amino or carboxyl groups of cellular proteins or other biomolecules, producing a toxic effect similar to polar narcosis. Other mechanisms are possible but cannot be demonstrated with the present data.

A satisfactory three-parameter QSAR model (see Table 2) was constructed with these 17 molecules. The activating modulators are the presence of a nucleophilic oxygen in the substituent attached to the aromatic ring and the presence of a sulfur-containing fragment, S-CH2-N, in which S is attached to the phosphoryl group. Both introduce an additional nucleophilic center, which may assist the formation of a bond with the biotarget. The deactivating modulator is a six-membered cyclic fragment containing oxygen atoms, found in organophosphates. This fragment decreases the conformational flexibility of the toxicophore-containing part of the molecule, which may decrease its activity if flexibility is critical for binding to the biotarget.

The next significant toxicophore is the hydroxyl group of phenols with at least one unsubstituted ortho position. This structural feature exists in 20 phenol derivatives, of which 14 are highly toxic, three are marginally toxic, and three are nontoxic. Phenols are weak acids, and they have also been reported to undergo oxidation readily, forming potentially toxic electrophilic products such as quinones [37]. Aside from their behavior as acids, phenols interfere with a variety of processes within cells. As weak acids, they can uncouple oxidative phosphorylation [37]. The activity of 2,4-dinitrophenol, for example, can be explained by the fact that it is a lipophilic acid capable of migrating through mitochondrial membranes in both un-ionized and ionized forms, thus dissipating the proton gradient [37]. Phenols are also effective inhibitors of a number of flavin adenine dinucleotide (FAD)- and nicotinamide adenine dinucleotide (NAD+)-containing oxidases and dehydrogenases, via reaction mechanisms that exhibit complex kinetics. Phenols can also form charge transfer complexes with FAD and NAD+ and compete with these coenzymes for binding to enzymes [37].

M-CASE developed a local QSAR for phenols that describes the toxicity of the 20 compounds by a three-parameter equation. The parameters are one activating modulator, the charge on the oxygen atom, and two deactivating modulators, (log Kow)² and a chlorine atom ortho to the phenolic hydroxyl. All three appear consistent with the hypothesis that phenols act by uncoupling oxidative phosphorylation, reflecting a membrane migration process ((log Kow)²) and the ability to form charge transfer complexes.

The next toxicophore is a chlorine atom in an allylic or benzylic position. This structural feature has been found in 11 compounds, all of them active. It occurs in compounds with a chlorine atom on the carbon next to a double bond or an aromatic ring (the allylic or benzylic position, respectively) or in compounds containing a sulfur or an epoxy group at the same position. This type of functionality readily undergoes nucleophilic substitution or alkylation at the position alpha to the unsaturated system. The most likely targets for such a reaction are the sulfhydryl, amino, and hydroxyl groups of proteins and enzymes. According to the corresponding QSAR, the toxicity of all 11 compounds is modulated by their lipophilicity, expressed as log Kow. Such a relationship is usually indicative of a mechanism involving membrane disruption.

Table 2. The most statistically significant biophores and parameters of local quantitative structure-activity relationships

[Table 2 image not available]

A number of other, less significant biophores are also found and in some cases can be associated with known toxic mechanisms, such as antiglutathione activity or anticholinesterase activity. In several instances, the nature of the biophore does not lend itself to a mechanistic interpretation and may either indicate as-yet-unknown mechanisms of toxicity or possibly have been identified spuriously.

Table 3. Statistical evaluation of the descriptive and predictive ability of the guppy toxicity model(a)

                                                                                   Accuracy of predicted values
                    Size of learning        Concordance  Sensitivity  Specificity  % that may be With error
Type of evaluation  set/size of tested set  (%)          (%)          (%)          described     (log unit)  R     F
Fitting             225/225                 100.00       100.00       100.00       91.60         0.33        0.96  2,562
Prediction          (205/20) × 3            93.10        90.23        89.54        77.58         0.63        0.92  234
  Of them:
    Alcohols        13                      92.31        92.31        -            92.31         0.18        0.93  67
    Epoxides        5                       80.00        80.00        -            80.00         0.35        0.88  14
    Haloaromatics   10                      100.00       100.00       -            100.00        0.19        0.93  50
    Nitroaromatics  7                       100.00       100.00       -            71.42         0.23        0.94  23
    Polar narcotics 7                       100.00       100.00       -            71.42         0.13        0.93  19
    Others          18                      87.50        100.00       60.00        75.00         0.42        0.84  24

(a) Sensitivity is the percentage of correctly predicted actives; specificity is the percentage of correctly predicted inactives; R is the correlation coefficient; and F is the F test value. "Others" includes organophosphates, esters, ethers, alkanes, aromatics, and other classes of chemicals.
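The classification statistics defined in the footnote to Table 3 can be computed as follows. This is a minimal sketch assuming each compound carries a binary active/inactive label; the function name is hypothetical. Returning None when a denominator is empty (for example, a chemical class with no inactive compounds) is a design choice of the sketch.

```python
def classification_stats(observed, predicted):
    """Concordance, sensitivity and specificity (in %) for active/inactive calls.

    observed, predicted: sequences of booleans (True = active/toxic).
    A rate whose denominator is empty (e.g. no inactives) is returned as None.
    """
    pairs = list(zip(observed, predicted))
    if not pairs:
        raise ValueError("no compounds to evaluate")
    tp = sum(o and p for o, p in pairs)                  # actives predicted active
    tn = sum((not o) and (not p) for o, p in pairs)      # inactives predicted inactive
    n_active = sum(o for o, _ in pairs)
    n_inactive = len(pairs) - n_active

    def pct(num, den):
        return None if den == 0 else 100.0 * num / den

    return {
        "concordance": pct(tp + tn, len(pairs)),
        "sensitivity": pct(tp, n_active),
        "specificity": pct(tn, n_inactive),
    }


# Hypothetical class of 13 active compounds, one of them missed by the model.
observed = [True] * 13
predicted = [True] * 12 + [False]
stats = classification_stats(observed, predicted)
print({k: (None if v is None else round(v, 2)) for k, v in stats.items()})
```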

Testing the model

The structure-activity relationship model has been tested using the techniques described previously in “Testing methods.” These tests help evaluate the descriptive and predictive abilities of the model. The descriptive reliability was tested by retrofitting the data of the learning sets. The results of this test are presented in Table 3.

As Table 3 shows, the structure-activity relationship model retrofits the toxicity values of its own learning set very well. For the guppy toxicity model, the classification of a compound as toxic (or nontoxic) is correct for every compound in the learning set (100% concordance). The average deviation (i.e., the difference between experimental and retrofitted values) is only about 0.33 log units for 91.6% of the chemicals included in the database. We therefore conclude that the model is reliable and accurate in retrofitting its own data.

More important, however, is the ability to predict the toxicity of untested compounds, that is, compounds not included in the learning sets. We therefore also performed cross-validation tests in which about 90% of the database was used to predict the remaining 10%, which had been left out at random. The results of these tests are also presented in Table 3.
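The reduced-set validation, listed in Table 3 as (205/20) × 3, amounts to repeatedly withholding a random subset, rebuilding the model on the rest, and predicting the withheld compounds. The sketch below generates only the index splits; the model-building step is left out, the helper name is hypothetical, and drawing each held-out set independently (so different repeats may share compounds) is an assumption, since the text does not say whether the three test sets were disjoint.

```python
import random


def random_holdout_splits(n_compounds, n_test, n_repeats, seed=0):
    """Yield (train_idx, test_idx) index lists for repeated random hold-out validation.

    Each repeat withholds n_test randomly chosen compounds; the model would be
    rebuilt on the remaining ones and then used to predict the withheld set.
    """
    rng = random.Random(seed)
    indices = list(range(n_compounds))
    for _ in range(n_repeats):
        test = set(rng.sample(indices, n_test))
        train = [i for i in indices if i not in test]
        yield train, sorted(test)


# 225 compounds, 20 withheld per run, 3 runs: 205 remain for model building each time.
for train, test in random_holdout_splits(225, 20, 3):
    print(len(train), len(test))
```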

Even the models built from the reduced data sets remain of very high quality and provide adequate quantitative predictions. The classification of a compound as toxic (or nontoxic) is correct with a probability of approximately 93%. The average error (i.e., the difference between experimental and predicted values) is about 0.6 log units for approximately 80% of the chemicals. As Table 3 shows, the predictions are reliable for various classes of organic compounds, including nonpolar and polar narcotics and specifically acting chemicals (e.g., nitroaromatics or epoxides).
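Table 3 reports prediction accuracy as the percentage of compounds that can be described together with an average error in log units. The section does not define these quantities precisely, so the following sketch rests on an assumed reading: a compound counts as described when the model returns a quantitative estimate, and the error is the mean absolute difference between experimental and predicted log values over those compounds. The data and function name are hypothetical.

```python
def coverage_and_mean_error(experimental, predicted):
    """Return (% of compounds with a quantitative prediction, mean |error| in log units).

    predicted[i] is None when the model offers no quantitative estimate for
    compound i (an assumed reading of "% that may be described").
    """
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted must have the same length")
    errors = [abs(e - p) for e, p in zip(experimental, predicted) if p is not None]
    coverage = 100.0 * len(errors) / len(experimental)
    mean_error = sum(errors) / len(errors) if errors else float("nan")
    return coverage, mean_error


exp = [4.1, 5.0, 3.2, 6.3, 2.9]      # hypothetical experimental log(1/LC50) values
pred = [4.6, None, 3.0, 5.5, None]   # None: no quantitative estimate available
cov, err = coverage_and_mean_error(exp, pred)
print(round(cov, 1), round(err, 2))
```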

Comparison with literature models

It is clear from this and other studies that fish toxicity is multifaceted. Although narcosis is definitely one of the main causes of toxicity, other mechanisms must be involved to explain the range of functionalities that elicit a toxic response. Published models usually describe only one kind of activity, either narcosis or specific toxicity [2, 3, 6, 7]. Different authors have used a number of structural parameters to describe these mechanisms, among them various types of graph indices [38, 39], molecular volume [40], and partition coefficients [2, 3, 6, 7]. Only the model recently derived by Verhaar et al. [11, 15], which uses a complex classification algorithm, can include both types of activity. However, this model cannot describe molecules with multiple functionalities.

As mentioned earlier, our model uses the artificial intelligence algorithm embedded in the M-CASE paradigm to estimate and predict activity, classifying substances automatically and without bias. The analysis includes both an estimation of narcosis, or baseline activity, and the identification of biophores that may be responsible for specific activity. Our model is not limited to single classes of compounds and can handle very large databases. The resulting model can deal with molecules as diverse as nitrogenous heterocycles, polycyclic aromatics, heteroatom-containing compounds, esters, chlorinated compounds, phenols, aliphatic alcohols, and so on. In contrast, most published models have been reported for a particular class of chemicals, and, as noted by Protic and Sabljic [12], their usefulness is limited. It is therefore difficult to compare our results with those described previously in the literature.
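The idea of combining a baseline narcosis estimate with biophore-specific local QSARs can be pictured with the simplified dispatcher below. It is not the M-CASE algorithm: the biophore test, the first-match rule for molecules with several biophores, the descriptor names, and all coefficients are hypothetical. The baseline term is linear in log Kow, the form commonly used for narcosis QSARs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Descriptors = Dict[str, float]


@dataclass
class LocalQSAR:
    """A biophore test paired with its local equation (both hypothetical)."""
    name: str
    matches: Callable[[Descriptors], bool]
    predict: Callable[[Descriptors], float]


def baseline_narcosis(desc: Descriptors, slope: float = 0.9, intercept: float = -1.0) -> float:
    """Baseline (narcosis) estimate of log(1/LC50), linear in log Kow; placeholder coefficients."""
    return slope * desc["log_kow"] + intercept


def predict_toxicity(desc: Descriptors, local_models: List[LocalQSAR]) -> Tuple[str, float]:
    """Apply the first local QSAR whose biophore matches; otherwise fall back to the baseline."""
    for model in local_models:
        if model.matches(desc):
            return model.name, model.predict(desc)
    return "baseline", baseline_narcosis(desc)


models = [
    LocalQSAR(
        name="allylic/benzylic Cl",
        matches=lambda d: d.get("allylic_cl", 0.0) > 0,
        predict=lambda d: 1.2 + 0.6 * d["log_kow"],  # placeholder coefficients
    ),
]
print(predict_toxicity({"log_kow": 2.0, "allylic_cl": 1.0}, models))
print(predict_toxicity({"log_kow": 2.0}, models))
```

A first-match rule is the simplest possible policy; a fuller treatment would need an explicit strategy for molecules that carry several biophores.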

Nevertheless, we compared the statistical characteristics of some of the best published models with those of our model (Table 4). Together with the results in Table 3, they show that our model both retrofits its own data and predicts the toxicity of untested chemicals with excellent accuracy for various classes of organic compounds, and that it compares favorably with other models presented in the literature.

CONCLUSION

An aquatic toxicity database for guppy was constructed and analyzed with the M-CASE program. Comparison of our results with experimental and literature data shows that the methodology describes acute toxicity for the species studied with high concordance and accuracy. The model uses the concept of baseline activity as one of the correlation parameters, together with other parameters such as the presence of certain biophores, a hardness-softness index, and other characteristics determined from quantum chemical calculations. Using its artificial intelligence algorithm, M-CASE automatically chooses the most suitable set of parameters. With the information the program stores in dedicated dictionaries, any organic molecule can be evaluated for its potential toxicity to guppy. In parallel, we are also analyzing structure-activity relationship models of toxicity in other aquatic species. Once completed, these will allow the putative mechanisms of aquatic toxicity to be compared across species.

Table 4. Descriptive characteristics of published fish toxicity models(a)

Model                       Chemicals                               R     Error  No. of compounds  F
Leegwater [37]              Polar narcotics                         0.92  0.53   92                170
Konemann [2]                Polar narcotics                         0.99  0.26   50                1,865
Karabunarliev et al. [21]   Aromatics                               0.85  0.16   64                80
M-CASE                      Unlimited                               0.96  0.33   205               2,562
                              Alcohols                              0.99  0.23   13                717
                              Aldehydes                             0.96  0.21   10                85
                              Anilines                              0.89  0.22   10                27
                              Epoxides                              0.96  0.20   9                 82
                              Haloalkanes                           0.99  0.23   19                774
                              Haloalkenes                           0.96  0.32   9                 77
                              Monohaloaromatics                     0.97  0.18   21                323
                              Nitroaromatics                        0.93  0.32   21                125
                              Organophosphates                      0.91  0.31   14                55
                              Polyhalogenated anilines and phenols  0.92  0.16   21                104
                              Phenols                               0.96  0.28   10                105
                              Others                                0.94  0.67   48                256

(a) R is the correlation coefficient; error is the standard deviation; No. of compounds is the number of molecules included in a local QSAR; and F is the F test value. "Others" represents other classes of chemicals, such as pyrethroids, amides, carbamates, and heterocycles.
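The statistics in Table 4 (R, error, number of compounds, F) can be reproduced for a one-descriptor linear QSAR, such as a baseline model in log Kow, with ordinary least squares. This sketch assumes that "error" means the residual standard deviation with n − 2 degrees of freedom; the function name and data are hypothetical. The multi-parameter local QSARs of M-CASE would require multiple regression, with the F statistic taken on (p, n − p − 1) degrees of freedom for p parameters.

```python
import math


def linear_qsar_stats(x, y):
    """Least-squares fit of y = a + b*x with Table 4-style statistics.

    Returns (a, b, R, s, F): R is the correlation coefficient, s the residual
    standard deviation with n - 2 degrees of freedom (assumed meaning of
    "error"), and F the regression F statistic with (1, n - 2) d.f.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_reg = syy - ss_res                       # variance explained by the regression
    r = sxy / math.sqrt(sxx * syy)
    s = math.sqrt(ss_res / (n - 2))
    f = ss_reg / (ss_res / (n - 2)) if ss_res > 0 else math.inf
    return a, b, r, s, f


# Hypothetical log Kow values and log(1/LC50) responses (not data from this study).
log_kow = [1.0, 2.0, 3.0, 4.0, 5.0]
toxicity = [1.1, 1.9, 3.2, 3.9, 5.1]
a, b, r, s, f = linear_qsar_stats(log_kow, toxicity)
print(f"a={a:.2f} b={b:.2f} R={r:.3f} s={s:.3f} F={f:.1f}")
```

For a single descriptor, F and R are linked by F = (n − 2)R²/(1 − R²), which is why a high R combined with many compounds yields the large F values seen in the table.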
