Volume 65, Issue 3-4 pp. 116-120
RESEARCH ARTICLE
Open Access

Within-laboratory reproducibility of Ames test results: Are repeat tests necessary?

Errol Zeiger
Errol Zeiger Consulting, Chapel Hill, North Carolina, USA

Constance A. Mitchell (Corresponding Author)
Health and Environmental Sciences Institute, Washington, DC, USA
Correspondence: Constance A. Mitchell, Health and Environmental Sciences Institute, Washington, DC, USA. Email: [email protected]

Stefan Pfuhler
The Procter & Gamble Company, Mason, Ohio, USA

Yang Liao
Cencora PharmaLex, Conshohocken, Pennsylvania, USA

Kristine L. Witt
Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA

First published: 23 April 2024

Accepted by: B. Gollapudi

Abstract

The Ames test is required by regulatory agencies worldwide for assessing the mutagenic and carcinogenic potential of chemical compounds. This test uses several strains of bacteria to evaluate mutation induction; positive results in the assay are predictive of rodent carcinogenicity. As an initial step to understanding how well the assay may detect mutagens present as constituents of complex mixtures such as botanical extracts, a cross-sector working group examined the within-laboratory reproducibility of the Ames test using the extensive, publicly available National Toxicology Program (NTP) Ames test database comprising more than 3000 distinct test articles, most of which are individual chemicals. This study focused primarily on NTP tests conducted using the standard Organization for Economic Co-operation and Development Test Guideline 471 preincubation test protocol with 10% rat liver S9 for metabolic activation, although 30% rat S9 and 10% and 30% hamster liver S9 were also evaluated. The reproducibility of initial negative responses in all strains, with and without 10% S9, was quite high, ranging from 95% to 99% with few exceptions. The within-laboratory reproducibility of initial positive responses for strains TA98 and TA100 with and without 10% rat liver S9 was ≥90%. Similar results were seen with hamster S9. As expected, the reproducibility of initial equivocal responses was lower, <50%. These results will provide context for determining the optimal design of recommended test protocols for use in screening both individual chemicals and complex mixtures, including botanicals.

1 INTRODUCTION

The Ames test was developed in the early 1970s as a relatively simple and sensitive method for identifying mutagenic chemicals (Ames et al., 1975) by measuring gene mutation induction in Salmonella typhimurium and Escherichia coli bacterial strains. Although the test measures mutations in bacterial genes, a positive (mutagenic) response has ≥70% predictivity for rodent carcinogenicity, depending on the chemical classes tested (Zeiger, 1998). The test is required by regulatory authorities worldwide as an initial screen for chemical mutagens and carcinogens. A positive response in the test is considered strong evidence that the substance in question is a presumptive human mutagen and/or carcinogen. Protocol recommendations for the Ames test are found in the Organization for Economic Co-operation and Development (OECD) Test Guideline 471 (OECD, 2020). Detailed information on the molecular basis of the test and a description of its various procedures can be found in Mortelmans and Zeiger (2000) and OECD (2020).

A primary requirement for a valid test is that the results are reliable (i.e., reproducible within and across laboratories). OECD Test Guideline 471 does not require a repeat test for a clear positive result, while negative results should be considered for confirmation on a case-by-case basis. When results are clearly negative, justification for not repeating should be provided. Equivocal results should be repeated, preferably using modified experimental conditions. For this investigation of reproducibility, the freely available Ames test database generated by the U.S. National Toxicology Program (NTP) was used to examine the within-laboratory reproducibility of the various trials with and without metabolic activation (S9). Most tests were conducted on individual chemicals, rather than on undefined mixtures such as botanical extracts. The NTP Ames test database comprises more than 3000 distinct substances from a wide variety of chemical classes generated over a span of several decades.

To address the question of reproducibility, a cross-sector group of experts from the Health and Environmental Sciences Institute's (HESI) Botanical Safety Consortium (BSC) and Genetic Toxicology Technical Committee (GTTC) used this large NTP data set to evaluate how often initial Ames test results repeat in actual practice. Although a database of this size and complexity can be useful for a number of different analyses of the method and its reliability, the sole purpose of this study was to assess the reproducibility of the initial positive or negative test results. The results of this evaluation will provide context to aid in deciding whether repeat testing should be an integral part of a recommended testing protocol, both for single chemicals and for complex mixtures such as botanicals.

2 BACKGROUND ON THE DATA SOURCE

In 1975, the National Institute of Environmental Health Sciences (NIEHS) was tasked by the US Congress with developing a testing program to identify mutagenic chemicals because of their potential to also be carcinogenic (Zeiger, 2019). This program began testing chemicals for bacterial mutagenicity in 1979, using the Ames test, among others. In 1980, this new mutagenicity testing program was incorporated into the newly formed NTP. Ames test data were generated in multiple NTP contract laboratories over the years using the same basic preincubation test protocol. The tests evaluated for this project on reproducibility were performed between 1979 and 2020. Although some modified test protocols were used by the NTP (e.g., different tester strains; varying levels and sources of S9), those consistent with OECD Test Guideline 471 (OECD, 2020) conditions (i.e., ±10% induced male Sprague–Dawley rat liver S9, S. typhimurium strains TA97, TA98, TA100, TA1535, TA1537, TA104, and E. coli WP2 uvrA pKM101) were of primary interest for this study. Data from tests employing other S9 sources (e.g., hamster and mouse) and percentages (e.g., 5% and 30%) are included in the NTP database. Although not standard, data from tests employing 30% rat S9 and 10% and 30% hamster S9 were also considered in these analyses to determine if their reproducibility was consistent with, and supportive of, the OECD test conditions. All tests were conducted using a preincubation protocol, and repeat tests used the same vehicles (e.g., water; DMSO) as the original tests. The complete NTP Ames test data set in the NTP comprehensive Chemical Effects in Biological Systems (CEBS) database is publicly accessible at https://cebs.niehs.nih.gov/cebs/.

All chemicals were tested under code at NTP contract laboratories and all calls (i.e., positive, negative, or equivocal) were made on the coded chemicals by scientific personnel at the NTP using expert judgment rather than a formulaic 2- (or 3-) fold rule or p value statistic (Haworth et al., 1983; Zeiger et al., 1992). This approach avoided situations where the difference between a judgment of positive or negative for a single trial was based on a strict cutoff value (i.e., ≥2-fold increase or p ≤ .05) (Zeiger, 2023). It may therefore have enhanced reproducibility between repeat trials. Each trial was evaluated independently and overall calls for the test chemical were based on the following criteria (Haworth et al., 1983). For a test substance to be judged as positive, a clear, reproducible, dose-related increase in mutant colonies over a range of five doses in at least one strain and activation condition was required. Negative tests were those in which no increase in mutant colonies was observed in either of the two trials in all strain/activation conditions. Equivocal responses were characterized by small increases in mutant colonies insufficient to support a determination of mutagenicity and/or an increase in the absence of a dose–response.

2.1 Data source and analysis

The data and mutagenicity decision from each test were extracted from the CEBS database and compiled, organized, and displayed in spreadsheets (Supplemental File 1). For each test, the unique test article name and identifier, study year, bacterial strain used, solvent/vehicle used, activation condition, and individual trial number were included; individual trial results and overall test results were also included. For this study, only within-laboratory strain- and %S9-specific data, and the identity of the solvent/vehicle were used to address the question of the reproducibility of an initial positive, negative, or equivocal response.
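The tabulation described above can be illustrated with a minimal sketch. It assumes that an initial call counts as reproduced when the repeat trial received the same call, and that percentages are taken over all initial calls of that type within a strain/activation combination; the article does not spell out the exact computation. The field names (`strain`, `activation`, `initial_call`, `repeat_call`) and the records themselves are hypothetical and do not reflect the CEBS export format.

```python
from collections import defaultdict

def reproducibility(records):
    """Summarize how often the repeat call matched the initial call.

    Returns {(strain, activation, initial_call): (n_reproduced, n_total, percent)}.
    Assumption: "reproduced" means repeat_call == initial_call.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [n_reproduced, n_total]
    for r in records:
        key = (r["strain"], r["activation"], r["initial_call"])
        counts[key][1] += 1
        if r["repeat_call"] == r["initial_call"]:
            counts[key][0] += 1
    return {
        key: (rep, total, round(100.0 * rep / total, 1))
        for key, (rep, total) in counts.items()
    }

# Hypothetical records, for illustration only (not NTP data).
records = [
    {"strain": "TA98", "activation": "NA", "initial_call": "negative", "repeat_call": "negative"},
    {"strain": "TA98", "activation": "NA", "initial_call": "negative", "repeat_call": "negative"},
    {"strain": "TA98", "activation": "NA", "initial_call": "negative", "repeat_call": "equivocal"},
    {"strain": "TA100", "activation": "10% rat S9", "initial_call": "positive", "repeat_call": "positive"},
]

result = reproducibility(records)
for key, (rep, total, pct) in sorted(result.items()):
    print(key, rep, total, pct)
```

Keying the summary on strain, activation condition, and initial call mirrors the strain- and %S9-specific layout used for the results tables.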

The majority of the test data examined in this study were from tests using only TA98 and TA100. This apparent limitation is an artifact of the test protocols used by the NTP, beginning in the mid-1980s, that called for initial evaluations of a test substance in these two strains (Zeiger et al., 1985). If the result in either strain was positive and reproducible, there was no requirement to use additional bacterial strains since the test substance had been determined to be mutagenic. If the initial trials in TA98 and TA100 were negative or equivocal, additional bacterial strains (e.g., TA1535, TA1537, and/or TA97) were tested (Zeiger et al., 1985). If all strains were negative in the initial trials, all strains were repeated. If the initial negative trial was performed with 10% S9, the repeat trial used 30% S9; if the initial negative was with 30% S9, the repeat used 10%. Positive results in any strain required a repeat test at least 1 week after the initial trial using the same test protocol, although the test doses may have been adjusted to focus on a specific region of the response. All equivocal responses required a repeat test unless any of the other strain-S9 combinations were judged positive (e.g., Haworth et al., 1983; Zeiger et al., 1992).

Beginning in 2001, based on overviews of the results in several strains of bacteria, the NTP testing program opted to streamline testing by using only S. typhimurium strains TA100 and TA98, and E. coli strain WP2 uvrA pKM101, as these three strains detected the great majority of mutagens (Williams et al., 2019). Under this new approach to testing, all trials were repeated regardless of the initial response, and only 10% rat liver S9 was used to provide exogenous metabolic activation. Repeat trials in the post-2001 protocol generally used the same testing conditions as the initial trial, although sometimes the doses tested may have been adjusted to focus on a specific portion of the dose–response curve. As before, the data were evaluated using expert judgment rather than a strict fold rule or statistic.

3 RESULTS AND DISCUSSION

A number of strain/activation combinations in Tables 1 and 2 have very limited data, for example, tests with strain TA104 and the majority of equivocal trials. From a statistical and biological point of view, such low numbers cannot form the basis of a conclusion on the effectiveness of an assay, but the data are included for completeness.
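The effect of small numbers on a reproducibility percentage can be illustrated with a confidence interval. The sketch below uses the Wilson score interval, which is not a method used in this study, and the totals are inferred from the reported counts and percentages (e.g., 4 repeats at 40.0% implies 10 initial positives; 1752 repeats at 98.3% implies roughly 1782 initial negatives), so they should be read as assumptions rather than reported values.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion, in percent."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return 100 * (centre - half), 100 * (centre + half)

# Totals below are inferred from the tabulated No. and %, not reported directly.
small = wilson_interval(4, 10)       # TA97, +30% rat S9, positive repeats
large = wilson_interval(1752, 1782)  # TA98, no S9, negative repeats

print("small sample: %.1f%% to %.1f%%" % small)
print("large sample: %.1f%% to %.1f%%" % large)
```

With these inferred totals, the interval for the small category spans roughly 50 percentage points, whereas the interval for the large category is little more than one percentage point wide, which is consistent with the caution above about drawing conclusions from sparsely populated categories.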

TABLE 1. Initial test result reproducibility by strain and rat S9 activation.

                                     Positive repeat   Negative repeat   Equivocal repeat
Tester strain      Activation        No.     %         No.     %         No.     %
TA98               NA                168     91.3      1752    98.3      13      25.0
TA98               +10% rat S9       165     92.7      923     97.9      21      39.6
TA98               +30% rat S9       121     96.0      57      91.9      6       20.7
TA100              NA                266     95.0      1671    98.0      36      30.0
TA100              +10% rat S9       274     94.8      846     96.7      21      19.8
TA100              +30% rat S9       157     92.9      44      89.8      15      34.9
TA97               NA                63      88.7      747     97.9      5       10.2
TA97               +10% rat S9       81      94.2      46      92.0      8       20.0
TA97               +30% rat S9       4       40.0      32      94.1      10      21.7
TA1535             NA                79      87.8      1557    99.2      6       20.7
TA1535             +10% rat S9       127     92.0      723     99.0      9       20.9
TA1535             +30% rat S9       14      100.0     31      100.0     0       0
TA1537             NA                26      83.9      802     98.5      7       53.8
TA1537             +10% rat S9       50      90.9      723     98.8      6       23.1
TA1537             +30% rat S9       0       0         2       66.7      0       0
TA104              NA                6       100.0     54      98.2      2       33.3
TA104              +10% rat S9       1       100.0     10      83.3      2       100.0
TA104              +30% rat S9       4       100.0     1       100.0     0       0
Escherichia coli   NA                21      95.5      159     97.5      2       22.2
Escherichia coli   +10% rat S9       22      84.6      148     95.5      1       14.3
Escherichia coli   +30% rat S9       0       0         0       0         0       0

Note: NA, not activated (no S9).
a Too few chemicals in this category for consideration.

TABLE 2. Initial test result reproducibility by strain and hamster S9 activation.

                                     Positive repeat   Negative repeat   Equivocal repeat
Tester strain      Activation        No.     %         No.     %         No.     %
TA98               NA                168     91.3      1752    98.3      13      25.0
TA98               +10% hamster S9   172     95.6      750     97.0      7       16.7
TA98               +30% hamster S9   125     94.7      49      90.7      4       15.4
TA100              NA                266     95.0      1671    98.0      36      30.0
TA100              +10% hamster S9   269     95.4      661     95.7      36      36.7
TA100              +30% hamster S9   194     97.5      33      82.5      8       18.2
TA97               NA                63      88.7      747     98.0      5       10.2
TA97               +10% hamster S9   85      90.4      40      88.9      14      31.8
TA97               +30% hamster S9   11      84.6      26      89.7      11      26.2
TA1535             NA                79      87.8      1557    99.2      6       20.7
TA1535             +10% hamster S9   137     94.5      739     98.1      9       23.7
TA1535             +30% hamster S9   20      90.9      30      90.9      0       0
TA1537             NA                26      83.9      802     98.5      7       53.8
TA1537             +10% hamster S9   47      82.5      734     99.1      4       17.4
TA1537             +30% hamster S9   0       0         4       100.0     0       0
TA104              NA                6       100.0     54      98.2      2       33.3
TA104              +10% hamster S9   0       0         1       100.0     2       100.0
TA104              +30% hamster S9   4       100.0     0       0         1       50.0
Escherichia coli   NA                21      95.5      159     97.5      2       22.2
Escherichia coli   +10% hamster S9   1       100.0     0       0         0       0
Escherichia coli   +30% hamster S9   0       0         0       0         0       0

Note: NA, not activated (no S9).
a Too few chemicals in this category for consideration.

The reproducibility of initial negative responses in all strains, with and without 10% rat S9, was quite high, ranging from 92% to 99% (Table 1); the only exception was TA104 with 10% S9 (83.3%), a category with very few tests. The within-laboratory reproducibility of initial positive responses for strains TA98 and TA100 with and without 10% and 30% rat liver S9 was >91% (Table 1). Reproducibility of positive responses was lower for strains TA1535, TA1537, and TA97 in the absence of S9 (83.9%–88.7%), but with 10% S9 it was still at least 90% (Table 1). Reproducibility of initial positive responses in the E. coli strain with 10% S9 was 84.6%.

The reproducibility of the tests conducted with hamster S9 was similar to that of the tests conducted with rat S9 (Table 2). As seen with rat S9, the reproducibility of initial negative responses with and without 10% hamster S9 was ≥95% in all strains except TA97 with 10% hamster S9 (88.9%) (Table 2). Reproducibility of positive responses with and without S9 was >91% in strains TA98 and TA100; the lowest reproducibility, 82.5%, was seen in strain TA1537 with 10% hamster S9.

As anticipated, the reproducibility of equivocal responses, which indicate possible weak activity of the test chemical, was much lower in all strains and activation conditions (Tables 1 and 2). The numbers of samples for which data were available are too low in most cases to be meaningful. This observation underscores the difficulty in using set cutoffs for response characterization because a few colonies more or less can be the difference between a response judged “equivocal” and one judged “positive” or “negative.” All initial equivocal tests required repeating.
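The sensitivity of a set cutoff to a few colonies can be shown with a small, purely illustrative calculation. The function below applies a strict 2-fold rule, which is not how the NTP calls in this study were made (those relied on expert judgment), and the colony counts are hypothetical.

```python
def fold_rule_call(control_mean, treated_mean, fold=2.0):
    """Return 'positive' if the treated mean meets the fold cutoff, else 'negative'.

    Illustrative strict fold rule only; not the NTP evaluation method.
    """
    return "positive" if treated_mean >= fold * control_mean else "negative"

# Hypothetical mean revertant colonies per plate.
control = 20.0
call_a = fold_rule_call(control, 39.0)  # just below the 2-fold cutoff of 40
call_b = fold_rule_call(control, 41.0)  # just above the 2-fold cutoff of 40

print(call_a, call_b)
```

In this example, a difference of two colonies per plate moves the call from one side of the cutoff to the other, which is the kind of instability the paragraph above describes for weak responses.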

One limitation of this study is that the potencies of the initial positive and equivocal responses were not factored into the calculations of reproducibility. It can be presumed that the more potent the initial positive response, the more likely it is to repeat. The same can be presumed for negative responses that have slopes of 0. This question of potency is important for the testing of complex mixtures such as botanicals. Mixtures can have numerous, often unknown constituents, each representing some fraction of the whole mixture that may contain a mutagenic substance. Thus, relatively weak positive responses may tend to be produced at low concentrations of the mutagens in the mixture and, therefore, may be less likely to reproduce than if the same (single) chemicals of high purity were being tested, as was done in the majority of the NTP tests. Another consideration that would be expected to affect the reproducibility of weak responses is the challenge of obtaining a good solution or stable suspension. Some botanicals, for example, are difficult to get into solution, which could lead to inconsistencies from day to day or batch to batch in the relative proportions of the various constituents. Another factor that could affect reproducibility is the experience and competence of the testing laboratory.

Data evaluation using expert judgment may have increased the reproducibility value because the test conclusion is not dependent on where the mutant colony count falls with respect to the strict, but not biologically relevant, fold-increase, p value, or historical control range methods of response characterization. Expert judgment also considers the inter-plate range of responses at each dose (Zeiger, 2023), which would lead to a less conservative evaluation than a fold rule. The data presented here may be useful in any future attempts to revise the OECD Test Guideline 471, when considered along with the analyses published by Williams et al. (2019) and Gatehouse et al. (1994).

The high level of reproducibility of the initial negative and positive responses in our study, coupled with the results of previous analyses by Gatehouse et al. (1994) and Levy et al. (2019) showing the same pattern, strongly suggests that repeating initial clearly positive and negative responses is unnecessary, particularly for individual chemical substances or well-defined mixtures with good solubility. Exceptions may include weak positive responses, or negative responses that show a positive trend, given that the difference between a negative, equivocal, or weak positive result may depend, for example, on only a few mutant colonies per plate or high inter-plate variability. Initial equivocal responses should always be repeated unless a response in one of the other strains is clearly positive. As can be seen in Table 1, initial equivocal responses repeated as equivocal in <40% of the tests in most strains. The OECD test guideline does note that “results may remain equivocal or questionable regardless of the number of times the experiment is repeated.”

The BSC is currently testing 13 well-characterized botanical extracts, as case studies, in the standard OECD Ames test protocol that includes repeat testing of all responses. The results from this exercise, along with the results from the NTP Ames test data analysis, will help in designing a recommended protocol for use in the routine testing of complex mixtures including botanical extracts.

ACKNOWLEDGMENTS

This work was supported in part by the Health and Environmental Sciences Institute's (HESI) Botanical Safety Consortium and the Genetic Toxicology Technical Committee. We acknowledge the committee members for their support and helpful feedback on the development of this manuscript. The BSC is supported by U.S. Food and Drug Administration (FDA); Office of Dietary Supplement Programs and Department of Health and Human Services (HHS); National Institute of Environmental Health Sciences (NIEHS); Division of Translational Toxicology, Office of Liaison, Policy, and Review; and Health and Environmental Sciences Institute (HESI) via Department of the Interior (DOI) Federal Consulting Group (FCG) under Blanket Purchase Agreement Order 140D0421F0068. This work is also supported in part by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences, Intramural Research project ZIA ES103316-04. We thank Jennifer Fostel for compiling the NTP-generated Ames data from CEBS. We thank PharmaLex for their analytical support. We thank Ray Tice and Connie Chen for their careful and thoughtful reviews of the manuscript.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available in CEBS at https://cebs.niehs.nih.gov/cebs and in the supplemental files.
