Volume 19, Issue 9 pp. 2218-2223
Environmental Chemistry

Results of an interlaboratory evaluation of an analytical screening method for assessing persistent bioaccumulative toxic chemicals in sediment samples

Robin A. Silva-Wilkinson
Corresponding Author
Great Lakes Environmental Center, 739 Hastings Street, Traverse City, Michigan 49686, USA

G. Michael DeGraeve
Great Lakes Environmental Center, 739 Hastings Street, Traverse City, Michigan 49686, USA

Lawrence P. Burkhard
U.S. Environmental Protection Agency, National Health and Environmental Effects Research Laboratory, Mid-Continent Ecology Division, 6201 Congdon Boulevard, Duluth, Minnesota 55804

Daniel W. Tholen
Great Lakes Environmental Center, 739 Hastings Street, Traverse City, Michigan 49686, USA

Barbara R. Sheedy
U.S. Environmental Protection Agency, National Health and Environmental Effects Research Laboratory, Mid-Continent Ecology Division, 6201 Congdon Boulevard, Duluth, Minnesota 55804

First published: 02 November 2009

Abstract

An inter- and intralaboratory evaluation was performed on a recently developed sediment screening method designed to detect a wide range of persistent bioaccumulative toxins (PBTs). Ten participating laboratories analyzed sediment samples, and their results were evaluated. The analyses demonstrated that laboratories that were better prepared to perform the screening method generally characterized each of the sediment samples in a similar fashion and had 80% average interlaboratory agreement in reporting the presence or absence of chemical classes in the samples. The results of this study suggest that the sediment screening method for PBTs is reasonably reproducible when performed by properly prepared and experienced laboratories.

INTRODUCTION

In human health and wildlife protection, a significant concern is exposure through the consumption of fish and shellfish contaminated by persistent bioaccumulative toxins (PBTs) [1]. Historically, federal and state governments have addressed this concern by monitoring and controlling specific priority chemicals. Water quality standards, including designated uses for specific water bodies and ambient water quality criteria for specific chemicals, currently form the regulatory basis for implementing controls to improve or maintain ambient water and sediment quality and to reduce contamination of aquatic organism tissue. However, because the priority chemicals represent only a subset of the bioaccumulative chemicals discharged to surface waters, and because water quality criteria exist for only a small fraction of the universe of PBTs, the water quality criteria approach has not been completely effective in eliminating risks to human health and wildlife from PBTs.

The primary reason that the water quality criteria approach is not entirely effective is that it does not protect against exposure to chemicals whose presence in surface waters is not known or suspected [1]. What is needed to address this shortcoming are screening-level procedures to detect and tentatively identify a wide range of organic PBTs in environmental samples. Such procedures, coupled with sorting of the tentative identifications (TIDs) by chemical class, can help investigators working for state and tribal agencies or for the regulated community identify chemical classes that may be a cause for concern. Knowing that certain chemical classes (e.g., polychlorinated biphenyls [PCBs], polycyclic aromatic hydrocarbons [PAHs], pesticides) are present allows investigators to focus monitoring resources and to make well-informed decisions about follow-up efforts, such as whether additional analyses are needed to determine the precise identity and concentration of specific PBTs in chemical classes of concern. Based on the outcome of such confirmation studies, the sources of the PBTs to the environment can be investigated and the risks to human health and wildlife can be assessed.

Accordingly, the U.S. Environmental Protection Agency (U.S. EPA) has developed analytical procedures to screen samples of water, effluent, tissue, and sediment for a range of nonpolar, acid-stable PBTs [1] that extends far beyond the recognized organic priority pollutants. The procedures use innovative sample preparation techniques that allow broad-spectrum organic chemical analysis with a high level of sensitivity, a combination that enhances the utility of the methods as screening procedures [1]. To detect, tentatively identify, and approximately quantitate PBTs, cleaned extracts of aqueous, tissue, or sediment samples are separated by capillary column gas chromatography and analyzed by full-scan electron impact ionization mass spectrometry. The sample preparation procedures include aggressive techniques such as sulfuric acid treatment and high-performance liquid chromatography fractionation to eliminate interferences, thereby improving the quality of the mass spectra obtained for the chromatographic peaks [1]. A mass spectral library reverse-search algorithm is then used to produce a list of TIDs for each sample component.
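What makes a library search a "reverse" search can be illustrated with a deliberately simplified score: only the ions present in the library spectrum are compared, so ions in the unknown that the library entry lacks (for example, from a co-eluting interference) do not lower the fit. The sketch below is a hypothetical cosine-based score, not the probability-based matching algorithm the laboratories actually used; the function name `reverse_fit` and the spectrum intensities are illustrative assumptions, with spectra represented as `{m/z: intensity}` dictionaries.

```python
import math


def reverse_fit(unknown: dict[float, float], library: dict[float, float]) -> float:
    """Simplified reverse-search score (0-100), hypothetical, not PBM.

    Cosine similarity computed only over the m/z values in the library spectrum,
    so extra ions in the unknown do not reduce the score.
    """
    mzs = list(library)
    u = [unknown.get(mz, 0.0) for mz in mzs]  # unknown intensities at library ions
    lib = [library[mz] for mz in mzs]
    dot = sum(a * b for a, b in zip(u, lib))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in lib))
    return 100.0 * dot / norm if norm else 0.0


# Illustrative intensities only (not real library entries).
library = {128: 100.0, 102: 8.0, 51: 6.0}
clean = {128: 100.0, 102: 8.0, 51: 6.0}
with_interference = {128: 100.0, 102: 8.0, 51: 6.0, 57: 80.0, 71: 50.0}
print(round(reverse_fit(clean, library), 1), round(reverse_fit(with_interference, library), 1))
```

Both spectra score 100 here, whereas a forward comparison over all ions would penalize the interference ions at m/z 57 and 71; that tolerance of unrelated ions is the reason a reverse search suits complex environmental extracts.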

The present evaluation was initiated to assess the inter- and intralaboratory variability of the sediment method and the ease with which it can be performed. The study was needed to determine the reproducibility of the screening procedure and to gauge both the readiness of the method for routine use in federal, state, and private-sector screening and monitoring programs and the readiness of environmental analytical laboratories to use it.

METHODS

Ten laboratories representing government, academia, and the private sector volunteered to participate in the study. Although the participants were experienced in gas chromatography-mass spectrometry (GC/MS) analysis, most of their experience was with chemical-specific analysis rather than with library searching for the tentative identification of unknowns. Only two of the laboratories had previous experience with the method; the other eight laboratories were performing the method for the first time using the study samples. The only qualification requirement placed on the laboratories was that they had the necessary equipment and that they were willing to participate without compensation. Although all of the laboratories had trace-level analytical experience, no demonstration of their abilities was required prior to study initiation. Interviews, conducted following receipt of the sample results from the laboratories, revealed that the experience levels for the analysts who actually performed the work ranged from extensive low-level contaminant analysis experience to little experience preparing sediment samples for GC/MS analysis. In an attempt to better acquaint the participants with the sediment method prior to the initiation of the study, a 2-d workshop was held for the analysts from the laboratories, focusing on the sample preparation techniques that were applied in the method in unique ways to allow for increased sensitivity.

In August 1992, homogeneous, frozen sediment samples (S1, S2, and S3) from three contaminated sites in the Lake Michigan, USA, drainage basin were shipped on ice via overnight courier to the participating laboratories, where they were stored at −10°C until extraction. In addition to the samples, each laboratory received a vial of a spiking solution (in acetone) containing eight organic compounds (seven of them aromatic) that are not common environmental contaminants: (1) 1,2-dibromocyclohexane, (2) 1,1′-oxybisbenzene, (3) 1-chloronaphthalene, (4) p-terphenyl, (5) 4,4′-dimethyl-1,1′-biphenyl, (6) hexabromobenzene, (7) 2,3,4,5,6-pentabromotoluene, and (8) p-quaterphenyl. Although the 10 laboratories were asked to prepare and analyze the three sediment samples plus a duplicate of sample S1 and a procedural blank sample, not all laboratories provided results for all samples.

Laboratory procedure

Approximately 20 g of homogenized sediment was mixed with enough sodium sulfate (100-140 g) to dry the sample and was then spiked with 1 ml of the spiking solution (containing approximately 5 μg/ml of each of the eight chemicals) and with 1 ml of a surrogate solution containing 1 μg/ml each of [2H10]-biphenyl, [13C6]-1,2,4,5-tetrachlorobenzene, and [13C6]-hexachlorobenzene. The spiked sample was either Soxhlet extracted for ≥12 h or ultrasonically extracted three times with 1:1 (v/v) acetone:hexane. The concentrated extract was then cleaned up with acid-Celite/silica gel and activated copper [2].
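The nominal spike levels on a sediment basis follow directly from the volumes and concentrations above. The sketch assumes exactly 20 g of sediment as weighed (the text does not say whether results are on a wet or dry basis); the function name is illustrative.

```python
SEDIMENT_KG = 20.0 / 1000.0  # ~20 g of homogenized sediment, taken as exactly 20 g


def nominal_level_ug_per_kg(conc_ug_per_ml: float, volume_ml: float,
                            sediment_kg: float = SEDIMENT_KG) -> float:
    """Nominal spike level: mass added (ug) divided by sediment mass (kg)."""
    return conc_ug_per_ml * volume_ml / sediment_kg


spiked_chemical = nominal_level_ug_per_kg(5.0, 1.0)  # each of the eight spiked chemicals
surrogate = nominal_level_ug_per_kg(1.0, 1.0)        # each of the three surrogates
print(f"{spiked_chemical:.0f} ug/kg per spiked chemical, {surrogate:.0f} ug/kg per surrogate")
```

This gives about 250 μg/kg for each spiked chemical and 50 μg/kg for each surrogate; the surrogate figure is the same 50 μg/kg value that appears as the reporting threshold in Table 2.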

The cleaned extract was fractionated on a high-performance liquid chromatography system equipped with either a 250 × 4.6-mm i.d. or a 250 × 9.4-mm i.d. 5-μm C18 column, and three fractions were collected. The high-performance liquid chromatography program conditions were identical to those reported previously [3]. Each fraction was diluted at least 10-fold with water and back-extracted into hexane by either liquid/liquid extraction or C18 solid-phase extraction columns [2]. Each fraction was concentrated to 100 μl and spiked with 5 μl of 200 mg/L [2H12]-chrysene as an internal standard.

The high-performance liquid chromatography fractionation approach was designed to result in one surrogate chemical eluting in each fraction, specifically, [2H10]-biphenyl in fraction 1, [13C6]-1,2,4,5-tetrachlorobenzene in fraction 2, and [13C6]-hexachlorobenzene in fraction 3. The surrogates were quantified on the GC/MS using an internal standard method, and their percent recoveries were determined.
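The text does not give the calibration details, so the sketch below assumes the conventional internal-standard approach: a relative response factor (RRF) from a calibration standard, then analyte mass from the area ratio to the internal standard. Nominally, 5 μl of 200 mg/L (200 μg/ml) [2H12]-chrysene adds 1 μg of internal standard, and 1 ml of 1 μg/ml surrogate solution adds 1 μg of each surrogate. The function names and all peak areas are hypothetical.

```python
def relative_response_factor(area_x, area_is, mass_x, mass_is):
    """RRF from a calibration standard: (A_x / A_is) * (m_is / m_x)."""
    return (area_x / area_is) * (mass_is / mass_x)


def amount_by_internal_standard(area_x, area_is, mass_is, rrf):
    """Analyte mass in the extract: (A_x / A_is) * m_is / RRF."""
    return (area_x / area_is) * mass_is / rrf


IS_MASS_UG = 200.0 * 0.005  # 5 ul (0.005 ml) of 200 ug/ml [2H12]-chrysene
SPIKED_UG = 1.0 * 1.0       # 1 ml of 1 ug/ml surrogate solution

# Hypothetical peak areas for a calibration standard and a sample fraction.
rrf = relative_response_factor(area_x=50_000, area_is=40_000, mass_x=1.0, mass_is=1.0)
found_ug = amount_by_internal_standard(area_x=37_500, area_is=40_000,
                                       mass_is=IS_MASS_UG, rrf=rrf)
recovery_pct = 100.0 * found_ug / SPIKED_UG
print(f"RRF = {rrf:.2f}, found = {found_ug:.3f} ug, recovery = {recovery_pct:.0f}%")
print("within 20-120%:", 20.0 <= recovery_pct <= 120.0)
```

Because the internal standard is added only after fractionation and concentration, the recovery computed this way reflects losses during extraction, cleanup, and fractionation, which is what the surrogates are meant to track.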

The three fractions were analyzed by GC/MS. All the GC/MS systems were quadrupole instruments manufactured by Hewlett-Packard (Avondale, PA, USA). Each system had an interfaced data system that included a version of the National Institute of Standards and Technology (NIST) mass spectral library containing more than 40,000 entries. The following data acquisition conditions were specified, although not all the laboratories used them: DB-5 (J&W Scientific, Folsom, CA, USA) or equivalent column; scan range of m/z 45 to 560; scan rate of 1 scan/s; electron energy of 70 eV; oven temperature program of 50°C for 4 min, a linear ramp to 175°C at 10°C/min, a linear ramp to 275°C at 5°C/min, and an isothermal hold at 275°C for 20 min; 1- or 2-μl injections; transfer line temperature of 250 to 300°C; and helium carrier gas at a linear velocity of 30 cm/s at 250°C.
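The specified oven program fixes the length of each GC/MS run. The sketch below adds the initial hold to the ramp and hold time of each segment; it ignores oven cooldown and equilibration, and the scan count is an upper bound that ignores any solvent delay. The function name is illustrative.

```python
def oven_program_minutes(initial_temp_c, initial_hold_min, segments):
    """Total oven program time: initial hold plus ramp and hold time of each segment.

    segments: (ramp rate in C/min, target temperature in C, hold at target in min).
    """
    total, temp = initial_hold_min, initial_temp_c
    for rate, target, hold in segments:
        total += (target - temp) / rate + hold
        temp = target
    return total


# 50 C for 4 min; 10 C/min to 175 C; 5 C/min to 275 C; hold at 275 C for 20 min
run_min = oven_program_minutes(50.0, 4.0, [(10.0, 175.0, 0.0), (5.0, 275.0, 20.0)])
SCANS_PER_SECOND = 1.0
max_scans = int(run_min * 60 * SCANS_PER_SECOND)  # upper bound; ignores any solvent delay
print(run_min, max_scans)  # 56.5 3390
```

At 1 scan/s, each fraction therefore yields on the order of 3,000 spectra, which is why automated library searching rather than manual spectrum review is central to the method.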

Following the subtraction of background spectra, the mass spectrum of each chromatographic peak was library searched. All of the laboratories employed a reverse-searching probability-based matching algorithm to provide a four-tiered report. All peaks, regardless of height/area, were first searched using a subset of the NIST library, called the Chemicals of Highest Concern library, consisting of approximately 20 priority pollutant PBTs, including pesticides, PCBs, and dioxin. Peaks that were tentatively identified using the Chemicals of Highest Concern library with a fit ≥70% were reported in report 1. Peaks that were not tentatively identified using the Chemicals of Highest Concern library and that had a height or area greater than or equal to the height or area for the surrogate peak were searched using the NIST library. Information for a peak was included in report 2 if it was tentatively identified, using the NIST library, with a fit ≥70%. Up to 10 TIDs could be reported in report 2, but TIDs with fits <70% were suppressed from the list. Information for a peak was included in report 3 if it was tentatively identified using the NIST library with a fit between 25 and 69%; only the two TIDs with the highest fit were reported. When no TIDs with fits ≥25% were found for a peak, the peak was reported as being unknown in report 4.
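The four-tiered reporting logic can be sketched as a small decision function. This is an illustration under stated assumptions, not the data-system software the laboratories used: `Hit`, `Peak`, and `assign_report` are hypothetical names; `surrogate_size` stands for the height or area of the surrogate peak used for comparison; and because the text does not say how many Chemicals of Highest Concern matches appear in report 1, the sketch lists all matches with fits ≥70%.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hit:
    name: str
    fit: float  # reverse-search fit, in percent


@dataclass
class Peak:
    size: float  # peak height or area, in the same units as the surrogate peak
    chc_hits: list[Hit] = field(default_factory=list)   # Chemicals of Highest Concern library
    nist_hits: list[Hit] = field(default_factory=list)  # full NIST library


def _by_fit(hits: list[Hit]) -> list[Hit]:
    return sorted(hits, key=lambda h: h.fit, reverse=True)


def assign_report(peak: Peak, surrogate_size: float) -> tuple[Optional[int], list[Hit]]:
    """Return (report number, TIDs listed), or (None, []) if the peak is not reported."""
    chc = [h for h in _by_fit(peak.chc_hits) if h.fit >= 70]
    if chc:
        return 1, chc                 # report 1: CHC match with fit >= 70%, any peak size
    if peak.size < surrogate_size:
        return None, []               # smaller than the surrogate peak: no NIST search
    nist = _by_fit(peak.nist_hits)
    strong = [h for h in nist if h.fit >= 70]
    if strong:
        return 2, strong[:10]         # report 2: up to 10 TIDs, fits < 70% suppressed
    weak = [h for h in nist if h.fit >= 25]
    if weak:
        return 3, weak[:2]            # report 3: the two best TIDs with fits of 25-69%
    return 4, []                      # report 4: unknown


peak = Peak(size=2.0, nist_hits=[Hit("phenanthrene", 64.0), Hit("anthracene", 61.0),
                                 Hit("unrelated", 20.0)])
report, tids = assign_report(peak, surrogate_size=1.0)
print(report, [h.name for h in tids])
```

The ordering of the checks mirrors the text: every peak is first compared with the small high-concern library, and only peaks at least as large as the surrogate go on to the full NIST search.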

The quality assurance/quality control requirements included demonstrations of instrument performance, sensitivity, calibration, and stability, as in the U.S. EPA chemical-specific analytical methods [4, 5]. The laboratories were expected to achieve recoveries of 20 to 120% for all three surrogate compounds spiked into each sample and to demonstrate acceptable procedural blank sample results [2].

Data analysis

Reported sample components were excluded from further evaluation if they were common procedural contaminants observed in sediment preparation methods (e.g., molecular sulfur, phthalates); compounds that share some properties of PBTs but do not bioconcentrate or bioaccumulate (e.g., alkanes); or contaminants unique to a particular laboratory, identified through blank contamination and/or TID patterns that were unique to that laboratory and present in all of its sample results (e.g., long-chain aldehydes such as octadecenal). TIDs for sample components that were position isomers were treated as the same chemical because the mass spectra of position isomers are virtually indistinguishable with the instrumentation used in this study.
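The exclusion and isomer rules can be sketched as a filter over reported TIDs. This is a simplified illustration under stated assumptions: the exclusion sets hold example entries only (the study's actual lists are not given), `screen_tids` is a hypothetical name, and the molecular formula is used as a stand-in key for position isomers. A formula key is coarser than the study's rule, because it would also merge benzo[a]pyrene with benzo[e]pyrene, which the study did not treat as position isomers.

```python
# Illustrative exclusion lists: the categories follow the text, the entries are examples.
PROCEDURAL_CONTAMINANTS = {"sulfur (S8)", "bis(2-ethylhexyl) phthalate"}
NON_BIOACCUMULATIVE = {"hexadecane"}


def screen_tids(tids, lab_specific):
    """Drop excluded components, then merge isomers that share a molecular formula.

    tids: iterable of (name, formula) pairs; lab_specific: names flagged as unique
    to one laboratory. Returns {formula: first name seen}.
    """
    excluded = PROCEDURAL_CONTAMINANTS | NON_BIOACCUMULATIVE | set(lab_specific)
    kept = {}
    for name, formula in tids:
        if name in excluded:
            continue
        kept.setdefault(formula, name)  # later isomers map onto the first entry
    return kept


reported = [
    ("sulfur (S8)", "S8"),
    ("2,2',5,5'-tetrachlorobiphenyl", "C12H6Cl4"),
    ("3,3',4,4'-tetrachlorobiphenyl", "C12H6Cl4"),
    ("octadecenal", "C18H34O"),
    ("benzo[a]pyrene", "C20H12"),
]
result = screen_tids(reported, lab_specific={"octadecenal"})
print(result)
```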

Table 1. Success by laboratory in correctly identifying spiked chemicals in all three sediment samples (S1, S2, and S3)
Laboratory (group) | Total spiked chemicalsᵃ identified correctly or as an isomer (%)ᵇ | Spiked chemicals 2, 3, 4, and 5ᶜ identified correctly or as an isomer (%)ᵇ
1 (II) | 50 | 67
2 (I) | 63 | 92
3 (II) | 31 | 38
4 (I) | 71 | 100
5 (I) | 81 | 100
6 (I) | 46 | 92
7 (I) | 75 | 100
8 (II) | 42 | 50
10 (II) | 47 | 80
12 (I) | 50 | 100
Average (SL)ᵈ | 55 (0.054) | 81 (0.0001)
  • a 1,2-Dibromocyclohexane; 1,1′-oxybisbenzene; 1-chloronaphthalene; p-terphenyl; 4,4′-dimethyl-1,1′-biphenyl; hexabromobenzene; 2,3,4,5,6-pentabromotoluene; p-quaterphenyl.
  • b At ≥25% fit.
  • c 1,1′-Oxybisbenzene, 1-chloronaphthalene, p-terphenyl, and 4,4′-dimethyl-1,1′-biphenyl.
  • d SL = significance level. A value less than 0.05 indicates that there was an association between laboratory and the identification of chemicals.

The TIDs reported for the sediment samples by all the laboratories were grouped into 22 chemical classes according to structural similarity (Table 3). Classifying the TIDs served three purposes: to group together TIDs whose presence and concentration could subsequently be confirmed by similar analytical procedures, to allow evaluation of the laboratories' success in tentatively identifying the sample components, and to illustrate how grouping the TIDs generated by the procedure can support risk assessment decisions.

RESULTS AND DISCUSSION

Introduced chemicals

Spiked chemical results. Fisher's exact test was applied to the results for the spiked chemicals to assess whether correct identification of spiked chemicals was statistically associated with the individual chemicals, laboratories, or samples. When the data for all of the spiked chemicals were evaluated together, there was no significant association (α = 0.05) between laboratory and success in correctly identifying the spiked chemicals (Table 1), despite observed differences among the laboratories (31 to 81%). There was, however, a statistically significant association between individual chemicals and their likelihood of being correctly identified, primarily because of the poor results obtained for chemicals 1 (1,2-dibromocyclohexane), 6 (hexabromobenzene), 7 (2,3,4,5,6-pentabromotoluene), and 8 (p-quaterphenyl). These four chemicals were identified correctly (or as a position isomer) in fewer than 50% of the laboratories' analyses. The poor results for these four chemicals were primarily due to poor chromatography caused by insufficient cleanup of the extracts, although other factors could have contributed [6]. When the results for chemicals 1, 6, 7, and 8 were removed from the data set, the laboratories separated into two distinct groups: six laboratories (group I) correctly identified either 92% (11 of 12 possible identifications) or 100% of the remaining four spiked chemicals in samples S1, S2, and S3, whereas the other four laboratories (group II) correctly identified 38 to 80% of these four chemicals (Table 1).
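For readers unfamiliar with Fisher's exact test: with all row and column totals of a contingency table held fixed, the count in one cell follows a hypergeometric distribution, and the two-sided p-value sums the probabilities of all tables no more likely than the observed one. The sketch handles only the 2 × 2 case (for example, two laboratories by correct versus not correct); the study's comparisons across many laboratories, chemicals, or samples form larger r × c tables, which need the Freeman-Halton extension or statistical software. The function name and the counts are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Hypothetical counts: laboratory A identified 11 of 12 spiked chemicals correctly,
# laboratory B 5 of 12 (rows = laboratories, columns = correct / not correct).
p_value = fisher_exact_2x2(11, 1, 5, 7)
print(round(p_value, 4))
```

An exact test suits these data because the counts per laboratory are small (at most 24 spiked-chemical identifications), where large-sample chi-square approximations are unreliable.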

The difference between the group I and group II laboratories in identifying spiked chemicals was not surprising, given the difficulties in performing the method, attributable to lack of experience with it, that a number of laboratories reported. In interviews after the results were submitted, almost all the laboratories concluded that a high skill level was required to achieve the method's ultralow detection requirements and that previous experience with the method would have been invaluable. The complexity of the sample preparation and analytical procedures provides numerous opportunities for loss of spiked chemicals, especially because multiple concentration steps are required to reach a final extract volume of 100 μl. This feedback led us to conclude that the laboratories would likely have been substantially more successful in identifying the spiked chemicals if more of them had been performing this method, or similar ultratrace-level organic analyses, routinely before the study began. Because of the difficulties reported by the participating laboratories, we also evaluated whether the surrogate recoveries of the group I and group II laboratories differed.

Surrogate chemical results. With one exception, the surrogate chemicals were successfully identified by all the laboratories in all fractions. This was expected, because the method was calibrated specifically for the surrogate and internal standard chemicals. Acceptable recoveries were obtained for the [2H10]-biphenyl, [13C6]-1,2,4,5-tetrachlorobenzene, and [13C6]-hexachlorobenzene surrogates in 77, 87, and 96% of the analyses, respectively. However, the method requires that all three surrogates (one per fraction) meet the 20 to 120% recovery criterion in each analysis, and there was a substantial difference between the group I and group II laboratories in meeting this requirement. Among analyses for which data were reported for all three fractions, the group I and group II laboratories achieved acceptable recoveries for all three surrogates in 78% and 36% of cases, respectively. These differences in surrogate recovery emphasize the distinction between the two groups of laboratories and are consistent with the differences, discussed above, in the two groups' success in correctly identifying the spiked chemicals.
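Why the all-three-surrogates rate can be much lower than any single-surrogate rate is easy to see in code: an analysis passes only if every fraction's surrogate falls within 20 to 120%, so a failure in any one fraction fails the whole analysis. The function names and the recoveries below are hypothetical.

```python
def meets_criterion(recoveries, low=20.0, high=120.0):
    """True if every surrogate recovery (%) in one analysis lies within [low, high]."""
    return all(low <= r <= high for r in recoveries)


def fraction_acceptable(analyses):
    """Share of analyses with all three fractions reported that meet the criterion.

    Each analysis is a (fraction 1, fraction 2, fraction 3) tuple of percent
    recoveries, with None marking a fraction that was not reported.
    """
    complete = [a for a in analyses if len(a) == 3 and all(r is not None for r in a)]
    if not complete:
        return float("nan")
    return sum(meets_criterion(a) for a in complete) / len(complete)


# Hypothetical recoveries for four analyses (illustration only).
analyses = [(85.0, 90.0, 101.0), (15.0, 70.0, 95.0), (60.0, None, 80.0), (110.0, 118.0, 92.0)]
print(fraction_acceptable(analyses))
```

Here the incomplete third analysis is excluded, mirroring the restriction to analyses with data for all three fractions, and one of the remaining three fails only because of its fraction 1 surrogate.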

Differences between the two groups of laboratories' results for the introduced chemicals could be linked to their varying levels of experience, with expertise ranging from chemists with extensive experience in low-level residue analysis to technicians with little experience preparing sediment samples for GC/MS analysis [7]. Some of the laboratories found the procedures to be much more labor intensive than they had expected and were therefore not prepared for the challenges of the procedures. In a number of laboratories, significant analytical interferences resulted from insufficient sample extract cleanup and glassware preparation, and poor surrogate recovery was attributed to cumulative loss during the multiple sample preparation steps. Seven of the 10 laboratories reported difficulties with the high-performance liquid chromatography fractionation step, and five laboratories had difficulty programming their GC/MS software to generate the reports in the format specified by the method. The result of these problems was that the method specifics were not always followed, that some laboratories' results were incomplete, and that the method quality-control criteria were not always achieved. This outcome illustrates that some of the laboratories participating in this study were well prepared by virtue of their previous experience (group I), while others (group II) were less ready to successfully perform the method on complex sediment samples.

Table 2. Number of peaksᵃ reported for each sediment sample
Sediment sample
Laboratory | S1 | S1B (S1 duplicate) | S2 | S3
2 | 113 (13) | NAᵇ | 24 (4) | 200 (22)
4 | 81 (14) | >118 (25)ᶜ | 44 (4) | 396 (56)
5 | 114 (14) | 123 (31) | 106 (51) | 150 (3)
6 | 65 (3) | 45 (4) | 9 (2) | 114 (10)
7 | NA | NA | 73 (40) | NA
12 | 97 (4) | NA | 31 (7) | 150 (17)
Total | 462 | 279 | 289 | 1,017
Total peaks with one class assignmentᵈ | 363 | 234 | 251 | 856
  • a Number of peaks for sample components reported at concentrations ≥50 μg/kg, not including spiked chemicals, surrogate or internal standard compounds, or components that were not tentatively identified with >25% fit (i.e., unknowns). The number of unknowns is included parenthetically.
  • b NA = not available.
  • c Fraction 1 was lost.
  • d Chemical classes presented in Table 3.

The less-than-adequate preparedness demonstrated by the group II laboratories led us to exclude their data from further analysis of interlaboratory agreement. We judged that including the results of these unqualified laboratories would introduce variability associated with the laboratories' background experience rather than with the method itself. Consequently, agreement in reporting sample components was analyzed using data from the group I laboratories only.

Sample components

The results for the sediment sample components that were not spiked compounds, surrogate compounds, or compounds deleted from consideration (i.e., common procedural contaminants, nonbioaccumulative compounds, and contaminants unique to a particular laboratory) were used to assess the interlaboratory variability for each of the three sediment samples. Because the analytical method is a screening method that results in the generation of lists of TIDs for each component and because there was no way to know in advance what contaminants were present in the sediments used for this study, we could not determine conclusively which of the sample components were correctly identified by each laboratory.

The most straightforward approach for comparing the laboratories' results for unknown sample components was to examine the number of chromatographic peaks reported by each of the laboratories for each sediment sample. This evaluation found that the number and retention times of peaks reported by each laboratory varied (Table 2). However, because of the complexity and varied nature of the chromatograms and reports submitted by the laboratories and because the sample compositions were not known, we were unable to evaluate the interlaboratory variability associated with the reported sample components by comparing peaks or lists of TIDs for peaks. Consequently, we attempted to compare the entire list of TIDs received from the laboratories for each sample. Comparing the lists of TIDs proved to be unmanageable and did not provide understandable measures of agreement, even when position isomers were equated. These difficulties were caused, we believe, by the combined effects of the different numbers of TIDs and peaks reported by each laboratory; the presence of multicomponent chemical classes, e.g., the tetrachlorobiphenyls, which provided numerous TIDs and peaks; and the use of the tiered reporting format required by the method, which caused some TIDs to be suppressed from the reports.

To facilitate comparison among the laboratories, the 1,081 different TIDs reported by all the laboratories for all the samples were grouped into 22 classes according to structural similarity (Table 3). Because the lists of TIDs generated by the method for an individual sample component included spectrally similar chemicals, which often had structural and functional similarities, we reasoned that identifying which chemical classes were present would meet the objective of screening for PBTs with the method. For instance, TIDs for the PAH benzo[a]pyrene may have included other five-ring PAHs (such as benzo[j]fluoranthene and benzo[e]pyrene) that were not considered position isomers of benzo[a]pyrene but were spectrally and structurally similar and had similar chemical characteristics. In this example, the tentative identification of high-molecular-weight PAHs would allow investigators to assess the relative risk to the exposed population and to decide whether additional site monitoring and confirmation analysis (using a PAH-specific analytical method) were needed to establish the true identity and actual concentration of the compound(s) of concern.

For the majority of the components, a single chemical class was reported, although in each laboratory's results there were peaks for which more than one class was represented in the list of TIDs (Table 2). There were a total of 21, 17, and 21 classes of compounds reported for the peaks with a single class assignment in samples S1, S2, and S3, respectively. In general, the samples were characterized similarly, on a chemical class basis, by each of the laboratories. For instance, all the laboratories' results for sample S1 were overwhelmingly dominated by PCBs, and PAHs were also reported in this sample by all the laboratories. The PAHs were the dominant TIDs for sample S2 and were reported by all the laboratories, although sample S2 appeared to be the least contaminated of the three samples from the standpoint of having the fewest PBTs. The laboratories' results uniformly indicated that sample S3 was the most complex of the three samples in terms of both numbers and types of in situ sample components.
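Sorting each peak's TIDs into classes and separating single-class from multi-class peaks (the distinction behind the "Total peaks with one class assignment" row of Table 2) can be sketched as follows. The lookup table, the function name `tally_classes`, and the peaks are hypothetical; only the class names are taken from Table 3, and the study's actual assignment of TIDs to classes is not reproduced.

```python
from collections import Counter

# Hypothetical lookup from TID name to chemical class (class names from Table 3).
TID_CLASS = {
    "naphthalene": "Two-ring PAH",
    "phenanthrene": "Three-ring PAH",
    "anthracene": "Three-ring PAH",
    "pyrene": "Four-ring PAH",
    "fluoranthene": "Four-ring PAH",
    "benzo[a]pyrene": "Five-ring PAH",
    "benzo[e]pyrene": "Five-ring PAH",
    "benzo[j]fluoranthene": "Five-ring PAH",
    "2,2',5,5'-tetrachlorobiphenyl": "Halogenated biphenyl",
}


def tally_classes(peaks):
    """peaks: one list of TID names per chromatographic peak.

    Returns (Counter of single-class peaks by class, number of multi-class peaks).
    Peaks whose TIDs are all unclassified are ignored.
    """
    single, multi = Counter(), 0
    for tids in peaks:
        classes = {TID_CLASS[t] for t in tids if t in TID_CLASS}
        if len(classes) == 1:
            single[next(iter(classes))] += 1
        elif len(classes) > 1:
            multi += 1
    return single, multi


peaks = [
    ["benzo[a]pyrene", "benzo[e]pyrene", "benzo[j]fluoranthene"],  # one class
    ["phenanthrene", "anthracene"],                                 # one class
    ["pyrene", "phenanthrene"],                                     # contrived two-class list
]
single, multi = tally_classes(peaks)
print(dict(single), multi)
```

The first peak shows why class-level reporting is robust: three spectrally similar five-ring PAH candidates collapse to a single "Five-ring PAH" call even though the individual TIDs disagree.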

Table 3. Number of peaks identified in each class by each laboratory and agreement among laboratories
Entries: number of peaks identified in the class by each laboratory (% agreement among laboratories in reporting the presence or absence of the class)
Class | Sample S1 (laboratories 2, 4, 5, 6, 12) | Sample S2 (laboratories 2, 4, 5, 6, 7, 12) | Sample S3 (laboratories 2, 4, 5, 6, 12) | % Agreement across samples
Aliphatic | 1, 0, 8, 0, 9 (60) | 0, 0, 25, 0, 4, 3 (50) | 0, 5, 0, 0, 12 (60) | 57
Halogenated aliphatic | 1, 0, 0, 0, 2 (60) | 0, 0, 0, 0, 1, 0 (83) | 0, 1, 0, 0, 1 (60) | 68
Cycloaliphatic | 11, 4, 11, 2, 5 (100) | 0, 3, 7, 0, 8, 0 (50) | 2, 13, 0, 3, 5 (80) | 77
Halogenated cycloaliphatic | 0, 0, 0, 0, 0 (100) | 0, 0, 0, 0, 0, 0 (100) | 1, 2, 0, 0, 0 (60) | 87
Heterocyclic | 0, 3, 1, 1, 1 (80) | 0, 2, 23, 0, 8, 2 (67) | 0, 5, 1, 0, 3 (60) | 69
Alkyl benzene | 2, 1, 2, 2, 1 (100) | 0, 0, 1, 0, 2, 0 (67) | 5, 13, 2, 1, 0 (80) | 82
Substituted aromatic | 3, 2, 1, 1, 2 (100) | 0, 0, 0, 0, 5, 0 (83) | 4, 7, 0, 1, 6 (80) | 88
Halogenated aromatic | 0, 1, 0, 0, 2 (60) | 1, 0, 1, 0, 3, 0 (50) | 0, 1, 0, 0, 0 (80) | 63
Halogenated condensed aromatic | 0, 0, 0, 0, 1 (80) | 0, 0, 0, 0, 0, 0 (100) | 0, 1, 0, 0, 0 (80) | 87
Two-ring PAHᵃ | 4, 2, 3, 0, 0 (60) | 0, 0, 0, 0, 0, 0 (100) | 20, 35, 24, 20, 15 (100) | 87
Three-ring PAH | 5, 6, 4, 2, 4 (100) | 2, 2, 4, 0, 1, 2 (83) | 11, 21, 13, 12, 11 (100) | 94
Four-ring PAH | 6, 4, 9, 3, 2 (100) | 6, 7, 9, 5, 6, 7 (100) | 18, 34, 18, 8, 9 (100) | 100
Five-ring PAH | 2, 2, 2, 0, 0 (60) | 2, 3, 4, 1, 6, 3 (100) | 8, 24, 4, 4, 7 (100) | 87
Six-ring PAH | 0, 1, 0, 0, 0 (80) | 2, 2, 2, 0, 0, 0 (50) | 2, 4, 0, 1, 2 (80) | 70
Substituted biphenyl | 2, 0, 2, 0, 5 (60) | 0, 0, 2, 0, 3, 0 (67) | 10, 23, 10, 8, 0 (80) | 69
Halogenated biphenyl | 38, 34, 34, 44, 30 (100) | 2, 1, 1, 0, 0, 0 (50) | 3, 8, 6, 1, 0 (80) | 77
Substituted condensed aromatic | 5, 3, 8, 4, 7 (100) | 1, 3, 2, 1, 3, 2 (100) | 28, 40, 24, 21, 8 (100) | 100
Substituted heteroaromatic | 1, 0, 2, 0, 0 (60) | 0, 1, 0, 0, 2, 0 (67) | 0, 0, 0, 0, 2 (80) | 69
Halogenated heteroaromatic | 0, 0, 0, 0, 1 (80) | 0, 0, 0, 0, 0, 0 (100) | 0, 0, 0, 0, 0 (100) | 93
Substituted condensed heteroaromatic | 4, 2, 7, 0, 1 (80) | 0, 5, 0, 0, 4, 2 (50) | 19, 42, 15, 11, 9 (100) | 77
Halogenated condensed heteroaromatic | 0, 0, 0, 0, 1 (80) | 0, 0, 0, 0, 0, 0 (100) | 0, 1, 0, 0, 1 (60) | 80
Aliphatic carboxylic acid | 0, 0, 0, 0, 1 (80) | 0, 0, 1, 0, 3, 0 (67) | 0, 0, 0, 0, 1 (80) | 76
Overall agreement (%) | 81 | 77 | 82 | 80
  • a PAH = polycyclic aromatic hydrocarbon.

We could not apply conventional agreement statistics to assess how well the laboratories agreed in reporting the classes, because those statistics require either knowing which classes were correct or having numerical values from which arithmetic means and coefficients of variation can be calculated. It was possible, however, to calculate percent agreement among laboratories on the presence or absence of each chemical class in each sample, without judging whether the answer was correct (Table 3). Although the number of peaks each laboratory reported for an individual chemical class varied, agreement across laboratories in reporting the presence or absence of chemical classes was good. In samples S1, S2, and S3, respectively, 15, 11, and 17 of the 22 classes were reported consistently as present or absent by 80% or more of the laboratories; overall agreement ranged from 77 to 82%. Intralaboratory agreement, assessed from the repeatability with which laboratories 4, 5, and 6 reported chemical classes in sample S1 and its duplicate, was 84%.
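The percentages in Table 3 are consistent with a simple rule that the text does not state explicitly: for a given class and sample, agreement is the percentage of laboratories giving the majority answer (present if at least one peak was assigned to the class, absent otherwise), and the across-samples value is the mean of the three per-sample values. The sketch below, with the hypothetical function name `percent_agreement`, applies that inferred rule to the Aliphatic row of Table 3.

```python
def percent_agreement(peak_counts):
    """Percentage of laboratories giving the majority answer (class present or absent).

    peak_counts: number of peaks each laboratory assigned to the class in one sample.
    """
    present = sum(1 for n in peak_counts if n > 0)
    absent = len(peak_counts) - present
    return 100.0 * max(present, absent) / len(peak_counts)


# Aliphatic class, peak counts from Table 3
s1 = percent_agreement([1, 0, 8, 0, 9])        # laboratories 2, 4, 5, 6, 12
s2 = percent_agreement([0, 0, 25, 0, 4, 3])    # laboratories 2, 4, 5, 6, 7, 12
s3 = percent_agreement([0, 5, 0, 0, 12])       # laboratories 2, 4, 5, 6, 12
across = (s1 + s2 + s3) / 3
print(s1, s2, s3, round(across))
```

The result (60, 50, 60, and 57 across samples) matches the tabulated row. Because this measure only asks whether laboratories agree with each other, a class that every laboratory misses would still score 100%, which is why it cannot substitute for an accuracy assessment.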

It was not possible, however, to compare the inter- and intralaboratory variability observed in this study with the published variability of other analytical methods, and, to the best of our knowledge, no other PBT screening methods exist against which to compare this method. In inter- and intralaboratory studies of other analytical procedures, variability has been determined from known concentrations of chemicals in a particular sample matrix. In contrast, the agreement rates in this study reflect the success of the participating laboratories in qualitatively identifying chemical classes (Table 3). Because qualitative and quantitative measures of variability differ fundamentally, comparing the variability of this method with that of other methods would not be informative.

We believe it is important to recognize, however, that inter- and intralaboratory variability are not the only features to consider in assessing the utility of a screening procedure. For any analytical screening tool, a laboratory must be confident that it can detect chemicals present at levels of concern so that risk assessment and subsequent confirmatory analysis decisions can be made. Although a PBT screening method would ideally detect and tentatively identify all PBTs in a sample while minimizing false positive results, it is not realistic to assume that this is possible, particularly in a complex environmental matrix such as contaminated sediment. All analytical chemistry procedures have limitations, which are a function of factors such as detection limits, the limitations of mass spectral library search algorithms, and background interferences. Unfortunately, the results from this study did not allow us to assess the accuracy of the method by evaluating false negative and false positive results. However, the overall interlaboratory agreement of 80% and intralaboratory agreement of 84% suggest that the method is reasonably reproducible.

SUMMARY AND CONCLUSIONS

This procedure was a relatively recent development that combines challenging sample preparation and analysis techniques in an innovative way to achieve a qualitative/semiquantitative understanding of the PBTs in sediments. This study found that some laboratories experienced in performing other well-established analytical methods were less prepared to complete this procedure. The 10 laboratories separated into two distinct groups based on their success in correctly identifying four of the eight spiked chemicals and on their ability to meet the surrogate recovery criterion; the group I laboratories performed better than the group II laboratories in both respects. We believe that these differences in performance reflect each laboratory's level of preparedness to perform the method. The difference between the results of the two groups of laboratories is an important outcome of this investigation, with implications for the routine use of the method as a tool for screening sediments for PBTs. This finding points to the importance of selecting laboratories with relevant experience, adequate equipment and instrumentation, and appropriately trained personnel to perform the procedure.

The method was designed to characterize the PBT nature of sediment samples. Although the accuracy of the results could not be verified, prepared laboratories provided similar lists of TIDs for each of the three sediment samples. The overall interlaboratory agreement of 80% in reporting the presence or absence of the reported chemical classes suggests that the method was reasonably reproducible when performed by properly prepared laboratories.
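One way to express presence/absence agreement is as a pairwise percent agreement. For every pair of laboratories, count the chemical classes on which both laboratories report the same outcome, whether both report the class as present or both report it as absent. Then average these percentages across all pairs. This section does not give the exact formula behind the 80% figure, so the sketch below is only an illustrative assumption. The function name, laboratory labels, and class lists are hypothetical, not data from the study.

```python
from itertools import combinations


def pairwise_agreement(reports: dict[str, set[str]], classes: list[str]) -> float:
    """Mean percent agreement on presence/absence over all laboratory pairs.

    Hypothetical sketch: `reports` maps a laboratory label to the set of
    chemical classes it reported as present; any class in `classes` that is
    missing from that set is treated as reported absent.
    """
    if len(reports) < 2:
        raise ValueError("need at least two laboratories to compare")
    pair_scores = []
    for lab_a, lab_b in combinations(sorted(reports), 2):
        # A class "agrees" when both labs call it present or both call it absent.
        matches = sum((c in reports[lab_a]) == (c in reports[lab_b]) for c in classes)
        pair_scores.append(100.0 * matches / len(classes))
    return sum(pair_scores) / len(pair_scores)


# Hypothetical example (not study data): three labs, four chemical classes.
classes = ["PCBs", "PAHs", "chlorinated pesticides", "dioxins/furans"]
reports = {
    "Lab A": {"PCBs", "PAHs"},
    "Lab B": {"PCBs", "PAHs", "dioxins/furans"},
    "Lab C": {"PCBs"},
}
result = pairwise_agreement(reports, classes)
print(round(result, 1))  # pairs agree 75%, 75%, 50% -> 66.7
```

Averaging over pairs weights every laboratory equally. Other definitions are also possible, such as agreement with a consensus or majority call, and they would give different numbers from the same reports.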

Some important conclusions can be drawn from our experience about designing studies for the evaluation of inter- and intralaboratory variability associated with screening methods. First, samples of known composition, both qualitatively and quantitatively, should be used. If the sample compositions are known, conventional agreement statistics can be applied, making the results comparable with those of interlaboratory studies for other methods and allowing questions about false positive and false negative results to be addressed. Second, as noted above, the use of experienced and prepared laboratories is essential. Third, complete, unsuppressed library search results should be used to assess the inter- and intralaboratory variability. Fourth, because of the difficulty of matching and aligning reported chromatographic components and their lists of TIDs among different laboratories, the use of identical chromatography conditions among all laboratories is strongly recommended. With these improvements, results from investigations of inter- and intralaboratory variability would be much more conclusive in defining the robustness of PBT screening methodologies.
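When a sample's composition is known, a laboratory's tentative identifications can be scored directly against the true list. Each correctly identified chemical is a true positive, each chemical that is present but missed is a false negative, and each identification that is not in the sample is a false positive. The sketch below assumes the known list is complete, meaning the matrix contains nothing beyond what was added. That assumption holds for a well-characterized reference sample but not for a field sediment with native contaminants. The function name and the `spike_*` labels are hypothetical placeholders, not the chemicals used in this study.

```python
def score_against_known(reported: set[str], known_present: set[str]) -> dict[str, float]:
    """Score one laboratory's tentative IDs against a sample of known composition.

    Hypothetical sketch: assumes `known_present` lists every chemical in the
    sample, so anything reported outside it is counted as a false positive.
    """
    true_pos = reported & known_present   # present and correctly identified
    false_neg = known_present - reported  # present but not reported
    false_pos = reported - known_present  # reported but not in the sample
    return {
        "true_positives": len(true_pos),
        "false_negatives": len(false_neg),
        "false_positives": len(false_pos),
        # Fraction of the known chemicals that the laboratory detected.
        "detection_rate": len(true_pos) / len(known_present) if known_present else float("nan"),
        # Fraction of the laboratory's reports that were not in the sample.
        "false_positive_fraction": len(false_pos) / len(reported) if reported else 0.0,
    }


# Hypothetical example: eight known chemicals, one lab's five tentative IDs.
known = {f"spike_{i}" for i in range(1, 9)}
reported = {"spike_1", "spike_2", "spike_3", "spike_5", "unknown_X"}
scores = score_against_known(reported, known)
print(scores)
```

Set operations keep the comparison independent of reporting order. In practice, the harder step is deciding when two laboratories' TIDs refer to the same compound, which is why identical chromatography conditions are recommended.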

In summary, the findings of this study suggest that similar results for PBTs are generated when the sediment screening method is used by multiple laboratories that are sufficiently prepared to perform the method. As with any analytical methodology, it is important for investigators to clearly understand the scope and limitations of the data generated by the method. We anticipate that, with improvements to the procedure and with gains in experience among laboratories in performing this type of analysis, the challenges associated with performing the method will be reduced and the ability of laboratories using the method to tentatively identify PBTs in environmental samples of varying complexity will increase.

Acknowledgements

We thank William Morrow and John Miller, U.S. EPA Office of Water and the U.S. EPA Office of Research and Development. We thank the laboratories that volunteered to participate in the study, those being ABC Laboratories, A.D. Little, AScI, Eastman Kodak, Louisiana State University, National Council of the Paper Industry for Air and Stream Improvement, Occidental Chemical, Research Triangle Institute, Shell Development, and the U.S. EPA National Health and Environmental Effects Research Laboratory Mid-Continent Ecology Division. We also thank John Bachman.
