Metagenomic insights into microbial metabolism affecting arsenic dispersion in Mediterranean marine sediments
Abstract
Microorganisms dwelling in sediments have a crucial role in biogeochemical cycles and are expected to have a strong influence on the cycle of arsenic, a metalloid responsible for severe water pollution and presenting major health risks for human populations. We present here a metagenomic study of the sediment from two harbours on the Mediterranean French coast, l'Estaque and St Mandrier. The first site is highly polluted with arsenic and heavy metals, while the arsenic concentration in the second site is below toxicity levels. The goal of this study was to elucidate the potential impact of the microbial community on the chemical parameters observed in complementary geochemical studies performed on the same sites. The metagenomic sequences, along with those from four publicly available metagenomes used as control data sets, were analysed with the RAMMCAP workflow. The resulting functional profiles were compared to determine the over-represented Gene Ontology categories in the metagenomes of interest. Categories related to arsenic resistance and dissimilatory sulphate reduction were over-represented in l'Estaque. More importantly, despite very similar profiles, the identification of specific sequence markers for sulphate-reducing bacteria and sulphur-oxidizing bacteria showed that sulphate reduction was significantly more associated with l'Estaque than with St Mandrier. We propose that biotic sulphate reduction, arsenate reduction and fermentation may together explain the higher mobility of arsenic observed in l'Estaque in previous physico-chemical studies of this site. This study also demonstrates that it is possible to draw sound conclusions from comparing complex and similar unassembled metagenomes at the functional level, even with very low sequence coverage.
Introduction
Arsenic water contamination originating from industrial or mining activities represents a major health threat for local human populations (Chowdhury 2004). Because of leaching and contaminated water release, marine harbour sediments are major sinks for many pollutants, including arsenic and metals. The microorganisms that dwell in sediments have a crucial role in biogeochemical cycles and, as a consequence, are expected to have a strong influence on the arsenic cycle (Oremland & Stolz 2005; Oremland et al. 2005).
High-throughput metagenomic sequencing has been widely used to provide a comprehensive view of the ecology of microbial communities and new insights into microbial life (Venter et al. 2004; Biddle et al. 2008, 2011; Dinsdale et al. 2008; Hemme et al. 2010). Moreover, it has highlighted habitat-specific functions and profiles by comparing data sets from various environments (Tringe et al. 2005; Inskeep et al. 2010) and has helped to understand how bacteria cooperate to thrive in an arsenic-rich environment (Bertin et al. 2011).
We report here a comparative metagenomic study of two arsenic-polluted marine harbour sediments on the French Mediterranean coast: l'Estaque, highly contaminated by arsenic (653 μg/L in sediment interstitial water) and St Mandrier, with an arsenic concentration of 6 μg/L in interstitial water, below the 10 μg/L concentration limit recommended for drinking water by the World Health Organisation. Both sites were also contaminated by other pollutants such as heavy metals and organic contaminants. The sampling campaign for this study was performed at the same time and place as the geochemical characterization of these sites by Mamindy-Pajany et al. (2013). Those authors showed that arsenic was nine times more mobile in l'Estaque than in St Mandrier and detected both sulphide and thioarsenical complexes in l'Estaque pore water. However, the biotic factors involved in the production of the sulphide that may react with arsenic to form thioarsenical complexes were not characterized. Microorganisms are indeed known to play an important role in the sulphur cycle in sediments. For instance, sulphate-reducing bacteria (SRBs) such as δ-Proteobacteria and Firmicutes are able to use sulphate as a terminal electron acceptor to degrade organic matter, thus producing hydrogen sulphide (Muyzer & Stams 2008). A key enzyme in SRBs is the dissimilatory sulphite reductase (Dsr) that catalyses sulphite reduction into sulphide. Loy et al. (2009) identified a reverse Dsr (r-Dsr) that catalyses the reverse reaction in sulphur-oxidizing bacteria (SOBs) such as sulphur phototrophic bacteria. The reverse and the regular Dsr are homologous, but their sequences can be phylogenetically separated (Loy et al. 2009).
The aim of our work, complementary to the geochemical studies by Mamindy-Pajany et al. (2013), was to identify the biotic factors at work in both sites by metagenomic and bioinformatic approaches in order to point out the potential adaptive strategies and metabolic capacities of those communities in response to different levels of arsenic pollution and their impact on arsenic and sulphur cycles.
Material and methods
Sites description
Two sites were selected on the South coast of France: l'Estaque marina, near Marseille, polluted with high arsenic concentrations (653 μg/L in sediment interstitial water, 194 mg/kg in solid phase) due to former metallurgic and industrial activities and St Mandrier, a marina near Toulon where sediments presented lower concentrations of arsenic (6 μg/L in interstitial water, 10 mg/kg in solid phase; Mamindy-Pajany et al. 2013). Both sites were also contaminated by heavy metals (Cu, Ni, Pb, Zn), mineral oils, polycyclic aromatic hydrocarbons and polychlorinated biphenyls (Mamindy-Pajany et al. 2013).
The pH values were 7.5 and 7.7 for l'Estaque and St Mandrier samples, respectively. Sulphate concentrations were similar on both sites (2890 mg/L in l'Estaque, 3070 mg/L in St Mandrier), but sulphate reduction products (sulphide and thiosulphate) were only detected in l'Estaque where thioarsenical complexes were also detected in interstitial water (Mamindy-Pajany et al. 2013).
Sampling campaign and DNA extraction
The top surface sediments (0–10 cm) were collected in March 2009 during the same sampling campaign as Mamindy-Pajany et al. (2013). Samples from l'Estaque were retrieved at a depth of 2.5 m (43° 21,657N 5° 18,665E) and samples from St Mandrier at a depth of 4 m (43° 04,737N 5° 55,447E). The sediments were recovered in 250-mL sterile containers, immediately frozen in liquid nitrogen, kept on dry ice during transportation and stored at −80 °C.
For each site, sediments from three independent samples were used for DNA extraction with Mo-Bio PowerMax Soil DNA extraction kit (Mo-Bio). The extracted genomic DNAs (gDNAs) were pooled together.
Taxonomic DNA microarray analysis
Oligonucleotide probes (25-mer) were designed with the phylarray software (Militon et al. 2007) and spotted on a phylogenetic DNA microarray (14 399 probes targeting 1938 prokaryotic genera). Microarrays (44K arrays) were manufactured by Agilent Technologies®. Triplicate oligonucleotides were synthesized in situ, and probes were randomly distributed across the array in order to minimize spatial effect. Random probes were deposited on the microarray to provide a metric for technical background noise and as control probes.
Extracted gDNA was labelled with the Bioprime® labelling kit according to the manufacturer's recommendations (Invitrogen). Labelling was quantified by spectrophotometry (at 260 nm) and dye incorporation with a Nanodrop ND-1000 (NanoDrop Technologies). DNA targets were controlled by electrophoresis on a 1% agarose gel. Hybridization reactions were performed with 1 μg of gDNA in Agilent hybridization chambers at 65 °C for 17 h. Slides were washed according to manufacturer's instructions, dried and scanned at a resolution of 10 μm using a DNA microarray scanner (Agilent Technologies®) at a photomultiplier sensitivity setting of 100%. The image data were used for quantification by image analysis using genepix pro6® software (Axon Instruments). To quantify fluorescence of the features, local median background value was subtracted from the foreground intensity.
454/Roche sequencing
Purified gDNA was quantified using Quant-iT PicoGreen kit (Invitrogen). After gDNA nebulization, libraries were prepared with the GS DNA Library Preparation Kit according to manufacturer's recommendations (454 Life Sciences; Roche Diagnostics) using 5 μg of each gDNA sample. Sheared DNA was ligated to the linker for emPCR and sequencing. Final libraries were quantified with the SlingShot™ kit using the Fluidigm® Digital Array for sample quantification following manufacturer's instructions. Pyrosequencing using the 454/Roche GS FLX Titanium chemistry was carried out with the GS Titanium LV emPCR kit and sequencing kit according to manufacturer's instructions. Each sample was sequenced on one region of a two-region Picotiterplate.
Raw data were deposited into the Sequence Read Archive (SRA) with the study Accession no ERP001568.
Bioinformatics analyses of 454 sequence reads
Metagenomic sequence data quality filtering
The raw metagenomic sequences (553 850 reads from l'Estaque, 540 574 from St Mandrier) were filtered with the Pyrocleaner program (Mariette et al. 2011) to remove low complexity, low quality sequences and duplicates. Reads were removed when none of the 100-base-long sliding windows had a complexity/length ratio above 40. Reads containing more than 4% of undetermined bases were considered of poor quality and discarded. Duplicates were filtered out with the nonaggressive default option (length difference lower than 70). Reads whose length differed from the average read length by more than 2 standard deviations were discarded.
This filtering removed 47 777 reads from the l'Estaque set and 47 840 reads from the St Mandrier set.
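The two simplest filtering rules described above (fraction of undetermined bases and read-length outliers) can be illustrated with a minimal sketch. This is not the Pyrocleaner implementation: the function name `filter_reads` and its parameters are hypothetical, the length statistics are assumed to be computed on the unfiltered set, and the complexity and duplicate filters are left out.

```python
from statistics import mean, pstdev

def filter_reads(reads, max_n_fraction=0.04, n_sd=2.0):
    """Keep reads that pass the N-content and length-outlier rules.

    Illustrative only: thresholds follow the text (4% undetermined
    bases, 2 standard deviations around the mean read length).
    """
    reads = [r for r in reads if r]          # ignore empty reads
    lengths = [len(r) for r in reads]
    avg, sd = mean(lengths), pstdev(lengths)
    kept = []
    for read in reads:
        n_fraction = read.upper().count("N") / len(read)
        if n_fraction > max_n_fraction:
            continue                         # too many undetermined bases
        if abs(len(read) - avg) > n_sd * sd:
            continue                         # read length is an outlier
        kept.append(read)
    return kept
```

Whether the real program computes the length distribution before or after the other filters is not stated in the text; the sketch simply fixes one order.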
Reference metagenomes
Although arsenic concentrations in l'Estaque and St Mandrier differed by two orders of magnitude, the two sites nonetheless presented very similar profiles for other pollutants. It was thus expected that the responses of microbial communities to those contaminants might interfere with and partly mask differences due to the response to arsenic. Furthermore, our main goal was to build a model of the function of the two communities, highlighting not only their differences but also any process involved in the response to contaminants that they may share. A direct comparison of the two metagenomes, pointing out only their differences, would yield trivial results and a partial picture of the whole model. A background was therefore needed that would both contrast the two metagenomes and provide a means to separate the common processes relevant to the specific conditions on the studied sites from the noninformative ones that are ubiquitous and are not affected by those conditions. We therefore selected four publicly available metagenomes obtained from marine environments that did not present any significant contamination by arsenic or other pollutants and were sequenced by 454/Roche: two samples from a Norwegian fjord water (Accession nos SRS000294, SRS000296; Gilbert et al. 2008) and two from the Peru Margin subseafloor sediment (Accession nos SRR001322, SRR001324; Biddle et al. 2008). These data were collected from the CAMERA portal (Sun et al. 2010) data repository.
Sequence annotation
Reads from all six metagenomes were submitted for annotation to the Metagenomic Data Annotation and Clustering workflow RAMMCAP (Li 2009) available on the CAMERA portal. Ribosomal RNA sequence prediction was performed using the Hidden Markov Model strategy and potential open-reading frames (ORF) were predicted with Metagene (Noguchi et al. 2006). ORFs were subsequently matched against TIGRFAM (Selengut et al. 2007) and Pfam (Punta et al. 2011) models with HMMER (Eddy 2011). All program options were set to their default values as defined by the RAMMCAP workflow. Gene Ontology (GO) annotations (Ashburner et al. 2000) derived from TIGRFAM and Pfam hits were assigned to matching ORFs. The mapping between GO entries and reads was propagated from low-level entries to their parents in order to take into account the hierarchical nature of the GO graph and homogenize GO annotation levels between all TIGRFAM and Pfam models. Thus, we could ensure that higher-level GO entries would be counted even if they were not explicitly mapped to a TIGRFAM or Pfam entry.
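The propagation of GO annotations from low-level entries to their parents can be sketched as a walk up the ontology graph. The example below is a minimal illustration under stated assumptions: the function names are hypothetical, the `parents` mapping stands in for the GO graph, and the term identifiers used in any call are toy labels rather than real GO entries.

```python
from collections import Counter

def propagate_go(read_terms, parents):
    """Extend each read's GO terms with all of their ancestors.

    read_terms: dict mapping a read id to a set of directly assigned terms.
    parents:    dict mapping a term to its direct parent terms (toy GO graph).
    """
    def add_ancestors(term, seen):
        for parent in parents.get(term, ()):
            if parent not in seen:
                seen.add(parent)
                add_ancestors(parent, seen)

    propagated = {}
    for read, terms in read_terms.items():
        full = set(terms)
        for term in terms:
            add_ancestors(term, full)
        propagated[read] = full
    return propagated

def count_reads_per_term(propagated):
    """Count, for every term, the number of reads annotated with it."""
    counts = Counter()
    for terms in propagated.values():
        counts.update(terms)
    return counts
```

Because each read carries a set of terms, a read reaching the same parent through several children is counted once for that parent.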
In order to search for genes of interest for which no specific TIGRFAM or Pfam model was available, the relative counts in the metagenomes were estimated by searching reads translated in all six frames with the blastp program using an appropriate reference sequence as a query, default parameters and a very stringent expect value threshold of 10⁻¹⁰ in order to favour specificity over sensitivity.
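Translating a read in all six frames, the step that precedes the blastp search, can be sketched as follows. This is a minimal illustration and not the tool used in the study: the function names are hypothetical, the standard genetic code is assumed, and codons containing undetermined bases are rendered as `X`.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered T, C, A, G at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frames(read):
    """Return the three forward and three reverse-complement translations."""
    read = read.upper()
    reverse = read.translate(COMPLEMENT)[::-1]
    return [
        translate(strand[offset:])
        for strand in (read, reverse)
        for offset in range(3)
    ]
```

The resulting peptides would then be written to a protein database and searched with the reference sequence as a query.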
Over-representation analysis of GO entries
For each GO entry from the Biological Process hierarchy, we counted the number of associated reads in each metagenome sequence set. Following the guidelines in Beszteri et al. (2010), the raw counts were normalized by the relative average genome size across metagenomes computed according to Raes et al. (2007) (Appendix S1, Supporting information). The significance of the over-representation for each GO entry in each site was assessed using the Fisher's exact test (Fisher 1970), and the P-values were corrected for multiple testing using the Benjamini–Hochberg false discovery rate method implemented in the R package fdrtool (Strimmer 2008). For improved readability, a specificity score was computed for each GO category i in every site j as $S_{i,j} = -\log_{10}(\mathrm{fdr}_{i,j})$, where $\mathrm{fdr}_{i,j}$ is the corresponding false discovery rate value. Categories having a specificity score greater than or equal to 2 (fdr ≤ 1%) for St Mandrier or l'Estaque were selected for further examination. In the text below, SE denotes the specificity scores for l'Estaque and SS the specificity scores for St Mandrier.
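The correction and scoring steps can be illustrated with a minimal sketch, assuming a plain Benjamini–Hochberg step-up adjustment. It is not the code of the R package used in the study, and the function names are hypothetical.

```python
import math

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def specificity_score(fdr):
    """Specificity score S = -log10(fdr); infinite when fdr is zero."""
    if fdr == 0:
        return math.inf
    return -math.log10(fdr)
```

With this definition, a category passes the selection threshold when `specificity_score(fdr) >= 2`, which is the same condition as fdr ≤ 1%.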
Discrimination between DsrA and reverse DsrA sequences
The subunits A of dissimilatory sulphite reductase (Dsr) enzymes and their reverse counterparts can be phylogenetically separated by their sequences and may thus provide good markers of the presence of SRBs or SOBs in the microbial population (Loy et al. 2009). A set of 20 regular and reverse Dsr subunit A protein sequences representing the major taxa of SRBs (δ-Proteobacteria and Firmicutes) and SOBs (Chromatiales, Chlorobiales and β-Proteobacteria) were selected as reference sequences from the UniprotKB database (The UniProt Consortium 2011) and compared with each other with the blastp program (Altschul et al. 1997). Reference sequences were grouped in regular (SRBs) and reverse (SOBs), and for each reference sequence, we computed the mean of its percent similarities with every sequence from the SRB group and from the SOB group, respectively. We thus obtained two values for each of the 20 reference sequences that were arranged in two arrays of length 20. These arrays represented the expected similarity to each of the 20 reference sequences of an average SRB DsrA sequence (first array) and of an average SOB r-DsrA sequence (second array) (Table 1). Because those arrays distinctly show opposite profiles, we assumed that the array representing the similarities of an unknown DsrA sequence relative to the 20 references would present a positive correlation with the profile of the group (SRBs or SOBs) it belongs to and a negative or lower correlation with the other one.
| Group | Taxon | Species | Database reference | SRB profile | SOB profile |
|---|---|---|---|---|---|
| SRB | δ-Proteobacteria | Desulfatibacillum alkenivorans | B8FME3_DESAA | **0.801** | 0.593 |
| SRB | δ-Proteobacteria | Desulfobacula toluolica | CCK78725.1 | **0.787** | 0.585 |
| SRB | δ-Proteobacteria | Desulfococcus oleovorans | AAQ05957.1 | **0.684** | 0.475 |
| SRB | δ-Proteobacteria | Desulfovibrio desulfuricans | Q9AIH7_DESDE | **0.676** | 0.485 |
| SRB | Firmicutes | Desulfitibacter alkalitolerans | AAU95491.1 | **0.673** | 0.520 |
| SRB | Firmicutes | Desulfitobacterium hafniense | YP_516542.1 | **0.744** | 0.595 |
| SRB | Firmicutes | Desulfosporosinus sp. | ZP_08812562.1 | **0.742** | 0.594 |
| SRB | Firmicutes | Desulfotomaculum reducens | YP_001114514.1 | **0.742** | 0.593 |
| SOB | β-Proteobacteria | Burkholderiales | H5WQR1_9BURK | 0.594 | **0.780** |
| SOB | β-Proteobacteria | Sideroxydans lithotrophicus | D5CSI3_SIDLE | 0.589 | **0.772** |
| SOB | β-Proteobacteria | Thiobacillus denitrificans | Q3SJA5_THIDA | 0.601 | **0.775** |
| SOB | β-Proteobacteria | Thiobacillus thioparus | ABX82443.1 | 0.520 | **0.675** |
| SOB | Chromatiales | Alkalilimnicola ehrlichii | YP_742489.1 | 0.583 | **0.781** |
| SOB | Chromatiales | Allochromatium vinosum | ADC62190.1 | 0.581 | **0.755** |
| SOB | Chromatiales | Halorhodospira halophila | YP_001003517.1 | 0.577 | **0.762** |
| SOB | Chromatiales | Thioalkalivibrio sp. | B8GUE5_THISH | 0.598 | **0.764** |
| SOB | Chlorobiales | Chlorobium limicola | B3EHJ7_CHLL2 | 0.571 | **0.741** |
| SOB | Chlorobiales | Chlorobium phaeobacteroides | B3ELT3_CHLPB | 0.582 | **0.744** |
| SOB | Chlorobiales | Pelodictyon luteolum | Q3B6V5_PELLD | 0.574 | **0.748** |
| SOB | Chlorobiales | Prosthecochloris aestuarii | B4S963_PROA2 | 0.578 | **0.755** |
- The SRB and SOB profiles give for each reference sequence the average similarity it has with all sequences from the SRB group and from the SOB group, respectively. Those profiles show distinctive and opposite trends: for every sequence in SRB and SOB groups, the average similarity is higher in the corresponding profile (bold values) than in the other one. It is thus expected that the similarity values between an unknown DsrA sequence and every reference sequence would correlate positively with the profile of the group it belongs to (SRB or SOB) and negatively with the other.
To validate this assumption, the method was assessed on three test sets of 10 000 fragments each (DsrA, r-DsrA and non-DsrA 4Fe-4S proteins). We extracted from the UniprotKB database 131 DsrA sequences of SRBs (109 δ-Proteobacteria and 22 Firmicutes), 21 r-DsrA sequences of SOBs (7 β-Proteobacteria, 8 Chromatiales and 6 Chlorobiales) and 893 proteins containing a 4Fe-4S domain but not annotated as sulphite reductases, the latter serving as a negative control set. Environmental sequences were excluded from those sets as their taxonomic origin could not be trusted. Each test set was obtained by randomly sampling fragments of 100 to 215 amino acids from the full sequences, reproducing the actual size range of metagenomic reads. In order to introduce noise that would mimic lower similarities due to sequencing errors, the SRB and SOB sampled fragments were randomly modified by the msbar program from the emboss package (Rice et al. 2000): 10 modifications (replacement, deletion or insertion) were applied to each fragment. Negative control 4Fe-4S protein fragments were kept unmodified so as not to reduce their similarity with sulphite reductases and thus to test the worst possible scenario for false positives.
The reference DsrA sequences were used as queries in blastp searches against each set of sampled fragments without any limitation in the number of returned hits (blastp parameter -max_target_seqs 10000). A total of 9976 SRB fragments, 9434 SOB fragments and 504 4Fe-4S protein fragments reached an expect value of <10⁻⁵ with at least one of the reference sequences. The fragments were assigned to the group (SRBs or SOBs) whose profile correlated best with the similarity values between the fragment and reference sequences. The ROC curve showed that a Pearson correlation threshold of 0.75 would be conservative: it would both indicate a good correlation and provide a specificity of 100% while maintaining a reasonably high sensitivity level of 38.6% (Fig. S1, Supporting information).
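The construction of a noisy test fragment can be sketched in a few lines. This is a minimal illustration and does not reproduce msbar: the function names are hypothetical, the three modification types are assumed to be equally likely, and a replacement may occasionally draw the residue already present.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sample_fragment(sequence, rng, min_len=100, max_len=215):
    """Draw a random fragment of min_len to max_len residues."""
    if len(sequence) < min_len:
        raise ValueError("sequence shorter than the minimum fragment length")
    length = rng.randint(min_len, min(max_len, len(sequence)))
    start = rng.randint(0, len(sequence) - length)
    return sequence[start:start + length]

def mutate(fragment, rng, n_changes=10):
    """Apply n_changes random replacements, deletions or insertions."""
    residues = list(fragment)
    for _ in range(n_changes):
        kind = rng.choice(("replace", "delete", "insert"))
        pos = rng.randrange(len(residues))
        if kind == "replace":
            residues[pos] = rng.choice(AMINO_ACIDS)
        elif kind == "delete":
            del residues[pos]
        else:
            residues.insert(pos, rng.choice(AMINO_ACIDS))
    return "".join(residues)
```

Passing a seeded `random.Random` instance as `rng` makes the sampled test sets reproducible.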
Abundance of sulphate reducers and sulphur oxidizers
DsrA reference sequences were subsequently used as queries for blastp searches against translations in all six frames of reads from l'Estaque and St Mandrier. Hits involving translated reads longer than 100 amino acids and having a blastp expect value of <10⁻⁵ were selected as potential DsrA reads, and each was attributed an array of 20 values representing its similarities relative to each of the 20 reference sequences. The Pearson correlation coefficients were then computed between this array and the reference SRB and SOB profiles described above. A read was assigned to the group (SRB or SOB) yielding the higher of the two Pearson correlation coefficients. In order to avoid false positives, only Pearson correlation coefficients >0.75 were considered.
The independence between SRB read counts and sampling site was assessed with Fisher's exact test.
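The assignment rule can be sketched with the SRB and SOB profiles of Table 1. This is a minimal illustration, not the authors' code: the function names are hypothetical, and a read with constant similarity values is treated here as uncorrelated, which is an assumption of the sketch.

```python
import math

# Profiles from Table 1, in the order of the 20 reference sequences.
SRB_PROFILE = [0.801, 0.787, 0.684, 0.676, 0.673, 0.744, 0.742, 0.742,
               0.594, 0.589, 0.601, 0.520, 0.583, 0.581, 0.577, 0.598,
               0.571, 0.582, 0.574, 0.578]
SOB_PROFILE = [0.593, 0.585, 0.475, 0.485, 0.520, 0.595, 0.594, 0.593,
               0.780, 0.772, 0.775, 0.675, 0.781, 0.755, 0.762, 0.764,
               0.741, 0.744, 0.748, 0.755]

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def assign_group(similarities, threshold=0.75):
    """Return 'SRB', 'SOB' or None for a read's 20 similarity values."""
    r_srb = pearson(similarities, SRB_PROFILE)
    r_sob = pearson(similarities, SOB_PROFILE)
    group, best = ("SRB", r_srb) if r_srb >= r_sob else ("SOB", r_sob)
    return group if best > threshold else None
```

A read whose best correlation does not exceed the threshold is left unassigned, which is what keeps the specificity high at the cost of sensitivity.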
Taxonomic profile of 454 sequences
The 16S ribosomal RNA read sequences predicted by RAMMCAP were submitted to the Ribosomal Database Project (RDP) naïve Bayesian classifier (Wang et al. 2007) for Operational Taxonomic Unit (OTU) prediction at the default level of 80%.
The abundance-based coverage estimator (ACE; Chao et al. 2000) was computed using the R package vegan.
Results
Taxonomy and microbial diversity
Taxonomic DNA microarrays revealed a similar prokaryotic diversity between both environments with a total of 22 phyla and 30 classes detected. St Mandrier seemed slightly more diversified than l'Estaque with 122 genera against 109. The major phyla in l'Estaque and St Mandrier were the Proteobacteria followed by Actinobacteria, Firmicutes and Bacteroidetes (Table S1, Supporting information).
In parallel, the 454/Roche reads identified by RAMMCAP as partial 16S rDNA sequences were submitted to the RDP classifier for taxonomic assignment.
We first tested the method on data published earlier by Gilbert et al. (2008) in order to confirm that a relatively small number of 16S reads obtained from nonamplified environmental metagenomic sequencing may nonetheless allow the identification of the dominant taxonomic classes. Classes were thus predicted for 135 and 96 reads from the Norwegian fjord plankton bloom peak and bloom decline data, respectively, satisfactorily reproducing the results by Gilbert et al. (2008) with the dominant classes being α-Proteobacteria, γ-Proteobacteria and Flavobacteria (Bacteroidetes).
We therefore applied this strategy to the 16S reads from l'Estaque and St Mandrier. Classes were assigned to 80 and 95 reads from l'Estaque and St Mandrier, respectively. The Good's coverage (Good 1953) was 95.0% for l'Estaque and 94.7% for St Mandrier. The ACE (Chao et al. 2000) for classes gave an expected total number of 16 and 20 classes for l'Estaque and St Mandrier, respectively.
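Good's coverage is the complement of the fraction of observations that belong to categories seen exactly once. A minimal sketch, assuming the usual form of the estimator (1 minus singletons over total observations), is given below; the function name is hypothetical and the per-class read counts are not reported in the text.

```python
def goods_coverage(counts):
    """Good's coverage: 1 - (number of singleton categories / total count).

    counts: number of reads observed for each category (e.g. each class).
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations")
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / total
```

With this estimator, four singleton classes among 80 classified reads would give a coverage of 95.0%; this input is only an example and is not taken from the data.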
The most abundant classes predicted in the two Mediterranean harbours were the δ-Proteobacteria and γ-Proteobacteria (Fig. 1), the δ-Proteobacteria being mainly represented by Desulfobacterales. However, the proportion of Desulfobacterales was higher in l'Estaque (54.7% of the 16S reads taxonomically assigned at the order level) than in St Mandrier (31.7% of the 16S reads assigned at the order level).

Biological processes analysis
Attempts to assemble reads into contigs with the Newbler assembler yielded 18 174 contigs with an average length of 407 bases for l'Estaque and 17 109 contigs with an average length of 396 bases for St Mandrier. The largest contigs were 5389 bases long for l'Estaque and 3849 bases for St Mandrier. The reads that could be assembled into contigs represented only 7.9% of the total in l'Estaque and 7.6% in St Mandrier. These results suggest that both metagenomes presented a very high sequence diversity and that most of the data was actually represented by unique sequences. At the sequence level, coverage was therefore extremely low and would not have allowed the exhaustive identification of every single gene copy present in the metagenomes. However, when genes or gene families predicted for all reads were grouped together by GO categories, the coverage for those functional categories was high enough to allow the comparison of the metagenomes at the functional level. The rarefaction curves (Fig. S2, Supporting information) and the Good's coverage of 99.95% obtained for l'Estaque and St Mandrier confirmed that there was enough information in those metagenomes to draw significant conclusions from their comparison. The Norwegian fjord and Peru margin metagenomes presented similar coverage values (99.8%, 99.89%, 98.96% and 97.69%), confirming them as suitable control data sets.
The full list of GO biological process entries significantly over-represented in l'Estaque, in St Mandrier or in both sites compared with the background metagenomes is available in Table S2 (Supporting information). A selection of biological processes of interest discussed in the text is given in Table 2.
| Category | GO id | GO annotation | SE | SS |
|---|---|---|---|---|
| Energy production and metabolism | GO:0006113 | Fermentation | 0.41 | **7.72** |
| Energy production and metabolism | GO:0009061 | Anaerobic respiration | **14.7** | 0.43 |
| Energy production and metabolism | GO:0019420 | Dissimilatory sulfate reduction | **3.37** | 0.85 |
| Energy production and metabolism | GO:0015948 | Methanogenesis | **2.76** | 0.0 |
| Energy production and metabolism | GO:0019439 | Aromatic compound catabolic process | 0.0 | **2.96** |
| Photosynthesis | GO:0030494 | Bacteriochlorophyll biosynthetic process | **3.94** | **2.22** |
| Response to heavy metal contamination | GO:0006824 | Cobalt ion transport | **7.86** | **12.6** |
| Response to heavy metal contamination | GO:0006825 | Copper ion transport | **6.86** | **3.60** |
| Response to heavy metal contamination | GO:0015675 | Nickel ion transport | **4.21** | **13.5** |
| Response to heavy metal contamination | GO:0015691 | Cadmium ion transport | **7.86** | 1.88 |
| Response to arsenic contamination | GO:0006817 | Phosphate transport | **17.8** | **19.9** |
| Response to arsenic contamination | GO:0015700 | Arsenite transport | **5.65** | 0.0 |
| Response to arsenic contamination | GO:0046685 | Response to arsenic | **5.65** | 0.0 |
| Response to stress | GO:0042221 | Response to chemical stimulus | **5.65** | **14.8** |
| Response to stress | GO:0009636 | Response to toxin | 0.0 | **7.66** |
| Response to stress | GO:0042493 | Response to drug | 0.0 | **4.50** |
| Response to stress | GO:0010033 | Response to organic substance | 0.0 | **2.23** |
| Response to stress | GO:0010035 | Response to inorganic substance | **6.04** | 0.43 |
| Response to stress | GO:0010125 | Mycothiol biosynthetic process | 1.79 | **3.27** |
| Response to stress | GO:0010127 | Mycothiol-dependent detoxification | 0.0 | **3.27** |
| Response to stress | GO:0006508 | Proteolysis | **7.86** | **4.41** |
| Response to stress | GO:0006310 | DNA recombination | **4.34** | 0.0 |
| Response to stress | GO:0032196 | Transposition | **14.7** | **19.1** |
- GO, Gene Ontology. The last two columns contain the specificity score for l'Estaque (SE) and for St Mandrier (SS) computed as -log10(fdr) where fdr is the false-discovery rate value for the corresponding process in the site. A specificity score higher than 2 (i.e. fdr ≤ 1%) was considered as significant (value shown in bold face). The full list of over-represented processes is given in Table S2 (Supporting information).
The ‘response to arsenic’ GO category (GO:0046685) was over-represented in l'Estaque only (specificity scores SE = 5.65, SS = 0.0) along with ‘arsenite transport’ (GO:0015700; SE = 5.65, SS = 0.0). These processes encompass the glutaredoxin- and thioredoxin-dependent arsenate reductases, the ArsH arsenic resistance proteins as well as arsenite transporters and efflux pumps. However, no Pfam or TIGRFAM model was available for arsenate respiratory reductase subunits, which were consequently not counted in the ‘response to arsenic’ process. We therefore searched the reads translated in all six frames with the Shewanella sp. ANA-3 arsenate respiratory reductase catalytic subunit ArrA (UniprotKB Accession no Q7WTU0) using blastp. This reference sequence was chosen because the abundance of γ-Proteobacteria, the class to which Shewanella belongs, was equivalent in both metagenomes and was thus not expected to introduce any bias due to the metagenomes' taxonomic profiles. In l'Estaque, 59 matches were returned with an expect value lower than 10⁻¹⁰, and 32 were found in St Mandrier. A Fisher's exact test showed that the abundance of ArrA was significantly higher in l'Estaque than in St Mandrier (P-value = 5 × 10⁻³).
No specific model was available for arsenite methyltransferase in either the Pfam or the TIGRFAM database. A blastp search on the translated reads using Alcanivorax hongdengensis A-11-3 arsenite methyltransferase (UniProtKB Accession no L0WGF3) as a query allowed us to identify 19 reads similar to this sequence (expect value ≤ 10⁻¹⁰) in l'Estaque and 28 in St Mandrier. According to the Fisher's exact test, these counts were not significantly different (P-value = 0.09).
Arsenite oxidase TIGRFAM models were not annotated with any GO term and were therefore not counted in the ‘response to arsenic’ category either. However, arsenite oxidase read counts identified by the RAMMCAP workflow were significantly higher in l'Estaque, with a Bonferroni-corrected Fisher's exact test P-value of 3.31 × 10⁻⁹.
The ‘response to chemical stimulus’ processes encompassed not only the responses to arsenic and heavy metal contamination, but more generally responses to stress, sensing and signalling including chemotaxis (GO:0006935; SE = 5.34, SS = 10.27). However, those processes were not the same in both sites as can be seen in Table 2. For instance, the ‘response to inorganic substance’ (GO:0010035; SE = 6.04, SS = 0.43) was over-represented in l'Estaque, whereas in St Mandrier, the ‘response to organic substance’ (GO:0010033; SE = 0.0, SS = 2.23) was over-represented along with the ‘response to drug’ (GO:0042493; SE = 0.0, SS = 4.50) and the ‘response to toxin’ (GO:0009636; SE = 0.0, SS = 7.66).
Anaerobic energy production processes differed as well between the two sites. ‘Fermentation’ (GO:0006113; SE = 0.41, SS = 7.72) was over-represented in St Mandrier, whereas ‘anaerobic respiration’ (GO:0009061; SE = 14.7, SS = 0.43), ‘dissimilatory sulphate reduction’ (GO:0019420; SE = 3.37, SS = 0.85) and ‘methanogenesis’ (GO:0015948; SE = 2.75, SS = 0.0) were over-represented in l'Estaque (Table 2).
Sulphur cycle
The presence of ‘dissimilatory sulphate reduction’ in both sites is consistent with the dominance of Desulfobacterales that was observed in the taxonomic analysis of 16S reads from l'Estaque and St Mandrier. However, it was only over-represented in l'Estaque and, as mentioned above, Desulfobacterales represented 54.7% of the 16S rDNA reads assigned to an order by the RDP classifier for this site, while this order accounted for only 31.7% of the assigned 16S reads in St Mandrier. Furthermore, because of the strong similarity between the SRB dissimilatory sulphite reductase (Dsr) and its homologue catalysing the reverse reaction in SOBs, it was expected that a significant proportion of the reads accounting for the ‘dissimilatory sulphate reduction’ process were actually from the reverse enzyme catalysing sulphur oxidation instead of sulphite reduction. The presence on both sites of sulphur-oxidizing Chromatiales and/or Chlorobiales would be in agreement with the over-representation of ‘bacteriochlorophyll biosynthesis’ (GO:0030494; SE = 3.94, SS = 2.22).
In order to assess the relative importance of sulphate reduction and sulphur oxidation in l'Estaque and St Mandrier, reads translated in all six frames were searched with blastp for similarity with the 20 DsrA reference sequences shown in Table 1. We could thus identify 86 reads in l'Estaque and 58 reads in St Mandrier reaching a blastp expect value lower than 10⁻⁵ with at least one of the reference sequences. The similarity values of these reads with the 20 DsrA reference sequences lay in a range similar to that of the test sets, from 15.8% to 71.2% with a median of 48.3%.
In l'Estaque, 29 reads had similarity values with the reference sequences presenting a Pearson correlation coefficient >0.75 with the SRB profile and were therefore identified as such (δ-Proteobacteria or Firmicutes). A Pearson correlation coefficient >0.75 with the SOB profile was obtained for five reads that were consequently assigned to SOBs (β-Proteobacteria, Chlorobiales or Chromatiales). In St Mandrier, 13 reads were identified as SRBs and nine reads as SOBs. It is important to note here that the proportion of potential DsrA reads that were assigned to either SRB or SOB groups was 39.5% (34/86) for l'Estaque and 37.9% (22/58) for St Mandrier, two values consistent with the expected sensitivity value of 38.6% as determined on the ROC curve (Fig. S1, Supporting information).
The Fisher's exact test of independence indicated that the proportion of assigned SRB DsrA reads was significantly higher in l'Estaque than in St Mandrier at the significance level 5% (P-value = 0.0298), suggesting that sulphate reduction is of relatively higher importance in l'Estaque than in St Mandrier.
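The test can be reproduced from the read counts given above (29 SRB and 5 SOB reads in l'Estaque, 13 SRB and 9 SOB reads in St Mandrier). The sketch below uses only the Python standard library. The choice of a one-sided alternative (a higher SRB proportion in l'Estaque) is an assumption, made because it yields a value close to the reported P-value; the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    where X is the count in the top-left cell.
    """
    row1 = a + b            # reads assigned in the first site
    col1 = a + c            # reads of the first category in both sites
    total = a + b + c + d
    denominator = comb(total, row1)
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Rows: l'Estaque, St Mandrier; columns: SRB reads, SOB reads.
    p_value = fisher_exact_greater(29, 5, 13, 9)
    print(round(p_value, 4))  # close to the reported 0.0298
```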
Discussion
We presented here a comparative metagenomic study of two marine harbour sediments subject to strong anthropic influence and in particular to metal and arsenic contamination. The taxonomic diversity was slightly lower in l'Estaque than in St Mandrier, possibly because of a much higher concentration of arsenic (Mamindy-Pajany et al. 2013) imposing a stronger selection on the microorganisms. However, notwithstanding the presence of high quantities of toxic pollutants, the taxonomic diversity of l'Estaque was still rather high. This would suggest that in heterogeneous environments with a high availability of organic carbon in various forms such as harbour sediments, a very high arsenic concentration might have only a limited impact on microbial diversity that would be driven more by the heterogeneous character of the environment. This could account in those metagenomes for such features as a high number of regulatory and signalling genes (Table S2, Supporting information). Microorganisms in the studied metagenomes seemed to possess a large variety of potentials for sensing, transducing and responding to external stimuli that are expected to favour a rapid and effective phenotypic response to environmental changes or local niches. The maintenance of such a large diversity, both in terms of taxonomy and metabolism, could have been achieved thanks to the acquisition by microorganisms of arsenic resistance and stress-related genes.
Arsenic resistance and stress response
As arsenic is present in high concentrations in l'Estaque and below the toxicity level in St Mandrier (Mamindy-Pajany et al. 2013), it is not surprising to find that the processes involved in the major mechanisms by which microorganisms deal with arsenic (Slyemi & Bonnefoy 2011) were over-represented in l'Estaque. The ‘response to arsenic’ GO category (GO:0046685) corresponded to the glutathione- or thioredoxin-dependent reduction of arsenate and the subsequent extrusion of arsenite from the cell. It might be surprising at first sight that a response to arsenate contamination would be more represented in l'Estaque, whereas arsenate was the major form of arsenic in St Mandrier. One should keep in mind, though, that even if <25% of total arsenic in l'Estaque was in the form of arsenate, the concentration of arsenate in this site was five times higher than in St Mandrier (Mamindy-Pajany et al. 2013). As suggested by the potential presence of respiratory arsenate reductases in both sites, arsenate could also be reduced by microorganisms capable of anaerobic respiration of arsenate with electron donors provided by the fermentation process. As arsenate reduction processes were over-represented in l'Estaque, they might contribute to the predominance of arsenite in this site. Arsenite could be oxidized back into arsenate, thus completing the cycle. It may also be immobilized by precipitation or adsorption on the sediment particles (Dixit & Hering 2003) or form thioarsenical complexes with sulphides (Hoeft et al. 2004; Stauder et al. 2005; Kulp et al. 2006; Fisher et al. 2008; Cornelia & Britta 2012). Arsenite might also be methylated by microorganisms into monomethylarsonic acid (MMA), dimethylarsinic acid (DMAA) and eventually into dimethylarsine (DMA) and trimethylarsine (TMA).
Because the arsine compounds are volatile, arsenic methylation may contribute to the removal of arsenic from the immediate environment of microorganisms (Slyemi & Bonnefoy 2011; Kruger et al. 2013). Potential arsenite methyltransferases that may catalyse the methylation steps leading from arsenite to DMA and TMA were identified in l'Estaque and St Mandrier, although the possibility of false positives cannot be totally ruled out because of the very strong similarities between the methyltransferase domains of this protein family. It is therefore surprising that Mamindy-Pajany et al. (2013) did not detect any methylated arsenic, as arsenic methylation is a widespread biological process. One possible explanation would be that MMA and DMAA are transient intermediates and would not accumulate (Qin et al. 2006). Another explanation would be the occurrence of demethylation, as some microorganisms may regenerate arsenate while using methylated arsenic species as carbon and energy sources (Slyemi & Bonnefoy 2011; Kruger et al. 2013). Furthermore, it was observed in soil samples that MMA could be demethylated back to arsenite in two steps, each performed by a distinct microbial species (Yoshinaga et al. 2011). As the demethylation mechanisms are still unknown, the possibility of their occurring in the sediments could not be examined in this study. However, demethylation could also contribute to the predominance of the inorganic forms of arsenic (arsenate and arsenite) observed in l'Estaque and St Mandrier.
Due to its structural similarity with phosphate, arsenate, As(V), may enter the cell through the phosphate transport system. The over-representation on both sites of the ‘phosphate transport’ GO category (GO:0006817) might thus be related to arsenic resistance. The phosphate transport-related genes identified in both sites belonged to the phosphate-specific transporter family (Pst) whose preferential use over the low-affinity Pit mechanism may reduce the uptake of As(V) while maintaining a sufficient intracellular phosphate level (Cleiss-Arnold et al. 2010). The presence of As(V) in St Mandrier as a major form of arsenic (Mamindy-Pajany et al. 2013) could thus explain why this process was also over-represented in this site.
Arsenite—As(III)—toxicity is related to its strong affinity for sulphur groups in proteins that are thus prevented from adopting a functional fold. As a matter of fact, the ‘proteolysis’ category (GO:0006508) over-representation in l'Estaque and St Mandrier (SE = 7.86, SS = 4.41) was due to the presence in those metagenomes of proteases such as DegP, FtsH, Lon, subtilases and Clp that can all have a role in the response to stress conditions (heat shock, host invasion, toxin–antitoxin systems, etc.). Those proteases are involved in protein folding quality control (FtsH) or have a chaperone activity (Clp, Lon) related to their ability to degrade misfolded proteins produced under stress conditions (Kihara et al. 1999; Dalbey et al. 2011).
In l'Estaque, the over-representation of the ‘recombination’ process may indicate a need for DNA repair, as DNA may be damaged under the oxidative stress induced by exposure to arsenic and heavy metals. The ‘response to oxidative stress’ GO category (GO:0006979), however, was not significantly over-represented in l'Estaque. This may be due to the fact that this GO category represents a diverse process and is not linked to all TIGRFam entries that may be remotely related to oxidative stress. However, the ‘response to chemical stimulus’ GO category included, among others, genes encoding enzymes dealing with reactive oxygen species, such as catalases or peroxidases. In St Mandrier, other oxidative stress-related processes were also identified, such as the metabolism of mycothiol, a metabolite having a role in the resistance to reactive oxygen (Newton et al. 2008), associated there with formaldehyde detoxification in the ‘response to toxin’ GO category.
More generally, the ‘response to chemical stimulus’ was related in l'Estaque to inorganic substances (GO:0010035) and to the ‘response to arsenic’ (GO:0046685). In St Mandrier on the other hand, it was related to organic substances (GO:0010033), drug (GO:0042493) and toxin (GO:0009636). This may be due to a selective pressure that is mainly driven in l'Estaque by the extremely high concentration of inorganic arsenic, while in St Mandrier, the effects of organic contaminants (mineral oils, polycyclic aromatic hydrocarbons, polychlorinated biphenyls, etc.), and possibly some competition between organisms, would prevail.
Different genes having different roles may therefore account for the same observed GO processes. It is therefore of crucial importance to note at this point that although the over-represented biological processes were essentially the same on both l'Estaque and St Mandrier, a thorough examination of the results could nonetheless highlight subtle differences between both ecosystems.
Metabolic processes
Figure 2 shows the descriptive model of the biogeochemical and metabolic fluxes in each site deduced from the corresponding over-represented metabolism-related processes and from the chemical characterization of the sites by Mamindy-Pajany et al. (2013).

Although the ‘sulphate reduction’ process was present in both sites, it was over-represented in l'Estaque only. This was confirmed by both the predominance of Desulfobacterales and the significantly higher proportion of SRBs’ DsrA reads in l'Estaque relative to SOBs’ reverse DsrA reads. It should be noted at this point that, as reverse DsrA cannot be readily distinguished from regular DsrA reads by generic automated annotation protocols, specific biological knowledge had to be used here in the form of a specific classification algorithm that allowed us to confidently determine whether a given read corresponded to a reverse DsrA gene or not. This exemplifies a known and accepted limitation of high-throughput annotation procedures, and one should be aware that without any additional expert knowledge, some important conclusions could be overlooked or erroneous. For instance, this may have a strong implication for DsrA sequences from metagenomic samples in public databases, as a proportion of sequences annotated as ‘DsrA sequences from uncultured-sulphite-reducing bacteria’ might actually correspond to the reverse enzyme from SOB.
A microbial community activity more drawn towards sulphate reduction in l'Estaque would be consistent with the chemical characterization of the sites by Mamindy-Pajany et al. (2013) who detected sulphate reduction products in interstitial water in l'Estaque but not in St Mandrier. The free hydrogen sulphide produced by SRBs may have an important effect on arsenic biogeochemistry. As previously described for alkaline sulphidic lakes and hot springs, free sulphide can abiotically react with As(III) to form thioarsenical complexes (Hoeft et al. 2004; Stauder et al. 2005; Kulp et al. 2006; Fisher et al. 2008; Cornelia & Britta 2012). Although thioarsenical complexes are considered by some authors as less toxic forms of arsenic (Rader et al. 2004; Stauder et al. 2005), they actually present a toxicity ranging from the low toxicity of monothioarsenate to the acute toxicity of trithioarsenate (Planer-Friedrich et al. 2008). A potential antagonistic toxicological interaction was suggested though between trithioarsenate and arsenite, resulting in a decreased toxicity in synthetic solutions of arsenite and sulphide (Planer-Friedrich et al. 2008).
The formation of thioarsenical complexes is known to increase the solubility and mobility of arsenic in reducing environments (Couture & Van Cappellen 2011). As a matter of fact, thioarsenical complexes were observed in l'Estaque interstitial water, and arsenic mobility was shown to be nine times higher in l'Estaque than in St Mandrier (Mamindy-Pajany et al. 2013). The proportion of mobile arsenic in interstitial water relative to the solid phase is indeed higher in l'Estaque (653 μg/L in interstitial water for 194 mg/kg in solid phase) than in St Mandrier (6 μg/L in interstitial water for 10 mg/kg in solid phase). This difference could be caused by sulphate reduction performed by the microbial community that would indirectly promote the dispersal of arsenic in the form of thioarsenical complexes. Thus, in l'Estaque, the extremely high concentration of arsenic in interstitial water might actually be, at least in part, a consequence of the predominance of SRB.
Other metabolic processes were significantly over-represented too in the contaminated sites, strongly suggesting that, although central, sulphate reduction is probably not the only important process at work. First of all, the balance between sulphate reduction and sulphur oxidation may have an important role in the availability of hydrogen sulphide for thioarsenical complexes production. Furthermore, arsenate reduction, which accounts for the over-represented ‘response to arsenic’ process in l'Estaque, would produce arsenite that may react with hydrogen sulphide to generate thioarsenical complexes. The electron donors required for arsenate reduction and sulphate reduction by microorganisms could be provided by the fermentation process. Thus, fermentation, sulphate reduction and arsenate reduction would act together in l'Estaque to favour the dispersion of arsenic and possibly reduce its toxicity. At this stage, though, it is not clear whether this mechanism could represent a real advantage to the local microbial community by way of arsenic dilution in the environment as arsenic concentration in interstitial water is still very high. Furthermore, thioarsenical complexes are unstable in oxic conditions and would eventually revert in the water column back to arsenite or arsenate. In any case, the large diversity encountered in l'Estaque would nonetheless indicate that arsenic mobilization in this site does not seem to have any major negative impact on the microbial community as a whole.
On a more general and technical note, this study also demonstrates that complex and similar metagenomes can be compared even with a relatively low coverage preventing reconstruction of genomes by assembly. Provided that the sequenced reads are long enough to allow their annotation in reasonable proportions, it is expected that enough information can be retrieved to highlight the biological processes at work in the microbial communities. Because the vast majority of environmental microorganisms are still uncultivated and not characterized, focusing on the presence of biological functions instead of individual OTUs avoids the hazardous task of deducing function from predicted taxonomic classes. Much more interestingly, by viewing the community as a whole supraorganism, this function-centric approach provides useful insight into the global functionalities (Tringe et al. 2005), no matter whether these functionalities are gained through interaction between different species or through within-species phenotypic variability (Ackermann 2013). The models built from such analyses may thus provide useful guidelines and hypotheses for the design of laboratory or field experiments that, in turn, would verify and refine these models and hypotheses. On the other hand, in the absence of expression or proteomic data that could provide information on the actual activity of the community but might be difficult to obtain, such studies would be less applicable to environments where no significant selective pressure could be clearly identified.
Even more importantly, this study points out the limitations of automated metagenome analyses and highlights the crucial importance of biological knowledge integration and human expertise for more precise and in-depth analysis. Of course, automated procedures such as those offered by CAMERA (Sun et al. 2010) or MG-RAST (Meyer et al. 2008) work perfectly well for a first global overview of the functions represented in metagenomes, but they may not reveal subtle differences without further analysis. Indeed, human expertise and specifically tailored bioinformatic tools and methods, such as those developed in the present study for DsrA, are still absolutely necessary to address a particular biological question and complement the generic analysis procedures that lack the ability to integrate global biological knowledge and to adapt their use to the very specificities of the metagenomes under scrutiny.
Acknowledgements
This work was supported by the Agence Nationale de la Recherche (ANR-2008-CESA-003), the Centre National de la Recherche Scientifique (CNRS) in the frame of the ‘Groupement de Recherche—Métabolisme de l'Arsenic chez les Microorganismes (GDR2909-CNRS)’ (http://gdr2909.alsace.cnrs.fr/) and the Université de Strasbourg (UdS). The authors are deeply grateful to Dr Nicolas Marmier for useful discussion about arsenic methylation.
Author contributions
Bioinformatics and statistical analyses were performed by B.N. and F.P. The ecophysiological interpretations of results were realized by S.K., F.P. and P.B. Metagenomic DNA was extracted by S.K. and sequenced by O.B. Taxonomic DNA arrays were realized by E.D.B. and P.P. Expertise in biogeochemistry interpretation was provided by F.S. and F.B.B.
Data accessibility
Raw metagenomic sequences were deposited into the EMBL Sequence Read Archive (SRA) with the study Accession no ERP001568. The sample Accession nos are ERS153818 for l'Estaque (553 850 reads and 210 852 489 bp) and ERS153819 for St Mandrier (540 574 reads and 210 532 149 bp).
The full list of over-represented biological processes is available in Table S2 (Supporting information).