Critical Steps in Shotgun Metagenomics-Based Diagnosis of Bloodstream Infections Using Nanopore Sequencing
[Correction added on 26 February 2025, after first online publication: The email address of the corresponding author has been corrected.]
ABSTRACT
Shotgun metagenomics offers broad pathogen detection for the rapid diagnosis of bloodstream infections but struggles with the often low numbers of pathogens combined with high levels of human background DNA in clinical samples. This study aimed to develop a shotgun metagenomics protocol using blood spiked with various bacteria and to assess bacterial DNA extraction efficiency with human DNA depletion. The Blood Pathogen Kit (Molzym) was used to extract DNA from EDTA-whole blood (WB) and plasma samples, using contrived blood specimens spiked with bacteria for shotgun metagenomics diagnostics via Oxford Nanopore sequencing and PCR-based library preparation. Results showed that bacterial reads were higher in WB than in plasma. Differences for Staphylococcus aureus and Streptococcus pneumoniae were more pronounced than for Escherichia coli. Plasma samples exhibited better method reproducibility, with more consistent droplet digital PCR results for human DNA. The study found that extraction was more efficient for Gram-positive than for Gram-negative bacteria, suggesting that the human DNA depletion exerts a negative effect on Gram-negative bacteria. Overall, shotgun metagenomics needs further optimisation to improve bacterial DNA recovery and enhance pathogen detection sensitivity. This study highlights some critical steps in the methodology of shotgun metagenomics-based diagnosis of bloodstream infections using Nanopore sequencing.
1 Introduction
Sepsis, a life-threatening dysregulated host response to infections caused by viable microbes in the body, accounts for almost 20% of all global deaths [1]. Several factors contribute to the high mortality rate. The long turnaround time for blood culture (1–5 days), which is the gold standard diagnostic tool for bloodstream infection, could be a contributing factor in cases where empiric therapy does not provide appropriate antibiotic treatment [2]. Antimicrobial treatment prior to blood culture sampling [3], insufficient sampling and low inoculum can often result in delays or even failure (up to 50% negative) in detecting microbes involved in bloodstream infections [4]. The use of multiplexed PCR in this situation is restricted by the limited range of pathogens targeted by the panels [5]. Although targeted sequencing-based methods, such as 16S rRNA and 18S rRNA gene sequencing, widely used in clinical diagnostics, are not limited to pre-set panels, they identify pathogens at best to the species level, and often only to the genus level [6, 7]. Shotgun metagenomics has the potential to overcome some of the limitations of routinely used microbial diagnostic techniques. It can provide several levels of information, from the detection and characterisation of microorganisms and viruses to the detection of virulence factors and the identification of antimicrobial resistance markers, in a single assay without a priori knowledge [8]. As the abundance of the infection-causing microbes can be less than 0.25 CFU/mL [9], the recommended volume is 40–60 mL for blood culture diagnostics [10]. Molecular tests use only 1–10 mL of blood [11], which in this context might hamper detection, but in certain cases, for example, in paediatric patients, only one blood draw of 1–7.5 mL is feasible anyway [12].
To date, attempts to use shotgun metagenomics to diagnose bloodstream infections have been performed using cell-free DNA from plasma [13-17] and blood samples [18]. Gyarmati et al. used shotgun metagenomics to detect bacterial, fungal and viral pathogens along with antimicrobial resistance genes in blood samples of acute leukaemia patients with suspected bloodstream infections at different time points of their antimicrobial treatment [18]. Despite the promises of diagnostic value in the specific clinical situations mentioned above, the application of shotgun metagenomics in routine settings is still limited by its analytical sensitivity owing to the low abundance of bacteria during septicaemia and the large amount of host nucleic acid (NA) in blood samples [9]. High-quality extraction of DNA from the disease-causing bacteria in the clinical specimen is thus of utmost importance for bloodstream infection diagnostics, for which there is currently no consensus regarding the sample matrix to be used, that is whole blood (WB), plasma or both. In addition, short-read sequencing is often used as the standard diagnostic technology for sequencing, but it has a long turnaround time compared to sequencing with Oxford Nanopore Technologies (ONT) [19, 20]. The primary aim of this study was therefore to develop a shotgun metagenomics protocol using contrived blood specimens spiked with bacteria and to evaluate the bacterial DNA extraction with human DNA depletion. A secondary aim was to enable same-day diagnostics to have a turnaround time short enough to be meaningful in a clinical context.
2 Materials and Methods
2.1 Microbial Strains
Staphylococcus aureus (CCUG 15915), Escherichia coli (CCUG 17620) and Streptococcus pneumoniae (CCUG 33638) and a mock microbial community standard, 20 Strain Even Mix Whole Cell Material (ATCC MSA2002), were used for spiking contrived WB. The S. aureus, E. coli and S. pneumoniae were cultured overnight on blood agar plates (Columbia Blood Agar Base (Acumedia Neogen Corporation, Lansing, MI, USA) and 6% horse blood) at 36.5°C. Bacterial colonies were suspended in 0.85% NaCl solution to an adjusted turbidity of 0.5 McFarland, corresponding to 10⁸ colony-forming units (CFU)/mL bacteria. Freshly prepared bacterial suspensions were 10-fold serially diluted in 0.85% NaCl. From each dilution, 100 μL were plated onto blood agar plates (Acumedia Neogen Corporation) in triplicate and cultured overnight to confirm the CFU/mL of the prepared suspensions. The ATCC-MSA-2002 mock microbial community, consisting of 20 species (4 × 10⁷ cells/vial ± 1 log), was prepared in 1 mL 1X PBS (phosphate-buffered saline).
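The plate-count confirmation described above follows the usual back-calculation from colonies to CFU/mL. The following is a minimal sketch of that arithmetic, not the authors' procedure; the function name and the colony counts are hypothetical, and only the 100 μL plated volume and the 10-fold dilution scheme come from the text.

```python
def cfu_per_ml(colony_counts, dilution_factor, plated_volume_ml=0.1):
    """Estimate CFU/mL of the undiluted suspension from replicate plates.

    colony_counts    -- colonies counted on replicate plates of one dilution
    dilution_factor  -- total dilution of the plated suspension (e.g. 1e-5)
    plated_volume_ml -- volume spread per plate (100 uL = 0.1 mL in the text)
    """
    mean_count = sum(colony_counts) / len(colony_counts)
    return mean_count / (plated_volume_ml * dilution_factor)


# Hypothetical triplicate counts from the 10^-5 dilution (illustrative only)
estimate = cfu_per_ml([98, 105, 97], dilution_factor=1e-5)
print(f"{estimate:.2e} CFU/mL")
```

With these illustrative counts the estimate lands at about 10⁸ CFU/mL, the value expected for a 0.5 McFarland suspension according to the text.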
2.2 Spiking Into Whole Blood
Blood from healthy volunteers was collected in 6 mL EDTA tubes (Becton, Dickinson and Company, New Jersey, USA) and combined in falcon tubes for spiking within the same hour of the blood draw. WB was spiked with the freshly prepared bacterial suspensions mentioned above, corresponding to 10³–10⁵ CFU/mL in WB. For E. coli, dilutions 10⁴ and 10³ were used because in previous experiments (not presented here), we could not reach detectable bacterial gene copy numbers in S. aureus and S. pneumoniae spiked samples at 10³ CFU/mL. The ATCC-MSA-2002 cell mixture suspension was spiked into WB to theoretical concentrations of 4 × 10⁴, 4 × 10³ and 4 × 10² cells/mL (2000, 200 and 20 cells of each bacterium) in triplicate. Negative controls for each experiment were spiked with either 0.85% NaCl or 1X PBS, extracted simultaneously and further analysed together with the other samples.
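The per-species figures quoted for the mock community follow from dividing each total concentration evenly across the 20 strains. A minimal sketch of that arithmetic is shown here, assuming a perfectly even mix as stated by the supplier description in the text; the function and variable names are illustrative.

```python
SPECIES_IN_MIX = 20  # the mock community is described as an even mix of 20 strains


def cells_per_species(total_cells_per_ml, n_species=SPECIES_IN_MIX):
    """Theoretical cells/mL of each species in an even whole-cell mix."""
    return total_cells_per_ml / n_species


# The three theoretical spiking concentrations given in the text
per_species = {total: cells_per_species(total) for total in (4e4, 4e3, 4e2)}
for total, each in per_species.items():
    print(f"{total:.0e} cells/mL total -> {each:.0f} cells/mL per species")
```

The vial specification of ± 1 log quoted in Section 2.1 means these values are theoretical targets rather than measured concentrations.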
2.3 Preparation of Plasma Samples
One millilitre of plasma was obtained from 5 mL of spiked WB by centrifugation at 180 g (P-180 g) or 100 g (P-100 g) (Figure 1) for 10 min at room temperature. The bacteria were only spiked directly into WB to mimic the actual conditions of a clinical sample.

2.4 Genomic DNA Isolation
The DNA from 10 mL of spiked WB, containing freshly prepared bacterial suspension or mock microbial community, was extracted using the Blood Pathogen Kit combined with the add-on 10 complement (Molzym, GmbH Bremen, Germany). The DNA from 1 mL of plasma obtained from the same WB samples as mentioned above was extracted, also using the Blood Pathogen kit (Molzym), differing in the human depletion step (without the add-on 10 complement) in accordance with the manufacturer's instructions, followed by an automatic extraction (Arrow, Diasorin, Oslo, Norway). The extracted DNAs were eluted in 100 μL elution buffer of the Blood Pathogen Kit (Molzym) and stored at −80°C. In addition, the ATCC-MSA-2002 cell mixture suspensions in 1X PBS at three dilutions were also processed using the magLEAD 12gC NA extraction robot and the magDEA Dx SV kit (Precision System Science Co. Ltd., Japan) in accordance with the manufacturer's instructions with and without enzyme pre-treatment (mix of 20 μL lysozyme, 100 mg/mL, [L6876; Merck, Darmstadt, Germany] and 10 μL mutanolysin, 1 mg/mL, [M9901; Merck] added to 200 μL sample).
Extracted DNA was quantified using the Qubit dsDNA HS assay (Thermo Fisher Scientific, Massachusetts, USA), and the quality was assessed using a Nanodrop spectrophotometer (Thermo Fisher Scientific). Fragment size analysis was performed using the gDNA ScreenTape assay on an Agilent 4150 TapeStation (Agilent, California, United States of America) before library preparation.
2.5 Droplet Digital PCR (ddPCR)
The quantities of human and bacterial extracted DNA were analysed by ddPCR using the QX100 system (Bio-Rad Laboratories, California, United States of America) in accordance with the manufacturer's instructions. Separate ddPCR mixes, targeting the E. coli, S. aureus and S. pneumoniae-specific genes uidA, nuc and lytA, respectively, as well as 16S-rRNA genes, were prepared as described by Ziegler et al. [21]. In addition, the human gene RPP30 was targeted in each of the species-specific ddPCR mixes to evaluate the human depletion efficiency in each of the DNA extraction protocols [22]. The ATCC-MSA-2002 mock community spiked samples were also analysed by ddPCR targeting the 16S rRNA, uidA, nuc and RPP30 genes.
2.6 Oxford Nanopore Technologies Sequencing
ONT sequencing libraries were prepared using 1–5 ng DNA input, depending on the amount of the extracted DNA, with the Rapid PCR Barcoding kit, SQK-RPB004 (ONT, Oxford, United Kingdom), in accordance with the manufacturer's instructions with the following modifications: the number of PCR cycles was increased from 14 to 24 cycles, and the incubation step with the AMPure XP beads and the Tris–HCl buffer was increased to 10 and 5 min, respectively. DNA libraries were sequenced with a FLO-MIN106 R9.4 flowcell on a MinION device (ONT) operated for 24 h.
2.7 Data Analysis
The raw reads (Fast5) were base-called using Guppy (v3.6.0) in high accuracy mode and de-multiplexed and adapter trimmed using qcat v1.1.0 (https://github.com/nanoporetech/qcat). The quality of the de-multiplexed data was checked using NanoStat v1.2.0 (https://github.com/wdecoster/nanostat) [23]. The taxonomic classification of the sequencing reads was performed using Kraken2 v2.11–1 (https://github.com/DerrickWood/kraken2) [24] with the MiniKraken2_v2_8GB database.
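The proportions of reads classified as bacterial or human can be read from a Kraken2 report. The sketch below assumes the standard six-column, tab-separated report layout (percentage, clade read count, direct read count, rank code, NCBI taxonomy ID, name) and uses the NCBI taxonomy IDs 2 (Bacteria) and 9606 (Homo sapiens); the function name and the example report lines are hypothetical and do not reproduce the study's data or the authors' scripts.

```python
BACTERIA_TAXID = "2"    # NCBI taxonomy ID for Bacteria
HUMAN_TAXID = "9606"    # NCBI taxonomy ID for Homo sapiens


def read_fractions(report_lines):
    """Return the fraction of all reads in the Bacteria and human clades."""
    clade_counts = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # column 5 = taxonomy ID, column 2 = reads in the clade rooted there
        clade_counts[fields[4].strip()] = int(fields[1])
    # all reads = unclassified (taxid 0) + everything under root (taxid 1)
    total = clade_counts.get("0", 0) + clade_counts.get("1", 0)
    if total == 0:
        raise ValueError("report contains no reads")
    return {
        "bacteria": clade_counts.get(BACTERIA_TAXID, 0) / total,
        "human": clade_counts.get(HUMAN_TAXID, 0) / total,
    }


# Hypothetical report for 1,000 reads (illustrative only)
example_report = [
    "  1.00\t10\t10\tU\t0\tunclassified",
    " 99.00\t990\t0\tR\t1\troot",
    "  4.00\t40\t5\tD\t2\t    Bacteria",
    " 95.00\t950\t950\tS\t9606\t    Homo sapiens",
]
fractions = read_fractions(example_report)
print(fractions)
```

Counting from the clade column rather than the direct-assignment column includes reads classified at any rank below the domain or species of interest.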
2.8 Statistics
Coverage was calculated by mapping the reads to a reference genome in Kraken 2. Descriptive statistics (median, IQR) were used to facilitate the understanding of the data. No comparative statistics were calculated due to the low number of samples and the exploratory character of the study.
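As an illustration of the descriptive statistics mentioned above, the median and interquartile range of a small set of ddPCR values can be computed with the Python standard library. This is a sketch under stated assumptions: the values are hypothetical, and the quartile convention (the "inclusive" method of `statistics.quantiles`) is a choice made for this example, since the text does not state which convention was used.

```python
import statistics


def median_and_iqr(values):
    """Return (median, first quartile, third quartile) of a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3


# Hypothetical gene copies/uL from five replicates (illustrative only)
copies = [1.2, 1.5, 2.0, 2.1, 2.4]
med, q1, q3 = median_and_iqr(copies)
print(f"median {med} copies/uL; IQR {q1}-{q3}")
```

Different quartile conventions can give slightly different IQR bounds for small samples, so the convention should be reported alongside the values.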
2.9 Ethics Statement
The study received approval from the Regional Ethical Review Board Uppsala, Sweden, Sep. 2014 (dnr 2014/193) with amendment dnr 2022–03414-02 (Jul. 2022). Healthy donors provided informed verbal consent for participation, which was noted in the study protocol prior to sampling. No sensitive information regarding the donors was collected, and samples were anonymised after sampling.
3 Results
3.1 Samples Spiked With Single Bacteria
3.1.1 ddPCR Analysis
The DNA extraction efficiencies of the protocols for spiked WB and plasma samples obtained after centrifugation at 100 g (P-100 g) and 180 g (P-180 g) were evaluated by quantifying the specific genes of each spiked bacterial species and the human-specific gene using ddPCR. All ddPCR analyses were performed in duplicate, and differences in the number of gene copies were compared as means of duplicates. The bacterial gene copy appeared to be higher in spiked WB samples compared to plasma samples (centrifuged at P-100 g or P-180 g). This trend seemed more pronounced in samples spiked with S. aureus and S. pneumoniae (1.2–3.3 and 3.4–7.5-fold differences, respectively) than in those spiked with E. coli (1.1–1.2-fold difference; Figure 2).
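The fold differences quoted above compare means of duplicate ddPCR measurements between matrices. A minimal sketch of that comparison follows; the function name and the copy numbers are hypothetical and are not taken from Figure 2.

```python
def fold_difference(wb_duplicates, plasma_duplicates):
    """Ratio of mean gene copies in WB to mean gene copies in plasma."""
    wb_mean = sum(wb_duplicates) / len(wb_duplicates)
    plasma_mean = sum(plasma_duplicates) / len(plasma_duplicates)
    if plasma_mean == 0:
        raise ValueError("plasma mean is zero; fold difference is undefined")
    return wb_mean / plasma_mean


# Hypothetical duplicate measurements in copies/uL (illustrative only)
fold = fold_difference(wb_duplicates=(30.0, 36.0), plasma_duplicates=(10.0, 12.0))
print(f"{fold:.1f}-fold higher in WB")
```

A value above 1 indicates more bacterial gene copies recovered from WB than from plasma for the same spiked sample.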

The human gene RPP30 copy number was highest in P-100 g plasma samples and lowest in WB samples (the RPP30 gene was undetectable in S. aureus-spiked WB [Figure 2a]). In the WB sample spiked with 10⁵ CFU/mL S. pneumoniae (Figure 2b), there was an unexpectedly high RPP30 gene copy number compared to all the other samples, indicating a technical issue in the human depletion step in this sample. Overall, the P-100 g protocol yielded a higher RPP30 gene copy number and a lower copy number of bacterial genes than the WB samples. The negative controls showed very low levels of 16S DNA in the ddPCR and were thus not sequenced.
3.1.2 ONT Sequencing and Data Analysis
Next, pooled DNA of duplicates for each spiked sample was sequenced. The proportion of reads classified as bacteria and host (human) was compared between WB and the different plasma preparations (Figure 3). The proportion of reads classified as bacteria was highest in WB (Figure 3b,c), except for S. aureus, for which bacterial-classified reads were highest in the P-180 g plasma samples (Figure 3a). The lowest proportion of bacterial reads was detected in P-100 g samples. Moreover, more than 95% of the reads (min 95.2%–max 99.49%) were classified as human in all samples, with the highest in P-100 g samples (Table S2). The depth of genome sequencing for bacteria was low overall, with a maximum of 4.8x coverage in S. aureus P-180 g samples, which also had the highest proportion of bacterial reads (2.3%; Table S2).
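Sequencing depth of the kind reported above is commonly approximated as the total number of bases in reads assigned to a genome divided by the genome length. The sketch below illustrates that approximation only; it is not the authors' coverage calculation, and the read lengths and genome size are hypothetical.

```python
def mean_depth(mapped_read_lengths, genome_length_bp):
    """Average depth = total mapped bases divided by reference length."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return sum(mapped_read_lengths) / genome_length_bp


# Hypothetical: 2,000 mapped reads of 2,800 bp against a 2.8 Mb reference
depth = mean_depth([2800] * 2000, 2_800_000)
print(f"{depth:.1f}x")
```

This average says nothing about how evenly the reads are spread, so a low mean depth can still leave large parts of the genome uncovered.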

In general, the P-100 g protocol yielded the highest proportion and gene copy number of human DNA and the lowest proportion and gene copy number of bacterial DNA. Therefore, the P-100 g protocol was excluded from the ATCC-MSA-2002 mock community spiking experiments.
3.2 Polymicrobial (Mock Microbial Community) Spiked Samples
3.2.1 ddPCR Analysis
In the ATCC-MSA-2002 mock community spiked samples, gene copies/μL for nuc, uidA and 16S rRNA genes as determined by ddPCR were at least two times higher in WB samples than in plasma (P-180 g) at three different concentrations: 4 × 10², 4 × 10³ and 4 × 10⁴ cells/mL (Table S1 and Figure S1). The RPP30 gene copy number was lowest in 4 × 10⁴ cells/mL spiked WB but unexpectedly highest in 4 × 10³ and 4 × 10² cells/mL spiked WB. Although the RPP30 gene copy numbers among all samples were more consistent for P-180 g samples (n = 11; median 2.0 copies/μL; IQR 1.45–2.1), they varied substantially for WB samples (n = 11; median 3.0 copies/μL; IQR 0.31–10.45). The variation in gene copies among the replicates was also observed for nuc, uidA and 16S rRNA genes in WB samples, most likely due to extraction variability (Table S1 and Figure S1). The negative controls showed very low levels of 16S DNA in the ddPCR and were thus not sequenced.
3.2.2 ONT Sequencing and Data Analysis
The replicates of WB and P-180 g samples spiked with the ATCC-MSA-2002 mock community were sequenced. In the first dilution (4 × 10⁴ cells/mL), 48% and 10% of the reads were classified as bacteria in WB and P-180 g samples, respectively (Figure 4). The proportion of reads for total bacteria was also higher in WB samples than in plasma samples at dilution 2 (4 × 10³ cells/mL). However, 99% of the reads were classified as human in both WB and plasma samples at dilution 3 (4 × 10² cells/mL). The mean read length obtained from plasma samples was shorter than from WB samples (Table S3).

The DNA extraction efficiency for each bacterium in the mock community was investigated by looking at the respective proportion of reads (Figure 5) and the number of reads (Figure S2) classified in WB and plasma. Using Kraken2, the highest relative abundance was seen for Enterococcus faecalis and S. aureus, followed by Acinetobacter baumannii in both WB and plasma samples (Figures 5 and S2). Most of the other species were barely detected. Pseudomonas aeruginosa and Cutibacterium acnes were not detected in either WB or plasma samples. To further investigate this result, the mock community bacterial suspensions in 1X PBS (not spiked into WB) at three different concentrations were sequenced. P. aeruginosa, which could not be detected in either WB or plasma (Figure 5a,b), was recovered from the bacterial suspension in PBS (Figure S3). Furthermore, including an enzyme pre-treatment step in the protocol increased the DNA extraction yield for Gram (+) bacteria but decreased it for Gram (−) bacteria (Figure S3).

When studying the abundances of S. aureus and E. coli in the mock community spiked samples, the detection limit was estimated to be 20 genomic copies/mL WB +/−1 log10 for ONT sequencing (Figure 5a).
4 Discussion
In this study, a shotgun metagenomic protocol using ONT sequencing, with the potential of same-day diagnostics of bloodstream infections, was investigated. WB and plasma were evaluated as clinical specimens to reach an optimum shotgun metagenomics protocol with a similar NA extraction efficiency for different bacterial species despite their varying cellular structures. Considering the challenges of the conventional diagnostic approaches mentioned previously, shotgun metagenomics has the advantage of potentially providing information on all microbial pathogens and associated antimicrobial resistance in a single assay directly from the sample. The fast sequencing technology of Oxford Nanopore was used to get test results within 1–2 days by processing 10 mL of WB and 1 mL of plasma (obtained from 5 mL of WB). In general, more bacteria, with the highest bacterial gene copy number detected by ddPCR and the highest proportion of reads classified as bacteria through sequencing, were recovered using 10 mL WB. One exception was S. aureus, for which the number of bacterial reads was highest in the P-180 g plasma samples. The WB and P-180 g samples spiked with the ATCC-MSA-2002 mock community (4 × 10⁴ cells/mL) showed proportions of reads classified as bacteria at, on average, 48% and 10%, respectively. In turn, the bacterial DNA extraction was least efficient in plasma P-180 g samples, giving the lowest proportion of reads classified as bacteria and with the highest level of human DNA. The mean read length obtained from plasma samples was also shorter than from WB samples (Table S3). However, reproducibility was higher in plasma compared to WB, shown by IQRs of 1.34–2.1 and 0.31–10.45 copies/μL, respectively. Together with the fact that the Molzym kit resulted in a consistently higher DNA extraction efficiency for Gram (+) bacteria in all preparations, these results make it hard to draw definite conclusions about the impact of the choice of sample matrix.
Although the spiked mock community theoretically contains an even whole cell mixture of 20 different bacterial species, Gram (+) bacteria, particularly E. faecalis, were detected in high proportions in both WB and plasma samples. To study the efficiency of the DNA extraction without the effects of a sample matrix and without the effects of the human depletion step, which possibly affects the extraction of DNA from Gram (−) bacteria, DNA was also extracted directly from the mock community suspension in PBS at three different concentrations. However, another DNA extraction kit, magDEA Dx SV, had to be used, since the Molzym kit is only designed for DNA extraction from clinical samples that include human material. We found that P. aeruginosa, which could not be detected in either WB or plasma (Figure 5a,b), was recovered from the bacterial suspension in PBS (Figure S3). Other Gram (−) bacteria, such as E. coli, Neisseria meningitidis and A. baumannii, were also detected in higher proportions in the protocol without the human depletion step. We additionally evaluated how enzymatic pre-treatment affected bacterial DNA isolation and found that it improved extraction efficiency for Gram (+) bacteria but decreased it for Gram (−) bacteria (Figure S3). This finding should be considered when trying to determine an optimal extraction procedure for clinical samples prior to shotgun metagenomics.
We observed that there was no strict linear relationship in the dilution series in the detection of bacteria. This result might be explained by the low concentrations of bacterial DNA in the higher dilutions, as human depletion is likely more important in samples containing less bacterial DNA to reach effective sequencing. A disagreement was also observed in the amount of host DNA between ddPCR and sequencing. Although the ddPCR copy number of the RPP30 gene was lower than the bacterial gene copy number in WB and P-180 g samples, most of the reads were classified as human with ONT sequencing in samples spiked with single bacteria (> 95% human reads) and the ATCC-MSA-2002 mock community (> 50% human reads). This difference is most likely explained by the modified library preparation protocol, which includes an increased number of PCR cycles and may not be optimal. Since the library preparation includes this PCR step, in which all material amplifies exponentially, and considering the size of the human genome vs. the bacterial genome, the PCR step increases the human DNA copy number to a much greater extent. Theoretically, in 10 mL WB with a concentration of E. coli of 10⁴ CFU/mL, the ratio of leucocyte (10 × 10⁶ cells/mL) human DNA (~6.4 billion bp) to E. coli DNA (~5 Mb) should be 320,000,000:500 [10 × (5 × 10⁶) × 6.4 : 10 × 10⁴ × 0.005]. The difference in amplification also explains why the detected amount of human DNA is greater with sequencing than with ddPCR, since the ddPCR only targets one single gene from the whole human genome (RPP30), whereas sequencing detects the whole human genome within the sample. Yet, the increased amplification enables the detection of low-abundance pathogens present in a sample, which can be very important in a clinical context.
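The theoretical human-to-E. coli DNA ratio can be reproduced as a short worked calculation. The sketch below takes its figures from the bracketed expression in the text (10 mL of WB, 5 × 10⁶ leucocytes/mL, a 6.4 Gb diploid human genome, 10⁴ CFU/mL E. coli and a 0.005 Gb E. coli genome); the variable names are illustrative, and one genome copy per cell is assumed.

```python
blood_volume_ml = 10
leucocytes_per_ml = 5e6   # value used in the bracketed expression in the text
human_genome_gb = 6.4     # ~6.4 billion bp per leucocyte
ecoli_cfu_per_ml = 1e4
ecoli_genome_gb = 0.005   # ~5 Mb per E. coli cell

# Total DNA per sample, in gigabases
human_dna_gb = blood_volume_ml * leucocytes_per_ml * human_genome_gb
ecoli_dna_gb = blood_volume_ml * ecoli_cfu_per_ml * ecoli_genome_gb

ratio = human_dna_gb / ecoli_dna_gb
print(f"human : E. coli = {human_dna_gb:,.0f} : {ecoli_dna_gb:,.0f} Gb "
      f"(about {ratio:,.0f} to 1)")
```

Under these assumptions, and before any human DNA depletion, human DNA outweighs E. coli DNA by roughly 640,000 to 1.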
Unfortunately, we could not conclude how the different matrices affected the bacterial DNA isolation efficiency using the same Molzym kit. However, extraction of bacterial suspension using the magDEA Dx SV kit also showed a difference in the proportion of reads classified for each bacterium despite theoretically even cell mixtures in suspension. A substantial variation in Molzym kit extractions among the replicates was also observed, especially in WB samples. For example, one of the replicates spiked with 4 × 10⁴ cells/mL did not yield any quantifiable DNA. The failure during automatic extraction could be due to a high amount of total NA released from the bacteria and the human cells that made the magnetic beads malfunction. Nucleic acid-based detection methods might also be hampered by inhibitory components such as haemoglobin, immunoglobulin and heparin in WB [25]. These factors affect reproducibility when using WB as a sample matrix and should be considered when introducing this into a clinical context. The presence of inhibitors could also explain the variation observed among replicates within the same dilution throughout this study. A previous study comparing different DNA extraction kits from spiked WB also showed similar inconsistencies in DNA concentrations among different concentrations of bacteria using the MolYsis Complete5 kit of Molzym [26].
Overall, the detection limit was estimated to be 20 genomic copies/mL WB +/−1 log for ONT sequencing, and DNA isolation was generally more efficient for Gram (+) than Gram (−) bacteria. To assess this more accurately, it would have been ideal to check the gene copy number of each of the 20 bacteria included in the mock community by ddPCR. Nevertheless, one should consider the differences in genome size, which might favour the number of reads from larger bacterial genomes. However, we did not see any association between the proportion of reads and the genome size of the bacteria in the mock community.
One limitation of this study is the low coverage, since a higher coverage is required for reliable data. At the time of these experiments, the newer Q20 technology and analytic tools from Nanopore were not available, and therefore all experiments were run using the older technology, which is known to have a higher error rate and thus a less reliable species-level call [27, 28]. However, the pathogen content of the samples was known in advance here, so the species-level call could be trusted. This limitation highlights the importance of further improving the method using the newest Nanopore technology with higher accuracy to make it more sensitive for low levels of bacteria. Improvements would facilitate reaching down to species level to a greater extent than seen in this study, in which several reads were classified at the genus level. In addition, the study was limited by the volume of samples available and the cost to process all biological and technical replicates of samples in the ddPCR and Nanopore sequencing experiments. Another limitation is that the blood used for spiking was obtained from healthy donors. WB is not a passive medium, and in patients presenting with bloodstream infections, with a storm of events occurring, blood cells are exceedingly active, which may influence the result. Further studies are needed with specimens from patients with bloodstream infections. Careful consideration should also be given to the choice of human depletion method; saponin and DNase can also be used to reduce human DNA in similar samples and should be compared with the method chosen in this study. In addition, sequencing technologies are constantly evolving, and selective sequencing using Nanopore could be a useful approach to try out in the future [29].
5 Conclusion
In this study, we used nanopore-based shotgun metagenomics for spiked WB and plasma samples. The study revealed several challenges and critical steps, the first of which is achieving optimal extraction, including human DNA depletion. The proportion of bacteria detected from 10 mL of WB appeared to be higher, but it also seemed to have a lower reproducibility compared to plasma. Second, the DNA extraction efficiency was higher for Gram (+) bacteria, regardless of the sample matrix. These two factors prevent us from drawing definite conclusions as to whether WB or plasma is the best sample material. Third, the low genome coverage also indicates that further optimisation is necessary to obtain more detailed information beyond pathogen detection. Several challenges remain with shotgun metagenomics-based diagnostics of bloodstream infections. This study contributes by highlighting some of the critical steps for future development of shotgun metagenomics-based diagnostics to bring it closer to a method with general clinical applicability.
Conflicts of Interest
At the time of the study, John Rossen was consulting for IDbyDNA (now Illumina). This did not have any influence on the interpretation of reviewed data and conclusions drawn nor on the drafting of the manuscript, and no support was obtained from them. The remaining authors declare no conflict of interest.
Open Research
Data Availability Statement
The data that support the findings of this study are openly available in SRA at https://dataview.ncbi.nlm.nih.gov/object/PRJNA1020393, reference number PRJNA1020393.