Volume 15, Issue 5 pp. 1079-1090

Resource Article

Assessing the utility of whole genome amplified DNA for next-generation molecular ecology

Christopher Blair (Corresponding Author)

Department of Biology, Duke University, Box 90338, BioSci 130 Science Drive, Durham, NC, 27708 USA

Correspondence: Christopher Blair, Fax: 718-260-5278; E-mail: [email protected]

C. Ryan Campbell

Department of Biology, Duke University, Box 90338, BioSci 130 Science Drive, Durham, NC, 27708 USA

Anne D. Yoder

Department of Biology, Duke University, Box 90338, BioSci 130 Science Drive, Durham, NC, 27708 USA

First published: 24 January 2015

https://doi.org/10.1111/1755-0998.12376

Citations: 25

Abstract

DNA quantity can be a hindrance in ecological and evolutionary research programmes due to a range of factors including endangered status of target organisms, available tissue type, and the impact of field conditions on preservation methods. A potential solution to low-quantity DNA lies in whole genome amplification (WGA) techniques that can substantially increase DNA yield. To date, few studies have rigorously examined sequence bias that might result from WGA and next-generation sequencing of nonmodel taxa. To address this knowledge deficit, we use multiple displacement amplification (MDA) and double-digest RAD sequencing on the grey mouse lemur (Microcebus murinus) to quantify bias in genome coverage and SNP calls when compared to raw genomic DNA (gDNA). We focus our efforts on providing baseline estimates of potential bias by following manufacturer's recommendations for starting DNA quantities (>100 ng). Our results strongly suggest that MDA enrichment does not introduce systematic bias to genome characterization. SNP calls between samples, whether genotyped de-novo or with a reference genome, are highly congruent (>98%) when specifying a minimum threshold of 20X stack depth to call genotypes. Relative genome coverage is also similar between MDA and gDNA, and allelic dropout is not observed. SNP concordance varies based on coverage threshold, with 95% concordance reached at ~12X coverage genotyping de-novo and ~7X coverage genotyping with the reference genome. These results suggest that MDA may be a suitable solution for next-generation molecular ecological studies when DNA quantity would otherwise be a limiting factor.

Introduction

As the cost of both second- and third-generation DNA sequencing continues to plummet (Glenn 2011), ecological genomic studies will be feasible for the majority of individual laboratories, even those operating on a modest budget. Indeed, the reduced costs of modern sequencing technologies are enabling a more cost-efficient approach to empirical population genetic research than traditional programmes that, for example, utilized molecular cloning procedures for marker development. For example, one run on an Illumina MiSeq results in thousands of potentially suitable microsatellite markers to study rates and patterns of gene flow in nonmodel species (Castoe et al. 2012). These technological advances are now able to provide exponentially more data at a reduced cost per base pair vs. traditional Sanger sequencing-based methods (Glenn 2011).

Although next-generation DNA sequencing (NGS) has found a home in nearly all biological disciplines, these methods rely on an initial library preparation step that requires a relatively large quantity of starting material for adequate construction (Quail et al. 2012). For example, a library insert size of 10 kb for sequencing on a PacBio RS II currently requires approximately 10 μg of starting gDNA (Pacific Biosciences). Although DNA libraries can be constructed with substantially less DNA for sequencing on other platforms using alternative techniques, DNA quantity can nonetheless be a limiting factor for genomic workflows. This issue has been addressed numerous times in the field of clinical and forensic research (see Lovmar & Syvänen 2006 for review).

Obtaining high concentrations of whole genomic DNA (gDNA) from large quantities of fresh tissue samples for NGS is certainly feasible in many cases. However, when assessing natural populations of rare species, researchers are often forced to work in suboptimal conditions that include inadequate preservation methods, limited sampling regimes, and suboptimal tissue type and quantity. Molecular ecological studies, for example, might rely on noninvasive tissue sampling methods to obtain genetic material that often exhibits substantially reduced DNA quantity and perhaps quality (Taberlet & Luikart 1999; Taberlet et al. 1999; Waits & Paetkau 2005). In addition, old tissue samples preserved in inadequate buffer may result in reduced DNA yield that is of low molecular weight and may yield biased sequencing data unfit for analysis (Taberlet & Luikart 1999; Taberlet et al. 1999). Further, ancient DNA and environmental DNA (eDNA) studies are often plagued by low-quantity, highly degraded DNA that puts limits both on the choice of sequencing procedures and, more importantly, on the hypotheses that can be tested (Hofreiter et al. 2001; Pääbo et al. 2004; Taberlet et al. 2012b).

Although relatively unexplored in ecological and non-human evolutionary research, whole genome amplification (WGA) may be a viable method to substantially increase starting DNA yield for subsequent genetic and genomic analysis (see Lasken 2009 for review). In noninvasive strategies such as hair collection (particularly if samples are collected soon after they are shed from the animal), DNA is often of high molecular weight though in low copy number quantities. WGA may therefore be useful to increase DNA concentrations so that NGS can be performed. For ancient DNA and eDNA applications, WGA could complement previously existing laboratory methods (e.g. Rohland & Hofreiter 2007) and be useful for increasing DNA yield based on the limited number of target fragments long enough to be successfully amplified. Several different WGA methods and kits are currently available including degenerate oligonucleotide primed PCR (DOP; Telenius et al. 1992; Cheung & Nelson 1996), primer extension PCR (PEP; Zhang et al. 1992) and multiple displacement amplification (MDA; Dean et al. 2002). Unlike PCR methods that use Taq DNA polymerase and may result in amplification bias and highly skewed genome coverage (Dean et al. 2002; Lasken 2009), MDA techniques use a high-fidelity φ29 polymerase and random hexamers to provide a more stable amplification with a more uniform coverage and longer fragment lengths >10 kb (Dean et al. 2002; Hosono et al. 2003; Lovmar et al. 2003; Paez et al. 2004; Park et al. 2005; Pinard et al. 2006).

Although the use of MDA for increasing copy number may seem like a panacea when working with low-quantity DNA samples, amplification bias remains a concern when using these methods. Several studies have quantified potential bias in MDA, although the majority have either focused on few loci (e.g. Cheung & Nelson 1996; Dean et al. 2002; Hosono et al. 2003; Lovmar et al. 2003), genomic resources rarely available for nonmodel taxa (e.g. Barker et al. 2004; Paez et al. 2004), or relatively simple genomes (e.g. Abulencia et al. 2006; Pinard et al. 2006). Further, only a handful of studies have examined how bias scales with the use of the high-throughput DNA sequencing methods now common in both clinical and nonclinical research programmes (e.g. Pinard et al. 2006; ElSharawy et al. 2012). Thus, additional studies are needed to examine the efficacy of ‘next-generation MDA’, particularly for research applications that target thousands of loci for hundreds of individuals as is commonly observed in NGS applications in molecular ecology.

Genotyping-by-sequencing (GBS) methods such as restriction site-associated DNA sequencing (RADseq; Baird et al. 2008) and double-digest RADseq (ddRADseq; Peterson et al. 2012) have become popular approaches for interrogating thousands of genomic regions for up to hundreds of individuals at a modest sequencing cost (Davey & Blaxter 2010; Rowe et al. 2011; Narum et al. 2013). In general, these techniques are ideal for population genomic- and shallow-scale phylogenomic studies (e.g. Emerson et al. 2010; Hohenlohe et al. 2010, 2011, 2013; Peterson et al. 2012; Rubin et al. 2012; Eaton & Ree 2013; Nadeau et al. 2013; Wagner et al. 2013; Mastretta-Yanes et al. 2015; Schield et al. 2015). New analytical software packages such as Stacks (Catchen et al. 2011, 2013) and pyrad (Eaton & Ree 2013) are attractive in that they allow researchers to control multiple parameters to cluster sequences based on a priori knowledge about their study system (Davey et al. 2013). This flexibility helps solve some of the challenges for many of these methods, including adequately merging orthologous sequences within and among individuals while minimizing the clustering of paralogous markers. However, no study to date has tested for bias when performing ddRADseq on MDA samples. We focus our efforts on ddRADseq, as the method is extremely flexible (Peterson et al. 2012) and likely to become routinely used in ecological and evolutionary genomics for the foreseeable future.

Here, we examine genetic material from the grey mouse lemur, Microcebus murinus, to investigate the potential bias introduced with MDA and ddRADseq. By performing ddRADseq on the same set of individuals in duplicate – that is, on both MDA and high-quality gDNA samples – we test the null hypothesis that there are no observable differences in either relative genome coverage or inferred SNPs within individuals among sample types. We also estimate the potential for increased homozygosity in MDA samples due to allelic dropout. We compare our findings when genotyping de-novo vs. genotyping from mapping RAD sequences to an available mouse lemur reference genome. We find that sequence bias resulting from MDA is very low to nonexistent, especially when sequencing coverage is sufficiently deep. We thus hope that our results will find use for a wide variety of applications for studying molecular ecological and evolutionary processes in natural populations.

Methods

Samples and laboratory methods

We obtained fresh tissue (liver) samples from five Microcebus murinus individuals (MM1812, MM1842, MM1895, MM7011, MM7020) from the Duke Lemur Center (DLC, Durham, NC, USA) that were immediately preserved at −80 °C. We used the Qiagen DNeasy Blood & Tissue DNA Extraction Kit (Qiagen Sciences, Inc.) to obtain purified gDNA for each sample. All DNA extracts were treated with Qiagen RNAse A to remove any RNA contamination. Following purification and quantification on a NanoDrop 2000 spectrophotometer (Thermo Scientific), MDA reactions were performed for each individual. For our study, we utilized the MDA technique used in the Qiagen REPLI-g Mini Kit following manufacturer recommendations (>100 ng of high molecular weight gDNA per reaction). Although we could have performed MDA reactions with substantially less starting DNA (to more closely mimic potential field conditions), we chose to follow manufacturer guidelines for a ‘best case scenario’ comparison. Our primary justification for this approach is that before potential bias with MDA and ddRADseq can be assessed when starting with exceptionally low DNA quantity (generally <1 ng), baseline studies are needed to test whether there is bias in the MDA reaction itself when used in conjunction with ddRADseq. Also of note is that total DNA yield from MDA is highly uniform irrespective of the quantity of starting material (Dean et al. 2002), and some studies have suggested that MDA efficiency may only be compromised at extremely low DNA input (see Discussion).

All MDA reactions were incubated overnight at 30 °C and subsequently heat-killed at 65 °C for 3 min. Products were run on a 2% agarose gel stained with ethidium bromide to check the quality of reactions. Quantification of unpurified MDA reactions on a NanoDrop 2000 (Thermo Scientific) yielded >300 ng/μL DNA per sample. To remove unincorporated nucleotides and clean MDA products, we used the Zymo Genomic DNA Clean & Concentrator Kit (Zymo Research) following specified protocols. Quantification of purified MDA samples yielded concentrations between 50 and 100 ng/μL. Although unpurified MDA reactions can be used for subsequent assays (Hosono et al. 2003), we purified our samples to minimize any potential bias introduced by unincorporated reagents.

To create ddRAD libraries, we followed the protocol of Peterson et al. (2012). The ddRADseq protocol was selected over single-digest RADseq for several reasons: (i) digesting with two restriction enzymes potentially eliminates the need for the expensive and error-prone step of randomly shearing the DNA; (ii) the efficiency of random shearing and subsequent library preparation steps will be more sensitive to both input DNA quantity and quality; (iii) restriction fragment length bias is a greater concern with RADseq (Davey et al. 2013); and (iv) the majority of laboratories will have the necessary equipment to perform both MDA and ddRADseq in-house. Briefly, we first double digested the gDNA and MDA samples with the enzymes SphI and MluCI (New England Biolabs). Digestions (50 μL) were performed for 3 h at 37 °C with ~1 μg DNA using the following master mix: 5 μL NEB buffer 4, 1 μL of each enzyme, 15–20 μL gDNA or MDA, and ddH₂O to volume. Digests were then cleaned using 1.5X volume Agencourt Ampure Beads (Beckman Coulter) following manufacturer protocols. Ligation reactions (40 μL) were then performed to attach adapters (barcoded P1, P2) containing PCR and sequencing regions to each individual. Adapters were first diluted (from a 40 μM stock) targeting a fivefold excess of adapters to complementary sticky ends following Peterson et al. (2012). The P1 adapter was diluted to a final working concentration of 0.406 μM, and P2 was diluted to 0.876 μM. Each reaction contained 4 μL T4 buffer, 1 μL T4 DNA ligase, 2 μL of each adapter, 50 ng DNA and ddH₂O to volume. Samples were then incubated following Peterson et al. (2012). Ligated products were then combined, cleaned using Ampure beads and size-selected using a 2% agarose gel with a target range of 400–600 bp. The cleaned product was aliquoted into five separate PCR tubes and amplified using the Phusion High-Fidelity PCR Kit (New England Biolabs). All reactions contained the following master mix (20 μL per sample): 4.4 μL ddH₂O, 4 μL 5X Phusion PCR buffer, 0.4 μL 10 mM dNTPs, 1 μL of each primer (10 μM), 0.2 μL DNA polymerase and 9 μL purified size-selected DNA. Cycling conditions used were as follows: denaturation at 98 °C for 30 s, 12 cycles of 98 °C for 10 s, 68 °C for 30 s, 72 °C for 30 s and a final extension step of 72 °C for 7 min. PCR products were pooled, cleaned with Ampure beads and eluted in EB buffer to a final volume of 40 μL. DNA was then quantified on a Nanodrop and Bioanalyzer 2100 (Agilent) and further size-selected using a Pippin Prep (Sage Science) because of the relatively wide range of fragments recovered by gel excision. The resulting fragments (~320–520 bp) were visualized on a TapeStation 2200 (Agilent) and subsequently sequenced on an Illumina MiSeq DNA sequencer (150-bp paired-end) at the Duke University Institute for Genome Science & Policy Sequencing Facility.
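
As a quick sanity check on the PCR recipe above, the following minimal Python sketch sums the listed component volumes and scales the master mix for the five PCR tubes mentioned in the text. The component labels, the helper names and the 10% pipetting overage are illustrative assumptions, not part of the published protocol.

```python
# Per-reaction PCR volumes (microlitres) as listed in the text.
# The dictionary keys are illustrative labels only.
PCR_MIX_UL = {
    "ddH2O": 4.4,
    "5X Phusion PCR buffer": 4.0,
    "10 mM dNTPs": 0.4,
    "primer 1 (10 uM)": 1.0,
    "primer 2 (10 uM)": 1.0,
    "DNA polymerase": 0.2,
    "size-selected DNA": 9.0,
}


def total_volume(mix):
    """Return the total reaction volume, rounded to avoid float noise."""
    return round(sum(mix.values()), 3)


def scale_master_mix(mix, n_reactions, overage=0.10):
    """Scale shared reagents for n reactions.

    The template DNA is added per tube, so it is left out of the shared mix.
    The 10% overage is an assumed pipetting allowance, not from the paper.
    """
    factor = n_reactions * (1 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in mix.items()
        if name != "size-selected DNA"
    }


if __name__ == "__main__":
    print("Total per reaction (uL):", total_volume(PCR_MIX_UL))
    for name, volume in scale_master_mix(PCR_MIX_UL, n_reactions=5).items():
        print(f"{name}: {volume} uL")
```

The listed components add up to the stated 20 μL per reaction, which is the main point of the check.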

To increase the number of comparisons between gDNA and MDA and to ascertain if results were due to laboratory or sequencing error, we performed a second experiment starting from the original liver samples and followed the same protocol as above. However, in the second experiment size selection was performed using the Pippin Prep only (i.e. no gel excision). Sequenced fragments in this run were ~275–500 bp in length (mean = 375 bp). All subsequent comparisons between gDNA and MDA were performed twice (i.e. one set per repetition of the experimental procedures above).

Data demultiplexing and quality filtering

To check the quality of the data, we first imported the paired-end reads into FastQC. This enabled us to quantify several characteristics of the data, including to what fragment length the reads should be trimmed for subsequent analysis. We then used the pipeline Stacks v 1.20 beta (Catchen et al. 2011, 2013) to process our RAD data. Stacks was developed specifically to deal with short-read data generated through next-generation sequencing methods and is ideal for gleaning population genomic statistics from ddRADseq studies. Although other programmes handle ddRADseq data and have different relative strengths (e.g. pyrad for deep phylogenetic studies and Genome Analysis Toolkit [GATK] for quality-aware SNP calling), Stacks is generally the currently recommended software package based on ease of use, features and performance (Davey et al. 2013). Samples were first demultiplexed and quality-filtered using the process_radtags program. Because the quality of the data was high through nearly the full length of the reads (150 bp), we only trimmed sequences to 145 bp for all subsequent analyses. Loci were then built both de-novo and with the available M. murinus reference genome.

De-novo analysis

We first used Stacks to build loci de-novo. The entire pipeline was run 10 times [i.e. once for each duplicate pair (MDA vs. gDNA) for both sequencing runs]. To minimize potential errors when making comparisons between gDNA and MDA samples, only single-read data were analysed. Thus, following execution of process_radtags, we combined the demultiplexed fastq files from read-1 with read-1 sequences that were stranded (singletons—rem file) into a single file (per individual) for analysis. All sequences have been deposited into the Sequence Read Archive (Table S1, Supporting information).

A new program in the Stacks pipeline, rxstacks, was recently developed to further quality filter data and correct SNP calls and excess haplotypes. At present, this program cannot be implemented using the streamlined pipelines denovo_map.pl and ref_map.pl. Therefore, following demultiplexing and filtering, each Stacks program was run independently using the following sequence: ustacks > cstacks > sstacks > rxstacks > cstacks > sstacks. The pipeline was executed using the following settings: minimum depth of coverage required to create a stack (-m) = 3; maximum distance allowed between stacks (-M) = 4; maximum distance allowed to align secondary reads (-N) = 6; --max_locus_stacks = 3; enable removal algorithm (-r) and deleveraging algorithm (-d); maximum number of mismatches allowed between loci when building the catalogue (-n) = 3. For rxstacks, we used the following settings: --conf_filter --conf_lim 0.25, --prune_haplo --model_type bounded --bound_high 0.1 --lnl_lim -10, --lnl_dist. These parameters were carefully chosen based on the fact that each of the 10 data sets consisted of pairs of the same individual. For example, for both -M and -n, we presumed that these values would be sufficient to cluster orthologous loci and simultaneously minimize clustering of divergent paralogous markers. Parameters were also chosen based on recommendations from previous studies (Catchen et al. 2013). Following the execution of the pipeline, we implemented the populations program to generate output to compare the gDNA data to MDA for each paired individual. Because inferred SNPs and haplotypes were affected by depth of coverage, we adopted a relatively conservative approach and only kept loci with a minimum stack depth of 20X (-m in populations). All other default settings were used. We created a MySQL database to house Stacks output, and results were visualized on the Stacks webserver. All subsequent filtering and processing were performed in UNIX. Bias between MDA and gDNA samples was quantified at the SNP level for each individual. Note that throughout the study, we refer to tags or RAD-tags as the entire ddRAD sequence (145 bp), whereas a locus refers to a single SNP marker.
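
The per-individual comparison described above can be illustrated with a minimal Python sketch. It is not the authors' script: the input layout (a dictionary mapping a SNP locus identifier to a genotype string and a stack depth), the function names and the toy values are all assumptions made for illustration. The sketch keeps only loci present in both samples at or above a minimum depth and reports the fraction with identical genotypes.

```python
def normalise(genotype):
    """Treat 'A/G' and 'G/A' as the same unphased genotype."""
    return tuple(sorted(genotype.split("/")))


def snp_concordance(gdna, mda, min_depth=20):
    """Return (number of shared loci, fraction with matching genotypes).

    gdna and mda map a locus id to (genotype, depth). A locus is compared
    only if both samples reach min_depth at that locus.
    """
    shared = [
        locus
        for locus in gdna.keys() & mda.keys()
        if gdna[locus][1] >= min_depth and mda[locus][1] >= min_depth
    ]
    if not shared:
        return 0, float("nan")
    matches = sum(
        normalise(gdna[locus][0]) == normalise(mda[locus][0]) for locus in shared
    )
    return len(shared), matches / len(shared)


# Hypothetical toy data for one individual (not taken from the study).
toy_gdna = {"1_23": ("A/G", 31), "2_88": ("C/C", 25), "3_5": ("T/T", 8), "4_60": ("G/A", 40)}
toy_mda = {"1_23": ("G/A", 28), "2_88": ("C/T", 22), "3_5": ("T/T", 30), "4_60": ("A/G", 21), "5_1": ("C/C", 50)}

if __name__ == "__main__":
    for depth in (0, 20):
        n_shared, rate = snp_concordance(toy_gdna, toy_mda, min_depth=depth)
        print(f"min depth {depth}X: {n_shared} shared loci, concordance {rate:.2f}")
```

Requiring the depth threshold in both samples mirrors the conservative 20X filter described in the text, at the cost of discarding loci that are deep in only one sample.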

Reference analysis

In addition to creating loci and genotyping de-novo, we were interested in using the available M. murinus genome to assemble our RAD-tags and call SNPs. We obtained a copy of the M. murinus genome (assembled scaffolds) from the Ensembl Genome Browser (useast.ensembl.org/index.html). We then used Bowtie2 v2.1.0 (Langmead & Salzberg 2012) to map the quality-filtered RAD-tags to the reference. We ran initial analyses using the paired reads specifying a maximum fragment length of 600 bp. Default settings were used for all mapping. SAMtools v0.1.17 (Li et al. 2009) was used to convert the resulting SAM files to BAM files to ease data manipulation. Visualization of BAM files was implemented in the software Tablet v1.13.08.05 (Milne et al. 2013) for QC. We extensively browsed the mapping results for different scaffolds of the reference to assess the quality of alignment, in particular regarding the mate pairs. Results indicated a relatively large fraction of mates were stranded (i.e. one mate was unmapped or mapped to a different scaffold). This issue could be due to any number of potential factors including issues with the assembly of contigs and scaffolds, gene duplication, repetitive DNA, mutation, etc. To be conservative and to minimize potential errors mapping to the reference (which would subsequently influence genotyping), we used only the single-read data for alignment. Using the single-read data for mapping also enabled us to make a fair comparison to the results of the de-novo analysis. Therefore, we reran Bowtie2 for each individual using the single-read data only. Default settings were again used and the resulting alignment was post-processed in SAMtools and Tablet. We then reran Stacks an additional 10 times using the same settings as the de-novo analyses but replacing ustacks with pstacks. Catalogues were created using information both regarding genomic position of reads (-g) and the maximum number of nucleotide differences for consensus sequences (-n 3).
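
One way to summarise single-read mapping output of this kind is to tally reads as unaligned, aligned once, or aligned more than once. The Python sketch below is an illustration under stated assumptions, not the procedure used in the study: it reads SAM text lines, uses the standard unmapped flag bit (0x4), and treats the presence of Bowtie2's optional XS:i field (reported when a second alignment was found) as a proxy for multiple alignments. The function name and the toy records are hypothetical.

```python
from collections import Counter


def classify_sam_lines(lines):
    """Count reads as 'unaligned', 'unique' or 'multi' from SAM text lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:
            continue  # ignore secondary and supplementary records
        if flag & 0x4:
            counts["unaligned"] += 1
        elif any(tag.startswith("XS:i:") for tag in fields[11:]):
            counts["multi"] += 1  # a second alignment was found for this read
        else:
            counts["unique"] += 1
    return counts


# Hypothetical toy records (11 mandatory SAM fields plus optional tags).
toy_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tscaffold_1\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII\tAS:i:0",
    "r2\t16\tscaffold_2\t200\t1\t5M\t*\t0\t0\tACGTA\tIIIII\tAS:i:0\tXS:i:0",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII\tYT:Z:UU",
]

if __name__ == "__main__":
    tally = classify_sam_lines(toy_sam)
    total = sum(tally.values())
    for category in ("unique", "multi", "unaligned"):
        print(f"{category}: {tally[category]} ({tally[category] / total:.0%})")
```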

Finally, as read depth continues to be a major hurdle to accurately scoring genotypes from RAD-based data sets (Davey et al. 2013), we were also interested in assessing how the minimum level of coverage for genotyping might influence SNP concordance between gDNA and MDA samples. Therefore, we ran additional populations analyses specifying the following minimum thresholds for coverage: 0X, 5X, 10X and 15X. These analyses were performed both de-novo and with the reference for sequencing Run-1 only. Output files from all Stacks analyses can be obtained from Dryad (doi:10.5061/dryad.83dc2).

Results

Quality filtering

We first assessed the quality of the paired RAD-tags using FastQC. Both sequencing runs were of high quality, with median per base sequence qualities (Phred score) ranging from 32 to 39 (majority 39). Median per base quality scores after removing adapter and barcode sequence (i.e. data analysed in Stacks) ranged from 37 to 39. Likewise, the vast majority of the sequence reads had a mean Phred score >38, indicating that the data were of high quality. The correct cut site was found in the majority of fragments for both restriction sites (CATG and AATT), indicating that the digestion and ligation proceeded as expected. The data were then imported into Stacks where samples were demultiplexed and filtered.

For Run-1, out of a total of 18 715 106 sequences, 96 214 (0.5%) had ambiguous barcodes, 196 448 (1%) were of low quality and 106 438 (0.6%) had an ambiguous RAD-tag. Removing these reads left 18 316 006 reads (~98%) for subsequent analysis. The mean number of reads for the gDNA samples was 1 915 548, whereas the mean number of reads for the MDA samples was 1 808 230. Run-2 yielded 30 205 860 total paired reads, 591 620 (2%) of which had ambiguous barcode drops, zero low-quality read drops and 244 581 (0.8%) ambiguous RAD-tag drops. Removing these reads left 29 369 659 reads (~97%) for subsequent analysis. The mean number of reads for the gDNA samples was 3 032 860, whereas the mean number of reads for the MDA samples was 2 889 988.
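
The retained-read totals above follow directly from the reported drop counts. A short Python sketch of the arithmetic (the function name is illustrative; the numbers are those given in the text):

```python
def retained_reads(total, *dropped):
    """Return (reads kept, fraction kept) after subtracting dropped reads."""
    kept = total - sum(dropped)
    return kept, kept / total


# Run-1: ambiguous barcodes, low quality, ambiguous RAD-tag.
run1_kept, run1_fraction = retained_reads(18_715_106, 96_214, 196_448, 106_438)
# Run-2: ambiguous barcodes, low quality (zero), ambiguous RAD-tag.
run2_kept, run2_fraction = retained_reads(30_205_860, 591_620, 0, 244_581)

if __name__ == "__main__":
    print(f"Run-1: {run1_kept:,} reads kept ({run1_fraction:.1%})")
    print(f"Run-2: {run2_kept:,} reads kept ({run2_fraction:.1%})")
```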

De-novo analysis

The de-novo analysis of RAD-tags for Run-1 yielded between 34 469 and 70 454 unique stacks per individual, with a total number of tags in each catalogue ranging between 51 299 and 80 250 (Table 1). The de-novo analysis for Run-2 yielded between 72 137 and 140 158 unique stacks per individual, with a total number of tags in each catalogue ranging between 127 648 and 166 585 (Table S2, Supporting information). There was no observable difference between gDNA and MDA samples for the number of tags and SNPs generated within individuals. After extensive filtering of the data at a conservative level (minimum of 20X coverage), the total number of tags in catalogues was greatly reduced in both Run-1 and Run-2 (Table 1; Table S2, Supporting information). We then used the remaining tags to examine individual shared SNP loci to compare rates of concordance between gDNA and MDA. Results suggested a very high level of concordance (~98%) among gDNA and MDA in both runs for the SNP loci that remained after filtering (Fig. 1a; Fig. S1a, Supporting information).

Table 1. De-novo results from Run-1. All analyses were performed in pairs (i.e. gDNA + MDA) for each individual. The final column represents the number of RAD-tags retained after filtering stacks with a minimum of 20X coverage in the populations program

Individual	Barcode	Source	Unique stacks	SNPs	Total number of tags in catalogue	Total number of retained tags at 20X
MM1812	GCATG	gDNA	53 630	7 776 350	71 280	8554
MM1812	CAACC	MDA	64 456	9 346 120	71 280	8554
MM1842	GGTTG	gDNA	63 059	9 143 555	70 978	10 413
MM1842	CGATC	MDA	58 475	8 478 875	70 978	10 413
MM1895	AAGGA	gDNA	64 590	9 365 550	70 768	8988
MM1895	TCGAT	MDA	54 785	7 943 825	70 768	8988
MM7011	AGCTA	gDNA	64 360	9 332 200	80 250	11 309
MM7011	TGCAT	MDA	70 454	10 215 830	80 250	11 309
MM7020	AACCA	gDNA	49 518	7 180 110	51 299	3679
MM7020	ACACA	MDA	34 469	4 998 005	51 299	3679

Figure 1. (a) Run-1 SNP concordance between gDNA and MDA samples genotyped de-novo. (b) Run-1 SNP concordance between gDNA and MDA samples genotyped with the Microcebus murinus reference genome. Separate catalogues were created for each individual in Stacks, with each individual containing two samples (gDNA and MDA). All results were reported based on filtering RAD-tags to a minimum stack depth of 20X coverage prior to calling genotypes. A SNP locus was defined as a particular nucleotide position of the genome containing at least one single nucleotide substitution within (heterozygous) or between (divergence) samples.

Reference analysis

We first separately mapped all single-read RAD-tags for each sample to the currently available M. murinus genome. Alignment rate was consistent both among samples and among runs at ~80%. Approximately 50% of RAD-tags aligned once to the genome, whereas ~30% aligned more than once and ~20% did not align. There was no significant difference in alignment rate between gDNA and MDA samples in either Run-1 (Wilcoxon V = 12, P = 0.3125) or Run-2 (Wilcoxon V = 15, P = 0.0625).
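
With only five paired samples, the signed-rank P-values reported above can be reproduced by brute-force enumeration. The Python sketch below assumes five untied, non-zero paired differences (one per individual) and a two-sided test; the software the authors used is not stated here, so the function is an independent illustration rather than their implementation.

```python
from itertools import product


def wilcoxon_exact_p(v, n):
    """Exact two-sided P-value for a signed-rank statistic V.

    Assumes n paired differences with no ties and no zeros, so that every
    assignment of signs to the ranks 1..n is equally likely under the null.
    """
    ranks = range(1, n + 1)
    rank_sums = [
        sum(rank for rank, positive in zip(ranks, signs) if positive)
        for signs in product((0, 1), repeat=n)
    ]
    upper = sum(s >= v for s in rank_sums) / len(rank_sums)
    lower = sum(s <= v for s in rank_sums) / len(rank_sums)
    return min(1.0, 2 * min(upper, lower))


if __name__ == "__main__":
    print(wilcoxon_exact_p(12, 5))  # Run-1 alignment-rate comparison
    print(wilcoxon_exact_p(15, 5))  # Run-2 alignment-rate comparison
```

Under these assumptions V = 12 gives P = 0.3125 and V = 15 gives P = 0.0625, matching the reported values; 0.0625 is also the smallest two-sided P-value attainable with five pairs, so no comparison of this size can reach the conventional 0.05 threshold.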

Results from the reference analysis were generally concordant with those of the de-novo pipeline. The number of unique stacks per individual in Run-1 ranged from 34 776 to 84 764 and the total number of tags in catalogues ranged from 45 438 to 74 073 (Table 2). The number of unique stacks per individual in Run-2 ranged from 81 781 to 164 391, and the total number of tags in catalogues ranged from 110 979 to 130 808 (Table S3, Supporting information). Like the de-novo analysis, statistics were similar across all gDNA and MDA samples, and again, the total number of RAD-tags was substantially higher in Run-2. Further, SNP concordance among duplicate gDNA and MDA samples was high for the shared SNP loci remaining after filtering at 20X coverage (~98%; Fig. 1b; Fig. S1b, Supporting information).

Table 2. Reference-aligned results from Run-1. All analyses were performed in pairs (i.e. gDNA + MDA) for each individual. The final column represents the number of RAD-tags retained after filtering stacks with a minimum of 20X coverage in the populations program

Individual	Barcode	Source	Unique stacks	SNPs	Total number of tags in catalogue	Total number of retained tags at 20X
MM1812	GCATG	gDNA	58 314	8 455 530	66 093	5829
MM1812	CAACC	MDA	77 056	11 173 120	66 093	5829
MM1842	GGTTG	gDNA	72 393	10 496 985	65 876	6847
MM1842	CGATC	MDA	67 328	9 762 560	65 876	6847
MM1895	AAGGA	gDNA	74 690	10 830 050	64 998	5959
MM1895	TCGAT	MDA	61 268	8 883 860	64 998	5959
MM7011	AGCTA	gDNA	73 815	10 703 175	74 073	7304
MM7011	TGCAT	MDA	84 764	12 290 780	74 073	7304
MM7020	AACCA	gDNA	52 092	7 553 340	45 438	2984
MM7020	ACACA	MDA	34 776	5 042 520	45 438	2984

To further assess similarity in RAD-tag location and coverage depth among gDNA and MDA, a Circos (Krzywinski et al. 2009) diagram was generated to visually inspect coverage across the grey mouse lemur genome. We first performed two separate mapping analyses in Bowtie2 (all gDNA and all MDA for Run-1) and used the two resulting SAM files to generate the figure. These results confirmed the results of the SNP concordance assessments; both sample preparation techniques, gDNA and MDA, had consistent levels of average and maximum coverage depth across the M. murinus genome (Fig. 2).

Allelic dropout

Previous studies have suggested that allelic dropout (i.e. failure to detect alleles that are actually present) may be a concern when working with wgaDNA (Lovmar et al. 2003; Handyside et al. 2004; Sun et al. 2005). For both pipelines (de-novo, reference) and sequencing runs (Run-1, Run-2), we calculated the number of shared SNP loci that were homozygous and heterozygous for both gDNA and MDA. For Run-1, there was no significant difference in the proportion of homozygotes among gDNA and MDA either genotyping de-novo (Wilcoxon V = 3, P = 0.5839; Fig. 3) or using the reference (Wilcoxon V = 1, P = 0.1975; Fig. 3). Similar results were obtained for Run-2 de-novo (Wilcoxon V = 8, P = 1; Fig. S2, Supporting information) and with the reference (Wilcoxon V = 0, P = 0.1003; Fig. S2, Supporting information), indicating that the MDA protocol had no appreciable effect on SNP calls and levels of homozygosity.
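
The homozygosity comparison can be sketched in Python as follows. This is a hypothetical illustration, not the authors' code: genotypes are assumed to be stored as 'A/G'-style strings keyed by SNP locus, and the toy values are invented. The first function gives the proportion of homozygous calls in a sample; the second flags loci that are heterozygous in gDNA but homozygous for one of those alleles in MDA, the pattern expected under allelic dropout.

```python
def homozygote_proportion(genotypes):
    """Fraction of genotype calls (e.g. 'A/A', 'A/G') that are homozygous."""
    calls = [tuple(g.split("/")) for g in genotypes]
    if not calls:
        return float("nan")
    return sum(first == second for first, second in calls) / len(calls)


def dropout_candidates(gdna, mda):
    """Shared loci that are heterozygous in gDNA but homozygous in MDA
    for one of the two gDNA alleles."""
    hits = []
    for locus in sorted(gdna.keys() & mda.keys()):
        gdna_alleles = set(gdna[locus].split("/"))
        mda_alleles = set(mda[locus].split("/"))
        if len(gdna_alleles) == 2 and len(mda_alleles) == 1 and mda_alleles < gdna_alleles:
            hits.append(locus)
    return hits


# Hypothetical toy genotypes at shared SNP loci (not from the study).
toy_gdna_calls = {"L1": "A/G", "L2": "C/C", "L3": "T/C", "L4": "G/G"}
toy_mda_calls = {"L1": "A/A", "L2": "C/C", "L3": "C/T", "L4": "G/G"}

if __name__ == "__main__":
    print("gDNA homozygous:", homozygote_proportion(toy_gdna_calls.values()))
    print("MDA homozygous:", homozygote_proportion(toy_mda_calls.values()))
    print("Dropout candidates:", dropout_candidates(toy_gdna_calls, toy_mda_calls))
```

Per-individual proportions computed this way for gDNA and MDA are the kind of paired values that a signed-rank test would then compare.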

Coverage and SNP concordance

As recent studies have suggested that genotyping accuracy of RAD data is heavily reliant on levels of coverage (Davey et al. 2013), we tested the hypothesis that levels of SNP concordance among gDNA and MDA were related to the minimum level of coverage required to call genotypes. This minimum coverage level likely depends on whether the data are analysed with or without a reference genome. We therefore performed analysis both with and without the M. murinus reference. We tested the following minimum coverage thresholds: 0X, 5X, 10X, 15X and 20X. Results suggested that 95% SNP concordance among gDNA and MDA was obtained with a threshold of ~7X with the reference and ~12X with the de-novo assembly (Fig. 4). Concordance rates began to asymptote at about 5X coverage with the reference and at approximately 10X de-novo. At 20X coverage, there was no difference in gDNA and MDA SNP concordance among reference-based vs. de-novo genotyping. The average number of shared SNP loci across all individuals was consistently higher for de-novo analyses vs. reference analyses for a given sequencing depth out to about 15X coverage (Fig. 4).

Discussion

Advances in DNA sequencing technologies have revolutionized ecological and evolutionary studies of natural populations, enabling researchers to test a suite of hypotheses that were previously beyond the scope of sequencing technology. With these new methods come challenges ranging from adequate experimental design to data analysis. We focus our efforts here on quantifying bias in genome coverage and SNP genotypes by comparing duplicate gDNA and MDA samples from the grey mouse lemur, M. murinus, using ddRADseq. Although other WGA methods are available, we focus solely on MDA as this technique is generally considered to be one of the more reliable methods (Lasken 2009). We use ddRADseq as our method of choice for genetic characterization because of the increasing popularity of restriction enzyme-based methods in population genomic and shallow-scale phylogenetic studies (Emerson et al. 2010; Hohenlohe et al. 2010, 2011, 2013; Peterson et al. 2012; Rubin et al. 2012; Mastretta-Yanes et al. 2015; Schield et al. 2015). Our results suggest both adequate genome coverage and SNP concordance among multiple individuals and separate independent sequencing runs. Genotype mismatches between MDA and the control gDNA were rare in both replicates of the experiment, suggesting that MDA has no appreciable effect on ddRADseq experiments when using an acceptable level of input DNA (>100 ng). These results are congruent with previous studies that quantify bias in MDA using alternative methods (e.g. Dean et al. 2002; Hosono et al. 2003; Lovmar et al. 2003; Barker et al. 2004; Sun et al. 2005). For example, SNP and STR genotype concordance rates between MDA and gDNA have been estimated to range from 70% to 100% (majority >90%), depending on the methodology utilized (see Lovmar & Syvänen 2006 for review). However, many of these studies were based on a relatively low number of markers. Our results from ddRADseq are concordant with both previous and more recent studies that examine how bias scales with NGS data (e.g. Pinard et al. 2006; ElSharawy et al. 2012) and suggest that MDA may be an important tool for ecological and evolutionary genomics.

Ecological applications of MDA

A primary motivation for this study comes from our own experiences with suboptimal tissue preservative for long-term storage of rare samples from Madagascar. Further, given the highly endangered status of virtually all endemic Malagasy vertebrates, we must typically rely on hair and other sample types that have low amounts of endogenous DNA (Taberlet & Luikart 1999; Taberlet et al. 1999). Moreover, these samples must be initially preserved under field conditions, and due to U.S. regulatory oversight, must often be stored for several to many months at a time as they await exportation. In such cases, samples typically contain low-quantity (~3–10 ng/μL), high molecular weight DNA that is suitable for MDA (Dean et al. 2002). The finding of no substantial bias in MDA samples suggests that these tissue samples, which usually cost thousands to tens of thousands of dollars in time, equipment and labour, can be saved for later use with NGS methods and lead to larger sample sizes for population genomic and phylogeographic studies. It should be noted, however, that many WGA procedures, including MDA, may not produce a reliable and unbiased amplification when working with highly degraded template (Wang et al. 2004). As no study has assessed the potential biases of MDA and ddRADseq, the present study serves to establish a baseline operating within ideal conditions of >100 ng of high molecular weight starting DNA template (for further discussion see 4.2). To maximize the potential of MDA for noninvasive sampling regimes, tissue samples should be collected soon after they are released from the animal to minimize the quantity of degraded DNA in the sample. A simple DNA extraction followed by gel electrophoresis can be performed to assess whether the samples are good candidates for MDA.

Another potential use of WGA generally and MDA specifically lies in eDNA metabarcoding studies, which are increasing in popularity at an extraordinary rate. These studies use DNA present in environmental samples such as water or soil to detect the presence or absence of particular species (Ficetola et al. 2008; Taberlet et al. 2012a,b). Thus, eDNA can be classified as a second type of noninvasive DNA sampling and again, DNA quantity and quality can be a limiting factor depending on the amount of time elapsed between the time DNA molecules were shed to the time they were preserved (Dejean et al. 2011; Thomsen et al. 2012a,b; Goldberg et al. 2013; Barnes et al. 2014). It is also highly likely that eDNA molecules from different species are present in different concentrations, and it is presently unknown how this asymmetry may affect species detection and estimates of relative abundance. This may be particularly true for rare species, whose DNA may be represented in exceptionally low quantities (Boessenkool et al. 2012). MDA may be one solution to simultaneously amplify all DNA in an environmental sample prior to library construction, yielding more accurate species inventories. However, it may be possible that MDA introduces bias in eDNA applications by preferentially amplifying DNA fragments from species that have not passed a threshold of degradation. This potential trade-off is an area ripe for research.

Finally, MDA has shown promise in the field of metagenomics (Abulencia et al. 2006) and for DNA amplification in single bacterial cells (Raghunathan et al. 2005). Our results support the notion that MDA may be useful for population genomic studies of microbes to fully characterize genomic diversity and population structure of both bacteria and viruses. This is a particularly exciting avenue of research, as more accurate estimates of genomic diversity for common human pathogens can be ascertained. Further, comprehensive genomic assays for these pathogens would enable more robust estimations of genomic patterns of adaptation and how these patterns correlate with particular demographics.

Considerations and limitations

Although our results report no substantial bias between gDNA and MDA using ddRADseq, there are several limitations of the current study that must be addressed. First, we base our conclusions on a single WGA method and kit (Qiagen REPLI-g). A variety of other MDA kits are presently available, including Illustra GenomiPhi and TempliPhi (GE Healthcare Life Sciences). However, the principle and chemistry of these kits are virtually identical, so they are likely to provide similar results.

Second, allelic dropout in heterozygous individuals can be an issue when working with wgaDNA (Lovmar & Syvänen 2006). The probability of allelic dropout appears to be a factor of both the method of WGA used and the quantity of starting material (Handyside et al. 2004; Bergen et al. 2005a; Sun et al. 2005). More specifically, PCR-based WGA methods or starting DNA quantities <1 ng may introduce some degree of both locus and allelic bias (Cheung & Nelson 1996; Dean et al. 2002; Sun et al. 2005). Results from additional studies suggest DNA input quantities between 3 and 100 ng per MDA reaction to maximize genome coverage and minimize allelic dropout and imbalance (Lovmar et al. 2003; Bergen et al. 2005b). We follow these guidelines in the present study by utilizing the MDA approach with starting DNA quantities that are relatively large (>100 ng as per manufacturer recommendations). Indeed, our results provide little evidence of allelic dropout in MDA samples, as the proportion of homozygotes in shared loci is similar for both gDNA and MDA.

Third, although our results serve as a baseline estimate and suggest limited bias within MDA and ddRAD libraries when strictly following manufacturer guidelines, future studies should compare the concordance of gDNA and MDA among high (10–100 ng) and low-quantity (<10 ng) DNA template. These comparisons should be made within a single library and sequencing run to minimize any potential exogenous bias. For example, a suitable follow-up study would be to create libraries consisting of both high-quantity gDNA from blood and low-quantity DNA from hair (to be amplified by MDA), with both sample types originating from the same animal. This combined ‘testing the kit’ + ‘testing the sample’ approach will highlight more of the potential limitations when using MDA in ecological and evolutionary research.

Fourth, we focus our efforts solely on GBS techniques (i.e. ddRADseq) as these are becoming methods of choice for next-generation population genomic and phylogeographic research (Davey & Blaxter 2010; Rowe et al. 2011; Narum et al. 2013). Additional studies are needed to determine how alternative NGS approaches commonly used for similar ecological applications such as sequence capture methods compare with our results. Further, we test for bias only using ddRADseq while bias may prove more substantial in alternative RAD-based protocols, particularly those that involve a random shearing step. However, we anticipate that results will be highly congruent as evidenced from other recent studies quantifying bias in MDA with NGS (Pinard et al. 2006; ElSharawy et al. 2012). Along similar lines, we use a single software package, Stacks, to analyse our data. We focus on Stacks because the software was specifically designed to process RAD-based markers and is the generally accepted analytical method (Davey et al. 2013).

Fifth, we use a relatively low-quality reference genome (~2X) in our analysis, which may reduce mapping efficiency. This may explain why, at any given coverage depth, the number of shared loci among samples is greater when genotyping de-novo than with the reference (Fig. 4). Interestingly, however, results regarding the efficiency of MDA are virtually identical in both cases, suggesting that the use of a low-quality reference is not a hindrance to our conclusions.

Our study design and objectives allow us to test how genotyping efficiency may be related to minimum depth of coverage. Compared to fragment analysis, genotyping with NGS data is far less straightforward, with multiple factors that must be considered and incorporated for accurate calls (Davey et al. 2013). As more NGS data are collected, new genotyping programs will continue to be developed and the strengths and weaknesses of methods will have to be compared. Based on the maximum-likelihood method in Stacks, our results suggest that a minimum of 7X coverage may be adequate if a reference genome is available, whereas a depth of 12X may suffice if genotyping de-novo. We base these conclusions on levels of SNP concordance (>95%) between gDNA and MDA as a function of coverage depth.
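One way to read a depth recommendation off a concordance curve measured at discrete thresholds is linear interpolation between neighbouring points. The sketch below assumes this approach purely for illustration; the text does not state how the ~7X and ~12X values were obtained from Fig. 4. The function name `depth_at_target` and the concordance values are hypothetical and are not results from this study.

```python
def depth_at_target(depths, concordance, target=0.95):
    """Interpolate the lowest depth at which concordance reaches the target.

    depths: increasing minimum-coverage thresholds (e.g. 0, 5, 10, 15, 20).
    concordance: concordance fraction observed at each threshold.
    Returns None if the target is never reached.
    """
    points = list(zip(depths, concordance))
    if points[0][1] >= target:
        return float(points[0][0])
    for (d0, c0), (d1, c1) in zip(points, points[1:]):
        if c0 < target <= c1:
            # Straight-line interpolation between the two bracketing points.
            return d0 + (target - c0) * (d1 - d0) / (c1 - c0)
    return None


# Hypothetical concordance curve (illustrative values only).
thresholds = [0, 5, 10, 15, 20]
observed = [0.80, 0.90, 0.97, 0.99, 0.99]
estimate = depth_at_target(thresholds, observed)
print(f"Estimated depth for 95% concordance: {estimate:.1f}X")
```

Because the curve flattens as coverage increases, the interpolated value is most sensitive to the two thresholds that bracket the target, so finer spacing of thresholds in that interval would tighten the estimate.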

Finally, as alluded to above, MDA generally works best when using high molecular weight template. However, noninvasive sampling regimes are frequently confronted with the problem of highly degraded DNA (Taberlet et al. 1999, 2012b). Thus, an interesting and useful avenue for future research will be to ascertain if the potentially increased power to detect rare species by performing MDA on eDNA outweighs the bias introduced by the preferential amplification of long fragments. One solution to this trade-off would be to use WGA methods that aim to provide unbiased amplification of highly degraded template. For example, Wang et al. (2004) introduce a method termed Restriction and Circularization-Aided Rolling Circle Amplification that appears robust to moderate levels of degradation. In brief, the method first uses restriction enzymes to fragment the DNA between damage sites. Fragments are then circularized using DNA ligase and noncircularized fragments are discarded. The remaining double-stranded circularized DNA is denatured and amplified using φ29 polymerase, similar to traditional MDA (Dean et al. 2002). Early results show promise when working with highly degraded template, indicating that the method should be explored further.

Conclusions

The primary objective of this study was to establish a baseline estimate of bias levels combining MDA with ddRADseq under optimal conditions (>100 ng input DNA for MDA). Our results suggest that the efficacy of MDA scales with NGS applications including ddRAD-based studies. These conclusions are based on high concordance in both genome coverage and SNP genotypes when compared to raw gDNA samples. Future studies are needed to ascertain if similar results are obtained under suboptimal sampling conditions (e.g. low-quantity [<10 ng] and/or highly degraded template). We hope that MDA (and WGA methods more generally) will be found useful across numerous ecological and evolutionary applications. As many species and populations continue to be threatened by extinction, thus limiting field options for collecting biological samples, methods allowing genomic investigations from small noninvasive samples will be vital to implement adequate conservation measures.

Acknowledgements

We thank Erin Ehmke from the Duke Lemur Center for providing tissue samples. We thank P. Larsen for his insightful discussions regarding methods of statistical analysis and data visualization. J. Catchen provided invaluable advice regarding the use of Stacks. This research was funded by Duke University start-up funds to A.D. Yoder. This is Duke Lemur Center publication # 1283.

Supporting Information

References

Abulencia CB, Wyborski DL, Garcia JA et al. (2006) Environmental whole-genome amplification to access microbial populations in contaminated sediments. Applied and Environmental Microbiology, 72, 3291–3301.
10.1128/AEM.72.5.3291-3301.2006
Baird NA, Etter PD, Atwood TS et al. (2008) Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE, 3, e3376.
10.1371/journal.pone.0003376
Barker DL, Hansen MS, Faruqi AF et al. (2004) Two methods of whole-genome amplification enable accurate genotyping across a 2320-SNP linkage panel. Genome Research, 14, 901–907.
10.1101/gr.1949704
Barnes M, Turner CR, Jerde CL et al. (2014) Environmental conditions influence eDNA persistence in aquatic systems. Environmental Science and Technology, 48, 1819–1827.
10.1021/es404734p
Bergen AW, Haque KA, Qi Y et al. (2005a) Comparison of yield and genotyping performance of multiple displacement amplification and OmniPlex™ whole genome amplified DNA generated from multiple DNA sources. Human Mutation, 26, 262–270.
10.1002/humu.20213
Bergen AW, Qi Y, Haque KA, Welch RA, Chanock SJ (2005b) Effects of DNA mass on multiple displacement whole genome amplification and genotyping performance. BMC Biotechnology, 5, 24.
10.1186/1472-6750-5-24
Boessenkool S, Epp LS, Haile J et al. (2012) Blocking human contaminant DNA during PCR allows amplification of rare mammal species from sedimentary ancient DNA. Molecular Ecology, 21, 1806–1815.
10.1111/j.1365-294X.2011.05306.x
Castoe TA, Poole AW, de Koning AJ et al. (2012) Rapid microsatellite identification from Illumina paired-end genomic sequencing in two birds and a snake. PLoS ONE, 7, e30953.
10.1371/journal.pone.0030953
Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH (2011) Stacks: building and genotyping loci de novo from short-read sequences. G3: Genes, Genomes, Genetics, 1, 171–182.
10.1534/g3.111.000240
Catchen J, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124–3140.
10.1111/mec.12354
Cheung VG, Nelson SF (1996) Whole genome amplification using a degenerate oligonucleotide primer allows hundreds of genotypes to be performed on less than one nanogram of genomic DNA. Proceedings of the National Academy of Sciences, 93, 14676–14679.
10.1073/pnas.93.25.14676
Davey JW, Blaxter ML (2010) RADSeq: next-generation population genetics. Briefings in Functional Genomics, 9, 416–423.
10.1093/bfgp/elq031
Davey JW, Cezard T, Fuentes-Utrilla P et al. (2013) Special features of RAD Sequencing data: implications for genotyping. Molecular Ecology, 22, 3151–3164.
10.1111/mec.12084
Dean FB, Hosono S, Fang L et al. (2002) Comprehensive human genome amplification using multiple displacement amplification. Proceedings of the National Academy of Sciences, 99, 5261–5266.
10.1073/pnas.082089499
Dejean T, Valentini A, Duparc A et al. (2011) Persistence of environmental DNA in freshwater ecosystems. PLoS ONE, 6, e23398.
10.1371/journal.pone.0023398
Eaton DA, Ree RH (2013) Inferring phylogeny and introgression using RADseq data: an example from flowering plants (Pedicularis: Orobanchaceae). Systematic Biology, 62, 689–706.
10.1093/sysbio/syt032
ElSharawy A, Warner J, Olson J et al. (2012) Accurate variant detection across non-amplified and whole genome amplified DNA using targeted next generation sequencing. BMC Genomics, 13, 500.
10.1186/1471-2164-13-500
Emerson KJ, Merz CR, Catchen JM et al. (2010) Resolving postglacial phylogeography using high-throughput sequencing. Proceedings of the National Academy of Sciences, 107, 16196–16200.
10.1073/pnas.1006538107
Ficetola GF, Miaud C, Pompanon F, Taberlet P (2008) Species detection using environmental DNA from water samples. Biology Letters, 4, 423–425.
10.1098/rsbl.2008.0118
Glenn TC (2011) Field guide to next-generation DNA sequencers. Molecular Ecology Resources, 11, 759–769.
10.1111/j.1755-0998.2011.03024.x
Goldberg CS, Sepulveda A, Ray A, Baumgardt J, Waits LP (2013) Environmental DNA as a new method for early detection of New Zealand mudsnails (Potamopyrgus antipodarum). Freshwater Science, 32, 792–800.
10.1899/13-046.1
Handyside AH, Robinson MD, Simpson RJ et al. (2004) Isothermal whole genome amplification from single and small numbers of cells: a new era for preimplantation genetic diagnosis of inherited disease. Molecular Human Reproduction, 10, 767–772.
10.1093/molehr/gah101
Hofreiter M, Serre D, Poinar HN, Kuch M, Pääbo S (2001) Ancient DNA. Nature Reviews Genetics, 2, 353–359.
10.1038/35072071
Hohenlohe PA, Bassham S, Etter PD et al. (2010) Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genetics, 6, e1000862.
10.1371/journal.pgen.1000862
Hohenlohe PA, Amish SJ, Catchen JM, Allendorf FW, Luikart G (2011) Next-generation RAD sequencing identifies thousands of SNPs for assessing hybridization between rainbow and westslope cutthroat trout. Molecular Ecology Resources, 11, 117–122.
10.1111/j.1755-0998.2010.02967.x
Hohenlohe PA, Day MD, Amish SJ et al. (2013) Genomic patterns of introgression in rainbow and westslope cutthroat trout illuminated by overlapping paired-end RAD sequencing. Molecular Ecology, 22, 3002–3013.
10.1111/mec.12239
Hosono S, Faruqi AF, Dean FB et al. (2003) Unbiased whole-genome amplification directly from clinical samples. Genome Research, 13, 954–964.
10.1101/gr.816903
Krzywinski M, Schein J, Birol I et al. (2009) Circos: an information aesthetic for comparative genomics. Genome Research, 19, 1639–1645.
10.1101/gr.092759.109
Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nature Methods, 9, 357–359.
10.1038/nmeth.1923
Lasken R (2009) Genomic DNA amplification by the multiple displacement amplification (MDA) method. Biochemical Society Transactions, 37, 450.
10.1042/BST0370450
Li H, Handsaker B, Wysoker A et al. (2009) The sequence alignment/map format and SAMtools. Bioinformatics, 25, 2078–2079.
10.1093/bioinformatics/btp352
Lovmar L, Syvänen AC (2006) Multiple displacement amplification to create a long-lasting source of DNA for genetic studies. Human Mutation, 27, 603–614.
10.1002/humu.20341
Lovmar L, Fredriksson M, Liljedahl U, Sigurdsson S, Syvänen AC (2003) Quantitative evaluation by minisequencing and microarrays reveals accurate multiplexed SNP genotyping of whole genome amplified DNA. Nucleic Acids Research, 31, e129.
10.1093/nar/gng129
Mastretta-Yanes A, Arrigo N, Alvarez N et al. (2015) Restriction site-associated DNA sequencing, genotyping error estimation and de novo assembly optimization for population genetic inference. Molecular Ecology Resources, 15, 28–41.
10.1111/1755-0998.12291
Milne I, Stephen G, Bayer M et al. (2013) Using Tablet for visual exploration of second-generation sequencing data. Briefings in Bioinformatics, 14, 193–202.
10.1093/bib/bbs012
Nadeau NJ, Martin SH, Kozak KM et al. (2013) Genome-wide patterns of divergence and gene flow across a butterfly radiation. Molecular Ecology, 22, 814–826.
10.1111/j.1365-294X.2012.05730.x
Narum SR, Buerkle CA, Davey JW, Miller MR, Hohenlohe PA (2013) Genotyping-by-sequencing in ecological and conservation genomics. Molecular Ecology, 22, 2841–2847.
10.1111/mec.12350
Pääbo S, Poinar H, Serre D et al. (2004) Genetic analyses from ancient DNA. Annual Review of Genetics, 38, 645–679.
10.1146/annurev.genet.37.110801.143214
Paez JG, Lin M, Beroukhim R et al. (2004) Genome coverage and sequence fidelity of ϕ29 polymerase-based multiple strand displacement whole genome amplification. Nucleic Acids Research, 32, e71.
10.1093/nar/gnh069
Park JW, Beaty TH, Boyce P, Scott AF, McIntosh I (2005) Comparing whole-genome amplification methods and sources of biological samples for single-nucleotide polymorphism genotyping. Clinical Chemistry, 51, 1520–1523.
10.1373/clinchem.2004.047076
Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE, 7, e37135.
10.1371/journal.pone.0037135
Pinard R, de Winter A, Sarkis GJ et al. (2006) Assessment of whole genome amplification-induced bias through high-throughput, massively parallel whole genome sequencing. BMC Genomics, 7, 216.
10.1186/1471-2164-7-216
Quail MA, Smith M, Coupland P et al. (2012) A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics, 13, 341.
10.1186/1471-2164-13-341
Raghunathan A, Ferguson HR, Bornarth CJ et al. (2005) Genomic DNA amplification from a single bacterium. Applied and Environmental Microbiology, 71, 3342–3347.
10.1128/AEM.71.6.3342-3347.2005
Rohland N, Hofreiter M (2007) Ancient DNA extraction from bones and teeth. Nature Protocols, 2, 1756–1762.
10.1038/nprot.2007.247
Rowe H, Renaut S, Guggisberg A (2011) RAD in the realm of next-generation sequencing technologies. Molecular Ecology, 20, 3499–3502.
10.1111/j.1365-294X.2011.05197.x
Rubin BE, Ree RH, Moreau CS (2012) Inferring phylogenies from RAD sequence data. PLoS ONE, 7, e33394.
10.1371/journal.pone.0033394
Schield DR, Card DC, Adams RH et al. (2015) Incipient speciation with biased gene flow between two lineages of the western diamondback rattlesnake (Crotalus atrox). Molecular Phylogenetics and Evolution, 83, 213–223.
10.1016/j.ympev.2014.12.006
Sun G, Kaushal R, Pal P et al. (2005) Whole-genome amplification: relative efficiencies of the current methods. Legal Medicine, 7, 279–286.
10.1016/j.legalmed.2005.05.001
Taberlet P, Luikart G (1999) Non-invasive genetic sampling and individual identification. Biological Journal of the Linnean Society, 68, 41–55.
10.1111/j.1095-8312.1999.tb01157.x
Taberlet P, Waits LP, Luikart G (1999) Noninvasive genetic sampling: look before you leap. Trends in Ecology and Evolution, 14, 323–327.
10.1016/S0169-5347(99)01637-7
Taberlet P, Coissac E, Hajibabaei M, Rieseberg LH (2012a) Environmental DNA. Molecular Ecology, 21, 1789–1793.
10.1111/j.1365-294X.2012.05542.x
Taberlet P, Coissac E, Pompanon F, Brochmann C, Willerslev E (2012b) Towards next-generation biodiversity assessment using DNA metabarcoding. Molecular Ecology, 21, 2045–2050.
10.1111/j.1365-294X.2012.05470.x
Telenius H, Carter NP, Bebb CE, Ponder BA, Tunnacliffe A (1992) Degenerate oligonucleotide-primed PCR: general amplification of target DNA by a single degenerate primer. Genomics, 13, 718–725.
10.1016/0888-7543(92)90147-K
Thomsen P, Kielgast J, Iversen LL et al. (2012a) Monitoring endangered freshwater biodiversity using environmental DNA. Molecular Ecology, 21, 2565–2573.
10.1111/j.1365-294X.2011.05418.x
Thomsen PF, Kielgast J, Iversen LL et al. (2012b) Detection of a diverse marine fish fauna using environmental DNA from seawater samples. PLoS ONE, 7, e41732.
10.1371/journal.pone.0041732
Wagner CE, Keller I, Wittwer S et al. (2013) Genome-wide RAD sequence data provide unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlid adaptive radiation. Molecular Ecology, 22, 787–798.
10.1111/mec.12023
Waits LP, Paetkau D (2005) Noninvasive genetic sampling tools for wildlife biologists: a review of applications and recommendations for accurate data collection. Journal of Wildlife Management, 69, 1419–1433.
10.2193/0022-541X(2005)69[1419:NGSTFW]2.0.CO;2
Wang G, Maher E, Brennan C et al. (2004) DNA amplification method tolerant to sample degradation. Genome Research, 14, 2357–2366.
10.1101/gr.2813404
Zhang L, Cui X, Schmitt K et al. (1992) Whole genome amplification from a single cell: implications for genetic analysis. Proceedings of the National Academy of Sciences, 89, 5847–5851.
10.1073/pnas.89.13.5847

CB designed the experiment and performed all laboratory work. CB and CRC analysed the data. All authors contributed to the writing of the manuscript.

Data Accessibility

(i) Demultiplexed reads: NCBI Sequence Read Archive (Table S1, Supporting information). (ii) Stacks output files: DRYAD doi: 10.5061/dryad.83dc2.

Volume15, Issue5

September 2015

Pages 1079-1090