DNA metabarcoding diet analysis in a generalist omnivore: feeding trials reveal the efficacy of extraction kits and a multi-locus approach for identifying diverse diets
Abstract
Metabarcoding-based diet analysis is a valuable tool for understanding the feeding behavior of a wide range of species. However, many studies using these methods for wild animals assume accuracy and precision without experimental evaluation with known positive control food items. Here, we conducted a feeding trial experiment with a positive control community in pasture-raised chickens and assessed the efficacy of several commonly used DNA extraction kits and primer sets. We hand-fed 22 known food items, including insects and plants, to six backyard laying hens and collected their excreta for 8 h. We evaluated the efficacy of three DNA extraction kits, three primer sets for plant identification (targeting rbcL, trnL, and internal transcribed spacer 2 [ITS2]), and three primer sets for arthropod identification (targeting cytochrome oxidase subunit I [COI]). The detection success rate of our positive control food items was highly variable, ranging from 2.04% to 93.88% for all kit/primer combinations and averaging 37.35% and 43.57% for the most effective kit/primer combination for plants and insects, respectively. Extraction kits using bead-based homogenization positively affected the recovery proportion of plant and insect DNA in excreta samples. The minimum time to detect known food items was 44 min post-feeding. Two COI primer sets significantly outperformed the third, and both recovery proportion and taxonomic resolution from ITS2 were significantly higher than those from rbcL and trnL. Taken together, these results demonstrate the variability that can be inherent in DNA-based diet analyses and highlight the utility of experimental feeding trials in validating such approaches, particularly for omnivores with diverse diets.
INTRODUCTION
DNA-based molecular diet analysis facilitates unprecedented resolution in studying trophic interactions (Pompanon et al. 2012; Crisol-Martínez et al. 2016) and ecosystem functions within broad ecological communities. In contrast to traditional methods (morphological identification of digested prey items), the development of non-invasive DNA-based techniques has allowed precise detection of DNA remains in fecal samples (Khanam et al. 2016; Hoenig et al. 2022; Snider et al. 2022). For avian systems, DNA-based methods are widely used to address trophic networks among many feeding guilds, including piscivores (Deagle et al. 2007; Oehm et al. 2017), frugivores (Gonzalez-Varo et al. 2014), nectarivores (Hazlehurst et al. 2021), and insectivores (Jedlicka et al. 2013; Jedlicka et al. 2017). DNA-based methods are particularly useful for insectivorous predators whose prey is taxonomically variable, small, and/or rapidly digested (Oehm et al. 2011; Hoenig et al. 2022; Cabodevilla et al. 2023). Early DNA-based diet analysis using Sanger sequencing required plasmid cloning to discriminate mixed-prey amplicons or used an electrophoresis-based technique to separate mixed amplicons followed by sequencing (Irwin & Cristian 1998; Martin et al. 2006; Zeale et al. 2011; Lee et al. 2013; Thongjued et al. 2021). However, these protocols are labor-intensive and relatively expensive. High-throughput next-generation sequencing (NGS) is capable of generating sequences from millions of amplified DNA templates concurrently and offers a powerful and affordable tool for DNA metabarcoding, in which the mixed DNA templates within a single sample are identified simultaneously. NGS DNA metabarcoding has been used to study diet in a variety of organisms, including mammals and birds (Jedlicka et al. 2017; Wray et al. 2018; Thomas et al. 2022; Schneider et al. 2023), and provides great insight into food webs, niche partitioning, foraging behavior, ecosystem services and disservices, and conservation (Cabodevilla et al. 2021; Olimpi et al. 2022; Spence et al. 2022; Tang et al. 2022; Volpe et al. 2022; Garcia et al. 2023). These methods have even proven useful in elucidating the diets of migratory generalist birds (e.g. Louisiana waterthrush [Parkesia motacilla] and western bluebirds [Sialia mexicana]), which are generally difficult cases for traditional diet analysis because these species travel widely, eat a diverse range of foods, and are relatively small in size (Trevelline et al. 2016; Jedlicka et al. 2017). Ultimately, these approaches provide unprecedented insight into ecological communities and do so in a non-invasive manner.
Despite the utility of NGS for DNA metabarcoding-based diet analysis, these methods still come with their own challenges. First, the quality and quantity of DNA from fecal samples affect amplification success, as extraction procedures often co-extract PCR inhibitors (e.g. glycoproteins and phenolic compounds), which may lead to false negative results (Jedlicka et al. 2013). Second, DNA metabarcoding commonly relies on “universal” primers that amplify a common fragment of DNA from a broad range of taxa. For arthropods, this region is most commonly the mitochondrial gene cytochrome oxidase subunit I (COI) (Hebert et al. 2003), and in plants, several regions are targeted, such as the chloroplast trnH-psbA intergenic spacer, the large subunit of the ribulose-bisphosphate carboxylase gene (rbcL), the intron region of a chloroplast tRNA gene (trnL), the maturase K gene (matK), and internal transcribed spacer 2 (ITS2) (Kress & Erickson 2007; Kress 2017; Moorhouse-Gann et al. 2018). While the purported universality of these primers makes them ideal for identifying unknown prey species, they often show amplification bias toward specific taxa, thus overestimating the importance of those prey taxa in subsequent diet analysis. For example, the most frequently used COI primers to study the diet of insectivores, developed by Zeale et al. (2011), have been shown to amplify Diptera and Lepidoptera effectively but may be biased against other arthropod orders (Ando et al. 2020). Silva et al. (2019) found that Zeale et al.’s (2011) primers did not detect Formicidae and other Hymenoptera, which were typical diet items for black wheatears (Oenanthe leucura) identified with other methods. Similarly, invertebrate COI primers developed by Jusino et al. (2019) misleadingly detected no prey taxa in ∼40% of fecal samples from skunks known to be feeding on wasps (Hymenoptera: Vespidae), and this bias was due to poor amplification success in these taxa (Tosa et al. 2023). 
In addition to bias caused by primer compatibility, fecal samples containing mixed DNA templates can also experience amplification competition between taxa, thus biasing estimates of prey item communities (Deagle et al. 2014; Elbrecht et al. 2019). One solution to limit primer and amplification bias in metabarcoding studies is to use multiple markers or multiple primer sets. This recommendation has recently been adopted in many studies to minimize the bias of single primer pairs, thus potentially increasing taxonomic coverage (Alberdi et al. 2019; Silva et al. 2019; Ando et al. 2020).
Another approach to identify and limit bias in DNA metabarcoding-based diet analyses is to validate methods using captive animals. Species-specific studies allow us to understand how biological, technical, and environmental factors could affect the prey DNA recovered from fecal samples. Studies of captive animals with known diets can inform optimal study design and suggest the most appropriate approach for field studies. This approach is limited where the manipulation of captive animals is impractical, and only a handful of studies have conducted such positive control validation of methods. These studies have demonstrated that prey DNA detectability can be influenced by meal size, prey identity, and intrinsic factors of the consumers such as gut transit time (Thalinger et al. 2017; Thuo et al. 2019; Schattanek et al. 2021). In addition, the effectiveness of the DNA markers and databases used significantly contributed to the identification success of prey consumed in feeding trial experiments (Nakahara et al. 2015; Srivathsan et al. 2015).
In this study, our objective was to establish an optimized protocol for DNA metabarcoding diet analysis in domestic and pasture-raised chickens. Pastured and free-range poultry operations are gaining in popularity in the United States (Rothrock et al. 2019), but the direct effects of poultry feeding on pastured systems (and the reciprocal) are unknown from a quantitative perspective. Molecular diet analysis using metabarcoding has not been conducted in chickens (Gallus gallus) to date, and most avian molecular diet analyses are conducted in field settings, where the use of positive controls in a feeding trial is impractical (Crisol-Martínez et al. 2016; Trevelline et al. 2016; Jedlicka et al. 2017; Rytkönen et al. 2019; Cabodevilla et al. 2021; Forsman et al. 2022; Spence et al. 2022; Tang et al. 2022; Verkuil et al. 2022; Volpe et al. 2022). Such validation is important to avoid possible biased results in molecular diet analysis (Casper et al. 2007; Oehm et al. 2011; Egeter et al. 2015; Thalinger et al. 2017; Schattanek et al. 2021). We conducted a feeding trial experiment in which layer-chicken hens were fed known food items and monitored for defecation for 8 hours. All observable excreta were collected to evaluate the minimum time for fecal DNA detection, and we compared the efficacy of three commonly used extraction kits and multiple primer pairs for both arthropod and plant species identification. These results will facilitate future diet studies in poultry and provide suggestions for future molecular diet studies in omnivorous or generalist animals.
MATERIALS AND METHODS
Animal ethics statement
All methods were approved under the University of Kentucky Institutional Animal Care and Use Committee, Protocol number 2019–3410.
Feeding trial and excreta sample collection
To establish a methodology for molecular diet analysis in chickens, a feeding trial experiment was conducted using six layer-chicken hens (Table 1) housed in a stationary pen with access to the outdoors in the summer of 2020 (a typical coop with an outdoor run). Birds were not starved and had access to their normal feed, which is composed of corn, roasted soybean, soybean meal, wheat, and alfalfa (Nature's Best organic egg layer pellets), for the duration of the feeding trial. We hand-fed known food items (hereafter referred to as “positive control food items”) to the birds over a period of 30 minutes. These positive control food items included a mixed-grain purée made from rice (Oryza sativa, Poaceae), quinoa (Chenopodium quinoa, Amaranthaceae), and “Harvest Delight™ poultry treat” blend (MannaPro®, containing Poaceae: Hordeum vulgare, Panicum miliaceum, Triticum aestivum, Zea mays, Asteraceae: Helianthus annuus, Carthamus tinctorius, Linaceae: Linum usitatissimum, Fabaceae: Arachis hypogaea, Apiaceae: Daucus carota, Solanaceae: Solanum lycopersicum, and Vitaceae: Vitis vinifera) and nine commercially available insect species from four orders (Coleoptera: Tenebrio molitor and Zophobas atratus; Orthoptera: Acheta domesticus and Locusta migratoria; Diptera: Protophormia terraenovae and Hermetia illucens; Lepidoptera: Galleria mellonella, Manduca sexta, and Bombyx mori). In preliminary feeding observations, some relatively large insects (e.g., grasshopper [L. migratoria] and hornworm [M. sexta]) were refused by the chickens, so we instead blended those with the mixed-grain purée. Birds were individually hand-fed all items, and we documented which foods were consumed by which birds (Table 1). We then monitored the birds individually for defecation for 8 hours post-feeding and collected fresh excreta from the ground immediately upon defecation using sterile, disposable polypropylene spatulas. 
For each excreta sample, we documented the source bird and the time since feeding. Samples were stored in plastic bags on ice and transferred to a −20°C freezer for long-term storage.
Bird identifier | Breed | Tm | Za | Ad | Lm | Pt | Hi | Gm | Ms | Bm | Excreta 1–3 h | Excreta 4–6 h | Excreta after 6 h | Total excreta |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Perth | Columbian Rock Cross | X | X | X | X | X | X | X | X | X | 3 | 3 | 1 | 7 |
Loretta | Red Sexlink | X | 0 | 0 | X | X | X | X | X | X | 2 | 1 | 1 | 4 |
Sara | Ameraucana | X | 0 | X | X | X | X | X | X | X | 2 | 3 | 1 | 6 |
Aretha | Black Sexlink | X | X | X | X | X | X | X | X | X | 2 | 2 | 1 | 5 |
Patsy | Partridge Cochin | X | 0 | X | X | X | X | X | X | X | 3 | 2 | 1 | 6 |
Gertrude | Buff Orpington | 0 | 0 | X | X | X | X | X | X | X | 1 | 3 | 1 | 5 |
- Columns Tm to Bm list the consumed items: X indicates that the insect was consumed by the individual bird and 0 means it was not consumed. The excreta columns give the number of excreta collected in each interval (hours post-feeding). Ad, Acheta domesticus; Bm, Bombyx mori; Gm, Galleria mellonella; Hi, Hermetia illucens; Lm, Locusta migratoria; Ms, Manduca sexta; Pt, Protophormia terraenovae; Tm, Tenebrio molitor; Za, Zophobas atratus.
DNA extraction and quantification
To compare the efficacy of DNA extraction kits, each excreta sample was divided into three roughly equal subsamples (excluding urate and liquid portions of the excreta), each weighing 220–250 mg, and genomic DNA extractions were completed using three commonly used commercial DNA extraction kits: QIAamp Fast DNA Stool Mini Kit (FS), QIAamp PowerFecal Pro DNA Kit (PF), and DNeasy PowerSoil Pro Kit (PS) (QIAGEN). DNA was extracted following the manufacturer's instructions for each kit; however, for the QIAamp PowerFecal Pro DNA Kit and DNeasy PowerSoil Pro Kit, we used a bead-beating machine set to 10 min at maximum speed (Mini-BeadBeater-96 Homogenizer, Cole-Parmer) for cell lysis. To establish reference sequences for all known food items, insect and plant DNA was extracted from the aforementioned food items using DNeasy Blood and Tissue Kits (QIAGEN) and DNeasy Plant Mini Kits (QIAGEN), respectively, following the manufacturer's recommendations; we refer to these as “direct-extracted positive control items.” DNA was quantified using a Thermo Scientific™ NanoDrop™ One C Spectrophotometer and stored at −20°C. DNA quality and quantity measurements can be found in Table S1, Supporting Information 2. We also performed all subsequent library preparation, sequencing, and analysis steps on these positive control food items. In addition, DNA was extracted from fresh store-bought broccoli (Brassica oleracea) and Nymphalis antiopa (Lepidoptera: Nymphalidae) to use as positive controls for successful PCR amplification (run on check gels, but not sequenced).
DNA amplification and sequencing
Illumina libraries were prepared using a two-step PCR protocol. Locus-specific primers containing an overhang for integrating Illumina adapters were used in the first PCR, and individual-specific i5 and i7 Illumina adapters were then added in a second step-out PCR. For arthropod identification, three pairs of arthropod-specific primers (Zeale et al. 2011; Geller et al. 2013; Leray et al. 2013; Elbrecht et al. 2017) were used to amplify different amplicons from the mitochondrial gene COI. For plant identification, the large subunit of the ribulose-bisphosphate carboxylase gene (rbcL), the intron region of a chloroplast tRNA gene (trnL), and internal transcribed spacer 2 (ITS2) (Yang et al. 2016; Erickson et al. 2017; Moorhouse-Gann et al. 2018) were amplified using three primer pairs (see Table 2 for primer details). PCR was carried out in 40-μL reactions containing 2 μL (10–100 ng μL−1) of DNA template, 1 unit of KAPA HiFi DNA polymerase (Kapa Biosystems), 0.8 μL of 10 mM dNTPs, 8 μL of 5X KAPA HiFi Buffer, 2 μL of 10 μM forward primer, 2 μL of 10 μM reverse primer, and 25 μL PCR-grade water. PCR cycling consisted of an initial denaturation at 95°C for 3 min, followed by 35 cycles of 98°C for 30 s, 50°C for 30 s, and 72°C for 30 s, and a final extension at 72°C for 10 min. Every PCR batch included a positive control reaction (broccoli or Nymphalid DNA) and a negative control reaction (DNA-free water instead of DNA template) to check for PCR amplification success and DNA contamination, respectively. PCR products were visualized using 1.5% agarose gel electrophoresis. Twenty-five microliters of successfully amplified products were purified using Sera-Mag™ Magnetic SpeedBeads™ (GE Healthcare Life Sciences) (Rohland & Reich 2012), and cleaned libraries were quantified using a Qubit 4 Fluorometer (Invitrogen, USA) and pooled in equimolar amounts following the guide for 16S Metagenomic Sequencing Library Preparation for the Illumina MiSeq system. 
Paired-end 300-bp sequencing was conducted on the final pooled library using Illumina MiSeq V3 chemistry.
Primer name | Sequence (5′−3′) | Amplifiable region (amplicon size) | Target taxa | References |
---|---|---|---|---|
mlCO1intF | GGWACWGGWTGAACWGTWTAYCCYCC | COI (313 bp) | Insecta | Leray et al. (2013) |
jgHCO2198 | TAIACYTCIGGRTGICCRAARAAYCA | | | Geller et al. (2013) |
ZBJ-ArtF | AGATATTGGAACWTTATATTTTATTTTTGG | COI (211 bp) | Insecta and bats | Zeale et al. (2011) |
ZBJ-ArtR | WACTAATCAATTWCCAAATCCTCC | | | |
BF1 | ACWGGWTGRACWGTNTAYCC | COI (316 bp) | Animalia | Elbrecht et al. (2017) |
BR2 | GCHCCHGAYATRGCHTTYCC | | | |
rbcL-forward | CTTACCAGYCTTGATCGTTACAAAGG | rbcL (380 bp) | Plantae | Erickson et al. (2017) |
rbcL-reverse | GTAAAATCAAGTCCACCRCG | | | |
trnL c | CGAAATCGGTAGACGCTACG | trnL (164–197 bp) | Plantae | Yang et al. (2016) |
trnL h | CCATTGAGTCTCTGCACCTATC | | | |
UniPlantF | TGTGAATTGCARRATYCMG | ITS2 (187–387 bp) | Plantae | Moorhouse-Gann et al. (2018) |
UniPlantR | CCCGHYTGAYYTGRGGTCDC | | | |
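
Several of the primers in Table 2 contain IUPAC ambiguity codes (e.g. W, Y, R, H, N), so each "primer" is in practice a pool of related oligonucleotides. The following Python sketch is illustrative only and is not part of the study's pipeline: it counts how many unambiguous sequences a degenerate primer encodes and checks whether a candidate binding site is compatible with it. The function names are hypothetical, and treating inosine (I, as in jgHCO2198) as pairing with any base is an assumption of the sketch.

```python
from math import prod

# Bases represented by each IUPAC nucleotide code.
# "I" (inosine) is not an IUPAC ambiguity code; treating it as compatible
# with all four bases is an assumption made for this sketch.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "I": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Count the distinct unambiguous sequences a degenerate primer encodes."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def matches(primer: str, site: str) -> bool:
    """Return True if an unambiguous site is compatible with the primer
    at every position (same length, same orientation)."""
    return len(primer) == len(site) and all(
        s in IUPAC[p] for p, s in zip(primer.upper(), site.upper())
    )
```

For instance, `degeneracy("ACWGGWTGRACWGTNTAYCC")` (BF1) multiplies 2 × 2 × 2 × 2 × 4 × 2 over its six ambiguous positions, whereas the fully specified trnL c primer has a degeneracy of 1.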
Bioinformatic and statistical analysis
Bioinformatic analysis was performed in QIIME 2 version 2022.11 (Bolyen et al. 2019), and default parameters were used unless otherwise noted. A Phred quality score greater than 20 (which translates to 99% base-call accuracy) was required during demultiplexing. Raw sequences were quality filtered using the q2-demux plugin, followed by denoising, chimera removal, singleton removal, joining of denoised paired-end reads, and dereplication with DADA2 (Callahan et al. 2016). Amplicon sequence variants (ASVs) were clustered at 100% identity, so that even a single nucleotide difference between sequences is called as a unique variant (feature). Taxonomic assignments were performed against custom databases using the q2-feature-classifier plugin (Bokulich et al. 2018). We created custom databases for the COI, ITS2, rbcL, and trnL regions using RESCRIPt's “extract-seq-segments” protocol (Robeson et al. 2021). Full details are provided in Supporting Information 1, including sequence gathering, cleaning, increasing the reference pool, and building and evaluating the reference databases (“classifiers” in the QIIME system).
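
The correspondence between a Phred score of 20 and 99% base-call accuracy follows from the standard definition of the Phred scale, Q = −10 log10(P), where P is the probability that a base call is wrong. A minimal Python sketch of that conversion is shown below; the function name is hypothetical and the snippet is not taken from the QIIME 2 workflow.

```python
def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score Q into base-call accuracy.

    The error probability is P = 10 ** (-Q / 10), so accuracy is 1 - P.
    """
    return 1.0 - 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # Print the accuracy implied by a few common thresholds.
    for q in (10, 20, 30):
        print(f"Q{q}: {phred_to_accuracy(q):.3%} accuracy")
```

Under this definition, raising the threshold from Q20 to Q30 reduces the expected error rate tenfold, from 1 in 100 to 1 in 1000 bases.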
Species determinations were made when a query sequence had >98% similarity to a record in the database. Query sequences matching at <98% were assigned to higher taxonomic levels (i.e. genus, family, and order). Further analysis was conducted on the taxonomic tables (Tables S2–S7, Supporting Information 2) containing assigned taxa (identified features) for each sample and the number of reads. To clean the dataset, ASVs that could not be identified to at least family level or ASVs with a count fraction <0.01% were removed from the taxonomic table to avoid possible sequence data contamination (as in Crisol-Martínez et al. 2016). Cleaned datasets were then used to calculate the “recovery proportion” for each marker/kit and overall, which is the number of positive control food items detected in a sample divided by the total number of positive control items consumed by the bird that produced the sample. For example, if a chicken fed on only 10 species of positive control insects in the feeding trial and we detected five species, the recovery proportion was 0.5 (or 50% of positive control items found). In addition, the detection success rate of each item was calculated as the number of samples containing that positive control item divided by the total number of tested samples.
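
The cleaning rule and the two summary metrics defined above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the record layout (dictionaries with `taxon`, `family`, and `reads` keys) and the function names are hypothetical, and the count fraction is assumed to be computed relative to the total reads of the sample.

```python
def clean_asvs(asvs, min_fraction=0.0001):
    """Drop ASVs without a family-level assignment or below the cutoff.

    `asvs` is a list of dicts with hypothetical keys "taxon", "family",
    and "reads". The default 0.0001 corresponds to the 0.01% threshold;
    the fraction is taken relative to the sample's total reads (assumption).
    """
    total = sum(a["reads"] for a in asvs)
    if total == 0:
        return []
    return [
        a for a in asvs
        if a["family"] is not None and a["reads"] / total >= min_fraction
    ]


def recovery_proportion(detected_taxa, consumed_items):
    """Positive control items detected in one sample, divided by the
    number of positive control items the source bird consumed."""
    consumed = set(consumed_items)
    return len(consumed & set(detected_taxa)) / len(consumed)


def detection_success_rate(n_samples_with_item, n_samples_tested):
    """Percentage of tested samples in which a given item was detected."""
    return 100.0 * n_samples_with_item / n_samples_tested
```

With the worked example from the text, a bird that consumed 10 items of which five were detected gives `recovery_proportion(...) == 0.5`. As a purely hypothetical count, an item found in 40 of 49 tested samples would give a detection success rate of about 81.63%.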
The statistical analyses were conducted in R version 4.2.1 (R Core Team 2022) using RStudio. A generalized linear mixed model (GLMM) was fit to each of the response variables using the package glmmTMB (Brooks et al. 2017), and the effects in each model were tested with the function Anova.glmmTMB. A stepwise model selection approach was used until the best-performing DNA extraction kit and primer for each of the plant and insect datasets were found. First, we tested whether the recovery proportion of positive control DNA was influenced by the DNA extraction kits and which kits provided the best PCR product detection. To do this, we used DNA extraction kit as a fixed effect, with primer and individual bird as random effects, essentially to account for the pairing of subsamples from the same fecal event. Once the most efficient extraction kit was identified, we tested which primers gave the highest amplification success. The “primer” model used only the data from the best-performing extraction kit, with primer as a fixed effect and individual bird as a random effect. Next, the data from the best-performing DNA extraction kit and primer were used to test whether the sampling time interval influenced the recovery proportion. Sampling time post-feeding was grouped into three time intervals (1–3, 4–6, and after 6 hours post-feeding), and the time interval was used as a fixed effect, while bird, kit, and primer were random effects within the model. Last, we tested whether the recovery proportion was influenced by individual birds, with each bird used as a fixed effect.
In addition to evaluating the recovery proportion of diet items, we ran additional models to investigate whether the detectability of each positive control item was influenced by time post-feeding. For this purpose, glmmTMB was fitted with a binomial distribution with a presence/absence (0,1) response variable and included continuous time post-feeding (0–480 minutes) as a fixed effect and bird as a random effect.
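
Written out, a model of this kind can be expressed as below. This is a sketch under stated assumptions rather than the authors' exact specification: it assumes the default logit link of the binomial family and a random intercept for bird, and the symbols (y, p, t, u, the betas, and sigma) are introduced here for illustration only.

```latex
% y_{ij}: presence (1) or absence (0) of a given food item in sample i from bird j
% t_{ij}: time post-feeding in minutes (0 to 480)
% u_j:    random intercept for bird j (assumed structure)
\begin{align}
  y_{ij} &\sim \mathrm{Bernoulli}(p_{ij}), \\
  \operatorname{logit}(p_{ij}) &= \beta_0 + \beta_1 t_{ij} + u_j, \\
  u_j &\sim \mathcal{N}(0, \sigma_u^2).
\end{align}
```

Under this formulation, a positive estimate of the slope β1 means that the odds of detecting the item increase with time post-feeding, and each item is tested with its own model.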
For the data subset produced with Zeale et al.’s primers (Zeale et al. 2011), we found poor amplification success across the majority of the samples, and the amplifiable products yielded very low DNA concentration resulting in sequencing failure (a result also found in other studies [Forsman et al. 2022]). Thus, we removed this primer set from the dataset before analysis with glmmTMB to prevent biased model performance.
RESULTS
Descriptive statistics
In total, we collected 33 excreta samples from six laying hens (Table 1). All 33 excreta samples were extracted using the PF kit, but only 16 samples were extracted with each of the PS and FS kits because of limited fecal material. For the rest of this paper, we refer to a single “sample” as one excreta sample extracted with a single extraction kit and amplified with a single primer pair. A total of 65 subsamples (PF = 33, PS = 16, and FS = 16) were amplified by the selected primers as shown in Table 2. The libraries generated 15 740 930 raw read pairs based on COI, ITS2, rbcL, and trnL amplification. After quality control filtering, 7 787 149 read pairs were retained. The number of filtered sequences from each quality control step and specific criteria for filtering are shown in Table 3. The cleaned sequences were clustered into 5712 ASVs for COI, 744 for ITS2, and 368 and 169 for rbcL and trnL, respectively. Raw QIIME output for all loci is provided in Tables S2–S7, Supporting Information 2.
Locus | Raw reads | Denoised reads | Merged reads | Percentage of input merged | Non-chimeric reads | Percentage of input non-chimeric | Cleaned reads | ASVs | Min length | Max length |
---|---|---|---|---|---|---|---|---|---|---|
COI | 4 646 109 | 2 306 765 | 2 171 948 | 29.82 | 303 923 | 5.79 | 844 016 | 5712 | 286 | 549 |
ITS2 | 2 475 924 | 1 697 429 | 1 602 307 | 44.62 | 262 685 | 9.71 | 606 167 | 744 | 285 | 492 |
rbcL | 808 616 | 596 582 | 558 458 | 50.38 | 467 943 | 43.01 | 448 645 | 348 | 270 | 507 |
trnL | 7 810 281 | 40 661 | 29 913 | 0.66 | 29 493 | 0.66 | 5 888 321 | 169 | 221 | 428 |
- COI, cytochrome oxidase subunit I; ITS2, internal transcribed spacer 2.
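
The totals reported in the text can be cross-checked against the per-locus counts in Table 3 and the per-kit subsample counts. The short Python sketch below is only an arithmetic check on numbers copied from the table and text; the variable names are hypothetical.

```python
# Per-locus read pairs copied from Table 3: (raw reads, cleaned reads).
reads = {
    "COI": (4_646_109, 844_016),
    "ITS2": (2_475_924, 606_167),
    "rbcL": (808_616, 448_645),
    "trnL": (7_810_281, 5_888_321),
}

# Subsamples per extraction kit, as reported in the text.
subsamples = {"PF": 33, "PS": 16, "FS": 16}

raw_total = sum(raw for raw, _ in reads.values())
cleaned_total = sum(cleaned for _, cleaned in reads.values())
retained_pct = 100 * cleaned_total / raw_total
subsample_total = sum(subsamples.values())

if __name__ == "__main__":
    print(f"raw read pairs:     {raw_total}")
    print(f"cleaned read pairs: {cleaned_total}")
    print(f"retained:           {retained_pct:.2f}%")
    print(f"subsamples:         {subsample_total}")
```

The per-locus raw and cleaned counts sum to the 15 740 930 and 7 787 149 read pairs given in the text, which corresponds to roughly half of the raw read pairs being retained after filtering.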
Overall, we detected DNA belonging to five kingdoms of eukaryotes, with 89% of the amplified reads matching plants, 11% matching Animalia and Fungi, and very few reads matching Protozoa and Chromista (Table S8, Supporting Information 2). The assigned ASVs (of both positive control items and non-positive control taxa) were taxonomically classified into 23 orders and 39 families of land plants and 14 orders and 22 families of terrestrial animals. For plant detection, Poaceae was the most abundant plant family in terms of read counts and taxonomic variation (Tables S9, S11, Supporting Information 2). The other common families were Fabaceae and Chenopodiaceae, while Solanaceae, Primulaceae, and Convolvulaceae were rarely found. For insects, the most abundant sequences belonged to the families Acrididae (Orthoptera) and Sphingidae (Lepidoptera). Only a few insect species other than the positive control items fed to the birds were detected, and these included Mimeoma maculata (Coleoptera: Scarabaeidae), Celticecis spiniformis (Diptera: Cecidomyiidae), and Anoplocnemis phasianus (Hemiptera: Coreidae) (Tables S10, S12, Supporting Information 2). Since the chosen primers are universal for metazoans, many animals including Arthropoda, Annelida, Mollusca, Nematoda, and Platyhelminthes were also co-amplified and detectable in the backyard bird fecal samples (Table S12, Supporting Information 2). For non-positive control plant items, 54 species in 72 genera from 34 families were detected across plant libraries, including a wide range of plants that were in close proximity to this habitat (personal observation). Regarding the efficacy of these markers at delimiting positive control food items, all of the positive control food items could generally be distinguished by all markers, at least to the genus level, and no two positive control food items belonged to the same genus.
Positive control food item tests
Plant loci
Using three loci, we detected DNA from 12 of 13 positive control plant items (Fig. 1); 10 were detected with ITS2, while only 7 and 4 were detected with rbcL and trnL, respectively, and cumulative detection mirrored this pattern (Fig. S3, Supporting Information 1). Each primer pair was able to detect unique food items (Fig. 1). Although carrot (D. carota) was included as a positive control item as part of the processed poultry treat mix, it could not be detected with any primer pair. We did detect carrot in the direct-extracted positive control samples, although at very low read counts compared with other directly extracted food items. This could be due to low concentrations of carrot as a component of the recipe, or to DNA degradation from the production process (particularly heating and drying); the proportions of ingredients in this poultry treat mix and the processing steps were unfortunately proprietary, so we are unable to explore this further. The detection success rate for each plant species ranged from 2.04% to 81.63%. The most commonly detected plant DNA belonged to C. quinoa (81.63%), H. vulgare (81.63%), and T. aestivum (79.59%), followed by H. annuus (77.55%), A. hypogaea (61.22%), and L. usitatissimum (48.98%). Solanum lycopersicum and Z. mays were poorly detected, each at around 2% (Fig. 2).


The GLMM comparing the effect of DNA extraction kits on the recovery proportion of plant items revealed a significant effect of extraction kits (P < 0.0001). A pair-wise post hoc analysis revealed that the FS kit had a lower recovery proportion of plant items relative to the PF (P < 0.0001) and PS kits (P = 0.0002), while the recovery proportions between the PF and PS kits did not differ (P = 0.9735). We then compared the effect of each targeted locus on the recovery of the diet items with the FS kit removed from the dataset. The comparison of each locus revealed that the recovery proportion of plant-positive control items was significantly influenced by the targeted locus (P < 0.0001). A pair-wise post hoc analysis revealed that the ITS2 locus had the highest recovery proportion of diet items relative to rbcL (P < 0.0001) and trnL (P < 0.0001), but there was no difference in recovery proportion between rbcL and trnL (P = 0.4432). We compared the effect of sampling time interval post-feeding on the recovery proportion of the plant items in excreta samples for the PF and PS kits and ITS2 primer only, which were the best-performing kits and locus. The GLMM confirmed a significant effect of sampling interval on recovery proportion (P < 0.0001). A pair-wise post hoc analysis revealed that collecting samples after 6 hours post-feeding had the highest recovery proportion of plant DNA relative to 1–3 hours post-feeding (P < 0.0001) and 4–6 hours post-feeding (P = 0.0255), and collecting samples 4–6 hours post feeding had higher recovery proportion of plant DNA than collecting 1–3 hours after feeding (P = 0.0082) (Fig. 3). Additionally, the GLMM analysis suggested that individual birds affected recovery proportion of plant positive control DNA (P = 0.0416).

We observed that for overall recovery proportion, the sampling time interval contributed to the amplification success of the plant positive control DNA. However, this pattern may be biased by species-specific effects. To test whether the DNA detectability of each food item individually was influenced by time post-feeding, we fit binomial models to evaluate the probability of the detection of each species over time (with ITS2 alone, again as the best-performing locus). The probability of detecting P. miliaceum (P = 0.0017), C. quinoa (P = 0.0026), and L. usitatissimum (P = 0.0005) increased with time post-feeding. Similarly, using the data obtained from loci rbcL and trnL, the probability of detecting P. miliaceum (P = 0.0443), C. quinoa (P = 0.0001), L. usitatissimum (P = 0.0031), A. hypogaea (P = 0.0059), and O. sativa (P = 0.0023) increased with time post-feeding (Fig. S1, Supporting Information 1).
Arthropod loci
Using two primer pairs, DNA from all nine insect positive control items was detectable in the combined libraries. The primer pairs from Geller et al. (2013)/Leray et al. (2013) and from Elbrecht et al. (2017) performed equally; each of them was able to detect all nine insect positive control items, and the detection success rate for positive control items ranged from 23.53% to 93.88% across samples. The most commonly detected food items were M. sexta (93.88%) and L. migratoria (89.80%), followed by G. mellonella (75.51%), B. mori (65.31%), and P. terraenovae (63.27%), while A. domesticus, T. molitor, and Z. atratus were less frequently detected (23.53–41.86%) (Fig. 4). Cumulatively, these two primer pairs performed equally (Fig. S3, Supporting Information 1).

The GLMM comparing the effect of the DNA extraction kit on the recovery proportion of insect positive control items revealed a significant effect of the extraction kit (P < 0.0001). A pair-wise post hoc analysis revealed that the FS kit had a lower recovery proportion of diet items relative to the PF (P < 0.0001) and PS kits (P < 0.0001), but the PF and PS kits did not significantly differ (P = 0.8512). Thus, we removed the FS kit data and proceeded to compare the effect of the “targeted locus” (which in the case of COI means alternative primer pairs for the same locus) on the recovery proportion of insect positive control items. A GLMM comparing the effect of primer pairs on the recovery proportion of insect positive control items revealed no significant difference between the Geller/Leray (mlCO1intF/jgHCO2198) and Elbrecht (BF1/BR2) primer pairs (P = 0.2726) (Fig. 5). The GLMMs comparing either the effect of post-feeding sampling time interval or variation among individual birds on recovery proportion revealed that there was no significant effect for either variable (P = 0.5982 and P = 0.5174, respectively).

For individual species, we found no relationship between time post-feeding and the probability of DNA detection for any insect-positive control item except Z. atratus (P = 0.0316) (Fig. S2, Supporting Information 1). With insect and plant food items combined, cumulative diet items detected over time on a per-bird basis followed similar trajectories (Fig. S4, Supporting Information 1).
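A per-bird cumulative detection curve of the kind summarized above can be computed as a running union of detected items over time-ordered samples. The sketch below is illustrative only: the function name `cumulative_items`, the item labels, and the sampling times are hypothetical.

```python
def cumulative_items(samples):
    """Cumulative count of distinct items detected for one bird.

    samples: list of (minutes_post_feeding, set_of_detected_items).
    Returns a list of (time, number of distinct items seen so far),
    ordered by sampling time.
    """
    seen = set()
    curve = []
    for time, items in sorted(samples, key=lambda s: s[0]):
        seen |= set(items)
        curve.append((time, len(seen)))
    return curve

# Toy data (hypothetical) for a single bird, given out of order.
bird_samples = [
    (120, {"item_A", "item_B"}),
    (44, {"item_A"}),
    (300, {"item_B", "item_C"}),
]
curve = cumulative_items(bird_samples)
```

Because the curve counts distinct items, it can only stay flat or rise, which makes trajectories comparable across birds.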
DISCUSSION
In DNA-based diet analyses, positive control feeding trials are an undervalued validation procedure. Our study is the first, to our knowledge, to implement feeding trials with both plant and animal diet items in a DNA metabarcoding-based diet analysis. Conducting such species-specific feeding trials is essential to robust analysis because it verifies the applicability and reliability of the approach and thus helps to avoid bias in interpretation when such methodologies are applied to field studies. These results will guide future research in the use of positive control feeding trials for DNA metabarcoding-based diet analysis and serve as the foundation for future research into the agroecological roles of poultry, and we discuss these impacts below.
Positive control feeding trials
Although DNA metabarcoding diet analysis holds some advantages over traditional observation and microscopic techniques, highly diverse diets, such as those of omnivorous animals, still present challenges to these methodologies. Taxon-specific amplification bias (Silva et al. 2019; Ando et al. 2020; Tosa et al. 2023), limitations of the universality of markers (Dupuis et al. 2012) or available databases (Coissac et al. 2012), and logistical considerations such as technical and biological replication (Mata et al. 2019) can all impact the efficacy of metabarcoding approaches with diverse targets. Feeding trials with known positive control food items are one approach to ameliorate potential biases. However, such feeding trials have generally been limited to strictly herbivorous or carnivorous diets and few diet items. For example, Nakahara et al. (2015) conducted a feeding trial in captive sika deer (Cervus nippon) with five known plant species and the chloroplast trnL region and observed a mismatch between the proportions of food items consumed and resulting sequences. Srivathsan et al. (2015) compared the efficacy of metabarcoding and metagenomics approaches in characterizing the diet of two red-shanked douc langurs (Pygathrix nemaeus) that were fed a known diet including 15 species of foliage, fruits, vegetables, and cereals, and <40% of food items were detected with metabarcoding. Thalinger et al. (2017) conducted a feeding trial on piscivorous birds, great cormorants (Phalacrocorax carbo), by feeding up to six fish species in varying quantities to assess meal size effects on post-feeding prey DNA detectability and concluded that numerous fecal samples were required to detect small or rarely consumed prey. Finally, Thuo et al. (2019) evaluated the effectiveness of DNA metabarcoding for diet analysis in two cheetahs (Acinonyx jubatus) by feeding them six species of prey animal, including birds and mammals, and found that DNA detection was influenced by scat age and degradation, meal size, and prey species consumed. Our study is the first to implement feeding trials in DNA metabarcoding-based diet analysis for both plant and arthropod diet items in an omnivore, and we included 22 species of food items in our feeding trial (although Oehm et al. (2011) conducted feeding trials and DNA-based diet analysis in an omnivore [carrion crow Corvus corone], they only used insects in their feeding trial). Similar to some of the aforementioned feeding trial studies, we found highly variable detection of diverse food items, particularly for plant food items, and can conclude that multiple fecal samples and/or multiple analyzed loci would generally be required for a comprehensive record of diet items. Although they did not use feeding trials, De Barba et al. (2014) studied the diet of the brown bear (Ursus arctos, another generalist omnivore) using metabarcoding with multiple markers to identify both plants and prey animals consumed and found that the multiplexing strategy could identify upward of 60% of targeted prey to species or genus level, which is within the range of detectability observed here.
In general, prey DNA detectability in omnivorous diets is influenced by the prey being consumed, the predator species in question, and how much prey was consumed. Schattanek et al. (2021) found that the same multi-insect meal given to different bat species could be detected after short digestion times. However, prey items fed in larger quantities could be detected longer than those fed in smaller quantities. Prey identity also had an effect on DNA detectability post-feeding, with mealworm DNA being detectable longer than that of wax moths or house flies (Schattanek et al. 2021). Unfortunately, for our study, the end point of DNA detectability and the influence of meal size were not measured, and thus we have no evidence indicating the total time that each food item could be detected. However, for poultry specifically, our results indicate that 6 hours post-feeding is a suitable sampling time for capturing the realistic daily diet composition of pastured birds; during this period, DNA from all detectable plant- and insect-positive control items was detected, while earlier sampling times often failed to detect some hard-bodied insect prey, for example, the beetles T. molitor and Z. atratus.
DNA extraction kits and target loci
Our results indicated that multiple factors contributed to the success in recovering positive control DNA from fecal samples, including the choice of DNA extraction kit and the targeted loci. Extracting genomic DNA from fecal samples is difficult compared to other DNA sources (e.g. blood, muscle tissue), and obtaining DNA from the excreta of avian species has proved even more challenging since the mixture of fecal and urinary excretions adds purification challenges for commercial DNA extraction kits (Eriksson et al. 2017). We found that separating the fecal and urate portions of excreta and using DNA extraction kits involving bead homogenization (PF and PS) provided high-quality DNA. For highly degraded samples, DNA loss can be substantial during the typical column-based extraction process (Kemp et al. 2014), and additional mechanical disruption of cells via bead homogenization helps to release DNA from cells and thus increases DNA availability downstream. Additionally, for omnivores, the bead homogenization step may help to grind up fecal material from hard food items (e.g. grain, seed coat). Eriksson et al. (2017) also found that a bead-beating pretreatment followed by an optimized silica column-based extraction protocol greatly improved DNA concentration for samples from wild Antarctic birds. In avian systems, several studies have compared the efficacy of DNA extraction kits for use with fecal samples but focused on broadly different extraction procedures (e.g. CTAB vs. column-based vs. paramagnetic bead-based extraction; Oehm et al. 2011; Jedlicka et al. 2013; Vo & Jedlicka 2014); while Oehm et al. (2011) found that CTAB extractions outperformed commercial kits, this was only the case with small avian excreta samples, and the other studies focused explicitly on excreta from a small bird species (western bluebird Sialia mexicana). More comprehensive testing of non-column-based extraction kits with modifications for large excreta samples, such as those from poultry, would be useful for future studies.
Another factor that significantly affected fecal DNA detectability is the target locus that was used for sequencing. Many studies have used only one marker/locus to study diet. For example, the chloroplast trnL P6 loop region is commonly used for plant identification when studying herbivores (Nakahara et al. 2015; Srivathsan et al. 2015), and the mitochondrial COI or 12S rRNA gene is often used to identify species of arthropod, fish, reptile, and mammal prey in carnivores (Oehm et al. 2011; Egeter et al. 2015; Thalinger et al. 2017; Thuo et al. 2019; Miller-ter Kuile et al. 2021; Schattanek et al. 2021). Here, we show that using a multi-locus approach increases taxonomic resolution and delimitation power in omnivorous diets and thus facilitates species identification (Dupuis et al. 2012; Tosa et al. 2023; Zhang et al. 2023). Our plant results, in particular, highlight that a multi-locus approach is generally required when amplifying fecal DNA to recover the full diet diversity of plant food items. There is no clear conclusion in the literature on which plant barcoding marker is best for diet analysis studies, and four plant markers are commonly used in plant DNA barcoding (Hollingsworth et al. 2009; Kress 2017). We compared three loci for metabarcoding of plant-positive control items and found that ITS2 outperformed both rbcL and trnL, although each of the latter two detected unique positive control items. In addition to positive control species, ITS2 also detected the greatest diversity of non-positive-control plant species (Table S8, Supporting Information 2). Chen et al. (2010) found that using ITS2 as a barcoding region could identify medicinal plants accurately with >90% success for species-level identification (6600 specimens belonging to 4800 plant species). Similarly, several studies testing the efficacy of ITS2 confirmed its ability to identify plant species in a variety of groups including dicotyledons, monocotyledons, gymnosperms, ferns, and mosses (Gao et al. 2010; Yao et al. 2010; Pang et al. 2011; Gu et al. 2013; Feng et al. 2016). Although the trnL region is frequently used for metabarcoding diet analysis, some studies have found that it was limited in differentiating congeners, which restricted prey/food item identification to the genus or family level (Scasta et al. 2019; Tosa et al. 2023). A low discriminatory power has also been demonstrated for rbcL (and is supported here by the poor species-level identification success with this locus), although this limitation can be mitigated by combining it with various plastid or nuclear loci for land plant identification at the family and genus levels (Li et al. 2015; Mallott et al. 2018).
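The benefit of a multi-locus approach can be made concrete by taking the union of taxa detected across loci and identifying taxa detected by only one locus. The following sketch is illustrative only: the function name `combine_loci` and the placeholder taxa are hypothetical and do not represent results from this study.

```python
def combine_loci(detections):
    """Combine per-locus detections.

    detections: dict mapping locus name -> set of detected taxa.
    Returns (all taxa detected by any locus,
             dict mapping locus -> taxa detected only by that locus).
    """
    combined = set().union(*detections.values())
    unique = {}
    for locus, taxa in detections.items():
        # Taxa detected by any of the other loci.
        others = set().union(
            *(t for name, t in detections.items() if name != locus)
        )
        unique[locus] = set(taxa) - others
    return combined, unique

# Toy data (hypothetical placeholder taxa).
detections = {
    "ITS2": {"taxon_A", "taxon_B", "taxon_C"},
    "rbcL": {"taxon_A", "taxon_D"},
    "trnL": {"taxon_B", "taxon_E"},
}
combined, unique = combine_loci(detections)
```

In this toy example, no single locus recovers all five taxa, whereas the combined set does, which mirrors the reasoning for using multiple plant loci.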
Implications for poultry research
An additional impetus for this research was to establish methodologies for poultry diet analysis in agroecological research. Ongoing projects will use this approach in assessing the impact of poultry in small-scale organic agricultural systems (e.g. extending the work of Garcia et al. (2023)), and thus several important considerations and limitations about using DNA-based diet analysis in poultry systems deserve to be revisited. First, we found that sampling >6 hours post-feeding resulted in the most accurate estimates of diet diversity with positive control food items. Minimum detection time was 44 minutes, which is similar to that of other relatively large, omnivorous birds (carrion crows, 30-minute detection time; Oehm et al. 2011), although such “early” sampling may increase the chance of false negatives, particularly with diverse arthropod prey where soft- and hard-bodied species may be present. Indeed, we found that superworm larvae (Z. atratus), a beetle with relatively hard-bodied larvae, were minimally detectable in the early sampling period, while other insects were detected (Fig. S2, Supporting Information 1). Our sampling did not encompass the time at which food items could no longer be detected, so we cannot use the current data to estimate retention time. However, for field studies, this six-hour period will be useful in gauging a chicken's diet during a normal daytime activity period.
As with all sequencing-based diet analysis techniques, these data do not allow us to quantify the amount of any specific diet items as a function of relative read frequency, and we are limited to presence/absence-based conclusions. Additionally, these approaches are unable to identify the life stage of food items (e.g. larval vs. adult insects, seeds vs. foliage of plants). To gain a more complete picture of pastured poultry ecosystem functions and bird feeding behavior, sequencing-based approaches should be combined with other methods, for example, histology, camera recording, direct observation, GPS-collared tracking, and stable isotope analysis, as other authors have suggested (Clark & Gage 1996; Kerley et al. 2015; Murray et al. 2016; Lin et al. 2021).
CONCLUSION
Despite their value in ensuring reproducible and robust results, few metabarcoding diet analysis studies have utilized experimental feeding trials and positive control food items. Here, we present the first use of such an approach with both plant- and animal-positive control diet items in an omnivore with a diverse diet. While all insect food items were detected in our chicken feeding trial, we found highly variable results for plant food items, suggesting that multi-locus approaches are generally required to recover the full diet diversity of these positive controls. Ultimately, the accuracy of DNA metabarcoding-based diet analyses is highly dependent on marker choice and other aspects of the methodological approach, highlighting the need for protocol validation for each system. These results serve as a foundation for future research into the agroecological roles of poultry, as well as more broadly into the methodologies of DNA-based diet analysis.
ACKNOWLEDGMENTS
This research was supported by USDA-NIFA HATCH grants to J.R.D. (Project KY008091) and D.G. (KY008103) and by USDA OREI Grant #2019-51300-30244 awarded to D.G. Sequencing was conducted at the OncoGenomics Shared Resource Facility of the University of Kentucky Markey Cancer Center (P30CA177558), and we thank the University of Kentucky Center for Computational Sciences and Information Technology Services Research Computing for their support and use of the Lipscomb Compute Cluster and associated research computing resources and the valuable comments from Dr. Dezhi Zhang on an earlier draft of the study.