Funding: The study was funded by a Natural Environment Research Council grant NE/K016148/1 & NE/K016261/1 awarded as part of the Human Modified Tropical Forests Programme, and RD received additional support from The Leverhulme Trust Study Abroad Studentship (SAS-2016-100).

ABSTRACT

Invertebrate-derived DNA (iDNA) has been used successfully to survey mammalian biodiversity in several ecosystems. Yet, as with all sampling methods, this approach is subject to potential biases, including those introduced by the choice of invertebrate sampler and by the stochasticity of DNA amplification during PCR. Occupancy modeling is a statistical framework that can help account for imperfect detection in sampling and can therefore be used to improve iDNA surveys. Using a case study of DNA screened from leech blood meals, we demonstrate how multiscale occupancy models can be applied to the molecular detection of vertebrates to reveal the nuances in iDNA detections. Leeches were collected across a habitat degradation gradient in Sabah, Malaysian Borneo, in 2015 and 2016. We estimated three probabilities describing the occupancy, availability, and detection of three abundant mammals (bearded pig, muntjac, and sambar deer) and compared how these values were affected by environmental and technical covariates. For 2015, null models without covariates revealed no clear differences among taxa in any of the three probabilities. In 2016, however, although the taxa had comparable occupancy, they differed in the other two probabilities, with sambar deer showing the lowest availability and muntjac the lowest detection probability. Univariate models constructed for each taxon and year revealed differential effects of the covariates; for example, a strong positive effect of DNA concentration on the detection of sambar deer and bearded pig was seen in 2016 only. Finally, our estimates of the minimum numbers of biological and technical replicates highlight an important trade-off between achieving high probabilities of availability and detection and keeping sampling effort realistic.
Our results showcase the use of occupancy models for leech-iDNA biodiversity surveys but highlight the potential effects of sample type, methodological design, and sample size.

1 Introduction

It has been shown previously that vertebrate DNA can be obtained from a range of invertebrates, including carrion flies (e.g., Hoffmann et al. 2018) and blood-feeding terrestrial leeches (e.g., Danabalan et al. 2023), mosquitoes (e.g., Chivas et al. 2025), and sandflies (e.g., Kocher et al. 2017). This has led to growing interest in using these taxa as samplers for rapid assessments of vertebrate communities and for measuring and monitoring biodiversity (Gogarten et al. 2020). To date, most invertebrate-derived DNA (iDNA) studies have focused on documenting species presence, with less consideration of the extent to which such methods detect, or fail to detect, species that are present. False-negative rates have rarely been estimated, and error rates in general are poorly reported. Many common practices, such as the number of technical replicates used, are based on convention and budget rather than on data-driven sampling design. Like all survey methods, however, iDNA surveys are prone to imperfect detection, and there is a need to quantify the uncertainty surrounding detections to better understand how studies should be designed and how these methods can be improved to account for sampling biases.

Occupancy modeling is a powerful statistical framework that has been widely adopted for analyzing biodiversity survey data (MacKenzie et al. 2003). The approach is based on the premise that most survey methods are unlikely to detect a species at every site where it is present, and it uses replicate sampling to separate detection from presence (Guillera-Arroita et al. 2010). A particular feature of these models is their ability to incorporate covariates, such as location, habitat quality, or abiotic conditions, that may influence the data collected.

In recent years, occupancy models have been applied to environmental DNA (eDNA) surveys, for example, to track chytrid fungus in amphibians (Schmidt et al. 2013), monitor the invasive Burmese python in Florida (Hunter et al. 2015), and investigate goby fish metapopulation dynamics (Martel et al. 2020), providing estimates of detection probability that are especially valuable for species that are difficult to track. In contrast, few studies have applied occupancy models to iDNA surveys, the exception being studies of detections in leech bloodmeals. For example, Abrams et al. (2019) found that occupancy estimates obtained from leeches were consistent with those obtained from camera traps, although the leech estimates were more uncertain, which the authors suggested might be overcome by increasing sample size. The value of large samples was also supported by a comprehensive assessment of vertebrate occupancy across an entire protected area (Ji et al. 2022), in which an analysis of over 30,000 leeches revealed changes in occupancy with spatial metrics, including distance to the reserve edge. While such studies have demonstrated real-world conservation applications of leech iDNA, they also highlight a trade-off between the lower detection rate of leeches (compared with methods such as camera trapping) and the sampling effort required to make robust estimates.

Given their flexibility, occupancy models applied to iDNA studies also provide a means to explore how detections are affected by different forms of replication. First, taking multiple samples from within a site during a survey is analogous to spatial replication in a traditional survey. Second, PCR replicates subsampled from the same DNA extract can be considered equivalent to temporal replicates in a traditional survey (Dorazio and Erickson 2017). Replicate PCR reactions are commonly used to overcome stochasticity in DNA amplification. This stochasticity is more pronounced at low DNA concentrations (Barnes and Turner 2016), which are common in iDNA studies because DNA degrades during digestion. In leech-based iDNA studies, long inter-feeding intervals and the pooling of “empty” (unfed) individuals can further lower DNA yields. Both biological and technical replication can be incorporated into multiscale occupancy models. A multiscale occupancy model extends the classic two-level occupancy model, which estimates occupancy probability (ψ) and detection probability (p) (MacKenzie et al. 2003), with an additional parameter, availability probability (θ) (Kéry and Royle 2016). In an iDNA survey, the occupancy probability is the probability that DNA of the target animal is present at a site. The availability probability is the probability that the sampled leeches contain DNA of the target animal, given that the animal was at the site. The detection probability is then the probability that the target DNA is amplified in a PCR replicate, given that the DNA was present in the leech sample and the animal was at the site.

To assess the usefulness of multi-scale occupancy models applied to iDNA for revealing subtle differences in mammal presence across spatial, temporal, and technical replicates, here we reanalyze a dataset of mammal detections inferred from leech bloodmeals from Malaysian Borneo. Using multiscale occupancy models, we model three probabilities, occupancy (ψ), availability (θ) and detection (p), as functions of environmental and technical covariates relating to habitat degradation, sampling effort, and DNA concentration. Finally, we estimate the minimum replication needed to confidently detect all three focal taxa, as this remains an important financial and logistical constraint on DNA biodiversity surveys.

2 Materials and Methods

2.1 iDNA Biodiversity Survey and Data Generation

The iDNA data used to fit these occupancy models came from a previous study, Drinkwater, Jucker, et al. (2021), in which the field sampling, molecular, and bioinformatic methods are described in full. Briefly, terrestrial leeches (tiger leech morphospecies) collected at the Stability of Altered Forest Ecosystems (SAFE) project, Sabah, Malaysia, were screened for mammal DNA using metabarcoding. Leeches were collected during repeated surveys in two campaigns (February–June 2015 and September–December 2016). In each campaign, several surveys were conducted at multiple sites in different forest types, categorized here as primary, twice-logged, heavily logged, and riparian forest (for more details on the SAFE project, see Ewers et al. 2011). DNA extracts from leeches sampled within the same site and survey were pooled, amplified in triplicate, and sequenced on an Illumina MiSeq (150 bp paired-end reads) (Table 1). Reads were processed following Drinkwater, Jucker, et al. (2021); however, after length and quality filtering, we retained all sequences regardless of how many PCR replicates they were found in, which allowed us to investigate replicate-level covariates. Reads were clustered into OTUs at 95% similarity with sumaclust v1.3 (Mercier et al. 2013), and chimeras were removed (Schloss et al. 2009). Post-clustering curation was conducted with LULU, which removes OTUs that are likely to have arisen through sequencing error alone (Frøslev et al. 2017), and the curated OTUs were assigned taxonomy using BLAST (Camacho et al. 2009) against the custom reference database.

TABLE 1. Summary of leech pools and detections of the three target mammals from each habitat type in 2015 and 2016. MUSP = muntjac, RUUN = sambar deer, and SUBA = bearded pig. Leeches were pooled based on site and survey date, aiming for 10 per pool where there were more than 10 individuals per site/survey.

Habitat type	Pools	Total leech individuals	Av. number of individuals per pool	MUSP	RUUN	SUBA
2015
Heavily logged	29	271	9.7	4	3	9
Selectively logged	41	425	10.3	10	11	7
Riparian	35	286	8.2	9	1	9
2016
Heavily logged	21	193	9.2	8	16	21
Selectively logged	32	359	11.2	2	8	61
Old growth	23	200	9.1	3	1	15
Riparian	6	107	17.8	1	0	1

2.2 Multi-Scale Occupancy Models

Occupancy models rest on several assumptions, which may not always be fully met. The first assumption is that detections are independent. Although this could be violated if multiple leeches fed on the same large-bodied mammal, our detection probabilities are estimated from pools obtained from different sites and surveys. Second, detection and occupancy are assumed to be constant in space or to vary only with covariates included in the model, which is the strategy taken here: in the models described below, covariates are added at each of the three levels (site, sample, and replicate) to account for variation in the probabilities. Third, sites are assumed not to change occupancy state during the survey (i.e., sampling takes place within a closed season). This assumption is harder to satisfy; however, field sampling took place over a short period of 3–4 months, and given leech feeding behavior and the degradation of DNA in the leech gut (Schnell et al. 2012), we are confident that we detect only the most recent blood meal of each leech. The final assumption is that there are no false-positive identifications. This is also hard to satisfy; we addressed it through bioinformatic filtering protocols and conservative taxonomic assignment against a well-resolved reference database.

We fitted single-species multiscale occupancy models following Dorazio and Erickson (2017) and Hunter et al. (2015), adapted for leech iDNA detections in two seasons (2015 and 2016). From the full dataset, we selected the three taxa with the most detections across all habitat types (heavily degraded forest, twice-logged forest, old-growth forest, and riparian forest): the Bornean bearded pig (Sus barbatus), muntjac (Muntiacus sp.), and sambar deer (Rusa unicolor). Of these, muntjac sequences could be identified only to genus level because of limitations of the genetic marker (see Drinkwater, Jucker, et al. 2021), although the two muntjac species co-occur in the area and are thought to occupy similar habitats (IUCN). We generated detection histories for each taxon by coding a detection in a PCR replicate as 1 and a non-detection as 0, across the spatial and technical replication within these three nested levels of sampling:

Site level—12 sites in 2015 and 10 sites in 2016 in the four forest types
Sample level—multiple leech pools from within each site, from the same survey (Table 1)
Replicate level—three PCR replicates per leech pool

For example, for bearded pig at a single site with three leech pools (samples), each amplified in triplicate (replicates), a hypothetical detection history could be written as {101 111 100}, with the three PCR replicates nested within each pool. Here, there are two detections in the first pool, three in the second, and one in the third.
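
The coding step can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the read counts are invented, and `min_reads` is a hypothetical parameter (set to 1 here, mirroring a threshold of a single retained detection). The sketch converts per-replicate read counts for one taxon at one site into the nested 0/1 history and prints it in the {101 111 100} notation.

```python
from typing import List

Pool = List[int]  # one leech pool: one entry per PCR replicate


def to_detection_history(read_counts: List[Pool], min_reads: int = 1) -> List[Pool]:
    """Code each PCR replicate as 1 (detected) or 0 (not detected).

    `min_reads` is a hypothetical threshold: a replicate counts as a
    detection if the taxon has at least this many reads after filtering.
    """
    return [[1 if reads >= min_reads else 0 for reads in pool] for pool in read_counts]


def format_history(history: List[Pool]) -> str:
    """Render replicates nested within pools, e.g. {101 111 100}."""
    return "{" + " ".join("".join(str(y) for y in pool) for pool in history) + "}"


# Invented read counts for one taxon at one site: 3 pools x 3 PCR replicates.
reads = [[12, 0, 3], [40, 25, 8], [5, 0, 0]]
history = to_detection_history(reads)
print(format_history(history))          # prints: {101 111 100}
print([sum(pool) for pool in history])  # detections per pool: [2, 3, 1]
```

Keeping the pool structure in a nested list, rather than flattening to a single string, preserves the sample/replicate nesting that the multiscale model relies on.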

The levels of the hierarchical model as defined in the R package eDNAoccupancy (Dorazio and Erickson 2017) are as follows:

(1) Site-level (Occupancy probability, ψ)

$$Z_i \sim \mathrm{Bernoulli}(\psi_i)$$

The probability that DNA from the target taxon is present at a site where terrestrial leeches occur. Z_i describes the presence (Z_i = 1) or absence (Z_i = 0) of the target DNA at site i as a function of the site-level occupancy parameter ψ_i.

(2) Sample-level (Availability probability, θ)

$$A_{ij} \mid z_i \sim \mathrm{Bernoulli}(z_i\,\theta_{ij})$$

This describes the probability that target DNA is available in the leech pool (i.e., that the sampled leeches had fed on the target taxon), given that the target is present at the site. A_ij indicates whether target DNA occurs (A_ij = 1) or not (A_ij = 0) in sample j at site i. Because A_ij is conditional on z_i, target DNA can be available only at occupied sites (z_i = 1), where it occurs with availability probability θ_ij.

(3) Replicate-level (Detection probability, p)

$$Y_{ijk} \mid a_{ij} \sim \mathrm{Bernoulli}(a_{ij}\,p_{ijk})$$

The final equation describes the probability that DNA is detected in a PCR replicate, given that the DNA was present in the leech pool (and hence that the target was present at the site). Here, the detection Y_ijk in the kth replicate of the jth sample at the ith site is conditional on the occurrence of the DNA in that leech sample, a_ij. p_ijk is the detection probability, that is, the probability that target DNA is amplified in the kth PCR replicate of the jth leech pool collected at the ith site, given that the pool contains target DNA.
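
To make the nesting concrete, the sketch below simulates detection histories under the three Bernoulli levels above. It then compares the simulated proportion of sites with at least one positive PCR replicate against the closed-form value ψ[1 − (1 − θ(1 − (1 − p)^K))^J] for J pools and K replicates. The sketch assumes constant ψ, θ, and p (as in a null model) and independent draws. The parameter values are illustrative only, and the code is not part of eDNAoccupancy.

```python
import random


def simulate_site(psi, theta, p, n_pools, n_reps, rng):
    """Simulate one site's nested detection history (pools x replicates)."""
    z = 1 if rng.random() < psi else 0                    # Z_i ~ Bernoulli(psi)
    history = []
    for _ in range(n_pools):
        a = 1 if rng.random() < z * theta else 0          # A_ij | z_i ~ Bernoulli(z_i * theta)
        history.append([1 if rng.random() < a * p else 0  # Y_ijk | a_ij ~ Bernoulli(a_ij * p)
                        for _ in range(n_reps)])
    return history


def prob_any_detection(psi, theta, p, n_pools, n_reps):
    """Closed-form P(at least one positive replicate at a site)."""
    pool_positive = theta * (1 - (1 - p) ** n_reps)  # DNA available AND >= 1 replicate amplifies
    return psi * (1 - (1 - pool_positive) ** n_pools)


# Illustrative parameter values only (not estimates from this study).
PSI, THETA, P, J, K = 0.8, 0.3, 0.4, 3, 3
rng = random.Random(2015)
n_sites = 100_000
detected = sum(
    any(any(pool) for pool in simulate_site(PSI, THETA, P, J, K, rng))
    for _ in range(n_sites)
)
print(f"simulated: {detected / n_sites:.3f}")
print(f"analytic:  {prob_any_detection(PSI, THETA, P, J, K):.3f}")  # 0.442
```

The closed form makes explicit why a site-level non-detection is ambiguous: it can arise from absence (1 − ψ), from no pool containing target DNA, or from every replicate of a positive pool failing to amplify.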

2.3 Covariate Selection and Model Construction

To capture variability across the human-modified landscape, we used habitat heterogeneity as a site-level covariate. It was calculated from a 2014 LiDAR aerial survey across the SAFE project (Jucker et al. 2018) and extracted within a 50 m buffer around each site (see Drinkwater, Jucker, et al. 2021). Values range from −1 to 1. Strongly negative values correspond to an even dispersion of canopy cover, which is rare in natural forests. High-quality forests with intact canopies tend to have values near zero, and values closer to 1 indicate greater clustering in canopy cover (e.g., more gaps), as is typical of degraded, human-modified forests.

As a sample-level covariate, we used the number of pools per site as a measure of sampling effort, which we expected to affect the availability probability (θ): more pools per site should increase the chance that mammal DNA is available for sequencing. Finally, we included the DNA concentration of each pool as the replicate-level covariate. We expected this to influence the detection probability, p, because DNA concentration can affect PCR amplification success: too much DNA in a pool can lead to inhibition, whereas too little increases the stochasticity of amplifying the target species' DNA from a mixed pool.

For each of the three mammals and each of the two sampling seasons (2015 and 2016), we constructed four models: a null model (intercepts only, no covariates) and three univariate models, each including a single covariate at one of the three nested levels (Table 2). We did not test interactions between covariates because the dataset was too small and MCMC chains did not converge for more complex models.

TABLE 2. The multiscale univariate occupancy models fitted to each taxon's detection histories in both years. Covariates were added at the site (ψ), sample (θ), or replicate (p) level and are shown in brackets, with (.) indicating no covariate.

Model	Covariate structure	Description
Null	ψ (.) θ (.) p (.)	No covariates
Site level	ψ (heterogeneity) θ (.) p (.)	Habitat heterogeneity
Sample level	ψ (.) θ (pools) p (.)	Survey effort
Replicate level	ψ (.) θ (.) p (conc)	iDNA concentration
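
The four structures in Table 2 differ only in which level carries a slope. The sketch below assumes probit links relating ψ, θ, and p to their covariates. We understand this to be the link eDNAoccupancy uses, but readers should confirm it against the package documentation. All coefficients and covariate values are hypothetical, and standardizing the covariate is an assumed preprocessing step.

```python
import math
from statistics import mean, stdev


def inv_probit(eta: float) -> float:
    """Standard normal CDF Phi(eta): maps a linear predictor onto (0, 1)."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def standardize(values):
    """Centre and scale a covariate (an assumed preprocessing step)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical (intercept, slope) pairs on the probit scale for Table 2's
# four structures; a slope of 0.0 means that level has no covariate (.).
MODELS = {
    "null":      {"psi": (0.9, 0.0),  "theta": (-0.3, 0.0), "p": (-0.5, 0.0)},
    "site":      {"psi": (0.9, -0.6), "theta": (-0.3, 0.0), "p": (-0.5, 0.0)},  # heterogeneity
    "sample":    {"psi": (0.9, 0.0),  "theta": (-0.3, 0.4), "p": (-0.5, 0.0)},  # pools
    "replicate": {"psi": (0.9, 0.0),  "theta": (-0.3, 0.0), "p": (-0.5, 0.8)},  # conc
}


def level_probability(model: str, level: str, x: float = 0.0) -> float:
    """Probability at one level ('psi', 'theta' or 'p') for covariate value x."""
    intercept, slope = MODELS[model][level]
    return inv_probit(intercept + slope * x)


# Invented canopy-heterogeneity values for four sites (range -1 to 1).
heterogeneity = [-0.10, 0.05, 0.20, 0.35]
for x_raw, x in zip(heterogeneity, standardize(heterogeneity)):
    print(x_raw, round(level_probability("site", "psi", x), 3))
```

With a negative site-level slope, as sketched here, ψ declines as canopy clustering increases. This is the form of the response described for sambar deer in Section 3.3, shown here with invented numbers.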

Models were fitted using the eDNAoccupancy package (version 0.2.7) with default settings (Dorazio and Erickson 2017) in R (R Core Team 2019). The package uses Bayesian inference to estimate ψ, θ, and p, assuming a uniform prior distribution for the occupancy parameter (ψ) and a multivariate normal prior for the availability (θ) and detection (p) parameters (further details in Dorazio and Erickson 2017). Chains were run in blocks of 20,000 iterations and updated at least 10 times using the updateOccModel function, and all models showed MCMC convergence after 200,000 iterations. We used trace plots to assess convergence, and we compared model fit using two criteria commonly applied to Bayesian models: the Watanabe–Akaike information criterion (WAIC; Watanabe 2010) and the posterior predictive loss criterion (PPLC; Gelfand and Ghosh 1998) (see Table S1).
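
For readers unfamiliar with the two fit criteria, the sketch below computes WAIC from a matrix of pointwise log-likelihoods (posterior draws × observations). It also computes the Gelfand–Ghosh posterior predictive loss in its k → ∞ form (goodness-of-fit term plus predictive-variance penalty). It illustrates the general formulas only, not how eDNAoccupancy computes them internally, and the toy inputs are invented. Lower values indicate a better trade-off between fit and complexity for both criteria.

```python
import math
from statistics import mean, pvariance, variance


def waic(loglik):
    """WAIC from pointwise log-likelihoods loglik[s][i]
    (s = posterior draw, i = observation); lower is better."""
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(len(loglik[0])):
        column = [draw[i] for draw in loglik]
        top = max(column)  # log-mean-exp, shifted for numerical stability
        lppd += top + math.log(mean(math.exp(v - top) for v in column))
        p_waic += variance(column)  # posterior variance of the log-likelihood
    return -2.0 * (lppd - p_waic)


def posterior_predictive_loss(y_obs, y_rep):
    """Gelfand-Ghosh loss as k -> infinity: G + P.

    y_rep[s][i] is observation i in the s-th dataset replicated from the
    posterior predictive distribution; lower is better.
    """
    fit = 0.0      # G: squared distance between data and predictive means
    penalty = 0.0  # P: summed predictive variances
    for i, y in enumerate(y_obs):
        draws = [rep[i] for rep in y_rep]
        fit += (y - mean(draws)) ** 2
        penalty += pvariance(draws)
    return fit + penalty


# Toy inputs: 3 posterior draws x 2 observations (binary detections).
loglik = [[-0.2, -1.6], [-0.3, -1.2], [-0.25, -1.4]]
y_obs = [1, 0]
y_rep = [[1, 0], [1, 1], [0, 0]]
print(round(waic(loglik), 3), round(posterior_predictive_loss(y_obs, y_rep), 3))
```

Both criteria reward fit while penalizing complexity or predictive spread, which is why they are reported side by side rather than relying on one alone.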

2.4 Cumulative Probabilities

Finally, to test the effects of sample and PCR replication on two of the probabilities, availability (θ) and detection (p), we calculated cumulative probabilities. This is an important consideration for molecular biodiversity studies because sequencing more samples and PCR replicates increases costs. Any increase in cost must be traded off against the higher probability of detection with greater sample size and the increased confidence gained by confirming a detection in multiple PCR reactions. This is of particular concern when designing molecular surveys for conservation monitoring schemes, where confidence in the presence of a rare target species is crucial. We calculated the cumulative availability probability as $1-(1-\theta)^{k_{\mathrm{bio}}}$ and the cumulative detection probability as $1-(1-p)^{k_{\mathrm{pcr}}}$, where θ is the availability probability, p is the detection probability, k_bio is the number of leech pools, and k_pcr is the number of PCR replicates. We calculated these cumulative probabilities for k = 1 to k = 20 (Hunter et al. 2015). For both θ and p, the value at k = 1 was the posterior mean from the null model (which included no covariates).
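
The cumulative curves can be tabulated with a few lines of Python. This sketch assumes, as the formula does, that replicates are independent and share a constant θ (or p). The example value θ = 0.13 is the 2016 null-model availability for sambar deer reported in Section 3.2, used purely for illustration. `replicates_needed` is a hypothetical helper that scans k = 1 to 20 for the first value reaching a threshold.

```python
def cumulative_probability(prob: float, k: int) -> float:
    """1 - (1 - prob)^k: chance of at least one success in k independent replicates."""
    return 1.0 - (1.0 - prob) ** k


def replicates_needed(prob: float, target: float, k_max: int = 20):
    """Hypothetical helper: smallest k in 1..k_max reaching `target`, else None."""
    for k in range(1, k_max + 1):
        if cumulative_probability(prob, k) >= target:
            return k
    return None


theta_sambar_2016 = 0.13  # null-model availability reported in Section 3.2
curve = {k: round(cumulative_probability(theta_sambar_2016, k), 3) for k in range(1, 21)}
print(curve[20])
print(replicates_needed(theta_sambar_2016, 0.5), replicates_needed(theta_sambar_2016, 0.8))
# prints: 5 12
```

The same functions apply to detection by passing p instead of θ; returning `None` when the threshold is not reached within 20 replicates flags cases where the target is impractical under the assumed probability.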

3 Results

3.1 Detection Histories and Model Evaluation

To generate the detection history for each taxon, we set the minimum PCR-replicate threshold to one, such that a taxon needed to be detected in only a single PCR replicate, after filtering for sequence quality, to be retained. This resulted in 485 OTUs, of which 45 were identified as chimeric and removed. Of the remaining 440 OTUs, 82 were identified by LULU post-clustering curation as true sequences (“parent OTUs”). For the three target taxa, we then counted the number of detections across the 187 leech pools (Table 1). The total numbers of detections in 2015 and 2016, respectively, were 22 and 11 for muntjac, 12 and 10 for sambar deer, and 23 and 43 for bearded pig.

3.2 Null Model

For null models fitted without covariates, occupancy probability (ψ) was high (> 0.75) for all taxa in both years, except for sambar deer in 2015 (Figure 1). Availability probability (θ) showed no clear differences among taxa in 2015. In contrast, availability was lower for all three taxa in 2016, with sambar deer showing much lower availability than bearded pig (0.13 and 0.54, respectively) (Figure 1). Detection probability also varied among taxa in 2016, with higher values for bearded pig and sambar deer but a low probability of detection for muntjac DNA.

FIGURE 1

Null model results with the posterior median occupancy (ψ), availability (θ) and detection probability (p) for the three target taxa sambar deer (light blue), bearded pig (yellow) and muntjac (dark gray). In the null model the three parameters are estimated without covariates for both years, 2015 and 2016. Error bars show the 95% credible intervals.

3.3 Impact of Habitat Heterogeneity on Occupancy Probability

For site-level models that included habitat heterogeneity as a covariate, muntjac and bearded pig showed little or no change in mean occupancy with increasing canopy clustering (Figure 2). In contrast, the mean response of sambar deer to habitat heterogeneity in 2015 was negative, with occupancy decreasing as canopy clustering increased, suggesting a negative effect of low-quality habitat (Figure 2). There is one clear outlier to this trend: the least clustered (and therefore highest-quality) site also had one of the lowest probabilities of occupancy. However, the patterns for all species were associated with large, overlapping credible intervals.

3.4 Impact of Sampling Effort on DNA Availability Probability—Sample Level Model

In 2015, DNA availability increased with the number of pools for all three taxa. In contrast, in 2016, availability showed a slight positive trend for bearded pig but decreased as leech pools were added for muntjac and sambar deer (Figure 3).

3.5 Impact of DNA Concentration on Detection Probability—Replicate Level Model

The effect of DNA concentration cannot be compared directly between years because concentrations were measured on different instruments using different methods. Within 2015, detection probability remained low and relatively consistent across taxa (Figure 4). In 2016, the probability of detecting DNA from sambar deer and bearded pig increased with DNA concentration, with an especially steep relationship for bearded pig, whereas DNA concentration had little effect on the detection probability of muntjac, which remained constant across concentrations.

3.6 How Much Replication Is Needed?

Using cumulative probabilities for 1 to 20 replicates (biological and technical), we estimated the amount of replication needed to meet an arbitrary threshold of either 50% or 80% probability (Figure 5). Broadly, fewer leech pools (sample-level replicates) were needed in 2015 to reach an 80% availability probability. At the taxon level, an 80% availability probability would be reached for bearded pig with the fewest pools (2 or 3), whereas about 5 pools would be needed for muntjac and, to cover both years, 11 or 12 pools for sambar deer (Figure 5). For detection probability, approximately 10 PCR replicates would be needed to reach between 50% and 80% detection. However, this cumulative probability varied more between years; for example, in 2016 only 2 or 3 replicates would be required for 80% detection of bearded pig, whereas in 2015, 15 PCR replicates would be needed to reach the same threshold (Figure 5).
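
Because the cumulative probability increases monotonically with k, the minimum replication for a target probability τ can also be written in closed form. This is a worked rearrangement of the formula in Section 2.4 under the same independence assumption, not an additional analysis. The inequality flips when dividing because ln(1 − θ) < 0 for 0 < θ < 1.

```latex
1-(1-\theta)^{k_{\mathrm{bio}}} \ge \tau
\iff (1-\theta)^{k_{\mathrm{bio}}} \le 1-\tau
\iff k_{\mathrm{bio}} \ge \frac{\ln(1-\tau)}{\ln(1-\theta)},
\qquad
k_{\mathrm{bio}}^{\min} = \left\lceil \frac{\ln(1-\tau)}{\ln(1-\theta)} \right\rceil
```

Plugging in the 2016 null-model availabilities from Section 3.2 with τ = 0.8 gives ln(0.2)/ln(0.87) ≈ 11.6, i.e., 12 pools for sambar deer (θ = 0.13), and ln(0.2)/ln(0.46) ≈ 2.1, i.e., 3 pools for bearded pig (θ = 0.54), in line with the ranges reported above. The same expression applies to detection, with p in place of θ and k_pcr in place of k_bio.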

4 Discussion

The objective of this study was to determine the usefulness of analyzing iDNA-based mammal detection data within an occupancy modeling framework. Although a growing number of molecular surveys have used occupancy modeling to estimate species presence probabilities, this has rarely been done at the level of technical replicates (e.g., PCR). By reanalyzing an iDNA dataset and incorporating both environmental and technical covariates, we show that this approach can reveal species and seasonal differences that would not be evident from an analysis of presence–absence data alone (Drinkwater, Jucker, et al. 2021). Focusing on three common mammals, we found that the three probability estimates were more consistent across taxa in the first year of sampling, with higher occupancy and lower detection probabilities than in the second year. Estimates from the second year varied more among taxa and probability types, making it harder to draw conclusions. Here we discuss whether this variation could have arisen from differences in environmental conditions between sampling years or might reflect stochasticity arising from small sample sizes.

At the site level, all three focal taxa appeared tolerant of changes in habitat quality, at least with regard to habitat heterogeneity. This is expected, given that all three taxa were selected for their high detection rates across habitats in the original iDNA biodiversity assessment (Drinkwater, Jucker, et al. 2021). At the same time, however, the occupancy modeling here appears to uncover a more nuanced picture, in which sambar deer showed a higher probability of occupancy in higher-quality habitat (i.e., sites with less patchy canopy cover) in the first year of sampling. The environmental variable ‘habitat heterogeneity’ describes the amount of clustering at the canopy level, with low values (i.e., less clustering) signifying more intact, closed canopies that tend to reflect lower logging pressure and less degradation. This result contrasts with previous reports that sambar deer abundance does not decrease in logged forests (e.g., Granados et al. 2016) and instead supports Deere et al. (2017), who also found that the occupancy probability of sambar deer was slightly lower than that of bearded pig in some degraded forest classes. Similarly, Brodie et al. (2015) reported that while sambar deer occur in logged forest, they tend to avoid edge habitats. Given the high degree of fragmentation at our study site, sambar deer may therefore move deeper into the forest interior.

Different patterns across the three taxa were also seen between years in the sample-level estimates of the null model (the replicate-level model cannot be compared between years). Such contrasting results between consecutive years could stem from the sampling period spanning one of the strongest El Niño–Southern Oscillation (ENSO) events on record (2014–2016) (Timmermann et al. 2018). In Sabah, the main regional consequence of this El Niño event was a delayed rainy season, with very dry conditions and intense fire outbreaks (Chen et al. 2016). These conditions are likely to have directly affected food availability and fruiting phenology and to have changed mammal behavior (e.g., Fredriksson et al. 2006). Importantly for this type of iDNA study, the El Niño-induced drought is also likely to have affected the leeches, which are known to be sensitive to microclimate (Drinkwater et al. 2020) and soil moisture (Nelaballi et al. 2022). The drought began in 2015 and lasted until late 2016 (Miyamoto et al. 2021); therefore, some leeches collected in 2015 might have fed before the onset of drought conditions, whereas leeches from 2016 are more likely to have been adversely affected. Because leeches avoid dry conditions and may forage less during droughts, reduced foraging activity might have contributed to the wider variation observed in detection and encounter rates. Interestingly, some insect species have been reported to feed more under water stress (e.g., Hagan et al. 2018); however, this probably does not apply to leeches, which lack an exoskeleton and are at higher risk of desiccation (see Phillips et al. 2020; Drinkwater et al. 2020). Leeches might instead survive periods of drought by foraging less often and sheltering in moist leaf litter, although more evidence is needed.

Sampling effort also influenced the chance of detecting target DNA in a leech pool (availability probability), with effects differing between taxa and years. Effort was positively associated with DNA availability in 2015, as expected, but this pattern was reversed in 2016 for two of the taxa. Given that availability depends on sampled leeches having fed recently, we suggest that the 2016 trend reflects longer inter-feeding intervals during the sustained drought, such that mammal DNA had been cleared from the gut by the time of sampling. Ultimately, including a covariate that describes the window over which DNA degrades in the gut (Schnell et al. 2012) may improve the sensitivity of iDNA occupancy models; such information could be obtained through feeding trials under experimental conditions (Schnell et al. 2012; Drinkwater, Williamson, et al. 2021). Failure to detect target DNA might also result from swamping by DNA from a more abundant taxon (e.g., bearded pig) in the mixed DNA pool, making DNA from rarer taxa less available for amplification. Alternatively, biases could arise from technical distortion, such as differences in primer binding efficiency across taxa, an issue seen in other molecular diet studies (Alberdi et al. 2019). To overcome these potential problems, future studies could target the bloodmeals of single leeches, which are typically considered to contain DNA from at most one individual mammal, although this would reduce cost-effectiveness and may be unfeasible for many studies if a high proportion of leeches are unfed (“empty”).

In the final level of the occupancy model, we found a positive relationship between DNA concentration and the probability of detection in a PCR replicate in two cases (sambar deer and bearded pig in 2016). Low starting concentrations may increase PCR stochasticity, thus reducing the detection rate (Alberdi et al. 2018). Adopting laboratory procedures designed to maximize DNA yield, for example from forensics or palaeogenomics, could therefore help to enhance species detection further. Such refinements could include adjusting buffer reagents in the DNA-binding step or using silica-coated magnetic beads to prevent DNA loss (Rohland et al. 2018). It is important to note, however, that such measures increase the concentration of total DNA rather than target DNA, with the unintended consequence of also concentrating inhibitors (McKee et al. 2015). Methods for enriching target DNA have advanced recently, including deeper mitochondrial sequencing of species-specific targets (Nguyen et al. 2021) and mitochondrial capture from leeches and flies (Danabalan et al. 2023). Adaptive sampling with nanopore sequencing (Oxford Nanopore Technologies) has also been applied with some success, rejecting host (i.e., invertebrate) DNA during sequencing to enrich the prey DNA (Khan et al. 2024).

To investigate further the effect of replication on the probabilities of availability and detection, we estimated the numbers of biological and technical replicates needed to reach an arbitrary threshold of 80%. Reaching this availability threshold across all taxa and both years required 2–12 pools of leeches. In contrast, achieving the same probability of detection required > 15 PCR replicates to capture all taxa, which would be prohibitively expensive for most studies. Because biological replication offers an alternative route to higher availability, we suggest that increasing the number of leeches collected is more achievable and cost-effective than increasing technical replication, supporting earlier leech-based studies that advocate large sample sizes (Abrams et al. 2019; Ji et al. 2022). This finding might apply specifically to leeches, because some unfed individuals yield no vertebrate DNA, in contrast to eDNA studies more generally, where each sample can contain DNA from multiple species.

As eDNA/iDNA sampling becomes commonplace for biodiversity assessments and targeted species detection, it is critical that we understand the biases underlying the detection data so that robust conclusions can be drawn, especially for species of conservation concern. Although small in scale, our study demonstrates the potential usefulness of applying multiscale occupancy modeling to iDNA studies. In particular, we uncovered changes in probabilities across years, habitat quality, and taxa that would not otherwise have been evident. Ultimately, however, our relatively small sample sizes, coupled with the low detection rates in the leeches, mean that our estimated probabilities typically have large and overlapping error bars. While some of the observed trends make intuitive sense, such as the apparent negative impact of habitat heterogeneity on bearded pig occupancy, we cannot rule out that at least some of this variation stems from stochasticity linked to a lack of statistical power. We therefore recommend that future studies balance the need for large sample sizes against the biology of the samplers and the potential ecological consequences of removing large numbers of invertebrates from the ecosystem.

Author Contributions

Study design: R.D., S.J.R. Analysis of the data: R.D. Interpretation of the data: E.L.C., S.J.R. Writing of the manuscript: R.D., E.L.C., S.J.R.

Acknowledgments

We thank Henry Bernard for all his help with the logistics and Nicolas Deere for help with samples and discussions regarding analyses. For help and discussions regarding statistical modeling, we thank Alberto Carmagnini. All data are reanalyzed from Drinkwater, Jucker, et al. 2021 under the Sabah Biodiversity Council permits JKM/MBS.1000-2/2 (374), JKM/MBS.1000-2/3 JLD.2 (55), JKM/MBS.1000-2/2 (34), JKM/MBS.1000-2/3 JLD.2 (107), and JKM/MBS.1000-2/3 JLD.3 (44). The study was funded by Natural Environment Research Council grants NE/K016148/1 and NE/K016261/1, awarded as part of the Human Modified Tropical Forests Programme, and R.D. received additional support from The Leverhulme Trust Study Abroad Studentship (SAS-2016-100). For the original data collection, we thank Yayasan Sabah, Sime Darby, and Benta Wawasan for access; for assistance in the field, we thank all LOMBOK research assistants, the Stability of Altered Forest Ecosystems (SAFE) project, and the South East Asian Rainforest Research Partnership (SEARRP). Original data were generated at the Barts and the London Genome Centre (Queen Mary University of London) and the Danish National High-Throughput Sequencing Centre (University of Copenhagen).

Conflicts of Interest

The authors declare no conflicts of interest.

Open Research

Data Availability Statement

The raw sequence data underlying this study are available from the NCBI Sequence Read Archive (SRA) under BioProject accession no. PRJNA672059 (https://www.ncbi.nlm.nih.gov/sra/PRJNA672059). Site-level data are available from the SAFE project Zenodo repository (http://doi.org/10.5281/zenodo.4095374; Drinkwater, Jucker, et al. 2021).

Supporting Information

References

Abrams, J. F., L. A. Hörig, R. Brozovic, et al. 2019. “Shifting Up a Gear With iDNA: From Mammal Detection Events to Standardised Surveys.” Journal of Applied Ecology 56, no. 7: 1637–1648.
10.1111/1365-2664.13411
Alberdi, A., O. Aizpurua, K. Bohmann, et al. 2019. “Promises and Pitfalls of Using High-Throughput Sequencing for Diet Analysis.” Molecular Ecology Resources 19, no. 2: 327–348.
10.1111/1755-0998.12960
Alberdi, A., O. Aizpurua, M. T. P. Gilbert, and K. Bohmann. 2018. “Scrutinizing Key Steps for Reliable Metabarcoding of Environmental Samples.” Methods in Ecology and Evolution 9, no. 1: 134–147.
10.1111/2041-210X.12849
Barnes, M. A., and C. R. Turner. 2016. “The Ecology of Environmental DNA and Implications for Conservation Genetics.” Conservation Genetics 17, no. 1: 1–17.
10.1007/s10592-015-0775-4
Brodie, J. F., A. J. Giordano, and L. Ambu. 2015. “Differential Responses of Large Mammals to Logging and Edge Effects.” Mammalian Biology 80, no. 1: 7–13.
10.1016/j.mambio.2014.06.001
Camacho, C., G. Coulouris, V. Avagyan, et al. 2009. “BLAST+: Architecture and Applications.” BMC Bioinformatics 10: 1–9.
10.1186/1471-2105-10-421
Chen, C. C., H. W. Lin, J. Y. Yu, and M. H. Lo. 2016. “The 2015 Borneo Fires: What Have We Learned From the 1997 and 2006 El Niños?” Environmental Research Letters 11, no. 10: 104003.
10.1088/1748-9326/11/10/104003
Chivas, C., A. Stow, A. Harford, et al. 2025. “Mosquito-Derived Ingested DNA as a Tool for Monitoring Terrestrial Vertebrates Within a Peri-Urban Environment.” Ecosphere 16, no. 1: e70163.
10.1002/ecs2.70163
Danabalan, R., K. Merkel, I. Bærholm Schnell, et al. 2023. “Mammal Mitogenomics From Invertebrate-Derived DNA.” Environmental DNA 5, no. 5: 1004–1015.
10.1002/edn3.436
Deere, N. J., G. Guillera-Arroita, E. L. Baking, et al. 2017. “High Carbon Stock Forests Provide Co-Benefits for Tropical Biodiversity.” Journal of Applied Ecology 55, no. 2: 997–1008.
10.1111/1365-2664.13023
Dorazio, R. M., and R. A. Erickson. 2017. “eDNAoccupancy: An R Package for Multiscale Occupancy Modelling of Environmental DNA Data.” Molecular Ecology Resources 18, no. 2: 368–380.
10.1111/1755-0998.12735
Drinkwater, R., T. Jucker, J. H. Potter, et al. 2021. “Leech Blood-Meal Invertebrate-Derived DNA Reveals Differences in Bornean Mammal Diversity Across Habitats.” Molecular Ecology 30, no. 13: 3299–3312.
10.1111/mec.15724
Drinkwater, R., J. Williamson, E. L. Clare, A. Y. Chung, S. J. Rossiter, and E. Slade. 2021. “Dung Beetles as Samplers of Mammals in Malaysian Borneo—A Test of High Throughput Metabarcoding of iDNA.” PeerJ 9: e11897.
10.7717/peerj.11897
Drinkwater, R., J. Williamson, T. Swinfield, et al. 2020. “Occurrence of Blood-Feeding Terrestrial Leeches (Haemadipsidae) in a Degraded Forest Ecosystem and Their Potential as Ecological Indicators.” Biotropica 52, no. 2: 302–312.
10.1111/btp.12686
Ewers, R. M., R. K. Didham, L. Fahrig, et al. 2011. “A Large-Scale Forest Fragmentation Experiment: the Stability of Altered Forest Ecosystems Project.” Philosophical Transactions of the Royal Society, B: Biological Sciences 366, no. 1582: 3292–3302.
10.1098/rstb.2011.0049
Fredriksson, G. M., S. A. Wich, and TRISNO. 2006. “Frugivory in Sun Bears (Helarctos malayanus) is Linked to El Niño-Related Fluctuations in Fruiting Phenology, East Kalimantan, Indonesia.” Biological Journal of the Linnean Society 89, no. 3: 489–508.
10.1111/j.1095-8312.2006.00688.x
Frøslev, T. G., R. Kjøller, H. H. Bruun, et al. 2017. “Algorithm for Post-Clustering Curation of DNA Amplicon Data Yields Reliable Biodiversity Estimates.” Nature Communications 8: 1188.
10.1038/s41467-017-01312-x
Gelfand, A. E., and S. K. Ghosh. 1998. “Model Choice: A Minimum Posterior Predictive Loss Approach.” Biometrika 85, no. 1: 1–11.
10.1093/biomet/85.1.1
Gogarten, J. F., C. Hoffmann, M. Arandjelovic, et al. 2020. “Fly-Derived DNA and Camera Traps Are Complementary Tools for Assessing Mammalian Biodiversity.” Environmental DNA 2, no. 1: 63–76.
10.1002/edn3.46
Granados, A., K. Crowther, J. F. Brodie, and H. Bernard. 2016. “Persistence of Mammals in a Selectively Logged Forest in Malaysian Borneo.” Mammalian Biology 81, no. 3: 268–273.
10.1016/j.mambio.2016.02.011
Guillera-Arroita, G., M. S. Ridout, and B. J. Morgan. 2010. “Design of Occupancy Studies With Imperfect Detection.” Methods in Ecology and Evolution 1, no. 2: 131–139.
10.1111/j.2041-210X.2010.00017.x
Hagan, R. W., E. M. Didion, A. E. Rosselot, C. J. Holmes, S. C. Siler, and A. J. Rosendale. 2018. “Dehydration Prompts Increased Activity and Blood Feeding by Mosquitoes.” Scientific Reports 8, no. 1: 1–12.
10.1038/s41598-018-24893-z
Hoffmann, C., K. Merkel, A. Sachse, P. Rodríguez, F. H. Leendertz, and S. Calvignac-Spencer. 2018. “Blow Flies as Urban Wildlife Sensors.” Molecular Ecology Resources 18, no. 3: 502–510.
10.1111/1755-0998.12754
Hunter, M. E., S. J. Oyler-McCance, R. M. Dorazio, et al. 2015. “Environmental DNA (eDNA) Sampling Improves Occurrence and Detection Estimates of Invasive Burmese Pythons.” PLoS One 10, no. 4: e0121655.
10.1371/journal.pone.0121655
Ji, Y., C. C. Baker, V. D. Popescu, et al. 2022. “Measuring Protected-Area Effectiveness Using Vertebrate Distributions From Leech iDNA.” Nature Communications 13: 1555.
10.1038/s41467-022-28778-8
Jucker, T., G. P. Asner, M. Dalponte, et al. 2018. “Estimating Aboveground Carbon Density and Its Uncertainty in Borneo's Structurally Complex Tropical Forests Using Airborne Laser Scanning.” Biogeosciences 15, no. 12: 3811–3830.
10.5194/bg-15-3811-2018
Kéry, M., and J. A. Royle. 2016. “Linear Models, Generalized Linear Models (GLMs), and Random Effects Models.” In Applied Hierarchical Modeling in Ecology, 79–122. Academic Press.
10.1016/B978-0-12-801378-6.00003-5
Khan, A., R. Carter, C. Mpamhanga, et al. 2024. “Swatting Flies: Biting Insects as Non-Invasive Samplers for Mammalian Population Genomics.” Authorea Preprints.
Kocher, A., J. C. Gantier, P. Gaborit, et al. 2017. “Vector Soup: High-Throughput Identification of Neotropical Phlebotomine Sand Flies Using Metabarcoding.” Molecular Ecology Resources 17, no. 2: 172–182.
10.1111/1755-0998.12556
MacKenzie, D. I., J. D. Nichols, J. E. Hines, M. G. Knutson, and A. B. Franklin. 2003. “Estimating Site Occupancy, Colonization, and Local Extinction When a Species Is Detected Imperfectly.” Ecology 84, no. 8: 2200–2207.
10.1890/02-3090
Martel, C. M., M. Sutter, R. M. Dorazio, and A. P. Kinziger. 2020. “Using Environmental DNA and Occupancy Modelling to Estimate Rangewide Metapopulation Dynamics.” Molecular Ecology 30, no. 13: 3340–3354.
10.1111/mec.15693
McKee, A. M., S. F. Spear, and T. W. Pierson. 2015. “The Effect of Dilution and the Use of a Post-Extraction Nucleic Acid Purification Column on the Accuracy, Precision, and Inhibition of Environmental DNA Samples.” Biological Conservation 183: 70–76.
10.1016/j.biocon.2014.11.031
Mercier, C., F. Boyer, A. Bonin, and E. Coissac. 2013. “SUMATRA and SUMACLUST: Fast and Exact Comparison and Clustering of Sequences.” In Programs and Abstracts of the SeqBio 2013 Workshop, 27–29.
Miyamoto, K., S. I. Aiba, R. Aoyagi, and R. Nilus. 2021. “Effects of El Niño Drought on Tree Mortality and Growth Across Forest Types at Different Elevations in Borneo.” Forest Ecology and Management 490: 119096.
10.1016/j.foreco.2021.119096
Nelaballi, S., B. J. Finkel, A. B. Bernard, et al. 2022. “Impacts of Abiotic and Biotic Factors on Terrestrial Leeches in Indonesian Borneo.” Biotropica 54, no. 5: 1238–1247.
10.1111/btp.13146
Nguyen, T. V., A. Tilker, A. Nguyen, et al. 2021. “Using Terrestrial Leeches to Assess the Genetic Diversity of an Elusive Species: The Annamite Striped Rabbit Nesolagus timminsi.” Environmental DNA 3, no. 4: 780–791.
10.1002/edn3.182
Phillips, A. J., F. R. Govedich, and W. E. Moser. 2020. “Leeches in the Extreme: Morphological, Physiological, and Behavioral Adaptations to Inhospitable Habitats.” International Journal for Parasitology: Parasites and Wildlife 12: 318–325.
10.1016/j.ijppaw.2020.09.003
R Core Team. 2019. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
Rohland, N., I. Glocke, A. Aximu-Petri, and M. Meyer. 2018. “Extraction of Highly Degraded DNA From Ancient Bones, Teeth, and Sediments for High-Throughput Sequencing.” Nature Protocols 13, no. 11: 2447–2461.
10.1038/s41596-018-0050-5
Schloss, P. D., S. L. Westcott, T. Ryabin, et al. 2009. “Introducing Mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities.” Applied and Environmental Microbiology 75, no. 23: 7537–7541.
10.1128/AEM.01541-09
Schmidt, B. R., M. Kéry, S. Ursenbacher, O. J. Hyman, and J. P. Collins. 2013. “Site Occupancy Models in the Analysis of Environmental DNA Presence/Absence Surveys: A Case Study of an Emerging Amphibian Pathogen.” Methods in Ecology and Evolution 4, no. 7: 646–653.
10.1111/2041-210X.12052
Schnell, I. B., P. F. Thomsen, N. Wilkinson, et al. 2012. “Screening Mammal Biodiversity Using DNA From Leeches.” Current Biology 22, no. 8: R262–R263.
10.1016/j.cub.2012.02.058
Timmermann, A., S. I. An, J. S. Kug, et al. 2018. “El Niño–Southern Oscillation Complexity.” Nature 559, no. 7715: 535–545.
10.1038/s41586-018-0252-6
Watanabe, S. 2010. “Equations of States in Singular Statistical Estimation.” Neural Networks 23, no. 1: 20–34.
10.1016/j.neunet.2009.08.002

Volume 7, Issue 3

May–June 2025

e70121

Improving the Understanding of Detections From iDNA Surveys in Malaysian Borneo With Multiscale Occupancy Models: A Case-Study Using Leech Blood Meals
