Improving the Understanding of Detections From iDNA Surveys in Malaysian Borneo With Multiscale Occupancy Models: A Case-Study Using Leech Blood Meals
Funding: The study was funded by a Natural Environment Research Council grant NE/K016148/1 & NE/K016261/1 awarded as part of the Human Modified Tropical Forests Programme, and RD received additional support from The Leverhulme Trust Study Abroad Studentship (SAS-2016-100).
ABSTRACT
Invertebrate-derived DNA (iDNA) has been successfully utilized for surveying mammalian biodiversity in several ecosystems. Yet, as with all sampling methods, this approach suffers from potential biases, including those introduced by the choice of invertebrate sampler, as well as the stochasticity of DNA amplification during PCR. Occupancy modeling is a statistical framework that can help account for imperfect detections in sampling and can be used to improve iDNA surveys. Using a case study based on the DNA screened from the blood meals of leeches, we demonstrate how multiscale occupancy models can be applied to the molecular detection of vertebrates to reveal the nuances in iDNA detections. Leeches were collected across a habitat degradation gradient in Sabah, Malaysian Borneo, in 2015 and 2016. We estimated three probabilities describing the occupancy, availability, and detection of three abundant mammals (bearded pig, muntjac and sambar deer) and compared how these values were impacted by environmental and technical covariates. For 2015, we found that null models without covariates revealed no clear differences in each of the three probabilities across taxa. However, in 2016, although the taxa have comparable occupancy, deviations occurred in the other two probabilities, with the sambar deer showing the lowest availability and muntjac with the lowest detection probability. Univariate models constructed for each taxon and year revealed differential impacts of the covariates; for example, a strong positive effect of DNA concentration on the detection of sambar deer and bearded pig was seen in 2016 only. Finally, our estimation of the minimum numbers of biological and technical replicates highlights the important trade-off between achieving high probabilities of availability and detection and realistic amounts of sampling. 
Our results showcase the use of occupancy models for leech-iDNA biodiversity surveys but highlight the potential effects of sample type, methodological design, and sample size.
1 Introduction
It has been shown previously that vertebrate DNA can be obtained from a range of blood-feeding invertebrates, including from carrion flies (e.g., Hoffmann et al. 2018), terrestrial leeches (e.g., Danabalan et al. 2023), mosquitos (e.g., Chivas et al. 2025) and sandflies (e.g., Kocher et al. 2017). This has led to growing interest in the potential of using these taxa as samplers for rapid assessments of vertebrate communities and to measure and monitor biodiversity (Gogarten et al. 2020). To date, most invertebrate-derived DNA (iDNA) studies have focused on documenting species presence, with less consideration of the extent to which such methods detect, or fail to detect, species that are present. Few estimates of false negatives have been conducted, and error rates in general are not well reported. Many common practices, such as the use of technical replicates, are based on convention and budget rather than data-driven and informed sampling approaches. However, like all survey methods, iDNA surveys are prone to the inherent issues of imperfect detection, and there is a need to quantify the uncertainty surrounding the detections to better understand how studies should be designed and how these methods can be improved to account for sampling biases.
Occupancy modeling is a powerful statistical framework that has been widely adopted for analyzing biodiversity survey data (MacKenzie et al. 2003). This approach is based on the idea that most survey methods are unlikely to detect all individuals that are present at a site, and it leverages replicate sampling to disentangle detection from presence (Guillera-Arroita et al. 2010). A particular feature of these models is the ability to incorporate covariates, such as information about location, habitat quality, or abiotic conditions, which may influence the data that is collected.
In recent years, occupancy models have been applied to environmental DNA (eDNA) surveys; for example, to track chytrid fungus in amphibians (Schmidt et al. 2013), monitor the invasive Burmese python in Florida (Hunter et al. 2015), and investigate goby fish metapopulation dynamics (Martel et al. 2020), providing improved detection probabilities, especially in species that are difficult to track. In contrast, almost no studies have applied occupancy models to iDNA surveys, with the exception of investigations that have examined detections in leech bloodmeals. For example, Abrams et al. (2019) revealed consistent estimates of occupancy probabilities obtained from leeches compared to those obtained from camera traps, although with higher uncertainty for the leech estimates, which the authors speculated might be overcome by increasing sample number. The value of large samples was also supported by a comprehensive assessment of vertebrate occupancy across a whole protected area (Ji et al. 2022). Here, an analysis of over 30,000 leeches revealed changes in occupancy with spatial metrics, including distance to reserve edge. While such studies have demonstrated real-world conservation applications of using leech iDNA, they also highlight trade-offs between the lower detection rate found in leeches (compared to methods such as camera trapping) and the sampling effort required to make robust estimates.
Given their flexibility, occupancy models applied to iDNA studies also provide a means to explore how detections can be affected by forms of replication. First, taking multiple samples from within a site during a survey is analogous to spatial replication in a traditional survey. Second, PCR replicates subsampled from the same DNA extract can be considered equivalent to temporal replicates in a traditional survey (Dorazio and Erickson 2017). Replicate PCR reactions are commonly used to overcome stochasticity in DNA amplification. Such stochasticity is more pronounced where DNA concentrations are low (Barnes and Turner 2016), as is common in iDNA studies because of the degradation of DNA during digestion. Additionally, in leech-based iDNA studies, long inter-feeding intervals and the pooling of “empty” individuals can also contribute to lower levels of DNA. Both biological and technical forms of replication can be incorporated into multiscale occupancy models. A multiscale occupancy model builds on the classic two-level occupancy model, which estimates occupancy probability (ψ) and detection probability (p) (MacKenzie et al. 2003), by adding a third parameter, the availability probability (θ) (Kéry and Royle 2016). For the purposes of an iDNA survey, the occupancy probability describes the probability that the DNA of the target animal is occupying a site. The availability probability is the probability that the leeches sampled contain the DNA of the target animal, given that the animal was at the site. The detection probability then describes the probability that the DNA of the target animal is amplified, given that the DNA was present in the leech and that the animal was at the site.
To assess the usefulness of multi-scale occupancy models applied to iDNA for revealing subtle differences in mammal presence across spatial, temporal, and technical replicates, here we reanalyze a dataset of mammal detections inferred from leech bloodmeals from Malaysian Borneo. Using multiscale occupancy models, we model three probabilities, occupancy (ψ), availability (θ) and detection (p), as functions of environmental and technical covariates relating to habitat degradation, sampling effort, and DNA concentration. Finally, we estimate the minimum replication needed to confidently detect all three focal taxa, as this remains an important financial and logistical constraint on DNA biodiversity surveys.
2 Materials and Methods
2.1 iDNA Biodiversity Survey and Data Generation
The preliminary iDNA data used to generate these occupancy models came from a previous study (Drinkwater, Jucker, et al. 2021), where the full field sampling, molecular, and bioinformatic methods are described. Briefly, terrestrial tiger leeches (tiger leech morphospecies) collected at the Stability of Altered Forest Ecosystems project (SAFE), Sabah, Malaysia, were screened for mammal DNA using metabarcoding methods. Leeches were collected during repeated surveys in two campaigns (February–June 2015 and September–December 2016). In each campaign, several surveys were conducted at multiple sites in different forest types, categorized here as primary, twice-logged, heavily logged, and riparian forest (for more details on the SAFE project, see Ewers et al. 2011). DNA extracts from leeches sampled within the same site and survey were pooled, amplified in triplicate, and sequenced with 150 bp paired-end Illumina MiSeq (Table 1). Reads were processed following Drinkwater, Jucker, et al. (2021); however, after length and quality filtering, we retained all sequences regardless of how many replicates they were found in. This allowed us to investigate PCR replicate-level covariates. The reads were clustered into OTUs at 95% similarity with sumaclust v1.3 (Mercier et al. 2013), and chimeras were removed (Schloss et al. 2009). Post-clustering filtering was conducted with LULU, which removes OTUs that are likely to have arisen through sequencing error alone (Frøslev et al. 2017), and these curated OTUs were taxonomically assigned to the custom database using BLAST (Camacho et al. 2009).
Year | Habitat type | Pools | Total leech individuals | Av. number of individuals per pool | MUSP (muntjac) | RUUN (sambar deer) | SUBA (bearded pig) |
---|---|---|---|---|---|---|---|
2015 | Heavily logged | 29 | 271 | 9.7 | 4 | 3 | 9 |
2015 | Selectively logged | 41 | 425 | 10.3 | 10 | 11 | 7 |
2015 | Riparian | 35 | 286 | 8.2 | 9 | 1 | 9 |
2016 | Heavily logged | 21 | 193 | 9.2 | 8 | 16 | 21 |
2016 | Selectively logged | 32 | 359 | 11.2 | 2 | 8 | 61 |
2016 | Old growth | 23 | 200 | 9.1 | 3 | 1 | 15 |
2016 | Riparian | 6 | 107 | 17.8 | 1 | 0 | 1 |
2.2 Multi-Scale Occupancy Models
There are several assumptions of occupancy models, although these might not always be fully met. The first assumption is that detections are independent. Although this could be violated if multiple leeches fed on the same large-bodied mammals, our detection probabilities are estimated from pools obtained from different sites and surveys. Second, detection and occupancy are constant in space or accounted for by covariates, which is the strategy taken here. In the models described below, covariates are added to the model at each of the three levels (site, sample, and replicate) to account for variation influencing the probabilities. Third, the sites do not change occupancy state during the time of the survey (i.e., sampling is within a closed season). This assumption is harder to satisfy; however, field sampling took place over a short period of 3–4 months, and considering the feeding behavior and degradation of DNA in the leech gut (Schnell et al. 2012), we are confident that we are only detecting the last blood meal per leech. The final assumption is that there are no false positives with identifications; again, a harder assumption to satisfy, and this is achieved through bioinformatic filtering protocols and conservative taxonomic assignments to a well-resolved database.
The data were structured at three nested levels:
- Site level: 12 sites in 2015 and 10 sites in 2016 in the four forest types
- Sample level: multiple leech pools from within each site, from the same survey (Table 1)
- Replicate level: three PCR replicates per leech pool
For example, for bearded pig at a given site, in three leech pools (sample), with triplicate PCRs (replicate), a hypothetical detection history at a single site could be expressed as {101 111 100}, which shows the three PCR replicates nested within pools. Here, there are two detections in the first pool, three in the second pool, and a single detection in the third pool.
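The nesting in such a detection history can be made explicit with a few lines of code. The following is a minimal sketch, not the data format required by any particular package: the function name `parse_detection_history` and the string layout are assumptions made for illustration. It splits the hypothetical history into pools and PCR replicates and then summarizes detections at each level.

```python
def parse_detection_history(history: str) -> list[list[int]]:
    """Split a history such as '{101 111 100}' into pools of 0/1 PCR results."""
    pools = []
    for pool in history.strip("{} ").split():
        if set(pool) - {"0", "1"}:
            raise ValueError(f"unexpected symbol in pool {pool!r}")
        pools.append([int(flag) for flag in pool])
    return pools


# Hypothetical history from the text: three leech pools, three PCR replicates each.
history = parse_detection_history("{101 111 100}")

# Replicate level: number of positive PCRs within each pool.
detections_per_pool = [sum(pool) for pool in history]

# Sample level: did the pool yield at least one positive PCR?
pool_has_detection = [int(any(pool)) for pool in history]

# Site level: was the taxon detected in at least one pool?
site_has_detection = int(any(pool_has_detection))

print(detections_per_pool)  # [2, 3, 1]
print(pool_has_detection, site_has_detection)
```

A pool with no positive replicates (for example `000`) does not by itself show that the target DNA was absent from that pool, which is exactly the ambiguity the availability and detection parameters are meant to separate.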
The levels of the hierarchical model, as defined in the R package eDNAoccupancy (Dorazio and Erickson 2017), are as follows:
- Occupancy (ψ): the probability that DNA from the target taxon occupies a site where terrestrial leeches are present. Zi describes the presence (Zi = 1) or absence (Zi = 0) of the target DNA at site i as a function of the occupancy parameter at the site, ψi.
- Availability (θ): the probability that the DNA was available in the leech pool, that is, that the leeches sampled had fed on the blood of the target taxon, given that the target is present at the site. The availability parameter θij is conditional on the value of zi: it is the probability that the target DNA occurs in sample j, given that the species was present (zi = 1) at site i.
- Detection (p): the probability that the DNA of the target taxon is amplified in a PCR replicate, given that the DNA was present in the leech pool.
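For reference, the three levels can be written compactly as nested Bernoulli trials. The sketch below follows the general formulation of Dorazio and Erickson (2017) as we understand it. The indicator symbols $A_{ij}$ and $Y_{ijk}$, the replicate index $k$, and the third (detection) line, which restates the definition of p given in the Introduction, are notation assumed here for illustration.

```latex
\begin{align*}
Z_i &\sim \mathrm{Bernoulli}(\psi_i)
  && \text{target DNA present at site } i \\
A_{ij} \mid z_i &\sim \mathrm{Bernoulli}(z_i\,\theta_{ij})
  && \text{target DNA present in leech pool } j \text{ at site } i \\
Y_{ijk} \mid a_{ij} &\sim \mathrm{Bernoulli}(a_{ij}\,p_{ijk})
  && \text{target DNA amplified in PCR replicate } k \text{ of pool } j
\end{align*}
```

Because each level is multiplied by the indicator of the level above it, a pool from an unoccupied site ($z_i = 0$) cannot contain the target DNA, and a replicate from a pool without target DNA ($a_{ij} = 0$) cannot yield a detection. This is how the no-false-positives assumption enters the model.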
2.3 Covariate Selection and Model Construction
To capture the variability across the human-modified landscape, we used a site-level covariate of habitat heterogeneity, calculated from a LiDAR aerial survey across the SAFE project in 2014 (Jucker et al. 2018) and extracted at a 50 m buffer for each site (see Drinkwater, Jucker, et al. 2021). Values can range from −1 to 1. Strongly negative values correspond to a more even canopy dispersion, although this is rare in natural forests. High-quality forests with intact canopies are more likely to have values near zero, while values closer to one indicate more clustering in canopy cover, for example with more gaps, as is typical in degraded, human-modified forests.
As a sample-level covariate, we used the number of pools per site to indicate sampling effort, which would therefore affect the availability probability (θ). High numbers of pools per site are expected to increase the chances of availability of the mammal DNA for sequencing. Finally, we included the DNA concentration of the pools as the replicate-level covariate. We expect this to influence the detection parameter, p, as DNA concentration can affect the amplification success of the PCR reaction, with too much DNA in the pool potentially leading to inhibition and too little DNA increasing the stochasticity of DNA amplification of the target species in a mixed pool.
For each of the three mammals and for each of the two sampling seasons (2015 and 2016), we constructed four models: null models (i.e., no covariates and only intercepts) and three univariate models, each including one covariate at the nested levels of the model (Table 2). We did not test the effect of the interactions between covariates because the dataset was not large enough, and MCMC chains did not converge for more complex models.
Model | Covariate structure | Description |
---|---|---|
Null | ψ (.) θ (.) p (.) | No covariates |
Site level | ψ (heterogeneity) θ (.) p (.) | Habitat heterogeneity |
Sample level | ψ (.) θ (pools) p (.) | Survey effort |
Replicate level | ψ (.) θ (.) p (conc) | iDNA concentration |
Models were run using the eDNAoccupancy package (version 0.2.7) with default parameters (Dorazio and Erickson 2017) in R (R Core Team 2019). The package uses Bayesian inference to estimate the parameters of ψ, θ and p, assuming a uniform prior distribution for the occupancy parameter (ψ) and a multivariate normal prior for the availability (θ) and detection (p) probabilities (further details in Dorazio and Erickson 2017). Each chain was updated at least 10 times after 20,000 iterations, using the updateOccModel function, and all models showed MCMC chain convergence after 200,000 iterations. We used trace plots to assess model convergence, and we compared goodness-of-fit for each of the models using two commonly applied criteria for Bayesian models: the Watanabe-Akaike information criterion (WAIC; Watanabe 2010) and the posterior-predictive loss criterion (PPLC; Gelfand and Ghosh 1998) (see Table S1).
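To illustrate how a single covariate can shift one of the three probabilities in a univariate model, the sketch below maps a linear predictor to a probability through a probit link. We assume here that this is the link used by eDNAoccupancy, which is our reading of Dorazio and Erickson (2017). The function names and the intercept and slope values are hypothetical and are not estimates from this study.

```python
from math import erf, sqrt


def standard_normal_cdf(x: float) -> float:
    """Phi(x): maps a probit-scale linear predictor to a probability."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def probability_from_covariate(intercept: float, slope: float, covariate: float) -> float:
    """Probability implied by a one-covariate probit regression."""
    return standard_normal_cdf(intercept + slope * covariate)


# Hypothetical coefficients for psi(heterogeneity); NOT values from the paper.
intercept, slope = 0.8, -1.2

for heterogeneity in (-0.5, 0.0, 0.5):
    psi = probability_from_covariate(intercept, slope, heterogeneity)
    print(f"heterogeneity={heterogeneity:+.1f} -> psi={psi:.2f}")
```

With a negative slope, as in this made-up example, occupancy falls as canopy clustering increases. A null model corresponds to dropping the slope term, so that the probability is determined by the intercept alone.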
2.4 Cumulative Probabilities
Finally, to test the effects of sampling and PCR replication on two of the probabilities, availability (θ) and detection (p), we calculated cumulative probabilities. This is an important consideration for molecular biodiversity studies, as increasing the number of samples and PCR replicates to be sequenced will increase costs. Any increase in cost needs to be traded off against the increasing probability of detection with greater sample size, and the benefits of increased confidence in the detections by confirming a detection in multiple PCR reactions. This is of particular concern when designing molecular surveys for conservation monitoring schemes where confidence in the presence of a rare target species is crucial. We calculated the cumulative availability probability as $\theta^{*} = 1 - (1 - \theta)^{k_{\mathrm{bio}}}$ and the cumulative detection probability as $p^{*} = 1 - (1 - p)^{k_{\mathrm{pcr}}}$, where θ is the availability probability, p is the detection probability, kbio is the number of leech pools, and kpcr is the number of PCR replicates. We calculated these cumulative probabilities for k = 1 up to k = 20 (Hunter et al. 2015). In both cases, θ and p, we took the value for k = 1 from the posterior mean of the null model (which included no covariates).
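The calculation can be sketched in a few lines. The example assumes that replicates are independent, so that the chance of at least one success in k replicates with per-replicate probability q is 1 − (1 − q)^k. The function names and the input probabilities are hypothetical and chosen only for illustration.

```python
def cumulative_probability(per_replicate: float, k: int) -> float:
    """Probability of at least one success in k independent replicates."""
    return 1.0 - (1.0 - per_replicate) ** k


def min_replicates(per_replicate: float, threshold: float, max_k: int = 20):
    """Smallest k in 1..max_k whose cumulative probability meets the threshold.

    Returns None if the threshold is not reached within max_k replicates.
    """
    for k in range(1, max_k + 1):
        if cumulative_probability(per_replicate, k) >= threshold:
            return k
    return None


# Illustrative per-replicate probabilities; NOT estimates from this study.
for q in (0.05, 0.2, 0.5):
    print(q, min_replicates(q, 0.5), min_replicates(q, 0.8))
```

The same function serves both levels: pass θ and interpret k as the number of leech pools, or pass p and interpret k as the number of PCR replicates. A return value of `None` flags a probability so low that the threshold is not reached within the 20 replicates considered.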
3 Results
3.1 Detection Histories and Model Evaluation
To generate the detection history for each taxon, in each of the three PCR replicates, we set the lowest minimum PCR threshold to one, such that each taxon only had to be detected once in a replicate after filtering for sequence quality. This resulted in 485 OTUs, of which 45 OTUs were identified as chimeric sequences and thus removed. Of the remaining 440 OTUs, 82 were identified as true sequences (“parent OTUs”) by LULU (post-clustering filtering). For the three target taxa, we then identified the number of detections across each of the 187 leech pools (Table 1). The total number of detections for muntjac in 2015 and 2016 was, respectively, 22 and 11, for sambar deer was 12 and 10, and for bearded pig was 23 and 43.
3.2 Null Model
For null models fitted without covariates, the occupancy probability (ψ) was high (> 0.75) for all taxa in both years, except for the sambar deer in 2015 (Figure 1). The probability of availability (θ) showed no clear differences among taxa in 2015. In contrast, we recorded lower availability for all three taxa in 2016, with sambar deer showing much lower availability than bearded pig (0.13 and 0.54, respectively) (Figure 1). The probability of detection also showed variation in 2016, with higher values for bearded pig and sambar deer but a low probability of detection for muntjac DNA.

3.3 Impact of Habitat Heterogeneity on Occupancy Probability
For site-level models that included habitat heterogeneity as a covariate, muntjac and bearded pig showed little or no change in mean occupancy with increasing levels of canopy clustering (Figure 2). However, the mean response of the sambar deer to habitat heterogeneity in 2015 was negative, with decreasing occupancy associated with increasing canopy clustering, indicating a negative impact of low-quality habitat (Figure 2). There is one clear outlier to this trend, where the least clustered (therefore highest quality) site also had one of the lowest probabilities of occupancy. However, patterns across all species were associated with large and overlapping credible intervals.

3.4 Impact of Sampling Effort on DNA Availability Probability—Sample Level Model
In 2015, the availability of DNA increased with the number of pools for all three taxa. In contrast, in 2016, we observed a slight positive trend for the bearded pig but negative trends for the other two taxa: as leech pools were added, the probability of availability decreased for muntjac and sambar deer (Figure 3).

3.5 Impact of DNA Concentration on Detection Probability—Replicate Level Model
The impact of DNA concentration cannot be compared directly between years because the measurements were taken on different instruments that use different methodologies. Within 2015, detection probability remained low and relatively consistent between taxa (Figure 4). In 2016, the probability of detecting DNA from sambar deer and bearded pig increased with DNA concentration, with a particularly steep relationship for the bearded pig. In contrast, DNA concentration had little impact on the detection probability of the muntjac, which remained constant across DNA concentrations.

3.6 How Much Replication Is Needed?
Using the cumulative probability between 1 and 20 samples (biological and technical), we estimated the amount of replication needed to meet either an arbitrary 50% or 80% probability threshold (Figure 5). Broadly, less sample-level replication (i.e., fewer leech pools) was required in 2015 to reach an 80% availability probability. At the taxon level, an 80% availability probability would be reached with the fewest pools for bearded pig (2 or 3) and with about 5 pools for muntjac, whereas sambar deer would require 11 or 12 pools to cover both years (Figure 5). For detection probability, approximately 10 PCR replicates would be needed to reach between 50% and 80% detection. However, the variability between years is greater for this cumulative probability; for example, in 2016 only 2 or 3 replicates would be required for 80% detection of bearded pig, whereas in 2015, 15 PCR replicates would be needed to reach the same threshold (Figure 5).
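As a rough check on these pool numbers, the threshold condition can be solved for k directly. The worked example below assumes independent replicates and uses only the 2016 null-model availability estimates reported in Section 3.2 (0.13 for sambar deer and 0.54 for bearded pig). It is an illustration of the arithmetic, not a reproduction of Figure 5, which covers both years.

```latex
\begin{align*}
1 - (1 - \theta)^{k} \ge T
  \quad &\Longleftrightarrow \quad
  k \ge \frac{\ln(1 - T)}{\ln(1 - \theta)} \\[4pt]
\text{sambar deer, } \theta = 0.13,\ T = 0.8: \quad
  & k \ge \frac{\ln 0.2}{\ln 0.87} \approx 11.6
  \;\Rightarrow\; k = 12 \text{ pools} \\[4pt]
\text{bearded pig, } \theta = 0.54,\ T = 0.8: \quad
  & k \ge \frac{\ln 0.2}{\ln 0.46} \approx 2.1
  \;\Rightarrow\; k = 3 \text{ pools}
\end{align*}
```

Because k must be a whole number, the bound is rounded up. Two pools give bearded pig a cumulative availability of about 0.79, just short of the threshold, which is consistent with the "2 or 3" pools reported above.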

4 Discussion
The objective of this study was to determine the usefulness of analyzing mammal detection data based on iDNA within an occupancy modeling framework. Although increasing numbers of molecular surveys have utilized occupancy modeling to estimate species presence probabilities, this has not often been done at the level of technical replicates (e.g., PCR). By reanalyzing an iDNA dataset and incorporating both environmental and technical covariates, here we show that this approach can reveal species and seasonal differences that would otherwise not be evident from an analysis of presence and absence data alone (Drinkwater, Jucker, et al. 2021). Focusing on three common mammals, we found that all three probability estimates for the first year of sampling were more consistent, and with a higher probability of occupancy and lower probability of detection, than in the second year. Results from the second year of sampling suggested more variation across probability types, making it harder to draw conclusions. Here we discuss whether this variation could have arisen from differences in environmental conditions between the sampling years, or might reflect stochasticity from small sample sizes.
At a site level, we observed that all three focal taxa appeared to be tolerant to changes in habitat quality, at least with regards to habitat heterogeneity. This is an expected result, given that all three taxa were selected based on their high level of detection across habitats in the original iDNA biodiversity assessment (Drinkwater, Jucker, et al. 2021). At the same time, however, the occupancy modeling here appears to uncover a more nuanced picture in which sambar deer showed a higher probability of occupancy in higher-quality habitat (i.e., sites with less patchy canopy cover) in the first year of sampling. The environmental variable ‘habitat heterogeneity’ describes the amount of clustering at the canopy level, with low heterogeneity values (hence lower amounts of clustering) signifying more intact and closed canopies that tend to reflect lower levels of logging pressure and less degradation. This result disagrees with previous reports that sambar deer abundance does not decrease in logged forests (e.g., Granados et al. 2016), and instead supports Deere et al. (2017), who also showed that the occupancy probability of sambar deer was slightly lower than bearded pig in some degraded forest classes. Similarly, Brodie et al. (2015) reported that while sambar deer occur in logged forest, they tend to avoid edge habitats. Therefore, given the high degree of fragmentation at our study site, it is possible that sambar deer move deeper into the forest interior.
Different patterns across the three taxa were also seen in the null model at the sample level between years (the replicate level model cannot be compared between years). Such contrasting results between consecutive years could stem from the fact that the sampling period spanned one of the strongest El Niño Southern Oscillation (ENSO) events on record (2014–2016) (Timmermann et al. 2018). In Sabah, the main regional consequence of this El Niño event was a delayed rainy season, with very dry conditions and intense fire outbreaks (Chen et al. 2016). These conditions will have directly affected food availability and fruiting phenology, and would likely have resulted in changes in mammal behavior (e.g., Fredriksson et al. 2006). Importantly for this type of iDNA study, the El Niño-induced drought will have affected the leeches too, as they are known to be sensitive to microclimate (Drinkwater et al. 2020) and soil moisture (Nelaballi et al. 2022). The drought began in 2015 and lasted until late 2016 (Miyamoto et al. 2021), and, therefore, some leeches collected in 2015 might have fed prior to the onset of drought conditions, whereas leeches from 2016 are more likely to have been adversely affected. Given that leeches avoid dry conditions and may forage less during droughts, reduced foraging activity might have contributed to the observed wider variation in detection and encounter rates. Interestingly, some insect species have been reported to feed more at times of water stress (e.g., Hagan et al. 2018); however, this probably does not apply to leeches, which lack an exoskeleton and are at higher risk of desiccation (see Phillips et al. 2020; Drinkwater et al. 2020). Indeed, leeches might survive periods of drought by foraging less frequently, perhaps instead sheltering in moist leaf litter, though more evidence is needed.
Sampling effort was also seen to influence the chance of detecting the target DNA in a leech (availability probability) across taxa and years. In particular, effort was positively associated with DNA availability probability in 2015, as expected, but this pattern was reversed in 2016 for two of the taxa. Given that this probability relies on the fact that sampled leeches have recently fed, we suggest that the observed trend for 2016 reflects longer inter-feeding intervals during the sustained drought period, such that mammal DNA had been cleared from the gut by the time of sampling. Ultimately, including a covariate that describes the gut DNA degradation window (Schnell et al. 2012) may improve the sensitivity of iDNA occupancy models; such information could be ascertained through feeding trials under experimental conditions (Schnell et al. 2012; Drinkwater, Williamson, et al. 2021). Failure to detect target DNA might also result from swamping by DNA from a more abundant taxon (e.g., bearded pig) in the mixed DNA pool, making the DNA from rarer taxa less available for amplification. Alternatively, biases could also arise from technical distortion, for example due to differences in primer binding efficiency across taxa, an issue that has been seen in other molecular diet analysis studies (Alberdi et al. 2019). To overcome these potential problems, future studies could target the bloodmeals of single leeches, which are typically considered to contain DNA from a maximum of one individual mammal, although such a step would reduce cost effectiveness and may be unfeasible for many studies if there is a high rate of unfed “empty” leeches.
In the final level of the occupancy model, we found a positive relationship between the DNA concentration and probability of detection in a PCR replicate in two cases. Low starting concentrations may increase PCR stochasticity, thus reducing the rate of detection (Alberdi et al. 2018). Here, adopting laboratory procedures designed to maximize DNA concentration, for example from forensics or palaeogenomics, could help to enhance species detection further. Such refinements could include adjustments to buffer reagents in the DNA-binding step, or utilizing silica-coated magnetic beads to prevent DNA loss (Rohland et al. 2018). It is important to note, however, that such measures will increase the concentration of the total DNA as opposed to target DNA, with the unintended consequence of also concentrating inhibitors (McKee et al. 2015). The enrichment of target DNA has been advanced recently, with the application of new sequencing methods, including applying deeper mitochondrial sequencing to species-specific targets (Nguyen et al. 2021) and mitochondrial capture from leeches and flies (Danabalan et al. 2023). A method of adaptive sampling based on nanopore sequencing (Oxford Nanopore Technology) has also been somewhat successfully applied, where the host DNA (i.e., the invertebrate) is rejected during sequencing, allowing enrichment of the prey DNA (Khan et al. 2024).
To investigate further the impact of technical replicates on the probabilities of availability and detection, we estimated the number of biological and technical replicates needed to reach an arbitrary threshold of 80%. These analyses revealed that reaching this threshold of availability across all taxa and both years required 2–10 pools of leeches. In contrast, achieving the same probability of detection required a much higher number of > 15 PCR replicates to capture all taxa, which would be prohibitively expensive for most studies. Since higher availability can also be attained through biological replication, we suggest that increasing the numbers of leeches collected is more achievable and cost-effective compared with increasing technical replication, thus supporting earlier leech-based studies that advocate for high sample sizes (Abrams et al. 2019; Ji et al. 2022). This finding might pertain specifically to leeches, given that some unfed individuals will yield no vertebrate DNA, compared to, for example, eDNA studies more generally, where samples can contain multiple species.
As eDNA and iDNA sampling becomes commonplace for biodiversity assessments and targeted species detections, it is critical that we understand the biases underlying the detection data to help draw robust conclusions, especially for the detection of species of conservation concern. Although small in scale, our study demonstrates the potential usefulness of applying multiscale occupancy modeling to iDNA studies. In particular, we uncover changes in probabilities across year, habitat quality, and taxon, which would otherwise not have been evident. Ultimately, however, our relatively small sample sizes, coupled with the low detection rates in the leeches, mean that our estimated probabilities are typically associated with large and overlapping error bars. While some of the observed trends make intuitive sense, such as the apparent negative impact of habitat heterogeneity on sambar deer occupancy, we cannot rule out the possibility that at least some such variation stems from stochasticity linked to a lack of power. We thus recommend that future studies should balance the need for samples with the biology of the samplers as well as any potential ecological consequences of removing large numbers of invertebrates from the ecosystem.
Author Contributions
Study design: R.D., S.J.R. Analysis of the data: R.D. Interpretation of the data: E.L.C., S.J.R. Writing of the manuscript: R.D., E.L.C., S.J.R.
Acknowledgments
We thank Henry Bernard for all his help with the logistics and Nicolas Deere for help with samples and discussions regarding analyses. For help and discussions regarding statistical modeling, we thank Alberto Carmagnini. All data are reanalyzed from Drinkwater, Jucker, et al. 2021 under the Sabah Biodiversity Council permits (JKM/MBS.1000-2/2 (374), JKM/MBS.1000-2/3 JLD.2 (55), JKM/MBS.1000-2/2 (34), JKM/MBS.1000-2/3 JLD.2 (107) and JKM/MBS.1000-2/3 JLD.3 (44)). The study was funded by a Natural Environment Research Council grant NE/K016148/1 & NE/K016261/1, awarded as part of the Human Modified Tropical Forests Programme, and R.D. received additional support from The Leverhulme Trust Study Abroad Studentship (SAS-2016-100). For the original data collection, we thank Yayasan Sabah, Sime Darby and Benta Wawasan for access. For assistance in the field, we thank all LOMBOK research assistants, the Stability of Altered Forest Ecosystems (SAFE) project and the South East Asian Rainforest Research Partnership (SEARRP). Original data were generated at the Bart's and the London Genome Centre (Queen Mary University of London) and the Danish National High-Throughput Sequencing Centre (University of Copenhagen).
Conflicts of Interest
The authors declare no conflicts of interest.
Open Research
Data Availability Statement
The raw sequence data underlying this study are archived and available on the NCBI Sequence Read Archive (SRA) under BioProject accession no. PRJNA672059 (https://www.ncbi.nlm.nih.gov/sra/PRJNA672059). Site level data are available on the SAFE project Zenodo repository (http://doi.org/10.5281/zenodo.4095374, Drinkwater, Jucker, et al. 2021).