Volume 21, Issue 3 pp. 721-732
RESOURCE ARTICLE

Characterization of reproductive gene diversity in the endangered Tasmanian devil

Parice A. Brandies

School of Life and Environmental Sciences, Faculty of Science, University of Sydney, Sydney, NSW, Australia

Belinda R. Wright

School of Life and Environmental Sciences, Faculty of Science, University of Sydney, Sydney, NSW, Australia

Carolyn J. Hogg

School of Life and Environmental Sciences, Faculty of Science, University of Sydney, Sydney, NSW, Australia

Catherine E. Grueber

School of Life and Environmental Sciences, Faculty of Science, University of Sydney, Sydney, NSW, Australia

San Diego Zoo Global, San Diego, CA, USA

Katherine Belov

Corresponding Author

School of Life and Environmental Sciences, Faculty of Science, University of Sydney, Sydney, NSW, Australia

Correspondence

Katherine Belov, School of Life and Environmental Sciences, Faculty of Science, University of Sydney, Sydney, New South Wales, Australia.

First published: 14 November 2020
Citations: 3

Abstract

Interindividual variation at genes known to play a role in reproduction may impact reproductive fitness. The Tasmanian devil is an endangered Australian marsupial with low genetic diversity. Recent work has shown concerning declines in productivity in both wild and captive populations over time. Understanding whether functional diversity exists at reproductive genes in the Tasmanian devil is a key first step in identifying genes that may influence productivity. We characterized single nucleotide polymorphisms (SNPs) at 214 genes involved in reproduction in 37 Tasmanian devils. Twenty genes contained nonsynonymous substitutions, with genes involved in embryogenesis, fertilization and hormonal regulation of reproduction displaying greater numbers of nonsynonymous SNPs than synonymous SNPs. Two genes, ADAMTS9 and NANOG, showed putative signatures of balancing selection indicating that natural selection is maintaining diversity at these genes despite the species exhibiting low overall levels of genetic diversity. We will use this information in future to examine the interplay between reproductive gene variation and reproductive fitness in Tasmanian devil populations.

1 INTRODUCTION

Globally, the number of species threatened with extinction is increasing as a result of human-induced threats, including habitat fragmentation, invasive predators and pollution. Genetic diversity at functional gene families can have long-term consequences for species adaptation and survival in a changing world (Holderegger et al., 2006; Mimura et al., 2017). Understanding the causes and consequences of interindividual variation sits at the core of evolution and ecology, yet despite decades of molecular research, the genetic basis of phenotypic variation (i.e., genetic polymorphism) remains poorly quantified for the vast majority of species and traits (Forsman & Wennersten, 2016; Mimura et al., 2017). However, recent advances in sequencing technology have made it easier for researchers to investigate interindividual variation at gene families and to determine how this variation is linked to important phenotypic traits. For example, genetic diversity at immune genes, particularly genes of the major histocompatibility complex (MHC), has been associated with a range of key biological phenomena such as disease susceptibility and mate choice (Brandies et al., 2018; Sommer, 2005). These phenomena have significant implications for fitness, and as a result, interindividual variation at MHC loci has been extensively studied across a number of threatened species (Ujvari & Belov, 2011). Studies of MHC and other immune genes have demonstrated that characterizing genetic variation is crucial for predicting which genes may contribute to variable phenotypes, and for understanding the resultant implications for species conservation. However, little is currently known about diversity at other important gene families in threatened species.

Variation at reproductive genes may contribute to key productivity traits that impact the survival of threatened species. Relationships between gene variants and reproductive phenotypes have been extensively studied across a range of model organisms, from Drosophila to humans. For example, polymorphisms in male reproductive genes have been associated with variation in sperm competitive ability in Drosophila (Fiumera et al., 2005) and a range of gene mutations have been linked to infertility in humans (Layman, 2002). Associations between variants of key reproductive genes (e.g., those involved in the production or binding of reproductive hormones) and reproductive traits have also been reported in livestock species where high productivity is important (Kirkpatrick, 2002). Examining diversity at genes known to be involved in reproduction is a fundamental first step in determining which loci have the potential to underlie important reproductive traits. However, little is currently known about the variation at reproductive genes in wildlife species, particularly in threatened species that exhibit low levels of genetic diversity overall.

The Tasmanian devil (Sarcophilus harrisii) is one such threatened species: it faces a range of threatening processes in addition to having low genome-wide diversity. Devils are the largest extant carnivorous marsupial and are native to the island state of Tasmania, Australia (Owen & Pemberton, 2005). Populations have declined by up to 80% across this species’ range due to a contagious cancer known as devil facial tumour disease (DFTD). Historical population declines and contemporary habitat fragmentation have resulted in the erosion of genetic diversity (Jones et al., 2004; Miller et al., 2011), particularly at immune gene loci that are highly polymorphic in other species (Cheng et al., 2012; Morris et al., 2015). Tasmanian devils exhibit a number of interesting life-history strategies, such as the ability of females to undergo up to three oestrous cycles per breeding season (Keeley et al., 2012), the production of up to 30 embryos, of which only four can be supported by the four teats (Guiler, 1970; Hughes, 1982), precocial breeding (Lachish et al., 2009; Russell et al., 2019) and multiple-paternity litters (Russell et al., 2019). Despite these unique reproductive traits, Tasmanian devils have shown concerning declines in productivity in both captivity (Farquharson et al., 2017) and the wild (Farquharson et al., 2018). Understanding whether diversity exists at reproductive genes is therefore a fundamental step in identifying genes that may be associated with differential reproductive phenotypes and hence may influence reproductive fitness. Conservation managers can then draw on this knowledge when making decisions about captive breeding and translocations.

Here, we aimed to identify and characterize reproductive genes, and then examine single nucleotide polymorphism (SNP) diversity at these genes using 37 resequenced Tasmanian devil genomes. We explore signatures of selection to identify polymorphic genes with adaptive potential (i.e., genes where specific alleles may result in differential phenotypes that are beneficial under particular circumstances). The results from this study provide a resource for future research to examine the association between reproductive diversity and productivity in the Tasmanian devil.

2 MATERIALS AND METHODS

2.1 Gene identification and characterization

In total, 250 genes that have previously been associated with reproduction in mammalian species were selected based on literature searches using the search terms “reproduction” and “gene,” as well as mining the human gene database GeneCards (www.genecards.org, Stelzer et al., 2016) using the keyword “reproduction.” The identified genes are involved in a variety of reproductive stages including: the hormonal regulation of reproduction, sexual/reproductive development, gametogenesis, fertilization and embryogenesis. Predicted complete and partial gene sequences from NCBI’s or Ensembl's automatic annotation process were identified in the Tasmanian devil genome reference assembly on NCBI (Devil_ref v7.0 [GCA_000189315.1], Murchison et al., 2012).

Gene predictions in the Tasmanian devil genome were checked using a number of methods including: (a) confirming gene synteny against model organisms (human and mouse) and the current highest-quality marsupial genome (koala) using NCBI’s genome viewer (NCBI Resource Coordinators, 2017); (b) mapping the predicted coding sequences (CDS) back to the reference genome using splign (Kapustin et al., 2008) to ensure all exons were correctly identified and to confirm that CDS were complete and did not contain any premature stop codons or frameshift mutations; and (c) performing a blastp (Altschul et al., 1990) search on the predicted translated sequences against the UniProt (Consortium, 2018) database to confirm identity and protein lengths. For genes with multiple isoforms, the first-named isoform (Variant X1, usually the longest) was investigated. All genes were utilized in downstream analyses.
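Check (b) comes down to a few simple sequence rules. The Python sketch below illustrates them, assuming each predicted CDS is available as a plain nucleotide string; `check_cds` is a hypothetical helper, not part of splign or any NCBI tool. Partial gene predictions are expected to fail the start- or stop-codon checks by design.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds: str) -> list[str]:
    """Return a list of problems found in a predicted CDS (empty if none).

    A complete CDS is assumed to start with ATG, have a length divisible by
    three (no frameshift) and carry exactly one stop codon, at its very end.
    """
    seq = cds.upper()
    problems = []
    if not seq.startswith("ATG"):
        problems.append("missing start codon")
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3 (possible frameshift)")
    # Split into whole codons only; trailing partial codons are ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing terminal stop codon")
    internal = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if internal:
        problems.append(f"premature stop codon(s) at codon index {internal}")
    return problems
```

For example, `check_cds("ATGAAATTTTAA")` returns an empty list, whereas `check_cds("ATGTAAGGGTGA")` reports a premature stop at codon index 1.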

For partial gene predictions, any missing exons were identified by comparison to well-annotated model organism orthologues using the NCBI genome viewer (NCBI Resource Coordinators, 2017) and tblastn (Altschul et al., 1990) searches. Where exons were unable to be fully resolved (i.e., due to gaps in the reference sequence, genome fragmentation, etc.) partial sequences were utilized in downstream analyses. For any genes not automatically annotated in the reference genome by NCBI or Ensembl, the predicted location of these genes was identified through gene synteny and tblastn searches with model organisms (human and mouse), and gene prediction was performed using fgenesh+ (Solovyev, 2004) with koala orthologues as an input. If an orthologous sequence was not available in koala, human or mouse orthologues were used as an input instead.

2.2 Sample collection and genome resequencing

Two existing data sets of resequenced genomes were used to explore reproductive gene diversity in the Tasmanian devil. The first data set comprised 25 individuals (including 12 wild-born founders [Figure S1] and nine parent–offspring trios [Figure S2]) that were sequenced to a high coverage of ~45× (SRA accessions: SRX6096677–SRX6096696, Wright et al., 2020). The second data set included 12 wild individuals from a separate wild population (Figure S1) sequenced to a low coverage of 10–15× (SRA accessions: ERS682204–ERS682210; ERS1202857–ERS1202861, Wright et al., 2015, 2017). This low-coverage data set was included only after the preliminary SNP identification, to minimize the risk of it introducing false SNPs. We refer to the 12 low-coverage genomes as “12L” to distinguish them from the data set of 25 high-coverage resequenced genomes (“25H”).

2.3 Preliminary SNP identification

To identify an initial high-confidence target SNP set, whole-genome alignment and SNP calling were performed on the 25H data set following the methods of Wright et al. (2020). Briefly, reads were aligned to the Tasmanian devil reference genome assembly version 7.0 (GenBank: GCA_000189315.1, Murchison et al., 2012) using BWA version 0.7.15 (Li & Durbin, 2009). PCR duplicates were removed with picardtools version 1.119 (http://broadinstitute.github.io/picard/), and indel realignment was performed with gatk version 3.6 (McKenna et al., 2010). SNPs were called using samtools version 1.6 (Li et al., 2009) with a minimum base and mapping quality of 30 and a coefficient of 50 for downgrading the mapping quality of reads containing excessive mismatches. Gene-based annotation in annovar version 20180416 (Yang & Wang, 2015) was used to annotate all variants from each of the 25H resequenced genomes, using the corresponding NCBI genome annotation file (O'Leary et al., 2015). Any genes not included in the NCBI annotation were checked for SNPs manually in geneious (Kearse et al., 2012). SNPs associated with the reproductive genes in the 25H Tasmanian devils were identified by filtering the annovar output, and the total number of each type of SNP (synonymous, nonsynonymous, splicing, UTR5, UTR3, intronic, upstream, downstream) was calculated for each gene. Reproductive genes containing nonsynonymous SNPs were targeted for further analysis. The 12L data set was excluded from this initial step because SNPs cannot be called as confidently from low-coverage data as from high-coverage data; including it could have introduced false-positive SNPs and, in turn, inaccurate target gene identification.
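The per-gene tally of SNP types can be sketched as below. The record layout (chromosome, position, gene, category) and the category labels are simplified assumptions, not annovar's actual output format; each variant site is counted once per gene even if it was called in several of the 25H genomes.

```python
from collections import Counter, defaultdict

# Simplified SNP categories mirroring those tallied in the study.
CATEGORIES = ["synonymous", "nonsynonymous", "splicing", "UTR5", "UTR3",
              "intronic", "upstream", "downstream"]


def tally_snp_types(records, target_genes):
    """Count unique variant sites of each category per target gene.

    records: iterable of (chrom, pos, gene, category) tuples, one per variant
    per genome (hypothetical layout); a site seen in several genomes is
    counted only once.
    """
    seen = set()
    counts = defaultdict(Counter)
    for chrom, pos, gene, category in records:
        key = (chrom, pos, gene)
        if gene not in target_genes or category not in CATEGORIES or key in seen:
            continue
        seen.add(key)
        counts[gene][category] += 1
    # Report every target gene, including genes with no SNPs at all.
    return {gene: {c: counts[gene][c] for c in CATEGORIES}
            for gene in sorted(target_genes)}


def genes_with_nonsynonymous(table):
    """Genes carrying at least one nonsynonymous SNP (targets for further analysis)."""
    return [gene for gene, row in table.items() if row["nonsynonymous"] > 0]
```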

2.4 Nonsynonymous SNP confirmation and analysis

Reproductive genes containing nonsynonymous SNPs were investigated further in both the original 25H resequenced genomes and the 12L resequenced genomes. Variants within the target reproductive genes of the 12L resequenced genomes were called together with the 25H resequenced genomes using the same parameters as above. This approach was chosen because multisample callers give the best accuracy when lower-coverage samples are called simultaneously with a larger number of higher-coverage individuals (Cheng et al., 2014). Individual sample VCF files were then subset from the multisample VCF file and filtered to exclude variants below a filtered depth threshold using bcftools version 1.3.1 (Li et al., 2009). We chose a minimum filtered read depth of 10 for the 25H resequenced genomes and of five for the 12L resequenced genomes to increase confidence in the variant calls while preventing excessive data loss. The remaining variants were then merged into a multisample VCF file and converted to transposed plink format (Purcell et al., 2007) using vcftools version 0.1.14 (Danecek et al., 2011). plink version 1.90 was used to calculate minor allele frequencies (MAFs) and determine genotypes for all variants present within the coding regions of the target reproductive genes. Any variants with an MAF below 0.05 that were called in only one individual and had a low allelic depth (below 10) were removed in geneious. Any positions that were called as variants relative to the reference, but which were monomorphic across the 37 resequenced genomes (i.e., MAF = 0), were also filtered out using gatk and bcftools. The final variant call files were used to create consensus sequences for each individual using gatk. IUPAC ambiguity codes were used to represent heterozygous positions in the individual consensus sequences, and any positions below the specified filtered read depth (as above), or with a missing genotype, were masked.
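The depth, MAF and singleton filters described above can be expressed compactly. The following is a minimal Python sketch, assuming each sample's call at a site is stored as (alternate allele count, filtered read depth, alternate allelic depth); the data layout and function names are hypothetical, and in the study these steps were carried out with bcftools, plink, gatk and geneious rather than custom code.

```python
# Hypothetical per-site layout: {sample_id: (alt_count, depth, alt_depth)},
# where alt_count is 0, 1 or 2 copies of the alternate allele (None if uncalled).
MIN_DEPTH = {"25H": 10, "12L": 5}  # minimum filtered read depth per data set


def mask_low_depth(site, dataset_of):
    """Set genotypes below the data set's depth threshold to missing (None)."""
    masked = {}
    for sample, (alt_count, depth, _alt_depth) in site.items():
        too_shallow = depth < MIN_DEPTH[dataset_of[sample]]
        masked[sample] = None if alt_count is None or too_shallow else alt_count
    return masked


def minor_allele_frequency(masked):
    """MAF over called genotypes only; None if no sample was called."""
    called = [g for g in masked.values() if g is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def keep_site(site, dataset_of):
    """Apply the depth, monomorphic and weak-singleton filters to one site."""
    masked = mask_low_depth(site, dataset_of)
    maf = minor_allele_frequency(masked)
    if maf is None or maf == 0:
        return False  # no usable calls, or monomorphic across all samples
    carriers = [s for s, g in masked.items() if g]  # samples with >= 1 alt copy
    if maf < 0.05 and len(carriers) == 1 and site[carriers[0]][2] < 10:
        return False  # rare variant seen in one individual with weak read support
    return True
```

With samples A and B from 25H and C from 12L, a site with calls `{"A": (1, 12, 6), "B": (0, 8, 0), "C": (2, 6, 6)}` loses B's genotype (depth 8 < 10), has an MAF of 0.25 from A and C, and is kept.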
Extraction of CDS for the target genes was performed using bedtools version 2.25 (Quinlan & Hall, 2010) with a custom bed file containing the target gene regions and exon positions. Alignments of the CDS were mapped to the reference in geneious to confirm all synonymous and nonsynonymous SNPs. Missing data/genotyping rate (by locus and individual), MAFs, heterozygosity and deviations from Hardy–Weinberg equilibrium were calculated for the identified nonsynonymous SNPs in plink version 1.90 (Purcell et al., 2007). These analyses were performed on all samples and again with the nine known offspring removed to ensure the measures were not influenced by relatedness.
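The CDS extraction step can be illustrated in pure Python, analogous in spirit to the bedtools step but not a reproduction of it. The sketch assumes BED-style (0-based, half-open) exon coordinates and reverse-complements minus-strand genes, including the IUPAC ambiguity codes used for heterozygous sites and N for masked sites; the function names are hypothetical.

```python
# Complement table covering plain bases, IUPAC ambiguity codes and masked (N) sites.
_COMPLEMENT = str.maketrans("ACGTRYKMSWBVDHNacgtrykmswbvdhn",
                            "TGCAYRMKSWVBHDNtgcayrmkswvbhdn")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def extract_cds(chrom_seq: str, exons: list[tuple[int, int]], strand: str) -> str:
    """Concatenate exon segments (0-based, half-open, BED-style) into a CDS.

    Exons are joined in genomic order; for minus-strand genes the joined
    sequence is reverse-complemented so that the CDS reads 5' to 3'.
    """
    cds = "".join(chrom_seq[start:end] for start, end in sorted(exons))
    return reverse_complement(cds) if strand == "-" else cds
```

For example, exons (2, 5) and (7, 13) of `"CCATGTTAAATAGCC"` on the plus strand join to `"ATGAAATAG"`.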

2.5 Population diversity analysis

CDS alignments of genes confirmed to contain SNPs were converted to phase format using seqphase (Flot, 2010). phase version 2.1 (Stephens & Donnelly, 2003; Stephens et al., 2001) was used to construct haplotypes using the original model with default iteration parameters and output probability thresholds (-p and -q) set to 0. This was performed to ensure any missing SNPs were imputed (based on the distributions of known haplotypes and allele frequencies across the entire data set, see Stephens & Donnelly, 2003; Stephens et al., 2001) prior to performing the population diversity analysis. The -x flag was used to run the algorithm five times (with random seeds for each run) for each gene and the run with the highest goodness-of-fit statistic was selected for the output. seqphase was used to convert the phase output files to fasta format and cervus 3.0.7 (Kalinowski et al., 2007) was used to test whether the phased haplotypes were consistent across the nine trios present in the data set. dnasp version 6 (Rozas et al., 2017) was used to infer the number of haplotypes (h), haplotype diversity (hd) and nucleotide diversity per site (π) for each gene. Deviations from the neutral model of molecular evolution were tested using Tajima's D (Tajima, 1989) in dnasp and codon-based Z-tests of selection were performed in mega7 (Kumar et al., 2016) using the Nei–Gojobori method (Nei & Gojobori, 1986) with variance estimated from 500 bootstraps. These statistics were repeated with the nine known offspring excluded to ensure any significant findings were not influenced by relatedness.
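The per-gene statistics follow standard formulas: Nei's unbiased haplotype diversity, π as the mean number of pairwise differences per site, and Tajima's (1989) D, which compares the mean pairwise differences with the number of segregating sites scaled by a₁. A minimal Python sketch follows, assuming equal-length, gap-free phased sequences (as after PHASE imputation); DnaSP's handling of gaps and missing data is not reproduced, and the function names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all n(n-1)/2 sequence pairs (k)."""
    n = len(seqs)
    haps = list(Counter(seqs).items())
    total = 0
    for (h1, c1), (h2, c2) in combinations(haps, 2):
        total += c1 * c2 * sum(a != b for a, b in zip(h1, h2))
    return total / (n * (n - 1) / 2)


def nucleotide_diversity(seqs):
    """pi: mean pairwise differences per site."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def segregating_sites(seqs):
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D comparing k with Watterson's estimator S / a1."""
    n = len(seqs)
    S = segregating_sites(seqs)
    if S == 0:
        return float("nan")  # D is undefined without segregating sites
    k = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (k - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

As a check on the formulas, 74 sequences of 504 bp in which nine carry a single variant give hd ≈ 0.217, π ≈ 4.30 × 10⁻⁴ and D ≈ 0.07, the values in the LEP row of Table 2. The 9:65 split is back-calculated here for illustration and is not reported in the study.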

3 RESULTS

3.1 Gene characterization

Of 250 genes examined, 214 had predicted (complete or partial) CDS (Table S1). These 214 predicted genes were confirmed through analysis of gene synteny, CDS and blastp searches and were investigated in the subsequent SNP analysis. The remaining 36 genes were not automatically annotated by NCBI or Ensembl and could not be identified in the Tasmanian devil genome (Table S2).

3.2 SNP identification and analysis

Using our 25H resequenced genomes, we identified over 5,000 putative SNPs associated with the 214 reproductive genes investigated (Figure 1) with an average of 28 putative SNPs per gene (range 0–549) (Table S3). Approximately 90% of these SNPs were intronic (Table S3). Forty-nine genes (23% of all genes investigated) were predicted to contain exonic SNPs, with 34 of these genes predicted to contain at least one nonsynonymous SNP (Table S3). Genes involved in embryogenesis, fertilization and hormonal regulation of reproduction displayed greater numbers of nonsynonymous SNPs than synonymous SNPs (Figure 1).

FIGURE 1. Total number of SNPs identified in genes known or predicted to be involved in a variety of reproductive functions including embryogenesis (N = 13 genes), fertilization (N = 26), hormonal regulation of reproduction (N = 43), gametogenesis (N = 74), and general reproductive development and function (N = 58). (a) Exonic SNPs including synonymous (S) and nonsynonymous (NS) SNPs. (b) Other major SNP types including untranslated regions (UTR), flanking regions (F) and intronic regions (I). Stripes indicate intronic SNPs are plotted on the secondary axis. Light shading indicates SNPs that are 5′ (upstream), and dark shading indicates SNPs that are 3′ (downstream). See Table S3 for more information.

Confirmation of putative nonsynonymous SNPs was performed by analysing data from the 12L and 25H resequenced genomes together, along with additional filtering (see Methods). After filtering, 33 nonsynonymous SNPs across 20 of the genes remained (Table S4). These 20 genes represented molecular processes across a range of reproductive roles in females, males or both sexes (Table 1). For these nonsynonymous SNPs, the genotyping rate (percentage of individuals successfully genotyped at each SNP) was 82% (77% when excluding the nine known offspring) (Table S5). All nonsynonymous SNPs conformed to Hardy–Weinberg expectations (Table S5).
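PLINK's `--hardy` report is based on an exact test of Hardy–Weinberg equilibrium. The sketch below re-implements the published exact-test recursion of Wigginton, Cutler and Abecasis (2005) in Python, together with the per-SNP genotyping rate defined above. It is an illustration only, not PLINK's own code, and the function names are hypothetical.

```python
def genotyping_rate(calls):
    """Fraction of individuals with a non-missing genotype at one SNP."""
    return sum(g is not None for g in calls) / len(calls)


def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Exact test of Hardy-Weinberg equilibrium for one biallelic SNP."""
    n_homr, n_homc = sorted((n_hom1, n_hom2))  # rare / common homozygote counts
    rare = 2 * n_homr + n_het  # copies of the rarer allele
    n = n_het + n_homr + n_homc  # genotyped individuals
    if n == 0:
        raise ValueError("no genotypes")
    probs = [0.0] * (rare + 1)
    # Start near the most likely heterozygote count (same parity as `rare`)
    # and fill in relative probabilities in both directions.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    hets, homr, homc = mid, (rare - mid) // 2, n - mid - (rare - mid) // 2
    while hets > 1:  # towards fewer heterozygotes
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        hets, homr, homc = hets - 2, homr + 1, homc + 1
    hets, homr, homc = mid, (rare - mid) // 2, n - mid - (rare - mid) // 2
    while hets <= rare - 2:  # towards more heterozygotes
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        hets, homr, homc = hets + 2, homr - 1, homc - 1
    total = sum(probs)
    p_obs = probs[n_het] / total
    # p value: total probability of all configurations no more likely than observed.
    return min(1.0, sum(p / total for p in probs if p / total <= p_obs))
```

For example, ten individuals that are all heterozygous give p ≈ 0.007, a strong excess of heterozygotes, whereas a 25:50:25 genotype split gives p ≈ 1.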

TABLE 1. Reproductive roles of genes found to contain nonsynonymous SNPs
Gene | Role in reproduction | Sex affected | Reference
ADAMTS9 | Important in uterine remodelling during implantation, placentation and parturition | Female | Russell et al. (2015)
ADAMTS10 | Important for adhesion between the sperm and the egg zona pellucida | Male | Dun et al. (2012)
ADAMTSL1 | Involved in embryonic gonadogenesis | Female | Carré et al. (2011)
AIRE | Mutations result in autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED), which can lead to infertility | Both | Aaltonen et al. (1997)
BMP5 | Predicted to play a role in ovarian folliculogenesis | Female | Pierre et al. (2005)
CHD7 | Mutations result in CHARGE syndrome (pubertal failure and infertility) | Both | Kim et al. (2008)
CLU | Increased expression results in reduced sperm quality and infertility | Male | Zalata et al. (2012)
CYP19A1 | Key enzyme in oestrogen biosynthesis; influences female fertility | Female | Simpson et al. (1994); Altmäe et al. (2009)
DIAPH2 | Important for normal ovarian development and function | Female | Bione et al. (1998)
DZIP1 | Regulator of hedgehog signalling; may participate in spermatogenesis via its interaction with DAZ | Male | Moore et al. (2004); Sekimizu et al. (2004)
IRS4 | Null mutations can lead to defects in reproduction | Both | Fantin et al. (2000)
KIT | Plays a key role in germ cell development, spermatogenesis and oogenesis | Both | Rossi (2013); Russell et al. (2015)
LEP | Deficiencies can lead to hypogonadotrophic hypogonadism and infertility | Both | Chehab et al. (1996)
NANOG | Transcription regulator important for embryonic stem cell pluripotency | Both | Pan and Thomson (2007)
PIP | Functions in seminal fluid; important for fertilization | Male | Hassan et al. (2009)
PRDM14 | Required for the proper initiation and coordination of the primordial germ cell-specific gene expression programme; promotes pluripotency | Both | Hohenauer and Moore (2012)
PTCH1 | Mediates hedgehog signalling in developing and adult marsupial gonads | Both | O'Hara et al. (2011)
PTCH2 | Mediates hedgehog signalling in developing and adult marsupial gonads | Both | O'Hara et al. (2011)
PTGFRN | Inhibitor of the prostaglandin F2 receptor, which has multiple roles in reproduction (e.g., progesterone synthesis and ovulation) | Female | Craig (1975)
SPACA6 | Involved in sperm–oocyte fusion; gene knockouts result in failed fusion | Male | Lorenzetti et al. (2014)

Haplotypes at 18 of the 20 genes were consistent with the known trio information. DIAPH2 showed inconsistencies in five sire–dam–offspring trios (offspring haplotypes were not observed in the parents), possibly due to sequence complexity or particular motifs in this gene region resulting in sequencing difficulty (Nakamura et al., 2011). This gene was excluded from further analysis due to the high error rate (25% of SNPs were inconsistent across the nine trios). PIP showed two occurrences of trio phasing inconsistency but was included in subsequent analysis due to the low error rate (2.2% of SNPs were inconsistent across the nine trios). This resulted in 19 final genes (following exclusion of DIAPH2) that were included in subsequent population diversity analysis.
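The study used cervus for this check. The sketch below is not the cervus algorithm; it only illustrates the underlying Mendelian-inheritance rule, treating each phased haplotype at a gene as an allele: an offspring is consistent if one of its haplotypes occurs in the sire and the other in the dam. The data layout and function names are hypothetical.

```python
Haplotypes = tuple[str, str]  # one individual's two phased haplotypes at a gene


def trio_consistent(offspring: Haplotypes, sire: Haplotypes, dam: Haplotypes) -> bool:
    """True if one offspring haplotype can come from the sire and the other from the dam."""
    h1, h2 = offspring
    return (h1 in sire and h2 in dam) or (h2 in sire and h1 in dam)


def inconsistent_trios(trios, haplotypes):
    """Return the (offspring, sire, dam) ID triples whose haplotypes do not fit.

    trios: list of (offspring_id, sire_id, dam_id);
    haplotypes: {individual_id: (haplotype_1, haplotype_2)} for one gene.
    """
    return [(o, s, d) for o, s, d in trios
            if not trio_consistent(haplotypes[o], haplotypes[s], haplotypes[d])]
```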

The total number of SNPs (both synonymous and nonsynonymous) in the coding regions of each of the 19 final genes across the 37 resequenced genomes ranged from one to 10; the number of haplotypes per gene ranged from two to four (Table 2). Mean haplotype diversity was 0.36 (SD 0.20) and mean nucleotide diversity was 4.3 × 10⁻⁴ (SD 5.4 × 10⁻⁴) (Table 2).

TABLE 2. Diversity statistics and neutrality tests performed on the target reproductive genes
Gene | n | CDS length (bp) | SNPs (ns:s) | h | hd | π | Tajima's D | Z-test
ADAMTS9 | 74 | 5,919 | 9 (1:8) | 4 | 0.666 | 7.32 | 3.52 | −2.63
ADAMTS10 | 74 | 3,342 | 1 (1:0) | 2 | 0.104 | 0.31 | −0.60 | 0.98
ADAMTSL1 | 74 | 5,298 | 1 (1:0) | 2 | 0.294 | 0.55 | 0.53 | 1.00
AIRE | 74 | 1,590 | 10 (4:6) | 4 | 0.451 | 21.71 | 1.83 | −1.73
BMP5 | 74 | 1,368 | 1 (1:0) | 2 | 0.053 | 0.39 | −0.90 | 1.04
CHD7 | 74 | 9,093 | 3 (3:0) | 3 | 0.586 | 1.15 | 1.34 | 1.27
CLU | 74 | 1,178 | 2 (2:0) | 3 | 0.445 | 4.06 | 0.27 | 1.29
CYP19A1 | 74 | 1,512 | 1 (1:0) | 2 | 0.053 | 0.35 | −0.90 | 1.07
DZIP1 | 74 | 2,433 | 3 (1:2) | 3 | 0.283 | 3.08 | 0.41 | −1.26
IRS4 | 74 | 2,751 | 1 (1:0) | 2 | 0.053 | 0.19 | −0.90 | 1.06
KIT | 74 | 2,901 | 3 (2:1) | 4 | 0.545 | 3.76 | 1.47 | −0.67
LEP | 74 | 504 | 1 (1:0) | 2 | 0.217 | 4.30 | 0.07 | 1.04
NANOG | 74 | 936 | 2 (2:0) | 2 | 0.494 | 10.55 | 2.30 | 1.01
PIP | 74 | 534 | 3 (3:0) | 4 | 0.588 | 12.54 | 0.17 | 0.16
PRDM14 | 74 | 1,662 | 2 (1:1) | 2 | 0.217 | 2.61 | 0.09 | −0.69
PTCH1 | 74 | 3,891 | 1 (1:0) | 2 | 0.344 | 0.88 | 0.82 | 1.01
PTCH2 | 74 | 4,524 | 3 (3:0) | 3 | 0.527 | 2.34 | 1.37 | 1.50
PTGFRN | 74 | 2,892 | 2 (2:0) | 3 | 0.416 | 1.54 | 0.14 | 1.40
SPACA6 | 74 | 1,122 | 1 (1:0) | 2 | 0.462 | 4.12 | 1.53 | 1.02

Note

  • n, number of sequences (two allele sequences per individual); h, number of inferred haplotypes; hd, haplotype diversity; π, nucleotide diversity (×10⁴); ns:s, nonsynonymous:synonymous.
  • * p < .05. Did not remain significant after Holm–Bonferroni multiple test correction.
  • ** p < .01. Did not remain significant after Holm–Bonferroni multiple test correction.
  • *** p < .001. Remained significant after Holm–Bonferroni multiple test correction.

ADAMTS9 and NANOG showed statistically significant deviation from neutrality at the sequence level with positive Tajima's D values suggesting population decline or balancing selection (Table 2). ADAMTS9 also showed evidence of purifying selection at the codon level with a statistically significant negative Z-test (p < .01; Table 2). There were no qualitative changes to the results when the nine known offspring were excluded from the analyses (Table S6).

4 DISCUSSION

As wildlife populations continue to decline globally, understanding the genetic basis of interindividual variation is crucial for determining which genes may govern important phenotypes and contribute to species’ long-term survival and fitness. Here we show how genomic data can be used to explore functional genetic diversity in an endangered species. This study identified a surprising amount of putatively functional variation at reproductive genes in an otherwise genetically depauperate species. Tasmanian devils have shown concerning declines in productivity over time in both captivity (Farquharson et al., 2017) and the wild (Farquharson et al., 2018). It is predicted that genetic variation may play a role in such changes (Farquharson et al., 2017; Gooley et al., 2020), although until now there was limited knowledge of whether diversity even exists at their reproductive genes. We characterized genetic variation at 214 reproductive genes in 37 Tasmanian devils and identified 5,933 putative SNPs. Signatures of selection were examined at a subset of 19 target genes that contained nonsynonymous variation, and hence may have functional consequences for reproduction. To the best of our knowledge, this is the first study to examine within-species reproductive gene diversity to this extent in a threatened species.

Tasmanian devils exhibit very low levels of genetic diversity overall (Cheng et al., 2012; Jones et al., 2004; Miller et al., 2011; Morris et al., 2015). Most (77%) of the reproductive genes we examined had monomorphic coding regions in our sample set of 37 resequenced genomes: a low level of diversity that is comparable to that seen in a previous study which examined genetic diversity at 167 immune genes in 10 Tasmanian devils (seven of which were included in the current study) (Morris et al., 2015). However, within those reproductive genes that showed nonsynonymous variation, we found surprisingly high diversity relative to a similar subset of immune genes that also contained nonsynonymous SNPs (Morris et al., 2015). For example, despite a much larger sample size of up to 196 individuals across multiple captive and wild populations (with the majority of individuals presumed to be unrelated), Morris et al. (2015) found a maximum of three SNPs per gene across nine polymorphic immune genes, compared with a maximum of 10 SNPs per reproductive gene here (across the final 19 polymorphic reproductive genes). Mean haplotype diversity was also higher in the current study. Differences in sample origin may contribute to the observed increased levels of diversity herein; however, the finding of higher genetic diversity at reproductive genes compared with immune genes is unexpected given the smaller sample size and presence of related individuals within the current study. We note that Morris et al. (2015) used amplicon sequencing to confirm SNP diversity in the subset of target genes, which resulted in fewer SNPs than predicted by genome resequencing data. 
Although we did not employ gene-targeted sequencing methods in this study, we believe that the SNPs identified are likely to reflect real diversity, not sequencing artefacts, due to the number of resequenced genomes (particularly those with high coverage, around 45×) and the strict variant calling and filtering parameters employed.

Thirty-six reproductive genes (14% of all genes investigated) present in model species could not be characterized in the Tasmanian devil genome by the methods applied here. For example, there were no tblastn hits for a number of genes including DPPA3/STELLA, SEMG1, SEMG2, TNP2 and PRM2, which are either too divergent from known orthologues to be identified by this method, or do not exist in marsupials (Johnson et al., 2018). Additionally, members of the NLRP (nucleotide-binding oligomerization domain, leucine-rich repeat and pyrin domain-containing proteins) gene family have shown extensive duplication and diversification in mammalian lineages (Tian et al., 2009) and were unable to be identified in the Tasmanian devil genome. Fragmentation and gaps in the current reference genome precluded characterizing a number of genes such as KLK3 and ZPBP (Table S2). Genes located on the Y chromosome (e.g., ATRY, DAZ1, USP9Y and DDX3Y) could not be identified due to the unavailability of Y-chromosome data in the female reference genome. Sequencing the Y chromosome will be important in the future to focus on male reproduction, as a number of important male reproductive genes are found on the Y chromosome (Murtagh et al., 2010; Toder et al., 2000).

Twenty genes were found to contain nonsynonymous SNPs in the current study (with DIAPH2 later excluded due to phasing inconsistencies). Since nonsynonymous mutations result in amino acid changes, genes that contain nonsynonymous SNPs may influence phenotype (Shastry, 2009). Although other SNPs, such as synonymous polymorphisms or variants outside the coding sequence, may contribute to phenotype via processes such as mRNA stability (Chamary & Hurst, 2005), these are expected to have a weaker effect on gene function compared with mutations that alter the protein sequence (Tomoko, 1995). The genes found to contain nonsynonymous SNPs in the current study are involved in a variety of reproductive functions in both males and females, and influence fertility-associated phenotypes in humans and other species (see Table 1 for more information). For example, mutations in the CHD7 gene cause idiopathic hypogonadotropic hypogonadism and Kallmann syndrome in humans, resulting in impaired sexual development in both males and females (Kim et al., 2008). Mutations in the AIRE gene cause autoimmune polyendocrinopathy, candidiasis and ectodermal dystrophy (APECED) (Aaltonen et al., 1997), which has also been linked to infertility in both men and women (Perheentupa, 2006). ADAMTS proteases influence a range of reproductive processes in humans and mice (Russell et al., 2015), three of which (ADAMTS9, ADAMTS10 and ADAMTSL1) displayed nonsynonymous variation in the current study.

The majority of the individuals in our sample set are known to have successfully reproduced based on breeding records in captive facilities (Figure S2), so most of the nonsynonymous SNPs identified in the current study are unlikely to cause the extreme infertility phenotypes that have been reported in humans and mice. However, these variants may result in more subtle phenotypic effects such as reduced fertilization success or reduced offspring survival. We note that a number of nonsynonymous homozygote genotypes were not observed in our data set (Table S5). These missing genotypes may be associated with more severe phenotypes, such as pregnancy loss or infertility; they may occur in a larger sample set, or they could be lethal and hence never appear in homozygous form. Further research is required to explore the functional consequences of the nonsynonymous variants identified herein. Interestingly, we found that genes involved in embryogenesis, fertilization and hormonal regulation of reproduction displayed greater numbers of nonsynonymous SNPs than synonymous SNPs. This suggests that functional diversity may be important at genes involved in such processes. Tasmanian devils exhibit a number of unique reproductive characteristics, including undergoing up to three oestrous cycles within their annual breeding season (Keeley et al., 2012); producing a greater number of embryos (up to 30) than can be supported by their four teats (Guiler, 1970; Hughes, 1982); and producing multiple-paternity litters (Russell et al., 2019), even though mate-guarding is a behavioural reproductive strategy (Hamilton et al., 2019). We hypothesize that these unique reproductive traits may drive functional diversity across genes involved in particular reproductive processes through adaptive evolution. For example, multiple mating by females is known to drive sperm competition, which may result in selective pressures on genes involved in fertilization (Dapper & Wade, 2016; Fiumera et al., 2005).
Similarly, fitness advantages associated with the timing or number of oestrous cycles, or the number of viable embryos, could potentially drive natural selection at genes involved in the hormonal regulation of reproduction or embryogenesis respectively. To explore these ideas further we investigated signatures of selection at the reproductive genes containing nonsynonymous SNPs.
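
Sampling alone can explain missing homozygotes: under Hardy–Weinberg proportions an allele at frequency q produces homozygotes at frequency q², so few are expected among 37 individuals. The minimal sketch below assumes random mating and Hardy–Weinberg proportions (assumptions that may not hold in a small, managed population) and computes the expected number of alternate homozygotes, the chance of observing none, and the sample size needed to observe at least one. The function names and the genotype counts in the usage example are hypothetical, not values from Table S5.

```python
import math


def alt_allele_frequency(n_het: int, n_hom_alt: int, n_individuals: int) -> float:
    """Alternate allele frequency from diploid genotype counts."""
    return (n_het + 2 * n_hom_alt) / (2 * n_individuals)


def expected_hom_alt(q: float, n_individuals: int) -> float:
    """Expected number of alternate homozygotes under Hardy-Weinberg (n * q^2)."""
    return n_individuals * q ** 2


def prob_no_hom_alt(q: float, n_individuals: int) -> float:
    """Probability that none of n individuals is an alternate homozygote,
    treating individuals as independent draws under Hardy-Weinberg."""
    return (1.0 - q ** 2) ** n_individuals


def min_sample_size(q: float, detection_prob: float = 0.95) -> int:
    """Smallest n giving at least `detection_prob` chance of observing one
    or more alternate homozygotes."""
    return math.ceil(math.log(1.0 - detection_prob) / math.log(1.0 - q ** 2))


if __name__ == "__main__":
    # Hypothetical counts: 7 heterozygotes and 0 alternate homozygotes in 37 devils.
    q = alt_allele_frequency(n_het=7, n_hom_alt=0, n_individuals=37)
    print(f"q = {q:.3f}")
    print(f"expected alt homozygotes = {expected_hom_alt(q, 37):.2f}")
    print(f"P(none observed) = {prob_no_hom_alt(q, 37):.2f}")
    print(f"n for 95% detection = {min_sample_size(q)}")
```

With q = 0.1, for instance, 37 × 0.01 = 0.37 homozygotes are expected, there is roughly a 69% chance (0.99³⁷) of observing none, and about 299 individuals would be needed for a 95% chance of observing at least one.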

Of the 19 final reproductive genes (following exclusion of DIAPH2 due to phasing inconsistencies), two genes (ADAMTS9 and NANOG) showed statistically significant signatures of selection before correction for multiple testing, suggesting that their variants may be linked to important phenotypic traits. After correcting for multiple testing using the Holm–Bonferroni method (Holm, 1979), Tajima's D for ADAMTS9 remained statistically significant, indicating that this gene may be under balancing selection at the sequence level within the population. Demographic factors such as population bottlenecks can also shift the value of Tajima's D (Tajima, 1989), but such factors are expected to affect loci across the whole genome. Because a similar pattern was not observed across all of the target loci, we hypothesize that ADAMTS9 may be a candidate for long-term balancing selection. Balancing selection actively maintains multiple alleles in a population, suggesting that the associated phenotypes may be advantageous under certain circumstances (e.g., Gos et al., 2012). ADAMTS9 is a pleiotropic gene that belongs to a large, diversified family of ADAMTS genes and has been implicated in several crucial female reproductive processes, namely ovulation, implantation, placentation and parturition (Russell et al., 2015). ADAMTS9 is also a novel tumour suppressor (Du et al., 2013) and has undergone strong selection for increased longevity in a number of small-bodied mammal lineages (Lambert & Portfors, 2017). This is particularly interesting in our context, as Tasmanian devils have a short lifespan (maximum 5 years in the wild) compared with other mammals of their size and show unusually high vulnerability to tumours (Griner, 1979). However, our data cannot disentangle whether potential selection on ADAMTS9 in Tasmanian devils reflects the gene's role in reproduction, its role in tumour suppression and longevity, or both. Its involvement in a number of key processes makes ADAMTS9 a plausible candidate for adaptation that warrants further investigation.
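
Because the argument above rests on the value of Tajima's D, the sketch below shows how the statistic is computed from allele counts at segregating sites, following the standard formula of Tajima (1989): D compares nucleotide diversity (π) with Watterson's estimator (S/a₁), scaled by their expected variance. Positive values indicate an excess of intermediate-frequency variants, the pattern expected under balancing selection but also after a population bottleneck. This is a minimal sketch, not the software used in this study; the input format (per-site allele counts among n phased haplotypes) and the function name are assumptions, and testing significance would additionally require coalescent simulations or Tajima's beta-distribution approximation, which are not shown.

```python
import math
from typing import Optional, Sequence


def tajimas_d(alt_counts: Sequence[int], n: int) -> Optional[float]:
    """Tajima's D from per-site allele counts.

    alt_counts: number of copies of one allele at each site among n sampled
    sequences (e.g. 2 x individuals for phased diploid data). Monomorphic
    sites are ignored. Returns None when D is undefined.
    """
    if n < 4:
        return None  # the variance constants are zero for n < 4
    segregating = [k for k in alt_counts if 0 < k < n]
    s = len(segregating)
    if s == 0:
        return None

    # Nucleotide diversity (pi): pairwise differences summed over sites,
    # divided by the number of sequence pairs.
    pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in segregating) / pairs

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    print(tajimas_d([1] * 10, 10))  # all singletons: excess of rare variants
    print(tajimas_d([5] * 10, 10))  # all at 50%: excess of intermediate variants
```

In this toy input, ten sites that are all singletons in ten sequences give a negative D, whereas ten sites each carried by 5 of 10 sequences give a positive D.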

The NANOG gene also showed a putative signature of balancing selection in the Tasmanian devil, although this result did not remain statistically significant after correcting for multiple testing. NANOG is a key transcription factor involved in embryonic stem cell pluripotency (Pan & Thomson, 2007). We identified a multinucleotide nonsynonymous polymorphism within the coding sequence (CDS) of NANOG. It is currently unknown whether this variant is associated with differential phenotypes. Further work is required to determine whether this nonsynonymous polymorphism is correlated with embryonic survival and may therefore influence reproductive success in the Tasmanian devil.
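
The Holm–Bonferroni correction is a step-down procedure: the m p-values are sorted, the k-th smallest is compared with α/(m − k + 1), and testing stops at the first non-rejection. The minimal sketch below returns the equivalent adjusted p-values; the function name is hypothetical, and the 19 nominal p-values in the usage example are invented, chosen only to show how one nominally significant result can survive correction while another does not.

```python
from typing import List, Sequence, Tuple


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> Tuple[List[float], List[bool]]:
    """Holm (1979) step-down correction.

    Returns Holm-adjusted p-values (in the input order) and a reject flag for
    each test at family-wise error rate alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[idx]))
        adjusted[idx] = running_max
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


if __name__ == "__main__":
    # Hypothetical nominal p-values for 19 genes (not the study's results).
    p = [0.001, 0.01] + [0.5] * 17
    adjusted, reject = holm_bonferroni(p)
    print(adjusted[:2], reject[:2])
```

With these inputs the smallest p-value (0.001) is adjusted to 19 × 0.001 = 0.019 and stays below 0.05, whereas the second (0.01) becomes 18 × 0.01 = 0.18 and is no longer significant.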

Although reproductive genes and their variants have been well studied in model and livestock species (see Hunt et al., 2018), there are few data on reproductive variants in threatened species, many of which show low overall levels of genome-wide diversity. As a result, it is difficult to ascertain whether reproductive gene diversity in the Tasmanian devil is higher or lower than expected compared with other threatened species. Furthermore, our study focused on a relatively small sample set from a limited number of locations in Tasmania and may not have captured the true extent of genetic diversity across the species’ range. As whole-genome sequencing becomes cheaper, sampling Tasmanian devils across their range would improve our understanding of their reproductive gene diversity. The full benefit of understanding reproductive gene diversity in Tasmanian devils will be realized by studying the relationship between genetic variation and reproductive phenotypes. This is feasible for this threatened species because the Tasmanian devil insurance population is Australia's largest captive breeding programme (Hogg et al., 2019), with a large number of individuals across multiple generations that have DNA samples and extensive reproductive records. This resource will allow us to investigate diversity across a range of candidate genes to determine whether variation in reproductive genes influences reproductive fitness. For example, the SPACA6 gene has been implicated in the fertilization ability of male mice (Lorenzetti et al., 2014) and was found to contain a nonsynonymous SNP among the Tasmanian devils sampled in the current study. By genotyping SPACA6 in hundreds of male Tasmanian devils using specific PCR primers or a targeted capture approach, we could test statistically whether this variant is correlated with an individual's siring ability.
Candidate gene approaches have several advantages over whole-genome approaches, namely higher statistical power (because far fewer tests are performed) and reduced sequencing costs. However, other genes or genomic regions that influence reproductive phenotypes may be missed, so a genome-wide association study (GWAS) may be more informative (for a review of candidate gene vs. GWAS approaches, see Suh & Vijg, 2005). A combination of these approaches will probably be the best way forward to understanding the interplay between reproductive genotype and phenotype. The rise of whole-genome sequencing and of global consortia developing reference genomes for wildlife means that our understanding of functional gene diversity in threatened species can only improve with time, particularly in species where range reduction and population contraction have left them genetically depauperate. The approach used in this study demonstrates how these growing genomic resources can be used to explore functional diversity in threatened species and how this information can assist with their conservation management.
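
The SPACA6 association test proposed above could begin with something as simple as a 2 × 2 contingency table comparing siring success between carriers and non-carriers of the nonsynonymous allele. The sketch below implements a two-sided Fisher's exact test from first principles; the function name and the counts in the usage example are hypothetical, and a real analysis in a pedigreed captive population would probably need a model that also accounts for relatedness, age and breeding opportunity, which this sketch does not.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows: carriers / non-carriers of the variant allele.
    Columns: males that sired offspring / males that did not.
    The p-value sums the hypergeometric probabilities of all tables with the
    same margins that are no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p_x = table_prob(x)
        # A small relative tolerance guards against floating-point ties.
        if p_x <= p_observed * (1 + 1e-7):
            p_value += p_x
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: 30 of 40 carriers sired offspring vs 60 of 120 non-carriers.
    print(fisher_exact_two_sided(30, 10, 60, 60))
```

An exact test is used here because some genotype classes may contain only a handful of males, where chi-square approximations become unreliable.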

5 CONCLUSION

Our study has bioinformatically characterized diversity at 219 reproductive genes in 37 Tasmanian devils. We have identified and examined diversity at 19 polymorphic genes containing nonsynonymous SNPs that may have functional consequences for reproduction. The results from this study provide the foundation for future research to explore whether any of these genes are associated with variable reproductive phenotypes and hence may be involved in the generational productivity declines that have been observed in the Tasmanian devil insurance population (Farquharson et al., 2017; Hogg et al., 2015). If specific genotypes are found to influence productivity, preserving the functional variation described herein may be key to minimizing these declines and facilitating the success of conservation breeding programmes. Beyond assisting with conservation decisions for the Tasmanian devil, the candidate gene approach described here may also be applied to reproductive management in other threatened species conservation programmes.

ACKNOWLEDGEMENTS

This work was funded by the Australian Research Council (LP140100508; DP170101253) and supported by the Save the Tasmanian Devil Program, the Zoo and Aquarium Association Australasia and San Diego Zoo Global. We thank Professor Graham Wallis for insightful discussions on methodology and general themes of the paper. Dr Camilla Whittington provided helpful suggestions of genes important to reproduction. We thank Drew Lee for creating the supplemental map overlay. Computing resources were provided by the Sydney Informatics Hub and the University of Sydney's high-performance computing cluster Artemis.

AUTHOR CONTRIBUTIONS

K.B., C.E.G. and C.J.H. contributed to the design of the study and sourced funding. P.B. compiled the list of target genes and performed bioinformatic gene characterization and SNP prediction. B.W. performed alignments of resequencing data to the reference genome and assisted P.B. with the analysis of SNP genotypes. P.B. wrote the manuscript with contributions and editing from B.W., C.E.G., C.J.H. and K.B. All authors read and approved the final manuscript.

DATA AVAILABILITY

The Tasmanian devil reference genome and associated data are available from NCBI:

• Devil_ref version 7.0 GenBank: GCA_000189315.1 (Murchison et al., 2012)

Raw genome resequencing reads are available from the following NCBI SRA accessions:

• 25 high-coverage: SRX6096677–SRX6096696 (Wright et al., 2020).
• 12 low-coverage: ERS682204–ERS682210; ERS1202857–ERS1202861 (Wright et al., 2015, 2017).

The multisample VCF file for the reproductive genes of interest has been deposited in Dryad.
