From high masked to high realized genetic load in inbred Scandinavian wolves
Abstract
When new mutations arise at functional sites they are more likely to impair than improve fitness. If not removed by purifying selection, such deleterious mutations will generate a genetic load that can have negative fitness effects in small populations and increase the risk of extinction. This is relevant for the highly inbred Scandinavian wolf (Canis lupus) population, founded by only three wolves in the 1980s and suffering from inbreeding depression. We used functional annotation and evolutionary conservation scores to study deleterious variation in a total of 209 genomes from both the Scandinavian and neighbouring wolf populations in northern Europe. The masked load (deleterious mutations in the heterozygous state) was highest in Russia and Finland, with deleterious alleles segregating at lower frequency than neutral variation. Genetic drift in the Scandinavian population led to the loss of ancestral alleles, fixation of deleterious variants and a significant increase in the per-individual realized load (deleterious mutations in the homozygous state; an increase of 45% in protein-coding genes) over five generations of inbreeding. Arrival of immigrants gave a temporary genetic rescue effect, with ancestral alleles re-entering the population and thereby shifting deleterious alleles from homozygous into heterozygous genotypes. However, in the absence of permanent connectivity to Finnish and Russian populations, inbreeding has since again led to the exposure of deleterious mutations. These observations provide genome-wide insight into the magnitude of genetic load and genetic rescue at the molecular level, and in relation to population history. They emphasize the importance of securing gene flow in the management of endangered populations.
1 INTRODUCTION
At least two processes can place small populations in genetic peril. First, mating between relatives (inbreeding) tends to increase the proportion of homozygous loci (Charlesworth & Willis, 2009; Fisher, 1965; Franklin, 1977). This will expose recessive alleles to selection and, in the case of deleterious alleles, increase the risk of inbreeding depression, a decrease in individual fitness due to the expression of recessives (Hedrick & Garcia-Dorado, 2016; Keller & Waller, 2002). Inbreeding can be difficult to avoid in small populations, for example following a population bottleneck. Second, the magnitude of genetic drift is inversely proportional to the effective population size, and the efficacy of selection is thereby lowered in small populations (Charlesworth & Willis, 2009). While deleterious mutations segregating at low frequency can be eliminated by genetic drift, they can also increase in frequency when selection is inefficient, and eventually become fixed (Charlesworth, 2009). In addition to these short-term effects, inbreeding and drift also threaten a population by lowering its evolutionary potential (Allendorf, 2017).
Empirical studies aimed at addressing the genetic vulnerability of populations have traditionally used genetic markers to assess the degree and distribution of genetic diversity (Allendorf, 2017; Avise, 1994). These markers often represent neutral loci that are not targets for selection and thus provide only indirect estimates of levels of genomic diversity, and they cannot pinpoint the distribution and levels of functional diversity. With large-scale genomic re-sequencing data from population samples it became possible to obtain better estimates of genetic diversity, including in coding sequences across the whole genome (Davey et al., 2011; Luikart et al., 2003). Yet, it is not straightforward to directly translate such data into information about deleterious variation. In particular, the distribution of fitness effects of new mutations in natural populations is often unknown. This has been overcome, at least to some extent, by predicting the functional consequences of new mutations and/or assuming that derived variation at evolutionarily conserved sites represents candidate deleterious variation (Lindblad-Toh et al., 2011; Margulies et al., 2003; Miller et al., 2007; Zoonomia Consortium, 2020). For example, the genomic evolutionary rate profiling (GERP) score uses comparative genomic data from multispecies alignments to quantify the reduction in the number of substitutions across a phylogeny compared to neutral expectations (Cooper et al., 2005; Davydov et al., 2010). When the reduction is significant, sites are interpreted to evolve under the influence of purifying selection, and derived variants at such sites are thus potentially deleterious.
The genetic load is the occurrence of deleterious alleles in the population, and can be divided into realized load (expressed load) and masked load (potential load, inbreeding load) (Bertorelle et al., 2022). The realized load is formed by all sites where a deleterious allele is expressed, mainly sites that are homozygous for recessive deleterious alleles. The masked load consists of hidden deleterious alleles, sites that are heterozygous where a recessive deleterious allele does not contribute to loss of fitness (for simplicity, here we ignore that some alleles may be partially recessive). As long as a population remains large and genetic drift is negligible, most recessive deleterious alleles will segregate at low frequency and rarely be exposed to selection. As a consequence, the masked load can be high without immediate costs. If the population experiences a significant decline leading to inbreeding, several scenarios for the resolution of the masked load are possible. Exposure of recessive deleterious mutations in homozygous state can purge the gene pool from unfavourable variants and the rate of loss of such alleles may be accelerated by genetic drift. Long-term, it has indeed been shown that genetic purging can decrease the genetic load of small populations (figure 2 in Bertorelle et al., 2022; see also Grossen et al., 2020; Jensen et al., 2021; Mathur & DeWoody, 2021; Ochoa & Gibbs, 2021; Robinson et al., 2022; Xue et al., 2015). However, it comes with the cost of inbreeding depression. Drift can also lead to fixation of recessive deleterious mutations (mutational meltdown) and thereby an increase in the drift load of the population, and decline in fitness (Lynch et al., 1995a, 1995b). Since the distribution of selection coefficient values across loci or even haplotype blocks is typically unknown, the outcome of drift can be hard to predict. Molecular data to illustrate how the genetic load responds to sharp changes in demography will therefore be necessary.
The grey wolf (Canis lupus) is a keystone apex predator in large parts of the world and at the same time a flagship mammalian species in the context of biodiversity conservation (Chapron et al., 2014; Hindrikson et al., 2017). The decline of wolf populations is a concrete example of human-induced alteration in the abundance of a once-common species, since the main reason for its disappearance from many areas is human persecution (Mech, 1995). Many studies have addressed the genetic consequences of decreased size of wolf populations, including in North America (Adams et al., 2011; Hedrick et al., 2014, 2019; Hervey et al., 2021; Leonard et al., 2005; Muñoz-Fuentes et al., 2010; Robinson et al., 2019; Sinding et al., 2018; vonHoldt et al., 2016), Asia (Fan et al., 2016; Zhang et al., 2014) and Europe (Aspi et al., 2006; Gómez-Sánchez et al., 2018; Pilot et al., 2010). Several studies have also provided evidence for inbreeding depression in wolf populations (Liberg et al., 2005; Räikkönen et al., 2006, 2009).
After a more than century-long period of population decline eventually leading to functional extinction in the 1960s, a wolf population was re-established in Scandinavia in the 1980s by the arrival of three immigrants (Wabakken et al., 2001). In the absence of further immigration the population became highly inbred, with a mean inbreeding coefficient of 0.25–0.30 among reproducing pairs (Åkesson et al., 2016; Bensch et al., 2006; Flagstad et al., 2003; Vilà et al., 2003; Wabakken et al., 2001). Genome-wide analysis has shown that the genome of most individuals contains very long runs of homozygosity, reflecting chromosomal regions identical by descent from a recent common ancestor (Kardos et al., 2018). More recently, limited immigration has counteracted depletion of genetic diversity and provided genetic rescue effects (Åkesson et al., 2016; Vilà et al., 2003). There are strong opposing views on how the population should be managed (Immonen & Husby, 2016; Laikre et al., 2022). Here we seek to assess the genomic incidence of deleterious alleles and the character of the genetic load in relation to inbreeding, drift and gene flow in this population.
2 MATERIALS AND METHODS
2.1 Variant detection
We used published high-coverage, whole genome sequencing data of 209 Northern European wolves (Table S1) from three previous studies (Kardos et al., 2018; Smeds et al., 2019, 2021). One-hundred of these wolves, sampled over a period of 30 years, were from the Scandinavian population and had a known pedigree (Åkesson & Svensson, 2016). Reads had already been mapped with bwa-mem version 0.7.17 (Li & Durbin, 2009) to the dog reference genome (CanFam 3.1: Lindblad-Toh et al., 2005), and sorted, deduplicated, base-recalibrated and individually variant called using samtools version 1.9 (Li et al., 2009), picard version 2.10.3 (http://broadinstitute.github.io/picard/) and gatk version 3.8 (McKenna et al., 2010). For the present study, variant calls from the 209 individuals (gvcf format) were jointly genotyped using GATK's GenotypeGVCFs (version 3.8). Only biallelic single nucleotide polymorphisms were used, and these variants were further “hard filtered” using GATK's VariantFiltration (QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < −12.5 and ReadPosRankSum < −8.0; settings taken from alternative protocol 2 in GATK Best Practices, Van der Auwera et al., 2013). To ensure high quality of calls and reduce the risk of including duplicated regions, we only kept sites (a) with an overall coverage between 10× and twice the genome-wide coverage, (b) that had a genotype quality of at least 30 and (c) <10% missing data. Moreover, for all analyses except the calculation of site frequency spectra, we only included sites that had a minor allele count (MAC) of at least 2 in the whole data set. The X chromosome was analysed separately and only females were included for these sites to avoid SNP calling issues for the haploid males.
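The filtering criteria listed above can be sketched as two small predicates. This is only an illustration of the stated thresholds, not the authors' pipeline (which used GATK's VariantFiltration): the function names, the dictionary representation of INFO annotations, and the decision to skip missing annotations are assumptions, as is the interpretation of the coverage and genotype-quality criteria as per-site summaries.

```python
# Hard-filter thresholds exactly as quoted in the text; a site is flagged
# when ANY of these conditions is true.
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def failed_hard_filters(info):
    """Return the names of the hard filters that a site fails.

    `info` is a dict of INFO annotations (hypothetical representation).
    Annotations missing from the dict are skipped (an assumption).
    """
    return [name for name, fails in HARD_FILTERS.items()
            if name in info and fails(info[name])]


def passes_site_filters(site_depth, genome_wide_depth, min_genotype_quality,
                        missing_fraction, minor_allele_count, for_sfs=False):
    """Apply criteria (a)-(c) and the minor allele count rule from the text.

    How depth and genotype quality are summarised per site is an assumption.
    """
    if not (10 <= site_depth <= 2 * genome_wide_depth):   # (a) coverage window
        return False
    if min_genotype_quality < 30:                          # (b) genotype quality
        return False
    if missing_fraction >= 0.10:                           # (c) <10% missing data
        return False
    if not for_sfs and minor_allele_count < 2:             # MAC >= 2, except for SFS
        return False
    return True
```

For example, `failed_hard_filters({"QD": 1.5, "FS": 10.0, "MQ": 60.0})` flags only the `QD` filter, and a singleton site is kept only when `for_sfs=True`.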
2.2 Polarization of alleles
We used publicly available short read data from two outgroups, Canis lupaster (African wolf, SRA accession: SRR8049196) and Lupulella mesomelas (black-backed jackal, SRA accession: ERR3210523), and mapped the sequences to the dog reference genome using the same procedure as described above, keeping only sites covered with at least five reads per outgroup. To avoid ascertainment bias towards the dog reference allele, we did not use called genotypes for the outgroups but instead pseudo-haploidized the genomes by randomly drawing one allele for each species using the read coverage as weight. This was done for each filtered variant site from above using custom python scripts. Pseudo-haploidization will lose information on polymorphic sites in an outgroup, just as when using an outgroup genome assembly for polarization (as in for example Dussex et al., 2021; Ochoa & Gibbs, 2021; von Seth et al., 2021). However, using two different outgroups reduces the risk of including sites with shared polymorphism.
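The pseudo-haploidization step can be illustrated with a minimal sketch. The authors' custom scripts are not reproduced here; the function name, the dictionary of per-allele read counts, and the optional random generator argument are assumptions made for illustration. The minimum depth of five reads follows the text.

```python
import random


def pseudo_haploidize(read_counts, min_depth=5, rng=random):
    """Draw one allele for an outgroup at a single site.

    `read_counts` maps each observed allele to the number of reads
    supporting it (hypothetical input format). The allele is drawn at
    random with the read counts as weights. Returns None when the site is
    covered by fewer than `min_depth` reads.
    """
    total = sum(read_counts.values())
    if total < min_depth:
        return None
    alleles = list(read_counts)
    weights = [read_counts[a] for a in alleles]
    return rng.choices(alleles, weights=weights, k=1)[0]
```

Passing a seeded generator, for instance `rng=random.Random(1)`, makes the draw reproducible. A site where all reads support the same allele always returns that allele.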
The ancestral state of polymorphisms segregating in wolves was inferred for all sites where the outgroups agreed on one of the two alleles present in the wolf data set. Sites where the outgroups did not agree, or agreed on a third allele not present among the wolves, were removed from the analysis. The use of five agreeing outgroups (also including Canis simensis, Canis adustus and Cuon alpinus) did not impact the results based on polarized data, but decreased the proportion of sites that could be polarized due to a lower average coverage of these three additional genome samples. The ancestral alleles were added as AA tags to the INFO field of the vcf file using a custom perl script. For all sites considered to contain deleterious variation, the derived allele was assumed to be the deleterious allele. All analyses were based on polarized mutations only.
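The polarization rule described above can be written as a short function. This is a sketch of the decision logic only; the function name and argument layout are hypothetical, and an outgroup without a usable allele is represented as `None`.

```python
def infer_ancestral(ref, alt, outgroup_alleles):
    """Return the ancestral allele for a biallelic wolf SNP, or None.

    A site is polarized only if every outgroup has an allele, all outgroups
    agree, and the shared allele is one of the two alleles segregating in
    the wolves. Any other case is left unpolarized (returned as None).
    """
    if not outgroup_alleles or any(a is None for a in outgroup_alleles):
        return None
    distinct = set(outgroup_alleles)
    if len(distinct) != 1:          # outgroups disagree
        return None
    allele = distinct.pop()
    if allele not in (ref, alt):    # agreement on a third allele
        return None
    return allele
```

With `ref="A"` and `alt="G"`, outgroup alleles `["G", "G"]` give `"G"` as ancestral, so `"A"` is treated as the derived allele, whereas `["A", "G"]` or `["T", "T"]` leave the site unpolarized.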
2.3 Inferring genotypes of founder males
We used haplotype data in 1 Mb windows from 73 Scandinavian wolves obtained by Viluma et al. (2022). Viluma et al. (2022) also inferred haplotypes of the two unsampled male founders of the population based on observed haplotype combinations in their offspring. We translated the founder male haplotypes to genotypes by matching, for each variant site, haplotypes to genotypes in all sequenced individuals. For example, if all individuals with haplotype A|A had the genotype 0/0, and all individuals with haplotype A|B had the genotype 0/1, we could infer that allele 0 was associated with haplotype A and allele 1 with haplotype B. When all haplotypes had been associated with an allele at each variant site, the two male founders were added to the vcf file with genotypes entirely based on their inferred haplotypes. As a validation of this approach, we also inferred the genotypes of the sequenced female founder and compared these to the calls from the genotyping of her DNA. They were identical at 58,806 out of 59,323 sites (99.1%), which suggests that the procedure is reliable.
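The haplotype-to-allele matching in the example above can be sketched for a single variant site as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: haplotypes are represented by arbitrary labels, genotypes by the count of the alternative allele (0, 1 or 2), and inconsistent observations are simply ignored rather than reported.

```python
def assign_alleles(observations):
    """Associate each haplotype label with allele 0 or 1 at one site.

    `observations` is a list of ((hap1, hap2), alt_count) tuples for the
    sequenced individuals (hypothetical input format). Homozygous genotypes
    fix the allele of both haplotypes; heterozygous genotypes let a known
    haplotype determine its partner. The loop repeats until nothing changes.
    """
    allele = {}
    changed = True
    while changed:
        changed = False
        for (h1, h2), alt_count in observations:
            if alt_count in (0, 2):
                # Both haplotypes carry the same allele.
                for hap in (h1, h2):
                    if hap not in allele:
                        allele[hap] = alt_count // 2
                        changed = True
            elif alt_count == 1 and h1 != h2:
                # The two haplotypes carry different alleles.
                for known, other in ((h1, h2), (h2, h1)):
                    if known in allele and other not in allele:
                        allele[other] = 1 - allele[known]
                        changed = True
    return allele


def founder_genotype(allele, hap_pair):
    """Alt-allele count for an unsampled founder, or None if unresolved."""
    h1, h2 = hap_pair
    if h1 not in allele or h2 not in allele:
        return None
    return allele[h1] + allele[h2]
```

With observations `[(("A", "A"), 0), (("A", "B"), 1), (("B", "C"), 2)]`, haplotype A is assigned allele 0 and haplotypes B and C allele 1, so a founder carrying A|C would be written as heterozygous.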
2.4 Deleteriousness in coding regions
The command line version of Ensembl's variant effect predictor (VEP; McLaren et al., 2016) release 99 was run using the settings --species “canis_familiaris” and --sift b. VEP predicts the impact of each variant as either “low” (synonymous mutations), “moderate” (nonsynonymous mutations), “high” (nonsense mutations) or “modifier” (all other sites). Further, the --sift option returns predictions of whether moderate mutations are deleterious or tolerated, based on precalculated sorting intolerant from tolerant (SIFT) scores (version 5.2.2, database UniRef90). SIFT uses both sequence homology and physical properties to predict whether an amino acid substitution has an impact on protein function (Kumar et al., 2009). In overlapping genes or transcripts a single site can have more than one prediction; for example, a site can be synonymous in one gene but nonsynonymous in another. For such sites, we used the most severe effect. The modifier category included sites within exons annotated as 5′ untranslated region (UTR) variants, 3′ UTR variants and “noncoding transcript” variants.
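Choosing the most severe effect for a site with several predictions can be sketched as below. The ranking used here (high, then moderate, then low, then modifier) is an assumption about how severity was ordered; the text states only that the most severe effect was used. The function name and the list input are hypothetical.

```python
# Assumed severity ranking of the four VEP impact classes named in the text.
SEVERITY = {"modifier": 0, "low": 1, "moderate": 2, "high": 3}


def most_severe(impacts):
    """Return the most severe impact among the predictions for one site.

    `impacts` is a non-empty list of impact labels, one per overlapping
    gene or transcript.
    """
    return max(impacts, key=lambda impact: SEVERITY[impact])
```

A site that is synonymous in one gene and nonsynonymous in another, `most_severe(["low", "moderate"])`, is then treated as `"moderate"`.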
The Miyata score (Miyata et al., 1979) and Sneath's index (Sneath, 1966) were assigned for each amino acid change reported in the VEP output using a custom python script inspired by simpred (https://github.com/NBISweden/simpred.git). These models calculate the distance between replaced amino acids; Sneath's index uses 134 categories of activity and structure, and Miyata's distance is based on volume and polarity. A site was classified as deleterious if the Miyata score was higher than 1.85 or if Sneath's index was higher than 20, with thresholds taken from Williamson Scott et al. (2005).
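The threshold rule can be made explicit with a small helper. The amino acid distance matrices themselves are not reproduced here, so the scores are taken as inputs; the function name and the returned labels are hypothetical, while the cut-offs are those quoted in the text.

```python
MIYATA_THRESHOLD = 1.85   # cut-off quoted in the text
SNEATH_THRESHOLD = 20     # cut-off quoted in the text


def classify_by_distance(miyata_score=None, sneath_index=None):
    """Classify an amino acid change with each model separately.

    Returns a dict with one entry per score that was supplied, where True
    means the change exceeds the threshold and is treated as deleterious.
    """
    result = {}
    if miyata_score is not None:
        result["miyata"] = miyata_score > MIYATA_THRESHOLD
    if sneath_index is not None:
        result["sneath"] = sneath_index > SNEATH_THRESHOLD
    return result
```

Keeping the two models separate in the output mirrors the fact that the number of deleterious mutations is reported per model.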
2.5 Deleteriousness based on GERP scores
A multiple alignment with 100 vertebrate species (“100way alignment”) including the CanFam3.1 reference genome was downloaded from the UCSC genome browser (http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/maf/). To avoid biases towards the focal genome, the dog genome was removed from the alignment using MafFilter version 1.1.2 (Dutheil et al., 2014) before gerp++ (version 20110522; Davydov et al., 2010) was run using the tree file provided by UCSC (http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/hg38.100way.nh) and hg38 as reference. The GERP scores were subsequently transferred to dog reference coordinates using LiftOver between hg38 and CanFam3.1 (Kuhn et al., 2013).
The range of GERP scores obtained from a particular alignment depends on the width of the corresponding phylogenetic tree. Suitable thresholds for judging whether mutations shall be considered deleterious or not will depend on the phylogenetic relationships among the species included in the multiple alignment. As a guide for setting a threshold we compared the distributions of GERP scores for sites assigned either synonymous or deleterious with VEP, and found a GERP score of 4 to represent a compromise between excluding as many as possible of tentatively neutral sites while including as many as possible of potentially deleterious sites.
2.6 Calculations of genotype proportions
The vcfR package was used to read the vcf files into r (version 4.1.1; R Core Team, 2021), and all subsequent calculations were performed in r using the package tidyverse (Wickham et al., 2019). When calculating the proportions of heterozygous genotypes and homozygous derived genotypes in an individual, we divided the number of sites of each genotype with the total number of called genotypes for that individual (including sites homozygous for the ancestral allele, genotyped because they were polymorphic in other individuals in our dataset).
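The per-individual proportions can be illustrated with a short sketch. The analysis in the paper was carried out in R; the Python version below is only an illustration, and the genotype coding (number of derived alleles, with `None` for a missing call) and the function name are assumptions.

```python
from collections import Counter


def genotype_proportions(genotypes):
    """Proportions of heterozygous and homozygous derived genotypes.

    `genotypes` holds one entry per site for a single individual, coded as
    the number of derived alleles (0, 1 or 2) or None when uncalled. The
    denominator is the number of called genotypes, which includes sites
    homozygous for the ancestral allele.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    counts = Counter(called)
    n_called = len(called)
    return {
        "heterozygous": counts[1] / n_called,
        "homozygous_derived": counts[2] / n_called,
    }
```

For an individual with genotypes `[0, 1, 1, 2, None]`, four sites are called, so the heterozygous proportion is 0.5 and the homozygous derived proportion is 0.25.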
3 RESULTS
We used whole-genome SNP data from 100 individuals of the highly inbred Scandinavian wolf population and from an additional 109 wolves from Finland and Russian Karelia. The Scandinavian wolves consisted of 73 animals sampled 1984–2015 that descend from the three individuals that founded the population (“the original population”), 11 immigrants of which four became integrated with the population 2008–2013 and bred in Scandinavia, and 16 offspring sampled 2010–2015 from matings between these four immigrants and individuals of the original population (referred to as “immigrant descendants”).
We identified 10,622,231 autosomal variant sites of which 8,313,538 could be polarized using two outgroups. With a genome assembly size of 2.4 Gb, this corresponds to an average of about one SNP every 300 bp. We began by focusing on protein-coding regions in the genome and identified 59,323 SNPs in 14,261 different genes. These SNPs were classified as synonymous (33,895), missense (leading to a new amino acid; 24,405) and nonsense (leading to a new stop codon, disrupting the reading frame or altering splice sites; 1023) mutations. Further, 17,790 of the missense variants were divided into deleterious (4809) and tolerated (12,981) mutations based on SIFT scores. An additional 70,016 SNPs from noncoding parts of exons were classified as modifier variants.
The number of variants was higher in the Finnish and Russian samples than in the original Scandinavian population for all functional categories (Table 1). For example, while 4640 and 3756 deleterious missense mutations were seen in Finland and Russia, respectively, only 1404 were present in the original Scandinavian population. The number of variants in a sample obviously depends on the number of unrelated individuals studied. Although the sample size from the original Scandinavian population was large, the lower number of detected variants can clearly be attributed to its very narrow genetic basis, given by only three founders. Downsampling data from Scandinavian wolves to the same sample size as the Russian population, the number of deleterious alleles in the Scandinavian population was about one-third (1274 ± 48 SD from 100 replicates) of that in Russia.
Category | Original Scandinavia (n = 73), No. | Original Scandinavia, mean per individual | Finland (n = 95), No. | Finland, mean per individual | Russia (n = 14), No. | Russia, mean per individual
---|---|---|---|---|---|---
Total | 25,992 | | 57,722 | | 50,128 |
Synonymous | 15,588 | 8194 ± 951 | 33,024 | 11,794 ± 344 | 29,015 | 11,675 ± 185
Missense | 9951 | 5080 ± 619 | 23,702 | 7681 ± 230 | 20,225 | 7563 ± 127
Tolerated (a) | 5616 | 2898 ± 357 | 12,618 | 4266 ± 124 | 10,858 | 4202 ± 71
Deleterious (a) | 1404 | 603 ± 85 | 4640 | 1144 ± 50 | 3756 | 1109 ± 33
Nonsense | 453 | 236 ± 30 | 996 | 349 ± 14 | 888 | 343 ± 11
- Note: Number of individuals per sample in parentheses.
- a Confidently assigned by SIFT (sorting intolerant from tolerant).
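The downsampling comparison described in the text above (reducing the Scandinavian sample to the size of the Russian sample over 100 replicates) can be sketched as follows. This is not the authors' procedure: the genotype matrix layout, the function names, the fixed seed, and the definition of a variant as a site where at least one sampled individual carries the derived allele are all assumptions.

```python
import random
import statistics


def count_variants(genotype_matrix, individuals):
    """Number of sites where a chosen individual carries the derived allele.

    `genotype_matrix[site][individual]` is the number of derived alleles
    (0, 1 or 2) or None for a missing call (hypothetical layout).
    """
    return sum(
        any((site[i] or 0) > 0 for i in individuals)
        for site in genotype_matrix
    )


def downsample_variant_count(genotype_matrix, n_individuals, sample_size,
                             replicates=100, seed=0):
    """Mean and SD of the variant count over random subsamples."""
    rng = random.Random(seed)
    counts = [
        count_variants(genotype_matrix,
                       rng.sample(range(n_individuals), sample_size))
        for _ in range(replicates)
    ]
    return statistics.mean(counts), statistics.stdev(counts)
```

Drawing individuals without replacement keeps each replicate a valid subsample of the original data set, and the standard deviation across replicates corresponds to the ± SD reported in the text.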
3.1 The effect of genetic drift
Most variants in a population typically segregate at low frequency, which was also the case in the wolf data (Figure 1). Unfolded site frequency spectra in the Russian population (Figure 1a) as well as in the three founders (Figure 1b) were significantly shifted to the left (more rare alleles) for deleterious missense mutations compared to synonymous mutations (chi-square goodness of fit test, p < 2.2e-16 in both cases), consistent with a history of purifying selection. The categories modifier, tolerated missense and, somewhat surprisingly, also nonsense mutations followed the distribution of supposedly neutral synonymous mutations. In the following we will mainly focus on synonymous and deleterious missense variants to contrast a category of mutations that, overall, are likely to be neutral with mutations that are likely to contribute to the genetic load.
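As an illustration of the comparison above, an unfolded site frequency spectrum and a chi-square goodness-of-fit statistic can be computed as in the sketch below. The function names and input format are hypothetical, and skipping frequency classes with zero expected count is an assumption. The sketch returns the test statistic only, not a p-value.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n_chromosomes):
    """Unfolded SFS for segregating sites.

    `derived_counts` holds the number of derived allele copies per site in
    a sample of `n_chromosomes` chromosomes. Entry k-1 of the result is the
    number of sites with k derived copies, for k = 1 .. n_chromosomes - 1.
    """
    counts = Counter(derived_counts)
    return [counts[k] for k in range(1, n_chromosomes)]


def chi_square_statistic(observed, reference):
    """Goodness-of-fit statistic of `observed` against the shape of `reference`.

    Expected counts are the reference spectrum rescaled to the observed
    total. Classes with zero expected count are skipped.
    """
    total_observed = sum(observed)
    total_reference = sum(reference)
    statistic = 0.0
    for obs, ref in zip(observed, reference):
        expected = total_observed * ref / total_reference
        if expected > 0:
            statistic += (obs - expected) ** 2 / expected
    return statistic
```

Here the synonymous spectrum would serve as `reference` and the deleterious missense spectrum as `observed`. A p-value can then be obtained from the chi-square distribution, for example with `scipy.stats.chisquare`.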

About half of the deleterious mutations that entered the Scandinavian population were represented by only one copy in the founders. On the opposite side of the spectrum, all three founders were homozygous for 47 deleterious mutations segregating in neighbouring populations; these mutations were thus directly fixed in the Scandinavian population. The number of variants, both synonymous and deleterious, decreased over time in the original Scandinavian population (Table S2). For example, there were 1369 deleterious alleles segregating in the three founders, but only 1006 remaining after five generations of inbreeding; about one-quarter of alleles had thus become lost by genetic drift.
To further examine the effect of drift we compared allele frequencies in the three founders with that in the population after five generations of inbreeding, represented by 11 wolves sampled 2007–2015 (Figure 2). The significant variation in allele frequencies among inbred wolves for each of the six possible categories of starting frequencies (given three founders; Figure 2) indicates that the power of genetic drift in this small and bottlenecked population has been strong. For 30 deleterious and 618 synonymous mutations that were polymorphic in the founders, all 11 inbred individuals were homozygous for the derived allele, indicating that these sites had become fixed in the population. We note that of the tentatively fixed sites that had three copies of each allele in the founders, 61% became fixed for the ancestral allele and 39% for the derived allele (χ2 test, p = .028). Since the vast majority of mutations were at synonymous sites, it is unlikely that this difference was driven by purging of derived deleterious alleles.

The arrival and breeding of four immigrant wolves 2008–2013 in Scandinavia meant that many new alleles entered the population, including alleles that had become lost by drift in the original population. Of the 30 sites that tentatively had become fixed for the derived deleterious allele in the original population, 28 had regained the ancestral allele in 16 immigrant descendants sampled 2010–2015. For the 47 sites fixed for the derived allele already among the three founders, 21 had regained the ancestral allele in immigrant descendants. Not surprisingly, immigrants also contributed deleterious alleles: 1890 in total, of which 844 had not previously been seen in Scandinavia. Immigrant descendants showed almost twice as many deleterious variants (1993) as the 11 inbred wolves with Scandinavian ancestry sampled during the same time period.
3.2 The effect of inbreeding
Inbreeding is expected to shift genotype frequencies. To examine if this led to the exposure of deleterious mutations in the original Scandinavian population, we followed changes in genotype frequencies over time and compared frequencies between populations. First, it was clear that the three founders of the original population had a lower proportion of heterozygous sites (deleterious: mean = 0.111 ± 0.020; synonymous: mean = 0.174 ± 0.029) than immigrant wolves (0.179 ± 0.017; 0.238 ± 0.016) as well as wolves from Finland (0.190 ± 0.015; 0.243 ± 0.017) and Russia (0.186 ± 0.008; 0.241 ± 0.009) (Figure 3a). The population thus started with less neutral and functional diversity than would have been the case with any three random individuals from the samples of Finnish, Russian and immigrant wolves.

Second, we found a clear and continuous reduction over time in the proportion of heterozygous genotypes in the original population, both for deleterious and synonymous mutations (Figure 3a). Third, the pattern for homozygous derived genotypes was essentially reversed (Figure 3b). Inbreeding resulted in an increased proportion of homozygous genotypes, both for deleterious mutations and neutral variants. The same patterns were found when grouping the individuals according to the fraction of the genome represented by runs of homozygosity (FRoH), a measure of inbreeding (Kardos et al., 2018): the proportion of homozygous derived genotypes increased with FRoH (Table S3), similar to what has been seen in, for example, Indian tigers (Khan et al., 2021).
Fourth, immigrants contributing to reproduction in Scandinavia 2008–2013 were genetically more variable (had a higher proportion of heterozygous genotypes) than individuals of the inbred population (Figure 3a). As a consequence, offspring from matings between immigrants and inbred individuals were more heterozygous and had a lower proportion of homozygous derived genotypes than inbred wolves from the same time period. However, just as in the original population, the proportion of heterozygous genotypes again decreased following new generations of inbreeding.
To test whether the observations made above were robust to the method used for assessing the deleteriousness of nonsynonymous mutations, we also applied two classical models of deleteriousness based on physicochemical properties of amino acids: Sneath's index (7185 deleterious mutations identified) and Miyata's distance (8668). The relative patterns of diversity differences among groups of wolves were similar for all methods (see Figure S1).
Finally, we considered the absolute number of sites with homozygous deleterious mutations in protein-coding genes per individual. The three founders of the Scandinavian population had 175, 205 and 236 such sites, respectively. However, after six generations of inbreeding, the number of homozygous sites had increased to a mean of 278 ± 10.7 per individual (corresponding to a 45% increase from the first inbreeding generation). The number was higher than in the Russian population (mean 218 ± 19.1), among immigrants (mean 221 ± 20.8) and in the Finnish population (mean 240 ± 35.2).
3.3 Genes on the X-chromosome
We identified 1473 synonymous and 191 deleterious variants in 432 X-linked genes segregating in 74 females from the total sample. Of these variants, 275 and 25 were detected in the original Scandinavian population, respectively. With less data we could not perform the same analyses as with autosomal sequences but it was clear that deleterious alleles on the X-chromosome segregated at lower frequency than deleterious alleles on autosomes (Figure S2). As an example, in the Russian population the frequency of singleton deleterious alleles on the X-chromosome was 2.1 times higher than the frequency of singleton synonymous alleles, while on autosomes the frequency of deleterious singletons was 1.5 times higher than synonymous singletons. Since recessive X-linked alleles are exposed to selection in males, purifying selection (and thus purging) is more effective on the X-chromosome than on autosomes.
3.4 Analyses based on GERP
An alternative way to assess the potentially deleterious effects of mutations is to use conservation scores based on alignment of homologous sequences from a large number of species. This allows studying any alignable region of the genome, that is, also including noncoding sequences, and provides a quantitative estimate of deleteriousness. We assessed the distribution of GERP scores for all five categories of mutations in protein-coding genes (synonymous, modifier, nonsense, tolerated missense and deleterious missense; Figure S3A) as well as the distribution of scores for 4,995,746 polymorphic sites across the whole wolf genome (Figure S3B). The density plot for deleterious mutations was heavily skewed towards high GERP scores, as expected, although we note that some missense mutations considered deleterious by VEP/SIFT do not appear particularly conserved. Technical (for instance, incorrect polarization of segregating alleles) or biological reasons (like turnover of conserved sequences; Huber et al., 2020) could potentially explain this seemingly unexpected observation. The distribution of GERP scores for nonsense mutations was similar to the distribution for mutations in the whole genome, again justifying the focus on missense mutations classified as deleterious by VEP/SIFT as a candidate category of mutations for having negative fitness effects.
Based on the distributions of synonymous and deleterious missense mutations we set a GERP score threshold of 4 for defining a mutation as potentially deleterious (see Section 2). With this threshold, 7.5% (376,835) of all mutations present in alignable regions of the 209 wolf genomes analysed in this study were deemed potentially deleterious. This proportion is within the range of estimates obtained for the human genome using similar methods (Huber et al., 2020; Rands et al., 2014). In Finland, Russia and among immigrants to Scandinavia the mean number of deleterious sites (both heterozygous and homozygous) per individual genome was about 130,000 (Table S4). It was 18% lower among the three founders of the original Scandinavian population and decreased by a further 16%, to about 90,000 deleterious sites, after six generations of inbreeding in the population.
Using polymorphism data from the whole genome we estimated the individual masked load as the sum of the GERP scores of all deleterious derived alleles in heterozygous genotypes, divided by the number of called genotypes per individual to account for differences in callability between individuals. The load was highest in wolves from Finland and Russia, and in immigrants to Scandinavia (Figure 4, top panel). The three founders of the original Scandinavian population had a somewhat lower masked load, and the load further decreased during subsequent generations of inbreeding. As for VEP/SIFT data on heterozygous genotypes in protein-coding genes, the arrival and breeding of new immigrants to the Scandinavian population increased the masked load, followed by a decrease during subsequent generations of inbreeding.

The realized load, the sum of GERP scores of deleterious derived alleles in homozygous genotypes divided by all called sites in the genome, showed the opposite pattern (Figure 4, bottom panel), again similar to VEP/SIFT data on homozygous genotypes in protein-coding genes. In this case the load was generally lowest in the larger populations and in immigrants (about 40,000 derived homozygous sites per individual; Table S4). In Scandinavia, the realized load increased with inbreeding (up to over 52,000 sites, or 27%), was balanced by the integration of new immigrants (≈43,000 sites) and then again increased with subsequent inbreeding (≈47,000 sites after three generations of inbreeding).
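The masked and realized load definitions can be summarised in a short sketch. It follows the verbal definitions in the text but is not the authors' implementation: the input format and function name are hypothetical, a site counts as deleterious when its GERP score is above 4, and each homozygous derived site contributes its score once (the text does not say whether it is counted once or twice).

```python
def gerp_loads(sites, threshold=4.0):
    """Masked and realized load for one individual.

    `sites` is an iterable of (gerp_score, genotype) pairs, where genotype
    is the number of derived alleles (0, 1 or 2) or None when uncalled.
    Both loads are sums of GERP scores over deleterious sites divided by
    the number of called genotypes. Returns (masked, realized), or None if
    no genotype is called.
    """
    called = 0
    masked_sum = 0.0
    realized_sum = 0.0
    for score, genotype in sites:
        if genotype is None:
            continue
        called += 1
        if score > threshold:
            if genotype == 1:        # heterozygous: masked load
                masked_sum += score
            elif genotype == 2:      # homozygous derived: realized load
                realized_sum += score
    if called == 0:
        return None
    return masked_sum / called, realized_sum / called
```

Dividing by the number of called genotypes rather than by a fixed genome size is what makes individuals with different callability comparable.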
Finally, we considered a set of mutations in protein-coding genes that are candidates for being truly deleterious, namely the intersect of deleterious missense mutations and mutations with a GERP score >4 (n = 2027). This set shows a site frequency spectrum that is further shifted to the left compared to deleterious missense mutations with a GERP score ≤4 (Figure S4). In the 11 inbred wolves sampled 2007–2015, there were on average 75.5 ± 10.5 homozygous “highly deleterious” genotypes per individual, compared to 47.9 ± 5.0 in the first-generation offspring to the founders. Thus, again, inbreeding led to the exposure of deleterious alleles, potentially contributing to inbreeding depression.
4 DISCUSSION
Although the concept of genetic load was formulated more than 70 years ago (Muller, 1950), only very recently has it become possible to estimate the load with other than quantitative genetic approaches. While such approaches have provided important insights into the relationship between inbreeding and fitness (Morton et al., 1956), they cannot address the molecular basis of the genetic load or be used in natural populations of non-model species without information on inbreeding coefficients and access to phenotypic data. Those obstacles can be overcome by whole-genome sequencing of population samples followed by analyses of the functional character of segregating variation in the data, a direction of research with considerable current interest (Bertorelle et al., 2022).
With a sample size (>200 genomes) exceeding that of most other studies of natural populations of endangered species, it is not surprising that we found a large number of deleterious variants, larger than seen in several other species of conservation concern (like brown bears, Benazzo et al., 2017; Alpine ibex, Grossen et al., 2020; Indian tigers, Khan et al., 2021; Iberian lynx, Kleinman-Ruiz et al., 2022). For example, in the Russian reference sample of 14 individuals we found more than 20,000 missense mutations in protein-coding genes, of which 3756 were confidently assigned as deleterious with an average of more than 1000 mutations per individual. A common approach for quantifying the genetic load, which would allow results from different studies and species to be compared, has yet to be established. The quality (and coverage) of the original data differ among studies, and so does the bioinformatic pipeline used (e.g., including filtering procedures, the choice of algorithm used for effect prediction, and cutoffs for nominating a site as conserved). Genomes vary in assembly quality and annotation, and when using conservation scores from multiple alignments (like with GERP) the outcome is dependent on which species are included in the alignment. It is therefore difficult to say whether the masked or the realized load in wolf populations is high or low compared to other species. However, the sheer number (90,000) of mutations with a GERP score above 4 in the Russian population indicates that mutations with potentially negative effects on fitness are common in wolf genomes and that the masked load probably should be considered high. Moreover, since GERP data could only be obtained for regions of the wolf genome alignable across a very large number of species, the actual number of deleterious mutations in wolf genomes is likely to be higher.
Nonsense or loss-of-function (LoF) mutations are generally considered to have more severe fitness effects than other categories of mutations (and are typically assigned as having “high” impact by effect predictor software). Our results do not support this notion since nonsense mutations showed a similar site frequency spectrum and distribution of GERP scores as presumably neutral synonymous mutations. A likely explanation for this observation is that the nonsense category contained a large proportion of mutations at incorrectly annotated loss-of-function sites in the dog reference genome (CanFam 3.1), assembled based on short-read technology more than 10 years ago. Such errors could be due to, for instance, the presence of alternative splice variants, incorrectly placed start or stop positions for translation, frameshifts or inclusion of pseudogenes. Even in the well-annotated human genome, many putative loss-of-function mutations cannot be validated (MacArthur et al., 2012). In support of our inference we note that the relative occurrence of nonsense mutations in our data (1023, compared to 4809 deleterious missense mutations) was much higher than seen in other species (e.g., Grossen et al., 2020; Ochoa & Gibbs, 2021), and that a small peak at ≈15 in the distribution of GERP scores for nonsense mutations might reflect a minor proportion of sites that represent true loss-of-function mutations.
Most deleterious variants were rare and segregated at lower frequency than neutral alleles, consistent with the action of purifying selection. In large populations such mutations will rarely drift to high frequencies and become exposed to selection in homozygous form. In other words, they are not purged. The cost of the masked load in neighbouring populations is paid by the wolf population in Scandinavia, founded by only three immigrant wolves. We could see genetic signatures of how inbreeding and drift led to the exposure of deleterious alleles in this population. Although some deleterious alleles became lost after a number of generations of inbreeding, the proportion of homozygous genotypes of derived deleterious alleles increased. This was observed both for deleterious missense mutations and potentially deleterious mutations with high GERP scores across the whole genome. Moreover, some deleterious alleles more or less directly became fixed in the Scandinavian population, either because all three founders were homozygous or because deleterious alleles reached fixation after only a few generations (Figure 2). The most inbred individuals showed nearly 300 sites homozygous for deleterious alleles in protein-coding genes, and more than 50,000 such sites in the rest of the genome, which gives a quantitative estimate of the magnitude of the realized load.
A topical question in conservation genetics is to what extent purging can release endangered populations from deleterious alleles. Several recent genomic studies have shown that small populations may carry relatively fewer (strongly) deleterious variants than larger populations of the same species, consistent with purging (Dussex et al., 2021; Grossen et al., 2020; Khan et al., 2021; Kleinman-Ruiz et al., 2022; Mathur & DeWoody, 2021; Ochoa & Gibbs, 2021; Robinson et al., 2022; Xue et al., 2015). A common feature of these observations is that the small populations have been isolated for a long time, like the separation of an island population of kakapos from its mainland source population for at least 10,000 years (Dussex et al., 2021), the separation of the endemic Iberian lynx from European lynx ≈1 million years ago (Kleinman-Ruiz et al., 2022) or the single remaining population of the critically endangered vaquita porpoise (Robinson et al., 2022). Purging is indeed a long-term process and requires going through one or more phases of inbreeding depression. Suffice it to say, the mentioned examples represent populations that have successfully survived such periods, which may not always be the case.
The recovery of wolves in Scandinavia as recently as in the 1980s means that the population has so far passed through only a limited number of generations; our samples were from up to 5–6 generations after founding, and up to three more generations after new immigrants became integrated with the population. On such a short time scale purging (purifying selection) is unlikely to leave genome-wide signatures in the distribution of deleterious alleles. Importantly, most deleterious mutations will still be embedded within large haplotype blocks representing founder chromosomes. Gradually, the efficacy of selection will increase as recombination breaks up haplotypes into increasingly shorter DNA segments. Moreover, natural selection has not been the only factor affecting the survival and reproduction of wolves in Scandinavia. Legal (licence or protective) hunting annually reduces the population size by about 10%, and poaching is considered to be the most common mortality factor among “disappearing” wolves (Liberg et al., 2020).
Inbreeding depression has been documented in the Scandinavian wolf population, involving morphological (Räikkönen et al., 2006, 2013) and other fitness-related traits (Bensch et al., 2006; Liberg et al., 2005). Inbreeding depression has also been recorded among wolves on Isle Royale (Robinson et al., 2019), in red wolves (Brzeski et al., 2014) and in Mexican wolves (Fredrickson et al., 2007). Wolves were once abundant and widespread over the Northern Hemisphere. Analyses of ancient wolf genomes indicate that connectivity between wolf populations across continents was high, resembling panmixia, throughout the Late Pleistocene (Bergström, 2022); indeed, the dispersal capacity of wolves is significant (Mech, 2020). Contemporary populations in Eurasia share a common ancestry that can be traced back to unidirectional gene flow from Siberia during the Last Glacial Maximum, although the survival of deep local ancestries argues against local extinctions during this process (Bergström, 2022; Loog et al., 2020; Ramos-Madrigal et al., 2021). There are thus reasons to believe that the high masked load we detected in Finland and Russia, and in immigrants to Scandinavia, was characteristic of many wolf populations before human persecution in the last centuries led to rapid and significant population declines and fragmented distributions (e.g., Hindrikson et al., 2017). With this demographic history contemporary wolf populations may be particularly sensitive to inbreeding depression by carrying a high masked load. We suggest that this could be the case as well for other vertebrate predators that suffered from human persecution during the Anthropocene, increasing the risk for extinction (Kyriazis et al., 2021).
The arrival and breeding of new immigrants to the Scandinavian wolf population had positive effects on genetic diversity. New alleles arrived, ancestral alleles that had become lost in the original population re-entered the population and the proportion of homozygous genotypes of derived deleterious alleles decreased. These observations are concrete manifestations of genetic rescue at the molecular level and are consistent with concurrent population expansion and increased breeding success in the Scandinavian population (Åkesson et al., 2016; Vilà et al., 2003). Empirical evidence (from data on demography or fitness-related traits) for genetic rescue has been reported in several species (Frankham, 2015), including in other wolf populations (Adams et al., 2011; Fredrickson et al., 2007), but has rarely included genomic data demonstrating how deleterious alleles get masked.
Management of endangered and isolated populations emphasizes the importance of gene flow to counteract inbreeding and loss of genetic diversity (Whiteley et al., 2015). Our results demonstrate such effects in the Scandinavian wolf population and they also show that continuous immigration is necessary for rescue effects to be more than just temporary. Genomic signatures of inbreeding were soon again apparent after the arrival of new immigrants 2008–2013, with decreased proportions of heterozygous genotypes and increased proportions of homozygous derived alleles. However, immigration can be seen as a double-edged sword since new deleterious alleles will enter the population, as shown in our study. The negative effects from new deleterious alleles may be of particular concern if immigrants are from a population with high masked load, as is the case with wolves from Finland and Russia, and if the recipient population is small and purging thereby inefficient. Under such a scenario the need for continuous and stable immigration is strong (Pérez-Pereira et al., 2022). Maintaining connectivity to the larger populations in Finland and Russia, with frequent gene flow, should thus be of prime importance to wolf conservation in Scandinavia (Laikre et al., 2016). In parallel, continued monitoring of the population's genetic status will be essential for following the accumulation of deleterious mutations.
AUTHOR CONTRIBUTIONS
Hans Ellegren conceived of the study, Linnéa Smeds performed all analyses, Linnéa Smeds and Hans Ellegren interpreted the data and wrote the manuscript.
ACKNOWLEDGEMENTS
We acknowledge funding from the Swedish Research Council and the Knut and Alice Wallenberg Foundation and bioinformatic advice from Marcin Kierczak at the National Bioinformatics Infrastructure Sweden at SciLifeLab. The computations were enabled by resources in projects p2018002 and sllstore2017034 provided by the Swedish National Infrastructure for Computing (SNIC) at UPPMAX, partially funded by the Swedish Research Council through grant agreement no. 2018-05973.
CONFLICT OF INTEREST
The authors declare no conflicts of interest.
Open Research
DATA AVAILABILITY STATEMENT
Raw data in this study have been made publicly available with the following accession numbers: PRJEB20635, PRJEB28342 and PRJEB38198. The final VCF files (both coding and genome-wide) are available on Dryad (https://doi.org/10.5061/dryad.7m0cfxpzj). All custom scripts and all commands for running the software used in the study are available on GitHub (https://github.com/linneas/wolf-deleterious).