Cryptic and extensive hybridization between ancient lineages of American crows
Abstract
Most species and therefore most hybrid zones have historically been defined using phenotypic characters. However, both speciation and hybridization can occur with negligible morphological differentiation. Recently developed genomic tools provide the means to better understand cryptic speciation and hybridization. The Northwestern Crow (Corvus caurinus) and American Crow (Corvus brachyrhynchos) are continuously distributed sister taxa that lack reliable traditional characters for identification. In this first population genomic study of Northwestern and American crows, we use genomic SNPs (nuDNA) and mtDNA to investigate the degree of genetic differentiation between these crows and the extent to which they may hybridize. Our results indicate that American and Northwestern crows have distinct evolutionary histories, supported by two nuDNA ancestry clusters and two 1.1%-divergent mtDNA clades dating to the late Pleistocene, when glacial advances may have isolated crow populations in separate refugia. We document extensive hybridization, with geographic overlap of mtDNA clades and admixture of nuDNA across >900 km of western Washington and western British Columbia. This broad hybrid zone consists of late-generation hybrids and backcrosses, but not recent (e.g., F1) hybrids. Nuclear DNA and mtDNA clines had concordant widths and were both centred in southwestern British Columbia, farther north than previously postulated. Overall, our results suggest a history of reticulate evolution in American and Northwestern crows, perhaps due to recurring neutral expansion(s) from Pleistocene glacial refugia followed by lineage fusion(s). However, we do not rule out a contributing role for more recent potential drivers of hybridization, such as expansion into human-modified habitats.
1 INTRODUCTION
Phenotypic characters have historically been a primary basis for distinguishing between species (Bickford et al., 2007), but the increasing availability of DNA evidence has made clear that speciation is not always accompanied by morphological change (Fišer, Robinson, & Malard, 2018). Genomic analyses of morphologically conserved groups have uncovered cryptic species across the tree of life (e.g., Hotaling et al., 2016; Larsen, Miller, Rhodes, & Wiens, 2017; Pfenninger & Schwenk, 2007; Satler, Carstens, & Hedin, 2013), yet not all morphologically diagnosed species show genomic differences (e.g., Mason & Taylor, 2015).
Genomic tools provide ways to understand the population genetic structure and evolutionary history of cryptic species. Examples include identifying geographic clusters of genetically homogeneous individuals, identifying independent evolutionary lineages, establishing phylogenetic relatedness of lineages, and estimating divergence dates (Struck et al., 2018). In geographic areas where genetically differentiated individuals co-occur, the presence or absence of genomic admixture indicates hybridization or reproductive isolation, respectively (e.g., Linck et al., 2019; Pulido-Santacruz, Aleixo, & Weir, 2018; Scordato et al., 2017).
Where species interbreed extensively, hybrid zones offer natural laboratories in which to investigate the speciation continuum (Harrison, 1993; de Queiroz, 1998). This is just as true for cryptic hybrid zones, because genomic inquiry is productive regardless of morphological differences between parental species. Geographic-genomic patterns at cryptic hybrid zones can illuminate speciation processes like the influence of differing levels of gene flow and selection, and the fusion of lineages that can occur when previously allopatric populations experience secondary contact. A narrow cline may suggest a pronounced selection gradient across a hybrid zone, perhaps on morphologically cryptic but biologically important characters, whereas a broad cline is more consistent with neutral processes (Mallet et al., 1990). Comparing widths of mtDNA and nuDNA clines can elucidate additional speciation mechanisms, such as the presence of sex-biased asymmetries in dispersal or the effects of Haldane's rule (Toews & Brelsford, 2012). The relative age of a hybrid zone can also be inferred because the heterozygosity of admixed individuals decreases with each generation of backcrossing (Bouchemousse, Liautard-Haag, Bierne, & Viard, 2016; Milne & Abbott, 2008). In addition, the presence of pronounced peaks of differentiation across the genome may indicate elevated selection within certain genomic regions, possibly related to the evolution of morphologically cryptic reproductive isolating mechanisms. Fewer peaks, on the other hand, may suggest a predominant role for neutral processes like genetic drift (Irwin et al., 2018).
However, because most species have historically been diagnosed morphologically, most hybrid zone studies to date have likewise focused on morphologically distinct parental species (Barton & Hewitt, 1989). By comparison, much less is known about what is happening in cryptic hybrid zones, and research on this topic has begun to appear only recently (e.g., Herrera-Aguilar et al., 2009; Patel, Schell, Eifert, Feldmeyer, & Pfenninger, 2015; Pfenninger & Nowak, 2008; Pulido-Santacruz et al., 2018; Quilodrán, Austerlitz, Currat, & Montoya-Burgos, 2018). These few studies have already documented a variety of evolutionary patterns and processes operating in the absence of morphological differences, including cryptic reticulate evolution (Kearns et al., 2018). However, additional genomic studies of cryptic hybrid zones are needed to better synthesize how evolutionary processes may differ at hybrid zones with and without substantial morphological differentiation. For example, because the hybrid zone literature has mostly focused on morphologically distinct parentals, it may be underestimating the importance of nonmorphological characters in the evolution of reproductive isolation.
The Northwestern Crow (Corvus caurinus) and American Crow (Corvus brachyrhynchos) are candidates for cryptically hybridizing at their contact zone. These crows are sister taxa (Haring, Däubl, Pinsker, Kryukov, & Gamauf, 2012) that have long been considered separate species (Baird, 1858). However, they are nearly identical morphologically, and collectively they have a continuous distribution along the Pacific coast of North America (Figure 1; Clements et al., 2017; Johnston, 1961). The traditional, phenotypic characters for identifying these all-black corvids are based on putative differences in morphology, voice, and ecology, and have always been controversial (Johnston, 1961; see also Discussion). Uncertainty in their identification has moreover led to uncertainty about the location of their range boundary and confusion regarding the nature and extent of a secondary contact zone. It has remained unclear whether, for example, there is assortative mating of discrete forms in sympatry (Brooks, 1917, 1942) or clinal variation without diagnosable differences (Johnston, 1961), and little new information has surfaced during the past half century. The only prior molecular study sequenced mtDNA from just a handful of American and putative Northwestern crows (Haring et al., 2012), and all three “Northwestern” crow samples were from near the range boundary, leaving doubts as to whether these were Northwestern Crows, American Crows, or hybrids. After more than a century of uncertainty, genomic data now provide an opportunity to assess the geographic distribution of Northwestern and American crows and to determine the extent to which they mate assortatively, hybridize, or exhibit clinal variation.

In this study, our objective was to use genomic data and geographically robust sampling of American and Northwestern crows to better understand the evolutionary history of these presumptive species that lack well-defined phenotypic characters. Specifically, we set out to (a) assess whether Northwestern and American crows represent independently evolving lineages, and, if so, (b) determine the extent to which they might hybridize or exhibit reproductive isolation, (c) test the role of specific geographic barriers in potentially structuring gene flow, and (d) better understand their evolutionary and biogeographic history.
2 MATERIALS AND METHODS
2.1 Sample collection and DNA extraction
To conduct a population genetic survey of American Crows and Northwestern Crows near their range boundary and across North America, we sampled frozen tissue (n = 218), blood (Alaska; n = 35), or feather material (Idaho; n = 6) from crows identified a priori as either species (Table S1). We also included two Carrion Crow (Corvus corone) tissue samples as outgroups (Haring et al., 2012). Tissue samples were obtained from natural history museums and were generally associated with vouchered specimens (Table S1). Blood samples were obtained under permits and approvals from the US Fish and Wildlife Service, the Alaska Department of Fish and Game, and the Institutional Animal Care and Use Committees at the University of Alaska Fairbanks and the US Geological Survey Alaska Science Center. Feather samples were collected under permits from the US Fish and Wildlife Service and the Idaho Department of Fish and Game, following recommended protocols in the Guidelines to the Use of Wild Birds in Research (Gaunt, Oring, & Ornithological Council, 1997). We extracted total genomic DNA with a DNeasy tissue extraction kit (Qiagen) following the manufacturer's protocol.
2.2 Mitochondrial DNA (mtDNA) sequencing
To survey a large sample of crows across the putative contact zone and throughout North America and to conduct divergence dating, we amplified 1,041 base pairs (bp) of mtDNA NADH dehydrogenase subunit 2 (ND2) from 259 individuals (Table S1). We used primers L5215 (Hackett, 1996) and TrC (Miller, Bermingham, & Ricklefs, 2007) and 12.5 μL PCR reactions on a T100 thermal cycler (Bio-Rad) as follows: 94°C for 2.5 min, 35 cycles of (94°C for 30 s, 54°C annealing for 30 s, 72°C for 1 min), 10 min at 72°C, 10°C hold. We sent PCR products to the High-Throughput Genomics Unit at the University of Washington for cleanup and sequencing. We unambiguously aligned complementary strands with sequencher 5.0 (Gene Codes Corporation) and downloaded a Carrion Crow ND2 sequence from GenBank as an outgroup (Table S1).
2.3 mtDNA phylogeny, divergence dating, and haplotype network
We compared models of ND2 sequence evolution using jmodeltest 2.1.4 (Posada, 2008) and conducted ND2 divergence dating in beast 1.8.4 (Suchard et al., 2018) using the HKY + G model, an uncorrelated relaxed clock with lognormal distribution, and a constant size coalescent tree prior. We fixed the uncorrelated relaxed clock mean parameter ucld.mean to 0.0145, half the ND2 rate of 2.9 × 10⁻² substitutions/site/My derived from Hawaiian honeycreepers based on sequential uplift dates of the Hawaiian island chain (Lerner, Meyer, James, Hofreiter & Fleischer, 2011). We ran a chain length of 50 million, saving 10,000 posterior trees. We used Tracer 1.6.0 to verify convergence (ESS scores > 200) and generated a maximum clade credibility tree with treeannotator 1.8.4 after discarding the first 2,500 trees as burn-in.
To verify the ND2 tree topology across phylogenetic methods, we also inferred a maximum likelihood tree in raxml 8.0.2 (Stamatakis, 2014) using the GTRGAMMA model with 500 rapid bootstrap replicates and an SVDquartets tree (Chifman & Kubatko, 2014) in PAUP* 4.0a165 (Swofford, 2002), sampling 100,000 quartets with a shared tree model and 100 bootstrap replicates. We constructed a median-joining ND2 haplotype network (Bandelt, Forster, & Röhl, 1999) in popart 1.7 (Leigh & Bryant, 2015) and calculated mean pairwise distance between haplotype groups in r (R Core Team, 2018) using ape 3.5 (Paradis, Claude, & Strimmer, 2004). We omitted one sample (gws4003_Pierce_Co) from the haplotype network and pairwise divergence calculations because only a partial ND2 sequence was obtained.
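The mean pairwise distance between haplotype groups can be illustrated with a minimal sketch. This is not the ape routine used in the study. The function names, the toy 10-bp sequences, and the choice to skip sites with gaps or ambiguity codes are all assumptions made for illustration.

```python
from itertools import product


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites between two aligned sequences.

    Assumption: sites where either sequence has a gap or an ambiguity
    code are skipped (pairwise deletion).
    """
    valid = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_between_group_distance(group_1, group_2):
    """Average p-distance over every pair with one sequence from each group."""
    distances = [p_distance(a, b) for a, b in product(group_1, group_2)]
    return sum(distances) / len(distances)


# Hypothetical toy haplotypes (10 bp each), not real ND2 data.
group_nw = ["ACGTACGTAC", "ACGTACGTAT"]
group_am = ["ACGTTCGTAC", "ACGTTCGAAC"]
mean_distance = mean_between_group_distance(group_nw, group_am)
print(f"mean between-group p-distance: {mean_distance:.3f}")
```

Multiplying the returned proportion by 100 gives a percentage comparable in form to the uncorrected distances reported for the ND2 haplogroups.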
2.4 Nuclear DNA (nuDNA) SNP library preparation
To assess whether Northwestern and American crows show distinct evolutionary histories in the nuclear genome and to determine the extent to which these crows hybridize, we generated reduced representation double digest restriction-associated DNA (ddRAD) SNP libraries. We generated these genomic SNP libraries from a subset of 62 American/Northwestern crow individuals for which we had also sequenced mtDNA ND2, and two Carrion Crows as an outgroup (Table S1). To maximize our power to detect potentially distinct, sympatric lineages, we used mtDNA results as a guide for selecting individuals for nuDNA analysis. At localities containing both of the major mtDNA ND2 haplogroups in our broad population genetic survey, we selected individuals for ddRAD sampling to reflect the approximate overall ratio of mtDNA haplogroups at each locality. We followed the ddRAD sequencing protocol of Peterson, Weber, Kay, Fisher, and Hoekstra (2012) after verifying high molecular weight DNA on a gel. We digested 350–500 ng of DNA with SbfI-HF and MspI restriction endonucleases (New England BioLabs), pooled sets of eight samples, and size-selected 415–515 bp DNA fragments on a BluePippin machine (Sage Science). We multiplexed 96 avian ddRAD libraries from this study and another study, obtaining two runs of single-end 50 bp reads from Illumina HiSeq 2500 at the Computational Genomics Research Laboratory at the University of California.
2.5 nuDNA sequence assembly
We used process_radtags in stacks 1.42 (Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013) with default settings to demultiplex reads, discard low-quality reads, discard reads with an uncalled base, rescue barcodes, and rescue SbfI RAD tags. To align reads to an American Crow reference genome (Zhang et al., 2014), we built a genome database using default settings for gmap_build in GMAP (Wu & Watanabe, 2005). We aligned reads to the reference genome with gsnap version 2016-09-23 (Wu & Nacu, 2010), specifying ≥ 90% coverage, ≤3 mismatches, and default settings for other parameters. We converted alignments to BAM format with SAMtools (Li et al., 2009) and created loci and called SNPs from the aligned reads with ref_map.pl in stacks 1.42 (Catchen et al., 2013). We constructed a 64-crow alignment including the two Carrion Crow outgroup samples and a 62-crow alignment containing only American and Northwestern crows. For each data set, we retained stacks with a depth of ≥ 3 identical reads, loci with a coverage of ≥ 4 samples, and biallelic SNPs with a minor allele count ≥ 2 and heterozygosity ≤ 0.5. We output Structure files for downstream analysis. We used a custom r script to generate 64-crow and 62-crow alignments of unlinked SNPs retaining one random SNP per locus, a 62-crow alignment without missing data, and 48-crow alignments of Pacific coastal birds with and without missing data.
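The step of retaining one random SNP per locus can be sketched as follows. The study used a custom r script that is not reproduced here, so this Python version is only an illustration. The function name, the (locus, position) input format, and the fixed random seed are assumptions.

```python
import random
from collections import defaultdict


def pick_unlinked_snps(snps, seed=0):
    """Keep one randomly chosen SNP per locus.

    snps: iterable of (locus_id, position) tuples (hypothetical format).
    Returns a dict mapping each locus_id to the position that was kept.
    """
    by_locus = defaultdict(list)
    for locus_id, position in snps:
        by_locus[locus_id].append(position)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return {
        locus_id: rng.choice(sorted(positions))
        for locus_id, positions in sorted(by_locus.items())
    }


# Hypothetical example: three loci carrying two, one, and three SNPs.
example_snps = [
    ("locus1", 5), ("locus1", 17),
    ("locus2", 9),
    ("locus3", 2), ("locus3", 30), ("locus3", 41),
]
unlinked = pick_unlinked_snps(example_snps, seed=1)
print(unlinked)
```

Because exactly one SNP is kept per locus, the number of unlinked SNPs equals the number of loci, which matches the 7,292 loci and 7,292 unlinked SNPs reported for the 62-sample alignment.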
2.6 Mapping nuDNA loci to chromosomes
To infer chromosomal positions for our reference-aligned ddRAD loci from STACKS, we aligned the American Crow reference genome with the chromosome-annotated, repeat-masked genome of the Zebra Finch (Taeniopygia guttata; http://hgdownload.cse.ucsc.edu/goldenPath/taeGut2/bigZips/taeGut2.fa.masked.gz). This approach takes advantage of the high degree of synteny in birds at the chromosomal level (Ellegren, 2010). We aligned the American Crow and Zebra Finch genomes with nucmer in MUMmer 3.23 (Kurtz et al., 2004), customizing a shell script by C. J. Battey (https://github.com/cjbattey/truseq_assembly). After excluding aligned regions < 1,000 bp in length, we assigned an American Crow scaffold to a chromosome when > 75% of all > 1,000 bp alignment regions in that scaffold mapped to the same Zebra Finch chromosome.
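The scaffold-assignment rule can be expressed as a short sketch. It assumes that the 75% threshold applies to the count of retained alignment regions rather than to their summed length, which the text does not specify. The function name and input format are hypothetical.

```python
from collections import Counter


def assign_scaffold(alignments, min_length=1000, min_fraction=0.75):
    """Assign one crow scaffold to a Zebra Finch chromosome, or return None.

    alignments: list of (chromosome, aligned_length_bp) tuples for a single
    scaffold (hypothetical format). Regions shorter than min_length are
    dropped. The scaffold is assigned only if more than min_fraction of the
    remaining regions (by count, an assumption) map to one chromosome.
    """
    kept = [chrom for chrom, length in alignments if length >= min_length]
    if not kept:
        return None
    chrom, count = Counter(kept).most_common(1)[0]
    if count / len(kept) > min_fraction:
        return chrom
    return None


# Hypothetical scaffold: four regions on chr1, one on chr2, one short region.
regions = [("chr1", 1500)] * 4 + [("chr2", 2000), ("chr3", 400)]
print(assign_scaffold(regions))
```

In this toy case the 400-bp region is ignored, 4 of the 5 retained regions map to chr1, and the scaffold is assigned to chr1. A scaffold with exactly 75% of its regions on one chromosome would remain unassigned under the strict inequality.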
2.7 nuDNA cluster analyses
We used two independent methods to determine K, the number of nuDNA ancestry clusters. First, we used the K-means clustering algorithm implemented in the find.clusters function in adegenet 2.0.1 (Jombart, Devillard, & Balloux, 2010) to calculate the Bayesian information criterion (BIC) for K = 1 through K = 10. We used the 62-sample data set with no missing data after transforming by principal components analysis (PCA), retaining all principal components.
Second, we used the Bayesian admixture model with correlated allele frequencies implemented in structure 2.3.4 (Pritchard, Stephens, & Donnelly, 2000) to conduct five replicate runs each for K = 1 to K = 4 with 200,000 generations and a burn-in of 20,000. We estimated the optimal number of clusters in the unlinked 62-sample data set with missing data by analyzing the rate of change in the likelihood distribution between successive K values in structure harvester 0.6.94 (Earl & vonHoldt, 2012; Evanno, Regnaut, & Goudet, 2005; Leaché, Grummer, Harris, & Breckheimer, 2017). We combined results from replicate runs while accounting for permutations and label switching using clumpp 1.1.2 (Jakobsson & Rosenberg, 2007).
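The rate-of-change criterion of Evanno et al. (2005) can be sketched as below. This is a simplified illustration, not the structure harvester code. It assumes ΔK is the absolute second-order difference of the mean log likelihood across replicate runs divided by the sample standard deviation at K. The log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute Delta K for every K that has both K - 1 and K + 1 available.

    lnp_by_k: dict mapping K to a list of ln P(D) values from replicate runs.
    Delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    Raises ZeroDivisionError if replicates at some K are identical.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta_k[k] = second_diff / stdev(lnp_by_k[k])
    return delta_k


# Hypothetical ln P(D) values for five replicate runs at K = 1..4.
toy_lnp = {
    1: [-1000.0, -1002.0, -998.0, -1001.0, -999.0],
    2: [-900.0, -901.0, -899.0, -900.5, -899.5],
    3: [-895.0, -890.0, -900.0, -893.0, -897.0],
    4: [-892.0, -880.0, -905.0, -890.0, -898.0],
}
delta = evanno_delta_k(toy_lnp)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```

With runs for K = 1 to K = 4, ΔK is defined only for K = 2 and K = 3, because the statistic needs the likelihoods on both sides of each K.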
To test for a potential effect of Z chromosome copy number on nuDNA ancestry inference, we conducted a Welch's two-sample t test comparing K = 2 Structure ancestry proportions for males and females. To test for potential effects of different numbers of loci and different amounts of missing data in the alignments used for adegenet PCA and Structure, we plotted K = 2 ancestry proportions from Structure/CLUMPP against transformed genomic PC1 values from adegenet for the 48 Pacific coastal samples, and conducted a simple linear regression. We applied a custom linear transformation to genomic PC1 values for this analysis such that our southernmost and northernmost localities had population means of 0 and 1, respectively.
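The linear transformation of genomic PC1 can be illustrated with a minimal sketch. The exact custom transformation used in the study is not given, so the form below (subtract the southernmost population mean, then divide by the difference between the northernmost and southernmost means) is an assumption, as are the function name and the toy values.

```python
def rescale_pc1(pc1_values, south_mean, north_mean):
    """Linearly rescale PC1 so south_mean maps to 0 and north_mean maps to 1.

    south_mean and north_mean are the mean PC1 scores of the southernmost
    and northernmost localities. The sign of PC1 is arbitrary, so the
    function also works when south_mean is larger than north_mean.
    """
    span = north_mean - south_mean
    if span == 0:
        raise ValueError("terminal population means must differ")
    return [(value - south_mean) / span for value in pc1_values]


# Hypothetical PC1 scores: south mean -2.0, north mean 3.0, one intermediate bird.
rescaled = rescale_pc1([-2.0, 0.5, 3.0], south_mean=-2.0, north_mean=3.0)
print(rescaled)
```

After rescaling, PC1 is on the same 0 to 1 scale as a K = 2 ancestry proportion, so the two quantities can be compared directly in a regression.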
2.8 Isolation by distance
We tested for isolation by distance in the 62-sample continent-wide data set and in the 48-sample Pacific coastal data set (both without missing data) by conducting Mantel tests (Mantel, 1967) in ade4 (Dray & Dufour, 2007) with 1,000 Monte-Carlo permutations. We used Euclidean geographic distance and calculated Edwards' genetic distance (Edwards, 1971) in adegenet (Jombart, 2008).
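A permutation-based Mantel test can be sketched in plain Python as follows. This is an illustration of the general technique rather than the ade4 implementation. It assumes a one-sided test of positive correlation and a p-value of (number of permuted statistics at least as large as the observed one, plus one) divided by (permutations plus one). The coordinates and genetic distances are hypothetical.

```python
import random
from math import dist, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(matrix_a, matrix_b, permutations=999, seed=0):
    """Correlate two square distance matrices and test by permutation.

    Rows and columns of matrix_b are permuted together; only the upper
    triangle of each matrix is used.
    """
    n = len(matrix_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    a_values = [matrix_a[i][j] for i, j in pairs]
    observed = pearson(a_values, [matrix_b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [matrix_b[order[i]][order[j]] for i, j in pairs]
        if pearson(a_values, permuted) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)


# Hypothetical localities on a line, with genetic distance proportional
# to Euclidean geographic distance (perfect isolation by distance).
coords = [(0, 0), (1, 0), (3, 0), (6, 0), (10, 0), (15, 0)]
geo = [[dist(p, q) for q in coords] for p in coords]
gen = [[0.01 * d for d in row] for row in geo]
r_obs, p_value = mantel_test(geo, gen)
print(r_obs, p_value)
```

Permuting whole rows and columns together, rather than shuffling individual distances, preserves the dependence among distances that share a sample, which is the reason a Mantel test is used instead of an ordinary correlation test.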
2.9 Recent-generation versus late-generation hybrids
We compared genomic hybrid indices to intertaxon heterozygosities to determine whether individual crows were recent-generation hybrids or descendants of long-admixed populations (Bouchemousse et al., 2016; Milne & Abbott, 2008). We designated crows as parental American or parental Northwestern if these respective ancestry proportions exceeded 0.98 in our combined K = 2 Bayesian clustering runs (Scordato et al., 2017). We used a custom R script to subset the 62-sample SNP alignment containing missing data to include only parental individuals and only variable, unlinked SNPs present for ≥ 75% of individuals. With this alignment we calculated genome-wide and SNP-specific FST between parental populations with r package Hierfstat (Goudet, 2005). We considered SNPs with FST > 0.6 to be ancestry-informative (Scordato et al., 2017) and further limited our alignment to these SNPs using a custom r script. For each individual crow, we calculated a maximum likelihood estimate of the genomic hybrid index and the average intertaxon heterozygosity across ancestry-informative loci using r package Introgress (Gompert & Buerkle, 2010). F1 hybrids have an expected hybrid index of 0.5 and expected heterozygosity of 1.0 for loci fixed in parental individuals. Heterozygosity is reduced in later-generation hybrids and backcrosses. We considered crows with hybrid index > 0.25 and < 0.75 and heterozygosity > 0.5 to be recent-generation hybrids, individuals with hybrid index > 0.25 and < 0.75 but with heterozygosity < 0.5 to be later-generation hybrids, and birds with hybrid index < 0.25 or > 0.75 to be backcrosses (Larson, White, Ross, & Harrison, 2014; Milne & Abbott, 2008; Scordato et al., 2017; Toews, Lovette, Irwin, & Brelsford, 2018).
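The classification thresholds above can be written as a small decision function. This sketch covers only the final categorization step, not the maximum likelihood estimation performed in Introgress. The function name is hypothetical, and the handling of values that fall exactly on a threshold (0.25, 0.75, or 0.5) is an assumption because the text leaves those cases unspecified.

```python
def classify_admixed_crow(hybrid_index, heterozygosity):
    """Categorize an admixed individual from its hybrid index and
    intertaxon heterozygosity at ancestry-informative loci.

    Assumption: a hybrid index of exactly 0.25 or 0.75 is treated as
    intermediate, and a heterozygosity of exactly 0.5 as later-generation.
    """
    if not (0.0 <= hybrid_index <= 1.0 and 0.0 <= heterozygosity <= 1.0):
        raise ValueError("both values must lie between 0 and 1")
    if hybrid_index < 0.25 or hybrid_index > 0.75:
        return "backcross"
    if heterozygosity > 0.5:
        return "recent-generation hybrid"
    return "later-generation hybrid"


# An F1 is expected at hybrid index 0.5 with heterozygosity 1.0.
print(classify_admixed_crow(0.5, 1.0))
print(classify_admixed_crow(0.5, 0.2))
print(classify_admixed_crow(0.1, 0.2))
```

Parental individuals are designated separately from the Structure ancestry proportions (> 0.98), so this function is meant to be applied only to the remaining admixed birds.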
2.10 Fitting and comparing clines for mtDNA, overall nuDNA, and individual SNPs
We fit cline models for mtDNA and overall nuDNA along the Pacific Coast using r package hzar (Derryberry, Derryberry, Maley, & Brumfield, 2014). We assigned transect distances to each Pacific coastal population using a smoothed curve drawn parallel to the Pacific coastline in qgis 2.8.3 (QGIS Development Team, 2015). For mtDNA, we fit a cline model for population frequencies of American mtDNA (range 0–1) using hzar.doMolecularData1DPops and hzar.makeCline1DFreq. For overall nuDNA, we fit a cline model to means and variances of population ancestry proportions (q values) from combined K = 2 Bayesian clustering runs using hzar.doNormalData1DPops and hzar.makeCline1DNormal (Leaché et al., 2017; Linck et al., 2019; Scordato et al., 2017). For both mtDNA and overall nuDNA, we fit models without exponential tails, restricted parameter search space to a liberal yet reasonable range of values (cline centre > 500 km and < 3,500 km, cline width < 3,500 km; Derryberry et al., 2014) and fixed means, variances, and allele frequencies at cline ends to the observed values of the terminal populations (Linck et al., 2019; Lipshutz et al., 2019). We ran the Markov chain Monte Carlo (MCMC) optimizer for three iterative cycles using hzar.chain.doSeq, retaining the third run for subsequent analysis (Derryberry et al., 2014). We used hzar.get.ML.cline and hzar.getLLCutParam to obtain maximum likelihood estimates for widths, centres, and ± 2 log likelihood (LL) intervals. We used these intervals to test for coincidence of centres and concordance of widths (Derryberry et al., 2014; Lipshutz et al., 2019).
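The shape of a cline without exponential tails can be illustrated with the sigmoid form commonly used in hybrid zone studies, in which width is the inverse of the maximum slope. That hzar uses exactly this parameterization internally is an assumption here, and the centre and width in the example are hypothetical rather than fitted values.

```python
from math import exp


def cline_value(distance_km, centre_km, width_km, p_min=0.0, p_max=1.0):
    """Expected frequency (or mean ancestry) at a transect distance.

    Sigmoid cline: p_min + (p_max - p_min) / (1 + exp(-4 (x - c) / w)).
    For p_min = 0 and p_max = 1, the slope at the centre equals 1 / w.
    """
    scaled = -4.0 * (distance_km - centre_km) / width_km
    return p_min + (p_max - p_min) / (1.0 + exp(scaled))


# Hypothetical cline: centre at 2,000 km along the transect, width 800 km.
for x in (1000, 1600, 2000, 2400, 3000):
    print(x, round(cline_value(x, centre_km=2000, width_km=800), 3))
```

Fixing the end values, as described above, corresponds to holding p_min and p_max at the observed values of the terminal populations so that only the centre and width are estimated.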
We also used hzar to fit frequency-based clines to 905 individual genomic SNPs with no missing data across the 48 Pacific coastal samples. We fit clines with free endpoint scaling and no exponential tails while reasonably restricting parameter search space (cline centre > 0 km and < 4,000 km; cline width < 4,000 km; Derryberry et al., 2014). We conservatively identified the SNPs with the best-supported clines (ΔAICc ≥ 6 between the cline model and the null model) and used a chi-squared test to assess whether the chromosomal distribution of these SNPs differed from the chromosomal distribution of all 905 SNPs examined.
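The ΔAICc criterion for identifying well-supported clines can be sketched with the standard small-sample AIC correction. The log likelihoods, parameter counts, and number of observations below are hypothetical. How hzar counts parameters and observations for its cline and null models is not stated in the text and is not assumed here.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Small-sample corrected Akaike information criterion."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("n_obs must exceed n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    correction = (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return aic + correction


def has_supported_cline(ll_cline, k_cline, ll_null, k_null, n_obs, threshold=6.0):
    """True if the cline model beats the null model by at least `threshold`
    AICc units (lower AICc indicates the better-supported model)."""
    delta = aicc(ll_null, k_null, n_obs) - aicc(ll_cline, k_cline, n_obs)
    return delta >= threshold


# Hypothetical SNP: cline model with 4 free parameters, null with 1,
# evaluated over 12 observations.
print(has_supported_cline(ll_cline=-20.0, k_cline=4, ll_null=-30.0, k_null=1, n_obs=12))
print(has_supported_cline(ll_cline=-20.0, k_cline=4, ll_null=-27.0, k_null=1, n_obs=12))
```

In the first toy case ΔAICc is about 8.7 and the cline counts as supported. In the second it is about 2.7 and the SNP would be excluded, even though the cline model still has the higher likelihood, because the extra parameters are penalized.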
2.11 Assessing heterogeneity of admixture near potential barriers to and corridors for gene flow
To provide an ad hoc assessment of potential differences in gene flow and admixture between Vancouver Island crows and those across the water barrier on the adjacent mainland, we compared nuDNA ancestry proportions of individuals from Vancouver Island (vic + nvi localities, n = 8) to those of nearby mainland populations (yvr + cbc localities, n = 8). We compared Northwestern nuDNA ancestry from the combined K = 2 Structure runs using a Student's t test with a two-tailed hypothesis and pooled variances. We also compared ND2 haplogroup proportions of crows between Vancouver Island and the adjacent mainland using a Pearson's chi-squared test with 2,000 Monte Carlo replicates (n = 35 for nvi + vic; n = 34 for cbc + yvr).
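The pooled-variance Student's t statistic used for the island versus mainland comparison can be sketched as follows. The ancestry proportions in the example are hypothetical, not the study's data, and the sketch stops at the test statistic and degrees of freedom because the Python standard library has no t distribution from which to derive a p-value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_1, sample_2):
    """Two-sample Student's t with pooled variance.

    Returns (t, degrees_of_freedom), where df = n1 + n2 - 2.
    """
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / df
    t = (mean(sample_1) - mean(sample_2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical Northwestern ancestry proportions for eight island
# and eight mainland crows.
island = [0.70, 0.80, 0.75, 0.72, 0.78, 0.74, 0.76, 0.67]
mainland = [0.55, 0.48, 0.60, 0.45, 0.52, 0.50, 0.47, 0.51]
t_stat, dof = pooled_t_statistic(island, mainland)
print(round(t_stat, 2), dof)
```

With eight individuals per group the test has 14 degrees of freedom.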
To assess the Skeena River valley of British Columbia as a potential corridor for gene flow across the Coast Mountains, we conducted an ad hoc comparison of the within-population variance in nuDNA K = 2 ancestry proportions between two localities nearest to the Skeena River and all other localities. We fitted two nested mixed effects models in nlme (Pinheiro, Bates, DebRoy, Sarkar, & R Core Team, 2017) and compared them with a likelihood ratio test. Both models included the Northwestern nuDNA ancestry proportion as the response variable, membership in nbc or cbc (the localities nearest to the Skeena River) as a fixed effect, and locality as a random effect. The more complex model incorporated a variance function parameter allowing the Skeena River localities to take on a common within-population variance estimate that differed from a within-population variance estimate common to all other populations.
3 RESULTS
3.1 Sequence alignments
3.1.1 nuDNA
We obtained 181,580,191 raw sequencing reads across 64 samples in eight pools, retaining 150,638,152 reads (83.0%) after discarding reads with ambiguous barcodes, ambiguous RAD tags, or low quality scores. The Bayesian admixture model readily distinguished the two Carrion Crow samples from the 62 American and Northwestern crows, and we excluded these Carrion Crows from subsequent analyses. The 62-sample alignment contained 7,292 loci with 9,563 SNPs before random subsampling to 7,292 unlinked SNPs. The 62-sample alignment with no missing data contained 738 unlinked SNPs, and the 48-sample alignment of Pacific coastal samples with no missing data contained 905 unlinked SNPs.
We conservatively mapped 89% (6,494/7,292) of ddRAD loci in the 62-sample crow alignment to Zebra Finch chromosomes, revealing that our SNPs were widely dispersed across the genome and relatively concentrated on small, GC-rich chromosomes (Figure S2, Figure S3). Of the 6,494 mapped loci, 206 (3.2%) mapped to the AT-rich Z chromosome, which comprises 5.9% of the Zebra Finch genome (https://www.ncbi.nlm.nih.gov/genome/367).
3.1.2 mtDNA
We obtained 259 mtDNA ND2 sequences, including 258 full-length ND2 sequences (1,041 bp) and one partial sequence identifiable to haplogroup.
3.2 American and Northwestern crows have distinct evolutionary histories
Genomic SNPs and mtDNA sequences both revealed distinct evolutionary histories for Northwestern and American crows. Two different clustering methods both supported two nuDNA ancestry clusters corresponding to Northwestern and American crows (Figure 1). First, the K-means algorithm in adegenet minimized the BIC at K = 2 (256.85 for K = 1, 255.12 for K = 2, and ≥ 255.81 for 3 ≤ K ≤ 10; Figure S4). Second, the Evanno method also selected K = 2 as the best-fit model for the number of nuDNA ancestry clusters (Table S2). Combined results from replicate K = 2 Bayesian clustering runs in Structure are presented in Figure 1. There was no effect of sex on K = 2 Structure ancestry proportions (t = 0.46, p = .64). Structure analysis of the 7,292-SNP alignment with missing data and PCA of the 905-SNP alignment without missing data yielded the same genomic signal despite different numbers of loci, different amounts of missing data, and different underlying analyses (R² = .95 and p < 10⁻¹⁵ for regression of Structure K = 2 ancestry vs. genomic PC1, Figure S5).
We recovered two major lineages of mtDNA ND2 with an estimated divergence time of ~ 443,000 years ago (95% HPD interval 268–649 kya; Figure S6-S7). We hereafter refer to these ND2 lineages as Northwestern (n = 95) and American (n = 164) haplogroups. The Northwestern and American haplogroups were separated by three fixed differences and a mean uncorrected pairwise distance of 1.10% (range: 0.58%–1.63%). Bayesian posterior probability, maximum likelihood bootstrap, and quartets bootstrap values were 100%, 89%, and 74% for the American haplogroup clade and 77%, 66%, and 82% for the Northwestern haplogroup clade (Figure S8-S10).
3.3 American and Northwestern crow mtDNA haplogroups overlap geographically
Overall, population frequencies of the Northwestern mtDNA haplogroup increased with latitude west of the Cascades and Coast Ranges. All Alaskan crows had the Northwestern mtDNA haplogroup, and all crows from California, east of the Cascades of Oregon and Washington, and east of the Coast Mountains of British Columbia had the American mtDNA haplogroup (Figure 1, Figure S1). Individual crows with American and Northwestern mtDNA haplogroups co-occurred within a > 900 km overlap zone on the Pacific slope of Washington and British Columbia. We found no additional geographic-genetic structuring within the Northwestern or American mtDNA haplogroups.
3.4 American and Northwestern crows hybridize extensively
Of the crows for which we sampled nuDNA, all individuals from Alaska had pure ( > 98%) Northwestern ancestry, and all from California and east of the Cascades/Coast Mountains had pure American ancestry (Figure 1). Hybrid crows with nuDNA ancestry > 2% and < 98% occupied a Pacific coastline distance of > 900 km and included all of the crows that we sampled from coastal Washington to coastal British Columbia at the approximate latitude of northern Haida Gwaii (locality nbc). Among four crows at the northern limit of the hybrid zone, two had pure Northwestern ancestry, one hybrid had 82% Northwestern ancestry, and another hybrid had 81% American ancestry.
The first two genomic PCA axes explained 6.8% and 3.3% of nuDNA variation, respectively (Figure S11). The first axis closely approximated the Pacific coastal hybrid cline, separating crows from California and east of the Cascades from crows northwest of the Cascades in Washington, British Columbia, and Alaska. The second axis separated American Crows of eastern and western North America and also provided additional resolution of crows along the Pacific coastal hybrid cline. Isolation by distance was evident for continent-wide sampling (Figure S12; n = 62, p < .001) and within the Pacific coastal samples alone (Figure S13; n = 48, p < .0005).
3.5 The hybrid zone consists of late-generation hybrids and backcrosses
Among 62 Northwestern and American crows with nuDNA SNP data, 10 were parental Northwestern, 18 were parental American, and 34 were hybrids. Our alignment of the 28 parental crows contained 3,582 unlinked SNPs present in ≥ 75% of individuals, with a genome-wide FST of 0.13. A total of 35 SNPs were ancestry-informative (FST > 0.6) between parental American and parental Northwestern crows, including 2 SNPs with fixed differences (FST = 1). Two of these 35 ancestry-informative SNPs (5.7%) mapped to the Z chromosome, which did not significantly differ from the Z-chromosome proportion of all mapped loci (3.2%, χ² = 0.67, p = .63).
All 34 hybrids we sampled had low intertaxon heterozygosities at ancestry-informative SNPs given their respective hybrid indices, indicating that they were late-generation hybrids and backcrosses and not F1s or early-generation hybrids (Figure 2).

3.6 Nature and extent of mitonuclear concordance and discordance
Nuclear DNA ancestry was generally concordant with mtDNA haplogroup. Crows with pure (> 98%) nuDNA ancestry never had a “mismatched” mtDNA haplogroup, but some mitonuclear discordance was evident (Figure 1). Individuals with the American mtDNA haplogroup (n = 40) had up to 62% Northwestern nuDNA ancestry (K = 2), and crows with the Northwestern mtDNA haplogroup (n = 22) had up to 60% American nuDNA ancestry. At six localities, we sampled nuDNA from crows with both American and Northwestern mtDNA haplogroups. Five of these six localities contained only hybrids, and the sixth locality contained two hybrids and two pure Northwesterns.
3.7 Cline characteristics for mtDNA, overall nuDNA, and individual nuDNA SNPs
We generated Pacific coastal cline models using 218 mtDNA samples from 13 populations (range = 9–31 per population) and nuDNA ancestry from 48 individuals in 12 populations (four per population). The mtDNA and overall nuDNA clines were both centred in southwestern British Columbia and had overlapping ± 2 LL intervals for cline width (Figure 3). The mtDNA cline was centred near the latitude of central Vancouver Island (2,445 km; ± 2 LL range 2,340–2,530 km) with a width of 825 km (± 2 LL range 609–1,158 km). The nuDNA cline was centred near the latitude of Vancouver (2,642 km; ± 2 LL range 2,585–2,700 km) with a width of 918 km (± 2 LL range 560–1,266 km).
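The interval comparison described in the Methods can be made explicit with a minimal sketch that checks whether two ± 2 LL intervals overlap. The interval values are those reported above. The function name is hypothetical, and treating simple overlap as the criterion for coincidence or concordance follows the Methods description rather than any particular hzar function.

```python
def intervals_overlap(interval_a, interval_b):
    """True if two closed intervals (low, high) share at least one point."""
    return interval_a[0] <= interval_b[1] and interval_b[0] <= interval_a[1]


# +/- 2 LL intervals reported above, in km along the Pacific coastal transect.
mtdna = {"centre": (2340, 2530), "width": (609, 1158)}
nudna = {"centre": (2585, 2700), "width": (560, 1266)}

widths_overlap = intervals_overlap(mtdna["width"], nudna["width"])
centres_overlap = intervals_overlap(mtdna["centre"], nudna["centre"])
print("widths overlap:", widths_overlap)
print("centres overlap:", centres_overlap)
```

For the reported values, the width intervals overlap, whereas the centre intervals as reported are separated by 55 km (2,530 km vs. 2,585 km), although both centres fall in southwestern British Columbia.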

Of the 905 individual SNPs for which we calculated Pacific coastal clines, 94 SNPs exhibited a "strong" clinal pattern (ΔAICc ≥ 6 between the cline model and the null model; Figure S14). Clines for these 94 individual SNPs had a median centre of 2,401 km (median ± 2 LL range 1,913–2,853 km) and a median width of 379 km (median ± 2 LL range 0.4 km to 2,905 km). The 94 SNPs with "strong" clinal patterns did not differ in chromosomal distribution from the full set of 905 SNPs (Figure S15; χ² = 22.0, p = .88).
3.8 Heterogeneous admixture near potential barriers to and corridors for gene flow
Overall, both mtDNA and nuDNA results indicated a trend of increasing Northwestern ancestry with latitude west of the Cascades and Coast Mountains (Figure 1). However, Vancouver Island crows had a higher frequency of the Northwestern ND2 haplogroup (74%) than crows on the nearby mainland (26%; χ² = 15.8, p < .0005). Crows on Vancouver Island also averaged 44% more Northwestern nuDNA ancestry than adjacent mainland crows (74% vs. 51%, t = 3.13, df = 14, p < .01).
Crows within most localities were fairly homogeneous in their ancestry proportions, but localities closest to the Skeena River valley were a notable exception (Figure 1): within-population ancestry variation was about nine times higher at these localities than elsewhere (variance parameter = 9.02, likelihood ratio = 85.02, df = 1, p < 10⁻¹⁹).
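As a quick check on the magnitude of the test statistics in this section, the sketch below converts a chi-square statistic with one degree of freedom into an upper-tail p-value using only the Python standard library. The likelihood ratio test is reported with df = 1; treating the Vancouver Island haplogroup comparison as a 2 × 2 table with df = 1 is our assumption, as are the function and variable names.

```python
import math


def chi2_sf_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For df = 1, P(X > s) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistics reported in the text.
p_island = chi2_sf_df1(15.8)    # haplogroup frequency, island vs. mainland
p_skeena = chi2_sf_df1(85.02)   # likelihood ratio test for unequal variance

print(f"island vs. mainland: p = {p_island:.2e}")
print(f"Skeena variance test: p = {p_skeena:.2e}")
```

Under these assumptions the two p-values fall below .0005 and below 10⁻¹⁹, respectively.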
4 DISCUSSION
To better understand a potential case of cryptic speciation and hybridization, we conducted a population genomic study of American and Northwestern crows. We found both mtDNA and nuDNA evidence that these crows represent two historically divergent evolutionary lineages, but we also identified a > 900 km wide cryptic hybrid zone consisting of a late-generation hybrid swarm. Overall, our results suggest that this system represents a compelling example of reticulate evolution (Huson & Bryant, 2006; Sessa, Zimmer, & Givnish, 2012).
4.1 Pleistocene divergence, historical biogeography, and gene flow
Climatic oscillations can have profound effects on the speciation process (Hewitt, 2004). During the late Pleistocene, when American and Northwestern mtDNA are estimated to have diverged ~ 443,000 years ago, North America was undergoing extensive glacial advances and retreats at regular ~ 100,000-year Croll-Milankovich intervals (Clark et al., 2009; Muller & MacDonald, 1997). Much of the Pacific Northwest was covered in ice sheets during the glacial periods, isolating terrestrial organisms south of the ice sheets or in ice-free northern refugia such as Beringia, Haida Gwaii, or the Alexander Archipelago (Anderson, Hu, Nelson, Petit, & Paige, 2006; Burg, Gaston, Winker, & Friesen, 2006; Galbreath & Cook, 2004; Geraldes et al., 2019; Godbout, Fazekas, Newton, Yeh, & Bousquet, 2008; Shafer, Cullingham, Côté, & Coltman, 2010). During the interglacial periods, terrestrial organisms expanded from refugial populations into newly ice-free habitats, leading to secondary contact and potentially renewed gene flow between closely related, previously allopatric forms (Shafer et al., 2010). These repeated cycles of isolation and secondary contact created complex and/or reticulate population genetic histories in many of the region's terrestrial organisms (Hewitt, 2004; Kearns et al., 2018; Latch, Heffelfinger, Fike, & Rhodes, 2009; Omland, Tarr, Boarman, Marzluff, & Fleischer, 2000).
Northwestern and American crows may have diverged following a similar pattern, with Northwestern Crow populations evolving in isolation in one or more of the ice-free northern refugia while American Crow populations remained south of the ice sheets. Today, Northwestern and American crow mtDNA haplogroups overlap across most of coastal Washington and all of coastal British Columbia, consistent with post-glacial expansion of previously isolated populations into newly available habitat during one or more interglacial periods. Further sampling of crows on Haida Gwaii, the Alexander Archipelago, and other putative northern refugia might uncover additional genetic diversity within the Northwestern mtDNA haplogroup, which would corroborate the Pleistocene refugia hypothesis (e.g., Geraldes et al., 2019; Krosby & Rohwer, 2009).
Mountain ranges can also represent significant barriers to gene flow. Our results show that most gene flow between American and Northwestern crows has occurred on a north-south axis to the west of the Coast Mountains and Cascades. These 2–5 My-old ranges would have restricted east-west gene flow in crows during Pleistocene interglacial periods as they do today (Shafer et al., 2010). Even though geography and genetic data both suggest predominantly north/south gene flow on the Pacific slope of the Coast Mountains and Cascades, we note the potential for limited east-west gene flow across these ranges, especially at major drainages with low passes such as the Fraser and Skeena River valleys. Indeed, the relatively high variation we found in nuDNA ancestry proportions near the Skeena River is consistent with more recent gene flow and backcrossing there compared to other localities. However, despite the higher variance in nuDNA ancestry proportions near the Skeena, the hybrid crows we sampled there all appeared to be late-generation hybrids and backcrosses, as found at all other localities sampled. The elevated Northwestern Crow ancestry in Vancouver Island hybrids compared to mainland birds also suggests that gene flow along the Pacific coastline may vary based on the geography of islands and water barriers.
4.2 Selection versus neutral processes across the hybrid zone
The width of a hybrid zone depends in part on whether the hybrid zone is primarily structured by selection or by neutral processes (Mallet et al., 1990). The > 900 km wide hybrid zone in American/Northwestern crows is > 7 times wider than typical avian hybrid zones in North America (130 ± 44 km, mean ± SD, n = 8; Hoffman, Wiens, & Scott, 1978; Rohwer & Wood, 1998; Ruegg, 2008; Irwin, Brelsford, Toews, MacDonald, & Phinney, 2009; Mettler & Spellman, 2009; Brelsford & Irwin, 2009; Toews, Brelsford, & Irwin, 2011; Seneviratne, Toews, Brelsford, & Irwin, 2012), suggesting a prominent role for neutral processes at the scale of the whole genome.
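The comparison above is simple arithmetic, sketched here for transparency. The crow width used is the nuDNA cline point estimate from section 3.7; the variable names are our own.

```python
CROW_WIDTH_KM = 918.0      # nuDNA cline width point estimate (section 3.7)
TYPICAL_MEAN_KM = 130.0    # mean width of the eight comparison hybrid zones
TYPICAL_SD_KM = 44.0       # standard deviation of those eight widths

# How many times wider the crow zone is than the typical zone.
ratio = CROW_WIDTH_KM / TYPICAL_MEAN_KM

# How many standard deviations the crow zone lies above the typical mean.
sds_above_mean = (CROW_WIDTH_KM - TYPICAL_MEAN_KM) / TYPICAL_SD_KM

print(f"{ratio:.1f} times the typical width")
print(f"{sds_above_mean:.1f} SD above the typical mean")
```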
Peaks of differentiation within the genome can also indicate a role for selection across a hybrid zone. At the chromosome level, we found no evidence that some chromosomes had more SNPs with pronounced clines than did other chromosomes, again consistent with neutral processes. However, we note that although we sampled SNPs from diverse regions of the genome, the total fraction of the genome we sampled was small. Our marker density was thus insufficient to rule out smaller "islands" of genomic divergence within chromosomes (e.g., Toews et al., 2018), and it remains possible that selection is important at isolated loci (e.g., Poelstra et al., 2014). Across a morphologically cryptic hybrid zone like American/Northwestern crows, loci under strong selection could be associated with biologically important but less visually salient traits, e.g., those related to physiology or cognition.
A very broad hybrid zone consisting of late-generation hybrids and backcrosses is consistent overall with a prolonged period of neutral expansion. However, we also cannot rule out more recent processes. These crows are human commensals that thrive in disturbed landscapes (Verbeek & Butler, 1999; Verbeek & Caffrey, 2002), and indigenous peoples have inhabited the land situated within this hybrid zone for millennia. More recently, European settlers and their descendants have heavily modified the landscape through deforestation, agriculture, and urbanization. One hypothesis posits that recent land use changes and associated increased habitat heterogeneity may have removed habitat barriers to dispersal that existed before the time of European settlement, increasing opportunities for hybridization between a more maritime Northwestern Crow and a more agrarian American Crow (Haring et al., 2012; Marzluff & Angell, 2005). Under this scenario, more than a century of crow generations may have provided enough genetic recombination to dilute highly heterozygous F1s and recent-generation hybrids out of the population.
4.3 Mitonuclear concordance and hybrid zone processes
Strong cline discordance between mtDNA and nuDNA can indicate that Haldane's rule (Carling & Brumfield, 2008; Coyne & Orr, 1989; Dasmahapatra et al., 2002; Davies, Aiello, Mallet, Pomiankowski, & Silberglied, 1997; Haldane, 1922; McCormack, Heled, Delaney, Peterson, & Knowles, 2011), sex-biased asymmetries (Toews & Brelsford, 2012), and/or a selective gradient acting on mtDNA (Cheviron & Brumfield, 2009) are important processes in a hybrid zone. However, in the Pacific coastal hybrid zone between American/Northwestern crows, mtDNA and nuDNA clines had concordant widths and were both centred in southwestern British Columbia. This overall mitonuclear concordance, a pattern observed in ~ 82% of published studies (Toews & Brelsford, 2012), suggests that the above three processes probably do not play major roles in structuring this particular hybrid zone.
4.4 Reproductive isolation and phenotypic characters
Extensive genomic admixture constitutes strong evidence that reproductive isolation is lacking. Indeed, our population genomic study clarifies that American/Northwestern crows are not reproductively isolated, a question that had remained unresolved in the ornithological literature for > 160 years. In light of our results, past claims of two distinct crow species breeding assortatively in sympatry (Brooks, 1917, 1942) appear to have been overly ambitious, seemingly arising from the misapplication of subjective identification criteria. Traditional phenotypic characters for distinguishing American and Northwestern crows have included size, ecology, and voice, but these were always controversial when subjected to scrutiny. In the hindsight of our genomic study showing extensive admixture, it is now easier to see why these characters were unreliable. Historically, Northwestern Crows were considered to be diagnostically smaller than American Crows (Baird, 1858). In actuality, however, size variation in coastal crow populations is clinal, with northern birds averaging smaller, but with great overlap in measurements among individuals, especially near the range boundary (Johnston, 1961; Rhoads, 1893). Likewise, intertidal habitat use, once thought to be a distinguishing feature of Northwestern Crow (Baird, 1858), might simply reflect adaptive responses to local food availability (Cooper, 1870). Purported vocal differences (Baird, 1858; Brooks, 1917, 1942; Hellmayr, 1934; Suckley & Cooper, 1860) do not seem to correlate with size (Rhoads, 1893) or habitat (Johnston, 1961) near the range boundary, and individual birds have been observed giving typical vocalizations of both taxa (Johnston, 1961). Moreover, crows are oscine passerines that can learn vocalizations (Beecher & Brenowitz, 2005), and individual crows can even change vocalizations when joining a new social group (Brown, 1985).
The broad genomic hybrid zone we uncovered corroborates the work of some previous researchers who documented a continuous morphological cline in American/Northwestern crows along the Pacific Northwest coast (Johnston, 1961; Rhoads, 1893). Various authorities have been inconsistent regarding the southern range limit of Northwestern Crow, placing it anywhere from California (e.g., American Ornithologists’ Union, 1895) to Oregon (e.g., American Ornithologists’ Union, 1983) to Washington State (e.g., Clements et al., 2017; Ridgway, 1904; Verbeek & Butler, 1999; Verbeek & Caffrey, 2002). These difficulties in identifying a discrete range boundary now make sense given the existence of a broad genomic cline. Notably in our study, however, both mtDNA and nuDNA analyses placed the centre of the hybrid zone in southwestern British Columbia, farther north than previous hypotheses based on traditional phenotypic characters (e.g., American Ornithologists’ Union, 1998).
The lack of geographic-genetic structuring within American Crow mtDNA (Figure S1, Figures S6-S10) was somewhat surprising given that American Crows are widespread and morphologically variable across their North American distribution (Johnston, 1961; Ridgway, 1904). However, widespread migration in American Crows (Townsend, Frett, McGarvey, & Taff, 2018; Verbeek & Caffrey, 2002), combined with occasional long-distance female dispersal (McGowan, 2001; Withey & Marzluff, 2005), provides a ready ecological mechanism for homogenizing gene flow.
4.5 Prevalence and discovery of cryptic hybrid zones
This first population genomic analysis of the contact zone between American and Northwestern crows revealed a broad, cryptic hybrid zone. This avian hybrid zone in North America involves a well-known taxonomic group in an intensively studied geographic region, so the fact that it remained enigmatic for so long suggests that the global frequency of cryptic hybrid zones is greatly underestimated. In the case of these crows, > 160 years of muddled and conflicting ornithological literature based on subjective and variable phenotypic characters hinted at the potential existence of a cryptic hybrid zone. We expect that comprehensive population genomic surveys of other morphologically conserved taxa will reveal many additional cryptic hybrid zones. Furthermore, we encourage researchers to carefully characterize these cryptic hybrid zones to further facilitate comparisons and syntheses of speciation and hybridization processes in both morphologically conserved and morphologically distinctive organisms.
ACKNOWLEDGEMENTS
Sharon Birks, Victoria Bowes, Matthew Cleland, Sergei Drovetski, Allen Furnell, Colleen Handel, John Marzluff, Lisa Pajot, and Tracy Sutherland helped salvage or collect crow samples. Robert W. Bryson, Jr., Ross Furbush, Jared Grummer, Adam Leaché, Hollie Walsh, and Robert Zink assisted with laboratory work or bioinformatics. CJ Battey, Cooper French, Laura Frost, HJ Kim, Adam Leaché, Ethan Linck, John Marzluff, Sabrina McNew, Lindsey Nietmann, Richard Olmstead, Josephine Pemberton, Yue Shi, and three anonymous reviewers provided comments or discussion that helped improve the manuscript. Sharon Birks arranged loans from the Genetic Resources Collection at the University of Washington Burke Museum, which provided most of the tissue samples for this study. Additional tissues were provided by the American Museum of Natural History, the Museum of Vertebrate Zoology at Berkeley, the University of Michigan Museum of Zoology, and the Louisiana State University Museum of Natural Science. RRH was supported by a grant from the College of Arts and Sciences at the University of Washington. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
AUTHOR CONTRIBUTIONS
D.L.S., R.R.H., S.R., and J.K. conceived the study; D.L.S. and J.K. designed the study; S.R., C.W., and C.V.H. collected field samples; D.L.S. and K.L.E. conducted the laboratory work. D.L.S. analyzed the data, interpreted the data, wrote the manuscript, and revised the manuscript with input from all coauthors.
DATA AVAILABILITY STATEMENT
Input files and scripts for running analyses and producing figures are available on GitHub at https://github.com/slager/crow_hybrid_zone. Raw reads, barcodes, aligned reads, detailed sample information, and a snapshot of the above GitHub repository are available on Dryad at https://doi.org/10.5061/dryad.rr4xgxd5f. Mitochondrial DNA ND2 sequences are available under GenBank accession numbers MN830547-MN830805. Demultiplexed and quality-filtered nuDNA reads are available at the NCBI Sequence Read Archive (SRA) under BioProject accession number PRJNA595997 and BioSample accession numbers SAMN13608537-SAMN13608600.