Differential divergence in autosomes and sex chromosomes is associated with intra-island diversification at a very small spatial scale in a songbird lineage
Abstract
Recently diverged taxa showing marked phenotypic and ecological diversity provide optimal systems to understand the genetic processes underlying speciation. We used genome-wide markers to investigate the diversification of the Reunion grey white-eye (Zosterops borbonicus) on the small volcanic island of Reunion (Mascarene archipelago), where this species complex exhibits four geographical forms that are parapatrically distributed across the island and differ strikingly in plumage colour. One form restricted to the highlands is separated by a steep ecological gradient from three distinct lowland forms which meet at narrow hybrid zones that are not associated with environmental variables. Analyses of genomic variation based on single nucleotide polymorphism data from genotyping-by-sequencing and pooled RAD-seq approaches show that signatures of selection associated with elevation can be found at multiple regions across the genome, whereas most loci associated with the lowland forms are located on the Z sex chromosome. We identified TYRP1, a Z-linked colour gene, as a likely candidate locus underlying colour variation among lowland forms. Tests of demographic models revealed that highland and lowland forms diverged in the presence of gene flow, and divergence has progressed as gene flow was restricted by selection at loci across the genome. This system holds promise for investigating how adaptation and reproductive isolation shape the genomic landscape of divergence at multiple stages of the speciation process.
1 INTRODUCTION
As populations and lineages diverge from each other, a progressive loss of shared polymorphisms and accumulation of fixed alleles is expected. This is influenced by neutral processes (e.g., genetic drift), but also by natural and sexual selection, and the interaction between these processes may vary between different parts of the genome, creating a mosaic pattern of regions displaying different rates of divergence (Nosil, Harmon, & Seehausen, 2009; Wu, 2001). For instance, genomic regions directly involved in local adaptation and reproductive isolation may experience reduced effective gene flow compared to the genomic background (Ravinet et al., 2017). In addition, the effects of selection at linked sites can also locally increase divergence and magnify the effects of nonequilibrium demography over large genomic regions (Van Belleghem et al., 2018; Burri, 2017; Burri et al., 2015; Cruickshank & Hahn, 2014). Thus, establishing how different processes such as drift, selection and gene flow shape the rates of divergence at the genomic scale is critical to understand the links between speciation processes and their genetic and genomic consequences (Gavrilets, 2014).
Identifying the main drivers of genome-wide differentiation (i.e., isolation by environment versus reproductive isolation driven by nonadaptive factors) remains a complex question (Cruickshank & Hahn, 2014; Ravinet et al., 2017; Wolf & Ellegren, 2016). Recent studies have yielded varied results, and those focusing on the early stages of speciation have often emphasized ecological divergence over sexual selection and intrinsic incompatibilities (Bierne, Welch, Loire, Bonhomme, & David, 2011; Seehausen et al., 2014). In this context, studies of closely related taxa or populations that show phenotypic and ecological diversity, and are at different stages of divergence, hold promise to help clarify the chronology and relative importance of these underlying evolutionary mechanisms (Delmore et al., 2015; Mořkovský et al., 2018; Pryke, 2010; Sætre & Sæther, 2010; Safran, Scordato, Symes, Rodríguez, & Mendelson, 2013; Seehausen et al., 2014).
We used the Reunion grey white-eye (Zosterops borbonicus; taxonomy following Gill & Donsker, 2019), a songbird endemic to the small (2,512 km²) volcanic ocean island of Reunion (Mascarene archipelago, southwestern Indian Ocean), to quantify genome-wide patterns of divergence across its range and better understand underlying evolutionary factors. This species is characterized by complex patterns of plumage colour and size variation, with five distinct variants recognized across the island (Gill, 1973). These variants can be grouped into four parapatrically distributed geographical forms with adjoining ranges that came into secondary contact after diverging in allopatry (Bertrand et al., 2016; Cornuault et al., 2015; Delahaie et al., 2017). Three lowland forms differ primarily in plumage colour (Cornuault et al., 2015; Gill, 1973) and show a unique distribution pattern, with each form being separated from the other two by narrow physical barriers such as rivers or lava fields (Gill, 1973). These forms differ strikingly in head coloration and include a light brown form (lowland brown-headed brown form; hereafter LBHB), a grey-headed brown form (GHB) with a brown back and a grey head, and a brown-naped brown form (BNB) with a brown back and nape, and a grey crown (Figure 1 and Figure S1; see Cornuault et al., 2015 for a detailed description). A fourth form, restricted to the highlands (between 1,400 and 3,000 m), is relatively larger than the lowland forms and comprises two very distinct colour morphs, with birds showing predominantly grey (GRY) or brown (highland brown-headed brown form, HBHB) plumage, respectively (Bertrand et al., 2016; Cornuault et al., 2015; Gill, 1973; Milá, Warren, Heeb, & Thébaud, 2010). Both of these latter morphs occur in sympatry and represent a clear case of plumage colour polymorphism (Bourgeois et al., 2017).
This highland form is separated from all three lowland forms by relatively narrow contact zones located along the elevational gradient (Gill, 1973). One such contact zone was recently studied and was found to correspond to an ecotone between native habitat (>1,400 m above sea level [a.s.l.]) and anthropogenic landscapes (<1,400 m a.s.l.), suggesting a possible role of environmental differences in influencing the location of these zones (Bertrand et al., 2016). While plumage colour differences between the two apparently similar all-brown variants (LBHB and HBHB) may appear subtle, they are in fact significant when considering bird vision and using a visual model to project these colours in an avian-appropriate, tetrachromatic colour space (Cornuault et al., 2015). Patterns of coloration among forms and morphs are stable over time, with no apparent sex effect (see Gill, 1973; Milá et al., 2010).

Recent studies have revealed that dispersal and gene flow must be limited in the Reunion grey white-eye, with low levels of historical and/or contemporary gene flow among populations, unless they are very close geographically (<10 km) (Bertrand et al., 2014), and that more variation exists among the different geographical forms than would be expected under drift for both morphological and plumage colour traits (Cornuault et al., 2015). Thus, it is the combination of reduced dispersal and divergent selection that seems to explain why white-eyes were able to differentiate into multiple geographical forms within Reunion, as originally proposed by Gill (1973). The island has a dramatic topography (maximum elevation: 3,070 m), and a steep elevational gradient was found to be associated with strong divergent selection on phenotypes and marked genetic structure for autosomal microsatellites, a pattern that is consistent with isolation by ecology between lowland and highland forms (Bertrand et al., 2016). In contrast, lowland forms show no association with neutral genetic (microsatellite) structure or major changes in vegetation characteristics and associated climatic variables and are separated by very narrow hybrid zones centred on physical barriers to gene flow (Delahaie et al., 2017). Although the autosomal markers used could not provide information on sex-linked loci, these patterns of genetic differentiation suggest that while a sharp ecological transition between lowlands and highlands could drive differentiation at many autosomal loci through local adaptation, phenotypic divergence between lowland forms involves either fewer loci or loci concentrated in a narrower genomic region not covered by microsatellites (Delahaie et al., 2017).
In this work, we aim to: (i) identify the genomic variation associated with phenotypic differentiation between forms in relation to ecological variation (low versus high elevation) and divergence in signalling traits (conspicuous variation in plumage colour between lowland forms in the absence of abrupt ecological transitions); (ii) determine whether divergence peaks are found on autosomes or on sex chromosomes; and (iii) identify potential candidate genes associated with divergent genomic regions. We used individual genotyping by sequencing (GBS) (Elshire et al., 2011) to characterize the amount of divergence between forms. We further used a pooled RAD-sequencing (RAD-seq) (Baird et al., 2008) approach that produced a high density of markers to characterize with greater precision the genomic landscape of divergence and assess the extent of differentiation between the different colour forms. Finally, to test whether incomplete lineage sorting or gene flow explained shared genetic variation among forms, we used coalescent models to test alternative demographic scenarios of divergence, including models with different temporal patterns of gene flow and effective population size changes over time.
2 MATERIAL AND METHODS
2.1 Field sampling
We sampled a total of 259 Reunion grey white-eyes between 2007 and 2012 from nine locations that were chosen to extensively cover the species’ range and the different geographical forms. We also sampled 25 Mauritius grey white-eyes (Zosterops mauritianus) from a single location on Mauritius to be used as an outgroup in some of our analyses as this species and the Reunion grey white-eye are sister taxa (Warren, Bermingham, Prys-Jones, & Thébaud, 2006). Birds were captured using mist-nets, marked with a uniquely numbered aluminium ring, and ~50 µl of blood was collected from each individual and preserved in Queen's lysis buffer (Seutin, White, & Boag, 1991). All manipulations were conducted under a research permit issued by the Centre de Recherches sur la Biologie des Populations d’Oiseaux (CRBPO) – Muséum National d’Histoire Naturelle (Paris). Individuals were sexed using PCR (polymerase chain reaction) (Griffiths, Double, Orr, & Dawson, 1998) to infer the number of distinct Z chromosomes included in each genetic pool. We included 152 females and 132 males in this study, among which 47 females and 48 males were included in the GBS experiment (see below).
2.2 GBS using individual DNA samples
We performed GBS (Elshire et al., 2011) on 95 individuals, including 90 Reunion and five Mauritius grey white-eyes (Figure 1). We included 7–14 individuals from each Reunion location and two locations per geographical form; such a sampling scheme should be sufficient to retrieve patterns of differentiation and diversity at the scale of forms, as highlighted by both theoretical (Willing, Dreyer, & Oosterhout, 2012) and empirical studies (Jeffries et al., 2016; Nazareno, Bemmels, Dick, & Lohmann, 2017). GBS is similar to RAD-seq, but involves fewer preparation steps (Elshire et al., 2011) and it samples loci at a lower resolution across the genome. Approximately one microgram of DNA was extracted with a Qiagen DNeasy Blood & Tissue kit following the manufacturer's instructions and sent to the BRC Genomic Diversity Facility at Cornell University (see Elshire et al., 2011) for single-end sequencing on a single lane of an Illumina HiSeq2000 device after digestion with PstI. Read length was 100 bp. Three individuals had to be removed from subsequent analyses due to the extremely low number of reads obtained (Table S1). Raw reads were trimmed with trimmomatic (version 0.33; Bolger, Lohse, & Usadel, 2014) with a minimum base quality of 20. We used the recently assembled Zosterops lateralis genome (Cornetti et al., 2015) to map reads back onto this reference with bwa mem (version 0.7.12; Li & Durbin, 2009) and samtools (version 1.3.1; Li et al., 2009), instead of creating consensuses directly from the data as in Bourgeois et al. (2013). Reads with a mapping score below 20 were excluded (samtools view -q 20). We then aligned contigs and scaffolds from a congeneric white-eye species, Z. lateralis, on the zebra finch (Taeniopygia guttata) passerine reference genome (version July 2008, assembly wugsc version 3.2.4) using lastz (version 1.03.54; Harris, 2007; Schwartz, Kent, & Smit, 2003). 
We used the following options and thresholds: --masking=254 --hspthresh=4500 --gappedthresh=3000. The first option means that any locus found mapping more than 254 times is automatically masked and does not appear in the final pairwise alignment. The --hspthresh parameter is an option that excludes any alignment with a score lower than 4,500 during the gap-free extension stage. The --gappedthresh option controls the maximum size of the gaps allowed to join best local alignments; the higher the score, the fewer gaps are allowed. We used the same set of options previously used in comparisons between other related bird genomes, such as chicken and grouse (e.g., Kozma, Melsted, Magnússon, & Höglund, 2016). Scaffolds were then assigned to chromosomal regions based on their alignment scores. We note that synteny is well conserved in birds (Derjusheva, Kurganova, Habermann, & Gaginskaya, 2004), and that misalignment is therefore unlikely to constitute a major source of errors.
SNPs were called using freebayes (version 0.9.15–1; Garrison & Marth, 2012) and filtered with vcftools (version 0.1.12b) using the following criteria for autosomal markers: (a) a sequencing depth between 8× and 100× for each individual genotype; (b) a minimal genotype quality of 20; and (c) no more than nine missing genotypes. Missing data per individual before filtering and after removing individuals with low read count was at most 50% (average 30%, SD = 5%, see Table S1). Average sequencing depth was 9.75× (SD = 3.05). The range of sequencing depth for filtering was chosen based on visual examination of histograms produced by samtools (option depth), to remove loci with a clear excess of mapping reads that may indicate repetitive sequences and those loci with very low depth for which genotypes may not be called confidently, while retaining enough information for inference. For Z-linked markers, we first listed scaffolds mapping on the zebra finch's Z chromosome based on lastz alignments. We then used vcftools to extract genotypes found on Z scaffolds (providing a list of these scaffolds with the option --bed). SNPs were filtered in male individuals only, using the same criteria as for autosomes, except that no more than five missing genotypes were allowed. We then extracted these sites in females only, using vcftools (option --positions), and removed markers displaying more than three heterozygous females, allowing for some tolerance because freebayes attempts to balance the count of heterozygotes in a diploid population. This led to the removal of 171 sites out of 1,136. We then recalled SNPs with freebayes in females only, assuming haploidy (option --ploidy 1). The constraint on sequencing depth and genotype quality was removed in females to consider the fact that a single Z copy is found in these individuals, therefore reducing depth of coverage at Z-linked markers.
Because we excluded reads with a mapping quality below 20 when creating BAM files, we considered that a single read was enough to call a site in females for the Z scaffold. This decision was taken to maximize the number of markers available for this chromosome. The final data set consisted of 34,951 autosomal markers and 965 Z-linked markers. Recent studies have suggested the existence of a neo sex chromosome in Sylvioidea, consisting of a fusion between ancestral Z and W chromosomes with the first 10 Mb of the zebra finch's chromosome 4A (Pala et al., 2012). We therefore excluded this region from our analyses and studied it separately, focusing on males only. In males, 917 and 730 SNPs called by freebayes were found polymorphic on the Z and the 4A sex-linked fragment, respectively.
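The autosomal filtering criteria described above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study (filtering was done with vcftools); the data layout (one `(depth, genotype quality)` tuple per individual, `None` for a missing call), the function names and the inclusive treatment of the thresholds are assumptions made purely for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (depth, genotype quality), or None if missing.
Genotype = Optional[Tuple[int, int]]


def mask_genotype(gt: Genotype, min_dp: int = 8, max_dp: int = 100,
                  min_gq: int = 20) -> Genotype:
    """Return the genotype if it passes depth and quality thresholds, else None."""
    if gt is None:
        return None
    depth, gq = gt
    if min_dp <= depth <= max_dp and gq >= min_gq:
        return gt
    return None


def keep_site(genotypes: List[Genotype], max_missing: int = 9) -> bool:
    """Keep a site if at most `max_missing` genotypes are missing after masking."""
    masked = [mask_genotype(gt) for gt in genotypes]
    n_missing = sum(1 for gt in masked if gt is None)
    return n_missing <= max_missing
```

Under these assumptions, the stricter male-only filter for Z-linked markers would correspond to calling `keep_site(male_genotypes, max_missing=5)`.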
2.3 Pooled RAD-seq
To identify loci and genomic regions associated with ecological variation (low versus high elevation) and divergence in plumage colour between lowland forms, we used a paired-end RAD-seq protocol, using a data set partially described elsewhere (Bourgeois et al., 2013) in which six pools of 20–25 individuals from the same three locations as those sampled for the high-elevation form in the GBS experiment were sequenced. We added seven more pools of 16–25 individuals from the lowland forms to cover the same localities as the GBS data set (Figure 1; Table S2). This protocol was used because it produced a higher density of markers along the genome relative to the GBS approach described above, thus increasing the ability to detect outlier genomic regions. This approach resulted in ~600,000 contigs with an average size of 400 bp, covering about 20% of the genome (Bourgeois et al., 2013). The larger number of individuals included in each pool should also increase the ability to detect shifts in allele frequencies between populations. We modified the bioinformatics protocol used in Bourgeois et al. (2013) by mapping the reads on the Z. lateralis genome using bwa mem instead of creating contigs from the RAD-seq reads. PCR duplicates were removed using samtools (Li et al., 2009). SNPs were called using popoolation2 (version 1.201; Kofler, Pandey, & Schlötterer, 2011), using a minimal allele count of two across all pools, and a depth between 10× and 300× for each pool to remove loci that were clear outliers for sequencing depth while keeping a high density of markers along the genome. We used bedtools (version 2.25.0; Quinlan & Hall, 2010) to estimate the proportion of sites covered at a depth between 10× and 300× in each pool (option genomecov). Overall, more than 1,104,000 SNPs for autosomes and 42,607 SNPs for the Z chromosome were obtained, covering between 12% and 18% of the genome (Table S2).
We accounted for the unequal number of alleles between autosomal markers and Z-linked markers in all subsequent analyses.
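The difference in allele numbers between autosomal and Z-linked markers follows from the avian sex-determination system (ZZ males, ZW females). A minimal Python sketch of the bookkeeping is given below; the function name and the example pool composition are hypothetical and are not taken from Table S2.

```python
def allele_copies(n_males: int, n_females: int, z_linked: bool) -> int:
    """Number of chromosome copies sampled in a pool of individuals.

    Males (ZZ) carry two Z copies and females (ZW) carry one, whereas every
    individual carries two copies of each autosome.
    """
    if n_males < 0 or n_females < 0:
        raise ValueError("counts must be non-negative")
    if z_linked:
        return 2 * n_males + n_females
    return 2 * (n_males + n_females)


# Hypothetical pool of 12 males and 8 females.
autosomal_copies = allele_copies(12, 8, z_linked=False)
z_copies = allele_copies(12, 8, z_linked=True)
```

This is why sexing each individual is needed before pooled Z-linked allele frequencies can be interpreted.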
2.4 Genetic structure
To assess population genetic structure within and between geographical forms, we first performed a principal components analysis (PCA; Patterson, Price, & Reich, 2006) on all GBS autosomal markers, using the Bioconductor package seqvartools (version 1.24.0; Huber et al., 2015), excluding markers with a minor allele frequency below 0.05. We then evaluated population structure for both autosomal and Z-linked markers using the software admixture (version 1.3.0; Alexander & Novembre, 2009). This software is a fast and efficient tool for estimating individual ancestry coefficients. It does not require any a priori grouping of individuals by locality but requires defining the expected number (K) of clusters to which individuals can be assigned. Importantly, admixture allows us to specify which scaffolds belong to sex chromosomes, and corrects for heterogamy between males and females. “Best” values for K were assessed using a cross-validation (CV) procedure using 10 CV replicates. In this context, CV consists in masking alternately one-fifth of the data set, then using the remaining data set to predict the masked genotypes. Predictions are then compared with actual observations to infer prediction errors. This procedure is therefore sensitive to heterogeneity in structure across markers induced by, for example, selection. Therefore, we present results for all values of K as they may reveal subtle structure supported by only a subset of markers under selection. Based on patterns of linkage disequilibrium (LD)-decay, we thinned the data set to limit the effects of linkage, with a minimal distance between two adjacent markers of 1,000 bp (Figure S2). Pairwise LD (measured as r2, which does not require phasing) between all pairs of markers was computed in vcftools (version 0.1.12b).
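Thinning by minimal distance can be sketched as follows. The text does not state which tool or rule was used for thinning, so the greedy left-to-right rule, the function name and the `(scaffold, position)` representation are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

Marker = Tuple[str, int]  # (scaffold name, position in bp)


def thin_markers(markers: List[Marker], min_dist: int = 1000) -> List[Marker]:
    """Greedy thinning: scan markers in order and keep one only if it lies at
    least `min_dist` bp from the previously kept marker on the same scaffold."""
    kept: List[Marker] = []
    last_kept: Dict[str, int] = {}
    for scaffold, pos in sorted(markers):
        prev = last_kept.get(scaffold)
        if prev is None or pos - prev >= min_dist:
            kept.append((scaffold, pos))
            last_kept[scaffold] = pos
    return kept
```

The same function with `min_dist=10000` would give a more conservative data set, at the cost of retaining fewer markers.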
To further explore whether changes in SNP caller and the number of markers could affect the admixture analysis and observed differences between autosomes and Z-linked markers, we called Z-linked SNPs using angsd (version 0.923; Korneliussen, Albrechtsen, & Nielsen, 2014), following an approach similar to that used with freebayes. We first called SNPs in all individuals assuming diploidy, using a uniform prior based on allele frequencies but not assuming Hardy–Weinberg equilibrium in samples (option -doPost 2). SNP likelihoods were computed following the model implemented in samtools (-GL 1). We then called SNPs in females only, assuming haploid markers and calling the consensus base (option -doHaploCall 2). We filtered reads so they mapped to a single site in the genome (-uniqueOnly 1 -remove_bads 1), had a mapping quality of at least 20 (-minMapQ 20) and a minimum read quality of 20 (-minQ 20), and were covered in at least two-thirds of individuals with a minimum individual depth of 6× in males (-geno_minDepth 6). We corrected for excessive mismatch with the reference and excess of SNPs with indels (-C 50 -baq 1). This resulted in 2,282 Z-linked SNPs.
To assess the relative proportion of genetic variance contributing to the differentiation of geographical forms (estimated by FCT) while taking into account population substructure (FSC and FST), we conducted an analysis of molecular variance (AMOVA) in arlequin version 3.5 (Excoffier & Lischer, 2010) using as groups either islands (Reunion versus Mauritius), lowlands and highlands, or the forms themselves. F-statistics for the whole data set are weighted averages. Significance was assessed with 1,000 permutations.
We assessed relationships between populations from pooled data using poptree2 (Takezaki, Nei, & Tamura, 2010) to compute FST matrices across populations using allele frequencies. A neighbor-joining tree was then estimated from these matrices. We included 20,000 random SNPs with a minimum minor allele count of 2. Branch support was estimated through 1,000 bootstraps. As a supplementary control, we also report the correlation matrix estimated from the variance–covariance matrix obtained by the software baypass (version 2.1) (Gautier, 2015) for both GBS and pooled data. The variance–covariance matrix reflects covariation of allele frequencies within and between populations. The correlation matrix describing pairwise relatedness between populations was then derived using the R function cov2cor() provided with baypass. The function hierclust() was used to perform hierarchical clustering based on matrix coefficients. For poptree2 analysis and AMOVAs, we accounted for the different number of alleles between males and females for Z-linked markers, including one allele for females and two for males.
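The conversion from a variance–covariance matrix to a correlation matrix amounts to dividing each entry by the product of the two corresponding standard deviations. The sketch below reproduces that scaling in plain Python; it is an illustrative stand-in for the R function mentioned above, and the function name and example values are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cov_to_cor(cov: Matrix) -> Matrix:
    """Scale a covariance matrix to a correlation matrix:
    cor[i][j] = cov[i][j] / sqrt(cov[i][i] * cov[j][j])."""
    n = len(cov)
    if any(len(row) != n for row in cov):
        raise ValueError("matrix must be square")
    if any(cov[i][i] <= 0 for i in range(n)):
        raise ValueError("diagonal variances must be positive")
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Hypothetical 2 x 2 covariance matrix between two populations.
example = cov_to_cor([[4.0, 2.0], [2.0, 9.0]])
```

The resulting coefficients lie between -1 and 1, which makes populations with different overall variances directly comparable before clustering.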
2.5 Demography
To help distinguish between the presence of extensive gene flow between forms and incomplete lineage sorting as an explanation for the generally low differentiation, we performed model comparison under the likelihood framework developed in fastsimcoal2.6 (Excoffier, Dupanloup, Huerta-Sanchez, Sousa, & Foll, 2013) using frequency spectra inferred by angsd from autosomal and Z-linked GBS data. We focused on the split between populations from high and low elevations. This split was the clearest across all analyses, with very little genetic differentiation between lowland forms for these markers (see Results). Given the weak genetic substructure for autosomal markers, and because a large number of individuals within groups is required to infer very recent demographic events (Robinson, Coffman, Hickerson, & Gutenkunst, 2014), we pooled individuals into three groups: highlands, lowlands and Mauritius. We acknowledge that models incorporating substructure could be built, but this would come at the cost of adding more parameters. We preferred to limit this study to simple models that can serve as a basis for future, more detailed work using more markers and individuals (Otto & Day, 2007).
We used GBS autosomal and Z-linked markers as they could be filtered with higher stringency than pooled markers and were more likely to follow neutral expectations, with only 1.6% of the genetic variance explained by GBS loci harbouring significant differentiation between highland and lowland forms (see AMOVAs in Results). SNPs mapping on scaffolds corresponding to the neo sex chromosome region on 4A were discarded from the analysis (see Results). We extracted the joint derived site frequency spectrum (SFS) using angsd, which takes into account genotypic uncertainties to directly output the most likely SFS. We used the reference genome as an outgroup to assign alleles to ancestral or derived states. Using angsd should correct for biases that may occur when calling genotypes from low- and medium-depth sequencing. We filtered markers using the same criteria used for the admixture analysis on Z-linked loci (-uniqueOnly 1 -remove_bads 1 -minMapQ 20 -minQ 20 -doPost 2 -geno_minDepth 6 -C 50 -baq 1), but excluded two individuals with very low average sequencing depth (993 and 11_0990), and removed sites that were not covered in all remaining individuals. For Z-linked markers, we extracted the SFS from male individuals only, given that angsd cannot simultaneously extract the spectrum from samples with mixed ploidy. Entries in the joint SFS were examined to exclude potential paralogues displaying strong heterozygosity. We did not filter on minimal allele frequency because singletons are important to properly estimate parameters and likelihoods. 
We compared four distinct demographic models (Figure 3), one in which all forms were essentially treated as a single population (by forcing a very recent split 10 generations ago) that went through a change in effective population sizes after the split from Mauritius, one with no gene flow between highland and lowland forms, one allowing constant and asymmetric gene flow between lowland and highland forms, and a last model in which gene flow could vary at some time in the past between the present and the split between lowland and highland forms. Population sizes could vary at each splitting time and each group was assigned a specific effective population size. Parameters were estimated from the joint SFS using the likelihood approach implemented in fastsimcoal2.6 (Excoffier et al., 2013). Parameters with the highest likelihood were obtained after 20 cycles of the algorithm, starting with 50,000 coalescent simulations per cycle, and ending with 100,000 simulations. This procedure was replicated 50 times and the set of parameters with the highest final likelihood was retained as the best point estimate. The likelihood estimated by fastsimcoal2.6 is a composite likelihood, which can be biased by covariance between close markers (Excoffier et al., 2013). To properly compare likelihoods and keep the effects of linkage to a minimum, we first used a thinned data set with SNPs separated by at least 10,000 bp (see LD decay, Figure S2). We then used the complete data set for parameter estimation. We used a fixed divergence time of 430,000 years between Reunion and Mauritius grey white-eyes (Warren et al., 2006) and assumed a generation time of 1 year to calibrate parameters and obtain an estimate in demographic units for the timing of diversification in the Reunion grey white-eye. We used python scripts implemented in ∂a∂i (Gutenkunst, Hernandez, Williamson, & Bustamante, 2009) to visualize and compare predicted and observed SFS. 
To assess deviation from neutrality and stable demography, we estimated Tajima's D (Tajima, 1989) from the spectra with ∂a∂i.
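For readers unfamiliar with the statistic, Tajima's D can be computed directly from a one-population site frequency spectrum using the standard formulae of Tajima (1989). The sketch below is a plain-Python illustration of that calculation, not the ∂a∂i code used in the study; the function name and the layout of the spectrum (entry i holds the number of sites where the derived allele is carried by i chromosomes) are assumptions.

```python
import math
from typing import Sequence


def tajimas_d(sfs: Sequence[float]) -> float:
    """Tajima's D from an unfolded SFS of length n + 1 (n sampled chromosomes).

    The monomorphic entries sfs[0] and sfs[n] are ignored.
    """
    n = len(sfs) - 1
    if n < 4:
        raise ValueError("need at least four sampled chromosomes")
    seg = sum(sfs[1:n])  # number of segregating sites, S
    # Mean number of pairwise differences (pi).
    pi = sum(sfs[i] * 2.0 * i * (n - i) for i in range(1, n)) / (n * (n - 1))
    a1 = sum(1.0 / k for k in range(1, n))
    a2 = sum(1.0 / k ** 2 for k in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg + e2 * seg * (seg - 1)
    if variance <= 0:
        raise ValueError("too few segregating sites to compute D")
    return (pi - seg / a1) / math.sqrt(variance)
```

A spectrum with an excess of rare variants, as expected after population expansion, yields negative values, whereas a spectrum proportional to 1/i yields a value of zero.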
We estimated 95% confidence intervals (CIs) using a nonparametric bootstrap procedure, bootstrapping the observed SFS 100 times using angsd and repeating the parameter estimation procedure on these data sets, using 10 replicates per bootstrap run to reduce computation time. To visualize whether the model fitted the observed data, we compared the observed SFS with 100 SFS simulated using the set of parameters obtained from the model with the highest likelihood. Coalescent simulations were carried out in fastsimcoal2.6, simulating DNA fragments of the size of GBS loci (91 bp) and using a mutation rate of 3.6 × 10⁻⁹ mutations per generation (Axelsson, Smith, Sundström, Berlin, & Ellegren, 2004) until the number of SNPs matched the number of segregating sites in the observed data set. Parameters for each simulation were uniformly drawn from the CIs of the best model. We summarized SFS by PCA using the gfitpca() function of the abc package (Csilléry, François, & Blum, 2012), including only entries with at least 0.1% of the total number of segregating sites to reduce variance.
2.6 Selection and environmental association
To detect loci displaying a significant association with geographical forms and environment, we performed an association analysis on the pooled RAD-seq data. We used the approach implemented in baypass (version 2.1) to detect SNPs displaying high differentiation (Gautier, 2015). This approach is designed to robustly handle uncertainties in allele frequencies due to pooling and uneven depth of coverage, by directly using read count data. It performs well in estimating differentiation and population structure (Hivert, Leblois, Petit, Gautier, & Vitalis, 2018). Briefly, baypass estimates a variance–covariance matrix reflecting correlations between allele frequencies across populations. Divergence at each locus is quantified through a Bayesian framework using the XTX statistic, which can be seen as an SNP-specific FST corrected by this matrix. baypass also offers the option to estimate an empirical Bayesian p-value (eBPis) which can be seen as the support for a nonrandom association between alleles and any population-specific covariable. We also computed eBPis to determine the level of association of each SNP with geographical form and elevation. Specifically, we tested associations between allele frequencies and a binary covariable stating whether pools belonged or did not belong to Mauritius grey white-eye, GHB, BNB and LBHB forms. We also tested for an association with elevation, coded as a continuous variable. baypass was run using default parameters under the core model with 25 pilot runs and a final run with 100,000 iterations thinned every 100 iterations. Because estimates of the variance–covariance matrix are robust to minor allele count thresholds (Gautier, 2015), we only included SNPs with a minor allele count of 10 over the entire data set to reduce computation time and ran separate analyses on the Z chromosome and autosomes to account for their distinct patterns of differentiation and allele counts.
2.7 GO enrichment analysis
To gain insight into the putative selective pressures acting on Reunion grey white-eye populations, we performed a Gene Ontology (GO) enrichment analysis, selecting for each association test SNPs in the top 1% for both eBPis and XTX, considering separately the Z chromosome and autosomes. Gene annotations within 100-kb windows flanking selected SNPs were extracted using the zebra finch reference and the intersect function in bedtools version 2.25.0 (Quinlan & Hall, 2010). We chose 100 kb based on the average territory size of protein-coding genes in the zebra finch (87.8 kb, see table 1 in Warren et al., 2010). We adjusted the gene universe by removing zebra finch genes not mapping onto the Z. lateralis genome. GO enrichment analysis was performed using the package topGO in R (Alexa & Rahnenfuhrer, 2016), testing for significant enrichment using a Fisher's test for over-representation. We present raw p-values instead of p-values corrected for multiple testing, following recommendations from the topGO manual, and because we wanted to detect any interesting trend in the data set that could then be further explored. We present the top 50 GO terms associated with biological processes, ranked by their raw p-values.
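A Fisher's test for over-representation of a single GO term reduces to the upper tail of a hypergeometric distribution. The sketch below shows that calculation in plain Python under simplifying assumptions: it treats each term independently and is not the topGO implementation, which may additionally account for the structure of the GO graph depending on the algorithm chosen. The function and argument names are hypothetical.

```python
from math import comb


def enrichment_p_value(universe: int, annotated: int,
                       selected: int, overlap: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    universe  -- number of genes in the gene universe
    annotated -- genes in the universe annotated with the GO term
    selected  -- candidate genes (e.g. near outlier SNPs)
    overlap   -- candidate genes annotated with the GO term
    Returns P(X >= overlap) for a hypergeometric X.
    """
    if not (0 <= annotated <= universe and 0 <= selected <= universe):
        raise ValueError("annotated and selected must lie within the universe")
    upper = min(annotated, selected)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap is outside the possible range")
    total = comb(universe, selected)
    tail = sum(comb(annotated, k) * comb(universe - annotated, selected - k)
               for k in range(overlap, upper + 1))
    return tail / total
```

Because many GO terms are tested, raw p-values from such a test are best read as a ranking of terms rather than as strict significance levels, in line with the way results are presented here.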
3 RESULTS
3.1 Genetic structure and relationships among geographical forms
We first assessed whether geographical forms could be distinguished based on the genomic data available for individuals (GBS). A PCA on autosomal allele frequencies revealed a clear distinction between Mauritius and Reunion grey white-eyes on the first axis, as well as a distinction between localities from lowlands and highlands on the second axis (Figure 1b). When excluding Mauritius white-eyes, the main distinction remained between localities from high and low elevation, reflecting differentiation between highland and lowland forms. Further principal components did not reveal any strong clustering of the different forms from the lowlands (Figure 1b). This pattern was further confirmed by the admixture analysis, where lowland and highland localities clustered separately for both autosomal and Z-linked markers (Figure 2). The CV procedure gave K = 2 and K = 7 as best models for autosomal and Z-linked markers respectively (Figure S3). However, CV scores were low for all K values ranging from 5 to 7 for Z-linked markers, making it difficult to clearly identify an optimal value. Given the subtle genetic structure, we present results for values of K ranging from 2 to 7.

Clustering was consistent with the PCA, with a distinction between localities from high and low elevations at K = 2 and 3, for Z-linked and autosomal markers respectively (Figure 2a). For autosomal markers, higher values of K highlighted a structure consistent with sampling sites, but structure according to geographical forms was more elusive. For Z-markers, clustering tended to be more consistent with forms at low elevation. At K = 7, one cluster corresponded to localities at high elevation, three others grouped lowland localities by forms, while a fifth cluster included Mauritian individuals (Figure 2a). Two localities (LF and TC) displayed stronger signals of mixed ancestry, probably due to gene flow between those two localities, which are close to a zone of contact between the GHB and BNB forms. The same pattern was observed using the set of Z-linked SNPs called with angsd (Figure S4). This set of markers improved the discrimination between colour forms, the distinction between GHB and BNB forms being even clearer, probably because more SNPs remained after filtering with angsd (2,282 loci) than with freebayes (965 loci).
Recent studies revealed the existence of neo-sex chromosomes in Sylvioidea, to which Zosteropidae belong (Pala et al., 2012), and their existence was recently confirmed in the Reunion grey white-eye (Leroy et al., 2019). These neo-sex chromosomes emerged from the fusion of Z and W chromosomes with the first 10 Mb of chromosome 4A. We extracted GBS markers mapping to this region and confirmed that they were sex-linked, as shown by a PCA on allele frequencies (Figure S5). Visual examination of genotypes revealed that females (ZW) displayed a strong excess of heterozygous markers, in contrast to males (ZZ), due to divergence between neo-Z and neo-W chromosomes. We further investigated population structure in males only, using a set of combined Z and 4A markers, and an admixture analysis tended to better discriminate geographical forms at K = 6 with this set of markers (Figure S6).
The same pattern was observed with the poptree2 analysis on the pooled data set. The autosomal topology supported a grouping of localities from high elevation together and supported a grouping of localities belonging to BNB and LBHB forms, yet there was no support for grouping together localities from the GHB form. In contrast, topology based on Z markers provided good support for a grouping of localities by geographical form (Figure 2b).
We further investigated population structure by estimating the variance–covariance matrix obtained from allele frequencies for both pooled and GBS data using baypass (Gautier, 2015). Localities from high elevation were systematically found clustering together, a pattern consistent with previous analyses (Figure S7). Again, both analyses on Z-linked markers for GBS and pooled data revealed a closer relationship between populations from the same geographical form when compared to autosomal markers. LBHB and BNB forms clustered together, with the GHB form branching off first within the lowland group.
To estimate quantitatively the proportion of the genome discriminating among geographical forms while accounting for population structure within forms, we performed an AMOVA on GBS data. This analysis confirmed the previous patterns, with a proportion of variance explained by forms or elevation not higher than 1.6% for autosomal markers (Table 1). The strongest differentiation was observed between localities from low and high elevations and between Mauritius and Reunion populations. For Z-linked markers, however, the proportion of variance explained by forms and elevation was more substantial, ranging from 3.4% to 12.6% (Table 1). This higher differentiation could not be explained by differences in sample size between autosomal and Z-linked data. For autosomal markers, AMOVAs performed only on males did not show substantial deviations from the results obtained with the full data set (Table 1).
Marker set | Source of variation | LBHB versus BNB: F-statistic | LBHB versus BNB: % of variance explained | GHB versus LBHB: F-statistic | GHB versus LBHB: % of variance explained | GHB versus BNB: F-statistic | GHB versus BNB: % of variance explained | High versus Low: F-statistic | High versus Low: % of variance explained | Reunion versus Mauritius: F-statistic | Reunion versus Mauritius: % of variance explained
---|---|---|---|---|---|---|---|---|---|---|---
Autosomes (without 4A sex-linked markers) | Among groups (FCT) | 0.005 | 0.48 | 0.010 | 1.03 | 0.006 | 0.59 | 0.016 | 1.57 | 0.173 | 17.31
Autosomes (without 4A sex-linked markers) | Among populations within groups (FSC) | 0.033 | 3.31 | 0.042 | 4.11 | 0.035 | 3.49 | 0.040 | 3.97 | 0.049 | 4.09
Autosomes (without 4A sex-linked markers) | Within populations (FST) | 0.038 | 96.21 | 0.051 | 94.86 | 0.041 | 95.92 | 0.052 | 94.46 | 0.214 | 78.61
Z-linked markers | Among groups (FCT) | 0.068 | 6.82 | 0.108 | 10.84 | 0.034 | 3.45 | 0.126 | 12.59 | 0.206 | 20.61
Z-linked markers | Among populations within groups (FSC) | 0.038 | 3.56 | 0.093 | 8.28 | 0.095 | 9.13 | 0.123 | 10.78 | 0.191 | 15.17
Z-linked markers | Within populations (FST) | 0.104 | 89.62 | 0.191 | 80.88 | 0.095 | 87.42 | 0.234 | 76.63 | 0.358 | 64.21
Autosomes (without 4A sex-linked markers, males only) | Among groups (FCT) | 0.007 | 0.69 | 0.009 | 0.93 | 0.005 | 0.53 | 0.019 | 1.86 | 0.166 | 16.6
Autosomes (without 4A sex-linked markers, males only) | Among populations within groups (FSC) | 0.013 | 1.32 | 0.029 | 2.87 | 0.027 | 2.72 | 0.040 | 3.92 | 0.051 | 4.25
Autosomes (without 4A sex-linked markers, males only) | Within populations (FST) | 0.02 | 97.98 | 0.038 | 96.2 | 0.032 | 96.75 | 0.058 | 94.22 | 0.209 | 79.15
4A, 1–10 Mb (males only) | Among groups (FCT) | 0.044 | 4.44 | 0.069 | 6.92 | 0.048 | 4.81 | 0.080 | 7.95 | 0.370 | 28.58
4A, 1–10 Mb (males only) | Among populations within groups (FSC) | 0.050 | 4.80 | 0.059 | 5.47 | 0.039 | 3.68 | 0.075 | 6.89 | 0.118 | 8.44
4A, 1–10 Mb (males only) | Within populations (FST) | 0.044 | 90.76 | 0.124 | 87.61 | 0.085 | 91.51 | 0.148 | 85.16 | 0.286 | 62.98
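The three fixation indices reported for each comparison are linked by the standard hierarchical identity (1 − FST) = (1 − FSC)(1 − FCT), and FCT corresponds to the among-group share of the total variance. The following Python sketch illustrates both relationships, using the Z-linked "High versus Low" values from Table 1 as input; the function names are illustrative and do not come from the AMOVA software used in the study.

```python
def fst_from_hierarchy(fct, fsc):
    """Overall FST implied by the among-group (FCT) and within-group (FSC) indices."""
    return 1.0 - (1.0 - fsc) * (1.0 - fct)

def f_stats_from_variance(pct_among_groups, pct_among_pops, pct_within_pops):
    """Recover FCT, FSC and FST from AMOVA variance percentages."""
    total = pct_among_groups + pct_among_pops + pct_within_pops
    fct = pct_among_groups / total
    fsc = pct_among_pops / (pct_among_pops + pct_within_pops)
    fst = (pct_among_groups + pct_among_pops) / total
    return fct, fsc, fst

# Z-linked markers, High versus Low (Table 1)
fst_z = round(fst_from_hierarchy(fct=0.126, fsc=0.123), 3)
stats_z = tuple(round(x, 3) for x in f_stats_from_variance(12.59, 10.78, 76.63))
print(fst_z)
print(stats_z)
```

Both routes return the same three indices for this comparison, which gives a quick consistency check between the F-statistic and variance columns.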
3.2 Demographic history
We compared four nested models (Figure 3), allowing for no or constant gene flow after the split between populations. The highest likelihoods were found for the model allowing change in gene flow at some time (Tchange) after the initial split at Tsplit between lowlands and highlands (model D, Figure 3), while the other models were clearly rejected (Table 2). Such a pattern of higher gene flow in recent times is consistent with a scenario of introgression through secondary contact.

Marker | Model | Log(Likelihood) | Number of parameters | AIC | ΔAIC
---|---|---|---|---|---
Autosomes | D | −36,333.59 | 11 | 72,689.19 | 0.00
Autosomes | C | −36,372.41 | 8 | 72,760.81 | 71.62
Autosomes | B | −36,453.23 | 6 | 72,918.45 | 229.27
Autosomes | A | −37,509.72 | 7 | 75,033.44 | 2,344.25
Z-linked | D | −1,611.58 | 11 | 3,245.17 | 0.00
Z-linked | C | −1,628.72 | 8 | 3,273.44 | 28.28
Z-linked | B | −1,662.54 | 6 | 3,337.09 | 91.92
Z-linked | A | −1,957.25 | 7 | 3,928.51 | 683.34
Assuming a conservative divergence time between Z. borbonicus and Z. mauritianus of 430,000 years, parameter estimates suggested a split between high- and low-elevation populations 400,000 years ago and an increase in gene flow 80,000 years ago (Table 3). Overall, point estimates of effective migration rates (2Nm, with N the diploid population size) were high (2NHigh mLow→High = 0.1 gene copies per generation, 2NLow mHigh→Low = 9.2 before Tchange, then reaching 2NHigh mLow→High = 9.9 and 2NLow mHigh→Low = 6.0 80,000 years ago), consistent with homogenization of genomes through migration. This model was able to explain our observed data set, as indicated by visual comparisons of the observed and predicted SFS (Figure S8a). Coalescent simulations using parameters drawn from CIs of the best model produced SFS similar to the observed one (Figure S8b).
Marker | Parameter | 2NMauritius | 2NLow | 2NHigh | 2Nanc_Reu | 2Nanc_Reu+Mau | T split | T change | mLow→High (recent) | mHigh→Low (recent) | mLow→High (ancient) | mHigh→Low (ancient) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Autosomal | Best estimate | 1,081,493 | 2,957,776 | 606,586 | 1,266,228 | 1,022,327 | 400,228 | 79,188 | 1.63E−05 | 2.03E−06 | 2.00E−07 | 3.11E−06 |
Autosomal | 2.5% lower bound | 906,204 | 2,303,253 | 480,897 | 596,984 | 916,712 | 374,624 | 62,305 | 1.54E−05 | 2.76E−07 | 1.40E−08 | 2.36E−06 |
Autosomal | 97.5% upper bound | 1,158,440 | 2,957,776 | 686,191 | 1,332,774 | 1,140,005 | 423,302 | 204,547 | 2.17E−05 | 3.60E−06 | 3.82E−06 | 5.64E−06 |
Z-linked | Best estimate | 640,280 | 1,595,330 | 228,792 | 996,150 | 905,370 | 404,812 | 133,746 | 6.66E−06 | 3.10E−07 | 1.92E−08 | 2.28E−09 |
Z-linked | 2.5% lower bound | 490,028 | 1,195,174 | 140,680 | 596,824 | 565,930 | 348,043 | 66,246 | 4.34E−06 | 3.76E−08 | 1.47E−09 | 6.91E−10 |
Z-linked | 97.5% upper bound | 878,312 | 1,807,538 | 338,704 | 1,175,232 | 1,194,501 | 419,568 | 166,238 | 1.05E−05 | 1.14E−06 | 2.36E−06 | 2.73E−07 |
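The effective migration rates quoted in the text can be recovered from Table 3 by multiplying each per-generation migration rate by the size (2N) of the receiving population, following the notation 2NHigh mLow→High. A minimal Python sketch using the autosomal best estimates; the variable and function names are illustrative.

```python
# Autosomal best estimates from Table 3
two_n = {"Low": 2_957_776, "High": 606_586}

# Migration rates keyed by (source, sink, period)
m = {
    ("Low", "High", "ancient"): 2.00e-07,
    ("High", "Low", "ancient"): 3.11e-06,
    ("Low", "High", "recent"): 1.63e-05,
    ("High", "Low", "recent"): 2.03e-06,
}

def effective_migration(source, sink, period):
    """2Nm: gene copies entering the sink population per generation."""
    return two_n[sink] * m[(source, sink, period)]

for source, sink, period in m:
    rate = effective_migration(source, sink, period)
    print(f"{period:7s} {source}->{sink}: 2Nm = {rate:.1f}")
```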
Genes that are involved in reproductive isolation between populations are expected to resist gene flow due to counterselection of maladapted or incompatible alleles. This should result in an increased divergence at these loci when compared to the genomic background (Cruickshank & Hahn, 2014). We tested whether reduced gene flow explained increased divergence in the Z chromosome by estimating parameters from the four models but using this time all Z-linked markers, because they explained most of the differentiation between forms. We used the same filtering criteria that we used for autosomes but focused our analysis on diploid males only. The highest likelihood was also found for the model allowing gene flow to change after Tsplit (Table 2), again providing support for a scenario of introgression through secondary contact. As expected, given hemizygosity, effective population size estimates were lower for this set of markers than for autosomes (Table 3). Estimates of gene flow were also historically lower than for autosomes, while time since the split between highland and lowland groups was similar. This combination of lower population sizes and lower gene flow is expected to lead to increased divergence at Z-linked loci, in accordance with the stronger differentiation observed at these markers in AMOVAs and other tests.
To further explore whether differences in demographic inferences may be due to stronger effects of linked selection on the Z chromosome, we computed Tajima's D for each group (Tajima, 1989). This statistic should be negative (<0) in the case of recent population expansion, or if positive/purifying selection is acting. It should be positive (>0) in the case of a recent bottleneck or balancing selection. Tajima's D values for the autosomal spectra obtained by angsd were −0.29, −0.73 and −1.18 for Mauritius, highlands and lowlands respectively. Tajima's D values were higher for Z-linked markers, at 0.12, 0.09 and −0.76, but followed the same trend, being lower in lowlands than in highlands and Mauritius. This suggests that widespread effects of linked selection on the Z chromosome are unlikely to explain its higher differentiation.
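For readers unfamiliar with the statistic, the sketch below computes Tajima's D from an unfolded site frequency spectrum using the textbook formula of Tajima (1989). It is a minimal Python illustration on toy spectra, not the angsd implementation used here, and the input spectra are invented for the example.

```python
from math import sqrt

def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS.

    sfs[i - 1] is the number of sites at which the derived allele is carried
    by i of the n sampled chromosomes (i = 1 .. n - 1).
    """
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    if s == 0:
        raise ValueError("no segregating sites")
    pairs = n * (n - 1) / 2
    # Mean number of pairwise differences (pi)
    pi = sum(i * (n - i) * c for i, c in enumerate(sfs, start=1)) / pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between pi and Watterson's estimator, scaled by its standard deviation
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Toy spectra for n = 7 chromosomes (invented counts)
neutral = [60, 30, 20, 15, 12, 10]         # proportional to 1/i
rare_excess = [120, 10, 5, 5, 4, 3]        # excess of rare variants
intermediate_excess = [10, 20, 45, 45, 20, 7]

for label, spectrum in [("neutral", neutral),
                        ("rare excess", rare_excess),
                        ("intermediate excess", intermediate_excess)]:
    print(f"{label}: D = {tajimas_d(spectrum):.2f}")
```

The spectrum proportional to 1/i gives a value of zero, an excess of rare variants gives a negative value, and an excess of intermediate-frequency variants gives a positive value, matching the interpretation given in the text.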
3.3 Genome scan for association and selection analysis
We used baypass (Gautier, 2015) on pooled data to retrieve markers displaying high levels of differentiation (XTX) and association (empirical Bayesian p-values; eBPis) with five different features (elevation, being GHB, being BNB, being LBHB, being Mauritian). Overall, this pooled data set included a total of 284 individuals sequenced at an average depth of ~250× over all pools. We examined Z and autosomal markers separately due to their distinct demographic histories. The results revealed a strong association of Z-linked markers with geographical forms and their associated colour phenotypes and elevation (Figure 4). SNPs discriminating Mauritius from other populations were distributed uniformly along the genome. Most of the peaks displaying a large XTX were also found associated with elevation. Strikingly, the clearest peaks on chromosomes 2, 3 and 5 covered large genomic regions, spanning several hundreds of kilobases. The sex-linked region on chromosome 4A was the clearest outlier. Because the neo-W chromosome is highly divergent and found in all females, an excess of variants with frequencies correlated to the proportion of females in the pool is expected. This may lead to high differentiation between pools with different sex ratios. Despite this, the strong association with elevation and geographical form on chromosome 4A is genuine because the expected proportion of divergent W-linked alleles in each pool was not correlated with those variables in our experiment (Figure S9), making confounding effects unlikely.

To assess the possible role of these genomic regions in adaptation, we performed a GO enrichment analysis using the zebra finch (Taeniopygia guttata) annotation (Tables S3-S6). Genes found in regions associated with elevation displayed enrichment for GO terms linked to development, body growth, gene expression, RNA metabolism, DNA organization, immune-system development, haematopoiesis and haemoglobin synthesis (GO:0005833: haemoglobin complex, three of the four annotated genes found, p = .0031). For genes associated with each one of the parapatric lowland forms, we mostly found associations with immune response, metabolic process, haematopoiesis, reproduction, morphogenesis and development (Tables S4-S6).
To identify candidates for plumage colour variation between forms, we screened the genes found in outlier regions for GO terms linked to melanin synthesis and metabolism (GO:0042438, GO:0046150, GO:0006582). TYRP1 (tyrosinase-related protein 1), located on the Z chromosome, was the only gene that was systematically found associated with BNB, GHB and LBHB forms. Another gene, WNT5A, was found associated only with the GHB form.
4 DISCUSSION
4.1 Genetic structure and genomic islands of differentiation
Our results quantify the relative importance of autosomal and sex-linked genetic variation underlying a potential case of incipient speciation in the Reunion grey white-eye. We confirm previous findings based on microsatellite data regarding the existence of a fine-grained population structure, with low but significant FST between localities (Bertrand et al., 2014), and a clear divergence between populations from low and high elevations (Bertrand et al., 2016; Cornuault et al., 2015). These events of divergence have probably occurred in the last 430,000 years (Warren et al., 2006), and perhaps even more recently as this divergence time estimate was based on mitochondrial data that tend to yield overestimates of population splitting times (Moore, 1995). A striking result is the clear contrast between patterns of variation in autosomes and the Z chromosome, the latter discriminating more clearly the different geographical forms and systematically displaying markers associated with plumage colour phenotypes and elevational ranges. Similar results were found independently for both pooled and individual data sets, and with two different SNP callers. These observations combined with demographic analyses suggest that while extensive gene flow has occurred between populations from high and low elevations, the Z chromosome acts as a barrier to it.
4.2 Reproductive isolation between forms may explain divergence at the Z chromosome
Reproductive isolation between nascent species can arise as incompatibilities between interacting loci accumulate in the genome, as described by the Bateson–Dobzhansky–Muller model (Dobzhansky, 1934; Muller, 1940, 1942; Orr, 1996). Sex chromosomes are particularly likely to accumulate such incompatibilities, because the genes they harbour are not transmitted by the same rules between males and females (Seehausen et al., 2014) and are therefore prone to genetic conflicts. Indeed, sex chromosomes often harbour many genes causing disruption of fertility or lower viability in hybrids (Good, Dean, & Nachman, 2008; Masly & Presgraves, 2007; Storchová et al., 2004; Storchová, Reif, & Nachman, 2010), which may lead to increased divergence (Carling & Brumfield, 2008; Ellegren et al., 2012; Macholán et al., 2011), and faster emergence of post-zygotic isolation (Lima, 2014). Hemizygosity of sex chromosomes may also lead to the exposure of recessive incompatibilities in heterogametic individuals (Qvarnström & Bailey, 2009). This large effect of sex chromosomes has been a common explanation of Haldane's rule (Haldane, 1922; Orr, 1997; Coyne & Orr, 2004), which states that in hybrids, the heterogametic sex often displays a stronger reduction in fitness. An excess of highly differentiated regions on sex chromosomes, as is often observed during the early stages of speciation, therefore suggests a role for intrinsic barriers to gene flow in promoting divergence (e.g., Backström et al., 2010).
In addition, premating and prezygotic isolation affect sexually dimorphic traits that are under the control of sex-linked genes (Pryke, 2010). While this type of isolation would lead to divergence across the entire genome due to the isolation of gene pools, divergence is expected to be accelerated at loci controlling traits under sexual selection, especially if hybrids and backcrosses have lower mating success (Svedin, Wiley, Veen, Gustafsson, & Qvarnstrom, 2008). Finally, effective recombination rates are lower in sex chromosomes because they recombine in only one sex (males in birds), which facilitates linkage of genes involved in pre- and post-zygotic barriers. This linkage would ultimately promote reinforcement of isolation between species or populations (Pryke, 2010). Because of these processes, gene flow at isolating sex-linked loci is expected to be impeded between diverging populations, leaving a stronger signature of differentiation than in the rest of the genome (e.g., Mořkovský et al., 2018).
Our findings are in line with these theoretical expectations, and show an excess of highly differentiated markers on the Z chromosome that display evidence for resisting gene flow, a pattern that is also consistent with Haldane's rule (e.g., Carling & Brumfield, 2008). Importantly, highly differentiated autosomal markers were mostly associated with differences in elevation ranges but Z chromosome variation was found to be associated with both elevation and plumage colour differences between lowland forms. This suggests that the divergence between lowland and highland forms is due to both polygenic adaptation to different elevational ranges and behavioural differentiation (e.g., mating preferences) (Sæther et al., 2007).
We acknowledge that we have no direct evidence yet of assortative mating, character displacement or lower hybrid fitness in contact zones between the different geographical forms of our study species, but: (a) a previous study showed that Reunion grey white-eyes can perceive colour differences between forms, and that both within- and between-form differences in plumage colour can be discriminated (Cornuault et al., 2015); (b) hybrid zones are narrow among lowland forms (Delahaie et al., 2017) and quite likely also between lowland and highland forms (Bertrand et al., 2016), suggesting that hybrid phenotypes have lower fitness; and (c) high gene flow within forms can erase signatures of character displacement if alleles responsible for premating isolation are nearly neutral in populations that are distant from contact zones (Servedio & Noor, 2003).
4.3 Role of drift and recent selective sweeps in Z chromosome divergence
The higher differentiation observed on the Z chromosome may also be due to its reduced effective population size, which leads to faster drift and lineage sorting between forms. Such a mechanism may explain the generally higher rate of divergence observed on Z chromosomes (fast-Z effect). This effect can become even stronger in the presence of a recent bottleneck and population size change (Van Belleghem et al., 2018; Pool & Nielsen, 2007). However, we did not find any evidence for strong bottlenecks or abrupt changes in population sizes, making faster accumulation of divergent alleles in recently established populations unlikely. Moreover, another recent study on Z. borbonicus has also shown that there is weak support for a fast-Z effect in the clade to which this species belongs (Leroy et al., 2019). Another mechanism that might lead to stronger differentiation at Z-linked loci is female-biased dispersal, which is frequent in passerine birds (Greenwood, 1980). However, dispersal in Z. borbonicus is extremely reduced for both sexes (Bertrand et al., 2014), which should attenuate the contrast between autosomal and Z-linked markers. In addition, although it could lead to stronger differentiation between localities, such a mechanism alone is unlikely to explain why such a high proportion of variance is associated with forms on the Z chromosome. Finally, we do not observe higher differentiation on Z markers or autosomes in males as compared to all individuals (Table 1).
Lower effective recombination rates on the sex chromosome may enhance the effects of selection at sites linked to loci involved in ecological adaptation, further reducing diversity on the Z chromosome. In addition, a fast-Z effect in birds may result from positive selection on recessive beneficial alleles in heterogametic females (Dean, Harrison, Wright, Zimmer, & Mank, 2015). Faster drift on the Z chromosome may lead to a faster accumulation of incompatibilities (Janoušek et al., 2019), which will further increase divergence. Ultimately, drift and selection are interconnected processes that are difficult to disentangle. Unfortunately, our data set alone cannot provide a detailed understanding of how processes such as linked selection, chromosomal rearrangements and barriers to gene flow interact (Bierne et al., 2011; Ravinet et al., 2017). Future studies using whole-genome resequencing data should provide a clearer picture, by providing information on genealogies and age of both autosomal and sex-linked haplotypes.
However, we note that the allele frequency spectra of lowland and highland forms do not display the expected signature of pervasive linked selection, with Tajima's D actually higher for Z markers than for autosomes. Our demographic analyses also show that the effective population sizes estimated from Z markers are 0.38–0.89 times those estimated from autosomes, the neutral expectation being 0.75. We acknowledge that polygenic adaptation to divergent environmental pressures is probably partly responsible for the observed genomic landscape of differentiation, possibly leading to localized divergence at autosomal and Z loci (see below). However, reproductive isolation seems to be at play in this system, through premating isolation based on plumage characters, post-zygotic isolation, or both. More research about the ecology of the Reunion grey white-eye would also be useful to quantify the extent of sex-biased dispersal, assortative mating, parental imprinting, intrinsic incompatibilities, and how these factors may interact in this system (Pryke, 2010; Seehausen et al., 2014).
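The Z-to-autosome ratios of effective population size can be recomputed from the best estimates of 2N in Table 3. A minimal Python sketch, with illustrative variable names:

```python
# Best estimates of 2N from Table 3
autosomal = {
    "Mauritius": 1_081_493, "Low": 2_957_776, "High": 606_586,
    "anc_Reu": 1_266_228, "anc_Reu+Mau": 1_022_327,
}
z_linked = {
    "Mauritius": 640_280, "Low": 1_595_330, "High": 228_792,
    "anc_Reu": 996_150, "anc_Reu+Mau": 905_370,
}

# With an equal sex ratio there are three Z copies for every four autosomal copies.
NEUTRAL_EXPECTATION = 3 / 4

ratios = {pop: z_linked[pop] / autosomal[pop] for pop in autosomal}
for pop, ratio in ratios.items():
    print(f"{pop:12s} Z/A = {ratio:.2f} (neutral expectation {NEUTRAL_EXPECTATION:.2f})")
```

The lowest ratio is obtained for the highland population and the highest for the population ancestral to Reunion and Mauritius, so the extant populations all fall below the neutral expectation while the ancestral ones lie slightly above it.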
4.4 Autosomal divergence is mostly associated with elevation
Adaptation to local environmental conditions or natural selection against hybrids is more likely to be driven by genes scattered across the genome, assuming polygenic selection (Qvarnström & Bailey, 2009; Seehausen et al., 2014). In addition to Z-linked loci, we found many differentiated loci on autosomes, mostly in association with elevation. Genomic regions associated with elevation displayed an enrichment of genes involved in development and body growth, and included the cluster of haemoglobin subunits A, B and Z on chromosome 14. The function of these genes is consistent with biological expectations, given the wide elevational range (from 0 to 3,000 m) occupied by white-eyes on Reunion.
Large chromosomal rearrangements are powerful drivers of differentiation, as they prevent recombination between several consecutive genes, facilitating the maintenance of divergent allele combinations between populations. Famous examples of adaptive inversions facilitating the maintenance of colour (geographical) forms and species have been reported in Heliconius butterflies (Joron et al., 2011) and the white-throated sparrow (Tuttle et al., 2016), and these rearrangements have been predicted to take place in isolation followed by secondary contact (Feder, Gejji, Powell, & Nosil, 2011). In this study, autosomal regions associated with geographical forms and elevation sometimes spanned more than 1 Mb. This suggests a possible role for large-scale rearrangements and linked selection in regions of low recombination as a substrate for divergence in Reunion grey white-eyes. Our results recall, at a much smaller spatial scale, what has been previously observed in Ficedula flycatchers, with high differentiation on the Z chromosome, linked selection and chromosomal rearrangements (Burri et al., 2015; Ellegren et al., 2012).
The low number of Z-linked markers found in this study and their higher level of population differentiation may limit the interpretation of results when comparing patterns of divergence with autosomes. Whole-genome resequencing data and refined demographic models building on those used in this study will be critical to precisely quantify the evolutionary dynamics of the Z chromosome compared to autosomes, and identify at a higher resolution the loci displaying strong divergence. Future studies should also focus on the variation in allele frequencies along hybrid zones and test whether loci that are more likely to be involved in local adaptation (such as immune genes or haemoglobin subunits) display changes in frequencies that are as sharp as those of Z-linked loci, as the latter are more probably involved in both pre- and post-zygotic isolation. Overall, our results suggest an extreme case of divergence with gene flow that can bring valuable insights into the relative order in which pre- and post-zygotic isolation mechanisms arise during speciation.
4.5 Genetics of colour
TYRP1, a well-characterized colour gene in both model species and natural populations (Abolins-Abols et al., 2018; Backström et al., 2010; Delmore, Toews, Germain, Owens, & Irwin, 2016; Nadeau, Mundy, Gourichon, & Minvielle, 2007), was the only known colour gene found in the regions associated with each of the three lowland forms. This gene had been previously studied using a candidate gene approach in Z. borbonicus, but no clear association with plumage colour phenotypes was found, probably owing to the low levels of polymorphism displayed among lowland forms (Bourgeois et al., 2016). The WNT5A gene was found associated with the GHB form, with a brown back and a grey head. This gene is known to regulate TYRP1 expression (Zhang et al., 2013) and is differentially expressed between black carrion and grey-coated hooded crows (Poelstra, Vijay, Hoeppner, & Wolf, 2015). However, this gene is not only involved in melanogenesis but also in cell migration and differentiation, making it a less straightforward candidate in this system.
Our pooled RAD-seq approach covered only about 10%–20% of the genome, and may therefore have missed some colour loci. The likelihood is high, however, that recent selective sweeps would have been detected if such genes had been targeted by selection. For example, the locus underlying colour polymorphism in the high-elevation form shows clear signs of a selective sweep reducing diversity over 500 kb (see Figure 3 in Bourgeois et al., 2017), a region which is large enough to be covered by tens of RAD-seq loci. However, cases of long-term balancing selection may be harder to detect because of the short haplotypes typically found in this case due to extensive recombination. Together with previous findings (Bourgeois, Bertrand, Thébaud, & Milá, 2012; Bourgeois et al., 2017), this suggests that a large part of plumage colour variation between the geographical forms of the Reunion grey white-eye may be controlled by a set of a few loci of major effect. More detailed studies of hybrid zones between the different lowland forms may help to characterize the exact association of alleles that produce a given plumage colour phenotype.
ACKNOWLEDGMENTS
We thank Joseph Manthey, Stéphane Boissinot, Maëva Gabrielli and Thibault Leroy for insightful comments that improved the manuscript. We also thank the Reunion National Park for granting us permission to conduct fieldwork and to collect blood samples. Thomas Duval, Guillaume Gélinaud, Josselin Cornuault, Philipp Heeb, Dominique Strasberg, Ben Warren and Juli Broggi assisted with fieldwork. Emeline Lhuillier and Olivier Bouchez assisted with the development of pooled RAD-seq. This research was carried out on the High-Performance Computing resources at New York University Abu Dhabi and the Genotoul HPC cluster. This work was supported by Fondation pour la Recherche sur la Biodiversité (FRB), Agence Française pour le Développement (AFD), Agence Nationale de la Recherche (ANR-2006-BDIV002), Centre National de la Recherche Scientifique (CNRS) through a PEPS grant, The National Geographic Society, and the “Laboratoire d'Excellence” TULIP (ANR-10-LABX-41). The first author was supported by an MESR (Ministère de l'Enseignement Supérieur et de la Recherche) PhD scholarship during this study.
AUTHOR CONTRIBUTIONS
B.M. and C.T. initiated, coordinated and supervised the project; Y.B., B.M. and C.T. conceived the study and designed the experiments; B.M., C.T., Y.B., J.A.B. and B.D. conducted the fieldwork; molecular data were generated by Y.B. and H.H.; Y.B. analysed the data; and Y.B., B.M. and C.T. wrote the paper with input from the other authors. All authors gave final approval for publication.
Open Research
DATA AVAILABILITY STATEMENT
All data associated with this manuscript are published on DRYAD (VCF files, allele counts for pooled data and position of Z. lateralis scaffolds on zebra finch chromosomes; https://doi.org/10.5061/dryad.z34tmpg8z) and European Nucleotide Archive (BAM files for pools and fastq files for individual GBS data; accession number PRJEB36701, https://www.ebi.ac.uk/ena/browser/view/PRJEB36701).