Phylogeographic relationships and morphological evolution between cave and surface Astyanax mexicanus populations (De Filippi 1853) (Actinopterygii, Characidae)
William R. Elliott: Retired.
Abstract
The Astyanax mexicanus complex includes two different morphs, a surface- and a cave-adapted ecotype, found at three mountain ranges in Northeastern Mexico: Sierra de El Abra, Sierra de Guatemala and Sierra de la Colmena (Micos). Since their discovery, multiple studies have attempted to characterize the timing and the number of events that gave rise to the evolution of these cave-adapted ecotypes. Here, using RADseq and genome-wide sequencing, we assessed the phylogenetic relationships, genetic structure and gene flow events between the cave and surface Astyanax mexicanus populations, to estimate the tempo and mode of evolution of the cave-adapted ecotypes. We also evaluated the body shape evolution across different cave lineages using geometric morphometrics to examine the role of phylogenetic signal versus environmental pressures. We found strong evidence of parallel evolution of cave-adapted ecotypes derived from two separate lineages of surface fish and hypothesize that there may be up to four independent invasions of caves from surface fish. Moreover, a strong congruence between the genetic structure and geographic distribution was observed across the cave populations, with the Sierra de Guatemala being the region that exhibited the most genetic drift among the cave populations analysed. Interestingly, we found no evidence of phylogenetic signal in body shape evolution, but we found support for parallel evolution in body shape across independent cave lineages, with cavefish from the Sierra de El Abra reflecting the most divergent morphology relative to surface and other cavefish populations.
1 INTRODUCTION
Cave-dwelling animals provide a unique opportunity to study phenotypic evolution in response to known environmental constraints such as the lack of light or reduced nutrients (White et al., 2019). Such conditions impose strong selective pressures on animals inhabiting these environments (e.g. Carlini & Fong, 2017; Gross et al., 2009, 2013; Stern & Crandall, 2018a, 2018b; Tabin, 2016). Common traits in cave-dwellers include regressive features like loss of eyes and body pigmentation (Jeffery, 2009; Klaus et al., 2013; McGaugh et al., 2014), constructive traits like increased non-visual senses (Gonzalez et al., 2018; Soares & Niemiller, 2020; Yoshizawa et al., 2012, 2013) and a variety of behavioral and physiological changes (Duboué et al., 2011; Elipot et al., 2013, 2014; Hyacinthe et al., 2019; Jaggard et al., 2017; Jeffery, 2020; Kowalko, 2019; O'Quin et al., 2013; Shiriagin & Korsching, 2019). Cave-adapted organisms also provide the opportunity to study how phenotypic evolution can result from historical contingencies such as phylogenetic inertia or genetic drift.
Cavefish of the Astyanax mexicanus complex include 32 geographically distinct cave populations found in the Sierra de El Abra, Sierra de Guatemala and Sierra de la Colmena (Micos) in Northeastern Mexico (Elliott, 2018; Espinasa et al., 2020). Several studies propose cave forms of northeastern Mexico have evolved through at least two invasions of two different surface lineages (e.g. Avise & Selander, 1972; Bradic et al., 2012; Ornelas-García & Pedraza-Lara, 2016; Ornelas-García et al., 2008; Herman et al., 2018; Strecker et al., 2003). A number of open questions remain, including whether extant surface and cave populations correspond to the same species, as well as the timing and mode of evolution of the cave populations (i.e. number of cave colonizations). For example, several taxonomic designations have been proposed, including the delimitation of species from one: A. fasciatus sensu (Strecker et al., 2004) to five distinct species: Astyanax argentatus (Bravo-Conchos basins and Nazas Aguanaval), Astyanax jordani (troglobitic forms), Astyanax mexicanus (sensu Miller et al., 2005), Astyanax rioverde (Río Verde, S.L.P.) and Astyanax tamiahua (Rivers in Tamaulipas, Tamiahua lagoon) (see Schmitter-Soto, 2016, 2017).
The taxonomic uncertainty within the group is likely due to the lack of a robust phylogenetic hypothesis, which requires exhaustive sampling across extant cave populations. Additionally, there is controversy surrounding the number of invasions of surface populations into caves. The consensus is that at least two surface lineages have contributed to the current cave population distribution, but the number of hypothesized origins of the cave phenotype ranges from 1 to 5 (e.g. Avise & Selander, 1972; Bradic et al., 2012; Coghill et al., 2014; Fumey et al., 2018; Herman, 2018; Herman et al., 2018; Pérez-Rodríguez et al., 2021). Disparities in the literature are most likely explained by differences in sampling and in the markers used, which include mtDNA (Dowling et al., 2002; Ornelas-García et al., 2008; Strecker et al., 2004), SNPs and simple sequence repeats (microsatellites; Bradic et al., 2012; Strecker et al., 2012), transcriptomes (Fumey et al., 2018), RADseq loci (Coghill et al., 2014) and resequenced genomes (Herman et al., 2018; Moran et al., 2023).
Further, adequate representation of surface variation has been lacking, since only one population from the two identified surface lineages has been considered (i.e. Rascón population, Ornelas-García et al., 2008; Herman et al., 2018), alongside a wider representation of the cave populations.
Moreover, there is a lack of consensus regarding the timing of cave colonization, as suggested dates span from millions to thousands of years in the past (see Fumey et al., 2018; Herman et al., 2018; Ornelas-García et al., 2008; Strecker et al., 2004). The most recent studies (using genomic data) suggest a very recent origin. Fumey et al. (2018) suggest cave populations originated <20 kya and found no support for two independent lineages. In contrast, Herman et al. (2018) reported two different cave lineages with similar divergence times from their closest surface populations—between 161 and 191 k generations ago.
Prior studies identified examples of morphological convergence among different cave animals (e.g. fishes and salamanders), including dorsoventral head-flattening and duck-bill-like rostral flaring (Christiansen, 2012; Edgington & Taylor, 2019; Fenolio et al., 2013; Soares & Niemiller, 2020). More recently, it was shown that cave forms from two independent lineages in Amblyopsidae fishes have a common body shape pattern, with long and slim heads and bodies in contrast with their surface counterparts (Hart et al., 2020). Previous studies have suggested a parallel evolution of troglobitic traits across lineages of Astyanax mexicanus (Gross et al., 2009; Jeffery, 2009; Powers et al., 2020); however, so far body shape evolution has not been explored in a phylogenetic context. Thus, the mechanisms underlying body shape evolution in cave environments of Astyanax, and how much is explained by their evolutionary history versus environmental pressures, remain unknown.
Here, we present the most exhaustive sampling of cave and surface Astyanax mexicanus populations to date to address these unanswered questions. Using RADseq and genome-wide sequencing for 360,651 SNPs, we reconstructed the phylogenetic relationships based on 152 surface and cave-adapted Astyanax mexicanus individuals using maximum likelihood (ML), Splits network and coalescent species tree estimation methods (SVDQuartets). Next, we assessed body shape evolution across different cave lineages using geometric morphometrics, to evaluate whether body shape has evolved in parallel in response to the cave environments or reflects historical contingencies (i.e. phylogenetic inertia). We found strong evidence of parallel evolution for cave populations, derived from two separate lineages of surface fish, and we hypothesize that there may have been up to four independent invasions of caves from surface fish. Interestingly, we found evidence of parallel evolution in the body shape across independent cave lineages, with cavefish from the Sierra de El Abra reflecting the most divergent morphology relative to the surface and other cavefish populations.
2 MATERIALS AND METHODS
2.1 Sampling and sequencing
We included a total of 152 individuals of the Astyanax mexicanus complex (Table S1; Figure 1a), from 20 cave populations in the Sierra de El Abra (14 caves), Sierra de Guatemala (five caves) and Micos (one cave); nine surface populations of the Pánuco River basin and six surface locations from Bravo-Conchos basins in northern Mexico. Additionally, four specimens were used as outgroups: two samples from the Tehuantepec basin, corresponding to Astyanax sp. nov., and two samples from the Astyanax aeneus lineage from the Balsas River basin (Herman et al., 2018). For the collection of cave specimens, permission was obtained from the competent authorities (SEMARNAT SGPA/DGVS/2438/15–16, SGPA/DGVS/05389/17 and SGPA/DGVS/1893/19). The specimens were cataloged and deposited in the National Fish Collection of the Institute of Biology, U.N.A.M. (LANABIO, IB-UNAM, Mexico City).

We used two sequencing strategies: restriction site-associated DNA sequencing (RADseq; Davey et al., 2010) and whole-genome shotgun (WGS) resequencing. For the 68 WGS samples, we collected fin clips from wild-caught fish, flash-froze the samples in liquid nitrogen and stored them at −72°C. DNA was extracted using the DNeasy Blood and Tissue kit (Qiagen). These 68 samples were sequenced with 150 bp paired-end (PE) reads on an Illumina NovaSeq 6000 at the University of Minnesota Genomics Center (UMGC). For 15 fin clips fixed in ethanol, WGS resequencing was performed through BGI Americas on the X-ten experimental platform, generating 150 bp PE reads at an estimated sequencing depth of ~30×. For three samples of Caballo Moro, WGS data were obtained as 100 bp PE reads on an Illumina HiSeq 2500. Samples were prepared for Illumina sequencing by barcoding each sample using Illumina's TruSeq Nano kit. For RADseq, genomic DNA was extracted from 36 fin clip samples using the 4.5 M sodium chloride (NaCl) extraction method (Sonnenberg et al., 2007).
The selection of the restriction enzyme was based on in silico digestion simulations, testing different enzymes on the Astyanax mexicanus version 2.0 complete genome (Warren et al., 2021), using the R package SimRAD (Lepais & Weir, 2014). For RADseq, the DNA samples were digested using the MspI restriction enzyme, prepared for Illumina sequencing using single barcoding, and processed with Illumina's TruSeq Nano DNA Sample Preparation Kit, using V3 reagents. The samples were multiplexed and sequenced as 150 bp single-end reads on the Illumina NextSeq 500 platform. This resulted in an average of 4.6 million reads per sample. Library preparation and sequencing were carried out by the UMGC. Additionally, 30 resequenced genomes of Astyanax mexicanus from Herman et al. (2018) were included in this study, for a total of 152 samples (Table S2; Figure 1a). Genomes were downsampled to match the RADseq loci, and only the RADseq loci were analysed. A companion paper (Moran et al., 2022) provides an in-depth analysis of WGS data.
2.2 Bioinformatics processing
2.2.1 Quality assessment and trimming
The quality of the reads was assessed using the FASTQC program (Andrews, 2010) for both WGS and RADseq data. Based on the quality of the data, raw FASTQ sequences were trimmed with TRIMMOMATIC version 0.39 (Bolger et al., 2014). A sliding window clipping approach was used with the default settings, which cuts bases within the size interval once the average quality falls below a set threshold. The adapters used for the sequencing process were also searched for and removed, and the first 14 bases of all reads were cut off, regardless of quality. Only reads that were at least 75 bases long were retained. After trimming, the quality of the trimmed samples was reassessed to ensure that it was sufficient for the subsequent steps of the bioinformatic pipeline.
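The trimming logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the TRIMMOMATIC implementation: the window size (4) and mean-quality threshold (15) are assumed values that the text does not state, the order of the steps (head crop, then sliding window, then length filter) is likewise an assumption, and adapter removal is omitted.

```python
from typing import List, Optional

HEAD_CROP = 14       # bases removed from the start of every read (from the text)
MIN_LENGTH = 75      # minimum read length retained (from the text)
WINDOW_SIZE = 4      # ASSUMED window size; not stated in the text
MIN_MEAN_QUAL = 15   # ASSUMED mean-quality threshold; not stated in the text


def sliding_window_cut(quals: List[int], window: int = WINDOW_SIZE,
                       threshold: float = MIN_MEAN_QUAL) -> int:
    """Return how many leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose
    mean quality falls below the threshold. Reads shorter than one window
    are returned unchanged.
    """
    for start in range(0, max(len(quals) - window + 1, 0)):
        if sum(quals[start:start + window]) / window < threshold:
            return start
    return len(quals)


def trim_read(seq: str, quals: List[int]) -> Optional[str]:
    """Apply head crop, sliding-window cut and minimum-length filter.

    Returns the trimmed sequence, or None if the read is discarded.
    """
    seq, quals = seq[HEAD_CROP:], quals[HEAD_CROP:]
    keep = sliding_window_cut(quals)
    seq = seq[:keep]
    return seq if len(seq) >= MIN_LENGTH else None
```

With these assumed settings, a 100-base read of uniformly high quality keeps 86 bases after the head crop, whereas a read whose second half is of very low quality falls below the 75-base minimum and is discarded.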
2.2.2 Alignment, variant calling and genotyping
Trimmed samples were aligned using the Burrows–Wheeler alignment tool with the mem algorithm, implemented in the bwa-0.7.1 program (Li & Durbin, 2009, 2010), to the draft surface fish genome of Astyanax mexicanus version 2.0 (Warren et al., 2021). Subsequently, samtools (Li et al., 2009) and Picard v1.83 (http://broadinstitute.github.io/picard/) were used for downstream processing and variant calling. Variant calling and genotyping were carried out using the HaplotypeCaller and GenotypeGVCFs modules, respectively, from GATK version 4.1.2.0 (McKenna et al., 2010). For the processing of all the samples, the RAD loci from RADseq samples were first identified, filtered and merged into a single gVCF format file, using the GATK and BCFTOOLS programs (Danecek et al., 2014). The RAD loci were first constructed by calling variants and genotyping each of the RADseq samples in GATK, preserving the invariant sites, and subsequently, samples were merged with BCFTOOLS, discarding those sites with a depth of less than 10 reads. The genome positions of the conserved RAD loci were obtained and used for variant calling and genotyping of the WGS and RADseq samples.
The individually genotyped gVCF files were merged into a single dataset using the merge function of BCFtools program (Danecek et al., 2014). In this unified dataset, regions with a depth below 10 reads were discarded. Similarly, insertions and deletions (INDELs), invariant sites and sequences with mapping quality below 40 were removed. To avoid misleading results, linked SNPs were filtered (LD > 0.2).
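As a minimal sketch of the site-level filters listed above (depth, mapping quality, INDELs and invariant sites), the following Python function works on simplified site records. The dictionary keys (`depth`, `mapq`, `ref`, `alts`) are hypothetical names chosen for illustration and do not correspond to any particular VCF library; the LD-based pruning step is not shown.

```python
from typing import Dict, List

MIN_DEPTH = 10   # sites with depth below 10 reads are discarded (from the text)
MIN_MAPQ = 40    # sites with mapping quality below 40 are discarded (from the text)


def keep_site(site: Dict) -> bool:
    """Return True if a site passes the depth, MAPQ, INDEL and variability filters.

    `site` is a hypothetical record with keys: depth, mapq, ref, alts.
    """
    if site["depth"] < MIN_DEPTH:
        return False
    if site["mapq"] < MIN_MAPQ:
        return False
    if not site["alts"]:
        return False  # invariant site: no alternative allele
    if len(site["ref"]) != 1 or any(len(a) != 1 for a in site["alts"]):
        return False  # insertion or deletion
    return True


def filter_sites(sites: List[Dict]) -> List[Dict]:
    """Keep only the sites that pass every filter."""
    return [s for s in sites if keep_site(s)]
```

In practice these filters would be applied with BCFtools or GATK on the merged file; the sketch only makes the combined pass/fail rule explicit.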
2.3 Phylogenetic relationships
We used RAxML-HPC2 on XSEDE (version 8.2.10; Stamatakis, 2014) on the CIPRES Science Gateway (version 3.3; https://www.phylo.org/) to infer phylogenetic relationships with the GTRCAT model of sequence evolution (a computational workaround for the GTR model of nucleotide substitution under the GAMMA model of rate heterogeneity; Stamatakis, 2016) and 100 fast bootstrap replicates as a measure of tree support. Since SNP data do not contain invariant sites, branch lengths in the phylogenetic reconstruction could be overestimated, so we applied an ascertainment bias correction (Lewis, 2001) to the likelihood calculations in RAxML, following the recommendations of the manual.
2.4 Coalescent-based species tree reconstruction
Coalescent-based species tree reconstruction was conducted using SVDQuartets in PAUP* version 4.0a (Chifman & Kubatko, 2014, 2015; Swofford, 2002). SVDQuartets uses site pattern frequencies in SNP loci to infer singular value decomposition (SVD) scores among alternative unrooted quartet trees (with lower scores being better). The generated quartet topologies are then merged into a full tree containing all samples in the phylogeny. We selected a subset of 70 samples (Table S3), which in our ADMIXTURE analyses (described below) did not show evidence of introgression with other populations, so we included some populations that previously were reported as introgressed in the literature (e.g. Tinaja, Micos and Caballo Moro). We sampled 100,000 quartets from the dataset, and support for nodes was assessed using nonparametric bootstrapping with 1000 replicates. The trees were rooted with two samples from the Tehuantepec basin, corresponding to Astyanax sp. nov., and two samples from the Astyanax aeneus lineage from the Balsas River basin.
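To put the quartet sampling in context, the short Python sketch below counts how many distinct four-sample sets exist for 70 samples and draws a random subset of them. With 70 samples there are 916,895 possible quartets, so 100,000 sampled quartets correspond to about 11% of the total. The sampling function is illustrative only; sampling distinct quartets without replacement is an assumption and is not taken from the SVDQuartets documentation.

```python
import math
import random
from typing import Set, Tuple

N_SAMPLES = 70        # samples in the SVDQuartets subset (from the text)
N_QUARTETS = 100_000  # quartets sampled (from the text)

# Number of distinct unordered sets of four samples.
TOTAL_QUARTETS = math.comb(N_SAMPLES, 4)


def sample_quartets(n_samples: int, n_quartets: int,
                    seed: int = 0) -> Set[Tuple[int, ...]]:
    """Draw distinct quartets of sample indices uniformly at random.

    ASSUMPTION: quartets are sampled without replacement.
    """
    if n_quartets > math.comb(n_samples, 4):
        raise ValueError("more quartets requested than exist")
    rng = random.Random(seed)
    quartets: Set[Tuple[int, ...]] = set()
    while len(quartets) < n_quartets:
        quartets.add(tuple(sorted(rng.sample(range(n_samples), 4))))
    return quartets
```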
2.5 Phylogenetic network
Considering the possibility of introgression, hybridization, incomplete lineage sorting and recombination, strictly bifurcating trees might not reflect the evolutionary history of our model system (Herman et al., 2018; Huson & Bryant, 2005, 2006; Huson & Scornavacca, 2011; Morrison, 2011). Thus, we used phylogenetic networks to examine non-tree-like relationships (Huson & Bryant, 2006; Huson & Scornavacca, 2011; Morrison, 2011). A Neighbor-Net Splits network was created using uncorrected p distances from SNP data from the complete dataset (Table S2) including both cave and surface populations, in SplitsTree4 version 4.15.1 (Huson & Bryant, 2006), though analysis with all individuals is given in the supplementary material. We then visualized the Splits network using the rooted equal angle algorithm (Gambette & Huson, 2008), with 100 nonparametric bootstrap replicates to evaluate the support of network edges, and present only edges with at least 95% confidence.
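For readers unfamiliar with the distance used to build the network, the uncorrected p distance is simply the proportion of compared sites at which two sequences differ. A minimal Python sketch follows; the treatment of missing data (sites coded `N` are skipped) is an assumption, and heterozygous or ambiguity-coded genotypes are not handled here.

```python
def p_distance(seq_a: str, seq_b: str, missing: str = "N") -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has the `missing` symbol are ignored
    (ASSUMED handling of missing data).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == missing or b == missing:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared
```

For example, two sequences that differ at one of four comparable SNPs have a p distance of 0.25.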
2.6 Population structure and differentiation between cave and surface populations
We characterized the genetic structure and admixture using ADMIXTURE version 1.3.0 (Alexander et al., 2009, 2015; Alexander & Lange, 2011), by estimating individual ancestry coefficients that represent the proportions of an individual's genome that originated from multiple ancestral gene pools. We conducted 10 independent unsupervised runs for each value of K (number of ancestral population clusters) ranging from K = 1 to 20. We used PLINK (Purcell et al., 2007) to prepare the data in the correct format for the ADMIXTURE analysis. To choose the best value of K for each run, we used the cross-validation procedure of ADMIXTURE and presented the lowest cross-validation value across the 10 runs.
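The selection of K can be sketched as follows, assuming the cross-validation errors have already been parsed from the ADMIXTURE output into a mapping from K to the list of errors across the independent runs. The reading that the lowest error across runs is taken for each K, and the K with the lowest such value is then chosen, is our interpretation of the procedure; the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def best_k(cv_errors: Dict[int, List[float]]) -> Tuple[int, float]:
    """Return (K, error) for the K with the lowest cross-validation error.

    cv_errors maps each K to the CV errors of its independent runs.
    For each K the lowest error across runs is kept (ASSUMED reading of
    the text), and the K minimizing that value is returned.
    """
    lowest_per_k = {k: min(errors) for k, errors in cv_errors.items()}
    k = min(lowest_per_k, key=lowest_per_k.get)
    return k, lowest_per_k[k]
```

A call such as `best_k({2: [0.52, 0.51], 3: [0.48, 0.49], 4: [0.50, 0.53]})` uses made-up toy values and returns K = 3 with an error of 0.48.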
As a second, independent approach to assess population structure, we carried out a Discriminant Analysis of Principal Components (DAPC; Jombart et al., 2010) using the dapc function for genlight objects in the adegenet package (Jombart & Ahmed, 2011). This analysis is based on the decomposition of genetic data into principal components and the construction of synthetic variables called discriminant functions. The aim is to identify the alleles that best discriminate a set of predefined clusters (Jombart et al., 2010). For this analysis, the clusters obtained with the ADMIXTURE analysis for the best K value among the 10 runs were used to designate the membership (>50%) of each individual. The same data used for the ADMIXTURE analysis were used for the DAPC analysis and were imported into a genlight object using the read.vcfR and vcfR2genlight functions of the vcfR package (Knaus & Grünwald, 2017).
The estimation of the principal components as a preliminary step allows the discriminant functions to be uncorrelated. However, the number of principal components used must be sufficient to ensure correct discrimination between groups, without overfitting the model (Jombart & Collins, 2015). To select the number of principal components to be used, a cross-validation analysis was carried out. In this test, the data were divided into two sets: a training set (90% of the total data) and a validation set (the remaining 10%). The validation set was selected by stratified random sampling, which guarantees that at least one member of each cluster is represented in both sets (Jombart & Collins, 2015). To identify the optimal number of principal components to retain, we carried out the DAPC cross-validation with 30 repetitions of the training set with variable numbers of retained principal components and evaluated (through the root-mean-square error) the success of the membership prediction for the individuals of the validation set (Jombart & Collins, 2015).
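The stratified 90/10 split described above can be illustrated with a short Python sketch. It is not the adegenet implementation: the function name is hypothetical, and the rounding rule used to decide how many individuals per cluster go to the validation set is an assumption. The only properties taken from the text are the 90/10 proportion and the requirement that every cluster is represented in both sets.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(labels: Sequence[Hashable], train_frac: float = 0.9,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split individual indices into training and validation sets by cluster.

    Each cluster contributes at least one individual to each set.
    Returns (training indices, validation indices), both sorted.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for index, label in enumerate(labels):
        by_cluster[label].append(index)

    train: List[int] = []
    valid: List[int] = []
    for label, members in by_cluster.items():
        if len(members) < 2:
            raise ValueError(f"cluster {label!r} needs at least two individuals")
        rng.shuffle(members)
        # ASSUMED rounding rule: nearest integer, bounded so that both sets
        # receive at least one member of the cluster.
        n_valid = round(len(members) * (1 - train_frac))
        n_valid = min(max(n_valid, 1), len(members) - 1)
        valid.extend(members[:n_valid])
        train.extend(members[n_valid:])
    return sorted(train), sorted(valid)
```

Repeating such a split 30 times for each candidate number of principal components, and scoring the membership predictions on the validation set each time, gives the cross-validation curve from which the number of retained components is chosen.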
2.7 Ancient and recent gene flow
We used TREEMIX version 1.12 (Pickrell & Pritchard, 2012) to visualize migration events after removing individuals with signals of recent hybridization. For a subset of individuals (113 individuals, Table S4) corresponding to the main lineages, biallelic SNPs were imported into R (R Core Team, 2019) and transformed into a genlight object using the vcfR library (Knaus & Grünwald, 2017). We then used the gl2treemix function from the dartR package (Gruber & Georges, 2019) to convert the dataset into the correct input format for TREEMIX. Selecting the best migration model can be difficult, because adding migration events always improves the likelihood of the model but may explain little additional variation in the data. To select the number of migration events, we therefore used the OptM package version 0.1.3 (Fitak, 2021), which chooses the number of migration events using ad hoc statistics based on the second-order rate of change in likelihood (Fitak, 2021). In this work, we used Evanno's method (∆m) as the ad hoc statistic. Ten different numbers of migration events were evaluated in TREEMIX (with the −m flag, from −m1 to −m10), based on the change in the composite log-likelihood observed in Evanno's test (∆m) in OptM (Fitak, 2021). To take into account that nearby SNPs might not be independent, the analysis was carried out using different partitions of the dataset, set with the −k parameter in TREEMIX (−k = window size in number of SNPs). A total of 10 different values of −k were used, ranging from 600 to 1050 SNPs in increments of 50 SNPs. This procedure generated a total of 100 different migration models (10 different k partitions for each number of migration events). To evaluate the confidence in the topology estimated by TREEMIX, a consensus tree was built with Mesquite version 3.51 (Maddison & Maddison, 2019), using the topologies of the 100 estimated models.
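The model grid and the Evanno-style statistic can be made concrete with a short Python sketch. The grid reproduces the 10 × 10 combinations of migration events and window sizes given in the text. The `delta_m` function follows the general form of Evanno's statistic (mean absolute second-order difference in log-likelihood divided by the standard deviation of the log-likelihood across replicates); it is a sketch under that assumption and is not taken from the OptM source code, whose details may differ. Pairing replicates by position (that is, by window size) is also an assumption.

```python
from statistics import mean, stdev
from typing import Dict, List

M_VALUES = list(range(1, 11))          # migration events m = 1..10 (from the text)
K_VALUES = list(range(600, 1051, 50))  # SNP window sizes 600..1050 (from the text)

# Every combination of m and k: 10 x 10 = 100 models.
MODEL_GRID = [(m, k) for m in M_VALUES for k in K_VALUES]


def delta_m(loglik: Dict[int, List[float]]) -> Dict[int, float]:
    """Evanno-style delta-m for each interior number of migration events.

    loglik maps m to the composite log-likelihoods of its replicates,
    listed in the same order (here, the same window sizes) for every m.
    """
    ms = sorted(loglik)
    result: Dict[int, float] = {}
    for previous, current, following in zip(ms, ms[1:], ms[2:]):
        second_order = [
            abs(nxt - 2 * cur + prv)
            for prv, cur, nxt in zip(loglik[previous], loglik[current],
                                     loglik[following])
        ]
        spread = stdev(loglik[current])
        result[current] = mean(second_order) / spread if spread > 0 else float("inf")
    return result
```

The number of migration events with the highest ∆m is the one at which the gain in likelihood levels off most sharply.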
2.8 Inference of historic population dynamics and divergence times between lineages
Based on the genetic clustering of Astyanax mexicanus obtained with ADMIXTURE, we estimated the population dynamics and divergence times between pairs of clusters from different lineages through demographic modelling, using a composite-likelihood method based on the site frequency spectrum (SFS), as implemented in Fastsimcoal2 (Excoffier et al., 2013). We evaluated three alternative hypotheses about the divergence time of the two lineages in A. mexicanus: from 0.1 to 1 Mya (Herman et al., 2018); from 5 to 7 Mya (Ornelas-García et al., 2008); and an intermediate interval from 1 to 5 Mya. The pairwise comparisons for each divergence time interval were carried out under four different models, considering the comparisons within and between environments.
To conduct demographic modelling, we generated unfolded joint site frequency spectra (2D-SFS) with the easySFS software (https://github.com/isaacovercast/easySFS) using the filtered VCF file. First, we ran a preview to identify the values for projecting down each population. Then, we specified the projection values for each pair of genetic clusters. For each time interval, a total of 24 different pairwise comparisons between genetic clusters were fitted to their respective 2D-SFS to infer the timing of population coalescence and effective population sizes using Fastsimcoal2. The average generation time was set at 1 year per generation (Herman et al., 2018). Unknown parameters were estimated by specifying an interval for the effective population size (Ne), based on a demographic test described in Bradic et al. (2012), and a mutation rate of 3.5 × 10⁻⁹ per bp per generation, which was estimated using parent–offspring trios in cichlids (Malinsky et al., 2018), as in Herman et al. (2018). For the construction of the models, gene flow prior to coalescence was not considered.
Each pairwise comparison made with Fastsimcoal2 was run with 100,000 simulations and 100 cycles of conditional maximization for each model (e.g. surface Lineage 1 vs. surface Lineage 2), with a different coalescence time interval. Each comparison was run independently 100 times for parameter estimation. Since the number of parameters is the same between models, the likelihood values can be directly compared to select the best divergence time interval. After identifying the best model, we used a block-bootstrapping procedure to account for linkage between SNPs to get the coalescence time in the best model. We generated different bootstrap samples from the original VCF file, from which the frequency spectra were rebuilt for each pairwise comparison, and we ran the parameter estimation under the best model 100 times for each pairwise comparison with each of these bootstrapped SFSs. With those 100 best runs for each comparison under the best model, the distributions of the divergence times were obtained.
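The block-bootstrapping idea can be sketched as follows: SNPs are grouped into contiguous blocks, and whole blocks are resampled with replacement so that linked SNPs stay together in each bootstrap dataset. The block size and the use of fixed-length blocks of consecutive SNPs are assumptions made for illustration; the text does not specify how blocks were defined.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def block_bootstrap(snps: Sequence[T], block_size: int, seed: int = 0) -> List[T]:
    """Resample contiguous blocks of SNPs with replacement.

    snps is an ordered sequence of SNP records (any type). The result has
    the same number of blocks as the input, each drawn at random from the
    original blocks. ASSUMPTION: blocks are fixed-length runs of SNPs.
    """
    if block_size < 1:
        raise ValueError("block_size must be positive")
    rng = random.Random(seed)
    blocks = [list(snps[i:i + block_size]) for i in range(0, len(snps), block_size)]
    resampled = [rng.choice(blocks) for _ in blocks]
    return [snp for block in resampled for snp in block]
```

Each bootstrap dataset would then be converted back into an SFS and refitted under the best model, and the spread of the resulting divergence-time estimates gives the reported distributions.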
2.9 Geometric morphometrics
To characterize the morphological divergence between surface fish and cavefish, we performed landmark-based geometric morphometrics (GM) analyses on the left lateral view of the body, for a total of 146 individuals from 18 cave and surface populations that were wild-caught and had been kept in captivity for less than 3 years when the pictures were taken (Table S5). We digitized 12 homologous landmarks and a curve with 15 pseudo-landmarks with TPSDig2 version 2.31 (Rohlf, 2015; Table S6; Figure S1). A generalized Procrustes analysis (GPA) was performed with the ‘gpagen’ function in Geomorph version 4.02 (R Core Team, 2019); with the ‘curve’ argument, the sliders were defined using the Procrustes distance criterion to optimize the position of the sliding pseudo-landmarks during the GPA. To remove the effect of size due to allometry, the residuals of the regression of shape on centroid size (CS) were calculated with the function ‘procD.lm’. These residuals were then used to create an allometry-free shape dataset, on which all subsequent analyses were carried out. To assess the degree of variation in body shape between groups, we used principal component analysis with the ‘gm.prcomp’ function. This is an ordination analysis that captures the multidimensional variation that is inherent in body shape (Zelditch et al., 2004). Deformation grids were used to describe the changes in the morphospace at the ends of the axes, which allowed us to observe the differences in relative shape associated with the components that explain the greatest percentage of variance (Zelditch et al., 2004). We evaluated with MANCOVA the association of shape with ecotype (i.e. cave or surface), lineage (i.e. ‘old’ or ‘new’) and their interaction in R version 3.6.1 (R Core Team, 2019).
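Centroid size, the size measure used in the allometric correction, is the square root of the summed squared distances of all landmarks from their centroid. The Python sketch below computes it for 2D landmarks and performs the translation and scaling steps that precede rotation in a Procrustes superimposition. It is only an illustration of these two steps; rotation, sliding of pseudo-landmarks and the regression on centroid size are not shown, and the function names are hypothetical rather than Geomorph functions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def centroid(landmarks: List[Point]) -> Point:
    """Mean x and y coordinate of a landmark configuration."""
    n = len(landmarks)
    return (sum(x for x, _ in landmarks) / n, sum(y for _, y in landmarks) / n)


def centroid_size(landmarks: List[Point]) -> float:
    """Square root of the summed squared distances to the centroid."""
    cx, cy = centroid(landmarks)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in landmarks))


def center_and_scale(landmarks: List[Point]) -> List[Point]:
    """Translate a configuration to the origin and scale it to unit centroid size."""
    cx, cy = centroid(landmarks)
    size = centroid_size(landmarks)
    if size == 0:
        raise ValueError("all landmarks coincide")
    return [((x - cx) / size, (y - cy) / size) for x, y in landmarks]
```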
2.10 Linear discriminant analysis
A linear discriminant analysis (LDA) was used to compare surface and cave populations. First, a cross-validation test was performed to identify the number of principal components whose cumulative variance discriminates between groups optimally, as previously described. Subsequently, an LDA was conducted with 100 replicates, using the retained PCs from the body shape dataset. The membership probability based on the retained discriminant functions was plotted for each individual. Two different clusterings were tested: one with five groups (including three for the cave regions: Sierra de El Abra, associated with the ‘old’ lineage, and Sierra de Guatemala and Micos, associated with the ‘new’ lineage) and one with two groups corresponding to the ecotypes (cave vs. surface).
2.11 Phylogenetic signal of body shape
We evaluated the phylogenetic signal of body shape, working with a subset of data from organisms for which both morphological and genetic data were available (Table S5). First, the phylogenetic tree was reduced by removing terminals for which we had no morphological information (Figure S2). Then, a consensus shape was generated for each terminal, using as criteria the molecular results and the hydrogeological regions proposed by Elliott (2018), so that some terminals grouped more than one population (Table S5; Figure S2). To estimate the phylogenetic signal of body shape, we used Blomberg's K method as modified by Adams for multivariate data (Adams, 2014). The K statistic of Blomberg et al. (2003) estimates the strength of the phylogenetic signal in a dataset relative to what is expected under a Brownian motion model of evolution (Adams, 2014). Values of K range from 0 to ∞, with an expected value of 1.0 under Brownian motion. Values of K < 1.0 describe data with less phylogenetic signal than expected under Brownian motion, while values of K > 1.0 describe data with greater phylogenetic signal than expected. The significance of the observed K (Kobs) value is evaluated via permutations, where data at the tips of the phylogeny are randomized relative to the tree, and the values of Krand obtained for each permutation of the data are then compared with Kobs (Adams, 2014; Blomberg et al., 2003; Revell et al., 2008). For multivariate data, Adams (2014) proposed a distance-based equation, Kmult, which follows from the statistical equivalency between covariance-based and distance-based approaches for Euclidean data, summarizing the data matrices by their variances and covariances. The Ape package was used to handle our reduced phylogeny, and the phylogenetic signal of body shape was estimated using the ‘physignal’ function with 1000 permutations. A phylogenetically aligned component analysis (PACA) was constructed with the ‘gm.prcomp’ function, allowing a graphical visualization of the phylomorphospace. PACA maximizes the variation in the directions that describe the phylogenetic signal and conserves the Euclidean distances between observations in morphospace (Collyer & Adams, 2021).
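The permutation logic used to assess the significance of Kobs can be sketched generically in Python. The statistic itself is passed in as a function (for real data it would compute K or Kmult from the tip values and the tree, which is not implemented here), so the sketch only shows how tip data are shuffled relative to the tree and how the P value is obtained. Counting the observed value as one of the permutations is an assumption, and the function names are hypothetical, not part of Geomorph.

```python
import random
from typing import Callable, List, Sequence, Tuple


def permutation_p_value(k_obs: float, k_rand: Sequence[float]) -> float:
    """Proportion of permuted values at least as large as the observed one.

    ASSUMPTION: the observed value is counted as one permutation.
    """
    as_extreme = sum(1 for k in k_rand if k >= k_obs)
    return (1 + as_extreme) / (1 + len(k_rand))


def permute_tips(tip_values: Sequence[float],
                 statistic: Callable[[List[float]], float],
                 n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Shuffle tip data relative to a fixed tree and return (K_obs, P value).

    `statistic` is a user-supplied function that returns the signal
    statistic for tip values given in tree-tip order.
    """
    rng = random.Random(seed)
    k_obs = statistic(list(tip_values))
    shuffled = list(tip_values)
    k_rand = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        k_rand.append(statistic(shuffled))
    return k_obs, permutation_p_value(k_obs, k_rand)
```

A small P value indicates that the observed arrangement of shapes on the tree carries more signal than most random arrangements; a large P value, as would be expected when no phylogenetic signal is detected, indicates that it does not.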
3 RESULTS
3.1 Phylogenetic relationships
A total of 116 WGS samples were analysed: 86 samples were newly sequenced and 30 WGS samples from Herman et al. (2018) were reanalysed, resulting in an average of 95.41 M reads per individual (SD ± 12.96 M reads; min: 63.43 M, max: 128.97 M). An additional 36 RADseq samples had an average of 4.69 M reads per individual (SD ± 1.72 M reads; min: 2.08 M, max: 9.96 M reads). The average depth of mapping over the entire genome of Astyanax version 2.0 (Warren et al., 2021) was 14.17× per individual (SD ± 7.25). After filters were applied, a total of 360,651 biallelic SNPs were retained in a dataset of 152 samples, with 3.7% missing data.
3.2 Phylogenetic trees
The ML reconstruction included a total of 152 individuals, with Astyanax mexicanus from 20 cave and 14 surface populations, and four outgroup individuals from A. aeneus and Astyanax sp. nov. from the Tehuantepec basin (Table S1). The phylogeny of the A. mexicanus complex was recovered as monophyletic, with two highly supported lineages (i.e. Lineages 1 and 2, Figure 1b). Lineage 1 was composed of six clades (1A–F), which contained surface populations from northeastern Mexico and cave populations from Micos (Sierra de La Colmena) and Sierra de Guatemala. Clade 1A included the surface populations of the Bravo-Conchos basin, in northern Mexico: Río Bravo (Grande) in Texas, Río Hualahuises, Río San Fernando and Río Conchos. Clade 1B included surface populations from Río Choy, Río Coy, Río Tampaón and Río Mante from the Pánuco River basin. Additionally, three individuals from the Arroyo cave population, from the Sierra de El Abra, were grouped within this clade. Micos was recovered in Clade 1C (Subterráneo cave) as a sister group of Pánuco surface populations. Cave populations from Sierra de Guatemala were recovered in two different clades, which were congruent with the geographic proximity of their populations (Figure 1a). Thus, Jineo, Molino and Escondido caves were grouped in Clade 1D, and Caballo Moro and Vázquez caves were grouped in Clade 1E. Finally, Clade 1F included the surface population of the Santa Anita River in San Luis Potosí.
Lineage 2 was composed of five different clades (2A–E). The cave populations from the Sierra de El Abra formed a monophyletic group with four different, geographically congruent clades (Figure 1b). Clade 2A included individuals from the Pachón cave population, which is found in the northern region of the Sierra de El Abra, followed by Japonés and Yerbaniz, which formed a different clade, Clade 2B. Clade 2C recovered most of the cave populations of the middle region of the Sierra de El Abra: Tigre, Sabinos, Arroyo, Tinaja, Montecillos, Jos, Piedras and Palma Seca. Clade 2D included the cave populations of Chica, Chiquitita and Toro, located in the southern region of the Sierra de El Abra. Finally, Clade 2E included surface populations of this lineage, composed of Rascón, Gallinas, Tamasopo and Peroles populations. The SVDQuartets topology largely matched the ML topology with slightly differing support values, except for some clades that were not included in the analysis like 2D, including Chica, Chiquitita and Toro populations (Figure S3).
3.3 Phylogenetic network
The phylogenetic network (Figure S4) did not show a completely bifurcating pattern but rather several reticulate events (i.e. hybridization; Huson & Scornavacca, 2011) within cave and surface populations. Despite this reticulation, the network is congruent with the maximum likelihood and SVDQuartets inferences, with a bootstrap support of 100 on each edge. Reticulate events were present throughout the evolutionary history of Astyanax mexicanus and, given the large sampling in the Sierra de El Abra, several were observed within this region. Reticulate events were also observed between surface populations from Lineages 1 and 2. Clade 1E from the Sierra de Guatemala region (containing Caballo Moro and Vázquez) also showed a highly reticulate pattern, in contrast to the Gómez Farías region, which showed almost no reticulation among its caves. Conversely, some clades were clearly recovered as well-defined splits, such as Clades 1C, 1F and 2A. In general terms, there was a clear geographical congruence and phylogenetic signal across the network, segregating the populations according to their clades and geographic regions.
3.4 Population structure and differentiation between cave and surface populations
ADMIXTURE analysis supported the assignment of cave and surface populations of Astyanax mexicanus to K = 11 genetic clusters (Figure 2), the value of K with the lowest cross-validation error (Table S7). This result is largely concordant with the major clades obtained in the ML reconstruction (Figure 1b) and with the neighbor network (Figure S4).
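The selection rule described above (keep the K with the lowest cross-validation error) can be sketched as follows. This is a minimal illustration only: the error values are hypothetical placeholders, not the values reported in Table S7, and the function name `best_k` is ours.

```python
# Hypothetical cross-validation errors per K.
# These are placeholders for illustration, NOT the values in Table S7.
cv_errors = {9: 0.412, 10: 0.405, 11: 0.398, 12: 0.401, 13: 0.409}


def best_k(cv_by_k):
    """Return the K whose cross-validation error is lowest."""
    return min(cv_by_k, key=cv_by_k.get)


print(best_k(cv_errors))
```

In practice the error curve should also be inspected, since neighbouring values of K can have very similar errors.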

Cave and surface populations from Lineage 1 were assigned to five different clusters, all of them coincident with the clades recovered in the RAxML reconstruction; however, surface populations from Pánuco basin (Clade 1B, Figure 2b) showed evidence of admixture with the Subterráneo cave population (Clade 1C, Figure 2b). Interestingly, some individuals from Arroyo cave (Clade 2C) at the Sierra de El Abra showed slight levels of admixture with surface populations from the Pánuco basin of Lineage 1 (Clade 1B). Cavefish from Sierra de Guatemala were divided into two different genetic clusters, which correspond to the two clades recovered in the RAxML reconstruction (Subclades 1D and E, Figure 2b). As indicated in the phylogenetic network, these clusters from the Sierra de Guatemala region also showed evidence of admixture. Surface populations from the northeast of México (Bravo-Conchos, Clade 1A Figure 2b) formed a differentiated genetic cluster without evidence of admixture.
Cave populations from Lineage 2 were assigned to four different genetic clusters. Cave populations of Chica, Chiquitita and Toro formed a genetic cluster (Clade 2D, Figure 2b) with evidence of admixture with the surface Pánuco basin populations from Lineage 1 (Clade 1B) (similar to Moran et al., 2022), as well as with other cave populations of Sierra de El Abra. Most cave populations from the Sierra de El Abra region were grouped on a different cluster (Clade 2C) with some evidence of admixture with both Pánuco basin and Subterráneo cave populations (Clade 1B and Clade 1C). Japonés and Yerbaniz cave populations formed a distinct genetic cluster (Clade 2B) without evidence of introgression with the rest of the Sierra de El Abra populations. The same occurs for Pachón cave (Clade 2A), at the northern limit of the Sierra de El Abra.
The surface populations from Lineage 2 were assigned to two genetic clusters, with no evidence of intermixing. The first group corresponds to the Río Peroles population (2E), while the second group included the surface populations from Río Gallinas, Rascón, Tamasopo and Santa Anita (2E). Interestingly, samples from the Santa Anita population show evidence of intermixing with other genetic clusters, including geographically distant cave and surface populations, or those from different lineages (Figure 2b).
The genetic clusters found in the ADMIXTURE analysis were evaluated in the cross-validation test in DAPC, where 20 PCs were retained to construct 10 discriminant functions (with an accumulated variance of 63%). We recovered a clear segregation of the ADMIXTURE clusters in the discriminant space. Through a minimum spanning tree, it was possible to observe a large differentiation of Sierra de Guatemala caves (Clades 1D, 1E) from the rest of the genetic clusters recovered.
The minimum spanning tree shows that cave populations from Sierra de El Abra (Clade 2C) were connected to the surface populations from the same lineage (i.e. Río Gallinas, Rascón and Tamasopo; Clade 2E). Although the Chica, Chiquitita and Toro cave populations (Clade 2D) were connected to the Sierra de El Abra (Lineage 2) by the minimum spanning tree, they also showed proximity to the surface populations of Lineage 1 (i.e. Clade 1B). Similarly, Pachón (Clade 2A) and Subterráneo (Clade 1C) caves were closer to surface populations in the DAPC in contrast with other cave populations.
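The number of discriminant functions reported above follows from a general property of discriminant analysis: at most min(groups − 1, predictors) functions can be extracted. A minimal sketch, assuming the 11 ADMIXTURE clusters are the groups and the 20 retained PCs are the predictors (the function name is ours):

```python
def n_discriminant_functions(n_groups, n_predictors):
    """Upper bound on the number of discriminant functions.

    Discriminant analysis separates group centroids, which span at most
    (n_groups - 1) dimensions, and cannot exceed the number of predictors.
    """
    if n_groups < 2 or n_predictors < 1:
        raise ValueError("need at least two groups and one predictor")
    return min(n_groups - 1, n_predictors)


# 11 genetic clusters described by 20 retained PCs
print(n_discriminant_functions(11, 20))
```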
3.5 Ancient and recent gene flow
With TreeMix, we visualized the historic migration events among clades. Four migration events were favored (Figure 3a), based on the ad hoc test of ∆m with Evanno's method, with an explained variance of 99.48% of the data (Figure 3b). The relationships recovered between the genetic groups were similar to those observed in the RAxML reconstruction; however, in TreeMix reconstruction the surface populations from Lineage 2 (i.e. Río Gallinas, Rascón and Peroles) were recovered as a sister group of both lineages, instead of being nested within Lineage 2, as was recovered by the ML reconstruction.
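As an illustration of the ad hoc ∆m statistic mentioned above, the sketch below applies an Evanno-style second-order rate of change to replicate log-likelihoods for each number of migration edges m. It uses the common formulation based on replicate means, which may differ in detail from the implementation used in this study. The log-likelihood values are hypothetical and the function name `delta_m` is ours.

```python
from statistics import mean, stdev


def delta_m(loglik_by_m):
    """Evanno-style statistic for each interior value of m.

    loglik_by_m maps m (number of migration edges) to a list of
    log-likelihoods from replicate runs. For each m with both neighbours:
        delta = |mean L(m+1) - 2 * mean L(m) + mean L(m-1)| / sd(L(m))
    Values of m with zero variance among replicates are skipped,
    because the statistic is undefined there.
    """
    ms = sorted(loglik_by_m)
    avg = {m: mean(loglik_by_m[m]) for m in ms}
    out = {}
    for prev, cur, nxt in zip(ms, ms[1:], ms[2:]):
        sd = stdev(loglik_by_m[cur])
        if sd == 0:
            continue
        out[cur] = abs(avg[nxt] - 2 * avg[cur] + avg[prev]) / sd
    return out


# Hypothetical replicate log-likelihoods (placeholders, not the study's data)
runs = {
    1: [100.0, 101.0, 99.0],
    2: [120.0, 121.0, 119.0],
    3: [140.0, 141.0, 139.0],
    4: [160.0, 161.0, 159.0],
    5: [162.0, 163.0, 161.0],
    6: [164.0, 165.0, 163.0],
}
scores = delta_m(runs)
print(max(scores, key=scores.get))
```

In this toy data set the likelihood gain flattens after m = 4, so the statistic peaks there; the favoured m is the one with the largest ∆m.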

The results from TreeMix illustrate a complex pattern of gene flow that occurred repeatedly from surface to cave populations in the Sierra de El Abra region, as well as gene flow from populations at the south of the Trans Mexican Volcanic Belt (i.e. Astyanax aeneus) towards surface populations of Lineage 1 (Figure 3a). Three out of four migration events observed involve the cave populations from the Sierra de El Abra region, which have experienced invasions by surface fish at different times (Figure 3a). The most recent migration event was estimated at the southern part of the Sierra de El Abra region, where Chica, Chiquitita and Toro were connected with surface populations of Lineage 1 from the Pánuco basin, as documented in Moran et al. (2022). At the northern limit of the Sierra de El Abra, gene flow was detected between the ancestral surface lineage and Pachón cave, which is interesting since ADMIXTURE analysis did not indicate recent hybrids within Pachón cave. For the middle and southern regions of the Sierra de El Abra, it is possible to identify an older gene flow event, compared with the two previously described, from the ancestral surface lineage. In this analysis, we can also observe the presence of long terminal branches for some groups, such as Pachón (Clade 2A) and the Chica group (Clade 2D).
3.6 Inference of historic population dynamics and divergence times between lineages
Our fastsimcoal2 pairwise comparisons supported 0.1 to 1 Mya as the best model for the divergence time between Lineages 1 and 2 in most of the pairwise clade comparisons, except for the 1E (Caballo Moro/Vázquez) versus 2B (Yerbaniz/Japonés) comparison (Figure 4b), for which the likelihood values were the same in all divergence scenarios. Thus, this pair of clades (i.e. 1E and 2B) was excluded from the reconstruction of the divergence time frequencies.

The mean estimated divergence time between Lineages 1 and 2 was 101,495 generations (SD = 408.25 generations) and the median was 101,461 generations (Figure 4c), with very narrow differences between the maximum values (median = 101,610.5 generations) and the minimum values (median = 101,194.5 generations; Figure 4c; Table S8). Notably, taking the standard deviations into account, these estimates clearly overlap with those of Herman et al. (2018) and their uncertainty.
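Because these estimates are expressed in generations, converting them to calendar time requires a generation time. The sketch below shows the arithmetic only; the generation time is not stated in this section, so the value passed in is an explicit assumption and should be replaced by whatever value is appropriate for the system.

```python
MEAN_GENERATIONS = 101_495   # mean divergence time reported above
SD_GENERATIONS = 408.25      # standard deviation reported above


def generations_to_years(generations, years_per_generation):
    """Convert a time in generations to years for a given generation time."""
    if years_per_generation <= 0:
        raise ValueError("generation time must be positive")
    return generations * years_per_generation


# ASSUMPTION: one year per generation, used here only as a placeholder.
assumed_generation_time = 1.0
print(generations_to_years(MEAN_GENERATIONS, assumed_generation_time))
print(generations_to_years(SD_GENERATIONS, assumed_generation_time))
```

Note that the standard deviation scales by the same factor as the mean, so the relative uncertainty is unchanged by the conversion.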
The effective population sizes (Ne) estimated for all comparisons using fastsimcoal2 under the best model (0.1–1 Mya) showed little difference between the two genetic lineages, in contrast with their ancestral population. The mean effective population sizes for Lineages 1 and 2 were 2955.4 and 2897.4 individuals, respectively, and the medians were 2957 and 2955 individuals, respectively. The mean and median effective population sizes estimated for the ancestral population of Lineages 1 and 2 were 3241.8 and 3068 individuals, respectively (Figure S5).
3.7 Morphological diversity
We found differences in body shape between both ecotypes (cave vs. surface) and their respective lineages (Lineages 1 and 2). Additionally, all factors evaluated in the MANCOVA (ecotype, lineage and the interaction: ecotype × lineage) had a statistically significant effect on body shape variation (Table 1).
MANOVA body shape | D.F. | Pillai | F | p |
---|---|---|---|---|
Ecotype | 1 | 0.982 | 57.92 | <.001* |
Lineage | 1 | 0.297 | 22.62 | <.001* |
Ecotype × lineage | 2 | 0.184 | 5.47 | <.001* |

Phylogenetic MANOVA body shape | D.F. | SS | F | p |
---|---|---|---|---|
Ecotype | 1 | 0.037117 | 1.3619 | .351 |
Lineage | 1 | 0.00901 | 0.03525 | .956 |
Ecotype × lineage | 2 | 0.008976 | 0.03511 | .858 |

Note: Phylogenetic MANOVA to identify body shape differences between the ecotypes and lineages, and their interactions, accounting for the phylogeny obtained in RAxML. Values with * are statistically significant (p < .05).
Regarding the morphospace obtained in the PCA (Figure 5a), the first two components explained 48.38% of the cumulative variance. PC1 explains 30.37% of the variance and shows that the change in body shape occurs mainly in body depth. In this component, there is no clear separation between the ecotypes; however, we can highlight that there is a slight separation between the lineages of the surface ecotype, with surface Lineage 1 grouped towards the positive side of the axis, showing a deeper body compared to surface fish of Lineage 2. PC2 (18.01%) described variation in the dorsal profile of the organism, particularly in the head region, as well as body height and caudal peduncle length, showing differences between cave and surface lineages. In this sense, the Sierra de El Abra populations (Lineage 2 caves) were more differentiated from the surface lineages, in contrast to the other cave populations from Lineage 1 caves (i.e. Sierra de Guatemala and Micos caves), showing a greater overlap with the surface populations (Figure 5a). The Sierra de El Abra cave populations were on the positive side of this second component and present a more concave head shape in the dorsal profile, a wider and shorter body, with an elongated caudal peduncle compared to surface populations found on the negative side of PC2. Sierra de Guatemala populations were distributed on both sides of the component (subclade 1D is closest in morphospace to the Sierra de El Abra populations, Figure 5a,c), and Micos was distributed towards the negative side of the axis, mostly overlapping surface populations. Thus, the body shape of the Sierra de El Abra cavefish populations is the most differentiated from the surface ecotype, while the cavefish populations of the Sierra de Guatemala have an intermediate shape between the two ecotypes. Interestingly, the cavefish of Micos and the two surface lineages exhibit a very similar body shape morphospace.

Two a priori groupings were evaluated. The first compared five groups: the caves of the Sierra de El Abra, Sierra de Guatemala and Micos, and the two surface lineages (Lineages 1 and 2). For these five groups, we retained a total of 14 PCs after the cross-validation analysis, which were used for LDA. For that model, we obtained an average allocation percentage of 90.83%. By group, correct assignment was 93.6% for Sierra de El Abra cavefish, 86.21% for Sierra de Guatemala, 87.07% for Micos, 96.04% for surface Lineage 1 and 94.74% for surface Lineage 2. The largest overlap was between Micos cave and the surface ecotypes (Lineages 1 and 2, Figure 5b).
The second a priori grouping tested was between ecotypes (i.e. caves vs. surface). Based on the cross-validation analysis, we retained a total of 14 PCs with an assignment percentage for the model of 97.26%. We can see that both ecotypes are well differentiated, although they still show a slight overlap (Figure 6). Considering the probability of individual assignment, the cave ecotype showed the highest values of correct assignment to their group (95.91%), in contrast to the surface ecotype (83.95%).
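Assignment percentages of this kind are typically read off a cross-validated confusion matrix. The following sketch shows how per-group and overall correct-assignment rates are derived from such a matrix. The counts are hypothetical and do not reproduce the percentages reported above; the function name is ours.

```python
def assignment_rates(confusion):
    """Per-group and overall correct-assignment rates.

    confusion[true_group][predicted_group] holds the number of individuals
    from true_group that the classifier assigned to predicted_group.
    """
    per_group = {
        group: row.get(group, 0) / sum(row.values())
        for group, row in confusion.items()
    }
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(row.get(group, 0) for group, row in confusion.items())
    return per_group, correct / total


# Hypothetical counts for illustration only (not the study's data)
confusion = {
    "cave": {"cave": 90, "surface": 10},
    "surface": {"cave": 20, "surface": 80},
}
per_group, overall = assignment_rates(confusion)
print(per_group, overall)
```

The overall rate computed this way weights each group by its sample size, so it need not equal the simple mean of the per-group rates when group sizes differ.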

3.8 Phylogenetic signal of body shape
Next, we tested whether global body shape differences evolved repeatedly in the two lineages of cavefish using geometric morphometrics. We did not observe a phylogenetic signal in body shape and obtained a relatively low Kmult value (Kmult = 0.588, p = .107; Figure 5c), supporting the lack of phylogenetic concordance. This can be observed in the phylomorphospace obtained (PACA), where the first two components explained 53.83% of the variance and segregated caves from their surface sister lineages. This pattern is more evident in Component 1, where the cave populations, independent of their lineage, were grouped on the positive side, whereas surface populations were on the negative side of this component. Among surface fish, populations from Lineage 2 (i.e. Rascón and Peroles) were differentiated in phylomorphospace from the Lineage 1 surface populations. Interestingly, Caballo Moro, Molino, Vázquez and Chica cave populations were on the negative side of Component 2, in contrast to the rest of the cave populations (Figure 5c).
Phylogenetic MANOVA results were not significant for any factors in a phylogenetic context (Table 1). This implies that there is no relationship between the evolution of the body in both morphotypes and their phylogenetic relationships, suggesting morphological homoplasy.
4 DISCUSSION
The Astyanax mexicanus species complex represents an extraordinary model to study the mechanisms of regressive evolution (see White et al., 2019), since it presents an extreme response to environmental conditions (Jeffery, 2009; McGaugh et al., 2020), with cave-adapted morphs as one of the most studied organisms (Gross et al., 2013; Jeffery, 2009; Tabin, 2016; Torres-Paz et al., 2018). In the present study, we included the most comprehensive sampling effort in terms of individuals, cave populations and genomic data to date, in order to reconstruct the evolutionary history of the Astyanax mexicanus cave populations and to assess the evolution in body shape of cave populations.
4.1 Phylogenetic reconstruction of the Astyanax mexicanus complex
Our phylogenetic analysis revealed that the Astyanax mexicanus complex is a monophyletic group, with Astyanax aeneus as its sister group (Figure 1b). The taxonomic recognition of Astyanax mexicanus has varied depending on the criteria used. Currently, within the distribution of Astyanax mexicanus (i.e. from the border between Mexico and the United States to the Trans Mexican Volcanic Belt), there are five nominally recognized species: Astyanax argentatus (Bravo-Conchos and Nazas Aguanaval basins), Astyanax jordani (troglobitic forms), Astyanax mexicanus (sensu Miller et al., 2005 and Ornelas-García et al., 2008), Astyanax rioverde (Río Verde, S.L.P., Schmitter-Soto, 2017) and Astyanax tamiahua (Ríos in Tamaulipas, Tamiahua lagoon, Schmitter-Soto, 2017). Previous taxonomic discrepancies arose from differences in species delimitation criteria, including morphological, ecological and molecular criteria (e.g. Ornelas-García et al., 2008; Schmitter-Soto, 2016, 2017; Strecker et al., 2012).
Our phylogenetic analysis using RAxML and SVDQuartets supports the monophyly of A. mexicanus sensu Miller et al. (2005), with two interfertile lineages (Jeffery, 2009; Wilkens, 1988; Wilkens & Strecker, 2003). Additionally, we found A. aeneus as a sister species. These results differ from our previous study based on mtDNA (Ornelas-García et al., 2008), where A. mexicanus was found to be paraphyletic, containing A. aeneus. Furthermore, A. jordani, A. rioverde and A. tamiahua were not recovered as reciprocally monophyletic but were placed across the two lineages of Astyanax mexicanus. These findings suggest the need for a taxonomic revision of the species complex, incorporating multiple sources of evidence such as morphology, ecology and phylogeny, to determine whether they are valid species or should be considered junior synonyms of A. mexicanus.
4.2 Cave lineages and population structure of Astyanax mexicanus cavefish
Our study indicates that cave colonization has occurred multiple times from two distinct ancestral surface lineages. These findings align partially with previous research utilizing mtDNA and/or nucDNA data, which suggested that cave populations evolved from at least two separate colonizations (Bradic et al., 2012; Dowling et al., 2002; Hausdorf et al., 2011; Herman et al., 2018; Ornelas-García et al., 2008; Strecker et al., 2003, 2004, 2012): Lineage 1 (previously referred to as the ‘new lineage’) and Lineage 2 (previously referred to as the ‘old lineage’). In addition, within Lineage 1, we recovered cave populations in two separate clades, the Sierra de Guatemala (including the Gómez Farías and Chamal-Ocampo areas) and the Micos cave population, suggesting an independent colonization event for each region (Ornelas-García et al., 2008; Strecker et al., 2004; although Coghill et al., 2014 and Moran et al., 2023 present an alternative perspective).
While the distribution of independent cave lineages generally aligns with their geographical locations (i.e. Lineage 1 = Sierra de Guatemala and Micos and Lineage 2 = Sierra de El Abra), the colonization events may have a more intricate nature. Specifically, within the Sierra de Guatemala, the Chamal-Ocampo region (Vázquez and Caballo Moro caves) exhibits strong genetic differentiation from the phylogenetic reconstruction, ADMIXTURE and DAPC analyses (Figures 1b and 2), as well as morphological distinctions (Figure 5), when compared to its sister group in Gómez Farías (i.e. Escondido, Jineo and Molino). These two clusters belong to separate mountain systems, and it is unlikely that there is an underground connection between the Sierra de los Mangos and Sierra Cucharas ranges.
Within the Sierra de El Abra lineage, four distinct cavefish clades were identified, corresponding to the genetic clusters observed in the ADMIXTURE analysis. Notably, our study supports the genetic differentiation of the southernmost Chica group (i.e. Chica, Chiquitita and Toro caves) from all other cave populations in the Sierra de El Abra. In addition, our results showed admixture between the Chica group and the Sierra de El Abra cave populations, as well as with surface populations of Lineage 1. This admixture could be influencing the differentiation observed for the Chica group (see Moran et al., 2022).
The rest of the Sierra de El Abra forms a monophyletic group, showing a congruence between the recovered clades and the hydrological systems proposed by Elliott (2018). The northernmost cave in the Sierra de El Abra, Pachón, forms its own clade (Clade 2A). Previous studies, based on mtDNA data, placed Pachón cave within Lineage 1 (Ornelas-García et al., 2008; Strecker et al., 2004). However, our analysis, along with other studies including a larger amount of nuclear data (e.g. Bradic et al., 2012; Herman et al., 2018), placed Pachón within Lineage 2 (old lineage). The sister clade to Pachón cave is the Yerbaniz cave group (Clade 2B), which includes Yerbaniz and Japonés caves and is well differentiated from both Pachón and the central part of the Sabinos region (Elliott, 2018). Finally, the Sabinos group, the larger cluster of caves in the Sierra de El Abra, including Tigre, Sabinos, Arroyo, Tinaja, Montecillos, Pichijumo, Jos, Piedras and Palma Seca caves, may exhibit underground aquatic connections (Elliott, 2018).
In summary, this work provides evidence for two independent origins of cavefish lineages, Lineages 1 and 2. We also propose the possibility of up to four independent colonization events in the caves. Specifically, in the Sierra de Guatemala region, there may have been two separate invasions: one in the Gómez Farías region (including Escondido, Jineo and Molino), and another in the Chamal-Ocampo region (including Vázquez and Caballo Moro). A third invasion likely occurred in the Micos region, and a fourth in the Sierra de El Abra. However, we cannot rule out the possibility that gene flow between caves within the Chamal-Ocampo region with surface populations may have contributed to the differentiation observed in those instances. Similarly, the Chica group, although hypothesized to be part of the invasion of the Sierra de El Abra, appears as a distinct lineage, likely due to introgression with surface populations, leading to differentiation from other caves in the region.
4.3 Ancient and recent gene flow
Our analysis revealed evidence of at least four historical events of ancestral gene flow between surface and cave populations, indicating multiple instances of introgression throughout their evolutionary history. This finding supports previous studies that emphasized the importance of gene flow in the evolutionary dynamics of cave populations, as observed in populations such as Chica and Pachón (Herman et al., 2018; Moran et al., 2022).
Interestingly, we also detected evidence of ancestral gene flow between Astyanax aeneus and surface populations from Lineage 1. This gene flow had been previously identified using microsatellite and genomic data, at the distribution boundaries of both species (Herman et al., 2018; Strecker et al., 2012). Originally, the Trans Mexican Volcanic Belt was suggested as the main barrier impeding gene flow between these species (see Ornelas-García et al., 2008); however, this barrier has also been questioned in other fish groups such as Dorosoma spp. (Elías et al., 2022). This shows the need to evaluate the effect that this barrier has had in the genus Astyanax and in other freshwater groups, as previously suggested (Contreras et al., 1996).
These migration events have likely played a significant role in shaping the genetic composition of the cave populations (Herman et al., 2018), shedding light on the historical connections between different regions. Notably, an ancestral migration event was observed between surface Lineage 1 and the Chica cave, indicating an ongoing connection between the Río Choy and the Chica cave, as previously reported (Moran et al., 2022).
Furthermore, the migration events observed between populations are not symmetrical, indicating that certain cave populations have been more prone to interbreeding with surface fish compared to others. This asymmetry in gene flow patterns adds complexity to the evolutionary dynamics between surface and cave populations.
4.4 Times and population size of cave populations
According to our findings, the coalescent time between the two lineages was relatively recent, with a mean of approximately 101,000 generations (SD ≈ 408 generations; Figure 4; Table S8). Our current estimates are mostly concordant with the cave colonization times proposed by Herman et al. (2018) and contrast with previous hypotheses of ancient divergence between the lineages based on mitochondrial molecular clocks (see Ornelas-García et al., 2008; Strecker et al., 2004).
In terms of hydrogeology, Elliott (2018) suggested different ages for each of the cave regions. Micos region was considered the most recent, with an estimated age range of 15,000–37,000 years. The Chamal-Ocampo area in the Sierra de Guatemala was identified as an older system, with an estimated age of 545,000 years for the oldest part of the system, while the Sierra de El Abra region was deemed the oldest among the cave systems and exhibited the most extensive stream capture, with a range of 2 million years ago to 15,000 years ago in the Los Sabinos area. Within the Sierra de El Abra region, Chica cave was relatively young based on geohydrological data, estimated to be 10,000 to 16,000 years old. Given these temporal frameworks, certain systems appear to be highly consistent with our estimates.
Our data reveal that the mean effective population sizes estimated for both lineages were similar and smaller than previously reported effective sizes. In our study, we estimated a Ne of 2955 individuals for Lineage 1 and a Ne of 2897 individuals for Lineage 2. However, the mean size of their ancestral population was slightly higher, at 3242 individuals. In previous studies using transcriptomic and genomic data (Fumey et al., 2018; Herman et al., 2018), estimates of ancestral population size were roughly 11 times larger than in the present study, with a mean ancestral Ne of 36,107 individuals. Contrasting the effective sizes of cave versus surface populations, Herman et al. (2018) found that cave population sizes were smaller than those of surface populations. The differences in population sizes could be attributed to several factors, including the level at which comparisons were made. In our study, we compared at the clade level, while in previous studies, population sizes were estimated on a per-population basis. Further demographic analyses can shed light on the effect of genetic drift in this model system.
4.5 Parallel evolution in body shape of cave-adapted populations
Cave environments exert specific selective pressures that have led to the evolution of distinct morphological traits in Astyanax mexicanus, particularly related to body shape. In our study, we examined the body shape of different cave lineages to determine whether this morphology is influenced by phylogenetic history (i.e. phylogenetic inertia) or is driven by environmental pressures.
We observed parallel evolution between the A. mexicanus cavefish lineages, characterized by dorsoventral flattening of the head, represented in both the PCA analyses and the discriminant analyses (Figures 5 and 6), like other cave vertebrate species (Christiansen, 2012; Edgington & Taylor, 2019; Fenolio et al., 2013; Hart et al., 2020; Soares & Niemiller, 2020). Furthermore, A. mexicanus cavefish exhibit a shorter body and elongated caudal peduncle compared to their surface-dwelling counterparts, which display a more compressed and fusiform dorsoventral axis. These findings suggest that the common body shapes observed among the different cave lineages of A. mexicanus are the result of adaptation to common environmental challenges.
Despite the parallel evolution of body shape, we were able to observe a gradient in the cave phenotypes among different populations. Thus, populations from the Sierra de El Abra exhibited a greater differentiation in body shape compared to surface fish, whereas cave populations from the Sierra de Guatemala showed less pronounced differences. Interestingly, the Micos cave population displayed a body shape that nearly overlapped with the surface populations. This variable degree of troglomorphism in different cave populations has been previously described by Wilkens (1977). He proposed that the severity of troglomorphy could be influenced by the timing of cave adaptation. Alternatively, it is possible that the morphological similarity between the Micos cave population and the surface fish is a result of hybridization between the two groups, rather than the age of the cave population.
Overall, our findings indicate a parallel evolution of body shape in cave-dwelling populations of A. mexicanus, influenced by complex factors including environmental pressures, timing of cave adaptation and possible hybridization events with surface fishes. This study, as well as previous studies, has shown that both lineages are virtually contemporary; however, the differences found between regions may be due to historical contingencies (i.e. founder effect) as well as recent ones (i.e. hybridization events). An example of the latter includes Chica and Chiquitita from the Sierra de El Abra, and Micos, which exhibited slightly different distributions in body shape compared to other cave populations. This variation could potentially be explained by introgressive hybridization, as suggested in the study by Moran et al. (2022).
Previous studies that have examined these differences in the body shape across lineages of cave populations have suggested that the divergence times between Lineage 2 cave populations (Sierra de El Abra) and Lineage 1 cave populations are very similar (Herman et al., 2018).
5 CONCLUSIONS
In conclusion, this study represents the most exhaustive effort in both sampling and sequencing carried out in this model system. Our results support the parallel evolution of the cave ecotypes of Astyanax mexicanus from two independent lineages. Further, we hypothesize that two independent colonization events of the caves may have occurred, corresponding to the two lineages, but based on the genetic and morphological data, the number of independent colonizations could be up to four: (1) Sierra de El Abra; (2) Gómez Farías region, including Escondido, Jineo and Molino caves, in the Sierra de Guatemala region; (3) Chamal-Ocampo region, including Vázquez and Caballo Moro caves, in the Sierra de Guatemala region; and (4) Micos region. We found parallel evolution in body shape in both Astyanax mexicanus lineages, with no phylogenetic signal, but with evidence that these phenotypes evolve in response to the environment. Finally, our study supports the monophyly of Astyanax mexicanus sensu Miller et al. (2005). Thus, further taxonomic analyses are required to clarify the taxonomy of this group.
AUTHOR CONTRIBUTIONS
Marco Garduño-Sánchez and Jorge Hernández-Lozano: performed research, analyzed the data, and wrote the paper. Rachel L. Moran and Jeff Miller: analysed the data, and wrote the paper. Ramsés Miranda-Gamboa and Lourdes Lozano-Vilano: contributed to the sample collection, and wrote the paper. Joshua B. Gross and Nicolas Rohner: contributed with reagents, and genomic data, and wrote the paper. William R. Elliott: contributed with geological information, and wrote the paper. Suzanne E. McGaugh: designed research, contributed with reagents, contributed with analytical tools and wrote the paper. C. Patricia Ornelas-García: designed research, contributed to the sample collection, contributed with reagents and wrote the paper.
ACKNOWLEDGEMENTS
We sincerely thank Andrea Jiménez-Marín, Andrea Jiménez, Laura Márquez and Nelly López (Laboratorio Nacional de la Biodiversidad [LANABIO], IB-UNAM) for their lab assistance. We also want to thank the Minnesota Supercomputing Institute for its support.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts.
FUNDING INFORMATION
This research was funded by Project No. 191986, Fronteras de la Ciencia—CONACyT and the Programa de Apoyo a Proyectos de Investigación e Innovación Tecnológica (PAPIIT), UNAM No. IN212419. We appreciate the resources provided by the Minnesota Supercomputing Institute, without which this work would not be possible.
Open Research
DATA AVAILABILITY STATEMENT
RADseq sequences are available in the Sequence Read Archive at NCBI under BioProject PRJNA899818, and for the WGS data the corresponding NCBI BioProject numbers are in Data Table S1. The morphological data are available at Dryad: https://datadryad.org/stash/dataset/doi:10.5061/dryad.hqbzkh1n6.