Whole genome sequencing of silver carp (Hypophthalmichthys molitrix) and bighead carp (Hypophthalmichthys nobilis) provides novel insights into their evolution and speciation
Abstract
The edible silver carp (Hypophthalmichthys molitrix) and bighead carp (H. nobilis), which are two of the “Four Domesticated Fish” of China, are cultivated intensively worldwide. Here, we constructed 837- and 845-Mb draft genome assemblies for the silver carp and the bighead carp, respectively, including 24,571 and 24,229 annotated protein-coding genes. Genetic maps, anchoring 71.7% and 83.8% of all scaffolds, were obtained for the silver and bighead carp, respectively. Phylogenetic analysis showed that the bighead carp formed a clade with the silver carp, with an estimated divergence time of 3.6 million years ago; the time of divergence between the silver carp and zebrafish was 50.7 million years ago. An East Asian cyprinid genome-specific chromosome fusion took place ~9.2 million years after this clade diverged from the clade containing the common carp and Sinocyclocheilus. KEGG and GO analyses indicated that the expanded gene families in the silver and bighead carp were associated with diseases, the immune system and environmental adaptations. Genomic regions differentiating the silver and bighead carp populations were detected based on the whole-genome sequences of 42 individuals. Genes in the divergent regions were associated with reproductive system development and the development of primary female sexual characteristics. Thus, our results provide a novel systematic genomic analysis of the East Asian cyprinids and insights into the evolution and speciation of the silver carp and bighead carp.
1 INTRODUCTION
An understanding of speciation and adaptation in endemic species is key to an understanding of evolution (Grant & Grant, 2002; Lack, 1983; Meyer, 1993). Nine million years ago, the uplift of the Himalaya–Tibetan plateau gave rise to a young endemic clade of East Asian cyprinids (Tao et al., 2010). The saline waters from the Indian Ocean and the Pacific Ocean were obstructed, which resulted in a cross-linked river system (Chang, 2004; Qian et al., 2002). This transformation tremendously increased the ecological and phenotypic diversity of the East Asian cyprinids within a short time, enabling these fish to exploit areal lakes and river drainages and leading to rapid diversification (Wang et al., 2007). Similar to the East African cichlids (Malinsky et al., 2015; Muschick et al., 2012), the cyprinid clade endemic to East Asia is a useful model for studies of rapid radiations and speciation (Kolar et al., 2007; Lamer et al., 2010). However, unlike the East African cichlids, the East Asian cyprinids also provide appropriate case studies for fish adaptations in riverine systems (Chick & Pegg, 2001). A better understanding of East Asian cyprinid diversification will improve endangered species protection, help to prevent the spread of invasive species, and guide river-fish aquaculture programmes. Due to the evolutionary importance of the East Asian Cyprinidae, phylogenies of this endemic clade have been constructed based on mitochondrial and nuclear genes (Li et al., 2007; Tao et al., 2010). However, draft genomes are as yet unavailable for many of these fish (Liu et al., 2017; Wang et al., 2015). Without these genomic resources, it is difficult to identify speciation events and adaptive radiations in the East Asian Cyprinidae.
Traditionally, black carp, grass carp, silver carp and bighead carp are known in China as the “Four Domesticated Fish.” Although these were once the most abundant fish in the Yangtze River, their breeding numbers decreased dramatically over time, falling to less than 10% in the 1960s. This reduction in wild populations of these fish represents a long-term hidden danger to food security, as 93.78% of all freshwater products in China are farmed, and the Four Domesticated Fish represent more than half of the farmed and heavily consumed fish in China. The silver carp (Hypophthalmichthys molitrix) and the bighead carp (Hypophthalmichthys nobilis) fall into the clade of endemic East Asian Cyprinidae. H. molitrix and H. nobilis have been introduced into many countries as critical food resources and plankton consumers. These species have become the two most commonly cultivated aquaculture species, with a combined global production exceeding three million tonnes in 2013 (FOODS, 1984).
Hypophthalmichthys molitrix and Hypophthalmichthys nobilis share several distinct characteristics, including late sexual maturity, rapid long-distance swimming ability and drifting-egg spawning (Coulter et al., 2013; DeGrandchamp et al., 2008; Peters et al., 2006; Polačik et al., 2009), that allow these species to quickly populate river systems and to become invasive (Chick & Pegg, 2001). Although they exhibit similar adaptions to river systems, H. molitrix and H. nobilis diverge with respect to diet, body colour and other traits (Spataru et al., 1983). Although hybridization between these species has been reported in the Mississippi River Basin (Lamer, 2015; Lamer et al., 2010), there are no reports of hybridization between these species in their native environment. Thus, H. molitrix and H. nobilis in East Asian river systems represent a promising model of speciation and genetic isolation, which could be investigated using genome decoding.
Moreover, a chromosome fusion relative to zebrafish chromosomes was previously identified in the endemic clade of East Asian Cyprinidae (Wang et al., 2015). As has been shown in studies of Drosophila and human speciation, chromosome fusion might play important roles in speciation and adaptive evolution (Ayala & Coluzzi, 2005; Painter & Stone, 1935). The construction of more complete genomes for H. molitrix and H. nobilis would enable us to trace this chromosome fusion event in the endemic East Asian cyprinid clade, clarifying chromosome rearrangements in this group. Here, we used both next-generation sequencing and genetic mapping to generate chromosome-scale genome assemblies of H. molitrix and H. nobilis. We also explored the genetic bases of phenotypic traits that facilitate adaptations to riverine environments, and traced the dynamics of transposable elements (TEs) and protein-coding genes after the chromosome fusion event. We also performed whole-genome resequencing of 20 H. molitrix and 22 H. nobilis individuals to investigate population structure within these species, and to identify divergent genome regions that might harbour potential speciation genes.
2 MATERIALS AND METHODS
2.1 DNA sampling and sequencing
2.1.1 Samples used for whole-genome sequencing
Genomic DNA was isolated from the fin rays and blood cells of a gynogenetic female adult silver carp and a gynogenetic female adult bighead carp using the DNeasy Blood and Tissue kit (Qiagen). Three different short paired-end (PE) libraries (170 bp for silver carp or 270 bp for bighead carp, 500 bp, and 800 bp) and four types of mate-pair libraries, with insert sizes ranging from 2 to 20 kb, were constructed and sequenced following Illumina protocols. The silver carp libraries were sequenced on an Illumina HiSeq2000, generating PE 100-bp reads for the short-insert libraries and PE 49-bp reads for the mate-pair (long-insert) libraries. The bighead carp libraries were sequenced on an Illumina HiSeq4000, generating 150-bp reads for both the PE and mate-pair libraries.
2.1.2 Samples used for re-sequencing
Blood samples were taken from 42 captured wild individuals: 20 silver carp (seven from the Pearl River, four from the Amur River and nine from Yangtze River) and 22 bighead carp (eight from the Pearl River, four from the Amur River and 10 from Yangtze River). Genomic DNA was extracted using the Wizard Genomic DNA Purification Kit (Wizard A1123, Promega), following the manufacturer's protocols. The genome of each individual was sequenced at ~11–17 × depth with an Illumina HiSeq4000 (PE 150 bp). In total, 239.9 and 259.8 Gb clean data were obtained for the silver carp and bighead carp samples, respectively.
2.2 Genome assembly and evaluation
To obtain high-quality data, low-quality reads and duplicate reads arising from PCR amplification during library construction were removed. A strict filtering criterion was applied for the raw reads using soapfilter, a software application in the soapdenovo2 package (Luo et al., 2015). To estimate the Hypophthalmichthys molitrix and H. nobilis genome sizes, we performed K-mer analyses of some of the sequence reads using jellyfish software (Marcais & Kingsford, 2011). For each species, we calculated the 17-mer frequency. Total genome size was calculated as K-mer number/peak depth. We used soapdenovo version 2.04 (Luo et al., 2012) for genome assembly. First, the reads of the short insert libraries (<2000 bp) were used to construct a de Bruijn graph. The contig sequences were generated after the tips were clipped, the bubbles were merged and connections with low-coverage areas were removed. Second, the mate-paired reads (≥2000 bp) were mapped onto the contigs to construct scaffolds. Then, we filled the gaps between scaffolds with gapcloser version 1.12 in the soapdenovo package and performed a local assembly of the collected reads using the PE data to retrieve read-pairs. Finally, sspace version 2.0 (Boetzer et al., 2011) was used to construct super-scaffolds with reads from all of the long insert size libraries (2–20 kb).
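The genome size estimate described above (K-mer number divided by peak depth) can be illustrated with a minimal sketch. It assumes the 17-mer histogram has already been loaded into a dictionary mapping depth to the number of distinct k-mers at that depth; the function name, the `min_depth` cutoff for discarding likely error k-mers, and the toy numbers are illustrative assumptions, not values from this study.

```python
def estimate_genome_size(kmer_histogram, min_depth=3):
    """Estimate genome size as (total k-mer number) / (peak depth).

    kmer_histogram: dict mapping depth -> number of distinct k-mers seen
    at that depth. Depths below min_depth are ignored as probable
    sequencing errors (an assumption of this sketch).
    """
    usable = {d: c for d, c in kmer_histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("histogram has no depths at or above min_depth")
    # Peak depth: the depth carrying the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer number: every distinct k-mer counted once per occurrence.
    total_kmers = sum(d * c for d, c in usable.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (hypothetical numbers): an error peak at depth 1-2
# and a main peak centred on depth 20.
toy = {1: 5000, 2: 800, 18: 200, 19: 400, 20: 600, 21: 400, 22: 200}
size, peak = estimate_genome_size(toy)
print(f"peak depth = {peak}, estimated size = {size:.0f} bp")
```

The same calculation applies to a real histogram once the low-depth error peak has been excluded; where to place that cutoff is a judgment call that depends on the data.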
To evaluate the quality of the assembled genomes, three independent processes were executed. First, the short-insert size libraries were mapped onto the genome using soap2.21 (Li et al., 2008) with default parameters to assess the mapping ratio, while soap coverage 2.27 was used to calculate sequencing depth. Second, the gene set was evaluated based on expressed sequence tags, and the scaffolds were aligned to transcriptome sequences assembled from RNA-sequencing (RNA-seq) data. Finally, the completeness of the coding gene set was evaluated using busco (Benchmarking Universal Single-Copy Orthologs; http://busco.ezlab.org/; Simao et al., 2015). The total number of Actinopterygii genes used for evaluation was 4584. We identified all complete and fragmented BUSCOs in this gene set.
2.3 Chromosomal assignment of scaffolds
To anchor the scaffolds to chromosomes, previously published genetic map data were used to inform genome assembly. Single nucleotide polymorphism (SNP) or simple sequence repeat (SSR) marker sequences were aligned to the scaffolds using blat (E value < 1 × 10⁻¹⁰, identity ≥ 90%, and alignment rate > 80%). Pseudomolecules were constructed by anchoring the scaffolds to the genetic linkage map using allmaps (Tang et al., 2015). The H. molitrix genetic map included 3134 markers, including 703 microsatellites (Guo et al., 2013); 861 scaffolds (~600 Mb, 71.7% of the genome; 388 oriented scaffolds) were anchored onto 24 chromosomes. The largest scaffold was about 35 Mb. The H. nobilis genetic map included 3900 2b-RAD markers (Fu et al., 2016); 353 scaffolds (~717 Mb, 83.8% of the genome; 209 oriented scaffolds) were anchored onto 24 chromosomes. Using identical methods, we anchored 1309 scaffolds (~811 Mb, 72.7% of the genome; 552 oriented scaffolds) of the blunt snout bream genome (Liu et al., 2017) onto 24 chromosomes using the previously published high-density RAD-seq genetic map, which included 14,648 RAD markers (Wan et al., 2017). Using the chromosome-level data for silver carp, bighead carp, grass carp, blunt snout bream and zebrafish, the MCScan pipeline (option “-a -e 1e-5 -u 1 -s 5”) was used to investigate gene synteny.
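The marker-alignment thresholds given above (E value < 1 × 10⁻¹⁰, identity ≥ 90%, alignment rate > 80%) can be expressed as a simple filter. The sketch below assumes each alignment has already been parsed into a record; the `MarkerHit` class and its field names are hypothetical and do not correspond to any particular blat output format.

```python
from dataclasses import dataclass


@dataclass
class MarkerHit:
    """One marker-to-scaffold alignment (hypothetical record layout)."""
    marker: str
    scaffold: str
    evalue: float
    identity: float      # percent identity, 0-100
    aligned_len: int     # aligned bases of the marker sequence
    marker_len: int      # full length of the marker sequence


def passes_thresholds(hit: MarkerHit) -> bool:
    """Apply the three thresholds quoted in the text."""
    alignment_rate = hit.aligned_len / hit.marker_len
    return (
        hit.evalue < 1e-10
        and hit.identity >= 90.0
        and alignment_rate > 0.80
    )


hits = [
    MarkerHit("m1", "scaffold12", 1e-40, 98.5, 95, 100),   # kept
    MarkerHit("m2", "scaffold7", 1e-40, 88.0, 95, 100),    # identity too low
    MarkerHit("m3", "scaffold3", 1e-12, 99.0, 60, 100),    # alignment too short
]
kept = [h.marker for h in hits if passes_thresholds(h)]
print(kept)
```

Only markers that pass all three thresholds would be handed on to the anchoring step.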
2.4 Genome annotation
The genome annotation process included repeat annotation and gene prediction. Before performing gene predictions, the TEs were identified using homology-based approaches. A de novo repeat library was generated using repeatmodeler (http://www.repeatmasker.org/RepeatModeler). This library was searched against the Repbase database (version 16.10) (Bao et al., 2015) (http://www.girinst.org/) using repeatmasker (http://www.repeatmasker.org/RMDownload.html, version 3.3.0) (Tarailo-Graovac & Chen, 2009) to identify and mask known repeat elements in the genome sequence. The repeat elements were classified based on Repbase.
We performed gene annotation on the repeat-masked genome sequences using three different approaches: de novo gene prediction, protein homology-based methods and RNA-seq data. For de novo gene prediction, we used genscan (http://genes.mit.edu/GENSCAN.html) (Burge & Karlin, 1997) and augustus (Stanke & Morgenstern, 2005). Small genes (protein-coding regions < 150 bp) and gene models with incomplete open reading frames (ORFs) were removed. To predict genes based on protein homology, protein sequences from Danio rerio, Gasterosteus aculeatus, Megalobrama amblycephala, Oryzias latipes, Takifugu rubripes and Tetraodon nigroviridis were downloaded from Ensembl (Kersey et al., 2016). These sequences were aligned to the silver carp and bighead carp genomes using tblastn (Gertz et al., 2006), with an E-value ≤ 1e−5. Based on the tblastn results, genewise (Birney et al., 2004) was used to align the matched protein–genome sequence pairs and to define gene models. Finally, to predict genes based on RNA-seq data, previously published transcriptome data from pooled samples of different tissues (heart, liver, brain, spleen and kidney; Fu & He, 2012) were aligned to the assembled genome sequence using tophat (Trapnell et al., 2009). Then, after sorting and combining the results of tophat mapping, cufflinks (Trapnell et al., 2010) was used to predict transcript structures. All genes predicted by each method were transferred to glean (http://sourceforge.net/projects/glean-gene/) (Elsik et al., 2007), which synthesized all predictions and generated a final reliable gene set. To annotate and categorize the functional protein-coding genes in the H. molitrix and H. nobilis genomes, sequence-based alignments were performed sequentially against the NCBI nonredundant protein (NR) database, the Swiss-Prot protein database, the Kyoto Encyclopedia of Genes and Genomes database (KEGG) and the Cluster of Orthologous Groups of proteins (COG) database using blastx (Michael Cameron & Cannane, 2004; Stephen et al., 1997), with an E-value threshold of 1e−5.
2.5 Gene family identification and phylogenetic tree construction
The protein-coding genes from 13 species (Ctenopharyngodon idellus, D. rerio, Gadus morhua, Gasterosteus aculeatus, Latimeria chalumnae, M. amblycephala, Oreochromis niloticus, Oryzias latipes, Takifugu rubripes, Tetraodon nigroviridis, Xiphophorus maculatus, Cyprinus carpio and Sinocyclocheilus grahami) were downloaded from Ensembl (http://www.ensembl.org). We combined these protein-coding genes with those identified in the H. molitrix and H. nobilis assemblies, and then removed all genes with short ORFs (encoding < 50 amino acids). We then used treefam (Li et al., 2006) to identify gene families as follows. First, we compared the encoded protein sequences across species using blast, with the E-value threshold of 1e−7; fragmental alignments were conjoined for each gene pair using solar. Second, H-scores (minimum edge weights), ranging from 0 to 100, were calculated to evaluate similarity among genes. Third, gene families were identified by clustering homologous gene sequences using hcluster_sg (version 0.5.0). Finally, single-copy gene families conserved across the 15 analysed species were extracted and aligned, guided by the amino-acid alignments generated using muscle (Edgar, 2004). A phylogenetic tree based on the amino-acid alignment was constructed using phyml (Guindon et al., 2010), using the HKY85 + gamma model of sequence evolution. This phylogenetic tree and the associated divergence times were confirmed using the same set of amino acid sequences. Divergence times were estimated using paml mcmctree (paml version 4.5) (Yang, 2007; Yang & Rannala, 2006), with the approximate likelihood calculation method.
2.6 Positive selection
We used the branch-site model in paml4, which uses the maximum-likelihood method to estimate molecular evolution, to detect genes under positive selection in the East Asian cyprinids. We compared two models: ModelA1, in which sites may evolve neutrally or under purifying selection, and ModelA, which also allows sites to be under positive selection. The p-values were computed from the χ² statistic and adjusted using the false discovery rate (FDR) method to allow for multiple comparisons. The accuracy of positively selected gene identification using the branch-site model can be affected by alignment quality.
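The comparison of ModelA1 against ModelA is a likelihood ratio test followed by multiple-testing correction. The sketch below is one way to carry out that arithmetic, under two assumptions that the text does not state: the test statistic is compared with a χ² distribution with one degree of freedom, and the FDR adjustment is the Benjamini–Hochberg procedure. The log-likelihood values are made up for illustration.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """p-value for 2*(lnL_alt - lnL_null) against chi-square with 1 df.

    For 1 df the survival function has the closed form
    erfc(sqrt(x / 2)), so no external library is needed.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical (null lnL, alternative lnL) pairs for three gene families.
fits = [(-5230.4, -5221.1), (-801.7, -801.6), (-12044.9, -12040.2)]
raw = [lrt_pvalue(null, alt) for null, alt in fits]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p = {p:.3g}, FDR-adjusted = {q:.3g}")
```

Families whose adjusted value falls below the chosen threshold would be reported as candidates for positive selection.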
2.7 SNP calling and filtering
The Illumina PE reads from each individual were mapped to the silver carp assembly using the Burrows–Wheeler Aligner (bwa version 0.5.9) (Li & Durbin, 2009). Based on the mapping result, duplicated reads were removed using samtools (Li et al., 2009) and picard (version 1.118) (http://picard.sourceforge.net). Alignment reads with mismatches ≥ 5 and mapping quality = 0 were removed. SNP variants were called using the gatk toolkit (version 3.3) (McKenna et al., 2010). To avoid alignment errors, a second mapping was performed to correct and filter errors due to indel variations. First, duplicated alignments were identified using the markduplicates module, while SNPs and indels were identified using the haplotypecaller module. Then, the base quality of each read was recalibrated using baserecalibrator and indelrealigner. Second, the recalibrated reads were applied to call variants in GVCF format for each individual using the haplotypecaller module. Finally, all GVCF files were merged into the population genotype file using haplotypecaller, and filtered using the following parameters: “DP < 180 || DP > 950 || QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0.” SNPs with a missing rate ≤40% across the population (~11.7 million SNPs) and a minor allele frequency (MAF) ≥0.05 were retained for subsequent analyses.
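The filter expression and the retention criteria above can be read as two predicates applied to each variant site. The following sketch shows that logic in plain Python; it is not the gatk implementation. It assumes a site's annotations are available as a dictionary and genotypes as allele pairs (`None` for a missing call), and it treats an absent rank-sum annotation as passing, which is an assumption of this example.

```python
def fails_hard_filter(info):
    """True if the site matches the filter expression quoted in the text."""
    return (
        info["DP"] < 180 or info["DP"] > 950
        or info["QD"] < 2.0
        or info["FS"] > 60.0
        or info["MQ"] < 40.0
        # Rank-sum annotations can be absent; default to a passing value.
        or info.get("MQRankSum", 0.0) < -12.5
        or info.get("ReadPosRankSum", 0.0) < -8.0
    )


def missing_rate_and_maf(genotypes):
    """Return (missing rate, minor allele frequency) for a biallelic site.

    genotypes: list of (allele, allele) tuples coded 0/1, or None if missing.
    """
    called = [g for g in genotypes if g is not None]
    missing = 1.0 - len(called) / len(genotypes)
    if not called:
        return missing, 0.0
    alt_freq = sum(a + b for a, b in called) / (2 * len(called))
    return missing, min(alt_freq, 1.0 - alt_freq)


def retain_site(info, genotypes, max_missing=0.40, min_maf=0.05):
    """Combine the hard filter with the missing-rate and MAF criteria."""
    if fails_hard_filter(info):
        return False
    missing, maf = missing_rate_and_maf(genotypes)
    return missing <= max_missing and maf >= min_maf


site = {"DP": 520, "QD": 18.3, "FS": 1.2, "MQ": 58.9, "MQRankSum": 0.4}
calls = [(0, 0), (0, 1), (1, 1), None]
print(retain_site(site, calls))
```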
2.8 Phylogenetic inference, population structure and admixture analyses
A phylogenetic tree was inferred with fneighbor, a module in the phylip package (version 3.69.650) (Baum, 1989), using the neighbour-joining method (Saitou & Nei, 1987). The inferred tree was displayed using mega7 (Kumar et al., 2016). To further characterize the relationships among populations, we performed a principal component analysis (PCA) to cluster individuals based on SNP differences, using the twstats module in eigensoft (Patterson et al., 2006) (version 4.2, https://www.hsph.harvard.edu/alkes-price/software/). The Tracy–Widom test was used to calculate significance. Population structure and admixture were estimated based on SNP data using plink (Purcell et al., 2007) (version 1.07, http://pngu.mgh.harvard.edu/purcell/plink/); admixture (Alexander et al., 2009) (version 1.3.0) was applied with default parameters. Ancestral populations with k-values of 2–4 were identified.
2.9 Linkage disequilibrium (LD) and polymorphism analysis
To measure LD in H. molitrix and H. nobilis, we calculated the pairwise correlation coefficients (r²) between genotypes using poplddecay (version 3.30, https://github.com/BGI-shenzhen/PopLDdecay) with the following parameters: “PopLDdecay -InGenotype -SubPop -MaxDist 500 -MAF 0.05 -Miss 0.6.” The average r² values between all pairs of genotypes were visualized using rscript (version 3.2.1).
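As a rough illustration of the statistic being averaged, r² between two SNPs can be computed as the squared Pearson correlation of their genotype dosages (0, 1 or 2 copies of the alternative allele). This is a simplified sketch and not necessarily the estimator implemented in poplddecay; the function name and the handling of missing calls are assumptions.

```python
def genotype_r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors.

    x, y: equal-length lists of 0/1/2 dosages, with None for missing calls.
    Individuals missing at either site are skipped. Returns 0.0 when either
    site is monomorphic among the remaining individuals.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov * cov / (var_x * var_y)


snp_a = [0, 1, 2, 0, 1, 2]
snp_b = [2, 1, 0, 2, 1, 0]   # perfectly anti-correlated with snp_a
snp_c = [0, 0, 2, 2, None, 1]
print(genotype_r_squared(snp_a, snp_b))
print(genotype_r_squared(snp_a, snp_c))
```

Averaging such values within bins of physical distance between SNP pairs gives the decay curve that is then plotted.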
To identify differentiated loci between the silver and bighead carp, we performed a sliding window analysis based on pairwise estimates of differentiation (FST) between populations. FST was calculated using vcftools (Danecek et al., 2011) (version 0.1.13, http://vcftools.sourceforge.net/) with the parameters “--fst-window-size 20000 --fst-window-step 2000 --maf 0.05 --max-missing 0.2.” The window size was 20 kb and the step size was 2 kb. The XP-EHH test was performed using xpehh (Sabeti et al., 2007) with default parameters. We also determined the nucleotide diversity (π) of H. molitrix and H. nobilis, and then calculated the nucleotide diversity ratio between species (H. molitrix/H. nobilis). We calculated the population branch statistic (PBS), a summary statistic that quantifies sequence differentiation along each branch of a three-population tree based on log-transformed FST. Dxy, which is a measure of genetic divergence between two populations, was also calculated. Both Dxy and PBS were calculated following methods previously used in Tibetan frogs (Wang et al., 2018; Yi et al., 2010). Finally, overlapping regions with π ratios in the top 5% and with FST values in the top 5% were identified as divergent or speciation regions, and the genes in those regions were extracted as potential speciation or selection genes.
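For readers unfamiliar with PBS, the sketch below shows the standard formulation from Yi et al. (2010): each pairwise FST is log-transformed into a branch length, T = −ln(1 − FST), and the branch leading to population A is PBS_A = (T_AB + T_AC − T_BC) / 2. The choice of the three populations, the clamping of FST values, and the helper for intersecting top-5% windows are assumptions of this example; the modified PBS used in this study may differ in detail.

```python
import math


def branch_length(fst):
    """Log-transform FST into a divergence time estimate, T = -ln(1 - FST)."""
    # Clamp so that negative estimates and FST = 1 do not break the log.
    fst = min(max(fst, 0.0), 0.999999)
    return -math.log(1.0 - fst)


def pbs(fst_ab, fst_ac, fst_bc):
    """Population branch statistic for population A in the tree (A, B, C)."""
    return (branch_length(fst_ab) + branch_length(fst_ac)
            - branch_length(fst_bc)) / 2.0


def top_fraction(values, fraction=0.05):
    """Indices of the windows whose values fall in the top fraction."""
    k = max(1, int(len(values) * fraction))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(ranked[:k])


# Toy per-window values (hypothetical): 40 windows.
fst = [i / 100 for i in range(40)]
pi_ratio = [1.0] * 40
pi_ratio[39] = 5.0
pi_ratio[0] = 4.0

# Windows in the top 5% for both statistics.
candidates = top_fraction(fst) & top_fraction(pi_ratio)
print(sorted(candidates))
print(round(pbs(0.5, 0.5, 0.0), 4))
```

Intersecting the two top-5% sets mirrors the final step described above, in which only windows extreme for both FST and the π ratio are retained.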
3 RESULTS AND DISCUSSION
3.1 Sequencing, assembly and annotation
The genomes of the double-haploid fish Hypophthalmichthys molitrix and H. nobilis were sequenced using the Illumina platform. We generated 129.67 Gb (156.8 × coverage) and 228.87 Gb (262.46 × coverage) of raw data for H. molitrix and H. nobilis, respectively (Table S1). After filtering out low-quality reads, 76.17 Gb of high-quality sequences were obtained for H. molitrix, and 129.74 Gb of high-quality sequences were obtained for H. nobilis (Table S2). The sizes of the H. molitrix and H. nobilis genomes were estimated at 782 and 844 Mb, respectively, by K-mer and genome scope analyses (Table S3 and Figure S1). Using soapdenovo (Luo et al., 2012), we obtained an 837-Mb assembly for H. molitrix (contig N50: 19.9 kb; scaffold N50: 972.8 kb) and an 845-Mb assembly for H. nobilis (contig N50: 39.1 kb; scaffold N50: 3334 kb) (Table 1, Figure 1a; Table S4). The quality of the two genomes was comparable with that of the published blunt snout bream and grass carp genomes, which were also assembled from cost-effective and accurate short reads, but the assemblies lack the contiguity and completeness achievable with current standards (the combination of long reads, short reads and Hi-C technology). In both cases, peak sequencing depth was about 80–90× (Figure S2). The GC contents of the H. molitrix and H. nobilis genomes were both about 37%, similar to those of previously published teleost genomes (Figure S3) (Liu et al., 2017). The quality of each assembled genome was evaluated using busco, with an Actinopterygii-specific gene set of 4584 single-copy orthologues. The H. molitrix and H. nobilis genomes captured respectively 94.9% (4315 of 4584) and 96.2% (4410 of 4584) of the complete BUSCOs, while the genomes of blunt snout bream and grass carp captured 91.8% and 76.4%, respectively (Table 1). Thus, by this measure, our assemblies were more complete than those of the other East Asian cyprinids.
Characteristics | Silver carp | Bighead carp | Blunt snout bream | Grass carp |
---|---|---|---|---|
Number of scaffolds (>2 kb) | 6208 | 1861 | 9769 | — |
Total size of scaffolds (bp) | 837,021,069 | 845,590,624 | 1,115,678,790 | 900,506,596 |
N90 scaffold length (bp) | 74,972 | 1,242,541 | 20,422 | 179,941 |
N90 scaffold number | 1193 | 302 | 4034 | 301 |
N50 scaffold length (bp) | 972,830 | 3,334,523 | 838,704 | 6,456,983 |
N50 scaffold number | 236 | 67 | 285 | 42 |
N50 contig length (bp) | 19,902 | 39,098 | 49,400 | 40,781 |
N50 contig number | 11,589 | 6164 | 5730 | 5701 |
Longest scaffold (bp) | 4,480,394 | 19,310,221 | 8,950,707 | 19,571,558 |
Number of markers in genetic map | 3134 | 3900 | 14,648 | 279 |
Number of chromosomes | 24 | 24 | 24 | 24 |
Scaffolds anchored on linkage groups | 861 | 353 | 1309 | 114 |
Length of scaffolds anchored on linkage groups (bp) | 600,174,639 (71.7%) | 717,030,603 (83.8%) | 811,361,125 (72.7%) | 573,471,712 (64%) |
Total GC content (%) | 37.0 | 37.2 | 37.3 | 37.4 |
Complete BUSCOs (%) | 94.9 | 96.2 | 91.8 | 76.4 |
Protein-coding gene number | 24,571 | 24,229 | 23,696 | 27,263 |
Content of transposable elements (%) | 39.36 | 45.04 | 34.18 | 38.06 |
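
The N50 and N90 rows of Table 1 report both a length and a sequence count. As a reminder of how such statistics are defined, the sketch below computes a generic Nx value: sequences are sorted from longest to shortest, and Nx is the length of the sequence at which the cumulative total first reaches x% of the assembly. The function name and the toy lengths are illustrative and are not taken from the assemblies.

```python
def nx_statistic(lengths, fraction=0.5):
    """Return (Nx length, Nx number) for a list of sequence lengths.

    fraction=0.5 gives N50, fraction=0.9 gives N90. The Nx number is how
    many of the longest sequences are needed to reach that share of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths supplied")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Only reachable through floating-point rounding at fraction close to 1.
    return ordered[-1], len(ordered)


# Toy scaffold lengths (hypothetical), total 300.
scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(nx_statistic(scaffolds, 0.5))
print(nx_statistic(scaffolds, 0.9))
```

A larger Nx length together with a smaller Nx number indicates a more contiguous assembly, which is how the scaffold rows of Table 1 can be compared across species.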

The results of three gene prediction methods (de novo, homology-based and RNA-seq) were integrated using glean. We predicted a total of 24,571 and 24,229 protein-coding genes in the H. molitrix and H. nobilis genomes, respectively (Tables S5–S7). The region coverage of each gene set was evaluated based on transcriptome-identified unigenes and BUSCOs. Nearly 98% of all unigenes were covered by each assembled genome, and only about 6% of all BUSCO genes were missing per genome (Tables S8–S10). Of the 24,571 and 24,229 genes predicted in the H. molitrix and H. nobilis genomes, respectively, 22,036 (89.68%) in the H. molitrix genome and 22,959 (94.76%) in the H. nobilis genome were identified in at least one of the searched databases (InterPro, GO, KEGG, Swiss-Prot and TrEMBL) (Table S11). Repeat analysis indicated that the H. molitrix and H. nobilis genomes included 41.08% and 48.24% repeat sequence, respectively. These figures are slightly higher than that recovered in the blunt snout bream genome (38.68%), but lower than that in the zebrafish genome (52.2%) (Figures S4 and S5). The main difference in repeat content among genomes lay in DNA transposons, the abundance of which was positively correlated with genome size (Tables S12–S15). The noncoding RNA genes in the H. molitrix and H. nobilis genomes were also annotated (Tables S16 and S17).
3.2 Evolution of the East Asian Cyprinidae
To decode the evolutionary history of the East Asian cyprinids, we constructed a phylogenetic tree using 724 single-copy genes from 13 species and four subgenomes from Cyprinus carpio and Sinocyclocheilus grahami. We identified 24,556 gene families across these 15 fish using the Treefam database (Table S18 and Figure S6). The results indicated that the ancestral lineage of the East Asian cyprinids and Cyprinus carpio diverged from that of zebrafish 50.7 million years ago (Ma) (Figure 2). In our phylogeny, H. nobilis was most closely related to H. molitrix, with an estimated divergence time of 3.6 Ma (Figure 2). All four East Asian cyprinids fell into the same clade, closely related to other clades of cyprinoids and zebrafish. We also compared orthologue profiles among the five cyprinids. A total of 9848 gene families were shared among the five species, and 442 gene families were identified in only one of the four East Asian cyprinids (Figure S7); 13,085 gene families overlapped between H. nobilis and H. molitrix. We estimated the expansion and contraction of gene families to compare evolutionary histories among the East Asian cyprinids, zebrafish and eight other fish genomes (the cyprinoid branch was excluded from this comparison due to polyploidy; Figure 2). The results indicated that 328 East Asian cyprinid gene families had expanded and 641 gene families had contracted (Figure 2). KEGG and GO analyses indicated that the terms significantly enriched in the expanded-family genes of both H. molitrix and H. nobilis (p < .05) were associated with diseases, the immune system and environmental adaptations (Figures S8 and S9). Using a branch-site model in paml4, a maximum-likelihood method of molecular evolution estimation, we detected 32 gene families under positive selection in the East Asian Cyprinidae (including silver carp, bighead carp, the blunt snout bream and the grass carp). We functionally annotated the genes in the H. molitrix and H. nobilis gene families under positive selection. Most of the positively selected genes were associated with the responses to stimuli and biotic stimuli, nucleotide binding, GTP binding, and receptor activity (Table S19).

3.3 Tracing chromosome fusion events
To compare genomes at the chromosome level, the blunt snout bream genome was anchored onto chromosomes using a high-density RAD-seq genetic map (Figures S10–S12 and Table S20). Previously published genetic map data for H. molitrix and H. nobilis were used to assign 71.7% and 83.8%, respectively, of the scaffolds to 24 pseudomolecules (Figures S11 and S12, Table S20). Using the chromosome-level data for silver carp, bighead carp, grass carp, blunt snout bream and zebrafish, we investigated gene synteny with the MCScan pipeline. The recovered syntenic relationships indicated that most of the silver carp, bighead carp, grass carp and blunt snout bream linkage groups exhibited extensive collinearity among chromosomes across species. Gene alignment identified the highest synteny between H. molitrix and H. nobilis; up to 16,688 H. molitrix genes (67.92% of 24,571) were located on syntenic blocks. These results were consistent with the synteny of all chromosomes between zebrafish and the East Asian cyprinids, as determined by tbtools (Figure 1b). In all four East Asian cyprinids, one chromosome aligned to zebrafish chromosomes 22 and 10, but not to any other linkage group. This was consistent with previous results in grass carp and blunt snout bream. However, 25 pairs of chromosomes have been identified in the genomes of the zebrafish, Cyprinus carpio, and S. grahami. Our results suggested that the East Asian cyprinids may possess only 24 pairs of chromosomes due to the fusion of two ancestral chromosomes. The ancestral East Asian cyprinid was predicted to have 24 pairs of chromosomes. Our phylogeny (Figure 2) indicated that the most recent common ancestor of the East Asian Cyprinidae (2n = 48) and Cyprinus carpio (2n = 4x = 100) plus S. grahami (2n = 4x = 100) diverged from the zebrafish clade (2n = 50) about 50.7 Ma. Since that time, H. molitrix and H. nobilis evolved different numbers of chromosomes. Chromosome fusion may have occurred about 9.2 Ma, suggesting that this event may have been a consequence of the Himalaya–Tibetan plateau uplift about 9 Ma, which led to the evolution and expansion of the East Asian cyprinid clade.
3.4 Population genetics and introgression
To further explore the relationship between the silver carp and bighead carp in different rivers, we sequenced the complete genomes of 42 individuals (20 silver carp and 22 bighead carp). We obtained 239.9 Gb of clean data for H. molitrix and 259.8 Gb of clean data for H. nobilis (Table S21). Mapping the sequencing reads from each sample to the H. molitrix reference genome showed that the average sequencing depth was ~11–17× (Table S22). The average mapping rate for the silver carp was about 89.92%, while that of the bighead carp was lower (76.43%). Using the gatk pipeline (DePristo et al., 2011), we identified 6,990,203 population SNPs (Table S23). Based on these high-quality SNPs, we investigated the relationships among the 42 individuals using neighbour-joining phylogenetic analysis, PCA and structure analysis. The 42 individuals clearly clustered into two subgroups (H. molitrix and H. nobilis) within the phylogeny (Figure 3). The H. molitrix individuals were further subdivided into three groups corresponding to the river region. The results for H. nobilis were similar, except that two individuals from the Yangtze River were assigned to the Pearl River group. The PCA plot separated H. molitrix and H. nobilis along the first eigenvector, which explained 91.67% of the total genetic variance (Figure 3). The second eigenvector identified subpopulations from the Pearl River, Yangtze River and Amur River and explained 2.45% of the variance. Population structure results, which were highly consistent with our phylogeny and PCA, revealed signs of nuclear gene flow from H. molitrix to H. nobilis (Figure 3). H. nobilis and H. molitrix are genetically isolated and occur sympatrically in their native range. However, introgression between these species was detected in population analysis, consistent with reports of widespread introgressive hybridization between H. nobilis and H. molitrix in the Mississippi River Basin. The processes leading to the divergence of H. nobilis and H. molitrix provide useful information for closely related species. LD values (correlation coefficient, r²) were also calculated for H. molitrix and H. nobilis. LD was greater in H. molitrix (407 bp) than in H. nobilis (322 bp), with an LD decay of 0.16 (Figure 3). Our results indicated that silver carp and bighead carp were highly divergent.

3.5 Genomic divergence and speciation
To detect the genomic regions that differed between the H. molitrix and H. nobilis populations, we used FST values and a modified PBS approach to measure the extent of genomic differentiation between the populations (Figure 4a,b; Figure S13). The mean PBS value of H. molitrix (PBS-H.molitrix) was 1.231, indicating that regions with observed PBS ≥ 1.129 could be identified as highly divergent (Figure 4c). In H. molitrix, there were 818 such regions, with a total length of 31.14 Mb. The longest fragment was 252 kb, and a minority of regions (16.63%) spanned only a single window (20 kb). The mean FST among these regions was 0.823, more than 20-fold greater than the mean FST among subpopulations of H. molitrix (0.034) or H. nobilis (0.013) (Figure 4d).
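The window-based scan described above can be sketched as follows. This is an assumption-laden illustration, not the modified PBS procedure used in this study: it computes only FST, using Hudson's estimator (as formulated by Bhatia et al., 2013) in 20-kb windows, and then merges adjacent windows that exceed a threshold into contiguous regions. The function names, the toy allele frequencies and the threshold of 0.8 are hypothetical.

```python
import numpy as np

def windowed_fst(pos, p1, p2, n1, n2, window=20_000):
    """Hudson's FST per window (ratio of summed numerators and denominators).

    pos    : SNP positions on one scaffold (bp)
    p1, p2 : alternate-allele frequencies in populations 1 and 2
    n1, n2 : number of sampled chromosomes in each population
    """
    pos = np.asarray(pos)
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    idx = pos // window                      # window index of every SNP
    fst = {}
    for w in np.unique(idx):
        mask = idx == w
        d = den[mask].sum()
        if d > 0:                            # skip windows with no variation
            fst[int(w)] = float(num[mask].sum() / d)
    return fst

def merge_windows(fst, threshold, window=20_000):
    """Merge adjacent windows with FST >= threshold into (start, end) regions."""
    regions = []
    for w in sorted(k for k, v in fst.items() if v >= threshold):
        start, end = w * window, (w + 1) * window
        if regions and regions[-1][1] == start:
            regions[-1] = (regions[-1][0], end)   # extend the previous region
        else:
            regions.append((start, end))
    return regions

# Toy data (hypothetical): 40 and 44 chromosomes correspond to 20 and 22
# diploid individuals, matching the sample sizes given in the text.
pos = [1_000, 15_000, 25_000, 39_000, 45_000, 70_000]
p1 = [1.0, 1.0, 1.0, 0.9, 0.5, 0.5]
p2 = [0.0, 0.0, 0.0, 0.1, 0.5, 0.5]
fst = windowed_fst(pos, p1, p2, n1=40, n2=44)
regions = merge_windows(fst, threshold=0.8)  # threshold chosen arbitrarily
print(fst)
print(regions)
```

Merging adjacent windows is what allows divergent regions to range from a single 20-kb window to fragments spanning many windows, as reported for the 818 regions above.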

To reliably detect positive selection in H. molitrix and H. nobilis against a background of speciation differentiation, we integrated the results of four methods (Dxy, Tajima's D, π and XP-EHH). Using a combination of FST and π analyses, we identified 58 putative regions, including 19 candidate genes, which might be associated with divergence or speciation. Three of these candidate genes were enriched in GO terms associated with reproduction and reproductive processes. Using Dxy and XP-EHH analyses, we identified two additional candidate genes enriched in two GO terms associated with reproduction: reproductive system development (GO:0061458) and development of primary female sexual characteristics (GO:0046545). These candidate genes may be targets of natural selection, reflecting differences in functional traits between the two carp species or habitat-specific environmental factors.
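For readers unfamiliar with two of the statistics integrated above, the following minimal sketch shows how nucleotide diversity (π) and absolute divergence (Dxy) can be computed for one window from phased haplotypes. It is an illustration under stated assumptions rather than the implementation used in this study: the function names, the toy haplotypes and the window length of 100 bp are hypothetical.

```python
import numpy as np

def nucleotide_diversity(haps, seq_len):
    """Mean pairwise differences per site within one population (pi).

    haps    : (haplotypes x variable sites) array of 0/1 alleles
    seq_len : total number of sites in the window, including invariant ones
    """
    haps = np.asarray(haps)
    n = haps.shape[0]
    p = haps.mean(axis=0)                       # alternate-allele frequency
    # Unbiased per-site heterozygosity, summed and scaled by window length.
    return float((2 * p * (1 - p) * n / (n - 1)).sum() / seq_len)

def dxy(haps1, haps2, seq_len):
    """Mean pairwise differences per site between two populations (Dxy)."""
    p1 = np.asarray(haps1).mean(axis=0)
    p2 = np.asarray(haps2).mean(axis=0)
    return float((p1 * (1 - p2) + p2 * (1 - p1)).sum() / seq_len)

# Toy haplotypes (hypothetical): 4 haplotypes per population, 3 variable
# sites in a 100-bp window; the first two sites are fixed differences.
pop_a = [[0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 0, 0]]
pop_b = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 1]]

pi_a = nucleotide_diversity(pop_a, seq_len=100)
pi_b = nucleotide_diversity(pop_b, seq_len=100)
d_ab = dxy(pop_a, pop_b, seq_len=100)
print(pi_a, pi_b, d_ab)
```

A window with low π within each population but high Dxy between them, as in this toy example, is the kind of pattern that scans combining these statistics are designed to flag.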
4 CONCLUSION
We have constructed two chromosome-level genome assemblies and used whole-genome resequencing of 42 individuals to provide insights into the evolution of East Asian cyprinids and the speciation of silver carp and bighead carp. We uncovered common chromosome fusion events in the ancestral East Asian cyprinid. We revealed substantial genomic isolation and identified highly divergent regions between silver carp and bighead carp at the population level. Our extensive genetic analyses characterized the adaptive radiations and speciation events that have allowed the silver carp and bighead carp to thrive in various native rivers. These genomic data provide an important resource for further studies of the economically and scientifically important East Asian cyprinids, including studies of their evolution, conservation and commercial breeding.
ACKNOWLEDGEMENTS
This research was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB31000000), the National Natural Science Foundation of China (31702016 and 31972866) and the Youth Innovation Promotion Association, Chinese Academy of Sciences (http://www.yicas.cn).
CONFLICTS OF INTEREST
The authors declare no conflicts of interest.
AUTHOR CONTRIBUTIONS
S.H. conceived and initiated the study. J.J., B.W. and L.G. performed the genome sequencing and bioinformatics analyses. L.Y. and X.G. performed most of the experiments, and L.G., L.F., J.L., K.D., B.F., M.B., Y.W. and M.C. assisted with some of the experiments. J.J. wrote the manuscript. S.H., L.Y., X.G., B.W., X.F. and S.J. revised the manuscript.
Open Research
DATA AVAILABILITY STATEMENT
The data supporting the findings of this work are available within the paper and its Supporting Information. Detailed information on the software used in this study can be found in Table S24, along with the corresponding GitHub link (https://github.com/wubin2-bin/silver-carp-genome). The data sets generated and analysed during the current study are available from the corresponding author upon request. The genome assemblies and annotations of the silver carp and bighead carp, and all raw reads generated in this study, have been deposited in the China National GeneBank (CNGB, https://db.cngb.org/cnsa) under accession CNP0000974.