Chromosome level assembly reveals a unique immune gene organization and signatures of evolution in the common pheasant
Abstract
The common pheasant Phasianus colchicus, belonging to the order Galliformes and family Phasianidae, is the most widespread species. Despite a long history of captivity, the domestication of this bird is still at a preliminary stage. Recently, the demand for accelerating its transformation to poultry for meat and egg production has been increasing. In this study, we assembled a high-quality, chromosome-scale genome of the common pheasant by using PacBio long reads, next-generation short reads, and Hi-C technology. The primary assembly has a contig N50 size of 1.33 Mb and a scaffold N50 size of 59.46 Mb, with a total size of 0.99 Gb, resolving most macrochromosomes into single scaffolds. A total of 23,058 genes and 10.71 Mb interspersed repeats were identified, constituting 30.31% and 10.71% of the common pheasant genome, respectively. Our phylogenetic analysis revealed that the common pheasant shared a common ancestor with the turkey about 24.7–34.5 million years ago (Ma). Rapidly evolved gene families, as well as branch-specific positively selected genes, indicate that calcium-related genes are potentially related to the adaptive and evolutionary change of the common pheasant. Interestingly, we found that the common pheasant has a unique major histocompatibility complex B locus (MHC-B) structure: three major inversions occurred in the sequence compared with chicken MHC-B. Furthermore, we detected signals of selection in five breeds of domestic common pheasant, several of which are production-oriented.
1 INTRODUCTION
The common pheasant (Phasianus colchicus), also known as the ring-necked pheasant, is a bird native to Asia which has been widely introduced to Europe, America, and Australia (Johnsgard, 1999). It has a long tail and colourful plumage and belongs to the family Phasianidae in the order Galliformes. Up to 30 subspecies of P. colchicus, categorized mainly by their morphological variations, are distributed worldwide and have adapted to diverse habitats (Madge et al., 2002). This bird is characterized by strong sexual dimorphism: males are typically highly decorated with bright and coloured plumage, long striped tails, and spurs, whereas females are smaller, cryptic, and nonornamented.
The common pheasant has a long history of captivity, but a relatively shorter history of domestication. The Chinese ring-necked pheasants, mostly descendants of P. colchicus subsp. torquatus, may be the most widely introduced subspecies of the common pheasant; they have been domesticated for less than 200 years (Burger, 1988; Yardley, 2015). The pheasants are commercially bred and stocked on game farms for hunting purposes.
These domestic common pheasants were reintroduced into China during the 1980s and 1990s (Pan, 2015). For decades, these reintroduced commercial breeds, also called “seven-colour wild pheasant,” have been bred for their production traits. With an increase in demand in the domestic market, the commercial common pheasant has become the most consumed rare poultry and an important source of quality protein. Unlike fully domesticated poultry such as chickens and ducks, however, farmed common pheasants still urgently need genetic improvement. Moreover, with an increase in the farming and release of domestic common pheasants, many indigenous subspecies have become threatened at different levels owing to genetic mixing with their feral relatives (Braasch et al., 2011). Thus, a high-quality reference genome is essential for the genetic improvement and conservation of the common pheasant.
Recent advances in single-molecule sequencing allow the production of long-range genomic data (Eid et al., 2009). The sequencing-based high-throughput chromosome conformation capture (Hi-C) approach has been used to produce chromosome-level scaffolds (Lieberman-Aiden et al., 2009). Hybrid de novo genome assembly approaches that combine long-range technologies with fragmented, yet high-quality, de novo next-generation sequencing (NGS) contigs have the potential to generate accurate chromosome-scale scaffolds (Mostovoy et al., 2016). As these technologies have emerged, the Vertebrate Genomes Project consortium has set a standard for genome assembly quality: the N50 size should be at least 1 Mb for contigs and 10 Mb for scaffolds, and at least 90% of the sequence should be assigned to chromosomes (“A reference standard for genome biology,” 2018).
Here, we report a contiguous, accurate whole genome sequence of the common pheasant obtained by using a hybrid de novo assembly strategy. Based on this high-quality assembly, we revealed the karyotype of the common pheasant and the organization of its immune response genes. We also performed a comparative analysis including genomes from other animals to search for genes that evolved rapidly in this genome. In addition, we resequenced five commercial breeds of domestic common pheasant to investigate their population structure and signals of selection. This study provides insight into the genomic evolution, molecular phylogeny, and signals of selection in this pheasant. Furthermore, this robust reference genome represents an essential resource for molecular breeding and genetic improvement of domestic common pheasants, and for genetic conservation of wild local common pheasants.
2 MATERIALS AND METHODS
2.1 Sample collection
Five breeds of P. colchicus (Figure S1) used in this study were collected from Shanghai Xinhao Rare Poultry Breeding Co., Ltd. (Shanghai, China). The ringneck (R), Manchurian (M), and melanistic mutant (Mt) breeds were recently introduced from the United States. These three breeds are smaller in body size and are primarily used for stock and hunting. Shenhong (S), the first breed of domestic common pheasant certified by the Chinese government, is a descendant of the domestic Chinese ring-necked pheasant (also called the rainbow pheasant) that was introduced from the United States in the 1980s and 1990s. Both the black breed (B) and Mt have been considered mutants of the rainbow pheasant. Both the S and B breeds have been cultivated in China for decades and are mostly bred for egg and meat production. The ringneck breed, which shows the characteristic morphological features of the Chinese ring-necked pheasant, was selected as the representative for de novo sequencing. Fresh tissues, including blood, muscle, heart, brain, liver, and spleen, were collected from 16 adult female common pheasants and then stored at –80°C for subsequent analysis. All animal experiments followed the guidelines established by the ethics committee for the Care and Use of Laboratory Animals of the Shanghai Jiao Tong University. The protocol was approved (permit number: 2017-0321).
2.2 Genome sequencing
We extracted DNA by using the Blood & Tissue Kit (TIANGEN), following the manufacturer's instructions. Two paired-end and four mate-pair libraries were prepared for de novo sequencing. Library sequencing was performed on Illumina platforms. Genomic DNA was sheared using g-Tubes (Covaris), with 20 kb library length. The sheared DNA was purified and concentrated and then used for SMRTbell library preparation according to the PacBio 20 kb template preparation protocol. Single-molecule sequencing was conducted on the PacBio RS-II and Sequel platform. Library preparation, fragment selection, and sequencing were performed at the Shanghai Personal Biotechnology Co., Ltd. (Shanghai, China). Two libraries were prepared for Hi-C sequencing. For each library, muscle tissue was crosslinked and digested with MboI. The libraries were sequenced using the HiSeq X-ten platform.
2.3 Genome assembly
AdapterRemoval version 2 (Schubert et al., 2016) was applied to remove sequencing adapters and filter low-quality NGS reads. The estimated genome size was 0.99 Gb according to k-mer analysis (k = 17). All clean next-generation reads provided approximately 80-fold mean coverage. First, we used the Illumina paired-end reads for de novo assembly without Pacific Biosciences (PacBio) long reads. SOAPdenovo version 2 (Luo et al., 2012) and SSPACE (Boetzer et al., 2011) were used to generate the first assembly, with a contig N50 of 2.95 kb and a scaffold N50 of 2.43 Mb.
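The k-mer-based size estimate can be illustrated with a minimal sketch. It assumes the common approach of dividing the total number of counted k-mers by the depth of the main peak of the k-mer histogram; the text does not state which estimator or software was actually used, and the function name, the `min_depth` cutoff, and the toy histogram are hypothetical.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer multiplicity histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors (an assumption).
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth at which the largest number of distinct k-mers occurs.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences contributed by the retained depths.
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (not data from this study).
toy_histogram = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 500, 21: 300, 22: 100}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated size = {size:.0f} bp")
```

In a heterozygous genome the histogram can show two peaks, and this sketch simply takes the tallest one; choosing the homozygous peak explicitly would be a reasonable refinement.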
FALCON version 1.2.5 (Chin et al., 2016) was used to generate the draft genome from the PacBio reads. The estimated total coverage was 78×. The draft genome was polished using Pilon (Walker et al., 2014) with the filtered NGS data. The de novo assembly with PacBio long reads produced a P. colchicus reference genome with a contig N50 of 1.33 Mb. HiC-Pro (Servant et al., 2015) was applied to obtain valid paired-end long insertion reads from the approximately 500 Gb of raw Hi-C data. The assembled contigs were then ordered and oriented using Lachesis (Dudchenko et al., 2017), followed by SALSA (Ghurye et al., 2017), by using the validated clean data. RaGOO (Alonge et al., 2019) was used with default settings to reconstruct the common pheasant's sex chromosomes by aligning and ordering the scaffolds against the chicken reference genome. Benchmarking universal single-copy orthologue (BUSCO) analysis of the draft assembly was run against the eukaryote, metazoan, vertebrate, and avian orthologue databases to assess the assembly quality. See Supporting Information 2 for further details.
2.4 Genome annotation
The interspersed and low-complexity regions were identified using RepeatMasker (Smit et al., 2015) with combined RepBase libraries (version 2017-01-27) and RepeatModeler calls generated from the genome assemblies. RNA from the heart, liver, spleen, brain, and muscle was isolated, prepared into libraries, and sequenced using Illumina technology at Shanghai Personal Biotechnology Co., Ltd. Gene models were resolved using a combined strategy of ab initio predictions, homologue prediction, and transcriptome evidence. The Trinity software package (Grabherr et al., 2011) was used for de novo transcript assembly. We used PASA (Haas et al., 2008) to align the assembled transcripts to the common pheasant genome sequences and then TransDecoder from the Trinity package to identify the likely open reading frame within the transcripts. The ab initio gene prediction was performed using Augustus (Stanke et al., 2008), SNAP (Korf, 2004), and GlimmerHMM (Majoros et al., 2004). For the homology-based approach, the amino acid sequences from chicken, turkey, Japanese quail, and Guinea fowl were aligned to the common pheasant assembly by using Exonerate. All the results were integrated using the EVidenceModeler pipeline (Haas et al., 2008). The final gene set was then annotated using BLAST against the NCBI nonredundant protein database, with matches restricted to vertebrates and an e-value threshold of 1e-5.
2.5 Whole-genome alignment
We aligned the common pheasant scaffolds to the genomic sequences of chicken and turkey by using MUMmer 3.0 (Kurtz et al., 2004) with default settings. We manually curated the predicted gene loci of the major histocompatibility complex B locus (MHC-B) by incorporating evidence from the assembled transcripts and homologous regions of the chicken, turkey, Japanese quail, and Guinea fowl. The curated MHC-B fragment was aligned to the MHC-B sequences of chicken (NCBI accession: AB268588), turkey (NCBI accession: DQ993255), Japanese quail (NCBI accession: AB078884), golden pheasant (NCBI accession: JQ440366), black grouse (NCBI accession: JQ028669), and Mikado pheasant, respectively.
2.6 Gene family construction
To define gene families, we first downloaded coding sequences of 15 species from ENSEMBL Annotation Release 96 and extracted the longest protein for each gene. The 15 species were human, mouse, platypus, green anole, chicken, turkey, Japanese quail, Guinea fowl, duck, pink-footed goose, great tit, zebra finch, flycatcher, budgerigar, and emu. OrthoFinder 2 (Emms & Kelly, 2019) was used with the default settings to identify orthogroups (gene families) based on phylogenetic relationships between gene sequences through pairwise sequence similarity obtained from an all-versus-all BLAST. CAFE (Han et al., 2013) was used with default parameters to define the expansion and contraction of each gene family.
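The step of keeping the longest protein per gene can be sketched as follows. This is an illustration only: it assumes FASTA headers that carry a `gene:<ID>` token (as Ensembl peptide files typically do), and the function names and toy records are hypothetical rather than taken from the pipeline used here.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gene_id(header):
    """Extract the gene identifier from a 'gene:<ID>' token (assumed format)."""
    for token in header.split():
        if token.startswith("gene:"):
            return token[len("gene:"):]
    raise ValueError(f"no gene: token in header: {header!r}")


def longest_protein_per_gene(lines):
    """Return {gene_id: (protein_id, sequence)} keeping the longest protein."""
    best = {}
    for header, seq in parse_fasta(lines):
        gid = gene_id(header)
        if gid not in best or len(seq) > len(best[gid][1]):
            best[gid] = (header.split()[0], seq)
    return best


# Toy records (not real Ensembl entries).
toy = [
    ">P1 pep gene:G1 transcript:T1", "MKT",
    ">P2 pep gene:G1 transcript:T2", "MKTAYL",
    ">P3 pep gene:G2 transcript:T3", "MA",
]
print(longest_protein_per_gene(toy))
```

Keeping one representative protein per gene avoids alternative isoforms being counted as separate members of an orthogroup.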
2.7 Phylogenetic analysis
A total of 5,024 single-copy families, including the common pheasant and the other 12 species (green anole, chicken, turkey, Japanese quail, Guinea fowl, duck, pink-footed goose, great tit, zebra finch, flycatcher, budgerigar, and emu), were used to reconstruct phylogenies. We used MUSCLE (Edgar, 2004) to align the amino acid sequences and then used TrimAl (Capella-Gutierrez et al., 2009) to remove unreliably aligned sites and gaps in the alignments. IQ-TREE (Nguyen et al., 2015) was applied to each of the alignments to infer gene trees. All gene trees were merged using ASTRAL (Zhang et al., 2018). In parallel, we concatenated all the amino acid alignments into a super-gene alignment. Finally, 2,233,863 sites remained and were used to construct maximum likelihood species trees by using RAxML (Stamatakis, 2014) with the PROTGAMMAAUTO model. A Markov chain Monte Carlo (MCMC)-based method was used to estimate the divergence times between the different species (dos Reis & Yang, 2011).
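The concatenation of per-gene alignments into a super-gene alignment can be sketched as below. This is a minimal illustration, not the script used in the study; the function name and the toy alignments are hypothetical, and padding absent species with gaps is an added assumption (with single-copy families present in all species it would never be triggered).

```python
def concatenate_alignments(alignments, species):
    """Join per-gene alignments column-wise into one supermatrix.

    alignments: list of dicts mapping species name -> aligned sequence.
    species:    ordered list of species to include in the output.
    """
    parts = {sp: [] for sp in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for sp in species:
            # A species missing from this gene is filled with gap characters.
            parts[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Toy alignments (not data from this study).
genes = [
    {"pheasant": "MK-T", "turkey": "MKAT", "chicken": "MRAT"},
    {"pheasant": "LL", "turkey": "LI"},
]
supermatrix = concatenate_alignments(genes, ["pheasant", "turkey", "chicken"])
for name, seq in supermatrix.items():
    print(name, seq)
```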
2.8 Branch-specific positive selection
We obtained codon alignments for the single-copy orthologous groups by first aligning the amino acids by using MUSCLE and then aligning codon-based nucleotides by using PAL2NAL (Suyama et al., 2006). We applied the branch model and branch-site model in codeml (Yang, 2007) to estimate the dN/dS ratios using the specified phylogeny (N. meleagris [{G. gallus, C. japonica}, {M. gallopavo, P. colchicus}]). The foreground branch was specified as the common pheasant lineage in both models. In the branch tests, the null hypothesis was that all branches had the same dN/dS (model = 0), whereas in the alternative hypothesis the foreground branch had a free dN/dS different from that of the background branches (model = 2). For the branch-site model, we tested a null model (model = 2, NSsites = 2, fix_omega = 1, and omega = 1) against an alternative model (model = 2, NSsites = 2, and fix_omega = 0). Significant positive selection for the branch and branch-site models was inferred using log-likelihood ratio tests between the alternative and null models, corrected for multiple testing.
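The likelihood ratio test between nested codeml models can be sketched as follows. The sketch assumes a chi-square reference distribution with one degree of freedom and a Benjamini-Hochberg correction; the text does not state which correction was applied, so both choices are assumptions, and the log-likelihood values are invented for illustration.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test with 1 degree of freedom.

    Returns (statistic, p-value). For a chi-square variable with df = 1,
    P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log-likelihoods (null, alternative) for three genes.
fits = {"geneA": (-1000.0, -994.0), "geneB": (-2500.0, -2499.5), "geneC": (-800.0, -797.0)}
names = list(fits)
raw = [lrt_pvalue(*fits[g])[1] for g in names]
for gene, p, q in zip(names, raw, benjamini_hochberg(raw)):
    print(f"{gene}: p = {p:.4g}, adjusted p = {q:.4g}")
```

Genes whose adjusted p-value falls below the chosen threshold would then be reported as candidates for positive selection on the foreground branch.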
2.9 Population data analysis
Fifteen females were sampled and sequenced following the whole genome sequencing protocol. Reads that passed quality control (QC) were aligned to the reference assembly by using the Burrows-Wheeler Aligner (BWA; Li & Durbin, 2009). The Genome Analysis Toolkit (GATK) version 3.7 (DePristo et al., 2011) was used for variant calling. PLINK (Slifer, 2018) was applied to filter the variants and perform principal component analysis (PCA). High-confidence SNPs were concatenated and used for phylogenetic tree inference of individuals from the five populations. VCFtools (Danecek et al., 2011) was used to estimate the genome-wide distribution of Tajima's D, θπ, and FST within and between breeds. To detect regions with significant signatures of selective sweeps, two methods were implemented. First, we considered the distribution of the θπ ratios and FST values obtained by comparing the ringneck breed with each of the other four breeds. Windows with both significantly low or high θπ ratios (the 5% left and right tails) and significantly high FST values (the 5% right tail) of the empirical distribution were considered to be regions with strong selective sweep signals. Second, we searched for regions with both significantly low Tajima's D (the 5% left tail) and significantly low θπ (the 5% left tail) of the empirical distribution.
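The first outlier approach, which intersects the tails of the θπ ratio and FST distributions, can be sketched as follows. It is a minimal illustration assuming per-window values are already available as parallel lists; the function names, the rank-based tail definition, and the toy values are hypothetical and not taken from the analysis itself.

```python
def tail_indices(values, tail, side):
    """Indices of the windows in the left or right `tail` fraction by rank."""
    k = max(1, round(tail * len(values)))
    order = sorted(range(len(values)), key=values.__getitem__)
    return set(order[:k]) if side == "left" else set(order[-k:])


def sweep_candidates(pi_ratio, fst, tail=0.05):
    """Windows in either tail of the pi ratio AND in the right tail of FST."""
    if len(pi_ratio) != len(fst):
        raise ValueError("pi_ratio and fst must describe the same windows")
    pi_outliers = tail_indices(pi_ratio, tail, "left") | tail_indices(pi_ratio, tail, "right")
    fst_outliers = tail_indices(fst, tail, "right")
    return sorted(pi_outliers & fst_outliers)


# Toy values for 20 windows (not data from this study).
pi_ratio = [1.0 + 0.01 * i for i in range(20)]
fst = [0.05 + 0.001 * i for i in range(20)]
pi_ratio[3] = 0.2   # unusually low ratio, but ordinary FST
pi_ratio[7] = 3.0   # unusually high ratio ...
fst[7] = 0.6        # ... combined with high differentiation
print(sweep_candidates(pi_ratio, fst))
```

Requiring both statistics to be extreme in the same window is more conservative than using either one alone, which is why window 3 in the toy data is not reported.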
3 RESULTS
3.1 The highly continuous genome assembly
The sequence data used for this assembly were obtained solely from three sequencing platforms: the PacBio RS II, PacBio Sequel, and Illumina X-ten. Details on the sequence data are shown in Table S1. After low-quality and adaptor reads were filtered, approximately 80-fold coverage of reads from 400 bp, 2 kb, 5 kb, 8 kb, and 12 kb libraries was retained for assembly. Moreover, we generated an approximately 78-fold SMRT whole genome shotgun sequence by using the 20 kb libraries. We generated approximately 650 million Hi-C paired-end reads, of which about 250 million were valid interaction reads.
A de novo assembly with single-molecule long reads resulted in 3,666 contigs with a total length of 0.99 Gb and an N50 of 1.33 Mb (Table 1). After correction by using NGS reads, the assembled contigs were ordered and oriented on common pheasant chromosomes with Hi-C reads, resulting in an assembly consisting of 589 scaffolds. Among these scaffolds, 11 macroscaffolds (23.96–192.31 Mb in length) contained 74.52% of the total sequence, and an additional 27 microscaffolds (minimum length of 1.48 Mb) accounted for a further 22.09% of the genome. The remaining short scaffolds contributed only a small portion of the genome size (Table S2).
Statistic | NGS contig | NGS scaffold | PacBio + NGS + Hi-C contig | PacBio + NGS + Hi-C scaffold |
---|---|---|---|---|
Sequences Num. | 606,078 | 85,695 | 3,666 | 589 |
Total bases | 975,778,029 | 1,032,529,423 | 985,688,982 | 987,376,876 |
GC content | 40.92% | 41.21% | 41.18% | 41.18% |
Ambiguous bases | 0 | 11,167,110 | 0 | 1,347,530 |
Longer than 1 kb | 276,740 | 4,372 | 3,666 | 589 |
Longest (bp) | 62,130 | 19,824,849 | 12,604,382 | 192,313,706 |
N20 (bp) | 6,976 | 4,731,716 | 3,207,971 | 110,717,904 |
N50 (bp) | 2,954 | 2,428,639 | 1,328,717 | 59,463,360 |
N90 (bp) | 701 | 548,955 | 147,345 | 10,732,325 |
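The Nx statistics in Table 1 follow the usual definition: sequences are sorted from longest to shortest, and Nx is the length of the sequence at which the running total first reaches x% of the assembly. A minimal sketch is given below; the function names and the toy lengths are hypothetical, and the threshold check uses only the 1 Mb contig and 10 Mb scaffold N50 criteria quoted in the Introduction (the 90% chromosome-assignment criterion is not checked).

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length for a list of sequence lengths (N50 by default)."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    target = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return min(lengths)  # only reached through floating-point edge cases


def meets_n50_thresholds(contig_n50, scaffold_n50):
    """Check the contig (>= 1 Mb) and scaffold (>= 10 Mb) N50 criteria."""
    return contig_n50 >= 1_000_000 and scaffold_n50 >= 10_000_000


# Toy lengths: total 300, so N50 is the length at which the sum reaches 150.
print(nx([80, 70, 50, 40, 30, 20, 10]))            # 70
# N50 values taken from Table 1.
print(meets_n50_thresholds(1_328_717, 59_463_360))  # PacBio + NGS + Hi-C
print(meets_n50_thresholds(2_954, 2_428_639))       # NGS only
```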
The k-mer distribution of short-insert library reads exhibited two major peaks (Figure S2), representing approximately 0.6% heterozygosity. The assembled genome of the common pheasant was 0.99 Gb in size, which was mostly consistent with the cytological C value (Gregory, 2020) and the k-mer–based estimation. The completeness of the common pheasant assembly was further evaluated using BUSCO (Simao et al., 2015). Of the 4,915 single-copy orthologues in the avian lineage, approximately 96.8% were found in our assembly, which is considerably greater than that of other birds (Korlach et al., 2017; Lee et al., 2018; Vignal et al., 2019) and comparable to other published high-quality reference genome assemblies (Johnson et al., 2018). In addition, about 91.4%, 92.1%, and 98.3% of the single-copy orthologous genes of metazoans, eukaryotes, and vertebrates were found, respectively (Figure S3). Our analysis suggests the completeness and overall high quality of the assembled common pheasant genome.
3.2 Genome structure characteristics
The overall GC content of the common pheasant genome was estimated to be 41.24%, which is similar to that of the other reference bird species. Interspersed repeats accounted for approximately 10.71% of the whole genome, spanning 99.44 Mb, and consisted of approximately 87.2 Mb retroelements and 10.9 Mb DNA transposons. Approximately 7.28% of the sequences were identified as long interspersed nuclear elements (LINEs), which were thus the largest component. The chicken repeat 1 group was the most abundant, occupying 98.7% of the identified LINEs, whereas 1.71% of the sequences were identified as long terminal repeats and 0.04% as short interspersed nuclear elements. The overall level of repetitive content in the common pheasant (Table S3) was similar to that in the chicken (International Chicken Genome Sequencing Consortium, 2004) and greater than that of the turkey (Dalloul et al., 2010) as well as most sequenced birds (Zhang, Li, et al., 2014), which may be attributed to the advantage of the long-read sequencing technology. In addition, 5,560 microsatellite markers were developed for further breeding (Table S4).
After the repeats were masked, a combined strategy of ab initio gene prediction, homologue-based protein evidence, and transcriptome sequencing (Table S5) was used to identify gene loci in the assembly sequences. The final gene models comprised 23,058 transcripts, spanning a 383.4 Mb genomic region. The statistics of the annotated genes in the assembly averaged 16.6 kb per gene, 170 bp per exon, and 2.5 kb per intron. We BLASTed these genes to the NR database and validated 16,208 protein products. A total of 305 tRNAs, 210 snoRNAs, and 21 partial and five complete rRNAs were identified.
3.3 Two chromosomal fissions shaped the karyotype of the common pheasant
Despite the overall conservation across all birds, some chromosomes are still reported to be different in various species, and the diploid numbers of chromosomes range from 78 to 82 (Belterman & Boer, 1984; Stock & Bunch, 1982; Takagi & Sasaki, 1974). According to Shibusawa et al. (2004), the pheasant has 41 pairs of chromosomes, differing from the 2n = 78 of the chicken karyotype by two simple fissions of ancestral chromosomes 2 and 4.
To understand the similarities between the common pheasant and chicken at the chromosome level, we compared our assembly with the chicken reference (Gallus gallus 6.0). Most of the large scaffolds had counterparts among the chicken macrochromosomes, except scaffolds 3, 4, and 6 (Figure 1a). Among them, scaffolds 3 and 6 both aligned to chicken (Gallus gallus, GGA) chromosome 2 (GGA2), whereas scaffold 4 and microscaffold 15 together made up GGA4. We also compared the assembly with the turkey reference (Turkey 5.0). Correspondingly, the first nine scaffolds have unique counterparts among the macrochromosomes of turkey (Figure 1b). As in the common pheasant, the turkey (Meleagris gallopavo, MGA) chromosomes 3 (MGA3) and MGA6 are orthologous to GGA2, whereas MGA4 and MGA9 together make up GGA4.

Unlike the autosomes, the pheasant sex chromosomes were assembled in a fragmented state, which is probably attributable to their structural complexity (Wang et al., 2014). To better resolve their sequence and structure, we reconstructed the Z and W chromosomes of the pheasant, leaving 126 unclustered scaffolds. The Z chromosome was reconstructed from 239 scaffolds with a total length of 56.14 Mb, including nine scaffolds larger than 1 Mb. The reconstructed W chromosome was considerably smaller, with a length of 4.08 Mb. Among the 13 scaffolds that constitute the W chromosome of the pheasant, only one was larger than 1 Mb (Table S6).
Despite the overall high consistency of the common pheasant, chicken, and turkey genomes, some exceptions were observed. For example, scaffold 7 in the common pheasant assembly was composed of several chicken chromosomes: GGA10, GGA19, GGA21, GGA26, and GGA28 (Figure 1c). Correspondingly, chromosomes 12, 21, 23, 28, and 30 of turkey aligned to scaffold 7 (Figure 1d). Considering that no cytogenetic evidence supports such a novel macrochromosome, scaffold 7 may be a misassembled pseudochromosome. Apart from scaffold 7, one major fusion of GGA22 and GGA24 was found in our scaffold 20, whereas two fissions were identified in GGA15 and GGA27. Moreover, varying degrees of rearrangement were found between scaffolds 27 and 37 of the pheasant and GGA25, GGA30, and GGA33 of the chicken (Figure 1c). Because of the GC-richness, specific repeats, and high mutation rate of bird microchromosomes (Kawahara-Miki et al., 2013), and the lack of cytogenetic support, it is difficult to determine whether these short scaffolds reflect bona fide chromosomal rearrangements or simply assembly errors, and further studies are required in this regard.
3.4 Species tree building and divergence time estimation
We found large-scale differences in gene complements within birds and among birds, reptiles, and mammals by using the common pheasant gene set and combined gene sets from 12 birds, one reptile, and three mammals. Thus, we obtained 34,976 gene families from all 16 species. A total of 16,208 genes of P. colchicus were distributed in the 13,267 gene families, of which 11,211 were single orthologous genes. Of these gene families, 1,235 are specific to the common pheasant. Of the protein-coding gene sequences of the common pheasant, the pairwise orthologues for 11,245, 10,682, and 11,080 genes could be identified in the chicken, turkey, and human genomes, respectively (Figure 2a). A total of 8,385 combined orthologues could be identified across all the 13 animals.

A completely resolved phylogenetic tree of 48 birds was generated in 2014 (Jarvis et al., 2014); however, the common pheasant was not included. To place the analyses in a well-grounded evolutionary framework, we generated a whole-genome species tree by using RAxML (Stamatakis, 2014) and the green anole genome as an outgroup. In total, 2,233,863 sites from 5,024 single-copy orthologous genes of 13 species were obtained, yielding a species tree with 100% bootstrap support for all nodes (Figure 2b). To ensure that the phylogeny was robust, we used the coalescent-based phylogenetic method astral and the same gene set (Figure S4). As expected, the topology of the astral coalescent tree was identical to the species tree.
According to our results, among the 13 species used to reconstruct the phylogenetic tree, turkey is the most closely related to the common pheasant. Our results support that the lineage of the common pheasant is closer to the most recent common ancestor of the chicken and Japanese quail, which was consistent with the findings of Shibusawa et al. (2004) and topologically different from the results of other studies (Griffin et al., 2007; Jiang et al., 2014; Kan et al., 2010; Li et al., 2015; Stein et al., 2015). The species tree clearly split into Psittaciformes, Galliformes, Anseriformes, and Passeriformes. We estimated the divergence times among the 13 species by using three constraints and found that P. colchicus had diverged from the turkey approximately 24.7–34.5 million years ago (Ma). The common ancestor of the pheasant and turkey was inferred to have diverged from the common ancestor of the chicken and Japanese quail around 36.17 Ma.
3.5 Molecular evolution of the common pheasant protein-coding genes
Gene family expansion and contraction are important indicators of adaptive evolution (Kondrashov, 2012). To assess the changes in gene family sizes, we used a likelihood model to determine significant expansions and contractions of gene families. The results revealed 974 expanded and 1,408 contracted gene families relative to the common ancestor of the common pheasant and turkey (Figure 2b). Based on the analysis of differentially evolved genes, we identified 66 gene families as being under accelerated evolution in the pheasant lineage (Table S7). Of these 66 gene families, 16 were homologous to uncharacterized proteins.
In birds, the rate of sequence divergence in immune-related genes is usually higher than in other genes, primarily because of host–pathogen co-evolution (Ekblom et al., 2010). Eight immune-related gene families were found to be expanded or contracted in the pheasant genome, such as genes that are homologous to FFAR3, CD8A, SCART1, CLEC2D, and IGHV. Besides these, most of the remaining gene families were related to calcium signalling; for example, PCDH1, TPM1, RGPD8, CBARP, and ITPRIPL1. We performed gene set enrichment analysis of the 2,382 expanded or contracted gene families (Figure 2c; Table S8). Three KEGG pathways and 32 GO terms (six from molecular function, 18 from biological process, and eight from cellular component) were found to be enriched. The most enriched pathway was the calcium signalling pathway, along with the enriched GO term melanin biosynthetic process and the expanded keratin gene family, which might point to a unique plumage formation mechanism in the pheasant (Fargallo et al., 2006; Li et al., 2018; Nadeau & Jiggins, 2010; Takeuchi et al., 1996).
To determine whether any branch-specific selection is present in the common pheasant, we estimated branch nonsynonymous substitution rate/synonymous substitution rate (dN/dS) for 7,533 orthologous genes in the common pheasant, chicken, guinea fowl, Japanese quail, and turkey. We also searched for episodic positive selection in the same gene set by using the branch-site model. Consequently, we retrieved 39 genes that had been subjected to significant positive selection in P. colchicus based on branch tests (Table S9), some of which were also calcium-related, such as PCDH9, STIM1, ILK, and S100A16. In addition, 13 of the 39 genes showed significance at 0.05 level based on the branch-site model.
3.6 Three genomic inversions occurred in the MHC-B
The high continuity and completeness of the assembly provided us insight into this bird's MHC-B. The MHC-B was first found to be the main functional MHC genomic region in the chicken (Kaufman et al., 1999; Rogers et al., 2003), and was subsequently found in a series of Galliformes (Chaves et al., 2009; Eimes et al., 2013; Lee et al., 2018; Shiina et al., 2004, 2007; Wang et al., 2012; Ye et al., 2012). We assembled a single scaffold (scaffold 37) that covered the entire MHC-B contiguous region of several Phasianids. After curation, 15 putative MHC genes of the pheasant were identified within an approximately 80 kb region in scaffold 37. The genes comprised BG1, Blec1, Blec2, two MHC class II B loci, TAPBP, BRD2, DMA, two DMBs, two MHC class I loci, TAP1, TAP2, and C4, which are shown in Table S10.
The common pheasant and chicken showed similarities in the MHC-B region and shared an almost perfect syntenic gene order (Figure 3a). However, the TAP1 and TAP2 gene block was inverted in the pheasant MHC-B locus (Figure 3b, block in green). Furthermore, we found that the transcript of the TAPBP gene in the common pheasant was in the forward strand, which is opposite to that in the chicken and Japanese quail (Figure 3b, block in red). Thus, as shown in Figure 3, two gene loci, that is, TAPBP and the TAP1-TAP2 block, were inversely oriented compared to that in the chicken MHC-B.

In addition to the two abovementioned inversions, another minor inversion was identified in the MHC-B of the common pheasant, which overlapped with the Blec2 gene locus. By mapping the coding sequence of the Blec2 gene of chicken and turkey to the common pheasant assembly, we found that the Blec2 locus was partially or completely duplicated, and that the duplicate was inverted in the common pheasant. The inverted duplication of Blec2 was similar to that in the golden pheasant, which also possesses putative Blec2 loci in both orientations, that is, a Blec2 locus homologous to that in chicken and its inverted duplicate.
A phylogenetic tree was constructed using the aligned partial MHC-B sequences of the chicken, turkey, golden pheasant, common pheasant, Japanese quail, and black grouse (Figure 3b), and was topologically consistent with our species tree generated using the single-copy orthologous gene set. However, the phylogenetic trees reconstructed by using seven individual genes (BLB1, TAPBP, BLB2, BRD2, BF1, TAP1, and TAP2) were not consistent with one another (Figure S5). In most cases, the common pheasant is topologically next to the golden pheasant, and the chicken is close to the Japanese quail. Only the tree topology of the BF1 coding sequences is consistent with the phylogenetic tree of the partial MHC region. Moreover, six of the seven genes showed signs of elevated dN/dS ratios, suggesting increased balancing selection or relaxed purifying selection (Table S10).
3.7 Population genetics features of domestic common pheasant
We performed whole genome resequencing of 15 females from five breeds. These breeds differed in their appearance (Figure S1), growth, and production traits (Figure 4a; Table S11). A total of 633 Gb of raw reads were generated, with an average depth of 40× (Table S12). These data were mapped against our reference assembly. A total of 3,024,631 INDEL and 16,330,279 SNP sites were identified. After filtering, 11,491,081 SNPs were used for subsequent analysis. PCA and phylogenetic tree inference were performed on the high-quality SNP set; the results are shown in Figure 4b,c. The neighbor-joining tree revealed that each of the five breeds formed a distinct group. The PCA suggested the same, except that breeds S and B are genetically closer to each other than to the other breeds at the overall genomic level.

We measured genetic diversity in each population by using the 11,491,081 SNPs that passed QC. The demographic history of the common pheasant (Figure S6) was estimated by using pairwise sequentially Markovian coalescent analysis (Li & Durbin, 2011). Two statistics, Tajima's D and θπ, were calculated in sliding windows across the whole genome (Figures S7 and S8). The mean θπ values for the five breeds were 0.00417 (R), 0.00438 (M), 0.00356 (Mt), 0.00395 (S), and 0.00387 (B). The two measures indicated that the R and M breeds had a slightly higher effective population size (Ne) than the other three, which might suggest that they are closer to their wild relatives. Moreover, as all five breeds had relatively high Tajima's D values, domestic common pheasants might have experienced a genetic bottleneck due to the breeding process. To investigate population divergence, we calculated the fixation index (FST) among the breeds. The results are shown in Table S13.
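The two statistics above can be illustrated with a minimal sketch, assuming biallelic SNPs summarized as alternate-allele counts among n sampled chromosomes in one window. The function names and the toy counts are hypothetical and do not reproduce the software or window settings used in this study; the formulas follow the standard definitions of θπ and Tajima's D.

```python
from math import sqrt


def pairwise_diversity(alt_counts, n):
    """Sum over sites of the expected pairwise difference 2k(n-k) / (n(n-1))."""
    return sum(2 * k * (n - k) / (n * (n - 1)) for k in alt_counts)


def theta_pi(alt_counts, n, window_bp):
    """Per-site nucleotide diversity for a window of window_bp sites."""
    return pairwise_diversity(alt_counts, n) / window_bp


def tajimas_d(alt_counts, n):
    """Tajima's D for one window; returns None when no site is segregating."""
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sampled chromosomes")
    seg = [k for k in alt_counts if 0 < k < n]  # keep segregating sites only
    s = len(seg)
    if s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the pairwise estimator and Watterson's estimator,
    # scaled by its expected standard deviation.
    return (pairwise_diversity(seg, n) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy window: 6 chromosomes (e.g. three diploid birds), 5 SNPs, 10 kb.
counts = [1, 1, 2, 3, 5]
print(theta_pi(counts, 6, 10_000))
print(tajimas_d(counts, 6))
```

Positive values arise when intermediate-frequency alleles are in excess relative to the neutral expectation, a pattern commonly associated with a recent reduction in population size; negative values arise when rare alleles are in excess.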
Genomic regions totalling 76.1, 68.3, 58.6, and 61.4 Mb showed strong selective sweep signals in breed Mt, breed M, breed S, and breed B, respectively (Figures 4d, S9 and S10). We performed a gene set enrichment analysis of the genes in these regions (Table S14). The results showed that a single pathway, glycosaminoglycan biosynthesis–chondroitin sulphate/dermatan sulphate, was enriched in the comparison between the smaller ringneck breed and the larger Shenhong pheasant. Chondroitin sulphate and dermatan sulphate are long chains of repeating disaccharide units that are important in a wide range of biological processes, including development, regeneration after injury, modulation of growth factor signalling, and cell migration (Purushothaman et al., 2007; Yamada & Sugahara, 2008). Considering that the most obvious difference between the two breeds is body mass, this finding might suggest a footprint of artificial selection for growth (Maccarana et al., 2009).
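One common way to test a pathway for over-representation among selected genes is a one-sided hypergeometric test. The sketch below assumes that approach purely for illustration; the enrichment method actually used is not specified in this section, and the function name and all counts other than the 23,058 annotated genes are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: a 20-gene pathway, 500 genes under selection,
# 4 of them in the pathway, against the 23,058 annotated genes.
p_value = hypergeom_enrichment_p(23_058, 20, 500, 4)
print(p_value)
```

In practice the resulting p-values would be corrected for the number of pathways tested before any single pathway is reported as enriched.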
4 DISCUSSION
In this study, we generated a chromosome-level de novo genome assembly for the domestic common pheasant. The contig and scaffold N50 values of the assembly were substantially higher than those of de novo bird genome assemblies reported in previous studies, indicating the advantage of long-read and Hi-C sequencing for assembling a high-quality genome of a non-model animal. With Hi-C data, mate-pair libraries, which were used to construct scaffolds in many previously published assemblies, were not necessary. In fact, although we tried to integrate both technologies by first implementing mate-pair scaffolding and then Hi-C scaffolding to generate a more intact reference genome, the output was confusing and contradictory. The inclusion of mate-pair libraries generated several huge scaffolds that contradicted the cytological evidence. Thus, the mate-pair sequencing data were not included in the final assembly.
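For readers unfamiliar with the metric, N50 is the length L such that sequences of length L or longer together contain at least half of the total assembled bases. A minimal sketch, with a hypothetical function name and toy lengths rather than the actual contig or scaffold lengths of this assembly:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running total reaches half is the N50.
        if running >= half:
            return length


# Toy example: total length 20, so the running sum must reach 10.
print(n50([2, 3, 4, 5, 6]))
```

Because the metric depends only on the length distribution, a high N50 indicates contiguity but says nothing about correctness, which is why the scaffolds were also checked against cytological evidence.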
Although we achieved a partial chromosome-level assembly, anchoring all short scaffolds to their chromosomes still requires a considerable amount of effort. Physical maps, genetic maps, and bacterial artificial chromosome libraries are commonly used to anchor contigs or scaffolds to chromosomes. However, such resources are absent for most non-model organisms. Therefore, our assembly could be improved by using additional long-range information from technologies such as optical mapping (Lam et al., 2012) and linked reads (Zheng et al., 2016). Because each of the abovementioned scaffolding technologies is associated with its own errors and biases, a combination of different data sources is required to improve the assembly quality.
A chromosome-level assembly of the common pheasant provides information on the karyotypic evolution of Galliformes. The overall genome structures of avian species have been reported to be conserved at both the karyotypic and sequence levels for over 150 million years (Zhang, Li, et al., 2014). The most investigated avian karyotypes belong to representatives of the order Galliformes owing to their agricultural and academic importance; nonetheless, most of these studies focused on macrochromosomes. For example, Shibusawa et al. (2004) demonstrated that two simple fission events occurred in the ancestral chromosomes 2 and 4 of the pheasant, which is in agreement with our results. Compared with macrochromosomes, microchromosomes are more GC-rich and have higher mutation rates (International Chicken Genome Sequencing Consortium, 2004; Kawagoshi et al., 2008). Because of their size and DNA composition, cytogenetic and genomic data on microchromosomes remain limited, which has impeded avian karyotypic research. Our findings provide a relatively reliable and affordable strategy for karyotypic study.
The MHC is the most polymorphic vertebrate genomic region and is involved in adaptive immune responses (Hughes & Yeager, 1998). The human MHC is a large genetic region of 4 million bp with over 100 genes. In contrast, the chicken has a compact “minimal essential MHC,” which consists of 19 genes. In this study, we found that the common pheasant had a different MHC-B structure (Figure 3a,b), with three inversions (TAPBP, TAP1-TAP2, and Blec2). The same inversions of the TAPBP and TAP1-TAP2 regions were also found in the Mikado pheasant, golden pheasant, and black grouse, but were absent in the Japanese quail. In contrast, the turkey has the TAPBP inversion but not the TAP1-TAP2 inversion, whereas the greater prairie-chicken has the TAP1-TAP2 inversion but not the TAPBP inversion. Our results provide a valuable resource for future studies on the evolution of avian MHC genes and poultry immunity.
Although some studies have reported TAPBP and TAP1-TAP2 inversions in other birds (Chaves et al., 2009; Eimes et al., 2013; Lee et al., 2018; Wang et al., 2012; Ye et al., 2012), we found few studies on the functional consequences of those inversions. The TAP1 and TAP2 genes encode ATP-binding-cassette transporter molecules that deliver peptides to MHC class I molecules. TAPBP encodes TAP-binding protein (tapasin), which mediates the interaction between MHC class I molecules and TAP molecules. The three genes were postulated to co-evolve with the BF2 gene, which serves in antigen presentation in chicken (Kaufman et al., 1999; Walker et al., 2011).
The domestic common pheasant is the most popular game bird in the world. In the UK alone, the number of released pheasants was estimated to be 25 to 50 million per year by Madden et al. (2018), most being the domestic common pheasant. In China, the domestic common pheasant is the most consumed rare poultry for meat and egg production. More than 900,000 farmed common pheasants have been estimated to be raised, and 80 million chicks have thus far been traded per year in China. However, unlike broiler and layer chickens, the common pheasant still requires major genetic improvement. Since many major poultry breeding companies and government breeding development programmes now perform genomic selection for poultry breeding, our findings might provide a reference for genomic selection in common pheasant breeding.
Subspecies classification of the common pheasant has long been controversial. Furthermore, farmed pheasants are not isolated from their wild relatives because of the way in which they are stocked. Because of the comparatively brief period of domestication, bred common pheasants are neither morphologically nor genetically divergent from the natural population. Therefore, these commercially bred common pheasants can hybridize with their wild relatives. Hybrid descendants of local subspecies and commercially farmed subspecies worsen the conservation status of local subspecies. Even worse, domestic common pheasant populations that originate from only a few founders might lead to the loss of genetic diversity in indigenous populations. Although pheasants are listed as “least concern” by the IUCN (BirdLife International, 2016), many of their subspecies are threatened to various degrees (Braasch et al., 2011). Previous studies on the phylogenetic classification, genetic diversity, and conservation of pheasants have been limited to mitochondrial sequences and simple sequence repeat markers (Kayvanfar et al., 2017; Wang et al., 2017; Zhang et al., 2014), and our high-quality genomic sequences provide an improved resolution.
In conclusion, we achieved a high-resolution, chromosome-level genome assembly for the common pheasant by integrating second- and third-generation sequencing data. The assembly has a contig N50 size of 1.33 Mb and a scaffold N50 size of 59.46 Mb, and contains 23,058 genes and approximately 99.4 Mb of interspersed repeats. Karyotype and phylogenetic analyses demonstrated that the common pheasant is a sister species to the turkey and that these two species diverged 24.7–34.5 Ma. We show that copy number alteration, as well as branch-specific positive selection, occurred in a series of calcium-related genes, indicating an important role in the adaptive evolution of P. colchicus. In addition, a complete MHC-B was assembled, which expands the spectrum of known galliform MHC structures. Last, but not least, population genetic analysis of resequenced genomes of domestic breeds revealed signals of artificial selection on production traits.
ACKNOWLEDGEMENTS
We thank the staff of the Shanghai Personal Biotechnology Co., Ltd. and Annoroad Gene Technology for the sequencing services. This work was supported by the Shanghai Agriculture Applied Technology Development programme, China (grant 2017-02-08-00-12-F00069). We thank Dr Yan Zhang from Carilion Clinic (Virginia, USA) for his help in the revision of this manuscript.
AUTHOR CONTRIBUTIONS
H.M., X.L., and H.Y. conceived and coordinated the project. L.Z. collected the samples and provided the reagents and materials. L.X. extracted the DNA and built the sequencing libraries. K.X., J.D., H.Z., Y.Z., C.X.H., H.L., L.Y., and L.L. assisted in the experiments. F.A. assisted in the editing of the manuscript. C.H. supervised the sequencing, analysed the data, and wrote the manuscript.
Open Research
DATA AVAILABILITY STATEMENT
This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession WUCP00000000. The version described in this paper is version WUCP01000000. Raw read sequences generated in the de novo sequencing have been deposited in the Sequence Read Archive (SRA) at NCBI under the project accession PRJNA380312. Published genome data used in the analyses can be found under the following accession codes: G. gallus (GRCg6a [ftp://ftp.ensembl.org/pub/release-96/fasta/gallus_gallus/dna/]); M. gallopavo (UMD2 [ftp://ftp.ensembl.org/pub/release-96/fasta/Meleagris_gallopavo/dna/]). Additional datasets, as well as the pipelines and scripts, can be found on figshare at https://figshare.com/projects/Phasianus_colchicus_genome_sequencing_and_assembly/88112.