Comparative analysis of 326 chloroplast genomes in Chinese jujube (Ziziphus jujuba): Structural variations, horizontal gene transfer events, and evolutionary patterns impacting its domestication from wild jujube
Abstract
Jujube (Ziziphus jujuba Mill.), renowned for its nutritional value and health benefits, is believed to have originated in the middle and lower reaches of the Yellow River in China, where it was domesticated from wild jujube. Nonetheless, the evolutionary trajectory of wild and cultivated jujube, and the species differentiation between them, remain to be elucidated. The chloroplast genome (plastome), which has a lower mutation rate than the nuclear genome, serves as an excellent model for evolutionary and comparative genomic research. In this study, we analyzed 326 nonredundant plastomes, comprising 133 jujube cultivars and 193 wild jujube genotypes distributed throughout China. Variation in the large single copy region accounted for most of the size differences among these plastomes, with implications for the evolution from wild jujube to cultivated varieties. Analysis of horizontal gene transfer (HGT) revealed a unique chloroplast-to-nucleus transfer event, in which the transferred fragments mainly influenced the evolution of the nuclear genome while leaving the plastome relatively unaffected. Population genetics analysis revealed two distinct evolutionary pathways from wild to cultivated jujube: one driven by natural selection with minimal human interference, and the other resulting from human domestication and cultivation. Molecular dating, based on phylogenetic analysis, supported placing wild and cultivated jujube in the same taxonomic category, Z. jujuba. In summary, our study comprehensively examined jujube plastome structures and HGT events and provides new insights into the processes that govern the evolution and domestication of jujube.
1 Introduction
Chinese jujube (Ziziphus jujuba Mill.), also commonly known as jujube or Chinese date, is a prominent deciduous fruit tree of the Rhamnaceae family. It is indigenous to China and widely distributed throughout the country, being particularly abundant in the Yellow River Basin (Qu & Wang, 1993; Liu et al., 2020; Song et al., 2021). Jujube is esteemed for its high vitamin C content, which enhances its nutritional value and makes it an important source of essential nutrients and flavor (Huang et al., 2017). Moreover, the jujube fruit is rich in cyclic nucleotides, phenolics, flavonoids, triterpenic acids, and polysaccharides, which confer a diverse array of medicinal properties, including anti-cancer, anti-inflammatory, and immunostimulating activities beneficial to human health (Gao et al., 2013; Liu et al., 2019; Liu et al., 2020). Jujube is cultivated extensively throughout China, with notable concentrations in the northwest, south, and northeast (Guo et al., 2021a). Its direct ancestor, wild jujube (also called sour jujube), thrives predominantly in the middle and lower reaches of the Yellow River (MLYR), specifically in the mountainous terrain of Shanxi, Shaanxi, Hebei, Shandong, and Henan Provinces (Du et al., 2022). In this study, cultivated and wild jujubes are collectively denoted as “jujube,” and are distinguished where necessary as “C-jujube” (cultivated jujube) and “W-jujube” (wild jujube). As a whole, C-jujube and W-jujube differ distinctly in numerous biological traits, including the sugar and soluble organic acid contents of mature fruits, mode of propagation (sexual or asexual), tree morphology, fruit and leaf dimensions, stone morphology, and the presence of spines.
The differences in distribution, together with the pronounced biological and morphological differences between C-jujube and W-jujube, can be attributed to the effects of evolution and domestication (Guo et al., 2021a; Huang et al., 2016; Qu & Wang, 1993).
Natural selection and artificial domestication have given rise to a multitude of variations within the jujube species, resulting in more than 800 known varieties (Liu et al., 2020). The divergence of C-jujube and W-jujube has been traced to approximately 2.7 million years ago (Ma), implying a protracted pre-domestication phase that predates recorded human history. This extended period is postulated to have been driven by natural selection, involving processes such as mutation and hybridization, possibly with the involvement of animals, especially primates (Guo et al., 2021a). In comparison, the artificial domestication of jujube began some 7000 years ago in the MLYR region of China (Qu & Wang, 1993; Liu, 2006), and over the past three millennia, extensive records have documented practices such as cultivar selection, flower thinning, and girdling (Wang & Sun, 1986; Qu & Wang, 1993; Liu et al., 2020; Guo et al., 2021a). Recent research, relying primarily on genome resequencing, has corroborated the conventional view that the Shanxi-Shaanxi region is the principal center of jujube domestication; furthermore, Shanxi appears to have preceded Shaanxi as the initial center of domestication within the MLYR (Guo et al., 2021a). Nevertheless, the domestication of wild into cultivated jujube remains a complex and multifaceted process with many unanswered questions. These include the finer points of the phylogenetic classification of C-jujube and W-jujube, further evidence of speciation events, and a comprehensive understanding of the branching patterns of jujube domestication along the MLYR.
The chloroplast, a vital organelle supporting plant growth and development, is distinguished by being uniparentally (usually maternally) inherited, haploid, and nonrecombinant. Moreover, chloroplast genes have a moderate mutation rate, higher than that of mitochondrial genes but lower than that of nuclear genes. This combination of traits makes the chloroplast an exemplary model for evolutionary and comparative genomics (Dong et al., 2020; Jiang et al., 2023; Lin et al., 2023). In land plants, the chloroplast genome (plastome) is a highly conserved circular molecule with a quadripartite structure, in which two inverted repeats (IRs) separate a large single copy (LSC) region and a small single copy (SSC) region. Plastomes typically contain approximately 120–130 genes, primarily associated with essential biological processes such as photosynthesis, transcription, and translation. Furthermore, chloroplast DNA is frequently integrated into the nuclear genome, giving rise to nuclear plastid DNAs (NUPTs) (Timmis et al., 2004). Together, these attributes make the plastome especially useful for studies of taxonomic classification, phylogenetic relationships, and dynamic evolution and domestication processes (Daniell et al., 2016).
The advancement of high-throughput sequencing technology, which allows many individually barcoded samples to be sequenced simultaneously, has shifted chloroplast research from studies of single plastomes to analyses of many plastomes and catalyzed significant progress in the field. Yan et al. (2022) conducted a comprehensive comparative analysis of 343 plastomes within Solanum section Petota, offering new insights into potato diversity, phylogeny, and species differentiation. Brock et al. (2022) assembled 84 Camelina plastomes and used them to examine maternal parentage in polyploid lineages, reveal novel taxonomic relationships, and estimate the divergence time between domesticated Camelina sativa and its progenitor, Camelina microcarpa. Likewise, Guo et al. (2021b) assembled and annotated 77 plastomes of Paphiopedilum species, revealing extensive IR expansion and SSC contraction in Paphiopedilum plastomes and resolving relationships within the genus, except for the phylogenetic positions of two unstable species. In rice, population-level plastome studies have uncovered evidence of at least two domestication events in Asian rice, as well as signs of strong positive selection and bottleneck events during the domestication of japonica (Cheng et al., 2019). Numerous other studies have used chloroplast data for population-level investigations in diverse species, such as apple (Nikiforova et al., 2013), pepper (Magdy et al., 2019), Asian lotus (Wang et al., 2022), buckwheat (Fan et al., 2021), and East Asian peonies (Chen et al., 2023).
These extensive studies, encompassing a variety of species, collectively underscore the utility of plastomes in population-level analyses.
Plastomes have been used to investigate a range of topics in jujube, including classification, phylogeny, haplotype-based breeding, genetic diversity, and the population dynamics of W-jujubes. Huang et al. (2015) developed a set of 46 chloroplast microsatellite markers and applied them to 72 C-jujubes and 23 W-jujubes to determine their chlorotypes. Hu et al. (2022) constructed a haplotype network from 65 jujube plastomes to guide the selection of parents for hybridization in fresh-fruit jujube breeding. Furthermore, Du et al. (2023) compared 21 W-jujube plastomes, examining their sequence variation and phylogenetic placement. Although these studies have enriched the resources available for jujube plastome research, a more extensive plastome study encompassing both C-jujube and W-jujube is still lacking. Such a study is essential not only for understanding chloroplast sequence diversity but also for a comprehensive understanding of jujube's evolutionary trajectory and domestication.
In this study, we assembled the plastomes of 403 samples, comprising 187 C-jujubes, 215 W-jujubes, and one Ziziphus mauritiana specimen used as the outgroup. After excluding duplicates (individuals with identical chloroplast sequences), we retained 326 samples for subsequent analyses, which covered comparative genomics, horizontal gene transfer (HGT), population genomics, and domestication. Our results corroborated previous studies supporting the origin of C-jujube from W-jujube. Additionally, our analysis revealed two distinct evolutionary trajectories: one dominated by natural selection and the other showing marked signs of artificial selection. These insights expand our understanding of the origin and domestication of jujube and may help optimize and refine ongoing jujube breeding programs.
2 Material and Methods
2.1 Sample collection and sequencing
Fresh leaves of the jujube samples collected in this study were obtained from three locations: (i) the National Jujube Germplasm Resource Nursery in Taigu, Shanxi (112.58 °E, 37.42 °N, Alt. 801 m); (ii) the experimental base of Hebei Agricultural University in Baoding, Hebei Province, China (115.43 °E, 38.83 °N, Alt. 79.8 m); and (iii) the Fuping Experimental Station of Chinese Jujube at Hebei Agricultural University (114.28 °E, 38.72 °N, Alt. 338 m). These locations hold a large collection of grafted C-jujube and W-jujube germplasm sourced from various regions of China and from abroad. The outgroup, Ziziphus mauritiana Lam., was originally collected from the Swat district, Khyber Pakhtunkhwa Province, Pakistan (72.34 °E, 34.41 °N, Alt. 864 m), and grafted in the greenhouse of Hebei Agricultural University. Leaves and DNA of all samples are preserved at the Research Center of Chinese Jujube, Hebei Agricultural University.
Genomic DNA was extracted from fresh leaves of each collected sample using the modified cetyltrimethylammonium bromide (CTAB) method (Cota-Sánchez et al., 2006). DNA purity was assessed using a NanoDrop One Microvolume UV-Vis Spectrophotometer (Thermo Fisher Scientific, Waltham, Massachusetts, USA), and DNA integrity was verified by agarose gel electrophoresis. The DNA was then used to construct paired-end libraries with insert sizes of 200–400 bp. Sequencing was performed on the MGISEQ-2000 platform (BGI, Shenzhen, Guangdong, China), generating about 40 Gb of raw data per sample. The library construction method is described in one of our recently published papers (Yang et al., 2023).
2.2 Assembly and annotation of the plastome
Quality control and filtering of high-quality reads from the raw sequencing data of all collected jujube samples were performed using Fastp v0.23.2 (Chen et al., 2018). We then used GetOrganelle v1.7.7.0 (Jin et al., 2020) to assemble these reads. This software involves two main steps: first, it uses Bowtie2 v2.3.5.1 (Langmead et al., 2019) to search for and map reads to the reference plastome (CM036903.1); second, it extracts the mapped reads and assembles them into a plastome for each sample using SPAdes v0.7.12 (Prjibelski et al., 2014). All resulting genomes were circular, with the canonical quadripartite organization. Finally, all assemblies were annotated and refined using PGA v1.0 (Qu et al., 2019) and Geneious v2021.1.1 (Kearse et al., 2012).
2.3 HGT of chloroplast sequences to the nuclear genome
To identify transferred fragments, we used the published nuclear genomes of C-jujube, Ziziphus jujuba Mill. “Dongzao” (DZ) (Yang et al., 2023), and W-jujube, Z. jujuba var. spinosa (SZ) (Shen et al., 2021), together with their respective plastome sequences. To compare plastome and nuclear genome sequences, we divided each nuclear genome into adjacent 5 kb windows, which were then aligned to the corresponding plastome using BLAST v2.12.0+ (Johnson et al., 2008). The alignment results were then integrated, and their coordinates were converted from window positions back to positions on the original chromosomes.
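The windowing and coordinate-restoration steps can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes windows are named `<chromosome>:<0-based start>` (a hypothetical convention) and that BLAST results were written in the standard 12-column tabular format (`-outfmt 6`), whose query coordinates are 1-based within each window. All function names are illustrative.

```python
from typing import Dict, Iterator, List, Tuple

WINDOW = 5_000  # 5 kb windows, as described in the text


def split_into_windows(chroms: Dict[str, str], size: int = WINDOW) -> Iterator[Tuple[str, str]]:
    """Yield (window_id, sequence) for adjacent, non-overlapping windows.

    window_id has the hypothetical form "<chrom>:<0-based start>".
    """
    for name, seq in chroms.items():
        for start in range(0, len(seq), size):
            yield f"{name}:{start}", seq[start:start + size]


def restore_hit(line: str) -> Tuple[str, int, int, float, int, int]:
    """Map one BLAST -outfmt 6 line from window to chromosome coordinates.

    Returns (chrom, chrom_start, chrom_end, pident, plastome_start, plastome_end),
    with all coordinates 1-based and inclusive.
    """
    fields = line.rstrip("\n").split("\t")
    window_id, pident = fields[0], float(fields[2])
    qstart, qend, sstart, send = (int(f) for f in fields[6:10])
    chrom, offset = window_id.rsplit(":", 1)
    shift = int(offset)  # 0-based window start + 1-based local position = 1-based chromosome position
    return (chrom, shift + qstart, shift + qend, pident,
            min(sstart, send), max(sstart, send))  # plastome hits may be on the minus strand


def merge_intervals(hits: List[Tuple[str, int, int]], gap: int = 0) -> List[Tuple[str, int, int]]:
    """Merge intervals on the same chromosome that overlap or lie within `gap` bp,
    e.g. a NUPT split in two by a window boundary."""
    merged: List[Tuple[str, int, int]] = []
    for chrom, start, end in sorted(hits):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + gap + 1:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Merging adjacent hits matters because a NUPT that straddles a 5 kb boundary is reported as two separate hits, one per window.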
To investigate these NUPTs further, we extracted the coding genes within the aligned regions and compared them with their chloroplast counterparts. We focused on transferred fragments that contained complete chloroplast genes, which allowed us to examine their variation relative to the corresponding chloroplast genes. To assess the changes that chloroplast genes incurred during transfer, we performed multiple sequence alignments between the chloroplast genes and the corresponding NUPTs in the nuclear genome using MAFFT v7.310 (Katoh & Standley, 2013). From each alignment of the HGT genes, we extracted single nucleotide polymorphisms (SNPs) and insertions and deletions (indels). The cumulative mutation ratio (CMR) of each intact transferred gene was calculated as the sum of SNP and indel sites divided by the gene length.
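As a worked illustration of the CMR definition, the sketch below counts SNP and indel sites in a pairwise alignment of a chloroplast gene and its nuclear copy. It assumes that each gapped alignment column counts as one indel site (counting indel events instead would lower the ratio) and that the gene length is the ungapped length of the chloroplast copy; the text specifies neither convention, and the function names are illustrative.

```python
from typing import Tuple


def count_variant_sites(cp_aln: str, nupt_aln: str) -> Tuple[int, int]:
    """Count (SNP sites, indel sites) between two aligned, equal-length sequences."""
    if len(cp_aln) != len(nupt_aln):
        raise ValueError("aligned sequences must have the same length")
    snps = indels = 0
    for a, b in zip(cp_aln.upper(), nupt_aln.upper()):
        if a == "-" and b == "-":
            continue  # column gapped in both (possible when taken from a larger MSA)
        if a == "-" or b == "-":
            indels += 1  # assumption: every gapped column is one indel site
        elif a != b:
            snps += 1
    return snps, indels


def cumulative_mutation_ratio(cp_aln: str, nupt_aln: str) -> float:
    """CMR = (SNP sites + indel sites) / gene length (ungapped chloroplast copy)."""
    snps, indels = count_variant_sites(cp_aln, nupt_aln)
    gene_length = len(cp_aln.replace("-", ""))
    return (snps + indels) / gene_length
```

For example, `cumulative_mutation_ratio("ATGCAT", "ATCC-T")` finds one SNP and one indel site in a 6 bp gene, giving a CMR of about 0.33.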
2.4 Sequence alignment, phylogenetic tree construction, and ancestral area reconstruction
We used “plastome_arch_info.py,” a script in the GetOrganelle (v1.7.7.0) package, to extract three regions (LSC, IRa, and SSC) from all complete chloroplast sequences. These regions were aligned separately using MAFFT v7.310 (Katoh & Standley, 2013), and the three alignments were concatenated into a combined alignment. A phylogenetic tree was constructed using IQ-TREE2 v2.2.0 (Minh et al., 2020) with 1000 ultrafast bootstrap replicates to assess branch support, using the parameter settings “-mset liemarkov -st DNA -B 1000 --bnni.” The best-fitting model for the phylogenetic analysis was determined to be “WS8.10a+I + I + R4.”
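Concatenating the LSC, IRa, and SSC alignments amounts to joining each sample's aligned segments in a fixed order. The sketch below assumes one aligned FASTA per region with identical sample identifiers in every file; the function names are illustrative and not part of GetOrganelle or MAFFT.

```python
from typing import Dict, List, Optional


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sample_id: sequence}; the ID is the first word of the header."""
    records: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join region alignments (e.g. LSC, IRa, SSC, in that order) sample by sample."""
    samples = set(alignments[0])
    for aln in alignments:
        if set(aln) != samples:
            raise ValueError("all region alignments must contain the same samples")
        if len({len(s) for s in aln.values()}) != 1:
            raise ValueError("sequences within one alignment must have equal length")
    return {s: "".join(aln[s] for aln in alignments) for s in sorted(samples)}
```

Checking that every region contains the same samples guards against silently misaligned partitions in the combined matrix passed to IQ-TREE2.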
The historical biogeography of the 326 jujube samples was analyzed using Reconstruct Ancestral State in Phylogenies (RASP) v4.3 (Yu et al., 2015), with the phylogenetic tree constructed above used to test the compatibility of the RASP models with our data. In this initial evaluation, the S-DIVA, DEC, and S-DEC models failed the compatibility test. We therefore used the BayArea method implemented in RASP to reconstruct ancestral geographic ranges.
2.5 Analysis of population structure
Using the Z. mauritiana plastome as the reference, we converted the multiple sequence alignment into Variant Call Format (VCF) with an in-house script. The VCF files were then merged using bcftools v1.10.2 (Narasimhan et al., 2016) in preparation for population genetics analysis. For quality control, the VCF file was filtered using PLINK v1.90 (Purcell et al., 2007) with the parameters --maf 0.05 and --geno 0.2, which remove variants with a minor allele frequency below 0.05 or a missing call rate above 0.2. Population structure was analyzed using ADMIXTURE v1.3.0 (Alexander et al., 2009) with the block relaxation algorithm. In parallel, we performed principal component analysis (PCA) with GCTA v1.94.0 (Yang et al., 2011) using default parameters, and used the first two eigenvectors to represent the genetic structure.
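The in-house conversion script is not described in detail, so the following is only a sketch of one way to turn an alignment into haploid VCF records while applying filters with the same meaning as PLINK's `--maf` and `--geno`. It assumes biallelic SNPs only, skips columns that are gaps in the reference (so indels are ignored), and treats gaps or ambiguous bases in samples as missing calls. The function name and the chromosome label `plastome` are hypothetical.

```python
from typing import Dict, List


def alignment_to_vcf(aln: Dict[str, str], ref_name: str, chrom: str = "plastome",
                     maf: float = 0.05, geno: float = 0.2) -> List[str]:
    """Convert aligned sequences into haploid VCF lines, keeping filtered biallelic SNPs."""
    ref = aln[ref_name]
    samples = [s for s in aln if s != ref_name]
    lines = ["##fileformat=VCFv4.2",
             '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
             "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t" + "\t".join(samples)]
    pos = 0  # position in the ungapped reference sequence
    for col, ref_base in enumerate(ref):
        if ref_base == "-":
            continue  # insertion relative to the reference: ignored in this SNP-only sketch
        pos += 1
        ref_base = ref_base.upper()
        calls = [aln[s][col].upper() for s in samples]
        observed = [c for c in calls if c in "ACGT"]  # gaps and ambiguity codes count as missing
        alts = sorted({c for c in observed if c != ref_base})
        if len(alts) != 1:
            continue  # monomorphic relative to the reference, or multiallelic
        missing_rate = 1 - len(observed) / len(calls)
        alt_freq = observed.count(alts[0]) / len(observed)
        if missing_rate > geno or min(alt_freq, 1 - alt_freq) < maf:
            continue  # same meaning as PLINK --geno / --maf
        gts = ["0" if c == ref_base else "1" if c == alts[0] else "." for c in calls]
        lines.append("\t".join([chrom, str(pos), ".", ref_base, alts[0], ".", "PASS", ".", "GT"] + gts))
    return lines
```

Because the minor allele frequency is computed among the jujube samples only, sites where every sample differs from the outgroup are dropped, just as PLINK drops monomorphic variants.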
We further calculated nucleotide diversity (π) and fixation index (Fst) values for the different populations across four geographical regions: the MLYR, Eastern China (EC), Southern China (SC), and Western China (WC). These statistics were computed using VCFtools v0.1.16 (Danecek et al., 2011), and the evolutionary and domestication history of W-jujube and C-jujube was inferred from the π and Fst values.
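For readers unfamiliar with these statistics, the sketch below computes per-base π for haploid plastome data and a between-population Fst. Note the substitution: VCFtools' `--weir-fst-pop` implements the Weir and Cockerham (1984) estimator, whereas this sketch uses the simpler Hudson-style ratio 1 − Hw/Hb for illustration, so its values will not match VCFtools output exactly. Allele calls are assumed to be strings such as "0"/"1", with "." for missing data; all names are illustrative.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Site = Sequence[str]  # haploid allele calls at one variant site; "." marks missing data


def allele_freqs(calls: Site) -> Tuple[Dict[str, float], int]:
    """Allele frequencies among non-missing calls, plus the number of such calls."""
    observed = [c for c in calls if c != "."]
    n = len(observed)
    if n == 0:
        return {}, 0
    return {allele: k / n for allele, k in Counter(observed).items()}, n


def site_pi(calls: Site) -> float:
    """Unbiased expected heterozygosity at one site: n/(n-1) * (1 - sum p_i^2)."""
    freqs, n = allele_freqs(calls)
    if n < 2:
        return 0.0
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs.values()))


def nucleotide_diversity(sites: Sequence[Site], n_sites: int) -> float:
    """Per-base pi: summed site diversity divided by the number of sites considered
    (invariant sites contribute zero but still count in n_sites)."""
    return sum(site_pi(s) for s in sites) / n_sites


def hudson_fst(pop1: Sequence[Site], pop2: Sequence[Site]) -> float:
    """Hudson-style Fst = 1 - Hw/Hb, combined across sites as a ratio of sums."""
    num = den = 0.0
    for s1, s2 in zip(pop1, pop2):
        f1, n1 = allele_freqs(s1)
        f2, n2 = allele_freqs(s2)
        if n1 < 2 or n2 < 2:
            continue  # too few calls in one population to estimate diversity
        within = (site_pi(s1) + site_pi(s2)) / 2
        between = 1.0 - sum(p * f2.get(a, 0.0) for a, p in f1.items())
        num += between - within
        den += between
    return num / den if den > 0 else 0.0
```

Combining sites as a ratio of sums, rather than averaging per-site ratios, keeps low-diversity sites from dominating the estimate.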
2.6 Haplotype analysis
After quality control with PLINK (maf = 0.05 and geno = 0.2), we extracted the SNP loci from the filtered VCF file and converted them into a FASTA-formatted sequence file. Plastome haplotypes were then identified using DnaSP v6 (Rozas et al., 2017). To visualize the relationships among haplotypes, we constructed a TCS network using PopART v1.7 (Leigh & Bryant, 2015), which illustrates the interconnections and patterns among the plastome haplotypes.
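Collapsing SNP strings into haplotypes, and computing the pairwise differences from which a TCS network is built, can be sketched as below. This is illustrative only: DnaSP and PopART handle missing data and network construction (including the TCS statistical-parsimony connection limit) in ways this sketch does not reproduce, and the haplotype labels H1, H2, and so on are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def collapse_haplotypes(snp_seqs: Dict[str, str]) -> Dict[str, Tuple[str, List[str]]]:
    """Group samples with identical SNP strings; labels follow order of first appearance."""
    by_seq: Dict[str, List[str]] = {}
    for sample, seq in snp_seqs.items():
        by_seq.setdefault(seq, []).append(sample)
    return {f"H{i}": (seq, members) for i, (seq, members) in enumerate(by_seq.items(), start=1)}


def pairwise_differences(haps: Dict[str, Tuple[str, List[str]]]) -> Dict[Tuple[str, str], int]:
    """Number of differing SNP positions between every pair of haplotypes."""
    return {(h1, h2): sum(a != b for a, b in zip(s1, s2))
            for (h1, (s1, _)), (h2, (s2, _)) in combinations(haps.items(), 2)}
```

The member lists give the haplotype frequencies that network tools draw as node sizes.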
2.7 Divergence dating
To estimate divergence times, we collected the complete plastomes of 38 Rhamnaceae species and two Elaeagnaceae species available in the NCBI database. Divergence times were estimated with BEAST v1.10.4 (Drummond & Rambaut, 2007) under an uncorrelated relaxed molecular clock. The time-calibrated tree incorporated three fossil-based calibrations: the divergence between Ziziphus and Paliurus, 66–71 Ma (Chen et al., 2017); the occurrence of the genus Rhamnus, 65–70 Ma (Peppe et al., 2007); and the divergence between the families Rhamnaceae and Euphorbiaceae, approximately 240–316 Ma. This last estimate was derived from the recent discovery of a Phylica fossil dated to 99–110 Ma (He & Lamont, 2022; Lamont & He, 2022).
The best-fitting substitution model, “GTR + F + G4,” was selected using PhyloSuite v1.2.2 (Zhang et al., 2020a). The Markov chain Monte Carlo (MCMC) analysis was run for 200 000 000 generations and sampled every 20 000 generations, yielding a total of 10 000 trees. Convergence was assessed by inspecting the output log file in Tracer v1.7.2 (Rambaut et al., 2018), which confirmed that the effective sample size (ESS) values exceeded 200. The first 25% of the sampled trees were discarded as burn-in, and the remaining trees were subsampled at intervals of 100 steps. Finally, the resulting tree file was visualized in FigTree v1.4.4 to display the estimated divergence times and the phylogenetic relationships among the Rhamnaceae and Elaeagnaceae species studied.
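Two quantities in this paragraph can be checked with a short calculation: sampling every 20 000 of 200 000 000 generations gives 10 000 trees, of which 7 500 remain after a 25% burn-in; and the ESS of a parameter is the chain length divided by its integrated autocorrelation time. The ESS function below is a common textbook estimator (summing autocorrelations up to the first non-positive lag), assumed here for illustration; Tracer's exact implementation may differ, and the function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def autocorrelation(trace: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of an MCMC trace at the given lag."""
    mean = fmean(trace)
    var = sum((v - mean) ** 2 for v in trace)
    if var == 0:
        raise ValueError("trace is constant; ESS is undefined")
    cov = sum((trace[i] - mean) * (trace[i + lag] - mean) for i in range(len(trace) - lag))
    return cov / var


def effective_sample_size(trace: Sequence[float]) -> float:
    """ESS = N / tau, with tau = 1 + 2 * (sum of autocorrelations before the first non-positive lag).

    Quadratic in the trace length; adequate for a sketch, not for very long chains.
    """
    n = len(trace)
    tau = 1.0
    for lag in range(1, n):
        rho = autocorrelation(trace, lag)
        if rho <= 0:
            break
        tau += 2 * rho
    return n / tau


def retained_after_burnin(generations: int, sample_every: int, burnin_fraction: float) -> int:
    """Number of sampled states left after discarding the burn-in fraction."""
    samples = generations // sample_every
    return samples - int(samples * burnin_fraction)
```

A strongly autocorrelated trace yields an ESS far below its length, which is why the threshold of 200 is checked per parameter rather than inferred from the number of sampled trees.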
To estimate divergence times among the genealogical groups, we selected representative jujube individuals at the terminal nodes of each group in the phylogenetic tree of 326 jujube individuals, together with representative samples from key branches within each group. These individuals were used to estimate the divergence times of the four genealogical groups, following the same procedure described above for the 38 Rhamnaceae species, with the divergence between Z. jujuba and its closest relative, Ziziphus hajarensis, serving as the calibration point.
3 Results
3.1 Genome sequencing and plastome assembly
A total of 230 jujube genotypes, consisting of 187 C-jujubes and 43 W-jujubes, were collected from 22 provinces in China, and their plastomes were sequenced and assembled. In addition, raw sequencing data for 172 W-jujubes were retrieved from the NCBI SRA database and their plastomes were assembled, giving a total of 402 genotypes for analysis (Table S1). After redundancy was eliminated, 326 unique genotypes from 81 locations in 21 provinces were retained (Fig. 1; Table S1), while the remaining 76 genotypes, whose sequences were identical to genotypes among the 326, were excluded from the final dataset.
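Removing redundant genotypes reduces to grouping identical sequences. A minimal sketch, assuming all assemblies were output with the same start position and orientation (GetOrganelle may report alternative structural isomers, which would need to be normalized first); hashing with SHA-256 and the function name are implementation choices for illustration, not the authors' stated method.

```python
import hashlib
from typing import Dict, List, Tuple


def deduplicate(seqs: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Keep the first genotype of each identical sequence.

    Returns (kept sequences, {kept genotype: [genotypes identical to it]}).
    """
    kept: Dict[str, str] = {}
    duplicates_of: Dict[str, List[str]] = {}
    first_by_hash: Dict[str, str] = {}
    for name, seq in seqs.items():
        digest = hashlib.sha256(seq.upper().encode()).hexdigest()  # case-insensitive identity
        if digest in first_by_hash:
            duplicates_of[first_by_hash[digest]].append(name)
        else:
            first_by_hash[digest] = name
            kept[name] = seq
            duplicates_of[name] = []
    return kept, duplicates_of
```

Recording which genotypes were collapsed into each kept sequence preserves the sample-to-haplotype mapping needed for later geographic summaries.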

Plastome length ranged from 160 874 to 161 921 bp, with an average of 161 397 bp. The LSC, SSC, and IR regions ranged from 88 551 to 89 530 bp, 19 033 to 19 387 bp, and 26 461 to 26 533 bp, respectively (Fig. 2A; Table S2). Whole-genome GC content ranged from 36.75% to 36.85% across all plastomes, with the highest GC content in the IR regions (42.60%–42.70%) and the lowest in the SSC regions (30.85%–31.00%) (Fig. 2B). GC content was negatively correlated with plastome size, with the strongest negative correlation in the LSC (R = −0.9) and the weakest in the SSC (R = −0.36) (Fig. 2B). Each plastome contained 127 annotated genes, including 83 protein-coding genes, 36 tRNA genes, and eight rRNA genes (Fig. 3A).
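GC content and its correlation with region length can be computed as below. This sketch assumes that GC content is taken over unambiguous bases only (the text does not say how N bases were treated) and that R denotes Pearson's correlation coefficient; the function names are illustrative.

```python
from math import sqrt
from typing import Sequence


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Applied per region, `pearson_r([len(r) for r in lsc_regions], [gc_content(r) for r in lsc_regions])` gives the kind of length–GC correlation reported for the LSC, where `lsc_regions` is a hypothetical list of LSC sequences.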


3.2 Structural comparison of plastomes
The distance between the IRa/SSC boundary and the ndhF gene varied from 74 to 79 bp, with 74 bp being the most common (325 occurrences). The SSC/IRb boundary was consistently located within the ycf1 gene in all samples, with 4591 to 4597 bp of the gene lying in the SSC region, a difference of 6 bp. The IRa/LSC junction was located within the rps19 gene in all samples, with 107 bp of the gene in the LSC and 171 bp in the IRa region. The distance between the trnH gene and the IRb/LSC boundary ranged from 132 to 157 bp (Fig. 3B). In total, contraction and expansion of the IR regions accounted for a length difference of 36 bp among the Ziziphus jujuba plastomes surveyed. Because overall plastome size varied by approximately 1 kb, IR contraction and expansion were not the main cause of the length differences among Z. jujuba plastomes.
Using “Dongzao” jujube as the reference, we identified a total of 548 SNPs and 454 indels. These variants were unevenly distributed across the plastome, with mutation rates of 0.87%, 0.80%, and 0.12% in the LSC, SSC, and IR regions, respectively. Of these, 404, 110, and 34 SNPs and 376, 44, and 34 indels were located in the LSC, SSC, and IR regions, respectively, and a total of 148 SNP sites and 98 indel sites fell within gene regions (Fig. 3A; Table S3).
3.3 HGT from chloroplast to the nuclear genome
In DZ and SZ, we identified 6277 (5.93 Mb) and 3856 (3.26 Mb) NUPTs, respectively, which together covered the entire plastome. NUPTs longer than 2 kb accounted for 13.55% of NUPTs in DZ and 11.87% in SZ (Figs. 4A, S1A). Chromosome 11 contained the highest proportion of NUPTs in both DZ (26.5%) and SZ (22.6%) (Table S4), and chromosome 6 contained a greater proportion in DZ (19.7%) than in SZ (10.1%). When the NUPTs were categorized using kernel density estimation, the identity between NUPTs and their chloroplast counterparts ranged from 89% to 98% within the high-density interval (density > 0.05) in both DZ and SZ (Figs. 4B, S1B). These high-density NUPTs accounted for 80.51% and 80.34% of the total in DZ and SZ, respectively, implying that most jujube NUPTs arose within a specific timeframe corresponding to this prominent peak. A collinearity analysis of the nuclear genome and plastome revealed that, except on chromosomes 1 and 3, the majority of high-density NUPTs were located at the chromosome termini (Figs. 4C, S1C). Notably, for chromosome 5, this pattern was observed only in DZ (Figs. 4A, 4C).
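The kernel density step can be illustrated with a plain Gaussian KDE over NUPT identity values. This sketch assumes identities are expressed in percent, so the 0.05 threshold is read as density per percentage point; it also assumes a normal-reference bandwidth (1.06 × σ × n^(−1/5)) and a 0.1-point evaluation grid, none of which are stated in the text. The function names are illustrative.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def normal_reference_bandwidth(values: Sequence[float]) -> float:
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) (assumed; not stated in the text)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return 1.06 * sd * n ** (-1 / 5)


def gaussian_kde(values: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return f(x), the Gaussian kernel density estimate of `values` at x."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


def high_density_intervals(values: Sequence[float], threshold: float = 0.05,
                           step: float = 0.1) -> Tuple[List[Tuple[float, float]], float]:
    """Grid intervals where density > threshold, and the fraction of values inside them."""
    bw = normal_reference_bandwidth(values)
    density = gaussian_kde(values, bw)
    lo, hi = min(values) - 3 * bw, max(values) + 3 * bw  # pad so edge peaks can close
    intervals: List[Tuple[float, float]] = []
    start: Optional[float] = None
    end = lo
    for i in range(int(round((hi - lo) / step)) + 1):
        x = lo + i * step
        if density(x) > threshold:
            if start is None:
                start = x
            end = x
        elif start is not None:
            intervals.append((start, end))
            start = None
    if start is not None:
        intervals.append((start, end))
    inside = sum(any(a <= v <= b for a, b in intervals) for v in values)
    return intervals, inside / len(values)
```

The returned fraction corresponds to the share of NUPTs falling in the high-density peak, the quantity reported above as roughly 80% for both genomes.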

In both DZ and SZ, a substantial proportion of NUPTs contained intact chloroplast coding genes (50.15% and 48.96%, respectively), highlighting the substantial involvement of chloroplast coding genes in HGT events. Among these intact genes, we identified 34 in DZ and 42 in SZ, 28 of which were shared between the two genomes. Gene transfer was frequently accompanied by nucleotide mutations, predominantly SNPs. The average CMRs of all transferred intact genes were 42.96% in DZ and 34.66% in SZ. Genes associated with self-replication had higher average CMRs than those involved in photosynthesis in both DZ (48.42% versus 42.14%) and SZ (39.39% versus 34.10%). Despite the considerable range between the genes with the highest and lowest CMRs (e.g., 70.57% in rpl23 versus 31.22% in rps2 in DZ, and 67.38% in rpl23 versus 15.98% in ndhF in SZ), statistical analysis revealed no significant difference in CMRs between genes related to self-replication and photosynthesis (Figs. 4D, S1D, S2; Table S5).
3.4 Phylogenetic and population genetics analysis
To explore the phylogenetic relationships among the 326 jujube genotypes, we constructed a phylogenetic tree from all complete plastomes, with Ziziphus mauritiana as the outgroup. The resulting tree (Fig. 5A) delineated clear clades separating W-jujube (Groups 1 and 3) from C-jujube (Groups 2 and 4). Although a few W-jujubes were nested within C-jujube clades, likely representing transitional or semi-wild genotypes, the tree reinforced the previous conclusion that C-jujube originated from W-jujube (Huang et al., 2016; Liu et al., 2020; Guo et al., 2021a). Specifically, W-jujubes in Group 1 gave rise to the C-jujubes in Group 2, and W-jujubes in Group 3 gave rise to the C-jujubes in Group 4. We refer to these two trajectories as evolution path 1 (EP1) and evolution path 2 (EP2), respectively. These findings were also supported by principal component analysis (PCA) (Figs. 5B, S3).

ADMIXTURE analysis separated the EP1 and EP2 populations at K = 2. Along EP2, a notable divergence between the W-jujube and C-jujube populations emerged at K = 3. At K = 4, Group 1, one of the W-jujube groups, split further into two sections; the smaller one (brown at K = 4 and green at K = 5) occupied the basal clade of the phylogenetic tree and may represent an ancestral W-jujube lineage. A clear differentiation between W-jujube and C-jujube in EP1 became evident at K = 5 (Fig. 5C).
Using an alignment matrix of the 326 jujube plastomes, we conducted an SNP-based haplotype analysis, which identified 53 effective haplotypes; 30, 5, 11, and 7 of these corresponded to Groups 1, 2, 3, and 4, respectively (Table S6). In the resulting haplotype network, C-jujube haplotypes tended to cluster together, whereas W-jujube haplotypes were more dispersed, consistent with the PCA results. The W-jujube haplotypes formed two groups, each giving rise to a corresponding C-jujube haplotype group. These four groups matched those identified in the phylogenetic and PCA analyses, further supporting the existence of the two evolution paths, EP1 and EP2 (Fig. 5D).
To gain deeper insight into chloroplast-based jujube evolution, we extended the analysis to different geographical regions. When the samples were categorized by distribution, the MLYR genotypes in Group 2 came mainly from Shanxi Province (11 of 14), whereas those in Group 4 were relatively evenly distributed across Shanxi, Shaanxi, and Henan provinces (23, 15, and 12, respectively) (Table 1). These findings corroborate previous research suggesting that jujube was first domesticated and cultivated within the MLYR region, with Shanxi Province serving as a primary domestication center before Shaanxi (Fuller & Stevens, 2019; Liu et al., 2020; Guo et al., 2021a). We then reconstructed ancestral areas based on the phylogenetic tree of the 326 jujube samples (Fig. 5A), using four geographic regions: the MLYR, Western China (WC), Southern China (SC), and Eastern China (EC), with the provinces included in each region listed in Table S7. The results showed that the ancestral nodes (nodes 640 and 641) were predominantly assigned to the MLYR region (Fig. S4), identifying the MLYR as the origin of jujube. Accordingly, the MLYR was treated as the area of origin of jujube in subsequent analyses.
Table 1. Number of plastomes in each phylogenetic group by geographic region. The first three columns are provinces within the Middle and Lower reaches of the Yellow River (MLYR).

| Group | Shanxi (MLYR) | Shaanxi (MLYR) | Henan (MLYR) | Western China (WC) | Southern China (SC) | Eastern China (EC) | Total |
|---|---|---|---|---|---|---|---|
| Group 1 | 14 | 10 | 16 | 39 | 1 | 41 | 121 |
| Group 2 | 11 | 1 | 2 | 0 | 8 | 19 | 41 |
| Group 3 | 12 | 7 | 8 | 13 | 1 | 19 | 60 |
| Group 4 | 23 | 15 | 12 | 11 | 16 | 27 | 104 |
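The within-MLYR comparison drawn from Table 1 can be checked directly from its counts. The short sketch below uses only the values in the table to compute each group's total and the Shanxi, Shaanxi, and Henan shares of that group's MLYR samples.

```python
# Counts copied from Table 1, in the order:
# Shanxi, Shaanxi, Henan (all MLYR), WC, SC, EC.
TABLE1 = {
    "Group 1": (14, 10, 16, 39, 1, 41),
    "Group 2": (11, 1, 2, 0, 8, 19),
    "Group 3": (12, 7, 8, 13, 1, 19),
    "Group 4": (23, 15, 12, 11, 16, 27),
}

def mlyr_shares(counts):
    """Fraction of a group's MLYR samples coming from Shanxi, Shaanxi, and Henan."""
    mlyr = counts[:3]
    total = sum(mlyr)
    return tuple(round(n / total, 3) for n in mlyr)

for group, counts in TABLE1.items():
    print(group, sum(counts), mlyr_shares(counts))
```

For Group 2, Shanxi accounts for about 79% of the MLYR samples, whereas for Group 4 the three provinces contribute 46%, 30%, and 24%; the four group totals sum to 326.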
We computed nucleotide diversity (π), which quantifies the level of polymorphism within a population, for the 326 jujube samples across four regions: the MLYR, WC, SC, and EC. Within EP1, Group 1 had a π value (9.14e−4) comparable to that of Group 2 (9.99e−4), implying that genetic diversity changed little during the transition from W-jujube to C-jujube. Conversely, within EP2, Group 3 had a higher π value (1.98e−3) than Group 4 (3.85e−4), indicating a marked loss of diversity during the transition from W-jujube to C-jujube. Notably, among the C-jujube groups, the lowest π values consistently occurred in C-jujube populations originating from the MLYR region (1.59e−3 in Group 2 and 6.46e−4 in Group 4) (Fig. 6A), underscoring the MLYR region as the center of jujube domestication.
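As an illustration of what π measures, the sketch below computes Nei's nucleotide diversity as the mean proportion of differing sites over all sequence pairs. It is a simplification under stated assumptions: positions with gaps or ambiguity codes are dropped pair by pair, the sequences are hypothetical, and the tool and settings used for Fig. 6A may differ.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Mean pairwise p-distance (Nei's pi) over all pairs of aligned sequences.

    Positions where either sequence has a gap or ambiguity code are skipped
    for that pair only (pairwise deletion, an assumed convention).
    """
    per_pair = []
    for a, b in combinations(seqs, 2):
        compared = diffs = 0
        for x, y in zip(a, b):
            if x in "ACGT" and y in "ACGT":
                compared += 1
                diffs += x != y
        if compared:
            per_pair.append(diffs / compared)
    if not per_pair:
        raise ValueError("need at least two sequences with comparable sites")
    return sum(per_pair) / len(per_pair)

# Hypothetical toy population of three aligned 4-bp sequences.
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

A per-region value, such as that for MLYR C-jujubes in Group 2, would be obtained by passing only that subset of aligned plastomes.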

To further elucidate the relationships among jujube populations in these regions, we calculated Fst values, which indicate the extent of genetic differentiation among populations (Fig. 6A). Fst between W-jujubes and C-jujubes was lower in EP1 (0.009) than in EP2 (0.214), indicating weaker differentiation between W-jujubes and C-jujubes in EP1 than in EP2, consistent with the π results. Comprising 124 W-jujubes and 62 C-jujubes, EP1 likely reflects a transition from W-jujubes to C-jujubes driven by natural selection with minimal human interference. In contrast, EP2 shows greater genetic divergence than EP1, which is more consistent with artificial selection or human domestication.
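Fst can be estimated in several ways, and the estimator behind Fig. 6A is not specified in this section. As a sketch only, the code below uses a Hudson-style estimator, Fst = 1 - H_within / H_between, where H_within is the mean within-population pairwise p-distance (averaged over the two populations) and H_between is the mean pairwise distance between them. The sequences are hypothetical, and each population needs at least two sequences.

```python
from itertools import combinations, product

def p_distance(a, b):
    """Proportion of differing sites among positions where both bases are A/C/G/T."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    return diffs / compared

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def hudson_fst(pop1, pop2):
    """Hudson-style Fst = 1 - H_within / H_between from pairwise p-distances."""
    h_within = mean([
        mean(p_distance(a, b) for a, b in combinations(pop1, 2)),
        mean(p_distance(a, b) for a, b in combinations(pop2, 2)),
    ])
    h_between = mean(p_distance(a, b) for a, b in product(pop1, pop2))
    return 1 - h_within / h_between

# Hypothetical W-jujube and C-jujube toy samples.
wild = ["AAAA", "AAAT"]
cultivated = ["TTAA", "TTAT"]
print(round(hudson_fst(wild, cultivated), 3))
```

Values near 0 indicate little differentiation (as reported for EP1) and larger values stronger differentiation (as for EP2); with small samples this estimator can also come out slightly negative.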
Within Group 2, genetic differentiation between the MLYR and SC populations (Fst: 0.008) was lower than that between the MLYR and EC populations (Fst: 0.016). This suggests a trajectory in EP1 that likely began in the MLYR, proceeded to SC, and subsequently extended from SC to EC. In Group 4, the Fst between the MLYR and EC populations was notably lower (0.005) than that between MLYR and WC (0.014) and between MLYR and SC (0.019). These observations imply that in EP2, after originating in the MLYR, the domestication process first expanded to EC and subsequently extended to WC and SC, a trajectory different from that of EP1.
We speculate that the Taihang Mountains could account for these differences. In EP1, the spread of jujube under natural selection might have been hindered between the MLYR and EC by this geographical barrier, whereas in EP2, human-driven domestication activities along the Yellow River basin may have overcome it, facilitating the initial spread of cultivated jujube from the MLYR to EC. In EP2, the Fst values between MLYR and WC (0.014) and between MLYR and SC (0.019) both exceeded that between MLYR and EC, suggesting a later phase of human-driven domestication from the MLYR to WC and SC. Although we cannot precisely trace these historical processes, human migrations toward the south and west over the past 2000 years likely helped shape these genetic patterns. This effect was particularly evident in the SC region. In EP1, jujube distribution was relatively concentrated, on a smaller scale, around the southeastern coastal regions, consistent with natural selection. In EP2, by contrast, the distribution was more scattered and extended into provinces farther south, a trend likely attributable to human intervention. For the spread from the MLYR to WC during EP2, trade along the “Silk Roads,” which began during the Han Dynasty, offers a plausible explanation (Fig. 6B).
3.5 Divergence time estimation
The plastome offers several advantages over the nuclear genome, including higher stability, smaller size, easier accessibility, and a lower mutation rate (Raubeson et al., 2007), which have made it widely used in phylogenetic reconstruction. Using three fossil-based calibration points, as outlined in Section 2.7, we constructed a phylogenetic tree and estimated divergence times from complete plastomes (Table S8). The dataset comprised 16 genera, including 36 Rhamnaceae species and two Elaeagnaceae species as the outgroup. Within this evolutionary framework, whose root dates to 278 Ma, the Rhamnaceae family comprises two main clades, which diverged approximately 267 Ma. Within the Ziziphus clade, two sub-clades were identified, one represented by Z. jujuba and the other by Z. mauritiana, both prominent species within this family. These two sub-clades diverged roughly 60 Ma, close to the 68 Ma divergence between Ziziphus and its nearest relatives, notably the genus Paliurus (Fig. 7A).

We further estimated divergence times for the four genealogical groups of jujube identified in the population analysis, using as a calibration point the 22 Ma divergence between Z. jujuba and Ziziphus hajarensis estimated in Fig. 7A. The phylogenetic tree was constructed from the complete plastomes of 37 representative jujube individuals selected from the tree in Fig. 5A, with Z. hajarensis as the outgroup (Fig. 7B). The results revealed different divergence times between W-jujube and C-jujube along the two evolutionary paths, EP1 and EP2, estimated at 0.13 and 1.82 Ma, respectively (Fig. 7B). This analysis improves our understanding of the evolutionary history and timing of divergence events within jujube.
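To show how a single calibration point turns branch lengths into ages, the sketch below applies a strict clock: the mean tip depth of the calibrated node divided by its age (22 Ma here) gives one substitution rate, and every other internal node's mean tip depth is divided by that rate. This is a deliberate simplification, similar in spirit to mean-path-length dating, and not the calibrated method described in Section 2.7; relaxed-clock analyses additionally allow rates to vary among branches. The tree, node names, and branch lengths are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    branch: float = 0.0  # length (substitutions/site) of the branch above this node
    children: list[Node] = field(default_factory=list)

def tip_depths(node: Node) -> list[float]:
    """Path lengths from `node` down to each of its descendant tips."""
    if not node.children:
        return [0.0]
    return [child.branch + d for child in node.children for d in tip_depths(child)]

def mean_height(node: Node) -> float:
    depths = tip_depths(node)
    return sum(depths) / len(depths)

def strict_clock_ages(root: Node, calibration: Node, age_ma: float) -> dict[str, float]:
    """Ages (Ma) of internal nodes under one rate derived from one calibration."""
    rate = mean_height(calibration) / age_ma  # substitutions/site per Ma
    ages = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            ages[node.name] = mean_height(node) / rate
            stack.extend(node.children)
    return ages

# Hypothetical tree: Z. hajarensis outgroup plus a two-tip Z. jujuba clade.
jujuba = Node("jujuba", 0.018, [Node("W", 0.002), Node("C", 0.002)])
root = Node("root", 0.0, [Node("Z_hajarensis", 0.02), jujuba])
ages = strict_clock_ages(root, calibration=root, age_ma=22.0)
print({name: round(age, 2) for name, age in ages.items()})
```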
4 Discussion
The plastome has proven useful for determining phylogenetic relationships and conducting population genetic analyses in closely related species, as shown by studies on watermelon (Cui et al., 2020), Venus slipper (Guo et al., 2021b), and Pueraria montana (Sun et al., 2023). In this study, we sequenced the plastomes of 230 jujube genotypes and incorporated public data for 172 W-jujube individuals. After removing redundant complete plastomes, 326 individuals were retained for analysis.
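Removing redundant plastomes can be sketched as exact-duplicate filtering: each sequence is upper-cased and hashed, and only the first sample with a given digest is kept. This assumes that "redundant" means identical complete sequences. Real plastome assemblies may also need a common start position and a consistent orientation of the SSC region before comparison, which this sketch does not handle. Sample names are hypothetical.

```python
import hashlib

def deduplicate(records):
    """Keep the first sample for each distinct sequence.

    `records` is an iterable of (name, sequence) pairs. Returns the kept
    names (in input order) and a dict mapping each dropped name to the
    kept name it duplicates.
    """
    seen = {}
    kept = []
    duplicates = {}
    for name, seq in records:
        digest = hashlib.sha256(seq.upper().encode()).hexdigest()
        if digest in seen:
            duplicates[name] = seen[digest]
        else:
            seen[digest] = name
            kept.append(name)
    return kept, duplicates

# Hypothetical input: "b" differs from "a" only in letter case.
kept, dups = deduplicate([("a", "ACGT"), ("b", "acgt"), ("c", "ACGA")])
print(kept, dups)
```

Storing fixed-length digests rather than whole sequences keeps the lookup table small when each record is a roughly 161-kb plastome.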
4.1 Size variations of plastome in Ziziphus jujuba
All jujube plastomes exhibited the typical quadripartite structure and maintained a stable genome size of approximately 161 kb, with a size range of 1048 bp among individuals. This degree of variation is comparable to that in the plastomes of Ulmus mianzhuensis, which varied by 1053 bp across 31 accessions (Lin et al., 2023). Comparing size variation in the entire plastome and in the LSC, SSC, and IR regions across the four phylogenetic groups revealed minimal differences in the SSC region but pronounced variation in the LSC region (Fig. S5).
Length variation was associated with the two evolutionary paths, EP1 (Groups 1 and 2) and EP2 (Groups 3 and 4). In EP1, plastome size predominantly contracted from W-jujube to C-jujube (P < 0.001), whereas in EP2 it expanded, increasing notably from W-jujube to C-jujube (P < 0.001 for all regions except the SSC). The LSC region was the primary contributor to these variations, with a 979 bp length difference accounting for approximately 93.50% of the total size disparity. This underscores the pivotal role of the LSC region in plastome evolutionary history. These contrasting changes between EP1 and EP2 exemplify the differing consequences of natural and artificial selection. Nevertheless, the reasons for size reduction under natural selection and enlargement under artificial selection remain elusive and require additional evidence, such as an examination of phenotypic changes from W-jujube to C-jujube in both EP1 and EP2.
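The statistical test behind the P values above is not named in this section. One distribution-free option for such a comparison is a two-sided permutation test on the difference in mean region length between W-jujube and C-jujube samples. The sketch below implements one with a fixed seed for reproducibility; the LSC lengths are hypothetical values, not data from this study.

```python
import random

def permutation_test(a, b, n_perm=10_000, seed=1):
    """Two-sided permutation test on the difference in group means.

    Returns (extreme + 1) / (n_perm + 1), where `extreme` counts shuffles
    whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        first, second = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(first) / len(first) - sum(second) / len(second))
        if diff >= observed:
            extreme += 1
    # The +1 terms avoid reporting an impossible p-value of exactly 0.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical LSC lengths (bp) for two groups of eight samples.
wild_lsc = [89210, 89195, 89230, 89205, 89220, 89215, 89200, 89225]
cultivated_lsc = [88990, 89010, 89005, 88995, 89000, 89015, 88985, 89020]
print(permutation_test(wild_lsc, cultivated_lsc))
```

With eight samples per group, the smallest attainable two-sided p-value is about 2 / 12870, so very small P values require adequately sized groups.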
The IR region showed a pattern similar to that of the LSC region but with a narrower range of length variation, approximately 40 bp. This variation resulted primarily from expansion or contraction of the IR boundaries rather than from sequence variation within the IR region itself, consistent with the well-established role of IRs in maintaining plastome structural stability (Maréchal & Brisson, 2010). Similarly, studies on Asparagus densiflorus (Wong et al., 2022) and tree peony (Guo et al., 2020) found that plastome length was determined principally by length variation in the LSC region rather than by IR contraction and expansion. Although a direct connection between these plastome variations and the broader evolutionary processes in these species remains unclear, the patterns observed in jujube offer valuable insights into plastome evolution.
4.2 The distribution of NUPTs in jujube
HGT is a well-recognized, fundamental mechanism of organismal evolution (Smets & Barkay, 2005; Zhang et al., 2023). Although HGT is most prevalent in bacteria, recent research has revealed instances of transfer between organelles and the nuclear genome in plants, particularly the transfer of genes from the plastome to the nuclear genome (Richardson & Palmer, 2007; Keeling & Palmer, 2008; Bock, 2010; Gao et al., 2014). Notably, previous studies in Medicago and Moringa oleifera revealed differing patterns of NUPT distribution across chromosomes (Jiao et al., 2023; Marczuk-Rojas et al., 2023).
Building on these studies, we explored HGT events from the plastome to the nuclear genome in jujube and investigated the consequences of these NUPTs, aiming to determine whether HGT correlates with the evolutionary trajectory of jujube. A comparison of NUPTs between two jujube varieties, DZ and SZ, revealed intriguing differences. Although in both varieties the NUPTs collectively cover the entire plastome sequence, they are distributed far more widely in DZ, spanning 5.93 Mb of the nuclear genome compared with 3.26 Mb in SZ. This discrepancy, possibly attributable to differences in nuclear genome assembly quality between DZ and SZ, extends to differences in NUPT size and uneven distribution across specific chromosomes, such as Chromosome 6. Nonetheless, DZ and SZ share some NUPT characteristics, such as an enrichment of NUPTs on Chromosome 11 and a tendency for NUPTs to cluster at chromosome ends (Figs. 4A, 4C). These features, not commonly reported in other species, highlight the distinctive NUPT dynamics in jujube. The distribution of sequence identity between NUPTs and their corresponding plastome genes was unimodal (Fig. 4B), suggesting a single large-scale plastome transfer event in jujube, in contrast to Moringa oleifera, which experienced two distinct NUPT events (Marczuk-Rojas et al., 2023).
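Reporting how much of the nuclear genome NUPTs span (5.93 Mb in DZ versus 3.26 Mb in SZ) requires merging overlapping alignment hits. Otherwise, the same nuclear locus hit by both plastid IR copies would be counted twice. The sketch below assumes hits from a BLAST-style search already converted to 0-based, half-open (chrom, start, end) coordinates; minus-strand hits with start > end are flipped. The hits shown are hypothetical.

```python
def merged_coverage(hits):
    """Total bp covered by possibly overlapping (chrom, start, end) intervals."""
    by_chrom = {}
    for chrom, start, end in hits:
        if end < start:  # minus-strand hit reported with reversed coordinates
            start, end = end, start
        by_chrom.setdefault(chrom, []).append((start, end))
    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlapping or adjacent: extend current block
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total

# Hypothetical NUPT hits on two chromosomes.
hits = [
    ("chr11", 100, 500),
    ("chr11", 400, 900),   # overlaps the previous hit
    ("chr6", 50, 150),
    ("chr11", 1000, 1100),
]
print(merged_coverage(hits))
```

Summing the raw hit lengths here would give 1100 bp, whereas the merged coverage is 1000 bp.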
To date, little research has examined the implications of NUPT retention or the interplay between plastomes and nuclear genomes (Filip & Skuza, 2021). A seminal 2003 study used a genetic system designed to monitor the transfer of chloroplast fragments to the nuclear genome and revealed a remarkably high frequency of chloroplast-to-nucleus gene transfer. This finding points to an ongoing mechanism of nuclear genome evolution driven by the recurrent acquisition of organellar DNA and highlights the role of promiscuous DNA insertions in shaping genetic diversity in multicellular eukaryotes (Stegemann et al., 2003). In our analysis, NUPT genes were categorized into two groups: intact genes, which align precisely with their corresponding plastome genes (Fig. 4D), and non-intact or fragmentary genes. We counted SNPs in the corresponding chloroplast genes across the 326 jujube plastomes. The SNP numbers did not differ significantly between the two categories (Fig. S6), suggesting similar selective pressure on all chloroplast genes, regardless of whether they were transferred in intact or fragmented form. Therefore, the recurrent migration of chloroplast DNA to the nuclear genome appears to function as a protective mechanism that helps preserve the plastome, given its critical role in photosynthesis.
Furthermore, the frequent integration of chloroplast DNA into the nuclear genome has been proposed as a mechanism driving nuclear genome evolution and contributing to intraspecific genetic diversity (Stegemann et al., 2003). Our recent investigation revealed a distinctive storage mechanism for NUPTs in the jujube nuclear genome (DZ), in which NUPTs aggregate within the largest topologically associating domains (TADs), spanning 1.55 Mb in the three-dimensional chromosome architecture. These regions are characterized by high methylation and minimal gene expression (Yang et al., 2023). Analysis of intact NUPT genes revealed markedly high mutation rates across all surveyed genes (Figs. 4D, S1D, S2; Table S5). Together with the substantial mutations in fragmentary NUPT genes, this suggests that NUPTs gradually adapt to the nuclear genome. As previously reported, hypermethylation likely serves as a primary driver of NUPT mutations while also playing a crucial role in maintaining nuclear genome stability against organellar sequences (Yoshida et al., 2019; Zhang et al., 2020b). The high methylation and elevated mutation rates observed in jujube NUPTs support these earlier findings. In summary, our HGT analyses in jujube not only shed light on how NUPTs are stored but also indicate that plastome-to-nucleus HGT predominantly drives the evolution of the nuclear genome, with a limited impact on the plastome itself.
4.3 Evolution of C-jujube – a two-way selection process
Our ancestral area reconstruction (RASP) results were consistent with previous conclusions that C-jujube originated from W-jujube, with the MLYR region as the center of origin (Liu et al., 2020; Guo et al., 2021a). Additionally, we delineated two evolutionary paths, EP1 and EP2, tracing the trajectory from W-jujube to C-jujube. EP1 represents natural selection with minimal human interference, occurring primarily from the MLYR to southeastern China. In contrast, EP2 involved human-driven domestication and cultivation, originating in the MLYR, expanding first across eastern China along the Yellow River basin, and subsequently spreading to southern and western China through human migration, particularly along the ancient “Silk Roads.” These two paths were distinguishable in the π and Fst values. In EP1, differentiation between W-jujube and C-jujube was minimal. In EP2, by contrast, C-jujube had a substantially lower π value than W-jujube, and the Fst between them was notably higher than in EP1. These results suggest that artificial selection and human domestication reduced the genetic diversity of C-jujube in EP2 (Group 4). These insights shed light on the historical migration and dispersal of jujube and offer a fresh perspective on the domestication and cultivation processes that have shaped its genetic diversity and present distribution. This information could help guide the selection of suitable W-jujube and C-jujube individuals for jujube breeding.
4.4 W-jujube and C-jujube likely belong to the same species
Jujube has a long history of human domestication, dating back more than 7000 years to the Neolithic age (Liu et al., 2020). Our plastome analysis suggests an even longer evolutionary timeline for jujube. Previous research using nuclear genome resequencing data indicated a divergence time of approximately 2.7 Ma between C-jujube and W-jujube, reflecting an extended pre-domestication phase (Guo et al., 2021a). Our plastome analysis, however, revealed two distinct divergence times between W-jujube and C-jujube: 0.13 Ma in EP1 and 1.82 Ma in EP2. These estimates are considerably more recent than the previously reported 2.7 Ma, which we attribute primarily to the slower mutation rate of the plastome relative to the nuclear genome (Dong et al., 2020; Jiang et al., 2023; Lin et al., 2023). Our recent comparative genomics study provided evidence that W-jujube and C-jujube belong to the same species (Yang et al., 2023). This view is further reinforced by our plastome analysis, which shows that Z. jujuba diverged from Ziziphus hajarensis, the closest Ziziphus species to Z. jujuba in our dataset, approximately 22 Ma, and from Ziziphus mauritiana, another prominent Ziziphus species cultivated mainly in tropical and subtropical regions worldwide, approximately 60 Ma. These timescales far exceed the 0.13 and 1.82 Ma divergences between W-jujube and C-jujube. Hence, the evolution from W-jujube to C-jujube most likely represents an ongoing process within a single species, Z. jujuba.
Acknowledgements
We would like to acknowledge the National Jujube Germplasm Resource Nursery in Taigu, Shanxi, and the Fruit Tree Research Institute of Shanxi for their valuable assistance with sample collection. This work was supported by the General Program of the National Natural Science Foundation of China (Grant No. 32171817); the General Program of the Natural Science Foundation of Hebei Province, China (Grant No. C2022204030); the Special Research Projects for the New Talent of Hebei Agricultural University, Hebei Province, China (Grant No. YJ2020025); the Basic Scientific Research of Colleges and Universities in Hebei Province (Grant No. KY2022005); and the Hebei Province Key R&D Program (Grant No. 21326304D).
Conflicts of Interest
The authors declare no conflict of interest.
Open Research
Data Availability Statement
The assembled plastome of Ziziphus mauritiana and all 326 assembled plastomes of Ziziphus jujuba have been deposited in the National Genomics Data Center (NGDC) under the GenBase Resource. The accession numbers for each plastome are listed in Table S1. The script for converting the multiple sequence alignment into a VCF file is available on GitHub at: https://github.com/biomichal/Plastome-project.