Genomic analyses provide insights into the polyploidization-driven herbicide adaptation in Leptochloa weeds
Summary
Polyploidy confers a selective advantage under stress conditions; however, whether polyploidization enhances herbicide adaptation remains largely unknown. Tetraploid Leptochloa chinensis is a notorious weed in rice ecosystems, causing severe yield losses in rice. In China, L. chinensis has only one sister species, the diploid L. panicea, whose damage is rarely reported. To gain insights into the effects of polyploidization on herbicide adaptation, we first assembled a high-quality genome of L. panicea and identified genome structural variations relative to L. chinensis. We then identified herbicide-resistance genes specifically expanded in L. chinensis, which may confer greater herbicide adaptability on L. chinensis. Analysis of gene retention and loss showed that five herbicide target-site genes and several herbicide nontarget-site resistance gene families were retained during polyploidization. Notably, we identified three pairs of polyploidization-retained genes, LcABCC8, LcCYP76C1 and LcCYP76C4, that may enhance herbicide resistance. More importantly, we found that both copies of LcCYP76C4 were under herbicide selection during the spread of L. chinensis in China. Furthermore, we identified another gene potentially involved in herbicide resistance, LcCYP709B2, which was also retained during polyploidization and is under selection. This study provides insights into the genomic basis of the enhanced herbicide adaptability of Leptochloa weeds during polyploidization and offers guidance for the precise and efficient control of polyploid weeds.
Introduction
Polyploidy is common in plants. In addition to many important crops such as wheat (Triticum aestivum) (Brenchley et al., 2012), cotton (Gossypium hirsutum) (Zhang et al., 2015) and rapeseed (Brassica napus) (Chalhoub et al., 2014), some important weeds such as Leptochloa chinensis (Wang et al., 2022), barnyard grass (Echinochloa crus-galli) (Guo et al., 2017) and shepherd's purse (Capsella bursa-pastoris) (Kasianov et al., 2017) are also polyploid. Polyploidy not only plays an important role in plant genome evolution and species diversification, but also increases the adaptive plasticity of plants to extreme environments because of increased genetic variation and the buffering effect of duplicated genes (Adams and Wendel, 2005; Chao et al., 2013; Doyle and Coate, 2019; Freeling et al., 2015; Guo et al., 2017; Meimberg et al., 2009; Te Beest et al., 2012; Van de Peer et al., 2009, 2017; Wendel, 2015). Plant polyploidy has been shown to affect responses to both biotic and abiotic stresses. For example, tetraploid garden impatiens (Impatiens walleriana) showed improved resistance to downy mildew (Plasmopara obducens) relative to its diploid counterparts (Wang et al., 2018), and tetraploid Livingstone potato (Plectranthus esculentus) was more resistant to root-knot nematodes than diploids (Hannweg et al., 2016). Regarding abiotic stresses, tetraploid Arabidopsis plants exhibit increased salt tolerance compared with diploids (Chao et al., 2013). Moreover, both tetraploid rice (Oryza sativa) and tetraploid citrange (Citrus sinensis L. Osb. × Poncirus trifoliata L. Raf.) show increased tolerance to salt and drought stresses as a result of whole-genome duplication (Ruiz et al., 2016; Yang et al., 2014). However, whether polyploidy mediates enhanced herbicide adaptation remains largely unknown.
Weeds are among the most resilient survivors on the planet: they can effectively evade human control and adapt readily to their environment (Sharma et al., 2021). Leptochloa chinensis is one of the most notorious weeds in rice ecosystems and has recently become the major weed in direct-seeded rice fields in China, which account for ~21% of the total rice production area (Chakraborty et al., 2017). Leptochloa chinensis shows strong environmental adaptability and adaptive plasticity, with many biotypes having evolved tolerance to herbicides including cyhalofop-butyl and metamifop, two herbicides commonly used to control L. chinensis in rice fields (Chen et al., 2021; Peng et al., 2020; Yu et al., 2017; Zhang et al., 2021). However, owing to the lack of suitable closely related plant models, few studies have addressed the origin of its herbicide adaptive characteristics. The genus Leptochloa contains around 29 species (http://www.plantsoftheworldonline.org/taxon/urn:lsid:ipni.org:names:18378-1), but only two, L. chinensis and L. panicea, have been found in China. Leptochloa chinensis is a tetraploid (2n = 4× = 40), whereas Leptochloa panicea is a diploid (2n = 2× = 20). Leptochloa chinensis and L. panicea display very similar morphological characteristics (Figure S1). At the seedling stage both closely resemble rice, and they have a life cycle similar to that of rice. Leptochloa chinensis shows strong environmental adaptability in paddy fields, especially to herbicides, and can cause great harm to rice production, whereas damage caused by L. panicea is rarely reported.
Genome studies of weeds still lag far behind those of crops. In recent years, with advances in sequencing technologies and the increasingly serious damage caused by weeds, genomes of nearly 20 weed species have been assembled (Sharma et al., 2021), but only a few have reached the chromosome level, leaving many of the genetic mechanisms underlying weed adaptation poorly resolved. In this study, we assembled a chromosome-level genome of L. panicea, the diploid sister species of L. chinensis, to examine the mechanisms of herbicide adaptation and the patterns of genome evolution during polyploidization. We performed a comparative genomic study to characterize the origin and evolutionary history of Leptochloa weeds, identified subgenome-shared and subgenome-specific genomic structural variants (SVs), and found a shared inversion that may mediate plant defence responses. Next, we performed gene family expansion and contraction analysis, surveyed the pattern of gene synteny retention during polyploidization, and found three pairs of polyploidization-retained genes, LcCYP76C1, LcCYP76C4 and LcABCC8, that may confer herbicide resistance. Moreover, we found that LcCYP76C4 was under herbicide selection during the aggressive spread of L. chinensis in China, and accordingly identified another polyploidization-retained gene, LcCYP709B2, that was also under selection and could potentially confer herbicide resistance. This study reveals the genomic basis of polyploidization-driven herbicide adaptation and provides a solid foundation for developing novel or improved strategies for the efficient management of L. chinensis and other polyploid weeds.
Results
Genome assembly and annotation of L. panicea
Based on k-mer frequency analysis, the L. panicea genome had an estimated size of about 286.51 Mb and a heterozygosity level of 0.047% (Figure S2), close to the 280 Mb determined by flow cytometry. We generated a total of 32.73 Gb (133×) of PacBio HiFi reads with a read N50 length of 16.60 kb (Table S1, Figure S3). These HiFi reads were assembled de novo into 348 contigs with an N50 size of 21.96 Mb (Table S2). Using Hi-C data at approximately 97× coverage (Table S1), a total of 235.12 Mb of assembled contigs were clustered into 10 pseudomolecules ranging in size from 15.55 Mb to 32.52 Mb (Figures 1a, S4); seven of these had telomeric sequences (5′-TTTAGGG-3′) at both ends and three at a single end (Table S3). BUSCO evaluation (Simão et al., 2015) indicated that 98.7% of plant conserved orthologs were fully captured by the L. panicea assembly (Table S4). Illumina paired-end reads mapped back to the assembly with an overall alignment rate of 99.23%, and the LTR Assembly Index (LAI) of the assembly was 17.02 (Figure S5). Together, these results indicate that the L. panicea genome assembly is of high quality (Table 1).
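The k-mer genome-size estimate and the N50 statistics above rest on two simple calculations. The sketch below is illustrative only and is not the pipeline used here: it assumes a k-mer depth histogram given as a mapping from depth to the number of distinct k-mers, treats the most frequent depth above a hypothetical error cutoff as the homozygous peak, and estimates genome size as total k-mers divided by that peak depth. Dedicated tools (e.g. GenomeScope) fit a fuller model that also accounts for heterozygosity and sequencing errors.

```python
from typing import Dict, List


def estimate_genome_size(kmer_hist: Dict[int, int], min_depth: int = 5) -> float:
    """Genome size ~ total k-mers / depth of the homozygous peak.

    kmer_hist maps k-mer depth -> number of distinct k-mers at that depth.
    Depths below min_depth (a hypothetical cutoff) are treated as errors.
    """
    usable = {depth: n for depth, n in kmer_hist.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers above the error cutoff")
    total_kmers = sum(depth * n for depth, n in usable.items())
    peak_depth = max(usable, key=usable.get)  # most frequent depth = homozygous peak
    return total_kmers / peak_depth


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L cover half the assembly."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    toy_hist = {2: 500, 29: 40, 30: 100, 31: 40}  # hypothetical histogram
    print(estimate_genome_size(toy_hist))  # 180.0 (5400 k-mers / peak depth 30)
    print(n50([10, 20, 30, 40]))  # 30
```

In real data the histogram comes from a k-mer counter run on the short reads, and the peak depth multiplied back into the total gives the haploid genome length in base pairs.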

Table 1 Summary statistics of the L. panicea genome assembly and annotation

| Feature | Value |
| --- | --- |
| Assembly features | |
| Contig N50 | 21.96 Mb |
| Contig number | 348 |
| Assembled genome size | 235.12 Mb |
| Chromosome number | 10 |
| BUSCO completeness | 98.70% |
| LTR Assembly Index (LAI) | 17.02 |
| Gene models | |
| Number of gene models | 33 481 |
| Mean coding sequence length | 1270 bp |
| Mean number of exons per transcript | 5.2 |
| Mean exon length | 320 bp |
| Mean intron length | 388 bp |
| Non-protein-coding genes | |
| miRNA genes | 114 |
| tRNA genes | 2751 |
| rRNA genes | 8114 |
| snoRNA genes | 362 |
| snRNA genes | 95 |
A total of 33 481 gene models were predicted in the L. panicea genome, of which 99.36% (33 267) were assigned to the 10 chromosomes (Table S5). In addition, 2751 tRNA genes, 8114 rRNA genes, 114 microRNA genes (miRNAs), 95 small nuclear RNA genes (snRNAs) and 362 small nucleolar RNA genes (snoRNAs) were predicted in the L. panicea genome (Table 1). BUSCO assessment indicated that 99.2% of plant conserved orthologs were completely covered by the predicted genes (Table S4).
Among the 33 481 predicted proteins, 30 842 (92.12%) were annotated by GenBank nr, 28 957 (86.49%) by eggNOG-mapper, 20 638 (61.64%) by Swiss-Prot and 11 023 (32.95%) by the KEGG database, and 13 780 (41.16%) were assigned Gene Ontology (GO) terms. The average GC content of the CDSs of L. panicea (56.06%) was similar to that of L. chinensis (56.90%), O. sativa (55.19%), Sorghum bicolor (55.51%) and Setaria viridis (56.01%), but higher than that of Arabidopsis thaliana (44.17%) (Table S6). The GC content and GC3s (GC content at synonymous third codon positions) of CDSs showed a bimodal distribution in L. panicea, consistent with the pattern in other grasses (Figures S6, S7).
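The GC and GC3s statistics above can be computed per coding sequence as follows. This is a minimal sketch under the usual convention (as in codon-usage tools such as codonW) that GC3s counts only third positions where the base is synonymous, so Met (ATG), Trp (TGG) and stop codons are excluded; the function names are hypothetical. Plotting the per-gene values as histograms is what reveals the bimodal distribution typical of grass genomes.

```python
# Codons whose third position is not synonymous (Met, Trp) or that encode stops.
EXCLUDED_FOR_GC3S = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc_content(cds: str) -> float:
    """Percentage of G + C over the whole coding sequence."""
    seq = cds.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc3s(cds: str) -> float:
    """Percentage of G + C at synonymous third codon positions."""
    seq = cds.upper()
    # Complete codons only; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    usable = [c for c in codons if c not in EXCLUDED_FOR_GC3S]
    if not usable:
        return 0.0
    return 100.0 * sum(c[2] in "GC" for c in usable) / len(usable)


if __name__ == "__main__":
    cds = "ATGGCCAAGTAA"  # toy CDS: Met-Ala-Lys-stop
    print(round(gc_content(cds), 2))  # 41.67
    print(gc3s(cds))  # 100.0 (GCC and AAG both end in G/C)
```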
In the L. panicea genome assembly, we identified 66.24 Mb (28.21%) of repetitive sequences (Table S7). Long-terminal repeat retrotransposons (LTR-RTs) were the most abundant type of repetitive sequence in L. panicea, spanning 21.11 Mb (8.98%) of the genome. A total of 226 intact LTR-RTs were classified, including 129 Gypsy-type and 74 Copia-type elements. The largest LTR-RT superfamily, Gypsy, comprising ~6.34% of the genome, was concentrated near the putative centromeres (Figure S8A). The other major LTR-RT superfamily, Copia, comprised ~1.37% of the genome (Figure S8B). Other interspersed repeats, such as LINEs (long interspersed nuclear elements) and SINEs (short interspersed nuclear elements), occupied 1.86% and 0.08% of the genome, respectively (Figure S8C,D).
In addition, centromere regions were identified on nine of the 10 L. panicea chromosomes (all except chromosome 6) using Tandem Repeat Finder (Benson, 1999) (Figure S9). We also identified the eight most abundant TE subfamilies, comprising four LTR/Gypsy, two unknown, one LINE/L1 and one LTR/Cassandra subfamily, which together accounted for more than 10.65% of the L. panicea genome (Figure S10). The density of these TE subfamilies along the 10 chromosomes showed that only the unknown-type subfamily rnd-3_family-356, the largest of the eight (6.44% of the genome), was enriched near centromeres and absent from the rest of the genome. This pattern is consistent with the Tandem Repeat Finder results and also allowed us to locate the centromere on chromosome 6 (Figure S9). Overall, we predicted potential centromeric regions on all 10 chromosomes.
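Locating a candidate centromere from the density of a centromere-enriched repeat (such as rnd-3_family-356 here) amounts to finding the chromosome window with the highest repeat coverage. A minimal sketch, assuming repeat annotations are non-overlapping, 0-based half-open (start, end) intervals and using a hypothetical window size; it is not the procedure used in this study, and in practice a density peak would be checked against tandem-repeat and Hi-C evidence.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end)


def window_density(intervals: List[Interval], chrom_len: int, window: int) -> List[float]:
    """Fraction of each window covered by the repeat (intervals must not overlap)."""
    n_windows = (chrom_len + window - 1) // window
    covered = [0] * n_windows
    for start, end in intervals:
        # Visit every window the interval touches and add the overlapping bases.
        for w in range(start // window, (end - 1) // window + 1):
            w_start, w_end = w * window, min((w + 1) * window, chrom_len)
            overlap = min(end, w_end) - max(start, w_start)
            if overlap > 0:
                covered[w] += overlap
    return [
        covered[w] / (min((w + 1) * window, chrom_len) - w * window)
        for w in range(n_windows)
    ]


def candidate_centromere(intervals: List[Interval], chrom_len: int, window: int) -> Interval:
    """Return the window with the highest repeat density as the candidate region."""
    density = window_density(intervals, chrom_len, window)
    best = max(range(len(density)), key=density.__getitem__)
    return best * window, min((best + 1) * window, chrom_len)


if __name__ == "__main__":
    repeats = [(5, 6), (42, 48), (50, 51)]  # toy annotation on a 100-bp "chromosome"
    print(candidate_centromere(repeats, chrom_len=100, window=10))  # (40, 50)
```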
Comparison between the L. panicea and L. chinensis genomes
The high-quality L. panicea genome provides an opportunity to compare it with the two subgenomes (At and Bt) of L. chinensis (Wang et al., 2022) and to evaluate the effect of polyploidization in Leptochloa weeds. A total of 174.9 Mb of syntenic regions were identified between L. panicea and At, and 165.85 Mb between L. panicea and Bt (Figure 1b, Table S8). Each chromosome of L. panicea was largely collinear with two homologous chromosomes of L. chinensis (Figure 1c), consistent with the diploid and tetraploid nature of L. panicea and L. chinensis, respectively. Likewise, each chromosome of L. panicea was largely collinear with one chromosome of O. thomaeum, a diploid species in the Chloridoideae (Figure 1d).
SVs in gene bodies and promoter regions can affect gene function and expression. We found that 6402 predicted genes in L. panicea and 5909 in At had at least one indel in their coding sequences or promoter regions, with 3792 in L. panicea and 3333 in At having indels in their coding sequences (Figure S11). Similarly, 6144 predicted genes in L. panicea and 5662 in Bt had at least one indel in their gene body or promoter regions, with 3648 in L. panicea and 3271 in Bt having indels in their coding sequences (Figure S12). To explore whether there is a subgenome bias in SVs, we compared the two subgenomes: 175 033 SVs existed only between L. panicea and At, 162 162 SVs existed only between L. panicea and Bt, and 53 046 SVs were shared by At and Bt (Figure 2a). Moreover, gypsy-like retrotransposons made up 9.09% of the SV sequences between L. panicea and At and 6.99% of those between L. panicea and Bt, compared with 6.34% of the entire genome; copia-like retrotransposons made up 2.10% and 1.74% of these SV sequences, respectively, compared with 1.37% of the entire genome. The proportions of other types of transposable elements were similar between the SV regions and the entire genome (Figure 2b), suggesting that SVs occurred more frequently in genomic regions occupied by gypsy- and copia-like retrotransposons.
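The TE comparison above can be summarized as a fold enrichment, i.e. the share of a TE class in SV sequences divided by its share in the whole genome. The sketch below simply recomputes these ratios from the percentages reported in the text. The ratio is descriptive only; assessing significance would need an additional test (for example, permuting SV positions along the genome), which is an assumption here rather than part of the reported analysis.

```python
# Percentages reported in the text: share of each TE class in the whole genome
# and in SV sequences between L. panicea and each L. chinensis subgenome.
GENOME_PCT = {"Gypsy": 6.34, "Copia": 1.37}
SV_PCT = {
    ("At", "Gypsy"): 9.09,
    ("Bt", "Gypsy"): 6.99,
    ("At", "Copia"): 2.10,
    ("Bt", "Copia"): 1.74,
}


def fold_enrichment(observed_pct: float, background_pct: float) -> float:
    """Ratio > 1 means the TE class is over-represented in SV sequences."""
    return observed_pct / background_pct


def enrichment_table() -> dict:
    """Fold enrichment per (subgenome, TE class), rounded to two decimals."""
    return {
        (subgenome, te): round(fold_enrichment(pct, GENOME_PCT[te]), 2)
        for (subgenome, te), pct in SV_PCT.items()
    }


if __name__ == "__main__":
    for key, fold in enrichment_table().items():
        print(key, fold)
```

On these numbers both TE classes are over-represented in SV sequences for both subgenomes, more strongly for At (about 1.43-fold for Gypsy and 1.53-fold for Copia) than for Bt (about 1.10-fold and 1.27-fold).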

Chromosomal rearrangements such as inversions and translocations have long been thought to play a critical role in adaptation and speciation (Dvorak et al., 2018). We identified 128 genome rearrangements (translocations and inversions) between L. panicea and At and 138 between L. panicea and Bt, distributed across all 10 chromosomes (Figure S13). Among these, 77 inversions ranging from 2.11 kb to 4.94 Mb were identified between L. panicea and At, and 74 inversions ranging from 2.03 kb to 4.94 Mb between L. panicea and Bt. Interestingly, we found two large inversions on chromosomes 2 and 9 shared by both At and Bt (Figures 2c, S13), which were further supported by Hi-C maps and/or Illumina read mapping (Figures 2c, S14–S17). GO term enrichment analysis of the 200 genes surrounding the breakpoint regions on chromosome 2 showed that biological processes such as ‘heterocycle metabolic process’, ‘DNA recombination’, ‘nucleic acid metabolic process’ and ‘DNA integration’ were significantly enriched (Figure 2d), whereas genes surrounding the breakpoint regions on chromosome 9 were mainly enriched in GO terms related to plant defence, including ‘defense response to fungus’ and ‘defense response to other organism’ (Figure 2d). This suggests that the chromosome 9 inversion may mediate differences in biotic stress tolerance between the two species.
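GO term enrichment of a gene set, such as the 200 genes flanking an inversion breakpoint, is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) followed by multiple-testing correction. The text does not state which test was used, so the sketch below is an assumed, standard formulation using only the Python standard library; the term counts in the example are hypothetical.

```python
from math import comb
from typing import List


def hypergeom_sf(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k): chance of at least k term-annotated genes among `drawn` genes
    sampled from `population` genes, of which `annotated` carry the term."""
    upper = min(annotated, drawn)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, drawn)  # exact integer ratio, then float


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical: 33 481 genes, 150 annotated with a defence term,
    # 200 breakpoint-flanking genes, 8 of which carry the term.
    p = hypergeom_sf(8, population=33481, annotated=150, drawn=200)
    print(p)
    print(benjamini_hochberg([0.01, 0.04, 0.03]))
```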
It is currently unclear whether L. chinensis is an autotetraploid or an allotetraploid. In our previous study, we found that the two subgenomes of L. chinensis display neither fractionation bias nor overall gene expression dominance, suggesting a possible autopolyploid origin of L. chinensis (Wang et al., 2022). To explore this further, we broke the L. panicea genome into 100-mer fragments and mapped them to the two subgenomes of L. chinensis. The two subgenomes showed very similar levels of sequence similarity to the L. panicea genome (Figure S18), supporting an autopolyploid origin of L. chinensis.
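The 100-mer comparison above can be illustrated with exact matching. The sketch below is a simplification and not the mapping procedure used in the study: it cuts the query genome into non-overlapping k-mers, skips fragments containing N, and reports the fraction found verbatim (on either strand) in each subgenome. Real analyses would use a read aligner that tolerates mismatches, and an in-memory k-mer set is only practical for toy sequences. Similar fractions for the two subgenomes would correspond to the pattern described for Figure S18.

```python
from typing import List, Set

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def fragments(seq: str, k: int = 100) -> List[str]:
    """Non-overlapping k-mer fragments, dropping any containing N."""
    frags = (seq[i:i + k] for i in range(0, len(seq) - k + 1, k))
    return [f for f in frags if "N" not in f]


def kmer_index(seq: str, k: int) -> Set[str]:
    """All k-mers of the target sequence (toy scale only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def exact_hit_fraction(query: str, target: str, k: int = 100) -> float:
    """Fraction of query fragments present, on either strand, in the target."""
    frags = fragments(query.upper(), k)
    if not frags:
        return 0.0
    index = kmer_index(target.upper(), k)
    hits = sum(f in index or reverse_complement(f) in index for f in frags)
    return hits / len(frags)


if __name__ == "__main__":
    diploid = "ACGTACGTAA"            # toy "L. panicea" sequence
    subgenome_a = "TTACGTACGTAATT"    # contains both 5-mer fragments
    subgenome_b = "ACGTATTTTT"        # contains only the first fragment
    print(exact_hit_fraction(diploid, subgenome_a, k=5))  # 1.0
    print(exact_hit_fraction(diploid, subgenome_b, k=5))  # 0.5
```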
Genome evolution of Leptochloa weeds
To investigate the evolutionary history of Leptochloa genomes, we first calculated synonymous substitution rates (Ks) for homologous gene pairs to identify whole-genome duplication (WGD) events. Based on the Ks peaks, we dated the divergence of L. panicea and L. chinensis to 11.6 million years ago (mya), and the two subgenomes of L. chinensis diverged from L. panicea at about the same time (Figure 3a). We found that L. panicea experienced only the ancient WGD event (ρ) shared by members of the grass family, whereas L. chinensis experienced an additional, more recent WGD event (the tetraploidization event) dating back to ~9.88 mya. Because the timing of L. chinensis tetraploidization is very close to the divergence time of L. panicea and L. chinensis, this tetraploidization event may have led to the speciation of L. chinensis.
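Converting a Ks peak into a divergence or duplication time uses the standard relation T = Ks / (2r), where r is the synonymous substitution rate per site per year and the factor 2 accounts for substitutions accumulating along both lineages. The text does not give the rate used, so the sketch assumes r = 6.5 × 10⁻⁹, a value often used for grasses. Under that assumption a Ks peak of about 0.151 would correspond to ~11.6 mya; this is an illustration, not the measured peak. The histogram-mode helper and its bin width are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Iterable

GRASS_RATE = 6.5e-9  # assumed synonymous substitutions per site per year


def divergence_time_years(ks: float, rate: float = GRASS_RATE) -> float:
    """T = Ks / (2r): both diverging lineages accumulate substitutions."""
    if ks < 0 or rate <= 0:
        raise ValueError("Ks must be >= 0 and rate > 0")
    return ks / (2.0 * rate)


def ks_peak(ks_values: Iterable[float], bin_width: float = 0.01, max_ks: float = 3.0) -> float:
    """Centre of the most populated Ks bin; values above max_ks are treated as saturated."""
    bins = Counter(
        math.floor(ks / bin_width) for ks in ks_values if 0 < ks <= max_ks
    )
    if not bins:
        raise ValueError("no usable Ks values")
    best_bin, _ = bins.most_common(1)[0]
    return (best_bin + 0.5) * bin_width


if __name__ == "__main__":
    toy_ks = [0.152, 0.155, 0.158, 0.31, 1.2]  # hypothetical gene-pair Ks values
    peak = ks_peak(toy_ks)
    print(round(peak, 3), round(divergence_time_years(peak) / 1e6, 1))
```

Real Ks distributions are usually smoothed or fitted with mixture models before reading off peaks; the simple histogram mode here only shows the idea.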

To further investigate the evolutionary relationships between Leptochloa weeds and other grasses, gene family clustering was carried out using the two Leptochloa species, seven other grasses and A. thaliana. A total of 549 single-copy orthologs shared by these plants were identified and used for phylogenetic reconstruction and divergence time estimation, which showed that O. thomaeum is the closest relative of the Leptochloa weeds and that Leptochloa weeds and O. sativa diverged ~49.3 mya (Figure 3b).
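Selecting single-copy orthologs for phylogenomics amounts to keeping orthogroups with exactly one gene in every species. A minimal sketch, assuming a tab-separated per-species gene-count table with an 'Orthogroup' column and an optional 'Total' column (the layout of OrthoFinder's Orthogroups.GeneCount.tsv); the species labels and counts are hypothetical, and this does not imply which clustering tool the study used.

```python
import csv
import io
from typing import Dict, List

CountTable = Dict[str, Dict[str, int]]  # orthogroup -> species -> gene count


def parse_gene_counts(tsv_text: str) -> CountTable:
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    table: CountTable = {}
    for row in reader:
        orthogroup = row.pop("Orthogroup")
        row.pop("Total", None)  # summary column, not a species
        table[orthogroup] = {species: int(n) for species, n in row.items()}
    return table


def single_copy_orthogroups(table: CountTable) -> List[str]:
    """Orthogroups with exactly one gene in every species of the table."""
    return sorted(
        og for og, per_species in table.items()
        if per_species and all(n == 1 for n in per_species.values())
    )


if __name__ == "__main__":
    toy = (
        "Orthogroup\tSpA\tSpB\tSpC\tTotal\n"
        "OG1\t1\t1\t1\t3\n"
        "OG2\t1\t2\t1\t4\n"
        "OG3\t1\t1\t0\t2\n"
    )
    print(single_copy_orthogroups(parse_gene_counts(toy)))  # ['OG1']
```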
In addition, we identified 21 086 gene families shared between the two Leptochloa species. A total of 901 gene families containing 2383 genes were unique to L. panicea, and 1479 gene families containing 2528 genes were unique to L. chinensis (Figure 3c). In L. chinensis, for both shared and specific genes, the largest number were expressed in seeds, followed by roots, stems and leaves. In L. panicea, by contrast, the largest number of genes were expressed in seeds and stems, followed by roots and leaves (Figure 3d).
To detect the expansion and contraction of gene families, we used protein sequences of the aforementioned 10 species to identify 30 737 orthologous groups (gene families). By comparing gene families among these 10 species, we found 3432 gene families to be significantly expanded in L. chinensis and 3489 in L. panicea (Figure 3b). Functional analysis of the 3432 expanded gene families in L. chinensis revealed that many of them were involved in regulating metabolic pathways, for example ‘regulation of metabolic process’ (Figure 3e). Notably, metabolism is one of the main mechanisms of herbicide nontarget-site resistance (Gaines et al., 2020; Powles and Yu, 2010). By contrast, the expanded gene families of L. panicea showed no enrichment of such functions (Figure 3f). We next annotated the domains of the 3432 gene families expanded in L. chinensis and identified 35 cytochrome P450, 25 ABC transporter, three glutathione S-transferase, 13 AP2 domain-containing and six GRAS domain-containing gene families, which are known to be involved in plant responses to abiotic stress, in particular herbicide resistance (Figure 3g). These results indicate that the expansion of these gene families may have provided a foundation for the adaptation of L. chinensis to field conditions, especially in fields managed with herbicides, whereas L. panicea lacks such a foundation.
Gene loss and gain during polyploidization
To gain insights into the effects of polyploidization on herbicide adaptation, we first calculated gene family sizes in L. panicea, L. chinensis and S. italica based on protein domain identification. The majority of gene families in L. chinensis were nearly twice the size of those in L. panicea (Figure 4a,c), whereas family sizes in L. panicea and S. italica were nearly identical (Figure 4b,c). We further calculated the sizes of several stress tolerance-related gene families in eight grasses and found that all of these families were larger in L. chinensis than in L. panicea and O. thomaeum, and that most of them, except NB-ARC and AP2, were larger in L. chinensis than in most other grasses (Figure 4d).

To identify gene loss and gain during L. chinensis polyploidization, we calculated synteny retention ratios of diploid L. panicea genes in tetraploid L. chinensis, i.e. within a gene family, the percentage of gene members with a 1:(1:1) syntenic relationship among L. panicea and the At and Bt subgenomes of tetraploid L. chinensis. Across the genome, 58.18% of syntenic genes fit the 1:(1:1) retention pattern, whereas 19.15% and 22.67% fit the 1:(1:0) and 1:(0:1) patterns, respectively (Figure S19, Table S9). Herbicide resistance can be divided into target-site and nontarget-site resistance (Powles and Yu, 2010). Amplification of herbicide target genes in weed genomes has been reported to increase herbicide tolerance (Gaines et al., 2010). To investigate whether herbicide target genes were amplified and retained during L. chinensis polyploidization, we identified genes encoding five herbicide target enzymes, acetyl-CoA carboxylase (ACCase), acetolactate synthase (ALS), 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), phytoene desaturase (PDS) and protoporphyrinogen oxidase (PPO), in L. panicea and L. chinensis. All five conformed to the 1:(1:1) synteny retention pattern (Table S10). Expression analysis further showed that the genes encoding these target enzymes in both subgenomes of L. chinensis were expressed at medium to high levels (Figure 4e, Table S11), suggesting that they remain functional after polyploidization-mediated amplification. The amplification of these target genes during polyploidization may have enhanced the tolerance of L. chinensis to herbicides.
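The retention classes above follow directly from whether each L. panicea gene has a syntenic counterpart in At, in Bt, in both or in neither. The sketch below assumes syntenic gene pairs have already been called (for example from collinear blocks) and represents each L. panicea gene as a hypothetical (gene, At match, Bt match) tuple. Genes with no match in either subgenome are excluded from the denominator, consistent with the three genome-wide percentages above summing to 100%. The family-level ratio is the share of 1:(1:1) genes within a family.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[str, Optional[str], Optional[str]]  # (L. panicea gene, At match, Bt match)


def retention_pattern(at_match: Optional[str], bt_match: Optional[str]) -> str:
    return f"1:({int(at_match is not None)}:{int(bt_match is not None)})"


def retention_ratios(rows: Iterable[Row]) -> Dict[str, float]:
    """Percentage of syntenic genes in each retention class (1:(0:0) excluded)."""
    counts = Counter(retention_pattern(at, bt) for _, at, bt in rows)
    counts.pop("1:(0:0)", None)
    total = sum(counts.values())
    return {p: 100.0 * n / total for p, n in counts.items()} if total else {}


def family_retention(rows: Iterable[Row], family_of: Dict[str, str]) -> Dict[str, float]:
    """Per-family synteny retention ratio: percentage of 1:(1:1) members."""
    by_family = defaultdict(list)
    for row in rows:
        family = family_of.get(row[0])
        if family is not None:
            by_family[family].append(row)
    return {
        fam: retention_ratios(members).get("1:(1:1)", 0.0)
        for fam, members in by_family.items()
    }


if __name__ == "__main__":
    toy = [("g1", "a1", "b1"), ("g2", "a2", None), ("g3", None, "b3"),
           ("g4", "a4", "b4"), ("g5", None, None)]
    print(retention_ratios(toy))  # {'1:(1:1)': 50.0, '1:(1:0)': 25.0, '1:(0:1)': 25.0}
    families = {"g1": "ABC", "g4": "ABC", "g2": "GST", "g3": "GST", "g5": "GST"}
    print(family_retention(toy, families))  # {'ABC': 100.0, 'GST': 0.0}
```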
Furthermore, synteny retention ratios for most abiotic stress tolerance gene families were higher than 50%, for example ABC transporter (76.14%), GRAS (77.5%), glycosyl transferase (65.52%) and CYP450 (55.63%), whereas GSTs reached only 39.53% and AP2 49.15%. Among biotic stress-related gene families, WRKY reached 64.56% and legume lectin 55.56%, but NB-ARC reached only 28.30%, indicating substantial loss of NB-ARC genes (Figure 4f). An example of an NB-ARC gene that deviated from the 1:(1:1) synteny retention pattern is illustrated in Figure 4g: relative to L. panicea, only one homeologous copy of this NB-ARC gene was retained in L. chinensis (within Bt). Interestingly, the herbicide resistance gene ABCC8 (Pan et al., 2021), a member of the ABC transporter family that can transport glyphosate molecules into vesicles to protect cells from toxicity (Figure 4j), lies within the same genomic region as this NB-ARC gene yet conformed to the 1:(1:1) syntenic retention pattern (Figure 4g,j). In addition, CYP76C1 and CYP76C4 (Höfer et al., 2014), members of the cytochrome P450 family known to confer metabolic resistance to herbicides in plants (Figure 4k,l), also fit this conserved pattern during polyploidization (Figure 4h,i). We next investigated the expression patterns of these three genes. Surprisingly, LpABCC8 and LpCYP76C4 were most highly expressed in L. panicea seeds and LpCYP76C1 in stems, whereas all three were most highly expressed in L. chinensis leaves (Figure 4m). This shift of peak expression to leaves may contribute to resistance to metamifop and cyhalofop-butyl, which are commonly applied as stem and leaf sprays to control L. chinensis. Furthermore, we analysed transcriptome data from cyhalofop-butyl-resistant and cyhalofop-butyl-sensitive L. chinensis lines (Chen et al., 2021; Zhang et al., 2022b) and found that all of these genes except LcABCC8-2 (i.e. five of the six copies) were expressed at significantly higher levels in the resistant than in the sensitive L. chinensis (Table S12).
Improvement of herbicide adaptation during L. chinensis polyploidization
To reveal the mechanism by which L. chinensis polyploidization confers greater herbicide adaptability, we performed homology-based 3D structure reconstruction and molecular docking of the three pairs of polyploidization-retained proteins described above with metamifop and cyhalofop acid (the active form of the herbicide cyhalofop-butyl); metamifop and cyhalofop-butyl are two herbicides commonly used to control Leptochloa weeds in paddy fields. The aim was to investigate whether these proteins could play a role under herbicide exposure. For the LcCYP76C4 gene pair (LcCYP76C4–9 and LcCYP76C4–10), metamifop and cyhalofop acid could bind to the binding pocket of LcCYP76C4–9 with binding energies of −8.48 kcal/mol and −7.07 kcal/mol, respectively. Metamifop formed strong hydrogen bonds with ILE-208 (bond length: 2.9 Å) and ARG-232 (bond length: 2.1 Å) of LcCYP76C4–9 in the binding pocket (Figure 5a), and cyhalofop acid formed strong hydrogen bonds with TRP-114 (bond length: 2.4 Å), SER-99 (bond length: 2.2 Å), PRO-370 (bond length: 3.3 Å), LEU-367 (bond length: 1.9 Å) and PRO-366 (bond length: 3.2 Å) in the pocket (Figure 5a). Both molecules could also bind to the binding pocket of LcCYP76C4–10, with binding energies of −8.76 kcal/mol and −7.2 kcal/mol, respectively. Metamifop formed strong hydrogen bonds with ALA-295 (bond length: 2.8 Å) and ASN-102 (bond lengths: 2.1 Å and 3.5 Å) of LcCYP76C4–10 in the binding pocket (Figure 5b), and cyhalofop acid formed strong hydrogen bonds with ALA-295 (bond length: 2.8 Å) and ARG-384 (bond lengths: 3.5 Å and 2.1 Å) in the pocket (Figure 5b). The LcCYP76C1 gene pair (LcCYP76C1–5 and LcCYP76C1–6) showed similar binding. Metamifop and cyhalofop acid could bind to the binding pocket of LcCYP76C1–5 with binding energies of −7.39 kcal/mol and −6.18 kcal/mol, respectively.
Metamifop formed strong hydrogen bonds with PRO-343 (bond length: 2.3 Å) and ILE-345 (bond length: 2.0 Å) of LcCYP76C1–5 in the binding pocket (Figure S20A), and cyhalofop acid formed strong hydrogen bonds with ARG-394 (bond length: 1.7 Å) and ARG-106 (bond length: 2.1 Å) in the pocket (Figure S20A). Metamifop and cyhalofop acid could bind to the binding pocket of LcCYP76C1–6 with binding energies of −12.02 kcal/mol and −7.63 kcal/mol, respectively. Metamifop formed strong hydrogen bonds with ARG-33 (bond length: 2.5 Å) and THR-186 (bond length: 2.1 Å) of LcCYP76C1–6 in the binding pocket (Figure S20B), and cyhalofop acid formed a strong hydrogen bond with ARG-33 (bond length: 2.0 Å) in the pocket (Figure S20B). The ABCC8 gene pair (LcABCC8-1 and LcABCC8-2, members of the ABC transporter family) could also be bound by the two herbicide molecules. Metamifop and cyhalofop acid could bind to the binding pocket of LcABCC8-1 with binding energies of −6.69 kcal/mol and −6.26 kcal/mol, respectively. Metamifop formed a strong hydrogen bond with HIS-495 (bond length: 2.3 Å) of LcABCC8-1 in the binding pocket (Figure S20C), and cyhalofop acid formed strong hydrogen bonds with ARG-494 (bond length: 1.8 Å) and HIS-495 (bond length: 2.3 Å) in the pocket (Figure S20C). Metamifop and cyhalofop acid could bind to the binding pocket of LcABCC8-2 with binding energies of −5.8 kcal/mol and −5.0 kcal/mol, respectively.
Metamifop formed strong hydrogen bonds with ARG-130 (bond length: 2.1 Å) and THR-241 (bond length: 3.6 Å) of LcABCC8-2 in the binding pocket (Figure S20D), and cyhalofop acid formed strong hydrogen bonds with ARG-364 (bond length: 2.0 Å), MET-368 (bond lengths: 3.5 Å and 2.4 Å), GLN-375 (bond length: 3.3 Å) and ARG-274 (bond lengths: 2.7 Å and 1.7 Å) in the pocket (Figure S20D). We also performed homology modelling and molecular docking for the LpABCC8, LpCYP76C1 and LpCYP76C4 proteins from L. panicea and found that LpCYP76C1 and LpCYP76C4 bound metamifop/cyhalofop acid less strongly (i.e. with less negative binding energies) than their homologues in L. chinensis, and that LpABCC8 had a binding energy similar to that of LcABCC8-2 but bound considerably less strongly than LcABCC8-1 (Table S13). These results suggest that these genes in L. panicea may have weaker herbicide resistance functions than those in L. chinensis, and that the doubling of these genes in L. chinensis through polyploidization could confer much greater herbicide adaptation plasticity and buffering capacity.
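To put the docking energies above on a more intuitive scale, a binding free energy ΔG can be converted into an apparent dissociation constant via Kd = exp(ΔG/RT). This is only a rough guide: docking scores are empirical estimates rather than measured free energies, and the 298.15 K temperature is an assumption. The sketch tabulates the L. chinensis values reported in the text and ranks the proteins by predicted metamifop affinity; the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1

# (metamifop, cyhalofop acid) docking energies in kcal/mol, as reported in the text.
REPORTED: Dict[str, Tuple[float, float]] = {
    "LcCYP76C4-9": (-8.48, -7.07),
    "LcCYP76C4-10": (-8.76, -7.2),
    "LcCYP76C1-5": (-7.39, -6.18),
    "LcCYP76C1-6": (-12.02, -7.63),
    "LcABCC8-1": (-6.69, -6.26),
    "LcABCC8-2": (-5.8, -5.0),
}


def apparent_kd(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Apparent Kd (mol/L) from a binding free energy; more negative -> tighter."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


def rank_by_metamifop(table: Dict[str, Tuple[float, float]]) -> List[str]:
    """Protein names ordered from most to least favourable metamifop energy."""
    return sorted(table, key=lambda name: table[name][0])


if __name__ == "__main__":
    for name in rank_by_metamifop(REPORTED):
        metamifop, cyhalofop = REPORTED[name]
        print(f"{name}: Kd ~ {apparent_kd(metamifop):.1e} M (metamifop), "
              f"{apparent_kd(cyhalofop):.1e} M (cyhalofop acid)")
```

Under these assumptions −8.48 kcal/mol corresponds to a sub-micromolar apparent Kd (about 6 × 10⁻⁷ M), and each change of about 1.36 kcal/mol shifts Kd roughly tenfold at 25 °C.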

Polyploidization-retained herbicide resistance genes under herbicide selection
In a previous study, we demonstrated that during its spread in China from the southern/southwestern provinces to the middle and lower reaches of the Yangtze River, L. chinensis developed significantly increased herbicide resistance, accompanied by selection on numerous genes involved in herbicide resistance (Wang et al., 2022). Interestingly, both copies of LcCYP76C4 were under selection during the aggressive spread of L. chinensis in China (Figure 5c,d), where strong herbicide selection has occurred. Their nucleotide diversity in L. chinensis populations from the southern border of China (not yet subject to herbicide selection) was significantly higher than in populations from the middle and lower reaches of the Yangtze River (subject to herbicide selection) (Figure 5e,f). These results suggest that polyploidization-driven amplification of LcCYP76C4 may have mediated the evolution of herbicide adaptation in L. chinensis in China. We further identified 2738 genes that were both under selection and retained during polyploidization (Figure 5g), including 22 CYP450, seven GST, 11 AP2, 13 ABC transporter and four GRAS genes from families known to be involved in herbicide tolerance (Figure 5h).
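The diversity comparison above relies on nucleotide diversity (π), the average number of differences per site between two sequences drawn from a population. Selective sweeps reduce π, which is often summarized as a reduction-of-diversity ratio between a reference and a selected population. A minimal sketch, assuming aligned haplotype sequences of equal length per population; sites with a gap or N in either sequence of a pair are skipped, and the toy data and function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence

MISSING = set("-N")


def nucleotide_diversity(haplotypes: Sequence[str]) -> float:
    """pi: pairwise differences per compared site, pooled over all sequence pairs."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two sequences")
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    differences = compared = 0
    for a, b in combinations((h.upper() for h in haplotypes), 2):
        for x, y in zip(a, b):
            if x in MISSING or y in MISSING:
                continue  # skip sites missing in either sequence of this pair
            compared += 1
            differences += x != y
    return differences / compared if compared else 0.0


def reduction_of_diversity(pi_reference: float, pi_selected: float) -> float:
    """1 - pi_selected / pi_reference; values near 1 suggest a selective sweep."""
    return 1.0 - pi_selected / pi_reference


if __name__ == "__main__":
    south = ["AAAA", "AAAT", "AATT"]      # toy reference population
    yangtze = ["AAAA", "AAAA", "AAAT"]    # toy herbicide-exposed population
    pi_s, pi_y = nucleotide_diversity(south), nucleotide_diversity(yangtze)
    print(pi_s, pi_y, reduction_of_diversity(pi_s, pi_y))
```

In the study's terms, higher π in southern border populations than in Yangtze populations yields a positive reduction of diversity at the selected locus.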
We selected four gene pairs from herbicide resistance-related families whose two copies were both under herbicide selection and analysed their expression patterns. Interestingly, LcCYP709B2 (Lc_Chr5.g15716) was highly expressed in leaves (Figure 5i), suggesting that this gene may contribute to resistance to metamifop and cyhalofop-butyl, which are commonly applied as stem and leaf sprays for weed control. We further confirmed that the homeologous relationship of this gene was conserved during polyploidization (Figure 5j) and found that LcCYP709B2–5 may be under stronger selection than LcCYP709B2–6 (Figure 5k,l).
To test whether the polyploidization-retained LcCYP709B2 gene has the potential to help L. chinensis degrade the commonly used herbicides metamifop and cyhalofop-butyl, we performed homology-based 3D reconstruction and molecular docking on this protein. Metamifop could bind to the binding pocket of LcCYP709B2–5 with a binding energy of −10.51 kcal/mol, forming strong hydrogen bonds with TYR-338 (bond length: 3.1 Å), PHE-340 (bond length: 2.4 Å) and GLN-341 (bond length: 1.8 Å) in this pocket (Figure 5m). Cyhalofop acid could also bind to the binding pocket of LcCYP709B2–5, with a binding energy of −7.14 kcal/mol, forming strong hydrogen bonds with ASN-78 (bond length: 3.0 Å), TYR-338 (bond lengths: 3.2 Å and 2.2 Å), PHE-340 (bond lengths: 2.6 Å and 2.3 Å) and GLN-341 (bond length: 1.9 Å) in the pocket (Figure 5n). These results indicate that, in addition to LcCYP76C4, LcCYP76C1 and LcABCC8, the polyploidization-retained LcCYP709B2 is also likely to be involved in herbicide degradation and in the aggressive spread of L. chinensis in China. Taken together, these results suggest that the polyploidization-mediated amplification and retention of herbicide resistance genes in L. chinensis may be the genomic basis for its rapid adaptation to field environments under herbicide stress (Figure 6).

Discussion
Polyploid organisms or populations are often considered more resilient to extreme environments because of their increased genetic variation and the buffering effect of their duplicated genes (Doyle and Coate, 2019; Van de Peer et al., 2009, 2017), and duplicated genes resulting from polyploidization appear to have been key to crop domestication and the evolution of stress resistance (Renny-Byfield and Wendel, 2014). Polyploids can arise via autopolyploidy, through genome duplication within a single diploid species, or via allopolyploidy, through hybridization and genome duplication involving two or more distinct species (Alger and Edger, 2020). As they return to a more diploid-like state, polyploid genomes undergo gene loss and chromosome number reduction (Lysak et al., 2006; Wendel, 2015) and sometimes develop a dominant subgenome (Schnable et al., 2011). Subgenome dominance is generally absent in autopolyploids but present in allopolyploids (Garsmeur et al., 2014; Zhao et al., 2017). However, recent studies found that some allopolyploid plants also do not display subgenome dominance (Sun et al., 2017; VanBuren et al., 2020). To further determine the polyploid origin of L. chinensis, we split the L. panicea genome into k-mers and mapped these k-mers to the two subgenomes of L. chinensis. The mapping coverage further supported the lack of subgenome dominance in L. chinensis (Figure S18). All extant angiosperms have at least one ancient WGD in their ancestry, and some lineages have experienced several additional rounds of genome doubling or tripling over time. However, the establishment or long-term survival of many of these WGDs is not random but coincides with major periods of global climatic/geological change and/or mass extinction (Cai et al., 2019; Koenen et al., 2021; Novikova et al., 2018; Van de Peer et al., 2017; Wu et al., 2020).
All grasses (Poaceae) have undergone an ancient WGD event (ρ), after which chromosome numbers in individual species evolved from the seven ancestral protochromosomes (Murat et al., 2017). We found only one WGD event (corresponding to the ancient ρ) in L. panicea, whereas L. chinensis has experienced a recent WGD in addition to the ρ event. Moreover, the divergence time between L. panicea and L. chinensis was very close to the time of the recent WGD in L. chinensis (Figure 3a), suggesting that the tetraploidization of L. chinensis likely accompanied its separation from L. panicea. In addition, the two subgenomes diverged from L. panicea almost simultaneously, which differs from the pattern reported for allopolyploids in other studies (Ye et al., 2020). Taken together, our results suggest that L. chinensis is most likely an autopolyploid.
Chromosomal rearrangements such as inversions and translocations have long been thought to play critical roles in adaptation and speciation (Dvorak et al., 2018). For example, a geographically widespread chromosomal inversion polymorphism in monkeyflower could contribute to local adaptation, life-history shifts and multiple reproductive isolating barriers (Lowry and Willis, 2010), and an inversion might be responsible for the differences in adaptability and plant architecture between golden buckwheat and Tartary buckwheat (He et al., 2022). In this study, we identified a number of chromosomal rearrangements between L. panicea and L. chinensis, some of which were shared by the two subgenomes of L. chinensis, indicating that they may have existed before tetraploidization. Furthermore, the numbers of genomic SVs specific to each of the two subgenomes did not differ significantly, suggesting that there may be no subgenome bias in structural variation in Leptochloa weeds. This contrasts with allotetraploid cotton, in which more structural variation was found in the At subgenome than in the Dt subgenome (Yang et al., 2019). Among the shared SVs, two large inversions were identified on chromosomes 2 (4.62 Mb) and 9 (4.90 Mb) of L. panicea relative to both subgenomes of L. chinensis, containing 615 and 822 genes, respectively (Figure 2c). Genes near the breakpoint regions on chromosome 9 were mainly enriched in GO terms related to plant defence, including ‘defense response to fungus’ and ‘defense response to other organism’ (Figure 2d), which may contribute to differences in the natural tolerance of the two species to biotic stresses.
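The text does not name the statistical tests behind the subgenome SV comparison or the GO enrichment. A minimal sketch of two common choices is shown below, assuming an exact two-sided binomial test of subgenome-specific SV counts against an equal 50:50 split, and a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) for GO-term enrichment near breakpoints. The counts in the usage lines are hypothetical.

```python
from math import comb


def binom_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test for k successes in n trials.

    Sums the probabilities of every outcome that is no more likely than
    the observed one. Suitable for modest n (floats overflow for huge n).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def hypergeom_enrichment(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) when n genes are drawn from N genes, K of which
    carry a given GO term (e.g. genes near inversion breakpoints)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    # 118 of 240 subgenome-specific SVs fall in subgenome A.
    print(binom_two_sided(118, 240))
    # 8 of 60 breakpoint genes carry a term annotated to 150 of 30,000 genes.
    print(hypergeom_enrichment(8, 60, 150, 30000))
```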
Herbicide resistance has now been reported in at least 69 countries, with 251 weed species resistant to 162 different herbicides, covering 23 of the 26 known herbicide sites of action. In total, there are at least 469 separate cases of herbicide resistance, and the problem continues to grow (http://weedscience.org). Herbicide resistance can be divided into target-site resistance (TSR), conferred by mutations in the target genes that herbicides directly inhibit, and nontarget-site resistance (NTSR), conferred by one or more genes involved in processes that protect the plant from herbicide toxicity, such as herbicide uptake, translocation and metabolism (Kreiner et al., 2018; Powles and Yu, 2010; Sharma et al., 2021). In TSR, besides mutations that prevent herbicide-active molecules from acting on their targets (Achary et al., 2020; Murphy and Tranel, 2019; Zhang et al., 2022a), amplification of herbicide target genes can also confer resistance; for example, amplification of the glyphosate target gene EPSPS in Amaranthus palmeri enables rapid glyphosate resistance through genome plasticity and adaptive evolution (Gaines et al., 2010). We found that polyploidization in L. chinensis mediated the amplification of at least five herbicide target genes and that this amplification was highly conserved, with no gene loss (Figure 4e). In addition, many abiotic stress resistance gene families, especially those associated with herbicide nontarget-site resistance, were also expanded following polyploidization (Figure 4f). Similar findings have been reported in barnyardgrass (Ye et al., 2020), another notorious weed in rice paddy fields. Furthermore, we found a substantial loss of NB-ARC family genes involved in disease resistance in both tetraploid L. chinensis and barnyardgrass.
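Whether an amplified gene kept both homeologous copies can be tabulated from synteny-based copy counts. The minimal sketch below assumes that, for each L. panicea gene, the number of syntenic copies in each L. chinensis subgenome is already known (for example from MCScan output); the gene identifiers and the three-way labels are hypothetical.

```python
from collections import Counter


def classify_retention(copies: dict) -> dict:
    """Label each diploid gene by the fate of its homeologs after polyploidization.

    `copies` maps a gene ID to (copies in subgenome A, copies in subgenome B).
    """
    labels = {}
    for gene, (in_a, in_b) in copies.items():
        if in_a > 0 and in_b > 0:
            labels[gene] = "retained in both"
        elif in_a > 0 or in_b > 0:
            labels[gene] = "one homeolog lost"
        else:
            labels[gene] = "lost in both"
    return labels


if __name__ == "__main__":
    # Hypothetical copy counts for illustration only.
    example = {"target_1": (1, 1), "target_2": (2, 1),
               "family_x_1": (1, 0), "family_x_2": (0, 0)}
    labels = classify_retention(example)
    print(labels)
    print(Counter(labels.values()))
```

In these terms, amplification "with no gene loss" means every herbicide target gene falls into the "retained in both" class.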
Interestingly, several studies have shown that resistance genes can impose growth costs (Bergelson and Purrington, 1996; Brown and Rant, 2013; Van der Plank, 1963). Because both growth and resistance are important for plant development, the loss of NB-ARC genes in weeds may indicate that natural selection in weeds favours reduced disease resistance in exchange for greater growth and reproductive potential. Herbicide nontarget-site resistance mechanisms include enhanced herbicide metabolism and the transport of herbicides to the extracellular compartment (Powles and Yu, 2010). Many cytochrome P450 (CYP450) genes (Dimaano et al., 2020; Han et al., 2021; Höfer et al., 2014; Iwakami et al., 2014), glutathione S-transferase (GST) genes (Cummins et al., 1999) and aldo-keto reductase (AKR) genes (Pan et al., 2019) have been shown to enhance herbicide metabolism in plants. Recently, an ABC transporter was found to move herbicide-active molecules out of the cytoplasm and thereby protect plants from herbicide damage (Pan et al., 2021). In this study, we found that three known herbicide-resistance genes, CYP76C1, CYP76C4 (Höfer et al., 2014) and ABCC8 (Pan et al., 2021), were amplified and retained during polyploidization, and homology modelling and molecular docking supported the potential of their Leptochloa homologues to bind herbicides. Most interestingly, both copies of LcCYP76C4 and LcCYP709B2 were under herbicide selection during the malignant spread of L. chinensis in China (Figure 5c–f), suggesting that these genes may confer herbicide resistance and help L. chinensis infesting rice fields escape herbicide damage. Thus, in addition to herbicide target-site gene amplification, polyploidization may also give L. chinensis greater herbicide adaptability by mediating the amplification of nontarget-site herbicide resistance genes.
In this study, we investigated the relationship between polyploidization and herbicide adaptation. We report a high-quality chromosome-level genome of L. panicea, which provides crucial information for understanding the effects of polyploidization on environmental adaptation, especially herbicide adaptation. In summary, L. chinensis diverged from L. panicea about 11.6 million years ago and underwent a polyploidization event that drove the amplification of herbicide resistance genes such as CYP76C1, CYP76C4, ABCC8 and CYP709B2, providing enhanced buffering capacity under herbicide stress and greater herbicide adaptability in L. chinensis (Figure 6). This study not only improves our understanding of the origin of Leptochloa weeds but also reveals the polyploidization-driven extreme herbicide adaptability of L. chinensis as a malignant weed in paddy fields, providing new ideas and a theoretical reference for the control of Leptochloa and other polyploid weeds.
Materials and methods
Plant material
The L. panicea plant used in this study was grown in a greenhouse at the Hunan Academy of Agricultural Sciences. Root tips were collected for flow cytometry analysis to estimate the genome size of the plant.
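Flow-cytometric genome size estimates are usually derived by comparing the G0/G1 peak of the sample with that of an internal reference standard of known DNA content. The standard and its DNA content are not given here, so the values in the minimal sketch below are hypothetical; the conversion of about 978 Mb per picogram of DNA is the commonly used factor.

```python
# Commonly used conversion factor: 1 pg of DNA corresponds to about 978 Mb.
MB_PER_PG = 978.0


def genome_size_mb(sample_peak: float, standard_peak: float, standard_2c_pg: float) -> float:
    """Estimate the 1C genome size (Mb) of a sample.

    sample_peak / standard_peak: mean fluorescence of the G0/G1 (2C) peaks.
    standard_2c_pg: known 2C DNA content of the reference standard in pg.
    """
    sample_2c_pg = sample_peak / standard_peak * standard_2c_pg
    return sample_2c_pg / 2 * MB_PER_PG


if __name__ == "__main__":
    # Hypothetical peak positions and standard; not data from this study.
    print(round(genome_size_mb(sample_peak=52.0, standard_peak=100.0, standard_2c_pg=2.0), 1))  # → 508.6
```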
DNA/RNA extraction, library construction and sequencing
DNA for genome sequencing was extracted from young leaves of L. panicea using the cetyltrimethyl ammonium bromide (CTAB) method. DNA libraries for single-molecule real-time (SMRT) PacBio genome sequencing were constructed following the standard protocols of Pacific Biosciences and sequenced on the PacBio Sequel II platform in circular consensus sequencing (CCS) mode. The same DNA was used to construct an Illumina paired-end library with an insert size of ~400 bp using the NEBNext Ultra DNA Library Prep Kit according to the manufacturer's instructions, and the library was sequenced on the Illumina NovaSeq 6000 platform. A Hi-C library was constructed from young, fresh leaves of L. panicea following the Proximo Hi-C Plant protocol (Phase Genomics) and sequenced on the NovaSeq 6000 platform.
To assist gene prediction, RNA sequencing (RNA-Seq) was performed on root, stem, leaf and seed tissues. Total RNA was extracted from each tissue using TRIzol reagent (Invitrogen, Carlsbad, California, USA) according to the manufacturer's protocol. Strand-specific RNA-Seq libraries were constructed using the Illumina TruSeq RNA Sample Prep Kit and sequenced on the NovaSeq 6000 platform. In addition, equal amounts of RNA from all samples were pooled, and the pooled RNA was used to construct one PacBio Iso-Seq library. Briefly, cDNA was synthesized using the Clontech SMARTer® cDNA Synthesis Kit and purified using AMPure PB beads. The Iso-Seq SMRTbell library was then constructed from the purified cDNA with the SMRTbell Express Template Prep Kit 2.0 and sequenced on the PacBio Sequel II platform.
Estimation of genome size and heterozygosity
The genome size of L. panicea was estimated from the k-mer frequency distribution of Illumina short reads, with k-mers counted using Jellyfish (Marçais and Kingsford, 2011) (version 1.1.10). GenomeScope (Vurture et al., 2017) was used to estimate the heterozygosity of the L. panicea genome.
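The classical k-mer estimate behind this step divides the total number of k-mers by the depth of the main k-mer peak. GenomeScope fits a fuller model that also accounts for heterozygosity and sequencing errors, so the minimal sketch below only illustrates the basic relation; the error cutoff `min_depth` and the toy histogram are assumptions. In a heterozygous genome the tallest peak can be the heterozygous one at half depth, which this simple version does not handle.

```python
def estimate_genome_size(histogram: dict, min_depth: int = 5) -> tuple:
    """Estimate genome size from a k-mer depth histogram.

    `histogram` maps depth -> number of distinct k-mers seen at that depth
    (the two columns written by `jellyfish histo`). Depths below `min_depth`
    are treated as sequencing errors and excluded.
    Returns (peak depth, estimated genome size in bp).
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return peak_depth, total_kmers / peak_depth


if __name__ == "__main__":
    toy = {1: 1000, 10: 50, 20: 100, 30: 50}  # hypothetical histogram
    print(estimate_genome_size(toy))  # → (20, 200.0)
```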
Genome assembly
Hifiasm (Cheng et al., 2021) was used to assemble HiFi reads into contigs with default parameters. Redundant contigs/sequences in the assembly were removed using Purge Haplotigs with parameters ‘-l 0 -m 40 -h 175’ (Roach et al., 2018). Hi-C data were then used to scaffold the final assembled contigs into pseudomolecules using the Juicer pipeline (Durand et al., 2016). Potential misassemblies were manually checked and corrected based on Hi-C map, genome synteny and read mapping information.
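The three Purge Haplotigs cutoffs can be read as read-depth boundaries: `-l` (low cutoff), `-m` (midpoint between the haploid and diploid peaks) and `-h` (high cutoff). The minimal sketch below is a simplified reimplementation of how contigs are flagged from their depth profile; the 80% thresholds follow what we understand to be the tool's defaults, and the exact handling of boundary depths is our assumption. In the real pipeline, flagged suspects are further checked by alignment before being purged.

```python
def classify_contig(depth_hist: dict, low: int = 0, mid: int = 40, high: int = 175,
                    junk_pct: float = 80.0, suspect_pct: float = 80.0) -> str:
    """Flag a contig from its read-depth profile (depth -> number of bases).

    Defaults mirror the '-l 0 -m 40 -h 175' cutoffs used for L. panicea.
    """
    total = sum(depth_hist.values())
    if total == 0:
        raise ValueError("contig has no coverage information")
    # Bases at abnormally low or high depth (e.g. contaminants, collapsed repeats).
    off_range = sum(n for d, n in depth_hist.items() if d < low or d > high)
    # Bases at diploid-level depth: both haplotypes collapsed into this contig.
    diploid = sum(n for d, n in depth_hist.items() if mid <= d <= high)
    if 100 * off_range / total >= junk_pct:
        return "junk"
    if 100 * diploid / total <= suspect_pct:
        return "suspect haplotig"
    return "keep"


if __name__ == "__main__":
    print(classify_contig({100: 1000}))  # → keep
    print(classify_contig({20: 1000}))   # → suspect haplotig
    print(classify_contig({500: 1000}))  # → junk
```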
Genome assembly quality evaluation
BUSCO (Simão et al., 2015) was used to assess the completeness of genome assembly and predicted protein-coding genes. Illumina short reads were mapped to the genome assembly using BWA-MEM (version 0.7.17) (Li and Durbin, 2009), and the mapping rate was calculated. LTR Assembly Index (LAI) (Ou et al., 2018) was also used to evaluate the assembly quality.
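LAI rewards assemblies in which more LTR retrotransposon sequence is recovered as intact elements. The minimal sketch below follows our understanding of the definition in Ou et al. (2018): raw LAI is the percentage of LTR-RT sequence contained in intact elements, adjusted by 2.8138 × (94 − genome-wide LTR identity); the category thresholds (draft below 10, reference from 10 to below 20, gold from 20) are taken from the same description. The input values are hypothetical, not this study's results.

```python
def lai(intact_ltr_bp: float, total_ltr_bp: float, ltr_identity_pct: float) -> float:
    """LTR Assembly Index.

    intact_ltr_bp: total length of intact LTR retrotransposons.
    total_ltr_bp: total length of all LTR retrotransposon sequence.
    ltr_identity_pct: genome-wide LTR sequence identity (%), used to correct
    for differences in LTR-RT age between genomes.
    """
    raw_lai = intact_ltr_bp / total_ltr_bp * 100
    return raw_lai + 2.8138 * (94 - ltr_identity_pct)


def lai_category(score: float) -> str:
    """Map an LAI score onto the draft / reference / gold classes."""
    if score >= 20:
        return "gold"
    if score >= 10:
        return "reference"
    return "draft"


if __name__ == "__main__":
    # Hypothetical: 15 Mb intact out of 60 Mb LTR-RT sequence, 94% identity.
    score = lai(15e6, 60e6, 94.0)
    print(score, lai_category(score))  # → 25.0 gold
```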
Repetitive sequence analysis
A de novo repeat library of the L. panicea genome was constructed using RepeatModeler (version 2.0.1) (http://www.repeatmasker.org/RepeatModeler). The consensus TE sequences generated by RepeatModeler were combined with RepBase and used as the repeat library in RepeatMasker (version 4.1.0) (http://www.repeatmasker.org) to identify repetitive elements. A preliminary list of candidate LTR retrotransposons (LTR-RTs) was generated using LTR_FINDER (Xu and Wang, 2007) and LTRharvest (Ellinghaus et al., 2008). High-quality intact LTR-RTs were identified, and their insertion ages calculated, using LTR_retriever (Ou and Jiang, 2018) with default parameters. Tandem repeats were detected using Tandem Repeats Finder (Benson, 1999), and the locations of centromeres and telomeres were inferred from its output.
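LTR_retriever dates each intact LTR-RT from the divergence between its two LTRs, which are identical at insertion. A minimal sketch assuming the Jukes-Cantor correction and T = K / (2μ) is shown below. The source states only that default parameters were used; to our knowledge LTR_retriever's default rate is 1.3 × 10⁻⁸ substitutions per site per year (a rice-derived value), which the example uses. The LTR divergence in the example is hypothetical.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance K from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_age_years(p: float, mu: float) -> float:
    """Insertion time T = K / (2 * mu), with mu in substitutions per site per year."""
    return jukes_cantor_distance(p) / (2 * mu)


if __name__ == "__main__":
    # Hypothetical element whose 5' and 3' LTRs differ at 1% of aligned sites.
    print(round(insertion_age_years(0.01, mu=1.3e-8)))
```

With these assumptions, a 1% LTR divergence corresponds to an insertion roughly 0.39 million years ago.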
Genome annotation
Protein-coding genes were predicted from the L. panicea genome assembly using an integrated approach. RNA-Seq reads from different tissues were aligned to the genome assembly using HISAT2 (v2.0.4) (Kim et al., 2019) and then assembled into transcripts using Cufflinks (v2.2.1) (Trapnell et al., 2012). Open reading frames (ORFs) in the Iso-Seq transcripts were predicted using PASA (v2.0.1) (Haas et al., 2003), and potential full-length cDNA sequences were then extracted and used as the training dataset for the ab initio gene predictors AUGUSTUS (v3.03) (Stanke et al., 2006), SNAP (Korf, 2004), GlimmerHMM (Majoros et al., 2004) and GeneMark-ET (Lomsadze et al., 2014). Protein sequences from Leptochloa chinensis, Oryza sativa ssp. indica, Setaria italica and Oropetium thomaeum were aligned to the L. panicea assembly for homology-based gene prediction using GeMoMa (v1.4.2) (Keilwagen et al., 2016). Finally, ab initio, homology-based and transcript-based predictions were integrated using EVidenceModeler (Haas et al., 2008) to generate a consensus model for each gene. Functional annotation of the predicted genes was performed by comparing their protein sequences against the GenBank non-redundant (nr), InterPro, KEGG and eggNOG databases.
Genome comparison
Genome comparison between L. panicea and the two subgenomes of L. chinensis (Wang et al., 2022; available at the Genome Warehouse of the BIG Data Center under accession number GWHBJVB00000000) was performed via whole-genome alignment using the MUMmer package (Kurtz et al., 2004). MCScan (Wang et al., 2012) (Python version) was used for pairwise synteny region search.
Phylogenetic analysis and divergence time estimation
To investigate the evolutionary history of the genus Leptochloa, protein sequences from the two subgenomes of Leptochloa chinensis, eight other grasses (Brachypodium distachyon, Oryza sativa, Setaria viridis, Setaria italica, Sorghum bicolor, Zea mays, Oropetium thomaeum and Leptochloa panicea) and one dicot, Arabidopsis thaliana, were used to construct gene families with OrthoFinder (Emms and Kelly, 2015) (version 2.3.12) using default parameters. The protein sequences of each of the 549 single-copy orthologous groups from these 10 species were aligned with ClustalW2 (Larkin et al., 2007), and the alignments were concatenated for species tree construction. A maximum-likelihood tree was constructed using FastTree (Price et al., 2010) with 1000 bootstrap replicates. Divergence times were estimated using MCMCTree (Yang and Rannala, 2006) under the independent-rates model, with branch lengths estimated by BASEML in the PAML package (Yang, 2007).
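Concatenating single-copy orthologs into one supermatrix is a simple bookkeeping step between alignment and tree building. The minimal sketch below assumes each orthogroup has already been aligned (for example with ClustalW2) and is stored as a species-to-sequence mapping; the species labels are hypothetical placeholders.

```python
def concatenate_alignments(alignments: list) -> dict:
    """Join per-orthogroup alignments (species -> aligned sequence) into one supermatrix.

    Every orthogroup must contain every species once, as expected for
    single-copy orthologs, and all sequences within an orthogroup must be
    aligned to the same length.
    """
    if not alignments:
        return {}
    species = set(alignments[0])
    for aln in alignments:
        if set(aln) != species:
            raise ValueError("every single-copy orthogroup must contain every species")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences within an alignment must have equal length")
    return {sp: "".join(aln[sp] for aln in alignments) for sp in sorted(species)}


if __name__ == "__main__":
    og1 = {"Lpan": "MK-L", "Lchi_A": "MKAL", "Lchi_B": "MKAL"}
    og2 = {"Lpan": "QQ", "Lchi_A": "Q-", "Lchi_B": "QR"}
    print(concatenate_alignments([og1, og2]))
    # → {'Lchi_A': 'MKALQ-', 'Lchi_B': 'MKALQR', 'Lpan': 'MK-LQQ'}
```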
To detect whole-genome duplication events in the genus Leptochloa, we calculated synonymous substitution rates (Ks) for syntenic homologous gene pairs using WGDI (Sun et al., 2022). A nucleotide substitution rate of 6.5 × 10⁻⁹ mutations per bp per generation was used as the molecular clock (Molina et al., 2011).
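Under a strict molecular clock, a Ks value converts to time as T = Ks / (2r), where r is the substitution rate given above (6.5 × 10⁻⁹), here treating one generation as one year, which is our assumption. The minimal sketch below also locates the Ks peak of a set of syntenic gene pairs with a simple histogram; the bin width and the toy Ks values are hypothetical. Note that the 11.6 million-year divergence reported in this study was estimated with MCMCTree, not with this calculation.

```python
import math
from collections import Counter

CLOCK_RATE = 6.5e-9  # substitutions per site per year (one generation = one year assumed)


def ks_to_mya(ks: float, rate: float = CLOCK_RATE) -> float:
    """Convert a Ks value to time in million years: T = Ks / (2 * rate)."""
    return ks / (2 * rate) / 1e6


def ks_peak(ks_values: list, bin_width: float = 0.01) -> float:
    """Return the centre of the most populated Ks histogram bin."""
    bins = Counter(math.floor(ks / bin_width) for ks in ks_values)
    best_bin, _ = bins.most_common(1)[0]
    return (best_bin + 0.5) * bin_width


if __name__ == "__main__":
    toy_ks = [0.153, 0.155, 0.157, 0.42, 0.81]  # hypothetical syntenic pairs
    peak = ks_peak(toy_ks)
    print(round(peak, 3), round(ks_to_mya(peak), 1))  # → 0.155 11.9
```

For reference, under this clock a Ks of about 0.151 corresponds to roughly 11.6 million years.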
Gene family expansion and contraction analysis
Gene-family expansion and contraction in Leptochloa genomes were determined using CAFE (version 4.2.1) (De Bie et al., 2006). The gene family size for each species used in CAFE was calculated with OrthoFinder (Emms and Kelly, 2015) (version 2.3.12). The gene birth and death rate was estimated with orthologous groups that were conserved in all species. To better understand the potential functional category of each gene family, we used KinFin (Laetsch and Blaxter, 2017) (v1.0), along with gene functional annotations assigned by InterProScan and gene ontology, to derive rich annotations for gene families.
RNA sequencing data analysis
RNA-Seq reads were processed with fastp (Chen et al., 2018) (version 0.21.0) to remove adapters and trim low-quality bases. Clean reads were mapped to the L. panicea and L. chinensis genomes using HISAT2 (Kim et al., 2019) (version 2.1.0) with default parameters, and StringTie (Pertea et al., 2015) was then used to calculate the TPM (transcripts per million) values of genes.
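TPM normalizes for both transcript length and sequencing depth, so that values sum to one million in every sample. StringTie derives TPM from its own coverage estimates; the minimal sketch below shows the equivalent count-based normalization, with hypothetical gene names, counts and lengths.

```python
def tpm(counts: dict, lengths_bp: dict) -> dict:
    """Transcripts per million from read counts and transcript lengths.

    1. Divide each count by transcript length in kb (reads per kilobase).
    2. Scale so the per-kilobase rates of all genes sum to one million.
    """
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1_000) for gene in counts}
    per_million = sum(rpk.values()) / 1_000_000
    if per_million == 0:
        raise ValueError("no reads assigned to any gene")
    return {gene: rate / per_million for gene, rate in rpk.items()}


if __name__ == "__main__":
    counts = {"geneA": 500, "geneB": 1000, "geneC": 0}
    lengths = {"geneA": 1000, "geneB": 4000, "geneC": 2000}
    print(tpm(counts, lengths))
```

Although geneB has twice as many reads as geneA, it is four times longer, so its TPM is half that of geneA.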
Diversity analysis
The nucleotide diversity (π) and population differentiation (FST) values were calculated using VCFtools (version 0.1.13) (Danecek et al., 2011) based on the high-confidence SNPs identified from 89 L. chinensis accessions reported in our previous study (Wang et al., 2022). The π value for each SNP was calculated, and the nucleotide diversity level was measured using a 100-kb window with a step size of 10 kb for each L. chinensis population.
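The windowed diversity statistic can be sketched as follows. This is a simplified version of what we understand VCFtools' `--window-pi` and `--window-pi-step` options to compute: per-site π is the proportion of pairwise differences among the sampled chromosomes, and each window's value is the sum of per-site π divided by the window length. Edge handling at chromosome ends and the toy input are our assumptions.

```python
def site_pi(alt_count: int, n_chrom: int) -> float:
    """Per-site nucleotide diversity: the proportion of differing pairs among
    n_chrom sampled chromosomes carrying alt_count alternative alleles."""
    return 2 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def windowed_pi(sites: list, chrom_len: int, window: int = 100_000, step: int = 10_000) -> list:
    """Return (start, end, pi) for sliding windows along one chromosome.

    `sites` holds (1-based position, alt_count, n_chrom) tuples for SNPs;
    invariant positions contribute zero but still count toward window length.
    """
    results = []
    start = 1
    while start <= chrom_len:
        end = min(start + window - 1, chrom_len)
        total = sum(site_pi(a, n) for pos, a, n in sites if start <= pos <= end)
        results.append((start, end, total / (end - start + 1)))
        if end == chrom_len:
            break
        start += step
    return results


if __name__ == "__main__":
    # Hypothetical SNPs: (position, alternative allele count, chromosomes sampled).
    snps = [(15_000, 10, 178), (62_500, 89, 178), (140_000, 3, 178)]
    for row in windowed_pi(snps, chrom_len=200_000):
        print(row)
```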
Homology modelling
The NCBI BLAST server was used to select the three PDB structures with the highest sequence similarity to each target protein as templates. The target and template amino acid sequences were aligned, homology models were built using Modeller (Webb and Sali, 2016), and, for cytochrome P450 proteins, the heme group was subsequently added.
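Template selection from a BLAST search against PDB can be scripted from the standard 12-column tabular output (`-outfmt 6`). The minimal sketch below ranks subjects by percent identity as a simple stand-in for "highest similarity"; in practice, query coverage and structure resolution are often weighed as well. The subject IDs are hypothetical placeholders.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_templates(blast_tab: str, n: int = 3) -> list:
    """Return the n subjects with the highest percent identity to the query.

    If a subject has several HSPs, its best identity is kept.
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        record = dict(zip(FIELDS, row))
        identity = float(record["pident"])
        if identity > best.get(record["sseqid"], -1.0):
            best[record["sseqid"]] = identity
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    hits = (
        "query\ttmplA_A\t41.2\t480\t270\t5\t10\t489\t3\t482\t1e-120\t420\n"
        "query\ttmplB_A\t38.9\t470\t280\t6\t12\t481\t5\t474\t1e-110\t390\n"
        "query\ttmplC_A\t45.0\t300\t160\t2\t50\t349\t1\t300\t1e-80\t300\n"
        "query\ttmplD_A\t30.1\t460\t310\t9\t15\t474\t8\t467\t1e-70\t260\n"
    )
    print(top_templates(hits))  # → [('tmplC_A', 45.0), ('tmplA_A', 41.2), ('tmplB_A', 38.9)]
```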
Molecular docking
Molecular docking was performed using AutoDock4 (Morris et al., 2009) to investigate the binding modes between the proteins and cyhalofop acid or metamifop. The 2D structures of the ligands were drawn in ChemBioDraw Ultra and optimized with the MM2 method in ChemBio3D Ultra to obtain their 3D structures. The AutoDockTools 1.5.7 package (Morris et al., 2009) was used to generate the docking input files. Default docking parameters were used unless otherwise stated. The best-scoring pose according to the AutoDock4 (AD4) docking score was selected and visually analysed using PyMOL 1.7.6 (www.pymol.org).
Author contributions
L.B., L.Wang. and K.C. designed and managed the project. H.Y., D.L., Y.P. and Z.Z. contributed to sample collection. J.Z. and L.Wu. performed DNA/RNA extraction. K.C. and T.L. performed genome assembly, annotation and analysis. K.C. and H.Y. contributed to RNA-seq and the associated data analysis. K.C. and H.Y. wrote the manuscript. K.C., L.Wang. and L.B. revised the manuscript.
Acknowledgements
This research was supported by grants from the National Natural Science Foundation of China (No. 32272564), the National Key R&D Program of China (No. 2021YFD1700101), the Science and Technology Innovation Program of Hunan Province (Nos. 2023JJ10025 and 2022RC1017), the Training Program for Excellent Young Innovators of Changsha (kq2106079) and the China Agriculture Research System of MOF and MARA (CARS-16-E19).
Conflict of interest
The authors declare no competing interests.
Open Research
Data availability statement
The genomic sequencing reads, RNA-seq data and genome assembly have been deposited into the BIG data centre (https://bigd.big.ac.cn/) under accession number PRJCA010143.