Volume 23, Issue 5 pp. 1507-1520
Research Article
Open Access

The chromosome-scale assembly of the Salvia plebeia genome provides insight into the biosynthesis and regulation of rosmarinic acid

Yiqun Dai

State Key Laboratory of Natural Medicines, Department of Resources Science of Traditional Chinese Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing, China

School of Pharmacy, Bengbu Medical University, Bengbu, China

These authors contributed equally to this study.

Mengqian He

State Key Laboratory of Natural Medicines, Department of Resources Science of Traditional Chinese Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing, China

These authors contributed equally to this study.

Hui Liu

Yangzhou Center for Food and Drug Control, Yangzhou, China

These authors contributed equally to this study.

Huihui Zeng

State Key Laboratory of Natural Medicines, Department of Resources Science of Traditional Chinese Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing, China

Kaixuan Wang

State Key Laboratory of Natural Medicines, Department of Resources Science of Traditional Chinese Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing, China

Rui Wang

Yunnan Yunke Characteristic Plant Extraction Laboratory Co., Ltd., Kunming, China

Xiaojing Ma

State Key Laboratory for Quality Ensurance and Sustainable Use of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China

Yan Zhu

State Key Laboratory of Natural Medicines, Department of Resources Science of Traditional Chinese Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing, China

Guoyong Xie

State Key Laboratory of Natural Medicines, Department of Resources Science of Traditional Chinese Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing, China

Corresponding Author

Yucheng Zhao

State Key Laboratory of Natural Medicines, Department of Resources Science of Traditional Chinese Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing, China

Institute for Safflower Industry Research, Key Laboratory of Xinjiang Phytomedicine Resource and Utilization (Ministry of Education), School of Pharmacy, Shihezi University, Shihezi, China

Correspondence (Tel: 86-25-86185103; fax 86-25-85301528; email [email protected] (Y.Z.); Tel: 86-25-86185130; fax 86-25-85301528; email [email protected](M.Q.))

Corresponding Author

Minjian Qin

State Key Laboratory of Natural Medicines, Department of Resources Science of Traditional Chinese Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing, China

First published: 13 February 2025

Summary

Salvia plebeia is an important traditional Chinese medicinal herb, with flavonoids and phenolic acids as its primary bioactive components. However, the absence of a reference genome hinders our understanding of the genetic basis underlying the synthesis of these components. Here, we present a high-quality, chromosome-scale genome assembly of S. plebeia, spanning 1.22 Gb, with a contig N50 of 91.72 Mb and 36 861 annotated protein-coding genes. Leveraging the genome data, we identified four catalytic enzymes in S. plebeia that are involved in rosmarinic acid biosynthesis: one rosmarinic acid synthase (RAS) and three cytochrome P450 monooxygenases (CYP450s). We demonstrate that SpRAS catalyses the conjugation of various acyl donors and acceptors, resulting in the formation of rosmarinic acid and its precursor compounds. SpCYP98A75, SpCYP98A77 and SpCYP98A78 catalyse the formation of rosmarinic acid from its precursors by hydroxylation at either the C-3 or the C-3′ position. Notably, SpCYP98A75 exhibited a stronger hydroxylation capacity at the C-3′ position, whereas SpCYP98A77 and SpCYP98A78 demonstrated greater hydroxylation efficiency at the C-3 position. Furthermore, SpCYP98A75 hydroxylated both the C-3 and C-3′ positions simultaneously, promoting the conversion of 4-coumaroyl-4′-hydroxyphenyllactic acid to rosmarinic acid. Next, using a hairy root genetic transformation system for S. plebeia, we identified a basic helix–loop–helix transcription factor, SpbHLH54, which positively regulates the biosynthesis of rosmarinic acid and homoplantaginin in S. plebeia. These findings provide a valuable genomic resource for elucidating the mechanisms of rosmarinic acid biosynthesis and its regulation, and improve the understanding of evolutionary patterns within the Lamiaceae family.

Introduction

Salvia L. is the largest genus within the Lamiaceae family and ranks among the largest genera in the plant kingdom, encompassing over 1000 species (Abd Rashed and Rathi, 2021; Drew et al., 2017). Members of this genus are distributed worldwide and possess considerable medicinal, culinary and ornamental value. Notable species include Salvia miltiorrhiza, Salvia officinalis and Salvia rosmarinus. Salvia plebeia R. Brown, an annual or biennial herb belonging to this genus, is widely distributed across China, South Korea, Iran, India and Australia. In traditional Chinese medicine, S. plebeia has been widely applied to manage a range of inflammatory ailments, including nephritis, hepatitis, bronchitis, rheumatoid arthritis and common colds (Cui et al., 2020; Liang et al., 2020). Phytochemical studies have identified flavonoids, phenolic acids, terpenoids and volatile oils as the primary constituents of S. plebeia. These compounds endow S. plebeia with various pharmacological activities, including anticancer, immunomodulatory and anti-inflammatory effects (Choi et al., 2020; Zou et al., 2018). Despite the growing commercial demand for S. plebeia, the biosynthesis pathways and regulatory mechanisms underlying its active compounds remain poorly understood. Furthermore, the absence of genomic information for S. plebeia presents substantial challenges for understanding the molecular-level regulation of these compounds and limits efforts to enhance their content through molecular breeding techniques.

Rosmarinic acid, a specialized phenolic ester abundantly present in S. plebeia, has garnered considerable attention owing to its diverse therapeutic benefits, including antioxidant (Petersen and Simmonds, 2003), anticancer (Han et al., 2019) and anti-inflammatory (Chen et al., 2024) activities, as well as preventive effects against Alzheimer's disease (Habtemariam, 2018). The biosynthesis pathway of rosmarinic acid has been described as nonlinear, employing a diverging–converging mechanism that integrates two parallel metabolic routes originating from phenylalanine and tyrosine (Petersen, 1997). In the phenylalanine pathway, phenylalanine undergoes sequential enzymatic reactions catalysed by phenylalanine ammonia-lyase (PAL), cinnamate-4-hydroxylase (C4H) and 4-coumarate:CoA ligase (4CL) to produce 4-coumaroyl-CoA. Meanwhile, in the tyrosine pathway, tyrosine aminotransferase (TAT) catalyses the conversion of tyrosine to 4-hydroxyphenylpyruvate, which is subsequently reduced to 4-hydroxyphenyllactate by hydroxyphenylpyruvate reductase (HPPR). Finally, rosmarinic acid synthase (RAS) catalyses the combination of the acyl donor derived from the phenylalanine pathway with the acyl acceptor from the tyrosine pathway, resulting in the formation of rosmarinic acid or its precursor molecules. The rosmarinic acid precursors are subsequently hydroxylated by CYP98A subfamily enzymes to form rosmarinic acid (Levsh et al., 2019). These enzymes have facilitated the heterologous reconstruction of rosmarinic acid biosynthesis pathways in Escherichia coli and Saccharomyces cerevisiae (Babaei et al., 2020; Wang et al., 2023). However, the biosynthesis mechanism of rosmarinic acid in S. plebeia remains unclear.

Beyond the enzymes directly participating in the biosynthesis of rosmarinic acid, transcription factors (TFs) are vital in regulating the production of secondary metabolites. In plants, the synthesis and accumulation of these metabolites represent a highly intricate process influenced by numerous elements that control the expression of critical genes encoding enzymes within metabolic pathways. This regulation occurs through the binding of TFs to cis-acting elements in gene promoters, which can either facilitate or inhibit the biosynthesis of metabolites. Several TF families, including MYB, MYC, basic helix–loop–helix (bHLH) and ethylene responsive factor families, have been reported to influence the biosynthesis and accumulation of phenolic compounds, including salvianolic acids and their related derivatives (Huang et al., 2019; Liu et al., 2022; Tian et al., 2022; Wang et al., 2021). For instance, overexpression of the bZIP transcription factor SmbZIP1 can enhance the expression of biosynthetic enzyme genes for phenolic acids, such as cinnamate-4-hydroxylase (C4H1), thereby facilitating the biosynthesis of rosmarinic acid in S. miltiorrhiza (Deng et al., 2020). SmMYB1 can positively regulate the expression of the key enzyme genes SmTAT1, SmHPPR1, SmPAL1, SmC4H1, Sm4CL1, SmCYP98A14 and SmRAS1 in the phenolic acid synthesis pathway of S. miltiorrhiza, thereby promoting rosmarinic acid biosynthesis (Zhou et al., 2021). However, likely owing to the lack of genomic information for S. plebeia, no studies have yet investigated the regulatory mechanisms of TFs involved in rosmarinic acid biosynthesis in this species.

In this research, we present a high-resolution chromosome-level genome assembly of S. plebeia, constructed using high-coverage Oxford Nanopore Technologies (ONT), Illumina sequencing and Hi-C data. Through evolutionary and comparative genomic analyses, we explored the phylogenetic divergence of S. plebeia. Furthermore, we characterized the RAS and CYP98 genes, which encode the key enzymes involved in rosmarinic acid biosynthesis in S. plebeia. Moreover, we identified the bHLH transcription factor gene SpbHLH54, which was demonstrated to contribute to rosmarinic acid biosynthesis in S. plebeia. Our findings offer a crucial genomic basis for understanding the biochemical mechanisms underlying the biosynthesis, accumulation and regulation of rosmarinic acid. This study also offers insights into the evolutionary patterns of the Lamiaceae family and should facilitate future research on the genetic and metabolic pathways of S. plebeia.

Results

Genome assembly and gene annotation of S. plebeia

Chromosomal observations during mitotic metaphase in the root tips of S. plebeia revealed that the species is diploid, with a chromosome number of 2n = 2x = 16 (Figure S1). No variations in chromosome number or morphology were detected, indicating that both are relatively stable in S. plebeia. Flow cytometry analysis determined the genome size of S. plebeia to be 1.24 Gb (Table S1, Figure S2), which is larger than the genomes of S. miltiorrhiza (557 Mb) (Ma et al., 2021) and S. officinalis (480 Mb) (Li et al., 2022). A genome survey based on 19-mer frequencies derived from Illumina short reads estimated the genome size of S. plebeia to be 1.16 Gb and indicated a low degree of heterozygosity (~0.38%) (Table S2, Figure S3). To assemble the genome sequence of S. plebeia, data generated from Illumina, Oxford Nanopore Technologies (ONT) and Hi-C sequencing technologies were utilized.
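
A k-mer survey of this kind typically rests on a simple relation: genome size is roughly the total number of k-mers divided by the depth at the main peak of the k-mer histogram. The sketch below illustrates that relation only. The function name, the error cutoff and the toy histogram are hypothetical, and the survey software and settings actually used for S. plebeia are not stated in this section.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers at that depth.
    Depths below min_depth are treated as sequencing errors and ignored
    (the cutoff value is an illustrative assumption).
    """
    usable = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers: taken as the main (homozygous) peak.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences among the retained depths.
    total_kmers = sum(depth * count for depth, count in usable.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (made-up numbers, not data from the study).
toy_histogram = {1: 9000, 2: 1500, 18: 200, 19: 400, 20: 1000, 21: 400, 22: 200}
toy_size, toy_peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {toy_peak}, estimated size = {toy_size:.0f} bp")
```

Dedicated survey tools additionally model heterozygous and repeat peaks, which this sketch deliberately leaves out.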

ONT long-read sequencing produced 133.08 Gb of data, corresponding to ~114× coverage, with a read N50 length of 33.04 kb (Table S3). Following error correction, trimming and assembly, the refined ONT reads were assembled into 38 contigs with a cumulative size of 1.22 Gb and a contig N50 of 91.72 Mb. This assembly covered 98.12% of the predicted nuclear genome size (Table S4). Hi-C chromosome conformation capture sequencing produced 571 067 410 raw paired-end reads, of which 35.90% (205 008 017) mapped uniquely to the contig assembly (Table S5). Of these uniquely mapped reads, 126 222 201 were used to assist in the construction of pseudochromosomes. In total, 1.21 Gb (99.35%) of the assembled genome was anchored to eight pseudochromosomes (Figures 1a and S4, Table S6). The quality and completeness of the genome assembly were assessed using Benchmarking Universal Single-Copy Orthologs (BUSCOs). The analysis identified 97.4% of the 1614 embryophyta single-copy orthologues as complete, confirming the high quality and completeness of the genome assembly for S. plebeia (Table S7).
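
The assembly statistics quoted above (N50, sequencing depth, uniquely mapped fraction) follow from short calculations, sketched here. The function names and the toy contig lengths are hypothetical. Using the 1.16 Gb survey estimate as the denominator for depth is an assumption, since the text does not say which genome size the reported ~114× refers to.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def sequencing_depth(total_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


# Toy contig lengths (made-up, not the 38 contigs of the real assembly).
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)

# Figures quoted in the text; the 1.16 Gb denominator is an assumption.
ont_depth = sequencing_depth(133.08e9, 1.16e9)
hic_unique_pct = 100 * 205_008_017 / 571_067_410

print(toy_n50, round(ont_depth, 1), round(hic_unique_pct, 2))
```

With these inputs the depth comes out near 114.7× and the uniquely mapped Hi-C fraction near 35.90%, in line with the values reported above.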

Genomic features of S. plebeia. (a) Chromosome-level landscape of the genome. (I, gene density; II, repeat density; III, GC content). (b) Phylogenetic tree of S. plebeia and 11 other species. (c) Distribution of Ks values of S. plebeia gene pairs in segmental duplications; from left, the two peaks reflect a WGD in Lamiales and an ancient WGT in core eudicots. (d) Collinearity between S. plebeia, S. officinalis, S. miltiorrhiza and S. splendens of Lamiales.

A comparative analysis showed that the genome of S. plebeia is larger than the genomes of other diploid Lamiaceae species, including S. miltiorrhiza (Ma et al., 2021) and S. officinalis (Li et al., 2022). Notably, repetitive sequences accounted for 81.39% of the S. plebeia genome. Among these, long terminal repeat (LTR) retrotransposons were the most prevalent, comprising 58.74% of the genome. Of these LTR elements, 88.54% were classified as belonging to the Gypsy superfamily (60.20%) and the Copia superfamily (28.35%) (Table S8). A comprehensive approach that combined ab initio gene predictions, homologous protein searches and de novo assembled transcripts derived from RNA-seq data led to the identification of 36 861 predicted protein-coding genes. Evaluation using the embryophyta BUSCO dataset identified complete orthologues for 99.2% of the dataset, suggesting that the predicted protein-coding genes exhibit a high degree of completeness and quality (Table S7).

Phylogenetic and whole-genome duplication analyses

To deduce the evolutionary history of S. plebeia, phylogenetic reconstruction and divergence time estimation analyses were conducted using 211 single-copy genes common to S. plebeia and 11 other plant species. These included eight previously sequenced Lamiaceae species—Salvia splendens, S. miltiorrhiza, S. officinalis, S. rosmarinus, Scutellaria baicalensis, Salvia bowleyana, Nepeta cataria and Scutellaria barbata. Consistent with earlier reports (Han et al., 2023), the estimated divergence time for Lamiaceae is approximately 54.9 MYA, with a 95% highest posterior density interval of 49.4–59.3 MYA. The divergence between the genera Nepeta and Salvia occurred approximately 43.6 MYA, with a 95% highest posterior density of 39.5–48.4 MYA. Furthermore, S. plebeia diverged from its close relatives S. miltiorrhiza and S. bowleyana around 20.3 MYA, with a 95% highest posterior density of 14.5–25.3 MYA (Figure 1b).

Furthermore, an evolutionary analysis of gene families performed using CAFE 5 revealed 1215 orthogroups that had expanded and 1057 orthogroups that had contracted in the S. plebeia genome (Figure S5). Pathway analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) indicated that the expanded gene families in S. plebeia were significantly associated with pathways related to monoterpenoid biosynthesis and to phenylalanine and tyrosine biosynthesis (Figure S6). Similarly, Gene Ontology (GO) analysis revealed significant enrichment of expanded orthogroups in terpenoid biosynthesis processes and monooxygenase activity (Figure S7). Thus, the specific expansion of orthogroups for these biological processes may be associated with the metabolic capacity of S. plebeia.

Whole-genome duplications (WGDs) are pivotal in plant evolution (Jiao et al., 2011). An analysis of the density of synonymous substitutions per synonymous site (Ks) for paralogous genes identified two peaks (Figure 1c). The first peak corresponds to the whole-genome triplication (WGT) event that is shared by core eudicots, while the second reflects a WGD event common to the Lamiaceae family. No recent WGD events were detected in S. plebeia. As illustrated in Figure 1d, S. plebeia and S. miltiorrhiza have the same chromosome number and exhibit a high degree of collinearity, with a total of 421 collinear regions identified between their genomes.

SpRAS and SpCYP98As participate in the biosynthesis of rosmarinic acid

Rosmarinic acid is the most abundant phenolic acid component in the genus Salvia and is also a major constituent of the phenolic acids in S. plebeia. Several acyl donors and acceptors, as well as their esterification products, including 4-coumaroyl-4′-hydroxyphenyllactic acid, 4-coumaroyl-3′,4′-hydroxyphenyllactic acid and caffeoyl-4′-hydroxyphenyllactic acid, have been detected in S. plebeia. This suggests that RAS plays a crucial role in the formation of rosmarinic acid and its precursors (Figure 2a). To investigate this, we sought to verify the catalytic activity of SpRAS in reactions between different acyl donors and acceptors. Based on our genome data, 13 candidate RAS genes were identified. A phylogenetic tree was constructed to examine the relationships between these 13 candidate genes and other plant acyltransferase family members (Figure S8). Among these, the SpleChr5G00230110.1 gene clustered closely with known functional RAS genes from the Lamiaceae family. Accordingly, SpleChr5G00230110.1 was selected as a candidate gene for functional validation. The open reading frame (ORF) of SpleChr5G00230110.1 (referred to as SpRAS) was cloned into the pET-28a vector, and the encoded protein was heterologously expressed in E. coli. The cells were lysed, and the crude protein was purified through affinity chromatography (Figure S9). To evaluate substrate specificity, SpRAS was incubated with combinations of acyl-CoA donors (p-coumaroyl-CoA and caffeoyl-CoA) and acyl acceptor substrates (salvianic acid A and 4-hydroxyphenyllactic acid). These compounds serve as intermediates in the biosynthesis pathway of rosmarinic acid in S. plebeia. The reaction mixtures were analysed using HPLC and UPLC-Q-TOF/MS. The results demonstrated that SpRAS can catalyse the reaction between salvianic acid A and caffeoyl-CoA to produce rosmarinic acid (Figure 2b, Figure S10). It can also catalyse the reactions of caffeoyl-CoA with 4-hydroxyphenyllactic acid, p-coumaroyl-CoA with salvianic acid A, and p-coumaroyl-CoA with 4-hydroxyphenyllactic acid to produce the rosmarinic acid precursors caffeoyl-4′-hydroxyphenyllactic acid, 4-coumaroyl-3′,4′-hydroxyphenyllactic acid and 4-coumaroyl-4′-hydroxyphenyllactic acid, respectively (Figures 2c–e, S10 and S11). These precursors are subsequently converted into rosmarinic acid through hydroxylation by CYP98A subfamily enzymes (Levsh et al., 2019). Therefore, we further investigated the CYP98A subfamily enzymes in S. plebeia.
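
The four donor and acceptor combinations tested with SpRAS can be summarised as a small lookup, together with the hydroxylation steps each product still needs before it becomes rosmarinic acid. This is only an illustrative restatement of the results above. The names `RAS_PRODUCTS` and `remaining_hydroxylations` are hypothetical, and a plain apostrophe stands in for the prime symbol.

```python
# Products reported above for each (acyl donor, acyl acceptor) pair.
RAS_PRODUCTS = {
    ("caffeoyl-CoA", "salvianic acid A"): "rosmarinic acid",
    ("caffeoyl-CoA", "4-hydroxyphenyllactic acid"): "caffeoyl-4'-hydroxyphenyllactic acid",
    ("p-coumaroyl-CoA", "salvianic acid A"): "4-coumaroyl-3',4'-hydroxyphenyllactic acid",
    ("p-coumaroyl-CoA", "4-hydroxyphenyllactic acid"): "4-coumaroyl-4'-hydroxyphenyllactic acid",
}


def remaining_hydroxylations(donor, acceptor):
    """List the positions a CYP98A enzyme must still hydroxylate
    to turn the SpRAS product into rosmarinic acid."""
    if (donor, acceptor) not in RAS_PRODUCTS:
        raise KeyError(f"untested substrate pair: {donor} + {acceptor}")
    steps = []
    if donor == "p-coumaroyl-CoA":
        steps.append("C-3")   # donor-derived ring lacks the 3-hydroxyl
    if acceptor == "4-hydroxyphenyllactic acid":
        steps.append("C-3'")  # acceptor-derived ring lacks the 3'-hydroxyl
    return steps


for (donor, acceptor), product in RAS_PRODUCTS.items():
    steps = remaining_hydroxylations(donor, acceptor) or ["none"]
    print(f"{donor} + {acceptor} -> {product}; still needed: {', '.join(steps)}")
```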

Functional verifications of S. plebeia RAS genes. (a) Proposed rosmarinic acid biosynthetic pathway. Abbreviation: RAS, rosmarinic acid synthase; CYP98A, cytochrome P450 98A. (b) LC–MS analyses of recombinant SpRAS enzyme assays using salvianic acid A and caffeoyl-CoA as substrates. (c) LC–MS analyses of recombinant SpRAS enzyme assays using 4-hydroxyphenyllactic acid and caffeoyl-CoA as substrates. (d) LC–MS analyses of recombinant SpRAS enzyme assays using salvianic acid A and p-coumaroyl-CoA as substrates. (e) LC–MS analyses of recombinant SpRAS enzyme assays using 4-hydroxyphenyllactic acid and p-coumaroyl-CoA as substrates. The products were detected using HPLC (330 nm) and LC–MS in negative ionization mode. Boiled enzymes are used as a control. The red boxes indicate molecular ion peaks.

From the genome database of S. plebeia, four genes (SpleChr1G00047530.1, SpleChr5G00273770.1, SpleChr7G00340260.1 and SpleChr1G00047510.1) classified under the SpCYP98A subfamily were identified. These genes were successfully cloned using specific primers and were designated as SpCYP98A75, SpCYP98A76, SpCYP98A77 and SpCYP98A78 by the Cytochrome P450 Nomenclature Committee. The SpCYP98A family enzymes catalyse the meta-hydroxylation of hydroxycinnamic acid esters to produce rosmarinic acid, classifying them within the 4-coumaroyl ester 3-hydroxylase (C3H) enzyme family (Ernst et al., 2022). Phylogenetic analysis was conducted using the amino acid sequences of C3H enzymes from different species, and Neighbour-Joining (NJ) trees were constructed using the MEGA program to determine the relationship between SpCYP98A enzymes and C3H enzymes (Figure S12). The findings indicated that SpCYP98A75 exhibits considerable sequence homology with SmCYP98A75. The amino acid sequences of the four SpCYP98A enzymes included motifs typical of P450 monooxygenases, including the PERF motif, the heme-binding cysteine motif and the threonine-containing binding pocket. To explore their catalytic functions, the coding sequences of SpCYP98A75, SpCYP98A76, SpCYP98A77 and SpCYP98A78 were cloned into the pESC-His vector and expressed in yeast strain WAT11. Purified yeast microsomes containing recombinant SpCYP98A enzymes were used in activity assays with three substrates: 4-coumaroyl-4′-hydroxyphenyllactic acid, 4-coumaroyl-3′,4′-hydroxyphenyllactic acid and caffeoyl-4′-hydroxyphenyllactic acid. Product analysis using UPLC-Q-TOF/MS revealed that SpCYP98A75 catalysed the conversion of 4-coumaroyl-3′,4′-hydroxyphenyllactic acid or caffeoyl-4′-hydroxyphenyllactic acid into rosmarinic acid. Additionally, SpCYP98A75 transformed 4-coumaroyl-4′-hydroxyphenyllactic acid into both 4-coumaroyl-3′,4′-hydroxyphenyllactic acid and rosmarinic acid. These findings demonstrate that SpCYP98A75 catalyses hydroxylation at both the C-3′ position of the acyl acceptor-derived ring and the C-3 position of the acyl donor-derived ring. However, its catalytic activity was notably stronger at the C-3′ position (Figures 3 and S13). Similarly, SpCYP98A77 and SpCYP98A78 catalysed the conversion of 4-coumaroyl-3′,4′-hydroxyphenyllactic acid or caffeoyl-4′-hydroxyphenyllactic acid to rosmarinic acid, as well as that of 4-coumaroyl-4′-hydroxyphenyllactic acid into caffeoyl-4′-hydroxyphenyllactic acid. However, these enzymes exhibited a preference for catalysing hydroxylation at the C-3 position (Figures 3 and S13).
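
Because products were identified by LC–MS in negative ionization mode, each hydroxylation step should appear as a gain of one oxygen (about 15.995 Da) in the [M−H]⁻ ion. The sketch below computes the expected m/z values. The molecular formulas are not given in this section: they are assumed from the known formula of rosmarinic acid (C18H16O8), removing one oxygen per missing hydroxyl group. All names are hypothetical.

```python
# Monoisotopic atomic masses (Da) and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727646

# Assumed formulas: rosmarinic acid is C18H16O8; each precursor lacks one
# oxygen per missing hydroxyl group (an assumption, not stated in the text).
FORMULAS = {
    "4-coumaroyl-4'-hydroxyphenyllactic acid": {"C": 18, "H": 16, "O": 6},
    "4-coumaroyl-3',4'-hydroxyphenyllactic acid": {"C": 18, "H": 16, "O": 7},
    "caffeoyl-4'-hydroxyphenyllactic acid": {"C": 18, "H": 16, "O": 7},
    "rosmarinic acid": {"C": 18, "H": 16, "O": 8},
}


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mz_negative_mode(formula):
    """m/z of the singly deprotonated ion [M-H]-."""
    return monoisotopic_mass(formula) - PROTON


expected_mz = {name: mz_negative_mode(f) for name, f in FORMULAS.items()}
for name, mz in expected_mz.items():
    print(f"{name}: [M-H]- = {mz:.4f}")
```

Note that the two mono-hydroxylated intermediates are isomers with identical m/z, so they can only be told apart chromatographically or against standards.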

Functional verifications of S. plebeia CYP98A genes. (a) HPLC analyses of recombinant SpCYP98A75, SpCYP98A77 and SpCYP98A78 enzyme assays using caffeoyl-4′-hydroxyphenyllactic acid as substrate. (b) HPLC analyses of recombinant SpCYP98A75, SpCYP98A77 and SpCYP98A78 enzyme assays using 4-coumaroyl-3′,4′-hydroxyphenyllactic acid as substrate. (c) HPLC and EIC analyses of recombinant SpCYP98A75, SpCYP98A77 and SpCYP98A78 enzyme assays using 4-coumaroyl-4′-hydroxyphenyllactic acid as substrate. The products were detected using HPLC (330 nm) and LC–MS in negative ionization mode. Microsomal proteins extracted from pESC-WAT11 were used as the negative control and a rosmarinic acid standard as the positive control.

To further verify the functions of the genes involved in rosmarinic acid biosynthesis, we conducted Agrobacterium tumefaciens-mediated transient protein expression in Nicotiana benthamiana. We transiently co-expressed each of SpCYP98A75, SpCYP98A77 and SpCYP98A78 with SpRAS in N. benthamiana leaves. As shown in Figure S14, when 4-hydroxyphenyllactic acid was used as substrate, one new peak was generated; direct comparison with the standard sample indicated that it was rosmarinic acid. These in vivo results in N. benthamiana further indicate that SpRAS and SpCYP98As are involved in the biosynthesis of rosmarinic acid in S. plebeia.

Discovery of candidate TFs for rosmarinic acid content in S. plebeia

To investigate the regulatory mechanisms underlying rosmarinic acid biosynthesis, we first established a genetic transformation system for S. plebeia using hairy roots (Figure 4a). The effects of methyl jasmonate (MeJA) on rosmarinic acid accumulation were then examined in the hairy roots of S. plebeia. Quantitative analysis before and after MeJA treatment revealed that MeJA treatment not only increased rosmarinic acid content but also enhanced the levels of the flavonoid homoplantaginin (Figure 4b). In addition, MeJA treatment upregulated the biosynthetic genes involved in rosmarinic acid and homoplantaginin production, including SpPAL, SpC4H, Sp4CL2, SpCHS1, SpCHI, SpFNS, SpF6H1, SpF6OMT2, SpUGT1 and SpRAS (Figure 4c). To explore the potential TFs regulating the expression of these biosynthetic genes in response to MeJA treatment, a comparative transcriptome analysis was conducted. By comparing gene expression levels between the control and MeJA-treated groups, 199 differentially expressed transcription factors were identified, including members of the bHLH, MYB, WRKY and bZIP families. Next, a genome-wide analysis of bHLH transcription factors in S. plebeia was performed to systematically investigate their regulatory roles. A total of 124 SpbHLHs containing ORF sequences were identified and named SpbHLH1 to SpbHLH124 (Table S9). An NJ tree was constructed by comparing these SpbHLHs with 147 bHLH transcription factors from Arabidopsis thaliana. Following the grouping method proposed by Pires and Dolan (2010), the 124 bHLH family genes in S. plebeia were classified into 23 subfamilies (Figures 4d and S15, Table S10). Among the differentially expressed TFs, 16 belonged to the bHLH family, with 8 exhibiting upregulation (Figure 4e). Notably, the significantly upregulated gene SpbHLH54 was classified into bHLH subfamily III (d + e), which has been previously implicated in various JA-related physiological processes, including root growth inhibition and flavonoid biosynthesis promotion (Chen et al., 2011). Based on its upregulation and known functional relevance, SpbHLH54 was selected for subsequent functional validation.

Identification of TFs regulating rosmarinic acid biosynthesis. (a) Hairy root induction and cultures of S. plebeia using A. rhizogenes strain A4. I: Aseptic seedlings of S. plebeia. II: Infection of leaf explants by A. rhizogenes A4. III: Growth status of hairy root strains of S. plebeia on 1/2 MS solid culture medium. IV: Growth status of hairy roots of S. plebeia in 1/2 MS liquid culture medium. (b) Effects of MeJA induction days on the content of rosmarinic acid and homoplantaginin in hairy roots of S. plebeia. (c) The transcription expression levels of biosynthetic genes related to rosmarinic acid and homoplantaginin in the hairy root culture system of S. plebeia at different MeJA induction times (0, 3, 6, 9 and 12 h). (d) The phylogenetic tree of the bHLH gene family in S. plebeia and A. thaliana; see Figure S15 for the detailed phylogeny. (e) Heatmap of differentially expressed bHLH transcription factors in the hairy root culture system of S. plebeia treated with MeJA. Bars are means ± standard deviation (Student's t-test, *P < 0.05, **P < 0.01).

SpbHLH54 positively regulates rosmarinic acid biosynthesis in S. plebeia

To investigate the role of SpbHLH54 in regulating rosmarinic acid biosynthesis in S. plebeia, transgenic hairy root lines overexpressing SpbHLH54 (OE-SpbHLH54) and RNA interference knockdown lines (RNAi-SpbHLH54) were generated (Figure S16). Three independent overexpression lines and three RNAi lines were selected for further analysis. Compared to those in the control lines, the contents of rosmarinic acid and homoplantaginin in OE-SpbHLH54 lines increased by 1.87–2.31-fold and 2.11–2.40-fold, respectively. Conversely, in RNAi-SpbHLH54 lines, the contents of rosmarinic acid and homoplantaginin decreased by 24%–37% and 32%–50%, respectively (Figure 5a). These results indicated that SpbHLH54 acts as a positive regulator of rosmarinic acid and homoplantaginin biosynthesis. To further elucidate the regulatory mechanism of SpbHLH54, the expression levels of key enzyme genes involved in the biosynthesis of rosmarinic acid and homoplantaginin were analysed in both overexpression and knockdown hairy root lines. In the RNAi lines, the SpPAL, SpC4H, Sp4CL2, SpCHS1, SpCHI, SpFNS, SpF6H1, SpF6OMT2, SpUGT1 and SpRAS genes were downregulated to varying degrees, although the changes were not significant. In contrast, these genes were upregulated in the OE-SpbHLH54 lines, with particularly notable increases observed for SpFNS and SpRAS, which exhibited 4.0- and 2.6-fold upregulation, respectively (Figure 5b). Based on these results, we hypothesized that SpFNS and SpRAS may be direct target genes of SpbHLH54 in the regulation of rosmarinic acid and homoplantaginin biosynthesis.

SpbHLH54 positively regulates rosmarinic acid biosynthesis. (a) The content of rosmarinic acid and homoplantaginin in SpbHLH54 transgenic and wild-type hairy root lines. (b) The transcription expression levels of biosynthetic genes related to rosmarinic acid and homoplantaginin in SpbHLH54 transgenic and wild-type hairy root lines. (c, d) Yeast one-hybrid (Y1H) assay showing the interaction of SpbHLH54 with the SpRAS and SpFNS promoters. (e–g) Dual-luciferase assay for SpbHLH54 with pSpRAS and pSpFNS in tobacco. Bars are means ± standard deviation (Student's t-test, *P < 0.05, **P < 0.01, ***P < 0.001).

To investigate whether SpbHLH54 directly binds to the promoters of SpRAS and SpFNS, yeast one-hybrid (Y1H) assays were performed. First, the promoters of the SpRAS and SpFNS genes were cloned into the pLacZ vector, generating reporter constructs, while the coding sequence of SpbHLH54 was cloned into the pB42AD vector to produce the effector construct pB42AD-SpbHLH54. The Y1H assays demonstrated that SpbHLH54 bound directly to two fragments of the SpRAS promoter (pSpRAS-Ebox-1 and pSpRAS-Ebox-2) and two fragments of the SpFNS promoter (pSpFNS-Ebox-1 and pSpFNS-Ebox-4) (Figure 5c,d). To further confirm this interaction, dual-luciferase (Dual-LUC) assays were conducted using a transient expression system in leaves of tobacco (N. benthamiana). Constitutively expressed SpbHLH54 served as the effector, while the promoters of SpRAS and SpFNS were inserted into pGreenII0800-LUC vectors to generate reporter constructs. Effector and reporter constructs were transiently co-expressed in tobacco leaf cells (Figure 5e–g). The Dual-LUC assays revealed that SpbHLH54 significantly activated the promoters of SpRAS and SpFNS compared to the EGFP control. These results collectively demonstrate that SpbHLH54 enhances the transcription of SpRAS and SpFNS by specifically binding to E-box cis-elements in their promoters.
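
Promoter activation in a Dual-LUC assay is commonly expressed as the firefly luciferase (LUC) signal normalised to an internal Renilla (REN) control, then compared between effector and control infiltrations. The sketch below shows that arithmetic under the assumption that this usual LUC/REN readout applies here; the section itself does not spell out the normalisation. Function names and readings are hypothetical, made-up values.

```python
from statistics import mean


def relative_activity(luc_readings, ren_readings):
    """Per-sample LUC/REN ratio, normalising for transformation efficiency."""
    if len(luc_readings) != len(ren_readings):
        raise ValueError("LUC and REN readings must be paired")
    return [luc / ren for luc, ren in zip(luc_readings, ren_readings)]


def fold_activation(effector_ratios, control_ratios):
    """Mean LUC/REN with the effector divided by mean LUC/REN with the control."""
    return mean(effector_ratios) / mean(control_ratios)


# Toy luminescence readings (made-up, three replicates each).
control_luc, control_ren = [100.0, 120.0, 110.0], [1000.0, 1200.0, 1100.0]
effector_luc, effector_ren = [300.0, 360.0, 330.0], [1000.0, 1200.0, 1100.0]

control_ratios = relative_activity(control_luc, control_ren)     # EGFP control
effector_ratios = relative_activity(effector_luc, effector_ren)  # SpbHLH54 effector
toy_fold = fold_activation(effector_ratios, control_ratios)
print(f"fold activation relative to control: {toy_fold:.2f}")
```

Whether a given fold change is significant would still need a statistical test across replicates, as with the Student's t-test noted in the figure caption.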

Discussion

The assembly and analysis of a high-quality, reference-grade genome of S. plebeia provide important genetic insights into rosmarinic acid biosynthesis. The S. plebeia genome, with an approximate size of 1.22 Gb, is organized across eight chromosomes and contains 36 861 annotated protein-coding genes. WGD analysis identified two key duplication events: one corresponding to the WGT event shared by core eudicots and the other reflecting a subsequent WGD event specific to the Lamiaceae family (Xu et al., 2020). No recent WGD events were detected in S. plebeia. The larger genome of S. plebeia relative to other Lamiaceae species is partially attributable to the recent expansion of genes and long terminal repeats within its genome.

Rosmarinic acid holds significant potential for applications in the food and pharmaceutical sectors owing to its diverse bioactive properties. Although the biosynthesis pathway of rosmarinic acid has been documented in several species (Levsh et al., 2019; Li et al., 2024; Mansouri and Mohammadi, 2021), the catalytic preferences of CYP98A family enzymes differ across species. For instance, in Lithospermum erythrorhizon, LeCYP98A6 catalyses the C-3 hydroxylation of 4-coumaroyl-4′-hydroxyphenyllactic acid to form caffeoyl-4′-hydroxyphenyllactic acid (Matsuno et al., 2002). In Phacelia campanularia, PcCYP98A112 and PcCYP98A113 specifically catalyse C-3 and C-3′ hydroxylation, respectively (Levsh et al., 2019). Similarly, in S. miltiorrhiza, SmCYP98A75 preferentially catalyses C-3′ hydroxylation, whereas SmCYP98A14 preferentially catalyses C-3 hydroxylation in rosmarinic acid biosynthesis (Zhou et al., 2024). Although CbCYP98A14 has been reported as a bifunctional enzyme capable of catalysing both the C-3 and C-3′ hydroxylation of 4-coumaroyl-4′-hydroxyphenyllactic acid, there is a lack of assays that employ 4-coumaroyl-4′-hydroxyphenyllactic acid as the substrate (Eberle et al., 2009). In this study, we discovered three SpCYP98A enzymes that are involved in the formation of rosmarinic acid in S. plebeia. Among them, SpCYP98A75 exhibited a stronger oxidation capability at the C-3′ position, while SpCYP98A78 and SpCYP98A77 showed a stronger preference for C-3 hydroxylation. Notably, SpCYP98A75 demonstrated bifunctional activity by oxidizing both the C-3 and C-3′ positions of rosmarinic acid precursors, thereby facilitating the efficient generation of rosmarinic acid from 4-coumaroyl-4′-hydroxyphenyllactic acid. The identification of SpCYP98A enzymes not only provides valuable enzymatic tools for the large-scale production of rosmarinic acid via synthetic biology but also improves our comprehension of its biosynthetic pathways in S. plebeia. 
These findings may further refine and expand our knowledge of rosmarinic acid biosynthesis across the Lamiaceae family and shed light on the evolutionary adaptation of specialized metabolic pathways in this diverse plant lineage.

The chromosome-level S. plebeia genome provides a valuable resource for identifying bHLH TF-encoding genes throughout the genome. bHLH transcription factors are known to play crucial roles in regulating plant growth and development, stress resistance and signal transduction (Qian et al., 2021; Xing et al., 2018). Through a combined analysis of comparative transcriptomic and gene co-expression data, SpbHLH54 was identified as a key candidate among bHLH genes and other TF families. The regulatory effects of SpbHLH54 on rosmarinic acid biosynthesis were validated using Y1H and Dual-LUC assays, along with transgenic approaches. Furthermore, SpbHLH54 was also shown to activate SpFNS, suggesting a role in regulating the biosynthesis of flavonoid compounds in S. plebeia. These findings establish a basis for understanding the molecular foundations and regulatory functions of bHLH transcription factors in S. plebeia and also offer critical insights into the biosynthesis, evolution and regulation of metabolites in the Lamiaceae family.

Materials and methods

Plant material

The plant materials of S. plebeia were collected from the Medicinal Botanic Garden of China Pharmaceutical University, Nanjing, China (118.83 E, 31.95 N). The voucher specimens (No. CPUZYZY2022011) of S. plebeia have been deposited in the Botanic Garden of China Pharmaceutical University, Nanjing, China. Leaves from 1-month-old sterile S. plebeia plants were harvested as explants for A. rhizogenes infection to produce transgenic hairy roots. N. benthamiana plants were cultivated in pots within a growth chamber maintained at 24 °C under a 16-h light photoperiod for dual-luciferase (Dual-LUC) assays.

Genome survey

To estimate the genome size of S. plebeia, a flow cytometry assay was employed. We began by collecting leaves from three biological replicates, which were then chopped in an mGb lysis buffer composed of 45 mM MgCl₂·6H₂O, 20 mM MOPS, 30 mM sodium citrate, 1% PVP 40, 0.2% Triton X-100, 10 mM Na₂EDTA and 20 μL/mL β-mercaptoethanol, adjusted to a pH of 7.5. This step facilitated the release of the nuclei from the plant cells. Following lysis, the nuclear suspension was filtered through a 40 μm cell strainer to eliminate debris and aggregates. The filtered nuclei were subsequently treated with propidium iodide (PI) at a concentration of 50 μg/mL and ribonuclease A also at 50 μg/mL. After a dark incubation on ice for 30 min, the fluorescence intensity of the PI-stained nuclei was evaluated using a FACSCalibur flow cytometer manufactured by Becton Dickinson. For accuracy and reliability in comparison, maize B73 was utilized as an internal reference during the analysis. All procedures were executed in triplicate for each sample, ensuring robust results.
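Flow cytometry estimates of this kind are commonly converted to a genome size by scaling the size of the reference genome by the ratio of the sample and reference G1 peak positions. The following is a minimal sketch of that arithmetic, not the authors' analysis: the function name, the peak values and the reference size passed in are hypothetical placeholders, and the text does not state which reference value was used for maize B73.

```python
from statistics import mean


def estimate_genome_size(sample_peaks, reference_peaks, reference_size_gb):
    """Scale the reference genome size by the ratio of mean G1 peak positions.

    All arguments are illustrative assumptions. The reference size must be
    supplied by the user, since the text names the internal reference
    (maize B73) but not the value used for it.
    """
    if not sample_peaks or not reference_peaks:
        raise ValueError("at least one peak value is required for each sample")
    ratio = mean(sample_peaks) / mean(reference_peaks)
    return ratio * reference_size_gb


if __name__ == "__main__":
    # Hypothetical PI fluorescence peak positions from three replicates.
    sample = [105.0, 110.0, 100.0]
    reference = [200.0, 210.0, 190.0]
    print(estimate_genome_size(sample, reference, reference_size_gb=2.0))
```

Averaging the replicate peaks before taking the ratio is one possible choice; computing a ratio per replicate and averaging those is an equally reasonable alternative.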

Cytogenetic assay to determine karyotype

Cytogenetic analyses were performed to assess both the ploidy level and the chromosome count of S. plebeia. The root tips of S. plebeia were pretreated with an 8-hydroxyquinoline solution (2 mM) for 3 h at 25 °C in the dark. After washing five times with ddH2O, the samples were fixed in Carnoy's fixative (absolute ethanol: glacial acetic acid = 3:1) on ice at 4 °C for 2 h. After a further five washes with ddH2O, the samples were subjected to acid hydrolysis for 1 min in a water bath at 60 °C, using a 1:1 (v/v) mixture of 45% glacial acetic acid (v/v) and 1 M HCl. The samples were then washed five times with ddH2O and immersed in ddH2O for 2 h. They were stained with an improved carbol fuchsin solution for 10 min. After slide preparation, chromosomes were observed and imaged using a LEICA DM1000 optical microscope.

Genome sequencing and assembly

To begin the genomic analysis of S. plebeia, genomic DNA was extracted from young leaves using the cetyltrimethylammonium bromide (CTAB) method. The resulting DNA was evaluated for fragment size and degradation through 0.75% agarose gel electrophoresis. Following this, the purity and concentration of the DNA were quantified using NanoDrop One (Thermo Fisher Scientific, Waltham, MA, USA) and assessed with a Qubit 3.0 fluorometer (Life Technologies, Carlsbad, CA, USA). Once the quality and integrity of the DNA were confirmed, it was subjected to random shearing with a Covaris ultrasonic disruptor. Subsequently, libraries were prepared following the standard protocol using the SQK-LSK109 ligation kit. The purified library was then loaded onto primed R9.4 Spot-On Flow Cells and subjected to sequencing on a PromethION sequencer (Oxford Nanopore Technologies, Oxford, United Kingdom), with 48-h sequencing runs conducted at Wuhan Benagen Technology Co., Ltd. (Wuhan, China). The raw data obtained underwent base-calling analysis utilizing the Oxford Nanopore GUPPY software (Ewing and Green, 1998) (version 4.0.2). In addition, in situ Hi-C chromosome conformation capture was executed according to the DNase-based protocol outlined by Ramani et al. (2020). The resultant libraries were sequenced in a paired-end mode of 150 bp on an Illumina NovaSeq (Illumina, San Diego, CA, USA). For scaffolding at the pseudochromosome level, we employed the assembly software ALLHIC (version 0.9.12) to stitch together the sequences. Finally, the generated files (.hic and .assembly) were imported into Juicebox (Robinson et al., 2018) (version 1.11.08) for manual optimization.

Gene prediction and functional annotation

To identify protein-coding genes within the S. plebeia genome, three forms of evidence were used: transcript mapping, de novo gene prediction and homologous gene alignment. The ONT cDNA sequences from S. plebeia were aligned to the genome sequence using Minimap2 (v2.17) (Heng, 2018). Subsequently, the transcripts were assembled with StringTie2 (v2.1.5) (Pertea et al., 2015), and ORFs for the assembled transcripts were predicted using TransDecoder (v5.1.0) (https://github.com/TransDecoder/TransDecoder). For de novo gene prediction, Augustus (v3.3.2) (Nachtweide and Stanke, 2019), Genscan (v1.0) (http://bioinf.uni-greifswald.de/webaugustus/predictiontutorial) and GlimmerHMM (v3.0.4) (http://ccb.jhu.edu/software/glimmer/index.shtml) were employed. For homologous gene alignment, proteins from A. thaliana, S. splendens, S. miltiorrhiza, S. bowleyana, S. officinalis and S. rosmarinus were aligned to the S. plebeia genome using Exonerate (v2.4.0) (https://github.com/nathanweeks/exonerate). Finally, gene sets predicted by these three methods were integrated using MAKER (v2.31.10) software (http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi). Incomplete genes and those with coding sequences (CDS) shorter than 150 bp were eliminated, resulting in a refined, non-redundant gene set.
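The final filtering step can be illustrated with a short sketch. The 150 bp threshold comes from the text; the definition of an "incomplete" gene used here (no ATG start codon, no terminal stop codon, or a length that is not a multiple of three) is an assumption made for illustration, as are the function names and the example sequences.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_cds(seq, min_len=150):
    """Return True if a CDS passes the length and completeness checks.

    The completeness criteria are assumptions for this sketch; the text only
    states that incomplete genes and CDS shorter than 150 bp were removed.
    """
    seq = seq.upper()
    return (
        len(seq) >= min_len
        and len(seq) % 3 == 0
        and seq[:3] == START_CODON
        and seq[-3:] in STOP_CODONS
    )


def filter_gene_models(cds_by_gene, min_len=150):
    """Keep only gene models whose CDS passes is_complete_cds."""
    return {
        gene: seq
        for gene, seq in cds_by_gene.items()
        if is_complete_cds(seq, min_len)
    }


if __name__ == "__main__":
    # Hypothetical gene identifiers and sequences.
    models = {
        "gene_complete": "ATG" + "GCT" * 50 + "TAA",
        "gene_short": "ATGGCTTAA",
        "gene_no_stop": "ATG" + "GCT" * 51,
    }
    print(sorted(filter_gene_models(models)))
```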

The functional annotation of the predicted protein-coding genes was accomplished through Blastp searches (e-value cut-off set at 1e−05) against the NCBI nr database (http://www.ncbi.nlm.nih.gov/) and the Uniprot database (http://www.uniprot.org/). Additionally, searches to identify gene motifs and domains were conducted using InterProScan (v5.33) (Jones et al., 2014) and HMMER (v3.1). The Gene Ontology (GO) terms (http://geneontology.org/) associated with the genes were retrieved from the respective InterPro (https://github.com/ebi-pf-team/interproscan) or Uniprot entries (https://www.uniprot.org/). Pathway annotation was conducted using KOBAS (v3.0) (https://github.com/xmao/kobas) in reference to the KEGG database.

Comparative genomic analyses

Orthologous groups among Oryza sativa, Vitis vinifera, A. thaliana and nine Lamiaceae species (S. splendens, S. miltiorrhiza, S. officinalis, S. rosmarinus, S. baicalensis, S. bowleyana, N. cataria, S. barbata and S. plebeia) were identified using OrthoFinder (Emms and Kelly, 2019) (version 2.3.12) with default settings. The protein sequences of single-copy orthologues from the 12 species were aligned with Muscle (version 3.8.31) (Edgar, 2004). To enhance alignment quality, poorly aligned segments were removed using trimAl (version 1.4) (Capella-Gutiérrez et al., 2009). A phylogenetic tree was subsequently constructed with RAxML (version 8.2.12), using 1000 bootstrap replicates to assess support for the tree topology. The divergence times among species were then estimated with MCMCtree from PAML (version 4.9) (Yang, 2007) using three calibration constraints. Additionally, gene family contraction and expansion were analysed using CAFE (version 3.1) software (Han et al., 2013) based on the results of gene family clustering.

To analyse whole-genome duplications, assessments were performed across six species: O. sativa, Sesamum indicum, S. baicalensis, S. splendens, S. miltiorrhiza and S. plebeia. Initially, the protein sequences of the different species were compared using an all-to-all search in BLASTP (version 2.6.0+). Next, MCScanX (Wang et al., 2012) (https://github.com/wyp1125/MCScanx) was used to identify genomic syntenic blocks. Finally, the synonymous substitution rate (Ks) of the syntenic gene pairs was calculated following the method developed by Yang, as implemented in PAML (version 4.9) (Yang, 2007). The Ks distribution was visualized using the ggplot2 package (version 2.2.1) in R version 2.15 (www.r-project.org).

Functional characterization for candidate RAS in E. coli system

The candidate SpRAS gene was amplified from the cDNA of S. plebeia using gene-specific primers (Table S11) and subsequently inserted into the pET28a vector at the BamHI and EcoRI restriction sites. The purified plasmids were then introduced into E. coli BL21 (DE3). The transformed bacterial cells were cultured overnight on Luria–Bertani (LB) agar at 37 °C. Positive colonies were identified through colony PCR and further grown in 200 mL of LB medium supplemented with kanamycin (50 mg/L), shaking at 200 rpm and 37 °C until the optical density at 600 nm (OD600) reached 0.6 to 0.8. Protein expression was then induced by adding 0.5 mM isopropyl-β-d-thiogalactopyranoside, and the culture was maintained at 20 °C for 14 h. After incubation, the cells were collected and lysed ultrasonically to extract the proteins. The recombinant proteins were purified using Ni-NTA resin, followed by additional purification with the SDL-030-F2 protein purification system (SePure Instruments Co., Ltd., Suzhou, China) to obtain recombinant proteins suitable for enzyme activity assays. The enzyme activity was assessed in a reaction mixture containing 20 μg of purified protein, 1 mM of 4-hydroxyphenyllactic acid or salvianic acid A, and 1 mM of p-coumaroyl-CoA or caffeoyl-CoA in a total volume of 200 μL of 50 mM Tris–HCl (pH 7.5). The reaction was conducted at 25 °C for 1 h. After the reaction period, samples were extracted with 400 μL of ethyl acetate, and the resulting extracts were dissolved in methanol for analysis by UPLC-Q-TOF/MS to identify the reaction products.

Functional characterization for candidate CYP98As in yeast system

The candidate SpCYP98A genes were isolated from the cDNA of S. plebeia using gene-specific primers (Table S11) and subsequently integrated into the pESC-His vector at the BamHI and XhoI restriction sites. The purified plasmids were then introduced into the S. cerevisiae strain WAT11. The transformed yeast cells were cultivated in SD-His medium at 30 °C until the optical density (OD600) reached 0.8 to 1.0. Afterwards, the cells were harvested and washed twice with sterile water. To induce protein expression, the cells were transferred to yeast extract peptone dextrose medium supplemented with 2% galactose, followed by incubation at 28 °C and 200 rpm for 18 h. Once the induction period had elapsed, the culture was promptly used for the isolation of microsomal proteins. Enzyme assays were carried out in Tris–HCl buffer (pH 7.5) containing 2 mg of microsomal protein, 1 mM DTT, 15 mM glucose-6-phosphate, 0.015 U glucose-6-phosphate dehydrogenase, 15 μM FAD (flavin adenine dinucleotide), 15 μM FMN (flavin mononucleotide) and 1 mM NADPH (the reduced form of nicotinamide adenine dinucleotide phosphate). The substrates were 4-coumaroyl-3′,4′-dihydroxyphenyllactic acid, caffeoyl-4′-hydroxyphenyllactic acid and 4-coumaroyl-4′-hydroxyphenyllactic acid, in a total assay volume of 200 μL. The reactions were incubated at 28 °C for 4 h. To analyse the reaction products, the samples were extracted with 400 μL of ethyl acetate, dissolved in methanol and subjected to UPLC-Q-TOF/MS analysis. Yeast strain WAT11 transformed with an empty pESC-His vector served as the control.

Functional verification of SpRAS and SpCYP98As genes in N. benthamiana

The ORFs of the SpRAS and SpCYP98A genes were cloned into the pEAQ binary plasmid using the AgeI and XhoI restriction sites. Correctly sequenced recombinant plasmids were transformed into A. tumefaciens strain GV3101. Positive transformants were selected on LB agar plates supplemented with 50 μg/mL kanamycin and 50 μg/mL rifampicin, and incubated at 28 °C. Subsequently, positive transformants were cultured in 10 mL of liquid LB medium with shaking at 28 °C. After cultivation, the cells were harvested by centrifugation at 6000 rpm for 5 min. The resulting pellet was resuspended in Agrobacterium induction medium (10 mmol/L MES, 10 mmol/L MgCl2, 100 μmol/L acetosyringone, pH 5.8) and incubated at 25 °C for 1 h. The optical density (OD600) of the cell suspension was measured to assess concentration. Using a needleless syringe, the bacterial suspensions were infiltrated into the underside of leaves of 4- to 5-week-old N. benthamiana plants. Two days post-infiltration, 4-hydroxyphenyllactic acid (100 μM) was injected into the previously infiltrated area of the leaf. After another 2 days, the substrate-infiltrated leaves were collected and extracted with methanol for LC–MS analysis. N. benthamiana leaves infiltrated with A. tumefaciens containing an empty vector served as the negative control. Each experiment included at least three plants as replicates.

Hairy root culture induction

A loop of A. rhizogenes A4 was cultivated in 20 mL of yeast extract mannitol broth (YEB) for 3 days, maintained under a photoperiod of 16 h light followed by 8 h of darkness at 28 °C, with agitation set at 200 rpm. After this initial growth period, the 3-day-old bacterial suspension cultures were utilized for the infection of explants. The explants were submerged in the bacterial suspension for 15 min and subsequently transferred to semi-solid 1/2 MS media enriched with 100 μM acetosyringone. These cultures were then incubated at 25 °C in the absence of light for a coculturing phase that lasted 3 days. Following this, a decontamination procedure was initiated, involving treatment with 500 mg/L cefotaxime over a span of 7 days. After the initial 7-day period, the cefotaxime concentration was reduced to 400 mg/L and continued for an additional 7 days. The subcultures were kept in the same type of media until bacterial presence was no longer detected. The resulting hairy roots were grown in 50 mL of liquid 1/2 MS media and subcultured every 30 days, maintaining conditions at 25 °C in darkness while agitating at 120 rpm.

Generating transgenic hairy roots

The complete SpbHLH54 ORF was cloned into the pCAMBIA1301-EGFP vector (at the XbaI and BamHI restriction sites) to generate the pCAMBIA1301-SpbHLH54-EGFP recombinant plasmid. A 237-bp fragment of SpbHLH54 (from 121 to 357 bp) was cloned to construct the pFGC5941-SpbHLH54-RNAi plasmid. S. plebeia hairy root transformation was performed in this accession with the pCAMBIA1301-SpbHLH54-EGFP overexpression plasmid and the pFGC5941-SpbHLH54-RNAi plasmid, while the empty pCAMBIA1301 and pFGC5941 plasmids were used as the respective controls. These plasmids were transformed into A. rhizogenes A4, which was then used to infect S. plebeia explants to induce hairy roots. Genomic DNA was extracted from hairy roots using the CTAB method, and positive clones were identified by PCR with specific primers (Table S11). The positive hairy roots were harvested from several independent transgenic lines and used for gene expression or content analysis.

Quantitative real-time PCR

Total RNA was extracted from the hairy roots using Trizol (Vazyme, Nanjing, China). The first strand of cDNA was synthesized with the HiScript II Q RT SuperMix for qPCR (Vazyme, Nanjing, China). qRT-PCR experiments were conducted using the ChemQ SYBR qPCR Master Mix (Vazyme, Nanjing, China). The relative expression levels of the target genes were assessed using the 2^−ΔΔCt method, with SpActin as the reference gene. The specific primers for qRT-PCR are listed in Table S11. Each sample was analysed in three biological replicates.
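The 2^−ΔΔCt calculation can be sketched as follows. This is an illustrative implementation of the standard formula and not the authors' script; the function name and the Ct values are hypothetical, and averaging replicate Ct values before taking differences is an assumption about how replicates are combined.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Return the fold change of a target gene by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference gene), computed per condition
    ddCt = dCt(sample) - dCt(control)
    """
    dct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    dct_control = mean(ct_target_control) - mean(ct_ref_control)
    ddct = dct_sample - dct_control
    return 2 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: transgenic line versus empty-vector control,
    # with the reference gene (SpActin in the text) measured in both.
    fold = relative_expression(
        ct_target_sample=[22.0, 22.0, 22.0],
        ct_ref_sample=[18.0, 18.0, 18.0],
        ct_target_control=[24.0, 24.0, 24.0],
        ct_ref_control=[18.0, 18.0, 18.0],
    )
    print(fold)
```

The method assumes comparable amplification efficiencies close to 100% for the target and reference genes.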

Yeast one-hybrid assay

For the yeast one-hybrid (Y1H) assays, the complete SpbHLH54 ORF was amplified and inserted into the effector plasmid pB42AD. A triple tandem repeat of the pSpRAS E-box region (CCNNTG) was inserted into the reporter plasmid pLacZ between the EcoRI and XhoI restriction sites. The effector and reporter plasmids were co-transformed into the yeast strain EGY48a. The transformants were grown on SD/-Ura/-Trp medium for 48 h and then assayed on SD/-Ura/-Trp medium supplemented with 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal) for 24 h to observe reporter activity. As a control, yeast was co-transformed with the empty pB42AD and pLacZ plasmids. The primers used to amplify SpbHLH54 and the corresponding DNA motifs are listed in Table S12.

Dual-LUC assays

To evaluate the capacity of SpbHLH54 to transcriptionally activate the genes involved in rosmarinic acid biosynthesis, the approximately 2000 bp promoter regions of pSpRAS were examined and subsequently cloned into the pGreenII0800-LUC vector. This construct was designed to drive the expression of the Firefly luciferase gene, while the Renilla luciferase gene, regulated by the CaMV 35S promoter, served as an internal control. The constructed vectors were then co-transformed with the helper plasmid pSoup into the A. tumefaciens strain GV3101. For the effector component, the same strain was transformed with pHB-SpbHLH54-EGFP, while pHB-EGFP was used as the negative control. The reporter strains were combined in a 1:1 ratio with the effector strains containing either pHB-SpbHLH54-EGFP or pHB-EGFP. Luminescence detection assays for the Firefly luciferase gene were conducted as previously described for Dual-LUC assays (Shu et al., 2022). The luminescence was captured using a chemiluminescence imaging system (Tanon-5200, China), and its intensity was quantified using ImageJ software. The experiment was conducted in triplicate.
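When both luciferase signals are measured, Dual-LUC data are typically expressed as the Firefly to Renilla (LUC/REN) ratio and compared with the empty-effector control. The sketch below shows that normalization under stated assumptions: the function names and the readings are hypothetical, and the text itself reports only that luminescence was imaged and quantified with ImageJ.

```python
from statistics import mean


def luc_ren_ratios(luc, ren):
    """Return the per-replicate Firefly/Renilla ratio."""
    if len(luc) != len(ren):
        raise ValueError("LUC and REN must have the same number of replicates")
    return [firefly / renilla for firefly, renilla in zip(luc, ren)]


def fold_activation(effector_luc, effector_ren, control_luc, control_ren):
    """Return mean LUC/REN with the effector relative to the control."""
    effector = mean(luc_ren_ratios(effector_luc, effector_ren))
    control = mean(luc_ren_ratios(control_luc, control_ren))
    return effector / control


if __name__ == "__main__":
    # Hypothetical readings for three replicates per condition.
    print(fold_activation(
        effector_luc=[300.0, 600.0, 900.0],
        effector_ren=[100.0, 200.0, 300.0],
        control_luc=[100.0, 100.0, 100.0],
        control_ren=[100.0, 100.0, 100.0],
    ))
```

Normalizing to Renilla within each replicate before averaging compensates for differences in infiltration efficiency between leaf areas.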

HPLC and LC–MS analysis

Metabolites were analysed by UPLC-Q-TOF-MS/MS with an AB SCIEX TripleTOF® 5600 mass spectrometer (Redwood City, CA, USA) operating in negative ion mode. For the UPLC procedure, a C18 reverse-phase column (100 × 2.1 mm, 1.5 μm, Thermo Scientific, USA) was employed, with the column temperature set to 30 °C. Detection was performed at a wavelength of 330 nm. Solvent A consisted of 0.1% formic acid in water, while solvent B was acetonitrile. The gradient profile was as follows: 10% B at 0 min, 40% B at 20 min, 80% B at 25 min and 95% B at 30 min. The flow rate was maintained at 0.4 mL/min. The mass spectrometry parameters included a survey scan range of 100–2000 Da, an ion source heater temperature of 550 °C, an ion spray voltage of 4500 V and a collision energy of 44 V. The MS/MS data were processed using Peakview Software (version 1.2.0.3, AB SCIEX, Redwood City, CA, USA). For the quantitative analysis of metabolites in the transgenic hairy roots of S. plebeia, high-performance liquid chromatography (HPLC) was employed. Different lines of transgenic hairy roots were cultured in 100 mL of 1/2 MS liquid medium for 30 days, after which they were harvested, freeze-dried and ground into powder. To measure the metabolites, 20 mg of the powdered root tissue was extracted with 1 mL of methanol by ultrasonication for 30 min. Each tissue sample was analysed in triplicate.
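The gradient programme can be written as a small time table, which makes the solvent composition at any time point easy to check. The sketch below assumes linear ramps between the listed time points, which the text does not state explicitly; the names GRADIENT and percent_b are illustrative.

```python
# (time in min, % solvent B) pairs taken from the gradient described in the text.
GRADIENT = [(0.0, 10.0), (20.0, 40.0), (25.0, 80.0), (30.0, 95.0)]


def percent_b(t_min, gradient=GRADIENT):
    """Return % B at time t_min, assuming linear ramps between time points."""
    if t_min < gradient[0][0] or t_min > gradient[-1][0]:
        raise ValueError("time is outside the gradient programme")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient table must be sorted by time")


if __name__ == "__main__":
    for t in (0.0, 10.0, 22.5, 30.0):
        b = percent_b(t)
        # Solvent A makes up the remainder of the mobile phase.
        print(f"{t:5.1f} min: {b:.1f}% B, {100.0 - b:.1f}% A")
```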

Acknowledgements

This research was supported by the Xinjiang safflower industry development fund, the key project at the central government level: the ability establishment of sustainable use for valuable Chinese medicine resources (2060302) and the University Synergy Innovation Program of Anhui Province (GXXT-2023-070). It was also supported by the Fundamental Research Funds for the Central Universities (2632024TD04), the open research fund of Yunnan characteristic plant extraction laboratory (YKKF2023002) and the specialized research funds from the State Key Laboratory of Natural Medicines, China Pharmaceutical University (SKLNMZZ2024JS37). The authors are grateful to Dr. Meiliang Zhou (Chinese Academy of Agricultural Sciences) for providing the A. rhizogenes strain A4.

Conflict of interest

The authors declare no conflicts of interest.

Author contributions

Yucheng Zhao and Minjian Qin conceived and designed the research. Yiqun Dai and Guoyong Xie collected the samples. Rui Wang performed the genome assembly, experimental and data analysis. Yiqun Dai, Kaixuan Wang, Huihui Zeng and Mengqian He conducted gene functional experiments. Hui Liu, Xiaojing Ma and Yan Zhu participated in data and LC–MS analysis. Yiqun Dai and Yucheng Zhao wrote the manuscript. Yucheng Zhao and Minjian Qin revised the manuscript.

Data availability statement

The genome sequence data have been deposited to the Genome Sequence Archive at the National Genomics Data Center (NGDC) under BioProject no. PRJCA020085. The accession numbers in the NCBI database of the genes whose functions have been verified in this work are as follows: SpRAS (PQ510195), SpCYP98A77 (PQ510196), SpCYP98A78 (PQ510197), SpCYP98A75 (PQ510198) and SpbHLH54 (PQ510199). Other data regarding this study is included in the supporting information.
