Structure-based virtual screening aids the identification of glycosyltransferases in the biosynthesis of salidroside
Summary
Glycosylation plays an important role in the structural diversification of plant natural products. The identification of efficient glycosyltransferases (GTs) is also a crucial step for the biosynthesis of valuable glycoside products. However, functional characterization of GTs from an extensive plant gene list is labour-intensive and challenging. Salidroside is a bioactive component derived from plants, widely utilized in the fields of food and medicine. Here, through transcriptome analysis and structure-based virtual screening, we identified two GTs that participate in the biosynthesis of salidroside from a rarely studied herbaceous plant, Astilbe chinensis. Ach15909 was found to possess high catalytic activity, as evidenced by its catalytic parameters, and the key residues that determine its catalytic activity were identified. Additionally, Ach15909 shows a preference for substrates with a volume of <150 Å3, and replacing the interdomain linker region located between the N- and C-terminal domains of Ach15909 allows it to accept substrates that it previously could not catalyse. Overall, the structure-based virtual screening approach showed high efficiency and cost-effectiveness; the successful identification of GTs in salidroside glycosylation sheds light on uncovering additional plant biosynthetic enzymes in future research.
Introduction
Plants produce a large number of chemicals with varying biological activities (Dixon, 2001). Glycosylation plays an important role in the structural diversification of plant natural products (Louveau and Osbourn, 2019). In addition, glycosylation represents an important step in metabolic engineering to produce drug leads, cosmetics, nutrients and sweeteners (Yu et al., 2012). UDP-glycosyltransferases (UGTs), which are grouped into family 1 of the glycosyltransferases (GTs), catalyse the transfer of glycosyl moieties from nucleotide-activated sugars to their specific acceptor molecules (Cantarel et al., 2009). UGTs usually accommodate their sugar donors primarily via a conserved 44-amino-acid plant secondary product glycosyltransferase (PSPG) motif, and a histidine positioned around the 20th residue was reported to act as a general base to deprotonate the acceptor (Bowles et al., 2006). UGT crystal structures show the classical GT-B fold, which consists of an N-terminal and a C-terminal domain with similar Rossmann-like folds. The two domains form a cleft that serves as the substrate binding site. The nucleotide sugar donor is located in this cleft and mainly interacts with the C-terminal domain, whereas the acceptor mainly binds to the N-terminal domain (Lairson et al., 2008). However, the determinants of acceptor recognition remain obscure, and the residues within the acceptor binding pocket are poorly conserved, resulting in remarkable promiscuity with regard to aglycone acceptors (Wang, 2009).
The discovery of novel plant UGTs has greatly accelerated with the development of high-throughput sequencing (Liu et al., 2022; Sun et al., 2022). Traditional sequence-based gene function screening methods, along with co-expression analysis, have enabled the identification of a diverse array of enzyme genes over the past few decades. However, because the functional characterization of individual enzymes is labour-intensive, the determinants of aglycone acceptor selectivity in UGTs remain difficult to define (Louveau and Osbourn, 2019; Osmani et al., 2009). Recently, several new technologies have been applied to the rapid identification of UGTs, such as GT-predict and glycoside-specific metabolomic and precursor isotopic labelling (GSM-PIL) analysis (Wu et al., 2022; Yang et al., 2018). These approaches provided meaningful biological insights that guided the functional annotation of UGTs, but they remain limited in accuracy and in predicting the regiochemical bias towards substrates, as the 3D structure and the protein-ligand interactions were not fully considered.
AlphaFold and RoseTTAFold were developed by incorporating empirical knowledge about protein structure into deep-learning algorithms, and the resulting protein models were reported to be as accurate as experimentally determined structures (Baek et al., 2021; Jumper et al., 2021). Structure-based virtual screening uses computer programs to discover high-affinity ligands for a given protein structure (Lavecchia and Di Giovanni, 2013; Macalino et al., 2015). These methods have been extensively applied in the early stages of drug discovery for more than a decade (Kufareva et al., 2012; Lagarde et al., 2015). Here, we hypothesized that a virtual screening system would be applicable to the primary functional screening of diverse plant enzymes, such as UGTs.
Salidroside, the 8-O-glucoside of tyrosol, is of particular interest and value because of its biological activities (Zhang et al., 2021). However, commercially available salidroside is currently obtained through direct purification from Rhodiola plants (Booker et al., 2016). Attempts to engineer salidroside biosynthesis in E. coli or yeast chassis have been developed (Liu et al., 2018, 2021). Meanwhile, the UGTs that regiospecifically glycosylate tyrosol at the 8-OH group have been mined from plants or microbes, such as RrUGT33, UGT85A1, UGT72B14, UGT74R1, UGT73B6, UGT85AF8, ugtBL1 and ugtBL3 (Chung et al., 2017; Fan et al., 2017; Ma et al., 2007; Torrens-Spence et al., 2018; Yang et al., 2021; Yu et al., 2011). However, the development of new UGT enzymes with high activity and specificity is still urgently needed, and the catalysis mechanism also requires detailed investigation.
In this study, we found that Astilbe chinensis contains glycosyltransferases capable of synthesizing salidroside. We then characterized 49 candidate UGT genes by a combination of full-length and short-read transcriptome sequencing. Their protein structures were further predicted by RoseTTAFold. Through a virtual screening towards the substrate tyrosol, we successfully identified and characterized the UGT involved in salidroside biosynthesis. With additional mutagenesis experiments, its catalytic and substrate recognition mechanism were further studied.
Results
Crude proteins isolated from the A. chinensis rhizome specifically glycosylate tyrosol to produce salidroside
Salidroside was detected in our trial metabolomics study of the A. chinensis plant (Figure 1a). The salidroside contents of the root, stem, leaf, rhizome and flower of A. chinensis were then measured using high-performance liquid chromatography–tandem mass spectrometry (HPLC–MS) via comparisons to authentic standards (Figure S1). In the presence of UDP-glucose and tyrosol, crude protein from the rhizome converted the substrate (tyrosol) into a product that gave a slight but clear peak corresponding to the salidroside standard (Figure 1b); the product was further confirmed by HPLC–MS (Figure S2). We thus presumed that the rhizome of A. chinensis contained glycosyltransferases for salidroside biosynthesis.

Characterization of candidate UGTs by integrating transcriptome sequencing and virtual screening
Given that no A. chinensis genome information was available, we performed PacBio HiFi full-length transcriptome sequencing on a mixture of root, rhizome, leaf and stem tissues. Illumina short-read sequencing was also performed on the individual samples. A total of 23 684 transcripts were assembled by a hybrid de novo assembly of PacBio and Illumina reads. Based on these data, we identified 82 UGT genes by using the UGT superfamily signature motif PF00201 (http://pfam.xfam.org/). Of these, 76 UGT sequences with amino acid lengths >300 were selected for further analysis. The differential expression genes (DEGs) analysis indicated that 49 UGTs were highly expressed in the rhizome (TPM >1) (Figure 1c).
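The two filters described above (protein length >300 amino acids and rhizome TPM >1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` record, its field names and the demo values are hypothetical and are not taken from the study's data or scripts.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one putative UGT transcript."""
    name: str
    protein_length: int   # length of the predicted protein, in amino acids
    rhizome_tpm: float    # expression level in the rhizome, in TPM


def select_candidates(candidates, min_length=300, min_tpm=1.0):
    """Keep candidates longer than min_length aa and expressed above min_tpm."""
    return [
        c for c in candidates
        if c.protein_length > min_length and c.rhizome_tpm > min_tpm
    ]


# Hypothetical demo records (not real A. chinensis data).
demo = [
    Candidate("UGT_a", 472, 35.2),   # passes both filters
    Candidate("UGT_b", 250, 80.0),   # too short
    Candidate("UGT_c", 455, 0.4),    # too weakly expressed in the rhizome
]

selected = select_candidates(demo)
print([c.name for c in selected])
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the candidate list is to either cut-off.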
The protein structures of these UGTs were then predicted using RoseTTAFold (Baek et al., 2021). By aligning the predicted structures with the publicly released UGT crystal structure Bc7OUGT (PDB ID: 8ITA), the binding position of the sugar donor was determined for each candidate UGT. Then, the space close to the glucose molecule was selected as the potential sugar acceptor pocket (Figure 2a). The volumes of the binding pockets of these candidate UGTs varied widely, ranging from 121.1 to 1293.2 Å3, with an average of 387 Å3 (Figure 2b, Table S1). In our virtual screening, molecular docking between the 49 predicted candidate UGT protein structures and the substrate tyrosol was performed by using AutoDock Vina (De Vita et al., 2019; Trott and Olson, 2010) (Figure 2a).

In the conformation with the lowest energy provided by molecular docking, we further analysed the distances and angles between the key catalytic His residue of UGT, the sugar donor and the sugar acceptor (Figure 2a). Out of 49 UGTs, seven UGTs lacking the key His residue were excluded. First, we measured the distance between the N atom of the His residue and the O atom of the 8-OH of tyrosol, with 14 UGTs showing a distance <5 Å. Second, we measured the distance between the C atom of UDP-Glu on the sugar donor and the O atom of the 8-OH group of tyrosol, with another 10 UGTs having a distance <5 Å. Lastly, the substrate molecule should also be positioned at a favourable angle within the binding pocket. In this case, the smaller the angle between the sugar C atom of UDP-Glu, tyrosol 8-OH and tyrosol 1-OH, the more precise the transfer of the sugar moiety to 8-OH instead of 1-OH. The dihedral angles were then measured for each conformation (Figure 2c). Overall, among all the analysed UGTs, Ach15909, Ach17173 and Ach16750 had the best docking models with the sugar acceptor and sugar donor, exhibiting optimal distances and suitable dihedral angles (Figure 2d).
Enzymatic properties and key residues identification of Ach15909
Gene-specific primers of these UGTs were designed and utilized to amplify the corresponding genes from the cDNA library of A. chinensis rhizome tissue (Table S2). The recombinant proteins were heterologously expressed in Escherichia coli and subsequently tested. By using UDP-glucose as the sugar donor and tyrosol as the acceptor, their catalytic activities were tested in vitro, and crude extract from the strain that expressed an empty vector was included as a negative control. As shown, the heterologously expressed Ach15909 and Ach16750 successfully converted tyrosol to salidroside, whereas Ach17173 exhibited no activity. In detail, Ach15909 exhibited a nearly 100% conversion rate, Ach16750 converted 41.3% substrate, and product 1a was further confirmed by HPLC with authentic salidroside (Figure 3a). In brief, these results underscored that Ach15909 successfully glycosylated tyrosol to salidroside.

Since Ach15909 showed the highest activity, its catalytic properties were further explored. We measured the kinetic parameters of Ach15909 with tyrosol as the substrate and obtained an apparent Km of 0.74 mM. The Vmax value of Ach15909 was 640.83 nmol min−1 μg−1 protein, and the Kcat/Km was 773.62 s−1 mM−1 (Figure S3). Among the reported enzymes with the same function, Ach15909 showed the highest catalytic efficiency (Kcat/Km) together with a low Km value, indicating a high affinity for tyrosol (Table 1). The effects of pH and temperature on its catalytic activity were also tested. The maximum catalytic activity of Ach15909 was observed at pH 7.0 and 42 °C (Figure 3b). In addition, while EDTA, Mg2+ and Zn2+ had no effect on its glycosylation activity, the presence of Cu2+, Mn2+ and Ca2+ inhibited the catalytic activity of Ach15909 by 27–85% (Figure 3b). To investigate the sugar donor selectivity, four sugar donors were tested using tyrosol as the acceptor. Ach15909 exhibited the highest preference for UDP-Glu and a conversion rate of about 20% with UDP-Rha, while the conversion rates with UDP-GlcA and UDP-Gal were <10% (Figure 3b).
Name | Species | Km | Vmax | Kcat | Kcat/Km | Reference |
---|---|---|---|---|---|---|
Ach15909 | Astilbe chinensis | 0.74 ± 0.20 mM | 640.83 ± 72.22 nmol min−1 μg−1 | 572.47 ± 64.51 s−1 | 773.62 s−1 mM−1 | This study |
RrUGT33 | Rhodiola rosea | 1.37 ± 0.05 mM | | 576.20 ± 5.68 s−1 | 420.6 s−1 mM−1 | Torrens-Spence et al. (2018) |
UGT73C5 | Arabidopsis thaliana | 2.07 ± 0.1 mM | | 1.19 ± 0.08 min−1 | 0.57 min−1 mM−1 | Liu et al. (2021) |
UGT72B14 | Rhodiola sachalinensis | 4.7 ± 0.35 μM | 57.8 ± 3.2 (pkat mg−1) | | | Yu et al. (2011) |
UGT74R1 | Rhodiola sachalinensis | 172.4 ± 14.1 μM | 293.1 ± 14 (pkat mg−1) | | | Yu et al. (2011) |
UGT73B6 | Rhodiola sachalinensis | 54.3 ± 4.9 μM | 249.8 ± 13 (pkat mg−1) | | | Yu et al. (2011) |
UGT85AF8 | Ligustrum robustum | 51.33 ± 9.34 μM | | 2.562 ± 0.240 s−1 | 48.12 s−1 mM−1 | Yang et al. (2021) |
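
As a quick consistency check on the kinetic values, the catalytic efficiency is simply Kcat divided by Km. The short Python sketch below recomputes it for the two entries reported in s−1 and mM; the function name is our own choice, and small differences from the published Kcat/Km values are expected because the inputs are rounded.

```python
def catalytic_efficiency(kcat_per_s, km_mM):
    """Return Kcat/Km in s^-1 mM^-1 from Kcat (s^-1) and Km (mM)."""
    if km_mM <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_mM


# Kcat (s^-1) and Km (mM) as reported in the text and table above.
reported = {
    "Ach15909": (572.47, 0.74),
    "RrUGT33": (576.20, 1.37),
}

for enzyme, (kcat, km) in reported.items():
    print(f"{enzyme}: Kcat/Km = {catalytic_efficiency(kcat, km):.1f} s^-1 mM^-1")
```

Note that entries given in other units (for example μM or min−1) would need to be converted before they can be compared on the same scale.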
To explore the catalytic mechanism of Ach15909, a molecular docking model of Ach15909/UDP-Glu/tyrosol was constructed (Figure 3c). Our model showed that the 8-OH group of tyrosol was positioned closest to the glucose moiety of UDP-Glu. The substrate tyrosol was surrounded by hydrophobic residues (distance <5 Å), such as Ile87, Tyr157, Ile305, Phe400 and Ala401, creating a hydrophobic environment in the substrate binding pocket (Figure S4). Meanwhile, we found that the amide group of Gln403 formed a hydrogen bond with the 8-OH group of tyrosol, which was possibly involved in its catalytic activity. In addition, the imidazole group of His22, which was previously characterized for its role in the deprotonation of the activated hydroxyl group, was in close proximity to the 8-OH group of tyrosol (Shao et al., 2005). There was a strong π-π stacking interaction between the benzene ring of Phe400 and tyrosol, which might help to maintain the planarity of the tyrosol structure (Sun et al., 2024). To investigate their roles in the catalytic process, the residues His22, Gln403 and Phe400 were substituted with alanine. Subsequent in vitro experiments showed that the catalytic activity of these mutants was decreased by 88.6%, 76.3% and 35.5%, respectively (Figure 3d). This suggests that the interactions between these residues and the substrates are crucial for the glycosylation process.
The Ach15909 interdomain linker contributes to its catalytic promiscuity
To determine the substrate promiscuity of Ach15909, a library of 60 compounds was constructed as potential substrates, containing common compound types with diverse molecular sizes and chemical backbones, such as phenols, coumarins, naphthalenes, flavonoids, anthraquinones, glycosides, alkaloids and terpenoids (Figure 4a, Figure S5). Glycosylation activities towards compounds 1–60 were tested with purified Ach15909, the products were analysed by HPLC, and the conversion rates were compared and summarized (Figure 4b, Figures S6–S22). Overall, Ach15909 showed higher conversion rates towards small substrates (Figure 4b). In particular, for compounds with a solvent-excluded volume smaller than 150 Å3, such as 1, 4–7, 9–14, 19 and 23, the conversion rates were higher than 60%. By contrast, for substrates with larger molecular volumes, such as 26–28, 30–32 and 38–40, Ach15909 showed <40% conversion. In addition, some compounds with small molecular volumes, such as 2–3, 8, 15–22, 24–25, 29 and 33–37, as well as substrates 43–60, which were larger than 240 Å3, were tested, but Ach15909 showed no activity towards them.

To explore the potential mechanism underlying the catalytic promiscuity of Ach15909, we performed a B-factor analysis of the protein structure. It revealed that the interdomain linker region (Val256-Glu282) was highly flexible (Figure 4c) and probably served as the entrance to the substrate binding pocket (Figure 4d). We also compared the predicted Ach15909 protein structure with the crystal structure of UGT85H2 (PDB ID: 2PQ6) (Figure 4d) (Li et al., 2007). While the amino acid sequence identity between Ach15909 and UGT85H2 was 56%, the overall Root Mean Square Deviation (RMSD) between Ach15909 and UGT85H2 was 1.24 Å, and the RMSD values between their N-terminal domains and C-terminal domains were 1.22 Å and 0.77 Å, respectively. Several reports have proposed that the flexible interdomain linker might mediate the domain movements of UGTs, thereby altering the spaciousness of the cleft between them (Qasba et al., 2005; Shao et al., 2005). This hypothesis was further supported by the fact that the linker region is highly flexible and has been difficult to resolve in the crystal structures of certain UGTs (Table S3, Figure S23A) (Brazier-Hicks et al., 2007; Li et al., 2007; Offen et al., 2006). To investigate the role of the interdomain linker in the substrate selectivity of Ach15909, we replaced the 26 amino acid linker sequence of Ach15909 with the 28 amino acid linker sequence from UGT85H2, creating a chimeric protein, Ach15909-UGT85H2linker (Figure 4e). In our predicted structure model, the residues surrounding the substrate binding pocket remained unchanged by the substitution, but the cleft depth increased from 25.39 Å to 28.44 Å (Figures S24–S25). The chimeric protein Ach15909-UGT85H2linker was heterologously expressed, and its activity was further tested.
The results indicated that the catalytic activity of the chimeric enzyme towards tyrosol (1) was completely lost, as was its activity towards substrates 6–10, 19, 26–28, 30–32, 38 and 39. While catalytic activity towards substrate 14 remained unchanged, it was significantly reduced towards 4, 5, 12, 13, 23 and 40 (Figure 4f). However, the chimeric enzyme gained glycosylation activity towards flavonoid substrates such as 25 and 34; the new peaks were identified as O-glycoside products by HPLC and LC-MS (Figure 4g, Figure S26). Although Ach15909 did not show catalytic activity towards all of the tested substrates with volumes smaller than 150 Å3 (for example, 2, 3, 11, 15–18 and 20–22), the activities that were abolished or reduced in the chimeric protein Ach15909-UGT85H2linker were mainly those towards substrates with volumes smaller than 150 Å3, while a gain of function was detected for substrates larger than 150 Å3.
These findings indicated that the interdomain linker of Ach15909 plays a critical role in substrate accommodation and recognition. The chimeric protein glycosylated substrate 34, whereas Ach15909 could only accept the structurally similar substrate 27. One possible explanation is that the replacement of the interdomain linker resulted in a larger and more flexible binding pocket that is able to accommodate larger small-molecule substrates. However, such a pocket may also impair the immobilization of the substrate during binding, which may lead to a lower final conversion rate towards substrates such as 40. Moreover, we found that the linker region in plant UGTs was highly variable with respect to both length and sequence (Table S3, Figure S23B). The construction of chimeric enzymes by altering this linker region may improve UGT catalytic properties and allow new enzymes to be created artificially.
Discussion
Traditional sequence-based gene function screening methods, along with co-expression analysis methods, have enabled the identification of a diverse array of enzyme genes in the past few decades (Hu et al., 2021). We view structure-based virtual screening as an adjunct to these conventional approaches, offering the potential to enhance screening efficiency. The advanced artificial intelligence-driven accurate prediction of protein structures, such as AlphaFold2 and RoseTTAFold, presents great potential in precise virtual screening. Herein, we present a novel attempt that has proven successful in our study for the identification of salidroside glycosyltransferases. These initial efforts also suggest that when dealing with larger datasets, involving thousands or even tens of thousands of candidate genes, this method could exhibit a superior efficiency that surpasses experimental techniques (Huang et al., 2023).
While crystal structures of several plant UGTs have been successfully determined, structures depicting UGT-sugar donor-sugar acceptor complexes remain scarce. This limitation hinders a comprehensive understanding of the catalytic mechanisms of plant UGTs. Consequently, further research is required to elucidate the substrate selectivity and catalytic efficiency of these enzymes and to provide a more detailed explanation of their functions. UGT crystal structures show the C-terminal domain interacting with the sugar donor and the N-terminal domain interacting with the acceptor (Osmani et al., 2009). However, little is known about the role of the linker that connects the two domains. Previous mechanistic studies mainly focused on the residues around the substrate binding pocket, but our work demonstrated that the linker region, which lies outside the binding pocket, also contributes to activity and substrate promiscuity. This highly flexible and variable linker is located at the entrance to the substrate binding pocket and probably controls the recognition and accommodation of substrates.
A. chinensis is a perennial herb in the Saxifragaceae family. It has been used as an ornamental plant for its beautiful flowers, and also as a medicinal plant for the treatment of arthralgia, chronic bronchitis and headache (Pan, 1985). Nonetheless, few studies have been carried out on A. chinensis or any other species in the Saxifragaceae family so far; its phytochemistry and genetic resources deserve to be explored. Salidroside has traditionally been extracted only from Rhodiola rosea, a species belonging to the Crassulaceae family (Chiang et al., 2015). To date, its presence in A. chinensis or any other plant within the Saxifragaceae family has not been documented; our results therefore suggest that this plant species possesses a wealth of undiscovered resources.
In summary, by integration of transcriptome sequencing and virtual screening, we identified a novel and highly efficient enzyme Ach15909 from A. chinensis that participated in the biosynthesis of salidroside. This method could be applicable for future research, enabling rapid and cost-effective exploration of plant enzymes that synthesize valuable plant natural products. Our further mechanistic studies revealed that an interdomain linker region within Ach15909 was crucial for its substrate promiscuity, enriching our understanding of the catalytic mechanism of plant UGTs.
Materials and methods
Plant materials, reagents and chemicals
A. chinensis was purchased from Tianjin Lanxiu Gardening Co., Ltd (Tianjin, China). The species was identified by morphology and simple sequence repeat markers (Agarwal et al., 2008). All anhydrous solvents were in analytical reagent grade and obtained from Sangon Biotech Co., Ltd. (Shanghai, China). Methanol and acetonitrile were in HPLC grade and purchased from FTSCI Science and Technology Co., Ltd (Wuhan, China). Substrates of the compound library, authentic standards and sugar donors were purchased from Bide Pharmatech Co., Ltd. (Shanghai, China).
Extraction and quantification of salidroside in A. chinensis
The rhizome, stem, leaf and root tissues of A. chinensis were collected separately, promptly frozen in liquid nitrogen and ground into fine powder. Subsequently, the samples were lyophilized using a Freeze Dryer (Labconco, Kansas, USA) and extracted as reported previously (Cheng et al., 2023). Three biological replicates were set up for each sample. Salidroside was quantified against a calibration curve constructed with an external standard.
HPLC and MS analysis
HPLC analysis was performed by LC-2030 Plus (Shimadzu Corporation, Tokyo, Japan) with ZORBAX SB-C18 column (Agilent, 5.0 μm, 4.6 mm × 250 mm) at a flow rate of 0.5 mL min−1 with a binary solvent system. Solvent A (0.1% formic acid in water) and solvent B (0.1% formic acid in acetonitrile) served as mobile phases. The gradient elution programme was ramped from 10% B to 100% B over 30 min (20% B at 10 min, 40% B at 15 min and 100% B at 25 min). The injection volume was 10 μL, and the detection was monitored at a wavelength of 275 nm. The column temperature was set at room temperature.
MS analysis was performed with an LTQ-Orbitrap-XL mass spectrometer (Thermo Fisher Scientific, Boston, MA, USA) consisting of an Accela ultra-high-pressure liquid chromatograph and a TSQ Quantum Ultra triple-quadrupole mass spectrometer equipped with an ESI source at a flow rate of 0.3 mL min−1. Mobile phases were consistent with those used for HPLC analysis and the injection volume was 1 μL. The gradient elution procedure was as follows: 10% B for 1 min; 10–100% B for 10 min; 100% B for 5 min. The column temperature was maintained at 25 °C. The mass analysis was run in positive ion mode, with a scan range of 100–1000 Da. MS conditions and software were the same as those used previously (Cheng et al., 2022).
Crude enzyme extraction and enzymatic assays
Two grams of fresh leaves and rhizomes of A. chinensis were collected and ground into fine powder in liquid nitrogen. Frozen tissue samples were immersed in 10 mM PBS buffer (pH 7.4) supplemented with 5 mM dithiothreitol (DTT), 2% polyvinylpolypyrrolidone (PVPP), 10 mM EDTA, 1× protease inhibitor cocktail I (Cat No: C0001; TargetMol Chemicals Inc., Boston, MA, USA) and 1× phenylmethanesulfonyl fluoride (PMSF). The samples were then rotated slowly at 4 °C for 1 h, and the mixture was centrifuged to pellet plant debris. Subsequently, the supernatant was promptly filtered through a 0.22 μm polyethersulfone membrane and dialyzed against PBS buffer (pH 7.4) overnight at 4 °C to remove low molecular weight components. The obtained crude protein was collected and stored at −80 °C. For the crude enzyme reaction, the crude protein extract was incubated with 50 mM PB buffer (pH 8.0), 1.5 mM UDP-glucose, 1 mM substrate and 1 mM MgCl2 in a total volume of 100 μL at 37 °C for 12 h, and the control reaction was set up with inactivated crude enzyme. The reaction was stopped by the addition of 100 μL methanol on ice. These samples were centrifuged at 13 400 g for 2 min and then filtered through a 0.22 μm organic filter for HPLC analysis as described above. The assay was carried out according to methods reported previously (Fu et al., 2021).
Total RNA extraction, cDNA synthesis and gene cloning
Total RNA was extracted from the stem, root, leaf, flower and rhizome of A. chinensis by using a FastPure® Plant RNA Isolation Kit (Vazyme Biotech Co., Ltd., Nanjing, China) according to the manufacturer's instructions. First-strand cDNA was synthesized from total RNA with a HiScript® III 1st Strand cDNA Synthesis kit (Vazyme Biotech Co., Ltd.). The coding sequences (CDS) of candidate UGT genes were amplified by PCR using the first-strand cDNA (1 μL) as a template. Targeted mutations and chimeric proteins were also constructed by PCR. All primer sequences used are shown in Table S2. The target UGT genes were cloned into the pET28a (+) vector with an N-terminal 6 × His tag and an N-terminal SUMO tag using a homologous recombination method (Vazyme Biotech Co., Ltd.). A single recombinant plasmid for each UGT was verified by Sanger sequencing.
Transcriptome sequencing and data analysis
For full-length transcriptome sequencing, mixed RNA samples were sent for iso-seq using the PacBio Sequel II platform. Sequencing was performed by Novogene (Beijing, China). Open reading frames were identified using TransDecoder software (version 5.5.0, https://github.com/TransDecoder/TransDecoder) to obtain putative genes, coding sequences (CDS) and protein sequences. For gene annotation, the DIAMOND program (version 2.0.4) (Buchfink et al., 2021) was used to align the sequencing reads against the Swiss-Prot protein database (https://www.uniprot.org) with an E-value threshold of 1e−5.
Illumina short-read RNA-seq data were analysed to determine the expression level of each predicted transcript. Prior to downstream processing, Trimmomatic (version 0.39) (Bolger et al., 2014) was used to filter out reads contaminated by primer/adaptor sequences and low-quality reads in which more than 30% of bases had a Phred quality score of <15. After filtering, various statistical files were generated in text and graph formats, along with the files of the high-quality filtered reads. To determine the expression pattern of the predicted genes, we first mapped the clean RNA-Seq reads derived from different tissues to the assembled transcripts, and the expression was calculated for each isoform using Salmon software (version 1.4.0). The counts were normalized as Transcripts Per Million (TPM) for each gene and were further log10-transformed.
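For readers unfamiliar with the TPM normalization mentioned above, the following Python sketch shows the standard definition: read counts are divided by transcript length, and the resulting rates are rescaled to sum to one million. It is an illustration only, since Salmon reports TPM values directly; the transcript names, counts and lengths are hypothetical, and the pseudocount used before the log10 transform is our assumption, as the text does not state how zero values were handled.

```python
import math


def tpm(counts, lengths):
    """Convert read counts to TPM given transcript (effective) lengths in bases."""
    # Length-normalized read rate for each transcript.
    rates = {name: counts[name] / lengths[name] for name in counts}
    total = sum(rates.values())
    # Rescale so that all values sum to one million.
    return {name: rate / total * 1e6 for name, rate in rates.items()}


def log10_transform(values, pseudocount=1.0):
    """log10-transform values; the pseudocount (an assumption) avoids log10(0)."""
    return {name: math.log10(v + pseudocount) for name, v in values.items()}


# Hypothetical counts and lengths for two transcripts.
demo_counts = {"transcript_a": 100, "transcript_b": 300}
demo_lengths = {"transcript_a": 1000, "transcript_b": 1500}

demo_tpm = tpm(demo_counts, demo_lengths)
print(demo_tpm)
print(log10_transform(demo_tpm))
```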
Sequence alignment and phylogenetic analysis
UGT genes were identified using the Hidden Markov Model (Eddy, 2004) in the PFAM database. The redundant sequences with a similarity of 0.90 were removed using CD-HIT program (version 4.8.1) (Fu et al., 2012). Subsequently, multiple sequence alignments of these proteins were performed using MAFFT tool (version 7.475) (Katoh et al., 2002). Finally, a phylogenetic tree was constructed utilizing the FastTree software (version 2.1.11) (Price et al., 2009) and the maximum likelihood method. The bootstrap process was replicated 1000 times.
Recombinant protein expression in E. coli and purification
The recombinant plasmid was transformed into E. coli Rosetta 2 (DE3) cells for heterologous expression. A single colony was then inoculated into LB medium containing 50 μg mL−1 kanamycin and 50 μg mL−1 chloramphenicol, and cultured according to the method reported previously (Cheng et al., 2022). The cells were subsequently harvested and the collected cell pellets were resuspended in PBS buffer (pH 7.4), subjected to sonication at 100 W for 10 min for cell disruption to release the proteins. The resulting mixed solution was centrifuged to facilitate removal of insoluble cell debris. The supernatant was loaded on a Ni-NTA affinity column (GenScript, Nanjing, China) with a linear gradient of imidazole for gradient elution and the eluted protein was further exchanged into PBS buffer in 30 kDa Millipore Ultrafiltration centrifugal filter (Merck, Germany) by centrifugation. The protein concentration was quantified by using an Enhanced BCA Protein Assay Kit (Beyotime, Shanghai, China), and the purified protein was stored at −80 °C for catalytic activity assays.
Enzymatic assays and optimization of enzymatic reaction
Purified protein (60 μg) was incubated with 50 mM PB buffer (pH 8.0) containing 1.5 mM UDP-glucose and 1 mM substrate in a total reaction volume of 100 μL. The reactions proceeded at 37 °C for 12 h and were terminated by the addition of 100 μL of methanol. After centrifugation and filtration, the samples were subjected to HPLC analysis. The kinetics of the UGT were characterized with 2 mM UDP-glucose and various concentrations (0.05–2.0 mM) of tyrosol in 100 μL of reaction buffer (50 mM PB, pH 8.0). Reactions were started by the addition of purified protein, incubated at 37 °C for 30 min and quenched for HPLC analysis. The optimal reaction conditions were determined by varying one factor at a time, including pH (2–12), temperature (27–57 °C) and the addition of 1 mM metal cation or EDTA (Mg2+, Cu2+, Zn2+, Ca2+, Mn2+ and EDTA). PB buffers with different pH values were prepared by mixing disodium hydrogen phosphate and sodium dihydrogen phosphate, and hydrochloric acid and sodium hydroxide were used to adjust the pH. All reactions were performed in triplicate. Standard solutions of varying concentrations were prepared, and the corresponding standard curves were plotted. The product concentration of each reaction was determined using external standards, and the results were fitted by non-linear regression. The enzymatic kinetic parameters were calculated using GraphPad Prism software (version 6.0).
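The non-linear fit of the Michaelis-Menten equation, v = Vmax·[S]/(Km + [S]), was performed in GraphPad Prism. As a rough stand-in for readers who want to reproduce this kind of fit, the Python sketch below performs a least-squares fit with a different but simple technique: for any trial Km the best Vmax has a closed form, so only Km is searched, here by golden-section search. The substrate concentrations are hypothetical points within the 0.05–2.0 mM range, and the rates are synthetic, noise-free values generated from the reported Km and Vmax, not measured data.

```python
import math


def mm_rate(s, vmax, km):
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)


def vmax_for_km(km, s_vals, v_vals):
    """Least-squares Vmax for a fixed Km (the model is linear in Vmax)."""
    x = [s / (km + s) for s in s_vals]
    return sum(v * xi for v, xi in zip(v_vals, x)) / sum(xi * xi for xi in x)


def sse_for_km(km, s_vals, v_vals):
    """Sum of squared residuals at a fixed Km, using the best Vmax for that Km."""
    vmax = vmax_for_km(km, s_vals, v_vals)
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(s_vals, v_vals))


def fit_michaelis_menten(s_vals, v_vals, km_lo=1e-3, km_hi=10.0, iterations=100):
    """Golden-section search over Km within [km_lo, km_hi]; returns (Vmax, Km)."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = km_lo, km_hi
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sse_for_km(c, s_vals, v_vals) < sse_for_km(d, s_vals, v_vals):
            b = d   # the minimum lies in [a, d]
        else:
            a = c   # the minimum lies in [c, b]
    km = (a + b) / 2
    return vmax_for_km(km, s_vals, v_vals), km


# Hypothetical tyrosol concentrations (mM) and synthetic, noise-free rates.
concentrations = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0]
rates = [mm_rate(s, 640.83, 0.74) for s in concentrations]

vmax_fit, km_fit = fit_michaelis_menten(concentrations, rates)
print(f"Vmax = {vmax_fit:.2f}, Km = {km_fit:.3f} mM")
```

The search assumes a single minimum of the residual sum within the Km bounds, which is reasonable for well-behaved kinetic data but should be checked against a plot of the fit for real measurements.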
Molecular docking and physicochemical properties calculations of molecules
The UGT protein structures of A. chinensis were predicted by RoseTTAFold. Firstly, the protein structure of UGT was preprocessed and optimized. Energy minimization of the entire structure was further performed by using the OPLS3-2005 force field. UGTs with reported crystal structures were downloaded from the PDB database (https://www.rcsb.org/). Then sugar donors were inserted into the docking model and the corresponding docking grid was generated by aligning the studied UGT protein structure with Bc7OUGT (PDB ID: 8ITA) in three-dimensional space. Molecular docking calculations were conducted with AutoDock Vina program. Visualization of the docking models and corresponding graphical outcomes was implemented by PyMol.
To verify the protein conformation, we calculated the distances between the N atom of histidine side chain and O atom of the tyrosol 8-OH position, as well as between the C atom of the sugar donor and the O atom of the tyrosol 8-OH position. In addition, we measured the angles from the C atom of the sugar donor to the O atom at the tyrosol 1-OH position and the O atom at the tyrosol 8-OH position. If these distances were <5 Å and the angle was <90°, the protein was considered to be conformationally correct.
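The geometric criteria above (both distances <5 Å and the angle <90°) can be sketched as a small Python check on atomic coordinates. This is an illustrative sketch rather than the script used in the study: the function names and the coordinates are hypothetical, and we assume that the angle is measured at the sugar donor C atom, between the directions to the tyrosol 1-OH and 8-OH oxygen atoms.

```python
import math


def distance(a, b):
    """Euclidean distance between two 3D points, in the units of the input (Å)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angle_deg(p1, vertex, p2):
    """Angle in degrees at `vertex` between the directions to p1 and p2."""
    v1 = [x - y for x, y in zip(p1, vertex)]
    v2 = [x - y for x, y in zip(p2, vertex)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def passes_geometry(his_n, donor_c, tyr_o8, tyr_o1, max_dist=5.0, max_angle=90.0):
    """Apply the distance and angle criteria to one docking pose."""
    d_his = distance(his_n, tyr_o8)       # His N to tyrosol 8-OH O
    d_donor = distance(donor_c, tyr_o8)   # sugar donor C to tyrosol 8-OH O
    # Assumption: the vertex of the angle is the sugar donor C atom.
    ang = angle_deg(tyr_o1, donor_c, tyr_o8)
    return d_his < max_dist and d_donor < max_dist and ang < max_angle


# Hypothetical coordinates (Å) for one favourable and one unfavourable pose.
good_pose = dict(his_n=(0.0, 0.0, 0.0), donor_c=(3.0, 3.5, 0.0),
                 tyr_o8=(3.0, 0.0, 0.0), tyr_o1=(9.0, 0.0, 0.0))
bad_pose = dict(good_pose, tyr_o8=(8.0, 0.0, 0.0))  # 8-OH too far from His N

print(passes_geometry(**good_pose), passes_geometry(**bad_pose))
```

In practice the coordinates would be read from the lowest-energy docking pose of each candidate, and the same function could be applied to all candidates in a loop.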
Each molecular structure was energy-minimized using the MM2 function of ChemBio 3D (version 14.0) and then converted to ‘.sdf’ format individually. The selected molecules were then used for property calculations. The desired physicochemical properties, including structural information of the substrate molecules such as molecular volume and molecular weight (MW), were calculated. The results are listed in Table S4.
Acknowledgements
This work is supported by National Key Research and Development Program of China grant 2022YFA0912100, Natural Science Foundation of China grant 32100413, and a startup fund from Wuhan University.
Conflict of interest
The authors have applied for a patent based on this work.
Author contributions
YY, FC, CW, XC and WC performed the experiments. FC, WC and QW analysed the data. LL and YY conceived the project, formulated the experiments, and prepared the manuscript. ZD and TL discussed the results and contributed to the final manuscript. All authors read and approved the final manuscript.
Open Research
Data availability statement
RNA-Seq raw reads have been submitted to NCBI as a BioProject under accession PRJNA930593. Sequences of UGTs in this study were provided in Table S1.