Rapid degradation of DHX36 revealing its transcriptional role by interacting with G-quadruplex
Ziang Lu, Jinglei Xu, and Yuqi Chen contributed equally to this study.
Abstract
Accumulating evidence indicates that G-quadruplexes (G4s) are involved in transcriptional regulation. Previous studies have demonstrated that DHX36 preferentially resolves G4s, suggesting its potential impact on gene transcription mediated by these structures. However, systematic validation is required to establish a link between DHX36 activity and its roles in transcriptional regulation. In this study, we investigate the role of DHX36 in transcription. First, we employ the cleavage under targets and tagmentation (CUT&Tag), an efficient method for mapping protein–DNA interactions, to identify the binding sites in the chromatin of MCF-7 cells. Subsequently, we use the auxin-inducible degron (AID) protein degradation system and improved nascent RNA sequencing method acrylonitrile-mediated uridine-to-cytidine conversion sequencing (AMUC-seq) to pinpoint genes directly regulated by DHX36. Our results reveal a significant enrichment of G4 structures at DHX36 target sites, predominantly located in active genomic regions. In vitro assays further demonstrate DHX36's interaction with G4 sequences from three specific oncogenes. These findings underscore the potential role of DHX36 in modulating gene transcription through G4 structures.
1 INTRODUCTION
The G-quadruplex (G4) is a four-stranded nucleic acid secondary structure formed in guanine-rich sequences via Hoogsteen hydrogen bonds. Bioinformatics analysis using various sequencing methods has revealed that G4s are widespread in human RNA and chromatin DNA.[1-3] Genome-wide sequencing methods such as G4 ChIP-seq and G4 cleavage under targets and tagmentation (G4 CUT&Tag) not only detect the distribution regions of G4s but also explore their specific functions.[4, 5] Further studies have shown that G4s are involved in numerous biological processes, including the maintenance of genome stability, transcription, and translation.[6-11] In genomic DNA, G4 sites are particularly enriched in promoters, which are regions involved in gene regulation. In multiple oncogenes such as c-MYC, BCL2, and PDGF-A, G4 structures within promoters act as transcriptional repressors.[12-14] Additionally, G4-stabilizing ligands have been shown to broadly affect gene expression across the transcriptome.[15] Although initially hypothesized to be transcriptional roadblocks, further evidence indicates that G4s are associated with transcriptional enhancement.[16] These studies provide evidence that G4 structures are involved in transcriptional regulation. However, the specific regulatory roles and underlying mechanisms of G4 regulation remain unclear.
Numerous studies suggest that the recognition and resolution of G4s by helicases may play a significant role in transcription regulation. Helicases from the DEAD box and RecQ-like families have been identified for their ability to recognize or resolve G4 structures.[17, 18] Helicases such as XPB and XPD, which are enriched near transcription start sites (TSSs), bind to DNA G4s.[19] DHX36, a conserved helicase from the DEAD box family, is known to resolve both DNA and RNA G4 structures.[18] The X-ray crystallography structure of DHX36 has provided insights into the structural mechanism of G4 unfolding.[20] Furthermore, DHX36 affects the expression of genes such as YY1 and PITX, which contain G4 structures, suggesting its potential role in transcription regulation by unfolding G4s.[21, 22] However, the transcriptome-wide targets of DHX36 and its interactions with G4 structures need to be systematically studied.
Here, we investigate the function of DHX36 using a series of techniques. First, a CUT&Tag assay reveals that the genomic binding sites of DHX36 are enriched in promoter regions and overlap extensively with known G4 sites. These results suggest that DHX36 plays a role in transcription regulation through G4 interaction. Subsequently, using the nascent RNA sequencing method acrylonitrile-mediated uridine-to-cytidine conversion sequencing (AMUC-seq) alongside the auxin-inducible degron (AID) protein degradation system, we identify differentially expressed genes (DEGs) after the depletion of DHX36. Because regulation requires interaction, we treat DEGs containing DHX36 binding sites as high-confidence gene targets. To further elucidate the role of G4 in transcriptional regulation, we analyze public data from the assay for transposase-accessible chromatin with sequencing (ATAC-seq) and from histone modification assays. Our findings suggest that DHX36 tends to influence active genes, many of which harbor G4 sites. Additionally, we confirm DHX36's binding and unfolding of G4 sequences in three oncogenes. Taken together, these results suggest that DHX36 plays an important role in transcription.
2 RESULTS
2.1 DHX36 binds G4-enriched transcriptional regulatory regions
DHX36, a well-established helicase, is recognized for its ability to resolve G4 structures, which are essential for normal cellular functions.[6-11, 19] MCF-7, a cell line derived from human breast cancer cells, is widely used as a model cell line for cancer research.[23] Studying the activity of DHX36 in MCF-7 cells is crucial for understanding the role of helicases in transcriptional regulation, particularly in G4-related oncogene expression. We utilized CUT&Tag, an advanced method that builds upon chromatin immunoprecipitation (ChIP) principles with notable improvements, to explore protein–DNA interactions.[24] Using 100,000 MCF-7 cells and an anti-DHX36 antibody for library preparation, our next-generation sequencing and subsequent data analysis identified 10,438 high-confidence peaks as DHX36 binding sites (Figure S1). To display the distribution of these binding sites, we annotated these peaks with gene annotation references and found significant enrichment in promoter regions and 5′-untranslated regions (5′-UTR) compared to the overall genome (Figure 1A,B). A clustered heatmap further illustrated that these reads were concentrated near TSSs (Figure 1C). These results suggest that DHX36 is biased toward binding at promoter regions and might be involved in transcriptional regulation.

To investigate whether DHX36 binding sites corresponded with G4 structures, we compared these binding sites to both predicted G4 and known G4 sites in the genome. Random background regions with the same number and length as the peaks were obtained using the bedtools shuffle command, repeated three times. We classified predicted G4s (PQS) into four categories using custom scripts, prioritizing them from canonical to noncanonical G4. The proportion of peaks containing the four kinds of PQS was calculated using a custom script described below. The enrichment of PQS was determined as the base two logarithm of the difference between the proportion of peaks and background. The analysis revealed that 93.99% of the peaks contain PQS, and 29.96% feature canonical PQS (Figure 1D). In the promoter region, these proportions increased to 98.57% and 37.31%, respectively. Conversely, the random background comparison showed an approximate eightfold decrease in normal loop (Figure S2). These results suggest that there are a significant number of PQS within the binding sites of DHX36. Among the existing G4 detection methods, we selected the classical in vitro method G4-seq and the new in vivo method G4 CUT&Tag. We then looked for public sequencing data in the MCF-7 cell line. Ultimately, only G4 CUT&Tag data were available for MCF-7 cells, and we chose G4-seq data from NA18507 cells from the initial research.[25, 26] The positions of peaks and random background were compared with G4 CUT&Tag and G4-seq sites using the bedtools intersect command. Finally, the rate of G4 overlap was calculated by dividing the number of peaks containing G4 by the total number of peaks. The results showed 77.6% and 39.6% of the peaks overlap with known G4 sites with an increased proportion in promoter regions (91.7% and 60.9%) (Figure 1E). Additionally, MEME motif searches underscored the prevalence of G4 motifs, particularly in promoter regions (Figure 1F). 
These results corroborate the association between DHX36 and G4 structures, particularly those located in promoter regions, suggesting that DHX36 may play a role in regulating gene expression via G4 structures.
2.2 The AID system achieves rapid degradation of endogenous DHX36
In transcriptional regulation, regulatory proteins play a pivotal role by binding to gene regulatory regions and modulating mRNA levels. To further explore whether DHX36 influences gene transcription, we employed nascent RNA-seq in DHX36 knock-down/knock-out (KD/KO) cell lines. Traditional KD/KO assays, which persistently affect protein expression, often result in secondary changes at the mRNA level, making it challenging to distinguish directly regulated genes from those that are indirectly affected. To detect the genes directly regulated by DHX36 and mitigate these secondary effects, we applied the AID AtAFB2-miniIAA7 system.[27] This system allows for the rapid degradation of DHX36 upon auxin induction, enabling us to observe immediate alterations in transcriptional activity.
The AID system involves the fusion of a degron tag, derived from the Aux/IAA proteins, to the target protein of interest. This degron tag is recognized by the TIR1/AFB protein, which belongs to the SKP1–CUL1–F-box (SCF) ubiquitin ligase complex. For the AID system to function in non-plant cells, the host cells must express the F-box protein. Upon the addition of indole-3-acetic acid (IAA), the interaction between the F-box protein and the degron tag fused to the target protein is facilitated. The binding of IAA to the F-box protein promotes the recruitment of the target protein to the SCF complex. Subsequently, the SCF complex ubiquitinates the degron-tagged target protein, marking it for recognition by the 26S proteasome. The proteasome then degrades the target protein, leading to a rapid reduction in its cellular levels. The expression of the F-box protein and the degron was achieved through two rounds of CRISPR/Cas9 gene editing (Figure 2A). To enhance selection efficiency, we fused fluorescent proteins and added antibiotic resistance markers. Specifically, AtAFB2-mCherry and blasticidin resistance were inserted into the AAVS1 safe harbor locus. Meanwhile, miniIAA7-EGFP and puromycin resistance were inserted at the C-terminal of DHX36. After transfection for gene editing, cells resistant to antibiotics were selected through antibiotic treatment, whereas cells expressing fluorescence were sorted using monoclonal cell sorting (Figure S3). Monoclonal cell lines with homozygous insertions were obtained through genome PCR verification (Figure S4). Following two rounds of cell editing and sorting, the resulting cell line was named MCF7-AtAFB2-DHX36.

To assess rapid protein degradation, we conducted live-cell confocal imaging of MCF7-AtAFB2-DHX36 cells alongside wild-type MCF-7 cells (Figure 2B). In MCF7-AtAFB2-DHX36 cells, AtAFB2-mCherry was observed in the cytoplasm, indicated by red fluorescence, whereas DHX36-EGFP was localized in the nucleus, indicated by green fluorescence. Notably, DHX36-EGFP fluorescence disappeared within 2 h after IAA addition. Western blot analysis confirmed significant degradation within 60 min of IAA addition, with complete depletion observed after 75 min. Even after removing IAA from the cell culture for 8 h, DHX36 remained at a low level (Figure 2C). These results validate the efficiency of the AID system in enabling differential gene expression analysis through nascent RNA-seq.
2.3 Rapid degradation of DHX36 leads to differential gene expression in nascent RNA
In previous work, we established AMUC-seq, an enrichment-free nascent RNA sequencing method based on acrylonitrile-mediated uridine-to-cytidine conversion via 4-thiouridine (s4U) cyanoethylation.[28] Here, we performed AMUC-seq in MCF7-AtAFB2-DHX36 cells with and without the addition of s4U and IAA (Figure 3A). Mutation analysis demonstrated a significant increase in the T-to-C mutation rate following s4U addition (Figure S5), confirming successful labeling of nascent RNA. We subsequently analyzed gene expression at both the total mRNA and s4U-labeled mRNA levels. Differential fold changes and TPM (transcripts per million) expression levels were visualized in volcano plots (Figure 3B,C). Of the analyzed genes, 2635 exhibited a fold change >1.5 in labeled mRNA and a TPM >1 in total mRNA. We defined these genes as “regulated genes”, whereas the others were categorized as “unregulated genes”. Both regulated and unregulated genes showed smaller fold changes in total mRNA than in labeled mRNA, indicating that nascent RNA-seq provides higher resolution than traditional RNA-seq.
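The selection rule above (fold change >1.5 in labeled mRNA and TPM >1 in total mRNA) can be illustrated with a minimal sketch. The function names, the pseudocount, and the choice to treat the fold-change cutoff symmetrically (up or down) are assumptions made for illustration; they are not taken from the authors' custom script.

```python
from typing import Dict


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million: length-normalise counts, then scale the sum to 1e6."""
    rate = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rate.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    return {g: r / total * 1e6 for g, r in rate.items()}


def is_regulated(labeled_ctrl: float, labeled_iaa: float, total_tpm: float,
                 fc_cutoff: float = 1.5, tpm_cutoff: float = 1.0,
                 pseudo: float = 0.01) -> bool:
    """Hypothetical filter: expressed in total mRNA and changed in labeled mRNA.

    The pseudocount (assumed) avoids division by zero; the cutoff is applied
    in both directions (assumed) so that up- and downregulated genes qualify.
    """
    fold_change = (labeled_iaa + pseudo) / (labeled_ctrl + pseudo)
    changed = fold_change > fc_cutoff or fold_change < 1.0 / fc_cutoff
    return total_tpm > tpm_cutoff and changed
```

In this sketch a gene with low total-mRNA abundance is never called regulated, however large its labeled-mRNA fold change, which mirrors the two-part threshold described in the text.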

Given that transcription regulation often involves close interaction with genomic sequences, we extracted 8533 genes bound by DHX36 from 10,438 CUT&Tag peaks. Specifically analyzing these genes with identified DHX36 binding sites, we observed higher expression levels in both total and nascent mRNA (Figure S6). Differential analysis of these genes also revealed significant differences in labeled RNA levels (Figure 3D,E). Among these genes, 916 genes containing 1011 DHX36 CUT&Tag peaks met the differential expression threshold and were classified as high-confidence regulated targets of DHX36. The nascent RNA levels of these genes in control and treated samples were depicted in a clustered heatmap (Figure S7). The clustering heatmap shows both up- and downregulated differences between the control and treatment groups, as well as basic stability between the two biological replicates. Gene Ontology (GO) analysis reveals that targets of DHX36 are significantly associated with protein transport and protein binding function (Figure S8). In summary, rapid degradation of DHX36 caused a rapid change in nascent mRNA levels. Moreover, a subset of these genes contains DHX36 binding sites, supporting the notion that DHX36 binds to these 916 target genes and influences their transcription.
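The intersection described above can be sketched as a simple set operation. The data layout (a peak-to-gene mapping) and all identifiers are hypothetical; the sketch only illustrates why the number of target peaks (1011) can exceed the number of target genes (916), since one gene may carry several peaks.

```python
from typing import Dict, Iterable, Set, Tuple


def high_confidence_targets(
    regulated_genes: Iterable[str],
    peak_to_gene: Dict[str, str],
) -> Tuple[Set[str], Dict[str, str]]:
    """Keep peaks whose annotated gene is also differentially expressed.

    peak_to_gene maps a peak identifier to its annotated gene (assumed layout).
    Returns the set of target genes and the peaks that fall in them.
    """
    regulated = set(regulated_genes)
    target_peaks = {p: g for p, g in peak_to_gene.items() if g in regulated}
    target_genes = set(target_peaks.values())
    return target_genes, target_peaks


# Toy example with made-up identifiers: two peaks map to the same regulated gene.
genes, peaks = high_confidence_targets(
    ["GENE_A", "GENE_C"],
    {"peak1": "GENE_A", "peak2": "GENE_A", "peak3": "GENE_B"},
)
```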
2.4 DHX36 tends to affect transcription via G4 in active chromatin region
Transcription in eukaryotic cells is a complex and tightly regulated process influenced by various factors, including chromatin accessibility and histone modifications.[29] To investigate the chromatin state in DHX36 target regions, we selected available public datasets containing ATAC-seq, H3K4me3, H3K9me3, and H3K27me3 CUT&Tag experiments in MCF-7 cells.[30] The distribution of reads in 1011 DHX36 target sites was computed (Figure S9).[31] All four types of modification signals showed concentration at the peak center. Specifically, ATAC-seq and H3K4me3 signals were observed in approximately 75% of the peaks. In contrast, H3K9me3 and H3K27me3 signals appeared more widely distributed but concentrated in the remaining 25% of the peaks. These results indicate that DHX36 targets regions with active chromatin features such as accessibility and H3K4me3 modification. The distribution of reads near the TSSs of 916 target genes was also calculated (Figure 4A). Similarly, ATAC-seq and H3K4me3 data showed higher read density near the TSSs of DHX36 target genes compared to H3K9me3 and H3K27me3, which exhibited lower read density. These findings suggest that DHX36 predominantly influences genes that are actively transcribed at promoter regions.

G4s located in promoter regions have been suggested to affect transcription both negatively, such as in cancer gene repression, and positively, such as by stabilizing R-loops.[8, 9] The formation of G4 structures requires double-stranded DNA (dsDNA) to be unwound, which occurs in open chromatin regions. These processes can be further explored using genome-wide sequencing methods like G4 ChIP-seq and G4 CUT&Tag.[5, 6] To confirm the involvement of G4s in DHX36 regulation, we reanalyzed the 1011 binding sites within the 916 target genes for predicted G4 sequences (PQS) and known G4 structures. Our findings show that 96.64% of the binding sites contain PQS, and 33.33% contain canonical PQS (Figure 4B). Additionally, 85.7% and 53.1% of the binding sites overlap with G4-seq and G4 CUT&Tag sites, respectively. In promoter regions, these proportions rose to 92.5% and 64.0%, respectively (Figure 4C). G4 motifs also rank highly within DHX36 target regions, occupying the second and third positions (Figure S10). The high correlation between G4 and DHX36 target regions indicates that the regulatory mechanism of DHX36 may involve the G4 structure.
Upon annotating the 1011 DHX36 target sites, 75.47% were found to be promoter regions, and 3.46% were transcript terminal sites (Figure S11A). These sites were distributed within 1 kbp upstream and downstream of the TSS of target genes, with enrichment at the center (Figure S11B). To further confirm the impact of G4 structures on DHX36 target genes, we conducted a dual-luciferase reporter assay (Figure S12A). Two hundred base pair sequences upstream of the TSS containing potential G4 or G4-mutated sequences were inserted into firefly luciferase plasmids as promoter. These plasmids were co-transfected with Renilla luciferase plasmids for internal reference. Luciferase activity detection shows that G4 mutation significantly reduced firefly/Renilla relative expression level in FANCC and MYO5A (Figure 4D). This disruption in G4 structure correlated with reduced gene expression, consistent with the observed rapid degradation of DHX36 in AMUC-seq experiments (Figure S12B). However, BAZ2A showed no significant change, possibly due to minimal structural alteration post-G4 mutation. This was further supported by weaker structural changes observed in BAZ2A when the sequence was mutated or in the presence of Li+ (Figure S13). The dual-luciferase reporter assay highlighted DHX36's role in regulating target genes by interacting with G4 structures within promoter regions.
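The firefly/Renilla normalization used in the reporter assay can be written out as a short sketch. The function names and the step of scaling mutant ratios to the wild-type mean are illustrative assumptions, not the authors' documented analysis.

```python
from statistics import mean
from typing import List, Sequence


def relative_luciferase(firefly: Sequence[float], renilla: Sequence[float]) -> List[float]:
    """Per-replicate firefly/Renilla ratio; Renilla serves as the internal reference."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired")
    return [f / r for f, r in zip(firefly, renilla)]


def normalized_to_wild_type(wt_ratios: Sequence[float],
                            mut_ratios: Sequence[float]) -> List[float]:
    """Express G4-mutant ratios relative to the mean wild-type ratio (assumed convention)."""
    wt_mean = mean(wt_ratios)
    return [m / wt_mean for m in mut_ratios]
```

A normalized value below 1 for the mutated construct would correspond to the reduced reporter expression reported for FANCC and MYO5A.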
In summary, DHX36 can affect the transcription process at the nascent RNA level. Moreover, DHX36 tends to affect the active gene promoter region with G4 structure, indicating that DHX36 may regulate these target genes through their G4 structure.
2.5 DHX36 unfolds oncogene G4 structure in vitro
We next asked whether DHX36 can bind to and resolve G4 structures within identified target genes. We expressed recombinant DHX36 protein, fused to a trigger factor (TF) chaperone protein, using the pCold-TF vector (Figure S14). This chaperone protein is designed to assist in the correct folding of recombinant proteins expressed in Escherichia coli. It is particularly useful in enhancing the solubility and proper folding of proteins that tend to aggregate or misfold when overexpressed in bacterial systems. Recognizing the well-established role of G4s in oncogene promoters, we selected several additional target genes that were downregulated following DHX36 degradation, as identified from cancer-related databases (Figure S15B).[12-14] For these genes, we chose the G4 sequences with the highest G4-Hunter scores from their binding sites, which also coincided with known G4 sites from G4 CUT&Tag or G4-seq (Figure S15A).[32] In the presence of K+, ASH2L forms a parallel G4 structure, and HDAC6 and ADSL form hybrid G4 structures (Figure S16). To verify the ability of DHX36 to bind to these G4s, we performed electrophoretic mobility shift assays (EMSA) with FAM-labeled G4 and G4-mutated sequences (Figure S17). DHX36 bound significantly to labeled G4 DNA annealed with K+, forming a DHX36-G4 complex band at the top of the gel. However, little binding was found in the presence of Li+ or in the absence of K+, suggesting that DHX36 tends to bind folded G4 DNA rather than single-stranded DNA (Figure S18). The same results were obtained for the G4 sites of the three genes validated above (Figure S19). Importantly, no binding to mutated G4 was observed, indicating that DHX36 specifically binds these G4 sequences.
Fluorescence resonance energy transfer (FRET) is frequently used to verify G4 unfolding. Inspired by a DHX36 RNA G4 unfolding experiment from Lyu et al., we designed a similar assay in which G4 DNA, labeled with Cy5 at the 3′ end and Cy3 at the 5′ end, was mixed with complementary DNA (Figure S20A).[33] DHX36 was expected to resolve the G4 DNA into dsDNA, resulting in a decreased FRET effect and increased Cy3 fluorescence. Our EMSA using Cy3–Cy5-labeled DNA and unlabeled complementary DNA revealed that the migration of the G4 DNA band was retarded by the addition of complementary DNA (Figures S20B and S21). Similarly, in the fluorescence spectrum of the dsDNA group, the addition of DHX36 significantly enhanced the Cy3 fluorescence at 570 nm and reduced the Cy5 fluorescence at 670 nm (Figure S22). However, in the ssDNA group, only the three validated genes showed a decrease at 570 nm and little change at 670 nm. The double-stranded design verifies that DHX36 reduced the FRET effect of the target G4s, which means that DHX36 changed these G4 structures in vitro. This finding provides evidence that DHX36 might regulate transcription through G4 unfolding.
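The expected readout (Cy3 emission at 570 nm rising while Cy5 emission at 670 nm falls) can be summarized with an apparent FRET proximity ratio. This is a minimal sketch: the proximity ratio is a common uncorrected approximation of FRET efficiency rather than the authors' stated metric, and the intensity values below are invented for illustration.

```python
def proximity_ratio(donor_570: float, acceptor_670: float) -> float:
    """Apparent FRET efficiency: acceptor / (donor + acceptor) emission.

    Uncorrected for spectral bleed-through and detection efficiency, so it is
    only suitable for comparing conditions measured on the same instrument.
    """
    total = donor_570 + acceptor_670
    if total <= 0:
        raise ValueError("total emission must be positive")
    return acceptor_670 / total


# Hypothetical intensities: folded G4 (ends close) vs. duplex after unfolding.
folded = proximity_ratio(donor_570=200.0, acceptor_670=800.0)
unfolded = proximity_ratio(donor_570=600.0, acceptor_670=400.0)
```

A lower ratio after DHX36 addition would be consistent with the 5′ and 3′ labels moving apart as the G4 is converted into dsDNA.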
3 DISCUSSION
As a member of the DEAD box family of helicases, DHX36 has been demonstrated to resolve G4 structures in vitro and exhibits a specific affinity for promoter regions, which are critical for transcription. The variable impact of G4 structures on transcription suggests a probable association of DHX36 with transcriptional processes.[16] In addition to G4 sequences, motif analysis of DHX36 reveals similarities to motifs found in zinc finger proteins (Figure S23), suggesting potential interactions similar to those observed in zinc finger proteins. This motif similarity hints at DHX36 possibly interacting with similar genomic regions and playing roles in transcription. Similar biases in binding region and roles in transcription may also be shared by other G4 helicases, such as BLM from the RecQ-like family. This points to a broader, significant role for these helicases in transcription regulation.
As a common method for studying gene expression, nascent RNA-seq measures transient changes in mRNA expression with higher resolution than conventional RNA-seq. Combining the AID rapid protein depletion approach with nascent RNA-seq enables the immediate and direct assessment of protein absence over a defined short period. Rapid degradation of DHX36 leads to both gene upregulation and downregulation, supporting the hypothesis that G4 may not function only as a transcription repressor. By integrating DHX36 CUT&Tag and AMUC-seq, DEGs containing binding sites are identified as regulatory targets of DHX36. Notably, the presence of non-DEGs with DHX36 binding sites may be due to either non-specific DHX36 binding or inherent limitations of the CUT&Tag assay. Conversely, DEGs lacking DHX36 binding sites could potentially reflect indirect effects of DHX36's broader regulatory roles.
The public ATAC-seq and histone modification datasets show that DHX36 binds and affects active genes rather than silent genes, supporting the idea that DHX36 acts on functional genomic DNA located in open chromatin regions. The high G4 overlap ratio suggests that DHX36 might regulate transcription by unfolding G4.
Given the increasing focus on G4 structures within oncogenes as potential therapeutic targets and the use of G4-stabilizing ligands in anticancer treatments, our findings highlight DHX36's relevance in cancer research.[34] DHX36 is involved in diverse cell differentiation processes such as spermatogenesis and heart development.[35, 36] In addition, DHX36 plays a role in multiple types of cancer, including lung cancer, breast cancer, and colon cancer cells.[37-39] The in vitro unfolding of G4 structures within oncogenes, such as HDAC6 and ADSL, underscores the potential of targeting DHX36-mediated mechanisms in cancer therapy.[40, 41] The comprehensive mapping of DHX36 targets across the genome lays the groundwork for future research into its role in both cancer and transcription regulation.
In conclusion, our work reveals the binding sites of DHX36 in the genome and the transcriptional regulation targets of DHX36 at the nascent RNA level. We demonstrate that DHX36 tends to affect active genes in the G4-enriched promoter region and to unfold the oncogene G4 in vitro. However, the specific mechanism of regulation is still unclear. These findings enhance our understanding of DHX36's role in transcriptional regulation through G4 structures, providing new avenues for investigating gene expression mechanisms and potential therapies for cancer.
4 EXPERIMENTAL SECTION
4.1 Cell culture
All cells were cultured in Dulbecco's modified Eagle's medium (Gibco) containing 10% fetal bovine serum, 1 U/mL penicillin, and 1 µg/mL streptomycin (Gibco) in an incubator at 37°C with 5% CO2.
4.2 CUT&Tag and data analysis
Wild-type MCF7 cells were used for the CUT&Tag assay. The CUT&Tag assay and library construction were performed according to the manufacturer's protocol for the CUT&Tag kit (TD903; Vazyme) with 1:100 DHX36 Rabbit pAb (ABclonal). The libraries were sequenced on the Illumina HiSeq X Ten platform. Adapters in raw reads were trimmed with Trim Galore (v0.6.1), and reads were mapped to the human genome (hg38) with bowtie2 (v2.4.1).[42] Duplicate reads were removed from unique reads using Picard MarkDuplicates (v2.23.3). The clean reads were used to call peaks using MACS2 (v2.2.7.1). After removing peaks in the blacklist (https://github.com/Boyle-Lab/Blacklist/tree/master/lists), the highly reproducible peaks between biological replicates were obtained using IDR (v2.0.4.2).[43] The annotation and enrichment analysis of the highly reproducible peaks were performed with Homer (v4.11). In addition, the proportion of peaks in each region was calculated and compared to the proportion of the whole-genome length occupied by each region. Heatmaps and density diagrams were generated with deepTools plotHeatmap (v3.5.1).[31] Random background regions with the same number and length as the peaks were obtained with bedtools shuffle (v2.30.0), repeated three times. The proportion of peaks containing each of the four categories of potential G4 sites (PQS) was calculated with a custom script, which is described below. The enrichment of PQS was the base two logarithm of the difference between the proportion of peaks and background. The positions of peaks and random background were then compared with G4 CUT&Tag and G4-seq sites using bedtools intersect.[25, 26] Finally, the rate of G4 overlap was calculated by dividing the number of peaks containing G4 by the total number of peaks. G4-seq and G4 CUT&Tag data were downloaded from GSE63874 and GSE181373 in the Gene Expression Omnibus (GEO) database, respectively. Motif analysis was performed with MEME in anr mode.
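The overlap rate and enrichment calculations described above can be sketched in plain Python. This is not the bedtools implementation; it only reproduces the arithmetic on BED-style intervals. Reading the "difference" between peak and background proportions as a fold ratio is an assumption made here, and the authors' script may compute it differently.

```python
import math
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlap_rate(peaks: List[Interval], sites: List[Interval]) -> float:
    """Fraction of peaks that contain or touch at least one G4 site."""
    if not peaks:
        return 0.0
    hits = sum(1 for p in peaks if any(overlaps(p, s) for s in sites))
    return hits / len(peaks)


def log2_enrichment(peak_rate: float, background_rate: float) -> float:
    """log2 of peak proportion over background proportion (ASSUMED fold-ratio reading)."""
    if peak_rate <= 0 or background_rate <= 0:
        raise ValueError("rates must be positive to take a log ratio")
    return math.log2(peak_rate / background_rate)


# Toy intervals with invented coordinates.
toy_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
toy_g4 = [("chr1", 150, 160), ("chr2", 300, 400)]
toy_rate = overlap_rate(toy_peaks, toy_g4)
```

The quadratic scan is adequate for a toy example; for genome-scale peak sets a sorted sweep or the bedtools commands named in the text would be the practical choice.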
4.3 Predicted G-quadruplex site analysis
PQS were searched using hierarchical assignment. PQS were classified into four categories, namely Normal Loop, Long Loop, Simple Bulge, and Complex Bulge/Two-tetrads, based on published articles.[4, 44] The basic G4 sequence pattern was G{3,}N{1,7}G{3,}N{1,7}G{3,}N{1,7}G{3,}. Normal Loop and Long Loop were G4 sequences without bulges whose loops were 1–7 bases long or included at least one loop longer than 7 bases (up to 12 for any loop and 21 for the middle loop), respectively. Simple Bulge was a G4 sequence with multiple 1-base bulges or a single 1–7-base bulge. Complex Bulge was a G4 sequence with multiple 1–5-base bulges or with two guanines in every G-repeat. Two-tetrads was a G4 sequence with two guanines in every G-repeat. The latter two classes were G4 sequences with 1–7-base loops. PQS types in peaks were classified following the priority rules above. All PQS analyses were performed by regular expression matching using custom Perl scripts (https://github.com/xinnnluuu/sci_bioinformatics_script).
4.4 Plasmid construction
For CRISPR/Cas9 editing, different oligo pairs (Genecreate) encoding the 20-nt guide sequences were inserted into the BbsI site of pSpCas9(BB)-2A-GFP to generate all sgRNA-Cas9 plasmids. pSH-EFIRES-B-AtAFB2-mCherry containing blasticidin resistance was used as the Donor plasmid for AtAFB2 insertion. The homology arm sequence flanking the editing site of DHX36 was synthesized (GenScript) and inserted into the vector pBluescript II KS(+). The insert sequence containing miniIAA7, EGFP, and puromycin resistance was amplified by PCR from pLVX-miniIAA7-mEGFP-Puro and inserted into the BamHI site of the homology arms to generate the Donor plasmid for DHX36 editing.
For recombinant protein expression, the codon-optimized DHX36 full coding sequence containing BamHI and EcoRI sites was synthesized (GenScript) and inserted into the pUC57-Kan vector. The DHX36 sequence was digested and inserted into the pCold-TF vector to generate pCold-TF-DHX36 for protein expression.
For the dual-luciferase reporter assay, promoter sequences of selected genes were amplified from MCF-7 genomic DNA by PCR. The pGL4.23 vector was digested at the KpnI and HindIII sites. Insert sequences containing G4 were amplified by PCR and ligated with the linearized vector using the pEASY-Basic Seamless Cloning and Assembly Kit (TransGen Biotech). G4 mutation was performed with QuickMutation Plus (Beyotime).
T4 ligase (Thermo Fisher Scientific), BbsI-HF, BamHI-HF, EcoRI-HF, KpnI-HF, and HindIII-HF (NEB) were used in the above experiments. Standard PCR was performed with FastPfu DNA polymerase (TransGen Biotech). Genomic PCR was performed with PrimeSTAR GXL DNA Polymerase (TaKaRa).
4.5 CRISPR/Cas9
All sgRNAs were designed on Benchling (https://www.benchling.com/crispr), and three closely spaced editing sites were designed for each edit. Plasmids were constructed as described above. Cells were plated onto 24-well plates and transfected with Lipofectamine 3000 according to the manufacturer's protocol when they reached 70% confluency. Cells were divided into three groups, and two sgRNAs were used in each group. For each group, 400 ng sgRNA-Cas9 plasmid A, 400 ng sgRNA-Cas9 plasmid B, 600 ng Donor plasmid, 2.8 µL p3000, and 1.5 µL lipo3000 were used. Cells were transferred to 6-well plates and cultured with fresh medium 12 h after transfection. Blasticidin (10 µg/mL) or puromycin (1 µg/mL) (Solarbio) was added 1.5 days after transfection and removed after 3 days. Cells were then collected for FACS. Wild-type MCF7 cells were used in the first round of editing, and the MCF7-AtAFB2 cell line was used in the second.
4.6 FACS
Selected cells were plated onto 6-well plates and grown until they reached 70% confluency. The cells were collected in PBS containing 1% FBS and stored on ice before FACS analysis. FACS analysis was performed on an S3e Cell Sorter (Bio-Rad). EGFP was excited with a 488 nm laser and detected with a 530/40 detector. mCherry was excited with a 561 nm laser and detected with a 615/20 detector. Wild-type MCF7 cells were used as the fluorescence background. Single clones with high fluorescence were sorted into 96-well plates for further PCR screening.
4.7 Selection of homozygously edited cell lines
Editing outcomes on the two alleles can be distinguished by the PCR products obtained with a single primer pair flanking the editing site: unedited clones yield only the short product, heterozygously edited clones yield both the short and long products, and homozygously edited clones yield only the long product. For each edit, primers flanking all three sites were designed. Genomic DNA was extracted from each single clone with the EasyPure Genomic DNA Kit (TransGen Biotech), and only homozygous single clones were selected by PCR with PrimeSTAR GXL DNA Polymerase (TaKaRa).
4.8 Live-cell imaging
MCF7-AtAFB2-DHX36 cells were plated onto Glass Bottom Cell Culture Dish (NEST). Images were obtained using Leica TCS SP8 DIVE microscope equipped with an HC PL APO CS2 × 63.0/1.40 OIL objective and gadolinium hybrid (HyD) detectors, with 488 and 561 nm laser excitation. Cells with appropriate and consistent cell density were selected under the brightfield observations. The data of all channels of these cells were collected subsequently and comprehensively analyzed by ImageJ software.
4.9 Western blot
A total of 10⁶ MCF7-AtAFB2-DHX36 cells were lysed in 100 µL RIPA buffer (50 mM Tris–HCl pH 7.4, 150 mM NaCl, 1.0% Triton X-100, 0.1% SDS, 1% sodium deoxycholate) containing 1× Halt protease and phosphatase inhibitor cocktail (Thermo Scientific) at 4°C for 30 min. Lysates were cleared by centrifugation at 12,000 × g and 4°C for 30 min, and protein was quantified with a BCA protein assay kit (Thermo Scientific) according to the manufacturer's protocol. Equal quantities of protein were mixed with SDS loading buffer, boiled at 95°C for 10 min, and separated on 10% SDS–PAGE gels. The separated proteins were transferred to a PVDF membrane (Millipore) at 70 V for 2.5 h. The membrane was blocked in TBST buffer containing 5% BSA at 37°C for 1 h, incubated with 1:2000 DHX36 Rabbit pAb (ABclonal) at 4°C overnight, and incubated with 1:5000 HRP-conjugated Affinipure Goat Anti-Rabbit IgG (H + L) (Proteintech) at 37°C for 1 h. The membrane was washed three times with TBST at 37°C for 5 min after each incubation. The blots were imaged on a ChemiDoc XRS+ Imaging System (Bio-Rad) with SuperSignal West Pico PLUS (Thermo Scientific).
4.10 AMUC-seq and data analysis
MCF7-AtAFB2-DHX36 cells were plated onto 10 cm dishes and grown to 70% confluency. IAA (500 µM) was added to the medium for 1 h to degrade DHX36, and s4U (500 µM) was then added for another 1 h to label nascent RNA. RNA for library construction was prepared by total RNA extraction, polyA RNA isolation, gDNA removal, RNA fragmentation, and RNA cyanoethylation (45°C, 10 h). Standard RNA-seq libraries were prepared using the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (NEB) according to the manufacturer's instructions. Sequencing was performed on the Illumina HiSeq X Ten platform.
Adapters in raw reads were trimmed with Trim Galore (v0.6.1), and reads were mapped to human rRNA with bowtie2 (v2.4.1) to remove rRNA sequences. The following steps were designed according to the REDItools protocol for collecting mutation information.[45] Reads that did not map to rRNA were mapped to the human genome (hg38) with STAR (v2.7.0d).[46] Duplicate reads were removed from uniquely mapped reads using Picard MarkDuplicates (v2.20.4). Mutation sites were collected and annotated with REDItools, and SNP sites were removed. Mutation rate calculation and extraction of s4U-labeled reads were performed with a custom script (https://github.com/xinnnluuu/sci_bioinformatics_script). Raw counts of s4U-labeled mRNA and all mRNA were obtained with gencode.v39.annotation using HTSeq (v0.13.5).[47] The TPM expression matrix and fold changes were calculated with a custom script. Data statistics and heatmaps were generated with the R packages ggplot2 and pheatmap. ATAC-seq and histone modification data were obtained from GSE201262. Heatmaps of these data were generated with deepTools (v3.5.1).
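For orientation, the TPM and fold-change step can be sketched as follows. This is not the authors' custom script: the function names, the pseudocount, and the example numbers are assumptions chosen for illustration, and only the standard TPM definition is taken as given.

```python
import math


def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts:     dict of gene -> raw read count
    lengths_bp: dict of gene -> transcript length in base pairs
    """
    # Reads per kilobase of transcript for each gene.
    rates = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    # Scale so that the values sum to one million.
    return {g: r / total * 1e6 for g, r in rates.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2(treated / control) per gene; the pseudocount is an assumption."""
    return {g: math.log2((treated[g] + pseudocount) / (control[g] + pseudocount))
            for g in control}


# Hypothetical counts of s4U-labeled reads for two genes.
lengths = {"geneA": 1000, "geneB": 3000}
control_tpm = tpm({"geneA": 100, "geneB": 300}, lengths)
treated_tpm = tpm({"geneA": 50, "geneB": 450}, lengths)
lfc = log2_fold_change(treated_tpm, control_tpm)
```

Applying the same two functions to the labeled-read counts and to the total counts would give nascent and steady-state fold changes on a comparable scale.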
4.11 Dual-luciferase reporter assay
A promoter G4 sequence of approximately 200 bp was inserted into the pGL4.23[luc2 minP] firefly luciferase reporter vector. The pGL4.73 Renilla luciferase plasmid was used as the internal reference. Both plasmids were co-transfected with Lipofectamine 3000 (Thermo Scientific) into cells grown in 12-well plates to 60% confluency. After 1.5 days, cells were lysed, and luciferase activity was determined with the Dual Luciferase Reporter Assay Kit (Vazyme).
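Because Renilla luciferase serves as the internal reference, reporter activity is conventionally expressed as the firefly/Renilla ratio, optionally rescaled to a control condition. The sketch below illustrates that arithmetic; the function names and readings are hypothetical, and the exact normalization used in the study is not specified here.

```python
def relative_luciferase(firefly, renilla):
    """Firefly signal normalized to the Renilla internal reference."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return firefly / renilla


def normalize_to_control(sample_ratios, control_ratios):
    """Express sample ratios relative to the mean control ratio."""
    control_mean = sum(control_ratios) / len(control_ratios)
    return [r / control_mean for r in sample_ratios]


# Hypothetical luminometer readings (arbitrary units), three wells each.
control = [relative_luciferase(f, r)
           for f, r in [(1000, 500), (1100, 550), (900, 450)]]
treated = [relative_luciferase(f, r)
           for f, r in [(2000, 500), (2100, 525), (1900, 475)]]
relative_activity = normalize_to_control(treated, control)
```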
4.12 Cancer database and GO analysis
The cancer-related gene list was downloaded from the website of the Bushman group (http://www.bushmanlab.org/links/genelists). The list was compiled from several cancer databases, including Atlas, CANgenes, CIS (RTCGD), Human Lymphoma, Miscellaneous, Sanger, and Vogelstein. A total of 153 target genes were uploaded to DAVID for GO analysis.[48] Data statistics were displayed with ggplot2.
4.13 Circular dichroism (CD) spectroscopy
The DNA oligonucleotides (Sangon Biotech) were dissolved at 5 µM in 10 mM Tris–HCl buffer (pH 7.5) containing 150 mM KCl or LiCl, heated at 95°C for 10 min, and cooled to 25°C at a rate of 1°C/min. Circular dichroism (CD) experiments were performed at 25°C with a Chirascan CD spectrometer (Applied Photophysics). CD spectra were collected from 200 to 350 nm with a 1.0 mm path length. The bandwidth was 1.0 nm, the scanning speed was 50 nm/min, and the response time was 1.0 s. All spectra were baseline-corrected with buffer, and each measurement was repeated three times. Melting curves were measured at the wavelength of the highest peak for each sample and fitted to a concerted two-state model to calculate the melting temperature.
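A two-state melting fit can be sketched as below. This is an illustration under stated assumptions, not the fitting procedure used in the study: it assumes a van 't Hoff two-state model with flat baselines, normalizes the signal by its minimum and maximum, and estimates the melting temperature by a coarse grid search; the parameter ranges and function names are hypothetical.

```python
import math

R_GAS = 8.314  # gas constant, J/(mol*K)


def fraction_folded(temp_c, tm_c, delta_h):
    """Folded fraction at temp_c for a two-state transition.

    delta_h is the unfolding enthalpy in J/mol (positive).
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    # K = [unfolded]/[folded]; K = 1 at the melting temperature.
    k_unfold = math.exp(-(delta_h / R_GAS) * (1.0 / t - 1.0 / tm))
    return 1.0 / (1.0 + k_unfold)


def fit_tm(temps_c, signal):
    """Grid-search estimate of (Tm in deg C, delta H in kJ/mol).

    Assumes temps_c is ascending and the baselines are flat.
    """
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("signal shows no transition")
    norm = [(s - lo) / (hi - lo) for s in signal]
    if signal[0] < signal[-1]:
        # Signal rises on melting, so invert to get the folded fraction.
        norm = [1.0 - y for y in norm]
    best = None
    for tm_tenths in range(200, 951, 5):       # 20.0 to 95.0 deg C, 0.5 steps
        tm_c = tm_tenths / 10.0
        for dh_kj in range(50, 401, 10):       # 50 to 400 kJ/mol (assumed range)
            sse = sum((fraction_folded(t, tm_c, dh_kj * 1000.0) - y) ** 2
                      for t, y in zip(temps_c, norm))
            if best is None or sse < best[0]:
                best = (sse, tm_c, dh_kj)
    return best[1], best[2]


# Synthetic melting curve with a known Tm of 60 deg C (not measured data).
temps = list(range(25, 96))
ellipticity = [5.0 * fraction_folded(t, 60.0, 200000.0) + 1.0 for t in temps]
tm_est, dh_est = fit_tm(temps, ellipticity)
```

In practice a nonlinear least-squares fit with sloping baselines would usually replace the grid search; the grid search is used here only to keep the example dependency-free.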
4.14 Expression and purification of recombinant protein
The expression plasmid pCold-TF-DHX36 was transformed into BL21 (DE3) Chemically Competent Cells (TransGen Biotech). The cells were cultured in LB medium at 37°C; when OD600 reached 0.6, IPTG was added to 1 mM and the culture was grown at 16°C for a further 12 h. The cells were harvested by centrifugation at 4000 rpm and 4°C for 15 min and resuspended in lysis buffer (50 mM Tris–HCl, pH 7.0, 500 mM NaCl). The suspensions were sonicated on ice for 30 min and centrifuged at 8000 rpm and 4°C for 30 min. The His-tagged recombinant proteins were purified from the supernatant with a HisTrap HP column (Cytiva Life Sciences) on an NGC Discover 10 Chromatography System. The purified protein was quantified with a BCA protein assay kit (Thermo Scientific) and examined by 10% SDS–PAGE.
4.15 Electrophoretic mobility shift assay (EMSA)
The fluorescently labeled DNA oligonucleotides (Sangon Biotech) were annealed as in the CD experiment. For the mixed system, the complementary strand was added at the same concentration before annealing. The binding reaction was performed in a 10 µL mix (10 mM Tris–HCl buffer, pH 7.5, 2 mM MgCl2, 150 mM KCl, 10% glycerol, 1 µM recombinant protein, and 100 nM oligonucleotide) at 37°C for 30 min. For the unwinding reaction, an additional 2 mM ATP was added to the mix. The mix was separated on a pre-electrophoresed 10% non-denaturing polyacrylamide gel (79:1 acrylamide:bisacrylamide) in 1× TBE at 4°C for 1.5 h and detected with a ChemiDoc Touch (Bio-Rad).
4.16 FRET fluorescence spectra
Unlike in the CD experiments, the DNA oligonucleotides were annealed at 10 µM. The composition of the unwinding reaction mix was adjusted accordingly (10 mM Tris–HCl buffer, pH 7.5, 2 mM MgCl2, 150 mM KCl, 5 µM recombinant protein, and 10 nM oligonucleotide). The emission spectrum of the unwinding reaction mix was measured from 530 to 750 nm at an excitation wavelength of 500 nm, with a 10 nm excitation slit width and a 15 nm emission slit width, on an LS 55 Fluorescence Spectrometer (PerkinElmer). A TTTT tail was added to the 3' and 5' ends of the three G4 sequences (MYO5A, BAZ2A, and FANCC).
4.17 Statistical analysis
All statistical analyses were performed in R. t-tests were performed with R functions using the default settings.
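If the R function in question is `t.test`, its default for two samples is a two-sided Welch test, which does not assume equal variances. The sketch below computes the Welch statistic and its degrees of freedom for illustration; it is an assumption about which test was applied, the data are hypothetical, and the p-value (which requires the t distribution) is omitted.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    # Squared standard errors of the two sample means.
    vx = variance(x) / nx
    vy = variance(y) / ny
    t_stat = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t_stat, df


# Hypothetical measurements for two conditions.
t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```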
ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (22037004, 21721005, 92153303, 21907079, 92253202, 22177987). Funding for open access charge: National Natural Science Foundation of China. The numerical calculations in this article were performed on the supercomputing system of the Supercomputing Center of Wuhan University.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
Open Research
DATA AVAILABILITY STATEMENT
The sequencing data that support the findings of this study have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE228815. The custom scripts used in this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.10519025. The oligonucleotide sequences used are shown in Table S1.