Volume 34, Issue 6 e70172
RESEARCH ARTICLE

The WRC domain of GRF transcription factors: Structure and DNA recognition

Franco A. Biglione


Instituto de Biología Molecular y Celular de Rosario (IBR-CONICET-UNR), Santa Fe, Argentina

Área Biofísica, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Santa Fe, Argentina

Contribution: Conceptualization, Methodology, Formal analysis, Writing - review & editing, Writing - original draft, Investigation

Nahuel D. González Schain


Instituto de Biología Molecular y Celular de Rosario (IBR-CONICET-UNR), Santa Fe, Argentina

Contribution: Funding acquisition, Writing - review & editing, Conceptualization

Javier F. Palatnik


Instituto de Biología Molecular y Celular de Rosario (IBR-CONICET-UNR), Santa Fe, Argentina

Contribution: Conceptualization, Writing - review & editing

Rodolfo M. Rasia

Corresponding Author


Instituto de Biología Molecular y Celular de Rosario (IBR-CONICET-UNR), Santa Fe, Argentina

Área Biofísica, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Santa Fe, Argentina

Correspondence

Rodolfo M. Rasia, Instituto de Biología Molecular y Celular de Rosario (IBR-CONICET-UNR), Rosario 2000, Santa Fe, Argentina; Área Biofísica, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Rosario 2000, Santa Fe, Argentina.


Contribution: Conceptualization, Funding acquisition, Investigation, Writing - review & editing, Project administration, Supervision

First published: 29 May 2025

Review Editor: Jeanine Amacher

Abstract

Growth-regulating factors (GRFs) belong to a family of transcription factors found in plants which display important roles in growth and development. GRF transcriptional activity is finely tuned by regulatory processes involving post-transcriptional repression exerted by microRNA miR396, and protein–protein interactions involving a family of co-transcriptional regulators known as GRF-interacting factors (GIFs). In this way, the activity of GRF target genes is modulated by a highly complex interplay between GRF/GIF isoform diversity and expression patterns along with miR396 and GIF gradients throughout plant tissues. At the protein level, GRFs are composed of two highly evolutionarily conserved domains known as QLQ and WRC and a less conserved C-terminal trans-activation domain. Whereas QLQ mediates GRF–GIF interaction by forming a complex with a conserved domain called SNH (by SYT N-terminal homology) found in GIFs' N-terminal region, the WRC has been proposed as a putative zinc finger domain responsible for target DNA recognition and nuclear import. However, the structural aspects governing GRF transcriptional activity and target recognition remain unknown. In this work, we applied bioinformatic and biophysical analysis to comprehensively characterize the structural features that modulate the biological function of this protein family with a focus on the WRC domain. We provide insights into the structure of the WRC domain in GRFs and explore the WRC features driving GRFs:DNA complex formation. These findings offer new insights into how WRC domains modulate the biological functions of GRFs, laying the groundwork for future studies on their structure–function relationship in gene regulation and development of plants.

1 INTRODUCTION

Eukaryotic organisms rely on complex regulatory networks to proficiently perform their biological functions. The differential activity of genes is finely tuned by the action of transcription factors that orchestrate the expression of specific genes through binding to DNA cis-targeting sequences. Among these factors, Growth-regulating factors (GRFs) play a crucial role in plant growth and development, such as the regulation of cell proliferation in leaves (Horiguchi et al., 2005; Kim et al., 2003; Kim & Kende, 2004; van der Knaap et al., 2000). GRFs are widely conserved in land plants, in the form of gene families of between 8 and 20 members. For instance, Arabidopsis thaliana, rice (Oryza sativa), and soybean (Glycine max) possess 9, 12, and 24 GRF members, respectively (Fonini et al., 2020). GRFs were shown to bind to regulatory sequences near target genes and modify their expression levels (Kim et al., 2012; Kuijt et al., 2014; O'Malley et al., 2016; Osnato et al., 2010).

At the structural level, GRFs span between 250 and 500 residues in length and are defined by the presence of two conserved domains at their N-terminus: the QLQ and the WRC domains (Kim et al., 2003) (Figure 1a). The QLQ domain, which has 36 residues, contains a conserved QX3LX2Q motif and drives the interaction with the transcriptional coregulators of the GRF-interacting factor (GIF) family (Kim & Kende, 2004). The WRC domain, in turn, is a putative zinc finger DNA-binding motif, as indicated by the presence of a strictly conserved CX9CX10CX2H motif (Raventós et al., 1998; van der Knaap et al., 2000). Finally, the C-terminal region is highly variable in length and composition and is associated with the transactivation activity of GRF proteins (Choi et al., 2004; Kim & Kende, 2004).

FIGURE 1
The WRC domains can be grouped in three clusters with specific amino acid imprinting. (a) Domain architecture of WRC-containing proteins. (b) Mapping of the clusters identified in the first three singular coordinates of the SVD analysis of WRCs. Projected sequences are color-coded according to K-means clustering results. Known WRC-containing proteins from A. thaliana are highlighted in dark colors. Larger circles denote the coordinates of the consensus sequences calculated for each cluster. (c) Cluster-specific consensus sequences. Each sequence highlights those residues preferentially enriched (in-cluster vs. out-of-cluster enrichment of at least 60%). Representative protein architectures preferentially partitioned to each cluster are shown on the left side of the panel.

The regulatory role of GRFs in proliferating cells extends beyond leaves to encompass other organs in a wide variety of plant species (Lazzara et al., 2024). In this sense, these transcription factors, together with the GIF coactivators, participate not only in the growth of leaves but also in roots, stems, and inflorescences. Moreover, they also influence tolerance to UV light and drought and lead to delayed senescence in Arabidopsis. GRF expression and localization are in turn subject to post-transcriptional regulation by miR396, whose recognition site is found within the coding sequence of the WRC domain (Liebsch & Palatnik, 2020).

Central to their function, GRFs interact with specific cis regulatory sequences of different genes (Kim et al., 2012; Kuijt et al., 2014; Osnato et al., 2010; Piya et al., 2020). Deletion of the WRC domain of barley GRF1 (BGRF1) prevents binding of the protein to the promoter region of the Bkn3 gene (Osnato et al., 2010). It has also been shown that A. thaliana GRF7 (AtGRF7) controls the transcription of the transcriptional activator DREB2A by binding to a cis regulatory element (TGTCAGG) in its promoter (Kim et al., 2012). In particular, GRF7 and GRF10 from O. sativa (OsGRF7/10) modulate the expression of the rice KNOX Oskn2 gene, involved in the regulation of meristematic function, through the interaction with a 34 bp fragment in the promoter region (Kuijt et al., 2014). Interestingly, probes of this region lacking the TGTCAGG sequence are also recognized by AtGRF4, AtGRF5, and AtGRF6, and binding requires both the WRC and QLQ domains. This finding suggests that the specificity of WRCs for DNA sequences may be relaxed. In addition, a ChIP-seq study of AtGRF1 and AtGRF3 resulted in enrichment in the ACTCGAC and CTTCTTC sequences (Piya et al., 2020). These divergent results challenge our understanding of the specificity of WRC domains for DNA sequences (Lazzara et al., 2024). Moreover, the residues responsible for DNA recognition within this domain remain unknown.

The WRC domain has been classified as a noncanonical Zn(II) finger domain due to its similarity to the Hordeum repressor of transcription (HRT) protein found in barley (Raventós et al., 1998). Unlike canonical Zn fingers, the conserved pattern in the WRC domains is C-X7-WRC-X10-C-X2-H. Its classification as a Zn finger rests on the conservation of the four putative metal ligand residues (CCCH), but its structural properties remain unknown due to the lack of homology to any Zn finger domain of known structure. Zinc finger domains display extensive sequence and structural variability, and metal coordination is essential for their folding (Kluska et al., 2018). While critical for structural integrity, the Zn(II) ion typically does not directly engage in DNA binding. Although most of the characterized Zn fingers adopt a ββα fold, there are numerous variants that accommodate different numbers of residues between the metal ligands, giving rise to remarkably diverse structural arrangements (Padjasek et al., 2020). The coordination of the metal ion provides strong stabilization compared to other short domains, which allows Zn fingers to explore a wide sequence space and enhances their versatility.
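As a minimal sketch, the C-X7-WRC-X10-C-X2-H spacing described above can be written as a regular expression and used to scan a protein sequence. The function name and the test sequence are hypothetical illustrations that do not come from the study, and the sequence is synthetic rather than a real GRF.

```python
import re

# C-X7-WRC-X10-C-X2-H: the four putative Zn(II) ligands are the first C,
# the C of the WRC triad, the third C, and the final H.
WRC_PATTERN = re.compile(r"C.{7}WRC.{10}C.{2}H")

def find_wrc_motifs(sequence):
    """Return (start, matched_text) for every non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in WRC_PATTERN.finditer(sequence.upper())]

# Synthetic sequence built only to satisfy the spacing pattern (not a real GRF).
synthetic = "MS" + "C" + "A" * 7 + "WRC" + "G" * 10 + "C" + "KK" + "H" + "EE"
hits = find_wrc_motifs(synthetic)
print(hits)
```

Because the first cysteine is separated from the cysteine of the WRC triad by seven variable residues plus W and R, this pattern is equivalent to the CX9CX10CX2H notation.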

The WRC domain sequence, spanning only 44 residues, is strictly found in plants and algae and experiences significant evolutionary pressure within GRFs and potentially other proteins containing this domain (Kim, 2019). It features a putative nuclear localization signal composed of basic residues (R/K) at 11 specific positions along its sequence. Additionally, the domain coding sequence harbors a miR396 binding site critical for post-transcriptional regulation of GRFs and proper plant development (Kim & Tsukaya, 2015). Despite its role in binding to DNA and guiding the GRFs to selectively regulate gene expression, the residues within the WRC domain responsible for the recognition of GRF cis-targeting sequences remain elusive. Delineating the mechanisms by which this domain interacts with DNA is essential to understand differences in target gene specificity for WRC-containing proteins, both within and outside the GRF family of transcription factors.

The WRC domain is not exclusive to the GRF family of transcription factors but is also found in other plant proteins. To date, the only characterized WRC domain outside GRF proteins is in JMJ28, a Jumonji C domain-containing protein in Arabidopsis (Figure 1a). This domain is essential for directing ATX1/2-containing COMPASS-like complexes to specific chromatin targets (Xie et al., 2023). However, the presence of WRCs across diverse protein architectures (Mulder, 2010) and their sequence heterogeneity has not yet been explored.

In this study, we conducted bioinformatic and structural analysis of WRC domains specifically focusing on those from A. thaliana GRF1, GRF3, GRF5, and GRF7. We found that all these domains bind one Zn(II) ion per monomer with high affinity, which is crucial for their proper folding. The folded proteins exhibit moderate stability, with melting temperatures ranging from 38 to 56°C. Finally, our characterization revealed that these GRFs can bind dsDNA in a nonspecific fashion but display preferences for their specific cognate sequences.

2 RESULTS

2.1 WRC domains from GRFs share a distinct amino acid imprinting

To gain insight into the uniqueness of WRC domains in GRFs compared to those in other WRC-containing proteins, we performed singular value decomposition (SVD) of WRC sequences belonging to heterogeneous protein architectures. To this end, 6957 protein sequences with an associated WRC Pfam entry (PF08879) were retrieved from UniProt, which together contained 2083 non-duplicated WRC sequences. The unique WRCs were further filtered based on their sequence identity (<90%) and length (60% interval around the median length) to reduce redundancy and retain sequence variability (Figure S1A). A total of 1244 curated WRCs were used to generate a multiple sequence alignment (MSA), which was finally analyzed through SVD (Baxter-Koenigs et al., 2022). The nearly linear correlation between the consensus identity of WRC sequences in the MSA and σ1ui(1), with a Pearson coefficient of 0.989, indicates a strong relationship between conservation and the first singular coordinate, implying that the dataset proficiently captures overall residue conservation (Figure S1B).
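The consensus identity used in that correlation can be illustrated with a small sketch: the consensus is taken as the most frequent symbol in each alignment column, and each sequence is scored by the fraction of positions that match it. The toy alignment and function names are hypothetical, and the treatment of gaps is an assumption, since the text does not specify it.

```python
from collections import Counter

def consensus(msa):
    """Most frequent symbol in each alignment column (gaps treated as symbols)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*msa))

def consensus_identity(seq, cons):
    """Fraction of aligned positions at which seq matches the consensus."""
    return sum(a == b for a, b in zip(seq, cons)) / len(cons)

# Toy alignment of equal-length sequences (hypothetical data).
msa = ["CWRCK", "CWRCR", "CWKCK", "AWRCK"]
cons = consensus(msa)
identities = [consensus_identity(s, cons) for s in msa]
print(cons, identities)
```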

The projection of sequences in SVD space revealed that WRCs are not randomly distributed but instead cluster into distinct groups, suggesting they may belong to protein subfamilies with divergent functions (Figure 1b). K-means clustering yielded an optimum of three clusters, which are well resolved in the first three singular coordinates (Figure S1C). Strikingly, we found that characterized WRC-containing proteins from A. thaliana segregate based on their established biological function. A detailed cluster-specific analysis of the proteins harboring the WRC sequences showed that each SVD subgroup displays preferential enrichment for specific domain architectures in which the WRC domain is embedded (Table S1 and Supplementary File 1). GRFs group almost exclusively into a distinct cluster, in good agreement with the phylogenetic tree obtained for the entire dataset (Table S2 and Figure S2). We also found a cluster formed by WRC-containing proteins with only one WRC copy and lacking annotation of other known domains. These WRC domains belong to the same phylogenetic branch. In contrast, although JmjC-type proteins preferentially partition to a third cluster (depicted in blue), this cluster exhibits significant heterogeneity, as shown by the variety of protein architectures found within it.

Segregation of WRC from GRFs into a specific cluster in the SVD space suggests the presence of intrinsic features central to their function that are encoded in the domain sequence. To characterize the relevant residues responsible for cluster partitioning, we calculated the consensus sequences of each cluster and compared the frequency of each residue at each position between different clusters (Figures 1c and S1D–E). In the case of the cluster that includes WRCs from proteins with a GRF architecture, the residues having a preferential enrichment are in good agreement with the sequence differences found between AtGRFs and AtJMJ24/AtJMJ28 at the level of the WRC domain (Figure S1D). Interestingly, taxonomic mapping of WRC-containing proteins suggests that primitive WRCs belonging to the most diverse cluster may have acquired and retained GRF-like features throughout evolution, smoothly leading to their later functional segregation (Figure S3). As a result, our SVD analysis allowed us to identify potential WRC residues that may define the GRF nature of the WRC domains within the specific architecture of this family of transcription factors.
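The per-position comparison of residue frequencies between clusters can be sketched as follows. This is an illustration under stated assumptions: enrichment is taken here as the difference between in-cluster and out-of-cluster frequency, with the 60% cutoff quoted in the Figure 1 caption. The exact definition used in the study is not given in the text, and the toy sequences and function names are hypothetical.

```python
from collections import Counter

def column_frequencies(seqs):
    """Per-column residue frequencies for equal-length aligned sequences."""
    freqs = []
    for col in zip(*seqs):
        counts = Counter(col)
        freqs.append({res: n / len(col) for res, n in counts.items()})
    return freqs

def enriched_residues(in_cluster, out_cluster, threshold=0.6):
    """List (position, residue, difference) where f_in - f_out >= threshold."""
    f_in = column_frequencies(in_cluster)
    f_out = column_frequencies(out_cluster)
    hits = []
    for pos, col in enumerate(f_in):
        for res, freq in col.items():
            diff = freq - f_out[pos].get(res, 0.0)
            if diff >= threshold:
                hits.append((pos, res, round(diff, 3)))
    return hits

# Toy aligned clusters (hypothetical data, positions are 0-based).
cluster_a = ["EGK", "EGK", "EGR", "EAK"]
cluster_rest = ["DGS", "QGS", "DGK", "DGS"]
print(enriched_residues(cluster_a, cluster_rest))
```

With these toy data only the glutamate at position 0 passes the cutoff. The glycine at position 1 is frequent in the cluster but also frequent outside it, so it would count as conserved rather than cluster-specific.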

2.2 The WRC is a zinc finger domain that folds in the presence of Zn2+

We next carried out structural studies to properly assess the role of the conserved residues within the domain in atomic detail. As shown in Figure S1D, A. thaliana has nine GRF genes, which have been reported to belong to five different phylogenetic groups (Fonini et al., 2020). We selected one representative sequence from four phylogenetic groups, leaving out the group of AtGRF9, as this isoform has been suggested to function differently from other GRFs due to the presence of two WRC domains (Omidbakhshfard et al., 2018). Therefore, we expressed the WRC domains from AtGRF1, AtGRF3, AtGRF5, and AtGRF7 as C-terminal fusions to thioredoxin. We refer to these constructs as WRC1/3/5/7 throughout the manuscript. All constructs were soluble, and the corresponding WRC domains were obtained by digestion of the fusion products (Figure S4).

We studied the overall folding properties of the four isoforms using circular dichroism. The near-UV spectra of all domains show that they have a stable tertiary structure with well-structured bands (Figures 2a and S5A). The far-UV spectra, in turn, are less well defined, indicating that the domains have little content of canonical secondary structure elements (Figures 2b and S5B). The fold of the domains is marginally stable towards temperature unfolding, with melting temperatures in the 40–60°C range (Figure S5A, C).

FIGURE 2
Binding of Zn(II) is essential for WRCs to acquire a folded conformation. (a) Near-UV CD spectra of WRC1 in the presence (HoloWRC1) and absence (ApoWRC1) of Zn(II). (b) Far-UV CD spectra of Holo and ApoWRC1. 1H-15N HSQC spectra of holo (c) and apo (d) WRC1.

The WRC domain has been proposed as a putative noncanonical Zinc Finger domain due to the conservation of three cysteines and one histidine residue. To validate this, we first examined the metal binding properties of the domain using the chelating chromophore PAR (Kluska et al., 2018). The quantification of the Zn(II) content revealed one equivalent per protein with an affinity in the low nanomolar range (KD = 14.4 nM for WRC7), consistent with the identification of a single zinc-coordination motif (CCCH). We confirmed in this way that WRCs are Zn(II) binding domains (Figure S6).

We then acquired NMR spectra of all four domains. The 1H-15N HSQC spectra show in all cases fewer signals than expected based on the domain's length and amino acid composition, suggesting the presence of regions with conformational flexibility in an intermediate timescale (Table S5 and Figure S7). Removal of the Zn(II) ion results in loss of the signals' dispersion, indicating that the metal ion is essential for the domain to acquire its folded conformation (Figures 2c, d, S7). Loss of structure is also reflected in the disappearance of the near-UV CD bands and weak secondary structure elements in the far-UV spectra (~220 nm) (Figure 2a, b, respectively). Therefore, our results confirm that the WRC is a CCCH zinc finger domain with one metal binding site that is essential for its native conformation.

2.3 The WRC displays a non-canonical fold

Unlike canonical CCHH zinc finger domains, which fold into a ββα-motif around a tetrahedrally coordinated Zn2+, CCCH zinc fingers adopt non-canonical conformations with limited contributions from defined secondary structure elements (Kluska et al., 2018; Padjasek et al., 2020). We obtained structural models for the AtGRF1/3/5/7 WRC domains using the AlphaFold3 web server (Abramson et al., 2024). The resulting structures share a common structural arrangement comprising a β-hairpin stabilized by hydrogen bonds, followed by a loop and a short α-helix. Metal coordination binds these motifs together (Figures 3a and S8). The models show little regular secondary structure overall, in agreement with the far-UV CD spectra. In contrast to canonical Zn fingers, WRC domains lack a hydrophobic core (Figure S8D). Regarding the conserved WRC motif, the cysteine functions as one of the Zn(II) ligands, while neither the tryptophan nor the arginine side chain appears to play a direct role in specific interactions within the domain. The presence of a basic residue preceding the metal-coordinating cysteine has been linked to a reduction in the pKa value of the thiol, thereby enhancing the metal site's affinity (Padjasek et al., 2020). On the other hand, the tryptophan residue may be involved in nucleic acid interactions, as observed in other similar domains (Fasken et al., 2019; Loughlin et al., 2009; Nguyen et al., 2011). Finally, the region coded by the miR396 recognition sequence is found in an unstructured region of the protein, thus allowing for adaptability in the post-transcriptional regulation of the protein (Liebsch & Palatnik, 2020).

FIGURE 3
The WRC domains show an extended fold with no hydrophobic core and little canonical secondary structure content. (a) AlphaFold3 model of WRC3. The inset highlights the predicted hydrogen bonding network from the structural model (dashed lines), with calculated temperature factors (red colormap). The agreement between predicted bonds and experimental NMR data underscores the model's accuracy. (b) Metal-binding site, for which the histidine protonation state is validated by the long-range HSQC spectra displaying the characteristic side-chain pattern of the histidine epsilon tautomer (lower panel).

We then obtained NMR assignments for the WRC3 construct to validate the accuracy of the WRC predicted structures via the analysis of structural restraints (Figure S9). Our NMR analyses indicate that the secondary structure populations derived from WRC3 backbone chemical shifts align well with the predicted secondary structure elements (Figure S10). Furthermore, hydrogen bond interactions, assessed through 1HN temperature coefficients, closely match the predicted H-bond networks, reinforcing the model's accuracy (Figures 3a, inset, S10). Perturbation of the secondary structure with trifluoroethanol (TFE) leads to linear displacements of the 1HN chemical shifts. The changes in the slope of these displacements correlate well with the boundaries of secondary structure motifs in the model, further validating the predicted locations of these elements (Figure S10). The WRC3 HetNOE profile, along with R1 and R2 NMR relaxation measurements, confirms the presence of unstructured N- and C-terminal segments outside the predicted folded regions (Figure S11).
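To illustrate how 1HN temperature coefficients are typically derived, the sketch below fits a straight line to amide proton chemical shifts recorded at several temperatures and reports the slope in ppb/K. The cutoff of -4.6 ppb/K is a commonly used empirical threshold and is an assumption here, since the text does not state the criterion applied. The data and function names are hypothetical.

```python
def temperature_coefficient(temps_k, shifts_ppm):
    """Least-squares slope of 1HN chemical shift vs temperature, in ppb/K."""
    n = len(temps_k)
    mean_t = sum(temps_k) / n
    mean_d = sum(shifts_ppm) / n
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(temps_k, shifts_ppm))
    sxx = sum((t - mean_t) ** 2 for t in temps_k)
    return 1000.0 * sxy / sxx  # convert ppm/K to ppb/K

def likely_hydrogen_bonded(coeff_ppb_per_k, cutoff=-4.6):
    """Assumed heuristic: slopes less negative than the cutoff suggest an H-bond."""
    return coeff_ppb_per_k > cutoff

temps = [283.0, 288.0, 293.0, 298.0, 303.0]
protected = [8.50, 8.49, 8.48, 8.47, 8.46]  # hypothetical amide, about -2 ppb/K
exposed = [8.50, 8.46, 8.42, 8.38, 8.34]    # hypothetical amide, about -8 ppb/K

for name, shifts in (("protected", protected), ("exposed", exposed)):
    coeff = temperature_coefficient(temps, shifts)
    print(name, round(coeff, 2), likely_hydrogen_bonded(coeff))
```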

Additionally, we confirmed the orientation of the folded structural elements and their relative spatial distribution via HN-N residual dipolar coupling (RDC) and NOESY-HSQC analyses. RDC measurements of amide protons in WRC3 within an aligned medium indicate that the relative orientation of the α-helix and β-hairpin matches the model's predicted orientation (Figure S10). The cross-peaks observed in NOESY-HSQC spectra agree with those expected from the model, further supporting the spatial distribution and predicted inter-residue distances for several WRC amino acids (Figure S12).

Finally, we assigned side-chain resonances of the Zn-bound histidine to analyze its tautomeric state (Figure S13). The signal pattern observed in long-range 1H-15N HSQC spectra for all constructs reveals a histidine in an HNε tautomeric form, indicating that the metal ion is coordinated via Nδ (Figures 3b, S14) (Damblon et al., 1999). In addition, we observed NOEs between the aromatic proton resonances of the conserved tryptophan and the coordinating histidine (Figure S13). Altogether, the NMR restraints obtained agree with the AlphaFold3 output, suggesting that the model accurately represents the WRC solution structure.

2.4 The WRC motif is involved in DNA recognition

The WRC domains of GRFs (Kim et al., 2012; Osnato et al., 2010) and other proteins (Xie et al., 2023) were found to drive interactions with dsDNA segments in promoter regions. To explore the mechanisms underlying DNA recognition and binding by WRCs, we conducted in vitro biophysical studies on the WRC domains of A. thaliana GRFs.

Our analysis focused initially on WRC7, as its DNA target sequence is more thoroughly studied than those of other GRF WRCs (Kim et al., 2012). To assess WRC binding, we evaluated its interaction with previously reported specific and non-specific DNA sequences using tryptophan fluorescence quenching (Urbaneja et al., 2000). We observed that all DNA fragments quenched the intrinsic fluorescence of WRC, regardless of their specificity (Figure 4a). However, the dsDNA containing the cis-targeting sequence induced a complete fluorescence quenching, whereas the non-specific sequence resulted in only partial quenching. Complete quenching was also observed for a minimal sequence harboring the specific DNA binding motif (Figure S15). This behavior suggests a preference for specific DNA sequences and points towards a possible role for the conserved Trp in DNA binding. Moreover, CD difference spectra of WRC in the presence of the minimal dsDNA target sequence also show significant changes (Figure 4b). The near UV region shows an intense unstructured difference band, suggesting that WRC binding causes significant alterations in the conformation of bound DNA. The far-UV region shows a ca. 7 nm red shift in the minimum, hinting at a rearrangement of the WRC backbone. The titration curves obtained by both methodologies allowed us to estimate an occupation of ca. 7 bp for WRC7, consistent with the length of the reported GRF7 recognition sequence.
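One simple way to read a binding footprint from such a titration is to find the bp/WRC ratio at which the signal stops changing. The sketch below assumes a stoichiometric (tight-binding) regime, fits a line to the initial rise, and intersects it with the final plateau. This is an illustrative approach only, since the fitting model used in the study is not described in the text. The data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def saturation_breakpoint(ratios, signal, n_initial, n_plateau):
    """Ratio at which the initial linear rise meets the final plateau."""
    slope, intercept = fit_line(ratios[:n_initial], signal[:n_initial])
    plateau = sum(signal[-n_plateau:]) / n_plateau
    return (plateau - intercept) / slope

# Hypothetical titration: fractional quenching rises linearly with the
# bp/protein ratio and saturates once every protein molecule is bound.
ratios = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14]
quench = [min(r / 7.0, 1.0) for r in ratios]
footprint = saturation_breakpoint(ratios, quench, n_initial=5, n_plateau=3)
print(round(footprint, 2))
```

For weaker binding the transition is curved rather than sharp, and a full binding isotherm would have to be fitted instead of this two-line construction.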

FIGURE 4
WRC domains bind differently to specific and non-specific DNA with a footprint of 7 bp. (a) Tryptophan fluorescence quenching of specific (dark green) and non-specific (light green) DNA sequences at different bp/WRC7 equivalents. Maximal quenching for each probe is indicated by dashed lines. Insets show the fluorescence emission spectra for each bp/WRC7 ratio, with darker colors representing lower ratios. (b) Circular dichroism (CD) difference spectra of WRC7 titrated with DNA. The inset displays the binding curve derived from the integral CD signal, indicating an occupancy of 7.18 bp per WRC molecule.

We then aimed to identify the residues involved in WRC–DNA interaction using NMR spectroscopy. We studied the WRC1 domain due to its higher stability and the larger number of signals observed in the HSQC spectra compared to WRC7 (Figures S5C and S7, and Table S5). We designed minimal specific and non-specific binding probes based on previously reported GRF1 cis-targeting sequences and confirmed their binding to WRC1 by CD spectroscopy (Figure S16). To map the residues involved in WRC–DNA interaction, we acquired NMR spectra of WRC1 in the presence of either target DNA or control DNA. Surprisingly, while the complex with target DNA gives a well resolved spectrum, the complex with control DNA yields a spectrum with only few signals (Figure S16A, B, respectively). These results suggest that, while WRC1 binds both specific and non-specific DNA, it only forms a well-defined complex in the case of the specific DNA but an undefined complex in the case of the non-specific DNA. This finding aligns with the differences in specificity observed for WRC7 probes and accounts for the higher stabilization of the WRC-DNA complex in the presence of specific DNA sequences.

We calculated the chemical shift perturbation of assigned signals in the presence of the specific minimal sequence (Figures 5a and S16A). As shown in Figure 5b, residues exhibiting significant chemical shift perturbations cluster into the same region within the WRC folded structure. Residues G200, R201, K209, W210, R211, C212, and R225 form the core DNA recognition interface. In addition, the signal corresponding to residue E198, which lies at the end of the disordered N-terminal region and in close proximity to G200 and R201, disappears upon DNA addition, also suggesting a role in DNA binding. Zinc removal was found to impair dsDNA binding (Figure S16), consistent with the assembly of a DNA interaction surface that integrates the two distant arginine residues found in positions 201 and 225 and the conserved WRC motif. However, due to the incomplete WRC assignment, we cannot rule out the participation of other residues in the WRC–DNA interaction. Notably, E198, G200, K209, and R225 are key residues that define the unique amino acid signature of GRF-type WRCs, as revealed by our bioinformatic analysis (Figure 1c). This signature may underlie the sequence specificity of WRCs within this family of transcription factors. Our findings highlight the role of the conserved WRC motif in DNA binding and suggest that zinc binding is not only essential for structural integrity but also for the proper alignment of residues critical for DNA recognition.
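A chemical shift perturbation analysis of this kind can be sketched as follows. The combined 1H/15N distance with a nitrogen weighting factor of 0.14 is a common convention and is an assumption here, as is the use of the population standard deviation. The text specifies only the thresholds (mean, mean + 0.5σ, mean + 1σ). Residue labels and shift values are hypothetical.

```python
import math
import statistics

def combined_csp(d_h, d_n, n_scale=0.14):
    """Combined 1H/15N shift perturbation in ppm; n_scale is an assumed weight."""
    return math.sqrt(d_h ** 2 + (n_scale * d_n) ** 2)

def classify(csps):
    """Label residues against the mean, mean + 0.5*sigma and mean + 1*sigma."""
    values = list(csps.values())
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)
    labels = {}
    for res, value in csps.items():
        if value > mean + sigma:
            labels[res] = "strong"
        elif value > mean + 0.5 * sigma:
            labels[res] = "moderate"
        elif value > mean:
            labels[res] = "weak"
        else:
            labels[res] = "unperturbed"
    return labels

# Hypothetical (delta 1H, delta 15N) differences in ppm, free vs DNA-bound.
shifts = {"r1": (0.10, 0.5), "r2": (0.06, 0.3), "r3": (0.01, 0.05),
          "r4": (0.00, 0.02), "r5": (0.01, 0.0)}
csps = {res: combined_csp(dh, dn) for res, (dh, dn) in shifts.items()}
labels = classify(csps)
print(labels)
```

Signals that disappear upon DNA addition, as described for E198, have no measurable shift difference and need to be tracked separately from this calculation.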

FIGURE 5
Residues involved in DNA-binding cluster on one side of the WRC domain. (a) Chemical shift perturbation (CSP) analysis of WRC1 residues upon titration with specific dsDNA. Bars are colored proportionally to the CSP values, with darker colors indicating larger perturbations. The dashed bar corresponds to residue E198, which disappears upon DNA binding, indicating a significant conformational change or exchange broadening. Thresholds corresponding to the mean (CSP), mean + 0.5σ, and mean + 1σ are indicated by solid and dashed lines. Gray-shaded areas represent non-assigned residues. (b) Mapping of the perturbed residues with a CSP higher than mean + 0.5σ onto the folded region of the validated WRC3 structure. The Zn(II) ion is shown as a gray sphere. Key residues involved in DNA binding are labeled. Residues in both panels are numbered based on the GRF1 primary sequence.

3 DISCUSSION

GRFs play integral roles in diverse biological processes that shape plant growth, development, and physiological responses (Lazzara et al., 2024; Liebsch & Palatnik, 2020). The critical functions of this transcription factor family and their potential biotechnological applications have sparked considerable interest in elucidating their molecular mechanisms. Numerous studies have focused on identifying the DNA cis-regulatory sequences targeted by GRFs across various plant species (Kim et al., 2012; O'Malley et al., 2016; Omidbakhshfard et al., 2018; Piya et al., 2020). As transcription factors, GRFs exhibit a modular domain architecture comprising three distinct regions: the QLQ domain for protein–protein interactions, the WRC domain for DNA binding, and a C-terminal transactivation region. Since its discovery, the WRC domain has been hypothesized to function as a ZF due to its similarity to the CCCH motif of the HRT DNA-binding domain (Raventós et al., 1998; van der Knaap et al., 2000). A recent study provides experimental information on the interaction between the WRC and QLQ domains of GRFs and its involvement in DNA recognition (Nosaki & Ohtsuka, 2025). However, the mechanisms by which the WRC domain enables GRFs to recognize and bind their DNA targets remain poorly understood. In this study, we provide a comprehensive characterization of the structural and functional properties that distinguish the WRC domains within the GRF family.

Our bioinformatics and experimental analyses reveal a unique amino acid signature within GRF-associated WRC domains, which appears essential for their DNA recognition specificity. Furthermore, SVD analysis underscores the distinctive nature of these domains by clustering GRF WRCs into a discrete group, reinforcing the notion that their intrinsic features are tailored for specific biological roles.

ZF transcription factors represent a versatile class of DNA-binding proteins characterized by zinc-coordinating motifs that stabilize their structure and enable DNA interaction, and account for one of the largest transcription factor families in plants (Englbrecht et al., 2004; Wang et al., 2008). Whereas many characterized ZF can be grouped based on a limited set of canonical structures, these domains present an extremely large variability in terms of global conformation. The structural validation we present here confirms that WRCs are CCCH ZF domains that adopt a non-canonical fold, with the metal ion playing a crucial role in stabilizing their conformation. We found that WRCs lack a distinct hydrophobic core, making zinc coordination and hydrogen bonding the primary forces driving their folding into a specific tertiary structure. The low quality of their NMR spectra, with signals absent for several residues in HN-N and triple resonance spectra, indicates that even the core region is flexible on a μs–ms timescale. This flexibility may be important for binding a variety of targets. In fact, our experiments suggest that WRCs can bind both specific and non-specific dsDNA, as reported for other Zn finger domains (Urbaneja et al., 2000). The discrimination between both most probably needs local conformational rearrangements of the backbone structure that are explored in the free protein.

WRC domains appear to be necessary for binding of GRFs to DNA, but not sufficient to confer specificity. Transcription factors that regulate developmental processes, such as GRFs, must be tightly controlled and limited to particular cell types and precise developmental stages. In the present work, we studied the domain isolated from its native context. One could speculate that the missing information required for sequence specificity may reside in interactions with cellular partners, adding a further control layer to the GRF regulatory network.

Within the defining triad of residues WRC, the cysteine is one of the Zn(II) ligands and thus essential for the stability of the fold, but neither the arginine nor the tryptophan makes significant contacts with the rest of the protein. Instead, their sidechains are both exposed and appear to be essential for nucleic acid binding, as reported by the NMR spectra of the complex. In RanBP2-type ZFs, a tryptophan and two arginines dictate the recognition of specific nucleotides in single-stranded RNAs, with the tryptophan stacking between two bases (Loughlin et al., 2009; Nguyen et al., 2011). Although this binding mechanism is not feasible for double-stranded DNA given its conformational restraints, our results, along with the high conservation of this residue within WRCs, hint towards a pivotal role for this aromatic amino acid in DNA binding. Further studies will be essential to delineate the structural function of the indole sidechain in the formation of GRFs:DNA complexes in atomic detail.

GRFs are among the few known CCCH ZFs that bind directly to DNA to regulate gene expression (Li & Thomas, 1998; Wang et al., 2020). This contrasts with the majority of CCCH zinc finger proteins, which are primarily associated with RNA metabolism, including RNA cleavage, degradation, polyadenylation, and other post-transcriptional processes (Jan et al., 2013; Lee et al., 2012; Maldonado-Bonilla et al., 2014; Peng et al., 2012; Tants et al., 2024). Moreover, while HRT and other CCCH proteins have multiple ZF domain copies, most GRFs harbor only one WRC domain. To our knowledge, GRF function has only been linked to DNA association; however, the unique structural and biochemical features behind GRFs' architecture make this system appealing for the exploration of new biological roles. The lack of residue-specific structural information for the WRC domain, coupled with the absence of homologous structures, has left the molecular mechanisms underlying GRF–DNA recognition insufficiently understood. Deciphering the determinants of sequence specificity that govern GRF interactions with their cis-regulatory elements is critical for elucidating potential functional divergences in DNA binding among the various isoforms. Such insights are vital for advancing our understanding of the gene regulatory networks mediated by GRFs. Furthermore, a deeper comprehension of these interactions is pivotal for the rational design of GRF-based biotechnological applications, particularly in enhancing transformation and regeneration efficiencies in crops, thereby contributing to improved agricultural productivity and sustainability.

4 MATERIALS AND METHODS

4.1 Bioinformatic analysis

WRC-containing proteins were downloaded from UniProt using the PFAM ID (PF08879). Briefly, the WRC sequences were curated by extracting WRC extended sequences from each full-length primary sequence, extending beyond the WRC limits reported by UniProt by 25 amino acids at both ends when possible. These extended sequences were then aligned using MAFFT. Conservation analysis within the alignment facilitated the proper determination of WRC boundaries, which were further trimmed by removing non-conserved terminal regions to obtain more compact WRC sequences. Sequences with >90% identity were discarded, along with those shorter or longer than the median sequence length by more than 30%. The length-filtered set was then re-aligned, and positions containing gap residues in more than 50% of the sequences were eliminated (Figure S1A). SVD analysis was performed using previously established protocols and open-source Python scripts (Baxter-Koenigs et al., 2022). The phylogenetic tree was constructed with the neighbor joining algorithm using the PAM 250 substitution matrix implemented in Jalview (Waterhouse et al., 2009). Circular representation of the phylogenetic tree with the annotated SVD clusters was performed using the PyCirclize Python library (github.com/moshi4/pyCirclize).
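The length and gap filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the function names, the `-` gap character, and the toy sequences are assumptions made for illustration, and the >90% identity redundancy filter and the MAFFT re-alignment step are not reproduced.

```python
from statistics import median


def filter_by_length(seqs, tolerance=0.30):
    """Keep sequences whose length is within +/- tolerance of the median length."""
    med = median(len(s) for s in seqs)
    return [s for s in seqs if abs(len(s) - med) <= tolerance * med]


def drop_gappy_columns(alignment, max_gap_fraction=0.50, gap="-"):
    """Remove columns in which more than max_gap_fraction of the rows are gaps."""
    n_rows = len(alignment)
    keep = [
        i
        for i in range(len(alignment[0]))
        if sum(row[i] == gap for row in alignment) / n_rows <= max_gap_fraction
    ]
    return ["".join(row[i] for i in keep) for row in alignment]


# Toy input (made-up strings, not real WRC sequences).
raw = ["ACDEFGHIKL", "ACDEFGHIK", "ACD", "ACDEFGHIKLMNPQRSTVWY"]
print(filter_by_length(raw))          # too-short and too-long entries are dropped

aligned = ["AC-DE", "A--DE", "AC-DE", "A-KDE"]
print(drop_gappy_columns(aligned))    # the third column (75% gaps) is removed
```

In this sketch a column with exactly 50% gaps is retained, because the text specifies removal only when gaps occur in more than 50% of the sequences.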

4.2 Plasmid construction

The A. thaliana WRC sequences from GRF1, GRF3, GRF5, and GRF7 (UniProt accession numbers O81001, Q9SJR5, Q8L8A6, and Q9FJB8, respectively) were amplified from cDNA samples (Table S3) by PCR and inserted into the pET-32a-derived pT7 expression vector using restriction-free cloning (Correa et al., 2014). This vector provides a His tag, followed by the TrxA fusion protein and a TEV cleavage site at the 5' end of the target gene. Constructs were verified by DNA sequencing. All expressed WRC domains span 5 amino acids beyond the reported limits of the WRC sequences for these GRF isoforms at both ends. All DNA oligonucleotides and sequencing services were obtained from Macrogen (Seoul, Republic of Korea).

4.3 Protein expression and purification

Expression plasmids were transformed into Escherichia coli BL21(DE3) cells, which were then grown at 37°C in Erlenmeyer flasks shaken at 180 rpm in either M9 minimal medium supplemented with 1 g/L 15N-NH4Cl or 1 g/L 15N-NH4Cl and 2 g/L U[13C]-Glucose (Cambridge Isotope Laboratories), or in LB broth in the presence of 100 μg/mL Ampicillin. Protein expression was induced with 0.25 mM IPTG at OD600 ≈ 0.7, and cells were incubated for an additional 4 h at 37°C. The cells were collected by centrifugation at 4000 g for 20 min at 4°C, followed by resuspension in 20 mL of a solution containing 50 mM Tris pH 8.0, 500 mM NaCl, and 1 mM β-mercaptoethanol. The suspension was disrupted by sonication, and the soluble fraction was clarified by centrifugation for 30 min at 20,000 g. His-tagged Trx-WRC fusions were purified using a Ni(II) column and digested with His-tagged TEV protease. The digested proteins were purified with a Superdex 75 Increase 10/300 GL size exclusion chromatography column equilibrated with 20 mM HEPES, 50 mM NaCl, pH 7.0. Proteins were concentrated using centrifugal filter units and supplemented with 5 mM TCEP and 1 equivalent of ZnSO4. Protein concentration was measured by UV absorption at 280 nm using the corresponding absorptivity coefficient calculated with the ProtParam tool on the ExPASy web portal (Gasteiger et al., 2005). Sequences of purified WRC variants are detailed in Table S4. WRC samples were used directly or flash-frozen in liquid nitrogen and transferred immediately to a −80°C freezer for long-term storage.

4.4 Zn(II) stoichiometry and binding affinity

The zinc(II) content was determined using the metallochromic indicator 4-(2-pyridylazo)resorcinol (PAR) (Kocyła et al., 2015). HoloWRC7 protein samples (6.2 μM) were incubated with PAR (100 μM) in a denaturing buffer (2.8 M guanidinium chloride, 20 mM HEPES, and 50 mM NaCl, pH 7.0) at 100°C for 20 min to release bound zinc. Absorbance was measured at 500 nm, with background correction at 650 nm, using a 96-well plate. Zinc concentrations were quantified against a Zn2+ calibration curve (0–10 μM) prepared from a ZnCl2 standard solution. The results represent the average of triplicate measurements. All buffer solutions were treated with Chelex 100 (Sigma) by extensive stirring to remove trace metal contamination. Prior to zinc quantification, HoloWRC7 was diafiltered into Chelex-treated buffer (20 mM HEPES, 50 mM NaCl, pH 7.0) to remove unbound zinc and minimize interference. A protein-free buffer was used as a control for quantification.
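As an illustration of the quantification step, the sketch below fits a straight line to background-corrected absorbances of zinc standards and converts a sample reading into a Zn/protein ratio. The function names and all absorbance values are hypothetical; only the 0–10 μM standard range, the 500/650 nm readings, and the 6.2 μM protein concentration come from the text.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def zinc_per_protein(a500, a650, slope, intercept, protein_um):
    """Zn(II) equivalents per protein from a background-corrected reading."""
    corrected = a500 - a650                 # background correction at 650 nm
    zinc_um = (corrected - intercept) / slope
    return zinc_um / protein_um


# Hypothetical calibration data: corrected absorbance of Zn2+ standards (uM).
standards_um = [0, 2, 4, 6, 8, 10]
standards_abs = [0.05 + 0.06 * c for c in standards_um]
slope, intercept = linear_fit(standards_um, standards_abs)

# Hypothetical sample reading for a 6.2 uM protein sample.
ratio = zinc_per_protein(a500=0.442, a650=0.020, slope=slope,
                         intercept=intercept, protein_um=6.2)
print(f"Zn(II) per protein: {ratio:.2f}")
```

In practice the ratio would be averaged over the triplicate measurements and the protein-free control would be subtracted; those steps are omitted here for brevity.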

ApoWRC7 was prepared by acidifying WRC7 in 20 mM HEPES, 50 mM NaCl, pH 2.0, at 4°C for 4 h to remove bound zinc. The sample was subsequently concentrated and subjected to three cycles of diafiltration in Chelex-treated buffer at pH 2.0 to remove residual metals. During this process, 5 mM TCEP was added to prevent cysteine oxidation. The buffer was then exchanged to 20 mM HEPES, 50 mM NaCl, and 5 mM TCEP at pH 7.0. Dissociation constants for Zn(II) binding were estimated by competition with the chromophoric chelator PAR (Kocyła et al., 2015). PAR is a metallochromic compound whose UV–visible absorption spectrum changes upon metal coordination, resulting in a shift of its maximum absorption wavelength from 414 to 500 nm. Pure PAR and PAR2-Zn spectra were used to deconvolute the spectra of each species using multivariate curve resolution-alternating least squares (MCR–ALS). The dissociation constant of PAR2-Zn under our experimental conditions was estimated as 2.2 × 10⁻¹² M², consistent with previously reported values at pH 7 (Kocyła et al., 2015). This value was then used to calculate the Kd of WRC-Zn using the same deconvolution procedure. The binding curve of PAR competition experiments was fitted to a one-site binding model for WRC7, in line with the stoichiometry obtained in our Zn2+ content determination experiments. Spectra were measured at 25°C using a Jasco V-630 BIO spectrophotometer (Jasco, Easton, MD, USA) with Peltier temperature control (10 mm quartz cell), and each spectrum corresponds to an independently prepared sample. All Kd fittings were performed using the DynaFit software package (Kuzmič, 2009).

4.5 Circular dichroism spectroscopy

Circular dichroism (CD) spectra were recorded using a Jasco J-1500 spectropolarimeter (Jasco, Easton, MD, USA).

For far-UV CD measurements, protein samples were placed in a 1 mm pathlength quartz cuvette to minimize buffer absorption. Spectra were recorded from 190 to 240 nm at 10°C, averaging four scans to enhance the signal-to-noise ratio. The buffer contained 50 mM phosphate at pH 7.0, and the final protein concentration was 10–20 μM (~0.1 mg/mL). Samples were freshly prepared by diluting stock WRC solutions (Section 3.2).

Near-UV CD spectra were recorded from 250 to 320 nm at 10°C using a 10 mm pathlength quartz cuvette. Protein samples were prepared at a final concentration of 120–200 μM (A280 ≈ 1) in the stock buffer. Four scans were averaged for each spectrum.

To obtain CD spectra for ApoWRC, 10 molar equivalents of EDTA were added to chelate zinc, ensuring complete demetallation.

Thermal melting experiments were performed by recording near-UV spectra from 10 to 95°C in 5°C increments, with a heating rate of 1°C/min. The data sets of melting spectra were analyzed using MCR-ALS to resolve the spectra of folded and unfolded states. Melting temperatures (Tm) were determined by fitting the evolution of the folded component as a function of temperature to a two-state transition model.
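The two-state fit can be sketched as follows. The text does not specify the functional form of the model, so the van't Hoff expression below (no heat capacity term) is an assumption, as are the function names, the brute-force grid search used in place of a nonlinear least-squares routine, and the synthetic data.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def fraction_folded(temp_k, tm_k, dh):
    """Folded fraction for a two-state transition (van't Hoff form, assumed)."""
    k_unfold = math.exp(dh / R * (1.0 / tm_k - 1.0 / temp_k))
    return 1.0 / (1.0 + k_unfold)


def fit_tm(temps_k, folded, tm_grid, dh_grid):
    """Grid search for the (Tm, dH) pair minimizing the sum of squared residuals."""
    best = None
    for tm in tm_grid:
        for dh in dh_grid:
            sse = sum((fraction_folded(t, tm, dh) - f) ** 2
                      for t, f in zip(temps_k, folded))
            if best is None or sse < best[0]:
                best = (sse, tm, dh)
    return best[1], best[2]


# Synthetic folded-component profile from 10 to 95 C in 5 C steps (hypothetical).
temps = [283.15 + 5 * i for i in range(18)]
data = [fraction_folded(t, 328.0, 150e3) for t in temps]

tm_grid = [300 + 0.5 * i for i in range(121)]     # 300 to 360 K
dh_grid = [50e3 + 10e3 * i for i in range(31)]    # 50 to 350 kJ/mol
tm_fit, dh_fit = fit_tm(temps, data, tm_grid, dh_grid)
print(f"Tm = {tm_fit - 273.15:.1f} C, dH = {dh_fit / 1000:.0f} kJ/mol")
```

Here the input would be the temperature evolution of the folded component resolved by MCR-ALS, normalized between 0 and 1.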

CD data were processed using an in-house developed library specifically designed for reading and processing CD spectra (github.com/francobiglione/ProteinBiophysics), along with custom Python scripts for data analysis and visualization.

4.6 NMR spectroscopy

HSQC NMR spectra of 15N labeled WRCs were acquired at 298 K on a 700 MHz Bruker Avance III spectrometer equipped with a TXI probehead using pulse sequences from the standard Bruker library. All spectra were processed with NMRPipe and analyzed with CCPNMR in the NMRBox software suite (Delaglio et al., 1995; Maciejewski et al., 2017; Skinner et al., 2016). ApoWRC forms were generated as described in Section 4.5.

The WRC3 construct was assigned using a set of triple-resonance spectra (HNCA/(CO)CA, HNCACB/(CO)CACB, HN(CA)CO/HNCO) and 1H-15N heteronuclear TOCSY and NOESY datasets, collected on the same spectrometer for double (13C-15N) or single (15N) uniformly labeled 600 μM samples at 298 K and stock buffer conditions with 10% D2O (Section 3.3). NMR assignments of WRC3 resonances were deposited in the BioMagResBank (BMRB entry 53000). Secondary structure populations were predicted from the assigned backbone chemical shifts (Hα, CO, Cα, Cβ, HN, NH) using the simultaneous sequence-based predictor s2D (Sormanni et al., 2015). Amide hydrogen bond interactions were assessed by calculating temperature coefficients derived from 15N–1H HSQC spectra recorded at 298, 303, 308, 313, and 318 K (Cierpicki & Otlewski, 2001). Similarly, 2,2,2-trifluoroethanol (TFE) coefficients were calculated from 15N–1H HSQC spectra recorded at 298 K at increasing ratios of TFE/buffer mixtures (0, 4, 9, and 15% v/v). These coefficients were used to map secondary structure elements based on TFE's ability to perturb them (Roccatano et al., 2002). For temperature and TFE coefficients, error bars in the plots represent standard errors for the linear fit of each residue position. HN-N RDCs were measured at 298 K in C12E5/n-hexanol anisotropic medium after stabilization of the sterically induced alignment (Rückert & Otting, 2000). HN-N RDCs were obtained on a 15N labeled sample using the standard IPAP sequence, as implemented in the Bruker library. Analysis and calculations of RDCs were performed with the best-ranked AlphaFold (AF3) WRC3 model using the software MODULE2 (Dosset et al., 2001). Backbone dynamics of WRC3 were determined at 298 K. 15N T1 and T2 relaxation and 1H-15N nuclear Overhauser effect (NOE) experiments were recorded using standard pulse sequences from the Bruker Topspin library. For 15N T1 and T2 acquisitions, relaxation delays were randomized, and duplicate spectra were collected at several time points to estimate uncertainties. Relaxation rates were calculated by fitting peak intensities at different time points to a two-parameter exponential decay function using CCPNMR. Steady-state 1H–15N hetNOE values were obtained by dividing the peak heights of paired spectra collected with and without an initial 4-s proton saturation period. Correlation time (τc) was calculated from the R2/R1 ratio as described elsewhere (Rossi et al., 2010). Tryptophan and histidine sidechain resonances were assigned using homonuclear 1H-1H TOCSY and NOESY (D2O) experiments and long-range 15N–1H HSQC spectra acquired at 278 K on non-labeled WRC7 or 15N WRC7, respectively, using pulse sequences from the standard Bruker library. NOESY samples were generated by diafiltration with D2O-prepared stock buffer. All illustrations containing NMR spectra were generated using Python's standard visualization libraries and the nmrglue module (Helmus & Jaroniec, 2013).
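The per-residue temperature coefficients and their standard errors mentioned above correspond to the slope of a linear fit of the amide proton shift against temperature. A minimal sketch is given below; the function name and the chemical shift values are hypothetical, and the −4.6 ppb/K cutoff is a commonly quoted rule of thumb that is treated here as an assumption, since the text does not state which threshold was applied.

```python
import math


def slope_with_stderr(x, y):
    """Least-squares slope of y versus x and the standard error of that slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    stderr = math.sqrt(rss / (n - 2) / sxx)
    return slope, stderr


# Temperatures from the text (K); HN shifts for one residue are hypothetical (ppm).
temps_k = [298, 303, 308, 313, 318]
hn_ppm = [8.500, 8.480, 8.461, 8.439, 8.420]

slope, err = slope_with_stderr(temps_k, hn_ppm)
coeff_ppb = slope * 1000.0          # ppm/K -> ppb/K
err_ppb = err * 1000.0

CUTOFF_PPB = -4.6                   # assumed cutoff, not taken from the text
likely_h_bonded = coeff_ppb > CUTOFF_PPB
print(f"{coeff_ppb:.2f} +/- {err_ppb:.2f} ppb/K, H-bond candidate: {likely_h_bonded}")
```

The same routine applies to the TFE coefficients, with the TFE percentage replacing temperature on the x axis.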

4.7 DNA interaction

4.7.1 Probes for DNA interaction studies

Probes used in DNA interaction studies are detailed in Table S3. For fluorescence and NMR experiments, we utilized specific and non-specific GRF7 long oligonucleotides (34 bp), previously labeled as E1 and E2 in the original publication (Kim et al., 2012). For CD and NMR studies, minimal specific dsDNA sequences were designed based on reported GRF1 (13 bp) and GRF7 (12 bp) cis-targeting sites. A minimal non-specific probe, derived from a nucleotide region shown to lack GRF7 interaction (Kim et al., 2012), was used for GRF1 NMR studies due to the similarity of cis-targeting sites between GRF7 (TGTCAGG) and GRF1 (reverse complement of GTCGAGT*). Minimal probes were designed as tetraloops to ensure the dsDNA structure at GRF-targeting sites was maintained, minimizing the presence of interfering ssDNA. Loop nucleotide identity and closing base pairs were rationally selected to optimize tetraloop stability based on well-characterized studies in the literature (Wang et al., 2018). Long probes were annealed to dsDNA by heating stock solutions at 95°C for 5 min, followed by slow cooling to 20°C. Minimal dsDNA was prepared by heating stock solutions for 5 min at 95°C, then snap cooling in an ice bath to prevent intermolecular interactions.

4.7.2 Fluorescence binding studies

Fluorescence equilibrium binding isotherms for WRC7 were obtained by monitoring tryptophan quenching upon nucleic acid binding. Fluorescence measurements were conducted using a Cary Eclipse spectrofluorometer with an excitation wavelength of 280 nm and emission spectra recorded from 290 to 450 nm. Data were collected at 15°C, following the addition of 15 μM long or minimal GRF7 oligonucleotide solutions to 50 μL protein samples (1 μM) at varying DNA/protein ratios (0–2 equivalents). The buffer conditions were 20 mM HEPES, 50 mM NaCl, pH 7.0. Each fluorescence spectrum corresponded to an independent sample of a defined DNA/WRC ratio, measured in a 50 μL quartz cell. Spectra were corrected for baseline contributions from buffer fluorescence. WRC7-nucleic acid complex formation was quantified by the reduction in initial protein fluorescence, expressed as fraction quenching (1−F/F0), and plotted against bp/WRC equivalents (DNA/WRC ratio multiplied by oligonucleotide length).
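The two quantities plotted in the binding isotherms follow directly from the definitions above. The sketch below assumes that F and F0 are emission intensities at a single wavelength (or integrated over the emission band) and uses hypothetical intensity values; the function names are illustrative only.

```python
def fraction_quenched(f, f0):
    """Fraction quenching, 1 - F/F0, relative to the DNA-free protein signal."""
    return 1.0 - f / f0


def bp_per_wrc(dna_um, protein_um, oligo_length_bp):
    """bp/WRC equivalents: DNA/WRC molar ratio multiplied by oligonucleotide length."""
    return dna_um / protein_um * oligo_length_bp


# Hypothetical titration of 1 uM protein with a 34 bp oligonucleotide.
f0 = 100.0                                   # intensity without DNA (arbitrary units)
dna_um = [0.0, 0.5, 1.0, 2.0]                # 0 to 2 equivalents
intensity = [100.0, 80.0, 68.0, 60.0]        # hypothetical readings

for dna, f in zip(dna_um, intensity):
    x = bp_per_wrc(dna, protein_um=1.0, oligo_length_bp=34)
    print(f"{x:5.1f} bp/WRC  quenching = {fraction_quenched(f, f0):.2f}")
```

Expressing the x axis in bp/WRC rather than in molar equivalents allows long and minimal probes of different lengths to be compared on a common scale.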

4.7.3 Circular dichroism (CD) binding studies

CD spectra for DNA-binding analyses were recorded from 190 to 320 nm at 10°C, averaging four scans to enhance the signal-to-noise ratio. The buffer condition was 50 mM phosphate, pH 7.0. Protein concentrations were 13 μM, and minimal dsDNA (30 μM stock solutions) was added at DNA/WRC ratios ranging from 0 to 2 equivalents. Difference CD spectra were generated by subtracting blank DNA spectra at the same oligonucleotide concentrations. Each protein-DNA or blank DNA spectrum was measured from an independent sample. Binding signals were derived by integrating the area under the curve (270–300 nm) using Riemann's approximation. bp/WRC equivalents were calculated as described previously. To assess the lack of binding of ApoWRC1 to WRC1 minimal sequences, CD measurements were performed in the presence of 0.5 equivalents of minimal dsDNA probes and 1.6 mM EDTA to chelate zinc ions.
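The difference spectrum and its integration over 270–300 nm can be sketched as below. The text does not state which Riemann variant was used, so the left-endpoint sum is an assumption, as are the function names and the flat synthetic spectra.

```python
def difference_spectrum(complex_cd, dna_cd):
    """Subtract the blank DNA spectrum from the protein-DNA spectrum, point by point."""
    return [c - d for c, d in zip(complex_cd, dna_cd)]


def riemann_area(wavelengths, signal, lo=270.0, hi=300.0):
    """Left Riemann sum of the signal over [lo, hi]; wavelengths must be ascending."""
    pts = [(w, s) for w, s in zip(wavelengths, signal) if lo <= w <= hi]
    return sum(s0 * (w1 - w0) for (w0, s0), (w1, _) in zip(pts, pts[1:]))


# Synthetic spectra sampled every 1 nm (hypothetical values, arbitrary units).
wavelengths = list(range(260, 311))
complex_cd = [5.0] * len(wavelengths)
dna_cd = [3.0] * len(wavelengths)

diff = difference_spectrum(complex_cd, dna_cd)
print(riemann_area(wavelengths, diff))   # flat signal of 2.0 over 30 nm
```

Both spectra must share the same wavelength grid for the subtraction to be meaningful; spectra recorded from high to low wavelength would need to be reversed first.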

4.7.4 NMR chemical shift perturbation analysis

15N-1H HSQC NMR spectra of WRC1:dsDNA samples were acquired at 298 K (Section 3.6) in 3 mm NMR tubes at a final WRC1 concentration of 130 μM. Mock buffer, minimal specific, or minimal non-specific DNA probes (0.9–1 mM stock DNA solutions) were added stepwise at 0.5 and 1 equivalents to three independent 15N WRC1 samples before acquiring spectra at each condition (samples were diluted in stock buffer). WRC1 assignment was performed by transferring the WRC3 assignment of conserved residues onto WRC1 peaks in spectra recorded without DNA; signals without a clear assignment were removed. Chemical shift displacements were identified by analyzing disappearing signals and new signals appearing nearby, and were confirmed by comparing spectra at the two increasing DNA concentrations to establish the direction of each displacement. DNA-binding residues were quantified and mapped by calculating the chemical shift perturbation (CSP) of each residue with the following equation:

$$ \mathrm{CSP}=\sqrt{{\left(\Delta {\delta}_H\right)}^2+{\left(0.142\,\Delta {\delta}_N\right)}^2} $$

The mean CSP plus half a standard deviation was used as the threshold for identifying perturbed residues involved in DNA binding.
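A minimal sketch of the perturbation calculation and thresholding is shown below. The 0.142 nitrogen weighting and the mean plus half a standard deviation threshold come from the text; the function names, the use of the sample standard deviation, and the shift differences are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def csp(d_h, d_n, n_scale=0.142):
    """Combined 1H/15N chemical shift perturbation (ppm)."""
    return math.sqrt(d_h ** 2 + (n_scale * d_n) ** 2)


def perturbed_residues(shift_changes, n_scale=0.142):
    """Return the threshold (mean + 0.5 SD) and the residues exceeding it."""
    values = {res: csp(dh, dn, n_scale) for res, (dh, dn) in shift_changes.items()}
    vals = list(values.values())
    threshold = mean(vals) + 0.5 * stdev(vals)   # sample SD (assumed)
    hits = sorted(res for res, v in values.items() if v > threshold)
    return threshold, hits


# Hypothetical (delta_H, delta_N) shift changes in ppm, keyed by residue number.
changes = {
    10: (0.01, 0.05),
    11: (0.02, 0.10),
    12: (0.15, 0.80),
    13: (0.01, 0.02),
    14: (0.12, 0.50),
}
threshold, hits = perturbed_residues(changes)
print(f"threshold = {threshold:.3f} ppm, perturbed residues: {hits}")
```

Residues whose signals disappear upon DNA addition have no measurable shift difference and would need to be flagged separately rather than passed through this calculation.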

AUTHOR CONTRIBUTIONS

Franco A. Biglione: Conceptualization; methodology; formal analysis; writing – review and editing; writing – original draft; investigation. Nahuel D. González Schain: Funding acquisition; writing – review and editing; conceptualization. Javier F. Palatnik: Conceptualization; writing – review and editing. Rodolfo M. Rasia: Conceptualization; funding acquisition; investigation; writing – review and editing; project administration; supervision.

ACKNOWLEDGMENTS

F.A.B. is a Fellow at CONICET. J.F.P., N.D.G.S, and R.M.R. are Career Researchers at CONICET. We are grateful to Silvana Sut for her excellent assistance with labware and media preparation, as well as Andrea Coscia and Alejandro Gago for maintenance of the NMR facility. We also thank Daniela Liebsch for insightful discussions and valuable input throughout the development of this work.

CONFLICT OF INTEREST STATEMENT

The authors state no conflicts of interest.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available in the supplementary material of this article.
