Hierarchy in regulator interactions with distant transcriptional activation domains empowers rheostatic regulation
Review Editor: Aitziber L. Cortajarena
Abstract
Transcription factors carry long intrinsically disordered regions often containing multiple activation domains. Despite numerous recent high-throughput identifications and characterizations of activation domains, the interplay between sequence motifs, activation domains, and regulator binding in intrinsically disordered transcription factor regions remains unresolved. Here, we map sequence motifs and activation domains in an Arabidopsis thaliana NAC transcription factor clade, revealing that although sequence motifs and activation domains often coincide, they do not overlap systematically. Biophysical analyses using NMR spectroscopy show that the long intrinsically disordered region of the senescence-associated transcription factor ANAC046 is devoid of residual structure. We identify two activation domain/sequence motif regions, one at each end of this region, both of which promiscuously bind a panel of six positive and negative regulator domains from biologically relevant regulators. Binding affinities measured using isothermal titration calorimetry reveal a hierarchy in regulator binding to the two ANAC046 activation domain/sequence motif regions, defining these as regulatory hotspots. Despite extensive dynamic intramolecular contacts along the disordered chain, revealed using paramagnetic relaxation enhancement experiments and simulations, the two regions remain uncoupled in binding. Together, the results imply rheostatic regulation by ANAC046 through concentration-dependent regulator competition, a mechanism likely mirrored in other transcription factors with distantly located activation domains.
1 INTRODUCTION
Intrinsically disordered regions (IDRs) are conformationally malleable, empowering them with an enormous interaction potential in protein networks (Holehouse & Kragelund, 2023) and responsiveness to cellular cues through post-translational modifications (Newcombe et al., 2022) and interactions (Bjarnason et al., 2023). Short linear motifs (SLiMs) are short conserved sequence stretches (typically less than 10 residues) within an otherwise non-conserved context (Berlow et al., 2017; Davey et al., 2012; O'Shea et al., 2017). They are prevalent in IDRs, where they mediate interactions. Often, SLiMs fold upon binding to form short α-helices (Davey et al., 2012; Gianni et al., 2016; Rogers et al., 2014; Wright & Dyson, 2015), with large context effects on affinity, mostly arising from the flanking regions (Karlsson et al., 2022; O'Shea et al., 2017; Palopoli et al., 2018; Prestel et al., 2019). However, SLiMs that remain disordered in the complex also exist (Dreier et al., 2022).
Intrinsic disorder is pronounced in transcriptional processes (Christensen et al., 2019; Liu et al., 2006; Staby et al., 2017) and transcription factors are key players. They activate or repress target gene transcription by the recruitment of additional components of the transcriptional machinery including coregulators to the target gene (Kornberg, 2005). Transcription factors generally consist of at least a folded DNA binding domain, enabling binding to their cis-elements, and a transcriptional regulatory domain, which is often intrinsically disordered. Within these IDRs, activation domains (ADs) drive coactivator binding, thereby activating transcription (Udupa et al., 2024).
The identification of ADs and the decomposition of their sequence properties have been facilitated by numerous recent high-throughput screens conducted in multiple organisms (Broyles et al., 2021; Erijman et al., 2020; Hummel et al., 2023; Morffy et al., 2024; Staller et al., 2022). These show that transcription factors can carry more than one AD and that neighboring ADs may slow regulator dissociation through allovalency, in which a regulator released from one AD is recaptured by another AD (Delaforge et al., 2025; Sanborn et al., 2021). The studies also showed that AD strength, that is, the ability to activate transcription, depends on the density and patterning of different types of amino acid residues, for example, acidic and aromatic/leucine residues (Broyles et al., 2021; Erijman et al., 2020; Hummel et al., 2023; Staller et al., 2022). The acidic–aromatic patterning was suggested to be important for keeping binding sites in transcription factors accessible to coregulators (Kotha & Staller, 2023; Staller et al., 2022). The patterning often leads to structure formation of the AD upon complex formation (Lochhead et al., 2020). However, some studies also propose the formation of more dynamic complexes between ADs and coregulators (Tuttle et al., 2021). High-throughput affinity measurements of regulator:AD interactions revealed affinities in the nM–μM range, with coregulator domains binding ADs either specifically or promiscuously (DelRosso et al., 2024). Furthermore, SLiMs were suggested to play a functional role in ADs primarily when the ADs fold upon binding (Sanborn et al., 2021).
Although these recent studies have contributed to the understanding of aspects of AD:coactivator interactions such as specificity/promiscuity, multivalency, and the role of SLiMs, several of the studies have focused on AD interactions with the MEDIATOR subunit MED15 or with cAMP-response element binding (CREB)-binding protein (CBP)/p300 (DelRosso et al., 2024; Mindel et al., 2024; Sanborn et al., 2021). Therefore, more case studies with both different transcription factors and different regulators are needed to improve our understanding of transcription.
To decompose the molecular details of regulator:AD interactions, we linked data from a high-throughput AD screen (Morffy et al., 2024) with clade bioinformatics and experimental biophysical analyses of the Arabidopsis thaliana NAM, ATAF1/2, CUC (ANAC) transcription factor family, a plant-specific family with more than 100 members. Because the members of clade II-3 have similar biological functions in senescence and development (Gonçalves et al., 2015; Oda-Yamamizo et al., 2016) and intriguing patterns in their disorder profiles (Stender et al., 2015), we focused on the ANAC transcription factors of this clade. We identified the SLiMs of the clade and compared these with AD locations, revealing nonsystematic overlaps between SLiMs and ADs. Using ANAC046 as a representative, we thermodynamically and structurally characterized its interactions with a pool of biologically relevant regulators. The affinities and order of binding uncovered a promiscuous binding strategy in which the two distant ADs in the long IDR of ANAC046 bind independently and hierarchically. These results suggest rheostatic regulation with little chemical specificity, relying on concentration-dependent competition. We suggest that similar regulation mechanisms are mirrored in other transcription factors with distant ADs in long IDRs.
2 RESULTS
2.1 SLiM identification in the long IDRs of ANAC clade II-3 transcription factors
As a first step, we identified conserved sequence motifs in the ANAC clade II-3 transcription factors, members of which share regulatory functions (Stender et al., 2015). In the model plant Arabidopsis, this subgroup consists of 13 transcription factors, all of which contain a folded NAC DNA binding domain (Figure 1a). The DNA binding domain is followed by a C-terminal region of variable length with low AlphaFold (Jumper et al., 2021) confidence (pLDDT) scores, suggestive of intrinsic disorder (Figure 1a) (Akdel et al., 2022). To identify SLiMs, we aligned the IDR of each clade member individually with orthologous Viridiplantae sequences and generated logo plots (Figure S1a). In this way, we identified four SLiMs (M1–M4) spanning 6–11 residues (Figure 1a, b). M2 and M3 were previously identified in some NAC transcription factors (Jensen et al., 2010; Taoka et al., 2004), but their functions remain elusive, whereas M1 and M4 were identified here (Figure 1b). M2 is the most frequently occurring SLiM, present in 11 of the 13 transcription factors of clade II-3, while M3 is present in six members. M1 and especially M4, present in the same four members, are long, and each may contain more than one SLiM.
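Conservation in the logo plots was judged visually. As a hedged illustration only, and not the procedure used here, per-column Shannon entropy over a multiple sequence alignment can flag conserved stretches within an otherwise variable IDR. The entropy cutoff (`max_entropy`), the minimum stretch length (`min_len`), the helper names, and the toy alignment are all assumptions for this sketch.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters ('-')."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return math.inf  # an all-gap column carries no conservation signal
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def conserved_stretches(alignment, max_entropy=1.0, min_len=6):
    """Return (start, end) column intervals (0-based, end exclusive) in which every
    column has entropy <= max_entropy and the run spans at least min_len columns."""
    ncol = len(alignment[0])
    flags = [column_entropy("".join(seq[i] for seq in alignment)) <= max_entropy
             for i in range(ncol)]
    stretches, start = [], None
    for i, conserved in enumerate(flags + [False]):  # trailing sentinel closes a final run
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                stretches.append((start, i))
            start = None
    return stretches


# Toy alignment (hypothetical sequences, not ANAC orthologs)
aln = ["MKDLSEPLQLAG", "AKDLSEPLELSG", "TKDLSDPLQLNG", "GKDLSEPLQLVG"]
print(conserved_stretches(aln))  # [(1, 10)]
```

Allowing a column with one substitution (entropy of about 0.81 bits for a 3:1 split) keeps conservative variation such as Q/E inside a motif, mirroring the degenerate positions seen in logo plots.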

To examine whether the four SLiMs are present in additional Arabidopsis transcription factors, each SLiM was screened against the Arabidopsis transcription factor proteome using PSSMSearch (Krystkowiak et al., 2018). The analyses (Tables S1–S4) revealed no additional instances of M3. For M1, M2, and M4, additional instances remained after filtering. M1 instances were found in ANAC017 and ANAC016, and M2 occurs in nine additional transcription factors. For M4, two separate analyses were performed using different filtering strategies. These revealed four or 31 instances of M4 in different transcription factors, depending on the SLiM consensus used in the filtering strategy (Table S4). In summary, the IDRs of the ANAC clade II-3 transcription factors are rich in SLiMs (Figure 1a), but, except for M4, the expansion of their SLiMs outside this subgroup is limited.
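PSSMSearch scores proteome windows against a position-specific scoring matrix (PSSM) derived from motif instances. The sketch below is a simplified stand-in for that idea and does not reproduce PSSMSearch's scoring or filtering: it builds a log-odds PSSM with pseudocounts and a uniform background (both assumptions), then reports windows scoring above a user-chosen threshold. `build_pssm`, `scan`, and the toy instances are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(instances, pseudocount=1.0):
    """Log-odds PSSM (bits) from equal-length, aligned motif instances.
    Assumes a uniform background frequency of 1/20 per amino acid."""
    background = 1 / len(AMINO_ACIDS)
    total = len(instances) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(instances[0])):
        counts = Counter(seq[pos] for seq in instances)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def scan(sequence, pssm, threshold):
    """Slide the PSSM along a sequence; return (0-based start, score) for windows >= threshold.
    Letters outside the 20 standard amino acids get the lowest score at that position."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i].get(aa, min(pssm[i].values())) for i, aa in enumerate(window))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits


pssm = build_pssm(["LPQL", "LPEL", "LPQL"])  # toy instances resembling LP(Q/E)L
print(scan("AAALPQLAAA", pssm, threshold=5.0))
```

The threshold controls the trade-off between sensitivity and false positives, which is why, as for M4 above, different filtering choices can yield very different instance counts.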
2.2 SLiMs are not directly linked to AD activity
To investigate whether the SLiMs overlap with ADs, we used data from a high-throughput yeast-based assay (Morffy et al., 2024), with the AD threshold (μd) set at one standard deviation above the mean AD score of the full library (Figure 1a). Five clade members lack functional ADs, ANAC087 and ANAC046 each contain two ADs, and each of the remaining transcription factors contains one identified AD. We found a link between transcription factors being strong activators (Hummel et al., 2023) and containing two experimentally identified ADs (Figure 1a) (Morffy et al., 2024). M1, M2, and M3 are located within ADs in some transcription factors, but not at the AD center, and there is no systematic overlap. Notably, an N-terminal AD detected for ANAC046 and ANAC087 was not identified for ANAC100 and ANAC079, even though all four contain both M1 and M2 in this region (Figure S1b). Combined, these analyses suggest either no direct involvement of SLiMs in AD activity or indirect involvement through regulatory modulation, for example, phosphorylation or binding of negative regulators.
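The AD calling and overlap test described above can be sketched as follows, assuming tiled AD scores (one score per tile start) and a threshold one standard deviation above the library mean. The tile length, the merging of overlapping or adjacent above-threshold tiles into a single AD, the helper names, and all numbers are assumptions for illustration, not the exact processing of Morffy et al. (2024).

```python
import statistics


def ad_threshold(library_scores):
    """AD threshold one standard deviation above the mean AD score of the whole library."""
    return statistics.mean(library_scores) + statistics.stdev(library_scores)


def call_ads(tile_starts, tile_scores, tile_length, threshold):
    """Merge overlapping or adjacent above-threshold tiles into ADs (1-based, inclusive)."""
    regions = []
    for start, score in sorted(zip(tile_starts, tile_scores)):
        if score < threshold:
            continue
        end = start + tile_length - 1
        if regions and start <= regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def overlaps(a, b):
    """True if two inclusive residue intervals share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical numbers for illustration only
cutoff = ad_threshold([0.0, 1.0, 2.0, 3.0, 4.0])
ads = call_ads([1, 11, 21, 31], [1.0, 5.0, 6.0, 2.0], tile_length=20, threshold=cutoff)
print(ads)                                      # [(11, 40)]
print([overlaps(ad, (35, 45)) for ad in ads])   # [True]
```

A simple interval overlap like this only answers whether a SLiM touches an AD; checking whether it sits at the AD center, as discussed above, would additionally compare the SLiM position with the AD midpoint.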
The AD scores in the C-terminal region of some of the transcription factors exceeded the threshold value; hence, we examined the C-terminal sequences (Figure 1c). The 21 C-terminal residues of all transcription factors contain hydrophobic or aromatic residues (Figure 1c), and most are rich in acidic residues as reflected in low isoelectric points (Figures 1c and S3). ANAC046 contains a SLiM (here named M5) (Figure 1c) in its C-terminus, known as the Radical-Induced Cell Death1 (RCD1)-RST interaction motif (RIM) (Christensen et al., 2019). M5 has also been identified in other transcription factor families (Christensen et al., 2019; O'Shea et al., 2017). In clade II-3, M5 is also present in ANAC087 (Christensen et al., 2019). The comparison of the clade C-termini to the AD activity suggests that the presence of acidic and hydrophobic/aromatic residues is needed for activity.
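The low isoelectric points of the C-terminal sequences reflect their acidic composition. A minimal pI estimate can be sketched by bisection on the Henderson-Hasselbalch net charge. The pKa set below is one textbook choice and an assumption here; tools such as EMBOSS or ExPASy use different values, so absolute pIs will differ slightly from any reported in the figures.

```python
# One textbook pKa set (assumption); other pI tools use different values.
PKA = {"Nterm": 9.0, "Cterm": 2.0, "K": 10.5, "R": 12.5, "H": 6.0,
       "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a peptide with free termini."""
    seq = seq.upper()
    charge = 1 / (1 + 10 ** (ph - PKA["Nterm"])) - 1 / (1 + 10 ** (PKA["Cterm"] - ph))
    for aa in "KRH":   # basic side chains: positive when protonated
        charge += seq.count(aa) / (1 + 10 ** (ph - PKA[aa]))
    for aa in "DECY":  # acidic side chains: negative when deprotonated
        charge -= seq.count(aa) / (1 + 10 ** (PKA[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH; the net charge decreases monotonically as pH increases."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(isoelectric_point("DDEEWLD"), 2))  # toy acidic peptide: pI well below 7
```

Bisection is sufficient here because net charge is a strictly decreasing function of pH, so exactly one zero crossing exists between pH 0 and 14.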
To analyze the interplay between SLiMs, ADs, and contexts within a long IDR, we focused on ANAC046. This transcription factor is a strong activator (Hummel et al., 2023) containing two ADs (Figure 1a) and three of the four identified SLiMs (Figure 1), and it is functionally well-characterized (Mahmood et al., 2019; Oda-Yamamizo et al., 2016). From the AD scores, which report on the potential for a sequence region to activate transcription (Morffy et al., 2024), ANAC046 contains an N-terminal and a C-terminal AD spanning 30 and 18 residues, respectively (Figure 1d). The N-terminal AD (AD1) overlaps with M1 and M2 (Figure 1d, lower). The C-terminal AD (AD2) was identified previously (O'Shea et al., 2015) and overlaps with M5, hinting at regulatory effects of this SLiM in activation. Based on these qualities, we proceeded to provide a detailed molecular characterization of the features of the ANAC046 IDR responsible for coregulator binding.
2.3 The ANAC046 IDR: Lack of residual structure, with propensity to phase separate
The IDR of ANAC046 spans 167 residues (ANAC046172–338) of the 339-residue protein (Figure 2a). To characterize the IDR, we first used nuclear magnetic resonance (NMR) spectroscopy. The Cα and Cβ secondary chemical shifts (SCS), calculated from previous work (Newcombe et al., 2021), showed no stretches of consecutive positive or negative values indicative of secondary structure (Figures 2b and S4a). The backbone dynamics, probed by 15N transverse (R2) and longitudinal (R1) relaxation rates and {1H}-15N heteronuclear NOE (hetNOE) measurements, supported the lack of secondary structure but revealed a small increase in R2 and hetNOE values for parts of both AD1 and AD2 (Figures 2b, c and S4a). Because AD1 and AD2 lack substantial residual secondary structure, they may instead form long-range contacts within the ensemble, leading to the observed local increase in R2.
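For reference, the SCS analysis amounts to subtracting random-coil values from the observed shifts and looking for runs of same-signed values. The sketch below assumes per-residue dictionaries of shifts (ppm) keyed by residue number; the ±1 ppm cutoff and the three-residue minimum run length for ΔCα − ΔCβ are illustrative assumptions, not thresholds used in this study, and the helper names are hypothetical.

```python
def secondary_shifts(observed, random_coil):
    """SCS (ppm) = observed minus random-coil shift, per residue number."""
    return {res: observed[res] - random_coil[res] for res in observed if res in random_coil}


def combined_scs(d_ca, d_cb):
    """dCa - dCb per residue: positive runs suggest helix, negative runs suggest strand."""
    return {res: d_ca[res] - d_cb[res] for res in d_ca if res in d_cb}


def secondary_structure_runs(values, cutoff=1.0, min_len=3):
    """Return (first, last, sign) for runs of >= min_len consecutive residue numbers whose
    values share a sign and exceed |cutoff|. Cutoff and min_len are illustrative choices."""
    runs, current = [], []

    def close_run():
        if len(current) >= min_len:
            runs.append((current[0][0], current[-1][0], "+" if current[0][1] > 0 else "-"))

    for res in sorted(values):
        v = values[res]
        sign = 0 if abs(v) <= cutoff else (1 if v > 0 else -1)
        if current and sign != 0 and sign == current[-1][1] and res == current[-1][0] + 1:
            current.append((res, sign))
        else:
            close_run()
            current = [(res, sign)] if sign != 0 else []
    close_run()
    return runs


dca_minus_dcb = {1: 1.5, 2: 1.8, 3: 2.0, 4: 0.2, 5: -1.2, 6: -1.5}  # hypothetical values
print(secondary_structure_runs(dca_minus_dcb))  # [(1, 3, '+')]
```

An IDR without residual structure, as reported above, would return an empty list for any reasonable cutoff.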

To address whether other features of the ANAC046172–338 sequence could be responsible for the increased R2 values, we analyzed the sequence using IDDomainspotter (Millard et al., 2020) (Figure 2c). This revealed the IDR to be overall hydrophobic, with a high content of acidic residues (mainly aspartates) in the two ADs. This agrees with the classification of the two ADs as subtype 1 ADs, which are enriched in these residue types (Morffy et al., 2024). The IDR is rich in aromatic (7% FYW), glycine (8%), and serine (13%) residues, features found in several phase-separating proteins. Accordingly, two central regions were predicted to be prone to phase separation (Ibrahim et al., 2023) (Figure 2d), although the full ANAC046172–338 sequence is not predicted to undergo homotypic phase separation by a recently described sequence-based prediction model (von Bülow et al., 2024).
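Composition fractions such as those quoted above are residue counts normalized by sequence length. A minimal sketch follows; the residue groups are taken from the text, `composition` is a hypothetical helper, and the ANAC046 sequence itself is not reproduced here, so the reader must supply it.

```python
def composition(seq: str, groups: dict) -> dict:
    """Fraction of residues belonging to each named group (case-insensitive)."""
    seq = seq.upper()
    return {name: sum(seq.count(aa) for aa in letters) / len(seq)
            for name, letters in groups.items()}


GROUPS = {"aromatic (FYW)": "FYW", "glycine": "G", "serine": "S", "acidic (DE)": "DE"}

# Replace with the ANAC046 IDR sequence (residues 172-338), e.g. retrieved from UniProt.
example = "GSSFDE"  # toy sequence
print(composition(example, GROUPS))
```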
So far, the characterization of ANAC046172–338 suggests a highly dynamic IDR with no residual secondary structure. To address chain compactness, we recorded small-angle X-ray scattering (SAXS) data on ANAC046172–338. We also simulated the IDR ensemble using the CALVADOS 2 coarse-grained model (Tesei & Lindorff-Larsen, 2023) and reweighted the ensemble against the SAXS data (Figure 2e, f). From the reweighted ensemble, we calculated an apparent scaling exponent (ν) of 0.56, in good agreement with the value of 0.53 predicted directly from sequence (Tesei et al., 2024). Compared with the distribution of ν values found for a broad range of human IDPs (Tesei et al., 2024; Tesei & Lindorff-Larsen, 2023), the ANAC046 IDR ensemble is slightly expanded.
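An apparent scaling exponent is commonly obtained by fitting mean (or root-mean-square) inter-residue distances to a power law, R_ij = b·|i − j|^ν, over pairs separated by enough residues to avoid local effects. The log–log least-squares sketch below assumes the ensemble-averaged distance for each sequence separation is already computed; the minimum separation (`min_sep`) and the helper name are assumptions, and the exact fitting protocol of Tesei et al. may differ.

```python
import math


def fit_scaling_exponent(separations, mean_distances, min_sep=5):
    """Least-squares fit of log R = log b + nu * log|i - j|; returns (nu, b).
    separations: sequence separations |i - j|; mean_distances: ensemble-averaged R_ij."""
    pts = [(math.log(s), math.log(r))
           for s, r in zip(separations, mean_distances) if s >= min_sep]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    nu = sxy / sxx
    b = math.exp(mean_y - nu * mean_x)
    return nu, b


# Synthetic check: distances generated with nu = 0.56 and b = 0.55 nm are recovered.
seps = list(range(1, 167))
dists = [0.55 * s ** 0.56 for s in seps]
nu, b = fit_scaling_exponent(seps, dists)
print(round(nu, 3), round(b, 3))  # 0.56 0.55
```

For a real ensemble, values of ν above about 0.5 indicate a chain more expanded than an ideal (theta-solvent) chain, which is the sense in which the ANAC046 IDR is described as slightly expanded.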
2.4 The ANAC046 IDR binds regulators promiscuously at overlapping sites
We next explored whether the identified AD/SLiM regions in the ANAC046 IDR, AD1(181–210)/M1(184–191)-M2(202–207) (named AD1/M1-2) and AD2(321–338)/M5(328–297) (named AD2/M5) (Figure 2b), are involved in partner binding. Here, we sought proteins related to both positive and negative regulation (Table 1). We selected the RST domains of the negative regulator RCD1, known for interactions with transcription factors containing M5 (O'Shea et al., 2015; O'Shea et al., 2017; Shapiguzov et al., 2019), and of the positive coregulator TATA-box binding protein associated factor 4 (TAF4). Because M2 resembles the LP(Q/E)L SLiM known to bind the TAZ1 domain of CBP (Berlow et al., 2017), we selected the KIX, TAZ1, and TAZ2 domains of HAC1, the Arabidopsis homolog of CBP. Lastly, we included the ACID domain of the Arabidopsis MEDIATOR subunit 25 (MED25), known to bind transcription factors (Theisen et al., 2024). In total, six domains from important transcriptional regulators were selected (Table 1). All have a high isoelectric point (pI: 8.7–10.3), and all are α-helical except MED25-ACID, which forms a β-barrel (Figure S5) (Theisen et al., 2024).
Domain | Arabidopsis parent protein | Relevance | References |
---|---|---|---|
RST | RCD1: negative regulator | Binds transcription factors through M5 | (O'Shea et al., 2015; O'Shea et al., 2017; Shapiguzov et al., 2019) |
RST | TAF4: coactivator, part of the TFIID transcription initiation complex | Binds transcription factors through M5 | (Friis Theisen et al., 2022) |
TAZ1 | HAC1: coactivator homolog of human CBP | M2 is similar to TAZ1-binding SLiM LP(Q/E)L | (Berlow et al., 2017) |
TAZ2 | HAC1: coactivator homolog of human CBP | CBP-TAZ2 binds transcription factors promiscuously | (DelRosso et al., 2024) |
KIX | HAC1: coactivator homolog of human CBP | CBP-KIX binds transcription factors specifically | (DelRosso et al., 2024) |
ACID | MED25: subunit 25 of the transcriptional coactivator MEDIATOR | MED25-ACID binds various transcription factors | (Theisen et al., 2024) |
To test whether the regulator domains interact with the ANAC046 IDR, we recorded 15N-HSQC NMR spectra of 15N-ANAC046172–338 alone and in the presence of each domain (Figure 3a, b). For all six domains, NMR peak intensities of ANAC046172–338 decreased, suggestive of interactions. For HAC1-TAZ2, peak intensities decreased across the entire IDR, and most peaks became almost invisible, especially in the AD1/M1-2 and AD2/M5 regions (Figure 3b, annotated peaks). This general signal loss across the chain can be explained by precipitation upon mixing the proteins, which also occurred to varying degrees for the other domains. To distinguish between the effects of the different partners, we focused on the chemical shifts of three residues positioned differently in ANAC046172–338: S189 in AD1/M1-2, S261 in the middle part, and M315 near AD2/M5 (Figure 3c). For S261, no change in peak position was observed with any of the partners, suggesting that the central region does not participate in the interactions. S189 and M315, positioned in AD1/M1-2 and at the border of AD2/M5, respectively, experienced different chemical shift perturbations (CSPs) depending on the interacting domain. HAC1-TAZ1, HAC1-KIX, and RCD1-RST induced similar changes for the two residues. In contrast, MED25-ACID and TAF4-RST induced spectral changes similar to each other, although S189 did not move in the presence of MED25-ACID. Thus, the IDR experienced similar chemical environments in its interactions with HAC1-KIX, HAC1-TAZ1, and RCD1-RST, and a different one in its interactions with MED25-ACID and TAF4-RST.
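CSPs such as those compared above are often summarized as one weighted amide value per residue. The sketch below uses the common form Δδ = sqrt(ΔδH² + (α·ΔδN)²); the weighting factor α = 0.14 is a frequently used choice but an assumption here, since the text does not state how CSPs were combined. The helper names and peak positions are hypothetical.

```python
import math


def combined_csp(d_h: float, d_n: float, alpha: float = 0.14) -> float:
    """Weighted amide CSP (ppm): sqrt(dH^2 + (alpha * dN)^2)."""
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def csp_profile(free: dict, bound: dict, alpha: float = 0.14) -> dict:
    """free/bound map a residue label to (1H ppm, 15N ppm); returns label -> combined CSP."""
    return {res: combined_csp(bound[res][0] - free[res][0], bound[res][1] - free[res][1], alpha)
            for res in free if res in bound}


# Hypothetical peak positions, for illustration only
free = {"res_a": (8.30, 116.0), "res_b": (8.20, 117.0)}
bound = {"res_a": (8.35, 116.5), "res_b": (8.20, 117.0)}
print(csp_profile(free, bound))
```

The scaling factor down-weights 15N shifts, which span a roughly five- to ten-fold wider range than amide 1H shifts; note that a single combined value discards the direction of peak movement, which is what distinguished the two groups of partners above.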

To define the regulator binding sites in the ANAC046 IDR, we focused on the systems with the least visible aggregation and performed NMR titrations of 15N-ANAC046172–338 with RCD1-RST and HAC1-TAZ1 (Figure 3d, e). Adding RCD1-RST to 15N-ANAC046172–338 decreased the intensities of residues within AD1/M1-2 and AD2/M5, with the effects at the C-terminal site extending beyond AD2/M5 to cover residues 300–338. The titration with HAC1-TAZ1 was slightly affected by precipitation (Figure 3e). Still, we observed binding of HAC1-TAZ1 to AD1/M1-2 and AD2/M5, including their flanking regions, highly analogous to the titration with RCD1-RST. Furthermore, changes in the intensities of residues bordering an NMR-invisible central region in the IDR (Figure 3d, e, marked with *) could suggest a weak third binding site or a partner-induced redistribution of intramolecular contacts. For the interactions with RCD1-RST and HAC1-TAZ1, where full titrations could be obtained, AD1/M1-2 is in fast to intermediate exchange and AD2/M5 is in intermediate to slow exchange on the NMR time scale (Figure 3c). Pronounced precipitation prevented full titrations of the remaining complexes. Instead, we analyzed the NMR peak intensity ratios (Ibound/Ifree) from the spectra in Figure 3a, except for the HAC1-TAZ2 complex, which precipitated extensively (Figure S11). These ratios show that all five domains bind the ANAC046 IDR at the same two sites. Based on these NMR observations, the three SLiMs M1, M2, and M5 are involved in regulator binding, although not alone: the full ADs and additional flanking regions outside the ADs are also involved in the interactions.
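The exchange regimes noted above follow from comparing the exchange rate constant, k_ex = k_on[L] + k_off, with the chemical shift difference Δω between free and bound states. The sketch below assumes a simple two-state 1:1 binding model with k_off = k_on·KD; the association rate constant, the 0.5 ppm shift difference, and the factor of 3 separating the regimes are illustrative assumptions, not measured values, and `exchange_regime` is a hypothetical helper. With these assumptions, a sub-micromolar KD gives slow exchange whereas a KD near 100 μM gives fast exchange.

```python
import math


def exchange_regime(kon, kd, ligand_free, delta_ppm, larmor_mhz, factor=3.0):
    """Two-state 1:1 binding: compare k_ex = kon*[L]free + koff with delta_omega.
    kon in M^-1 s^-1, kd and ligand_free in M, delta_ppm in ppm, larmor_mhz in MHz."""
    koff = kon * kd                         # KD = koff / kon for simple 1:1 binding
    kex = kon * ligand_free + koff          # exchange rate constant (s^-1)
    delta_omega = 2 * math.pi * delta_ppm * larmor_mhz  # ppm x MHz = Hz, then rad/s
    ratio = kex / delta_omega
    if ratio > factor:
        return "fast"
    if ratio < 1 / factor:
        return "slow"
    return "intermediate"


# Assumed kon = 1e7 M^-1 s^-1 and a 0.5 ppm 15N shift at ~60.8 MHz (600 MHz 1H) for both sites
print(exchange_regime(1e7, 148e-6, 100e-6, 0.5, 60.8))  # fast (KD similar to AD1/M1-2)
print(exchange_regime(1e7, 0.12e-6, 1e-6, 0.5, 60.8))   # slow (KD similar to AD2/M5)
```

Because k_off scales with KD at a fixed k_on, the exchange regime is a rough, qualitative proxy for affinity, consistent with the weaker AD1/M1-2 site exchanging faster.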
We modeled the complexes between the AD/SLiMs and the different regulator domains using AlphaFold 3 (Abramson et al., 2024) (Figures 3f, i and S6). In all cases, the models predict induction of structure in the AD/SLiMs, but to varying degrees, and with more heterogeneity in the AD1/M1-2 complexes. In contrast, for AD2/M5, which exchanges on the intermediate to slow NMR timescale, the models are less heterogeneous, suggesting that the consistency of the AlphaFold 3 models may reflect the exchange rate and likely the affinity. The recently identified core structure formed by transcription factors in complex with RCD1-RST, consisting of a short strand followed by a turn (Newcombe et al., 2024), is present only in the RCD1-RST:AD2/M5 complex (Figure 3g). However, the turn forms in both AD1/M1-2 and AD2/M5 in complex with RCD1-RST and TAF4-RST. In these complexes, AD1/M1-2 and AD2/M5 share the RST-binding site, although in the TAF4-RST:AD2/M5 models, AD2/M5 binds in the reverse orientation compared with RCD1-RST:AD2/M5. For AD1/M1-2, the orientation varies among the models with RCD1-RST, whereas AD1/M1-2 adopts a single orientation in complex with TAF4-RST (Figures 3f, g and S6). Given the relations observed above between exchange regimes and model homogeneity, we speculate that AD1/M1-2 binds TAF4-RST and HAC1-KIX with higher affinity than RCD1-RST (Figure S6). In interactions with HAC1-TAZ2 and MED25-ACID, AD1/M1-2 and AD2/M5 use different binding interfaces and the models are more heterogeneous, potentially indicating weaker affinities for both (Figures 3h, i and S6). Together with the CSPs, the models suggest that the two AD/SLiMs of ANAC046 bind the same regulators, but with different binding modes and affinities, and with more structural heterogeneity in the AD1/M1-2 complexes.
We determined the affinities of the ANAC046 IDR for RCD1-RST and HAC1-TAZ1 using isothermal titration calorimetry (ITC) (Table 2, Figures S7 and S8). A short ANAC046 fragment covering just AD2/M5 (ANAC046319–338) was previously reported to bind RCD1-RST with a KD of 0.6 μM (O'Shea et al., 2015). Because NMR identified a more extended binding region, we designed a longer fragment covering residues 300–338 (ANAC046300–338). For ANAC046300–338, we obtained a KD of 0.12 ± 0.05 μM, a five-fold higher affinity than that of the shorter ANAC046319–338. The enhanced binding is reflected in a more favorable binding enthalpy (more negative ΔH), highlighting the contribution of the sequence context to M5 binding and core structure formation. The same ANAC046 fragment bound HAC1-TAZ1 with a 50-fold lower affinity (KD = 6 ± 2 μM). Because of unfavorable injection heats, the binding of a fragment containing AD1/M1-2 (ANAC046172–222) to the two regulators could not be analyzed by ITC. Instead, we used NMR and determined similar KD values of 148 ± 6 μM and 112 ± 6 μM for binding to RCD1-RST and HAC1-TAZ1, respectively (Table 2, Figures S9 and S10). Thus, there is a hierarchy in binding, with AD2/M5 binding the regulators more strongly than AD1/M1-2. Given the 110-residue linker between AD1/M1-2 and AD2/M5, we did not expect avidity or allovalency effects (Olsen et al., 2017) within the full IDR (ANAC046172–338). For binding of ANAC046172–338 to RCD1-RST, ITC captured only the high-affinity site (AD2/M5). For HAC1-TAZ1, we were able to probe binding to both AD/SLiMs within ANAC046172–338 using ITC, yielding KD values of 260 ± 10 μM and 3.0 ± 0.3 μM for AD1/M1-2 and AD2/M5, respectively, with a highly unfavorable entropic contribution for AD1/M1-2 (Figure 4a). These values agree with the affinities obtained for the individual AD/SLiM fragments (ANAC046172–222 and ANAC046300–338), underscoring the lack of coupling between AD1/M1-2 and AD2/M5 in regulator binding.
For both regulators, two binding sites within the long IDR were observed (n = 2).
Cell/syringe | N | KD (μM) | ΔH (kJ/mol) | −TΔS (kJ/mol) | ΔG (kJ/mol) |
---|---|---|---|---|---|
RCD1-RST499–572/ANAC046319–338 (a) | 0.88 ± 0.04 | 0.6 ± 0.1 | −24 ± 1 | −11.3 | −35.1 |
RCD1-RST499–572/ANAC046172–222 (b) | - | 148 ± 6 | - | - | - |
RCD1-RST499–572/ANAC046300–338 | 0.8 ± 0.1 | 0.12 ± 0.05 | −31 ± 7 | −8 ± 8 | −40 ± 1 |
ANAC046172–338/RCD1-RST499–572 | 0.89 ± 0.08 | 0.20 ± 0.07 | −23 ± 2 | −15 ± 3 | −38.5 ± 0.8 |
HAC1-TAZ1/ANAC046172–222 (b) | - | 112 ± 6 | - | - | - |
ANAC046300–338/HAC1-TAZ1 | 0.97 ± 0.09 | 6 ± 2 | −25 ± 1 | −5 ± 2 | −29.9 ± 0.7 |
ANAC046172–338/HAC1-TAZ1 (c), site 1 (AD2/M5) | 0.91 ± 0.01 | 3.0 ± 0.3 | −25 ± 2 | −7 ± 2 | −31.5 ± 0.2 |
ANAC046172–338/HAC1-TAZ1 (c), site 2 (AD1/M1-2) | 1.12 ± 0.03 | 260 ± 10 | −65 ± 4 | 44 ± 4 | −20.5 ± 0.1 |
- Note: Error on ITC obtained in this study is given as SEM from triplicates.
- a From O'Shea et al. (2015).
- b Interaction not measurable by ITC, affinity obtained from NMR including error of fit.
- c The interaction was fitted to a two-site binding model corresponding to two sets of sites.
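The quantities in Table 2 are related by ΔG = RT ln KD and ΔG = ΔH − TΔS. The sketch below recomputes ΔG and −TΔS from a KD and ΔH as a consistency check. It assumes a temperature of 25 °C (298.15 K), which is not stated in this section, and a 1 M standard state; the helper names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def delta_g(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy (kJ/mol) from KD in M: dG = RT ln(KD), 1 M standard state."""
    return R_KJ * temperature_k * math.log(kd_molar)


def minus_t_delta_s(dg: float, dh: float) -> float:
    """Entropic term (kJ/mol) from dG = dH - T dS, i.e. -T dS = dG - dH."""
    return dg - dh


# RCD1-RST with ANAC046300-338 (KD = 0.12 uM, dH = -31 kJ/mol in Table 2)
dg = delta_g(0.12e-6)
print(round(dg, 1), round(minus_t_delta_s(dg, -31.0), 1))  # -39.5 -8.5
```

The recomputed values fall within the reported uncertainties (−40 ± 1 and −8 ± 8 kJ/mol), and the same relation reproduces, for example, ΔG ≈ −20.5 kJ/mol for the 260 μM AD1/M1-2 site.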

The thermodynamic analysis revealed a favorable entropic contribution to the interactions between regulator domains and AD/SLiMs in the ANAC046 IDR, except for the association between AD1/M1-2 and HAC1-TAZ1 (Figure 4a). The favorable entropic contribution suggests a dynamic interaction sustained by conformational entropy (Skriver et al., 2023), consistent with the AlphaFold 3 models, although water and counterion release may also contribute. Dynamics in the RCD1-RST:AD2/M5 complex agree with a recent study showing that replacing the L-amino acid version of AD2/M5 with its D-amino acid version had little effect on binding (Newcombe et al., 2024). From the experiments with ANAC046172–338 and HAC1-TAZ1, the entropic penalty of the AD1/M1-2 association with HAC1-TAZ1 may suggest conformational restrictions upon complex formation involving the long ANAC046 IDR (Figure 4a). However, because aggregation impeded analyses at high concentrations, the thermodynamics for AD1/M1-2 should be interpreted with caution. Overall, the entropic contribution suggests a dynamic interaction between the AD/SLiMs of the ANAC046 IDR and the regulator domains.
2.5 Intramolecular interactions in the ANAC046 IDR
To explain the decrease in peak intensity in the central ANAC046 IDR region upon the addition of RCD1-RST or HAC1-TAZ1, which could not be explained by a third binding site, we analyzed the ANAC046 IDR for internal contacts. We explored three positions: S217C near AD1/M1-2, S259C in the center of the IDR, and C323 in AD2/M5. These sites were selected to be located close to, but outside, the two AD/SLiM regions and the NMR-invisible central region. We attached S-(1-oxyl-2,2,5,5-tetramethyl-2,5-dihydro-1H-pyrrol-3-yl)methylmethanesulfonothioate (MTSL) as a spin label to measure paramagnetic relaxation enhancements (PREs) by NMR. The unpaired electron of the spin label allows detection of long-range distances (up to 2.5 nm; Sjodt & Clubb, 2017). A control experiment adding MTSL-labeled 14N-ANAC046172–338 to 15N-ANAC046172–338 did not show any intermolecular effects (Figure S12), ruling out oligomerization. Thus, any effects of the MTSL label were predominantly caused by intrachain contacts. HSQC spectra of ANAC046172–338 with the label in the paramagnetic and diamagnetic states were recorded, and peak intensity ratios between the two states were calculated (Figure 4b). PRE effects were observed to different extents throughout the IDR for all label positions, suggesting that each labeled region contacts the rest of the IDR (Figure 4b). While spin labels at positions 217 or 323 affected intensity ratios mostly in AD1/M1-2, AD2/M5, and the central part of the IDR, a spin label at position 259 showed PREs throughout the IDR (Figure 4b).
The internal contacts revealed by the PREs made us revisit the reweighted IDR ensemble (Figure 2e) to address whether the ensemble could explain the PRE data (Tesei, Martins, et al., 2021). We derived the PRE effects from the simulated ensemble reweighted with SAXS, and overall, there was good agreement between the results obtained from the ensemble-derived and experimental approaches (Figure 4b, black dots), in particular for the PREs measured with labels at positions 217 and 259. This shows that the ANAC046-IDR can form dynamic internal contacts along the IDR and that long-range contacts (>8 Å) are more frequent than in a random coil chain as evident from the pair-distance distribution plots (Figure 4c). Together, this highlights the extensive dynamic molecular communication within the ANAC046-IDR ensemble, facilitated by long-range contacts.
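Back-calculating PREs from an ensemble, as done above, typically involves r⁻⁶ averaging of spin-label to amide-proton distances over conformers, converting the average to a transverse relaxation rate Γ2 with a simplified Solomon-Bloembergen expression, and predicting the intensity ratio I_para/I_dia with a commonly used HSQC expression. The sketch follows that general scheme. The correlation time τc, intrinsic R2, INEPT delay, field strength, and distances are assumptions, the helper names are hypothetical, and the actual procedure of Tesei, Martins, et al. (2021) may differ, for example in how τc is treated.

```python
import math

# K = 1.23e-32 cm^6 s^-2 for a nitroxide electron and an amide proton, converted to m^6 s^-2
K_PRE = 1.23e-44


def gamma2(distances_m, tau_c_s, larmor_h_hz):
    """Transverse PRE rate (s^-1) for one amide proton from its spin-label distances across
    ensemble conformers (m), using <r^-6> averaging and a simplified Solomon-Bloembergen form."""
    omega_h = 2 * math.pi * larmor_h_hz
    mean_r6 = sum(r ** -6 for r in distances_m) / len(distances_m)
    spectral = 4 * tau_c_s + 3 * tau_c_s / (1 + (omega_h * tau_c_s) ** 2)
    return K_PRE * mean_r6 * spectral


def intensity_ratio(g2, r2_dia, delay_s):
    """Predicted I_para/I_dia for an HSQC with total INEPT delay delay_s."""
    return r2_dia * math.exp(-g2 * delay_s) / (r2_dia + g2)


# Assumed tau_c = 4 ns, 600 MHz, R2 = 10 s^-1, 10 ms delay; distances are hypothetical
near = gamma2([1.5e-9, 2.5e-9], 4e-9, 600e6)
far = gamma2([4.0e-9, 5.0e-9], 4e-9, 600e6)
print(intensity_ratio(near, 10.0, 0.01), intensity_ratio(far, 10.0, 0.01))
```

Because of the r⁻⁶ weighting, a small population of close contacts dominates the average, which is why transient long-range contacts in a dynamic ensemble can still produce clear PRE effects.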
We finally addressed whether the contact distribution within the ANAC046 IDR would be altered upon regulator interaction. Here, we exploited position 323 and added RCD1-RST at a 1:1 molar ratio to 15N-ANAC046172–338 C323-MTSL, which at the concentration used primarily saturates AD2/M5 (96%) and, to a lesser degree, AD1/M1-2 (2%) (Figure 4b, last panel). Compared with the unbound ANAC046 IDR, RCD1-RST binding increased PRE effects throughout the IDR, suggesting that RCD1-RST further stabilizes the internal contacts in ANAC046 (Figure 4b, lower panel). Supporting this conclusion, SAXS data gave an Rg of 3.53 ± 0.03 nm for the ANAC046 IDR alone and 3.80 ± 0.04 nm for its 1:1 complex with RCD1-RST (Figure S13), with an increased proportion of long-range distances (>8 Å) in the ensemble (Figure 4c). Overall, this supports the conclusion that the ANAC046 IDR retains its intramolecular interactions when bound to a partner.
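Site occupancies like those quoted above follow from treating AD1/M1-2 and AD2/M5 as independent sites on one chain competing for the same regulator, in line with the lack of coupling found by ITC. The sketch solves the ligand mass balance numerically. The KD values are taken from Table 2 (0.20 μM for RCD1-RST binding ANAC046172–338, and 148 μM for AD1/M1-2 from NMR), whereas the 100 μM concentrations are hypothetical because the concentration used is not given here; `two_site_occupancy` is a hypothetical helper.

```python
def two_site_occupancy(p_total, l_total, kd_high, kd_low, rel_tol=1e-12):
    """Occupancies of two independent sites on one chain binding the same ligand.
    Solves L_total = L_free + P_total*(theta_high + theta_low) for L_free by bisection.
    All concentrations and KDs must share one unit (e.g. uM)."""
    def bound(l_free):
        return p_total * (l_free / (l_free + kd_high) + l_free / (l_free + kd_low))

    lo, hi = 0.0, l_total
    while hi - lo > rel_tol * max(l_total, 1.0):
        mid = (lo + hi) / 2
        if mid + bound(mid) > l_total:
            hi = mid   # too much ligand accounted for: free ligand must be lower
        else:
            lo = mid
    l_free = (lo + hi) / 2
    return l_free / (l_free + kd_high), l_free / (l_free + kd_low)


# KDs from Table 2 (uM); the 100 uM concentrations are hypothetical
theta_ad2, theta_ad1 = two_site_occupancy(100.0, 100.0, kd_high=0.20, kd_low=148.0)
print(f"AD2/M5: {theta_ad2:.0%}, AD1/M1-2: {theta_ad1:.0%}")
```

Bisection is used because the total ligand accounted for (free plus bound) increases monotonically with free ligand, so a single root exists between zero and the total ligand concentration.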
3 DISCUSSION
In this work, we analyzed whether SLiMs and ADs coincide in long IDRs of transcription factors. Our SLiM search of the NAC clade II-3 identified M1–M4, with instances in additional Arabidopsis transcription factors for all but M3. Their functions remain unknown, and the importance of SLiMs for AD function is still debated (Udupa et al., 2024). A fifth SLiM, M5, previously identified functionally as the RCD1-RST interaction motif, is present in the C-terminal regions of ANAC046 and ANAC087 within the clade. Although M1, M2, and M5 overlap with ADs in two regulatory hotspots in ANAC046 and ANAC087, SLiMs and ADs do not consistently overlap in the members of clade II-3. However, this specific screen (Morffy et al., 2024) may not have identified all ADs, which could partly explain the lack of systematic overlap between SLiMs and ADs.
We paid special attention to ANAC046, which carries two distant AD/SLiM regions separated by a long linker and is therefore a suitable model for studying the ensemble properties of, and functional interplay within, a long transcription factor IDR. The ANAC046 IDR lacks residual secondary structure and populates an ensemble with the potential for dynamic molecular communication through long-range contacts. Although the central part of the IDR facilitates most of the contacts, the AD/SLiM regions at each end of the ANAC046 IDR also transiently interact. The ensemble distribution and compactness can affect the transcriptional activity of transcription factors (Flores et al., 2024). From the SAXS data, regulator binding to the C-terminal AD/SLiM does not change the overall ensemble size; in fact, it may slightly increase the number of long-range contacts. At higher regulator concentrations, where both AD/SLiM regions would be occupied, compaction may, however, change.
The two regulatory hotspots in the ANAC046 IDR are separated by a long linker, and we asked what roles this linker might play. First, the long flexible linker may provide conformational buffering (González-Foutel et al., 2022). Second, the linker possesses features such as phase separation propensity and a long SLiM (M4) of potential functional importance. NMR titrations of ANAC046172–338 with the different regulator domains resulted in a general peak intensity decrease across the IDR, likely due to precipitation upon mixing. This suggests that the ANAC046 IDR may aggregate or even form condensates in the presence of regulators (Figure 5a), as in the case of Arabidopsis HAC and transcription factors (Theisen et al., 2024). The length of M4 suggests that it can be divided into two SLiMs, one of which resembles an (S/T)Q motif phosphorylated by stress kinases (Kim et al., 1999). The PRE analysis shows that the M4 region is devoid of intramolecular communication, at least in the absence of phosphorylation. Thus, phosphorylation of this region may play a regulatory role in redistributing the ANAC046 IDR ensemble.

Generally, ADs are regarded as functionally interchangeable (Ptashne & Gann, 1997) and able to bind unrelated regulators (Ravarani et al., 2018; Sigler, 1988; Warfield et al., 2014). Here, we analyzed the binding of the AD/SLiM hotspot regions of ANAC046 to a set of six biologically relevant regulator domains. Indeed, both AD1, overlapping with M1 and M2, and AD2, overlapping with M5, bound all of these domains. This seemingly low specificity contrasts with recent studies of human CBP showing that the KIX domain binds specific ADs, whereas the TAZ2 domain binds promiscuously, interacting with many ADs (DelRosso et al., 2024). Our approach allowed us to ask what happens to the structureless AD/SLiM regions of ANAC046 upon complex formation. Our recent work revealed that transcription factors from different families share a core structure in complex with RCD1-RST (Newcombe et al., 2024). The AlphaFold 3 models of AD1/M1-2 and AD2/M5 in complex with the different regulator domains reveal a complete core structure only in the complexes between RCD1-RST and the AD/SLiMs, whereas only the turn of the core structure may form in the other complexes. In these other models, the AD/SLiMs of ANAC046 form turn or helical structure to varying degrees and mostly bind to different binding surfaces, with larger heterogeneity in the AD1/M1-2 complexes. Together with the entropic contribution determined by ITC, this suggests that the complexes between AD/SLiMs and regulator domains are dynamic, as proposed for the RCD1-RST:AD2/M5 interaction (Newcombe et al., 2024). Understanding specificity in these types of interactions is complicated by their dynamics and heterogeneity. The limited chemical specificity of the ANAC046 hotspots can lead to a broad range of interactions that are context-dependent and influenced by the cellular environment.
Based on our studies, AD2/M5 binds the regulators with higher affinity than AD1/M1-2. This suggests that regulator binding confers a graded response to a cellular cue, with an output that depends on competitive binding of positive and negative regulators (Figure 5b). RCD1 binds AD2/M5, which is exposed at the end of a long flexible chain, with the highest affinity and would therefore bind there first. RCD1-RST may then bind AD1/M1-2 but occupy it only partially because of the lower affinity. A similar sequential binding was observed for HAC1-TAZ1 binding to the two ANAC046 AD/SLiMs. Thus, owing to the differences in Kd values, the regulation of ANAC046 appears to be both hierarchical and rheostatic. This further implies that regulators with a large difference in Kd between the two AD/SLiM sites would operate over a wider concentration range, giving a broader dynamic response window, a form of regulation enabled by the presence of more than one AD/SLiM region. Together, this suggests a promiscuous binding strategy within the ANAC046 IDR that functions in competition-based rheostatic regulation; it has little chemical specificity but relies on the protein levels of both positive and negative regulators, which are regulated by, for example, stress and senescence, as in the case of ANAC046 (Oda-Yamamizo et al., 2016), or by modifications. The existence of two distant and uncoupled AD/SLiMs explains how ANAC046 can be a strong activator (Hummel et al., 2023) while retaining the ability to fine-tune transcription. We suggest that similar regulatory mechanisms operate in other transcription factors with AD-dependent regulation through distant AD/SLiM regions.
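The occupancy argument above can be made concrete with a minimal sketch. It assumes two independent (uncoupled) sites, regulator in excess so that free concentration approximates total concentration, and hypothetical Kd values; it is an illustration, not a model fitted to the ANAC046 data. It shows how two sites with different Kd values spread the response over a wider concentration range than a single site, and how a competing regulator lowers occupancy by another.

```python
# Minimal sketch: graded occupancy of two uncoupled AD/SLiM sites.
# Assumptions (illustrative only): independent sites, free regulator ~ total
# regulator (regulator in excess), and hypothetical Kd values.


def occupancy(conc: float, kd: float) -> float:
    """Fractional occupancy of one site: [L] / (Kd + [L])."""
    return conc / (kd + conc)


def competitive_occupancy(conc_a: float, kd_a: float, conc_b: float, kd_b: float) -> float:
    """Fraction of one site bound by regulator A while regulator B competes for it."""
    a, b = conc_a / kd_a, conc_b / kd_b
    return a / (1.0 + a + b)


def response_window(kd_high: float, kd_low: float) -> float:
    """Fold change in regulator concentration from 10% occupancy of the
    high-affinity site (Kd_high / 9) to 90% occupancy of the low-affinity
    site (9 * Kd_low)."""
    return (9.0 * kd_low) / (kd_high / 9.0)


if __name__ == "__main__":
    kd_ad2, kd_ad1 = 0.5, 20.0  # hypothetical Kd values (uM); AD2/M5 binds tighter
    for conc in (0.1, 1.0, 10.0, 100.0):
        print(f"{conc:6.1f} uM  AD2/M5={occupancy(conc, kd_ad2):.2f}  "
              f"AD1/M1-2={occupancy(conc, kd_ad1):.2f}")
    print("single-site window:", response_window(kd_ad2, kd_ad2))
    print("two-site window:", response_window(kd_ad2, kd_ad1))
    # A competitor B at equal concentration and affinity reduces A's occupancy.
    print("AD2/M5 bound by A with B present:",
          competitive_occupancy(1.0, kd_ad2, 1.0, kd_ad2))
```

For a single site, going from 10% to 90% occupancy always spans an 81-fold change in concentration; for two uncoupled sites in this sketch, the window grows in proportion to the ratio of the low-affinity to the high-affinity Kd.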
4 MATERIALS AND METHODS
4.1 Bioinformatics
The viridiplantae ortholog alignments from Proviz (Jehl et al., 2016) were exported for each of the C-terminal IDRs of the clade II-3 transcription factors; a logo plot was constructed in a logo generator (http://slim.icr.ac.uk/visualisation/) using the relative binomial representation.
Conserved regions in a non-conserved context that matched the definition of a SLiM (approximately 4–10 residues in length) and were found in multiple IDR logos were identified and extracted. To identify instances in other Arabidopsis transcription factors, sequence alignments were generated for SLiM-containing regions from transcription factor orthologs in other viridiplantae species using GOPHER (Davey et al., 2007). Each sequence in the alignments was scored against the original Arabidopsis query SLiM regions using BLOSUM62, and peptide sequences with a score of less than 75% were discarded. The alignments of the remaining sequences were used to generate a position-specific scoring matrix (PSSM) for each SLiM in PSSMSearch (Krystkowiak et al., 2018). The PSI-BLAST IC scoring method with default settings was used for searching the Arabidopsis proteome. The resulting instance list was filtered for Arabidopsis transcription factors in PlantTFDB (Jin et al., 2017), for disorder (IUPred (Erdős et al., 2021) score >0.4), and for accessibility (pLDDT score >0.5). For each SLiM, the residue frequency at each position in the PSSM (Figure S2) was used to define a SLiM consensus. Two SLiM consensus sequences were identified for M4, and each was filtered separately.
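The PSSM and consensus steps can be illustrated with a minimal sketch. It builds a generic log-odds PSSM (uniform background, pseudocounts) and is not the PSI-BLAST IC scoring implemented in PSSMSearch; the aligned peptides are assumed to be ungapped and of equal length, and the example peptides and the 0.5 consensus threshold are hypothetical.

```python
# Sketch: generic log-odds PSSM, consensus, and sliding-window scan.
# Not PSSMSearch's PSI-BLAST IC method; peptides must be ungapped, equal length.
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(peptides):
    """Per-position residue frequencies for aligned peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to equal length")
    freqs = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        freqs.append({aa: counts[aa] / len(peptides) for aa in AMINO_ACIDS})
    return freqs


def log_odds_pssm(peptides, pseudocount=1.0):
    """Log2-odds scores against a uniform background (1/20 per residue)."""
    n = len(peptides)
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column_freqs in position_frequencies(peptides):
        column = {}
        for aa in AMINO_ACIDS:
            prob = (column_freqs[aa] * n + pseudocount * background) / (n + pseudocount)
            column[aa] = math.log2(prob / background)
        pssm.append(column)
    return pssm


def consensus(peptides, min_freq=0.5):
    """Most frequent residue per position; 'x' if none reaches min_freq."""
    out = []
    for column_freqs in position_frequencies(peptides):
        aa, freq = max(column_freqs.items(), key=lambda kv: kv[1])
        out.append(aa if freq >= min_freq else "x")
    return "".join(out)


def best_hit(pssm, sequence):
    """Slide the PSSM along a sequence; return (start, score) of the best window."""
    width = len(pssm)
    best = (None, -math.inf)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20-letter alphabet get the column's lowest score.
        score = sum(col.get(aa, min(col.values())) for col, aa in zip(pssm, window))
        if score > best[1]:
            best = (start, score)
    return best


peptides = ["LDLNL", "LDLSL", "IDLNL", "LELNL"]  # hypothetical SLiM instances
print(consensus(peptides), best_hit(log_odds_pssm(peptides), "AAAALDLNLAAAA"))
```

Scanning a proteome with this sketch would amount to calling best_hit on each sequence, followed by the PlantTFDB, disorder, and accessibility filters described above.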
Sequence properties and residue-type density in the transcription factor IDRs were illustrated using IDDomainspotter with default settings (Millard et al., 2020). ParSe (Ibrahim et al., 2023) was used with default settings to analyze the phase separation propensity of the ANAC046172–338 sequence. The C-termini (21 residues) of the NAC clade members were aligned using Clustal Omega (Sievers & Higgins, 2018).
4.2 Protein expression and purification
The IDR of ANAC046 (ANAC046172–338) was expressed with a His6-SUMO tag from the pET24a vector in Rosetta (DE3) cells and purified either as previously described (Newcombe et al., 2021) or as detailed below. The cells were lysed by sonication on ice in 20 mM Na2HPO4/NaH2PO4 pH 7.0, 300 mM NaCl (also used as equilibration and wash buffer) and centrifuged at 18,000 rpm for 15 min at 4°C. The lysate was purified by immobilized metal affinity chromatography (IMAC) using TALON® Superflow™ (Cytiva) resin (5 mL per L culture). After equilibration, the resin was incubated with the lysate for 1 h at 4°C under rotation and washed with 3 column volumes (CV) of wash buffer. Bound protein was eluted with 20 mM Na2HPO4/NaH2PO4 pH 7.0, 100 mM NaCl, 250 mM imidazole (2 CV). Before cleavage, the sample was dialyzed against 20 mM Na2HPO4/NaH2PO4 pH 7.0, 100 mM NaCl, 1 mM dithiothreitol (DTT) to remove imidazole. The His6-SUMO tag was cleaved with Ubl-specific protease 1 (ULP1; produced in-house (Singh & Graether, 2020)) for 3 h at 4°C under rotation. In a second IMAC step, the cleaved protein was collected in the flow-through, which was concentrated using an Amicon® spin filter (MWCO 10 kDa, Millipore), followed by acid precipitation of the IDR (Newcombe et al., 2021). This step was repeated before purification by size exclusion chromatography (SEC) in the desired experimental buffer (Superdex™ Increase 75 10/300 GL, GE Healthcare).
Variants of ANAC046172–338 containing cysteine mutations, ANAC046172–338,S259C;C323S and ANAC046172–338,S217C;C323S, were expressed with a His6-SUMO tag from a pET24a vector in BL21 (DE3) cells and purified as described for ANAC046172–338.
ANAC046172–222,S175W and ANAC046300–338 were expressed with an N-terminal His6-SUMO tag from a pET24a vector in BL21 (DE3) cells. The cells were grown at 37°C to an OD600 of 0.6–0.8, induced with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG), and incubated overnight at 16°C.
For ANAC046172–222,S175W, cells were lysed by sonication in 50 mM Na2HPO4/NaH2PO4, pH 7.0, 150 mM NaCl. After centrifugation, the lysate was purified by IMAC (TALON® Superflow™, Cytiva; 5 mL resin per 1 L culture) equilibrated in lysis buffer. After incubation for 1 h, the resin was washed with 50 mM Na2HPO4/NaH2PO4, pH 7.0, 300 mM NaCl (10 CV) before elution with 50 mM Na2HPO4/NaH2PO4, pH 7.0, 150 mM NaCl, 250 mM imidazole (2 CV). The eluate was dialyzed against 50 mM Na2HPO4/NaH2PO4, pH 7.0, 150 mM NaCl, and the His6-SUMO tag was cleaved with ULP1 overnight at 4°C under rotation after adding 1 mM DTT to the sample. The sample was further purified by SEC (Superdex™ Peptide 10/300 GL, Cytiva) into the desired experimental buffer. A similar protocol with different buffers was used for ANAC046300–338: 20 mM Tris–HCl pH 8.0, 300 mM NaCl for lysis and for equilibration and washing of the IMAC column, and 20 mM Tris–HCl pH 8.0, 100 mM NaCl, 250 mM imidazole for IMAC elution. The dialysis buffer was 20 mM Tris–HCl pH 8.0, 100 mM NaCl. After cleavage with ULP1 (same conditions as for ANAC046172–222), a second IMAC step was performed as the first, and the flow-through was purified by SEC.
The RCD1-RST (residues S499-S572) and TAF4-RST (residues N182-Y254) domains were expressed and purified as described (Bugge et al., 2018; Friis Theisen et al., 2022). As a final purification step, SEC was performed using a Superdex™ Increase 75 10/300 GL column (GE Healthcare). The MED25-ACID (residues S532-N680) was expressed and purified as described (Theisen et al., 2024).
From HAC1 (UniProt ID: Q9C5X9), the coding sequences of the HAC1-KIX (G43-N134), HAC1-TAZ1 (G626-R724), and HAC1-TAZ2 (N1574-G1697) domains were selected. HAC1-KIX and HAC1-TAZ2 were expressed with an N-terminal His6-SUMO tag from a pET24a vector, and HAC1-TAZ1 was expressed without a tag from a pET11a vector, all in BL21 (DE3) cells. For all domains, cells were grown at 37°C to an OD600 of 0.6–0.8, induced with 1 mM IPTG, and incubated overnight at 16°C. For the TAZ domains, ZnSO4 was added at induction to a final concentration of 150 μM.
Cells expressing HAC1-KIX were lysed by sonication in 20 mM Tris–HCl pH 8.0, 300 mM NaCl. The lysate was purified using IMAC (TALON® Superflow™, Cytiva) with 5 mL resin per L culture equilibrated with 20 mM Tris–HCl pH 8, 300 mM NaCl and incubated for 1 h at 4°C. The resin was washed with equilibration buffer (10 CV) and the protein was eluted in 20 mM Tris–HCl pH 8, 100 mM NaCl, 250 mM imidazole (2 CV). The eluate was dialyzed against 20 mM Tris–HCl pH 8, 100 mM NaCl before cleavage of the His6-SUMO-tag with ULP1 (0.05 mg) for 2 h at 4°C on rotation. A second IMAC step was performed as just described and the flow-through was dialyzed against 20 mM Tris–HCl pH 8.8, 20 mM NaCl. An ion exchange chromatography (IEX) step with a gradient of 20 mM to 1 M NaCl in 20 mM Tris–HCl pH 8.8 (SOURCE™ 15S, Cytiva, 1 mL/min flow) was performed before SEC (Superdex™ Increase 75 10/300 GL, GE Healthcare).
Cells expressing HAC1-TAZ1 were lysed in 20 mM Tris–HCl pH 8.0, 50 mM NaCl, 10 μM ZnSO4, 1 mM DTT by sonication. The lysate was purified using IEX with a SOURCE™ 15S column (GE Healthcare) equilibrated with the same buffer as used under cell lysis. The gradient was 50 mM to 1 M NaCl. Fractions containing the domain were further purified using SEC (Superdex™ Increase 75 10/300 GL, GE Healthcare).
Cells expressing HAC1-TAZ2 were lysed by sonication in 20 mM Tris–HCl pH 8.0, 20 mM NaCl, 10 μM ZnSO4, 1 mM DTT. The lysate was purified by IEX (SOURCE™ 15S, Cytiva) with a gradient of 20 mM to 1 M NaCl in 20 mM Tris–HCl pH 8.0, 10 μM ZnSO4, 1 mM DTT. The fractions containing protein were pooled, and the His6-SUMO tag was cleaved off by adding 0.05 mg ULP1 and incubating for 2 h at 4°C under rotation. A second IEX step was performed under similar conditions but at pH 9.0, followed by a final SEC step (Superdex™ Increase 75 10/300 GL, GE Healthcare).
For S-(1-oxyl-2,2,5,5-tetramethyl-2,5-dihydro-1H-pyrrol-3-yl)methyl methanesulfonothioate (MTSL) labeling of ANAC046172–338 variants, we utilized a single internal cysteine. Fresh DTT was added to the purified proteins to a final concentration of 5 mM and incubated for 30 min at 4°C; the samples were then buffer-exchanged into 50 mM Hepes pH 8.0, 50 mM NaCl, 3 M GdnHCl using a PD10 desalting column (Cytiva). The sample was collected in a foil-covered tube, and MTSL was added at a 10× molar excess. The sample was flushed with nitrogen and incubated overnight (at least 16 h) at room temperature under rotation. The labeled protein was separated from free label and unlabeled protein by reverse phase chromatography (RPC) (RESOURCE™ RPC, 3 mL (Cytiva)) with a gradient of buffer A (50 mM NH4HCO3 in Milli-Q water) and buffer B (30 mM NH4HCO3, 70% acetonitrile).
Protein concentrations were in all cases determined from the absorbance at 280 nm and the extinction coefficients were determined using the ProtParam tool at the ExPASy server using the primary structure of the proteins as input with all cysteines reduced.
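A minimal sketch of this ProtParam-style calculation is shown below. It assumes the standard residue absorptivities at 280 nm that ProtParam uses (Trp 5500, Tyr 1490, and 125 M⁻¹ cm⁻¹ per cystine) and a 1 cm path length. With all cysteines reduced, as stated above, cysteines contribute nothing; the sequence and absorbance in the example are hypothetical.

```python
# Sketch: ProtParam-style extinction coefficient at 280 nm and Beer-Lambert concentration.
EPS_TRP = 5500      # M^-1 cm^-1 per Trp
EPS_TYR = 1490      # M^-1 cm^-1 per Tyr
EPS_CYSTINE = 125   # M^-1 cm^-1 per disulfide (pair of Cys)


def extinction_coefficient(sequence, cysteines_reduced=True):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1)."""
    seq = sequence.upper()
    eps = EPS_TRP * seq.count("W") + EPS_TYR * seq.count("Y")
    if not cysteines_reduced:
        eps += EPS_CYSTINE * (seq.count("C") // 2)  # assumes all Cys are paired
    return eps


def concentration_molar(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l)."""
    if epsilon <= 0:
        raise ValueError("no Trp/Tyr/cystine: A280 cannot give a concentration")
    return a280 / (epsilon * path_cm)


# Hypothetical example: a peptide with one Trp and two Tyr, A280 = 0.424.
eps = extinction_coefficient("MSWYYC")
print(eps, concentration_molar(0.424, eps))
```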
4.3 Small-angle X-ray scattering
SAXS data were obtained at the PETRA III P12 beamline (DESY, Hamburg) using batch-mode acquisition at 25°C following standard procedures, except that the transmission was set to 50%. Primary data reduction was performed in BioXTAS RAW. Data were recorded on ANAC046172–338 alone (0.7 and 1.4 mg/mL) and in a 1:1 complex with RCD1-RST (1 mg/mL) in 20 mM Na2HPO4/NaH2PO4 pH 7.0, 100 mM NaCl, 5 mM DTT. Using the ATSAS package (Petoukhov et al., 2012), the scattering curves were manually assessed and averaged, and the averaged buffer scattering curves were subtracted. Rg values were extracted from the scattering curves using the ATSAS package for one concentration of each protein condition. The scattering curves of the ANAC046 IDR alone were used to build a coarse-grained model of the IDR ensemble: the RANCH application of the EOM suite (Bernadó et al., 2007), part of ATSAS, was used to generate 100 random models of ANAC046172–338, from which a scattering curve was computed using the CRYSOL application (ATSAS). Pair-distance distribution functions were generated from the scattering curves using the GNOM application in ATSAS.
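Rg is commonly extracted from a scattering curve with a Guinier fit, ln I(q) = ln I(0) − q²Rg²/3. The sketch below assumes that approach (the text does not name the ATSAS routine used), q in nm⁻¹, and an upper q·Rg limit that is a tunable assumption; disordered proteins are often fitted with a lower limit than the 1.3 typical for compact proteins. It is a generic illustration, not the ATSAS implementation.

```python
# Guinier sketch (generic; not the ATSAS implementation).
# ln I(q) = ln I(0) - q^2 Rg^2 / 3, fitted iteratively over q * Rg <= qrg_max.
import numpy as np


def guinier_rg(q, intensity, qrg_max=1.3, rg_guess=3.0, n_iter=50, min_points=5):
    """Return (Rg, I0); with q in nm^-1, Rg is in nm."""
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    rg, i0 = rg_guess, float("nan")
    for _ in range(n_iter):
        mask = (q * rg <= qrg_max) & (intensity > 0)
        if mask.sum() < min_points:
            raise ValueError("too few points in the Guinier region")
        # Linear fit of ln I against q^2: slope = -Rg^2 / 3, intercept = ln I(0).
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
        if slope >= 0:
            raise ValueError("non-negative Guinier slope; check for aggregation")
        new_rg = float(np.sqrt(-3.0 * slope))
        i0 = float(np.exp(intercept))
        converged = abs(new_rg - rg) < 1e-6
        rg = new_rg
        if converged:
            break
    return rg, i0
```

Iterating lets the fitted Rg redefine the fitting range until it is self-consistent, which avoids having to choose a fixed q range by hand.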
4.4 Nuclear magnetic resonance spectroscopy
All NMR experiments were acquired on Bruker AVANCE 600 or 800 MHz (1H) spectrometers equipped with cryogenic probes. All experiments were performed in 20 mM Na2HPO4/NaH2PO4, pH 7.0, 100 mM NaCl, 1 mM DTT, 10% (v/v) D2O, 0.02% (w/v) NaN3, and 0.2 mM 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS), except for samples with HAC1-TAZ1 or -TAZ2, for which 20 mM Hepes, pH 7.0 was used without DSS (a similar sample containing DSS but no domains was prepared for referencing). Assignments of free ANAC046172–338 were taken from previous work (BMRB ID: 51033) (Newcombe et al., 2021). 15N,1H-HSQCs were recorded for samples containing 50 μM 15N-ANAC046172–338 alone and with 200 μM RCD1-RST, 200 μM TAF4-RST, 75 μM MED25-ACID, 200 μM HAC1-TAZ1, 200 μM HAC1-KIX, or 75 μM HAC1-TAZ2. Titrations of 15N-ANAC046172–338 (50 μM) with RCD1-RST and HAC1-TAZ1 were recorded at 10°C at molar ratios from 0.1 to 4, and peak intensity ratios were calculated from these titrations. Titration experiments were also recorded at 25°C on 75 μM 15N-RCD1-RST or 75 μM 15N-HAC1-TAZ1 with ANAC046172–222 at molar ratios from 0.1 to 5.
To determine paramagnetic relaxation enhancement (PRE) effects, BEST-TROSY experiments were recorded at 10°C on a Bruker AVANCE 600 MHz (1H) spectrometer with a cryogenic probe. Samples contained 90 μM MTSL-labeled 15N-ANAC046172–338, 15N-ANAC046172–338,S217C;C323S, or 15N-ANAC046172–338,S259C;C323S; 90 μM MTSL-labeled 15N-ANAC046172–338 with 90 μM RCD1-RST; or 50 μM 15N-ANAC046172–338 with 50 μM MTSL-labeled ANAC046172–338, with the spin label in its paramagnetic state. The same experiments were then recorded after converting the label to its diamagnetic state by incubation with a 10× molar excess of ascorbic acid for 3 h at 20°C. The effects were quantified as the ratio of resonance intensities in the paramagnetic and diamagnetic states for the individual residues.
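The intensity-ratio quantification can be sketched as follows, assuming peak heights stored as dictionaries keyed by residue number and spectra recorded and processed identically. The same function covers the titration ratios described above (bound over free). The error propagation from spectral noise is an illustrative addition rather than a step stated in the text, and the peak heights in the example are hypothetical.

```python
# Sketch: per-residue intensity ratios (e.g. I_para / I_dia, or I_bound / I_free).
import math


def intensity_ratios(numerator, denominator, noise_num=None, noise_den=None):
    """numerator/denominator: {residue: peak height}. Returns {residue: (ratio, error)}
    for residues present in both lists; error is None unless both noise levels are given."""
    ratios = {}
    for res in sorted(set(numerator) & set(denominator)):
        i_num, i_den = numerator[res], denominator[res]
        if i_den <= 0:
            continue  # no usable reference peak
        ratio = i_num / i_den
        err = None
        if noise_num is not None and noise_den is not None:
            # Propagation for r = a / b: sigma_r^2 = (sigma_a / b)^2 + (r * sigma_b / b)^2
            err = math.sqrt((noise_num / i_den) ** 2 + (ratio * noise_den / i_den) ** 2)
        ratios[res] = (ratio, err)
    return ratios


# Hypothetical peak heights (paramagnetic vs diamagnetic); residue 199 lacks a para peak.
para = {180: 50.0, 181: 0.0, 200: 90.0}
dia = {180: 100.0, 181: 80.0, 199: 70.0, 200: 100.0}
print(intensity_ratios(para, dia, noise_num=2.0, noise_den=2.0))
```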
4.5 Isothermal titration calorimetry
All experiments were performed in triplicate on a MicroCal™ ITC200 (GE Healthcare) instrument at 25°C. A buffer containing 50 mM Hepes, pH 7.0, 100 mM NaCl, and, when cysteines were present, 1 mM tris(2-carboxyethyl)phosphine (TCEP) was used for all experiments. Samples were centrifuged at 17,000 × g for 10 min at the experimental temperature before loading into the microcalorimeter. Concentrations ranged from 10 to 100 μM in the cell and from 100 to 1000 μM in the syringe, aiming at a 1:10 cell:syringe concentration ratio. A total of 19 injections were made: a first injection of 0.5 μL followed by 2 μL injections. For the interaction of ANAC046172–338 with HAC1-TAZ1, a total of 38 injections were made: a 0.5 μL injection followed by 1 μL injections. All data were analyzed using in-house scripts and fitted to either a one-set-of-sites or a two-sets-of-sites model as documented by MicroCal and Origin. The data point from the first injection of each experiment was removed from the fit.
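The entropic contribution referred to in the Discussion follows from the fitted parameters through the standard relations ΔG = RT ln(Kd/c°) and −TΔS = ΔG − ΔH. This minimal sketch assumes a 1 M standard state and 298.15 K (25°C), and summarizes replicate fits as mean and sample standard deviation (an assumption about reporting); the Kd and ΔH values in the example are hypothetical, and the isotherm fit itself is not reproduced.

```python
# Sketch: thermodynamic decomposition of fitted ITC parameters (hypothetical values).
import math
import statistics

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def thermodynamics(kd_molar, dh_kj, temp_k=298.15):
    """dG = RT ln(Kd / 1 M); -TdS = dG - dH (all in kJ/mol)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    dg = R_KJ * temp_k * math.log(kd_molar / 1.0)  # standard state c0 = 1 M
    return {"dG": dg, "dH": dh_kj, "-TdS": dg - dh_kj}


def summarize_replicates(values):
    """Mean and sample standard deviation, e.g. over triplicate fits."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical fit: Kd = 1 uM, dH = -20 kJ/mol at 25 degrees C.
print(thermodynamics(1e-6, -20.0))
```

A negative −TΔS indicates a favorable entropic contribution to binding.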
4.6 Coarse-grained molecular dynamics simulations
We ran a coarse-grained simulation of the ANAC046 IDR (172–338) with the CALVADOS 2 model (Tesei & Lindorff-Larsen, 2023; Tesei, Schulze, et al., 2021) using HOOMD-blue 2.9.3 (Anderson et al., 2020). The simulation was performed using a Langevin integrator at 298 K with an ionic strength of 100 mM and the partial charge of His side chains set based on pH 7.0, corresponding to the conditions of the SAXS experiments. We used a 2 nm cutoff for the Ashbaugh-Hatch potential and a 4 nm cutoff for the Debye–Hückel potential. The simulation was started from an Archimedean spiral arrangement of the protein chain. We equilibrated the system for 10,000 steps with a 5 fs time step and ran the production simulation for 10 μs with a 10 fs time step. For our final ensemble, we extracted 10,000 evenly spaced frames and reconstructed the all-atom structures using Pulchra 3.06 (Rotkiewicz & Skolnick, 2008).
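The two non-bonded interactions named above can be sketched as plain pair functions. This is an illustration, not the CALVADOS code: the Ashbaugh-Hatch well depth (ε = 0.8368 kJ/mol), shifting both potentials to zero at their cutoffs, a fixed relative permittivity of 78.4 (the model may instead use a temperature-dependent value), and a His pKa of 6 in a Henderson-Hasselbalch charge are all assumptions to verify against the model's reference implementation. The cutoffs (2 nm and 4 nm), temperature (298 K), ionic strength (100 mM), and pH 7.0 are taken from the text.

```python
# Minimal sketch of CALVADOS-style pair potentials (illustrative, not the model's code).
# Assumed: epsilon = 0.8368 kJ/mol, potentials shifted to zero at their cutoffs,
# fixed relative permittivity 78.4, His charge from pKa 6.
import math

EPS_AH = 0.8368               # kJ/mol (assumed AH well depth)
KB = 1.380649e-23             # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19    # elementary charge, C
EPS0 = 8.8541878128e-12       # vacuum permittivity, F/m
N_A = 6.02214076e23           # Avogadro constant, 1/mol


def lennard_jones(r, sigma, eps=EPS_AH):
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def ashbaugh_hatch(r, sigma, lam, rc=2.0, eps=EPS_AH):
    """AH pair energy (kJ/mol); r, sigma, rc in nm; lam is the pair hydropathy."""
    if r >= rc:
        return 0.0
    shift = lam * lennard_jones(rc, sigma, eps)
    if r <= 2.0 ** (1.0 / 6.0) * sigma:
        # Repulsive branch: full LJ plus a constant that keeps the curve continuous.
        return lennard_jones(r, sigma, eps) + eps * (1.0 - lam) - shift
    # Attractive branch: LJ scaled by lambda.
    return lam * lennard_jones(r, sigma, eps) - shift


def debye_length_nm(ionic_strength_molar, temp_k=298.0, eps_r=78.4):
    """Debye screening length (nm) from the ionic strength."""
    i_si = ionic_strength_molar * 1000.0  # mol/m^3
    d = math.sqrt(EPS0 * eps_r * KB * temp_k / (2.0 * N_A * E_CHARGE**2 * i_si))
    return d * 1e9


def debye_hueckel(r, qi, qj, ionic_strength_molar=0.1, temp_k=298.0, eps_r=78.4, rc=4.0):
    """Screened Coulomb pair energy (kJ/mol); r, rc in nm; qi, qj in units of e."""
    if r >= rc:
        return 0.0
    kt = KB * temp_k * N_A / 1000.0  # kJ/mol
    bjerrum_nm = E_CHARGE**2 / (4.0 * math.pi * EPS0 * eps_r * KB * temp_k) * 1e9
    debye = debye_length_nm(ionic_strength_molar, temp_k, eps_r)

    def unshifted(x):
        return qi * qj * bjerrum_nm * kt * math.exp(-x / debye) / x

    return unshifted(r) - unshifted(rc)


def histidine_charge(ph, pka=6.0):
    """Average His side-chain charge from the Henderson-Hasselbalch relation."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))
```

At 100 mM ionic strength the Debye length is just under 1 nm, so the 4 nm electrostatic cutoff covers several screening lengths.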
4.7 SAXS calculations and Bayesian/maximum entropy reweighting
We calculated SAXS profiles from our ensemble of 10,000 frames using Pepsi-SAXS 3.0 (Grudinin et al., 2017). We fixed the contrast of the hydration layer, δρ = 3.34 e/nm³, and the volume of displaced solvent, r0/rm = 1.025, to avoid overfitting to the experimental SAXS profile (Pesce & Lindorff-Larsen, 2021). We used a Bayesian/maximum entropy (BME) approach to reweight our ensemble against the experimental SAXS data (Bottaro et al., 2020), iteratively fitting the scale and constant background of the ensemble-averaged SAXS profile during reweighting (Pesce & Lindorff-Larsen, 2021). In BME reweighting, the parameter θ tunes the balance between agreement with the prior ensemble and agreement with the experimental data. We scanned θ and selected a value that gave a large decrease in the reduced χ² (χ²r) relative to the experimental data while remaining as close as possible to the prior ensemble, as measured by the fraction of effective frames (φeff). Before reweighting, we used the Bayesian Indirect Fourier Transform (BIFT) algorithm to rescale the errors of the experimental SAXS intensities so that χ²r = 1 is the target value for our ensemble model (Hansen, 2000; Larsen & Pedersen, 2021).
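The BME step can be sketched through its dual formulation (Bottaro et al., 2020): frame weights are w_j ∝ w0_j exp(−Σi λi Fij), and λ minimizes Γ(λ) = ln Z + Σi λi Fi,exp + (θ/2) Σi λi² σi², with φeff = exp(Srel). This minimal re-implementation is for illustration only and is not the BME package; it omits the iterative fitting of SAXS scale and background and the BIFT error rescaling.

```python
# Minimal BME reweighting sketch via the dual problem (not the BME package).
# Omitted: iterative SAXS scale/background fitting and BIFT error rescaling.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp


def _normalized(w0, n_frames):
    if w0 is None:
        return np.full(n_frames, 1.0 / n_frames)
    w0 = np.asarray(w0, dtype=float)
    return w0 / w0.sum()


def bme_reweight(calc, exp, sigma, theta, w0=None):
    """calc: (n_frames, n_obs) back-calculated data; exp, sigma: (n_obs,).
    Returns the optimized frame weights."""
    calc = np.asarray(calc, dtype=float)
    exp = np.asarray(exp, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    log_w0 = np.log(_normalized(w0, calc.shape[0]))

    def weights(lam):
        logits = log_w0 - calc @ lam
        return np.exp(logits - logsumexp(logits))

    def gamma(lam):
        # Gamma(lambda) = ln Z + lambda . F_exp + theta/2 * sum(lambda^2 sigma^2)
        log_z = logsumexp(log_w0 - calc @ lam)
        value = log_z + lam @ exp + 0.5 * theta * np.sum(lam**2 * sigma**2)
        grad = exp - weights(lam) @ calc + theta * lam * sigma**2
        return value, grad

    result = minimize(gamma, np.zeros(calc.shape[1]), jac=True, method="L-BFGS-B")
    return weights(result.x)


def reduced_chi2(calc, exp, sigma, w):
    """chi2_r between the weighted ensemble average and experiment."""
    avg = np.asarray(w, dtype=float) @ np.asarray(calc, dtype=float)
    resid = (avg - np.asarray(exp, dtype=float)) / np.asarray(sigma, dtype=float)
    return float(np.mean(resid**2))


def phi_eff(w, w0=None):
    """Fraction of effective frames, exp(S_rel), with S_rel = -sum w ln(w / w0)."""
    w = np.asarray(w, dtype=float)
    w0 = _normalized(w0, len(w))
    mask = w > 0
    return float(np.exp(-np.sum(w[mask] * np.log(w[mask] / w0[mask]))))


def scan_theta(calc, exp, sigma, thetas):
    """Return (theta, chi2_r, phi_eff) for each theta, to choose a trade-off."""
    rows = []
    for theta in thetas:
        w = bme_reweight(calc, exp, sigma, theta)
        rows.append((theta, reduced_chi2(calc, exp, sigma, w), phi_eff(w)))
    return rows
```

Large θ keeps the weights close to the prior (φeff near 1), while small θ fits the data more tightly at the cost of fewer effective frames; scan_theta tabulates this trade-off.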
4.8 Scaling exponent calculation
The scaling exponent, ν, was determined by a least-squares fit of the function $r_{ij} = r_0 |i - j|^{\nu}$ in the long-distance region with r0 and ν as free parameters, where rij is the ensemble-averaged distance between residues i and j in the sequence and r0 is a prefactor related to the length of a residue. We used the compute_distances function in MDTraj (McGibbon et al., 2015) to calculate rij and the curve_fit function in SciPy (Virtanen et al., 2020) for least-squares regression.
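The fit can be sketched as below, assuming coordinates in nm as a NumPy array of shape (n_frames, n_residues, 3). Distances are computed with NumPy rather than MDTraj, frames may optionally carry weights (for example from BME reweighting), and the long-distance cutoff min_sep = 5 is a hypothetical choice, since the text does not state the threshold.

```python
# Sketch: scaling exponent from r_ij = r0 * |i - j|**nu (distances via NumPy, not MDTraj).
import numpy as np
from scipy.optimize import curve_fit


def ensemble_average_distances(coords, weights=None):
    """coords: (n_frames, n_residues, 3) in nm; returns the (weighted) mean r_ij matrix."""
    coords = np.asarray(coords, dtype=float)
    n_frames, n_res, _ = coords.shape
    if weights is None:
        w = np.full(n_frames, 1.0 / n_frames)
    else:
        w = np.asarray(weights, dtype=float) / np.sum(weights)
    rij = np.zeros((n_res, n_res))
    for wk, frame in zip(w, coords):  # frame by frame to limit memory use
        rij += wk * np.linalg.norm(frame[:, None, :] - frame[None, :, :], axis=-1)
    return rij


def fit_scaling_exponent(rij, min_sep=5):
    """Least-squares fit over pairs with |i - j| > min_sep; returns (nu, r0)."""
    i, j = np.triu_indices(rij.shape[0], k=min_sep + 1)
    separation = (j - i).astype(float)

    def power_law(s, r0, nu):
        return r0 * s**nu

    (r0, nu), _ = curve_fit(power_law, separation, rij[i, j], p0=(0.5, 0.5))
    return nu, r0
```

Averaging distances frame by frame keeps memory proportional to one distance matrix, which matters for an ensemble of 10,000 frames.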
4.9 PRE calculations from ensemble
We used a rotamer library approach implemented in the DEER-PREdict software (Tesei, Martins, et al., 2021) to calculate PREs from our ensemble, both with uniform weights and with weights from BME reweighting against SAXS. We assumed an effective correlation time of the spin label (τt) of 100 ps, a transverse relaxation rate of 10 s⁻¹ for the diamagnetic protein, and a total INEPT time of 5.4 ms for the HSQC measurement. We scanned the rotational correlation time (τc) from 1 to 20 ns in steps of 1 ns and selected 12 ns, the value that minimized χ²r to the experimental PRE data for both the uniform and the reweighted ensemble.
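The link between a spin-label distance, τc, and the measured intensity ratio can be sketched with the rigid-label Solomon-Bloembergen form, Γ2 = (K/r⁶)(4τc + 3τc/(1 + ωH²τc²)) with K = 1.23 × 10⁻³² cm⁶ s⁻², and the ratio Ipara/Idia = R2,dia exp(−Γ2 t)/(R2,dia + Γ2), using R2,dia = 10 s⁻¹ and t = 5.4 ms from the text. This is a deliberate simplification: unlike DEER-PREdict, it assumes a single fixed distance per residue and ignores rotamer and conformational averaging and the internal correlation time τt.

```python
# Sketch: distance -> Gamma2 -> intensity ratio, plus a grid scan over tau_c.
# Simplified rigid-label model (no rotamer/conformer averaging, no tau_t).
import math

K_CM6 = 1.23e-32                # cm^6 s^-2, nitroxide constant
OMEGA_H = 2 * math.pi * 600e6   # rad/s, 1H Larmor frequency at 600 MHz


def gamma2(r_nm, tau_c):
    """Transverse PRE rate (s^-1) for an electron-amide proton distance in nm."""
    r_cm = r_nm * 1e-7
    spectral = 4 * tau_c + 3 * tau_c / (1 + (OMEGA_H * tau_c) ** 2)
    return K_CM6 / r_cm**6 * spectral


def intensity_ratio(g2, r2_dia=10.0, t_inept=5.4e-3):
    """I_para / I_dia for a given Gamma2 (s^-1)."""
    return r2_dia * math.exp(-g2 * t_inept) / (r2_dia + g2)


def reduced_chi2(pred, obs, err):
    return sum(((p - o) / e) ** 2 for p, o, e in zip(pred, obs, err)) / len(obs)


def scan_tau_c(distances_nm, observed, errors, taus=None):
    """Return (tau_c, chi2_r) minimizing chi2_r over a grid (default 1-20 ns)."""
    taus = taus or [k * 1e-9 for k in range(1, 21)]
    best = None
    for tau in taus:
        pred = [intensity_ratio(gamma2(r, tau)) for r in distances_nm]
        chi2 = reduced_chi2(pred, observed, errors)
        if best is None or chi2 < best[1]:
            best = (tau, chi2)
    return best
```

Because Γ2 scales with r⁻⁶, residues more than a few nanometers from the label give ratios close to 1, so the τc scan is informed mainly by residues that approach the spin label.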
AUTHOR CONTRIBUTIONS
Amanda D. Due: Conceptualization; investigation; formal analysis; visualization; writing – original draft; writing – review and editing; data curation. Norman E. Davey: Investigation; formal analysis; writing – review and editing; resources; data curation. F. Emil Thomasen: Investigation; software; formal analysis; writing – review and editing. Nicholas Morffy: Resources; writing – review and editing. Andreas Prestel: Formal analysis; writing – review and editing. Inna Brakti: Investigation; writing – review and editing. Charlotte O'Shea: Resources; writing – review and editing. Lucia C. Strader: Resources; writing – review and editing. Kresten Lindorff-Larsen: Supervision; funding acquisition; resources; writing – review and editing; methodology; software. Karen Skriver: Conceptualization; project administration; writing – review and editing; writing – original draft; resources; funding acquisition; validation. Birthe B. Kragelund: Conceptualization; supervision; funding acquisition; writing – original draft; writing – review and editing; resources; project administration; investigation; validation; methodology.
ACKNOWLEDGMENTS
The authors thank Dr. Izabella Krystkowiak, Dr. Frederik F. Theisen, and Steffie Elkjær for valuable discussions and Signe Sjørup for technical assistance. Acknowledgments are made to the DESY/EMBL P12 beamline at PETRA III facilities in Hamburg and beamline scientist Cy Jeffries for excellent assistance. The work was supported by the Novo Nordisk Foundation (grant no.: NNF18OC0032996 to B.B.K., cOpenNMR and grant no.: NNF18OC0033926 to B.B.K. and K.S. and no. NNF22OC0079339 to K.S.), Independent Research Fund Denmark (grant no.: 9040-00164B to B.B.K.), Lundbeck Foundation BRAINSTRUC initiative grant (grant no.: R155-2015-2666 to K.L.-L. and B.B.K.), Villum Fonden for NMR equipment, and Cancer Research UK Senior Cancer Research Fellowship (grant no.: C68484/A28159 to N.E.D.).
CONFLICT OF INTEREST STATEMENT
KL-L holds stock options and is a consultant for Peptone Ltd. All other authors declare no competing interests.
Open Research
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the supplementary material of this article.