Computational modeling of Repeat1 region of INI1/hSNF5: An evolutionary link with ubiquitin
Abstract
The structure of a protein can be very informative about its function. However, determining protein structures experimentally is often very challenging. Computational methods have been used successfully to model structures with sufficient accuracy. Here we have used computational tools to predict the structure of an evolutionarily conserved and functionally significant domain of the Integrase interactor 1 (INI1)/hSNF5 protein. INI1 is a component of the chromatin-remodeling SWI/SNF complex, is a tumor suppressor, and is involved in many protein–protein interactions. It belongs to the SNF5 family of proteins, which contain two conserved repeat (Rpt) domains. The Rpt1 domain of INI1 binds to HIV-1 integrase and acts as a dominant negative mutant to inhibit viral replication. The Rpt1 domain also interacts with the oncoprotein c-MYC and modulates its transcriptional activity. We carried out ab initio modeling of a segment of INI1 containing the Rpt1 domain. The structural model suggested that a compact, well-defined ββαα topology forms the core of the Rpt1 domain. This topology was similar to that of the PFU domain of phospholipase A2-activating protein (PLAA). Interestingly, the PFU domain shares structural similarity with ubiquitin and has ubiquitin-binding activity. Because of the structural similarity between the Rpt1 domain of INI1 and the PFU domain of PLAA, we propose that the Rpt1 domain of INI1 may participate in ubiquitin recognition or in binding to ubiquitin or ubiquitin-related proteins. This modeling study may shed light on how the Rpt1 domain of INI1 interacts with its partners and is likely to facilitate future functional studies of INI1.
Statement
This research article describes the predicted structure of an evolutionarily conserved and functionally significant domain (Rpt1) of the Integrase Interactor 1 (INI1)/hSNF5 protein. This study is the first to propose that the Rpt1 domain of INI1 adopts a ubiquitin-like fold. This modeling study may shed light on how the Rpt1 domain of INI1 interacts with its partners and is likely to facilitate future functional studies of INI1.
Introduction
To study protein function and activity, structural information is required. Because experimental structures are available for only a small fraction of all known protein sequences, computational methods such as protein modeling can provide useful information about unknown protein structures.1 Commonly used protein structure prediction methods include (i) homology modeling,2 (ii) threading,3 and (iii) ab initio folding.4 Homology modeling is based on the observation that proteins with homologous sequences fold into closely similar structures. Therefore, when a highly homologous template structure for the target sequence is available in the Protein Data Bank (PDB), the method can produce an accurate model close to the native structure.5 However, because homologous templates are lacking for most proteins in the database, homology modeling has limited applicability and cannot provide structural information for most protein sequences. The low sequence similarity between proteins of known and unknown function has also made it difficult to infer the functions of many proteins that are hard to crystallize. In such cases, threading or ab initio modeling techniques, which do not depend on close homologs as templates, can be used. Threading is based on the observation that many proteins without apparent sequence similarity nonetheless share similar folds.3 Ab initio (de novo) prediction, in contrast, folds a protein model from scratch, typically by Monte Carlo optimization of a physicochemical or knowledge-based statistical potential, mimicking the physical folding process.4 Modeling with threading or ab initio techniques is, however, largely restricted to small proteins and is not readily applicable to large ones.4
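To make the Monte Carlo step concrete, the sketch below shows the Metropolis acceptance rule on which such searches are commonly built. It is a toy illustration only: the quadratic `toy_energy` function and the vector of continuous coordinates are hypothetical stand-ins for a real conformation and a physicochemical or knowledge-based potential, and the sketch does not reproduce the move sets or energy terms of Rosetta, I-TASSER, or QUARK.

```python
import math
import random


def metropolis_minimize(energy, x0, step, n_steps=5000, kT=1.0, seed=0):
    """Metropolis Monte Carlo search for a low-energy state.

    energy: callable mapping a list of floats to a float (a hypothetical
            stand-in for a folding potential).
    x0:     starting "conformation" as a list of floats.
    step:   maximum random displacement per coordinate per move.
    kT:     temperature factor; lower values make uphill moves rarer.
    Returns the lowest-energy state visited and its energy.
    """
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    for _ in range(n_steps):
        # Propose a random perturbation of every coordinate.
        trial = [xi + rng.uniform(-step, step) for xi in x]
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill moves and accept
        # uphill moves with probability exp(-dE / kT).
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = list(x), e
    return best_x, best_e


def toy_energy(x):
    """Hypothetical energy surface with a single minimum at (1, 1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


best_x, best_e = metropolis_minimize(toy_energy, [5.0, -3.0], step=0.5, kT=0.05)
```

Keeping track of the best state visited, rather than only the final one, mirrors the practice of retaining low-energy decoys from a stochastic search; real folding protocols add fragment moves, annealing schedules, and clustering on top of this rule.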
In this report, we have applied threading and ab initio tertiary structure prediction methods to a conserved domain of the Integrase Interactor 1 (INI1/hSNF5/BAF47/SMARCB1) protein, which has no detectable sequence homology to any protein of known structure. INI1 is a core subunit of the ATP-dependent chromatin-remodeling SWI/SNF complex, which regulates the transcription of a subset of eukaryotic genes.6 INI1 and several other components of the SWI/SNF complex are tumor suppressors whose genes are frequently mutated in a large number of human cancers. INI1 was first identified as a tumor suppressor mutated in >95% of rhabdoid tumors, rare but aggressive pediatric cancers with poor prognosis.7 Subsequently, INI1 mutations were also found in many other human cancers, including but not limited to schwannomas in schwannomatosis and epithelioid sarcoma.8 The number of human cancers associated with mutations in the INI1 tumor suppressor is increasing with the advent of cancer genome sequencing. The exact function of the INI1 subunit within the SWI/SNF complex is unknown. INI1 takes part in protein–protein interactions with a large number of cellular and viral proteins. It interacts with viral proteins such as human immunodeficiency virus type 1 (HIV-1) integrase (IN), HIV-1 Tat, human papillomavirus (HPV) 18 E1 protein, and Epstein-Barr virus (EBV) protein EBNA-2,9–12 and with human proteins such as c-MYC, SAP18, and ALL1.13–15 The mechanisms by which INI1 interacts with so many unrelated proteins and influences their functions remain to be determined.
The INI1 protein consists of 385 amino acids and contains two highly phylogenetically conserved imperfect repeat units, Rpt1 and Rpt2, of 60 and 61 amino acids, respectively, connected by a short linker [Fig. 1(A)].16 The two repeat units have significant sequence similarity. Rpt2 contains a masked nuclear export signal (NES).17 The N-terminal region of INI1 appears to contain winged helix DNA-binding domains.18 The repeat regions appear to be protein–protein interaction modules that can self-associate as well as interact with other proteins.19 Previous studies have demonstrated that the C-terminal domain of INI1, which contains the two Rpts, is involved in multiple protein–protein interactions. The two Rpt domains associate with SNF2/BRG1, the key ATPase subunit of the SWI/SNF complex.20 SNF2/BRG1 belongs to a subfamily of ATPases that can generate superhelical torsion in nucleosomal DNA. Evidence suggests that these two transcriptional activators, SNF2/BRG1/SWI2 and SNF5 (hSNF5), function by antagonizing nucleosome-mediated repression.21 Fragments of INI1 containing the Rpt1 region (aa 186–245), termed S6 (aa 183–294, with the NES) and S6(Rpt1) (aa 183–265, without the NES), associate with HIV-1 IN and, when expressed in HIV-1 producer cells, inhibit viral production.22,23 The exact mechanism by which these fragments inhibit HIV-1 particle production is unknown. S6 (aa 183–294) has also been shown to bind to c-MYC and to inhibit its transactivation of a reporter gene containing an E-box.13

Cartoon illustrating the domain structure of INI1 and the secondary structure predicted within the S6(Rpt1) (aa 183–265) fragment. (A) Arrangement of the two repeat domains of INI1 (Rpt = repeat; NES = nuclear export signal). (B) Secondary structures within Rpt1 as predicted by different methods: (a) Jpred, (b) PSIPRED, (c) YASPIN, (d) PHD. Numbers (0–9) represent the confidence/reliability of the prediction.
Because Rpt1 is highly conserved and stably expressed, and because it inhibits HIV-1 replication and c-MYC transactivation, we sought to determine the structural features that allow this domain to act as a protein–protein interaction module, to gain insight into its function. Here we have carried out molecular modeling of the S6(Rpt1) fragment of INI1 (aa 183–265), which includes Rpt1 and the linker region, because this is the smallest fragment shown to inhibit HIV-1 replication.22
Our efforts to characterize the structure of the S6(Rpt1) fragment of INI1/hSNF5 with computational structure modeling tools have revealed a characteristic folding pattern in the conserved Rpt1 region of SNF5 family proteins. Analyses using several programs (Robetta, I-TASSER, and QUARK) revealed that the core of the Rpt1 structure is defined by an α + β fold. This structural core shows striking similarity to PFU, the PLAA family ubiquitin-binding domain. The PFU domain has been identified as a ubiquitin-binding domain with α + β topology.24 Currently, the exact function of the PFU domain, or of other proteins with the same or a similar fold, is unknown. The structural similarity between the PFU and Rpt1 domains suggests a functional connection between the two proteins. The PFU domain of PLAA binds ubiquitin and is itself structurally similar to ubiquitin, which raises the possibility that Rpt1 also resembles ubiquitin. Like ubiquitin, the INI1 Rpt1 domain can bind a large number of unrelated proteins. Because of the predicted structural similarity of Rpt1 to the PFU domain of PLAA, and the similarity of the PFU domain to ubiquitin, we compared the modeled Rpt1 structure with the ubiquitin structure and found that the modeled Rpt1 exhibits partial structural similarity to ubiquitin. A ubiquitin-related fold in Rpt1 has not been described before, and it may be the basis for its activity as a protein–protein interaction module. As the Rpt1 domain is the minimal integrase-interacting domain, the predicted structural information is likely to provide mechanistic insights into INI1 function and therapeutic insights for the treatment of HIV infection and/or cancer.
Furthermore, because Rpt1 is conserved in all SNF5 family members and is important for binding to BRG1/SNF2,20 predicting the Rpt1 structure is likely to broaden our understanding of its biological and biochemical activities, including those relevant to aggressive cancers, and to provide new insights into the overall function of INI1 in the SWI/SNF complex.
Results
Homolog identification and secondary structure prediction
To gain insight into the INI1 structure and to identify unexplored features of this protein, we carried out a systematic homolog search with the full-length protein sequence using BLAST. This search did not yield any known template for full-length INI1. Because proteins with no significant amino acid sequence similarity can nonetheless share the same fold and, sometimes, the same function, we attempted to identify structural homologs of INI1. For this purpose, we first predicted the secondary structure elements and ordered regions within the protein using PSIPRED. This analysis predicted 35% of INI1 to be disordered and the rest of the molecule to have ordered secondary structure (Supporting Information Fig. S1). Interestingly, the two conserved Rpt regions within INI1 contained ordered secondary structure elements, whereas the linker region between the two Rpt domains was predicted to be disordered. This result suggested that the Rpt regions form secondary structure elements and may fold into stable 3D structures. Because our interest was the S6(Rpt1) fragment, we analyzed its secondary structure with PSIPRED, Jpred, YASPIN, and PHD. All four programs predicted almost identical secondary structure elements, consistent with an α + β fold, within the Rpt1 domain and a disordered linker region [Fig. 1(B)].
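Agreement among several secondary structure predictors can be summarized as a per-residue consensus. The sketch below assumes that each program's output has already been reduced to a string of H (helix), E (strand), and C (coil) states covering the same residues; the strings in the example are invented for illustration and are not the actual PSIPRED, Jpred, YASPIN, or PHD output for S6(Rpt1). Resolving ties toward coil is our own conservative choice, not part of any of these programs.

```python
from collections import Counter


def consensus_ss(predictions):
    """Per-residue majority vote over equal-length H/E/C strings.

    Ties between the most frequent states are resolved as coil ('C').
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        top = counts[0][1]
        winners = [state for state, n in counts if n == top]
        consensus.append(winners[0] if len(winners) == 1 else "C")
    return "".join(consensus)


def agreement(predictions):
    """Fraction of residues at which every method assigns the same state."""
    same = sum(1 for column in zip(*predictions) if len(set(column)) == 1)
    return same / len(predictions[0])


# Invented example strings, one per hypothetical predictor.
preds = ["CEEEECCHHHHC", "CEEECCCHHHHC", "CEEEECCHHHCC", "CCEEECCHHHHC"]
print(consensus_ss(preds), agreement(preds))
```

For these invented strings the consensus is `CEEEECCHHHHC` and the methods agree fully at 9 of 12 positions (0.75).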
3D structure modeling of S6(Rpt1) fragment of INI1
We began 3D structure modeling of full-length INI1 with Robetta. This program uses the Rosetta fragment-assembly technique to build models of protein domains by both template-based and de novo methods.25 Robetta divided the whole INI1 sequence into five regions (Supporting Information Fig. S2). One region, corresponding to aa 172–254, was recognized as a structural homolog of the PFU domain of PLAA (PDB: 2K89A, aa 14–76). This analysis suggested that the Rpt1 domain (aa 186–245) can fold like the PFU domain of PLAA. We next submitted the S6(Rpt1) fragment to Robetta for modeling. Robetta aligned the S6(Rpt1) sequence onto the parent structure of PLAA (PDB: 2K89A) to model the aligned part, and then modeled the variable regions by letting them explore conformational space with fragments, as in its de novo procedure. This produced five candidate models of S6(Rpt1). The first model, which had the lowest energy, was taken as the best model for further analysis. The Robetta predictions indicated that S6(Rpt1) adopts a ββαα fold, with a two-stranded antiparallel β-sheet followed by a helix-turn-helix motif (Fig. 2). Thus, the Rpt1 region of the modeled fragment closely resembles the PFU domain of PLAA.

Best models of S6(Rpt1) generated by different programs. (A) Robetta, (B) I-TASSER, (C) QUARK. Coloring runs from the N terminus (blue) to the C terminus (brown). Both front and transverse views are shown.
In addition to Robetta, two other programs, the iterative threading assembly refinement algorithm (I-TASSER) and QUARK, were used to predict the S6(Rpt1) structure. I-TASSER generated five 3D models of the S6(Rpt1) fragment by combining threading with ab initio assembly (details in the Methods section) and identified the PFU domain of PLAA as one of the ten templates used for model building (Supporting Information Fig. S3). The best I-TASSER model (C-score = −1.51, estimated TM-score = 0.53 ± 0.15) was similar to the Robetta model, with the same ββαα topology and the same orientation of the structural elements. We next used the QUARK ab initio modeling method, which assembled the two β-strands and the α-helices into a compact Rpt1 structure; the best model had a TM-score of 0.49. This ab initio structure was also very similar to those predicted by Robetta and I-TASSER. In QUARK, structural models are assembled from small continuous fragments (1–20 residues) excised from unrelated proteins. Unlike the other two programs, QUARK starts its fragment-assembly simulations from random conformations without relying on global threading templates, which enables it to construct new protein folds from scratch. Apart from minor differences, all three programs predicted the same folding pattern for the Rpt1 core: a compact fold with two antiparallel β-strands at the N-terminus followed by a helix-turn-helix motif at the C-terminus (Fig. 2). The first helix was very similar in length, composition, and orientation in all three predictions. However, the second helix differed slightly in length at its start. In addition, the three programs predicted different orientations of the linker region at the C-terminus of S6(Rpt1). These differences probably arise from the different techniques used by the three programs.
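For reference, the TM-score scores a superposition of a model on a reference structure as the sum of 1/(1 + (d_i/d0)^2) over the aligned residue pairs, divided by the reference length L_target, where d_i is the distance between the i-th aligned pair and d0 = 1.24 (L_target − 15)^(1/3) − 1.8 Å. The sketch below evaluates that expression for one fixed superposition. Two caveats apply: the full TM-score additionally maximizes over superpositions, which this sketch does not do, and the values reported by I-TASSER and QUARK above are estimates made without a native structure, so they cannot be recomputed this way. The 0.5 Å floor on d0 is an assumption added here to keep the expression defined for very short chains.

```python
import math


def tm_d0(l_target):
    """Distance scale d0 (angstroms) for a reference chain of l_target residues.

    The 0.5 A floor is an assumption that keeps d0 positive for short chains.
    """
    if l_target <= 15:
        return 0.5
    return max(1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score_fixed(distances, l_target):
    """TM-score for ONE given superposition (no search over superpositions).

    distances: Calpha-Calpha distances (angstroms) of the aligned residue pairs.
    l_target:  length of the reference structure; unaligned residues add 0.
    """
    if len(distances) > l_target:
        raise ValueError("more aligned pairs than reference residues")
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


# For an 83-residue chain (the length of S6(Rpt1), aa 183-265), d0 is about 3.26 A.
print(round(tm_d0(83), 2))
```

Because each aligned pair contributes at most 1 and the sum is divided by the reference length, the score lies between 0 and 1 and is less sensitive to a few badly placed residues (such as a mobile linker) than an RMSD would be.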
The quality of the predicted models was checked with several protein structure analysis tools. Ramachandran plot and Z-score analyses of all three models consistently showed a high proportion of residues in favorable regions (82.7–89.3%) and favorable Z-scores (−4.85 to −5.31) (Supporting Information Fig. S4). The Robetta model ranked highest of the three (Supporting Information Table SI). Its Ramachandran plot placed most residues in the most favorable regions, indicating good overall quality: 89.3% of the residues were in the most favorable regions, 9.3% in the allowed regions, 1.3% in the additional regions, and none in the disallowed regions. For comparison, the PLAA template (2K89A) had 83.68%, 10.4%, 1.5%, and 4.5% of residues in the most favorable, allowed, additional, and disallowed regions, respectively (Supporting Information Fig. S4). These results indicate that most residues have a consistent φ–ψ distribution and that the model likely represents a reliable fold for further analysis. ProSA energy analysis showed that the interaction energy of each residue with the rest of the protein is negative, and the model's Z-score (−5.31) was acceptable (Supporting Information Fig. S4). The overall quality factor reported by ERRAT was 100% (Supporting Information Fig. S4), indicating that the nonbonded atomic interactions in the model are consistent with a well-folded structure. Finally, Verify-3D was used to check the compatibility of the model with its own sequence. The Verify-3D profile showed that the 3D–1D score of the model was above zero and that 92.77% of the residues had an averaged 3D–1D score ≥0.2 (Supporting Information Fig. S4), indicating acceptable side-chain environments.
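The Verify-3D criterion above refers to per-residue 3D–1D profile scores that are smoothed over a sliding window before the ≥0.2 threshold is applied. A minimal sketch of that final step follows. It assumes the raw per-residue scores have already been produced by Verify-3D; the 21-residue window and the truncated windows at the chain ends are our assumptions about the averaging, not a description of the program's exact code.

```python
def window_average(scores, window=21):
    """Centered sliding-window mean; windows are truncated at the chain ends.

    The default window of 21 residues is an assumption for illustration.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(scores)
    averaged = []
    for i in range(n):
        segment = scores[max(0, i - half): min(n, i + half + 1)]
        averaged.append(sum(segment) / len(segment))
    return averaged


def fraction_passing(scores, threshold=0.2, window=21):
    """Fraction of residues whose window-averaged 3D-1D score is >= threshold."""
    averaged = window_average(scores, window)
    return sum(1 for s in averaged if s >= threshold) / len(averaged)
```

Smoothing before thresholding means that a single poorly scoring residue inside a well-packed segment does not by itself fail the check; the reported 92.77% is a statement about these averaged values.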
Structural features of S6(Rpt1) of INI1
The central core of the S6(Rpt1) structure is formed by hydrophobic (nonpolar) residues [Fig. 3(A)]. The two helices within the Rpt1 region are amphipathic [Fig. 3(B)] and are connected by interhelical hydrogen bonds or salt bridges between the side chains of Arg242 (helix II), Glu216 (helix I), Glu210, and Thr214 [Fig. 3(A)]. Using ConSurf, we also mapped sequence conservation onto the modeled S6(Rpt1) structure. The clustered residues that form the core of the structure [Fig. 3(C), conserved region in green] are evolutionarily conserved among members of the SNF5 family, suggesting that this fold may be important for function. This conserved region is stabilized by a central hydrophobic patch (grey) formed by residues such as Leu and Phe and surrounded by conserved positively charged (blue) and negatively charged (red) regions [Fig. 3(D)].
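Interhelical salt bridges like those described above are commonly identified with a simple distance criterion between the charged side-chain atoms of basic and acidic residues. The sketch below applies a 4.0 Å cutoff, a commonly used convention that we assume here rather than the criterion behind Fig. 3(A); it does not cover ordinary hydrogen bonds such as those involving Thr214. The coordinates in the example are invented and do not come from the S6(Rpt1) model.

```python
import math

# Side-chain atoms (PDB atom names) commonly used for salt-bridge checks.
CATIONIC = {"ARG": ("NE", "NH1", "NH2"), "LYS": ("NZ",)}
ANIONIC = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}


def salt_bridges(residues, cutoff=4.0):
    """Return (basic residue, acidic residue, min distance) for each salt bridge.

    residues: dict mapping (resname, resnum) -> {atom_name: (x, y, z)}.
    A pair counts when any cationic N atom lies within `cutoff` angstroms
    of any carboxylate O atom (an assumed, conventional criterion).
    """
    bridges = []
    for pos_key, pos_atoms in residues.items():
        pos_names = CATIONIC.get(pos_key[0], ())
        for neg_key, neg_atoms in residues.items():
            neg_names = ANIONIC.get(neg_key[0], ())
            distances = [
                math.dist(pos_atoms[a], neg_atoms[b])
                for a in pos_names if a in pos_atoms
                for b in neg_names if b in neg_atoms
            ]
            if distances and min(distances) <= cutoff:
                bridges.append((pos_key, neg_key, round(min(distances), 2)))
    return bridges


# Invented coordinates, for illustration only.
example = {
    ("ARG", 242): {"NE": (0.0, 1.0, 0.0), "NH1": (0.0, 0.0, 0.0), "NH2": (1.0, 0.0, 0.0)},
    ("GLU", 216): {"OE1": (4.0, 0.0, 0.0), "OE2": (4.0, 1.0, 0.0)},
    ("GLU", 210): {"OE1": (20.0, 0.0, 0.0), "OE2": (20.0, 1.0, 0.0)},
}
print(salt_bridges(example))  # [(('ARG', 242), ('GLU', 216), 3.0)]
```

Taking the minimum over all atom pairs makes the check independent of which guanidinium or carboxylate atom happens to point toward the partner, which matters for symmetric groups such as NH1/NH2 and OE1/OE2.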

Intramolecular interactions, distribution of residues, and conservation: (A) Hydrophobic core of the structure and interhelical H-bonds (salt bridges) stabilizing the helices; hydrophobic residues (grey) project toward the center of the protein. (B) Arrangement of charged residues (magenta) and hydrophobic residues (cyan) on the helices. (C) Molecular surface of S6(Rpt1) colored by sequence conservation among SNF5 homologues, analyzed with ConSurf. Colors range from dark green for highly conserved residues to magenta for poorly conserved residues; the conserved surface patch lies at the center of the structure. (D) Surface map of hydrophobic (grey) and polar (red: negatively charged, blue: positively charged) residues.
Similarity of Rpt1, PFU domain of PLAA, and ubiquitin
The modeling suggested that the core of the S6(Rpt1) fragment resembles the PFU domain of PLAA: it belongs to the α + β class and contains a two-stranded β-sheet (strand order 1–2) followed by a long, tightly packed helix-turn-helix (HTH), with relatively short unstructured loops. The structurally homologous regions of S6(Rpt1) and the PFU domain of PLAA share only 11% sequence identity [Fig. 4(A)]. Superposition of the predicted Rpt1 structure on the PFU domain of PLAA revealed minor differences, but the relative orientations of the helices, loops, and sheet were the same in the two structures [Fig. 4(B)]. The comparison shows that Rpt1 and the PFU domain form very similar folds and that both have a large, exposed hydrophobic surface patch [Fig. 4(C,D)]. The packing of the secondary structures, especially of the helices in the HTH region, was almost identical in the two proteins (Fig. 4).
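Sequence identity values such as the 11% quoted here depend on how the aligned region is delimited and on which denominator is used. The sketch below computes identity from a pairwise alignment written as two equal-length gapped strings. The two denominators offered (ungapped aligned positions, or the length of the shorter ungapped sequence) are common conventions, and the text does not state which one underlies the 11% value; the sequences in the example are invented.

```python
def percent_identity(aln_a, aln_b, denominator="aligned"):
    """Percent identity between two aligned sequences ('-' marks a gap).

    denominator="aligned": positions where neither sequence has a gap.
    denominator="shorter": length of the shorter ungapped sequence.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    identical = sum(1 for a, b in pairs if a == b)
    if denominator == "aligned":
        denom = len(pairs)
    elif denominator == "shorter":
        denom = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    else:
        raise ValueError("denominator must be 'aligned' or 'shorter'")
    if denom == 0:
        raise ValueError("no residues to compare")
    return 100.0 * identical / denom


# Invented toy alignment: 4 identities over 5 gap-free columns.
print(percent_identity("AC-DEFG", "ACQD-FH"))             # 80.0
print(round(percent_identity("AC-DEFG", "ACQD-FH", "shorter"), 1))  # 66.7
```

The same alignment gives 80% or about 67% depending on the convention, which is why low identity values from structure-based alignments are best compared only when computed the same way.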

Comparison of (A) sequences and secondary structures and (B) 3D structures of the PFU domain of PLAA (2k89A.pdb), S6(Rpt1) of INI1, and ubiquitin. (C) Superposition of the structurally homologous regions of INI1 Rpt1 and the PFU domain of PLAA. (D) Surface representation of charged and hydrophobic regions of PFU and S6(Rpt1); blue: electropositive, red: electronegative. The region within the dotted oval shows the central hydrophobic patch. (E) Superposition of the structurally homologous N-terminal region of INI1 Rpt1 and ubiquitin.
To better understand how S6(Rpt1) might interact with different proteins, we compared it with known structures by searching the PDB with the DALI program. This search returned a number of proteins related to the predicted S6(Rpt1) structure (Supporting Information Table SII); the closest were the PFU domain of human PLAA (Z-score = 6.8) and the yeast PLAA homolog DOA (Z-score = 5.6). The structural similarity of the S6(Rpt1) model to the PFU domain of PLAA suggests that the two proteins may share functions, or may interact with the same partners or with structurally similar proteins.
The PFU domain of PLAA is a ubiquitin-binding domain and itself shares partial structural similarity with ubiquitin.26 Because of this similarity of the PFU domain to ubiquitin, and of the Rpt1 domain to PFU, we compared the Rpt1 domain with ubiquitin using pairwise Dali alignments of the 3D structures. Ubiquitin and Rpt1 share partial structural similarity (Fig. 4). Although they have only about 10% identical residues, the INI1 region aa 183–233 of the modeled S6(Rpt1) superimposes on human ubiquitin (PDB ID: 2JZZ) with an RMSD of 2.7 Å over 40 Cα atoms. The β1, β2, α1, and turn (between α1 and α2) of Rpt1 correspond to β1, β2, α1, and η1 of human ubiquitin, respectively [Fig. 4(E)]. This suggests that the N-terminal part of Rpt1 has a topology very similar to that of the N-terminal part of ubiquitin.
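An RMSD such as the 2.7 Å over 40 Cα atoms quoted above is computed after optimally superposing matched atom pairs. One standard way to find that superposition is the Kabsch algorithm, sketched below with NumPy. It assumes the residue correspondence (which Cα atom pairs with which) is already given; Dali derives that correspondence from its own structural alignment, and this sketch is not a reimplementation of Dali. The test coordinates are random and are not taken from the Rpt1 model or from 2JZZ.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between matched point sets p and q (N x 3) after optimal superposition.

    Both sets are centered on their centroids, and the rotation that best maps
    p onto q is obtained from a singular value decomposition (Kabsch algorithm).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("p and q must both have shape (N, 3)")
    # Remove translation by centering both sets.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # SVD of the 3 x 3 covariance matrix.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

Because the optimal rotation and translation are found analytically, the result is the minimum RMSD for the given atom pairing; a different residue correspondence, such as one from another alignment program, can give a different value for the same two structures.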
Structural similarity of the two Rpt motifs in INI1
Although our main focus was the structure of Rpt1, Rpt1 and Rpt2 share partial sequence similarity, so we examined whether Rpt2 is also structurally similar to Rpt1. We modeled the INI1 fragment aa 251–319, comprising part of the linker and the Rpt2 sequence (aa 259–319), using QUARK. The modeled Rpt2 also adopts an α + β fold with ββαα topology. This model had acceptable Ramachandran statistics and a valid Verify-3D score (Supporting Information Fig. S5), suggesting that the residues are properly folded and have appropriate side-chain environments in 3D space. The arrangement of strands and helices in the resulting structure appears similar to that of Rpt1. Although the two repeats share a similar topology in terms of the arrangement of strands and helices, the 3D distribution of charges (electropositive and electronegative) differs between the two structures (Fig. 5). These differences in amino acid composition may underlie differences in their biological activities. The QUARK result supports our expectation that the two repeats fold similarly and further increases our confidence in the predicted Rpt1 structure.

Sequence and structure comparison of Rpt1 and Rpt2. Modeled structures of (A) Rpt2 and (B) Rpt1, with ionizable residues colored (red: negative, blue: positive). (C) Superposition of Rpt1 (green) and Rpt2 (red).
Discussion
Three-dimensional structural data are fundamental to understanding the relationship between a protein's structure and its function. However, crystallographic data are available for only a small fraction of all known proteins. Often, proteins are too difficult to crystallize or too large for NMR studies. Computational protein modeling methods, such as (i) comparative or homology modeling,2 (ii) threading,3 and (iii) ab initio folding,4 can provide useful information about unknown protein structures. INI1/hSNF5 is one such challenging protein: it is large and contains a significant amount of disordered structure. It has been difficult to obtain sufficient amounts of homogeneous, stable, and soluble full-length INI1. Because no structure of INI1 was available, we undertook a modeling study of the protein. Our initial modeling efforts revealed a distinct folding pattern for the conserved Rpt1 domain of INI1, and in this article we report the modeled structure of this region, predicted by threading and ab initio methods. The importance of INI1/hSNF5 in the SWI/SNF complex, its interactions with viral and cellular proteins, and its functions in tumor suppression, transcription, and HIV-1 replication have been studied extensively. Although the exact function of the Rpt1 domain of INI1 is not known, it appears to function as a protein–protein interaction module. We focused our modeling on the stably expressed S6(Rpt1) fragment, which harbors the Rpt1 domain of INI1, because of its ability to inhibit HIV-1 replication.
The studies presented here, which combine secondary structure analysis and 3D structural modeling, indicate that the S6(Rpt1) fragment of INI1 has a well-folded Rpt1 region, as predicted by all three programs used. Three independent 3D structure modeling programs, based on very different methodologies, predicted essentially the same structure for the Rpt1 domain: a compact fold with an unstructured linker region. Modeling also indicated a similar fold for Rpt2, which shares partial sequence similarity with Rpt1. In all three modeled structures of S6(Rpt1), the linker region was disordered, as also predicted by the secondary structure prediction methods. We therefore speculate that while the Rpt regions (Rpt1 and Rpt2) have well-folded structures, the region connecting the two repeat units is intrinsically flexible, which may allow the Rpt1 domain to fold and function independently of Rpt2. These results are consistent with previous studies showing that the Rpt1 and Rpt2 fragments can each be expressed in the absence of the other repeat and, in some cases, are active even without the intact protein.13,22,23,27 The quality of the predicted structure was validated by several criteria, which provided a high level of confidence in the model. The results suggest that the core of S6(Rpt1) consists of a folded antiparallel β-sheet and a helix-turn-helix motif. The hydrophobic residues of the central core (Fig. 3) participate in stabilizing interactions between the helices and the sheet and seem to be important for the overall stability of the fold. The polar and charged residues are exposed to the solvent and may be free to contact binding partners. The sequence-structure conservation profile of the modeled structure showed a conserved cluster of hydrophobic core residues surrounded by charged residues (Fig. 3).
The conserved, exposed hydrophobic surface may drive interactions with other proteins, which may be further strengthened by ionic interactions of polar residues in the area. This may be a common protein-interacting surface in the SNF5 family. Because the modeled structure (Rpt1) is only one domain of INI1, the accessibility of the hydrophobic patch will depend on the structure of the full-length protein and on domain–domain interactions. We believe that INI1 contains domain–domain interactions and that binding partners such as HIV-1 integrase induce domain movements or rearrangements that may make the hydrophobic surface patch available to other binding partners. We speculate that the disordered region between the two Rpt domains might facilitate these rearrangements. These ideas will remain speculative until a full-length 3D structure of INI1 is available. Although the full-length protein is important for function, the Rpt1 fragment itself has been expressed and shown to bind IN and to act as a dominant negative inhibitor of HIV-1 replication.22,28 For these reasons, we believe that deciphering the structure of the Rpt1 fragment is worthwhile.
On the basis of the structural similarity between the Rpt1 and PFU domains, we can propose functions for the INI1 Rpt1 domain, because structurally related domains often have similar functions. It has been shown that 3D structural alignments of unrelated (non-homologous but structurally similar) proteins generally show very low sequence identity (as low as 10%) in the aligned region.29 Here, PLAA and Rpt1 share only 11% sequence identity but have remarkably similar structures, and thus may be considered distant homologs. Structure is evolutionarily more conserved than sequence, and equivalent residues obtained from a structural alignment are meaningful for functional interpretation.29 The structural alignment of these two proteins points to potentially important functional regions of INI1 Rpt1. Interestingly, the PFU domain of PLAA has been proposed to bind ubiquitin (Ub), which plays important roles in endoplasmic reticulum-associated degradation, vesicle formation, and the DNA damage response. Because the INI1 Rpt1 domain has a structure similar to that of the PFU domain of PLAA, we propose that INI1 may bind to ubiquitin or ubiquitin-related domains. In this context, we predict that, like PFU, the hydrophobic surface patch of the Rpt1 domain may interact with the hydrophobic surface of ubiquitin or ubiquitin-related proteins. There are visible differences in the orientation and charges of the loop between the two antiparallel β-strands of the PFU and Rpt1 domains (Fig. 4). In the PFU domain, this loop is flexible and influences the binding affinity for ubiquitin: although the hydrophobic patch provides the driving force for the interaction, the flexible loop between the β-strands has been noted to influence ubiquitin binding.24 By analogy, the corresponding loop in Rpt1 may be important for interaction with ubiquitin or ubiquitin-related domains.
Although there are no reports of a direct interaction between INI1 and ubiquitin, indirect evidence suggests that INI1 may bind to ubiquitin-like proteins or to ubiquitylated proteins. For example, the ubiquitin-like protein SAP18 (Sin3-associated polypeptide of 18 kDa) has been shown to interact with INI1.14 However, that study did not provide any information on the interacting domains or the molecular nature of the interaction between the two proteins. In addition, the SWI/SNF complex, as well as INI1 within the complex, has been reported to interact preferentially with nucleosomal arrays containing monoubiquitylated histone H2A.30 That study did not reveal whether the INI1 subunit interacts directly with ubiquitin. Our structural modeling data, however, suggest that the Rpt1 domain of INI1 may be able to bind ubiquitin and related proteins, and they provide a possible mode of interaction between INI1 and SAP18.
The INI1 Rpt1 domain has been shown to interact with multidomain proteins such as HIV-1 IN and c-MYC, which are structurally unrelated and have very different domain architectures. Previous experiments demonstrated that INI1 Rpt1 and other deletion fragments containing the Rpt1 region interact with both HIV-1 IN and c-MYC. Whereas deletion analysis indicated that the HIV-1 IN central core domain (RNase H fold) is necessary and sufficient for interaction with INI1, cryo-EM studies indicated that the C-terminal DNA-binding SH3 domain of HIV-1 IN interacts with INI1.22,31 Interestingly, a study that identified HIV-1 protein interaction partners by affinity tagging and purification–mass spectrometry found PLAA as a binding partner of HIV-1 integrase, although it did not provide further details about the interaction between these two proteins.32 Whether the PFU domain is involved in binding to HIV-1 integrase is still unknown; future research is likely to elucidate the details of this interaction.
It has also been demonstrated that the INI1 Rpt1 domain binds to the basic helix-loop-helix leucine zipper (bHLH-zip) domain of c-MYC. This raises the question of how such structurally distinct domains can all interact with Rpt1. On the basis of our modeling study, we propose that the ability of Rpt1 to bind different and distinct partners is due to its similarity to ubiquitin or ubiquitin-like proteins. Ubiquitin-like domains show little or no significant sequence similarity to ubiquitin, but they nevertheless preserve its fold.33 Ubiquitin and ubiquitin-related domains are central to protein-protein interactions,34 and their interactions with a large number of varied domains and protein complexes are well known.35 The ubiquitin fold is a striking natural example of a fold with a remarkable tendency to interact with many different classes of protein folds or domains. Similarity with ubiquitin strengthens the notion that Rpt1 is a protein interaction module that can bind many very distinct partners, such as HIV-1 IN and c-MYC. Future delineation of the binding interfaces between Rpt1 and HIV-1 Integrase or c-MYC will shed light on the mode of interaction of INI1 with other proteins.
Interestingly, ubiquitin has also been reported to play a role in SNF5 protein stability. SNF5 protein was found to be poly-ubiquitinated and regulated by a proteasome-mediated degradation pathway.27 We propose that the ubiquitin-mediated degradation pathway regulates the stability of INI1, and that this regulation is in turn facilitated or governed by the conserved Rpt1 domain. In this regard, the Rpt1 domain may act as a recognition signal in the protein degradation pathway. Whether binding of Rpt1 to poly- or mono-Ub provides a signal for proteasomal degradation or for other processes (such as recruitment of SWI/SNF to nucleosomes) remains to be elucidated.
Conclusion
Molecular modeling provides a means to study proteins that are not amenable to traditional X-ray crystallography and NMR techniques. We have reported the predicted three-dimensional structure of the S6(Rpt1) fragment of INI1, which has no detectable homologs of known structure. The studies presented here used a combination of sequence analysis and structural modeling, and they indicate that Rpt1 has significant similarity to the ubiquitin-associated PFU domain of PLAA. A similar fold for Rpt1 was predicted by three independent protein modeling methods. Our studies provide a possible explanation for the role of Rpt1 as a protein-protein interaction module involved in binding to various proteins. Similar to ubiquitin, INI1 may bind a variety of partners to achieve a common effect. Our structural predictions strengthen the notion that Rpt1 is a protein-protein interaction module of INI1. However, further experimentation is required to test this prediction. The similarity among PLAA, ubiquitin and INI1 should provide useful insight for further investigation.
Methods
Secondary structure analysis
Secondary structure analysis was conducted using the INI1 sequence (UniProtKB Q12824; GenBank accession number CAA76639). The S6(Rpt1) fragment is 83 residues long and spans positions 183–265 of the protein. Sequence-based prediction of the secondary structure of the S6(Rpt1) fragment was performed using four secondary structure prediction methods: Jpred,36 PSIPRED,37 YASPIN,38 and PHD.39
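The text does not state how the four predictions were combined. Purely as an illustration, the Python sketch below assumes each method's output has been reduced to a three-state string of equal length (H = helix, E = strand, C = coil). It then takes a per-residue majority vote and falls back to coil on ties. The function names, the symbol mapping in `COIL_SYMBOLS` and the tie rule are all assumptions, not the procedure used in this study.

```python
from collections import Counter
from typing import List

# Assumed mapping of coil-like symbols used by different predictors to "C".
COIL_SYMBOLS = {"-": "C", "L": "C", " ": "C"}


def to_three_state(prediction: str) -> str:
    """Map a predictor's output string onto the H/E/C alphabet."""
    return "".join(COIL_SYMBOLS.get(s, s) for s in prediction.upper())


def consensus_ss(predictions: List[str], tie_state: str = "C") -> str:
    """Per-residue majority vote over equal-length secondary structure strings."""
    normalized = [to_three_state(p) for p in predictions]
    if not normalized or len({len(p) for p in normalized}) != 1:
        raise ValueError("need at least one prediction, all of the same length")
    consensus = []
    for column in zip(*normalized):
        ranked = Counter(column).most_common()
        top_state, top_count = ranked[0]
        # A tie between the two most frequent states falls back to tie_state.
        if len(ranked) > 1 and ranked[1][1] == top_count:
            consensus.append(tie_state)
        else:
            consensus.append(top_state)
    return "".join(consensus)


# Toy predictions for a 4-residue stretch (invented, not INI1 data).
print(consensus_ss(["HHEC", "HHEE", "HCEL", "CH-C"]))  # HHEC
```

Resolving ties toward coil is a conservative choice: a residue is labeled helix or strand only when most methods agree.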
Comparative modeling
We used three modeling programs, each based on a different methodology, for 3D modeling of this fragment. The first, Robetta, parses input sequences into domains. It builds models for domains with sequence homology to proteins of known structure using comparative modeling, and models for domains lacking such homology using de novo structure prediction. As its initial step, Robetta uses the Ginzu domain prediction method. This is a hierarchical screening procedure that first uses BLAST, PSI-BLAST,40 FFAS03,41 and 3D-Jury42 to detect regions in the query sequence that are homologous to experimentally determined structures, and then uses multiple sequence alignment (MSA)-based methods to predict putative domains. Regions with detected parents (templates) are modeled with a template-based modeling protocol, and the remaining unassigned regions are modeled with the Rosetta de novo protocol.43 The second program was I-TASSER,44 which produced the best-ranked protein models in the seventh and eighth Critical Assessment of Structure Prediction experiments (CASP7 and CASP8). CASP is a community-wide, worldwide experiment designed to obtain an objective assessment of the state of the art in structure prediction. The I-TASSER algorithm consists of three consecutive steps: threading, fragment assembly and iteration. During threading, I-TASSER generates template alignments using a sequence profile–profile alignment approach that also takes secondary structure into account. Fragment assembly is then performed on the basis of the threaded alignments. Fragments in the aligned regions are taken directly from the template structures, and the unaligned regions are modeled ab initio. Assembly is guided by a knowledge-based force field, and the resulting conformations are clustered to generate and rank cluster centroids. The lowest-energy conformations are finally selected and evaluated by confidence scores (C-scores).
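For orientation, the I-TASSER literature describes the C-score as C-score = ln[(M/M_tot) × (1/⟨RMSD⟩) × Π Z(i)/Z0(i)]. Here M is the number of decoys in the selected cluster and M_tot is the total number of decoys. ⟨RMSD⟩ is the average RMSD of the decoys to the cluster centroid, and Z(i)/Z0(i) is the normalized Z-score of the best template from the i-th threading program. The Python sketch below simply evaluates that expression. The function and argument names are hypothetical, the inputs are invented, and the server computes the score internally.

```python
import math
from typing import List


def c_score(m_cluster: int, m_total: int, mean_rmsd: float,
            z_scores: List[float], z_cutoffs: List[float]) -> float:
    """Evaluate ln[(M/M_tot) * (1/<RMSD>) * prod(Z(i)/Z0(i))]."""
    if not 0 < m_cluster <= m_total:
        raise ValueError("need 0 < m_cluster <= m_total")
    if mean_rmsd <= 0:
        raise ValueError("mean RMSD must be positive")
    if len(z_scores) != len(z_cutoffs):
        raise ValueError("one Z-score cutoff is needed per threading program")
    template_term = 1.0
    for z, z0 in zip(z_scores, z_cutoffs):
        # Normalized Z-score of each threading program's best template.
        template_term *= z / z0
    value = (m_cluster / m_total) * (1.0 / mean_rmsd) * template_term
    if value <= 0:
        raise ValueError("the argument of the logarithm must be positive")
    return math.log(value)


# Invented inputs: half of the decoys in the cluster, <RMSD> = 4 Angstrom,
# two threading programs whose best templates sit exactly at their cutoffs.
print(round(c_score(500, 1000, 4.0, [1.0, 1.0], [1.0, 1.0]), 3))  # -2.079
```

Each factor rewards a different kind of confidence: a large, tight cluster of decoys and strong template hits. C-scores are typically reported in the range -5 to 2, and higher values indicate higher confidence.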
The third algorithm was QUARK, which performs ab initio protein folding and structure prediction from the amino acid sequence alone. QUARK models are built from small fragments (1–20 residues long) by replica-exchange Monte Carlo simulation guided by an atomic-level knowledge-based force field. Because no global template information is used in QUARK simulations, the server is suitable for proteins without homologous templates. A TM-score > 0.5 indicates a model of correct topology, whereas a TM-score < 0.17 suggests random similarity.
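The TM-score thresholds above refer to the standard TM-score, TM = (1/L_target) Σ 1/[1 + (d_i/d0)²] with d0 = 1.24 ∛(L_target − 15) − 1.8 Å. Here d_i is the distance between the i-th pair of aligned residues after superposition. The Python sketch below evaluates this sum for one given set of distances. It assumes the superposition has already been done, whereas the TM-score program searches for the superposition that maximizes the score. The function names are hypothetical, and the 0.5 Å lower bound on d0 is assumed to mirror the reference TM-score program.

```python
from typing import List


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in Angstrom) used by TM-score."""
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    return max(d0, 0.5)  # assumed lower bound, as in the reference program


def tm_score_from_distances(distances: List[float], target_length: int) -> float:
    """TM-score for a fixed superposition, given aligned-pair distances d_i."""
    if target_length <= 0:
        raise ValueError("target length must be positive")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# For an 83-residue fragment such as S6(Rpt1), d0 is about 3.26 Angstrom.
L = 83
print(round(tm_d0(L), 2))                                    # 3.26
print(tm_score_from_distances([0.0] * L, L))                 # 1.0
print(round(tm_score_from_distances([tm_d0(L)] * L, L), 3))  # 0.5
```

Normalizing by d0 makes the score largely independent of protein length, which is why fixed cutoffs such as 0.5 and 0.17 can be applied across targets.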
Model quality assessment and structure and sequence comparison analysis
To assess the overall stereochemical quality of the generated 3D model, the geometrical accuracy of the residues and the 3D profile quality index were inspected with PROCHECK (version 3.5).45 The model was also validated with VERIFY3D, which checks the compatibility of a 3D model with its own amino acid sequence. PROSA was used to check the energy criteria of the final model. The modeled structure was compared with similar structures using DALI,46 and conservation of sequence and structure was analyzed with ConSurf.47
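PROCHECK's Ramachandran analysis is based on the backbone dihedral angles φ and ψ of each residue. As a minimal sketch of where those angles come from, the pure-Python code below computes a dihedral angle from four atom positions. It then applies that function to consecutive backbone atoms: φ from C(i−1), N(i), CA(i), C(i), and ψ from N(i), CA(i), C(i), N(i+1). The code assumes the coordinates have already been parsed from the model's PDB file. The function names and the input layout are hypothetical, and PROCHECK's classification into favored and allowed regions is not reproduced.

```python
import math
from typing import Dict, List, Optional, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0:
        raise ValueError("p1 and p2 coincide")
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone: List[Dict[str, Vec]]) -> List[Tuple[Optional[float], Optional[float]]]:
    """(phi, psi) per residue from dicts holding "N", "CA" and "C" coordinates.

    Phi is undefined for the first residue and psi for the last (None).
    """
    angles = []
    for i, res in enumerate(backbone):
        phi = psi = None
        if i > 0:
            phi = dihedral(backbone[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(backbone) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], backbone[i + 1]["N"])
        angles.append((phi, psi))
    return angles


# Geometric checks: a cis arrangement gives 0 degrees, a trans arrangement 180.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)), 1))   # 0.0
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)), 1))  # 180.0
```

Using atan2 on the two projected vectors returns a signed angle over the full circle, which the Ramachandran plot needs to separate, for example, right- and left-handed helical regions.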
Acknowledgments
This work was supported by a grant from NIGMS (R01GM112520) to GVK and by Royalty funds (3A2533) of SAA. SB thanks the Protein Society for an Early Career Researcher Travel Award to present part of this work at the 2015 Annual Symposium of the Protein Society.