Volume 33, Issue 3 e4911

TOOLS FOR PROTEIN SCIENCE

Open Access

Key interaction networks: Identifying evolutionarily conserved non-covalent interaction networks across protein families

Dariia Yehorova

School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA

Rory M. Crean

Department of Chemistry—BMC, Uppsala University, Uppsala, Sweden

Corresponding Author

Peter M. Kasson

[email protected]

Department of Molecular Physiology, University of Virginia, Charlottesville, Virginia, USA

Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, USA

Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden

Correspondence

Peter M. Kasson, Department of Molecular Physiology, University of Virginia, Charlottesville, Virginia, USA.

Email: [email protected]

Shina C. L. Kamerlin, School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA.

Email: [email protected]

Corresponding Author

Shina C. L. Kamerlin

[email protected]

orcid.org/0000-0002-3190-1173

School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA

Department of Chemistry—BMC, Uppsala University, Uppsala, Sweden

First published: 15 February 2024

https://doi.org/10.1002/pro.4911

Citations: 8

Review Editor: Nir Ben-Tal

Abstract

Protein structure (and thus function) is dictated by non-covalent interaction networks. These can be highly evolutionarily conserved across protein families, the members of which can diverge in sequence and evolutionary history. Here we present KIN, a tool to identify and analyze conserved non-covalent interaction networks across evolutionarily related groups of proteins. KIN is available for download under a GNU General Public License, version 2, from https://www.github.com/kamerlinlab/KIN. KIN can operate on experimentally determined structures, predicted structures, or molecular dynamics trajectories, providing insight into both conserved and missing interactions across evolutionarily related proteins. This provides useful insight into protein evolution as well as a tool that can be exploited for protein engineering efforts. As a showcase system, we demonstrate applications of this tool to understanding the evolutionarily relevant conserved interaction networks across the class A β-lactamases.

1 INTRODUCTION

Non-covalent interactions between residues play a critical role in determining protein structure and function. Within families of evolutionarily related proteins, studying these interactions can provide insights into functional specificity and evolutionary constraints placed on the sequences (Jack et al., 2016). However, considering the large number of non-covalent interactions in any given protein, traditional experimental techniques often lack the high-throughput capabilities or resolution required for a comprehensive analysis of these relationships. Computational sequence-based approaches such as EVcouplings (Hopf et al., 2019), GREMLIN (Kamisetty et al., 2013; Ovchinnikov et al., 2014), Protein Sparse Inverse COVariance (PSICOV; Jones et al., 2012), and Direct Coupling Analysis (DCA; Morcos et al., 2011) have been applied to address this gap. These techniques rely on a large, diverse sample of available sequences and are thus less suitable for characterizing small groups of evolutionarily related proteins. Structure-based approaches, particularly those based on residue interaction networks (RINs), can provide an alternative way to identify preserved interaction motifs and co-evolving groups of residues. A RIN is a graph-based representation of a protein structure where nodes represent individual residues and edges represent physico-chemical interactions between them (Clementel et al., 2022). Combined with graph-theoretic analysis techniques, RIN-based tools have demonstrated success in the analysis of stability, folding, and function of proteins (Atilgan et al., 2004; Hu et al., 2007; Tse & Verkhivker, 2015).

In this study, we introduce a tool named Key Interaction Networks (KIN), a Python package that can construct a conservation-based RIN for a set of evolutionarily related proteins. While there already exist a number of tools for analyzing protein interactions in multi-state structures (del Conte et al., 2023; Huggins et al., 2018; Sladek et al., 2021), KIN provides the additional capability to analyze any group of related proteins by finding a common network of interactions that are conserved within the family. Specifically, using KIN, the RIN across the protein family can be projected onto a structure of interest and used to identify both conserved and variable interaction subnetworks throughout the family. Further, the RINs used can be determined directly from analysis of a single structure of each member (e.g., a crystal structure) or alternatively from an ensemble of structures (e.g., molecular dynamics trajectories of simulations performed on each member). To demonstrate the utility of KIN, we applied it to study 69 evolutionarily related class A β-lactamases, proteins that play a major role in bacterial resistance to β-lactam antibiotics (Bush & Bradford, 2016). As we will showcase here, projecting conserved interactions onto a structure of interest allows characterization of the core interaction network across a family, which can in turn provide insight into functionally important residues and interactions as well as help to identify relevant residues that could be targeted for mutation. This makes KIN a useful tool both for advancing our understanding of the fundamental biochemistry of enzyme families and for protein engineering efforts.

1.1 Description of KIN workflow

Residue contact networks have been argued to be powerful tools for modeling evolutionary structural changes in proteins (Zhang et al., 2013). Key Interaction Networks (KIN) is an open-source Python-based semi-automated tool that identifies shared residue interaction networks (RINs) within a community of evolutionarily related proteins. KIN is available for download at https://www.github.com/kamerlinlab/KIN. Prior work in this space has focused on the evolution of interaction networks between proteins and/or other large networks (e.g., Alhindi et al., 2017; Ali & Deane, 2020; Fraser et al., 2002; Levy & Pereira-Leal, 2008; Pawlowski et al., 2013; Schoenrock et al., 2017; Schüler & Bornberg-Bauer, 2011; Stumpf et al., 2007; Sun & Kim, 2011; Wagner, 2001; Zitnik et al., 2019, among many others), or on characterizing functionally/allosterically important interaction networks within individual proteins (i.e., without broader evolutionary mapping, e.g., Clementel et al., 2022; Ali & Deane, 2020; Amaro et al., 2007; Brown et al., 2017; Crean et al., 2023; Felline et al., 2022; La Sala et al., 2023; McCormick et al., 2021; Reynolds et al., 2011; Scheurer et al., 2018; Seeber et al., 2011), or on analysis at the sequence rather than structural level (Anishchenko et al., 2017; Green et al., 2021; Hopf et al., 2019; Lee et al., 2008). KIN focuses on mapping conserved interaction networks within evolutionarily related proteins, which can in turn be responsible for the functionally relevant properties of these proteins such as protein folding stability, allostery or catalysis.

The general workflow of KIN is schematized in Figure 1a and described below. After identifying a group of related proteins to study, all PDB files are downloaded and prepared for analysis. Preparation involves adding missing atoms/residues; assigning protonation states; and standardizing the format of the PDB files to that used by Amber (see Data S1). This file format standardization generates consistent residue and atom labelling schemes that facilitate comparison across related proteins. At this point, KIN analysis can be performed in one of two ways: either on the PDB structures directly, or alternately, biomolecular simulations can be run using AmberMD and the resulting trajectories analyzed with KIN. As alluded to in Figure 1a, RINs can then be constructed based on either structural or simulation data, with non-covalent interactions classified into any of the following types: salt bridge, hydrogen bond, cation-π, π-π, hydrophobic and van der Waals (vdWs) interactions. If MD simulations are performed to generate a structural ensemble for each protein, then KIN uses an adjustable cutoff to determine how frequently an interaction must be present for inclusion in that protein's RIN. For example, if the cutoff is set to 10%, contacts will be included if present for at least 10% of the simulation time.
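The per-trajectory cutoff described above can be illustrated with a minimal sketch. This is not KIN's implementation: the function name `contacts_above_cutoff`, the representation of each frame as a set of (residue, residue, interaction type) tuples, and the toy trajectory are all assumptions made for illustration.

```python
from collections import Counter


def contacts_above_cutoff(frames, cutoff=0.10):
    """Return the contacts present in at least `cutoff` (a fraction) of frames.

    `frames` holds one set of contacts per trajectory frame. This input
    format is hypothetical and chosen only to keep the example small.
    """
    if not frames:
        return set()
    counts = Counter()
    for frame in frames:
        counts.update(set(frame))  # count each contact at most once per frame
    n_frames = len(frames)
    return {c for c, k in counts.items() if k / n_frames >= cutoff}


# Toy 10-frame trajectory with invented contacts.
stable = ("res12", "res45", "hydrogen bond")
intermittent = ("res30", "res88", "salt bridge")
rare = ("res51", "res97", "hydrophobic")

frames = [{stable} for _ in range(10)]   # present in every frame
for idx in (0, 1, 2):
    frames[idx].add(intermittent)        # present in 3 of 10 frames
frames[9].add(rare)                      # present in 1 of 10 frames

relaxed = contacts_above_cutoff(frames, cutoff=0.10)
strict = contacts_above_cutoff(frames, cutoff=0.50)
print(sorted(relaxed))
print(sorted(strict))
```

With the relaxed 10% cutoff all three toy contacts are retained, whereas the 50% cutoff keeps only the contact that persists throughout the trajectory.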

FIGURE 1

(a) An overview of the main steps in the Key Interaction Networks (KIN) workflow. (b) Example of two output formats of the method: (top) Structural projection of interactions, where lines indicate a contact between two residues. The color of the line reflects the interaction type while the thickness is proportional to the level of conservation. (bottom) Summary of the interaction type, type of interacting residues (main-chain, Mc, or side-chain, Sc) and conservation scores for a subset of the interactions preserved in the protein family of interest.

However generated, the resulting RIN graphs are converted to a multiple sequence alignment (MSA)-based indexing representation for conservation analysis. The MSA is created using Modeller (Šali & Blundell, 1993) and uses the residue type as the alignment feature (see Data S1 for more information). At this point, an appropriate reference structure is selected to project the results onto. Conservation scores are computed for each interaction possible within the target protein. This score can be computed in two alternative ways: either as the number of proteins where a contact is present divided by the total number of proteins, or as the number of proteins where a contact is present divided by the number of proteins that contain the interacting residues at the appropriate MSA positions. The second approach means that the conservation score is not penalized if it would be impossible for some structures to form certain interactions because either or both residues involved are not present. These two approaches are designed to account for the structural diversity of a family. While many commonly studied families exhibit high structural homogeneity, intriguing mechanisms such as enzyme promiscuity and gene fusion are often observed in families and superfamilies with a high level of diversity (Das et al., 2015). In this study, we perform all calculations with the first method, as the two results are nearly identical for a family of similar structures.
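The difference between the two conservation scores can be made concrete with a small sketch. The data layout below (each protein as a dictionary holding the MSA positions it occupies and its contacts as pairs of MSA indices) and the function name `conservation_scores` are assumptions for illustration only and do not reflect KIN's internal data structures.

```python
def conservation_scores(contact, proteins):
    """Return (score over all proteins, score over proteins able to form the contact).

    `contact` is a pair of MSA positions. Each protein is a dict with
    'positions' (MSA positions occupied by a residue, i.e. not a gap) and
    'contacts' (set of MSA position pairs). Hypothetical format.
    """
    i, j = contact
    n_present = sum(contact in p["contacts"] for p in proteins)
    n_possible = sum(i in p["positions"] and j in p["positions"] for p in proteins)
    score_all = n_present / len(proteins)
    # Second definition: ignore proteins lacking either residue in the alignment.
    score_possible = n_present / n_possible if n_possible else 0.0
    return score_all, score_possible


# Four invented proteins; the last one has a gap at MSA position 25.
proteins = [
    {"positions": {10, 25, 40}, "contacts": {(10, 25), (25, 40)}},
    {"positions": {10, 25, 40}, "contacts": {(10, 25)}},
    {"positions": {10, 25, 40}, "contacts": {(25, 40)}},
    {"positions": {10, 40}, "contacts": set()},
]

score_all, score_possible = conservation_scores((10, 25), proteins)
print(score_all, score_possible)
```

In this toy family the contact is found in two of four proteins (first score 0.5), but only three proteins have residues at both positions, so the second score rises to 2/3. For a structurally homogeneous family with few gaps, the two scores converge.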

Additionally, conservation scores of the residue pairs can be summed into per-residue scores and normalized using Min-Max scaling. This is defined as follows, where N is the total number of contacts that include residue r_i, s is the set of all unnormalized per-residue scores, and S is the resulting normalized score:

$$s(r_i) = \sum_{j=1}^{N} \left(\text{conservation score}\right)_j$$

$$S = \frac{s - \min(s)}{\max(s) - \min(s)}$$
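The two expressions above (summing contact conservation scores per residue, then Min-Max scaling) can be sketched in a few lines of plain Python. The function names, the dictionary-based input, and the example scores are illustrative assumptions rather than KIN's own interface.

```python
from collections import defaultdict


def per_residue_scores(contact_scores):
    """Sum the conservation scores of all contacts each residue takes part in."""
    totals = defaultdict(float)
    for (res_a, res_b), score in contact_scores.items():
        totals[res_a] += score
        totals[res_b] += score
    return dict(totals)


def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {res: 0.0 for res in scores}
    return {res: (s - lo) / (hi - lo) for res, s in scores.items()}


# Invented contact conservation scores keyed by residue pair.
contact_scores = {
    ("resA", "resB"): 1.0,
    ("resA", "resC"): 0.5,
    ("resD", "resC"): 0.5,
    ("resE", "resF"): 0.25,
}

raw = per_residue_scores(contact_scores)
normalized = min_max_normalize(raw)
print(raw)
print(normalized)
```

Here `resA` accumulates the largest raw score (1.5) and is scaled to 1, while residues involved only in the weakly conserved contact are scaled to 0.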

The conservation scores for each contact can be visualized on a structure of choice using the PyMOL scripts generated by KIN or analyzed further in tabular format (Figure 1b). The tabular dataset is stored as a pandas (McKinney, 2010; The Pandas Development Team, 2020) DataFrame, with each row defining a different contact. Columns include: residue numbers (with both MSA and PDB numbering); interaction type (e.g., hydrogen bond); which parts of each residue are involved in the contact (i.e., side or main chain); and the conservation score. All of these columns can easily be filtered for further analysis, such as to highlight only the most conserved interactions across a family of proteins.

To illustrate this and further capabilities of KIN, we used this tool to find shared interaction networks among 69 structures of class A β-lactamases. We projected these interaction networks onto the structure of TEM-1 (PDB ID: 1M40; Minasov et al., 2002) as a representative system throughout, as this enzyme is commonly associated with clinical β-lactam resistance in Gram-negative bacteria (Bradford, 2001; Livermore, 1995) and is considered a model class A β-lactamase (Brown et al., 2009). The most conserved interactions among β-lactamases (Figure 2a) comprise hydrogen bonding networks within secondary structural elements and highly preserved hydrophobic core interactions. Interactions can also be filtered by whether the side chain or main chain is involved, and filters can be applied consecutively to focus on a specific set of interactions more deeply (Figure 2b). Users can also obtain the set of interactions conserved within the analyzed family but not present in the protein of interest (Figure 2c). This is designed to help understand why a particular protein differs functionally from the rest of an evolutionary group or potentially to guide protein engineering.

1.2 Comparison of static and dynamic KIN analysis

For comparative purposes, we performed KIN analysis on both static (crystallographic) structures available in the Protein Data Bank (Ovchinnikov et al., 2014) and on molecular dynamics trajectories of the respective structures. MD simulations of all 69 structures were prepared and performed according to the protocol described in Data S1. A tunable cutoff parameter specifies the fraction of simulation frames in which an interaction must be present in order to be counted for that protein. Figure 3a shows the dependence of interactions detected on cutoff value, compared to analysis of PDB structures only. We will refer to this reference analysis of PDB structures as the crystallographic network. As might be expected, including all interactions present in any frame (cutoff >0%) of the simulation is highly sensitive but skewed by rare events. Setting the cutoff to even ≥10% still captures almost all of the crystallographic network while reducing the number of transient interactions (Figure 3a). Such transient interactions may be functionally important or may be artifactual, which is why a user-tunable parameter is critical here. Increasing the cutoff from 10% to 50% results in a network more similar to the crystallographic network, with the number of shared residues between the two networks decreasing much more slowly than the number of MD contacts (Figure 3a). This means that the interactions removed as a result of the increasing cutoff are largely non-crystallographic.

A visual comparison of the conserved interaction network obtained from MD simulations using per-trajectory cutoffs is rendered in Figure 3b. The majority of interactions that are preserved between crystallographic and MD-based networks (blue) describe interactions within secondary structure components. As might be expected, interactions found only via MD analysis are located primarily in the more flexible regions of the system. It is important to note that many of the dynamic interactions that are discarded when using a higher cutoff are strongly conserved across class A β-lactamases and thus may contain important information (Figure S1). For example, we observe that, of the interactions preserved when a per-trajectory cutoff of 10% is used instead of 50%, 32% (47 out of 143) are conserved in >80% of β-lactamase proteins analyzed. Further, while a number of hydrogen bonding interactions dominate both MD-based and crystallographic networks, cutoff sensitivity analysis shows many hydrophobic interactions to generally be more transient (Figure S2). Many of these interactions are not present in the crystallographic network and yet are highly conserved across the family.

Residue contact maps provide another means to compare crystallographic and MD-based interaction networks (Figure 3c). Here, we plot crystallographic contact maps and MD-based ones analyzed using a 10% cutoff. Generally, interactions within α-helices and β-sheets are constant between the two. In contrast, more dynamic regions displayed greater variation in the contacts found and their relative conservation scores, particularly interactions between neighboring α-helices or within loops and linker regions. These observations support the notion that the conserved interactions within secondary structural elements can be reliably described with either of the methods, but the analysis of crystallographic structures alone might be insufficient to capture more dynamic interactions.

1.3 Conservation of interaction networks in the β-lactamase family

In the following section, we use the set of class A β-lactamase structures to illustrate further system-specific analysis that can be done using KIN. The conserved interaction network derived from crystallographic data is rendered in Figure S2 and clearly shows that most of the highly conserved interactions are intramolecular hydrogen bonds within α-helices or hydrophobic contacts in the core of the protein. When the contact maps are plotted by interaction type, hydrogen bonding interactions clearly predominate, while dense interactions around residue 50 characterize the hydrophobic core (Figure S3). We performed a structural alignment of all 69 class A β-lactamases (Figure S4) to the TEM-1 structure and observed that the α-helices and the core of each protein are structurally preserved among the majority of these structures, with an RMSD of 0.42 Å. These elements are an important common feature across the β-lactamase fold (Philippon et al., 2015; Tooke, Hinchliffe, et al., 2019b).

We also explored the relationship between conserved interaction networks and the active site architecture (Figure 4). Each residue was classified as “Catalytic” (directly involved in catalysis), “Active Site” (within 5 Å of a co-crystallized transition state analogue, see Figure 4), or “Other” (all other residues) (Jelsch et al., 1993; Minasov et al., 2002). Because we are interested in polar interactions that might vary with mutation, we filtered the interactions to only include those that were non-hydrophobic and involved at least one side chain. This decision was based on the observations made from Figure S4, in which the most strongly conserved interactions formed across the β-lactamases are hydrogen bonds between two main-chain groups inside the α-helices and hydrophobic interactions at the core of the protein. These interactions were filtered out so that they would not dominate the subsequent analysis in Figure 4.

Both our analysis of the crystal structures and that of the MD simulation dataset highlight E166 (which acts as a general base in class A β-lactamase catalysis; Kalp et al., 2009) as one of the residues with the highest per-residue conservation score. This is primarily a result of the highly conserved hydrogen bonds E166 forms with the side chains of residues S70, N136, and N170, and the salt bridge with K73. Taken together, this would suggest these interactions with E166 are essential for class A β-lactamase catalysis, which is consistent with the literature on class A β-lactamases (Ji & Boxer, 2022; Kalp et al., 2009). However, only the MD-based network is able to identify conserved interactions for the second catalytically important residue, S70. This residue is responsible for the formation of a covalent acyl intermediate during the catalytic reaction (Padayatti et al., 2004). MD simulations sampled alternative S70 conformations that were not observed in available crystal structures. Subsequent KIN analysis identified conserved hydrogen bonds between S70–K73 and S70–E166. These interactions are consistent with the established role of S70 in facilitating proton transfer between K73 and E166 (Cs et al., 2019; Meroueh et al., 2005). Additionally, Figure 4 illustrates that analysis of MD trajectories gave more insight into highly conserved interactions concentrated around the active site. The MD-based network identifies conserved interactions involving 78% of the active site residues, while the crystallographic network identifies only 41%. Furthermore, 40% of the MD-identified residues are contained within the top 20% of the KIN per-residue scores. These conserved interactions within the active site region form a hydrogen-bonding network. These observations demonstrate that KIN can serve as a useful complementary tool for characterizing evolutionarily conserved active site interactions.

1.4 Comparison of KIN to sequence-based data

Contact-based and sequence-based analyses of protein evolutionary groups are expected to be complementary and to some extent concordant. One way to compare the similarity of two proteins at the sequence level is to calculate their percentage identity, that is, the fraction of amino acids in an aligned sequence that are identical. The percentage identity matrix (PIM) for all β-lactamases studied is shown in Figure S5A. As a means of comparison, we also generated a protein contact similarity matrix (PCSM) based on contact networks (Figure S5B), with the metric being the ratio of conserved contacts against non-conserved contacts between any two proteins. The Spearman rank order correlation between the PIM and the PCSM is 0.71. This indicates relatively strong concordance between evolutionary conservation of key interaction networks and sequence conservation across class A β-lactamases.
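As an illustration of how a rank correlation between two pairwise matrices can be obtained, the sketch below computes the Spearman coefficient over the off-diagonal upper-triangle entries of two symmetric matrices, using only the Python standard library. Restricting the comparison to the upper triangle is an assumption made here (the text does not specify which entries were compared), and the two 4 × 4 matrices are invented toy values, not data from this study.

```python
from math import sqrt


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def spearman_between_matrices(a, b):
    """Spearman correlation = Pearson correlation of the ranked entries."""
    return pearson(average_ranks(upper_triangle(a)), average_ranks(upper_triangle(b)))


# Toy percentage identity and contact similarity matrices (invented values).
pim = [[100, 80, 40, 35], [80, 100, 45, 30], [40, 45, 100, 70], [35, 30, 70, 100]]
pcsm = [[1.0, 0.9, 0.5, 0.6], [0.9, 1.0, 0.55, 0.4], [0.5, 0.55, 1.0, 0.8], [0.6, 0.4, 0.8, 1.0]]

rho = spearman_between_matrices(pim, pcsm)
print(round(rho, 3))
```

Because the measure depends only on ranks, it is insensitive to the different scales of the two matrices (percentages versus ratios), which makes it a natural choice for this kind of comparison.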

The PCSM also enables hierarchical clustering to generate a contact-based equivalent of a phylogenetic tree (i.e., a dendrogram, Figure 5a), using a distance measure based on how similar the interaction networks of the proteins are. Analyzing the interactions that differentiate two sub-trees can yield insight into putative evolutionary or functional differences. As an example of this, we selected a branch point in the dendrogram and used this to create two groups of proteins, labeled Clusters 1 and 2 in Figure 5a. Each contact was scored by the difference in degree of conservation between the two clusters, quantified as C_2,ij − C_1,ij, where C_k,ij is the fraction of structures in cluster k where contact ij is present, and the resulting histogram is plotted in Figure 5a. This histogram clearly shows that most contacts do not notably differ between the two clusters. To study those contacts that differ the most between the two clusters, we selected contacts with an absolute conservation difference of ≥0.3. Of the 58 contacts thus selected (from 1712 total), those with the greatest difference in contact conservation are rendered in Figure 5b, on the structure of a representative protein from each cluster.
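The per-contact difference score between two clusters can be sketched as follows. The representation of each protein as a set of contacts (pairs of MSA positions), the function names, and the toy clusters are assumptions for illustration. The 0.3 threshold follows the text.

```python
def cluster_conservation(contact, cluster):
    """Fraction of proteins in `cluster` (a list of contact sets) with `contact`."""
    return sum(contact in contacts for contacts in cluster) / len(cluster)


def differing_contacts(cluster1, cluster2, threshold=0.3):
    """Return {contact: C_2 - C_1} for contacts with |C_2 - C_1| >= threshold."""
    all_contacts = set().union(*cluster1, *cluster2)
    diffs = {
        c: cluster_conservation(c, cluster2) - cluster_conservation(c, cluster1)
        for c in all_contacts
    }
    return {c: d for c, d in diffs.items() if abs(d) >= threshold}


# Invented clusters: each protein is a set of contacts between MSA positions.
cluster1 = [{(1, 5), (2, 9)}, {(1, 5), (2, 9)}, {(1, 5)}]
cluster2 = [{(1, 5), (7, 12)}, {(1, 5), (7, 12)}]

selected = differing_contacts(cluster1, cluster2)
for contact, diff in sorted(selected.items()):
    print(contact, round(diff, 2))
```

In this toy case the contact shared by every protein has a difference of zero and is dropped, while the two cluster-specific contacts are retained with negative (enriched in Cluster 1) and positive (enriched in Cluster 2) scores, respectively.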

While a detailed analysis of the differences in the contact networks of β-lactamases is beyond the scope of this manuscript, as an example, our comparison of Clusters 1 and 2 identified several well-conserved intramolecular hydrogen bonds towards the N-terminus of the Cluster 1 β-lactamases (circled on Figure 5b), resulting in an extended α-helix for the Cluster 1 β-lactamases.

We identified the phylum to which each β-lactamase belonged in order to compare contact-based hierarchical clustering against taxonomic classification (Figure S6). While there clearly was some level of cluster separation according to phylum (for example, 9/10 of the Bacillota were clustered to the far right of the dendrogram), phylum could not explain all of the separations (Figure S6). The most striking example of this is the most distinct cluster found in the dendrogram, which separates at a height of ~4 (Figure 5) and is made up of five β-lactamases. These five β-lactamases span three of the six phyla present throughout the whole dataset (Figure S7). We applied the same approach as described for Figure 5b to identify which contacts differed between the two groups, which by and large showed that the differing contacts are hydrogen bonds distributed throughout most of the protein (Figure S8).

Finally, we note that an alternative strategy could be to first construct a phylogenetic tree/dendrogram using sequence data (which would contain many more examples), but then interpret a branch point of interest in the phylogenetic tree using the approach we described above based on available crystal structures and/or protein structure prediction.

1.5 Applicability of KIN towards protein engineering

Many existing protein engineering tools such as dTERMen (Zhou et al., 2019), FuncLib (Khersonsky et al., 2018) and HotSpot Wizard (Sumbalova et al., 2018) make use of sequence-based information from homologues to help select candidate mutations. These tools hypothesize that mutating a residue to an amino acid that is found in homologous proteins at that same position is less likely to give rise to a “harmful” mutation (Frappier & Keating, 2021). The information generated from our conservation network analysis allows for an implementation of the same idea, and in the following section, we will demonstrate how this could be applied using TEM-1 as an example, as a potential first step in a protein engineering pipeline.

Using the crystal structure of TEM-1, we defined the distance of each interacting pair of residues from the active site and plotted this against the conservation score of the contact (Figure 6a). This enabled us to identify contacts distributed throughout TEM-1 that are both proximal to and distant from the active site, with varying levels of conservation. Figure 6a shows two examples of such contacting pairs in TEM-1, with one example involving two pairs of variable hydrophobic residues forming hydrophobic contacts within the core. While in TEM-1 the residues that provide this hydrophobic interaction are Leu and Ile, we can identify the other combinations of residue pairs that form this interaction, which would require a single or double mutation to occur in TEM-1. In comparison to the hydrophobic interaction described above, our second example shows a hydrogen bond/salt bridge interaction between the residues Arg and Gln in TEM-1. Interestingly, two of these alternate interactions show a rather substantial conformational difference in which one of the helices is now a loop, and the interaction formed is now between the side chain and the backbone instead of between the two side chains (Figure 6a).

The approach described above provides a candidate set of 365 single point and 982 double point mutations that could be considered for protein engineering purposes, as opposed to the ~5000 possible single and ~25,000,000 possible double mutations from the 263 residues of TEM-1. Given that these substitutions are observed in other related lactamases, they would be expected to be better tolerated. To evaluate the “usefulness” of this approach, we utilized data generated by Firnberg et al. (2014), in which protein fitness values for ampicillin resistance were determined for all single point mutants of the TEM-1 β-lactamase. We compared the fitness values obtained from our subset of single point mutations against all single point mutations, as well as against single point mutations to an amino acid of a similar type to the wild-type (WT) amino acid (Figure 6b). As expected, selecting only single point mutations similar to the WT amino acid gives rise to a notable improvement, with the mean fitness value changing from 0.51 to 0.57 (as compared to using all possible single point mutations; Figure 6b). By applying KIN and using only the crystal structure contact data, we obtained a substantial improvement, reaching a mean fitness value of 0.83 (Figure 6b). We also assessed whether including interactions from MD-based contact networks would change the results. Using a per-trajectory inclusion cutoff of 10%, we obtained 501 candidate single point mutations to test, as compared to the 365 from the X-ray structure alone. As depicted in Figure 6b, there was no increase in the mean fitness values from the inclusion of MD simulation data. It should be noted that the MD-based networks include more potential mutations (which could be filtered and/or combined in subsequent steps), so the denominator of the mean fitness value increases. We also performed a sensitivity analysis with regard to the per-trajectory cutoff score (Figure S9). The shape of the distributions and the mean values obtained were very similar for both “strict” and “relaxed” cutoffs, with the only notable (and expected) difference being that using a stricter cutoff reduces the number of possible mutations to try.
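As a rough check on the size of the unfiltered search space quoted above, the totals can be reproduced under the assumption of 19 alternative amino acids at each of the 263 positions (the text does not state how the totals were obtained, so this is only one plausible reading):

$$N_{\text{single}} = 263 \times 19 = 4997 \approx 5000$$

$$N_{\text{double}}^{\text{ordered}} = 263 \times 262 \times 19^{2} = 24{,}875{,}066 \approx 2.5 \times 10^{7}$$

$$N_{\text{double}}^{\text{unordered}} = \binom{263}{2} \times 19^{2} = 12{,}437{,}533$$

Under this assumption, the quoted ~25,000,000 corresponds to counting ordered pairs of positions, while counting each pair of positions once gives roughly half that number. In either case, the 982 candidate double mutations represent a reduction of more than four orders of magnitude.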

Finally, one might expect that filtering mutations to only include those within a certain distance from the active site of TEM-1 would increase the variance of the selected mutations' fitness values. We assessed the impact of filtering mutations using several different distance cutoffs between 5 and 25 Å of TEM-1's active site (Figure S10). While there was no notable difference in the distributions of fitness values across the different cutoffs (Figure S10), we caution that this is most probably a system-specific effect rather than a general rule.

2 DISCUSSION

Protein interaction networks are essential to facilitating protein function, and understanding their evolutionary conservation gives insight into how novel protein functions evolve (Jack et al., 2016). While there exist a number of tools that can calculate protein interaction networks, including from dynamical trajectories (del Conte et al., 2023; Huggins et al., 2018; Sladek et al., 2021), these typically do not additionally focus on evolutionary conservation, and tools that study evolutionary conservation typically focus on protein–protein interaction networks (Alhindi et al., 2017; Ali & Deane, 2020; Fraser et al., 2002; Levy & Pereira-Leal, 2008; Pawlowski et al., 2013; Schoenrock et al., 2017; Schüler & Bornberg-Bauer, 2011; Stumpf et al., 2007; Sun & Kim, 2011; Wagner, 2001; Zitnik et al., 2019) rather than interactions within an individual protein subunit. We note here, however, a recent related study that exploited MD-based correlation networks in combination with sequence conservation analysis for protein engineering, in order to design new variants of the tryptophan synthase complex using activity-enhancing distal mutations (Maria-Solano et al., 2021).

We present here a new tool, KIN, that is an open-source Python package that can construct conservation-based RINs for sets of evolutionarily related proteins from both static and dynamic data. Our dynamic approach (MD simulations) was able to identify many novel interactions not observed with the static contact analysis. The current analysis is based on 5 × 100 ns MD simulations per system, which, in the case of the current enzymes, proved adequate to provide valuable insight into their contact networks. However, the contacts found are of course modulated by the conformational space explored during the simulation(s), which is an important trade-off to consider when using this tool. That is, for more conformationally complex systems, additional replicates or longer simulations may identify more novel contacts, but at the expense of additional computational resources.
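To make the idea of a conservation-based network concrete, the sketch below scores each contact by the fraction of family members in which it appears, and shows how contacts seen only in simulation can be merged in. It is an illustration under explicit assumptions and does not reproduce KIN's actual interface: the function names, the 0.5 occupancy threshold, the protein labels, and the residue pairs (given in a shared alignment numbering) are all hypothetical.

```python
from collections import Counter


def contacts_from_occupancy(occupancy, min_occupancy=0.5):
    """Reduce per-contact MD occupancies (fraction of frames) to a contact set.

    The 0.5 default is an assumed threshold chosen for illustration.
    """
    return {pair for pair, frac in occupancy.items() if frac >= min_occupancy}


def conservation_scores(contact_sets):
    """Fraction of proteins in which each alignment-numbered contact occurs."""
    counts = Counter(pair for contacts in contact_sets.values() for pair in contacts)
    n_proteins = len(contact_sets)
    return {pair: count / n_proteins for pair, count in counts.items()}


# Hypothetical static contacts (e.g. from crystal structures) for three proteins.
static_contacts = {
    "protA": {(10, 42), (57, 88)},
    "protB": {(10, 42)},
    "protC": {(10, 42), (57, 88), (15, 120)},
}

# Hypothetical MD occupancies for protB: (57, 88) forms transiently in simulation.
protB_occupancy = {(10, 42): 0.9, (57, 88): 0.6, (15, 120): 0.1}

# Dynamic view: add contacts that pass the occupancy threshold to the static set.
dynamic_contacts = dict(static_contacts)
dynamic_contacts["protB"] = static_contacts["protB"] | contacts_from_occupancy(protB_occupancy)

static_scores = conservation_scores(static_contacts)
dynamic_scores = conservation_scores(dynamic_contacts)
print(static_scores)
print(dynamic_scores)
```

In this toy case the (57, 88) contact appears more conserved once the simulation data are included, which mirrors the point that dynamic data can reveal interactions absent from static structures, subject to how much conformational space the simulations sample.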

We showcase an application of KIN to class A β-lactamases and demonstrate its usefulness both as a tool to understand protein evolution more broadly and as part of the protein engineering toolkit. Our results suggest two possible uses for the KIN tool in protein engineering. The first is to identify sites where point mutagenesis is unfavorable. As shown in Figure 6b, mutation of a single partner in an evolutionarily conserved interaction tends to incur a substantial fitness penalty. These sites should either be spared or co-mutated so as to maintain the interaction. Second, conserved interactions that are missing from a particular template scaffold are likely good targets for mutational engineering. If these interactions can be reconstituted, we would predict protein stability and/or function to improve, particularly if the template is an ancestral scaffold that may lack some interactions that evolved later.
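The two engineering uses outlined above reduce to simple set operations once conservation scores are available. The following sketch assumes a hypothetical dictionary of per-contact conservation scores and a hypothetical set of contacts present in the template; the function name `engineering_candidates` and the 0.8 threshold are illustrative choices, not values or interfaces taken from this work.

```python
def engineering_candidates(scores, template_contacts, threshold=0.8):
    """Split conserved contacts into sites to protect and interactions to rebuild.

    scores: mapping of (residue_i, residue_j) -> conservation score in [0, 1].
    template_contacts: set of (residue_i, residue_j) present in the template.
    threshold: assumed minimum score for calling a contact "conserved".
    """
    conserved = {pair for pair, score in scores.items() if score >= threshold}
    # Use 1: residues in conserved contacts that the template already has.
    # Single point mutations here are expected to be costly.
    protected_sites = sorted({res for pair in conserved & template_contacts for res in pair})
    # Use 2: conserved contacts that the template lacks.
    # These are candidate interactions to reconstitute by mutation.
    missing_contacts = sorted(conserved - template_contacts)
    return protected_sites, missing_contacts


# Hypothetical inputs in a shared alignment numbering (illustrative values only).
scores = {(10, 42): 1.0, (57, 88): 0.9, (15, 120): 0.3}
template_contacts = {(10, 42), (15, 120)}

protected_sites, missing_contacts = engineering_candidates(scores, template_contacts)
print("Avoid single point mutations at:", protected_sites)
print("Conserved contacts missing from template:", missing_contacts)
```

Any residue returned in the first list would be either left untouched or co-mutated with its partner, while pairs in the second list would be starting points for design, with the threshold tuned to the family under study.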

ACKNOWLEDGMENTS

This study was supported by the Swedish Research Council (Grant No. 2019-03499 to Shina C. L. Kamerlin) and the National Institutes of Health (GM138444 to Peter M. Kasson). Computational resources were provided by the Swedish National Infrastructure for Computing (Grant Nos. 2019/2-1, 2019/3-258, and 2020/5-250), and simulations were performed on the BerzeLiUs and Tetralith clusters at NSC Linköping.

Supporting Information

REFERENCES

Alhindi T, Zhang Z, Ruelens P, Coenen H, Degroote H, Iraci N, et al. Protein interaction evolution from promiscuity to specificity with reduced flexibility in an increasingly complex network. Sci Rep. 2017; 7:44948.
10.1038/srep44948
Ali W, Deane CM. Evolutionary analysis reveals low coverage as the major challenge for protein interaction networks. Mol BioSys. 2020; 6: 2296–2304.
10.1039/c004430j
Amaro RE, Sethi A, Myers RS, Davisson VJ, Luthey-Schulten ZA. A network of conserved interactions regulates the allosteric signal in a glutamine amidotransferase. Biochemistry. 2007; 46: 2156–2173.
10.1021/bi061708e
Anishchenko I, Ovchinnikov S, Kamisetty H, Baker D. Origins of coevolution between residues distant in protein 3D structures. Proc Natl Acad Sci U S A. 2017; 114: 9122–9127.
10.1073/pnas.1702664114
Atilgan AR, Akan P, Baysal C. Small-world communication of residues and significance for protein dynamics. Biophys J. 2004; 86: 85–91.
10.1016/S0006-3495(04)74086-2
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank. Nucleic Acids Res. 2000; 28: 235–242.
10.1093/nar/28.1.235
Bradford PA. Extended-spectrum β-lactamases in the 21st century: characterization, epidemiology, and detection of this important resistance threat. Clin Microbiol Rev. 2001; 14: 933–951.
10.1128/CMR.14.4.933-951.2001
Brown DK, Penkler DL, Amamuddy OS, Ross C, Atilgan AR, Atilgan C, et al. MD-TASK: a software suite for analyzing molecular dynamics trajectories. Bioinformatics. 2017; 33: 2768–2771.
10.1093/bioinformatics/btx349
Brown NG, Shanker S, Prasad BVV, Palzkill T. Structural and biochemical evidence that a TEM-1 β-lactamase N170G active site mutant acts via substrate-assisted catalysis. J Biol Chem. 2009; 284: 33703–33712.
10.1074/jbc.M109.053819
Bush K, Bradford PA. β-Lactams and β-lactamase inhibitors: an overview. Cold Spring Harb Perspect Med. 2016; 6:a025247.
10.1101/cshperspect.a025247
Clementel D, Del Conte A, Monzon AM, Camagni GF, Minervini G, Piovesan D, et al. RING 3.0: fast generation of probabilistic residue interaction networks from structural ensembles. Nucleic Acids Res. 2022; 50: W651–W656.
10.1093/nar/gkac365
Crean RM, Slusky JSG, Kasson PM, Kamerlin SCL. KIF—key interactions finder: a program to identify the key molecular interactions that regulate protein conformational changes. J Chem Phys. 2023; 158:144114.
10.1063/5.0140882
Cs M, Vadlamani G, Holicek V, Chu M, Larmour VLC, Vocadlo DJ, et al. Molecular basis for the potent inhibition of the emerging Carbapenemase VCC-1 by avibactam. Antimicrob Agents Chemother. 2019; 63: e02112–e02118.
Das S, Dawson NL, Orengo CA. Diversity in protein domain superfamilies. Curr Opin Genet Dev. 2015; 35: 40–49.
10.1016/j.gde.2015.09.005
del Conte A, Miguel Monzon A, Clementel D, Camagni GF, Minervini G, Tosatto SCE, et al. RING-PyMOL: residue interaction networks of structural ensembles and molecular dynamics. Bioinformatics. 2023; 39: btad260.
10.1093/bioinformatics/btad260
Felline A, Seeber M, Fanelli F. PSNtools for standalone and web-based structure network analyses of Conformational ensembles. Comput Struct Biotechnol J. 2022; 20: 640–649.
10.1016/j.csbj.2021.12.044
Firnberg E, Labonte JW, Gray JJ, Ostermeier M. A comprehensive, high-resolution map of a gene's fitness landscape. Mol Biol Evol. 2014; 31: 1581–1592.
10.1093/molbev/msu081
Frappier V, Keating AE. Data-driven computational protein design. Curr Opin Struct Biol. 2021; 69: 63–69.
10.1016/j.sbi.2021.03.009
Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW. Evolutionary rate in the protein interaction network. Science. 2002; 296: 750–752.
10.1126/science.1068696
Green AG, Elhabashy H, Brock KP, Maddamsetti R, Kohlbacher O, Marks DS. Large-scale discovery of protein interactions at residue resolution using co-evolution calculated from genomic sequences. Nat Commun. 2021; 12: 1396.
10.1038/s41467-021-21636-z
Hopf TA, Green AG, Schubert B, Mersmann S, Schärfe CPI, Ingraham JB, et al. The EVcouplings python framework for coevolutionary sequence analysis. Bioinformatics. 2019; 35: 1582–1584.
10.1093/bioinformatics/bty862
Hu Z, Bowen D, Southerland WM, del Sol A, Pan Y, Nussinov R, et al. Ligand binding and circular permutation modify residue interaction network in DHFR. PLoS Comput Biol. 2007; 3:e117.
10.1371/journal.pcbi.0030117
Huggins DJ, Biggin PC, Dämgen MA, Essex JW, Harris SA, Henchman RH, et al. Biomolecular simulations: from dynamics and mechanisms to computational assays of biological activity. WIRES Comp Mol Sci. 2018; 9:e1393.
10.1002/wcms.1393
Jack BR, Meyer AG, Echave J, Wilke CO. Functional sites induce long-range evolutionary constraints in enzymes. PLoS Biol. 2016; 14:e1002452.
10.1371/journal.pbio.1002452
Kamisetty H, Ovchinnikov S, Baker D. Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era. Proc Natl Acad Sci USA. 2013; 110: 15674–15679.
10.1073/pnas.1314045110
Jelsch C, Mourey L, Masson J-M, Samama J-P. Crystal structure of Escherichia coli TEM1 β-lactamase at 1.8 Å resolution. Proteins. 1993; 16: 364–383.
10.1002/prot.340160406
Ji Z, Boxer SG. β-Lactamases evolve against antibiotics by acquiring large active-site electric fields. J Am Chem Soc. 2022; 144: 22289–22294.
10.1021/jacs.2c10791
Jones DT, Buchan DWA, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2012; 28: 184–190.
10.1093/bioinformatics/btr638
Kalp M, Buynak JD, Carey PR. Role of E166 in the imine to enamine tautomerization of the clinical β-lactamase inhibitor sulbactam. Biochemistry. 2009; 48: 10196–10198.
10.1021/bi901416t
Khersonsky O, Lipsh R, Avizemer Z, Ashani Y, Goldsmith M, Leader H, et al. Automated design of efficient and functionally diverse enzyme repertoires. Mol Cell. 2018; 72: 178–186.e175.
10.1016/j.molcel.2018.08.033
La Sala G, Pfleger C, Käck H, Wissler L, Nevin P, Böhm K, et al. Combining structural and coevolution information to unveil allosteric sites. Chem Sci. 2023; 14: 7057–7067.
10.1039/D2SC06272K
Lee B-C, Park K, Kim D. Analysis of the residue-residue coevolution network and functionally important residues in proteins. Prot Struct Func Bioinformat. 2008; 72: 863–872.
10.1002/prot.21972
Levy ED, Pereira-Leal JB. Evolution and dynamics of protein interactions and networks. Curr Opin Struct Biol. 2008; 18: 349–357.
10.1016/j.sbi.2008.03.003
Livermore DM. β-Lactamases in laboratory and clinical resistance. Clin Microbiol Rev. 1995; 8: 557–584.
10.1128/CMR.8.4.557
Maria-Solano MA, Kinateder T, Iglesias-Fernández J, Sterner R, Osuna S. In silico identification and experimental validation of distal activity-enhancing mutations in tryptophan synthase. ACS Catal. 2021; 11: 13733–13743.
10.1021/acscatal.1c03950
McCormick JW, Russo MAX, Thompson S, Blevins A, Reynolds KA. Structurally distributed surface sites tune allosteric regulation. Elife. 2021; 10:e68346.
10.7554/eLife.68346
McKinney W. Data structures for statistical computing in Python. Proc Python Sci Conf. 2010; 56–61.
10.25080/Majora-92bf1922-00a
Meroueh SO, Fisher JF, Schlegel HB, Mobashery S. Ab initio QM/MM study of class A β-lactamase acylation: dual participation of Glu166 and Lys73 in a concerted base promotion of Ser70. J Am Chem Soc. 2005; 44: 15397–15407.
10.1021/ja051592u
Minasov G, Wang X, Shoichet BK. An ultrahigh resolution structure of TEM-1 β-lactamase suggests a role for Glu166 as the General Base in acylation. J Am Chem Soc. 2002; 19: 5333–5340.
10.1021/ja0259640
Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci U S A. 2011; 108: E1301–E1392.
10.1073/pnas.1111471108
Ovchinnikov S, Kamisetty H, Baker D. Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information. Elife. 2014; 1:e02030.
Padayatti PS, Helfand MS, Ma T, Carey MP, Hujer AM, Carey PR, et al. Tazobactam forms a stoichiometric trans-enamine intermediate in the E166A variant of SHV-1 β-lactamase: 1.63 Å crystal structure. Biochemistry. 2004; 43: 843–848.
10.1021/bi035985m
Pawlowski PH, Kaczanowski S, Zielenkiewicz P. A kinetic model of the evolution of a protein interaction network. BMC Genomics. 2013; 14: 172.
10.1186/1471-2164-14-172
Philippon A, Slama P, Dény P, Labia R. A structure-based classification of class A β-lactamases, a broadly diverse family of enzymes. Clin Microbiol Rev. 2015; 29: 29–57.
10.1128/CMR.00019-15
Reynolds KA, McLaughlin RN, Ranganathan R. Hot spots for allosteric regulation on protein surfaces. Cell. 2011; 147: 1564–1575.
10.1016/j.cell.2011.10.049
Šali A, Blundell TL. Comparative protein modeling by satisfaction of spatial restraints. J Mol Biol. 1993; 234: 779–815.
10.1006/jmbi.1993.1626
Sauvage E, Fonzé E, Quinting B, Galleni M, Frère J-M, Charlier P. Crystal structure of the Mycobacterium fortuitum class A β-lactamase: structural basis for broad substrate specificity. Antimicrob Agents Chemother. 2006; 50: 2516–2521.
10.1128/AAC.01226-05
Scheurer M, Rodenkirch P, Siggel M, Bernardi RC, Schulten K, Tajkhorshid E, et al. PyContact: rapid, customizable, and visual analysis of noncovalent interactions in MD simulations. Biophys J. 2018; 114: 577–583.
10.1016/j.bpj.2017.12.003
Schoenrock A, Burnside D, Moteshareie H, Pitre S, Hooshyar M, Green JR, et al. Evolution of protein-protein interaction networks in yeast. PLoS One. 2017; 12:e0171920.
10.1371/journal.pone.0171920
Schüler A, Bornberg-Bauer E. The evolution of protein interaction networks. Methods Mol Biol. 2011; 696: 273–289.
10.1007/978-1-60761-987-1_17
Seeber M, Felline A, Raimondi F, Muff S, Friedman R, Rao F, et al. Wordom: a user-friendly program for the analysis of molecular structures, trajectories, and free energy surfaces. J Comput Chem. 2011; 32: 1183–1194.
10.1002/jcc.21688
Sladek V, Yamamoto Y, Harada R, Shoji M, Shigeta Y, Sladek V. pyProGA-A PyMOL plugin for protein residue network analysis. PLoS One. 2021; 30:e0255167.
Stumpf MPH, Kelly WP, Thorne T, Wiuf C. Evolution at the system level: the natural history of protein interaction networks. Trends Ecol Evol. 2007; 22: 366–373.
10.1016/j.tree.2007.04.004
Sumbalova L, Stourac J, Martinek T, Bedner D, Damborsky J. HotSpot wizard 3.0: web server for automated design of mutations and smart libraries based on sequence input information. Nucleic Acids Res. 2018; 46: W356–W362.
10.1093/nar/gky417
Sun MGF, Kim PM. Evolution of biological interaction networks: from models to real data. Genome Biol. 2011; 12: 235.
10.1186/gb-2011-12-12-235
The Pandas Development Team. pandas-dev/pandas: Pandas. Zenodo. 2020.
Tooke CL, Hinchcliffe P, Lang PA, Mulholland AJ, Brem J, Schofield CJ. Molecular basis of class A β-lactamase inhibition by relebactam. Antimicrob Agents Chemother. 2019a; 63:e00564-00519.
10.1128/AAC.00564-19
Tooke CL, Hinchliffe P, Bragginton EC, Colenso CK, Hirvonen VHA, Takebayashi Y, et al. β-Lactamases and β-lactamase inhibitors in the 21st century. J Mol Biol. 2019b; 431: 3472–3500.
10.1016/j.jmb.2019.04.002
Tse A, Verkhivker GM. Molecular determinants underlying binding specificities of the ABL kinase inhibitors: combining alanine scanning of binding hot spots with network analysis of residue interactions and coevolution. PLoS One. 2015; 10:e130203.
10.1371/journal.pone.0130203
Wagner A. The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol Biol Evol. 2001; 18: 1283–1292.
10.1093/oxfordjournals.molbev.a003913
Zhang X, Perica T, Teichmann SA. Evolution of protein structures and interactions from the perspective of residue contact networks. Curr Opin Struct Biol. 2013; 23: 954–963.
10.1016/j.sbi.2013.07.004
Zhou J, Panaitiu AE, Grigoryan G. A general-purpose protein design framework based on mining sequence-structure relationships in known protein structures. Proc Natl Acad Sci U S A. 2019; 117: 1059–1068.
10.1073/pnas.1908723117
Zitnik M, Sosič R, Feldman MW, Leskovec J. Evolution of resilience in protein Interactomes across the tree of life. Proc Natl Acad Sci U S A. 2019; 116: 4426–4433.
10.1073/pnas.1818013116

Volume 33, Issue 3

March 2024

e4911

This article also appears in:

Tools for Protein Science 2024
