RESEARCH ARTICLE

Systematic enhancement of protein crystallization efficiency by bulk lysine-to-arginine (KR) substitution

Nooriel E. Banayan

orcid.org/0000-0001-9064-4923

Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA

Contribution: Software, Investigation, Formal analysis, Writing - original draft, Visualization, Writing - review & editing, Validation, Data curation, Methodology

Blaine J. Loughlin

Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA

Contribution: Investigation

Shikha Singh

Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA

Contribution: Methodology, Investigation

Farhad Forouhar

Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA

Contribution: Investigation

Guanqi Lu

Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA

Contribution: Software

Kam-Ho Wong

Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA

Contribution: Investigation

Matthew Neky

Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA

Contribution: Investigation

Henry S. Hunt

Department of Physics, Stanford University, Stanford, California, USA

Contribution: Software

Larry B. Bateman Jr

Accendero Software, Idaho Falls, Idaho, USA

Contribution: Software

Angel Tamez

Accendero Software, Idaho Falls, Idaho, USA

Contribution: Software

Samuel K. Handelman

Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA

Contribution: Methodology

W. Nicholson Price

Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA

Contribution: Methodology

John F. Hunt (Corresponding Author)

[email protected]

orcid.org/0000-0003-0034-5167

Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, New York, USA

Correspondence

John F. Hunt, Department of Biological Sciences, 702A Sherman Fairchild Center, MC2434, Columbia University, New York, NY 10027, USA.

Email: [email protected]

Contribution: Conceptualization, Software, Formal analysis, Funding acquisition, Project administration, Writing - original draft, Methodology, Data curation, Validation, Supervision, Visualization, Resources, Writing - review & editing


First published: 15 February 2024

https://doi.org/10.1002/pro.4898

Nooriel E. Banayan and Blaine J. Loughlin contributed equally to the work reported in this paper.

Review Editor: Jeanine Amacher

Abstract

Structural genomics consortia established that protein crystallization is the primary obstacle to structure determination using x-ray crystallography. We previously demonstrated that crystallization propensity is systematically related to primary sequence, and we subsequently performed computational analyses showing that arginine is the most overrepresented amino acid in crystal-packing interfaces in the Protein Data Bank. Given the similar physicochemical characteristics of arginine and lysine, we hypothesized that multiple lysine-to-arginine (KR) substitutions should improve crystallization. To test this hypothesis, we developed software that ranks lysine sites in a target protein based on the redundancy-corrected KR substitution frequency in homologs. This software can be run interactively on the worldwide web at https://www.pxengineering.org/. We demonstrate that three unrelated single-domain proteins can tolerate 5–11 KR substitutions with at most minor destabilization, and, for two of these three proteins, the construct with the largest number of KR substitutions exhibits significantly enhanced crystallization propensity. This approach rapidly produced a 1.9 Å crystal structure of a human protein domain refractory to crystallization with its native sequence. Structures from Bulk KR-substituted domains show that the engineered arginine residues frequently make hydrogen bonds across crystal-packing interfaces. We thus demonstrate that Bulk KR substitution represents a rational and efficient method for probabilistic engineering of protein surface properties to improve crystallization.

1 INTRODUCTION

More than 50 years after the solution of the first protein crystal structure (Kendrew, 1959; Kendrew et al., 1958; Kendrew & Perutz, 1948), protein crystallization remains a hit-or-miss proposition. Synergistic developments in crystallographic methods (Hendrickson et al., 1990; Liebschner et al., 2019; Liu & Hendrickson, 2017; Otwinowski & Minor, 1997; Sheldrick, 2010; Terwilliger, 2001), synchrotron beamlines (Grimes et al., 2018; Hendrickson, 2000; Sanishvili & Fischetti, 2017; Wilson, 2022), and high-speed computing have made structure solution and refinement routine, even for massive complexes, but only if high-quality crystals are available. However, there has been comparatively little progress in improving methods for protein crystallization. Structural genomics consortia systematically confirmed that most naturally occurring proteins do not readily yield high-quality crystals suitable for x-ray structure determination and that crystallization is the major obstacle to the determination of protein structures using diffraction methods (Canaves et al., 2004; Price 2nd et al., 2009; Slabinski et al., 2007). While numerous methods have been developed that have some efficacy in improving protein crystallization properties (Anstrom et al., 2005; Cieslik & Derewenda, 2009; Cooper et al., 2007; Czepas et al., 2004; Derewenda, 2004a; Derewenda, 2004b; Derewenda & Godzik, 2017; Derewenda & Vekilov, 2006; Janda et al., 2004; Longenecker et al., 2001; Mateja et al., 2002; Qiu & Janson, 2004), none work with sufficiently high efficiency to have been applied with significant frequency by practicing crystallographers. 
While the development of AlphaFold has provided a de facto solution to the protein-folding problem for many sequence families (Jumper et al., 2021; Jumper & Hassabis, 2022; Senior et al., 2020), its occasional failure, limited stereochemical accuracy, and inability to date to model ligand complexes mean that protein crystallography remains widely practiced (Chowdhury et al., 2022; Hendrickson, 2023; Oeffner et al., 2022; Terwilliger et al., 2022; Terwilliger et al., 2023), especially for structure-based drug discovery projects (Bijak et al., 2023). We therefore set out to develop more efficient methods for rational engineering of protein surface properties to improve crystallization propensity.

The first phase of our research identified a large number of local primary sequence patterns, which we called crystallization epitopes, that are strongly overrepresented in crystal-packing interfaces (Naumov et al., 2019). We demonstrated that introducing these epitopes individually into proteins generally increases their crystallization propensity and that introducing multiple such epitopes progressively increases crystallization propensity. The cumulative nature of the observed improvements suggested that multiple simultaneous mutations could potentially produce definitive improvements in crystallization propensity in a single protein construct based on large-scale probabilistic engineering of protein surface properties. We herein present an efficient method to achieve this goal while preserving protein stability and solubility.

Our efforts to develop rational methods to improve protein crystallization properties are grounded in sequence and structural analyses of historical crystallization results and associated thermodynamic studies. Our published analyses of large-scale experimental studies showed that several surface properties of proteins are significant determinants of protein crystallization propensity (Price 2nd et al., 2009). These studies demonstrated that overall thermodynamic stability is not a major determinant of protein crystallization propensity. They identified a number of primary sequence properties that correlate with successful crystal structure determination, including significant anticorrelations with predicted backbone disorder and the sidechain entropy of predicted solvent-exposed residues as well as significant positive correlations with the fractional content of several individual amino acids (Price 2nd et al., 2009). In follow-up studies, we analyzed 87,684 crystal structures from the Protein Data Bank (PDB) and identified contiguous amino acid patterns strongly overrepresented in crystal-packing interfaces (Naumov et al., 2019). This analysis also generated data on the relative overrepresentation of individual amino acids in crystal-packing interfaces segregated by protein secondary structure (Figure 1), and these data suggested the streamlined approach reported in this paper that enhances protein crystallization propensity based on multiple simultaneous surface mutations.

FIGURE 1. Overrepresentation ratios (Naumov et al., 2019) of amino acids in crystal-packing interfaces normalized to overall surface composition in 87,684 crystal structures deposited in the Protein Data Bank (PDB). The thick dotted line schematizes the gain in packing probability produced by K-to-R mutations, while the thin dotted lines schematize the changes from N-to-Q and D-to-E mutations. The overrepresentation ratios are segregated by protein 2° structure as assessed by DSSP (Kabsch & Sander, 1983), and the amino acids are ordered in decreasing order of overrepresentation ratio in α-helical secondary structure.

Our computational analysis of crystal-packing interactions in the PDB showed a substantially higher probability for arginine to mediate inter-molecular packing contacts than lysine (Figure 1), consistent with our expectations based on earlier analyses of correlations between primary sequence features and protein crystallization propensity (Price 2nd et al., 2009). The observation that arginine mediates crystal-packing contacts more frequently than lysine is particularly notable because the entropy of the arginine sidechain is estimated to be somewhat higher than that of lysine (Bhowmick & Head-Gordon, 2015; DuBay & Geissler, 2009; Srinivasan & Rose, 1999; Sternberg & Chickos, 1994), which implies its immobilization in an intermolecular interface should tend to incur a higher entropic penalty that reduces its probability of making crystal-packing contacts (Cooper et al., 2007; Czepas et al., 2004; Derewenda, 2004b; Janda et al., 2004; Price 2nd et al., 2009). Therefore, the more frequent occurrence of arginine compared to lysine in crystal-packing contacts suggests that the guanidino group on arginine is substantially “stickier” than the primary amine on lysine in terms of intermolecular interaction free energy.

This inference concerning the comparative interaction potential of arginine vs. lysine is supported by research in physical biochemistry (Auton et al., 2007; Auton et al., 2011; Auton & Bolen, 2007; Beck et al., 2007; Bennion & Daggett, 2003; Bolen & Rose, 2008; Ferreon & Bolen, 2004; Holthauzen et al., 2010; Hu et al., 2010; Scott et al., 2008; Vener et al., 2015; Wetlaufer & Lovrien, 1964; Yang et al., 2000). One straightforward contributor is likely to be the greater hydrogen-bonding (H-bonding) potential of the guanidino group in arginine compared to the primary amine in lysine (i.e., the presence of five H-bond donor protons compared to three). The significance of this factor is supported by molecular dynamics simulations showing multiple stable interaction geometries for salt bridges containing arginine (Liu et al., 2022; Vener et al., 2015). Arginine is much more effective than lysine in inhibiting protein aggregation, which is believed to reflect strong solvation interactions between arginine and protein surfaces (Arakawa et al., 2007; Arakawa & Tsumoto, 2003; Tischer et al., 2010; Tischer et al., 2014; Tsumoto et al., 2005), including at apolar sites (Das et al., 2007; Mitchell et al., 1994). The guanidinium ion, a close analog of the guanidino group in arginine, shows thermodynamically significant interactions with some proteins in their native conformations (Courtenay et al., 2000; Courtenay et al., 2001; Ferreon & Bolen, 2004; Makhatadze & Privalov, 1992; Zarrine-Afsar et al., 2006), and it also enhances the solvation (Venkatesu et al., 2007) and solubility (Wetlaufer et al., 1964) of apolar groups. The greater interaction potential of arginine is further supported by the well-known properties of the guanidinium ion as a potent protein denaturant (Auton et al., 2011; Bolen & Rose, 2008; Pace et al., 2004; Schellman, 1987), a property that is not shared by primary amines. 
The thermodynamics of the denaturation process involve an interplay between enthalpically favorable solvation interactions with protein groups (Courtenay et al., 2000; Courtenay et al., 2001; Ferreon & Bolen, 2004; Liu & Bolen, 1995; Nandi & Robinson, 1984; Robinson & Jencks, 1963; Robinson & Jencks, 1965; Schrier & Schrier, 1976; Scott et al., 2008; Venkatesu et al., 2007; Zheng et al., 2016), perturbations in water structure (Scott et al., 2008) that weaken the hydrophobic effect (Scott et al., 2008; Wetlaufer et al., 1964), and competition with polar groups on the protein for H-bonding to water. Experimental measurements of the free energy of transfer (Nozaki & Tanford, 1970) of model peptides (Courtenay et al., 2000; Courtenay et al., 2001; Liu & Bolen, 1995; Nandi & Robinson, 1984; Robinson & Jencks, 1963; Robinson & Jencks, 1965; Venkatesu et al., 2007) support strong solvation of the backbone being a dominant contributor to guanidinium-induced protein denaturation (Auton et al., 2011; Ferreon & Bolen, 2004; Liu & Bolen, 1995; Schrier & Schrier, 1976). NMR measurements of backbone amide exchange rates show that guanidinium does not H-bond to the backbone (Lim et al., 2009), while quantum chemical calculations support its favorable solvation of the backbone deriving from strong enthalpic interactions with the Cβ atom of the amino acids (Scott et al., 2008). Finally, molecular dynamics simulations show stabilizing interactions increasing the duration of non-covalent associations of guanidinium with both the backbone and many sidechains (Zheng et al., 2016).

Based on the multifaceted evidence that arginine mediates stronger intermolecular interactions with protein groups than lysine, we hypothesized that introducing multiple lysine-to-arginine (KR) substitutions in a protein would enhance crystallization propensity. We furthermore hypothesized that, given the very similar physicochemical properties of arginine and lysine in terms of size and polarity, multiple simultaneous substitutions would be tolerated without significantly impairing thermodynamic stability.

We herein report the results of biophysical studies that support the validity of these hypotheses. We developed a computer program that automates the selection of sites for KR mutagenesis based on the frequency of such substitutions in naturally occurring homologs, which should tend to avoid sites where lysine is critical for function or structural stability. We furthermore characterized the effects of introducing multiple simultaneous KR mutations on the thermodynamic stability, solubility, and crystallization propensity of three unrelated test proteins, one of which crystallizes readily and two of which are recalcitrant to crystallization with their native sequences. These studies demonstrate that introducing multiple KR mutations into a protein, which we call Bulk KR substitution, is a simple and effective method to improve crystallization propensity. Physicochemical analyses have thus guided the development of an efficient method for large-scale probabilistic engineering of protein surface properties to improve crystallization, which was historically considered a stochastic phenomenon refractory to rational experimental manipulation.

2 RESULTS

2.1 KR mutation site-selection algorithm and software

Sites for Bulk KR substitution are ranked and selected based on the frequency of these substitutions in naturally evolved sequences in a phylogenetic alignment (Figure 2). The procedure is fully automated in Python code that is available for download and that can also be run interactively via the worldwide web using our protein crystallization engineering webserver (see Section 4 for details). The algorithm implemented by the program ranks sites based on a redundancy-compensated estimate (explained below) of the frequency of KR substitutions in homologous sequences, which are divided into mutually exclusive bins with progressively lower levels of overall percent identity relative to the target sequence. The first bin includes sequences with less than 99% identity (to avoid mutant variants of the target sequence) and greater than or equal to 90% identity. The next bin includes sequences with less than 90% identity and greater than or equal to 80% identity, while subsequent mutually exclusive bins reduce the range of identity levels in 10% steps down to a minimum of 30%. The algorithm steps through these bins in order from highest to lowest percent identity. In each bin, it first selects the site with the highest estimated number of independent arginine substitutions among up to the seven most remotely related homologs in that bin (Figure 2d). It then selects additional sites that show KR substitutions in the same percent identity bin in decreasing order of their estimated number of independent arginine substitutions among the same set of most remote homologs, stopping when it hits a user-adjustable minimum count.
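The binning rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names `identity_bin` and `bins_high_to_low` are hypothetical, and the sketch assumes that percent identity to the target has already been computed for each homolog.

```python
from typing import List, Optional, Tuple

Bin = Tuple[int, int]  # (inclusive lower bound, exclusive upper bound), in percent identity


def identity_bin(percent_identity: float) -> Optional[Bin]:
    """Assign a homolog to its percent-identity bin (hypothetical helper).

    Sequences with >= 99% identity (possible mutant variants of the target)
    and sequences with < 30% identity are excluded and return None.
    """
    if percent_identity >= 99 or percent_identity < 30:
        return None
    if percent_identity >= 90:
        return (90, 99)
    lower = int(percent_identity // 10) * 10
    return (lower, lower + 10)


def bins_high_to_low() -> List[Bin]:
    """List the bins in the order they are visited: highest identity first."""
    return [(90, 99)] + [(lo, lo + 10) for lo in range(80, 20, -10)]


if __name__ == "__main__":
    for pid in (99.5, 95.0, 89.9, 42.0, 29.9):
        print(pid, identity_bin(pid))
    print(bins_high_to_low())
```

Representing each bin as a half-open interval keeps the bins mutually exclusive, so a homolog at exactly 90% identity falls in the top bin and never in the 80 to 90% bin.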

This threshold count is imposed to avoid selecting a site based on an arginine substitution in a single sequence that could potentially be inaccurate or present only in a small number of very closely related sequences, in which case they could potentially share a function-impairing or stability-impairing mutation. The threshold count defaults to a value of 1.1, which ensures observation of a KR substitution in at least two sequences with no more than ~93% identity to one another. Given the details of our heuristic sequence-divergence metric described in Section 4.3, every 0.1 increase in the threshold count is equivalent to either requiring ~7% lower identity in two sequences having the same substitution or having an additional pair of homologs at the same divergence level with the same substitution.
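The two figures quoted above (a default threshold of 1.1 and ~7% identity per 0.1 of count) can be tied together with a short worked example. The linear form below is only one reading that is consistent with those two numbers; the actual divergence metric is defined in Section 4.3 and may differ. The function names and the 70% saturation value are assumptions introduced for illustration.

```python
def pair_count(percent_identity: float, full_divergence: float = 70.0) -> float:
    """Illustrative redundancy-compensated count for ONE pair of sequences
    that both carry the KR substitution.

    Assumption (not taken from the paper's Section 4.3): the count rises
    linearly from 1.0 for identical sequences by 0.1 per 7% of divergence,
    and is capped at 2.0 (two fully independent observations).
    """
    divergence = 100.0 - percent_identity
    divergence = min(max(divergence, 0.0), full_divergence)
    return 1.0 + divergence / full_divergence


def max_identity_for_threshold(threshold: float, full_divergence: float = 70.0) -> float:
    """Highest pairwise identity at which a single pair still reaches the threshold."""
    return 100.0 - (threshold - 1.0) * full_divergence


if __name__ == "__main__":
    print(round(pair_count(93.0), 3))                  # two sequences at 93% identity
    print(round(max_identity_for_threshold(1.1), 1))   # default threshold
    print(round(max_identity_for_threshold(1.2), 1))   # stricter threshold
```

Under this reading, raising the threshold from 1.1 to 1.2 lowers the maximum allowed identity between the two supporting sequences from about 93% to about 86%.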

After hitting the specified threshold or selecting all sites showing KR substitutions above the threshold count in the bin being evaluated, the algorithm progresses to the next lower percent identity bin and implements the same site-selection protocol in that bin. This selection algorithm continues until the same site-selection protocol is executed in the final mutually exclusive bin, which includes sequences with less than 40% identity and greater than or equal to 30% identity to the target protein. The algorithm thus provides a rank-order for mutation of all lysine sites in the target protein at which mutations to arginine are observed above the threshold count in any of the evaluated percent identity bins. The program outputs the complete list of sites selected by the algorithm ranked according to their order of selection (e.g., as shown in Figure 2e).
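The bin-by-bin selection loop can be sketched as follows. This is a simplified illustration and not the published implementation: it assumes the redundancy-compensated count for each lysine site in each bin has already been computed, it treats a count equal to the threshold as passing, and it breaks ties by residue number. The function name `rank_kr_sites` and the example counts are hypothetical.

```python
from typing import Dict, List, Tuple

Bin = Tuple[int, int]  # (lower, upper) percent-identity bounds


def rank_kr_sites(counts: Dict[Bin, Dict[int, float]], threshold: float = 1.1) -> List[int]:
    """Rank lysine sites for KR mutation.

    counts[bin][site] is the estimated number of independent arginine
    substitutions at that lysine site among the most remote homologs in the bin.
    Bins are visited from highest to lowest identity; within a bin, sites are
    taken in decreasing order of count until the count drops below threshold.
    """
    ranked: List[int] = []
    seen = set()
    for bin_key in sorted(counts, key=lambda b: b[0], reverse=True):
        # Highest count first; ties broken by residue number (an assumption).
        ordered = sorted(counts[bin_key].items(), key=lambda kv: (-kv[1], kv[0]))
        for site, count in ordered:
            if count < threshold:
                break  # every remaining site in this bin is below threshold
            if site not in seen:  # keep the rank from the first bin that selected it
                seen.add(site)
                ranked.append(site)
    return ranked


if __name__ == "__main__":
    # Made-up counts for illustration only.
    example = {
        (90, 99): {42: 1.3, 114: 1.0},
        (80, 90): {42: 2.0, 114: 1.6, 69: 1.2},
        (30, 40): {31: 0.9},
    }
    print(rank_kr_sites(example))
```

In this example, site 114 falls below the threshold in the top bin but is picked up in the 80 to 90% bin, so it ranks after site 42 and ahead of site 69.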

The software provides graphical displays of summary parameters characterizing the amino acid distribution in the homologs in each of the percent-identity bins at every lysine site in the target sequence (Figure 2), as well as a graphical display of the overall sequence diversity in each of the bins (Figure S1). The displayed summary parameters are the Shannon entropy of the amino-acid frequency distribution, the frequency of all residues other than lysine, the KR ratio, the total count of sequences with an arginine residue at the site, and two different estimates of that count after compensation for redundancy between those sequences. Both redundancy-compensation calculations use the same heuristic estimate of the degree of mutational resampling between pairs of sequences, as described in Section 4.3, which provides explanations of the details of the two algorithms. In brief, the first redundancy-reduced count evaluates all sequences using a calculation that has rigorously correct behavior in the cases of full redundancy and full independence between the sequences but is otherwise approximate. The second count provides a rigorous probabilistic estimate of the number of independent KR substitutions between the seven most remotely related sequence pairs that have arginine at that site in the percent identity bin being evaluated. Extending this calculation to more sequences is computationally prohibitive, but the estimate based on a limited set of the most diverged homologs provides a highly effective method to ensure that multiple independently determined protein sequences have an arginine residue at the lysine site in the target protein, which is the essential goal of the redundancy-compensation calculations. This second calculation is used for the automated site-ranking algorithm described above.
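Several of the per-site summary parameters listed above are simple functions of one alignment column. The sketch below computes them for the residues observed at a single lysine site within one percent-identity bin. It is illustrative only: the entropy is reported in bits (the logarithm base is an assumption), the "KR ratio" is taken here to mean the number of arginines divided by the number of lysines (also an assumption), and the redundancy-compensated counts are omitted because they depend on the metric in Section 4.3. The function name `column_summary` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def column_summary(residues: str) -> Dict[str, float]:
    """Summarize the amino acids observed at one target lysine site.

    residues: one-letter codes, one per homolog in the bin (gaps removed beforehand).
    """
    if not residues:
        raise ValueError("column must contain at least one residue")
    counts = Counter(residues)
    total = len(residues)
    # Shannon entropy of the amino-acid frequency distribution, in bits.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    n_lys = counts.get("K", 0)
    n_arg = counts.get("R", 0)
    return {
        "shannon_entropy_bits": entropy,
        "non_lysine_fraction": (total - n_lys) / total,
        "arginine_count": float(n_arg),
        # Assumed definition: arginines per lysine; infinite if no lysine is observed.
        "kr_ratio": n_arg / n_lys if n_lys else math.inf,
    }


if __name__ == "__main__":
    print(column_summary("KKRRKRQK"))
```

A column that is entirely lysine gives zero entropy and zero non-lysine fraction, while a site that freely interchanges K and R gives a high KR ratio at moderate entropy.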

The program additionally provides a ranking of sites for introducing aspartate-to-glutamate (DE) and asparagine-to-glutamine (NQ) mutations together with a record of which of those sites have potential salt-bridging or H-bonding partners in the target sequence, which would tend to reduce the entropy of the longer sidechains in β-sheet (i ± 2) or α-helical (i ± 3, i ± 4) secondary structures (Donald et al., 2011; Olson et al., 2001; Vener et al., 2015). (The rationale behind this approach is described in Section 3 below.) Lysine, arginine, and histidine are considered potential salt-bridging (ionic interaction) partners for glutamate and H-bonding partners for glutamine. Asparagine, glutamine, serine, and threonine are considered potential H-bonding partners for both glutamate and glutamine, while aspartate and glutamate are also considered potential H-bonding partners for glutamine.
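The partner rules in the preceding paragraph translate directly into a lookup. The sketch below is a hedged illustration, not the program's actual code: the function name `potential_partners`, the 1-based site numbering, and the requirement that the caller supply the secondary-structure class are all assumptions; only the residue sets and the i ± 2, i ± 3, and i ± 4 offsets come from the text.

```python
from typing import List, Tuple

# Residue sets taken from the text; E = glutamate (DE mutation), Q = glutamine (NQ mutation).
SALT_BRIDGE_PARTNERS = {"E": set("KRH"), "Q": set()}
HBOND_PARTNERS = {"E": set("NQST"), "Q": set("KRHNQSTDE")}
# Sequence offsets at which sidechains can contact one another.
OFFSETS = {"beta": (2,), "alpha": (3, 4)}


def potential_partners(sequence: str, site: int, new_residue: str,
                       structure: str) -> List[Tuple[int, str, str]]:
    """List potential partners for a proposed D-to-E or N-to-Q mutation.

    sequence:    target protein sequence (one-letter codes)
    site:        1-based position of the residue to be mutated
    new_residue: "E" or "Q"
    structure:   "beta" or "alpha" (secondary structure at the site)
    Returns (position, residue, interaction type) tuples.
    """
    found = []
    for offset in OFFSETS[structure]:
        for pos in (site - offset, site + offset):
            if not 1 <= pos <= len(sequence):
                continue  # offset runs off the end of the sequence
            aa = sequence[pos - 1]
            if aa in SALT_BRIDGE_PARTNERS[new_residue]:
                found.append((pos, aa, "salt bridge"))
            elif aa in HBOND_PARTNERS[new_residue]:
                found.append((pos, aa, "H-bond"))
    return found


if __name__ == "__main__":
    # Toy helical segment: D at position 5 with K at i - 3 and R at i + 3.
    print(potential_partners("AKAADAARA", 5, "E", "alpha"))
```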

Our site-selection strategy, which is based on making amino acid substitutions observed in multiple non-redundant homologs, will tend to preserve activity because evolutionary selection for organismic fitness tends to preserve protein function. Consistent with the observed plasticity in homologous protein sequences in the course of biological evolution, large-scale saturation mutagenesis studies using next-generation sequencing methods support most amino acid substitutions at most positions preserving qualitative protein function (Gupta & Varadarajan, 2018; Hopf et al., 2017; Kelsic et al., 2016; Nisthal et al., 2019). Based on these data, a physicochemically conservative substitution observed in strongly homologous proteins is unlikely to abrogate activity. Nonetheless, multiple conservative point mutations could impair protein function, so direct experimental evaluation of protein activity would be required to verify the functional competency of proteins harboring Bulk KR mutations. Lysine residues known or suspected to be important for activity or involved in oligomeric interactions should generally be avoided when implementing the method.

2.2 Test protein selection and expression

We chose to test the Bulk KR substitution approach using three proteins with different crystallization properties. The hPDIa domain is a human drug target (Hoffstrom et al., 2010; Khan et al., 2011) that represents the first of four domains in the endoplasmic-reticulum-resident human Protein Disulfide Isomerase (hPDI) protein. The hPDIa domain had never successfully been crystallized on its own, but its structure was known from a relatively low-resolution crystal structure of a much longer multi-domain construct containing hPDIa (Wang et al., 2013). This crystal structure enables evaluation of the impact of Bulk KR substitutions on the conformation of the hPDIa domain, as reported below. Escherichia coli RNaseH is difficult to crystallize in the absence of ligands stabilizing active site structure but has had its crystal structure determined by groups studying its enzymological mechanism and folding (Goedken & Marqusee, 2001; Katayanagi et al., 1992; Katayanagi, Ishikawa, et al., 1993; Katayanagi, Okumura, et al., 1993; Liao et al., 2022; Yang et al., 1990). MA_2137, an S-adenosyl-methionine-dependent RNA methyltransferase from Methanosarcina acetivorans, crystallizes well in the presence of S-adenosyl-homocysteine (SAH), the product of the methyltransferase reaction that it catalyzes. We included this last protein because we previously demonstrated that increasing hit count in high-throughput crystallization screening is strongly correlated with the probability of successful crystal-structure determination (Price 2nd et al., 2009), which implies quantification of hit count for a protein that crystallizes well is an effective assay for crystallization propensity. KR mutations were introduced into the D65R mutant of MA_2137 because we had previously demonstrated that this single mutation improves the crystallization of this protein, and we wanted to determine whether Bulk KR substitutions could improve it even further.

We introduced 2–11 KR mutations into these proteins (Table 1), and we first examined the expression and solubility levels of the full set of mutant constructs when expressed from a pET plasmid using T7 RNA polymerase in E. coli, which yields high-level expression of the three parental proteins in the form of efficiently purified monodisperse monomers. The largest number of KR mutations that we tested preserved high-yield protein production in a monodisperse state for both hPDIa and MA_2137-D65R (i.e., the hPDIa-9KR and MA_2137-D65R-11KR constructs). Only two native lysine residues remain in each of these constructs. The RNaseH-2KR and RNaseH-5KR constructs similarly preserved high-yield protein production in a monodisperse state. However, the RNaseH-7KR construct yielded polydisperse protein that co-purified with the Hsp33 molecular chaperone protein (Graf et al., 2004; Moayed et al., 2020), while the RNaseH-11KR construct was completely insoluble even though it expressed at a high level (data not shown). The stability studies presented in the next section confirm earlier research (Goedken et al., 2000; Ishikawa et al., 1993) showing that RNaseH has a low thermal melting temperature of ~45°C, making it marginally stable, which likely explains its tolerance for fewer KR mutations than the other target proteins.

TABLE 1. Summary of expression, stability, and crystallization results for Bulk KR mutant proteins.^a

Target protein	Construct	KR mutation sites	Expression	Solubility	Apparent T_m (°C)	Apparent ΔH_vH (kcal/mol)	# Hits at 4 weeks	Crystal resolution (Å)
hPDIa 120 amino acids with 11 native lysines	WT	−	+++	+++	59.9, 67.9	204 ± 36, 131 ± 9	0	n/p
	2KR	42, 114	+++	+++	58.9, 68.6	201 ± 40, 117 ± 5	n/a	n/a
	5KR	2KR + 130, 69, 71	+++	+++	48.2, 65.7	47.7 ± 2, 125 ± 3	n/a	n/a
	7KR	5KR + 31, 131	+++	+++	49.5, 64.1	89.8 ± 5, 156 ± 5	n/a	n/a
	9KR	7KR + 57, 65	+++	+++	60.4	161 ± 3	9	1.89 Å
MA_2137 194 amino acids with 13 native lysines	WT	−	+++	+++	69.0	283 ± 13.5	75	n/a
	D65R	−	+++	+++	68.7	250 ± 4.60	126	1.60 Å
	D65R-3KR	126, 129, 194	+++	+++	67.1	194 ± 3.30	n/a	n/a
	D65R-5KR	3KR + 52, 71	+++	+++	67.4	153 ± 7.10	n/a	n/a
	D65R-7KR	5KR + 172, 133	+++	+++	65.5	193 ± 3.90	n/a	n/a
	D65R-11KR	7KR + 8, 64, 142, 155	+++	+++	63.4	150 ± 2.60	238	1.91 Å
RNaseH 166 amino acids with 11 native lysines	WT	−	+++	+++	45.2	135 ± 2.20	0	n/p
	2KR	31, 90	+++	+++	n/a	n/a	n/a	n/a
	5KR	2KR + 66, 89, 35	+++	+++	45.5	128 ± 2.60	0	n/p
	7KR	5KR + 107, 124	+++	+	n/p	n/p	n/p	n/p
	11KR	7KR + 111, 62, 88, 101	+++	−	n/p	n/p	n/p	n/p

^a The constructs harboring increasing numbers of KR mutations also include all of those in the constructs of the same protein harboring fewer KR mutations. The order of addition of KR mutations for hPDIa is slightly different from the rankings produced by the automated site-selection algorithm (Figure 2e) because this experiment was initiated before development of that software was completed. The code “n/p” stands for not possible, while the code “n/a” stands for not applicable or not attempted. The T_m and ΔH_vH values are labeled as apparent because the reversibility of the unfolding transitions was not assessed in the thermal denaturation experiments.

2.3 KR mutations are generally only minimally destabilizing

The thermal stabilities of all the successfully purified Bulk KR constructs were characterized using circular dichroism (CD) spectroscopy. These assays show a variable but generally very small degree of destabilization by KR mutations (Figure 3 and Table 1). RNaseH-5KR shows an approximately unaltered apparent T_m compared to the wild-type (WT) protein, demonstrating that KR mutations can have a completely neutral effect on stability. hPDIa-9KR shows an ~8°C reduction compared to the 68°C apparent T_m of the WT domain, while MA_2137-D65R-11KR shows an ~6°C reduction compared to the 69°C apparent T_m of the parental protein. Considering the entire set of mutant proteins in our study that could be purified, which includes 25 different KR mutations (Table 1), there is on average a 0.54 ± 0.30°C reduction in apparent T_m per KR mutation. Therefore, KR mutations are generally very well tolerated, although large sets of mutations tend to produce modest reductions in protein stability (Sokalingam et al., 2012) that can reduce soluble protein yield in vivo when the stability of the WT protein is relatively low.

2.4 Bulk KR mutations enhance crystallization propensity and yield strongly diffracting crystals

The purified protein constructs harboring the largest number of KR mutations (i.e., hPDIa-9KR and MA_2137-D65R-11KR) along with matched controls were screened for crystallization at the National Crystallization Center at the Hauptman-Woodward Institute (HWI) using their automated, high-throughput 1536-condition screen. This well-documented (Budziszewski et al., 2023; Luft et al., 2001; Luft et al., 2003; Luft, Snell, et al., 2011; Luft, Wolfley, et al., 2011; Lynch et al., 2023) microbatch under-oil screen was employed for initial crystallization screening by the Northeast Structural Genomics Consortium (Acton et al., 2011; Boel et al., 2016; Everett et al., 2016; Xiao et al., 2010) (www.nesg.org), which used it to generate 664 crystal structures deposited in the PDB. Neither the WT nor 5KR construct of RNaseH yielded any crystallization hits in a screen intentionally conducted without any ligands stabilizing active site structure in order to provide the most exacting test of protein crystallization propensity; the lack of success for this protein was potentially influenced by the high 15 mg/mL protein concentration used for screening, which produced pervasive amorphous precipitation in the screen at the earliest observation times (data not shown). However, the hPDIa-9KR and MA_2137-D65R-11KR constructs both yielded significantly more crystallization hits than the control proteins. MA_2137-D65R-11KR yielded hits under twice as many conditions as the MA_2137-D65R control protein, while hPDIa-9KR yielded nine high-quality hits compared to no hits at all for the WT construct (Figure 4 and Table 1). A small number of hit conditions for each protein were chosen for optimization, which very rapidly yielded 1.9 Å structures for both Bulk KR constructs based on a single session of remote synchrotron diffraction screening and data collection (Figures 5 and S1 and Table S1).
Therefore, for both target proteins, crystallization screening only had to be conducted on the soluble construct harboring the largest number of Bulk KR mutations to rapidly obtain high-quality crystal structures.
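
The hit counts above can be turned into hit rates with a few lines of arithmetic. The following is a minimal sketch that uses only the 4-week hit counts listed in Table 1 and the 1536 conditions of the HWI screen; the function and variable names are illustrative and are not part of any software described in this paper.

```python
# Hit counts at 4 weeks, taken from Table 1.
# 1536 is the number of conditions in the HWI high-throughput screen.
N_CONDITIONS = 1536

hits = {
    "hPDIa-WT": 0,
    "hPDIa-9KR": 9,
    "MA_2137-WT": 75,
    "MA_2137-D65R": 126,
    "MA_2137-D65R-11KR": 238,
}


def hit_rate_percent(n_hits, n_conditions=N_CONDITIONS):
    """Percentage of screened conditions that produced a crystallization hit."""
    return 100.0 * n_hits / n_conditions


for construct, n in hits.items():
    print(f"{construct:20s} {n:4d} hits  {hit_rate_percent(n):5.1f}%")

# Fold change for the Bulk KR construct relative to its matched control.
fold_ma2137 = hits["MA_2137-D65R-11KR"] / hits["MA_2137-D65R"]
print(f"MA_2137-D65R-11KR vs. MA_2137-D65R: {fold_ma2137:.2f}-fold")
```

With these inputs, hPDIa-9KR gives hits in roughly 0.6% of conditions and MA_2137-D65R-11KR in roughly 15%, with a fold change of about 1.9 relative to the D65R control.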

2.5 Bulk KR mutations do not perturb protein structure and frequently make H-bonds in crystal-packing interfaces

The 1.9 Å crystal structures of our Bulk-KR-substituted constructs (Figures 5 and S2 and Table S1) show 0.32–0.33 Å root-mean-square deviations for their backbone Cα atoms compared to the reference structures (Figure S3) (i.e., the much larger multidomain hPDI(abb'xa') construct for hPDIa because the isolated domain has never successfully been crystallized before and the parental MA_2137-D65R construct for MA_2137-D65R-11KR). The observed deviations are close to the expected coordinate error in well-refined crystal structures in the operative resolution range (Cruickshank, 1960), indicating our Bulk KR substitution method does not significantly perturb protein conformation for either of our targets. Moreover, the detailed analyses of protein conformation presented in Figure S3 demonstrate that the distribution of the local backbone deviations at the KR mutation sites is equivalent to that observed for all residues in each structure.

Detailed analyses of the intermolecular interactions in our crystal structures demonstrate that the engineered arginine sidechains make extensive crystal-packing contacts. Their contact counts consistently exceed the number of van der Waals contacts and especially H-bonds made by the native arginine sidechains in the same constructs, and they greatly exceed the number of both kinds of contacts made by lysine sidechains in the parental constructs (Table 2 and Figure 5). The larger number of crystal-packing contacts made by the engineered vs. native arginine residues could potentially reflect greater sequestration of the native residues in local surface interactions reducing the probability of reaching across a packing interface to make an energetically stabilizing interaction with a neighboring molecule in the crystal lattice. More extensive experimentation will be required to evaluate this possibility and also to establish the statistical robustness of the trends documented in Table 2. The observed trends nonetheless support the premise underlying our Bulk KR substitution strategy, which was based on the substantially stronger overrepresentation of arginine versus lysine in crystal-packing interfaces in our large-scale analysis of crystal structures previously deposited in the PDB (Figure 1).

TABLE 2. Crystal-packing contacts in reference and Bulk KR protein structures.^a

PDB ID (chain)	4EKZ (A)	8GDY (A)	8GDY (B)	6MRO (A)	8GDU (A)
Construct	WT	9KR		D65R	D65R-11KR	Totals
Resolution (Å)	2.51	1.93	1.93	1.6	1.95
Crystal Solvent Content	44.2%	37.4%	37.4%	35.2%	51.2%
Domain	hPDIa	hPDIa	hPDIa	MA_2137	MA_2137
# Residues in domain	120	118	120	194	194	746
# Ordered surface residues	75	74	74	118	118	459
*# Ordered surface residues not K or R*	58	57	57	94	97	*363*
*# Disordered K*	0	0	0	0	0	0
*# Ordered K (all surface-exposed)*	11	2	2	13	2	30
*# Disordered native R*	0	0	0	0	2	2
*# Ordered native R (all surface-exposed)*	6	6	6	11	11	40
*# Disordered engineered R*	0	0	0	0	1	1
*# Ordered engineered R (all surface-exposed)*	*n/a*	9	9	*n/a*	8	26
BB vdW contacts per residue in domain	0.72	0.37	0.43	0.45	0.27	0.43
BB vdW per ordered surface residue	1.15	0.59	0.70	0.75	0.44	0.70
*BB vdW contacts per ordered surface residue not K or R*	*1.41*	*0.68*	*0.84*	*0.84*	*0.48*	*0.81*
*BB vdW contacts per ordered surface K*	*0.27*	0	0	*0.69*	0	0
*BB vdW contacts per ordered surface native R*	*0.17*	0	0	0	*0.09*	*0.05*
*BB vdW contacts per ordered surface engineered R*	*n/a*	*0.56*	*0.44*	*n/a*	*0.50*	*0.50*
BB H-bond per residue in domain	0.08	0	0.03	0.04	0.04	0.04
BB H-bond per ordered surface residue	0.12	0	0.05	0.07	0.06	0.06
*BB H-bonds per ordered surface residue not K or R*	*0.16*	0	*0.05*	*0.06*	*0.06*	*0.07*
*BB H-bonds per ordered surface K*	0	0	0	0	0	0
*BB H-bonds per ordered surface native R*	0	0	0	0	0	0
*BB H-bonds per ordered surface engineered R*	*n/a*	0	*0.11*	*n/a*	*0.13*	*0.08*
SC vdW per residue in domain	0.99	1.42	1.24	1.16	0.99	1.14
SC vdW per ordered surface residue	1.59	2.26	2.01	1.92	1.63	1.86
*SC vdW contacts per ordered surface residue not K or R*	*1.83*	*1.75*	*1.56*	*1.80*	*1.29*	*1.62*
*SC vdW contacts per ordered surface K*	*0.09*	0	0	*0.46*	0	0
*SC vdW contacts per ordered surface native R*	*2.00*	*3.33*	*3.33*	*4.64*	*3.09*	*3.43*
*SC vdW contacts per ordered surface engineered R*	*n/a*	*5.22*	*4.44*	*n/a*	*4.13*	*4.62*
SC H-bonds per residue	0.09	0.25	0.26	0.22	0.05	0.16
SC H-bond per ordered surface residue	0.15	0.39	0.42	0.36	0.08	0.27
*SC H-bonds per ordered surface residue not K or R*	*0.12*	*0.28*	*0.32*	*0.32*	0	*0.20*
*SC H-bonds per ordered surface K*	*0.09*	0	0	*0.15*	0	*0.10*
*SC H-bonds per ordered surface native R*	*0.50*	*0.67*	*0.50*	*0.91*	*0.09*	*0.53*
*SC H-bonds per ordered surface engineered R*	*n/a*	*1.00*	*1.11*	*n/a*	*1.13*	*1.08*
# Ordered K making 2, 1, 0 SC H-bonds	0, 1, 10	0, 0, 2	0, 0, 2	1, 0, 12	0, 0, 2	1, 1, 28
# Ordered native R making 5, 4, 3, 2, 1, 0 SC H-bonds	0, 0, 0, 1, 1, 4	0, 1, 0, 0, 0, 5	0, 0, 1, 0, 0, 5	1, 0, 0, 1, 1, 7	0, 0, 0, 0, 1, 10	1, 1, 1, 1, 5, 31
# Ordered engineered R making 5, 4, 3, 2, 1, 0 SC H-bonds	n/a	1, 0, 0, 1, 2, 5	0, 1, 1, 1, 1, 5	n/a	0, 0, 1, 2, 2, 3	1, 1, 2, 4, 5, 13

^a Statistics are tabulated separately for backbone (BB) and sidechain (SC) atoms in the indicated amino acids. The abbreviation n/a stands for not applicable. Note: Rows with a white background give counts relative to all residues in the protein construct. Rows highlighted in dark colors with entries in plain text give counts relative to all ordered surface-exposed residues, while rows highlighted in the corresponding light colors with italicized entries give counts segregated according to amino acid type as indicated by the title on each line. Rows highlighted in shades of blue provide overall counts of surface-exposed residues, while rows highlighted in shades of orange provide counts of van der Waals contacts, and rows highlighted in shades of yellow provide counts of H-bonds. Ratios per engineered arginine reflect exclusively ordered residues and exclude the three disordered arginine residues in the MA_2137-D65R-11KR structure. Ordered residues are defined as having sufficient electron density to be included in the refined coordinate model deposited in the PDB. Candidate H-bonds were initially identified based on the participating heteroatoms having an internuclear separation ≤3.5 Å, but they were included in the count only if visually confirmed to have reasonable interaction geometry. Among the five crystal structures analyzed here, only two potential H-bonding interactions in crystal-packing interfaces fulfilled the distance criterion but failed the geometric evaluation. Individual atoms fulfilling the basic distance and geometric criteria with two different potential H-bonding partner atoms were counted as contributing two H-bonds (Vener et al., 2015), consistent with Coulomb's law being additive. Atom pairs with internuclear separation ≤4.0 Å not meeting the H-bonding criteria were counted as van der Waals contacts.
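
The distance criteria in the footnote above can be sketched as a simple classifier for a pair of atoms on opposite sides of a crystal-packing interface. This is an illustrative sketch rather than the procedure used to build Table 2: the function name is hypothetical, restricting H-bonding heteroatoms to N and O is an assumption, and the code only flags candidates, because the geometric check described in the footnote was performed visually.

```python
import math

HBOND_MAX = 3.5  # Å, heteroatom separation cutoff for candidate H-bonds (Table 2 footnote)
VDW_MAX = 4.0    # Å, separation cutoff for van der Waals contacts (Table 2 footnote)
HETEROATOMS = {"N", "O"}  # assumption: only N and O are treated as H-bonding atoms


def classify_contact(elem_a, xyz_a, elem_b, xyz_b):
    """Classify one intermolecular atom pair by internuclear distance alone.

    Returns "hbond_candidate" (still requires a geometry check), "vdw", or None.
    """
    d = math.dist(xyz_a, xyz_b)
    if d <= HBOND_MAX and elem_a in HETEROATOMS and elem_b in HETEROATOMS:
        return "hbond_candidate"
    if d <= VDW_MAX:
        return "vdw"
    return None


# Example: a guanidino nitrogen 2.9 Å from a carbonyl oxygen on a neighboring molecule.
print(classify_contact("N", (0.0, 0.0, 0.0), "O", (2.9, 0.0, 0.0)))
# Example: a carbon 3.8 Å from the same oxygen.
print(classify_contact("C", (0.0, 0.0, 0.0), "O", (3.8, 0.0, 0.0)))
```

A candidate that fails the geometric evaluation would fall back to the van der Waals category, since the pair still satisfies the 4.0 Å criterion.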

On average, the well-ordered engineered arginine sidechains in our Bulk KR structures make 1.08 H-bonds each to a neighboring protein molecule in the crystal lattice, compared to 0.48 each for the native arginine residues (Table 2). In comparison, the well-ordered lysine sidechains make an average of 0.10 H-bonds each to a neighboring protein molecule in the native structures and none in our Bulk KR structures. These results support our hypothesis for the physicochemical basis of the greater overrepresentation of arginine compared to lysine in crystal-packing interfaces in the PDB, which is that the guanidino group in arginine is substantially more efficacious than the primary amine group in lysine in mediating energetically stabilizing H-bonds in the relevant stereochemical contexts (Figure 5).
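
The per-residue averages quoted above follow from the H-bond histograms in the last three rows of Table 2. The sketch below recomputes them from the Totals column, pooling all five chains; the dictionary layout and function name are illustrative choices.

```python
# Totals column of Table 2: number of ordered residues (values) making a given
# number of sidechain H-bonds across a crystal-packing interface (keys).
histograms = {
    "lysine": {2: 1, 1: 1, 0: 28},
    "native R": {5: 1, 4: 1, 3: 1, 2: 1, 1: 5, 0: 31},
    "engineered R": {5: 1, 4: 1, 3: 2, 2: 4, 1: 5, 0: 13},
}


def mean_hbonds(hist):
    """Mean number of sidechain H-bonds per ordered residue."""
    n_residues = sum(hist.values())
    n_hbonds = sum(n_bonds * count for n_bonds, count in hist.items())
    return n_hbonds / n_residues


for label, hist in histograms.items():
    print(f"{label:13s} {sum(hist.values()):3d} residues  {mean_hbonds(hist):.3f} H-bonds each")
```

The engineered arginines give 28 H-bonds over 26 residues (about 1.08 each), the native arginines 19 over 40 (about 0.48 each), and the lysines 3 over 30 (0.10 each).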

The number of van der Waals contacts per ordered sidechain follows a similar trend. Engineered and native arginine residues each make on average 4.62 and 3.43 contacts, respectively. In contrast, lysine residues in the native structures each make on average 0.28 contacts, while the lysine residues in the engineered structures make none (Table 2). The greater number of intermolecular van der Waals contacts made by the arginine sidechains could potentially be influenced by their greater H-bonding propensity leading to more frequent occurrence in crystal-packing interfaces, but additional research will be required to determine the relative energetic contributions of their van der Waals versus H-bonding interactions to lattice stabilization. Notably, the numbers of backbone H-bonding and van der Waals interactions made by arginine versus lysine residues in our reference and engineered structures do not show any clear trends (Table 2).

2.6 Influence of Bulk KR mutations on protein solubility in PEG3350 solutions

Thermodynamic solubility assays using polyethylene glycol 3350 (PEG3350) to induce protein precipitation (Arakawa & Timasheff, 1985; Bhat & Timasheff, 1992; Kita et al., 1994) assess the relative free energy of the hydrated state of individual protein molecules compared to the most favorable self-associated state under conditions of constant ionic strength but reduced water activity (effective concentration). In practice, these assays monitor optical density at 280 nm in the supernatant of solutions containing different concentrations of protein in the presence of increasing concentrations of PEG3350 after centrifugation to remove large particulate molecular assemblies. Therefore, they effectively measure the equilibrium concentration of protein that remains soluble as water activity is reduced. The observed results depend intrinsically on the free energy of the self-associated state, which varies significantly for different proteins in different solvent environments and can include crystalline phases and liquid–liquid phase separated (LLPS) phases in addition to heterogeneous amorphously precipitated phases. This factor can complicate the interpretation of thermodynamic solubility assays, but they nonetheless provide insight into physicochemical properties that ultimately control protein crystallization behavior.

WT hPDIa and the 9KR mutant show no significant difference in their behavior in PEG3350 precipitation assays (Figure S4a), indicating that the Bulk KR substitutions in this protein domain do not alter its thermodynamic solubility under these conditions even though they enable crystallization and high-resolution structure determination of a domain that does not crystallize at all with its native sequence. Even when harboring the 9KR mutations, hPDIa crystallizes only under a very small fraction (0.6%) of the solution conditions explored in high-throughput crystallization screening (Figure 4) while showing amorphous precipitation in many of them (data not shown). Therefore, our thermodynamic solubility assays on the hPDIa constructs are likely measuring the free energy of the hydrated state of individual protein molecules compared to amorphously precipitated phases, and they demonstrate that the physicochemical properties controlling the formation of such phases are likely different from those controlling protein crystallization behavior.

PEG3350 precipitation assays on our MA_2137 constructs demonstrate more complex phase behavior (Figure S4b,c) likely reflecting different physical forms of self-association under assay conditions. Notably, the WT and D65R mutant could not be precipitated by the highest 35% (v/v) concentration of PEG3350 that was assayed (Figure S4b). These protein constructs instead showed some tendency to exhibit a small increase in optical density at low PEG3350 concentration, likely reflecting light scattering due to some form of protein self-association in a low-density state that does not sediment during low-speed centrifugation. During crystallization screening, these constructs showed clear evidence of LLPS without any apparent amorphous precipitation in many reaction conditions (Figure S5). Therefore, the inability to precipitate these constructs at high PEG3350 concentration likely reflects LLPS being energetically more favorable for this protein under conditions of low water activity than amorphous precipitation. Our crystal structures of MA_2137 constructs show clear and well-ordered electron density for every residue in this 202-residue protein except for the C-terminal hexahistidine tag that was added to enable purification using NiNTA affinity chromatography and a 12-residue internal loop that is disordered in the MA_2137-D65R-11KR structure, although well ordered by Ca⁺⁺ ions from the mother liquor in the structure of the parental MA_2137-D65R construct (Figure S3b). Furthermore, our CD thermal melting data demonstrate that the protein is very stably folded (Figure 3). Therefore, our solubility data (Figures S4c and S5) combined with our crystallization screening data (Figure 4b) suggest that MA_2137 undergoes LLPS in an essentially fully folded conformational state.

In contrast to the behavior of the WT and the D65R constructs, the 5KR and 11KR constructs of MA_2137 show precipitation at the highest PEG3350 concentrations used in our solubility assays, with the 11KR construct showing stronger precipitation than the 5KR construct (Figure S4c). These results indicate the free energy of these MA_2137 constructs is lower in the precipitated state than in the LLPS state under conditions of very low water activity, reflecting a reduction in thermodynamic solubility. However, these constructs both crystallize extremely promiscuously, with the 5KR and 11KR constructs yielding crystallization hits in ~8% and ~15% of screened conditions, respectively (Figure 4). These results raise the possibility that the precipitate formed by the 5KR and 11KR constructs at very high PEG3350 concentration could be in a microcrystalline state rather than amorphously precipitated state due to the high efficacy of the Bulk KR mutations in promoting crystallization. Further research will be needed to determine whether the reduced solubility of the 5KR and 11KR constructs reflects the stabilization of crystalline states or amorphously precipitated states of MA_2137-D65R.

3 DISCUSSION

The results presented in this paper demonstrate the efficacy of a new method for probabilistic engineering of protein surface properties to enhance crystallization propensity based on the substitution of multiple lysine (K) residues with arginine (R). The rationale behind this “Bulk KR” substitution method is that lysine and arginine have very similar physicochemical properties, but arginine shows substantially higher overrepresentation than lysine in a large-scale computational analysis we performed of crystal structures deposited in the PDB (Figure 1) (Naumov et al., 2019). We have developed software to rank lysine sites for substitution based on the redundancy-corrected count of KR substitutions observed in homologous proteins with the highest level of sequence identity (Figure 2), based on the rationale that biological evolution selects against destabilizing and function-impairing mutations. We demonstrate that mutations selected this way are only minimally destabilizing (Figure 3 and Table 1) and significantly enhance crystallization propensity for two of three test proteins (Figure 4). The crystals yielded by our Bulk KR method diffract strongly and enabled efficient determination of a 1.9 Å crystal structure (Table S1) for the hPDIa protein domain that does not crystallize at all with its native sequence (Figure 4 and Table 1). Our crystal structures of Bulk KR substituted proteins show no significant conformational or stereochemical differences versus reference proteins (Figure S3). Furthermore, the engineered arginine residues, like the native ones, make both van der Waals contacts and H-bonds in crystal-packing interfaces at substantially higher frequencies than either lysine residues or other residues (Table 2). These crystal structures were produced by the Bulk KR constructs harboring the highest number of substitutions, which were the only constructs for which any diffraction data were measured. 
These results support the efficacy of a streamlined pipeline for crystal structure determination in which solubility is tested for a set of constructs with an increasing number of KR mutations, but purification and crystallization screening are only performed on the construct harboring the largest number of mutations. In summary, the biophysical results presented in this paper support bulk KR substitution being a rational and effective probabilistic strategy to engineer protein surface properties to enhance protein crystallization propensity.

Our Bulk KR method focuses on large-scale modification of protein surface properties using mutations between amino acids that conserve qualitative physicochemical properties. Many previous studies have demonstrated that mutation of individual surface residues generally changes crystallization behavior and can significantly improve it, but the strategies evaluated in the past have led to mixed results (Anstrom et al., 2005; Cieslik & Derewenda, 2009; Cooper et al., 2007; Czepas et al., 2004; Derewenda, 2004a; Derewenda, 2004b; Derewenda & Godzik, 2017; Derewenda & Vekilov, 2006; Janda et al., 2004; Longenecker et al., 2001; Mateja et al., 2002; Qiu & Janson, 2004). We therefore focused on the simultaneous introduction of multiple putative crystallization-enhancing mutations based on the hypothesis that stronger and more consistent improvements in crystallization behavior are likely to be promoted by more extensive changes in surface properties, assuming the individual changes tend to increase crystallization probability. This probabilistic surface-engineering strategy requires reliable information on the relative influence of different amino acids on crystallization propensity, and it also requires that the individual mutations do not significantly reduce protein stability so that large-scale mutagenesis does not impair protein folding and prevent effective purification. We used our computational analyses summarized in Figure 1 (Banayan et al., 2023; Naumov et al., 2019) to guide the selection of crystallization-enhancing amino acid substitutions, leading us to focus initially on KR substitutions because of the equivalent charge and similar volume and entropy of lysine and arginine sidechains.

KR mutations have been explored in the past for their ability to modulate both protein stability (Sokalingam et al., 2012) and crystallization propensity (Czepas et al., 2004). Bulk KR substitution in GFP was shown to greatly reduce the amount of soluble protein expressed in vivo in E. coli and also reduce the fluorescence level of the protein that could be purified, although the mutations slowed the rate of unfolding by chemical denaturants (Sokalingam et al., 2012). However, this study only examined 14KR and 19KR mutations at sites selected based on diffuse criteria. Our studies show that KR mutations at 25 sites selected based on the frequency of substitution observed in homologs show on average a 0.54°C reduction in apparent T_m per KR mutation (Figure 3 and Table 1).

A series of previous experimental studies examined the effects of mutating surface-exposed lysine residues on crystallization propensity (Anstrom et al., 2005; Czepas et al., 2004). These studies were guided by different conceptual premises that prioritize fundamentally different kinds of amino acid substitutions. The surface entropy reduction (SER) method, which represented pioneering research on the use of rational surface mutagenesis to improve protein crystallization propensity, focused on mutations that replace high entropy sidechains with low entropy sidechains, especially lysine-to-alanine (KA) mutations, in order to reduce the free energy penalty incurred upon immobilizing flexible surface residues in crystal-packing interfaces (Cieslik & Derewenda, 2009; Cooper et al., 2007; Czepas et al., 2004; Derewenda, 2004a; Derewenda, 2004b; Derewenda & Godzik, 2017; Derewenda & Vekilov, 2006; Janda et al., 2004; Longenecker et al., 2001; Mateja et al., 2002). The alternative approach was based on a computational analysis of the amino acids making crystal-packing interactions in a set of 233 protein crystal structures (Dasgupta et al., 1997). This groundbreaking analysis, which used different computational methods, produced different conclusions compared to our computational analysis of a much larger set of 87,684 crystal structures (Banayan et al., 2023; Naumov et al., 2019) (Figure 1). When comparing results for all 20 amino acids, there is not a statistically significant correlation between the single amino-acid crystal-packing propensities deduced from the two analyses (Figure S6). The most salient difference is that the earlier study concluded that lysine, glutamate, and tryptophan are all disfavored in crystal-packing contacts (Dasgupta et al., 1997), which contradicts our results reported in Figure 1 (Banayan et al., 2023; Naumov et al., 2019). 
Supporting the validity of our analysis, we have successfully used introduction of both lysine and glutamate residues to improve protein crystallization (Naumov et al., 2019). The earlier computational analysis did conclude that arginine and glutamine are favored in crystal-packing contacts (Dasgupta et al., 1997), consistent with our results (Banayan et al., 2023; Naumov et al., 2019) (Figure 1), leading the authors of the earlier analysis to suggest that KR and KQ substitutions could improve protein crystallization propensity.

Two different groups subsequently tested these proposals (Anstrom et al., 2005; Czepas et al., 2004). One group tested nine single, double, or triple KR mutations, five of which produced diffracting crystals, but only one of which produced a crystal that diffracted to higher resolution than the WT protein (Czepas et al., 2004). Their overall conclusion was that KR mutations show lower efficacy in improving protein crystallization than KA mutations (Czepas et al., 2004). The other group conducted a more extensive study that effectively led to the opposite conclusion regarding the efficacy of KA mutations (Anstrom et al., 2005). They systematically examined the substitution of 15 surface-exposed lysine residues with either glutamine or alanine, which did not yield any well-diffracting crystals. This rigorous study demonstrated that the surface mutations consistently changed the crystallization conditions that yielded hits, and weakly diffracting crystals were obtained from one KQ and one KA mutant protein, while none were obtained from the WT protein using the same crystallization screens. However, the KQ mutations yielded hits in those screens at the same rate as the WT protein within experimental error, while the KA mutations yielded hits at a significantly reduced rate that was ~2/3 that of the WT protein (Anstrom et al., 2005). These experimental results are consistent with our computational results reported in Figure 1 (Banayan et al., 2023; Naumov et al., 2019), which show glutamine has a somewhat higher probability of participating in crystal-packing contacts than lysine, while alanine has a significantly lower probability of participating in crystal-packing contacts.
The experimental studies summarized above highlight the probabilistic nature of the influence of single amino acid substitutions on crystallization propensity, which contributed to the development of our strategy focused on engineering more consistent improvements in crystallization behavior via simultaneous introduction of multiple putative crystallization-enhancing surface mutations.

This large-scale surface-mutagenesis strategy requires that individual mutations preserve or at most minimally perturb protein stability, which has led us to focus on physicochemically conservative substitutions and initially KR mutations. One complication with KA mutations is that the grossly different physicochemical properties of alanine compared to lysine are more likely to produce significant protein destabilization. This effect impedes the introduction of multiple KA mutations, while our results (Tables 1 and 2) and previously reported results (Czepas et al., 2004) show that the introduction of multiple mutations is an effective strategy to improve crystallization propensity. Our computational analyses raise additional questions about the KA method. Alanine is significantly underrepresented in crystal-packing interfaces, while lysine is significantly overrepresented (Figure 1). These observations are consistent with the results of computational analyses demonstrating that, for all amino acids, increasing solvent exposure correlates strongly with increasing the probability of making a crystal-packing interaction (Banayan, 2023). The low solvent exposure of alanine residues at most sites in proteins (Banayan, 2023; Rost & Sander, 1994; Tien et al., 2013) is therefore consistent with the strong underrepresentation of alanine in crystal-packing interfaces (Banayan et al., 2023; Naumov et al., 2019) (Figure 1). Additional computational analyses show that some alanine-containing sequences located in α-helix-capping motifs and short surface loops are significantly overrepresented in crystal-packing interfaces (Naumov et al., 2019; Price 2nd et al., 2009). 
These results suggest that the influence of KA mutations on crystallization propensity is likely to depend strongly on local structural context (Cieslik & Derewenda, 2009; Cooper et al., 2007; Czepas et al., 2004; Derewenda, 2004a; Derewenda, 2004b; Derewenda & Godzik, 2017; Derewenda & Vekilov, 2006; Janda et al., 2004; Longenecker et al., 2001; Mateja et al., 2002) and may involve more complex physicochemical effects than SER.

The conceptual foundation of the crystallization engineering method reported in this paper is fundamentally different from that of earlier studies. As indicated above, it focuses on probabilistic reengineering of protein surface properties via simultaneous substitution of multiple amino acids with similar physicochemical properties but different propensity to make crystal-packing interactions based on large-scale computational analyses of previously determined crystal structures (Banayan et al., 2023; Naumov et al., 2019) (Figure 1). We demonstrated significant improvement in the crystallization properties of two out of three target proteins when the method was applied using a streamlined experimental design in which a single mutant construct of each target was purified and subjected to crystallization screening (Figures 4 and 5, Tables 1 and 2, and Table S1). For each target, the expression and solubility properties were evaluated for a series of constructs containing an increasing number of physicochemically conservative crystallization-enhancing mutations, which is readily done in parallel for at least a dozen constructs, but exclusively the soluble construct with the greatest number of crystallization-enhancing mutations was purified, which is substantially more labor-intensive. This streamlined experimental protocol led to solution of a high-resolution crystal structure (Figure 5b and Table S1) for the hPDIa domain, a human drug target that does not yield any crystal hits with its native sequence (Figure 4b).

Given the success of the Bulk KR method in improving protein crystallization behavior, our computational analysis of crystal-packing interactions in the PDB (Figure 1) suggests several related strategies with promise to improve crystallization behavior based on the same conceptual approach. Aspartate and glutamate frequently substitute for one another in the course of evolution (Henikoff & Henikoff, 1993; Morcos et al., 2011; Thomas et al., 2008) due to their very similar physicochemical properties, but glutamate shows over twofold higher overrepresentation in crystal-packing interfaces (Figure 1). A similar trend relative to crystal packing interactions is observed for asparagine and glutamine, which also have very similar physicochemical properties. These observations suggest bulk DE and NQ substitutions are also likely to improve crystallization propensity.

In the case of these substitutions, the higher entropy of the sidechain with greater crystal-packing propensity will tend to thermodynamically oppose immobilization in a crystal-packing interface, while this factor does not apply to bulk KR substitution due to the very similar entropy of these sidechains. However, our computational analyses of crystal-packing interactions in the PDB (Banayan, 2023; Naumov et al., 2019) show that high-entropy sidechains mediating crystal-packing interactions tend to participate in salt-bridging and H-bonding interactions (Donald et al., 2011; Olson et al., 2001; Vener et al., 2015) with nearby residues in the primary sequence, especially at ±3 and ±4 positions in α-helices and ±2 positions in β-strands. These interactions likely reduce the entropy of the sidechains in the isolated protein molecules, which will reduce or eliminate entropy loss due to immobilization in a crystal-packing interface. Therefore, bulk DE and NQ substitution seems likely to be most effective when the residues prioritized for mutation have potential salt-bridging or H-bonding partners at ±3 and ±4 positions in α-helices or ±2 positions in β-strands.
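
The positional rule proposed above can be illustrated with a short screening routine. This is a hypothetical sketch and not an implemented part of our site-selection software: the function names, the one-letter secondary-structure codes (H for α-helix, E for β-strand), the default choice of lysine and arginine as salt-bridging partners for aspartate, and the requirement that the partner lie in the same secondary-structure class are all assumptions made for illustration.

```python
def partner_offsets(ss_code):
    """Sequence offsets at which a partner is sought for a secondary-structure code."""
    if ss_code == "H":  # alpha-helix: +/-3 and +/-4
        return (-4, -3, 3, 4)
    if ss_code == "E":  # beta-strand: +/-2
        return (-2, 2)
    return ()


def candidate_sites(seq, ss, target="D", partners="KR"):
    """List (site, partner) residue numbers (1-based) satisfying the positional rule.

    seq: one-letter amino acid sequence; ss: per-residue secondary-structure string
    of the same length. Assumption: the partner must share the site's ss code.
    """
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary-structure strings differ in length")
    pairs = []
    for i, (aa, code) in enumerate(zip(seq, ss)):
        if aa != target:
            continue
        for offset in partner_offsets(code):
            j = i + offset
            if 0 <= j < len(seq) and ss[j] == code and seq[j] in partners:
                pairs.append((i + 1, j + 1))
    return pairs


# Aspartate at position 3 of a helix with a lysine four residues downstream.
print(candidate_sites("AADAAAKAA", "HHHHHHHHH"))
```

For NQ substitution the same routine could be called with `target="N"` and a user-chosen set of H-bonding partner residues.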

Future research by the structural biology community will be required to rigorously assess the efficacy of the Bulk KR method and the related DE and NQ bulk substitution methods proposed above. Given the relatively small sizes of the proteins evaluated in our experiments, one important question to be addressed in future studies is how the number of KR mutations tolerated by a protein and the number needed to achieve substantial improvement in crystallization properties scale with protein size. Future studies will also be needed to establish the most efficient paradigm for combining KR, DE, and NQ mutations to maximize crystallization hit rate and crystal quality while minimizing the number of constructs needed to obtain a diversity of different crystal forms and a high-resolution crystal structure. Nonetheless, our bulk substitution method focused on large-scale probabilistic remodeling of protein surface properties to enhance crystallization propensity already shows significant efficacy for rational engineering of proteins to improve their crystallization properties.

4 MATERIALS AND METHODS

4.1 Site-selection software and input sequence alignment format

Our Python program that automatically generates a suggested order for mutation of lysine residues to arginine incorporates routines from the BioPython package (Cock et al., 2009). The program code is available for download in our GitHub archive (https://github.com/huntmolecularbiophysicslab/pxengineering), and it can also be run interactively using our protein crystallization engineering webserver (http://www.pxengineering.org). The program requires the input of an alignment of homologous sequences in Clustal (Thompson et al., 1994) format. In brief, the first line of a file in this ASCII format must start with the words “CLUSTAL W” or “CLUSTALW.” Blank lines then separate sequential blocks of text containing aligned sequence segments with one sequence per line. Each of those lines starts with a string containing the sequence name, which is followed by blank spaces and then up to 60 single-letter amino acid codes. Each line can contain additional blank spaces followed by a residue number. Each block optionally ends with a line containing characters encoding the degree of sequence conservation at each position in the alignment.
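The file layout described above can be made concrete with a minimal standard-library reader. This is a sketch for illustration and is not the parser used by our program, which relies on BioPython routines. The function name `parse_clustal` and the example alignment are hypothetical, and the sketch assumes that sequence names contain no whitespace and that conservation lines begin with blank spaces.

```python
def parse_clustal(text):
    """Parse Clustal-format text into a dict of {name: aligned_sequence}."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith(("CLUSTAL W", "CLUSTALW")):
        raise ValueError("first line must start with 'CLUSTAL W' or 'CLUSTALW'")
    sequences = {}
    for line in lines[1:]:
        # Blank lines separate blocks; conservation lines start with spaces.
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, segment = fields[0], fields[1]
        # A third field, when present, is the optional residue number.
        sequences[name] = sequences.get(name, "") + segment
    if len({len(s) for s in sequences.values()}) > 1:
        raise ValueError("aligned sequences have unequal lengths")
    return sequences


example = """CLUSTAL W (1.83) multiple sequence alignment

target      MKTAYIAKQR 10
homolog1    MRTAYLAKQR 10
            *:***:****

target      QISFVKSH 18
homolog1    QLSF-RSH 17
            *:** :**
"""
alignment = parse_clustal(example)
```

Each sequence is reassembled by concatenating its segments across blocks, so `alignment["target"]` holds the full 18-column aligned string.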

4.2 GPU acceleration of sequence identity calculation

Our Python program accelerates the calculation of the absolute percent identity between two sequences by parallelizing site-by-site comparison on a graphics processing unit (GPU) using a custom reduction kernel (Kirk & Hwu, 2023) written using the CuPy library (Okuta et al., 2017). In brief, each pair of aligned amino acids in two sequences is sent to a separate GPU core. If the one-character amino acid codes in both sequences match each other and neither is empty due to a gap in the sequence alignment, that core stores a value of 1 in its register, which is otherwise set to 0. The values in the registers for all sites are then summed using parallel reduction to count the number of identical amino acids in the two sequences. Details can be found in the code at https://github.com/huntmolecularbiophysicslab/pxengineering.
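The logic of this calculation can be sketched on the CPU in plain Python. The code below is not the CuPy kernel in our repository. The function names are hypothetical, `tree_sum` only mimics the pairwise access pattern of a parallel reduction, and the treatment of `-` and `.` as gap characters is an assumption.

```python
def identity_flags(seq_a, seq_b, gap_chars="-."):
    """Per-site step: 1 where both residues match and neither is a gap, else 0."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [1 if a == b and a not in gap_chars else 0
            for a, b in zip(seq_a, seq_b)]


def tree_sum(values):
    """Sum by repeated pairwise addition, as in a parallel reduction."""
    values = list(values)
    while len(values) > 1:
        if len(values) % 2:
            values.append(0)  # pad so the list splits into two equal halves
        half = len(values) // 2
        values = [values[i] + values[i + half] for i in range(half)]
    return values[0] if values else 0


def count_identities(seq_a, seq_b):
    """Number of aligned columns with identical, non-gap residues."""
    return tree_sum(identity_flags(seq_a, seq_b))


n_identical = count_identities("MKTAYIAKQRQISFVKSH", "MRTAYLAKQRQLSF-RSH")
```

On a GPU, the flag computation runs once per site in parallel and each round of `tree_sum` halves the number of active values, so the count is obtained in a number of rounds that grows with the logarithm of the alignment length. Converting the count to a percent identity requires a choice of denominator, which this sketch leaves to the caller.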

4.3 Prioritization of mutation sites based on redundancy-corrected counts of KR mutations observed in homologous proteins

To compensate for outright redundancy as well as inhomogeneous phylogenetic sampling in sequence databases, our software performs two redundancy-compensation calculations on the set of sequences in each percent-identity bin having an arginine substitution at a specific lysine site in the target protein. Both calculations use the same heuristic estimate for the probability of evolutionary resampling at an aligned site between two sequences i and j with an overall fraction f_id of identical residues at all aligned sites:

P_{\text{site-resampling}}\left(f_{id}\right)_{ij} = \begin{cases} 1.0 & \text{for } f_{id} \le f_{min} \\ \dfrac{1 - f_{id}}{1 - f_{min}} & \text{for } f_{id} > f_{min} \end{cases}.

We assume f_min = 0.3. One calculation gives a redundancy-reduced estimate C_R of arginine counts using the following formula in which the summation is performed over all unique pairs of the N sequences in the bin having an arginine substitution at one lysine site in the target protein:

C_R = \left( \frac{2}{N} \sum_{i<j} P_{\text{site-resampling}}\left(f_{id}\right)_{ij} \right) + 1.

This calculation is extremely rapid and includes all homologs but is only rigorously accurate in the limiting cases of full redundancy or full independence. The second calculation gives a rigorous estimate of the expectation value for the number of independent observations of arginine based on combinations of the heuristic probabilities of being resampled or not being resampled for all unique pairs among the seven most diverged sequences in each bin that have arginine at a specific lysine site. These sequences are identified by taking the single sequence with the lowest percent identity to the target sequence and then progressively adding the sequence with the lowest average percent identity to those already selected. The details of the implementation can be found in the code at https://github.com/huntmolecularbiophysicslab/pxengineering.
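The heuristic probability, the C_R formula, and the selection of the most diverged sequences can be sketched as follows. This is an illustration written from the description above and is not the code in our repository. The function names, the nested-list identity matrix, and the return value of 0 for an empty bin are assumptions. The expectation-value calculation over the seven selected sequences is not reproduced here.

```python
from itertools import combinations

F_MIN = 0.3  # value of f_min stated in the text


def p_site_resampling(f_id, f_min=F_MIN):
    """Heuristic probability of evolutionary resampling for one sequence pair."""
    if f_id <= f_min:
        return 1.0
    return (1.0 - f_id) / (1.0 - f_min)


def redundancy_reduced_count(identity, f_min=F_MIN):
    """C_R for N sequences given an N x N matrix of fractional identities."""
    n = len(identity)
    if n == 0:
        return 0.0  # assumption: an empty bin contributes no counts
    total = sum(p_site_resampling(identity[i][j], f_min)
                for i, j in combinations(range(n), 2))
    return 2.0 * total / n + 1.0


def most_diverged(identity_to_target, identity, k=7):
    """Greedy selection of up to k sequences, as described in the text."""
    n = len(identity_to_target)
    if n == 0:
        return []
    # Start from the sequence with the lowest identity to the target.
    chosen = [min(range(n), key=lambda i: identity_to_target[i])]
    while len(chosen) < min(k, n):
        remaining = [i for i in range(n) if i not in chosen]
        # Add the sequence with the lowest mean identity to those chosen.
        chosen.append(min(
            remaining,
            key=lambda i: sum(identity[i][j] for j in chosen) / len(chosen)))
    return chosen


redundant = [[1.0] * 4 for _ in range(4)]      # four identical sequences
independent = [[0.2] * 4 for _ in range(4)]    # all pairs below f_min
c_r_redundant = redundancy_reduced_count(redundant)
c_r_independent = redundancy_reduced_count(independent)
```

The two example matrices reproduce the limiting cases mentioned above: four fully redundant sequences give C_R = 1, and four sequences with all pairwise identities below f_min give C_R = N = 4.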

4.4 Protein expression and purification

Protein coding sequences harboring Bulk KR substitutions were synthesized (Twist Bioscience, South San Francisco, CA) with a short C-terminal hexahistidine tag (LEHHHHHH), cloned under the control of the T7 RNA polymerase promoter in the pET21_NESG expression plasmid (https://dnasu.org/DNASU/GetCloneDetail.do?cloneid=336944), and then transformed into BL21(DE3) Rosetta E. coli cells (MilliporeSigma, Burlington, MA, USA). Protein expression was induced with 1 mM IPTG for 4 h at 30°C in Terrific Broth (Terrific Broth, 2015). Cells were pelleted by centrifugation at 4000 rpm for 25 min at 4°C and resuspended on ice in 10 mM imidazole, 300 mM NaCl, 1 mM TCEP, 5% (w/v) glycerol, 50 mM NaH₂PO₄, pH 7.5, before cell lysis by probe sonication. The supernatant following 15,000 rpm centrifugation at 4°C was mixed with Ni-NTA resin and incubated at 4°C for 1 h. The mixture was then transferred into a column and washed with the same buffer containing 100 mM imidazole before elution of the protein in 6 mL of the same buffer containing 250 mM imidazole. A 1-mL aliquot of eluted protein was concentrated to 500 μL using an Amicon 10 kDa centrifugal filter (MilliporeSigma, Burlington, MA, USA) and loaded via a 1-mL loop onto a Superdex 200 Increase 10/300 GL gel filtration column equilibrated in 100 mM NaCl, 10 mM DTT, 10 mM Tris-Cl, pH 7.5. Protein-containing fractions were concentrated to ~15 mg/mL, as determined from OD_280nm values measured using a Nanodrop spectrophotometer (ThermoFisher, Waltham, MA, USA) and a priori sequence-based extinction coefficients (Gill & von Hippel, 1989), and the concentrated protein was immediately flash-frozen in aliquots in liquid nitrogen before storage at −80°C pending use.
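The conversion from an OD_280nm reading to a protein concentration follows the Beer-Lambert law and can be sketched as below. The function names are hypothetical. The per-residue molar absorbances are the values commonly attributed to Gill & von Hippel (1989) and should be checked against that reference before use, and the molar mass must be supplied by the user.

```python
# Assumed molar absorbances at 280 nm (M^-1 cm^-1); verify before use.
TRP_E280 = 5690.0
TYR_E280 = 1280.0
CYSTINE_E280 = 120.0


def extinction_coefficient(sequence, n_cystine=0):
    """Sequence-based molar extinction coefficient at 280 nm (M^-1 cm^-1)."""
    seq = sequence.upper()
    return (seq.count("W") * TRP_E280
            + seq.count("Y") * TYR_E280
            + n_cystine * CYSTINE_E280)


def concentration_mg_per_ml(a280, sequence, molar_mass_da,
                            path_cm=1.0, n_cystine=0):
    """Beer-Lambert law: c = A / (epsilon * l), converted to mg/mL."""
    eps = extinction_coefficient(sequence, n_cystine)
    if eps <= 0:
        raise ValueError("no Trp, Tyr, or cystine: A280 is uninformative")
    molar = a280 / (eps * path_cm)   # mol/L
    return molar * molar_mass_da     # g/L, numerically equal to mg/mL


eps_example = extinction_coefficient("MWWYYYYYK")
conc_example = concentration_mg_per_ml(1.0, "MWWYYYYYK", 20000.0)
```

The example sequence and the 20 kDa molar mass are invented for illustration. A sample more concentrated than the linear range of the instrument would need to be diluted before measurement.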

4.5 Thermal stability assays using CD spectroscopy

An Applied Photophysics (Leatherhead, UK) Chirascan V100 spectropolarimeter with a Peltier-jacketed cell holder was used to collect serial CD spectra spanning 200–250 nm continuously during a 3°C/min thermal ramp nominally running from 10°C to 84°C. Data were measured from protein samples in a 0.5-mm quartz cuvette in 1-nm increments using a 0.25-s integration time per point and a 1-nm bandwidth, corresponding to 35 s per spectrum. Direct measurements of the cell temperature during the experiment, which were used for data display and analysis, indicated the actual range of the temperature ramp was from ~11°C to 78°C for all samples. Protein samples were diluted to 2 mg/mL using gel filtration buffer lacking DTT (i.e., 100 mM NaCl, 10 mM Tris-Cl, pH 7.5) with the addition of 1 mM SAH for the MA_2137 constructs. Global curve fitting of spectral data during the thermal ramp was performed from 215 to 230 nm with the program GLOBAL3 (Applied Photophysics) using double linear baseline correction (i.e., before and after the observed transitions) to extract the thermodynamic parameters and melting temperatures (Table 1 and Figure 3). Each dataset was analyzed using the smallest number of transitions showing approximately random directions for the residuals for adjacent points in the CD versus measured-temperature surface. Because the reversibility of the unfolding transitions was not evaluated, the inferred T_m and ΔH_vH values listed in Table 1 and cited in the text are labeled as apparent.
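The kind of model underlying such a fit can be illustrated with a single two-state transition at one wavelength, with linear baselines before and after the transition. This sketch is not the GLOBAL3 algorithm, which fits all wavelengths globally and can include more than one transition. The function names are hypothetical, and the sketch assumes a temperature-independent van 't Hoff enthalpy (zero heat capacity change).

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def fraction_unfolded(temp_c, tm_c, dh_vh):
    """Two-state unfolded fraction at temp_c for apparent Tm and dH_vH (kJ/mol)."""
    t = temp_c + 273.15
    tm = tm_c + 273.15
    # ln K = -(dH/R) * (1/T - 1/Tm), so K = 1 at T = Tm
    ln_k = -(dh_vh / R_GAS) * (1.0 / t - 1.0 / tm)
    return 1.0 / (1.0 + math.exp(-ln_k))


def cd_signal(temp_c, tm_c, dh_vh, folded=(-20.0, 0.02), unfolded=(-5.0, 0.01)):
    """CD signal with linear folded and unfolded baselines (intercept, slope)."""
    f_u = fraction_unfolded(temp_c, tm_c, dh_vh)
    base_f = folded[0] + folded[1] * temp_c
    base_u = unfolded[0] + unfolded[1] * temp_c
    return (1.0 - f_u) * base_f + f_u * base_u


# Simulated melt over the measured temperature range; parameters are invented.
curve = [(t, cd_signal(t, tm_c=55.0, dh_vh=300.0)) for t in range(11, 79)]
```

In a fit, T_m, ΔH_vH, and the four baseline parameters would be adjusted to minimize the residuals between a curve of this form and the measured data, and additional transitions would be added only if the residuals remained systematically non-random.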

4.6 Solubility assays

Protein stock solutions were diluted to working concentration in the same buffer used for gel-filtration chromatography containing different weight/volume concentrations of PEG3350. Following a 60-min incubation at room temperature, the samples were spun for 10 min at 14,000 RPM in a microfuge to pellet particulates, and the concentration of protein in the supernatant was measured using the optical density at 280 nm measured in a Nanodrop spectrophotometer (ThermoFisher) based on the a priori extinction coefficient (Gill & von Hippel, 1989). Centrifugation and measurement of concentration in the supernatant were repeated 24 h later to ensure equilibrium had been reached.

4.7 Protein crystallization screening

For each target protein, the parental construct and the construct containing the largest number of KR mutations that could be purified, but not the constructs containing fewer KR mutations, were submitted for crystallization screening using the standard protocol (Budziszewski et al., 2023; Luft et al., 2001; Luft et al., 2003; Luft, Snell, et al., 2011; Luft, Wolfley, et al., 2011; Lynch et al., 2023) at the High-Throughput Crystallization Screening Center at the Hauptman-Woodward Medical Research Institute (https://hwi.buffalo.edu/high-throughput-crystallization-center/). Protein stocks were adjusted to ~15 mg/mL concentration before mixing 1:1 with the well solution for microbatch under-oil crystallization screening. Crystallization reactions were imaged in a Rock Imager before protein addition and at 1 day and 1, 2, 3, 4, and 6 weeks after reaction setup using parallel brightfield microscopy, ultraviolet two-photon-excited fluorescence microscopy, and SONICC (second-order nonlinear imaging of chiral crystals) microscopy. Hits were identified based on visual observation of refractive supramolecular aggregates in brightfield microscopy images that had corresponding features in either fluorescence or SONICC microscopy images or both. Hits were characterized as “High Quality” based on the judgement of an expert protein crystallographer that it would likely be straightforward to reproduce the crystals and optimize them to sufficient size to mount for measurement of x-ray diffraction data.

4.8 Protein crystal optimization

A small subset of the crystal hits for hPDIa-9KR and MA_2137-D65R-11KR was reproduced using the microbatch under-oil method at 4°C and 18°C, and they were subsequently optimized by seeding. The crystal used for determination of the hPDIa-9KR structure was grown by mixing the 15 mg/mL stock solution at a 2:1 volume ratio with a crystallization reagent comprising 24% (w/v) PEG 20k, 0.1 M potassium thiocyanate, 0.1 M MES, pH 6. The crystal used for determination of the MA_2137-D65R-11KR structure was grown by mixing the 15 mg/mL stock solution at a 1:1 volume ratio with a crystallization reagent comprising 30% (w/v) PEG 1k, 0.1 M HEPES, pH 7.5. All crystals were transferred into a similar crystallization solution supplemented with 20% (v/v) ethylene glycol before mounting and flash-freezing in liquid nitrogen.

4.9 Crystal structure determination and refinement

X-ray diffraction data were collected from single crystals of hPDIa-9KR and MA_2137-D65R-11KR using, respectively, the NE-CAT 24-ID-E and 24-ID-C beam lines at the Advanced Photon Source (Table S1). The images were processed and scaled using XDS (Kabsch, 1988a; Kabsch, 1988b; Kabsch, 2010a; Kabsch, 2010b). The structure of hPDIa-9KR was solved by molecular replacement using the program MOLREP (Vagin & Teplyakov, 2010) employing a search model comprising the first domain in the crystal structure of full-length hPDI (PDB id 4EKZ). The structure of MA_2137-D65R-11KR was solved using the same methods employing the structure of MA_2137-D65R (PDB id 6MRO) as the search model. Both structures were refined (Table S1) using PHENIX (Liebschner et al., 2019) in conjunction with manual rebuilding in XtalView (McRee, 1999) and COOT (Casanal et al., 2020).

AUTHOR CONTRIBUTIONS

Nooriel E. Banayan: Software; investigation; formal analysis; writing – original draft; visualization; writing – review and editing; validation; data curation; methodology. Blaine J. Loughlin: Investigation. Shikha Singh: Methodology; investigation. Farhad Forouhar: Investigation. Guanqi Lu: Software. Kam-Ho Wong: Investigation. Matthew Neky: Investigation. Henry S. Hunt: Software. Larry B. Bateman: Software. Angel Tamez: Software. Samuel K. Handelman: Methodology. W. Nicholson Price: Methodology. John F. Hunt: Conceptualization; software; formal analysis; funding acquisition; project administration; writing – original draft; methodology; data curation; validation; supervision; visualization; resources; writing – review and editing.

ACKNOWLEDGMENTS

This work was supported by a grant from the US NIH-NIGMS to JFH (GM127883). We thank Accendro Inc. for assistance with web programming and G.T. Montelione, R. Xiao, and the other members of the Northeast Structural Genomics Consortium for long-term collaboration and advice.

CONFLICT OF INTEREST STATEMENT

JFH is a member of the scientific advisory board of Nexomics Biosciences.

Supporting Information

Filename: pro4898-sup-0001-supinfo.pdf (PDF document, 1.1 MB)

Description:

Table S1. Data collection and refinement statistics for Bulk KR crystal structures.

Figure S1. Pairwise percent sequence identity distributions between homologs in mutually exclusive bins spanning 10% ranges of sequence identity to target protein hPDIa.

Figure S2. Stereo views of crystal-packing interactions in structures from the parental proteins and protein constructs containing Bulk KR mutations. The protein backbone is shown in ribbon representation, colored gray for symmetry mates, and shades of green, blue, cyan, and teal for the subunits modeled in the asymmetric unit of each crystal structure. The sidechains of the native arginines (magenta) and residues mutated from lysine-to-arginine (red) are shown in space-filling representation. Note that the red residues in the parental hPDI(abb′xa′) and MA_2137-D65R structures on the left are the native lysine residues, while the red residues in the hPDIa-9KR and MA_2137-D65R-11KR structures on the right are the mutated arginine residues.

Figure S3. Crystal structures of Bulk KR-substituted protein constructs show no significant conformational or stereochemical changes compared to reference structures. (a) Comparison of our crystal structure of hPDIa-9KR (Figure 5b and Table S1) to an earlier structure of the same domain as part of a much larger multi-domain hPDI(abb′xa′) construct (PDB ID 4EKZ) (Wang et al., 2013). (b) Comparison of our crystal structures of MA_2137-D65R-11KR (Figure 5d and Table S1) and MA_2137-D65R (PDB ID 6MRO). The loop at residues 141–152 is ordered by a high concentration of Ca⁺⁺ present in the mother liquor for the latter structure, while there was no Ca⁺⁺ present in the mother liquor for the former structure. Least-squares alignments were performed in PyMOL (www.pymol.org) and ChimeraX (Pettersen et al., 2021) using the default algorithms. (c–e) More detailed analyses of the structural differences in the Bulk KR constructs compared to the parental domains were performed using the Global Distance Test (GDT) (Zhang & Skolnick, 2004). The top graphs in panels (c and d) show the Cα displacements between all shared residues in the alignments used for the test; the vertical dotted red lines in these panels indicate the sites of the KR mutations. The bottom graphs in these panels show the corresponding histograms of the Cα displacements for all residues (blue) and the engineered arginine residues (red). Panel (e) shows the fraction of Cα atoms within the indicated cutoff distances as calculated by the High Accuracy GDT algorithm.

Figure S4. Solubility assays on selected hPDIa and MA_2137 constructs. Assays were conducted at room temperature. The lower graph in the center column shows data collected without S-adenosylhomocysteine (SAH) in the buffer, which is a product of the methyltransferase reaction catalyzed by MA_2137, while all of the other solubility assays on this protein were conducted in the presence of 1 mM SAH. The top graphs in both MA_2137 columns show the same data, which are duplicated but matched in scale to the graphs beneath them in order to provide a visual reference.

Figure S5. Optical micrographs showing time evolution of representative MA_2137-D65R crystallization screening reactions showing evidence of liquid–liquid phase separation. Representative micrographs from the HWI generation #19 1536-well crystal screen (https://hwi.buffalo.edu/crystallization-cocktails/) show evidence of liquid–liquid phase separation for MA_2137-D65R. Other than a short engineered C-terminal affinity tag and a 12-residue internal loop, likely to be a calcium-binding site (Figure S3b), the backbone atoms of all residues in this protein are well-ordered based on our crystal structures.

Figure S6. Comparison of overrepresentation ratios of amino acids in crystal-packing interfaces in our analysis of 87,684 crystal structures (Naumov et al., 2019) (Figure 1) to an earlier analysis of contact potentials in crystal-packing interfaces in 233 crystal structures (Dasgupta et al., 1997).


REFERENCES

Acton TB, Xiao R, Anderson S, Aramini J, Buchwald WA, Ciccosanti C, et al. Preparation of protein samples for NMR structure, function, and small-molecule screening studies. Methods Enzymol. 2011; 493: 21–60. https://doi.org/10.1016/B978-0-12-381274-2.00002-9
Anstrom DM, Colip L, Moshofsky B, Hatcher E, Remington SJ. Systematic replacement of lysine with glutamine and alanine in Escherichia coli malate synthase G: effect on crystallization. Acta Crystallogr Sect F Struct Biol Cryst Commun. 2005; 61: 1069–1074. https://doi.org/10.1107/S1744309105036559
Arakawa T, Ejima D, Tsumoto K, Obeyama N, Tanaka Y, Kita Y, et al. Suppression of protein interactions by arginine: a proposed mechanism of the arginine effects. Biophys Chem. 2007; 127: 1–8. https://doi.org/10.1016/j.bpc.2006.12.007
Arakawa T, Timasheff SN. Mechanism of poly(ethylene glycol) interaction with proteins. Biochemistry. 1985; 24: 6756–6762. https://doi.org/10.1021/bi00345a005
Arakawa T, Tsumoto K. The effects of arginine on refolding of aggregated proteins: not facilitate refolding, but suppress aggregation. Biochem Biophys Res Commun. 2003; 304: 148–152. https://doi.org/10.1016/s0006-291x(03)00578-3
Auton M, Bolen DW. Application of the transfer model to understand how naturally occurring osmolytes affect protein stability. Methods Enzymol. 2007; 428: 397–418. https://doi.org/10.1016/S0076-6879(07)28023-1
Auton M, Holthauzen LM, Bolen DW. Anatomy of energetic changes accompanying urea-induced protein denaturation. Proc Natl Acad Sci U S A. 2007; 104: 15317–15322. https://doi.org/10.1073/pnas.0706251104
Auton M, Rosgen J, Sinev M, Holthauzen LM, Bolen DW. Osmolyte effects on protein stability and solubility: a balancing act between backbone and side-chains. Biophys Chem. 2011; 159: 90–99. https://doi.org/10.1016/j.bpc.2011.05.012
Banayan N. A computational approach to rational engineering of protein crystallization. PhD thesis. Columbia University in the City of New York; 2023. https://doi.org/10.7916/cq51-nf34
Banayan N, Loughlin BJ, Singh S, Forouhar F, Lu G, Wong K-H, et al. Systematic enhancement of protein crystallization efficiency by bulk lysine-to-arginine (KR) substitution. bioRxiv. 2023. https://doi.org/10.1101/2023.06.03.543563
Beck DA, Bennion BJ, Alonso DO, Daggett V. Simulations of macromolecules in protective and denaturing osmolytes: properties of mixed solvent systems and their effects on water and protein structure and dynamics. Methods Enzymol. 2007; 428: 373–396. https://doi.org/10.1016/S0076-6879(07)28022-X
Bennion BJ, Daggett V. The molecular basis for the chemical denaturation of proteins by urea. Proc Natl Acad Sci U S A. 2003; 100: 5142–5147. https://doi.org/10.1073/pnas.0930122100
Bhat R, Timasheff SN. Steric exclusion is the principal source of the preferential hydration of proteins in the presence of polyethylene glycols. Protein Sci. 1992; 1: 1133–1143. https://doi.org/10.1002/pro.5560010907
Bhowmick A, Head-Gordon T. A monte carlo method for generating side chain structural ensembles. Structure. 2015; 23: 44–55. https://doi.org/10.1016/j.str.2014.10.011
Bijak V, Szczygiel M, Lenkiewicz J, Gucwa M, Cooper DR, Murzyn K, et al. The current role and evolution of X-ray crystallography in drug discovery and development. Expert Opin Drug Discovery. 2023; 18: 1221–1230. https://doi.org/10.1080/17460441.2023.2246881
Boel G, Letso R, Neely H, Price WN, Wong KH, Su M, et al. Codon influence on protein expression in E. coli correlates with mRNA levels. Nature. 2016; 529: 358–363. https://doi.org/10.1038/nature16509
Bolen DW, Rose GD. Structure and energetics of the hydrogen-bonded backbone in protein folding. Annu Rev Biochem. 2008; 77: 339–362. https://doi.org/10.1146/annurev.biochem.77.061306.131357
Budziszewski GR, Snell ME, Wright TR, Lynch ML, Bowman SEJ. High-throughput screening to obtain crystal hits for protein crystallography. J Vis Exp. 2023; e65211. https://doi.org/10.3791/65211
Canaves JM, Page R, Wilson IA, Stevens RC. Protein biophysical properties that correlate with crystallization success in Thermotoga maritima: maximum clustering strategy for structural genomics. J Mol Biol. 2004; 344: 977–991. https://doi.org/10.1016/j.jmb.2004.09.076
Casanal A, Lohkamp B, Emsley P. Current developments in Coot for macromolecular model building of Electron Cryo-microscopy and Crystallographic Data. Protein Sci. 2020; 29: 1069–1078. https://doi.org/10.1002/pro.3791
Chowdhury R, Bouatta N, Biswas S, Floristean C, Kharkar A, Roy K, et al. Single-sequence protein structure prediction using a language model and deep learning. Nat Biotechnol. 2022; 40: 1617–1623. https://doi.org/10.1038/s41587-022-01432-w
Cieslik M, Derewenda ZS. The role of entropy and polarity in intermolecular contacts in protein crystals. Acta Crystallogr D Biol Crystallogr. 2009; 65: 500–509. https://doi.org/10.1107/S0907444909009500
Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009; 25: 1422–1423. https://doi.org/10.1093/bioinformatics/btp163
Cooper DR, Boczek T, Grelewska K, Pinkowska M, Sikorska M, Zawadzki M, et al. Protein crystallization by surface entropy reduction: optimization of the SER strategy. Acta Crystallogr D Biol Crystallogr. 2007; 63: 636–645. https://doi.org/10.1107/S0907444907010931
Courtenay ES, Capp MW, Record MT Jr. Thermodynamics of interactions of urea and guanidinium salts with protein surface: relationship between solute effects on protein processes and changes in water-accessible surface area. Protein Sci. 2001; 10: 2485–2497. https://doi.org/10.1110/ps.ps.20801
Courtenay ES, Capp MW, Saecker RM, Record MT Jr. Thermodynamic analysis of interactions between denaturants and protein surface exposed on unfolding: interpretation of urea and guanidinium chloride m-values and their correlation with changes in accessible surface area (ASA) using preferential interaction coefficients and the local-bulk domain model. Proteins. 2000;41: 72–85. https://doi.org/10.1002/1097-0134(2000)41:4+<72::aid-prot70>3.0.co;2-7
Cruickshank D. The required precision of intensity measurements for single-crystal analysis. Acta Crystallogr. 1960; 13: 774–777. https://doi.org/10.1107/S0365110X60001874
Czepas J, Devedjiev Y, Krowarsch D, Derewenda U, Otlewski J, Derewenda ZS. The impact of Lys→Arg surface mutations on the crystallization of the globular domain of RhoGDI. Acta Crystallogr D Biol Crystallogr. 2004; 60: 275–280. https://doi.org/10.1107/S0907444903026271
Das U, Hariprasad G, Ethayathulla AS, Manral P, Das TK, Pasha S, et al. Inhibition of protein aggregation: supramolecular assemblies of arginine hold the key. PLoS One. 2007; 2:e1176. https://doi.org/10.1371/journal.pone.0001176
Dasgupta S, Iyer GH, Bryant SH, Lawrence CE, Bell JA. Extent and nature of contacts between protein molecules in crystal lattices and between subunits of protein oligomers. Proteins. 1997; 28: 494–514. https://doi.org/10.1002/(SICI)1097-0134(199708)28:4<494::AID-PROT4>3.0.CO;2-A
Derewenda ZS. The use of recombinant methods and molecular engineering in protein crystallization. Methods. 2004a; 34: 354–363. https://doi.org/10.1016/j.ymeth.2004.03.024
Derewenda ZS. Rational protein crystallization by mutational surface engineering. Structure. 2004b; 12: 529–535. https://doi.org/10.1016/j.str.2004.03.008
Derewenda ZS, Godzik A. The “sticky patch” model of crystallization and modification of proteins for enhanced crystallizability. Methods Mol Biol. 2017; 1607: 77–115. https://doi.org/10.1007/978-1-4939-7000-1_4
Derewenda ZS, Vekilov PG. Entropy and surface engineering in protein crystallization. Acta Crystallogr D Biol Crystallogr. 2006; 62: 116–124. https://doi.org/10.1107/S0907444905035237
Donald JE, Kulp DW, DeGrado WF. Salt bridges: geometrically specific, designable interactions. Proteins. 2011; 79: 898–915. https://doi.org/10.1002/prot.22927
DuBay KH, Geissler PL. Calculation of proteins' total side-chain torsional entropy and its influence on protein-ligand interactions. J Mol Biol. 2009; 391: 484–497. https://doi.org/10.1016/j.jmb.2009.05.068
Everett JK, Tejero R, Murthy SB, Acton TB, Aramini JM, Baran MC, et al. A community resource of experimental data for NMR/X-ray crystal structure pairs. Protein Sci. 2016; 25: 30–45. https://doi.org/10.1002/pro.2774
Ferreon AC, Bolen DW. Thermodynamics of denaturant-induced unfolding of a protein that exhibits variable two-state denaturation. Biochemistry. 2004; 43: 13357–13369. https://doi.org/10.1021/bi048666j
Gill SC, von Hippel PH. Calculation of protein extinction coefficients from amino acid sequence data. Anal Biochem. 1989; 182: 319–326. https://doi.org/10.1016/0003-2697(89)90602-7
Goedken ER, Keck JL, Berger JM, Marqusee S. Divalent metal cofactor binding in the kinetic folding trajectory of Escherichia coli ribonuclease HI. Protein Sci. 2000; 9: 1914–1921. https://doi.org/10.1110/ps.9.10.1914
Goedken ER, Marqusee S. Co-crystal of Escherichia coli RNase HI with Mn2+ ions reveals two divalent metals bound in the active site. J Biol Chem. 2001; 276: 7266–7271. https://doi.org/10.1074/jbc.M009626200
Graf PC, Martinez-Yamout M, VanHaerents S, Lilie H, Dyson HJ, Jakob U. Activation of the redox-regulated chaperone Hsp33 by domain unfolding. J Biol Chem. 2004; 279: 20529–20538. https://doi.org/10.1074/jbc.M401764200
Grimes JM, Hall DR, Ashton AW, Evans G, Owen RL, Wagner A, et al. Where is crystallography going? Acta Crystallogr D Struct Biol. 2018; 74: 152–166. https://doi.org/10.1107/S2059798317016709
Gupta K, Varadarajan R. Insights into protein structure, stability and function from saturation mutagenesis. Curr Opin Struct Biol. 2018; 50: 117–125. https://doi.org/10.1016/j.sbi.2018.02.006
Hendrickson WA. Synchrotron crystallography. Trends Biochem Sci. 2000; 25: 637–643. https://doi.org/10.1016/s0968-0004(00)01721-7
Hendrickson WA. Facing the phase problem. IUCrJ. 2023; 10: 521–543. https://doi.org/10.1107/S2052252523006449
Hendrickson WA, Horton JR, LeMaster DM. Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD): a vehicle for direct determination of three-dimensional structure. EMBO J. 1990; 9: 1665–1672. https://doi.org/10.1002/j.1460-2075.1990.tb08287.x
Henikoff S, Henikoff JG. Performance evaluation of amino acid substitution matrices. Proteins. 1993; 17: 49–61. https://doi.org/10.1002/prot.340170108
Hoffstrom BG, Kaplan A, Letso R, Schmid RS, Turmel GJ, Lo DC, et al. Inhibitors of protein disulfide isomerase suppress apoptosis induced by misfolded proteins. Nat Chem Biol. 2010; 6: 900–906. https://doi.org/10.1038/nchembio.467
Holthauzen LM, Rosgen J, Bolen DW. Hydrogen bonding progressively strengthens upon transfer of the protein urea-denatured state to water and protecting osmolytes. Biochemistry. 2010; 49: 1310–1318. https://doi.org/10.1021/bi9015499
Hopf TA, Ingraham JB, Poelwijk FJ, Scharfe CP, Springer M, Sander C, et al. Mutation effects predicted from sequence co-variation. Nat Biotechnol. 2017; 35: 128–135. https://doi.org/10.1038/nbt.3769
Hu CY, Kokubo H, Lynch GC, Bolen DW, Pettitt BM. Backbone additivity in the transfer model of protein solvation. Protein Sci. 2010; 19: 1011–1022. https://doi.org/10.1002/pro.378
Ishikawa K, Nakamura H, Morikawa K, Kanaya S. Stabilization of Escherichia coli ribonuclease HI by cavity-filling mutations within a hydrophobic core. Biochemistry. 1993; 32: 6171–6178. https://doi.org/10.1021/bi00075a009
Janda I, Devedjiev Y, Cooper D, Chruszcz M, Derewenda U, Gabrys A, et al. Harvesting the high-hanging fruit: the structure of the YdeN gene product from Bacillus subtilis at 1.8 angstroms resolution. Acta Crystallogr D Biol Crystallogr. 2004; 60: 1101–1107. https://doi.org/10.1107/S0907444904007188
Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021; 596: 583–589. https://doi.org/10.1038/s41586-021-03819-2
Jumper J, Hassabis D. Protein structure predictions to atomic accuracy with AlphaFold. Nat Methods. 2022; 19: 11–12. https://doi.org/10.1038/s41592-021-01362-6
Kabsch W. Automatic indexing of rotation diffraction patterns. J Appl Crystallogr. 1988a; 21: 67–71. https://doi.org/10.1107/S0021889887009737
Kabsch W. Evaluation of single-crystal x-ray diffraction data from a position-sensitive detector. J Appl Crystallogr. 1988b; 21: 916–924. https://doi.org/10.1107/S0021889888007903
Kabsch W. XDS. Acta Crystallogr D Biol Crystallogr. 2010a; 66: 125–132. https://doi.org/10.1107/S0907444909047337
Kabsch W. Integration, scaling, space-group assignment and post-refinement. Acta Crystallogr D Biol Crystallogr. 2010b; 66: 133–144. https://doi.org/10.1107/S0907444909047374
Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983; 22: 2577–2637. https://doi.org/10.1002/bip.360221211
Katayanagi K, Ishikawa M, Okumura M, Ariyoshi M, Kanaya S, Kawano Y, et al. Crystal structures of ribonuclease HI active site mutants from Escherichia coli. J Biol Chem. 1993; 268: 22092–22099. https://doi.org/10.1016/S0021-9258(20)80652-8
Katayanagi K, Miyagawa M, Matsushima M, Ishikawa M, Kanaya S, Nakamura H, et al. Structural details of ribonuclease H from Escherichia coli as refined to an atomic resolution. J Mol Biol. 1992; 223: 1029–1052. https://doi.org/10.1016/0022-2836(92)90260-q
Katayanagi K, Okumura M, Morikawa K. Crystal structure of Escherichia coli RNase HI in complex with Mg2+ at 2.8 A resolution: proof for a single Mg(2+)-binding site. Proteins. 1993; 17: 337–346. https://doi.org/10.1002/prot.340170402
Kelsic ED, Chung H, Cohen N, Park J, Wang HH, Kishony R. RNA structural determinants of optimal codons revealed by MAGE-Seq. Cell Syst. 2016; 3: 563–571.e6. https://doi.org/10.1016/j.cels.2016.11.004
Kendrew JC. Structure and function in myoglobin and other proteins. Fed Proc. 1959; 18: 740–751.
Kendrew JC, Bodo G, Dintzis HM, Parrish RG, Wyckoff H, Phillips DC. A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature. 1958; 181: 662–666. https://doi.org/10.1038/181662a0
Kendrew JC, Perutz MF. A comparative X-ray study of foetal and adult sheep haemoglobins. Proc R Soc Lond A Math Phys Sci. 1948; 194: 375–398. https://doi.org/10.1098/rspa.1948.0087
Khan MM, Simizu S, Kawatani M, Osada H. The potential of protein disulfide isomerase as a therapeutic drug target. Oncol Res. 2011; 19: 445–453. https://doi.org/10.3727/096504011x13123323849717
Kirk DB, Hwu WW. Programming massively parallel processors—a hands-on approach. 2nd ed. Morgan Kaufmann/Elsevier: Burlington, MA; 2023.
Kita Y, Arakawa T, Lin TY, Timasheff SN. Contribution of the surface free energy perturbation to protein-solvent interactions. Biochemistry. 1994; 33: 15178–15189. https://doi.org/10.1021/bi00254a029
Liao Z, Oyama T, Kitagawa Y, Katayanagi K, Morikawa K, Oda M. Pivotal role of a conserved histidine in Escherichia coli ribonuclease HI as proposed by X-ray crystallography. Acta Crystallogr D Struct Biol. 2022; 78: 390–398. https://doi.org/10.1107/s2059798322000870
Liebschner D, Afonine PV, Baker ML, Bunkoczi G, Chen VB, Croll TI, et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr D Struct Biol. 2019; 75: 861–877. https://doi.org/10.1107/S2059798319011471
Lim WK, Rosgen J, Englander SW. Urea, but not guanidinium, destabilizes proteins by forming hydrogen bonds to the peptide group. Proc Natl Acad Sci U S A. 2009; 106: 2595–2600. https://doi.org/10.1073/pnas.0812588106
Liu H, Fu H, Chipot C, Shao X, Cai W. Accurate description of solvent-exposed salt bridges with a non-polarizable force field incorporating solvent effects. J Chem Inf Model. 2022; 62: 3863–3873. https://doi.org/10.1021/acs.jcim.2c00678
Liu Q, Hendrickson WA. Contemporary use of anomalous diffraction in biomolecular structure analysis. Methods Mol Biol. 2017; 1607: 377–399. https://doi.org/10.1007/978-1-4939-7000-1_16
Liu Y, Bolen DW. The peptide backbone plays a dominant role in protein stabilization by naturally occurring osmolytes. Biochemistry. 1995; 34: 12884–12891. https://doi.org/10.1021/bi00039a051
Longenecker KL, Garrard SM, Sheffield PJ, Derewenda ZS. Protein crystallization by rational mutagenesis of surface residues: Lys to Ala mutations promote crystallization of RhoGDI. Acta Crystallogr D Biol Crystallogr. 2001; 57: 679–688. https://doi.org/10.1107/S0907444901003122
Luft J, Wolfley J, Jurisica I, Glasgow J, Fortier S, DeTitta GT. Macromolecular crystallization in a high throughput laboratory in the search phase. J Cryst Growth. 2001; 232: 591–595. https://doi.org/10.1016/S0022-0248(01)01206-4
Luft JR, Collins RJ, Fehrman NA, Lauricella AM, Veatch CK, DeTitta GT. A deliberate approach to screening for initial crystallization conditions of biological macromolecules. J Struct Biol. 2003; 142: 170–179. https://doi.org/10.1016/S1047-8477(03)00048-0
Luft JR, Snell EH, Detitta GT. Lessons from high-throughput protein crystallization screening: 10 years of practical experience. Expert Opin Drug Discovery. 2011; 6: 465–480. https://doi.org/10.1517/17460441.2011.566857
Luft JR, Wolfley JR, Snell EH. What's in a drop? Correlating observations and outcomes to guide macromolecular crystallization experiments. Cryst Growth Des. 2011; 11: 651–663. https://doi.org/10.1021/cg1013945
Lynch ML, Snell ME, Potter SA, Snell EH, Bowman SEJ. 20 years of crystal hits: progress and promise in ultrahigh-throughput crystallization screening. Acta Crystallogr D Struct Biol. 2023; 79: 198–205. https://doi.org/10.1107/S2059798323001274
Makhatadze GI, Privalov PL. Protein interactions with urea and guanidinium chloride. A calorimetric study. J Mol Biol. 1992; 226: 491–505. https://doi.org/10.1016/0022-2836(92)90963-K
Mateja A, Devedjiev Y, Krowarsch D, Longenecker K, Dauter Z, Otlewski J, et al. The impact of Glu→Ala and Glu→Asp mutations on the crystallization properties of RhoGDI: the structure of RhoGDI at 1.3 Å resolution. Acta Crystallogr D Biol Crystallogr. 2002; 58: 1983–1991. https://doi.org/10.1107/S090744490201394X
McRee DE. XtalView/Xfit—a versatile program for manipulating atomic coordinates and electron density. J Struct Biol. 1999; 125: 156–165. https://doi.org/10.1006/jsbi.1999.4094
Mitchell JB, Nandi CL, McDonald IK, Thornton JM, Price SL. Amino/aromatic interactions in proteins: is the evidence stacked against hydrogen bonding? J Mol Biol. 1994; 239: 315–331. https://doi.org/10.1006/jmbi.1994.1370
Moayed F, Bezrukavnikov S, Naqvi MM, Groitl B, Cremers CM, Kramer G, et al. The anti-aggregation holdase Hsp33 promotes the formation of folded protein structures. Biophys J. 2020; 118: 85–95. https://doi.org/10.1016/j.bpj.2019.10.040
Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci U S A. 2011; 108: E1293–E1301. https://doi.org/10.1073/pnas.1111471108
Nandi PK, Robinson DR. Effects of urea and guanidine hydrochloride on peptide and nonpolar groups. Biochemistry. 1984; 23: 6661–6668. https://doi.org/10.1021/bi00321a058
Naumov V, Price WN, Handelman SK, Hunt JF. Engineering surface epitopes to improve protein crystallization. United States patent 16/252337. 2019.
Nisthal A, Wang CY, Ary ML, Mayo SL. Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis. Proc Natl Acad Sci U S A. 2019; 116: 16367–16377. https://doi.org/10.1073/pnas.1903888116
Nozaki Y, Tanford C. The solubility of amino acids, diglycine, and triglycine in aqueous guanidine hydrochloride solutions. J Biol Chem. 1970; 245: 1648–1652. https://doi.org/10.1016/S0021-9258(19)77141-5
Oeffner RD, Croll TI, Millan C, Poon BK, Schlicksup CJ, Read RJ, et al. Putting AlphaFold models to work with phenix.process_predicted_model and ISOLDE. Acta Crystallogr D Struct Biol. 2022; 78: 1303–1314. https://doi.org/10.1107/S2059798322010026
Okuta R, Unno Y, Nishino D, Hido S, Loomis C. CuPy: A NumPy-Compatible Library for NVIDIA GPU Calculations. 2017.
Olson CA, Spek EJ, Shi Z, Vologodskii A, Kallenbach NR. Cooperative helix stabilization by complex Arg-Glu salt bridges. Proteins. 2001; 44: 123–132. https://doi.org/10.1002/prot.1079
Otwinowski Z, Minor W. Processing of x-ray diffraction data collected in oscillation mode. Methods Enzymol. 1997; 276: 307–326. https://doi.org/10.1016/S0076-6879(97)76066-X
Pace CN, Trevino S, Prabhakaran E, Scholtz JM. Protein structure, stability and solubility in water and other solvents. Philos Trans R Soc Lond B Biol Sci. 2004; 359: 1225–1234; discussion 1234–1235. https://doi.org/10.1098/rstb.2004.1500
Pettersen EF, Goddard TD, Huang CC, Meng EC, Couch GS, Croll TI, et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 2021; 30: 70–82. https://doi.org/10.1002/pro.3943
Price WN 2nd, Chen Y, Handelman SK, Neely H, Manor P, Karlin R, et al. Understanding the physical properties that control protein crystallization by analysis of large-scale experimental data. Nat Biotechnol. 2009; 27: 51–57. https://doi.org/10.1038/nbt.1514
Qiu X, Janson CA. Structure of apo acyl carrier protein and a proposal to engineer protein crystallization through metal ions. Acta Crystallogr D Biol Crystallogr. 2004; 60: 1545–1554. https://doi.org/10.1107/S0907444904015422
Robinson DR, Jencks WP. Effect of denaturing agents of the urea-guanidinium class on the solubility of acetyltetraglycine ethyl ester and related compounds. J Biol Chem. 1963; 238: 1558–1560. https://doi.org/10.1016/S0021-9258(18)81223-6
Robinson DR, Jencks WP. The effect of compounds of the urea-guanidinium class on the activity coefficient of acetyltetraglycine ethyl ester and related compounds. J Am Chem Soc. 1965; 87: 2462–2470. https://doi.org/10.1021/ja01089a028
Rost B, Sander C. Conservation and prediction of solvent accessibility in protein families. Proteins. 1994; 20: 216–226. https://doi.org/10.1002/prot.340200303
Sanishvili R, Fischetti RF. Applications of x-ray micro-beam for data collection. Methods Mol Biol. 2017; 1607: 219–238. https://doi.org/10.1007/978-1-4939-7000-1_9
Schellman JA. The thermodynamic stability of proteins. Annu Rev Biophys Biophys Chem. 1987; 16: 115–137. https://doi.org/10.1146/annurev.bb.16.060187.000555
Schrier MY, Schrier EE. Transfer free energies and average static accessibilities for ribonuclease A in guanidinium hydrochloride and urea solutions. Biochemistry. 1976; 15: 2607–2612. https://doi.org/10.1021/bi00657a020
Scott JN, Nucci NV, Vanderkooi JM. Changes in water structure induced by the guanidinium cation and implications for protein denaturation. J Phys Chem A. 2008; 112: 10939–10948. https://doi.org/10.1021/jp8058239
Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, et al. Improved protein structure prediction using potentials from deep learning. Nature. 2020; 577: 706–710. https://doi.org/10.1038/s41586-019-1923-7
Sheldrick GM. Experimental phasing with SHELXC/D/E: combining chain tracing with density modification. Acta Crystallogr D Biol Crystallogr. 2010; 66: 479–485. https://doi.org/10.1107/S0907444909038360
Slabinski L, Jaroszewski L, Rodrigues AP, Rychlewski L, Wilson IA, Lesley SA, et al. The challenge of protein structure determination—lessons from structural genomics. Protein Sci. 2007; 16: 2472–2482. https://doi.org/10.1110/ps.073037907
Snell EH, Nagel RM, Wojtaszcyk A, O'Neill H, Wolfley JL, Luft JR. The application and use of chemical space mapping to interpret crystallization screening results. Acta Crystallogr D Biol Crystallogr. 2008; 64: 1240–1249. https://doi.org/10.1107/S0907444908032411
Sokalingam S, Raghunathan G, Soundrarajan N, Lee SG. A study on the effect of surface lysine to arginine mutagenesis on protein stability and structure using green fluorescent protein. PLoS One. 2012; 7: e40410. https://doi.org/10.1371/journal.pone.0040410
Srinivasan R, Rose GD. A physical basis for protein secondary structure. Proc Natl Acad Sci U S A. 1999; 96: 14258–14263. https://doi.org/10.1073/pnas.96.25.14258
Sternberg MJ, Chickos JS. Protein side-chain conformational entropy derived from fusion data—comparison with other empirical scales. Protein Eng. 1994; 7: 149–155. https://doi.org/10.1093/protein/7.2.149
Terrific Broth. Cold Spring Harbor Protocols. 2015. https://doi.org/10.1101/pdb.rec087874
Terwilliger TC. Maximum-likelihood density modification using pattern recognition of structural motifs. Acta Crystallogr D Biol Crystallogr. 2001; 57: 1755–1762. https://doi.org/10.1107/S0907444901013737
Terwilliger TC, Afonine PV, Liebschner D, Croll TI, McCoy AJ, Oeffner RD, et al. Accelerating crystal structure determination with iterative AlphaFold prediction. Acta Crystallogr D Struct Biol. 2023; 79: 234–244. https://doi.org/10.1107/S205979832300102X
Terwilliger TC, Poon BK, Afonine PV, Schlicksup CJ, Croll TI, Millan C, et al. Improved AlphaFold modeling with implicit experimental information. Nat Methods. 2022; 19: 1376–1382. https://doi.org/10.1038/s41592-022-01645-6
Thomas J, Ramakrishnan N, Bailey-Kellogg C. Graphical models of residue coupling in protein families. IEEE/ACM Trans Comput Biol Bioinform. 2008; 5: 183–197. https://doi.org/10.1109/TCBB.2007.70225
Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994; 22: 4673–4680. https://doi.org/10.1093/nar/22.22.4673
Tien MZ, Meyer AG, Sydykova DK, Spielman SJ, Wilke CO. Maximum allowed solvent accessibilites of residues in proteins. PLoS One. 2013; 8: e80635. https://doi.org/10.1371/journal.pone.0080635
Tischer A, Lilie H, Auton M, Lange C. Oxidative refolding of rPA in l-ArgHCl and in ionic liquids: A correlation between hydrophobicity, salt effects, and refolding yield. Biopolymers. 2014; 101: 1129–1140. https://doi.org/10.1002/bip.22518
Tischer A, Lilie H, Rudolph R, Lange C. L-arginine hydrochloride increases the solubility of folded and unfolded recombinant plasminogen activator rPA. Protein Sci. 2010; 19: 1783–1795. https://doi.org/10.1002/pro.465
Tsumoto K, Ejima D, Kita Y, Arakawa T. Review: Why is arginine effective in suppressing aggregation? Protein Pept Lett. 2005; 12: 613–619. https://doi.org/10.2174/0929866054696109
Vagin A, Teplyakov A. Molecular replacement with MOLREP. Acta Crystallogr D Biol Crystallogr. 2010; 66: 22–25. https://doi.org/10.1107/S0907444909042589
Vener MV, Odinokov AV, Wehmeyer C, Sebastiani D. The structure and IR signatures of the arginine-glutamate salt bridge. Insights from the classical MD simulations. J Chem Phys. 2015; 142: 215106. https://doi.org/10.1063/1.4922165
Venkatesu P, Lee MJ, Lin HM. Thermodynamic characterization of the osmolyte effect on protein stability and the effect of GdnHCl on the protein denatured state. J Phys Chem B. 2007; 111: 9045–9056. https://doi.org/10.1021/jp0701901
Wang C, Li W, Ren J, Fang J, Ke H, Gong W, et al. Structural insights into the redox-regulated dynamic conformations of human protein disulfide isomerase. Antioxid Redox Signal. 2013; 19: 36–45. https://doi.org/10.1089/ars.2012.4630
Wetlaufer DB, Lovrien R. Induction of reversible structural changes in proteins by nonpolar substances. J Biol Chem. 1964; 239: 596–603. https://doi.org/10.1016/S0021-9258(18)51725-7
Wetlaufer DB, Malik SK, Stoller S, Coffin RL. Nonpolar group participation in the denaturation of proteins by urea and guanidinium salts. Model compound studies. J Am Chem Soc. 1964; 86: 508–514. https://doi.org/10.1021/ja01057a045
Wilson MA. Mapping enzyme landscapes by time-resolved crystallography with synchrotron and x-ray free electron laser light. Annu Rev Biophys. 2022; 51: 79–98. https://doi.org/10.1146/annurev-biophys-100421-110959
Xiao R, Anderson S, Aramini J, Belote R, Buchwald WA, Ciccosanti C, et al. The high-throughput protein sample production platform of the Northeast Structural Genomics Consortium. J Struct Biol. 2010; 172: 21–33. https://doi.org/10.1016/j.jsb.2010.07.011
Yang M, Ferreon AC, Bolen DW. Structural thermodynamics of a random coil protein in guanidine hydrochloride. Proteins. 2000; 4: 44–49. https://doi.org/10.1002/1097-0134(2000)41:4+<44::aid-prot40>3.3.co;2-z
10.1002/1097-0134(2000)41:4+<44::AID-PROT40>3.0.CO;2-7
Yang W, Hendrickson WA, Crouch RJ, Satow Y. Structure of ribonuclease H phased at 2 A resolution by MAD analysis of the selenomethionyl protein. Science. 1990; 249: 1398–1405. https://doi.org/10.1126/science.2169648
Zarrine-Afsar A, Mittermaier A, Kay LE, Davidson AR. Protein stabilization by specific binding of guanidinium to a functional arginine-binding surface on an SH3 domain. Protein Sci. 2006; 15: 162–170. https://doi.org/10.1110/ps.051829106
Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins. 2004; 57: 702–710. https://doi.org/10.1002/prot.20264
Zheng W, Borgia A, Buholzer K, Grishaev A, Schuler B, Best RB. Probing the action of chemical denaturant on an intrinsically disordered protein by simulation and experiment. J Am Chem Soc. 2016; 138: 11702–11713. https://doi.org/10.1021/jacs.6b05443

Volume 33, Issue 3, March 2024, e4898