Protein engineering by chemical methods: Incorporation of nonnatural amino acids as a tool for studying protein folding, stability, and function
Funding information: University of Padua, Grant/Award Number: PRAT-2015
Abstract
Proteins are large complex biomolecules that act as the effectors of essentially all cell functions. Due to the intrinsic complexity of protein architecture at the microscopic level and the inadequacy of theoretical methods to predict protein reactivity (ie, folding, stability, and function), protein engineering has emerged as a valuable tool to investigate structure–stability–activity relationships in proteins, and recombinant DNA technologies are nowadays the “gold standard” for site-specifically manipulating a given protein chain. The usefulness of current mutagenesis techniques, however, is limited by the relatively poor chemical diversity of the 20 DNA-coded amino acids, such that it is difficult to precisely assign the observed change of protein stability or function to the variation of a single physicochemical property at a protein site (eg, hydrophobicity, conformational propensity, polarizability, or hydrogen bonding). In this article, we report relevant examples from our laboratory showing that chemical methods, that is, enzyme-catalyzed semisynthesis and stepwise solid-phase synthesis, allow the convenient incorporation of non-natural amino acids with “tailored” side chains into small proteins and thus effectively transfer the structure–activity relationship methodology, typical of the medicinal chemistry approach on small molecules, to the study of folding, stability, and molecular recognition in macromolecular protein systems.
Abbreviations
- aaRS: aminoacyl-tRNA-synthetase
- Abu: α-aminobutyric acid
- ADAMTS-13: A Disintegrin And Metalloproteinase with ThromboSpondin motifs family member 13
- Aib: α-aminoisobutyric acid
- ASA: accessible surface area
- ChCl: choline chloride
- Fmoc: fluorenylmethyloxycarbonyl
- GpIbα: glycoprotein Ibα
- HBTU/HOBt: hexafluorophosphate benzotriazole tetramethyl uronium/hydroxybenzotriazole
- HM2: hirudin variant 2 from Hirudinaria manillensis
- MD: molecular dynamics
- MES: 2-(N-morpholino)ethanesulfonic acid
- MetSO: methionine sulfoxide
- NT: 3-NO2-Tyr
- PAR-1: protease activated receptor 1
- PEG: polyethylene glycol
- PPACK: (D)-Phe-Pro-Arg-CH2-Cl
- PRP: proline-rich peptides
- r.t.: room temperature
- SH3: Src Homology-3 domains
- SPPS: solid-phase peptide synthesis
- SSPA: site-specific perturbation approach
- TLN: thermolysin
- vWF: von Willebrand factor
- αT: α-thrombin
1 INTRODUCTION
Proteins are large complex biomolecules that act as the effectors of essentially all cell functions and play fundamental roles in the structure, function, and regulation of the body's tissues and organs.1 Proteins are made up of hundreds (or thousands) of amino acids, which are linked to one another by peptide bonds to form long chains. There are 20 different types of amino acids that can be combined to make a protein, and the sequence in which these amino acids are found along the polypeptide chain (that is, the primary structure) determines the 3D structure of each protein and its unique functional properties. In order to function, the vast majority of proteins require that, after mRNA translation within the cell, the polypeptide chain folds into an ordered and stable tertiary structure. Although it is widely accepted that the primary structure determines the tertiary structure and that this, in turn, specifies protein function, the complexity of protein architecture at the microscopic level is so high that a coherent description of the physicochemical properties is not (yet) available.2 In fact, each amino acid in a protein structure is involved in an intricate network of interactions and the strength of these interactions depends not only on the physicochemical properties of that specific residue but also on those of the surrounding chemical environment.3
In this scenario, protein engineering was aimed at identifying in a polypeptide chain those amino acid residues, and their relevant physicochemical properties, that are important for protein folding, stability, and function, with the final goal of modifying in a predictable manner key properties of a given protein, such as thermodynamic stability, molecular recognition, and catalytic efficiency/specificity. Due to the inherent complexity of proteins and the (still holding) inadequacy of theoretical methods to predict protein reactivity, early protein engineering studies started in the 1970s by exploiting the SSPA, whereby the effect of the chemical modification of a given amino acid in a protein was taken as representative of the role of that residue in dictating the structure, stability, and function of that protein.4 Albeit convincing on theoretical grounds, the application of SSPA was limited by the relatively low yields of chemical modification, the use of harsh conditions (often leading to protein denaturation and [partial] loss of function), and, more importantly, by the poor/moderate specificity of derivatization, which frequently led to ambiguous results.4 The advent of recombinant DNA technology in the late 1980s, as a consolidated technique, represented a “big jump” in the advancement of protein science and much expanded the applicability of SSPA, allowing researchers to site-specifically alter the composition of a given polypeptide chain with ease and thus to more effectively investigate the molecular mechanism of protein folding/stability/function.5 Nevertheless, a quantitative description of the physical and chemical bases that make a polypeptide chain fold efficiently into a well-defined, stable, and functionally active conformation is still elusive.2, 6 These difficulties stem primarily from the fact that nature put together, in a yet unknown manner, different properties (ie, hydrogen bonding capability, conformational propensity, hydrophobicity, polarizability, etc) into the 20 standard protein amino acids. This makes it difficult, if not impossible, to unambiguously relate the variation of the physicochemical properties caused by the amino acid exchange to the observed changes in protein stability or function. From this perspective, the possibility of incorporating into a given polypeptide chain non-coded amino acids with tailored side chains would allow investigators to finely tune the structure at a specific protein site, facilitating dissection of the effects of a given mutation in terms of only one or a few physicochemical properties and greatly expanding the scope of physical organic chemistry in the study of proteins.7 For instance, let us consider the study of the effect of increasing the side-chain volume at a given site (X) of a ligand protein (L) on the affinity for the corresponding protein receptor (R) (Figure 1A). Structure–activity relationship (SAR) studies can be conducted by classical mutagenesis techniques, where X can be substituted by Gly, Ala, Val, Ile, and Leu in the aliphatic series (Table 1). Figure 1B shows that by increasing the side-chain volume there is also an increase in the apolar character, as given by the linear increase of the hydrophobic substituent constant π. Intriguingly, the conformational properties of each substituent amino acid change markedly with the side-chain volume.
Due to its higher conformational entropy, Gly destabilizes both α-helix and β-sheet, whereas the presence of a CH3 group in the Ala side chain restricts the accessible conformational space and strongly stabilizes the α-helical conformation (Table 1).8 A further increase in the side-chain volume, as with the isopropyl group of Val, destabilizes α-helix and stabilizes β-sheet.9 This trend is retained with the sec-butyl group of Ile and is reversed with the isobutyl of Leu, which (like Ala) is a strong helix inducer.9 Hence, a classical mutagenesis study at the X site of the ligand protein in Figure 1A would introduce uncertainty in the interpretation of SAR data, as it is difficult (if not impossible) to dissect the contribution of side-chain volume on the binding strength from that of hydrophobicity and conformational propensity. This problem can be simplified by incorporating non-coded amino acids at the X site, such as Abu and its higher homologues nor-Val and nor-Leu, which are incrementally bulkier and more hydrophobic than Ala but, conversely to Val and Ile, retain the helix-stabilizing properties of Ala (Table 1 and Figure 1B). Another example of the usefulness of non-coded amino acids is the study of aromatic–aromatic interactions, also known as π–π interactions, which have been recognized to play a key role in protein stability and molecular recognition.10 As depicted in Figure 1C, the contribution of aromatic–aromatic interactions involving phenylalanine can be easily addressed by phenylalanine (Phe) → cyclohexylalanine (Cha) substitution. In fact, although Phe and Cha share similar side-chain volume, hydrophobicity, and conformational propensities, Cha has a saturated ring and therefore cannot establish stabilizing π–π interactions.

Property | Gly | Ala | Val | Leu | Abu | Nor-Val | Nor-Leu |
---|---|---|---|---|---|---|---|
VvdW (Å³) a | 48 | 67 | 105 | 124 | 86 | 114 | 133 |
π b | 0 | 0.31 | 1.22 | 1.70 | 0.82 | 1.37 | 1.70 |
Conf c | - d | α | β | α | α | α | α |
- a VvdW is the van der Waals volume (Å³).
- b π is the hydrophobicity constant of the side chain: π = logP(X) - logP(Gly), where P(X) and P(Gly) are the water → octanol partition coefficients of the amino acid X and Gly, respectively.
- c The conformational preferences of each amino acid are reported according to its prevalent α-helix or β-sheet stabilizing properties.
- d Although Gly can be found in both helical and sheet structures, it is an intrinsic destabilizer of secondary structures, due to its high conformational flexibility.
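The covariation of side-chain volume and hydrophobicity noted for Figure 1B can be checked directly against the values listed in Table 1. The short Python sketch below fits π against the van der Waals volume by ordinary least squares. The dictionary and function names are illustrative only, and the fit is a quick consistency check on the tabulated numbers, not the analysis behind the original figure.

```python
from math import sqrt

# Values transcribed from Table 1: (van der Waals volume in cubic angstroms, pi).
TABLE_1 = {
    "Gly": (48, 0.00),
    "Ala": (67, 0.31),
    "Val": (105, 1.22),
    "Leu": (124, 1.70),
    "Abu": (86, 0.82),
    "Nor-Val": (114, 1.37),
    "Nor-Leu": (133, 1.70),
}


def linear_fit(xs, ys):
    """Return (slope, intercept, pearson_r) of an ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)


volumes = [volume for volume, _ in TABLE_1.values()]
pi_values = [pi for _, pi in TABLE_1.values()]
slope, intercept, r = linear_fit(volumes, pi_values)

if __name__ == "__main__":
    print(f"pi = {slope:.4f} * V + {intercept:.2f}  (Pearson r = {r:.3f})")
```

Because volume and π are so tightly correlated across both the coded and the non-coded series, a change in binding observed along either series alone cannot separate the two effects; the distinction between the series lies in the conformational row of the table.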
As a natural extension of mutagenesis studies with standard DNA-coded amino acids, several genetic strategies have been pursued during the last two decades for incorporating non-coded amino acids into proteins, in either a residue-specific or a site-specific fashion.11-15 The residue-specific incorporation of non-coded amino acids allows a given amino acid type to be replaced, at any position in the protein chain, with a non-coded counterpart usually having physicochemical properties similar to those of the corresponding natural amino acid.11, 12 This approach makes use of auxotrophic bacterial strains lacking the enzymatic machinery needed to synthesize one selected natural amino acid, whereby the expression host is first grown in a medium containing all 20 natural amino acids and is later shifted to a medium lacking only the amino acid to be replaced and containing the incoming unnatural amino acid. The method relies on the ability of the bacterial aaRS to recognize a certain non-coded amino acid (eg, selenomethionine) as a substrate and to couple it to the tRNA of the natural counterpart (ie, methionine). For this reason, the method is mainly restricted to those non-coded analogs that are isosteres (eg, selenomethionine, p-fluorophenylalanine, or 7-azatryptophan) of the corresponding natural amino acids (ie, methionine, phenylalanine, or tryptophan).16, 17 Mutant aaRS with altered substrate specificity were expressed in auxotrophic Escherichia coli strains, allowing the incorporation of non-coded amino acids with different functionalities (eg, p-acetyl-phenylalanine).17
The site-specific incorporation of non-coded amino acids in whole cells can be achieved by genetic code expansion, whereby an aaRS/tRNA pair is used to insert the non-coded amino acid into the growing polypeptide chain in response to an amber stop codon on the mRNA.18, 19 Genetic code expansion requires an “orthogonal” aaRS/tRNA pair and a blank codon that can be used to encode unnatural amino acid incorporation.20 An “orthogonal” aaRS does not aminoacylate any endogenous tRNAs in the host cell, but specifically aminoacylates its cognate “orthogonal” tRNA, which in turn is not a substrate for endogenous aaRSs, but only for the “orthogonal” aaRS.20 Briefly, an unnatural amino acid, added to the cell growth medium, is specifically recognized by a modified aaRS and coupled to an amber suppressor tRNA, which is decoded by the ribosome in response to an amber codon (UAG) introduced into the gene of interest, thus allowing the synthesis of a protein with a site-specifically introduced unnatural amino acid.20 Genetic code expansion techniques allowed the site-specific incorporation into different protein systems of more than 100 diverse non-coded amino acids, carrying novel chemical entities and spectroscopic probes, for studying protein/enzyme structure, folding, stability, and function.21 Although whole-cell expression methods have much improved the overall efficiency of mutagenesis with non-coded amino acids, compared to cell-free methods originally introduced 13, genetic methods are still limited by the low amount of the resulting mutant proteins that can be obtained.20, 21
In this scenario, chemical methods such as enzyme-catalyzed semisynthesis22 and SPPS23, 24 remain invaluable experimental tools for engineering small proteins with non-coded amino acids. Enzyme-catalyzed semisynthesis exploits the microscopic reversibility of the hydrolase mechanism in water-restricted environments, that is, at moderate/high concentrations of organic cosolvents such as glycerol or propanol, to re-stitch a peptide bond between the carboxy-terminal end of a natural fragment and the amino-terminal end of another fragment, usually of synthetic origin.25 However, enzymatic semisynthesis cannot be applied to every protein system, as it requires careful tailoring of the experimental conditions, that is, pH, temperature, organic cosolvents, choice of the enzyme, etc. By contrast, the development of more efficient carbonyl activators and the improved performance of automated synthesizers have made the manipulation of even long polypeptide chains (50-90 amino acids) experimentally accessible on a routine laboratory basis, allowing any non-coded amino acid to be inserted at any protein site and large amounts of the resulting mutated chain, from milligrams to grams, to be produced for studying protein folding, stability, and function.24 Recently, the efficiency and versatility of SPPS have been extensively exploited to introduce special spectroscopic probes and photosensitizers into self-assembling peptides for in vivo imaging studies or for the construction of biocompatible supramolecular nanostructures to be used in antitumor photodynamic and photothermal therapy.26 Furthermore, photo-inducible chromophores (eg, tris-bipyridyl stable complexes of ruthenium[II]) have been successfully incorporated at the N-terminus of alanine-rich peptides to monitor the ultrafast kinetics of the helix-coil conformational transition.27
In this article, we present relevant applications from our laboratory of protein engineering studies with non-coded amino acids, aimed at elucidating structure–stability and structure–activity relationships in several different protein systems, including: (1) the C-terminal domain 255-316 of TLN, a thermostable bacterial metalloprotease; (2) the N-terminal domain 1-47 of hirudin, a highly potent anticoagulant from blood-sucking leeches; (3) proline-rich peptide binders to SH3 domains; and (4) the oxidation sensitive region 1596-1669 of vWF, a key plasma protein in primary hemostasis.
2 INCREASING PROTEIN THERMOSTABILITY BY ALANINE → Aib REPLACEMENTS
Aib (or α-methylalanine) is a natural non-coded amino acid which was first identified in short (15-20 amino acids) bioactive peptides of microbial origin.28 Aib is intrinsically achiral and can be considered as arising from the replacement of the Cα proton of an Ala residue by a methyl group29, 30 (Figure 2). Furthermore, peptide host–guest studies indicate that Aib has an even higher intrinsic helical propensity than Ala, which is itself the most α-helix stabilizing among the 20 coded amino acids.29 This helix-inducing effect primarily stems from the presence of the geminal CH3-groups on the Cα-atom of Aib, which severely limit the rotation around the N–Cα and Cα–C′ bonds of the peptide backbone to torsion angles (ϕ, ψ) characteristic of helical structures, either α-helix or 3₁₀-helix.30-33 On these grounds, it is reasonable to propose that incorporation of Aib into a polypeptide chain might increase protein stability by decreasing the conformational chain entropy (Sconf) of the polypeptide in the unfolded state, thus making the unfolding transition less likely. The same principle has been previously exploited by introducing Gly → Ala mutations in proteins, where the extra CH3-group reduces the conformational space accessible to Ala (38%) in the unfolded state, compared to Gly (70%) (Figure 2), and thus shifts the F ↔ U equilibrium toward the folded state.34, 35

To evaluate the potential stabilizing effect of Aib at the protein level, we introduced single and double Ala → Aib substitutions in the carboxy-terminal domain 255-316 of thermolysin, TLN(255-316), and studied the conformational and stability properties of the corresponding Aib-containing analogs.36 TLN is a heat-stable zinc-dependent protease secreted by thermophilic bacterium Bacillus thermoproteolyticus and stabilized by four Ca2+ ions.37 NMR38 and differential scanning calorimetry39 analyses reveal that TLN(255-316) in solution predominantly exists as a symmetric homo-dimer, where each monomer has a global fold identical to that of the corresponding sequence in the crystallographic structure of the intact protease, composed of three α-helices encompassing residues 260-274, 281-295, and 301-311.
Aib was incorporated into TLN(255-316) by enzyme-catalyzed coupling of fragment 255-302 to Aib-containing analogs of fragment 303-316 using Glu-C specific V8-protease from Staphylococcus aureus in 50% (v/v) aqueous glycerol, where TLN(255-302) was of natural origin whereas Aib derivatives of TLN(303-316) were produced by SPPS25 (Figure 3). The three Ala residues at position 304, 309, and 312 were each replaced with Aib and the doubly substituted analog Ala304Aib/Ala309Aib was also produced. Notably, the substitution sites display different conformational and accessibility properties. The NMR solution structure of TLN(255-316) indicates that Ala304 and Ala309 are both embedded in the C-terminal α-helix 301-311 in a fixed conformation, with dihedral angles fully compatible with the lowest-energy geometry of Aib (ϕ = ±57°, ψ = ±47°).31 Conversely, Ala312 is in a flexible site, as given by the variance of the ϕ and ψ angles, and explores backbone conformations not allowed to the rigid Aib residue. The substitution sites also display rather different solvent exposure, with Ala304 and Ala312 being moderately exposed and Ala309 completely buried in the apolar environment at the dimer interface.

As schematized in Figure 3, the natural fragment TLN(255-302) was obtained after three consecutive proteolytic reactions, starting from intact TLN. In the first step, incubation of TLN with EDTA resulted in the removal of a bound Ca2+ ion from the omega loop, which became susceptible to proteolysis by the active TLN molecules still present in solution, allowing isolation of the fragment TLN(205-316). In the second proteolytic step, limited proteolysis of TLN(205-316) with subtilisin resulted in the accumulation of TLN(255-316), which was quite resistant to further proteolysis. In the third proteolysis reaction, TLN(255-316) could be fragmented with V8-protease only under denaturing conditions (0.2% SDS or 4 M urea), at the single Glu302 in the first turn of the C-terminal helix, generating fragments TLN(255-302) and TLN(303-316). Circular dichroism (CD) spectroscopy indicated that these proteolytic fragments were largely unfolded, suggesting that the C-terminal helix 301-311 is crucial for TLN(255-316) to acquire a folded native-like structure. Hence, TLN(255-302) was purified by semi-preparative RP-HPLC, whereas Aib derivatives of the 14-residue long TLN(303-316) were prepared by automated stepwise SPPS, using standard Fmoc chemistry on a p-alkoxybenzyl ester resin and the HBTU/HOBt activation procedure. To overcome the poor coupling yields that are often observed in the synthesis of sterically hindered Aib peptides, a double coupling cycle was used for Fmoc-Aib and for the amino acid at the Aib + 1 position along the growing peptide chain. The peptides carrying the Ala → Aib substitution at position 309 or 312 were synthesized with a ~90% yield, whereas those substituted at position 304 underwent a drop in the coupling efficiency (30% yield), due to the presence of the sterically hindered Val residue at position 303.
The covalent coupling of the natural TLN(255-302) fragment to the synthetic TLN(303-316) Aib derivatives, to reconstitute the full-length “mutated” TLN(255-316), was efficiently achieved by exploiting the microscopic reversibility of the enzymatic reactions, whereby the protease (ie, V8-protease) that was able to catalyze the cleavage of the peptide bond Glu302-Val303 in aqueous buffer at pH 7.8 was also able to promote the re-stitching of the same bond under conditions disfavoring hydrolysis and favoring synthesis, that is, in the presence of high concentrations of organic cosolvents such as 50% glycerol or 90% n-propanol, and at pH 6.4 to minimize the relative amount of charged COO− or NH3+ end-groups of Glu302 and Val303, respectively. TLN(255-302) was incubated at room temperature (r.t., 20°C-22°C) with a 5-fold molar excess of each synthetic TLN(303-316) Aib analog. The reaction was monitored by RP-HPLC and, after 72 hours, the Aib analogs of full-length TLN(255-316) were obtained with 80%-90% yields (Figure 4A,B). The homogeneity and chemical identity of the semisynthetic products were established by capillary zone electrophoresis, N-terminal sequence analysis, and high-resolution mass spectrometry. The semisynthetic approach described here is both practical and useful, and allowed us to prepare sufficient quantities of TLN(255-316) analogs (5-10 mg) for conformational and stability analyses. The exceedingly high yields of peptide bond re-stitching likely result from the formation of a fragment complementing system, formed by fragments TLN(255-302) and TLN(303-316), and from the resistance to proteolysis of the semisynthetic reaction products, that is, the TLN(255-316) analogs (see above), which slowly accumulate in the reaction mixture.
The conformational characterization of Aib derivatives was carried out by CD spectroscopy in the far-UV and near-UV region, where far-UV CD is representative of the secondary structure content40 while near-UV CD is a sensitive probe of protein tertiary structure and is taken as a spectroscopic fingerprint of the aromatic side-chain topology.41 The spectra of Aib derivatives in both the far-UV and near-UV region (Figure 4C,D) are superimposable on those recorded for the wild-type TLN(255-316). This is taken as strong evidence that the incorporation of Aib residue(s) does not perturb the native-like structure of TLN(255-316) and allows us to interpret the stability data (see below) as arising solely from the effects of Ala → Aib substitution(s) at the mutation site(s).

The stability of wild-type and Aib derivatives of TLN(255-316) was determined by monitoring the decrease of the ellipticity at 222 nm, representative of the α-helical content, as a function of temperature. The thermal unfolding process was cooperative and fully reversible (>95%). The resulting denaturation curves were analyzed within the framework of a reversible two-state process, N2 ↔ 2 U, whereby at a given temperature only native (fN) and unfolded (fU) fractions of protein molecules are present at significant concentration: 2·fN + fU = 1. From the thermal unfolding curves, the melting temperature (Tm), defined as the temperature at which there is an equal concentration of protein molecules in the native and unfolded state, and the difference in the unfolding free energy change (ΔΔGU) were extracted. The data in Figure 4E and Table 2 provide clear-cut evidence that the effects of Ala → Aib substitution are remarkably context dependent. Indeed, the replacement of Ala304 with Aib enhances Tm by 2.2°C, whereas the same mutation at position 309 or 312 can be either stabilizing (ΔTm = +5.4°C) or destabilizing (ΔTm = −0.6°C). Notably, the double mutated analog Ala304Aib/Ala309Aib was more stable than the wild-type TLN(255-316) by 8.0°C.
Fragment 255-316 | Tm (°C) | ΔTm (°C) | ΔΔGU (kcal·mol−1) |
---|---|---|---|
Natural fragment | 63.5 | - | - |
Ala312Aib | 62.9 | −0.6 | −0.20 |
Ala304Aib | 65.7 | 2.2 | 0.71 |
Ala309Aib | 68.9 | 5.4 | 1.76 |
Ala304Aib/Ala309Aib | 71.5 | 8.0 | 2.43 |
- a The thermodynamic data were extracted from the unfolding curves reported in panel E.

Considering that the Ala → Aib exchange does not alter the conformation of the protein, it follows that γAla ~ γAib in the native state. At the melting temperature of wild-type TLN(255-316), Tm(WT), the contribution of the backbone conformational entropy of unfolding to protein stability is given by: ΔΔGconf = −Tm(WT)·ΔΔSU = +0.74 kcal/mol, in excellent agreement with the value determined experimentally (ΔΔGU = +0.71 kcal/mol) (Table 2). Note that the replacement of an amino acid with higher conformational entropy, such as Gly (γGly = 70%), with Aib would result in an even greater entropic effect, with an estimated ΔΔSU(Gly→Aib) = −3.5 cal/(mol·K).
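The entropic estimate can be unpacked numerically. The Python sketch below back-calculates ΔΔSU from the quoted ΔΔGconf and the wild-type Tm of Table 2, and then, assuming that the unfolded-state entropy difference has the form ΔΔSU = R·ln(γnew/γold) with the native-state terms cancelling, infers the accessible fraction γAib that this implies and re-derives the Gly → Aib value. The functional form, the inferred γAib, and all names are assumptions made for illustration; only γAla = 38%, γGly = 70%, Tm(WT) = 63.5°C, and ΔΔGconf = 0.74 kcal/mol come from the text.

```python
from math import exp, log

R_CAL = 1.987                # gas constant, cal/(mol*K)
TM_WT_K = 63.5 + 273.15      # Tm of wild-type TLN(255-316), Table 2, in kelvin
GAMMA_ALA = 0.38             # accessible conformational space of Ala (from the text)
GAMMA_GLY = 0.70             # accessible conformational space of Gly (from the text)
DDG_CONF_KCAL = 0.74         # quoted conformational term for Ala -> Aib


def entropy_change_cal(ddg_kcal, temp_k):
    """Invert ddG_conf = -T * ddS_U; returns ddS_U in cal/(mol*K)."""
    return -ddg_kcal * 1000.0 / temp_k


def entropy_from_fractions(gamma_new, gamma_old):
    """Assumed form: ddS_U = R * ln(gamma_new / gamma_old), in cal/(mol*K)."""
    return R_CAL * log(gamma_new / gamma_old)


# Entropy change implied by the quoted free energy term.
dds_ala_to_aib = entropy_change_cal(DDG_CONF_KCAL, TM_WT_K)

# Accessible fraction of Aib implied by that entropy change (inferred, not quoted).
gamma_aib = GAMMA_ALA * exp(dds_ala_to_aib / R_CAL)

# Same relation applied to a hypothetical Gly -> Aib replacement.
dds_gly_to_aib = entropy_from_fractions(gamma_aib, GAMMA_GLY)

if __name__ == "__main__":
    print(f"ddS_U(Ala->Aib) = {dds_ala_to_aib:.2f} cal/(mol*K)")
    print(f"implied gamma_Aib = {gamma_aib:.3f}")
    print(f"ddS_U(Gly->Aib) = {dds_gly_to_aib:.2f} cal/(mol*K)")
```

Under these assumptions the Gly → Aib entropy change comes out within about 0.1 cal/(mol·K) of the −3.5 cal/(mol·K) quoted above, which suggests that the two figures in the text rest on a relation of this kind.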
As with the Ala304Aib analog, Aib can be incorporated at position 309 without any significant conformational strain or steric hindrance. However, at variance with position 304, Aib309 is shielded from the solvent and buried at the apolar dimer interface. Hence, the higher stabilizing effect of the Ala309Aib analog (ΔTm = +5.4°C) originates from a combination of favorable conformational and hydrophobic contributions to the unfolding free energy change: ΔΔGU = ΔΔGconf + ΔΔGϕ. The conformational effect can be estimated as above, whereas the hydrophobic contribution can be estimated as: ΔΔGϕ = σ·ΔΔ(ASAϕ), where ΔΔ(ASAϕ) is the difference between the apolar accessible surface area, Δ(ASAϕ), buried upon folding of Ala309Aib and that buried upon folding of the wild-type TLN(255-316) (Figure 3), and σ is a proportionality constant accounting for the burial of 1 Å2 of hydrophobic surface (σ = 25 cal/mol/Å2).44 Hence, for Ala309Aib the unfolding free energy change can be estimated as follows: ΔΔGU = ΔΔGconf + ΔΔGϕ = 0.74 + 1.1 ≈ 1.8 kcal/mol, in close agreement with the value determined experimentally, that is, ΔΔGU = 1.76 kcal/mol (Table 2).
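The hydrophobic term lends itself to a short worked check. The Python sketch below applies ΔΔGϕ = σ·ΔΔ(ASAϕ) with σ = 25 cal/mol/Å2 and inverts it to obtain the extra buried apolar area implied by the quoted 1.1 kcal/mol; the buried area itself is not stated in the text, so the value obtained here is an inference, and the function names are illustrative.

```python
SIGMA = 25.0          # cal/(mol*A^2), proportionality constant quoted in the text
DDG_CONF_KCAL = 0.74  # conformational term quoted for Ala -> Aib
DDG_PHI_KCAL = 1.1    # hydrophobic term quoted for Ala309Aib
DDG_EXP_KCAL = 1.76   # experimental value for Ala309Aib (Table 2)


def hydrophobic_term_kcal(extra_buried_area_a2, sigma=SIGMA):
    """ddG_phi (kcal/mol) for a given extra buried apolar area (A^2)."""
    return sigma * extra_buried_area_a2 / 1000.0


def implied_buried_area_a2(ddg_phi_kcal, sigma=SIGMA):
    """Invert ddG_phi = sigma * ddASA to get the implied area (A^2)."""
    return ddg_phi_kcal * 1000.0 / sigma


implied_area = implied_buried_area_a2(DDG_PHI_KCAL)
ddg_estimate = DDG_CONF_KCAL + DDG_PHI_KCAL
deviation = ddg_estimate - DDG_EXP_KCAL

if __name__ == "__main__":
    print(f"implied extra buried apolar area: {implied_area:.0f} A^2")
    print(f"estimated ddG_U: {ddg_estimate:.2f} kcal/mol "
          f"(experiment {DDG_EXP_KCAL:.2f}, deviation {deviation:+.2f})")
```

With the two rounded terms as quoted, the estimate lies within 0.1 kcal/mol of the experimental value, and the implied buried area is of the order of a few tens of Å2, a plausible magnitude for one additional methyl group.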
Finally, the slightly lower stability of the Ala312Aib analog is the result of opposing factors: first, the favorable entropic effect reported above is likely lower for the Ala312 → Aib replacement, as Ala312 is at the flexible C-terminal end of the protein domain and therefore Aib is expected to rigidify both the unfolded and the folded state, with a resulting larger (unfavorable) unfolding entropy change; second, the (small) entropic stabilization is likely overwhelmed by destabilizing strain energy effects due to forcing Aib312 into a conformation (ϕ = −90 ± 14°, ψ = 10 ± 20°) which is different from that of lowest energy for Aib (ϕ = ±57°, ψ = ±47°)31 (Figure 4).
Importantly, the stabilizing effects of Ala → Aib substitutions at positions 304 and 309 are fairly additive for both ΔTm and ΔΔGU, and allowed us to increase the Tm of TLN(255-316) by 8.0°C and its unfolding free energy by 2.4 kcal/mol (Figure 4E and Table 2), suggesting that the mutation sites behave independently.35 Furthermore, the additivity of the mutational effects highlights a general strategy for engineering protein stability, whereby the rational incorporation of multiple X → Aib substitutions in helical segments can markedly increase stability, provided that a suitable procedure for the incorporation of Aib into the protein is available.
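The degree of additivity can be read off Table 2 with a few lines of arithmetic. The Python sketch below sums the single-mutant effects at positions 304 and 309 and compares them with the double mutant; the data are those of Table 2, while the dictionary layout and helper names are illustrative choices.

```python
# (Tm in deg C, ddG_U in kcal/mol) transcribed from Table 2.
TABLE_2 = {
    "WT": (63.5, 0.00),
    "Ala312Aib": (62.9, -0.20),
    "Ala304Aib": (65.7, 0.71),
    "Ala309Aib": (68.9, 1.76),
    "Ala304Aib/Ala309Aib": (71.5, 2.43),
}


def delta_tm(name):
    """Tm shift of an analog relative to the natural fragment (deg C)."""
    return TABLE_2[name][0] - TABLE_2["WT"][0]


def ddg(name):
    """Unfolding free energy difference relative to the natural fragment."""
    return TABLE_2[name][1]


singles = ("Ala304Aib", "Ala309Aib")
double = "Ala304Aib/Ala309Aib"

sum_dtm = sum(delta_tm(name) for name in singles)
sum_ddg = sum(ddg(name) for name in singles)

# Deviation from strict additivity: double mutant minus sum of singles.
nonadditivity_dtm = delta_tm(double) - sum_dtm
nonadditivity_ddg = ddg(double) - sum_ddg

if __name__ == "__main__":
    print(f"sum of singles: dTm = {sum_dtm:.1f} C, ddG = {sum_ddg:.2f} kcal/mol")
    print(f"double mutant:  dTm = {delta_tm(double):.1f} C, ddG = {ddg(double):.2f} kcal/mol")
    print(f"non-additivity: {nonadditivity_dtm:+.1f} C, {nonadditivity_ddg:+.2f} kcal/mol")
```

The deviations from strict additivity are small relative to the effects themselves, which is what "fairly additive" amounts to for these two sites.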
3 INCORPORATION OF NON-CODED AMINO ACIDS FOR SAR STUDIES OF HIRUDIN BINDING TO αT
αT is a serine protease of the chymotrypsin family playing a pivotal role at the interface of coagulation, inflammation, sepsis, and cell growth.45-47 For this reason, molecules able to inhibit αT activity are the focus of intensive pharmacological research. αT displays a unique surface electrostatic potential (Figure 5), generated by the asymmetry of charge distribution, where a negatively charged active-site region is flanked by two positive patches opposite to the active site, that is, exosites 1 and 2, and responsible for physiological substrates binding.48, 49 In hemostasis, αT exerts either procoagulant or anticoagulant functions.50 The procoagulant functions entail the proteolytic conversion of soluble fibrinogen into insoluble fibrin polymers and aggregation of platelets, which occurs after cleavage of PAR-1.46, 51 The anticoagulant role regards the proteolytic activation of the protein C (PC), generating active PC (aPC) which then proteolytically inactivates the key pro-coagulant cofactors V and VIII, with a final anticoagulant effect.47 Notably, the binding of a Na+ ion to a specific site on αT triggers the conformational transition of the enzyme from an anticoagulant (slow) form to a procoagulant (fast) form.52, 53 The Na+-bound (fast) form displays improved procoagulant properties, as it cleaves more efficiently procoagulant substrates such as fibrinogen and PAR-1. The Na+-free (slow) form, instead, is anticoagulant because it retains the normal proteolytic activity of the fast form toward protein C, but it is unable to promote acceptable hydrolysis of the procoagulant substrates.52, 54, 55

Hirudins are small proteins (63-65 amino acids) that are produced by the salivary glands of the blood-sucking leeches Hirudo sp. to keep the host blood fluid.56 Hirudin is the most potent and specific inhibitor of αT (Kd = 20-200 fM) and is now used as an anticoagulant drug (Refludan). Hirudin HM2 variant from Hirudinaria manillensis, a hematophagous parasite of bovines, is formed by a compact N-terminal region (amino acids 1-47) and a flexible negatively charged C-terminal tail (amino acids 48-64) (4htc.pdb).57, 58 The N-terminal domain is stabilized by three disulfide bonds and covers the protease active site by extensively penetrating into the enzyme specificity sites through its N-terminal tripeptide59 (Figure 5C-E). After limited proteolysis with trypsin, it was possible to isolate in high yields hirudin fragments HM2(1-47) and HM2(48-64), where the N-terminal domain inhibits αT hydrolytic activity with an equilibrium inhibition constant (Ki) of 40 nM while the C-terminal tail binds to the protease exosite-1 with a dissociation constant (Kd) of about 1 μM.60-63 Like full-length hirudin, the N-terminal fragment HM2(1-47) binds ~30-fold more tightly to the fast form of αT than to the slow form, suggesting that the structural determinants for this behavior reside in the N-terminal domain.53, 55, 63
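To put the quoted constants on a common energy scale, the Python sketch below converts them into standard binding free energies with ΔG° = RT·ln(Kd), and the fast/slow preference into a free energy difference with ΔΔG = RT·ln(ratio). A temperature of 25°C and a 1 M standard state are assumed here, since the conditions under which these constants were measured are not stated in this paragraph; the function names are illustrative.

```python
from math import log

R_KCAL = 1.987e-3   # gas constant, kcal/(mol*K)
TEMP_K = 298.15     # assumed temperature (25 deg C)


def binding_free_energy(k_diss_molar, temp_k=TEMP_K):
    """Standard binding free energy (kcal/mol) from a dissociation or inhibition
    constant expressed in mol/L (1 M standard state assumed)."""
    return R_KCAL * temp_k * log(k_diss_molar)


def free_energy_from_ratio(affinity_ratio, temp_k=TEMP_K):
    """Free energy difference (kcal/mol) corresponding to an affinity ratio."""
    return R_KCAL * temp_k * log(affinity_ratio)


dg_n_terminal = binding_free_energy(40e-9)   # HM2(1-47), Ki = 40 nM
dg_c_terminal = binding_free_energy(1e-6)    # HM2(48-64), Kd ~ 1 uM
ddg_fast_vs_slow = free_energy_from_ratio(30.0)  # ~30-fold tighter to the fast form

if __name__ == "__main__":
    print(f"HM2(1-47):  dG = {dg_n_terminal:.1f} kcal/mol")
    print(f"HM2(48-64): dG = {dg_c_terminal:.1f} kcal/mol")
    print(f"fast vs slow form: ddG = {ddg_fast_vs_slow:.1f} kcal/mol")
```

On this scale the ~30-fold preference for the fast form corresponds to roughly 2 kcal/mol, which gives a sense of the energy range within which single-site substitutions in HM2(1-47) are expected to act.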
3.1 SAR studies
Starting from these considerations, we decided to conduct SAR studies by introducing natural and nonnatural amino acid replacements into HM2(1-47) with the aim to: (1) obtain information on the physicochemical determinants underlying the extraordinary affinity and specificity of hirudin for αT; (2) obtain a highly potent mini-hirudin based on the structure of Hir(1-47); and (3) map the conformational properties of αT recognition sites in the slow or fast form.64
HM2(1-47) analogs, containing both coded and non-coded amino acids, were produced by combining automated and manual stepwise solid-phase synthesis using standard Fmoc chemistry on a p-alkoxybenzyloxy resin.65, 66 The crude HM2(1-47) analogs, with the six Cys residues in the reduced state, were purified by semi-preparative RP-HPLC with overall synthesis/purification yields of 30%-45%. Thereafter, oxidative disulfide renaturation was carried out under air-oxidizing conditions in bicarbonate buffer, pH 8.3, containing 100 μM β-mercaptoethanol, which gave the final folded species with yields in the 70%-90% range63, 66 (Figure 6). Renaturation yields are remarkably high if one considers that, on simple statistical grounds, there is only one correct disulfide bond topology out of 15 total possibilities, corresponding to 6.7%. The addition of small concentrations of a reducing agent like β-mercaptoethanol is crucial for obtaining such high folding yields, as it reduces unstable non-native disulfides, thus giving the polypeptide chain more chances to fold into the correct topology. Enzymatic peptide mass fingerprint analysis was used to establish the chemical identity and correct disulfide pairing of the folded HM2(1-47) synthetic analogs.63, 66 After purification by RP-HPLC, all the peptides were characterized by far-ultraviolet (UV) and near-UV CD, showing that single-point amino acid exchanges do not appreciably affect the conformation of HM2(1-47), so that the differences in the affinity for αT can be interpreted solely on the basis of the variation of the physicochemical properties at the mutation site.
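The statistical figure of 15 possible disulfide topologies can be verified by brute-force enumeration. The Python sketch below lists every way of pairing six cysteines into three disulfide bonds; the cysteines are labeled 1 to 6 purely for illustration (the labels are not sequence positions), and the calculation assumes that all pairings are equally likely, which is the "simple statistical grounds" scenario rather than a model of the real folding pathway.

```python
def disulfide_pairings(cysteines):
    """Yield every way to pair all cysteines into disulfide bonds.

    Each pairing is a list of (cys_a, cys_b) tuples. The first cysteine is
    paired with each remaining one in turn, then the rest are paired recursively.
    """
    cysteines = list(cysteines)
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for index, partner in enumerate(rest):
        remaining = rest[:index] + rest[index + 1:]
        for tail in disulfide_pairings(remaining):
            yield [(first, partner)] + tail


# Six cysteines labeled 1..6 (illustrative labels, not residue numbers).
all_pairings = list(disulfide_pairings(range(1, 7)))
n_pairings = len(all_pairings)

# Chance of hitting the single native topology if pairing were random.
random_yield_percent = 100.0 / n_pairings

if __name__ == "__main__":
    print(f"{n_pairings} possible topologies; random yield {random_yield_percent:.1f}%")
    for pairing in all_pairings[:3]:
        print(pairing)
```

The count follows the general pattern (2n − 1)·(2n − 3)·…·1 for 2n cysteines, that is, 5 × 3 × 1 = 15 for three disulfide bonds, so the observed 70%-90% folding yields exceed the random expectation by an order of magnitude.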

The inhibitory potency of the synthetic Hir(1-47) analogs toward the procoagulant (fast) and anticoagulant (slow) form of αT was measured at 25°C by monitoring the release of p-nitroanilide (p-NA) from the synthetic substrate (D)-Phe-Pro-Arg-p-NA under salt conditions stabilizing the fast (0.2 M NaCl) or the slow (0.2 M ChCl) form of the protease66, 67 (Figure 7). The amino acid exchanges were designed starting from the NMR solution structure of HM258 and the crystallographic structure of hirudin-αT complex.59

Position 1. When bound to αT, HM2(1-47) orients Val-1 in a rather constrained environment, toward the enzyme S2 substrate specificity site, such that incorporation of a larger amino acid would introduce unfavorable steric clashes (Figure 5D,E). Hence, Val-1 was replaced with a small amino acid like Ala and with tert-butylglycine (t-Bug), a non-natural amino acid having a side-chain volume comparable to that of Val68 and similar β-sheet forming tendency69 (Figure 7A). The replacement of Val-1 with the smaller Ala residue reduces the affinity of HM2(1-47) for αT in the fast and slow form by 15-fold and 3-fold, respectively, whereas the incorporation of t-Bug enhances the affinity for the fast form by about 3-fold, corresponding to a gain in binding free energy (ΔΔGb) of 0.63 kcal/mol (Figure 7B). The magnitude of this gain is very close to that estimated from the reduction of the entropy change of binding of the t-Bug analog compared to the wild-type inhibitor, ΔΔGb = −T·ΔΔS = −T·R·ln(γt-Bug/γVal) = −0.65 kcal/mol at 25°C, where γ is the number of side-chain rotamers compatible with binding. In fact, t-Bug has three energetically equivalent side-chain rotamers available for binding in a functionally active conformation to the enzyme (γt-Bug = 3), whereas Val has only one rotamer (ie, the trans rotamer) available for binding to αT.69 These findings suggest a general strategy to improve affinity by introducing non-coded amino acids with symmetrical side chains (eg, valine → tert-butylglycine or leucine → tert-butylalanine) that favorably reduce the overall change in binding entropy (ΔSb) by increasing the entropy of the ligand in the bound state, with minimal steric requirements.
Position 2. Ser-2 in HM2(1-47) covers but does not fill the primary specificity site (S1) of the enzyme, which contains the negative Asp-189.59 Hence, we decided to replace the short/polar side chain of serine with that of the long/positively charged arginine and its more rigid analog, p-guanido-phenylalanine (p-Gnd-Phe) (Figure 7A). The presence of a positive charge at position-2 enhances affinity by 20-fold for the fast form and by 100-fold for the slow form (Figure 7B). This strong increase in affinity is caused by the electrostatic coupling of the positive Arg-residue or p-Gnd-Phe with the negative Asp-189, positioned at the bottom of the S1 site. Our data suggest that electrostatic steering is more important for driving HM2(1-47) binding to the slow form rather than to the fast form.
Position 3. Tyr-3 in HM2(1-47) has a well-defined conformation both in the free58 and αT-bound state.59 Tyr-3 partially fills the apolar S3 site of the protease, which is shaped by Leu-99, Ile-174, and Trp-215, and is highly conserved in the hirudin family.70 To address the importance of this residue in αT inhibition, we introduced large structural and chemical diversity at this position by replacing Tyr-3 with 13 different amino acids, both natural and non-natural, having different side-chain volume, hydrophobicity, electronic, and conformational properties (Figure 7A). Importantly, shaving of the Tyr-3 side chain to Ala strongly reduces affinity for the αT fast form by 65-fold. Conversely, the binding strength of the Tyr3Ala analog to the slow form was reduced by only 1.8-fold. These results provide strong, albeit indirect, evidence that the S3 site in the slow form is less accessible to ligand binding than in the fast form.53, 55 Modeling studies on the hirudin-αT complex indicate that the presence of Ala-3 creates a large cavity at the S3 site, with loss of van der Waals contacts. Consistent with these results, enlargement of the side-chain volume at position 3 with bulkier and more hydrophobic amino acids, like β-naphthylalanine (β-Nal) and biphenylalanine (Bip), dramatically enhanced the affinity of HM2(1-47) by 40-fold and 210-fold, respectively. The importance of hydrophobicity in driving hirudin–αT interaction is well documented by the replacement of the apolar Trp-3 (π = 2.60) with the isosteric and less hydrophobic 7-aza-Trp (π = 2.10), resulting in a 10-fold lower affinity. However, a deeper analysis of the αT inhibition data reported in Figure 7B reveals that, beyond hydrophobicity (ie, the intrinsic tendency of apolar groups to avoid water), side-chain orientation and electronic effects also play an important role in molecular recognition. 
For instance, α-Nal and β-Nal have the same hydrophobicity and molecular volume, but different side-chain orientation, and indeed Tyr3β-Nal is 6-fold more potent than the Tyr3α-Nal analog. Likewise, although homo-Phe is more hydrophobic than Tyr and Phe, the affinity of the Tyr3homo-Phe analog is about 4-fold and 40-fold lower than that of HM2(1-47) and the Tyr3Phe analog, respectively. Modeling studies indicate that both α-Nal and homo-Phe orient their side chain toward the sterically constrained S2 site of αT, whereas β-Nal and Bip point to and fill the wide and apolar S3 site of the enzyme. In addition to orientation requirements, subtle electronic effects also seem to play an important role in hirudin-αT recognition. As an example, saturation of the aromatic ring of Phe-3 to yield cyclohexylalanine (Cha), which is almost isosteric with Phe and even more hydrophobic, results in a 12-fold drop in the affinity of HM2(1-47) for αT, which is caused by the deletion of the edge-to-face interaction existing between the phenyl ring of Phe-3 of hirudin and the indole moiety of Trp-215 in αT (Figure 5D,E). These aromatic–aromatic (π–π) interactions are known to stabilize ligand–receptor complexes on electrostatic grounds,10 but they cannot be formed with the saturated Cha ring (Figure 1C). These results prompted us to increase the strength of π–π interactions by introducing strong electron-withdrawing groups, like p-fluoro and p-nitro, on the aromatic ring of Phe. Contrary to expectations, the affinity of Tyr3pF-Phe and Tyr3p-nitro-Phe decreased by 4-fold and 40-fold, respectively, compared to Tyr3Phe (Figure 7B). These electron-attracting groups markedly increased the electric dipole moment (μ) of the amino acid at position 3, which would favor π–π interactions with Trp-215. 
However, they also introduced destabilizing electrostatic effects, as the negative end of the dipole of the aromatic ring at position 3 points toward the protease S3 site (Figure 5D,E), which has a strong negative potential (Figure 5B).
Overall, the results of SAR studies demonstrate that even a single amino acid exchange, that is, Ala-3 → Bip exchange, can enhance the affinity of HM2(1-47) for αT by 1.3 × 10⁴-fold and that, beyond hydrophobicity, other factors (ie, orientation and electronic effects) are important for driving molecular recognition.
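The fold changes in affinity discussed for positions 1 to 3 can be converted into binding free energy differences with ΔΔGb = −RT·ln(fold gain). The sketch below is only an illustration of that arithmetic: the function names are ours, the temperature is taken as 25°C, and the fold values are the rounded ones quoted in the text. It also reproduces the rotamer-counting estimate for the Val-1 → t-Bug exchange and multiplies the two rounded position-3 effects (65-fold loss for Ala, 210-fold gain for Bip) to show that the Ala-3 → Bip difference is of the order of 10⁴.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_gain(fold_gain, temp_k=298.15):
    """Free energy change (kcal/mol) for a given fold gain in affinity.

    Negative values mean tighter binding.
    """
    return -R_KCAL * temp_k * math.log(fold_gain)


def ddg_from_rotamers(n_analog, n_wild_type, temp_k=298.15):
    """Entropic estimate -T*R*ln(gamma_analog / gamma_wild_type) in kcal/mol."""
    return -R_KCAL * temp_k * math.log(n_analog / n_wild_type)


# Val-1 -> t-Bug: three equivalent rotamers versus one (values from the text).
ddg_tbug_entropy = ddg_from_rotamers(3, 1)

# A gain of exactly 3-fold corresponds to the same free energy.
ddg_three_fold = ddg_from_fold_gain(3.0)

# Rounded position-3 effects: Tyr -> Ala loses 65-fold, Tyr -> Bip gains 210-fold.
ala3_to_bip_fold = 65 * 210

print(round(ddg_tbug_entropy, 2), round(ddg_three_fold, 2), ala3_to_bip_fold)
```

Because the inputs are rounded fold changes, the product is expected to match the reported 1.3 × 10⁴ only approximately.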
3.2 Molecular mapping of αT binding sites in the fast and slow form
HM2(1-47) binds ~30-fold more tightly to the procoagulant fast form of αT than to the anticoagulant slow form,36, 53, 55 with a coupling free energy of binding, ΔGc = ΔGf − ΔGs = −2.1 kcal/mol. Notably, the αT inhibition data in Figure 7B indicate that in certain cases the same amino acid replacement in HM2(1-47) can have remarkably different effects on the affinity for the fast or slow form. According to the thermodynamic cycle in Figure 8A, a change in ΔGc is a measure of the energetic contribution of the interactions being lost (or gained) upon mutation and this is a valuable experimental tool for identifying those regions on αT that have different structural features in the slow or fast form. In our case, shaving of Val-1 (pointing to the S2 site) and Tyr-3 (filling the S3 site) to a smaller amino acid like Ala almost exclusively lowers the affinity for the fast form without appreciably affecting the affinity of Hir(1-47) for the slow form, thus resulting in the abrogation of ΔGc. These results indicate that in the slow form the S2 and S3 sites are in a conformation that is less accessible and forgiving than the one they adopt in the fast form (Figure 8D). In keeping with this picture, the presence of a long and positive side-chain at position 2 of HM2(1-47), such as that of Arg or p-Gnd-Phe, allows electrostatic coupling with Asp-189 in the primary S1 site of αT preferentially in the less accessible slow form compared to the more open fast form.
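The coupling analysis above can be sketched numerically. The example assumes ΔGc = −RT·ln(preference), where the preference is the ratio of affinities for the fast and slow form, and uses the rounded fold changes quoted in the text for the Ala replacements at positions 1 and 3. Function names are illustrative. With a preference of exactly 30-fold the formula gives about −2.0 kcal/mol; the −2.1 kcal/mol quoted above would correspond to a preference of roughly 35-fold, so the input figures should be read as approximate.

```python
import math

RT = 1.987e-3 * 298.15  # kcal/mol at 25 °C


def coupling_free_energy(fast_over_slow_preference):
    """dGc = dGf - dGs (kcal/mol) from the fast/slow affinity ratio."""
    return -RT * math.log(fast_over_slow_preference)


def preference_after_mutation(wt_preference, fold_loss_fast, fold_loss_slow):
    """Fast/slow preference of an analog, given its fold loss on each form."""
    return wt_preference * fold_loss_slow / fold_loss_fast


wt_preference = 30.0  # HM2(1-47), rounded value from the text
dgc_wt = coupling_free_energy(wt_preference)

# Val-1 -> Ala: 15-fold loss on the fast form, 3-fold on the slow form.
dgc_val1ala = coupling_free_energy(
    preference_after_mutation(wt_preference, 15.0, 3.0))

# Tyr-3 -> Ala: 65-fold loss on the fast form, 1.8-fold on the slow form.
dgc_tyr3ala = coupling_free_energy(
    preference_after_mutation(wt_preference, 65.0, 1.8))

for name, value in [("wild type", dgc_wt),
                    ("Val1Ala", dgc_val1ala),
                    ("Tyr3Ala", dgc_tyr3ala)]:
    print(f"{name}: dGc = {value:+.2f} kcal/mol")
```

For Tyr3Ala the computed ΔGc is close to zero, which is the abrogation of coupling described in the text; for Val1Ala roughly half of the coupling energy is lost.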

The structural model of the fast and slow form emerging from the SAR studies reported above is consistent with the results of our MD simulations53 and with kinetic measurements by Di Cera and coworkers.71 MD simulations show that, upon Na+ binding, significant conformational changes occur in the S2 and S3 sites (Figure 8B), which become more open and accessible, and in the 148-loop, which becomes more exposed on the protein surface (Figure 8C). On the other hand, fast kinetic measurements indicate that αT exists in equilibrium between more accessible and active (E) species and more closed and inactive (E*) species (or ensembles) and that binding of Na+ shifts the pre-existing E*↔E equilibrium in favor of the active E form.71 Our model also explains why the Na+-bound fast form cleaves procoagulant substrates (ie, fibrinogen and PAR-1) 20-40-fold more efficiently than the Na+-free slow form. In fact, both fibrinogen and PAR-1 orient bulky amino acid side chains into the S3 site, such as Phe-8 in fibrinogen, VvdW(Phe) = 127 Å³ (1bbr.pdb),72 and Leu-38 in PAR-1, VvdW(Leu) = 100 Å³ (1nrs.pdb).73 On the other hand, protein C zymogen (PC), which is related to the anticoagulant function of αT, seems to only partially fill the S3 site of the enzyme with its Val-38, VvdW(Val) = 80 Å³ (4dt7.pdb).74 Consistently, Na+ binding does not significantly alter PC cleavage by αT (ΔGc = 0.2 kcal/mol).52
3.3 Increasing the affinity of HM2(1-47) for αT by multiple amino acid substitutions
Although full-length hirudin is the most potent and specific anti-αT drug known so far, the use of hirudin as an anticoagulant is limited by its low therapeutic index.75 These problems mainly stem from the susceptibility of the highly flexible C-terminal tail to cleavage by endogenous proteases, generating truncated N-terminal fragments which are by far less potent (Kd = 30-400 nM) than the parent hirudin molecule (Kd = 0.2-1.0 pM).62 Taking advantage of the SAR data reported above, we decided to produce a mini-hirudin based on the structure of HM2(1-47) and displaying much higher inhibition potency for αT. Hence, we combined the best performing amino acid substitutions (ie, Val-1 → t-Bug, Ser-2 → Arg, and Tyr-3 → β-Nal) in the same HM2(1-47) analog, hereafter denoted as BAN (Figure 7). Notably, BAN inhibited the fast and slow form of the enzyme 2670-fold and 6820-fold more efficiently, respectively, than the natural HM2(1-47) (42 nM), with an affinity (Kd = 15 pM) not far from that of the full-length recombinant hirudin HM2 (Kd = 0.2 pM).76 Importantly, the effects of the amino acid replacements were additive for the inhibition of both the fast and slow form of αT, indicating that the S1, S2, and S3 substrate recognition sites on αT behave independently35, 77 and that the properties of a multiply substituted analog might be accurately predicted simply from those of the single mutants. Moreover, BAN is resistant to proteolysis by digestive proteases (ie, pepsin, trypsin, and chymotrypsin) and displays almost absolute selectivity for αT, without significantly inhibiting other proteases of the coagulation cascade (eg, factor Xa) and the fibrinolytic system (eg, plasmin). Finally, BAN was able to prolong the coagulation parameters of thrombin time (TT) and prothrombin time (PT) in human plasma, although less efficiently than full-length HM2. 
The combination of high potency and target specificity with elevated proteolytic stability makes BAN an attractive alternative to HM2 as a direct anti-thrombotic agent.
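The additivity argument can be illustrated with the rounded single-site gains quoted in the previous section. The sketch below multiplies the fast-form gains for t-Bug (about 3-fold), Arg (20-fold), and β-Nal (40-fold) and compares the product with the 2670-fold gain reported for BAN. It assumes that the 40-fold value for β-Nal refers to the fast form, which the text does not state explicitly, and the dictionary keys are just labels.

```python
import math

RT = 1.987e-3 * 298.15  # kcal/mol at 25 °C

# Rounded fold gains in affinity for the fast form, taken from the text.
single_gains_fast = {
    "Val1->t-Bug": 3.0,
    "Ser2->Arg": 20.0,
    "Tyr3->beta-Nal": 40.0,  # assumed to refer to the fast form
}

# If the sites are independent, free energies add and fold gains multiply.
predicted_gain = math.prod(single_gains_fast.values())
observed_gain = 2670.0  # reported for BAN, fast form

kd_wt_nM = 42.0  # HM2(1-47), fast form
predicted_kd_pM = kd_wt_nM * 1000.0 / predicted_gain

# Deviation from perfect additivity, expressed as a free energy.
non_additivity_kcal = RT * math.log(observed_gain / predicted_gain)

print(predicted_gain, predicted_kd_pM, round(non_additivity_kcal, 2))
```

With these inputs the predicted and observed gains differ by less than 0.1 kcal/mol, which is what additive behavior of the S1, S2, and S3 sites would look like at this level of rounding.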
4 Non-coded amino acids as spectroscopic probes in the study of protein folding and binding
Protein engineering with non-coded amino acids makes it possible to obtain “spectrally enhanced proteins”78 that contain non-coded amino acids as spectroscopic probes, possessing physicochemical properties (eg, side-chain volume, hydrophobicity) similar to those of the corresponding natural amino acids, but displaying unique spectral features, distinct from those of the natural counterparts, and high sensitivity to the chemical environment where they are embedded. For instance, site-specific incorporation of 6-dimethylamino-2-acyl-naphthylalanine (Aladan) into the B1 domain of streptococcal protein G, along with the context-dependent emission properties of this fluorophore, provided direct estimates of the local dielectric constant of the protein at different sites.79 In the following, we show that incorporation of non-natural analogs of Trp, Tyr, and Phe, such as 7-aza-tryptophan (7-aza-Trp), 3-nitro-tyrosine (NT), and p-iodo-phenylalanine (p-I-Phe), can be effectively exploited for studying protein folding and binding.
4.1 7-Aza-tryptophan as a “spectrally enhanced” Trp-analog in the study of HM2(1-47) folding
7-aza-Trp is an isostere of Trp, containing a nitrogen atom at position 7 in the indole nucleus. The N7 atom of 7-aza-Trp has a pKa = 4.5 and thus it is not protonated at physiological pH.80 7-aza-Trp displays unique absorption and fluorescence properties.81 In particular, the absorption λmax is redshifted by 10 nm, compared to the natural Trp residue, and the fluorescence emission is strongly influenced by the polarity of the chemical environment, whereby on going from water to cyclohexane the emission λmax is blueshifted from 400 to 325 nm and the quantum yield (ϕ) is increased by 10-fold. An even more dramatic 25-fold increase of 7-aza-Trp quantum yield is observed going from water (ϕ = 0.01) to acetonitrile (ϕ = 0.25). These spectroscopic properties make 7-aza-Trp a good isosteric Trp-analog for investigating protein binding and folding.
Taking advantage of the relative accessibility of HM2(1-47) chemical synthesis, we replaced Tyr3 with 7-aza-Trp and demonstrated the utility of this spectroscopic probe for SAR studies on the hirudin-αT interacting system and for monitoring disulfide-coupled hirudin folding.82 7-aza-Trp is commercially available as a racemic mixture and therefore, prior to chemical synthesis of the corresponding HM2(1-47) analog, the enantiomeric mixture was efficiently resolved by acetic anhydride derivatization, followed by enantioselective de-acylation with Aspergillus oryzae acylase-I to yield free (L)-7-aza-Trp. After coupling with Fmoc-Cl, the Nα-Fmoc-7-aza-Trp derivative was used for SPPS. As reported in Figure 7B, the replacement of Trp3 with 7-aza-Trp led to a 10-fold drop in the affinity of HM2(1-47) for αT, indicating that exchange of even a single atom (C → N) can substantially affect binding strength. These results can be explained by the lower hydrophobicity of 7-aza-Trp compared to that of Trp, as given by the values of the octanol/water partition coefficient (logP) of 3-methylindole (logP = 2.60) and 3-methyl-7-azaindole (logP = 2.1).82 On the other hand, the results of spectroscopic characterization indicate that the absorption λmax of the 7-aza-Trp analog in the disulfide-folded state is redshifted by 10 nm, while the fluorescence λmax undergoes a bathochromic shift of 40 nm, compared to the Tyr3Trp analog (Figure 9A). Interestingly, the emission spectrum of the fully disulfide-reduced 7-aza-Trp analog of HM2(1-47) shows two distinct well-resolved bands at 305 and 397 nm, assigned to the contribution of Tyr at position 13 and 7-aza-Trp at position 3. In the folded state the Tyr band disappears, while the fluorescence of the 7-aza-Trp analog is blueshifted to 390 nm and enhanced by about 20% (Figure 9B). 
These spectral features are compatible with the existence of an efficient fluorescence resonance energy transfer (FRET) between Tyr at position 13 (ie, the donor) and 7-aza-Trp at position 3 (ie, the acceptor) in the folded state, where the donor–acceptor distance is well within the Förster distance of the Tyr/7-aza-Trp pair, that is, 12-15 Å. In the unfolded state, this distance increases considerably and FRET efficiency drops. In the case of the Tyr3Trp analog, a single band at 350 nm is observed in the native state, while in the SS-reduced state the contribution of Tyr13 appears as a barely distinguishable shoulder at 303 nm, overwhelmed by the stronger emission of Trp3 at 352 nm (Figure 9C). This would make it difficult (if not impossible) to monitor the folding process of HM2(1-47) by Tyr → Trp energy transfer measurements. Conversely, due to the redshifted emission of 7-aza-Trp, it would be much easier to monitor the kinetics of the HM2(1-47) folding reaction by recording the decrease of Tyr emission at 305 nm.
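The steep distance dependence that makes the Tyr-13/7-aza-Trp-3 pair a useful folding reporter follows from the standard Förster relation, E = 1/(1 + (r/R0)⁶). The sketch below evaluates it for a Förster distance of 15 Å, the upper end of the range quoted above. The donor–acceptor distances are hypothetical values chosen only to show the trend; they are not measured distances in HM2(1-47).

```python
def fret_efficiency(distance, forster_distance):
    """Transfer efficiency for a donor-acceptor pair (same length units)."""
    return 1.0 / (1.0 + (distance / forster_distance) ** 6)


R0_ANGSTROM = 15.0  # upper end of the 12-15 Å range given in the text

# Hypothetical donor-acceptor distances (Å): compact, at R0, and extended.
example_distances = [7.5, 15.0, 30.0]
efficiencies = {r: fret_efficiency(r, R0_ANGSTROM) for r in example_distances}

for r, e in efficiencies.items():
    print(f"r = {r:5.1f} Å  ->  E = {e:.3f}")
```

Halving the distance relative to R0 brings the efficiency close to 1, whereas doubling it leaves only a few percent, which is why loss of the Tyr band on folding and its reappearance on reduction are easy to detect.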

4.2 Incorporation of NT as an energy acceptor in FRET studies of HM2(1-47) binding to αT
NT is produced in vivo by reaction of protein tyrosines with peroxynitrite83 and its side chain is only 30 Å³ larger than that of unmodified Tyr. At variance with Tyr, the presence of the electron-withdrawing nitro group makes the phenolic hydrogen of NT 10³-fold more acidic (pKa 6.8). At pH < pKa, where the neutral (OH) form is predominant, NT is more hydrophobic than Tyr, whereas at higher pH, where NT exists in the ionized form (O−), it is much more polar.84 NT can form an internal hydrogen bond and its absorption properties are strongly pH dependent (Figure 10A-C). At basic pH, the UV/Vis spectrum of NT displays a major band at 422 nm, characteristic of the ionized form, whereas at acidic pH a prominent band appears at 355 nm, assigned to the contribution of the neutral form. NT is non-fluorescent and absorbs radiation in the wavelength range where both Tyr and Trp emit fluorescence, with a Trp-to-NT Förster distance of 26 Å.85 For these reasons, NT has great potential as an energy acceptor in FRET studies aimed at investigating distance-dependent biochemical processes such as folding and binding. The NT-containing analog of HM2(1-47), Tyr3NT, was produced by SPPS and its αT inhibition potency determined by both classical enzyme competitive inhibition assays (not shown) and direct FRET measurements (Figure 10D-F).86 The equilibrium inhibition constant, Ki, of Tyr3NT for the fast form was estimated as 1.4 μM (Figure 7B), much higher than that of the unmodified HM2(1-47) (Ki = 40 nM). As the nitro group of NT can be snugly accommodated into the S3 recognition site of the enzyme, the drop in the affinity of Tyr3NT can be ascribed to the lower hydrophobicity of NT, compared to Tyr, and the strong increase of the electric dipole moment of the NT side chain (Figure 7A), whose negative end points toward the negatively charged S3 site of αT (Figure 5). 
The equilibrium dissociation constant (Kd) of the Tyr3NT-αT complex was also measured by direct FRET measurements, recording the decrease of fluorescence intensity of (some) Trp residues on αT (ie, the donors) at increasing concentrations of the inhibitor Tyr3NT (ie, the acceptor) (Figure 10F). Also in this case, a much higher Kd (1.3 μM) was obtained, very close to that estimated by indirect enzymatic assays.

These findings put forward NT as a suitable Tyr-analog to be used as a spectroscopic probe in FRET studies for investigating ligand–protein interactions.
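Because the 422 nm band belongs to the ionized form of NT, the pH dependence of its absorption can be anticipated from the Henderson–Hasselbalch relation and the pKa of 6.8 given above. The following sketch simply tabulates the ionized fraction at a few pH values; the pH values are arbitrary examples, and the calculation treats free NT as a single ideal titrating group, which may not hold for NT buried in a protein.

```python
def fraction_ionized(ph, pka):
    """Fraction of a monoprotic acid present in the deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PKA_NT = 6.8  # phenolic pKa of 3-nitro-tyrosine, from the text

# Arbitrary example pH values spanning the titration range.
example_ph = [5.0, 6.8, 7.4, 9.0]
ionized = {ph: fraction_ionized(ph, PKA_NT) for ph in example_ph}

for ph, frac in ionized.items():
    print(f"pH {ph:4.1f}: {100 * frac:5.1f}% ionized (422 nm form)")
```

At pH equal to the pKa the two forms are equally populated, and at pH 7.4 roughly four fifths of NT is in the ionized, 422 nm absorbing form.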
4.3 p-Iodo-Phenylalanine as a Trp-fluorescence quencher for studying the binding topology of PRP to SH3 domains
Iodine-containing molecules are known to act as collisional quenchers of Trp fluorescence by promoting non-radiative decay of the excited singlet state through intersystem crossing to an excited triplet state.87 Here, we report the use of p-iodophenylalanine (pIF) as a spectroscopic probe in fluorescence quenching measurements as a means to determine the affinity and the binding mode of PRP in complexes with SH3.88
SH3 domains are widespread protein modules (50-70 amino acids) in eukaryotic organisms that bind to Pro-rich regions on many different proteins/enzymes, forming multiprotein networks in signal transduction processes.89 Despite the large differences in the sequences of both interacting partners, the binding mode of SH3-PRP complexes is highly reproducible90 and, in a minimalist approach, two parameters seem to be sufficient to describe an SH3-PRP complex: the affinity of the interaction and the directionality of the peptide on the SH3 domain. The target peptides, typically 7-15 amino acids long, adopt a left-handed polyproline II helix conformation (PPII) and sit in a well-defined groove of the SH3 domain.91 Because of the intrinsic symmetry of PPII helices, the peptides can in principle be accommodated into the SH3 groove in two alternative and opposite orientations, according to their sequence motifs shown in Figure 11A and classified as classes I and II.89, 91 In this view, a fast and reliable determination of the affinity and orientation of selected PRP on a given SH3 domain can be exploited for: (1) classifying the peptide binders under investigation and (2) generating low-resolution models, with no need of solving the detailed atomic structure of each complex, thus avoiding elaborate and time-consuming structural analysis. To this aim, we studied the position-dependent effect of pIF incorporation into a class-I PRP, hereafter denoted as P2, on the fluorescence intensity of the SH3 domain of type-I myosin isoform 3 from Saccharomyces cerevisiae, Myo3-SH3 (Table 3). 
Myo3-SH3 is known to recognize class-I peptides and the NMR solution structure of the Myo3-SH3/P2 complex reveals that the N-terminal and the C-terminal ends of P2 are at 6 Å and 22 Å from Trp-39 (Cβ-Cβ distance), respectively, where Trp-39 sits directly in the peptide binding pocket and is highly conserved among the SH3 sequences (Figure 11).92 The remarkable difference in the distance of the N- and C-termini of PRP from Trp-39 is confirmed by the analysis of known SH3-PRP complexes in the Protein Data Bank, where the N-terminus of class-I peptides is much closer (4.7 ± 0.9 Å) to Trp-39 than the C-terminus (18.8 ± 1.5 Å). Notably, this pattern is inverted in class-II peptides. For a class-I peptide binding to Myo3-SH3, we can therefore confidently expect that a pIF located at the N-terminal end of P2 should bring the iodine atom in close proximity to Trp-39 and specifically quench fluorescence by a direct collisional mechanism. Conversely, incorporation of pIF in the C-terminal region should not significantly affect the emission properties, as the quencher is too far from the Trp pocket. For class-II peptides, the position-dependent spectroscopic effects of pIF incorporation are expected to be reversed.

Peptide ID | Peptide sequence | Kd (μM) a |
---|---|---|
P2 | 1 His-Pro-Pro-Arg-Lys-Pro-Pro-Pro-Pro-Pro10 | 18.8 ± 0.9 |
His1pIF | 1 pIF-Pro-Pro-Arg-Lys-Pro-Pro-Pro-Pro-Pro10 | 16.5 ± 0.4 |
Gly13pIF | 1 His-Pro-Pro-Arg-Lys-Pro-Pro-Pro-Pro-Pro-Gly-Gly-pIF-Gly 14 | 18.5 ± 0.8 |
- a Kd values were obtained by fitting the fluorescence data reported in Figure 12B with the Langmuir equation, describing the one-site binding model.
To experimentally validate our hypothesis, two synthetic P2 analogs were produced, containing the substitution His1 → pIF (His1pIF) and Gly13 → pIF (Gly13pIF) (Table 3). As shown in Figure 12A, incorporation of pIF at position 1 of P2 quenches the emission of Myo3-SH3 by ~50%, with a concomitant blueshift in the λmax value of 7 nm, whereas when pIF was inserted at position 13 the quenching effect was markedly lower and superimposable on that of the unmodified P2 (Figure 12B). In the case of His1pIF, pIF is brought close enough to Trp-39 to quench Trp fluorescence and shield the fluorophore from the polar aqueous solvent. Conversely, in Gly13pIF, the quencher is too far from Trp-39 to exert any effect. These results provide evidence that the position-dependent incorporation of pIF into SH3 binding peptides can be effectively used to obtain fast and reliable information on their affinity and orientation in the SH3-peptide complexes, making it possible to quickly grasp the basic features of the complex and to avoid lengthy and difficult structural determinations.93, 94
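The Kd values in Table 3 were obtained by fitting fluorescence titrations with the one-site Langmuir model. As an illustration of that kind of fit, and not a reproduction of the procedure actually used, the sketch below fits signal = A·[L]/(Kd + [L]) by scanning a grid of Kd values and solving the amplitude A analytically at each point. The titration data are synthetic and noise-free, generated from the model itself with the Kd reported for P2 (18.8 μM); the concentrations are arbitrary. The model also assumes that the free peptide concentration is close to the total concentration added.

```python
def langmuir(ligand, kd, amplitude):
    """One-site binding signal at a given free ligand concentration."""
    return amplitude * ligand / (kd + ligand)


def fit_langmuir(ligand_conc, signal, kd_grid):
    """Least-squares fit of Kd and amplitude by a simple grid scan over Kd."""
    best = None
    for kd in kd_grid:
        occupancy = [c / (kd + c) for c in ligand_conc]
        # For a fixed Kd the best amplitude has a closed-form solution.
        amplitude = (sum(s * x for s, x in zip(signal, occupancy))
                     / sum(x * x for x in occupancy))
        sse = sum((s - amplitude * x) ** 2 for s, x in zip(signal, occupancy))
        if best is None or sse < best[0]:
            best = (sse, kd, amplitude)
    return best[1], best[2]


# Synthetic, noise-free titration (concentrations in μM are arbitrary).
conc_uM = [2.0, 5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
synthetic_signal = [langmuir(c, 18.8, 1.0) for c in conc_uM]

kd_grid = [0.1 * i for i in range(1, 2001)]  # 0.1 to 200 μM
kd_fit, amplitude_fit = fit_langmuir(conc_uM, synthetic_signal, kd_grid)

print(round(kd_fit, 1), round(amplitude_fit, 3))
```

A grid scan is used here only to keep the example dependency-free; with real, noisy data a nonlinear least-squares routine that also reports parameter uncertainties would normally be preferred.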




5 INCORPORATION OF NON-CODED AMINO ACIDS AS A TOOL TO INVESTIGATE ENZYME SUBSTRATE SPECIFICITY
vWF is a multi-domain plasma protein (240 kDa; 2050 amino acids) which is predominantly expressed in vascular endothelial cells and platelets and is secreted in the bloodstream as disulfide-bridged ultra-large polymers (UL-vWF), which are formed by up to 200 monomers with a molecular weight of 20-50 MDa95 (Figure 13A). Under high shear forces in the bloodstream, globular UL-vWF multimers become stretched and expose internal domains which are crucial for the interaction of vWF with the GpIbα receptor on the platelet surface and for the recruitment/activation of platelets, to initiate primary hemostasis.96 Notably, the platelet aggregating potential of vWF is crucially dependent on its length, as longer UL-vWF are intrinsically more sensitive to shear-induced unfolding than shorter oligomers. Hence, UL-vWF polymers are more prothrombotic than shorter vWF species.96 The length of UL-vWF multimers is regulated by proteolysis by ADAMTS-13 (A Disintegrin And Metalloproteinase with ThromboSpondin motifs family member 13), which exclusively cleaves vWF at the single peptide bond Tyr1605-Met160697 (Figure 13A). ADAMTS-13 is a 190-kDa multi-domain glycoprotein secreted by the liver. The ADAMTS-13 metalloprotease domain contains three calcium ions and a catalytic zinc in the active site.97 Intriguingly, the scissile bond is shielded in the core of the vWF A2 domain and is cleaved by the protease only after vWF is stretched by shear forces and the vulnerable bond becomes exposed to the solvent.95 Hence, the pro-thrombotic potential of vWF is the result of a dynamic equilibrium between the concentration of bioactive UL-vWF and the proteolytic activity of ADAMTS-13. This delicate balance can be altered by genetic, immunological, and chemical factors. 
We have demonstrated that elevated concentrations of reactive oxygen and nitrogen species can shift the oligomers ↔ polymers equilibrium toward vWF polymers by making the Tyr1605-Met1606 bond resistant to proteolysis by ADAMTS-13, thus leading to the accumulation of the more pro-thrombotic UL-vWF polymers. Consistent with this model, we found that the proportion of UL-vWF is significantly higher in the plasma of patients with type-2 diabetes and chronic kidney disease,98-100 severe pathologies which are characterized by an elevated oxidative stress and very often complicated by dramatic thrombotic microangiopathies.101

Literature data indicate that vWF is the only substrate for ADAMTS-13 and that the minimum length of vWF sequence for efficient cleavage encompasses the 74-amino acid chain spanning residues from Asp1596 to Arg1669 (Figure 13B).98 The resulting vWF74 synthetic polypeptide assumes an unfolded/disordered conformation in solution.98 Considering that both amino acids forming the scissile bond in vWF (ie, Tyr1605-Met1606) are quite sensitive to oxidation, to yield 3-nitro-Tyr (NT) and MetSO, we proposed that oxidative modification of one or both of these amino acids might impair cleavage of the vulnerable bond by ADAMTS-13, thus shifting the oligomers ↔ polymers equilibrium toward the accumulation of UL-vWF polymers.98-100 vWF is a complex molecule and contains many oxidant-sensitive amino acids (169 Cys, 41 Met, 18 Trp and 49 Tyr). Hence, to test our hypothesis on a simpler substrate, we decided to chemically synthesize the wild-type vWF74 and two analogs containing at the scissile bond the oxidation products of Tyr1605 and Met1606, namely NT and MetSO, respectively, resulting from nitro-oxidative reactions in vivo by peroxynitrite and superoxide radicals.102 The corresponding Tyr1605NT and Met1606MetSO analogs of vWF74 were chemically synthesized by stepwise Fmoc chemistry, purified by semi-preparative RP-HPLC, and characterized by high-resolution mass spectrometry (Figure 13C,D). Strikingly, enzyme activity assays in Figure 13E indicate that oxidation of Met1606 to MetSO completely abrogates vWF74 cleavage by ADAMTS-13, whereas nitration of Tyr1605 to NT seems to even increase cleavage efficiency compared to wild-type vWF74. 
Further investigations by docking simulations provided important clues for rationalizing this puzzling behavior (Figure 13F).103 For the analog Tyr1605NT, in fact, the large and apolar primary specificity site (S1) of ADAMTS-13 can easily accommodate the larger NT residue (ΔVol [NT-Tyr] = 30 Å³), thus explaining the enhanced cleavage efficiency observed with Tyr1605NT. For Met1606MetSO, although oxidation of Met to MetSO increases the side-chain volume by only 8 Å³, it turns a hydrophobic amino acid like Met (logP = 2.33) into a highly hydrophilic residue like MetSO (logP = −1.04).104 The burial of the polar MetSO into the apolar S1′ subsite of ADAMTS-13 is energetically unfavorable. Furthermore, Met oxidation generates a mixture of R and S sulfoxide diastereoisomers at the Sδ atom. From our docking simulations, it seems that only the (R)-MetSO isomer can fit in the protease S1′ site, whereas the (S)-MetSO isomer bumps against His224 in the ADAMTS-13 catalytic site.
Altogether, these findings allowed us to unequivocally link the extent of the oxidative modification at Met1606 to the accumulation of UL-vWF polymers and to increased thrombotic risk in diseases characterized by high oxidative stress.
6 CONCLUDING REMARKS
The incorporation of non-coded amino acids into proteins has emerged as a promising approach in protein science for elucidating the molecular mechanisms underlying protein folding, stability, and function.21, 105, 106 Compared to peptide engineering, protein engineering with non-coded amino acids has a greater predictive power of the mutational effects. Indeed, amino acid substitutions at a selected site on a short disordered peptide affect not only the productive interactions of the peptide ligand with the receptor binding site, but also the conformational properties of the peptide chain in an unpredictable fashion, such that it becomes difficult to discriminate between direct (ie, active interactions on the receptor) and indirect (ie, conformational) effects of amino acid exchanges. On the other hand, amino acid substitutions on a well-defined ordered structure of a protein ligand are expected to solely influence the binding strength to the receptor, without significantly perturbing the conformation/flexibility of the protein ligand. The results of our work during the last two decades demonstrate that chemical methods, that is, enzyme-catalyzed semisynthesis and stepwise solid-phase synthesis, make it possible to transfer the SAR methodology, typical of the medicinal chemistry approach for small-molecule drug development, to real protein systems. In particular, the possibility to site-specifically incorporate non-coded amino acids with a structural and chemical diversity much larger than that attainable with standard amino acids makes it possible to disclose “hidden interactions” and to assign the effect of a given mutation to the variation of a single physicochemical property, thus improving the predictive power of protein engineering.
Beyond structure–stability–activity relationship studies, the incorporation of non-coded amino acids has been exploited in supramolecular chemistry to modulate the conformational propensities of self-assembling peptides. It has also been used to introduce spectroscopic probes and photosensitizers for diagnostic and therapeutic applications into self-assembled synthetic peptides (eg, RGD-based), polypeptides (eg, poly-Lys/poly-Leu) and natural proteins (eg, apoferritin).26, 107, 108 In particular, the possibility of conjugating photosensitizer moieties (eg, phthalocyanine, protoporphyrin, and purpurin) to self-assembled peptides/proteins, which are able to locally release heat or cytotoxic reactive oxygen species upon controlled light stimulation, has opened new and promising avenues for the fabrication of biocompatible nanostructures for use in antitumor photothermal and photodynamic therapy.26, 107, 108
ACKNOWLEDGMENTS
This work was financially supported in part by PRAT-2015 Grant from the University of Padua to V.D.F.
CONFLICT OF INTERESTS
The authors state that they have no conflict of interest.
Biography
Vincenzo De Filippis received his degree in Pharmaceutical Chemistry and Technology from the University of Naples Federico II, Italy, in 1987. He then moved to the University of Padua, Italy, as a Research Associate (1988-1991), Assistant Professor for Organic Chemistry (1992-2005), and Associate Professor of Biochemistry (2006-present). He has been the head of the Protein Chemistry Laboratory, in the Department of Pharmaceutical and Pharmacological Sciences since 1998. In 2012, he received the Italian National Academic Qualification of Full Professor in Biochemistry. His scientific interests lie at the interplay of organic chemistry, biochemistry and molecular medicine, and are currently focused on the study of blood coagulation mechanisms and development of novel anticoagulants.