Tools for protein science
When I started as a Ph.D. student in Australia in 1961, no one in our department, or for that matter in the entire University of Adelaide, had a computer. All of our calculations had to be done by hand. Determining a projection electron density map for a simple small molecule crystal structure, for example, took three days and nights of solid effort. (But as I realized later, it did give one a deep understanding of the nitty-gritty of X-ray structure determination.) At that time the first protein structure, of myoglobin, had recently been determined by Kendrew's group at the MRC Lab in Cambridge using the prototype computer EDSAC I.
A few years later, when I was a postdoc at the MRC myself, it still took several months, and many chunky crystals, to obtain a single dataset for α-chymotrypsin.
Now, of course, the field has been revolutionized. Everyone has access to abundant, powerful computing facilities. High-resolution X-ray datasets can be collected from a single small crystal in minutes if not seconds.
-
Structure Determination
-
X-ray crystallography
For the first protein structure determinations, each laboratory wrote its own software, which was not only lab-specific but also restricted to the protein under study. Robert Huber's group, in Munich, pioneered the generalization of their computational procedures so that they could accommodate any protein in any crystal form. Eleanor Dodson, in York, wanted to go further, namely to develop software that would not only be general-purpose, but could be readily used by structural biology groups worldwide. A brief history of the birth and development of her “baby,” CCP4, is included (DOI: 10.1002/pro.3298). Jon Agirre also describes the CLIPPER PYTHON module which will further automate tasks in protein structure determination (DOI: 10.1002/pro.3299).
Radiation damage remains an ongoing problem in macromolecular crystallography. It can be reduced but not eliminated by freezing. RADDOSE-3D, developed by Elspeth Garman and her colleagues (DOI: 10.1002/pro.3302), allows the experimenter to monitor radiation damage both spatially and temporally. It permits more efficient data collection as well as analysis of the structural effects of radiation damage.
-
Nuclear magnetic resonance
Nuclear magnetic resonance (NMR) continues to be one of the most powerful techniques to understand protein structure and dynamics. As has become apparent, information from NMR and X-ray crystallography is often complementary, not competing. The Xplor-NIH suite developed by Marius Clore, Charles Schwieters and coworkers (DOI: 10.1002/pro.3248) is one of the most popular tools for structure analysis using NMR and can also exploit information from other sources.
An area in which NMR is making unique contributions is in the analysis of intrinsically disordered proteins (IDPs). Here the group of Ad Bax describes an approach to define the (φ,ψ) angle distribution of statistical coil proteins in order to compare to IDPs for identification of preferential structure (DOI: 10.1002/pro.3292).
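For readers less familiar with the (φ,ψ) description, the following is a minimal sketch of how the two backbone torsion angles can be computed from atomic coordinates. It is generic geometry, not the method of the Bax group; the `residues` data layout and the function names are assumptions made for illustration only.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond (cis = 0)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(residues):
    """Yield (index, phi, psi) for every interior residue.

    `residues` is a hypothetical list of dicts holding 'N', 'CA' and 'C'
    coordinates as (x, y, z) tuples, in chain order with no gaps.
    """
    for i in range(1, len(residues) - 1):
        prev, cur, nxt = residues[i - 1], residues[i], residues[i + 1]
        phi = dihedral(prev["C"], cur["N"], cur["CA"], cur["C"])
        psi = dihedral(cur["N"], cur["CA"], cur["C"], nxt["N"])
        yield i, phi, psi
```

Collecting the (φ,ψ) pairs over many residues gives the kind of angle distribution that can then be compared between a reference set and a disordered protein.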
-
Electron microscopy
The explosion in cryo-electron microscopy (cryo-EM) during the past decade or so is self-evident, culminating in the recent award of the Nobel Prize in Chemistry to Richard Henderson, Joachim Frank, and Jacques Dubochet. The task of collating, sorting, assembling, and combining tens of thousands of individual images to yield the structure of a biological complex is unthinkable without massive computing power and highly sophisticated software.
Here, Jose Maria Carazo and coworkers describe Scipion Web Tools that allow cryo-EM image processing over the web, and are also tailored for non-expert users (DOI: 10.1002/pro.3315).
The Bsoft package for cryo-EM, which has a 20-year history in the field, is described by Bernard Heymann (DOI: 10.1002/pro.3293). It allows customization for specific cases, and also includes various validation procedures which have been found to be essential.
The SIMPLE/PRIME tool, described by Hans Elmlund and coworkers (DOI: 10.1002/pro.3266), incorporates recent innovations in cryo-EM to reduce noise and improve resolution.
-
Circular dichroism
Circular dichroism (CD) continues to be a well-established method especially useful for analyzing the secondary structure and folding status of proteins. DichroMatch (DOI: 10.1002/pro.3207) is an online tool by Bonnie Wallace, Robert Janes and coworkers, now available via the Protein Circular Dichroism Data Bank. It is now possible, for example, to identify spectral nearest neighbors based on different methods of matching.
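The idea of spectral nearest neighbors can be illustrated with a small sketch. It assumes that all spectra have already been sampled on the same wavelength grid, and it uses two generic distance measures (point-by-point RMSD and one minus the Pearson correlation). These are illustrative choices, not necessarily the matching methods implemented in DichroMatch, and the function names are hypothetical.

```python
import math


def rmsd(a, b):
    """Root-mean-square difference between two equally sampled spectra."""
    if len(a) != len(b):
        raise ValueError("spectra must share the same wavelength grid")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def shape_distance(a, b):
    """1 - Pearson correlation: compares band shape, ignoring overall scale."""
    if len(a) != len(b):
        raise ValueError("spectra must share the same wavelength grid")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat spectrum has no defined shape")
    return 1.0 - cov / math.sqrt(var_a * var_b)


def nearest_spectra(query, library, k=3, metric=rmsd):
    """Return the k library entries closest to `query` as (name, distance)."""
    scored = sorted((metric(query, spec), name) for name, spec in library.items())
    return [(name, score) for score, name in scored[:k]]


# Toy values, invented purely for illustration (not measured spectra).
library = {
    "helix_like": [-30.0, -33.0, -20.0, 40.0, 70.0],
    "sheet_like": [-5.0, -18.0, -10.0, 15.0, 30.0],
    "coil_like": [-35.0, -5.0, 2.0, 3.0, 1.0],
}
```

Switching `metric` changes what counts as "near": RMSD is sensitive to the absolute magnitude of the signal, whereas the correlation-based distance matches a spectrum to one of the same shape even if the concentration or path length was misjudged.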
-
-
Protein Structure, Visualization, and Analysis
It is worth remembering that less than 50 years ago there was no such thing as a protein data bank. Furthermore, there were eminent structural biologists who opposed the idea that when a new structure was reported in the literature, the coordinates should be deposited and made available to the community at large. There are still areas of science where key data are withheld from dissemination. Those opposed to such withholding point to structural biology as an example of a field in which science is conducted in a way that should be emulated. In this context, Christine Zardecki, Steven Burley and their group at Rutgers (DOI: 10.1002/pro.3331), as well as Haruki Nakamura and coworkers in Japan (DOI: 10.1002/pro.3273), give updates on new tools available at the worldwide Protein Data Bank.
Another very useful structural database, PixelDB, has been developed by Amy Keating's group at MIT (DOI: 10.1002/pro.3320). It includes high-resolution structures of protein–peptide complexes and is organized to facilitate the study of structurally conserved versus non-conserved elements of protein–peptide engagement.
UCSF ChimeraX, developed by Thomas Ferrin and coworkers (DOI: 10.1002/pro.3235), is “next generation” molecular visualization software built on Chimera. As noted by a reviewer, “ChimeraX is a huge leap forward in the area of macromolecular (and beyond) visualization, using state-of-the-art methods and engineering, and ready to handle the kind of multi-source, multi-scale data we'll be seeing more and more often in the future.”
PDBsum (DOI: 10.1002/pro.3289) developed by Roman Laskowski and coworkers is an atlas of proteins for which structures are in the Protein Data Bank. It provides a wealth of useful visual information including interaction plots, “wiring diagrams” of secondary structure, Ramachandran plots and networks of related protein architectures.
In order to analyze and represent the conformational variation in proteins, Andrew Brereton and Andrew Karplus have developed Ensemblator v3, which is a novel approach to compare user-defined groups of models in residue-level detail (DOI: 10.1002/pro.3249). The approach, which requires no prior knowledge about an ensemble, classifies models into subgroups and calculates a novel “discrimination index” that quantifies similarities and differences.
When the Protein Data Bank was first created the emphasis was on having as many sets of coordinates deposited as quickly as possible so that the information would not be lost. There was less concern about checking the quality of the coordinates or having the supporting experimental data deposited along with the coordinates. Now, every entry in the PDB is accompanied by validation information which gives the user key guidelines as to the quality of the structure.
One of the most valuable tools for structure validation is MolProbity, developed by Jane and Dave Richardson and their coworkers. It is central to the validation summary which now accompanies each entry in the PDB and makes it possible for the user to assess the quality of the coordinates. The report included here (DOI: 10.1002/pro.3330) provides a major update of MolProbity based on a wealth of new information including better-determined hydrogen bonding and van der Waals parameters. New validations include updated rotamers, diagnosis of misfit secondary structure, and flagging of cis-nonPro or twisted peptides.
Another very useful tool to detect and remediate errors in crystallographic models is Structure Comparison (DOI: 10.1002/pro.3296), developed by Nigel Moriarty and coworkers, which makes it possible to compare multiple similar structures and to correct differences not supported by appropriate electron density.
Wladek Minor and coworkers have developed Molstack, a cloud-based server useful for presenting structural models along with the electron density maps used to derive them, plus metrics useful for further validation (DOI: 10.1002/pro.3272).
The Adaptive Poisson–Boltzmann Solver (APBS) solvation software was developed by a consortium of users to solve the equations of continuum electrostatics for large biomolecular assemblies. The present report (DOI: 10.1002/pro.3280) by Nathan Baker and coworkers describes a number of recent enhancements including a geometry-based flow solvation model, a graph theory algorithm for determining pKa values, and an improved web-based tool for viewing electrostatics.
-
Protein and Small Molecule Modeling
The combination of readily available computing resources with large databases of experimentally determined protein structures and their complexes has led to an explosion in protein modeling, design, and prediction. A number of tools to facilitate such studies are included in this special issue.
IMP, developed by Andrej Sali, Benjamin Webb and coworkers (DOI: 10.1002/pro.3311), is a powerful modeling platform for integrative studies of macromolecular assemblies. It allows users to combine information from a variety of different sources such as electron microscopy, proteomics, sparse nuclear magnetic resonance, and X-ray crystallography.
Another powerful toolbox for integrative structure modeling is MMM, developed by Gunnar Jeschke's group (DOI: 10.1002/pro.3269). It grew out of a tool for predicting conformation distributions of spin label side-chains and now allows incorporation of restraints from electron paramagnetic resonance experiments with site-directed spin labeling, and from other experimental techniques.
DOCKGROUND, developed by Ilya Vakser, Petras Kundrotas and colleagues (DOI: 10.1002/pro.3295), is a comprehensive data resource for studying protein complexes. It provides the protein docking community with realistic test sets of varying difficulty, and facilitates the development of docking algorithms and intermolecular potentials.
Derek Woolfson and Christopher Wood have improved and updated their coiled-coil modeling tool CCBuilder (DOI: 10.1002/pro.3279). As noted by a reviewer, “The web interface is appealing, user friendly, and nicely integrated.”
Interfering with a protein–protein interaction can often be therapeutically beneficial. But at the same time, if the interface is large, flat and “featureless,” it may be difficult to disrupt with a small-molecule ligand. To address this challenge, Carlos Camacho and coworkers have developed AnchorQuery (DOI: 10.1002/pro.3303) which can be used to computationally screen libraries consisting of tens of millions of potential binders.
Many groups, under the leadership of David Baker, have contributed to the development of Rosetta, a molecular modeling suite which provides a wide range of tools for prediction and design of biological macromolecules. Here, Jeffrey Gray and colleagues describe ROSIE (DOI: 10.1002/pro.3313), which provides a common environment for hosting Rosetta protocols, and permits their easier implementation. Also, Jens Meiler and Amanda Duran describe progress toward the use of Rosetta in designing membrane proteins (DOI: 10.1002/pro.3335).
Jianpeng Ma and coworkers have been developing a novel method named OPUS-CSF for validating and scoring protein structural models. It is based on comparison with main-chain atoms from peptide segments of 5, 7, 9, and 11 residues in length, taken from all of the structures in the PDB (DOI: 10.1002/pro.3327). It will be very interesting to see how this method compares with more traditional approaches, both in terms of speed and effectiveness.
HDBSCAN, developed by Freddie Salsbury and coworkers (DOI: 10.1002/pro.3268), allows one to visualize correlated motion in proteins and is intended to address the question, “How are different parts of a protein dynamically coupled?” The approach is applied to several protein systems to illustrate the type of information that can be obtained.
-
Sequence Analysis
One of the most powerful tools for protein sequence analysis has been provided via the various iterations of Clustal. Multiple sequence alignment is a core component of bioinformatics, and Clustal Omega, as described here (DOI: 10.1002/pro.3290), will continue to be of exceptional value in making accurate alignments of multiple sequences.
Intrinsically disordered regions of protein structure are now known to occur frequently, and are involved in multiple biological functions. Zsuzsanna Dosztanyi describes a tool, IUPred (DOI: 10.1002/pro.3334), to predict regions of protein disorder based on amino acid sequence.
It is well known that most amino acids can be encoded by two or more synonymous codons. What is not so well appreciated, however, is that synonymous codons cannot be interchanged with impunity. Substitution of a rarely-used codon for a common one, or vice versa, can have a substantial impact on the yield and folding efficiency of the encoded protein. %MinMax, developed by Patricia Clark's group, evaluates the relative frequencies of codon usage and can predict the effects of synonymous codon substitutions (DOI: 10.1002/pro.3336).
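To make the notion of relative codon usage concrete, here is a simplified sketch of a sliding-window score in the spirit of a min/max comparison: within each window, the observed codon frequencies are compared with the average, the most common, and the rarest synonymous alternatives. The scaling used here, the default window length, the function names, and the usage table are all assumptions for illustration; the sketch should not be read as the published %MinMax implementation.

```python
def window_score(codons, usage):
    """Score one window from -100 (all rarest codons) to +100 (all commonest).

    `usage` maps each amino acid to a dict {codon: frequency} of its
    synonymous codons (a hypothetical layout chosen for this sketch).
    """
    # Reverse lookup: codon -> frequencies of all its synonyms.
    family = {c: list(freqs.values()) for freqs in usage.values() for c in freqs}
    freq = {c: f for freqs in usage.values() for c, f in freqs.items()}

    actual = avg = hi = lo = 0.0
    for codon in codons:
        synonyms = family[codon]
        actual += freq[codon]
        avg += sum(synonyms) / len(synonyms)
        hi += max(synonyms)
        lo += min(synonyms)

    if actual >= avg:
        # Windows of single-codon amino acids have hi == avg; score them 0.
        return 100.0 * (actual - avg) / (hi - avg) if hi > avg else 0.0
    return -100.0 * (avg - actual) / (avg - lo)


def codon_profile(codons, usage, window=18):
    """Slide a window along the codon list and score each position."""
    return [window_score(codons[i:i + window], usage)
            for i in range(len(codons) - window + 1)]


# Toy usage table with invented frequencies (not real organism data).
toy_usage = {
    "Lys": {"AAA": 30.0, "AAG": 10.0},
    "Glu": {"GAA": 40.0, "GAG": 20.0},
}
```

With such a profile, a synonymous substitution shows up as a local shift of the score toward the "common" or the "rare" end, which is the kind of change the paragraph above warns can affect yield and folding.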
-
Proteins in the Cell
Arne Elofsson and colleagues have developed a tool, SubCons, to predict the subcellular localization of a protein based on its amino acid sequence (DOI: 10.1002/pro.3297). It is possible to carry out proteome-wide analysis and also to download predictions for several eukaryotic organisms.
About 10 years ago a very ambitious project was initiated, namely to develop a Human Protein Atlas. It was intended to describe the dynamic expression of all human proteins in different tissues—both in health and disease. The present report, from Peter Thul and Cecilia Lindskog (DOI: 10.1002/pro.3307), comprehensively describes the Atlas database functions and how users can utilize it for their own research.
-
Brian Matthews
-
Institute of Molecular Biology
-
University of Oregon
-
Eugene, OR 97403, USA
-
E-mail: [email protected]