Volume 7, Issue 5-6, pp. 388-391
Personal Perspective

Bringing proteomics into the clinic: The need for the field to finally take itself seriously

Lennart Martens

Department of Medical Protein Research, VIB, Ghent, Belgium

Department of Biochemistry, Ghent University, Ghent, Belgium

Correspondence: Professor Lennart Martens, Department of Medical Protein Research, VIB, Ghent University, A. Baertsoenkaai 3, B-9000 Ghent, Belgium

E-mail: [email protected]

Fax: +32-92649484

First published: 02 May 2013

Abstract

Proteomics has fast become a standard tool in the life sciences, with increasingly sophisticated approaches and instruments delivering ever-growing numbers of identified and quantified proteins. Yet despite the enormous technological progress, and the triumphant papers published on whole-cell proteomes being collected and analyzed, proteomics has so far failed to enter the clinic for routine applications. This is a peculiar contradiction, and one that warrants some closer study. I here argue that for proteomics to make a difference in the clinic, it needs to stop shirking responsibility, and to mature into an analytical, transparent, and reproducible discipline that also invests in the consolidation of its technology rather than only focusing on the next big leap forward. A key enabling factor in this maturation process is quality control and quality assurance, with bioinformatics, in its least noticeable but most influential form, as a key underlying technology.

Abbreviations

  • CPTAC: Clinical Proteomic Tumor Analysis Consortium
  • QC: quality control
Proteomics has matured considerably in technological terms. The advent of improved instrumentation [1] and methods [2] has enabled the field to dig ever deeper into the proteomes of cells and tissues, identifying and quantifying thousands of proteins in the process [3-5]. As a result of this continuous increase in analytical power, several exultant papers on the technology have been published [6-8]. However, growth in a field does not (and should not) only consist of raw analytical power. Indeed, besides such technological development, a field can also mature as a respected supplier of information to downstream research, and it can become an accredited, production-grade discipline in terms of accuracy, precision, and reproducibility. The former, where proteomics can for instance function as a key data supplier for biological model development, has recently been addressed in an editorial on publication guidelines in the field, where it is pointed out that proteomics datasets should be aimed at such predictive modeling efforts in order for them to be considered of real value [9]. The latter type of growth, on the other hand, concerning the maturation of proteomics as a quality-assured analytical platform, seems to be less fashionable at the present time. Yet, for a field that has entered its second decade as a high-throughput means to detect and analyze entire proteomes, consolidation efforts to establish the mass spectrometer as a reliable and reproducible tool to analyze hundreds to thousands of samples with consistent sensitivity and specificity are already overdue.

That is not to say that efforts have not been undertaken, as recent publications on across-site reproducibility illustrate [10-12]. Yet despite the effort spent, all these studies are comparatively small scale, relying on relatively simple samples and on a high degree of coordination and crosstalk between the partners involved. It is also worth noting that some of these studies consider discovery experiments, while others look at validation experiments using targeted proteomics. These two types of experiments typically come with different robustness requirements, where discovery experiments can be more tolerant of limited robustness if follow-up studies are cheap and fast anyway. As such, a variable threshold for reproducibility should be taken into account, much as variable thresholds for statistical significance can be chosen a priori when applying a test.

Overall, however, as a field, we are quite far away from demonstrating that we can run something as straightforward as a set of 20 patient-derived samples reliably at 50 different sites in either discovery or validation type experiments, even if standardized instrumentation and protocols could be used. Note that there is no implicit statement here that we cannot in principle perform this feat with current technology; it is just that as a field we have never seriously invested in building up the required analytical rigor. The NCI CPTAC (Clinical Proteomic Tumor Analysis Consortium) project has probably come closest to such an effort, and it is telling that one of the main papers to come out of this project has been the definition of a set of automatically extracted quality control features that can be used to assess system performance [13].
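To make this more concrete, the sketch below illustrates what the automated extraction of run-level QC metrics can look like in practice. It is a minimal Python illustration, assuming the open source pyteomics library and an mzML input file; the function name and the handful of metrics computed here are illustrative choices, and the metric set defined in the CPTAC work [13] is considerably more extensive.

```python
# Minimal sketch: extract a few run-level QC metrics from an mzML file.
# Assumes the pyteomics library; the CPTAC metric set is far more
# extensive than the examples computed here.
from statistics import median

from pyteomics import mzml


def extract_qc_metrics(mzml_path):
    """Compute simple run-level quality control metrics from an mzML file."""
    ms1_tics = []   # total ion current per MS1 scan
    ms2_count = 0   # number of fragmentation spectra acquired
    with mzml.read(mzml_path) as reader:
        for spectrum in reader:
            if spectrum.get("ms level") == 1:
                ms1_tics.append(spectrum.get("total ion current", 0.0))
            elif spectrum.get("ms level") == 2:
                ms2_count += 1
    return {
        "ms1_scans": len(ms1_tics),
        "ms2_scans": ms2_count,
        "median_ms1_tic": median(ms1_tics) if ms1_tics else 0.0,
        # Ratio of MS2 to MS1 scans: a crude proxy for sampling efficiency.
        "ms2_per_ms1": ms2_count / len(ms1_tics) if ms1_tics else 0.0,
    }


if __name__ == "__main__":
    print(extract_qc_metrics("example_run.mzML"))  # hypothetical input file
```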
The CPTAC project work also showed the importance of consistent and standardized sample handling, and the need to limit preanalytical effects as much as possible, issues that are of course not unique to proteomics, and that need to be taken carefully into account in any analytical endeavor. Since the publication of the quality control paper by the CPTAC consortium, a follow-up implementation has been published to generalize the allowed input formats and fix some minor flaws [14], with another, more focused approach to automated quality control recently published as well [15]. Compared to an earlier effort to perform online quality scans to halt a system when its performance dropped below a threshold [16], these more recent papers all discuss extensive a posteriori metrics meant to be used exclusively for quality control and reporting. It cannot be emphasized sufficiently how important such a development is for the long-term growth of the field of proteomics, and the authors of the above-mentioned papers correspondingly deserve the community's respect for their role as intrepid trailblazers in this neglected area. Indeed, where previous reporting on documentation and quality filtering methods has covered the details of the experiment [17], or the specifics of protein or peptide identification [18, 19], there has as yet been no movement on the part of the journals to mandate concise, standardized quality control reports alongside proteomics data. It is worth noting in this context that the mandatory accompaniment of experimental data with a standardized quality control report is not an uncommon practice in other, closely related fields. The sibling field of protein structure analysis, for instance, has long mandated that every structure file be processed through publicly available, standardized online tools that produce standardized quality control reports (see http://www.rcsb.org/pdb/static.do?p=software/software_links/analysis_and_verification.html) as a prerequisite for structure deposition. It is a legitimate question, then, why a similar set of tools, standards, and stringent guidelines does not yet exist for MS-based proteomics.
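As a purely illustrative sketch of what such a standardized, machine-readable QC report accompanying a data deposition could look like, consider the following; the schema, field names, and identifiers are hypothetical, since no such community standard is currently mandated.

```python
# Minimal sketch of a standardized, machine-readable QC report that could
# accompany a dataset deposition. The field names and schema are
# hypothetical illustrations, not an established community standard.
import json
from datetime import date


def write_qc_report(run_id, metrics, instrument, out_path):
    """Serialize run-level QC metrics into a simple standardized report."""
    report = {
        "report_version": "0.1",   # hypothetical schema version
        "run_id": run_id,
        "instrument": instrument,
        "report_date": date.today().isoformat(),
        "metrics": metrics,        # e.g. output of extract_qc_metrics()
    }
    with open(out_path, "w") as handle:
        json.dump(report, handle, indent=2)


write_qc_report(
    run_id="lab42_run_0173",       # hypothetical identifiers throughout
    metrics={"ms1_scans": 4812, "ms2_scans": 28344, "ms2_per_ms1": 5.9},
    instrument="instrument-01",
    out_path="lab42_run_0173.qc.json",
)
```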

I believe that one of the main issues is the lack of confidence that proteomics researchers have in their own capabilities. Despite the headline success stories that proteomics analyses generate, very few labs actually accumulate a continuous and comprehensive set of metrics that show their overall analytical performance over time. That is not to say that no control is performed; rather, such procedures are in the eye of the beholder, and are usually based on ad hoc information gleaned from the overall results, or on incidental data from instrument reporting. The way I see to break out of this rather unproductive predicament is to move forward in small, incremental steps. First of all, the key concept upon which to base all other steps is informed self-confidence. For this, there is a need for freely available, open source, automated, and easy-to-use software to extract quality control metrics from MS datasets. These tools (there can be, and probably should be, more than one, as is the case for protein structure validators) then need to be put in place in proteomics labs, but initially only for internal use. As noted above, several such tools already exist, although they can likely be improved further. When researchers are confronted with standardized, easily inspected and analyzed reports on their own performance, corrective measures can readily be taken where required, and overall self-confidence will grow as the metrics show good performance. Notice how easy a sell this should be: free software, easy to use, that provides information to make you more confident when things go right, and that helps you take immediate corrective action if things go wrong. An ideal partner to such quality control (QC) introspection will be the availability of standardized and proven (through peer replication and review) key protocol steps. Interesting efforts in this direction have also shown up in the literature, with a recent example by the Sickmann group where trypsin digestion is scrupulously evaluated, even going to the effort of comparing the performance of trypsin batches obtained from different vendors [20].

The next step up the ladder, then, introduces the concept of transparency, where the in-house quality control becomes public, through association with publicly shared data in repositories such as PRIDE [21]. Obviously, there will be more hesitation about this step, even if researchers feel quite confident of their own data. Will the community accept the achieved levels of quality, or will the data look bad in comparison to others? I firmly believe that this sort of transparency in data quality reports will actually show that proteomics is a technique that works well in the hands of many, rather than showing that it is an arcane art best left to only a select few specialists. Note that this does leave ample room for specialization in terms of the complexity of the analytical task that can be handled, yet robust, analytical proteomics will never be cutting-edge proteomics. It will rather be last year's proteomics, but done carefully and consistently. The positive news here is that the required bioinformatics infrastructure to support such transparency is already very much in place, and that little additional investment is needed to produce a working system where mandatory deposition of QC data along with experimental data can be safely instituted by journals in the field.
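The kind of longitudinal self-monitoring advocated above can be illustrated with a short sketch: a lab tracks a QC metric across runs and flags a new run that falls outside control limits derived from its own history. The 3-sigma Levey-Jennings-style rule and the numbers below are illustrative assumptions, not a prescribed standard.

```python
# Minimal sketch of longitudinal QC self-monitoring: flag a new metric value
# that drifts beyond control limits derived from a lab's own historical
# baseline. The 3-sigma rule is an illustrative choice.
from statistics import mean, stdev


def check_metric(history, new_value, n_sigma=3.0):
    """Compare a new metric value against control limits from past runs."""
    if len(history) < 10:
        return "baseline"  # too few runs to derive meaningful limits
    center, spread = mean(history), stdev(history)
    if abs(new_value - center) > n_sigma * spread:
        return "out of control"  # corrective action warranted
    return "in control"


# Hypothetical median MS1 TIC values from the last ten runs of an instrument:
tic_history = [3.1e9, 2.9e9, 3.3e9, 3.0e9, 3.2e9,
               2.8e9, 3.1e9, 3.0e9, 2.9e9, 3.2e9]
print(check_metric(tic_history, new_value=1.2e9))  # prints "out of control"
```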

Once a system is in place that records and shares standard QC reports from labs all over the world, it will be time to advance another step. A variety of labs should then be enlisted to perform across-laboratory comparisons on one or more reference samples. Here too, trailblazing efforts have already been carried out, including the famous ABRF Proteomics Research Group studies that brought us the UPS (universal proteomics standard) samples, the HUPO (Human Proteome Organisation) test samples study (rather unfortunately and incorrectly referred to colloquially by some as the HUPO “irreproducibility study”) [22], the NCI CPTAC yeast samples [11], and even standard samples of phosphorylated peptides [23]. By working toward a consistent, high-quality analysis of such a standard (or set of standards), the field would effectively be moving toward support for accreditation testing, where a prescribed standard of analytical prowess becomes formalized and routinely tested. Such a level of analytical rigor would break the last conceptual barriers toward running proteomics analyses routinely in the clinic.

At the same time, it needs to be pointed out that this last step constitutes a large amount of work, and that it is inherently difficult to reconcile with continuous technology development. Indeed, it is extremely hard to standardize and consolidate to such high levels when the underlying instrumentation and methodologies remain in constant flux. It thus becomes important to recognize whether the development of the field is best characterized by a hyperbola, indicative of continuous asymptotic growth, or whether it manifests itself as a series of sigmoids, where periods of intense and rapid development alternate with relatively stable periods of only incremental advances. Perhaps needless to say, the latter scenario is much more conducive to consolidation efforts, while being less appealing to instrument vendor marketing departments. It is also understood that such a level of consolidation is decidedly less appealing than continuous development, and that it is correspondingly harder to obtain funding for these efforts. Yet, despite this hurdle, sufficient examples already exist where large-scale funders with important stakes in delivering on this technology have provided clear vision with respect to funding efforts to push quality control in the field of proteomics. Notable examples include the EU-funded standardization and QC efforts in the FP6 ProDaC grant and the FP7 ProteomeXchange and PRIME-XS grants, the above-mentioned NCI CPTAC program, and the Wellcome Trust's support for quality control at the PRIDE database [24].
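As an illustration of what such formalized, routine testing could look like, the sketch below scores each participating lab's measurement of a reference sample as a z-score against a robust consensus value, in the style of classical proficiency testing; the scoring thresholds, lab names, and values are hypothetical.

```python
# Minimal sketch of a proficiency-testing style evaluation for
# across-laboratory comparisons: score each site's measurement of a
# reference sample against the robust consensus of all sites.
# Thresholds, lab names, and values are hypothetical illustrations.
from statistics import median


def score_labs(measurements, warn=2.0, fail=3.0):
    """Assign proficiency z-scores to per-lab measurements of one analyte."""
    values = list(measurements.values())
    consensus = median(values)
    # Robust spread estimate: scaled median absolute deviation.
    mad = median(abs(v - consensus) for v in values) * 1.4826
    results = {}
    for lab, value in measurements.items():
        z = (value - consensus) / mad if mad else 0.0
        status = ("pass" if abs(z) <= warn
                  else "warning" if abs(z) <= fail
                  else "fail")
        results[lab] = (round(z, 2), status)
    return results


# Hypothetical abundances reported by five sites for one reference protein:
print(score_labs({"lab_A": 101.0, "lab_B": 98.5, "lab_C": 103.2,
                  "lab_D": 99.8, "lab_E": 131.0}))  # lab_E fails
```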

As a field, I therefore believe we should now take a moment to reflect upon our future, and on the role and relative importance of quality control in our growth strategy. Allowing proteomics to mature to the point that it delivers in the clinic is not a pipe dream, but it is a lot of work. And while this work may not seem cutting edge, and does not hold the appeal of the next big leap forward, it will be essential for the long-term viability of the field. The basic premise, after all, is simply that the field takes up its responsibility as a unique and powerful analytical tool in the life sciences, and vows to continue to do so well into the future.

    Acknowledgments

    The author acknowledges the support of Ghent University (Multidisciplinary Research Partnership “Bioinformatics: from nucleotides to networks”), and the PRIME-XS and ProteomeXchange projects funded by the European Union 7th Framework Program under grant agreement numbers 262067 and 260558, respectively.

    The author has declared no conflict of interest.
