Volume 27, Issue 1 pp. 195-201

Tools for Protein Science


The SubCons webserver: A user friendly web interface for state-of-the-art subcellular localization prediction

M. Salvatore

Science for Life Laboratory, Stockholm University, 171 21 Solna, Sweden

Department of Biochemistry and Biophysics, Stockholm University, 106 91 Stockholm, Sweden

N. Shu

Science for Life Laboratory, Stockholm University, 171 21 Solna, Sweden

Department of Biochemistry and Biophysics, Stockholm University, 106 91 Stockholm, Sweden

Sweden Bioinformatics Infrastructure for Life Sciences (BILS), Stockholm University, Stockholm, Sweden

A. Elofsson (Corresponding Author)

Science for Life Laboratory, Stockholm University, 171 21 Solna, Sweden

Department of Biochemistry and Biophysics, Stockholm University, 106 91 Stockholm, Sweden

Correspondence to: A. Elofsson; E-mail: [email protected]


First published: 13 September 2017

https://doi.org/10.1002/pro.3297


Abstract

SubCons is a recently developed method that predicts the subcellular localization of a protein. It combines predictions from four predictors using a Random Forest classifier. Here, we present the user-friendly web interface implementation of SubCons. Starting from a protein sequence, the server rapidly predicts the subcellular localization of an individual protein. In addition, the server accepts sets of proteins, either uploaded as files or submitted programmatically with command-line WSDL API scripts. This makes SubCons well suited for proteome-wide analyses, allowing the user to scan a whole proteome in a few days. Precalculated predictions for several eukaryotic organisms can also be downloaded from the web page. To evaluate the performance of SubCons, we present a benchmark of LocTree3 and SubCons using two recent mass-spectrometry-based datasets of mouse and Drosophila proteins. The server is available at http://subcons.bioinfo.se/

Abbreviations

CYT: cytoplasm
ERE: endoplasmic reticulum
EXC: extracellular
GLG: Golgi apparatus
LYS: lysosome
MEM: plasma membrane
MIT: mitochondria
MSA: multiple sequence alignment
NUC: nucleus
PEX: peroxisome
PSSM: position-specific scoring matrix

Introduction

In eukaryotic cells, proteins are located in different subcellular compartments. The localization and function of a protein are closely related. Therefore, correct localization is crucial, and aberrant subcellular localization can lead to diseases such as cancer1 and Alzheimer's disease.2

Understanding how a cell works requires a complete map of its subcellular proteome. For many years, imaging3 and purification-based methods have been the most widely used experimental approaches.4, 5 Unfortunately, these methods are not always accurate, and they are expensive and time-consuming.

In contrast, computational methods are cheaper and faster and are therefore an important complement to experimental methods. Many localization predictors have been developed and improved since the first signal peptide predictor was introduced more than 30 years ago.6 Today, there are prediction methods for specific localizations,7, 8 for a few localizations,9 and for a wide range of localizations.10-15

Most computational tools combine biological understanding of how subcellular localization works with a machine-learning algorithm, in some cases deep learning.16 They can roughly be divided into sequence-based and annotation-based methods. Sequence-based methods predict the localization directly from the sequence, using co- and post-translational targeting signals, detected linear motifs, amino acid distributions, and gapped-pair, surface, or pseudo amino acid compositions. In contrast, annotation-based methods use database annotations, including the localization of homologous proteins, annotated Gene Ontology terms, functional domains, text from PubMed abstracts, and protein-protein interactions.17 The most successful methods combine both approaches.

Most of these tools were developed using subcellular annotations from UniProt.18 This makes performance evaluation difficult, because training and test sets often overlap.19 In contrast to most other methods, SubCons was developed using a "golden dataset" consisting only of proteins whose localization is supported by two pieces of confirmatory experimental evidence, either in UniProt or in recent large-scale experimental studies of human cells.3-5

Some of the best tools are not easily accessible to the general scientist, either because of their computational requirements or because of licensing. Therefore, for most users, web-based tools are the best way to access subcellular localization predictors. However, most web-based tools do not scale, i.e., they cannot be used for proteome-wide predictions. Here, we present the SubCons web server, a web interface to the SubCons algorithm, which predicts nine subcellular localizations by combining four predictors with a Random Forest classifier.19 SubCons provides rapid and accurate predictions by using a fast approach to generating PSSMs, introduced earlier in TOPCONS,20 and also gives users the option to submit entire proteomes.

In addition to presenting the web server, we evaluate the performance of SubCons on two recent mass-spectrometry datasets of mouse and Drosophila proteins21, 22 and compare it with that of the state-of-the-art method LocTree3.23

Results and Discussion

Instructions for SubCons

The SubCons web server has a user-friendly interface (Figure 1). The input can be either a single amino acid sequence or a FASTA-format file with multiple sequences, which the server then processes in due course. It is possible to submit up to 10,000 proteins to the server. To facilitate proteome-wide assignments, we have also developed a standard WSDL interface for programmatic access to the server.
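Because the server accepts at most 10,000 proteins per submission, a larger proteome has to be split before upload. The sketch below shows one way to do this in Python; `read_fasta` and `batches` are hypothetical helper names, not part of SubCons, and the upload itself (web form or WSDL call) is not shown.

```python
from typing import Iterator, List, Optional, Tuple

MAX_PER_SUBMISSION = 10_000  # per-submission limit stated for the SubCons server


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs.

    Lines before the first '>' header are ignored.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batches(records: List[Tuple[str, str]],
            size: int = MAX_PER_SUBMISSION) -> Iterator[str]:
    """Yield FASTA-formatted strings holding at most `size` records each."""
    for start in range(0, len(records), size):
        chunk = records[start:start + size]
        yield "".join(f">{header}\n{seq}\n" for header, seq in chunk)


if __name__ == "__main__":
    proteome = [(f"prot{i}", "MKTAYIAKQR") for i in range(25_001)]
    print([b.count(">") for b in batches(proteome)])  # [10000, 10000, 5001]
```

Each yielded string is a valid FASTA file that could be saved and submitted as one job.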

**Figure 1**

The SubCons web page. The web interface can be used to run single or multiple FASTA sequences. Users can either paste the sequence(s) or upload a file containing them. By providing an email address, the user receives a notification when the job is finished. The download section provides precalculated localization predictions for organisms including *Danio rerio, Caenorhabditis elegans, Gallus domesticus, Drosophila melanogaster, Mus musculus, and Homo sapiens*.

Today, many bioinformatics tools use profiles (PSSMs). Profiles are often obtained by searching large sequence databases, such as UniProt,18 with PSI-BLAST.24 However, the rapid growth of these databases slows down the search, which can take several minutes. This is a bottleneck for a web server, where users generally expect a quick response. To improve the response time, we use PRODRES,20 a method that often speeds up a typical PSI-BLAST search by an order of magnitude (for details, see Materials and Methods). Currently, the average time to run a single protein on the web server is a few minutes, depending on the server load. Because the server caches predictions, results appear immediately if the protein has been predicted before.
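The caching behavior can be illustrated with a minimal in-memory sketch, assuming predictions are keyed by a hash of the normalized sequence. `PredictionCache` and `sequence_key` are hypothetical names; how the SubCons server actually stores its cache is not described here.

```python
import hashlib
from typing import Callable, Dict

Scores = Dict[str, float]


def sequence_key(seq: str) -> str:
    """Cache key: SHA-256 of the sequence with whitespace removed and upper-cased."""
    normalized = "".join(seq.split()).upper()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class PredictionCache:
    """Hypothetical in-memory cache in front of a slow prediction pipeline."""

    def __init__(self, predict: Callable[[str], Scores]) -> None:
        self._predict = predict
        self._store: Dict[str, Scores] = {}
        self.misses = 0  # number of times the full pipeline actually ran

    def get(self, seq: str) -> Scores:
        key = sequence_key(seq)
        if key not in self._store:
            self.misses += 1  # only unseen sequences trigger the slow pipeline
            self._store[key] = self._predict(seq)
        return self._store[key]


if __name__ == "__main__":
    cache = PredictionCache(lambda s: {"CYT": 0.6, "NUC": 0.4})  # dummy predictor
    cache.get("mktay iakqr")
    cache.get("MKTAYIAKQR")  # same protein after normalization: served from the cache
    print(cache.misses)  # 1
```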

SubCons output

Users can analyze the results graphically on screen (Figures 2, 3, and 4) and/or download the predictions in plain-text format. If an email address is provided, the results are also sent by email. The results show the prediction score for each of the nine subcellular localizations (CYT, ERE, EXC, GLG, LYS, MEM, MIT, NUC, PEX), both for SubCons (SubCons-RF-Score) and for the individual tools (CELLO2.5, MultiLoc2, and SherLoc2). Note that LocTree2 provides only a single localization, so only the score for its predicted localization is shown.

The localization probabilities predicted by SubCons are computed as the mean of the class probabilities predicted by the individual trees in the forest. However, these class scores are not directly related to the reliability of the prediction. Therefore, the SubCons output also shows a reliability score for the predicted class (SubCons reliability), calculated from the ROC curve describing the relationship between score and prediction accuracy for each subcellular localization (Figure 5).19
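The averaging step, and one plausible way to turn a raw score into a reliability value, can be sketched in plain Python. The averaging follows the description above (it is also what scikit-learn's `RandomForestClassifier.predict_proba` does). The reliability lookup is an assumption for illustration only: for a given score it reports the fraction of correct held-out predictions among those of the same class with at least that score. It is a stand-in for, not a reproduction of, the ROC-based calibration used by SubCons.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def forest_proba(tree_probas: Sequence[Sequence[float]]) -> List[float]:
    """Mean of per-tree class-probability vectors (one row per tree)."""
    n_trees = len(tree_probas)
    return [sum(column) / n_trees for column in zip(*tree_probas)]


def reliability_curve(held_out: Sequence[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Build a (score, reliability) table for ONE localization class.

    `held_out` holds (score, was_correct) pairs for held-out proteins predicted
    in that class. Reliability at score s = fraction correct among predictions
    with score >= s (an assumed calibration, not the SubCons formula).
    """
    ranked = sorted(held_out, key=lambda pair: pair[0], reverse=True)
    curve: List[Tuple[float, float]] = []
    correct = 0
    for n_seen, (score, was_correct) in enumerate(ranked, start=1):
        correct += was_correct
        curve.append((score, correct / n_seen))
    curve.reverse()  # ascending by score so that bisect can be used for lookups
    return curve


def reliability(score: float, curve: Sequence[Tuple[float, float]]) -> float:
    """Look up the reliability for a new prediction's score."""
    scores = [s for s, _ in curve]
    i = bisect_left(scores, score)  # first held-out score >= `score`
    if i == len(curve):
        i -= 1  # above every held-out score: reuse the top entry
    return curve[i][1]


if __name__ == "__main__":
    print(forest_proba([[0.2, 0.8], [0.4, 0.6]]))  # approximately [0.3, 0.7]
    curve = reliability_curve([(0.9, True), (0.8, True), (0.6, False), (0.4, True)])
    print(reliability(0.7, curve))  # 1.0
```

With tied scores, `bisect_left` lands on the entry that already counts every tied prediction, so ties are handled consistently.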

SubCons database

For convenience, we also provide precalculated localization predictions for several organisms, including *Danio rerio*, *Caenorhabditis elegans*, *Gallus domesticus*, *Drosophila melanogaster*, *Mus musculus*, and *Homo sapiens*.

Benchmark datasets

SubCons was originally trained using only human proteins. Here, we test its performance on two independent mass-spectrometry datasets,21, 22 one of mouse and one of Drosophila proteins, in which no protein shares more than 20% sequence identity with any other protein in the same set or with any protein in the SubCons training set. We compare SubCons with LocTree3, another state-of-the-art subcellular localization predictor, which largely transfers its predictions directly from UniProt. For this reason, the predictors cannot be compared fairly on UniProt data directly.

Mouse dataset

Table 1 shows that the mouse dataset is clearly biased towards mitochondrial proteins, which make up more than 40% of the proteins. This bias does not seem to severely affect either predictor, as both estimate the fraction of mitochondrial proteins accurately (39% for LocTree3 and 43% for SubCons, compared with 44% experimentally).

Table 1. Fraction of Predicted Localizations Based on the Mouse Dataset

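| Localization | Mass spectrometry | LocTree3 | SubCons |
|---|---|---|---|
| NUC | 17% | 22% | 13% |
| CYT | 14% | 19% | 25% |
| MIT | 44% | 39% | 43% |
| PEX | 2% | 1% | 2% |
| ERE | 5% | 6% | 4% |
| GLG | 0% | 2% | 0% |
| LYS | 5% | 0% | 4% |
| MEM | 10% | 5% | 8% |
| EXC | 1% | 6% | 2% |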

Here, we show the fraction of proteins in the mouse dataset assigned to each localization experimentally (mass spectrometry) and by LocTree3 and SubCons.

However, the predicted distributions differ somewhat from the experimental one. Both prediction methods overpredict cytoplasmic and extracellular proteins and underpredict membrane proteins (Table 1).

As in the original SubCons study,19 we evaluate the results for each single localization using the F₁ score25 (F₁) and the Matthews correlation coefficient (MCC).26 To evaluate the performance over all subcellular localizations, we use the generalized squared correlation (GC²)27 as well as the F₁ score, which in the multiclass case is defined as the weighted average of the per-class F₁ scores.19
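These measures can be computed from a confusion matrix, as in the sketch below (rows are true classes, columns are predicted classes). GC² follows Baldi et al.27 as χ² / (N(K − 1)), where χ² = Σᵢⱼ (zᵢⱼ − eᵢⱼ)² / eᵢⱼ and the expected counts eᵢⱼ = xᵢyⱼ / N come from the row and column totals. Weighting the multiclass F₁ by class size (the number of true members of each class) is our assumption about what "weighted average" means; the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[int]]  # confusion matrix: rows = true class, columns = predicted class


def _counts(cm: Matrix, k: int) -> Tuple[int, int, int, int]:
    """One-vs-rest TP, FP, FN, TN for class index k."""
    n = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = n - tp - fn - fp
    return tp, fp, fn, tn


def f1(cm: Matrix, k: int) -> float:
    tp, fp, fn, _ = _counts(cm, k)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # 0 when the class never occurs


def mcc(cm: Matrix, k: int) -> float:
    tp, fp, fn, tn = _counts(cm, k)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def weighted_f1(cm: Matrix) -> float:
    """Multiclass F1: per-class F1 weighted by the number of true members (assumed weighting)."""
    support = [sum(row) for row in cm]
    return sum(s * f1(cm, k) for k, s in enumerate(support)) / sum(support)


def gc2(cm: Matrix) -> float:
    """Generalized squared correlation (Baldi et al.): chi^2 / (N * (K - 1))."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(row[j] for row in cm) for j in range(k)]
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:  # an empty row or column contributes nothing
                chi2 += (cm[i][j] - expected) ** 2 / expected
    return chi2 / (n * (k - 1))


if __name__ == "__main__":
    cm = [[3, 1],
          [2, 4]]
    print(round(f1(cm, 0), 4))        # 0.6667
    print(round(mcc(cm, 0), 4))       # 0.4082
    print(round(weighted_f1(cm), 4))  # 0.703
    print(round(gc2(cm), 4))          # 0.1667
```

For two classes GC² reduces to MCC², which gives a quick sanity check (0.4082² ≈ 0.1667), and a perfect confusion matrix gives GC² = 1.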

Table 3 shows the performance of SubCons and LocTree3 using two different measures. In general, the two methods perform on par, with similar overall F₁ scores of around 0.8 (0.78 for LocTree3 and 0.80 for SubCons). However, with GC², which is less dependent on an uneven class distribution, SubCons appears slightly better (0.47 vs. 0.43). This difference is mainly due to the inability of LocTree3 to predict lysosomal proteins. We therefore conclude that the overall performance of the two predictors is similar, but that it varies between subcellular compartments.

LocTree3 has a higher F₁ score than SubCons for nuclear, cytosolic, and endoplasmic reticulum proteins, while SubCons performs better for extracellular proteins and marginally better for membrane proteins (Table 3). This suggests a trade-off between performance in different compartments. Both methods perform equally well for mitochondrial proteins. SubCons predicts lysosomal proteins with an F₁ score of 0.82, whereas LocTree3 cannot predict this class at all. Both predictors perform well for mitochondrial, endoplasmic reticulum, and nuclear proteins. Note also that no Golgi proteins are present in the mouse dataset.

Benchmark on a Drosophila dataset

Table 2 shows that the Drosophila dataset is clearly biased towards membrane proteins, with 49% of all entries in this class, compared with an estimated 20–30% in a typical genome.28 Both SubCons and LocTree3 overpredict cytoplasmic proteins and underpredict membrane proteins (Table 2).

Table 2. Fraction of Predicted Localizations Based on the Drosophila Dataset

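| Localization | Mass spectrometry | LocTree3 | SubCons |
|---|---|---|---|
| NUC | 6% | 12% | 9% |
| CYT | 17% | 42% | 42% |
| MIT | 16% | 14% | 19% |
| PEX | 1% | 1% | 2% |
| ERE | 7% | 9% | 6% |
| GLG | 3% | 5% | 2% |
| LYS | 1% | 0% | 1% |
| MEM | 49% | 8% | 17% |
| EXC | 0% | 9% | 2% |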

Here, we show the fraction of proteins in the Drosophila dataset assigned to each localization experimentally (mass spectrometry) and by LocTree3 and SubCons.

Table 3. Performance of the Predictors in the Mouse Dataset

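| Localization | # | LocTree3 F₁ | LocTree3 MCC | SubCons F₁ | SubCons MCC |
|---|---|---|---|---|---|
| NUC | 109 | 0.87 | 0.85 | 0.75 | 0.72 |
| CYT | 90 | 0.72 | 0.67 | 0.62 | 0.56 |
| MIT | 277 | 0.92 | 0.87 | 0.93 | 0.88 |
| PEX | 13 | 0.73 | 0.74 | 0.75 | 0.75 |
| ERE | 34 | 0.86 | 0.86 | 0.77 | 0.78 |
| GLG | 0 | 0 | 0 | 0 | 0 |
| LYS | 33 | 0 | 0 | 0.82 | 0.83 |
| MEM | 64 | 0.54 | 0.53 | 0.55 | 0.50 |
| EXC | 9 | 0.31 | 0.37 | 0.70 | 0.71 |
| Overall | 629 | 0.78 | GC² = 0.43 | 0.80 | GC² = 0.47 |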

Performance of LocTree3 and SubCons on the mouse dataset. For each of the nine standard localizations, the table gives the F₁ score and the Matthews correlation coefficient; the Overall row gives the multiclass F₁ score and the generalized squared correlation. (# = number of proteins in the dataset for each localization, GC² = generalized squared correlation, F₁ = F₁ score, MCC = Matthews correlation coefficient.)

On the Drosophila dataset (Table 4), the overall F₁ score of SubCons (0.55) is slightly higher than that of LocTree3 (0.48), while the opposite holds for GC² (0.42 for LocTree3 vs. 0.37 for SubCons). As for the mouse dataset, the overall performance of the two predictors is similar, but it differs between subcellular compartments.

Looking at the F₁ scores for individual compartments, LocTree3 performs better than SubCons for cytosolic, mitochondrial, endoplasmic reticulum, Golgi, and peroxisomal proteins (Table 4). SubCons, on the other hand, performs better for membrane and lysosomal proteins, and both methods perform similarly for nuclear proteins. Note that the peroxisomal and lysosomal classes contain only two proteins each, and that no extracellular proteins are present in this dataset.

Table 4. Performance of the Predictors in the Drosophila Dataset

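| Localization | # | LocTree3 F₁ | LocTree3 MCC | SubCons F₁ | SubCons MCC |
|---|---|---|---|---|---|
| NUC | 11 | 0.65 | 0.67 | 0.64 | 0.63 |
| CYT | 33 | 0.50 | 0.40 | 0.39 | 0.23 |
| MIT | 30 | 0.81 | 0.77 | 0.78 | 0.73 |
| PEX | 2 | 1.00 | 1.00 | 0.67 | 0.70 |
| ERE | 13 | 0.73 | 0.72 | 0.67 | 0.65 |
| GLG | 5 | 0.71 | 0.74 | 0.50 | 0.51 |
| LYS | 2 | 0 | 0 | 0.67 | 0.71 |
| MEM | 94 | 0.29 | 0.31 | 0.50 | 0.44 |
| EXC | 0 | 0 | 0 | 0 | 0 |
| Overall | 190 | 0.48 | GC² = 0.42 | 0.55 | GC² = 0.37 |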

Performance of LocTree3 and SubCons on the Drosophila dataset. For each of the nine standard localizations, the table gives the F₁ score and the Matthews correlation coefficient; the Overall row gives the multiclass F₁ score and the generalized squared correlation. (# = number of proteins in the dataset for each localization, GC² = generalized squared correlation, F₁ = F₁ score, MCC = Matthews correlation coefficient.)

Conclusions

Here, we introduce the SubCons web server, a state-of-the-art method for subcellular localization prediction. SubCons can help users understand the localization of a protein, in particular because it scales to complete proteomes. In addition to state-of-the-art predictions, it provides a reliability score that lets the user judge how trustworthy each prediction is. We believe that SubCons will be a valuable resource for protein scientists.

We also provide a comparison of SubCons19 and LocTree3,23 using two recent mass spectrometry datasets,21, 22 which shows that their overall performance is similar but differs between subcellular compartments.

Materials and Methods

Dataset used in this study

SubCons was originally trained using only human proteins, as described earlier.19 Here, we investigate the performance of SubCons and LocTree3 on two datasets of mouse and Drosophila proteins derived from mass spectrometry studies using the hyperLOPIT method.4, 21

The initial mouse and Drosophila datasets contain 885 and 203 proteins, respectively. After homology reduction at 20% sequence identity using BLASTClust,29 629 mouse and 190 Drosophila proteins remained.
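Homology reduction of this kind can be sketched as a greedy filter: a protein is kept only if its identity to every training protein, and to every protein already kept, is at most 20%. This is a simplified stand-in, not BLASTClust's clustering algorithm, and `ungapped_identity` is a deliberately crude, hypothetical placeholder for a real alignment-based identity.

```python
from itertools import chain
from typing import Callable, Iterable, List, Sequence

Identity = Callable[[str, str], float]


def ungapped_identity(a: str, b: str) -> float:
    """Crude placeholder: fraction of identical residues at the same position,
    over the shorter sequence. A real pipeline would use alignment-based
    identity (e.g. from BLAST)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def homology_reduce(queries: Iterable[str],
                    training: Sequence[str] = (),
                    max_identity: float = 0.20,
                    identity: Identity = ungapped_identity) -> List[str]:
    """Greedily keep sequences whose identity to every training sequence and to
    every previously kept sequence is at most `max_identity`."""
    kept: List[str] = []
    for seq in queries:
        if all(identity(seq, other) <= max_identity
               for other in chain(training, kept)):
            kept.append(seq)
    return kept


if __name__ == "__main__":
    print(homology_reduce(["MKTAYIAK", "MKTAYIAR", "GGGGPPPP"]))  # ['MKTAYIAK', 'GGGGPPPP']
```

The result of a greedy filter depends on input order; a clustering tool instead groups all similar sequences first and then picks representatives.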

Both datasets were originally generated using a combination of mass spectrometry, biochemical fractionation, and iTRAQ 8-plex quantification.4 We retrieved all experimental protein localizations using the pRoloc package (www.bioconductor.org/packages).

The original datasets4, 22 provide subcellular localizations at different resolutions (see the supplementary table in Ref. 19). To make comparisons feasible, we therefore mapped all subcellular classifications into nine standard compartments (CYT, ERE, EXC, GLG, LYS, MEM, MIT, NUC, PEX). The composition of the mapped mouse dataset is shown in Table 1: mitochondrial proteins are the largest class (44%), followed by nuclear (17%), cytoplasmic (14%), and membrane (10%) proteins, while each of the other localizations accounts for 1–5%. No Golgi apparatus proteins are present. In the Drosophila dataset (Table 2), membrane proteins are the largest class (49%), followed by cytoplasmic (17%), mitochondrial (16%), endoplasmic reticulum (7%), and nuclear (6%) proteins; the remaining localizations account for 0–3% each, and no extracellular proteins are present.
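The mapping step amounts to a lookup from each dataset's fine-grained labels to the nine standard compartments, followed by a count. The label names in `FINE_TO_STANDARD` below are hypothetical examples; the actual mapping is the one given in the supplementary table of Ref. 19.

```python
from collections import Counter
from typing import Dict, Iterable

STANDARD = ("CYT", "ERE", "EXC", "GLG", "LYS", "MEM", "MIT", "NUC", "PEX")

# Hypothetical fine-grained labels; the real mapping is given in the
# supplementary table of Ref. 19 and is not reproduced here.
FINE_TO_STANDARD: Dict[str, str] = {
    "Cytosol": "CYT",
    "ER lumen": "ERE",
    "Extracellular": "EXC",
    "Golgi apparatus": "GLG",
    "Lysosome": "LYS",
    "Plasma membrane": "MEM",
    "Mitochondrion": "MIT",
    "Mitochondrial matrix": "MIT",
    "Nucleus": "NUC",
    "Chromatin": "NUC",
    "Peroxisome": "PEX",
}


def composition(fine_labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of proteins per standard compartment; unmapped labels are skipped."""
    mapped = [FINE_TO_STANDARD[x] for x in fine_labels if x in FINE_TO_STANDARD]
    counts = Counter(mapped)
    total = len(mapped)
    return {loc: (counts[loc] / total if total else 0.0) for loc in STANDARD}


if __name__ == "__main__":
    comp = composition(["Mitochondrion", "Mitochondrial matrix", "Nucleus", "Cytosol"])
    print(comp["MIT"], comp["NUC"])  # 0.5 0.25
```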

The SubCons algorithm

To increase the number of proteins of known localization and to train SubCons, we used two experimentally verified data sets19 and manually reviewed localizations from UniProt (www.uniprot.org).18

In SubCons, the scores of the predictors are combined into a vector of 36 values (4 predictors times 9 "standard" localizations).19 LocTree2 provides only a single localization, so we use its score for the predicted class and 0 for all other classes. CELLO2.5, MultiLoc2, and SherLoc2, in contrast, provide a score for each localization, which can be used directly. This vector is then used as input to a Random Forest classifier30, 31 implemented with the scikit-learn library32 (Figure 6).
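The 36-value input vector can be sketched as follows. The predictor order, and the assumption that each multi-score predictor's output has already been mapped to the nine standard classes, are ours; `feature_vector` is a hypothetical helper, not code from SubCons.

```python
from typing import List, Mapping, Tuple

CLASSES = ("CYT", "ERE", "EXC", "GLG", "LYS", "MEM", "MIT", "NUC", "PEX")
PREDICTORS = ("CELLO2.5", "LocTree2", "MultiLoc2", "SherLoc2")  # order is an assumption


def feature_vector(multi_scores: Mapping[str, Mapping[str, float]],
                   loctree2: Tuple[str, float]) -> List[float]:
    """Concatenate 9 class scores per predictor into one 36-value vector.

    multi_scores: per-class scores for CELLO2.5, MultiLoc2 and SherLoc2
                  (classes missing from a predictor's output are set to 0.0).
    loctree2:     (predicted_class, score); every other LocTree2 class gets 0.0.
    """
    pred_class, pred_score = loctree2
    if pred_class not in CLASSES:
        raise ValueError(f"unknown class {pred_class!r}")
    vec: List[float] = []
    for name in PREDICTORS:
        if name == "LocTree2":
            vec.extend(float(pred_score) if c == pred_class else 0.0 for c in CLASSES)
        else:
            per_class = multi_scores[name]
            vec.extend(float(per_class.get(c, 0.0)) for c in CLASSES)
    return vec


if __name__ == "__main__":
    v = feature_vector(
        {"CELLO2.5": {"MIT": 0.7, "CYT": 0.3}, "MultiLoc2": {"MIT": 0.6}, "SherLoc2": {"NUC": 0.5}},
        ("MIT", 0.9),
    )
    print(len(v))  # 36
```

Rows built this way would form the feature matrix `X` that, together with the known localizations `y`, is passed to scikit-learn's `RandomForestClassifier.fit(X, y)`; its `predict_proba` then returns the mean of the per-tree class probabilities.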

To generate the PSSM required by LocTree2,11 we use PRODRES, a tool developed in our lab. PRODRES first scans the query sequence against the Pfam database and then uses the full-length sequences of the hits to create a query-specific database, which is subsequently searched for homologous proteins.20 Importantly, if no hits are found, PRODRES uses PSI-BLAST24 to generate the PSSM.20
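The PRODRES strategy can be summarized as control flow. In this sketch the Pfam scan, the family-member lookup, and the PSI-BLAST run are passed in as hypothetical callables, so no real tool interface is implied; reading "full-length sequences" as the full-length members of the matched Pfam families is our interpretation.

```python
from typing import Any, Callable, Iterable, List, Sequence

# All three tools are injected as callables; their names and signatures are hypothetical.
ScanPfam = Callable[[str], List[str]]              # query -> matched Pfam family IDs
FamilyMembers = Callable[[str], Iterable[str]]     # family ID -> full-length sequences
RunPsiBlast = Callable[[str, Sequence[str]], Any]  # (query, database) -> PSSM


def build_pssm(query: str, full_db: Sequence[str], scan_pfam: ScanPfam,
               family_members: FamilyMembers, run_psiblast: RunPsiBlast) -> Any:
    families = scan_pfam(query)
    if not families:
        # No Pfam hit: fall back to a PSI-BLAST search of the full database.
        return run_psiblast(query, full_db)
    # Build a small query-specific database from the matched families,
    # dropping sequences shared by several families.
    reduced: List[str] = []
    seen = set()
    for family in families:
        for seq in family_members(family):
            if seq not in seen:
                seen.add(seq)
                reduced.append(seq)
    return run_psiblast(query, reduced)
```

The speed-up comes from the second branch: PSI-BLAST runs against a database that is usually far smaller than the full one.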

LocTree3

LocTree3 builds on methods that rely on sequence homology and on the assumption that a protein tends to remain in the same compartment during evolution. It is not trivial to determine how similar two proteins must be for a subcellular localization to be inferred from one to the other. Using sequence alignment programs such as BLAST, the subcellular localization annotation of the best hit can be transferred to the query; in other words, the localization is inferred from the annotations of homologs, which do not necessarily have an experimentally determined localization.17

LocTree3 combines homology search information, when available, with all the features used in LocTree2.11

The annotated localizations are transferred by homology using PSI-BLAST.24 For all proteins with experimentally known localization, a PSI-BLAST profile24 is generated, using an 80% nonredundant database combining UniProt18 and PDB.33 These profiles are then aligned against all proteins with experimental annotation of a single localization. PSI-BLAST24 hits to the input protein are excluded.23
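The homology-transfer logic described above, with a fall-back to de novo prediction when no usable hit exists, can be sketched as below. The `Hit` record, the E-value cutoff `max_evalue`, and the `de_novo` callable (standing in for the LocTree2-style feature-based predictor) are hypothetical; excluding hits whose subject is the query itself is our reading of how hits to the input protein are removed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Hit:
    subject_id: str  # database protein found by the PSI-BLAST search
    evalue: float


def transfer_localization(query_id: str,
                          hits: Iterable[Hit],
                          annotations: Dict[str, str],
                          de_novo: Callable[[str], str],
                          max_evalue: float = 1e-3) -> Tuple[str, str]:
    """Return (localization, source), where source is "homology" or "de novo"."""
    usable = [h for h in hits
              if h.subject_id != query_id          # drop hits to the input protein itself
              and h.subject_id in annotations      # only experimentally annotated proteins
              and h.evalue <= max_evalue]          # hypothetical significance cutoff
    if usable:
        best = min(usable, key=lambda h: h.evalue)
        return annotations[best.subject_id], "homology"
    return de_novo(query_id), "de novo"
```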

Acknowledgments

This work has been supported by the Sven och Lilly Lawski's fond för naturvetenskaplig forskning, the Swedish Research Council (VR-NT 2012–5046) and the Swedish E-science Research Center.

Conflict of interests

No conflict of interests is declared.

REFERENCES

1 Laurila K, Vihinen M (2009) Prediction of disease-related mutations affecting protein localization. BMC Genomics 10: 122.
2 Park S, Yang JS, Shin YE, Park J, Jang SK, Kim S (2011) Protein localization as a principal feature of the etiology and comorbidity of genetic diseases. Mol Syst Biol 7: 494. doi:10.1038/msb.2011.29
3 Uhlen M, Oksvold PP, Fagerberg L, Lundberg E, Jonasson K, Forsberg M, Zwahlen M, Kampf C, Wester K, Hober S, Wernerus H, Björling L, Ponten F (2010) Towards a knowledge-based human protein atlas. Nat Biotechnol 28: 1248–1250. doi:10.1038/nbt1210-1248
4 Christoforou A, Arias AM, Lilley KS (2014) Determining protein subcellular localization in mammalian cell culture with biochemical fractionation and iTRAQ 8-plex quantification. In: Shotgun Proteomics: Methods and Protocols. Methods Mol Biol 1156: 157–174. doi:10.1007/978-1-4939-0685-7_10
5 Breckels LM, Gatto L, Christoforou A, Groen AJ, Lilley KS, Trotter MWB (2013) The effect of organelle discovery upon sub-cellular protein localisation. J Proteomics 88: 129–140. doi:10.1016/j.jprot.2013.02.019
6 von Heijne G (1986) A new method for predicting signal sequence cleavage sites. Nucleic Acids Res 14: 4683–4690. doi:10.1093/nar/14.11.4683
7 Cokol M, Nair R, Rost B (2000) Finding nuclear localization signals. EMBO Rep 1: 411–415. doi:10.1093/embo-reports/kvd092
8 Savojardo C, Martelli PL, Fariselli P, Casadio R (2014) TPpred2: improving the prediction of mitochondrial targeting peptide cleavage sites by exploiting sequence motifs. Bioinformatics 30: 2973–2974. doi:10.1093/bioinformatics/btu411
9 Emanuelsson O, Nielsen H, Brunak S, von Heijne G (2000) Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 300: 1005–1016.
10 Yu CS, Chen YC, Lu CH, Hwang JK (2006) Prediction of protein subcellular localization. Proteins Struct Funct Bioinform 64: 643–651. doi:10.1002/prot.21018
11 Goldberg T, Hamp T, Rost B (2012) LocTree2 predicts localization for all domains of life. Bioinformatics 28: i458–i465. doi:10.1093/bioinformatics/bts390
12 Blum T, Briesemeister S, Kohlbacher O (2009) MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction. BMC Bioinform 10: 274. doi:10.1186/1471-2105-10-274
13 Briesemeister S, Blum T, Brady S, Lam Y, Kohlbacher O, Shatkay H (2009) SherLoc2: a high-accuracy hybrid method for predicting subcellular localization of proteins. J Proteome Res 8: 5363–5366. doi:10.1021/pr900665y
14 Horton P, Park K, Obayashi T, Fujita N, Harada H, Adams-Collier CJ, Nakai K (2007) WoLF PSORT: protein localization predictor. Nucleic Acids Res 35: W585–W587. doi:10.1093/nar/gkm259
15 Briesemeister S, Rahnenführer J, Kohlbacher O (2010) YLoc: an interpretable web server for predicting subcellular localization. Nucleic Acids Res 38: W497–W502. doi:10.1093/nar/gkq477
16 Almagro Armenteros JJ, Sønderby CK, Sønderby SK, Nielsen H, Winther O (2017, in press) DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics.
17 Nielsen H (2015) Predicting subcellular localization of proteins by bioinformatic algorithms. Volume 10 of Current Topics in Microbiology and Immunology. Berlin, Heidelberg: Springer.
18 UniProt Consortium (2015) UniProt: a hub for protein information. Nucleic Acids Res 43: D204–D212.
19 Salvatore M, Warholm P, Shu N, Basile W, Elofsson A (2017) SubCons: a new ensemble method for improved human subcellular localization predictions. Bioinformatics 33: 2464–2470. doi:10.1093/bioinformatics/btx219
20 Tsirigos KD, Peters C, Shu N, Käll L, Elofsson A (2015) The TOPCONS web server for consensus prediction of membrane protein topology and signal peptides. Nucleic Acids Res 43: W401–W407. doi:10.1093/nar/gkv485
21 Christoforou A, Mulvey CM, Breckels LM, Geladaki A, Hurrell T, Hayward PC, Naake T, Gatto L, Viner R, Arias AM, Lilley KS (2016) A draft map of the mouse pluripotent stem cell spatial proteome. Nat Commun 7: 8992. doi:10.1038/ncomms9992
22 Tan DJL, Dvinge H, Christoforou A, Bertone P, Arias AM, Lilley KS (2009) Mapping organelle proteins and protein complexes in Drosophila melanogaster. J Proteome Res 8: 2667–2678. doi:10.1021/pr800866n
23 Goldberg T, Hecht M, Hamp T, Karl T, Yachdav G, Ahmed N, Altermann U, Angerer P, Ansorge S, Balasz K, Bernhofer M, Betz A, Cizmadija L, Do KT, Gerke J, Greil R, Joerdens V, Hastreiter MM, Hembach K, Herzog M, Kalemanov M, Kluge M, Meier A, Nasir H, Neumaier U, Prade V, Rebel J, Sorokoumov A, Troshani I, Vorberg S, Waldraff S, Zierer J, Nielsen H, Rost B (2014) LocTree3 prediction of localization. Nucleic Acids Res 42: W350–W355. doi:10.1093/nar/gku396
24 Altschul S, Madden T, Schäffer A, Zhang J, Zhang Z, Miller W, Lipman D (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402. doi:10.1093/nar/25.17.3389
25 Van Rijsbergen CJ (1979) Information retrieval, 2nd ed. Newton, MA, USA: Butterworth-Heinemann.
26 Matthews BW (1975) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 405: 442–451. doi:10.1016/0005-2795(75)90109-9
27 Baldi P, Brunak S, Chauvin Y, Andersen C, Nielsen H (2000) Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 16: 412–424. doi:10.1093/bioinformatics/16.5.412
28 Elofsson A, von Heijne G (2007) Membrane protein structure: prediction versus reality. Annu Rev Biochem 76: 125–140. doi:10.1146/annurev.biochem.76.052705.163539
29 Alva V, Nam SZ, Söding J, Lupas AN (2016) The MPI Bioinformatics Toolkit as an integrative platform for advanced protein sequence and structure analysis. Nucleic Acids Res 44: W410–W415. doi:10.1093/nar/gkw348
30 Breiman L (2001) Random forests. Mach Learn 45: 5–32. doi:10.1023/A:1010933404324
31 Kingsford C, Salzberg S (2008) What are decision trees? Nat Biotechnol 26: 1011–1013. doi:10.1038/nbt0908-1011
32 Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12: 2825–2830.
33 Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28: 235–242. doi:10.1093/nar/28.1.235

Special Issue on Tools for Protein Science

January 2018