We have developed an algorithm, ParSe, which accurately identifies from the primary sequence those protein regions likely to exhibit physiological phase separation behavior. Originally, ParSe was designed to test the hypothesis that, for flexible proteins, phase separation potential is correlated to hydrodynamic size. While our results were consistent with that idea, we also found that many different descriptors could successfully differentiate between three classes of protein regions: folded, intrinsically disordered, and phase-separating intrinsically disordered. Consequently, numerous combinations of amino acid property scales can be used to make robust predictions of protein phase separation. Built from that finding, ParSe 2.0 uses an optimal set of property scales to predict domain-level organization and compute a sequence-based prediction of phase separation potential. The algorithm is fast enough to scan the whole of the human proteome in minutes on a single computer and is equally or more accurate than other published predictors in identifying proteins and regions within proteins that drive phase separation. Here, we describe a web application for ParSe 2.0 that may be accessed through a browser by visiting https://stevewhitten.github.io/Parse_v2_FASTA to quickly identify phase-separating proteins within large sequence sets, or by visiting https://stevewhitten.github.io/Parse_v2_web to evaluate individual protein sequences.

1 INTRODUCTION

Protein-mediated macromolecular phase separation, through which membrane-free coacervates form spontaneously from the cellular milieu (Brady et al., 2017; Brangwynne et al., 2009; Lafontaine et al., 2021; Molliex et al., 2015), is increasingly recognized as an important organizing phenomenon in cells (Chong & Forman-Kay, 2016; Gomes & Shorter, 2019; Mitrea & Kriwacki, 2016). By forming specific compartments and micro-environments, protein-mediated macromolecular phase separation, or, more generally, protein phase separation, exerts control over the biochemical reactivity within cells (Jacobs et al., 2021; Li et al., 2018; Zhang et al., 2021). Biological coacervates can form in response to environmental stress (Dao et al., 2018; Rabouille & Alberti, 2017), at specific points in the cell cycle (Gibson et al., 2019; Yamazaki et al., 2022), or exist constitutively (Lafontaine et al., 2021), and have been found to facilitate key cellular processes, including transcription, translation, RNA processing, DNA damage repair, signaling, and metabolism (Chong & Forman-Kay, 2016; Kang et al., 2022; Liu et al., 2021; Lu et al., 2018; Oshidari et al., 2020; Prouteau & Loewith, 2018). Moreover, dysregulation of protein phase separation has been associated with several human diseases (Aguzzi & Altmeyer, 2016; Alberti & Dormann, 2019; Tsang et al., 2020), for example neurodegeneration (Molliex et al., 2015; Prasad et al., 2019) and cancer (Bouchard et al., 2018).

While phase separation can be driven by multivalent interactions between many types of protein domains, including ordered domains (Bouchard et al., 2018; Su et al., 2016), many proteins that drive phase separation have intrinsically disordered regions (IDRs) that are necessary and sufficient for phase separation to occur (Martin et al., 2020; Mitrea & Kriwacki, 2016; Murthy et al., 2019; Uversky et al., 2015; Vernon et al., 2018). Accurate identification of IDRs that drive phase separation is important for testing the underlying mechanisms of phase separation, identifying biological processes that rely on phase separation, and designing sequences that modulate phase separation. To this end, we created the ParSe algorithm (Partition Sequence; voiced as “parse”). ParSe identifies phase-separating (PS) IDRs starting from predictions of hydrodynamic size (Paiz et al., 2021). The correlation between PS IDR potential and hydrodynamic size assumes that the same forces that drive compaction in monomeric proteins also drive protein phase separation (Dignon et al., 2018; Lin et al., 2020; Lin & Chan, 2017; Zeng et al., 2020). Our results were consistent with that idea (Paiz et al., 2021). However, we also found robust property differences between folded, ID, and PS ID protein regions (Ibrahim et al., 2023). In ParSe 2.0, an optimal set of property scales allows facile predictions of domain-level structure and provides a simple, quantitative metric for the sequence-calculated phase separation potential. Notably, the ParSe-computed PS potential can be modified to account for interactions between amino acids and trained to reproduce effects of mutations on phase separation behavior (Ibrahim et al., 2023).

A benefit to using ParSe 2.0, compared to the many other available protein phase separation predictors (Chu et al., 2022; Hardenberg et al., 2020; Klus et al., 2014; Lancaster et al., 2014; Pancsa et al., 2021; Vernon et al., 2018), is that it can be broadly applied for analyses on very large scales, even to entire proteomes. The algorithm is computationally simple and fast enough to scan tens of thousands of sequences in minutes using a single computer. Moreover, its algorithmic simplicity does not diminish its accuracy; we have found ParSe 2.0 to be as, or more, accurate than other published predictors in identifying proteins and regions within proteins that drive phase separation (Ibrahim et al., 2023). We have created a web application that enables researchers to utilize ParSe 2.0 for proteome-scale searches for sequences that drive protein phase separation. Herein, we describe the ParSe 2.0 algorithm and show how this application can quickly search large sets of sequences for proteins and the regions within proteins that are predicted to drive phase separation. Furthermore, we show how the ParSe-computed PS potential can be used to predict mutant phase separation behavior, finding that it reasonably reproduced a newly published dataset (Rekhi et al., 2023) of mutation effects on the saturation concentration (c_sat) associated with protein phase separation.

2 RESULTS

The ParSe 2.0 algorithm performs three tasks to resolve the regions within a protein that are ID, and which subset of those are likely to drive phase separation in a biological context. The tasks are:

1. Calculate local properties in the sequence using an optimal set of property scales.
2. Determine where local properties match the folded, ID, or PS ID classes.
3. Identify regions of uniform class to predict domain-level organization.

2.1 Calculate local sequence properties

The algorithm predicts the modular organization within a protein from its regional variations in intrinsic sequence properties. ParSe 2.0 continuously determines the average properties within a 25-residue segment, or window, that advances through the whole sequence, as shown in Figure 1a. This approach avoids averaging properties between distant regions that may have different characteristics.
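As an illustration, the window averaging can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the ParSe 2.0 source code: the function name `window_means` is hypothetical, and the two-letter `toy_scale` in the usage lines is a placeholder rather than any published amino acid scale.

```python
from typing import Dict, List

def window_means(seq: str, scale: Dict[str, float], window: int = 25) -> List[float]:
    """Mean of a per-residue property over every contiguous window in seq."""
    if len(seq) < window:
        return []  # too short to hold a single window
    values = [scale[aa] for aa in seq]
    total = sum(values[:window])
    means = [total / window]
    # Advance one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        means.append(total / window)
    return means

# Toy scale for illustration only (NOT the Bastolla or Tanaka-Scheraga values).
toy_scale = {"A": 1.0, "G": 0.0}
print(window_means("AAGG", toy_scale, window=2))  # [1.0, 0.5, 0.0]
```

A sequence of length N yields N − 24 windows when the default 25-residue window is used.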

**FIGURE 1**

The ParSe 2.0 algorithm. (a) A sliding window approach is used to identify regions within a protein that match the folded, ID, and PS ID classes. Hydrophobicity (ϕ), α-helix propensity (α), and v_model are calculated for each contiguous stretch of 25 residues, or “window,” in the primary sequence. (b) Each window is assigned a label, F, P, or D, depending on the values of ϕ, α, and v_model. Small circles show the values calculated for the 25-residue windows in the human SSBP4 sequence (ϕ in the top plot, α and v_model in the bottom plot), compared to the distributions calculated in the folded (black), ID (red), and PS ID (blue) training sets. (c) Window labels are assigned to the central residue of the window. Terminal residues are assigned labels according to the first and last windows. (d) Contiguous regions of at least 20 residues that are at least 90% of only one label P, D, or F are colored blue, red, or black, respectively, to represent predicted PS, ID, or folded regions. (e) Classifier distance of each window, assigned to the central residue of the window and colored according to its label P (blue), D (red), or F (black).

Several properties are calculated from the sequence in each 25-residue segment: (1) the average hydrophobicity, ϕ, using an amino acid scale from Bastolla et al. (2005), which was derived from contact matrices of globular protein structures, (2) the average intrinsic propensity for α-helix, α, using an amino acid scale from Tanaka and Scheraga (1977) calculated from x-ray data on native proteins, and (3) v_model, which is a sequence-based model of the polymer scaling exponent (Paiz et al., 2021) that is based on hydrodynamic size and originally was developed from polymer theories to extract information on the balance of self and solvent interactions in long homopolymers (Flory, 1949). Experimental v has been used widely as a measure of chain compaction in biological proteins (Borgia et al., 2016; Hofmann et al., 2012; Kohn et al., 2004; Marsh & Forman-Kay, 2010; Müller-Späth et al., 2010; Riback et al., 2017; Wilkins et al., 1999; Wuttke et al., 2014). The results from this algorithm are relatively independent of the size of the segment or window used (Paiz et al., 2021).

2.2 Match local sequence properties to protein class

Previously, we established that the three classes of protein regions (folded, ID, and PS ID) exhibit robust property differences (Ibrahim et al., 2023). This was shown using curated datasets of folded, ID, and PS ID sequences to examine how broadly existing amino acid property scales can be used to distinguish between the three classes of protein regions. We found that ~95% of the 566 scales of amino acid properties in the Amino Acid Index Database (Kawashima & Kanehisa, 2000), a curated set of numerical indices representing various physicochemical and biochemical properties of the amino acids, produced statistically significant differences between the means of the folded and ID sets. The largest statistical separation in the means, determined by t-test (Welch, 1947), was obtained when using the Bastolla scale for hydrophobicity, ϕ. Based on that finding, ParSe 2.0 uses ϕ to identify those 25-residue windows in a sequence that are likely to map to a folded protein region (Figure 1b).

Similarly, the largest statistical separation in the means when comparing the ID and PS ID sequence sets was obtained when using the Tanaka and Scheraga propensity scale, α (Figure 1b); however, there was considerable overlap in the performance of different predictors (Ibrahim et al., 2023). Principal component analysis (PCA) of the variance in the combined sequence sets demonstrated that the variance arising from v_model was orthogonal to the variance from α, and thus α and v_model could be combined without significant redundancy when comparing protein sequences (Ibrahim et al., 2023). PCA also was used to reduce the dimensionality in the dataset (Pearson, 1901), finding that most of the variability within the sequence sets measured by high-performing scales can be captured by 2–3 parameters (Ibrahim et al., 2023). Based on that finding, ParSe 2.0 first uses ϕ to identify ID regions (PS ID or ID) as compared to folded regions. Then, it uses α and v_model to identify the 25-residue windows that are likely to show phase separation behavior.

Window labels are used by ParSe 2.0 to record the results of this decision tree. Windows that match the folded class in ϕ are labeled F; all others are labeled P or D. D is assigned to windows with high α and high v_model (matching the ID class), while P is assigned to windows with low α and low v_model (matching the PS ID class). The P/D boundary was determined by the line that bisects the overlapping distributions of α and v_model in the ID and PS ID training sets. Next, the window label is assigned to the central residue in the window (Figure 1c). N- and C-terminal residues not belonging to a central window position are assigned the label of the central residue in the first and last window, respectively, of the whole sequence.

2.3 Identify regions of uniform protein class

Protein regions predicted by ParSe 2.0 to be folded, ID, or PS ID are determined by finding contiguous residue positions of length ≥ 20 that are ≥90% of only one label F, D, or P, respectively. When an overlap occurs between adjacent predicted regions, owing to the up to 10% label mixing allowed, this overlap is split evenly between the two adjacent regions. Figure 1d shows the application of this ruleset to human single-stranded DNA-binding protein 4 (SSBP4), which is not reported to phase separate in the current literature. For SSBP4, protein regions predicted by ParSe 2.0 to be folded, ID, or PS ID have been colored black, red, or blue, respectively; white corresponds to regions with a mixture of F, D, or P labels.
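The region rule can be illustrated with a short Python sketch. The text fixes the criteria (length ≥ 20, ≥90% of one label) but not the search procedure, so everything else here is an assumption made for illustration: the function name `uniform_regions`, the greedy left-to-right search, and the requirement that a region begin and end on the target label. Because this greedy search never returns overlapping spans, the even splitting of overlaps described above is not reproduced.

```python
from typing import List, Tuple

def uniform_regions(labels: str, target: str, min_len: int = 20,
                    min_percent: int = 90) -> List[Tuple[int, int]]:
    """Greedy left-to-right search for spans dominated by `target`.

    Returns half-open (start, end) index pairs. Each span is at least
    `min_len` residues long, is at least `min_percent` percent `target`,
    and (an assumption of this sketch) begins and ends on `target`.
    """
    n = len(labels)
    # prefix[i] = number of `target` labels among the first i residues.
    prefix = [0]
    for ch in labels:
        prefix.append(prefix[-1] + (ch == target))

    regions = []
    start = 0
    while start + min_len <= n:
        best_end = None
        if labels[start] == target:
            for end in range(start + min_len, n + 1):
                if labels[end - 1] != target:
                    continue
                hits = prefix[end] - prefix[start]
                # Integer arithmetic avoids floating-point edge cases at exactly 90%.
                if 100 * hits >= min_percent * (end - start):
                    best_end = end  # keep the longest qualifying span
        if best_end is None:
            start += 1
        else:
            regions.append((start, best_end))
            start = best_end
    return regions

labels = "D" * 5 + "P" * 30 + "F" * 10
print(uniform_regions(labels, "P"))  # [(5, 35)]
```

Residues not covered by any returned span for F, D, or P correspond to the mixed-label (white) regions in Figure 1d.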

The classifier distance was developed to assess confidence in the P, D, and F label assignments (Ibrahim et al., 2023). Here, the algorithm calculates the linear distance of a window into its classifier sector, relative to the cutoff boundary and normalized by the distance separating the boundary and the training set mean. For example, the classifier distance for a P-labeled window would be the shortest distance of the window to the P/D boundary divided by the shortest distance of the P/D boundary to the mean of the PS ID training set (see Figure 1b). Thus, for a P-labeled window, values greater than 1 in the classifier distance indicate a window located at a distance further from the P/D boundary than that of the PS ID set mean, whereas values less than 1 indicate a window closer to the cutoff boundary than the PS ID set mean and, as such, possibly with some uncertainty for its classifier label (i.e., a classifier distance that resides within the overlap in the ID and PS ID training set distributions). For D- and F-labeled windows, identically structured calculations are performed using cutoff boundaries and training set means, for D-labeled windows in the D sector of the α versus v_model plot and for F-labeled windows in the high ϕ region. Position-specific classifier distances calculated from the SSBP4 sequence are shown in Figure 1e.

2.4 Proteome-scale searches using ParSe 2.0

One advantage of ParSe 2.0 is that it is very fast and can be applied to very large datasets. To facilitate searches on a proteomic scale, we have developed the ParSe 2.0 algorithm into a web application that takes a user-supplied input in FASTA format. This application may be accessed through a browser by visiting https://stevewhitten.github.io/Parse_v2_FASTA. The required computation time increases linearly with the number of sequences in the input FASTA file. All calculations are performed on the user's local system through the JavaScript interface. We have found that the application can process, on average, ~14,000 primary sequences per minute using a standard desktop computer. Figure 2 shows the computational expense in minutes for FASTA files containing various-sized sequence sets. The largest proteome used for this figure is the human proteome with splice variants obtained from UniProt (UniProt Consortium, 2021) and representing 75,776 primary sequences. The second largest proteome in the figure, also obtained from UniProt, is the human proteome represented by one sequence per gene and containing 20,594 primary sequences. The computation rate for the ParSe 2.0 web application is compared in the figure to the rates we measured for other available tools used to predict phase separation behavior and that can process multiple input sequences (Chu et al., 2022; Klus et al., 2014; Lancaster et al., 2014; Vernon et al., 2018). ParSe 2.0 is compared in more detail to other predictors below.

Upon completion, the application outputs datasets that allow the user to quickly identify those proteins within the input file that have regions predicted to drive phase separation behavior. We demonstrate how to use this application and read its output by example below. We also made a second application (accessed at https://stevewhitten.github.io/Parse_v2_web) that can be used to evaluate individual protein sequences, which produces output in the format shown in Figures 1c–e when provided a single, primary sequence as input.

2.5 Characteristics of biological proteins that drive physiological phase separation

ParSe 2.0 was designed to identify PS IDRs from the primary sequence (Ibrahim et al., 2023). First, to demonstrate the output expected from proteins that phase separate, we tested a FASTA file of 43 proteins confirmed to exhibit homotypic phase separation behavior that was curated by Vernon et al. (2018). The UniProt Knowledgebase accession ID (UniProtKB ID), gene name, and primary sequence for the proteins in this set are given in Table S1.

The output includes two sets of plots and four datasets that can be downloaded. The plots are intended to quickly show if the uploaded dataset is enriched or depleted in phase separation potential relative to a reference; here, the reference is the human proteome containing splice variants. Figure 3 reproduces these plots for the Vernon set and shows that this dataset is highly enriched in proteins with long (N ≥ 50) predicted PS IDRs relative to the human proteome (Figure 3a). In addition to length of the predicted PS IDR, the summed classifier distance for every window labeled P has been used as a proxy to estimate the PS potential in a sequence (Ibrahim et al., 2023). We find that the Vernon set is heavily enriched in proteins with computed PS potentials ≥100, relative to the human proteome (Figure 3b). Furthermore, these data are used to create recall plots from which the area under the curve (AUC) is calculated. AUC values >0.5 in either metric indicate a set of sequences enriched in phase separation potential relative to the reference human proteome.

The results can be analyzed using any of four tables, linked after the plots described above. The first is a summary table of sequence-calculated values that can be sorted within the application (or downloaded and sorted separately). Sorting this table by the third column ranks the input file sequences by their computed PS potential (i.e., the classifier distance sum of windows labeled P), and sorting by the fourth column ranks them by the length of the longest predicted PS IDR (Figure 4). Thus, proteins within the input file predicted to have PS IDRs can be quickly identified by simply sorting the third or fourth columns of this table. This table for the Vernon set is reproduced in Table S2. If the description line for each sequence is formatted according to UniProt, where the line preceding each primary sequence lists the UniProtKB ID followed by the gene name and protein name, these identifying labels will be listed in the last three columns of the summary table.

Located below the summary table in the application are tables containing the data used to make the plots described above, allowing for their reproduction outside of the application. Finally, the application also outputs a FASTA file containing the predicted PS IDRs with length ≥ 50 that were found within the original input file.

2.6 Comparing ParSe 2.0 to other sequence-based predictors

Previously, using publicly available datasets, we found that ParSe 2.0 is at least as accurate as other published phase separation predictors in identifying proteins and regions within proteins that drive phase separation (Ibrahim et al., 2023). To demonstrate such predictor comparisons here, we used ParSe 2.0 to generate three separate sequence sets derived from the human proteome. Each set contains protein regions of at least 50 residues identified by our algorithm; the sets differ in the distribution of anticipated protein classes. Predicted PS IDRs comprise the first set, predicted IDRs (non-PS) the second set, and predicted folded regions the third.

Figure 5a shows that, of the protein regions ParSe 2.0 predicts to be PS IDRs or IDRs, metapredict (Emenecker et al., 2021) and flDPnn (Hu et al., 2021) also predict those regions as ID for >95% of the sequences found in either set. Metapredict was trained on consensus disorder data from eight different disorder predictors, whereas flDPnn was one of the top five predictors of disorder, which were not statistically different from each other in performance, in the recently completed Critical Assessment of protein Intrinsic Disorder (CAID) prediction experiment (Necci et al., 2021). These data suggest that IDR predictions by ParSe 2.0 are likely to be regions predicted as ID by metapredict and flDPnn. For the set not expected to be ID, only a few (<15%) of the ParSe-predicted folded regions were classified as ID by either of these two ID predictors. Thus, ParSe 2.0, despite predicting IDRs from a single metric, ϕ, shows good overall agreement with these two ID predictors when applied to the human proteome.

Figure 5b shows that the predictors PSPredictor (Chu et al., 2022), FuzDrop (Hardenberg et al., 2020), PScore (Vernon et al., 2018), catGranule (Klus et al., 2014), and PLAAC (Lancaster et al., 2014) predict phase separation behavior primarily, though not exclusively, in sequences found in the PS IDR set. PSPredictor was developed from machine learning tools that were trained using sequence data of proteins known to phase separate. FuzDrop was developed using sequence-based estimates of the probability for both disorder and disordered binding to find droplet-promoting protein regions. PScore was developed based on a specific molecular mechanism thought to drive phase separation: the propensity of π–π interactions to form cohesive protein interactions. Originally, PLAAC was developed to identify prions, and catGranule targets ID and RNA-binding ability; however, these two algorithms have been widely used as proxies for potential phase separation behavior (Chiu et al., 2022; Pancsa et al., 2021; Shen et al., 2021). Interestingly, PSPredictor, FuzDrop, and catGranule each find substantial proportions (30%–80%) of the predicted non-PS IDR set as possibly showing phase separation behavior.

A key difference among this set of predictors is that ParSe 2.0, PSPredictor, and FuzDrop find PS IDRs within mucins that are mostly missed by PScore, catGranule, and PLAAC (Figure 5c). Mucins are heavily glycosylated proteins that are known to form gel-like assemblies (Demouveaux et al., 2018). On average, a ParSe-predicted mucin PS IDR is enriched in serine and threonine content and depleted in proline, glycine, arginine, and tyrosine, when its composition is compared to the typical PS IDR predicted within a human protein (Figure 5d). Whether or not ParSe-predicted mucin PS IDRs indeed drive physiological phase separation has not been tested. Interestingly, ParSe-predicted PS IDRs within human RNA-binding and ribonucleoproteins are enriched in arginine and tyrosine content, when compared to the human PS IDR composition average, whereas transcription factors, zinc finger proteins, kinases, and proteins with “homeobox” in their name are found to generally match the average predicted PS IDR composition in humans. Others have found that phase-separating RNA-binding proteins often have tyrosine-rich, low sequence complexity, prion-like domains and arginine-rich RNA-binding domains (Wang et al., 2018). ParSe-predicted PS IDRs within collagen proteins are highly enriched in proline and glycine, which is consistent with the atypical amino acid composition of this protein type.

2.7 Modeling mutation effects on phase separation potential

Extensive mutagenesis studies involving several proteins have been used to understand the sequence features that drive phase separation (Brady et al., 2017; Bremer et al., 2022; Martin et al., 2020; Schuster et al., 2020; Vernon et al., 2018; Wang et al., 2018). The results of those studies implicate specific interactions between amino acids in the formation of phase-separated droplets, for example, cation–anion, cation–π, and π–π. Overall, the main result of many studies is that multiple, redundant molecular mechanisms contribute to the formation of phase-separated droplets from IDRs (Cai et al., 2022; Ibrahim et al., 2023; Rekhi et al., 2023).

Because the PS potential computed by ParSe 2.0 does not include the effects of pairwise interactions involving combinations of amino acid types, the calculation was expanded to contain both the classifier distance sum of P-labeled positions and terms quantifying the effects of interactions between amino acids, termed U_π for π–π and cation–π interactions and U_q for charge-based effects (Ibrahim et al., 2023). We trained U_π and U_q against existing data on mutant sequences from Ddx4, LAF-1, and A1 (Brady et al., 2017; Bremer et al., 2022; Martin et al., 2020; Schuster et al., 2020). However, the different studies used different metrics to quantify phase separation potential. We used the saturation concentration (c_sat) at 4°C and thermodynamic properties associated with phase separation behavior (standard molar ∆h°, ∆s°, and ∆g°) to separately train the calculation. We found that the summed P classifier distance was only moderately able to predict the effects of mutations designed to perturb phase separation behavior. In contrast, the expanded PS potential including U_π and U_q obtained reasonable predictive power, highlighting the importance of pairwise interactions in modulating phase separation behavior (Ibrahim et al., 2023).

The ParSe 2.0 application targeted at individual protein sequences (accessed at https://stevewhitten.github.io/Parse_v2_web) outputs the computed PS potential both with and without the U_π and U_q extensions. We used this modified algorithm to assess a newly published mutant dataset that was not included in the training of U_π and U_q. Mittal and coworkers measured the effects on c_sat at 37°C from mutation in an artificial IDP consisting of 25 repeats of GRGDSPYS (Rekhi et al., 2023). Figure 6 shows that including U_π and U_q in the calculation increased the correlation between experimental c_sat and predicted PS potential (from r = 0.24 to r = 0.59). If we used U_π and U_q trained previously by ∆h°, rather than trained by c_sat, the predictive power for capturing sequence effects on c_sat in the new mutant dataset was not as good (r = 0.46, Figure S1). This result is consistent with the observation that mutant rank order in c_sat (at a given temperature) does not necessarily agree with mutant rank order in the measured thermodynamic properties associated with protein phase separation (Bremer et al., 2022).

3 DISCUSSION

ParSe 2.0 was developed with a particular focus on predicting which IDRs in a protein sequence can lead to phase separation. Our approach for identifying potential PS IDRs is based primarily on sequence composition and not on sequence patterns or combinations of amino acids. This approach was inspired by the finding that a wide variety of amino acid scales show statistically significant differences between curated ID and PS ID datasets, indicating that PS IDRs are a robustly different class of protein region than conventional, non-PS IDRs (Ibrahim et al., 2023). Similarly, we and others (Dunker et al., 2000; Meng et al., 2017; Romero et al., 2001; Uversky, 2002) have shown that ID (both conventional and PS) and folded protein regions are robustly different in their intrinsic properties, which enables the sequence-based prediction of the modular organization within a protein with respect to ID, PS ID, and folded regions (Figure 1d).

However, to yield reasonable predictive power for mutations that have been designed to test the role of specific amino acid types in driving protein phase separation, the PS potential as computed by ParSe 2.0 has been modified to account for interactions between amino acids. With this modification (i.e., including both U_π and U_q), we have been able to match existing data on mutant sequences (Ibrahim et al., 2023). The original training of U_π and U_q used protein constructs that were based on sequences from Ddx4, LAF-1, and A1 (Brady et al., 2017; Bremer et al., 2022; Martin et al., 2020; Schuster et al., 2020). The mutant set in Figure 6 is from an artificial protein construct (Rekhi et al., 2023) and shows surprisingly good agreement between changes in the computed PS potential and changes in the measured c_sat at 37°C, especially when the PS potential was modified to include U_π and U_q.

Though, in general, the different phase separation predictors exhibit similar performance when applied to publicly available datasets of PS and non-PS protein sequences (Ibrahim et al., 2023), the percent of the human proteome predicted to phase separate can vary substantially by predictor. For example, ParSe 2.0 (Figure 3), PScore (Vernon et al., 2018), and catGranule (Klus et al., 2014) each identify a relatively small subset of the human proteome (~10%–20%) as exhibiting high potential for phase separation when compared to other predictors, including FuzDrop (Hardenberg et al., 2020) that reports ~40%. This could be owing to a narrower focus of some predictors, for example, ParSe for IDR drivers, PScore for a specific mechanism, while other predictors consider both phase separation drivers and clients (Chen et al., 2022; Hardenberg et al., 2020) or a broad set of physical interaction mechanisms (Cai et al., 2022). Indeed, including U_π and U_q into ParSe 2.0 to account for potential protein–protein interactions increases the number of proteins identified that drive phase separation in the human proteome (Ibrahim et al., 2023). However, whether this is a result of correctly classifying more human proteins as driving phase separation or whether the false positive rate has increased remains to be seen.

We have built web application versions of ParSe 2.0 for the scientific community. Because of its speed, the ParSe 2.0 algorithm can be applied to datasets of large size (Figure 2). The strong performance of ParSe 2.0 on existing datasets, the robust nature of the differences between PS IDRs and conventional IDRs, and the high correlation between ParSe 2.0 and other predictors on databases of PS proteins (Ibrahim et al., 2023) all give confidence that the algorithm can identify PS IDRs with significant accuracy.

4 METHODS

4.1 Window calculation of ϕ, α, and v_model

ϕ was calculated as the sequence sum divided by the length, N, using the hydrophobicity scale from Bastolla et al. (2005). For a window of 25 residues, N = 25. Similarly, α was calculated as the sequence sum divided by N using the α-helix propensity scale from Tanaka and Scheraga (1977). v_model, introduced previously (Paiz et al., 2021), was calculated by,

\nu_{model} = \log\left(R_h/R_o\right)/\log(N), \qquad (1)

where R_o was a constant set to 2.16 Å, and the hydrodynamic radius, R_h, was calculated from sequence using an equation found to be accurate for monomeric IDPs (English et al., 2017, 2019; Langridge et al., 2014; Perez et al., 2014; Tomasso et al., 2016). The equation to calculate R_h for a disordered sequence is,

R_h = 2.16\,\mathrm{\AA} \cdot N^{\left(0.503 - 0.11 \cdot \ln\left(1 - f_{PPII}\right)\right)} + 0.26 \cdot \left|Q_{net}\right| - 0.29 \cdot N^{0.5}, \qquad (2)

where f_PPII is the fractional number of residues in the PPII conformation, and Q_net is the net charge. f_PPII was estimated from ∑ P_PPII,i/N, where P_PPII,i is the experimental PPII propensity determined for amino acid type i in unfolded peptides by Elam et al. (2013). Q_net was determined from the number of lysine and arginine residues minus the number of glutamic acid and aspartic acid.
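A minimal Python sketch of the v_model calculation is given here for illustration; it is not the ParSe 2.0 source code. The function names are hypothetical. Because the per-residue PPII propensities of Elam et al. (2013) are not reproduced in this text, `f_ppii` is passed in by the caller rather than computed from sequence. The exponent is written with ln(1 − f_PPII), the form under which a chain with no PPII content reduces to an exponent of 0.503.

```python
import math

R_O = 2.16  # angstroms; the constant R_o in Eq. 1

def net_charge(seq: str) -> int:
    """Q_net: number of Lys and Arg minus number of Glu and Asp."""
    return sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "ED")

def hydrodynamic_radius(n: int, f_ppii: float, q_net: int) -> float:
    """Eq. 2: sequence-based R_h (angstroms) for a disordered chain of n residues."""
    exponent = 0.503 - 0.11 * math.log(1.0 - f_ppii)  # assumes the 1 - f_PPII form
    return 2.16 * n ** exponent + 0.26 * abs(q_net) - 0.29 * math.sqrt(n)

def nu_model(n: int, f_ppii: float, q_net: int) -> float:
    """Eq. 1: v_model from R_h, R_o, and chain length."""
    return math.log(hydrodynamic_radius(n, f_ppii, q_net) / R_O) / math.log(n)

# Example call for one 25-residue window; the f_ppii value is illustrative only.
window = "GRGDSPYSGRGDSPYSGRGDSPYSG"
print(nu_model(len(window), f_ppii=0.35, q_net=net_charge(window)))
```

In this form, a larger f_PPII or a larger |Q_net| increases R_h and therefore v_model.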

4.2 ParSe 2.0 algorithm

For an arbitrary sequence, whereby the amino acids are restricted to the 20 common types, ParSe 2.0 first reads the sequence to determine its length, N. Next, the algorithm uses a sliding window scheme (Figure 1a) to calculate v_model, α-helix propensity, and ϕ for every 25-residue segment of the primary sequence. This window scheme can be applied to proteins with N > 25. A window is labeled F if ϕ > 0.08 (Figure 1b). If ϕ < 0.08, a window is labeled P or D depending on the values of v_model and α-helix propensity. Windows with high α-helix propensity and high v_model are labeled D, while those with low α-helix propensity and low v_model are labeled P. The P/D boundary is given by v_model = −0.244·α-helix propensity +0.789. The window label is assigned to the central residue in that window. N- and C-terminal residues not belonging to a central window position are assigned the label of the central residue in the first and last window, respectively, of the whole sequence. Protein regions predicted to be PS, ID, or folded are determined by finding contiguous residue positions of length ≥20 that are ≥90% of only one label P, D, or F, respectively.
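The labeling rules above can be sketched as follows. This is an illustrative sketch, not the ParSe 2.0 source code; the function names are hypothetical. The text does not state how exact ties are handled, so placing ϕ = 0.08 in the P/D branch and points exactly on the P/D boundary in P are assumptions of this sketch.

```python
from typing import List

PHI_CUTOFF = 0.08                       # windows with phi above this are labeled F
PD_SLOPE, PD_INTERCEPT = -0.244, 0.789  # P/D boundary: v_model = slope * alpha + intercept

def label_window(phi: float, alpha: float, nu: float) -> str:
    """Return 'F', 'D', or 'P' for one window from its phi, alpha, and v_model."""
    if phi > PHI_CUTOFF:
        return "F"
    boundary_nu = PD_SLOPE * alpha + PD_INTERCEPT
    # Above the boundary (high alpha, high v_model) is D; otherwise P.
    return "D" if nu > boundary_nu else "P"

def residue_labels(window_labels: List[str], window: int = 25) -> str:
    """Assign each window label to its central residue and pad both termini.

    Terminal residues that are never a window center take the label of the
    first (N-terminus) or last (C-terminus) window.
    """
    half = window // 2
    return window_labels[0] * half + "".join(window_labels) + window_labels[-1] * half

print(label_window(0.05, 1.0, 0.60))             # D
print(len(residue_labels(["P", "D", "F"], 25)))  # 27
```

For a 25-residue window, a sequence of N residues gives N − 24 window labels, and the padding of 12 residues on each side returns one label per residue.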

4.3 Classifier distance calculation

The classifier distance is the normalized distance of a ParSe 2.0 generated window into its classifier sector (i.e., the F, D, or P sector), measured relative to the cutoff boundary (Figure 1b). For F-labeled windows, the classifier distance is ϕ of the window minus the cutoff value of 0.08, normalized to the distance from the folded training set mean ϕ (0.1164) to the cutoff. Specifically, this is (ϕ − 0.08)/(0.1164 − 0.08). For P- or D-labeled windows, we first find the point on the P/D boundary (defined above) that lies closest to the window, that is, the foot of the perpendicular dropped onto the boundary from the point defined by the window values of α-helix propensity and v_model. Then the distance between this boundary point and the window point is determined. Specifically, this distance is sqrt((α − x)·(α − x) + (v_model − y)·(v_model − y)), where α and v_model are the window values of α-helix propensity and v_model, x is (α/0.244 + 0.789 − v_model)/(0.244 + 1/0.244), and y is (x − α)/0.244 + v_model. This distance is normalized by dividing by 0.019, the distance from the boundary to either of the training set means.
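
A minimal Python sketch of these two distance calculations is given below, using the constants and the expressions for x and y exactly as stated in the text. The function names are illustrative only.

```python
import math

PHI_CUTOFF = 0.08        # F-sector cutoff on phi
PHI_FOLDED_MEAN = 0.1164  # folded training set mean phi
PD_NORMALIZER = 0.019    # boundary-to-training-set-mean distance


def classifier_distance_f(phi):
    """Normalized classifier distance for an F-labeled window."""
    return (phi - PHI_CUTOFF) / (PHI_FOLDED_MEAN - PHI_CUTOFF)


def classifier_distance_pd(alpha, v_model):
    """Normalized classifier distance for a P- or D-labeled window.

    (x, y) is the point on the boundary v_model = -0.244 * alpha + 0.789
    reached by a perpendicular from the window point (alpha, v_model).
    """
    x = (alpha / 0.244 + 0.789 - v_model) / (0.244 + 1 / 0.244)
    y = (x - alpha) / 0.244 + v_model
    return math.hypot(alpha - x, v_model - y) / PD_NORMALIZER
```

The P/D result is equivalent to the standard point-to-line distance, |v_model + 0.244·α − 0.789|/sqrt(1 + 0.244²), divided by 0.019, which gives a convenient check on the expressions for x and y.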

4.4 Computed PS potential

The PS potential for a sequence is the summed classifier distance for every window labeled P. This potential can be expanded to include contributions of aromatic and cation-π interactions (U_π) and charge-based interactions (U_q) as described below.
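
In code, this summation is a short filter over the per-window results. The sketch below assumes two parallel lists, one of window labels and one of classifier distances; the function name is hypothetical.

```python
def ps_potential(labels, distances):
    """Sum the classifier distance over all windows labeled P."""
    if len(labels) != len(distances):
        raise ValueError("labels and distances must have the same length")
    return sum(d for label, d in zip(labels, distances) if label == "P")
```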

4.5 Contribution of aromatic and cation–π interactions to the PS potential

The contributions of aromatic and cation–π interactions to protein phase separation follow the rank order observed by Wang et al. (2018): Tyr–Arg > Tyr–Lys ~ Phe–Arg > Phe–Lys. To mimic this ranking, we assumed 3:2:1 weighting and, also, that Phe–Tyr interactions would contribute comparably to Phe–Lys interactions,

$$
\begin{aligned}
U_{\pi} = a \cdot \Big( & 3 \cdot \frac{\#\mathrm{Y} \times \#\mathrm{R}}{\left(\#\mathrm{Y} - \#\mathrm{R}\right)_{\#\mathrm{Y} \ne \#\mathrm{R}}} + 2 \cdot \frac{\#\mathrm{Y} \times \#\mathrm{K}}{\left(\#\mathrm{Y} - \#\mathrm{K}\right)_{\#\mathrm{Y} \ne \#\mathrm{K}}} + 2 \cdot \frac{\#\mathrm{F} \times \#\mathrm{R}}{\left(\#\mathrm{F} - \#\mathrm{R}\right)_{\#\mathrm{F} \ne \#\mathrm{R}}} \\
& + 1 \cdot \frac{\#\mathrm{F} \times \#\mathrm{K}}{\left(\#\mathrm{F} - \#\mathrm{K}\right)_{\#\mathrm{F} \ne \#\mathrm{K}}} + 1 \cdot \frac{\#\mathrm{F} \times \#\mathrm{Y}}{\left(\#\mathrm{F} - \#\mathrm{Y}\right)_{\#\mathrm{F} \ne \#\mathrm{Y}}} \Big).
\end{aligned}
$$

(3)

In Equation 3, #Y, #R, #F, and #K represent the number of Tyr, Arg, Phe, and Lys residues, respectively, calculated on a per-window basis. Thus, U_π increases with increasing Tyr, Arg, Phe, and Lys content, and more so when interaction partners are present at similar levels. When the divisor is zero (e.g., when #Y = #R), it is changed to 1 to avoid infinite potentials. The fitting parameter a was determined previously (Ibrahim et al., 2023) by finding the optimal correlation of the expanded PS potential to experimental Δh° (finding a = 0.14), Δs° (finding a = 0.08), Δg° (finding a = 0.11), or c_sat at 4°C (finding a = 0.28). Window-specific U_π is added to the classifier distance at windows labeled P. U_π is also calculated at D-labeled windows, allowing for the possibility of labels changing from D to P. This occurs when the value of U_π is larger than the classifier distance at a D-labeled window. Thus, protein regions that otherwise have characteristics more like the ID set, in v_model and α-helix propensity, can be labeled P if U_π is large enough. In that case, the classifier distance assigned to the relabeled window is the difference between U_π and the original classifier distance of the window formerly labeled D.
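
The per-window calculation and the D-to-P relabeling rule might be sketched in Python as below. The function names are hypothetical. One point is an assumption of this sketch rather than something stated in the text: the magnitude of the count difference is used as the divisor so that every term is non-negative, in keeping with the statement that U_π increases with Tyr, Arg, Phe, and Lys content, whereas Equation 3 is printed with the plain difference. Leaving F-labeled windows unchanged is likewise an assumption.

```python
def _pair_term(n1, n2):
    """One term of Equation 3: n1 * n2 divided by the count difference."""
    # Assumption: the absolute difference is used so each term is
    # non-negative; Equation 3 as printed shows the plain difference.
    divisor = abs(n1 - n2)
    if divisor == 0:
        divisor = 1  # a zero divisor is changed to 1, as stated in the text
    return n1 * n2 / divisor


def u_pi(window_sequence, a):
    """Aromatic and cation-pi potential for one window (3:2:2:1:1 weights)."""
    y, r, f, k = (window_sequence.count(aa) for aa in "YRFK")
    weighted = (3 * _pair_term(y, r) + 2 * _pair_term(y, k)
                + 2 * _pair_term(f, r) + 1 * _pair_term(f, k)
                + 1 * _pair_term(f, y))
    return a * weighted


def apply_potential(label, distance, u):
    """Add a window potential u and relabel D to P when u exceeds the distance."""
    if label == "P":
        return "P", distance + u
    if label == "D" and u > distance:
        return "P", u - distance
    # Assumption: F windows and unchanged D windows keep their values.
    return label, distance
```

For example, `apply_potential("D", 0.5, 0.8)` relabels the window as P with a classifier distance of about 0.3, while `apply_potential("D", 0.5, 0.2)` leaves it as D.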

4.6 Contribution of charge-based interactions to the PS potential

The contributions of charge-based interactions to protein phase separation follow the observations by Schuster et al. (2020) and Bremer et al. (2022) that changes in the sequence charge decoration, SCD, and net charge per residue, NCPR, respectively, can affect phase separation potential. Thus, a simple charge-based potential was defined,

$$U_q = b \cdot \mathrm{SCD} + c \cdot \left|\mathrm{NCPR}\right|,$$

(4)

where b and c are fitting parameters, and U_q is calculated on a per-window basis. U_q is added to the classifier distance at each window labeled P and is applied to windows labeled D, following the scheme described above for U_π, again allowing for the possibility of labels changing from D to P. The parameters b and c were determined previously (Ibrahim et al., 2023) by finding the optimal correlation of the expanded PS potential to Δh° (finding 8.4 and 5.6, respectively), Δs° (finding 4.6 and 7.0, respectively), Δg° (finding 5.2 and 5.4, respectively), or c_sat at 4°C (finding −16.0 and 33, respectively). NCPR is the number of Lys and Arg residues minus the number of Glu and Asp residues, divided by N. SCD is calculated as $N^{-1}\sum_{i}\sum_{j>i} q_i q_j \left|j-i\right|^{1/2}$, where q is the amino acid-specific charge (Sawle & Ghosh, 2015).
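
A minimal Python sketch of Equation 4 follows. The function names are illustrative. The charge assignment (+1 for Lys and Arg, −1 for Glu and Asp, 0 otherwise) is an assumption chosen to match the NCPR definition above, and N is taken to be the length of the sequence passed in, so a window should be supplied for a per-window value.

```python
# Assumed residue charges, matching the NCPR definition in the text.
CHARGE = {"K": 1, "R": 1, "E": -1, "D": -1}


def ncpr(sequence):
    """Net charge per residue: (Lys + Arg - Glu - Asp) / N."""
    return sum(CHARGE.get(aa, 0) for aa in sequence) / len(sequence)


def scd(sequence):
    """Sequence charge decoration: (1/N) * sum over j > i of q_i*q_j*|j-i|^0.5."""
    q = [CHARGE.get(aa, 0) for aa in sequence]
    n = len(q)
    total = 0.0
    for i in range(n):
        if q[i] == 0:
            continue  # uncharged residues contribute nothing
        for j in range(i + 1, n):
            total += q[i] * q[j] * (j - i) ** 0.5
    return total / n


def u_q(sequence, b, c):
    """Charge-based potential of Equation 4 for one window."""
    return b * scd(sequence) + c * abs(ncpr(sequence))
```

The double loop is quadratic in window length, which is negligible for 25-residue windows.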

4.7 Metapredict calculation

Metapredict score (Emenecker et al., 2021), which predicts the presence of ID in a sequence, was calculated by computer algorithm using the Python script available at http://metapredict.net. The per-residue average metapredict score, when >0.5, was used to classify a protein region as predicted to be ID.

4.8 flDPnn calculation

flDPnn score (Hu et al., 2021), which predicts the presence of ID in a sequence, was calculated by using the webserver available at http://biomine.cs.vcu.edu/servers/flDPnn. The per-residue average flDPnn binary score, when >0.5, was used to classify a protein region as predicted to be ID.

4.9 PSPredictor calculation

PSPredictor score was calculated by using the PSPredictor (Chu et al., 2022) webtool available at http://www.pkumdl.cn:8000/PSPredictor. PSPredictor score, when >0.5, was used to classify a protein region as predicted to exhibit phase separation behavior.

4.10 FuzDrop calculation

FuzDrop calculations (Hardenberg et al., 2020) used the webtool available at https://fuzdrop.bio.unipd.it/predictor. A protein region was classified as predicted to exhibit phase separation behavior when >90% of its residues had a residue-based droplet-promoting probability, $p_{DP}$, >0.6.

4.11 PScore calculation

PScore, which is a phase separation propensity predictor (Vernon et al., 2018), was calculated by computer algorithm using the Python script and associated database files available at https://doi.org/10.7554/eLife.31486.022. The overall PScore, when >4, was used to classify a protein region as predicted to exhibit phase separation behavior.

4.12 Granule propensity calculation

Granule propensity was calculated by using the catGranule (Klus et al., 2014) webtool available at http://www.tartaglialab.com. Granule propensity, when >0, was used to classify a protein region as predicted to exhibit phase separation behavior.

4.13 PLAAC LLR calculation

LLR score, which identifies sequences with prion-like amino acid composition (Lancaster et al., 2014), was calculated by using the webtool available at http://plaac.wi.mit.edu. The LLR score, when >0, was used to classify a protein region as predicted to exhibit phase separation behavior.
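
The classification cutoffs applied to the external predictors in Sections 4.7 through 4.13 can be collected in a short Python sketch. It only encodes the thresholds stated in the text and does not call any of the tools; scores are assumed to have been obtained beforehand, and the function names and dictionary keys are hypothetical.

```python
def mean_score_positive(per_residue_scores, cutoff=0.5):
    """Metapredict or flDPnn rule: per-residue average score above 0.5."""
    return sum(per_residue_scores) / len(per_residue_scores) > cutoff


def fuzdrop_positive(p_dp, residue_cutoff=0.6, fraction_cutoff=0.9):
    """FuzDrop rule: more than 90% of residues with p_DP above 0.6."""
    above = sum(1 for p in p_dp if p > residue_cutoff)
    return above / len(p_dp) > fraction_cutoff


# Single-score cutoffs stated in the text; a region is positive above the cutoff.
SCORE_CUTOFFS = {
    "PSPredictor": 0.5,
    "PScore": 4.0,
    "catGranule": 0.0,
    "PLAAC_LLR": 0.0,
}


def score_positive(tool, score):
    """Apply the single-score cutoff for the named tool."""
    return score > SCORE_CUTOFFS[tool]
```

All comparisons are strict, matching the ">" used for each cutoff in the text.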

AUTHOR CONTRIBUTIONS

Loren E. Hough and Steven T. Whitten: conceptualization; Colorado Wilson and Steven T. Whitten: programming; Karen A. Lewis, Nicholas C. Fitzkee, Loren E. Hough, and Steven T. Whitten: methodology; Karen A. Lewis, Nicholas C. Fitzkee, Loren E. Hough, and Steven T. Whitten: formal analysis; Steven T. Whitten: writing—original draft; Karen A. Lewis, Nicholas C. Fitzkee, Loren E. Hough, and Steven T. Whitten: writing—review and editing.

FUNDING INFORMATION

This work was supported by the National Institutes of Health under grants R35GM119755 (Loren E. Hough) and R01AI139479 (Nicholas C. Fitzkee), the National Science Foundation under grants 1818090 (Nicholas C. Fitzkee) and 1943488 (Loren E. Hough), and Texas State University Office of Research and Sponsored Projects through the Research Enhancement Program (Steven T. Whitten and Karen A. Lewis). No nongovernmental sources were used to fund this project. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NSF or NIH.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

Open Research

DATA AVAILABILITY STATEMENT

A web application of ParSe 2.0 that evaluates individual protein sequences can be accessed at https://stevewhitten.github.io/Parse_v2_web. A web application of ParSe 2.0 that can be used to quickly find phase-separating proteins within large sequence sets can be accessed at https://stevewhitten.github.io/Parse_v2_FASTA. The source code for both applications can be accessed at https://github.com/stevewhitten.


REFERENCES

Aguzzi A, Altmeyer M. Phase separation: linking cellular compartmentalization to disease. Trends Cell Biol. 2016; 26: 547–558.
10.1016/j.tcb.2016.03.004
Alberti S, Dormann D. Liquid–liquid phase separation in disease. Annu Rev Genet. 2019; 53: 171–194.
10.1146/annurev-genet-112618-043527
Bastolla U, Porto M, Roman HE, Vendruscolo M. Principal eigenvector of contact matrices and hydrophobicity profiles in proteins. Proteins. 2005; 58: 22–30.
10.1002/prot.20240
Borgia A, Zheng W, Buholzer K, Borgia MB, Schüler A, Hofmann H, et al. Consistent view of polypeptide chain expansion in chemical denaturants from multiple experimental methods. J Am Chem Soc. 2016; 138: 11714–11726.
10.1021/jacs.6b05917
Bouchard JJ, Otero JH, Scott DC, Szulc E, Martin EW, Sabri N, et al. Cancer mutations of the tumor suppressor SPOP disrupt the formation of active, phase-separated compartments. Mol Cell. 2018; 72: 19.e8–36.e8.
10.1016/j.molcel.2018.08.027
Brady JP, Farber PJ, Sekhar A, Lin Y-H, Huang R, Bah A, et al. Structural and hydrodynamic properties of an intrinsically disordered region of a germ cell-specific protein on phase separation. Proc Natl Acad Sci U S A. 2017; 114: E8194–E8203.
10.1073/pnas.1706197114
Brangwynne CP, Eckmann CR, Courson DS, Rybarska A, Hoege C, Gharakhani J, et al. Germline P granules are liquid droplets that localize by controlled dissolution/condensation. Science. 2009; 324: 1729–1732.
10.1126/science.1172046
Bremer A, Farag M, Borcherds WM, Peran I, Martin EW, Pappu RV, et al. Deciphering how naturally occurring sequence features impact the phase behaviours of disordered prion-like domains. Nat Chem. 2022; 14: 196–207.
10.1038/s41557-021-00840-w
Cai H, Vernon RM, Forman-Kay JD. An interpretable machine-learning algorithm to predict disordered protein phase separation based on biophysical interactions. Biomolecules. 2022; 12:1131.
10.3390/biom12081131
Chen Z, Hou C, Wang L, Yu C, Chen T, Shen B, et al. Screening membraneless organelle participants with machine-learning models that integrate multimodal features. Proc Natl Acad Sci U S A. 2022; 119:e2115369119.
10.1073/pnas.2115369119
Chiu S-H, Ho W-L, Sun Y-C, Kuo J-C, Huang J. Phase separation driven by interchangeable properties in the intrinsically disordered regions of protein paralogs. Commun Biol. 2022; 5: 1–12.
10.1038/s42003-022-03354-4
Chong PA, Forman-Kay JD. Liquid-liquid phase separation in cellular signaling systems. Curr Opin Struct Biol. 2016; 41: 180–186.
10.1016/j.sbi.2016.08.001
Chu X, Sun T, Li Q, Xu Y, Zhang Z, Lai L, et al. Prediction of liquid-liquid phase separating proteins using machine learning. BMC Bioinformatics. 2022; 23: 72.
10.1186/s12859-022-04599-w
Dao TP, Kolaitis R-M, Kim HJ, O'Donovan K, Martyniak B, Colicino E, et al. Ubiquitin modulates liquid-liquid phase separation of UBQLN2 via disruption of multivalent interactions. Mol Cell. 2018; 69: 965.e6–978.e6.
10.1016/j.molcel.2018.02.004
Demouveaux B, Gouyer V, Gottrand F, Narita T, Desseyn J-L. Gel-forming mucin interactome drives mucus viscoelasticity. Adv Colloid Interface Sci. 2018; 252: 69–82.
10.1016/j.cis.2017.12.005
Dignon GL, Zheng W, Best RB, Kim YC, Mittal J. Relation between single-molecule properties and phase behavior of intrinsically disordered proteins. Proc Natl Acad Sci U S A. 2018; 115: 9929–9934.
10.1073/pnas.1804177115
Dunker AK, Obradovic Z, Romero P, Garner EC, Brown CJ. Intrinsic protein disorder in complete genomes. Genome Inform Ser Workshop Genome Inform. 2000; 11: 161–171.
Elam WA, Schrank TP, Campagnolo AJ, Hilser VJ. Evolutionary conservation of the polyproline II conformation surrounding intrinsically disordered phosphorylation sites. Protein Sci. 2013; 22: 405–417.
10.1002/pro.2217
Emenecker RJ, Griffith D, Holehouse AS. Metapredict: a fast, accurate, and easy-to-use predictor of consensus disorder and structure. Biophys J. 2021; 120: 4312–4319.
10.1016/j.bpj.2021.08.039
English LR, Tilton EC, Ricard BJ, Whitten ST. Intrinsic α helix propensities compact hydrodynamic radii in intrinsically disordered proteins. Proteins. 2017; 85: 296–311.
10.1002/prot.25222
English LR, Voss SM, Tilton EC, Paiz EA, So S, Parra GL, et al. Impact of heat on coil hydrodynamic size yields the energetics of denatured state conformational bias. J Phys Chem B. 2019; 123: 10014–10024.
10.1021/acs.jpcb.9b09088
Flory PJ. The configuration of real polymer chains. J Chem Phys. 1949; 17: 303–310.
10.1063/1.1747243
Gibson BA, Doolittle LK, Schneider MWG, Jensen LE, Gamarra N, Henry L, et al. Organization of chromatin by intrinsic and regulated phase separation. Cell. 2019; 179: 470.e21–484.e21.
10.1016/j.cell.2019.08.037
Gomes E, Shorter J. The molecular language of membraneless organelles. J Biol Chem. 2019; 294: 7115–7127.
10.1074/jbc.TM118.001192
Hardenberg M, Horvath A, Ambrus V, Fuxreiter M, Vendruscolo M. Widespread occurrence of the droplet state of proteins in the human proteome. Proc Natl Acad Sci U S A. 2020; 117: 33254–33262.
10.1073/pnas.2007670117
Hofmann H, Soranno A, Borgia A, Gast K, Nettels D, Schuler B. Polymer scaling laws of unfolded and intrinsically disordered proteins quantified with single-molecule spectroscopy. Proc Natl Acad Sci U S A. 2012; 109: 16155–16160.
10.1073/pnas.1207719109
Hu G, Katuwawala A, Wang K, Wu Z, Ghadermarzi S, Gao J, et al. flDPnn: accurate intrinsic disorder prediction with putative propensities of disorder functions. Nat Commun. 2021; 12:4438.
10.1038/s41467-021-24773-7
Ibrahim AY, Khaodeuanepheng NP, Amarasekara DL, Correia JJ, Lewis KA, Fitzkee NC, et al. Intrinsically disordered regions that drive phase separation form a robustly distinct protein class. J Biol Chem. 2023; 299:102801.
10.1016/j.jbc.2022.102801
Jacobs MI, Jira ER, Schroeder CM. Understanding how coacervates drive reversible small molecule reactions to promote molecular complexity. Langmuir. 2021; 37: 14323–14335.
10.1021/acs.langmuir.1c02231
Kang J-Y, Wen Z, Pan D, Zhang Y, Li Q, Zhong A, et al. LLPS of FXR1 drives spermiogenesis by activating translation of stored mRNAs. Science. 2022; 377:eabj6647.
10.1126/science.abj6647
Kawashima S, Kanehisa M. AAindex: amino acid index database. Nucleic Acids Res. 2000; 28: 374.
10.1093/nar/28.1.374
Klus P, Bolognesi B, Agostini F, Marchese D, Zanzoni A, Tartaglia GG. The cleverSuite approach for protein characterization: predictions of structural properties, solubility, chaperone requirements and RNA-binding abilities. Bioinformatics. 2014; 30: 1601–1608.
10.1093/bioinformatics/btu074
Kohn JE, Millett IS, Jacob J, Zagrovic B, Dillon TM, Cingel N, et al. Random-coil behavior and the dimensions of chemically unfolded proteins. Proc Natl Acad Sci U S A. 2004; 101: 12491–12496.
10.1073/pnas.0403643101
Lafontaine DLJ, Riback JA, Bascetin R, Brangwynne CP. The nucleolus as a multiphase liquid condensate. Nat Rev Mol Cell Biol. 2021; 22: 165–182.
10.1038/s41580-020-0272-6
Lancaster AK, Nutter-Upham A, Lindquist S, King OD. PLAAC: a web and command-line application to identify proteins with prion-like amino acid composition. Bioinformatics. 2014; 30: 2501–2502.
10.1093/bioinformatics/btu310
Langridge TD, Tarver MJ, Whitten ST. Temperature effects on the hydrodynamic radius of the intrinsically disordered N-terminal region of the p53 protein. Proteins. 2014; 82: 668–678.
10.1002/prot.24449
Li X-H, Chavali PL, Pancsa R, Chavali S, Babu MM. Function and regulation of phase-separated biological condensates. Biochemistry. 2018; 57: 2452–2461.
10.1021/acs.biochem.7b01228
Lin Y-H, Brady JP, Chan HS, Ghosh K. A unified analytical theory of heteropolymers for sequence-specific phase behaviors of polyelectrolytes and polyampholytes. J Chem Phys. 2020; 152:045102.
10.1063/1.5139661
Lin Y-H, Chan HS. Phase separation and single-chain compactness of charged disordered proteins are strongly correlated. Biophys J. 2017; 112: 2043–2046.
10.1016/j.bpj.2017.04.021
Liu S, Wang T, Shi Y, Bai L, Wang S, Guo D, et al. USP42 drives nuclear speckle mRNA splicing via directing dynamic phase separation to promote tumorigenesis. Cell Death Differ. 2021; 28: 2482–2498.
10.1038/s41418-021-00763-6
Lu H, Yu D, Hansen AS, Ganguly S, Liu R, Heckert A, et al. Phase-separation mechanism for C-terminal hyperphosphorylation of RNA polymerase II. Nature. 2018; 558: 318–323.
10.1038/s41586-018-0174-3
Marsh JA, Forman-Kay JD. Sequence determinants of compaction in intrinsically disordered proteins. Biophys J. 2010; 98: 2383–2390.
10.1016/j.bpj.2010.02.006
Martin EW, Holehouse AS, Peran I, Farag M, Incicco JJ, Bremer A, et al. Valence and patterning of aromatic residues determine the phase behavior of prion-like domains. Science. 2020; 367: 694–699.
10.1126/science.aaw8653
Meng F, Uversky VN, Kurgan L. Comprehensive review of methods for prediction of intrinsic disorder and its molecular functions. Cell Mol Life Sci. 2017; 74: 3069–3090.
10.1007/s00018-017-2555-4
Mitrea DM, Kriwacki RW. Phase separation in biology; functional organization of a higher order. Cell Commun Signal. 2016; 14: 1.
10.1186/s12964-015-0125-7
Molliex A, Temirov J, Lee J, Coughlin M, Kanagaraj AP, Kim HJ, et al. Phase separation by low complexity domains promotes stress granule assembly and drives pathological fibrillization. Cell. 2015; 163: 123–133.
10.1016/j.cell.2015.09.015
Müller-Späth S, Soranno A, Hirschfeld V, Hofmann H, Rüegger S, Reymond L, et al. Charge interactions can dominate the dimensions of intrinsically disordered proteins. Proc Natl Acad Sci U S A. 2010; 107: 14609–14614.
10.1073/pnas.1001743107
Murthy AC, Dignon GL, Kan Y, Zerze GH, Parekh SH, Mittal J, et al. Molecular interactions underlying liquid-liquid phase separation of the FUS low-complexity domain. Nat Struct Mol Biol. 2019; 26: 637–648.
10.1038/s41594-019-0250-x
Necci M, Piovesan D, Tosatto SCE. Critical assessment of protein intrinsic disorder prediction. Nat Methods. 2021; 18: 472–481.
10.1038/s41592-021-01117-3
Oshidari R, Huang R, Medghalchi M, Tse EYW, Ashgriz N, Lee HO, et al. DNA repair by Rad52 liquid droplets. Nat Commun. 2020; 11: 695.
10.1038/s41467-020-14546-z
Paiz EA, Allen JH, Correia JJ, Fitzkee NC, Hough LE, Whitten ST. Beta turn propensity and a model polymer scaling exponent identify intrinsically disordered phase-separating proteins. J Biol Chem. 2021; 297:101343.
10.1016/j.jbc.2021.101343
Pancsa R, Vranken W, Mészáros B. Computational resources for identifying and describing proteins driving liquid-liquid phase separation. Brief Bioinform. 2021; 22:bbaa408.
10.1093/bib/bbaa408
Pearson K. LIII. On lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci. 1901; 2: 559–572.
10.1080/14786440109462720
Perez RB, Tischer A, Auton M, Whitten ST. Alanine and proline content modulate global sensitivity to discrete perturbations in disordered proteins. Proteins. 2014; 82: 3373–3384.
10.1002/prot.24692
Prasad A, Bharathi V, Sivalingam V, Girdhar A, Patel BK. Molecular mechanisms of TDP-43 misfolding and pathology in amyotrophic lateral sclerosis. Front Mol Neurosci. 2019; 12: 25.
10.3389/fnmol.2019.00025
Prouteau M, Loewith R. Regulation of cellular metabolism through phase separation of enzymes. Biomolecules. 2018; 8: 160.
10.3390/biom8040160
Rabouille C, Alberti S. Cell adaptation upon stress: the emerging role of membrane-less compartments. Curr Opin Cell Biol. 2017; 47: 34–42.
10.1016/j.ceb.2017.02.006
Rekhi S, Garcia CG, Barai M, Rizuan A, Schuster BS, Kiick KL, et al. Expanding the molecular language of protein liquid-liquid phase separation. bioRxiv. 2023 Available from: https://www.biorxiv.org/content/10.1101/2023.03.02.530853v1
Riback JA, Bowman MA, Zmyslowski AM, Knoverek CR, Jumper JM, Hinshaw JR, et al. Innovative scattering analysis shows that hydrophobic disordered proteins are expanded in water. Science. 2017; 358: 238–241.
10.1126/science.aan5774
Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, Dunker AK. Sequence complexity of disordered protein. Proteins. 2001; 42: 38–48.
10.1002/1097-0134(20010101)42:1<38::AID-PROT50>3.0.CO;2-3
Sawle L, Ghosh K. A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins. J Chem Phys. 2015; 143:085101.
10.1063/1.4929391
Schuster BS, Dignon GL, Tang WS, Kelley FM, Ranganath AK, Jahnke CN, et al. Identifying sequence perturbations to an intrinsically disordered protein that determine its phase-separation behavior. Proc Natl Acad Sci U S A. 2020; 117: 11421–11431.
10.1073/pnas.2000223117
Shen B, Chen Z, Yu C, Chen T, Shi M, Li T. Computational screening of phase-separating proteins. Genomics Proteomics Bioinformatics. 2021; 19: 13–24.
10.1016/j.gpb.2020.11.003
Su X, Ditlev JA, Hui E, Xing W, Banjade S, Okrut J, et al. Phase separation of signaling molecules promotes T cell receptor signal transduction. Science. 2016; 352: 595–599.
10.1126/science.aad9964
Tanaka S, Scheraga HA. Statistical mechanical treatment of protein conformation. 5. Multistate model for specific-sequence copolymers of amino acids. Macromolecules. 1977; 10: 9–20.
10.1021/ma60055a002
Tomasso ME, Tarver MJ, Devarajan D, Whitten ST. Hydrodynamic radii of intrinsically disordered proteins determined from experimental polyproline II propensities. PLoS Comput Biol. 2016; 12:e1004686.
10.1371/journal.pcbi.1004686
Tsang B, Pritišanac I, Scherer SW, Moses AM, Forman-Kay JD. Phase separation as a missing mechanism for interpretation of disease mutations. Cell. 2020; 183: 1742–1756.
10.1016/j.cell.2020.11.050
UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021; 49: D480–D489.
10.1093/nar/gkaa1100
Uversky VN. Natively unfolded proteins: a point where biology waits for physics. Protein Sci. 2002; 11: 739–756.
10.1110/ps.4210102
Uversky VN, Kuznetsova IM, Turoverov KK, Zaslavsky B. Intrinsically disordered proteins as crucial constituents of cellular aqueous two phase systems and coacervates. FEBS Lett. 2015; 589: 15–22.
10.1016/j.febslet.2014.11.028
Vernon RM, Chong PA, Tsang B, Kim TH, Bah A, Farber P, et al. Pi-pi contacts are an overlooked protein feature relevant to phase separation. Elife. 2018; 7:e31486.
10.7554/eLife.31486
Wang J, Choi J-M, Holehouse AS, Lee HO, Zhang X, Jahnel M, et al. A molecular grammar governing the driving forces for phase separation of prion-like RNA binding proteins. Cell. 2018; 174: 688.e16–699.e16.
10.1016/j.cell.2018.06.006
Welch BL. The generalization of ‘Student's’ problem when several different population variances are involved. Biometrika. 1947; 34: 28–35.
10.1093/biomet/34.1-2.28
Wilkins DK, Grimshaw SB, Receveur V, Dobson CM, Jones JA, Smith LJ. Hydrodynamic radii of native and denatured proteins measured by pulse field gradient NMR techniques. Biochemistry. 1999; 38: 16424–16431.
10.1021/bi991765q
Wuttke R, Hofmann H, Nettels D, Borgia MB, Mittal J, Best RB, et al. Temperature-dependent solvation modulates the dimensions of disordered proteins. Proc Natl Acad Sci U S A. 2014; 111: 5213–5218.
10.1073/pnas.1313006111
Yamazaki H, Takagi M, Kosako H, Hirano T, Yoshimura SH. Cell cycle-specific phase separation regulated by protein charge blockiness. Nat Cell Biol. 2022; 24: 625–632.
10.1038/s41556-022-00903-1
Zeng X, Holehouse AS, Chilkoti A, Mittag T, Pappu RV. Connecting coil-to-globule transitions to full phase diagrams for intrinsically disordered proteins. Biophys J. 2020; 119: 402–418.
10.1016/j.bpj.2020.06.014
Zhang Y, Narlikar GJ, Kutateladze TG. Enzymatic reactions inside biological condensates. J Mol Biol. 2021; 433:166624.
10.1016/j.jmb.2020.08.009


Volume 32, Issue 9, September 2023, e4756

This article also appears in:

Tools for Protein Science 2024

ParSe 2.0: A web tool to identify drivers of protein phase separation at the proteome level
