In deep learning, a major difficulty in identifying suicidality and its risk factors in clinical notes is the lack of training samples given the small number of true positive instances among the number of patients screened. This paper describes a novel methodology that identifies suicidality in clinical notes by addressing this data sparsity issue through zero-shot learning. Our general aim was to develop a tool that leveraged zero-shot learning to effectively identify suicidality documentation in all types of clinical notes.

Methods

US Veterans Affairs clinical notes served as data. The training data set label was determined using diagnostic codes of suicide attempt and self-harm. We used a base string associated with the target label of suicidality to provide auxiliary information by narrowing the positive training cases to those containing the base string. We trained a deep neural network by mapping the training documents’ contents to a semantic space. For comparison, we trained another deep neural network using the identical training data set labels, and bag-of-words features.

Results

The zero-shot learning model outperformed the baseline model in terms of area under the curve, sensitivity, specificity, and positive predictive value at multiple probability thresholds. In applying a 0.90 probability threshold, the methodology identified notes documenting suicidality but not associated with a relevant ICD-10-CM code, with 94% accuracy.

Conclusion

This method can effectively identify suicidality without manual annotation.

Key points

Due to data sparsity, suicidality documentation is difficult to identify in clinical notes. Zero-shot learning addresses the data sparsity issue. Zero-shot learning enables identification of suicidality and its risks in clinical notes, and associated patients, where no diagnostic code relevant to suicide or self-harm has been recorded.

1 INTRODUCTION

Suicide (the act of killing oneself)¹ is a significant problem in the United States. The suicide rate increased 35.2% from 1999 to 2018, from 10.5 to 14.2 suicides per 100,000 individuals.² In 2021, 48,183 people died from suicide, and there were approximately 1.7 million suicide attempts in the United States.³ By 2013, the mainstream press reported that suicide rates were “sharply rising,”⁴ and by 2016, it was noted that US suicide rates had already reached a 30-year high.⁵ Suicide is a complicated problem that includes a dynamic web of individual-level risk factors (e.g., depression, substance use behaviors, personality traits), interpersonal risk factors (e.g., violence, victimization), and community-level factors (e.g., unemployment, stigmatization of mental illness).^{6, 7}

Veterans are especially affected by suicide, with an age- and sex-adjusted rate that is 1.5 times higher than nonveterans.⁸ The Department of Veterans Affairs (VA) operates the single largest integrated health care system in the United States, and has devoted resources to suicide prevention, including the Suicide Prevention Applications Network (SPAN), embedding suicide prevention coordinators and special reporting measures in facilities,⁹ increased mental health staffing, partnerships with community care organizations, and enhanced surveillance and monitoring through its electronic health record (EHR) system.^{10, 11} Additionally, the VA has continual efforts to develop predictive analytics to identify patients at the highest risk of suicide.^{9, 12} The data elements for these predictive analytic algorithms rely on structured data (e.g., International Classification of Disease [ICD] diagnosis codes, prescription data, socio-demographic data, care utilization metrics)¹³ which often provide an incomplete record.^{14, 15} Less is known about how unstructured data, such as contained in clinical notes, can contribute to suicidality (i.e., suicidal ideation or attempt) identification and prevention. Given that a suicide attempt is one of the greatest risk factors for subsequent suicide death, a more thorough means of detecting such events is warranted.¹⁶

1.1 Background and significance

Natural language processing (NLP) combined with machine learning may add value to suicide documentation research. Supervised machine learning methods use “supervised,” or preclassified, data. However, naïve attempts at note retrieval using keyword search alone quickly demonstrate the difficulty of this problem, as words such as “suicide” occur in standard questionnaires that are included in many notes, with few actually documenting suicidality. For instance, in a prior experiment we carried out, we randomly collected 1000 VA notes containing the term “suicidal” or “suicide” from 1000 individual patients and performed manual chart review for affirmed suicidality. Only 1.57% of these notes documented actual suicidality. Patient reluctance to disclose suicidal ideation provides a further complicating factor.¹⁷ As a result, a patient's negative response to a suicide ideation inquiry may not reflect their real feelings or intentions. Additionally, relying on structured data alone will result in incomplete identification of patients who have experienced or are experiencing suicidality, because relevant coding is prone to underuse.⁹ Conversely, not all clinical notes associated with relevant structured data document suicidality. For example, a note documenting a secondary service such as group therapy, or a note documenting fluid intake, may not directly document suicidality.

Prior attempts to apply NLP and machine learning are often limited to mental health-oriented notes and may suffer if using imbalanced data. Levis et al.¹⁸ applied sentiment analysis and various machine learning algorithms to classify suicide, using VA psychotherapy notes, yielding area under the curve (AUC) ratings comparable to chance. Fernandes et al.¹⁹ obtained excellent NLP performance in their study of clinical notes from the Clinical Record Initiative Search (CRIS), but performance was computed after removing neutral (non-suicide) results from their machine learning output. Carson et al. enriched notes associated with suicide attempt that were then used to train a random forest model achieving 83% sensitivity, but only 22% specificity.²⁰ Cook et al.²¹ applied a bag-of-words approach with machine learning to identify suicide ideation and psychiatric symptoms using notes for patients identified as having performed self-harm, achieving 61% PPV (positive predictive value), 59% sensitivity, and 60% specificity, with results varying depending on the task. Zhang et al. sought to identify psychological stressors using a preannotated data set of psychiatric evaluation records from the CEGS N-GRID 2016 challenge²² as a gold standard, for a conditional random fields machine learning model,²³ yielding final F scores of 73.91% and 89.01%, respectively, on exact and inexact stressor matching, and 97.73% and 100% respectively, for exact and inexact suicide recognition on instances of the positive keywords with the stressors; however, their evaluation methods for this are not detailed.

Zhong et al. applied structured data and NLP to identify suicidal behavior in pregnant women, achieving PPV of 76% and 30%, for women identified through relevant diagnostic codes and through NLP for women not receiving a relevant diagnostic code, respectively.²⁴ Obeid et al.²⁵ trained a convolutional neural network that achieved an AUC of 0.882 and an F1 score of 0.769 in predicting relevant suicide ICD codes in subsequent years. Using notes from psychiatric encounters, Cusick et al.²⁶ developed a rule-based NLP tool to identify positive instances of suicide-oriented keywords that leveraged NegEx.²⁷ They also developed different weakly supervised machine learning models. A convolutional neural network receiving Word2Vec²⁸ word embeddings as input achieved precision, recall, F1 score, and AUC values of 0.81, 0.83, 0.82, and 0.946, respectively. In a subsequent evaluation, the convolutional neural network correctly classified 87% of the 23 notes (of 5000 clinical notes) receiving a positive classification, from notes for patients diagnosed with depression or prescribed an antidepressant. Recently, Rozova et al. obtained promising results (87% AUC) using a gradient boosting model, although the study was limited to emergency room triage notes.²⁹

Seeking suicidality in all types of clinical notes, among all types of patients, or when hampered by imbalanced data, that is, when there are few positive examples, is indeed a complex task. Some of the methods in the papers cited above tend to suffer from low precision, specificity, and possibly also low sensitivity (recall). Identifying probability thresholds addresses these problems, providing flexibility for a given task. A strategic implementation of a technique like zero-shot learning (ZSL) may provide accurate identification of suicidality in clinical notes.

1.2 ZSL

ZSL enables predictions on unseen data using a model trained on data that has labels that are different than those of the unseen data.^{30, 31} It largely operates by mapping select properties of the data (i.e., the “feature space”) to a semantic representation (i.e., the “semantic space”) that enables prediction of unseen classes.³² In other words, auxiliary information must be provided on the labels of the unseen classes to make it possible for a trained model to recognize them in the testing data.

ZSL has been applied in several computer vision tasks,^{33-36} as well as NLP tasks.³⁷ Accordingly, a feature space can consist of data derived from images³⁸ or text.³⁷ The semantic representation can be based on several different approaches, including data attributes, semantic word vectors such as those provided by skip-gram or continuous-bag-of-words architectures³² or BERT output,³⁹ or knowledge graphs.³² Examples in NLP applications include semantic utterance classification,⁴⁰ multilingual translation,⁴¹ and emotion detection.⁴² However, other than Sivarajkumar and Wang's work,³⁹ there is little ZSL research in unstructured clinical text data.

1.3 Objectives

We investigated a ZSL methodology applied to a binary suicidality classification task. The training data set was constructed using diagnostic codes (ICD-10-CM codes) related to suicide. Our target label is the broader concept of suicidality. To enable ZSL, a base string representing suicidality was selected. We then built the semantic space by identifying key features associated with suicidality in the training data set. A DNN model was developed using the training data and tested on two different sets of unseen data with the unseen label of suicidality. Specifically, we sought to answer:

Will ZSL effectively identify suicidality documentation from among all types of clinical notes, using review by clinicians as the reference standard?
Will ZSL effectively identify suicidality or suicide risk documentation from among clinical notes not associated with a relevant ICD-10-CM code, by probability threshold, in terms of precision, using the same reference standard?

We are unaware of previous descriptions of this methodology and to our knowledge it has not been used before this study.

2 METHODS

2.1 Ethical approval

This project was carried out in support of the Understanding Suicide Risks among LGBT Veterans in VA Care study, which was approved by the VA Central IRB #01955. No patients were contacted; only data from the EHR database were used.

2.2 Training data

A training data set was created using two corpora. The first corpus consisted of 50,000 randomly selected VA clinical notes from outpatient encounters recorded between 2016 and 2019 which contained the base string “suicid” (e.g., “suicide,” “suicidal”) and were associated with at least one ICD-10-CM⁴³ code identified by the National Health Statistics Report from the Centers for Disease Control and Prevention (CDC) indicating suicide attempt or intentional self-harm.⁴⁴ This corpus is referred to as stringAndDx (9170 unique patients). The second corpus consisted of 50,000 randomly selected VA clinical notes from outpatient encounters recorded between 2016 and 2019 that were associated with other ICD-10-CM codes that were irrelevant to suicidality or self-harm. These notes were extracted from patients matching the stringAndDx patients in age (at the time of document retrieval), race, and ethnicity. This second corpus is referred to as noDx (8638 unique patients). Each corpus was preprocessed by transforming all letters to lower case, removing basic formatting markup and punctuation, separating character strings into tokens (words), separating relevant concatenated tokens (e.g., “suicidalhomicidal” to “suicidal” “homicidal”), and removing all tokens that did not entirely consist of letters.
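The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration and not the study's actual code: the function name `preprocess` and the lookup table `CONCATENATED` are hypothetical, the table contains only the one concatenation example given in the text, and the removal of formatting markup is not modeled.

```python
import re

# Hypothetical lookup of concatenated tokens to split. The text gives only
# "suicidalhomicidal" as an example; a real list would be longer.
CONCATENATED = {"suicidalhomicidal": ["suicidal", "homicidal"]}


def preprocess(text):
    """Lowercase, strip punctuation, tokenize, split known concatenations,
    and keep only tokens that consist entirely of letters."""
    text = text.lower()
    # Replace every character that is not a letter, digit, or whitespace
    # with a space, so punctuation never glues two words together.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = []
    for tok in text.split():
        # Split a known concatenation, otherwise keep the token as it is.
        tokens.extend(CONCATENATED.get(tok, [tok]))
    # Drop tokens containing digits (for example "x2").
    return [t for t in tokens if t.isalpha()]


print(preprocess("Pt denies SuicidalHomicidal ideation; screen x2."))
```

Splitting concatenations before the letters-only filter matters only for tokens that would otherwise survive as a single unknown word; the order of the other steps follows the description in the text.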

2.3 Semantic space feature extraction and mapping

The task to build the semantic space consisted of four steps: First, we identified a list of features that were potentially relevant for the positive training label. Second, word embeddings were created using a skip-gram architecture. Third, we identified context words of the selected features using the word embeddings. In the fourth step, a contextual weight was assigned to each feature for each document in mapping the semantic space to the feature space.

In the first step, a term frequency–inverse document frequency (TFIDF) analysis was used to identify the n most important terms in each corpus. For this investigation, n = 1000. TFIDF weights the frequency of a term by the count of documents containing that term. In each document, the relative frequency of each term is weighted by the log of the number of documents in the corpus divided by the number of documents containing the term, as shown in the equation below:

t_{i,j} = tf_{i,j} \cdot \log\left(\frac{n}{df_{i}}\right) \qquad (1)

where t_{i,j} is the TFIDF weight of term i in document j, tf_{i,j} is the relative frequency of term i in document j, n is the total number of documents, and df_i is the number of documents containing term i. Because TFIDF is a document-based measurement, we used the mean TFIDF value for each term in its respective corpus. The words with the top TFIDF scores that are unique to the stringAndDx corpus were treated as features. Figure 1 illustrates this process. Each circle represents terms from one of the corpora. Sets a and b are the words with the top n TFIDF scores for stringAndDx and noDx, respectively. Set c is the overlap between a and b. The feature set F contains words that are in set a, but not in the overlap set c or in set b (f ∈ a and f ∉ c and f ∉ b).

Figure 1. Feature identification. Words that are deemed as features are in set a, excluding words in c and b.
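The first step can be illustrated with a short Python sketch. It is a simplified reading of the description and not the study's implementation: the function names are hypothetical, the mean is taken over all documents in the corpus (a document that lacks a term contributes zero, which is an assumption), and ties in the ranking are broken alphabetically for reproducibility.

```python
import math
from collections import Counter


def mean_tfidf(docs):
    """Mean TFIDF per term over a corpus of tokenized documents.
    Assumption: documents lacking a term contribute 0 to that term's mean."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    totals = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)  # relative frequency within the document
            totals[term] += tf * math.log(n / df[term])
    return {term: total / n for term, total in totals.items()}


def top_terms(scores, n_top):
    """The n_top terms with the highest scores (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:n_top])


def select_features(string_and_dx_docs, no_dx_docs, n_top=1000):
    """Feature set F: top terms of stringAndDx (set a) minus top terms of noDx (set b)."""
    a = top_terms(mean_tfidf(string_and_dx_docs), n_top)
    b = top_terms(mean_tfidf(no_dx_docs), n_top)
    return a - b


# Toy corpora, invented for illustration only.
pos = [["overdose", "pills", "suicide"], ["suicide", "gun", "note"]]
neg = [["suicide", "note", "visit"], ["visit", "note", "flu"]]
print(sorted(select_features(pos, neg, n_top=10)))
```

Because set c is by definition contained in set b, the set difference `a - b` already excludes the overlap.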

In the second step, we created a Word2Vec model using the stringAndDx corpus. In this study, the model was a shallow neural network with the hidden layer containing 300 nodes, applying the skip-gram architecture, with an analytic window size of 5, trained through 10 iterations.

In the third step, the top m context words for each feature word were identified using the word embeddings from the Word2Vec model. The m words most similar to each feature word, according to cosine similarity values, served as its context words. In this investigation, m = 50.
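As a rough illustration of the third step, the sketch below ranks candidate words by cosine similarity to a feature word and keeps the top m. It assumes the trained embeddings are available as a plain dictionary from word to vector; the function names and the toy vectors are hypothetical and stand in for the Word2Vec model described above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def context_words(feature, embeddings, m=50):
    """Map each of the m words most similar to `feature` to its cosine similarity."""
    target = embeddings[feature]
    sims = {
        word: cosine(target, vec)
        for word, vec in embeddings.items()
        if word != feature
    }
    # Highest similarity first; ties broken alphabetically.
    ranked = sorted(sims, key=lambda w: (-sims[w], w))
    return {w: sims[w] for w in ranked[:m]}


# Toy two-dimensional embeddings, invented for illustration only.
toy = {
    "pills": [1.0, 0.0],
    "handful": [0.9, 0.1],
    "took": [0.7, 0.7],
    "visit": [0.0, 1.0],
}
print(list(context_words("pills", toy, m=2)))
```

Returning the similarity values alongside the words is convenient because the fourth step reuses them when weighting each feature.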

In the fourth step, the feature space, that is, a document's preprocessed content, is mapped to the semantic space. A weight v is assigned to each feature word for each document, based on its occurrence with its context words in a window in the document's text. This weight is the sum, over all co-occurring context words, of the cosine similarity between the feature and the context word multiplied by the mean TFIDF value of the feature word. The formula is shown in the equation below.

v = \sum_{x \in F,\, y \in D} \mathrm{cosSim}(x, y) \cdot \mathrm{tfidf}(x) \qquad (2)

where x is a feature in F, the set of features in the semantic space, and y is a context word of set D, the context words for x in the semantic space, which occurs in a five-word window around x in the document's text. This process is illustrated in Figure 2, where “pattern” is a feature word, and “internalizing” and “fitful” are among its set of context words and appear in a five-word window.

If a feature word is not in the text, its value is zero for the given document.
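The weighting in the fourth step might be sketched as follows. Two points are assumptions, because the text does not settle them: the "five-word window" is read as up to five tokens on each side of the feature word, and every co-occurrence (including repeated ones) adds to the sum. The function names and the numeric values in the example are hypothetical.

```python
def feature_weight(tokens, feature, context_sims, feature_tfidf, window=5):
    """Weight of one feature word in one tokenized document.

    context_sims maps each context word of `feature` to its cosine
    similarity with the feature; feature_tfidf is the feature's mean TFIDF.
    """
    weight = 0.0
    for i, tok in enumerate(tokens):
        if tok != feature:
            continue
        # Assumed window: up to `window` tokens on each side of the feature.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in context_sims:
                weight += context_sims[tokens[j]] * feature_tfidf
    return weight


def document_vector(tokens, features, context, tfidf, window=5):
    """One weight per feature word; 0.0 when the feature is absent."""
    return [
        feature_weight(tokens, f, context[f], tfidf[f], window)
        for f in features
    ]


# Invented values, for illustration only.
tokens = "pt shows internalizing pattern of fitful sleep".split()
context = {"pattern": {"internalizing": 0.8, "fitful": 0.5}, "gun": {"firearm": 0.9}}
tfidf = {"pattern": 0.2, "gun": 0.3}
print(document_vector(tokens, ["pattern", "gun"], context, tfidf))
```

Under this reading, a feature word that appears without any of its context words nearby also receives a weight of zero, the same as a feature word that is absent from the text.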

2.4 Model development

The authors randomly selected 20,000 documents from each corpus (stringAndDx and noDx). These documents were used to train a DNN model (here referred to as the ZSL DNN) consisting of five fully connected hidden layers of alternating sizes of 30 or 70 nodes, with each layer implementing a dropout rate of 0.5. The Adam optimizer⁴⁵ was implemented, with a learning rate of 0.0012, beta 1 value of 0.92, beta 2 value of 0.9992, and an epsilon value of 1e-08, with binary cross entropy as the loss function, and the sigmoid function in the output layer, since it was a binary classification task. The architecture and hyperparameters were chosen on empirical grounds, after experimentation. Each document from the stringAndDx corpus was classified as “1” (a generic positive instance), and each document from the noDx corpus was classified as “0” (a generic negative instance). These labels do not indicate whether the given document directly pertains to suicidality or its risks. Instead, they indicate an association with a structured data element and, for documents labeled “1,” the presence of the base string. Balancing the training data in this manner (i.e., providing balanced training examples) addressed the problematic issue of otherwise training a model with few positive and many negative instances. We implemented a 60% training, 20% validation, and 20% testing split in developing the ZSL DNN. Figure 3 illustrates the method.
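The generic labeling and the 60/20/20 split can be sketched as below. This is only an illustration: the names `HYPERPARAMS` and `label_and_split` are hypothetical, the dictionary simply records the values stated in the text, and whether the original split was stratified or how it was seeded is not stated, so a single seeded shuffle is assumed. The order of the 30-node and 70-node layers is not given in the text and is therefore left out.

```python
import random

# Values stated in the text; the dictionary name and keys are hypothetical.
HYPERPARAMS = {
    "hidden_layers": 5,
    "dropout": 0.5,
    "learning_rate": 0.0012,
    "beta_1": 0.92,
    "beta_2": 0.9992,
    "epsilon": 1e-08,
    "loss": "binary cross entropy",
    "output_activation": "sigmoid",
}


def label_and_split(string_and_dx_docs, no_dx_docs, seed=0):
    """Label stringAndDx documents 1 and noDx documents 0, then split 60/20/20.

    Assumption: one seeded shuffle of the pooled documents, no stratification.
    """
    labeled = [(doc, 1) for doc in string_and_dx_docs]
    labeled += [(doc, 0) for doc in no_dx_docs]
    random.Random(seed).shuffle(labeled)
    n = len(labeled)
    n_train = n * 60 // 100  # integer arithmetic avoids float rounding
    n_val = n * 20 // 100
    train = labeled[:n_train]
    val = labeled[n_train:n_train + n_val]
    test = labeled[n_train + n_val:]
    return train, val, test


train, val, test = label_and_split(["pos"] * 10, ["neg"] * 10)
print(len(train), len(val), len(test))
```

With 20,000 documents per corpus, this arithmetic yields 24,000 training, 8,000 validation, and 8,000 testing documents.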

2.5 Evaluation

The authors retrieved 5000 different clinical notes recorded in 2020 that were associated with at least one of the relevant ICD-10-CM codes. This corpus is subsequently labeled as testSet1. The authors also randomly retrieved 5000 different clinical notes recorded in 2020 that were associated with other ICD-10-CM codes irrelevant to suicidality or self-harm. This corpus is subsequently labeled testSet2.

The contents of each of the notes in testSet1 and testSet2 were mapped to the semantic space, that is, deriving a weight for each feature word as described earlier in the fourth step. Then, the trained ZSL DNN was used to classify the notes in testSet1 and testSet2 as (a) containing suicidality documentation, or (b) not containing suicidality documentation.

In joint sessions, two clinical psychologists familiar with VA clinical note documentation together identified suicidality (i.e., current or past suicide ideation or attempt) in 200 notes randomly selected from testSet1 and testSet2 (100 from each test set), after being instructed to look for documentation for these specific events. They addressed differences of opinion through discussion and mutual consensus during the joint sessions. In a second evaluation, to explore how the application's output may serve to identify patients who had experienced or were at risk for suicidality, but who were never formally diagnosed as such, the clinicians examined the testSet2 notes containing the base string “suicid” that received a probability value of 0.90 or greater from the trained ZSL DNN, for documentation of suicidality and/or its risk factors, according to NIH guidelines.⁴⁶ This threshold was chosen to explore how high-probability documents (i.e., the top 10% in terms of probability) would be representative in identifying documented suicidality or its risk factors with high precision, thus addressing our second question.

2.5.1 Baseline comparison

For comparative purposes, we identified the 163 most frequent bigrams unique to the stringAndDx corpus and used them to develop a bag-of-words baseline model. We trained a DNN (here referred to as the Baseline DNN) using these 163 bigrams as features for the 20,000 stringAndDx documents and the 20,000 noDx documents. This baseline DNN was also used to classify the notes in testSet1 and testSet2, for (a) containing suicidality documentation, or (b) not containing suicidality documentation, using the 163 most frequent bigrams as features.

We evaluated performance of the ZSL DNN and the Baseline DNN by measuring AUC, sensitivity/recall, specificity, and precision/PPV. These metrics are standard to NLP assessment.
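For readers less familiar with AUC, the sketch below computes it directly from its pairwise definition: the fraction of (positive, negative) note pairs in which the positive note receives the higher probability, with ties counted as one half. The function name and the toy data are hypothetical; the study does not state which software was used to compute its metrics.

```python
def auc_score(labels, probs):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for notes judged to document suicidality, 0 otherwise.
    probs:  model probabilities for the positive class.
    Returns NaN when either class is absent.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy reference labels and probabilities, invented for illustration only.
print(auc_score([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The quadratic loop is adequate for a 200-note review set; a rank-based formulation would be preferable for large samples.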

3 RESULTS

The first step of the new method (described in Section 2) identified 163 feature words associated with suicidality diagnosis. The top thirty feature words are listed in Table 1. No form of the base string “suicid” was found among the 163 final feature words. Both “suicide” and “suicidal” were prominent terms in both the noDx and stringAndDx corpora, along with terms like “psychiatrist” and “psychosocial”; this is likely due to the proliferation of objects like questionnaires, and mental health care documentation in notes that are unrelated to suicidality.

Table 1. Top 30 feature words for zero-shot learning (ZSL) model.

flag	overdose	coordinator	took	spc
observation	called	warning	pills	prf
unknown	interrupted	gun	placement	lcsw
lethal	outcome	reportedly	notified	sdv
occurred	police	protocol	od	supports
seeking	category	preparatory	cut	determined

3.1 ZSL DNN and baseline DNN performance

The classifications by the clinicians and the probabilities assigned by the ZSL DNN and the Baseline DNN were first assessed by AUC score. The results are in Table 2 and Figure 4.

Table 2. AUC performance of models.

Zero-shot learning (ZSL) deep neural network (DNN)	Baseline DNN
0.946	0.47

In terms of AUC, the ZSL DNN trained through mapping the semantic space to the feature space outperformed the Baseline DNN trained with the bigram bag-of-words features.

The sensitivity, specificity, and PPV results at 0.15, 0.5, and 0.85 probability thresholds for each DNN are in Tables 3–5. Probability refers to the probability the DNN assigned to each note for positive suicidality documentation. We applied the median probability (0.1499, rounded) assigned by the ZSL DNN to the testSet2 documents (the test set containing random notes associated with irrelevant ICD-10-CM codes) in forming minimum and maximum thresholds; 0.5 is a standard midpoint probability threshold. The combined scores in these tables were computed with all true positives, true negatives, false positives, and false negatives for both test sets, for the indicated metrics. Values of NaN (not a number) occurred where there were no true positives or false positives.
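The thresholded metrics and the pooling used for the combined rows can be sketched as follows. This is an illustration under stated assumptions: a note is treated as positive when its probability is at or above the threshold (the text specifies "at or above" only for the 0.90 threshold of the second evaluation), and all function names and the toy data are hypothetical.

```python
import math


def confusion(labels, probs, threshold):
    """Counts (tp, fp, tn, fn); a note is positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Division that yields NaN instead of raising on a zero denominator."""
    return numerator / denominator if denominator else float("nan")


def metrics(tp, fp, tn, fn):
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


def combined_metrics(test_sets, threshold):
    """Pool the confusion counts of several (labels, probs) sets, then score."""
    totals = [0, 0, 0, 0]
    for labels, probs in test_sets:
        for k, count in enumerate(confusion(labels, probs, threshold)):
            totals[k] += count
    return metrics(*totals)


# Toy test sets, invented for illustration only.
set1 = ([1, 1, 1, 0], [0.9, 0.6, 0.2, 0.7])
set2 = ([0, 0, 0, 1], [0.1, 0.3, 0.6, 0.95])
print(combined_metrics([set1, set2], threshold=0.5))
```

Pooling the counts before computing the ratios is not the same as averaging the per-set percentages, which is why a combined row need not lie midway between the two test-set rows.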

Table 3. Evaluation results at 0.15 probability threshold.

Zero-shot learning (ZSL) deep neural network (DNN)	Sensitivity/Recall (%)	Specificity (%)	Precision/positive predictive value (PPV) (%)
testSet1	97	100	91
testSet2	100	64	5
Combined	97	59	67
Baseline DNN
testSet1	99	0	90
testSet2	50	9	1
Combined	98	8	48

Table 4. Evaluation results at 0.5 probability threshold.

Zero-shot learning (ZSL) deep neural network (DNN)	Sensitivity/Recall (%)	Specificity (%)	Precision/positive predictive value (PPV) (%)
testSet1	92	40	93
testSet2	50	97	25
Combined	91	92	90
Baseline DNN
testSet1	92	0	89
testSet2	50	10	1
Combined	91	9	46

Table 5. Evaluation results at 0.85 probability threshold.

Zero-shot learning (ZSL) deep neural network (DNN)	Sensitivity/Recall (%)	Specificity (%)	Precision/positive predictive value (PPV) (%)
testSet1	77	70	96
testSet2	50	100	100
Combined	76	97	96
Baseline DNN
testSet1	0	100	NaN/div by 0
testSet2	0	100	NaN/div by 0
Combined	0	100	NaN/div by 0

The ZSL DNN outperformed the Baseline DNN in most metrics at all probability thresholds.

3.2 Second evaluation

To explore how this new methodology can identify clinical notes documenting suicidality that are not associated with a relevant ICD-10-CM code with high precision, the clinicians also reviewed the 16 notes from testSet2 containing the base string “suicid” that received a probability at or above 0.90 from the trained ZSL DNN. The clinicians noted suicide ideation or attempt, and the presence of the following suicide risk factors, based on National Institute of Mental Health guidelines⁴⁶:

Depression and other mental health disorders
Substance abuse disorder
Family history of a mental health or substance abuse disorder
Family history of suicide
Family violence, including physical or sexual abuse
Having guns or other firearms in the home
Being in prison or jail
Being exposed to others' suicidal behavior

Of these 16 clinical notes (associated with 16 different patients), seven documented current or past suicide ideation or attempt. Eight of the remaining notes included one or more risk factors for suicide (nearly all included multiple risk factors). In all, 15 of the 16 notes contained documentation of current or past suicide ideation or attempt, and/or suicide risk factors, for patients who had never received a suicidality ICD-10-CM code diagnosis during the study period, achieving a PPV of 93.8%.

4 DISCUSSION

Regarding the study's original questions, our ZSL approach effectively identified suicidality in all types of clinical notes, surpassing the performance of the bag-of-words baseline in conjunction with deep learning. It also effectively identified suicidality or suicide risk documentation from among clinical notes not associated with a relevant ICD-10-CM code, with high precision at a high probability threshold.

4.1 Semantic space

In this work, the semantic space development is framed as feature extraction where mapping is enhanced by attaching weights to features found in the data, an approach also used in computer vision ZSL.⁴⁷ The semantic space captures natural data properties by identifying salient terms and relevant contextual terms in collective clinical suicidality documentation (i.e., a corpus of notes associated with relevant ICD codes). Table 1 lists 30 prominent feature words associated with collective suicidality documentation after removing terms associated with other kinds of documents. There is an intuitive sense to these words; “flag” is found in the phrase “high risk for suicide flag”; “overdose” and “cut” refer to suicide methods; “pills” and “gun” refer to suicide instruments. Identifying terms contextually similar to these provides patterns in relevant documentation. Again, this has an intuitive logic. The most contextually similar terms to “flag” include “reactivate” and “deactivate” (for a high suicide risk flag) and “high” (the level of risk). The most contextually similar terms to “pills” include “handful,” “fistfuls,” and “bunch,” implying large quantities, along with “overdosing” and “took,” the associated actions. The feature word “spc” indicates VA's suicide prevention coordinators, which is a structural change that VA implemented for suicide prevention.¹⁰ Concordantly, “police” and “lcsw” (i.e., licensed clinical social worker) refer to other professions highly associated with individuals at risk for suicide. For example, police may be activated for a rescue, and a licensed clinical social worker may be involved in treatment planning or referral connections for suicidal individuals. The feature words “prf” and “sdv” refer to “patient record flag” and “self-directed violence,” respectively. The semantic space provided an efficient representation for effective mapping to the feature space.

4.2 Data retrieval and model training

Using associated structured data elements, such as ICD-10-CM codes, together with a base string provides a means to locate equally sized corpora for training that could be generically labeled “0” or “1.” These labels were primarily based on a structured data association, since their individual unstructured content was mostly unknown. This approach solves the issue of imbalanced training data. The predominant clinical note types (Appendix) also illustrate this. Most of the frequent note types associated with one of the relevant CDC ICD-10-CM codes and containing the base string are relevant to suicidality. Addendum is a common note type⁴⁸ associated with many domains.⁴⁹ The most frequent note types not associated with a relevant code resemble frequencies of all note types in the VA.⁴⁸

4.3 Identifying suicidality documentation

To our knowledge, this method has not been applied in other studies. Unlike VA surveillance methods using structured data, it also leverages information found in EHR notes. Also, unlike other NLP methods,^{18, 20, 21, 23, 24, 26, 29, 50} it can be applied to all patients and note types. In other studies, a bag-of-words approach has been applied to suicidality identification and other machine-learning tasks.^{21, 51, 52} However, the results of this current study suggest that the complexity of suicidality documentation demands a more targeted approach.

This method could complement existing measures like SPAN, alerting suicide prevention coordinators of additional patients at risk. The results of the two clinical psychologists' evaluations demonstrate the method's efficiency in identifying suicidality documentation for documents where there is no relevant ICD-10-CM code. The performance on both test sets demonstrates the methodology's effectiveness in classifying notes that are mixed in terms of ICD-10-CM coding.

Tables 3–5 suggest that the probability threshold can be adjusted to suit a specific task like finding suicidality and its risk factors with high precision among notes not associated with a relevant ICD-10-CM code. This is especially true considering the small prevalence of suicidality documentation in clinical notes. The second evaluation (which yielded 93.8% PPV) demonstrates this. By applying a high probability threshold of 0.90 to all 5000 testSet2 documents and focusing on clinical notes containing the base string, of the 16 documents (for 16 different patients), 94% contained suicidality and/or suicidality risk factor documentation, based on clinician review. These results exceed those of Cusick et al.'s²⁶ similar task, where 87% of notes were correctly classified, among notes for patients diagnosed with depression or prescribed an antidepressant. In this current study's second evaluation, none of the 16 patients identified had ever received a suicide ICD-10-CM code during the study's time period. It is impossible to know if the patients in the 8 notes simply containing documented risk factors were suicidal or not based solely on EHRs. Suicidal patients sometimes deny suicide ideation or attempt.^{53, 54} For example, in one note from the chart review associated with a relevant ICD-10-CM code, the patient reportedly denied suicide ideation, even after checking into the hospital hours earlier for a self-reported suicide attempt.

4.4 Future work

This work is part of a larger study of patients at risk for suicide.⁵⁵ The next step is to combine these findings with prior work. We also plan an analysis of patients from first suicide ideation or attempt documented in the VA system, to understand their evolution of care.

4.5 Limitations

VHA data largely cover a population of older men. However, the number of women and younger patients is increasing, thus also increasing the generalizability of these findings. The corpora retrieval method we used to train the ZSL DNN is dependent on clinicians' use of the relevant ICD-10-CM codes in documenting care, which may be prone to underuse.⁹ However, the results of this study indicate the method's utility. Due to environmental computational limitations, we randomly selected 20,000 notes from the stringAndDx corpus, and 20,000 notes from the noDx corpus for training the ZSL DNN.

5 CONCLUSION

We developed a new methodology to identify suicidality in clinical notes using zero-shot learning (ZSL). A trained ZSL deep neural network (DNN) outperformed a DNN trained using a baseline bag-of-words method in AUC scores and other metrics assessed at various probability thresholds on unseen data, according to expert review. This novel methodology identifies suicidality and its risk factors with high precision, when applying a 0.90 probability threshold, in VA clinical notes not associated with a relevant ICD-10-CM code. This methodology could complement existing suicidality identification measures. These findings hold promise for future research.

AUTHOR CONTRIBUTIONS

Terri Elizabeth Workman: Conceptualization; data curation; formal analysis; investigation; methodology; project administration; software; validation; visualization; writing—original draft; writing—review and editing. Joseph Goulet: Funding acquisition; supervision; writing—original draft. Cynthia Brandt: Supervision; writing—original draft. Allison Warren: Formal analysis; validation; writing—original draft. Jacob Eleazer: Formal analysis; validation; writing—original draft. Melissa Skanderson: Data curation; writing—original draft. Luke Lindemann: Writing—original draft. John R. Blosnich: Writing—original draft. John O'Leary: Data curation; writing—original draft. Qing Zeng-Treitler: Conceptualization; formal analysis; funding acquisition; supervision; validation; writing—original draft; writing—review and editing.

ACKNOWLEDGMENTS

This work was funded by Veterans Affairs Health Services Research and Development Services grant IIR 18-035 Understanding Suicide Risks among LGBT Veterans in VA Care, and NIH National Center for Advancing Translational Sciences grant UL1TR001876. The views expressed are those of the authors and do not necessarily reflect those of the Department of Veterans Affairs, the United States Government, or the academic affiliate institutions.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflict of interest.

TRANSPARENCY STATEMENT

The lead author Terri Elizabeth Workman affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

APPENDIX

See Table A1.

Table A1. Most frequent clinical note types in the training data, by corpus.

stringAndDx		noDx
Note type	Count	Note type	Count
Addendum	2844	Addendum	5683
Suicide behavior and report	843	Primary care secure messaging	291
Suicide prevention telephone note	811	Nursing note	228
Suicide behavior and overdose report	613	Administrative note	207
Suicide prevention note	452	State prescription drug monitoring program	110
Suicide prevention safety plan	448	Care flow sheet	88
Mental health nursing assessment note	374	Telephone contact	75
Veterans crisis line note	222	Mental health diagnostic study note	71
Social work note	213	Non-VA care consult result note	69
Suicide prevention contact	212	Operation report	64

DATA AVAILABILITY STATEMENT

Because the data used in this study include protected health information, they are not publicly available.

REFERENCES

1 Suicide. National Library of Medicine/National Institutes of Health. https://www.ncbi.nlm.nih.gov/mesh/68013405
2 Hedegaard H, Curtin SC, Warner M. Suicide mortality in the United States, 1999–2019. NCHS Data Brief No. 398; 2021. doi:10.15620/cdc:101761
3 Suicide Statistics. 2023. https://afsp.org/suicide-statistics
4 Parker-Pope T. Suicide rates rise sharply in U.S. The New York Times. 2013. https://www.nytimes.com/2013/05/03/health/suicide-rate-rises-sharply-in-us.html#:~:text=Suicide%20Rates%20Rise%20Sharply%20in%20U.S.,-Give%20this%20article&text=Suicide%20rates%20among%20middle%2Daged,vulnerable%20to%20self%2Dinflicted%20harm
5 Tavernise S. U.S. suicide rate surges to a 30-year high. The New York Times. 2016. https://www.nytimes.com/2016/04/22/health/us-suicide-rate-surges-to-a-30-year-high.html#:~:text=WASHINGTON%20%E2%80%94%20Suicide%20in%20the%20United,was%20particularly%20steep%20for%20women
6 Hawton K, Casañas i Comabella C, Haw C, Saunders K. Risk factors for suicide in individuals with depression: a systematic review. J Affect Disord. 2013; 147(1-3): 17-28. doi:10.1016/j.jad.2013.01.004
7 Stack S. Suicide: a 15-year review of the sociological literature. Part I: cultural and economic factors. Suicide Life Threat Behav. 2000; 30(2): 145-162. doi:10.1111/j.1943-278X.2000.tb01073.x
8 2019 National Veteran Suicide Prevention Annual Report; 2019. https://www.mentalhealth.va.gov/docs/data-sheets/2019/2019_National_Veteran_Suicide_Prevention_Annual_Report_508.pdf
9 Hoffmire C, Stephens B, Morley S, Thompson C, Kemp J, Bossarte RM. VA suicide prevention applications network: a national health care system-based suicide event tracking system. Public Health Rep. 2016; 131(6): 816-821. doi:10.1177/0033354916670133
10 Carroll D, Kearney LK, Miller MA. Addressing suicide in the veteran population: engaging a public health approach. Front Psychiatry. 2020; 11:569069. doi:10.3389/fpsyt.2020.569069
11 Katz I. Lessons learned from mental health enhancement and suicide prevention activities in the Veterans Health Administration. Am J Public Health. 2012; 102(suppl 1): S14-S16. doi:10.2105/AJPH.2011.300582
12 McCarthy JF, Bossarte RM, Katz IR, et al. Predictive modeling and concentration of the risk of suicide: implications for preventive interventions in the US department of veterans affairs. Am J Public Health. 2015; 105(9): 1935-1942. doi:10.2105/AJPH.2015.302737
13 Kessler RC, Hwang I, Hoffmire CA, et al. Developing a practical suicide risk prediction model for targeting high-risk patients in the Veterans Health Administration. Int J Methods Psychiatr Res. 2017; 26(3):e1575. doi:10.1002/mpr.1575
14 Kong HJ. Managing unstructured big data in healthcare system. Healthc Inform Res. 2019; 25(1): 1-2. doi:10.4258/hir.2019.25.1.1
15 Moss J, Andison M, Sobko H. An analysis of narrative nursing documentation in an otherwise structured intensive care clinical information system. AMIA. 2007; 2007: 543-547.
16 Bostwick JM, Pabbati C, Geske JR, McKean AJ. Suicide attempt as a risk factor for completed suicide: even more lethal than we knew. Am J Psychiatry. 2016; 173(11): 1094-1100. doi:10.1176/appi.ajp.2016.15070854
17 Louzon SA, Bossarte R, McCarthy JF, Katz IR. Does suicidal ideation as measured by the PHQ-9 predict suicide among VA patients? Psychiatr Serv. 2016; 67(5): 517-522. doi:10.1176/appi.ps.201500149
18 Levis M, Leonard Westgate C, Gui J, Watts BV, Shiner B. Natural language processing of clinical mental health notes may add predictive value to existing suicide risk models. Psychol Med. 2020; 51(8): 1382-1391. doi:10.1017/S0033291720000173
19 Fernandes AC, Dutta R, Velupillai S, Sanyal J, Stewart R, Chandran D. Identifying suicide ideation and suicidal attempts in a psychiatric clinical research database using natural language processing. Sci Rep. 2018; 8(1): 7426. doi:10.1038/s41598-018-25773-2
20 Carson NJ, Mullin B, Sanchez MJ, et al. Identification of suicidal behavior among psychiatrically hospitalized adolescents using natural language processing and machine learning of electronic health records. PLoS One. 2019; 14(2):e0211116. doi:10.1371/journal.pone.0211116
21 Cook BL, Progovac AM, Chen P, Mullin B, Hou S, Baca-Garcia E. Novel use of natural language processing (NLP) to predict suicidal ideation and psychiatric symptoms in a text-based mental health intervention in Madrid. Comput Math Methods Med. 2016; 2016: 1-8. doi:10.1155/2016/8708434
22 Uzuner Ö, Stubbs A, Filannino M. A natural language processing challenge for clinical records: research domains criteria (RDoC) for psychiatry. J Biomed Inf. 2017; 75S: S1-S3. doi:10.1016/j.jbi.2017.10.005
23 Zhang Y, Zhang OR, Li R, et al. Psychiatric stressor recognition from clinical notes to reveal association with suicide. Health Informatics J. 2019; 25(4): 1846-1862. doi:10.1177/1460458218796598
24 Zhong Q-Y, Karlson EW, Gelaye B, et al. Screening pregnant women for suicidal behavior in electronic medical records: diagnostic codes vs. clinical notes processed by natural language processing. BMC Med Inform Decis Mak. 2018; 18(1): 30. doi:10.1186/s12911-018-0617-7
25 Obeid JS, Dahne J, Christensen S, et al. Identifying and predicting intentional self-harm in electronic health record clinical notes: deep learning approach. JMIR Med Inform. 2020; 8(7):e17784. doi:10.2196/17784
26 Cusick M, Adekkanattu P, Campion, Jr. TR, et al. Using weak supervision and deep learning to classify clinical notes for identification of current suicidal ideation. J Psychiatr Res. 2021; 136: 95-102. doi:10.1016/j.jpsychires.2021.01.052
27 Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inf. 2001; 34(5): 301-310. doi:10.1006/jbin.2001.1029
28 Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781; 2013. doi:10.48550/arXiv.1301.3781
29 Rozova V, Witt K, Robinson J, Li Y, Verspoor K. Detection of self-harm and suicidal ideation in emergency department triage notes. J Am Med Inform Assoc. 2022; 29(3): 472-480. doi:10.1093/jamia/ocab261
30 Lampert CH, Nickisch H, Harmeling S. Learning to detect unseen object classes by between-class attribute transfer. 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009; 2009: 951-958. doi:10.1109/CVPR.2009.5206594
31 Larochelle H, Erhan D, Bengio Y. Zero-data learning of new tasks. AAAI. 2008; 1(2): 3. doi:10.5555/1620163.1620172
32 Sun X, Gu J, Sun H. Research progress of zero-shot learning. Appl Intel. 2021; 51(6): 3600-3614. doi:10.1007/s10489-020-02075-7
33 Hu RL, Xiong C, Socher R. Zero-shot image classification guided by natural language descriptions of classes: a meta-learning approach. NeurIPS; 2018. https://nips2018vigil.github.io/static/papers/accepted/28.pdf
34 Romera-Paredes B, Torr P. An embarrassingly simple approach to zero-shot learning. In International conference on machine learning. PMLR; 2015: 2152-2161.
35 Socher R, Ganjoo M, Manning CD, Ng A. Zero-shot learning through cross-modal transfer. Adv Neural Inf Process Syst. 2013; 26: 1-10.
36 Xian Y, Lampert CH, Schiele B, Akata Z. Zero-shot learning—a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans Pattern Anal Mach Intell. 2019; 41(9): 2251-2265. doi:10.1109/TPAMI.2018.2857768
37 Dinu G, Lazaridou A, Baroni M. Improving zero-shot learning by mitigating the hubness problem. arXiv preprint arXiv:1412.6568. doi:10.48550/arXiv.1412.6568
38 Pourpanah F, Abdar M, Luo Y, et al. A review of generalized zero-shot learning methods. IEEE Trans Pattern Anal Mach Intell. 2023; 45(4): 4051-4070. doi:10.1109/TPAMI.2022.3191696
39 Sivarajkumar S, Wang Y. HealthPrompt: A Zero-shot Learning Paradigm for Clinical Natural Language Processing. arXiv preprint arXiv:2203.05061; 2022. doi:10.48550/arXiv.2203.05061
40 Dauphin YN, Tur G, Hakkani-Tur D, Heck L. Zero-shot learning for semantic utterance classification. arXiv preprint arXiv:1401.0509; 2013. doi:10.48550/arXiv.1401.0509
41 Johnson M, Schuster M, Le QV, et al. Google's multilingual neural machine translation system: enabling zero-shot translation. Trans Assoc Comput Linguist. 2017; 5: 339-351. doi:10.1162/tacl_a_00065
42 Tesfagergish SG, Kapočiūtė-Dzikienė J, Damaševičius R. Zero-shot emotion detection for semi-supervised sentiment analysis using sentence transformers and ensemble learning. Appl Sci. 2022; 12(17):8662. doi:10.3390/app12178662
43 International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM); 2023. https://www.cdc.gov/nchs/icd/icd-10-cm.htm
44 Hedegaard H, Schoenbaum M, Claassen C, Crosby A, Holland K, Proescholdbell S. Issues in developing a surveillance case definition for nonfatal suicide attempt and intentional self-harm using international classification of diseases, tenth revision, clinical modification (ICD-10-CM) coded data. Natl Health Statis Rep. 2018;(108): 1-19.
45 Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980; 2014. doi:10.48550/arXiv.1412.6980
46 Frequently Asked Questions about Suicide. 2023. https://www.nimh.nih.gov/health/publications/suicide-faq/#pub2
47 Caceres CA, Roos MJ, Rupp KM, et al. Feature selection methods for zero-shot learning of neural activity. Front Neuroinform. 2017; 11:41. doi:10.3389/fninf.2017.00041
48 Shao Y, Divita G, Workman TE, Redd D, Garvin JH, Zeng-Treitler Q. Clinical sublanguage trend and usage analysis from a large clinical corpus. 2020 IEEE Int Conf Big Data. 2020; 2020: 3837-3845. doi:10.1109/BigData50022.2020.9378203
49 Workman TE, Divita G, Zeng-Treitler Q. Discovering sublanguages in a large clinical corpus through unsupervised machine learning and information gain. 2019 IEEE Int Conf Big Data. 2019; 2019: 4889-4898. doi:10.1109/BigData47090.2019.9006492
50 Tsui FR, Shi L, Ruiz V, et al. Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts. JAMIA Open. 2021; 4(1):ooab011. doi:10.1093/jamiaopen/ooab011
51 Clapp MA, Kim E, James KE, Perlis RH, Kaimal AJ, McCoy, Jr. TH. Natural language processing of admission notes to predict severe maternal morbidity during the delivery encounter. Am J Obstet Gynecol. 2022; 227(3): 511.e1-511.e8. doi:10.1016/j.ajog.2022.04.008
52 Figueroa RL, Flores CA. Extracting information from electronic medical records to identify the obesity status of a patient based on comorbidities and bodyweight measures. J Med Syst. 2016; 40(8): 191. doi:10.1007/s10916-016-0548-8
53 Harmer B, Lee S, Saadabadi A. Suicidal Ideation. StatPearls Publishing; 2023.
54 Mérelle S, Foppen E, Gilissen R, Mokkenstorm J, Cluitmans R, Van Ballegooijen W. Characteristics associated with non-disclosure of suicidal ideation in adults. Int J Environ Res Public Health. 2018; 15(5):943. doi:10.3390/ijerph15050943
55 Workman TE, Goulet JL, Brandt C, et al. A prototype application to identify LGBT patients in clinical notes. 2020 IEEE Int Conf Big Data. 2020; 2020: 4270-4275. doi:10.1109/BigData50022.2020.9378109

Volume 6, Issue 9

September 2023

e1526

Identifying suicide documentation in clinical notes through zero-shot learning
