This study used machine learning (ML) to predict mental health employees' turnover in the following 12 months using human resources data in a community mental health centre. The data contain information on 621 employees (e.g., demographics, job information and information on the clients served by employees) hired between 2011 and 2021 (56.5% turned over during the study period). Six ML methods (i.e., logistic regression, elastic net, random forest [RF], gradient boosting machine [GBM], neural network and support vector machine) were used to predict turnover, along with graphical and statistical tools to interpret predictive relationship patterns and potential interactions. The results suggest that RF and GBM led to better prediction according to specificity, sensitivity and area under the curve (>0.8). Turnover predictors (e.g., past work years, work hours, wage, age, exempt status, educational degree, marital status and employee type) were identified, including those that may be unique to the mental health employee population (e.g., training hours and the proportion of clients with schizophrenia diagnosis). The analyses also revealed nonlinear and nonmonotonic predictive relationships (e.g., for wage and employee age), as well as interaction effects, such that past work years interacted with other variables in turnover prediction. The study indicates that mental health employee turnover can be predicted with ML methods using human resources data. The identified predictors and the nonlinear and interactive relationships shed light on developing new predictive models for turnover and warrant further investigation.

List of Abbreviations

ALEs: accumulated local effect plots
AUC: area under the receiver operating characteristic curve
EN: elastic net
GBM: gradient boosting machine
HR: human resources
KNN: K-nearest neighbour
LR: logistic regression
ML: machine learning
NN: neural network
RF: random forest
SHAPs: Shapley values
SVM: support vector machine
VIMs: variable importance scores

1 Introduction

Excessive employee turnover among mental health providers is a significant problem, with reported annual turnover rates ranging from 25% to 60% (Aarons and Sawitzky 2006; Beidas et al. 2016; Bukach et al. 2017). Some data indicate that over 70% of counties in the US report a severe shortage of mental health professionals (Hawkins 2021). This perpetuates a negative cycle of problems for remaining employees (e.g., increased workload and reduced morale). Predicting and preventing employee turnover is critical.

Turnover factors for mental health providers include low wages, high burnout, job dissatisfaction, low organisational support, lack of professional development opportunities (Cho and Song 2017) and concerns about workplace safety (Cho and Song 2017; Yanchus et al. 2015; Yanchus, Periard, and Osatuke 2017). Other factors include large client caseloads with severe symptoms (Hallett et al. 2024), increased job demands (Scanlan and Still 2013), staffing models (e.g., full-time vs. part-time; Beidas et al. 2016) and increased training demands for evidence-based practices (Brabson et al. 2019) in public mental healthcare systems. The combination of these individual and organisational factors contributes to increased turnover (SAMHSA 2022). Although these turnover factors are known, their predictive power is inconsistent across studies. Accordingly, effective turnover prevention strategies have yet to be identified.

Mental health organisations have struggled to identify high turnover risk employees and develop prevention strategies. The relevant challenges include the fact that turnover predictors in mental health literature are often identified through correlational analyses of turnover intention with limited variables collected from small survey samples. Although turnover intention correlates with actual turnover (Fukui, Wu, and Salyers 2019; Hom et al. 2017), their predictors can differ. For instance, turnover intention may be correlated with job stressors, while actual turnover could be associated with mental health provider characteristics, including age, work years in the field, educational degree and work hours (Fukui, Rollins, and Salyers 2020). Another challenge in survey methods with traditional analytical approaches (i.e., inferential statistics to test population parameters) is the limited application of the findings to local agencies, such as identifying high risk turnover cases in varying contexts (e.g., size of the agency, demographic composition and client populations). Finally, the traditional statistical methods (e.g., logistic regression) are limited in tailoring turnover prediction models based on the data characteristics, which often involve nonlinear, heterogeneous and interactive relationships. The present study aimed to overcome these limitations.

2 Background

With current technological advancements enabling mental health researchers and practitioners to access and interpret big data, data-driven analytical approaches, including machine learning (ML), may help identify localised turnover mechanisms. ML is a broad framework for extracting regularities from data using powerful automated algorithms (Hastie, Friedman, and Tibshirani 2001; Kassambara 2018). Unlike traditional statistical methods, ML is flexible in accommodating methods that can account for nonlinear and heterogeneous variable relationships. Although past turnover studies have utilised ML (Esmaieeli Sikaroudi, Ghousi, and Sikaroudi 2015; Sexton et al. 2005; Zhao et al. 2019), most of them were not focused on mental health service employees (who may differ from employees in for-profit firms), utilised a small set of ML methods or provided limited interpretations of the results (Fukui et al. 2023).

The primary goal of the current study was to apply various ML methods to predict community mental health employee turnover within the next 12 months using human resources (HR) data. Mental health employees are often overwhelmed by the amount of paperwork (Sullivan, Kondrat, and Floyd 2015), so collecting data through surveys may not always be ideal. On the other hand, HR data management systems contain useful variables that could predict turnover, yet they are often unutilised.

In the current study, we considered six representative ML methods to predict turnover versus nonturnover, including logistic regression (LR), elastic net (EN), random forest (RF), gradient boosting machine (GBM), neural network (NN) and support vector machine (SVM). Our focus is not only to examine the predictive power or accuracy of the variables in HR data but also to facilitate our understanding of turnover mechanisms by interpreting the relationships between turnover and the predictors using various graphical and statistical tools. Our study will promote the application of ML methods for turnover prediction in the mental health field. A comprehensive interpretation of the results will also shed light on developing future intervention strategies (e.g., Lundberg and Lee 2017; Molnar, Casalicchio, and Bischl 2020).

The rest of the article is organised as follows. First, we describe the data used in this study. Second, we briefly introduce the ML methods considered in the study, followed by several statistical and graphical tools to assist with result interpretation. We then present the results from the methods, including prediction power, identification of the most important predictors, visualisation of the patterns of the predictive relationships for the important predictors and exploration of potential interactions among the predictors (i.e., heterogeneous relationships). We conclude this article by discussing the main findings, limitations and potential directions for future research.

3 Method

3.1 Data

We used HR data from an urban community mental health centre in the US, which was obtained for the parent study to understand employee turnover mechanisms (NIMH R34MH119411). The HR data were extracted from five different data management systems by the HR, operations and clinical administration departments at the centre. The data were linked to one another by employee IDs, de-identified with research IDs, cleaned and processed for the ML applications by the researchers. Although the majority of data was well maintained in the data management systems, some data were incomplete. The missing data were filled in by HR staff where possible. The historical HR data (2011–2021) for the current study contained information on 621 employees. Among the 621 employees, 56.5% turned over (leavers) during the study period and 43.5% stayed (stayers). The Indiana University Institutional Review Board approved the original study. A waiver of informed consent was applied to the data already collected by the organisation.

We included 14 predictors from the HR data in our ML analyses. Six are categorical: gender (73% female vs. male), race (51% White vs. Black), educational degree (32% Associate degree and below, 34% Bachelors' degree and 34% Masters' degree and above), marital status (26% married vs. single), employee type (77% clinical vs. nonclinical) and exempt status (14% exempt vs. not exempt). Eight are continuous: employee's age, past work years, average hourly wage, average work hours per week, the total number of job training hours and three characteristics of the clients served by the employee (the proportion of male clients, the proportion of clients with a schizophrenia diagnosis and average client age). All of these data were collected administratively as part of the centre's routine HR data management. Table 1 shows the descriptive statistics of the included predictors by leavers and stayers.

TABLE 1. Descriptive statistics of predictors for stayers and leavers.

Predictors	Stayers (n = 270)			Leavers (n = 351)
Predictors	Mean	SD	Miss	Mean	SD	Miss
Continuous
Age in the previous year	44.2	13.1	0%	40.4	12.6	0%
Past work years	5.6	6.2	0%	3.2	5.5	0%
Hourly wage^a	0.3	0.1	0.4%	0.3	0.1	1.1%
Weekly work hours	36	4.1	0.7%	32	5.6	1.7%
Total number of job training hours	120.6	112.5	7%	103.1	84.6	8.5%
Average client age	46.7	11.7	42%	48.1	9.0	50%
% male clients	58%	24%	42%	59%	23%	50%
% clients with schizophrenia diagnosis	49%	32%	42%	55%	31%	50%

	Frequency	Frequency
Categorical
Gender
Female	205 (75.9%)	255 (72.7%)
Male	61 (22.6%)	95 (27.1%)
Miss	4 (1.5%)	1 (0.3%)
Race
Black	146 (54.1%)	163 (46.4%)
White	121 (44.8%)	171 (48.7%)
Miss	3 (1.1%)	17 (4.8%)
Educational degree
Associate degree and below	87 (32.2%)	111 (31.6%)
Bachelors' degree	82 (30.4%)	128 (36.5%)
Masters' degree and above	101 (37.4%)	112 (31.9%)
Marital status
Married	81 (30%)	87 (24.8%)
Single	184 (68.1%)	254 (72.4%)
Miss	5 (1.9%)	10 (2.8%)
Employee type
Clinical	216 (80%)	261 (74.4%)
Nonclinical	54 (20%)	49 (14%)
Miss	0 (0%)	41 (11.7%)
Exempt status
Exempt	42 (15.6%)	47 (13.4%)
Nonexempt	228 (84.4%)	304 (86.6%)

Note: The reference group for each categorical variable is in italics. Miss indicates missing data.
^a Hourly wage was log-transformed due to its non-normality and standardised by year to remove influence of inflation.

3.2 Data Preprocessing

Following the standard ML process, continuous predictors were standardised to ensure that their effects would not be influenced by their scale metrics (Hastie, Friedman, and Tibshirani 2001). In addition, because the wage ($) variable was highly skewed (skewness = 5.72, kurtosis = 52.49), we log-transformed it before the standardisation. All categorical data were dummy coded. There was a small amount of missing data on the predictors, ranging from 1% to 7%, except for the client-related variables (47%).

The missing data were imputed using k-nearest neighbour (KNN) imputation, which is generally considered an effective and robust technique for imputing missing data (Jerez et al. 2010; Kuhn and Johnson 2013; Pereira, Basto, and Silva 2016; Troyanskaya et al. 2001). Simply speaking, KNN first identifies the complete cases most similar to a case with a missing value (based on Euclidean distance) and then takes a weighted average (weighted by the distance) of their observations to fill in the missing value (Troyanskaya et al. 2001). Because KNN is a nonparametric imputation method, it neither relies on distributional assumptions nor assumes linear relationships among the variables. Thus, it is unlikely to result in biased imputation or distort the relationships among the variables.
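The imputation idea described above can be sketched in a few lines. This is a minimal Python illustration, not the authors' R implementation: the function name `knn_impute`, the inverse-distance weighting and the toy data are all assumptions made for the example, and it assumes every column other than the target is complete and already standardised.

```python
import math

def knn_impute(rows, target_idx, k=3):
    """Fill None values in column target_idx with a distance-weighted KNN average.

    Assumption: all other columns are complete and on a comparable scale.
    """
    complete = [r for r in rows if r[target_idx] is not None]
    imputed = []
    for r in rows:
        if r[target_idx] is not None:
            imputed.append(list(r))
            continue
        # Euclidean distance computed on the non-target columns only.
        dists = []
        for c in complete:
            d = math.sqrt(sum((r[j] - c[j]) ** 2
                              for j in range(len(r)) if j != target_idx))
            dists.append((d, c[target_idx]))
        dists.sort(key=lambda pair: pair[0])
        nearest = dists[:k]
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = [1.0 / (d + 1e-9) for d, _ in nearest]
        value = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
        filled = list(r)
        filled[target_idx] = value
        imputed.append(filled)
    return imputed

# Hypothetical toy data: two standardised predictors and one partly missing column.
data = [
    [0.0, 0.0, 10.0],
    [0.1, 0.0, 12.0],
    [5.0, 5.0, 50.0],
    [0.05, 0.0, None],
]
result = knn_impute(data, target_idx=2, k=2)
```

With `k=2`, the missing value is filled from the two nearby cases, and the distant third case does not contribute.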

3.3 ML Methods

Below, we briefly introduce the six ML methods used in the current study, focusing on conceptual understanding. The ML literature suggests that no single method is universally superior; relative performance depends largely on the characteristics of the data set (Domladovac 2021; Murugan, Nair, and Kumar 2019). Testing the competing methods in our particular study setting is therefore important.

3.3.1 Logistic Regression (LR)

LR uses a logit link function to transform the probability of an event (i.e., turnover in the current study) and models the logit as a linear combination of a set of predictors. LR requires the logit to be correctly expressed as a linear combination of predictors. Any higher-order effects (such as interaction and quadratic effects), if considered, must be explicitly specified in the model. This requirement is not often satisfied in real-world applications. For simplicity or due to small sample sizes, researchers usually settle with a misspecified model, for example, a model with a small subset of possible higher-order effects (e.g., limited two-way interaction effects).
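To make the specification requirement concrete, the following Python sketch shows the logit link and a two-predictor model in which the interaction term must be written out explicitly. The coefficient names (`b0`, `b1`, `b2`, `b12`) and the function `predict_turnover` are hypothetical and chosen only for illustration.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Map a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_turnover(x1, x2, b0, b1, b2, b12=0.0):
    """Turnover probability from a linear model on the logit scale.

    The interaction x1 * x2 contributes only if b12 is included explicitly;
    leaving it out (b12 = 0) is the kind of misspecification described above.
    """
    z = b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2
    return inv_logit(z)

p_main_effects = predict_turnover(1.0, 2.0, b0=0.0, b1=-0.5, b2=0.2)
p_with_interaction = predict_turnover(1.0, 2.0, b0=0.0, b1=-0.5, b2=0.2, b12=0.3)
```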

3.3.2 Elastic Net (EN)

EN is a regularisation technique that extends regular regression analysis by adding a penalty term on the size of the regression coefficients to the discrepancy function used for estimation. This penalty term shrinks the estimates, which makes them biased. However, the shrinkage is intentional: it is designed to achieve a better bias-variance trade-off and improved generalisability. The penalty term also allows EN to handle a large number of higher-order and nonlinear effects, even when the number of effects exceeds the sample size (Zou and Hastie 2005). In the current study, we fit several EN models with different complexities and chose the one that produced the highest area under the receiver operating characteristic curve (AUC; described below). These models included one with only first-order effects (i.e., no interactions, Model 1), one with all possible two-way interactions (Model 2), one with all quadratic effects for continuous predictors (Model 3), one that combined the effects in Models 2 and 3 (Model 4) and one with all two-way and three-way interactions (Model 5).
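As a rough illustration of the penalty term, the sketch below computes an elastic net penalty that mixes an absolute-value (lasso) part and a squared (ridge) part. The parameterisation with `lam` (overall strength) and `alpha` (mixing weight) is one common convention and is an assumption here; the paper does not state which parameterisation or tuning values were used.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Elastic net penalty: lam * (alpha * L1 + (1 - alpha) / 2 * L2).

    alpha = 1 gives a pure lasso penalty, alpha = 0 a pure ridge penalty.
    The intercept is assumed to be excluded from coefs.
    """
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)

def penalised_loss(loss, coefs, lam, alpha):
    """Objective minimised during estimation: data misfit plus the penalty."""
    return loss + elastic_net_penalty(coefs, lam, alpha)

lasso_like = elastic_net_penalty([1.0, -2.0], lam=0.1, alpha=1.0)
ridge_like = elastic_net_penalty([1.0, -2.0], lam=0.1, alpha=0.0)
```

Larger coefficients raise the penalty, so the minimiser is pulled towards smaller (shrunken) estimates.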

3.3.3 Neural Network (NN)

NN models relationships among the data by mimicking the learning process of the neurons in a human brain using a series of mathematical equations (Jun 2021). A simple NN involves an input layer (predictors) and an output layer (predicted outcome) connected by one or more hidden layers of nodes/neurons in the middle, which transfer the information from the input to the output. Predictors are combined through the hidden layer(s), and interactions are modelled implicitly, approximating any complex nonlinear relationships between the predictors and outcome (Hastie, Friedman, and Tibshirani 2001; Kavzoglu and Mather 2003). We only considered the simple NN in the current study.

3.3.4 Random Forest (RF)

RF is a powerful extension of the classification tree method (a traditional nonparametric classification method). The classification tree predicts an outcome by successively splitting the dataset into increasingly homogeneous subsets based on one predictor at a time (Breiman et al. 2017). Because a single classification tree is prone to biased and unstable results, which compromises generalisability (Kirasich, Smith, and Sadler 2018; Strobl, Malley, and Tutz 2009), RF addresses the problem by building many trees (e.g., 500). Specifically, it uses a resampling approach, such as bootstrapping, to create many samples from the original data and builds a tree for each. It then aggregates the results across all the trees to achieve more stable and accurate predictions. Furthermore, when building a tree, RF randomly selects a subset of predictors at each split. This reduces the correlations or redundancies among the trees, contributing to improved generalisability of the result (Breiman 2001; Kuhn and Johnson 2013).
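The three ingredients named above (bootstrap samples, a random predictor subset per split and aggregation by vote) can be sketched as follows. This is a deliberately simplified Python illustration: each tree is reduced to a single split (a stump), whereas a real random forest grows much deeper trees, and all function names and the toy data are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def fit_stump(X, y, features):
    """Find the single best split among the given candidate predictors."""
    best = None
    for f in features:
        for t in sorted(set(row[f] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                l_pred = int(sum(left) * 2 >= len(left))
                r_pred = int(sum(right) * 2 >= len(right))
                best = (score, f, t, l_pred, r_pred)
    if best is None:
        # No valid split in this sample: predict the majority class everywhere.
        majority = int(sum(y) * 2 >= len(y))
        return (0, float("inf"), majority, majority)
    return best[1:]

def stump_predict(stump, row):
    f, t, l_pred, r_pred = stump
    return l_pred if row[f] <= t else r_pred

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(p), n_features)     # random predictor subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def forest_predict(forest, row):
    """Majority vote across trees (1 = leaver, 0 = stayer)."""
    votes = sum(stump_predict(s, row) for s in forest)
    return int(votes * 2 >= len(forest))

# Hypothetical toy data: [past work years, weekly work hours]; 1 = leaver.
X = [[0.5, 20], [1.0, 25], [0.8, 22], [1.5, 30],
     [6.0, 40], [8.0, 38], [7.0, 40], [10.0, 36]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(X, y)
```

Because each tree sees a different bootstrap sample and predictor subset, individual trees disagree on borderline cases, and the vote averages out that instability.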

3.3.5 Gradient Boosting Machine (GBM)

GBM offers another way to solve the generalisability problem of the classification tree. Like RF, GBM also builds a series of trees. However, it does so sequentially rather than in parallel, training each new tree to better capture the cases that were not correctly classified (i.e., residuals) by the previous one. Briefly speaking, the first tree is grown based on the original dataset in which each observation is weighted equally. The second tree is then fit to a modified dataset with greater weights assigned to the cases that are difficult to classify and lower weights for those that are easy to classify (Zhang and Haghani 2015). This process continues for a specified number of iterations. The predictions from the multiple trees are then combined to determine the final classification prediction (Friedman 2002).
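The sequential idea can be illustrated with a small Python sketch. Note the simplifications, all of which are assumptions for the example: it fits each new stump to the current residuals under a squared-error loss (a common formulation of gradient boosting) rather than reweighting cases, it uses a single predictor, and the learning rate and number of rounds are arbitrary.

```python
def fit_regression_stump(x, r):
    """Threshold on one predictor that minimises squared error of residuals r."""
    best = None
    for t in sorted(set(x))[:-1]:          # exclude the maximum so both sides are non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Build stumps one after another, each fit to what is still unexplained."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_regression_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [pi + learning_rate * (lm if xi <= t else rm)
                for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps

def boost_predict(model, xi):
    """Combine the base rate with the scaled contributions of all stumps."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if xi <= t else rm for t, lm, rm in stumps)

# Hypothetical toy data: past work years and turnover (1 = leaver).
tenure = [0.5, 1.0, 1.5, 2.0, 6.0, 7.0, 8.0, 10.0]
left_job = [1, 1, 1, 1, 0, 0, 0, 0]
model = boost(tenure, left_job)
```

Each round removes a fraction of the remaining error, so the combined prediction moves gradually towards the observed outcomes.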

3.3.6 Support Vector Machine (SVM)

SVM involves searching for an optimal decision hyperplane (or decision boundary) to separate the classes in a categorical outcome (Brereton and Lloyd 2010). The decision hyperplane maximises the margin or distance between support vectors of the classes (leavers and stayers in our study). Support vectors are the closest data points to the hyperplane, making them the most difficult to classify. The basic idea is that if the most challenging cases can be optimally classified, so would the others. SVM allows for nonlinear decision boundaries, one major advantage over linear classifiers such as discriminant analysis (Gokcen and Peng 2002).

3.4 Cross-Validation

Overfitting (fitting random noise in a specific sample instead of true regularities in the corresponding population) is a common concern in ML methods. To prevent this problem, we used repeated K-fold cross-validation to evaluate the performance of the predictive models (Hastie, Friedman, and Tibshirani 2001). We set K = 10 based on general practice and the size of the data (James et al. 2021). This approach splits the data randomly into 10 equal-sized folds, and each fold is used in turn as the testing data to evaluate the performance of the predictive model trained on the other nine folds. This process is repeated 10 times, resulting in 100 sets of predictions. The predictions are then aggregated to evaluate the performance of each ML method according to the criteria described below.
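The splitting scheme can be sketched as follows. This Python illustration only generates the train and test indices; the function name `repeated_kfold` and the seed are assumptions, and with 621 cases the folds can only be approximately equal in size.

```python
import random

def repeated_kfold(n, k=10, repeats=10, seed=0):
    """Return a list of (train_indices, test_indices) pairs.

    Each repeat reshuffles the cases and cuts them into k folds;
    every fold serves once as the testing data.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [j for m, f in enumerate(folds) if m != i for j in f]
            splits.append((train, test))
    return splits

splits = repeated_kfold(n=621)   # 10 folds x 10 repeats
```

Each of the resulting splits yields one set of test-fold predictions, which is where the 100 sets of predictions come from.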

3.5 Evaluation Criteria for Prediction Performance

Three criteria were used to evaluate the performance of the prediction models: specificity or true negative rate, sensitivity or true positive rate and area under the receiver operating characteristic curve (AUC). AUC is calculated based on sensitivity and specificity across various thresholds, providing an overall index of how well the cases are classified. An AUC value above 0.8 is generally considered good prediction power (Hosmer, Lemeshow, and Sturdivant 2013).
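For readers less familiar with these criteria, the following Python sketch computes them from predicted probabilities. The 0.5 classification threshold and the toy values are assumptions for illustration; the AUC is computed through its equivalent rank interpretation (the probability that a randomly chosen leaver receives a higher predicted probability than a randomly chosen stayer).

```python
def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """True positive rate and true negative rate at a given threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, y_prob):
    """Share of (leaver, stayer) pairs ranked correctly; ties count one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = leaver) and predicted turnover probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
sens, spec = sensitivity_specificity(y_true, y_prob)
area = auc(y_true, y_prob)
```

Unlike sensitivity and specificity, the AUC does not depend on a single threshold, which is why it serves as the overall index.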

3.6 Interpretation of ML Results

As mentioned above, additional tools are needed to interpret the results from the examined ML methods. Several graphical and statistical tools were used in this study, including variable importance scores (VIMs), accumulated local effect plots (ALEs), Friedman's H-statistics and Shapley values (SHAPs). VIMs are used to capture the overall impact of each predictor. ALEs are graphical tools for visualising the marginal effects of the predictors. ALEs could also be used to visualise interaction effects. Given the many possible interaction effects (with 14 predictors, there could be 91 two-way interaction effects, not to mention higher-order interactions), we used H-statistics to identify a few impactful two-way interaction effects to investigate. Finally, SHAPs, case-wise statistics, were used for case studies to gain further insights into how leavers and stayers would be different in the potential predictors at the individual case level.

3.7 Variable Importance Scores (VIMs)

A VIM is calculated for each predictor, providing a global measure of the impact of that predictor on the prediction (Kuhn 2022). The VIMs are often scaled to facilitate interpretation by dividing the original importance scores by the highest importance score. Scaled importance scores range between 0% and 100%. For instance, an importance score of 60% means that the importance of the predictor is 60% of that of the most important predictor. There is no absolute criterion to evaluate the scaled importance scores. In practice, researchers often select predictors based on the rank orders of their importance scores (e.g., choose the top 10 scores depending on the number of predictors in the model, see Loh and Zhou 2021).
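The scaling and ranking step is simple arithmetic, sketched below in Python. The raw importance values are invented for illustration and do not come from the study.

```python
def scale_importance(raw):
    """Express each raw importance score as a percentage of the largest one."""
    top = max(raw.values())
    return {name: 100.0 * value / top for name, value in raw.items()}

def top_k(scaled, k):
    """Predictor names ordered from most to least important, truncated to k."""
    return sorted(scaled, key=scaled.get, reverse=True)[:k]

# Hypothetical raw scores on an arbitrary scale.
raw_scores = {
    "past_work_years": 40.0,
    "training_hours": 18.0,
    "weekly_work_hours": 14.0,
    "hourly_wage": 11.2,
}
scaled = scale_importance(raw_scores)
leading = top_k(scaled, 2)
```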

3.8 Accumulated Local Effect (ALE) Plots

ALEs can be applied to one or two predictors. For one predictor (e.g., X₁), the ALE is a two-dimensional plot showing the marginal relationship between the predictor and a target outcome (e.g., the probability of turnover) averaged across the distributions of all other predictors (Friedman 2001; Molnar, Casalicchio, and Bischl 2020). In other words, it shows how the average prediction could change as a predictor changes. When applied to two predictors (e.g., X₁ and X₂), ALEs could display how the average prediction could change for different combinations of the two predictors across the marginal distributions of the other predictors (e.g., X₃ to X_p). Thus, they help visualise two-way interactions.
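A one-predictor ALE can be sketched as follows: the predictor's range is cut into intervals, the change in prediction across each interval is averaged over the cases falling in it, and these local effects are accumulated and centred. This Python sketch is a simplified illustration under stated assumptions (quantile-based intervals, a count-weighted centring step, at least two distinct predictor values); `ale_1d` and `toy_predict` are hypothetical names, not functions from the software used in the study.

```python
def ale_1d(predict, X, j, n_bins=4):
    """First-order accumulated local effects of predictor j.

    Returns (edges, ale) where ale[i] is the centred effect at edges[i].
    Assumes predictor j takes at least two distinct values.
    """
    values = sorted(row[j] for row in X)
    n = len(values)
    # Quantile-based interval edges; duplicates are dropped.
    edges = sorted({values[(n - 1) * q // n_bins] for q in range(n_bins + 1)})
    effects, counts = [], []
    for b, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        in_bin = [row for row in X
                  if lo < row[j] <= hi or (b == 0 and row[j] == lo)]
        diffs = []
        for row in in_bin:
            upper, lower = list(row), list(row)
            upper[j], lower[j] = hi, lo
            # Local effect: prediction change across the interval,
            # holding this case's other predictors at their observed values.
            diffs.append(predict(upper) - predict(lower))
        effects.append(sum(diffs) / len(diffs) if diffs else 0.0)
        counts.append(len(in_bin))
    acc = [0.0]
    for e in effects:
        acc.append(acc[-1] + e)            # accumulate the local effects
    mean = sum(c * 0.5 * (a0 + a1)
               for c, a0, a1 in zip(counts, acc[:-1], acc[1:])) / sum(counts)
    return edges, [a - mean for a in acc]  # centre around zero

def toy_predict(row):
    """Hypothetical model: linear in predictor 0, quadratic in predictor 1."""
    return 2.0 * row[0] + row[1] ** 2

X_toy = [[float(i), float((3 * i) % 5)] for i in range(8)]
edges, ale = ale_1d(toy_predict, X_toy, j=0)
```

For this toy model the ALE of predictor 0 rises by two units per unit of the predictor, regardless of the quadratic term in predictor 1.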

3.9 Identifying Potential Interaction Effects via Friedman's H-Statistics

Many ML methods (e.g., random forest) can account for potential interaction effects (heterogeneity) without explicitly expressing them in the models. In this case, indices such as Friedman's H-statistic are required to extract the potential interaction effects (Friedman and Popescu 2008). Briefly speaking, the H-statistic evaluates the degree to which a predictor (e.g., X₁) may interact with any other predictors in the model. There are two types of H-statistic values. One represents the proportion of the standard deviation (SD) in prediction from all predictors due to all interaction effects associated with a particular predictor. The other is more specific and can be used to evaluate the strength of a specific two-way interaction (e.g., between X₁ and X₂) relative to the joint impact from both predictors. For convenience, we call the former overall H and the latter specific H. For simplicity, we first used the overall H to identify the predictor for which the interaction effects could be most influential. We then focused on that predictor to explore its potential two-way interactions with any other predictors using the specific H. We used ALEs to portray the top 2 two-way interaction effects associated with the identified predictor.
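The logic behind the specific H can be sketched in Python using partial dependence functions: if two predictors do not interact, their joint partial dependence equals the sum of their individual partial dependences, and any remaining discrepancy is attributed to interaction. The sketch below returns the squared statistic as defined by Friedman and Popescu (2008); whether a given software package reports the squared or unsquared value is not assumed here, and all names and data are hypothetical.

```python
def partial_dependence(predict, X, cols, point):
    """Average prediction when the columns in cols are fixed to point's values."""
    total = 0.0
    for row in X:
        r = list(row)
        for c in cols:
            r[c] = point[c]
        total += predict(r)
    return total / len(X)

def centred(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

def h_squared_pair(predict, X, j, k):
    """Squared H-statistic for the two-way interaction between predictors j and k."""
    pd_j = centred([partial_dependence(predict, X, [j], row) for row in X])
    pd_k = centred([partial_dependence(predict, X, [k], row) for row in X])
    pd_jk = centred([partial_dependence(predict, X, [j, k], row) for row in X])
    num = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    den = sum(a ** 2 for a in pd_jk)
    return num / den if den > 0 else 0.0

# Hypothetical data and models: one additive, one with a product term.
X_grid = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 0.0], [2.0, 1.0]]
h_additive = h_squared_pair(lambda r: 2.0 * r[0] + r[1], X_grid, 0, 1)
h_product = h_squared_pair(lambda r: r[0] * r[1], X_grid, 0, 1)
```

The additive model yields a value of essentially zero, whereas the product model yields a clearly positive value.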

3.10 Shapley Values (SHAPs)

SHAPs are case-wise statistics (i.e., available for each case) that measure the contribution of each predictor to the prediction for an individual case, relative to the average prediction from all predictors (Rodríguez-Pérez and Bajorath 2020). SHAPs originate from well-established coalitional game theory (Hart 1989), treating predictors as players collaborating to produce the final prediction. As such, SHAPs have many desirable theoretical properties. For example, they provide a fair allocation of contributions across the predictors and produce zero values for predictors that cannot predict (Lundberg and Lee 2017; Rodríguez-Pérez and Bajorath 2020). Given that SHAPs are case-wise, they are well suited for case studies, which could facilitate our understanding of why a specific employee chose to leave while another decided to stay and what predictors were most influential to their decisions. We present the SHAPs for two extreme cases: one with the highest and one with the lowest probability of leaving.
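The game-theoretic definition can be made concrete with an exact computation for a handful of predictors. The Python sketch below enumerates every coalition of predictors, which is feasible only for very small models; software packages typically approximate these values instead. The value function (averaging predictions over the data with the coalition's predictors fixed to the case's values), the toy model and all names are assumptions for illustration.

```python
from itertools import combinations
from math import factorial

def coalition_value(predict, X, instance, subset):
    """Average prediction with the predictors in subset fixed to the case's values."""
    total = 0.0
    for row in X:
        r = list(row)
        for j in subset:
            r[j] = instance[j]
        total += predict(r)
    return total / len(X)

def shapley_values(predict, X, instance):
    """Exact Shapley value of each predictor for one case."""
    p = len(instance)
    phi = []
    for j in range(p):
        others = [m for m in range(p) if m != j]
        contrib = 0.0
        for size in range(p):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(p - size - 1) / factorial(p)
                gain = (coalition_value(predict, X, instance, S + (j,))
                        - coalition_value(predict, X, instance, S))
                contrib += weight * gain
        phi.append(contrib)
    return phi

def toy_model(row):
    """Hypothetical turnover score; the third predictor is deliberately unused."""
    return 0.8 - 0.05 * row[0] - 0.005 * row[1]

# Hypothetical cases: [past work years, weekly work hours, client age].
X_cases = [[0.5, 20.0, 40.0], [3.0, 40.0, 45.0], [8.0, 38.0, 50.0], [1.0, 30.0, 55.0]]
case = X_cases[0]
phi = shapley_values(toy_model, X_cases, case)
average_prediction = sum(toy_model(r) for r in X_cases) / len(X_cases)
```

Two of the properties mentioned above can be checked directly: the values sum to the difference between the case's prediction and the average prediction, and the unused predictor receives a value of zero.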

3.11 Software Implementation

All analyses were conducted in R 4.2.1 (R Core Team 2021). Specifically, the caret package (Kuhn 2022) was used to implement the six ML methods with repeated 10-fold cross-validation and to obtain the variable importance scores. The iml package (Molnar 2018) was used to generate the ALE plots, H-statistics and SHAPs. The R scripts are in the online supplementary materials (https://osf.io/am9nj).

4 Results

4.1 Predictive Performance

AUC, sensitivity and specificity were used to evaluate the predictive performance of the methods. The average values for these criteria across the 100 sets of predictions from the repeated 10-fold cross-validation are presented in Table 2. We used paired t-tests to examine whether the means for each criterion were significantly different for each pair of methods (six methods, 15 pairs). To avoid inflated type I error rates, the α value for the t-tests was adjusted using the Bonferroni correction (i.e., corrected α = 0.05/15 = 0.0033) (Dunn 1961). Based on the significance of these tests, we ranked the methods for each criterion (methods that did not differ significantly were grouped together). The rank orders were as follows. For AUC, (RF, GBM) > (LR, EN, NN, SVM); for sensitivity, (RF, GBM) > (NN, SVM) > (LR, EN); and for specificity, all methods had comparable performance (see Table 2).
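The pair count and corrected α follow from simple arithmetic, shown in the Python sketch below together with the paired t statistic. The fold-level AUC values passed to `paired_t` are invented for illustration, and the sketch stops at the test statistic rather than computing a p value.

```python
from itertools import combinations
from math import sqrt

methods = ["LR", "EN", "RF", "GBM", "NN", "SVM"]
pairs = list(combinations(methods, 2))      # every unordered pair of methods
alpha_corrected = 0.05 / len(pairs)         # Bonferroni correction

def paired_t(a, b):
    """Paired t statistic for two methods evaluated on the same test folds."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return mean / sqrt(var / n)

# Hypothetical AUC values for two methods on three shared test folds.
t_stat = paired_t([0.86, 0.85, 0.88], [0.77, 0.78, 0.76])
```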

TABLE 2. Predictive performance of the machine learning methods.

	LR	EN	RF	GBM	NN	SVM
Specificity	0.73 (0.07)	0.73 (0.08)	0.73 (0.07)	0.73 (0.07)	0.72 (0.08)	0.72 (0.07)
Sensitivity	0.70 (0.08)	0.71 (0.08)	0.83 (0.07)	0.80 (0.08)	0.74 (0.09)	0.73 (0.09)
AUC	0.77 (0.04)	0.77 (0.04)	0.86 (0.04)	0.84 (0.04)	0.78 (0.05)	0.78 (0.05)

Note: The highest values are in boldface for each evaluation criterion. Standard deviations across 100 cross-validation samples are shown in parentheses. AUC stands for the area under the receiver operating characteristic curve.
Abbreviations: EN, elastic net; GBM, gradient boosting machine; LR, logistic regression; NN, neural network; RF, random forest; SVM, support vector machine.

4.2 Variable Importance Scores (VIMs)

We report the top five predictors based on their VIMs (see Table 3). As shown in Table 3, past work years appeared to be the most important predictor across all methods. RF and GBM were highly consistent regarding their top five predictors, except for the fifth predictor (i.e., proportion of clients with schizophrenia diagnosis from RF vs. employee age from GBM). A few categorical predictors (i.e., exempt status, educational degree, marital status and employee type) appeared in the top five list from EN, NN or SVM. In total, Table 3 includes 10 predictors: past work years was selected by all six methods; average weekly work hours and total training hours were each selected by five methods; employee age and the proportion of clients with schizophrenia diagnosis were each selected by three methods; average hourly wage, exempt status and educational degree were each selected by two methods; and marital status and employee type were each selected by one method.

TABLE 3. Top five predictors from the machine learning methods.

Rank	1	2	3	4	5
Method
LR	Past work years (100%)	Weekly work hours (64%)	Total training hours (51%)	Age (25%)	Proportion of clients with schizophrenia diagnosis (14%)
EN	Past work years (100%)	Total training hours (24%)	Weekly work hours (22%)	Exempt status (10%)	Educational degree (Masters' vs. Associate) (10%)
RF	Past work years (100%)	Total training hours (45%)	Weekly work hours (35%)	Hourly wage (28%)	Proportion of clients with schizophrenia diagnosis (24%)
GBM	Past work years (100%)	Weekly work hours (49%)	Total training hours (44%)	Hourly wage (32%)	Age (28%)
NN	Past work years (100%)	Exempt status (66%)	Total training hours (46%)	Educational degree (Bachelors' vs. Associate) (36%)	Marital Status (30%)
SVM	Past work years (100%)	Weekly work hours (51%)	Age (29%)	Proportion of clients with schizophrenia diagnosis (15%)	Employee type (11%)

Note: The scaled importance scores (in percent) are shown in parentheses. AUC stands for the area under the receiver operating characteristic curve.
Abbreviations: EN, elastic net; GBM, gradient boosting machine; LR, logistic regression; NN, neural network; RF, random forest; SVM, support vector machine.

4.3 Visualising Marginal Effects via ALEs

To visualise the marginal effects of the selected 10 predictors, we drew an ALE for each predictor in Table 3. If a predictor was included in the top five list of more than one method, we only presented the plot for the method with the highest AUC. For example, we presented the plot for exempt status based on NN instead of EN. More graphs can be found in the online supplementary materials (https://osf.io/am9nj). The ALEs for the 10 predictors are shown in Figure 1.

FIGURE 1. Accumulated local effect (ALE) plots for the predictors in Table 3. The title on the top of each graph indicates the method based on which the graph was drawn. The range of y varied across the graphs. More important predictors had more variability in the y-axis. For a continuous predictor, the x-axis shows the density of the distribution with darker colour representing higher density of the values.

As shown in Figure 1, the predictive relationships appeared to be nonlinear. However, in general, higher total training hours and a higher proportion of clients with schizophrenia diagnosis were associated with a higher probability of turnover. On the other hand, more past work years and weekly work hours were associated with a lower probability of turnover. The predictive relationship for hourly wage appeared to be nonmonotonic (i.e., the direction of the relationship changed across the range of the predictor). Although a wage increase in the lower wage range was associated with a reduced probability of turnover, it was associated with an increased probability of turnover in the higher wage range. A similar phenomenon occurred in the relationship between employee age and turnover. An increase in employee age was associated with an increased probability of turnover for younger employees (e.g., <35 years old) but a decreased probability for employees between 35 and 54 years old. Beyond about 54 years old, the trend went up again. For the categorical predictors, employees who had exempt status, were married, held nonclinical positions or had a masters' degree or above tended to have a higher predicted probability of turnover.

4.4 Interaction Effects

The ALEs in Figure 1 portray the marginal effects of the predictors but not interactions. Friedman's H-statistics (or H) were used to explore potential interaction effects. We only presented the H for the method that provided the best overall prediction (i.e., RF). Figure 2a displays the overall H for all the predictors. Recall that H values reflect the impact of all possible interaction effects associated with each predictor. The overall H was the highest (about 50%) for past work years, indicating that about 50% of SD in the prediction from all predictors was due to the interaction of past work years with one or more of the other variables. Centring on past work years, we examined all possible two-way interaction effects associated with it using specific H. As shown in Figure 2b, the top 2 two-way interaction effects are the one between past work years and total training hours (H = 37%) and the one between past work years and weekly work hours (H = 22%). These two interaction effects were visualised using ALEs in Figure 3.

As shown in Figure 3, the ALEs for interactions are two-dimensional heatmaps, with the x- and y-axes representing the two interacting predictors. The density of the colours reflects the level of the predicted outcome for each combination of the two predictor values, with lighter colours indicating a higher predicted probability of turnover. Figure 3 shows that newly hired employees are most likely to leave, and those who have worked for a long time are least likely to leave, regardless of the training or weekly work hours. In between, increased training hours could increase the probability of turnover (Figure 3a), while increased weekly work hours could decrease the probability of turnover (Figure 3b).

4.5 Case Study Using Shapley Values (SHAPs)

Finally, to examine how the leavers and stayers differ at the individual level, we extracted two extreme cases from the sample: the case with the highest predicted probability of leaving and the case with the lowest predicted probability of leaving. We compared their SHAPs of the predictors based on the RF results (see Figure 4). Note that the lengths of the bars in the figure reflect the magnitudes of the SHAPs, with a larger SHAP indicating a greater contribution of the corresponding predictor to the prediction for the specific case. One can see that the difference in prediction between the two cases was mainly determined by past work years, weekly work hours and the proportion of clients with a schizophrenia diagnosis, which is generally consistent with the predictors' variable importance scores (i.e., VIMs). Taking a closer look at the characteristics of the two cases, we find that although both were clinical employees with nonexempt work status, the leaver case had a shorter work history (7 months vs. 3 years), fewer work hours per week (5 vs. 40 h) and worked with more clients with a schizophrenia diagnosis (97% vs. 52%). In addition, the leaver case was a younger black male, whereas the stayer case was an older white female; however, age, race and gender showed minimal influence (i.e., small SHAPs) for these two sample cases.
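The logic of a case-level Shapley decomposition can be sketched for a small number of predictors. The following Python example computes exact Shapley values by enumerating all coalitions; it is an illustration under stated assumptions, not the RF-based analysis reported here. The logistic `toy_predict` function, its coefficients and the `background` grid are hypothetical; only the case characteristics (7 months vs. 3 years, 5 vs. 40 h, 97% vs. 52%) are taken from the text.

```python
import math
from itertools import combinations
from math import factorial


def toy_predict(row):
    # Hypothetical turnover model (assumed coefficients, illustration only).
    z = 0.5 - 0.4 * row["years"] - 0.03 * row["hours"] + 2.0 * row["share"]
    return 1.0 / (1.0 + math.exp(-z))


def shapley_values(predict, case, background):
    """Exact Shapley values for one case relative to a background sample."""
    features = list(case)
    n = len(features)

    def value(subset):
        # Expected prediction when the features in `subset` take the case's
        # values and the remaining features vary over the background rows.
        known = {f: case[f] for f in subset}
        return sum(predict({**b, **known}) for b in background) / len(background)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                total += weight * (value(subset + (f,)) - value(subset))
        phi[f] = total
    return phi


# Hypothetical background grid standing in for the rest of the sample.
background = [{"years": y, "hours": h, "share": s}
              for y in (1, 3, 5, 8)
              for h in (10, 20, 40)
              for s in (0.2, 0.5, 0.8)]

leaver = {"years": 7 / 12, "hours": 5, "share": 0.97}
stayer = {"years": 3, "hours": 40, "share": 0.52}

phi_leaver = shapley_values(toy_predict, leaver, background)
phi_stayer = shapley_values(toy_predict, stayer, background)
for name in leaver:
    print(f"{name:>5}: leaver {phi_leaver[name]:+.3f}, stayer {phi_stayer[name]:+.3f}")
```

By construction, the Shapley values of a case sum to the difference between its prediction and the average prediction over the background rows, which is why the bars can be read as contributions to that case's predicted probability.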

5 Discussion

The current study used various ML methods to predict employee turnover in a community mental health centre using HR data. These ML methods had not been extensively tested for mental health employee turnover prediction, nor had practical interpretations of the results been offered in a way that can be intuitively translated for practitioners (e.g., leadership, HR) in mental health. Among the examined methods, the tree-based methods (i.e., RF and GBM) showed better predictive performance, followed by SVM and NN. LR and EN were generally inferior to the other methods, probably due to their limitations in accounting for complex nonlinear relationships among turnover factors (Holtom et al. 2008). The best-performing methods achieved >0.8 AUC, >0.8 sensitivity and >0.7 specificity, indicating that the HR data were informative in predicting mental health employee turnover.
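For readers less familiar with these criteria, the following minimal Python sketch shows how AUC, sensitivity and specificity can be computed from predicted probabilities. The labels and scores are made-up values for illustration (1 = turned over, 0 = stayed), the 0.5 classification threshold is an assumption, and the function names are hypothetical.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Sensitivity: share of actual leavers predicted to leave.
    Specificity: share of actual stayers predicted to stay."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_score):
    """Probability that a randomly chosen leaver receives a higher score
    than a randomly chosen stayer (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = turned over, 0 = stayed.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.55]

sens, spec = sensitivity_specificity(y_true, y_score)
area = auc(y_true, y_score)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, AUC={area:.3f}")
```

Unlike sensitivity and specificity, AUC does not depend on the chosen threshold, which is one reason the three criteria are usually reported together.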

We presented the top five predictors from each method. As expected, there were overlaps and discrepancies. All methods agreed upon past work years being the most influential predictor. Total training hours and weekly work hours were also in the top three for most methods. These predictors could be more generalisable than the others. However, all the important predictors identified by the competing methods warrant further investigation and validation. For instance, three methods (i.e., LR, RF and SVM) identified that employees who worked with a higher proportion of clients with a schizophrenia diagnosis were more likely to leave. Working with more clients who struggle with severe mental health symptoms could add more job demands to the employees (leading to a higher probability of turnover). This concern has been discussed in the mental health literature (e.g., Rollins et al. 2010). However, the ML results provide the organisation's leaders and HR with direct implications for their turnover struggles in their organisational contexts.

Although some impactful predictors in this study were consistent with predictors discussed in the literature (e.g., past work years; Tsai, Bernacki, and Lucas 1989), others seemed to be inconsistent (e.g., nonsignificant associations were previously found for exempt status, marital status and educational degree; Fukui, Rollins, and Salyers 2020). The differences in methods (e.g., traditional statistical methods vs. ML methods), data sources (e.g., survey data vs. HR data), tested variables and prediction models, and implementation settings could contribute to the discrepancies. Understanding these differences requires further investigation (e.g., interpreting the predictors in the local contexts).

We used ALEs to explore the marginal relationship of each important predictor with the turnover outcome. The plots revealed nonlinear or nonmonotonic relationships between the predictors and turnover. For example, the predictive relationships of employees' wage and age with turnover varied across their ranges, suggesting the need to examine their impacts in different career and life stages. Evidence for interaction effects also emerged according to H-statistics and ALE heatmaps. In particular, past work years were found to interact with other turnover predictors, meaning that the impact of turnover factors may depend on an employee's past work years. The length of work years is typically included in turnover prediction models as a covariate (to control for its effect). However, our study suggests the importance of considering it as a moderator (i.e., examining its interaction effects). For instance, the organisation's leaders should consider the impact of job factors (e.g., training) based on their employees' career stages instead of implementing uniform turnover prevention practices across employees. Furthermore, comparing and visualising the characteristics of cases (employees) with different turnover probabilities via SHAPs can also help communicate with the organisation's leaders and HR about the risk and preventive factors in their organisational contexts. These findings provide valuable insights on how to tailor the interventions, given varying employee characteristics.

The results from the current study support the dynamic processes (mechanisms) of turnover (e.g., nonlinear, nonmonotonic, moderator or mediator) suggested in turnover theories (Holtom et al. 2008). However, empirical evidence for these dynamic processes was limited in the past due to the limitations of traditional statistical modelling, especially with small survey data sets, and the challenge of collecting an actual turnover outcome. In this respect, our study demonstrated the advantages of ML.

5.1 Limitations and Future Directions

Some limitations of the current study are worth mentioning. First, the ML turnover prediction models were developed in a specific organisational context. ML is a data-driven approach that can localise the prediction model. However, this may limit the generalisability of the findings to other mental health settings or organisations. Second, the current sample size was on the small side in the ML literature (Zhao et al. 2019). Although the ML methods are applicable to sample sizes similar to ours (e.g., Quinn, Rycraft, and Schoech 2002; Tzeng, Hsieh, and Lin 2004; Zhao et al. 2019), small sample sizes could compromise the generalisability of the result. Future research is warranted to apply the ML methods to more extensive HR data across multiple mental health centres and to validate the result with external data sets. Third, the result is limited to the predictors included in the study. If more predictors are included, especially those with potentially high impact, then the predictive performance, the rank order of the variable importance scores and the relationship patterns could change. Identifying important predictors that are missing in the current HR data and developing practical and efficient ways to integrate them into the HR data collection process would be promising in boosting the predictive power of HR data. Fourth, like other administratively collected data, HR data are not bias-free. For instance, if particular subpopulations (e.g., by gender, race and age) left the organisation systematically because of other structural factors (e.g., insufficient organisational support or discrimination towards minoritised employees), prediction models trained on the historical data can be biased. Finally, the current study predicted turnover probability within the first year and considered only averaged effects across individuals. With more data, future research could account for individual changes in turnover probability over time and examine time-varying and time-constant predictors of those changes. This will require implementing ML methods that are capable of handling longitudinal categorical data with random effects (Cascarano et al. 2023).

6 Conclusion

The current study has significant implications for community mental health research. The study applied various ML methods to leverage routinely collected organisational data (e.g., HR data) for predicting employee turnover. It also provided extended insights on how to interpret the findings with various tools. Through this effort, our study revealed that the relationships among turnover predictors are more conditional than often assumed, requiring an understanding of their interactive aspects rather than simply controlling for the effect of other variables (e.g., past work years). ML methods have advantages over traditional methods in this respect. Furthermore, our study suggests predictors that may be unique to the mental health employee population (e.g., the proportion of clients with a schizophrenia diagnosis). This is an important first step towards identifying employees at high risk of turnover for locally tailored and individualised (person-centred) interventions to prevent turnover in mental health.

7 Relevance for Clinical Practice

The proposed ML methods with HR data provide HR and leadership with a strategy to identify employees with high turnover probabilities without burdening them with additional data collection. Identifying such employees is a critical first step to developing localised (employee and agency-centred) interventions to prevent excessive turnover of the mental health workforce. The current study is part of a larger effort (NIMH R34MH119411), which was initiated through collaboration among mental health staff, organisational leaders and researchers to address high turnover struggles in the mental health field. Our next steps are for mental health practitioners and administrators to further contextualise the interpretations and applications in their specific organisational contexts through mixed methods approaches.

Author Contributions

Wei Wu and Sadaaki Fukui contributed to designing the study, analysing the data and drafting and approving the final manuscript.

Acknowledgements

We thank the study site's HR department and leadership for providing the data.

Disclosure

The material in this paper was first presented at the 2022 NIMH Mental Health Services Research Conference, August 2–3, 2022. The content is solely the responsibility of the authors and does not represent the official views of NIH.

Ethics Statement

The study was approved by the Indiana University Institutional Review Board (protocol#: 1908641462) and performed in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments, and waiver of informed consent was applied to the data already being collected by the organisation.

Conflicts of Interest

The authors declare no conflicts of interest.

Open Research

Data Availability Statement

Because of the sensitivity of data, the original data are not available to the public. However, a simulated dataset and the R scripts for all methods examined in the study as well as other online supplementary materials are available at https://osf.io/am9nj.

References

Aarons, G. A., and A. C. Sawitzky. 2006. “Organizational Climate Partially Mediates the Effect of Culture on Work Attitudes and Staff Turnover in Mental Health Services.” Administration and Policy in Mental Health and Mental Health Services Research 33, no. 3: 289–301. https://doi.org/10.1007/s10488-006-0039-1.
Beidas, R. S., S. Marcus, C. B. Wolk, et al. 2016. “A Prospective Examination of Clinician and Supervisor Turnover Within the Context of Implementation of Evidence-Based Practices in a Publicly-Funded Mental Health System.” Administration and Policy in Mental Health and Mental Health Services Research 43, no. 5: 640–649. https://doi.org/10.1007/s10488-015-0673-6.
Brabson, L. A., A. D. Herschell, D. J. Kolko, and S. J. Mrozowski. 2019. “Associations Among Job Role, Training Type, and Staff Turnover in a Large-Scale Implementation Initiative.” The Journal of Behavioral Health Services & Research 46, no. 3: 399–414. https://doi.org/10.1007/s11414-018-09645-1.
Breiman, L. 2001. “Random Forests.” Machine Learning 45, no. 1: 5–32. https://doi.org/10.1023/A:1010933404324.
Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. 2017. Classification and Regression Trees. Routledge. https://doi.org/10.1201/9781315139470.
Brereton, R. G., and G. R. Lloyd. 2010. “Support Vector Machines for Classification and Regression.” The Analyst 135, no. 2: 230–267. https://doi.org/10.1039/B918972F.
Bukach, A. M., F. K. Ejaz, N. Dawson, and R. J. Gitter. 2017. “Turnover Among Community Mental Health Workers in Ohio.” Administration and Policy in Mental Health and Mental Health Services Research 44, no. 1: 115–122. https://doi.org/10.1007/s10488-015-0706-1.
Cascarano, A., J. Mur-Petit, J. Hernández-González, et al. 2023. “Machine and Deep Learning for Longitudinal Biomedical Data: A Review of Methods and Applications.” Artificial Intelligence Review 56, no. S2: 1711–1771. https://doi.org/10.1007/s10462-023-10561-w.
Cho, Y. J., and H. J. Song. 2017. “Determinants of Turnover Intention of Social Workers: Effects of Emotional Labor and Organizational Trust.” Public Personnel Management 46, no. 1: 41–65. https://doi.org/10.1177/0091026017696395.
Domladovac, M. 2021. “Comparison of Neural Network With Gradient Boosted Trees, Random Forest, Logistic Regression and SVM in Predicting Student Achievement.” In 2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO), 211–216. Opatija, Croatia: IEEE. https://doi.org/10.23919/MIPRO52101.2021.9596684.
Dunn, O. J. 1961. “Multiple Comparisons Among Means.” Journal of the American Statistical Association 56, no. 293: 52–64. https://doi.org/10.1080/01621459.1961.10482090.
Esmaieeli Sikaroudi, A. M., R. Ghousi, and A. Sikaroudi. 2015. “A Data Mining Approach to Employee Turnover Prediction (Case Study: Arak Automotive Parts Manufacturing).” Journal of Industrial and Systems Engineering 8, no. 4: 106–121.
Friedman, J. H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals of Statistics 29, no. 5: 1189–1232. https://doi.org/10.1214/aos/1013203451.
Friedman, J. H. 2002. “Stochastic Gradient Boosting.” Computational Statistics & Data Analysis 38, no. 4: 367–378. https://doi.org/10.1016/S0167-9473(01)00065-2.
Friedman, J. H., and B. E. Popescu. 2008. “Predictive Learning via Rule Ensembles.” The Annals of Applied Statistics 2, no. 3: 916–954. https://doi.org/10.1214/07-AOAS148.
Fukui, S., A. L. Rollins, and M. P. Salyers. 2020. “Characteristics and Job Stressors Associated With Turnover and Turnover Intention Among Community Mental Health Providers.” Psychiatric Services 71, no. 3: 289–292. https://doi.org/10.1176/appi.ps.201900246.
Fukui, S., W. Wu, J. Greenfield, et al. 2023. “Machine Learning With Human Resources Data: Predicting Turnover Among Community Mental Health Center Employees.” The Journal of Mental Health Policy and Economics 26, no. 2: 63–76.
Fukui, S., W. Wu, and M. P. Salyers. 2019. “Mediational Paths From Supervisor Support to Turnover Intention and Actual Turnover Among Community Mental Health Providers.” Psychiatric Rehabilitation Journal 42, no. 4: 350–357. https://doi.org/10.1037/prj0000362.
Gokcen, I., and J. Peng. 2002. “Comparing Linear Discriminant Analysis and Support Vector Machines.” In Advances in Information Systems, edited by T. Yakhno, 104–113. Berlin, Heidelberg: Springer. https://doi.org/10.1007/3-540-36077-8_10.
Hallett, E., E. Simeon, V. Amba, D. Howington, K. J. McConnell, and J. M. Zhu. 2024. “Factors Influencing Turnover and Attrition in the Public Behavioral Health System Workforce: Qualitative Study.” Psychiatric Services 75, no. 1: 55–63. https://doi.org/10.1176/appi.ps.20220516.
Hart, S. 1989. “Shapley Value.” In Game Theory, edited by J. Eatwell, M. Milgate, and P. Newman, 210–216. UK: Palgrave Macmillan. https://doi.org/10.1007/978-1-349-20181-5_25.
Hastie, T., J. Friedman, and R. Tibshirani. 2001. The Elements of Statistical Learning. New York, NY: Springer. https://doi.org/10.1007/978-0-387-21606-5.
Hawkins, M. 2021. Review of Physician and Advanced Practitioner Recruiting Incentives [White Paper]. https://www.merritthawkins.com/uploadedFiles/physician-advanced-practitioner-incentive-review-2021.pdf.
Holtom, B. C., T. R. Mitchell, T. W. Lee, and M. B. Eberly. 2008. “5 Turnover and Retention Research: A Glance at the Past, a Closer Review of the Present, and a Venture Into the Future.” The Academy of Management Annals 2, no. 1: 231–274. https://doi.org/10.1080/19416520802211552.
Hom, P. W., T. W. Lee, J. D. Shaw, and J. P. Hausknecht. 2017. “One Hundred Years of Employee Turnover Theory and Research.” Journal of Applied Psychology 102, no. 3: 530–545. https://doi.org/10.1037/apl0000103.
Hosmer, D. W., S. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. New York, NY: John Wiley & Sons. https://doi.org/10.1002/9781118548387.
James, G., D. Witten, T. Hastie, and R. Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R. New York, NY: Springer US. https://doi.org/10.1007/978-1-0716-1418-1.
Jerez, J. M., I. Molina, P. J. García-Laencina, et al. 2010. “Missing Data Imputation Using Statistical and Machine Learning Methods in a Real Breast Cancer Problem.” Artificial Intelligence in Medicine 50, no. 2: 105–115. https://doi.org/10.1016/j.artmed.2010.05.002.
Jun, M.-J. 2021. “A Comparison of a Gradient Boosting Decision Tree, Random Forests, and Artificial Neural Networks to Model Urban Land Use Changes: The Case of the Seoul Metropolitan Area.” International Journal of Geographical Information Science 35, no. 11: 2149–2167. https://doi.org/10.1080/13658816.2021.1887490.
Kassambara, A. 2018. Machine Learning Essentials: Practical Guide in R. 1st ed. CreateSpace Independent Publishing Platform.
Kavzoglu, T., and P. M. Mather. 2003. “The Use of Backpropagating Artificial Neural Networks in Land Cover Classification.” International Journal of Remote Sensing 24, no. 23: 4907–4938. https://doi.org/10.1080/0143116031000114851.
Kirasich, K., T. Smith, and B. Sadler. 2018. “Random Forest vs Logistic Regression: Binary Classification for Heterogeneous Datasets.” SMU Data Science Review 1, no. 3: 25.
Kuhn, M. 2022. caret: Classification and Regression Training. R Package Version 6.0-92. https://CRAN.R-project.org/package=caret.
Kuhn, M., and K. Johnson. 2013. Applied Predictive Modeling. New York, NY: Springer. https://doi.org/10.1007/978-1-4614-6849-3.
Loh, W.-Y., and P. Zhou. 2021. Variable Importance Scores. (arXiv:2102.07765). https://doi.org/10.48550/arXiv.2102.07765.
Lundberg, S. M., and S.-I. Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” Advances in Neural Information Processing Systems 30: 4768–4777. https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html.
Molnar, C. 2018. “Iml: An R Package for Interpretable Machine Learning.” Journal of Open Source Software 3, no. 26: 786. https://doi.org/10.21105/joss.00786.
Molnar, C., G. Casalicchio, and B. Bischl. 2020. “Interpretable Machine Learning—A Brief History, State-of-the-Art and Challenges.” In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 1323, 417–431. Cham, Switzerland: Springer International Publishing. https://doi.org/10.1007/978-3-030-65965-3_28.
Murugan, A., S. A. H. Nair, and K. P. S. Kumar. 2019. “Detection of Skin Cancer Using SVM, Random Forest and kNN Classifiers.” Journal of Medical Systems 43, no. 8: 269. https://doi.org/10.1007/s10916-019-1400-8.
Pereira, J. M., M. Basto, and A. F. da Silva. 2016. “The Logistic Lasso and Ridge Regression in Predicting Corporate Failure.” Procedia Economics and Finance 39: 634–641. https://doi.org/10.1016/S2212-5671(16)30310-0.
Quinn, A., J. R. Rycraft, and D. Schoech. 2002. “Building a Model to Predict Caseworker and Supervisor Turnover Using a Neural Network and Logistic Regression.” Journal of Technology in Human Services 19, no. 4: 65–85. https://doi.org/10.1300/J017v19v04_05.
R Core Team. 2021. R: A Language and Environment for Statistical Computing [Computer software]. https://www.R-project.org/.
Rodríguez-Pérez, R., and J. Bajorath. 2020. “Interpretation of Machine Learning Models Using Shapley Values: Application to Compound Potency and Multi-Target Activity Predictions.” Journal of Computer-Aided Molecular Design 34, no. 10: 1013–1026. https://doi.org/10.1007/s10822-020-00314-0.
Rollins, A. L., M. P. Salyers, J. Tsai, and J. M. Lydick. 2010. “Staff Turnover in Statewide Implementation of ACT: Relationship With ACT Fidelity and Other Team Characteristics.” Administration and Policy in Mental Health and Mental Health Services Research 37, no. 5: 417–426. https://doi.org/10.1007/s10488-009-0257-4.
SAMHSA. 2022. Addressing Burnout in the Behavioral Health Workforce through Organizational Strategies. https://store.samhsa.gov/sites/default/files/SAMHSA_Digital_Download/pep22-06-02-005-.PDF.
Scanlan, J. N., and M. Still. 2013. “Job Satisfaction, Burnout and Turnover Intention in Occupational Therapists Working in Mental Health.” Australian Occupational Therapy Journal 60, no. 5: 310–318. https://doi.org/10.1111/1440-1630.12067.
Sexton, R. S., S. McMurtrey, J. O. Michalopoulos, and A. M. Smith. 2005. “Employee Turnover: A Neural Network Solution.” Computers & Operations Research 32, no. 10: 2635–2651. https://doi.org/10.1016/j.cor.2004.06.022.
Strobl, C., J. Malley, and G. Tutz. 2009. “An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests.” Psychological Methods 14, no. 4: 323–348. https://doi.org/10.1037/a0016973.
Sullivan, W. P., D. C. Kondrat, and D. Floyd. 2015. “The Pleasures and Pain of Mental Health Case Management.” Social Work in Mental Health 13, no. 4: 349–364. https://doi.org/10.1080/15332985.2014.955942.
Troyanskaya, O., M. Cantor, G. Sherlock, et al. 2001. “Missing Value Estimation Methods for DNA Microarrays.” Bioinformatics 17, no. 6: 520–525. https://doi.org/10.1093/bioinformatics/17.6.520.
Tsai, S. P., E. J. Bernacki, and L. J. Lucas. 1989. “A Longitudinal Method of Evaluating Employee Turnover.” Journal of Business and Psychology 3: 465–473. https://doi.org/10.1007/BF01020714.
Tzeng, H.-M., J.-G. Hsieh, and Y.-L. Lin. 2004. “Predicting Nurses' Intention to Quit With a Support Vector Machine: A New Approach to Set up an Early Warning Mechanism in Human Resource Management.” Computers, Informatics, Nursing 22, no. 4: 232–242. https://doi.org/10.1097/00024665-200407000-00012.
Yanchus, N. J., D. Periard, S. C. Moore, A. C. Carle, and K. Osatuke. 2015. “Predictors of Job Satisfaction and Turnover Intention in VHA Mental Health Employees: A Comparison Between Psychiatrists, Psychologists, Social Workers, and Mental Health Nurses.” Human Service Organizations: Management, Leadership & Governance 39, no. 3: 219–244. https://doi.org/10.1080/23303131.2015.1014953.
Yanchus, N. J., D. Periard, and K. Osatuke. 2017. “Further Examination of Predictors of Turnover Intention Among Mental Health Professionals.” Journal of Psychiatric and Mental Health Nursing 24, no. 1: 41–56. https://doi.org/10.1111/jpm.12354.
Zhang, Y., and A. Haghani. 2015. “A Gradient Boosting Method to Improve Travel Time Prediction.” Transportation Research Part C: Emerging Technologies 58: 308–324. https://doi.org/10.1016/j.trc.2015.02.019.
Zhao, Y., M. K. Hryniewicki, F. Cheng, B. Fu, and X. Zhu. 2019. “Employee Turnover Prediction With Machine Learning: A Reliable Approach.” In Intelligent Systems and Applications, edited by K. Arai, S. Kapoor, and R. Bhatia, vol. 869, 737–758. Cham, Switzerland: Springer International Publishing. https://doi.org/10.1007/978-3-030-01057-7_56.
Zou, H., and T. Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society, Series B: Statistical Methodology 67, no. 2: 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x.

Volume 33, Issue 6

December 2024

Pages 2180-2192

This article also appears in:

World Evidence-Based Healthcare Day 2024

Using Human Resources Data to Predict Turnover of Community Mental Health Employees: Prediction and Interpretation of Machine Learning Methods
