ORIGINAL ARTICLE

Open Access

Deep-HH: A deep learning-based high school student hidden hunger risk prediction system

Yang Yang

orcid.org/0009-0009-5175-7250

First School of Clinical Medicine, Anhui Medical University, Hefei, Anhui, China

Contribution: Conceptualization (equal), Data curation (equal), Formal analysis (equal), Investigation (equal), Methodology (equal), Software (equal), Validation (equal), Visualization (equal), Writing - original draft (equal)

Zheng Zhang

First School of Clinical Medicine, Anhui Medical University, Hefei, Anhui, China

Huake Cao

First School of Clinical Medicine, Anhui Medical University, Hefei, Anhui, China

Contribution: Formal analysis (equal), Investigation (equal), Validation (equal), Writing - original draft (equal)

Yuchen Zhang

orcid.org/0000-0002-0466-220X

Second School of Clinical Medicine, Anhui Medical University, Hefei, Anhui, China

Contribution: Formal analysis (equal), Investigation (equal), Validation (equal), Writing - original draft (equal)

Minao Wang

First School of Clinical Medicine, Anhui Medical University, Hefei, Anhui, China

Contribution: Validation (equal), Visualization (equal)

Corresponding Author

Ning Zhang

First School of Clinical Medicine, Anhui Medical University, Hefei, Anhui, China

Correspondence

Ning Zhang.

Email: [email protected]

Contribution: Funding acquisition (equal), Project administration (equal), Resources (equal), Supervision (equal), Validation (equal), Writing - review & editing (equal)

First published: 11 December 2024

https://doi.org/10.1002/med4.87

Yang Yang and Zheng Zhang contributed equally to this work and share the co-first authorship.

Abstract

Background

Hidden hunger (HH) refers to the deficiency of certain micronutrients. Current research suggests that approximately 70% of chronic diseases are linked to HH, which significantly affects public health. Consequently, there is an urgent need for an effective method to assess the risk of HH. This study aims to develop risk prediction models for HH using machine learning (ML).

Methods

We conducted a questionnaire survey among 9336 high school students in 11 cities within Anhui Province and assessed their HH risk using a scale. After quality control, we designated 632 students from Xuancheng City as the external test cohort and used the remaining 6477 students as the training cohort to develop predictive models. We used six ML algorithms (i.e., deep-learning neural network [DNN], random forest, support vector machine, extreme gradient boosting, gradient boosting decision tree, and k-nearest neighbor) to fit the training set using five-fold cross-validation, with hyperparameter tuning performed via Bayesian optimization. We used the “Streamlit” library to construct an online application and the “shapley additive explanations” library for model interpretability analysis.

Results

We observed that the DNN model performed best. In the external test cohort, the area under the curve reached 0.813, accuracy was 0.739, and sensitivity and specificity were 0.720 and 0.760, respectively. Furthermore, the precision-recall curve, calibration curve, and decision curve analysis also indicated that our model had high predictive accuracy. To aid practical use, we developed an online application (http://sec.mitusml.com:9000/). Through model interpretability analysis, we discovered that the frequent consumption of fruits and coarse grains was likely to reduce the risk of HH, whereas frequently eating snacks and fried foods increased the risk of HH.

Conclusions

We developed an effective prediction model for HH and analyzed the factors that influence its risk.

Abbreviations

CI: confidence interval
DCA: decision curve analysis
DNN: deep neural network
GBDT: gradient boosted decision trees
GDP: gross domestic product
HH: hidden hunger
HHAS: hidden hunger assessment scale
KNN: k-nearest neighbors
ML: machine learning
NPV: negative predictive value
PPV: positive predictive value
PR: precision-recall
RF: random forest
ROC: receiver operating characteristic
SHAP: shapley additive explanations
SVM: support vector machine
XGBoost: eXtreme gradient boosting

1 BACKGROUND

Hidden hunger (HH) refers to a condition in which the body, despite adequate energy intake, lacks specific micronutrients such as iron, zinc, and vitamin A [1]. The term “hidden” is used because this condition often lacks obvious physical symptoms [2]. An imbalance in nutrient intake is a major contributing factor to HH [3]. Statistics indicate that approximately 2 billion people are affected by HH worldwide, especially in economically underdeveloped regions [4]. Numerous studies have demonstrated the significant impact of HH on people's health [5]. For example, prolonged iodine deficiency can lead to cognitive impairments, and iodine deficiency during pregnancy can increase the risk of stillbirth [6]; long-term zinc deficiency can adversely affect growth and development and diminish the body's resistance to disease [7]; and vitamin A deficiency similarly increases the risk of maternal and fetal mortality [8]. Accurately identifying populations with micronutrient deficiencies is therefore essential for improving their symptoms. However, owing to technological and funding constraints, there is currently no effective assessment method for HH.

At present, the detection of micronutrients predominantly relies on conventional physical and chemical methods. For instance, water-soluble vitamins (including vitamins B₁, B₂, B₃, B₅, B₆, B₇, B₉, B₁₂, and C) are often analyzed using liquid chromatography tandem mass spectrometry (LC-MS/MS) and high-performance liquid chromatography (HPLC) techniques [9, 10]. Nonetheless, because standardization across laboratories is lacking and detection results are sensitive to sample collection conditions, the accuracy of the obtained results is often compromised [11]. Regarding the detection of fat-soluble vitamins, ¹³C isotope labeling LC-MS/MS (vitamin A) [12], LC-MS/MS (vitamins D and K) [13, 14], and liquid chromatography-ultraviolet detection (vitamin E) [15] are usually used. However, because of variations between detection techniques and the requirement for lipids and cholesterol as quantitative references, the test results are not always reliable [16]. For the quantitative analysis of minerals (including Fe, Cu, Zn, Se, I, and Mn), electrochemiluminescence immunoassay, atomic absorption spectroscopy, and inductively coupled plasma mass spectrometry are often used [17-19]. However, these measurements are also influenced by an individual's inflammatory status and lifestyle habits [20]. Moreover, the high cost of testing and the lack of specialized detection equipment also restrict large-scale screening for HH. Thus, there is an urgent need for an effective method to predict HH risk.

Machine learning (ML) is an ever-evolving realm encompassing computational algorithms designed to enable systems to acquire knowledge and emulate human intelligence by iteratively learning from the surrounding environment [21]. With the continuous application of ML in the field of medicine, it has demonstrated significant advantages in personalized disease treatment [22], drug discovery [23] and sensitive drug screening [24], surgical robotics [25], assistive surgical decision-making [26], medical image identification [27], and omics analysis [28]. Particularly, the application of ML in disease diagnosis has effectively facilitated early detection, thereby improving disease prognosis. For example, recently published ML-based diagnostic models have provided strong assistance in the clinical diagnosis of Parkinson's disease [29]. Regarding the risk assessment of HH, ML also holds significant potential. It can greatly enhance the efficiency of these assessments and reduce detection costs, making it particularly suitable for large-scale population screening. In a previous study, we found significant associations between individuals' living habits, diets, nutritional cognition, and their risk of HH [30]. Therefore, we speculated that establishing ML models based on these features would contribute to the auxiliary diagnosis of HH.

In this study, we used various ML algorithms to establish effective risk-prediction models for HH based on individuals' lifestyle habits, dietary patterns, and nutritional knowledge. We also conducted an in-depth analysis of the potential influencing factors of HH based on the model's interpretations. Additionally, to aid practical use, we developed an online application, which will facilitate the widespread screening and early prevention of HH.

2 METHODS

2.1 Data collection and pre-processing

Cities in Anhui Province were initially classified into southern, northern, eastern, and western regions based on their geographical locations. This classification facilitated comprehensive sampling across diverse areas, thereby minimizing geographical bias and enhancing the robustness of the predictive model. Subsequently, the cities were categorized as developed or underdeveloped using the average gross domestic product of all cities in Anhui Province as the cut-off value. If multiple cities had similar geographical locations and economic levels, only one city was randomly selected as a representative. Additionally, cities with highly imbalanced population distribution between urban and rural areas were excluded. Based on these selection criteria, 11 cities were chosen: Bengbu, Chuzhou, Anqing, Fuyang, Huaibei, Xuancheng, Wuhu, Huangshan, Suzhou, Hefei, and Huainan. A multistage stratified cluster sampling method was then used within these cities, with three or four senior high schools randomly selected in each city, giving a total of 35 senior high schools. Based on the inclusion criteria, namely (1) current enrollment in a high school in Anhui Province and (2) voluntary willingness to participate in the survey, a total of 9336 high school students were selected. Before the study began, official approval was obtained from the participating schools and informed consent was obtained from the survey participants. All participants were instructed on how to complete the survey effectively and were encouraged to provide truthful responses. The entire research process is illustrated in Figure 1.

FIGURE 1. Flow chart of this study. DNN, deep-learning neural network; GBDT, gradient boosted decision trees; KNN, k-nearest neighbors; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.

For each participant, a questionnaire and HH assessment scale (HHAS) were distributed [30]. The questionnaire contained demographic characteristics (including sex, grade, birthplace, number of siblings, mother's highest education level, father's highest education level, monthly expenditure on food, and daily sun exposure time), nutritional knowledge (including understanding level of HH and essential nutrients, understanding level of the relationship between HH and chronic diseases, willingness to acquire nutrition knowledge, attitude toward HH, and sources of acquiring nutrition knowledge), and lifestyle habits (including frequency of irregular eating, reasons for irregular eating, satisfaction level with daily meals, degree of emphasis on nutritional balance in daily life, existence of picky eating and strong-flavored dietary habits, the consumption of nutritional supplements in the past year, weekly exercise duration, supplementation of micronutrients after sweating, and frequency of consuming fried food, coarse grains, fruits, and snacks) (Figure S1). The questionnaire survey results were double entered by two investigators, and cross-checking was conducted using Epidata 3.1 (EpiData Association). The HHAS consisted of 12 items, each with three options (usually, sometimes, and seldom), which were assigned scores of 5, 3, 1 or 1, 3, 5. A total score of 38 or less indicates high risk, whereas a score above 38 indicates low risk [30]. After questionnaires with incomplete information and those containing outliers were excluded, 7109 high school students' demographic characteristics, nutritional awareness, lifestyle habits, and HH risk statuses were obtained.
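The HHAS cut-off described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `hhas_risk` is hypothetical, and the function assumes that each of the 12 items has already been converted to a score of 1, 3, or 5, because the text does not state which items use the 5, 3, 1 order and which use the 1, 3, 5 order.

```python
from typing import Sequence

HHAS_ITEMS = 12                # number of scale items stated in the text
HHAS_CUTOFF = 38               # a total score of 38 or less indicates high risk
VALID_ITEM_SCORES = {1, 3, 5}  # possible scores for a single item


def hhas_risk(item_scores: Sequence[int]) -> str:
    """Classify HH risk from item scores that have already been assigned.

    Hypothetical helper: the mapping from answers (usually, sometimes,
    seldom) to scores is item-specific and is not reproduced here.
    """
    if len(item_scores) != HHAS_ITEMS:
        raise ValueError(f"expected {HHAS_ITEMS} item scores, got {len(item_scores)}")
    if any(score not in VALID_ITEM_SCORES for score in item_scores):
        raise ValueError("each item score must be 1, 3, or 5")
    total = sum(item_scores)
    return "high-risk" if total <= HHAS_CUTOFF else "low-risk"
```

For example, eleven items scored 3 and one item scored 5 give a total of 38, which falls on the high-risk side of the cut-off, whereas ten items scored 3 and two scored 5 give 40, which is low-risk.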

2.2 ML model construction

2.2.1 Dataset splitting and feature engineering

A total of 632 high school students from Xuancheng City were allocated to an external test set, and the remaining 6477 students were divided into training and validation sets based on stratified five-fold cross-validation. Next, a chi-square test was conducted for each questionnaire item against its corresponding HH risk among the 6477 samples, and items with a p-value <0.05 were retained for ML model development. Then, the features of the training, internal validation, and external test sets were encoded using one-hot encoding, and the HH risks were encoded as 0 and 1 (where 0 represents low-risk and 1 represents high-risk). Because of the imbalanced nature of the dataset, the minority class samples in the training sets were subjected to a random oversampling technique.
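The order of operations in this step (encode, split into stratified folds, then oversample only the training part of each fold) can be sketched with the Python standard library. The study itself used the “imbalanced-learn” library for oversampling; the helper names `one_hot`, `stratified_folds`, and `random_oversample`, the random seeds, and the toy labels below are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def one_hot(rows, columns):
    """One-hot encode categorical answers; returns (matrix, feature_names)."""
    levels = {c: sorted({row[c] for row in rows}) for c in columns}
    names = [f"{c}={v}" for c in columns for v in levels[c]]
    matrix = [[int(row[c] == v) for c in columns for v in levels[c]] for row in rows]
    return matrix, names


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def random_oversample(indices, labels, seed=0):
    """Duplicate randomly chosen minority-class indices until classes match."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index in indices:
        by_class[labels[index]].append(index)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend(members + extra)
    return balanced


# Toy labels (hypothetical): 70 high-risk (1) and 30 low-risk (0) students.
toy_labels = [1] * 70 + [0] * 30
folds = stratified_folds(toy_labels, k=5, seed=42)
balanced_counts = []
for held_out in folds:
    held = set(held_out)
    train_idx = [i for i in range(len(toy_labels)) if i not in held]
    # Oversample the training part only; the validation fold stays untouched.
    train_idx = random_oversample(train_idx, toy_labels, seed=42)
    balanced_counts.append(Counter(toy_labels[i] for i in train_idx))
```

Oversampling inside each fold, rather than before splitting, keeps duplicated minority samples out of the validation fold, so validation metrics are not inflated by copies of training samples.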

2.2.2 DNN model construction

A deep-learning neural network (DNN) model was constructed using the “tensorflow” library (version 2.13.0). The model consisted of an input layer, three hidden layers, and an output layer. Overfitting is a common issue that requires attention in deep learning. To mitigate it, a dropout layer was incorporated after each hidden layer so that random nodes in the neural network were deactivated during training, which reduced the risk of overfitting and enhanced the model's robustness. The activation function for each hidden layer was ReLU and the output layer's activation function was Sigmoid:

$$\mathrm{ReLU}(x) = \max(0, x), \qquad \mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

where x denotes the input value for each neuron. The loss function was binary cross-entropy:

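$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p(y_i) + (1 - y_i)\log\bigl(1 - p(y_i)\bigr)\right]$$

where y_i denotes the actual class of sample i (0 or 1), p(y_i) denotes the predicted probability that sample i belongs to the high-risk class, and N denotes the number of samples. Then, the area under the receiver operating characteristic (ROC) curve (AUROC) was chosen as the evaluation metric: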

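$$\mathrm{AUROC} = \frac{\sum \mathrm{Rank}(\mathrm{posi}) - \dfrac{\mathrm{posi}\,(\mathrm{posi} + 1)}{2}}{\mathrm{posi} \times \mathrm{nega}}$$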

where ∑Rank (posi) denotes the sum of the ranks of all positive-class samples, posi denotes the number of positive-class samples, and nega denotes the number of negative-class samples. Adam was then selected as the optimizer. To mitigate overfitting during the training process, early stopping was implemented as a callback with the validation set's AUROC as a monitored metric. Training was stopped when the AUROC no longer improved. Finally, the “hyperopt” library (version 0.2.7) was used for Bayesian optimization, with the objective of minimizing the negative average AUROC on the validation sets, to search for the optimal values of the hidden layers' units, dropout layers' rates, and parameters of the Adam optimizer (including learning rate, beta_1, and beta_2).
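The activation functions, the binary cross-entropy loss, and the rank-based AUROC named in this section can be written out directly. The following is a minimal standard-library sketch for illustration only; the study used the “tensorflow” implementations, and the function names, the clipping constant `eps`, and the use of mean ranks for tied scores are assumptions.

```python
import math


def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def sigmoid(x):
    """Sigmoid activation, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy; p_pred holds predicted P(class = 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def auroc_by_ranks(y_true, scores):
    """AUROC from the rank sum of positive samples (ties share the mean rank)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    posi = sum(1 for y in y_true if y == 1)
    nega = len(y_true) - posi
    if posi == 0 or nega == 0:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - posi * (posi + 1) / 2.0) / (posi * nega)
```

As a small check, for labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, the positive samples hold ranks 2 and 4, so the AUROC is (6 − 3) / (2 × 2) = 0.75.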

2.2.3 Construction of other ML models

The algorithms random forest (RF), gradient boosting decision tree (GBDT), support vector machine (SVM), and k-nearest neighbors (KNN) from Python's “scikit-learn” library (version 1.2.0), along with the extreme gradient boosting (XGBoost) algorithm from the “xgboost” library (version 1.7.6), were used to fit the training data to develop predictive models, which were then evaluated using an external test set. Specifically, the Bayesian optimization algorithm from the “hyperopt” library (version 0.2.7) and the stratified five-fold cross-validation technique were used for hyperparameter optimization, and the random oversampling algorithm from Python's “imbalanced-learn” library (version 0.12.4) was used to balance the data. For more details, please refer to Supporting Information S1: Methods.

2.3 Statistical analysis

All statistical tests were conducted using Python (version 3.11.3) and R (version 4.3.1), with two-sided tests performed and a significance level set at p < 0.05 to indicate statistical significance. To assess the performance of these models, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score were calculated, respectively, for each model on the training sets, validation sets, and external test set:

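$$\begin{aligned}
\mathrm{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, &
\mathrm{Sensitivity} &= \frac{TP}{TP + FN}, &
\mathrm{Specificity} &= \frac{TN}{TN + FP}, \\
\mathrm{PPV} &= \frac{TP}{TP + FP}, &
\mathrm{NPV} &= \frac{TN}{TN + FN}, &
\mathrm{F1\ score} &= \frac{2 \times \mathrm{PPV} \times \mathrm{Sensitivity}}{\mathrm{PPV} + \mathrm{Sensitivity}}
\end{aligned}$$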

where TP denotes true positive samples, TN denotes true negative samples, FP denotes false positive samples, and FN denotes false negative samples. The ROC curves, precision-recall (PR) curves, and calibration curves were plotted using the “scikit-learn” library. The decision curve analysis (DCA) curve was plotted using the “dcurves” Python library (Version 1.1.0). The interpretation of the model was conducted using the “shapley additive explanations (SHAP)” Python library (Version 0.45.0).
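The metrics defined in terms of TP, TN, FP, and FN can be sketched as a single function. The function name is hypothetical. The example counts are an assumption: they were back-calculated here as one confusion matrix that is consistent with the test-set size (366 high-risk and 266 low-risk students) and with the rounded DNN metrics reported in Table 2; the article does not report the confusion matrix itself.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the six reported metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes present and
    at least one prediction of each class).
    """
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        # Harmonic mean of PPV and sensitivity; equals 2TP / (2TP + FP + FN).
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Illustrative counts only (assumption, not reported in the article).
example = classification_metrics(tp=264, fp=63, tn=203, fn=102)
```

With these counts, the accuracy is 467/632 ≈ 0.739 and the NPV is 203/305 ≈ 0.666, which agree with the DNN test-set row of Table 2 after rounding.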

3 RESULTS

3.1 Baseline characteristics of the study subjects

A total of 9336 samples were collected. After samples with missing values and outliers were excluded, 7109 valid samples were obtained. The samples from Xuancheng City, totaling 632, were designated as the external test set, while the remaining 6477 samples were used for constructing the ML models. In the training set, a total of 4412 high school students (68.12%) were evaluated as being at high risk of HH, whereas in the test set, 366 high school students (57.91%) were deemed to be at high risk of HH. In the training data, senior-one students constituted the largest proportion (45.47%), followed by senior-two students (35.43%), and senior-three students represented the smallest proportion (19.10%). The external test set from Xuancheng City exhibited a similar distribution, with senior-one, senior-two, and senior-three students comprising 47.94%, 39.40%, and 12.66%, respectively. The proportion of only children in the test set (50.16%) was significantly higher than that in the training set (29.16%) (Table 1).

TABLE 1. Baseline characteristics of the study subjects.

Characteristics	Training set	Test set	χ²	p-value^a
Sex			2.94	0.09
Male	3183 (49.14)	288 (45.57)
Female	3294 (50.86)	344 (54.43)
Grade			16.19	<0.05
Senior one	2945 (45.47)	303 (47.94)
Senior two	2295 (35.43)	249 (39.40)
Senior three	1237 (19.10)	80 (12.66)
Birthplace			1.00	0.32
Urban	1864 (28.78)	170 (26.90)
Rural	4613 (71.22)	462 (73.10)
Is the only child in family			118.58	<0.05
Yes	1889 (29.16)	317 (50.16)
No	4588 (70.84)	315 (49.84)
Father's highest education			18.26	<0.05
Incomplete primary education	366 (5.65)	35 (5.54)
Elementary or middle school	3712 (57.31)	345 (54.59)
High school or vocational school	1556 (24.02)	151 (23.89)
Associate degree or undergraduate	740 (11.43)	100 (15.82)
Master's degree and above	103 (1.59)	1 (0.16)
Mother's highest education			32.08	<0.05
Incomplete primary education	930 (14.36)	55 (8.70)
Elementary or middle school	3661 (56.52)	395 (62.50)
High school or vocational school	1265 (19.53)	111 (17.56)
Associate degree or undergraduate	537 (8.29)	71 (11.24)
Master's degree and above	84 (1.30)	0 (0.00)
HH risk			27.22	<0.05
Low-risk	2065 (31.88)	266 (42.09)
High-risk	4412 (68.12)	366 (57.91)

Note: Data are presented as n (%).
Abbreviation: HH, hidden hunger.
^a Chi-square test.
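The χ² values in Table 1 can be checked from the counts in the table. The sketch below computes the Pearson chi-square statistic without a continuity correction, which appears to match the reported values; that the authors used the uncorrected statistic is an inference from the numbers, not something the article states. The function names are hypothetical, and the p-value helper is valid only for one degree of freedom (a 2 × 2 table).

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail p-value for a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Counts from Table 1: columns are (training set, test set).
sex_table = [[3183, 288], [3294, 344]]      # male, female
hh_risk_table = [[2065, 266], [4412, 366]]  # low-risk, high-risk

sex_stat, sex_dof = chi_square_statistic(sex_table)
sex_p = p_value_one_dof(sex_stat)
hh_stat, hh_dof = chi_square_statistic(hh_risk_table)
```

For the sex row this gives a statistic of about 2.94 with a p-value of about 0.09, and for the HH risk row a statistic of about 27.22, in line with the table.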

3.2 Performance of ML models

All the developed ML models demonstrated high predictive accuracy. Notably, the DNN model, termed “Deep-HH,” achieved the highest AUROC among the six models in the external test cohort from Xuancheng City (0.813; Table 2). The specific architecture of this model is presented in Figure 2. Based on stratified five-fold cross-validation, the average AUROC was 0.857 for the training sets and 0.833 for the internal validation sets (Figure 3a,b), and the AUROC for the external test set was 0.813 (Figure 3c). Because the dataset was imbalanced, PR curves were additionally plotted to further evaluate the model's predictive performance. The Deep-HH model achieved average precision values of 0.861, 0.910, and 0.843 for the training sets, internal validation sets, and external test set, respectively, which further supported its predictive accuracy (Figure 3d–f). Furthermore, the calibration curves for the training, internal validation, and external test sets indicated that the Deep-HH model was well calibrated (Figure 3g–i). DCA, a method for evaluating a model's practical utility, indicated that the Deep-HH model had significant value (Figure 3j–l). For more performance metrics of the Deep-HH model, please refer to Table 2. To aid practical use, an online application was developed (http://sec.mitusml.com:9000/).

TABLE 2. Metrics of six models for the training sets, internal validation sets, and external test set.

Data sets	Metrics	DNN (95% CI)	RF (95% CI)	SVM (95% CI)	GBDT (95% CI)	XGBoost (95% CI)	KNN (95% CI)
Training sets	Accuracy	0.777 (0.769–0.785)	0.774 (0.771–0.778)	0.772 (0.769–0.775)	0.768 (0.764–0.771)	0.769 (0.767–0.771)	0.748 (0.744–0.752)
	Sensitivity	0.822 (0.797–0.847)	0.806 (0.801–0.811)	0.810 (0.804–0.816)	0.808 (0.804–0.812)	0.794 (0.789–0.799)	0.780 (0.774–0.786)
	Specificity	0.730 (0.690–0.770)	0.746 (0.738–0.754)	0.738 (0.727–0.749)	0.728 (0.717–0.739)	0.744 (0.739–0.749)	0.716 (0.711–0.721)
	PPV	0.756 (0.734–0.778)	0.758 (0.751–0.765)	0.756 (0.748–0.764)	0.748 (0.741–0.755)	0.754 (0.749–0.759)	0.732 (0.728–0.736)
	NPV	0.805 (0.789–0.821)	0.793 (0.791–0.795)	0.794 (0.791–0.797)	0.791 (0.789–0.894)	0.783 (0.780–0.786)	0.765 (0.759–0.770)
	F1 score	0.784 (0.779–0.789)	0.781 (0.779–0.784)	0.780 (0.779–0.782)	0.778 (0.774–0.782)	0.774 (0.769–0.779)	0.758 (0.754–0.762)
	AUROC	0.857 (0.849–0.863)	0.855 (0.854–0.856)	0.848 (0.847–0.849)	0.849 (0.849–0.850)	0.848 (0.846–0.850)	0.829 (0.827–0.831)
Validation sets	Accuracy	0.773 (0.767–0.779)	0.766 (0.760–0.773)	0.774 (0.767–0.782)	0.776 (0.769–0.783)	0.766 (0.755–0.777)	0.755 (0.749–0.761)
	Sensitivity	0.808 (0.773–0.843)	0.794 (0.778–0.810)	0.804 (0.791–0.817)	0.806 (0.796–0.816)	0.788 (0.774–0.802)	0.776 (0.758–0.794)
	Specificity	0.696 (0.638–0.754)	0.704 (0.682–0.726)	0.708 (0.686–0.730)	0.712 (0.691–0.733)	0.720 (0.694–0.746)	0.710 (0.680–0.740)
	PPV	0.852 (0.832–0.872)	0.852 (0.842–0.862)	0.854 (0.846–0.862)	0.856 (0.848–0.864)	0.856 (0.844–0.868)	0.852 (0.841–0.863)
	NPV	0.633 (0.609–0.657)	0.617 (0.603–0.631)	0.630 (0.618–0.642)	0.632 (0.620–0.644)	0.614 (0.597–0.631)	0.598 (0.586–0.610)
	F1 score	0.828 (0.818–0.838)	0.822 (0.815–0.829)	0.828 (0.824–0.832)	0.830 (0.824–0.836)	0.822 (0.811–0.833)	0.810 (0.804–0.816)
	AUROC	0.833 (0.829–0.839)	0.832 (0.825–0.839)	0.837 (0.832–0.842)	0.839 (0.833–0.845)	0.836 (0.829–0.843)	0.823 (0.818–0.828)
Test set	Accuracy	0.739	0.728	0.734	0.747	0.729	0.723
	Sensitivity	0.720	0.700	0.700	0.720	0.700	0.740
	Specificity	0.760	0.770	0.780	0.780	0.770	0.700
	PPV	0.810	0.810	0.820	0.820	0.810	0.770
	NPV	0.666	0.650	0.654	0.671	0.652	0.662
	F1 score	0.760	0.750	0.750	0.770	0.750	0.760
	AUROC	0.813	0.801	0.809	0.812	0.806	0.796

Abbreviations: CI, confidence interval; DNN, deep-learning neural network; GBDT, gradient boosting decision tree; KNN, k-nearest neighbors; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.

Additionally, the predictive performance of the GBDT model was second only to that of the Deep-HH model, and achieved an AUROC of 0.812 for the external test set (Table 2). Both the ROC and PR curves demonstrated its high predictive accuracy (Figure S2a–f). In comparison, the SVM (Figure S3a–f), XGBoost (Figure S4a–f), and RF (Figure S5a–f) models achieved AUROC values of 0.809, 0.806, and 0.801, respectively, for the external test set (Table 2). The KNN model exhibited poorer performance compared with the other models and achieved an AUROC of only 0.796 for the external test set (Figure S6a–f). Table 2 presents the detailed performance metrics of these predictive models for the training sets, internal validation sets, and external test set.

3.3 ML model interpretation

The “SHAP” library identified the top 20 most important features in the Deep-HH model as follows: frequency of eating fruits, paying attention to nutritional balance, frequency of eating coarse grains, satisfaction with diet, grade, monthly expenditure on food, frequency of irregular eating, father's highest education level, understanding of essential nutrients, replenishing micronutrients after sweating, daily sun exposure time, weekly exercise time, mother's highest education level, birthplace, frequency of eating snacks, understanding of HH, frequency of eating fried food, reasons for irregular eating, understanding the relationship between HH and chronic disease, and the only child in the family (Figure 4a). Among these features, frequent consumption of fruits and coarse grains is associated with a lower risk of HH, whereas excessive intake of fried foods may increase it. Furthermore, maintaining a balanced diet, timely replenishment of micronutrients after sweating, and satisfaction with daily meals are correlated with a reduced HH risk (Figure 4b).
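To make the idea behind these attributions concrete, the sketch below computes exact Shapley values for a tiny hand-written scoring function. It is not the Deep-HH model and does not use the “SHAP” library: the function `toy_risk`, its weights, and the feature values are hypothetical and were chosen by hand rather than estimated from the study data. It also replaces absent features with a single baseline instance, which is a simplification of the background-data averaging that SHAP-style explainers typically use.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values of each feature for one instance.

    Features outside the coalition take their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return model(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                contribution += weight * (value(coalition | {i}) - value(coalition))
        phi.append(contribution)
    return phi


def toy_risk(x):
    """Hypothetical risk score with hand-picked weights (not fitted to data)."""
    fruit, coarse_grains, fried_food = x
    return 0.6 - 0.2 * fruit - 0.1 * coarse_grains + 0.15 * fried_food


instance = [1, 1, 0]  # frequent fruit and coarse grains, no fried food
baseline = [0, 0, 1]  # reference profile used for absent features
phi = shapley_values(toy_risk, instance, baseline)
```

The attributions sum to the difference between the score of the instance and the score of the baseline (here 0.30 − 0.75 = −0.45), which is the additivity property that lets per-feature contributions be read as shares of a prediction.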

4 DISCUSSION

In this study, we developed effective predictive models for HH risk based on the demographic characteristics, nutritional awareness, and lifestyle habits of high school students. Additionally, we used the “SHAP” Python library for model explanation to investigate potential factors that affect the risk of HH in high school students, with the aim of leveraging this insight for early intervention against HH. Furthermore, we developed a web application based on the established model.

Currently, the detection of micronutrients in the human body is typically conducted using techniques such as HPLC, ELISA, and LC-MS/MS [31]. However, these techniques still encounter several limitations in practical applications. For example, studies have shown that HH is more prevalent in low-income and middle-income countries and regions [32]. Unfortunately, in these countries and regions, the medical infrastructure is often underdeveloped, lacking the necessary human resources, physical facilities, and financial resources to support large-scale laboratory screening for HH [33]. Therefore, a low-cost and high-efficiency detection method would be beneficial for the large-scale screening of HH, particularly in these areas. ML, because of its efficiency and convenience, is particularly well-suited for addressing these types of problems. For instance, the diagnostic model for acute appendicitis, developed using ML algorithms, significantly outperforms traditional clinical scoring systems and markedly accelerates the evaluation process. This advancement facilitates earlier medical treatment and substantially reduces labor costs [34]. Our Deep-HH model also effectively addressed the limitations inherent in current HH detection methods. By inputting basic personal information, users can assess their HH risk within minutes using our model, which dramatically reduces the time required compared with traditional methods. The online deployment of this model allows users to rapidly evaluate their HH risk via electronic devices, which significantly lowers equipment costs and reduces the reliance on professional staff. This is particularly beneficial for economically underdeveloped regions with limited resources, which provides our predictive model with a distinct advantage for rapid screening of HH in large-scale populations.

Traditionally, deep learning models have often been regarded as black boxes. However, in the medical field, attempting to interpret such black boxes has become crucial [35]. Therefore, we conducted a detailed analysis using the “SHAP” library. Our findings indicate that dietary habits significantly influence the risk of HH. Individuals who regularly consume fruits and coarse grains, pay attention to nutritional balance in their diet, and are satisfied with their meals exhibit a reduced risk of HH. Conversely, those who frequently eat fried foods and have irregular eating habits face an increased risk of HH. Research indicates that fruits are rich in vitamin C, vitamin K, magnesium, and carotenoids [36], and grains contain ample amounts of B vitamins, vitamin E, zinc, selenium, magnesium, and copper [37]. Furthermore, coarse grains are a significant source of dietary fiber, which can help mitigate the risk of coronary heart disease, diabetes, and certain gastrointestinal disorders [38]. Thus, regularly consuming fruits and coarse grains helps to replenish these essential nutrients, thereby reducing the risk of HH. In contrast, the prolonged consumption of large quantities of fried foods can elevate the risk of diabetes and might even heighten the risk of throat cancer [39, 40]. Consequently, we recommend minimizing the consumption of fried foods as much as possible. Beyond diet, our research indicates that individual lifestyle habits significantly influence the risk of HH. Specifically, those who promptly replenish micronutrients after sweating, spend more time in the sun, and exercise for longer each week show a reduced risk of HH. Existing studies show that the synthesis of vitamin D in the human body is closely linked to sun exposure duration [41]. Hence, extended periods of sunlight exposure contribute to the efficient synthesis of vitamin D, thereby facilitating nutritional replenishment and lowering the risk of HH. Additionally, we discovered a significant correlation between an individual's familiarity with nutritional knowledge and the individual's risk of HH. Those more knowledgeable about nutrition face a lower risk of HH, which is likely because of their proactive engagement in preventive interventions. Intriguingly, our findings indicate that the higher the educational attainment of the participants' fathers, the lower the participants' risk of HH. This is likely to be linked to improved economic status and enhanced nutritional understanding [42, 43].

Our predictive model demonstrated high accuracy in both the internal validation and external test sets. However, it still showed a gap compared with the HHAS, potentially because of the model's limited focus on the quantitative assessment of diet. Quantitatively evaluating diet increases the complexity of implementation and requires professional guidance, which can impede the rapid screening of HH risk in large-scale populations. Despite this, certain improvements can help to mitigate this gap. Through model interpretation, we identified that dietary habits, particularly the frequency of consuming fruits and coarse grains, significantly impact the risk of HH. Therefore, when subjects demonstrate a marked reduction in the frequency of consuming fruits, coarse grains, and similar foods, it is crucial to direct increased attention toward these cases to enhance the model's detection rate.
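One possible reading of this recommendation is a simple triage rule layered on top of the model output. The sketch below is an assumption about how such a rule might look, not part of the published system: the field names, the probability threshold, and the definition of a "marked reduction" are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the paper does not specify any of these values.
HIGH_RISK_THRESHOLD = 0.5   # model probability at or above this is high risk
MARKED_DROP_DAYS = 3        # drop in days per week counted as "marked"


@dataclass
class Screening:
    model_probability: float  # assumed model output in [0, 1]
    fruit_days_before: int    # days per week with fruit, earlier survey
    fruit_days_now: int       # days per week with fruit, current survey
    grain_days_before: int    # days per week with coarse grains, earlier survey
    grain_days_now: int       # days per week with coarse grains, current survey


def marked_drop(before, now, min_drop=MARKED_DROP_DAYS):
    """True when intake frequency fell by at least `min_drop` days per week."""
    return before - now >= min_drop


def triage(s):
    """Return 'high risk', 'review', or 'low risk' for one respondent."""
    if s.model_probability >= HIGH_RISK_THRESHOLD:
        return "high risk"
    # Below the model threshold, a marked drop in fruit or coarse grain
    # intake still routes the respondent to manual review.
    if (marked_drop(s.fruit_days_before, s.fruit_days_now)
            or marked_drop(s.grain_days_before, s.grain_days_now)):
        return "review"
    return "low risk"


respondent = Screening(model_probability=0.3,
                       fruit_days_before=5, fruit_days_now=1,
                       grain_days_before=4, grain_days_now=4)
print(triage(respondent))
```

Keeping the rule separate from the model leaves the predicted probability unchanged and only widens the set of cases that receive follow-up, which is the kind of added attention the paragraph above suggests.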

However, there were still some limitations to this study. For instance, the groups predicted as high-risk in this study still require further validation through laboratory testing methods. Additionally, this predictive model was established based on a population of high school students and may not be applicable to other age groups. Therefore, there is a need for future research to develop predictive models based on a more diverse sample.

5 CONCLUSIONS

We developed an efficient, accurate, and cost-effective ML-based risk prediction model for HH. This model will facilitate the screening of HH in large-scale populations.

AUTHOR CONTRIBUTIONS

Yang Yang: Conceptualization (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); software (equal); validation (equal); visualization (equal); writing—original draft (equal). Zheng Zhang: Conceptualization (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); software (equal); validation (equal); visualization (equal); writing—original draft (equal). Huake Cao: Formal analysis (equal); investigation (equal); validation (equal); writing—original draft (equal). Yuchen Zhang: Formal analysis (equal); investigation (equal); validation (equal); writing—original draft (equal). Minao Wang: Validation (equal); visualization (equal). Ning Zhang: Funding acquisition (equal); project administration (equal); resources (equal); supervision (equal); validation (equal); writing—review & editing (equal).

ACKNOWLEDGMENTS

We thank the Undergraduate Interdisciplinary Medical Research Association of Anhui Medical University for providing valuable guidance, support, and research opportunities.

CONFLICT OF INTEREST STATEMENT

The authors declare there are no competing interests.

ETHICS STATEMENT

This study was approved by the ethics committee of Anhui Medical University (20190495).

INFORMED CONSENT

All patients provided written informed consent at the time of entering this study.

Open Research

DATA AVAILABILITY STATEMENT

The code used in this study can be downloaded from GitHub (https://github.com/YangYangRes/Deep-HH). The data used in this study are available via Mendeley Data (https://data.mendeley.com/datasets/2v9hf95wdx/2).

REFERENCES

1. Lowe NM. The global challenge of hidden hunger: perspectives from the field. Proc Nutr Soc. 2021; 80(3): 283–289. https://doi.org/10.1017/S0029665121000902
2. Eggersdorfer M, Akobundu U, Bailey RL, Shlisky J, Beaudreault AR, Bergeron G, et al. Hidden hunger: solutions for America's aging populations. Nutrients. 2018; 10(9):1210. https://doi.org/10.3390/nu10091210
3. Ibeanu VN, Edeh CG, Ani PN. Evidence-based strategy for prevention of hidden hunger among adolescents in a suburb of Nigeria. BMC Public Health. 2020; 20(1):1683. https://doi.org/10.1186/s12889-020-09729-8
4. Harding KL, Aguayo VM, Webb P. Hidden hunger in South Asia: a review of recent trends and persistent challenges. Public Health Nutr. 2018; 21(4): 785–795. https://doi.org/10.1017/s1368980017003202
5. Burchi F, Fanzo J, Frison E. The role of food and nutrition system approaches in tackling hidden hunger. Int J Environ Res Public Health. 2011; 8(2): 358–373. https://doi.org/10.3390/ijerph8020358
6. Zimmermann MB. Iodine deficiency. Endocr Rev. 2009; 30(4): 376–408. https://doi.org/10.1210/er.2009-0011
7. Tuerk MJ, Fazel N. Zinc deficiency. Curr Opin Gastroenterol. 2009; 25(2): 136–143. https://doi.org/10.1097/mog.0b013e328321b395
8. Song P, Adeloye D, Li S, Zhao D, Ye X, Pan Q, et al. The prevalence of vitamin A deficiency and its public health significance in children in low- and middle-income countries: a systematic review and modelling analysis. J Glob Health. 2023; 13:04084. https://doi.org/10.7189/jogh.13.04084
9. Abano EE, Godbless Dadzie R. Simultaneous detection of water-soluble vitamins using the High Performance Liquid Chromatography (HPLC) - a review. Croat J Food Sci Technol. 2014; 6(2): 116–123. https://doi.org/10.17508/cjfst.2014.6.2.08
10. Kakitani A, Inoue T, Matsumoto K, Watanabe J, Nagatomi Y, Mochizuki N. Simultaneous determination of water-soluble vitamins in beverages and dietary supplements by LC-MS/MS. Food Addit Contam Part A Chem Anal Control Expo Risk Assess. 2014; 31(12): 1939–1948. https://doi.org/10.1080/19440049.2014.977965
11. Puts J, de Groot M, Haex M, Jakobs B. Simultaneous determination of underivatized vitamin B1 and B6 in whole blood by reversed phase ultra high performance liquid chromatography tandem mass spectrometry. PLoS One. 2015; 10(7):e0132018. https://doi.org/10.1371/journal.pone.0132018
12. Oxley A, Berry P, Taylor GA, Cowell J, Hall MJ, Hesketh J, et al. An LC/MS/MS method for stable isotope dilution studies of β-carotene bioavailability, bioconversion, and vitamin A status in humans. J Lipid Res. 2014; 55(2): 319–328. https://doi.org/10.1194/jlr.d040204
13. Shah I, James R, Barker J, Petroczi A, Naughton DP. Misleading measures in Vitamin D analysis: a novel LC-MS/MS assay to account for epimers and isobars. Nutr J. 2011; 10(1):46. https://doi.org/10.1186/1475-2891-10-46
14. Zhang Y, Bala V, Mao Z, Chhonker YS, Murry DJ. A concise review of quantification methods for determination of vitamin K in various biological matrices. J Pharm Biomed Anal. 2019; 169: 133–141. https://doi.org/10.1016/j.jpba.2019.03.006
15. Korchazhkina O, Jones E, Czauderna M, Spencer SA, Kowalczyk J. HPLC with UV detection for measurement of vitamin E in human milk. Acta Chromatogr. 2005; 16:48.
16. Sauberlich HE, Dowdy RP, Skala JH. Laboratory tests for the assessment of nutritional status. CRC Crit Rev Clin Lab Sci. 1973; 4(3): 215–340. https://doi.org/10.3109/10408367309151557
17. Sofiantin N, Kurniawan LB, Arif M. Analysis of ferritin levels, TIBC and Fe serum in central obesity and non central obesity. STRADA J Ilm Kesehat. 2021; 10(1): 1265–1271. https://doi.org/10.30994/SJIK.V10I1.691
18. Smith JC, Butrimovitz GP, Purdy WC. Direct measurement of zinc in plasma by atomic absorption spectroscopy. Clin Chem. 1979; 25(8): 1487–1491. https://doi.org/10.1093/clinchem/25.8.1487
19. Forrer R, Gautschi K, Lutz H. Simultaneous measurement of the trace elements Al, As, B, Be, Cd, Co, Cu, Fe, Li, Mn, Mo, Ni, Rb, Se, Sr, and Zn in human serum and their reference ranges by ICP-MS. Biol Trace Elem Res. 2001; 80(1): 77–93. https://doi.org/10.1385/BTER:80:1:77
20. Höller U, Bakker SJL, Düsterloh A, Frei B, Köhrle J, Konz T, et al. Micronutrient status assessment in humans: current methods of analysis and future trends. Trac Trends Anal Chem. 2018; 102: 110–122. https://doi.org/10.1016/j.trac.2018.02.001
21. El Naqa I, Murphy MJ. What is machine learning? In: Machine learning in radiation oncology. Cham: Springer International Publishing; 2015. p. 3–11. https://doi.org/10.1007/978-3-319-18305-3_1
22. MacEachern SJ, Forkert ND. Machine learning for precision medicine. Genome. 2021; 64(4): 416–425. https://doi.org/10.1139/gen-2020-0131
23. Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 2019; 18(6): 463–477. https://doi.org/10.1038/s41573-019-0024-5
24. Menden MP, Iorio F, Garnett M, McDermott U, Benes CH, Ballester PJ, et al. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS One. 2013; 8(4):e61318. https://doi.org/10.1371/journal.pone.0061318
25. Kassahun Y, Yu B, Tibebu AT, Stoyanov D, Giannarou S, Metzen JH, et al. Surgical robotics beyond enhanced dexterity instrumentation: a survey of machine learning techniques and their role in intelligent and autonomous surgical actions. Int J Comput Assist Radiol Surg. 2016; 11(4): 553–568. https://doi.org/10.1007/s11548-015-1305-z
26. Kiranantawat K, Sitpahul N, Taeprasartsit P, Constantinides J, Kruavit A, Srimuninnimit V, et al. The first Smartphone application for microsurgery monitoring: SilpaRamanitor. Plast Reconstr Surg. 2014; 134(1): 130–139. https://doi.org/10.1097/PRS.0000000000000276
27. Mohammad-Rahimi H, Nadimi M, Ghalyanchi-Langeroudi A, Taheri M, Ghafouri-Fard S. Application of machine learning in diagnosis of COVID-19 through X-ray and CT images: a scoping review. Front Cardiovasc Med. 2021; 8:638011. https://doi.org/10.3389/fcvm.2021.638011
28. Pang J, Liang B, Ding R, Yan Q, Chen R, Xu J. A denoised multi-omics integration framework for cancer subtype classification and survival prediction. Brief Bioinform. 2023; 24(5):bbad304. https://doi.org/10.1093/bib/bbad304
29. Mei J, Desrosiers C, Frasnelli J. Machine learning for the diagnosis of Parkinson's disease: a review of literature. Front Aging Neurosci. 2021; 13:633752. https://doi.org/10.3389/fnagi.2021.633752
30. Zhang N, Wang M, Zhang Y, Cao H, Yang Y, Shi Y, et al. Reliability and validity of the hidden hunger assessment scale in China-revised for high school students. Glob Health J. 2023; 7(2): 110–116. https://doi.org/10.1016/j.glohj.2023.05.001
31. Udhani R, Kothari C, Sarvaiya J. A comprehensive study: traditional and cutting-edge analytical techniques for the biomarker based detection of the micronutrients & POC sensing directions for next-generation diagnostic. Crit Rev Anal Chem. 2024; 54(7): 2378–2397. https://doi.org/10.1080/10408347.2023.2169823
32. Gödecke T, Stein AJ, Qaim M. The global burden of chronic and hidden hunger: trends and determinants. Glob Food Secur. 2018; 17: 21–29. https://doi.org/10.1016/j.gfs.2018.03.004
33. Peters DH, Garg A, Bloom G, Walker DG, Brieger WR, Hafizur Rahman M. Poverty and access to health care in developing countries. Ann N Y Acad Sci. 2008; 1136(1): 161–171. https://doi.org/10.1196/annals.1425.011
34. Issaiy M, Zarei D, Saghazadeh A. Artificial intelligence and acute appendicitis: a systematic review of diagnostic and prognostic models. World J Emerg Surg. 2023; 18(1):59. https://doi.org/10.1186/s13017-023-00527-2
35. Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell. 2019; 1(5): 206–215. https://doi.org/10.1038/s42256-019-0048-x
36. Slavin JL, Lloyd B. Health benefits of fruits and vegetables. Adv Nutr. 2012; 3(4): 506–516. https://doi.org/10.3945/an.112.002154
37. Slavin JL, Jacobs D, Marquart L, Wiemer K. The role of whole grains in disease prevention. J Am Diet Assoc. 2001; 101(7): 780–785. https://doi.org/10.1016/s0002-8223(01)00194-8
38. Nirmala Prasadi VP, Joye IJ. Dietary fibre from whole grains and their benefits on metabolic health. Nutrients. 2020; 12(10):3045. https://doi.org/10.3390/nu12103045
39. Osorio-Yáñez C, Gelaye B, Qiu C, Bao W, Cardenas A, Enquobahrie DA, et al. Maternal intake of fried foods and risk of gestational diabetes mellitus. Ann Epidemiol. 2017; 27(6): 384–390.e1. https://doi.org/10.1016/j.annepidem.2017.05.006
40. Bosetti C, Talamini R, Levi F, Negri E, Franceschi S, Airoldi L, et al. Fried foods: a risk factor for laryngeal cancer? Br J Cancer. 2002; 87(11): 1230–1233. https://doi.org/10.1038/sj.bjc.6600639
41. Alfredsson L, Armstrong BK, Butterfield DA, Chowdhury R, de Gruijl FR, Feelisch M, et al. Insufficient Sun exposure has become a real public health problem. Int J Environ Res Public Health. 2020; 17(14):5014. https://doi.org/10.3390/ijerph17145014
42. Siddiqui F, Salam RA, Lassi ZS, Das JK. The intertwined relationship between malnutrition and poverty. Front Public Health. 2020; 8:453. https://doi.org/10.3389/fpubh.2020.00453
43. Tesfaye A, Adissu Y, Tamiru D, Belachew T. Nutritional knowledge, nutritional status and associated factors among pregnant adolescents in the West Arsi Zone, central Ethiopia. Sci Rep. 2024; 14(1):6879. https://doi.org/10.1038/s41598-024-57428-w

Volume 2, Issue 4

December 2024

Pages 349-360
