Diagnostic Panel of Three Genetic Biomarkers Based on Artificial Neural Network for Patients With Idiopathic Generalized Epilepsy
Abstract
The aim of this study is to evaluate the utility of an artificial neural network (ANN) model in diagnosing idiopathic generalized epilepsy (IGE) and to compare the results of the diagnostic model constructed by combining the expression levels of miR-146a, miR-155, and miR-132 genes using ANN, random forest (RF), and discriminant analysis (DA). qRT-PCR is employed to determine the expression levels of the three miRNA genes. Forty-six IGE patients and 51 healthy controls were included in the study. Three genetic biomarkers were employed to assess the discriminative power of the disease, and they were combined using ANN. Additionally, the performance of ANN was compared with RF and DA. Compared to healthy controls, the miR-132 gene was significantly higher (p < 0.001) and the miR-155 and miR-146a genes were significantly lower in IGE patients (p < 0.001). The area under the curve (AUC) for predictions made by the ANN, RF, and DA were 0.96, 0.87, and 0.75, respectively, with accuracy rates of 0.96, 0.88, and 0.76, respectively. We demonstrate that ANN exhibits the highest accuracy, AUC, sensitivity, and specificity values among the three methods. The obtained results indicate that the combination of the three genes used as markers in IGE plays a significant role in the diagnosis of the disease. Instead of assessing biomarkers individually for the disease, combining them using machine learning methods leads to improved model performance. Additionally, not relying on a single genetic biomarker for the disease enables discrimination based on the collective impact of all biomarkers.
1. Introduction
Idiopathic generalized epilepsies (IGEs) represent the most prevalent category of epilepsy conditions. These disorders become apparent during the period from early childhood to adolescence and account for around 33% of all instances of epilepsy [1]. IGEs are a clearly delineated group with distinct electrophysiological features [2]. They are marked by widespread spike-wave discharges, variations in when they begin, distinct abnormalities in electroencephalography (EEG) readings, the absence of brain abnormalities, and typical developmental traits [3, 4]. The clinical observations in IGEs commonly involve characteristic absence seizures, myoclonic jerks, and generalized tonic-clonic seizures (GTCSs), either occurring individually or in conjunction [5]. Consequently, numerous genes linked to IGE have been recognized. Concurrently, there is ongoing exploration into genetic indicators for IGE that can be utilized for both diagnosis and monitoring purposes. The most significant challenge encountered in IGEs is the ability to differentiate the disease from other epilepsy types. For this purpose, specific markers (genetic, imaging, EEG records, etc.) are being investigated to determine the specific feature unique to the disease [6, 7]. Notably, microRNAs (miRNAs) appear to hold significance in this regard [8]. miRNAs are small noncoding RNAs that function as regulatory agents. It’s established that miRNAs play precision-adjusting roles in fundamental biological mechanisms by controlling gene expression through processes like mRNA degradation or the inhibition of translation [9, 10]. Hence, changes in the levels of miRNA expression have been linked to various conditions, including neurological disorders [11, 12]. Because of these characteristics, miRNAs have emerged as significant targets for therapy and/or diagnostic markers for numerous specific diseases. 
However, the specificities and sensitivities of genetic biomarkers may not be effectively analyzed using basic statistical analysis approaches [13, 14]. Additionally, examining genetic biomarkers individually for a disease can reduce the robustness of the analysis. Therefore, data mining methods that enable the evaluation of the collective impact of biomarkers have been developed [15]. Among these methods, artificial neural networks (ANNs) are the most commonly preferred and are highly accurate. One reason for their widespread use is their excellent learning capability: when trained on large datasets, ANNs can identify complex relationships and extract meaningful features from the data. The method is flexible and does not rely on assumptions about the data. It also adapts to increasing data: as the volume of data increases, its performance improves [16]. There are also studies that demonstrate the superiority of ANN applications over traditional statistical methods. For instance, when ANNs are employed as tools for diagnosing lung cancer, diagnostic specificity and accuracy improve significantly [17, 18]. McLaren et al. highlighted the superiority of the ANN approach over logistic regression in categorizing malignant breast lesions [19]. In another study, Duan et al. used the ANN technique to combine four distinct genes functioning as biomarkers for the early detection of lung cancer [17]. Their findings indicated that, compared to discriminant analysis (DA), the ANN method achieved higher accuracy and a higher area under the curve (AUC) value.
ANN is an information processing technology inspired by the way the human brain processes information. It imitates the workings of a simple biological nervous system by digitally modeling biological neurons and the synaptic connections between them. It also offers a computing method that differs from conventional calculation methods. This computing method adapts to its environment, works with incomplete information, can make decisions under uncertainty, is tolerant to errors, and has successful applications in almost every aspect of life. Therefore, the use of ANN for diagnosis and prognosis in medicine has increased in recent years [17, 20, 21]. In this study, the application of ANN is examined for three miRNAs in IGE; following a detailed literature search, miR-132, miR-146a, and miR-155 were selected. In light of this information, the aim of this study is to identify and combine the genetic biomarkers miR-132, miR-146a, and miR-155 for the diagnosis of idiopathic generalized epilepsy based on recurrent ANN models.
2. Materials and Methods
2.1. Study Population
The study population comprised 46 IGE patients and 51 healthy controls recruited from Bezmialem Vakif University Faculty of Medicine, Department of Neurology. The healthy controls were volunteers without any neurological or systemic disease. After obtaining informed consent from the research subjects, researchers and physicians in the field collected their demographic data and peripheral blood samples. Demographic data included age, gender, seizure onset age, myoclonus onset age, and absence onset age. The study protocol was approved by the Local Ethics Committee (#10.03.2021/3-11) at Bezmialem Vakif University, Türkiye.
2.2. RNA Extraction and Quantitative Real-Time Polymerase Chain Reaction (qRT-PCR)
Total cellular RNA was extracted from whole blood using the QIAamp RNA Blood Mini Kit (Qiagen, Hilden, Germany). The quantity and purity of the extracted RNA were examined with a Multiskan GO (Thermo Fisher Scientific, Boston, Massachusetts, United States). Total RNA was used to synthesize complementary DNA (cDNA) via miRNA All-in-One cDNA Synthesis (AbmGood, Vancouver, British Columbia, Canada). qRT-PCR was performed using BlasTaq 2X qPCR MasterMix (AbmGood) according to the manufacturer's instructions. The reactions were carried out on the Bio-Rad CFX96 Connect Real-time PCR system (Bio-Rad Laboratories, Inc., Hercules, California, United States) with U6 primer as an internal reference. The qRT-PCR conditions were as follows: 95°C for 3 min, 40 cycles of 95°C for 15 s, 60°C for 10 s, and 72°C for 50 s. miRNA primers were purchased from AbmGood with the following catalog numbers: hsa-miR-132 forward (#MPH01144), hsa-miR-146a forward (#MPH01169), and hsa-miR-155 forward (#MPH01188) with universal 3′ miRNA reverse primer (#MPH00000).
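The text does not state how cycle threshold (Ct) values were converted into expression levels. As an illustration only, a common convention for expression relative to an internal reference such as U6 is 2^(-ΔCt), where ΔCt is the target Ct minus the reference Ct. The sketch below assumes that convention; the function name and the Ct values are hypothetical and are not data from this study.

```python
def relative_expression(ct_target, ct_reference):
    """Return 2^-(Ct_target - Ct_reference): expression relative to the reference.

    Assumption: the 2^(-delta Ct) convention; the study does not name its formula.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values, for illustration only.
ct_mir132 = 27.0
ct_u6 = 24.0
print(relative_expression(ct_mir132, ct_u6))  # 0.125
```

A target that crosses the threshold three cycles later than the reference is therefore reported at one eighth of the reference level.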
2.3. Selection of miRNAs
The extensive impact of miR-132 on the nervous system has been reported in the literature. Especially, the characteristics of miR-132 in patients with epilepsy, including tolerance and a significant ability to react to different stimuli, are known [22]. Additionally, in another study conducted with human primary astrocytes, it has been shown to be effective on the inflammatory pathway and thereby contribute to the pathogenesis of epilepsy [23]. According to the article titled “Dysregulation of miR-146a: a causative factor in epilepsy pathogenesis, diagnosis, and prognosis” published in 2023, this miRNA is directly responsible for epilepsy pathogenesis [24, 25]. miR-146a plays a role in the formation of both focal and generalized epilepsy [26]. miR-155, again, plays a role in epilepsy pathogenesis through inflammation [8]. In addition, it plays a role in the pathogenesis through sodium channels [8].
2.4. Statistical Analysis
The distribution of the data was analyzed with the Shapiro–Wilk test. The Mann–Whitney U test was used to compare the healthy control and patient groups. The Pearson chi-square test was used to evaluate differences in categorical variables between groups. Descriptive statistics of numerical variables are expressed as median (min–max). Categorical variables are expressed as frequency (percentage). Spearman correlation analysis was used to examine the relationships between the genetic biomarkers. Significance was evaluated at the p < 0.05 level. The dataset was analyzed using IBM SPSS Statistics 26.0.
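The analyses were run in SPSS. To make the rank-based correlation concrete, the following minimal sketch computes a Spearman coefficient in plain Python as the Pearson correlation of average ranks. The function names and the sample values are hypothetical; p values are not computed here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical expression values for two markers in five samples.
marker_a = [0.10, 0.20, 0.30, 0.40, 0.50]
marker_b = [0.02, 0.01, 0.04, 0.03, 0.05]
print(round(spearman_r(marker_a, marker_b), 3))  # 0.8
```

Because only ranks enter the calculation, the coefficient is unaffected by the skewed distributions that motivated the nonparametric tests.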
2.5. ANN Model Design
- Step 1: The search interval for layers is initialized, and the data are split into three sets: training, validation, and test. The interval for the number of neurons in the first layer is set to 1 to 5. The interval for the number of neurons in the second layer is set to 1 to 2. Training set: used to train the ANN (52.5% of the observations). Validation set: used to determine the optimum number of nodes in the layers (22.5% of the observations). Test set: used to evaluate the performance of the ANN (25% of the observations).
- Step 2: The ANN is trained using the training set.
  - a. Step 2.1: The number of neurons in the first and the second layers is set to 1.
  - b. Step 2.2: The ANN is trained using the training set.
  - c. Step 2.3: Predictions are obtained for the validation set, and RMSE and MAPE values are calculated.
  - d. Step 2.4: The number of neurons in the second layer is increased by one, and the procedure returns to Step 2.2 until the upper bound of the interval is reached.
  - e. Step 2.5: The number of neurons in the first layer is increased by one, and the procedure returns to Step 2.2 until the upper bound of the interval is reached.
  - f. Step 2.6: The numbers of neurons in the first and second layers that give the minimum RMSE/MAPE are taken as the optimum.
- Step 3: The training and validation sets are combined into a single training set.
- Step 4: The ANN is trained using the combined training set.
- Step 5: Predictions are obtained for the test set, and RMSE and MAPE values are calculated for evaluation.
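The search over layer sizes in Steps 1 to 5 can be sketched as follows. This is a minimal illustration, not the implementation used in the study: `split_indices`, `search_hidden_sizes`, and the `fit_predict` callback are hypothetical names, the rounding of the split sizes is an assumption, and `toy_fit_predict` is a stand-in that only mimics a training routine so that the loop can run. Only RMSE is shown as the selection criterion.

```python
import random
from math import sqrt


def split_indices(n, train=0.525, val=0.225, seed=0):
    """Shuffle 0..n-1 and cut into training, validation, and test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)  # assumption: sizes are rounded to integers
    n_val = round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def search_hidden_sizes(fit_predict, first_sizes=range(1, 6), second_sizes=range(1, 3)):
    """Try every (first, second) layer size; keep the lowest validation RMSE.

    fit_predict(h1, h2) is a hypothetical callback: it trains a network with
    h1 and h2 hidden neurons on the training set and returns
    (y_true, y_pred) for the validation set.
    """
    best = None
    for h1 in first_sizes:
        for h2 in second_sizes:
            y_true, y_pred = fit_predict(h1, h2)
            score = rmse(y_true, y_pred)
            if best is None or score < best[0]:
                best = (score, h1, h2)
    return best


def toy_fit_predict(h1, h2):
    """Stand-in for real training: the error is smallest at sizes (3, 1)."""
    err = 0.1 * (abs(h1 - 3) + abs(h2 - 1))
    return [1.0, 0.0], [1.0 - err, 0.0 + err]


train_idx, val_idx, test_idx = split_indices(97)
print(len(train_idx), len(val_idx), len(test_idx))
print(search_hidden_sizes(toy_fit_predict))
```

After the search, the training and validation indices would be merged and the selected architecture retrained before the single evaluation on the test indices, so that the test set plays no part in choosing the layer sizes.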
3. Results
The study included 68 (70.1%) females and 29 (29.9%) males. The patient and control groups were homogeneous in terms of gender (p = 0.912). There was a significant difference between the groups in terms of age (p < 0.001): the median age of the patient group (24 (18–48)) was lower than that of the control group (31 (20–61)). Among the 46 patients with IGE, 31 (67.4%) were diagnosed with juvenile myoclonic epilepsy (JME), 6 (13%) with juvenile absence epilepsy (JAE), and 9 (19.6%) with GTCS. Regarding seizure types, 33 patients (71.7%) had myoclonic seizures, 11 (23.9%) had absence seizures, and 38 (82.6%) had juvenile tonic-clonic (JTC) seizures. The distribution of the drugs used was as follows: levetiracetam (LEV) in 25 patients (54.3%), lamotrigine (LMT) in 9 patients (19.6%), valproic acid (VPA) in 18 patients (39.1%), and oxcarbazepine (OCZ) in 1 patient (2.2%). The results of the comparison of demographic characteristics between the patient and control groups are given in Table 1.
Characteristic | Control (n = 51) | Patient (n = 46) | p value |
---|---|---|---|
Gender | |||
Female | 36 (70.6%) | 32 (69.6%) | 0.912a |
Male | 15 (29.4%) | 14 (30.4%) | |
Age | 31 (20–61) | 24 (18–48) | <0.001b |
Diagnosis | |||
JME | — | 31 (67.4%) | NA |
JAE | — | 6 (13%) | NA |
GTCS | — | 9 (19.6%) | NA |
Seizure type | |||
Myoclonic | — | 33 (71.7%) | NA |
Absence | — | 11 (23.9%) | NA |
JTC | — | 38 (82.6%) | NA |
Drug | |||
LEV | — | 25 (54.3%) | NA |
LMT | — | 9 (19.6%) | NA |
VPA | — | 18 (39.1%) | NA |
OCZ | — | 1 (2.2%) | NA |
- Note: Data are expressed as frequency (percentage) or median (min–max).
- Abbreviation: NA: not applicable.
- aPearson chi-square test.
- bMann–Whitney U test.
In the study, the expression levels of miR-146a, miR-155, and miR-132 were calculated for use as biomarkers in the diagnosis of IGE. Statistical comparisons between the patient and healthy control groups, as well as comparisons based on gender, are given in Tables 2 and 3.
miRNA | Control (n = 51) | Patient (n = 46) | p valuea |
---|---|---|---|
miR-132 | 0.1236 (0.0197–0.6073) | 0.2371 (0.0741–0.4686) | <0.001 |
miR-146a | 0.0247 (0.0069–0.3457) | 0.0146 (0.0042–0.0410) | <0.001 |
miR-155 | 0.0027 (0.0005–0.4517) | 0.0010 (0.0002–0.0026) | <0.001 |
- Note: Data are expressed as median (min–max).
- aMann–Whitney U test.
Gender | miR-132: Control | miR-132: Patient | miR-132: p value ∗ | miR-146a: Control | miR-146a: Patient | miR-146a: p value ∗ | miR-155: Control | miR-155: Patient | miR-155: p value ∗ |
---|---|---|---|---|---|---|---|---|---|
Female | 0.099 (0.019–0.6073) | 0.229 (0.074–0.468) | <0.001 | 0.022 (0.006–0.345) | 0.0131 (0.004–0.041) | 0.001 | 0.002 (0–0.045) | 0.001 (0–0.001) | <0.001 |
Male | 0.171 (0.087–0.451) | 0.291 (0.079–0.463) | 0.063 | 0.028 (0.013–0.056) | 0.015 (0.004–0.026) | 0.001 | 0.002 (0.001–0.010) | 0.001 (0–0.002) | <0.001 |
p value ∗∗ | 0.039 | 0.073 | | 0.363 | 0.489 | | 0.549 | 0.650 | |
- ∗The p value corresponds to the comparison of gene expression levels between the patient and control groups after stratification by gender.
- ∗∗The p value corresponds to the comparison of gene expression levels between female and male after stratification by patients and healthy controls.
When the expression levels of miR-132, miR-146a, and miR-155 were compared between IGE patients and healthy controls, a statistically significant difference was found for each gene (p < 0.001). miR-132 was higher in the patient group than in the control group, while miR-146a and miR-155 were lower in the patient group. Regarding the relationships among the three biomarkers, there was a statistically significant, positive, and moderate correlation between miR-155 and miR-146a, as well as between miR-155 and miR-132 (r = 0.566, p < 0.001 and r = 0.687, p < 0.001, respectively). Additionally, a statistically significant, positive, and moderate correlation exists between miR-146a and miR-132 (r = 0.732, p < 0.001).
There is a statistically significant difference in the expression levels of miR-132, miR-146a, and miR-155 genes between the female patient and control groups (p < 0.001). The expression level of miR-132 is significantly higher in the female patient group compared to the control group, while the expression levels of miR-146a and miR-155 are significantly lower in the female patient group compared to the control group. In males, there is no significant difference in miR-132 (p = 0.063), but the expression levels of miR-146a and miR-155 are significantly lower in the male patient group compared to the control group. Additionally, a significant difference was observed in miR-132 gene expression levels based on gender in the control group (p = 0.039). Accordingly, in the control group, the miR-132 level in males was found to be significantly higher compared to females. However, there was no significant difference in the expression levels of miR-146a and miR-155 based on gender in the control group (p > 0.05). On the other hand, in IGE patients, the expression levels of all three genes did not show a statistically significant difference based on gender (p > 0.05).
There is a statistically significant, negative, and weak correlation between age and miR-132 in the entire sample (r = −0.247, p = 0.015). However, there is no significant relationship between age and miR-146a or miR-155 (p = 0.130, p = 0.071, respectively). In IGE patients and the healthy control group, no significant relationship was found between age and the gene expression levels of miR-132, miR-155, and miR-146a (p > 0.05) (Table 4).
Correlation with age | Total: r | Total: p value | IGE: r | IGE: p value | Control: r | Control: p value |
---|---|---|---|---|---|---|
miR-132 | −0.247 | 0.015 | −0.159 | 0.290 | −0.208 | 0.143 |
miR-146a | 0.155 | 0.130 | −0.083 | 0.584 | −0.125 | 0.384 |
miR-155 | 0.184 | 0.071 | −0.120 | 0.427 | 0.073 | 0.610 |
- Note: r: Spearman correlation coefficient.
3.1. Results of ANN
The purpose of normalization is to prevent errors that may arise from differences in measurement levels between the input and output. After normalization, the data are divided into three sets: 52.5% for training, 22.5% for validation, and 25% for testing. After the optimum numbers of neurons in the layers are obtained, the training and validation sets are combined into a single training set. Back propagation (BP) is a widely used algorithm for training multilayer neural networks. It propagates the error signals backward through the network and adjusts the weights of the connections between neurons to minimize the overall error [27]. The learning rule of BP adjusts the weight and threshold values of the network using the steepest descent method to minimize the sum of squared errors. The generated model was evaluated using diagnostic test measures: sensitivity, specificity, accuracy, and AUC. An AUC value lower than 0.5 indicates no diagnostic significance, values between 0.5 and 0.7 indicate low accuracy, values between 0.7 and 0.9 indicate accurate results, and values above 0.9 indicate the most accurate results [28].
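To illustrate the AUC criterion, the sketch below computes AUC as the fraction of patient and control pairs that a score ranks in the correct order and maps the value to the interpretation bands given above. The function names and the example scores are hypothetical, and the treatment of the exact boundary values (0.7 and 0.9) is an assumption, since the text does not assign them to a band.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class, 0 for the negative class.
    Tied scores count as half a correct pair.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def interpret_auc(auc):
    """Map an AUC value to the bands in the text (boundary handling assumed)."""
    if auc < 0.5:
        return "no diagnostic significance"
    if auc < 0.7:
        return "low accuracy"
    if auc <= 0.9:
        return "accurate"
    return "most accurate"


# Hypothetical model scores for two patients (1) and two controls (0).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = auc_from_scores(labels, scores)
print(auc, interpret_auc(auc))  # 0.75 accurate
```

Three of the four patient and control pairs are ordered correctly in this toy example, which gives an AUC of 0.75.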
Because the expression levels of miR-132, miR-146a, and miR-155 were not normally distributed, a normality transformation was applied before the ANN, random forest (RF), and DA models were constructed; the classification results are shown in Table 5. The ANN achieved an accuracy of 96% in both the training and predicted sets, whereas RF achieved 100% in the training set and 88% in the predicted set. For DA, the accuracy was 81% in the training set and 76% in the predicted set (Table 5).
Groups | Training set (n = 75%): IGE | Training set (n = 75%): Control | Predicted set (n = 25%): IGE | Predicted set (n = 25%): Control |
---|---|---|---|---|
ANN | ||||
IGE | 34 | 0 | 12 | 0 |
Control | 2 | 36 | 1 | 12 |
Total | 36 | 36 | 13 | 12 |
Accuracy (%) | 96 | | 96 | |
Random forest | ||||
IGE | 35 | 0 | 9 | 2 |
Control | 0 | 37 | 1 | 13 |
Total | 35 | 37 | 10 | 15 |
Accuracy (%) | 100 | | 88 | |
Discriminant analysis | ||||
IGE | 28 | 7 | 9 | 2 |
Control | 7 | 30 | 4 | 10 |
Total | 35 | 37 | 13 | 12 |
Accuracy (%) | 81 | | 76 | |
According to Table 6, based on the predictions obtained from the test set, the ANN method has better AUC, specificity, and classification accuracy than the RF method, with equal sensitivity (93%). The RF method, in turn, demonstrates better AUC, sensitivity, specificity, and classification accuracy than DA. It can be concluded that DA is the weakest method in terms of prediction accuracy, while ANN is the strongest. As a result, instead of classifying IGE patients based on a single gene (miR-132, miR-146a, or miR-155), modeling the three genes that have an impact on this disease together has resulted in higher AUC, sensitivity, specificity, and accuracy rates. The ANN structure obtained after searching for the optimum number of neurons in the hidden layers is presented in Figure 1, which shows two hidden layers with two neurons in each layer.
Evaluation criteria | ANN | Random forest | Discriminant analysis |
---|---|---|---|
AUC | 0.96 | 0.87 | 0.75 |
Sensitivity (%) | 93 | 93 | 86 |
Specificity (%) | 100 | 82 | 64 |
Accuracy (%) | 96 | 88 | 76 |
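The evaluation criteria are all derived from the four cells of a two-class confusion matrix. The sketch below shows the arithmetic using the four counts reported for the ANN predicted set in Table 5 (12, 0, 1, 12). How those counts map to true and false positives is an assumption made here for illustration: IGE is taken as the positive class, with 12 true positives, 1 false negative, 0 false positives, and 12 true negatives. Under the opposite reading of the table axes, sensitivity and specificity exchange values while accuracy is unchanged. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true positives among all positives
    specificity = tn / (tn + fp)  # share of true negatives among all negatives
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy


# Counts from the ANN predicted set; the TP/FN/FP/TN assignment is an assumption.
sens, spec, acc = diagnostic_metrics(tp=12, fn=1, fp=0, tn=12)
print(round(acc, 2))  # 0.96
```

Accuracy does not depend on which class is called positive, which is why it can be checked directly against the table: 24 of the 25 test observations are classified correctly.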

4. Discussion
Due to the complex nature of epilepsy, a disease-specific biomarker has not yet been identified. This also applies to IGE, which is the most commonly observed among all epilepsies. While promising studies exist in the literature, a consensus has not yet been reached [29, 30]. Currently, studies are being conducted on the widespread use of miRNAs in the diagnosis, follow-up, and possible treatment of many neurological diseases [8, 31]. The studies in the literature generally involve findings obtained by comparing limited patient groups with healthy cohorts [26, 32]. To overcome the issues arising from the limitations of these samples, studies based on various statistical approaches are now being conducted, with the aim of obtaining more precise results regarding the miRNAs used as genetic biomarkers. Taking these points into consideration, this study examined the distinctiveness of the miR-132, miR-146a, and miR-155 genes in IGE patients and their relationship with IGE. The results revealed that all three genes individually had an impact on IGE patients, and that an increase in miR-132 expression along with decreased levels of miR-146a and miR-155 posed a risk for IGE. Furthermore, when the effect of gender on miR-132, miR-146a, and miR-155 expression levels was evaluated, the levels of these three genes did not vary according to gender in IGE patients.
ANNs emerged from mathematically modeling the learning process, taking inspiration from the human brain. They imitate the structure of biological neural networks in the brain and replicate their abilities in learning, memory, and generalization. The aim of this study was to mathematically model the impact of the miR-132, miR-146a, and miR-155 genes, evaluated as biomarkers in IGE, using ANNs, in order to elucidate their role in early diagnosis of the disease [17, 33]. In this study, the ANN method combining the miR-132, miR-146a, and miR-155 genes distinguished IGE patients with higher accuracy, AUC, sensitivity, and specificity values than the commonly used DA and RF classification methods. It is also known that ANN, unlike univariate statistical methods, does not perform classification based on a single biomarker but provides more reliable results through combinations of multiple genes [34]. As a limitation of the study, the ANN model could be further optimized by increasing the number of genes used to distinguish IGE and by incorporating additional clinical information. In this way, the value of biomarkers combined with data mining could be enhanced in the early diagnosis of IGE. In addition, neural network models, while having the advantage of using information from multiple genes, are also “black box” models, because it is close to impossible to gain any insight into the learned function from the parameters/weights.
5. Conclusion
In conclusion, the combination of miR-132, miR-146a, and miR-155 genes can be used as biomarkers for IGE diagnosis or serve a role in early detection. Combining different gene markers yields stronger results for diagnosis, as these genes complement each other. In the model created using ANN, the three gene biomarkers have been found to be accurate and beneficial in distinguishing IGE.
Nomenclature
- ANN: artificial neural network
- AUC: area under the curve
- BP: back propagation
- DA: discriminant analysis
- EEG: electroencephalography
- GTCS: generalized tonic-clonic seizure
- IGE: idiopathic generalized epilepsy
- JAE: juvenile absence epilepsy
- JME: juvenile myoclonic epilepsy
- JTC: juvenile tonic-clonic
- LEV: levetiracetam
- LMT: lamotrigine
- MAPE: mean absolute percentage error
- miRNA: microribonucleic acid
- mRNA: messenger ribonucleic acid
- OCZ: oxcarbazepine
- qRT-PCR: quantitative real-time polymerase chain reaction
- RF: random forest
- RMSE: root mean squared error
- VPA: valproic acid
Conflicts of Interest
The authors declare no conflicts of interest.
Funding
This study was funded by Bezmialem Vakif University Scientific Research Project Unit (grant number: 20210408).
Acknowledgments
We thank Alisan Bayrakoglu, MD, for his support in the classification of patients and healthy controls.
Open Research
Data Availability Statement
Data are available on request from the authors.