Evidence-Based Complementary and Alternative Medicine

Volume 2022, Issue 1 7708376

Research Article

Open Access

Knowledge-Based Recurrent Neural Network for TCM Cerebral Palsy Diagnosis

Dongmei Li

orcid.org/0000-0001-7409-6656

School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China bjfu.edu.cn

Engineering Research Center for Forestry-oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China

Jintao Qu

orcid.org/0000-0001-6894-388X

School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China bjfu.edu.cn

Engineering Research Center for Forestry-oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China

Ziwei Tian

School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China bjfu.edu.cn

Engineering Research Center for Forestry-oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China

Zijun Mou

Shandong University of Traditional Chinese Medicine, Jinan 250355, Shandong, China sdutcm.edu.cn

Lei Zhang

orcid.org/0000-0003-0529-1305

National Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China cacms.ac.cn

Xiaoping Zhang (Corresponding Author)

orcid.org/0000-0001-6116-1906

National Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China cacms.ac.cn

First published: 12 October 2022

https://doi.org/10.1155/2022/7708376

Citations: 2

Academic Editor: Huantian Cui

Abstract

Cerebral palsy is one of the most prevalent neurological disorders and the most frequent cause of disability. Identifying the syndrome from patients’ symptoms is the key to traditional Chinese medicine (TCM) treatment of cerebral palsy. Artificial intelligence (AI) is advancing quickly in several sectors, including TCM, and can considerably enhance the dependability and precision of diagnoses, widening the use of effective treatment methods. Thus, for cerebral palsy, it is necessary to build a decision-making model to aid the syndrome diagnosis process. While the recurrent neural network (RNN) model has the potential to capture the correlation between symptoms and syndromes from electronic medical records (EMRs), it lacks TCM knowledge. To make the model benefit from both TCM knowledge and EMRs, we depart from the ordinary training routine and begin by constructing a knowledge-based RNN (KBRNN) from the cerebral palsy knowledge graph. More specifically, we design an evolutionary algorithm for extracting knowledge from the cerebral palsy knowledge graph. Then, we embed the knowledge into tensors and inject them into the RNN. In addition, the KBRNN can benefit from labeled EMRs: we use EMRs to fine-tune the KBRNN, which improves prediction accuracy. Our study shows that knowledge injection can effectively improve model performance. The KBRNN achieves 79.31% diagnostic accuracy with knowledge injection alone, and the fully trained KBRNN reaches 83.12%.

1. Introduction

Cerebral palsy is a leading cause of disability and can be difficult to cure throughout life [1]. TCM theory plays an active role in the treatment of cerebral palsy. Symptoms are crucial in clinical diagnosis and treatment [2]. During clinical diagnosis, doctors apply TCM theories to identify the syndrome from patients’ symptoms, a judgment heavily influenced by the doctor’s previous experience. AI-assisted TCM diagnosis relies primarily on digital data obtained by modern electronic instruments, making TCM diagnosis more quantitative, objective, and standardized [3]. Thus, a computer-aided decision-making model for diagnosis is needed to offset the uncertainty introduced by human factors.

Over the past two decades, advancements in sensor, detector, and transducer technologies have made it possible for AI to learn from digital information. Thus, AI-assisted TCM diagnosis has become a burgeoning field of research [4]. In earlier research, the AI approaches employed in TCM diagnosis were mostly limited to traditional machine-learning algorithms and their modified forms, such as support vector machine (SVM), random forest (RF), AdaBoost, and decision tree (DT). Wang [5] used a Bayesian classifier to model the relationships between the human pulse and the diagnosis. Zhang et al. [6] studied quantitative correlations between diseases and the physical appearance of the human tongue. In these conventional machine learning methods, the features are extracted by specialists with extensive TCM clinical expertise. Deep learning technology has grown rapidly in recent years. Unlike traditional machine learning methods, deep learning models comprise more complex hierarchical multilayer networks of artificial neurons that can automatically discover valuable diagnostic features from the original data. Hu et al. [7] proposed a classifier that uses the Shannon energy envelope, the Hilbert transform, and deep convolutional neural networks (DCNN) for the analysis of the human pulse. Combining basic image processing and deep learning, Fu et al. [8] presented a computerized tongue coating nature diagnosis method using deep neural networks. Hou et al. [9] proposed a neural network for tongue color classification, which is more practical and accurate than the traditional one. Although these studies attained a high level of accuracy, they considered only single-modal data and only a portion of patients’ information. Therefore, more recent studies have introduced more comprehensive data. Yang et al. [10] developed a novel deep neural network that uses multiview features of gene data to identify disease genes. Dai et al. [11] proposed a multimodal deep learning framework based on the four diagnostic methods of TCM. These approaches effectively compensate for the limited information in a single modality and improve the accuracy of the model.

With the rise of medical digitalization, hospital information systems have accumulated a considerable volume of EMR data, which document patients’ conditions in text form. There is increasing interest in applying machine learning techniques to decision-making models for medical diagnosis and treatment. Liang et al. [12] adopted the deep belief network (DBN) to acquire feature representations from EMRs and then combined it with the SVM for supervised learning on the labeled data. Similarly, various supervised machine learning algorithms such as random forest and logistic regression were used in [13] to build ischemic stroke classifiers. Although these ML-based methods outperform conventional techniques such as rule-based algorithms by using massive datasets, they ignore domain-specific knowledge.

The knowledge graph (KG), once known as ontology in early research, serves as an excellent solution for injecting domain-specific knowledge into ML models. The KG is a multirelational graph composed of entities and relationships containing a large amount of prior knowledge [14, 15]. Building on advances in graph embedding learning techniques, Gone et al. [16] decomposed the medicine recommendation task into a link prediction process and proposed the safe medicine recommendation framework. Abdelaziz et al. [17] developed a large-scale similarity-based framework that predicts drug-drug interactions through text and graph embedding algorithms. These studies fully exploit the domain knowledge in the knowledge graph, but they cannot benefit from large-scale labeled data. By analogy, an exceptional specialist should possess not just sound professional knowledge but also extensive experience.

For the TCM cerebral palsy diagnosis model to benefit from both the knowledge graph and the EMR, we propose a two-step model called KBRNN to achieve this purpose. In the first step, we extract evidence-based diagnostic knowledge from cerebral palsy KG by using intelligent optimization algorithms and represent this knowledge as tensors. Then, we inject the knowledge into RNN by converting the tensor to the parameter of the RNN. So far, we have obtained the knowledge-based RNN (KBRNN) that can be trained with the TCM data for fine-tuning.

Our key contributions are listed as follows:

(1) We propose the knowledge-based RNN (KBRNN). Compared with traditional methods, the KBRNN can be enhanced by the domain knowledge in the KG, and its performance can be further improved by training on labeled data.
(2) Within the proposed KBRNN, we design an evolutionary algorithm for knowledge extraction and give a way to represent the knowledge as tensors and inject them into the RNN.
(3) The experimental results show that the diagnostic accuracy of the untrained KBRNN, with knowledge injection only, is 79.31%, rising to 83.12% for the fully trained KBRNN.

2. Related Work

2.1. Knowledge Graph Inference and Its Applications

The knowledge graph contains a large amount of prior knowledge [18], which can provide external information for various downstream tasks [19]. For medical tasks, Yang et al. [20] introduced link prediction for syndrome diagnosis by breaking medical records down into multiple symptoms based on the KG. Zheng et al. [21] learned relational embeddings from nodes in the KG to access medical knowledge and used them to improve the classifier’s performance through a medical knowledge attention mechanism. Zhang and Che [22] constructed a Parkinson’s disease KG and leveraged KG completion methods to predict drug candidates. Yang et al. [23] pretrained the embeddings of entities on a large-scale domain-specific corpus while learning the knowledge embeddings of entities via a joint TransC-TransE model. Lin et al. [24] combined the context provided by medical entity descriptions with the embeddings of medical entities and relations and user embeddings to learn patient similarities through a convolutional neural network. Lin et al. [25] utilized graph representation learning models to obtain the embedding vectors of the entities and then applied the embeddings to study patient similarities. These works used joint representation to bring the entity and word vector spaces closer. However, for KGs with large numbers of entities, dealing with entities and their relationships leads to higher time complexity.

Furthermore, there is also some research about inference on the KG directly, without embedding the relations and entities. El-Shafai et al. [26] provided a method that simulates syndrome differentiation through Bayes and TF-IDF on a knowledge graph to achieve automated diagnosis in TCM. Yao et al. [27] presented an ontology-based model that utilized ontology attributes for training the neural network for medicine side-effect prediction. Xie et al. [28] applied the TF-IDF to the TCM KG and proposed a knowledge-based syndrome reasoning model.

2.2. Knowledge-Enhanced Neural Networks

Lin et al. [29] proposed a trigger matching network, which is trained with additional annotation and whose output is used as the attention of the sequence labeler. Luo et al. [30] combined a neural network with regular expressions (RE) to improve supervised learning for natural language processing. Jiang et al. [31] proposed FA-RNN, a recurrent neural network that incorporates the benefits of both neural networks and regular expression rules. Finally, Jiang et al. [32] transformed regular expressions into neural networks to combine the two approaches for slot filling.

3. Methods

3.1. Framework Overview

Figure 1 shows the two-step routine for constructing a KBRNN, i.e., knowledge extraction and knowledge injection. In knowledge extraction, an evolutionary algorithm is designed to extract high-scoring knowledge from the KG, and a part of the EMRs is used to score the knowledge. In knowledge injection, the knowledge is first converted to a tensor in the knowledge embedding module. Then, the tensor decomposition module decomposes the knowledge tensor into the parameters of the RNN. This gives us the KBRNN, which incorporates domain knowledge.


3.2. Notation

To focus on diagnosing the syndrome from the patients’ symptoms, we reconstruct a sub-KG K, shown in Figure 2, based on the KG proposed by [33]. In this sub-KG K, we retain only the symptom and syndrome entities related to this research and exclude other entities such as acupoints, formulas, and herbs, which are not related to diagnosis. For ease of description, we give each symptom a unique and continuous ID starting from 0 and denote the symptom by “SYM”+ID. Similarly, we use “SYN”+ID to refer to a syndrome.

As a KG, K consists of entities E and relations R.

E: a set of entities. |E| = N. There are three types of entities (main symptom, additional symptom, and syndrome), E = E_{main_sym} ∪ E_{add_sym} ∪ E_syn.
R: a set of relations. |R| = M.
t: a triple. Let e_i, e_k ∈ E and r_j ∈ R; then t = (e_i, r_j, e_k) states that e_i and e_k are connected by the relation r_j.

In data processing and knowledge extraction, two common operations on K should be mentioned here.

E_Query(SYNi, E′): returns the set of all entities in E′ that are connected to SYNi.
E_match(sentence): returns, in order of appearance, the aliases of the entities that appear in the sentence.
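The two operations can be sketched in Python on a toy graph. The triples, relation names, and text aliases below are invented for illustration and are not taken from the cerebral palsy KG; the alias lookup by substring search is likewise only an assumption about how matching might work.

```python
# Toy sub-KG as (head, relation, tail) triples. All names are hypothetical.
TRIPLES = [
    ("SYM1", "main_symptom_of", "SYN2"),
    ("SYM5", "main_symptom_of", "SYN2"),
    ("SYM3", "additional_symptom_of", "SYN2"),
    ("SYM4", "main_symptom_of", "SYN3"),
]

# Free-text aliases mapped to symptom IDs (invented for illustration).
ALIASES = {
    "limb stiffness": "SYM1",
    "night sweats": "SYM3",
    "poor appetite": "SYM4",
}


def e_query(syndrome, candidates):
    """Return the entities in `candidates` that are connected to `syndrome`."""
    linked = {head for head, _, tail in TRIPLES if tail == syndrome}
    linked |= {tail for head, _, tail in TRIPLES if head == syndrome}
    return linked & set(candidates)


def e_match(sentence):
    """Return entity IDs ordered by the first position of their alias."""
    hits = []
    for alias, entity in ALIASES.items():
        position = sentence.find(alias)
        if position >= 0:
            hits.append((position, entity))
    return [entity for _, entity in sorted(hits)]


main_symptoms = {"SYM1", "SYM4", "SYM5"}
print(e_query("SYN2", main_symptoms))
print(e_match("patient reports night sweats and limb stiffness"))
```

Restricting the query with a candidate set mirrors the second argument E′, so the same function serves for main and additional symptoms.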

In this study, each EMR contains two parts: the description of the main symptoms and the description of the additional symptoms. In data preprocessing, we splice the two parts of each EMR into a sentence s and convert it to a symptom-level sentence by E_match(s). The EMR sentence corresponding to an EMR labeled as SYNi is defined as

(1)

where s_SYNi[1, i]⊆E_{main_sym}, s_SYNi[i + 1, n]⊆E_{add_sym}, n is the length of sentence s.

3.3. Extract Knowledge from KG

3.3.1. Definitions and Task Complexity

This section explains how we treat the knowledge extraction task as an optimization problem.

First, we define what “knowledge” in the KG is. For the syndrome SYN2 shown in Figure 2, one piece of knowledge about SYN2, denoted as Knowl_SYN2, can be acquired by (2), with the result given in (3).

(2)

(3)

Expression (3) can be directly converted to a regular expression (RE) as in (4), where “|” is the OR operator and “+” means one or more occurrences.

(4)

Obviously, the sentence s_SYN2 = <SYM5, SYM3, SYM7, SYN2> labeled as SYN2 can be recognized by RE_SYN2. However, such REs carry the risk of a wrong diagnosis. For example, RE_SYN2 may also recognize the sentence s_SYN3 = <SYM4, SYM2, SYM7> labeled as SYN3. A seemingly feasible remedy is to enumerate the subsets of E_{main_sym} and E_{add_sym}, splice them into candidate solutions, and keep the useful Knowl_SYNi by verifying them against the EMR sentences of each syndrome. However, the number of candidates, and hence the time complexity, grows exponentially with the number of symptoms. Fortunately, injecting too much knowledge complicates the diagnosis model anyway, as discussed further in Section 3.4.2. Thus, for a specific syndrome SYNi and a Knowl_SYNi scoring function V, it is enough to find the “top-k Knowl_SYNi”, i.e., the k highest-scoring Knowl_SYNi among all the Knowl_SYNi of that syndrome.

Because evolutionary algorithms perform well at finding near-optimal solutions in a large solution space, we design an evolutionary algorithm for knowledge extraction. Figure 3 shows the main steps of the algorithm.

Our knowledge extraction method is based on the combination of two well-known extensions to the standard genetic strategy. On the one hand, we repeatedly reinitialize the candidate solutions when the search reaches a state of stagnation. On the other hand, we use parallel computing in the process of evolution. While the former helps the evolutionary algorithm escape local optima, the latter significantly improves efficiency by allowing the score of each solution to be calculated in parallel. Moreover, assigning individuals to different computational cores can be viewed as a strategy for multiple population evolution, which improves the algorithm’s robustness.
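As a rough illustration of the restart idea, the following Python sketch runs a small elitist loop over bit vectors and reinitializes the population after a fixed number of generations without improvement. It is not the authors' algorithm: the population size, the one-bit mutation, the `patience` threshold, and the `evolve` name are all assumptions, and the parallel scoring step is omitted for brevity.

```python
import random


def evolve(fitness, m, k=3, pop_size=20, generations=200, patience=10, seed=0):
    """Return up to k best (bit-vector, score) pairs found for `fitness`."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.randint(0, 1) for _ in range(m)]

    population = [random_individual() for _ in range(pop_size)]
    archive = {}                      # best solutions seen across all restarts
    best, stale = float("-inf"), 0
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        for individual in ranked[:k]:
            archive[tuple(individual)] = fitness(individual)
        top = fitness(ranked[0])
        if top > best:
            best, stale = top, 0
        else:
            stale += 1
        if stale >= patience:         # stagnation: reinitialize the population
            population = [random_individual() for _ in range(pop_size)]
            best, stale = float("-inf"), 0
            continue
        parents = ranked[: pop_size // 2]
        children = []
        for parent in parents:        # one bit-flip mutation per parent
            child = parent[:]
            child[rng.randrange(m)] ^= 1
            children.append(child)
        population = parents + children
    return sorted(archive.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy objective (count of ones) standing in for the knowledge scoring function.
for solution, score in evolve(sum, m=8):
    print(solution, score)
```

Keeping an archive outside the population is one simple way to make sure a restart never discards the top-k solutions found so far.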

There are two problem-specific modules in evolutionary algorithms, i.e., generator and evaluator. The following sections detail their specific implementation.

3.3.2. Generator

The generator module creates the initial set of candidate solutions for SYNi. A candidate solution corresponding to a Knowl_SYNi can be defined as a triple τ = <φ, ψ, v>, where we let |E_{main_sym} ∪ E_{add_sym}| = m.

(i) φ ∈ {0,1}^m: main symptom vector, φ[j] = 1 if SYMj is selected, else φ[j] = 0.
(ii) ψ ∈ {0,1}^m: additional symptom vector, ψ[j] = 1 if SYMj is selected, else ψ[j] = 0.
(iii) v ∈ ℝ: the score of the solution, calculated by the evaluator module and initialized to 0.

The generator produces a list Γ = [τ₀, τ₁, τ₂, …, τ_{l−1}] of candidates τ_r = <φ_r, ψ_r, 0> by initializing each φ_r and ψ_r randomly, where l is the length of Γ, l > k, and 0 ≤ r < l.
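A minimal Python sketch of such a generator follows. Uniform random bits are an assumption, since the text only says the vectors are initialized randomly, and the function name `generate` is hypothetical.

```python
import random


def generate(m, l, seed=None):
    """Create l candidate triples (phi, psi, v) over m symptoms, with v = 0."""
    rng = random.Random(seed)
    population = []
    for _ in range(l):
        phi = [rng.randint(0, 1) for _ in range(m)]  # main symptom vector
        psi = [rng.randint(0, 1) for _ in range(m)]  # additional symptom vector
        population.append((phi, psi, 0.0))           # score starts at 0
    return population


gamma = generate(m=5, l=8, seed=0)
print(len(gamma), gamma[0])
```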

3.3.3. Evaluator

The evaluator calculates the score of each τ in Γ. We select n EMR sentences as the test case to compute the score of τ. This section will detail the scoring algorithm.

For a syndrome SYNi, the evaluator divides the n EMR sentences into two disjoint sets denoted by EMR_true and EMR_false, where |EMR_true| = a, |EMR_false| = b, and a + b = n. A sentence is placed in EMR_true if and only if it is labeled as SYNi.

With these sets, the variables t_r, f_r, and c_r for a solution τ_r = <φ_r, ψ_r, v_r> can be defined as follows:

(i) t_r ∈ ℕ: the number of sentences in EMR_true that can be recognized by τ_r
(ii) f_r ∈ ℕ: the number of sentences in EMR_false that can be recognized by τ_r
(iii) c_r ∈ ℕ: the number of symptoms in τ_r

As explained in Section 3.3.1, a high-scoring solution corresponds to an RE that recognizes the maximum number of s_SYNi while keeping the number of symptoms minimal. In addition, misrecognition is not allowed. This provides us with the fundamental form of the scoring function.

(5)

where the value of the indicator function depends on whether f_r ∈ ℕ⁺, that is, on whether τ_r recognizes any sentence in EMR_false, and c_r can be calculated by equation (6).

(6)

The following describes the calculation of t_r and f_r. We maintain two matrices TP, FP with the following rules.

(i) TP ∈ {0,1}^{a×m}: TP[i][j] = 1 if SYMj appears in the ith sentence of EMR_true, otherwise TP[i][j] = 0
(ii) FP ∈ {0,1}^{b×m}: FP[i][j] = 1 if SYMj appears in the ith sentence of EMR_false, otherwise FP[i][j] = 0

Then, we obtain t_r and f_r from equation (7) and equation (8), respectively.

(7)

(8)

where ∘ denotes the element-wise product and A ∈ {1}^m is the all-ones vector.
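The counting step can be sketched with NumPy under one explicit assumption, because equations (5) to (8) are not reproduced here: a sentence counts as recognized when every symptom it contains is selected in φ or ψ. The function name, the variable names, and the way c_r is summed are illustrative choices, and the score itself is deliberately not computed.

```python
import numpy as np


def counts(phi, psi, true_rows, false_rows):
    """Return (t_r, f_r, c_r) for one candidate.

    Assumption: a sentence is recognized when all of its symptoms are
    selected in phi or psi. This is a guess at the recognition rule.
    """
    selected = np.maximum(phi, psi)           # union of both selections
    ones = np.ones(len(selected), dtype=int)  # the all-ones vector A

    def recognized(rows):
        covered = (rows * selected) @ ones    # selected symptoms per sentence
        present = rows @ ones                 # all symptoms per sentence
        return int(np.sum((covered == present) & (present > 0)))

    t_r = recognized(true_rows)
    f_r = recognized(false_rows)
    c_r = int(phi.sum() + psi.sum())          # symptoms used by the candidate
    return t_r, f_r, c_r


phi = np.array([1, 0, 0, 0])
psi = np.array([0, 1, 0, 0])
emr_true = np.array([[1, 1, 0, 0], [1, 0, 1, 0]])   # rows are sentences
emr_false = np.array([[0, 1, 0, 0], [0, 0, 0, 1]])
print(counts(phi, psi, emr_true, emr_false))
```

Multiplying a row by the selection vector and summing with the all-ones vector is what the element-wise product and A make possible without any explicit loop over sentences.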

3.4. Convert the Knowledge to KBRNN

With the knowledge extraction algorithm detailed in Section 3.3, we get the top-k Knowl_SYNi for each syndrome. A Knowl_SYNi can be converted to an RE in the same way that (3) is converted to (4) in Section 3.3.1. We formally treat the syndrome diagnosis task as a text classification problem, i.e., given an EMR sentence as the input of KBRNN, the output is the syndrome corresponding to the sentence.

For this task, as usual, we further process the EMR sentences as follows: we add “BOS” and “EOS” at the two ends of each EMR sentence to mark its start and end, and we pad the sentence with “PAD”s so that all sentences have the same length. Accordingly, to ensure that the RE corresponding to a Knowl_SYNi can recognize these new sentences, we add $* at both ends of the RE, where $ is the wildcard and * is the Kleene star operator. Taking (3) as an example, the RE (4) corresponding to (3) can be rewritten as the following equation:

(9)
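The effect of the wildcard padding can be tried out with Python's `re` module. The symptom sets below are hypothetical, since expressions (3), (4), and (9) are not reproduced here; the sketch assumes an RE of the form "one or more main symptoms, then one or more additional symptoms" and encodes each token followed by a single space.

```python
import re


def knowledge_to_regex(main_syms, add_syms):
    """Build $* (main)+ (add)+ $* over space-terminated tokens."""
    main = "|".join(map(re.escape, main_syms))
    add = "|".join(map(re.escape, add_syms))
    wildcard = r"\S+ "  # $: any single token, followed by its separator
    return re.compile(
        rf"(?:{wildcard})*(?:(?:{main}) )+(?:(?:{add}) )+(?:{wildcard})*"
    )


def recognizes(pattern, tokens):
    """True if the whole token sequence matches the pattern."""
    text = "".join(token + " " for token in tokens)
    return pattern.fullmatch(text) is not None


# Hypothetical knowledge: main symptoms {SYM1, SYM5}, additional {SYM3, SYM7}.
pattern = knowledge_to_regex(["SYM1", "SYM5"], ["SYM3", "SYM7"])
print(recognizes(pattern, ["BOS", "SYM5", "SYM3", "SYM7", "EOS", "PAD"]))
print(recognizes(pattern, ["BOS", "SYM4", "SYM2", "SYM7", "EOS"]))
```

Note that the leading and trailing wildcards also absorb unrelated symptoms, so a padded pattern accepts any sentence that contains a matching main-then-additional run somewhere inside it.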

In the following section, we illustrate the implementation of the KBRNN, which is generated by injecting Knowl_SYNi into RNN.

3.4.1. Embedding the Knowledge via Finite-State Automaton

A finite-state automaton (FSA) is an abstract model of computation that can change from one state to another in response to inputs. An FSA can be used to recognize sentences. Given a sentence s = <'BOS', sym₁, sym₂, sym₃, …, sym_n, 'EOS'> and an FSA Λ, we feed the elements of s into Λ in order. Λ recognizes s if and only if the state transition sequence starts from the start state and ends in a final state.

There are two types of FSA: the nondeterministic finite automaton (NFA) and the deterministic finite automaton (DFA). “Deterministic” indicates that, given a state and an input, there is a unique transition to the next state. With Thompson’s construction algorithm [34], an RE can be converted into an NFA. Then, the NFA can be converted to a unique DFA with a minimum number of states, called the m-DFA, by the power set construction algorithm followed by the DFA minimization algorithm.

Each of the k × |E_syn| Knowl_SYNi obtained by the algorithm in Section 3.3 can be converted to an m-DFA. We then merge all the m-DFAs by adding a new start state q_ϵ and adding empty transitions from q_ϵ to the start states of all m-DFAs. This new FSA can be defined formally as a 5-tuple:

Q: a nonempty, finite set of states. Let |Q| = K.
Σ: a nonempty, finite set of input vocabulary. Let |Σ| = V, V ∝ |E_sym|.
δ: transfer function, δ(q, σ) = p, where p, q ∈ Q and σ ∈ Σ.
q_ϵ: the start state, q_ϵ ∈ Q.
F′: a nonempty, finite set of final states, F′ ⊆ Q.

Based on the above definition, we can represent this FSA equivalently by the tensors T, S, and F.

T ∈ {0,1}^{V×K×K}: the transfer tensor, T[σ, i, j] = 1 if the state q_i can transit to q_j on the input word σ, otherwise 0 (q_i, q_j ∈ Q, σ ∈ Σ).
S ∈ {0,1}^K: S[i] = 1 if q_ϵ can transit to q_i directly, otherwise 0.
F ∈ {0,1}^K: F[i] = 1 if q_i ∈ F′, otherwise 0.

Now, we obtain the knowledge embedding 〈T, S, F〉.
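The embedding can be made concrete with a small NumPy sketch that fills T, S, and F from a transition list. The three-state automaton below, which accepts one or more SYM1 followed by one or more SYM3, is a hypothetical example, and `embed_fsa` is an illustrative helper rather than part of the described system.

```python
import numpy as np


def embed_fsa(num_states, vocab, transitions, start_targets, finals):
    """Encode an automaton as (T, S, F) plus a word-to-index map."""
    index = {word: i for i, word in enumerate(vocab)}
    T = np.zeros((len(vocab), num_states, num_states), dtype=int)
    for src, word, dst in transitions:
        T[index[word], src, dst] = 1      # q_src moves to q_dst on `word`
    S = np.zeros(num_states, dtype=int)
    S[list(start_targets)] = 1            # states reached directly from q_eps
    F = np.zeros(num_states, dtype=int)
    F[list(finals)] = 1                   # final states
    return T, S, F, index


# Hypothetical automaton for the pattern SYM1+ SYM3+.
transitions = [(0, "SYM1", 1), (1, "SYM1", 1), (1, "SYM3", 2), (2, "SYM3", 2)]
T, S, F, index = embed_fsa(3, ["SYM1", "SYM3"], transitions, {0}, {2})
print(T.shape, S, F)
```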

3.4.2. Inject the Knowledge Embedding into RNN

For a sentence s = <s₁, s₂, s₃, …, s_x>, Out(s) denotes the number of times s is recognized by the m-DFAs, which can be expressed as the following equation:

(10)
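One reading that is consistent with the definitions of T, S, and F is to start from S, multiply by the transfer matrix of each word in turn, and finish with an inner product against F. The sketch below follows that reading on a hypothetical three-state automaton; it should be taken as an assumption about the form of equation (10), not as a transcription of it.

```python
import numpy as np

# Hypothetical automaton for SYM1+ SYM3+ with states 0, 1, 2 (2 is final).
index = {"SYM1": 0, "SYM3": 1}
T = np.zeros((2, 3, 3), dtype=int)
T[index["SYM1"], 0, 1] = T[index["SYM1"], 1, 1] = 1
T[index["SYM3"], 1, 2] = T[index["SYM3"], 2, 2] = 1
S = np.array([1, 0, 0])
F = np.array([0, 0, 1])


def out(sentence):
    """Count accepting paths: S^T T[s_1] T[s_2] ... T[s_x] F."""
    h = S.copy()
    for word in sentence:
        h = h @ T[index[word]]   # advance every active state by one word
    return int(h @ F)


print(out(["SYM1", "SYM1", "SYM3"]))
print(out(["SYM3", "SYM1"]))
```

The intermediate vector h plays the role of a hidden state, which is what makes the later rewriting into a recurrent form natural.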

Here, we extend the approach in [31], which used canonical polyadic decomposition (CPD) to decompose T into E_R ∈ ℝ^{V×r}, D₁ ∈ ℝ^{K×r}, and D₂ ∈ ℝ^{K×r}, where r is a hyperparameter. As shown in [31], the decomposition approximates T more closely as r approaches the rank of T, whereas an overly large r leads to higher space complexity. In this work, the rank of T grows with the number of symptoms in Knowl_SYNi, which is why we must keep the number of symptoms in Knowl_SYNi to a minimum.

E_R has one row for each element of the input set Σ, so each row can be considered as the word embedding of an input word. In this work, we integrate the BERT [35] embedding into E_R. Let w_t be the word embedding of s_t, u_t be the 768-dim word embedding generated by using bert-base-chinese, and v_t be the embedding of s_t in E_R. The BERT embedding can be integrated by equation (11). Here, β ∈ [0,1] is a hyperparameter, and G ∈ ℝ^{D×r} is a trainable matrix.

(11)

With the CPD result, equation (10) can be rewritten in a recurrent form similar to the formal definition of an RNN, as in the following equation:

(12)
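The step from a product of transfer matrices to a recurrent update rests on a standard CPD identity: if T[σ, i, j] = Σ_k E_R[σ, k] D₁[i, k] D₂[j, k], then h T[σ] = ((h D₁) ∘ E_R[σ]) D₂ᵀ. The NumPy sketch below checks this identity numerically on random factors. The sizes are arbitrary, and whether equation (12) has exactly this form is an assumption based on the FA-RNN approach cited as [31].

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, r = 4, 3, 5                       # vocabulary size, states, CPD rank
E_R = rng.normal(size=(V, r))
D1 = rng.normal(size=(K, r))
D2 = rng.normal(size=(K, r))

# Rank-r tensor rebuilt from its factors.
T = np.einsum("sk,ik,jk->sij", E_R, D1, D2)


def step_full(h, s):
    """One update using the full K x K transfer matrix of word s."""
    return h @ T[s]


def step_factored(h, s):
    """The same update using only the three factor matrices."""
    return ((h @ D1) * E_R[s]) @ D2.T


h = rng.normal(size=K)
for s in range(V):
    print(s, np.allclose(step_full(h, s), step_factored(h, s)))
```

The factored update never materializes a K × K matrix, which is where the space saving for moderate r comes from.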

So far, we have obtained the RNN injected with knowledge.

4. Experiments and Results

4.1. Datasets

We collect the dataset from a project by the National Key Research and Development Program of the Chinese Academy of Traditional Chinese Medicine, “Chinese Medicine Data Center and Health Cloud Platform Building.” The EMR data are mainly from the Hospital Information System (HIS), which includes admission records, course records, discharge summaries, and medical records of cerebral palsy patients within a specific time frame. These data come from clinically valid cases and have been desensitized to protect patients’ private information.

The original EMR data have several flaws, including a nonstandard format and diverse expressions. A team of professionals was invited to tag the EMR data manually so that they could be organized into structured data for further research. Tagging was done by two people working together to prevent errors caused by the limited expertise of a single individual. Nonstandard data remained in the structured data after tagging; for instance, a particular symptom may have several distinct expressions. In data standardization, the professional terms were first standardized and sorted out collaboratively by a group of individuals, and a medical specialist then summarized the standard terms found in the medical records. In the end, 988 symptoms and 15 syndromes were standardized. According to traditional Chinese medicine, these symptoms may be further subdivided into main symptoms and additional symptoms. The main symptoms generally represent the patient’s overall condition, whereas the additional symptoms relate to complications and are a significant diagnostic criterion for distinguishing syndromes.

Thus, we obtained 5514 labeled diagnostic records from 1755 patients. Each record has three fields, main symptoms, additional symptoms, and syndrome as the label.

4.2. Experimental Steps

We divide the EMR dataset randomly into the following four parts:

(i) Pre-set (20%): the dataset used to score knowledge in the knowledge extraction algorithm
(ii) Train-set (50%): the dataset used for training the models
(iii) Dev-set (20%): the validation dataset, a set of examples used to tune hyperparameters
(iv) Test-set (10%): the dataset used for testing the performance of the trained model
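A random four-way split with these proportions can be sketched as follows. The function name, the fixed seed, and the use of integer arithmetic for the cut points are illustrative choices rather than details given in the text.

```python
import random


def split_records(records, seed=0):
    """Shuffle and split records into pre/train/dev/test (20/50/20/10)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n = len(shuffled)
    # Integer cut points avoid floating-point rounding surprises.
    a, b, c = n * 2 // 10, n * 7 // 10, n * 9 // 10
    return {
        "pre": shuffled[:a],
        "train": shuffled[a:b],
        "dev": shuffled[b:c],
        "test": shuffled[c:],
    }


parts = split_records(range(100))
print({name: len(items) for name, items in parts.items()})
```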

During the knowledge extraction phase, we execute the evolutionary algorithm, use the pre-set data for knowledge scoring, and obtain the top-k Knowl_SYNi (k = 6 in practice) for each syndrome. We removed some Knowl_SYNi that scored poorly because the sample sizes of the corresponding syndromes were insufficient.

During knowledge embedding and injecting, we obtain an untrained KBRNN that has not been trained on the train-set. We adopt some conventional machine learning models which are frequently used in text classification as baselines and compare them with KBRNN. For each baseline, we feed the hidden representation produced by these models into a multilayer perceptron (MLP) and use the cross-entropy loss as the objective function.

4.3. Experimental Results

We compare KBRNN with RNN [36], LSTM [37], GRU [38], 4-layer CNN [39], 4-layer DAN [40] as well as their bidirectional variants. We use the cross-entropy loss as the objective function and input the hidden representation generated by these models into a 3-layer MLP to obtain the label logits. For each dataset, we tune the learning rates from [0.01, 0.005, 0.001, 0.0005, 0.0001] and the number of hidden states in [50, 100, 150, 200]. Two potential benefits are explored as follows:

(1) The contribution of knowledge extraction to KBRNN: we use the pre-set as the training set of the baselines and compare the performance of the untrained KBRNN and the baselines on the test set
(2) The ability of KBRNN to benefit from labeled data: we utilize both the pre-set and the train-set (50%, 100%) as the training set and fine-tune the untrained KBRNN with the train-set

Table 1 displays the classification accuracy of the KBRNN and baseline models on the test-set after training with varying amounts of training data. The KBRNN achieves 79.31% diagnostic accuracy when it is only injected with the knowledge extracted from the KG based on the pre-set, and this rises to 83.12% after sufficient training on the 100% train-set.

Table 1. Classification accuracy (%) of the KBRNN and baselines on the test-set.

Model	Pre-set	Pre-set + 50% train-set	Pre-set + 100% train-set
KBRNN	79.31	82.03	83.12
RNN	44.28	71.32	78.03
LSTM	45.01	76.04	79.49
GRU	45.19	76.22	79.85
Bi-RNN	44.64	76.59	82.39
Bi-LSTM	45.74	77.13	80.40
Bi-GRU	45.91	79.85	82.58
CNN	47.37	79.67	81.67
DAN	48.09	80.21	81.85

The results show that the untrained KBRNN outperforms all the baselines that are trained only on the pre-set. It is also better than some of the baselines trained with 50% of the train-set (Figure 4). We believe that KBRNN obtains considerable a priori knowledge from the knowledge graph through injection. The classification result on the full samples by using the fully trained KBRNN is shown as the confusion matrix in Figure 5, which provides a good insight into how often samples of each of the fifteen syndromes are correctly classified or misclassified by the proposed model. We can see that the number of samples varies greatly across syndrome types, and the true positive rate is maintained at a high level even for the syndrome with a large number of samples. As with other models, KBRNN benefits from expanding the training set while keeping its accuracy advantage.

5. Discussion and Conclusions

TCM, as a complementary field of medicine outside the modern medicine system, has played a significant role in cerebral palsy syndrome diagnosis. In this work, we propose a knowledge-based RNN (KBRNN) for cerebral palsy syndrome diagnosis. Our major contribution is building an evolutionary algorithm to extract the diagnosis knowledge from the KG. We also present a method of injecting the TCM knowledge into the RNN. Compared with simple KG inference or rule-based methods, the KBRNN, as a neural network model, can be further trained on EMR data, which makes it generalize better. On the other hand, compared with the traditional neural network model, KBRNN can benefit from TCM knowledge. Specifically, with the help of TCM knowledge, KBRNN outperforms the baseline neural models when only a few EMRs are available, and it remains competitive in rich-resource settings.

In conclusion, KBRNN benefits from two sources, i.e., knowledge extracted from the cerebral palsy knowledge graph and labeled EMRs. We show that KBRNN achieves higher accuracy than the baselines in syndrome diagnosis with knowledge injection alone. Moreover, the performance of KBRNN can be further improved by training on a large amount of labeled EMRs, after which it outperforms the baseline models.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the CACMS Innovation Fund (CI2021A00512) and the National Key Research and Development Program of China under grant 2017YFC1703506.

Open Research

Data Availability

All data included in this study are available from the corresponding author upon request.

References

1 Graham H. K., Rosenbaum P., and Paneth N., Erratum: cerebral palsy, Nature Reviews Disease Primers. (2016) 2, no. 1.
2 Zhou X., Menche J., Barabási A.-L., and Sharma A., Human symptoms–disease network, Nature Communications. (2014) 5, no. 1, https://doi.org/10.1038/ncomms5212.
3 Wang Y., Shi X., Li L., Efferth T., and Shang D., The impact of artificial intelligence on traditional Chinese medicine, The American Journal of Chinese Medicine. (2021) 49, no. 6, 1297–1314, https://doi.org/10.1142/s0192415x21500622.
4 Zhou X., Chen S., Liu B., Zhang R., Wang Y., Li P., Guo Y., Zhang H., Gao Z., and Yan X., Development of traditional Chinese medicine clinical data warehouse for medical knowledge discovery and decision support, Artificial Intelligence in Medicine. (2010) 48, no. 2–3, 139–152, https://doi.org/10.1016/j.artmed.2009.07.012, 2-s2.0-77951641206.
5 Wang H., A computerized diagnostic model based on naive Bayesian classifier in traditional Chinese medicine, International Conference on BioMedical Engineering and Informatics. (2008) 1, 474–477.
6 Zhang D., Pang B., Li N., Wang K., and Zhang H., Computerized diagnosis from tongue appearance using quantitative feature classification, The American Journal of Chinese Medicine. (2005) 33, no. 6, 859–866, https://doi.org/10.1142/s0192415x05003466, 2-s2.0-31144456142.
7 Hu X., Zhu H., Xu J., Xu D., and Dong J., Wrist pulse signals analysis based on deep convolutional neural networks, Proceedings of the IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, 2014, Honolulu, HI, USA, IEEE, 1–7.
8 Fu S., Zheng H., and Yang Z., Computerized tongue coating nature diagnosis using convolutional neural network, Proceedings of the IEEE 2nd International Conference on Big Data Analysis (ICBDA), 2017, Piscataway, NJ, USA, IEEE, 730–734.
9 Hou J., Su H.-Y., and Yan B., Classification of tongue color based on CNN, Proceedings of the IEEE 2nd International Conference on Big Data Analysis (ICBDA), 2017, Piscataway, NJ, USA, IEEE, 725–729.
10 Yang K., Zheng Y., Lu K., Chang K., Wang N., Shu Z., Yu J., Liu B., Gao Z., and Zhou X., PDGNet: predicting disease genes using a deep neural network with multi-view features, IEEE/ACM Transactions on Computational Biology and Bioinformatics. (2022) 19, no. 1, 575–584, https://doi.org/10.1109/tcbb.2020.3002771.
11 Dai Y., Wang G., Dai J., and Geman O., A multimodal deep architecture for traditional Chinese medicine diagnosis, Concurrency and Computation: Practice and Experience. (2020) 32, no. 19, https://doi.org/10.1002/cpe.5781.
12 Liang Z., Liu J., Ou A., Zhang H., Li Z., and Huang J. X., Deep generative learning for automated EHR diagnosis of traditional Chinese medicine, Computer Methods and Programs in Biomedicine. (2019) 174, 17–23, https://doi.org/10.1016/j.cmpb.2018.05.008, 2-s2.0-85048733579.
13 Sung S.-F., Lin C.-Y., and Hu Y.-H., EMR-Based phenotyping of ischemic stroke using supervised machine learning and text mining techniques, IEEE Journal of Biomedical and Health Informatics. (2020) 24, no. 10, 2922–2931, https://doi.org/10.1109/jbhi.2020.2976931.
14 Chen X., Jia S., and Xiang Y., A review: knowledge reasoning over knowledge graph, Expert Systems with Applications. (2020) 141, https://doi.org/10.1016/j.eswa.2019.112948.
15 Ji S., Pan S., Cambria E., Marttinen P., and Yu P. S., A survey on knowledge graphs: representation, acquisition, and applications, IEEE Transactions on Neural Networks and Learning Systems. (2022) 33, no. 2, 494–514, https://doi.org/10.1109/tnnls.2021.3070843.
16 Gong F., Wang M., Wang H., Wang S., and Liu M., SMR: medical knowledge graph embedding for safe medicine recommendation, Big Data Research. (2021) 23, https://doi.org/10.1016/j.bdr.2020.100174.
17 Abdelaziz I., Fokoue A., Hassanzadeh O., Zhang P., and Sadoghi M., Large-Scale structural and textual similarity-based mining of knowledge graph to predict drug-drug interactions, Journal of Web Semantics. (2017) 44, 104–117, https://doi.org/10.1016/j.websem.2017.06.002, 2-s2.0-85020795447.
18 Liu W., Zhou P., Zhao Z., Wang Z., Ju Q., Deng H., and Wang P., K-BERT: enabling language representation with knowledge graph, Proceedings of the AAAI Conference on Artificial Intelligence. (2020) 34, no. 3, 2901–2908, https://doi.org/10.1609/aaai.v34i03.5681.
19 Wang H., Ren H., and Leskovec J., Relational message passing for knowledge graph completion, 2021, https://arxiv.org/abs/2002.06757.
20 Yang R., Ye Q., Cheng C., Zhang S., Lan Y., and Zou J., Decision-making system for the diagnosis of syndrome based on traditional Chinese medicine knowledge graph, Evidence-based Complementary and Alternative Medicine. (2022) 9, https://doi.org/10.1155/2022/8693937.
21 Zheng W., Yan L., Gou C., Zhang Z. C., Jason Zhang J., Hu M., and Wang F. Y., Pay attention to doctor–patient dialogues: multi-modal knowledge graph attention image-text embedding for COVID-19 diagnosis, Information Fusion. (2021) 75, 168–185, https://doi.org/10.1016/j.inffus.2021.05.015.
22 Zhang X. and Che C., Drug repurposing for Parkinson’s disease by integrating knowledge graph completion model and knowledge fusion of medical literature, Future Internet. (2021) 13, no. 1, https://doi.org/10.3390/fi13010014.
23 Yang Y., Yin X., and Yang H., KGSynNet: A Novel Entity Synonyms Discovery Framework with Knowledge Graph, 2021, Springer, New York, NY, USA.
24 Lin Z., Yang D., and Yin X., Patient similarity via joint embeddings of medical knowledge graph and medical entity descriptions, IEEE Access. (2020) 8, 156663–156676, https://doi.org/10.1109/access.2020.3019577.
25 Lin Z., Yang D., Jiang H., and Yin H., Learning patient similarity via heterogeneous medical knowledge graph embedding, IAENG International Journal of Computer Science. (2021) 48.
26 El-Shafai W., A Mahmoud A., M El-Rabaie E. S., Taha T., Zahran O., El-Fishawy A., Abd-Elnaby M., and Abd El-Samie F., Traditional Chinese medicine automated diagnosis based on knowledge graph reasoning, Computers, Materials and Continua. (2022) 71, no. 1, 159–170, https://doi.org/10.32604/cmc.2022.017295.
27 Yao Y., Wang Z., Li L., Lu K., Liu R., Liu Z., and Yan J., An ontology-based artificial intelligence model for medicine side-effect prediction: taking traditional Chinese medicine as an example, Computational and Mathematical Methods in Medicine. (2019) 7, https://doi.org/10.1155/2019/8617503, 2-s2.0-85073621815.
28 Xie Y., Hu L., Chen X., Feng J., and Zhang D., Auxiliary diagnosis based on the knowledge graph of TCM syndrome, Computers, Materials and Continua. (2020) 65, no. 1, 481–494, https://doi.org/10.32604/cmc.2020.010297.
29 Lin B. Y., Lee D.-H., and Shen M., TriggerNER: learning with entity triggers as explanations for named entity recognition, 2020, https://aclanthology.org/2020.acl-main.752.
30 Luo B., Feng Y., and Wang Z., Marrying up regular expressions with neural networks: a case study for spoken language understanding, 2018, https://arxiv.org/abs/1805.05588.
31 Jiang C., Zhao Y., Chu S., Shen L., and Tu K., Cold-start and interpretability: turning regular expressions into trainable recurrent neural networks, 2020, https://aclanthology.org/2020.emnlp-main.258.
32 Jiang C., Jin Z., and Tu K., Neuralizing regular expressions for slot filling, 2021, https://aclanthology.org/2021.emnlp-main.747.
33 Mou Z., Study on the construction of TCM diagnosis and treatment knowledge map and the dominance of tacit knowledge in children with cerebral palsy, Chinese Academy Of Traditional Chinese Medicine. (2021) 33.
34 Thompson K., Programming Techniques: regular expression search algorithm, Communications of the ACM. (1968) 11, no. 6, 419–422, https://doi.org/10.1145/363347.363387, 2-s2.0-84945708555.
35 Devlin J., Chang M.-W., Lee K., and Toutanova K., BERT: pre-training of deep bidirectional transformers for language understanding, 2019, https://arxiv.org/abs/1810.04805.
36 Elman J. L., Finding structure in time, Cognitive Science. (1990) 14, no. 2, 179–211, https://doi.org/10.1207/s15516709cog1402_1.
37 Hochreiter S. and Schmidhuber J., Long short-term memory, Neural Computation. (1997) 9, no. 8, 1735–1780, https://doi.org/10.1162/neco.1997.9.8.1735, 2-s2.0-0031573117.
38 Chung J., Gulcehre C., Cho K., and Bengio Y., Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014, https://arxiv.org/abs/1412.3555.
39 Kim Y., Convolutional neural networks for sentence classification, 2014, https://arxiv.org/abs/1408.5882.
40 Iyyer M., Manjunatha V., Boyd-Graber J., and Daumé H., Deep unordered composition rivals syntactic methods for text classification, 2015, https://aclanthology.org/P15-1162.
