Volume 2022, Issue 1 4262270
Research Article
Open Access

[Retracted] Research of Vertical Domain Entity Linking Method Fusing Bert-Binary

Hairong Wang (Corresponding Author), Beijing Zhou, Bo Li, and Xi Xu

School of Computer Science and Engineering, North Minzu University, 750021, China
First published: 23 August 2022
Academic Editor: Yuan Li

Abstract

To address the problems of unclear entity boundaries and low recognition accuracy in Chinese text, we construct a crop dataset and propose a Bert-binary-based entity linking method. Candidate entity sets are generated by matching entities across multiple data sources. The Bert-binary model is then invoked to compute the probability that each candidate entity is correct, and the highest-scoring entity is selected for linking. In comparative experiments with three models on the crop dataset, the F1 score improves by 2.5% over the best baseline and by 8.8% on average. The experimental results demonstrate the effectiveness of the proposed Bert-binary method.

1. Introduction

As a key technology of natural language processing, entity linking is widely used in knowledge representation, knowledge retrieval, and other fields [1]. Entity linking methods mainly include dictionary-based methods [2–4], search engine methods [5, 6], context augmentation methods, and deep learning methods. In recent years, with the development of deep learning and its wide application in various fields, entity linking methods based on deep learning have attracted increasing attention [5, 7–29]. Ganea and Hofmann [7] performed joint disambiguation in entity linking by combining local and global methods. Le and Titov [8] proposed a neural entity linking model that introduces entity relations, optimizing entity linking in an end-to-end manner with relations as latent variables. Hosseini et al. [9] proposed a neural embedding-based feature function that enhances implicit entity linking through prior term dependencies and entity-based feature function interpolation. Shengchen et al. [10] proposed a domain-integrated entity linking method based on relation indexing and representation learning, to address the problem that existing entity linking methods cannot combine text information and knowledge base information well. Xie et al. [11] proposed a GRCCEL (graph-ranking collective Chinese entity linking) algorithm to address the problems of ignoring semantic associations between entities and being limited by the size of the knowledge graph; it exploits the structural relationships between entities in the knowledge graph and the additional background information provided by external knowledge sources to obtain richer semantic and structural information and thus a stronger ability to distinguish similar entities. Xia et al. [12] proposed an integrated entity linking algorithm that uses topic distributions to represent knowledge and generate candidate entities. Rosales-Méndez et al. [13] proposed a fine-grained entity linking classification model to distinguish different types of entities and relationships. Wang et al. [14] explored methods to resolve the ambiguity of entities in heterogeneous information networks (HINs). Huang et al. [15] constructed an entity linking model combining a deep neural network with an association graph, which enriches mention and entity representations with character features, context, and deep semantic information and performs similarity matching. Li et al. [16] constructed an entity linking model with a candidate entity generation step that combines knowledge base matching and word vector similarity calculation. Rosales-Méndez et al. [13] also designed a questionnaire, proposed a fine-grained entity linking classification scheme based on the survey results, relabeled three general entity linking datasets according to the scheme, and created a fine-grained entity linking system. Zhou et al. [17] proposed a graph-based joint feature method that preprocesses the knowledge base and text, combines the semantic similarity of multiple features, and uses restarted random walks and joint disambiguation to select linked entities for mentions in a graph model. Zhan et al. [18] introduced the BERT pretrained language model into the entity linking task and used TextRank keyword extraction to enhance the topic information in the comprehensive description of the target entity.

The main tasks of entity linking are candidate entity generation and entity disambiguation [30]. Common methods of candidate entity generation include dictionary-based construction methods [19–22], context-based augmentation methods [23–25], and search engine-based construction methods [5, 26]. Entity disambiguation resolves the ambiguity of the generated candidate entity set to determine the target entity in the current context; its mainstream approach is entity disambiguation based on deep learning [7, 27–29].

We make the following contributions in this paper: (1) a Bert-binary entity linking method that combines Bert with a binary classification approach is proposed; (2) an algorithm for candidate entity set generation is given; (3) a candidate entity disambiguation method is proposed.

2. Model Framework

The entity linking method based on Bert-binary takes the text produced by named entity recognition, identifies one or more entity mentions to be linked, generates a candidate entity set, performs joint disambiguation over all entity mentions in the candidate entity set, and returns the target entity in the knowledge base corresponding to each entity mention. The process is shown in Figure 1.

Figure 1: Process of the Bert-binary entity linking method.

The Bert-binary entity linking method comprises two main tasks: candidate entity generation and candidate entity disambiguation.
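As a high-level illustration, the two stages can be composed as follows. This is a minimal Python sketch, not the paper's implementation; the helper functions generate_candidates and disambiguate are hypothetical stand-ins for the procedures detailed in Sections 3 and 4.

    # Minimal sketch of the Bert-binary pipeline. generate_candidates and
    # disambiguate are hypothetical stand-ins for Sections 3 and 4.
    def link_entities(mentions, context, named_entities):
        """Return the best knowledge-base entity for each recognized mention."""
        # Stage 1: build a candidate entity set per mention (Algorithm 1, Section 3).
        candidates = generate_candidates(mentions, named_entities)
        links = {}
        for mention in mentions:
            if not candidates[mention]:
                continue  # the mention has no counterpart in the knowledge base
            # Stage 2: score each candidate in context, keep the highest (Section 4).
            links[mention] = disambiguate(mention, context, candidates[mention])
        return links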

3. Candidate Entity Set Generation

The quality of the candidate entities determines the effect of entity linking [31]. The entity mentions identified by named entity recognition may be ambiguous: the same semantic entity may have different surface forms, and different semantic entities may share the same surface form. Therefore, this paper constructs, for a specific domain, an entity mapping table that should contain all candidate entities corresponding to each entity mention. Taking agriculture as an example, part of the constructed entity mapping is shown in Table 1.

Table 1. Entity mapping table.
Entity mention            Candidate entities corresponding to the mention
Rice                      Paddy, millet
Clouds of rice disease    Brown leaf blight, leaf burn
Peony leaf tip blight     Tip-white blight
Potato                    yáng yù, mǎ líng shŭ, dì dàn
Rice paddies aphids       Macrosiphum avenae, Sitobion avenae
Downy mildew              Yellow stunt
Rice blast                Fire blast, knock blast
Sheath blight             Sharp eyespot, moire disease

Based on the constructed domain entity mention mapping table, the type of each entity mention is determined from the mention and its context. The knowledge base and a semantic dictionary are then used to retrieve the candidate entities corresponding to the mention, producing the candidate entity set. The specific procedure is shown in Algorithm 1.

    Algorithm 1: Candidate entity generation.
  • Input: Dictionary D, a set of entity mentions M (m ∈ M), a set of named entities key = (k : value)

  • Output: Candidate entity set Em

  • 1: for m ∈ M do

  • 2:  if (k == m ‖ k ∈ m) then

  • 3:   Em = Em.add(key)

  • 4:  else if (k exactly matches the initial letters of the words in m) then

  • 5:   Em = Em.add(key)

  • 6:  end if

  • 7: end for

The candidate entity set is generated by fuzzy matching against a synonym dictionary and CNKI. Taking Ningxia rice as an example, the text to be processed is: “In recent years, varieties widely promoted in Ningxia production include: Gongyuan 4, Ningjing 23, Ningjing 24….” The entities contained in the text are “Ningxia,” “Gongyuan 4,” “Ningjing 24,” and “Ningjing 23.” Using Algorithm 1, “Ningxia” is mapped to the type “place name,” while “Gongyuan 4,” “Ningjing 24,” and “Ningjing 23” are mapped to the type “rice variety.”
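For illustration, the following is a minimal Python sketch of Algorithm 1. It assumes the named entities are given as an ordinary dict from entity names to knowledge-base values, and it approximates the initial-letter test of line 4 with a simple comparison of word initials; neither detail is specified in the paper.

    def generate_candidates(mentions, named_entities):
        """Sketch of Algorithm 1: build the candidate entity set Em per mention.

        mentions: iterable of entity mention strings (the set M)
        named_entities: dict mapping an entity name k to its knowledge-base value
        """
        candidates = {}
        for m in mentions:
            em = set()
            for k, value in named_entities.items():
                # Line 2: exact match, or the entity name occurs inside the mention.
                if k == m or k in m:
                    em.add((k, value))
                # Line 4: k matches the initial letters of the words in m.
                elif k.lower() == "".join(w[0] for w in m.split()).lower():
                    em.add((k, value))
            candidates[m] = em
        return candidates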

4. Candidate Entity Disambiguation

To handle the ambiguity that may remain in the generated candidate entity set, this paper adopts a binary classification approach: the Bert model is invoked to compute a probability score for each candidate entity, and the entity with the highest score is taken as the correct entity to link. The process is shown in Figure 2.

Figure 2: Process of candidate entity disambiguation.
The fully connected layer in the model acts as a classifier: it maps n real-valued inputs in $(-\infty, +\infty)$ to K real values in $(-\infty, +\infty)$, and the sigmoid activation function then maps those K values to probabilities in (0, 1), the K probabilities summing to 1. The calculation is

(1) $\hat{y} = \mathrm{sigmoid}(W^{T}x + b)$

where x is the input of the fully connected layer, $W^{T}$ is the weight, b is the bias, and $\hat{y}$ is the probability output by the sigmoid. The sigmoid function is calculated as

(2) $\mathrm{sigmoid}(z) = \frac{1}{1 + e^{-z}}$

In general, the input of the last fully connected layer of a deep neural network serves as the feature the network has extracted from the input data, and the layer computes

(3) $z = W^{T}x + b$

Expanding formula (3) for class j gives

(4) $z_{j} = \sum_{n} w_{jn}x_{n} + b_{j}$

where $w_{jn}$ is the weight of feature $x_{n}$ under class j.

The feature vectors output by the BERT layer are concatenated and fed into the fully connected layer:

(5) $x = [S_{\mathrm{CLS}}; S_{\mathrm{begin}}; S_{\mathrm{end}}]$

where $S_{\mathrm{CLS}}$ is the output vector at the CLS position, $S_{\mathrm{begin}}$ is the feature vector at the beginning position of the candidate entity, and $S_{\mathrm{end}}$ is the feature vector at the end position of the candidate entity.

To mitigate overfitting during model training, a dropout layer is added between the two fully connected layers, reducing overfitting by reducing the number of active feature detectors. Because the entity disambiguation model uses only two fully connected layers and a sigmoid layer, a shallow neural network, the dropout rate is set to 0.15.
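A minimal PyTorch sketch of this classification head is given below, assuming a BERT hidden size of 768 and an intermediate width of 256 (neither is reported in the paper); the ReLU between the two layers is likewise an assumption, and the BERT encoder itself is elided.

    import torch
    import torch.nn as nn

    class BertBinaryHead(nn.Module):
        # Sketch of the disambiguation head: two fully connected layers with a
        # dropout layer between them and a sigmoid output. The hidden sizes and
        # the ReLU are assumptions; the BERT encoder is elided.
        def __init__(self, hidden=768, mid=256):
            super().__init__()
            self.fc1 = nn.Linear(3 * hidden, mid)  # input is [S_CLS; S_begin; S_end]
            self.dropout = nn.Dropout(p=0.15)      # dropout rate from the text
            self.fc2 = nn.Linear(mid, 1)

        def forward(self, s_cls, s_begin, s_end):
            x = torch.cat([s_cls, s_begin, s_end], dim=-1)  # formula (5)
            z = self.fc2(self.dropout(torch.relu(self.fc1(x))))
            return torch.sigmoid(z).squeeze(-1)  # probability the candidate is correct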

For the binary classification task, the loss function is

(6) $L = -\sum_{j}\left[y_{j}\log\hat{y}_{j} + (1 - y_{j})\log(1 - \hat{y}_{j})\right]$

where $\hat{y}_{j}$ is the model's predicted probability that sample j is positive and $y_{j}$ is the sample label, taking the value 1 if the sample is a positive example and 0 otherwise.
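As a brief illustration, one training step under this loss with the head sketched above might look as follows; the batch, feature vectors, and learning rate are dummy values, not settings from the paper.

    import torch
    import torch.nn as nn

    head = BertBinaryHead()                 # the sketch defined above
    criterion = nn.BCELoss()                # binary cross-entropy, formula (6)
    optimizer = torch.optim.Adam(head.parameters(), lr=2e-5)

    # Dummy BERT feature vectors for a batch of 4 candidates (assumed size 768).
    s_cls, s_begin, s_end = (torch.randn(4, 768) for _ in range(3))
    labels = torch.tensor([1.0, 0.0, 0.0, 1.0])  # 1 marks the correct candidate

    optimizer.zero_grad()
    probs = head(s_cls, s_begin, s_end)
    loss = criterion(probs, labels)
    loss.backward()
    optimizer.step()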

5. Method Validation

To verify our method, a crop text dataset is constructed for the agricultural domain; it contains a total of 24,779 named entities. Precision (P), recall (R), and F1 are used to evaluate the effectiveness of our method and are calculated as

(7) $P = \frac{TP}{TP + FP}$

(8) $R = \frac{TP}{TP + FN}$

(9) $F1 = \frac{2 \times P \times R}{P + R}$

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.
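These are the standard definitions; for illustration, a small Python helper computing them from raw counts:

    def precision_recall_f1(tp, fp, fn):
        """Standard precision, recall, and F1 from true/false positive counts."""
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    # Illustrative counts only, not figures from the paper.
    print(precision_recall_f1(9247, 854, 753))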

The experiments are run in a Python environment. On the constructed crop dataset, our method is compared with BiLSTM-Attention [32], LSTM-CNN-CRF [33], and BiLSTM-CNN [34]. The experimental results are shown in Table 2.

Table 2. Comparison of experimental results.
Model                       P       R       F1
BiLSTM-Attention [32]       0.8013  0.7710  0.7859
LSTM-CNN-CRF [33]           0.8674  0.7734  0.8177
BiLSTM-CNN [34]             0.9144  0.8716  0.8925
Bert-binary (our method)    0.9154  0.9247  0.9175

Experimental results of BiLSTM-Attention [32] on crowdsourced annotated datasets in the field of information security showed that the model significantly outperforms BiLSTM-Attention-CR, CRF-MA, Dawid & Skene-LSTM, BiLSTM-Attention-CRF-VT, and other methods, with an F1 3.6% higher than the best of them. LSTM-CNN-CRF [33] used hand-crafted features, part-of-speech tagging information, and prebuilt lexicon information to augment sentence representations, improving named entity recognition performance. BiLSTM-CNN [34] incorporates the CNN's ability to extract local and long-distance dependency features. Compared with these three CNN- and LSTM-based methods, our method achieves an average improvement of 5.4% in precision, 11.9% in recall, and 8.8% in F1, confirming gains on all three metrics.

6. Conclusions

A Bert-binary method is proposed in this paper. It constructs an entity mapping table for a specific domain, uses a semantic dictionary to generate the candidate entity set, applies the Bert-binary algorithm to disambiguate the generated candidate entities, and selects the referent of each entity mention by the probability scores of its candidates. The agricultural domain is selected, a crop dataset is constructed, and a Python experimental environment is set up for comparative experiments with three models on the crop dataset. The experimental results show the effectiveness of the proposed Bert-binary method.

In the future, the combination of a fully connected network and sigmoid as the classifier can be improved by optimizing the decision variables of the neural network [35, 36] to obtain better accuracy, feasibility, and reliability.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the North Minzu University Major Education and Teaching Reform Projects (2021) and the Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission.

Data Availability

The crop dataset used to support the findings of this study has been deposited in the Baidu AI Studio repository (https://aistudio.baidu.com/aistudio/datasetdetail/153737).
