Volume 2022, Issue 1 3616432
Research Article
Open Access

Correlation between the Dissemination of Classic English Literary Works and Cultural Cognition in the New Media Era

Weiwei Guo

Corresponding Author


Zhengzhou Vocational College of Industrial Safety, Zhengzhou 451192, China

First published: 20 July 2022
Citations: 1
Academic Editor: Qiangyi Li

Abstract

With the continuous development of new media technology, the spiritual needs of the public have been largely satisfied, and aesthetic ability has improved significantly compared with the past. At present, “literary works,” as the spiritual food of contemporary people, help to advance the spirit of society. Natural language processing and knowledge graph technology can improve cultural cognition and thereby promote the dissemination and development of classic English literature, and they have become a necessary means of disseminating it. Most existing work on the appreciation of classic English literary works is based on modern literature datasets. Despite the continuous development of new media technology, there are few studies on the dissemination and cultural cognition of classic English literary works. This makes it hard for readers to obtain cultural cognition from these works and hinders their dissemination and development. In view of these problems, this paper uses natural language processing and knowledge graph technology, takes Shakespeare's play “Hamlet” as a representative of classic English literary works, studies the construction method of the knowledge graph, and extracts and analyzes the cultural characteristics in literary works. For parsing, a bidirectional gated recurrent unit network model based on hybrid character embedding is proposed. Based on n-gram embedding, and by combining pretrained embedding and radical embedding, it can fully consider the rich semantic information in English literary works when extracting features. For named entity recognition, building on the existing iterative atrous convolutional network model, an attention-based iterative atrous convolutional network model is proposed to obtain the best label sequence and the final labeled entity information. For knowledge graph construction and visual query, a workflow for building a knowledge graph from unstructured text is proposed, and a Flask-based knowledge graph visual query system is designed that applies the best models of the above two tasks. We decode the complete “Hamlet” text, extract entities and their semantic links as nodes and relationships in the knowledge graph, store the knowledge in a graph database, and finally form a visual query system that combines the front end and back end.

1. Introduction

New media has become a new form of media supported by Internet big data. Since its inception, new media has been successfully applied to commercial promotion activities. Many book publishing and book marketing companies have begun to use new media for communication, and many literary works have been recognized by the public through this form of communication [1]. The characteristics of new media communication include strong interactivity, a massive information-carrying capacity, information fragmentation, and strong personalization and initiative.

Cultural cognition should be composed of three levels: general cultural epistemology, special cultural epistemology, and comparative cultural epistemology. The relationship between cultural cognition and general epistemology is not that of the special to the general but a shift in research perspective. Cultural cognition theory took shape in the cultural-historical school in the 1970s. Interdisciplinary research represented by Vygotsky gradually integrated the social, cultural, contextual, and psychological levels into one theoretical framework and repeatedly confirmed that the development of cognition is associated with specific life experiences [2]. Precisely because cognitive development is the product of specific activities in specific contexts, culture and cognition are no longer viewed as independent and dependent variables, respectively, but as mutually influential [3]. As a result, different cultural groups have cognitive differences that are generally associated with specific experiences rooted in specific sociocultural and historical contexts. In educational practice, we should pay attention to the influence of cultural differences on individual cognition, especially in language education, and we need to explore the influence and guiding significance of cultural cognition theory for language teaching in different cultural contexts [4].

In the era of the information industry, “Internet+” technology has injected new impetus into the development of all walks of life. In the creation and dissemination of literary works, and along with the prosperity and growth of digital media, the spread of literary works over networks is now imminent [5]. With the continuous development and updating of computer technology, people have begun to digitize classic English literary works [6].

Among these technologies, the knowledge graph supports knowledge management applications such as semantic retrieval, knowledge question answering, and recommendation systems [7]. In the library and information science community, the knowledge graph is known as knowledge domain visualization: visualization technology is used to describe knowledge and the connections within it. Google proposed the concept of the knowledge graph (KG), which is dedicated to improving search engines [8]. A knowledge graph is usually expressed as $G = (E, R, S)$, where $E = \{e_1, e_2, \ldots, e_{|E|}\}$ represents the entity set, $R = \{r_1, r_2, \ldots, r_{|R|}\}$ represents the relation set, and $S \subseteq E \times R \times E$ represents the set of triples in the knowledge graph [9].

Knowledge graphs can be divided into English and Chinese types according to language. Vertical domain knowledge graphs in English include IMDB and MusicBrainz [10]. Vertical domain knowledge graphs in Chinese include the traditional Chinese medicine (TCM) knowledge graph, the marine knowledge graph, and the enterprise knowledge graph. The disadvantage is that such graphs are usually constructed manually, which requires a lot of human and financial resources [11].

This paper promotes the dissemination and development of classic English literary works in the new media environment by using natural language processing and knowledge graph technology to improve readers' cultural cognition. Deep neural network models are used for word segmentation, part-of-speech tagging, and named entity recognition on the content of the works. Compared with traditional methods, this does not require manually constructing complex feature templates, which greatly reduces the amount of labor [12]. Finally, we apply the best models of the above two natural language processing tasks on the English dataset to “Hamlet,” a representative work of classic English literature, and manually organize and correct the output text [13]. On this basis, we manually create relationships for the extracted entities, supplement the knowledge with reference to Wikipedia, and use a graph database as knowledge storage to design and implement the “Hamlet” knowledge graph visual query system [14].

2. Theoretical Research

2.1. Cultural Cognition Analysis of Classic English Literary Works

As a kind of language education, classic English literature works are aimed at people who are not native speakers of Chinese [15]. Therefore, on the one hand, classic English literature works are language education, which aims to teach students knowledge of the language itself, such as pronunciation, grammar, vocabulary, and Chinese characters, and to cultivate their listening, speaking, reading, and writing abilities so that they can communicate. On the other hand, literary works are foreign language education, which is a second language education based on the cultural environment of the mother tongue and has the particularity of Chinese as a second language education [16]. The former emphasizes the language characteristics of Chinese itself in the classic English literary works and the operating procedures in the education process, while the latter offers a deeper interpretation of the essence of language education, namely, the ethnic differences formed by the interaction and evolution of language and culture [17]. Although the cultural essence of classic English literature is the interaction of foreign cultures based on ethnic differences, the meaning of “culture” in classic English literature cannot simply be interpreted through general cultural concepts [18]. It mainly refers to the living culture that connects people's communication; that is to say, the culture of classic English literature is “communicative.” All language learning ultimately serves the smooth completion of communication. In achieving this goal, there are, in addition to language, nonlinguistic factors that restrict the flow of language and that must be mastered for communication to succeed. Therefore, while acquiring language knowledge, learners should also gain a deep understanding of the communicative culture of Chinese-speaking countries.

2.2. Knowledge Graph Overview

To promote the cultural cognition and dissemination of English classic literary works in the new media environment, a knowledge graph is used to model the classic literary works [19]. The knowledge graph essentially originates from the semantic network. In the structure of the knowledge graph, nodes can be real-life entities or numerical values, and edges represent the relationship between two nodes [20]. As shown in Figure 1, the nodes in the knowledge graph can be various types of entities: “Yao Ming” and “Shanghai, China” are two different types of entities, namely, a person entity and a location entity. The expression form can be “entity-relationship-entity,” such as “Yao Ming-place of birth-Shanghai,” indicating that the relationship between the “Yao Ming” person entity and the “Shanghai” location entity is “place of birth,” and the expression form can also be “entity-attribute-attribute value.” All these nodes and the relationships between them are usually stored as triples.
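The triple form described above can be sketched in a few lines of Python. This is only an illustration: the `Triple` type, the `neighbors` helper, and the second triple (an attribute-style fact about Shanghai) are hypothetical and do not come from the paper's system.

```python
from collections import namedtuple

# A triple links a head node to a tail node (or value) through a named edge.
Triple = namedtuple("Triple", ["head", "relation", "tail"])

triples = [
    # entity-relationship-entity, taken from the example in the text
    Triple("Yao Ming", "place of birth", "Shanghai"),
    # entity-attribute-attribute value (hypothetical illustrative triple)
    Triple("Shanghai", "country", "China"),
]

def neighbors(triples, head):
    """Return (relation, tail) pairs for every triple that starts at `head`."""
    return [(t.relation, t.tail) for t in triples if t.head == head]

print(neighbors(triples, "Yao Ming"))
```

Storing every fact in the same three-field shape is what lets a graph database treat persons, places, and attribute values uniformly as nodes and edges.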


2.3. Introduction to Knowledge Extraction

The main purpose of knowledge extraction is to identify relevant information elements in data, such as cultural characteristics in English literary works, so that large-scale knowledge graphs can be built quickly. According to the data source, data can be divided into different types. Structured data refer to data with a strict data model structure, semistructured data refer to data whose structure is not rigidly specified, and unstructured data are information that does not have a predefined data model. The knowledge extraction task can be reduced to several basic subtasks:
  • (1) Entity extraction: the main purpose is to identify meaningful entities from text, which is the most critical part of knowledge extraction.

  • (2) Relation extraction: after the entities are obtained, the relationships between them are extracted from the related text, and the initially unconnected entities are linked through these relationships to form a network of structured knowledge.

2.4. Serialization Annotation Network Model

For sequence labeling tasks, the most typical neural network models are bidirectional LSTM networks (BILSTM) and bidirectional LSTM networks with a CRF layer (BILSTM-CRF). As shown in Figure 2, in the BILSTM-CRF neural network model structure, the bidirectional LSTM layers let the network use input features from both the forward and the backward direction. In addition, thanks to the CRF layer, the network can also use sentence-level label information, which gives the model better accuracy.
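To illustrate how a CRF layer uses sentence-level label information, the following is a minimal sketch of Viterbi decoding over per-token scores and label-to-label transition scores. The function name `viterbi` and the toy scores are assumptions for illustration; the sketch does not reproduce the paper's implementation.

```python
def viterbi(emissions, transitions):
    """Find the highest-scoring label sequence.

    emissions[t][j]   : score of label j at position t (e.g. from a BILSTM).
    transitions[i][j] : score of moving from label i to label j.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])   # best score of any path ending in each label
    backpointers = []            # best previous label for each position/label
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    best_last = max(range(n_labels), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]

# Toy example with two labels; switching labels is heavily penalized.
emissions = [[3.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
transitions = [[0.0, -5.0], [-5.0, 0.0]]
path, best = viterbi(emissions, transitions)
```

In this toy example, choosing the best label independently at each token would give labels 0, 1, 1, whereas decoding the whole sentence jointly keeps label 0 throughout because the transition scores penalize the switch.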


3. Techniques and Means

To help readers analyze the cultural characteristics of classic English literary works in the new media environment and to improve readers' cultural cognition of English literary works, a neural network structure model is constructed, and an iterative convolutional network based on the attention mechanism is used to construct a knowledge graph.

3.1. Neural Network Structure

3.1.1. Gated Recurrent Unit

The GRU is a variant of the long short-term memory network (LSTM). The LSTM contains a forget gate, an input gate, and an output gate, whereas the GRU network structure includes only an update gate and a reset gate: it merges the forget gate and the input gate of the LSTM structure into an update gate. The GRU structure therefore not only retains the advantages of LSTM but also simplifies the network structure and can extract features effectively. Its network structure is shown in Figure 3. The value of the update gate lies in [0, 1]: information is passed on when the value is close to 1 and ignored when the value is close to 0. The calculation of the reset gate follows the same principle as that of the update gate, but with a different weight matrix. At time t, the calculation of $z_t$ and $r_t$ is shown in formulas (1) and (2).

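First, the input $x_t$ at time t and the hidden state $h_{t-1}$ at the previous time are multiplied by the corresponding weights and passed through a sigmoid function. Once $z_t$ and $r_t$ have been computed, it is possible to calculate what needs to be remembered at time t. Second, the reset gate is used to determine how much of the hidden state at the previous time t−1 needs to be ignored at time t. Then, $r_t$, $h_{t-1}$, and $x_t$ are used to calculate the candidate hidden state through a tanh function. The calculation process is shown in formulas (3) and (4). In the standard GRU formulation (bias terms omitted), formulas (1)–(4) read
$$z_t = \sigma(W_z x_t + U_z h_{t-1}) \tag{1}$$
$$r_t = \sigma(W_r x_t + U_r h_{t-1}) \tag{2}$$
$$\tilde{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1})) \tag{3}$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{4}$$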
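A single GRU step can be sketched in plain Python as follows. The sketch assumes the standard formulation without bias terms, namely z = σ(Wz·x + Uz·h_prev), r = σ(Wr·x + Ur·h_prev), a candidate state tanh(W·x + U·(r ⊙ h_prev)), and the new state (1 − z) ⊙ h_prev + z ⊙ candidate. The parameter names in the dictionary `p` are hypothetical, and the paper's exact variant may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_step(x, h_prev, p):
    """One GRU time step; p holds the weight matrices Wz, Uz, Wr, Ur, W, U."""
    # Update gate: how much of the new candidate state to let through.
    z = [sigmoid(a + b)
         for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h_prev))]
    # Reset gate: how much of the previous state feeds the candidate.
    r = [sigmoid(a + b)
         for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h_prev))]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    # Candidate hidden state.
    h_cand = [math.tanh(a + b)
              for a, b in zip(matvec(p["W"], x), matvec(p["U"], gated))]
    # Interpolate between the previous state and the candidate.
    return [(1 - zi) * hp + zi * hc
            for zi, hp, hc in zip(z, h_prev, h_cand)]
```

With all weights set to zero, both gates equal 0.5 and the candidate state is zero, so the step simply halves the previous hidden state, which is a convenient sanity check.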

3.1.2. n-Gram Character Embedding

In general, popular character-based neural network models assume that larger units of text, such as words and n-grams, can be represented by the sequences of characters they are composed of. The characters of a span $c_{m,n}$ are passed to a function $f$ to obtain a vector representation $V_{m,n}$ of $c_{m,n}$, where $f$ is usually an RNN or CNN neural network.
$$V_{m,n} = f(c_m, c_{m+1}, \ldots, c_n) \tag{5}$$

In this paper, we do not rely solely on BiGRU. As shown in Figure 4, we capture the rich local information in character vectors by using an incremental cascaded n-gram model.
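One way to picture cascaded n-gram features is to collect, for every character position, the 1-gram up to the n-gram that begins there. The sketch below assumes this left-anchored windowing and an end-of-sequence padding symbol; both are illustrative choices, and `cascaded_ngrams` is a hypothetical helper rather than the paper's code. In a full model, each n-gram would be looked up in its own embedding table and the resulting vectors concatenated before entering the BiGRU.

```python
def cascaded_ngrams(chars, max_n, pad="</s>"):
    """For each position i, return the 1-gram up to the max_n-gram starting at i."""
    # Pad the end so that the last positions still have full-length n-grams.
    padded = list(chars) + [pad] * (max_n - 1)
    return [
        [tuple(padded[i:i + n]) for n in range(1, max_n + 1)]
        for i in range(len(chars))
    ]

features = cascaded_ngrams("cat", max_n=2)
for position, grams in enumerate(features):
    print(position, grams)
```

Because higher-order n-grams occur rarely, their embedding tables are sparse, which is consistent with the observation in the experiments that larger n does not help.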


3.2. Iterative Atrous Convolutional Networks Based on Attention Mechanism

3.2.1. Attention Mechanism

Iterative atrous convolutional network models can rapidly aggregate broad context information by dilating CNNs. However, they tend to overlook important local context information. To address this problem, we apply the attention mechanism to the iterative atrous convolutional network model. The similarity between the current token and all tokens in the sequence is computed by formula (6) to form a projection matrix, which is then normalized by the softmax function.
(6)
In formula (6), the output vector at position t is the final output of the network at that position, and sim represents the similarity between two vectors. We have
(7)
We take the calculation of the weight vector of the unit at position t as an example to illustrate the detailed calculation process. We have
(8)
Finally, the output of the current position and the output of the attention layer are concatenated as the output of this module.
(9)
Here, Wo is a weight matrix that maps the output to the class space. Finally, like regular named entity recognition tasks, a conditional random field layer is added for final sequence labeling.
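The attention step described above can be sketched as follows. The sketch assumes dot-product similarity for sim, which is only one common choice and may differ from the similarity function used in the paper; the function names are hypothetical. For each position it scores the current output against every output in the sequence, normalizes the scores with softmax, forms the weighted sum, and concatenates it with the current output. The final projection by the weight matrix and the CRF layer are omitted.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(outputs):
    """outputs: list of T vectors. Returns T vectors [h_t ; context_t]."""
    result = []
    for h_t in outputs:
        # Similarity of the current position to every position (assumed dot product).
        weights = softmax([dot(h_t, h_i) for h_i in outputs])
        # Attention output: weighted sum of all position outputs.
        context = [sum(w * h_i[k] for w, h_i in zip(weights, outputs))
                   for k in range(len(h_t))]
        # Concatenate the current output with the attention output.
        result.append(list(h_t) + context)
    return result

out = attend([[1.0, 0.0], [1.0, 0.0]])
```

The concatenated vectors have twice the dimension of the original outputs, so the projection matrix that follows must map from that doubled size to the number of labels.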

3.2.2. Neural Network Structure

Although iterative atrous convolutional networks can effectively summarize a wide range of contexts, compared with traditional BILSTM models they ignore word order features and important local context information in texts. The BILSTM model is composed of a forward LSTM combined with a backward LSTM. The natural temporal structure of the traditional BILSTM model can capture position information, but the iterative atrous convolutional network has no temporal structure. Therefore, the neural network model structure we adopt is an iterative atrous convolutional neural network structure based on the attention mechanism, as shown in Figure 5.


3.3. Knowledge Graph Construction

Taking “Hamlet,” a representative English classic literary text, as an example, and to further analyze the cultural characteristics of this work, we define the following steps for constructing the knowledge graph of classic English literary works. As shown in Figure 6, the process is divided into three parts, namely, corpus and text processing, automatic annotation, and knowledge supplementation.


3.3.1. Graph Database Stored Procedures

We use the method of named entity recognition for automatic entity extraction. Since there are still incorrect labels in the named entity recognition results of the “Hamlet” text, manual correction is required, and for the abbreviation of the person's name, it needs to be matched according to the context and replaced by the complete person's name. Since there are few dynasties and era entities in “Hamlet,” we mainly extract the entities of person names and place names. We use the named entity recognition labels “PER” and “LOC” as the person entity and the location entity in the knowledge graph, respectively.
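The step from tagged tokens to graph nodes can be sketched as below. The sketch assumes a BIO tagging scheme (“B-PER”, “I-PER”, “B-LOC”, “I-LOC”, “O”), which the text does not specify, and the function `extract_entities` and the example sentence are hypothetical.

```python
def extract_entities(tokens, tags, keep=("PER", "LOC")):
    """Group BIO-tagged tokens into (text, type) entities of the kept types."""
    entities, current, current_type = [], [], None

    def flush():
        # Store the entity collected so far, if any.
        if current and current_type in keep:
            entities.append((" ".join(current), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            flush()
            current, current_type = [], None
    flush()
    return entities

tokens = ["Hamlet", "is", "the", "prince", "of", "Denmark"]
tags = ["B-PER", "O", "O", "O", "O", "B-LOC"]
print(extract_entities(tokens, tags))
```

Entities recovered this way would still need the manual correction and name normalization described above before being written to the graph database.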

4. Results and Analysis

4.1. Word Segmentation and Part-of-Speech Tagging Experiments

4.1.1. Experimental Setup

Our neural network architecture is implemented using TensorFlow 1.3 and operates on labeled sentences. Table 1 shows the hyperparameters used, and all experiments were performed with the same hyperparameter settings. We use the Adagrad algorithm with mini-batches to optimize the model. The initial learning rate is η0 = 0.1 and the decay rate is ρ = 0.05; the learning rate decays with the number of iterations and is defined as follows, where t is the current number of iterations.
(10)
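As a rough illustration of how such a schedule behaves, the sketch below assumes the common inverse-time decay form, learning rate = η0 / (1 + ρ·t). This form is an assumption; the exact definition used in the paper may differ, and `decayed_learning_rate` is a hypothetical helper.

```python
def decayed_learning_rate(step, initial_rate=0.1, decay_rate=0.05):
    """Inverse-time decay (assumed form): eta_t = eta_0 / (1 + rho * t)."""
    return initial_rate / (1.0 + decay_rate * step)

# Print the assumed learning rate at a few iteration counts.
for step in (0, 10, 20, 40):
    print(step, decayed_learning_rate(step))
```

Under this assumed form, the learning rate falls to half its initial value after 20 iterations with ρ = 0.05.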
Table 1. Hyperparameter settings (a).
| Parameter | Value |
| --- | --- |
| Character embedding size | 60 |
| N-gram embedding size | 60 |
| Radical embedding size | 32 |
| Optimizer | Adagrad |
| Initial learning rate | 0.1 |
| Decay rate | 0.05 |
| Gradient clipping | 5.0 |
| Dropout rate | 0.5 |
| Batch size | 10 |
| Character font | Simsun |
| Character size | 32 ∗ 32 |
| Convolutional filter size | 5 ∗ 5 |
| Convolutional filter number | 34 |
| Max pooling size | 2 ∗ 2 |
| Fully connected size | 100 |

We use the pygame library to render each Chinese character as a 30 ∗ 30 text image, which serves as the input of the radical embedding. The CNN used in the radical embedding has 32 convolution kernels of size 5 ∗ 5, a max pooling layer of size 2 ∗ 2, and a fully connected layer of size 100.

4.1.2. Analysis of Experimental Results

We use the F1 value as the evaluation criterion of the model, with Seg denoting the evaluation result of word segmentation and Tag denoting the evaluation result of joint word segmentation and part-of-speech tagging.

Table 2 shows the experimental results using cascaded n-gram embeddings under different models. Cascaded n-grams give relatively large improvements over regular character embeddings (n = 1), which shows that recurrent neural networks alone cannot efficiently capture all local information and that cascaded n-grams help to preserve these local features. From the table, we can see that the improvement from using 2-Gram is the most obvious, and 4-Gram brings no further improvement on the dataset. The results suggest that, since high-order n-grams are sparse in the training data, their vector representations cannot be trained effectively. Therefore, a larger n is detrimental to the model. In addition, using higher-order n-grams increases the number of weights in the model, making both training and decoding very slow.

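Table 2. F1 values of n-gram embeddings on the development set.
| Models | n = 1 Seg | n = 1 Tag | n = 2 Seg | n = 2 Tag | n = 3 Seg | n = 3 Tag | n = 4 Seg | n = 4 Tag |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BILSTM | 79.88 | 75.46 | 87.33 | 84.97 | 87.04 | 84.71 | 87.10 | 84.96 |
| BILSTM-CRF | 81.09 | 78.72 | 87.91 | 86.30 | 87.73 | 86.17 | 87.32 | 85.74 |
| BIGRU-CRF | 81.03 | 78.44 | 87.94 | 86.35 | 87.74 | 86.06 | 87.48 | 86.06 |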

Table 3 shows the experimental results of augmenting radical embeddings and pretrained embeddings using the BIGRU-CRF model. From the table, we can see that using radical embedding and pretraining embedding on the basis of 1-Gram, the effect of the model is improved, but when using radical embedding and pretraining embedding on the basis of 2-Gram, the effect of enhancement is not particularly obvious. In particular, since the radical embedding uses the CNN network structure, using it to extract image features consumes more GPU resources.

Table 3. F1 values on the development set using radical embeddings and pretrained embeddings on BIGRU-CRF.
| Setting | Seg | Tag |
| --- | --- | --- |
| 1-Gram | 81.13 | 78.44 |
| +Radical | 81.52 | 78.72 |
| +Radical, word2vec | 81.97 | 79.53 |
| 2-Gram | 87.94 | 86.35 |
| +Radical | 87.96 | 86.38 |
| +Radical, word2vec | 87.99 | 86.42 |

Finally, we choose 2-Gram along with radical embeddings and pretrained character embeddings as the best settings for final evaluation. Table 4 shows the final test results of the different network model structures under the same settings. From the table, we can see that the BIGRU-CRF model achieves the highest F1-score of the three models for both Seg and Tag.

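Table 4. Evaluation results of different model structures on the test set.
| Models | Precision Seg | Precision Tag | Recall Seg | Recall Tag | F1-score Seg | F1-score Tag |
| --- | --- | --- | --- | --- | --- | --- |
| BILSTM | 80.41 | 78.76 | 87.88 | 86.08 | 83.98 | 82.26 |
| BILSTM-CRF | 83.03 | 81.73 | 87.77 | 86.40 | 85.33 | 84.00 |
| BIGRU-CRF | 82.96 | 81.82 | 88.06 | 86.86 | 85.43 | 84.27 |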
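The F1-score is the harmonic mean of precision and recall. As a quick check, the short sketch below recomputes two of the Seg F1-scores from the precision and recall values reported in Table 4; the helper name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Seg precision and recall for BILSTM and BIGRU-CRF, as reported in Table 4.
print(round(f1_score(80.41, 87.88), 2))
print(round(f1_score(82.96, 88.06), 2))
```

Because the harmonic mean is pulled toward the smaller of the two values, a model with slightly lower precision can still reach a higher F1-score when its recall is higher, as is the case for BIGRU-CRF relative to BILSTM-CRF on Seg.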

4.2. Attention-Based Iterative Atrous Convolutional Network Experiments

4.2.1. Experimental Setup

The specific hyperparameter settings of our model are listed in Table 5. For the iterative dilated convolution, the word embedding size is 64, the filter width is 3, the number of filters is 256, the dilation width is [1, 1, 2], and the number of blocks in the iterative atrous convolutional network model is 4. For training, the learning rate is 1e-3, the dropout rate is set to 0.5, and gradients are clipped at 5.

Table 5. Hyperparameter settings (b).
| Parameter | Value |
| --- | --- |
| Char embedding dim | 64 |
| Filter width | 3 |
| Number of filters | 256 |
| Dilation | [1, 2] |
| Block number | 4 |
| Learning rate | 1e-3 |
| Dropout rate | 0.5 |
| Gradient clipping | 5 |
| Batch size | 64 |
| Epoch | 50 |
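To see why stacked dilated convolutions aggregate broad context quickly, the sketch below computes the receptive field of a stack of stride-1 convolutions: each layer with filter width k and dilation d widens the field by (k − 1)·d tokens. It assumes that every block applies the listed dilations in order and that the blocks are applied one after another; `receptive_field` is a hypothetical helper.

```python
def receptive_field(filter_width, dilations, num_blocks):
    """Number of input tokens seen by one output position (stride 1)."""
    field = 1
    for _ in range(num_blocks):
        for dilation in dilations:
            # Each dilated layer extends the field by (k - 1) * d tokens.
            field += (filter_width - 1) * dilation
    return field

# Filter width 3 and 4 blocks, for the two dilation lists mentioned above.
print(receptive_field(3, [1, 2], 4))
print(receptive_field(3, [1, 1, 2], 4))
```

With filter width 3 and four blocks, the dilation list [1, 2] covers 25 tokens per output position and [1, 1, 2] covers 33, far more than the same number of undilated layers would reach.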

4.2.2. Experimental Results

We compared the iterative atrous convolutional network model based on the attention mechanism with the basic iterative atrous convolutional network model (IDCNN) and the BILSTM-CRF model.

As shown in Table 6, our attention-based iterative atrous convolutional network model outperforms the other two models in F1 value. Compared with the IDCNN model, the F1 value of our model is higher by 1.45 percentage points, which shows that adding an attention mechanism lets the model fuse information and focus on local contextual information, thereby improving performance. Compared with the BILSTM-CRF model, the F1 value of our model is higher by 0.92 percentage points, showing that the attention-based iterative atrous convolutional network model is effective. However, its precision is lower than that of the BILSTM-CRF model, indicating that the CNN-based network model is still slightly behind the LSTM-based network model in terms of precision in processing text.

Table 6. Evaluation results of different models on the test set.
| Models | Precision | Recall | F1-score |
| --- | --- | --- | --- |
| BILSTM-CRF | 69.13 | 59.31 | 63.85 |
| IDCNN | 65.13 | 61.62 | 63.32 |
| Attention-IDCNN | 68.47 | 61.45 | 64.77 |

4.3. Design and Implementation of the Knowledge Graph Visual Query System

Usually, a system adopts one of two basic architectures, C/S or B/S. To support multiple platforms, we use the B/S architecture, that is, the browser/server mode. The front-end uses HTML + CSS + D3.js to create pages and the jQuery framework to achieve interactivity, and the back-end uses the Flask framework to implement logic control and the corresponding functions. The framework adopted by the “Hamlet” knowledge graph visual query system is shown in Figure 7; it consists of the query layer, the logic layer, and the presentation layer.


We take the relationship entities associated with a character entity as an example to describe the implementation of the search function; the functions for other entities associated with the character entity work in the same way. The back-end constructs a Cypher query statement from the person's name sent by the front-end, matches the relationships directly related to the searched person in the Neo4j graph database, and returns the person's name and a collection that contains the names of all of that character's relationships. To convert the returned data into a form that the presentation layer can use directly, the collection is serialized into a list of dictionaries and finally returned to the front end as JSON. Taking the query “Hamlet” as an example, the JSON form of the character entity and its related entities is shown in Table 7.

Table 7. JSON form example of person entity and relation entity.
{"name": "Hamlet",
 "cast": [{"relation": "Uncle", "node": "claudius"},
  {"relation": "Queen", "node": "trudeau"},
  {"relation": "George", "node": "chancellor polonius"},
  {"relation": "close friend", "node": "horash"},
  {"relation": "prince of Norway", "node": "fortinbras"}]
}
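The back-end step described above, building a Cypher query and serializing the result, might look roughly like the following sketch. The node label `Person`, the `name` property, and both function names are assumptions made for illustration; the paper does not give its schema or code. In a Flask view, the query would typically be run through the Neo4j Python driver with the name passed as a query parameter, and the resulting rows handed to the serializer.

```python
import json

def build_relation_query():
    """Parameterized Cypher query for relations directly attached to a person.

    The label `Person` and property `name` are hypothetical schema choices.
    """
    return (
        "MATCH (p:Person {name: $name})-[r]->(n) "
        "RETURN type(r) AS relation, n.name AS node"
    )

def serialize(name, rows):
    """Turn (relation, node) rows into the JSON shape shown in Table 7."""
    cast = [{"relation": relation, "node": node} for relation, node in rows]
    return json.dumps({"name": name, "cast": cast}, ensure_ascii=False)

# Rows as they might come back from the database for the query "Hamlet".
rows = [("Uncle", "claudius"), ("prince of Norway", "fortinbras")]
payload = serialize("Hamlet", rows)
print(payload)
```

Passing the name as the `$name` parameter rather than concatenating it into the query string avoids injection problems when the name comes directly from user input.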

The front-end display platform sends a GET request to the back-end through the jQuery framework according to the character entered by the user and parses the returned JSON data. Since there may be a one-to-many relationship between the character entity and the position entities, a traversal is required that sequentially concatenates the HTML tags of the position entities associated with the character entity and their relationships. According to the returned JSON data, the D3.js visualization library is then used to draw a force-directed graph: the entities and their relationships are drawn as nodes and lines, the text labels of the nodes and lines are drawn, and finally a listener for drag events is set so that the other nodes follow when any node is dragged.
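The traversal that turns the returned JSON into drawable nodes and lines can be sketched as below. In the described system this logic runs in the browser; it is written in Python here only to keep the examples in one language. The output shape (a `nodes` list with `id` fields and a `links` list with `source` and `target` fields) follows the layout commonly fed to D3 force simulations, and the function name and `label` field are hypothetical.

```python
def to_force_graph(payload):
    """Convert {"name": ..., "cast": [...]} into node and link lists."""
    center = payload["name"]
    nodes = [{"id": center}]
    links = []
    seen = {center}
    for item in payload["cast"]:          # one-to-many: traverse every relation
        if item["node"] not in seen:      # add each related entity only once
            seen.add(item["node"])
            nodes.append({"id": item["node"]})
        links.append({
            "source": center,
            "target": item["node"],
            "label": item["relation"],    # text drawn along the line
        })
    return {"nodes": nodes, "links": links}

graph = to_force_graph({
    "name": "Hamlet",
    "cast": [
        {"relation": "Uncle", "node": "claudius"},
        {"relation": "prince of Norway", "node": "fortinbras"},
    ],
})
```

Keeping nodes unique while allowing several links to the same node lets one character appear once in the drawing even when it is connected through more than one relationship.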

5. Conclusion

In short, the emergence of new media has played an important role in improving people's aesthetic abilities while satisfying the spiritual needs of the audience. The form in which literary works are disseminated has changed significantly in the new media environment, which has greatly enhanced their dissemination, and optimizing this form further is an urgent task. Excellent literary works play a prominent role in improving the humanistic spirit of the people and in creating a good social value system. Natural language processing and knowledge graph technology can improve cultural cognition and thereby promote the dissemination and development of classic English literature, and they have become a necessary means of disseminating it. This paper uses natural language processing and knowledge graph technology to improve readers' cultural cognition of classic English literary works, so as to promote the dissemination and development of these works, and it can serve as a reference for their dissemination.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

This work was supported by the Zhengzhou Vocational College of Industrial Safety.

Data Availability

The labeled data set used to support the findings of this study is available from the author upon request.
