Ginger Product Price Information Based on Semantic Heterogeneity of Multisource Network Information
Abstract
To improve information services for ginger product prices and to promote the transmission and exchange of ginger price data, this study analyzes the agricultural product price information published on the Internet through a network survey. Agricultural products are roughly divided into nine categories. Following the hierarchical theory of information processing, three types of semantic heterogeneity with a hierarchical, progressive relationship are proposed: schema heterogeneity, context data heterogeneity, and individual anomaly data heterogeneity. The results show that the main attributes are present in almost 100% of the network information sources and that more than 80% of the websites differentiate agricultural products by type, whereas attributes such as specification and origin, which also affect the price of agricultural products, are provided less often. The survey captures the attribute characteristics of ginger product price information in multisource networks and, combined with the classification characteristics of existing semantic heterogeneity, provides strong support for constructing a classification system for the semantic heterogeneity of ginger product price information.
1. Introduction
With the development of the economy, science, and technology, the Internet has become closely linked with modern social life. Information on the Internet is increasing rapidly and forms a huge information base composed of distributed and heterogeneous sources: structured information such as databases, semistructured information such as HTML and XML, and unstructured information such as text and multimedia. Because information system platforms and data formats differ, interoperation among information sources becomes complicated and difficult. The heterogeneity of information sources is the main bottleneck restricting information sharing, and how to shield and eliminate its influence is an urgent problem in information integration. The heterogeneity of information sources can be divided into four levels: system heterogeneity, syntactic heterogeneity, structural heterogeneity, and semantic heterogeneity, among which semantic heterogeneity is the most difficult to eliminate. China is a large agricultural country and a developing country with a population of 1.4 billion, but its arable land area per capita is less than 40% of the world average, and low-yield farmland accounts for more than 70% of the total arable land area. The continuous and rigid growth of China's population requires future grain production capacity to grow by an average of 2.0% per year, and food security is becoming an increasingly prominent problem because of the continuous decrease in grain-sown area, the growing shortage of water resources, and the increasing frequency of disasters caused by global climate change. Therefore, it is imperative to eliminate the semantic heterogeneity of multisource network information so that agriculture can develop in a fast, healthy, and sustainable way [1].
Taking the price information of agricultural products published on the network as the research object, we investigate from a cognitive perspective how heterogeneity in this information arises, and we collect the release method, type, update frequency, and attribute values of network price information. Borrowing the design idea of information life cycle theory, we trace back from the heterogeneous appearance to the information acquisition scheme in order to explore the causes and external features of different kinds of heterogeneity, and we form a collection of features of the different semantic heterogeneity phenomena.
Semantic heterogeneity is caused by different types of semantic conflicts. A semantic conflict is the semantic inconsistency that arises from differences in description, structure, and content when the same real-world object is described. Because research directions and research objects differ, there is at present no uniform classification of semantic heterogeneity. Research abroad on semantic heterogeneity has moved from structured information sources based on relational databases or object-oriented data models to unstructured and semistructured information sources. For structured sources, semantic heterogeneity is divided into two categories: schema heterogeneity and data heterogeneity. For semistructured XML information sources, researchers classify semantic heterogeneity into four categories: structural conflict, domain conflict, data conflict, and linguistic conflict. The first classification mainly targets semantic heterogeneity among structured databases and can serve as a reference for information integration within or between enterprises on a smaller scale. The second classification is built around the XML specification and offers important guidance for integrating semistructured information sources published as XML; however, it depends heavily on the XML format, whereas today's multisource networks present data in an ever wider range of formats, so it cannot eliminate all semantic heterogeneity.
At present, China's agricultural Internet is developing rapidly, and the number of agricultural websites has exceeded 30,000. These websites contain a rich variety of columns; common ones include agricultural information, newsletters, policies and regulations, market dynamics, prices, information exchange, and forums. Agricultural product price information bears directly on farmers' income and is the information farmers most want to obtain. However, obtaining the required price information from this huge number of agricultural websites means filtering out a great deal of interfering information, and the results returned by traditional search engines are not ideal and may even be irrelevant to the query. Therefore, to obtain the more valuable price information effectively, it is necessary to adopt a professional agricultural search engine, to identify and eliminate the semantic heterogeneity of agricultural product price information, and to improve the efficiency of acquiring that information. The purpose of this network survey is to confirm the semantic heterogeneity of agricultural product price information that exists in current multisource network information and to lay the foundation for identifying and eliminating it.
Integrating heterogeneous databases requires overcoming the semantic heterogeneity caused by differences in underlying data structures or data representations. Ma et al. analyzed model-based database semantic conflicts and proposed a framework for resolving schema integration [2]. Shi and C adopted an object-oriented database programming language to solve domain conflicts and structure conflicts [3]. Yu and D gave a rule inference model for the problem of entity heterogeneity, which establishes equivalence relationships between attribute values through a set of rules [4]. However, different instances of an attribute follow different rules, so the semantic heterogeneity of different instances cannot be unified, and this method may lead to the loss of basic information. Pang et al. proposed an automatic strategy that establishes global identifiers by defining different types of equivalence relations and uses neural network-based global identifiers to determine candidate semantic integration processes [5]. Yun et al. established a probabilistic decision model that eliminates semantic heterogeneity through entity matching. Because entity descriptors vary from database to database, and because human errors occur in data collection and data representation, an exact match of entities is not realistic; this model therefore uses probability theory to represent uncertain data and solves the problem of entity heterogeneity by minimizing the entity matching cost [6]. In China, Professor Liu, Y, and others have studied several issues in Web data management, focusing on Web query, semistructured data models, and Web information integration methods. They proposed an XML-based Web data management system framework, which first transforms each heterogeneous information source into XML in a data center and then manages the XML data in that data center [7].
The results of this study show that the main attributes are present in almost 100% of the network information sources and that more than 80% of the websites differentiate agricultural products by type, whereas attributes such as specification and origin, which also affect the price of agricultural products, are provided less often. The survey captures the attribute characteristics of ginger product price information in multisource networks and, combined with the classification characteristics of existing semantic heterogeneity, provides strong support for constructing a classification system for the semantic heterogeneity of ginger product price information.
2. Network Semantic Heterogeneity of Ginger Product Price Information
2.1. Research Purpose
At present, China's agricultural Internet is developing rapidly, and the number of agricultural websites has exceeded 30,000. These websites contain a rich variety of columns; common ones include agricultural information, news bulletins, policies and regulations, market dynamics, price quotes, information exchange, and forums. The price information of agricultural products bears directly on farmers' income and is the information farmers most want. Ginger is a landmark crop whose price fluctuates widely because of the imbalance between supply and demand in the development of its industry. However, obtaining the required price information from the vast number of agricultural websites means filtering out many distractions, and the ginger price results returned by traditional search engines are not ideal and may even be irrelevant to the query. Therefore, to obtain more valuable ginger product price information effectively, it is necessary to use professional agricultural search engines, to recognize and eliminate the semantic heterogeneity of ginger product price information, and to improve the efficiency of acquiring it. The purpose of this network survey is to confirm the semantic heterogeneity of ginger product price information in current multisource network information and to lay the foundation for identifying and eliminating that heterogeneity.
2.2. Survey Object
This research uses a network survey that mainly targets the price information of agricultural products on the Internet, so the research objects selected from the huge number of agricultural websites should comply with the following principles:
First, the ginger product price information published by the agricultural website should be credible. Many existing agricultural websites suffer from repeated information, untimely updates, and even a lack of pertinence and accuracy. For ginger price information to have real guiding value, timeliness and accuracy are important indicators, so they must be considered when choosing research objects.
Second, the selection should reflect the main characteristics of agricultural product prices and be conducive to the analysis of the survey data. The selected websites should describe agricultural product prices with reasonably complete information, so that existing semantic heterogeneity can be identified and excessive invalid or interfering survey data can be avoided.
Based on these two principles, the research objects come from two sources. The first is the set of government-level agricultural websites that publish ginger product price information, obtained through “Nongsoo”. The second is the set of provincial agricultural wholesale market websites provided by China Agricultural Information Network, which is sponsored by the Information Center of the Ministry of Agriculture. This source is authoritative and covers the price information of agricultural products in almost all provinces of the country [8, 9].
2.3. Survey Content
The survey extracts the ginger product price information published daily on the websites selected above, according to information quality and spatial and temporal requirements, and compiles the following statistics on the published information: (1) the names of the columns that carry ginger product price information on each agricultural website and the way the price information is displayed, in order to understand the structural heterogeneity present in the existing network; (2) the product types covered by the agricultural price information in the existing network, in order to observe schema heterogeneity in variety names; (3) the update time of ginger product price information, in order to determine the timeliness of the selected objects and lay a foundation for obtaining price information more efficiently; and (4) the format and content of ginger product price information, in order to determine its necessary attributes and observe data heterogeneity.
2.4. Analysis of Research Results
From 200 agricultural websites, this research drew a sample of 120 websites that publish ginger product price information and analyzed them. The sample involves 30 provincial government agricultural websites, agricultural wholesale market websites, and other related agricultural price information networks.
2.4.1. Release of Ginger Product Price Information
As the number of agricultural websites increases, the technical means of publishing price information are also becoming more varied. Most of the sampled websites display the information as semistructured data combined with CSS: about 76% use tables and about 19% use text descriptions (Table 1). In addition, some websites, represented by China Agricultural Information Network, encapsulate and publish agricultural price information in unstructured forms such as Flash.
Item | Tabular form | Text form | Other |
---|---|---|---|
Number of websites | 91 | 23 | 6 |
Percentage (%) | 75.83 | 19.17 | 5.00 |
Thus, if the semantic heterogeneity of the semistructured ginger product price information in the network can be solved, most of the accurate price information can be obtained. This research therefore focuses mainly on content released in tabular form.
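The percentages in Table 1 follow directly from the website counts. The short Python sketch below recomputes them; the dictionary keys are illustrative labels rather than names used in the survey.

```python
# Website counts per release form, taken from Table 1.
counts = {"tabular": 91, "text": 23, "other": 6}


def shares(counts):
    """Return each category's share of the total, in percent, to two decimals."""
    total = sum(counts.values())
    return {form: round(100 * n / total, 2) for form, n in counts.items()}


if __name__ == "__main__":
    for form, pct in shares(counts).items():
        print(f"{form}: {pct:.2f}%")
```

The counts sum to the 120 sampled websites, so the three shares account for the whole sample.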
2.4.2. Categories Covered by Agricultural Product Price Information
China is a large agricultural country with a long history. Its agricultural and rural economy has developed significantly since the reform and opening up: agricultural products have made a historic leap from long-term shortage to overall balance, with surpluses in bumper years, and China has even become the main market for global agricultural exports. China's agricultural products therefore span a wide variety. According to the results of the sample survey, the agricultural products with price information published in the network can be roughly divided into nine categories: vegetables, fruits, livestock and poultry eggs, aquatic products, grain and oil, medicinal herbs, tea, meat, and nonstaple food condiments.
More than 90% of the sampled websites provide price information for vegetables, and nearly 80% provide fruit price information, which shows that these two categories attract the most attention in daily life. Next come livestock and poultry eggs, aquatic products, grain and oil, meat, and nonstaple food, to which people with different living conditions and living environments pay different degrees of attention. Medicinal materials and tea account for a relatively low proportion. On the one hand, most of the sampled websites are comprehensive or wholesale market websites; on the other hand, these products have obvious regional characteristics, so fewer agricultural websites publish their price information (see Figure 1). Because there are so many types of agricultural products and information providers are spread all over the country, agricultural products are expressed in the network in a variety of ways, and semantic heterogeneities such as conflicts between identical names arise.

2.4.3. Update Frequency of Ginger Product Price Information
Agricultural products are an indispensable part of life, and changes in their prices directly affect the daily life of most residents. The price of ginger in particular fluctuates frequently and affects the real income and production enthusiasm of farmers, so it is very important to grasp the price information of agricultural products in all regions in a timely manner. Because different site managers update agricultural product prices at different frequencies, the differences in update frequency among the sampled websites were compiled after a long period of observation, as shown in Figure 2 [10].

More than half of the sampled websites update their data daily, but nearly 25% have not updated their data for more than a month, or even for a year. To capture the trend of agricultural prices over time more accurately, agricultural websites that have not been updated for a long time are excluded from the classification of semantic heterogeneity of ginger product price information in the multisource network, and the analysis mainly observes the network information sources that are updated daily. If the existing semantic heterogeneity can be eliminated and a unified and timely distribution channel can be established, it will play an important role in helping farmers and agricultural workers grasp real-time and effective price information for agricultural products.
2.4.4. Attribute Values of Ginger Product Price Information
In the sampling survey, the published ginger product price information differs considerably from site to site because developers understand it differently. For agricultural price information to be useful, however, it must include the following main attributes: the name of the agricultural product, the price (lowest, highest, and average), the unit, the date of release, and the place of transaction. According to the statistics, it may also include attributes such as type, specification, market volume, and transaction volume. The main attributes are present in almost 100% of the network information sources, and more than 80% of the sites categorize produce by type, whereas attributes such as specification and origin, which also affect the price of agricultural products, are provided less often, as shown in Figure 3. Because standards are inconsistent and most of the information is entered manually by information staff, there is a great deal of semantic heterogeneity and many anomalies in the data. Based on the statistical results, ginger product price information in the multisource network can therefore be described by a nine-tuple of attributes: agricultural product variety, price, unit, release time, wholesale market, province, specification, type, and network source.
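The nine-tuple can be written down as a small record type. The following Python sketch is only an illustration under stated assumptions: the field names, the choice of optional fields, and the sample values are all hypothetical and are not taken from the surveyed websites.

```python
from dataclasses import astuple, dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class PriceRecord:
    """One price observation described by the nine attributes of the survey."""
    variety: str                  # agricultural product variety
    price: float                  # quoted price
    unit: str                     # pricing unit
    release_time: str             # release date, kept as text here
    wholesale_market: str         # market where the price was recorded
    province: str                 # province attribute
    specification: Optional[str]  # provided less often, so it may be missing
    product_type: Optional[str]   # product type, may be missing
    network_source: str           # website the record was collected from


def missing_attributes(record):
    """List the names of the attributes a source did not provide."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Invented sample values, for illustration only.
sample = PriceRecord("ginger", 8.5, "yuan/kg", "2022-01-01", "Market A",
                     "Province A", None, "vegetable", "example-source")
```

Calling `missing_attributes(sample)` reports which of the less frequently provided attributes are absent from a given source, which is the kind of check that precedes detection of absence conflicts.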

Through the analysis of the survey results, typical network information sources are screened out and the attributes and characteristics of ginger product price information in multisource networks are captured. Combined with the existing classification features of semantic heterogeneity, this provides strong support for constructing a classification system for the semantic heterogeneity of ginger product price information [11].
3. Classification of Semantic Heterogeneity of Ginger Product Price Information on Multisource Network
Combining the attribute results for ginger product price information on the multisource network, the XML-based classification of semantic heterogeneity, and the hierarchical relationship of semantic heterogeneity, and through cluster analysis of 11 information sources among the survey objects, the semantic heterogeneity of ginger product price information is divided into three levels: schema (pattern) semantic heterogeneity, context data semantic heterogeneity, and individual anomaly data semantic heterogeneity. Schema semantic heterogeneity is mainly the difference that arises when the standard attribute description of ginger product price information is constructed; it mainly includes absence conflict, aggregation conflict, ancestor-descendant conflict, and type conflict. Context data semantic heterogeneity concerns conflicts between attribute values; it mainly includes naming conflict, data value conflict, data representation conflict, data unit domain conflict, and data precision conflict. Individual anomaly data semantic heterogeneity concerns the similarity and accuracy of the information describing individual data items; it mainly includes individual description conflict and data credibility conflict. These three types of semantic heterogeneity are hierarchically related and need to be eliminated layer by layer to obtain knowledge with real use value, so that the user gets correct and meaningful results.
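The three-level classification can be held in a simple lookup structure. The Python sketch below is a minimal illustration; the level and conflict labels are shortened forms of the names used in this section, and the helper functions are hypothetical.

```python
# Levels in the order in which they are eliminated, each with its conflict types.
TAXONOMY = {
    "schema": ("absence", "aggregation", "ancestor-descendant", "type"),
    "context data": ("naming", "data value", "data representation",
                     "data unit domain", "data precision"),
    "individual anomaly data": ("individual description", "data credibility"),
}


def level_of(conflict):
    """Return the level that a conflict type belongs to."""
    for level, conflicts in TAXONOMY.items():
        if conflict in conflicts:
            return level
    raise KeyError(f"unknown conflict type: {conflict}")


def elimination_order(conflicts):
    """Sort detected conflicts so that lower levels are handled first."""
    levels = list(TAXONOMY)
    return sorted(conflicts, key=lambda c: levels.index(level_of(c)))
```

For example, `elimination_order(["data credibility", "naming", "type"])` places the schema-level type conflict first and the individual-level credibility conflict last, mirroring the layer-by-layer elimination described in this section.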
Based on the defined semantic meta-paths, a heterogeneous network can be decomposed into homogeneous subnetworks under multiple different meta-paths, and the node similarity of each subnetwork is the meta-path-based node proximity. The node representation of each homogeneous subnetwork adopts the semisupervised stacked denoising autoencoder (SDAEs) learning framework for training and learning, and the node representation of different subnetworks is deeply fused by the stacked fully connected autoencoder to obtain the final representation of the node.
3.1. Node Representation Learning under Different Semantic Meta Paths
A stacked autoencoder is a multilayer deep neural network trained layer by layer; it learns data features of different granularities at successive layers and learns highly complex nonlinear features at the higher layers. To improve the robustness of the subnetwork representation, stacked denoising autoencoders (SDAEs) are adopted: the inputs to each layer are randomly corrupted by setting some input neurons to 0 with a certain probability.
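The corruption step of a denoising autoencoder can be sketched in a few lines of Python. This is a minimal illustration of masking noise only, not the training procedure used in this study; the function name and the `drop_prob` parameter are assumptions.

```python
import random


def mask_inputs(vector, drop_prob, rng=None):
    """Return a corrupted copy of `vector` in which each entry is set to 0
    independently with probability `drop_prob` (masking noise)."""
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must lie in [0, 1]")
    rng = rng or random.Random()
    return [0.0 if rng.random() < drop_prob else x for x in vector]


if __name__ == "__main__":
    clean = [0.2, 0.7, 0.1, 0.9]
    # The autoencoder receives the corrupted vector as input and is trained
    # to reconstruct the clean vector.
    print(mask_inputs(clean, drop_prob=0.3, rng=random.Random(0)))
```

Because the reconstruction target is the uncorrupted input, the learned representation cannot rely on any single input neuron, which is the source of the robustness mentioned above.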
Here, the proximity term denotes the proximity between node i and node j based on the meta-path p, and a(p) is a hyperparameter that serves as the weight coefficient of the meta-path p. In the experimental phase, grid search is used to select each a(p) from {0.1, 0.2, 0.3, 0.4, 0.5}, and the optimal hyperparameters are obtained by 10-fold cross-validation on the labeled dataset for the subsequent prediction task.
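The selection of a(p) by grid search with 10-fold cross-validation can be sketched as follows. This Python example covers only the search procedure; how the weights enter the proximity computation is left to a hypothetical `score_fold` callback, and the contiguous fold split is an assumption made for simplicity.

```python
from itertools import product
from statistics import mean

GRID = (0.1, 0.2, 0.3, 0.4, 0.5)  # candidate values for each a(p)


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def grid_search(meta_paths, n_samples, score_fold, k=10):
    """Return (best_weights, best_mean_score) over all combinations of a(p).

    `score_fold(weights, train_idx, test_idx)` is a hypothetical callback
    that trains on one fold and returns a validation score (higher is better).
    """
    best_weights, best_score = None, float("-inf")
    for combo in product(GRID, repeat=len(meta_paths)):
        weights = dict(zip(meta_paths, combo))
        avg = mean(score_fold(weights, train, test)
                   for train, test in k_fold_indices(n_samples, k))
        if avg > best_score:
            best_weights, best_score = weights, avg
    return best_weights, best_score
```

With m meta-paths the grid contains 5^m combinations, each evaluated on 10 folds, so the cost grows quickly with the number of meta-paths.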
4. Analysis
4.1. Research Status
This study proposes a classification of the semantic heterogeneity of ginger product price information on multisource networks, which can be used to classify and identify the massive amount of ginger product price information in the network. Using a three-layer structure, semantic heterogeneity is divided into schema semantic heterogeneity, context data semantic heterogeneity, and individual anomaly data semantic heterogeneity. The information obtained by eliminating semantic heterogeneity layer by layer is the unambiguous, error-free, and valuable information that users require.
4.2. Further Research Direction
The proportions of the different kinds of semantic heterogeneity in ginger product price information in the multisource network can be obtained through data statistics, as shown in Figure 4. Combined with the analysis of the current research status, it can be seen that existing approaches to eliminating the semantic heterogeneity of ginger product price information solve only one particular type of semantic heterogeneity, that no overall architecture has been put forward, and that few services address users' personalized needs. These aspects deserve further study in the coming years.

4.2.1. Improve the Semantic Heterogeneity Elimination System for Ginger Product Price Information
According to the classification of semantic heterogeneity of ginger product price information, the problem of eliminating semantic heterogeneity is solved step by step: (1) Improve the domain ontology to resolve naming conflicts such as those in agricultural product variety names and market names, so as to resolve schema semantic heterogeneity. (2) Eliminate the semantic heterogeneity of context data with context arbitrator technology, so that data are unambiguous, data representation is consistent, data units are unified, and data precision is clear. (3) Identify and eliminate individual anomaly information on the basis of heterogeneous data monitoring and data deduplication technology. When improving the system, its operating efficiency must also be considered so that results are obtained as quickly and efficiently as possible [12].
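The three steps above can be illustrated with a small Python sketch. It is a simplification under loudly stated assumptions: a plain synonym table stands in for the domain ontology, a fixed conversion table stands in for the context arbitrator, exact-match deduplication stands in for anomaly detection, and all names, keys, and values are hypothetical.

```python
# Hypothetical synonym table standing in for a domain ontology (step 1).
NAME_SYNONYMS = {"fresh ginger": "ginger", "ginger root": "ginger"}

# Hypothetical conversion factors to yuan per kilogram (step 2).
# One jin equals 500 g, so a per-jin price is doubled.
UNIT_TO_YUAN_PER_KG = {"yuan/kg": 1.0, "yuan/500g": 2.0, "yuan/jin": 2.0}


def normalize(record):
    """Map the variety name to a canonical form and unify unit and precision."""
    name = record["variety"].strip().lower()
    name = NAME_SYNONYMS.get(name, name)
    factor = UNIT_TO_YUAN_PER_KG[record["unit"]]
    price = round(record["price"] * factor, 2)
    return {**record, "variety": name, "price": price, "unit": "yuan/kg"}


def deduplicate(records):
    """Drop records that repeat the same observation (step 3, simplified)."""
    seen, unique = set(), []
    for r in records:
        key = (r["variety"], r["market"], r["date"], r["price"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def eliminate(records):
    """Apply the layers in order: normalize first, then deduplicate."""
    return deduplicate([normalize(r) for r in records])
```

The order matters in this sketch: two records that quote the same price in different units are recognized as duplicates only after normalization, which is one reason the layers are eliminated in sequence.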
4.2.2. Realize Personalized Service
The ultimate goal of the system for eliminating the semantic heterogeneity of ginger product price information in multisource networks is a ginger price display system that is genuinely useful to farmers and agricultural workers. Because users have different requirements for data usage and queries, the system needs to provide personalized services in addition to rich query services. Therefore, establishing users' personal data spaces and pushing personalized data in real time have become important directions for future research.
5. Conclusion
Based on the attribute results for ginger product price information in multisource networks, XML technology is used to integrate agricultural multisource data, giving the data a concrete, modeled form so that data transmission and integration problems can be solved effectively. The results show that the main attributes are present in almost 100% of the network information sources and that more than 80% of the websites differentiate agricultural products by type, whereas attributes such as specification and origin, which also affect the price of agricultural products, are provided less often. The survey captures the attribute characteristics of ginger product price information in multisource networks and, combined with the classification characteristics of existing semantic heterogeneity, provides strong support for constructing a classification system for the semantic heterogeneity of ginger product price information. Obtaining relevant information through an agricultural information platform greatly improves the utilization of agricultural data; it also has a significant effect on the integration of agricultural multisource heterogeneous data and accelerates the pace of modern agricultural development. Future research will focus on eliminating the three layers of semantic heterogeneity layer by layer and on studying the semantic conflicts that remain unresolved. Context plays a central role in information processing. By grasping the situation of ginger product price information in multisource networks, an overall framework for semantic information integration that fits the characteristics of its semantic heterogeneity can be established, so as to eliminate that heterogeneity.
Conflicts of Interest
The authors declare no conflicts of interest.
Open Research
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.