Volume 33, Issue 7 e4484
RESEARCH ARTICLE

Clustering and parallel indexing of big IoT data in the fog-cloud computing level

Karima Khettabi

Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria

Zineddine Kouahla

Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria

Brahim Farou

Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria

Hamid Seridi

Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria

Corresponding Author

Mohamed Amine Ferrag

Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria

Correspondence

Mohamed Amine Ferrag, Labstic Laboratory, Department of Computer Science, Guelma University, Guelma 24000, Algeria.

Email: [email protected]

First published: 07 March 2022
Citations: 1

Abstract

In recent years, the large amount of heterogeneous data generated by Internet of Things (IoT) sensors and devices has made recording and searching that data much more difficult, and most state-of-the-art methods have failed to meet the new IoT requirements. This article proposes a new, efficient method that simplifies data indexing and improves the quality and speed of similarity query search in the IoT environment. In this method, the fog layer is divided into two levels. In the clustering fog level, the incremental density-based spatial clustering of applications with noise (DBSCAN) algorithm is used to separate the collected data into clusters in order to minimize data overlap during the parallel construction of the indexes. Parallelism is also used in the indexing fog level to speed up the similarity-based search process. The data in each cluster are indexed using our proposed structure, called the B3CF-tree (binary tree based on containers at the cloud-clusters fog computing level). The objects in the leaf nodes of the B3CF-trees are, finally, stored in the cloud. When this approach is applied to multiple datasets, the retrieval time of the similarity search is significantly reduced. The experimental results show that the combination of DBSCAN clustering and parallel indexing makes the B3CF-trees outperform the latest real data indexing methods. For example, in terms of quality, the B3CF-tree has the smallest number of nodes and leaf nodes. In addition, the use of parallelism during kNN search significantly reduces the retrieval time of the similarity query search.
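
The parallel kNN step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the internals of the B3CF-tree, so each cluster is searched here with a plain brute-force scan as a stand-in for a per-cluster index, and the clusters are assumed to have been produced already (for example by DBSCAN). The names `knn_in_cluster`, `parallel_knn`, and the sample data are hypothetical.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def knn_in_cluster(points, query, k):
    """Return the k nearest (distance, point) pairs within one cluster.

    Assumption: a brute-force scan stands in for a per-cluster index search.
    """
    return heapq.nsmallest(k, ((math.dist(p, query), p) for p in points))


def parallel_knn(clusters, query, k, max_workers=4):
    """Search every cluster concurrently, then merge into a global top-k."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One search task per cluster; each returns at most k candidates.
        partials = pool.map(
            lambda pts: knn_in_cluster(pts, query, k), clusters.values()
        )
        # Merge the per-cluster candidates and keep the k closest overall.
        merged = heapq.nsmallest(k, (hit for part in partials for hit in part))
    return [point for _, point in merged]


# Hypothetical, pre-clustered 2-D data (cluster id -> list of points).
clusters = {
    0: [(0.0, 0.0), (0.1, 0.2), (0.3, 0.1)],
    1: [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1)],
}
result = parallel_knn(clusters, query=(0.2, 0.2), k=2)
```

Because each cluster returns its own k best candidates, merging them yields the exact global k nearest neighbours. A thread pool is used only to keep the sketch short; a distributed deployment across fog nodes would replace it with remote calls.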

DATA AVAILABILITY STATEMENT

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
