Transactions on Emerging Telecommunications Technologies

Volume 33, Issue 7 e4484

RESEARCH ARTICLE

Clustering and parallel indexing of big IoT data in the fog-cloud computing level

Karima Khettabi

Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria

Zineddine Kouahla

Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria

Brahim Farou

Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria

Hamid Seridi

Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria

Mohamed Amine Ferrag (Corresponding Author)

[email protected]

orcid.org/0000-0002-0632-3172

Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria

Correspondence

Mohamed Amine Ferrag, Labstic Laboratory, Department of Computer Science, Guelma University, Guelma 24000, Algeria.

Email: [email protected]


First published: 07 March 2022

https://doi.org/10.1002/ett.4484

Citations: 1

Abstract

In recent years, the large amount of heterogeneous data generated by Internet of Things (IoT) sensors and devices has made recording and search tasks much more difficult, and most state-of-the-art methods have failed to meet the new IoT requirements. This article proposes a new, efficient method that simplifies data indexing and improves the quality and speed of similarity query search in the IoT environment. In this method, the fog layer is divided into two levels. In the clustering fog level, the incremental density-based spatial clustering of applications with noise (DBSCAN) algorithm is used to separate the collected data into clusters in order to minimize data overlap during the parallel construction of the indexes. Parallelism is also used in the indexing fog level to speed up the similarity-based search process. The data in each cluster are indexed using our proposed structure, called the B3CF-tree (binary tree based on containers at the cloud-clusters fog computing level). Finally, the objects in the leaf nodes of the B3CF-trees are stored in the cloud. Applied to multiple datasets, this approach significantly reduces the retrieval time of the similarity search. The experimental results show that the combination of DBSCAN clustering and parallel indexing makes the B3CF-trees outperform the latest real-data indexing methods. For example, in terms of quality, the B3CF-tree has the smallest number of nodes and leaf nodes. In addition, the use of parallelism during kNN search significantly reduces the retrieval time of the similarity query search.
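The pipeline outlined in the abstract (cluster the data, build one index per cluster in parallel, then answer a kNN query by searching the clusters in parallel) can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the article: plain batch DBSCAN is used in place of the incremental variant, a sorted list stands in for the B3CF-tree (whose internal structure is not described in the abstract), and the function names `dbscan`, `build_indexes`, and `parallel_knn` as well as the sample points are hypothetical.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def dbscan(points, eps, min_pts):
    """Plain batch DBSCAN (not the incremental variant). Returns one label
    per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force eps-neighborhood; includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbrs)
    return labels


def build_index(cluster_points):
    """Stand-in for a per-cluster index (assumption: NOT a B3CF-tree)."""
    return sorted(cluster_points)


def build_indexes(points, labels):
    """Group points by cluster label and build one index per group in parallel."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    with ThreadPoolExecutor() as pool:
        built = list(pool.map(build_index, clusters.values()))
    return dict(zip(clusters.keys(), built))


def knn_in_index(index, query, k):
    """k nearest (distance, point) pairs inside a single cluster index."""
    return heapq.nsmallest(k, ((math.dist(query, p), p) for p in index))


def parallel_knn(indexes, query, k):
    """Search every cluster index in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partial = pool.map(lambda idx: knn_in_index(idx, query, k), indexes.values())
        candidates = [c for part in partial for c in part]
    return [p for _, p in heapq.nsmallest(k, candidates)]


# Hypothetical sample data: two dense groups and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (50.0, 50.0)]
labels = dbscan(points, eps=1.5, min_pts=3)
indexes = build_indexes(points, labels)
nearest = parallel_knn(indexes, (10.0, 10.4), k=2)
```

Noise points are kept in their own bucket so that no object is lost from the search. A thread pool is used only to show the structure of the computation; in CPython, CPU-bound work of this kind would need separate processes or machines (for example, distinct fog nodes) to run truly in parallel.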

Open Research

DATA AVAILABILITY STATEMENT

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Volume 33, Issue 7, July 2022, e4484