Clustering and parallel indexing of big IoT data in the fog-cloud computing level
Karima Khettabi
Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria
Zineddine Kouahla
Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria
Brahim Farou
Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria
Hamid Seridi
Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria
Mohamed Amine Ferrag (Corresponding Author)
Labstic Laboratory, Department of Computer Science, Guelma University, Guelma, Algeria
Correspondence
Mohamed Amine Ferrag, Labstic Laboratory, Department of Computer Science, Guelma University, Guelma 24000, Algeria.
Abstract
In recent years, the large amount of heterogeneous data generated by Internet of Things (IoT) sensors and devices has made recording and search tasks much more difficult, and most state-of-the-art methods have failed to meet the new IoT requirements. This article proposes a new, efficient method that simplifies data indexing and improves both the quality and the speed of similarity query search in the IoT environment. In this method, the fog layer is divided into two levels. At the clustering fog level, the incremental density-based spatial clustering of applications with noise (DBSCAN) algorithm is used to separate the collected data into clusters in order to minimize data overlap during parallel index construction. At the indexing fog level, parallelism is used to speed up the similarity-based search process. The data in each cluster are indexed using our proposed structure, called the B3CF-tree (binary tree based on containers at the cloud-clusters fog computing level). Finally, the objects in the leaf nodes of the B3CF-trees are stored in the cloud. When this approach is applied to multiple datasets, the retrieval time of the similarity search is significantly reduced. The experimental results show that the combination of DBSCAN clustering and parallel indexing makes the B3CF-trees outperform the latest real-data indexing methods. For example, in terms of quality, the B3CF-tree has the smallest number of nodes and leaf nodes. In addition, the use of parallelism during kNN search significantly reduces the retrieval time of similarity queries.
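The pipeline summarized above (cluster the collected data, index each cluster separately, then answer kNN queries by searching the per-cluster indexes in parallel) can be illustrated with the minimal Python sketch below. It is not the authors' implementation, and several simplifications are assumptions: points are 2-D with Euclidean distance, the classic batch DBSCAN is used instead of the incremental variant, each B3CF-tree is replaced by a plain list scanned by brute force, and threads stand in for fog indexing nodes. All function names (`region_query`, `dbscan`, `partition`, `local_knn`, `parallel_knn`) are hypothetical.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (brute-force scan)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classic batch DBSCAN (not the incremental variant): one label per point, -1 = noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster, is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


def partition(points, labels):
    """Group points by cluster label; noise (-1) is kept as its own partition."""
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    return groups


def local_knn(bucket, query, k):
    """k nearest (distance, point) pairs in one partition (stand-in for a per-cluster tree)."""
    return heapq.nsmallest(k, ((math.dist(p, query), p) for p in bucket))


def parallel_knn(groups, query, k, workers=4):
    """Search every partition concurrently, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda bucket: local_knn(bucket, query, k), groups.values())
        # The global k nearest are always among the union of the per-partition top-k lists.
        return heapq.nsmallest(k, (pair for part in partials for pair in part))


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (9.0, 0.0)]
    labels = dbscan(points, eps=0.5, min_pts=3)  # [0, 0, 0, 1, 1, 1, -1]
    groups = partition(points, labels)
    print(parallel_knn(groups, (0.02, 0.0), k=2))
```

Keeping DBSCAN's noise points in their own partition matters in this sketch: if they were discarded, a query near an outlier such as `(9.0, 0.0)` would return a wrong neighbor. The merge step is exact because every partition contributes its own k best candidates, so the parallel search returns the same answer as a single brute-force scan.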
Open Research
DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.