Topical community detection from mining user tagging behavior and interest
Xiaoling Sun
Department of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian, 116023 China
Hongfei Lin
Department of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian, 116023 China
Abstract
With the development of Web 2.0, social tagging systems, in which users freely choose tags to annotate resources according to their interests, have attracted much attention. In particular, the literature on the emergence of collective intelligence in social tagging systems has grown. In this article, we propose a probabilistic generative model to detect latent topical communities among users. Social tags and resource contents are leveraged to model user interest in two similar and correlated ways. Our primary goal is to capture user tagging behavior and interest and to discover the emergent topical community structure. The communities should be groups of users with frequent social interactions as well as similar topical interests, which would have important research implications for personalized information services. Experimental results on two real social tagging data sets of different genres show that the proposed generative model models user interest more accurately and detects high-quality, meaningful topical communities.