Link Analysis in Web Mining: Techniques and Applications
Prasanna Desikan
Naval Postgraduate School, Monterey, California, USA
Colin DeLong
Naval Postgraduate School, Monterey, California, USA
Jaideep Srivastava
Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, USA
Book editors:
Phillip C.-Y. Sheu
University of California, Irvine, California, USA
Heather Yu
C. V. Ramamoorthy
Arvind K. Joshi
Lotfi A. Zadeh
Summary
This chapter describes some of the major advances in the Web domain made possible through link analysis, primarily under the traditional definition of a link as a hyperlink between Web pages, together with the applications that have resulted from research in this area. With the advent of enabling technologies such as semantically rich markup languages and scripting languages such as JavaScript, however, the way information is represented on the Web has changed remarkably. A “link” now carries a more generic sense, which calls for further Web mining research on this more information-rich kind of data. The chapter discusses basic concepts and techniques that have proven their value in this domain and that show great promise even as Web-based technologies move to Web 2.0 and beyond.
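To make the classic, hyperlink-based notion of link analysis concrete, here is a minimal sketch of one well-known technique: PageRank computed by power iteration over a small hyperlink graph. It illustrates the general idea under stated assumptions and is not presented as the chapter's own method. The function name `pagerank`, the damping factor of 0.85, the convergence settings, and the choice to spread the rank of pages without outlinks ("dangling" pages) uniformly over all pages are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start from a uniform distribution over pages.
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(max_iter):
        # Every page receives the "random jump" (teleportation) share.
        new = {p: (1.0 - damping) / n for p in nodes}
        dangling = 0.0
        for p in nodes:
            out = links.get(p, [])
            if out:
                # A page splits its damped rank evenly across its outlinks.
                share = damping * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # Assumption: a dangling page's rank is spread over all pages.
                dangling += damping * rank[p]
        for p in nodes:
            new[p] += dangling / n
        # Stop once the total change between iterations is negligible.
        delta = sum(abs(new[p] - rank[p]) for p in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Toy hyperlink graph: A links to B and C, B to C, C to A, D to C.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

For this toy graph, page C, which collects links from A, B, and D, receives the highest score, while D, which no page links to, keeps only the teleportation share of (1 - 0.85) / 4. Because the teleportation and dangling-page shares are redistributed rather than discarded, the scores continue to sum to 1 after every iteration.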