Learning to rank academic experts in the DBLP dataset
Catarina Moreira
Instituto Superior Técnico, INESC-ID, Av. Professor Cavaco Silva, 2744-016 Porto Salvo, Portugal
Pável Calado
Instituto Superior Técnico, INESC-ID, Av. Professor Cavaco Silva, 2744-016 Porto Salvo, Portugal
Bruno Martins
Instituto Superior Técnico, INESC-ID, Av. Professor Cavaco Silva, 2744-016 Porto Salvo, Portugal
Abstract
Expert finding is an information retrieval task concerned with finding the most knowledgeable people on a specific topic, based on documents that describe people's activities. The task takes a user query as input and returns a list of people sorted by their level of expertise with respect to that query. Despite recent interest in the area, current state-of-the-art techniques lack principled approaches for optimally combining different sources of evidence. This article proposes two frameworks for combining multiple estimators of expertise. These estimators are derived from textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the experts. More specifically, this article explores the use of supervised learning-to-rank methods, as well as rank aggregation approaches, for combining all of the estimators of expertise. Several supervised learning algorithms, representative of the pointwise, pairwise and listwise approaches, were tested, and various state-of-the-art data fusion techniques were also explored for the rank aggregation framework. Experiments performed on a dataset of academic publications from the Computer Science domain attest to the adequacy of the proposed approaches.
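To make the rank aggregation idea concrete, the following is a minimal sketch of how several expertise rankings for one query might be fused into a single list. It uses a Borda count, chosen here only as a well-known illustrative fusion technique; the abstract does not say which fusion techniques the article evaluates. The function name `borda_fuse`, the candidate names and the three example rankings are all hypothetical.

```python
from collections import defaultdict


def borda_fuse(rankings):
    """Fuse ranked lists of candidates (best first) with a Borda count.

    Illustrative assumption: a candidate at 0-based position p in a list
    earns (n - p) points, where n is the number of distinct candidates
    across all lists. A candidate missing from a list earns nothing from it.
    """
    candidates = {c for ranking in rankings for c in ranking}
    n = len(candidates)
    scores = defaultdict(int)
    for ranking in rankings:
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - position
    # Highest total first; ties are broken by name so the output is stable.
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Hypothetical rankings for one query, one per source of evidence.
text_rank = ["alice", "bob", "carol"]      # textual contents
citation_rank = ["bob", "alice", "dave"]   # citation graph structure
profile_rank = ["bob", "carol", "alice"]   # profile information

fused = borda_fuse([text_rank, citation_rank, profile_rank])
print(fused)
```

A fusion rule of this kind needs no training data, which is the main practical contrast with the supervised learning-to-rank framework, where the combination of estimators is learned from relevance judgments.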