Expert finding is an information retrieval task concerned with identifying the most knowledgeable people on a specific topic, based on documents that describe people's activities. The task takes a user query as input and returns a list of people ranked by their level of expertise with respect to that query. Despite recent interest in the area, current state-of-the-art techniques lack principled approaches for optimally combining different sources of evidence. This article proposes two frameworks for combining multiple estimators of expertise. These estimators are derived from textual contents, from the graph structure of the citation patterns in the community of experts, and from profile information about the experts. More specifically, this article explores the use of supervised learning-to-rank methods, as well as rank aggregation approaches, for combining all of the estimators of expertise. Several supervised learning algorithms, representative of the pointwise, pairwise and listwise approaches, were tested, and various state-of-the-art data fusion techniques were explored for the rank aggregation framework. Experiments performed on a dataset of academic publications from the Computer Science domain attest to the adequacy of the proposed approaches.
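As a rough illustration of the rank aggregation idea, the sketch below fuses three ranked lists of candidate experts with a Borda count. This is only one classic aggregation rule; the abstract does not say which data fusion techniques were evaluated, so the choice of Borda count, the function name `borda_fuse`, and the toy rankings are all assumptions made for illustration.

```python
from collections import defaultdict


def borda_fuse(rankings):
    """Fuse several ranked lists (best candidate first) with a Borda count.

    Hypothetical helper for illustration; not taken from the article.
    """
    # The pool of candidates is everyone who appears in at least one list.
    candidates = {c for ranking in rankings for c in ranking}
    n = len(candidates)
    scores = defaultdict(int)
    for ranking in rankings:
        for position, candidate in enumerate(ranking):
            # Top position earns n points, the next n - 1, and so on.
            # A candidate absent from a list earns nothing from that list.
            scores[candidate] += n - position
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Toy rankings standing in for the three kinds of estimators
# (textual contents, citation graph, profile information).
text_rank = ["alice", "bob", "carol"]
citation_rank = ["bob", "alice", "carol"]
profile_rank = ["bob", "carol", "alice"]

fused = borda_fuse([text_rank, citation_rank, profile_rank])
print(fused)
```

Here each estimator contributes only an ordering, not a score, which is what distinguishes rank aggregation from the supervised approach, where a model learns how to weight the estimators from labelled examples.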



Learning to rank academic experts in the DBLP dataset
Volume 32, Issue 4, August 2015, Pages 477-493
