Volume 32, Issue 3 pp. 358-367
Special Issue Paper

Reinforcement learning behaviors in sponsored search

Wei Chen

Microsoft Research, Beijing, China

Correspondence to: Wei Chen, Microsoft Research, Beijing, China.

E-mail: [email protected]

Tie-Yan Liu

Microsoft Research, Beijing, China

Xinxin Yang

Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China

First published: 05 February 2016
Citations: 7

Abstract

This paper is concerned with modeling advertiser behaviors in sponsored search. Modeling these behaviors can help search engines better serve advertisers, improve auction mechanisms, and forecast future revenue. Previous work on this topic either unrealistically assumes that advertisers can perceive the states of the sponsored search system and the private information of other advertisers, or ignores the differences in advertisers' abilities to optimize their bid strategies. To address these problems, we propose viewing sponsored search auctions as a partially observable multi-agent system with private information. We then employ a reinforcement learning behavior model to describe how each advertiser responds to this multi-agent system. The proposed model no longer assumes that advertisers have perfect access to information; instead, it assumes that they optimize their strategies based only on the partially observed states of the auctions. Furthermore, the model does not specify how the optimization is conducted, but uses parameters learned from data to describe different advertisers' abilities to obtain the optimal strategies. Our experiments on real sponsored search data demonstrate that the proposed model outperforms previous models in predicting the bids and rank positions of advertisers in the near future. In addition to accurately predicting these short-term behaviors, the proposed model has another useful property: if all advertisers behave according to the model, the multi-agent system of sponsored search converges to a locally envy-free equilibrium under certain conditions. This result establishes a connection between machine-learned behavior models and game-theoretic properties of the system. Copyright © 2016 John Wiley & Sons, Ltd.
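
For readers unfamiliar with the term, the locally envy-free condition mentioned in the abstract can be illustrated with a small check. The sketch below uses the standard textbook formulation for a generalized second-price auction without quality scores, which is an assumption here and may differ from the exact setting analyzed in the paper: no advertiser would gain by taking the slot directly above at the price currently paid for that slot. The function names `gsp_prices` and `is_locally_envy_free`, and the example numbers, are hypothetical.

```python
from typing import List, Sequence


def gsp_prices(bids: Sequence[float]) -> List[float]:
    """Per-click price for each rank: the next-highest bid (0 for the last rank).

    Assumes `bids` is already sorted in descending order (rank 0 is the top slot).
    """
    return [bids[k + 1] if k + 1 < len(bids) else 0.0 for k in range(len(bids))]


def is_locally_envy_free(
    values: Sequence[float],
    bids: Sequence[float],
    ctrs: Sequence[float],
    tol: float = 1e-9,
) -> bool:
    """Check the textbook locally envy-free condition (an assumed formulation).

    values[k] and bids[k] belong to the advertiser ranked k; ctrs[k] is the
    click-through rate of slot k. Advertisers ranked below the last slot
    receive no clicks (click-through rate 0).
    """
    n = len(bids)
    alpha = [ctrs[k] if k < len(ctrs) else 0.0 for k in range(n)]
    prices = gsp_prices(bids)
    for k in range(1, n):
        own_utility = alpha[k] * (values[k] - prices[k])
        # Utility of holding the slot above at the price paid for that slot.
        upper_utility = alpha[k - 1] * (values[k] - prices[k - 1])
        if own_utility + tol < upper_utility:
            return False
    return True


if __name__ == "__main__":
    values = [10.0, 6.0, 2.0]   # hypothetical per-click values
    ctrs = [0.2, 0.1]           # two slots, three advertisers
    print(is_locally_envy_free(values, [8.0, 4.0, 2.0], ctrs))
    print(is_locally_envy_free(values, [8.0, 3.0, 2.0], ctrs))
```

In the second call the second-ranked advertiser bids low enough that the top slot becomes cheap, so that advertiser would prefer the top slot at its current price and the condition fails.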
