Volume 12, Issue 10, pp. 695–724

Training and delayed reinforcements in Q-learning agents

Pierguido V. C. Caironi (Corresponding Author)

Progetto di Intelligenza Artificiale e Robotica, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy
Marco Dorigo

IRIDIA, Université Libre de Bruxelles, Avenue Franklin Roosevelt 50, CP 194/6, 1050 Bruxelles, Belgium

Abstract

Q-learning can converge much faster if helped by immediate reinforcements provided by a trainer able to judge how useful each action is as stage setting for the agent's goal. This article experimentally investigates this hypothesis by studying the integration of immediate reinforcements (also called training reinforcements) with standard delayed reinforcements, that is, reinforcements assigned only when the agent–environment relationship reaches a particular state, such as when the agent reaches a target. The article proposes two new algorithms (TL and MTL) that are able to exploit even locally wrong and misleading training reinforcements. The proposed algorithms are tested against Q-learning and against two algorithms from the literature (AB–LEC and BB–LEC) [S. D. Whitehead, TR-365, University of Rochester, NY, 1991] that also make use of training reinforcements. Experiments are run in a grid world where a Q-agent, a simple simulated robot, must learn to reach a target. © 1997 John Wiley & Sons, Inc.
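To make the idea concrete, the sketch below shows one plausible way of folding an immediate trainer reinforcement into an otherwise standard tabular Q-learning update. It is an illustration only, not the TL or MTL algorithms proposed in the article: the env and trainer interfaces, the additive combination of the two reinforcement signals, and all hyperparameter values are assumptions made for this example.

    # Minimal sketch: tabular Q-learning whose reward signal combines a delayed
    # reinforcement (granted only when the target is reached) with an immediate
    # "training" reinforcement judged by an external trainer. Interfaces,
    # hyperparameters, and the additive combination are illustrative
    # assumptions, not the article's TL/MTL algorithms.
    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration
    ACTIONS = ["up", "down", "left", "right"]

    def q_learning_with_trainer(env, trainer, episodes=500):
        """Assumed interfaces: env.reset() -> start state;
        env.step(state, action) -> (next_state, done), done=True at the target;
        trainer(state, action, next_state) -> immediate reinforcement judging
        the action's stage-setting value (it may be locally wrong)."""
        Q = defaultdict(float)  # Q[(state, action)], defaults to 0.0
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # epsilon-greedy action selection
                if random.random() < EPSILON:
                    action = random.choice(ACTIONS)
                else:
                    action = max(ACTIONS, key=lambda a: Q[(state, a)])
                next_state, done = env.step(state, action)
                delayed_r = 1.0 if done else 0.0  # delayed reinforcement
                # immediate training reinforcement merged into the same signal
                r = delayed_r + trainer(state, action, next_state)
                # standard one-step Q-learning backup
                target = r if done else r + GAMMA * max(
                    Q[(next_state, a)] for a in ACTIONS)
                Q[(state, action)] += ALPHA * (target - Q[(state, action)])
                state = next_state
        return Q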
