Volume 12, Issue 10, pp. 695–724

Training and delayed reinforcements in Q-learning agents

Pierguido V. C. Caironi (Corresponding Author)

Progetto di Intelligenza Artificiale e Robotica, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy
Marco Dorigo

IRIDIA, Université Libre de Bruxelles, Avenue Franklin Roosevelt 50, CP 194/6, 1050 Bruxelles, Belgium

Abstract

Q-learning can converge much faster if helped by immediate reinforcements provided by a trainer able to judge how useful each action is as stage setting for the agent's goal. This article experimentally investigates this hypothesis by studying the integration of immediate reinforcements (also called training reinforcements) with standard delayed reinforcements, that is, reinforcements assigned only when the agent–environment relationship reaches a particular state, such as when the agent reaches a target. The article proposes two new algorithms (TL and MTL) that are able to exploit even locally wrong and misleading training reinforcements. The proposed algorithms are tested against Q-learning and against two algorithms from the literature (AB–LEC and BB–LEC) [S. D. Whitehead, TR-365, University of Rochester, NY, 1991] that also make use of training reinforcements. Experiments are run in a grid world where a Q-agent, a simple simulated robot, must learn to reach a target. © 1997 John Wiley & Sons, Inc.
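To make the idea concrete, the sketch below shows one plausible way of folding an immediate trainer reinforcement into an otherwise standard tabular Q-learning update. It is an illustration only, not the TL or MTL algorithms proposed in the article: the env and trainer interfaces, the additive combination of the two reinforcement signals, and all hyperparameter values are assumptions made for this example.

    # Minimal sketch: tabular Q-learning whose reward signal combines a delayed
    # reinforcement (granted only when the target is reached) with an immediate
    # "training" reinforcement judged by an external trainer. Interfaces,
    # hyperparameters, and the additive combination are illustrative
    # assumptions, not the article's TL/MTL algorithms.
    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration
    ACTIONS = ["up", "down", "left", "right"]

    def q_learning_with_trainer(env, trainer, episodes=500):
        """Assumed interfaces: env.reset() -> start state;
        env.step(state, action) -> (next_state, done), done=True at the target;
        trainer(state, action, next_state) -> immediate reinforcement judging
        the action's stage-setting value (it may be locally wrong)."""
        Q = defaultdict(float)  # Q[(state, action)], defaults to 0.0
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # epsilon-greedy action selection
                if random.random() < EPSILON:
                    action = random.choice(ACTIONS)
                else:
                    action = max(ACTIONS, key=lambda a: Q[(state, a)])
                next_state, done = env.step(state, action)
                delayed_r = 1.0 if done else 0.0  # delayed reinforcement
                # immediate training reinforcement merged into the same signal
                r = delayed_r + trainer(state, action, next_state)
                # standard one-step Q-learning backup
                target = r if done else r + GAMMA * max(
                    Q[(next_state, a)] for a in ACTIONS)
                Q[(state, action)] += ALPHA * (target - Q[(state, action)])
                state = next_state
        return Q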
