DYNAMIC RESPONSE-BY-RESPONSE MODELS OF MATCHING BEHAVIOR IN RHESUS MONKEYS
Corresponding Author: Brian Lau
NEW YORK UNIVERSITY
Center for Neural Science, New York University, 4 Washington Place, Room 809, New York, New York 10003 (e-mail: [email protected])

Abstract
We studied the choice behavior of 2 monkeys in a discrete-trial task with reinforcement contingencies similar to those Herrnstein (1961) used when he described the matching law. In each session, the monkeys experienced blocks of discrete trials at different relative-reinforcer frequencies or magnitudes with unsignalled transitions between the blocks. Steady-state data following adjustment to each transition were well characterized by the generalized matching law; response ratios undermatched reinforcer frequency ratios but matched reinforcer magnitude ratios. We modelled response-by-response behavior with linear models that used past reinforcers as well as past choices to predict the monkeys' choices on each trial. We found that more recently obtained reinforcers more strongly influenced choice behavior. Perhaps surprisingly, we also found that the monkeys' actions were influenced by the pattern of their own past choices. It was necessary to incorporate both past reinforcers and past choices in order to accurately capture steady-state behavior as well as the fluctuations during block transitions and the response-by-response patterns of behavior. Our results suggest that simple reinforcement learning models must account for the effects of past choices to accurately characterize behavior in this task, and that models with these properties provide a conceptual tool for studying how both past reinforcers and past choices are integrated by the neural systems that generate behavior.
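The response-by-response models described in the abstract can be sketched as a logistic regression that predicts each choice from lagged reinforcers and lagged past choices. The generative weights, 50% reinforcement scheme, and lag count below are illustrative assumptions for a two-alternative simulation, not the paper's fitted values or task parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generative weights: recent reinforcers influence choice most
# (exponentially decaying), and past choices exert a mild alternation tendency.
n_trials, n_lags = 2000, 5
w_rein = 0.8 * 0.6 ** np.arange(n_lags)     # reinforcer-history weights
w_choice = -0.3 * 0.6 ** np.arange(n_lags)  # choice-history weights

choices = np.zeros(n_trials)  # +1 = right, -1 = left
reinfs = np.zeros(n_trials)   # signed reinforcer: +/-1 if rewarded, 0 if not
for t in range(n_trials):
    past_r = reinfs[max(0, t - n_lags):t][::-1]  # most recent first
    past_c = choices[max(0, t - n_lags):t][::-1]
    drive = past_r @ w_rein[:len(past_r)] + past_c @ w_choice[:len(past_c)]
    p_right = 1.0 / (1.0 + np.exp(-drive))
    c = 1.0 if rng.random() < p_right else -1.0
    choices[t] = c
    reinfs[t] = c if rng.random() < 0.5 else 0.0  # 50% chance of reward

def lagged(x, n_lags):
    """Design-matrix columns holding x at lags 1..n_lags."""
    cols = [np.concatenate([np.zeros(k + 1), x[:-(k + 1)]]) for k in range(n_lags)]
    return np.column_stack(cols)

X = np.column_stack([lagged(reinfs, n_lags), lagged(choices, n_lags)])
y = (choices + 1) / 2  # recode choices as 0/1 for logistic regression

# Fit by Newton-Raphson (iteratively reweighted least squares)
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    H = (X * (p * (1 - p))[:, None]).T @ X + 1e-6 * np.eye(X.shape[1])
    beta += np.linalg.solve(H, grad)

print("reinforcer weights:", np.round(beta[:n_lags], 2))
print("choice weights:   ", np.round(beta[n_lags:], 2))
```

Fitting both sets of weights to the simulated data recovers the decaying reinforcer kernel and the negative choice kernel, mirroring the paper's point that omitting either history term misattributes structure in the behavior to the other.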
REFERENCES
- Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
- Anderson, K. G., Velkey, A. J., & Woolverton, W. L. (2002). The generalized matching law as a predictor of choice between cocaine and food in rhesus monkeys. Psychopharmacology, 163, 319–326.
- Bailey, J., & Mazur, J. (1990). Choice behavior in transition: Development of preference for the higher probability of reinforcement. Journal of the Experimental Analysis of Behavior, 53, 409–422.
- Barraclough, D. J., Conroy, M. L., & Lee, D. (2004). Prefrontal cortex and decision making in a mixed-strategy game. Nature Neuroscience, 7, 404–410.
- Baum, W. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231–242.
- Baum, W. (1979). Matching, undermatching, and overmatching in studies of choice. Journal of the Experimental Analysis of Behavior, 32, 269–281.
- Baum, W., & Rachlin, H. (1969). Choice as time allocation. Journal of the Experimental Analysis of Behavior, 12, 861–874.
- Baum, W., Schwendiman, J., & Bell, K. (1999). Choice, contingency discrimination, and foraging theory. Journal of the Experimental Analysis of Behavior, 71, 355–373.
- Bayer, H., & Glimcher, P. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129–141.
- Box, G. E. P., & Jenkins, G. M. (1976). Time series analysis: Forecasting and control (Rev. ed.). San Francisco: Holden-Day.
- Brownstein, A. (1971). Concurrent schedules of response-independent reinforcement: Duration of a reinforcing stimulus. Journal of the Experimental Analysis of Behavior, 15, 211–214.
- Burnham, K. P., & Anderson, D. R. (1998). Model selection and inference: A practical information-theoretic approach. New York: Springer.
- Bush, R. R., & Mosteller, F. (1955). Stochastic models for learning. New York: Wiley.
- Camerer, C., & Ho, T. (1998). Experience-weighted attraction learning in coordination games: Probability rules, heterogeneity, and time-variation. Games and Economic Behavior, 42, 305–326.
- Cowie, R. (1977). Optimal foraging in great tits (Parus major). Nature, 268, 137–139.
- Cox, D. R. (1970). The analysis of binary data. London: Methuen.
- Davis, D. G., Staddon, J. E., Machado, A., & Palmer, R. G. (1993). The process of recurrent choice. Psychological Review, 100, 320–341.
- Davison, M. (2004). Interresponse times and the structure of choice. Behavioral Processes, 66, 173–187.
- Davison, M., & Baum, W. M. (2000). Choice in a variable environment: Every reinforcer counts. Journal of the Experimental Analysis of Behavior, 74, 1–24.
- Davison, M., & Baum, W. M. (2002). Choice in a variable environment: Effects of blackout duration and extinction between components. Journal of the Experimental Analysis of Behavior, 77, 65–89.
- Davison, M., & Baum, W. M. (2003). Every reinforcer counts: Reinforcer magnitude and local preference. Journal of the Experimental Analysis of Behavior, 80, 95–129.
- Davison, M., & Hunter, I. (1979). Concurrent schedules: Undermatching and control by previous experimental conditions. Journal of the Experimental Analysis of Behavior, 32, 233–244.
- Davison, M., & McCarthy, D. (1988). The matching law: A research review. Hillsdale, NJ: Erlbaum.
- Daw, N., Niv, Y., & Dayan, P. (2005). Actions, policies, values and the basal ganglia. In E. Bezard (Ed.), Recent breakthroughs in basal ganglia research (pp. XX–XX). New York: Nova Science Publishers.
- de Villiers, P. (1977). Choice in concurrent schedules and a quantitative formulation of the law of effect. In W. Honig & J. Staddon (Eds.), Handbook of operant behavior (pp. 233–287). Englewood Cliffs, NJ: Prentice-Hall.
- Devenport, L., & Devenport, J. (1994). Time-dependent averaging of foraging information in least chipmunks and golden-mantled ground squirrels. Animal Behaviour, 47, 787–802.
- Dorris, M., & Glimcher, P. W. (2004). Activity in posterior parietal cortex is correlated with the relative subjective desirability of action. Neuron, 44, 365–378.
- Dreyfus, L. (1991). Local shifts in relative reinforcement rate and time allocation on concurrent schedules. Journal of Experimental Psychology: Animal Behavior Processes, 17, 486–502.
- Fahrmeir, L., & Tutz, G. (2001). Multivariate statistical modelling based on generalized linear models (2nd ed.). New York: Springer.
- Gallistel, C. R., Mark, T. A., King, A. P., & Latham, P. E. (2001). The rat approximates an ideal detector of changes in rates of reward: Implications for the law of effect. Journal of Experimental Psychology: Animal Behavior Processes, 27, 354–372.
- Glimcher, P. W. (2002). Decisions, decisions, decisions: Choosing a biological science of choice. Neuron, 36, 323–332.
- Glimcher, P. W. (2005). Indeterminacy in brain and behavior. Annual Review of Psychology, 56, 25–56.
- Grace, R. C., Bragason, O., & McLean, A. P. (1999). Rapid acquisition of preference in concurrent chains. Journal of the Experimental Analysis of Behavior, 80, 235–252.
- Haruno, M., Kuroda, T., Doya, K., Toyama, K., Kimura, M., Samejima, K., et al. (2004). A neural correlate of reward-based behavioral learning in caudate nucleus: A functional magnetic resonance imaging study of a stochastic decision task. Journal of Neuroscience, 24, 1660–1665.
- Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267–272.
- Herrnstein, R. J., & Vaughan, W., Jr. (1980). Melioration and behavioral allocation. In J. Staddon (Ed.), Limits to action: The allocation of individual behavior (pp. 143–176). New York: Academic Press.
- Heyman, G. (1979). Markov model description of changeover probabilities on concurrent variable-interval schedules. Journal of the Experimental Analysis of Behavior, 31, 41–51.
- Hikosaka, O., Nakahara, H., Rand, M. K., Sakai, K., Lu, X., Nakamura, K., et al. (1999). Parallel neural networks for learning sequential procedures. Trends in Neurosciences, 22, 464–471.
- Hinson, J. M., & Staddon, J. E. (1983). Hill-climbing by pigeons. Journal of the Experimental Analysis of Behavior, 39, 25–47.
- Houston, A., Kacelnik, A., & McNamara, J. (1982). Some learning rules for acquiring information. In D. McFarland (Ed.), Functional ontogeny (pp. 140–191). London: Pitman Books.
- Houston, A., & McNamara, J. (1981). How to maximize reward rate on two variable-interval paradigms. Journal of the Experimental Analysis of Behavior, 35, 367–396.
- Houston, A., & Sumida, B. (1987). Learning rules, matching and frequency dependence. Journal of Theoretical Biology, 126, 289–308.
- Hunter, I., & Davison, M. (1985). Determination of a behavioral transfer function: White-noise analysis of session-to-session response-ratio dynamics on concurrent VI VI schedules. Journal of the Experimental Analysis of Behavior, 43, 43–59.
- Iglauer, C., & Woods, J. (1974). Concurrent performances: Reinforcement by different doses of intravenous cocaine in rhesus monkeys. Journal of the Experimental Analysis of Behavior, 22, 179–196.
- Judge, S. J., Richmond, B. J., & Chu, F. C. (1980). Implantation of magnetic search coils for measurement of eye position: An improved method. Vision Research, 20, 535–538.
- Kacelnik, A., Krebs, J., & Ens, B. (1987). Foraging in a changing environment: An experiment with starlings (Sturnus vulgaris). In M. Commons, A. Kacelnik, & S. Shettleworth (Eds.), Quantitative analyses of behaviour, Vol. 6: Foraging (pp. 63–87). Mahwah, NJ: Erlbaum.
- Keller, J. V., & Gollub, L. R. (1977). Duration and rate of reinforcement as determinants of concurrent responding. Journal of the Experimental Analysis of Behavior, 28, 145–153.
- Killeen, P. R. (1981). Averaging theory. In C. Bradshaw, E. Szabadi, & C. Lowe (Eds.), Quantification of steady state operant behavior (pp. 21–34). North Holland, Amsterdam: Elsevier.
- Killeen, P. R. (1994). Mathematical principles of reinforcement. Behavioral and Brain Sciences, 17, 105–172.
- Lee, D., Conroy, M., McGreevy, B., & Barraclough, D. (2004). Reinforcement learning and decision making in monkeys during a competitive game. Cognitive Brain Research, 22, 45–58.
- Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.
- Machado, A. (1993). Learning variable and stereotypical sequences of responses: Some data and a new model. Behavioural Processes, 30, 103–129.
- Mark, T. A., & Gallistel, C. R. (1994). Kinetics of matching. Journal of Experimental Psychology: Animal Behavior Processes, 20, 79–95.
- Mazur, J. E. (1992). Choice behavior in transition: Development of preference with ratio and interval schedules. Journal of Experimental Psychology: Animal Behavior Processes, 18, 364–378.
- McCoy, A. N., Crowley, J. C., Haghighian, G., Dean, H. L., & Platt, M. L. (2003). Saccade reward signals in posterior cingulate cortex. Neuron, 40, 1031–1040.
- McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman and Hall.
- McDowell, J. J., Bass, R., & Kessel, R. (1992). Applying linear systems analysis to dynamic behavior. Journal of the Experimental Analysis of Behavior, 57, 377–391.
- Montague, P. R., & Berns, G. S. (2002). Neural economics and the biological substrates of valuation. Neuron, 36, 265–284.
- Mookherjee, D., & Sopher, B. (1994). Learning behavior in an experimental matching pennies game. Games and Economic Behavior, 7, 62–91.
- Musallam, S., Corneil, B. D., Greger, B., Scherberger, H., & Andersen, R. A. (2004, July 9). Cognitive control signals for neural prosthetics. Science, 305, 258–262.
- Neuringer, A. (1967). Effects of reinforcement magnitude on choice and rate of responding. Journal of the Experimental Analysis of Behavior, 10, 417–424.
- Neuringer, A. (2002). Operant variability: Evidence, functions, and theory. Psychonomic Bulletin and Review, 9, 672–705.
- Nevin, J. (1969). Interval reinforcement of choice behavior in discrete trials. Journal of the Experimental Analysis of Behavior, 12, 875–885.
- Nevin, J. (1979). Overall matching versus momentary maximizing: Nevin (1969) revisited. Journal of Experimental Psychology: Animal Behavior Processes, 5, 300–306.
- O'Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R. J. (2004, April 16). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304, 452–454.
- Palya, W. (1992). Dynamics in the fine structure of schedule-controlled behavior. Journal of the Experimental Analysis of Behavior, 57, 267–287.
- Palya, W., & Allan, R. (2003). Dynamical concurrent schedules. Journal of the Experimental Analysis of Behavior, 79, 1–20.
- Palya, W., Walter, D., Kessel, R., & Lucke, R. (1996). Investigating behavioral dynamics with a fixed-time extinction schedule and linear analysis. Journal of the Experimental Analysis of Behavior, 66, 391–409.
- Palya, W., Walter, D., Kessel, R., & Lucke, R. (2002). Linear modeling of steady-state behavioral dynamics. Journal of the Experimental Analysis of Behavior, 77, 3–27.
- Platt, M. L., & Glimcher, P. W. (1997). Responses of intraparietal neurons to saccadic targets and visual distractors. Journal of Neurophysiology, 78, 1574–1589.
- Platt, M. L., & Glimcher, P. W. (1999). Neural correlates of decision variables in parietal cortex. Nature, 400, 233–238.
- Roitman, J. D., & Shadlen, M. N. (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. Journal of Neuroscience, 22, 9475–9489.
- Schneider, J. (1973). Reinforcer effectiveness as a function of reinforcer rate and magnitude: A comparison of concurrent performances. Journal of the Experimental Analysis of Behavior, 20, 461–471.
- Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80, 1–27.
- Schultz, W. (2004). Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioural ecology. Current Opinion in Neurobiology, 14, 139–147.
- Shadlen, M. N., & Newsome, W. T. (2001). Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. Journal of Neurophysiology, 86, 1916–1936.
- Shimp, C. P. (1966). Probabilistically reinforced choice behavior in pigeons. Journal of the Experimental Analysis of Behavior, 9, 443–455.
- Silberberg, A., Hamilton, B., Ziriax, J. M., & Casey, J. (1978). The structure of choice. Journal of Experimental Psychology: Animal Behavior Processes, 4, 368–398.
- Staddon, J. E., Hinson, J. M., & Kram, R. (1981). Optimal choice. Journal of the Experimental Analysis of Behavior, 35, 397–412.
- Staddon, J. E., & Motheral, S. (1978). On matching and maximizing in operant choice experiments. Psychological Review, 85, 436–444.
- Stephens, D. W., & Krebs, J. R. (1986). Foraging theory. Princeton, NJ: Princeton University Press.
- Sugrue, L. P., Corrado, G. S., & Newsome, W. T. (2004, June 18). Matching behavior and the representation of value in the parietal cortex. Science, 304, 1782–1787.
- Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
- Tanji, J. (2001). Sequential organization of multiple movements: Involvement of cortical motor areas. Annual Review of Neuroscience, 24, 631–651.
- Tanji, J., & Hoshi, E. (2001). Behavioral planning in the prefrontal cortex. Current Opinion in Neurobiology, 11, 164–170.
- Todorov, J. (1973). Interaction of frequency and magnitude of reinforcement on concurrent performances. Journal of the Experimental Analysis of Behavior, 19, 451–458.
- Williams, B. (1988). Reinforcement, choice, and response strength. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. Luce (Eds.), Stevens's handbook of experimental psychology (2nd ed., pp. 167–244). New York: Wiley.
- Williams, Z. M., Elfar, J. C., Eskandar, E. N., Toth, L. J., & Assad, J. A. (2003). Parietal activity and the perceived direction of ambiguous apparent motion. Nature Neuroscience, 6, 616–623.