Forward-looking experimentation on correlated alternatives
Abstract
This paper studies how a forward-looking decision maker experiments on unknown alternatives with correlated utilities. The utilities are modeled by a Brownian motion such that similar alternatives yield similar utilities. Experimentation trades off the continuation value of exploration against the opportunity cost of exploitation. The optimal strategy is to continuously explore unknown alternatives and then exploit the best known alternative when the one being explored is found to be sufficiently worse than the best one. The decision maker explores unknown alternatives more quickly as they prove to be worse than the best known one. Applied to firm experimentation, my model predicts a conditional version of Gibrat's law and a linear relation between firm size and profitability.
1 Introduction
This paper studies the experimentation problem of a long-lived decision maker who chooses alternatives of unknown correlated utility. The decision maker learns the utility of her chosen alternative, which informs her future choices. To capture the correlation, I model utility as a Brownian motion over a continuum of alternatives such that similar alternatives yield similar utilities. Prior work on correlated utility, including Jovanovic and Rob (1990) and Callander (2011), studies short-lived agents who can exploit a better alternative based on past explorations. My innovation is to study a long-lived decision maker who can also continue to explore.
The optimal experimentation strategy describes how quickly the decision maker continuously explores unknown alternatives and when she stops to exploit the best known one (Theorem 1). The speed of exploration depends on the drawdown, i.e., the utility difference between the best known alternative and the one being explored, and the decision maker exploits the best alternative when the drawdown exceeds a threshold.
The innovation of my analysis is the application of time change to the domain of alternatives. A time change indexes a stochastic process by an increasing sequence of stopping times. The experimentation literature on finitely many risky arms, such as Karatzas (1984), Moscarini and Smith (2001), and Keller, Rady, and Cripps (2005), performs time change in the domain of information that accumulates at a controlled rate. The most up-to-date information is a sufficient statistic because it encompasses past information. In my model, the decision maker controls the speed of exploring new alternatives. The alternative being explored is not a sufficient statistic because previously explored alternatives could be better. I show that the optimal strategy also takes the best known alternative into account.
The trade-off between the continuation value of exploration and the flow opportunity cost of exploitation implies faster exploration for more negative drawdowns (Proposition 1). On the one hand, faster exploration acquires information more quickly and realizes the continuation value sooner. When the drawdown becomes more negative, the impatient decision maker finds it less pressing to realize the decreased continuation value. On the other hand, faster exploration shortens the duration over which the flow opportunity cost of exploitation accrues. When the drawdown is more negative, the increased flow opportunity cost prompts the decision maker to complete exploration in a shorter time. I show that the opportunity cost incentive always dominates the continuation value incentive. This is because the option to exploit partially insures the decision maker against the drawdown, and so the continuation value is less sensitive to the drawdown than the flow opportunity cost.
In the time series, i.e., how experimentation evolves over time, the monotonicity of the speed with respect to the drawdown implies that, under conditions on the model parameters, the speed of exploration increases over time (Corollary 1).
I study the comparative statics of the optimal strategy (Proposition 2). Intuitively, the value of exploration increases when the decision maker is more patient, the cost of exploration is lower, or the utility has a more positive drift. To realize the increased value, the decision maker explores alternatives more quickly for any given drawdown and tolerates more negative drawdowns, i.e., the drawdown threshold increases. The comparative statics also hold when the utility is more volatile because of the option value of exploitation.
My model applies to a firm that experiments on its firm size to maximize profits. I interpret the log number of employees as a proxy for firm size and log profit as a proxy for profitability. Under the Brownian specification, firm-size elasticity of profitability is modeled to be independent and identically distributed over all firm sizes.
In the cross section, i.e., how experimentation differs across Brownian realizations, I derive two results that correspond to predictions on firm dynamics. The first result is that as the decision maker explores more alternatives, the speed of exploration converges asymptotically in distribution (Proposition 3). In terms of firm dynamics, this translates to a conditional version of Gibrat's law: among large firms, the percentage change in firm size conditional on growth follows essentially the same distribution that is invariant to absolute firm size. The percentage change depends on absolute size only through the extensive margin, i.e., firms that stop growing, but not the intensive margin, i.e., firms that keep growing. By comparison, the original, unconditional version of Gibrat's law states that the distribution is invariant at both the extensive margin and the intensive margin.
The second result is that the exploited alternative and its utility are asymptotically linearly related (Proposition 4). In the context of firm experimentation, this result means that long-run profitability is proportional to long-run firm size among large firms. The result highlights an endogenous selection effect: firms choose to operate at large sizes because they experience increasing profits along the growth path. Therefore, the regression of profitability on firm size may overestimate the average profit elasticity.
Related literature
My paper incorporates a long-lived decision maker into the experimentation problem where the utilities of unknown alternatives are modeled by a Brownian motion. Jovanovic and Rob (1990) analyze such a problem with overlapping generations of short-lived agents, each of whom can learn the Brownian realization of one alternative before choosing another for utility. Callander (2011) examines myopic agents whose choices reveal the utility of the chosen alternative. Garfagnini and Strulovici (2016) extend Callander (2011) to agents who live for two periods.1 In these papers, the agents acquire information at most once before exploitation and, therefore, do not incorporate the information value of continued exploration.2 By contrast, the long-lived decision maker in my model can continue to explore. As a result, past exploration becomes instrumental to future exploration, and the information value feeds back into the decision maker's experimentation strategy.
Urgun and Yariv (2025) consider a related search model in which an agent searches continuously among alternatives of Brownian utility. The objective is to maximize the expected discounted utility of the best alternative subject to a search cost. As in my model, their optimal strategy is to search continuously until the drawdown reaches a threshold. The optimal speed of search is driven by the continuation value. By contrast, the objective in my model is to maximize the expected discounted flow utility subject to a learning cost. The optimal speed of exploration is driven by the flow opportunity cost in addition to the continuation value.
The time change enables me to derive the time-series and cross-sectional results from the literature on Brownian motions that are stopped at a drawdown threshold. Taylor (1975) and Lehoczky (1977) characterize the joint distribution of the running maximum and the stopping time, for a Brownian motion whose drift and volatility may depend on its value. However, their framework does not accommodate my optimal strategy because both the drift and the volatility in my case depend on the drawdown. To leverage their results, I apply a time change to the Brownian utility, transforming it from the time domain to the domain of alternatives. Because the Brownian utility over alternatives has constant drift and volatility, their results can be applied to this process, as the threshold strategy remains invariant under the time change.
The remainder of the paper is organized as follows. Section 2 introduces the experimentation problem. Section 3 derives the optimal experimentation strategy. Section 4 studies the properties of optimal experimentation. Section 5 discusses key modeling assumptions. The Appendix provides proofs.
2 Experimentation problem
I study the experimentation problem of a forward-looking decision maker who continuously explores new alternatives of unknown and correlated utility subject to a learning cost.
An experimentation strategy specifies the choice of alternatives based on past experiments. In each realization, the decision maker chooses an alternative at each point in time. As in Callander (2011) and Garfagnini and Strulovici (2016), the decision maker learns the realized utility upon her choice via learning-by-doing. Therefore, her information at time t consists of the history of calendar time, the chosen alternatives, and the corresponding utilities, i.e., the sigma algebra generated by this history. I work with the corresponding augmented filtration, and the decision maker discounts future payoffs at rate r. I define the strategy space as follows.
Definition 1. An experimentation strategy is a stochastic process that satisfies the following conditions.
- Initial condition. For each , .
- Continuous exploration condition. For each , the frontier is absolutely continuous.
- Measurability condition. Strategy is predictable with respect to the augmented filtration .
- Growth condition. There exists and such that for all and .
When the decision maker explores new alternatives, she incurs a flow learning cost c(s), where s is the speed of exploration. The cost function c is twice continuously differentiable and strictly convex. It is assumed to satisfy c(0) = 0 and the Inada condition that the marginal cost grows without bound as the speed tends to infinity. Note that the cost is increasing in the speed of exploration and is zero when the speed is zero, e.g., when the decision maker exploits a previously chosen alternative. Thus, the flow learning cost can be interpreted as the utility loss incurred by the decision maker when she first learns to extract utility from a given alternative.
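As an illustration only (not the paper's specification), a quadratic cost with a hypothetical scale parameter κ > 0 satisfies all of the stated requirements:
\[
c(s) = \frac{\kappa}{2}\, s^{2}, \qquad
c(0) = 0, \qquad c''(s) = \kappa > 0, \qquad
\lim_{s \to \infty} c'(s) = \lim_{s \to \infty} \kappa s = \infty .
\]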
Remark 1. (Key assumptions) I comment on the key assumptions of the model. First, the utility process with stationary independent increments allows a recursive formulation of the optimal strategy. In particular, the Brownian specification with constant drift and volatility lends itself to a well-developed mathematics literature. Second, regarding the strategy space, the continuous exploration condition permits jumps to known alternatives and forbids jumps to unknown alternatives to the right of the frontier. I argue in Section 5.1 that this condition is without loss of optimality for convex learning costs. Third, the growth condition rules out strategies with explosive exploration that attain infinite utility at infinite cost.
3 Optimal experimentation strategy
I derive the optimal experimentation strategy in three steps. First, I show that the objective function is bounded. Second, I construct a candidate strategy based on several conjectures. Third, I prove the optimality of this strategy through verification. The primary departure from the standard argument is that the strategy cannot be characterized by a stochastic differential equation due to the possibility of exploitation.
3.1 Bounded objective function
Unlike myopic or short-lived agents, the long-lived decision maker might attain infinite utility by exploring an arbitrarily large set of alternatives in search of an arbitrarily good one. Because such a strategy would also incur an infinite learning cost, the objective function would be ill-defined as infinity minus infinity.3
The growth condition guarantees a bounded objective function by limiting exploration in finite time. The decision maker can choose from an exponentially growing, but bounded, set of alternatives whose running maximum increases at most exponentially in expectation. The restriction on the growth rate in the condition guarantees that the running maximum grows at a slower rate than the exponential discounting, leaving the objective bounded from above.
Lemma 1. (Bounded objective function) Over the set of strategies that satisfy the growth condition with fixed constants, the objective function (1) is bounded from above.
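A back-of-the-envelope bound illustrates the argument; the constants K and η below are hypothetical stand-ins for the constants in the growth condition, and the initial utility is normalized to zero:
\[
\mathbb{E}\!\left[\max_{x \in [0,\, K e^{\eta t}]} U_x\right]
\;\le\; \mu^{+} K e^{\eta t} \;+\; \sigma \sqrt{\frac{2 K e^{\eta t}}{\pi}},
\qquad\text{so}\qquad
\int_0^{\infty} e^{-r t}\, \mathbb{E}\!\left[\max_{x \le K e^{\eta t}} U_x\right] dt \;<\; \infty
\quad\text{whenever } \eta < r,
\]
which is how the restriction in the growth condition keeps the discounted running maximum, and hence the objective, bounded from above.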
3.2 Verification argument
I construct a candidate strategy based on some conjectures and then verify the optimality of the strategy. The key observation is that, among the many explored alternatives, the utility of the best known alternative and that of the frontier (i.e., rightmost alternative) are state variables, due to the Markovian property of Brownian utility.
I first claim that exploitation is a lookback option that gives the running maximum when executed. A lookback option is an option with a backward-looking and history-dependent payoff.4 Because exploitation does not generate information or advance the frontier for future experiments, once the decision maker starts to exploit, she will continue to do so by recalling the best known alternative.
Together with the running maximum, the frontier utility characterizes the continuation value. This is because all unknown alternatives lie to the right of the frontier. Conditional on the available information, the utility distribution of unknown alternatives depends only on the frontier utility due to the Markov property of Brownian motion.
Next, I claim that the option value is a function of the drawdown. The option value is the difference between the continuation value and the value of exploitation, and the drawdown is the difference between the frontier utility and the running maximum. Note that the drawdown is weakly negative by definition. Suppose that both the frontier utility and the running maximum increase by the same constant. The utility distribution of unknown alternatives then shifts upward by the same amount due to the stationary independent increments of Brownian utility. Therefore, both the value of exploration and the value of exploitation increase by the same amount, leaving the option value unchanged. The option value can thus be written as a function of the drawdown alone.
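In symbols (with hypothetical labels: W for the continuation value, U for the frontier utility, M for the running maximum, r for the discount rate, and M/r written for the value of exploiting the best known alternative forever), a common shift of all utilities by a constant k raises every future flow payoff by k, so
\[
W(U + k,\, M + k) \;=\; W(U, M) + \frac{k}{r},
\qquad
\frac{M + k}{r} \;=\; \frac{M}{r} + \frac{k}{r},
\]
and the option value, the difference between the two, is unchanged; it therefore depends on (U, M) only through the drawdown y = U − M.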
I derive the laws of the frontier utility, running maximum, and drawdown by applying a time change to the utility process.
Lemma 2. (Time change) The frontier utility admits a time-change representation: net of its drift, it is a continuous square-integrable martingale whose quadratic variation equals the cumulative variance of the explored alternatives.
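The content of the lemma can be checked by simulation. The sketch below uses hypothetical parameter values and an exogenously fixed, non-optimal speed path; it simulates exploration of a Brownian utility curve and compares the realized quadratic variation of the de-drifted frontier utility with the cumulative variance sigma^2 * X[-1] of the explored alternatives (X here is the sketch's own label for the frontier).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters and an exogenously fixed (non-optimal) speed path
mu, sigma = 0.2, 1.0          # drift and volatility of utility over alternatives
T, dt = 50.0, 1e-4            # horizon and time step
n = int(T / dt)

t = np.arange(n) * dt
s = 1.0 + 0.5 * np.sin(t)     # strictly positive speed of exploration
X = np.concatenate(([0.0], np.cumsum(s) * dt))   # explored frontier X_t = integral of s

# Simulate the Brownian utility at the visited alternatives: increments over
# alternatives have mean mu*dX and variance sigma^2*dX.
dX = np.diff(X)
U = np.concatenate(([0.0], np.cumsum(mu * dX + sigma * np.sqrt(dX) * rng.standard_normal(n))))

# Martingale part of the frontier utility and its realized quadratic variation
Z = U - mu * X
realized_qv = np.sum(np.diff(Z) ** 2)

print(f"realized quadratic variation of the martingale part: {realized_qv:.2f}")
print(f"cumulative variance sigma^2 * X_T                   : {sigma**2 * X[-1]:.2f}")
```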
By constructing a candidate strategy that satisfies these conditions, I characterize the value function and the unique optimal strategy by a verification argument.
Theorem 1. (Optimal experimentation)
- (i) The continuation value satisfies . The option value function v is the unique solution to (3) subject to boundary conditions (5)–(7), with extension for . Moreover, v is increasing and convex.
- (ii) The optimal experimentation strategy is unique and is given by
where (2) defines exploitation time τ and (4) defines speed of exploration s.
4 Properties of optimal experimentation
I analyze how optimal experimentation evolves over time, depends on model parameters, and differs in the cross section.
4.1 Time-series properties
Proposition 1. The speed of exploration s is higher for more negative drawdowns y.
The intertemporal trade-off between the flow opportunity cost of exploitation and the continuation value of exploration determines the optimal speed of exploration. On the one hand, the decision maker incurs the flow opportunity cost when she explores new alternatives instead of exploiting the best known one. To avert the flow cost, she would like to explore quickly over a short period of time, and more so for more negative drawdowns, when the opportunity cost is higher. On the other hand, the impatient decision maker would like to realize the continuation value, v, sooner by acquiring more information. For less negative drawdowns, the value is higher and, therefore, prompts faster exploration.
The speed of exploration is higher for more negative drawdowns because the opportunity cost is more sensitive to the drawdown than the continuation value. When the drawdown experiences a negative shock at the frontier, the utilities of all alternatives to the right of the frontier decrease, while those to the left, including the best alternative, remain unchanged. As a result, the flow opportunity cost increases by the size of the shock.
However, the continuation value decreases by a smaller amount because the option to exploit partially insures against the shock. Given optimal execution of the option to exploit, the envelope theorem bounds the continuation value's sensitivity to the shock.6 In words, the shock at the current frontier is relevant to the continuation value only when the alternative chosen in the future lies to the right of the current frontier. If the frontier utility reaches the drawdown threshold before marking a new maximum, the decision maker will exploit the best alternative, which lies to the left of the frontier, and, therefore, she insulates herself from the negative shock. Only if the frontier utility marks a new maximum before reaching the threshold does the frontier alternative become the best one, in which case all future alternatives chosen during exploration and exploitation lie to its right. The negative shock at the frontier then factors into the decision maker's flow utility permanently.
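The insurance logic can be restated heuristically in the same hypothetical notation (W for the continuation value, U for the frontier utility, r for the discount rate); this is a reading of the envelope argument, not a reproduction of the paper's expression:
\[
0 \;\le\; \frac{\partial W}{\partial U}
\;=\; \mathbb{E}\!\left[\int_0^{\infty} e^{-r u}\,
\mathbf{1}\{\text{the alternative chosen at time } u \text{ lies weakly to the right of the current frontier}\}\, du\right]
\;\le\; \frac{1}{r},
\]
with the last inequality strict whenever exploitation to the left of the current frontier occurs with positive probability. By contrast, the present value of the flow opportunity cost moves one-for-one with the frontier utility, with sensitivity exactly 1/r, which is why the opportunity-cost incentive dominates.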
I provide a sufficient condition for the speed of exploration to increase on average in the time series.
Corollary 1.If and is decreasing over , then the speed of exploration is a submartingale over .
4.2 Comparative statics
I derive the comparative statics of the option value function, the speed of exploration, and the drawdown threshold with respect to the model parameters. I say that the cost function c increases if it and its marginal cost increase pointwise, and that v and s increase if they increase pointwise. Moreover, a set of parameters is said to be more favorable if the drift μ is higher, the volatility σ is higher, the discount rate r is lower, or the cost function c is lower.
Proposition 2. (Comparative statics) The option value v, the speed of exploration s, and the drawdown threshold are increasing in the drift μ and the volatility σ of the utility process, and are decreasing in the discount rate r and the learning cost function c.
Intuitively, the decision maker derives higher value from exploration when it yields higher utility, the decision maker is more patient, or learning is less costly (Figure 1). As a result, she also tolerates a more negative drawdown threshold. Moreover, she is motivated to explore at a higher speed at any given drawdown so as to realize the increased value more quickly. The comparative statics hold when the utility process is more volatile as well, because the option to exploit insures against negative shocks.

Figure 1: The option value functions and the speed of exploration functions for two sets of parameters.
4.3 Cross-sectional properties
In the application of firm experimentation, a firm experiments on its size to maximize expected discounted profitability. I use the log number of employees as a proxy for firm size and log profit as a proxy for profitability. The Brownian specification means that the firm-size elasticity of profitability is independent and identically distributed over all firm sizes. I interpret the learning cost, c, as the loss in profits when a firm first operating at a given size fails to allocate human resources so as to maximize profit. Starting at the minimal firm size, the optimal strategy is to grow continuously until the percentage drawdown, y, exceeds a threshold and then to scale down to exploit the most profitable size in the long run. The percentage growth rate, s, depends on the percentage drawdown.
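In symbols (hypothetical labels: x for log employment and π(x) for log profit at size x), the Brownian specification of the application reads
\[
\pi(x + \Delta) - \pi(x) \;\sim\; \mathcal{N}\!\left(\mu\,\Delta,\; \sigma^{2}\Delta\right),
\]
independently across disjoint size increments, so the change in log profit per unit change in log size has the same distribution at every size and is independent across non-overlapping size ranges.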
Stochastic utility generates heterogeneous experimentation behaviors in the cross section under the optimal strategy. In the context of firm experimentation, I obtain two cross-sectional predictions on firm dynamics: the conditional version of Gibrat's law and the linear relation between long-run firm size and profitability.
First, I present the asymptotic convergence of the speed of exploration, which corresponds to the conditional version of Gibrat's law. With a slight abuse of notation, I index the drawdown and the speed of exploration by the alternative rather than by calendar time, by undoing the time change. This is possible because whether the drawdown has exceeded the threshold is invariant to the continuous time change.
Proposition 3. (Asymptotic convergence) As the explored alternative grows large, the speed of exploration conditional on continued exploration converges in distribution.
The conditional version of Gibrat's law can explain an empirical deviation from the original law, which states that the percentage growth in firm size is (unconditionally) invariant to absolute firm size. Daunfeldt and Elert (2013) reject the original, unconditional version of Gibrat's law at the aggregate level for Swedish firms during 1998–2004 because small firms on average grow more quickly than large firms. By contrast, the conditional version of Gibrat's law suggests that the deviation from the original law derives from the extensive margin: large and old firms are more likely to have stopped exploring and growing. Moreover, the conditional law weakens the original law parsimoniously in that the intensive margin, i.e., the percentage growth in firm size conditional on growth, remains invariant.
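The convergence in Proposition 3 can be illustrated by simulation; all numbers below are hypothetical, and the fixed threshold stands in for the model's optimal one. The sketch simulates the drawdown of the Brownian utility over alternatives, stops each path once the drawdown exceeds the threshold, and compares the conditional drawdown distribution among surviving paths at two different sizes. Because the optimal speed is a fixed function of the drawdown, convergence of the conditional drawdown distribution implies convergence of the conditional speed distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters (illustrative, not calibrated to the model's optimum)
mu, sigma = 0.0, 1.0      # drift and volatility of log profit over log size
a = 1.0                   # drawdown threshold at which exploration stops
dx = 1e-3                 # step over alternatives (log size)
x1, x2 = 2.0, 4.0         # two sizes at which to compare conditional distributions
n_paths = 10000

U = np.zeros(n_paths)     # utility at the frontier alternative
M = np.zeros(n_paths)     # running maximum
alive = np.ones(n_paths, dtype=bool)
snapshots = {}

for step in range(1, int(x2 / dx) + 1):
    dU = mu * dx + sigma * np.sqrt(dx) * rng.standard_normal(n_paths)
    U[alive] += dU[alive]
    M = np.maximum(M, U)
    alive &= (M - U) < a  # stop paths once the drawdown magnitude exceeds the threshold
    x = step * dx
    for target in (x1, x2):
        if abs(x - target) < dx / 2:
            snapshots[target] = (M[alive] - U[alive]).copy()

for target, draws in snapshots.items():
    q = np.round(np.percentile(draws, [25, 50, 75]), 3)
    print(f"size {target}: {draws.size} survivors, conditional drawdown quartiles {q}")
```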
The proof of Proposition 3 also implies that the decision maker quickly stops exploring.
Corollary 2. (Short exploration)There exist such that for all .
My second cross-sectional result is that the exploited alternative is linear in its utility. This result corresponds to the linear relation between long-run firm size and profitability. To present a stark contrast, I restrict attention to a driftless utility process, i.e., μ = 0, which means that the average elasticity of profit is zero.
Proposition 4. (Linear relation)For , the pair satisfies the linear relation
Proposition 4 highlights an endogenous selection effect: large firms choose to operate at large sizes because they experience increasing profits along the growth path. As a result, the naïve regression of profitability on firm size would estimate a positive average elasticity despite the zero mean in the profit process, because the estimation overlooks the selection bias. The proposition implies that the bias persists even for very large firms.
5 Discussions on modeling assumptions
In this section, I discuss how my analysis depends on three modeling assumptions: the continuous exploration condition, the Inada condition, and the learning cost.
5.1 Continuous exploration condition
The continuous exploration condition in Definition 1 is without loss of optimality under convex learning costs. The idea is that in continuous time, fast continuous exploration approximates discontinuous jumps, and small but frequent jumps can approximate continuous explorations.
Suppose that the decision maker can explore discontinuously subject to a discrete learning cost C(D), where D is the distance of the chosen alternative from previous explorations and C is the discrete cost function. I assume, as in Garfagnini and Strulovici (2016), that C is convex. This formulation is consistent with the flow learning cost of continuous exploration because the speed of exploration measures the infinitesimal distance from previous explorations.
I claim that the discrete cost function is effectively linear. Because C is convex, it suffices to show subadditivity. Instead of exploring an alternative at distance D in one step, the decision maker can first explore the midpoint at distance D/2 and then explore the chosen alternative, now at distance D/2. Over an arbitrarily short time lapse, the two-step exploration reveals additional information at the midpoint while accumulating zero flow utility. The cost therefore bounds the effective learning cost from below.
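A short calculation makes the "effectively linear" claim precise, under the hypothetical regularity conditions that C(0) = 0 and that C has a finite right derivative C'(0+):
\[
\text{effective cost of a jump of length } D
\;\le\; \inf_{n \ge 1}\; n\, C\!\left(\tfrac{D}{n}\right)
\;=\; C'(0^{+})\, D,
\qquad
\sum_{i} C(d_i) \;\ge\; C'(0^{+}) \sum_{i} d_i \;=\; C'(0^{+})\, D
\]
for any decomposition of the jump into steps d_i summing to D: convexity with C(0) = 0 makes C(d)/d nondecreasing in d and bounded below by C'(0+), so the effective discrete cost is linear with slope C'(0+).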
Remark 2.The above argument fails if C is not convex. When the decision maker explores discontinuously, the (closure of the) set of explored alternatives may no longer be an interval. In addition to the utility at the rightmost alternative, the decision maker also needs to take the utility at each boundary point into account. The number of state variables grows with each discontinuous exploration, leaving the recursive analysis intractable.
5.2 Inada condition
The Inada condition guarantees an interior solution to (4). If the learning cost function c fails the condition, it may be optimal to explore at infinite speed. Such instantaneous exploration realizes the continuation value immediately and, therefore, averts the flow opportunity cost. The decision maker does so over an interval of alternatives until the drawdown reaches the threshold or becomes less negative, whereupon the optimal speed becomes finite again.11 Despite the more involved technicalities, such instantaneous explorations yield the same economic insights.
5.3 Learning cost
My analysis extends to the experimentation problem with an adjustment friction in place of the learning cost. I model the adjustment friction by restricting the chosen alternative to be continuous, with a given Lipschitz constant; in other words, the speed of change is uniformly bounded. The key difference is that the decision maker can no longer exploit the best known alternative instantaneously, but can only take time to backtrack from the frontier.
I refer to the highest value attainable by backtracking from a given alternative as the exploitation value. The exploitation value–frontier utility pair can be shown to be a Markov process over the domain of alternatives.
Similar to the case with a learning cost, the optimal strategy is to explore until the generalized drawdown—i.e., the difference between the exploitation value and the frontier utility—exceeds a threshold, and then backtrack to exploit the alternative that attains the exploitation value. While the maximum utility increases monotonically during exploration, the exploitation value decreases at a rate proportional to the generalized drawdown, reflecting the lower flow utility when backtracking from the frontier. As a result, the exploited alternative may differ from the best one.
Appendix: Proofs
A.4 Proof of Lemma 1
A.5 Proof of Lemma 2
This lemma adapts Proposition 1.1.5 in Chapter V of Revuz and Yor (2013) to my setting, where the time change is indexed by calendar time t rather than by the alternative x. The time change may remind the reader of the celebrated Dambis–Dubins–Schwarz theorem (see, for example, Theorem 1.1.6 in Chapter V of Revuz and Yor (2013)). However, my model specifies a standard Brownian motion (and the utility process) as a primitive and then constructs its time change. By comparison, the theorem specifies a continuous martingale and then shows the existence of a Brownian motion in a possibly enlarged probability space such that the martingale is a time change of that Brownian motion. In particular, my lemma does not require such an enlargement.
Proof of Lemma 2. Define Z as the time-changed process. The process Z is continuous by composition and is adapted by the definition of the time change. Its expectation satisfies
The same argument shows that is an -martingale, because is a uniformly integrable -martingale on for all . Therefore, Z is square-integrable and its quadratic variation is by definition. □
A.6 Proof of Theorem 1
Following the conjectures in the main text, I construct a candidate strategy (Proposition 5 and Corollary 3) and then verify its optimality and uniqueness.
Proposition 5.There exist constants and , and a convex function with such that v satisfies HJB (3) and boundary conditions (5), (6), and (7). For such v, the unique maximizer is given by the FOC (4).
Proof. I prove existence for the free boundary problem by the shooting method. First, I transform the HJB into an autonomous ordinary differential equation (ODE) by a change of variables. Second, I prove existence for an initial value problem that omits the value matching condition and the smooth pasting condition (Lemma 3). Third, I derive the monotonicity (Lemma 4) and asymptotic properties (Lemma 5 and Lemma 6) of such solutions. Fourth, I show that the two omitted conditions hold for one such solution, which, therefore, solves the free boundary problem (Lemma 7).
Consider the change of variables . By substituting the speed of exploration (8) into HJB (3), I obtain
Define to be 0 for and for . For so that , consider the initial value problem of ODE (9) with initial conditions and . Such exists because .
Lemma 3.There exists a unique solution ϕ to the initial value problem. Moreover, .
Proof.By the Picard–Lindelöf theorem, there exists a unique solution to the initial value problem on , where is well defined.
To show by contradiction, I suppose . For , I have because by continuity. The supremum is attained also by continuity. Moreover, I have and, thus, . Differentiating ODE (9) at , I obtain
Because implies , the solution exists and is unique on . □
Lemma 4.The function is continuously differentiable with respect to , and its derivative is positive and strictly decreasing in y.
Proof.Because , the derivative of ϕ with respect to is the solution to the variational equation
To obtain a contradiction, I suppose . The supremum is attained because . I have for and for . At , I have as . Therefore, for sufficiently small , which is a contradiction. □
Lemma 5. For an initial value sufficiently close to zero, the solution intersects the 45-degree line, −y.
Proof.For the case of , I have . The initial conditions satisfy and . Moreover, and so ϕ exists and is unique in a neighborhood of 0. I have for some close enough to zero.
For the case of , I have . The solution is the constant function . It satisfies for some .
In either case, I have for some . By the continuity of ϕ with respect to , I have for sufficiently small . Because , intersects with −y by the intermediate value theorem. □
Lemma 6. For a sufficiently large initial value, the solution does not intersect the 45-degree line, −y.
Proof.For the case of , define
For the case of , define
Lemma 7.There exist and such that on with equality at and .
Proof. Let the supremum be taken over initial values whose solutions intersect the 45-degree line. It exists by Lemma 5 and Lemma 6. Let denote the corresponding solution. Let denote a maximizing sequence. For each n, let denote the solution for and let denote the first intersection. The sequence is uniformly bounded by L by the proof of Lemma 6 and, therefore, admits a converging subsequence. With a slight abuse of notation, let denote the converging subsequence. Define .
For all n, I have because intersects −y from above. In addition, is uniformly bounded from below by −1 on , because is increasing in the initial value and decreasing in y by Lemma 4.
The first derivative is uniformly bounded by and so is uniformly Lipschitz. Moreover, is uniformly bounded by on because . The Arzelà–Ascoli theorem therefore implies that the sequence admits a uniformly converging subsequence. The uniqueness of the initial value problem further implies that the limit is . Moreover, the uniform convergence implies ; i.e., intersects with the 45-degree line. It also implies that on
I show that is uniformly bounded on . For the case of , I have
The uniformly bounded implies that is uniformly Lipschitz. Because the first derivative is uniformly bounded between , the Arzelà–Ascoli theorem implies that admits a uniformly converging subsequence. By the uniqueness of the initial value problem, the limit is . Moreover, the uniform convergence implies .
It remains to show . Suppose . I have for sufficiently small . By the continuity of the initial value problem, , so intersects the 45-degree line, which contradicts the fact that is the supremum. □
Because solves ODE (9) and the corresponding boundary conditions, I obtain the desired function v by reverting the change of variables via . □
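The construction in Lemmas 3–7 can be prototyped numerically. The sketch below is illustrative only: it assumes a quadratic learning cost c(s) = kappa*s^2/2 and a heuristic HJB, r*v(y) = y + max over s of { s*(mu*v'(y) + 0.5*sigma^2*v''(y)) - c(s) }, suggested by the verbal description of the model but not necessarily identical to the paper's equation (3). It also bisects directly over the free boundary, rather than over the initial value at zero as in the proof, against an assumed reflection condition v'(0) = 1/r at a new running maximum.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative-only parameters and functional forms: quadratic learning cost
# c(s) = kappa*s^2/2 and the heuristic HJB
#   r*v(y) = y + max_s { s*(mu*v'(y) + 0.5*sigma^2*v''(y)) - c(s) },
# a stand-in suggested by the text, not necessarily the paper's equation (3).
mu, sigma, r, kappa = 0.0, 1.0, 0.5, 1.0

def rhs(y, state):
    """ODE for (v, v') on [y_hat, 0], obtained by inverting the maximized HJB."""
    v, dv = state
    # 0.5*sigma^2*v'' + mu*v' = sqrt(2*kappa*(r*v - y)); nonnegative root so the speed is >= 0
    d2v = (np.sqrt(2.0 * kappa * max(r * v - y, 0.0)) - mu * dv) / (0.5 * sigma**2)
    return [dv, d2v]

def shoot(y_hat):
    """Integrate from a candidate threshold y_hat to 0 with value matching and
    smooth pasting, v(y_hat) = v'(y_hat) = 0, and return v'(0)."""
    sol = solve_ivp(rhs, (y_hat, 0.0), [0.0, 0.0], max_step=1e-2)
    return sol.y[1, -1]

# Bisection over the free boundary: v'(0) should match 1/r (an assumed reflection
# condition at a new running maximum); v'(0) rises as y_hat becomes more negative.
lo, hi = -10.0, -1e-6
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if shoot(mid) > 1.0 / r:
        lo = mid    # v'(0) too large: the threshold should be less negative
    else:
        hi = mid    # v'(0) too small: the threshold should be more negative
print(f"approximate drawdown threshold: {0.5 * (lo + hi):.4f}")
```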
Proposition 5 allows me to extend the construction from to .
Corollary 3.Any function v defined in Proposition 5, with the extension of value 0 for , solves the HJB equation
Remark 3.When , the solution to the free boundary problem can also be characterized by
Proof of Part (i) of Theorem 1.Define the candidate continuation value function as for . Note that the function is convex. Recall that .
For any strategy , Itô's lemma gives
Taking expectation (over ) of (10), I obtain
Including the flow utility and learning costs over and taking expectations, I obtain
For , the chosen alternative is at the frontier and so . In this case, Corollary 3 implies that the integrand is weakly negative. For , I have . The integrand is also weakly negative since v is weakly positive.
By construction, attains the maximum integrand (at zero) over all s because it satisfies the HJB equation in Corollary 3. For this strategy, the speed is bounded and so the learning cost is uniformly integrable. As , the terminal value, , vanishes and so the value of is given by .
Consider an arbitrary strategy . The growth condition implies that is uniformly integrable. Fatou's lemma gives
Proof of Part (ii) of Theorem 1.For any optimal strategy, the integrand in (11) is zero t-almost everywhere with probability 1. Within the threshold , it is uniquely maximized by . Beyond the threshold , the integrand is maximized if and only if and . Therefore, the process is unique. Within the threshold, and is, therefore, unique. For the unique X, the threshold is reached only when . Then is unique almost surely. □
A.7 Proof of Proposition 2
The proof of Proposition 2 relies on two single-crossing lemmata about the solution to initial value problem (9) with respect to model parameters (Lemma 9 and Lemma 10). They imply the monotonicity of and (Corollary 4 and Corollary 5). Given the monotonic endpoints, the single-crossing lemmata strengthen the monotonicity results to the functions v (Lemma 11) and s pointwise (Lemma 12).
Consider two sets of parameters labeled by that are identical except either , , , or . Recall that .
Lemma 8.If , then pointwise.
For the proof, see the proof of Proposition 5(c) in Moscarini and Smith (2001).
For , let denote the solution to ODE (9) subject to initial condition and from until , for parameters i.
Lemma 9. (Single crossing, starting from 0)
- If , then and for all .
- If , then there is at most one such that . Moreover, and for all .
Proof.I first prove the lemma for either or . By Lemma 8, pointwise.
For the case of , I prove on by showing there. Suppose to the contrary that . It is attained by continuity of s and in the neighborhood of 0, because . Moreover, on because there. At , I have
For the case of , I show that and crosses at most once. Let denote the first crossing. Because , I have . Whenever , I have , and so for sufficiently small .12 I continue to show for all by showing there. Suppose to the contrary that . It is attained by continuity of ϕs. In addition, I have either or and . Moreover, on because there. At , I have
The proofs for single crossing with respect to μ and are analogous: the only difference lies in the derivation of the ordering between s. For μ, I obtain the ordering by noting and, thus, for all interior y. For , I note and so . □
Corollary 4.Let denote the option value at 0 for parameters i. Then .
Proof.Suppose otherwise. I have by Lemma 9. Because touches the 45-degree line, the function will never intersect the line, which is a contradiction. □
For , let denote the solution to ODE (9) for parameters i from until , subject to initial conditions and , where . The solution exists at least for by construction in Proposition 5.
Lemma 10. (Single crossing, starting from )
- If , then and for .
- If , then there exists at most one such that . Moreover, and for .
The proof is analogous to Lemma 9. The only difference is the starting point y instead of 0. For , I have and by the convexity of . The reverse goes for .
Corollary 5.Let denote the drawdown threshold for parameter i. Then .
Proof.Suppose otherwise. I have , a contradiction. □
Lemma 11.Let denote the option value for parameters i. Then pointwise.
Proof.It suffices to show that pointwise.
Suppose otherwise. There exists such that by the intermediate value theorem. By Lemma 9, I have and so does not intersect the 45-degree line on . Moreover, and so the convex does not intersect the 45-degree line on . Therefore, does not intersect with the 45-degree line, which is a contradiction. □
Lemma 12.Let denote the speed of exploration for parameters i. Then pointwise.
A.8 Proof of Corollary 1
Define . It is increasing because g is increasing. Moreover, elementary calculus shows that is decreasing whenever is increasing.
A.9 Proof of Proposition 3
Lemma 13.Let z denote a Brownian motion with drift μ and volatility reflected at . Then the pairs of processes and share the same law, where is the local time of z at 0.
For the proof, see Peskir (2006).
Proof of Proposition 3.Lemma 13 implies that the measure is absolutely continuous and its density satisfies the Kolmogorov forward equation
This drift-diffusion problem with absorbing and reflecting boundary conditions can be solved by Fourier series; see Chapter 4 of Strauss (2007). The solution converges asymptotically to the leading eigenfunction, exponentially at the rate of the leading eigenvalue, denoted by λ.
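A finite-difference sketch of this eigenvalue computation (grid, threshold, and parameters are illustrative, and the driftless case is taken for simplicity): discretize the generator of the drawdown magnitude on the interval between zero and the threshold a, with a reflecting boundary at zero and an absorbing boundary at a, and read off the leading eigenvalue and eigenfunction. In this driftless case the leading eigenvalue has the closed form sigma^2*pi^2/(8*a^2), which the discretization recovers.

```python
import numpy as np

# Illustrative parameters: driftless drawdown (mu = 0), volatility sigma, absorbing
# boundary at the threshold a (exploration stops), reflecting boundary at 0 (the
# running maximum is renewed).
sigma, a, n = 1.0, 1.0, 400
h = a / n

# Finite-difference generator 0.5*sigma^2 * d^2/dd^2 on the nodes d_i = i*h,
# i = 0, ..., n-1, with a mirrored ghost node at d = 0 and the value 0 at d = a.
L = np.zeros((n, n))
for i in range(n):
    L[i, i] = -2.0
    if i > 0:
        L[i, i - 1] = 1.0
    if i < n - 1:
        L[i, i + 1] = 1.0
L[0, 1] = 2.0                      # reflecting boundary: the ghost node equals its mirror
L *= 0.5 * sigma**2 / h**2

eigvals, eigvecs = np.linalg.eig(L)
order = np.argsort(-eigvals.real)  # leading (least negative) eigenvalue first
lam = -eigvals.real[order[0]]
phi = np.abs(eigvecs[:, order[0]].real)

print(f"numerical leading eigenvalue  : {lam:.5f}")
print(f"analytic sigma^2*pi^2/(8*a^2) : {sigma**2 * np.pi**2 / (8 * a**2):.5f}")
grid = np.arange(n) * h
print("eigenfunction close to cos(pi*d/(2a)):",
      np.allclose(phi / phi[0], np.cos(np.pi * grid / (2 * a)), atol=1e-2))
```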
I note that for . □
A.10 Proof of Corollary 2
The proof of Proposition 3 implies that the distribution of decays to zero exponentially at rate λ. Because , I have . Therefore, the distribution of τ also decays to zero exponentially at rate λ.
A.11 Proof of Proposition 4
Similar to Proposition 3, I index the utility process U by the alternatives instead of time by undoing the time change . Because the threshold strategy is invariant to the continuous time change, the utility process is stopped at the alternative , where .
I prove Proposition 4 by establishing the linear relationship between and , and then translating the relationship from to . I first approximate the tail distribution of (Lemma 14). I continue to derive the expectation of conditional on , using that of conditional on (Lemma 15). I use this indirect method because is not adapted to the natural filtration , but is. Finally, I translate the conditional expectation to conditional on by the marginal distribution of approximated in Lemma 14 and the Markov inequality.
Lemma 14.Suppose . Then has an exponential tail with exponent λ.
Proof.Proposition 3 implies that has an exponential tail, and so there exist such that . Because , the tail probability of is bounded from above by .
The difference is independent of due to the strong Markov property. Recall that Brownian motion near an extremum follows a Bessel-3 process starting at zero. Therefore, z follows the distribution of the hitting time of that process. Its density follows an exponential distribution asymptotically.13 Because the stopping alternative is distributed as the hitting time of a Bessel-1 process, i.e., reflected Brownian motion, z has a thinner exponential tail than . Because and z are independent and sum to , the tail of must be at least as thick as the tail of . □
Lemma 15.Suppose . Then and satisfy
Proof.Because by definition, is independent of due to the strong Markov property of Brownian motion. Therefore, it suffices to compute and .
First, I compute . To simplify notation, I will compute it under the normalization . For standard Brownian motion starting at , let H denote the exit time from . The expectation of conditional on exiting at z is14 . Because has finite expectation, the dominated convergence theorem applies and the conditional expectation of H is given by .
Due to the quadratic variation of Brownian motion, the escape time converges to almost surely as . By the dominated convergence theorem, the conditional expectation converges as well, i.e., .
Second, I compute . For a standard Brownian motion and , the conditional Laplace transform satisfies
Lehoczky (1977) shows that is exponentially distributed with mean . Using his result, I obtain the tail conditional expectation of by integrating the conditional expectation
For the utility process U with , the Brownian scaling property implies that the conditional expectation for volatility and threshold σz is
Proof of Proposition 4.Note that the coefficient of x is, in fact, . See the proof of Proposition 3.
I first show . The Markov inequality applied to Lemma 15 yields
I continue to show . For a standard Brownian motion stopped at drawdown , the Laplace transform of conditional on satisfies16
The independence between and implies
Given the distribution of in Lehoczky (1977), the density of conditional on is bounded by