Large Deviation Analysis of a Droplet Model Having a Poisson Equilibrium Distribution
Abstract
In this paper we use large deviation theory to determine the equilibrium distribution of a basic droplet model that underlies a number of important models in material science and statistical mechanics. Given a nonnegative integer b and c > b, K distinguishable particles are placed, each with equal probability 1/N, onto the N sites of a lattice, where K/N equals c. We focus on configurations for which each site is occupied by a minimum of b particles. The main result is the large deviation principle (LDP), in the limit K → ∞ and N → ∞ with K/N = c, for a sequence of random number-density measures, which are the empirical measures of dependent random variables that count the droplet sizes. The rate function in the LDP is the relative entropy R(θ∣ρ∗), where θ is a possible asymptotic configuration of the number-density measures and ρ∗ is a Poisson distribution restricted to the set of integers n satisfying n ≥ b, with its parameter chosen so that ρ∗ has mean c. This LDP implies that ρ∗ is the equilibrium distribution of the number-density measures, which in turn implies that ρ∗ is the equilibrium distribution of the random variables that count the droplet sizes.
1. Introduction
This paper is motivated by a natural question for a basic model of a droplet. Given a nonnegative integer b and c > b, K distinguishable particles are placed, each with equal probability 1/N, onto the N sites of a lattice ΛN = {1,2, …, N}. Under the assumption that K/N = c and that each site is occupied by a minimum of b particles, what is the equilibrium distribution, as N → ∞, of the number of particles per site? We prove in Corollary 3 that this equilibrium distribution is a Poisson distribution restricted to the set of integers n satisfying n ≥ b, with its parameter chosen so that the restricted distribution has mean c. As we explain near the end of the Introduction, this equilibrium distribution has important applications to technologies using sprays and powders.
As in many other models in statistical mechanics, we can identify the equilibrium distribution by exhibiting it as the unique minimum point of a rate function in a large deviation principle (LDP). Other models for which this procedure can be implemented are discussed at the end of the Introduction.
For the droplet model we prove the LDP for a sequence of random probability measures, called number-density measures, which are the empirical measures of a sequence of dependent random variables that count the droplet sizes. This LDP is stated in Theorem 1. Our proof is self-contained and starts from first principles, using techniques that are familiar in applied mathematics and statistical mechanics. For example, the proof of the local large deviation estimate in Theorem 5, a key step in the proof of the LDP for the number-density measures, is based on combinatorics, Stirling’s formula, and Laplace asymptotics.
Our use of combinatorial methods goes back to Boltzmann in his work on the discrete ideal gas. He calculated the Maxwell-Boltzmann equilibrium distribution for this system by analyzing the asymptotic behavior of a particular multinomial coefficient [1]. Starting with Boltzmann’s work, combinatorial methods have remained an important tool in both statistical mechanics and in the theory of large deviations, offering insights into a wide variety of physical and mathematical phenomena via techniques that are elegant, powerful, and often elementary. In applications to statistical mechanics, this state of affairs is explained by the observation that “many fundamental questions … are inherently combinatorial, … including the Ising model, the Potts model, monomer-dimer systems, self-avoiding walks and percolation theory” [2]. For the two-dimensional Ising model and other exactly soluble models, [3, 4] are recommended.
A similar situation holds in the theory of large deviations. For example, Section 2.1 of [5] discusses combinatorial techniques for finite alphabets and points out that because of the concreteness of these applications the LDPs are proved under much weaker conditions than the corresponding results in the general theory, into which the finite-alphabet results give considerable insight. The text [6] devotes several early sections to large deviation results for i.i.d. random variables having a finite state space and proved by combinatorial methods, including a sophisticated, level-3 result for the empirical pair measure.
In order to formulate the LDP for the number-density measures in our droplet model, a standard probabilistic model is introduced. The configuration space is the set $\Omega_N = \Lambda_N^K$ consisting of all $\omega = (\omega_1, \omega_2, \ldots, \omega_K)$, where $\omega_i$ denotes the site in $\Lambda_N$ occupied by the $i$th particle. The cardinality of $\Omega_N$ equals $N^K$. Denote by $P_N$ the uniform probability measure that assigns equal probability $1/N^K$ to each of the $N^K$ configurations $\omega \in \Omega_N$. The asymptotic analysis of the droplet model involves two random variables, which are functions of the configuration $\omega \in \Omega_N$: for $\ell \in \Lambda_N$, $K_\ell(\omega)$ denotes the number of particles occupying the site $\ell$ in the configuration $\omega$; for $j \in \mathbb{N} \cup \{0\}$, $N_j(\omega)$ denotes the number of sites $\ell \in \Lambda_N$ for which $K_\ell(\omega) = j$.
In order to carry out the asymptotic analysis of the droplet model, we introduce a quantity $m = m(N)$ that converges to $\infty$ sufficiently slowly with respect to $N$; specifically, we require that $m(N)^2/N \to 0$ as $N \to \infty$. In terms of $b$ and $m$ we define the subset $\Omega_{N,b,m}$ of $\Omega_N$ consisting of all configurations $\omega$ for which every site of $\Lambda_N$ is occupied by at least $b$ particles and at most $m$ of the quantities $N_j(\omega)$ are positive. This second condition is a useful technical device that allows us to control the errors in several estimates. In Appendix D of [7] we present evidence supporting the conjecture that this condition can be eliminated. The discussion in that appendix involves a number of interesting topics including Stirling numbers of the second kind (see [8, pp. 96-97] and [9, §5.4]) and their asymptotic behavior [10, Example 5.4].
The random quantities in the droplet model for which we formulate an LDP are the number-density measures $\Theta_{N,b}$. For $\omega \in \Omega_{N,b,m}$ these random probability measures assign to $j \in \mathbb{N}_b = \{n \in \mathbb{Z} : n \ge b\}$ the probability $N_j(\omega)/N$, which is the number density of the $j$th droplet class. Because of the two conservation laws in (1) and because $K/N = c$, for $\omega \in \Omega_{N,b,m}$, $\Theta_{N,b}(\omega)$ is a probability measure on $\mathbb{N}_b$ having mean $c$. Thus $\Theta_{N,b}$ takes values in $\mathcal{P}_{\mathbb{N}_b,c}$, which is defined to be the set of probability measures on $\mathbb{N}_b$ having mean $c$.
The probability measure PN,b,m defining the droplet model is obtained by restricting the uniform measure PN to the set of configurations ΩN,b,m. Thus PN,b,m equals the conditional probability PN(·∣ΩN,b,m). In the language of statistical mechanics PN,b,m defines a microcanonical ensemble that incorporates the conservation laws for number and mass expressed in (1).
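The content of Theorem 1 is the following: as $N \to \infty$, the sequence of number-density measures $\Theta_{N,b}$ satisfies the LDP on $\mathcal{P}_{\mathbb{N}_b,c}$ with respect to the measures $P_{N,b,m}$. The rate function is the relative entropy $R(\theta|\rho_{b,\alpha})$ of $\theta \in \mathcal{P}_{\mathbb{N}_b,c}$ with respect to the Poisson distribution $\rho_{b,\alpha}$ on $\mathbb{N}_b$ having components $\rho_{b,\alpha;j} = [Z_b(\alpha)]^{-1} \cdot \alpha^j/j!$ for $j \in \mathbb{N}_b$. In this formula $Z_b(\alpha)$ is the normalization that makes $\rho_{b,\alpha}$ a probability measure, and $\alpha$ equals the unique value $\alpha_b(c)$ for which $\rho_{b,\alpha_b(c)}$ has mean $c$ [Theorem A.2]. Using the fact that $R(\theta|\rho_{b,\alpha_b(c)})$ equals 0 at the unique measure $\theta = \rho_{b,\alpha_b(c)}$, we apply the LDP for $\Theta_{N,b}$ to conclude in Theorem 2 that $\rho_{b,\alpha_b(c)}$ is the equilibrium distribution of $\Theta_{N,b}$. Corollary 3 then implies that $\rho_{b,\alpha_b(c)}$ is also the equilibrium distribution of the droplet-size random variables $K_\ell$.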
The space $\mathcal{P}_{\mathbb{N}_b,c}$ is the most natural space on which to formulate the LDP for $\Theta_{N,b}$ in Theorem 1. Not only is $\mathcal{P}_{\mathbb{N}_b,c}$ the smallest convex set of probability measures containing the range of $\Theta_{N,b}$ for all $N \in \mathbb{N}$, but also the union over $N \in \mathbb{N}$ of the range of $\Theta_{N,b}$ is dense in $\mathcal{P}_{\mathbb{N}_b,c}$. As we explain in part (a) of Theorem 4, $\mathcal{P}_{\mathbb{N}_b,c}$ is not a complete, separable metric space, a situation that prevents us from directly applying general results in the theory of large deviations that require the setting of a complete, separable metric space.
The droplet model is defined in Section 2. Step 1 in the proof of the LDP for $\Theta_{N,b}$ is to derive the local large deviation estimate in part (b) of Theorem 5. This local estimate, one of the centerpieces of the paper, gives information not available in the LDP for $\Theta_{N,b}$, which involves global estimates. Step 2 is to lift the local large deviation estimate to the large deviation limit for $\Theta_{N,b}$ lying in open balls and certain other subsets of $\mathcal{P}_{\mathbb{N}_b,c}$. Step 3 is to lift the large deviation limit for open balls and certain other subsets to the LDP for $\Theta_{N,b}$ stated in Theorem 1. Step 1 is carried out in Section 3, and Steps 2 and 3 are explained in Section 4.
Details of Steps 2 and 3 as well as other routine proofs are omitted from the present paper. They appear in the unpublished companion paper [7], which also contains additional background material. The paper [1] explores how our work on the droplet model was inspired by the work of Ludwig Boltzmann on a simple model of a discrete ideal gas. The main connection is via the local large deviation estimate in part (b) of Theorem 5. When b = 0, the LDP for a path version of Θn,0 with K = tN and t > 0 varying appears in [11, 12].
The main application of the results in this paper is to technologies using sprays and powders, which are ubiquitous in many fields, including agriculture, the chemical and pharmaceutical industries, consumer products, electronics, manufacturing, material science, medicine, mining, paper making, the steel industry, and waste treatment. In this paper we focus on sprays; our theory also applies to powders with only changes in terminology [13]. The behavior of sprays might be complex depending on various parameters including evaporation, temperature, and viscosity. Our goal here is to consider the simplest model where the only assumption is made on the average size of droplets in the spray. In many situations it is important to have good control over the sizes of the droplets, which can be translated into properties of probability distributions. The size distributions are important because they determine reliability and safety in each particular application.
Interestingly, there does not seem to be a rigorous theory that predicts the equilibrium distribution of droplet sizes, analogous to the Maxwell-Boltzmann distribution of energy levels in a discrete ideal gas [14, 15]. Our goal in the present paper is to provide such a theory. We do so by focusing on one aspect of the problem related to the relative entropy, an approach that characterizes the equilibrium distribution of droplet sizes as being a Poisson distribution restricted to the set of integers n satisfying n ≥ b. We expect that this distribution will be important in experimental observations. A full understanding of droplet behavior under dynamic conditions requires treating many other aspects and is beyond the scope of this paper. We plan to apply the ideas in this paper to understand the entropy of dislocation networks [16].
The importance of predicting droplet size can be seen from the wide range of applications utilizing sprays [17, 18]. Because of the importance of this problem, novel approaches for measuring the droplet-size distribution in sprays have been developed [19–23]. What makes the problem of predicting droplet size particularly interesting is the complexity of the droplet-size distribution, which is attributed to many factors such as temperature and viscosity. As [24] shows, even the nozzle plays a significant role in the outcome. The theoretical tools used to understand the distribution of droplet size in sprays include entropy [25], which also plays a key role in the present paper.
We end the Introduction by expanding on a comment made at the beginning of this section. This comment concerns one of the main applications of large deviation theory in statistical mechanics, which is to identify the equilibrium distribution or distributions of a model as the minimum point(s) of the rate function in an LDP for the model. This procedure is also useful to study phase transitions in the model, which concern how the structure of the set of equilibrium distributions changes as the parameters defining the model change. There are numerous other models for which this procedure has been used. They include the following three lattice spin models: the Curie-Weiss spin system, the Curie-Weiss-Potts model, and the mean-field Blume-Capel model, which is also known as the mean-field BEG model. As explained in the respective Sections 6.6.1, 6.6.2, and 6.6.3 of [26], the large deviation analysis shows that each of these three models has a different phase transition structure. Details of the analysis for the three models are given in the references [6, §IV.4], [27–29]. Section 9 of [30] outlines how large deviation theory can be applied to determine equilibrium structures in statistical models of two-dimensional turbulence. Details of this analysis are given in [31].
2. Definition of Droplet Model and Main Theorem
After defining the droplet model, we state the main theorem in the paper, Theorem 1. The content of this theorem is the LDP for the sequence of random, number-density measures, which are the empirical measures of a sequence of dependent random variables that count the droplet sizes in the model. As we show in Theorem 2 and in Corollary 3, the LDP enables us to identify a Poisson distribution as the equilibrium distribution both of the number-density measures and of the droplet-size random variables. In Theorem 4 we prove a number of properties of two spaces of probability measures in terms of which the LDP for the number-density measures is formulated.
We start by fixing parameters $b \in \mathbb{N} \cup \{0\}$ and $c \in (b, \infty)$. The droplet model is defined by a probability measure $P_{N,b}$ parameterized by $N \in \mathbb{N}$ and the nonnegative integer $b$. The measure depends on two other positive integers, $K$ and $m$, where $2 \le m \le N < K$. Both $K$ and $m$ are functions of $N$ in the large deviation limit $N \to \infty$. In this limit we take $K \to \infty$ and $N \to \infty$, where $K/N$, the average number of particles per site, equals $c$. Thus $K = Nc$. In addition, we take $m \to \infty$ sufficiently slowly by choosing $m$ to be a function $m(N)$ satisfying $m(N) \to \infty$ and $m(N)^2/N \to 0$ as $N \to \infty$; for example, $m(N) = N^\delta$ for some $\delta \in (0, 1/2)$. Throughout this paper we fix such a function $m(N)$. The parameter $b$ and the function $m = m(N)$ first appear in the definition of the set of configurations $\Omega_{N,b,m}$ in (3), where these quantities will be explained.
Because K and N are integers, c must be a rational number. This in turn imposes a restriction on the values of N and K. If c is a positive integer, then N → ∞ along the positive integers and K → ∞ along the subsequence K = cN. If c = x/y, where x and y are relatively prime, positive integers with y ≥ 2, then N → ∞ along the subsequence N = yn for n ∈ ℕ and K → ∞ along the subsequence K = cN = xn. Throughout this paper, when we write N ∈ ℕ or N → ∞, it is understood that N and K satisfy the restrictions discussed here.
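The restriction on N and K can be illustrated with a short sketch. The function name `admissible_pairs` is hypothetical and is introduced here only for illustration; it lists the first few pairs (N, K) with K/N = c exactly, for a rational c written in lowest terms as x/y.

```python
from fractions import Fraction

def admissible_pairs(c, count):
    """Return the first `count` pairs (N, K) of positive integers with K / N == c.

    c is reduced to lowest terms x / y, and the pairs are (y * n, x * n), n = 1, 2, ...
    When c is an integer, y = 1 and N runs through all positive integers.
    """
    c = Fraction(c)
    x, y = c.numerator, c.denominator
    return [(y * n, x * n) for n in range(1, count + 1)]

# Example with c = 5/2: N runs through the even integers and K = 5N/2.
pairs = admissible_pairs(Fraction(5, 2), 3)
print(pairs)
```

For c = 5/2 the admissible values of N are the multiples of 2, in agreement with the subsequence N = yn described above.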
In the droplet model $K$ distinguishable particles are placed, each with equal probability $1/N$, onto the sites of the lattice $\Lambda_N = \{1, 2, \ldots, N\}$. This simple description corresponds to a simple probabilistic model. The configuration space is the set $\Omega_N = \Lambda_N^K$ consisting of all sequences $\omega = (\omega_1, \omega_2, \ldots, \omega_K)$, where $\omega_i \in \Lambda_N$ denotes the site in $\Lambda_N$ occupied by the $i$th particle. Let $\rho^{(N)}$ be the measure on $\Lambda_N$ that assigns equal probability $1/N$ to each site in $\Lambda_N$, and let $P_N = (\rho^{(N)})^K$ be the product measure on $\Omega_N$ with equal one-dimensional marginals $\rho^{(N)}$. Thus $P_N$ is the uniform probability measure that assigns equal probability $1/N^K$ to each of the $N^K$ configurations $\omega \in \Omega_N$; for subsets $A$ of $\Omega_N$ we have $P_N(A) = \operatorname{card}(A)/N^K$, where card denotes cardinality.
The asymptotic analysis of the droplet model involves two random variables. For $\ell \in \Lambda_N$ and $\omega \in \Omega_N$, $K_\ell(\omega)$ denotes the number of particles occupying site $\ell$ in the configuration $\omega$. For $j \in \mathbb{N} \cup \{0\}$ and $\omega \in \Omega_N$, $N_j(\omega)$ denotes the number of sites $\ell \in \Lambda_N$ for which $K_\ell(\omega) = j$. The dependence of $K_\ell(\omega)$ and $N_j(\omega)$ on $N$ is not indicated in the notation. Because the distributions of both random variables depend on $N$, both $K_\ell$ and $N_j$ form triangular arrays.
The constraint restricting the number of positive components of N(ω) is a useful technical device that allows us to control the errors in several estimates. In Appendix D of [7] we give evidence supporting the conjecture that this restriction can be eliminated.
When $b$ is a positive integer, for each $\omega \in \Omega_{N,b,m}$, each site in $\Lambda_N$ is occupied by at least $b$ particles. In this case it is useful to think of each particle as having one unit of mass and of the set of particles at each site as defining a droplet. With this interpretation, for each configuration $\omega$, $K_\ell(\omega)$ denotes the mass or the size of the droplet at site $\ell$. The $j$th droplet class has $N_j(\omega)$ droplets and mass $jN_j(\omega)$. Because the number of sites in $\Lambda_N$ equals $N$ and the sum of the masses of all the droplet classes equals $K$, it follows that the quantities $N_j(\omega)$ satisfy the two conservation laws in (1) for all $\omega \in \Omega_{N,b,m}$; that is, $\sum_j N_j(\omega) = N$ and $\sum_j jN_j(\omega) = K$.
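The two conservation laws can be checked on a sampled configuration. The following is a minimal sketch, not the construction used in the proofs: it samples from the unconditioned uniform measure on configurations (no minimum occupancy b and no restriction involving m), and the helper names `sample_configuration`, `site_counts`, and `class_counts` are hypothetical.

```python
import random
from collections import Counter

def sample_configuration(N, K, rng):
    """Place K distinguishable particles independently and uniformly on sites 1..N."""
    return [rng.randint(1, N) for _ in range(K)]

def site_counts(omega, N):
    """Number of particles at each site 1..N (the droplet sizes)."""
    counts = Counter(omega)
    return [counts.get(site, 0) for site in range(1, N + 1)]

def class_counts(sizes):
    """Map j to the number of sites that hold exactly j particles."""
    return Counter(sizes)

rng = random.Random(0)
N, K = 200, 500          # K / N = c = 5/2
omega = sample_configuration(N, K, rng)
sizes = site_counts(omega, N)
Nj = class_counts(sizes)

total_sites = sum(Nj.values())                  # conservation of number
total_mass = sum(j * n for j, n in Nj.items())  # conservation of mass
print(total_sites, total_mass)
```

Both sums are fixed by the construction, so they equal N and K for every sampled configuration, whatever the seed.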
We now consider the modifications that must be made in these definitions when $b = 0$. In this case the first constraint in the definition of $\Omega_{N,b,m}$ disappears because we allow sites to be occupied by 0 particles, and therefore $N_j(\omega)$ is indexed by $j \in \mathbb{N} \cup \{0\}$. On the other hand, we retain the second constraint in the definition of $\Omega_{N,0,m}$, which requires that for any configuration $\omega \in \Omega_{N,0,m}$ at most $m$ of the components $N_j(\omega)$ for $j \in \mathbb{N} \cup \{0\}$ are positive. When $b = 0$, the definition of $\Omega_{N,0,m}$ becomes $\Omega_{N,0,m} = \{\omega \in \Omega_N : |N(\omega)|_+ \le m = m(N)\}$, where $|N(\omega)|_+$ denotes the number of positive components of $N(\omega)$. Because the choice $b = 0$ allows sites to be empty, we lose the interpretation of the set of particles at each site as being a droplet. However, for $\omega \in \Omega_{N,0,m}$ the two conservation laws in (1) continue to hold.
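We next introduce several spaces of probability measures that arise in the large deviation analysis of the droplet model. $\mathcal{P}_{\mathbb{N}_b}$ denotes the set of probability measures on $\mathbb{N}_b = \{n \in \mathbb{Z} : n \ge b\}$. Thus $\theta \in \mathcal{P}_{\mathbb{N}_b}$ has the form $\theta = \sum_{j \in \mathbb{N}_b} \theta_j \delta_j$, where the components $\theta_j$ satisfy $\theta_j \ge 0$ and $\sum_{j \in \mathbb{N}_b} \theta_j = 1$. We say that a sequence of measures $\{\theta^{(n)}, n \in \mathbb{N}\}$ in $\mathcal{P}_{\mathbb{N}_b}$ converges weakly to $\theta \in \mathcal{P}_{\mathbb{N}_b}$, and write $\theta^{(n)} \Rightarrow \theta$, if, for any bounded function $f$ mapping $\mathbb{N}_b$ into $\mathbb{R}$, $\int f \, d\theta^{(n)} \to \int f \, d\theta$ as $n \to \infty$. $\mathcal{P}_{\mathbb{N}_b}$ is topologized by the topology of weak convergence. There is a standard technique for introducing a metric structure on $\mathcal{P}_{\mathbb{N}_b}$, for which we quote the main facts. Because $\mathbb{N}_b$ is a complete, separable metric space with metric $d(x, y) = |x - y|$, there exists a metric $\pi$ on $\mathcal{P}_{\mathbb{N}_b}$ called the Prohorov metric with the following two properties: (1) convergence with respect to the Prohorov metric is equivalent to weak convergence [32, Thm. 3.3.1]; (2) with respect to the Prohorov metric, $\mathcal{P}_{\mathbb{N}_b}$ is a complete, separable metric space [32, Thm. 3.1.7].
We denote by $\mathcal{P}_{\mathbb{N}_b,c}$ the set of measures in $\mathcal{P}_{\mathbb{N}_b}$ having mean $c$. Thus $\theta \in \mathcal{P}_{\mathbb{N}_b,c}$ has the form $\theta = \sum_{j \in \mathbb{N}_b} \theta_j \delta_j$, where the components $\theta_j$ satisfy $\theta_j \ge 0$, $\sum_{j \in \mathbb{N}_b} \theta_j = 1$, and $\sum_{j \in \mathbb{N}_b} j\theta_j = c$. The number-density measures $\Theta_{N,b}$ defined in (5) take values in $\mathcal{P}_{\mathbb{N}_b,c}$.
According to part (a) of Theorem 4, $\mathcal{P}_{\mathbb{N}_b,c}$ is not a closed subset of $\mathcal{P}_{\mathbb{N}_b}$. Hence it is natural to introduce the closure of $\mathcal{P}_{\mathbb{N}_b,c}$ in $\mathcal{P}_{\mathbb{N}_b}$. As we prove in part (b) of Theorem 4, the closure of $\mathcal{P}_{\mathbb{N}_b,c}$ in $\mathcal{P}_{\mathbb{N}_b}$ equals $\mathcal{P}_{\mathbb{N}_b,[b,c]}$, which is the set of measures in $\mathcal{P}_{\mathbb{N}_b}$ having mean lying in the closed interval $[b, c]$. Being the closure of the relatively compact, separable metric space $\mathcal{P}_{\mathbb{N}_b,c}$, $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ is a compact, separable metric space with respect to the Prohorov metric. This space appears in the formulation of the large deviation upper bound in part (c) of Theorem 1.
As a consequence of the fact that $\mathcal{P}_{\mathbb{N}_b,c}$ is not closed in $\mathcal{P}_{\mathbb{N}_b}$, the large deviation upper bound takes two forms depending on whether the subset $F$ of $\mathcal{P}_{\mathbb{N}_b,c}$ is compact or whether $F$ is closed. When $F$ is compact, in part (b) we obtain the standard large deviation upper bound for $F$. When $F$ is closed, in part (c) we obtain a variation of the standard large deviation upper bound, which, when $F$ is compact, coincides with the upper bound in part (b). The refinement in part (c) is important. It is applied in the proof of Theorem 2 to show that $\rho_{b,\alpha_b(c)}$ is the equilibrium distribution of the number-density measures $\Theta_{N,b}$. In turn, Theorem 2 is applied in the proof of Corollary 3 to show that $\rho_{b,\alpha_b(c)}$ is the equilibrium distribution of the droplet-size random variables $K_\ell$.
In the next theorem we assume that $m$ is the function $m(N)$ appearing in the definition of $\Omega_{N,b,m}$ in (3) and satisfying $m(N) \to \infty$ and $m(N)^2/N \to 0$ as $N \to \infty$. The assumption that $m(N)^2/N \to 0$ is used to control error terms in Lemmas 6 and 7 in the present paper and in Lemma B.3 in [7]. This assumption on $m(N)$ is optimal in the sense that it is a minimal assumption guaranteeing that error terms in parts (a) and (b) of Lemma B.3 in [7] converge to 0. In the next theorem, for $A$ a subset of $\mathcal{P}_{\mathbb{N}_b,c}$ or $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ we denote by $R(A|\rho_{b,\alpha_b(c)})$ the infimum of $R(\theta|\rho_{b,\alpha_b(c)})$ over $\theta \in A$.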
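Theorem 1. Fix a nonnegative integer $b$ and a rational number $c \in (b, \infty)$. Let $m$ be the function $m(N)$ appearing in the definition of $\Omega_{N,b,m}$ in (3) and satisfying $m(N) \to \infty$ and $m(N)^2/N \to 0$ as $N \to \infty$. Let $\rho_{b,\alpha_b(c)} \in \mathcal{P}_{\mathbb{N}_b,c}$ be the distribution having the components defined in (7). Then as $N \to \infty$, with respect to the measures $P_{N,b,m}$, the sequence $\Theta_{N,b}$ satisfies the LDP on $\mathcal{P}_{\mathbb{N}_b,c}$ with rate function $R(\theta|\rho_{b,\alpha_b(c)})$ in the following sense.
- (a)
$R(\theta|\rho_{b,\alpha_b(c)})$ maps $\mathcal{P}_{\mathbb{N}_b,c}$ into $[0, \infty]$ and has compact level sets in $\mathcal{P}_{\mathbb{N}_b,c}$; that is, for any $M < \infty$ the set $\{\theta \in \mathcal{P}_{\mathbb{N}_b,c} : R(\theta|\rho_{b,\alpha_b(c)}) \le M\}$ is compact.
- (b)
For any compact subset $F$ of $\mathcal{P}_{\mathbb{N}_b,c}$ we have the large deviation upper bound
$$\limsup_{N \to \infty} \frac{1}{N} \log P_{N,b,m}(\Theta_{N,b} \in F) \le -R(F|\rho_{b,\alpha_b(c)}).$$
- (c)
For any closed subset $F$ of $\mathcal{P}_{\mathbb{N}_b,c}$, let $\overline{F}$ denote the closure of $F$ in $\mathcal{P}_{\mathbb{N}_b,[b,c]}$. We have the large deviation upper bound
$$\limsup_{N \to \infty} \frac{1}{N} \log P_{N,b,m}(\Theta_{N,b} \in F) \le -R(\overline{F}|\rho_{b,\alpha_b(c)}).$$
- (d)
For any open subset $G$ of $\mathcal{P}_{\mathbb{N}_b,c}$ we have the large deviation lower bound
$$\liminf_{N \to \infty} \frac{1}{N} \log P_{N,b,m}(\Theta_{N,b} \in G) \ge -R(G|\rho_{b,\alpha_b(c)}).$$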
The properties of $R(\theta|\rho_{b,\alpha_b(c)})$ in part (a) are proved in [33, Lem. 1.4.1] and part (a) of Theorem A.1. The basic step in proving the large deviation bounds in parts (b)–(d) is the local large deviation estimate in part (b) of Theorem 5. As explained in Section 4, this local estimate is lifted to large deviation limits involving open balls stated in Theorem 8, which in turn are used to derive the bounds in parts (b)–(d) of Theorem 1.
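The role of the relative entropy as a rate function can be made concrete with a small numerical sketch. It assumes b = 0 and c = 2, so that the parameter of the Poisson distribution equals c, and it truncates the Poisson distribution at a finite index; the helper names `poisson_from` and `relative_entropy` and the truncation level `jmax` are hypothetical choices made for illustration.

```python
import math

def poisson_from(b, alpha, jmax):
    """Approximate the Poisson distribution restricted to {b, b+1, ...} by truncating at jmax."""
    weights = {j: math.exp(j * math.log(alpha) - math.lgamma(j + 1))
               for j in range(b, jmax + 1)}
    Z = sum(weights.values())          # truncated normalization
    return {j: w / Z for j, w in weights.items()}

def relative_entropy(theta, rho):
    """R(theta | rho) = sum_j theta_j * log(theta_j / rho_j), with 0 * log 0 = 0."""
    total = 0.0
    for j, t in theta.items():
        if t > 0.0:
            if rho.get(j, 0.0) == 0.0:
                return math.inf        # theta is not absolutely continuous w.r.t. rho
            total += t * math.log(t / rho[j])
    return total

rho = poisson_from(0, 2.0, 60)         # b = 0, c = 2
point_mass = {2: 1.0}                  # another measure with mean 2
r_self = relative_entropy(rho, rho)
r_point = relative_entropy(point_mass, rho)
print(r_self, r_point)
```

The relative entropy vanishes at the Poisson distribution itself and is strictly positive at the point mass, which also has mean 2. This illustrates why the Poisson distribution is singled out as the minimum point of the rate function.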
In the next theorem we use the large deviation upper bound in part (c) of Theorem 1 to prove that the Poisson distribution is the equilibrium distribution of the number-density measures ΘN,b. In this theorem denotes the complement in of the open ball . denotes the complement in of the open ball .
Theorem 2. One assumes the hypotheses of Theorem 1. The following results hold for any ε > 0.
- (a)
The quantity is strictly positive.
- (b)
For any number y in the interval (0, x∗) and all sufficiently large N
()
Proof. The starting point is the large deviation upper bound in part (c) of Theorem 1 applied to the closed set , which is a subset of . We denote the closure of in by . Since , the large deviation upper bound in part (c) of Theorem 1 takes the form
We now apply Theorem 2 to prove that $\rho_{b,\alpha_b(c)}$ is also the equilibrium distribution of the random variables $K_\ell$, which count the droplet sizes at the sites of $\Lambda_N$. This is the content of the next corollary. A fact needed in the proof is that $\Theta_{N,b}$ is the empirical measure of these random variables; that is, for $\omega \in \Omega_{N,b,m}$, $\Theta_{N,b}(\omega)$ assigns to subsets $A$ of $\mathbb{N}_b$ the probability $N^{-1} \sum_{\ell=1}^{N} \delta_{K_\ell(\omega)}(A)$. This representation is valid because both $\Theta_{N,b}(\omega)$ and the empirical measure assign to $j \in \mathbb{N}_b$ the probability $N_j(\omega)/N$.
Corollary 3. One assumes the hypotheses of Theorem 1. Then for any site and any
Proof. Since the random variables are identically distributed, it suffices to prove the corollary for . For fixed , the limit (12) with g(θ) = θj yields
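The last theorem in this section proves several properties of $\mathcal{P}_{\mathbb{N}_b,c}$ and $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ with respect to the Prohorov metric that are needed in the paper.
Theorem 4. Fix a nonnegative integer $b$ and a real number $c \in (b, \infty)$. The metric spaces $\mathcal{P}_{\mathbb{N}_b,c}$ and $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ have the following properties.
- (a)
$\mathcal{P}_{\mathbb{N}_b,c}$, the set of probability measures on $\mathbb{N}_b$ having mean $c$, is a relatively compact, separable subset of $\mathcal{P}_{\mathbb{N}_b}$. However, $\mathcal{P}_{\mathbb{N}_b,c}$ is not a closed subset of $\mathcal{P}_{\mathbb{N}_b}$ and thus is not a compact subset or a complete metric space.
- (b)
$\mathcal{P}_{\mathbb{N}_b,[b,c]}$, the set of probability measures on $\mathbb{N}_b$ having mean lying in the closed interval $[b, c]$, is the closure of $\mathcal{P}_{\mathbb{N}_b,c}$ in $\mathcal{P}_{\mathbb{N}_b}$. $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ is a compact, separable subset of $\mathcal{P}_{\mathbb{N}_b}$.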
Proof. (a) For satisfying ξ ≥ b let Ψξ denote the compact subset {b, b + 1, …, ξ} of , and let [Ψξ] c denote its complement. For any
We now prove that is not a closed subset of by exhibiting a sequence having a weak limit that does not lie in . Let θ be any measure in with mean β ∈ [b, c); thus . The sequence
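(b) Since $\mathcal{P}_{\mathbb{N}_b,c}$ is a separable subset of $\mathcal{P}_{\mathbb{N}_b}$ and is dense in $\mathcal{P}_{\mathbb{N}_b,[b,c]}$, it follows that $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ is separable. We prove that $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ is the closure of $\mathcal{P}_{\mathbb{N}_b,c}$ in $\mathcal{P}_{\mathbb{N}_b}$. Let $\theta^{(n)}$ be a sequence in $\mathcal{P}_{\mathbb{N}_b,c}$ converging weakly to $\theta \in \mathcal{P}_{\mathbb{N}_b}$. Since $\theta^{(n)} \Rightarrow \theta$ implies that $\theta^{(n)}_j \to \theta_j$ for each $j \in \mathbb{N}_b$, Fatou's lemma implies that $c = \liminf_{n \to \infty} \langle\theta^{(n)}\rangle \ge \langle\theta\rangle$, where $\langle\theta^{(n)}\rangle$ and $\langle\theta\rangle$ denote the means of $\theta^{(n)}$ and $\theta$. Since for any $\theta \in \mathcal{P}_{\mathbb{N}_b}$ we have $\langle\theta\rangle \ge b$, it follows that $c \ge \langle\theta\rangle \ge b$. This shows that the closure of $\mathcal{P}_{\mathbb{N}_b,c}$ in $\mathcal{P}_{\mathbb{N}_b}$ is a subset of $\mathcal{P}_{\mathbb{N}_b,[b,c]}$. We next prove that $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ is a subset of the closure of $\mathcal{P}_{\mathbb{N}_b,c}$ in $\mathcal{P}_{\mathbb{N}_b}$ by showing that for any $\theta \in \mathcal{P}_{\mathbb{N}_b,[b,c]}$ there exists a sequence $\theta^{(n)} \in \mathcal{P}_{\mathbb{N}_b,c}$ such that $\theta^{(n)} \Rightarrow \theta$. If $\langle\theta\rangle = c$, then we choose $\theta^{(n)} = \theta$ for all $n \in \mathbb{N}$. If $\langle\theta\rangle = \beta \in [b, c)$, then we use the sequence $\theta^{(n)}$ in (17), which converges weakly to $\theta$. We conclude that $\theta$ lies in the closure of $\mathcal{P}_{\mathbb{N}_b,c}$ and thus that $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ is a subset of the closure of $\mathcal{P}_{\mathbb{N}_b,c}$ in $\mathcal{P}_{\mathbb{N}_b}$. This completes the proof of part (b) and thus the proof of Theorem 4.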
In the next section we present the local large deviation estimate that will be used in Section 4 to prove the LDP for ΘN,b in Theorem 1.
3. Local Large Deviation Estimate Yielding Theorem 1
For $\omega \in \Omega_{N,b,m}$ the components $\Theta_{N,b;j}(\omega)$ of the number-density measure defined in (5) are $N_j(\omega)/N$ for $j \in \mathbb{N}_b$, where $N_j(\omega)$ denotes the number of sites in $\Lambda_N$ containing $j$ particles in the configuration $\omega$. We denote by $N(\omega)$ the sequence $\{N_j(\omega), j \in \mathbb{N}_b\}$. By definition, for every $\omega \in \Omega_{N,b,m}$ each site is occupied by at least $b$ particles, and $|N(\omega)|_+ \le m = m(N)$. It follows that $A_{N,b,m}$ is the range of $N(\omega)$ for $\omega \in \Omega_{N,b,m}$; the two sums involving $\nu_j$ in (18) correspond to the two sums involving $N_j(\omega)$ in (1).
In part (b) of the next theorem we state the local large deviation estimate for the event $\{\Theta_{N,b} = \theta_{N,b,\nu}\}$. In part (a) we introduce the Poisson distribution $\rho_{b,\alpha_b(c)}$ that appears in the local estimate; $\rho_{b,\alpha_b(c)}$ is defined in terms of a parameter $\alpha_b(c)$ guaranteeing that it has mean $c$.
In part (a) of Theorem C.2 in [7] we give the straightforward proof of the existence of $\alpha_b(c)$ for $b = 1$. The proof of the existence of $\alpha_b(c)$ for general $b \in \mathbb{N}$ is much more subtle than the proof for $b = 1$. The proof for general $b \in \mathbb{N}$ is given in Theorem A.2 in the present paper.
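Although the existence proof is subtle, the value of the parameter is easy to approximate numerically. The sketch below is an illustration and not the argument of Theorem A.2: it uses the characterization stated in part (a) of Theorem 5, namely that for positive b the parameter solves αZb−1(α)/Zb(α) = c, and it assumes that the left side is increasing in α so that bisection applies. The function names `Z`, `mean`, and `alpha_b` and the truncation of the series are hypothetical choices.

```python
import math

def Z(b, alpha, terms=400):
    """Z_b(alpha) = sum over j >= b of alpha^j / j!, truncated after `terms` terms.

    The truncation is adequate for moderate alpha (say alpha below 100).
    """
    return sum(math.exp(j * math.log(alpha) - math.lgamma(j + 1))
               for j in range(b, b + terms))

def mean(b, alpha):
    """Mean of the Poisson distribution with parameter alpha restricted to integers >= b."""
    if b == 0:
        return alpha
    return alpha * Z(b - 1, alpha) / Z(b, alpha)

def alpha_b(b, c, iterations=200):
    """Approximate the solution of mean(b, alpha) = c by bisection; requires c > b."""
    lo, hi = 1e-12, 1.0
    while mean(b, hi) < c:             # enlarge the bracket until it contains the root
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean(b, mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a0 = alpha_b(0, 2.5)                   # b = 0: the parameter equals c
a1 = alpha_b(1, 2.0)                   # b = 1: solves alpha / (1 - exp(-alpha)) = 2
print(a0, a1)
```

For b = 1 and c = 2 the computed value is approximately 1.594, which is smaller than c: conditioning on at least one particle per site raises the mean, so a smaller Poisson parameter is needed to reach mean c.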
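Theorem 5. (a) Fix a nonnegative integer $b$ and a real number $c \in (b, \infty)$. For $\alpha \in (0, \infty)$ let $\rho_{b,\alpha}$ be the measure on $\mathbb{N}_b$ having components $\rho_{b,\alpha;j} = [Z_b(\alpha)]^{-1} \cdot \alpha^j/j!$ for $j \in \mathbb{N}_b$, where $Z_0(\alpha) = e^\alpha$, and, for $b \in \mathbb{N}$, $Z_b(\alpha) = e^\alpha - \sum_{j=0}^{b-1} \alpha^j/j!$. Then there exists a unique value $\alpha_b(c) \in (0, \infty)$ such that $\rho_{b,\alpha_b(c)}$ lies in the set $\mathcal{P}_{\mathbb{N}_b,c}$ of probability measures on $\mathbb{N}_b$ having mean $c$. If $b = 0$, then $\alpha_0(c) = c$. If $b \in \mathbb{N}$, then $\alpha_b(c)$ is the unique solution in $(0, \infty)$ of $\alpha Z_{b-1}(\alpha)/Z_b(\alpha) = c$.
(b) Fix a nonnegative integer $b$ and a rational number $c \in (b, \infty)$. Let $m$ be the function $m(N)$ appearing in the definition of $\Omega_{N,b,m}$ in (3) and satisfying $m(N) \to \infty$ and $m(N)^2/N \to 0$ as $N \to \infty$. For any $\nu \in A_{N,b,m}$ we define $\theta_{N,b,\nu} \in \mathcal{P}_{\mathbb{N}_b,c}$ to have the components $\theta_{N,b,\nu;j} = \nu_j/N$ for $j \in \mathbb{N}_b$. Then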
We now prove the local large deviation estimate in part (b) of Theorem 5. This proof is based on a combinatorial argument that is reminiscent of and is as natural as the combinatorial argument used to prove Sanov’s theorem for empirical measures defined in terms of i.i.d. random variables having a finite state space [1, §3]. Part (b) of Theorem 5 is proved by analyzing the asymptotic behavior of the product of two multinomial coefficients that we now introduce.
The next two steps in the proof of the local estimate given in part (b) of Theorem 5 are to prove the asymptotic formula for $\operatorname{card}(\Delta_{N,b,m;\nu})$ in Lemma 6 and the asymptotic formula for $\operatorname{card}(\Omega_{N,b,m})$ in part (b) of Lemma 7. The proof of Lemma 6 is greatly simplified by a substitution in line 4 of (34). This substitution involves a parameter $\alpha \in (0, \infty)$, which, we emphasize, is arbitrary in this lemma. The substitution in line 4 of (34) allows us to express the asymptotic behavior of both $\operatorname{card}(\Delta_{N,b,m;\nu})$ in Lemma 6 and $\operatorname{card}(\Omega_{N,b,m})$ in Lemma 7 directly in terms of the relative entropy $R(\theta_{N,b,\nu}|\rho_{b,\alpha})$, where $\rho_{b,\alpha}$ is the probability measure on $\mathbb{N}_b$ having the components defined in part (a) of Theorem 5. One of the major issues in the proof of part (b) of Theorem 5 is to show that the arbitrary parameter $\alpha$ appearing in Lemmas 6 and 7 must take the value $\alpha_b(c)$, which is the unique value of $\alpha$ guaranteeing that $\rho_{b,\alpha} \in \mathcal{P}_{\mathbb{N}_b,c}$ [Theorem 5(a)]. We show that $\alpha$ must equal $\alpha_b(c)$ after the statement of Lemma 7.
Lemma 6. Fix a nonnegative integer $b$ and a rational number $c \in (b, \infty)$. Let $\alpha$ be any real number in $(0, \infty)$, and let $m$ be the function $m(N)$ appearing in the definition of $\Omega_{N,b,m}$ in (3) and satisfying $m(N) \to \infty$ and $m(N)^2/N \to 0$ as $N \to \infty$. We define
Proof. The proof is based on a weak form of Stirling's approximation, which states that, for all $N \in \mathbb{N}$ satisfying $N \ge 2$ and for all $n \in \mathbb{N}$ satisfying $1 \le n \le N$, $1 \le \log(n!) - (n \log n - n) \le 2 \log N$. We summarize the last formula by writing $\log(n!) = n \log n - n + O(\log N)$ for all $N \ge 2$ and all $n \in \{1, 2, \ldots, N\}$.
To simplify the notation, we rewrite (24) in the form card(ΔN,b,m;ν) = M1(N, ν) · M2(K, ν), where M1(N, ν) denotes the first multinomial coefficient on the right side of (24), and M2(K, ν) denotes the second multinomial coefficient on the right side of (24). We have
The asymptotic behavior of the first term on the right side of the last display is easily calculated. Since ν ∈ AN,b,m, there are |ν|+ ∈ {1,2, …, m} positive components νj. Because of this restriction on the number |ν|+ of positive components of ν, we are able to control the error in line 3 of (29). We define . For each j ∈ ΨN(ν), since the components νj satisfy 1 ≤ νj ≤ N, we have log(νj!) = νjlogνj − νj + O(logN) for all N ≥ 2. Using the fact that , we obtain
We now study the asymptotic behavior of the second term on the right side of (28). Since K = Nc, we obtain for all K ≥ 2
Substituting (29) and (31) into (28), we obtain
Now comes the key step, the purpose of which is to express the sum in the next-to-last line of (32) as the relative entropy $R(\theta_{N,b,\nu}|\rho_{b,\alpha})$, where $\alpha \in (0, \infty)$ is arbitrary. To do so, we rewrite the sum as shown in line 4 of the next display:
The next step in the proof of the local large deviation estimate in part (b) of Theorem 5 is to prove the asymptotic formula for card(ΩN,b,m) stated in part (b) of the next lemma. The proof of this lemma uses Lemma 6 in a fundamental way. After the statement of this lemma we show how to apply it and Lemma 6 to prove part (b) of Theorem 5.
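The weak form of Stirling's approximation used in the proof of Lemma 6 can be checked numerically. The following sketch verifies the two-sided bound for all 2 ≤ N ≤ 300 and all 1 ≤ n ≤ N; the function names and the range of N are hypothetical choices, and the check is a sanity test rather than a proof.

```python
import math

def stirling_gap(n):
    """Return log(n!) - (n log n - n), using lgamma(n + 1) = log(n!)."""
    return math.lgamma(n + 1) - (n * math.log(n) - n)

def weak_stirling_holds(N):
    """Check 1 <= log(n!) - (n log n - n) <= 2 log N for every n in 1..N."""
    upper = 2.0 * math.log(N)
    # A small tolerance guards the lower bound, which is attained exactly at n = 1.
    return all(1.0 - 1e-12 <= stirling_gap(n) <= upper for n in range(1, N + 1))

checked = all(weak_stirling_holds(N) for N in range(2, 301))
print(checked)
```

The lower bound is an equality at n = 1, and the gap grows like (1/2) log(2πn), which is why an upper bound of order log N suffices.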
Lemma 7. Fix a nonnegative integer b and a rational number c ∈ (b, ∞). The following conclusions hold:
- (a)
$\lim_{N \to \infty} N^{-1} \log \operatorname{card}(A_{N,b,m}) = 0$.
- (b)
Let $\alpha$ be the positive real number in Lemma 6, and let $m$ be the function $m(N)$ appearing in the definition of $\Omega_{N,b,m}$ in (3) and satisfying $m(N) \to \infty$ and $m(N)^2/N \to 0$ as $N \to \infty$. We define $f(\alpha, b, c, K) = \log Z_b(\alpha) - c \log \alpha + c \log K - c$. Then $R(\theta|\rho_{b,\alpha})$ attains its infimum over $\mathcal{P}_{\mathbb{N}_b,c}$, and
()
We now complete the proof of part (b) of Theorem 5 by proving Lemma 7.
Proof of Lemma 7. (a) We write . By [8, Cor. 2.5] the number of elements in the set indexed by k equals the binomial coefficient C(N − 1, k − 1). Since by assumption m/N → 0 as N → ∞, for all sufficiently large N, the quantities C(N − 1, k − 1) are increasing and are maximal when k = m. Since C(N − 1, k − 1) ≤ C(N, k), it follows that
(b) The starting point is (23), which states that . For distinct ν ∈ AN,b,m the sets ΔN,b,m;ν are disjoint. Hence
We continue with the estimation of card(ΩN,b,m). By Lemma 6
We now prove that ηN → 0 as N → ∞. To do this, we use (45) to write
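We now prove (48). $R(\cdot|\rho_{b,\alpha})$ is lower semicontinuous on $\mathcal{P}_{\mathbb{N}_b}$ [33, Lem. 1.4.3(b)] and thus on $\mathcal{P}_{\mathbb{N}_b,c}$. Since $R(\cdot|\rho_{b,\alpha})$ has compact level sets in $\mathcal{P}_{\mathbb{N}_b,c}$ [Theorem A.1(a)], it attains its infimum over $\mathcal{P}_{\mathbb{N}_b,c}$ at some measure $\theta^*$. We apply Theorem B.1 in [7] to $\theta = \theta^*$, obtaining a sequence $\theta^{(N)}$ with the following properties: (1) for $N \in \mathbb{N}$, $\theta^{(N)} \in B_{N,b,m}$ has components $\theta^{(N)}_j = \nu^{(N)}_j/N$ for $j \in \mathbb{N}_b$, where $\nu^{(N)}$ is an appropriate sequence in $A_{N,b,m}$; (2) $\theta^{(N)} \Rightarrow \theta^*$ as $N \to \infty$; (3) $R(\theta^{(N)}|\rho_{b,\alpha}) \to R(\theta^*|\rho_{b,\alpha})$ as $N \to \infty$. The limit in (48) follows from the inequalities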
In the next section we explain how the local large deviation estimate in part (b) of Theorem 5 yields the LDP in Theorem 1.
4. Proof of Theorem 1 from Part (b) of Theorem 5
In Theorem 1 we state the LDP for the sequence $\Theta_{N,b}$ of number-density measures. This sequence takes values in $\mathcal{P}_{\mathbb{N}_b,c}$, which is the set of probability measures on $\mathbb{N}_b$ having mean $c \in (b, \infty)$. The purpose of the present section is to explain how the local large deviation estimate in part (b) of Theorem 5 yields the LDP for $\Theta_{N,b}$. All details appear in Section 4 of [7]. The basic idea is first to prove the large deviation limit for $\Theta_{N,b}$ lying in open balls in $\mathcal{P}_{\mathbb{N}_b,c}$ and in other subsets defined in terms of open balls and then to use this large deviation limit to prove the LDP in Theorem 1.
In Theorem 8 we state the large deviation limit for open balls and other subsets defined in terms of open balls. Two types of open balls are considered. Let $\theta$ be a measure in $\mathcal{P}_{\mathbb{N}_b,c}$, and take $r > 0$. Part (a) states the large deviation limit for open balls $B_\pi(\theta, r) = \{\mu \in \mathcal{P}_{\mathbb{N}_b,c} : \pi(\theta, \mu) < r\}$, where $\pi$ denotes the Prohorov metric on $\mathcal{P}_{\mathbb{N}_b}$. This limit is used to prove the large deviation upper bound for compact subsets of $\mathcal{P}_{\mathbb{N}_b,c}$ in part (b) of Theorem 1 and the large deviation lower bound for open subsets of $\mathcal{P}_{\mathbb{N}_b,c}$ in part (d) of Theorem 1. Now let $\theta$ be a measure in $\mathcal{P}_{\mathbb{N}_b,[b,c]}$. Part (b) states the large deviation limit for sets of the form $\hat{B}_\pi(\theta, r) \cap \mathcal{P}_{\mathbb{N}_b,c}$, where $\hat{B}_\pi(\theta, r) = \{\mu \in \mathcal{P}_{\mathbb{N}_b,[b,c]} : \pi(\theta, \mu) < r\}$. This limit is used to prove the large deviation upper bound for closed subsets in part (c) of Theorem 1. If $\theta \in \mathcal{P}_{\mathbb{N}_b,c}$, then $\hat{B}_\pi(\theta, r) \cap \mathcal{P}_{\mathbb{N}_b,c} = B_\pi(\theta, r)$, and the conclusions of parts (a) and (b) of the next theorem coincide.
Theorem 8. Fix a nonnegative integer b and a rational number c ∈ (b, ∞). Let m be the function m(N) appearing in the definition of ΩN,b,m in (3) and satisfying m(N) → ∞ and m(N)²/N → 0 as N → ∞. The following conclusions hold:
(a) Let θ be a measure in and take r > 0. Then for any open ball Bπ(θ, r) in , is finite, and one has the large deviation limit
()
(b) Let θ be a measure in and take r > 0. Then the set is nonempty, is finite, and one has the large deviation limit
()
We prove Theorem 8 by applying the local large deviation estimate in part (b) of Theorem 5. A key step is to approximate probability measures in Bπ(θ, r) and in by appropriate sequences of probability measures in the range of ΘN,b. This procedure allows one to show in part (a) that the infimum can be approximated by the infimum of over θ lying in the intersection of Bπ(θ, r) and the range of ΘN,b; a similar statement holds for the infimum in part (b). A set of hypotheses that allows one to carry out this approximation procedure is given in Theorem 4.2 in [7], a general formulation that yields Theorem 8 as a special case.
Theorem 1 states the LDP for the number-density measures ΘN,b. In order to complete the proof of Theorem 1, we must lift the large deviation limits in Theorem 8 to the large deviation upper bound for compact sets and for closed sets and the large deviation lower bound for open sets. The large deviation lower bound for open sets is immediate from the limit in part (a). To prove the large deviation upper bound for compact sets, we cover the compact set by open balls and use the limit in part (a); the large deviation upper bound for closed sets follows by a similar procedure involving part (b). The details of this procedure are carried out as an application of the general formulation in Theorem 4.3 in [7].
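The concentration of ΘN,b near the equilibrium distribution that the LDP implies can be observed numerically in the simplest case b = 0, where no conditioning on the minimum occupancy is needed and the equilibrium distribution is the Poisson distribution with mean c. The following Python sketch is only an illustration under stated assumptions: it takes the j-th component of the number-density measure to be the fraction of sites occupied by exactly j particles, it ignores the technical restriction involving m(N), and the names `number_density`, `poisson_pmf`, and `total_variation` are hypothetical.

```python
import math
import random
from collections import Counter

def number_density(N: int, c: float, seed: int = 0) -> dict:
    """Place K = c*N particles uniformly on N sites (case b = 0) and return
    the fraction of sites holding exactly j particles, for each observed j."""
    rng = random.Random(seed)
    K = round(c * N)
    occupancy = [0] * N
    for _ in range(K):
        occupancy[rng.randrange(N)] += 1  # each site has probability 1/N
    counts = Counter(occupancy)
    return {j: n / N for j, n in counts.items()}

def poisson_pmf(c: float, j: int) -> float:
    """Poisson probability of the value j for mean c."""
    return math.exp(-c) * c**j / math.factorial(j)

def total_variation(theta: dict, c: float, jmax: int = 60) -> float:
    """Total variation distance between theta and Poisson(c), truncated at jmax."""
    return 0.5 * sum(abs(theta.get(j, 0.0) - poisson_pmf(c, j)) for j in range(jmax))

if __name__ == "__main__":
    theta = number_density(N=20000, c=3.0, seed=1)
    print("mean of empirical measure:", sum(j * p for j, p in theta.items()))
    print("distance to Poisson(3):", total_variation(theta, 3.0))
```

The mean of the simulated measure equals K/N by construction, which matches the constraint that the number-density measures have mean c.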
In the Appendix we prove two properties of the relative entropy and prove the existence of the quantity αb(c) appearing in part (a) of Theorem 5.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The research of Shlomo Ta’asan is supported in part by a grant from the National Science Foundation (NSF-DMS-1216433). Richard S. Ellis thanks Jonathan Machta for sharing his insights into statistical mechanics and Michael Sullivan for his generous help with a number of topological issues arising in this paper. Both authors thank the referee for a careful reading of the paper and for suggesting a number of references.
Appendix
Properties of Relative Entropy and Existence of αb(c)
We fix a nonnegative integer b and a real number c ∈ (b, ∞). Given θ a probability measure on , the mean of θ is denoted by 〈θ〉. In Theorem A.1 we present two properties of the relative entropy R(θ∣ρb,α) for α ∈ (0, ∞) and for θ in each of the following three spaces, which are introduced in Section 2: , the set of probability measures on ; , the set of satisfying 〈θ〉 = c; and , the set of satisfying 〈θ〉∈[b, c].
We recall that, for α ∈ (0, ∞), ρb,α denotes the Poisson distribution on having components ρb,α;j = [Zb(α)]⁻¹ · α^j/j! for j ≥ b, where Z0(α) = e^α and, for b ≥ 1, Zb(α) = e^α − ∑_{j=0}^{b−1} α^j/j!. According to part (a) of Theorem 5 there exists a unique value α = αb(c) for which ρb,αb(c) has mean c; thus ρb,αb(c) lies in . In Theorem A.2 we prove the existence of αb(c). In part (a) of the next theorem we show that R(θ∣ρb,α) has compact level sets in , , and . After the statement of Lemma 7 we use part (b) of the next theorem to show that the arbitrary parameter α in Lemmas 6 and 7 must have the value αb(c).
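As a concrete illustration of this definition, the following Python sketch evaluates Zb(α) and the components ρb,α;j. It assumes that Zb(α) is the normalizing sum of α^j/j! over j ≥ b, which is what makes the components sum to 1. The function names `Z` and `rho` and the truncation parameter `terms` are hypothetical choices made for this example.

```python
import math

def Z(b: int, alpha: float, terms: int = 200) -> float:
    """Assumed normalizing constant: sum of alpha**j / j! over j >= b,
    truncated after `terms` terms of the series."""
    term = alpha**b / math.factorial(b)  # term for j = b
    total = 0.0
    for j in range(b, b + terms):
        total += term
        term *= alpha / (j + 1)  # ratio of consecutive series terms
    return total

def rho(b: int, alpha: float, j: int) -> float:
    """Component j of the Poisson distribution restricted to j >= b."""
    if j < b:
        return 0.0
    return alpha**j / math.factorial(j) / Z(b, alpha)

if __name__ == "__main__":
    b, alpha = 2, 1.5
    print("Z_b(alpha):", Z(b, alpha))
    print("total mass:", sum(rho(b, alpha, j) for j in range(b, 120)))
```

Summing the tail of the series directly, rather than subtracting the first b terms from e^α, avoids cancellation when α is small relative to b.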
Theorem A.1. Fix a nonnegative integer b and a real number c ∈ (b, ∞). For any α ∈ (0, ∞) the relative entropy has the following properties:
(a) R(·∣ρb,α) has compact level sets in , , and .
(b) For any , .
Proof. (a) The fact that has compact level sets in is proved in part (c) of Lemma 1.4.3 in [33]. Since is a compact subset of [Theorem 4(d)], R(·∣ρb,α) also has compact level sets in . Because is not a closed subset of [Theorem 4(a)], the proof that R(·∣ρb,α) has compact level sets in is more subtle. If θ(n) is any sequence in satisfying R(θ(n)∣ρb,α) ≤ M < ∞, then since and R(·∣ρb,α) has compact level sets in , there exist and a subsequence such that and R(θ∣ρb,α) ≤ M. To complete the proof that R(·∣ρb,α) has compact level sets in , we must show that ; that is, 〈θ〉 = c. By Fatou’s lemma . In addition, for any w ∈ (0, ∞)
(b) We define g(α, b, c) = log Zb(α) − c log α − (log Zb(αb(c)) − c log αb(c)). Step 1 is to prove that for any
We now prove the two assertions in Step 2. R(·∣ρb,α) is lower semicontinuous on [33, Lem. 1.4.3(b)] and thus on . Since R(·∣ρb,α) has compact level sets in , it attains its infimum over . The relative entropy attains its minimum value of 0 over at the unique measure [33, Lem. 1.4.1]. Hence (A.2) implies that the minimum value of R(·∣ρb,α) over equals
We now prove that there exists a unique value of αb(c) for which . The conclusion of the next theorem is part (a) of Theorem C.1 in [7]. In part (b) of that theorem we derive two sets of bounds on αb(c) and use these bounds to show that αb(c) is asymptotic to c as c → ∞. In part (d) of Theorem C.1 in [7] we make precise the relationship between and a Poisson random variable having parameter αb(c).
Theorem A.2. Fix a nonnegative integer b and a real number c ∈ (b, ∞). There exists a unique value αb(c)∈(0, ∞) such that ρb,αb(c) lies in the set of probability measures on having mean c. If b = 0, then α0(c) = c. If b ≥ 1, then αb(c) is the unique solution in (0, ∞) of αZb−1(α)/Zb(α) = c.
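Because the equation αZb−1(α)/Zb(α) = c has no closed-form solution for b ≥ 1, the value αb(c) has to be found numerically in practice. The following Python sketch uses bisection. It is an illustration rather than a method taken from the paper, and it rests on three assumptions: Zb(α) is the sum of α^j/j! over j ≥ b, the left-hand side is increasing in α, and the root lies in (0, c] because restricting a Poisson distribution to j ≥ b cannot lower its mean. The names `scaled_tail`, `gamma`, and `alpha_b` are hypothetical.

```python
def scaled_tail(b: int, alpha: float) -> float:
    """S_b(alpha) = sum over k >= 0 of alpha**k * b!/(b+k)!,
    so that Z_b(alpha) = (alpha**b / b!) * S_b(alpha)."""
    total, term, k = 0.0, 1.0, 0
    while term > 1e-17 * total:
        total += term
        k += 1
        term *= alpha / (b + k)
    return total

def gamma(b: int, alpha: float) -> float:
    """alpha * Z_{b-1}(alpha) / Z_b(alpha) for b >= 1, written with the
    scaled sums so that small alpha does not cause underflow."""
    return b * scaled_tail(b - 1, alpha) / scaled_tail(b, alpha)

def alpha_b(b: int, c: float, iterations: int = 200) -> float:
    """Approximate the solution of gamma(b, alpha) = c by bisection."""
    if not c > b:
        raise ValueError("c must exceed b")
    if b == 0:
        return c  # the unrestricted Poisson distribution has mean alpha
    lo, hi = 0.0, c  # gamma tends to b < c as alpha -> 0, and gamma(b, c) >= c
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gamma(b, mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    a = alpha_b(2, 3.5)
    print("alpha_2(3.5) is approximately", a, "with mean", gamma(2, a))
```

For b = 1 the function `gamma` reduces to α/(1 − e^(−α)), the mean of a zero-truncated Poisson distribution, which gives an independent check of the scaled series.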
According to this theorem, for b ≥ 1, αb(c) is the unique solution of αZb−1(α)/Zb(α) = c. The heart of the proof of Theorem A.2, and its most subtle step, is to prove that the function γb(α) = αZb−1(α)/Zb(α) satisfies γ′b(α) > 0 for α ∈ (0, ∞) and thus is monotonically increasing on this interval. This fact is proved in the next lemma.
Lemma A.3. Fix a positive integer b and a real number c ∈ (b, ∞). For α ∈ (0, ∞) the function γb(α) = αZb−1(α)/Zb(α) satisfies γ′b(α) > 0.
Proof. For and for α ∈ (0, ∞), we have . Thus . The key to proving that is to represent logZb(α) in terms of the moment generating function of a probability measure. We do this by first expressing Zb(α) in terms of the upper incomplete gamma function via the formula . As suggested in [38], we now make the change of variables x = yα, obtaining the representation
Using (A.6) and the formulas and , we calculate
We are now ready to prove Theorem A.2.
Proof of Theorem A.2. We first consider b = 0. In this case ρ0,α is a standard Poisson distribution on having mean α. It follows that α0(c) = c is the unique value for which has mean c and thus lies in . This completes the proof for b = 0.
We now consider b ≥ 1. In this case ρb,α is a probability measure on having mean
We have proved the theorem for all b ≥ 1. Since the conclusion has also been verified for b = 0, the proof is complete for all nonnegative integers b.
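The identification of γb(α) with the mean of ρb,α, on which this proof rests, can be checked by a short calculation. The sketch below assumes that Zb(α) is the normalizing sum of α^j/j! over j ≥ b. Its second display is an alternative cross-check of the monotonicity in Lemma A.3 through a variance identity; it is not the argument used in the proof of that lemma, which proceeds through the incomplete gamma function.

```latex
% Assumption: Z_b(\alpha) = \sum_{j \ge b} \alpha^j / j!, so that Z_b'(\alpha) = Z_{b-1}(\alpha) for b \ge 1.
\[
  \sum_{j \ge b} j \, \rho_{b,\alpha;j}
  = \frac{1}{Z_b(\alpha)} \sum_{j \ge b} \frac{\alpha^{j}}{(j-1)!}
  = \frac{\alpha}{Z_b(\alpha)} \sum_{j \ge b-1} \frac{\alpha^{j}}{j!}
  = \frac{\alpha Z_{b-1}(\alpha)}{Z_b(\alpha)}
  = \gamma_b(\alpha).
\]
% Applying \alpha \, d/d\alpha to the quotient \sum_j j \alpha^j/j! \,/\, Z_b(\alpha) gives the variance of \rho_{b,\alpha}:
\[
  \alpha \, \gamma_b'(\alpha)
  = \sum_{j \ge b} j^2 \rho_{b,\alpha;j} - \Bigl( \sum_{j \ge b} j \, \rho_{b,\alpha;j} \Bigr)^{2}
  > 0 ,
\]
% which is strictly positive because \rho_{b,\alpha} charges every integer j \ge b.
```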