The Price equation shows the unity between the fundamental expressions of change in biology, in information and entropy descriptions of populations, and in aspects of thermodynamics. The Price equation partitions the change in the average value of a metric between two populations. A population may be composed of organisms or particles or any members of a set to which we can assign probabilities. A metric may be biological fitness or physical energy or the output of an arbitrarily complicated function that assigns quantitative values to members of the population. The first part of the Price equation describes how directly applied forces change the probabilities assigned to members of the population when holding constant the metrical values of the members—a fixed metrical frame of reference. The second part describes how the metrical values change, altering the metrical frame of reference. In canonical examples, the direct forces balance the changing metrical frame of reference, leaving the average or total metrical values unchanged. In biology, relative reproductive success (fitness) remains invariant as a simple consequence of the conservation of total probability. In physics, systems often conserve total energy. Nonconservative metrics can be described by starting with conserved metrics, and then studying how coordinate transformations between conserved and nonconserved metrics alter the geometry of the dynamics and the aggregate values of populations. From this abstract perspective, key results from different subjects appear more simply as universal geometric principles for the dynamics of populations subject to the constraints of particular conserved quantities.

1 Introduction

Changes in populations can often be described by changes in probability distributions. The dynamics of probability distributions therefore sets the basis for much of theoretical population biology.

This article develops abstract principles for the dynamics of probability distributions. Those abstract principles deepen general understanding, leading to better connections of theoretical population biology to physics, statistics, and other population-based disciplines.

To understand the dynamics of probability distributions, one must consider the forces and constraints that influence the change in populations. Many methods can be used to study dynamics. Here, I apply the Price equation, a highly abstract description of change in populations. The abstractness of the Price equation facilitates discovery and understanding of connections between seemingly different disciplines.

I use the Price equation to show the essentially identical basis for fundamental equations of natural selection, entropy, and information. I emphasize the first steps in how one might go about building a common framework in which to understand the similarities and differences between various disciplines. From this abstract perspective, key results from different subjects appear more simply as universal geometric principles for the dynamics of populations subject to the constraints of particular conserved quantities.

2 Overview

This article provides the basis for unifying diverse subjects. Given the incompatible goals, methods, languages, and cultures of the different disciplines, it is useful to begin with an extended overview.

This overview serves only to orient in the direction of what follows, not as a complete summary unto itself. Readers who prefer to start with the details may wish to skip this section.

Sections 3-5 introduce the Price equation and prepare for application to different subjects. In the Price equation, a population consists of different types. Each type associates with a frequency or probability and with a property. I assume that the properties are quantitative values. I use the words frequency and probability interchangeably. In other contexts, there may be good reasons to distinguish between these words.

The Price equation partitions the total change between two populations into a part caused by changes in frequencies and a part caused by changes in properties. That separation allows clear understanding of dynamics in terms of changes in probability distributions and changes in population quantities, such as biological fitness or physical energy or economic wealth.

Section 6 presents the canonical equation of conservation in populations, in which the change caused by frequency differences balances the change caused by property value differences. In biology, this equation represents the fact that the average of relative reproductive success (fitness) cannot change, because increases in relative fitness caused by natural selection must be exactly balanced by decreases in relative fitness caused by the changed state of the population.

The conservation of relative fitness arises directly from the conservation of total probability. Alternative measures of property values can be understood as geometric coordinate transformations from the property of fitness (frequency change) to alternative measures that often lead to nonconservative changes in populations. For example, a logarithmic measure of fitness leads to classical measures of information.

Section 7 describes various identities and alternative partitions for the conservation of total probability. The different notational forms provide the basis for connecting seemingly different subjects to the common underlying geometric principles.

Section 8 considers frequency changes in relation to an abstract notion of force. By expressing frequency changes in terms of force, the Price equation partitions the conservation of total probability into two balancing components of change. The first component arises from directly acting forces with respect to a fixed frame of reference for the quantitative properties. The second balancing component of change arises from the inertial forces that alter the frame of reference.

The balance between the consequences of the direct and inertial forces provides an analogy to d'Alembert's principle of mechanics. That connection establishes a first step in relating different disciplines to the common underlying geometric foundation.

Sections 9-11 transform the quantitative property of frequency change into logarithmic coordinates. In the canonical Price equation's partition of conserved total probability into direct and inertial components, the property of each type is its frequency change or growth rate, an analogy with biological fitness. In particular, the relative growth, or fitness, of the ith type is $w_i = q_i'/q_i$, the ratio of the derived frequency, $q_i'$, relative to the initial frequency, q_i.

The change between the initial and derived frequency can be considered as a path divided into segments, in which the overall growth, or fitness, arises by multiplication of the fitnesses along each segment of the path.

If we transform our focal property of fitness to logarithmic coordinates, then we can add component property values along the segments of a path, achieving an additive geometry of change that greatly enhances the power of analysis and interpretation. The classical notions of information and entropy follow immediately from use of the logarithmic coordinates in the canonical Price equation partition of conserved total probability.
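Here is a minimal Python sketch of the multiplicative-versus-additive point. It uses invented frequencies for a single type i along a hypothetical four-point path. The overall fitness $w_i = q_i'/q_i$ is the product of the segment fitnesses, whereas in logarithmic coordinates the segment values add.

```python
import math

# Hypothetical frequencies of a single type i at successive points along a
# path from the initial population (q_i) to the derived population (q_i').
path = [0.20, 0.25, 0.28, 0.35]

# Fitness of each segment: later frequency divided by earlier frequency.
segment_fitness = [later / earlier for earlier, later in zip(path, path[1:])]

# In linear coordinates the segment fitnesses multiply to the overall fitness.
overall_fitness = math.prod(segment_fitness)

# In logarithmic coordinates the segment values add instead.
log_overall_fitness = sum(math.log(w) for w in segment_fitness)

print(overall_fitness, path[-1] / path[0])             # equal up to rounding
print(log_overall_fitness, math.log(overall_fitness))  # equal up to rounding
```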

Sections 12 and 13 continue to set the geometric foundations for analysis. When we divide a path of change into many small segments, then we can think of overall change as the combination of many small instantaneous changes in response to directly applied force at each point along the path.

For small changes, the direct force at each point becomes approximately the same for the initial linear coordinates of change, w_i, and the logarithmic coordinates, log w_i, apart from a constant shift that does not alter the dynamics. The convergence of linear and logarithmic coordinates with respect to small changes explains the common forms of many fundamental results in different fields of study.
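As a rough numerical check of the small-change claim, the following Python sketch uses hypothetical frequencies to compare the linear coordinate shifted by a constant, $w_i - 1$, with $\log w_i$. The two agree to first order, and the gap is approximately $(w_i - 1)^2/2$.

```python
import math

# Hypothetical initial frequencies and a small change that sums to zero,
# so total probability is conserved.
q = [0.5, 0.3, 0.2]
dq = [0.002, -0.0015, -0.0005]
q_new = [qi + d for qi, d in zip(q, dq)]

w = [qn / qi for qn, qi in zip(q_new, q)]   # linear coordinates, w_i
shifted = [wi - 1 for wi in w]              # w_i minus the constant 1
log_w = [math.log(wi) for wi in w]          # logarithmic coordinates

# For small changes, log(w_i) and w_i - 1 agree to first order; the gap is
# roughly (w_i - 1)**2 / 2, which is tiny here.
for s, lw in zip(shifted, log_w):
    print(f"w-1 = {s:+.6f}   log w = {lw:+.6f}   gap = {s - lw:.1e}")
```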

Section 14 develops two complementary abstract notions of force. In the canonical expression of the Price equation for the conservation of total probability, the “fitness” term $\Delta q_i/q_i$ simply describes the change in frequencies relative to the fixed frame of reference given by the initial frequencies. One may treat this description of change as an inductive expression of an underlying force.

Alternatively, it often makes sense to consider the initial frequencies and forces as given, from which one deduces the change in frequency. This section expresses the given forces geometrically by the separation between the initial frequencies, q_i, and the given point, $\hat{q}_i$. By expressing force in this way, we have a common geometric basis for the inductive and deductive perspectives.

Section 15 develops the deductive perspective by deriving the changes in frequencies for given initial frequencies and given forces. The analysis applies the Lagrangian method, which maximizes the first component of the Price equation partition. That first component is an abstraction of the classical mechanics action term, as the virtual work of the direct forces with respect to a fixed frame of reference. The Lagrangian method generalizes the principle of least action.

The Lagrangian also includes various forces of constraint, such as the conservation of total probability, and any additional forces associated with other conserved quantities. The forces of constraint impose a limited set of potential paths that may be followed in the geometric space of frequency change. The actual path of change extremizes the action among those paths that are consistent with the forces of constraint.

Sections 16-18 present a partial maximum entropy production principle that follows from the dynamics of frequency change. To obtain this result, I partition the direct force into two components. The first component becomes an additional force of constraint that expresses the invariance imposed by the conservation of some system quantity, such as energy or biomass or the direct change in some value. The remaining component of the direct force is $-\log q_i$, which can be thought of as the entropy or information in the ith dimension.

The entropy becomes the action term maximized by the path of change, leading to a path that maximizes the production of entropy. Because the maximization is taken with respect to the fixed frame of reference defined by the initial population, ignoring any inertial forces that alter the frame of reference, one can think of the entropy production as the result of a partial change holding constant the frame of reference—the partial maximum entropy production principle.

Sections 19 and 20 develop the notion of a conserved system quantity as a force of constraint. Jaynes' maximum entropy analysis of thermodynamics and probability patterns follows as a special case of the general geometric principles of change in populations developed in earlier sections. From Jaynes' work and the later extensions of his theory to simple invariance principles, we have a unified framework in which to understand the relations between commonly observed probability distributions.

Section 21 discusses alternative ways in which to interpret maximum entropy paths. I argue that the most basic principles derive from the underlying geometry. Notions of entropy and information are simply interpretations of that geometry applied to particular disciplines of study.

Section 22 relates the path of change for populations to the Fisher information metric. That metric arises frequently in particular disciplines, including the fundamental approaches of information geometry.

Sections 23 and 24 briefly review key results. The Appendix provides brief histories of key topics and background references.

3 Separation of Frequency and Property

The Price equation provides an abstract way in which to analyze changes in populations. The equation separates the frequency of entities from the property of those entities (Frank, 2012a; Price, 1972a).

Suppose, for example, that for entities with label i, we express frequency as q_i and the average of the associated property value as z_i. The z_i values can be height, or energy level, or any quantity.

If entities with label i always have an average value, z_i, then frequency change completely describes population change. If the change in frequency between two populations is $\Delta q_i = q_i' - q_i$, then the change in the average value of z is

$\Delta\bar{z} = \sum_i \Delta q_i z_i = \Delta q\cdot z$

in which the dot product, Δq · z, is understood in the usual way as the sum of the element-wise product of two vectors.
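A short Python sketch with invented numbers, assuming the property values $z_i$ stay fixed, confirms that the change in the average equals $\Delta q\cdot z$. The helper `dot` is defined here only for illustration.

```python
def dot(u, v):
    """Sum of the element-wise product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))

# Hypothetical frequencies q_i and q_i' for three types whose property
# values z_i stay fixed between the two populations.
z = [2.0, 5.0, 4.0]
q = [0.6, 0.3, 0.1]
q_new = [0.5, 0.35, 0.15]

dq = [b - a for a, b in zip(q, q_new)]       # Δq_i = q_i' - q_i
change_in_mean = dot(q_new, z) - dot(q, z)   # z̄' - z̄ computed directly
via_dot_product = dot(dq, z)                 # Δq · z
```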

Alternatively, one may separate frequency from property. Thus, we have differences in frequency, $\Delta q_i = q_i' - q_i$, and differences in property values, $\Delta z_i = z_i' - z_i$.

For example, a transportation planner might study the overall assessment of changing modes of transport in a population. The index i could label different transportation modes, such as automobile, train, and so on. The frequency q_i is the fraction of individuals who travel by a particular mode. The quantity z_i may be the relative assessment for the value associated with a transportation mode.

The separation of frequency and property allows a more general description of change. Changes in the total assessment of transportation can arise from changes in the frequencies of usage, $\Delta q_i$, and from changes in the assessment of value for each mode, $\Delta z_i$.

4 Set Mapping of Labels Between Populations

Our goal is to describe the change between two populations. We may arbitrarily label one population as the ancestor and the second population as the descendant. The general formulation concerns only the differences between populations, independently of any particular underlying scale of separation, such as space or time or updating in light of new evidence. In this section, I consider the example of separation between populations by time.

The term $\Delta q_i = q_i' - q_i$ is the change in the descendant frequency, $q_i'$, compared with the ancestral frequency, q_i. For the transportation example, one would typically read this as the frequency of people traveling by train or other mode, i, at two different times. If the frequency of people traveling by train is increasing, then Δq_i is positive. That interpretation makes a lot of sense and is nearly universal.

The Price equation allows a more abstract notion of the mapping between sets. Let $q_i'$ be the frequency of entities in the second population that derive from type i in the first population. Thus, for travel mode by train, $q_i'$ would be the frequency of individuals in the descendant population who derived from, or map to, train travelers in the ancestral population.

Consider two interpretations. First, $q_i'$ and q_i could have their traditional meaning of the frequencies of train travelers at each point in time. For example, change may occur by social contagion, in which people become train travelers only by learning about trains from someone who already travels by train; an individual train traveler maps to self as a descendant train traveler. In this case, each descendant train traveler maps to a train traveler in the ancestral population. Positive Δq_i reflects growth of the ith class by successful recruitment.

In a second interpretation, we could map descendant individuals to their mothers. Then, Δq_i has to do with the number of babies produced by each mother. In this case, a descendant's label i is defined only by ancestral type. Descendants do not have their own types, only their mapping to an ancestral i.

We handle the fact that descendants may use travel modes that differ from their mother by adjusting the change in property value, $\Delta z_i = z_i' - z_i$. For mothers who travel by train, with property value z_i, their descendants have some average property value, $z_i'$, that accounts for both changes in travel mode by descendants and changes in property value associated with each travel mode.

In the general, abstract interpretation, the label i applies only to the initial, or ancestral set. All entities from the second, or descendant, population map to ancestors, and thus derive their labels from their ancestors. We can use partial assignments, so that a descendant is made up of various fractions of ancestors, each descendant part accounted for separately by its assignment to an ancestral label, i.
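The set mapping can be sketched in Python under hypothetical assumptions. Each descendant carries its own property value and a partial assignment of its weight to ancestral labels. Summing the assigned weights gives $q_i'$, and the weighted average property of the parts assigned to i gives $z_i'$. With those definitions, $q'\cdot z'$ recovers the ordinary average of the descendant population. The labels, numbers, and data layout are illustrative only.

```python
from collections import defaultdict

# Hypothetical ancestral population: label i -> (frequency q_i, property z_i).
ancestors = {"A": (0.5, 1.0), "B": (0.3, 2.0), "C": (0.2, 4.0)}

# Hypothetical descendants, each with its own property value and a (possibly
# partial) assignment to ancestral labels; each assignment sums to one.
descendants = [
    (1.5, {"A": 1.0}),
    (1.0, {"A": 0.5, "B": 0.5}),   # half derived from A, half from B
    (2.5, {"B": 1.0}),
    (3.0, {"C": 1.0}),
]

n = len(descendants)
mass = defaultdict(float)        # descendant frequency mapped to each label
weighted_z = defaultdict(float)  # frequency-weighted property per label
for z_desc, parts in descendants:
    for label, share in parts.items():
        mass[label] += share / n
        weighted_z[label] += share / n * z_desc

labels = list(ancestors)
q_new = [mass[i] for i in labels]   # q_i'
# z_i' is the average property of the descendant parts mapped to i; if no
# descendants map to i, then q_i' = 0 and z_i' does not affect any average.
z_new = [weighted_z[i] / mass[i] if mass[i] > 0 else ancestors[i][1]
         for i in labels]

descendant_mean = sum(z_desc for z_desc, _ in descendants) / n
mapped_mean = sum(qn * zn for qn, zn in zip(q_new, z_new))   # q' · z'
```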

At first glance, this set mapping abstraction may seem rather complicated and obscure. However, its great power arises from the fact that nearly all studies of changes in populations can be described by specific mapping assumptions and associated interpretations. Thus, anything that we can prove about the general abstract setup applies to the very many apparently different special cases that arise in different applications.

5 The Price Equation

The Price equation (Frank, 2012a; Price, 1972a) describes the change between two populations in the aggregate value of some property (this section is modified from Frank, 2015). Each component of the population has a frequency weighting, q, and a property value, z. Begin with a discrete analog of the product rule for differentiation

$\Delta(qz) = q'z' - qz = (\Delta q)z + q'(\Delta z)$

in which q′ = q + Δq and z′ = z + Δz. The same rule can be applied to vectors. Using dot product notation, we obtain an abstract form of the Price equation (Frank, 2012a,b, 2013)

$\Delta(q\cdot z) = \Delta q\cdot z + q'\cdot\Delta z$ (1)

in which a dot product is understood in the usual way as q · z = ∑ q_i z_i.

This equation can be interpreted in various ways, as discussed in prior sections. In general analysis, I adopt the most abstract interpretation with regard to set mapping between two populations. Roughly speaking, we can take q_i to be the frequency associated with a subset, i, of the initial population, such that the total frequency is ∑ q_i = 1. Thus, $q\cdot z = \bar{z}$ is the average of z.

Here, z_i is an arbitrary function that maps i to some property value, and z_i is interpreted as the average of z in each dimension or subset, i. Because z can be any quantity, calculated in any way, this equation gives the most general expression for $\Delta(q\cdot z)$, the change in the average of z. One can think of $\Delta(q\cdot z)$ as a functional of the arbitrary function, z, that maps i ↦ z_i.

For a second population, with frequencies $q_i'$ and values $z_i'$, we have $q'\cdot z' = \sum_i q_i' z_i'$, in which the primes denote the abstract mapping described in the prior section. Our only restriction is that we can map the index i between the two populations. We may define the average value in the second population as $\bar{z}' = q'\cdot z'$. Thus,

$\Delta\bar{z} = \bar{z}' - \bar{z} = q'\cdot z' - q\cdot z = \Delta(q\cdot z)$

so that we may write the Price equation in Equation 1 as

$\Delta\bar{z} = \Delta q\cdot z + q'\cdot\Delta z$ (2)

an explicit expression for the change in average values. Because z can be defined in any way, this expression describes the change in any quantitative property of populations.
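A minimal numerical check of Equation 2 in Python, with invented frequencies and property values: the directly computed change in the average matches the sum of the frequency part, $\Delta q\cdot z$, and the property part, $q'\cdot\Delta z$. Because the equation is an identity, any choice of numbers would do.

```python
def dot(u, v):
    """Sum of the element-wise product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))

# Hypothetical ancestral (q, z) and descendant (q', z') values for three types.
q = [0.5, 0.3, 0.2]
z = [1.0, 2.0, 4.0]
q_new = [0.4, 0.35, 0.25]
z_new = [1.2, 1.9, 3.5]

dq = [b - a for a, b in zip(q, q_new)]   # Δq
dz = [b - a for a, b in zip(z, z_new)]   # Δz

change = dot(q_new, z_new) - dot(q, z)   # Δz̄ = z̄' - z̄
frequency_part = dot(dq, z)              # Δq · z
property_part = dot(q_new, dz)           # q' · Δz
```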

6 Biological Fitness and the Conservation of Total Probability

We may define an abstract analog of biological fitness. For a type or subset with label i, comprising frequency q_i in the ancestral population, the fraction of the descendant population derived from i is $q_i'$. Thus, the relative success of type i in contributing to the descendant population may be written as its relative fitness

$w_i = \frac{q_i'}{q_i}$ (3)

Average relative fitness is always one

$\bar{w} = \sum_i q_i w_i = \sum_i q_i' = 1$

because the total frequency or probability is always a conserved value of one. In some articles, w_i is taken as an absolute measure of the number of descendants assigned to type i, and $\bar{w}$ is the average number of descendants, which may differ from one. In that case, $w_i/\bar{w}$ is relative fitness. Here, I am using w_i as the measure of relative fitness, with $\bar{w}$ always equal to one. The following analysis does not differ under the alternative definitions, but it is important to keep in mind the distinct definitions that may be used.
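The two fitness conventions can be compared in a short Python sketch. The offspring counts and the number of ancestors, `N`, are hypothetical. Relative fitness from Equation 3 averages to one, absolute fitness averages to the mean number of descendants per ancestor, and dividing absolute fitness by that mean recovers relative fitness.

```python
# Hypothetical ancestral population: frequencies q_i and the number of
# descendants mapped to each type.
q = [0.5, 0.3, 0.2]
offspring = [100, 90, 10]
N = 100   # hypothetical number of ancestors, so type i has N * q_i members

total = sum(offspring)
q_new = [n / total for n in offspring]           # q_i'
w = [qn / qi for qn, qi in zip(q_new, q)]        # relative fitness, Eq. 3
w_bar = sum(qi * wi for qi, wi in zip(q, w))     # always 1

# Alternative convention: absolute fitness as descendants per ancestor.
W = [n / (N * qi) for n, qi in zip(offspring, q)]
W_bar = sum(qi * Wi for qi, Wi in zip(q, W))     # mean number of descendants
relative = [Wi / W_bar for Wi in W]              # W_i / W̄ recovers w_i
```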

If we use relative fitness for the abstract property in the Price equation of Equation 2, with z ↦ w, we obtain

$\Delta\bar{w} = \Delta q\cdot w + q'\cdot\Delta w = 0$ (4)

It is often useful to express fitnesses as deviations from their average value, which we obtain by subtracting one from relative fitness

$a_i = w_i - 1 = \frac{\Delta q_i}{q_i}$ (5)

which is known as Fisher's average excess in fitness (Fisher, 1958). The average value of a, $\bar{a} = q\cdot a = \bar{w} - 1$, is always zero; thus, we can write Equation 4 as

$\Delta\bar{a} = \Delta q\cdot a + q'\cdot\Delta a = 0$ (6)
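Equation 6 requires the average excess in the descendant population, $a'$. This sketch assumes, for illustration, that $a'$ comes from a further transition $q'\to q''$, so that $a_i' = \Delta q_i'/q_i'$; the three frequency vectors are invented. The Python code checks that both averages of a are zero and that the two terms of Equation 6 cancel.

```python
def dot(u, v):
    """Sum of the element-wise product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))

# Hypothetical frequencies at three successive points: q, q', and q''.
# q'' is needed only to define the descendant average excess a'.
q0 = [0.5, 0.3, 0.2]
q1 = [0.4, 0.35, 0.25]
q2 = [0.32, 0.38, 0.30]

a0 = [(b - c) / c for c, b in zip(q0, q1)]   # a_i  = Δq_i / q_i
a1 = [(b - c) / c for c, b in zip(q1, q2)]   # a_i' = Δq_i' / q_i'
dq = [b - c for c, b in zip(q0, q1)]         # Δq_i
da = [x - y for y, x in zip(a0, a1)]         # Δa_i = a_i' - a_i

direct = dot(dq, a0)     # Δq · a, first term of Eq. 6
context = dot(q1, da)    # q' · Δa, second term of Eq. 6
```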

7 Identities for the Conservation of Probability

We may express the conservation of total probability in a variety of equivalent forms. This section shows some of the variants. The purpose of these variants is to set up the discussion in the next section, in which we interpret the Price equation partition in Equation 6 as a partition of total change into two parts. The first part is the change ascribed to direct forces, F. The second part is the change ascribed to the altered context of the population, which may be thought of as a change in the frame of reference caused by inertial forces, I.

I will discuss the interpretation of direct and inertial forces in the next section. Here, we must first consider various notational manipulations, which by themselves do not have much obvious meaning. The goal will ultimately be to discuss general aspects of change in populations subject to the constraint set by the conservation of total probability, which allows us to write the Price equation partition in Equation 6 as

$\Delta q\cdot(F + I) = 0$ (7)

We will need a toolkit of notational variants to establish this form and to show the connections between seemingly different subjects. It is a bit tedious to set up the various notational identities, but it is important to do so to develop alternative interpretations and to avoid confusion. On first reading, one may wish to skim quickly through this section and then refer back to the notations as needed.

To start, note that q′ = q + Δq and Δa = a′ − a, thus we can write the second term of Equation 6 as

$q'\cdot\Delta a = q'\cdot a' - (q + \Delta q)\cdot a = -\Delta q\cdot a$ (8)

because q′ · a′ and q · a are the average values of a in the descendant and ancestral populations, both of which are always zero. Thus, we end up with the seemingly trivial partition

$\Delta q\cdot a - \Delta q\cdot a = 0$ (9)

which we will nonetheless find quite useful, because the partition provides some hints about the balance of direct and inertial forces in a conservative system. Before turning to that balance of forces in the next section, it is useful to consider some additional identities.

Each term in Equation 9 expresses the variance in fitness and, equivalently, a measure of the squared Euclidean distance through which the population moves

$\Delta q\cdot a = q\cdot a^2 = V_w$

in which a² is the vector of the squared terms, $a_i^2$, and thus, q · a² is the second moment of a. Here, V_w is the variance in relative fitness, because a_i = w_i − 1 is relative fitness shifted so that the mean value of a is zero. Thus, the second moment of a is the variance.

The term q · a² can be thought of as a squared distance starting from an initial point at zero and moving through the distance given by the sum of the squared deviations in each dimension, $a_i^2$, each dimension weighted by its frequency, q_i. Thus, the distance that the population moves in frequency space, caused by the changes in frequency given by variable fitnesses, is equivalent to the variance in fitness. Put another way, the reason that the variance in fitness always arises as the key metric in population change is that the variance describes the distance that the population moves.

We can also write

$\Delta q\cdot a = \sum_i \frac{(\Delta q_i)^2}{q_i} = \sum_i q_i\left(\frac{\Delta q_i}{q_i}\right)^2$

which are forms that arise in information theory interpretations of frequency changes, and also clarify the geometric squared distance interpretation of frequency changes (Amari & Nagaoka, 2000). We can write this equation in a nonstandard vector notation, which will be convenient to use in this article, as

$\Delta q\cdot a = \Delta q\left(\frac{\Delta q}{q}\right) = q\left(\frac{\Delta q}{q}\right)^2$ (10)

in which a ratio of vectors implies element-wise division, and vectors distribute through parentheses as dot products.
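The equivalent forms of this squared distance can be checked numerically. In this Python sketch with hypothetical frequencies, $\Delta q\cdot a$, the second moment $q\cdot a^2$, the variance $V_w$, and $\sum_i (\Delta q_i)^2/q_i$ all evaluate to the same number.

```python
# Hypothetical initial and derived frequencies for three types.
q = [0.5, 0.3, 0.2]
q_new = [0.4, 0.35, 0.25]

dq = [b - c for c, b in zip(q, q_new)]        # Δq_i
w = [b / c for c, b in zip(q, q_new)]         # w_i = q_i' / q_i
a = [wi - 1 for wi in w]                      # a_i = w_i - 1 = Δq_i / q_i

dq_dot_a = sum(d * ai for d, ai in zip(dq, a))                # Δq · a
second_moment = sum(qi * ai ** 2 for qi, ai in zip(q, a))    # q · a²
w_bar = sum(qi * wi for qi, wi in zip(q, w))                  # equals 1
var_w = sum(qi * (wi - w_bar) ** 2 for qi, wi in zip(q, w))  # V_w
frequency_form = sum(d ** 2 / qi for d, qi in zip(dq, q))     # Σ (Δq_i)² / q_i
```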

We can also rewrite the second term of Equation 6 by rearranging Equation 8 as

$q'\cdot\Delta a = \Delta q\cdot\left(\frac{\Delta^2 q}{\Delta q} - a\right)$ (11)

in which

$\Delta^2 q = \Delta q' - \Delta q$, with $\Delta q' = q'' - q'$ the change in the subsequent step,

which measures the nonlinearity, or bending, in the changes of q across subsequent steps and is roughly analogous to an acceleration.

Note that Equation 11 has Δq_i terms in the denominator, which may appear to be problematic when such terms include zero values. However, each term is always part of a dot product, yielding values of $\Delta q_i\left(\Delta^2 q_i/\Delta q_i\right) = \Delta^2 q_i$ for each term; thus, we can always interpret such terms directly by their actual value. The reason for splitting the terms in the manner of Equation 11 follows at the end of this section.

Note also that

$\Delta q\cdot\frac{\Delta^2 q}{\Delta q} = \sum_i \Delta^2 q_i = 0$ (12)

by the conservation of total probability. However, in each individual dimension, i, the value of $\Delta^2 q_i$ is not necessarily zero. Although the total value is constrained to be zero, it is often useful to retain this term to emphasize the fact that the values in each dimension can vary.
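A Python sketch, using three hypothetical successive frequency vectors, illustrates Equations 11 and 12: the components of $\Delta^2 q$ are individually nonzero but sum to zero, and $q'\cdot\Delta a$ equals the right side of Equation 11. Each product $\Delta q_i(\Delta^2 q_i/\Delta q_i)$ is evaluated as $\Delta^2 q_i$, following the interpretation given above for zero-valued $\Delta q_i$.

```python
# Hypothetical frequencies at three successive points: q, q', and q''.
q0 = [0.5, 0.3, 0.2]
q1 = [0.4, 0.35, 0.25]
q2 = [0.32, 0.38, 0.30]

dq = [b - a for a, b in zip(q0, q1)]          # Δq  = q'  - q
dq1 = [c - b for b, c in zip(q1, q2)]         # Δq' = q'' - q'
d2q = [y - x for x, y in zip(dq, dq1)]        # Δ²q = Δq' - Δq

a0 = [d / qi for d, qi in zip(dq, q0)]        # a  = Δq / q
a1 = [d / qi for d, qi in zip(dq1, q1)]       # a' = Δq' / q'

lhs = sum(qi * (y - x) for qi, x, y in zip(q1, a0, a1))   # q' · Δa
# Right side of Eq. 11, with each Δq_i (Δ²q_i / Δq_i) written as Δ²q_i so
# that no division by a zero Δq_i can occur.
rhs = sum(d2q) - sum(d * x for d, x in zip(dq, a0))
```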

We can combine the various pieces to express the Price equation partition for the change in relative fitness in Equation 6 as

$\Delta q\cdot a + \Delta q\cdot\left(\frac{\Delta^2 q}{\Delta q} - a\right) = 0$ (13)

or, using a = Δq/q, as

$\Delta q\cdot\frac{\Delta q}{q} + \Delta q\cdot\left(\frac{\Delta^2 q}{\Delta q} - \frac{\Delta q}{q}\right) = 0$ (14)

The second form emphasizes that this expression is given purely as the nondimensional description of changes in frequency or probability. Later, it will be useful to drop the middle term using the identity in Equation 12, leading to the form in Equation 9 expressed as

$\Delta q\cdot\frac{\Delta q}{q} - \Delta q\cdot\frac{\Delta q}{q} = 0$ (15)

8 Balance of Direct and Inertial Forces

The previous sections described the conservation of total probability, which imposes strong constraints on the geometry of change in populations. In particular, the dynamics of probability distributions must move along the constraint that the total probability remains unchanged. Within that constraint, the probability distributions that characterize populations may change in response to directly applied forces, such as biological fitness or physical forces or informational processes.

This section analyzes the changes in probability distributions in response to direct forces and subject to the constraint of conserved total probability. The previous section established the key equations. On the abstract side, Equation 7 presented the partition between the forces that directly change frequencies, F, and the forces that change the inertial frame of reference for the population, I, as

$\Delta q\cdot(F + I) = 0$

which expresses a nondimensional analogy of d'Alembert's principle with respect to the balance between the direct and inertial components (Lanczos, 1986). d'Alembert's principle describes classical physical laws of motion in systems that conserve total energy, for example, motion that does not lose energy by friction and dissipation of heat. I previously discussed d'Alembert's principle in the context of frequency changes in populations (Frank, 2015). Here, I repeat a few key points from my previous article.

The term F is the vector of direct forces acting on the system, and the term I is the vector of inertial forces that balance the direct forces to achieve no net change. d'Alembert's principle can be thought of as a generalization of Newton's second law of motion (Lanczos, 1986), in which $F = \mu\ddot{x}$ is read as the total force, $F$, equals mass, μ, times total acceleration, $\ddot{x}$. Total force and total acceleration must include forces of constraint, which in our case means that Σ Δq_i = 0. If we write total inertial force as $I = -\mu\ddot{x}$, then Newton's law is $F + I = 0$.

In d'Alembert's formulation, the direct and inertial forces typically do not sum to zero, F + I ≠ 0, because those terms do not include the constraining forces that act on Δq. Instead, in d'Alembert's expression Δq · (F + I) = 0, the term Δq · F combines the direct and constraining forces, and the term Δq · I combines all inertial forces, including any forces of constraint. Newton's law is a special case of the more general principle of d'Alembert (Lanczos, 1986).

Here is a simple intuitive description of d'Alembert's principle (Wikipedia, 2015). You are sitting in a car at rest, and the car suddenly accelerates. You feel thrown back into the seat. But, even as the car gains speed, you effectively do not move in relation to the frame of reference of the car: Your velocity relative to the car remains zero. That net zero velocity can be thought of as the balance between the direct force of the seat pushing on you and the inertial force sending you back as the car accelerates forward.

As long as your frame of reference moves with you, then your net motion in your frame of reference is zero. Put another way, there is a changing frame of reference that zeroes net change by balancing the work of direct forces against the work of inertial forces. Although the system is a dynamic expression of changing components, it also has an overall static, equilibrium quality that aids analysis. As Lanczos (1986) emphasizes, d'Alembert's principle “focuses attention on the forces, not on the moving body…”

In terms of explicit notation for changes in frequencies, the previous section developed a Price equation expression for the partition of direct and inertial forces in Equation 14 as

$\Delta q\cdot\frac{\Delta q}{q} + \Delta q\cdot\left(\frac{\Delta^2 q}{\Delta q} - \frac{\Delta q}{q}\right) = 0$ (16)

with analogy to d'Alembert's form by expressing direct and inertial forces as

$F = \frac{\Delta q}{q}, \qquad I = \frac{\Delta^2 q}{\Delta q} - \frac{\Delta q}{q}$
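To illustrate the balance, the following Python sketch evaluates the direct and inertial forces for three hypothetical successive frequency vectors. The direct work $\Delta q\cdot F$ and the inertial work $\Delta q\cdot I$ cancel, even though the per-dimension sums $F_i + I_i = \Delta^2 q_i/\Delta q_i$ are not all zero. The sketch assumes that every $\Delta q_i$ is nonzero, so that the inertial force can be formed element-wise.

```python
# Hypothetical frequencies at three successive points: q, q', and q''.
q0 = [0.5, 0.3, 0.2]
q1 = [0.4, 0.35, 0.25]
q2 = [0.32, 0.38, 0.30]

dq = [b - a for a, b in zip(q0, q1)]                       # Δq
d2q = [(c - b) - (b - a) for a, b, c in zip(q0, q1, q2)]   # Δ²q

# Direct and inertial forces; this sketch assumes every Δq_i is nonzero.
F_direct = [d / qi for d, qi in zip(dq, q0)]               # F = Δq / q
I_inertial = [s / d - f for s, d, f in zip(d2q, dq, F_direct)]  # Δ²q/Δq - Δq/q

direct_work = sum(d * f for d, f in zip(dq, F_direct))       # Δq · F
inertial_work = sum(d * i for d, i in zip(dq, I_inertial))   # Δq · I
net_force = [f + i for f, i in zip(F_direct, I_inertial)]    # Δ²q_i / Δq_i
```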

For frequency changes, one can think of a coordinate system that locates a population as a point defined by the population's frequency or probability distribution. The direct work done to move the population in that coordinate system is Δq · F, the sum over dimensions of the displacement multiplied by the force, calculated while holding constant the frame of reference defined by the coordinate system. That direct work is balanced by the inertial work, Δq · I, done to accelerate the coordinate system that defines the frame of reference. The inertial work relocates the altered population and its associated forces so that, in the new frame of reference, the total displacement multiplied by force is zero.
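As a numerical sketch of this balance, the Python below assumes the earlier definition of relative fitness as w_i = q′_i/q_i, so that mean fitness is always one, and uses three made-up successive distributions q, q′, q″. It identifies the direct term with Δq · w and the inertial term with q′ · Δw and checks that they cancel. The frequency values and the helper name `price_components` are illustrative assumptions.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def price_components(q, q1, q2):
    """Direct and inertial Price terms for relative fitness.

    q, q1, q2 are successive probability vectors (q, q', q'').
    Fitness in each step is w_i = q'_i / q_i, so mean fitness stays 1.
    """
    w = [b / a for a, b in zip(q, q1)]      # fitness over q -> q'
    w1 = [c / b for b, c in zip(q1, q2)]    # fitness over q' -> q''
    dq = [b - a for a, b in zip(q, q1)]     # frequency change
    dw = [b - a for a, b in zip(w, w1)]     # change in fitness values
    direct = dot(dq, w)                     # work of the direct forces
    inertial = dot(q1, dw)                  # inertial term
    return direct, inertial


q = [0.5, 0.3, 0.2]
q1 = [0.4, 0.35, 0.25]
q2 = [0.3, 0.4, 0.3]
direct, inertial = price_components(q, q1, q2)
print(direct, inertial, direct + inertial)  # sum is zero up to rounding
```

The direct term is positive here and the inertial term exactly offsets it, because both q · w and q′ · w′ equal one.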

I use the word “force” here in an abstract, nondimensional manner, rather than in the specifically defined manner of classical physics. Such words can be a barrier to interdisciplinary insight and understanding. Readers highly trained in particular disciplines, such as physics, sometimes believe that a word such as “force” has a single correct meaning and associated units of expression. Any variant use of the word is thought to be misleading or mistaken. I take the opposite view. The underlying nondimensional geometry expresses the purest abstract notion of such concepts.

In each separate discipline, the particular dynamics and related equations have terms that take on specific interpretations, units, and meaning. Those specific aspects arise from the application of the same underlying universal geometry to particular problems, which usually means the same underlying conserved quantities and associated symmetries. The same geometry and abstract concepts will take on different units and interpretations in different disciplines.

9 Average Force Along a Path

In the Price equation description of change, we have only the differences between two populations. The two populations describe the initial and final probability distributions, q and q′. Each distribution can be thought of as a single point in a space of probability distributions. The separation between the two points is a nondimensional change that can be small or large. There is no underlying parameter, such as time or spatial distance, that defines the scale of separation and the path of change that connects the points.

Most applications analyze changes along a path with respect to an underlying parametric scale. To relate the Price equation to other theoretical frameworks, it is useful to add an abstract notion of change along a parametric path that connects the initial and final probability distributions.

Let θ be a parameter that describes change along a path that connects q to q′

$q_i(\theta) = q_i(\theta_0)\, e^{r_i \Delta\theta}$

in which Δθ = θ − θ₀. We can set θ₀ = 0 and thus write θ ≡ Δθ. For notational convenience, let the dependence of q(θ) on the parameter θ be implicit, so that we can write the same expression more simply as

$q_i' = q_i e^{r_i \theta}$ (17)

We can think of r_i as the average force acting along the path that moves the system from q_i to $q_i'$ with respect to total path length, θ = Δs², in the parametric length scale, s. Thus, r_iθ is the total force in the ith dimension along the path of change. For our purposes, we can treat s as a nondimensional scale, and think of r_i as having nondimensional units of 1/s², interpreted as a nondimensional force or acceleration. In biology, the force r_i is interpreted as the Malthusian expression of biological fitness in analyses of natural selection, connecting the abstract analysis here to models of biological evolution (Frank, 2015).

Note that

$r_i \theta = \log q_i' - \log q_i$

so that we may think of r_i as the average change in logarithmic coordinates of probability with respect to changes in the parametric length scale Δθ = Δs².

We can express the total nondimensional force in these logarithmic coordinates acting along the path of change from q to q′ as

$m_i = r_i \theta = \log(q_i'/q_i) = \log w_i$

Because m_i = log w_i, we can think of m_i as log fitness. Using m_i to express fitness, or force, the expression for change along a path in Equation 17 becomes

$q_i' = q_i e^{m_i}$
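A small sketch of the exponential path, assuming made-up frequencies: for any choice of total path length θ, the average force r_i = log(q′_i/q_i)/θ carries q to q′, and the total force r_iθ equals the log fitness m_i = log w_i whatever θ is. The function names `average_force` and `path_point` are hypothetical.

```python
import math


def average_force(q, q1, theta):
    """Average force r_i on an exponential path of length theta from q to q1."""
    return [math.log(b / a) / theta for a, b in zip(q, q1)]


def path_point(q, r, t):
    """Point q_i * exp(r_i * t) reached after path length t."""
    return [a * math.exp(ri * t) for a, ri in zip(q, r)]


q = [0.5, 0.3, 0.2]
q1 = [0.4, 0.35, 0.25]
log_w = [math.log(b / a) for a, b in zip(q, q1)]  # log fitness m_i

for theta in (0.5, 1.0, 4.0):
    r = average_force(q, q1, theta)               # r_i rescales with theta
    m = [ri * theta for ri in r]                  # total force r_i * theta
    end = path_point(q, r, theta)                 # endpoint of the path
    print(theta, [round(x, 12) for x in end],
          max(abs(a - b) for a, b in zip(m, log_w)))
```

Changing θ only rescales r; the endpoint and the total force m are unchanged.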

10 Comparing Linear and Logarithmic Coordinates

In linear coordinates, for each implicit i, we combine forces multiplicatively

$urn:x-wiley:20457758:media:ece32922:ece32922-math-0068$

in which $urn:x-wiley:20457758:media:ece32922:ece32922-math-0069$ separates $urn:x-wiley:20457758:media:ece32922:ece32922-math-0070$ into the segments $urn:x-wiley:20457758:media:ece32922:ece32922-math-0071$ , with $urn:x-wiley:20457758:media:ece32922:ece32922-math-0072$ between q and q′.

In logarithmic coordinates, we combine forces additively

$urn:x-wiley:20457758:media:ece32922:ece32922-math-0073$

The two coordinate systems describe the same total fitness, or force, as

$urn:x-wiley:20457758:media:ece32922:ece32922-math-0074$ (18)

We can decompose any fitness value and its associated vector, (q, q′), into a large number of small pieces. In principle, we could analyze large changes in frequency, Δq = q′ − q, by combining the changes along each small segment in a decomposition of total change.
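The composition described by Equation 18 can be checked numerically. The sketch below assumes a made-up path through intermediate distributions between q and q′, with segment fitness taken elementwise as the ratio of successive distributions (our labeling). The product of segment fitnesses recovers the total fitness w, and the sum of segment log fitnesses recovers the total log fitness m = log w.

```python
import math


def segment_fitness(path):
    """Per-segment fitness vectors q_j / q_{j-1} along a list of distributions."""
    return [[b / a for a, b in zip(prev, nxt)]
            for prev, nxt in zip(path, path[1:])]


path = [
    [0.5, 0.3, 0.2],     # q
    [0.45, 0.33, 0.22],  # intermediate point (made up)
    [0.42, 0.34, 0.24],  # intermediate point (made up)
    [0.4, 0.35, 0.25],   # q'
]
segments = segment_fitness(path)
q, q_final = path[0], path[-1]
w_total = [b / a for a, b in zip(q, q_final)]

# Linear coordinates: forces combine multiplicatively.
w_product = [math.prod(w_j[i] for w_j in segments) for i in range(len(q))]
# Logarithmic coordinates: forces combine additively.
m_sum = [sum(math.log(w_j[i]) for w_j in segments) for i in range(len(q))]

print(w_total, w_product)
print([math.log(w) for w in w_total], m_sum)
```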

11 Log Coordinates, Entropy and Information

The average value of log fitness is

$\bar m = q \cdot m = -\mathcal{D}(q \,\|\, q')$

in which

$\mathcal{D}(q \,\|\, q') = \sum_i q_i \log(q_i/q_i')$

is the Kullback–Leibler divergence (Cover & Thomas, 1991; Kullback, 1959). This divergence measures relative entropy by extending the classical measure of entropy, −q · log q, for a probability vector q, to a measure of the entropic divergence of q relative to a given probability vector, q′.

One can think of classical entropy for a probability vector, q, as a special case of the more general relative entropy by comparing q to a uniform distribution described by a constant probability vector in which $q_i' = 1/n$ for all i, with n the number of dimensions. In that case, $\mathcal{D}(q \,\|\, q') = \log n + q \cdot \log q$, which is log n minus the classical entropy. The Kullback–Leibler divergence is also a primary measure of information in statistics and information theory.

The properties of entropy and information derive from the fundamental geometric properties of logarithmic coordinates, such as the additivity described in the previous section.
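These identities are easy to check numerically. The sketch below, with made-up distributions and hypothetical helpers `kl` and `entropy`, computes the mean log fitness q · m with m_i = log(q′_i/q_i) and compares it with minus the Kullback–Leibler divergence of q relative to q′. It also confirms that the divergence from the uniform distribution equals log n minus the classical entropy.

```python
import math


def kl(p, r):
    """Kullback-Leibler divergence D(p || r) = sum_i p_i log(p_i / r_i)."""
    return sum(a * math.log(a / b) for a, b in zip(p, r) if a > 0)


def entropy(p):
    """Classical entropy -p . log p."""
    return -sum(a * math.log(a) for a in p if a > 0)


q = [0.5, 0.3, 0.2]
q1 = [0.4, 0.35, 0.25]
m = [math.log(b / a) for a, b in zip(q, q1)]  # log fitness m_i = log w_i
mean_m = sum(a * mi for a, mi in zip(q, m))   # average log fitness q . m
print(mean_m, -kl(q, q1))                     # equal

n = len(q)
uniform = [1.0 / n] * n
print(kl(q, uniform), math.log(n) - entropy(q))  # equal
```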

From the equality above, $\bar m = -\mathcal{D}(q \,\|\, q')$, we can write the change in mean log fitness as

$\Delta \bar m = \bar m' - \bar m = \mathcal{D}(q \,\|\, q') - \mathcal{D}(q' \,\|\, q'')$

which measures the bending, or curvature, of the divergence between the populations in the sequence $q \to q' \to q''$. When the divergence between successive steps remains constant, mean log fitness is invariant.

We can use the Price equation in Equation 2 to partition the total change in mean log fitness into direct and inertial components

$\Delta \bar m = \Delta q \cdot m + q' \cdot \Delta m$ (19)

The direct component is

$\Delta q \cdot m = \mathcal{J}(q, q')$

in which

$\mathcal{J}(q, q') = \mathcal{D}(q \,\|\, q') + \mathcal{D}(q' \,\|\, q) = \Delta q \cdot \log(q'/q)$ (20)

is the Jeffreys divergence. In earlier work, I showed that the Jeffreys divergence is the proper expression for the direct component of change caused by natural selection or, more generally, the component associated with direct forces when evaluated with respect to the fixed frame of reference given by the initial probability vector (Frank, 2012b).

For small changes, $\mathcal{J}$ and $2\mathcal{D}$ converge to the Fisher information metric. Thus, analyses of small changes often invoke $\mathcal{D}$, $\mathcal{J}$, or Fisher information without distinguishing between the measures. For small changes, the Fisher information metric is often preferable, because it has many useful geometric properties (Amari & Nagaoka, 2000) and is more widely known than $\mathcal{J}$. However, it is useful to keep in mind that, in general, $\mathcal{J}$ is the correct measure for the direct effect of natural selection, or for the direct component of change relative to a fixed frame of reference.
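To see the small-change convergence numerically, the sketch below perturbs a made-up distribution q by ε times a direction whose elements sum to zero, then compares the Jeffreys divergence and twice the Kullback–Leibler divergence with the Fisher information metric Σ dq_i²/q_i. Both ratios approach one as ε shrinks. The perturbation direction and step sizes are arbitrary illustrative choices.

```python
import math


def kl(p, r):
    """Kullback-Leibler divergence D(p || r)."""
    return sum(a * math.log(a / b) for a, b in zip(p, r))


def jeffreys(p, r):
    """Jeffreys divergence J(p, r) = D(p || r) + D(r || p)."""
    return kl(p, r) + kl(r, p)


def fisher(q, dq):
    """Fisher information metric sum_i dq_i^2 / q_i for a small displacement."""
    return sum(d * d / a for a, d in zip(q, dq))


q = [0.5, 0.3, 0.2]
direction = [-1.0, 0.4, 0.6]  # sums to zero: conserves total probability
for eps in (1e-1, 1e-2, 1e-3):
    dq = [eps * d for d in direction]
    q1 = [a + d for a, d in zip(q, dq)]
    f = fisher(q, dq)
    print(eps, jeffreys(q, q1) / f, 2 * kl(q, q1) / f)
```

The deviation of each ratio from one shrinks roughly in proportion to ε.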

The inertial component is

$q' \cdot \Delta m = \Delta \bar m - \mathcal{J}(q, q')$
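The partition of the change in mean log fitness can be checked with three made-up successive distributions q, q′, q″. The sketch assumes log fitness m_i = log(q′_i/q_i) for the first step and m′_i = log(q″_i/q′_i) for the second, following the earlier definition m_i = log w_i. It verifies that the change in mean log fitness equals both the difference of successive divergences and the sum of the direct (Jeffreys) and inertial components.

```python
import math


def kl(p, r):
    """Kullback-Leibler divergence D(p || r)."""
    return sum(a * math.log(a / b) for a, b in zip(p, r))


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


q = [0.5, 0.3, 0.2]
q1 = [0.4, 0.35, 0.25]
q2 = [0.3, 0.4, 0.3]

m = [math.log(b / a) for a, b in zip(q, q1)]     # log fitness, q -> q'
m1 = [math.log(c / b) for b, c in zip(q1, q2)]   # log fitness, q' -> q''
change = dot(q1, m1) - dot(q, m)                 # change in mean log fitness

curvature = kl(q, q1) - kl(q1, q2)               # difference of divergences
dq = [b - a for a, b in zip(q, q1)]
direct = dot(dq, m)                              # direct component
jeffreys = kl(q, q1) + kl(q1, q)                 # Jeffreys divergence
inertial = dot(q1, [b - a for a, b in zip(m, m1)])  # inertial component

print(change, curvature, direct + inertial)
print(direct, jeffreys)
```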

12 Small Changes: Prelude

In the remainder of this article, I focus only on the small changes that arise from forces acting at a given point. Small changes correspond to a single small segment in any larger path. I focus on small changes for two reasons.

First, the conceptual relations between different disciplines can be seen most clearly in small changes around a focal point.

Second, analysis of larger changes requires either the assumption of a constant force field (or potential function) or an explicit notion of how forces change with time and with the changing context of the population. Those required assumptions reduce the generality of any particular formulation and obscure the common conceptual basis of different subjects.

In the future, it would be useful to extend analysis to cases in which there is no meaningful decomposition of a large change vector into small segments and to cases in which there exists a constant force field for which one could reconstruct the path of change over a sequence of small segments. Such extensions exist within individual disciplines, but it remains unclear how to connect the analyses from those different subjects to a common unifying framework.

13 Small Changes: Analysis

When changes $\Delta q$ are small, I use the notation $dq$. For linear coordinates, we may write

$urn:x-wiley:20457758:media:ece32922:ece32922-math-0093$

and for logarithmic coordinates when $urn:x-wiley:20457758:media:ece32922:ece32922-math-0094$ is small, we may write

$urn:x-wiley:20457758:media:ece32922:ece32922-math-0095$

Because the consequence of forces is shift invariant in expressions such as

$urn:x-wiley:20457758:media:ece32922:ece32922-math-0096$

the linear and logarithmic expressions of force, w and m, are equivalent for small changes. We may express this equivalence explicitly by noting that, in general, the direct component of change was given earlier as

$\Delta q \cdot m = \Delta q \cdot \log(q'/q)$

which, when $\Delta q$ is small, we may write as

$dq \cdot (dq/q) = \sum_i dq_i^2/q_i$

This expression, which follows from $\log(q_i'/q_i) = \log(1 + dq_i/q_i) \approx dq_i/q_i$, is the Fisher information metric, which arises as the direct component of population change or natural selection (Frank, 2009), the limiting expression of the Jeffreys divergence given earlier.

14 Given Forces

I have defined $urn:x-wiley:20457758:media:ece32922:ece32922-math-0100$ as proportional to the force acting along the infinitesimal change $urn:x-wiley:20457758:media:ece32922:ece32922-math-0101$ . These expressions describe a consistency relation between force and frequency change. Often, we wish to consider how extrinsic or given forces cause change, rather than simply express consistency.

Suppose, for example, that we have a given force vector acting at the point in frequency space, q. The given force is the nondimensional vector

$F = \log(\hat q/q)$ (21)

Given the location, q, and the force vector, $F$, the vector $\hat q$ provides an alternative way to express the intensity of the force vector as $\log \hat q$. We can multiply $\hat q$ by an arbitrary positive constant, because the net consequences of a force vector are shift invariant: multiplying $\hat q$ by c > 0 adds the same constant, log c, to every F_i. Thus, we may implicitly consider $\hat q$ as the target and choose $\hat q$ to sum to one, satisfying the conservation of total probability.

As with m, we can write the total nondimensional force as a description of an exponential growth process

$\hat q_i = q_i e^{F_i}$

in which $\hat q_i$ is the endpoint of the exponential growth process that began at q_i. Thus, the location q and the “target” location $\hat q$ are sufficient to describe the given force vector. In the following, we will only be interested in small changes, $dq$, that result from the instantaneous given forces with respect to a fixed frame of reference. One goal will be to find the changes, $dq$, that arise from given forces and various constraints on change.

It is common in classical mechanics to define force, $F_i$, in relation to coordinates, q_i, by the negative gradient of a potential function Φ, which for our definition of $F$ leads to

$F_i = -\partial \Phi/\partial q_i = \log(\hat q_i/q_i)$

We can use the potential function

$\Phi = \mathcal{D}(q \,\|\, \hat q) - \left(\sum_i q_i - 1\right)$ (22)

in which the second term expresses the constraint on total probability, so that the resulting force includes the force of constraint. The average force, $q \cdot F = -\mathcal{D}(q \,\|\, \hat q)$, is also a relative entropy expression.

15 Extreme Action and Frequency Dynamics

The given forces and the conservation of total probability do not by themselves tell us what frequency changes occur. In the study of frequency changes, the simplest variational approach (Lanczos, 1986) finds the extremum (maximum or minimum) of a Lagrangian subject to a constraint. In our case, we may write

$\mathcal{L} = dq \cdot F - \frac{\kappa}{2}\left(dq \cdot \frac{dq}{q} - C^2\right) - \xi \sum_i dq_i$ (23)

in which we take as given the direct force in each dimension, $F_i$, and κ and ξ are Lagrange multipliers.

We measure the total change caused by the direct forces as $dq \cdot F$. That expression comes from Price's separation of direct and inertial forces in Equation 19. In terms of classical mechanics (Lanczos, 1986), the expression $dq \cdot F$ is the virtual work of the direct forces, in which work is distance times force (ignoring mass).

Geometrically, we can think of the constraint in the second term as fixing the total path length moved in frequency space (Amari & Nagaoka, 2000), in which $C^2$ measures distance by the Fisher information metric for infinitesimal displacements, $dq \cdot (dq/q) = \sum_i dq_i^2/q_i$, or, biologically, C² is the variance in fitness. I assume that C² is chosen so that a solution exists that satisfies the constraints. The final term constrains total probability to remain constant.

The constraints $dq \cdot (dq/q) = C^2$ and $\sum_i dq_i = 0$ do not by themselves determine which frequency changes actually occur. Many different frequency change vectors, $dq$, satisfy those two constraints.

Given these forces and constraints, what actual path do the dynamics follow? In other words, what is the realized vector $dq$? We can think of the first term in the Lagrangian as the action, and extremize the action subject to the given constraints (Lanczos, 1986). That action term is $dq \cdot F$, the product of the displacement times the given force, which is the virtual work. In this case, maximizing the virtual work in the Lagrangian finds the displacement $dq$ aligned with the direct and constraining forces.

To find the extreme action path, we evaluate $\partial \mathcal{L}/\partial dq_i = F_i - \kappa\, dq_i/q_i - \xi = 0$, which yields

$dq_i = \kappa^{-1} q_i (F_i - \bar F)$ (24)

in which $F_i - \bar F$ is the excess force relative to the average, and $\xi = \bar F = q \cdot F$ follows from satisfying the conservation of total probability, Σ dq_i = 0, and the assumption that the virtual displacements are small. The constant of proportionality is set by

$\kappa = \sigma_F / C$ (25)

which satisfies the constraint on total path length, in which $\sigma_F$ is the standard deviation of the direct forces.

Here, we have deduced a fundamental expression for frequency dynamics by the principle of extreme action. We can rewrite the expression for frequency dynamics as

$m_i = dq_i/q_i = \kappa^{-1}(F_i - \bar F)$ (26)

which shows that the forces, m_i, may be arrived at inductively by consistency with given changes, $dq$. This expression also shows that the forces described by m are related by affine transformation to a vector of given forces, $F$, from which one may deduce the actual frequency changes.
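The extreme action solution can be sketched numerically. The Python below assumes that the solution has the form dq_i = q_i(F_i − F̄)/κ with κ = σ_F/C, where κ is our label for the constant, and uses made-up values for q, the target q̂, and C. It checks both constraints, then compares the virtual work dq · F with that of random displacements rescaled to satisfy the same constraints; none should exceed the extreme action path. It also checks that multiplying q̂ by a positive constant leaves dq unchanged, reflecting the shift invariance of the forces. The helper names are hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def extreme_action_step(q, q_hat, c):
    """Displacement dq_i = q_i (F_i - Fbar) / kappa with F = log(q_hat / q).

    kappa = sigma_F / c makes the Fisher length sum_i dq_i^2 / q_i equal c^2.
    """
    f = [math.log(h / a) for a, h in zip(q, q_hat)]
    f_bar = dot(q, f)                                        # average force
    sigma_f = math.sqrt(dot(q, [(x - f_bar) ** 2 for x in f]))
    kappa = sigma_f / c
    return [a * (x - f_bar) / kappa for a, x in zip(q, f)], f


def feasible_random(q, c, rng):
    """Random displacement with zero sum and Fisher length c, for comparison."""
    v = [rng.gauss(0.0, 1.0) for _ in q]
    v_bar = sum(v) / len(v)
    v = [x - v_bar for x in v]                               # sum_i dq_i = 0
    scale = c / math.sqrt(sum(x * x / a for a, x in zip(q, v)))
    return [scale * x for x in v]


q = [0.5, 0.3, 0.2]
q_hat = [0.2, 0.3, 0.5]
c = 0.01
dq, f = extreme_action_step(q, q_hat, c)
print(sum(dq), sum(d * d / a for a, d in zip(q, dq)), c * c)

rng = random.Random(1)
best_random = max(dot(feasible_random(q, c, rng), f) for _ in range(1000))
print(dot(dq, f), best_random)  # the extreme action path does the most work

dq_scaled, _ = extreme_action_step(q, [3.0 * h for h in q_hat], c)
print(max(abs(a - b) for a, b in zip(dq, dq_scaled)))  # ~0: shift invariance
```

By the Cauchy–Schwarz inequality in the q-weighted inner product, no feasible displacement can do more virtual work than C σ_F, which the extreme action path attains.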

16 Direct Forces and Constraining Forces

The distinction between direct and constraining forces is arbitrary. We may choose to describe a force by its constraint on allowable displacements, $dq$, or by its inclusion in the direct forces, $F$.

The Lagrangian in Equation 23 defines the action to be extremized as the work done along the path, which is the total displacement, $dq$, times the direct component of force, $F$. We can use $urn:x-wiley:20457758:media:ece32922:ece32922-math-0144$ rather than $urn:x-wiley:20457758:media:ece32922:ece32922-math-0145$ for force, because we can ignore the constant, $urn:x-wiley:20457758:media:ece32922:ece32922-math-0146$, and $urn:x-wiley:20457758:media:ece32922:ece32922-math-0147$.

The constraining forces in the Lagrangian of Equation 23 are the fixed path length, $dq \cdot (dq/q) = C^2$, and the conservation of total probability, $\sum_i dq_i = 0$.

We are free to relabel a component of the direct force as a constraining force (Lanczos, 1986). In practice, deriving the altered Lagrangian provides an easy way to see how the changed labeling of direct and constraining forces enters into the analysis.

Consider the direct forces as defined in Equation 21 as

$F = \log \hat q - \log q$

We can think of this expression as the sum of two component forces, $\log \hat q$ and −log q. The virtual work term of the direct forces becomes

$dq \cdot F = dq \cdot \log \hat q - dq \cdot \log q$ (27)

We may choose to relabel $dq \cdot \log \hat q$ as a force of constraint. The remaining term, $-dq \cdot \log q$, becomes the virtual work associated with the direct forces. The next section illustrates how this change in labeling can be useful.

17 Conserved System Quantities as the Primary Forces of Constraint

In relabeling $dq \cdot \log \hat q$ as a constraining force, we may write

$\log \hat q = \log k - \lambda z$ (28)

in which log k is understood to be a constant vector with elements log k when used in a vector context, k is chosen so that $\hat q$ obeys the conservation of total probability, the term λ is a positive constant, and z_i > 0 is chosen to make the equality hold. Thus, we can express the force associated with $\log \hat q$ using z_i. The constraining force now becomes associated with the component

$dq \cdot \log \hat q = -\lambda\, dq \cdot z$ (29)

in which the constant log k drops out because Σ dq_i = 0. The advantage of using z is that we may define the force of constraint directly in terms of any system quantity that we may associate with z. Each z_i is, in this analysis, a given value associated with a subset i of the population. We can use any quantity for z, including energy or momentum or monetary wealth or a quantitative biological trait.

Often, underlying quantities of a system, x_i, become transformed by various processes before we evaluate the final quantity of the outcome, z_i. We may, in general, consider z_i = T(x_i), in which x_i is an intrinsic quantitative value associated with the subset i, and T(x_i) is a transformation that defines a scaling relation between the intrinsic x_i values and the constraining force, z_i. The analysis of pattern often reduces to understanding the processes that set the scaling relation, T (Frank, 2014).

Because we can define z_i = T(x_i) in any way, the quantity $q \cdot z$ can represent almost any sort of functional on the system. This expression, $\bar z = q \cdot z$, is also the average value of z. It is often useful to consider changes in $\bar z$, with infinitesimal change as

$d\bar z = dq \cdot z + q \cdot dz$ (30)

which we obtain by a simple chain rule expansion of the differential, yielding an infinitesimal expression of the Price equation given in Equation 2.
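The chain-rule expansion is the small-change limit of the exact Price equation, Δz̄ = Δq · z + q′ · Δz. The sketch below, with made-up values for q and z and arbitrary directions of change (both assumptions for illustration), checks the exact discrete identity and shows that the infinitesimal form dq · z + q · dz differs from the actual change only by the second-order term dq · dz.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def price_terms(q, z, dq, dz):
    """Return (actual change in mean z, direct term, inertial term)."""
    q1 = [a + d for a, d in zip(q, dq)]
    z1 = [a + d for a, d in zip(z, dz)]
    change = dot(q1, z1) - dot(q, z)        # change in the mean of z
    return change, dot(dq, z), dot(q1, dz)  # dq . z and q' . dz


q = [0.5, 0.3, 0.2]
z = [1.0, 2.0, 4.0]
dq_dir = [-1.0, 0.4, 0.6]   # sums to zero: conserves total probability
dz_dir = [0.5, -0.2, 0.1]

for eps in (1e-1, 1e-3):
    dq = [eps * d for d in dq_dir]
    dz = [eps * d for d in dz_dir]
    change, direct, inertial = price_terms(q, z, dq, dz)
    infinitesimal = dot(dq, z) + dot(q, dz)  # uses q in place of q'
    print(eps, change - (direct + inertial), change - infinitesimal)
```

The first difference is zero up to rounding for any step size; the second shrinks as ε².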

If $d\bar z$ is constrained, then that constraint defines the constraint on $dq \cdot \log \hat q$ in Equation 29. For example, the total system quantity $\bar z$ may be conserved, which means that $d\bar z = 0$. If the z quantities do not themselves change, then $dz = 0$, and consequently, we have the constraint on the given forces $dq \cdot \log \hat q = -\lambda\, dq \cdot z = 0$. We may also consider other ways in which $d\bar z$ is constrained, thereby defining the given forces $\log \hat q$ that determine dynamics.

18 Maximum Entropy Production Principle

With the split between direct and constraining forces in Equation 27, and the expression of the constraining forces in terms of z in Equation 29, we can write a new Lagrangian that is equivalent to the Lagrangian in Equation 23, using dot product notation

$urn:x-wiley:20457758:media:ece32922:ece32922-math-0172$ (31)

The first term is the total action to be maximized, which is the virtual work of the direct forces, $-dq \cdot \log q$. The other terms describe the constraints on the path that $dq$ may follow. I assume that C² and B are chosen such that a solution exists.

The classical definition of entropy is −q · log q. Because Σ dq_i = 0, a small displacement changes entropy by −dq · log q. Thus, the path $dq$ that maximizes $-dq \cdot \log q$, subject to the constraints on $dq$, is, in the limit of small changes, the path that maximizes the production of entropy subject to the constraints: the maximum entropy production principle (see Appendix for references).
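The link between virtual work and entropy production rests on the first-order identity d(−q · log q) = −dq · log q when Σ dq_i = 0. The sketch below checks it numerically for a made-up q and a displacement direction that conserves total probability; both are illustrative assumptions.

```python
import math


def entropy(p):
    """Classical entropy -p . log p."""
    return -sum(a * math.log(a) for a in p)


q = [0.5, 0.3, 0.2]
direction = [-1.0, 0.4, 0.6]  # sums to zero

for eps in (1e-2, 1e-4):
    dq = [eps * d for d in direction]
    q1 = [a + d for a, d in zip(q, dq)]
    actual = entropy(q1) - entropy(q)                          # entropy produced
    predicted = -sum(d * math.log(a) for a, d in zip(q, dq))   # -dq . log q
    print(eps, actual, predicted, actual / predicted)
```

The ratio approaches one as the step shrinks; the residual is the second-order term −½ Σ dq_i²/q_i.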

The idea is that the most likely path is the one that maximizes the production of entropy, which is equivalent to the maximization of the virtual work of the direct forces, $-dq \cdot \log q$, subject to the constraints on $dq$. The constraints in $urn:x-wiley:20457758:media:ece32922:ece32922-math-0180$ include all forces that determine the location of $urn:x-wiley:20457758:media:ece32922:ece32922-math-0181$.

The maximum entropy production principle is always true, in the sense that one can always split the total direct forces, $F = \log(\hat q/q)$, into a constraining component, $\log \hat q$, and a direct component, −log q. The extent to which maximum entropy production is meaningful depends on two questions. First, how meaningful is it to treat $\log \hat q$ as a constraint? Second, how meaningful is it to consider paths of change in the context of the Price equation separation of direct and inertial forces, a generalization of d'Alembert's principle?

In order to answer those questions about maximum entropy production, the next section analyzes dynamics with respect to z as a constraint. The following section discusses the Jaynesian theory of maximum entropy in relation to equilibrium thermodynamic expressions for common probability distributions. After those two sections, I return to the broader question of how to interpret the maximum entropy production principle in terms of the Price equation.

19 Maximum Entropy Path Subject to Constraint

To interpret the meaning of z as a constraint, we return to the Lagrangian in Equation 31. That Lagrangian is equivalent to the form in Equation 23, so solving $\partial \mathcal{L}/\partial dq = 0$ yields a solution equivalent to Equation 24, which we can expand to emphasize alternative interpretations

$urn:x-wiley:20457758:media:ece32922:ece32922-math-0186$

with deviations from average values $urn:x-wiley:20457758:media:ece32922:ece32922-math-0187$ and

$urn:x-wiley:20457758:media:ece32922:ece32922-math-0188$

in which $urn:x-wiley:20457758:media:ece32922:ece32922-math-0189$ is the traditional definition of system entropy. Thus, $urn:x-wiley:20457758:media:ece32922:ece32922-math-0190$ is the deviation of the entropy in the ith dimension from the system entropy. The constant $urn:x-wiley:20457758:media:ece32922:ece32922-math-0191$ is absorbed by expressing $urn:x-wiley:20457758:media:ece32922:ece32922-math-0192$ and $urn:x-wiley:20457758:media:ece32922:ece32922-math-0193$ as deviations from their average values. The constant $urn:x-wiley:20457758:media:ece32922:ece32922-math-0194$ is given by Equation 25, in which $urn:x-wiley:20457758:media:ece32922:ece32922-math-0195$ is the standard deviation of the forces, $urn:x-wiley:20457758:media:ece32922:ece32922-math-0196$ .

The constraint $urn:x-wiley:20457758:media:ece32922:ece32922-math-0197$ implies

$urn:x-wiley:20457758:media:ece32922:ece32922-math-0198$

The term β_ɛz is the regression coefficient of $urn:x-wiley:20457758:media:ece32922:ece32922-math-0199$ on z_i, which transforms the scale for the forces of constraint imposed by z to be on a common scale with the direct forces of entropy, −log q. The term $urn:x-wiley:20457758:media:ece32922:ece32922-math-0200$ describes the required force of constraint on frequency changes so that the new frequencies move $urn:x-wiley:20457758:media:ece32922:ece32922-math-0201$ by the amount $urn:x-wiley:20457758:media:ece32922:ece32922-math-0202$. The term $urn:x-wiley:20457758:media:ece32922:ece32922-math-0203$ is the variance in z.

When the z values change, the changing frame of reference with respect to z follows from Equation 30 as $urn:x-wiley:20457758:media:ece32922:ece32922-math-0204$ . When $urn:x-wiley:20457758:media:ece32922:ece32922-math-0205$ is a conserved quantity and the z values remain constant such that $urn:x-wiley:20457758:media:ece32922:ece32922-math-0206$ , then $urn:x-wiley:20457758:media:ece32922:ece32922-math-0207$ . When B = 0, the force of constraint for the conserved quantity is expressed simply by $urn:x-wiley:20457758:media:ece32922:ece32922-math-0208$ .

20 Equilibrium Thermodynamics and Probability

This section analyzes how the system equilibrium arises from the direct force causing maximum increase in entropy and the constraining forces imposed by z. That equilibrium can be interpreted as the maximum entropy probability distribution.

The dynamics are expressed in Equation 24 as $urn:x-wiley:20457758:media:ece32922:ece32922-math-0209$ . Equilibrium requires that the forces be constant in each dimension, thus $urn:x-wiley:20457758:media:ece32922:ece32922-math-0210$ . We can take that condition as the forces in each dimension given by

$urn:x-wiley:20457758:media:ece32922:ece32922-math-0211$

which means that the equilibrium condition can be written as $q = \hat q$. We can express $\hat q$ in terms of the system quantities, z, that set the forces of constraint. From Equation 28, we write the equilibrium condition as $\log q = \log k - \lambda z$, or

$q_i = k e^{-\lambda z_i}$

That probability distribution is the classic Jaynesian thermodynamic equilibrium (Jaynes, 1957a,b, 2003) that arises by maximizing entropy subject to a constraint on $urn:x-wiley:20457758:media:ece32922:ece32922-math-0216$ . That constraint is usually interpreted as a conserved quantity, such that $urn:x-wiley:20457758:media:ece32922:ece32922-math-0217$ , and $urn:x-wiley:20457758:media:ece32922:ece32922-math-0218$ . We can use multiple constraints on a set of system values $urn:x-wiley:20457758:media:ece32922:ece32922-math-0219$ , and replace $urn:x-wiley:20457758:media:ece32922:ece32922-math-0220$ by $urn:x-wiley:20457758:media:ece32922:ece32922-math-0221$ summed over j. For simplicity, I focus on a single constraint.

Suppose we want to find a Lagrangian that leads to the Jaynesian equilibrium, in which the defined forces $\log \hat q = \log k - \lambda z$ arise from a constraint on a conserved system quantity, $\bar z = q \cdot z$. The following Jaynesian Lagrangian does the job

$\mathcal{L} = \mathcal{H} - \psi\left(\sum_i q_i - 1\right) - \lambda\left(q \cdot z - \bar z\right)$ (32)

in which $\mathcal{H} = -q \cdot \log q$ is the classical expression for entropy defined earlier. This Lagrangian is simply the entropy, $\mathcal{H}$, subject to two constraints. First, the total probability must be one. Second, the system quantity $q \cdot z$ is conserved and equal to $\bar z$. The terms ψ and λ are the Lagrange multipliers that adjust to guarantee that the constraints are satisfied.

Maximum entropy subject to the constraints requires $\partial \mathcal{L}/\partial q_i = -\log q_i - 1 - \psi - \lambda z_i = 0$, which yields the maximum entropy probability distribution

$q_i = k e^{-\lambda z_i}$

in which $k = e^{-(1 + \psi)} = 1/\sum_j e^{-\lambda z_j}$, and λ is chosen so that $q \cdot z = \bar z$. We can extend this result to unify the commonly observed probability distributions within a single framework by noting that $z_i = T(x_i)$ is an arbitrary scaling relation of an underlying value, x_i (Frank, 2014, 2016).
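A minimal sketch of the maximum entropy distribution q_i = k e^{−λz_i}, assuming made-up values for z and λ, with k = 1/Σ_j e^{−λz_j}. To check the maximum entropy property, the code perturbs the distribution along a direction that preserves both total probability and the mean of z (the cross product of (1, 1, 1) with z, which works only for three dimensions) and confirms that entropy decreases on both sides. The helper names are hypothetical.

```python
import math


def gibbs(z, lam):
    """Maximum entropy distribution q_i = k exp(-lam * z_i), k normalizing."""
    weights = [math.exp(-lam * zi) for zi in z]
    k = 1.0 / sum(weights)
    return [k * w for w in weights]


def entropy(p):
    """Classical entropy -p . log p."""
    return -sum(a * math.log(a) for a in p)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


z = [1.0, 2.0, 4.0]
lam = 0.7
q = gibbs(z, lam)
mean_z = sum(a * b for a, b in zip(q, z))

# Direction orthogonal to (1, 1, 1) and to z: keeps total probability and mean z.
v = cross([1.0, 1.0, 1.0], z)
for t in (-0.01, 0.01):
    p = [a + t * b for a, b in zip(q, v)]
    print(t, sum(p), sum(a * b for a, b in zip(p, z)) - mean_z,
          entropy(p) < entropy(q))
```

Along that direction the first derivative of entropy is λ Σ v_i z_i = 0 and the second derivative is −Σ v_i²/q_i < 0, so q is a constrained maximum.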

Two conclusions follow. First, equilibrium probability distributions at maximum entropy express the force of constraint on total probability and the forces of constraint on total system quantities. The point of maximum entropy occurs at the minimum relative entropy, $\mathcal{D}(\mathbf{q}\,\|\,\tilde{\mathbf{q}}) = \sum_i q_i \log(q_i/\tilde{q}_i)$, which reaches its minimum value of zero as $\mathbf{q} \to \tilde{\mathbf{q}}$.
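A short derivation makes the first conclusion concrete. Write $\mathcal{E}(\mathbf{q}) = -\sum_i q_i \log q_i$ and $\mathcal{D}(\mathbf{q}\,\|\,\tilde{\mathbf{q}}) = \sum_i q_i \log(q_i/\tilde{q}_i)$. For any $\mathbf{q}$ that satisfies both constraints, substituting $\log\tilde{q}_i = \log k - \lambda z_i$ gives $\mathcal{D}(\mathbf{q}\,\|\,\tilde{\mathbf{q}}) = -\mathcal{E}(\mathbf{q}) - \log k + \lambda\bar{z} = \mathcal{E}(\tilde{\mathbf{q}}) - \mathcal{E}(\mathbf{q})$, so maximizing entropy and minimizing relative entropy coincide. The Python sketch below checks that identity numerically; the values of $z$, $\lambda$, and the perturbation direction are hypothetical.

```python
import math

def entropy(q):
    """E(q) = -sum q_i log q_i."""
    return -sum(qi * math.log(qi) for qi in q if qi > 0)

def kl(p, r):
    """Relative entropy D(p || r) = sum p_i log(p_i / r_i)."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r) if pi > 0)

z = [0.0, 1.0, 2.0, 3.0]                   # hypothetical system values
lam = 0.5                                  # hypothetical multiplier
w = [math.exp(-lam * zi) for zi in z]
total = sum(w)
q_tilde = [wi / total for wi in w]         # q~_i = k exp(-lam z_i), k = 1/total
zbar = sum(qi * zi for qi, zi in zip(q_tilde, z))

# d sums to zero and is orthogonal to z, so q keeps total probability and zbar
d = [1.0, -1.0, -1.0, 1.0]
q = [qt + 0.05 * di for qt, di in zip(q_tilde, d)]

lhs = kl(q, q_tilde)                       # D(q || q~)
rhs = entropy(q_tilde) - entropy(q)        # entropy deficit relative to q~
print(lhs, rhs)
```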

Second, pattern follows from the values of z that set the forces of constraint and thus the magnitudes of $\tilde{q}_i$. How the z values arise has not been specified. Thus, the study of pattern often reduces to the study of how various processes set z. The analysis here clarifies how those processes and the associated maximum entropy probability distribution relate to the universal Price equation expression for the dynamics of populations.

21 Interpretation of Maximum Entropy Path

The previous sections analyzed forces in terms of Price's partition of direct and inertial forces, an abstract generalization of d'Alembert's principle of mechanics. By analogy with d'Alembert's principle, the Price equation term $\Delta\mathbf{q}\cdot\mathbf{F}$ can be thought of as an abstraction of the virtual work associated with the direct and constraining forces.

The direct forces are F. The constraining forces are included in the allowable set of displacements, $\Delta\mathbf{q}$, taken relative to the fixed frame of reference. Such displacements relative to a fixed frame of reference are sometimes called virtual displacements, thus the name virtual work for the term $\Delta\mathbf{q}\cdot\mathbf{F}$. The Lagrangian expressions provide a method for maximizing the virtual work subject to the constraints that limit the possible set of displacements.

We may interpret the partition of direct and constraining forces in different ways, to match the interpretation of different problems. In this article, I split the total direct forces into a direct force that increases entropy, $\mathbf{F} = -\log\mathbf{q}$, and a set of potential virtual displacements, $\Delta\mathbf{q}$, that obey the forces of constraint defined by conservation of a functional, $\bar{z} = \mathbf{q}\cdot\mathbf{z}$, of the system quantities, $\mathbf{z}$, where one can think of each $z_i$ as a function on the subset, $i$, of the population.

In particular, I defined the total direct forces by $\tilde{\mathbf{F}} = \log(\tilde{\mathbf{q}}/\mathbf{q})$, and then split those forces as

$$\log\frac{\tilde{\mathbf{q}}}{\mathbf{q}} = -\log\mathbf{q} + \log\tilde{\mathbf{q}}.$$

If we take $\tilde{\mathbf{F}} = \log(\tilde{\mathbf{q}}/\mathbf{q})$ as the direct forces, then the frequency changes can be obtained from the Lagrangian in Equation 23 that maximizes the action $\Delta\mathbf{q}\cdot\log(\tilde{\mathbf{q}}/\mathbf{q})$, which is equivalent to minimizing the change in relative entropy, $\Delta\mathcal{D}(\mathbf{q}\,\|\,\tilde{\mathbf{q}})$.

If we take $-\log\mathbf{q}$ as the direct forces, then the frequency changes can be obtained from the Lagrangian in Equation 31 that maximizes the action $-\Delta\mathbf{q}\cdot\log\mathbf{q}$, which is equivalent to maximizing the gain in entropy, $\Delta\mathcal{E}$.
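The two equivalences in the preceding paragraphs are first-order statements. Because $\sum_i \Delta q_i = 0$, a Taylor expansion gives $\Delta\mathcal{E} \approx -\Delta\mathbf{q}\cdot\log\mathbf{q}$ and $\Delta\mathcal{D}(\mathbf{q}\,\|\,\tilde{\mathbf{q}}) \approx -\Delta\mathbf{q}\cdot\log(\tilde{\mathbf{q}}/\mathbf{q})$, with errors of second order in $\Delta\mathbf{q}$. This Python sketch checks both approximations for one hypothetical small displacement; it is a consistency check, not the optimized path.

```python
import math

def entropy(q):
    return -sum(qi * math.log(qi) for qi in q)

def kl(p, r):
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))

q = [0.1, 0.2, 0.3, 0.4]                    # hypothetical current frequencies
z = [0.0, 1.0, 2.0, 3.0]                    # hypothetical system values
w = [math.exp(-0.5 * zi) for zi in z]       # hypothetical lam = 0.5
total = sum(w)
q_tilde = [wi / total for wi in w]

eps = 1e-6
dq = [eps * di for di in (1.0, -1.0, 1.0, -1.0)]   # sum(dq) = 0
q_new = [qi + di for qi, di in zip(q, dq)]

# Entropy: exact change versus the action -dq . log q
dE_exact = entropy(q_new) - entropy(q)
dE_linear = -sum(di * math.log(qi) for di, qi in zip(dq, q))

# Relative entropy: exact change versus -dq . log(q~ / q)
dD_exact = kl(q_new, q_tilde) - kl(q, q_tilde)
dD_linear = -sum(di * math.log(ti / qi) for di, ti, qi in zip(dq, q_tilde, q))
print(dE_exact, dE_linear, dD_exact, dD_linear)
```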

In other words, the realized path maximizes the production of entropy when analyzed within the fixed frame of reference, hence the maximum entropy production principle. That conclusion holds only within the d'Alembert–Price distinction between direct and constraining forces, in which we choose to interpret all direct forces except entropy production as constraining forces on the possible virtual displacements, $\Delta\mathbf{q}$. In addition, the Price equation approach separates the changes in frame of reference that typically arise from change in location, $\mathbf{q}\to\mathbf{q}'$, or from change in the constraining forces, into the consequences of the inertial forces.

Maximum entropy production only holds for the partial change from the direct forces, when separating all direct forces other than entropy into the constraints, and when ignoring changes in the frame of reference associated with the inertial forces.

Does it make sense to follow this particular partition of forces into components? There is no correct answer to that question. The principle exists. The interpretations of usefulness and meaning will always have a strongly subjective aspect.

I follow Lanczos (1986) in the claim that separating direct, inertial, and constraining components is the great unifying perspective in the study of forces. In many systems, it makes sense to describe most of the applied forces in terms of the constraining forces of conserved system quantities. Often, all that remains is the only truly universal force, the increase of entropy, which completes the description of the total direct forces acting on a system.

In some cases, it may make sense to use a different partition of applied forces into direct and constraining component forces. When the remaining direct component of force differs from entropy alone, then it would appear that the system does not follow the maximum entropy production principle. However, it is better to say that the maximum entropy production principle always holds, but alternative expressions may provide a more meaningful perspective for particular problems.

In this interpretation, entropy is simply a geometric description of position and change for probability distributions when located in logarithmic coordinates. That fundamental geometry explains the universality of entropy, or information, in widely different disciplines and applications.

22 Geometry and the Fisher Information Metric

We can write the conservation of total probability expression in Equation 15 for small changes, $\Delta\mathbf{q}\to d\mathbf{q}$, as

$$d\mathbf{q}\cdot\mathbf{a} + \mathbf{q}'\cdot d\mathbf{a} = \mathcal{F}_F - \mathcal{F}_I = 0,$$

in which $\mathcal{F} = \sum_i (dq_i)^2/q_i$ is the Fisher information metric, and the subscripts on $\mathcal{F}$ denote the direct and inertial components of the Price equation.

In various models of natural selection, information, and entropy, different measures arise in terms of the Jeffreys divergence, $\mathcal{J}(\mathbf{q}',\mathbf{q})$, the Kullback–Leibler divergence, $\mathcal{D}(\mathbf{q}'\,\|\,\mathbf{q})$, and the Fisher information metric, $\mathcal{F}$. Confusion sometimes occurs because, in the limit of small changes, all three measures converge to an equivalent form that often appears as the Fisher information metric. That limiting equivalence hides the significant differences between the measures and the different situations to which each measure naturally applies.

The Fisher information metric is used in many applications (Cover & Thomas, 1991; Kullback, 1959). For example, Frieden (2004) has emphasized that this Fisher information partition subsumes nearly all of the key results of theoretical physics. Similarly, the subject of information geometry subsumes nearly all of the classical aspects of statistical inference through a Riemannian geometry based on the Fisher information metric (Amari & Nagaoka, 2000).

From the general perspective of the Price equation and d'Alembert's form for the conservation of total probability in Equation 7, the partition into Fisher information components arises as a special case in the limit of small changes (Frank, 2015). In that special case of Fisher information, in which $F_i = a_i = dq_i/q_i$, one does not separate the forces of constraint from the other directly applied forces. Instead, all directly applied and constraining forces combine into a single quantity that describes the path, in which that path has a natural geometric expression in terms of the Fisher information metric. That geometry is very useful in many applications. But it is important to recognize the more general perspective of Price and d'Alembert, which allows a deeper conceptual understanding of the different roles played by directly applied forces, constraining forces, and inertial forces.

One can think of the maximum entropy production principle in terms of Fisher information geometry. The universal direct force that increases entropy is always present. In addition to that universal direct force, various additional constraining forces combine to influence the curvature of the space of allowable virtual displacements. The direct and constraining forces combine to determine the paths of change within the Fisher information geometry (Amari & Nagaoka, 2000).

23 Direct Work, Information, and Entropy

I summarize in two parts. In this section, I briefly review the Price equation formulation of the work of the direct forces. I then show how the classic measures of information and entropy follow from simple geometric assumptions about the most useful scale on which to measure changes in populations. The following section focuses on the Lagrangian analysis of the dynamical paths of change, including the partial maximum entropy production principle, and provides a final summary.

The Price equation presents universal principles of total change in populations. The strongest principles arise when studying change purely in terms of altered probability distributions. In that case, the natural selection definition of relative fitness as the ratio of probabilities, $w_i = q_i'/q_i$, leads to a Price equation expression for the change in average relative fitness, describing the conservation of total probability in Equation 6, as

$$\Delta\bar{w} = \Delta\mathbf{q}\cdot\mathbf{w} + \mathbf{q}'\cdot\Delta\mathbf{w} = 0.$$

We can write that conservation law for total probability in terms of d'Alembert's partition of direct, inertial, and constraining forces in Equation 7 as

$$\Delta\mathbf{q}\cdot\mathbf{a} + \mathbf{q}'\cdot\Delta\mathbf{a} = \Delta\mathbf{q}\cdot(\mathbf{F} + \mathbf{I}) = 0.$$

The allowable displacements in probability, Δq, must obey any constraints imposed on changes in the system, and thus implicitly reflect any underlying forces of constraint. Such displacements may be reversed, because all allowable displacements fall within the constraints of conserved total probability. Reversible infinitesimal displacements that obey the constraining forces, taken in the context of the fixed frame of reference in the initial state of the population, are often called virtual displacements.

In this abstract Price equation generalization of d'Alembert's principle of mechanics for conserved systems, the first component of change arises from the direct forces, a = F, which may be written from Equation 10 as

$$\Delta\mathbf{q}\cdot\mathbf{F} = \Delta\mathbf{q}\cdot\mathbf{a} = \sum_i \frac{(\Delta q_i)^2}{q_i},$$

which is the nondimensional product of a displacement and a force, yielding the Price equation abstraction of the mechanical notion of the work of the direct forces. For infinitesimal displacements, $d\mathbf{q}$, consistent with the forces of constraint, the term $d\mathbf{q}\cdot\mathbf{F}$ is often called the virtual work.
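A small numerical example may help fix the definitions. With hypothetical frequencies $\mathbf{q}$ and $\mathbf{q}'$, the Python sketch below computes relative fitness $w_i = q_i'/q_i$, checks that average relative fitness is one, and confirms that the work of the direct forces, $\Delta\mathbf{q}\cdot\mathbf{a}$ with $a_i = w_i - 1 = \Delta q_i/q_i$, equals $\sum_i (\Delta q_i)^2/q_i$ and therefore cannot be negative.

```python
q = [0.2, 0.3, 0.5]          # hypothetical initial frequencies
q_prime = [0.25, 0.25, 0.5]  # hypothetical new frequencies

w = [qp / qi for qp, qi in zip(q_prime, q)]        # relative fitness
a = [wi - 1.0 for wi in w]                         # a_i = dq_i / q_i
dq = [qp - qi for qp, qi in zip(q_prime, q)]

mean_w = sum(qi * wi for qi, wi in zip(q, w))      # equals sum(q_prime) = 1
direct_work = sum(di * ai for di, ai in zip(dq, a))
fisher_form = sum(di * di / qi for di, qi in zip(dq, q))
print(mean_w, direct_work, fisher_form)
```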

The work of the direct forces describes change in the context of the fixed frame of reference given by the initial population. The total change depends on how the frame of reference changes, captured by the second term q′ · Δa = Δq · I, as in Equation 11.

Often, it is difficult to interpret the changing frame of reference in a simple way. Instead, the strongest universal principles come from study of the work of the direct forces—the partial change caused by the direct forces with respect to the fixed initial frame of reference.

The work of the direct forces may be partitioned into components of directly applied forces, F, and constraining forces expressed by the allowable displacements, Δq. One can make that partition in a variety of ways according to the interpretation of a particular system. The emphasis on forces helps greatly in understanding the causes of change (Lanczos, 1986).

Fitnesses, $w_i = q_i'/q_i$, are ratios of probabilities. Geometrically, it is convenient to have identical ratios correspond to identical distances between coordinates of probability. We achieve that identity by expressing fitness in logarithmic coordinates

$$m_i = \log w_i = \log\frac{q_i'}{q_i}.$$

When we interpret fitness as a force, the logarithmic coordinates change the multiplication of fitness components of force into the addition of the logarithmic fitness components of force, as in Equation 18.

In the Price equation, we can use any arbitrary coordinates, z, for the quantitative property values associated with probabilities. We can think of those arbitrary coordinates as a geometric transformation of the fundamental coordinates of conserved probability and fitness, w ↦ z. Equivalently, we may write a ↦ z, because a = w − 1, and the Price equation is shift invariant.
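For arbitrary coordinates, the Price equation partition $\Delta\bar{z} = \Delta\mathbf{q}\cdot\mathbf{z} + \mathbf{q}'\cdot\Delta\mathbf{z}$ is an algebraic identity, and adding the same constant to every $z_i$ and $z_i'$ leaves both terms unchanged because $\sum_i \Delta q_i = 0$. The Python sketch below checks both facts with hypothetical values.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def price_terms(q, q_prime, z, z_prime):
    """Return (total change, frequency term, property term) of the Price equation."""
    dq = [b - a for a, b in zip(q, q_prime)]
    dz = [b - a for a, b in zip(z, z_prime)]
    total = dot(q_prime, z_prime) - dot(q, z)
    return total, dot(dq, z), dot(q_prime, dz)

q, q_prime = [0.2, 0.3, 0.5], [0.25, 0.25, 0.5]    # hypothetical frequencies
z, z_prime = [1.0, 4.0, 2.0], [1.5, 3.0, 2.5]      # hypothetical property values

total, freq_term, prop_term = price_terms(q, q_prime, z, z_prime)

c = 10.0                                           # arbitrary shift of coordinates
shifted = price_terms(q, q_prime, [zi + c for zi in z], [zi + c for zi in z_prime])
print(total, freq_term + prop_term, shifted)
```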

When we transform from the fundamental coordinates of fitness to the logarithmic coordinates of fitness, w ↦ m, we obtain many of the classic expressions for information and entropy, which ultimately express the simple underlying geometry of change described by the Price equation. For example, in logarithmic coordinates, the work of the direct forces becomes

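$$\Delta\mathbf{q}\cdot\mathbf{m} = \sum_i \Delta q_i \log\frac{q_i'}{q_i} = \mathcal{D}(\mathbf{q}'\,\|\,\mathbf{q}) + \mathcal{D}(\mathbf{q}\,\|\,\mathbf{q}') = \mathcal{J}(\mathbf{q}',\mathbf{q}),$$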

which is the Jeffreys divergence measure of entropy or information, as in Equation 20. The symmetric Jeffreys divergence is the sum of reflected asymmetric Kullback–Leibler divergences, in which the Kullback–Leibler divergence is the most commonly used measure of relative entropy or relative information.
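The identity $\sum_i \Delta q_i \log(q_i'/q_i) = \mathcal{D}(\mathbf{q}'\,\|\,\mathbf{q}) + \mathcal{D}(\mathbf{q}\,\|\,\mathbf{q}')$ follows by writing $\Delta q_i = q_i' - q_i$, which splits the sum into $\sum_i q_i'\log(q_i'/q_i)$ and $\sum_i q_i\log(q_i/q_i')$. This Python sketch confirms the equality for hypothetical frequencies.

```python
import math

def kl(p, r):
    """Kullback-Leibler divergence D(p || r)."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))

q = [0.2, 0.3, 0.5]                 # hypothetical initial frequencies
q_prime = [0.25, 0.25, 0.5]         # hypothetical new frequencies

m = [math.log(qp / qi) for qp, qi in zip(q_prime, q)]      # log fitness
work_log = sum((qp - qi) * mi for qp, qi, mi in zip(q_prime, q, m))
jeffreys = kl(q_prime, q) + kl(q, q_prime)
print(work_log, jeffreys)
```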

When the changes, Δq_i/q_i, are small, the logarithmic measure of fitness converges to the linear measure of fitness, m → a, and the Jeffreys divergence and the Kullback–Leibler divergence converge to the Fisher information metric (the Kullback–Leibler divergence to one half of it). The Fisher metric is the fundamental measure of distance between probability distributions that forms the basis of much of statistical inference and information geometry.
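The limiting behavior can be seen numerically. For $\mathbf{q}' = \mathbf{q} + \epsilon\,\mathbf{d}$ with $\sum_i d_i = 0$, the ratio of the Jeffreys divergence to $\mathcal{F} = \sum_i (\Delta q_i)^2/q_i$ approaches one, and the ratio of $\mathcal{D}(\mathbf{q}'\,\|\,\mathbf{q})$ to $\mathcal{F}$ approaches one half, as $\epsilon$ shrinks. The direction $\mathbf{d}$ and the step sizes in this Python sketch are hypothetical.

```python
import math

def kl(p, r):
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))

q = [0.2, 0.3, 0.5]                     # hypothetical frequencies
d = [1.0, -2.0, 1.0]                    # sum(d) = 0 keeps total probability

ratios = []
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    q_prime = [qi + eps * di for qi, di in zip(q, d)]
    fisher = sum((qp - qi) ** 2 / qi for qp, qi in zip(q_prime, q))
    jeffreys = kl(q_prime, q) + kl(q, q_prime)
    ratios.append((eps, jeffreys / fisher, kl(q_prime, q) / fisher))

for eps, j_ratio, d_ratio in ratios:
    print(f"eps={eps:g}  J/F={j_ratio:.6f}  D/F={d_ratio:.6f}")
```

The deviations from the limits shrink roughly in proportion to $\epsilon$, because the first correction to each divergence is of third order in $\Delta q_i$.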

In these Price equation descriptions of change, we have taken the fitnesses as given, and equated fitness or the logarithm of fitness with a notion of force. That approach is essentially inductive, in which we take the probabilities as given locations, $\mathbf{q}$ and $\mathbf{q}'$, and implicitly induce the force that would be consistent with the change from $q_i$ to $q_i'$.

24 Partial Maximum Entropy Production

The main point of this article is to analyze the traditional deductive perspective of dynamics with respect to force. In that traditional perspective, we begin with the initial location of the population, q, and given forces, which we denote $\tilde{\mathbf{F}}$. From those given conditions, we then deduce the changes in location and the new probabilities, q′. I confined the analysis to the study of small changes, $\Delta\mathbf{q}\to d\mathbf{q}$.

To obtain the dynamics, $\Delta\mathbf{q}$, from the initial location and the given forces, I first wrote the Lagrangian expression for each particular case. The Lagrangian focuses on a first term, often called the action, which is either maximized or minimized (extremized). When minimized, the procedure follows the principle of least action, but more generally, the procedure is known as the principle of extreme action.

In this article, I maximized the virtual work of the given direct forces, $\Delta\mathbf{q}\cdot\tilde{\mathbf{F}}$. Intuitively, this simply means that the changes will follow the lines of force in relation to the magnitudes of the force in each dimension. However, we must consider both the direct and the constraining forces.

The Lagrangian approach provides a natural way to combine direct and constraining forces. In each Lagrangian, the first term gives the virtual work of the direct forces to be maximized. The remaining terms give the constraints that must be satisfied, usually as some total quantity that is conserved when summed over all dimensions of the system. The Lagrangian procedure transforms the system constraints into the constraining force components in each dimension.

The various results in the text show how different kinds of constraints and different ways of separating overall force into direct and constraining components determine the change in frequencies.

The key result concerns the partial maximum entropy production principle, which I briefly review. I expressed the given forces as $\tilde{\mathbf{F}} = \log(\tilde{\mathbf{q}}/\mathbf{q})$. Thus, the virtual work of the given forces in Equation 27 is

$$\Delta\mathbf{q}\cdot\tilde{\mathbf{F}} = -\Delta\mathbf{q}\cdot\log\mathbf{q} + \Delta\mathbf{q}\cdot\log\tilde{\mathbf{q}}.$$

I assumed that there is some quantity, z, such as energy or biomass or any other appropriate measure, that is constrained so that the total direct changes in that quantity take a fixed value, $\Delta\mathbf{q}\cdot\mathbf{z} = B$. We may relabel the part of the given forces, $\log\tilde{\mathbf{q}}$, as a constraining force associated with the fixed value imposed on direct changes in z. Using $\log\tilde{q}_i = \log k - \lambda z_i$ and $\sum_i \Delta q_i = 0$, the expression in Equation 29 gives

$$\Delta\mathbf{q}\cdot\log\tilde{\mathbf{q}} = \log k\sum_i \Delta q_i - \lambda\,\Delta\mathbf{q}\cdot\mathbf{z} = -\lambda B.$$

With this component labeled as a constraining force, the remaining part of the virtual work of the direct forces is $-\Delta\mathbf{q}\cdot\log\mathbf{q}$, which in the limit for small changes is the production of entropy along the path of small changes, $d\mathcal{E} = -d\mathbf{q}\cdot\log\mathbf{q}$. This component is the action term maximized along the path of change; thus, the path follows the direction that maximizes the production of entropy. I call this the partial maximum entropy production principle, because the result expresses the change in terms of the fixed frame of reference of the initial population state. Total change must also evaluate any changes in the frame of reference through the inertial forces.
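The relabeling can be illustrated with a small sketch. As a hypothetical stand-in for the Lagrangians referred to in the text, it assumes that a step maximizes $\Delta\mathbf{q}\cdot\mathbf{F}$ subject to $\sum_i \Delta q_i = 0$, $\Delta\mathbf{q}\cdot\mathbf{z} = B$ with $B = 0$, and a fixed Fisher length $\sum_i (\Delta q_i)^2/q_i = C^2$. Under those assumptions, the maximizer is $\Delta q_i \propto q_i r_i$, where $r_i$ is the residual of $F_i$ after $q$-weighted regression on a constant and $z_i$. Because $\log\tilde{q}_i = \log k - \lambda z_i$ is removed entirely by that regression, the forces $\log(\tilde{\mathbf{q}}/\mathbf{q})$ and $-\log\mathbf{q}$ select the same step, which is the partial maximum entropy production result in miniature. All numerical values are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def weighted_residual(f, z, q):
    """Residual of f after q-weighted least squares on {1, z}.

    The residual r satisfies sum(q * r) = 0 and sum(q * r * z) = 0.
    """
    ef, ez = dot(q, f), dot(q, z)
    cov = sum(qi * (fi - ef) * (zi - ez) for qi, fi, zi in zip(q, f, z))
    var = sum(qi * (zi - ez) ** 2 for qi, zi in zip(q, z))
    mu = cov / var
    return [(fi - ef) - mu * (zi - ez) for fi, zi in zip(f, z)]

def scaled_step(r, q, C):
    """Turn a residual into dq = C * q * r / ||r||_q, so sum(dq**2 / q) = C**2."""
    norm = math.sqrt(sum(qi * ri * ri for qi, ri in zip(q, r)))
    return [C * qi * ri / norm for qi, ri in zip(q, r)]

def best_step(force, z, q, C):
    """Maximize dq . force under the assumed constraints (B = 0, fixed length)."""
    return scaled_step(weighted_residual(force, z, q), q, C)

q = [0.1, 0.2, 0.3, 0.4]               # hypothetical current frequencies
z = [0.0, 1.0, 2.0, 3.0]               # hypothetical conserved quantity
lam, C = 0.5, 1e-3                     # hypothetical multiplier and step length
log_k = -math.log(sum(math.exp(-lam * zi) for zi in z))
log_q_tilde = [log_k - lam * zi for zi in z]

F_total = [lt - math.log(qi) for lt, qi in zip(log_q_tilde, q)]  # log(q~ / q)
F_entropy = [-math.log(qi) for qi in q]                           # -log q

dq_total = best_step(F_total, z, q, C)
dq_entropy = best_step(F_entropy, z, q, C)

# Compare with random steps that satisfy the same constraints and length
rng = random.Random(0)
random_steps = [
    scaled_step(weighted_residual([rng.gauss(0, 1) for _ in q], z, q), q, C)
    for _ in range(100)
]
best_rate = dot(dq_entropy, F_entropy)
print(best_rate, max(dot(s, F_entropy) for s in random_steps))
```

Allowing $B \neq 0$, or replacing the fixed Fisher length by a different quadratic penalty, changes the details of this sketch; it is meant only to show why the $\log\tilde{\mathbf{q}}$ component drops out of the choice of direction.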

The entropy production principle simply expresses the basic geometry for the path of change when extrinsic forces are considered as constraints on system quantities, and logarithmic coordinates are used to locate populations. Because changes in probabilities as fitness or force have a natural expression as the ratio of probabilities, $q_i'/q_i$, and such quantities combine multiplicatively, logarithmic coordinates arise naturally from the transformation that yields additive combinations. Thus, entropy production or changes in information arise as the inevitable consequence of the geometry of change when evaluated in the Price equation partition of direct and inertial forces.

In summary, several different disciplines share the same basic fundamental theory of change. From the perspective of the Price equation, we have seen common expressions for natural selection, aspects of physical mechanics and thermodynamics, entropy expressions for probability distributions, and common measures of information theory. Perhaps many common models of learning by reinforcement (Sutton & Barto, 1998; Szepesvári, 2010) and Bayesian updating (Campbell, 2016; Harper, 2011; Shalizi, 2009) will also share the same underlying geometric principles.

Acknowledgements

National Science Foundation grant DEB–1251035 supports my research. I did this work while on fellowship at the Wissenschaftskolleg zu Berlin.

Conflict of Interest

None declared.

Appendix A: Literature in Specific Disciplines

Natural selection

Price originally formulated his equation as an expression of natural selection (Price, 1970, 1972a). In another article, without any direct connection to the Price equation, he speculated about a unified theory of change based on an abstract generalization of the principle of selection (Price, 1995).

In Price's vision for a general theory of selection, he suggested the separation of frequency and property values in the description of population change. He also described changes by an abstract mapping scheme between members of two populations. Price never connected these abstract ideas about mapping and about separating frequency and property directly to his formulation of the Price equation, although one can see hints of this in Price (1972a).

In other work (Price, 1972b), Price clarified one of the great puzzles in the history of evolutionary theory. In 1930, Fisher stated his fundamental theorem of natural selection as: “The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.”

Fisher emphasized the exactness of the theorem and his belief that the theorem was a general and profound statement about natural selection. The puzzle is that Fisher's theorem holds exactly only under a very restricted set of assumptions (Crow & Kimura, 1970). Fisher is regarded as perhaps the greatest mathematical biologist ever. So the mismatch between Fisher's strong claim and the seemingly obvious failure of the theorem was hard to reconcile.

Price (1972b) solved the puzzle. In the language of the present article, Fisher meant that the rate of increase in fitness equals the variance in fitness when evaluated with respect to the fixed frame of reference of the population's initial state. Selection acts as a direct force, with consequences of the direct force evaluated by holding constant the context. Any changes to the population that alter the fitnesses of individuals are regarded as consequences of inertial forces that alter the frame of reference.

Price (1972b) did not use the language of direct and inertial forces, but he clearly understood Fisher's partition of total change into two components. Later work clarified a variety of early theories about natural selection within the context of Fisher's partition (Ewens, 1989, 1992; Frank & Slatkin, 1992).

In summary, Price left three separate insights about natural selection: the Price equation, the separation of frequency and property in an abstract mapping scheme, and Fisher's method of partitioning total change with respect to the frame of reference. My own work has unified those different pieces into an extended, more general and abstract interpretation of the Price equation (Frank, 1995, 1997, 2012a,b).

Another important line of work in evolutionary theory concerns the path of change in gene frequencies. Wright (1931, 1932) initiated the approach most closely related to analogies with classical mechanics. That line of work continues to be developed, including explicit connections to notions of entropy and statistical mechanics (de Vladar & Barton, 2011).

The studies initiated by Wright contrast with Fisher's approach (Frank, 2012c). In the language of this article, Fisher emphasized instantaneous change at a point and the partition of direct and inertial components of change. Fisher believed that the inertial components of change were too unpredictable to allow an explicit theory for the full path of change over significant lengths. By contrast, Wright and his descendants sought a theory of the paths of change over significant distances. This article emphasized the Fisherian perspective.

Maximum entropy production

Jaynes’ theory of maximum entropy (Jaynes, 1957a,b, 2003) emphasizes that probability distributions can be read as expressions of constraining forces (Frank, 2014).

For example, a Gaussian distribution expresses a constraint on the average squared distance of observations from the mean value. If one constrains that average squared distance of fluctuations from the mean, then the Gaussian distribution arises by maximizing the entropy subject to that constraint. Maximizing entropy is roughly equivalent to minimizing information or maximizing randomness.
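A discretized version of this example can be checked numerically. The Python sketch below maximizes entropy on a hypothetical grid of $x$ values subject to a fixed mean of $z_i = x_i^2$ (a mean value of zero is assumed), using the Jaynesian exponential form $q_i \propto e^{-\lambda z_i}$ and bisection on $\lambda$. For a fine, wide grid, the recovered multiplier should be close to $1/(2\sigma^2)$, the value for a Gaussian with variance $\sigma^2$.

```python
import math

def maxent_on_grid(x, target_var, lo=1e-9, hi=50.0, iters=100):
    """Max-entropy q_i proportional to exp(-lam * x_i**2) with mean of x**2 fixed."""
    z = [xi * xi for xi in x]

    def dist(lam):
        w = [math.exp(-lam * zi) for zi in z]   # exponents <= 0, so no overflow
        total = sum(w)
        return [wi / total for wi in w]

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean_z = sum(qi * zi for qi, zi in zip(dist(mid), z))
        if mean_z > target_var:
            lo = mid          # mean of z falls as lam rises
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return dist(lam), lam

x = [i * 0.01 for i in range(-1000, 1001)]   # hypothetical grid on [-10, 10]
q, lam = maxent_on_grid(x, target_var=1.0)
print(lam, 1 / (2 * 1.0))
```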

Jaynes’ maximum entropy describes an equilibrium condition (Jaynes, 1957a,b, 2003). The idea is that entropy increase is a ubiquitous force—a ubiquitous entropic force. Increasing entropy plus constraining forces together define the form of the equilibrium distribution.

The increase in entropy toward an equilibrium leaves open the problem of the dynamical path followed from initial condition to final equilibrium state. What characterizes the increments along that path? One possibility is that each increment follows the direction that maximizes the increase in entropy—the path of maximum entropy production (MEP).

Some authors have proposed MEP as a fundamental principle similar to the principle of least action (Dewar, 2005; Dewar, Lineweaver, Niven, & Regenauer-Lieb, 2014). By that view, essentially all realized paths of motion maximize the production of entropy. Other authors have suggested that MEP is only an approximate description of dynamics (Dewar et al., 2014). By that view, certain special systems follow MEP exactly, whereas many other systems follow MEP approximately or not at all.

The logical status of MEP as a principle and its usefulness in analysis remain open problems. The interpretation of MEP is important, because that interpretation reflects our general understanding of diverse subjects and the relations between those subjects.

In this article, I showed that MEP is an exact statement about dynamics when interpreted in the context of the Price equation and the information theory definition of entropy. The Price equation provides an abstraction of change that may be interpreted as a partition into components that separate direct, inertial, and constraining forces.

This Price equation separation of forces is an abstract generalization of d'Alembert's principle of classical mechanics (Lanczos, 1986). The Price equation formulation can be applied to both conservative and nonconservative systems, extending d'Alembert's application to conservative systems. Wang (2007) proposed a different way to connect entropy and d'Alembert through a more traditional thermodynamic approach.

Although MEP is a valid principle, I suggested that a purely geometric interpretation provides a more fundamental and universal perspective than does the entropy perspective of MEP. In particular, the conservation of total probability imposes strong geometric symmetry and constraint on the separation of direct and inertial forces (Frank, 2015). Maximum entropy production is a useful but often unnecessarily complicated way of expressing those fundamental geometric principles.

Returning to Jaynes, his goal was to express an abstract and general approach to understanding probability patterns. He sought to transcend the specific physical assumptions of statistical mechanics and thermodynamics, thereby achieving a more general theory that applied to a broader range of disciplines.

In several ways, Jaynes did not go far enough. For example, he retained entropy and information as primary quantities. Similarly, information geometry, based on metrics such as Fisher information, retains a notion of information as primary. In my view, the underlying geometry, conserved quantities, and symmetries provide the true foundation for analysis as, for example, in Frank (2016).

Statistical inference and learning algorithms

This article showed that natural selection connects to universal expressions of population change and probability through the Price equation (Frank, 1995, 2012a; Price, 1970, 1972a). One can think of natural selection as an algorithm for accumulating information. Many authors have noted formal connections between natural selection and information theory (Frank, 2009, 2012b), Bayesian updating in statistical inference (Campbell, 2016; Harper, 2011; Shalizi, 2009), and learning algorithms (Campbell, 1974).

Although initial connections have been made between natural selection and those different subjects, unification based on a deeper geometric foundation remains an open problem. For example, Jaynes's maximum entropy approach ultimately aimed to unify probability, information, statistical inference, and physical theories of statistical mechanics and thermodynamics (Jaynes, 2003). Another subject that might eventually coalesce is reinforcement learning (Sutton & Barto, 1998; Szepesvári, 2010), which provides the basis for aspects of neuroscience, cognitive science, and machine learning.

How do those various subjects relate to general underlying geometric principles for the dynamics of change in populations?

References

Amari, S., & Nagaoka, H. (2000). Methods of information geometry. New York, NY: Oxford University Press.
Google Scholar
Campbell, D. T. (1974). Evolutionary epistemology. In P. A. Schilpp (Ed.), The Philosophy of Karl Popper, Volume 1 (pp. 413–463). LaSalle, IL: Open Court Press.
Google Scholar
Campbell, J. O. (2016). Universial Darwinism as a process of Bayesian inference. Hypothesis and Theory, 10, 49.
Web of Science® Google Scholar
Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York, NY: Wiley.
10.1002/0471200611
Google Scholar
Crow, J. F., & Kimura, M. (1970). An introduction to population genetics theory. Minneapolis, Minnesota: Burgess.
10.1006/tpbi.1995.1025
Google Scholar
Dewar, R. C. (2005). Maximum entropy production and the fluctuation theorem. Journal of Physics A: Mathematical and General, 38, L371–L381.
10.1088/0305-4470/38/21/L01
Web of Science® Google Scholar
R. C. Dewar, C. H. Lineweaver, R. K. Niven, & K. Regenauer-Lieb (Eds.) (2014). Beyond the second law: Entropy production and non-equilibrium systems. Berlin: Springer-Verlag.
10.1007/978-3-642-40154-1
Google Scholar
Ewens, W. J. (1989). An interpretation and proof of the fundamental theorem of natural selection. Theoretical Population Biology, 36, 167–180.
10.1016/0040-5809(89)90028-2
CAS PubMed Web of Science® Google Scholar
Ewens, W. J. (1992). An optimizing principle of natural selection in evolutionary population genetics. Theoretical Population Biology, 42, 333–346.
10.1016/0040-5809(92)90019-P
CAS PubMed Web of Science® Google Scholar
Fisher, R. A. (1930). The genetical theory of natural selection. Oxford: Clarendon.
10.1890/0012-9658(2006)87[1445:SOEFDD]2.0.CO;2
Google Scholar
Fisher, R. A. (1958). The genetical theory of natural selection, 2nd ed. New York, NY: Dover.
Google Scholar
Frank, S. A. (1995). George Price's contributions to evolutionary genetics. Journal of Theoretical Biology, 175, 373–388.
10.1006/jtbi.1995.0148
CAS PubMed Web of Science® Google Scholar
Frank, S. A. (1997). The Price equation, Fisher's fundamental theorem, kin selection, and causal analysis. Evolution, 51, 1712–1729.
10.1111/j.1558-5646.1997.tb05096.x
PubMed Web of Science® Google Scholar
Frank, S. A. (2009). Natural selection maximizes Fisher information. Journal of Evolutionary Biology, 22, 231–244.
10.1111/j.1420-9101.2008.01647.x
CAS PubMed Web of Science® Google Scholar
Frank, S. A. (2012a). Natural selection. IV. The Price equation. Journal of Evolutionary Biology, 25, 1002–1019.
10.1111/j.1420-9101.2012.02498.x
CAS PubMed Web of Science® Google Scholar
Frank, S. A. (2012b). Natural selection. V. How to read the fundamental equations of evolutionary change in terms of information theory. Journal of Evolutionary Biology, 25, 2377–2396.
10.1111/jeb.12010
CAS PubMed Web of Science® Google Scholar
Frank, S. A. (2012c). Wright's adaptive landscape versus Fisher's fundamental theorem. In E. Svensson, & R. Calsbeek (Eds.), The adaptive landscape in evolutionary biology (pp. 41–57). New York, NY: Oxford University Press.
Google Scholar
Frank, S. A. (2013). Natural selection. VI. Partitioning the information in fitness and characters by path analysis. Journal of Evolutionary Biology, 26, 457–471. https://doi.org/10.1111/jeb.12066
Frank, S. A. (2014). How to read probability distributions as statements about process. Entropy, 16, 6059–6098. https://doi.org/10.3390/e16116059
Frank, S. A. (2015). d'Alembert's direct and inertial forces acting on populations: The Price equation and the fundamental theorem of natural selection. Entropy, 17, 7087–7100. https://doi.org/10.3390/e17107087
Frank, S. A. (2016). Common probability patterns arise from simple invariances. Entropy, 18, 192. https://doi.org/10.3390/e18050192
Frank, S. A., & Slatkin, M. (1992). Fisher's fundamental theorem of natural selection. Trends in Ecology & Evolution, 7, 92–95. https://doi.org/10.1016/0169-5347(92)90248-A
Frieden, B. R. (2004). Science from Fisher information: A unification. Cambridge, UK: Cambridge University Press. https://doi.org/10.1017/CBO9780511616907
Harper, M. (2011). The replicator equation as an inference dynamic. arXiv:0911.1763v3 [math.DS].
Jaynes, E. T. (1957a). Information theory and statistical mechanics. The Physical Review, 106, 620–630. https://doi.org/10.1103/PhysRev.106.620
Jaynes, E. T. (1957b). Information theory and statistical mechanics II. The Physical Review, 108, 171–190. https://doi.org/10.1103/PhysRev.108.171
Jaynes, E. T. (2003). Probability theory: The logic of science. New York, NY: Cambridge University Press. https://doi.org/10.1017/CBO9780511790423
Kullback, S. (1959). Information theory and statistics. New York, NY: Wiley.
Lanczos, C. (1986). The variational principles of mechanics, 4th ed. New York, NY: Dover Publications.
Price, G. R. (1970). Selection and covariance. Nature, 227, 520–521. https://doi.org/10.1038/227520a0
Price, G. R. (1972a). Extension of covariance selection mathematics. Annals of Human Genetics, 35, 485–490. https://doi.org/10.1111/j.1469-1809.1957.tb01874.x
Price, G. R. (1972b). Fisher's ‘fundamental theorem’ made clear. Annals of Human Genetics, 36, 129–140. https://doi.org/10.1111/j.1469-1809.1972.tb00764.x
Price, G. R. (1995). The nature of selection. Journal of Theoretical Biology, 175, 389–396. https://doi.org/10.1006/jtbi.1995.0149
Shalizi, C. R. (2009). Dynamics of Bayesian updating with dependent data and misspecified models. Electronic Journal of Statistics, 3, 1039–1074. https://doi.org/10.1214/09-EJS485
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Szepesvári, C. (2010). Algorithms for reinforcement learning (pp. 1–103). San Rafael, CA: Morgan & Claypool Publishers.
de Vladar, H. P., & Barton, N. H. (2011). The contribution of statistical physics to evolutionary biology. Trends in Ecology & Evolution, 26, 424–432. https://doi.org/10.1016/j.tree.2011.04.002
Wang, Q. A. (2007). From virtual work principle to maximum entropy for nonequilibrium system. arXiv:0712.2583v1 [cond-mat.stat-mech].
Wikipedia (2015). Fictitious force — Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=Fictitious_force&oldid=659661243
Wright, S. (1931). Evolution in Mendelian populations. Genetics, 16, 97–159.
Wright, S. (1932). The roles of mutation, inbreeding, cross-breeding and selection in evolution. Proceedings VI International Congress of Genetics, 1, 356–366.

Volume 7, Issue 10, May 2017, Pages 3381–3396

Universal expressions of population change by the Price equation: Natural selection, information, and maximum entropy production

Abstract

1 Introduction

2 Overview

3 Separation of Frequency and Property

4 Set Mapping of Labels Between Populations

5 The Price Equation

6 Biological Fitness and the Conservation of Total Probability

7 Identities for the Conservation of Probability

8 Balance of Direct and Inertial Forces

9 Average Force Along a Path

10 Comparing Linear and Logarithmic Coordinates

11 Log Coordinates, Entropy and Information

12 Small Changes: Prelude

13 Small Changes: Analysis

14 Given Forces

15 Extreme Action and Frequency Dynamics

16 Direct Forces and Constraining Forces

17 Conserved System Quantities as the Primary Forces of Constraint

18 Maximum Entropy Production Principle

19 Maximum Entropy Path Subject to Constraint

20 Equilibrium Thermodynamics and Probability

21 Interpretation of Maximum Entropy Path

22 Geometry and the Fisher Information Metric

23 Direct Work, Information, and Entropy

24 Partial Maximum Entropy Production

Acknowledgements

Conflict of Interest

Appendix A: Literature in Specific Disciplines

Natural selection

Maximum entropy production

Statistical inference and learning algorithms

References
