Hamilton's rule, inclusive fitness maximization, and the goal of individual behaviour in symmetric two-player games
Abstract
Hamilton's original work on inclusive fitness theory assumed additivity of costs and benefits. Recently, it has been argued that an exact version of Hamilton's rule for the spread of a pro-social allele (rb > c) holds under nonadditive pay-offs, so long as the cost and benefit terms are defined as partial regression coefficients rather than pay-off parameters. This article examines whether one of the key components of Hamilton's original theory can be preserved when the rule is generalized to the nonadditive case in this way, namely that evolved organisms will behave as if trying to maximize their inclusive fitness in social encounters.
Introduction
Inclusive fitness theory is a widely used framework for studying the evolution of social behaviour. Hamilton's original formulation of the theory contains two distinct though related ideas (Hamilton, 1964, 1971). The first is Hamilton's rule, the famous criterion (rb > c) for when an allele coding for a social behaviour will be favoured by selection. This aspect of the theory fits with the ‘gene's eye’ view of evolution. The second is maximization of inclusive fitness, rather than classical fitness, as the ‘goal’ towards which an individual's social behaviour will appear designed. This aspect fits with the traditional individualist view of evolution and is frequently employed by behavioural ecologists.
The relation between these two aspects of inclusive fitness theory is not fully settled. Much theoretical work has focused solely on the first aspect; indeed, the notion of individuals ‘trying’ to maximize their inclusive fitness is often omitted from expositions of kin selection theory. However, recently Grafen (2006, 2009), Queller (2011) and Gardner et al. (2011) have argued for the central importance of inclusive fitness maximization as the ‘goal’ of individual behaviour; Grafen (2006) provides a population-genetic foundation for the idea. This goes some way to reconciling the two aspects of Hamilton's theory.
The work of Grafen, Queller and Gardner et al. suggests an intriguing link between social evolution and rational choice theory. For in effect, these authors are arguing that inclusive fitness plays the role of a utility function in rational choice, that is, it is the quantity that an evolved organism will behave as if it is trying to maximize. Thus, Gardner et al. (2011) write ‘we can imagine the individual adjusting her inclusive fitness…by altering her behaviour’, before choosing an action which brings maximal inclusive fitness (p. 1039–40). This way of thinking about evolution is an instance of what Sober (1988) called ‘the heuristic of personification’, which says that a trait will be favoured by natural selection if and only if a rational individual, seeking to maximize its fitness, would choose that trait over the alternatives. In effect, Gardner et al. are suggesting that this heuristic is valid in social settings, where the trait in question is a social action, so long as ‘fitness’ is defined as inclusive fitness.
Our aim here is to propose a particular way of formalizing this ‘rational actor heuristic’ in the context of social evolution and to ask how generally it applies. This is a pressing question because Grafen's (2006) argument that evolution will lead to inclusive fitness maximizing behaviour assumes additivity of costs and benefits. This assumption is quite restrictive as in many social situations, the benefit that a given action confers on a recipient may depend on the recipient's own type (Frank, 1998; Lehmann & Rousset, 2014a). In our simple model below we find that if the additivity assumption is made, then the rational actor heuristic, with inclusive fitness as the individual's utility function, applies neatly. However, matters are more complex if there is nonadditivity.
Asking whether the rational actor heuristic applies is different from asking whether Hamilton's rule itself applies in nonadditive scenarios. This latter question has been extensively discussed in the literature. The upshot is that an exact version of Hamilton's rule does apply under nonadditivity, so long as the cost and benefit terms are suitably defined (Queller, 1992; Frank, 1998, 2013; Gardner et al., 2011); though the biological significance of the resulting rule has been questioned (Allen et al., 2013; Birch, 2014; Birch & Okasha, 2015). However, this does not settle the issue about individual maximization that is our focus here.
The structure of this article is as follows. Firstly, we study social evolution using a simple additive Prisoner's dilemma model and show how the rational actor heuristic applies. Secondly, we consider a nonadditive variant of the same model and ask whether a similar conclusion holds. Finally, we discuss the significance of the results obtained.
The case of additive pay-offs
Additive Prisoner's dilemma
Consider a simple model of the evolution of social behaviour of the sort used in evolutionary game theory. An infinite population of haploid asexual organisms engage in pairwise social interactions in every generation. Organisms are of two types, altruists (A) and selfish (S). A types perform an action that is costly for themselves but benefits their partner; S types do not perform the action. Type is hard-wired genetically and perfectly inherited.
An organism's pay-off from the social interaction depends on its own type and its partner's type. Pay-offs are interpreted as increases in lifetime reproductive fitness over a unit baseline. The social action is assumed to affect only the actor and their partner, and thus local interaction is assumed absent. An A type incurs a cost c as a result of its action and confers a benefit b on its partner, where c > 0 and b > 0; thus, the game is a Prisoner's dilemma.
Pay-offs to the actor, referred to as ‘personal pay-offs’, are shown in Table 1. We let V(i, j) denote the pay-off to an actor from playing i when her opponent plays j, where i, j ∈ {A, S}. Note that pay-offs are additive: an altruist alters their own pay-off by −c and their partner's pay-off by b, irrespective of the type of their partner.
Table 1. Personal pay-offs to the actor (additive Prisoner's dilemma).

| | Partner: A | Partner: S |
|---|---|---|
| Actor: A | b − c | −c |
| Actor: S | b | 0 |
There are three pair-types in the population, AA, AS and SS, whose relative frequencies in the initial generation are $f_{AA}$, $f_{AS}$ and $f_{SS}$, respectively, where $f_{AA} + f_{AS} + f_{SS} = 1$. The overall frequency of the A type in the initial generation is denoted p, where $p = f_{AA} + \tfrac{1}{2} f_{AS}$. The change in p over one generation is denoted Δp.
The sign and magnitude of Δp depend on the rules by which the pairs are formed. If pairing is random, then the S type must be fitter overall, so Δp will be negative. However, if pairing is assortative, then the A type may be fitter overall, for the benefits of altruistic actions then fall disproportionately on other altruists. Random pairing means that the frequency distribution of the pair-types will be binomial, that is $f_{AA} = p^2$, $f_{AS} = 2p(1 - p)$ and $f_{SS} = (1 - p)^2$.
Where pairing is nonrandom, a simple regression analysis yields a measure of the statistical correlation between social partners. We use the variable $p_i$ to indicate an organism's own type and $p'_i$ to indicate its partner's type; thus $p_i = 1$ if the ith organism is an A, $p_i = 0$ otherwise; and $p'_i = 1$ if the ith organism is paired with an A, $p'_i = 0$ otherwise. We then compute the linear regression of $p'_i$ on $p_i$, given by $\mathrm{Cov}(p', p)/\mathrm{Var}(p)$, which is a standard way of defining the r term of Hamilton's rule. Henceforth, we refer to this regression coefficient as r.
In the early kin selection literature, r was often defined in genealogical terms, for example as the probability that actor and partner share an allele that is identical by descent, yielding the familiar values of $\tfrac{1}{2}$ for full sibs, $\tfrac{1}{2}$ for offspring and $\tfrac{1}{4}$ for grandoffspring (see Michod & Hamilton, 1980). In some ways, this definition of r is the more natural one for expressing the idea that organisms value their relatives’ reproduction in proportion to how closely related they are. However, the statistical definition of r, above, yields a version of Hamilton's rule that is more generally applicable.

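In terms of the pair frequencies, the covariance between own type and partner type is $f_{AA} - p^2$ and the variance of own type is $p(1 - p)$, so $r = (f_{AA} - p^2)/[p(1 - p)]$. It follows that r ranges from −1 (perfect disassortment) to 1 (perfect assortment); when pairing is random, r = 0.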
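As a rough numerical check on the regression definition of r, the sketch below enumerates the joint distribution of own type and partner type implied by a set of pair frequencies and computes the covariance-to-variance ratio directly. The function name `relatedness` and the example frequencies are illustrative assumptions, not part of the original analysis.

```python
def relatedness(f_AA, f_AS, f_SS):
    """Regression of partner type on own type from pair-type frequencies."""
    assert abs(f_AA + f_AS + f_SS - 1.0) < 1e-9, "pair frequencies must sum to 1"
    # Joint distribution over (own type, partner type), coding A as 1 and S as 0.
    # An AS pair contains one A-with-S individual and one S-with-A individual.
    joint = {(1, 1): f_AA, (1, 0): f_AS / 2, (0, 1): f_AS / 2, (0, 0): f_SS}
    p = sum(prob * own for (own, _), prob in joint.items())
    cov = sum(prob * own * partner for (own, partner), prob in joint.items()) - p * p
    var = p * (1 - p)  # requires 0 < p < 1
    return cov / var


p = 0.3
r_random = relatedness(p**2, 2 * p * (1 - p), (1 - p) ** 2)  # binomial pairing
r_assorted = relatedness(0.3, 0.0, 0.7)                      # no mixed pairs
r_disassorted = relatedness(0.0, 1.0, 0.0)                   # only mixed pairs
print(r_random, r_assorted, r_disassorted)
```

Up to floating-point rounding, the three cases recover the values 0, 1 and −1 quoted in the text.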
Evolutionary analysis


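We assume that r remains constant from one generation to the next. With the constant r assumption, the outcome of the evolutionary process is easily determined, because the expected pay-off of an A type then exceeds that of an S type by exactly rb − c at every value of p. If rb > c the A type will spread to fixation; if rb < c the S type will spread to fixation; if rb = c there will be no evolutionary change.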
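The three outcomes can be illustrated with a short simulation. This is a minimal sketch under a stated assumption: pairing is modelled so that an A type meets another A with probability r + (1 − r)p and an S type meets an A with probability (1 − r)p, which keeps the regression of partner type on own type equal to r at every p (for 0 ≤ r ≤ 1). The function names and parameter values are hypothetical.

```python
def next_frequency(p, r, b, c):
    """Frequency of type A after one generation with additive pay-offs."""
    # Assumed pairing rule with constant r (0 <= r <= 1): an A meets an A with
    # probability r + (1 - r) * p, and an S meets an A with probability (1 - r) * p.
    w_A = 1 - c + b * (r + (1 - r) * p)
    w_S = 1 + b * (1 - r) * p
    w_bar = p * w_A + (1 - p) * w_S
    return p * w_A / w_bar


def run(p0, r, b, c, generations=2000):
    """Iterate the one-generation map starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, r, b, c)
    return p


p_spread = run(0.1, r=0.5, b=0.3, c=0.1)                     # rb > c
p_decline = run(0.9, r=0.2, b=0.3, c=0.1)                    # rb < c
p_neutral = run(0.4, r=0.5, b=0.2, c=0.1, generations=100)   # rb = c
print(p_spread, p_decline, p_neutral)
```

In this sketch the fitness difference between the types is rb − c regardless of p, so the direction of change never reverses.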
Rational actor analysis: preliminaries
To apply a rational actor analysis, we transpose our evolutionary model to a rational choice context. We consider two players playing a symmetric game. Each player has two pure strategies, A and S. If a player plays a mixed strategy, this means that they randomize over their pure strategies; thus, πA denotes the mixed strategy in which A is played with probability πA and S with probability 1 − πA. The pay-off to a mixed strategy is then simply its expected pay-off.
Each player has a utility function which measures how desirable they find the possible outcomes of the game; we assume that both players have the same utility function. Each player's goal is to maximize their utility function. One possibility is that the utility function is given by the personal pay-offs in Table 1 above, in which case we write U(i, j) = V(i, j), where U(i, j) is the utility a player gets from playing i when their partner plays j; i, j ∈ {A, S}. There are other possibilities too; see below.
Once the players’ utility function has been specified, the next step is to seek the Nash equilibrium (or equilibria) of the game. (A Nash equilibrium is a pair of strategies, possibly mixed, one for each player, each of which is a best response to the other.) Game theory predicts that if the players are rational, they will end up at a Nash equilibrium of the game (see for example Binmore, 2007). We can then ask whether the Nash equilibria of the game correspond to the outcomes of the evolutionary process described above. If so, we can conclude that evolution will lead organisms to behave as if trying to maximize the utility function in question.
This is a natural way of formalizing the rational actor heuristic in a game-theoretic context. It differs somewhat from Grafen's (2006) formalization of the same idea, which posits ‘links’ between gene-frequency change and individual optimization. Our approach allows recovery of Grafen's main result, and by taking optimization to include best-response, that is optimal choice conditional on the other player's choice, extends easily to the nonadditive case. A similar approach is found in Alger & Weibull (2012) and Lehmann et al. (2015).
Utility as inclusive fitness
One possibility to explore is that a player's utility function depends on their partner's pay-off as well as their own. For example, suppose that a player's utility for any outcome is given by the quantity: personal pay-off plus r times partner's pay-off, that is U(i, j) = V(i, j) + rV(j, i). Applying this transformation to the personal pay-offs yields Table 2 below, which we refer to as the ‘inclusive fitness pay-off matrix’.
Table 2. Inclusive fitness pay-offs to the actor (additive case).

| | Partner: A | Partner: S |
|---|---|---|
| Actor: A | (b − c)(r + 1) | −c + rb |
| Actor: S | b − rc | 0 |
This transformation was first suggested by Hamilton (1971) and has been discussed by Grafen (1979), Bergstrom (1995), Wade & Breden (1980), Day & Taylor (1998), Taylor & Nowak (2007) and Martens (2015). It is a natural formalization of the idea that an actor, in their social behaviour, will care about their partner's pay-off, discounted by relatedness, as well as their own pay-off. Clearly, other transformations of the personal pay-off matrix are also conceivable.
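The transformation U(i, j) = V(i, j) + rV(j, i) is mechanical enough to write down directly. The following sketch applies it to the personal pay-offs of Table 1; the dictionary layout, function names and numerical values are illustrative choices rather than anything taken from the original paper.

```python
def personal_payoffs(b, c):
    """Table 1: V[(i, j)] is the pay-off to an actor playing i against a partner playing j."""
    return {("A", "A"): b - c, ("A", "S"): -c, ("S", "A"): b, ("S", "S"): 0.0}


def inclusive_fitness_payoffs(V, r):
    """Apply U(i, j) = V(i, j) + r * V(j, i) to every cell."""
    return {(i, j): V[(i, j)] + r * V[(j, i)] for (i, j) in V}


b, c, r = 3.0, 1.0, 0.5  # illustrative values only
U = inclusive_fitness_payoffs(personal_payoffs(b, c), r)
for cell, value in U.items():
    print(cell, value)
```

Each entry of `U` should agree with the corresponding cell of Table 2 evaluated at the chosen b, c and r.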
The pay-offs in Table 2 do not correspond exactly to the verbal definition of inclusive fitness in Hamilton (1964), which was ‘the personal fitness which an individual actually expresses…once it is stripped of all components which can be considered as due to the individual's social environment…then augmented by certain fractions of the quantities of harm and benefit which the individual himself causes to the fitnesses of his neighbours…The fractions in question are simply the coefficients of relationship’ (p. 8). This definition is sometimes, but not always, adhered to in the literature.
The discrepancy between Table 2 and Hamilton's definition arises because in the left column, the actor's pay-off has not been stripped of the component that is due to the partner's altruistic action (b) and has been augmented by r times the partner's entire pay-off, rather than the portion of that pay-off that is caused by the actor (b). Applying Hamilton's definition exactly would lead to the pay-off matrix in Table 3 below.
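Table 3. Pay-offs to the actor under Hamilton's (1964) definition of inclusive fitness.

| | Partner: A | Partner: S |
|---|---|---|
| Actor: A | −c + rb | −c + rb |
| Actor: S | 0 | 0 |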
Note that Table 3 derives from Table 2 by subtraction of the quantity b − rc from the left column. In game-theoretic terms, Table 3 is thus a ‘local shift’ of Table 2 (and vice versa), which means that their Nash equilibria are necessarily identical (Weibull, 1995). Therefore, if the players’ utility function is given by Table 3, game theory predicts exactly the same outcome(s) as if it were given by Table 2. So although taking Table 2 as the definition of inclusive fitness involves an element of ‘double counting’ – which Hamilton's definition was designed to avoid – it is harmless.
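The local-shift relation can be checked numerically. In the sketch below, `local_shift` adds a constant to one column of a utility matrix and `gains_from_A` reports the utility gain from playing A rather than S against each partner action; both names, and the parameter values, are hypothetical. Because a player's best response depends only on these within-column gains, leaving them unchanged leaves the Nash equilibria unchanged.

```python
def local_shift(U, partner_action, amount):
    """Add a constant to every entry of one column (one partner action)."""
    return {(i, j): u + (amount if j == partner_action else 0.0)
            for (i, j), u in U.items()}


def gains_from_A(U):
    """Utility gain from playing A rather than S, for each partner action."""
    return {j: U[("A", j)] - U[("S", j)] for j in ("A", "S")}


b, c, r = 3.0, 1.0, 0.5  # illustrative values only
table2 = {("A", "A"): (b - c) * (r + 1), ("A", "S"): -c + r * b,
          ("S", "A"): b - r * c, ("S", "S"): 0.0}
# Subtract b - rc from the left column, as described in the text.
table3 = local_shift(table2, "A", -(b - r * c))
print(table3)
print(gains_from_A(table2), gains_from_A(table3))
```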
In fact, there is a positive reason to prefer Table 2 as the definition of inclusive fitness, in a game-theoretic context, for Hamilton's definition does not generalize easily to nonadditive pay-offs. With nonadditivity, it is unclear how to decide which component of the actor's pay-off is ‘caused’ by its partner's action and vice versa (cf. Allen et al., 2013). By contrast, the definition used in Table 2 – actor pay-off plus r times partner pay-off – applies just as well to the nonadditive case. In order not to prejudge the issue of whether inclusive fitness maximization, or a similar result, obtains under nonadditivity, this is the definition preferred here.
Rational actor analysis: results
Suppose firstly that the utility function is personal pay-off (Table 1). It is easy to see that (S, S) is the only Nash equilibrium of the game, as S strongly dominates A, that is each player does strictly better by playing S irrespective of their partner's choice. This familiar result shows that the rational actor heuristic fails for this choice of utility function, as it would have us conclude that altruism can never evolve, which we know to be false.
What if the utility function is inclusive fitness pay-off (Table 2)? In that case, we can show the following. If rb > c, then (A, A) is the unique Nash equilibrium; if rb < c, then (S, S) is the unique Nash equilibrium; if rb = c, then (A, A) and (S, S) are both Nash equilibria, as is every pair of mixed strategies, so game theory makes no prediction about the players’ choices (See Appendix for proof).
It follows that with additive pay-offs, defining utility as inclusive fitness makes the rational actor heuristic valid. The condition for the A type to evolve, rb > c, is identical to the condition for (A, A) to be the unique Nash equilibrium of the rational game, and similarly for S (Table 4). This supports the idea that evolution will lead organisms to appear as if trying to maximize their inclusive fitness, just as Hamilton originally argued.
rb > c ⇔ A evolves ⇔ (A, A) is unique Nash equilibrium |
rb < c ⇔ S evolves ⇔ (S, S) is unique Nash equilibrium |
rb = c ⇔ no evolution ⇔ all pairs of strategies, pure and mixed, are Nash equilibria |
- ⇔ means ‘if and only if’.
An equivalent perspective on the situation is this. The quantity (rb − c) equals the difference in a player's inclusive fitness pay-off between playing A and S, irrespective of what its partner does (see Table 2). Thus, we can determine whether the A type will evolve by asking whether a rational agent, who wants to maximize their inclusive fitness, would choose A over S. In short, equating utility with inclusive fitness ensures that the rational agent's choice coincides with the ‘choice’ made by natural selection.
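The pure-strategy part of these correspondences can be reproduced by brute force. The sketch below checks every pure profile of the two-player game for mutual best responses, first with personal pay-offs as utility and then with inclusive fitness pay-offs; the helper names and parameter values are assumptions made for illustration, and mixed strategies are not examined.

```python
def pure_nash(U, actions=("A", "S")):
    """Pure-strategy Nash equilibria when both players share the utility U."""
    equilibria = []
    for i in actions:          # player 1's action
        for j in actions:      # player 2's action
            one_ok = all(U[(i, j)] >= U[(k, j)] for k in actions)
            two_ok = all(U[(j, i)] >= U[(k, i)] for k in actions)
            if one_ok and two_ok:
                equilibria.append((i, j))
    return equilibria


def personal(b, c):
    """Personal pay-offs V(i, j) of Table 1."""
    return {("A", "A"): b - c, ("A", "S"): -c, ("S", "A"): b, ("S", "S"): 0.0}


def inclusive(V, r):
    """Inclusive fitness pay-offs U(i, j) = V(i, j) + r * V(j, i)."""
    return {(i, j): V[(i, j)] + r * V[(j, i)] for (i, j) in V}


V = personal(b=3.0, c=1.0)
ne_personal = pure_nash(V)                   # utility = personal pay-off
ne_high_r = pure_nash(inclusive(V, r=0.5))   # rb > c
ne_low_r = pure_nash(inclusive(V, r=0.25))   # rb < c
print(ne_personal, ne_high_r, ne_low_r)
```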
A caveat: uniqueness
One important caveat is needed. In the above model, the inclusive fitness pay-off matrix (Table 2) is not the unique utility function that yields the rb > c condition for action A to be chosen over S. In game theory, the utility function is only ever unique up to choice of origin and unit; so any positive affine transformation (of the form U′ = aU + b, where a, b ∈ ℝ, a > 0) will leave all Nash equilibria of the game unchanged. Furthermore, a ‘local shift’ of the utility function, which involves adding a constant to any column of the utility matrix, will also leave the Nash equilibria unchanged, as noted above.
One local shift of the inclusive fitness pay-off matrix (Table 2) is of particular interest. If we add the quantity (rc − rb) to the left-hand column of Table 2, we get the matrix in Table 5 below.
Table 5. Grafen 1979 pay-offs to the actor (additive case).

| | Partner: A | Partner: S |
|---|---|---|
| Actor: A | (b − c) | −c + rb |
| Actor: S | (1 − r)b | 0 |
The pay-offs in Table 5 are related to the personal pay-offs (Table 1) by the transformation U(i, j) = rV(i, i) + (1 − r)V(i, j). This transformation was first suggested by Grafen (1979), hence the label ‘Grafen 1979 pay-off’; see Bergstrom (1995), Day & Taylor (1998), Alger & Weibull (2012) for discussion. By contrast with the inclusive fitness pay-offs (Table 2), which involve adding r times partner's pay-off to the actor's personal pay-off, the Grafen 1979 pay-offs involve taking an (r, 1 − r) weighted average of the personal pay-off that would accrue to the actor if their partner had chosen the same as the actor and if their partner made the choice that they actually did.
As the Grafen 1979 pay-off matrix (Table 5) is a local shift of the inclusive fitness pay-off matrix (Table 2), the Nash equilibria of the resulting games are identical; thus, the rational actor heuristic works equally well with either. [This is because in both cases, (rb − c) is the pay-off difference between playing A and S.] Therefore, although our simple model has vindicated Hamilton's claim that evolution will lead organisms to behave as if trying to maximize their inclusive fitness, it is important to see that inclusive fitness [whether defined our way (Table 2) or in Hamilton's original way (Table 3)] is not the unique quantity of which this maximization claim is true.
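To make the relation between the two transformations concrete, the following sketch builds both matrices from the same personal pay-offs and compares them cell by cell. Function names and parameter values are illustrative assumptions. With additive pay-offs the two matrices should differ only by a constant in the left-hand column, namely rc − rb.

```python
def grafen_1979_payoffs(V, r):
    """Apply U(i, j) = r * V(i, i) + (1 - r) * V(i, j) to every cell."""
    return {(i, j): r * V[(i, i)] + (1 - r) * V[(i, j)] for (i, j) in V}


def inclusive_fitness_payoffs(V, r):
    """Apply U(i, j) = V(i, j) + r * V(j, i) to every cell."""
    return {(i, j): V[(i, j)] + r * V[(j, i)] for (i, j) in V}


b, c, r = 3.0, 1.0, 0.5  # illustrative values only
V = {("A", "A"): b - c, ("A", "S"): -c, ("S", "A"): b, ("S", "S"): 0.0}
table5 = grafen_1979_payoffs(V, r)
table2 = inclusive_fitness_payoffs(V, r)
# Entry-by-entry difference between the two matrices.
offsets = {cell: table5[cell] - table2[cell] for cell in V}
print(offsets)
```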
Nonadditive pay-offs
To determine whether the above results generalize to the nonadditive case, we consider a modified Prisoner's dilemma in which the pay-off to an A type paired with another A type is (b − c + d) rather than (b − c). So the parameter d quantifies the deviation from pay-off additivity, or synergistic effect, when two A types are paired together; d can be either positive or negative. The resulting pay-off structure (Table 6) is sometimes referred to as a ‘synergy game’ (van Veelen, 2009).
Table 6. Personal pay-offs to the actor in the synergy game (nonadditive case).

| | Partner: A | Partner: S |
|---|---|---|
| Actor: A | b − c + d | −c |
| Actor: S | b | 0 |
Again, we assume that pairs of organisms are drawn from an infinite population to play the game; type is genetically hard-wired and mutation is absent.
Evolutionary analysis
As before, Δp denotes the change in frequency of the A type over a generation. Unsurprisingly, rb > c is no longer the condition for Δp to be positive. However, an exact version of Hamilton's rule can be recovered by suitably defining the cost and benefit terms, as emphasized by Gardner et al. (2011), whose approach we follow here. (A different approach, not discussed here, incorporates nonadditive pay-offs into Hamilton's rule by a weak selection approximation, see for example Lehmann & Rousset, 2014b).




Following Hamilton (1964), instead of considering the effect on the actor's fitness of their partner's action (the partial regression of the actor's fitness on partner type, controlling for own type), we can consider the effect on their partner's fitness of the actor's action (the partial regression of the partner's fitness on the actor's type, controlling for the partner's type). These two partial regression coefficients are numerically identical (Taylor et al., 2007). (This is the well-known switch from ‘neighbour-modulated’ to ‘inclusive’ fitness.) Following Gardner et al. (2011), we denote the coefficient for the effect of the actor's action on its own fitness and the coefficient for its effect on the partner's fitness as −C and B, respectively.
Importantly, eqn 2 can be fitted whether or not the true relation between an organism's fitness w, its own type $p_i$ and its partner's type is linear. In the nonadditive case under consideration that relation is nonlinear (since d ≠ 0), which implies that the partial regression coefficients −C and B will be functions of population-wide gene frequencies, and liable to change as the population evolves. Therefore, unlike c and b, which are fixed pay-off parameters, −C and B are population variables.
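The dependence of −C and B on gene frequency can be seen by fitting the regression numerically. The sketch below is a minimal illustration under stated assumptions: fitness is taken to be 1 − c·own + b·partner + d·own·partner (the pay-offs of Table 6 over a unit baseline), pairing follows a constant-r rule with |r| < 1, and the least-squares coefficients are obtained by solving the two normal equations by hand. The function name and parameter values are hypothetical.

```python
def regression_cost_benefit(p, r, b, c, d):
    """Return (C, B) from a least-squares fit of fitness on own and partner type."""
    # Joint distribution of (own type, partner type) under assumed constant-r pairing.
    f_AA = p * p + r * p * (1 - p)
    f_AS_half = p * (1 - p) * (1 - r)
    f_SS = (1 - p) ** 2 + r * p * (1 - p)
    joint = {(1, 1): f_AA, (1, 0): f_AS_half, (0, 1): f_AS_half, (0, 0): f_SS}

    def fitness(own, partner):
        return 1 - c * own + b * partner + d * own * partner

    def mean(f):
        return sum(prob * f(own, partner) for (own, partner), prob in joint.items())

    w_bar = mean(fitness)
    cov_w_own = mean(lambda o, q: fitness(o, q) * o) - w_bar * p
    cov_w_partner = mean(lambda o, q: fitness(o, q) * q) - w_bar * p
    var = p * (1 - p)
    # Normal equations with Var(own) = Var(partner) = var and Cov(own, partner) = r * var.
    minus_C = (cov_w_own - r * cov_w_partner) / (var * (1 - r * r))
    B = (cov_w_partner - r * cov_w_own) / (var * (1 - r * r))
    return -minus_C, B


r, b, c, d = 0.25, 3.0, 1.0, 0.5  # illustrative values only
C_low, B_low = regression_cost_benefit(0.2, r, b, c, d)
C_high, B_high = regression_cost_benefit(0.8, r, b, c, d)
C_add, B_add = regression_cost_benefit(0.4, r, b, c, 0.0)  # additive special case
print(C_low, B_low, C_high, B_high, C_add, B_add)
```

With d = 0 the fit returns the pay-off parameters c and b themselves; with d ≠ 0 the fitted values shift as p changes, which is the sense in which −C and B are population variables.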





Note that (rB − C) is a function of p, so satisfaction of rB > C in generation t does not imply its satisfaction in generation t + 1. Selection is thus frequency-dependent, and neither type will necessarily spread to fixation. A polymorphic equilibrium will obtain when p = [c − r(b + d)]/d[1 − r]; the stability of this equilibrium depends on the sign of d. The full evolutionary dynamics are summarized in Table 7 below (see Appendix for proof).
Table 7. Evolutionary dynamics with nonadditive pay-offs.

| Case | Condition | Outcome |
|---|---|---|
| Case 1: r < 1, d > 0 | | |
| (i) | rb − c + rd ≥ 0 | A evolves to fixation |
| (ii) | rb − c + d ≤ 0 | S evolves to fixation |
| (iii) | rb − c + d > 0 > rb − c + rd | Unstable polymorphism at p = [c − r(b + d)]/d[1 − r] |
| Case 2: r < 1, d < 0 | | |
| (i) | rb − c + d ≥ 0 | A evolves to fixation |
| (ii) | rb − c + rd ≤ 0 | S evolves to fixation |
| (iii) | rb − c + d < 0 < rb − c + rd | Stable polymorphism at p = [c − r(b + d)]/d[1 − r] |
| Case 3: r = 1 | | |
| (i) | b − c + d > 0 | A evolves to fixation |
| (ii) | b − c + d < 0 | S evolves to fixation |
| (iii) | b − c + d = 0 | No evolutionary change |
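The polymorphic cases can be illustrated by iterating the dynamics directly. This is a sketch under assumptions rather than the derivation used in the Appendix: pairing follows a constant-r rule (an A meets an A with probability r + (1 − r)p, an S meets an A with probability (1 − r)p), and the function names and parameter values are hypothetical. The first parameter set falls under Case 2(iii), the second under Case 1(iii).

```python
def next_frequency(p, r, b, c, d):
    """Frequency of type A after one generation in the synergy game."""
    # Assumed pairing rule with constant r (0 <= r < 1).
    meets_A_if_A = r + (1 - r) * p
    meets_A_if_S = (1 - r) * p
    w_A = 1 - c + (b + d) * meets_A_if_A
    w_S = 1 + b * meets_A_if_S
    w_bar = p * w_A + (1 - p) * w_S
    return p * w_A / w_bar


def interior_equilibrium(r, b, c, d):
    """Polymorphic equilibrium p = [c - r(b + d)] / (d[1 - r])."""
    return (c - r * (b + d)) / (d * (1 - r))


def run(p0, r, b, c, d, generations=2000):
    p = p0
    for _ in range(generations):
        p = next_frequency(p, r, b, c, d)
    return p


# Case 2(iii): d < 0 and rb - c + d < 0 < rb - c + rd.
stable = dict(r=0.5, b=3.0, c=1.0, d=-0.8)
p_star_stable = interior_equilibrium(**stable)
from_below = run(0.05, **stable)
from_above = run(0.9, **stable)

# Case 1(iii): d > 0 and rb - c + d > 0 > rb - c + rd.
unstable = dict(r=0.5, b=3.0, c=2.0, d=0.8)
p_star_unstable = interior_equilibrium(**unstable)
just_below = run(0.2, **unstable)
just_above = run(0.3, **unstable)
print(p_star_stable, from_below, from_above)
print(p_star_unstable, just_below, just_above)
```

In the first case both starting points approach the interior equilibrium; in the second, trajectories starting on either side of it move away towards loss or fixation of A.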
The general version of Hamilton's rule embodied in eqn 5 raises interesting interpretive questions. Some have argued that the rule in this form has little explanatory value (Nowak et al., 2011; Allen et al., 2013), whereas others have seen the generality of the rule as an advantage, a proof that inclusive fitness theory does not rely on restrictive assumptions (Gardner et al., 2011). This debate has been analysed elsewhere (Birch, 2014; Birch & Okasha, 2015) and is not the focus here.
Instead, our question is this. Given that eqn 5 is true, and given the resulting evolutionary dynamics, can the rational actor heuristic be applied? Will evolution lead organisms to behave as if maximizing a utility function, and if so what is it?
Importantly, the answer to this question cannot simply be read off eqn 5. In the additive case, there was a simple link between Hamilton's rule and a utility function with the desired property: rb − c > 0 was the condition for the A type to spread, and (rb − c) the utility difference between playing A and S. One might hope to extrapolate this to the nonadditive case by simply replacing (rb − c) with (rB − C) in Table 2. However, as B and C are functions of p, they cannot meaningfully feature in the utility function.
The reason is as follows. The point of the rational actor heuristic is to find a link between gene-frequency dynamics and a ‘goal’ that organisms behave as if they are trying to achieve. Such a link would be trivial if the ‘goal’ were allowed to change as gene frequencies change. For the heuristic to have any value, the goal must remain fixed. So our task is to find a utility function whose arguments are restricted to the pay-off parameters (b, c and d), and the relatedness coefficient r, which makes the rational actor heuristic work.
Rational actor analysis
To address this question, we again transpose the evolutionary model to a rational choice context and study the Nash equilibria of the resulting game. Suppose firstly that the utility function is given by the inclusive fitness pay-off transformation, that is personal pay-off plus r times partner pay-off. This yields the pay-offs in Table 8 below.
Table 8. Inclusive fitness pay-offs to the actor (nonadditive case).

| | Partner: A | Partner: S |
|---|---|---|
| Actor: A | (b − c + d)(r + 1) | −c + rb |
| Actor: S | b − rc | 0 |

It follows that, unlike in the additive case, the rational actor heuristic does not work when utility is defined as inclusive fitness. The condition for (A, A) to be a Nash equilibrium is not identical to the condition for A to evolve to fixation, and similarly for S. Furthermore, the condition for there to be a mixed-strategy Nash equilibrium is not the same as the condition for there to be a polymorphism. So it is not true that at evolutionary equilibrium, organisms will behave as if trying to maximize their inclusive fitness.
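A single numerical counterexample makes the mismatch concrete. The sketch below classifies the evolutionary outcome using the conditions of Table 7 (for r < 1) and separately lists the pure-strategy Nash equilibria of the game with the pay-offs of Table 8. The parameter values are chosen purely for illustration, and all function names are hypothetical.

```python
def synergy_payoffs(b, c, d):
    """Table 6 personal pay-offs V(i, j)."""
    return {("A", "A"): b - c + d, ("A", "S"): -c, ("S", "A"): b, ("S", "S"): 0.0}


def inclusive_fitness_payoffs(V, r):
    """Table 8: U(i, j) = V(i, j) + r * V(j, i)."""
    return {(i, j): V[(i, j)] + r * V[(j, i)] for (i, j) in V}


def pure_nash(U, actions=("A", "S")):
    """Pure profiles in which each action is a best response to the other."""
    return [(i, j) for i in actions for j in actions
            if all(U[(i, j)] >= U[(k, j)] for k in actions)
            and all(U[(j, i)] >= U[(k, i)] for k in actions)]


def evolutionary_outcome(r, b, c, d):
    """Outcome according to Table 7, assuming r < 1 and d != 0."""
    at_p0 = r * b - c + r * d   # limit of rB - C as p -> 0
    at_p1 = r * b - c + d       # limit of rB - C as p -> 1
    if d > 0:
        if at_p0 >= 0:
            return "A evolves to fixation"
        if at_p1 <= 0:
            return "S evolves to fixation"
        return "unstable polymorphism"
    if at_p1 >= 0:
        return "A evolves to fixation"
    if at_p0 <= 0:
        return "S evolves to fixation"
    return "stable polymorphism"


r, b, c, d = 0.5, 3.0, 2.0, 0.4  # illustrative values only
outcome = evolutionary_outcome(r, b, c, d)
nash = pure_nash(inclusive_fitness_payoffs(synergy_payoffs(b, c, d), r))
print(outcome, nash)
```

For these values S evolves to fixation, yet (A, A) is a Nash equilibrium of the inclusive fitness game alongside (S, S), so the equilibria do not single out the evolutionary outcome.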
Can we find a utility function modulo which the rational actor heuristic works? The answer is yes. The Grafen 1979 pay-off matrix, which to recall is derived from the personal pay-off matrix by the transformation U(i, j) = rV(i, i) + (1 − r)V(i, j), does the trick. This yields the pay-offs in Table 9 below.
Table 9. Grafen 1979 pay-offs to the actor (nonadditive case).

| | Partner: A | Partner: S |
|---|---|---|
| Actor: A | (b − c + d) | −c + rb + rd |
| Actor: S | (1 − r)b | 0 |

This restores the rational actor heuristic. In particular, if (A, A) is the only pure-strategy Nash equilibrium, then A evolves to fixation; if (S, S) is the only pure-strategy equilibrium, then S evolves to fixation. If there is a mixed-strategy Nash equilibrium but no pure-strategy equilibria, the population evolves to a stable polymorphism; if there is a mixed-strategy Nash equilibrium and both (A, A) and (S, S) are pure-strategy equilibria, then there is an unstable polymorphism; in both cases, the weights on A and S in the mixed-strategy Nash equilibrium equal the proportions of A and S in the polymorphism. Thus, there is a tight correspondence between the Nash equilibria and the evolutionary dynamics, summarized in Table 10 (see Appendix for proof).
(A, A) is only pure N.E. ⇒ A evolves to fixation |
(S, S) is only pure N.E. ⇒ S evolves to fixation |
(πA, πA) is the only N.E. ⇔ stable polymorphism at p = πA |
(πA, πA), (A, A), (S, S) all N.E. ⇔ unstable polymorphism at p = πA |
- Note: πA = (c − r(b + d))/d(1 − r).
The upshot is that with nonadditive pay-offs, the rational actor heuristic will work so long as the utility function is defined as Grafen 1979 pay-off, rather than inclusive fitness pay-off. Again any affine transformation of the Grafen 1979 pay-off matrix, or any local shift, will also preserve the correspondences above. Note that, unlike in the additive case, the Grafen 1979 pay-off matrix (Table 9) is not a local shift of the inclusive fitness pay-off matrix (Table 8). This is why the rational actor heuristic fails if utility is defined as inclusive fitness in the nonadditive case.
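The correspondences can be spot-checked numerically for particular parameter values. The sketch below computes the Grafen 1979 pay-offs for the synergy game and then looks for symmetric equilibria only, that is profiles of the form (A, A), (S, S) or (πA, πA), which are the profiles listed in Table 10; asymmetric profiles are not examined. Function names and parameter values are hypothetical.

```python
def synergy_payoffs(b, c, d):
    """Table 6 personal pay-offs V(i, j)."""
    return {("A", "A"): b - c + d, ("A", "S"): -c, ("S", "A"): b, ("S", "S"): 0.0}


def grafen_1979_payoffs(V, r):
    """Table 9: U(i, j) = r * V(i, i) + (1 - r) * V(i, j)."""
    return {(i, j): r * V[(i, i)] + (1 - r) * V[(i, j)] for (i, j) in V}


def symmetric_pure_nash(U, actions=("A", "S")):
    """Actions i such that (i, i) is a Nash equilibrium."""
    return [i for i in actions if all(U[(i, i)] >= U[(k, i)] for k in actions)]


def symmetric_mixed_nash(U):
    """Weight on A in an interior symmetric equilibrium, or None if there is none."""
    gain_vs_A = U[("A", "A")] - U[("S", "A")]
    gain_vs_S = U[("A", "S")] - U[("S", "S")]
    if gain_vs_A == gain_vs_S:
        return None
    # Indifference between A and S against a partner who plays A with this weight.
    weight = gain_vs_S / (gain_vs_S - gain_vs_A)
    return weight if 0 < weight < 1 else None


def analyse(r, b, c, d):
    U = grafen_1979_payoffs(synergy_payoffs(b, c, d), r)
    return symmetric_pure_nash(U), symmetric_mixed_nash(U)


pure_stable, mixed_stable = analyse(r=0.5, b=3.0, c=1.0, d=-0.8)     # Table 7, Case 2(iii)
pure_unstable, mixed_unstable = analyse(r=0.5, b=3.0, c=2.0, d=0.8)  # Table 7, Case 1(iii)
pure_s, mixed_s = analyse(r=0.5, b=3.0, c=2.0, d=0.4)                # Table 7, Case 1(ii)
print(pure_stable, mixed_stable)
print(pure_unstable, mixed_unstable)
print(pure_s, mixed_s)
```

In both polymorphic cases the mixed-equilibrium weight on A coincides, up to rounding, with [c − r(b + d)]/d[1 − r] evaluated at the same parameters.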
Discussion
Hamilton's original formulation of inclusive fitness theory assumed additivity of costs and benefits. A number of authors have emphasized that an exact version of Hamilton's rule holds with nonadditive pay-offs, so long as the −C and B terms are suitably defined. Here, we have focused on the relevance of pay-off additivity not for Hamilton's rule itself, but for Hamilton's (logically distinct) claim that evolution will lead organisms to behave as if trying to maximize their inclusive fitness, understood here to mean personal pay-off plus r times partner pay-off.
In a recent critique, Allen et al. (2013) observe that arguments for inclusive fitness maximization all rely on pay-off additivity, and that where selection is frequency-dependent, fitness maximization need not generally occur. They write: ‘evolution does not, in general, lead to the maximization of inclusive fitness or any other quantity’ (p. 20138).
Our analysis partly supports this conclusion. Here, we have understood maximization to include best-response, so that the presence of frequency dependence does not automatically preclude a maximization principle from holding; we have allowed the utility function to be any function of the pay-off parameters b, c and d and the relatedness coefficient r. At the evolutionary equilibrium of our simple nonadditive model, it is not true that organisms behave as if trying to maximize their inclusive fitness pay-off. However, there is a somewhat similar quantity – Grafen 1979 pay-off – that organisms do behave as if they are trying to maximize.
It is an open question whether our positive result – maximization of Grafen 1979 pay-off – extends to more complicated models of social evolution, for example that incorporate local interaction, multiple social partners, or class structure, or to more realistic genetic architectures than haploid inheritance. There is no guarantee that it does, as such models typically lead to more complicated evolutionary dynamics than those assumed here. As has been emphasized before, a valid maximization argument must always deduce the quantity being maximized, if any, from the underlying evolutionary dynamics (Mylius & Diekmann, 1995).
Also, we have assumed that the coefficient of relatedness, r, remains constant as the population evolves. Without this assumption, it makes little sense to allow the utility function to depend on r, as this would be tantamount to positing a changing ‘goal’ so would again trivialize the rational actor heuristic. In some inclusive fitness models, r is in fact a dynamic variable rather than a constant (e.g. van Baalen & Rand, 1998), so it cannot be assumed that our results, or ones like them, can be derived for these models.
Our negative result, that maximization of inclusive fitness only holds with additive pay-offs, is in line with previous results by Bergstrom (1995) and Lehmann & Rousset (2014a); it supports some of the claims made by opponents of inclusive fitness theory such as Allen & Nowak (2015). The key logical point to note is that although a version of Hamilton's rule is indeed a fully general evolutionary principle, as Gardner et al. (2011) stress, no principle about individual maximization can be deduced directly from this form of the rule. Whether such a principle holds, and if so what the quantity being maximized is, needs to be shown on a case-by-case basis.
Finally, what are the implications for biological practice? Behavioural ecologists have often used inclusive fitness maximization as a way to interpret observed behaviour in the field, in line with Hamilton's original suggestion. Our analysis suggests that this will not always be possible. If an observed social behaviour fails to maximize an individual's inclusive fitness, defined as personal pay-off plus r times partner's pay-off, the behaviour may nonetheless be adaptive and the population at an evolutionary equilibrium. Moreover, the quantity we have called ‘Grafen 1979 pay-off’ will serve the needs of the behavioural ecologist seeking to identify the ‘goal’ of evolved behaviour in a broader range of cases than will inclusive fitness itself.
Acknowledgments
Thanks to Ken Binmore, Bengt Autzen, Jonathan Birch, Steve Frank, Herb Gintis, Alan Grafen, Andy Gardner and two anonymous referees for their comments and discussion. This work was supported by the European Research Council Seventh Framework Program (FP7/2007-2013), ERC Grant agreement no. 295449.
Appendix 1
Additive pay-offs


which is eqn 2.


Therefore, (A, A) and (S, S) are both N.E. ⇔ rb = c.
Let πA be an arbitrary mixed strategy that plays A with probability πA.

Therefore, U(A, πA) = U(S, πA) ⇔ rb = c, in which case (πA, πA) is a N.E.
This establishes the correspondences in Table 4.
Nonadditive pay-offs

From eqn 3, we have:

which is eqn 5.
(ii) Equation 6 tells us that rB − C = (rb − c) + d[r + p(1 − r)]



If d > 0, then ∆p < 0 when p < p* and ∆p > 0 when p > p*, so the equilibrium is unstable. If d < 0, the equilibrium is stable.
Suppose that r < 1 and d > 0 (case 1).
- Then, (rb − c) + d[r + p(1 − r)] → (rb – c + rd) as p → 0
- and (rb − c) + d[r + p(1 − r)] → (rb – c + d) as p → 1
- Therefore, A evolves to fixation ⇔ (rb − c) + rd ≥ 0.
- Similarly, S evolves to fixation ⇔ (rb − c) + d ≤ 0.
- When rb – c + d > 0 > rb – c + rd, then 0 < [c − r(b + d)]/d[1 − r] < 1, so an unstable polymorphism evolves.
Suppose that r < 1 and d < 0 (case 2).
- By identical reasoning to case 1:
- A evolves to fixation ⇔ (rb − c) + d ≥ 0.
- S evolves to fixation ⇔ (rb − c) + rd ≤ 0.
- When rb − c + d < 0 < rb – c + rd, then 0 < [c − r(b + d)]/d[1 − r] < 1, so a stable polymorphism evolves.
Suppose that r = 1 (case 3).
- Then, rB – C = b – c + d. Therefore,
- A evolves to fixation ⇔ b − c + d > 0
- S evolves to fixation ⇔ b − c + d < 0
- No evolutionary change occurs ⇔ b − c + d = 0
This establishes the evolutionary dynamics in Table 7.




Therefore,


Therefore,



Similarly, (S, S) is sole pure-strategy N.E. ⇒ S spreads to fixation.
Consider mixed strategy (πA, πA), where






This establishes the correspondences in Table 10.