Volume 87, Issue 6 e70048
REVIEW ARTICLE
Collective Acoustics in Pan: Conserved Roots in the Evolution of Human Musicality

James Brooks

Corresponding Author

Cooperative Evolution Lab, German Primate Center, Göttingen, Germany

Institute for Advanced Study, Kyoto University, Kyoto, Japan

Wildlife Research Center, Kyoto University, Kyoto, Japan

Correspondence: James Brooks ([email protected])

Contribution: Conceptualization (equal), Investigation (equal), Writing - original draft (equal), Writing - review & editing (equal)

Zanna Clay

Department of Psychology, Durham University, Durham, UK

Contribution: Conceptualization (equal), Investigation (equal), Writing - original draft (equal), Writing - review & editing (equal)

Valérie Dufour

Laboratoire de Psychologie Sociale et Cognitive, UMR 6024, CNRS-Université Clermont Auvergne, Clermont-Ferrand, France

Contribution: Conceptualization (equal), Investigation (equal), Writing - original draft (equal), Writing - review & editing (equal)

Pawel Fedurek

Division of Psychology, Faculty of Natural Sciences, University of Stirling, Stirling, UK

Contribution: Conceptualization (equal), Investigation (equal), Writing - original draft (equal), Writing - review & editing (equal)

Cédric Girard-Buttoz

The Ape Social Mind Lab, Institut des Sciences Cognitives Marc Jeannerod, UMR 5229, CNRS, Bron, France

Department of Human Behavior, Ecology and Culture, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany

Contribution: Conceptualization (equal), Investigation (equal), Writing - original draft (equal), Writing - review & editing (equal)

Shinya Yamamoto

Institute for Advanced Study, Kyoto University, Kyoto, Japan

Wildlife Research Center, Kyoto University, Kyoto, Japan

Institute for the Future of Human Society, Kyoto University, Kyoto, Japan

Contribution: Conceptualization (equal), Investigation (equal), Writing - original draft (equal), Writing - review & editing (equal)

First published: 28 May 2025

ABSTRACT

The evolution of human musicality has attracted intense cross-disciplinary research attention. However, despite widespread interest, there has been surprisingly little explicit focus on the conserved roots and evolutionary precursors of musicality in our closest relatives, chimpanzees (Pan troglodytes) and bonobos (P. paniscus). Here, we aim to evaluate the extant literature on chimpanzees and bonobos in behavioral contexts relevant to evolutionary theories of musicality, especially simultaneous production of acoustic signals by multiple individuals (“collective acoustics”). We illustrate the importance of this literature by evaluating and comparing a pair of recent, influential, and competing theories on the evolution of human musicality (music for social bonding and music for credible signaling) in light of the reviewed empirical evidence. We conclude by highlighting core remaining questions for future empirical studies on great ape collective acoustics that may have a critical influence on our understanding of the evolution of human musicality.

Summary

  • Comparative research on musicality focuses largely on convergent rather than conserved elements of musicality, with almost no direct attention to our closest relatives, the great apes.

  • Both chimpanzees and bonobos produce a range of vocal and percussive sounds together (“collective acoustics”) across social settings, but display notable differences in both form and context.

  • Comparing chimpanzee and bonobo collective acoustic behavior, alongside their social contexts and underlying mechanisms, provides an important study system to evaluate competing theories on the evolution of human musicality.

1 Introduction

1.1 General Background

Collective acoustics, or simultaneous production of acoustic signals by multiple individuals, is a staple of human group behavior and musicality. One major approach to the scientific study of human musical origins is through analysis of non-human animal acoustic behavior. Specifically, the biomusicology approach described by Fitch (2015) emphasizes a multi-component, pluralistic, comparative, and ecologically motivated perspective on the evolution of musicality. Traditionally, much of this work has focused on cases of convergent vocal evolution; as a result, little attention has been paid to the conserved roots of human musicality found in our closest relatives, the Pan apes. Likewise, most research on the function of great ape acoustics has so far focused on signals from a single caller, leaving the role of collective acoustics largely underexplored. We therefore address this gap by focusing explicitly on collective acoustics, both vocal and percussive, in bonobos and chimpanzees. We analyze the natural contexts and underlying factors involved in the simultaneous and synchronized vocalizations and drumming observed in great apes, and discuss their relevance and implications for broader theories targeting the evolution of musicality.

Theorizing about the evolutionary origins of music has been ongoing for nearly as long as theorizing about evolution itself. Darwin himself first speculated that music's evolutionary role was in the context of sexual selection (Darwin 1871). Another influential, but opposing, perspective suggests instead that music is a mere byproduct of auditory preferences driven by other evolutionary processes (so-called “auditory cheesecake”) (Pinker 2003). Other perspectives, not always mutually exclusive, suggest core evolutionary roles for elements of musicality in contexts of offspring care (Dissanayake 2001), frightening off predators (Jordania 2009), emotional communication (Mithen 2006), or as a consequence of our switch to bipedalism (Larsson et al. 2019). There is no shortage of alternatives. While our paper aims to comprehensively review Pan collective acoustics (and other music-related behaviors) relevant to any theory of human musicality, we here focus especially on illustrating the importance of this approach through application to two influential but competing hypotheses: music for social bonding (MSB) and music for credible signaling (MCS). Both hypotheses emphasize continuity between species, make testable predictions for non-human animal acoustic behavior, and, more specifically, emphasize functional importance in behavioral contexts that differ significantly between our two closest relatives, bonobos and chimpanzees. The opposing predictions of this pair, therefore, provide a clear case study for evaluating the implications of conserved collective acoustic behavior for our understanding of the evolution of human musicality.

1.2 Chimpanzees and Bonobos

Chimpanzees and bonobos are humans’ two closest relatives. The Homo lineage split from Pan an estimated 5–6 million years ago, while the two Pan species split from one another only around 1–2 million years ago (de Manuel et al. 2016; Prado-Martinez et al. 2013; Prüfer et al. 2012; Takemoto et al. 2015). In this time, the three extant Hominins each developed a distinct range of collective acoustic behavior, both in form and context. To explain why humans alone display the diversity of musical behavior universally expressed in our species, we also require a careful assessment of the collective acoustics of bonobos and chimpanzees. We must explain when and why the forms that existed in the last common ancestor were insufficient, and why the two Pan species show differing overlap with humans in key properties of their collective acoustic repertoire. A full understanding of the evolution of musicality will entail explaining both what humans share with other apes and what they do not, and will further identify the behavioral precursors from which human-unique forms likely developed.

Both chimpanzees and bonobos are flexible frugivores, exhibit fission-fusion grouping structures, and have male philopatry (Gruber and Clay 2016). Despite their recent divergence and similar ecological niches, however, they show some striking differences in social behavior. Notably, chimpanzees are characterized by violent intergroup relations, have strict male dominance hierarchies shaped by coalitionary aggression, and in some populations engage in group hunting. Bonobos, on the other hand, have largely tolerant intergroup relations (with encounters even lasting weeks at a time) and an absence of lethal intergroup aggression, their dominance hierarchies are often described as co-dominant between males and females (with strong female coalitions but little coalition formation among males), and they outperform chimpanzees at most captive dyadic cooperation tasks (Hare and Yamamoto 2017). Both species also have rich, overlapping yet distinct, acoustic repertoires, a key difference being the notably elevated pitch of bonobo vocalizations as compared to those of chimpanzees (Grawunder et al. 2018). Because chimpanzees and bonobos are humans’ two closest relatives, comparisons among these three species have provided key insights about humans’ evolutionary past (see Hare and Yamamoto 2017 for a review). While the other great apes, gorillas and orangutans, are undoubtedly relevant as well, we limit our focus to Pan to allow a more thorough and focused discussion of the two sister species.

1.3 MSB and MCS

The first of the two major theories we explore in apes is the music for social bonding (MSB) hypothesis (Savage et al. 2021). As the name suggests, this theory argues that the principal unifying evolutionary function of musicality is to support the development and maintenance of social bonds. Focusing on the effects of joint music production on social relations between performers, Savage et al. (2021) review evidence on music's structural suitability for large-scale social bonding (its repetitive structure allows large numbers of participants to predict it reliably), neurobiological evidence for its effects on social affiliation and reward (especially highlighting the release of oxytocin and dopamine in musical and bonding contexts), and cross-cultural and cross-species evidence for the contexts in which certain elements of musicality can reliably be found. MSB is proposed to also apply to parent-offspring bonds and mate bonding through the same mechanisms, but especially emphasizes music's evolutionary role in group-wide social cohesion. The argument is that music and its evolutionary precursors allowed individuals to align in time, coordinate in large groups, and share emotions through expressive performance. The authors thus treat music not as an all-or-nothing phenomenon but as a matter of continuity, and predict that elements of musicality, even in the absence of full-fledged music, will have similar social and physiological effects across species. They therefore predict that synchronous and predictable acoustic production will promote social bonding across species, and that the occurrence of music-like behavior should reflect species’ socio-ecologies (being more pronounced in species requiring scalable bonding mechanisms beyond the dyadic level). They also predict that similar physiological mechanisms, acoustic structural properties, and emotional content should be found across species in these contexts.

The second major theory we explore is the music for credible signaling (MCS) hypothesis (Mehr et al. 2021). This theory instead focuses on credible signaling of cooperative intent in two domains: coalitional strength and parental attention. As the empirical evidence relevant for the latter in chimpanzees and bonobos is largely insufficient for evaluation of this hypothesis at this time, we focus here on the arguments for credible signaling of coalitional strength. In this domain, Mehr et al. (2021) suggest that the primary evolutionary function of musicality lies in its effects on listeners as opposed to producers. According to MCS, collective acoustic signals in the context of cooperative signaling can reliably indicate the number of individuals, their coordination ability, and the willingness of the participants to join the cooperative act (i.e., louder, more cohesive, and more coordinated acoustic displays can be a signal of larger, more coordinated coalitions). They highlight that complex acoustic signals have evolved independently across species in territorial defense, that coordination requires substantial effort and could, therefore, be a costly signal of investment in the coalition, and that group coordination ability could thus signal value in a biological market of potential allies for both within-group and between-group contests. MCS, like MSB, emphasizes continuity between species. Predictions relevant to other species arising from this view are, therefore, that collective acoustics will vary by degree of coalitional and intergroup competition, that they will be associated with cooperative intent in species-typical cooperative contexts, and that individuals should be highly sensitive to the acoustic coordination of potential allies and opponents.

1.4 Previous Studies

Despite the emphasis on continuity in both proposals, most previous comparative biomusicology has focused on convergent rather than conserved behavior. Work with songbirds, marine mammals, and pinnipeds, for example, has shaped evolutionary thinking around complex vocal learning (Fitch 2005; Patel 2021; Ravignani et al. 2016), while the duetting of some birds and primates (e.g., gibbons, Hylobatidae; titi monkeys, Callicebinae; and indris, Indri indri) has provided insights on the biological roots of vocal synchronization (Clink et al. 2020; Gamba et al. 2016; Geissmann 2002; Haimoff 1986; Müller and Anzenberger 2002). Attention to great apes has thus far been more limited (Wagner and Hoeschele 2022). Some authors have speculated on the relevance of attention to our closest relatives, but largely without detailed reference to available empirical literature. Of note, Merker (1999) appealed explicitly to the context of synchronous chorusing in our great ape relatives in the origins of music, and Hagen and Hammerstein (2009) build from this perspective, suggesting “Thus did the calls of human ancestors, which were probably functionally analogous to chimpanzee pant hoots or wolf howls, evolve into the music of modern humans.” Fitch (2015) pays more attention to another great ape behavior potentially relevant to the evolution of music, noting that “ape drumming represents a striking parallel to human percussive behavior, and its appearance in our closest living relatives (but not, apparently, among orangutans or other primates) strongly suggests the possibility of an overlooked and important homology for human instrumental music making.” Conversely, Killin (2024) suggests “wild chimps might drum along on tree buttresses or their bodies, and indeed make a racket, but not a racket with a steady beat” and McDermott and Hauser (2005) judge that it “seems unlikely that any resemblance between some elements of human and animal song is due to a homology.” Clearly, more attention to the empirical evidence is needed, both to specify the key similarities and differences between Homo and Pan collective acoustic behavior and to evaluate the evolutionary reasons for the differences between the three species. An accumulating body of work, both on acoustic behavior and musical precursors such as rhythm perception in apes, now allows for more direct comparison of theory and evidence as applied to the potentially conserved elements of our species’ musicality.

For the remainder of this paper, we follow Duranton and Gaunet (2016) in distinguishing behavioral synchrony (performing the same behavior at the same time) and temporal synchrony (fine-scale temporal alignment, but not necessarily the same behavior). Table 1 illustrates a (non-exhaustive) set of predictions for species commonalities and differences that can be drawn from the MSB and MCS hypotheses for the evolution of human musicality as applied to Pan collective acoustics. The two hypotheses make some similar and some different predictions in each case. Where they agree, they can be used to assess how well the hypotheses together account for collective acoustic behavior in Pan, and where they differ, they can be used to identify which gives more accurate predictions. Caution is required, as even similar proximate behaviors may have different ultimate functions, and likewise, similar ultimate functions may produce different proximate behaviors. We, therefore, survey both the range of contexts of ape collective acoustics and our knowledge of their proximate underpinnings, and propose several future directions in bridging the fields of biomusicology and great ape acoustics.

Table 1. Summary of differing predictions of MSB and MCS for Pan collective acoustics.

Hypothesis: Music for Social Bonding (MSB)
Main evolutionary function of music: Promote bonds between joint callers
Predicted commonalities across Pan species:
  • Collective acoustics track affiliative (e.g., grooming, food sharing) networks.
  • Low importance of listeners on collective acoustic production.
  • Collective acoustics most common in socio-positive affiliative settings.
Predicted differences within and between Pan species:
  • Collective acoustics match species differences in bonding patterns (most prominent in chimpanzees in males and at the group-level, in bonobos most prominent in females and at the dyadic-level).
  • Collective acoustics in intergroup encounters more prominent in bonobos (intergroup affiliation and bonding only in bonobos).
  • Small between-species compared to within-species differences in overall collective acoustics and rhythmic ability (similar importance of social bonds overall to both species despite distinctions in form).

Hypothesis: Music for Credible Signaling (MCS)
Main evolutionary function of music: Honestly signal coalitionary strength
Predicted commonalities across Pan species:
  • Collective acoustics track coalitionary aggression networks.
  • High importance of listeners on collective acoustic production.
  • Collective acoustics most common in socio-negative competitive settings.
Predicted differences within and between Pan species:
  • Collective acoustics match species differences in coalition patterns (prominent only in male chimpanzees and female bonobos).
  • Collective acoustics in intergroup encounters more prominent in chimpanzees (intense intergroup competition only in chimpanzees).
  • More collective acoustics and rhythmic ability in chimpanzees (intense intergroup competition and coalitionary dominance networks only found in chimpanzees).
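
One contrast in Table 1 (whether collective acoustics track affiliative or coalitionary networks) could in principle be examined by correlating dyadic chorusing rates with dyadic grooming and coalition rates. The Python sketch below only illustrates that idea. The matrices are invented toy data for four hypothetical individuals, the function names are hypothetical, and a plain Pearson correlation stands in for the permutation-based tests that non-independent dyadic data would require in practice.

```python
from math import sqrt


def dyad_values(matrix):
    """Flatten the upper triangle of a symmetric dyadic matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def network_tracking(chorus_matrix, other_matrix):
    """Correlation between dyadic chorusing rates and another dyadic measure."""
    return pearson(dyad_values(chorus_matrix), dyad_values(other_matrix))


# Invented toy data: entry [i][j] is a count of joint events for dyad (i, j).
chorusing = [[0, 5, 1, 0],
             [5, 0, 2, 1],
             [1, 2, 0, 4],
             [0, 1, 4, 0]]
grooming = [[0, 4, 1, 0],
            [4, 0, 2, 1],
            [1, 2, 0, 5],
            [0, 1, 5, 0]]
coalitions = [[0, 0, 3, 4],
              [0, 0, 1, 2],
              [3, 1, 0, 0],
              [4, 2, 0, 0]]

r_grooming = network_tracking(chorusing, grooming)
r_coalitions = network_tracking(chorusing, coalitions)
print("chorusing vs grooming:", round(r_grooming, 2))
print("chorusing vs coalitions:", round(r_coalitions, 2))
```

With these invented values, chorusing correlates positively with grooming and negatively with coalitions, which is the pattern MSB would predict. Real data could show either pattern, both, or neither.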

2 Contexts of Ape Collective Acoustics

2.1 Joint Calling With Others: Social Factors Associated With Calling Together

Joint calling, when two or more individuals vocalize at the same time, is a common behavior in many animal species ranging from insects to humans (Bailey and Hammond 2003; Demartsev et al. 2018; Levinson 2016; Merker 1999; Pika et al. 2018; Schulz et al. 2008; Snowdon and Cleveland 1984; Takahashi et al. 2013). Coordinated calling, as used here, takes place when two or more calling individuals additionally actively coordinate their vocal behavior (e.g., temporally align their calls or engage in turn-taking) rather than merely calling at the same time. Since human music and singing often involve such coordinated joint vocal displays, investigating the evolutionary basis of this behavior, especially in our closest ape relatives, can inform our understanding of its evolution in humans (Merker 1999).

Joint calling has been especially well researched in wild chimpanzees, with one call type, the pant hoot, receiving particularly close attention. Pant hoots play an important role in coordinating movements of individuals in a society with a high degree of fission-fusion dynamics (Mitani and Nishida 1993). However, pant hoots are also commonly produced in choruses with other individuals, with males in some populations calling more often jointly with others than alone (Fedurek et al. 2013; Pougnault et al. 2021). Pant hooting seems to be a coordinated vocal event, and there is evidence that joint pant hooting plays an important role in cultivating social bonds, especially between males, who are more likely to chorus with bonded individuals (Fedurek et al. 2013; Mitani and Gros-Louis 1998). Pant hoots are sometimes, but not always, steadily isochronous (van der Vleuten et al. 2024) and tend to accelerate in tempo in the build-up relative to the introduction phase (Fedurek et al. 2013). While isochrony is sometimes hailed as a prerequisite for fine-scale temporal synchrony, predictable acceleration may also enable predictable interbeat intervals. Although bonobos also produce a homologous call, the high-hoot (de Waal 1988), significantly less work has focused on bonobo acoustic communication, including joint calling. Nevertheless, one study conducted in captivity found suggestive evidence for preferential joint calling with close social partners (Levréro et al. 2019). Interestingly, where joint sound production in chimpanzees is predominantly characterized by overlapping calls of several individuals (Pougnault et al. 2021), joint calling in bonobos is more typically characterized by dyadic overlap avoidance (Levréro et al. 2019) and even turn-taking (Cornec et al. 2022; Schamberg et al. 2016). The similar social effects and presence of coordination in both species, but diverging acoustic structures, may be consistent with MSB and with recent discussion differentiating dyadic from group bonding strategies in bonobos and chimpanzees (Brooks and Yamamoto 2022).
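
The distinction between isochrony and predictable acceleration can be made concrete with a small numerical sketch. The Python example below is illustrative only. The function names, the onset times, and the choice of measures (the coefficient of variation of inter-onset intervals, and the ratio of each interval to the sum of itself and the next) are assumptions made for this illustration, not the methods of the studies cited above.

```python
from statistics import mean, stdev


def inter_onset_intervals(onsets):
    """Gaps in seconds between consecutive call-element onsets."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def interval_cv(intervals):
    """Coefficient of variation of the intervals; 0 means perfectly isochronous."""
    return stdev(intervals) / mean(intervals)


def adjacent_ratios(intervals):
    """Each interval divided by the sum of itself and the next interval.

    0.5 means two neighboring intervals are equal; values above 0.5 mean
    the next interval is shorter (the sequence is accelerating).
    """
    return [a / (a + b) for a, b in zip(intervals, intervals[1:])]


# Hypothetical onset times in seconds, invented for illustration only.
isochronous_onsets = [0.0, 0.5, 1.0, 1.5, 2.0]

# Hypothetical accelerating sequence: each interval is 80% of the previous one.
accelerating_intervals = [0.8 * 0.8 ** k for k in range(5)]
accelerating_onsets = [0.0]
for gap in accelerating_intervals:
    accelerating_onsets.append(accelerating_onsets[-1] + gap)

for label, onsets in [("isochronous", isochronous_onsets),
                      ("accelerating", accelerating_onsets)]:
    gaps = inter_onset_intervals(onsets)
    print(label,
          "CV =", round(interval_cv(gaps), 3),
          "ratios =", [round(r, 3) for r in adjacent_ratios(gaps)])
```

In this toy example, the steadily accelerating sequence has nonzero interval variability but a constant adjacent-interval ratio, which is one way a non-isochronous call could still offer predictable interbeat intervals to a chorusing partner.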

At the same time, however, pant hoot chorusing among chimpanzees may facilitate coalitions between males, regardless of the level of social bonds between them. For example, one study showed that males were more likely to tolerate each other at feeding sites and support each other in agonistic interactions on days when they chorused together (Fedurek et al. 2013). On a short-term basis, chorusing was a better indicator of other affiliative behavior than even social grooming (Fedurek et al. 2013). There is additional evidence for call convergence and active accommodation in chimpanzee pant hoots. Calls produced together become more similar both acutely during choruses (Mitani and Gros-Louis 1998), and longitudinally as a function of time associating together (Mitani and Brandt 2010). This indicates a dynamic process of matching acoustic properties with coalition partners, perhaps more consistent with the predictions of MCS in invoking coalitionary aggression and honest signals of alliance strength through coordination. Relatedly, there is even evidence for between-group differences in the calls of both species (Crockford et al. 2004; Girard-Buttoz et al. 2022; Mitani et al. 1992, 1999; Schamberg et al. 2024). In wild bonobos, adults often produce long-distance high hoot choruses in a seemingly coordinated way (Hohmann and Fruth 1994); however, compared to chimpanzee pant hoot choruses, considerably less is known about their function, and there are no studies of acoustic and coalitional network similarity. Nevertheless, some differences may be anticipated. Most notably, although male chimpanzees show strong coalitionary alliances, reflected in their coordinated pant hoot calling, bonds among male bonobos are typically weak with an absence of strong coalitionary ties (Surbeck et al. 2012, 2017). Therefore, high hoots would be expected to show less acoustic coordination among male bonobos as compared to chimpanzees under both MSB and MCS. Female bonobos do, by contrast, show stronger coalitionary relationships (Furuichi 2011), although it remains unclear whether these may be signaled through acoustic displays such as hoot chorusing.

Besides the pant hoot, joint calling can also be observed in other contexts such as hunting, social grooming, and alarm calling. For example, in wild chimpanzees, joint calling can precipitate engagement in group hunts, where the bark vocalization serves as a reliable indicator of hunting motivation and promotes recruitment of others to the hunt, and where joint calling is associated with increased hunt efficiency (Mine et al. 2022). In at least this context, therefore, joint calling is a credible signal of both cooperative intent and coordinative ability, consistent with MCS, but time-series data are needed to evaluate the presence of active coordination of barks. While wild bonobos engage in hunting, it is mostly opportunistic and individual-based, with little evidence of group coordination (Ihobe 1992; Surbeck and Hohmann 2008). Both species also regularly produce sounds (e.g., lip-smacks, sputters, teeth chomps) during grooming bouts, which are associated with longer and reciprocated grooming (Fedurek et al. 2015) as well as grooming with higher-ranking and closer coalition partners (Watts 2016). These sounds are typically produced only by individuals actively grooming (and not those simply receiving grooming) (Fedurek et al. 2015), and while no coordination of these vocalizations has been quantitatively assessed, bouts with multiple simultaneous groomers are more common in chimpanzees than bonobos (both mutual and polyadic grooming) (Allanic et al. 2020; Girard-Buttoz et al. 2020a; Sakamaki 2013; Surbeck and Hohmann 2015). Individually, lip-smacks are the only chimpanzee vocalization so far found to contain speech-like rhythmic properties (Pereira et al. 2020), suggesting this may be a worthwhile direction of study. MSB, but less so MCS, would predict the social bonding context to be particularly salient for collective acoustic behavior. Finally, alarm call vocalizations are common to both species and can often overlap. Of relevance, in a field experiment, chimpanzees’ alert calling in a group context was more effective than bonobos’ at alerting groupmates to a potential threat (a model viper), presumably involving some degree of coordination (Girard-Buttoz et al. 2020b).

Evidence from both species, but primarily chimpanzees, therefore provides some empirical support for both MSB and MCS. Across studies, these findings point towards continuity between humans and other apes in the form and function of collective acoustic production.

2.2 Joint Calling to Others: Audience Sensitivity

Here, we discuss two examples of processes in ape vocal behavior which have importance for both MSB and MCS: sensitivity to audience presence and sensitivity to the audience's response.

Audience effects (sensitivity to the presence and type of audience when signaling) are common processes in many animal species and are informative about both signal function and the mechanisms involved in its production (Zuberbühler 2008). Audience effects play a role in joint signaling, as again demonstrated by research on long-distance chimpanzee pant hooting and bonobo high-hooting. A recent study, for example, showed that male chimpanzees were more likely to produce pant hoots during displays when other males, especially dominant males, were nearby (Soldati et al. 2022). Male pant hoot rates may also increase in the presence of an oestrous female, possibly because the need for signaling social status and social bonds between males is higher during times of elevated male-male competition (Fedurek et al. 2014; Kalan and Boesch 2018; but see Eleuteri et al. 2022; Mitani and Nishida 1993). Another study found that chimpanzees pant hooted more frequently when alliance partners were nearby, but importantly found no effect of preferred grooming partners (Mitani and Nishida 1993), more consistent with MCS than MSB. Among wild bonobos, there is evidence that individuals can signal to other audience members their intent to either join or be joined by other members in different parties in coordinated vocal exchanges (Schamberg et al. 2016, 2017), but the role of diverse forms of social relations in their coordination has not yet been directly investigated.

Adjusting signal production in accordance with the response of the audience is another way of facilitating coordinated signaling. Acoustic analyses have revealed that the structure of pant hoots actually promotes chorusing. In chimpanzees, the acoustic structure of pant hoots has been shown to be subject to modification depending on whether other individuals participate in joint calling. Male chimpanzees adjust the duration of the build-up and climax phases of the pant hoot depending on the audience response: if the partner joined the initiator's call late, at the build-up rather than the introduction stage, the initiator prolonged the build-up as if to let the partner join the display, before progressing to the climax (Fedurek et al. 2013). Importantly, there was a positive correlation between the number of climax elements given by two chorusing males, suggesting that males actually coordinate their calls when pant hooting together (Fedurek et al. 2013). Chimpanzees also prolong the climax phase while calling in groups, again to facilitate chorusing (Fedurek et al. 2013). Similar coordinated calling has been proposed for bonobo high hoots, with individuals apparently temporally synchronizing their calls when chorusing with others (de Waal 1988; Hohmann and Fruth 1994), but direct evidence is limited. Unlike chimpanzee pant hoots, which are multi-componential and contain longer/slower call elements that can promote joint calling, bonobo high hoots are short in duration and produced in fairly rapid repetitive phases (Clay and Zuberbühler 2014). Moreover, although wild bonobos have been shown to combine high hoots with a longer ‘whistle’ prefix call, this combination appears to instead signal the caller's intention to join another party, rather than be used to promote joint calling (Schamberg et al. 2017). In this sense, although it remains to be tested, the acoustic structure and call delivery of bonobo high-hoots may make them less amenable to joint calling, especially at the group level.

Joint calls in Pan, therefore, may overlap in context and effects with some human musical behavior as predicted by MSB and MCS. Individuals are actively modifying and coordinating their calls with one another, and this calling and coordination depend on both other producers and the receivers.

2.3 Percussive Behavior

Fitch (2006) has suggested that drumming in chimpanzees and chest beating in gorillas could be homologous to human music. Bimanual percussion is rare across animals and among primates only found in the African great apes (Fitch 2006). Looking at the forms and functions of this percussive behavior in the Pan apes, especially in collective contexts, is thus one way to investigate the origins of our musical behaviors.

In chimpanzees, drumming can accompany vocal signals such as long-distance calls (e.g., climax of a pant hoot), and can sometimes even replace part of the vocal phrase (Arcadi 1996; Babiszewska et al. 2015; Boesch 1991). Drumming in the wild shows rhythmicity and some between-subspecies variation (Eleuteri et al. 2025), though the importance of such differences is not yet well understood. Tree-buttress drumming with hands and feet is most commonly observed, but stone-assisted drumming has also been reported, and may serve a distinct behavioral function (van Loon et al. 2025). Drumming can be seen in a diversity of contexts, the most frequent of which in chimpanzees are during dominance displays and during travel (Clark Arcadi et al. 1998, 2004; Goodall 1986). When a male starts drumming, others often respond with their own drumming and/or vocalizations, which can generate, or coincide with, high levels of tension at the collective level (Arcadi et al. 1998; Ghiglieri 1984; Goodall 1986; Nishida et al. 1999; Reynolds and Reynolds 1965). In bonobos, drumming has also been reported but appears to be much rarer, with no focused studies on the phenomenon. Some authors report that buttress drumming is generally performed by males and that it is often associated with long-distance vocalizations, as in chimpanzees (Hohmann and Fruth 2003), and also that it is often seen just after a collective departure (Kano 1998). However, more research is needed to assess how, why, and when bonobos drum.

Long-distance communication is one likely function for chimpanzee drumming. For example, drumming can inform the various parties about the direction one has taken, improving group cohesion and facilitating regrouping at the next resting site (Boesch and Boesch-Achermann 2000). To this end, Babiszewska et al. (2015) found drumming occurred most frequently during travel, Fitzgerald et al. (2022) and Wilhelm et al. (2025) found a preference for thin and wide buttresses for drumming, and Eleuteri et al. (2022) found individual distinctiveness in the drumming bouts produced during travel but not in those produced during dominance displays.

Not all drumming behavior, however, may function only for long-distance communication. Chimpanzees often drum when they regroup after traveling or on feeding sites (Goodall 1986), also called booming, carnival displays, or even celebrations in the food context (Ghiglieri 1984; Reynolds and Reynolds 1965; Sugiyama 1969), which may serve an affiliative function, proximately linked to high collective arousal in a positive social context. Kalan et al. (2019) report on chimpanzees’ preference for resonant timbre in accumulative stone throwing. Matsusaka (2012) reports on a young wild chimpanzee who repeatedly hit a clay pot with facial expressions indicating he enjoyed the noise or the drumming, while Nishida et al. (2009) describe the use of a metal wall for loud drumming only by older individuals as well as a novel “belly slap” percussive behavior in play contexts. In captivity, one chimpanzee was recorded drumming alone for several minutes including with isochronous rhythms (Dufour et al. 2015). Another chimpanzee on multiple occasions produced a multi-material “instrument” for his drumming displays, and even repaired it when necessary (Watson et al. 2022). The lack of obvious communicative function for these drumming events may suggest that chimpanzee drumming cannot be attributed to a single function.

There is some evidence for both between-community (Clark Arcadi et al. 1998, 2004) and between-individual (Arcadi and Wallauer 2013; Babiszewska et al. 2015; Eleuteri et al. 2022) differences in drumming behavior. Though Arcadi and Wallauer (2013) attribute much of the between-individual rhythmic variation to differences in gait, Eleuteri et al. (2022) find that individual drumming distinctiveness depends on context. In bonobos, too, there are between-community differences, where buttress drumming may be performed mainly by hands in the Lomako population but by feet at Wamba (Ingmanson 1996). However, the existence of between-individual variation in drumming has not been quantified in bonobos. Whether between-community differences are socially learned or a by-product of differences in the environment remains to be further explored in both species.

Thus, while drumming in the wild can be involved in dominance competitions and territorial defense, it is also frequently associated with social gathering, collective arousal, play, and other non-aggressive contexts, consistent with MCS and MSB, respectively. Its occurrence is not indicative of a singular communicative function, but rather represents a broad category of acoustic behavior in apes that can be social or non-social. Careful attention to differentiating the underlying causes of drumming across contexts will, therefore, be necessary to evaluate its relevance to the evolution of human musicality. Its rarity across species, yet prevalence across diverse contexts in chimpanzees in both the wild and captivity, is suggestive of a multi-faceted evolutionary history of percussive acoustic production in great apes.

2.4 Intergroup Encounters

One central point in MCS is that music evolved as a credible signal of coalition strength, size, and cooperation abilities, in particular to out-groups. Chimpanzees are territorial and hostile to neighbors, and between-group killings are a pervasive feature of chimpanzee societies (Wilson and Wrangham 2003). In contrast, bonobos are not territorial, aggression between communities is generally mild (Moscovice et al. 2022), no between-group killings have ever been reported, and bonobos often affiliate and cooperate with out-group members (Fruth and Hohmann 2018; Furuichi 2011; Tokuyama et al. 2019). The two recently diverged sister species in otherwise largely similar ecological niches can, therefore, be used to test the role of collective acoustics in primate intergroup competition. If credible signaling of group strength drives the expression of the components of music (MCS), such components would be expected to be expressed during inter-group encounters in chimpanzees more than bonobos.

In chimpanzees, most inter-group encounters are characterized by “auditory combats” (Wittig et al. 2016). Such “combats” are observed across several populations and start with members of one community engaging in chorusing of pant-hoots, and in particular the “roar” pant-hoot variant (Crockford 2019; Goodall 1986), as well as other calls such as screams and barks often combined with joint drumming on buttresses (Boesch and Boesch-Achermann 2000; Goodall 1986; Nakamura et al. 2015; Wittig et al. 2016). The other community replies by counter-calling (Boesch and Boesch-Achermann 2000; Goodall 1986; Nakamura et al. 2015; Watts and Mitani 2001; Wilson et al. 2001) using the same combination of duet calling and drumming. Chimpanzees also use other means to produce sounds and possibly impress the opponents, such as dragging and shaking branches (Goodall 1986). In most cases, the smallest sub-group (or party) and/or the one with the fewest males retreats (Goodall 1986), and it has been argued that duet calling provides an honest signal of group size (Wilson et al. 2007). As such, chorusing but also coordinated drumming may serve as a credible signal of group strength. Such a signal allows gauging the power of the enemy (Boesch and Boesch-Achermann 2000) and impacts the decision of chimpanzees to engage in a conflict (Herbinger et al. 2009; Lemoine et al. 2022; Wilson et al. 2007), thereby lowering the risk of deadly outcomes. Indeed, in playback experiments (Wilson et al. 2001), wild chimpanzee parties with three or more males typically responded to outgroup pant-hoots with loud, behaviorally synchronous chorusing and approached the speaker, while parties with fewer than three adult males typically stayed silent and moved slowly away from it. In captivity, too, hearing outgroup pant hoots was associated with both increased vigilance and increased affiliative social behavior with ingroup members (Brooks et al. 2021).

In bonobos, auditory exchanges during inter-group encounters also occur (Furuichi 2019; Itani 1990), though they are described in less detail. For instance, it remains unclear whether bonobos of one community coordinate calling during such exchanges. Bonobos drum on occasion during inter-group encounters (Moscovice et al. 2022), but unlike in chimpanzees, drumming is not behaviorally synchronous between individuals. Whether this results from the lack of opportunities related to the size of trees in the bonobo habitat remains to be established. Bonobos also produce sounds by dragging, shaking or dinning branches during inter-group encounters (Hohmann and Fruth 2003), especially the males (Furuichi 2019), but it remains to be established whether this constitutes an attempt by males to signal to females their intent to travel away from the encounter or whether these sounds are used to impress the opponent (Furuichi 2019). Altogether, evidence so far supports the overall idea that joint calling and drumming are more important during inter-group encounters in chimpanzees than in bonobos, supporting MCS.

3 Proximate Underpinnings

3.1 Rhythm and Entrainment

Central to both MSB and MCS is rhythm. According to MSB, the main advantage of maintaining a regular rhythm is that it allows others to join in and synchronize, while according to MCS it is to signal coordinative ability and motivation. If collective acoustic behavior in Pan is homologous to elements of human musicality, both theories would predict pronounced rhythmic ability and the occurrence of rhythm in social contexts in chimpanzees and bonobos.

Given that chimpanzees and bonobos are drummers in the wild, we may expect that they should be skilled at producing rhythmic drumming bouts. In the wild, percussive events are short (generally shorter than 12 s; Fitch 2006), which has long precluded many detailed analyses of rhythmicity, but recent work using a large cross-site dataset was able to demonstrate species-level non-random drumming patterns and subspecies-level variation in some facets of rhythmicity (such as isochrony) (Eleuteri et al. 2025). In captivity, the drumming sequences of a male chimpanzee named Barney showed several regular rhythms, including a binary rhythm (enabled by the bimanual form) and an isochronous rhythm that lasted for more than 30 s (155 beats) (Dufour et al. 2015). A follow-up study of Barney's captive colony further found that several other individuals also drummed rhythmically (on buckets, various other objects, or against walls), frequently and for long durations (Dufour et al. 2017). More recently, van der Vleuten et al. (2024) analyzed the rhythmicity of vocal and motoric behavior sequences in two groups of captive chimpanzees and found considerable isochrony. Despite individual variation in tempo, display sequences across both groups contained regular, isochronous rhythms, suggesting some abilities underlying rhythmic production likely are conserved from our common ancestor (van der Vleuten et al. 2024). Evidence, therefore, suggests that chimpanzees are able to produce regular beats for more than a few seconds, though their ability to flexibly modulate their drumming remains unknown. In bonobos, Kanzi, a captive language-trained individual, was also reported to perform rhythmic drumming (Kugler and Savage Rumbaugh 2002), suggesting the ability may be found across Pan. However, there is unfortunately no published data detailing these events. Replication of the original study and/or additional work on bonobo rhythmicity are, therefore, crucially necessary. 
Overall, these studies support the notion that the Pan apes can generate and express internal rhythms, as would be predicted by homology to human music.

The ability to match behavior in time to an external stimulus is called entrainment, which is thought to be relatively widespread in the animal kingdom and central to human musicality (see Ravignani 2014 for a review). Despite its importance, there is little direct study of entrainment in bonobos or chimpanzees. In the wild, a form of action entrainment by motor mimicry of pounding gestures has been reported in young chimpanzees watching others cracking nuts (Fuhrmann et al. 2014). Relatedly, chimpanzees also produce “carnival displays” in which several individuals vocalize and produce repetitive movements together in a tense but non-aggressive way (Ghiglieri 1984; Merker 1999). Chimpanzees have also been found to temporally synchronize their walking rhythms when walking together compared to independently (Schweinfurth et al. 2022), and temporal synchrony akin to dancing has been reported between two individuals even without explicit auditory cues (Lameira et al. 2019). Further, percussive drumming by one individual can also trigger drumming in another, or intense emotional responses such as repeated swaying (Dufour et al. 2015; Goodall 1986). Still, none of these behaviors yet provides convincing evidence for true entrainment.

From experimental studies, we do know that one chimpanzee, Ai, could alternate pressing between two keys on a piano in time to a 600 ms inter-beat interval auditory stimulus even without previous training (Hattori et al. 2013). Additionally, auditory rhythms influenced chimpanzees’ tapping onset in a test of distractor effects (Hattori et al. 2015). However, in both these studies, this effect occurred only for tempos that were close to the participants’ spontaneous tapping tempos, limiting conclusions about any true entrainment. Hattori and Tomonaga (2020) additionally showed that rhythmic swaying could be induced by diffusing sound stimuli (rhythmical or not) to chimpanzees. The rhythmicity in one male's movements was positively correlated with the rhythm of the sound, but only while standing upright. A bonobo was also found to spontaneously match its own drumming tempo to the one of a human drummer, consistent with entrainment effects, but this occurred for less than a few seconds at a time (Large and Gray 2015). There is, therefore, not yet clear evidence for entrainment, but these studies do emphasize flexibility in coordinating interindividual rhythmic movements and suggest the crucial need for controlled and direct studies.

Thus, it appears rhythm production and beat keeping may occur in both bonobos and chimpanzees, highlighting the importance of both species to evolutionary considerations into the origins of music. That said, evidence for entrainment via coordination to others’ rhythms remains equivocal at best, and may point towards considerable individual variation. Future work is necessary to clearly define their extent and functional importance, but the presence and natural occurrence of certain rhythmic abilities in our ape relatives emphasize their relevance to music evolution and the potential for homology in the mechanisms underlying collective acoustics among human and non-human great apes.

3.2 Behavioral Coordination

Behavioral coordination is a key element for any major theory of music evolution. In MSB, coordination mediates the connection between music and social bonds, where music is proposed to enhance coordination of actions, synchrony, and turn-taking among individuals. In MCS, the costs associated with acoustic coordination are what justify its position and efficacy as a credible signal. We, therefore, here focus on mechanisms of behavioral synchrony more generally, describe examples of the kinds of complex coordination observed in wild chimpanzees and bonobos, and highlight the role of collective acoustic behavior in promoting their success.

On a proximate level, to recognize behavioral coordination, individuals must first be able to perceive the matching of their own movements with the perceived actions of others. This ability also underlies mirror self-recognition, which has been documented to some extent in great apes, including chimpanzees and bonobos (Anderson 1984; Anderson and Gallup 2015). Experiments involving chimpanzees have further demonstrated that self-recognition can occur even with a delay of up to 4 s between their actual movements and the movements reflected on a monitor displaying their own images (Hirata et al. 2017). Apes have additionally shown imitation recognition with experimenters (Haun and Call 2008; Nielsen et al. 2005), zoo visitors (Persson et al. 2018; though see Motes-Rodrigo et al. 2021), and robots (Davila-Ross et al. 2014). In other words, it is clear that individuals can recognize coordination between their own movements and the movements of “others,” even with a certain degree of extension across time and space.

Then, does actual behavioral coordination occur among individuals, and what are the consequences? Yu and Tomonaga (2015) found that when two chimpanzees were tasked with alternately tapping two buttons, the timing of tapping spontaneously synchronized between the two individuals. This synchronization, interestingly, seemed to be induced more strongly by auditory cues than visual cues (Yu and Tomonaga 2018). In experiments involving pairs of captive chimpanzees in which participants were given a shared version of a computer-based serial ordering task, it was shown that two individuals could jointly perform tasks by alternating roles (Martin et al. 2017). Observationally, walking chimpanzees have been found to coordinate the timing of their steps (Schweinfurth et al. 2022). Grooming is one regular context of fine-scale temporal and behavioral coordination that can be observed among wild adults in both species. Interestingly, while chimpanzees exhibit frequent polyadic grooming interactions, requiring simultaneous coordination between several individuals and suggested to support polyadic affiliation in support of intergroup conflict, grooming in bonobos tends to be between just two individuals (Girard-Buttoz et al. 2020a; Sakamaki 2013), perhaps parallel to differences in joint calling reported above. In both controlled tasks and observation, therefore, apes do appear to engage in spontaneous temporal coordination, with some hints at a species difference in at least some contexts.

The consequences of synchrony and coordination on the psychology of individuals, however, still remain relatively unexplored. Crucial to MSB is that individuals should become more closely affiliated after engaging in coordinated behavior. Somewhat suggestively, captive chimpanzees have been reported to show enjoyment and playfulness as a consequence of imitation (Sauciuc et al. 2017). Further, imitation recognition is correlated with socio-cognitive ability in chimpanzees (Pope et al. 2015) and chimpanzees approach human experimenters coordinating their gaze faster than behaviorally unmatched humans (Wolf and Tomasello 2019). Some evidence from phylogenetically more distant species (e.g., capuchin monkeys, Paukner et al. 2009, and dogs, Duranton et al. 2019) also suggests that the positive link between synchronization and social bonds may not be unique to humans and may instead be phylogenetically widespread. Evidence in great apes is limited, and explicit tests are much needed.

The available evidence is altogether consistent with the view that great apes possess the necessary precursors for both the downstream effects of collective acoustics posited in MSB as well as the upstream ability to coordinate behaviorally and vocally highlighted in MCS.

3.3 Emotional Content

According to MSB, the sharing or synchronization of emotions, moods, and behavioral states through collective acoustic processes can serve to establish or solidify social bonds. This entails some degree of emotional contagion, which refers to the matching or sharing of emotional states, triggered through observing the states of others (Chotard et al. 2018; Hatfield et al. 1993; Provine 1992). Emotional contagion is considered to form a core basis of many social interactions, including those of humans, where it can promote social affinity, information transfer, and even more advanced forms of prosocial emotional orientation, such as empathy (Preston and de Waal 2002). Ape vocalizations are considered to be tightly tied to emotional states, even if they may also provide referential information at the same time. According to MSB, emotion contagion should thus occur in contexts of great ape collective acoustics, while MCS does not make specific predictions for emotion sharing between producers.

Some evidence for emotional contagion in chimpanzee collective acoustics comes from Baker and Aureli (1996) and Videan et al. (2005), finding evidence for a neighbor effect in chimpanzees where overheard vocalizations promoted behavior associated with the vocalizations. Specifically, vocalizations produced in agonistic contexts promoted agonistic behaviors in the listeners, while overheard affiliative vocalizations promoted affiliative behaviors (Videan et al. 2005). The emotional content of the neighbor effects was thus not constrained to a particular context or vocalization, but the caller's state seems to have diffused to those who heard. Relatedly, Debracque et al. (2023) additionally find that even humans can recognize the emotional content of many primate vocalizations, with the highest performance with chimpanzee calls. This suggests deep-rooted emotional content embedded in these vocalizations. Interestingly, human rating performance was less accurate for detecting the emotional content of bonobo calls, which may be due to their higher acoustic pitch, making them appear less acoustically similar to the human vocal repertoire.

Food discovery represents one especially notable context that elicits rich collective acoustic displays and emotion sharing in both chimpanzees and bonobos. Both chimpanzees and bonobos produce diverse arrays of vocalizations during food discovery and consumption, including acoustically distinct food-associated calls combined with other calls such as greetings and long-distance vocalizations (Clay and Zuberbühler 2009). In both species, perhaps more consistent with MSB, the discovery of a desirable food patch typically results in a loud eruption of food-associated vocalizations across individuals, who may continue to call together while they are feeding, arriving into the feeding tree, and in some cases, engaging in food sharing. As noted, on the other side of the emotional spectrum, chimpanzee drumming can lead others to respond with their own drumming associated with high levels of collective tension (Arcadi 1996; Goodall 1986; Nishida et al. 1999).

Coordinated calling can be found across emotional contexts from socio-positive grooming, food patch discovery, and reunions, to highly tense alarm-calls, dominance displays, and intergroup encounters. Although more work is needed, engaging in collective acoustic displays during highly emotional events may represent an important evolutionary context we share with our ape relatives, upon which more advanced forms of Hominid musicality could have evolved.

3.4 The Possible Role of Oxytocin

Oxytocin, an ancestral neuropeptide involved in the reproductive function of most vertebrates (Jurek and Neumann 2018) and promoting social bonding and prosocial behaviors in social mammals (Crockford et al. 2014), has been hypothesized to play a crucial role in human musicality (Fukui and Toyoshima 2023; Harvey 2020). In humans, the brain regions associated with music perception and production and the neural activity associated with the oxytocinergic system largely overlap (Harvey 2020). Furthermore, oxytocin both promotes coordinated musical behaviors (e.g., temporally synchronized finger tapping; Gebauer et al. 2016) and is released following the expression of musical behaviors (e.g., group drumming; Yuhi et al. 2017, and choir singing; Kreutz 2014). To our knowledge, the link between the oxytocinergic system and components of music has not yet been investigated in chimpanzees and bonobos. However, bonding and prosocial behaviors (i.e., grooming, food sharing, and post-conflict affiliations in chimpanzees; Crockford et al. 2013; Preis et al. 2018; Wittig et al. 2014, and female-female sexual interactions in bonobos; Moscovice et al. 2019) are also associated with a rise in peripheral oxytocin levels in both species. In turn, intranasal administration of oxytocin increases the propensity to groom others in at least some bonobos (Brooks et al. 2022). Furthermore, chimpanzee oxytocin levels reach exceptionally high levels during but also in anticipation of inter-group conflicts and border patrols (Samuni et al. 2017, 2019), during which group members need to coordinate their behaviors, and intranasal oxytocin relatedly promotes social attention to outgroup members (Brooks et al. 2022). Oxytocin may thus play a key role in mediating music-like behaviors, such as duet calling or joint drumming, which are thought to promote social bonding and require high coordination in coalitionary contexts (see Section 2.1).
Interestingly, the oxytocin system has even been highlighted in the social divergence of chimpanzees and bonobos. Behavioral evidence has found opposite effects of oxytocin on social attention to the eyes (enlarging known species differences; Brooks et al. 2021), while genetic evidence has found several species differences in the OXTR region across numerous investigations (e.g., Staes et al. 2014, 2019, 2021; Kovalaskas et al. 2020; Summers and Summers 2023). These studies together further emphasize the need for more research in the hormone's role in the context of collective acoustics, drumming, and music evolution.

3.5 Proximate Effects of Music on Ape Behavior

Finally, a few previous studies have investigated the direct effects of playing human music on chimpanzee behavior, which can be suggestive of whether homologous mechanisms underlie the perception of musical stimuli in both human and non-human apes. For example, some work has found that chimpanzees, like humans, not only discriminate between but also show some preference for specific types of music. Sugimoto et al. (2010) found that chimpanzees show a spontaneous preference for consonant over dissonant versions of music stimuli, as seen in humans in early stages of development. Mingle et al. (2014) investigated musical preferences of chimpanzees by exposing them to various types of world music and found a preference for West African akan and North Indian raga music. This suggests potential commonalities in acoustic preferences between humans and non-human primates, but detailed follow-up preference tests with finer-grain resolution have not been conducted.

In another line of studies, researchers have played back music to captive chimpanzees primarily as a means of environmental enrichment. Wallace et al. (2017) found mixed results in a systematic survey, finding decreased abnormal behavior, increased activity, and decreased social behavior, but no change in self-grooming or aggression, in music compared to silent conditions (using both classical and pop music as stimuli). Howell et al. (2003), by contrast, found that music has positive effects on ape behavior by reducing agitation and aggression, using a wide sample of musical stimuli. Videan et al. (2007) also examined the effects of different types and genres of music on captive chimpanzees and found that instrumental music increased their affiliative behavior, while vocal music (especially slower “easy-listening” music) reduced their agonistic behavior. These findings suggest that music can potentially serve as an environmental enrichment tool for apes and that overheard human-made musical stimuli affect free behavior. The choice of stimuli interacts in complex ways, as would be expected given the diversity of human music, but many of these effects are still to be uncovered in future work.

These findings generally align with the MSB hypothesis, perhaps more so than with MCS, under which increased affiliation and decreased agonism in music contexts do not have straightforward explanations. This suggests that music's role in fostering social bonds, possibly through stress reduction and behavioral synchronization, may be shared.

4 Discussion and Future Directions

The studies cited in the previous two sections provide compelling evidence that much of our own species’ musicality has precursors that can be found and studied in the behavioral repertoire of our closest living great ape relatives (see summary in Table 2). Chimpanzees and bonobos, therefore, provide a pair of species through which to test the major theories. Despite largely overlapping in life histories, ecology, and behavior, their apparently striking differences in social bond structure, patterns of coalitionary aggression, and collective acoustic behaviors allow for a clear context where hypotheses can be directly compared. More attention to these species may help shift the balance of evidence, if differing theories can more convincingly explain and make novel predictions for species-specific patterns of behavior. In this section, we, therefore, briefly present a selection of specific directions of future study, especially pertinent to distinguishing these theories for the evolutionary origins of human musicality.

Table 2. Summary of empirical highlights related to Pan collective acoustics and the conserved roots of human musicality. Note that this table is meant to refer readers to existing literature, and while many chimpanzee findings have not been found in bonobos this does not imply the absence of meaningful effects but rather the absence of direct study.
Domain | Takeaways | Key references
Joint calling | Joint calling is associated with social affiliation in both chimpanzees and bonobos, with support in agonistic interactions in chimpanzees, and with cooperative intent and recruitment in chimpanzees during hunting | Fedurek et al. (2013); Levréro et al. (2019); Mine et al. (2022); Mitani and Gros-Louis (1998)
Joint calling | Chimpanzees may tend towards call overlap, bonobos towards overlap avoidance | Cornec et al. (2022); Fedurek et al. (2013); Levréro et al. (2019); Pougnault et al. (2021); Schamberg et al. (2016)
Joint calling | Long-distance vocalizations are coordinated in the short term and become more similar over the long term for both species, with more evidence in chimpanzee pant hoots | de Waal (1988); Fedurek et al. (2013); Hohmann and Fruth (1994); Mitani and Brandt (2010); Mitani and Gros-Louis (1998); Schamberg et al. (2016, 2017)
Joint calling | Possible group-specific call and drum patterns | Arcadi (1996); Arcadi et al. (2004); Crockford et al. (2004); Girard-Buttoz et al. (2022); Mitani et al. (1992, 1999); Schamberg et al. (2024)
Audience effects | Chimpanzee pant hoots are produced most in competitive situations (around dominant males and estrus females), and more in the presence of coalition partners, but not grooming partners | Fedurek et al. (2014); Kalan and Boesch (2018); Mitani and Nishida (1993); Soldati et al. (2022)
Audience effects | Little empirical work on bonobos, but some evidence for coordinated calling in signaling interparty movement | Schamberg et al. (2016, 2017)
Percussive behavior (drumming) | Drumming is common in varied contexts in chimpanzees, can be integrated into pant hoots, and is often performed in social settings | Arcadi (1996); Arcadi et al. (1998, 2004); Babiszewska et al. (2015); Boesch (1991); Boesch and Boesch-Achermann (2000); Eleuteri et al. (2022, 2025); Fedurek et al. (2013); Goodall (1986); Nishida et al. (1999)
Percussive behavior (drumming) | Chimpanzees show preference for resonant sounds in their drumming | Fitzgerald et al. (2022); Kalan et al. (2019); Matsusaka (2012); Nishida et al. (2009); Wilhelm et al. (2025)
Percussive behavior (drumming) | Bonobos sometimes drum too, but likely to a much lesser extent than chimpanzees, though there are no focused studies | Hohmann and Fruth (2003); Ingmanson (1996); Kano (1998); Moscovice et al. (2022)
Intergroup encounters | Chimpanzees engage in “vocal combats” of chorusing and drumming. Bonobos have been reported to drum in intergroup encounters too, but rarely | Boesch and Boesch-Achermann (2000); Crockford (2019); Goodall (1986); Moscovice et al. (2022); Nakamura et al. (2015); Wittig et al. (2016)
Intergroup encounters | Chimpanzees evaluate size and strength of other groups based on their calling | Boesch and Boesch-Achermann (2000); Herbinger et al. (2009); Lemoine et al. (2022); Wilson et al. (2001, 2007)
Rhythm and entrainment | Chimpanzees can drum isochronously | Dufour et al. (2015, 2017); Eleuteri et al. (2025); van der Vleuten et al. (2024)
Rhythm and entrainment | Chimpanzees can temporally synchronize movement, with some evidence of action entrainment | Fuhrmann et al. (2014); Lameira et al. (2019); Schweinfurth et al. (2022)
Rhythm and entrainment | Auditory stimuli influence tapping and rhythmic swaying in chimpanzees, but evidence for true auditory entrainment is so far unclear | Hattori et al. (2013, 2015); Hattori and Tomonaga (2020)
Behavioral coordination | Chimpanzees can consistently recognize action coordination | Haun and Call (2008); Nielsen et al. (2005); Persson et al. (2018); Pope et al. (2015); Sauciuc et al. (2017)
Behavioral coordination | Chimpanzees can synchronize finger taps, affected by auditory cues | Yu and Tomonaga (2015, 2018)
Emotional content | In chimpanzees, hearing vocalizations associated with emotional behaviors makes the same behaviors more likely | Baker and Aureli (1996); Videan et al. (2005)
Emotional content | Humans recognize emotional content of primate vocalizations | Debracque et al. (2023)
Effects of music on ape behavior | Some null results and some evidence for reduced agonism/increased affiliation caused by music playback in chimpanzees | Howell et al. (2003); Mingle et al. (2014); Sugimoto et al. (2010); Videan et al. (2005); Wallace et al. (2017)

One of the major threads highlighted throughout the empirical review section is a need for greater attention to bonobos. This includes replication of nearly all the previous studies conducted with chimpanzees. Studies assessing patterns of call convergence as associated with grooming or coalitionary aggression partners (Mitani and Brandt 2010; Mitani and Gros-Louis 1998), correlations between rates of joint calling and other social variables in wild populations (Fedurek et al. 2013; Mitani and Nishida 1993), acoustic analysis of vocalization suitability for joint calling (Fedurek et al. 2013; Mitani and Gros-Louis 1998), measurement of spontaneous coordination in experimental and observational contexts (Schweinfurth et al. 2022; Yu and Tomonaga 2015), and observation of response to music playback (Howell et al. 2003; Mingle et al. 2014; Videan et al. 2007; Wallace et al. 2017) are a few particular avenues worth repeating. MCS would predict more frequent coordinated calling and drumming, greater rhythmic ability and sensitivity, and more music-like behavior in general in male chimpanzees than in bonobos, given the extreme species difference in coalitionary and between-group aggression. MSB, on the other hand, would predict more species similarity, and would be supported instead by evidence of a prosocial effect of collective acoustic behavior in both species. Although there is a lack of reports on bonobo collective acoustic behavior, the absence of evidence is not evidence of absence, and it would be premature to draw conclusions about species differences. For instance, there are no systematic reports on the percussive behavior of wild bonobos, but it is unclear if this represents a species difference, an ecological difference in buttress availability, or merely a sampling bias in the published literature. Repeating the analyses of published studies with data from bonobos is the most obvious, and possibly most telling, path forward.

Another central thread highlighted throughout the sections is the need for more detailed analysis of the time-series and rhythmic properties of ape collective acoustics. Behavioral synchrony alone is insufficient, and firm conclusions will depend on detailed analysis of finer temporal synchrony across behaviors and contexts. As mentioned briefly, the role of non-isochronous, but still highly temporally predictable, rhythms through steady or systematic acceleration may be overlooked in studies of animal musicality. The use of rhythmic categories in analysis of primate acoustics has proved a promising approach (De Gregorio et al. 2021). Even in humans, true isochrony can be challenging, and tempo drift is regularly observed (Collier and Collier 2007; Collier and Ogden 2001; Dotov et al. 2022), often tending towards anticipation of the next beat (negative mean asynchrony), especially among non-musicians (Repp 2005). Interestingly, Okano et al. (2017) find greater tempo acceleration in paired than solo conditions for all tempos tested. Rhythmically, chimpanzee pant hoots (and possibly other collective acoustic phenomena) in some settings may thus be better described as accelerando and tempo rubato than as strict isochrony. Notably, however, while accelerando tempos are inherently limited in duration, isochronous tempos can be sustained indefinitely, which may have been a functionally important development in early human collective acoustics.
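As a rough illustration of the distinction drawn above between isochronous and systematically accelerating rhythms, the following minimal Python sketch computes three simple descriptors from a list of event onset times: the coefficient of variation of inter-onset intervals, the mean ratio of adjacent intervals, and the linear trend in interval duration. The ratio used here (each interval divided by the sum of itself and the next) is our assumption about how a rhythmic-category analysis could be set up, and the onset times, function names, and choice of descriptors are hypothetical and not drawn from any of the cited studies.

```python
from statistics import mean, stdev


def inter_onset_intervals(onsets):
    """Successive differences between event onset times (in seconds)."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


def rhythm_ratios(iois):
    """Each interval divided by the sum of itself and the next interval.

    A value of 0.5 means two adjacent intervals are equal (local isochrony);
    values above 0.5 mean the second interval is shorter than the first.
    """
    return [a / (a + b) for a, b in zip(iois, iois[1:])]


def ioi_slope(iois):
    """Least-squares slope of interval duration against interval index.

    A negative slope means intervals shorten over time (acceleration).
    """
    n = len(iois)
    x_mean = (n - 1) / 2
    y_mean = mean(iois)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(iois))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def summarize(onsets):
    """Collect the three descriptors for one sequence of onsets."""
    iois = inter_onset_intervals(onsets)
    return {
        "cv": stdev(iois) / mean(iois),
        "mean_ratio": mean(rhythm_ratios(iois)),
        "slope": ioi_slope(iois),
    }


# Hypothetical onset times (seconds), invented purely for illustration.
isochronous = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
accelerating = [0.0, 0.6, 1.1, 1.5, 1.8, 2.0]

print(summarize(isochronous))
print(summarize(accelerating))
```

In this toy case the accelerating sequence is not isochronous, yet its intervals shorten by a constant amount, so it remains highly predictable. This is the property that a strict test for isochrony alone would miss.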

There are several more future directions for observation of naturalistic acoustic production in both bonobos and chimpanzees. For example, it will be important to validate whether reported differences in patterns of joint calling, characterized by overlap in chimpanzees and turn-taking in bonobos, are consistent across populations. This difference would be more consistent with MSB, especially considering recent suggestions of tendencies towards dyadic rather than scalable bonding strategies in bonobos compared to chimpanzees (Brooks and Yamamoto 2022; Girard-Buttoz et al. 2020a). Another important area of future study in both species is the direction of effect for many of the correlations found in previous studies. MSB specifically predicts music's evolutionary role in fostering and developing social bonds, and thus predicts joint calling to be more predictive of subsequent rather than prior social affiliation, while MCS predicts coordinated calling to be a consequence rather than a cause of coalitional support. Many of the reviewed papers are highly suggestive, but true support for either hypothesis will require evidence of directionality. Playback studies, analyses of fine-grained temporal patterns in wild populations, and experimental manipulations of synchrony followed by measures of affiliation are a handful of specific approaches that could provide such evidence. Finally, as mentioned above, more studies are needed on drumming across contexts. Studies have been gradually accumulating over the past decade, but there is still a serious deficit of knowledge about the diverse roles of percussive behavior.
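One simple way to begin probing the directionality discussed above is a cross-lagged comparison: does joint calling in one observation period relate more strongly to affiliation in the next period, or does affiliation relate more strongly to later joint calling? The Python sketch below is a minimal illustration under stated assumptions. The per-period counts are invented, the function names are hypothetical, and a cross-lagged correlation of this kind can only suggest temporal ordering, not causation. Experimental manipulation would still be needed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cross_lagged(calling, affiliation, lag=1):
    """Return two correlations for a dyad observed over successive periods.

    forward:  calling at period t versus affiliation at period t + lag
    backward: affiliation at period t versus calling at period t + lag
    """
    forward = pearson(calling[:-lag], affiliation[lag:])
    backward = pearson(affiliation[:-lag], calling[lag:])
    return forward, backward


# Hypothetical counts per observation period for one dyad. The affiliation
# series is constructed to follow calling with a one-period delay.
calling = [2, 5, 1, 4, 3, 6, 2, 5]
affiliation = [3, 2, 5, 1, 4, 3, 6, 2]

forward, backward = cross_lagged(calling, affiliation)
print(f"calling -> later affiliation: {forward:.2f}")
print(f"affiliation -> later calling: {backward:.2f}")
```

Under the reasoning in the text, a consistently stronger forward than backward association would be more in line with MSB, and the reverse pattern more in line with MCS, although real data would call for models that handle repeated measures and confounds.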

More proximately, there is a clear need for work in both species on entrainment, oxytocin, and sensitivity to sound stimuli. There is not yet solid evidence for entrainment, but there is also a lack of reporting on any clear and explicit tests for its absence. Oxytocinergic involvement in collective acoustics and drumming would be consistent with both hypotheses, but especially aligned with the predictions of MSB. Future studies testing whether oxytocin increases the propensity of chimpanzees and bonobos to engage in acoustic communication, and in particular in coordinated vocal or sound production, as well as measuring oxytocin levels following the expression of such coordinated acoustic productions, will be key in retracing the physiological system that possibly promoted the emergence and supported the maintenance of music in our lineage (Harvey 2020). Further, playback experiments should target both naturalistic sounds and experimentally controlled stimuli with certain musical properties. With naturalistic stimuli, MSB predicts high sensitivity to the context and emotional content of calls, while MCS predicts instead higher sensitivity to the degree of coordination involved in their production. With more controlled stimuli, simple tests of discrimination between solo and jointly produced beats, and between rhythmic and arrhythmic beats, may prove insightful. If apes can consistently discriminate between rhythms, it would suggest they may be able to form associations between rhythms and emotional contexts, individuals, or groups from low-level processes. Association between rhythms and emotional content would support MSB, while higher discrimination in the underlying coordination, but not absolute rhythm, would support MCS. Creation of species-relevant music following Snowdon et al. (2015) and Snowdon and Teie (2010) may be especially valuable, where spontaneous social behavioral effects of these stimuli will help contextualize the existing studies of music playback. MCS would not obviously predict a behavioral effect of species-relevant musical stimuli, beyond perhaps vigilance to potential rivals, while MSB would predict a wide emotional response depending on the content and a general increase in social affiliation.

Finally, moving back in the evolutionary tree, there is significant need for greater attention to the other great ape species (gorillas and orangutans) as well as for placing other primate collective acoustic-related behaviors (such as the duetting of some primates; Gamba et al. 2016; Geissmann 2002; and the percussive tool use and stone handling of capuchins and macaques; Mangalam et al. 2016; Nahallage and Huffman 2008) into a cohesive evolutionary perspective. While collective acoustic behavior in gorillas and orangutans has received less attention than in Pan, it is noteworthy that bimanual percussive behavior seems to be absent in orangutans, who are not group-living, but present in gorillas, chimpanzees, bonobos, and humans (Fitch 2006). Considerable variability in the intensity of intergroup competition across gorilla species (Forcina et al. 2019; Morrison et al. 2020) and diverse bonding structures across all great apes suggest that looking beyond Pan in homologous collective acoustic behavior may provide further evidence to distinguish between major hypotheses of music evolution. Beyond great apes, duetting primates (e.g., Gamba et al. 2016; Geissmann 2002; Müller and Anzenberger 2002) have attracted growing research as convergent with human singing, but future work could benefit from direct investigation into the conserved roots of these phenomena found in their non-duetting relatives, and which capacities may represent exaptations of relatively widespread abilities to the species’ distinctive socioecologies. Joint calling is widespread, but few primate species have been found to actively coordinate their vocalizations, suggesting future research directions may benefit from examining consistent social differences associated with the rates of, contexts for, and degree of coordination in joint calling across species. Further, the percussive behavior found in other primate species, such as the percussive tool use of capuchin monkeys (Mangalam et al. 2016) and the stone handling of macaques (Nahallage and Huffman 2008), may represent a salient source of information for investigation into the origins and distribution of drumming and percussive behavior more broadly.

These future directions are far from comprehensive, but suggest a set of studies and predictions that can be used to evaluate hypotheses about music's evolutionary role in social bonding, credible signaling, or perhaps both. We emphasize that the default assumption for elements of great ape musicality should not be complete absence, as has often been presumed, but rather agnosticism until the lacking studies have been completed. The comparative method and a focus on homology are the focal points for the unique role that the Pan species can play in revealing our species’ musical origins.

5 Conclusion

Across the available literature for Pan apes, we highlight the considerable relevance of conserved collective acoustic behavior in chimpanzees and bonobos to evaluating theories of human musicality. We aim to chart a path forward in identifying specific pieces of evidence that would provide critical support or opposition, distinguishing hypotheses and revealing the specific features of human musicality that are conserved and those that are derived and unique to our own species. Attention to bonobos is especially needed, where differences in coalitionary aggression alongside similarity in social affiliation compared to chimpanzees can lead to specific and falsifiable predictions. On the whole, significantly greater attention is needed to homology, not only analogy, in our own species’ musical evolution. The collective acoustic behavior of the two Pan species can shape our understanding of the evolution of musicality, if we care to listen.

Author Contributions

James Brooks: conceptualization (equal), investigation (equal), writing – original draft (equal), writing – review and editing (equal). Zanna Clay: conceptualization (equal), investigation (equal), writing – original draft (equal), writing – review and editing (equal). Valérie Dufour: conceptualization (equal), investigation (equal), writing – original draft (equal), writing – review and editing (equal). Pawel Fedurek: conceptualization (equal), investigation (equal), writing – original draft (equal), writing – review and editing (equal). Cédric Girard-Buttoz: conceptualization (equal), investigation (equal), writing – original draft (equal), writing – review and editing (equal). Shinya Yamamoto: conceptualization (equal), investigation (equal), writing – original draft (equal), writing – review and editing (equal).

Ethics Statement

The authors have nothing to report.

Conflicts of Interest

The authors declare no conflicts of interest.

Data Availability Statement

Data sharing is not applicable to this article, as no new data were created or analyzed in this study.
