Mirror neurons in the tree of life: mosaic evolution, plasticity and exaptation of sensorimotor matching responses
ABSTRACT
Considering the properties of mirror neurons (MNs) in terms of development and phylogeny, we offer a novel, unifying, and testable account of their evolution according to the available data and try to reconcile apparently discordant research, including the plasticity of MNs during development, their adaptive value and their phylogenetic relationships and continuity. We hypothesize that the MN system reflects a set of interrelated traits, each with an independent natural history due to unique selective pressures, and propose that there are at least three evolutionarily significant trends that gave rise to three subtypes: hand visuomotor, mouth visuomotor, and audio–vocal. Specifically, we put forward a mosaic evolution hypothesis, which posits that different types of MNs may have evolved at different rates within and among species. This evolutionary hypothesis represents an alternative to both adaptationist and associative models. Finally, the review offers a strong heuristic potential in predicting the circumstances under which specific variations and properties of MNs are expected. Such predictive value is critical to test new hypotheses about MN activity and its plastic changes, depending on the species, the neuroanatomical substrates, and the ecological niche.
I. INTRODUCTION
Among sensorimotor neurons, a subclass of neurons fires both when an individual performs an action and when it observes the same or a similar action performed by another. These neurons, called mirror neurons (MNs), were first described in the ventral premotor cortex and inferior parietal lobule of the macaque monkey (Di Pellegrino et al., 1992; Gallese & Goldman, 1998; Rizzolatti & Craighero, 2004). MNs have attracted great interest (Heyes, 2010): the matching between perception and action at the level of single neurons was relevant to several fields of research and MNs have been proposed to play a key role in social cognition (Gallese & Goldman, 1998; Rizzolatti & Craighero, 2004).
Research on MNs has paved the way for the formulation of various hypotheses based on interpretations of their (i) possible function, (ii) mechanism, (iii) ontogenetic development and (iv) evolutionary history (Rizzolatti & Arbib, 1998; Rizzolatti & Craighero, 2004; Bonini & Ferrari, 2011; Cook et al., 2014). This review, however, is concerned with a comparative analysis of MNs that considers properties and factors associated with MN development in different species. Thus, we will mostly consider the development and phylogeny of MNs, although some functions and mechanisms will be covered as well. Indeed, although development, evolution, function and mechanism refer to different levels of explanation (Tinbergen, 1963), they are not completely independent from each other and should be analysed together in order to understand fully the evolution of MNs as a functional trait.
Some scholars propose that macaque and human MNs are a necessary phylogenetic stage within the evolutionary path leading to the emergence of high-level cognitive functions, such as action-understanding, imitation, mind-reading and language (Gallese & Goldman, 1998; Rizzolatti & Arbib, 1998; Gentilucci & Corballis, 2006). These classic views mostly focus on the functional role of MNs during phylogeny but neglect the developmental processes contributing to the construction of this role during ontogeny (Keysers & Gazzola, 2014).
Other models address the question of the ontogenetic origin of MNs. According to an associative model (Cook et al., 2014), MNs acquire their observation–execution matching properties during ontogeny through a domain-general process of sensorimotor associative learning. As a by-product of motor learning, MNs may still play a functional role, but ‘do not necessarily have a specific evolutionary purpose or adaptive function’ (Cook et al., 2014, p. 1; see also Catmur, Walsh & Heyes, 2007; Heyes, 2010). Complementary Hebbian learning models (Oztop & Arbib, 2002; Keysers & Perrett, 2004; Bonaiuto & Arbib, 2010; Keysers & Gazzola, 2014) focus more on the spike-timing-dependent plasticity that occurs at the synapses and the anatomical details of the connections giving rise to MN systems. They propose that Hebbian learning (i.e. increased synaptic efficiency through concurrent neural firing) plays a major role in wiring together sensory and motor areas of the brain and subsequently generalizing them to actions performed by others.
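The Hebbian account above can be illustrated with a toy simulation. The sketch below is not taken from any of the cited models: the single visual unit, the single motor unit, the learning rate, the probability of self-observation and the firing threshold are all hypothetical values chosen only to show how a visual-to-motor connection could strengthen when an agent repeatedly sees its own action while executing it.

```python
import random


def hebbian_train(n_trials=200, lr=0.05, p_self_view=0.9, seed=0):
    """Strengthen a visual -> motor weight by co-activation (toy Hebbian rule).

    All parameter values are illustrative assumptions, not empirical estimates.
    """
    rng = random.Random(seed)
    w = 0.0  # hypothetical synaptic weight, initially silent
    for _ in range(n_trials):
        motor = 1.0  # the agent executes the action on every trial
        # The agent sees its own hand on most, but not all, trials
        visual = 1.0 if rng.random() < p_self_view else 0.0
        # Hebbian update with a soft bound that keeps w between 0 and 1
        w += lr * visual * motor * (1.0 - w)
    return w


def motor_response(w, visual, motor_command=0.0, threshold=0.5):
    """Return True if the motor unit fires given visual input and motor drive."""
    return w * visual + motor_command > threshold


w_trained = hebbian_train()
# Before learning, observation alone does not drive the motor unit
fires_naive = motor_response(0.0, visual=1.0)
# After learning, observation alone is sufficient (a "mirror-like" response)
fires_trained = motor_response(w_trained, visual=1.0)
```

In this sketch the same unit fires during execution (through `motor_command`) and, after training, during observation alone. Generalization to actions performed by others is simply assumed here, because the visual unit does not distinguish whose hand is seen.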
Several authors (Giudice, Manera & Keysers, 2009; Ferrari et al., 2013; Bonaiuto, 2014; Lotem & Kolodny, 2014; Oberman, Hubbard & McCleery, 2014; Orban, 2014) propose an integration of the developmental and evolutionary dynamics of MNs. Some focus attention on the role of canalization (Giudice et al., 2009; Ferrari et al., 2013) and developmental plasticity (Ferrari et al., 2013), and consider MNs as the result of both maturational processes of the brain, and epigenetic regulation of specific populations of motor neurons under the influence of sensorimotor experiences during ontogeny (Ferrari et al., 2013; Tramacere, Ferrari & Iriki, 2015). They thus propose that some of the environmental, social and molecular conditions which contribute to the development of MNs have been canalized or stabilized during phylogeny, promoting the adaptive ability to decode social information and facilitating social interactions from the first phases of development (Giudice et al., 2009; Ferrari et al., 2013; Tramacere et al., 2015).
While associative accounts (Heyes, 2010; Cook et al., 2014) deny the possibility that MNs had a specific functional role in phylogeny and that they could have emerged through an evolutionary process as specific adaptations, others argue that the phylogeny of MNs represents a stage in the evolution of highly specialized (and often ‘higher’) cognitive faculties, such as language (Rizzolatti & Arbib, 1998) or mind-reading (Gallese & Goldman, 1998). Both the phylogenetic view and the alternative models influenced by research in epigenetics (Giudice et al., 2009; Ferrari et al., 2013) have sought to provide a speculative narrative of how mirror neurons evolved. However, while an evolutionary account must principally attempt to answer questions of when and how a particular change occurred during phylogeny, in these models what remains unclear is the process or mechanism that produced the canalization, including how mirror neurons became canalized during evolutionary history. Herein, we try to bridge these gaps. In addition, while previous models are based on information derived from studies on macaque and human MNs, we expand this perspective by including additional critical information regarding MNs in songbirds (Prather et al., 2008) and marmosets (Suzuki et al., 2014), as well as inferences based on careful analysis of brain activity through neuroimaging in another primate species, the chimpanzee Pan troglodytes (Hecht et al., 2013). In light of these data, we formulate a new view of MN evolution, consistent with comparative neuroanatomical and behavioural evidence to date. Further, we suggest directions for future research in the analysis of sensorimotor neural structures.
II. ARE MIRROR NEURONS A VALID TRAIT TO COMPARE ACROSS SPECIES?
In order to analyse MNs from an evolutionary perspective, it is necessary to identify them as a valid trait to compare across species (Table 1). A valid trait is reliably present in many individuals and distinguished from other traits (Striedter, 1999). In the nervous system, a valid trait, such as a brain region, is defined in terms of specific attributes, including: (i) anatomical location, (ii) physiology, (iii) pattern of connections, (iv) functions and (v) cytoarchitecture (Tyner, 1975; Kaas, Merzenich & Killackey, 1983; Striedter, 1998, 1999).
Table 1

| | Human | Chimpanzee | Macaque | Marmoset | Songbirds |
|---|---|---|---|---|---|
| Species | Homo sapiens | Pan troglodytes | M. nemestrina, M. mulatta, M. fascicularis, M. fuscata | Callithrix jacchus | Melospiza georgiana (swamp sparrow), Lonchura striata domestica (Bengalese finch) |
| Techniques | fMRI, PET, TMS, MEG, EEG, single-cell | PET | fMRI, EEG, single-cell | Single-cell | Single-cell |
| Brain areas | Supplementary motor area (SMA), primary motor cortex (M1), superior temporal sulcus (STS), inferior frontal gyrus (IFG), inferior parietal lobule (IPL), premotor cortex (PM), intraparietal sulcus (IPS), superior parietal lobule (SPL) | Inferior parietal lobule (IPL), superior temporal sulcus (STS), inferior frontal gyrus (IFG) | Supplementary motor area (SMA), primary motor cortex (M1), inferior parietal lobule (IPL), intraparietal sulcus (IPS), premotor cortex (PM) | Premotor cortex (PM), superior temporal sulcus (STS) | High vocal center (HVC), Field L |
(1) Anatomical location
MNs have been investigated in four species of macaque monkeys, and localized in specific sections of the ventral premotor and parietal cortex (Di Pellegrino et al., 1992; Rizzolatti et al., 1996). Furthermore, recent experiments suggest that MNs might be present also in the medial frontal cortex (Yoshida et al., 2011) and primary motor cortex (Vigneswaran et al., 2013), an area strongly connected with premotor regions.
In humans, MN activity has been indirectly inferred in specific sectors of premotor cortex, motor cortex and inferior parietal lobule (IPL), plus the intraparietal sulcus (Fadiga et al., 1995; Hari et al., 1998; Oberman et al., 2005; Tunik et al., 2007; Iacoboni, 2009). These data were confirmed by at least two meta-analyses, one covering 139 functional magnetic resonance imaging (fMRI) and positron emission tomography (PET) experiments (Caspers et al., 2010) and another covering 76 fMRI studies (Molenberghs, Cunnington & Mattingley, 2012); the inferior frontal gyrus, ventral premotor cortex and inferior parietal lobe were active in both the execution and observation of body actions. Mirror activity at the level of single neurons has been investigated in only a few studies: neurons with mirror-like properties were reported in the supplementary motor area (SMA) (Mukamel et al., 2010) and in the anterior cingulate cortex (ACC) (Hutchison et al., 1999), although in the latter study only a single observation was reported.
Furthermore, recent studies employing PET imaging found mirror-like activation in the frontal and parietal cortex in chimpanzees (Hecht et al., 2013). Interestingly, a recent study reported neurons discharging during execution and perception of the same actions in a sector of the premotor cortex of marmosets (Suzuki et al., 2014). Although these investigations of MNs in marmosets (three subjects) and chimpanzees (four subjects) were performed with a limited number of animals, they are nevertheless very valuable because they represent the only source of information regarding this system in other primate species. These findings clearly parallel those obtained with single cells and fMRI in macaques. Our analysis relies on the high level of similarity, across all primate species investigated, in the anatomical locations where MNs have been identified, as well as in the neural circuits involved in sensorimotor transformation of hand behaviour and in the specificity of the physiological responses (Hecht et al., 2013; Suzuki et al., 2014). It is also important to note that in the neurophysiological studies in which MNs were recorded, monkeys involved in the experiments originated either from colonies, with a rich social life and complex interactions, or from animals born in captivity with very limited social experience. Despite such significant differences, MNs were found consistently in the same location and with similar percentages out of the total number of recorded neurons. Therefore, it is likely that they are always present, in the same anatomical areas with very similar properties, in all individuals of these species.
Neurons with mirror properties are not restricted to primates. In songbirds, MNs for vocalization have been localized through single-cell recording in the high vocal center (HVC) song nucleus, a premotor area necessary for learning and voluntary control of songs (Prather et al., 2008).
Table 1 does not include MNs activated by the observation of emotional facial expressions and those activated during pain-related stimuli resulting in the activation of areas of the limbic system, such as amygdala, anterior insula, anterior cingulate cortex and secondary somatosensory cortex (Carr et al., 2003; Wicker et al., 2003; Keysers et al., 2004; van der Gaag, Minderaa & Keysers, 2007; Ebisch et al., 2014). These MNs have been investigated primarily in humans and therefore there are insufficient data across species to speculate regarding their evolution. It is also worth noting that a neurophysiological study conducted in humans (Mukamel et al., 2010) may suggest that MNs can be localized also outside the classical mirror areas, i.e. in the hippocampus or entorhinal cortex. However, although it is possible that neurons with mirror properties may be present in other areas of the brain, there are not enough data to justify a comparative analysis.
(2) Physiology
From a neurophysiological point of view, MNs are defined as neurons that fire during the perception and execution of the same, similar and, in a very few cases, logically related actions. About 30% of recorded MNs can be defined as strictly congruent (i.e. fire during the execution and observation of virtually identical actions, both in terms of general movement and the way in which this movement is executed). More than 60% of MNs are broadly congruent, i.e. respond during the execution and observation of similar actions. Most of these neurons fire for hand actions during both visual and motor tasks, and a few fire for more than one effector (hand and mouth) (Gallese et al., 1996; Ferrari et al., 2003). Logically related MNs have also been identified, discharging when the observed action is only logically related to the executed action and possibly conceived as preparatory to it (Di Pellegrino et al., 1992). This category represents less than 5% of all MNs. Thus the various visual responses of MNs can be selective for and modulated by the different contextual conditions in which the actions occur, and are more or less specific to different aspects of the action itself.
However, the majority of neurophysiological studies on MNs have been conducted irrespective of congruency, i.e. only measuring visuomotor properties of MNs during general movements (e.g. grasping) (Cook & Bird, 2013). As a consequence, the precise visuomotor congruence of different MNs is not always specified and we only know that they activate during the execution and perception of the same actions. Any recorded motor neuron that is also activated while the subject visually or auditorily perceives others' movements has been categorized as a MN. The above definition clearly reflects the methodological approach by which neurophysiologists typically collect data concerning neurons with sensorimotor properties.
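The congruence classes described in this section can be made concrete with a small sketch. It is a deliberate simplification and not the procedure used in the cited recordings, which rely on statistical comparison of firing rates. The representation of an action as an `(action, manner)` pair, the function name `classify_congruence` and the `related_pairs` argument are hypothetical choices made for illustration.

```python
def classify_congruence(executed, observed, related_pairs=frozenset()):
    """Label a neuron from the actions that drive it in each condition.

    executed, observed: iterables of (action, manner) pairs, for example
        ("grasp", "precision"), that effectively activate the neuron.
    related_pairs: set of (observed_action, executed_action) pairs that the
        experimenter treats as logically related (an assumption of this sketch).
    """
    executed, observed = set(executed), set(observed)
    if not executed or not observed:
        # A neuron lacking a motor or a sensory response is not a MN
        return "not a mirror neuron"
    if executed == observed:
        # Same action and same manner of execution in both conditions
        return "strictly congruent"
    exec_actions = {action for action, _ in executed}
    obs_actions = {action for action, _ in observed}
    if exec_actions & obs_actions:
        # Same kind of action, but the manner or breadth of tuning differs
        return "broadly congruent"
    if any((o, e) in related_pairs for o in obs_actions for e in exec_actions):
        # Observed action is only logically related to the executed one
        return "logically related"
    return "non-congruent"


strict = classify_congruence(
    {("grasp", "precision")}, {("grasp", "precision")}
)
broad = classify_congruence(
    {("grasp", "precision")}, {("grasp", "precision"), ("grasp", "power")}
)
logical = classify_congruence(
    {("grasp", "precision")},
    {("place", "any")},
    related_pairs={("place", "grasp")},
)
```

Under these assumptions, a neuron tested only with a generic grasping movement in both conditions would be labelled strictly congruent by default, which mirrors the caveat above: without finer testing, the true degree of congruence remains unspecified.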
(3) Patterns of connectivity
MNs are embedded in the mirror neuron system (MNS), a network of interconnected areas that simultaneously process information related to the execution and perception of specific biological actions (Keysers & Perrett, 2004; Rizzolatti & Craighero, 2004), comprising specific sections of premotor, parietal and primary motor cortices that do contain MNs, and the superior temporal sulcus (STS) that contains only sensory neurons. There is clear evidence from neuroanatomical studies that those parietal and premotor sectors containing MNs are anatomically connected and thus form a functional circuit (Borra et al., 2008). It has also been shown that some MNs of the premotor cortex project to the spinal cord (Kraskov et al., 2009), and therefore have a direct input to the muscles.
MNs are inseparable from the neural connectivity in which they are embedded. The ensemble of neurons that form the MNS are deeply dependent on the intrinsic (between neurons located in the same anatomical sector) and extrinsic (between neurons having the same properties but located in different sectors of the cerebral cortex) connections in which they are embedded and work, as a whole, at a population scale (Keysers & Perrett, 2004; Iacoboni et al., 2005; Borra et al., 2008; Nelissen et al., 2011). If MNs are defined as populations of neurons activating during the execution and observation of the same or similar actions, then the MNS can be seen as a network of interconnected populations of MNs located in different areas of the brain, that can be operationalized as an estimate of the number of neurons activated during the perception of specific actions minus the activity of mirror-like or purely sensory neurons (Nelissen et al., 2005, 2011).
(4) Function
Although the functional attribution of this class of neurons is still debated (Hickok & Hauser, 2010; Rizzolatti & Sinigaglia, 2010; Cook et al., 2014), the function most commonly associated with them is action recognition (Rizzolatti & Sinigaglia, 2010). Being recruited during visual perception, MNs are thought to allow an individual to produce a sensorimotor representation of what another individual is doing (Rizzolatti & Craighero, 2004), activating an internal description of various attributes (i.e. action direction, action goal, spatial location, kinematics) relevant to action execution (Rizzolatti, Fogassi & Gallese, 2001).
Consistent with this, a recent study using transcranial magnetic stimulation (TMS) in humans highlighted the causal role of a population of premotor neurons in recognition tasks involving observation of lip and hand actions (Michael et al., 2014). A meta-analysis of 11 studies involving more than 350 patients with brain lesions in the inferior frontal cortex and posterior parietal cortex found impairments in the capacity to recognize others' actions, supporting the causal role of the temporo-parietal-premotor brain network in action recognition (Urgesi, Candidi & Avenanti, 2014).
Some scholars (Heyes, 2010; Catmur, 2014; Cook et al., 2014) have criticized the action recognition role attributed to MNs, stating that although MNs may have a function in low-level processes of action perception, they are not involved in higher-level processes such as matching an action to its goal object or selecting the relevant aspects for that action in a given context (Catmur, 2014). We agree with Michael et al. (2014), who suggested that this criticism depends upon the radically modularist premise that it is not possible for the processes underpinning action perception to contribute to the process of making judgments about the (proximal and distal) goals of the observed actions. We do not endorse this radically modularist perspective, which contradicts our definition of MNs at the population and systemic scale. Although further research is needed to corroborate hypotheses of MN function and to find parallels in non-human primates, research on sensory activation of motor neurons is showing that the motor system can contribute to perception and that its impairment can cause deficits in perceptual recognition (Michael et al., 2014; Urgesi et al., 2014).
(5) Cytoarchitecture
MNs seem not to be linked to specific properties of the cerebral cortex (granular, agranular or dysgranular). For example, human MNs have been localized in the premotor area (BA6), which is agranular, and in Broca's area, constituted by BA44 and BA45, which are dysgranular and granular, respectively (Brodmann, 1909; Amunts et al., 1999). Further investigations are needed to verify whether the neurophysiological properties of MNs are linked to a specific cortical layer, such as layer III, which contains both agranular and dysgranular tissues (Shipp, 2007). However, since characters or traits in comparative biology are not defined by any single property, and a trait must satisfy as many attributes (i.e. location, physiology, functions, pattern of connections, cytoarchitecture) as possible (Tyner, 1975; Striedter, 1999), we do not consider the absence of one of these attributes a sufficient reason to dismiss the validity of the trait.
III. DIFFERENT CATEGORIES OF MIRROR NEURONS
Although MNs are a defined and recognizable trait sharing a core neural matching mechanism of action–perception, probably across the amniotes [tetrapods that have an amniotic egg, including sauropsids (reptiles and birds) and synapsids (mammals)], a closer inspection suggests that MNs in such heterogeneous taxa do not reflect a uniform and stable execution–observation matching system either within or across species (Ferrari et al., 2013). Firstly, they can be activated in different sensory modalities, such as vision (Di Pellegrino et al., 1992), hearing (Pulvermüller et al., 2006), or both (Kohler et al., 2002), and can involve different effectors, such as the hand (Rizzolatti et al., 2001), mouth (Ferrari et al., 2003), or, in the limbic system, a combination of both (Gallese, Keysers & Rizzolatti, 2004; Keysers & Perrett, 2004). Secondly, MNs are highly plastic and highly variable in their locations and proximate functions during ontogeny (Calvo-Merino et al., 2005, 2006; Haslinger et al., 2005).
In order to assimilate such variability and complexity, we propose a more parsimonious categorization that may help to clarify the properties of MNs by taking into account the evidence for sensorimotor processing. We will classify MNs using two unambiguous physiological criteria: the modalities of sensory input triggering the response, and the effectors involved in the motor output. We thus obtain three main categories of MNs: (i) hand visuomotor MNs, (ii) mouth visuomotor MNs, and (iii) audio–vocal MNs. Some studies identify bimodal MNs, which fire during both the visual and auditory presentation of the same action (Kohler et al., 2002), or MNs that respond to grasping either with the hand or with the mouth, suggesting that MNs may generalize either the sensorial input or the biological effector by which the action is performed (Gallese et al., 1996). There might thus be cases for which the categories we propose partially overlap. However, there are reasons to believe that such overlap will not prevent our categories from being useful. By differentially focusing on their perceptual input and motor output, our categorization provides a useful dictionary of neural sub-classes, allowing understanding of neurophysiology in an ethological perspective, by focusing on how the individual moves and what they perceive according to the experimental paradigms utilized for investigating neuronal responses. More importantly, the proposed taxonomy reflects the topographic organization of the cerebral cortex. Premotor, parietal and motor regions of the brain present quite sharp segregation of neurons in relation to the hand, mouth, arm, face and eyes, with defined overlapping regions (Rizzolatti & Luppino, 2001; Kaas, 2008). Accordingly, an increase in studies of MNs that consider actions performed not only with the hand or mouth but also with the arm, eyes or legs, may lead to a proliferation of MN categories in the future.
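The two-criterion taxonomy proposed above amounts to a lookup from (sensory modality, effector) pairs to categories. The sketch below is only an illustration of that idea: the enum names, the function `categorize` and the decision to return several labels for overlapping cases are assumptions of this example, not part of the original classification.

```python
from enum import Enum


class Modality(Enum):
    VISUAL = "visual"
    AUDITORY = "auditory"


class Effector(Enum):
    HAND = "hand"
    MOUTH = "mouth"
    VOCAL = "vocal"


# The three main categories, keyed by sensory input and motor effector
CATEGORIES = {
    (Modality.VISUAL, Effector.HAND): "hand visuomotor",
    (Modality.VISUAL, Effector.MOUTH): "mouth visuomotor",
    (Modality.AUDITORY, Effector.VOCAL): "audio-vocal",
}


def categorize(modalities, effectors):
    """Return every category matched by a neuron's inputs and effectors.

    A neuron responding through several modalities or effectors can match
    more than one category, which is how this sketch represents overlap.
    Combinations outside the three main categories are left unlabelled.
    """
    labels = {
        CATEGORIES[(m, e)]
        for m in modalities
        for e in effectors
        if (m, e) in CATEGORIES
    }
    return sorted(labels)


hand_only = categorize({Modality.VISUAL}, {Effector.HAND})
hand_and_mouth = categorize({Modality.VISUAL}, {Effector.HAND, Effector.MOUTH})
song = categorize({Modality.AUDITORY}, {Effector.VOCAL})
```

Extending the scheme to further effectors (arm, eyes, legs) would only require new enum members and new entries in `CATEGORIES`, which reflects the point that additional categories may emerge as more actions are studied.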
Even though MNs can be divided into categories according to these sub-traits, MNs can be identified as a unique trait by virtue of their general neurophysiological mechanism, location and common anatomical connectivity. A useful analogy is the trait ‘cerebral cortex’ that can be further divided into a large number of functionally distinct processing sub-traits (cortical areas) which in turn are divided into sets of modules or columns of functionally related neurons, so that single areas can mediate several distinct, but related functions (Kaas, 2008). As for MN sub-traits, which could be considered as one of these interconnected arrays of cortical areas, cortical columns show inter-individual variability and are influenced greatly by experience (Tzourio-Mazoyer et al., 2004). However, it is still useful to categorize them in the context of different types of comparative and functional analyses to facilitate a general understanding of the central nervous system (Kaas, 2008).
(1) Hand MNs in primates: a common evolutionary history
Hand visuomotor mirror neurons (hand MNs) refer to interconnected populations of neuronal cells activated by the visual observation of others' hand gestures, and also involved in the control of one's own hand actions (Fig. 1).

Although the gross physiology and neural connectivity of MNs are remarkably similar in different primate species, they also present interesting differences in comparative analysis. In the frontoparietal circuit of the macaque, hand MNs respond to the observation and execution of transitive grasping actions (Rizzolatti & Craighero, 2004). The response can, in some cases, be specific to the type of grip (e.g. precision versus power grip) or type of action (i.e. manipulation, holding, etc.) (Rizzolatti et al., 2002). In other cases, MNs may exhibit a large degree of generalization, firing in response to actions performed by both conspecifics and heterospecifics (humans), even when these are performed from different visual perspectives (Caggiano et al., 2009). Neuroimaging studies show that in the macaque intransitive or mimicking actions elicit very weak activation in mirror cortical regions (Nelissen et al., 2011). On the contrary, in the frontoparietal cortex of chimpanzees mirror responses have been found during the observation of both transitive grasping actions and intransitive movements (Hecht et al., 2013). Chimpanzees have a MNS supported by cortical regions corresponding to those consistently found in humans, namely the premotor and parietal areas as well as the STS region. Interestingly, and in contrast to macaques, chimpanzees show similar motor activation also during the observation of non-goal-directed actions (i.e. miming grasping), thus resembling the properties of the human mirror system (Hecht et al., 2013). In humans, in fact, both intransitive and transitive gestures enable the activation of the mirror system (Rizzolatti et al., 2002).
Mirror responses have also been found in the ventral portion of the frontal cortex of marmosets, during the execution and observation of transitive reaching and grasping actions (Suzuki et al., 2014), widening the range of species that show visually activated motor neurons and suggesting that, in primate phylogeny, MNs are probably evolutionarily more ancient than previously thought.
The recruitment of hand visuomotor mirror responses during visual perception may be involved, in terms of proximate causes, in the function of action recognition in several social species. A similar mechanism could have had an evolutionary role during primate phylogeny. Another possible function of MNs, suggested by Jeannerod (1994), is that their properties seem suitable for imitative purposes. Neuroimaging studies in humans have confirmed that core areas of the MNS are activated during both simple imitation of mouth gestures and hand movements not directed towards a target (Iacoboni et al., 1999, 2005; Carr et al., 2003). Interestingly, such results seem to contrast with monkey studies where fMRI investigations show that the MNS responds only weakly during the observation of non-goal-directed actions, while in chimpanzees and humans the mirror system is also activated by intransitive, meaningless movements. No monkey studies have investigated the possible role of MNs during imitation because macaques are considered to have poor imitative capacities, and even with training, the acquisition of new motor skills is unlikely to take place mainly through imitation. According to this explanation, the monkey MNS would appear to be sensitive to the goal of actions, but not to code the details of the observed action that leads to the goal. This may be a possible explanation of the behavioural evidence that monkeys cannot replicate the observed actions, although they seem to recognize goal-directed movements during perception (Rizzolatti, 2005; Rochat et al., 2008). These observations raise the possibility that in primate phylogeny, the activation of tightly interconnected populations of the MNS by intransitive and meaningless hand actions might be the result of recent adaptive evolution that led to an autapomorphy (uniquely derived trait) of the mirror machinery in hominids (chimpanzees and humans).
Support for the hypothesis that stronger manual imitative abilities in humans and chimpanzees, compared to macaques, are related to species differences in features of their MNS also comes from anatomical studies (Hecht et al., 2012). From a behavioural point of view, macaques only copy or emulate the end result of observed hand actions, while humans and chimpanzees are able to copy step-by-step action processes (Huffman & Quiatt, 1986; Visalberghi & Fragaszy, 2002), although several differences exist in the way they imitate (Arbib, Bonaiuto & Rosta, 2006; Fridland & Moore, 2014). Chimpanzees exhibit a ‘simple form of imitation’ (or emulation), which allows single actions to be acquired within a limited number of attempts, whereas humans are also capable of ‘complex imitation’, which can be defined as the capacity to recognize novel actions through the comparison of variants of known goal-directed movements (Arbib, 2002, 2005; Tennie, Call & Tomasello, 2012).
This distinction is likely reflected in the evidence that macaques do not acquire tool use by imitation learning, while chimpanzees and humans do at the more rudimentary/emulative and complex/compositional level, respectively (Byrne & Russon, 1998; Biro et al., 2003; Whiten, Horner & de Waal, 2005). The explanation may lie in differences in the neural connections within which the MNS is embedded (see Hecht et al., 2012). In the macaque there is a large discrepancy between the ventral (STS with frontal areas) and the dorsal (parietal lobe with frontal areas) circuits linking the main sources of visual information related to biologically meaningful stimuli (i.e. hand/body movement, face perception, gaze movement) to cortical areas involved in higher cognitive functions. The ventral connections seem to be much larger and stronger than the dorsal ones, while this difference is less pronounced in chimpanzees and absent in humans (Hecht et al., 2012). Functionally speaking, the ventral route might be useful in coding the physical end result of observed actions, while the dorsal route may code the spatial mapping of movements and may extract finer levels of action kinematics (Johnson-Frey et al., 2003; Goldenberg, 2009; Hecht et al., 2012).
Moreover, in humans, but not in other species, an additional dorsal pathway passes through the parietal opercular white matter to the anterior supramarginal gyrus; this pathway seems to be implicated in tool use (Iriki, 2006; Peeters et al., 2009; Hecht et al., 2012). Finally, the link between the mirror parietal region (enlarged in humans and associated with spatial awareness) and the inferior temporal sulcus, where the perception of objects and tools is coded, is strongest in humans, intermediate in chimpanzees and weakest in macaques (Hecht et al., 2012). Control anatomical tractography in the three species has been performed in the geniculostriate and corticospinal tracts but no significant differences between these tracts were found (Hecht et al., 2012).
This evidence suggests that hand MNs could be a primate homology of cerebral connectivity (fronto-parietal circuit) and the core mechanism (i.e. matching execution to observation), with differentiation of function (i.e. action recognition, imitation). In particular, primates seem to have inherited specific sensorimotor structures (i.e. premotor and parietal regions) in specific areas of the brain from a common ancestor (Preuss & Goldman-Rakic, 1991a,b; Kaas, 2008). The connectivity between these areas produces specialization of visuomotor neurons (involved in the visual coordination of arms in space) in MNs. Beyond interspecific differences in neurophysiological responses and strength of axonal connections, frontoparietal connectivity must be common to all primate lineages, as it is present both in strepsirrhines and anthropoid primates (Preuss & Goldman-Rakic, 1991a,b), and in some species (e.g. human, chimpanzee, macaque, marmoset) has been exploited also to code others' hand movements.
However, given the plasticity shown by MNs, and their high functional value, focusing only on the classical homology of this trait may be misleading. The common heritage of primate hand MNs is in fact not centred on specialized neurons activated during perception and execution of grasping actions (Rizzolatti & Matelli, 2003); rather it is based on more generalizable and learnable matching properties of sensorimotor processes regarding the execution and observation of interactions between the subject, hands and objects (Toni et al., 2008; Bonaiuto & Arbib, 2010). In other words, the primate frontoparietal circuit (as for many other cerebral traits) is greatly influenced by sensorimotor experience, and might transform social visual information into a motor format by virtue of evolutionarily conserved sensorimotor mechanisms tied to the contextual use of the upper appendages in a given environment.
Computational models plus behavioural and neurobiological evidence (Keysers & Perrett, 2004; Arbib & Bonaiuto, 2008; Cook et al., 2014) suggest that, beyond species-specific differences between the ecological and social niches of various primate populations which expose individuals to different types of manipulative (transitive, intransitive) actions (Arbib, Ganesh & Gasser, 2014), hand MNs/MNS are subjected to similar developmental trajectories in monkeys and humans, i.e. similar processes of brain interaction during ontogeny, as a consequence of the need to reach objects through coordination of eye and hand movements in space. These models predict that the hand MNS emerges throughout motor development as newborns learn to extract relevant features from visually perceived manual actions, for controlling the hand in space (Oztop, Bradley & Arbib, 2004). At the mechanistic level, MNs are thought to emerge through probabilistic connections and interactions between (pre)motor, parietal and sensory neurons coding for different aspects of the same actions (Bonaiuto, Rosta & Arbib, 2007).
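The associative intuition behind such accounts can be illustrated with a toy sketch. This is not the model of Oztop et al. (2004) or Bonaiuto et al. (2007); the unit counts, learning rate, noise level and function names below are all hypothetical choices made only for illustration. A motor unit and the visual unit for the same action are co-active whenever the agent watches its own hand, and a plain Hebbian rule strengthens that pairing until visual input alone drives the matching motor unit.

```python
import random


def train_visuomotor_weights(n_actions=3, trials=300, lr=0.05, noise=0.2, seed=0):
    """Hebbian learning of visual-to-motor weights from self-observed actions.

    All parameters are illustrative assumptions, not values from the literature.
    """
    rng = random.Random(seed)
    # weights[v][m]: connection strength from visual unit v to motor unit m
    weights = [[0.0] * n_actions for _ in range(n_actions)]
    for _ in range(trials):
        action = rng.randrange(n_actions)
        # Executing an action activates exactly one motor unit
        motor = [1.0 if m == action else 0.0 for m in range(n_actions)]
        # Seeing one's own hand activates the matching visual unit strongly,
        # plus weak random activity in every visual unit
        visual = [(1.0 if v == action else 0.0) + rng.uniform(0.0, noise)
                  for v in range(n_actions)]
        # Hebbian update: co-active units strengthen their connection
        # (unbounded growth is tolerated here because training is short)
        for v in range(n_actions):
            for m in range(n_actions):
                weights[v][m] += lr * visual[v] * motor[m]
    return weights


def motor_response(weights, observed_action):
    """Motor drive evoked by purely visual input, with no self-movement."""
    n = len(weights)
    visual = [1.0 if v == observed_action else 0.0 for v in range(n)]
    return [sum(visual[v] * weights[v][m] for v in range(n)) for m in range(n)]


weights = train_visuomotor_weights()
for action in range(3):
    print(action, [round(r, 2) for r in motor_response(weights, action)])
```

After training, observing a given action evokes the largest drive in the motor unit for that same action, which is the matching property at issue. The sketch deliberately omits the pre-existing connectivity constraints on which the epigenetic account insists.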
These hypotheses are supported by behavioural evidence. Skilled manual abilities require a long period of maturation (from 1 to 2 years in humans; several months in macaques) and many levels of sensorimotor integration. The development and refinement of a successful control strategy for visually guided reaching movements is accompanied by the execution of appropriate exploratory behaviour that involves concurrent and coordinated motor and visual experience (Lederman & Klatzky, 1987; Sommerville, Woodward & Needham, 2005). At three months of age human infants are not yet able to grasp objects, and at the same time they are also not visually sensitive to the goal structure of reaching and grasping movements performed by others (Sommerville et al., 2005). However, through motor training with a sticky mitten that allows infants to reach for objects, their capacity to understand the goal structure of grasping movements emerges. This suggests that the emergence of fine perceptual skills related to manual actions is causally related to the development of associated motor abilities.
It has been hypothesized that these processes of brain interactions related to specific and repeated behavioural experiences epigenetically modify gene expression in MNs during development (Ferrari et al., 2013; Taschereau-Dumouchel et al., 2016), suggesting that the physiological role of MNs can be reflected at the epigenetic–nuclear level in specific brain regions. This is in line with evidence and models stating that epigenetic regulation associated with developmental plasticity reflects adaptive functional interactions of the brain with the environment during ontogeny (Riedl, 1977; Striedter, 1998; Fishell & Heintz, 2013; Bronfman, Ginsburg & Jablonka, 2014; Lokk et al., 2014).
Considering the plausibility of a common developmental trajectory in hand MNs involving both common brain interactions and molecular regulation underlying synaptic plasticity, we propose that hand MNs can be considered developmental or epigenetic homologues among the different primate species (Wagner, 1989; Rieppel, 1994; Striedter, 1998). Structures from two individuals or two species are developmental or epigenetic homologues if they share a set of developmental constraints caused by locally acting self-regulatory mechanisms of organ differentiation (Wagner, 1989). In particular, developmental homology may rely on common epigenetic mechanisms and similar processes of maturation that produce stable phenotypic results during development (Wagner, 1989).
The description of MN development within an epigenetic theoretical framework implies that MNs are subject to both endogenous and exogenous environmental influences during development, being not simply the result of top-down (learning) processes, but also of bottom-up dynamics of brain maturation. The epigenetic account differs from the associative and Hebbian account in explaining MNs as the result of both sensorimotor experiences and other processes of maturation which reflect the evolutionary history of the species (Ferrari et al., 2013), and suggests that associative learning is necessary but not sufficient for MN development (Bonaiuto, 2014; Oberman et al., 2014; Orban, 2014). Thus, the epigenetic account not only makes predictions that are (partly) compatible with those of the associative hypothesis (some populations of MNs emerge when individuals are exposed to correlated sensorimotor experiences), but also adds that pre-existing conditions, such as coarse-grained connectivity and input representations (Bonaiuto, 2014), are required to make the space of hand–vision relation trajectories possible.
The epigenetic interpretation of MNs further predicts that these pre-existing conditions are responsible for differences in action perception related to hand behaviour in different primate species. Differences in activation of the MNS associated with manual abilities in the chimpanzee, macaque and human may not only reflect (Arbib, 2012), but also be intrinsically intertwined with, wider differences in the cognitive and perceptual abilities of these animals, which affect the way in which individuals learn to perform manipulative actions in a social perspective. Thus, the epigenetic account is compatible (while the associative account is not) with the idea that social pressures favoured (beyond several domain-general adaptations) perception–action matching neural mechanisms (i.e. MNs) during the phylogenetic history of primates.
(2) Mouth MNs: evolutionary counterparts across primate species
Mouth visuomotor mirror neurons (mouth MNs) are populations of neurons activated both by the visual observation of others' mouth actions and by the performance (execution) of mouth actions oneself. In macaques, and probably other primate species such as apes and humans, there appear to be two types of mouth MNs: the ingestive and the communicative. Ingestive MNs discharge during the observation and execution of actions performed during an interaction involving the mouth and an object, such as grasping and holding (with the mouth), sucking, chewing and breaking (Ferrari et al., 2003). By contrast, communicative MNs discharge in response to non-goal-directed intransitive actions, such as lipsmacking or tongue protrusion (Ferrari et al., 2003), i.e. facial gestures that are used for communication.
Importantly, communicative MNs also fire during the execution of ingestive actions (such as sucking) and interestingly, ingestive MNs also fire during communicative behaviour such as lipsmacking and tongue protrusion, but with a weaker discharge.
Mouth MNs are very likely to be present in other primates, such as humans (Buccino et al., 2001; Iacoboni et al., 2004; Leslie, Johnson-Frey & Grafton, 2004), but have not yet been investigated in chimpanzees. During the observation of facial gestures or during the execution/imitation of the same facial gesture, mirror activation was found in motor (area 6), supplementary motor (SMA), premotor (BA44 and BA45) and parietal (IGF) cortex (Buccino et al., 2001; Carr et al., 2003). Consistent with these results, Mukamel et al. (2010) recorded neurons with mirror-like properties responding to the execution and perception of mouth actions in areas not classically associated with the MNS, such as the SMA. Thus, the overlap between areas with specific mouth motor activity and areas responding to similar mouth actions performed by others suggests that, in humans, a number of frontal (premotor and motor) and parietal areas controlling mouth movements may have mirror properties.
In light of similarities between the mouth and hand MNS at the physiological and structural level, a similar evolutionary relationship may be hypothesized for mouth visuomotor MNs, with the same conserved function, including later re-functionalization (i.e. exaptation). In fact, a common feature of mouth MNs across primate species is their pattern of activation in the frontoparietal circuit, where the upper part of the temporal cortex is connected to the caudal portion of the parietal cortex, which in turn is linked to the dorsal region of the premotor and possibly motor cortices (Gerbella et al., 2011; Rizzolatti et al., 2014). This would justify the assessment of classical homology between macaque and human mouth visuomotor MNs, because these neurons are found in corresponding locations and embedded in very similar patterns of connectivity. Moreover, given that a frontoparietal circuit controlling orofacial movements has been identified in marmosets (Yao et al., 2002) and prosimians (Kaas, 2008), the attribution of classical homology highlights that neurons in the frontoparietal circuit controlling orofacial movements must have existed in the ancestor of macaques and humans and that they have been recruited during the perception of others' mouth actions under appropriate conditions.
However, there are also important differences between hand and mouth MNs, which could reflect their phylogenetic or ontogenetic origin. One such difference is the fact that mouth MNs activate during intransitive (communicative) actions even in monkeys, while hand MNs do not. Further, it is also noteworthy that, since agents do not have direct visual access to their own faces, in contrast to hand movements, mouth MNs may be the result of different (temporal and spatial) dynamics of sensory and motor interactions in parietal and premotor brain areas.
Accordingly, the phenomenon of neonatal imitation makes the existence of mirror responses associated with face movements even more intriguing from a developmental and evolutionary point of view. In monkeys, apes and humans, just hours after birth, approximately 50% of newborns are capable of imitating a mouth gesture, such as lipsmacking or tongue protrusion (Meltzoff & Moore, 1977; Ferrari et al., 2003, 2006; Myowa-Yamakoshi et al., 2004; Bard, 2007). Moreover, 36 h following birth, human newborns can discriminate and reproduce three different emotional facial gestures (happiness, sadness and surprise) expressed by a model (Field et al., 1982; see also Simpson et al., 2014). Given that face-to-face interactions typically involve effectors (such as the mouth and tongue) that neonates cannot visually access, processes of sensorimotor mapping between observation and execution of mouth gestures are crucial in neonatal imitation (Meltzoff & Moore, 1977; Meltzoff & Borton, 1979), suggesting that some rudimentary observation/execution matching system may be present at birth or shortly after birth in the human, ape and monkey brain (Ferrari et al., 2012; Vanderwert et al., 2015).
In support of this hypothesis, electroencephalogram (EEG) studies in newborn macaques have found, as in human infants, neural activity reflective of brain mirroring (Ferrari et al., 2012; Vanderwert et al., 2015). In particular, during the observation and imitation of facial gestures, a specific frequency band of the EEG, the mu rhythm, tends to decrease in amplitude. As the mu rhythm represents an indirect marker of the MN system (Marshall & Meltzoff, 2011), the finding of a mu rhythm in newborn monkeys and its sensitivity to early social experience (Vanderwert et al., 2015) provides important information about the presence of a mirror mechanism at birth correlated to infant synchronous dyadic communication.
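To make the notion of mu-rhythm amplitude reduction concrete, the following is a minimal sketch of how a suppression index might be computed from two EEG epochs. It is not the analysis pipeline of the cited studies: the 5–9 Hz band, the 100 Hz sampling rate, the function names and the synthetic signals are all assumptions chosen for illustration, and real recordings would also need artefact rejection and averaging over many trials.

```python
import math


def band_power(signal, fs, f_lo, f_hi):
    """Summed spectral power within [f_lo, f_hi] Hz, using a plain DFT."""
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            power += (re * re + im * im) / (n * n)
    return power


def mu_suppression_index(baseline, condition, fs, band=(5.0, 9.0)):
    """Percent change in band power relative to baseline.

    Negative values indicate suppression (desynchronization).
    The default band is an assumed infant-like mu range, not a fixed standard.
    """
    p_base = band_power(baseline, fs, *band)
    p_cond = band_power(condition, fs, *band)
    return 100.0 * (p_cond - p_base) / p_base


def make_signal(mu_amplitude, fs=100, seconds=2):
    """Synthetic epoch: a 7 Hz 'mu' component plus an unrelated 20 Hz component."""
    n = fs * seconds
    return [mu_amplitude * math.sin(2 * math.pi * 7 * t / fs)
            + 0.5 * math.sin(2 * math.pi * 20 * t / fs)
            for t in range(n)]


fs = 100
baseline = make_signal(mu_amplitude=1.0, fs=fs)     # rest
observation = make_signal(mu_amplitude=0.6, fs=fs)  # gesture observation
print(round(mu_suppression_index(baseline, observation, fs), 1))
```

In this synthetic case the 7 Hz amplitude drops from 1.0 to 0.6, so band power falls to 36% of baseline and the index is about -64%. The 20 Hz component lies outside the band and does not affect the result.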
As mu rhythm signals are considered to be associated with the activation of sensorimotor neural areas (Vanderwert, Fox & Ferrari, 2013), a plausible interpretation is that a rudimentary visual–mouth mirror mechanism is present from birth, allowing matching of observed mouth actions and the associated facial expressions with the display of the same behaviours (Nagy & Molnar, 2004; Bonini & Ferrari, 2011; Tramacere et al., 2015). The presence of mu rhythm suppression during observation of mouth gestures by macaque and human neonates is even more informative if we consider that a similar electrophysiological effect is not associated with hand action perception in the same perinatal period (F. Festante, R. Vanderwert, A. Paukner, S. J. Suomi, N. A. Fox & P. F. Ferrari, in preparation). If confirmed, this result will support the idea that the neural networks involved in facial perception have peculiar properties of developmental ‘readiness’.
The developmental properties of the mirror mechanism underlying early imitative responses resemble the properties of so-called canalized traits, such as song learning in songbirds or emotion expression in humans (Gottlieb, 1991; Leppänen & Nelson, 2009). These are cases of experiential canalization, a species-specific perception by which individuals respond in a characteristic way to certain stimuli (i.e. respond only to some patterns of sensory stimulation and not to others) (Gottlieb, 1991), in order to efficiently develop a specific behavioural response. Accordingly, the mirror mechanism associated with neonatal imitation would be a case of developmental sensitivity only to environmental factors that are themselves invariant within the organism's typical developmental environment (Ariew, 1999).
Interestingly, some visual orientation is possible even in infants with occipital cortical blindness, suggesting that early visual abilities might rely on subcortical structures as well (Dubowitz et al., 1986). A neural circuit involving both neocortex and subcortex is likely to participate in and contribute to early imitative responses in monkeys, apes and humans (Bonini, 2016).
Note that the capacity to imitate seems to disappear after a short period of time and to be confined to a narrow developmental window (Heimann, 1989; Myowa-Yamakoshi et al., 2004; Ferrari et al., 2006), resembling in some aspects a sensitive period. The latter, in fact, can be seen as a sort of brain plasticity that is temporally constrained by genotypic–environmental interactions during development (Burggren & Mueller, 2015).
Although the adaptive significance of neonatal imitation is not yet clear and is still debated (Simpson et al., 2014; Oostenbroek et al., 2016), understanding it as an experientially canalized or sensitive-period phenomenon may pave the way to the identification of the different factors that play a role in its emergence and interindividual variability.
(3) Audio–vocal MNs: the exaptation of old structures
Audio–vocal mirror neurons are populations of neurons activated by others' vocal sounds and likewise by one's own vocal production. These neurons have been directly investigated in songbirds and indirectly inferred in humans (Prather et al., 2008; Pulvermüller & Fadiga, 2010).
In avian species, auditory responses of single identified neurons were investigated in swamp sparrows (Melospiza georgiana) and Bengalese finches (Lonchura striata domestica). In swamp sparrows, neurons in HVC projecting to Area X (HVC(X)) fired during song playback (Prather et al., 2008) (Fig. 2). These neurons appear to exhibit a highly selective firing pattern, responsive only to one song within the bird's repertoire (Prather et al., 2008). Unlike these neurons, HVC(RA) neurons, projecting to the robust nucleus of the arcopallium (RA), were entirely unresponsive to stimulation during listening (Prather et al., 2008). Thus, it seems that avian audio–vocal MNs, present only in HVC neurons projecting to Area X, are part of the so-called cortico-striatal loop, connecting the HVC, Area X (a basal ganglia nucleus) and the medial portion of the dorsolateral nucleus of the thalamus (DLM). Several lesion experiments clearly show that the cortico-striatal loop, considered to be analogous to that investigated in mammals (Graybiel, 2005; Jarvis et al., 2005), is necessary for song learning in juvenile birds (Bottjer, Miesner & Arnold, 1984).

Earlier studies on X–projecting MNs proposed that they might serve the function of corollary discharge of motor commands and have a specific role in communication (Prather et al., 2008; Keller & Hahnloser, 2009). However, additional experiments are necessary to confirm this hypothesis and understand whether audio–vocal MNs are simply the result of auditory feedback or whether they have a role in guiding sensorimotor learning.
Evidence from fMRI and TMS techniques shows that mirror mechanisms are present also in humans during vocal/speech perception (Pulvermüller & Fadiga, 2010). Spoken words have been shown to activate neural populations in part of the inferior frontal gyrus (BA44, also called Broca's area) (Pulvermüller, Shtyrov & Ilmoniemi, 2003). Wilson et al. (2004) and Pulvermüller et al. (2006) showed that passive perception of syllables led to strongly increased activation in Broca's area; in both studies activation was greater for speech than for non-speech sounds.
Findings obtained with TMS are consistent with the view that speech perception involves the motor system in a process of auditory-to-articulatory mapping to access phonation with motor properties: listening to speech increases motor evoked potentials in the lip muscles, while listening to non-speech sounds does not (Watkins & Paus, 2004). Similar perceptual enhancements of motor excitability have been identified in the tongue area of the motor cortex and tongue muscles (Fadiga et al., 2002; Fadiga, Craighero & Olivier, 2005). On the basis of these observations some authors (Hauk, Shtyrov & Pulvermüller, 2008; Schomers et al., 2014) attempted to establish the cognitive advantages of perceptual processing that overlaps with neural circuits devoted to motor execution, rather than having different channels for perceptual input and motor output. Efficiency in word-to-picture recognition tasks was measured while TMS pulses were applied to the areas of motor cortex representing the lip and tongue muscles. The results suggested a causal role of the motor cortex in word auditory perceptual recognition (Schomers et al., 2014).
Given that audio–vocal MNs are present in humans and songbirds, the assessment of homology should extend back to their distant ancestors. Sauropsida (reptiles and birds) and Synapsida (mammals) belong to Amniota, and originated in the Carboniferous about 310 million years ago (Coates, Ruta & Friedman, 2008) with the reptiles/birds and mammals thereafter evolving independently. Thus, in order to investigate possible homology between avian and human audio–vocal MNs, it is important to establish whether these types of sensorimotor neurons are present in corresponding locations of the brain of non-amniotes, such as amphibians. Unfortunately, the only information available for amphibians is based on single-cell recording. Furthermore, avian MNs are known only from songbirds (Prather et al., 2008), and are very likely absent in most suboscines (non-singing passerines). Suboscines do make innate calls, associated with midbrain/brainstem centres, but show no encoding or production of vocalizations acquired from conspecifics (Nottebohm, 1972). Suboscines also lack the forebrain cell clusters where audio–vocal MNs are located and that, by integrating sensorimotor signals, control song learning in oscines (songbirds) (Nottebohm, 1972).
In order to understand the evolutionary origin of MNs, it would also be of interest to determine whether mechanisms of neural mirroring mediate innate calls (e.g. alarm calls) in suboscines. However, one of the very few studies (indirectly) addressing this issue (Liu et al., 2013) failed to find sensorimotor neural matching in this area. Auditory processing occurs in auditory regions (Field L) in all non-oscine birds investigated (Gahr, 2000), and auditory and motor processing seem to be carried out by distinct cerebral regions. Although further investigations are required regarding the presence of mirror mechanisms in sub-cortical structures of non-vocal learners, structural homology of MNs located in the neocortex across synapsids and sauropsids seems unsupported.
Another possible explanation is the convergent evolution of avian and human audio–vocal MNs. Convergent evolution is the process whereby distantly related organisms independently evolve structurally similar traits as a result of occupying similar ecological niches (Stern, 2013). It results in functional analogy between even complex traits, rather than their presence due to a shared common ancestor (for example, echolocation in bats and some birds). As the gross structures of bird and human telencephalic regions (where audio–vocal MNs are located) are very different (human multi-layered composition of the cerebral cortex, type and distribution of subcortical regions, presence of segregated nuclei in songbirds, etc.), audio–vocal MNs could represent a convergent trait that evolved independently as a result of adaptation to similar ecological conditions (auditory–vocal communication).
However, some convergent traits are also modifications of primitive conditions (Shubin, Tabin & Carroll, 2009); between the avian and human telencephalon there are also astonishing similarities. To consider only aspects that are pertinent to the development of MNs: in humans, audio–vocal mirror responses are localized to Broca's area (BA44), a premotor region of the frontal cortex (Wilson et al., 2004; Pulvermüller et al., 2006), whereas in songbirds MNs are localized to the HVC, a nucleus of the pallium with premotor properties (Kozhevnikov & Fee, 2007). This sector of the avian pallium, like the corresponding frontal sector of mammalian neocortex, receives visual and auditory signals from the thalamus (Jarvis et al., 2005). It also processes the same types of sensory information as the mammalian neocortex and gives rise to important descending projections to the motoneurons of the brainstem and spinal cord involved in the voluntary control of the vocal tract and respiration. Finally, like the mammalian neocortex, the pallium serves a crucial role in song sensorimotor learning (Jarvis et al., 2005) (Fig. 2). Several researchers have highlighted functional similarities between the neurophysiological properties of the avian HVC and human Broca's area (Doupe & Kuhl, 1999; Merker & Okanoya, 2007): they are implicated in coding syringeal or mouth movements, and in high-order decoding of syllable and speech production, respectively.
From an evolutionary point of view, songbirds and mammals may thus have co-opted a similar primitive neural structure with appropriate functional characteristics for the emergence of vocal learning (Bolhuis, Okanoya & Scharff, 2010). According to this hypothesis, the development of audio–vocal MNs both in humans and songbirds could be understood as neural reuse (Anderson, 2010) or exaptation (Gould & Vrba, 1982; Pievani & Serrelli, 2011) of the primitive properties of avian and human prefrontal neurons: once the connections between prefrontal sensorimotor neurons and the brainstem became established for vocal execution, Broca's and HVC neurons may have generalized their firing response to the perception of conspecific vocal sounds.
Moreover, based on the hypothesis that similar avian and mammalian brain areas may express similar gene sets (Jarvis et al., 2013; Pfenning et al., 2014), it is possible that tissue-level similarities between Broca's area and the avian HVC are due to a common molecular pathway. Molecular studies recently revealed that one or more genes underlying a complex trait could underlie convergent evolution even across species separated by hundreds of millions of years from a common ancestor (Pfenning et al., 2014). Examples of this are echolocation in bats and cetaceans (Liu et al., 2010), the emergence of electric organs in different lineages of fishes (Zakon et al., 2006), and skin colours in different mammalian species (Arendt & Reznick, 2008).
Convergent changes in amino acid sequences between avian species and Homo sapiens have recently been reported in relation to vocal learning (Wang, 2011; Zhang et al., 2014). In addition, preliminary molecular studies performed on micro-dissected song-control nuclei and human post mortem samples report remarkably convergent gene expression between the avian RA and human laryngeal motor cortex and between Area X (a striatal nucleus) and the human putamen (Pfenning et al., 2014). A fascinating hypothesis is that common genetic regulation could underlie the premotor specialization of the pallium in songbirds and the cortex in Homo sapiens, and that it is also involved in the general emergence of neuronal activities with mirror responses. For example, some studies in songbirds show that in specific song nuclei, type-II cadherin expression in the RA switches from cadherin-7 to cadherin-6B at the transition from sensory to sensorimotor learning (Matsunaga & Okanoya, 2011; Matsunaga et al., 2011), suggesting that molecular analysis is able to uncover fine properties of different types of neurophysiological phenomena. These results are likely to be very important for understanding epigenetic regulation of sensorimotor neurons in general, and of mirror neurons in particular, and to shed light on how gene expression, brain organization and behaviour operate and evolve in conjunction. Thus, the term ‘factorial homologues’ would perhaps be better for avian and human audio–vocal MNs; factorial homology means two or more traits that are not historical homologues (because they are not derived from a common ancestor, having evolved independently from different ancestral structures), but are developmental homologues, having independently co-opted the same developmental module, i.e. the same generative gene network module (Minelli & Fusco, 2013).
Thus, novel complex adaptations may have emerged during evolution by the re-use or functional extension of homologous anatomical and genetic structures, which were shared at some point in the past (Wray & Abouheif, 1998; Shubin et al., 2009). This is implicated in the development of the matching between vocal execution and auditory feedback seen in both songbirds and humans.
IV. THE MOSAIC EVOLUTION HYPOTHESIS FOR MIRROR NEURONS
The MNS is not a single evolutionary trait, but rather is a set of interrelated traits sharing a core action–perception matching mechanism, each with an independent evolutionary history reflecting unique selective pressures, much like human language and the mammalian isocortex (Barton & Harvey, 2000; Fitch, 2012). MN evolution may then be interpreted as a case of ‘mosaic evolution’. Mosaic evolution refers to evolutionary changes that occur in some body parts or systems without simultaneous changes in other parts. In other words, complex traits may evolve at varying rates within and among species (Carroll, 1997; Minelli & Fusco, 2013). Some traits may have an ancient phylogenetic history and may be changing slowly through gradual evolution, while other parts could be phylogenetically recent and may be changing at a rapid rate. Some traits may exhibit ancient and homologous structural constraints, while others may exhibit more recent functional changes. With mosaic evolution, a major morphological and behavioural transition of a trait (for instance, bipedalism in hominins) can occur through several adaptive solutions in different species and even genera, each being a different combination of primitive and derived sub-traits.
We hypothesize that the phylogenetic emergence of MNs in disparate species indicates that during evolution different perceptual neural mechanisms have been selected for, thus producing different developmental dynamics, physiological properties and patterns of connections with other neuronal systems for each one. Indeed, MNs seem to develop with different timing and dynamics in relation to the effector (hand, mouth or vocal tract), and in relation to the species-specific environmental demands (Fig. 3).

(1) Mosaic evolution of manual gestures
The evolutionary history of hand MNs appears to be the result of a series of events in which a combination of uses, reuses, exaptations and specializations might have occurred. The existence of neurons responding to the observation and execution of hand actions may be associated with evolution of the forelimb, which in turn is related to the neural control of muscles and bones that evolved for locomotion. In arboreal primates, which move rapidly among branches, locomotion involves particularly developed reaching and grasping ability. Arboreal primates have thus evolved neural circuits for highly mobile forelimbs, which allow these animals to flex their arms, and through a highly specialized hand (and, in some cases, tail) to grasp in many planes (Schmitt, 1998; Falgairolle et al., 2006).
A widely accepted view suggests that primates evolved fine-skilled hand actions through the potentiation of a direct pathway between the premotor and motor cortex and motoneurons of the spinal tract controlling the limbs (Yuste et al., 2005; Lemon, 2008). Primates thus achieved a sort of ‘cortical dominance’, with cortical circuits controlling spinal activity to allow efficient forelimb movements (Yuste et al., 2005; Lemon, 2008). This derived and specialized trait can be dated to around 60 million years ago, to the evolution of strepsirrhines: lemurs and galagos, like the haplorhine tarsiers, can use their hands for locomotion, but also for foraging and other prehensile activities (Fragaszy, 1998). Neuroanatomical studies on prosimians confirm that they have a frontoparietal circuit, specialized for the coordination of the hand in space and connected with the spinal tract (Preuss & Goldman-Rakic, 1991b; Kaas, 2008; Lemon, 2008). Species with more complex social dynamics, such as diurnal species (e.g. Lemur catta), might have acquired mirror responses in cortical regions specialized for hand movements. Even though the social cohesion of strepsirrhines in natural environments is limited (with few exceptions) compared with anthropoid primates (Goodman, O'Connor & Langrand, 1993), they are far from being isolated. Although many lemurs and galagos are classified as ‘solitary foragers’ (Bearder, 1999), they still exhibit some level of social organization: some live in pairs (Thalmann, 2001), others show forms of gregariousness, and others live in multimale/multifemale groups (Warren & Crompton, 1997). It is possible that the enriched social life of some prosimian species might have favoured the emergence of cortical circuits specifically devoted to decoding visual information regarding the behaviour of others. Under these circumstances, parietal–premotor circuits responding to action observation could have emerged as a consequence of specific social pressures.
With the evolution of primate societies, as observed in New World monkeys (Lazaro-Perea, 2001), hand visuomotor MNs emerged as neurons related to relevant social cognitive functions. Since MN development requires coupling between neural circuits for the execution and the observation of the same actions, they may have been acquired to facilitate highly social activities, which require proximity, close social interaction and coordination, such as grooming, social foraging, defensive behaviours, and cooperative breeding (Lazaro-Perea, 2001). One pioneering study in marmosets described the presence of hand MNs through single-cell recording (Suzuki et al., 2014); we expect that neurons with similar mirror properties are present also in other New World species such as tamarins, squirrel monkeys and capuchins. Since hand visuomotor MNs are known in four species of macaque and a chimpanzee, we expect that their existence will be verified in other species of Old World monkeys (macaques, mandrills, etc.) and apes (chimpanzees, gibbons, gorillas, orangutans).
The MNS, like most neural structures, shows a high degree of plasticity. One interesting example is tool use, in which MNs expand their response to other types of actions. This finding may have important implications from an evolutionary point of view. Tool use recruits neural areas involved in execution and perception of hand actions in humans (Järveläinen, Schuermann & Hari, 2004; Rochat et al., 2010), and probably in chimpanzees (Hecht et al., 2012, 2013, 2015) and macaques (Ferrari et al., 2005; Umiltà et al., 2008). In the macaque, where single-cell recording is experimentally feasible, neurons in premotor cortex responding to a particular tool-use observation have been found to be specific to this task (Ferrari et al., 2005; Rochat et al., 2010). This response was stronger than that obtained when the macaque observed a similar action performed with a biological effector, such as the hand (Ferrari et al., 2005). These findings suggest that tool-use neurons could be the result of a plastic co-option and functional shift of pre-existing hand-grasping or hand-reaching MNs.
Further experiments with macaque tool use are consistent with this interpretation. Although these animals do not typically use tools in natural environments, after two weeks of training they are proficient at obtaining food with a rake (Ishibashi, Hihara & Iriki, 2000). Interestingly, during and immediately after this training, neuroplastic changes were observed in the parietal region, where hand MNs are located, such as macroscopic expansion of grey matter, axogenesis and synaptogenesis (Hihara et al., 2006); these changes were accompanied by elevated expression of immediate early genes (Ishibashi et al., 1999) and neurotrophic factors (Ishibashi et al., 2002a,b).
These findings suggest that when monkeys face a novel cognitive challenge such as recruiting or handling sources of food with tools, hand MNs together with canonical neurons (related to object observation and manipulation; Rizzolatti & Fadiga, 1998) and bimodal neurons (related to visual and tactile input processing: Iriki et al., 2001) in premotor and parietal cortices (Grove & Coward, 2008) undergo structural and molecular changes. Whether these plastic changes are responsible for the emergent specificity of the tool-responding MNs investigated in macaque monkeys during laboratory experiments requires careful future work.
These experiments on macaque tool use suggest that monkeys possess latent cognitive abilities (i.e. molecular plasticity underlying the reorganization of neuronal connectivity between mirror areas) that can be realized by exposure to an enabling environment (Iriki & Taoka, 2012). In human evolutionary history, tool use could have given rise to mutual interactions between individuals, groups and their environments (resulting in a process of social and ecological niche construction: Iriki & Taoka, 2012), where novel motor behaviour paved the way to new and causally correlated perceptual abilities and cognitive skills.
Thus, hand MNs are an important property of monkeys, apes and humans, inevitably and reliably resulting from the evolution of the regulatory structures of the primate brain and associated environmental niches. Explaining the origins of the stability of MNs across development in terms of natural selection is unwarranted, because the developmental plasticity of the primate brain is sufficient to account for their emergence during ontogeny. However, natural selection may have contributed to the maintenance of hand MNs (and subsequently of tool-responding MNs) during phylogeny, as they seem to be associated with relevant biological functions, such as recognition of manual gestures and imitation in a social and communicative context.
(2) Natural selection on facial coordination in neonates
In anthropoid primates (New World monkeys, Old World monkeys, apes and humans), visual preference for faces (Burrows, 2008; Leopold & Rhodes, 2010) involves a process of selective attention to specific markers, such as the eyes and mouth (Schmidt & Cohn, 2001). The mouth seems to be an important component (Kano, Call & Tomonaga, 2012). Yawning, for example, is a signal for conspecifics (Smith, 1999), smiling and lipsmacking are affiliative cues for humans and non-human primates, respectively (van Hooff, 1962; Ferrari et al., 2003), and the exposure of canines signals a low-grade threat (Hadidian, 1980). In primates' natural environments, where faces seem to be salient stimuli for a variety of interactions including conflict resolution, territory defence, sexual signals, parent–offspring interactions, social integration and communication (Bradbury & Vehrencamp, 1998), it is plausible that face-to-face interactions tune, through processes of Hebbian learning and synaptic competition, sensorimotor circuits for facial and mouth proprioception and motor coordination on the one hand, and specialized circuits for perceiving and interpreting others' facial signals on the other (Schmidt & Cohn, 2001).
Since most of the parietal–frontal circuits controlling hand and mouth movements have a similar frontoparietal pattern of connections in Old World monkeys and prosimians, it is possible that in prosimians some forms of communication based on facial displays might rely on these circuits that operate through neural mirroring in similar ways to macaques, chimpanzees and humans. Although still speculative, it is possible that dyadic communicative episodes, such as the relaxed open-mouth display used as a play signal in Lemur catta (Palagi, Norscia & Spada, 2014), may have led individuals of those species to activate frontoparietal circuits for facial and mouth movements (Preuss & Goldman-Rakic, 1991b; Stepniewska, Fang & Kaas, 2005) during face perception, producing the first forms of mirror-like action–perception coupling in the primate brain. Populations of neurons responding both to the execution and observation of specific mouth actions may thus have evolved in individuals of some social and diurnal strepsirrhine species, in contrast to nocturnal species that rely more on olfactory cues, and who possess primary motor and premotor areas very similar to those of anthropoid primates (Preuss & Goldman-Rakic, 1991a,b; Kaas, 2008).
We speculate that mechanisms of perceptual–motor coupling related to face- and mouth-movement processing might have emerged by 60 million years ago in some early primates, such as prosimians or strepsirrhines, which include Lemuroidea and Lorisoidea. Some of these species have a low but significant level of face-to-face interactions that is translated into a grouped organization and intraspecific communication (Klopfer & Boskoff, 1979; Kappeler & van Schaik, 2002), for example the performance of playful facial signals (Palagi, 2009).
In most prosimians complex communicative signals based on face-to-face exchanges are observed primarily during late adolescence and adulthood (and not during earlier postnatal periods) where males and females perform relatively stereotyped play, grooming and reproductive behaviour (Doyle, 1979). Indeed, parents of galagos, lemurs and tarsiers spend only short periods with their infants, who mature precociously (Klopfer & Boskoff, 1979). Given that prosimian mothers carry their babies by mouth or on the mother's back, the number of mother–infant face-to-face interactions is likely to be low and the potential for facial gesture exchanges will be lower than in anthropoid primates (Klopfer & Boskoff, 1979). These observations (and the degree of neoteny in non-human primate species) may have important implications for brain development, and more specifically for how cortical and subcortical networks evolved in order to sustain complex social interactions based on facial gesturing. Although prosimians may develop mouth MNs during adolescence or adulthood, it seems unlikely that these species develop mouth visuomotor mirror responses in the early phases of development.
Increasing social demand (i.e. increased group size and social complexity) that occurred during the transition between ancestral prosimians and the lineage leading to the modern anthropoid primates about 40 million years ago (Shultz & Dunbar, 2007; Dunbar, 2010) may have favoured individuals efficient in coordinating their own facial and mouth movements in response to facial gestures of conspecifics, including caregivers, rivals, and companions. Changes in social niches, such as the growth of multilevel societies and more complex dynamics of parental and social bonding (Shultz & Dunbar, 2007; Dunbar, 2010), may have increased this selective pressure on anthropoid primates' facial recognition and expression capacities underpinned by the MNS.
Thus, the increasing neoteny and a progressively more demanding social niche may have produced a selective pressure on individuals to be more efficient in intraspecific communication (Dunbar & Shultz, 2007; Shultz & Dunbar, 2007), favouring the coordination of dyadic facial events, such as those occurring during precocious affiliative communicative situations (Ferrari et al., 2009). Consistent with this interpretation, anthropoid primates with extensive face-to-face interactions in neonatal and early postnatal developmental periods exhibit some of the most intricate facial displays and most complex facial musculature among all mammals (Burrows, 2008).
During primate phylogeny, individuals may thus have gained fitness by solving the so-called correspondence problem or perceptual-motor translation problem (Meltzoff & Decety, 2003; Brass & Heyes, 2005; Heyes, 2010), which refers to the capacity to respond coherently with one's own facial expression to that of another individual, without directly being able to observe one's own face. The early development of mouth visuomotor MNs might thus have been experientially canalized (Giudice et al., 2009; Ferrari et al., 2013) during the evolutionary history of primates, leading to preadaptation of neural circuits associated with mouth observation to code for hand visual processing. As explained above, mouth mirror mechanisms seem to be functional at birth, as they are probably crucial in neonatal imitation in monkeys, apes and humans (Meltzoff & Moore, 1977; Myowa-Yamakoshi et al., 2004; Ferrari et al., 2006, 2012; Simpson et al., 2014). Although coupling between perceptual cues and motor activation is not fully developed and refined at birth, wiring between the frontoparietal and temporal sensorimotor regions (Smyser, Snyder & Neil, 2011) and newborn sensitivity to visual facial cues might provide the neurobiological basis upon which neonates rely to associate their facial motor output with mouth visual input during interactions with the caregiver (Soussignan et al., 2011; Ferrari et al., 2013).
Current data on cortical development suggest that genetic, developmental and environmental factors (Provençal et al., 2012) might operate in concert to achieve, progressively, a fully functional MNS that continues to expand, reorganize and connect through associative processes and cultural demands during the lifespan of the individual (Ferrari et al., 2013). More generally, experiential canalization produces a situation where the output is stable despite subtle changes in input or developmental trajectory (Miller, 1989; Gottlieb, 1991; Jablonka & Lamb, 2002; Dor & Jablonka, 2010).
Therefore mirror responses related to face and mouth actions may have already emerged in frontoparietal cortex of adult prosimians. However, we suggest that only in anthropoid primates did the increasing social selective pressures, neoteny and a progressively more interactive nature of the mother–infant relationship result in the evolution of mouth mirror responses at birth, or even during the neonatal period of development (Tramacere & Ferrari, 2016) [for an alternative view see Brass & Heyes, 2005; Cook, Johnston & Heyes, 2013].
(3) Developmental plasticity in vocal communication
Audio–vocal MNs may be the result of an evolutionary process in which ancestral amniote sensorimotor circuits controlling innate calls were extended by the establishment of corticospinal connections for the execution and perception of learned vocalizations. In the majority of birds and mammals, however, vocal behaviour is mostly innate, spontaneous and localized in the ‘paleoencephalon’ (Nottebohm, 1972; Reiner et al., 2004). The evolutionary emergence of vocal learning circuits is limited to some species of birds (songbirds, parrots, hummingbirds), cetaceans and Homo sapiens (Nottebohm, 1972; Jarvis et al., 2005), and is thought to involve the projection of a neural pathway from the premotor and motor cortex to the brainstem (Fig. 2). As explained in Section 10.3, audio–vocal mirror mechanisms emerged in the avian and human premotor regions (HVC and Broca's area, respectively) as neurons related to executive and perceptual vocal coding.
The phenomenon of imitation is central to vocal learning: forming a model of the heard vocal sound is in fact crucial to the development of song or spoken language (Jarvis, 2004). In addition, auditory feedback is important because it is instrumental for the evaluation of the vocal skills achieved by the speaker (Konishi, 1965). Thus, it is plausible that neural matching between conspecifics' auditory input and vocal output is responsible for the incorporation of these types of MNs into the first form of vocal learning in songbirds and primates.
The distinction between vocal and non-vocal learners is blurred due to plasticity. In several species of vertebrates (e.g. mice, frogs, reptiles, birds) calls are highly stereotypic and associated with emotional drives (Jarvis, 2004; Feng et al., 2006; Kikusui et al., 2011). They are under the control of subcortical brain structures, which allow little to no voluntary control (Nottebohm, 1972). While monkeys were previously thought to have a similar system (Jürgens, 2002), more recent studies reveal that their vocalizations and the capacity for voluntary control are more nuanced (Coudé et al., 2011; Fogassi, Coudé & Ferrari, 2013; Hage & Nieder, 2013). Indeed, monkeys can be trained to modify and control their vocalizations (Coudé et al., 2011; Hage & Nieder, 2013). Macaques can learn to achieve a significant level of vocal control and to emit a voluntary vocalization following vocal operant conditioning (Coudé et al., 2011). After such training, neurons were activated specifically during the production of voluntary vocalizations.
Many songbird species show strong sexual dimorphism, with sexual selection and female choice, so that only males vocalize and develop the nuclei necessary for the emergence of songs (Nottebohm, 1972). Nevertheless, females treated with hormones such as oestrogens and testosterone can be conditioned to undergo the sensorimotor phase responsible for learning complex songs (Nottebohm, 1980). These observations suggest that vocal learning is not an on–off mechanism during development, but that it requires developmental construction (Petkov & Jarvis, 2012). Studies in birds show that the biological substrates underlying vocal or non-vocal learning abilities are not clear-cut; there are latent cognitive mechanisms related to conditioned vocal behaviour that can be triggered by the proper cues (learning conditions) and/or physiological influences (hormonal regulation).
Although vocal learning is partly a plastic behaviour, because specific exogenous or endogenous environmental conditions can activate the neural circuits associated with it, the difference between vocal and non-vocal learners cannot be reduced simply to a matter of plasticity. Indeed, there is a difference between the development of vocal learning through conditioning and the ability to acquire it fluently without reward. While the former can be seen as the capacity to make limited modifications of presumably innate vocalizations, the latter seems to be the result of evolved brain (and morphological) mechanisms that support the rapid accurate reproduction of vocal variants (Petkov & Jarvis, 2012; Arbib, 2013).
V. CONCLUSIONS
(1) MNs are discrete populations of neurons responding to the execution and perception of the same actions and forming a network of interconnected neural areas (the MNS) in the frontoparietal regions of the brain. Different categories of MNs can be described according to the effector performing and the sensory channel perceiving the mirrored actions: audio–vocal MNs, hand visuomotor MNs, and mouth visuomotor MNs. In each of these, the mirror response (i.e. the matching between perception and action) is consistent, suggesting an evolutionarily stable basis to the trait. Nevertheless, functions, locations, sensory modalities and developmental trajectories of the various types of MNs differ (both at interspecific and intraspecific levels), so the evolution of MNs as a single trait seems unlikely.
(2) Audio–vocal MNs seem to be the result of convergent evolution between songbirds and humans. In both groups, vocal learning produces neural circuits devoted to auditory feedback in the motor system, such as neurons activated both during vocalization and listening. However, further analyses of the pathways in which these MNs are located suggest that they might represent factorial homologous traits, and therefore that audio–vocal MNs may be the result of the expression of similar molecular regulation associated with sensorimotor regions. Careful research, in line with recent pioneering neurogenetics studies in songbirds (see Fishell & Heintz, 2013; Scharff & Adam, 2013), is needed to verify this possibility.
(3) Because of their similar cerebral location in frontoparietal circuits across primate species, and because similar neural connections for the control of hand and mouth movements are present in early primates, including prosimians, we propose that hand and mouth visuomotor MNs are the result of classical homology. This homology reflects inheritance from a common ancestor who possessed a specific pattern of connectivity between specific cerebral areas giving rise to the MNS (Arbib & Bota, 2003). Characters might be considered homologous on the basis of their structural correspondence and if the more parsimonious interpretation is that they evolved once during phylogeny (Striedter & Northcutt, 1991).
(4) Computational models, plus behavioural and neurophysiological evidence related to the developmental properties of MNs (Keysers & Perrett, 2004; Catmur et al., 2007; Bonaiuto & Arbib, 2010; Ferrari et al., 2013) suggest that primate hand MNs could be considered homologous at the developmental level as well. In this case, the phylogeny of MNs can be described as developmentally or epigenetically homologous, reflecting ontogenetic patterns that tend to reappear reliably during each individual ontogenetic process (Striedter, 1998). Emphasizing the developmental similarity of a homologous trait may sound unnecessary, but in the animal kingdom it is not rare to find structurally homologous traits that exhibit different developmental trajectories, i.e. originate from different embryonic germ layers (Jenkinson, 1913; Striedter, 1998).
(5) The integration of classical and developmental homology in the primate MNs/MNS has been used to explain both observed developmental plasticity (Heyes, 2010; Ferrari et al., 2013) and phylogenetic continuity (Rizzolatti & Arbib, 1998; Casile, Caggiano & Ferrari, 2011) of this adaptive set of traits. This is compatible with a hypothesis that focuses on the role of epigenetic processes underlying developmental plasticity and canalization processes (Giudice et al., 2009; Ferrari et al., 2013) in the development and evolution of the primate MNS.
(6) Despite remarkable similarities at the structural and neurophysiological levels, the mouth and hand MNSs seem to show different timing in their developmental maturation. The likely involvement of mouth visuomotor mirror responses in neonatal imitation suggests that the mouth MNS may emerge more precociously than the hand MNS. In macaques, mirror mechanisms related to mouth actions have been detected during the first postnatal days of life, while the observation of hand actions produces activation of sensorimotor regions only after a week of postnatal development (Casile et al., 2011). A possible explanation for this difference in the ontogenetic trajectory of mouth and hand MNs is that mouth mirror mechanisms may have undergone a stronger selective pressure in the context of the increasing social demands experienced by anthropoid primates about 30 Mya, likely involving strengthening of mother–infant relationships and face-to-face interactions (Dunbar, 2010; Casile et al., 2011; Tramacere & Ferrari, 2016).
(7) The development of mouth mirror mechanisms associated with facial perception may be interpreted as a case of experiential canalization and be instrumental to an understanding of the different (endogenous and exogenous) factors that influence facial gesture perception/reproduction in primate newborns. This may explain the variety of both positive and negative results on neonatal facial imitative responses (Oostenbroek et al., 2016): in both humans and macaques, approximately 50% of individuals reliably imitate facial gestures, likely as a result of endogenous variation amongst infants in mirroring mechanisms. Analysis of the early developmental environments of different species may provide new avenues for research on the perception of mouth gestures and its neural and molecular basis, both in primate and prosimian neonates.
(8) This review offers a new perspective suggesting hypotheses regarding MN evolution, and also widens the heuristic potential for predicting the circumstances under which specific variations in MN activity are expected. Such predictive value is critical to testing new hypotheses about MN activity and its plastic changes, depending on the species, the neuroanatomical substrates, and the ecological niche.
(9) Our mosaic evolution hypothesis for the MNS predicts that different species may recruit a wider network of neural areas in relation to the evolutionary/social pressures they encountered as they evolved. Indeed, the comparison between human and macaque hand MNs seems to suggest multiple neural reuse, giving rise in humans (and possibly in apes) to the emergence of a specific class of MNs selective for tool actions (i.e. tool-response MNs). As some species of macaque are now known to have greater abilities in tool-manipulative actions than previously thought (Gumert & Malaivijitnond, 2012), this new approach may stimulate additional investigations on the role of MNs in the evolution of tool use and associated cognitive traits, such as emulation/imitation of complex manual gestures (Iriki, 2006; Arbib et al., 2009; Iriki & Taoka, 2012).
(10) Finally, the comparison between MNs related to perception and execution of vocalizations in songbirds and humans might suggest interesting evolutionary hypotheses. Although there are differences between song and speech development in songbirds and humans, the presence of common mechanisms of neural audio–vocal matching may support hypotheses on the role of MNs in the evolution of communication. For example, given that audio–vocal MNs have been proposed to function in speech recognition in humans (Pulvermüller et al., 2006), the lack of vocal learning in females of songbirds may clarify possible differences between purely auditory and sensorimotor perception of communicative cues.
(11) Considering the occurrence of different rates of evolutionary change related to the emergence of various categories of MNs in different species and different cerebral regions, we interpret the MNS as resulting from mosaic evolution. The mosaic evolution hypothesis posits that in order to understand how the mirror cognitive system emerges, a confluence of multiple mechanisms is necessary, each having different precursors. We have suggested different evolutionary trajectories for the various categories of MNs, each with unique adaptations (specific cortical areas, i.e. premotor, parietal, temporal cortices), exaptations (functional shift during phylogeny), environmental challenges, and timing in developing novelties (i.e. tool-responding mirror circuits). Note, however, that this division of MNs/MNS into sub-traits does not mean that these different sub-traits have not interacted dynamically during primate history, possibly giving rise to new emergent cognitive skills. Rather, the comparative analysis of the interaction between different categories of MNs may be helpful to the understanding of the evolution of higher cognitive skills, in particular that of human language, which functions as a multilevel system involving hand, vocal and orofacial gesture perception and execution (Tramacere & Moore, in press). A comparison between avian and primate audio–vocal MNs, in addition to analysis of the possible phylogenetic trajectory of hand and mouth visuomotor MNs in primates, could facilitate further discussion on the evolution of communication within an MN perspective.
VI. ACKNOWLEDGEMENTS
The work was supported by NIH P01HD064653. We thank Francesco Suman, Elisabeth Simpson, Alessandro Minelli, Eva Jablonka and Richard Moore for suggestions and criticism.