Diversity, distribution, and evolutionary history of the most studied African rodents, multimammate mice of the genus Mastomys: An overview after a quarter of century of using DNA sequencing
Contributing authors: Alexandra Hánová, Adam Konečný, Ondřej Mikula, Anna Bryjová, Radim Šumbera
Abstract
Despite the importance of rodents as agricultural pests and reservoirs of zoonoses, the taxonomy and evolutionary history of many groups are still not sufficiently understood. The genus Mastomys (multimammate mice or rats) comprises abundant and intensively studied rodents, widespread across sub-Saharan Africa. Here, we used an extensive dataset of mitochondrial DNA markers comprising nearly 2700 individual sequences from 30 African countries to update the information about the geographical distribution of their genetic diversity. In the next step, we sequenced complete mitogenomes and six nuclear markers, and produced anchored phylogenomic data (355 loci). With these data we, for the first time, sufficiently resolved the phylogenetic relationships among all extant Mastomys species and reconstructed their evolutionary history. The results suggest eight species of Mastomys occupying various non-forested environments. Some species are very widespread (Mastomys natalensis, Mastomys kollmannspergeri, and Mastomys erythroleucus; for the latter we provide the first records from Tanzania, thus significantly extending its distribution), while others have their distribution restricted to particular geographical areas (Mastomys coucha in the South African region, Mastomys awashensis in Ethiopia, and Mastomys angolensis in Angola and southern DRC) or to a particular habitat, that is, wetlands in western (Mastomys huberti) or southwestern (Mastomys shortridgei) Africa. The first split, separating M. angolensis (with only five pairs of mammae) from the remaining multimammate taxa, occurred in the mid-Pliocene, but the most intensive radiation occurred in the mid-Pleistocene and was likely driven by the intensification of climate oscillations. The resolved phylogeny of Mastomys will facilitate their further use as model taxa, for example, in understanding proximate mechanisms of evolution of the multimammate phenotype.
1 INTRODUCTION
Murid rodents (Rodentia: Muridae) represent one of the evolutionarily most successful groups of mammals. Their phylogenetic relationships and evolutionary history are relatively well known owing to recent analyses of large multilocus genetic datasets and molecular clock calibrations based on multiple paleontological records (Aghová et al., 2018; Steppan & Schenk, 2017; Upham et al., 2019). It is generally agreed that the family is divided into five subfamilies, with Murinae being the most species-rich (656 species are recognized in Wilson et al., 2017). Within this subfamily, 15 tribes have been delimited, five of which (Otomyini, Arvicanthini, Malacomyini, Murini, and Praomyini) are indigenous to sub-Saharan Africa (Denys et al., 2017; Lecompte et al., 2008). After the most diverse African tribe, Arvicanthini (ca. 100 species; Mikula et al., 2021), the tribe Praomyini represents the second most speciose group of rodents in sub-Saharan Africa, where all but two of its species occur (Denys et al., 2017). Praomyini are widely distributed in various ecosystems; their populations are often very abundant and have immense significance to humans as agricultural pests or reservoirs of zoonoses. Eight genera have been traditionally recognized in this clade, based mainly on external and skull morphology (Colomys, Heimyscus, Hylomyscus, Mastomys, Myomyscus, Praomys, Stenocephalemys, and Zelotomys; Denys et al., 2017; Musser & Carleton, 2005; but see important recent taxonomic changes by Nicolas et al., 2021). Very recently, Giarla et al. (2021) performed the first genetic analysis of the enigmatic Nilopegamys, an Ethiopian endemic genus known only from the holotype and probably extinct (considered as “incertae sedis” in Denys et al., 2017), and showed that it is a sister genus to Colomys. The alpha-taxonomy of praomyine genera is only partially resolved. Recent application of integrative taxonomy approaches has revealed cryptic and undescribed diversity in several groups (e.g., Giarla et al., 2021; Mizerovská et al., 2019, 2020; Nicolas et al., 2020), and the number of named species of Praomyini will likely rise further in the near future.
The genus Mastomys is probably the most widespread and abundant group of rodents in Africa. It occurs primarily in the sub-Saharan part of the continent, with isolated populations in the eastern Sahara (Chad and Sudan) and Morocco (Monadjem et al., 2015). The species of this genus prefer savannah-like habitats, and they likely profited from the aridification and opening of the African landscape since the Pliocene (e.g., Lecompte, Denys, et al., 2005; Nicolas et al., 2021). Their populations fluctuate significantly and often reach very high densities (Leirs et al., 1997), which makes them important agricultural pests and reservoirs of zoonoses, for example, Lassa mammarenavirus (Begon et al., 1999; Lecompte et al., 2006; Taylor et al., 2008). They are also very useful model taxa for fundamental biological research, for example, in speciation, population genetics, and phylogeography (Brouat et al., 2007, 2009; Colangelo et al., 2013; Dobigny et al., 2010), host–parasite interactions (Brouat & Duplantier, 2007), virus evolution and epidemiology (Cuypers et al., 2020; Fichet-Calvet et al., 2007; Goüy de Bellocq et al., 2020; Gryseels et al., 2017; Lalis et al., 2012; Lecompte et al., 2006), or developmental biology (Hardin et al., 2019). Knowledge of their species diversity, geographical distribution, and evolutionary history is therefore crucial for addressing particular questions when using Mastomys as a research model. Unfortunately, most species in this genus are morphologically cryptic (e.g., Lecompte, Brouat, et al., 2005), and information based on genetically identified specimens is still insufficient or absent from a large part of the distribution range of the genus.
Until very recently (see Nicolas et al., 2021), eight species of Mastomys were recognized: Mastomys natalensis, Mastomys erythroleucus, Mastomys kollmannspergeri, Mastomys huberti, Mastomys shortridgei, Mastomys coucha, Mastomys awashensis, and Mastomys pernanus (e.g., Granjon et al., 1997; Happold, 2013; Monadjem et al., 2015; Musser & Carleton, 2005; Wilson et al., 2017). Individual species of the genus were originally distinguished by protein electrophoresis (Duplantier et al., 1990), but mainly by karyotypes (e.g., Britton-Davidian et al., 1995; Duplantier et al., 1990; Volobouev et al., 2001; reviewed in Granjon et al., 1997). Later, it was found that different cytotypes can be distinguished by mitochondrial sequences (e.g., Lecompte, Brouat, et al., 2005), which allowed DNA barcoding of numerous samples across Africa. Most of the previous DNA sequence-based studies of Mastomys dealt either with individual species (e.g., phylogeographic single-taxon studies of Brouat et al., 2009; Colangelo et al., 2013; Dobigny et al., 2008; Mouline et al., 2008) or with multiple species in geographically limited areas (northern Cameroon and Chad: Dobigny et al., 2010; southeastern Senegal: Brouat et al., 2007; Ethiopia: Martynov et al., 2020). Distribution ranges of multiple Mastomys species often overlap, so they can live in syntopy or habitat micro-allopatry (Brouat et al., 2007; Dobigny et al., 2008; Duplantier & Granjon, 1988; Martynov et al., 2020). Surprisingly (and despite the practical importance of the genus Mastomys), most previous phylogenetic and phylogeographic studies are based solely on mitochondrial sequences of cytochrome b (CYTB). Furthermore, even studies with the best taxon sampling (Colangelo et al., 2010; Dobigny et al., 2008; Eiseb et al., 2021) did not include representatives of all recognized Mastomys species (sensu Nicolas et al., 2021), and the evolutionary relationships among individual species have never been sufficiently resolved.
The last thorough review of the systematics of the genus was published approximately a quarter of a century ago (Granjon et al., 1997). It was based primarily on the analysis of karyotypes and provided the first solid ideas about the species richness and geographical distribution of Mastomys taxa. Since then, numerous datasets based mainly on CYTB barcoding have been produced and have significantly improved our knowledge of the diversity of the genus (see references above). Recently, Nicolas et al. (2021) reconstructed the phylogeny of Praomyini using material from all major evolutionary lineages of the tribe (except Nilopegamys) and genomic-scale data. This most complete and well-resolved phylogeny of the tribe has important implications for the taxonomy of the group, especially for the species content of particular genera. Two important changes concerned the genus Mastomys. It became clear that M. pernanus is not a Mastomys (in agreement with previous suggestions, for example, Lecompte, Denys, et al., 2005), and a separate generic name, Serengetimys, was proposed for this taxon. Conversely, the species “angolensis” from Angola and southern DRC (Krásová et al., 2021), previously included in Myomyscus (e.g., Denys et al., 2017), unequivocally clustered with other Mastomys, keeping the number of recognized Mastomys species at eight.
In this study, we first assembled all published and georeferenced CYTB sequences of the genus Mastomys and combined them with numerous unpublished sequences collected during our own field research over the last two decades. The resulting alignment, composed of 2693 CYTB sequences from all eight currently recognized species (sensu Nicolas et al., 2021), was used for updating the distribution of all species and intraspecific mitochondrial clades across Africa. Besides CYTB, we assembled complete mitogenomes of all species and reconstructed the mitochondrial phylogeny of the genus. In the next step, we produced a species tree of all Mastomys species based on nuclear markers (either six Sanger-sequenced genes or tens to hundreds of loci from the so-called anchored phylogenomics approach; Lemmon et al., 2012). Finally, based on both mitochondrial and nuclear phylogenetic reconstructions, fossil-based calibration of the molecular clock, and a review of cytogenetic data and the distribution of each species and intraspecific lineage, we propose an evolutionary scenario for this important genus of African rodents.
2 MATERIALS AND METHODS
2.1 Sampling
This study is based on 2699 individuals of the genus Mastomys from 526 localities in 30 countries across the distribution of the genus (Figure 1, Table S1). Of these, 2660 were genotyped at CYTB: 566 sequences were produced within this study by the authors and their collaborators, and 2094 were downloaded from GenBank (many of them also genotyped by the authors) and from the African Mammalia database (Van de Perre et al., 2019). The remaining 39 specimens were reliably identified to species based on karyotypes or short unpublished CYTB sequences referenced in particular papers (see Table S1), and they were used only to update the species distributions. In total, the alignment of 2660 sequences represents the most comprehensive available genetic dataset of all currently recognized Mastomys species (sensu Nicolas et al., 2021), and to our knowledge, it is the largest genetic dataset (in number of included specimens) of any African mammal genus. New tissue samples analyzed in this study were collected by the authors and their collaborators during field expeditions over the last 20 years. All captured animals were dissected, and pieces of tissue (heart, lung, spleen, kidney, muscle, or tail) were stored in 96% ethanol until DNA extraction. All fieldwork complied with legal regulations in the respective African countries, and sampling was carried out in accordance with local legislation (see Acknowledgements).

2.2 Genotyping
DNA was extracted from ethanol-preserved tissues using the commercial Exgene Tissue SV plus kit (GeneAll) following the manufacturer's instructions. Primers L14123 and H15915 (Lecompte et al., 2002) were used to amplify the whole CYTB gene by polymerase chain reaction (PCR). After performing a pilot phylogenetic analysis of mtDNA, we selected 50 individuals, representing each major mitochondrial clade of Mastomys (with geographical coverage as wide as possible), for reconstruction of the nuclear phylogeny using six nuclear markers: exon 1 of the interphotoreceptor retinoid-binding protein gene (IRBP), recombination activating gene 1 (RAG1), intron 7 of the 24-dehydrocholesterol reductase gene (DHCR), intron 9 of the smoothened homolog gene (SMO), intron 4 of the glutamine-dependent NAD synthetase gene (NADSYN), and the glucosaminyl (N-acetyl) transferase 4 homolog gene (GCNT4). For more detailed information on the markers used, including primer sequences and PCR protocols, see Tables S2 and S3. All new sequences were submitted to GenBank under accession numbers OK646652–OK647238 (CYTB), OK512762–OK512809 (RAG1), OK357341–OK357392 (IRBP), OK357290–OK357340 (DHCR), OK357393–OK357442 (SMO), OK512810–OK512858 (NADSYN), and OK647239–OK647289 (GCNT4) (see Table S1 for more details). In addition, we produced a genomic-scale dataset (one individual per species; 355 loci) for all eight named Mastomys species by the anchored hybrid enrichment approach and high-throughput Illumina sequencing of enriched libraries (Lemmon et al., 2012). Probe design and data collection were performed by the Center for Anchored Phylogenomics (www.anchoredphylogeny.com). More details about the laboratory protocols and bioinformatic pipeline are provided in Mikula et al. (2021). The same raw reads were also used to assemble complete mitogenomes of all eight Mastomys species (one individual per species; 14,991 bp; for more details, see Nicolas et al., 2021).
2.3 Mitochondrial diversity and phylogeny
Newly acquired CYTB sequences were edited in Geneious 9.1.5 (Kearse et al., 2012) and aligned together with all available Mastomys sequences from GenBank in AliView 1.26 (Larsson, 2014). First, we reconstructed the phylogeny of 1169 long (>700 bp) haplotypes found among all 2660 CYTB sequences. We used maximum likelihood (ML) inference in RAxML 8.2.12 (Stamatakis, 2014) with the GTR + G substitution model. The robustness of the nodes was evaluated by rapid bootstrapping with 1000 replications. Phylogenetic placement of the remaining 1491 sequences was established by their identity to the long haplotypes or according to the nearest neighbor criterion. Identification and classification of haplotypes were performed in R (R Core Team, 2021) using the package ape (Paradis & Schliep, 2019). Alternatively, we identified haplotypes for each species in DnaSP 5 (Librado & Rozas, 2009) and constructed median-joining haplotype networks in Network 10.2.0.0 (Bandelt et al., 1999). The whole dataset for haplotype networks consisted of 2313 sequences from all eight species (that is, very short sequences were removed), and the networks were constructed separately from the following alignments: Mastomys angolensis (23 seqs/645 bp), M. awashensis (33 seqs/645 bp), M. coucha + M. shortridgei (31 seqs/645 bp), M. erythroleucus (211 seqs/645 bp), M. huberti (37 seqs/645 bp), M. kollmannspergeri (23 seqs/645 bp), and M. natalensis (1955 seqs/669 bp). The combined output of the ML analysis of the complete dataset, the haplotype networks, and the distributional data was used for the delimitation of 20 molecular operational taxonomic units (MOTUs; see individual assignments in Table S1). Each MOTU represents a species or an intraspecific lineage that was monophyletic in at least one of the analyses and whose geographical distribution is largely parapatric to other conspecific MOTUs. The only exception is the pair of species M. coucha and M. shortridgei; they are differentiated by ecology, karyotypes, and morphology (Eiseb et al., 2021), but their mtDNAs do not form monophyletic clusters.
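The placement of shorter sequences by identity or by the nearest neighbor criterion can be illustrated with a minimal Python sketch. It is not the R/ape workflow used in the study: the function names, the toy sequences, and the choice of an uncorrected p-distance computed only over sites resolved in both sequences are assumptions made for illustration.

```python
import math

def p_distance(seq_a, seq_b):
    """Uncorrected p-distance over aligned sites where both sequences have A, C, G, or T."""
    valid = set("ACGT")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in valid and y in valid:
            compared += 1
            differences += x != y
    # No shared resolved sites: the distance is undefined.
    return differences / compared if compared else math.nan

def place_short_sequence(short_seq, long_haplotypes):
    """Return (name, distance) of the closest long haplotype.

    A distance of 0.0 corresponds to identity over the shared sites;
    ties are resolved in favor of the first haplotype encountered.
    """
    best_name, best_dist = None, math.inf
    for name, seq in long_haplotypes.items():
        dist = p_distance(short_seq, seq)
        if not math.isnan(dist) and dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist

# Hypothetical aligned toy data (gaps mark unsequenced positions).
long_haplotypes = {"hapA": "ACGTACGTAC", "hapB": "ACGTTCGTTC"}
print(place_short_sequence("--GTTCGT--", long_haplotypes))
```

In this toy case the short fragment is identical to one long haplotype over the shared sites, so it would inherit that haplotype's clade assignment.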
In the next step, we selected 60 individuals (three specimens per MOTU, covering their geographical distribution and genetic diversity as much as possible) and used both ML and Bayesian inference (BI) approaches for the reconstruction of the mitochondrial CYTB phylogeny of the genus. The alignment was partitioned according to codon position, following the results from PartitionFinder2 (Lanfear et al., 2016). The nucleotide substitution model was GTR + G, and its parameters were unlinked between partitions. ML analysis was performed in RAxML 8.2.12 (Stamatakis, 2014), and the robustness of the nodes was evaluated by rapid bootstrapping with 1000 replications. BI was performed in MrBayes 3.2.7a (Ronquist et al., 2012), using 5,000,000 generations and 25% burn-in. Convergence of the posterior distribution sampling was assessed in Tracer 1.7.0 (Rambaut et al., 2018) by visual comparison of four independent Markov Chain Monte Carlo runs. As outgroups for rooting the tree, we included five species from the tribe Praomyini: Praomys rostratus (GenBank accession number EU053828), Zelotomys hildegardeae (KY754181), Serengetimys pernanus (AF518343, OK646705, OK646706, OK646726–OK646730, OK647171), Stenocephalemys albipes (MH297581), and Stenocephalemys sokolovi (MF685484); and three species from the tribe Arvicanthini: Aethomys chrysophilus (OK646817), Lemniscomys zebra (MT968081), and Arvicanthis niloticus (AF004572). Mean p-distances among MOTUs and species were calculated in MEGA 7.0.18 (Kumar et al., 2018) from the same dataset that we used for the CYTB phylogenetic analysis, that is, 60 sequences covering all species and their intraspecific variability.
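To make the mean p-distance between two groups concrete, the following hedged Python sketch averages the pairwise distances over all between-group sequence pairs. It is an illustration only and does not reproduce MEGA; the function names, the toy sequences, and the pairwise treatment of unresolved sites are assumptions.

```python
from itertools import product

def p_distance(seq_a, seq_b):
    """Proportion of differing sites among positions resolved (A/C/G/T) in both sequences."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("sequences share no resolved sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def mean_between_group_distance(group_1, group_2):
    """Average p-distance over every pair with one sequence from each group."""
    distances = [p_distance(a, b) for a, b in product(group_1, group_2)]
    return sum(distances) / len(distances)

# Hypothetical toy groups standing in for two MOTUs.
motu_x = ["ACGTACGTAC", "ACGTACGTAA"]
motu_y = ["ACGAACGTTC", "ACGAACGTTA"]
print(round(mean_between_group_distance(motu_x, motu_y), 3))
```

Within-group pairs are deliberately excluded, so the value reflects divergence between the groups and not variability inside them.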
Because phylogenetic relationships in the CYTB tree were only partially resolved (see Results), we performed phylogenetic analyses using complete mitogenomes of all eight species. Sequences were individually aligned in Geneious using MAFFT v.7.308. Protein-coding genes (13 genes), genes for non-coding RNA (2 ribosomal RNA and 22 transfer RNA), and the control region (CR) were annotated in Geneious according to the reference complete mtDNA of M. coucha (GenBank MF062946). For the phylogenetic analysis, sequences of the CR and the ND6 gene were excluded (see Mikula et al., 2021 for more details). All sequences were deposited in GenBank, and accession numbers are listed in Table S1. The ML phylogenetic reconstruction of the concatenated dataset (alignment of 14,991 bp) was performed in IQ-TREE v. 2.1.1 (Nguyen et al., 2015). This analysis included the selection of substitution models (Kalyaanamoorthy et al., 2017), which also merged 61 predefined partitions into just four (see Data S1–S18). One sequence of S. sokolovi (LAV1947; GenBank MT408179) was used as an outgroup to root the tree. The ultrafast bootstrap approximation (UFBoot2; Hoang et al., 2018) was used for evaluating clade support. The Bayesian reconstruction of the mitochondrial phylogeny was performed in MrBayes using the partitions and substitution models selected in IQ-TREE. Its result was summarized by the outgroup-rooted maximum clade credibility tree with median common ancestor node heights and clade supports quantified by their posterior probabilities (PP).
2.4 Concatenated nuclear data analysis, species trees, and divergence dating
The partitioned ML analysis of the concatenated alignment of six nuclear fragments (in total 3760 bp) genotyped in 50 ingroup individuals was performed in IQ-TREE as described above. We a priori specified separate partitions for each locus and in IRBP and RAG1 also for each codon position, but the model selection procedure merged them into just two (see Data S1–S18). Three concatenated multilocus sequences of S. pernanus (formerly included in Mastomys; Nicolas et al., 2021) were used as outgroups.
The species tree was estimated using the multispecies coalescent model as implemented in STARBEAST2 (Ogilvie et al., 2017), using the same dataset, that is, six loci in 50 specimens. Each individual was assigned to a species according to the results of the mtDNA phylogeny (plus ecological and karyotypic data in the case of M. shortridgei/M. coucha; see Table S1). Alignments for each of the six fragments were imported into BEAUTI v.2.4.6, where separate and unlinked substitution, clock, and tree models were set. We applied a strict molecular clock and the HKY substitution model (Hasegawa et al., 1985) for each locus. As we did not expect substantial missing lineages, the less complex Yule speciation model was chosen with an uninformative prior on the speciation rate. The constant per-branch population sizes were analytically integrated using the approach of Hey and Nielsen (2007). Two independent runs were carried out for 50 × 10^6 generations in BEAST, with sampling every 5000 generations. The resulting parameter and tree files from the two independent runs were examined for convergence and effective sample sizes (required to be >200) in TRACER v.1.6 (Rambaut et al., 2018) and combined using LOGCOMBINER v.2.4.6 (Drummond & Bouckaert, 2015), discarding the first 10% as burn-in. The maximum clade credibility (MCC) tree was calculated using TREEANNOTATOR v.2.4.6 (Drummond & Bouckaert, 2015).
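The size of the combined posterior sample implied by these settings can be worked out with a short Python sketch. The function name is hypothetical, and the count assumes one logged state per sampling interval, ignoring the initial state that some programs also write to the log.

```python
def retained_samples(generations, sample_every, burnin_percent, n_runs):
    """Return (samples per run, samples kept per run, samples kept in total)."""
    per_run = generations // sample_every
    discarded = per_run * burnin_percent // 100  # integer arithmetic avoids rounding issues
    kept_per_run = per_run - discarded
    return per_run, kept_per_run, kept_per_run * n_runs

# Settings stated in the text: 50 x 10^6 generations, sampling every 5000,
# 10% burn-in, two independent runs.
print(retained_samples(50 * 10**6, 5000, 10, 2))
```

Under these assumptions each run contributes 9000 post-burn-in states, giving 18,000 states in the combined sample.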
The analysis of 355 anchored phylogenomic loci was conducted in BP&P v. 4.3.0, using the model A01 (Flouri et al., 2018), with each species being represented by a single unphased genotype at every locus. For all loci, we assumed the HKY substitution model and a strict molecular clock with a flat Dirichlet prior (α = 1) on their relative mutation rates. Inverse gamma priors for the basal divergence time (τ0) and population size (θ) parameters were set to (α = 3, β = 0.0152) and (α = 3, β = 0.0044), respectively, based on preliminary analyses of interspecific genetic distances. The analysis was run twice, and convergence was checked in TRACER. The posterior sample was summarized by the MCC tree with median node heights calculated by R functions accessible at https://github.com/onmikula/mcctree.
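What these inverse gamma priors imply can be seen from the standard moments of the distribution: the mean is β/(α − 1) for α > 1 and the standard deviation is β/((α − 1)√(α − 2)) for α > 2. The Python sketch below evaluates them for the stated settings; the function names are hypothetical, and the code only restates textbook formulas, not any part of BP&P.

```python
import math

def inverse_gamma_mean(alpha, beta):
    """Mean of an inverse gamma distribution; defined only for alpha > 1."""
    if alpha <= 1:
        raise ValueError("mean is undefined for alpha <= 1")
    return beta / (alpha - 1)

def inverse_gamma_sd(alpha, beta):
    """Standard deviation of an inverse gamma distribution; defined only for alpha > 2."""
    if alpha <= 2:
        raise ValueError("variance is undefined for alpha <= 2")
    return beta / ((alpha - 1) * math.sqrt(alpha - 2))

# Prior settings stated in the text.
priors = {"tau0": (3, 0.0152), "theta": (3, 0.0044)}
for name, (alpha, beta) in priors.items():
    print(name, inverse_gamma_mean(alpha, beta), inverse_gamma_sd(alpha, beta))
```

With α = 3 the standard deviation equals the mean (0.0076 for τ0 and 0.0022 for θ), so both priors are centered on the preliminary estimates while remaining diffuse.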
The divergence dating was performed in STARBEAST2 on 56 loci that were previously selected for divergence dating of the tribe Praomyini (Nicolas et al., 2021) and were successfully sequenced in all eight species of Mastomys. The analysis integrated two kinds of information: species tree and gene tree topologies estimated in BP&P in the current study, and divergence dates from the time-calibrated tree of the tribe Praomyini (Nicolas et al., 2021). Five species of Mastomys were included in the previous study (Nicolas et al., 2021; M. angolensis, M. kollmannspergeri, M. coucha, M. erythroleucus, and M. natalensis), so we considered posterior samples of four divergence dates. Log-normal distributions were fitted to them in the R package MASS (Venables & Ripley, 2002), and the results were used as calibration priors. The procedure yielded the following mutually nested calibrations: M. angolensis versus the rest (μ = 1.22, σ = 0.10), M. kollmannspergeri versus the rest (μ = 0.73, σ = 0.11), M. coucha versus the rest (μ = −0.09, σ = 0.12), and M. erythroleucus versus M. natalensis (μ = −0.60, σ = 0.40). Constant per-branch population sizes were analytically integrated (with an uninformative hyperprior on the mean of population size priors), and a Yule (no extinction) prior was used for the species tree. Gene and species tree topologies were kept fixed by setting BP&P's MCC trees as starting trees and removing topology-changing STARBEAST2 operators.
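The log-normal calibration priors can be translated into more familiar quantities with a brief Python sketch. It assumes that μ and σ are the mean and standard deviation on the natural-log scale and that dates are expressed in millions of years; neither assumption is stated explicitly in the text, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

# (mu, sigma) pairs as listed in the text.
CALIBRATIONS = {
    "M. angolensis vs. rest": (1.22, 0.10),
    "M. kollmannspergeri vs. rest": (0.73, 0.11),
    "M. coucha vs. rest": (-0.09, 0.12),
    "M. erythroleucus vs. M. natalensis": (-0.60, 0.40),
}

def lognormal_summary(mu, sigma, level=0.95):
    """Return the median and the central interval of a log-normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(mu), math.exp(mu - z * sigma), math.exp(mu + z * sigma)

for node, (mu, sigma) in CALIBRATIONS.items():
    median, lower, upper = lognormal_summary(mu, sigma)
    print(f"{node}: median {median:.2f}, 95% interval {lower:.2f}-{upper:.2f}")
```

Under these assumptions the prior median for the basal split is exp(1.22) ≈ 3.4, and the medians decrease from the first to the last calibration, as expected for mutually nested nodes.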
3 RESULTS
3.1 Mitochondrial diversity and its geographical distribution and mitogenomic phylogeny
The phylogenetic ML analysis of 1169 CYTB haplotypes from Mastomys sensu Nicolas et al. (2021) allowed clear delimitation of seven major monophyletic clades (Figure S1), representing six recognized species and one species pair (M. coucha/shortridgei). More detailed analysis of mitochondrial diversity by haplotype networks (Figure 2) allowed us to define distinct intraspecific groups with parapatric distribution in three species (Figure 1), in agreement with previous studies of mtDNA in M. natalensis (Colangelo et al., 2013), M. erythroleucus (Brouat et al., 2009, plus the newly discovered lineage E in Tanzania), and M. huberti (Mouline et al., 2008). We also found relatively high intraspecific diversity in M. awashensis and M. kollmannspergeri, with possible parapatric distribution of the main haplogroups (Figures 1 and 2). In contrast, two very distinct haplogroups of M. angolensis can live in sympatry, at least at the locality near Namba (20 km southwest of Cassoungue) in Angola, indicating either an ancestral polymorphism or a secondary contact. Two species, M. coucha and M. shortridgei, shared one haplotype, but the majority of M. shortridgei formed a distinct haplogroup (Figure 2; see also Eiseb et al., 2021).

In total, we defined 20 mitochondrial MOTUs (not considering internal structure in five species due to the limited amount of data), and their phylogenetic relationships are shown in Figure 3a. The topology of the CYTB trees depended on the type of analysis (ML or BI), and although most monophyletic MOTUs were highly supported in both analyses, their mutual relationships were not sufficiently resolved. The conflicts between the ML and BI topologies concerned only poorly supported clades. In both analyses, M. angolensis was sister to all remaining species, followed by the offshoot of M. kollmannspergeri. Two species, M. coucha and M. shortridgei, were indistinguishable at CYTB, and their clade formed a sister group to the four remaining species, but the support for their monophyly was low (ML bootstrap = 55, posterior probability of BI = 0.59; Figure 3a). In all analyses, M. erythroleucus and M. natalensis were sister species. These results were very similar to the ML and Bayesian trees from complete mitogenomes, but the latter were better resolved (Figure 3b). Most notably, the position of the mtDNA of M. huberti as the sister group of the remaining species (besides M. angolensis and M. kollmannspergeri) had relatively high support (UFBoot2 = 84, PP = 1.00).

Mean genetic p-distances at CYTB among the 20 delimited MOTUs in the genus Mastomys are summarized in Table 1. They ranged from 1.7% (hubC vs. hubD) to 15.3% (hubA vs. koll or eryA vs. koll). The mean distances between the eight recognized species ranged from 3.9% (M. shortridgei vs. M. coucha) to 14.9% (M. huberti vs. M. kollmannspergeri).
Table 1 (A)

MOTU | koll | ang | cou | sho | awa | hubD | hubC | hubA | hubB | eryE | eryB | eryA | eryD | eryC | natAIII | natAII | natBIV | natBVI | natAI | natBV |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
koll | – | 0.006 | 0.008 | 0.008 | 0.008 | 0.009 | 0.009 | 0.009 | 0.009 | 0.008 | 0.009 | 0.009 | 0.009 | 0.009 | 0.009 | 0.008 | 0.009 | 0.009 | 0.009 | 0.008 |
ang | 0.103 | – | 0.007 | 0.007 | 0.007 | 0.008 | 0.008 | 0.008 | 0.008 | 0.007 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.007 |
cou | 0.123 | 0.123 | – | 0.004 | 0.007 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.007 | 0.008 | 0.008 | 0.008 | 0.008 | 0.007 | 0.007 |
sho | 0.111 | 0.120 | 0.039 | – | 0.007 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.009 | 0.008 | 0.008 | 0.008 | 0.007 | 0.008 | 0.008 | 0.007 | 0.007 |
awa | 0.119 | 0.114 | 0.096 | 0.097 | – | 0.008 | 0.008 | 0.008 | 0.008 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 |
hubD | 0.151 | 0.143 | 0.104 | 0.112 | 0.109 | – | 0.003 | 0.005 | 0.004 | 0.008 | 0.008 | 0.008 | 0.009 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.007 | 0.008 |
hubC | 0.143 | 0.136 | 0.099 | 0.106 | 0.105 | 0.017 | – | 0.005 | 0.004 | 0.008 | 0.007 | 0.008 | 0.008 | 0.008 | 0.008 | 0.007 | 0.008 | 0.008 | 0.007 | 0.008 |
hubA | 0.153 | 0.143 | 0.109 | 0.116 | 0.114 | 0.029 | 0.034 | – | 0.003 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.007 | 0.008 | 0.008 | 0.007 | 0.008 |
hubB | 0.148 | 0.137 | 0.103 | 0.109 | 0.105 | 0.024 | 0.028 | 0.018 | – | 0.008 | 0.007 | 0.008 | 0.008 | 0.007 | 0.008 | 0.007 | 0.008 | 0.008 | 0.007 | 0.008 |
eryE | 0.122 | 0.113 | 0.094 | 0.096 | 0.081 | 0.100 | 0.091 | 0.099 | 0.093 | – | 0.007 | 0.007 | 0.006 | 0.006 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 |
eryB | 0.147 | 0.143 | 0.107 | 0.114 | 0.094 | 0.089 | 0.085 | 0.091 | 0.083 | 0.067 | – | 0.005 | 0.005 | 0.005 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 | 0.008 |
eryA | 0.153 | 0.145 | 0.112 | 0.120 | 0.098 | 0.093 | 0.093 | 0.096 | 0.089 | 0.068 | 0.040 | – | 0.005 | 0.005 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 | 0.008 |
eryD | 0.143 | 0.135 | 0.104 | 0.111 | 0.090 | 0.097 | 0.090 | 0.098 | 0.091 | 0.054 | 0.047 | 0.045 | – | 0.005 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 | 0.008 |
eryC | 0.147 | 0.144 | 0.106 | 0.115 | 0.098 | 0.092 | 0.090 | 0.096 | 0.090 | 0.064 | 0.046 | 0.046 | 0.047 | – | 0.007 | 0.007 | 0.007 | 0.007 | 0.006 | 0.008 |
natAIII | 0.141 | 0.141 | 0.100 | 0.106 | 0.090 | 0.091 | 0.087 | 0.094 | 0.087 | 0.080 | 0.069 | 0.075 | 0.073 | 0.072 | – | 0.005 | 0.005 | 0.005 | 0.005 | 0.006 |
natAII | 0.131 | 0.134 | 0.103 | 0.104 | 0.097 | 0.101 | 0.100 | 0.104 | 0.098 | 0.086 | 0.085 | 0.093 | 0.085 | 0.090 | 0.051 | – | 0.005 | 0.006 | 0.005 | 0.006 |
natBIV | 0.135 | 0.129 | 0.094 | 0.098 | 0.088 | 0.100 | 0.096 | 0.103 | 0.096 | 0.066 | 0.081 | 0.086 | 0.073 | 0.085 | 0.042 | 0.059 | – | 0.005 | 0.005 | 0.006 |
natBVI | 0.139 | 0.135 | 0.097 | 0.102 | 0.089 | 0.097 | 0.093 | 0.098 | 0.093 | 0.080 | 0.078 | 0.082 | 0.075 | 0.081 | 0.041 | 0.062 | 0.039 | – | 0.005 | 0.006 |
natAI | 0.131 | 0.130 | 0.097 | 0.100 | 0.092 | 0.099 | 0.093 | 0.099 | 0.095 | 0.079 | 0.079 | 0.082 | 0.074 | 0.083 | 0.046 | 0.064 | 0.048 | 0.048 | – | 0.006 |
natBV | 0.112 | 0.120 | 0.094 | 0.089 | 0.088 | 0.110 | 0.104 | 0.112 | 0.105 | 0.079 | 0.101 | 0.106 | 0.092 | 0.107 | 0.071 | 0.073 | 0.055 | 0.061 | 0.067 | – |
(B) | kollmannspergeri | angolensis | coucha | shortridgei | awashensis | huberti | erythroleucus | natalensis |
---|---|---|---|---|---|---|---|---|
kollmannspergeri | – | 0.006 | 0.009 | 0.008 | 0.008 | 0.009 | 0.008 | 0.008 |
angolensis | 0.103 | – | 0.007 | 0.007 | 0.007 | 0.008 | 0.007 | 0.007 |
coucha | 0.123 | 0.123 | – | 0.004 | 0.007 | 0.008 | 0.007 | 0.007 |
shortridgei | 0.111 | 0.120 | 0.039 | – | 0.008 | 0.008 | 0.007 | 0.007 |
awashensis | 0.119 | 0.114 | 0.096 | 0.097 | – | 0.008 | 0.006 | 0.007 |
huberti | 0.149 | 0.140 | 0.104 | 0.111 | 0.108 | – | 0.007 | 0.007 |
erythroleucus | 0.142 | 0.136 | 0.105 | 0.111 | 0.092 | 0.092 | – | 0.006 |
natalensis | 0.131 | 0.131 | 0.098 | 0.100 | 0.091 | 0.098 | 0.083 | – |
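
As a reading aid for table (B), the following minimal Python sketch picks out the most similar and the most divergent species pair. It assumes, as the layout suggests, that the lower triangle holds the between-species distances and the upper triangle their standard errors; the values are copied from the rows above, and all variable names are illustrative only.

```python
# Lower triangle of table (B), copied row by row; species order follows the table.
# Assumption: these are the between-species distances (upper triangle = standard errors).
species = ["kollmannspergeri", "angolensis", "coucha", "shortridgei",
           "awashensis", "huberti", "erythroleucus", "natalensis"]

lower = [
    [],
    [0.103],
    [0.123, 0.123],
    [0.111, 0.120, 0.039],
    [0.119, 0.114, 0.096, 0.097],
    [0.149, 0.140, 0.104, 0.111, 0.108],
    [0.142, 0.136, 0.105, 0.111, 0.092, 0.092],
    [0.131, 0.131, 0.098, 0.100, 0.091, 0.098, 0.083],
]

# Flatten into (distance, row species, column species) triples.
pairs = [(d, species[i], species[j])
         for i, row in enumerate(lower)
         for j, d in enumerate(row)]

closest = min(pairs)    # smallest distance in the table
farthest = max(pairs)   # largest distance in the table
print(closest)   # (0.039, 'shortridgei', 'coucha')
print(farthest)  # (0.149, 'huberti', 'kollmannspergeri')
```

The smallest value corresponds to the M. coucha/M. shortridgei pair, in line with their close relationship in the phylogenetic analyses.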
3.2 Nuclear phylogeny and divergence dating
The maximum likelihood analysis of the concatenated dataset of six nuclear markers strongly supported the monophyly of the genus, composed of eight currently recognized species (UFBoot2 = 100; Figure 4a). The tree was poorly resolved (probably due to the low variability of the markers), but it agreed with the mtDNA trees in the following points: (i) M. angolensis was sister to the remaining species (UFBoot2 = 88), with M. kollmannspergeri as the next offshoot (UFBoot2 = 95); (ii) M. shortridgei and M. coucha clustered together (UFBoot2 = 98), but did not form reciprocally monophyletic clades. In contrast, the nuclear tree differed from the mtDNA tree in the positions of M. shortridgei/coucha and M. huberti: the former was the sister group of the remaining species (except M. angolensis and M. kollmannspergeri) based on the nuclear markers (Figure 4a), whereas the latter occupied this position based on mtDNA (Figure 3). Another difference is that M. natalensis clustered with M. huberti in the nuclear tree (UFBoot2 = 64), with M. natalensis clade AI being sister to all remaining specimens (Figure 4a). The species tree in STARBEAST2 calculated from six nuclear markers (Figure 4b) identified three pairs of sister species, that is, M. awashensis/M. erythroleucus (PP = 0.56), M. huberti/M. natalensis (PP = 0.99), and M. coucha/M. shortridgei (PP = 1.00). The deeper nodes within the genus were only poorly supported in the species tree (Figure 4b).

In contrast, the multispecies coalescent analysis of 355 anchored phylogenomic loci provided a fully resolved species tree (Figure S2; for the topology see also Figure 5). The basal divergence in this phylogeny is between M. angolensis and all the other species, and the next one is between M. kollmannspergeri and the remaining six species. The remaining six species form three pairs of sister species, with the pair M. coucha + M. shortridgei being sister to the other two (M. erythroleucus + M. awashensis, and M. natalensis + M. huberti). All these relationships were supported with PP = 1.00. The time-calibrated species tree (Figure 5) shows the first split between M. angolensis and the other species in the late Pliocene, 3.41 Ma (million years) ago, with a 95% highest posterior density (HPD) interval of 2.93–3.89 Ma. The next period of diversification includes three splits, all dated to the Pleistocene: M. kollmannspergeri from the rest 1.63 (1.38–1.90) Ma ago, the M. coucha + M. shortridgei lineage from the remaining two lineages 1.05 (0.91–1.21) Ma ago, and the split of the M. erythroleucus + M. awashensis and M. natalensis + M. huberti lineages, which was dated to 0.91 (0.76–1.07) Ma ago. Three pairs of sister species split in the late Pleistocene: M. erythroleucus and M. awashensis 0.18 (0.00–0.46) Ma ago, M. natalensis and M. huberti 0.44 (0.28–0.61) Ma ago, and M. coucha and M. shortridgei just 0.06 (0.00–0.14) Ma ago.
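
The dated topology described above can be written down compactly as a Newick string. The following minimal Python sketch does this using only the median node ages quoted in the text; the nested-tuple encoding, the tip labels, and the function names are illustrative assumptions and are not part of the analysis pipeline used in this study.

```python
# Each internal node is (median age in Ma, left child, right child); tips are strings.
# Ages are the median estimates quoted in the text; the encoding itself is hypothetical.
tree = (3.41, "M_angolensis",
        (1.63, "M_kollmannspergeri",
         (1.05,
          (0.06, "M_coucha", "M_shortridgei"),
          (0.91,
           (0.18, "M_erythroleucus", "M_awashensis"),
           (0.44, "M_natalensis", "M_huberti")))))


def node_age(node):
    """Tips are extant (age 0); internal nodes store their age as the first element."""
    return 0.0 if isinstance(node, str) else node[0]


def to_newick(node, parent_age=None):
    """Return a Newick string with branch length = parent age minus child age."""
    age = node_age(node)
    if isinstance(node, str):
        label = node
    else:
        _, left, right = node
        label = f"({to_newick(left, age)},{to_newick(right, age)})"
    if parent_age is None:  # the root has no branch above it
        return label + ";"
    return f"{label}:{parent_age - age:.2f}"


newick = to_newick(tree)
print(newick)
```

The resulting string can be loaded into any tree viewer that reads Newick; because the branch lengths are derived from node ages, every root-to-tip path sums to the root age of 3.41 Ma.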

4 DISCUSSION
4.1 Taxonomy, distribution, and genetic diversity of eight Mastomys species
The last comprehensive review of the genus Mastomys was published approximately a quarter of a century ago, and it was based primarily on a detailed analysis of karyotypes (Granjon et al., 1997). That paper set up the taxonomy of the genus, which, after several updates (Dobigny et al., 2008; Lavrenchenko et al., 1998), has been generally accepted until today. Only very recently has the species content of the genus changed, based on the phylogenomic study of the tribe Praomyini (Nicolas et al., 2021). Specifically, the species Mastomys (previously Myomyscus) angolensis was included in Mastomys, and Serengetimys (previously Mastomys) pernanus was excluded, making the genus monophyletic and keeping the number of Mastomys species at eight. Here, we followed this genus delimitation, resolved the phylogenomic relationships among all eight recognized species for the first time, and updated their known distribution using almost 2700 DNA-barcoded specimens from across Africa (Figure 1). In the systematic review below, we provide additional taxonomic, ecological, and biogeographical comments for each Mastomys species, highlighting the use of DNA sequencing in solving evolutionary questions. At the same time, our study revealed the shortcomings of using only one gene (e.g., CYTB) or a non-optimal set of genes (such as just a few nuclear ones) for deciphering evolutionary relationships in recently radiated mammalian groups.
4.2 Mastomys angolensis (Bocage, 1890)
It is genetically the most distinct species of the genus Mastomys, sister to all remaining taxa in all but one analysis (its different position in the poorly resolved species tree from STARBEAST2 may be caused by ascertainment bias of the six loci used; Figure 4b). It retains the ancestral mammary formula (3 + 2) = 10, so it is the only species of the genus that is not “multimammate” (note that members of two genera sister to Mastomys, that is, Ochromyscus and Stenocephalemys, also possess five pairs of nipples). In overall phenotypic appearance (length of the tail, width of molars, general skull morphology, fur structure, and coloration), however, it is a typical Mastomys (Crawford-Cabral, 1989). It is the only species of the genus that has never been karyotyped. Especially at lower elevations, it can live in sympatry with M. natalensis, from which it differs in the number of teats and in skull morphology (wider pterygoid fossa and anterior palatal foramina not extending back to the inner root of the first upper molars; Crawford-Cabral, 1989). It occurs in various woodland habitats in western Angola and southwestern DRC, where it can reach relatively high population densities, like other Mastomys in suitable conditions, and can be the dominant species in small mammal assemblages (Crawford-Cabral, 1989; Krásová et al., 2021). Two very divergent mitochondrial haplogroups were found in this species (Figure 2), differing by 6.72% at CYTB, suggesting a phylogeographic structure. Both haplogroups, however, can occur at the same locality (e.g., in Namba, 20 km SW of Cassoungue, 4 and 18 specimens had mtDNA sequences from the two different groups). Whether this represents a case of ancestral polymorphism or secondary contact of differentiated populations will require further research.
The species M. angolensis has a confusing taxonomic history (Crawford-Cabral, 1989; Musser & Carleton, 2005). The name angolensis was first used by Bocage (1890) for the description of Mus angolensis from Capangombe, a low-elevation locality (527 m a.s.l.) in southwestern Angola with a semiarid tropical climate. The type series was destroyed by fire in the Lisbon Museum in 1978, but according to the original description it is a species with a tail much longer than head and body, white feet, soft and thick fur, and five pairs of teats (3 + 2), all characters very typical of Myomyscus (sensu Nicolas et al., 2021, that is, currently including only Myomyscus verreauxii from the South African Cape region). Crawford-Cabral (1989) thus argued that the widely distributed Mastomys-like taxon from more humid regions at altitudes higher than 1000 m a.s.l. (repeatedly referred to as angolensis since Thomas, 1904) is a different species, and he proposed the name angolae for it (Crawford-Cabral, 1989 as “Praomys (Mastomys) angolae nom. nov.”). However, Musser and Carleton (2005), in their influential taxonomic work, did not accept this view (“In our view, he [Crawford-Cabral, 1989] simply renamed Bocage's angolensis”), and consequently, only one taxon with this name has been reported from Angola in recent compendia, usually as Myomyscus (Beja et al., 2019; Denys et al., 2017; Monadjem et al., 2015). Because the type material of Bocage's angolensis is not available, we propose to continue using this name for the widely distributed Mastomys of the Angolan highlands and southwestern DRC. However, if future research leads to a rediscovery of “true” Myomyscus in southwestern Angola (which is possible according to the biogeographical analysis by Krásová et al., 2021), the taxonomic situation should be reconsidered. In that case, the name angolae will be available for Mastomys.
4.3 Mastomys kollmannspergeri (Petter, 1957)
This species was reported as a very distinct taxon, sister to all remaining species of the genus, in previous phylogenetic studies that did not include genetic data on M. angolensis (Colangelo et al., 2010; Dobigny et al., 2008; Eiseb et al., 2021; Martynov et al., 2020). It was described as a subspecies of M. natalensis from central Niger (Petter, 1957) and later elevated to a separate species because of the very distinct karyotype found in specimens collected 50 km east of the type locality (Dobigny et al., 2002). The karyotype (2n = 38, NFa = 40; sometimes called MER-2 in previous publications) differs from those of all other species potentially occurring in the same area. Animals with this karyotype also formed a distinct clade in mitochondrial phylogenies, and they were recorded from central Niger (Aïr Mountains, Ighazer plains) through northeastern Nigeria, northern Cameroon, and Chad to eastern Sudan (Dobigny et al., 2002, 2008; Granjon et al., 2004; Viegas-Péquignot et al., 1987; Volobouev et al., 2001). Recently, Martynov et al. (2020) provided the first genetically confirmed record of this taxon from northwestern Ethiopia, which is currently the easternmost limit of the species distribution. Based on combined molecular and cytogenetic analyses, Dobigny et al. (2008) considered Mastomys verheyeni, described on morphological grounds by Robbins & Van der Straeten (1989) from northern Nigeria and northern Cameroon, to be a junior synonym of M. kollmannspergeri. Petter's kollmannspergeri may not be the oldest available name, as there are multiple older holotypes from Sudan and South Sudan that potentially represent this species (Dobigny et al., 2008; Musser & Carleton, 2005). They were provisionally mentioned in the M. erythroleucus account by Musser and Carleton (2005), pending a proper integrative taxonomic assessment. The number of nipples ranges from 18 to 24, and they are equally spaced from the axilla to the inguinal region (Leirs, 2013).
4.4 Mastomys coucha (Smith, 1834) and Mastomys shortridgei (St. Leger, 1933)
These two species live in the South African bioregion, south and west of the Zambezi River. All phylogenetic analyses confirmed that they are closely related, and the support for recognizing them as distinct species is by far the least convincing in the genus Mastomys. However, they differ significantly in karyotype (both species have 2n = 36, but NFa = 52–56 in M. coucha and NFa = 46–50 in M. shortridgei), skull morphology (M. shortridgei having generally bigger skulls, but shorter bullae), and ecological requirements (M. coucha occupies semiarid savannahs in Namibia and lower-rainfall grasslands in Botswana and South Africa, while M. shortridgei is a wetland specialist in the Okavango basin and a few additional drainage systems) (see Eiseb et al., 2021 and references therein). Mastomys shortridgei probably split from M. coucha by peripatric speciation (i.e., geographical separation of peripheral populations) during later phases of the Pleistocene; our estimate based on genomic-scale data is just 60 kya. This process was enhanced by several pericentric inversions in their karyotype and local adaptation to swamp habitats along large rivers in southwestern Africa (Eiseb et al., 2021). The two species share one mtDNA haplotype (Figure 2), which might represent a case of ancestral polymorphism or of secondary mtDNA introgression. The first option is supported by the facts that this haplotype is the most frequent one in M. coucha in Namibia (Eiseb et al., 2021) and that karyotypes differing by pericentric inversions usually prevent successful hybridization between taxa, which is a prerequisite of partial introgression. It is worth noting that the records originally reported as M. coucha from the upper Zambezi plains in southwestern Zambia by Bryja et al. (2012) represent, in fact, the easternmost records of M. shortridgei, significantly expanding its known distributional area (see also Eiseb et al., 2021). The distribution of M. coucha seems to be disjunct (Figure 1), but this may be an effect of low sampling intensity in some regions, as suggested by new genotyped records from Botswana (McDonough & Sotero-Caio, 2019).
The number of nipples is probably secondarily reduced in M. shortridgei. While Skinner and Smithers (1990) followed older descriptions mentioning only five pairs of mammae (as in the type specimen), other studies reported eight pairs, not separated into axillary and inguinal sets (Eiseb et al., 2021). In M. coucha, 12 pairs are repeatedly mentioned in various reviews (e.g., Monadjem et al., 2015). However, individuals from breeding colonies have only eight pairs of mammary glands: three pectoral, three abdominal, and two inguinal, with occasional supernumerary mammary glands in the anterior region (Hardin et al., 2019), and the same number was recently reported from a free-living population in Namibia (Eiseb et al., 2021). If confirmed by additional studies of M. coucha in southern Africa, the reduced number of nipples (from 12 to eight pairs) would therefore be characteristic of the whole M. coucha/shortridgei clade.
4.5 Mastomys erythroleucus (Temminck, 1853) and Mastomys awashensis Lavrenchenko et al., 1998
Mastomys erythroleucus is a typical species of the Sudanian savannah, distributed in a wide belt across the continent (plus an isolated population in Morocco; Figure 1b). A previous phylogeographic study by Brouat et al. (2009) identified four mitochondrial lineages (A–D), parapatrically distributed along the east-west axis. In a recent study, Martynov et al. (2020) showed that the easternmost lineage D, occurring east of the Nile River and in southern Ethiopia, is clearly split into two clades by the Great Rift Valley. Finally, here we provide the first evidence of a highly divergent fifth lineage E in Tanzania, significantly expanding the known range of the species. The karyotype has 2n = 38, and Dobigny et al. (2010) showed that the polymorphism in the number of autosomal arms (NFa = 50–56) is caused by pericentric inversions and is non-randomly distributed among the mitochondrial phylogroups A–C. This suggests a historical role of genetic drift in the fixation of polymorphisms in populations living in isolated savannah refugia during humid periods of the Pleistocene. After subsequent expansions under more benign climatic conditions, the partially differentiated populations met in secondary contact zones, often located along large rivers (Brouat et al., 2009), where gene flow among them is further limited by the decreased fertility of hybrids heterozygous for pericentric inversions (Dobigny et al., 2010). In this sense, we can hypothesize that the 38-chromosome karyotypes with NFa = 60 described from eastern DRC (Král, 1971) and Uganda (Volobouev et al., 2001) evolved by a similar process in the eastern lineages D or E, which were not included in the study of Dobigny et al. (2010).
Mastomys awashensis was described by Lavrenchenko et al. (1998), based on genetic data, genital morphology, and karyotypes. This species was originally thought to be endemic to the Awash Valley (type locality), but accumulating genetic data suggest a wider distribution in Ethiopia (Colangelo et al., 2010; Lavrenchenko et al., 2010) and possibly in neighboring countries, for example, Eritrea (reviewed in Bryja et al., 2019, 2021; Martynov et al., 2020). Phylogenomic analysis showed that its most closely related taxon is M. erythroleucus (Figure 5), and the two species are also morphologically most similar, albeit distinguishable (Lavrenchenko et al., 1998). Its karyotype (2n = 32, NFa = 54) is at first sight very similar to that of M. natalensis, but the two differ in a number of unique characters (Lavrenchenko et al., 1998). In contrast, nobody has yet compared in detail the karyotypes of the sister species M. awashensis and M. erythroleucus, but the differences could possibly be explained by simple Robertsonian translocations of three pairs of chromosomes.
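
The arithmetic behind the Robertsonian hypothesis can be made explicit with a minimal Python sketch. It is pure bookkeeping, not evidence that such fusions occurred, and it assumes the textbook definition of a Robertsonian fusion (two acrocentric pairs joined into one biarmed pair); the function name is hypothetical.

```python
def diploid_number_after_fusions(diploid_number, n_fusions):
    """Each Robertsonian fusion joins two acrocentric pairs into one biarmed pair,
    so 2n drops by 2 per fusion while the number of chromosome arms is unchanged."""
    return diploid_number - 2 * n_fusions


# 2n = 38 in M. erythroleucus; three fusions give 2n = 32, as in M. awashensis.
print(diploid_number_after_fusions(38, 3))  # 32
```

Because such fusions leave the arm number unchanged, the hypothesis is also compatible with NFa = 54 in M. awashensis, which falls within the NFa = 50–56 range reported for M. erythroleucus.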
4.6 Mastomys natalensis (Smith, 1834) and Mastomys huberti (Wroughton, 1909)
Mastomys natalensis is probably the most abundant and the most widespread native mammal species in Africa (but see Bryja et al., 2014 for a comparison with Mus minutoides, which has an even larger distributional area). Colangelo et al. (2013) conducted the first pan-African phylogeographic analysis based on mtDNA, which resulted in a division of this species into six main lineages (AI–AIII and BIV–BVI). The lineages have parapatric distributions and form contact zones with very limited introgressive hybridization, which makes these zones important areas for speciation studies and for understanding the co-divergence of hosts with their pathogens (Cuypers et al., 2020; Gryseels et al., 2017). Even though the species can be partially commensal (e.g., Brouat et al., 2007; Gryseels et al., 2016), which may facilitate its spread, it is probably missing from large equatorial rainforests as well as from the arid region of the Horn of Africa (see also the species distribution models in Martynov et al., 2020).
Finally, M. huberti inhabits wetland areas from coastal Mauritania and Senegal to Burkina Faso and Nigeria (Denys et al., 2009; Mouline et al., 2008). It is sister to M. natalensis in the phylogenomic analysis, and the two species differ (besides ecological requirements) by pericentric inversions in their karyotypes (Britton-Davidian et al., 1995). Because the distribution of M. huberti lies on the margin of the large distribution of M. natalensis, we can expect a similar evolutionary scenario (i.e., peripatric speciation enhanced by karyotypic changes and local adaptations) as in the case of M. coucha/shortridgei. However, M. huberti and M. natalensis are currently clearly identifiable by mtDNA sequences (Lecompte, Brouat, et al., 2005), so their split is probably older than that of M. coucha/shortridgei. This is confirmed by our divergence dating of the phylogenomic species tree (Figure 5). The close relationship between M. huberti and M. natalensis was revealed by experimental crossings, and one hybrid was also observed in nature (Duplantier et al., 1990). The current clearly fragmented range of M. huberti is a result of a stepwise colonization from west to east, and in some areas (northern Senegal and Mali), a demographic expansion was suggested during the African Humid Period (ca. 10 kya) (Mouline et al., 2008).
4.7 Evolutionary scenario for the diversification of Mastomys in open habitats of sub-Saharan Africa
Using genomic-scale data, we sufficiently resolved, for the first time, the phylogenetic relationships among all recognized species of the genus Mastomys. Previous attempts were based mostly on mtDNA sequences or karyotypes, and none of them used complete taxon sampling. Although M. kollmannspergeri was repeatedly revealed as the most distinct species (because no study included M. angolensis), the relationships among the remaining taxa depended on the method of analysis and the data used (Colangelo et al., 2010; Dobigny et al., 2008; Eiseb et al., 2021), suggesting that multimammate mice diversified at a high rate. Here, we confirmed that the Mastomys phylogeny cannot be reconstructed using only mtDNA (even whole mitogenomes failed) or just a few nuclear sequences (see Figures 3 and 4). In contrast, the species tree based on 355 nuclear loci provided a completely resolved phylogeny that we subsequently dated using a fossil-based calibration of the molecular clock. Thus, we will discuss a possible evolutionary scenario based mainly on the results of our multilocus phylogenomic reconstruction (Figure 5).
The ancestors of the Praomyini tribe appeared in Africa in the late Miocene, and their first wave of radiation was linked to the fragmentation of pan-African forests at 7–5 Mya. This resulted in allopatric diversification of forest-adapted taxa on the one hand, but on the other hand the overall aridification also created new types of savannah-like habitats (Nicolas et al., 2021). The genus Mastomys belongs to the clade of taxa (together with Ochromyscus and Stenocephalemys) living predominantly in open ecosystems, and their ancestor adapted to the newly appearing non-forest environments already at the end of the Miocene (Lecompte, Denys, et al., 2005; Nicolas et al., 2021). The split of the three genera is dated to the Early Pliocene, that is, the most humid and warm period of the last 5 Myr, when the aridification trend was reversed, forests re-expanded, and savannahs were again fragmented (Feakins & deMenocal, 2010). It is difficult to localize the origin of the genus Mastomys, but it may have been in southern Africa, where numerous fossils assigned to Mastomys are known from the Early Pliocene (Winkler et al., 2010) and where the most ancient extant species of the genus, M. angolensis, occurs. Based on the current distribution ranges, ancestors of Ochromyscus probably lived in the more arid savannahs of Eastern Africa and those of Stenocephalemys adapted to montane moorlands in the Ethiopian highlands.
The split of the lineage leading to M. angolensis is dated to the Late Pliocene (median node age 3.4 Mya; Figure 5). Even though the aridification of Africa continued in the Pliocene, it was likely interrupted by several more humid periods (Feakins & deMenocal, 2010), promoting allopatric diversification of taxa living in savannahs. Interestingly, the split of M. angolensis (the species currently occurring only in Angola and southern DRC) from the other Mastomys species is roughly of the same age as the split between two other murid genera living in open habitats, that is, Arvicanthis and Lemniscomys from the Arvicanthini tribe (Mikula et al., 2021). These estimates are not biased by the methodology used, as they are based on similar phylogenomic data and a comparable set of fossils (Mikula et al., 2021; Nicolas et al., 2021). In the same period, that is, the Pliocene/Pleistocene transition, the multimammate phenotype evolved, which could have facilitated the fast Pleistocene spread of the genus in savannahs across the whole of sub-Saharan Africa via increased reproductive potential.
Seven out of eight Mastomys species diverged in the Pleistocene. Besides M. kollmannspergeri, which likely evolved in isolation in northern savannahs during one of the first Pleistocene climate oscillations, the diversification burst is dated to the period around 1 Mya, that is, during the so-called mid-Pleistocene transition that occurred 1.25–0.7 Mya and resulted in prolonged, strongly asymmetric cycles with long-duration cooling of the climate followed by a fast change to a warm and humid interglacial (Brovkin et al., 2019). Numerous savannah mammals in Africa diversified as a consequence of the intensified climate oscillations (for examples in rodents see Bryja et al., 2014, 2019; Hánová et al., 2021; Mazoch et al., 2018). A generally accepted scenario predicts the survival of arid-adapted taxa in spatially restricted refugia during interglacial periods (sometimes called “pluvials” in the tropics), separated from each other by mountains (Cuypers et al., 2021) or large rivers and riverine forests (Brouat et al., 2009). The precise localization of such refugia is difficult to reconstruct, but very often the allopatric diversification produced spatially concordant patterns in widely distributed taxa, with main phylogeographic groups in western, eastern, and southern Africa (for ungulates reviewed by Lorenzen et al., 2012; for gerbils see Colangelo et al., 2007 and Granjon et al., 2012). In Mastomys, three pairs of currently recognized species diverged around 1 Mya, and we can see geographical patterns similar to those of savannah ungulates or gerbils, even if in Mastomys they are partly blurred as a consequence of their high colonization capacity. While the origin of the M. coucha/shortridgei clade is unequivocally placed in southern Africa, the two remaining pairs are now widely distributed across the continent. However, we argue that the origin of M. natalensis/huberti is in western savannahs, where the diversity of haplogroups of M. natalensis is generally higher (Figure 2; see also Colangelo et al., 2013) and where the distribution of M. huberti is restricted (Mouline et al., 2008). In contrast, the origin of M. erythroleucus/awashensis is more likely to be in eastern Africa, based on the diversity of haplogroups and the topology of the mtDNA tree of M. erythroleucus (see Figures 2 and 3a; also Brouat et al., 2009) and on the current distribution of M. awashensis. Only subsequently, and probably very recently, M. natalensis colonized almost the whole of sub-Saharan Africa to the east and south, while M. erythroleucus spread to the west (up to Senegal) and north (Morocco; Figure 1).
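
The timing argument can be checked directly against the dates reported in Section 3.2. The following minimal Python sketch lists which splits have a 95% HPD interval overlapping the mid-Pleistocene transition window (1.25–0.7 Mya, as cited above); the dictionary keys and the helper function are illustrative names only.

```python
# Median age and 95% HPD bounds (Ma) as reported in Section 3.2.
MPT = (0.70, 1.25)  # mid-Pleistocene transition window cited in the text

splits = {
    "kollmannspergeri vs rest": (1.63, 1.38, 1.90),
    "coucha+shortridgei vs remaining pairs": (1.05, 0.91, 1.21),
    "erythroleucus+awashensis vs natalensis+huberti": (0.91, 0.76, 1.07),
    "natalensis vs huberti": (0.44, 0.28, 0.61),
    "erythroleucus vs awashensis": (0.18, 0.00, 0.46),
    "coucha vs shortridgei": (0.06, 0.00, 0.14),
}


def overlaps(lo, hi, window):
    """True if the interval [lo, hi] shares at least one point with the window."""
    return lo <= window[1] and hi >= window[0]


in_mpt = [name for name, (median, lo, hi) in splits.items()
          if overlaps(lo, hi, MPT)]
print(in_mpt)
```

Under these numbers, only the two nodes that separate the three species pairs from each other fall within the transition window; the split of M. kollmannspergeri is older, and the splits within each pair are younger.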
The diversity that evolved in savannah rodents during the last 1 Myr is usually interpreted as intraspecific variation (e.g., M. minutoides: Bryja et al., 2014; A. chrysophilus: Mazoch et al., 2018; Saccostomus campestris: Mikula et al., 2016). Even though the available nuclear DNA data are still very limited, it seems that some of the recently separated gene pools in Mastomys represent true biological species, reproductively isolated from each other. Based on the distributional patterns and the phylogenomic analysis, we hypothesize that the speciation process was peripatric (see above), that is, M. huberti was originally a peripheral population of M. natalensis, M. shortridgei of M. coucha, and M. awashensis of M. erythroleucus. In the first two pairs, speciation was further reinforced (or even primarily driven) by pericentric chromosomal inversions causing reproductive isolation (Britton-Davidian et al., 1995; Dobigny et al., 2010) and by ecological shifts of peripheral populations toward wetland habitats (Eiseb et al., 2021). In the last pair, the ancestor of M. awashensis was likely isolated from the rest of M. erythroleucus in a refugium on the margins of the Afar Triangle by the Ethiopian highlands and the large lakes at the bottom of the Great Rift Valley (Bryja et al., 2021). Both species can coexist today in the drier parts of the Ethiopian highlands (Martynov et al., 2020), but this is very likely a secondary contact, and no hybrids have been observed yet (A. A. Martynov & L. A. Lavrenchenko, pers. comm.).
4.8 Genus Mastomys as a model group for future research
In view of the ecological, distributional, and genetic differences between Mastomys species, it is clear that the genus may serve as an excellent model for answering many fundamental evolutionary and ecological questions, such as the role of chromosomal rearrangements in the colonization of particular habitats, ecological adaptations in recently evolved parapatric and sympatric species, and mechanisms of co-existence of multiple closely related species (e.g., Martynov et al., 2020). Ongoing speciation processes can be studied in the framework of the highly structured parapatric intraspecific clades found in some species. For example, secondary contact zones between differentiated gene pools (e.g., Gryseels et al., 2017; Olayemi, Obadare, et al., 2016) can serve as useful models for the identification of genomic regions with different introgression abilities. Finally, some species (e.g., M. natalensis) are very successful in colonizing human-altered environments and settlements, including large towns, which is accompanied by rapid genetic evolution of urban populations (Gryseels et al., 2016). It will be interesting to see whether such processes also occur in other large towns across its distribution and in other Mastomys species.
Due to their high potential for commensalism and close proximity to humans, Mastomys mice are of epidemiological relevance as zoonotic reservoirs. The most studied pathogens in Mastomys hosts are arenaviruses (the Mammarenavirus genus), especially the Lassa virus hosted by M. natalensis in western Africa, the causative agent of Lassa hemorrhagic fever (Fichet-Calvet et al., 2007; Lecompte et al., 2006). New arenaviruses are being described in M. natalensis, usually restricted to particular intraspecific clades of the host, such as the Gairo virus in the clade natBIV, the Morogoro virus in natBV, the Luna virus in natBVI, the Dhati Welel virus in natAIII, or the Mobala-like virus in natAII (Cuypers et al., 2020; Goüy de Bellocq et al., 2020; Gryseels et al., 2015, 2017; Olayemi, Obadare, et al., 2016; Těšíková et al., 2021). Besides various arenaviruses, M. natalensis (clade BVI in Zambia) has recently been reported as a natural reservoir of the encephalomyocarditis virus (EMCV or Cardiovirus A, Picornaviridae), which is relevant to human health (Kishimoto et al., 2021). Contact zones among intraspecific M. natalensis clades and their specific pathogens are crucially important as model systems for studies of host-parasite coevolution, speciation, and host-switching mechanisms. Two areas within the distribution of the species seem especially suitable for this purpose: (i) central Tanzania, at the contact of the natBIV, natBV, and natBVI clades (Cuypers et al., 2020; Gryseels et al., 2017), and (ii) central Nigeria, at the contact of natAI and natAII, where both mitochondrial lineages host the Lassa virus; to what extent hybrid mice are involved needs to be evaluated using genomic tools (Olayemi, Obadare, et al., 2016).
An outstanding morphological trait (with associated consequences for reproduction rate and population dynamics) that gave the “multimammate mice” their name is the multiplication of nipple pairs. Only the most ancient offshoot of the genus, M. angolensis, has the ancestral state of five pairs of mammae, as do the sister praomyine genera Ochromyscus and Stenocephalemys (Nicolas et al., 2021). All other Mastomys species possess 9–12 pairs of nipples, except for the sister species M. coucha and M. shortridgei with usually eight pairs (reviewed in Eiseb et al., 2021). Recently, Hardin et al. (2019) compared the whole genome of M. coucha with those of Mus musculus and Rattus norvegicus (which diverged from the ancestors of Mastomys 10–13 Ma; Aghová et al., 2018) with the aim of localizing genomic regions associated with the multimammate phenotype. They identified 515 regions with accelerated evolution in Mastomys, most of them located near or within exons of important mammary development genes and some of them putatively involved in increasing the number of nipples. Nevertheless, considering the resolved Mastomys phylogeny and the divergence time estimates, such experiments could be modified to provide more relevant data. The multimammate phenotype evolved relatively recently (between 3.4 and 1.6 Ma; Figure 5), and a comparison of the genome of M. angolensis (the plesiomorphic state) with that of a species with 12 nipple pairs (M. natalensis being a suitable candidate) would be much more appropriate for evaluating the proximate developmental factors underlying this intriguing mammalian trait. Furthermore, M. coucha, as a species with a probably secondarily reduced nipple number, is not an ideal representative of the typical multimammate phenotype.
ACKNOWLEDGEMENTS
This study was supported by the Czech Science Foundation (project no. 20-07091J). Samples utilized in the study were lawfully acquired and were collected before the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from Their Utilization to the Convention on Biological Diversity came into effect. We are indebted to many local authorities for providing permits to carry out the research, especially the Ethiopian Wildlife Conservation Authority (EWCA), Government of Ethiopia and the Oromia Forest and Wildlife Enterprise (OFWE) in Ethiopia, the Sokoine University of Agriculture in Morogoro (Tanzania), the Kenyan Forest Service and the Kenyan Wildlife Service (Kenya), Zambian Wildlife Authority (ZAWA), National Directorate for Protected Areas (DINAC) in Mozambique, and ISCED Lubango in Angola. For help during field work and for providing samples, we acknowledge J. Goüy de Bellocq, S. Gryseels, L. Cuypers, Y. Meheretu, J. Šklíba, H. Konvičková, M. Lövy, V. Mazoch, J. Zima jr., V. Nicolas, S. Eiseb, J. Stuyck, H. Leirs, L. Granjon, E. Verheyen, B.S. Kilonzo, C. A. Sabuni, M. Michiels, M. Colyn, F. Sedláček, T. Aghová, J. Vrbová Komárková, J. Favaits, T. Locus, J. Votýpka, J. Sádlová, D. Frynta, and all local collaborators. We are grateful to the Center for Anchored Phylogenomics (A.H. Lemmon and E.M. Lemmon) for producing the anchored phylogenomics dataset within another larger project. All phylogenetic analyses were run using the CIPRES Science Gateway (https://www.phylo.org/) and the Czech National Grid Infrastructure MetaCentrum, provided under the program CESNET LM2015042.
AUTHOR CONTRIBUTIONS
JB, AK, and RS conceived the study; all authors collected and provided the samples; AH performed genotyping and prepared the datasets of mtDNA sequences with corresponding data; AB did an important part of the lab work and assembled complete mitogenomes; AH, JB, and OM analyzed data; AH and JB wrote the first version of the manuscript with contributions of OM and AK. All authors contributed to the final version of the paper. All authors read and approved the final manuscript.