Volume 27, Issue 22 pp. 4347-4349
NEWS AND VIEWS

Why phylogenomic uncertainty enhances introgression analyses

James B. Pease

Department of Biology, Wake Forest University, Winston-Salem, North Carolina

Correspondence

James B. Pease, Department of Biology, Wake Forest University, Winston-Salem, NC.

First published: 25 November 2018

How do we rigorously test for gene flow when the relationships among species are uncertain? In this issue of Molecular Ecology, Beckman, Benham, Cheviron, and Witt (2018) test for introgression in a group of Neotropical passerine siskins (Spinus; Figure 1) using an approach that accommodates phylogenetic uncertainty. Their analysis demonstrates that even when a single “true species tree” cannot be clearly resolved, conducting tests across a set of similar alternative species trees can still yield a coherent model of speciation and introgression. More broadly, this method reinforces that carefully characterizing the nuanced genetic boundaries during the early stages of speciation is more important than inferring a universal species phylogeny.

Figure 1. Hooded siskin (Spinus magellanicus, adult male) from central Peru. (Photo credit: M. Baumann)

Most phylogenomic analyses of species radiations have shown that observable ecological, reproductive or phenotypic separation develops long before genetic processes can produce sufficient quantities of lineage-specific alleles (i.e., phylogenetically informative characters). Similarly, the “effective migration” (Aeschbacher, Selby, Willis, & Coop, 2017) of introgressed alleles requires successful hybridization, recombination and (often) selection, and will therefore generally lag behind the actual migratory rate of individuals. Consequently, when speciation is rapid and gene flow is maintained (or develops) postspeciation, studies of introgression must measure the diffuse and opposing signals of differentiation and introgression simultaneously (recently reviewed in Degnan, 2018).

Practically, this genetic signal lag means a bifurcating “true species tree” may not be identifiable from sequence data in the early stages of divergence. Stochastic processes (e.g., shared ancestral polymorphisms, incomplete lineage sorting and demographic effects) along with directional processes (e.g., selection and introgressive gene flow) collectively cause loci in the genome to exhibit different evolutionary histories (Figure 2, and see examples cited in Beckman et al., 2018). The combination of alleles that are either concordant or discordant with the actual species relationships is a primary cause of phylogenetic uncertainty in phylogenomic data.

Figure 2. All five panels (a–e) show a distribution of gene trees with the same consensus tree (blue) but increasingly diffuse distributions of gene trees. When discordance is absent (a), introgression may be nonexistent or nonsignificant. In an idealized scenario (b), a clear set of species relationships by descent (black) is complemented by a clear set of species relationships due to introgression history (gold). More realistic scenarios (c, d) show poor separation within subclades but clear separation between them. In an extreme case (e), the boundaries between clades and species may be completely obscured. (Generated by Python script and plotted with DensiTree 2; Bouckaert & Heled, 2014)

Observations of phylogenetic uncertainty have prompted several responses in introgression testing, which I will broadly classify as (a) “no-tree,” (b) “strict-tree,” (c) “conservative-tree” and (d) “multiple-tree” methods. “No-tree” methods characterize allele data, most commonly visualized as a principal component analysis or hierarchical clustering. These methods do not specifically test for introgressive gene flow but simply illustrate the existence of population genetic diversity and structure. A common-sense temptation is to interpret individuals that appear genetically “intermediate” as introgressed, but substantial background variation and a host of other factors can cause false positives (Lawson, van Dorp, & Falush, 2016). While useful as a preliminary diagnostic visualization, this approach does not test against a prior expectation and cannot specifically support or reject gene flow.

“Strict-tree” methods establish a single consensus tree, regardless of the underlying uncertainty, and force analysis of all gene trees or allele patterns through this single tree. Rigid imposition of a universal phylogeny on all loci in a genome ignores the fact that not all loci evolve according to the same species-tree topology (a.k.a. the “Procrustean Bed”; Hahn & Nakhleh, 2015). Genome-wide diversity of phylogenies is readily apparent in most phylogenomic data sets (including Spinus). Once we accept that not all loci have evolved according to the same phylogeny, the question arises as to which tree should serve as the “true” reference for introgression analysis.

This apparent paradox is easily defused because we do not technically need one “true species tree” topology to test introgression. More fundamentally, we need to establish expectations of species’ genetic relationships as a baseline, so that unexpectedly high genetic similarity between species can be interpreted as evidence of gene flow. A unique, universal bifurcating tree topology is not strictly necessary to establish these expectations and does not provide a realistic picture of genomic diversification. Furthermore, consensus phylogenies inferred from conflicting loci will often manifest as an average of these various gene trees rather than a representation of the loudest voice or the true order of species divergences.

Most published phylogenomic data sets do not exhibit wildly different gene trees with no clades in common. Even a broad diversity of thousands of gene trees tends to cluster within a relatively small neighbourhood of similar trees compared to the massively larger space of possible trees (Figure 2). “Conservative-tree” methods analyse the distribution of gene trees to establish high-confidence branches, which are present in nearly all gene trees (e.g., Pease, Haak, Hahn, & Moyle, 2016). In this way, phylogenetic uncertainty is accommodated by testing for introgression only across genetic boundaries that are generally unambiguous. This is both statistically and biologically practical, since the method recognizes that it may be relevant to test for introgression only between individuals with clear genetic separation.
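The idea of identifying branches that are present in nearly all gene trees can be sketched in a few lines of Python. This is only an illustrative sketch and not the method of any cited study: the nested-tuple tree representation, the function names, the taxon labels and the 0.9 threshold are all assumptions made for the example.

```python
from collections import Counter


def clades(tree):
    """Return the clades of a rooted tree as a set of frozensets of tip names.

    A tree is a nested tuple of tip-name strings, e.g. (("A", "B"), "C").
    (Hypothetical representation chosen for this sketch.)
    """
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    tips(tree)
    return found


def clade_frequencies(gene_trees):
    """Fraction of gene trees in which each clade appears."""
    counts = Counter()
    for tree in gene_trees:
        counts.update(clades(tree))
    return {clade: n / len(gene_trees) for clade, n in counts.items()}


def high_confidence_clades(gene_trees, threshold=0.9):
    """Clades found in at least `threshold` of the gene trees.

    The clade containing every tip is trivially present and is excluded.
    """
    freqs = clade_frequencies(gene_trees)
    all_tips = frozenset().union(*freqs)
    return {c for c, f in freqs.items() if f >= threshold and c != all_tips}


# Toy gene trees for five hypothetical taxa (A-E).
gene_trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "D"), ("C", "E")),
    (("A", "B"), ("C", ("D", "E"))),
]

freqs = clade_frequencies(gene_trees)
stable = high_confidence_clades(gene_trees)
```

In this toy set, only the A+B clade appears in every gene tree, so it is the only boundary across which a conservative test would be run. The D+E clade appears in three of the four trees and falls below the assumed threshold.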

Beckman et al. (2018) propose a “multiple-tree” framework that is similar to “conservative-tree” approaches. This approach shifts the question from “What is the exact species tree?” to “Do small potential variations in species relationships affect the inference?” Rather than imposing a single strict species tree or focusing on high-confidence branches, the phylogeny is permuted within a set of observed trees to check for the effect of the tree itself on introgression inferences. This includes analyses that integrate the entire tree (SNAPP; Bryant, Bouckaert, Felsenstein, Rosenberg, & RoyChoudhury, 2012) and others that subset the tree, which can make them sensitive to small perturbations in topology (D-statistics; Durand, Patterson, Reich, & Slatkin, 2011).
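To make the dependence on the assumed tree concrete, the following minimal Python sketch computes the D-statistic, D = (ABBA − BABA) / (ABBA + BABA), for the same toy alignment under each of the three possible rooted arrangements of three ingroup taxa. It is not the pipeline used by Beckman et al. (2018). The taxon names, sequences and function name are hypothetical, the sequences are assumed to be aligned and free of gaps or missing data, and significance testing (e.g., a block jackknife) is omitted.

```python
def d_statistic(seqs, p1, p2, p3, outgroup):
    """D-statistic for the assumed topology (((p1, p2), p3), outgroup).

    The outgroup allele is treated as ancestral ("A") and the other
    allele as derived ("B"). Only biallelic sites are counted.
    """
    abba = baba = 0
    for a, b, c, o in zip(seqs[p1], seqs[p2], seqs[p3], seqs[outgroup]):
        if len({a, b, c, o}) != 2:
            continue  # skip invariant and multi-allelic sites
        if a == o and b == c and b != o:
            abba += 1  # derived allele shared by p2 and p3
        elif b == o and a == c and a != o:
            baba += 1  # derived allele shared by p1 and p3
    total = abba + baba
    return (abba - baba) / total if total else float("nan")


# Hypothetical toy alignment (one character per site).
seqs = {
    "sp1": "ACAGTGAC",
    "sp2": "GTCATGAC",
    "sp3": "GTCGACAT",
    "out": "ACAAACAT",
}

# Each tuple is (p1, p2, p3): the first two taxa are treated as sisters.
topologies = [
    ("sp1", "sp2", "sp3"),
    ("sp1", "sp3", "sp2"),
    ("sp2", "sp3", "sp1"),
]

results = {t: d_statistic(seqs, *t, "out") for t in topologies}
for (p1, p2, p3), d in results.items():
    print(f"(({p1},{p2}),{p3}): D = {d:+.2f}")
```

The same alignment gives a different D value, and even a different sign, depending on which pair of taxa is assumed to be sister. This is why repeating the test across the set of plausible trees is informative when the species tree is uncertain.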

This strategy is practical when the exact species relationships are unknown, but the general bounds and shape of the expected relationships are clear. The possible trees presented in Figure 6 of Beckman et al. (2018) are quite similar (i.e., more like the simulation shown here in Figure 2c than 2e). Relationships between the pairs of S. magellanicus populations and their relationship to S. atratus remain constant. In contrast, S. crassirostris and S. uropygialis show different relationships to each other and to the stable clades. This means that while the “true species tree” itself is not specifically known, the set of likely trees shares enough common clades to make hypothesis testing possible. As a counter-example, imagine a distribution of gene trees so wildly different that they share few common features (e.g., Figure 2e). In that case, introgression analyses would likely detect weak allele structure patterns rather than actual gene flow.

This analysis by Beckman et al. (2018) echoes a growing sense that treating the phylogeny as a fixed rigid parameter is no longer an effective or necessary strategy for many evolutionary genomic analyses. A gene tree distribution is not “noise” that prevents us from clearly seeing the “true species tree.” Instead, gene tree heterogeneity informs us about which clades have achieved genetic distinctiveness and which are still in the process of genetically sorting. Species tree estimation, branch support and demographic modelling are all transitioning towards an appreciation of phylogenomic diversity (e.g., Pease, Brown, Walker, Hinchliff, & Smith, 2018; Zhang, Rabiee, Sayyari, & Mirarab, 2018). Beckman et al. (2018) present an introgression testing approach that embraces a diverse distribution of gene trees to determine more nuanced genetic boundaries in a species complex followed by rigorous hypothesis testing for introgression. While there is still much development necessary to improve phylogenomic evolutionary and ecological models, we can see encouraging progress from approaches that treat phylogenomic diversity as an instrument and not an impediment.

AUTHOR CONTRIBUTIONS

J.B.P. conceived and wrote the manuscript.
