How do we rigorously test for gene flow when the relationships among species are uncertain? In this issue of Molecular Ecology, Beckman, Benham, Cheviron, and Witt (2018) test for introgression in a group of Neotropical passerine siskins (Spinus; Figure 1) using an approach that accommodates phylogenetic uncertainty. Their analysis demonstrates that even when a single “true species tree” cannot be clearly resolved, conducting tests using a set of similar alternative species trees can still infer a coherent model of speciation and introgression. More broadly, this method reinforces that carefully characterizing the nuanced genetic boundaries during the early stages of speciation is more important than inferring a universal species phylogeny.

**Figure 1.** Hooded siskin (*Spinus magellanicus*, adult male) from central Peru. (Photo credit: M. Baumann)

Most phylogenomic analyses of species radiations have shown that observable ecological, reproductive or phenotypic separation develops long before genetic processes can produce sufficient quantities of lineage-specific alleles (i.e., phylogenetically informative characters). Similarly, the “effective migration” (Aeschbacher, Selby, Willis, & Coop, 2017) of introgressed alleles requires successful hybridization, recombination and (often) selection, and therefore will generally lag behind the actual migratory rate of individuals. Consequently, when speciation is rapid and gene flow is maintained (or develops) postspeciation, studies of introgression must measure diffuse and opposing signals of differentiation and introgression simultaneously (recently reviewed in Degnan, 2018).

Practically, this genetic signal lag means a bifurcating “true species tree” may not be identifiable from sequence data in the early stages of divergence. Stochastic processes (e.g., shared ancestral polymorphisms, incomplete lineage sorting and demographic effects) along with directional processes (e.g., selection and introgressive gene flow) collectively cause loci in the genome to exhibit different evolutionary histories (Figure 2, and see examples cited in Beckman et al., 2018). The combination of alleles that are either concordant or discordant with the actual species relationships is a primary cause of phylogenetic uncertainty in phylogenomic data.

Observations of phylogenetic uncertainty have prompted several responses in introgression testing, which I will broadly classify as (a) “no-tree,” (b) “strict-tree,” (c) “conservative-tree” and (d) “multiple-tree” methods. “No-tree” methods characterize allele data, most commonly visualized as a principal component analysis or hierarchical clustering. These methods do not specifically test for introgressive gene flow but simply illustrate the existence of population genetic diversity and structure. A common-sense temptation is to interpret individuals that appear genetically “intermediate” as introgressed, but substantial background variation and a host of other factors can cause false positives (Lawson, van Dorp, & Falush, 2016). While useful as a preliminary diagnostic visualization, this approach does not test against a prior expectation and cannot specifically support or reject gene flow.

“Strict-tree” methods establish a single consensus tree, regardless of the underlying uncertainty, and force analysis of all gene trees or allele patterns through this single tree. Rigid imposition of a universal phylogeny on all loci in a genome ignores that not all loci evolve according to the same species-tree topology (a.k.a. the “Procrustean Bed”; Hahn & Nakhleh, 2015). Genome-wide diversity of phylogenies is readily apparent in most phylogenomic data sets (including Spinus). Once we accept that not all loci have evolved according to the same phylogeny, the question then arises as to which tree is “true” and should serve as the reference for introgression analysis.

This apparent paradox is easily defused because we do not technically need one “true species tree” topology to test introgression. More fundamentally, we need to establish expectations of species’ genetic relationships as a baseline to interpret unexpectedly high genetic similarity between species as evidence of gene flow. A unique, universal bifurcating tree topology is not strictly necessary in order to establish these expectations and does not provide a realistic picture of genomic diversification. Furthermore, consensus phylogenies inferred from conflicting loci will often manifest as a harmonic mean of these various gene trees rather than a representation of the loudest voice or the true order of species divergences.

Most published phylogenomic data sets do not exhibit wildly different gene trees with no clades in common. Even a broad diversity of thousands of gene trees tends to cluster within a relatively small neighbourhood of similar trees compared to the massively larger space of possible trees (Figure 2). “Conservative-tree” methods analyse the distribution of gene trees to establish high-confidence branches, which are present in nearly all gene trees (e.g., Pease, Haak, Hahn, & Moyle, 2016). Therefore, phylogenetic uncertainty is accommodated by testing for introgression only across genetic boundaries that are generally unambiguous. This is both statistically and biologically practical since this method recognizes that it may only be relevant to test for introgression between individuals with clear genetic separation.
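To make the idea of high-confidence branches concrete, the following minimal Python sketch tallies how often each clade appears across a set of rooted gene trees and keeps the clades above a frequency threshold. It is an illustration only, not the procedure used by Pease et al. (2016) or Beckman et al. (2018): the nested-tuple tree encoding, the function names, the taxon labels A to D, the toy gene tree counts and the 0.9 threshold are all assumptions made for this example.

```python
from collections import Counter


def clades(tree):
    """Return the non-root clades (frozensets of tip names) of a rooted tree.

    A tree is a nested tuple of tip names, e.g. (("A", "B"), ("C", "D")).
    This encoding is an assumption made for the sketch.
    """
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    all_tips = tips(tree)
    found.discard(all_tips)  # the clade containing every tip is uninformative
    return found


def clade_frequencies(gene_trees):
    """Fraction of gene trees in which each clade appears."""
    counts = Counter()
    for tree in gene_trees:
        counts.update(clades(tree))
    n = len(gene_trees)
    return {clade: count / n for clade, count in counts.items()}


def high_confidence_clades(gene_trees, threshold=0.9):
    """Clades present in at least `threshold` of the gene trees."""
    freqs = clade_frequencies(gene_trees)
    return {clade for clade, freq in freqs.items() if freq >= threshold}


# Hypothetical toy distribution of ten gene trees over four taxa.
gene_trees = (
    [((("A", "B"), "C"), "D")] * 6
    + [((("A", "B"), "D"), "C")] * 3
    + [(("A", "B"), ("C", "D"))] * 1
)
freqs = clade_frequencies(gene_trees)
stable = high_confidence_clades(gene_trees, threshold=0.9)
for clade in stable:
    print(sorted(clade), freqs[clade])
```

In this toy set, only the pairing of A and B recurs in every gene tree, so it is the only boundary a conservative-tree test would use; the placement of C and D varies among gene trees and would be left untested.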

Beckman et al. (2018) propose a “multiple-tree” framework that is similar to “conservative-tree” approaches. This approach shifts the question from “What is the exact species tree?” to “Do small potential variations in species relationships affect the inference?” Rather than imposing a single strict species tree or focusing on high-confidence branches, the phylogeny is permuted within a set of observed trees to check for the effect of the tree itself on introgression inferences. This includes analyses that integrate the entire tree (SNAPP; Bryant, Bouckaert, Felsenstein, Rosenberg, & RoyChoudhury, 2012) and others that subset the tree, making them largely immune to small perturbations in topology (D-statistics; Durand, Patterson, Reich, & Slatkin, 2011).
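The logic of repeating a test under alternative topologies can be sketched with the D-statistic of Durand et al. (2011), which for the topology (((P1, P2), P3), O) compares counts of the two discordant site patterns: D = (nABBA − nBABA) / (nABBA + nBABA). The Python sketch below is a simplified illustration, not the pipeline of Beckman et al. (2018). It assumes one sequence per taxon, treats the outgroup allele as ancestral, and omits the significance test (typically a block jackknife) that a real analysis needs. The taxon names, function names and toy site counts are hypothetical.

```python
def d_statistic(sites, p1, p2, p3, outgroup):
    """D for the topology (((p1, p2), p3), outgroup).

    `sites` is a list of dicts mapping taxon name -> allele.
    The outgroup allele is treated as ancestral ("A"); only ABBA and
    BABA sites contribute to the statistic.
    """
    abba = baba = 0
    for site in sites:
        anc = site[outgroup]
        a1, a2, a3 = site[p1], site[p2], site[p3]
        if a3 == anc:
            continue  # p3 must carry the derived allele
        if a1 == anc and a2 == a3:
            abba += 1  # p2 shares the derived allele with p3
        elif a2 == anc and a1 == a3:
            baba += 1  # p1 shares the derived allele with p3
    total = abba + baba
    return (abba - baba) / total if total else float("nan")


def d_across_trees(sites, arrangements):
    """Compute D under each candidate (p1, p2, p3, outgroup) arrangement."""
    return {arr: d_statistic(sites, *arr) for arr in arrangements}


def make_sites(pattern, n):
    """Build n identical toy sites; 0 = ancestral allele, 1 = derived."""
    names = ("pop1", "pop2", "pop3", "out")
    return [dict(zip(names, pattern)) for _ in range(n)]


# Hypothetical toy data: 6 ABBA, 2 BABA and 3 BBAA sites.
sites = make_sites((0, 1, 1, 0), 6) + make_sites((1, 0, 1, 0), 2) + make_sites((1, 1, 0, 0), 3)

# Two candidate trees that differ in which populations are sisters.
arrangements = [
    ("pop1", "pop2", "pop3", "out"),
    ("pop1", "pop3", "pop2", "out"),
]
results = d_across_trees(sites, arrangements)
for arrangement, d in results.items():
    print(arrangement, round(d, 3))
```

If the sign and rough magnitude of D hold up across the candidate arrangements, the inference does not hinge on which tree is taken as the reference; if they flip, the tree itself is driving the result.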

This strategy is practical when the exact species relationships are unknown, but the general bounds and shape of those expectations are present. The possible trees presented in Figure 6 of Beckman et al. (2018) are quite similar (i.e., more like the simulation shown here in Figure 2c than 2e). Relationships between the pairs of S. magellanicus populations and their relationship to S. atratus remain constant. In contrast, S. crassirostris and S. uropygialis show different relationships to each other and to the stable clades. This means that while the “true species tree” itself is not specifically known, the set of likely trees shares enough common clades to make hypothesis testing possible. As a counter-example, imagine a distribution of gene trees so wildly different that they share few common features (e.g., Figure 2e); introgression analyses on such data would reflect weak patterns of allele structure rather than actual gene flow.

This analysis by Beckman et al. (2018) echoes a growing sense that treating the phylogeny as a fixed rigid parameter is no longer an effective or necessary strategy for many evolutionary genomic analyses. A gene tree distribution is not “noise” that prevents us from clearly seeing the “true species tree.” Instead, gene tree heterogeneity informs us about which clades have achieved genetic distinctiveness and which are still in the process of genetically sorting. Species tree estimation, branch support and demographic modelling are all transitioning towards an appreciation of phylogenomic diversity (e.g., Pease, Brown, Walker, Hinchliff, & Smith, 2018; Zhang, Rabiee, Sayyari, & Mirarab, 2018). Beckman et al. (2018) present an introgression testing approach that embraces a diverse distribution of gene trees to determine more nuanced genetic boundaries in a species complex followed by rigorous hypothesis testing for introgression. While there is still much development necessary to improve phylogenomic evolutionary and ecological models, we can see encouraging progress from approaches that treat phylogenomic diversity as an instrument and not an impediment.

AUTHOR CONTRIBUTIONS

J.B.P. conceived and wrote the manuscript.

REFERENCES

Aeschbacher, S., Selby, J. P., Willis, J. H., & Coop, G. (2017). Population-genomic inference of the strength and timing of selection against gene flow. Proceedings of the National Academy of Sciences of the United States of America, 114, 7061–7066. https://doi.org/10.1073/pnas.1616755114
Beckman, E. J., Benham, P. M., Cheviron, Z. A., & Witt, C. C. (2018). Detecting introgression despite phylogenetic uncertainty: The case of the South American siskins. Molecular Ecology, 27(22), 4350–4367. https://doi.org/10.1111/mec.14795
Bouckaert, R., & Heled, J. (2014). DensiTree 2: Seeing trees through the forest. bioRxiv. https://doi.org/10.1101/012401
Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N. A., & RoyChoudhury, A. (2012). Inferring species trees directly from biallelic genetic markers: Bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution, 29, 1917–1932. https://doi.org/10.1093/molbev/mss086
Degnan, J. H. (2018). Modeling hybridization under the network multispecies coalescent. Systematic Biology, 67, 786–799. https://doi.org/10.1093/sysbio/syy040
Durand, E. Y., Patterson, N., Reich, D., & Slatkin, M. (2011). Testing for ancient admixture between closely related populations. Molecular Biology and Evolution, 28, 2239–2252. https://doi.org/10.1093/molbev/msr048
Hahn, M. W., & Nakhleh, L. (2015). Irrational exuberance for resolved species trees. Evolution, 70, 7–17. https://doi.org/10.1111/evo.12832
Lawson, D., van Dorp, L., & Falush, D. (2016). A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots. bioRxiv. https://doi.org/10.1101/066431
Pease, J. B., Brown, J. W., Walker, J. F., Hinchliff, C. E., & Smith, S. A. (2018). Quartet sampling distinguishes lack of support from conflicting support in the green plant tree of life. American Journal of Botany, 105, 385–403. https://doi.org/10.1002/ajb2.1016
Pease, J. B., Haak, D. C., Hahn, M. W., & Moyle, L. C. (2016). Phylogenomics reveals three sources of adaptive variation during a rapid radiation. PLoS Biology, 14, e1002379. https://doi.org/10.1371/journal.pbio.1002379
Zhang, C., Rabiee, M., Sayyari, E., & Mirarab, S. (2018). ASTRAL-III: Polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics, 19, 153. https://doi.org/10.1186/s12859-018-2129-y

Volume 27, Issue 22, November 2018, Pages 4347-4349

Why phylogenomic uncertainty enhances introgression analyses
