Genetic signature of recent glaciation on populations of a near-shore marine fish species (Syngnathus leptorhynchus)
Abstract
Continental glaciation has played a major role in shaping the present-day phylogeography of freshwater and terrestrial species in the Northern Hemisphere. Recent work suggests that coastal glaciation during ice ages may have also had a significant impact on marine species. The bay pipefish, Syngnathus leptorhynchus, is a near-shore Pacific coast fish species with an exceptionally wide latitudinal distribution, ranging from Bahia Santa Maria, Baja California to Prince William Sound, Alaska. Survey data indicate that S. leptorhynchus is experiencing a range expansion at the northern limit of its range, consistent with colonization from southern populations. The present study uses six novel microsatellite markers and mitochondrial DNA (mtDNA) sequence data to study the present-day population genetic structure of four coastal populations of S. leptorhynchus. Deficits in mtDNA and nuclear DNA diversity in northern populations from regions glaciated during the last glacial maximum (LGM) [c. 18 000 years before present (bp)] suggest that these populations were affected by glacial events. Direct estimates of population divergence times derived from both isolation and isolation-with-migration models of evolution are also consistent with a postglacial phylogenetic history of populations north of the LGM. Sequence data further indicate that a population at the southern end of the species range has been separated from the three northern populations since long before the last interglacial event (c. 130 000 years bp), suggesting that topographical features along the Pacific coast may maintain population separation in regions unimpacted by coastal glaciation.
Introduction
While the impact of continental glaciation on the historical distribution of terrestrial, freshwater and anadromous species is firmly established in the literature (Bernatchez & Wilson 1998; Hewitt 2000), little is known about the consequences of ice ages on marine animals. Although pelagic marine populations may be relatively robust to range restrictions caused by glacial advance, the loss of near-shore reproductive habitat due to glaciation may contribute to local extinctions and/or range retreats in some species. Intertidal animals are expected to be especially sensitive to near-shore habitat loss and these species may have postglacial recolonization histories similar to those identified in terrestrial and freshwater groups. While studies in this area remain limited, recent work suggests that glacial movements may also have a significant impact on near-shore marine animals (Hellberg et al. 2001; Hickerson & Cunningham 2005).
At the height of the most recent continental glaciation 15 000–18 000 years ago, the Cordilleran ice sheet covered much of the northeastern Pacific coastline [i.e. last glacial maximum (LGM); Fig. 1; Prest 1969; Dyke & Prest 1987]. At that time, many populations of northern coastal fishes would have experienced local extinctions and the ranges of many species would have been restricted to unglaciated regions in northern Alaska (i.e. Beringia) and south of the continental ice sheet. Several recent studies support the idea that isolated coastal refugia within the glaciated region (Pielou 1991) may have also harboured species during the last ice age (Hickerson & Ross 2001; Marko 2004; Hickerson & Cunningham 2005).

Map of specimen collection localities along the Pacific coast of North America. The limits of the contemporary distribution of Syngnathus leptorhynchus (Bahía Santa Maria, Mexico to Prince William Sound, Alaska) are indicated. Point Conception, an important biogeographic boundary, is also highlighted. Shading reflects the maximum extent of the Cordilleran glacial ice sheet (15 000–18 000 years bp; Prest 1969; Dyke & Prest 1987).
While demographics of newly colonized populations may stabilize relatively rapidly, a strong signature of historical phylogeography remains at the genetic level long after population colonization (Avise 2000; Excoffier 2004). Northern populations of marine species that have been unimpacted by glacial movements in the Pleistocene are expected to have levels of genetic diversity comparable with that in regions south of the glacial margin. In contrast, if coastal glaciation has limited the availability of northern habitat, contemporary populations in regions impacted by glaciation will have reduced genetic diversity and a phylogenetic history consistent with colonization from southern refugia, with divergence times less than the LGM (i.e. ‘southern refugium hypothesis’; Hewitt 1996). If these regions have been recolonized from isolated coastal refugia in the glaciated region, they should have a genetic history distinct from populations south of the glacial margin and divergence times greater than the previous interglaciation (c. 130 000 years bp) when refugial populations became isolated (i.e. ‘secondary refugium hypothesis’). By comparing and contrasting patterns of genetic diversity in contemporary populations spanning the glacial margin, large scale impacts of coastal glaciation can be identified.
Topographic and oceanographic boundaries also play important roles in limiting species distributions. Three faunal provinces are recognized along the eastern Pacific coast of North America: San Diego, ranging from Baja California north to Point Conception; Oregon, ranging from Point Conception to the Alaskan Peninsula; and Aleutian, north from the Oregon province to the eastern Bering Sea (Briggs 1995). Point Conception, California (Fig. 1), has long been recognized as a key biogeographic feature for Pacific coast marine species. Although the majority of warm water species are restricted to habitats south of Point Conception, cold water animals often have ranges that extend south of this boundary (Briggs 1995; Jacobs et al. 2004). While the importance of Point Conception as a species-level boundary is clear, recent molecular work has questioned the importance of this feature on the phylogeography of species with distributions that span the region (Burton 1998).
The genus Syngnathus is one of the most speciose groups of syngnathid fishes (seahorses and pipefishes), with 32 recognized species worldwide (Froese & Pauly 2000). Syngnathus pipefish inhabit near-shore marine habitats in temperate and tropical waters. While typically associated with eelgrass habitats, some species of Syngnathus inhabit kelp beds and offshore reefs. Southern California is a centre of Pacific coast syngnathid biodiversity, and is home to five species of Syngnathus (Fritzsche 1980). Molecular evidence suggests that eastern Pacific Syngnathus spp. are basal members of the genus, which has spread to freshwater and marine environments in the Atlantic Ocean and Europe (Wilson et al. 2001, 2003).
In contrast to the majority of Californian syngnathids that are geographically restricted, Syngnathus leptorhynchus is distributed from Bahia Santa Maria, Baja California to Prince William Sound, Alaska and has one of the widest distributions of any Syngnathus species (Fig. 1). Due to the exceptionally wide distribution of this species, there has been debate as to whether it represents a single species or several subspecies. Based on meristic evidence, Herald (1941) suggested that S. leptorhynchus was limited to habitats south of Morro Bay, near Point Conception. North of this boundary, Syngnathus griseolineatus was defined on the basis of an increased dorsal fin ray count.
Although Fritzsche (1980) also observed an increased dorsal fin ray count in populations of Syngnathus north of Morro Bay, he found that meristic characters were inconsistent and strongly influenced by local fluctuations in temperature. Due to the mosaic nature and local plasticity of meristic counts, Fritzsche synonymized S. griseolineatus with S. leptorhynchus, concluding that while high levels of local population structuring were possible within the species, the natural range of S. leptorhynchus extended from Alaska to Baja California.
Genetic data are critical for assessing species boundaries. While morphological characters can be strongly influenced by genotype–environment interactions (i.e. phenotypic plasticity), neutral genetic variation provides an unbiased estimate of the degree of divergence between populations and species. If populations of Syngnathus north and south of the Point Conception boundary have been genetically isolated from one another for sufficiently long to allow complete assortment of genetic diversity, reciprocal monophyly of genetic data would support the existence of two separate species. If, however, topographic and/or ecological boundaries in the Point Conception region are insufficient to completely limit gene flow and/or genetic divergence between lineages is relatively recent, northern and southern lineages may not warrant species status.
The present study uses mitochondrial DNA (mtDNA) and microsatellite markers to complement previous morphological and survey work, investigating whether there is any evidence for broad-scale population subdivision in S. leptorhynchus across recognized faunal boundaries and whether glacial activity has influenced the present-day genetic structure of this near-shore inhabitant. The restrictive role of glacial and geological boundaries on historical and contemporary dispersal is expected to leave a prominent signature on contemporary populations.
Materials and methods
Sample collection
Syngnathus leptorhynchus were collected from eelgrass beds at four sites along the North American Pacific coast in May and June 2003 (Fig. 1). Two populations were collected in the region covered with ice at the LGM (AK and WA) and two populations were collected south of the glacial margin (OR and CA). At the three southern sites, a 9 m beach seine with a 2.5 mm mesh was dragged by hand 5–10 times along a 30 m transect in 0.75–1.5 m of water. The AK population was sampled in four hauls by a boat-drawn 37 m variable-mesh beach seine with a 9 m bunt of 3.2 mm square mesh. Standard length, sex and meristic counts were recorded. Whole animals were stored frozen at −60 °C prior to DNA extraction.
DNA extraction and yield determination
Genomic DNA was extracted from fin clips with DNeasy 96 Tissue Kits on a BioRobot 8000 (QIAGEN) following the manufacturer's recommendations. DNA samples were diluted to a common concentration of 2 ng/mL prior to performing microsatellite and DNA sequencing analyses.
Microsatellite development
A suite of microsatellite loci was identified through the construction of a genomic library highly enriched for ACn repeats (Tenzer et al. 1999; Garner 2000). Genomic DNA was extracted from a single individual of S. leptorhynchus from Padilla Bay, Washington, using a methylene chloride:isoamyl alcohol extraction protocol (Claxton et al. 1997) and digested into 600- to 1000-bp fragments with DpnII (New England Biolabs).
A specific double-stranded oligonucleotide linker was formed by annealing Er1Bh1Blunt (5′-CGGAATTCAGTGGATCCTGCC-3′) and Er1Bh1Sticky (5′-GATCGGCAGGATCCACTGAATTCCG-3′) (Brinkmann & Hrbek, personal communication). This linker was ligated to the digested DNA fraction with T4 ligase (Sigma-Aldrich). The size fraction-linker ligation was enriched through polymerase chain reaction (PCR) amplification using the Er1Bh1Blunt primer.
An AC15 probe was biotinylated and bound to M-280 Streptavidin-labelled magnetic beads (Dynal Biotech). The enriched DNA library was hybridized to the probe-bead complex at 65 °C. DNA from the hybridized beads was cloned into the pCR2.1-TOPO vector (Invitrogen) and transformed into chemically competent TOPO10 Escherichia coli (Invitrogen), resulting in a DNA library highly enriched for ACn repeats. Enrichment efficiency was evaluated by PCR amplification with the AC15 and Er1Bh1Blunt primers. Clones containing an insert without an AC microsatellite yielded a single PCR product while clones containing a microsatellite repeat produced two PCR products. Positive clones were amplified with M13 vector primers and sequenced on an ABI 3100 automated sequencer (Applied Biosystems).
Microsatellite amplification
PCR primers were designed for 10 microsatellite markers using Fast PCR version 3.2 (Kalendar 2003). Annealing and amplification conditions were standardized and potential interlocus primer interactions were minimized in an effort to allow for multiplexing microsatellite loci in a single PCR. Following a preliminary test of unlabelled primer sets, six microsatellite loci were chosen for further optimization. The forward primer of each of these microsatellites was fluorescently labelled (Table 1) and all iterations of microsatellite multiplexing reactions were tested (data not shown). Four of the microsatellite loci multiplexed well (Slep3, Slep6, Slep7, Slep9), while the two remaining microsatellite loci were PCR amplified independently and added to the multiplex set prior to analysis on an ABI 3100 automated sequencer (Applied Biosystems).
Locus | Alleles | HO | HE | Primer sequences (5′−3′) | Cloned repeat motif | Size of cloned product (bp) | Fluorescent label |
---|---|---|---|---|---|---|---|
Slep3 | 48 | 0.949 | 0.968 | F: AAGGATGCATTGCTTCATGC; R: AGTCATTACCTGGCCCATTG | (TG)42 | 160 | VIC |
Slep6 | 41 | 0.900 | 0.960 | F: CGCGTGTGTGCAATAATAAAA; R: GATCAATACCAGGTCTTTATCTGGC | (TG)33 | 289 | VIC |
Slep7 | 63 | 0.958 | 0.979 | F: GGTGTTAACATCACGCACCA; R: TTGGCTGTCACCAGTTTGTC | (TG)20 | 194 | FAM |
Slep8 | 34 | 0.805 | 0.913 | F: AACTCTTCCAACAGAAATCTCTCAA; R: CGAGGATTATTTCCCCCATT | (GT)43 | 192 | PET |
Slep9 | 40 | 0.944 | 0.961 | F: AAGTGAGTCATTTGCTGCTATGG; R: CGACAGACAGGTCAAGATTTGG | (GT)40 | 321 | PET |
Slep10 | 51 | 0.969 | 0.965 | F: CTGGCAGGGATGAAAAGATG; R: TGCATACCTGTGGCCATCACC | (TG)25 | 285 | NED |
The amplification conditions for the multiplexed microsatellites consisted of 10-µL reactions containing 1 U Taq (Promega), 1 µL 10× reaction buffer (Promega), 2.5 mM MgCl2 (Promega), 0.4 µM dNTPs (Promega), 150–260 nM primers and 6 ng DNA. Individual loci were also amplified in 10-µL cocktails, with Taq reduced to 0.4 U and primer concentrations standardized at 200 nM. All PCRs were performed in MJ DNA Engine Tetrad machines with an initial 96 °C denaturation (1 min) followed by 30 cycles of 96 °C (1 min), 60 °C (1 min), 72 °C (1 min). Each reaction ended with a final 5-min extension at 72 °C.
PCRs were diluted into a single plate for genotyping. Multiplexed reactions were diluted 1:10 and individual reactions were diluted 1:20. One microlitre of the bulk PCR dilutions was added to a plate containing 0.09 µL of the GeneScan 500 LIZ genotyping standard and 9.91 µL of HiDi Formamide (Applied Biosystems). Reactions were genotyped on an ABI 3100 sequencer and automatically scored using genotyper 3.7 (Applied Biosystems). Scoring of microsatellite alleles was verified by eye for each sample.
Mitochondrial DNA sequencing
In a preliminary analysis of population genetic variation, 16S rDNA data were collected for four randomly selected individuals from CA, WA and AK and three individuals from OR. S. auliscus was included as an outgroup. A 579-bp fragment of 16S rDNA was amplified and sequenced under previously published reaction conditions (Wilson et al. 2001). For each individual, 546 bp of sequence data were collected.
Control region sequences were collected for a random subset of 20 individuals from each of the four populations. A 480-bp fragment of the mitochondrial control region was amplified with primers L15926 (Kocher et al. 1989) and H16498 (Meyer et al. 1990). PCRs were performed in 25-µL volumes containing 1 U Taq, 2.5 µL 10× reaction buffer, 2.5 mM MgCl2, 0.4 µM dNTPs, 500 nM primers and 15 ng DNA. PCR amplification included an initial 96 °C denaturation (5 min) followed by 40 cycles of 96 °C (1 min), 48 °C (1 min), 72 °C (1 min). PCRs were visualized on a 1.5% agarose gel. Reactions were purified with Montage PCRµ96 Filter Plates (Millipore) and eluted in 50 µL.
Ten-microlitre sequencing reactions included 2–3 µL of cleaned PCR product, 1 µM primer L15926 and 1 µL of BigDye v3.1 sequencing mix (Applied Biosystems). PCR cycling conditions included 30 cycles of 96 °C (10 s), 50 °C (5 s), 60 °C (4 min). Sequencing reactions were purified with CleanSeq magnetic beads (Agencourt) following the manufacturer's recommendations. Reactions were resuspended in 40 µL ddH2O and sequenced on an ABI 3100.
Statistical analyses
Microsatellites. Observed and expected heterozygosities of microsatellites were calculated using excel microsatellite toolkit version 3.1 (Park 2001). Exact tests of linkage disequilibrium among microsatellite loci and probability tests for deviations from Hardy–Weinberg equilibrium were conducted using genepop version 3.1d (Raymond & Rousset 1995). Estimates of FIS were calculated according to Weir & Cockerham (1984). The significance of these tests was estimated by a 10 000 step, 1000 iteration, Markov chain series of permutations (10 000 dememorization steps).
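As an illustration of the heterozygosity statistics reported in Table 1, the following minimal sketch computes observed and expected heterozygosity for a single locus. It assumes diploid genotypes and Nei's unbiased estimator of expected heterozygosity; the function name and the toy genotypes are hypothetical, and the sketch is not the code of the toolkit cited above.

```python
from collections import Counter


def heterozygosities(genotypes):
    """Return (H_O, H_E) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    H_E is assumed here to be Nei's unbiased estimator,
    2n / (2n - 1) * (1 - sum of squared allele frequencies).
    """
    n = len(genotypes)
    if n < 1:
        raise ValueError("need at least one genotype")
    # Observed heterozygosity: proportion of individuals with two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    # Allele frequencies over all 2n gene copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    copies = 2 * n
    sum_p2 = sum((c / copies) ** 2 for c in counts.values())
    h_exp = (copies / (copies - 1)) * (1.0 - sum_p2)
    return h_obs, h_exp


# Toy data (hypothetical allele sizes, not from the study).
toy = [(160, 162), (160, 160), (162, 164), (164, 164)]
h_obs, h_exp = heterozygosities(toy)
print(h_obs, h_exp)
```

A value of H O below H E at a locus, as at Slep8 in Table 1, indicates a deficit of heterozygotes relative to Hardy–Weinberg expectations.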
Pairwise FST and RST estimates between populations were calculated with arlequin version 2.1 (Schneider et al. 1997). Departures of FST and RST from the null hypothesis of panmixia were evaluated with a permutation test (20 000 iterations). To allow for direct comparison among population samples, standardized allelic richness was calculated following the rarefaction method implemented in Petit et al. (1998). A permutation procedure implemented by poptools version 2.6.6 (Hood 2005) was used to detect deviations from the null expectation of random allelic distribution (10 000 iterations).
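The rarefaction step can be sketched as follows. The function returns the expected number of distinct alleles in a random subsample of n gene copies, which is the standard rarefaction expectation; the function name and toy counts are hypothetical, and this is an illustrative sketch rather than the implementation used in the study.

```python
from math import comb


def rarefied_richness(allele_counts, n):
    """Expected number of distinct alleles in a random subsample of n gene copies.

    allele_counts: number of copies observed for each allele in the full sample.
    n: standardized subsample size (gene copies), at most the full sample size.
    """
    total = sum(allele_counts)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the total number of gene copies")
    denom = comb(total, n)
    # For each allele, 1 - P(allele is absent from the subsample).
    # math.comb returns 0 when n exceeds total - c, so common alleles count fully.
    return sum(1 - comb(total - c, n) / denom for c in allele_counts)


# Toy example (hypothetical counts): one common and one rare allele.
print(rarefied_richness([3, 1], 2))
```

Standardizing every population to the size of the smallest sample makes allele counts comparable; in Table 2 the observed and standardized values are identical for CA, which suggests (as an inference, not a statement from the text) that CA served as the reference sample size.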
A model-based clustering approach implemented by structure version 2.1 (Pritchard et al. 2000) was used to identify population affinities of individual samples. Three independent runs, incorporating burn-ins of 50 000 Monte Carlo Markov chain (MCMC) replicates, followed by 10⁶ replicates of data collection, were performed for all analyses. Figures were prepared from structure output with distruct (Rosenberg 2004).
An admixture model without geographic data was first used to estimate the number of natural populations sampled (K). Five replicated runs, with K fixed at 1–5 populations, were conducted. Posterior probabilities of the data set from each run were calculated, and Pr(K) was estimated according to Bayes’ rule (Pritchard et al. 2000). Admixture proportions were estimated from the data set and fixed for all populations.
Second, geographic priors were incorporated into the analysis, incorporating sampling locality information for each of the four study populations. This approach allows the identification of probable recent migrants into each of the populations. The probability of recent immigration (i.e. within the past two generations) to each population was fixed at 10%.
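The Bayes' rule step used to estimate Pr(K) in the admixture analysis can be sketched as below, assuming a uniform prior over the candidate values of K. The log-likelihood values in the example are hypothetical placeholders, not output from this study.

```python
from math import exp


def posterior_k(log_likelihoods):
    """Posterior probability of each K under a uniform prior.

    log_likelihoods: dict mapping K to the estimated ln Pr(X | K)
    reported by the clustering run for that K.
    """
    # Subtract the maximum before exponentiating to avoid underflow.
    best = max(log_likelihoods.values())
    weights = {k: exp(v - best) for k, v in log_likelihoods.items()}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}


# Hypothetical ln Pr(X | K) values for K = 1..5 (illustration only).
example = {1: -9000.0, 2: -8900.0, 3: -8500.0, 4: -8650.0, 5: -8700.0}
for k, p in sorted(posterior_k(example).items()):
    print(k, round(p, 4))
```

Because the likelihoods are exponentiated, even modest differences in ln Pr(X | K) concentrate nearly all posterior mass on a single K, which is why posterior probabilities of 0 or 1 are commonly reported.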
Mitochondrial DNA. Intrapopulation gene diversity and nucleotide diversity statistics were calculated using arlequin version 2.1 (Schneider et al. 1997). Observed allelic diversity of mitochondrial DNA haplotypes at the population level was compared to the null hypothesis of panmixia with a permutation test (10 000 permutations), as implemented by poptools version 2.6.6 (Hood 2005).
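The permutation test of haplotype diversity can be illustrated with a short resampling sketch. It assumes that, under panmixia, a population sample is a random draw without replacement from the pooled haplotypes of all populations; the function name, the seed and the toy data are hypothetical, and this is a stand-in for, not a reproduction of, the procedure run in poptools.

```python
import random


def haplotype_deficit_p(pooled, sample_size, observed, n_perm=10_000, seed=1):
    """One-sided permutation P-value for a haplotype deficit.

    pooled: haplotype labels for every sequenced individual, all populations.
    sample_size: number of individuals sequenced in the focal population.
    observed: number of distinct haplotypes seen in the focal population.
    Returns the fraction of random draws with <= observed distinct haplotypes.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        draw = rng.sample(pooled, sample_size)  # sampling without replacement
        if len(set(draw)) <= observed:
            hits += 1
    return hits / n_perm


# Toy pool (hypothetical): 80 individuals carrying 17 haplotype labels.
pool = [i % 17 for i in range(80)]
print(haplotype_deficit_p(pool, 20, 2, n_perm=2000))
```

A small P-value indicates that the focal population carries fewer haplotypes than expected if individuals were exchanged freely among all sampled sites.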
Parsimony-based haplotype networks of 16S and control region haplotypes were constructed with tcs version 1.17 (Clement et al. 2000), with the connection limit set at 95% and gaps coded as missing data. The haplotype networks were calculated with and without S. auliscus as an outgroup.
The best-fit model of sequence evolution and model parameters for the mtDNA control region data set was assessed under the Akaike information criterion (AIC) using modeltest version 3.4 (Posada & Crandall 1998). Common haplotypes were removed from the data set using collapse version 1.2 (Posada 2005) and the maximum-likelihood topology was determined using paup version 4b10 (Swofford 2000). Initial tree topologies estimated using distance, maximum-parsimony and quartet-puzzling likelihood methods were used as starting trees in a maximum-likelihood analysis with parameters fixed at starting values. Starting branch lengths were estimated using Rogers–Swofford approximation and optimized following Newton–Raphson optimization (20 branch smoothing passes, Δ = 1 × 10⁻⁶). The optimal ML tree was identical in all three replicates.
A pairwise HKY distance matrix (Hasegawa et al. 1985) was generated for the control region data set by paup version 4b10 (Swofford 2000) and used to calculate population ΦST estimates according to Excoffier et al. (1992). ΦST is an analogue of traditional FST estimates, and provides a measure of hierarchical subdivision within and between populations using pairwise distances (Excoffier et al. 1992). The significance of ΦST estimates was evaluated using a permutation test (20 000 iterations) as implemented in poptools version 2.6.6 (Hood 2005).
Molecular dating
The likelihood of the control region tree was also calculated with and without the constraint of rate constancy using paml version 3.14 (Yang 2000). The estimated time of divergence of S. leptorhynchus from S. auliscus, taken from a fossil-based analysis of stem, loop and total 16S evolution (Wilson 2006), was used to calculate the rate of molecular evolution for the control region fragment sequenced.
Intrapopulation divergence of control region haplotypes was calculated using HKY distances. Net interpopulation divergence was calculated according to the method of Nei & Li (1979), by subtracting the average intrapopulation divergence from the estimated interpopulation divergence. The time since population divergence was inferred using the calibrated control region rate of molecular evolution.
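The net divergence calculation and its conversion to time can be sketched as follows. The net divergence formula is that of Nei & Li (1979); the conversion to years depends on whether the calibrated rate is expressed per lineage or as pairwise divergence, which is not stated explicitly here, so that choice is exposed as a labelled assumption. Function names and input values are hypothetical.

```python
def net_divergence(d_xy, d_x, d_y):
    """Net interpopulation divergence, d_A = d_XY - (d_X + d_Y) / 2.

    d_xy: mean pairwise distance between sequences from populations X and Y.
    d_x, d_y: mean pairwise distances within X and within Y.
    """
    return d_xy - (d_x + d_y) / 2.0


def divergence_time_years(d_a, rate_per_myr, per_lineage=False):
    """Convert net divergence to years.

    rate_per_myr: substitutions per site per million years (as a proportion,
    e.g. 0.0096 for 0.96%).
    per_lineage: ASSUMPTION flag. If True the rate applies to a single lineage
    and divergence accumulates along both lineages (T = d_A / (2 * rate));
    if False the rate is treated as pairwise divergence per unit time.
    """
    factor = 2.0 if per_lineage else 1.0
    return d_a / (factor * rate_per_myr) * 1e6


# Hypothetical distances for illustration only.
d_a = net_divergence(d_xy=0.010, d_x=0.004, d_y=0.002)
print(d_a, divergence_time_years(d_a, 0.0096))
```

Subtracting the within-population term removes the polymorphism that was already present in the ancestral population, so that only divergence accumulated since the split is converted to time.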
Isolation with migration
An isolation-with-migration (IM) model is particularly well-suited for studying divergence between recently separated populations, as it explicitly addresses the nonequilibrium state of these populations. In an effort to gain a better understanding of effective population sizes, divergence and the direction of migration between populations, an MCMC likelihood model incorporating migration (Hey & Nielsen 2004) was used.
Population pairs were analysed using IM (Hey & Nielsen 2004) under an HKY model of sequence evolution and population parameters (i.e. HiPt = parameter values with highest posterior probability) were estimated. Parameters estimated included θ = 4Nu for ancestral and descendant populations (where N = effective population size and u = mutation rate per year for the total gene), m = m/u for the two descendant populations (where m1 = rate of migration from population 2 to population 1) and t = tu (where t = time since population subdivision). Following a 1 × 10⁶ step burn-in, posterior probabilities of parameter estimates were calculated during 50 × 10⁶ cycles of data collection. Under these conditions, autocorrelations among parameters were low (P < 0.02 for all parameters) and effective sample sizes of all parameters exceeded 300, suggesting adequate exploration of the sample space.
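The scaled IM parameters can be converted to demographic units by rearranging the definitions given above. The sketch below takes those definitions at face value (θ = 4Nu, scaled m = m/u, scaled t = tu, with u the mutation rate for the whole fragment); the function name and the numerical inputs are hypothetical and serve only to show the arithmetic.

```python
def im_to_demographic(theta, m_scaled, t_scaled, u):
    """Convert scaled IM estimates to demographic quantities.

    theta: 4 * N * u for a population.
    m_scaled: migration rate divided by u.
    t_scaled: divergence time multiplied by u.
    u: mutation rate for the whole fragment (assumed known from a calibration).
    Returns (N, population migration rate 2Nm, divergence time).
    """
    if u <= 0:
        raise ValueError("u must be positive")
    n_eff = theta / (4.0 * u)            # from theta = 4Nu
    two_nm = theta * m_scaled / 2.0      # (4Nu)(m/u)/2 = 2Nm, independent of u
    t_years = t_scaled / u               # from scaled t = t * u
    return n_eff, two_nm, t_years


# Hypothetical values, not estimates from the study.
print(im_to_demographic(theta=8.0, m_scaled=3.0, t_scaled=0.5, u=1e-6))
```

Note that the population migration rate 2Nm does not require a mutation rate, whereas N and the divergence time scale inversely with u and therefore inherit the uncertainty of the clock calibration.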
An initial data collection phase assuming wide prior parameter distributions was used to determine upper bounds of parameters. Upper bounds were fixed following an initial run with maximum priors set at q1max: 20; q2max: 20; qAmax: 20; m1max: 20; m2max: 20; tmax: 10, and three replicate runs with different random number seeds were run for each population comparison. No Metropolis coupling was used.
Coalescent simulations
A coalescent simulation approach was used to assess the fit of the genetic data to southern refugium and secondary refugium models. serial simcoal (Anderson et al. 2005) implements the two-step coalescent-mutation model of Excoffier et al. (2000), reconstructing the genealogy of a series of genes sampled from present-day populations and incorporating random mutations onto this genealogy under a constant Poisson process. The Kimura-2-parameter model is the most complex model of sequence evolution implemented by this package and was used as the best approximation of the HKY model used in the genetic analyses above. The simulations assumed a stepwise population colonization and expansion model from the southern end of the species range, where populations were founded by 100 individuals and expanded exponentially to present day population size. Demographic parameters estimated by IM were used. One thousand data sets of a haploid gene fragment of 430 bp were simulated for each run.
An initial simulation used population size, migration rate and divergence time estimates of IM to generate a probability distribution of divergence time estimates for the empirical data. A comparison of this probability distribution to that of the simulated scenarios allowed an analysis of statistical power. As in Hickerson & Cunningham (2005), power was defined as the proportion of instances where the estimated divergence times under the simulated empirical parameters exceeded the 95% confidence interval of the distributions simulated under each of the constrained scenarios.
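The power calculation described above can be sketched as a comparison of two simulated distributions. The sketch assumes that "exceeding the 95% confidence interval" means exceeding the upper 97.5th percentile of the constrained-scenario distribution, computed by a nearest-rank rule; both choices, the function names and the toy values are assumptions made for illustration.

```python
import math


def upper_bound(values, q=0.975):
    """Nearest-rank quantile of a list of simulated divergence times."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, math.ceil(q * len(ordered)) - 1)
    return ordered[idx]


def power_against_scenario(empirical_sims, constrained_sims, q=0.975):
    """Proportion of estimates simulated under the empirical parameters that
    exceed the upper bound of the constrained-scenario distribution."""
    bound = upper_bound(constrained_sims, q)
    return sum(x > bound for x in empirical_sims) / len(empirical_sims)


# Toy distributions (hypothetical values in years, illustration only).
constrained = list(range(1, 101))
empirical = [50, 99, 150, 200]
print(power_against_scenario(empirical, constrained))
```

A power close to 1 means that data generated under the empirical parameter estimates would almost always be distinguishable from the constrained refugial scenario.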
To test the southern refugium hypothesis, IM divergence time estimates for populations north of the glacial margin were scaled so that the OR-WA divergence was equal to the LGM (18 000 years bp). The secondary refugium hypothesis assumed that an unsampled population was colonized from OR 130 000 years bp (the end of the last interglacial period) and isolated from the southern populations during the last glacial period. This population seeded the AK and WA populations simultaneously following the end of the LGM.
Net interpopulation divergence time estimates were calculated for each simulation as outlined above, forming a probability distribution of divergence time estimates for the two refugial scenarios. IM divergence time estimates between OR and WA, populations which span the LGM, were compared to the simulated distribution and the 95% confidence interval of simulated divergence times was used to test the fit of the data to each model. The simulations described here were run using the three molecular clock estimates (0.96%, 0.72% and 4.13% bp per million years) with qualitatively similar results.
Results
Microsatellite analysis
All six microsatellite loci are hypervariable (between 34 and 63 alleles per locus: n = 263; Table 1). Population-level and global tests of linkage disequilibrium failed to detect any instances of significant linkage among the six microsatellite markers, supporting locus independence. While there were no locus-wide trends in FIS estimates, a global probability test rejected the null hypothesis of Hardy–Weinberg equilibrium (P < 0.001).
Standardized allelic richness was depressed at five of six loci in the WA population and all six microsatellites had fewer than expected alleles in the AK population (Table 2). Allelic diversity of all six loci was highest in the OR population. Both pairwise FST and RST estimates indicated significant population structure (permutation test: P < 0.001 for all population comparisons following Bonferroni correction) (Table 3).
Population (n) | Slep3 | Slep6 | Slep7 | Slep8 | Slep9 | Slep10 | mtDNA | mtDNA gene diversity (± SD) | mtDNA nucleotide diversity (π) (± SD) |
---|---|---|---|---|---|---|---|---|---|
CA (31) | 27 (27) | 25 (25) | 23 (23)** | 17 (17) | 24 (24) | 25 (25) | 6 | 0.5789 (0.1242) | 0.0056 (0.0035) |
OR (85) | 38 (28.3) | 35 (26.4) | 59 (38.2) | 30 (22.4) | 36 (28.2) | 42 (30.2) | 7 | 0.7263 (0.0917) | 0.0030 (0.0022) |
WA (90) | 35 (26.2)** | 26 (21.0)** | 40 (28.8)** | 26 (19.1)* | 31 (23.7)** | 40 (27.2) | 7 | 0.5211 (0.1346) | 0.0014 (0.0013) |
AK (57) | 29 (23.2)** | 25 (20.4)** | 33 (25.2)** | 13 (10.2)** | 28 (22.7)** | 29 (25.1)** | 2** | 0.2684 (0.1133) | 0.00062 (0.00080) |
- * P < 0.05;
- ** P < 0.001.
 | CA | OR | WA | AK |
---|---|---|---|---|
CA | — | 0.246** | 0.180** | 0.477** |
OR | 0.015** | — | 0.083** | 0.056** |
WA | 0.025** | 0.006** | — | 0.259** |
AK | 0.051** | 0.030** | 0.043** | — |
- ** P < 0.001.
Replicated admixture analysis using structure version 2.1 suggested the presence of three natural populations [Pr(K:3 populations) = 1; Pr(K:1,2,4,5 populations) = 0; data not shown]. While population structure was evident in the CA and WA populations under the three-population admixture model, extensive contemporary gene flow was evident between the three southern populations under this model (Fig. 2a). In contrast, AK formed a well-supported population with little evidence of recent immigrant genotypes.

structure version 2.1 (Pritchard et al. 2000) individual-based cluster analysis. (a) Admixture model without geographic data; (b) model incorporating population priors (M = 0.01). Y-axis reflects the probability of population assignment. X-axis groupings reflect sampling locality.
Cluster analysis incorporating sampling locality priors was also performed in an effort to identify individuals with a high probability of recent migration. Once again, AK formed a highly supported population, with little indication of recent migration (Fig. 2b). Several individuals could not be accurately assigned to their collection locality (P < 0.95), suggesting that they are possibly relatively recent immigrants from one of the other sampling localities or, more likely, recent arrivals from unsampled areas between the four geographically distant sampling localities.
Mitochondrial DNA analysis
Preliminary analysis of a 563-bp alignment of 16S rDNA identified a single haplotype shared by representatives of the three northern populations (OR, WA, and AK) (Fig. 3a). All individuals from CA shared a common haplotype that was distinct from those detected in the northern populations (T→A transversion replacement at alignment position 351; Fig. 3a). In an effort to gain a higher level of resolution on contemporary and historical gene flow, 480 bp from the 5′ region of the mtDNA control region were sequenced for a subset of 20 individuals from each of the four populations. The mtDNA control region typically evolves at an accelerated rate compared to 16S rDNA and, as such, is often suitable for resolving recent genetic events.

Unrooted network of mtDNA haplotypes for (a) 16S and (b) control region data. Cluster size is relative to the overall frequency of the sampled haplotypes. Unsampled intermediate haplotypes are indicated. Haplotype sequences have been submitted to GenBank (accession nos 16S: DQ309795, DQ309797, DQ309798; control region: DQ309801–DQ309818).
A total of 17 control region haplotypes were identified in the 465-bp fragment sequenced for 80 individuals (Fig. 3b). The haplotype networks calculated with and without the outgroup taxon Syngnathus auliscus had an identical ingroup topology (data not shown). A 34-bp insertion was present in a single individual collected at the CA site and a 1-bp indel was present in a subset of the northern populations. No other insertions or deletions were observed in the sequenced fragment. Significant differences in base frequencies (A: 0.2750, C: 0.1483, G: 0.1816, T: 0.3952) and the frequency of transition vs. transversion mutations (k = 5.45404) were observed in the empirical data and significant rate heterogeneity was evident in the sequenced fragment. Consequently, an AIC test indicated that a HKY + I model of evolution best fit the mtDNA sequence data. Rate variation among sites is difficult to estimate reliably using data sets with low levels of variation and is not expected to influence phylogenetic reconstruction for recently diverged taxa (Wilson, unpublished; Yang, personal communication). As the HKY model without rate variation is the most complex model available in all of the analyses performed here, this model was used for all analyses. A likelihood-based test of molecular evolution failed to reject the null hypothesis of clocklike sequence evolution for the control region data set (free model: –ln L = 749.98; constrained model: –ln L = 757.48; 2Δ ln L = 15.00; d.f. = 17; P = 0.595).
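The likelihood-ratio test of the molecular clock can be reproduced from the two reported likelihoods. The sketch below computes twice the difference in –ln L and evaluates it against a χ2 distribution with 17 degrees of freedom, using a series expansion of the incomplete gamma function so that only the standard library is required; the helper names are hypothetical.

```python
from math import exp, lgamma, log


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution.

    Uses the series expansion of the regularized lower incomplete gamma
    function; adequate for moderate x such as the value tested here.
    """
    if x <= 0:
        return 1.0
    a = df / 2.0
    half_x = x / 2.0
    term = 1.0
    total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = exp(-half_x + a * log(half_x) - lgamma(a + 1.0)) * total
    return 1.0 - lower


def clock_lrt(neg_lnl_free, neg_lnl_clock, df):
    """Return (test statistic, P-value) for the clock vs. non-clock comparison."""
    statistic = 2.0 * (neg_lnl_clock - neg_lnl_free)
    return statistic, chi2_sf(statistic, df)


# Values reported in the text above.
stat, p = clock_lrt(749.98, 757.48, 17)
print(round(stat, 2), round(p, 3))
```

Because the P-value is well above 0.05, the constrained (clock-like) model is not significantly worse than the free-rate model, which justifies applying a single rate calibration to the control region data.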
While the number of control region haplotypes identified in OR and WA is not significantly different from that detected in CA (7 vs. 6), AK has significantly reduced levels of genetic diversity, with two control region haplotypes, one of which is present in 17 of the 20 AK individuals sequenced (Table 2; Fig. 3b). Reductions in mtDNA gene and nucleotide diversity are also evident in the AK population (Table 2).
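To make the diversity contrast concrete, the sketch below computes haplotype (gene) diversity for AK from the counts given in the text (two haplotypes, one carried by 17 of 20 individuals, so the other by 3). It assumes the standard unbiased estimator h = n/(n − 1) × (1 − Σp²); the text does not state which estimator underlies Table 2, so the result is illustrative and is not claimed to reproduce the tabulated value.

```python
def haplotype_diversity(counts):
    """Unbiased haplotype diversity from a list of haplotype counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two sampled individuals are required")
    sum_sq_freq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq_freq)

# AK: 17 individuals with the common haplotype, 3 with the second one.
ak_counts = [17, 3]
h_ak = haplotype_diversity(ak_counts)
print(f"AK haplotype diversity (assumed estimator): {h_ak:.3f}")
```

A sample dominated by one haplotype gives a value near 0, whereas a sample in which every individual carries a different haplotype gives 1.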
In a mtDNA haplotype network, the most frequent haplotypes are inferred to be ancestral, while terminal haplotypes are considered to be more recently derived (Castelloe & Templeton 1994). The haplotype network in Fig. 3b has two dominant haplotypes, one of which is dominant in, and restricted to, CA, while the other is the most common haplotype in each of the three northern populations, suggesting that CA has been separated from these populations for a considerable period. The absence of the common northern haplotype in the CA population is intriguing, especially considering the presence of several alleles closely related to it in the CA population (Fig. 3b). This disjunct haplotypic distribution was reflected in the highly significant ΦST estimates separating CA from the three northern populations (Table 4).
- * P < 0.05;
- ** P < 0.001.
Molecular clock analyses
Three divergence time estimates (total 16S data set: Calibration I; 16S stems: Calibration II; 16S loops: Calibration III) for the split of Syngnathus leptorhynchus from S. auliscus (Wilson 2006) were used to calibrate a rate of molecular evolution for the ingroup control region data set using paml version 3.14. Under a clock-like model of evolution, the rate of sequence change of this fragment of the control region was estimated to be 0.96 ± 0.26%/million years (Calibration I), 4.13 ± 1.09%/million years (Calibration II) and 0.72 ± 0.20%/million years (Calibration III). Intrapopulation sequence divergence was calculated and the time since population divergence of each of the population pairs was inferred using the three control region rate calibrations (Table 5). According to Calibration I, the CA population has been separated from each of the three northern populations for approximately 500 000 years (438 888–561 635 years bp; Table 5). In contrast to the long period of separation between CA and the northern populations, both the WA and AK populations have population histories consistent with postglacial (i.e. since c. 18 000 years bp) recolonization (Table 5).
Population pair | Isolation model: total data set | Isolation model: loop data | Isolation model: stem data | Isolation-with-migration model: total data set | Isolation-with-migration model: loop data | Isolation-with-migration model: stem data
---|---|---|---|---|---|---
CA-OR | 438 888 (343 796–606 698) | 572 993 (448 429–793 375) | 99 892 (79 034–135 709) | 189 881 (50 099–1 105 270) | 247 901 (65 407–1 442 991) | 43 217 (11 403–251 563)
CA-WA | 558 869 (437 781–772 554) | 729 635 (571 019–1 010 263) | 127 200 (100 639–172 808) | 270 658 (112 321–902 029) | 353 359 (146 641–1 177 649) | 61 603 (25 565–205 304)
CA-AK | 561 635 (439 947–776 378) | 733 246 (573 844–1 015 263) | 127 830 (101 137–173 663) | 459 673 (211 282–989 114) | 600 129 (275 840–1 291 344) | 104 623 (48 088–225 125)
OR-WA | 12 570 (9846–17 376) | 16 411 (12 843–22 723) | 2860 (2264–3887) | 93 765 (23 503–247 155) | 122 416 (30 685–322 674) | 21 341 (5349–56 253)
OR-AK | 18 277 (14 317–25 266) | 23 862 (18 675–33 040) | 4160 (3291–5652) | 95 744 (79 416–492 578) | 125 000 (103 682–643 088) | 21 792 (18 075–112 112)
WA-AK | 852 (668–1178) | 1113 (871–1541) | 194 (153–264) | 30 430 (14 102–494 557) | 39 729 (18 411–645 672) | 6926 (3210–112 563)
Population parameters including divergence times were also calculated under an IM model of population subdivision. Analyses from a representative run of the three replicates are shown in Tables 5 and 6. Although 90% highest posterior density limits (HPDs) are broad, significant differences in population sizes among the study populations are evident from the pairwise population comparisons (compare q1 and q2 in Table 6) and all three northern population pairs are inferred to be derived from small ancestral populations (qA ≤ 0.150 for OR-WA, OR-AK and WA-AK; Table 6). Strikingly, a strong signal of southward migration (i.e. m1 > m2) is present in five of the six population comparisons. In the sixth population comparison (OR-WA), migration from OR to WA exceeds that in the opposite direction (m2 = 0.765; m1 = 0.065; Table 6).
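The directional comparison described above can be tallied directly from the m1 and m2 point estimates in Table 6. The short Python sketch below does so, following the convention stated in the text that m1 > m2 indicates southward migration. It uses point estimates only and ignores the broad, overlapping 90% HPDs, so it illustrates the counting and is not a significance test.

```python
# (m1, m2) point estimates for each population pair, taken from Table 6.
table6_migration = {
    "CA-OR": (0.803, 0.008),
    "CA-WA": (0.473, 0.005),
    "CA-AK": (0.551, 0.009),
    "OR-WA": (0.065, 0.765),
    "OR-AK": (2.405, 0.015),
    "WA-AK": (11.990, 0.005),
}

# Convention from the text: m1 > m2 is read as southward-biased migration.
southward = [pair for pair, (m1, m2) in table6_migration.items() if m1 > m2]
northward = [pair for pair, (m1, m2) in table6_migration.items() if m2 > m1]

print(f"southward-biased pairs ({len(southward)}): {southward}")
print(f"northward-biased pairs ({len(northward)}): {northward}")
```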
Pop. pair | q1 | q2 | qA | m1 | m2 | t
---|---|---|---|---|---|---
CA-OR | 2.910 (0.830–10.030) | 16.250 (7.670–19.990) | 3.930 (0.010–17.270) | 0.803 (0.003–4.038) | 0.008 (0.003–3.533) | 0.768 (0.203–4.468)
CA-WA | 5.770 (2.510–15.370) | 8.830 (5.010–19.030) | 3.330 (0.010–17.170) | 0.473 (0.003–1.523) | 0.005 (0.001–0.815) | 1.094 (0.454–3.647)
CA-AK | 6.450 (3.755–9.995) | 1.295 (0.245–5.615) | 0.005 (0.005–9.995) | 0.551 (0.089–1.649) | 0.009 (0.001–1.493) | 1.858 (0.854–3.998)
OR-WA | 9.450 (1.450–64.350) | 85.850 (21.350–99.950) | 0.050 (0.050–10.350) | 0.065 (0.005–8.525) | 0.765 (0.005–8.875) | 0.379 (0.095–0.999)
OR-AK | 13.845 (7.695–29.595) | 0.5550 (0.045–5.895) | 0.075 (0.015–24.795) | 2.405 (0.005–7.715) | 0.015 (0.005–7.485) | 0.387 (0.321–1.991)
WA-AK | 98.150 (25.350–99.950) | 0.5500 (0.050–22.550) | 0.150 (0.050–99.950) | 11.990 (5.410–19.99) | 0.005 (0.01–15.310) | 0.123 (0.057–1.999)
Divergence times estimated under the IM model are shown in Table 5. Consistent with equilibrium estimates, the divergence times between the CA and northern populations under the total data set calibration are old (189 881–459 673 years bp). The three northern populations also appear to have diverged from one another much more recently and lower bounds on the 90% HPD values are consistent with recent colonization (OR-AK: 79 416 years; OR-WA: 23 503 years; WA-AK: 14 102 years; Table 5).
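The link between the scaled divergence parameter t in Table 6 and the divergence times in years in Table 5 can be sketched as follows. IM reports t scaled by the mutation rate, so a time in years is t divided by a per-locus, per-year rate u. The value of `ASSUMED_U_PER_YEAR` below is an assumption: it is back-calculated from the ratio of the Table 6 t values to the Table 5 total data set estimates and is not stated in the text.

```python
# Hypothetical per-locus mutation rate per year, back-calculated from the
# ratio of Table 6 t values to Table 5 years (not given in the source).
ASSUMED_U_PER_YEAR = 4.042e-6

# Scaled divergence times t (Table 6).
im_t = {
    "CA-OR": 0.768, "CA-WA": 1.094, "CA-AK": 1.858,
    "OR-WA": 0.379, "OR-AK": 0.387, "WA-AK": 0.123,
}

# IM divergence times in years, total data set calibration (Table 5).
table5_years = {
    "CA-OR": 189_881, "CA-WA": 270_658, "CA-AK": 459_673,
    "OR-WA": 93_765, "OR-AK": 95_744, "WA-AK": 30_430,
}

def scaled_time_to_years(t_scaled, u_per_year):
    """Convert a mutation-scaled divergence time to years (t = T * u)."""
    return t_scaled / u_per_year

years = {pair: scaled_time_to_years(t, ASSUMED_U_PER_YEAR)
         for pair, t in im_t.items()}
rel_err = {pair: abs(years[pair] - table5_years[pair]) / table5_years[pair]
           for pair in im_t}

for pair in im_t:
    print(f"{pair}: {years[pair]:,.0f} years "
          f"(Table 5: {table5_years[pair]:,}; diff {rel_err[pair]:.2%})")
```

With this single assumed rate, all six converted values fall within 1% of the tabulated times, which suggests that the two tables differ only by a constant rescaling.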
Coalescent simulations
The simulated data sets produced a wide range of divergence time estimates, highlighting the stochastic nature of the population sampling process (Fig. 4). For the southern refugium simulation, 95% confidence estimates of the OR-WA divergence time from the simulated data sets were high (71 100–247 989 years, depending on the molecular clock calibration used) and all three simulations failed to reject the southern refugium hypothesis (P = 0.141–0.238). The median OR-WA divergence time under the southern refugium hypothesis was 17 985 years bp for Calibration I. Statistical power was very low for all tests (0.056–0.146).

Fig. 4. Frequency distribution of OR-WA divergence time estimates derived from coalescent simulation of colonization scenarios (1000 replicates) using population size, migration rate and divergence time estimates from isolation-with-migration analysis (0.96% divergence/million years). (a) Recolonization from southern refugium. Divergence times in glaciated region scaled so that OR-WA divergence time = 18 000 years bp. (b) Recolonization from secondary refugium. Refugial population separated from populations at the end of the last interglaciation (c. 130 000 years bp) and WA and AK colonized independently from this refugium. Coalescent simulation with estimated parameters shown in both figures for comparison. The IM-estimated divergence time of OR-WA (93 765 years bp; Table 5) is indicated.
The simulation of the secondary refugium hypothesis generated a multimodal distribution (Fig. 4b), precluding the possibility of statistical testing. However, the observed data provide no grounds to reject this hypothesis. Confidence estimates were even wider in this series of simulations and the median divergence time was 187 411 years bp for Calibration I.
Discussion
Topographic and glacial features have both played important roles in the historical phylogeography of pipefish populations along the eastern Pacific coast. Populations of Syngnathus leptorhynchus in regions impacted by glaciation (AK and WA) have reduced allelic diversities compared to populations in the south, consistent with recent colonization by small founding populations.
The importance of faunal boundaries
While the San Diego population (CA) has an mtDNA allelic distribution disjunct from the three northern populations, there is no evidence for strict reciprocal monophyly between CA and the northern clade, suggesting that the CA population does not represent a genetically distinct species with a long period of isolation. Although the most common northern haplotype was not detected in CA, the presence of low frequency allelic variants in the San Diego population that are closely related to this common haplotype (Fig. 3b) raises the possibility that these individuals are recent migrants with northern ancestry. Alternatively, given the long period of separation between CA and the three northern sites, similar allelic variants in northern and southern populations may have evolved independently since population divergence and may not reflect shared ancestry.
Notwithstanding the lack of complete lineage separation between the northern (AK, WA, and OR) and CA populations, the high levels of genetic differentiation observed at both microsatellite and mtDNA loci suggest that intervening topographical features between the three northern populations and CA may be important barriers to gene flow along the Pacific coast. Differences in dorsal fin ray counts, trunk rings and tail rings between populations on either side of Herald's (1941) postulated biogeographical boundary north of Point Conception are consistent with those observed in previous studies with much more detailed population-level sampling (Herald 1941; Fritzsche 1980; Table 7). However, as noted by Fritzsche (1980), variation in these meristic characters is strongly influenced by ambient temperature, an environmental parameter that varies inversely with latitude.
Region | Study | N | DFR | TrR | TaR
---|---|---|---|---|---
North of Morro Bay (S. griseolineatus) | This study | 3 | 37.68** (29–44) | 17.93** (16–20) | 41.54** (37–48)
North of Morro Bay (S. griseolineatus) | Fritzsche (1980) | 6 | 38.20** (34–43) | 18.69** (17–21) | 42.96** (40–46)
North of Morro Bay (S. griseolineatus) | Herald (1941) | 21 | 38.63 n.a. (31–44) | n.a. (18–20) | 42.85 n.a. (37–46)
South of Morro Bay (S. leptorhynchus) | This study | 1 | 32.29 (30–36) | 16.83 (15–18) | 37.10 (30–41)
South of Morro Bay (S. leptorhynchus) | Fritzsche (1980) | 9 | 32.70 (28–37) | 17.70 (16–19) | 38.85 (36–43)
South of Morro Bay (S. leptorhynchus) | Herald (1941) | 5 | 33.15 (28–37) | n.a. (17–19) | 39.22 (36–41)
- ** P < 0.0001; n.a., raw data not available. DFR, dorsal fin rays; TrR, trunk rings; TaR, tail rings.
As genetic and morphological data indicate that an important phylogeographic boundary exists between CA and OR, more detailed sampling of populations in this region will be necessary to resolve the role of local topographic boundaries in population divergence and potential incipient speciation of S. leptorhynchus. It is important to note that while Point Conception is frequently cited as a key biogeographic boundary for Pacific coast marine taxa, the biogeographic split between S. leptorhynchus and Syngnathus griseolineatus inferred by Herald (1941) is between Morro Bay and Elkhorn Slough, CA, over 100 km north of Point Conception.
A role for hybridization?
Although genetic data suggest that syngnathid species found in San Diego Bay are reproductively isolated, genetic analysis of male broods in San Diego has revealed high levels of interspecific mating between S. leptorhynchus and Syngnathus auliscus (Wilson 2006). Interspecies hybridization between species of Syngnathus has long been thought to be possible (Herald 1941), but has only recently been directly detected through molecular analyses of males and their broods in San Diego Bay. While hybrid adults have not been detected in the Bay, the possibility exists that interspecific hybridization between S. leptorhynchus and one or more of its congeners may be at least partially responsible for the high levels of genetic diversity and the disjunct allelic distribution observed. Temporal sampling of southern populations is necessary to determine whether hybrid juveniles are viable and survive to reproduce. Comparative phylogeographic studies of Californian syngnathids might also highlight contemporary gene flow between species at southern sites.
Range expansion from refugial populations
Both microsatellite cluster analysis incorporating geographic information and the analysis conducted in the absence of geographic priors indicate a low level of gene flow between the Alaskan and southern populations. While these analyses failed to assign several of the individuals in southern populations with high confidence, a strong signal of population structure among the three southern populations was evident when geographic priors were used (Fig. 2).
Work by Orsi et al. (1991) has recently documented an 850-km northward range expansion of S. leptorhynchus in Alaska. The low levels of gene flow between Alaska and the south, coupled with the reduced genetic diversity at both mtDNA and nuclear microsatellite markers, suggest that Alaskan populations of Syngnathus are derived from low numbers of founding individuals. As S. leptorhynchus has never been collected north of this location, it is exceedingly unlikely that populations of Syngnathus persisted in the Beringia refugium in northern Alaska through the last glaciation.
Mutation rates at microsatellite loci are accelerated when compared to molecular evolution of mtDNA (Hedrick 1999). A recent study of S. leptorhynchus mating systems has identified an average germ-line microsatellite mutation rate of 5.4 × 10⁻⁴ to 8.0 × 10⁻⁴ (unpublished data), suggesting that many of the observed microsatellite differences between contemporary populations may reflect mutational events that occurred since the last glaciation 18 000 years ago and that allelic homoplasy may be present in the data set. In contrast, the relatively reduced rate at which mutations accumulate in the mtDNA control region (here 0.72–4.13% changes per million years) makes it likely that the majority of the observed mtDNA diversity reflects mutational events that occurred before the most recent glaciation (but see Discussion above).
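The contrast between the two marker classes can be put into rough numbers. The Python sketch below multiplies the quoted rates by the 18 000 years since the last glacial maximum. It rests on assumptions not stated in the text: that the microsatellite rate is per locus per generation, that generation time is one year (`ASSUMED_GENERATION_TIME_YEARS`), and that the control region rate applies per site along a single lineage of the 465-bp fragment.

```python
YEARS_SINCE_LGM = 18_000
ASSUMED_GENERATION_TIME_YEARS = 1.0   # assumption, not given in the text
FRAGMENT_LENGTH_BP = 465              # control region fragment length

# Rates quoted in the text (low, high).
microsat_rate_per_generation = (5.4e-4, 8.0e-4)
mtdna_rate_per_site_per_myr = (0.0072, 0.0413)   # 0.72% and 4.13%

generations = YEARS_SINCE_LGM / ASSUMED_GENERATION_TIME_YEARS

# Expected mutations per microsatellite locus per lineage since the LGM.
microsat_expected = tuple(rate * generations
                          for rate in microsat_rate_per_generation)

# Expected changes across the whole mtDNA fragment since the LGM.
mtdna_expected = tuple(rate * (YEARS_SINCE_LGM / 1e6) * FRAGMENT_LENGTH_BP
                       for rate in mtdna_rate_per_site_per_myr)

print(f"microsatellite: {microsat_expected[0]:.2f} to "
      f"{microsat_expected[1]:.2f} expected mutations per locus")
print(f"mtDNA fragment: {mtdna_expected[0]:.3f} to "
      f"{mtdna_expected[1]:.3f} expected changes")
```

Under these assumptions a microsatellite locus is expected to mutate several times per lineage since the glaciation, while the mtDNA fragment is expected to accumulate well under one change, which is the asymmetry the paragraph describes.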
An isolation-with-migration (IM) model of population structure offers a higher degree of biological realism than that implicit in models which assume mutation–drift equilibrium. In addition to estimating the time since population divergence, IM models can estimate contemporary and historical population sizes and independent migration rates for each of the populations investigated.
The general IM model assumes that an ancestral population gave rise to two descendent populations at some point in the past. These two descendent populations are subsequently free to exchange genetic material through migration, as may be expected following recent population splits (Hey & Nielsen 2004). While this approach has been previously applied to putative population range expansions (Hey 2005; Hickerson & Cunningham 2005), it is not necessarily clear how this model will perform under conditions where a small subset of an ancestral deme founds a daughter population that continues to receive high levels of gene flow from the parent population. The performance of the IM approach under metapopulation and range expansion scenarios warrants further study.
Analysis of migration between adjacent populations suggests that the dominant direction of contemporary migration is southwards, a pattern also suggested by the structure analysis of microsatellite data. The exception to this pattern is a northward bias in migration between OR and WA (Table 6). As the AK and WA populations have relatively recent ancestry when compared to populations to the south, southwards-biased migration is unexpected. The direction of migration inferred here may be an artefact of the recent population expansion (see above) and low divergence between these populations. It remains, however, possible that while populations impacted by glaciation were initially colonized by small numbers of southern migrants, subsequent migration has been dominated by movements of northern individuals towards the south. If the possibility that this pattern is a methodological artefact can be eliminated, physical explanations such as the southeast direction of the dominant coastal current (Longhurst 1998) may help to explain this pattern of contemporary migration.
While the IM model has distinct advantages over strict isolation models in nonequilibrium situations, it is nonetheless difficult to disentangle migration rates and divergence times, as these variables are clearly tightly correlated with one another. These factors may be particularly confounded when no significant signal of population subdivision is present in a data set (Nielsen, personal communication). Under these circumstances, geographically isolated sampling localities act effectively as one panmictic population and divergence time, migration rate and population size estimates derived from the IM model may be misleading.
The IM method is further limited in that it is extremely difficult to accurately estimate confidence intervals around parameter estimates. Traditional 95% confidence intervals are not directly estimable for this likelihood-based approach and the 90% highest posterior density (HPD), the shortest span of the sampled distribution that contains 90% of the posterior probability, is used as a rough indication of the confidence in parameter estimates. The HPD is strongly dependent on population priors and the upper bound of parameter 90% HPDs is often close to the fixed parameter maximum.
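The HPD definition given above, the shortest span containing 90% of the posterior probability, can be computed from a set of posterior samples by sliding a fixed-count window over the sorted values. The Python sketch below illustrates this on toy data. The function name `hpd_interval` and the sample values are hypothetical, and the sketch is not the routine used by the IM program.

```python
import math

def hpd_interval(samples, mass=0.90):
    """Shortest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("no samples supplied")
    # Number of samples that must fall inside the interval.
    k = max(1, math.ceil(mass * n - 1e-9))
    best_lo, best_hi = ordered[0], ordered[k - 1]
    for i in range(1, n - k + 1):
        lo, hi = ordered[i], ordered[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi

# Toy right-skewed "posterior": values bunch near zero and thin out upward.
toy_samples = [0.1 * i ** 2 for i in range(100)]
lo, hi = hpd_interval(toy_samples)
print(f"90% HPD of toy samples: {lo:.1f} to {hi:.1f}")
```

For a skewed distribution such as this one, the interval sits against the dense end of the distribution and is not centred on the median. When the samples pile up against a prior maximum, the upper bound is pulled towards that limit, as noted above.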
Notwithstanding the limitations of the IM approach, in the present study this method is largely in agreement with the equilibrium model and 90% HPDs of divergence times encompass divergence time estimates derived from the no migration model. While the estimated population divergence times among northern populations are higher under the model incorporating contemporary migration, the lack of significant population subdivision among northern populations (AK-WA and WA-OR) may explain this discrepancy (see above). Total data set divergence time estimates among northern populations derived from an equilibrium approach are consistent with postglacial recolonization of WA and AK (all divergence estimates < 18 000 years) and lower limits of 90% HPD estimates between OR and WA (23 500 years) and WA and AK (14 100 years) based on Calibration I are also consistent with this scenario.
Coalescent simulation of genetic data provided a statistical test of the two refugial hypotheses. While a southern refugium scenario was not rejected for any of the three molecular clock calibrations used here (Fig. 4a), the statistical power of these tests was extremely low (0.056–0.142), due to the close similarity between the empirically estimated parameters and the refugial scenario investigated. Due to the multimodal distribution of simulated data, a statistical test of the secondary refugium hypothesis was not possible, but there was no evidence that the estimated divergence time between populations which span the glacial margin differed from that expected under this scenario (Fig. 4b).
While a simulation approach should be an integral part of population genetic and phylogenetic studies, it is important to recognize that the colonization scenarios simulated here are idealized refugial recolonization models. As the colonization history of wild populations is likely far more complex than idealized simulation models, the power of statistical tests such as this to resolve the true phylogeographic history of wild populations is often low. This form of statistical test is most useful when major differences are apparent between the inferred population history and the refugial hypotheses tested. When inferred population divergence times are high, these tests may have high statistical power and may allow the rejection of refugial scenarios (Hickerson & Cunningham 2005). In the present study, population divergence times of 71 000 years bp (Calibration III) to 250 000 years bp (Calibration II) would have been necessary to reject the southern refugium hypothesis.
Coastal persistence and recolonization: an emerging pattern
Recent work by Hickerson & Cunningham (2005) stresses that our understanding of glacial influences on coastal regions requires large-scale comparative work across a diversity of taxa with a range of life history strategies. The impact of Pleistocene glaciation on near-shore Pacific species has been studied in 12 rocky intertidal animals (9 invertebrates and 3 fish species). Of these 12 species, six exhibit a phylogenetic history consistent with persistence during glacial periods and six exhibit a strong signal of recolonization from southern refugia (Hickerson & Cunningham 2005). Clearly, coastal glaciation has differentially affected near-shore populations and the fact that a significant number of species persisted above the glacial margin suggests that ecological differences and/or stochastic factors are most likely responsible for population persistence during glaciation.
Experimental increases or reductions in ambient water temperature have been shown to have a significant impact on physiology and reproduction in marine fishes (Pepin 1991). Temperature reductions associated with glacial periods could strongly affect the physiology of near-shore inhabitants and might ultimately exclude species from previously suitable environments. For species such as S. leptorhynchus that are intimately associated with eelgrass, temperature shifts and glacial scour might indirectly limit species persistence through impacts on Zostera eelgrass meadows.
Recent work has demonstrated that Pleistocene glaciation has strongly influenced the population genetics of Zostera marina, the most common eelgrass species in the northern Pacific. Olsen et al. (2004) included nine Pacific coast populations in a global survey of Zostera phylogeography. Pacific coast populations of Z. marina fell into two major groups, populations south of Point Conception (Channel Islands) and those north of this boundary (Bodega Bay, CA, WA and AK populations). Genetic diversity in southern populations of Z. marina was greater than that detected at northern sites, a pattern concordant with the results shown here for S. leptorhynchus.
Congruent phylogeographic patterns in pipefish and eelgrass suggest that, in addition to direct effects of glaciation on pipefish reproduction and physiology, the lack of available eelgrass habitat at northern latitudes may also have limited the persistence of pipefish populations during the most recent glacial period. The study of other common eelgrass inhabitants that span the most recent glacial maximum offers an ideal opportunity to test the hypothesis that habitat availability has influenced the persistence or exclusion of near-shore inhabitants.
Conclusions
Syngnathus leptorhynchus has a genetic history consistent with postglacial recolonization from refugial populations, with reduced allelic diversities in AK and WA and population ages consistent with glacial impacts. Strong divergence between populations in the north (AK, WA, and OR) and south (CA) portions of the species range suggests that topographic boundaries have also played important roles in the present-day phylogeography of this species. Recent thesis work by Louie (2003) supports these conclusions. Louie (2003) investigated mtDNA sequence variation in 29 near-shore populations of S. leptorhynchus. While population sample sizes were limited in this work (1–32 individuals per population), the greater density of population sampling allowed a more powerful test of the recolonization hypothesis put forward here. The general recolonization scenario inferred from the broad-scale sampling here is supported by Louie's analysis (2003), and the presence of divergent haplotypes in northern populations provides compelling evidence of possible secondary refugia in the region affected by glaciation. Further temporal and spatial investigations of southern populations will help clarify the importance of hybridization as a source of genetic variation in populations of S. leptorhynchus (Wilson 2006).
Acknowledgements
Many thanks to A. Kagley, K. Lewand, T. Lundrigan, K. Musolf, S. Johnston, J. Rhydderch, M. Rowse and H. J. Walker for assistance with collections. Thanks to M. Ford and J. Hess for discussions and feedback on the manuscript. Methodological discussions and feedback from L. Excoffier, J. Hey, M. Hickerson, R. Nielsen and J. Pritchard are greatly appreciated. This work was funded by a National Research Council Research Associateship to A.B.W. and supported with funding from the National Oceanic and Atmospheric Administration.
The interests of the author concentrate on the interplay between environmental pressures and sexual selection during the early stages of speciation. This work was conducted while A.B.W. was an NRC research associate at the Northwest Fisheries Science Center in Seattle, Washington.