Diverse processes shape deep phylogeographical divergence in Cobitis sinensis (Teleostei: Cobitidae) in East Asia
Abstract
Little is known about the impacts of past vicariance events on the phylogeography and population structure of freshwater fishes in East Asia. The aims of this study are to assess genetic variability with extensive sampling throughout the range of the Chinese spiny loach, Cobitis sinensis, and to infer the genetic structure and evolutionary history of its populations. Cobitis sinensis in China may have originated from two ancestral populations, in the Yangtze and Pearl Rivers, which diverged about 7.24 MYA, likely owing to alteration of the drainage systems. In phylogroup I, a southward dispersal event occurred from East China (Yangtze River) to the southern ZheMin and Hainan subregions, followed by eastward dispersal from ZheMin to southern Taiwan. In phylogroup II, eastward colonization took place from the Pearl River to northern Taiwan in the late Pliocene, coupled with a loss of genetic diversity in the island populations. This study shows that Cenozoic tectonic movements, together with climatic and sea-level fluctuations, may have shaped the genetic structure of C. sinensis. Highly diverged mtDNA sequences suggest the existence of cryptic species within the morphospecies C. sinensis.
Introduction
Phylogeographical studies unveil the evolutionary history of species in the context of paleoenvironmental changes (Hewitt 1996, 2004) and particularly test the biogeographical hypotheses and the influences of the geological events on the distribution of taxa and genetic variants (Avise 2000). In phylogeographical analyses, freshwater fishes, which are strictly constrained by the drainage systems, have provided key insights into the relationships between the contemporary genetic structure and historical changes in the environment (e.g. Hewitt 2004; Wang et al. 2004; Culling et al. 2006).
Freshwater fish fauna of East and South Asia represents the richest ichthyofauna in the Sino-Indian region (Banarescu 1991; Cavender and Coburn 1992), which is considered as indicative of early species diversification. Moreover, East Asia landmass has been characterized by a complex geological history, including tectonic movements, mountain building, periodical climate change and river capture/reversal (e.g. Huang 1945; Zhang et al. 1990; Rüber et al. 2004). The Cenozoic tectonic movements and glaciations have shaped the diversification of freshwater fishes in East Asia (e.g. Chen et al. 2007; Lei et al. 2009; Chiang et al. 2010). China, a vast geographical area with complex geology, is divided into five major geographical regions according to the essential geo-historical events and ichthyofauna (Li 1981). These regions are as follows: (1) North region, (2) West China, (3) Mongolia–Ningxia region, (4) East China and (5) South China, each of which can be subdivided into several subregions. The geomorphology of China has been altered extensively, and the trajectories of drainages and glaciations have changed the landforms greatly since the Quaternary (Zhang et al. 1990). Nevertheless, given such prominent geological and geographical background, phylogeographical analyses across this vast region were almost absent, except those focusing on Yangtze River (East China), Pearl River (South China) and Taiwan Island (Xiao et al. 2001; Perdices et al. 2004; Berrebi et al. 2005, 2006; Lei and He 2008; Chiang et al. 2010; Wang et al. 2011).
The Chinese spiny loach, Cobitis sinensis Sauvage & Dabry de Thiersant, 1874 (Cobitidae), a benthic loach widespread across all geographical regions in China, offers an ideal model for reconstructing phylogeography across this vast area. However, the genetic structure of C. sinensis remains unknown. Furthermore, C. sinensis is not used by humans, so the probability of artificial dispersal is very low, which makes the species suitable for testing vicariance and dispersal hypotheses.
During the late Tertiary, tectonic movements were active in China, forming many mountain ranges, for example, the Qinling, Nanling and Yunkai Mountains (Huang 1945). The east–west oriented Nanling Mountains separate the watersheds of two major river systems, the Yangtze (the largest river in China) and the Pearl River (the second largest river in China) (Lei and He 2008; Lei et al. 2009). The fish assemblages on the two sides of the Nanling Mountains have been described as distinct and assigned to different zoogeographical regions, that is, East and South China (Li 1981). However, many tributaries of the Yangtze River are geographically close to those of the Pearl River. Geological and biogeographical evidence also suggests that the upper reaches of these two rivers were once connected (Ren and Han 1959; Liu 1998; Zhang 1999; Yap 2002). Moreover, the lower reaches of the Yangtze River and the drainages of southeast China (ZheMin subregion, South China region) are also geographically close. Li and Zheng (1998) suggested that some fishes migrated freely between these two regions before the early Pleistocene, whereas in the mid-Pleistocene the uplift of the Wuyi Mountains hindered inter-river migration. Taiwan Island is recognized as a subregion of the South China region; nevertheless, C. sinensis was found to have colonized Taiwan more than twice from mainland source populations (Chiang et al. 2010), a pattern consistent with geological records indicating that land bridges connected Taiwan Island to the Asian continent four to six times (Gascoyne et al. 1979; Fairbanks 1989; Ota 1991, 1997; Yang et al. 1994; Yu 1995; Wang et al. 2000; Creer et al. 2001; Lin et al. 2002; Shih et al. 2006).
The aims of the present study are to assess the genetic variability of the mitochondrial DNA (mtDNA) cytochrome b (cyt b) gene with large sampling throughout the range of C. sinensis in East Asia and to infer their genetic structure and migratory routes. Two questions related to the evolutionary history of C. sinensis are addressed. They are (1) what is the extent of the geographical structure in the mtDNA variability? And (2) how and when did C. sinensis colonize the rivers of different geographical districts?
Materials and Methods
Population sampling and molecular methods
A total of 158 sequences of C. sinensis from 35 localities were investigated covering the species' range in South and East China regions (Fig. 1). Of them, 97 specimens were newly sampled and 61 specimens were retrieved from the GenBank (Chiang et al. 2010). Four spiny loaches, Cobitis taenia Linnaeus, 1758 (DQ175659), Cobitis choii Kim & Son, 1984 (JN858875), Cobitis macrostigma Dabry de Thiersant, 1872 (DQ105229), and Cobitis lutheri Rendahl, 1935 (DQ105231), were chosen as outgroups. Locality information and sample numbers are provided in Table 1. According to the ichthyofauna classification of China (Li 1981; Chen et al. 2007), 35 sampled streams belong to five geographical subregions: Yangtze River subregion, ZheMin subregion, Pearl River subregion, Hainan subregion and Taiwan subregion (Fig. 1; Table 1). All sampled individuals were collected from the field with seines and fatally anesthetized with MS-222 (Sigma, www.sigmaaldrich.com).
Region / subregion / population | Abbreviation | Sample size | Number of haplotypes | Haplotype diversity (h) | Nucleotide diversity θπ (%) | Nucleotide diversity θw (%) | mtDNA haplogroups |
---|---|---|---|---|---|---|---|
East China | | | | | | | |
Yangtze River | | 17 | 15 | 0.985 | 8.949 | 6.616 | B, D |
SinJiang | SJ | 6 | 5 | 0.933 | 0.620 | 0.768 | D |
TunSi | TS | 5 | 5 | 1.000 | 4.447 | 5.179 | B, D |
ChouZhou | CZ | 6 | 5 | 0.933 | 7.731 | 5.839 | B, D |
South China | | | | | | | |
ZheMin | | 54 | 51 | 0.998 | 6.983 | 6.891 | A, B, C, D |
TianTai | TT | 5 | 5 | 1.000 | 0.561 | 0.547 | B |
SianGu | SG | 6 | 6 | 1.000 | 1.357 | 1.575 | B |
GinUi | GU | 6 | 5 | 0.933 | 0.199 | 0.231 | B |
FuAn | FA | 5 | 4 | 0.900 | 5.123 | 6.147 | A, D |
KuangGe | KG | 4 | 4 | 1.000 | 0.716 | 0.766 | A |
UongTai | UT | 4 | 4 | 1.000 | 0.351 | 0.335 | A |
MuKlan | MK | 3 | 3 | 1.000 | 0.585 | 0.585 | A |
HuaAn | HA | 4 | 4 | 1.000 | 0.234 | 0.239 | A |
SanZao | SZ | 3 | 3 | 1.000 | 0.409 | 0.409 | C |
DaPu | DP | 6 | 6 | 1.000 | 1.661 | 1.844 | A |
FengZun | FZ | 2 | 2 | 1.000 | 0.526 | 0.526 | A |
JieXi | JX | 6 | 6 | 1.000 | 1.199 | 1.498 | A |
Pearl River | | 19 | 19 | 1.000 | 9.487 | 7.537 | A, E |
ZhiShi | ZS | 4 | 4 | 1.000 | 0.804 | 0.861 | E |
SinFeng | SF | 6 | 6 | 1.000 | 1.795 | 1.959 | A |
KuiLin | KL | 5 | 5 | 1.000 | 0.810 | 0.887 | E |
HeChi | HC | 4 | 4 | 1.000 | 0.629 | 0.670 | E |
Hainan | | 7 | 5 | 0.857 | 7.719 | 6.731 | C, E |
GianJiang | GJ | 2 | 2 | 1.000 | 0.614 | 0.614 | C |
PeiLun | PL | 3 | 3 | 1.000 | 9.591 | 9.591 | C, E |
FangCheng | FC | 2 | 1 | 0.000 | 0.000 | 0.000 | C |
Taiwan | | 61 | 52 | 0.994 | 4.383 | 5.290 | A, F |
ChongKang | CK | 6 | 6 | 1.000 | 1.269 | 1.537 | F |
HouLong | HL | 6 | 5 | 0.933 | 0.205 | 0.269 | F |
DaAn | TA | 7 | 6 | 0.952 | 0.615 | 0.753 | F |
TaKia | TK | 6 | 5 | 0.933 | 0.310 | 0.384 | F |
TaDu | TD | 6 | 6 | 1.000 | 0.292 | 0.384 | F |
TzengWen | TW | 1 | – | – | – | – | A |
ChouShi | CS | 4 | 2 | 0.500 | 0.351 | 0.383 | F |
KaoPing | KP | 6 | 5 | 0.933 | 0.784 | 0.730 | A |
SinDian | SD | 4 | 3 | 0.833 | 0.322 | 0.335 | F |
TouChiang | TC | 1 | – | – | – | – | F |
LanYang | LY | 5 | 3 | 0.700 | 0.395 | 0.421 | F |
HsuWao | HW | 6 | 6 | 1.000 | 0.246 | 0.269 | F |
SiuKoluan | SK | 3 | 3 | 1.000 | 1.053 | 1.053 | F |
Total | | 158 | 142 | 0.998 | 10.369 | 9.950 | |

Samples were fixed and stored in 100% ethanol. Genomic DNA was extracted from muscle tissue following the standard protocol of Blin and Stafford (1976). The entire cyt b gene was amplified by polymerase chain reaction (PCR) with primers L14724 (5′-GACTTGAAAAACCACCGTTG-3′) and H15915 (5′-CTCCGATCTCCGGATTACAAGAC-3′) (Xiao et al. 2001). Each 100 μl PCR mixture contained 10 ng template DNA, 10 μl 10× reaction buffer, 10 μl dNTP mix (10 mM), 10 pmol of each primer and 4 U of Taq polymerase (Promega, Madison, WI, USA). PCR was run on an MJ thermal cycler with one cycle of denaturation at 95°C for 4 min, followed by 30 cycles of denaturation at 94°C for 45 s, annealing at 48°C for 1 min 15 s and extension at 72°C for 1 min 30 s, a final extension at 72°C for 10 min and storage at 4°C. PCR products were purified by electrophoresis in a 1.0% agarose gel using 1× TAE buffer. The gel was stained with ethidium bromide, and the desired DNA band was excised and eluted using an agarose gel purification kit (QIAGEN, Valencia, CA, USA). Products of the cycle sequencing reactions were run on an ABI 377 automated sequencer (Applied Biosystems, Foster City, CA, USA). Chromatograms were checked with the chromas software (Technelysium), and sequences were manually edited using bioedit 6.0.7 (Hall 1999).
Sequence alignment and phylogenetic inferences
Nucleotide sequences were aligned with clustalx 1.81 (Thompson et al. 1997) and verified visually. Levels of intrapopulation genetic diversity were estimated with indices of haplotype diversity (h) (Nei and Tajima 1983) and nucleotide diversity (θπ and θW) (Jukes and Cantor 1969). The current (θπ) and historical (θW) genetic diversity were estimated with dnasp 3.14 (Rozas and Rozas 1999). Comparing estimates generated by θπ and θW provides insight into population dynamics over recent evolutionary history (Templeton 1993; Pearse and Crandall 2004; Buhay and Crandall 2005). Interpopulation genetic diversity was estimated by nucleotide divergence (Da) using DnaSP. The haplotype genealogies were generated by neighbour-joining (NJ) and maximum-likelihood (ML) analyses with mega 5 (Tamura et al. 2011), and Bayesian Inference (BI), using mrbayes 3.0b4 (Huelsenbeck and Ronquist 2001). Bootstrapping was performed with 400 replicates for ML analyses and 1000 replicates for NJ analyses. The HKY + I + G model of DNA substitution was selected as the most appropriate model for the analyses using the hierarchical likelihood ratio tests (hLRTs; lnL = 9767.8643) (Abdo et al. 2005) with modeltest (version 3.06, Posada and Crandall 1998).
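As a rough illustration of the diversity indices named above, the following Python sketch computes haplotype diversity (h), nucleotide diversity (θπ) and Watterson's θW from a small set of aligned sequences. It is not the dnasp implementation: the function names and the toy alignment are hypothetical, distances are uncorrected (no Jukes and Cantor correction), and alignment gaps are not handled.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's unbiased haplotype diversity: h = n/(n-1) * (1 - sum of p_i^2)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """theta_pi per site: mean proportion of pairwise differences (uncorrected)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


def watterson_theta(seqs):
    """theta_W per site: S / (a_n * L), where a_n = sum_{i=1}^{n-1} 1/i."""
    n, length = len(seqs), len(seqs[0])
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / (a_n * length)


# Hypothetical toy alignment (4 sequences, 10 sites), for illustration only.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
print(round(haplotype_diversity(toy), 3))
print(round(nucleotide_diversity(toy), 3))
print(round(watterson_theta(toy), 3))
```

Comparing the last two values mirrors the θπ versus θW comparison used in the text. As a side check, four sequences carrying two haplotypes in a 3:1 split give h = 0.500 with the first function.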
The cyt b evolutionary rate in C. sinensis was estimated using the r8s 1.50 program (Sanderson 2002, 2003) with two calibration points: the vicariance events associated with the emergence above sea level of the northern and central parts of Taiwan Island about 5.0 MYA (Teng 1990; Liu et al. 2000) and the formation of the southeast coastal districts of mainland China about 1.8 MYA (Zhang et al. 1990; Zheng 2004). The confidence interval was estimated following the user's manual, and the rate of substitution per site per year was calculated using the penalized likelihood method (Sanderson 2002) with the optimal smoothing value estimated by cross-validation. The hierarchical structure of cyt b variation was examined with an analysis of molecular variance (amova, Excoffier et al. 1992) using arlequin 2.000 (Schneider et al. 2000). This analysis was performed with the K2P distance with a gamma correction (α = 1.242) and 20 000 permutations. amova partitioned the observed variation among samples into within-population (FST), within-group (FSC) and among-group (FCT) components. The neutrality tests of McDonald and Kreitman (1991), Fu and Li (1993) and Tajima (1989) were also performed.
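The K2P distance with gamma correction mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the arlequin code: the function names and the two 20-bp sequences are hypothetical, and the sequences are assumed to contain only A, C, G and T with no gaps. With P and Q the proportions of transitions and transversions, the plain distance is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), and the gamma form with shape α is d = (α/2)[(1 − 2P − Q)^(−1/α) + ½(1 − 2Q)^(−1/α) − 3/2].

```python
import math

PURINES = {"A", "G"}


def transition_transversion_props(seq1, seq2):
    """Return (P, Q): proportions of transitional and transversional sites."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and non-empty")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        # Same chemical class (purine/purine or pyrimidine/pyrimidine) = transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    return transitions / len(seq1), transversions / len(seq1)


def k2p_distance(seq1, seq2, alpha=None):
    """Kimura two-parameter distance; gamma-corrected when alpha is given."""
    p, q = transition_transversion_props(seq1, seq2)
    a_term, b_term = 1.0 - 2.0 * p - q, 1.0 - 2.0 * q
    if a_term <= 0 or b_term <= 0:
        raise ValueError("sequences too divergent for the K2P model")
    if alpha is None:
        return -0.5 * math.log(a_term) - 0.25 * math.log(b_term)
    return (alpha / 2.0) * (
        a_term ** (-1.0 / alpha) + 0.5 * b_term ** (-1.0 / alpha) - 1.5
    )


# Hypothetical pair: 2 transitions and 1 transversion over 20 sites.
s1 = "ACGTACGTACGTACGTACGT"
s2 = "GTCTACGTACGTACGTACGT"
print(round(k2p_distance(s1, s2), 4))               # plain K2P
print(round(k2p_distance(s1, s2, alpha=1.242), 4))  # gamma shape from the text
```

The gamma-corrected value is larger than the plain one because rate heterogeneity among sites hides more multiple substitutions.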
Population structuring and phylogeographical history
Two approaches were used to estimate and test different phylogeographical scenarios. First, the program mdiv (Nielsen and Wakeley 2001) was employed; it uses an MCMC method within a Bayesian framework to estimate the posterior distribution of theta (θ = 2Neu), the number of migrants per generation (M = Nem, where m is the migration rate) and the divergence time between populations (equations adjusted for mtDNA). The program also estimates the expected time to the most recent common ancestor (TMRCA) for all sequences in the samples. Five runs of 5 000 000 MCMC cycles each were performed, with a burn-in of 10%, as recommended by the program manual. A Bayesian analysis was also conducted with the program beast version 1.3 (Drummond and Rambaut 2005) to investigate the history of C. sinensis in East and South China. A coalescent model with constant population size was used to estimate the effective population size in each clade (population). Posterior distributions of parameters were approximated using two independent MCMC analyses of 20 000 000 steps each, after discarding a burn-in of 2 000 000 steps. Samples from the two chains, which yielded similar results, were combined. Convergence of the chains was checked using the program tracer 1.4 (Rambaut and Drummond 2004), and the effective sample size for each parameter exceeded 200, which suggested acceptable mixing and sufficient sampling. The TMRCA for all sequences and for all major clades was also estimated using this approach, for comparison with the TMRCA from MDIV and with the results of the methods implemented in r8s.
A coalescent-based MCMC simulation, as implemented in the program IM (Hey and Nielsen 2004), was used to estimate the population sizes, migration rates and divergence times for the main clades. The IM (isolation with migration) model assumes that an ancestral population splits into two populations at a time t and that the descendant populations may exchange migrants in both directions at unequal rates (Hey and Nielsen 2004; Hey 2005). The parameters are scaled by the neutral mutation rate (μ): θA, θ1 and θ2 are the population mutation rates for the ancestral population and the two daughter populations since divergence, where θ = Neμ and Ne is the effective population size; t is the time since population splitting, where t = Tμ and T is the time in years; and m1 and m2 are the asymmetric migration rates between the two populations. In addition, the program records an estimate of the time to the most recent common ancestor (TMRCA).
Results
Genetic diversity
A total of 1140 bp of the mtDNA cyt b gene sequences were obtained for 158 C. sinensis specimens analysed (GenBank accession numbers AM910624-39, AM711121-24, AM921759-75, AM922265-82, AM922487-505, AM924168-86, AM930249-64, AM930396-411, AM930557-72 and AM937063-79). In total, 142 unique haplotypes defined by 505 variable sites were identified. The sequences were aligned unambiguously with no indels observed in the data set. No evidence was found for saturation in transitions or transversions (graphs not shown). No significant deviations from neutrality were detected when all haplotypes were pooled together (Fu and Li 1993; D = −0.07, p > 0.10; F = 0.04, p > 0.10; Tajima 1989; D = 0.14, p > 0.01) or within individual populations.
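For readers unfamiliar with the neutrality statistics reported above, the sketch below shows how Tajima's D is computed from the sample size, the number of segregating sites and the mean number of pairwise differences, following the standard formula of Tajima (1989). The function name and the example inputs are hypothetical and are not values from this study.

```python
import math


def tajimas_d(n, segregating_sites, mean_pairwise_diff):
    """Tajima's D from sample size n, S and the mean pairwise differences k."""
    if n < 4 or segregating_sites < 1:
        raise ValueError("need n >= 4 and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = segregating_sites
    # D contrasts theta_pi (k) with Watterson's estimate (S / a1),
    # scaled by the estimated standard deviation of their difference.
    variance = e1 * s + e2 * s * (s - 1)
    return (mean_pairwise_diff - s / a1) / math.sqrt(variance)


# Hypothetical inputs: 10 sequences, 16 segregating sites, k = 3.888889.
print(round(tajimas_d(10, 16, 3.888889), 3))
```

With these inputs D is approximately −1.45. A negative D means fewer pairwise differences than expected from the number of segregating sites, and values near zero, as reported here, are consistent with neutrality.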
Haplotype diversities were remarkably high within populations, ranging between 0.500 and 1.000, except for the fixation of a single haplotype in the FC population of the Hainan subregion (Table 1). Hierarchical comparisons revealed that nucleotide divergence (Da) among geographical subregions (with an average of 12.20%) was higher than that among populations (mean = 9.63%). Among subregions, the nucleotide diversity (θπ) was highest in the Pearl River subregion (9.487%) and lowest in the Taiwan subregion (4.383%), with intermediate values in the Yangtze River (8.949%), Hainan (7.719%) and ZheMin (6.983%) subregions (Table 1). Estimates of the current (θπ) and historical (θw) genetic diversity per site indicated that most populations (35 sample sites as separate units) showed a pattern of decline (θπ < θw; Table 1), whereas the pooled samples of most subregions and of the species as a whole displayed a pattern of growth (θπ > θw; Table 1), indicating that the genetic diversity of C. sinensis as a whole is expanding while shrinking locally.
Phylogenetic analysis
Trees reconstructed with the different phylogenetic methods were highly consistent. The neighbour-joining tree based on the complete data set, with C. taenia as the outgroup (Fig. 2), displayed a well-resolved phylogeny. The topology was identical when C. choii was used as the outgroup (data not shown). Six major clusters, referred to hereafter as haplogroups A–F, were identified and were supported by high bootstrap values, except for haplogroup D. In addition, the phylogeny using the other loaches, C. macrostigma and C. lutheri, as outgroups did not display significant differences (Figs 2 and 3). When rooted at C. macrostigma or C. lutheri, haplogroup D was divided into two minor haplogroups (D1 and D2) (Fig. 3).


Samples of each population were nested within one haplogroup, except four populations (TS, CZ, FA and PL) that consisted of different haplogroups (A, B, C, D, E) (Fig. 2; Table 1). The six haplogroups were allopatric in distribution with some overlaps (Figs 1 and 2; Table 1). Haplogroup A had the widest geographical range, occurring in ZheMin, Pearl River, that is, SF, and southern Taiwan subregions, that is, TW and KP, while haplogroup F occurred exclusively in northern Taiwan. Haplogroup B was distributed in Yangtze River and ZheMin subregions. The population SZ (southern ZheMin subregion) and all populations of Hainan subregion comprised haplogroup C, while the populations of Yangtze River subregion (SJ, TS and CZ) and FA (ZheMin subregion) populations together comprised haplogroup D. Haplogroup E was distributed in Pearl River and Hainan subregions. The divergences (uncorrected p-distances) among these six haplogroups ranged from 9.8% to 16.8% and ranged from 3.3% (haplogroup B) to 6.7% (haplogroup E) within haplogroup. However, these six haplogroups were assorted into two major monophyletic phylogroups with high bootstrap values, phylogroup I and phylogroup II (Figs 2 and 3). Phylogroup I consisted of haplogroups A–D, and phylogroup II contained haplogroups E and F. The sequence divergence between phylogroups I and II was 13.7%. Within phylogroup I, the pairwise divergences among haplogroups A–D ranged from 9.8% to 14.4%; within phylogroup II, the divergence between haplogroups E and F was 11.1%.
Genetic structuring
amova indicated that most genetic variability in C. sinensis could be explained by partitioning into the six haplogroups (Table 2). For these six haplogroups, suggested by the phylogenetic analysis, the estimated FST value was 0.945. amova attributed 69.58% of the variation to differences among haplogroups, 24.88% to differences among populations within haplogroups and only 5.54% to variation within populations (Table 2). When each of the 35 sample sites was considered as a separate population, FST was estimated at 0.868. However, when the six major haplogroups were analysed separately, amova revealed that most of the diversity occurred within sample sites. These data suggest that gene flow among populations representing discrete haplogroups may not have taken place for a significant period of time. The Mantel test showed only a weak correlation between genetic and geographical distances among the 35 sample sites (r = 0.15; determination of Y by X = 2.1%; p = 0.012), indicating that ‘isolation by distance’ explains little of the genetic structure in this species (graphs not shown).
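The logic of the Mantel test can be sketched as a permutation test on two distance matrices. The version below is a minimal pure-Python illustration under stated assumptions (one-sided test, Pearson correlation, hypothetical function names and toy data); it is not the software used in this study.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(genetic, geographic, permutations=999, seed=1):
    """Return (r, one-sided p) for the matrix correlation."""
    identity = list(range(len(genetic)))
    geo = upper_triangle(geographic, identity)
    observed = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute site labels of one matrix only
        if pearson(upper_triangle(genetic, order), geo) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data: six sites along a river, genetic distance
# strictly proportional to geographic distance (perfect isolation by distance).
positions = [0, 1, 3, 6, 10, 15]
geo_toy = [[abs(a - b) for b in positions] for a in positions]
gen_toy = [[0.01 * abs(a - b) for b in positions] for a in positions]
r, p = mantel_test(gen_toy, geo_toy)
print(round(r, 3), p < 0.05)
```

Site labels rather than individual matrix cells are permuted because the pairwise distances within a matrix are not independent of one another.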
Scheme / source of variation | % Var. | Statistic | p |
---|---|---|---|
Five geographical groups (Yangtze River) (ZheMin) (Pearl River) (Hainan) (Taiwan) | | | |
Among regions | 39.44 | FCT = 0.394 | <0.001 |
Among populations in region | 54.77 | FSC = 0.904 | <0.001 |
Within population | 5.79 | FST = 0.942 | <0.001 |
Two groups suggested by phylogenetic analysis (phylogroups I & II) | | | |
Among groups | 49.69 | FCT = 0.497 | <0.001 |
Among populations in group | 41.78 | FSC = 0.830 | <0.001 |
Within population | 8.53 | FST = 0.915 | <0.001 |
Six groups suggested by phylogenetic analysis (haplogroups A–F) | | | |
Among groups | 69.58 | FCT = 0.696 | <0.001 |
Among populations in group | 24.88 | FSC = 0.818 | <0.001 |
Within population | 5.54 | FST = 0.945 | <0.001 |
35 sample sites as separate units (see Fig. 1 & Table 1) | | FST = 0.868 | <0.001 |
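The fixation indices in Table 2 follow arithmetically from the percentages of variation under the standard hierarchical definitions (Excoffier et al. 1992): FCT is the among-group share of the total, FSC is the among-population share of the variation found within groups, and FST is everything except the within-population share. The short sketch below, with a hypothetical function name, reproduces the tabulated values from the three percentages of each scheme.

```python
def fixation_indices(among_groups, among_pops_within_groups, within_pops):
    """Return (FCT, FSC, FST) from AMOVA variance components or percentages."""
    total = among_groups + among_pops_within_groups + within_pops
    f_ct = among_groups / total
    f_sc = among_pops_within_groups / (among_pops_within_groups + within_pops)
    f_st = (among_groups + among_pops_within_groups) / total
    return f_ct, f_sc, f_st


# Percentages of variation taken from Table 2.
schemes = {
    "five geographical groups": (39.44, 54.77, 5.79),
    "two phylogroups": (49.69, 41.78, 8.53),
    "six haplogroups": (69.58, 24.88, 5.54),
}
for name, components in schemes.items():
    f_ct, f_sc, f_st = fixation_indices(*components)
    print(f"{name}: FCT = {f_ct:.3f}, FSC = {f_sc:.3f}, FST = {f_st:.3f}")
```

For the six-haplogroup scheme this gives FCT = 0.696, FSC = 0.818 and FST = 0.945, so the among-group percentage (69.58%) pairs with FCT and the within-population percentage (5.54%) with FST.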
Divergence times and the time to the most recent common ancestor (TMRCA)
Calibrated at two time points, the evolutionary rate of cyt b in C. sinensis was estimated to be 1.00% sequence divergence per lineage per MY (2.00% divergence per pairwise comparison per MY). According to this molecular clock, the two major phylogroups of C. sinensis (phylogroups I and II, Figs 2 and 3) diverged about 7.24 MYA. In addition, the Bayesian approach implemented in the program beast yielded an estimated TMRCA for the C. sinensis diversification of about 8.76 MY, the likelihood approach implemented in MDIV yielded an estimate of 8.61 MY, and the penalized likelihood method in r8s suggested a time of 7.72 MY (Table 3); the three methods thus gave broadly similar estimates. The TMRCA of each haplogroup was estimated by r8s at approximately 1.40–4.79 MY (Table 3). Haplogroup D was the oldest, while haplogroup F was the most recent. MDIV estimates of divergence times among haplogroups ranged from 4.22 to 7.10 MY (Table 3), suggesting that all of these events occurred during the period from the late Miocene to the early Pliocene.
Estimate | r8s | MDIV | BEAST |
---|---|---|---|
All sequences (TMRCA) | 7.72 (7.56–7.88) | 8.61 (8.43–8.79) | 8.76 (8.58–8.94) |
Haplogroup A | 2.57 (2.51–2.63) | 2.95 (2.89–3.01) | 2.56 (2.50–2.62) |
Haplogroup B | 2.20 (2.16–2.24) | 2.89 (2.83–2.95) | 3.22 (3.16–3.28) |
Haplogroup C | 4.23 (4.15–4.31) | 4.22 (4.14–4.30) | 4.44 (4.36–4.52) |
Haplogroup D | 4.79 (4.69–4.89) | 4.83 (4.73–4.93) | 5.24 (5.14–5.34) |
Haplogroup E | 4.12 (4.04–4.20) | 4.66 (4.56–4.76) | 4.74 (4.64–4.84) |
Haplogroup F | 1.40 (1.38–1.42) | 1.59 (1.55–1.63) | 1.68 (1.64–1.72) |
Divergence time | | | |
Haplogroups C–D | | 6.34 (5.10–7.58) | |
Haplogroups D–E | | 7.10 (5.71–8.49) | |
Haplogroups B–D | | 5.66 (4.56–6.77) | |
Haplogroups E–F | | 4.98 (4.01–5.96) | |
Haplogroups A–B | | 4.22 (3.40–5.05) | |
Population size | | | |
Haplogroup A | | 85.99 | 134 |
Haplogroup B | | 25.45 | 65 |
Haplogroup C | | 19.51 | 57 |
Haplogroup D | | 19.15 | 50 |
Haplogroup E | | 41.88 | 97 |
Haplogroup F | | 35.14 | 84 |
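As a back-of-envelope check on the time scale, a strict molecular clock converts a pairwise divergence d into a split time T = d / (2r), where r is the per-lineage rate. The sketch below applies this to the divergences reported in the Results, using the 1.00% per lineage per MY rate estimated here. It is only an approximation under stated assumptions: the function name is hypothetical, the input distances are taken as given without any substitution-model correction, and the result is not expected to match the r8s, MDIV or BEAST estimates exactly.

```python
def strict_clock_time_my(pairwise_divergence_pct, lineage_rate_pct_per_my=1.00):
    """Split time in MY under a strict clock: T = d / (2 * r)."""
    if lineage_rate_pct_per_my <= 0:
        raise ValueError("rate must be positive")
    # Two lineages accumulate changes independently, hence the factor of 2.
    return pairwise_divergence_pct / (2.0 * lineage_rate_pct_per_my)


for label, divergence in (
    ("phylogroups I vs II", 13.7),
    ("haplogroups E vs F", 11.1),
):
    print(f"{label}: {strict_clock_time_my(divergence):.2f} MY")
```

These simple conversions give about 6.85 MY and 5.55 MY, the same order of magnitude as the model-based estimates of 7.24 MYA for the two phylogroups and 4.98 MY for haplogroups E and F.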
IM coalescence
IM analyses revealed unambiguous marginal posterior probability distributions of the parameters for all comparisons. Five independent runs converged to similar parameter values (ca. 50 × 10⁶ steps). The posterior probability distributions were mainly unimodal and had reasonable 95% highest posterior density (HPD) intervals for most of the parameters (Fig. 4). In accordance with the mtDNA phylogeny, the peak posterior estimate of time since divergence (TD) between haplogroups D and E was greater than that of the other comparisons. The estimated divergence time between haplogroups A and B was the lowest. These divergence times estimated by IM corresponded to those calculated by MDIV (Table 3). In all marginal posterior density distributions of divergence time, the upper tail did not return to zero, so credibility intervals could not be defined (Fig. 4). The posterior distribution of the migration parameters had the highest probability near zero and decreased steeply at higher values, which was interpreted as effective isolation of the different haplogroups (data not shown). Based on the peak posterior estimates of the population mutation rate (θ) for all comparisons, the ancestral theta (θA) had broad confidence limits (Fig. 4). θ (θ = Neμ) corresponds to the effective population size (Ne), or effective population diversity (Dolman and Moritz 2006). Accordingly, the effective population diversity estimate for haplogroup A was considerably higher: the IM coalescence results indicated that the population diversity in haplogroup A is twofold higher than that of the other haplogroups. In contrast, haplogroup B had lower population diversity than haplogroup A (Fig. 4a). All these results corresponded to the effective population sizes estimated with BEAST and MDIV (Table 3). That is, the population size of haplogroup A was the greatest and that of haplogroup D was the smallest.

Discussion
Cryptic species in C. sinensis
This study discovered a large number of haplotypes and high levels of nucleotide diversity in C. sinensis, a result likely associated with its widespread distribution and abundance in each river (cf. Tang et al. 2006). Gene genealogies of the complete cyt b gene displayed deep splits among six haplogroups that correspond to geographical subregions with only some overlap (Figs 1 and 2; Table 1). Coalescent simulations based on the IM model and amova suggested that gene flow has been obstructed between the geographical areas harbouring the major haplogroups. Such broad allopatry of divergent lineages corresponds to the category I pattern (cf. Avise et al. 1987; Avise 2000), which is considered a result of long-term extrinsic barriers to genetic exchange. The climatic or habitat conditions differed among these regions and therefore strongly affected the levels of genetic diversity. Accordingly, these six major haplogroups of C. sinensis in East Asia may represent different evolutionary entities. High genetic divergence within species has also been observed in other fishes of the same region, namely the two cyprinids Zacco platypus Temminck & Schlegel, 1846 (Perdices et al. 2004, 2005; Perdices and Coelho 2006) and Opsariichthys bidens Günther, 1873 (Berrebi et al. 2005, 2006; Li et al. 2009).
The present mtDNA data indicate that two phylogroups (I and II) comprising six haplogroups (A–F) exist within this freshwater species. The sequence divergence between phylogroups I and II was 13.7%. Within phylogroup I, the pairwise divergence among haplogroups A–D ranged from 9.8% to 14.4%; within phylogroup II, the divergence between haplogroups E and F was 11.1%. Altogether, the divergences among these six haplogroups ranged from 9.8% to 16.8%, and the divergence within haplogroups ranged from 3.3% (haplogroup B) to 6.7% (haplogroup E). Such high levels of intraspecific genetic diversity imply the existence of cryptic species when genetic divergence is taken as a criterion for defining species (Johns and Avise 1998). In fishes, the intraspecific distances in seven Glyptothorax species were estimated at 0.17 ± 0.05% (Singh et al. 2012); intraspecific variation at cyt b in Goniistius varied from 4.5% to 13.7% (Burridge and White 2000); and the distance among haplotypes of O. bidens varied from 3.6% to 28.2% (Perdices et al. 2005; Li et al. 2009). Pairwise sequence divergence between Squalidus argentatus and S. nitens was 0.4% (Yang et al. 2006), and the distance between Pareuchiloglanis sinensis and P. anteanalis was zero (Peng et al. 2004). Mean uncorrected cyt b p-distances among congeneric species were 14.0 ± 2.3% for Cobitis and 14.7 ± 3.0% for Misgurnus (Perdices et al. 2012). Likewise, sequence divergences among eight deep lineages of Albula spp. were 5.56–30.6% (Colborn et al. 2001). Taken together, the high genetic diversification implies that geographical populations of C. sinensis tend to diverge.
The phylogeny topologies (Figs 2 and 3) revealed paraphyly of haplogroup D. Furthermore, when adding other loaches, C. choii, C. macrostigma, C. taenia and C. lutheri into in-groups, the phylogeny uncovered paraphyly of C. sinensis. The phylogenetic analysis revealed close affinity between haplogroups A–D of C. sinensis, which were assorted into three monophyletic groups (A+B, C and D), and C. macrostigma and C. lutheri, and affinity between haplogroups E and F, as a monophyletic group, and C. choii and C. taenia (data not shown). Paraphyly of C. sinensis and high levels of intraspecific genetic divergence suggested existence of cryptic species (cf. Tang et al. 2008). However, many studies suggested that species and genus must fulfil two criteria, monophyly and distinctness (Wiley 1978; Paterson 1985; Cracraft 1989; Gill et al. 2005). Here, most loach species, for example, C. paludica, C. biwae, C. striata, C. lutheri and Misgurnus anguillicaudatus, do not meet monophyly (Doadrio and Perdices 2005; Perdices et al. 2012). Likewise, Perdices and Coelho (2006) found paraphyly of diverged species of Zacco platypus and Opsariichthys bidens. Tang et al. (2008) suggested existence of cryptic species based on diverse morphological characteristics of C. sinensis (Chen 1981; Zhu 1995; Chen and Chen 2005), which apparently was supported by the genetic analyses.
Origin and colonization history
The evolutionary rate of cyt b in the morphospecies C. sinensis was estimated here to be 1.00% sequence divergence per MY, which is higher than the rates in other Cobitis species, ranging from 0.68% (Doadrio and Perdices 2005) to 0.84% (Perdices and Doadrio 2001), but close to the rates in other fishes (Bermingham et al. 1997; Johns and Avise 1998). Brown et al. (1979) and Klicka and Zink (1997) proposed a “standard” mtDNA clock calibration of about 2% sequence divergence per MY between a pair of lineages. Wang et al. (2011) also found that the mutation rate of 1.7% for Candidia barbatus was much higher than that (0.54–0.82%) for other cyprinid fishes (Zardoya and Doadrio 1999; Ruber et al. 2007). Ho et al. (2005) suggested that estimates of mutation rates based on population-level and pedigree data are usually higher than substitution rates inferred at the species level, and Wang et al. (2011) suggested that a molecular clock estimated from a distinct group might not necessarily be suitable at the population level.
Accurate estimation of the times of evolutionary events is critical to understanding the evolutionary forces that shape population dynamics (Shapiro et al. 2004). However, single-locus estimates of population divergence should always be treated with caution because they face two main limitations: (1) overestimation due to polymorphism in the ancestral population and (2) a large variance due to the stochastic nature of the lineage sorting process (Jennings and Edwards 2005). To reduce these biases, the TMRCA of the major population groups was estimated with different approaches, including MDIV analysis and the coalescent-based BEAST analysis. The TMRCA estimates from the different methods did not show statistically significant differences (Table 3). Furthermore, in the phylogenetic analyses (Figs 2 and 3), cyt b sequences identified the morphospecies C. sinensis as a highly diverse taxon composed of six geographically structured haplogroups, which may correspond to different cryptic species. Arbogast et al. (2002) suggested that deep divergences are usually less affected by ancestral polymorphism than more recent events. Therefore, the divergence times in this study are likely to be reasonable estimates.
Phylogenetic analysis revealed that Cobitis split into two phylogroups (I and II) (Figs 2 and 3), which diverged about 7.24 MYA (Miocene). Haplogroups D and E are sister to the remaining haplogroups of phylogroups I and II, respectively. Haplogroup E is mostly distributed in Pearl River, and the Pearl River population ZS splits from the basal node in this haplogroup, indicating that the Pearl River populations are ancestral. Within haplogroup D, the first-splitting minor haplogroup (D2) was distributed in the Yangtze (SJ) and ZheMin (FA) subregions, and the minor haplogroup D1 was distributed in Yangtze River. Based on the fish fauna, Li and Zheng (1998) suggested that some fish taxa of the ZheMin subregion originated from the lower reaches of Yangtze River in the early Pleistocene. Later, in the mid-Pleistocene, the rising Wuyi Mountains hindered the migration of fishes from Yangtze River. Accordingly, the ancestral area for phylogroup I was likely located in Yangtze River, from which the FA population may have originated. Based on the geomorphology, many tributaries of Yangtze River are very close to those of Pearl River, and the upper reaches of Yangtze River were once connected with the upper reaches of Pearl River (Ren and Han 1959; Liu 1998; Zhang 1999; Yap 2002), thus allowing populations of Cobitis to disperse freely. The subsequent geographical separation of the two river systems divided Cobitis into two phylogroups.
Within phylogroup I, Cobitis may have migrated from the lower reaches of Yangtze River (haplogroup D) to ZheMin (SZ) and Hainan (haplogroup C), followed by eastward expansion to northern ZheMin and then southward through southern ZheMin to the southern Taiwan subregion (haplogroups A–B). According to the present river system, dispersal of Cobitis from Yangtze River to Pearl River, Hainan and Taiwan must have passed through the ZheMin subregion (southeast coastal districts) (Fig. 1), although the southeast coastal districts of mainland China had not been formed until the Pliocene (ca. 2–5 MYA) (Zhang et al. 1990; Zheng 2004). That is, the first dispersal event of Cobitis phylogroup I may have taken place during Miocene glaciations, followed by the formation of the southeast coastal districts of mainland China.
Within phylogroup II, haplogroup E was distributed in Pearl River and in the PL population (Hainan subregion). Based on the geomorphology, the upper reaches of PeiLun River (PL) and the upper reaches of the tributaries of Pearl River lie close to each other. Our results also suggest that these two rivers were possibly once connected. In Cobitis phylogroup II, the ancestral population was likely located in Pearl River, from which eastward migration to northern Taiwan (haplogroup F) occurred during the early Pliocene. Altogether, Cobitis in Taiwan likely originated from two different ancestral populations, one in the ZheMin subregion and one in Pearl River (cf. Chiang et al. 2010). Moreover, haplogroup F diverged from its sister group (E) 4.98 MYA, while the TMRCA for haplogroup F was estimated at only 1.59 MYA (Table 3). This sharp difference may stem from the loss of genetic diversity in haplogroup F over the course of colonization, which would lead to an underestimated TMRCA.
Surprisingly, the population of SinFeng (SF), from a tributary of Pearl River, was more closely related to the populations of the southern ZheMin subregion (haplogroup A) than to other populations of the same river (haplogroup E) (Fig. 2), even though the two river drainages are not connected. This unusual phylogeographical pattern was also found in Glyptothorax fokiensis (Chen et al. 2007). According to the geological records (Zhang et al. 1990), SinFeng River once flowed northwards to the ZheMin subregion, whereas it currently flows southwards to the Pearl River subregion. Such a historical river connection may partly explain the unusual phylogeography.
Acknowledgements
We are grateful to Prof. Shao K.T., Biodiversity Research Center, Academia Sinica, Wu W.L., Lee Y.F., Department of Life Sciences, Cheng Kung University, and Tzeng C.S., Department of Life Sciences, Tsing Hwa University, for stimulating discussions. The corresponding author is very grateful for the grant support from Biodiversity Research Center, Academia Sinica, Nankang, Taipei, Taiwan, from 2008 to 2011. Part of this work was carried out using the resources of the Computational Biology Service Unit from Cornell University, which is partially funded by Microsoft Corporation. The authors thank two anonymous reviewers for valuable comments on the manuscript.