DNA sequence variation in BpMADS2 gene in two populations of Betula pendula
Abstract
The PISTILLATA (PI) homologue, BpMADS2, was isolated from silver birch (Betula pendula Roth) and used to study nucleotide polymorphism. Two regions (together about 2450 bp) comprising mainly untranslated sequences were sequenced from 10 individuals from each of two populations in Finland. The nucleotide polymorphism was low in the BpMADS2 locus, especially in the coding region. The synonymous site overall nucleotide diversity (πs) was 0.0043 and the nonsynonymous nucleotide diversity (πa) was only 0.000052. For the whole region, the π values for the two populations were 0.0039 and 0.0045, and for the coding regions, the π values were only 0 and 0.00066 (for the corresponding coding regions of Arabidopsis thaliana PI, the world-wide π was 0.0021). Estimates of π or θ did not differ significantly between the two populations, and the two populations had not diverged from each other. Two classes of BpMADS2 alleles were present in both populations, suggesting that this gene exhibits allelic dimorphism. In addition to the nucleotide site variation, two microsatellites were also associated with the haplotypes. This allelic dimorphism might be the result of postglacial re-colonization partly from northwestern, partly from southeastern/eastern refugia. The sequence comparison detected five recombination events in the regions studied. The large number of microsatellites in all of the three introns studied suggests that BpMADS2 is a hotspot for microsatellite formation.
Introduction
Genetic variation exists within every species and forms the basis for selection and evolution. Levels and patterns of genetic diversity vary greatly within and among populations (and species) and are shaped by both population level and genomic evolutionary forces (Small et al. 1999; Vieira & Charlesworth 2001). This diversity can be studied at different levels, but DNA sequence data represent the highest level of genetic resolution. Molecular variation in genes that regulate development provides insight into the evolutionary processes that shape the diversification of morphogenetic pathways (Purugganan & Suddith 1999). However, little is so far known about the molecular population genetics of developmental pathways and the genes that govern them.
The Quaternary cold periods in Europe are thought to have heavily influenced the amount and distribution of intraspecific genetic variation in plants (Taberlet et al. 1998). For morphological and other quantitative traits, there is extensive variation within populations in many forest trees (Stern 1964; Howland et al. 1995). Variations in defensive chemistry between and within Betula pendula populations are also considerable (Keinänen et al. 1999; Laitinen et al. 2000). Similarly, a high level of variation in trees has been found at isozyme loci (Hamrick & Godt 1996). High variation is assumed to give an adaptive advantage for a long-lived species subject to conditions varying greatly from year to year (Stern 1964).
Within morphologically variable natural populations of B. pendula and B. pubescens, restriction fragment length polymorphism (RFLP) and random amplified polymorphic DNA (RAPD) studies showed a high degree of polymorphism in both species (Howland et al. 1995). Scots pine (Pinus sylvestris), a long-lived predominantly outcrossing tree, also had high RFLP and microsatellite diversity (Karhu et al. 1996), but nucleotide diversity was rather low (Dvornyk et al. 2002).
Other results on nucleotide level evolution of trees concern substitution rates between species (Liston et al. 1999; Manos et al. 1999; Steane et al. 1999). The genera Alnus and Betula showed different substitution rates for three different genes: substitutions per site per year were 0.43 × 10⁻¹⁰ for the rbcL gene, 1.2 × 10⁻¹⁰ for the nuclear 18S rRNA gene and 11 × 10⁻¹⁰ for the ITS1 and ITS2 regions (Savard et al. 1993). These rates are lower than the synonymous and nonsynonymous substitution rates in Drosophila: 15.60 × 10⁻⁹ and 1.91 × 10⁻⁹ substitutions per site per year (Li 1997; pp. 190–191). Because the substitution rate at neutral loci is governed by the mutation rate, low substitution rates predict low nucleotide polymorphism, 4Neµ (where Ne is the effective population size and µ is the mutation rate).
We have been studying the genetic regulation of flower development in Betula pendula Roth (silver birch) (Elo et al. 2001; Lemmetyinen et al. 2001), which is one of the three ecologically most important forest tree species in Northern Europe (Anonymous 1999). The current population structure of B. pendula in Finland appears to have been formed mainly by the postglacial migration from refugia located southwest, south and southeast from Finland (Huntley & Birks 1983; Hyvärinen 1987; Willis et al. 2000). Betula pendula is a wind-pollinated, diploid species with 2n = 28 (de Jong 1993). In contrast to typical angiosperms, the monoecious birch has separate male and female catkins (inflorescences). The male flowers consist of a minute perianth and two stamens, whereas the female flowers consist only of a pistil, having no perianth (Atkinson 1992).
Molecular and genetic data show that the mechanisms controlling flower development are largely conserved even in distantly related plant species (Yanofsky 1995). The focus of this study is the birch gene BpMADS2. BpMADS2 is similar to PISTILLATA (PI) from Arabidopsis thaliana and GLOBOSA (GLO) from Antirrhinum majus, which are homeotic B-function genes. B-function genes are needed for the specification of the identity of petals and stamens, and mutations in B-function genes display transformations of petals into sepals and stamens into carpels (Jack et al. 1992; Goto & Meyerowitz 1994). In angiosperms, there are two B-function genes, which belong to related clades of the plant MADS (yeast MCM1, Arabidopsis thaliana AGAMOUS, Antirrhinum majus DEFICIENS, human SRF) box genes and which encode polypeptides functioning as heterodimeric transcription factors (Yanofsky 1995). In addition to the well-conserved DNA-binding MADS domain, MADS proteins also contain three other regions, the I-region, the K-domain and the C-terminal region, which are less conserved than the MADS domain.
In this study we describe the isolation of BpMADS2 and try to find the answers to the following questions. (i) Is BpMADS2 a true birch homologue of PI and GLO? (ii) What is the pattern of variation in the different parts of the gene, also in comparison to the available Arabidopsis PI data? (iii) How is the level of variation related to the low substitution rates found in many woody angiosperms? (iv) Is the postglacial colonization history reflected in the distribution of variation within populations?
Materials and methods
Plant materials
For the isolation of cDNA and for the Southern and Northern blot hybridizations male and female inflorescences were collected from adult trees so that all the main developmental stages were represented. Leaves for Northern blot hybridization were collected from 4-week-old in vitro grown Betula pendula (clone JR 1/4) seedlings. The samples were frozen in liquid nitrogen and stored at −80 °C.
For the analysis of variation, young leaves of B. pendula were collected from adult trees (≈ 65–70 years of age and 20–25 m in height, 10 individuals from each population) in Punkaharju (61°49′ N, 29°19′ E), Finland and in Rovaniemi (66°20′ N, 26°40′ E), Finland (Fig. 1). The distance between these sites was about 550 km. The samples were frozen and stored as above.

Fig. 1. The geographical location of the two Betula pendula populations in Finland. The distance between the populations is about 550 km.
Isolation of the cDNA clone of BpMADS2
Isolation of total RNA was performed according to Friemann et al. (1992) and isolation of mRNA was according to Davis et al. (1986; pp. 139–142). First-strand cDNA was prepared from the mRNA extracted from young male inflorescences using a First-Strand cDNA Synthesis kit (Amersham Pharmacia Biotech). Partial cDNA corresponding to BpMADS2 was obtained by polymerase chain reaction (PCR) first using a partially degenerate forward primer (5′ TGTGTTGTCGACGATGGGIAGAGGAAA(A/G)ATIGA G 3′), corresponding to peptide sequence MGRGKIE from the MADS-boxes of the known PI-homologues, and a reverse primer (5′ GTGAATTCTTCICCITTIA(A/G)(A/G)TG(T/C)CT G 3′), corresponding to peptide sequence RHLKGE from the K-boxes of the known PI-homologues. Conditions for the first PCR amplification with the degenerate primers were as follows: 96 °C for 2 min, five cycles of 95 °C for 1 min, 44 °C for 1 min, 72 °C for 1 min, followed by 30 cycles of 95 °C for 1 min, 53 °C for 1 min, 72 °C for 1 min and an extension of 5 min at 72 °C. The PCR product was cloned into the pUC18 vector and sequenced. A new forward primer (5′ TGATGCTAAAGTTCCTCTTG 3′) was designed on the basis of the sequence obtained and a second PCR was performed with this specific primer, together with an oligo-d(T) primer. Conditions for the second PCR amplification with the specific and oligo-d(T) primers were as follows: 96 °C for 2 min, followed by 30 cycles of 95 °C for 30 s, 53 °C for 30 s, 72 °C for 1 min and an extension of 2 min at 72 °C. Both of the PCR reactions were performed using Tbr polymerase (Dynazyme, Finnzymes Inc.). The PCR product was cloned into the pUC18 vector and sequenced. All sequencing reactions were performed with the dideoxy chain-termination method using the T7Sequencing™ Kit (Amersham Pharmacia Biotech). Both strands of the clone were sequenced and all differences between the strands were visually rechecked from chromatograms.
The cDNA sequence of BpMADS2 is available from the EMBL Nucleotide Sequence Database (accession number AJ488589).
Isolation of genomic DNA and Southern blot hybridization
Southern blots were prepared by standard methods. DNA was isolated from leaves of B. pendula using a DNeasy Plant Mini kit (QIAGEN Corp.), digested with BamHI, XbaI, PstI and EcoRI restriction enzymes and 10 µg of total DNA was loaded per lane. A 533-bp fragment comprising a part of the K-box and the entire C-region of the BpMADS2 cDNA clone was used as a probe (nucleotides 337–870). This fragment was isolated by digestion of the BpMADS2 cDNA in the pUC18 vector (Amersham Pharmacia Biotech) by EcoRV and XbaI restriction enzymes. The filter was hybridized at 42 °C using a low-temperature hybridization solution for 16 h and washed twice for 15 min at room temperature in 1 × SSC (saline-sodium citrate)/0.5% sodium dodecyl sulphate (SDS), twice for 15 min at 37 °C in 1 × SSC/0.5% SDS and finally for 20 min at 65 °C in 0.1 × SSC/1% SDS.
Isolation of RNA and Northern blot hybridization
Total RNA was isolated using an RNeasy Plant Mini kit (QIAGEN Corp.). A Northern blot was prepared by standard methods with 10 µg RNA per lane. The same probe was used as for the Southern blot hybridization. The filter was hybridized at 42 °C for 16 h. The filter was first washed at room temperature in 1 × SSPE (saline-sodium phosphate-EDTA)/0.5% SDS, then washed twice for 15 min each at 37 °C in 1 × SSPE/0.5% SDS and finally for 10 min at 65 °C in 0.1 × SSPE/1% SDS.
Expression was also studied with PCR using first-strand cDNA from vegetative shoots, male inflorescences and young female inflorescences as a template (the conditions for the PCR and the primers used were the same as those used for Region I).
Isolation and sequencing of two regions of BpMADS2 gene
Total DNA was extracted from young leaves of B. pendula using a DNeasy Plant Mini kit (QIAGEN Corp.) and used as a template for PCR amplification. Polymerase chain reactions were conducted using either Tbr polymerase (Dynazyme, Finnzymes Inc., Region I, Fig. 2) or Expand enzyme mix (Boehringer Mannheim, Region II, Fig. 2). The error rate for Tbr polymerase according to the manufacturer is twofold lower than that of Taq DNA polymerase. Expand polymerase mix is composed of a unique enzyme mix containing thermostable Taq and Pwo (proofreading activity) DNA polymerases. Pwo polymerase can reduce error frequency by a factor of 10 compared to Taq DNA polymerase. Our own data from direct PCR-product sequencing show no PCR mistakes made by Expand in the coding region of BpMADS2 Region II (about 3000 bp). This suggests, based on a Poisson distribution of errors, that the 95% upper bound for the error rate is 1 × 10⁻³. Primers for PCR amplification were as follows: 5′ TGATGCTAAAGTTCCTCTTG 3′ (forward) and 5′ TGTTGTCGTTCTCTTTCTTG 3′ (reverse) for Region I and 5′ GGAGGAGGAGAATAAGTGCC 3′ (forward) and 5′ GGTGCACGAAATGATCGC 3′ (reverse) for Region II. Each region required different amplification conditions for optimal results. Conditions for Region I were as follows: 96 °C for 2 min, followed by 30 cycles of 95 °C for 30 s, 50 °C for 45 s and 72 °C for 1 min. Conditions for Region II were as follows: 96 °C for 2 min, followed by 30 cycles of 95 °C for 30 s, 57 °C for 45 s and 68 °C for 90 s. After amplification, PCR products were first cloned using a SureClone Ligation kit (Amersham Pharmacia Biotech), the positive clones were selected by PCR using vector primers (Universal primer and Reverse) and both strands were sequenced. Sequencing was conducted with an ALF automated sequencer (at A.I. Virtanen Institute, Kuopio, Finland). Sequencing primers were designed in 300–350-bp intervals. All sequence polymorphisms were visually rechecked from chromatograms.
For direct sequencing, the PCR products were purified using a QIAquick PCR Purification kit (QIAGEN Corp.). The sequencing reaction followed the dideoxy chain-termination method, using the PCR products directly as templates. Region I sequences were totally rechecked with this method and so were the 5′ and 3′ ends and most of the singletons of Region II. The DNA sequence alignments are available from the EMBL Nucleotide Sequence Database (ALIGN_000395 to ALIGN_000398).
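The 95% upper bound quoted above can be reproduced with a short calculation. The sketch below assumes the simplest reading of the text: errors follow a Poisson distribution, no errors were seen in about 3000 sequenced bases, and the bound is the per-base rate at which the probability of seeing zero errors drops to 5%. The function name is hypothetical and not taken from the study.

```python
import math


def zero_error_upper_bound(bases_sequenced: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on a per-base error rate when no errors were observed.

    Under a Poisson model, P(0 errors) = exp(-rate * bases_sequenced).
    The bound is the rate at which that probability equals 1 - confidence.
    """
    if bases_sequenced <= 0:
        raise ValueError("bases_sequenced must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return -math.log(1.0 - confidence) / bases_sequenced


# About 3000 bp of coding sequence with no observed PCR errors (figure from the text).
bound = zero_error_upper_bound(3000)
print(f"95% upper bound on the error rate: {bound:.2e} per base")
```

With 3000 error-free bases the bound comes out at roughly 1 × 10⁻³ per base, in line with the value given in the text.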

Fig. 2. The genomic regions of BpMADS2 used in sequence variation analysis. Region I (about 770 bp) consists of two relatively small introns and some of the coding region. Region II (about 1680 bp) consists of one rather long intron, some of the coding region and 3′ untranslated region. The K-box of BpMADS2 has been cloned but not completely sequenced. According to preliminary results, the K-box (189 bp) of BpMADS2 contains at least two, but according to homology (with PI and GLO) probably three, introns.
Sequence analysis
Nucleotide and amino acid sequences of BpMADS2 were analysed using GCG program package release 10.0, program pileup (Genetics Computer Group, Inc.). Sequences in data banks were used in alignments together with the sequence of BpMADS2. Both nucleotide and amino acid alignments were constructed with genedoc and clustalx programs and refined visually taking both nucleotide and amino acid sequences into consideration. Neighbour-joining trees (Saitou & Nei 1987) of sequences were constructed using the programs available in clustalx (Thompson et al. 1997). In the neighbour-joining analysis, the Kimura-2P distance measure was used. All analyses were performed with 1000 bootstrap replicates.
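For readers unfamiliar with the Kimura two-parameter measure mentioned above, the following is a minimal sketch of its nucleotide form, d = −½ ln[(1 − 2P − Q)√(1 − 2Q)], where P and Q are the proportions of transitions and transversions. It is an illustration only: the function name is hypothetical, and the implementation inside clustalx (including its handling of gaps and of protein alignments) may differ.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns where either sequence has a gap or an ambiguous base are skipped.
    Raises ValueError (from math.log or math.sqrt) if the sequences are too
    divergent for the correction to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))


# Toy example: one transition (A/G) in ten aligned sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A matrix of such pairwise distances is the input that a neighbour-joining algorithm then clusters into a tree.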
The polymorphism data were analysed using the program package dnasp, version 3.5 (Rozas & Rozas 1999). Levels of nucleotide polymorphism were estimated as pairwise differences (π) and an estimate of θ (4Neµ) was obtained based on segregating sites (Ne is the effective population size and µ is the mutation rate, Nei 1987). Insertion/deletion (indel) and microsatellite length variations were not included in the estimates of nucleotide diversity. Identification of possible recombinants utilized the four-gamete test and the minimum number of recombination events (RM) (Hudson & Kaplan 1985). The Tajima (1989) and Fu & Li (1993) tests were used for testing the fit of the frequency distribution to the neutral expectation. Levels of nucleotide diversity between populations were estimated as average number of substitutions per site between populations (Dxy, Nei 1987) and FST (Hudson et al. 1992). Fisher's exact test and χ² test (Sokal & Rohlf 1981) were used to assess the level of linkage disequilibrium (or non-random association between variants of different polymorphic sites). Microsatellite variation was analysed by comparing mean numbers of repeats between populations and haplotypes.
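The two diversity estimators described above can be illustrated on a toy alignment. The sketch below is not the dnasp implementation; it assumes an alignment given as equal-length strings, excludes any column containing a gap (in the spirit of the exclusion of indels described above), and counts segregating sites rather than mutations. All function names are hypothetical.

```python
from itertools import combinations


def _ungapped_columns(alignment):
    """Alignment columns (as tuples) that contain no gap character."""
    return [col for col in zip(*alignment) if "-" not in col]


def nucleotide_diversity(alignment):
    """pi: mean number of pairwise differences per site over all sequence pairs."""
    columns = _ungapped_columns(alignment)
    n = len(alignment)
    n_pairs = n * (n - 1) / 2
    differences = sum(
        1
        for col in columns
        for a, b in combinations(col, 2)
        if a != b
    )
    return differences / n_pairs / len(columns)


def watterson_theta(alignment):
    """theta per site from the number of segregating sites S (Watterson 1975)."""
    columns = _ungapped_columns(alignment)
    n = len(alignment)
    s = sum(1 for col in columns if len(set(col)) > 1)
    a1 = sum(1.0 / i for i in range(1, n))  # harmonic number of n - 1
    return s / a1 / len(columns)


# Toy alignment: four sequences, eight sites, two segregating sites.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
print(round(nucleotide_diversity(toy_alignment), 4))
print(round(watterson_theta(toy_alignment), 4))
```

Under neutrality and constant population size both quantities estimate 4Neµ, which is why their difference is the basis of Tajima's test.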
Results
Isolation and sequence analysis of BpMADS2
PCR with degenerate primers was used to isolate a partial cDNA for the birch homologue of PI, named BpMADS2 (Betula pendula MADS2). Sequence comparisons revealed that this gene belonged to the same subgroup of plant MADS box genes as PI (Goto & Meyerowitz 1994) and GLO (Jack et al. 1992). A second PCR was performed with a specific primer and an oligo-d(T) primer. Sequence comparison revealed that the amplified DNA fragment encompassed an almost full-length cDNA of BpMADS2 (the missing 21 bp from the 5′-end of the coding region of BpMADS2 were isolated later, as part of a λ-clone from a genomic library, data not shown). Comparison of the sequence of BpMADS2 at amino acid level revealed rather high similarity to PI with 59% identity (same amino acids) and 72% similarity (similar amino acids), and also to GLO with 65% identity and 74% similarity and to MdPI (Malus domestica) (Yao et al. 2001) with 67% identity and 75% similarity.
To establish the relationships between BpMADS2 and some of the reported MADS genes from other plant species (Table 1), we produced an alignment of the amino acid sequences belonging to the PI clade (Purugganan et al. 1995; Yao et al. 2001). The whole coding sequences were used. This alignment was used as a guide to construct a phylogenetic tree (Fig. 3). The tree clearly shows that BpMADS2 clusters with 99% bootstrap support along with other higher eudicot PI homologues.
Subclass | Family | Species | Gene | Ref. | Accession no. |
---|---|---|---|---|---|
Asteridae | Scrophulariaceae | Antirrhinum majus | GLO | Trobner et al. (1992) | S28062 |
Asteridae | Solanaceae | Petunia hybrida | FBP1 | Angenent et al. (1992) | M91190 |
Asteridae | Solanaceae | Petunia hybrida | PMADS2 | Kush et al. (1993) | X69947 |
Asteridae | Solanaceae | Nicotiana tabacum | NtGLO | Hansen et al. (1993) | X67959 |
Dilleniidae | Brassicaceae | Arabidopsis thaliana | PI | Goto & Meyerowitz (1994) | D30807 |
Rosidae | Rosaceae | Malus domestica | MdPI | Yao et al. (2001) | AJ291490 |
Rosidae | Betulaceae | Betula pendula | BpMADS2 | This study | AJ488589 |
Rosidae | Myrtaceae | Eucalyptus grandis | EGM2 | Southerton et al. (1998) | AF029976 |
Caryophyllidae | Caryophyllaceae | Silene latifolia | SLM2 | Hardenack et al. (1994) | X80489 |
Ranunculidae | Papaveraceae | Papaver nudicaule | PnPI-1 | Kramer et al. (1998) | AF052855 |
Commelinidae | Poaceae | Oryza sativa | OsMADS2 | Chung et al. (1995) | L37256 |
Commelinidae | Poaceae | Oryza sativa | OsMADS4 | Chung et al. (1995) | L37527 |
Coniferales | Pinaceae | Pinus radiata | PrDGL | Mouradov et al. (unpublished) | AF120097 |
Dilleniidae | Brassicaceae | Arabidopsis thaliana | AP3 * | Thomas et al. (1992) | D21125 |
- * AP3 was used as an outgroup.

Fig. 3. Phylogenetic tree of 13 PI-related genes from different plant species. The tree was constructed with the clustalx neighbour-joining program, using the Arabidopsis thaliana AP3 gene as an outgroup. The numbers show bootstrap values for each node. All nodes with < 50% bootstrap support are collapsed.
Expression of BpMADS2
Southern hybridization revealed that there was only one genomic fragment hybridizing with the probe consisting of the end of the K box, the sequence encoding the C-terminal domain of the protein and the untranslated 3′ end (data not shown). Therefore, BpMADS2 is a single copy gene and this probe without the MADS box could be used as a gene-specific probe.
Northern hybridization (Fig. 4) showed that BpMADS2 was active in male inflorescences of birch and that the activity was weak during the early stages of inflorescence development. When the inflorescences grew rapidly, before flower opening, the expression became stronger. In female inflorescences, there was a very weak expression at the early stage of development. In contrast to some other genes regulating flower development in birch (e.g. BpMADS1, BpMADS3, BpMADS4 and BpMADS5; Elo et al. 2001; Lemmetyinen et al. 2001), no expression was detected at the later stages of female inflorescence development nor was there any expression in the vegetative parts. PCR confirmed that BpMADS2 was expressed in male and young female inflorescences but not in vegetative tissues (data not shown). Inflorescence-specific expression, with high activity in male inflorescences and weak expression in young female inflorescences, is consistent with the idea that BpMADS2 is the PI homologue of birch.

Fig. 4. Expression of BpMADS2 in vegetative parts (leaves) as well as in male and female inflorescences at various stages of development assayed by RNA gel blot. Methylene blue staining was used as a control for RNA loading.
Nucleotide polymorphism
We studied polymorphism in two regions of the BpMADS2 gene (Fig. 2). Two regions were used, because the whole BpMADS2 gene was too long and difficult to amplify as one fragment. The regions studied were selected to contain mainly noncoding regions, which could be expected to vary more than coding regions. Region I (about 770 bp) comprises the 3′ end of the MADS-box (35 bp), the I-region (intervening region, 89 bp) and the 5′ end of the K-box (20 bp) and two introns (113–161 bp and 506 bp). Region II (about 1680 bp) comprises most of the C-terminal region (158 bp), one intron (1347–1360 bp) and some of the 3′ untranslated region (163–165 bp). In Region I we obtained both alleles but in Region II only one allele. In all, nucleotide polymorphism statistics are based on 303 sites in the coding regions (altogether 8960 bp), and 2096 noncoding sites (gaps excluded) where data were available for all sequences.
When all of the sequences from both populations (20 individuals) were aligned together (Tables 2 and 3), we found 69 segregating sites, of which 23 were singletons, 20 were microsatellite length variations and one was an indel. The synonymous site overall nucleotide diversity (πs) was 0.0043 and the nonsynonymous nucleotide diversity (πa) was 0.000052. The frequency spectrum under the neutral model (see Tajima 1989) is compared to the observed distribution in Fig. 5. For Region I, with no recombination, Tajima's D was not significant (Table 4). In Region II, the value (D = −0.9557) was not significant, but because there was evidence of recombination, the test is conservative.
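Evidence of recombination of the kind referred to above is obtained in this study from the four-gamete test and the minimum number of recombination events (Hudson & Kaplan 1985). The sketch below illustrates the idea on toy haplotypes; it is not the dnasp implementation, the function names are hypothetical, and it assumes haplotypes are given as equal-length strings with only gap-free biallelic sites considered.

```python
from itertools import combinations


def four_gamete_violations(haplotypes):
    """Pairs of biallelic sites at which all four two-site haplotypes occur.

    Under an infinite-sites model without recombination, at most three of the
    four possible combinations can arise, so each returned pair implies at
    least one recombination event between the two sites.
    """
    columns = list(zip(*haplotypes))
    biallelic = [
        i for i, col in enumerate(columns)
        if "-" not in col and len(set(col)) == 2
    ]
    violations = []
    for i, j in combinations(biallelic, 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations


def minimum_recombination_events(intervals):
    """Lower bound on recombination events: the largest set of non-overlapping
    intervals, found greedily by increasing right endpoint."""
    count = 0
    last_right = -1
    for left, right in sorted(intervals, key=lambda pair: pair[1]):
        if left >= last_right:
            count += 1
            last_right = right
    return count


# Toy data: sites 0/1 and sites 1/2 each show all four gametes.
toy_haplotypes = ["AAT", "AGT", "GAC", "GGC"]
pairs = four_gamete_violations(toy_haplotypes)
print(pairs, minimum_recombination_events(pairs))
```

Intervals that only share an endpoint are counted separately, because the implied events fall in different segments between sites.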
Individual and allele | Intron I | Intron II | |||||
---|---|---|---|---|---|---|---|
Site | 63–111 (CT)n | 95 C | 153–204 (CT)n | 183 C | 516 C | 620 G | 636 C |
P.1.a. | 13 | . | 14 | . | . | . | . |
P.1.b. | 13 | . | 14 | . | . | . | . |
P.3.a. | 14 | . | 16 | . | . | . | . |
P.3.b. | 14 | . | 11 | . | . | . | . |
P.5.a. | 14 | . | 16 | . | . | . | . |
P.5.b. | 14 | . | 16 | . | . | . | . |
P.7.a. | 14 | . | 16 | A | T | C | T |
P.7.b. | 14 | . | 18 | . | . | . | . |
P.9.a. | 16 | . | 18 | . | . | . | . |
P.9.b. | 13 | . | 11 | A | T | C | T |
P.11.a. | 12 | . | 17 | A | T | C | T |
P.11.b. | 12 | . | 11 | . | . | . | . |
P.13.a. | 13 | . | 16 | . | . | . | . |
P.13.b. | 14 | . | 14 | A | . | . | . |
P.15.a. | 14 | . | 16 | A | T | C | T |
P.15.b. | 11 | . | 18 | . | . | . | . |
P.17.a. | 14 | . | 21 | . | . | . | . |
P.17.b. | 13 | . | 16 | A | T | C | T |
P.19.a. | 12 | . | 13 | A | T | C | T |
P.19.b. | 12 | T | 13 | . | . | . | . |
R.1.a. | 24 | . | 25 | A | T | T | T |
R.1.b. | 14 | . | 18 | . | . | . | . |
R.2.a. | 21 | . | 21 | A | T | C | T |
R.2.b. | 13 | . | 13 | . | . | . | . |
R.4.a. | 12 | . | 13 | . | . | . | . |
R.4.b. | 13 | . | 13 | . | . | . | . |
R.5.a. | 13 | . | 13 | . | . | . | . |
R.5.b. | 13 | . | 13 | T | . | . | . |
R.7.a. | 13 | . | 13 | . | . | . | . |
R.7.b. | 16 | . | 13 | A | . | . | . |
R.8.a. | 13 | . | 15 | . | . | . | . |
R.8.b. | 14 | . | 14 | . | . | . | . |
R.10.a. | 11 | . | 20 | A | T | C | T |
R.10.b. | 14 | . | 15 | . | . | . | . |
R.11.a. | 12 | . | 13 | . | . | . | . |
R.11.b. | 14 | . | 13 | . | . | . | . |
R.13.a. | 14 | . | 15 | . | . | . | . |
R.13.b. | 14 | . | 18 | . | . | . | . |
R.14.a. | 12 | . | 13 | . | . | . | . |
R.14.b. | 16 | . | 20 | A | T | C | T |
- Only differences from the consensus sequence are shown. Dots indicate identity with the consensus sequence. The positions of the polymorphic sites in two different introns are indicated at the top. All allele sequences from two different populations are aligned together.
- a, cloned allele; b, allele from direct sequencing; P, Punkaharju; R, Rovaniemi.
Ind. | Intron | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Site | 165 A | 183 G/C | 189 C | 306 C | 322 A | 337 A | 338–345 (T)n | 344 T | 346–366 (A)n | 375–379 (A)n | 385 C | 404–407 TAG | 425 C | 428 A/G | 443–446 (A)n |
P.1. | . | G | . | . | . | . | 8 | . | 13 | 4 | . | . | . | A | 3 |
P.3. | . | G | . | . | . | . | 7 | . | 16 | 4 | . | . | . | A | 3 |
P.5. | . | G | . | . | . | . | 7 | . | 15 | 4 | . | . | . | A | 3 |
P.7. | . | G | . | . | . | . | 7 | . | 16 | 4 | . | . | . | A | 3 |
P.9. | . | C | . | . | . | G | 8 | . | 15 | 4 | . | . | T | G | 4 |
P.11. | . | G | . | . | . | . | 7 | . | 16 | 4 | . | . | . | A | 3 |
P.13. | . | G | . | . | . | . | 7 | . | 13 | 4 | . | . | . | G | 3 |
P.15. | . | G | . | . | . | . | 7 | . | 17 | 4 | . | . | . | A | 3 |
P.17. | . | G | . | . | . | . | 7 | . | 17 | 4 | . | . | . | A | 3 |
P.19. | . | C | . | T | C | . | 7 | . | 15 | 4 | . | — | . | G | 3 |
R.1. | . | G | . | . | . | . | 7 | . | 17 | 4 | . | . | . | A | 3 |
R.2. | G | C | . | T | C | . | 6 | — | 15 | 4 | . | . | . | G | 3 |
R.4. | . | C | T | . | C | . | 6 | C | 15 | 4 | . | . | . | G | 3 |
R.5. | . | C | . | . | . | G | 8 | . | 13 | 4 | . | . | T | G | 3 |
R.7. | G | C | . | T | C | . | 6 | — | 17 | 4 | . | . | . | G | 3 |
R.8. | . | C | . | . | C | . | 7 | . | 16 | 4 | . | . | . | G | 3 |
R.10. | . | C | . | . | C | . | 7 | . | 20 | 4 | . | . | . | G | 3 |
R.11. | . | G | . | . | C | . | 6 | C | 18 | 4 | T | . | . | A | 3 |
R.14. | . | G | . | . | . | . | 7 | . | 17 | 4 | . | . | . | A | 3 |
R.15. | G | C | . | T | C | . | 6 | — | 16 | 5 | . | . | . | G | 3 |
Ind. | Intron | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Site | 471 G | 489–500 (T)n | 501 C | 508–514 (A)n | 515 T | 519 G | 534 G | 538 C/T | 543 G | 599–603 (C)n | 600 C | 603 C | 604 — | 629–634 (A)n | 680 T |
P.1. | . | 11 | T | 7 | . | . | . | T | . | 5 | . | . | . | 6 | . |
P.3. | . | 10 | . | 6 | . | . | . | T | . | 5 | . | . | . | 6 | . |
P.5. | . | 11 | . | 6 | . | . | . | T | . | 5 | . | . | . | 6 | . |
P.7. | . | 10 | . | 6 | . | . | . | T | . | 5 | . | . | . | 6 | . |
P.9. | . | 11 | . | 6 | . | . | . | C | . | 5 | . | . | . | 6 | . |
P.11. | . | 11 | . | 6 | . | . | . | T | . | 5 | . | . | . | 6 | . |
P.13. | . | 9 | . | 6 | . | . | . | C | . | 5 | . | . | . | 6 | C |
P.15. | . | 10 | . | 6 | . | . | . | T | . | 5 | . | . | . | 6 | . |
P.17. | . | 10 | . | 6 | . | . | . | T | . | 5 | . | . | . | 6 | . |
P.19. | . | 9 | . | 6 | . | T | . | C | . | 5 | . | . | . | 6 | . |
R.1. | . | 11 | . | 6 | . | . | . | T | A | 5 | . | . | . | 6 | . |
R.2. | . | 10 | . | 6 | . | T | . | C | . | 5 | . | . | . | 6 | . |
R.4. | . | 10 | . | 6 | . | . | A | C | . | 5 | . | . | . | 6 | C |
R.5. | . | 12 | . | 6 | . | . | . | C | . | 4 | . | — | . | 6 | . |
R.7. | A | 9 | . | 6 | . | T | . | C | . | 5 | . | . | . | 6 | . |
R.8. | . | 10 | . | 6 | . | . | . | C | . | 4 | A | . | . | 5 | . |
R.10. | . | 10 | . | 7 | — | . | . | C | . | 5 | . | . | T | 6 | . |
R.11. | . | 11 | . | 6 | . | . | . | T | . | 5 | . | . | . | 6 | . |
R.14. | . | 11 | T | 7 | . | . | . | T | . | 5 | . | . | . | 6 | . |
R.15. | A | 10 | . | 6 | . | T | . | C | . | 5 | . | . | . | 6 | . |
Ind. | Intron | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Site | 688 C | 696–703 (T)n | 704 A | 757 G | 781 G | 816–831 (A)n | 880–893 (A)n | 895 T | 897 A | 908 G | 923 T | 979 T | 1033 A | 1048 T | 1074 T | 1117 G |
P.1. | T | 7 | . | . | . | 14 | 11 | . | . | . | . | . | . | . | . | . |
P.3. | . | 7 | . | . | . | 14 | 12 | . | . | . | . | . | . | . | C | . |
P.5. | . | 7 | . | . | . | 13 | 12 | . | . | . | . | . | . | . | . | . |
P.7. | . | 7 | . | . | . | 14 | 11 | . | . | . | . | . | G | . | . | . |
P.9. | . | 7 | . | . | A | 15 | 12 | . | . | T | G | . | . | . | . | . |
P.11. | . | 7 | . | . | . | 14 | 11 | . | . | . | . | . | . | . | . | . |
P.13. | . | 7 | G | . | . | 15 | 11 | — | G | . | . | . | . | . | . | . |
P.15. | . | 7 | . | . | . | 15 | 12 | . | . | . | . | . | . | . | . | . |
P.17. | . | 7 | . | . | . | 14 | 12 | . | . | . | . | . | . | . | . | . |
P.19. | . | 7 | G | . | . | 13 | 11 | . | . | . | . | . | . | C | . | A |
R.1. | . | 7 | . | . | . | 14 | 12 | . | . | . | . | . | . | . | . | . |
R.2. | . | 7 | G | . | . | 13 | 12 | . | . | . | . | . | . | . | . | . |
R.4. | . | 8 | G | . | . | 14 | 12 | . | G | . | . | . | . | . | . | . |
R.5. | . | 7 | . | . | A | 15 | 14 | . | . | . | . | . | . | . | . | . |
R.7. | . | 7 | G | . | . | 14 | 11 | . | . | . | . | . | . | . | . | . |
R.8. | . | 7 | . | A | . | 16 | 14 | . | . | . | . | . | . | . | . | . |
R.10. | . | 7 | G | . | . | 13 | 10 | . | . | . | . | G | . | . | . | . |
R.11. | . | 7 | . | . | . | 15 | 11 | . | . | . | . | . | . | . | . | . |
R.14. | T | 7 | . | . | . | 13 | 10 | . | . | . | . | . | . | . | . | . |
R.15. | . | 7 | G | . | . | 14 | 10 | . | . | . | . | . | . | . | . | . |
Ind. | Intron | Exon | 3′ Untranslated | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Site | 1126 — | 1128 C | 1156 A | 1176–1187 (A)n | 1186 A | 1190 T | 1207–1225 (T)n | 1250 T | 1251 C | 1274 C | 1370–1377 (C)n | 1416 G | 1569 A | 1599 G/T | 1666 T | 1677–1688 (AT)n | 1699 A | 1700 C |
P.1. | . | . | . | 10 | . | . | 18 | . | . | G | 7 | . | . | G | · | 6 | · | · |
P.3. | T | A | . | 9 | . | . | 17 | . | . | . | 7 | . | . | G | · | 6 | · | · |
P.5. | T | A | G | 10 | . | . | 17 | . | . | . | 7 | . | . | G | · | 6 | · | · |
P.7. | T | A | . | 11 | . | . | 17 | . | . | . | 7 | . | . | G | · | 6 | · | · |
P.9. | . | . | . | 11 | . | . | 17 | . | . | . | 7 | . | G | T | · | 6 | · | · |
P.11. | T | A | . | 10 | . | . | 18 | . | . | . | 7 | . | . | G | C | 6 | · | · |
P.13. | . | . | . | 10 | . | . | 15 | . | . | . | 7 | . | . | T | · | 6 | · | · |
P.15. | T | A | . | 11 | . | . | 18 | . | . | . | 7 | . | . | T | · | 6 | · | · |
P.17. | T | A | . | 9 | . | . | 18 | . | . | . | 7 | . | . | G | · | 6 | · | · |
P.19. | . | . | . | 12 | . | A | 19 | A | T | . | 8 | . | . | T | · | 5 | · | T |
R.1. | T | A | . | 10 | . | . | 17 | . | . | . | 7 | . | . | G | · | 6 | · | · |
R.2. | . | . | . | 11 | C | A | 12 | . | T | . | 7 | A | . | T | · | 6 | · | · |
R.4. | . | . | . | 10 | . | . | 14 | . | . | . | 7 | . | . | T | · | 6 | · | · |
R.5. | . | . | . | 10 | . | . | 17 | . | . | . | 7 | . | . | T | · | 6 | · | · |
R.7. | . | . | . | 10 | C | A | 12 | . | T | . | 7 | . | . | T | · | 6 | C | · |
R.8. | . | . | . | 11 | . | . | 16 | . | . | . | 7 | . | . | T | · | 6 | · | · |
R.10. | . | . | . | 10 | . | . | 17 | . | . | . | 7 | . | . | G | · | 6 | · | · |
R.11. | T | A | . | 10 | . | . | 18 | . | . | . | 7 | . | . | G | · | 6 | · | · |
R.14. | . | . | . | 10 | . | . | 18 | . | . | . | 7 | . | . | G | · | 6 | · | · |
R.15. | . | . | . | 9 | C | A | 12 | . | T | . | 7 | . | . | T | · | 6 | · | · |
- Only differences from the consensus sequence are shown. Dots (·) indicate identity with the consensus sequence and lines (–) indicate deletions. The positions of the polymorphic sites are indicated at top.

Fig. 5. Frequency spectrum of polymorphic nucleotides in Region I and Region II in BpMADS2 gene. The expectation of each frequency class was calculated according to Tajima (1989).
Population | Region I: n | Region I: S | Region I: θ | Region I: π (SD) | Region I: D | Region I: D* | Region II: n | Region II: S | Region II: θ | Region II: π (SD) | Region II: D | Region II: D* |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Punkaharju | 20 | 5 | 0.00189 | 0.00256 (0.00044) | 1.07668 | 0.38667 | 10 | 31 | 0.00659 | 0.00471 (0.00116) | −1.37603 | −1.68340 |
Rovaniemi | 20 | 4 | 0.00151 | 0.00200 (0.00056) | −0.37024 | −0.15415 | 10 | 29 | 0.00617 | 0.00570 (0.00050) | −0.36688 | −0.50483 |
All | 40 | 5 | 0.00158 | 0.00226 (0.00035) | 0.05903 | −0.97624 | 20 | 42 | 0.00716 | 0.00544 (0.00056) | −0.95574 | −1.30325 |
- n is the number of alleles surveyed; S is the number of mutations; θ (per site) is from S from Watterson (1975); π is the estimation of nucleotide diversity (Nei, 1987); D is Tajima's D (P > 0.10); D* is Fu and Li's (1993) D* (P > 0.10).
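The summary statistics in Table 4 can be illustrated with a minimal sketch of Watterson's θ and Tajima's D. The function names and the alignment length `L_ASSUMED = 746` are assumptions made for illustration only: the number of sites analysed in Region I is not stated here, and 746 was chosen simply because it reproduces the tabulated θ for Punkaharju. The sketch is not the DnaSP implementation.

```python
from math import sqrt

def harmonic_sums(n):
    """Return a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1..n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2

def watterson_theta_per_site(S, n, L):
    """Watterson's (1975) estimator per site: S / (a1 * L)."""
    a1, _ = harmonic_sums(n)
    return S / (a1 * L)

def tajimas_d(pi_per_site, S, n, L):
    """Tajima's (1989) D from per-site pi, S segregating sites, n alleles, L sites."""
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    k = pi_per_site * L  # mean number of pairwise differences
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

# Hypothetical alignment length (assumption, not given in the text).
L_ASSUMED = 746

# Region I, Punkaharju: n = 20, S = 5, pi = 0.00256 (Table 4).
theta = watterson_theta_per_site(S=5, n=20, L=L_ASSUMED)
d = tajimas_d(pi_per_site=0.00256, S=5, n=20, L=L_ASSUMED)
```

Under this assumed length, both values come out close to the tabulated θ and D for Punkaharju in Region I. Small deviations are expected because π is rounded in the table.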
In Region I, a total of five nucleotide and three length variation polymorphisms were detected from 40 alleles (Table 2). In both populations all of the segregating sites were located within the introns. In population Punkaharju the nucleotide diversity (π) for the entire region was 0.0026 and in population Rovaniemi it was 0.0020; the two populations were thus similarly variable in this part of the gene.
In Region II, a total of 43 nucleotide, one indel and 17 length variation polymorphisms were detected from 20 alleles. In population Punkaharju all of the segregating sites were located within the noncoding region but in population Rovaniemi there was one nonsynonymous polymorphism in the coding region. This was a transition of G to A at site 1412, which changed the amino acid alanine into threonine. In population Punkaharju the nucleotide diversity (π) for the entire region was 0.0047 and in population Rovaniemi 0.0057, which means that polymorphism in Region II was roughly two to three times as high as in Region I. In Region I, the total number of haplotypes among the 40 sequences was six, whereas the 20 sequences of Region II each represented a different haplotype. When the cloned alleles (20) of Regions I and II were combined, nucleotide diversity for the population Punkaharju was 0.0039 and for the population Rovaniemi was 0.0045 (Table 5), suggesting that the two populations had similar levels of nucleotide polymorphism.
Population | Region I: coding region | Region I: intron 1 | Region I: coding region | Region I: intron 2 | Region I: coding region | Region II: coding region | Region II: intron 3 | Region II: coding region | Region II: 3′ untranslated region | In total: coding region | In total: introns |
---|---|---|---|---|---|---|---|---|---|---|---|
Punkaharju | 0 | 0.0062 | 0 | 0.0026 | 0 | 0 | 0.0050 | 0 | 0.0070 | 0.0000 | 0.0042 |
Rovaniemi | 0 | 0.0049 | 0 | 0.0020 | 0 | 0 | 0.0064 | 0.0014 | 0.0044 | 0.0007 | 0.0051 |
In total | 0 | 0.0055 | 0 | 0.0020 | 0 | 0 | 0.0060 | 0.0007 | 0.0057 | 0.0003 | 0.0047 |
A summary of the statistics of within-population variation at Regions I and II (based on 303 sites in coding regions and 2096 sites in noncoding regions) is presented in Table 4, and a summary of the distribution of the nucleotide diversity is given in Figs 5 and 6 and in Table 5.

Sliding window analysis of nucleotide diversity (DnaSP). The analysis is based on a 100-bp window that moves across the gene in 25-bp intervals. The exon–intron structure of the gene regions used is indicated as a reference. The continuous line indicates the Punkaharju population and the broken line indicates the Rovaniemi population.
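The sliding window procedure described in the caption can be sketched as follows. This is a simplified illustration, not the DnaSP implementation: it assumes pre-aligned sequences of equal length given as plain strings and counts every mismatch, with no special handling of gaps or missing data. The function names are hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi)."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * n_sites)

def sliding_window_pi(seqs, window=100, step=25):
    """Return (window midpoint, pi) for each window along the alignment."""
    length = len(seqs[0])
    result = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        result.append((start + window // 2, nucleotide_diversity(chunk)))
    return result

# Toy alignment with a 2-bp window and 2-bp step, for illustration only.
toy = sliding_window_pi(["AAAA", "AAAT"], window=2, step=2)
```

With the defaults (`window=100`, `step=25`) the function mirrors the window settings stated in the caption.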
Divergence between populations
In Region I, nucleotide diversity, π(total) was 0.0023. The average number of nucleotide substitutions per site between the two populations, Dxy, was 0.0022 and FST was −0.0226 (Hudson et al. 1992). In Region II, π(total) was 0.0054, Dxy was 0.0057 and FST was 0.0930. Region II clearly showed more divergence than Region I. These results show that the genetic differentiation between the populations overall was very low.
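As a rough check on these figures, FST in the sense of Hudson et al. (1992) can be sketched as 1 − Hw/Hb, where Hw is the mean within-population diversity and Hb the between-population diversity (here taken to be Dxy). The function name is hypothetical, and the inputs are the rounded values from Table 4 and the text, so the result only approximates the reported estimates.

```python
def hudson_fst(within_pis, between_dxy):
    """FST = 1 - Hw / Hb, with Hw the mean within-population pi
    and Hb the average between-population divergence (Dxy)."""
    h_w = sum(within_pis) / len(within_pis)
    return 1.0 - h_w / between_dxy

# Region II: pi = 0.00471 (Punkaharju), 0.00570 (Rovaniemi); Dxy = 0.0057.
fst_region2 = hudson_fst([0.00471, 0.00570], 0.0057)

# Region I: pi = 0.00256 and 0.00200; Dxy = 0.0022 (rounded in the text).
fst_region1 = hudson_fst([0.00256, 0.00200], 0.0022)
```

The Region I value is negative because the mean within-population diversity slightly exceeds the between-population divergence, which is how a negative FST such as −0.0226 arises when populations are essentially undifferentiated.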
Dimorphism of haplotypes
Two classes of BpMADS2 alleles were present in both populations (Fig. 7). In Region I (20 individuals, 40 alleles) the class A alleles, which were found in all 20 individuals, comprised 75% of the sequences. These formed a group with 80% bootstrap support in the neighbour-joining tree. The remaining 25% of the alleles (referred to as B alleles) formed another group in the phylogeny. Nucleotide divergence, Dxy, between the two allele classes was 0.0053. In Region II (20 individuals, 20 alleles) the class A alleles, which were found in 16 individuals and comprised 80% of the sequences, formed a monophyletic group with 87% bootstrap support in the neighbour-joining tree (note that we have no conclusive evidence that the two allele groups A and B in Regions I and II are associated). Nucleotide divergence, Dxy, between the two allele classes was 0.0084.

Gene genealogies of BpMADS2 alleles. All nodes with < 50% bootstrap support are collapsed; the other bootstrap values are indicated next to relevant nodes. Class A and B alleles for BpMADS2 Regions I and II are indicated (note that we have no conclusive evidence that the two allele groups A and B in Regions I and II are associated).
An analysis of recombination confirmed the strong dimorphism in Region I
RM is the minimum number of recombination events in the history of the sample, obtained with the four-gamete test (Hudson & Kaplan 1985). RM is known to underestimate the total number of recombination events. Because RM was zero in Region I in both populations, apparently no intragenic recombination has occurred in this region. In Region II, the test found two recombination events [(183, 428) (1128, 1599)] in population Punkaharju and three recombination events [(183, 322) (322, 428) (704, 1599)] in population Rovaniemi (Table 3).
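The logic of the four-gamete test can be sketched as follows. This is a simplified illustration in the spirit of Hudson & Kaplan (1985), not the DnaSP implementation: haplotypes are assumed to be strings of alleles at biallelic segregating sites, and the minimum number of events is obtained by greedily counting intervals that do not overlap (intervals sharing only an endpoint are treated as non-overlapping). All function names are hypothetical.

```python
from itertools import combinations

def four_gametes(haplotypes, i, j):
    """True if all four allele combinations occur at sites i and j."""
    return len({(h[i], h[j]) for h in haplotypes}) >= 4

def incompatible_intervals(haplotypes, positions):
    """Site pairs (as position intervals) that show all four gametes."""
    return [
        (positions[i], positions[j])
        for i, j in combinations(range(len(positions)), 2)
        if four_gametes(haplotypes, i, j)
    ]

def min_recombination_events(intervals):
    """Greedy count of non-overlapping intervals, sorted by right endpoint."""
    count = 0
    last_right = float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left >= last_right:
            count += 1
            last_right = right
    return count

# Toy data: three sites at hypothetical positions 10, 20 and 30.
toy_haps = ["000", "010", "100", "110", "111"]
toy_intervals = incompatible_intervals(toy_haps, [10, 20, 30])
toy_rm = min_recombination_events(toy_intervals)

# Intervals reported above for Region II in population Rovaniemi.
rovaniemi_rm = min_recombination_events([(183, 322), (322, 428), (704, 1599)])
```

Applied to the three intervals listed for Rovaniemi, which touch only at site 322, the count is three, in agreement with the text.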
Linkage disequilibrium
Significant linkage disequilibria were observed within both regions. In Region I in population Punkaharju, there was pairwise association by χ² test between six pairs of sites, and in population Rovaniemi between one pair of sites (516, 636). The test for linkage disequilibrium had more power in Punkaharju, because the variant haplotype had a higher frequency. In Region II population Punkaharju showed significant pairwise association by χ² test between eight pairs of sites, but only one pair of sites remained significant after Bonferroni correction [(428, 538), 0.001 < P < 0.01]. In population Rovaniemi Region II showed no evidence for linkage disequilibrium.
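A pairwise χ² test of this kind can be sketched as below, assuming biallelic sites and the standard relation χ² = n·r² with one degree of freedom. The function names are hypothetical, the haplotypes are toy data, and the sketch is not necessarily the exact procedure used in the analysis. The Bonferroni step simply divides the significance level by the number of pairwise tests.

```python
def ld_chi_square(haplotypes, i, j):
    """Chi-square statistic (n * r^2) for association between sites i and j."""
    n = len(haplotypes)
    ref_i, ref_j = haplotypes[0][i], haplotypes[0][j]  # arbitrary reference alleles
    p_i = sum(h[i] == ref_i for h in haplotypes) / n
    p_j = sum(h[j] == ref_j for h in haplotypes) / n
    p_ij = sum(h[i] == ref_i and h[j] == ref_j for h in haplotypes) / n
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return n * d * d / denom

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests

CHI2_CRITICAL_DF1_P05 = 3.841  # critical value for 1 d.f. at P = 0.05

# Toy haplotypes at two sites: complete association versus independence.
chi_linked = ld_chi_square(["AT", "AT", "GC", "GC"], 0, 1)
chi_independent = ld_chi_square(["AT", "AC", "GT", "GC"], 0, 1)
```

With many site pairs tested, the corrected threshold becomes strict, which is consistent with only one of the eight nominally significant pairs in Region II remaining significant.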
Variation at microsatellite loci
Introns of the BpMADS2 gene appeared to harbour a number of sequence repeat motifs containing four or more mono- or dinucleotide repeats. We recognized a sequence motif as a microsatellite locus if it contained at least six repeat units (mono- and dinucleotide repeats). In the three introns analysed from both populations at least 30 microsatellite regions were identified (Fig. 8). Eight of the 30 microsatellites found in BpMADS2 consisted of pure (A)n or (T)n mononucleotide repeats and many other longer microsatellites had an (A)n or (T)n repeat inside the microsatellite. In addition, there were seven imperfect microsatellites interrupted by one nonrepeat nucleotide. Two microsatellites contained (CT)n dinucleotide repeats and one microsatellite contained an (AT)n dinucleotide repeat. The rest of the microsatellites had more complex structures.
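The criterion of at least six repeat units can be illustrated with a short regular-expression sketch. It only finds perfect mono- and dinucleotide repeats. Imperfect or compound microsatellites, such as those interrupted by a nonrepeat nucleotide, would need additional logic. The function name and the example sequences are hypothetical.

```python
import re

def find_microsatellites(seq, min_units=6):
    """Return (start, repeat unit, number of units) for perfect
    mono- or dinucleotide repeats with at least min_units units."""
    # The lazy {1,2}? tries a one-base unit before a two-base unit.
    pattern = re.compile(r"([ACGT]{1,2}?)\1{%d,}" % (min_units - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        n_units = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, n_units))
    return hits

# Toy sequences: a (CT)6 repeat and an (A)7 repeat.
ct_hits = find_microsatellites("GGCTCTCTCTCTCTGG")
a_hits = find_microsatellites("TTAAAAAAATT")
```

Positions are zero-based offsets into the input string.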

The distribution of microsatellites in the two BpMADS2 regions analysed.
Region I of BpMADS2 displayed substantial length variation with sequences ranging in size from 763 bp to 811 bp. This was the result of two small microsatellite repeats (CT)n in all of the individuals analysed. Region II also displayed some length variation with sequences ranging in size from 1670 bp to 1683 bp. This was mainly because of six small (A)n and (T)n microsatellite repeats. The average repeat sizes of microsatellites were similar between populations (Table 6).
Population | Microsatellite I (n): all | Microsatellite I (n): haplotype A | Microsatellite I (n): haplotype B | Microsatellite II (n): all | Microsatellite II (n): haplotype A | Microsatellite II (n): haplotype B |
---|---|---|---|---|---|---|
Punkaharju | 13.3 (SD ± 1.13) | 13.4 (SD ± 1.22) | 13.0 (SD ± 0.89) | 15.3 (SD ± 2.63) | 15.4 (SD ± 2.82) | 14.8 (SD ± 2.32) |
Rovaniemi | 14.3 (SD ± 3.10) | 13.4 (SD ± 1.02) | 18.0 (SD ± 5.72) | 15.3 (SD ± 3.48) | 14.1 (SD ± 1.73) | 22.0 (SD ± 2.65) |
Total average | | 13.4 (SD ± 1.10) | 15.0 (SD ± 4.24) | | 14.7 (SD ± 2.37) | 18.4 (SD ± 4.24) |
The longer microsatellites were variable within populations, as Tables 2 and 3 show. However, there was a difference in average microsatellite size between the two haplotypes of Region I. Especially in population Rovaniemi, haplotype B was associated with the longer microsatellites in this region. In population Punkaharju, the haplotype did not extend as far in the 5′ direction (Table 2).
Discussion
Our study is based on 65–70-year-old birches that because of natural regeneration are apparently unaffected by any human selection. Recently, in Finland, more and more birches have been planted with material coming from selected trees. Although the proportion of planted birches to those originating from seeds from native trees is still small (about 3% between 1980 and 1998, Anonymous 1999), it is increasing all the time. Therefore studies performed on the current genetic structure of populations are likely to be very valuable in the future for providing baseline reference values of diversity.
BpMADS2 is the PI-homologue of birch
The sequence analysis of BpMADS2 shows that it belongs to the PI clade of MADS box genes. The high expression of BpMADS2 in male inflorescences and low or absent expression in the female inflorescences further support the notion that BpMADS2 is a B-function gene resembling PI, and that its regulation is more similar to that of PI than that of GLO, which is not expressed in whorl 4.
The presence of both of the two highly conserved sequence motifs common to members of the PI clade (MPFxFRVQPxQPNLQE in the C-terminal end and KHExL in the K domain) (Kramer et al. 1998) further supports the assumption that BpMADS2 is the PI-homologue of birch.
Low level of within-population DNA variation in the coding region of BpMADS2
This study included a small part, 303 bp, of the coding region of BpMADS2, which proved to have only one variable site. This resulted in an estimate of nonsynonymous polymorphism (πa) of 0.000052. There was no synonymous variation in the coding region. The overall silent variation (πs) including introns and the 3′ untranslated area was 0.0043. The level of nonsynonymous polymorphism in corresponding gene areas of Arabidopsis thaliana is much higher with πa = 0.00055 [the overall silent variation (πs) including corresponding introns, but not the 3′ untranslated area, was 0.0045].
The level of diversity of BpMADS2 is also lower than the overall estimates of species-wide nucleotide diversity in Arabidopsis for two other MADS box genes (AP3, π = 0.0065 and CAL, π = 0.0070; Purugganan & Suddith 1998, 1999) and for two other structural loci (ChiA, π = 0.0104 and Adh, π = 0.0080; Innan et al. 1996; Kawabe et al. 1997). The estimate of species-wide nucleotide diversity, π, for BoCAL was 0.0030 in Brassica oleracea (37 world-wide accessions, four distinct subspecies; Purugganan et al. 2000), which is about half the value observed for the Arabidopsis CAL gene and also lower than values for BpMADS2. When comparing these values, one should take into account the population and sampling structure in the different species. The Arabidopsis studies used 17–21 distinct ecotypes representing world-wide diversity. The differences can also partly be explained by the history of A. thaliana: an excess of low-frequency nucleotide polymorphisms suggests that Arabidopsis has undergone recent, rapid population expansion and now exists in small, inbred subpopulations (Purugganan & Suddith 1999). In addition, the effective population size of Betula pendula is likely to be much greater than that of Arabidopsis, mainly because of its large distribution area and the efficient spreading of pollen by wind. Very low values of nucleotide diversity have also been found in Antirrhinum majus, where of the five putative FIL1 loci sequenced, no variation was found at all, within or between populations, for three loci (the coding, the intron, or the 3′ flanking regions; Vieira & Charlesworth 2001).
Another long-lived, predominantly out-crossing perennial, Scots pine (Pinus sylvestris L.), also shows a low level of within-population DNA variation in the PAL gene (single exon without introns) with πs = 0.0049 and πa = 0.0003 in the four populations studied (Dvornyk et al. 2002). This means that the level of synonymous site overall nucleotide diversity in Scots pine is about the same as in silver birch, but the nonsynonymous polymorphism is much higher.
Variation in different regions of BpMADS2
All of the three introns studied showed much more variation than the coding region of BpMADS2 (Table 5). The clearly higher variation in introns 1 and 3 in comparison to that in intron 2 might indicate that intron 2 is subject to different functional constraints than introns 1 and 3 and also the 3′ untranslated region. In the coding region there is only one polymorphism and therefore conclusions on relative levels of variation in the different regions are not possible.
In Arabidopsis, too, both introns showed much more variation than the studied coding region. As in B. pendula, intron 1 (which corresponds in location to intron 2 in B. pendula) showed much less variation than intron 3 (Table 7). As aligning the introns between species is not possible, we were not able to formally test for different rates of divergence and polymorphism.
Species | Coding region | Intron 1 | Intron 2 | Intron 3 |
---|---|---|---|---|
Betula pendula (both populations) | 0.00033 (0.00029) | 0.00548 (0.00078) | 0.00230 (0.00043) | 0.00599 (0.00061) |
Arabidopsis thaliana (16 ecotypes) | 0.00213 (0.00070) | — | 0.00548* (0.00050) | 0.01185 (0.00278) |
- Location of intron 1 in PI corresponds to intron 2 in BpMADS2. Standard deviations are given in parentheses.
- * In PI sequences the intron corresponding to intron 1 in BpMADS2 is lacking, and therefore intron 2 of BpMADS2 corresponds to intron 1 in PI and intron 3 of BpMADS2 corresponds to intron 2 in PI.
The low level of nucleotide diversity found in birch is consistent with other findings on woody perennials. Interestingly, the rbcL gene has been shown to evolve faster in annual angiosperms than in perennial angiosperms (Bousquet et al. 1992; Savard et al. 1993). The lower variation in BpMADS2 is consistent with this idea. Also the CoxI (cytochrome oxidase subunit 1) genes of woody perennials (either monocots or dicots) showed a slower substitution rate when compared with herbaceous annual taxa (Laroche et al. 1997). In the genus Sidalcea the annual species had significantly higher molecular evolutionary rates than the perennials in ITS and ETS (external transcribed spacers) regions (Andreasen & Baldwin 2001). Several hypotheses have been suggested to explain this rate heterogeneity among plant taxa which clearly follows a trend related to life history. In this case, generation time could be the factor because the woody perennials (longer generation times) analysed showed lower numbers of nucleotide substitutions per site than herbaceous annual taxa (shorter generation times). The trend was more obvious at synonymous than at nonsynonymous sites, which could be expected from generation time effect because selection is likely to be less stringent for synonymous sites. The mechanisms for the lower rate of evolution in perennials are not well understood. We do not know, for example, how many cell divisions there are between generations of zygotes in annual or perennial plants. Nor do we know whether there are any differences in the frequencies of errors made by the DNA polymerase in different types of plants.
Dimorphic variation in BpMADS2
Both Region I and Region II of BpMADS2 displayed allelic dimorphism. In each region, the main group represented about 75% of the whole sample. Even though the dimorphism at some parts of the gene was maximal, this was not strong enough to result in a significant Tajima's D or Fu and Li test. In Region I the dimorphism was based on just five segregating sites. The microsatellites that are also in disequilibrium could not be incorporated in these tests. In Region II the dimorphism was not as evident as in Region I (see Fig. 7). Neutral coalescent processes can also generate patterns resembling dimorphism because of the expected long branches before the coalescence. Nevertheless, the pattern suggesting dimorphism was here found in both parts of the gene, and a similar finding has been made at another locus, BpMADS5 (Järvinen et al. unpublished results). However, the existence of dimorphism in the B. pendula genome requires further confirmation from more genomic regions. The results here can be compared to the findings of Dvornyk et al. (2002). The sequences sampled from across Europe from the pal1 locus showed no evidence of dimorphism. The possible occurrence of allelic dimorphism in birch might result from contact between two previously isolated populations. The initial birch forests spread rapidly into Fennoscandia after the last glacial epoch about 10 000 years ago as a result of postglacial migration of individuals from refugia located southwest, south and southeast from Finland (Huntley & Birks 1983; Hyvärinen 1987). In general, the northern part of Europe, including Finland, has been colonized primarily from Iberian and Balkan refugia (Taberlet et al. 1998).
Variation in chloroplast DNA shows that the silver birches in Europe can be classified into two main haplogroups, the distribution of which provides evidence for multiple origins after the last glaciation (Palmé et al. submitted for publication). Although these two haplogroups can be found across most of Europe they show a clear geographical distribution: one haplogroup is dominant in the northwest and the other is dominant in the southeast and east. In Finland the northwestern haplotype represents 10–22% of the whole sample, the southeastern/eastern being the dominant haplotype. In the BpMADS2 locus the rarer allele type represents 10–30% of the whole sample. According to these results it is possible that the two main allele types of the nuclear gene BpMADS2 reflect the same dual origin of the Finnish birches as the chloroplast haplotypes.
In A. thaliana, dimorphic DNA variation was observed in the ADH region and was considered to be the result of balancing selection (Hanfstingl et al. 1994) and/or fusion of two diverged subpopulations (Innan et al. 1996). Similarly, dimorphism was observed in the ChiA region (Kawabe et al. 1997), in the CAULIFLOWER region (Purugganan & Suddith 1998), in the AP3 and PI regions (Purugganan & Suddith 1999) as well as in ChiB region (Kawabe & Miyashita 1999), suggesting that the presence of dimorphic variations could be a characteristic of the nuclear genome of A. thaliana. However, not all A. thaliana genes show such dimorphism (Aguadé 2001).
Variation between the two populations
The differentiation between the two populations from different parts of Finland was very low. The current population structure of B. pendula in Finland appears to have been formed mainly by postglacial migration (Huntley & Birks 1983; Hyvärinen 1987; Willis et al. 2000). According to fossil records, B. pendula populations were not limited to only a few southern refugia but occurred along the edge of the ice sheet and rapidly started to spread into the rest of Europe once the glaciers started to retreat. Because of these nonisolated refugial birch populations one could expect lower differentiation than in species whose populations were isolated from each other during the last glaciation. Low levels of differentiation between these two B. pendula populations may also indicate a large effective population size in B. pendula.
The differentiation between four Scots pine populations separated by thousands of kilometres was also very low but it did not show any allelic dimorphism (Dvornyk et al. 2002). The lack of dimorphism in Scots pine might be due to its history during the last glaciation. The main refugia of Pinus sylvestris were probably in southern Europe (Spain, northern Italy and Romania) from where it started to spread into much of central and northern continental Europe (Hyvärinen 1987).
Microsatellites
Microsatellites are present in high numbers in mammals, but appear to be less abundant in plant genomes. The estimate of microsatellite density for plants ranges between 0.85% (Arabidopsis) and 0.37% (maize), whereas for human chromosome 22 the estimate of microsatellite density is 2.12% (Morgante et al. 2002). In B. pendula, BpMADS2 contains a large number of sequence repeat regions, many of which can be considered as microsatellite sequences (Fig. 8). Microsatellites in intron 1 (Region I) and intron 3 (Region II) displayed a greater degree of length variation than those in intron 2 (Region I), which in fact did not display any length variation at all. This result was consistent with results of nucleotide variation in different introns: intron 2 showed much less variation than introns 1 and 3 (Table 5). The lack of length variation in intron 2 might partly be a result of structural interruptions in microsatellites: all but two of the microsatellites in intron 2 are interrupted by one or more nonrepeat nucleotides or are otherwise more complicated in structure, rendering them less prone to slippage mutations. The main allele types showed some differences at the microsatellite level too, especially in population Rovaniemi where haplotype B was associated with the longer microsatellites in Region I.
The large number of repeat regions in the three introns studied suggests that BpMADS2 may be some kind of hotspot for microsatellite formation. One can find the same trend when comparing the corresponding two introns of PI (ecotype Ler, introns one and five, Accession number AF115827, isolated by Purugganan & Suddith 1999), although the microsatellites in the two introns of PI seem to be somewhat shorter than those in BpMADS2. Introns of the ASAP1 (A. sandwicense APETALA1) and ASAP3/TM6 (A. sandwicense APETALA3/TM6) genes also appear to harbour a number of sequence repeat motifs containing four or more mono-, di-, or trinucleotide repeats (Barrier et al. 2000).
Acknowledgements
We thank Dr Matti Rousi and Dr Risto Jalkanen for the population samples from Punkaharju and Rovaniemi. We especially thank Riitta Pietarinen for her skilful work in the laboratory*. The study has been carried out with financial support from the Tekes (as part of the Finnish Biodiversity Programme, FIBRE), Faculty of Science of University of Joensuu, the Graduate School of Biology and Biotechnology of Forest Trees, and the Graduate School of Forest Sciences. *We thank Dr Julio Rozas for discussing the data with us.
References
Pia L.H. Järvinen and Juha Lemmetyinen are both PhD students. The present manuscript will form part of the PhD thesis of the former. Outi Savolainen is professor of genetics working in plant population genetics and evolution. Tuomas Sopanen is professor of botany working with genes regulating flower development in birch.