Linkage Disequilibrium and Haplotype Architecture for two ABC Transporter Genes (ABCC1 and ABCG2) in Chinese Population: Implications for Pharmacogenomic Association Studies
Summary
Information about linkage disequilibrium (LD) patterns and haplotype structures for candidate genes is instructive for the design and analysis of genetic association studies for complex diseases and drug response. ABCC1 and ABCG2 are genes coding for two multidrug resistance (MDR) associated transporters; they are also related to some pathophysiological traits. To determine the LD profiles of these MDR genes in the Chinese population, we systematically screened 27 unrelated individuals for single nucleotide polymorphisms (SNPs) in the coding and regulatory regions of these genes, and thereby characterized their haplotype structures. Despite marked variations in haplotype diversity, LD pattern and intragenic recombination intensity between the two genes, both loci could be partitioned into several LD blocks, in which a modest number of haplotypes accounted for a high fraction of the sampled chromosomes. We concluded that each locus has its own genomic LD profile, but that they still share a common segmental LD architecture with low haplotype diversity. Our data will benefit genetic association studies of complex traits and drug response possibly related to these genes.
Introduction
Multidrug resistance (MDR, cross-resistance to structurally and mechanistically unrelated drugs) is a complex phenomenon in cancer chemotherapeutics caused by several different cellular mechanisms. Three members of the ATP-binding cassette (ABC) superfamily have been well documented in MDR related systems: ABCB1 (also known as P-gp and MDR1), ABCC1 (or multidrug resistance-associated protein 1, MRP1), and ABCG2 (also known as mitoxantrone-resistance protein, MXR; or breast cancer resistance protein, BCRP) (Gottesman et al. 2002). These MDR associated proteins are often expressed in both cancer cells and many normal tissues (Borst et al. 2000; Maliepaard et al. 2001). ABCC1 transports many exogenous chemotherapy drugs (Borst et al. 2000) and some endogenous factors such as glutathione conjugates and organic anions (Jedlitschky et al. 1996) and folate (Assaraf et al. 2003). ABCG2 is a methotrexate and methotrexate polyglutamate transporter (Volk et al. 2003). It also restricts exposure to the dietary carcinogen 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (PhIP), suggesting that interindividual differences in its activity may affect exposure to PhIP and related food carcinogens, with possible implications for cancer susceptibility (van Herwaarden et al. 2003). Though the physiological substrates of ABCG2 are poorly known, its distinct role as a negative regulator of hematopoietic repopulating activity and its specific expression in hematopoietic side-population stem cells have recently been clarified (Zhou et al. 2001), suggesting that ABCG2 may play an important role in normal and malignant hematopoiesis, such as relapsed acute myeloid leukemia (AML) (van Den Heuvel-Eibrink et al. 2002).
Recently, genetic polymorphisms in ABCB1 and their clinical relevance have been intensively studied: one synonymous SNP 3435 C>T is correlated with the expression level and activity of the transporter (Hoffmeyer et al. 2000), chemotherapeutic outcomes (Fellay et al. 2002; Illmer et al. 2002) and disease susceptibility (Siegsmund et al. 2002). It has also been reported that there is strong LD between multiple SNPs at the ABCB1 locus, and common alleles or haplotypes are associated with altered P-gp function (Kim et al. 2001). Though LD and haplotype profiles of ABCB1 have been documented and proved functionally relevant, knowledge of the LD extent and haplotype structure is very limited for pharmacogenomics studies of other multidrug resistance related ABC genes.
In this study, we conducted a systematic screening in 27 unrelated individuals for sequence variations in the coding and regulatory regions of ABCC1 and ABCG2, and further carried out genomic level analysis of their LD pattern and haplotype structure. The knowledge of SNP distribution, LD profile and haplotype structure within these genes will be useful in assessing their correlation or association with MDR and other complex clinical traits.
Materials and Methods
Sample, Sequence Accession and Primer Design
Genomic DNA from 27 unrelated healthy subjects was chosen from the sample collection constructed for the Chinese Human Genome Diversity Project through a coordinated effort of several institutes (Chu et al. 1998). This study was approved by the Ethical Committee of the Chinese National Human Genome Center. The sample included 54 chromosomes, providing a 95% confidence level to detect alleles with frequency >5%. Accession numbers used in this study are U91323.1, AC025778.4, AC003026.1 (ABCC1) and AC084732.1 (ABCG2). With a candidate-gene strategy, all exons (except ABCC1 exon 1, because of its high GC content), 5' flanking regions (with lengths of about 1.6 kb and 0.8 kb in ABCC1 and ABCG2, respectively), untranslated regions (UTR) (about 0.6 kb and 1.1 kb), and about 9.7 kb and 5.1 kb of intronic sequence of ABCC1 and ABCG2, respectively, were covered for SNP screening. Primers were designed online using the Primer 3 program (http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi). Details regarding the position in genomic sequence, length, product size, and annealing temperature for each primer pair are given in the Appendix table. The same primers were used for both polymerase chain reactions (PCR) and sequencing reactions.
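The detection power quoted for 54 chromosomes can be checked with a short calculation. The sketch below assumes chromosomes are sampled independently, so the chance of seeing an allele of population frequency f at least once is 1 − (1 − f)^n; the function name is illustrative and not taken from the study.

```python
def detection_probability(allele_freq: float, n_chromosomes: int) -> float:
    """Probability that an allele at the given population frequency is
    observed at least once among n independently sampled chromosomes."""
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


# 27 diploid subjects give 54 chromosomes; 5% is the frequency threshold in the text.
power = detection_probability(0.05, 54)
print(f"P(detect allele at 5% frequency) = {power:.3f}")
```

Under this simple model the value comes out at roughly 0.94, in line with the confidence level stated above.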
PCR-resequencing Based SNP Discovery
Combined hot start and touchdown PCR was carried out in a 25 μl volume system containing 10 ng of genomic DNA, 0.3 μM of each primer, 200 μM of each dNTP, 1.5 mM MgCl2, 10×PCR buffer, 5×Q solution, and 1 Unit HotStarTaq™ DNA polymerase (QIAGEN). The thermal cycling conditions in GeneAmp PCR System 9700 Thermocyclers (Perkin-Elmer) were as follows: an initial denaturation at 95°C for 15 min, followed by 13 cycles of 94°C for 30 sec, annealing with the temperature stepped down by 0.5°C each cycle, and extension at 72°C for 50 sec; in the following stage of cycling, the denaturation and extension phases were the same as above, with a final extension at 72°C for 5 min. PCR products were purified with MultiScreen™-PCR plates (Millipore) according to the manufacturer's protocol. Bidirectional sequencing was carried out using ABI PRISM Big Dye Terminator chemistry and run on an ABI 3700 DNA Analyzer. Polymorphic sites were identified using the PolyPhred program. SNP genotypes were verified by manual evaluation of the individual sequence traces.
Assessing Nucleotide Diversity
Chi-square test for the significance of deviation of genotype distributions from Hardy-Weinberg equilibrium (HWE) was carried out for each site. Two commonly used estimates of nucleotide variation were calculated, which correct for both sample size and the length of the region surveyed: Pi (π), the direct estimate of per site heterozygosity, that is the average number of nucleotide differences per site between two sequences chosen from a randomly mating population; and Theta (θ), the estimate of population mutation parameter based on the number of polymorphic sites in the sample (Li, 1997).
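The two diversity estimates can be sketched in a few lines. The code assumes that θ is Watterson's estimator, which matches the description "based on the number of polymorphic sites", and that π is the mean number of pairwise differences divided by the surveyed length. The function names and the toy haplotypes are illustrative; the SNP counts and screened lengths are those reported for the two loci.

```python
from itertools import combinations


def watterson_theta(seg_sites: int, n_chromosomes: int, length_bp: int) -> float:
    """Watterson's estimator per site: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))
    return seg_sites / (a_n * length_bp)


def nucleotide_diversity(haplotypes: list[str], length_bp: int) -> float:
    """Average number of differences per site between all pairs of sequences.
    Each haplotype string lists the alleles at the polymorphic sites only."""
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(x != y for x, y in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / (len(pairs) * length_bp)


# Reported values: 32 SNPs in 17 006 bp (ABCC1), 13 SNPs in 9 094 bp (ABCG2), 54 chromosomes.
print(f"ABCC1 theta per site: {watterson_theta(32, 54, 17006):.2e}")
print(f"ABCG2 theta per site: {watterson_theta(13, 54, 9094):.2e}")

# Hypothetical toy data: four haplotypes over two SNPs in a 1 000 bp region.
print(f"toy pi per site: {nucleotide_diversity(['AC', 'AT', 'GC', 'GT'], 1000):.2e}")
```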
Inference of Haplotype, Recombination, and Linkage Disequilibrium
Haplotypes were reconstructed from phase-unknown genotype data using PHASE version 0.21 (Stephens et al. 2001). PHASE is a Bayesian method that infers haplotypes and their frequencies using a Markov chain Monte Carlo (MCMC) algorithm. DnaSP version 3.51 (Rozas et al. 1999) was used to estimate nucleotide and haplotype diversity, recombination, and linkage disequilibrium. Lewontin's normalized coefficient, D′, was used as the LD measure (Devlin et al. 1995). The four-gamete test and the Hudson and Kaplan recombination statistic R were used to test for recombination (Hudson, 1987). Positions of recombination events and the minimum number of recombination events were determined using the recombination module of DnaSP. Haplotype blocks were partitioned according to reported principles and methods (Daly et al. 2001; Johnson et al. 2001). Here, common haplotypes were defined as those observed at least four times (that is, at a frequency >7%) in the sample, and the proportion of chromosomes accounted for by common haplotypes was used as a general surrogate for haplotype diversity in the local region.
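As an illustration of the LD measure, the sketch below computes Lewontin's D′ for a pair of biallelic sites from phased haplotypes. It is a minimal stand-alone version and not the DnaSP implementation; the function name and the toy haplotypes are hypothetical.

```python
def lewontin_d_prime(haplotypes: list[str], i: int, j: int) -> float:
    """Lewontin's normalized LD coefficient D' between biallelic sites i and j."""
    n = len(haplotypes)
    a, b = haplotypes[0][i], haplotypes[0][j]  # arbitrary reference alleles
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # A monomorphic site gives d_max == 0; report 0 rather than dividing by zero.
    return d / d_max if d_max > 0 else 0.0


# Only three of the four possible gametes are present: complete LD, |D'| = 1.
three_gametes = ["AC"] * 3 + ["GT"] * 2 + ["AT"]
print(abs(lewontin_d_prime(three_gametes, 0, 1)))

# All four gametes at equal frequency: no LD, D' = 0.
four_gametes = ["AC", "AT", "GC", "GT"]
print(lewontin_d_prime(four_gametes, 0, 1))
```

The sign of D′ depends on which alleles are taken as reference, which is why |D′| is the quantity usually reported.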
Results
SNP Detection and Distribution
In each of the 27 subjects, about 17 kb and 9 kb of non-contiguous sequence within the respective loci were sampled for SNP screening. A total of 45 biallelic polymorphisms (32 in ABCC1 and 13 in ABCG2) were identified (Table 1), of which 22 in ABCC1 and 9 in ABCG2 had been deposited in dbSNP prior to our report (Revised August 11, 2003 2:18 PM). Details of the SNPs and their genotypes for each sample are also provided in the Appendix table. Twelve SNPs were located in coding exons, 24 in introns, 4 in 5' flanking regions, 2 in 5' UTRs, 2 in 3' UTRs and 1 in a 3' flanking region (Table 1). No other variants (deletions, insertions, microsatellites, or triallelic polymorphisms) were surveyed. Of the 12 SNPs within coding regions, 5 result in amino acid alterations, all of which are predicted to be benign or conservative in functional effect by PolyPhen (http://www.bork.embl-heidelberg.de/PolyPhen/). In the Chi-square test for deviation from HWE, only SNP19 of ABCC1 (I15/+669, listed in Table 1 as lying in intron 15) did not fit the expected genotype distribution, with no heterozygotes and 2 mutant homozygotes found in the sample. We assessed the validity of the SNP data by repeating 10% of the genotype assays on the same PCR and bidirectional resequencing platform. The error rate was relatively low (3.6%).
Locus | SNP No. | SNP position | Nucleotide sequence (major/minor) | Minor allele frequency (%) | dbSNP ID | Effect |
---|---|---|---|---|---|---|
ABCC1 | 1 | 5'FR/−1862 | gacccG/Aggcca | 44.4 | | |
ABCC1 | 2 | 5'FR/−1830 | atcctA/Gtctac | 1.9 | | |
ABCC1 | 3 | 5'FR/−1680 | gaggaG/Aaaaag | 1.9 | | |
ABCC1 | 4 | 5'FR/−471 | cggatA/Gctgtc | 7.4 | | |
ABCC1 | 5 | E2/218 | caaaaC/Tcaaaa | 3.7 | | Thr73Ile |
ABCC1 | 6 | I2/−26 | gttgtG/Aggggg | 1.9 | rs8187842 | |
ABCC1 | 7 | I3/−66 | ctgggT/Cgacaa | 37.0 | rs4148337 | |
ABCC1 | 8 | I7/+54 | ccactC/Actgtg | 9.3 | rs903880 | |
ABCC1 | 9 | I7/+64 | ggcctC/Gaatcc | 48.1 | rs246232 | |
ABCC1 | 10 | E8/816 | cagccG/Agtgaa | 1.9 | | wobble |
ABCC1 | 11 | E8/825 | aaggtT/Cgtgta | 38.9 | rs246221 | wobble |
ABCC1 | 12 | E9/1062 | gtgaaT/Cgacac | 35.2 | rs35587 | wobble |
ABCC1 | 13 | I9/+8 | aggggA/Gcgctg | 37.0 | rs35588 | |
ABCC1 | 14 | I12/−37 | cactcA/Ggggca | 20.4 | rs35604 | |
ABCC1 | 15 | E13/1684 | tggccT/Ctgtgc | 20.4 | rs35605 | wobble |
ABCC1 | 16 | I13/+105 | ccggtC/Tgggct | 20.4 | rs35606 | |
ABCC1 | 17 | I14/+105 | ccagcC/Tgcttg | 1.9 | | |
ABCC1 | 18 | I15/+627 | gctgtA/Gtttta | 25.8 | rs35628 | |
ABCC1 | 19 | I15/+669 | aatctG/Ttagaa | 7.4* | rs4148353 | |
ABCC1 | 20 | I15/−967 | ctttcT/Ggctgt | 37.0 | rs152029 | |
ABCC1 | 21 | E16/2007 | atcccC/Tgaagg | 3.7 | rs2301666 | wobble |
ABCC1 | 22 | E17/2168 | tctccG/Aagaaa | 5.6 | rs4148356 | Arg723Gln |
ABCC1 | 23 | I18/−30 | gcactG/Cacgtg | 16.7 | rs2074087 | |
ABCC1 | 24 | I22/+62 | aattaT/Ctccct | 27.8 | rs3887893 | |
ABCC1 | 25 | I22/−43 | gtcagC/Ttccct | 3.7 | | |
ABCC1 | 26 | E27/3915 | gaggaC/Tctgga | 1.9 | | wobble |
ABCC1 | 27 | E28/4002 | aagtcG/Atccct | 11.1 | rs2239330 | wobble |
ABCC1 | 28 | I28/−35 | tcagcA/Gtgaca | 27.8 | rs212087 | |
ABCC1 | 29 | I30/+30 | gcacaG/Atggcc | 29.6 | rs212088 | |
ABCC1 | 30 | 3'UTR/+801 | accccC/Gactcc | 33.3 | rs129081 | noncoding |
ABCC1 | 31 | 3'UTR/+866 | tactgT/Atccca | 14.8 | rs212090 | noncoding |
ABCC1 | 32 | 3'FR/+1513 | gttctT/Ctaagg | 27.8 | | |
ABCG2 | 1 | 5'UTR/−407 | cgcagC/Tgcctc | 1.9 | | |
ABCG2 | 2 | 5'UTR/−376 | ggggaG/Acgctc | 1.9 | | |
ABCG2 | 3 | E2/34 | tcccaG/Atgtca | 20.4 | rs2231137 | Val12Met |
ABCG2 | 4 | I2/+36 | ttttaA/Gtttac | 25.9 | rs4148152 | |
ABCG2 | 5 | I3/+10 | gtataA/Ggagag | 20.4 | rs2231138 | |
ABCG2 | 6 | E5/421 | acttaC/Agttct | 22.2 | rs2231142 | Gln141Lys |
ABCG2 | 7 | E7/805 | acgggC/Tctgct | 3.7 | | Pro269Ser |
ABCG2 | 8 | I9/−126 | agccaT/Gtgagt | 7.4 | | |
ABCG2 | 9 | I11/+20 | gttctA/Gggaac | 31.5 | rs2231153 | |
ABCG2 | 10 | I12/+49 | cctatG/Tggtga | 16.7 | rs2231156 | |
ABCG2 | 11 | I13/+40 | tgtttT/Ctttcc | 24.1 | rs2231157 | |
ABCG2 | 12 | I13/−21 | tgactC/Tttagt | 29.6 | rs2231162 | |
ABCG2 | 13 | I14/−46 | ttcttG/Aaaatt | 48.1 | rs2725267 | |
- SNPs in specific regions, i.e. the 5' flanking region (5'FR), 5' untranslated region (5'UTR), intron (I), exon (E), 3'UTR, and 3'FR, are presented as region/+(−)n: for the 5'FR and 5'UTR, n nucleotides (nt) upstream (−) of the translation initiation site; for the 3'UTR and 3'FR, n nt downstream (+) of the third base of the stop codon; for coding regions, n is the cDNA position with the first base of the start codon set to 1; and for introns, n nt upstream (−) of the 3' end or downstream (+) of the 5' end of the intron. Segregating sites are shown within their local sequence as major/minor allele. *In Chi-square tests for deviation from Hardy-Weinberg equilibrium (HWE), only one (I15/+669 in ABCC1) of the 45 segregating sites showed a statistically significant deviation.
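The Hardy-Weinberg check behind the asterisked site can be sketched as a one-degree-of-freedom Chi-square goodness-of-fit test. The genotype counts used below (25 major homozygotes, 0 heterozygotes, 2 minor homozygotes) are an assumption: the text reports no heterozygotes and 2 mutant homozygotes, and the count of 25 presumes that all 27 subjects were genotyped at this site. The function name is illustrative.

```python
def hwe_chi_square(n_major_hom: int, n_het: int, n_minor_hom: int) -> float:
    """Chi-square statistic for deviation of genotype counts from
    Hardy-Weinberg expectations at a biallelic site."""
    n = n_major_hom + n_het + n_minor_hom
    q = (2 * n_minor_hom + n_het) / (2 * n)  # minor allele frequency
    p = 1.0 - q
    observed = (n_major_hom, n_het, n_minor_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Assumed counts for ABCC1 SNP19: minor allele frequency 4/54 = 7.4%.
chi2 = hwe_chi_square(25, 0, 2)
print(f"chi-square = {chi2:.2f}")
# Compare with 3.84, the 5% critical value for one degree of freedom.
```

With no heterozygotes at all the statistic equals the number of individuals, which is far above the critical value and consistent with this site being excluded from the LD analysis.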
Nucleotide and Haplotype Diversity
We identified 45 SNPs in nearly 26 kb of genomic sequence. The indices of nucleotide and haplotype variation within the two genes are summarized in Table 2. ABCC1 and ABCG2 showed similar patterns of nucleotide diversity: θ, the expected proportion of polymorphic sites, was 4.1 × 10⁻⁴ and 3.1 × 10⁻⁴, respectively; that is, in the study population one SNP could be expected per 2 439 bp and 3 226 bp in the targeted sequences of ABCC1 and ABCG2, respectively. The mutation parameters (θ) for coding and non-coding regions were almost identical to the overall estimates, but nucleotide diversities (π) for coding regions were lower than those for non-coding regions and the overall estimates. When compared with the gene-based averages of sequence variation indices from a systematic SNP screening effort (313 genes) with a similar strategy (Stephens et al. 2001), the two loci showed lower nucleotide variation (when θ but not π was measured). The discrepancy could mainly be accounted for by pronounced intergenic differences in the sequence mutation rate across the genome.
Locus | Screened length (bp) | No. SNPs | Region | π ± SD (×10⁻⁴) | θ ± SD (×10⁻⁴) | No. haplotypes | Haplotype diversity (H)* |
---|---|---|---|---|---|---|---|
ABCC1 | 17 006 | 32 | Overall | 5.0 ± 0.2 | 4.1 ± 1.3 | 46 | 0.99 |
ABCC1 | | | Coding | 3.9 ± 0.3 | 4.3 ± 1.8 | | |
ABCC1 | | | Non-coding | 5.5 ± 0.3 | 4.1 ± 1.4 | | |
ABCG2 | 9 094 | 13 | Overall | 4.0 ± 0.2 | 3.1 ± 1.2 | 16 | 0.89 |
ABCG2 | | | Coding | 3.5 ± 0.4 | 3.4 ± 1.9 | | |
ABCG2 | | | Non-coding | 4.1 ± 0.2 | 3.1 ± 1.3 | | |
- *H, the measure of haplotype diversity, is the expected heterozygosity based on haplotype frequencies: $H = 1 - \sum_i q_i^2$, where $q_i$ is the frequency of the $i$-th haplotype.
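The footnote formula can be written directly as a small function. The optional `unbiased` flag, which applies the usual n/(n − 1) sample-size correction, is an assumption added for illustration; the footnote itself gives only the uncorrected form, and the haplotype list is hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes: list[str], unbiased: bool = False) -> float:
    """Expected heterozygosity H = 1 - sum(q_i^2) over haplotype frequencies q_i.
    With unbiased=True the n / (n - 1) correction is applied (an assumption,
    not stated in the table footnote)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    h = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return h * n / (n - 1) if unbiased else h


# Hypothetical sample: four distinct haplotypes at equal frequency.
sample = ["AC", "AT", "GC", "GT"]
print(haplotype_diversity(sample))
```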
When all the SNPs were used to reconstruct haplotypes, an excess of haplotypes was predicted: 46 based on 32 SNPs in ABCC1, and 16 based on 13 SNPs in ABCG2. In order to examine recombination and LD with more validity (as the LD estimate D′ can be biased with small sample sizes), haplotype reconstruction was also carried out using only common SNPs (those with minor allele frequency >5% and in HWE). In ABCC1, 39 haplotypes were predicted using 22 common SNPs; in ABCG2, 13 haplotypes were predicted using 10 common SNPs. An assessment of data quality for the common-SNP-based haplotypes was provided by PHASE. A substantial fraction of haplotype calls was unambiguous (66% for ABCC1 and 76% for ABCG2). Among the positions whose phase had to be inferred, 44% and 78% of haplotype calls were obtained with probability >0.95 for the two loci, respectively.
Recombination
Under the neutral infinite-sites mutation model without recombination, S segregating sites can generate at most S + 1 distinct haplotypes; the excess of haplotypes observed above (46 from 32 SNPs in ABCC1 and 16 from 13 SNPs in ABCG2) therefore implies intragenic recombination in both loci. We further assessed recombination using the four-gamete test (Fig. 1). In ABCC1 only 53 out of 231 pairs were found to be in complete LD (fewer than four gametes observed in the sample), indicating extensive intragenic recombination throughout the gene. The estimates of the population recombination parameter R (= 4Ner, where Ne is the effective population size and r is the recombination rate per gene per generation) for ABCC1 and ABCG2 were 70.4 and 35.8, respectively. The minimum number of recombination events, Rm, indicated that recombination was detected for 15 pairs of SNPs in ABCC1. In ABCG2, though more than half of the 45 pairs were in complete LD, five recombination events were detected.
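The four-gamete test itself is simple to state in code: a pair of biallelic sites shows evidence of recombination (or recurrent mutation) when all four allele combinations are present among the sampled haplotypes. The sketch below is a minimal illustration and not the DnaSP routine; the toy haplotypes and function names are hypothetical. The pair counts match the 22 and 10 common SNPs analysed in the two loci.

```python
from itertools import combinations
from math import comb


def has_four_gametes(haplotypes: list[str], i: int, j: int) -> bool:
    """True when all four two-site allele combinations occur at sites i and j."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def four_gamete_pairs(haplotypes: list[str]) -> list[tuple[int, int]]:
    """All site pairs that fail the four-gamete test (i.e. show four gametes)."""
    n_sites = len(haplotypes[0])
    return [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if has_four_gametes(haplotypes, i, j)
    ]


# Number of SNP pairs tested: 22 common SNPs in ABCC1, 10 in ABCG2.
print(comb(22, 2), comb(10, 2))

# Hypothetical phased haplotypes over three biallelic sites.
toy = ["ACG", "ATG", "GCG", "GTA"]
print(four_gamete_pairs(toy))
```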

Fig. 1. Recombination measurements of ABCC1 and ABCG2. Recombination was determined using the four-gamete test, with potential recombination sites (R) indicated by ×s. Blackened boxes indicate site pairs having all four possible gametic types, which implies that recombination has occurred between the two sites; blank boxes indicate site pairs having fewer than four gametic types.
Linkage Disequilibrium and Haplotype Structure
LD was measured using the statistic |D′| in a pairwise manner across all common SNPs. Despite the above inference of recombination in ABCC1 and ABCG2, pronounced LD was observed in local regions of both genes. A very irregular picture of LD was observed for ABCC1 (Fig. 2A). For example, in the subgroup of SNP8-9-11-12-13 of ABCC1, which spans about 10 kb of sequence, only 4 pairs (SNP8/11, SNP8/12, SNP11/12, and SNP12/13) were in complete LD (|D′| = 1). The LD profile of ABCG2, in contrast, was less complicated, with only one segmental LD subgroup partitioned (Fig. 2B). The haplotype structures of the two ABC genes are shown in Fig. 3. The loci could be largely decomposed into three and two discrete blocks for ABCC1 and ABCG2, respectively. These blocks span from about 10 kb to about 50 kb and contain multiple (five or more) common SNPs. The major attribute of each block is that only a few (3-5) common haplotypes accounted for most (>80%) of the sampled chromosomes, although an excess of haplotypes was predicted in most of the blocks.

Fig. 2. LD intensity across the ABCC1 (A) and ABCG2 (B) loci. LD was measured using the statistic |D′| in a pairwise manner across common SNPs in HWE.


Fig. 3. Sequence and haplotype diversity of the ABCC1 (A) and ABCG2 (B) loci in the Chinese population. SNP names, frequencies (common SNPs in HWE that were used in the LD analysis are marked with asterisks on the top line), and haplotype blocks of low diversity with their common haplotype profiles are shown. For each block, the length of covered sequence, number of common SNPs, number and frequency of total and common haplotypes (with frequencies >7%), and total proportion of common haplotypes in the sample are presented.
Discussion
It is well recognized that inherited differences in drug disposition systems (metabolizing enzymes and transporters) and drug effect systems (sensors, receptors and targets) have great influence on the efficacy and toxicity of medications, and on the risks of some diseases related to xenobiotic exposure (Evans & Relling, 1999). Geneticists and physicians have come to view clinical drug responses as complex traits associated with polygenic determinants. The notable successes of LD-based positional cloning studies for Mendelian disorders, together with the wide availability of genome polymorphism data, have sparked a strong interest in LD-based association studies for pinpointing the genes underlying complex diseases and drug response. Thus, there is an urgent need to resolve the allelic architecture of xenobiotic disposition and effect systems underlying clinical phenotypes of drug response or disease predisposition.
The present study provides a comprehensive analysis in the Chinese population of genetic variants of two large ABC transporter genes (ABCC1 and ABCG2) implicated in multidrug resistance. Variations in LD pattern and haplotype diversity were observed between the two loci. However, haplotypes in the two loci could be partitioned into several LD units in which haplotype diversity is low and a few common haplotypes account for most sampled chromosomes. Our data contribute to the growing LD landscape of the human genome, and will facilitate marker selection for pharmacogenomics studies of cancer chemotherapy and susceptibility.
Most of our analyses were based on haplotype data, which were reconstructed in silico from unphased population genotype data. Although PHASE is a well-recognized program, it provides only estimates of the haplotypes underlying unphased genotypes, and these may at times be biased; furthermore, our datasets lack phased genotypic information from pedigrees, and these methodological limitations might to some extent affect the accuracy of the haplotype configurations in our datasets. However, although our sample size was relatively small (2N = 54), there are at least three lines of evidence suggesting that our observations were not severely affected by ascertainment bias due to sampling and genotyping. Firstly, the genotype frequencies at nearly all the SNP sites followed HWE. Secondly, on the basis of the observed numbers of segregating sites, our estimates of θ (per sequence) for the two loci were 6.97 and 2.82 respectively (the estimates in Table 2 were calculated on a per-site basis), which is largely in agreement with the observation that there were 6 and 2 singletons in the analysed regions of ABCC1 and ABCG2, respectively (under the infinite-sites neutral model, the expected number of singletons is simply θ (Fu & Li, 1993)). Thirdly, because rare alleles with frequencies <5% do not have sufficient statistical power for LD detection (Lewontin, 1995; Goddard et al. 2000), haplotype reconstruction for LD and recombination analysis was based only on common SNPs.
One feature of the present data that merits attention is the pronounced variation in LD pattern and recombination between ABCC1 and ABCG2. For ABCC1, many more distinct haplotypes were observed than theoretically expected for the value of θ estimated from the segregating sites. When ten rare SNPs were subtracted from the total of 32, the number of predicted haplotypes only decreased from 46 to 39. These facts, and the observation that the proportion of haplotype calls with sound probability (>0.95) was relatively low (the error rate of the PHASE algorithm partially depends on the recombination rate), indicated extensive intragenic recombination throughout this locus. The LD measurements and the four-gamete matrices were in close agreement. The abundance of recombination throughout ABCC1 helps to explain the highly irregular pattern of LD in this locus. In contrast, a low population recombination rate and a regular LD pattern were observed at ABCG2. These results are consistent with the latest genetic map of the human genome (Kong et al. 2002), in which the deduced local recombination rate of the ABCC1 locus is much higher than that of ABCG2. Empirical data from other candidate-gene-based LD studies also show pronounced intergenic differences in LD pattern (Jeffreys et al. 2000; Bonnen et al. 2002). Clearly, for the two loci, the different sequence lengths surveyed, and therefore the different numbers of SNPs screened, might partly explain their different LD patterns. However, in simple population genetics models, the key parameter in determining the extent of linkage disequilibrium is the product of the recombination rate and the effective population size, often termed the population recombination rate R.
Several molecular and population genetic factors may account for most of the heterogeneous LD pattern across genes and genome: sequence constitution dependent mutation rate, recombination rate, population demographic history, natural selection forces, and even genetic drift (Reich et al. 2002; Stumpf & Goldstein, 2003).
Recently, both regional and chromosome-wide studies of linkage disequilibrium and haplotype structure have revealed block-like patterns of LD (Daly et al. 2001; Jeffreys et al. 2001; Gabriel et al. 2002; Phillips et al. 2003). A small fraction of the SNPs in each block is statistically sufficient to capture the haplotype information content of that block. Therefore, haplotype-based association studies may hold promise for complex trait mapping. Taking into consideration their functional implications in MDR and other pathophysiological traits, and their marked genomic sequence length and complexity, we provisionally dissected the haplotype structure of ABCC1 and ABCG2 using the criterion of “chromosome coverage” (Daly et al. 2001). Here, we subjectively defined common haplotypes as those with a frequency >7% and set the threshold of their “chromosome coverage” in one block at 80%. With common haplotypes defined in this way, attainable sample sizes in typical association studies could provide sufficient statistical power to detect causal variants with an odds ratio greater than 1.5 (Johnson et al. 2001). The two loci were thereby empirically reduced to three and two haplotype blocks, respectively. The outlined haplotype blocks made up 32% (63 out of 200 kb) and 76% (52 out of 68 kb) of the ABCC1 and ABCG2 genomic sequence, respectively, and spanned about 10 kb to about 50 kb. In each block, a small number of common haplotypes (3 to 5) typically captured >80% of all chromosomes in the sample. This general pattern is largely consistent with the recent high-resolution haplotype analyses based on chromosomes (Phillips et al. 2003) or large-scale autosomal regions (Gabriel et al. 2002). It is also apparent, however, that the haplotype structures of the two genes differ to some extent.
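The "chromosome coverage" criterion described above can be sketched as follows: within a candidate block, haplotypes with frequency above 7% are called common, and the block qualifies when they jointly account for at least 80% of the sampled chromosomes. The haplotype strings and counts below are hypothetical and chosen only to sum to 54 chromosomes; the function name is illustrative.

```python
from collections import Counter


def common_haplotype_coverage(
    block_haplotypes: list[str], min_freq: float = 0.07
) -> tuple[dict[str, float], float]:
    """Return the common haplotypes (frequency > min_freq) of a block and the
    fraction of sampled chromosomes they account for."""
    n = len(block_haplotypes)
    counts = Counter(block_haplotypes)
    common = {h: c / n for h, c in counts.items() if c / n > min_freq}
    return common, sum(common.values())


# Hypothetical block: four common haplotypes plus four singletons (54 chromosomes).
block = (
    ["AGCT"] * 25 + ["GGCT"] * 15 + ["AACT"] * 6 + ["AGTT"] * 4
    + ["GACT", "GGTT", "AATT", "GATT"]
)
common, coverage = common_haplotype_coverage(block)
print(len(common), f"{coverage:.3f}")
print("passes 80% threshold:", coverage >= 0.80)
```

Note that a haplotype seen four times in 54 chromosomes has a frequency of 7.4% and therefore just clears the 7% cutoff, which is how the ">7%" and "at least four times" definitions coincide in this sample.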
The chromosome coverage of the common haplotypes in either block 2 or block 3 of ABCC1 is relatively low compared with that of ABCG2, which agrees with the observation that intragenic recombination in ABCC1 was more extensive and more intensive than that in ABCG2. This pattern of haplotype complexity suggests that markers would need to be chosen more carefully for ABCC1 than for ABCG2 in association studies.
Some inherent limitations of this study must be acknowledged when discussing the implications of the data. Candidate-gene-based SNP screening usually targets coding and regulatory regions rather than the complete genomic sequence of the loci concerned. This strategy is methodologically sound to some extent: it has been demonstrated that there are relatively large numbers of SNPs within coding and regulatory regions of candidate genes, some of which might be functional, and that the LD strength and haplotype structure based on such SNPs are very informative in LD-based association studies (Tiret et al. 2002). It is also readily conceivable, however, that the candidate-gene-based LD pattern and haplotype structure could be obscured by such incomplete coverage. ABCG2 illustrates the bias that can result from a sampling strategy with partial information. Block 1, with a length of 7 kb, might have extended further had potential common SNPs in intron 1 (about 18 kb) been covered. Likewise, block 2 (spanning 45 kb) might have been further partitioned into subgroups had some markers in the 3' segment of the block been added. However, in this preliminary stage of a pharmacogenomics study, we have not attempted to catalogue all of the genetic variation in the loci. Instead, we wanted to obtain general information about LD and common haplotypes for these genes in the study population. In support of this approach, a systematic survey of haplotype structure revealed that in regions with a low recombination rate, a small number of randomly chosen common markers are sufficient to identify most common haplotypes (Gabriel et al. 2002).
Another limitation of our study is its lack of sampling of other major ethnic populations. Though it has been indicated that both the boundaries of blocks and common haplotypes are shared to a remarkable extent across different populations (Gabriel et al. 2002), theoretical and empirical data also emphasize the effects on LD pattern of recombination rate heterogeneity, demographic population history, and even stochastic effects (Reich et al. 2002; Stumpf & Goldstein, 2003; Wang et al. 2002). We cannot address these issues here, especially the extent to which block boundaries are conserved across populations. The implications of our dataset for association studies in other populations should therefore be interpreted with caution. Taking the two issues together, we acknowledge that the LD profile and haplotype block architecture we outlined might be only one of several possible ways to describe the LD pattern, as the position and frequency of markers, the mutation and recombination rates, and the methodology for block definition all jointly shape the fine allelic landscape observed in a population.
In summary, with a candidate-gene-based SNP screening strategy, we characterized the linkage disequilibrium and haplotype diversity within two multidrug resistance related genes in the Chinese population. ABCC1 showed much greater complexity than ABCG2 in LD pattern, intragenic recombination intensity and haplotype diversity. Though the LD profiles were complicated by intragenic recombination and other factors, a few blocks of low haplotype diversity could still be identified in the two loci under a somewhat stringent definition. The LD and haplotype landscape for ABCC1 and ABCG2 may be informative for, and benefit, genetic association studies of complex diseases and drug response that might be related to these multidrug resistance genes.
Acknowledgements
We give special thanks to Prof. Wei Huang at the Chinese National Human Genome Center at Shanghai, Prof. Li Jin at Fudan University, and Prof. Jiujin Xu and Prof. Ruofu Du at the Institute of Genetics & Developmental Biology, CAS, for their contribution to sample collection and distribution. We gratefully acknowledge Prof. Yan Shen and Prof. Zhijian Yao at the Chinese National Human Genome Center at Beijing for their support of the sequencing platform. We thank Xiaojia Dong, Xiumei Zhang, and Fengying He for their excellent bench work. We also thank Dr. Keyue Ding for his constructive discussion. This work was supported by a grant-in-aid from the National High Technology Project of China (2001AA224011) and the Shanghai Science and Technology Developing Program (Grant No. 03DZ14024).
References
Supplementary Material
List of position in genomic sequence, length, product size, and annealing temperature for each primer pair, details of SNPs and their genotype for each sample.