dbQSNP: A database of SNPs in human promoter regions with allele frequency information determined by single-strand conformation polymorphism-based methods†
Tomoko Tahira, Shingo Baba, Koichiro Higasa, Yoji Kukita
Division of Genome Analysis, Research Center for Genetic Information, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
Tomoko Tahira, Shingo Baba, Koichiro Higasa, and Yoji Kukita contributed equally to this work.
Yutaka Suzuki, Sumio Sugano
Laboratory of Functional Genomics, Department of Medical Genome Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
Kenshi Hayashi (Corresponding Author)
Division of Genome Analysis, Research Center for Genetic Information, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Fukuoka, Japan
Communicated by David N. Cooper
Abstract
We present a database, dbQSNP (http://qsnp.gen.kyushu-u.ac.jp/), that provides sequence and allele frequency information for single-nucleotide polymorphisms (SNPs) located in the promoter regions of human genes; these regions were defined by the 5′ ends of full-length cDNA clones. We searched for SNPs in these regions by sequencing or single-strand conformation polymorphism (SSCP) analysis. The allele frequencies of the identified SNPs in two ethnic groups were quantified by SSCP analyses of pooled DNA samples. The accuracy of our estimates is supported by strong correlations between the frequencies in our data and those in other databases for the same ethnic groups. The frequencies vary considerably between the two ethnic groups studied, suggesting that SNP collection and allele frequency determination need to be population-based, for example in disease association studies. We show profiles of SNP densities that are characteristic of transcription start site regions. A fraction of the SNPs showed significantly different allele frequencies between the groups, suggesting differential selection of the genes involved. Hum Mutat 26(2), 1–9, 2005. © 2005 Wiley-Liss, Inc.
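The two comparisons mentioned in the abstract (agreement of allele frequencies with another database, and divergence of frequencies between two groups) can be illustrated with a minimal Python sketch. The abstract does not state which statistics were used, so the choice of Pearson's correlation and a two-population Wright's F_ST here is an assumption, and the function names and frequency values are hypothetical, not taken from dbQSNP.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of allele frequencies."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fst_two_populations(p1, p2):
    """Wright's F_ST for one biallelic SNP, two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    F_ST = (H_T - H_S) / H_T, with H = 2p(1 - p) as expected heterozygosity.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # allele fixed or absent in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# Hypothetical frequencies for illustration only (not real dbQSNP data).
pooled_estimates = [0.12, 0.35, 0.50, 0.78]
other_database = [0.10, 0.33, 0.52, 0.80]
print(round(pearson_r(pooled_estimates, other_database), 3))
print(round(fst_two_populations(0.2, 0.8), 3))
```

A correlation close to 1 would indicate that the pooled-DNA estimates track the independent frequencies, while SNPs with unusually high F_ST are the kind of candidates one might examine for differential selection.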