Bioinformatics in Functional Genomics
Isaac S. Kohane
Children's Hospital Informatics Program, Boston, MA, USA
Atul Butte
Children's Hospital Informatics Program, Boston, MA, USA
Abstract
Databases of DNA sequences, physical maps, genetic maps, gene polymorphisms, protein structures, and gene expression have produced a need for systematic quantitative analysis that goes by the name of bioinformatics. Managing these large datasets in functional genomics requires algorithms implemented on computers. Functional genomics will continue to be both hypothesis-driven and hypothesis-generating biological research. Because the datasets are high-dimensional but involve relatively few cases, a large number of solutions can explain the data; the computational techniques used to unravel these solutions include both supervised and unsupervised learning.
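The point about high dimensionality and few cases can be made concrete with a small sketch. It is an illustration under stated assumptions, not a method taken from the article: the function names, the six-sample design, the 10,000-gene count, and the use of Gaussian noise as stand-in expression values are all hypothetical. It counts how many purely random "genes" happen to separate two classes perfectly with a single threshold.

```python
import math
import random


def chance_of_perfect_split(n_a: int, n_b: int) -> float:
    """Probability that one uninformative gene separates two classes perfectly.

    Assumes continuous expression values with no ties, so every ordering of
    the samples is equally likely.
    """
    # Only 2 of the C(n_a + n_b, n_a) equally likely rank arrangements place
    # all class-A samples entirely below or entirely above class B.
    return 2 / math.comb(n_a + n_b, n_a)


def separates(values, labels) -> bool:
    """True if a single threshold on `values` splits label 0 from label 1."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return max(a) < min(b) or max(b) < min(a)


def count_spurious_markers(n_genes: int, labels, seed: int = 0) -> int:
    """Count pure-noise genes that nevertheless separate the classes."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return sum(
        separates([rng.gauss(0.0, 1.0) for _ in labels], labels)
        for _ in range(n_genes)
    )


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]   # hypothetical design: 3 cases, 3 controls
    n_genes = 10_000              # hypothetical array size
    expected = n_genes * chance_of_perfect_split(3, 3)
    observed = count_spurious_markers(n_genes, labels)
    print(f"expected by chance: {expected:.0f}, observed in noise: {observed}")
```

With three samples per class, one noise gene in ten is a perfect single-gene "classifier", so an array of this size yields on the order of a thousand candidate explanations that carry no biological signal. This is why both supervised and unsupervised analyses of such data need validation on independent cases.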