Information Retrieval in Biomedical Research

Zina Ben Miled

Indiana University Purdue University Indianapolis, Electrical and Computer Engineering Department, Indianapolis, Indiana

Nianhua Li

Indiana University Purdue University Indianapolis, Electrical and Computer Engineering Department, Indianapolis, Indiana

Malika Mahoui

Indiana University Purdue University Indianapolis, Computer and Information Science Department, Indianapolis, Indiana

Omran Bukhres

Indiana University Purdue University Indianapolis, Computer and Information Science Department, Indianapolis, Indiana

First published: 14 April 2006

https://doi.org/10.1002/9780471740360.ebs0631

Abstract

Information retrieval is important in various biomedical research fields. This article covers the theoretical background, the state of the art, and future trends in biomedical information retrieval. Techniques for literature searches, genomic information retrieval, and database searches are discussed. Literature search techniques cover named entity extraction, document indexing, document clustering, and event extraction. Genomic information retrieval techniques are based on sequence alignment algorithms. This article also briefly describes widely used biological databases and discusses the issues related to information retrieval from these databases. Terminology systems are involved in almost every aspect of information retrieval. The various types of terminology systems and their use in supporting information retrieval are reviewed.

Wiley Encyclopedia of Biomedical Engineering
