New Online Search Tool Helps Scientists Explore Genes in Medical Literature


Computational biology researchers today announced a new Internet tool for the exploration of the scientific literature in medicine and biology. The freely accessible iHOP service provides fast, accurate, comprehensive, and up-to-date summary information on more than 80,000 biological molecules by automatically extracting key sentences from millions of PubMed documents when a search is requested. PubMed is a service of the US National Library of Medicine that includes more than 16 million citations for biomedical articles from life science journals.

“By using genes and proteins as hyperlink sources between sentences and abstracts, iHOP converts the information in PubMed into a navigable information network, exploiting the power of the Internet for scientific literature investigation,” said Sloan Kettering Institute postdoctoral fellow Robert Hoffmann, who started the iHOP project when he worked at the Protein Design Group in Madrid, Spain.
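The idea of using genes as hyperlinks between sentences can be illustrated with a small sketch. This is not iHOP's implementation; the sentence records, the placeholder gene names (GENE_A and so on), and the function names `build_index` and `linked_genes` are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy records: (sentence id, sentence text, gene symbols mentioned).
# The gene names are placeholders, not real data from PubMed or iHOP.
SENTENCES = [
    ("s1", "GENE_A binds GENE_B.", {"GENE_A", "GENE_B"}),
    ("s2", "GENE_B is phosphorylated by GENE_C.", {"GENE_B", "GENE_C"}),
    ("s3", "GENE_A activates GENE_C.", {"GENE_A", "GENE_C"}),
    ("s4", "GENE_D is expressed in yeast.", {"GENE_D"}),
]


def build_index(sentences):
    """Map each gene symbol to the ids of the sentences that mention it."""
    index = defaultdict(list)
    for sentence_id, _text, genes in sentences:
        for gene in sorted(genes):
            index[gene].append(sentence_id)
    return dict(index)


def linked_genes(gene, sentences):
    """Return the genes reachable from `gene` through a shared sentence."""
    linked = set()
    for _sentence_id, _text, genes in sentences:
        if gene in genes:
            linked |= genes - {gene}
    return sorted(linked)


if __name__ == "__main__":
    index = build_index(SENTENCES)
    print(index["GENE_A"])                    # sentences mentioning GENE_A
    print(linked_genes("GENE_A", SENTENCES))  # genes one hop away
```

In this sketch, each gene acts as a link: starting from one gene, a reader can hop to any sentence that mentions it and from there to the other genes in that sentence.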

The most recent work of the Computational Biology Center at Memorial Sloan Kettering Cancer Center (MSKCC), headed by Chris Sander, Chairman of the Sloan Kettering Institute’s Computational Biology Program, has improved iHOP's speed, accuracy, relevance, and coverage.

“This is a next-generation search tool that allows scientists to extract comprehensive information regarding the function of genes in signaling pathways, interaction networks, and biological processes,” said Dr. Sander. “We are now able to provide daily updates, processing about 2,000 new publications per day, which makes iHOP the most comprehensive and up-to-date resource for literature-derived gene information on the Web.”

iHOP has been available at no cost to the public since 2004 and has become one of the most widely used resources for biomedical research, accessed by up to 200,000 different users per month. The new version provides current information on even more genes and chemical compounds and covers all organisms, from human and chimpanzee to yeast and HIV, which makes iHOP useful to tens of thousands of biomedical researchers.

iHOP allows researchers to explore a network of gene interactions by directly navigating the pool of published scientific literature. Rather than providing long lists of entire abstracts upon keyword searches, iHOP selectively retrieves information that is specific to genes and proteins and summarizes their interactions and functions. The system adds value by filtering and ranking extracted sentences according to significance, impact factor, date of publication, and syntax.
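The filtering and ranking step described above can be sketched as follows. The announcement names the criteria (significance, impact factor, date of publication, and syntax) but not how they are scored or combined, so the numeric fields, the `min_significance` threshold, and the lexicographic ordering below are assumptions for illustration only, not iHOP's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedSentence:
    text: str
    significance: float   # hypothetical relevance score in [0, 1]
    impact_factor: float  # impact factor of the source journal
    year: int             # year of publication
    syntax_score: float   # hypothetical score for sentence structure


def filter_and_rank(sentences, min_significance=0.5):
    """Drop low-significance sentences, then sort the rest best-first.

    Ties on significance are broken by impact factor, then by recency,
    then by syntax score. This ordering is an assumed one.
    """
    kept = [s for s in sentences if s.significance >= min_significance]
    return sorted(
        kept,
        key=lambda s: (s.significance, s.impact_factor, s.year, s.syntax_score),
        reverse=True,
    )


if __name__ == "__main__":
    candidates = [
        ExtractedSentence("sentence a", 0.9, 2.0, 2005, 0.5),
        ExtractedSentence("sentence b", 0.9, 5.0, 2003, 0.5),
        ExtractedSentence("sentence c", 0.2, 30.0, 2006, 0.9),
    ]
    for s in filter_and_rank(candidates):
        print(s.text)
```

A weighted sum of the four criteria would be an equally plausible way to combine them; the tuple ordering is used here only because it keeps the example short.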

The complexity of the information stored in scientific databases and journals, together with the growing number of scientific publications, has created a need for text-mining systems that help researchers navigate these interrelated information resources. Openly accessible text-mining systems, such as iHOP, facilitate the integration of database and text information and support researchers in the formulation of novel hypotheses.

“We plan to extend the iHOP concept to full text sources and the algorithmic exploration of gene networks,” added Dr. Hoffmann.

Access to iHOP, an acronym for the tool’s original version, Information Hyperlinked over Proteins, is freely available at More information about computational biology research is at