From Genotype to Phenotype: a Shortcut Through the Library
RESEARCH HIGHLIGHTS: BIOINFORMATICS

A list of genes tells you little about their biological roles: understanding this generally requires time-consuming functional or comparative studies. A new method provides a shortcut on this path from genotype to phenotype: it uses a combination of comparative genomics and literature mining to predict the functions of large sets of sequenced genes.

Peer Bork and colleagues reasoned that if a group of species has a shared phenotypic trait, orthologous genes shared among the species are likely to be involved in the underlying biological process. Such genotype–phenotype correlations have been made in the past, but required initial manual collection of phenotypic information for each species, which is labour-intensive and might lead to biases in the phenotypes examined.

In the new study, this annotation stage was avoided by directly linking species with phenotypic information already available in the published literature. The authors linked 92 completely sequenced prokaryotic species with 172,967 nouns in MEDLINE abstracts (from the database compiled by the US National Library of Medicine). The nouns were grouped according to the species they matched up with, assuming that words that relate more frequently to a particular set of species are likely to be specific to a shared trait. For the same species, 11,026 orthologous gene sets were identified from the STRING (search tool for the retrieval of interacting genes/proteins) database, defining shared sets of genes.

The final stage was to look for correlations between the groupings of MEDLINE nouns and the groupings of shared genes. Bork and colleagues identified 2,700 significant associations between orthologous groups and trait words, which allowed them to relate 28,888 genes to at least one trait. Among these associations, many of the gene–phenotype associations were already known, confirming the validity of the method. However, many new discoveries were also made. For example, previously unknown associations were made between a group of metabolic genes and trait words linked to food poisoning.

Many of the other associations made in this study also linked genes to disease-related phenotypes, which could provide a valuable source of new drug targets. This skew towards clinically relevant phenotypes reflects the fact that pathogens are highly represented among fully sequenced prokaryotes, but as more sequences and more MEDLINE entries become available, this method should provide a way to link genes to a wide range of biological processes.

Louisa Flintoft

References and links

ORIGINAL RESEARCH PAPER
Korbel, J. O. et al. Systematic association of genes to phenotypes by genome and literature mining. PLoS Biol. 3, e134 (2005)

WEB SITES
Peer Bork’s laboratory: http://www-db.embl.de/jss/EmblGroupsHD/g_27.html
The STRING database: http://string.embl.de

NATURE REVIEWS | GENETICS, VOLUME 6, JULY 2005
© 2005 Nature Publishing Group
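The correlation step described in this highlight, matching the species linked to a trait word against the species that share an orthologous group, can be illustrated with a small sketch. The highlight does not say which statistic the authors used, so a one-sided hypergeometric overlap test is swapped in here purely as an assumption. The function names, the species labels, the word and group identifiers, and the 0.05 cut-off are all hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_tail(n_total, n_word, n_group, overlap):
    """P(X >= overlap) when n_group species are drawn at random from
    n_total, of which n_word are linked to the trait word."""
    upper = min(n_word, n_group)
    favourable = sum(
        comb(n_word, k) * comb(n_total - n_word, n_group - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(n_total, n_group)


def associate(word_species, group_species, all_species, alpha=0.05):
    """Return (word, group, p) triples whose species overlap is unlikely
    by chance under the hypergeometric assumption (illustrative only)."""
    n_total = len(all_species)
    hits = []
    for word, ws in word_species.items():
        for group, gs in group_species.items():
            p = hypergeom_tail(n_total, len(ws), len(gs), len(ws & gs))
            if p <= alpha:
                hits.append((word, group, p))
    return sorted(hits, key=lambda h: h[2])


# Toy data: eight made-up species, two trait words, two orthologous groups.
all_species = {f"s{i}" for i in range(1, 9)}
word_species = {
    "toxin": {"s1", "s2", "s3", "s4"},  # species whose abstracts mention the word
    "soil": {"s5", "s6"},
}
group_species = {
    "OG_A": {"s1", "s2", "s3", "s4"},  # species carrying this orthologous group
    "OG_B": {"s2", "s5", "s6", "s7"},
}

hits = associate(word_species, group_species, all_species)
for word, group, p in hits:
    print(f"{word} <-> {group}: p = {p:.4f}")
```

In this toy data only the pairing of "toxin" with "OG_A" passes the cut-off, because the word and the gene group occur in exactly the same four species. A study at the scale described above (172,967 nouns against 11,026 orthologous groups) would need a correction for the very large number of comparisons.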