Genome Informatics 13: 248-249 (2002)

BioRuby: Object Oriented Open Source Library for

Toshiaki Katayama (1), Shuichi Kawashima (1), Naohisa Goto (2), Mitsuteru C. Nakao (3), Yoshinori K. Okuji, Minoru Kanehisa (1)

(1) Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
(2) Genome Information Research Center, Osaka University, Yamadaoka 3-1, Suita, Osaka 565-0871, Japan
(3) Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan

Keywords: open source, Ruby scripting language, database interface, object oriented

1 Introduction

BioRuby [6] is a project, started in late 2000, to build a useful library for bioinformatics tasks in the object-oriented scripting language Ruby. Ruby was created in Japan and has been gaining popularity in recent years for its simple and powerful syntax. For Perl, Java and Python programmers, the BioPerl [4], BioJava [3] and BioPython [5] projects already exist as preceding efforts, organized under the Open Bio foundation [7, 2, 1]. However, they tend to be unnecessarily complicated, both because of the limitations of each language and for historical reasons arising from their gradual development. Taking this situation into consideration, we find Ruby suitable for constructing a simple and easy-to-use open bio library.

In 2002, the first invitation-only Open Bio* developers meeting, called BioHackathon, was held in Arizona and Cape Town, where we agreed on five new ways to retrieve a record from biological sequence databases given an identifier. These methods are collectively named OBDA (Open Bio sequence Database Access) and comprise flat-file indexing (simple, Berkeley DB), BioFetch (CGI/HTTP), BioSQL (MySQL, PostgreSQL, Oracle), XEMBL and CORBA, described below.

2 Project Status

Currently, we have classes for biological sequences and annotations (Bio::Sequence, Bio::Location, Bio::Feature), literature management classes for retrieving and storing reference information (Bio::Reference, Bio::PubMed), parsers for over 20 major biological databases (Bio::DB, Bio::GenBank, Bio::KEGG, etc.), wrappers for sequence analysis software such as BLAST, FASTA and EMBOSS (Bio::Blast, Bio::Fasta, etc.) and classes for pathway computation (Bio::Pathway, Bio::Relation). Additionally, we have already implemented access methods for OBDA via the Bio::Registry, Bio::FlatFile, Bio::Fetch and Bio::SQL classes (XEMBL and CORBA interfaces are planned).

Within OBDA, Bio::Registry is a mechanism for selecting the access method through an initialization file (~/.bioinformatics/seqdatabase.ini). Since OBDA offers five different ways to retrieve the same entry from several databases, users can decide which method to use for obtaining an entry. Flat-file indexing is the simplest method and is built directly against flat-file databases. BioFetch retrieves an entry through CGI over HTTP, and we also provide a BioFetch server for the community. BioSQL stores the data in a relational database and is suitable for the most complicated tasks. Currently available classes are summarized in Fig. 1.

Figure 1: Ruby classes implemented in the BioRuby library.
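To make the idea of registry-driven access more concrete, the following is a minimal sketch in plain Ruby of how an initialization file could select a retrieval method and how a flat-file index could map an identifier to a record. It does not use or reproduce the BioRuby classes: the helpers `parse_registry`, `build_index`, `fetch_entry` and `retrieve`, the key names `protocol` and `location`, the protocol value `flat`, and the toy record layout (an `ID` line, terminated by `//`) are all assumptions made for illustration only.

```ruby
require "tempfile"

# Hypothetical helper, not BioRuby's Bio::Registry: parse a registry-style
# initialization text into { section name => { key => value } }.
def parse_registry(text)
  registry = {}
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?("#")
    if (m = line.match(/\A\[(.+)\]\z/))
      current = registry[m[1]] = {}
    elsif current && (m = line.match(/\A(\w+)\s*=\s*(.*)\z/))
      current[m[1]] = m[2]
    end
  end
  registry
end

# Hypothetical flat-file index: map each entry ID to the byte offset at
# which its record starts. Assumes every record begins with an "ID" line.
def build_index(path)
  index = {}
  File.open(path, "rb") do |f|
    until f.eof?
      offset = f.pos
      line = f.gets
      m = line.match(/\AID\s+(\S+)/)
      index[m[1]] = offset if m
    end
  end
  index
end

# Seek to the indexed offset and read lines up to and including the
# "//" terminator. Returns nil when the ID is not in the index.
def fetch_entry(path, index, id)
  offset = index[id]
  return nil unless offset
  File.open(path, "rb") do |f|
    f.seek(offset)
    record = String.new
    while (line = f.gets)
      record << line
      break if line.start_with?("//")
    end
    record
  end
end

# Choose the access method named in the registry. Only the flat-file
# case is sketched; other protocols raise NotImplementedError.
def retrieve(registry, dbname, id)
  config = registry.fetch(dbname)
  case config["protocol"]
  when "flat"
    index = build_index(config["location"])
    fetch_entry(config["location"], index, id)
  else
    raise NotImplementedError, "protocol #{config['protocol']} is not sketched here"
  end
end

# Toy flat-file database with two records (made-up identifiers).
flatfile = Tempfile.new("demo_db")
flatfile.write(<<~DATA)
  ID   AB000001
  DE   first example entry
  //
  ID   AB000002
  DE   second example entry
  //
DATA
flatfile.flush

# Illustrative registry text pointing the "demo" database at that file.
registry = parse_registry(<<~INI)
  [demo]
  protocol=flat
  location=#{flatfile.path}
INI

entry = retrieve(registry, "demo", "AB000002")
puts entry
```

The point of the design is that the caller only names a database and an identifier; switching to another access method would mean editing the initialization text rather than the calling code. A real index would also be built once and stored rather than rebuilt on every lookup as in this sketch.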

References

[1] Mangalam, H., The Bio* toolkits - a brief overview, Brief Bioinform, 3:296-302, 2002.
[2] Stein, L., Creating a bioinformatics nation, Nature, 417:119-120, 2002.
[3] http://biojava.org/
[4] http://bioperl.org/
[5] http://biopython.org/
[6] http://bioruby.org/
[7] http://open-bio.org/