The Bioinformatics Playground
Gearing for bioinformatics

Bela Tiwari and Dawn Field explore the tools and facilities that can be used by the budding open source bioinformatician

LinuxUser & Developer, pp. 50-51

‘Bioinformatics’ is a buzz word that is becoming increasingly audible in the Linux world. Fast, economical, flexible, and extensible computing power is making Linux increasingly attractive to scientists in many areas of research, including biology. More generally, the open source movement has greatly benefited biological research; the most publicised project being the publicly funded effort to sequence and make freely available the human genome. Less well publicised is the huge amount of biological data that can be freely accessed. The combination of data availability and free software is revolutionising this field.

The ability to redistribute Linux, the existence of online documentation, active user and developer communities, and the fact that much bioinformatics software is developed for Linux/Unix systems, has opened the way for individual users without access to large centralised resources to be able to install and run bioinformatics software to analyse data, and to start developing for the wider community.

Here we outline projects that can help to significantly ease the experience of trying out, using, and providing computing platforms appropriate for bioinformatics analyses.

KNOWING WHEN

Turning data into knowledge is a complex task that demands data manipulation, comparison, statistical analysis, visualisation, as well as data storage and dissemination. Usually, the weight of many lines of evidence must be combined to answer a scientific question, and the interpretation of the output of many different software tools plays a key role in discerning and assembling data from which biological knowledge is born.

Finding and installing common tools for bioinformatics on your own machine, especially for those new to Linux, can be a daunting task. Projects with enough funding are able to hire dedicated system administrators to provide sustainable bioinformatics computing systems, but many of us are not that lucky and have to go it alone.

To add to the challenge, much bioinformatics software is written by academics, and while there are some very good, well tested packages out there, there are also many that were intended to answer a particular question, on a particular machine, for a particular group. Such packages were often not built with portability, future use or further development in mind.

Knowing when to persevere or give up with a piece of software is all part of the key skills of a bioinformatician or bioinformatics systems provider. Even very experienced system administrators can sometimes find installing and integrating bioinformatics software and databases frustrating and tedious.

Many developers have faced these challenges already and taking advantage of the resources some of them have made freely available can greatly reduce the overheads involved in establishing a new system for bioinformatics. Some of these resources are described in this article including CD and DVD-based Live Linux distributions customized for bioinformatics analyses, full distributions that can be installed from iso images or installed over the network, and also specialised package repositories. Each of these solutions has its particular attractions for users with different requirements.

PICKING YOUR SOLUTION

Whether you plan to use a system yourself or provide it for others, give thought to your long-term requirements. Questions you might be asking yourself include how much computing power you are likely to need, whether you require a cluster-based solution, how many databases need to be stored locally, how many users will depend on the system, how they will access it, etc. Live CD or DVD distributions may be good for an individual and for demonstration purposes, but they are probably not the right choice for the provision of tools to a whole department.

LIVE DISTRIBUTIONS

Live Linux distributions are a relatively new phenomenon and offer some big advantages. You don’t have to install anything to run them. Just slot the CD or DVD into the drive and boot your machine. Et voila! If the developers have done their jobs correctly, the software should be configured to run properly without any further configuration. Live distributions may appeal to people who want to try a system out, those who want to demonstrate software to others, or those who want a portable Linux system for their own purposes. It is unlikely, however, that a live distribution will suffice as your primary bioinformatics system if you want to undertake serious bioinformatics work.

FULL SYSTEMS

Full systems customised for bioinformatics work are offered freely by a number of groups. Installed systems are very flexible. Unlike a Live distribution, you can always add extra software and customise to your heart’s content. The distributions reviewed here are available either by downloadable iso files (BioBrew and BioLand) or by network installation (Bio-Linux). Currently, BioBrew is the only distribution of the three reviewed that can also be purchased on DVD.

By nature a certain degree of knowledge is required for maintaining a machine running Linux, with the level required varying between the systems reviewed here. For example, if you are a biologist with little computing or systems knowledge, but you require access to a high powered bioinformatics computer cluster, BioBrew might be a distribution for you to consider, but doing this in collaboration with a skilled system administrator could make a huge difference.

It is also advisable to consider the level of security on these systems at the time of installation, and whether you need to take additional steps to keep out hackers. If you are going to use the machine within an organisation and connect it to the network, you may have to give assurances about the security of the system.

For bioinformatics systems providers, it may be worth using one of these distributions as a base system, which can then be further modified to meet your requirements. An example of such a project is the collaboration between Bio-Linux developers and Keith Jolley of the University of Oxford, who distributes this platform to users involved in ...

... Bioinformatics.org, or the bioinformatics section on Sourceforge and see how you can get involved.

Setting up your bioinformatics toolbox is just one step in the research process. In order to carry out meaningful analyses, you need to have a question to answer and an understanding of the context of that question. The good news is that you can keep up with many of the new developments in this area of science as access to the scientific literature is increasingly free.

The freedom to make packages and distributions available to the wider community comes back to the commitment of many software developers, and others, to the open source ethos. Allied with free availability of biological datasets, and access to scientific literature, this truly puts the world of biology and bioinformatics within the grasp of anyone willing to learn.

DDBJ: www.ddbj.nig.ac.jp
EnsEMBL: www.ensembl.org
UCSC Genome Browser: genome.ucsc.edu
Biobar: biobar.mozdev.org

Open

It has been to biology’s immense benefit that the ethos in the biological research community (and the requirement for publication in many journals) is that data be submitted to public ...

... (LinuxUser and Developer, issue 43) we expand on that information and outline issues that might help people new to this area choose which distributions to try first.

VLINUX

Developer: V. Vimalkumar
URL: bioinformatics.org/vlinux/
Availability: Free download of iso
Base system: Knoppix

VLinux contains a good range of bioinformatics software, mostly concentrating on sequence data manipulation and analysis, and structural viewing programs. A main benefit, apart from the range of software available, is the provision of graphical menu options for the bioinformatics software on the system. This includes a ...

... is, and applications from it also run from under the graphical menu system. Documentation for this system is good, with information available both on the website and on static pages included in the system.

Overall, this system is worth trying and might be suitable particularly for new users who have an interest in chemistry and structure in addition to sequence analysis programs.

BIOKNOPPIX

Developer: C.M. Rodriguez, High Performance Computing Facility, University of Puerto Rico
URL: bioknoppix.hpcf.upr.edu/
Availability: Free download of iso
Base System: Knoppix

BioKnoppix is probably the most well known of ...

... command line when you enter the system. As speed is one of the major issues when running Live CD distributions, having less memory intensive options than KDE, the window manager commonly run on Knoppix-derived systems, is very useful for those working on slower systems.

Overall, if you are happy using the command line and giving full paths to the programs, there are lots of programs to choose from on DNALinux. Otherwise, you may want to look at other distributions listed in this review.

QUANTIAN

Developer: Dirk Eddelbuettel, Debian
URL: dirk.eddelbuettel.com/quantian.html
Availability: free download of iso