Impacting the Bioscience Progress by Backporting Software for Bio-Linux
Sasa Paporovic [email protected] v0.9
What is Bio-Linux and what is it good for - also its drawbacks:
If someone says they use or have “a Linux”, this is correct, but it is also imprecise. Linux does not exist as a fully functional operating system by itself. What the term Linux originally meant was the operating system core[1], the so-called kernel, or, in the case of a Linux operating system, the Linux kernel. It was originally designed and programmed by Linus Torvalds. He is still its lead developer today or, to say it in his own words, the “alpha-male” of all developers[2]. Anyway, what we have today are distributions[3], and it has become common to simply call them “a Linux”. This means that there are organizations, mostly private, some publicly funded and some commercial, that gather everything needed to build a fully functional operating system around the Linux kernel. This mostly concerns software, but also web and service infrastructure. Some of them have a history nearly as long as that of the Linux kernel itself, like Debian. Others are younger, like Ubuntu, and some are very young, like Bio-Linux[4]. It is the last one, Bio-Linux, and especially its latest version, Bio-Linux 7, that we focus on here[5]. In 2006, Bio-Linux arose from the work of Tim Booth[42] and his team[6]. It was, and still is, an operating system specialized in providing a bioinformatics software environment for the working needs of this corner of bioscience. A software environment? Yes: as seen above, a Linux is more than just its kernel (the core of the system). Every Linux distribution incorporates all the software available for that distribution, and in this respect it is very different from Windows. This is done in a preparation step by the distribution's developers called packaging[7].
In the packaging process, the source code of a piece of software is gathered and compiled specifically for all the components that are to be integrated into the finished Linux distribution. The process produces a package of the software that should be, and usually is, installable smoothly on the system. This is done for every piece of software that should be available in the finalized distribution version. Over time, this has produced huge software pools called repositories, some of them containing several tens of thousands of programs[8]. Every installation of a distribution can fetch the available software from the central servers that host these pools and install it from there via package management tools built on dpkg or rpm[9]. Installing software without the prepared packages from the central servers is still possible. Except for very simple programs, however, this may require considerable skill, especially to make sure the system does not drift into an unusable state over time. So packaging provides trustworthy access to software (packages are verified with cryptographic signatures) and easy installation. But this forces Linux developers to do a much more intensive job than the developers at Microsoft. Microsoft employees only have to orchestrate a few hundred programs for the operating system they sell to run well. Any additional software is primarily an issue for the user/customer, and with it come most of the problems that external software may cause at the heart of Windows (e.g. incompatible .dll files)[10]. Linux developers have chosen to take on the much harder job. Through packaging, which is an extended form of system integration, they orchestrate the functionality of and between several tens of thousands of packages, i.e. programs[11]. In principle it is conceivable to do this continuously, without deadlines and without breaks in package/software integration, so that the latest software and software versions are available at all times.
This is called a rolling release[12]. Several distributions take this approach, and one of the best known is “Arch Linux”[13], which integrates the latest software independently of any timeline. But, you know, nothing comes without a price. Bringing the latest software into a production installation can make the system unstable and can cause subtle errors that are hard to foresee. As a consequence, most Linux distributions have decided against a rolling release in favor of smoother software integration. Instead they have development phases, so-called release cycles, with a defined end and a Linux version as the result. One example is Ubuntu[14], which will also be discussed here. In every development cycle new software is packaged and integrated, so one of the development objectives is to stabilize the imported software. Logically, at some point during the development cycle there is an import stop for new software and new software versions. After this point no new software is incorporated into that specific release version. After this import freeze, the stabilization of the whole software canon for the release version begins. As a consequence, every release version has a fixed software canon. Apart from a small number of exceptions, it is not changed anymore until development is finished, nor afterwards once the release is final/stable. So, depending on the length of development, there may be large time gaps between one final version and the next. During these gaps no software newer than the frozen software canon is available to users of the latest final version[15]. How does all this affect Bio-Linux? Simply put, Bio-Linux is also subject to a release cycle, and therefore the final releases of Bio-Linux do not ship the latest bioinformatics software. What is, and could be, done to get around this problem?
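The import freeze described above can be illustrated with a toy model. Given a package's upload history and a freeze date, the release keeps the newest version that arrived before the freeze, and everything that arrives later waits for the next release. This is a hypothetical Python sketch, not how Ubuntu or Debian actually manage their archives. The function name, the package name "aligner", its versions and all dates are invented for illustration, and "newest" is judged by arrival date rather than by version number.

```python
from datetime import date
from typing import Dict, List, Tuple

# (package name, version string, date the version entered the archive)
Upload = Tuple[str, str, date]


def release_canon(uploads: List[Upload], freeze: date) -> Dict[str, str]:
    """Newest version of each package that arrived before the import freeze.

    Hypothetical toy model: real archives use far more elaborate rules.
    """
    newest: Dict[str, Tuple[date, str]] = {}
    for package, version, arrived in uploads:
        if arrived >= freeze:
            continue  # too late for this release; it has to wait for the next one
        known = newest.get(package)
        if known is None or arrived > known[0]:
            newest[package] = (arrived, version)
    return {package: version for package, (_, version) in newest.items()}


if __name__ == "__main__":
    # Hypothetical upload history of a hypothetical package "aligner".
    history = [
        ("aligner", "1.0", date(2011, 10, 1)),
        ("aligner", "1.1", date(2012, 1, 10)),
        ("aligner", "2.0", date(2012, 5, 1)),
    ]
    # A release frozen on a (hypothetical) date keeps the last pre-freeze version.
    print(release_canon(history, freeze=date(2012, 2, 16)))  # {'aligner': '1.1'}
    # A rolling release has no freeze at all, so every upload is taken.
    print(release_canon(history, freeze=date.max))  # {'aligner': '2.0'}
```

In this model, version 2.0 exists in the archive but stays invisible to users of the frozen release until the next release cycle picks it up. That waiting period is the gap the text describes.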
To find an answer, let us look at where all the bioinformatics software in Bio-Linux 7 comes from. Mostly it starts in Debian Linux. There is a team that calls itself Debian-Med[16], and Andreas Tille[41] is one of its most active members. They decide, to the best of their knowledge, which bioinformatics software is good for Debian and, probably more remarkably, they do the packaging and system integration job[17,18]. And how does Ubuntu come into the game? Ubuntu is a bit parasitic to Debian. Oh, let me correct that: it is a symbiont. Ubuntu derives the software for its current development branch from the current testing or unstable branch of Debian[19]. In the other direction, Ubuntu provides bug reports and bug fixes to the Debian project. Additionally, many Ubuntu developers are also Debian developers (or the other way around)[14]. In terms of genealogy: Debian is the mother and Ubuntu the daughter. And what is the relation to Bio-Linux? Tim Booth and his team do with Ubuntu what the Ubuntu people do with Debian. They derive, but not directly from Debian. Instead they derive from Ubuntu's long-term supported versions, the so-called Ubuntu LTS releases. The latest derivations were Ubuntu 10.04 for Bio-Linux 6 and Ubuntu 12.04 for Bio-Linux 7. So Debian is the grandmother, Ubuntu the mother and Bio-Linux the daughter.
So, to summarize the game: the source code of any bioinformatics software for Debian-Med is selected, compiled and packaged by Andreas Tille[41] and the Debian-Med team. Sooner or later the software reaches the testing branch of Debian Linux and is derived from there into the LTS (long-term supported) version of Ubuntu Linux. Tim Booth[42] and the Bio-Linux team in turn derive the current Bio-Linux version from the current Ubuntu LTS version.
Figure 1: The software flow between different Linux distributions, and with it their genealogy. The involved workgroups are also named. Icons and images are from different online sources[16,37,38,39]. The graphic was designed with LibreOffice 4.1.2 on Ubuntu 13.10.
The drawback of the workflow: Ubuntu LTS versions appear only every two (even) years: Ubuntu 6.06 in 2006, Ubuntu 8.04 in April 2008, Ubuntu 10.04 in April 2010, Ubuntu 12.04 in April 2012, and the next will be 14.04 in April 2014[20]. Because Bio-Linux is derived only from the LTS versions and their two-year release cycle, fresh scientific software and software versions arrive only every two years, which is not good. The latest scientific developments are not available this way, and this slows down scientific progress. To bridge this gap, the Bio-Linux team incorporates additional software into Bio-Linux and delivers new software versions through its own repository, which is enabled by default in Bio-Linux. The team also gives back to Ubuntu and Debian and thereby helps prepare their next versions[21]. Is that all? No, there is yet another bridge over which new software can migrate into Bio-Linux. Let us have a look.
The backporting bridge for getting over the two-year gap
What is a backport? Your Bio-Linux 7 system gathers its software from different servers/sources called repositories[8]. Some of them are the original repositories for Ubuntu 12.04, from which Bio-Linux is derived.
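To make the idea of several configured repositories more concrete: on APT-based systems such as Ubuntu, repositories are configured as one-line entries in files such as /etc/apt/sources.list. Each entry names the package type, the server URI, the release (suite, e.g. "precise" for Ubuntu 12.04) and the archive components. The Python sketch below parses such classic one-line entries. It is a simplified illustration, not APT's own parser. The names `AptSource` and `parse_source_line` are hypothetical, and optional `[option=value]` blocks are skipped rather than interpreted.

```python
from typing import List, NamedTuple, Optional


class AptSource(NamedTuple):
    kind: str               # "deb" (binary packages) or "deb-src" (source packages)
    uri: str                # base URL of the repository server
    suite: str              # release name, e.g. "precise" for Ubuntu 12.04
    components: List[str]   # archive sections, e.g. ["main", "universe"]


def parse_source_line(line: str) -> Optional[AptSource]:
    """Parse one classic one-line sources.list entry (simplified sketch)."""
    line = line.split("#", 1)[0].strip()   # drop comments and surrounding blanks
    if not line:
        return None                        # empty or comment-only line
    tokens = line.split()
    kind = tokens.pop(0)
    if kind not in ("deb", "deb-src"):
        raise ValueError(f"unknown entry type: {kind!r}")
    # Skip an optional "[option=value ...]" block; its contents are ignored here.
    if tokens and tokens[0].startswith("["):
        while tokens and not tokens.pop(0).endswith("]"):
            pass
    if len(tokens) < 2:
        raise ValueError("entry needs at least a URI and a suite")
    uri, suite, *components = tokens
    return AptSource(kind, uri, suite, components)


if __name__ == "__main__":
    sources = [
        "# Ubuntu 12.04 (precise) archive",
        "deb http://archive.ubuntu.com/ubuntu precise main universe",
        "deb-src http://archive.ubuntu.com/ubuntu precise main",
    ]
    # Comment-only lines yield None and are filtered out.
    for entry in filter(None, map(parse_source_line, sources)):
        print(entry.kind, entry.uri, entry.suite, entry.components)
```

Additional repositories, such as the Bio-Linux one, appear as further lines of the same form. The package manager then considers packages from all of them together.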