Robust Cross-Platform Workflows: How Technical and Scientific Communities Collaborate to Develop, Test and Share Best Practices for Data Analysis


Data Sci. Eng. (2017) 2:232–244
https://doi.org/10.1007/s41019-017-0050-4

Steffen Möller (1,2), Stuart W. Prescott (2,3), Lars Wirzenius (2,4), Petter Reinholdtsen (2,5), Brad Chapman (6), Pjotr Prins (7), Stian Soiland-Reyes (8,9,10), Fabian Klötzl (11), Andrea Bagnacani (12), Matúš Kalaš (13), Andreas Tille (2), Michael R. Crusoe (2,9)

Received: 13 June 2017 / Revised: 1 October 2017 / Accepted: 18 October 2017 / Published online: 16 November 2017
© The Author(s) 2017. This article is an open access publication

Corresponding author: Steffen Möller, [email protected]

Abstract: Information integration and workflow technologies for data analysis have always been major fields of investigation in bioinformatics. A range of popular workflow suites are available to support analyses in computational biology. Commercial providers tend to offer prepared applications remote to their clients. However, for most academic environments with local expertise, novel data collection techniques or novel data analysis, it is essential to have all the flexibility of open-source tools and open-source workflow descriptions. Workflows in data-driven science such as computational biology have considerably gained in complexity. New tools or new releases with additional features arrive at an enormous pace, and new reference data or concepts for quality control are emerging. A well-abstracted workflow and the exchange of the same across work groups have an enormous impact on the efficiency of research and the further development of the field. High-throughput sequencing adds to the avalanche of data available in the field; efficient computation and, in particular, parallel execution motivate the transition from traditional scripts and Makefiles to workflows. We here review the extant software development and distribution model with a focus on the role of integration testing and discuss the effect of the common workflow language on distributions of open-source scientific software to swiftly and reliably provide the tools demanded for the execution of such formally described workflows. It is contended that, alleviated from technical differences for the execution on local machines, clusters or the cloud, communities also gain the technical means to test workflow-driven interaction across several software packages.

Author affiliations:
1 Rostock University Medical Center, Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock, Germany
2 Debian Project, https://www.debian.org
3 School of Chemical Engineering, UNSW, Sydney, NSW 2052, Australia
4 QvarnLabs, Helsinki, Finland
5 University Center for Information Technology, University of Oslo, Oslo, Norway
6 Harvard School of Public Health, Boston, MA, USA
7 University Medical Center Utrecht, Utrecht, The Netherlands
8 eScience Lab, School of Computer Science, The University of Manchester, Manchester, UK
9 Common Workflow Language Project, http://www.commonwl.org
10 Apache Software Foundation, Forest Hill, MD, USA
11 Max-Planck-Institute for Evolutionary Biology, Plön, Germany
12 Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
13 Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway

Keywords: Continuous integration testing · Common workflow language · Container · Software distribution · Automated installation

1 Introduction

An enormous amount of data is available in public databases, institutional data archives or generated locally. This remote wealth is immediately downloadable, but its interpretation is hampered by the variation of samples and their biomedical condition, the technological preparation of the sample and data formats. In general, all sciences are challenged with data management, and particle physics, astronomy, medicine and biology are particularly known for data-driven research.

The influx of data further increases with more technical advances and higher acceptance in the community. With more data, the pressure rises on researchers and service groups to perform analyses quickly. Local compute facilities grow and have become extensible by public clouds, which all need to be maintained and the scientific execution environment be prepared.

Software performing the analyses steadily gains functionality, both for the core analyses and for quality control. New protocols emerge that need to be integrated in existing pipelines for researchers to keep up with the advances in knowledge and technology. Small research groups with a focus on biomedical research sensibly avoid the overhead entailed in developing full solutions to the analysis problem or becoming experts in all details of these long workflows, instead concentrating on the development of a single software tool for a single step in the process. Best practices emerge, are evaluated in accompanying papers and are then shared with the community in the most efficient way [37].

Over the past 5 years, with the avalanche of high-throughput sequencing data in particular, the pressure on research groups has risen drastically to establish routines in non-trivial data processing. Companies like Illumina offer hosted bioinformatics services (http://basespace.illumina.com/), which developed into a platform in its own right. This paper suggests workflow engines to take the position of a core interface between the users and a series of communities to facilitate the exchange, verification and efficient execution of scientific workflows [32]. Prominent examples of workflow and workbench applications are Apache Taverna [38], with its seeds in the orchestration of web interfaces; Galaxy [1, 11], which is popular for allowing the end-users' modulation of workflows in nucleic acid sequencing analysis and other fields; the lightweight, highly compatible Nextflow [8]; and KNIME [5], which has emerged from machine learning and cheminformatics environments and is now also established for bioinformatics routine sequence analyses [13].¹

Should all these workflow frameworks bring the software for the execution environment with them (Fig. 1), or should they depend on system administrators to provide the executables and data they need?

¹ For a growing list of alternative workflow engines and formalisms, see https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems and https://github.com/pditommaso/awesome-pipeline. An overview of workbench applications in bioinformatics is given in [23, 36] and [18], pp. 35–39, http://bora.uib.no/bitstream/handle/1956/10658/thesis.pdf#page=35.

Fig. 1  Workflow specifications comprise formal references to a range of packages to be installed without specifying the execution environment. The separation between problem-specific software and the packages a Linux distribution provides is fluid.

2 Methods

The foundations of functional workflows are sources for readily usable software. The software tools of interest are normally curated by maintainers into installable units that we will generically call "packages". We reference the following package managers that can be installed on a single system and, together, represent all bioinformatics open-source software packages in use today:

• Debian GNU/Linux (http://www.debian.org) is a software distribution that encompasses the kernel (Linux) plus a large body of other user software, including graphical desktop environments, server software and specialist software for scientific data processing. Debian and its derivatives share the deb package format, with a long history of community support for bioinformatics packages [26, 27].

• GNU Guix (https://www.gnu.org/software/guix/) is a package manager of the GNU project that can be installed on top of other Linux distributions and represents the most rigorous approach towards dependency management. GNU Guix packages are uniquely isolated by a hash value computed over all inputs, including the source package, the configuration and all dependencies. This means that it is possible to have multiple versions of the same software, and even different combinations of software, e.g. Apache with ssl and without ssl compiled in, on a single system.

• Conda (https://conda.io/docs/) is a package installation tool that, while popularised by the Anaconda Python distribution, can be used to manage software written in any language. Coupled with Bioconda [7] (https://bioconda.github.io/), its software catalogue provides immediate access to many common bioinformatics packages (a minimal environment sketch follows this list).

• Brew (https://brew.sh) is a package manager that distributes rules to compile source packages, originally designed to work on macOS but capable of managing software on Linux environments as well.
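As a concrete illustration of how such a catalogue is consumed, the following is a minimal sketch of a Conda environment description pulling two widely used bioinformatics tools from the Bioconda channel; the environment name and the chosen tools and versions are illustrative assumptions, not taken from the paper.

```yaml
# environment.yml -- illustrative Conda/Bioconda environment definition.
# "conda env create -f environment.yml" resolves and installs the pinned
# packages into an isolated environment, independent of the host system.
name: ngs-tools              # hypothetical environment name
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - samtools=1.3.1           # utilities for SAM/BAM alignment files
  - bwa=0.7.17               # short-read aligner
```

Comparable installation recipes exist for the other package managers listed above; what differs is how rigorously each one isolates the installed versions from the rest of the system.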
The common workflow language (CWL) [3] is a set of open, community-made standards with a reference implementation for maintaining workflows, with a particular strength for the execution of command-line tools. CWL will also be adopted by already established workflow suites like Galaxy and Apache Taverna. It is also of interest as an abstraction layer to reduce complexity in current hard-coded pipelines. The CWL standards provide

• formalisms to derive command-line invocations to software
• modular, executable descriptions of workflows with auto-fulfillable specifications of run-time dependencies (see the sketch below)

The community embracing the CWL standards provides versions of two
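To make the two bullet points above concrete, here is a minimal, illustrative sketch of a CWL v1.0 CommandLineTool description; it is not taken from the paper, and the wrapped tool (samtools), the pinned version and the file names are assumptions chosen for illustration. The SoftwareRequirement hint is the CWL mechanism for declaring a run-time dependency abstractly, so that the execution environment may resolve it through a package manager or container of its choice.

```yaml
#!/usr/bin/env cwl-runner
# Minimal, illustrative CWL tool description: sorts a BAM file with samtools.
# The SoftwareRequirement hint names the tool dependency; how it is satisfied
# (Debian, Guix, Conda/Bioconda, Brew or a container image) is left to the
# execution environment.
cwlVersion: v1.0
class: CommandLineTool

baseCommand: [samtools, sort]

hints:
  SoftwareRequirement:
    packages:
      - package: samtools
        version: [ "1.3.1" ]   # illustrative pinned version

inputs:
  unsorted_bam:
    type: File
    inputBinding:
      position: 1              # placed after the fixed arguments below

arguments: [ "-o", "sorted.bam" ]

outputs:
  sorted_bam:
    type: File
    outputBinding:
      glob: sorted.bam
```

A CWL runner derives the concrete invocation (here, roughly `samtools sort -o sorted.bam <unsorted_bam>`) from the typed inputs and arguments, so the same description can run unchanged on a local machine, a cluster or in the cloud.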