An Effective Git and Org-Mode Based Workflow for Reproducible Research

Total Page:16

File Type:pdf, Size:1020Kb

An Effective Git and Org-Mode Based Workflow for Reproducible Research An Effective Git And Org-Mode Based Workflow For Reproducible Research Luka Stanisic, Arnaud Legrand and Vincent Danjean fi[email protected] CNRS/Inria/Univ. of Grenoble, Grenoble, France ABSTRACT is no reason why this should not be applied to computer In this paper we address the question of developing a science as well. lightweight and effective workflow for conducting experimen- Hence, a new movement promoting the development of tal research on modern parallel computer systems in a repro- reproducible research tools and practices has emerged, espe- ducible way. Our approach builds on two well-known tools cially for computational sciences. Such tools generally focus (Git and Org-mode) and enables to address, at least par- on replicability of data analysis [14]. Although high perfor- tially, issues such as running experiments, provenance track- mance computing or distributed computing experiments in- ing, experimental setup reconstruction or replicable analysis. volve running complex codes, they do not focus on execution We have been using such a methodology for two years now results but rather on the time taken to run a program and on and it enabled us to recently publish a fully reproducible how the machine resources were used. These requirements article [12]. To fully demonstrate the effectiveness of our call for different workflows and tools, since such experiments proposal, we have opened our two year laboratory notebook are not replicable by essence. Nevertheless in such cases, re- with all the attached experimental data. This notebook and searchers should still at least aim at full reproducibility of the underlying Git revision control system enable to illus- their work. trate and to better understand the workflow we used. There are many existing solutions partially addressing these issues, some of which we present in Section3. However, none of them was completely satisfying the needs and con- 1. INTRODUCTION straints of our experimental context. Therefore, we decided In the last decades, both hardware and software of modern to develop an alternative approach based on two well-known computers have become increasingly complex. Multi-core and widely-used tools: Git and Org-mode. The contribu- architectures comprising several accelerators (GPUs or the tions of our work are the following: Intel Xeon Phi) and interconnected by high-speed networks have become mainstream in the field of High-Performance. • We propose and describe in Section 4.1 a Git branching Obtaining the maximum performance of such heterogeneous model for managing experimental results synchronized machines requires to rely on combinations of complex soft- with the code that generated them. We used such ware stacks ranging from the compilers to architecture spe- branching model for two years and have thus identified cific libraries or languages (e.g., CUDA, OpenCL, MPI) and a few typical branching and merging operations, which complex dynamic runtimes (StarPU, Charm++, OmpSs, we describe in Section 4.3 etc.) [1]. Dynamic and opportunistic optimizations are done at every level and even experts have troubles fully under- • Such branching model eases provenance tracking, ex- standing the performance of such complex systems. Such periments reproduction and data accessibility. How- machines can no longer be considered as deterministic, espe- ever, it does not address issues such as documentation cially when it comes to measuring execution times of paral- of the experimental process and conclusions about the lel multi-threaded programs. Controlling every relevant so- results, nor the acquisition of meta-data about the ex- phisticated component during such measurements is almost perimental environment. To this end, in Section 4.2 impossible even in single processor cases [8], making the we complete our branching workflow with an intensive full reproduction of experiments extremely difficult. Con- use of Org-mode [11], which enables us to manage and sequently, studying computers has become very similar to keep in sync experimental results and meta-data. It studying a natural phenomena and it should thus use the also provides literate programming, which is very con- same principles as other scientific fields that had them de- venient in a laboratory notebook and eases the edition fined centuries ago. Although many conclusions are com- of reproducible articles. We explain in Section 4.3 how monly based on experimental results in this domain of com- the laboratory notebook and the Git branching can be puter science, articles generally poorly detail the experimen- nicely integrated to ease the set up of a reproducible tal protocol. Left with insufficient information, readers have article. generally troubles reproducing the study and possibly build upon it. Yet, as reminded by Drummond [4], reproducibility • Through the whole Section4, we demonstrate the of experimental results is the hallmark of science and there effectiveness of this approach by providing examples and opening our two years old Git repository at http: Copyright is held by the authors //gforge.inria.fr/projects/starpu-simgrid. We illustrate several points in the discussion by pointing directly to coherent, well-structured and reproducible way that allows specific commits. anyone to inspect how they were performed. • Although we open our whole Git repository for illus- 2.2 Case Study #2: Studying the Performance tration purposes, this is not required by our workflow. of CPU Caches There may be situations where researchers may want to share only parts of their work. We discuss in Sec- Another use case that required a particular care is the tion5 various code and experimental data publishing study of CPU cache performance on various Intel and ARM options that can be used within such a workflow. micro-architectures [13]. The developed source code was quite simple, containing only a few files, but there were nu- • Finally, we discuss in Section6 how the proposed merous input, compilation and operating system parameters methodology helped us conduct two very different to be taken into account. Probably the most critical part of studies in the High Performance Computing (HPC) this study was the environment setup, which proved to be domain. We also report limits of our approach, to- unstable, and thus, responsible for many unexpected phe- gether with some open questions. nomena. Therefore, it was essential to capture, understand and easily compare as much meta-data as possible. 2. MOTIVATION AND USE CASE DE- Limited resources (both in terms of memory, CPU power and disk space) and software packages restricted availability SCRIPTION on ARM processors have motivated some design choices of Our research is centered on the modeling of the perfor- our methodology. In particular, this required our solution mance of modern computer systems. To validate our ap- to be lightweight and with minimal dependencies. Although proach and models, we have to perform numerous measure- this research topic has not led to a reproducible article yet, ments on a wide variety of machines, some of which being we used the proposed workflow and can still track the whole sometimes not even dedicated. In such context, a presum- history of these experiments. ably minor misunderstanding or inaccuracy about some pa- rameters at small scale can result in a totally different behav- ior at the macroscopic level [8]. It is thus crucial to carefully 3. RELATED WORK collect all the useful meta-data and to use well-planed ex- In the last several years, the field of reproducibility re- periment designs along with coherent analyses, such details search tools has been very active, various alternatives emerg- being essential to the reproduction of experimental results. ing to address diverse problematic. However, only few of Our goal was however not to implement a new tool, but them are appropriate for running experiments and mea- rather to find a good combination of already existing ones suring execution time on large, distributed, hybrid systems that would allow a painless and effective daily usage. We comprising prototype hardware and software. The ultimate were driven by purely pragmatic motives, as we wanted to goal of reproducible research is to bridge the current gap keep our workflow as simple and comprehensible as possible between the authors and the article readers (see Figure1) while offering the best possible level of confidence in the by providing as much material as possible on the scientist reproducibility of our results. choices and employed artifacts. We believe that opening a In the rest of this section, we present two research topics laboratory notebook is one step in that direction since not we worked on, along with their specifics and needs regard- only the positive results are shared in the final article but ing experimentation workflow. These needs and constraints also all the negative results and failures that occurred during motivated several design choices of the workflow we present the process. in Section4. There are successful initiatives in different computer sci- ence fields (e.g., DETER [7] in cybersecurity experimenta- 2.1 Case Study #1: Modeling and Simulating tion or PlanetLab in the peer-to-peer community) that pro- Dynamic Task-Based Runtimes vide good tools for their users. However, when conduct- Our first use case aims at providing accurate perfor- ing experiments and analysis in the HPC domain, several mance predictions of dense linear algebra kernels on hybrid specific aspects need to be considered. We start by de- architectures (multi-core architectures comprising multiple scribing the ones related to the experimental platform setup GPUs). We modeled the StarPU runtime [1] on top of and configuration. Due to the specific nature of our re- the SimGrid simulator [3]. To evaluate the quality of our search projects (prototype platforms with limited control modeling, we had to conduct experiments on hybrid proto- and administration permissions), we have not much ad- type hardware whose software stack (e.g., CUDA or MKL dressed these issues ourselves although we think that they libraries) could evolve along time.
Recommended publications
  • Gforge-Lists-Mailman Apache2 0. Mailman 0. Gforge-Db-Postgresql
    llvm-2.8-dev mongodb-clients libfpdi-php lazarus-ide-gtk2 libecpg6 libtulip-ogl-dev gnome-mime-data 0. 0. 0. libparrot3.3.0 libwnck-3-0 0. 0. libjaxp1.3-java-gcj 0. libusbmuxd1 erlang-public-key 0. 1.93236714976 0. llvm-2.8 mongodb-dev libfpdf-tpl-php 0. libecpg-compat3 0. lazarus-ide libgemplugin-ruby1.8 0. 0. 2.24719101124 0. libtulip-dev 3.33333333333 0. 0. 0. 0. 0. 0. 0. 0. libgnomevfs2-0 gnome-desktop3-data 0. 0. 0. 1.68269230769 libopenr2-3 gir1.2-wnck-3.0 0. libxalan2-java-gcj 2.22222222222 libimobiledevice1 3.22580645161 erlang-inets 0. libgpg-error0 libblkid1 0.24154589372 libparrot-dev parrot llvm-2.8-runtime mongodb-server pnp4nagios-web libpgtypes3 libtulip-qt4-dev libnet-daemon-perl lazarus-src 0.0. 0. 0. 0. libgemplugin-ruby kdebase-workspace-data libecore-input1 0. libbft-dev libgtk2.0-common 0. 0. 0. 0.70949940875 0.749506903353 libeet-dev python-lazr.uri 0. 4.54674623473 0. cl-alexandria libxau6 0. pia freespacenotifier libksieve4 libtasn1-3 libgnutls26 0.45045045045 libwnck-3-common usbmuxd erlang-ssl libxerces2-java-gcj libgnomevfs2-common 0.0426257459506 0. gir1.2-gnomedesktop-3.0 fxload 0. dahdi-linux 0. 0. libcgns-dev 0. 0. 0. 0. mount initscripts sysvinit-utils 0. 0. reportbug libdbi-perl geant321-data 0.0394477317554 0.355029585799 0. 0. 4.07786302927 llvm-2.9 libecore-evas1 libecore-fb1 0. 0. 0.1574803149611.06709047565 libatk1.0-0 0. sugar-presence-service-0.90 0. 0. 0. 0. parrot-minimal mongrel 1.41899881751.45956607495 libgcrypt11 0. ksysguard kdebase-workspace-bin 0.
    [Show full text]
  • Gestión De Proyectos Software
    Proyecto Fin de Carrera AITForge: Gestión de Proyectos Software Autor: Antonio Domingo Lagares Alfaro Titulación: Ingeniero de Telecomunicación (Plan 98) Especialidad: Telemática Año: 2005 Tutor: Antonio Estepa Alonso AITForge: Gestión de Proyectos Software Índice de contenido 1 Prefacio..................................................................................................................6 2 Portales de Desarrollo Colaborativo......................................................................7 2.1 Introducción a los Entornos Colaborativos....................................................8 2.1.1 Hosting de Proyectos de Software Libre (FOSPHost)...........................9 2.1.2 ¿Software Libre y Software de Fuentes Abiertas?...............................10 2.1.3 Prácticas deseables en Software Libre................................................11 2.1.4 El nacimiento de una nueva filosofía de trabajo...................................13 2.1.5 Objetivos de los sistemas libres de FOSPHost....................................14 2.1.6 Principales características de los sistemas FOSPHost........................16 Características intrínsecas..........................................................................16 Características de utilidad...........................................................................17 Características de usabilidad......................................................................18 Características contextuales.......................................................................19 2.1.7
    [Show full text]
  • This Book Doesn't Tell You How to Write Faster Code, Or How to Write Code with Fewer Memory Leaks, Or Even How to Debug Code at All
    Practical Development Environments By Matthew B. Doar ............................................... Publisher: O'Reilly Pub Date: September 2005 ISBN: 0-596-00796-5 Pages: 328 Table of Contents | Index This book doesn't tell you how to write faster code, or how to write code with fewer memory leaks, or even how to debug code at all. What it does tell you is how to build your product in better ways, how to keep track of the code that you write, and how to track the bugs in your code. Plus some more things you'll wish you had known before starting a project. Practical Development Environments is a guide, a collection of advice about real development environments for small to medium-sized projects and groups. Each of the chapters considers a different kind of tool - tools for tracking versions of files, build tools, testing tools, bug-tracking tools, tools for creating documentation, and tools for creating packaged releases. Each chapter discusses what you should look for in that kind of tool and what to avoid, and also describes some good ideas, bad ideas, and annoying experiences for each area. Specific instances of each type of tool are described in enough detail so that you can decide which ones you want to investigate further. Developers want to write code, not maintain makefiles. Writers want to write content instead of manage templates. IT provides machines, but doesn't have time to maintain all the different tools. Managers want the product to move smoothly from development to release, and are interested in tools to help this happen more often.
    [Show full text]
  • Cross-Platform Environment for Application Life Cycle Management
    International Journal “Information Theories and Applications”, Vol. 24, Number 2, © 2017 177 CROSS-PLATFORM ENVIRONMENT FOR APPLICATION LIFE CYCLE MANAGEMENT Elena Chebanyuk, Oleksii Hlukhov Abstract: “Application Lifecycle Management (ALM) integrates and governs the planning, definition, design, development, testing, deployment, and management phases throughout the application lifecycle” [OMG, 2006]. This paper is devoted to designing of ALM for supporting all software development processes. A review of papers, making strong contribution for improving software development life cycle processes is represented. This review touches three branches of investigation, namely papers, related to: (1) improving of communication processes between stakeholders; (2) increasing effectiveness of some operations in software development life cycle processes; (3) developing fundamental methods and tools for performing different operations related to several software development life cycle. Then comparative analysis of such ALM environments as Visual Studio, Team Foundation Server, FusionForge, TeamForge, IBM Rational Team Concert, IBM Rational Software Architect, and IBM Rational Functional Tester, is performed. Comparison of different ALM environments’ functionality lets to formulate requirements for designing cross-platform ALM environment. Then the conceptual schema of cross-platform ALM based on Eclipse environment is proposed. All plugins’ functionalities were properly tested. Collaboration of plugins for supporting several software development tasks
    [Show full text]
  • Download, Copy, Distribute, Print, Search, Or Link to the Full Texts of Articles
    Interdisciplinary Journal of Problem-Based Learning Volume 4 | Issue 2 Article 6 Published online: 9-19-2010 An Investigation of an Open-Source Software Development Environment in a Software Engineering Graduate Course Xun Ge University of Oklahoma, [email protected] Kun Huang University of Oklahoma, [email protected] Yifei Dong Openseed Institute, [email protected] IJPBL is Published in Open Access Format through the Generous Support of the Teaching Academy at Purdue University, the School of Education at Indiana University, and the Jeannine Rainbolt College of Education at the University of Oklahoma. Recommended Citation Ge, X. , Huang, K. , & Dong, Y. (2010). An Investigation of an Open-Source Software Development Environment in a Software Engineering Graduate Course. Interdisciplinary Journal of Problem-Based Learning, 4(2). Available at: https://doi.org/10.7771/1541-5015.1120 This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information. This is an Open Access journal. This means that it uses a funding model that does not charge readers or their institutions for access. Readers may freely read, download, copy, distribute, print, search, or link to the full texts of articles. This journal is covered under the CC BY-NC-ND license. An Investigation of an Open-Source Software Development Environment in a Software Engineering Course Xun Ge, Kun Huang, and Yifei Dong Abstract A semester-long ethnography study was carried out to investigate project-based learning in a graduate software engineering course through the implementation of an Open-Source Software Development (OSSD) learning environment, which featured authentic projects, learning community, cognitive apprenticeship, and technology affordances.
    [Show full text]
  • Githosting Public Git Hosting Sites
    GitHosting From Git SCM Wiki Public Git hosting sites Here are some places that provide free Git hosting. Check on GitServer if you want to host your own repository. List is limited to sites that provide explicit Git hosting, not including generic hosting sites that can be used to host Git repositories. Framework is Support for Open-source Space Free private Provider open-source? other SCM repositories (GB) repositories Assembla (http://www.assembla.com/catalog/51-free-private- No SVN/Hg/P4 Yes 0.15 1 project, 3 users git-repository-package?type=private&ad=git-wiki) Beanstalk (http://beanstalkapp.com/?ad=git-wiki) No SVN No 0.1 1 projects, 1 user Unlimited projects, 5 bitbucket.org (http://bitbucket.org/) No Mercurial Yes Unlimited collaborators 5 repositories, 5 Codetidy (https://codetidy.net) No No No 0.1 collaborators Public access 1 project (unlimited Codebase (http://codebasehq.com) No Mercurial/SVN 0.05 available repos), 2 collaborators CloudForge (http://www.cloudforge.com/) No CVS/SVN Yes 0.2 1 user only Unlimited projects, 6 Deveo (https://deveo.com/) No Mercurial/SVN No Unlimited collaborators Unlimited projects, 10 GitEnterprise (http://www.gitenterprise.com/) No No No 1 collaborators GitHub (http://github.com/) No SVN Yes Unlimited No Unlimited projects, GitLab.com (https://about.gitlab.com/gitlab-com/) Yes No Yes Unlimited unlimited collaborators Pikacode (http://pikacode.com/) No Mercurial Yes 1 No 1 project, 2 ProjectLocker (http://www.projectlocker.com) No SVN Read-only http 0.2 collaborators repo.or.cz (http://repo.or.cz/)
    [Show full text]
  • Collaborative Software Development Using R-Forge
    INVITED SECTION:THE FUTURE OF R 9 Collaborative Software Development Using R-Forge Special invited paper on “The Future of R” Forge, host a project-specific website, and how pack- age maintainers submit a package to the Compre- by Stefan Theußl and Achim Zeileis hensive R Archive Network (CRAN, http://CRAN. R-project.org/). Finally, we summarize recent de- velopments and give a brief outlook to future work. Introduction Open source software (OSS) is typically created in a R-Forge decentralized self-organizing process by a commu- nity of developers having the same or similar inter- R-Forge offers a central platform for the develop- ests (see the famous essay by Raymond, 1999). A ment of R packages, R-related software and other key factor for the success of OSS over the last two projects. decades is the Internet: Developers who rarely meet R-Forge is based on GForge (Copeland et al., face-to-face can employ new means of communica- 2006) which is an open source fork of the 2.61 Source- tion, both for rapidly writing and deploying software Forge code maintained by Tim Perdue, one of the (in the spirit of Linus Torvald’s “release early, release original SourceForge authors. GForge has been mod- often paradigm”). Therefore, many tools emerged ified to provide additional features for the R commu- that assist a collaborative software development pro- nity, namely a CRAN-style repository for hosting de- cess, including in particular tools for source code velopment releases of R packages as well as a quality management (SCM) and version control.
    [Show full text]
  • Managing Large-Scale, Distributed Systems Research Experiments with Control-Flows Tomasz Buchert
    Managing large-scale, distributed systems research experiments with control-flows Tomasz Buchert To cite this version: Tomasz Buchert. Managing large-scale, distributed systems research experiments with control-flows. Distributed, Parallel, and Cluster Computing [cs.DC]. Université de Lorraine, 2016. English. tel- 01273964 HAL Id: tel-01273964 https://tel.archives-ouvertes.fr/tel-01273964 Submitted on 15 Feb 2016 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. École doctorale IAEM Lorraine MANAGINGLARGE-SCALE,DISTRIBUTED SYSTEMSRESEARCHEXPERIMENTSWITH CONTROL-FLOWS THÈSE présentée et soutenue publiquement le 6 janvier 2016 pour l’obtention du DOCTORATDEL’UNIVERSITÉDELORRAINE (mention informatique) par TOMASZBUCHERT Composition du jury : Rapporteurs Christian Pérez, Directeur de recherche Inria, LIP, Lyon Eric Eide, Research assistant professor, Univ. of Utah Examinateurs François Charoy, Professeur, Université de Lorraine Olivier Richard, Maître de conférences, Univ. Joseph Fourier, Grenoble Directeurs de thèse Lucas Nussbaum, Maître de conférences, Université de Lorraine Jens Gustedt, Directeur de recherche Inria, ICUBE, Strasbourg Laboratoire Lorrain de Recherche en Informatique et ses Applications – UMR 7503 ABSTRACT(ENGLISH) Keywords: distributed systems, large-scale experiments, experiment control, busi- ness processes, experiment provenance Running experiments on modern systems such as supercomputers, cloud infrastructures or P2P networks became very complex, both technically and methodologically.
    [Show full text]
  • (RES) Architecture Study
    Reuse Enablement System (RES) Architecture Study Prepared by: NASA Earth Science Data Systems – Software Reuse Working Group July 23, 2007 Updated February 14, 2008 Earth Science Data Systems Software Reuse Working Group Editor: James Marshall (Innovim / NASA Goddard Space Flight Center) Contributing Working Group Members: Angelo Bertolli (Innovim / NASA Goddard Space Flight Center) Nancy Casey (Science Systems and Applications, Inc. / NASA Goddard Space Flight Center) Bradford Castalia (University of Arizona) Victor Delnore * (NASA Langley Research Center) Robert R. Downs (Columbia University / NASA Socioeconomic Data and Applications Center) Ryan Gerard (Innovim / NASA Goddard Space Flight Center) Mary Hunter (Innovim / NASA Goddard Space Flight Center) Shahin Samadi (Innovim / NASA Goddard Space Flight Center) Mark Sherman (SGT Inc. / NASA Goddard Space Flight Center) Ross Swick (National Snow and Ice Data Center, University of Colorado – Boulder) Curt Tilmes (NASA Goddard Space Flight Center) Robert Wolfe * (NASA Goddard Space Flight Center) * Co-chair Working Group Participants: Nadine Alameh (MobiLaps LLC) Howard Burrows (Autonomous Undersea Systems Institute / National Science Digital Library) Yonsook Enloe (SGT Inc. / NASA Goddard Space Flight Center) Stefan Falke (Washington University in St. Louis) Michael Folk (National Center for Supercomputing Applications) Emily Greene (Raytheon Company) Tommy Jasmin (University of Wisconsin-Madison, Space Science and Engineering Center) Steve Olding (Everware / NASA Goddard Space Flight Center) Bill Teng (Science Systems and Applications, Inc. / NASA Goddard Space Flight Center) Fred Watson (California State University, Monterey Bay) Acknowledgements: Michael Chyatte (Computer Science Corporation / NASA Goddard Space Flight Center) – SourceMotel site running GForge Thomas Clune (NASA Goddard Space Flight Center) – SourceMotel site running GForge Kathy Fontaine (NASA Goddard Space Flight Center) – Earth Science Data System Working Group Coordinator Gene Major (Science Systems and Applications, Inc.
    [Show full text]
  • Contributions for Resource and Job Management in High Performance Computing Yiannis Georgiou
    Contributions for Resource and Job Management in High Performance Computing Yiannis Georgiou To cite this version: Yiannis Georgiou. Contributions for Resource and Job Management in High Performance Computing. Distributed, Parallel, and Cluster Computing [cs.DC]. Université de Grenoble, 2010. English. tel- 01499598 HAL Id: tel-01499598 https://hal-auf.archives-ouvertes.fr/tel-01499598 Submitted on 31 Mar 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Public Domain THÈSE Pour obtenir le grade de DOCTEUR DE L’UNIVERSITÉ DE GRENOBLE Spécialité : Informatique Arrêté ministériel : 7 août 2006 Présentée par « Yiannis GEORGIOU » Thèse dirigée par « Olivier RICHARD » et codirigée par « Jean-Francois MEHAUT » préparée au sein du Laboratoire d'Informatique de Grenoble dans l'École Doctorale de Mathématiques, Sciences et Technologies de l'Information, Informatique Contributions for Resource and Job Management in High Performance Computing Thèse soutenue publiquement le « 5 novembre 2010 », devant le jury composé de : M. Daniel, HAGIMONT Professeur à INPT/ENSEEIHT, France, Président M. Franck, CAPPELLO Directeur de Recherche à INRIA, France, Rapporteur M. William T.C., KRAMER Directeur de Recherche à NCSA, USA, Rapporteur M. Morris, JETTE Informaticien au LLNL, USA, Membre Mme.
    [Show full text]
  • Postgresql the Postgresql.Org Infrastructure
    PostgreSQL the postgresql.org infrastructure Stefan Kaltenbrunner [email protected] http://www.kaltenbrunner.cc/blog Froscon 2009 - Sankt Augustin, Germany Agenda Infrastucture? Facts and figures Webservices Operational aspects The future A marketing problem? -sysadmins is like -core it's just that -core is doing their “secret stuff” in public ... ... -core is listed on the website but figuring out who is the sysadmin team and how to contact them is way more difficult German core member on IRC Infrastructure - why? Infrastructure - wth? cvs, website, wiki, mailinglists – something else? services pugs, pgfoundry, wwwmaster, wiki anoncvs, git hosting, dns, monitoring web and ftp-mirrors, ftp-master archives, pmt, jabber, pgweb development hosts planet Who? pgfoundry team (gforge, project approval, support) web weam (website and webservices) sysadmin team (servers, monitoring, operations) Subteams (translation, commitfest,...) Agenda Infrastructure? Facts and figures Webservices Operations The future Facts and figures ~60 Hosts (~75% community managed) 400 Service FreeBSD, Debian, Ubuntu, Slackware, CentOS, Redhat Enterprise Linux, VMware Server Austria, Canada, France, Israel, Panama, Sweden, USA Facts and figures Agenda Infrastructure? Facts and figures Webservices Operations The future? The website http://www.postgresql.org - static http://wwwmaster.postgresql.org - dynamic master subversion repository 3-4 active commiters PHP with PostgreSQL backend hourly snapshots & replication Interfaces
    [Show full text]
  • Open Source: Open for Business Open Source: Open for Business for Open Source: Open LEF09/3Cover.Qxd 9/3/04 1:18 PM Page 2
    LEF09/3cover.qxd 9/3/04 1:18 PM Page 1 THE LEADING PRESENTS: FORUM EDGE THE LEADING EDGE FORUM PRESENTS: Open Source: Open for Business Open Source: Open for Business LEF09/3cover.qxd 9/3/04 1:18 PM Page 2 CSC’s Leading Edge Forum is a global thought leadership program that examines the technology trends and issues affecting us today and those that will impact us in the future. As part of the CSC Office of Innovation, the LEF explores emerging technologies through sponsored inno- vation and grants programs, applied research, awards for the most innovative client solutions, and alliances with research labs. The LEF examines technology marketplace ABOUT THE LEF DIRECTORS trends and best practices, and stimulates innovation and collaboration among CSC, our clients and our alliance partners. In this ongoing series of reports about technology directions, the LEF looks at the role of innovation in the marketplace both now and in the years to come. By studying technology’s current realities and anticipating its future shape, these reports provide organizations with the necessary balance between tactical decision making and strategic planning. PAUL GUSTAFSON WILLIAM KOFF Director, Leading Edge Forum, and Senior Partner, Vice President, Leading Edge Forum CSC Consulting Group Paul Gustafson is an accomplished technologist and Bill Koff is a leader in CSC’s technology community. proven leader in emerging technologies, applied He chairs the Leading Edge Forum executive committee, research and strategy. As director of the Leading Edge whose members are the chief technologists from each Forum, Paul brings vision and leadership to a portfolio of CSC’s business units.
    [Show full text]