Experimental Pipeline (Expipe): A Lightweight Data Management Platform to Simplify the Steps From Experiment to Data Analysis

ETH Zurich Research Collection, Journal Article
Publication Date: 2020-07
Permanent Link: https://doi.org/10.3929/ethz-b-000431534
Originally published in: Frontiers in Neuroinformatics 14, http://doi.org/10.3389/fninf.2020.00030
Rights / License: Creative Commons Attribution 4.0 International

ORIGINAL RESEARCH
published: 24 July 2020
doi: 10.3389/fninf.2020.00030

Mikkel Elle Lepperød 1,2*, Svenn-Arne Dragly 1,3, Alessio Paolo Buccino 1,4,5, Milad Hobbi Mobarhan 1,6, Anders Malthe-Sørenssen 1,3, Torkel Hafting 1,2 and Marianne Fyhn 1,6

1 Center for Integrative Neuroplasticity, University of Oslo, Oslo, Norway
2 Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
3 Department of Physics, University of Oslo, Oslo, Norway
4 Department of Informatics, University of Oslo, Oslo, Norway
5 Department of Biosystems Science and Engineering, ETH, Zurich, Switzerland
6 Department of Biosciences, University of Oslo, Oslo, Norway

As experimental neuroscience is moving toward more integrative approaches, with a variety of acquisition techniques covering multiple spatiotemporal scales, data management is becoming increasingly challenging for neuroscience laboratories. Often, datasets are too large to practically be stored on a laptop or a workstation. The ability to query metadata collections without retrieving complete datasets is therefore critical to efficiently perform new analyses and explore the data.
At the same time, new experimental paradigms lead to constantly changing specifications for the metadata to be stored. Despite this, there is currently a serious lack of agile software tools for data management in neuroscience laboratories. To meet this need, we have developed Expipe, a lightweight data management framework that simplifies the steps from experiment to data analysis. Expipe provides the functionality to store and organize experimental data and metadata for easy retrieval in exploration and analysis throughout the experimental pipeline. It is flexible in terms of defining the metadata to store and aims to solve the storage and retrieval challenges of data/metadata due to ever changing experimental pipelines. Due to its simplicity and lightweight design, we envision Expipe as an easy-to-use data management solution for experimental laboratories that can improve provenance, reproducibility, and sharing of scientific projects.

Keywords: data management, Python (programming language), open source software (OSS), analysis, data sharing, data base (DB)

Edited by: David A. Gutman, Emory University, United States
Reviewed by: Pietro Pinoli, Politecnico di Milano, Italy; Andrew P. Davison, UMR9197 Institut des Neurosciences Paris Saclay (Neuro-PSI), France
*Correspondence: Mikkel Elle Lepperød, [email protected]
Received: 30 October 2019
Accepted: 15 June 2020
Published: 24 July 2020
Citation: Lepperød ME, Dragly S-A, Buccino AP, Mobarhan MH, Malthe-Sørenssen A, Hafting T and Fyhn M (2020) Experimental Pipeline (Expipe): A Lightweight Data Management Platform to Simplify the Steps From Experiment to Data Analysis. Front. Neuroinform. 14:30. doi: 10.3389/fninf.2020.00030

1. INTRODUCTION

Experimental neuroscience is increasingly moving toward an integrative understanding of phenomena by simultaneously collecting data with a wide range of techniques including behavioral tasks, electrophysiology, imaging and genetics. Datasets from these types of experiments span a wide range of spatial and temporal scales. Often, the experimental setup is not finalized or rigidly predefined before data acquisition begins. Results may thus require additional branches of experimentation or re-evaluation of the setup. For example, results may initiate additional behavioral studies, or combining electrophysiology with imaging data. Also, the majority of research today is carried out by research fellows employed on temporary contracts, imposing a challenge for both continuation of projects and data sharing. Put simply, projects usually grow and mature organically through the experimental timeline. Moreover, the need for multi-modal approaches in neuroscience makes data management ever more challenging, complicating data sharing and open collaboration.

In this paper we introduce a data management tool called Expipe (Experimental pipeline), which enables data management to evolve and mature organically together with experiments in a semi-structured fashion.

To improve reproducibility in neuroscience, several (larger) initiatives point toward tools that facilitate sharing of data and code (Crook et al., 2013; Denker and Grün, 2016; Zehl et al., 2016; Gleeson et al., 2017). Part of the data management challenge comes from the wide range of formats produced by different experimental paradigms. Moreover, with increased size of datasets, researchers are often unable to carry all their data around on their laptops or store them on workstations. The possibility to query a metadata collection without retrieving entire datasets is therefore becoming more important.

Data and metadata managing tools typically differ in the amount of a priori imposed structure. In a structured database, fields are typically required to be predefined and are best suited for use cases where it is possible to predict the types of data and metadata that will be stored. In unstructured databases, fields typically evolve while the database is used and updated. Being highly flexible, these types of databases are easy to use, but can be difficult to share across users as their evolved structure might not be intuitive or well-documented. The current tools that exist for experimental databases can typically be described by one of those two categories.

DataNet (Harężlak et al., 2014) is a data management method and architecture that defines repositories which can be accessed by any programming language through REST-based APIs. The goal of DataNet is to deliver a scalable solution that facilitates reproducibility and is capable of handling large data volumes. DataNet is designed to be run on top of a platform-as-a-service (PaaS) provider, such as CloudFoundry. While DataNet can be regarded as an advanced data management solution, its setup and usage are not specific to neuroscience and may require existing experience in data management solutions.

Another effort toward a lightweight data management [...] on individual files that are generated during experiments, which may, for example, lead to metadata not being stored alongside data in a modularized and searchable fashion, and may thus hamper shareability and usability.

The above data management systems and tools either impose little structure on the stored data or metadata, leaving it up to the researcher to design a custom storage specification, or assume particular fields that need to be predefined, such as in DataJoint¹. However, research is dynamic in nature and new discoveries often change what data and metadata within datasets should be in focus. An ideal data management solution for neuroscience laboratories needs to be flexible and adaptable to various experimental paradigms (Denker and Grün, 2016).

Alyx² is a notable exception that for the most part makes few assumptions about the metadata to be stored, and allows its users to store arbitrary metadata in JSON fields. However, like many other data management solutions, Alyx requires manual installation, configuration, and maintenance of a server to be used in a multi-user environment. Solutions that are instead based on existing hosting providers can significantly lower the threshold for adopting a data management solution in a laboratory.

To address the shortcomings of existing solutions, we have created Expipe, a flexible, lightweight system for data handling. We propose a semi-structured data management platform that is lightweight in nature and requires little planning and maintenance to facilitate a broad range of experiments in neuroscience. Being modular and providing both human- and machine-readable metadata, Expipe also supports provenance tracking with GIN³ and Git Large File Storage⁴.

To organize metadata for data collected in the lab, an Expipe Project contains the following objects: Modules, Actions, Entities, and Templates (Figure 1A). The concepts are abstract, making Expipe flexible to use in many different scenarios. Also, we made the concepts few and simple to avoid introducing an overly abstract framework that appears foreign to other researchers.

As dataset sizes can grow very quickly, making it slow to explore a scientific project, the capability of querying metadata alone is essential to get an overview of the project and possibly to select subsets of the database for further processing. In Expipe, Modules sit at the core of the system and contain metadata describing Projects, Actions and Entities in detail. The Modules typically specify metadata about the equipment, environment, or
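To make the relationship between Projects, Actions, Entities and Modules more concrete, the following minimal Python sketch models them as plain data classes and runs a metadata-only query. It is an illustration under stated assumptions, not the Expipe API: every class, field, and function name here (`Project`, `Action`, `Entity`, `find_actions`, `data_path`, the example identifiers and module contents) is hypothetical, Modules are represented as nested dictionaries, and Templates are left out because the text above does not describe them in detail.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# NOTE: hypothetical structures for illustration only; not the Expipe API.
ModuleDict = Dict[str, Dict[str, Any]]  # module name -> free-form metadata


@dataclass
class Entity:
    """A tracked subject or sample, e.g. an animal."""
    entity_id: str
    modules: ModuleDict = field(default_factory=dict)


@dataclass
class Action:
    """One experimental event, e.g. a recording session."""
    action_id: str
    action_type: str
    entities: List[str] = field(default_factory=list)
    modules: ModuleDict = field(default_factory=dict)
    data_path: str = ""  # pointer to the raw data; a query never opens it


@dataclass
class Project:
    """Container for Actions and Entities plus project-level Modules."""
    name: str
    actions: Dict[str, Action] = field(default_factory=dict)
    entities: Dict[str, Entity] = field(default_factory=dict)
    modules: ModuleDict = field(default_factory=dict)

    def find_actions(self, predicate: Callable[[Action], bool]) -> List[str]:
        """Return sorted ids of Actions whose metadata satisfies the predicate."""
        return sorted(a.action_id for a in self.actions.values() if predicate(a))


def build_example_project() -> Project:
    """Assemble a small project with made-up metadata."""
    project = Project(name="demo")
    project.entities["rat-01"] = Entity("rat-01", {"subject": {"species": "rat"}})
    project.actions["rec-001"] = Action(
        "rec-001", "recording", ["rat-01"],
        {"electrophysiology": {"probe": "tetrode"}}, "data/rec-001",
    )
    project.actions["rec-002"] = Action(
        "rec-002", "recording", ["rat-01"],
        {"electrophysiology": {"probe": "silicon"}}, "data/rec-002",
    )
    project.actions["surg-001"] = Action("surg-001", "surgery", ["rat-01"])
    return project


if __name__ == "__main__":
    demo = build_example_project()
    # Select a subset of the project by metadata alone, without reading data.
    print(demo.find_actions(lambda a: a.action_type == "recording"))
```

Because the query inspects only the metadata dictionaries, the raw files referenced by `data_path` can stay on remote storage until a selected subset is actually needed for analysis, which is the use case the passage above motivates.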
