Research Collection

Journal Article

Experimental Pipeline (Expipe): A Lightweight Data Management Platform to Simplify the Steps From Experiment to Data Analysis

Author(s): Lepperød, Mikkel Elle; Dragly, Svenn-Arne; Buccino, Alessio Paolo; Mobarhan, Milad Hobbi; Malthe-Sørenssen, Anders; Hafting, Torkel; Fyhn, Marianne

Publication Date: 2020-07

Permanent Link: https://doi.org/10.3929/ethz-b-000431534

Originally published in: Frontiers in Neuroinformatics 14, http://doi.org/10.3389/fninf.2020.00030

Rights / License: Creative Commons Attribution 4.0 International


ORIGINAL RESEARCH
published: 24 July 2020
doi: 10.3389/fninf.2020.00030

Experimental Pipeline (Expipe): A Lightweight Data Management Platform to Simplify the Steps From Experiment to Data Analysis

Mikkel Elle Lepperød 1,2*, Svenn-Arne Dragly 1,3, Alessio Paolo Buccino 1,4,5, Milad Hobbi Mobarhan 1,6, Anders Malthe-Sørenssen 1,3, Torkel Hafting 1,2 and Marianne Fyhn 1,6

1 Center for Integrative Neuroplasticity, University of Oslo, Oslo, Norway, 2 Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway, 3 Department of Physics, University of Oslo, Oslo, Norway, 4 Department of Informatics, University of Oslo, Oslo, Norway, 5 Department of Biosystems Science and Engineering, ETH, Zurich, Switzerland, 6 Department of Biosciences, University of Oslo, Oslo, Norway

As experimental neuroscience is moving toward more integrative approaches, with a variety of acquisition techniques covering multiple spatiotemporal scales, data management is becoming increasingly challenging for neuroscience laboratories. Often, datasets are too large to practically be stored on a laptop or a workstation. The ability to query metadata collections without retrieving complete datasets is therefore critical to efficiently perform new analyses and explore the data. At the same time, new experimental paradigms lead to constantly changing specifications for the metadata to be stored. Despite this, there is currently a serious lack of agile tools for data management in neuroscience laboratories. To meet this need, we have developed Expipe, a lightweight data management framework that simplifies the steps from experiment to data analysis. Expipe provides the functionality to store and organize experimental data and metadata for easy retrieval in exploration and analysis throughout the experimental pipeline. It is flexible in terms of defining the metadata to store and aims to solve the storage and retrieval challenges of data/metadata due to ever changing experimental pipelines. Due to its simplicity and lightweight design, we envision Expipe as an easy-to-use data management solution for experimental laboratories, that can improve provenance, reproducibility, and sharing of scientific projects.

Edited by: David A. Gutman, Emory University, United States

Reviewed by: Pietro Pinoli, Politecnico di Milano, Italy; Andrew P. Davison, UMR9197 Institut des Neurosciences Paris Saclay (Neuro-PSI), France

*Correspondence: Mikkel Elle Lepperød [email protected]

Keywords: data management, Python (programming language), open source software (OSS), analysis, data sharing, data base (DB)

Received: 30 October 2019
Accepted: 15 June 2020
Published: 24 July 2020

Citation: Lepperød ME, Dragly S-A, Buccino AP, Mobarhan MH, Malthe-Sørenssen A, Hafting T and Fyhn M (2020) Experimental Pipeline (Expipe): A Lightweight Data Management Platform to Simplify the Steps From Experiment to Data Analysis. Front. Neuroinform. 14:30. doi: 10.3389/fninf.2020.00030

1. INTRODUCTION

Experimental neuroscience is increasingly moving toward an integrative understanding of phenomena by simultaneously collecting data with a wide range of techniques including behavioral tasks, electrophysiology, imaging and genetics. Datasets from these types of experiments span a wide range of spatial and temporal scales. Often, the experimental setup is not finalized or rigidly predefined before data acquisition begins. Results may thus require additional branches of experimentation or re-evaluation of the setup. For example, results may initiate additional

behavioral studies, or combining electrophysiology with imaging data. Also, the majority of research today is carried out by research fellows employed on temporary contracts, imposing a challenge for both continuation of projects and data sharing. Put simply, projects usually grow and mature organically through the experimental timeline. Moreover, the need for multi-modal approaches in neuroscience makes data management ever more challenging, complicating data sharing and open collaboration.

In this paper we introduce a data management tool called Expipe (Experimental pipeline) which enables data management to simply evolve and mature organically together with experiments in a semi-structured fashion.

To improve reproducibility in neuroscience, several (larger) initiatives point toward tools that facilitate sharing of data and code (Crook et al., 2013; Denker and Grün, 2016; Zehl et al., 2016; Gleeson et al., 2017). Part of the data management challenge comes from the wide range of formats produced by different experimental paradigms. Moreover, with increased size of datasets, researchers are often unable to carry all their data around on their laptops or store them on workstations. The possibility to query a metadata collection without retrieving entire datasets is therefore becoming more important.

Data and metadata managing tools typically differ in the amount of a priori imposed structure. In a structured database, fields are typically required to be predefined and are best suited for use cases where it is possible to predict the types of data and metadata that will be stored. In unstructured databases, fields typically evolve while the database is used and updated. Being highly flexible, these types of databases are easy to use, but can be difficult to share across users as their evolved structure might not be intuitive or well-documented. The current tools that exist for experimental databases can typically be described by one of these two categories.

DataNet (Harężlak et al., 2014) is a data management method and architecture that defines repositories which can be accessed by any programming language through REST-based interfaces. The goal of DataNet is to deliver a scalable solution that facilitates reproducibility and is capable of handling large data volumes. DataNet is designed to be run on top of a platform-as-a-service (PaaS) provider, such as CloudFoundry. While DataNet can be regarded as an advanced data management solution, its setup and usage are not specific to neuroscience and may require existing experience in data management solutions.

Another effort toward a lightweight data management software is dtool (Olsson and Hartley, 2019). Dtool was mainly designed for bioinformatics/genomics data and it provides a solution to package data and metadata together. Dtool implements a CLI and a simple Python API to create datasets, and metadata are provided by the user when a new dataset is generated. The dtool framework, however, does not enforce or suggest any organization of the dataset, leaving it to the user.

Another proposed solution to organize and store complex metadata is the odML (Grewe et al., 2011) framework. Using odMLtables (Sprenger et al., 2019) it is possible to organize and store complex metadata in a hierarchical format and collect, manipulate, visualize, and store metadata in tabular representations. However, this platform imposes no structure on individual files that are generated during experiments, which may lead to metadata, e.g., not being stored alongside data in a modularized and searchable fashion, and may thus hamper shareability and usability.

The above data management systems and tools either impose little structure on the stored data or metadata, leaving it up to the researcher to design a custom storage specification, or assume particular fields that need to be predefined, such as in DataJoint¹. However, research is dynamic in nature and new discoveries often change what data and metadata within datasets should be in focus. An ideal data management solution for neuroscience laboratories needs to be flexible and adaptable to various experimental paradigms (Denker and Grün, 2016).

Alyx² is a notable exception that for the most part has few assumptions about the metadata to be stored, and allows its users to store arbitrary metadata in JSON fields. However, like many other data management solutions, Alyx requires manual installation, configuration, and maintenance of a server to be used in a multi-user environment. Solutions that instead are based on existing hosting providers can significantly lower the threshold for adopting a data management solution in a laboratory.

To address the shortcomings of existing solutions, we have created Expipe, a flexible, lightweight system for data handling. We propose a semi-structured data management platform that is lightweight in nature and requires little planning and maintenance to facilitate a broad range of experiments in neuroscience. Being modular and providing both human and machine readable metadata, Expipe also supports provenance tracking with GIN³ and Git Large File Storage⁴.

To organize metadata for data collected in the lab, an Expipe Project contains the following objects: Modules, Actions, Entities, and Templates (Figure 1A). The concepts are abstract, making Expipe flexible to use in many different scenarios. Also, we made the concepts few and simple to avoid introducing an overly abstract framework that appears foreign to other researchers.

As dataset sizes can grow very quickly, making it slow to explore a scientific project, the capability of querying metadata alone is essential to get an overview of the project and possibly to select subsets of the database for further processing. In Expipe, Modules sit at the core of the system and contain metadata describing Projects, Actions and Entities in detail. The Modules typically specify metadata about the equipment, environment, or subjects, such as the numerical aperture of a microscope lens, the serial number of an acquisition system, or the temperature of a room. Actions define events that occurred at a specific time, such as an experiment, an analysis, or a simulation (Figure 1B). Actions have a few specific attributes, such as a timestamp, and store detailed metadata in Modules. Entities are long-lived things that are used in an Action, typically the ID of an experimental subject. Actions refer to Entities, but they do not link directly to them. Messages are user specific lines of text added to actions,

¹ https://datajoint.io/
² alyx.readthedocs.io
³ https://gin.g-node.org/
⁴ git-lfs.github.com

Frontiers in Neuroinformatics | www.frontiersin.org 2 July 2020 | Volume 14 | Article 30 Lepperød et al. Expipe: A Lightweight Data Management Platform

FIGURE 1 | Expipe data model. (A) An Expipe Project contains Entities and Actions. Entities represent the long-lived elements in a project (e.g., experimental subjects). Actions define events that occurred at a certain time (e.g., an experiment or a surgery). Modules contain metadata about a Project, an Action, or an Entity. Templates can be used to pre-define Modules. (B) A typical working example, using Expipe to structure an experimental pipeline. In this example, there is one Entity (experimental subject: Rat #0007) and three Actions with corresponding Modules: a surgery, a recording, and an analysis.

such as notes. As Modules can be tedious to define each time an Action is created, Templates can be used to ease this process by holding predefined information typically added to Modules. We will cover Expipe objects in more detail in section 2.

A common obstacle in designing a general data management solution for experimental data is to choose the right database schema in advance. For that reason, Expipe uses a NoSQL key-value database model which is flexible in terms of defining the metadata to store. Rather than forcing the user to select a database schema ahead of time, Expipe uses implicit schemas in the form of what we call module Templates. These are similar in scope to odML terminologies and to a large extent also compatible with odML. Templates can be used to create Modules, which are a snapshot of the Template at the time of creation. Templates can change over time to reflect changes in the Project without affecting existing Modules, since the Modules are copied from, rather than linked to, Templates. Records in a relational database, in comparison, are tied to a schema.

Expipe is portable and has few dependencies. By default, Expipe uses the file system for storing metadata, which means that no additional database installation or configuration is required. Moreover, we have written a reference implementation in Python and an extendable command line interface (CLI), making Expipe widely available to the scientific community. The Python API allows users to interact with Expipe programmatically. Additional Jupyter extensions are included

with the API to provide a graphical user interface (GUI) that gives an overview of stored contents.

Expipe is written with modularity in mind and can use NoSQL databases as backends, such as Google Firebase⁵. However, the filesystem is used as a backend by default. One benefit of the filesystem backend is that it allows data to be stored close to the metadata, within the Expipe directory structure. The filesystem backend also allows Expipe to easily be combined with GIN or Git LFS to get full version control, safe synchronization between collaborators, and hosting for data sharing.

Our goal has been to make Expipe a lightweight framework that can be adopted and used by researchers in laboratories with immediate data handling needs. To this end we present the Expipe data model and envisioned usage below.

2. EXPIPE WALK-THROUGH

In this section we will present a step-by-step walk-through of the Expipe framework, by setting up an Expipe project for a sample application from neuroscience involving open-field foraging experiments on rats combining extracellular recordings of medial entorhinal cortex (MEC) and optogenetic stimulation. Expipe is available on PyPI⁶ and can be installed with pip. For documentation, we refer to https://expipe.readthedocs.io.

2.1. Project

First of all, we need to create an Expipe project. To create a Project using the Python API for Expipe, one simply needs to import the expipe package and run the create_project function:

    import expipe

    project = expipe.create_project("project-x")

Expipe, by default, utilizes the filesystem as a backend. This means that an organized set of folders and files is used. When our "project-x" is created, an Expipe folder named project-x will be created in the current working directory.

An Expipe project will contain a collection of Actions, Entities, Templates, Messages and project Modules (Figure 1A), which we will explain in the following sections. A typical working example of how to structure an experimental pipeline with Expipe is illustrated in Figure 1B.

2.2. Entities

Entities represent physical or conceptual things, such as experimental equipment or subjects (like rats and mice). In our simple example, we assume we are using a single rat (ID 0007) for our experiments. We can then create the "rat" Entity using the Project.create_entity function:

    entity = project.create_entity("0007")

Similarly to the project creation, the above command will create a folder "0007" in the Entity folder of the project. All Entities have some common attributes, such as tags, users, location, type and datetime, which can be easily accessed and modified as follows:

    from datetime import datetime

    entity.tags = ["wild type"]
    entity.datetime = datetime.now()
    entity.location = "Housing room 1234"
    entity.type = "Rat"
    entity.users = ["Peter", "Mary"]

Entities are not static, but they can be updated over time following the course of a project. In our example, for instance, the Entity will undergo a modification when a surgery is performed, when a recording is made, or when the animal is euthanized. These types of modifications can further be described with Expipe Actions.

2.3. Actions

Actions represent things that have happened at a specific point in time, such as an experiment, an analysis, a surgery, or a simulation run. In our toy project, after we have performed an experiment, we can register it as an Action using the Project.create_action function:

    action = project.create_action("2020-01-12-recording")

Actions can be updated over time (for instance, by adding processed data after some analysis). All Actions have some common attributes, such as tags, users, location, type, entities and datetime. In our example, we performed an 11 Hz optogenetic stimulation during the recording, hence we can add this piece of information as tags:

    from datetime import datetime

    action.tags = ["stimulation", "11 Hz"]
    action.datetime = datetime.now()
    action.location = "Room 1234"
    action.type = "Recording"
    action.entities = ["0007"]
    action.users = ["Peter", "Mary"]

These attributes are stored in the attributes.yaml text file in the Action folder.

2.4. Modules

So far, we have only handled common metadata for Actions and Entities, such as tags, dates, and users. However, further specific metadata can be stored using Modules. Actions, Entities and the Project as a whole can have Modules attached. A Module holds metadata in key-value form, which is similar to a map or hash table in popular programming languages. Modules are intended to hold metadata such as the equipment in an experiment, the protocol that was used, or a summary of the obtained results. For example, a Module could describe the arena for the open-field experiment that we performed as follows:

    action.modules["tracking"] = {
        "environment": {
            "type": "box",
            "width": {
                "value": 1.2,
                "unit": "m"
            },
            "height": {
                "value": 1.4,
                "unit": "m"
            }
        }
    }

Within the Action's folder there is another folder called modules, which contains each Module as a YAML file. The above code snippet would for instance produce a file named tracking.yaml with the following contents:

    environment:
      type: "box"
      width:
        value: 1.2
        unit: "m"
      height:
        value: 1.4
        unit: "m"

This means that the metadata can be easily modified later, not only using the Python API, but also by manually editing the file in a text editor. Note that when using Expipe in combination with provenance tracking systems such as Git or GIN, these types of changes will be documented and thus not pose a risk for corruption of metadata integrity. The simple YAML syntax makes editing easy, without the need for a separate GUI only for editing purposes. Since many Actions could share the same metadata (e.g., several recordings using the same open-field arena), the creation of Modules is facilitated by Templates.

2.5. Templates

Modules can be created from scratch, as above, or automatically be included based on a Template, by passing the template argument:

    tracking = action.require_module("tracking", template="tracking")

This will copy the entire Template named tracking into a Module with the same name in the given Action. As some metadata can be Action-specific, the Module can then be edited, for instance by filling out any blank values in the Template, either manually or by using the Python API:

    tracking["environment"]["type"] = "sphere"

In addition to Action Modules, Templates can also be used to instantiate Modules for Entities and the entire Project (Figure 1A).

There is minimal linking between metadata in Expipe to improve provenance. In a relational database, an Action would typically have links to the equipment used in a many-to-many relationship. However, Expipe is instead designed to copy the entire equipment Template into the Modules of the Action. This is to ensure that the exact state of the equipment is recorded in the Action, and removes the risk of inadvertently updating the state of the equipment for an existing Action.

2.6. Messages

When performing an experiment, it is important for the experimenter to log some messages as a future reminder for the analysis, such as noting that a recording channel is noisy or that possibly a good unit is found. In order to keep a virtual laboratory book, Messages can be added to an Action to add notes and comments:

    action.create_message(
        "Experiment went well, possible grid cell on channels 4-8",
        user="Peter",
    )

Messages are given a timestamp (the time of creation if not otherwise given), and stored within the Action.

2.7. Data

Actions, such as recordings, are usually performed by acquiring experimental data. Data can be easily linked to an Action in Expipe by using the data property of an Action, which is a map from a string ID (e.g., "tracking") to a path relative to the data folder of an Action:

    action.data["tracking"] = "results.exdir/tracking"
    action.data["snapshot"] = "snapshot.jpg"

Here, exdir (Dragly et al., 2018) is used as the storage format. The absolute path of the file is retrieved as a native pathlib path by calling the Action.data_path function:

    action.data_path("tracking")
    # PosixPath('/home/user/data_repo/actions/2020-01-12-recording/data/results.exdir/tracking')

By default the path is assumed to be stored relative to the action and the absolute path can be obtained with Action.data_path. However, it is possible to use the data field to store any string, for example, pointing to a directory on a server:

    action.data["tracking"] = "//server/tracking_data/2020-01-12/"

We recommend storing the data directly in the "data" folder of an Action, since the data and metadata can then be tracked together by version control systems such as GIN or Git LFS.

2.8. Expipe Command Line Interface

The command line interface (CLI) provides minimal interaction with the Expipe environment. The CLI can be used to create and configure projects, and to list available Actions, Entities and Templates.

It is easily extendable to add user specific functionalities by making an Expipe plugin. The addition of a plugin is performed in two stages. First, a Python package (named my_package in the following example) must be installed in the Python environment. Then, using the click Python package⁷, one can create a subclass of the expipecli.utils.plugin.IPlugin class and define the required commands within the attach_to_cli method:

    from expipecli.utils.plugin import IPlugin

    class MyPlugin(IPlugin):
        def attach_to_cli(self, cli):
            # Register a new sub-command on the Expipe CLI group.
            @cli.command('my_extension')
            def my_extension():
                print('Welcome to Expipe!')

⁵ firebase.google.com
⁶ https://pypi.org/project/expipe/
⁷ https://click.palletsprojects.com/


Finally, the newly created plugin must be added to the Expipe framework:

    expipe config global --add plugin my_package

The newly created CLI command can now be invoked through the Expipe CLI:

>>> expipe my_extension

Welcome to Expipe!
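The attach pattern behind such a plugin can be illustrated without installing Expipe. The block below is only a minimal sketch under stated assumptions: it swaps click for the standard-library argparse module, and the IPlugin class, build_cli and main are hypothetical stand-ins written for this illustration, not the expipecli classes or functions. It shows the idea that each plugin receives the shared CLI object and registers its own commands on it.

```python
import argparse


class IPlugin:
    """Hypothetical stand-in for a plugin base class (not the expipecli one)."""

    def attach_to_cli(self, subparsers):
        raise NotImplementedError


class MyPlugin(IPlugin):
    def attach_to_cli(self, subparsers):
        # Register one sub-command on the shared command group.
        command = subparsers.add_parser("my_extension")
        command.set_defaults(run=lambda args: "Welcome to Expipe!")


def build_cli(plugins):
    """Create the top-level parser and let every plugin attach its commands."""
    parser = argparse.ArgumentParser(prog="expipe-sketch")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for plugin in plugins:
        plugin.attach_to_cli(subparsers)
    return parser


def main(argv, plugins):
    """Parse argv and run whichever command a plugin registered for it."""
    args = build_cli(plugins).parse_args(argv)
    return args.run(args)


message = main(["my_extension"], [MyPlugin()])
print(message)  # prints: Welcome to Expipe!
```

Keeping command registration inside the plugin means the core CLI does not need to know which commands exist in advance; it only iterates over the configured plugins.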

For a comprehensive plugin used by the CINPLA laboratory to register, store, and analyze experimental recordings, we refer to the expipe_plugin_cinpla package (https://github.com/CINPLA/expipe-plugin-cinpla).

2.9. Exploring Expipe Projects

When an Expipe project has been created and populated, it can be explored through the API, by simply looking in the filesystem (if this is the preferred backend), or with a Graphical User Interface (GUI). A basic GUI is available when using Expipe in a Jupyter notebook⁸. This GUI is based on IPython Widgets⁹. The widgets can be spawned by simply running the Expipe objects in a Jupyter cell, as shown in Figure 2. In addition, an entire Expipe Project can be visualized using the available Browser:

    import expipe

    expipe.Browser(project_path).display()
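For the filesystem route mentioned above, a few lines of standard-library Python are enough to get an overview. The following is a rough sketch, assuming only the folder names that appear in the walk-through (an actions folder, and per Action an attributes.yaml file, a modules folder with one YAML file per Module, and a data folder); the helper list_actions is hypothetical, the example tree is created in a temporary directory purely for illustration, and the exact on-disk layout of a real project should be checked against the Expipe documentation.

```python
import tempfile
from pathlib import Path


def list_actions(project_path):
    """Map each action folder name to the names of its module files."""
    overview = {}
    actions_dir = Path(project_path) / "actions"
    for action_dir in sorted(p for p in actions_dir.iterdir() if p.is_dir()):
        module_files = sorted((action_dir / "modules").glob("*.yaml"))
        overview[action_dir.name] = [f.stem for f in module_files]
    return overview


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny stand-in project tree (assumed layout, for illustration only).
    action_dir = Path(tmp) / "project-x" / "actions" / "2020-01-12-recording"
    (action_dir / "modules").mkdir(parents=True)
    (action_dir / "data").mkdir()
    (action_dir / "attributes.yaml").write_text("type: Recording\n")
    (action_dir / "modules" / "tracking.yaml").write_text(
        "environment:\n  type: box\n"
    )
    overview = list_actions(Path(tmp) / "project-x")

print(overview)  # prints: {'2020-01-12-recording': ['tracking']}
```

Because only folder and file names are inspected, such a listing stays fast even when the data folders themselves are large.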

In the Browser GUI all Actions with their attributes are indexed, enabling the user to get an overview of the entire Project structure and of contents such as attributes, Messages, and Modules.

Expipe objects support queries like searching for Actions, Entities, and Modules, either through the GUI or with custom scripts by means of attributes. To perform more complex queries it can be convenient to combine object attributes, such as tags, and metadata into a structured database, e.g., using Pandas¹⁰. The Expipe Browser can export all Actions and attributes to a comma-separated values (CSV) file, which can then be loaded, e.g., in Pandas. To include information from Modules, custom scripts must be written, or, if Modules are created with odML, these can be combined with odMLtables (Sprenger et al., 2019).

Through the Python API, Expipe objects can be conveniently accessed as dictionaries in order to ease iteration, retrieval and setting. Actions, Modules and Entities can be iterated directly using values(); to get both keys and values, items() is preferred:

    for action_name, action in project.actions.items():
        if action.type != 'Recording':
            # Report and skip Actions that are not recordings.
            print(action_name)
            continue
        for module in action.modules.values():
            print(module.content)

⁸ https://jupyter.org/
⁹ https://ipywidgets.readthedocs.io/
¹⁰ https://pandas.pydata.org/

FIGURE 2 | Graphical User Interface with IPython widgets. (A) Simple overview GUI obtained when an Expipe object container (such as project.actions) is run in a Jupyter cell. (B) For a more comprehensive overview the Browser can also be used, where the entire Project structure is indexed.

3. DISCUSSION

In contemporary neuroscience, innovation happens also with discoveries of new types of measurement techniques and data. These may differ profoundly from existing data, when for example a new behavioral acquisition is added to electrophysiology or imaging setup. Such changes in

the collected data within a project require a flexible data management system.

In this paper, we present Expipe, a data management solution for neuroscience laboratories. Differently from existing solutions (Harężlak et al., 2014; Olsson and Hartley, 2019; Sprenger et al., 2019)¹, Expipe provides a semi-structured, but flexible data management solution specifically designed to encompass the life-cycle of experimental projects in neuroscience. Being lightweight and simple, relying on the familiar file-system backend to organize project components, we see Expipe as an accessible and easy-to-use tool for laboratories to start implementing a reproducible data management system in their research. This is a first and important step for (many) research groups that only use ad-hoc solutions for organization of data and metadata.

The Expipe structure based on Entities, Actions, and Modules shares similarities with the core structures of PROV-DM¹¹ (Entity, Activity, and Agent). The description of an Entity is very similar in scope to what we envisioned as Entities. Activities are similar to Actions, although there is no required link between an Action and an Entity in Expipe. Finally, instead of an Agent, we have chosen to optionally have a user specified with an Action or an Entity.

One of the main strengths of Expipe is its flexibility. However, flexibility can also be considered as a limitation. The definition of metadata (Modules) is left to the user, but we encourage the use of predefined Templates for data collection, ideally standardized by the scientific community, e.g., using odML terminologies (https://terminologies.g-node.org/v1.1/terminologies.xml). odML, however, is not designed as a database, but rather as a way to structure metadata, one experiment at a time. Expipe can work together with odML to give structure and modularization, with Expipe providing structure to the Project (e.g., each experiment is an Action belonging to the project) and with odML giving structure at the metadata level, by using well-defined and community-accepted metadata fields.

Another possible limitation of the Expipe framework is related to provenance. Our relaxed integrity verification in relations between objects simplifies structure and development, but also comes with some drawbacks. For instance, when adding an Entity to an Action, there is no guarantee that this Entity exists or is described. Similarly, if a user is added to an Action attribute, the user name might, for instance, be incorrectly spelled. The structure in Expipe thus relies on its users to ensure provenance. Methods for user specific schemas that ensure provenance could be added through plugin functionality, e.g., by building a stricter control of object creation and annotation. For example, the click Python package required to create custom Expipe plugins provides a first check on argument types. Finally, an extended plugin functionality that accepts schemas at project creation could be added; this would also ease integration of Expipe into more complete data handling solutions.

A typical project in neuroscience may contain many experiments, but only a subset of the experiments might be selected for further analysis. In this situation it is highly convenient to be able to efficiently search for indicators that signify inclusion in such a subset. This kind of search can be done by using Action attributes such as tags.

Because of the lightweight nature of Expipe, it can easily be integrated with other data management software. For instance, workflows written in Snakemake (Köster and Rahmann, 2012) can depend on files in an Expipe structure to define an automated analysis workflow. Data sharing platforms that are based on the file system, such as GIN, can easily track the files in an Expipe folder. Other tools, such as Git and Git LFS or Perforce¹² for version control, can also be used in combination with Expipe, with Git LFS being the preferred solution in our lab.

Expipe does not impose any restriction on file formats, to improve flexibility and to enable dealing with different types of data. In our lab, we have used the Exdir format (Dragly et al., 2018), which we developed as an alternative to HDF5, together with Expipe in several projects. Alternatively, a common standard that is being increasingly used by the neuroscience community and that we strongly recommend is Neurodata Without Borders (NWB) (Teeters et al., 2015). Other common file formats, such as image sequences, HDF5, and video files can be stored in the data directory of any Action. There is no limitation in Expipe to the types of files it can point to.

Finally, Expipe uses the file system as backend for projects. However, this is not the only available solution. A Firebase backend is also supported, which stores the entire project as key-value pairs using Google Firebase⁵. The file system backend could also support integration with cloud-based systems, such as Dropbox¹³, Google¹⁴, or Amazon S3¹⁵.

¹¹ https://www.w3.org/TR/prov-dm/
¹² perforce.com/products/helix-core
¹³ https://www.dropbox.com/
¹⁴ console.cloud.google.com/
¹⁵ aws.amazon.com/s3

4. CONCLUSION

Experimental progress in neuroscience is often innovative in terms of how behavioral and data acquisition paradigms are used and combined. In such cases it can be difficult to design a priori a data and metadata structure that encompasses all aspects of a project. On the other hand, having no structure at all can lead to problems with reproducibility and shareability. To solve this problem we propose a semi-structured data management platform that is lightweight in nature and requires little planning and maintenance to facilitate a broad range of experiments in neuroscience. Being modular and providing both human and machine readable metadata in text files, Expipe can easily be combined with other tools such as odMLtables, Pandas, Git and Git LFS. Moreover, it is easy to search and create subsets of experiments within a large project, making Expipe ideal both during data acquisition and data analysis. Expipe is a novel data management tool that solves many of the problems associated with existing data and metadata management software. Our hope is that Expipe will be adopted by the community and become

Frontiers in Neuroinformatics | www.frontiersin.org 7 July 2020 | Volume 14 | Article 30 Lepperød et al. Expipe: A Lightweight Data Management Platform a simple data management solution that can be integrated with DATA AVAILABILITY STATEMENT other software for analysis workflows, provenance tracking, and data sharing. The source code of Expipe is available at github.com/ With Expipe we propose that a modularized semi-structured cinpla/expipe. database model can enable an efficient and user friendly approach to handling complex experimental datasets. AUTHOR CONTRIBUTIONS

5. SIGNIFICANCE STATEMENT ML conceived the project, wrote the code, and wrote the manuscript. S-AD conceived the project, wrote the code, and To facilitate data sharing, provenance and management of wrote the manuscript. AB and MM wrote the code and wrote data and metadata we introduce Expipe, a semi-structured the manuscript. AM-S acquired funding. TH and MF acquired and lightweight data management platform designed for funding and wrote the manuscript. All authors contributed to the neuroscience research. Expipe implements a conceptually simple article and approved the submitted version. and familiar project structure and includes functionalities for data and metadata handling, retrieval, and exploration, which FUNDING in turn can simplify the steps from experiments to analysis. Differently from existing solutions, the flexible and easy-to-use This work was funded by the Norwegian Research Council Expipe framework can provide an entry-level data management (Grant No. 217920 to MF and Grant No. 231248 for TH) and solution for both small and large experimental laboratories. by the University of Oslo.

REFERENCES

Crook, S. M., Davison, A. P., and Plesser, H. E. (2013). “Learning from the past: approaches for reproducibility in computational neuroscience,” in 20 Years of Computational Neuroscience. Springer Series in Computational Neuroscience, Vol. 9, ed J. Bower (New York, NY: Springer), 73–102. doi: 10.1007/978-1-4614-1424-7_4

Denker, M., and Grün, S. (2016). “Designing workflows for the reproducible analysis of electrophysiological data,” in Brain-Inspired Computing. BrainComp 2015. Lecture Notes in Computer Science, Vol. 10087, eds K. Amunts, L. Grandinetti, T. Lippert, and N. Petkov (Cham: Springer), 58–72. doi: 10.1007/978-3-319-50862-7_5

Dragly, S.-A., Hobbi Mobarhan, M., Lepperød, M. E., Tennøe, S., Fyhn, M., Hafting, T., et al. (2018). Experimental directory structure (Exdir): an alternative to HDF5 without introducing a new file format. Front. Neuroinform. 12:16. doi: 10.3389/fninf.2018.00016

Gleeson, P., Davison, A. P., Silver, R. A., and Ascoli, G. A. (2017). A commitment to open source in neuroscience. Neuron 96, 964–965. doi: 10.1016/j.neuron.2017.10.013

Grewe, J., Wachtler, T., and Benda, J. (2011). A bottom-up approach to data annotation in neurophysiology. Front. Neuroinform. 5:16. doi: 10.3389/fninf.2011.00016

Harężlak, D., Kasztelnik, M., Pawlik, M., Wilk, B., and Bubak, M. (2014). “A lightweight method of metadata and data management with DataNet,” in eScience on Distributed Computing Infrastructure, eds M. Bubak, J. Kitowski, and K. Wiatr (Cham: Springer), 164–177. doi: 10.1007/978-3-319-10894-0_12

Köster, J., and Rahmann, S. (2012). Snakemake–a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522. doi: 10.1093/bioinformatics/bts480

Olsson, T. S., and Hartley, M. (2019). Lightweight data management with dtool. PeerJ 7:e6562. doi: 10.7717/peerj.6562

Sprenger, J., Zehl, L., Pick, J., Sonntag, M., Grewe, J., Wachtler, T., et al. (2019). odMLtables: a user-friendly approach for managing metadata of neurophysiological experiments. Front. Neuroinform. 13:62. doi: 10.3389/fninf.2019.00062

Teeters, J. L., Godfrey, K., Young, R., Dang, C., Friedsam, C., Wark, B., et al. (2015). Neurodata without borders: creating a common data format for neurophysiology. Neuron 88, 629–634. doi: 10.1016/j.neuron.2015.10.025

Zehl, L., Jaillet, F., Stoewer, A., Grewe, J., Sobolev, A., Wachtler, T., et al. (2016). Handling metadata in a neurophysiology laboratory. Front. Neuroinform. 10:26. doi: 10.3389/fninf.2016.00026

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Lepperød, Dragly, Buccino, Mobarhan, Malthe-Sørenssen, Hafting and Fyhn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
