ERMrest: an entity-relationship data storage service for web-based, data-oriented collaboration

Karl Czajkowski, Carl Kesselman, Robert Schuler, Hongsuda Tangmunarunkit
Information Sciences Institute
Viterbi School of Engineering
University of Southern California
Marina del Rey, CA 90292
Email: {karlcz,carl,schuler,
[email protected]

Abstract: Scientific discovery is increasingly dependent on a scientist’s ability to acquire, curate, integrate, analyze, and share large and diverse collections of data. While the details vary from domain to domain, these data often consist of diverse digital assets (e.g. image files, sequence data, or simulation outputs) that are organized with complex relationships and context which may evolve over the course of an investigation. In addition, discovery is often collaborative, such that sharing of the data and its organizational context is highly desirable. Common systems for managing file or asset metadata hide their inherent relational structures, while traditional relational database systems do not extend to the distributed collaborative environment often seen in scientific investigations. To address these issues, we introduce ERMrest, a collaborative data management service which allows general entity-relationship modeling of metadata manipulated by RESTful access methods.

are hard to evolve over time, and make it difficult to search for specific data values.

In previous work, we have argued for an alternative approach based on scientific asset management [5]. We separate the “science data” (e.g. microscope images, sequence data, flow cytometry data) from the “metadata” (e.g. references, provenance, properties, and contextual relationships). We have also defined a data-oriented architecture which expresses collaboration as the manipulation of shared data resources housed in complementary object (asset) and relational (metadata) stores [6]. The metadata encode not only properties and references of individual assets, but relationships among assets and other domain-specific elements such as experiments, protocol