Tool Talk: Open Technologies and the Role of Librarians

Presenters: Vicky Steeves (NYU), Ted Laderas (OHSU), Kristi Holmes (Northwestern)
Moderator: Lisa Federer (NLM)

MLA '20 Breakout Session 2

Generalist institutional repositories are essential for intentional open science

Kristi Holmes, PhD
Galter Health Sciences Library & Learning Center
Northwestern University Feinberg School of Medicine
August 14, 2020
@kristiholmes

https://pixabay.com/photos/macro-cogwheel-gear-engine-vintage-1452987/

The benefits of OPEN

Why Open Science: http://www.researchsupport.uct.ac.za/why-open-science

Research isn’t open for everyone

Illustration by Tom Dunne

Bahlai, C. A., et al. (2019). Open Science Isn't Always Open to All Scientists: Current efforts to make research more accessible and transparent can reinforce inequality within STEM professions. American Scientist, 107(2). Available at https://www.americanscientist.org/article/open-science-isnt-always-open-to-all-scientists

Figure: the research life cycle, with KNOWLEDGE at its center
• IDEATION: plan and design
• GENERATION: observe and experiment
• VALIDATION: analyze and interpret
• DISSEMINATION: report and share
• PRESERVATION: store and maintain
• PROVOCATION: connect and discover

National Academies of Sciences, Engineering, and Medicine. 2018. Open Science by Design: Realizing a Vision for 21st Century Research. Washington, DC: The National Academies Press. https://doi.org/10.17226/25116.

The generalist: A powerful tool for open science

next-generation!

The InvenioRDM project: an international, interdisciplinary collaboration to build a user-focused, turn-key, next-gen research repository. Available at https://inveniosoftware.org/products/rdm/

We’re leveraging Invenio as a strong foundation at NU. Here’s why.

Behind the scenes
▪ Discoverability. Leverages metadata standards, extensible and customizable metadata, and the powerful Elasticsearch full-text search engine, which retrieves, facets, sorts, and filters searches with ease.
▪ Scalability. Invenio is fast. Designed to manage 100+ million records and petabytes of files. All data can be archived independently of its size.
▪ Technology. Modular, widely supported underlying technology (Python, Flask). Invenio is JSON-native and provides RESTful APIs to make it easy to build apps on top of the framework.
▪ Ethical metrics. Industry-standard usage statistics for record pages, with all tracking completely anonymized.
▪ Easy. Turn-key research data management platform & index that can be easily deployed in the local environment. A SaaS model for service via TIND (CERN spinoff). Customize the look and feel to our local environment.
▪ A robust OS community: Large team of developers & active open source community.

User-focused features
▪ Research, shared. Securely share and preserve data records and a wide range of research types with collaborators, with easy dissemination to the community.
▪ Communities. Create and curate communities (e.g., workshop, project, lab, or journal).
▪ Integrates with best-practice tools and workflows. GitHub, Jupyter Notebooks, Binder, and more.
▪ Compliance-friendly. Comply with data sharing mandates* and acknowledge your funders.
▪ Get credit & be cited. Get a DOI to make records easily and uniquely citable. Pre-formatted citation text makes it easy to cite your work and be cited. Contributor roles allow recognition of the whole team.
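The “Get credit & be cited” feature pairs a DOI with pre-formatted citation text. The sketch below illustrates the basic idea under stated assumptions: the `Record` dataclass and `format_citation` function are hypothetical, not InvenioRDM's metadata schema or its citation formatter, the output follows a simple author-year pattern rather than any particular citation style, and the DOI is a made-up example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical minimal record; NOT the InvenioRDM metadata schema."""
    creators: list[str]          # "Family, Given" strings, in author order
    title: str
    year: int
    publisher: str
    doi: str                     # bare DOI such as "10.1234/example"
    version: str | None = None   # optional version label


def format_citation(rec: Record) -> str:
    """Return a simple author-year citation string for a record."""
    # Every contributor is listed, in order, so the whole team gets credit.
    authors = "; ".join(rec.creators)
    title = rec.title if rec.version is None else f"{rec.title} (Version {rec.version})"
    # Link the DOI through the doi.org resolver so the citation is actionable.
    return f"{authors} ({rec.year}). {title}. {rec.publisher}. https://doi.org/{rec.doi}"


if __name__ == "__main__":
    example = Record(
        creators=["Doe, Jane", "Roe, Richard"],
        title="Example dataset",
        year=2020,
        publisher="Example Repository",
        doi="10.1234/example.5678",  # made-up DOI for illustration
        version="1.0",
    )
    print(format_citation(example))
```

Keeping the formatter a pure function of the record means citation text can simply be regenerated whenever a new version is deposited and a new DOI is minted.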

*Draft NIH Policy for Data Management and Sharing: https://osp.od.nih.gov/wp-content/uploads/Draft_NIH_Policy_Data_Management_and_Sharing.pdf

Collaboration and discovery, globally

Next Generation Repository (NGR) behaviors & technologies

Behavior user stories: • “As a machine or human user, I need to easily and uniformly identify the licensing and re-use conditions of a scholarly resource, so that I know what I am allowed to do with it.”

Behavior user stories:
• “As a user, I want to know when one of my social media contacts added a document, someone commented on a paper in a feed I was subscribed to, an open review has been provided on a paper I have read, a new dataset has been attached to a paper I am watching, a paper has been published based on a dataset I have used, etc.”
• “As a user, I want to be able to discover and identify important people, relevant scientific methods, conference/journal/meetup venues, funding opportunities, etc. in my research field.”

Defining the next generation repository. COAR. Available at: https://ngr.coar-repositories.org

Commonly requested features

InvenioRDM features all of the following:

● Embargo
● Preview of images, documents, compressed files
● Handles any file type
● Handles large data (up to 50GB per upload)
● GitHub integration
● Versioning
● Support for multiple authentication types
● DOI minting
● API support
● ORCID sync-up
● Use API to export in JSON format, serializers
● Leverages controlled vocabularies (MeSH, FAST, etc.) and identifiers (DOI, ORCID, ROR, etc.)
● Metadata schemas can be extended and customized
● Automatic citation generation
● Extraction of metadata from files: size, name, file type, md5
● Featured collections

Data management for reproducibility and : study-focused resource types
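One item above is extracting metadata from uploaded files (size, name, file type, md5). A minimal sketch of that step follows, using only the Python standard library; `extract_file_metadata` and its output keys are hypothetical names, not InvenioRDM's API or its file metadata schema. The file is hashed in fixed-size chunks, which matters when a single upload can reach tens of gigabytes.

```python
from __future__ import annotations

import hashlib
import mimetypes
from pathlib import Path


def extract_file_metadata(path: str | Path, chunk_size: int = 1 << 20) -> dict[str, object]:
    """Return basic technical metadata for one file: name, size, type, md5.

    Key names are illustrative assumptions, not InvenioRDM's schema.
    """
    p = Path(path)
    # MD5 serves here only as an integrity checksum, not for security.
    digest = hashlib.md5(usedforsecurity=False)
    with p.open("rb") as fh:
        # Read in 1 MiB chunks so large uploads never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    # Guess the media type from the extension; fall back to a generic type.
    file_type, _encoding = mimetypes.guess_type(p.name)
    return {
        "name": p.name,
        "size": p.stat().st_size,  # in bytes
        "file_type": file_type or "application/octet-stream",
        "md5": digest.hexdigest(),
    }


if __name__ == "__main__":
    import sys

    # Print one metadata dictionary per file path given on the command line.
    for arg in sys.argv[1:]:
        print(extract_file_metadata(arg))
```

Guessing the type from the extension is a deliberate simplification; a production service might also inspect file contents before trusting the reported type.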

InvenioRDM helps record, store, manage and, if needed, share study outputs:
• Study-based resource types to manage a large range of assets
• Reproducibility is enhanced: store research proposals, datasets, code
• Be compliant with data sharing mandates
• Cite and attribute the work of all contributors to research
• Reuse deposited data or measures from other studies

Communities & Collections

Community: Define a research group, department, event, or other collaborative unit; official and ad hoc groups supported

Collection: Create multiple Collections under the umbrella of a Community. Collections bring together related groupings of files to communicate process, enable sharing of results, and support publication, compliance, and reproducibility. Resource types are customizable for each instance as needed.

Figure: this example page from Zenodo shows the approach; sample records highlight application in biomedicine. The example “XYZ Clinical Studies” collection lists resource types such as Research Proposals, Protocols, Data Management Plans, Study Methods Descriptions, Measures, Case Reports, Datasets and Analyses, Phenotype Definitions, Characterizations, Evaluations, Metadata, and Dissemination Strategy.

Generalist repositories: Making the most of important institutional assets, supporting intentional workflows

Translation to Practice: My team wants to find out about clinical trial opportunities to better offer patients all options for treatment. It is important to us to openly share the latest research with patients. The InvenioRDM communities give us a way to make these materials openly available, packaged in a cohesive and attractive manner. As resources are updated, we can upload the new versions and track access.

Basic Science: I lead a large basic science research group. We use the repository to support reproducible science by packaging our data and methods together. Everything gets a unique identifier and versioning is supported. Our lab prioritizes science communication, so the graduate students set up a collection of lay summaries of their research projects to enhance engagement and dissemination, tweeting back to the research summaries and other materials.

Population Health and Health Equity: Our multi-institution health equity project uses InvenioRDM to collaborate with our community-based partners and properly credit these collaborators. We can share materials from community health events, project materials, training materials, annual reports, and lay summaries of research. InvenioRDM helps us to be better partners, accountable to collaborators and the community. (ChicagoCHEC)

Pre-Clinical: We’re managing a large multi-site project, harmonizing data from numerous sources and managing research projects. We want to create communities of practice to integrate theories, data, techniques, and tools.

Early-Stage Investigators: I’m an early-career researcher just getting started on my research career. I want to “put my best foot forward” to showcase my work and demonstrate my expertise and collaborations. Our repository gives me a way to make all of my research efforts findable, and the metrics are helpful for reporting to leadership. The grant database was a huge help to me as I prepared my K proposal, and I submitted it successfully last week!

Clinical Trials: I am a clinical researcher. I need a way to pre-register protocols or research proposals, search on demographics of participants in similar studies, get insights into recruitment, and share portions of a study for compliance. I also want an easy way to share materials such as recruitment protocols, outreach materials, and lay summaries of the trials with the community in an organized way.

Dissemination: Our institute wants a way to publish and disseminate content like handbooks, lay summaries, and more. We want to credit all contributors and produce an attractive and interactive resource that can be easily updated.

Use cases and implementation in collaboration with the Northwestern University Clinical and Translational Sciences Institute [NCATS UL1TR001422], the Northwestern University Institute for Innovations in Developmental Sciences (DevSci), & the Chicago Cancer Health Equity Collaborative (ChicagoCHEC) [NCI U54CA202995, U54CA202997, and U54CA203000]

The community

https://inveniosoftware.org/products/rdm/

Resources & Links

➔ Official InvenioRDM site: https://inveniosoftware.org/products/rdm/
➔ Roadmap: https://inveniosoftware.org/products/rdm/roadmap/
➔ GitHub: https://github.com/inveniosoftware/invenio-app-rdm
➔ Documentation: https://inveniordm.docs.cern.ch/
➔ Project Boards: https://github.com/orgs/inveniosoftware/projects
➔ Releases: https://invenio-talk.web.cern.ch/c/projects/invenio-rdm/

InvenioRDM sandbox: https://inveniordm.web.cern.ch/
Documentation: https://inveniordm.docs.cern.ch/

Northwestern's (alpha!) Proof of Concept: http://bit.ly/inveniordm-at-nu
➔ Test Login: gla3975
➔ Password: InvenioRDM@NU_2019

Install your own instance: https://inveniordm.docs.cern.ch/
Get updates and announcements: https://bit.ly/invenioRDMinfo

Thank you!

Teams
● The Invenio team @ CERN
● InvenioRDM collaborators (here)
● Galter Health Sciences Library
● Northwestern University Clinical and Translational Sciences Institute
● Confederation of OA Repositories (COAR)
● Northwestern University Libraries
● The Northwestern University Institute for Innovations in Developmental Sciences
● Sara Gonzales

Support
Work presented here is supported in part by:
● CERN Knowledge Transfer Fund
● European Union’s Horizon 2020 research and innovation programme under the grant agreements OpenAIREplus, OpenAIRE2020, OpenAIRE-Connect & OpenAIRE-Advance
● NUCATS: UL1TR001422 (NCATS)
● CD2H: U24TR002306 (NCATS)
● ChicagoCHEC: U54CA202995, U54CA202997, U54CA203000 (NCI)
● Alfred P. Sloan Foundation
● Arcadia Fund
● all of the InvenioRDM project partners

Sign up to receive project updates or arrange a demo: https://bit.ly/invenioRDMinfo