1/25/2019

Involving the Research Community in Biocuration of Genes and Pathways

Biocuration of genes and pathways involves the creation of a user-friendly narrative of biological information, based on review, analysis and systematic organization of data, using manual and semi-automated methods.

Sushma Naithani
Department of Botany and Plant Pathology
Oregon State University
[email protected]

Plant and Animal Genome XXVII, San Diego, Jan 15, 2019

Plant in the grand scheme of genomics science

An example of a curated rice reference pathway in the Plant Reactome

Data generation
• Sequence data
• Proteomes
• Genotyping
• Phenotype

Knowledgebase
• Mining
• Synthesis
• Visualization

Handling
• Cyberinfrastructure
• Analysis
• Storage
• Annotation
• Metadata

Data impacts
• Hypothesis
• Translational research

[Figure: curated pathway "Plant’s response to biotic stimuli: Fungi and"]


Plant Reactome: an open resource for the data community

The scope of pathway curation


Biocuration is one of the bottlenecks in making genomic data FAIR
Finding a common framework to depict pathways
[Figure: PubMed search results for anthocyanin biosynthesis. Number of publications per year, 1980 to 2018; 499 items in 2018.]

Involving the community in biocuration
• Expertise
• Time
• Training


Design of a workshop
• Tool Centric: Teach how to use biocuration tools
  Usually workshops are OK. But, hardly anyone returns with a curated pathway!
• Knowledge Centric (process of curation)
  – gathering data
  – evaluating evidence
  – synthesizing knowledge
  (Forget about tools)

FAIR is a Fairy tale!
There is very little scientific information that meets FAIR standards


Strategy and workflow of the pathway curation
Task 1: Selection of the articles

Every fairy tale needs:
‘Magic’ secret recipes (SOPs for gene and pathway curation)
+ Alliances
You can be a part of this ‘fairy tale’


Task 2: Critical review of the literature
A core strength of Biologists
Evaluation of the data and synthesis of knowledge
– two-hybrid assay
– Co-immunoprecipitation
– Mutant and transgenic studies
– Quantitative trait locus (QTL) mapping
– Isotope-coded affinity tagging (ICAT)
– Predicted protein-protein interactions
– Expression clustering techniques
– Literature-mining for specified interactions
– Green fluorescent protein (GFP) tagging

Biocuration in Reactome Curator Tool
A caricature by Kara (sponsored by DNA Link)


View of a Plant Reactome pathway

We do not excel using only Excel!
But, it could be a stepping stone…
• Standard Gene IDs
• UniProt IDs
• Subcellular location
• Membrane association (TMM)
• Reaction / Pathway
• Summary with citation

Task 3: Data collection
• genes
• functions
• cellular location
• associated reactions
• associated pathway(s)
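The data collected in Task 3 can be kept in a plain table before it goes into a curation tool. The sketch below is one possible way to do that in Python; the class name, field names and placeholder values are assumptions made for illustration and are not part of the Plant Reactome workflow.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class CurationRecord:
    """One row of collected data (hypothetical field names)."""
    gene_id: str               # standard gene ID
    uniprot_id: str            # UniProt ID, empty if not yet known
    function: str
    subcellular_location: str
    reaction: str              # associated reaction
    pathway: str               # associated pathway
    summary: str               # short summary of the evidence
    citation: str              # source article for the summary


def missing_fields(record):
    """Return the names of fields the curator still has to fill in."""
    return [name for name, value in asdict(record).items() if not value.strip()]


def to_tsv(records):
    """Serialize records as tab-separated text with a header row."""
    buffer = io.StringIO()
    names = [f.name for f in fields(CurationRecord)]
    writer = csv.DictWriter(buffer, fieldnames=names, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder values only; this is not curated data.
example = CurationRecord(
    gene_id="GENE_ID_PLACEHOLDER",
    uniprot_id="",
    function="example function",
    subcellular_location="cytosol",
    reaction="example reaction",
    pathway="example pathway",
    summary="example summary",
    citation="",
)

print(missing_fields(example))
print(to_tsv([example]))
```

A tab-separated file like this opens directly in Excel, so it can serve as the stepping stone while making incomplete rows easy to spot.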


Task 4: Connecting dots
• imagining reactions
• building pathways
Reactome Data Model

Community Pathway Curation Jamboree 2018 at Oregon State University


Curation outcome from community curators
• We curated data from 7 research articles
• Extracted a list of 300 genes
• Curated two pathways, gathered material for 3 pathways
• 1 opinion article (under review in DATABASE)
Naithani et al. Involving community in genes and pathway curation. Database (2019) Vol. 2019: article ID bay146; doi:10.1093/database/bay146

DO NOT DRAW in PowerPoint!
[Figure: network of rice genes and transcription factors (for example OsWOX5, OsDREB2, OsVP1, OsMPK5, OsABF1, OsHsfA7) linked to heat, drought, salinity, cold, submergence and biotic stress and to the hormones ABA, ethylene and GA. Key: induced, suppressed, positively regulated by a TF, hormone, enzymes.]

Other option for drawing pathways & gene networks
http://www.wikipathways.org/

Learning outcome for the students
• Critical review of literature
• The value of consistency in gene nomenclature
• Integration of information from various sources
• Data organization
• How to build data-driven hypotheses
• Why ontologies are useful
• Genomic resources are not perfect and are a work in progress.
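The point about consistent gene nomenclature can be illustrated with a small check that separates standard locus IDs from gene symbols or damaged entries. This is only a sketch: it assumes rice locus IDs in the RAP-DB style ("Os", a two-digit chromosome number, "g", then seven digits), and the function name and sample strings are hypothetical.

```python
import re

# Assumed format: "Os" + chromosome 01-12 + "g" + seven digits.
RAP_LOCUS = re.compile(r"^Os(0[1-9]|1[0-2])g\d{7}$")


def split_by_format(ids):
    """Split a list of strings into (well-formed locus IDs, everything else)."""
    ok, flagged = [], []
    for raw in ids:
        candidate = raw.strip()
        if RAP_LOCUS.match(candidate):
            ok.append(candidate)
        else:
            flagged.append(candidate)
    return ok, flagged


# Sample strings for illustration, not curated data.
sample = ["Os01g0136100", "Os11g0547000", "OsWOX5", "Os01g09528"]
ok, flagged = split_by_format(sample)
print("well-formed:", ok)
print("needs review:", flagged)
```

Entries in the second list are not necessarily wrong (a gene symbol is a valid name), but they would need to be mapped to a standard ID before a gene list from several articles can be merged.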


Online collaborators for gene and pathway curation
• Prof. Ashwani Pareek’s Group (JNU, Delhi, India)
• Dr. Snehlata Pareek’s Group (ICGEB Delhi, India)
• Dr. Bijiyalakshmi Mohanty, University of Singapore

Drawing gene-gene networks using PathVisio


Conclusions

• If incorporated in the graduate school curriculum, biocuration training could be beneficial to students and simultaneously increase the community’s contribution to biocuration of public databases.

• The investment by various stakeholders (academia, industry, educators, scientific societies, and publishers) in engaging and training the broader research community in biocuration will provide a sustainable and quality solution for keeping pace with the Big Data explosion.


Acknowledgements

Oregon State University
• Pankaj Jaiswal (Co-PI)
• Justin Preece (Software Dev)
• Sushma Naithani (Curation & outreach)
• Parul Gupta (Curation)
• Justin Elser (Software Dev)
• Priyanka Garg (Curation)

Cold Spring Harbor Laboratory
• Doreen Ware (PI)
• Andrew Olson (Search integration)
• Marcela K. Tello-Ruiz (Project Coordinator)

European Bioinformatics Institute
• Antonio Fabregat Mundo (Reactome Dev)
• Irene Papatheodorou (ATLAS)
• Alfonso Muñoz-Pomer Fuentes
• IntAct

NYU Langone Medical Center
• Peter D’Eustachio (Curation mentor)

Ontario Institute for Cancer Research
• Lincoln Stein (Reactome PI)
• Robin Haw
• Joel Weiser
• Guanming Wu

Source data providers & collaborators
• Araport
• BAR
• TreeBase
• SoyBase
• MaizeGDB
• PeanutBase
• Phytozome
• Legume information System
• Planteome
• WikiPathways

Funding: Gramene - Exploring Function through Comparative Genomics and Network Analysis (NSF IOS 1127112)

Thank you!
