The National Center for Genome Analysis Support 2019 Annual Report

ABI Sustaining: The National Center for Genome Analysis Support 2019 Annual Report Thomas G. Doak https://orcid.org/0000-0001-5487-553X Craig A. Stewart https://orcid.org/0000-0003-2423-9019 Robert Henschel https://orcid.org/0000-0003-2289-9398 Matthew Hahn https://orcid.org/0000-0002-5731-8808 Yuzhen Ye https://orcid.org/0000-0003-3707-3185 Indiana University PTI Technical Report PTI-TR19-003 September 10, 2019 Please cite as: Doak, T. G., Stewart, C. A., Henschel, R., Hahn, M., Yuzhen, Y. (2019). ABI Sustaining: The National Center for Genome Analysis Support 2019 Annual Report (PTI Technical Report). Indiana University, Bloomington, IN. Retrieved from http://hdl.handle.net/2022/25133. This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. Table of Contents ABI Sustaining: The National Center for Genome Analysis Support 2019 Annual Report ...........i Executive Summary ................................................................................................................... iii 1. Accomplishments ................................................................................................................1 1.1. What are the major goals of the project? ................................................................................ 1 1.2. What was accomplished under these goals? ........................................................................... 1 1.3. What opportunities for training and professional development has the project provided? .... 8 1.4. How have the results been disseminated to communities of interest? ................................... 9 1.5. What do you plan to do during the next reporting period to accomplish the goals? Goals for the next year ..................................................................................................................................... 14 2. Products ............................................................................................................................18 2.1. Products resulting from this project during the specified reporting period ........................... 18 3. Participants .......................................................................................................................21 3.1. Individuals ............................................................................................................................. 21 3.2. Partner organizations ............................................................................................................ 23 3.3. Have other collaborators or contacts been involved? ........................................................... 24 4. Impact ...............................................................................................................................24 4.1. What is the impact on the development of the principal discipline(s) of the project? .......... 25 4.2. What is the impact on other disciplines? ............................................................................... 25 4.3. What is the impact on the development of human resources? ............................................. 25 4.4. What is the impact on physical resources that form infrastructure? ..................................... 25 4.5. What is the impact on institutional resources that form infrastructure? .............................. 25 4.6. What is the impact on information resources that form infrastructure? ............................... 25 4.7. What is the impact on technology transfer? .......................................................................... 26 4.8. What is the impact on society beyond science and technology? ........................................... 26 5. Changes/Problems ............................................................................................................26 5.1. Changes in approach and reasons for change ........................................................................ 26 5.2. Actual or anticipated problems or delays and actions or plans to resolve them ................... 26 5.3. Changes that have significant impact on expenditures ......................................................... 26 5.4. Significant changes in use or care of human subjects ............................................................ 26 5.5. Significant changes in the use or care of vertebrate animals ................................................. 26 5.6. Significant changes in the use or care of biohazards ............................................................. 26 6. Appendices ........................................................................................................................26 Appendix 1. NCGAS National Workshop Results ............................................................................... 27 Appendix 2. List of NSF Funded Projects Using NCGAS Resources .................................................... 30 Appendix 3. Software Supported by NCGAS ..................................................................................... 45 7. Acknowledgements ...........................................................................................................87 i Table of Tables Table 1. List of all the NCGAS preconfigured virtual machines available in the Jetstream library. ........... 4 Table 2. Individuals who have worked on the NCGAS project. .............................................................. 21 Table 3. Partner organizations................................................................................................................ 23 Table 4. Self-assessed improvement of 11 skills before and after Introduction to R for Biologists Workshop ...................................................................................................................................... 29 Table 5. Self-assessed improvement of 11 skills before and after de novo Transcriptome Assembly and Analysis Workshop, averaged across first two iterations ................................................................ 29 Table 6. Genome analysis software packages available on Indiana University’s Carbonate, Karst clusters. ...................................................................................................................................................... 45 Table of Figures Figure 1. States and Territories with NCGAS Users, all services .............................................................. 2 Figure 2. World sources of hits to the NCGAS web page. ...................................................................... 10 Figure 3. US sources of hits to the NCGAS web page. ........................................................................... 10 Figure 3. Pageviews for the NCGAS Blog from September 1, 2018 through August 12, 2019. ............... 11 Figure 4. NCGAS twitter posts from September 2018 to August 15, 2019 with their engagement rate. ... 12 Figure 5. Facebook NCGAS page views from September 2018 to August 2019. .................................... 13 Figure 6. Users of the NCGAS-supported Trinity Galaxy instance worldwide ........................................ 16 Figure 7. Users of the NCGAS-supported IU GenePattern instance worldwide....................................... 17 Figure 8. Origin of NCGAS National Workshops applicants in US. ....................................................... 28 Figure 9. Origin of NCGAS National Workshops applicants in the World. ............................................. 28 ii Executive Summary The major goal of the NSF ABI Sustaining Award remains to support the continuing and expanding activities of the National Center for Genome Analysis (NCGAS). 1) Providing excellent bioinformatics consulting services to all NSF-funded researchers in need. 2) Maintaining, supporting, and delivering genome assembly and analysis software on national cyberinfrastructure (CI) systems. 3) Providing education and outreach programs on genome analysis and assembly, including designing genomics experiments, using best-of-breed software and hardware tools, and interpreting data. 4) Disseminating tools for genome assembly and analysis in forms useable by biologists. 5) Providing long-term archival storage for genome biologists. NCGAS emphasizes genome, transcriptome, and microbiome assembly at the technically challenging end of the spectrum of current bioinformatics—for example de novo assembly—where both specialized computational resources and applications are needed. NCGAS has just been awarded its second three-year Sustaining award, DBI-1759906, which started in September 2018 (PIs Doak, Henschel, Stewart, Hahn, Ye). Pittsburgh Supercomputing Center (PSC) and PI Blood are again on a collaborative award. The current award has just started a year of no-cost extension. Accomplishments: NCGAS serves researchers in all 50 states and Puerto Rico, including 17 EPSCoR states, and has users around the world. NCGAS supported 250 biologists using genome analysis (50 named new allocations) during the current year, including 266 short consultations and 38 extended consultations. Highlights detailed within the report include: The maintenance of available suites of software and addition of new ones as new methods of sequencing and analysis become available Use of the Jetstream cloud environment to disseminate applications Outreach, direct consultation, and training activities Very positive results from the 2018 user survey Professional growth

The National Center for Genome Analysis Support 2019 Annual Report

Open Source Copyrights

Nosql Databases

Easybuild Documentation Release 20210907.0

Towards a Fully Automated Extraction and Interpretation of Tabular Data Using Machine Learning

A Multilingual Information Extraction Pipeline for Investigative Journalism

Introduction to Openrefine

Issue Editor

Please Stand by for Realtime Captions. My Name Is Jamie Hayes I Am Your

Mike Bolam Metadata Librarian Digital Scholarship Services University Library System [email protected] // 412-648-5908 Assessment Survey

Report of Contributions

A Component-Based Approach to Traffic Data Wrangling

Utilizing AI/ML Methods for Measuring Data Quality Student: Bc