Investigator Toolkit

Dave Vieglais, University of Kansas
Andrew Sallans, University of Virginia
DUG – July 2013

1 Communal Notepad

http://epad.dataone.org/dug2013-BK2-Tools

2 Investigator Toolkit (ITK)

3 Gateway for Researchers

4 Adapt Tools to DataONE Services

Investigator Toolkit: Analysis, Visualization, Data Discovery, Data Management
Components: Java Library, Python Library, CLI Tools, REST URLs

• Reduce stovepipe systems
  • Use tools with any Member Node
• Improve efficiency
  • Work together on Open Source
  • Reduce redundant efforts, better results possible
• Build community
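
The "REST URLs" layer named on this slide can be illustrated with a minimal sketch. It assumes the version 1 REST paths of a Coordinating Node (`/object/`, `/meta/`, `/resolve/`) under the base URL shown; the base URL, function names, and sample identifier are assumptions for illustration, not taken from the slides. The sketch only builds URLs and makes no network calls.

```python
from urllib.parse import quote

# Assumption: base URL of a Coordinating Node REST service (version 1 API).
CN_BASE = "https://cn.dataone.org/cn/v1"


def _encode(pid):
    """Percent-encode an identifier so '/', ':' and spaces survive in a URL path."""
    return quote(pid, safe="")


def object_url(pid, base=CN_BASE):
    """URL for the bytes of the object identified by pid."""
    return f"{base}/object/{_encode(pid)}"


def meta_url(pid, base=CN_BASE):
    """URL for the system metadata of the object identified by pid."""
    return f"{base}/meta/{_encode(pid)}"


def resolve_url(pid, base=CN_BASE):
    """URL that asks where copies of the object can be retrieved."""
    return f"{base}/resolve/{_encode(pid)}"


if __name__ == "__main__":
    sample = "doi:10.1234/AA/x"  # hypothetical identifier
    for build in (object_url, meta_url, resolve_url):
        print(build(sample))
```

Because every tool builds the same URLs, the same client code can in principle be pointed at any Member Node by changing only the base URL.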

5 Informing Priorities: Assessments

Stakeholders: Data Manager, Scientist, Libraries/Librarians
Assessment activities: stakeholder surveys, persona and scenario development, cyberinfrastructure development, usability testing, external assessments / surveys

6 DUG Surveys: 86 Tools Identified

Access, ArcGIS, Brahms, C/C++ library, Command Line, Cyberintegrator, DataONE Drive, DataTurbine, Density, Distance, DMP Tool, Drupal, Earth System Curator, Earth Sys. Mod Fr., ENVI/IDL, ERDAS/Imagine, ERDDAP, EstimateS, Excel, Fdiversity, Ferret, FishR,
Fortran library, Fragstats, GBIF IPT, Geoportal Server, Giovanni, Google Forms, Google Fusion Tables, Google Spreadsheet, Grads, GRASS, GridFTP, HydroDesktop, HydroExcel, HydroR, HydroTagger, IDRISI, IDV/McIDAS, Java library, JDBC/ODBC, JMP, Kepler, KPHP,
LiveAccessServer, MapServer, Mathematica, Matlab, Mendeley, Metavist, Metawin, ModelCenter, MODIS subset tool, Morpho, MP Parser, MycoDB, myExperiment, Nature's Notebook, OPeNDAP, Oxygen XML Editor, Panoply, PC-ORD, Perl library, PHP, PlantList/TRY, Presence,
Primer, Python library, QuantumGIS, RDV, SAS, SciFlow, Specify, Spyder, SQLServer, Stata, Systat, Taverna, Thredds, Thredds Server, UDig, VisTrails, Visual Fox Pro, Web search, Zotero

7 DUG Surveys: ITK Priorities

9 DataONE Drive; 9 R; 7 ArcGIS
5 R; 2 Excel; 3 R
4 Python library; 2 Java library; 2 Geoportal
4 Java library; 2 PHP; 2 OPeNDAP
3 IDL; 2 Python library; 2 Python library
2 ArcGIS; 1 QuantumGIS; 1 Java library
1 Geoportal Server; 1 Matlab; 1 Fortran library
1 Mendeley; 1 DMP Tool; 1 Metavist
1 Matlab; 1 MP Parser; 1 MetaVist
1 Spyder; 1 Thredds
1 VisTrails

86 Total Tools Identified

8 Investigator Toolkit

9 Tools Across the Data Life Cycle

[Figure: the data life cycle stages (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze), with tool labels including "Drive"]

12 DMPTool2

Andrew Sallans
Head of Strategic Data Initiatives, University of Virginia Library
Co-Lead of DMPTool and PM

Overview of DMPTool 1

• Free
• Workflow for creating a DMP
• Helps meet funder requirements
• Supplies questions
• Includes explanation/context provided by the agency
• Provides links to the agency website
• Institutional integration via Shibboleth
• Local guidance
• Pointing to resources and staff

Data management planning made EASIER

• Recognition that data management is complex and requires a dialogue amongst many stakeholders
• Recognition that there’s a range of understanding and available support resources
• DMPTool focus is on simplifying and scaling the common parts, developing a community, and providing functionality to advance services when possible

“Data Management Planning Tool 2: Responding to the Community”

• Funded by the Alfred P. Sloan Foundation
• $590K for 12 months, roughly 2013
• PI: Trisha Cruse
• Co-PI: Andrew Sallans, Sarah Shreeves

Who                       Role
Stephen Abrams (CDL)      Technical Lead
Marisa Strong (CDL)       Development Lead
Scott Fisher (CDL)        UI Implementation
Sherry Lake (UVA)         Content Analyst
TBD (CDL)                 Application Developer 1
TBD (CDL)                 Application Developer 2
TBD (UIUC)                Community / Requirements Builder
Tao Zhang (UVA/Purdue)    UI Interaction/Design Manager

“Improving Data Stewardship with the DMPTool: Empowering Libraries to Seize Data Management Education”

• Funded by IMLS
• 12 month grant, 2012/2013
• PI: Trisha Cruse
• Lead: Carly Strasser
• Goals:
  • Strengthen the roles of libraries by promoting use of the DMPTool in the data management space
  • Enhance the DMPTool’s efficacy at promoting good data stewardship and sound data management planning.

Advisory groups

• Two groups have been established
  • Researcher Advisory Board: https://bitbucket.org/dmptool/main/wiki/ResearcherAdvisoryBoard
  • Administrative User Advisory Board: https://bitbucket.org/dmptool/main/wiki/AdministrativeUserAdvisoryBoard

NEW FUNCTIONALITY IN DMPTOOL2

23 Planned Functionality for Researchers

• Create, update, delete, or retract (once public) plans
• Create a plan by copying an existing plan
• Include collaborators on plans
• Share plans with specific group or with public
• Send plans to be reviewed
• Export plans

24 Planned Functionality for Institutions

• Brand the tool
• Create their own requirements template (e.g. for a locally required DMP)
• Add and edit resources through interface
• Require plans to be reviewed
• Review plans – approve and request edits
• Export all DMPs for data mining purposes

25 See the Wireframes

• View the wireframes yourself here: http://ux.cdlib.org/mstrong/dmptool2_wf/
• Send any feedback or comments directly to me: [email protected]

26 Current Status of DMPTool2 Project

Project Task                                      Planned Deadline
Functional Requirements Review                    May 2013
Wireframes Review                                 May 2013
Functionality Development                         August 2013
API Implementation                                September 2013
Production Release with documentation             October 2013
Continued recruitment of institutions and users   Throughout – target to double by December 2013

27 Questions

1. How often are you reusing a DMP that you created for a previous grant proposal?
2. How often would you use the tool in a year?
3. How important are guidance and local resources to your DMP creation process?
4. How often are you collaborating on the creation of a DMP?
5. How can we demonstrate benefit to the researcher from using the DMPTool?
6. Should we put more emphasis on quality (i.e. guidance, workflow, templates, etc.) or quantity (i.e. number of institutions, users, plans, etc.)?

29 Describe: DataUp

• Extension for Microsoft Excel
• Widely used for management of simple data
• Promote best practices for spreadsheet data
• 65% of data in DataONE is readable by Excel

30 Flexibility Detracts from Reusability

31 DataUp Solution

1. Guide user to clean spreadsheet
   • Remove spurious formatting
   • Promote naming conventions
   • Promote consistency in spreadsheet layout
2. Gather metadata describing spreadsheet
   • Stored in a new worksheet
   • Expressed as EML
3. Apply a unique identifier to the spreadsheet
4. Upload the content to DataONE
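
The first three steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not DataUp's implementation: the naming rule, the function names, and the simplified EML-like record (which is not schema-valid EML) are all assumptions. Step 4, the upload, is left out because the slides do not describe the call used.

```python
import re
import uuid
import xml.etree.ElementTree as ET

# Hypothetical naming convention: a letter, then letters, digits, or underscores.
NAME_RULE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def check_headers(headers):
    """Step 1: return a list of problems found in the column headers."""
    problems = []
    seen = set()
    for position, name in enumerate(headers, start=1):
        if not name.strip():
            problems.append(f"column {position}: empty header")
        elif not NAME_RULE.match(name):
            problems.append(f"column {position}: '{name}' breaks the naming rule")
        if name in seen:
            problems.append(f"column {position}: duplicate header '{name}'")
        seen.add(name)
    return problems


def describe(title, headers):
    """Step 2: build a simplified, EML-like metadata record for the sheet."""
    root = ET.Element("eml")
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    attribute_list = ET.SubElement(dataset, "attributeList")
    for name in headers:
        attribute = ET.SubElement(attribute_list, "attribute")
        ET.SubElement(attribute, "attributeName").text = name
    return root


def assign_identifier(metadata):
    """Step 3: attach a unique identifier (a random UUID URN in this sketch)."""
    identifier = uuid.uuid4().urn
    metadata.set("packageId", identifier)
    return identifier


if __name__ == "__main__":
    headers = ["site_id", "Temp (C)", "site_id"]
    print(check_headers(headers))
    record = describe("Example sheet", ["site_id", "temp_c"])
    print(assign_identifier(record))
    print(ET.tostring(record, encoding="unicode"))
```

Reporting problems as a list, rather than silently renaming columns, matches the "guide the user" wording of step 1: the user decides how to fix each issue.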

32 DataUp Add-In

33 DataUp: Check Data

34 DataUp: Add Metadata

35 DataUp: Publish to DataONE

36 DataUp: Next Steps

• Current state:
  • Functional with opportunities for improvement
  • Open source on Bitbucket: https://bitbucket.org/dataup
• Recently awarded NSF supplement
  • Correct and refine
  • Improve web version (cross platform support)
• Actively investigating further opportunities

38 Discover

• Data discovery portal
• Search and retrieval of content indexed by DataONE
• Collates metadata across all Member Nodes
• Operates on each Coordinating Node
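
A search against the index described above might be expressed as a query URL. The sketch below assumes a Solr-style query endpoint on a Coordinating Node and index fields named `id`, `title`, and `author`; the endpoint, field names, and function name are assumptions for illustration. It only builds the URL and does not contact any server.

```python
from urllib.parse import urlencode

# Assumption: Solr-style query endpoint of a Coordinating Node (version 1 API).
QUERY_ENDPOINT = "https://cn.dataone.org/cn/v1/query/solr/"


def search_url(text, rows=10, endpoint=QUERY_ENDPOINT):
    """Build a URL that searches titles for a phrase and returns JSON."""
    phrase = text.replace('"', '\\"')  # escape quotes inside the phrase
    params = {
        "q": f'title:"{phrase}"',  # assumed index field: title
        "fl": "id,title,author",   # assumed fields to return
        "rows": rows,              # maximum number of results
        "wt": "json",              # response format
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(search_url("soil moisture", rows=5))
```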

39 Workflow

?

40 ONEDrive

• Workspace defines network drive contents
• Select by specific objects or queries
• Web interface to select targets, refine queries
• Access content with unmodified applications
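
The idea that a workspace folder is defined by specific objects plus saved queries can be sketched with a small data structure. Everything here is hypothetical (the `Folder` class, the keyword matching, and the in-memory catalog standing in for the search index); the slides do not describe how ONEDrive stores or resolves a workspace.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    """A workspace folder: hand-picked objects plus saved queries (hypothetical)."""
    name: str
    object_ids: list = field(default_factory=list)  # objects selected one by one
    queries: list = field(default_factory=list)     # saved keyword queries


def list_folder(folder, catalog):
    """Return the sorted identifiers the folder would present as files.

    `catalog` maps identifier -> title and stands in for the search index.
    Identifiers that are not in the catalog are dropped.
    """
    selected = set(folder.object_ids)
    for query in folder.queries:
        needle = query.lower()
        for pid, title in catalog.items():
            if needle in title.lower():
                selected.add(pid)
    return sorted(pid for pid in selected if pid in catalog)


if __name__ == "__main__":
    catalog = {"a": "Soil moisture 2010", "b": "Bird counts", "c": "Soil carbon"}
    folder = Folder("soil", object_ids=["b"], queries=["soil"])
    print(list_folder(folder, catalog))
```

Resolving queries at listing time, rather than storing their results, means new matching content would appear in the folder without the user editing the workspace.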

[Filters] Network

41 ONEDrive - root

45 Workspace content

Predefined Views

User Defined Folders

47 Predefined Views in Folders

48 Author List

49 Open Data

50 Regional Hierarchy

51 Temporal Coverage

52 Temporal Coverage

53 Temporal Coverage
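
A temporal coverage view could be produced by grouping records into folders by decade and year. The sketch below is one possible layout, chosen for illustration; the decade/year folder naming, the record tuples, and the function name are assumptions, since the slides show the view only as screenshots.

```python
from collections import defaultdict


def temporal_view(records):
    """Map 'decade/year' folder paths to the identifiers covering that year.

    `records` is an iterable of (identifier, begin_year, end_year) tuples.
    A record appears in every year its coverage spans, inclusive.
    """
    tree = defaultdict(list)
    for pid, begin_year, end_year in records:
        for year in range(begin_year, end_year + 1):
            decade = year - year % 10
            tree[f"{decade}s/{year}"].append(pid)
    return {path: sorted(pids) for path, pids in sorted(tree.items())}


if __name__ == "__main__":
    sample = [("a", 1999, 2001), ("b", 2000, 2000)]  # hypothetical records
    for path, pids in temporal_view(sample).items():
        print(path, pids)
```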

54 Taxonomic Hierarchy

55 Taxonomic Hierarchy

56 ONEDrive: Next Steps

Three main focus areas:
1. Refine the content views
   • Improve client side rendering of results
2. Enable online workspace management
   • Integrate with ONEMercury
   • Integrate with identity management
   • Integrate with content use reporting
3. Augment search index (e.g. taxonomic, spatial)
   • Utilize external services for lookups
   • Improvements feed back to other systems

57 Next Steps

• Finalize and release initial versions
• Gather feedback
• Iterate on designs for the next versions
• Develop and release
• Emerging features:
  • Enhanced discovery through semantics
  • Tracking Morpho

58 Questions?

“To understand recursion, it is first necessary to understand recursion.”
