Investigator Toolkit
Dave Vieglais, University of Kansas
Andrew Sallans, University of Virginia
DUG – July 2013
Communal Notepad
http://epad.dataone.org/dug2013-BK2-Tools
Investigator Toolkit (ITK)
Gateway for Researchers
Adapt Tools to DataONE Services
[Diagram: the Investigator Toolkit connects tools for analysis and visualization, data discovery, and data management to DataONE through a Java library, a Python library, CLI tools, and REST URLs]
• Reduce stovepipe systems
• Use tools with any Member Node
• Improve efficiency
• Work together on Open Source
• Reduce redundant efforts, better results possible
• Build community
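One way to picture "use tools with any Member Node": if every node exposes the same REST URL layout, a tool only needs the node's base URL. The sketch below assumes a v1-style layout in which an object is retrieved from <base URL>/v1/object/<identifier>; the node addresses and the identifier are made up for illustration.

```python
from urllib.parse import quote


def object_url(node_base_url, pid):
    """Build the retrieval URL for one object on one node (assumed URL layout)."""
    base = node_base_url.rstrip("/")
    # Identifiers may contain "/", ":" or spaces, so encode every reserved character.
    return f"{base}/v1/object/{quote(pid, safe='')}"


# Hypothetical Member Node base URLs; the same function serves both.
NODES = ["https://mn-a.example.org/mn", "https://mn-b.example.org/d1/mn/"]
PID = "example/data set 1"  # hypothetical identifier

urls = [object_url(node, PID) for node in NODES]
for url in urls:
    print(url)
```

Because the node address is just a parameter, nothing in the tool is tied to a single repository.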
Informing Priorities: Assessments
[Diagram: stakeholders (data managers, scientists, libraries/librarians) inform cyberinfrastructure development through stakeholder surveys, persona and scenario development, usability testing, and external assessments / surveys]
DUG Surveys: 86 Tools Identified
Access, ArcGIS, Brahms, C/C++ library, Command Line, Cyberintegrator, DataONE Drive, DataTurbine, Density, Distance, DMP Tool, Drupal, Earth System Curator, Earth Sys. Mod Fr., ENVI/IDL, ERDAS/Imagine, ERDDAP, EstimateS, Excel, Fdiversity, Ferret, FishR,
Fortran library, Fragstats, GBIF IPT, Geoportal Server, Giovanni, Google Forms, Google Fusion Tables, Google Spreadsheet, Grads, GRASS, GridFTP, HydroDesktop, HydroExcel, HydroR, HydroTagger, IDRISI, IDV/McIDAS, Java library, JDBC/ODBC, JMP, Kepler, KPHP,
LiveAccessServer, MapServer, Mathematica, Matlab, Mendeley, Metavist, Metawin, ModelCenter, MODIS subset tool, Morpho, MP Parser, MycoDB, myExperiment, Nature's Notebook, OPeNDAP, Oxygen XML Editor, Panoply, PC-ORD, Perl library, PHP, PlantList/TRY, Presence,
Primer, Python library, QuantumGIS, R, RDV, SAS, SciFlow, Specify, Spyder, SQLServer, Stata, Systat, Taverna, Thredds, Thredds Server, UDig, VisTrails, Visual Fox Pro, Web search, Zotero
DUG Surveys: ITK Priorities
9 DataONE Drive; 9 R; 7 ArcGIS
5 R; 2 Excel; 3 R
4 Python library; 2 Java library; 2 Geoportal
4 Java library; 2 PHP; 2 OPeNDAP
3 IDL; 2 Python library; 2 Python library
2 ArcGIS; 1 QuantumGIS; 1 Java library
1 Geoportal Server; 1 Matlab; 1 Fortran library
1 Mendeley; 1 DMP Tool; 1 Metavist
1 Matlab; 1 MP Parser; 1 MetaVist
1 Spyder; 1 Thredds
1 VisTrails
86 Total Tools Identified
Investigator Toolkit
Tools Across the Data Life Cycle
[Diagram: the data life cycle stages Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze, with "Drive" shown twice as a tool label]
DMPTool2
Andrew Sallans
Head of Strategic Data Initiatives, University of Virginia Library
Co-Lead of DMPTool and PM
Overview of DMPTool 1
• Free
• Workflow for creating a DMP
• Helps meet funder requirements
• Supplies questions
• Includes explanation/context provided by the agency
• Provides links to the agency website
• Institutional integration via Shibboleth
• Local guidance
• Pointing to resources and staff
Data management planning made EASIER
• Recognition that data management is complex and requires a dialogue among many stakeholders
• Recognition that there’s a range of understanding and available support resources
• DMPTool focus is on simplifying and scaling the common parts, developing a community, and providing functionality to advance services when possible
“Data Management Planning Tool 2: Responding to the Community”
• Funded by the Alfred P. Sloan Foundation
• $590K for 12 months (roughly 2013)
• PI: Trisha Cruse
• Co-PIs: Andrew Sallans, Sarah Shreeves
Who                      Role
Stephen Abrams (CDL)     Technical Lead
Marisa Strong (CDL)      Development Lead
Scott Fisher (CDL)       UI Implementation
Sherry Lake (UVA)        Content Analyst
TBD (CDL)                Application Developer 1
TBD (CDL)                Application Developer 2
TBD (UIUC)               Community / Requirements Builder
Tao Zhang (UVA/Purdue)   UI Interaction/Design Manager
“Improving Data Stewardship with the DMPTool: Empowering Libraries to Seize Data Management Education”
• Funded by IMLS
• 12-month grant, 2012/2013
• PI: Trisha Cruse
• Lead: Carly Strasser
• Goals:
  • Strengthen the roles of libraries by promoting use of the DMPTool in the data management space
  • Enhance the DMPTool’s efficacy at promoting good data stewardship and sound data management planning
Advisory groups
• Two groups have been established
• Researcher Advisory Board: https://bitbucket.org/dmptool/main/wiki/ResearcherAdvisoryBoard
• Administrative User Advisory Board: https://bitbucket.org/dmptool/main/wiki/AdministrativeUserAdvisoryBoard
NEW FUNCTIONALITY IN DMPTOOL2
Planned Functionality for Researchers
• Create, update, delete, or retract (once public) plans
• Create a plan by copying an existing plan
• Include collaborators on plans
• Share plans with a specific group or with the public
• Send plans to be reviewed
• Export plans
Planned Functionality for Institutions
• Brand the tool
• Create their own requirements template (e.g. for a locally required DMP)
• Add and edit resources through the interface
• Require plans to be reviewed
• Review plans: approve and request edits
• Export all DMPs for data mining purposes
See the Wireframes
• View the wireframes yourself here: http://ux.cdlib.org/mstrong/dmptool2_wf/
• Send any feedback or comments directly to me: [email protected]
Current Status of DMPTool2 Project
Project Task                                      Planned Deadline
Functional Requirements Review                    May 2013
Wireframes Review                                 May 2013
Functionality Development                         August 2013
API Implementation                                September 2013
Production Release with documentation             October 2013
Continued recruitment of institutions and users   Throughout; target to double by December 2013
Questions
1. How often are you reusing a DMP that you created for a previous grant proposal?
2. How often would you use the tool in a year?
3. How important are guidance and local resources to your DMP creation process?
4. How often are you collaborating on the creation of a DMP?
5. How can we demonstrate benefit to the researcher from using the DMPTool?
6. Should we put more emphasis on quality (i.e., guidance, workflow, templates, etc.) or quantity (i.e., number of institutions, users, plans, etc.)?
Describe: DataUp
• Extension for Microsoft Excel
• Excel is widely used for management of simple data
• Promote best practices for spreadsheet data
• 65% of data in DataONE is readable by Excel
Flexibility Detracts from Reusability
DataUp Solution
1. Guide user to clean spreadsheet
  • Remove spurious formatting
  • Promote naming conventions
  • Promote consistency in spreadsheet layout
2. Gather metadata describing spreadsheet
  • Stored in a new worksheet
  • Expressed as EML
3. Apply a unique identifier to the spreadsheet
4. Upload the content to DataONE
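The first three steps can be sketched in a few lines of Python. This is an illustration, not DataUp's implementation: the header naming rule, the function names, and the sample table are assumptions, and the metadata record only borrows a few EML-style element names without being schema-valid EML. The upload step is left out.

```python
import csv
import io
import re
import uuid
import xml.etree.ElementTree as ET

# Assumed naming convention: a letter followed by letters, digits, or underscores.
HEADER_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def check_table(rows):
    """Step 1: return a list of problems found in the header and row layout."""
    if not rows:
        return ["table is empty"]
    header = rows[0]
    problems = [
        f"header {name!r} breaks the naming convention"
        for name in header
        if not HEADER_RE.match(name)
    ]
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no} has {len(row)} cells, expected {len(header)}"
            )
    return problems


def new_identifier():
    """Step 3: mint a unique identifier (here a random UUID URN)."""
    return f"urn:uuid:{uuid.uuid4()}"


def describe_table(title, header, identifier):
    """Step 2: build a simplified, EML-style metadata record as an XML string."""
    root = ET.Element("eml", {"packageId": identifier})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    attribute_list = ET.SubElement(
        ET.SubElement(dataset, "dataTable"), "attributeList"
    )
    for name in header:
        attribute = ET.SubElement(attribute_list, "attribute")
        ET.SubElement(attribute, "attributeName").text = name
    return ET.tostring(root, encoding="unicode")


# Hypothetical spreadsheet content, exported as CSV.
SAMPLE = "site_id,Air Temp (C),count\nA1,21.5,3\nA2,19.0\n"
rows = list(csv.reader(io.StringIO(SAMPLE)))
problems = check_table(rows)
identifier = new_identifier()
metadata = describe_table("Example plot survey", rows[0], identifier)
for problem in problems:
    print(problem)
```

The sample deliberately contains one badly named column and one short row, so the check reports both before any metadata is written.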
DataUp Add-In
DataUp: Check Data
DataUp: Add Metadata
DataUp: Publish to DataONE
DataUp: Next Steps
• Current state:
  • Functional with opportunities for improvement
  • Open source on Bitbucket: https://bitbucket.org/dataup
• Recently awarded NSF supplement
  • Correct and refine
  • Improve web version (cross platform support)
• Actively investigating further opportunities
Discover
• Data discovery portal
• Search and retrieval of content indexed by DataONE
• Collates metadata across all Member Nodes
• Operates on each Coordinating Node
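A search against the Coordinating Node index could be driven by a query URL along these lines. This is a sketch only: the endpoint path, the Solr-style parameters (q, fl, rows, wt), and the field names identifier and title are assumptions for illustration, and the code builds the URL without contacting any server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed address of a Coordinating Node query endpoint.
CN_QUERY_URL = "https://cn.dataone.org/cn/v1/query/solr/"


def build_search_url(terms, rows=10, base_url=CN_QUERY_URL):
    """Combine search terms with AND and return the full query URL."""
    # Wrap multi-word terms in quotes so they are searched as phrases.
    query = " AND ".join(f'"{term}"' if " " in term else term for term in terms)
    params = {"q": query, "fl": "identifier,title", "rows": rows, "wt": "json"}
    return f"{base_url}?{urlencode(params)}"


url = build_search_url(["soil", "carbon flux"], rows=5)
decoded = parse_qs(urlsplit(url).query)  # decode again to inspect the parameters
print(url)
print(decoded["q"][0])
```

Because the metadata is collated centrally, one query of this kind covers every Member Node.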
Workflow
?
ONEDrive
• Workspace defines network drive contents
• Select by specific objects or queries
• Web interface to select targets, refine queries
• Access content with unmodified applications
[Diagram labels: Filters, Network]
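The idea that a workspace defines what the drive shows, through hand-picked objects plus saved queries, can be modelled with a small data structure. The class names, the run_query callback, and the identifiers below are hypothetical; this is not ONEDrive's code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Folder:
    name: str
    object_ids: List[str] = field(default_factory=list)  # objects picked one by one
    queries: List[str] = field(default_factory=list)  # saved search queries


@dataclass
class Workspace:
    folders: List[Folder] = field(default_factory=list)


def list_drive(
    workspace: Workspace, run_query: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Map each folder name to the sorted, de-duplicated identifiers it shows."""
    listing = {}
    for folder in workspace.folders:
        ids = set(folder.object_ids)
        for query in folder.queries:
            ids.update(run_query(query))  # query results are resolved at listing time
        listing[folder.name] = sorted(ids)
    return listing


# Stand-in for a real search service: a fixed lookup table.
FAKE_INDEX = {"keywords:soil": ["example-id-2", "example-id-3"]}

workspace = Workspace(
    folders=[
        Folder(
            "Soil project",
            object_ids=["example-id-1", "example-id-2"],
            queries=["keywords:soil"],
        )
    ]
)
listing = list_drive(workspace, lambda query: FAKE_INDEX.get(query, []))
print(listing)
```

Resolving queries when the folder is listed means new matching content appears without editing the workspace.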
ONEDrive - root
Workspace content
Predefined Views
User Defined Folders
Predefined Views in Folders
Author List
Open Data
Regional Hierarchy
Temporal Coverage
Taxonomic Hierarchy
ONEDrive: Next Steps
Three main focus areas:
1. Refine the content views
  • Improve client side rendering of results
2. Enable online workspace management
  • Integrate with ONEMercury
  • Integrate with identity management
  • Integrate with content use reporting
3. Augment search index (e.g. taxonomic, spatial)
  • Utilize external services for lookups
  • Improvements feed back to other systems
Next Steps
• Finalize and release initial versions
• Gather feedback
• Iterate on designs for the next versions
• Develop and release
• Emerging features:
  • Enhanced discovery through semantics
  • Tracking provenance
Morpho
Questions?
“To understand recursion, it is first necessary to understand recursion.”