Provenance in Data Exploration for Reproducibility and Beyond

Juliana Freire [email protected]
VisTrails Group & Web and Databases Lab, NYU Poly
CATT Research Review 2011

Science Today: Data Intensive

[Figure: data from simulations, sensors, user studies, particle colliders, the Web, sequencing machines, and databases flows through three stages: Obtain Data, Analyze/Visualize, Publish/Share]

Science and Business Today: Data Intensive

[Figure: same diagram as above]

Science Today: Data + Computing Intensive

[Figure: same diagram, annotated with the tools AVS, VisTrails, and Taverna]

Science Today: Incomplete Publications

◆ Publications are just the tip of the iceberg
  – Scientific record is incomplete: too large to fit in a paper
  – Large volumes of data
  – Complex processes
◆ Others can’t (easily) reproduce results
◆ Authors can’t remember all the steps that led to a result…

“It’s impossible to verify most of the results that computational scientists present at conference and in papers.” [Donoho et al., 2009]

“Scientific and mathematical journals are filled with pretty pictures of computational experiments that the reader has no hope of repeating.” [LeVeque, 2009]

“Published documents are merely the advertisement of scholarship whereas the computer programs, input data, parameter values, etc. embody the scholarship itself.” [Schwab et al., 2007]

http://en.wikipedia.org/wiki/Scientific_misconduct
http://ori.dhhs.gov/misconduct/cases/

Science and Reproducibility

Provenance in Science

◆ Not a new issue!
◆ Lab notebooks have been used for a long time
◆ What is new?
  – Large volumes of data
  – Complex analyses: computational processes
◆ Writing notes is no longer an option

[Figure: lab notebook page on DNA recombination, by Lederberg, with the labels "When", "Annotation", and "Observed data"]

The VisTrails System

◆ Workflow-based system for data analysis and visualization
  – Allows multiple tools to be combined into pipelines
◆ Comprehensive provenance infrastructure
◆ Transparently tracks provenance of the discovery process, from data acquisition to visualization
  – The trail followed as users generate and test hypotheses
◆ Leverage provenance to streamline exploration
  – Support for reflective reasoning and collaboration
  – Query and mine provenance
◆ Focus on usability
◆ The system is open source: http://www.vistrails.org
  – Multi-platform: Linux, Mac, Windows
  – Written in Python + Qt

Projects and groups listed on the slide:
  • Study on the use of TMS for improving memory (Psychiatry, U. Utah)
  • Visualizing environmental simulations (CMOP STC)
  • Simulation for solid, fluid and structural mechanics (Galileo Network, UFRJ Brazil)
  • eBird (Cornell, NSF DataONE)
  • Astrophysical Systems (Tohline, LSU)
  • Quantum physics simulations (ALPS, ETH Switzerland)
  • NIH NBCR (UCSD)
  • Climate analysis (CDAT)
  • Pervasive Technology Labs (Heiland, Indiana University)
  • Habitat modeling (USGS)
  • Open Wildland Fire Modeling (U. Colorado, NCAR)
  • Linköping University (Sweden)
  • High-energy physics (LEPP, Cornell)
  • University of North Carolina, Chapel Hill
  • Cosmology simulations (LANL)
  • UTEP

Demo

Provenance Beyond Reproducibility

◆ Support for reflective reasoning
◆ Ability to compare data products [Freire et al., IPAW 2006]
◆ Explore parameter spaces and compare results [Freire et al., IPAW 2006]
◆ Support for collaboration [Ellkvist et al., IPAW 2008]

Provenance Enabling 3rd-Party Tools

Autodesk Maya, ParaView, VisIt, ImageVis3d [Callahan et al., IPAW 2008]

Provenance Plugin for ParaView

http://www.cs.utah.edu/~juliana/videos/paraview_plugin.avi

Provenance Plugin for Autodesk Maya

http://www.cs.utah.edu/~juliana/videos/paraview_plugin.avi

Provenance-Rich Publications

http://www.crowdlabs.org/vistrails/medleys/details/2/

Reproducible Publications: Benefits

◆ Produce more re-usable knowledge, not just text
◆ Allow scientists to stand on the shoulders of giants and on their own shoulders!
◆ Science can move faster
  – http://www.nytimes.com/2011/06/26/opinion/sunday/26ideas.html?_r=1
◆ Higher-quality publications
  – Authors will be more careful
  – Many eyes to check results
◆ Describe more of the discovery process: learn from successes and mistakes
◆ Expose scientific community to different techniques and tools: expedite their training, reduce time to insight
◆ More impact, more citations (?)

A Reproducible Paper: ALPS2.0

[Figure: first page of the paper below, with annotations that did not survive text extraction]

The ALPS project release 2.0: Open source software for strongly correlated systems
arXiv:1101.2646v4 [cond-mat.str-el] 23 May 2011

B. Bauer (1), L. D. Carr (2), H.G. Evertz (3), A. Feiguin (4), J. Freire (5), S. Fuchs (6), L. Gamper (1), J. Gukelberger (1), E. Gull (7), S. Guertler (8), A. Hehn (1), R. Igarashi (9, 10), S.V. Isakov (1), D. Koop (5), P.N. Ma (1), P. Mates (1, 5), H. Matsuo (11), O. Parcollet (12), G. Pawlowski (13), J.D. Picon (14), L. Pollet (1, 15), E. Santos (5), V.W. Scarola (16), U. Schollwöck (17), C. Silva (5), B. Surer (1), S. Todo (10, 11), S. Trebst (18), M. Troyer (1), M. L. Wall (2), P. Werner (1), S. Wessel (19, 20)

1 Theoretische Physik, ETH Zurich, 8093 Zurich, Switzerland
2 Department of Physics, Colorado School of Mines, Golden, CO 80401, USA
3 Institut für Theoretische Physik, Technische Universität Graz, A-8010 Graz, Austria
4 Department of Physics and Astronomy, University of Wyoming, Laramie, Wyoming 82071, USA
5 Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah 84112, USA
6 Institut für Theoretische Physik, Georg-August-Universität Göttingen, Göttingen, Germany
7 Columbia University, New York, NY 10027, USA
8 Bethe Center for Theoretical Physics, Universität Bonn, Nussallee 12, 53115 Bonn, Germany
9 Center for Computational Science & e-Systems, Japan Atomic Energy Agency, 110-0015 Tokyo, Japan
10 Core Research for Evolutional Science and Technology, Japan Science and Technology Agency, 332-0012 Kawaguchi, Japan
11 Department of Applied Physics, University of Tokyo, 113-8656 Tokyo, Japan
12 Institut de Physique Théorique, CEA/DSM/IPhT-CNRS/URA 2306, CEA-Saclay, F-91191 Gif-sur-Yvette, France
13 Faculty of Physics, A. Mickiewicz University, Umultowska 85, 61-614 Poznań, Poland
14 Institute of Theoretical Physics, EPF Lausanne, CH-1015 Lausanne, Switzerland
15 Physics Department, Harvard University, Cambridge 02138, Massachusetts, USA
16 Department of Physics, Virginia Tech, Blacksburg, Virginia 24061, USA
17 Department for Physics, Arnold Sommerfeld Center for Theoretical Physics and Center for NanoScience, University of Munich, 80333 Munich, Germany
18 Microsoft Research, Station Q, University of California, Santa
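
Illustrative sketch: the slides on the VisTrails system describe provenance as "the trail followed as users generate and test hypotheses", and list comparing data products and exploring parameter spaces among its uses. One possible way to record such a trail is as a tree of small changes to a pipeline, where any earlier version can be rebuilt by replaying the changes from the root. The Python below is a minimal sketch under that assumption. The names `Version`, `Trail`, `record`, and `materialize`, and the module names `Reader` and `Isosurface`, are hypothetical and are not the VisTrails API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Version:
    """One node of a hypothetical version tree: a single change to a pipeline."""
    id: int
    parent: Optional[int]      # None for a root version
    action: str                # "add_module" or "set_param" (assumed action names)
    module: str
    param: Optional[str] = None
    value: Any = None


@dataclass
class Trail:
    """Append-only record of changes; nothing is overwritten, so no version is lost."""
    versions: Dict[int, Version] = field(default_factory=dict)

    def record(self, parent, action, module, param=None, value=None):
        """Store one change as a child of `parent` and return the new version id."""
        vid = len(self.versions) + 1
        self.versions[vid] = Version(vid, parent, action, module, param, value)
        return vid

    def materialize(self, vid):
        """Rebuild the pipeline at version `vid` by replaying changes from the root."""
        chain = []
        while vid is not None:
            version = self.versions[vid]
            chain.append(version)
            vid = version.parent
        pipeline: Dict[str, Dict[str, Any]] = {}
        for version in reversed(chain):
            params = pipeline.setdefault(version.module, {})
            if version.action == "set_param":
                params[version.param] = version.value
        return pipeline


# Two branches that explore different values of the same parameter.
trail = Trail()
v1 = trail.record(None, "add_module", "Reader")
v2 = trail.record(v1, "add_module", "Isosurface")
v3 = trail.record(v2, "set_param", "Isosurface", "value", 0.5)
v4 = trail.record(v2, "set_param", "Isosurface", "value", 0.8)

print(trail.materialize(v3))
print(trail.materialize(v4))
```

Because `v3` and `v4` share the parent `v2`, both explorations stay available side by side, which is the property that makes comparing results and returning to an earlier step possible in this sketch.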