Sharing Experiments and their Provenance
David Koop Juliana Freire
Large-Scale Visualization and Data Analysis (VIDA) Center Polytechnic Institute of New York University
www.vistrails.org NSF Community Codes 2012
Science Today
[Figure: the scientific process. Stages: Collect/Generate/Obtain, Filter/Analyze/Visualize, Publish/Share. Products: Data, Results, Findings.]
• There’s more...
  - Revisit or extend the initial result
  - Share with a colleague who wants to reproduce an experiment
  - Investigate the effect of new techniques in the same framework
  - Determine how flawed data or algorithms impacted results
Provenance, Reproducibility, and Sharing
• Goals:
  - Capture necessary provenance
  - Support reproducibility
  - Improve sharing and collaboration
[Figure: artifacts to capture and share: Visualizations, Results, Source Code, Workflows, Libraries, Text, Data]
[Figure: demo of a provenance-rich publication, showing pages of M. H. Freedman, J. Gukelberger, M. B. Hastings, S. Trebst, M. Troyer, and Z. Wang, "Galois Conjugates of Topological Phases," arXiv:1106.3267v3 [cond-mat.str-el], 5 Jul 2011]
Benefits of Provenance-Rich Publications
• Produce more knowledge, not just text
• Allow scientists to stand on the shoulders of giants (and their own)
• Science can move faster!
• Higher-quality publications
• Authors will be more careful
• Many eyes to check results
• Describe more of the discovery process: people only describe successes, can we learn from mistakes?
• Expose users to different techniques and tools: expedite their training and potentially reduce their time to insight
VisTrails
• Combines features of visualization, data analysis, and scientific workflow systems
  - Orchestrate multiple tools and libraries (e.g., VTK, R, matplotlib)
  - Visual spreadsheet for comparing results
• Tracks provenance automatically as users generate and test hypotheses
• Leverages provenance to streamline exploration
• Supports reflective reasoning and collaboration
• Concerned with usability
VisTrails
• Open-source, freely downloadable system (www.vistrails.org)
  - Also on GitHub (github.com/vistrails)
• Multi-platform: users on Mac, Linux, and Windows
• Written in Python; uses PyQt and Qt for the interface
• Over 35,000 downloads
• User’s guide, wiki, and mailing list
• Many users in different disciplines and countries:
  - Using TMS for improving memory (Psychiatry, U. Utah)
  - Visualizing environmental simulations (CMOP STC)
  - Simulation for solid, fluid and structural mechanics (Galileo Network, UFRJ Brazil)
  - eBird (Cornell, NSF DataONE)
  - Astrophysical Systems (Tohline, LSU)
  - Quantum physics simulations (ALPS, ETH Zurich)
  - NIH NBCR (UCSD)
  - Climate analysis (CDAT)
  - Pervasive Technology Labs (Heiland, Indiana University)
  - Habitat modeling (USGS)
  - Open Wildland Fire Modeling (U. Colorado, NCAR)
  - Linköping University
  - High-energy physics (LEPP, Cornell)
  - University of North Carolina, Chapel Hill
  - Cosmology simulations (LANL)
  - UTEP
DataONE Integration
• Distributed framework and sustainable cyberinfrastructure to access well-described and easily discovered observational data
• A VisTrails package provides access to data from DataONE
USGS Habitat Modeling
[Figure: Figures 2A and 2B from Morisette et al.]
[Morisette et al., 2012]
UV-CDAT: Climate Analysis
• Climate-specific app built on VisTrails workflows and provenance
[Figure: UV-CDAT interface; panel labels: Variables, Project Workspace, Visual Spreadsheet, Visualization Properties, Plots & Analyses]
[uv-cdat.llnl.gov] [Santos et al., 2012]
Workflows
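import vtk

# Read the structured-points volume (the path must be a quoted string)
data = vtk.vtkStructuredPointsReader()
data.SetFileName("../examples/data/head.120.vtk")

# Extract the isosurface at scalar value 67.
# SetInputConnection replaces the slide's SetInput(), which VTK 6 removed.
contour = vtk.vtkContourFilter()
contour.SetInputConnection(data.GetOutputPort())
contour.SetValue(0, 67)

# Map the surface to graphics primitives, ignoring scalar colors
mapper = vtk.vtkPolyDataMapper()
mapper.SetInputConnection(contour.GetOutputPort())
mapper.ScalarVisibilityOff()

actor = vtk.vtkActor()
actor.SetMapper(mapper)

# Position the camera
cam = vtk.vtkCamera()
cam.SetViewUp(0, 0, -1)
cam.SetPosition(745, -453, 369)
cam.SetFocalPoint(135, 135, 150)
cam.ComputeViewPlaneNormal()

# Assemble the scene and the render window
ren = vtk.vtkRenderer()
ren.AddActor(actor)
ren.SetActiveCamera(cam)
ren.ResetCamera()
renwin = vtk.vtkRenderWindow()
renwin.AddRenderer(ren)

# Start an interactive session with trackball-style camera control
style = vtk.vtkInteractorStyleTrackballCamera()
iren = vtk.vtkRenderWindowInteractor()
iren.SetRenderWindow(renwin)
iren.SetInteractorStyle(style)
iren.Initialize()
iren.Start()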
[Figure: the script above wrapped in a single PythonSource module, then rebuilt as a workflow with the modules vtkStructuredPointsReader, vtkContourFilter, vtkDataSetMapper, vtkCamera, vtkActor, vtkRenderer, and VTKCell]
• Orchestrate multiple tools
• Structured: easier to understand
• Natural granularity for tracking modifications
• Simpler maintenance
Making code available in VisTrails
• Package infrastructure
• Wrap Python libraries, command-line calls, or use other interfaces (jpype, rpy, etc.)
• Need to specify:
  1. Package identification information
  2. Module structures: input & output ports
  3. Compute method for each module
Example: Wrapping an existing Python library
• seawater Python package:
  - http://pypi.python.org/pypi/seawater/1.0.3
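# 1. Package identification information
identifier = 'org.ocefpaf.seawater'
version = '1.0.3'
name = 'Seawater Routines'

import seawater
# Module and Float come from the VisTrails API. These import paths are an
# assumption based on VisTrails 2.0; newer releases add a "vistrails." prefix.
from core.modules.vistrails_module import Module
from core.modules.basic_modules import Float

class SaturationN2(Module):
    # 2. Module structure: input and output ports
    _input_ports = [('S', Float), ('T', Float)]
    _output_ports = [('res', Float)]

    # 3. Compute method: read the inputs, call the library, set the output
    def compute(self):
        s = self.getInputFromPort("S")
        t = self.getInputFromPort("T")
        res = seawater.satN2(s, t)
        self.setResult('res', res)

# Modules exported by the package (the slide's list is cut off after SaturationN2)
_modules = [SaturationN2]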
Change-based Provenance
• Undo/redo stacks are linear!
• We lose history of exploration
• Old Solution: User saves files/state
• VisTrails Solution:
  - Automatically & transparently capture entire history as a tree
  - Users can tag or annotate each version
  - Users can go back to any version by selecting it in the tree
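The contrast between a linear undo stack and a version tree can be sketched in a few lines. This is a minimal illustration, not VisTrails code: the `Version` and `VersionTree` classes, their method names, and the change descriptions are all hypothetical. Each version stores only the change that produced it and a pointer to its parent, so going back and editing creates a new branch instead of discarding work.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Version:
    """One node of the tree: a change applied to its parent version."""
    id: int
    parent: Optional[int]
    change: str
    tag: Optional[str] = None


class VersionTree:
    """Hypothetical change-based history; every edit becomes a new node."""

    def __init__(self) -> None:
        root = Version(id=0, parent=None, change="empty workflow")
        self.versions: Dict[int, Version] = {0: root}
        self.current = 0

    def commit(self, change: str) -> int:
        """Record a change as a child of the current version."""
        new_id = len(self.versions)
        self.versions[new_id] = Version(new_id, self.current, change)
        self.current = new_id
        return new_id

    def tag(self, version_id: int, name: str) -> None:
        """Attach a user-visible name to a version."""
        self.versions[version_id].tag = name

    def checkout(self, version_id: int) -> None:
        """Go back to any version; nothing is discarded."""
        if version_id not in self.versions:
            raise KeyError(version_id)
        self.current = version_id

    def history(self, version_id: int) -> List[str]:
        """Changes from the root to the given version, in order."""
        path = []
        node = self.versions[version_id]
        while node.parent is not None:
            path.append(node.change)
            node = self.versions[node.parent]
        return list(reversed(path))


tree = VersionTree()
v1 = tree.commit("add vtkContourFilter")
tree.tag(v1, "Isosurface")
v2 = tree.commit("set contour value to 67")
tree.checkout(v1)                 # a linear undo stack would lose v2 on the next edit
v3 = tree.commit("add MplPlot")   # here it simply starts a second branch
tree.tag(v3, "Histogram")

print(tree.history(v2))  # ['add vtkContourFilter', 'set contour value to 67']
print(tree.history(v3))  # ['add vtkContourFilter', 'add MplPlot']
```

Both branches stay reachable after the checkout, which is the property the slide attributes to capturing history as a tree.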
Representing Provenance: Version Tree
[Figure: version tree with the tagged versions Isosurface, Isosurface Script, Volume Rendering HW, Volume Rendering SW, Histogram, Clipping Plane HW, Clipping Plane SW, Combined Rendering HW, Combined Rendering SW, Image Slices HW, and Image Slices SW; a second build shows the workflows behind selected versions]
Structure of Changes
[Figure: the Isosurface workflow (vtkStructuredPointsReader, vtkContourFilter, vtkDataSetMapper, vtkCamera, vtkActor, vtkRenderer, VTKCell) and the Histogram workflow derived from it (vtkStructuredPointsReader, vtkContourFilter, vtkDataSetMapper, vtkCamera, vtkActor, vtkRenderer, VTKCell, MplPlot, MplFigure, MplFigureCell)]
Change 1 (add module): add module MplPlot
Change 2 (change configuration): add function source(“vspr = self.getInputFromPort(...”)
Change 3 (add connection): add connection vtkStructuredPointsReader → MplPlot
Change 4 (paste): add module MplFigure; add module MplFigureCell; add connection MplFigure → MplFigureCell
Change 5 (add connection): add connection MplPlot → MplFigure
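The five changes above can be replayed to rebuild the Histogram workflow from the Isosurface one. The sketch below is an illustration under stated assumptions, not the VisTrails implementation: the `Workflow` class, the action tuples, and `materialize` are hypothetical; the Isosurface connections are inferred from the VTK script rather than read from the figure; and attaching the `source` function to MplPlot is a guess. The elided source string is kept exactly as it appears on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Action = Tuple[str, ...]  # ("add_module", name), ("add_connection", src, dst), ...


@dataclass
class Workflow:
    modules: Set[str] = field(default_factory=set)
    connections: Set[Tuple[str, str]] = field(default_factory=set)
    functions: Dict[str, Dict[str, str]] = field(default_factory=dict)


def apply_action(wf: Workflow, action: Action) -> None:
    """Apply one primitive action to the workflow in place."""
    kind, args = action[0], action[1:]
    if kind == "add_module":
        (name,) = args
        wf.modules.add(name)
    elif kind == "add_connection":
        src, dst = args
        if src not in wf.modules or dst not in wf.modules:
            raise ValueError(f"unknown module in connection {src} -> {dst}")
        wf.connections.add((src, dst))
    elif kind == "add_function":
        module, name, value = args
        wf.functions.setdefault(module, {})[name] = value
    else:
        raise ValueError(f"unknown action {kind!r}")


def materialize(changes: List[List[Action]]) -> Workflow:
    """Rebuild a workflow by replaying every change from the root."""
    wf = Workflow()
    for change in changes:
        for action in change:
            apply_action(wf, action)
    return wf


ISOSURFACE_MODULES = [
    "vtkStructuredPointsReader", "vtkContourFilter", "vtkDataSetMapper",
    "vtkCamera", "vtkActor", "vtkRenderer", "VTKCell",
]
# Assumed wiring, inferred from the script rather than from the figure.
ISOSURFACE: List[List[Action]] = [
    [("add_module", m) for m in ISOSURFACE_MODULES],
    [
        ("add_connection", "vtkStructuredPointsReader", "vtkContourFilter"),
        ("add_connection", "vtkContourFilter", "vtkDataSetMapper"),
        ("add_connection", "vtkDataSetMapper", "vtkActor"),
        ("add_connection", "vtkActor", "vtkRenderer"),
        ("add_connection", "vtkCamera", "vtkRenderer"),
        ("add_connection", "vtkRenderer", "VTKCell"),
    ],
]
# Changes 1 to 5 as listed on the slide.
HISTOGRAM_CHANGES: List[List[Action]] = [
    [("add_module", "MplPlot")],
    [("add_function", "MplPlot", "source", "vspr = self.getInputFromPort(...")],
    [("add_connection", "vtkStructuredPointsReader", "MplPlot")],
    [
        ("add_module", "MplFigure"),
        ("add_module", "MplFigureCell"),
        ("add_connection", "MplFigure", "MplFigureCell"),
    ],
    [("add_connection", "MplPlot", "MplFigure")],
]

isosurface = materialize(ISOSURFACE)
histogram = materialize(ISOSURFACE + HISTOGRAM_CHANGES)
print(sorted(histogram.modules - isosurface.modules))
# ['MplFigure', 'MplFigureCell', 'MplPlot']
```

Storing changes rather than full workflow copies keeps each version small, and any version can be reconstructed on demand by replaying the path from the root.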
[Freire et al., 2006]
Execution Provenance
[Figure: workflow (VTKCell) with its execution provenance]
Provenance: Beyond Reproducibility
• Support reflective reasoning
• Compare data products
• Explore parameter spaces and compare results
• Suggest new directions
Reflective Reasoning
[Figure: visualization model linking Data, Specification, Computation, Data Products, Perception & Cognition, Knowledge, and Exploration. Modified from Van Wijk, Vis 2005]
• Data analysis and visualization are iterative processes
• In exploratory tasks, change is the norm!
“Reflective thought requires the ability to store temporary results, to make inferences from stored knowledge, and to follow chains of reasoning backward and forward, sometimes backtracking when a promising line of thought proves to be unfruitful. The process takes time.” – Donald A. Norman
Exploring and Comparing Data & Results
• Workflow Differences
[Figure: two workflows side by side, then overlaid in a single difference view. First workflow: vtkStructuredPointsReader, vtkContourFilter, vtkDataSetMapper, vtkCamera, vtkActor, vtkRenderer, VTKCell. Second workflow: vtkStructuredPointsReader, vtkPiecewiseFunction, vtkColorTransferFunction, vtkVolumeTextureMapper3D, vtkVolumeProperty, vtkCamera, vtkVolume, vtkRenderer, VTKCell.]
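A workflow difference can be approximated with plain set operations over the two module lists in the figure. This is a deliberately simplified sketch: `diff_workflows` and `WorkflowDiff` are hypothetical names, and matching modules by name alone is an assumption that ignores connections and parameters.

```python
from typing import NamedTuple, Set


class WorkflowDiff(NamedTuple):
    shared: Set[str]
    only_left: Set[str]
    only_right: Set[str]


def diff_workflows(left: Set[str], right: Set[str]) -> WorkflowDiff:
    """Split modules into those in both workflows and those unique to each."""
    return WorkflowDiff(left & right, left - right, right - left)


isosurface = {
    "vtkStructuredPointsReader", "vtkContourFilter", "vtkDataSetMapper",
    "vtkCamera", "vtkActor", "vtkRenderer", "VTKCell",
}
volume_rendering = {
    "vtkStructuredPointsReader", "vtkPiecewiseFunction",
    "vtkColorTransferFunction", "vtkVolumeTextureMapper3D",
    "vtkVolumeProperty", "vtkCamera", "vtkVolume", "vtkRenderer", "VTKCell",
}

result = diff_workflows(isosurface, volume_rendering)
print("shared:    ", sorted(result.shared))
print("only left: ", sorted(result.only_left))
print("only right:", sorted(result.only_right))
```

In this example the reader, camera, renderer, and spreadsheet cell are shared, while the contour and volume-rendering modules fall on opposite sides of the difference.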
Exploring and Comparing Data & Results
• Parameter Exploration
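One way to picture parameter exploration is a sweep that runs the same workflow over every combination of parameter values and collects the results for side-by-side comparison. The sketch below assumes that reading; `explore` and `fake_isosurface` are hypothetical stand-ins, and the opacity parameter and the values other than the contour value 67 are invented for illustration.

```python
from itertools import product
from typing import Any, Callable, Dict, Sequence, Tuple


def explore(run: Callable[..., Any],
            **parameter_values: Sequence[Any]) -> Dict[Tuple[Any, ...], Any]:
    """Run `run` once per combination of the given parameter values."""
    names = list(parameter_values)
    results = {}
    for combo in product(*(parameter_values[n] for n in names)):
        results[combo] = run(**dict(zip(names, combo)))
    return results


def fake_isosurface(contour_value: int, opacity: float) -> str:
    """Stand-in for executing a workflow; returns a label instead of an image."""
    return f"isosurface(value={contour_value}, opacity={opacity})"


grid = explore(fake_isosurface, contour_value=[57, 67, 77], opacity=[0.5, 1.0])
for params, label in grid.items():
    print(params, "->", label)
```

Each key of `grid` identifies one cell of a comparison grid, which is roughly the role a visual spreadsheet plays when results are laid out next to each other.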
VisComplete
• Similar to textual completions on the web and in user interfaces
• Mine provenance collection: Identify fragments that co-occur in a collection of workflows
• Predict sets of likely workflow additions to a given partial workflow
[Koop et al., 2008]
[Figure: three workflow pipelines that start from vtkDataSetReader: (1) vtkStreamTracer, vtkTubeFilter, vtkPolyDataMapper, vtkActor, vtkRenderer, VTKCell; (2) vtkContourFilter, vtkDataSetMapper, vtkActor, vtkRenderer, VTKCell; (3) vtkMaskPoints, vtkGlyph3D, vtkPolyDataMapper, vtkActor, vtkRenderer, VTKCell]
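The idea of mining co-occurring fragments and predicting additions can be illustrated with a much simpler stand-in: count how often one module feeds another across a collection of workflows, then extend a partial pipeline with the most frequent successor at each step. This is not the algorithm of Koop et al.; it reduces "fragments" to single connections, and `mine` and `complete` are hypothetical names. The three pipelines are the ones in the figure, treated as linear chains.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def mine(workflows: List[List[Edge]]) -> Dict[str, Counter]:
    """Count, per module, how many workflows connect it to each successor."""
    successors: Dict[str, Counter] = defaultdict(Counter)
    for edges in workflows:
        for src, dst in set(edges):
            successors[src][dst] += 1
    return successors


def complete(successors: Dict[str, Counter], start: str,
             max_steps: int = 10) -> List[str]:
    """Greedily extend a pipeline with the most frequent successor."""
    path = [start]
    while len(path) <= max_steps and successors.get(path[-1]):
        nxt, _count = successors[path[-1]].most_common(1)[0]
        if nxt in path:  # avoid looping on cyclic statistics
            break
        path.append(nxt)
    return path


def chain(*modules: str) -> List[Edge]:
    """Turn a linear pipeline into its list of connections."""
    return list(zip(modules, modules[1:]))


collection = [
    chain("vtkDataSetReader", "vtkStreamTracer", "vtkTubeFilter",
          "vtkPolyDataMapper", "vtkActor", "vtkRenderer", "VTKCell"),
    chain("vtkDataSetReader", "vtkContourFilter", "vtkDataSetMapper",
          "vtkActor", "vtkRenderer", "VTKCell"),
    chain("vtkDataSetReader", "vtkMaskPoints", "vtkGlyph3D",
          "vtkPolyDataMapper", "vtkActor", "vtkRenderer", "VTKCell"),
]

stats = mine(collection)
print(complete(stats, "vtkGlyph3D"))
# ['vtkGlyph3D', 'vtkPolyDataMapper', 'vtkActor', 'vtkRenderer', 'VTKCell']
```

A user who has just added vtkGlyph3D would be offered the rest of the rendering pipeline, because every workflow in the collection ends with the same actor, renderer, and cell sequence.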
Sharing and Collaboration
• Packaging: maintain vistrail file/database that contains all workflow versions, packages used, user/date/time stamps, mashups
  - Multiple users can work on the same vistrail
  - Working on allowing users to more easily include code and data
• Stronger links from provenance to actual data
• Workflow Mashups: simplify interaction in intuitive interfaces
• crowdLabs: a social web site for sharing workflows and provenance
  - www.crowdlabs.org
  - Upload workflows from VisTrails
  - Run workflows from a web browser
  - Explore parameterizations from a web browser using mashups
Support multiple users
• Provenance allows others to see what you have done, how you computed it, and build from that
• Distributed like modern version control systems (e.g. git)