Reproducibility Using Vistrails

1 Reproducibility Using VisTrails Juliana Freire Polytechnic Institute of NYU David Koop Polytechnic Institute of NYU Fernando Seabra Chirigati Polytechnic Institute of NYU Cláudio T. Silva Polytechnic Institute of NYU CONTENTS 1.1 Introduction ...................................................... 1 1.2 Reproducibility, Workflows, and Provenance .................... 3 1.2.1 The Anatomy of a Reproducible Experiment ........... 3 1.2.2 Describing Computations as Workflows ................. 4 1.2.3 Provenance in Workflow Systems ....................... 6 1.2.4 Workflows and Reproducibility .......................... 7 1.3 The VisTrails System ............................................ 7 1.4 Reproducing and Publishing Results with VisTrails ............ 9 1.4.1 Reproducibility Support ................................. 10 1.4.2 Publishing Results ....................................... 14 1.4.3 Publishing Interactive Results on the Web .............. 14 1.5 Challenges and Opportunities ................................... 16 1.6 Related Work .................................................... 16 1.7 Conclusion ........................................................ 17 Acknowledgments ................................................... 17 1.1 Introduction Science has long placed an emphasis on revisiting and reusing past results: reproducibility is a core component of the scientific process. Testing and extend- 1 2 Implementing Reproducible Computational Research ing published results are standard activities that lead to practical progress: science moves forward using past work and allowing scientists to “stand on the shoulders of giants”. In natural science, long tradition requires experiments to be described in enough detail so that they can be reproduced by other researchers. This standard, however, has not been widely applied for computational experiments. Researchers often have to rely on tables, plots and figure captions included in papers. Consequently, it is difficult to verify and reproduce many published results [46], and this has led to a credibility crisis in computational science [19]. Scientific communities in di↵erent domains have started to act in an at- tempt to address this problem. Prestigious conferences such as SIGMOD [62] and VLDB [75], journals such as PNAS [55], Biostatistics [1], the IEEE Trans- actions on Signal Processing [69], Nature, Science, to name a few, have been encouraging—and sometimes requiring—that published results be accompa- nied by the necessary data and code needed to reproduce them. However, it can be difficult and time-consuming for authors to make their experiment reproducible and for reviewers to verify the results. Authors need to encap- sulate the whole experiment (data, parameter settings, source code and en- vironment) to guarantee that the same results are generated. Even when an experiment compendium is available, reviewers may have difficulties reproducing the experiments due to missing libraries or dependences on a specific operating system version to run (or compile) the experiment. We posit that by planning for reproducibility and through the use of systems that systematically capture provenance of the scientific exploration process, researchers will not only create results that are reproducible but they can also streamline many of the tasks they have to carry out. With this in mind, we have built a framework that supports the life cycle of computational experiments [28, 41]. This framework has been implemented and is currently released as part VisTrails [23, 74], an open-source, workflow-based data exploration and visualization system. VisTrails relies on a provenance management component to automatically and transparently capture the necessary metadata to allow experiments to be reproduced, including the exe- cutable specification of computational process (i.e., the workflow structure), parameter settings, input and output data, library versions, and code. It im- plements mechanisms that leverage the provenance information to support the exploratory process [64, 42, 40] which is common in data-intensive science [33]. These mechanisms also help reviewers run and verify the results. In this chapter, we describe the VisTrails reproducibility infrastructure. We start in Section 1.2 with our definition for computational reproducibility and reproducibility levels. We also give an overview of workflow systems and discuss their benefits and limitations for the creation of reproducible experiments. The VisTrails system is described in Section 1.3, and in Section 1.4, we present specific components we have added to the system to support both authors and reviewers of reproducible experiments. We review related work in Reproducibility Using VisTrails 3 Provenance Galois Conjugates of Topological Phases M. H. Freedman,1 J. Gukelberger,2 M. B. Hastings,1 S. Trebst,1 M. Troyer,2 and Z. Wang1 1Microsoft Research, Station Q, University of California, Santa Barbara, C 93106, US 2Theoretische Physik, ETH Zurich, 8093 Zurich, Switzerland (Dated: July 6, 2011) Galois conjugation relates unitary conformal field theories (CFTs) and topological quantum field theories (TQFTs) to their non-unitary counterparts. Here we investigate Galois conjugates of quantum double models, such as the Levin-Wen model. While these Galois conjugated Hamiltonians are typically non-Hermitian, we find that their ground state wave functions still obey a generalized version of the usual code property (local operators do not act on the ground state manifold) and hence enjoy a generalized topological protection. The key question addressed in this paper is whether such non-unitary topological phases can also appear as the ground states of Hermitian Hamiltonians. Specific attempts at constructing Hermitian Hamiltonians with these ground states lead to a loss of the code property and topological protection of the degenerate ground states. Beyond this we rigorously prove that no local change of basis (IV.5) can transform the ground states of the Galois conjugated doubled Fibonacci theory into the ground states of a topological model whose Hermitian Hamiltonian satisfies Lieb-Robinson bounds. These include all gapped local or quasi-local Hamiltonians. similar statement holds 5 for many other non-unitary TQFTs. One consequence is that the “Gaffnian” wave function cannot be the ground state ofa) a gappedhoneycomb fractional quantum Hall state. rung terms as 0.56 0.56 P CS numbers: 05.30.Pr, 73.43.-f J = sin ✓ and J = cos ✓, 0.48 0.48 r p p ) / J 0.4 0.4 L where ✓8 =0corresponds to the unperturbed Hamiltonian. I.( INTRODUCTION belian Levin-Wen model. This model, which is also called ∆ The phase diagrams as a function of ✓ have been mapped out 0.32 “DFib”, is a0.32 topological quantum field theory18 (TQFT) whose 4 states are string-netsfor on both a the surface DFib labeled model byand either the DYL a triv- model, respectively. Galois conjugation,0.24 by definition, replaces a root of a poly- 0.24 ial or “Fibonacci” anyon.Directly From probing this starting the topological point, we order give in the DYL model nomial by another one with identical algebraic properties. For 0.16 width a W rigorous = 2 argument0.16 and that its the Hermitian “Gaffnian” counterpart ground state we show cannot the lifting of their re- example, i and i are Galois conjugate (consider z2 +1=0)width W = 3 finite-size gap be locally conjugated to the ground state of any topological 1+p−5 0.081 1 p5 2 0.08 spective ground-state degeneracies in Figs. 6 and 7 when in- as are φ = and = − (consider z z 1=0), 2 − φ 2 − − phase, within a Hermitiancluding a model string satisfying tension. We Lieb-Robinson find a striking qualitative dif- 3 3 2⇡i/3 3 2⇡i/3 3 9 as well as p2, p2e ,0 and p2e− (consider z 2= (LR) bounds0 (whichference includes between but is these not two limited models: to gapped For the DYL model the 0 0.1 0.2− 0.3 0.4 0.5 0). In physics Galois conjugation can be usedinverse to system convert size non- 1/L local and quasi-locallifting Hamiltonians). of the ground-state degeneracy is exponentially sup- unitary conformal field theories (CFTs) to unitary ones, and Lieb-Robinson boundspressed are with a technical increasing tool system for local size –lattice characteristic of a topo- vice versa. One famous example is the non-unitary Yang-Lee models. In relativisticallylogical invariant phase. For field the theories, Hermitian the model, speed ofon the other hand, we b) ladder CFT, which is Galois conjugate to the Fibonacci CFT (G2)1, light is a strict upperfind bound a splitting to theof velocity the ground-state of propagation. degeneracy In proportional to Simulation 0.32 0.32 the even (or integer-spin) subset of su(2)3. lattice theories, theJ LRrL bounds. The linear provide increase a similar with upper both bound system size and coupling by a velocity calledcan the be LR easily velocity, understood but in contrast by the to different the rel- matrix elements of Results In statistical mechanicsp non-unitary conformal field theo- 1,2 ries have a venerable history.0.24 However, it has remained less ativistic case0.24 there canthe be string some tension exponentially term on small a single “leakage” rung for the two degener- ) / J L clear if there exist physical( situations in which non-unitary outside the light-coneate in ground-states the lattice case. of the The unperturbed Lieb-Robinson model. Plotting the low- ∆ models can provide a useful description of the low energy bounds are a way ofenergy bounding spectrum the leakage in Fig. 7 outside clearly the

Reproducibility Using Vistrails

Vistrails: Enabling Interactive Multiple-View Visualizations

Vistrails Documentation Release 1.7.0

Using Vistrails and Provenance for Teaching Scientific Visualization

Tackling the Provenance Challenge One Layer at a Time

Vistrails Documentation Release 2.0.3

Making Computations and Publications Reproducible with Vistrails

Managing Rapidly-Evolving Scientific Workflows

Vistrails User's Guide

Introduction to the Vistrails System

Vistrails Training Workflow

Sharing Experiments and Their Provenance

Workflows for Parameter Studies of Multi-Cell Modeling