
Visualization Corner
Editors: Claudio Silva and Joel E. Tohline

Copublished by the IEEE CS and the AIP. 1521-9615/07/$25.00 ©2007 IEEE. Computing in Science & Engineering.

Provenance for Visualizations: Reproducibility and Beyond

By Claudio T. Silva, Juliana Freire, and Steven P. Callahan

The demand for the construction of complex visualizations is growing in many disciplines of science, as users are faced with ever increasing volumes of data to analyze. The authors present VisTrails, an open source provenance-management system that provides infrastructure for data exploration and visualization.

Computing has been an enormous accelerator for science, leading to an information explosion in many different fields. Future scientific advances depend on our ability to comprehend the vast amounts of data currently being produced and acquired. To analyze and understand this data, though, we must assemble complex computational processes and generate insightful visualizations, which often require combining loosely coupled resources, specialized libraries, and grid and Web services. Such processes could generate yet more data, adding to the information overflow scientists currently deal with.

Today, the scientific community uses ad hoc approaches to data exploration, but such approaches have serious limitations. In particular, scientists and engineers must expend substantial effort managing data (such as scripts that encode computational tasks, raw data, data products, images, and notes) and recording provenance information (that is, all the information necessary to reproduce a certain piece of data) so that they can answer basic questions: Who created a data product and when? When was it modified, and who modified it? What process was used to create the data product? Were two data products derived from the same raw data? This process is not only time-consuming, but also error-prone.

Without provenance, it's difficult (and sometimes impossible) to reproduce and share results, solve problems collaboratively, validate results with different input data, and understand the process used to solve a particular problem. In addition, data products' longevity becomes limited: without precise and sufficient information about how the data product was generated, its value diminishes significantly.

The lack of adequate provenance support in visualization systems motivated us to build VisTrails, an open source provenance-management system that provides infrastructure for data exploration and visualization through workflows. VisTrails transparently records detailed provenance of exploratory computational tasks and leverages this information beyond just the ability to reproduce and share results. In particular, it uses this information to simplify the process of exploring data through visualization.

Visualization Systems

Visualization systems such as MayaVi (http://mayavi.sourceforge.net) and ParaView (www.paraview.org), which are built on top of Kitware's Visualization Toolkit (VTK),¹ as well as SCIRun (http://software.sci.utah.edu/scirun.html) enable users to interactively create and manipulate complex visualizations. Such systems are based on the notion of data flows,² and they provide visual interfaces for producing visualizations by assembling pipelines out of modules (or functions) connected in a network. SCIRun supports an interface that lets users directly edit data flows. MayaVi and ParaView have a different interaction paradigm that implicitly builds data flows as the user makes "task-oriented" choices (such as selecting an isosurface value).

Although these systems let users create complex visualizations, they lack the ability to support data exploration at a large scale. Notably, they don't adequately support collaborative creation and exploration of multiple visualizations. Because these systems don't distinguish between the definition of a data flow and its instances, to execute a given data flow with different parameters (for example, different input files), users must manually set these parameters through a GUI. Clearly, this process doesn't scale to more than a few visualizations at a time. Additionally, modifications to parameters or to a data flow's definition are destructive: the systems don't maintain any change history. This requires the user to first construct the visualization and then remember the input data sets, parameter values, and the exact dataflow configuration that led to a particular image.

Finally, before constructing a visualization, users must often acquire, generate, or transform a given data set. For example, to calibrate a simulation, they must obtain data from sensors, generate data from a simulation, and finally construct and compare the visualizations for both data sets. Most visualization systems, however, don't give users adequate support for creating complex pipelines that support multiple libraries and services.

VisTrails: Provenance for Visualization

The VisTrails system (www.vistrails.org) we developed at the University of Utah is a new visualization system that provides a comprehensive provenance-management infrastructure and can be easily combined with existing visualization libraries. Unlike previous systems, VisTrails uses an action-based provenance model that uniformly captures changes to both parameter values and pipeline definitions by unobtrusively tracking all changes that users make to pipelines [...] this detailed provenance of the pipeline evolution as a visualization trail, or vistrail.

The stored provenance ensures that users will be able to reproduce the visualizations and lets them easily navigate through the space of pipelines created for a given exploration task. The VisTrails interface lets users query, interact with, and understand the visualization process's history. In particular, they can return to previous versions of a pipeline and change the specification or parameters to generate a new visualization without losing previous changes.

Another important feature of the action-based provenance model is that it enables a series of operations that greatly simplify the exploration process and could reduce the time to insight. In particular, the model allows the flexible reuse of pipelines and provides a scalable mechanism for creating and comparing numerous visualizations as well as their corresponding pipelines. Although we [...] extensible infrastructure lets users integrate a wide range of libraries. This makes the system suitable for other exploratory tasks, including data mining and integration.

Creating an Interactive Visualization with VisTrails

To illustrate the issues involved in creating visualizations and how provenance can aid in this process, we present the following scenario, common in medical data visualization. Starting from a volumetric computed tomography (CT) data set, we generate different visualizations by exploring the data through volume rendering, isosurfacing (extracting a contour), and slicing. Note that with proper modifications, this example also works for visualizing other types of data (for example, tetrahedral meshes).

Dataflow-Processing Networks and Visual Programming

A useful paradigm for building visualization applications is the dataflow [...]

Reproducibility and Sharing Data and Processes for the Visualization Corner

By Claudio Silva and Joel E. Tohline

Greetings! We're the new co-editors for the Visualization Corner. Claudio is a computer science professor at the University of Utah and faculty member of the Scientific Computing and Imaging Institute, where he does research primarily in visualization, graphics, and applied geometry. Joel is a professor of physics and astronomy at Louisiana State University and a faculty member in LSU's Center for Computation and Technology, with a research focus on complex fluid flows in astrophysical systems. We both have extensive experience in high-performance computing. In partnership with our readers and colleagues, we hope to bring you relevant and effective information about visualization techniques that can directly affect the way our readers do science. We would like to use new Web technologies (wikis, blogs, and so on) to encourage the community to more actively participate in the way we do things. This first column discusses the benefits of provenance and makes a case that better provenance mechanisms are needed for visualization. In upcoming installments, we'll attempt to inform the scientific community at large about the benefits and technologies related to provenance.

In particular, we want to promote the idea of reproducible visualizations. We encourage authors of articles published here to provide metadata for visualizations in their articles that let readers reproduce images as well as generate related ones (for example, using different data). Ultimately, our hope is that this trend will spread to the point that published articles will contain not only textual descriptions of the techniques, but links to data, code, and the complete overall process used to generate the scientific results.

As a mechanism to capture and share provenance metadata, authors can use VisTrails to produce specifications of the figures and plots presented in their articles. We'll archive this information at www.vistrails.org/index.php/CiSE. The data and processes associated with this column are already available on the Web site, so you can reproduce them, right now, from your desktop!
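To make the idea of an action-based provenance model more concrete, here is a minimal Python sketch of how such a model might work. It is an illustration under stated assumptions, not the VisTrails implementation or its API: the class names (`Action`, `Version`, `Trail`), the three action kinds, the module names, the user names, and the contour values are all hypothetical. Each version stores only the single change that produced it plus a pointer to its parent, and a pipeline is rebuilt by replaying those changes from the root.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    """One recorded change to a pipeline (hypothetical action vocabulary)."""
    kind: str    # "add_module", "set_param", or "connect"
    args: tuple


@dataclass
class Version:
    """A node in the version tree: its parent plus exactly one action."""
    id: int
    parent: Optional[int]
    action: Optional[Action]  # None only for the empty root version
    user: str = ""


class Trail:
    """A tree of versions; committing never overwrites an earlier version."""

    def __init__(self):
        self.versions = {0: Version(0, None, None)}

    def commit(self, parent, action, user):
        """Record an action as a new child of `parent` and return its id."""
        new_id = len(self.versions)
        self.versions[new_id] = Version(new_id, parent, action, user)
        return new_id

    def materialize(self, version_id):
        """Rebuild the pipeline at a version by replaying actions from the root."""
        chain = []
        v = self.versions[version_id]
        while v.action is not None:
            chain.append(v.action)
            v = self.versions[v.parent]
        pipeline = {"modules": {}, "connections": []}
        for action in reversed(chain):
            if action.kind == "add_module":
                (name,) = action.args
                pipeline["modules"][name] = {}
            elif action.kind == "set_param":
                name, key, value = action.args
                pipeline["modules"][name][key] = value
            elif action.kind == "connect":
                pipeline["connections"].append(action.args)
            else:
                raise ValueError(f"unknown action kind: {action.kind}")
        return pipeline


# Hypothetical exploration: build a small pipeline, then branch on a parameter.
trail = Trail()
v1 = trail.commit(0, Action("add_module", ("reader",)), "alice")
v2 = trail.commit(v1, Action("add_module", ("isosurface",)), "alice")
v3 = trail.commit(v2, Action("connect", ("reader", "isosurface")), "alice")
v4 = trail.commit(v3, Action("set_param", ("isosurface", "value", 57)), "alice")
# A second contour value branches from v3; v4 remains available unchanged.
v5 = trail.commit(v3, Action("set_param", ("isosurface", "value", 90)), "bob")
```

In this sketch, parameter changes and structural changes are both recorded as actions, so a single tree captures both, and returning to `v3` to try a different contour value adds a sibling branch instead of destroying the earlier result. Recording a user with each version is one way such a structure could answer the "who created it?" question raised earlier in the article.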