
VISUALIZATION CORNER
Editors: Claudio Silva and Joel E. Tohline

PROVENANCE FOR VISUALIZATIONS: REPRODUCIBILITY AND BEYOND
By Claudio T. Silva, Juliana Freire, and Steven P. Callahan

The demand for the construction of complex visualizations is growing in many disciplines of science, as users are faced with ever increasing volumes of data to analyze. The authors present VisTrails, an open-source provenance management system that provides infrastructure for data exploration and visualization.

Copublished by the IEEE CS and the AIP. 1521-9615/07/$25 ©2007 IEEE. COMPUTING IN SCIENCE & ENGINEERING

Computing has been an enormous accelerator for science, leading to an information explosion in many different fields. Future scientific advances depend on our ability to comprehend the vast amounts of data currently being produced and acquired. To analyze and understand this data, though, we must assemble complex computational processes and generate insightful visualizations, which often require combining loosely coupled resources, specialized libraries, and grid and Web services. Such processes could generate yet more data, adding to the information overflow scientists currently deal with.

Today, the scientific community uses ad hoc approaches to data exploration, but such approaches have serious limitations. In particular, scientists and engineers must expend substantial effort managing data (such as scripts that encode computational tasks, raw data, data products, images, and notes) and recording provenance information (that is, all the information necessary for reproducing a certain piece of data) so that they can answer basic questions: Who created a data product and when? When was it modified, and who modified it? What process was used to create the data product? Were two data products derived from the same raw data? Not only is this process time-consuming, it's also error-prone.

Without provenance, it's difficult (and sometimes impossible) to reproduce and share results, solve problems collaboratively, validate results with different input data, and understand the process used to solve a particular problem. In addition, data products' longevity becomes limited: without precise and sufficient information about how the data product was generated, its value diminishes significantly.

The lack of adequate provenance support in visualization systems motivated us to build VisTrails, an open source provenance-management system that provides infrastructure for data exploration and visualization through workflows. VisTrails transparently records detailed provenance of exploratory computational tasks and leverages this information beyond just the ability to reproduce and share results. In particular, it uses this information to simplify the process of exploring data through visualization.

Visualization Systems

Visualization systems such as MayaVi (http://mayavi.sourceforge.net) and ParaView (www.paraview.org), which are built on top of Kitware's Visualization Toolkit (VTK) [1], as well as SCIRun (http://software.sci.utah.edu/scirun.html), enable users to interactively create and manipulate complex visualizations. Such systems are based on the notion of data flows [2], and they provide visual interfaces for producing visualizations by assembling pipelines out of modules (or functions) connected in a network. SCIRun supports an interface that lets users directly edit data flows. MayaVi and ParaView have a different interaction paradigm that implicitly builds data flows as the user makes "task-oriented" choices (such as selecting an isosurface value).

Although these systems let users create complex visualizations, they lack the ability to support data exploration at a large scale. Notably, they don't adequately support collaborative creation and exploration of multiple visualizations. Because these systems don't distinguish between the definition of a data flow and its instances, to execute a given data flow with different parameters (for example, different input files), users must manually set these parameters through a GUI. Clearly, this process doesn't scale to more than a few visualizations at a time. Additionally, modifications to parameters or to a data flow's definition are destructive: the systems don't maintain any change history. This requires the user to first construct the visualization and then remember the input data sets, parameter values, and the exact dataflow configuration that led to a particular image.

Finally, before constructing a visualization, users must often acquire, generate, or transform a given data set; for example, to calibrate a simulation, they must obtain data from sensors, generate data from a simulation, and finally construct and compare the visualizations for both data sets. Most visualization systems, however, don't give users adequate support for creating complex pipelines that support multiple libraries and services.

REPRODUCIBILITY AND SHARING DATA AND PROCESSES FOR THE VISUALIZATION CORNER
By Claudio Silva and Joel E. Tohline

Greetings! We're the new co-editors for the Visualization Corner. Claudio is a computer science professor at the University of Utah and faculty member of the Scientific Computing and Imaging (SCI) Institute, where he does research primarily in visualization, graphics, and applied geometry. Joel is a professor of physics and astronomy at Louisiana State University and a faculty member in LSU's Center for Computation and Technology (CCT), with a research focus on complex fluid flows in astrophysical systems. We both have extensive experience in high-performance computing. In partnership with our readers and colleagues, we hope to bring you relevant and effective information about visualization techniques that can directly affect the way our readers do science. We would like to use new Web technologies (Wikis, blogs, and so on) to encourage the community to more actively participate in the way we do things.

This first column discusses the benefits of provenance and makes a case that better provenance mechanisms are needed for visualization. In upcoming columns of the Visualization Corner, we will attempt to inform the scientific community at large about the benefits and technologies related to provenance. In particular, we want to promote the idea of reproducible visualizations. We encourage authors of articles published in the department to provide metadata for visualizations in their articles that let readers reproduce images as well as generate related ones (for example, using different data). Ultimately, our hope is that this trend will spread to the point that published articles will contain not only textual descriptions of the techniques, but links to data, code, and the complete overall process used to generate the scientific results.

As a mechanism to capture and share provenance metadata, authors can use VisTrails to produce specifications of the figures and plots presented in their articles. We'll archive this information at www.vistrails.org/index.php/CiSE. The data and processes associated with this column are already available on the Web site, so you can reproduce them, right now, from your desktop!

VisTrails: Provenance for Visualization

The VisTrails system (www.vistrails.org) we developed at the University of Utah is a new visualization system that provides a comprehensive provenance-management infrastructure and can be easily combined with existing visualization libraries. Unlike previous systems, VisTrails uses an action-based provenance model that uniformly captures changes to both parameter values and pipeline definitions by unobtrusively tracking all changes that users make to pipelines [...] pipeline evolution as a visualization trail, or vistrail.

The stored provenance ensures that users will be able to reproduce the visualizations and lets them easily navigate through the space of pipelines created for a given exploration task. The VisTrails interface lets users query, interact with, and understand the visualization process's history. In particular, they can return to previous versions of a pipeline and change the specification or parameters to generate a new visualization without losing previous changes.

Another important feature of the action-based provenance model is that it enables a series of operations that greatly simplify the exploration process and could reduce the time to insight. In particular, it allows the flexible reuse of pipelines and provides a scalable mechanism for creating and comparing numerous visualizations as well as their corresponding pipelines. Although we originally built VisTrails [...] users integrate a wide range of libraries. This makes the system suitable for other exploratory tasks, including data mining and integration.

Creating an Interactive Visualization with VisTrails

To illustrate the issues involved in creating visualizations and how provenance can aid in this process, we present the following scenario, common in medical data visualization. Starting from a volumetric computed tomography (CT) data set, we generate different visualizations by exploring the data through volume rendering, isosurfacing (extracting a contour), and slicing. Note that with proper modifications, this example also works for visualizing other types of data (for example, tetrahedral meshes).

Dataflow Processing Networks and Visual Programming. A useful paradigm for building visualization applications is the