Discovering the Unexpected

Guest Editors’ Introduction

Kris Cook, Pacific Northwest National Laboratory

Rae Earnshaw, University of Bradford, UK

John Stasko, Georgia Institute of Technology

Visualization has been the cornerstone of scientific progress throughout history. Much of modern physics is a result of the superior abstract visualization abilities of a few brilliant men. Newton visualized the effect of gravitational force fields in three dimensional space acting on the center of mass. And Einstein visualized the geometric effects of objects in relative and uniform accelerated motion, with the speed of light a constant, time part of space, and acceleration indistinguishable from gravity. Virtually all comprehension in science, technology, and even art calls on our ability to visualize. In fact, the ability to visualize is almost synonymous with understanding. We have all used the expression “I see” to mean “I understand.”1

The need to make sense of complex, conflicting, and dynamic information has provided the impetus for new tools and technologies that combine the strengths of visualization with powerful underlying algorithms and innovative interaction techniques; tools that make up the emerging field of visual analytics.2 Visual analytics is the formation of abstract visual metaphors in combination with a human information discourse (usually some form of interaction) that enables detection of the expected and discovery of the unexpected within massive, dynamically changing information spaces. It is an outgrowth of the fields of scientific and information visualization but includes technologies from many other fields, including knowledge management, statistical analysis, cognitive science, decision science, and others.

This marriage of computation, visual representation, and interactive thinking supports intensive analysis. The goal is not only to permit users to detect expected events, such as might be predicted by models, but also to help users discover the unexpected: the surprising anomalies, changes, patterns, and relationships that are then examined and assessed to develop new insight.

The “Visualization Time Line” sidebar gives a brief summary of some of the key developments associated with visualization that have led to the current situation. In addition to introducing the articles in this special issue, this column sets out some of the key issues and challenges associated with discovering the unexpected.

Interfaces and interaction

In visual analytics, the key purpose of visualizations and interaction techniques is to help the user gain insight into complex data and situations where models alone are insufficient and human analytic skills must be employed. Visualizations must not only support the representation of critical data features but also provide sufficient contextual cues to help the user rapidly interpret what he or she is seeing. Interaction techniques strive to enable users to go beyond data exploration to achieve a dialogue with their information space to detect trends and anomalies, evaluate hypotheses, and uncover unexpected connections.

Computer scientists wish to develop effective interfaces to computers that facilitate communication and interaction between the human and the information in the machine. In the past, the importance of interface design has not always been fully recognized, or it may have been even ignored completely. Today, good design is increasingly recognized as being a key requirement for a user interface to be usable, flexible, and successful. With the current proliferation of computing devices, including mobile phones, PDAs, and other handheld devices, design is even more important in order to enable the user to manage the complexity that this introduces. With the intelligence in these devices, they can communicate with each other and reduce the cognitive load they place on the user. However, if information is filtered before it is presented to the user, how do we ensure that it is filtered appropriately and that key information that subsequently turns out to be important is not omitted or deleted?

Studies have focused on the ways that users interact with different kinds of devices. For example, the human perception of information on a mobile phone is different from that on a wall-size display. We need to be aware of these differences and the opportunities and constraints that they present both for the display of information and also the user’s interaction with it.
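The goal stated above, detecting the events a model predicts while also surfacing the unexpected, can be illustrated with a deliberately simple sketch. It is not taken from any system discussed in this issue. It assumes a hypothetical baseline "model" (the median of the readings) and flags values whose modified z-score, based on the median absolute deviation, exceeds an assumed rule-of-thumb threshold of 3.5. The function name `flag_unexpected` and the sample readings are invented for illustration.

```python
from statistics import median


def flag_unexpected(values, threshold=3.5):
    """Return the indices of values that deviate strongly from the median.

    Illustrative assumption: the "expected" behavior is the median, and the
    modified z-score 0.6745 * |x - median| / MAD measures surprise.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half of the values equal the median, so deviations cannot
        # be scaled; flag anything that differs from the median at all.
        return [i for i, v in enumerate(values) if v != center]
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Hypothetical sensor readings with one surprising value at index 5.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2]
print(flag_unexpected(readings))
```

A filter of this kind only nominates candidates. Whether a flagged value is significant, and whether something important fell below the threshold, remains a judgment for the analyst, which is the concern the filtering question above raises.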

Visualization Time Line

Visualization in a presentation sense has been used for at least a thousand years.

■ 1952 – A.S. Douglas receives a PhD at the University of Cambridge on human-computer interaction using the CRT display on Edsac 1 computer (http://www.pong-story.com/1952.htm).
■ 1962 – Jack E. Bresenham develops one of the first graphics algorithms.
■ 1963 – Ivan E. Sutherland’s sketchpad system uses graphical techniques for human-machine communication (http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-574.pdf).
■ 1963 – General Motors Research develops the DAC-I system, the first computer-aided design system (paper published at the 1963 American Federation of Information Processing Societies Fall Joint Computer Conference).
■ 1967 – ACM Special Interest Group on Graphics and Interactive Techniques is founded.
■ 1973 – William Newman and Robert Sproull publish Principles of Interactive Computer Graphics (McGraw-Hill).
■ 1974 – Ted Nelson publishes Computer Lib/Dream Machines.
■ 1974 – ACM Siggraph hosts first annual ACM Siggraph Conference.
■ 1977 – Apple releases Apple II micro with color graphics.
■ 1977 – ACM Siggraph hosts first annual ACM Siggraph Workshop on User-oriented Design of Interactive Graphics Systems.
■ 1978 – ACM Special Interest Group on Social and Behavioral Computing hosts a conference on “People-oriented Systems: When and How?”
■ 1982 – Silicon Graphics is founded, developing 3D graphics terminals and workstations.
■ 1982 – ACM Special Interest Group on Computer-Human Interaction is founded.
■ 1983 – The Psychology of Human-Computer Interaction, edited by Stuart K. Card, Thomas P. Moran, and Allen Newell, is published (Lawrence Erlbaum Assoc.).
■ 1983 – Edward R. Tufte publishes The Visual Display of Quantitative Information.
■ 1983 – ACM Sigchi hosts first annual conference.
■ 1987 – NSF panel publishes a report on Visualization in Scientific Computing (McCormick, DeFanti, and Brown). Principal recommendations: National funding should be granted for short- and long-term provision of tools and environments to support scientific visualization, and these should be made available to the community at large.
■ 1987 – Volume data. William Lorensen and Harvey Cline publish their Marching Cubes algorithm (http://en.wikipedia.org/wiki/Marching_cubes).
■ 1989 – IEEE hosts first annual IEEE Visualization Conference.
■ 1990 – The World Wide Web debuts.
■ 1990 – Gregory Nielson and Bruce Shriver publish Visualization in Scientific Computing, the first book on visualization (IEEE Computer Society Press).
■ 1992 – OpenGL real-time 3D graphics standard is published.
■ 1992 – Modular visualization environments. Users create their application interactively by connecting modules using a point-and-click interface (for example, AVS, Khoros, Iris Explorer).
■ 1995 – IEEE hosts first annual Information Visualization Conference.
■ 1998 – Collaborative visualization. Networked multiuser environments developed.
■ 1999 – Readings in Information Visualization, edited by Stuart Card, Jock Mackinlay, and Ben Shneiderman, is published (Morgan Kaufmann).
■ 2000 – Robert Spence publishes Information Visualization (Addison-Wesley).
■ 2004 – National Visualization and Analytics Center is founded.
■ 2005 – Illuminating the Path: the R&D Agenda for Visual Analytics, edited by James Thomas and Kristin Cook, is published (IEEE Computer Society Press).
■ 2006 – NIH/NSF publishes a report on Visualization Research Challenges (http://tab.computer.org/vgtc/vrc/index.html).
■ 2006 – IEEE hosts first annual IEEE Visual Analytics Symposium.

A more detailed version of this time line, which you can add to, is available at http://www.inf.brad.ac.uk/home/Visualization Time Line Long v2.doc.

Models and data

Data complexity inherently complicates the analytic process. Some analytic challenges require the understanding of massive volumes of data, such as simulation data or network data. In other cases, the complexity results not from the data’s large scale but from the diversity of the data types required for analysis. In still other situations, data that is readily interpretable by humans, such as text, is much more difficult for a computer to interpret. In cases where well-formed models can be reliably constructed for identifying information and situations of interest, they can form the basis for automated data analysis. However, in situations that are not well understood, or in which the purpose of the analysis is to detect surprising information, traditional models alone will not suffice. These models must be augmented with feature extraction techniques that draw on statistical and mathematical approaches, as well as mathematical representations that simplify data in ways that are appropriate to the task at hand.

Cognitive loading

A human can observe information being displayed in real time or explore an information space using interactive techniques. However, what the human brain can receive is limited in terms of information that it must process and make judgments about, often in the context of adjacent information either in time or space in the display environment. More specifically, cognitive load commonly refers to the load on the human’s working memory during problem solving, reasoning, and thinking. Cognitive load theory, as defined by Sweller,3 states that optimum learning occurs in humans when the load on working memory is kept to a minimum to best facilitate the changes in long-term memory. Displaying information in visual form may circumvent this to some degree, but not all visual representations may be appropriate to searching for new pieces of information or anomalies in the data. This suggests the cognitive and human-computer interface aspects of visualization are extremely important, and until these issues are addressed effectively, information and knowledge will remain undiscovered, at least where computers are being used.

Further challenges

One of the difficulties with discovering new information is that it often lies outside the boundaries of the current investigation, or it may be transitory, present only for a particular period of time. In certain circumstances these time constraints can be external. In security investigations, for example, we may be given a time limit within which an investigation must be conducted. If no new information is uncovered within a specified time interval, the investigation must be concluded. What methods might be used to optimally home in on areas where new information may be found? Our investigations therefore could be subject to internal and external time constraints. If we knew what we were looking for, we would open up the relevant part of the boundary or time window to ensure we could investigate it.

In addition, if we don’t know what to expect, how do we know if we have found it? According to the principle of falsifiability, defined by Karl Popper in the 1960s, progress in scientific discovery and understanding is made through the iterative refinement of existing theories by discovering new information that is inconsistent with the theory.4 Thomas Kuhn has found little evidence of this and has argued that scientists work more in a series of paradigms;5 hence the use of the term paradigm shift.

Given the increasing size and complexity of data sets produced by laboratory experiments or the observations of natural phenomena, the volume of data to be analyzed is a major challenge. Even with interactive visual tools and sophisticated data analysis algorithms, this is still difficult and time-consuming.

Sense-making

Once data has been gathered and organized into forms to facilitate further inquiry, analysts perform a variety of sense-making activities on and with them. One would hope to be so fortunate that the threads of evidence and discovery fit together seamlessly to expose the greater insights embedded in the data, but this is rarely the case in practice. Analysts must make connections between disparate pieces of data and begin to construct plausible scenarios of the bigger picture.

Visual analytic systems like those the articles in this issue describe can help analysts examine the data under new perspectives or simply in a fashion that makes it easier to understand the trends, themes, and relationships the data suggests. Essentially, analysts construct schema that map the facts and data being examined into higher-order plans and activities. Visual analytic tools assist in the evidence gathering and information foraging aspects of this process as well as the integration and construction of new knowledge phases.

Analytical reasoning

Discovery of the unexpected is a critical part of the analytical reasoning process. When people are trying to make sense of their data to understand situations and decide on an action, they develop various scenarios for actions and their outcomes, then evaluate data against their mental models of these scenarios to determine how to maximize the outcome. People must be able to identify unexpected information and have support for incorporating that information into their thought processes to determine not only how it affects the potential outcomes that they envision, but also whether it invalidates the potential scenarios themselves.

However, this can be a challenging process. To take a simple example, when using computer systems, users often stick with the particular settings they have always used (often the defaults), even though other settings might be better. In more complex analytic situations, cognitive biases can prevent us from seeing and interpreting information accurately. Tools and techniques are needed to help overcome users’ inherent human limitations to be able to see and truly understand their data.

Discovering knowledge

Using data analysis algorithms to search large volumes of data might uncover new items of information. However, their significance may be related to other items of information that are not discovered. Thus, only a partial, and perhaps erroneous, picture is obtained. Is it possible to adopt a more holistic approach that uncovers all the new and unexpected pieces of information in a data set, and the relationships between them?

More traditional information discovery approaches have relied on search engines to find significant pieces of data. This is appropriate for some problems. However, the significance of one piece of data may lie more in its relationship to another piece of data so that the total is more than the sum of the discrete parts. Furthermore, the importance of new information may be apparent only in the context of the user’s understanding of the problem at hand. Although information discovery can be supported by learning techniques such as hybrid neural networks and genetic algorithms, the human user’s understanding of the situation plays a key role in discovering knowledge and developing insight.

Stephen H. Muggleton says:

During the twenty-first century, it is clear that computers will continue to play an increasingly central role in supporting the testing, and even formulation, of scientific hypotheses. This traditionally human activity has already become unsustainable in many sciences without the aid of computers. This is not only because of the scale of the data involved but also because scientists are unable to conceptualize the breadth and depth of the relationships between relevant databases without computational support. The potential benefits to science of such computerization are high – knowledge derived from large-scale scientific data could well pave the way to new technologies, ranging from personalized medicines to methods for dealing with and avoiding climate change [Towards 2020 Science (Microsoft, 2006); http://research.microsoft.com/towards2020science]. ... Meanwhile, machine-learning techniques from computer science (including neural nets and genetic algorithms) are being used to automate the generation of scientific hypotheses from data. Some of the more advanced forms of machine learning enable new hypotheses, in the form of logical rules and principles, to be extracted relative to predefined background knowledge. ... One exciting development that we might expect in the next ten years is the construction of the first microfluidic robot scientist, which would combine active learning and autonomous experimentation with microfluidic technology.6

The articles in this special issue

“Visual Discovery in Computer Network Defense,” by D’Amico et al., explores using visual tools to assist in locating patterns of network activity in large volumes of data. It also aims to provide a framework that synchronizes with the cognitive and operational requirements of analysts who work in this field. Since this approach is designed to uncover both known and currently unknown forms of user activity, it is designed to assist with the discovery of new forms of activity, that is, activity unexpected within the current framework. Thus, the approach seeks to extend the boundaries of current systems.

“Insights Gained through Visualization for Large Earthquake Simulations,” by Chourasia et al., applies visualization techniques to simulations using massive data sets. The objective is to make the simulation as close to the physical situation as possible, so that the simulation can be predictive of the future. The visualizations have enabled instabilities in the simulation process to be uncovered and also delivered new results in the end points of the simulations that the seismologists did not expect.

In “Visualizing Diversity and Depth over a Set of Objects,” by Pearlman et al., the authors have developed tools to understand the attributes of a set’s members. They used two parameters: depth, which refers to the prevalence of the distribution of attribute values in the set, and diversity, which refers to the distribution of these values across a range. Composite representations capture these values in the set and communicate this information to the user. The technique has been studied in three application domains and delivered some unexpected results.

In “nSpace and GeoTime: A VAST 2006 Case Study,” Proulx and his colleagues discuss the use of their visual analytics systems in working on the 2006 IEEE Symposium on Visual Analytics Science and Technology Contest. The authors describe the analytic processes undertaken in working on this challenge and how these systems assisted their exploration and sense-making activities. Their suite of tools combines data analysis algorithms and techniques with flexible visualizations and user interfaces, resulting in an environment that allows analysts to pose and research hypotheses about the data.

“Bridging the Semantic Gap: Visualizing Transition Graphs with User-Defined Diagrams,” by Pretorius and van Wijk, presents a method for assisting with the sense-making of data. Custom diagrams convey the semantics associated with the data. Two applications of the technique to large real-world applications show how new properties were discovered and an unknown error was identified.

References

1. J.H. Clark, “Foreword,” An Introductory Guide to Scientific Visualization, R.A. Earnshaw and N. Wiseman, Springer Verlag, 1992.
2. Illuminating the Path: The Research and Development Agenda for Visual Analytics, J.J. Thomas and K.A. Cook (eds.), IEEE CS Press, 2005, p. 186.
3. J. Sweller, “Cognitive Load During Problem Solving: Effects on Learning,” Cognitive Science, vol. 12, no. 1, 1988, pp. 257-285.
4. K. Popper, The Logic of Scientific Discovery, Routledge, 2002 (originally published 1959).
5. T.S. Kuhn, The Structure of Scientific Revolutions, Univ. Chicago Press, 1996 (originally published 1962).
6. S.H. Muggleton, “Exceeding Human Limits,” Nature, vol. 440, 2006, pp. 409-410.

Kris Cook is a project manager at Pacific Northwest National Laboratory, where she has led R&D efforts in information visualization and visual analytics projects for the past 11 years. As part of the leadership team for the National Visualization and Analytics Center, she coordinates the work of five Regional Visualization and Analytics Centers at universities throughout the United States. She has a BS in chemical engineering from The Ohio State University. Contact her at [email protected].

Rae Earnshaw is pro vice-chancellor (Strategic Systems Development) at the University of Bradford, UK, and professor of electronic imaging and media communications in the School of Informatics. He has authored and edited 33 books on computer graphics, visualization, multimedia, design, and virtual reality, and published 140 papers in these areas. He has a PhD in computer science from the University of Leeds. He is a member of ACM, IEEE, Computer Graphics Society, Eurographics, a Fellow of the British Computer Society, a Fellow of the Institute of Physics, and a Fellow of Royal Society of Arts. Contact him at [email protected].

John Stasko is a professor in the School of Interactive Computing and the Graphics, Visualization, and Usability Center at the Georgia Institute of Technology, where he is director of the Information Interfaces Research Group. His research is in the area of human-computer interaction with a specific focus on information visualization, visual analytics, and the peripheral awareness of information. He has a PhD in computer science from Brown University. Stasko is on the editorial staff of five journals focusing on the topics of visualization and HCI, and is on the steering committee for the IEEE Information Visualization Conference. Contact him at [email protected].

IEEE Computer Graphics and Applications, September/October 2007
