Massive Data Exploration and Visualization

Web Page: http://graphics.cs.ucdavis.edu

Participating UC Davis Faculty: Bernd Hamann (CS -- [email protected]), Kenneth I. Joy (CS -- [email protected]), Oliver G. Staadt (CS -- [email protected])

Center for Image Processing and Integrated Computing (CIPIC), UC Davis

(1) Interactive Visual Multi-dimensional Data Exploration. Scientific and engineering applications are producing ever larger data sets, including inherently multi-dimensional ones. The explosion in multi-dimensional data is especially evident in biology and the life sciences: DNA-chip technology enables scientists to generate vast data collections for which hardly any reliable and useful interactive visual exploration tools exist. G. Grinstein states: "What makes bioinformatics ripe for visualization research and commercialization is the accelerated rate of generation of truly massive quantities of complex data for analysis and interpretation. ... The data sets are large: they have a large number of dimensions, and an even larger number of observations or records. They challenge our ability to successfully analyze and visualize data." Research investments are needed to bring together statistics, vision, AI, data mining, and visualization, and thereby to lay solid foundations for interactive visual multi-dimensional data exploration frameworks. Clusters of powerful computers will become increasingly important for dealing with the massive data exploration problems facing us.

The relatively new field of "information visualization" has developed techniques that are more "problem-specific" in nature than the visualization techniques used for data sets permitting a direct geometrical interpretation, such as a 3D temperature field. The fact that most research concerned with inherently "abstract" and multi-dimensional data has been, and to a large degree still is, done in the private sector might explain why only a few good visual exploration methods exist for multi-dimensional data. Multivariate statistical methods, e.g., cluster analysis, outlier detection, multi-dimensional scaling, and correlation analysis, are well studied from a theoretical point of view. Unfortunately, what has not happened to date is the development of algorithms based on such fundamental approaches that permit truly interactive exploration of massive multi-dimensional data. The problem to be addressed, therefore, has at least two important aspects: (1) How can multi-dimensional data, possibly including derived data, be appropriately "projected" to a number of dimensions suitable for computer graphics? (2) How can traditional methods be adapted so that responses, including extracted "features," are obtained in real time, within a fraction of a second? We propose to investigate these issues and to devise novel approaches to solving these pressing problems.
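Question (1) can be made concrete with principal component analysis (PCA), a standard multivariate technique. PCA is chosen here purely for illustration, since this document does not commit to any particular projection method. The following is a minimal sketch, assuming small in-memory data and using only the Python standard library; the function names (center, covariance, next_axis, project) are hypothetical and not part of any existing tool.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean so the data cloud sits at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else v


def _remove_components(v, axes):
    """Gram-Schmidt step: make v orthogonal to every axis found so far."""
    for a in axes:
        dot = sum(x * y for x, y in zip(v, a))
        v = [x - dot * y for x, y in zip(v, a)]
    return v


def next_axis(cov, found, iterations=200, seed=0):
    """Power iteration restricted to the space orthogonal to `found`."""
    rng = random.Random(seed)
    d = len(cov)
    start = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    v = _normalize(_remove_components(start, found))
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        w = _remove_components(w, found)
        # Assumed absolute threshold: stop when no variance is left to find.
        if math.sqrt(sum(x * x for x in w)) < 1e-12:
            break
        v = _normalize(w)
    return v


def project(rows, k=2):
    """Project d-dimensional rows onto their k leading principal axes."""
    centered = center(rows)
    cov = covariance(centered)
    axes = []
    for _ in range(k):
        axes.append(next_axis(cov, axes))
    return [[sum(x * a_j for x, a_j in zip(r, a)) for a in axes]
            for r in centered]


if __name__ == "__main__":
    # Illustrative 4-D records that vary along a single direction only.
    data = [[float(t), 2.0 * t, 0.0, 0.0] for t in range(10)]
    for point in project(data, k=2):
        print(point)
```

The projected 2-D points can be handed directly to a scatter-plot renderer. This sketch does not address question (2): building the covariance matrix costs time proportional to the number of records times the square of the number of dimensions, so massive data would require sampling, incremental updates, or cluster-based computation to reach interactive rates.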

"More generally applicable" interactive techniques for abstract, multi-dimensional data exploration are needed, as the traditional approach of developing ad-hoc, problem-specific data analysis solutions does not keep pace with our ability to generate data. We believe that the "essentials" of a widely applicable interactive visual data exploration framework can be clearly identified only by working closely with application scientists and engineers, having them clearly define their objectives, and extracting the underlying, unifying themes that can support a general multi-dimensional data exploration framework, while keeping such a framework open to the extensions that highly specialized applications require. CITRIS represents an extremely broad spectrum of scientific and engineering applications with needs in multi-dimensional data exploration. By having "problem providers" (data producers) work in collaboration with visualization experts, we believe that substantial progress can be made towards more general massive data exploration and visualization techniques.

(2) Distributed Synthetic Collaborative Environments. Mastering rapidly changing computing and communication resources is essential to personal and professional success in a global scientific community. The main challenge lies not merely in accessing data but in extracting relevant information and combining it into new structures. The efficient and collaborative deployment of applications becomes increasingly important as the tools at our disposal grow more complex and interactive. Today's technology enables information exchange and simple communication. However, it often fails in the promising field of computer-enhanced collaboration in virtual reality environments. Some improvements have come from maturing virtual reality systems that offer a variety of tools for stand-alone visual analysis. Nevertheless, the crucial interaction between humans and virtual objects is mostly neglected. Therefore, successful models of truly computer-supported collaborative work are still rare.

A novel tele-presence environment that is more than a virtual extension of a real office or conference room will allow scientists to meet and to collaborate in distributed information spaces. Each user will be able to navigate freely in the synthetic environment. For this to become reality, novel visualization and interaction methods have to be combined with advanced interconnected collaborative environments. By employing a range of devices, from fully immersive displays to wireless hand-held devices, scientists would be able to access and interact with shared environments. These visualization portals can be equipped with multiple cameras that capture three-dimensional representations of objects or persons in real time, which can then be merged with the virtual environment.

Major research challenges in building a novel collaborative tele-presence environment include the design and development of scalable rendering and visualization architectures. Furthermore, novel paradigms for interactive and collaborative visualization of large data sets have to be developed. State-of-the-art methods for capturing and distributing three-dimensional object representations are not fast enough for use in interactive environments. Here again, powerful clusters of computers will be an integral part of any system architecture suited to overcoming the underlying massive data processing problems.

In conjunction with other CITRIS thrusts, research efforts would aim to set up a tele-presence environment for collaborative scientific data exploration and visualization. Research aspects will include scalable graphics and visualization architectures, real-time capture of three-dimensional objects, and methods for interactive visualization of massive data. The potential benefit is the development of visualization portals suitable for a wide range of collaborative visualization applications.

Desired Support: Funding for grad student participants. One Sun workstation and software support.
