Data Mining Using Graphics Processing Units
Christian Böhm1, Robert Noll1, Claudia Plant2, Bianca Wackersreuther1, and Andrew Zherdin2
1 University of Munich, Germany {boehm,noll,wackersreuther}@dbs.ifi.lmu.de
2 Technische Universität München, Germany {plant,zherdin}@lrz.tum.de
Abstract. During the last few years, Graphics Processing Units (GPUs) have evolved from simple devices for display signal preparation into powerful coprocessors that not only support typical computer graphics tasks such as rendering of 3D scenarios but can also be used for general numeric and symbolic computation tasks such as simulation and optimization. As a major advantage, GPUs provide extremely high parallelism (with several hundred simple programmable processors) combined with a high memory bandwidth, at low cost. In this paper, we propose several algorithms for computationally expensive data mining tasks such as similarity search and clustering which are designed for the highly parallel environment of a GPU. We define a multidimensional index structure which is particularly suited to support similarity queries under the restricted programming model of a GPU, and define a similarity join method. Moreover, we define highly parallel algorithms for density-based and partitioning clustering. In an extensive experimental evaluation, we demonstrate the superiority of our algorithms running on the GPU over their conventional counterparts on the CPU.
1 Introduction
In recent years, Graphics Processing Units (GPUs) have evolved from simple devices for display signal preparation into powerful coprocessors supporting the CPU in various ways. Graphics applications such as realistic 3D games are computationally demanding and require a large number of complex algebraic operations for each update of the display image. Therefore, today's graphics hardware contains a large number of programmable processors which are optimized to cope with this high workload of vector, matrix, and symbolic computations in a highly parallel way. In terms of peak performance, graphics hardware has outperformed state-of-the-art multi-core CPUs by a large margin.

The amount of scientific data is approximately doubling every year [26]. To keep pace with this exponential data explosion, there is a great effort in many research communities such as life sciences [20,22], mechanical simulation [27], cryptographic computing [2], and machine learning [7] to use the computational capabilities of GPUs even for purposes which are not at all related to computer graphics.
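To make this one-thread-per-element programming model concrete, the following minimal CUDA sketch (our illustration, not taken from the paper; the kernel name vecAdd and all parameters are assumptions) launches one GPU thread per vector element to add two vectors, the kind of fine-grained data parallelism that the algorithms discussed later exploit:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative sketch (not from the paper): element-wise vector addition.
// Each GPU thread computes exactly one output element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the last block
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                 // one million elements
    const size_t bytes = n * sizeof(float);

    // Host-side input and output buffers.
    float *hA = new float[n], *hB = new float[n], *hC = new float[n];
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Device buffers and host-to-device transfer.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    vecAdd<<<(n + threads - 1) / threads, threads>>>(dA, dB, dC, n);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);  // copy result back
    printf("hC[0] = %f\n", hC[0]);                      // expected: 3.0

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    delete[] hA; delete[] hB; delete[] hC;
    return 0;
}
```

The bounds check inside the kernel is needed because the number of launched threads is rounded up to a multiple of the block size; the explicit host-device copies also illustrate why the memory transfer bandwidth mentioned above matters in practice.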