fMRI Analysis on the GPU - Possibilities and Challenges

Anders Eklund, Mats Andersson and Hans Knutsson

Linköping University Post Print

N.B.: When citing this work, cite the original article.

Original Publication:

Anders Eklund, Mats Andersson and Hans Knutsson, fMRI Analysis on the GPU - Possibilities and Challenges, 2012, Computer Methods and Programs in Biomedicine, (105), 2, 145-161. http://dx.doi.org/10.1016/j.cmpb.2011.07.007 Copyright: Elsevier http://www.elsevier.com/

Postprint available at: Linköping University Electronic Press http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-69677

fMRI Analysis on the GPU - Possibilities and Challenges

Anders Eklunda,b,∗, Mats Anderssona,b, Hans Knutssona,b

aDivision of Medical Informatics, Department of Biomedical Engineering
bCenter for Medical Image Science and Visualization (CMIV)
Linköping University, Sweden

Abstract

Functional magnetic resonance imaging (fMRI) makes it possible to non-invasively measure brain activity with high spatial resolution. There are however a number of issues that have to be addressed. One is the large amount of spatio-temporal data that needs to be processed. In addition to the statistical analysis itself, several preprocessing steps, such as slice timing correction and motion compensation, are normally applied. The high computational power of modern graphics cards has already been used successfully for MRI and fMRI. Going beyond the first published demonstration of GPU-based analysis of fMRI data, all the preprocessing steps and two statistical approaches, the general linear model (GLM) and canonical correlation analysis (CCA), have been implemented on a GPU. For an fMRI dataset of typical size (80 volumes with 64 x 64 x 22 voxels), all the preprocessing takes about 0.5 s on the GPU, compared to 5 s with an optimized CPU implementation and 120 s with the commonly used statistical parametric mapping (SPM) software. A random permutation test with 10 000 permutations, with smoothing in each permutation, takes about 50 s if three GPUs are used, compared to 0.5 - 2.5 h with an optimized CPU implementation. The presented work will save time for researchers and clinicians in their daily work and enables the use of more advanced analysis, such as non-parametric statistics, both for conventional fMRI and for real-time fMRI.

Keywords: Functional magnetic resonance imaging (fMRI), Graphics processing unit (GPU), CUDA, General linear model (GLM), Canonical correlation analysis (CCA), Random permutation test

1. Introduction & Motivation

Functional magnetic resonance imaging (fMRI), introduced by Ogawa et al. [1], besides conventional MRI and diffusion MRI [2], is a modality that is becoming more and more common as a tool for planning brain surgery and for understanding the brain. One problem with fMRI is the large amount of spatio-temporal data that needs to be processed. A normal experiment results in voxels that are of the size 2-4 mm in each dimension, and the collected volumes are for example of the resolution 64 x 64 x 32 voxels. For a typical acquisition speed of one volume every other second, a 5 minute long fMRI experiment will result in 150 volumes that need to be processed. Before the data can be analyzed statistically, it has to be preprocessed in order to, for example, compensate for head movement and to account for the fact that the slices in a volume are collected at slightly different time points.

Graphics processing units (GPUs) have already been applied to a variety of fields [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14] to achieve significant speedups (2 - 100 times) compared to optimized CPU implementations. Applications of GPUs in the field of MRI have been image reconstruction [13], fiber mapping from diffusion tensor MRI data [12], image registration [9, 11] and visualization of brain activity [15, 16, 17]. The main advantage of GPUs compared to CPUs is the much higher degree of parallelism. As the time series are normally regarded as independent in fMRI analysis, it is perfectly suited for parallel implementations. The first work about fMRI analysis on the GPU is that by Gembris et al. [14], who used the GPU to speed up the calculation of correlations between voxel time series, a technique that is commonly used in resting state fMRI [18] for identifying functional brain networks. Liu et al. [19] have also used the GPU to speed up correlation analysis. We recently used the GPU to create an interactive interface, with 3D visualization, for exploratory functional connectivity analysis [6]. Another example is the work by da Silva [10], who used the GPU to speed up the simulation of a Bayesian multilevel model. A corresponding fMRI CUDA package, cudaBayesreg [20], was created for the statistical program R. A final example is the Matlab - GPU interface Jacket [21], which has been used to speed up some parts of the commonly used statistical parametric mapping (SPM) software [22].

In this work the first complete fMRI GPU processing pipeline is presented, where both the preprocessing and the statistical analysis are performed on the GPU. The following sections further justify the use of GPUs for fMRI analysis.

∗ Corresponding author at: Division of Medical Informatics, Department of Biomedical Engineering, Linköping University, University Hospital, 581 85 Linköping, Sweden. Tel: +46 (0)13 28 67 25. Email address: [email protected] (Anders Eklund)

Preprint submitted to Computer Methods and Programs in Biomedicine, January 23, 2012

1.1. Towards Higher Temporal and Spatial Resolution

An alternative to fMRI is electroencephalography (EEG), which provides direct information about the electrical activity of neural ensembles compared to fMRI, which at present depicts activation related blood flow changes. The sampling rate can therefore be lower, which accommodates the lower speed possible with fMRI. An aim of ongoing efforts is to achieve an acquisition speed for volume data previously only possible for single slice data. Approaches currently investigated are compressed sensing [23], parallel imaging [24] and EEG-like MR imaging [25].

The advantage of fMRI compared to EEG is a much higher spatial resolution, which can be increased even further by using stronger magnetic fields. Heidemann et al. [26] have proposed how to obtain an isotropic voxel size of 0.65 mm with a 7 T MR scanner. With a repetition time of 3.5 seconds they obtain volumes of the resolution 169 x 240 x 30 voxels, which are much more demanding to process than volumes of the more common resolution 64 x 64 x 32 voxels.

Increasing temporal and spatial resolution will put even more stress on standard computers and is one reason to use GPUs for future fMRI analysis.

1.2. More Advanced Real-Time Analysis

Another reason to use GPUs is to enable more advanced real-time analysis. There are several applications of real-time fMRI; an overview is given by deCharms [27]. One of the possible applications is interactive brain mapping, where the fMRI analysis runs in real-time. The physician then sees the result of the analysis directly and can interactively ask the subject to perform different tasks, rather than performing a number of experiments and then looking at static activity maps.

There are only three approaches to perform the statistical analysis faster: focusing the analysis on a region of interest instead of the whole brain, calculating the same thing in a smarter way, or increasing the processing power. Cox et al. [28] in 1995 proposed a recursive algorithm to be able to run the fMRI analysis in real-time, and a sliding window approach was proposed by Gembris et al. [29] in 2000. Parallel processing of fMRI data is not a new concept; already in 1997 Goddard et al. [30] described parallel platforms for online analysis. Bagarinao et al. [31] proposed in 2003 the use of PC clusters in order to get a higher computational performance to do the analysis in real-time; 8 PCs were used in their study.

An emerging field of real-time fMRI is the development of brain computer interfaces (BCIs), where the subject and the computer can work together to solve a given task. We have demonstrated that it is possible to control a dynamical system [32] and communicate [33] by classifying the brain activity every second; similar work has been done by LaConte et al. [34]. deCharms et al. [35] helped subjects to suppress their pain by letting them see their own brain activity in real-time.

1.3. High Quality Visualization

Visualization of brain activity is perhaps the most obvious application of GPUs for fMRI data, as demonstrated by Rößler et al. [15] and Jainek et al. [16]. In our recent work [17] the low resolution fMRI activity volume is fused with the high resolution anatomical volume by treating the fMRI signal as a light source, such that the brain activity glows from the inside of the brain. The brain activity is estimated in real-time by performing canonical correlation analysis (CCA) on a sliding window of data.

1.4. More Advanced Conventional Analysis

A higher computational power is not only needed for real-time analysis; in 2008 Cusack et al. [36] used a PC cluster with 98 processors to analyze 1400 fMRI datasets from 330 subjects. Woolrich et al. [37] proposed a fully Bayesian spatio-temporal modeling of fMRI data based on Markov chain Monte Carlo methods. The downside of their approach is the required computation time. In 2004 the calculations for a single slice needed 6 h, i.e. a total of 5 days for a typical fMRI dataset with 20 slices. Nichols and Holmes [38] describe how to apply permutation tests for multi-subject fMRI data, instead of parametric tests such as the general linear model. In 2001 it took 2 hours to perform 2000 permutations for between subject analysis.

There is thus a need of more computational power, both for conventional fMRI and for real-time fMRI. This need is not unique to the field of fMRI; it basically exists for any kind of medical image analysis, since the amount of data that is collected for each subject has increased tremendously during the last decades.

1.5. Using a GPU instead of a PC Cluster

Previous and many current approaches to high performance computing (HPC) have to a large extent been based on PC clusters, which generally cause less acquisition costs than dedicated parallel computers, but still more than single-PC solutions, and also require more effort for administration. An application of PC clusters for fMRI data processing has been described by Stef-Praun et al. [39]. The parallel computational power of a modern graphics card provides a more efficient and less expensive alternative.

A research group that works with computational methods for tomography, at the University of Antwerp, has put together a GPU supercomputer that they call Fastra II [40]. They have made a comparison, given in Table 1, between their supercomputer (GPU SC), which consists of 13 GPUs each with 240 processor cores, and a PC cluster (PC C) called CalcUA, which consists of 512 processors. This comparison should not be considered to be completely fair since, for example, the PC cluster is a few years older than the supercomputer. The purpose of the comparison is rather to give an indication of the differences in cost, computational performance and power consumption.

Table 1: A comparison between a GPU supercomputer [40] and a PC cluster in terms of cost, computational performance and power consumption. TFLOPS stands for tera floating point operations per second.

Property                    GPU SC      PC C
Cost                        8000 USD    5 million USD
Computational performance   12 TFLOPS   3 TFLOPS
Power consumption           1.2 kW      90 kW

As can be seen in the table, the ratio between computational performance and cost for the GPU supercomputer outreaches that of the PC cluster by three orders of magnitude. In spite of the speed gain, the GPU system has a lower power consumption, leading to less operational costs. A better transportability is a further advantage of GPU systems. A drawback of PC clusters is their availability, as they often have to be shared between users, while a PC with one or several powerful graphics cards can be used as an ordinary desktop. A further advantage is the reduction of space requirements, e.g. useful when placing such systems for real-time applications in MR control rooms, which are often small.

1.7. Structure of the Paper

The rest of this paper is divided into six parts. The first part describes the main architecture of the Nvidia GPU, the different memory types and the principles of parallel calculations in general. The second part describes the different kinds of preprocessing that are normally applied to the fMRI data, i.e. slice timing correction, motion compensation, detrending, etc., and how they were implemented on the GPU. In the third part the GPU implementation of two statistical approaches, the general linear model (GLM) and canonical correlation analysis (CCA), for analysis of fMRI data is described. The fourth part describes the general concepts of our implementations in SPM, Matlab, Matlab OpenMP and Matlab CUDA, for which performance comparisons are presented. In the fifth part the processing times for the different implementations are given. The paper ends with a discussion about the different implementations and the possibilities and challenges of GPU-based fMRI analysis.

1.6. Programming the GPU

In the past a special programming language, such as Open Graphics Library (OpenGL), DirectX, high level shading language (HLSL), OpenGL shading language (GLSL), C for graphics (Cg) etc., was the only possibility to take advantage of the computational power of GPUs. These languages are well suited for the implementation of calculations that are somehow related to computer graphics, but less so for others. Today it is possible to program GPUs from Nvidia with a programming language called CUDA (Compute Unified Device Architecture). CUDA is very similar to standard C programming and allows the user to do arbitrary calculations on the GPU. The syntax of CUDA is easy to learn and it is easy to write a program for a certain task; in order to achieve optimal performance, however, deep knowledge about the GPU architecture is required. The CUDA programming language only works for GPUs from Nvidia. For general computations on GPUs from ATI something called "stream computing" exists. The Khronos group have created a new programming language, called Open Computing Language (OpenCL) [41], in order to be able to use the same code on any type of hardware, not only different kinds of GPUs. However, recent work by Kong et al. [42] shows that the same code implemented using CUDA or OpenCL can currently give very different performance.

Our implementations are based on the Matlab software (Mathworks, Natick, Massachusetts), which is used by many scientists for signal and image processing, and in particular for fMRI data analysis in SPM, as this widely employed software is programmed in Matlab. Matlab is very slow for certain calculations, such as for-loops, and to remedy this deficiency it is possible to rewrite parts of the code into mex-files, which allow the embedding of C code in Matlab programs. It is now also possible to include CUDA code, which is executed by the GPU, in the mex-files.

2. Parallel Calculations and the Nvidia GPU Architecture

This section gives an introduction to parallel calculations in general and describes the hardware of the Nvidia GPU, in particular that of the graphics card Nvidia GTX 480, which has been used for the implementations and testing. The interested reader is referred to the Nvidia programming guide [43] and the book about parallel calculations with CUDA by Kirk and Hwu [44] for more details about CUDA programming.

A general concept that is valid for any kind of processing on the GPU is that the data transfers between the GPU and the CPU should be minimized. Certain calculations, like the computation of the correlation between each voxel time series and the stimulus paradigm, might not result in a significant speedup, since it takes time to transfer the fMRI volumes to the GPU and the activity map back from the GPU. The fMRI data should therefore both be preprocessed and statistically analyzed on the GPU. As it is easy to visualize something that is already on the GPU, it is possible to visualize the activity map without copying it back to the CPU.

2.1. Parallel Calculations

While the central processing unit (CPU) of any computer is good at serial calculations and branching, the GPU is very good at parallel calculations. The reason for this is the evolution of more and more complex computer games with more advanced computer graphics, simulated physics and better artificial intelligence. A modern CPU has 4-6 processor cores and can run 8-12 threads at the same time. The current state of the art GPU from Nvidia has 480 processor cores and can run 23 040 threads at the same time.

To be able to do the calculations in parallel, the processing of each element has to be independent of the processing of other elements. The suitability of a certain application for a GPU implementation thus greatly depends on the possible degree of parallelization. For fMRI analysis this means that the estimation of activity in a voxel should not depend on the activity result from other voxels; fortunately this is seldom the case.

When a task is performed on the GPU, a large number of threads is used. Each thread performs a small part of the calculations, e.g. only those for a single pixel or voxel. The threads are divided into a number of blocks, the grid, with a certain number of threads per block. Each block and each thread has an index that is used in order for each thread to know what data to operate on. Functions that run on the GPU are called kernels.

2.2. The Hardware Model of the Nvidia GPU

The Nvidia GTX 480 has 480 processor cores that run at 1.4 GHz. The processor cores are divided into 15 multiprocessors (MP) with 32 processor cores each. Each multiprocessor has its own cache for texture memory and constant memory, L1 cache, shared memory and a set of registers. The architecture of a multiprocessor on the GTX 480 is shown in Fig. 1. In order to achieve high performance, each multiprocessor should manage a large number of threads; 1536 is the maximum for the GTX 480. It is also important that the threads have a balanced workload, otherwise some threads will be inactive while they wait for active threads to finish. The different memory types of the GPU are described below.

• Global memory
Global memory is used to store variables between the execution of different kernels. It is important that memory read and write accesses to global memory are coalesced, i.e. that as many transactions as possible are done in the same clock cycle, by taking advantage of the wide data bus.

• Constant memory
For constants that are read by all or many of the threads, like a filter kernel in image or volume convolution, the constant memory should be used, since it is cached and thereby much faster than the global memory.

• Texture memory
The texture memory is a special way of using the global memory, such that read accesses that are local are cached, providing a good alternative when coalesced reads are hard to achieve. The texture memory is predestined for image registration applications, as it has hardware support for linear interpolation. This means that it does not take more time to get the trilinearly interpolated value at the floating point position (1.1, 2.5, 3.3) than the original value at the integer position (1, 2, 3). Operations such as rotation and translation of images and volumes are therefore extremely fast on the GPU.
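The grid/block/thread indexing described in Section 2.1 can be illustrated with a small CPU-side emulation in Python (a sketch for illustration only, not the paper's code; a real CUDA kernel would compute the same index as blockIdx.x * blockDim.x + threadIdx.x, with all threads running in parallel):

```python
import numpy as np

def kernel(data, out, block_idx, thread_idx, threads_per_block):
    # Each thread handles one element, located from its block and thread index.
    i = block_idx * threads_per_block + thread_idx
    if i < data.size:           # guard: the last block may have excess threads
        out[i] = 2.0 * data[i]  # trivial per-element work

def launch(data, threads_per_block=256):
    # Emulate a 1D grid launch: enough blocks to cover all elements.
    out = np.empty_like(data)
    num_blocks = (data.size + threads_per_block - 1) // threads_per_block
    for b in range(num_blocks):             # blocks run in parallel on a GPU
        for t in range(threads_per_block):  # threads within one block
            kernel(data, out, b, t, threads_per_block)
    return out
```

The bounds check in the kernel corresponds to the guard needed in CUDA whenever the number of elements is not a multiple of the block size.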

Figure 1: The hardware model for a multiprocessor on the Nvidia GTX 480. The multiprocessor consists of 32 processor cores, which share registers, shared memory / L1 cache and a cache for constant memory and texture memory.

One main difference between the CPU and the GPU is that the GPU has a lot of different memory types with different characteristics, while the CPU only has a big global memory and a cache memory. In order to achieve optimal performance, it is necessary to know when and how to use the different memory types. Another difference is that the memory bandwidth is much higher for the GPU, due to the fact that a lot of concurrent threads have to read from and write to the memory at the same time. In Table 2 a comparison between consumer graphics cards from Nvidia and ATI, from three different generations, is given.

• Shared memory
The shared memory can be read from and written to by all the threads in the same block. Applications can be made faster by copying data from the global memory to the shared memory first. An application well suited for this type of memory is convolution: a block of pixels is first read into the shared memory and then the filter responses are calculated in parallel by letting many threads read from the shared memory at the same time. As, for example, a 9 x 9 filter uses 81 pixels to calculate the filter response for each pixel, and the filter response at a neighbouring position is calculated from mainly the same pixels, the shared memory is a very efficient way to let the threads share the data. Due to the small size of the shared memory it is sufficient for 2D convolution, but for 3D and 4D convolution it is not as efficient. The reason for this is that the proportion of valid filter responses that fit into the shared memory decreases rapidly with every increase in the number of dimensions. One possible remedy is to use an optimized kernel for 2D convolution, and then incrementally calculate the filter response by calling the kernel once for each sample of the remaining dimensions.

• Registers
The registers are specific for each thread and are used to store thread specific variables. If the number of registers per multiprocessor is divided by the number of active threads per multiprocessor, an approximate number of possible registers per thread is obtained. Full occupancy on the GTX 480 means that 1536 threads run on each multiprocessor. At full occupancy each thread can use 21 registers at most, since the GTX 480 has 32768 registers per multiprocessor.

• L1 and L2 cache
Each multiprocessor on the GTX 480 has an L1 cache that is used to speed up transactions to and from the global memory. The GTX 480 also has one L2 cache that helps different multiprocessors to share data in an efficient way.

The global memory on the GPU is currently of the size 256 MB to 4 GB, and the GTX 480 has a memory bandwidth of 177.4 GB/s. Despite this high bandwidth, the global memory is the slowest memory type and should only be used when the faster memory types do not suffice.

3. Data

To test our implementations, a typical fMRI dataset that consists of 80 volumes of the resolution 64 x 64 x 22 voxels was used. Each voxel has the physical size of 3.75 x 3.75 x 3.75 mm. The data was collected with a Philips Achieva 1.5 T MR scanner. The data contains four time periods, where the subject alternately performed an activity and stayed passive, in periods of 20 seconds each. The repetition time (TR) was 2 s, the echo time (TE) was 40 ms and the flip angle was 90 degrees. The sequence used for the data collection was EPI (echo planar imaging).

In single precision floating point format the fMRI dataset requires 29 MB of storage, and it is thereby no problem to fit it into the global memory on the GPU. The limited size of the global memory is otherwise one disadvantage of using GPUs.

4. Preprocessing of fMRI Data on the GPU

Before the fMRI data is statistically analyzed, several preprocessing steps are normally applied. A comprehensive review of the different preprocessing steps is given by Strother [45]. A description of the most common preprocessing steps and their implementation on the GPU is given in this section.

4.1. Slice Timing Correction

The fMRI volumes are normally acquired by sampling one slice at a time. This means that the slices in a volume are taken at slightly different timepoints. If a volume consists of 20 slices and it takes 2 seconds to acquire the volume, there is a 0.1 s difference between each slice. In order to compensate for this, slice timing correction is applied such that all the slices in a volume correspond to the same timepoint. Slice timing correction is critical for event related fMRI experiments, used when higher time resolutions in the order of 100 ms are needed, as shown e.g. by Henson et al. [46], but not for block design experiments like the one from which our data stems, with an activity-rest period of 40 seconds.

The standard way to correct for the time difference between the slices is to use sinc interpolation, which can be applied by altering the phase in the frequency domain. For each voxel time series, a forward 1D fast Fourier transform (FFT), a point-wise multiplication and an inverse 1D FFT are required to perform sinc interpolation of the fMRI data. This is well suited for the GPU, since many small FFTs can run in parallel. In our implementation a batch of forward 1D FFTs is first executed, then a kernel performs all the multiplications in the frequency domain and finally a batch of inverse 1D FFTs is applied. The CUFFT library by Nvidia includes support for launching a batch of FFTs.

4.2. Motion Compensation

Participants of fMRI studies are usually instructed not to move their head during the measurement, but residual motion can occur during the individual experiments, which are several minutes long. Therefore it is necessary to perform motion compensation of the acquired fMRI volumes, such that they are aligned to one reference volume. There exists a number of implementations for registration of fMRI volumes. One example is the work by Cox et al. [47], who in 1999 were able to perform motion compensation in real-time. Cox et al. are also the developers of the widely used AFNI software for fMRI processing [48]. Motion compensation of fMRI volumes is also implemented in the SPM software. A good comparison of fMRI motion compensation algorithms has been presented by Oakes et al. [49].

Image registration on the GPU has been done by several groups; a recent overview is given by Shams et al. [9], and all have achieved significant speedups compared to optimized CPU implementations. The main approach that has been used for these implementations is to find the rotation and translation that maximize the mutual information between the volumes, by searching for the best parameters with an optimization algorithm. The mutual information approach to image registration is very common and was first proposed by Viola et al. [50].

An algorithm has been implemented by us that is based on optical flow of the local phase from quadrature filters [51], rather than using the image intensity as in previous GPU implementations. Phase based registration was demonstrated earlier by, for example, Hemmendorff et al. [52] and Mellor and Brady [53]. The advantage of the local phase is that it is invariant to the image intensity. Another difference of our implementation is that no optimization algorithm is needed to find the best solution. Instead an equation system is solved in each iteration and the registration parameters are incrementally updated. The equation system is given by globally minimizing the L2 norm of the phase based optical flow equation with respect to the parameter vector. To handle stronger movements one can use several scales, i.e. start on a low resolution scale to find the larger differences and continue on finer scales to improve the registration. For fMRI volumes, it is sufficient to use one scale and 3 iterations per volume, while mutual information based algorithms normally require 10-50 iterations per volume.

In each iteration of the algorithm, three quadrature filters that are oriented along x, y and z are applied. The quadrature filters have a spatial size of 7 x 7 x 7 voxels; their elements are complex valued in the spatial domain and are not Cartesian separable. From the filter responses, phase differences, phase gradients and certainties are calculated for each voxel and each filter. An equation system is created by summing over all the voxels and filters and is then solved to get the optimal parameter vector.

Table 2: A comparison between three Nvidia GPUs and three ATI GPUs, from three different generations, in terms of processor cores, memory bandwidth, size of shared memory, cache memory and number of registers. MP stands for multiprocessor and GB/s stands for gigabytes per second. For the Nvidia GTX 480, the user has the choice of using 48 KB of shared memory and 16 KB of L1 cache or vice versa.

Property / GPU                    Nvidia 9800 GT   Nvidia GTX 285   Nvidia GTX 480   ATI Radeon HD 4870   ATI Radeon HD 5870   ATI Radeon HD 6970
Number of processor cores         112              240              480              160                  320                  384
Normal size of global memory      512 MB           1024 MB          1536 MB          512 MB               1024 MB              2048 MB
Global memory bandwidth           58 GB/s          159 GB/s         177 GB/s         115 GB/s             154 GB/s             176 GB/s
Constant memory                   64 KB            64 KB            64 KB            48 KB                48 KB                48 KB
Shared memory per MP              16 KB            16 KB            48 / 16 KB       None                 32 KB                32 KB
Floating point registers per MP   8192             16384            32768            16384                32768                32768
L1 cache per MP                   None             None             16 / 48 KB       None                 8 KB                 8 KB
L2 cache                          None             None             768 KB           None                 512 KB               512 KB
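Returning to the slice timing correction of Section 4.1: the FFT-based sinc interpolation amounts to multiplying the spectrum of each voxel time series by a linear phase. A numpy sketch of the idea (the GPU version instead runs batched CUFFT transforms; the signal and the fractional delay below are illustrative, and the method is exact only for periodic, band-limited series):

```python
import numpy as np

def slice_timing_shift(ts, delta):
    # Shift a time series by `delta` samples using sinc interpolation:
    # forward FFT, multiplication by a linear phase, inverse FFT.
    n = ts.shape[-1]
    freqs = np.fft.fftfreq(n)                    # frequency in cycles per sample
    phase = np.exp(-2j * np.pi * freqs * delta)  # delay by delta samples
    return np.fft.ifft(np.fft.fft(ts) * phase).real
```

For a 20-slice volume acquired over 2 s, each slice would be shifted by its own fraction of the repetition time so that all slices correspond to the same timepoint.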

From the parameter vector a motion vector is calculated in each voxel, and trilinear interpolation is then applied, using a 3D texture, to rotate and translate the volume. The interested reader is referred to our recent work [11] for details about the algorithm and the CUDA implementation.

Image registration is also needed in order to overlay the resulting activity map onto a high resolution anatomical T1 weighted volume. This is not a preprocessing step, but since it is done for each fMRI experiment, and is related to motion compensation, it is mentioned here. Our registration algorithm can be used for this step as well. Prior to the registration, the activity map needs to be interpolated to have the same resolution as the T1 volume. This is a perfect task for the GPU due to the hardware support for linear interpolation.

4.3. Smoothing

Some prefer to smooth each fMRI volume with a Gaussian lowpass filter before the statistical analysis. Convolution is an operation that is well supported by the GPU, as it is performed the same way in each voxel, except at the borders of the data. Gaussian lowpass filters have the property that they are Cartesian separable, such that the convolution in 2D or 3D can be divided into the corresponding number of 1D convolutions. For a GPU implementation it is important that the filter is separable, since this results in a significant reduction of memory reads and multiplications. For each of the 1D convolutions the fMRI data is first loaded into the shared memory and then the filter kernel, which is stored in constant memory, is applied. Our implementation supports filters up to the size of 9 x 9 x 9 voxels, which is sufficient for smoothing of fMRI volumes.

4.4. Detrending

Due to scanner imperfections and physiological noise, there are drifts and trends in the fMRI data that have to be removed prior to the statistical analysis. This is called detrending, for which different approaches exist; see for example the work by Friman et al. [54]. The detrending is normally done for each voxel time series separately, and is thereby well suited for the GPU. In our case polynomial detrending has been implemented, such that the best linear fit between the time series and a polynomial of a certain degree is removed. Basically any kind of detrending that works independently for each voxel time series is suitable for the GPU. As the best linear fit between a time series and a number of temporal basis functions can be found by using linear regression, almost the same code is used for the GLM analysis and for the detrending. Our GPU implementation of the GLM is described in the next section.

5. Statistical Analysis of fMRI Data on the GPU

In this section the GPU implementation of two statistical approaches for fMRI data analysis will be described. The most common approach for fMRI analysis, the general linear model (GLM), and the approach preferred by us, canonical correlation analysis (CCA), have been implemented.

One possible explanation to why there is so little work about fMRI analysis on the GPU is that most of the statistical approaches use different kinds of matrix operations, such as matrix inverses and eigenvalue decompositions, and there has until recently not been any official support for these kinds of operations on the GPU. The Basic Linear Algebra Subprograms (BLAS) library has been implemented by Nvidia and is known as CUBLAS in CUDA. The Linear Algebra Package (LAPACK), which provides higher order matrix operations that are e.g. used by Matlab, is however not yet implemented by Nvidia. A package called CULA [55] supports some parts of LAPACK; one version is freely available. MAGMA [56] is another freely available package with similar functions.

5.1. Parametric Tests

5.1.1. The General Linear Model

The most common approach for statistical analysis of fMRI data is to apply the general linear model independently to each voxel time series [57]. The main idea is to find the weights of a set of temporal basis functions that give the best agreement with the measured signal, and then see if there is a statistically significant difference between the voxel values at rest and activity. The GLM can in matrix form be written as

Y = Xβ + ε, (1)

where Y are the observations, i.e. all the samples in the voxel time series, β are the parameters to optimize, X is the design matrix that is given by the stimulus paradigm and ε are the errors that cannot be explained by the model. By minimizing the squared error ||ε||², it is easy to show that the best parameters are given by

β̂ = (XᵀX)⁻¹ XᵀY. (2)

A useful property of this expression is that the term C = (XᵀX)⁻¹ Xᵀ does not depend on the voxel time series and can thus be precalculated and stored in constant memory. The only thing that needs to be calculated for each voxel in order to get β̂ is thus a product between the constant term C and the current voxel time series Y. If the contrast c is defined as a column vector (e.g. [1 0]ᵀ), the t-test value is given by

t = cᵀβ̂ / √( var(ε̂) cᵀ(XᵀX)⁻¹c ). (3)

A drawback of analyzing each time series independently is that it does not use the spatial information. To increase the signal-to-noise ratio, and to use the spatial information to some extent, it is therefore common to apply an isotropic Gaussian lowpass filter to each fMRI volume prior to the analysis. The problem is, however, that the activity map will be blurred, which can prevent the detection of small activity areas. A better approach is to adaptively combine neighbouring pixels, by using the technique for fMRI analysis of our choice, canonical correlation analysis (CCA). Friman et al. [58] were the first to use CCA for fMRI analysis and the idea has also been used by Nandy and Cordes [59]. Canonical correlation analysis [60] maximizes the correlation between the projections of two multidimensional variables x and y. GLM is a special case of CCA as it only han-
(3) dles one multidimensional variable. The canonical correlation T T −1 var(ˆ)c (X X) c ρ is defined as The shared memory is used for fast calculation of the matrix T product, the error and its variance. Due to the small size of the β Cxyγ ρ = Corr(βT x, γT y) = , (5) shared memory it is not possible to store intermediate results q βT C β γT C γ in it but only data from the voxel time series. The voxel time xx yy series Y are first loaded into the shared memory to calculate where Cxy is the covariance matrix between x and y, Cxx is the the best parameters from the matrix product between C and Y. covariance matrix for x and Cyy is the covariance matrix for y, The parameters are stored in registers in each thread. The voxel β and γ are the weight vectors with unit length that determine time series are then loaded into shared memory a second time, the linear combinations of x and y. To calculate the canonical to calculate the mean of the error term  from the expression correlation, the covariance matrices first need to be estimated. This can be done using the following expressions 1 XN ¯ = (Y(t) − X(t)βˆ), (4) N N 1 X t=1 C = x(t)x(t)T , (6) xx N − 1 where N is the number of time points and X(t) are the val- t=1 ues in the design matrix that correspond to time point t. Once 1 XN the mean has been calculated the voxel time series is loaded C = y(t)y(t)T , (7) yy N − 1 into shared memory a third time to calculate the variance of t=1

 and then it is easy to calculate the t-test value. The term N T T −1 1 X c (X X) c is a scalar that is precalculated and sent to the ker- C = x(t)y(t)T , (8) xy N − 1 nel. It might not sound optimal to load the same voxel time t=1 series into shared memory three times, but there is no space where x denotes the temporal basis functions (and x(t) denotes to, for example, store the error term in the shared memory. If the function values at time point t), y denotes the spatial basis the error term is stored in the global memory, it is necessary to functions (i.e. the filter responses (time courses) for the current write to and read from global memory, while a read is sufficient voxel) and N is the number of time points. This means that if the voxel time series is loaded into shared memory again. The the covariance matrices have to be estimated in each voxel, by shared memory is actually not necessary for the GLM, there is summing over all the time points. Since the summation can be no data sharing between the threads, but it is used for better cod- performed independently in each voxel, this operation is well ing practice and it makes it easier to get coalesced reads from suited for parallelization. Note that C is constant and only has the global memory. xx to be calculated once. To find the temporal and spatial weight The GLM generally needs pre-whitening, to make the errors vectors, β and γ, that give the highest correlation it is possible temporally uncorrelated. The whitening is normally done by to show that the solution is given by two eigenvalue problems, first estimating an auto regressive (AR) model for each voxel such that the weight vectors are the eigenvectors, and the cor- time series and then removing this model from the time series. relation is the square root of the corresponding eigenvalue. 
The As the estimation of AR parameters is done independently for eigenvalue problems can be written as each time series, it suits perfectly for a parallel implementation. Whitening is also applied prior to a random permutation test, C−1/2 C C−1 C C−1/2 a = λ2 a, (9) which is described in the section about non-parametric tests. xx xy yy yx xx −1/2 −1 −1/2 2 Cyy Cyx Cxx Cxy Cyy b = λ b. (10) To get the weight vectors from a and b it is necessary to make 5.1.2. Canonical Correlation Analysis a change of base, according to The main problem with the GLM is that it tests each voxel −1/2 time series separately and does thereby not take advantage of β = Cxx a, (11) 7 −1/2 γ = Cyy b. (12) along the first direction are positively correlated with the It is sufficient to solve one of the problems, as there are direct stimulus paradigm, while the voxels in the other direction are relationships between the weight vectors. The relationships can negatively correlated. This form of brain activity is not very be written as likely to occur in the brain. To guarantee that the resulting filters are plausible, restricted −1 T −1 T β = Cxx Cxy γ = (X X) X Y γ, (13) CCA [62] (RCCA) was used, since it then can be guaranteed that all the filter weights are positive. There are however several γ = C −1 C β. (14) yy yx problems with this approach. The first problem is that RCCA From the first relationship it is clear that the GLM is a special is much more computationally demanding than CCA, as the al- case of CCA. gorithm iterates the weights until they are all positive. As the number of necessary iterations can differ between the voxels, it GPU Implementation Considerations is not well suited for a GPU implementation. Another problem is that the analysis will not be rotation invariant, as the original It is thus necessary to calculate one matrix inverse, a filters (instead of combinations of them) will be favourized by number of matrix products and an eigenvalue decomposition RCCA. 
This problem was solved by Rydell et al. [63] who also for each voxel, i.e. in each GPU thread. The LAPACK routines used RCCA, but added a structure tensor to estimate the best that have been implemented in CULA [55] and MAGMA [56], orientation of the lowpass filter. however, do not support a matrix inversion or an eigenvalue A new solution to guarantee plausible filters and rotation in- decomposition in each thread. The standard approach to matrix variant analysis is therefore proposed. This solution does not inversion is to use Gauss-Jordan elimination. One though need RCCA and it is thereby much better suited for a GPU im- problem is that it is necessary to store several copies of the plementation. The filters, that are applied to the fMRI slices, current matrix, which uses a lot of registers. Another problem are first redesigned, such that they only contain positive coeffi- is that it is an iterative method, meaning that different threads cients. To create the Cartesian non-separable anisotropic low- may need different execution times for the matrix inversion, pass filters, ALP, a small, ILPS, and a large, ILPL, isotropic which is far from optimal in CUDA programming. Fortunately Gaussian lowpass filter is first created, shown in Fig. 2. The there are direct solutions (i.e. not algorithms such as Gauss- ALP, shown in Fig. 3, are then calculated as Jordan elimination) that can be used to invert small matrices. 2 Direct solutions for eigenvalue decomposition of matrices with ALP = ILPL · (1 − ILPS) · cos(φ − φ0) , (15) dimensions up to 3 x 3 exists. For larger matrices it would be o o o possible to, for example, use the Power iteration method which where φ0 was set to 0 , 60 and 120 . is very easy to implement. In our implementation the number Four 2D filters are applied to each fMRI slice in each vol- of dimensions is kept as low as possible, allowing us to apply a ume, the small lowpass filter and the three anisotropic filters. 
direct solution for the eigenvalue decomposition. Note that this makes it possible for CCA to also create filters Two temporal basis functions are used, one sine wave and with different size, and not only filters with different orienta- one cosine wave that are of the same frequency as the stimulus tion. A small lowpass filter is achieved if the weights for all the paradigm. A sine and a cosine wave can be linearly combined anisotropic filters are set to zero and the weight for the small to any phase, in order to compensate for the unknown BOLD lowpass filter is set to one. If the weights for the anisotropic (blood oxygenation level dependent) delay. filters and the small lowpass filter are the same, the result is in- stead a large lowpass filter. If the brain activity is located in a Guaranteeing Plausible Combinations of Spatial Basis single voxel, the small lowpass filter is selected by CCA and Functions small activity areas are thus not lost.

To use neighbouring pixels as spatial basis functions works reasonably well as there are not that many possible spatial basis functions in 2D if a 3 x 3 neighbourhood is used. If neighbouring voxels from the surrounding 3 x 3 x 3 volume are used directly, there will be too many spatial basis functions such that high correlations will be found everywhere in the brain. Friman et al. [61] therefore extended their work to adaptive filtering, such that CCA instead combines the filter Figure 2: The small and the large lowpass filter. For visualization purposes, response from a number of anisotropic lowpass filters. The these filters have a higher resolution than the filters that are actually used to filter responses can linearly be combined to a lowpass filter smooth the fMRI slices. with arbitrary orientation, in order to prevent unnecessary smoothing. All the possible linear combinations of the filter responses are, however, not allowed. For example, filters are excluded that are positive in one direction and negative in another direction. Such a case would imply that the voxels 8 The anisotropic filter weights are then adjusted such that the resulting filter is plausible. This is done by adding the abso- lute value of the most negative anisotropic weight to all the anisotropic filter weights. If the number of negative coefficients in the original resulting filter (without the isotropic lowpass fil- ter) is larger than the number of positive coefficients, the sign of all the anisotropic filter weights are first flipped and then ad- Figure 3: The three anisotropic filters that can be linearly combined to a filter justed. If all the filter coefficients are negative the weights are with arbitrary orientation. Note that these filters are Cartesian non-separable. only flipped. If all the anisotropic filter weights are positive For visualization purposes, these filters have a higher resolution than the filters from the beginning, no adjustment is done. 
that actually are used to smooth the fMRI slices. The weight of the small lowpass filter is always set to 1.2 times the largest anisotropic filter weight, to guarantee that the center pixel has the highest weight. This is important, since there otherwise can be a so-called bleeding effect in the activity map, i.e. that the activity at voxels that are close to an active area is overestimated. This is for example discussed by Cordes et al. [64], instead of using RCCA they add linear constraints to prevent CCA from creating certain voxel combinations. The total filter weight vector is then normalized such that it has unit Figure 4: The figure shows how an implausible filter (left) is adjusted to a plau- length. An example of a filter adjustment is shown in Fig. 4. If sible filter in two steps. First, the anisotropic filter weights are adjusted such the filter weights are adjusted, the canonical correlation has to that they all are positive (middle), and then the small lowpass filter is added be calculated again, using Eq. 5. For each voxel time series, i.e. (right). If a resulting filter has two directions, the direction corresponding to the largest absolute filter coefficients is preserved. in each GPU thread, it is necessary to

• Estimate the covariance matrices Cyy and Cxy, Cyx is the Non-separable 2D Convolution on the GPU transpose of Cxy. • C As three of the four CCA filters are Cartesian non-separable, Calculate the inverse of yy, by applying a direct solution. non-separable convolution has to be used, in contrast to the −1/2 −1 −1/2 • Calculate the matrix product Cxx Cxy Cyy Cyx Cxx in GLM for which separable convolution can be used. Our GPU Eq. 9, which results in a 2 x 2 matrix if two temporal basis implementation of non-separable 2D convolution uses the −1/2 functions are used, Cxx is precalculated since it is the shared memory, such that a 64 x 64 block of pixels is first same for all voxel time series. loaded into it (requiring 16 KB of shared memory), the four filter kernels are stored in the constant memory. The four • Calculate the eigenvector, a, corresponding to the largest filter kernels are then applied, at the same time, to the data eigenvalue of the 2 x 2 matrix and then make a change of in the shared memory and the results are then written to the base, Eq. 11, to get the temporal weight vector β. Normal- global memory. In this case, as well as for the separable ize the temporal weight vector to have unit length. 3D convolution described earlier, the shared memory is very useful, since there is a lot of data sharing between the threads. • Calculate the filter weights γ from the temporal weights Each thread block uses 512 threads and for each block 48 x β, by using Eq. 14. Normalize the filter weight vector to 48 valid filter responses are calculated. Each thread uses 19 have unit length. registers and thereby 3 blocks can run on each multiprocessor, • If necessary, adjust the filter weights to guarantee a plau- resulting in full occupancy. For optimal performance, the sible filter. convolution loops were completely unrolled manually, by generating the code with a Matlab script. • Calculate the canonical correlation once again, using Eq. 5, with the new filter weights. 
The Complete CCA GPU Algorithm

Ordinary CCA (i.e. not RCCA) is first applied to get the temporal weight vector from the 2 x 2 eigenvalue problem given by Eq. 9 (the total matrix product is a 2 x 2 matrix, as two temporal basis functions are used). Instead of solving the second eigenvalue problem to get the filter weights, Eq. 14 is used to calculate the filter weights from the temporal weights. Note that by only solving Eq. 9, the calculation of a matrix square root (i.e. not an element wise square root) is avoided. This could otherwise be problematic to do in each GPU thread.

GPU Implementation Details

The estimation of the covariance matrices requires an outer product for each time point and a summation over all the time points. Similarly to the GLM implementation, the shared memory is used for these calculations. However, it is now necessary to read 4 voxel time series into shared memory for each voxel, since there are 4 filter responses, but it is sufficient to read them from global memory once. It is necessary to store C_yy, C_yy^(-1), C_xy, the total matrix product, the temporal weight vector β and the filter weight vector γ in each thread. C_yx does not need to be stored, since it is the transpose of C_xy. Due to the symmetry of C_yy and its inverse C_yy^(-1), it is sufficient to store 10 values instead of all 16. In total this requires a lot of registers and for this reason the CCA kernel does not achieve full occupancy. The temporal basis functions x, their covariance matrix C_xx and the matrix square root of its inverse, C_xx^(-1/2), are the same for all voxel time series and are therefore precalculated and stored in the constant memory.

5.2. Non-Parametric Tests

One problem with the GLM is that it is necessary to assume that the fMRI data is normally distributed and independent. Another problem in fMRI analysis is the choice of the significance threshold, which determines which voxels are regarded as activated or not activated. Even small deviations from normality can have a large impact on the maximum null distribution of the test statistics [65], which is required to calculate p-values that are corrected for the massive multiple testing in fMRI. Non-parametric tests, such as random permutation tests, do not have these problems, as they do not require an a priori distribution function; instead, empirically estimated distributions are obtained. The major drawback of non-parametric tests is that they are very computationally expensive. One subset of non-parametric tests is that of permutation tests, where the statistical analysis is done for all the possible permutations of the data. This is, however, not feasible if the number of possible permutations is very large. For a voxel time series with 80 samples, there exist 7.16 · 10^118 possible permutations. It is therefore common to instead do random permutation tests, also called Monte Carlo permutation tests, where the statistical analysis is made for a sufficiently large number of random permutations, for example 10 000, of the data.

Brammer et al. [66] were among the first to apply non-parametric tests to fMRI data. If the subject in the MR scanner follows a repeated paradigm, the voxel time series will contain auto correlations, which should first be removed, as first described in [67]. This is further discussed by Friman et al. [68], who also state that non-parametric methods will likely play an increasingly important role for fMRI analysis when more computational power is available. Belmonte et al. [69] describe an optimized algorithm for permutation testing of fMRI data. Nichols and Holmes [38] also applied permutation tests to fMRI data, but only for multi subject testing, since they claim that permutations of the fMRI time series should not be done if they contain auto correlation.

Random permutation tests for CCA are especially needed, as the asymptotic distribution of the canonical correlation coefficients is rather complicated. Restricted CCA has, to our knowledge, no parametric distribution. Permutation testing for CCA has, for example, been done by Yamada et al. [70], but no processing times are stated.

Permutation testing on the GPU has recently been done by Shterev et al. [5], who in one case were able to decrease the processing time from 21 minutes on the CPU to 16 seconds on the GPU. Another example is the work by Hemert and Dickerson [8]. We recently extended our work to random permutation tests of single subject fMRI data on the GPU [7], in order to be able to compare activity maps from GLM and CCA at the same significance level. Before the time series are permuted, an auto regressive (AR) model is estimated for each voxel time series and each time series is then whitened. In each permutation a new time series is then created by applying an inverse whitening transform, with the permuted whitened time series as innovations.

The smoothing of the fMRI volumes has to be applied in each permutation. If the data is smoothed prior to the whitening transform, the estimated AR parameters will change with the amount of smoothing applied, since the temporal correlations are altered by the smoothing. For our implementation of 2D CCA, 4 different smoothing filters are applied. If the smoothing is done prior to the permutations, 4 time series have to be permuted for each voxel and these time series will have different AR parameters. The smoothing will also change the null distribution of each voxel. This is incorrect, since the surrogate null data that is created should always have the same properties, regardless of the amount of smoothing that is used for the analysis. If the data is smoothed after the whitening transform, but before the permutation and the inverse whitening transform, the time series that are given by simulating the AR model are incorrect, since the properties of the noise are altered. The only solution is to apply the smoothing after the permutation and the inverse whitening transform, i.e. in each permutation. This is also more natural in the sense that the surrogate data is first created and then analysed. In order to get significance thresholds and p-values that are corrected for the multiple testing, only the maximum of all the statistical test values in the brain is saved in each permutation. This is done in order to estimate the null distribution of the maximum test statistic.

6. The Different Implementations

In this section the general concepts of the different implementations will be described.

6.1. SPM

The SPM software [22], which is not very optimized in terms of speed, was included in our comparison, as it is widely used for fMRI analysis. Parts of SPM are implemented as mex-files, but the computational performance is limited due to the time spent on reading from and writing to files. It might seem unfair to include this time in the comparison, but this is the time experienced by the user. As mentioned previously, the motion compensation is in our case performed in another way than SPM does it. The resulting execution times for this step can therefore not be compared directly. The other preprocessing steps, and the GLM analysis, are done as in SPM. For all the preprocessing and the statistical analysis the default settings were used. For the motion compensation, the interpolation was changed to trilinear interpolation, the same interpolation used by the other implementations. All the volumes were registered to the first volume.

As SPM supports GLM, but not CCA, it is not possible to give any execution time for CCA. The detrending is done together with the model estimation; it is therefore not possible to state its execution time separately.

6.2. Matlab

Matlab eases the development of an algorithm, as it provides easy programming and there are a lot of functions that can be used directly. One issue, though, is that Matlab is slow at certain operations, such as for-loops, while it is very fast as long as the calculations are vectorized. To speed up some functions that require for-loops, mex-files were therefore used for some of the calculations, as described below.

• The motion compensation; the spatial 3D convolution, the setup of the equation system and the 3D interpolation.

• The 3D smoothing; the 2D smoothing is based on the built-in function conv2.

• The detrending and the GLM, which can be written as matrix operations, such that no for-loops are used. If one however only wants to perform the calculations for the voxels that are inside the brain, one has to use if-statements and then it gets more complicated. Therefore mex-files were used for the detrending as well as for the GLM.

• The CCA algorithm, as it requires an adjustment of the filter weights, which is hard to do without for-loops.

To summarize, this implementation is very optimized for being a Matlab implementation and it can be seen as an optimized version of SPM.

6.3. Matlab OpenMP

To get an optimized CPU implementation to compare our CUDA implementation with, the Open Multi Processing (OpenMP) [71] library was used, as it lets the user take advantage of all the processor cores of the CPU. OpenMP is very easy to use, as can be seen in this example which runs a for-loop in parallel with 4 threads.

omp_set_num_threads(4);
#pragma omp parallel for \
shared(shared variables) \
private(private variables)

Shared variables are the variables that are shared by all the threads, like the pointers to the input and output data, while the private variables are the variables that differ between the threads, like those for the convolution result.

The FFT part of the slice timing correction was done in Matlab, since that implementation is multi threaded, while a mex-file was used to do the complex valued multiplications that perform the phase shift.

It should be noted that OpenMP is quite easy to use with Matlab mex-files under a Linux operating system, in contrast to Windows. The interested reader is referred to the book by Chapman et al. [72] for details about programming with OpenMP.

6.4. Matlab CUDA

The Matlab CUDA implementation uses ordinary CUDA programming, together with the mex interface, such that the functions can still be called from Matlab. One difference between Matlab and CUDA is that Matlab uses column major order (i.e. y first) to store matrices, while CUDA assumes that the data is stored in row major order (i.e. x first). In this paper it is assumed that the data is stored in single precision in time major order before it is sent to the mex-file (since slice timing correction is the first preprocessing step), such that no conversion is necessary. This conversion would otherwise take additional time, but since it is not part of the computational time it is not included in this paper.

Before the GPU can do anything with the data, it is necessary to allocate memory and copy the data from CPU (host) memory to GPU (device) memory. This is done as follows, where DATA_W is the number of columns in the data and DATA_H is the number of rows. The cudaMalloc function allocates memory on the GPU (i.e. very similar to the C function malloc) and the cudaMemcpy function copies the data between the host and the device.

float *h_Image, *d_Image, *d_Result;

cudaMalloc((void **)&d_Image,  DATA_W * DATA_H * sizeof(float));
cudaMalloc((void **)&d_Result, DATA_W * DATA_H * sizeof(float));

cudaMemcpy(d_Image, h_Image, DATA_W * DATA_H * sizeof(float),
           cudaMemcpyHostToDevice);

After the calculations the data is copied back from the GPU to the CPU and the memory is deallocated by using the cudaFree function.

cudaMemcpy(h_Result, d_Result, DATA_W * DATA_H * sizeof(float),
           cudaMemcpyDeviceToHost);

cudaFree(d_Image);
cudaFree(d_Result);

The CUDA programming language can easily generate 2D indices for each thread, for example by using the following code, which results in 512 threads per block. The indices are required for each thread to know which part of the data to operate on.

// Code that is executed before the kernel is launched
int threadsInX = 32;
int threadsInY = 16;
int blocksInX  = DATA_W / threadsInX;
int blocksInY  = DATA_H / threadsInY;
dimGrid  = dim3(blocksInX, blocksInY);
dimBlock = dim3(threadsInX, threadsInY, 1);

// Code that is executed inside the kernel
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;

To generate 3D indices is harder, as each thread block can be three dimensional but the grid can only be two dimensional. One approach to generate 3D indices is given below,

where DATA_D is the number of slices in the data.

// Code that is executed before the kernel is launched
int threadsInX = 32;
int threadsInY = 16;
int threadsInZ = 1;
int blocksInX = (DATA_W + threadsInX - 1) / threadsInX;
int blocksInY = (DATA_H + threadsInY - 1) / threadsInY;
int blocksInZ = (DATA_D + threadsInZ - 1) / threadsInZ;
dim3 dimGrid  = dim3(blocksInX, blocksInY * blocksInZ);
dim3 dimBlock = dim3(threadsInX, threadsInY, threadsInZ);

// Code that is executed inside the kernel,
// invBlocksInY is precalculated as 1.0f / (float)blocksInY
int blockIdxz = __float2uint_rd(blockIdx.y * invBlocksInY);
int blockIdxy = blockIdx.y - blockIdxz * blocksInY;
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdxy  * blockDim.y + threadIdx.y;
int z = blockIdxz  * blockDim.z + threadIdx.z;

As fMRI data is 4D, it is normally necessary to generate 4D indices, and this is even more difficult. To solve this, a 3D index (x,y,z) is first created and then each thread loops over the time dimension inside the kernel. Another possibility is to call the kernel once for each sample of the remaining dimension.

To do an element wise multiplication between two volumes, A = B · C, with floating point values, the following code can be used. Each GPU thread performs the calculation for one voxel. The (x,y,z) indices are created as previously described.

// The code for the kernel that runs on the GPU,
// each thread performs one multiplication
__global__ void Multiply(float* A, float* B, float* C,
                         int DATA_W, int DATA_H, int DATA_D)
{
    // x, y and z are created as previously described

    // Do not read/write outside the allocated memory
    if (x >= DATA_W || y >= DATA_H || z >= DATA_D)
        return;

    // Calculate the linear index, row major order
    int idx = x + y * DATA_W + z * DATA_W * DATA_H;

    // Each thread performs one multiplication, A = B * C
    A[idx] = B[idx] * C[idx];
}

// The GPU kernel is launched from the CPU by the following command
Multiply<<<dimGrid, dimBlock>>>(A, B, C, DATA_W, DATA_H, DATA_D);

7. Results

7.1. Rotation Invariant Analysis

First, it is shown that our new approach to guarantee plausible filters results in a rotation invariant analysis. The dominant orientation of the resulting filter in each voxel was calculated for our test dataset, in which each direction is equally likely. A tensor T is first calculated according to

T = Σ_{k=1}^{3} γ_k n̂_k n̂_k^T,   (16)

where n̂_k is the orientation vector for anisotropic filter k and γ_k is the filter weight. The eigenvector that corresponds to the largest eigenvalue of the tensor is then calculated; the dominant orientation of the filter is the angle of this eigenvector. The resulting distribution of the orientation is given in Fig. 5. The orientations are approximately uniformly distributed, meaning that the analysis is rotation invariant.

Figure 5: The figure shows the distribution of the orientation of the resulting anisotropic lowpass filter. The orientations are approximately uniformly distributed, which means that the analysis is rotation invariant.

7.2. Hardware

All of our implementations use the single precision format, i.e. 32 bit floating point numbers, as it provides sufficient accuracy. Our computer system, running under Linux Fedora 12, was equipped with three Nvidia GTX 480 GPUs, each with 480 processor cores and 1.5 GB of memory. The CPU of our system was an Intel Xeon 2.4 GHz with 12 MB of L3 cache and 4 processor cores; 12 GB of memory was used. The CPU supports hyper threading, such that 8 threads can run in parallel. Running 8 threads, instead of 4, did however not improve the performance; it rather degraded it.

7.3. Processing Times

The resulting activity maps from all the implementations were inspected in order to guarantee agreement of the results. For each processing step, and for each implementation, 1000 runs were performed and the average processing time was calculated. The processing times for all the processing steps are given in Table 3; only one GPU was used for the CUDA implementation. To save additional time, the detrending and the statistical analysis are only performed for the voxels that are inside the brain. A simple thresholding technique was used for the segmentation; about 20 000 of the 90 000 voxels were classified as brain voxels.

For the motion compensation step, the three quadrature filters have to be applied in each iteration. This can either be done as a spatial convolution or as a multiplication in the frequency domain (FFT based convolution). The processing times for both spatial and FFT based convolution were therefore included. The FFT routine in Matlab is multi-threaded and very fast, even for dimensions that are not a power of two. The FFT in the CUFFT library seems well optimized for dimensions that are a power of 2, but not necessarily for other dimensions.

The processing times for different numbers of permutations are given in Table 4 for the GLM and in Table 5 for CCA. As our computer contains three GPUs, a multi-GPU implementation was also made, such that each GPU performs one third of the permutations.

The conversion of the data from double precision column major format to single precision time major format takes about 0.4 s (this is not necessary if the data is stored in the right way from the beginning). To copy the fMRI data to the GPU takes about 20 ms and then it takes about 0.2 ms to copy the activity map back to the CPU. After the slice timing correction, the data is flipped from time major format to row major format; this takes about 2 ms.

7.4. GLM vs CCA

With the random permutation test it is possible to calculate corrected p-values for fMRI analysis by CCA, and thereby activity maps from GLM and CCA can finally be compared at the same significance level. To do this, a random permutation test with 10 000 permutations was used and the activity maps were then thresholded at the same significance level, corrected p = 0.05. With 8 mm of 2D smoothing applied to our test dataset, GLM detects 302 significantly active voxels while CCA detects 344 significantly active voxels. The aim of this small comparison is not to prove that CCA has a superior detection performance, but to show that objective evaluation of different methods for single subject fMRI analysis becomes practically possible by using fast random permutation tests.

8. Discussion

8.1. Processing Times

As can be seen in Table 3, the CUDA implementation achieves its largest speedup for the non-separable 2D smoothing; this is due to the fact that this kernel is not bounded by the global memory bandwidth as much as the other kernels. Once the data has been loaded into the shared memory, all the processor cores can access the data extremely fast and thereby achieve close to optimal processing performance. The difference compared to the other kernels that also use shared memory is that the convolution requires a much larger number of calculations. There is also a larger speedup for CCA than for the GLM, as CCA requires more calculations and does not need to read the data from global memory several times.

Table 4 and Table 5 clearly show that the GPU makes it practically possible to apply random permutation tests to single subject fMRI data. This enables the use of more advanced detection statistics, like RCCA, which do not have any parametric null distribution.

While it is obvious that GPUs should be used for permutation tests, it might not be as obvious that the GPU should be used for the preprocessing as well. For the GLM the preprocessing takes 15.5 s with Matlab, 5 s with OpenMP and 0.4 s with the CUDA implementation. If we look a couple of years into the future, we believe that fMRI datasets with 1 mm isotropic resolution will be rather common. As mentioned earlier, fMRI data with 0.65 mm isotropic resolution has already been presented by Heidemann et al. [26]. Using a field of view of 256 mm would result in volumes of the resolution 256 x 256 x 128 voxels to cover the brain. If compressed sensing or parallel imaging is used at the same time, the temporal resolution might be increased to 2 Hz, compared to the more common 0.5 Hz. An fMRI experiment that consists of four periods of rest and activity, where each period is 40 seconds, would then result in 320 volumes of the resolution 256 x 256 x 128 voxels to process. For fMRI datasets of this size, an efficient implementation becomes much more important in order to process the data. If the processing time scales linearly with the size of the data, the Matlab implementation would require 96 minutes to preprocess the data, OpenMP 31 minutes and the CUDA implementation 1.7 minutes (FFT based convolution used in all three cases).
provides significant speedups compared to SPM, Matlab and OpenMP, both for the preprocessing steps and for the statistical 8.2. Implementing GLM & CCA analysis. As the GLM does not require any higher order matrix func- In general, the motion compensation with FFT based convo- tions, like CCA does, the implementation efforts were low, re- lution runs faster than the motion compensation based on spa- quiring only the programming of a matrix multiplication and a tial convolution, as shown in Table 3. The computation times variance calculation. for the CUDA implementation are much shorter compared to To be able to do CCA the filters first had to be redesigned, the other implementations, but the speed advance of the FFT such that RCCA is not needed. The implementation of CCA is approach turns into the opposite. However, if all the dimen- more demanding as it requires a matrix inverse and an eigen- sions of the volumes are a power of 2, motion compensation value decomposition for each voxel time series - fortunately with FFT based convolution is about 40% faster than with spa- there are direct solutions for small matrices. Our current im- tial convolution. The performance of the 3D FFT in the CUFFT plementation only works for adaptive 2D filtering. For adaptive library has been investigated and enhanced by Nukada et al. [3]. 3D filtering the same approach to guarantee plausible filters can The reason for the slower speed of the spatial convolution ap- be used, but a total of seven 3D filters must be used and thereby proach for the Matlab implementation is that the 3D convolu- a 7 x 7 matrix must be inverted in each thread, which requires tion is called 6 times, since it only can handle one real valued a lot more registers. fMRI analysis by CCA is a good example filter at a time, while in the OpenMP implementation all three for the situation that an existing algorithm might have to be al- complex valued filter responses are calculated simultaneously. 
tered in order to fit the architecture of the GPU. An alternative 13 Table 3: Processing times of the different processing steps for the different implementations. The size of the fMRI dataset is 80 volumes with the resolution 64 x 64 x 22 voxels.

Processing step                                                       | SPM  | Matlab  | Matlab OpenMP | Matlab CUDA
Slice timing correction                                               | 32 s | 280 ms  | 235 ms        | 7.8 ms
Motion compensation, spatial convolution                              | 28 s | 126 s   | 17.5 s        | 415 ms
Motion compensation, FFT convolution                                  | -    | 13.7 s  | 4.6 s         | 650 ms
Smoothing, one separable 3D filter of size 9 x 9 x 9 voxels for GLM   | 32 s | 1.5 s   | 195 ms        | 10.4 ms
Smoothing, four non-separable 2D filters of size 9 x 9 pixels for CCA | -    | 5.9 s   | 850 ms        | 9.9 ms
Detrending, for GLM                                                   | -    | 8.6 ms  | 4.6 ms        | 0.37 ms
Detrending, for CCA                                                   | -    | 33.6 ms | 17.1 ms       | 1.44 ms
GLM                                                                   | 33 s | 16.6 ms | 5.8 ms        | 0.38 ms
CCA                                                                   | -    | 31.5 ms | 15.2 ms       | 0.47 ms

Total time for GLM | 125 s | 15.51 s | 5.04 s | 0.43 s
Total time for CCA | -     | 19.93 s | 5.72 s | 0.43 s
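The per-voxel eigendecomposition used for the orientation analysis in Section 7.1, and the "direct solutions for small matrices" mentioned for CCA in Section 8.2, can be illustrated with the closed-form solution for a symmetric 2 x 2 tensor. This is a plain C++ sketch rather than a CUDA kernel, and the function name is ours, not from the paper:

```cpp
#include <cmath>

// Direct (closed-form) eigendecomposition of a symmetric 2 x 2 tensor
//   T = [a b]
//       [b c]
// Returns the angle (in radians) of the eigenvector belonging to the
// largest eigenvalue, i.e. the dominant orientation.
double dominantOrientation(double a, double b, double c)
{
    // Largest eigenvalue in closed form
    double lambdaMax = 0.5 * ((a + c) + std::sqrt((a - c) * (a - c) + 4.0 * b * b));

    // An eigenvector for lambdaMax is (b, lambdaMax - a) when b != 0
    if (b != 0.0)
        return std::atan2(lambdaMax - a, b);

    // Diagonal tensor: the coordinate axes themselves are the eigenvectors
    return (a >= c) ? 0.0 : std::acos(-1.0) / 2.0;
}
```

For T = n̂ n̂^T with n̂ = (1, 1)/√2, i.e. a = b = c = 0.5, the returned angle is π/4 (45 degrees), as expected.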

Table 4: Processing times for random permutation tests with the GLM for the different implementations. Note that smoothing of the fMRI volumes is done in each permutation.

Number of permutations with GLM | Matlab            | Matlab OpenMP | Matlab CUDA, 1 x Nvidia GTX 480 | Matlab CUDA, 3 x Nvidia GTX 480
1000                            | 25 min            | 3.5 min       | 13.5 s                          | 4.5 s
10 000                          | 4 h 10 min        | 35 min        | 2 min 15 s                      | 45 s
100 000                         | 1 day 17 h 40 min | 5 h 50 min    | 22.5 min                        | 7.5 min
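The procedure timed above can be sketched as follows. This is a minimal, single-threaded CPU illustration of a maximum-statistic permutation test in C++, using a simple correlation statistic and omitting the smoothing, detrending and resampling details of the paper; the function names are ours and this is not the authors' GPU implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Correlation between a voxel time series and the stimulus paradigm,
// used here as a simple stand-in for the GLM t-statistic
double correlation(const std::vector<double>& x, const std::vector<double>& y)
{
    const double n = static_cast<double>(x.size());
    const double mx = std::accumulate(x.begin(), x.end(), 0.0) / n;
    const double my = std::accumulate(y.begin(), y.end(), 0.0) / n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t t = 0; t < x.size(); ++t) {
        sxy += (x[t] - mx) * (y[t] - my);
        sxx += (x[t] - mx) * (x[t] - mx);
        syy += (y[t] - my) * (y[t] - my);
    }
    return sxy / std::sqrt(sxx * syy);
}

// Corrected p-value for the maximum statistic over all voxels.
// The null distribution is estimated by permuting the time points,
// recomputing the maximum statistic in each permutation, and counting
// how often it reaches the observed maximum.
double correctedPValue(const std::vector<std::vector<double>>& voxels,
                       const std::vector<double>& paradigm,
                       int numPermutations, unsigned seed = 0)
{
    double observedMax = -1.0;
    for (const auto& v : voxels)
        observedMax = std::max(observedMax, correlation(v, paradigm));

    std::mt19937 rng(seed);
    std::vector<std::size_t> order(paradigm.size());
    std::iota(order.begin(), order.end(), 0);

    int exceed = 0;
    for (int p = 0; p < numPermutations; ++p) {
        std::shuffle(order.begin(), order.end(), rng); // permute the time axis
        double permMax = -1.0;
        for (const auto& v : voxels) {
            std::vector<double> perm(v.size());
            for (std::size_t t = 0; t < v.size(); ++t)
                perm[t] = v[order[t]];
            permMax = std::max(permMax, correlation(perm, paradigm));
        }
        if (permMax >= observedMax)
            ++exceed;
    }
    return static_cast<double>(exceed) / numPermutations;
}
```

Since every permutation processes all voxels independently, the loop over permutations parallelizes trivially, which is what makes the single-GPU and three-GPU columns above so much faster than the CPU implementations.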

An alternative approach to guarantee plausible filters could be to include linear constraints, as proposed by Cordes et al. [64]. In our future work we want to investigate how well other statistical approaches suit a GPU implementation.

8.3. Training of Classifiers

A growing field in fMRI is the development of brain computer interfaces (BCIs), where the brain activity is classified in real-time, allowing control of various systems. In order to classify the brain activity, a classifier first has to be trained. Catanzaro et al. [4] have implemented support vector machine (SVM) training and classification on the GPU and achieved speedups of 9-35 for training and 81-138 for classification. Classifiers are not only used to classify brain activity in real-time, but also to improve conventional fMRI analysis by incorporating spatial information in adaptive ways. Åberg et al. [73] have shown that fMRI analysis by finding the most important voxels for an SVM performs better than the GLM. To find the most important voxels for the classifier, an evolutionary algorithm is used in which different voxel clusters are used to train and evaluate the classifier. According to the paper this takes about 20 minutes, but it is also stated that evolutionary algorithms are fairly trivial to run in parallel, and thereby they would be well suited for the GPU.

9. Conclusions

To summarize, doing the fMRI analysis on the GPU opens up a lot of possibilities. All the preprocessing steps and both the statistical approaches that we used suit the GPU well, and significant speedups were obtained. The obvious advantage is that this saves time for researchers and clinicians, but it also makes it easier to experiment with the different parameters, for example to see the effects of different amounts of smoothing. One of the most important results is that the GPU makes it practically possible to apply non-parametric statistics. We believe that the GPU will result in a more common usage of permutation tests for fMRI data. One of the challenges is to make GPU programming easy, such that as many as possible can take advantage of the computational power of GPUs. The main challenge, as we see it, is to make fMRI analysis on the GPU available to the large community that does fMRI analysis, i.e. both researchers and clinicians.

10. Software

The authors are currently working on a software package that will be freely available. The name of the package is WABAACUDA (Wanderines Accelerated Brain Activity Analyzer using CUDA). The software package will be available under the GNU general public license (GPL) at

http://www.wanderineconsulting.com/wabaacuda

Acknowledgement

This work was supported by the strategic research center MOVIII, funded by the Swedish foundation for strategic research (SSF), and the Linnaeus center CADICS, funded by the Swedish research council. The authors would like to thank the NovaMedTech project at Linköping university for financial support of our GPU hardware and Johan Wiklund for support with the CUDA installations.

References

[1] S. Ogawa, D. Tank, R. Menon, J. Ellermann, S. Kim, H. Merkle, K. Ugurbil, Intrinsic signal changes accompanying sensory stimulation: Functional brain mapping with magnetic resonance imaging, Proc. Natl. Acad. Sci. USA 89 (1992) 5951-5955.

Table 5: Processing times for random permutation tests with 2D CCA for the different implementations. Note that smoothing of the fMRI volumes is done in each permutation.

Number of permutations with 2D CCA | Matlab      | Matlab OpenMP | Matlab CUDA, 1 x Nvidia GTX 480 | Matlab CUDA, 3 x Nvidia GTX 480
1000                               | 1 h 40 min  | 14 min 50 s   | 15 s                            | 5 s
10 000                             | 16 h 37 min | 2 h 28 min    | 2 min 30 s                      | 50 s
100 000                            | 6 days 22 h | 24 h 43 min   | 25 min                          | 8 min 20 s

[2] P. Basser, J. Mattiello, D. LeBihan, MR diffusion tensor spectroscopy and imaging, Biophysical Journal 66 (1994) 259-267.
[3] A. Nukada, Y. Ogata, T. Endo, S. Matsuoka, Bandwidth intensive 3-D FFT kernel for GPUs using CUDA, in: International Conference for High Performance Computing, Networking, Storage and Analysis, 2008, pp. 1-11.
[4] B. Catanzaro, N. Sundaram, K. Keutzer, Fast support vector machine training and classification on graphics processors, in: Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 104-111.
[5] I. Shterev, S.-H. Jung, S. George, K. Owzar, permGPU: Using graphics processing units in RNA microarray association studies, BMC Bioinformatics 11 (2010) 329.
[6] A. Eklund, O. Friman, M. Andersson, H. Knutsson, A GPU accelerated interactive interface for exploratory functional connectivity analysis of fMRI data, in: IEEE International Conference on Image Processing (ICIP), 2011.
[7] A. Eklund, M. Andersson, H. Knutsson, Fast random permutation tests enable objective evaluation of methods for single subject fMRI analysis, International Journal of Biomedical Imaging, Article ID 627947 (2011).
[8] J. L. Van Hemert, J. A. Dickerson, Monte Carlo randomization tests for large-scale abundance datasets on the GPU, Computer Methods and Programs in Biomedicine 101 (2011) 80-86.
[9] R. Shams, P. Sadeghi, R. A. Kennedy, R. I. Hartley, A survey of medical image registration on multicore and the GPU, IEEE Signal Processing Magazine 27 (2010) 50-60.
[10] A. R. Ferreira da Silva, A Bayesian multilevel model for fMRI data analysis, Computer Methods and Programs in Biomedicine 102 (2010) 238-252.
[11] A. Eklund, M. Andersson, H. Knutsson, Phase based volume registration using CUDA, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2010, pp. 658-661.
[12] T. McGraw, M. Nadar, Stochastic DT-MRI connectivity mapping on the GPU, IEEE Transactions on Visualization and Computer Graphics 13 (2007) 1504-1511.
[13] S. Stone, J. Haldar, S. Tsao, W. Hwu, B. Sutton, Z. Liang, Accelerating advanced MRI reconstructions on GPUs, Journal of Parallel and Distributed Computing 68 (2008) 1307-1318.
[14] D. Gembris, M. Neeb, M. Gipp, A. Kugel, R. Männer, Correlation analysis on GPU systems using NVIDIA's CUDA, Journal of Real-Time Image Processing (2010) 1-6.
[15] F. Rößler, E. Tejada, T. Fangmeier, T. Ertl, M. Knauff, GPU-based multi-volume rendering for the visualization of functional brain images, Proceedings of SimVis (2006) 305-318.
[16] W. M. Jainek, S. Born, D. Bartz, W. Straßer, J. Fischer, Illustrative hybrid visualization and exploration of anatomical and functional brain data, Computer Graphics Forum 27 (2008) 855-862.
[17] T. K. Nguyen, A. Eklund, H. Ohlsson, F. Hernell, P. Ljung, C. Forsell, M. Andersson, H. Knutsson, A. Ynnerman, Concurrent volume visualization of real-time fMRI, in: Proceedings of the 8th IEEE/EG International Symposium on Volume Graphics, 2010, pp. 53-60.
[18] B. Biswal, F. Yetkin, V. Haughton, J. Hyde, Functional connectivity in the motor cortex of resting state human brain using echo-planar MRI, Magnetic Resonance in Medicine 34 (1995) 537-541.
[19] W. Liu, P. Zhu, J. Anderson, D. Yurgelun-Todd, P. Fletcher, Spatial regularization of functional connectivity using high-dimensional Markov random fields, Proceedings of the 13th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), Lecture Notes in Computer Science 6362 (2010) 363-370.
[20] A. R. Ferreira da Silva, cudaBayesreg: Bayesian computation in CUDA, The R Journal 2/2 (2010) 48-55.
[21] Jacket - The GPU Engine for Matlab, 2010. http://www.accelereyes.com/.
[22] Statistical Parametric Mapping (SPM) software for fMRI analysis, 2010. http://www.fil.ion.ucl.ac.uk/spm/.
[23] M. Lustig, D. Donoho, J. Pauly, Sparse MRI: The application of compressed sensing for rapid MR imaging, Magnetic Resonance in Medicine 58 (2007) 1182-1195.
[24] X. Golay, J. A. de Zwart, Y.-C. L. Ho, Y.-Y. Sitoh, Parallel imaging techniques in functional MRI, Topics in Magnetic Resonance Imaging 15 (2004) 255-265.
[25] B. Zahneisen, T. Grotz, K. Lee, S. Ohlendorf, M. Reisert, M. Zaitsev, J. Hennig, Three-dimensional MR-encephalography: Fast volumetric brain imaging using rosette trajectories, Magnetic Resonance in Medicine 65 (2011) 1260-1268.
[26] R. M. Heidemann, D. Ivanov, R. Trampel, J. Lepsien, F. Fasano, J. Pfeuffer, R. Turner, Isotropic sub-millimeter fMRI in humans at 7T, in: Proceedings of the annual meeting of the International Society for Magnetic Resonance in Medicine (ISMRM), 2010, p. 1083.
[27] R. C. deCharms, Applications of real-time fMRI, Nature Reviews Neuroscience 9 (2008) 720-729.
[28] R. W. Cox, A. Jesmanowicz, J. S. Hyde, Real-time functional magnetic resonance imaging, Magnetic Resonance in Medicine 33 (1995) 230-236.
[29] D. Gembris, J. G. Taylor, S. Schor, W. Frings, D. Suter, S. Posse, Functional magnetic resonance imaging in real time (FIRE): Sliding-window correlation analysis and reference-vector optimization, Magnetic Resonance in Medicine 43 (2000) 259-268.
[30] N. Goddard, G. Hood, J. Cohen, W. Eddy, C. Genovese, D. Noll, L. Nystrom, Online analysis of functional MRI datasets on parallel platforms, Journal of Supercomputing 11 (1997) 295-318.
[31] E. Bagarinao, K. Matsuo, T. Nakai, Real-time functional MRI using a PC cluster, Concepts in Magnetic Resonance 19B (2003) 14-25.
[32] A. Eklund, H. Ohlsson, M. Andersson, J. Rydell, A. Ynnerman, H. Knutsson, Using real-time fMRI to control a dynamical system by brain activity classification, Proceedings of the 12th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), Lecture Notes in Computer Science 5761 (2009) 1000-1008.
[33] A. Eklund, M. Andersson, H. Ohlsson, A. Ynnerman, H. Knutsson, A brain computer interface for communication using real-time fMRI, in: Proceedings of the International Conference on Pattern Recognition (ICPR), 2010, pp. 3665-3669.
[34] S. M. LaConte, S. J. Peltier, X. P. Hu, Real-time fMRI using brain-state classification, Human Brain Mapping 28 (2007) 1033-1044.
[35] R. C. deCharms, F. Maeda, G. H. Glover, D. Ludlow, J. M. Pauly, D. Soneji, J. D. Gabrieli, S. C. Mackey, Control over brain activation and pain learned by using real-time functional MRI, PNAS 102 (2005) 18626-18631.
[36] R. Cusack, A. M. Owen, Distinct networks of connectivity for parietal but not frontal regions identified with a novel alternative to the "resting state" method, in: Fifteenth Annual Meeting of the Cognitive Neuroscience Society, 2008.
[37] M. W. Woolrich, M. Jenkinson, M. Brady, S. M. Smith, Fully Bayesian spatio-temporal modeling of fMRI data, IEEE Transactions on Medical Imaging 23 (2004) 213-231.
[38] T. E. Nichols, A. P. Holmes, Nonparametric permutation tests for functional neuroimaging: A primer with examples, Human Brain Mapping 15 (2001) 1-25.
[39] T. Stef-Praun, B. Clifford, I. Foster, U. Hasson, M. Hategan, S. Small, M. Wilde, Y. Zhao, Accelerating medical research using the Swift workflow system, Studies in Health Technology and Informatics 126 (2007) 207-216.
[40] FASTRA II, 2010. http://fastra2.ua.ac.be/.
[41] The Khronos Group & OpenCL, 2010. http://www.khronos.org/opencl/.
[42] J. Kong, M. Dimitrov, Y. Yang, J. Liynage, L. Cao, J. Staples, M. Mantor, H. Zhou, Accelerating Matlab image processing toolbox functions on GPUs, in: Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, 2010, pp. 75-85.
[43] Nvidia, 2010. CUDA Programming Guide, Version 3.0.
[44] D. Kirk, W. Hwu, Programming Massively Parallel Processors, A Hands-on Approach, Morgan Kaufmann, 2010. ISBN 978-0-12-381472-2.
[45] S. Strother, Evaluating fMRI preprocessing pipelines, IEEE Engineering in Medicine and Biology Magazine 25 (2006) 27-41.
[46] R. Henson, C. Buchel, O. Josephs, K. Friston, The slice-timing problem in event-related fMRI, NeuroImage 9 (1999) 125.
[47] R. W. Cox, A. Jesmanowicz, Real-time 3D image registration for functional MRI, Magnetic Resonance in Medicine 42 (1999) 1014-1018.
[48] R. Cox, AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages, Computers and Biomedical Research 29 (1996) 162-173.
[49] T. Oakes, T. Johnstone, K. Ores Walsh, L. Greischar, A. Alexander, A. Fox, R. Davidson, Comparison of fMRI motion correction software tools, NeuroImage 28 (2005) 529-543.
[50] P. Viola, W. Wells, Alignment by maximization of mutual information, International Journal of Computer Vision 24 (1997) 137-154.
[51] G. Granlund, H. Knutsson, Signal Processing for Computer Vision, Kluwer Academic Publishers, 1995. ISBN 0-7923-9530-1.
[52] M. Hemmendorff, M. Andersson, T. Kronander, H. Knutsson, Phase-based multidimensional volume registration, IEEE Transactions on Medical Imaging 21 (2002) 1536-1543.
[53] M. Mellor, M. Brady, Phase mutual information as similarity measure for registration, Medical Image Analysis 9 (2005) 330-343.
[54] O. Friman, M. Borga, P. Lundberg, H. Knutsson, Detection and detrending in fMRI data analysis, NeuroImage 22 (2004) 645-655.
[55] CULA - GPU-accelerated LAPACK, 2010. http://www.culatools.com/.
[56] MAGMA - Matrix Algebra on GPU and Multicore Architectures, 2010. http://icl.cs.utk.edu/magma/.
[57] K. Friston, A. Holmes, K. Worsley, J. Poline, C. Frith, R. Frackowiak, Statistical parametric maps in functional imaging: A general linear approach, Human Brain Mapping 2 (1995) 189-210.
[58] O. Friman, J. Carlsson, P. Lundberg, M. Borga, H. Knutsson, Detection of neural activity in functional MRI using canonical correlation analysis, Magnetic Resonance in Medicine 45 (2001) 323-330.
[59] R. Nandy, D. Cordes, A novel nonparametric approach to canonical correlation analysis with applications to low CNR functional MRI data, Magnetic Resonance in Medicine 49 (2003) 1152-1162.
[60] H. Hotelling, Relation between two sets of variates, Biometrika 28 (1936) 322-377.
[61] O. Friman, M. Borga, P. Lundberg, H. Knutsson, Adaptive analysis of fMRI data, NeuroImage 19 (2003) 837-845.
[62] S. Das, P. Sen, Restricted canonical correlations, Linear Algebra and its Applications 210 (1994) 29-47.
[63] J. Rydell, H. Knutsson, M. Borga, On rotational invariance in adaptive spatial filtering of fMRI data, NeuroImage 30 (2006) 144-150.
[64] D. Cordes, R. Nandy, M. Jin, Constrained CCA with different novel linear constraints and a nonlinear constraint in fMRI, in: Proceedings of the annual meeting of the International Society for Magnetic Resonance in Medicine (ISMRM), 2010, p. 1151.
[65] R. Viviani, P. Beschoner, K. Ehrhard, B. Schmitz, J. Thöne, Non-normality and transformations of random fields, with an application to voxel-based morphometry, NeuroImage 35 (2007) 121-130.
[66] M. J. Brammer, E. T. Bullmore, A. Simmons, S. C. R. Williams, P. M. Grasby, R. J. Howard, P. R. Woodruff, S. Rabe-Hesketh, Generic brain activation mapping in functional magnetic resonance imaging: A nonparametric approach, Magnetic Resonance Imaging 15 (1997) 763-770.
[67] J. J. Locascio, P. J. Jennings, C. I. Moore, S. Corkin, Time series analysis in the time domain and resampling methods for studies of functional magnetic resonance brain imaging, Human Brain Mapping 5 (1997) 168-193.
[68] O. Friman, C.-F. Westin, Resampling fMRI time series, NeuroImage 25 (2005) 859-867.
[69] M. Belmonte, D. Yurgelun-Todd, Permutation testing made practical for functional magnetic resonance image analysis, IEEE Transactions on Medical Imaging 20 (2001) 243-248.
[70] T. Yamada, T. Sugiyama, On the permutation test in canonical correlation analysis, Computational Statistics & Data Analysis 50 (2006) 2111-2123.
[71] OpenMP - The OpenMP API Specification for Parallel Programming, 2010. http://www.openmp.org/.
[72] B. Chapman, G. Jost, R. van der Pas, Using OpenMP, Portable Shared Memory Parallel Programming, MIT Press, 2007. ISBN 978-0-262-53302-7.
[73] M. B. Åberg, J. Wessberg, An evolutionary approach to the identification of informative voxel clusters for brain state discrimination, IEEE Journal of Selected Topics in Signal Processing 2 (2008) 919-928.
