Density-Based Clustering of Static and Dynamic Functional MRI Connectivity
Rangaprakash et al. Brain Inf. (2020) 7:19
https://doi.org/10.1186/s40708-020-00120-2
Brain Informatics. RESEARCH. Open Access

Density-based clustering of static and dynamic functional MRI connectivity features obtained from subjects with cognitive impairment

D. Rangaprakash1,2,3, Toluwanimi Odemuyiwa4, D. Narayana Dutt5, Gopikrishna Deshpande6,7,8,9,10,11,12,13* and Alzheimer’s Disease Neuroimaging Initiative

Abstract
Various machine-learning classification techniques have been employed previously to classify brain states in healthy and disease populations using functional magnetic resonance imaging (fMRI). These methods generally use supervised classifiers that are sensitive to outliers and require labeling of training data to generate a predictive model. Density-based clustering, which overcomes these issues, is a popular unsupervised learning approach whose utility for high-dimensional neuroimaging data has not been previously evaluated. Its advantages include insensitivity to outliers and the ability to work with unlabeled data. Unlike the popular k-means clustering, the number of clusters need not be specified. In this study, we compare the performance of two popular density-based clustering methods, DBSCAN and OPTICS, in accurately identifying individuals with three stages of cognitive impairment, including Alzheimer’s disease. We used static and dynamic functional connectivity features for clustering, which capture the strength and temporal variation of brain connectivity, respectively. To assess the robustness of clustering to noise/outliers, we propose a novel method called recursive-clustering using additive-noise (R-CLAN). Results demonstrated that both clustering algorithms were effective, although OPTICS with dynamic connectivity features performed best in terms of cluster purity (95.46%) and robustness to noise/outliers.
This study demonstrates that density-based clustering can accurately and robustly identify diagnostic classes in an unsupervised way using brain connectivity.

Keywords: Functional MRI, Brain networks and dynamic connectivity, Cognitive impairment and Alzheimer’s disease, Unsupervised learning and clustering, DBSCAN, OPTICS

*Correspondence: [email protected]
6 AU MRI Research Center, Department of Electrical and Computer Engineering, Auburn University, 560 Devall Dr, Suite 266D, Auburn, AL 36849, USA
Full list of author information is available at the end of the article

© The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

1 Introduction
Since the successful emergence of functional neuroimaging, a new barrier has surfaced: can a strong correlation be established between brain activity (as measured by functional magnetic resonance imaging [fMRI]) and the cognitive state of an individual? More specifically, can we accurately classify neurological diseases based on fMRI data? In response to this, machine learning classifiers have been employed on neuroimaging features to generate models that, within some accuracy, predict the cognitive and disease states to which new data belong [1–11].

There are two major categories of machine learning classification techniques: supervised and unsupervised. Supervised learning, commonly used in fMRI studies, involves splitting the dataset into training and test data. Each member of the training data is given a ‘label’ as to which class (or group) it belongs to. The classifier is then ‘trained’ on this data to determine a generalized model (thus ‘supervised’ learning). Then, the classification accuracy is measured by testing the model on the test data with known labels. Conversely, in unsupervised learning, patterns within the entire dataset are used to ‘cluster’ the data without any pre-assigned labels, and cluster purity, rather than accuracy, is measured post hoc against the known ground truth. As such, unsupervised learning is agnostic to pre-assigned labels, and thus determines inherent classes instead of fitting a model based on classes provided by us. Although the ‘cluster purity’ given by clustering is, in principle, the same as the ‘classification accuracy’ given by supervised classifiers (both give the percentage of correct classifications against the known ground truth), we use the term ‘cluster purity’ in this work to highlight that this metric was obtained through unsupervised clustering, and not through conventional supervised classification.

fMRI studies generally use supervised learning methods to classify disease or cognitive states [1–3, 12, 13]. Unsupervised learning methods are generally used for spatially and temporally clustering voxel signals to localize brain activity [10, 11, 14–20]; for dimensionality reduction or feature selection before applying a supervised learning algorithm to the data [13, 21–24]; or for mapping specific brain patterns to a cognitive state [6, 25]. A few reports have adopted unsupervised learning methods as state classifiers across individuals. For example, one study used the one-class support vector machine (SVM) to determine the boundary around healthy control subjects and to classify outliers based on their distance from this boundary. However, such methods are highly susceptible to noise and do not work well with the high-dimensional data common in fMRI [26].

Research on clustering of subjects using unsupervised learning methods is still limited in the fMRI literature, to the best of our knowledge. Diagnostic labels are often predetermined in neuroimaging studies, and most studies adhere to this labeling, which is why supervised classifiers are more popular. However, the main advantage of unsupervised learning is that it requires no a priori knowledge of categorical labels, thus reducing the problem of selection bias described by Demerci et al. [5]. This allows for the learning of more complex models, pattern discrimination and the identification of hidden states [27], among others. Unsupervised learning also does not have the issue of overfitting the data, which largely plagues supervised learning models. The overfitting nature of supervised learning models results in high performance on the data being used, but much

demonstrate the utility and feasibility of the unsupervised learning approach of density-based clustering, applied on resting-state fMRI (rs-fMRI) data to cluster disease states in a noise-robust manner.

Clustering is an important technique of grouping objects that are similar to each other and isolating them from those that are dissimilar. Clusters are formed with features having minimum intra-cluster distance and maximum inter-cluster distance [28]. Clustering has found wide application in many fields including, but not limited to, the recovery of information, recognition of natural patterns, the analysis of digital images, bioinformatics, data mining, taxonomy, DNA microarray analysis, and many others [29–34]. The performance of most clustering algorithms, however, is highly sensitive to their input parameters, such as the number of clusters in k-means clustering [35]. For effective performance, one almost needs a priori knowledge of these parameters.

Although there are a variety of clustering algorithms, density-based clustering is one of the most computationally effective methods in clustering large-scale databases [36]. Unlike k-means clustering, density-based clustering techniques do not need the user to specify the number of clusters since the algorithm determines that by itself. Using a local proximity measure, clusters are automatically formed by points in higher density regions, well separated by lower density regions. In this work, we focus on two of the most popular density-based clustering methods: Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Ordering Points to Identify the Clustering Structure (OPTICS). They require few input parameters, need no a priori user choices, detect outliers, and effectively detect clusters of arbitrary shapes [37, 38].

DBSCAN is a single-scan technique that makes no presumptions regarding the distribution of data. In addition to clustering the data, the algorithm also detects outliers. The OPTICS algorithm was developed by Ankerst et al. [38] to address DBSCAN’s major shortcoming that it cannot detect clusters with different local densities. OPTICS is a non-rational, data-independent representation of the cluster structure that displays important information regarding the distribution of the data. It is capable of identifying all possible clusters of varying shapes. Advantages of OPTICS over DBSCAN are that it eliminates free parameter choices needed in DBSCAN, is relatively insensitive to noise and can detect clusters of different densities
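To make the density-based idea and the post hoc ‘cluster purity’ measure concrete, the following is a minimal sketch in plain Python, not the authors' pipeline. It implements the textbook DBSCAN procedure on a handful of made-up 2-D points rather than connectivity features. The function names, the toy data, and the values `eps=1.5` and `min_pts=3` are all illustrative assumptions. Purity is computed here as the share of all points that carry the majority class of their cluster, with outliers counted as not correctly assigned; the paper may treat outliers differently. OPTICS is not shown.

```python
from collections import Counter
from math import dist

NOISE = -1  # label used here for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN sketch: returns one label per point.

    Labels 0, 1, ... are clusters; NOISE marks outliers.
    The number of clusters is never supplied by the caller.
    """
    n = len(points)
    # eps-neighbourhood of every point (the point itself is included).
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1           # i is a core point: start a new cluster
        labels[i] = cluster
        frontier = list(neighbours[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                frontier.extend(neighbours[j])  # j is core: keep expanding
    return labels


def cluster_purity(labels, truth):
    """Share of points matching their cluster's majority ground-truth class.

    Assumption of this sketch: outliers count as not correctly assigned.
    """
    correct = 0
    for c in set(labels) - {NOISE}:
        classes = Counter(t for l, t in zip(labels, truth) if l == c)
        correct += classes.most_common(1)[0][1]
    return correct / len(labels)


# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (5, 20)]
truth = ["A"] * 5 + ["B"] * 5 + ["A"]  # labels are used only after clustering

labels = dbscan(points, eps=1.5, min_pts=3)
purity = cluster_purity(labels, truth)
print(labels)
print(round(purity, 3))
```

The ground-truth labels enter only in `cluster_purity`, after clustering has finished, which mirrors the post hoc evaluation described in the text. In practice one would more likely use a maintained library implementation than this quadratic-time sketch.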