DATA-DRIVEN, LABEL CONSISTENT, DICTIONARY LEARNING METHODS FOR ANALYSIS OF BIOLOGICAL DATASETS

A Dissertation

Presented to

The Faculty of the Department of Electrical Engineering

University of Houston

In Partial Fulfillment

Of the Requirements for the Degree of

Doctor of Philosophy

in Electrical Engineering

By

Murad Megjhani

August 2016

DATA-DRIVEN, LABEL CONSISTENT, DICTIONARY LEARNING METHODS FOR ANALYSIS OF BIOLOGICAL DATASETS

______Murad Megjhani

Approved: ______Chair of the Committee Badrinath Roysam, Professor, Department of Electrical and Computer Engineering

______Co-Chair of the Committee Jose Luis Contreras-Vidal, Professor, Department of Electrical and Computer Engineering

Committee Members: ______Wei-Chuan Shih, Associate Professor, Department of Electrical and Computer Engineering

______David Mayerich, Assistant Professor, Department of Electrical and Computer Engineering

______J. Leigh Leasure, Associate Professor, Department of Psychology

______Jared Burks, Assistant Professor, Department of Leukemia, M.D. Anderson Cancer Center

______Suresh K. Khator, Associate Dean Badrinath Roysam, Professor and Chair Cullen College of Engineering Department of Electrical and Computer Engineering

ACKNOWLEDGEMENTS

I would like to express my sincerest gratitude to my advisor, Dr. Badrinath Roysam, for his continuous support during my PhD program. His ideas, vision, patience and, of course, the grants that funded my PhD program from day one have helped me throughout my PhD life. His relationships with biologists and doctors, and his in-depth knowledge of biology, helped me work with the most interesting brain and cancer images. I would also like to thank Dr. Jose-Luis Contreras-Vidal (Pepe) for co-advising this thesis and giving me an opportunity to apply my algorithms to interesting and challenging neurological datasets. I will forever be grateful to him for accepting me in his lab. His energy, dedication and enthusiasm are contagious; the standards he sets for himself are high, and this is reflected in what he expects from his students. Finally, I would like to thank all my committee members, Dr. Wei-Chuan Shih, Dr. Jared Burks, Dr. David Mayerich, and Dr. J. Leigh Leasure, for their insightful discussions and their support.

My sincerest gratitude to Dr. William Shain, Dr. Pedro Correa De Sampaio, Dr. Julienne Leigh Carstens, and all the other collaborators who provided me with the most interesting datasets. This work would not have seen the light of day if not for their images. I would also like to thank all the animals that sacrificed their lives for the advancement of science, and the eighty infants and 400 museum-goers who gave us a piece of their minds through their EEG signals.

I would like to thank my brother, Dr. Malik Megjhani, who made sure that I never had to worry about anything, freed me of responsibilities, stood by me, and let me pursue my dreams. Of course, I thank my parents for giving me life and for always being there. I would also like to thank the people who have left me; you will always be in my heart, and without you and your support I would not have pursued my dream of higher education.

To my family in school: Raghav Padmanabhan, Amine Merouane, Yanbin Lu, Kedar Grama, Yan Xu, and Gül Ugur. Raghav, I owe a lot to you; without your continuous motivation I would not have made it, and you have always been there for me. Amine, thanks for introducing me to my thesis topic; I have always enjoyed the interesting discussions we had on various topics. Yanbin, Kedar and Yan, you have been there for me since day one. Gül, thanks for being there when I needed you the most; you have always supported me.

Finally, I would like to dedicate this work to all the people who have succumbed to worldly pressures and lost their lives. I hope that my journey from here helps me work for the betterment of humanity.


DATA-DRIVEN, LABEL CONSISTENT, DICTIONARY LEARNING METHODS FOR ANALYSIS OF BIOLOGICAL DATASETS

An Abstract

Of a

Dissertation

Presented to

The Faculty of the Department of Electrical Engineering

University of Houston

In Partial Fulfillment

Of the Requirements for the Degree of

Doctor of Philosophy

in Electrical Engineering

By

Murad Megjhani

August 2016


ABSTRACT

The goal of this thesis is to develop a data-driven, label-consistent, dictionary-learning-based framework that can be applied to a variety of signal analysis problems. Current methods based on analytical models do not adequately take the variability within and across datasets into consideration when designing signal analysis algorithms. This variability can be added as a morphological constraint to improve the signal analysis algorithms. In particular, this work focuses on three different applications. 1) We present a method for large-scale automated three-dimensional (3-D) reconstruction and profiling of microglia populations in extended regions of brain tissue, for quantifying arbor morphology, sensing activation states, and analyzing the spatial distributions of cell activation patterns in tissue; this work provided an opportunity to profile the distribution of microglia in the control and device-implanted brain. 2) We present a novel morphologically constrained spectral unmixing (MCSU) algorithm that combines the spectral and morphological cues in the multispectral image data cube to improve the unmixing quality; this work provided an opportunity to identify new therapeutic opportunities for pancreatic ductal adenocarcinoma (PDAC) from images collected from humans. Finally, 3) we developed a framework to analyze neuronal responses from electroencephalography (EEG) datasets acquired from infants ranging from 6 to 24 months of age. We demonstrated that combining different frequency bands from different spatial locations yields better classification results than the traditional approach, in which only one or two frequency bands are used. Using an adaptation of Tibshirani's Sparse Group LASSO algorithm, we uncovered spatial bio-markers for understanding the human infant brain. These bio-markers can be used to characterize the developmental stages of infants, and further analysis is required to study the clinical aspects of infants' social and cognitive development.

This work establishes the fundamental mathematical basis for the next generation of algorithms that can leverage morphological cues from biological datasets. The algorithms have been embedded into the open-source FARSIGHT toolkit with an intuitive graphical user interface.


TABLE OF CONTENTS

Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Background and Prior Literature
   2.1 Review of Microglia Arbor Tracing Methods
   2.2 Review of Spectral Unmixing Methods
   2.3 Review of Infant Decoding and Mirror Neuron System (MNS)
   2.4 LASSO and Group LASSO
   2.5 Sparse Group LASSO
   2.6 Review of Sparse Reconstruction Methods
   2.7 Review of Dictionary Learning
      2.7.1. K-Singular Value Decomposition (K-SVD)
      2.7.2. Online Dictionary Learning
      2.7.3. Label Consistent Dictionary Learning for Classification
Chapter 3 Proposed Methods
   3.1 Population-scale Three-dimensional Reconstruction and Quantitative Profiling of Microglia Arbors
      3.1.1. Sparsity Based Modeling of Microglia Processes
      3.1.2. Automated Microglia Arbor Reconstruction
      3.1.3. Dynamic Cost Metric for Linking Nodes
      3.1.4. Priority-queue Based Reconstruction Algorithm
      3.1.5. Feature Computation
      3.1.6. DIADEM Metric
      3.1.7. Software Implementation Methods
      3.1.8. Click and Trace
   3.2 Spectral Unmixing by Morphologically-constrained Dictionary Learning for Multiplex Fluorescence Microscopy
      3.2.1. Reference Images
      3.2.2. The Morphologically Constrained Spectral Unmixing (MCSU) Algorithm
      3.2.3. Morphological Constrained Spectral Unmixing – Total Variation (MCSU-tv)
      3.2.4. Adaptive Setting of the Regularization Parameter
   3.3 Infant Data Decoding and Channel Relevance Using Sparse Group LASSO and Label Consistent Dictionary Learning
      3.3.1. Subjects
      3.3.2. Experimental Protocol
      3.3.3. Data Acquisition
      3.3.4. Experimental Task
      3.3.5. Data Pre-processing
      3.3.6. EEG Data Analysis
      3.3.7. µ-rhythm Identification
      3.3.8. Channel Relevance and Neural Classification
Chapter 4 Results and Validation
   4.1 Microglia Arbor Tracing
      4.1.1. Parameter Selection
      4.1.2. Arbor Analysis by Harmonic Co-clustering of L-Measure Data
   4.2 Morphologically Constrained Spectral Unmixing
      4.2.1. Spectral Imaging
      4.2.2. Parameter Selection
      4.2.3. Unmixing Performance
   4.3 Analysis of Infant Neural Cortical Activity
      4.3.1. Behavioral Results
      4.3.2. EEG Analysis
         4.3.2.1. Neural Models
         4.3.2.2. Spectral Analysis
         4.3.2.3. Neural Decoding
Chapter 5 Conclusion
References


LIST OF FIGURES

Figure 1. (A) Maximum intensity projection of a 3-D two-channel mosaic image (yellow: IBA1, blue: Hoechst). (A1) Close-up of the boxed region. (B) Automated reconstructions. (B1) Close-up of the reconstructions in the boxed region.

Figure 2. Limitations of spectral unmixing. (A) Emittance from eight fluorophores. (B) Spectral angles between the fluorophores. (C-K) Close-up images illustrating the distinct morphologies. (L-Q) Unmixing results for DAPI and Coumarin.

Figure 4. Workflow for analysis of a 23 month-old male subject (RB23). (A-D) Synchronized EEG and video (frame) recordings. (E) Event-related synchronization. (F) Topographic scalp maps of relevant channels. (G) Confusion matrix.

Figure 5. Flowchart of the proposed microglia arbor tracing algorithm.

Figure 6. A sample of 3-D image patch dictionary entries displayed as maximum-intensity projections, showing two distinct structures: ones that represent parts of microglial arbors (highlighted with red boxes), and others background.

Figure 7. (A) Sample IBA1+ cell (grayscale) overlaid with initially detected seed points (red). (B) Close-up of the boxed region in Panel A. The white arrows indicate some valid seed points, and the yellow arrows indicate some invalid seed points.

Figure 8. Pictorial representation of microglial arbor features.

Figure 9. Test trees A and B are compared with the ground truth in yellow.

Figure 10. "Click and Trace" feature of the Trace Editor.

Figure 11. Illustrating the ability of the learned dictionary atoms to capture the morphologies of structures for the channels. The first column shows the reference images (R1 – R8). The remaining columns show sample dictionary atoms.

Figure 12. Illustrating the distribution, by age and race, of the infants who participated in the study. A total of 61 subjects from the Houston area participated in the study.

Figure 13. (A) The mean DIADEM metric for different algorithms. Our algorithm was insensitive to (B) the size of the dictionary K and (C) variations in the sparsity constraint T. (D) Increasing the cost threshold τ led to reduced performance.

Figure 14. Quantitative analysis of the microglial cell population. (A) 3-D rendering of the reconstructed microglial field, with the arbor traces color-coded to indicate the automatically identified groups. (B) Heat map representation of the co-clustering.

Figure 15. Qualitative performance of different algorithms, indicating the need for morphological constraints. (A & B) Original images of Coumarin and DAPI, respectively. (E-P) Unmixing results for different algorithms.

Figure 16. Performance of different algorithms as measured by MSE, SRE ratio, and FSIM. (A – C) Performance of the algorithms as the noise level was varied. (D – F) Performance of different algorithms with varying numbers of end members.

Figure 17. The distribution of observed behaviors varied as a function of age.

Figure 18. Time-frequency analysis of EEG showing event-related desynchronization (ERD). The plot depicts the maximum suppression across all trials and subjects.

Figure 19. A neural classifier model can predict behavioral action from brain activity (EEG). (A) Mean classification accuracy (%) for different frequency bands, indicated by colored asterisks. (B) Mean confusion matrix (%) across all subjects.

Figure 20. Example of the relevance of EEG channels to prediction of behavioral action. Scalp maps for each behavior shown by a 23 month-old male subject (RB23).

Figure 21. Topographic scalp maps for each behavior and age group of all subject data applied to EEG classification (N=51).

LIST OF TABLES

Table 1: Summary of Arbor Tracing Methods

Table 2: Summary of Spectral Unmixing Algorithms

Table 3: Summary of Neural Classification

Table 4: Overview of Sparse Coding Algorithms

Table 5: Pseudo Code for Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP)

Table 6: Pseudo Code for the K-SVD Algorithm

Table 7: Pseudo Code for Online Dictionary Learning

Table 8: Pseudo Code: Summary of the Proposed Seed Point Detection Algorithm

Table 9: Validation of Seed Points

Table 10: Relative Abundance of Arbor Morphology Groups

Table 11: Selected Features of Arbor Morphology Groups

CHAPTER 1 INTRODUCTION

The goal of this work is to develop highly effective signal analysis algorithms, using machine learning techniques, that can cope with the natural variability of biological datasets in a label-consistent manner. When designing image analysis algorithms, current methods based on analytical models do not adequately take the variability within and across datasets into consideration. Biological tissues exhibit complex morphological structures that are traditionally modelled using intricate mathematical designs, and addressing diverse morphologies requires different mathematical models. These analytical models fail to address the variability of distinct cells across larger sections of tissue, for example when profiling the distribution of complex brain cells such as microglia. Microglia are the resident immune cells of the mammalian central nervous system. They are distributed throughout the brain in non-overlapping territories, and comprise up to 20% of the glial cell population (Fields 2013; Gehrmann, Matsumoto, and Kreutzberg 1995). Their morphologies are dynamic, and informative of their internal state of activation (Streit 2005). Resting microglia exhibit complex and symmetric arbors that are optimal for sensing perturbations in their local environment. When perturbed (e.g., by an injury), they exhibit progressively less complex arbors with increasing levels of activation. Quantification (Lu et al. 2014; Xu et al. 2016) of arbor morphological changes is therefore valuable for sensing microglia activation, with or without the benefit of additional markers of cell activation. It is valuable to perform this type of sensing on large populations of microglia in order to permit spatial mapping of activation patterns across the brain tissue. Thus, there is a need for novel algorithms to reconstruct these diverse cells accurately, and to represent them concisely and in a label-consistent manner, in order to perform quantitative analysis on large sections of brain tissue. Our approach is to learn the diverse morphologies of these complex structures using dictionary learning methods, which can model these diverse morphologies in a sparse and label-consistent manner.

Figure 1. (A) Maximum intensity projection of a 3-D two-channel mosaic image (yellow: IBA1, blue: Hoechst). (A1) Close-up of the boxed region. (B) Automated reconstructions. (B1) Close-up of the reconstructions in the boxed region.
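The dictionary-learning idea sketched above can be illustrated with scikit-learn: learn K atoms from vectorized image patches, and sparse-code each patch with at most T nonzero coefficients. The patch dimension, K, and T below are hypothetical placeholders, and random vectors stand in for real vectorized 3-D image patches:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
# Stand-in for vectorized 3-D image patches (e.g., 5x5x5 voxels -> 125-dim)
patches = rng.normal(size=(200, 125))
patches -= patches.mean(axis=1, keepdims=True)   # zero-mean each patch

# K = 32 atoms; OMP coding with at most T = 5 nonzero coefficients per patch
dl = DictionaryLearning(n_components=32, transform_algorithm='omp',
                        transform_n_nonzero_coefs=5, max_iter=5, random_state=0)
codes = dl.fit_transform(patches)                # sparse codes, shape (200, 32)
print(codes.shape, (codes != 0).sum(axis=1).max())
```

Each row of `codes` represents one patch as a sparse combination of learned atoms; label-consistent variants additionally tie atoms to class labels during training.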

Similar needs arise when quantifying cancer tissues imaged using multispectral imaging. Multiplex fluorescent labeling of molecular markers and multispectral microscopy are increasingly used for imaging biological tissue samples in order to capture multiple cyto-histological structures (e.g., multiple cell types and microvasculature in a heterogeneous tissue sample), and multiple functional molecular markers, in a manner that preserves their relative spatial context (Eroglu et al. 2016; Patterson et al. 1997; J. Zhang et al. 2002). The natural complexity of multiplex immuno-labeling, when combined with the constraints of the spectral emission profiles of available fluorophores and related practical challenges (such as tissue auto-fluorescence), often leads to situations in which two or more observable entities overlap significantly. Spectral unmixing algorithms aim to overcome these limitations by computational means, by decomposing the measured emission spectrum at each pixel into a collection of constituent spectra (a.k.a. end members) and a set of corresponding fractions (a.k.a. abundances) that indicate the proportion of each end member present at the pixel. Figure 2 illustrates multispectral imaging of a human pancreatic ductal adenocarcinoma (PDAC) specimen stained with a multiplex protocol, demonstrating its association with marked fibrosis and diverse stromal populations whose roles are the subject of ongoing investigations (Ozdemir et al. 2014). Identifying new therapeutic opportunities for PDAC requires a deeper understanding of the interactions between these diverse cellular components. With the availability of a vast array of fluorescent proteins (FPs) (Patterson et al. 1997; Zhang et al. 2002; Eroglu et al. 2016), spectral imaging combined with unmixing algorithms offers the potential to analyze these multiple cell types at the same time, and in their spatial context. Accordingly, this sample was multiplex labeled for the following: cell nuclei (DAPI); αSMA (FITC) for mesenchymal cells, either stromal fibroblasts or mural cells surrounding the large vessels; CK8 (Cy3.5), found in the membranes of epithelial tumor cells; Collagen-I (Coumarin) for extracellular secreted matrix proteins presenting a fibrous morphology; CD4 (680), in the membrane of helper T-cells, showing a ring-like morphology; CD8 (Cy5), in the membrane of a different population of T-cells (cytotoxic T-cells), with a morphology similar to CD4; FoxP3 (AF594), present in the nucleus of a subset of CD4+ T-cells (regulatory T-cells); and CD31 (Cy3), seen on the membrane of endothelial cells. There is a desire to record more of the tissue constituents simultaneously in order to provide additional insight, but the number of fluorophores that can be imaged simultaneously is limited by the need to separate fluorophores that overlap spectrally and spatially. Figure 2A shows the emission spectra for this sample, and Figure 2B lists the spectral angles between each of the constituents. Smaller spectral angles imply that the corresponding pair of constituents is harder to separate, and vice versa. Notably, α-SMA (FITC, red) and CD4 (680, aqua) are quite distinct, and this is reflected in the high 86.3° spectral angle. On the other hand, DAPI (blue) and Collagen-I (Coumarin, yellow) are separated by a small (12.3°) spectral angle indicating a significant overlap, and are therefore particularly challenging to separate. The difficulty of separating these fluorophores with the traditional linear spectral unmixing algorithm (N. Keshava and Mustard 2002) is highlighted in Figure 2 (L & M). A more advanced method based on sparse unmixing (Iordache, Bioucas-Dias, and Plaza 2011a) performed better (Figure 2 N, O), but was still not fully successful. However, as indicated by the close-ups in Figure 2 (C-K) and in Figure 3, the morphologies of these structures are diverse. These morphologies can be added as additional cues to traditional unmixing algorithms to accurately unmix spectra with small spectral angles between them.

Figure 2. Limitations of spectral unmixing. (A) Emittance from eight fluorophores. (B) Spectral angles between the fluorophores. (C-K) Close-up images illustrating the distinct morphologies. (L-Q) Unmixing results for DAPI and Coumarin.
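The spectral angle reported in Figure 2B is simply the angle between two emission spectra treated as vectors. A minimal sketch, where the Gaussian emission profiles and their peak positions are hypothetical stand-ins rather than measured reference spectra:

```python
import numpy as np

def spectral_angle_deg(a, b):
    """Angle (degrees) between two emission spectra treated as vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical, heavily overlapping Gaussian emission profiles
wavelengths = np.linspace(400, 700, 64)
s1 = np.exp(-((wavelengths - 461) / 25.0) ** 2)   # DAPI-like peak
s2 = np.exp(-((wavelengths - 470) / 28.0) ** 2)   # Coumarin-like peak
print(spectral_angle_deg(s1, s2))                  # a small angle: hard to unmix
```

Orthogonal spectra give 90° and identical spectra give 0°; the smaller the angle, the more ill-conditioned the linear system that unmixing must invert.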

Finally, we extended this framework to uncover hidden neuronal patterns in human infants in order to study different behavioral actions. There may be no stage of development as rapid as the first 24 months of a person's life, during which an infant's physical, social, and cognitive abilities increasingly develop to interact with, and learn from, the environment. According to Jean Piaget's (1896-1980) Theory of Cognitive Development (Wall 1982), there are specific cognitive stages that many children encounter as they go through this period of rapid development. During the first, or sensorimotor, stage, infants learn to manipulate objects through behavioral actions such as grasping (Nicolich 1977; Wall 1982), imitating by observing (Wall 1982), and exploring with their hands or mouth (Jones 2009; Meltzoff 1988; Wall 1982). In particular, perception (observe) and execution (imitate) actions are closely linked (Marshall and Meltzoff 2011), and imitation has even been used as a measure to understand the socio-emotional function of communication, as well as the cognitive function of motor skill acquisition (Marshall and Meltzoff 2011). Uncovering the neurophysiological correlates and cortical mechanisms behind such functions is one place to begin, as posited by the analog of the Mirror Neuron System (MNS) (Iacoboni 2005; di Pellegrino et al. 1992) in humans, which has helped in understanding the production of goal-oriented actions and the corresponding observations. Recent advancements in mobile brain imaging technology have provided an avenue for recording human neural signals in non-traditional experimental environments. These technologies provide an opportunity to explore electroencephalographic (EEG) patterns in the complex natural settings of freely behaving human infants. However, this also presents a challenge in analyzing these complex behavioral patterns. Current methods rely on modelling these EEG patterns using a specific frequency band of interest from a few spatial locations on the scalp. Accurately identifying the spatial locations, or regions of interest, and the frequency bands corresponding to those regions, is a key requirement for uncovering bio-markers and classifying cortical activity for different behavioral actions. For this, we propose a classification technique that combines frequency information from different regions of interest using the sparse-group LASSO, and then uses these weighted multiband frequency/channel features to learn the diverse features of distinct behavioral actions. The weights from the sparse-group LASSO also provide important information about the relevance of regions of interest in the brain, offering further insight for source localization.

Figure 3. Workflow for analysis of a 23 month-old male subject (RB23). (A-D) Synchronized EEG and video (frame) recordings. (E) Event-Related Synchronization. (F) Topographic scalp maps of relevant channels. (G) Confusion matrix.
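The sparse-group-LASSO penalty combines an elementwise L1 term with a per-group L2 term, so whole groups of coefficients can be driven exactly to zero. A minimal sketch of the corresponding proximal step; the grouping of coefficients by EEG channel, and the toy values, are illustrative assumptions:

```python
import numpy as np

def sgl_prox(w, groups, lam1, lam2):
    """Proximal step for the sparse-group-LASSO penalty.

    `groups` is a list of index arrays, e.g. one group per EEG channel
    holding that channel's frequency-band coefficients. A group whose
    soft-thresholded L2 norm falls below lam2 is zeroed entirely,
    marking the whole channel as irrelevant."""
    w = np.sign(w) * np.maximum(np.abs(w) - lam1, 0.0)  # elementwise (L1) prox
    for g in groups:                                    # groupwise (L2) prox
        nrm = np.linalg.norm(w[g])
        w[g] = 0.0 if nrm <= lam2 else w[g] * (1.0 - lam2 / nrm)
    return w

# Two hypothetical channels, two band coefficients each
w = np.array([3.0, -0.5, 0.02, 0.03])
out = sgl_prox(w.copy(), [np.array([0, 1]), np.array([2, 3])], lam1=0.2, lam2=0.5)
print(out)   # the second channel's group is driven exactly to zero
```

The groups that survive this shrinkage correspond to the relevant channels, which is how the learned weights double as a channel-relevance map.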


CHAPTER 2 BACKGROUND AND PRIOR LITERATURE

2.1 Review of Microglia Arbor Tracing Methods

Microglia are immune cells of the mammalian central nervous system whose importance to brain health is receiving growing recognition. These cells are distributed throughout the brain in non-overlapping territories, and comprise up to 20% of the glial cell population (Fields 2013; Gehrmann, Matsumoto, and Kreutzberg 1995). Their arbor morphologies are dynamic, informative of their internal state of activation, and related to tissue perturbations. Resting microglia exhibit motile and highly branched arbors that constantly screen for perturbations in their local environment. Even in normal tissue, microglia exhibit morphological heterogeneity. When perturbed by pathological stimuli, they undergo rapid changes in arbor morphology and migrate toward the lesion site (Ohsawa and Kohsaka 2011). The arbor morphologies are vital indicators of cell activation, and are valuable regardless of any molecular markers of activation. For these reasons, there is a need for methods for quantitative analysis of individual microglial arbors, characterizing arbor alterations, and profiling spatial distributions over extended tissue regions. Capturing these phenomena, given the small dimensions of microglial processes (<2 µm), requires high-resolution microscopy. Accurately capturing the asymmetrical arbors of non-resting microglia requires fully three-dimensional (3-D) imaging of tissue samples that are thick enough to include entire cells. Capturing perturbations to the spatial distributions of these cells requires high-extent imaging, especially when the lesions are much larger than the field of view of the microscope (Rey-Villamizar et al. 2014). Figure 1 illustrates the use of step-and-repeat confocal microscopy and computational image mosaicing to record large 3-D images of brain tissue. Our goal is to describe automated methods for quantifying microglial arbor morphologies and spatial distributions in such images.

Table 1: Summary of Arbor Tracing Methods

In considering arbor tracing methods, we note that there is a vast literature on tracing algorithms for neurons (Meijering 2010). In contrast, the literature on microglial arbor analysis is much smaller (Galbreath 2011; Rouchdy and Cohen 2013; Yu Wang, Narayanaswamy, and Roysam 2011; Xiao and Peng 2013). Importantly, the prior efforts have tackled much smaller cell populations than reported here, cannot reconstruct microglia seamlessly across significant extents of brain tissue (multiple mm – cm), and do not guarantee that the reconstructions have a tree topology. Finally, the prior literature does not address population-scale quantitative analysis of microglial arbors. The prior literature on automated neuron tracing provides a valuable basis for reconstructing microglia, but there are important limitations specific to microglia that remain unaddressed. Most of the methods are based on predefined analytical models of the tubularity ("vesselness") and continuity of neurites. For example, (Narayanaswamy, Wang, and Roysam 2011) used the multi-scale curvelet transform to model neurites. Automated neurite tracing methods are generally based on model-based sequential tracing (Bas and Erdogmus 2010; Al-Kofahi et al. 2008; Peng et al. 2010); probabilistic extraction of centerlines (Breitenreicher et al. 2013; Gonzalez et al. 2010; Türetken et al. 2011, 2014); or segmentation (Chothani, Mehta, and Stepanyants 2011; Vasilkoski and Stepanyants 2009). The neurite centerlines are usually detected by variations of skeletonization (He et al. 2003; Peng et al. 2014) or voxel coding (Chothani, Mehta, and Stepanyants 2011; Jiménez et al. 2014). For handling complex or dense neurite fields, various approaches have been proposed for negotiating branches and crossovers (Schmitt et al. 2004). Tracing performance depends on the quality of the seed points, effective modeling of the peculiarities of the images being processed, and the tracing control criteria (stopping, branching, negotiating crossovers, etc.). Our method is designed to address the needs of large-scale microglia reconstruction by exploiting microglia-specific constraints (e.g., known topology), and by using algorithms that are specifically designed for mosaiced high-extent imaging of brain tissue (Tsai et al. 2011). Specifically, we address the need to cope with biological and imaging variability within and across samples without having to re-tune algorithm parameters. For this, we propose a machine learning approach for modeling the characteristics of microglia in actual images, rather than resorting to a pre-defined analytical model as in the prior work on neuron tracing. In order to overcome the difficulty of tuning parameters over large-extent images, we propose algorithms with very few parameters that are easy to interpret and adjust. Our method detects microglial processes reliably using a machine learning algorithm, and traces the arbors with constraints that restrict the reconstructions to trees. Appropriately designed cost terms guide our algorithm to exploit the fact that microglia, unlike neurons, are spatially localized (lacking long processes). Whereas tight fiber crossovers are common in neurite fields, they are rare for microglia. Furthermore, microglia do not form inter-connections with neighboring cells. The more pressing challenge in our work is to correctly reconstruct microglia that are split across microscope fields, to enable accurate spatial distribution analysis. Our method uses these same constraints to handle such cells. In order to achieve scalability, our method is designed to exploit parallel computers. This results in an array of arbor reconstructions, one per IBA1+ microglial cell, that can be visualized and proofread using 3-D visualization tools capable of handling thousands of cells at a time (Luisi et al. 2011). The overall pipeline is usable and scalable. The few adjustable parameters of our method affect the results smoothly, permitting stable and objective analysis of microglia morphologies. The resulting reconstructions are directly amenable to quantitative analysis using Scorcioni's L-Measure (Scorcioni, Polavaram, and Ascoli 2008). The quantitative arbor measurements are analyzed by Coifman's harmonic co-clustering to reveal morphologically distinct arbor classes that concord with known microglia activation patterns. Application of these methods to brain tissue analysis will provide unprecedented measurements of microglia morpho-metrics as a function of physiological and pathological stimuli.
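The tree-constrained, cost-guided style of reconstruction described above can be sketched schematically with a priority queue. This is a generic illustration, not the dissertation's actual algorithm: seed detection, the dynamic cost metric, and the microglia-specific cost terms are elided, and `neighbors` and `cost` are hypothetical inputs:

```python
import heapq

def trace_tree(neighbors, cost, seed, tau):
    """Grow an arbor reconstruction as a tree rooted at a seed node.

    Repeatedly attaches the cheapest unclaimed candidate node, skipping
    any link whose cost exceeds the threshold tau. The result is a tree
    by construction: each node receives exactly one parent."""
    parent = {seed: None}
    heap = [(cost(seed, m), seed, m) for m in neighbors[seed]]
    heapq.heapify(heap)
    while heap:
        c, p, n = heapq.heappop(heap)
        if n in parent or c > tau:      # already claimed, or too costly a link
            continue
        parent[n] = p                   # attach n to the growing tree via p
        for m in neighbors[n]:
            if m not in parent:
                heapq.heappush(heap, (cost(n, m), n, m))
    return parent

# Toy 1-D chain of candidate nodes; the linking cost is just the distance
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
tree = trace_tree(neighbors, lambda a, b: abs(a - b), seed=0, tau=1.5)
print(tree)   # {0: None, 1: 0, 2: 1, 3: 2}
```

Raising or lowering `tau` trades off arbor completeness against spurious links, which mirrors the smooth effect of the cost threshold discussed above.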

2.2 Review of Spectral Unmixing Methods

The design of spectral unmixing algorithms has been widely discussed, especially in

11 the remote sensing literature (Keshava & Mustard, 2002; Keshava, 2003; Bioucas-Dias et al., 2012; Ma et al., 2014 and references therein), and mostly in the context of reflected light imaging. Despite all the progress, this subject remains a topic of continued research, especially the idea of exploiting additional cues to improve unmixing performance when the spectral overlaps are high (Bioucas-Dias et al. 2012; Ma et al. 2014). The field of multiplex fluorescence microscopy presents both opportunities and challenges from the standpoint of unmixing algorithm design. Importantly, it is possible in microscopy to capture reference spectra, as well as the corresponding reference images – something that is harder to obtain in remote sensing. We also know the number of end members in advance, and this is an advantage. On the other hand, autofluorescence, and the natural biological complexity of cyto-histological images present challenges not encountered in remote sensing. Importantly, the reference images contain valuable structural information that is not exploited by the traditional linear spectral unmixing algorithm. Linear spectral unmixing solves the inverse problem where the known information is the set of observed spectral signatures, and the algorithm is designed to estimate the end-member abundances. Typically the spectral signature of each fluorophore as shown in Figure 2, is obtained a priori as a set of reference “휆-stacks,” and these reference stacks are used to compute the abundance at each pixel. When these reference stacks are acquired, there is the opportunity to obtain additional information about the morphologies of the biological structures, as shown in Figure 2 (C – K). This is valuable information that is not exploited by linear unmixing algorithms. 
Nevertheless, the impact of linear spectral unmixing has been significant – it has enabled scientists to image a larger number of fluorophores simultaneously than is possible with instrumentation alone (Tsurui et al. 2000). Spectral unmixing algorithms can be broadly classified into geometrical, statistical, sparse, and spatial-spectral context based methods.

Table 2: Summary of Spectral Unmixing Algorithms

Geometrical methods (Bioucas-Dias et al. 2012; Chan et al. 2011; Esser et al. 2012; Nascimento and Dias 2005) exploit the fact that, under linear mixing models, the normalized hyperspectral vectors belong to the unit simplex (the set of vectors whose components lie in the unit interval [0,1] and sum to 1), so the unmixing problem is formulated such that the vertices of the simplex correspond to the end members. When the spectral data are highly mixed, geometrical methods yield poor results, since there are not enough vectors in the simplex facets. In these cases, statistical methods (Arngren, Schmidt, and Larsen 2010; Dobigeon et al. 2009; Parra et al. 2000) provide an alternative. Under a statistical unmixing framework, the spectral unmixing is formulated as an inference problem that relies on the posterior probability density of the unknown abundances given the data.

Sparse regression based unmixing algorithms (Iordache, Bioucas-Dias, and Plaza 2011b) are a more recent development. They rely on a large pre-existing library of pure spectral signatures, and the unmixing algorithm finds the optimal subset of signatures from this very large spectral library. The main innovation here is the use of the L1 norm as a penalty in the underlying regression framework, which leads to sparse unmixing solutions. Several variants of constrained sparse regression (CSR) have been used for spectral unmixing. Constrained basis pursuit (CBP) is a variant of basis pursuit under sum-to-one and non-negativity constraints. Similarly, constrained basis pursuit denoising (CBPDN) is a generalization of CBP that accounts for modeling errors.

When a spectral library is unavailable, dictionary learning methods (Aharon, Elad, and Bruckstein 2006; Mairal, Bach, et al. 2009) have been used to learn the end-member signatures and estimate the abundances from the data (Charles, Olshausen, and Rozell 2011; S. Yang et al. 2014). However, the dictionary learned from the data by these algorithms only accounts for the spectral signatures of the end members, and not the combination of spatial and morphological information that we seek to incorporate in this paper. These methods work well when the spectral angles between pairs of spectral signatures are large, and when imaging a limited number of fluorophores. Gammon et al. (2006) used linear unmixing with manual user input to address the problem of spectral overlap. Merouane et al. (2015) used sparse unmixing as part of a study of immune cell-cell interactions, but with only two fluorophores. Al-Kofahi et al. (2010) and Lovisa et al. (2015) analyzed multiplex-stained histological sections of breast and kidney samples using the proprietary Nuance software (Perkin Elmer, Hopkinton, MA) to unmix fluorophores into non-overlapping channels. Neher et al. (2009) and Yang et al. (2012) applied non-negative matrix factorization (NMF) and sparse component analysis methods to extract the abundances of fluorophores. Again, these techniques work only when the number of channels to be unmixed is limited, and when the spectral angle between overlapping pairs of fluorophores is large. This limits the number of channels that can be imaged at the same time.

The recently reported spatial-spectral unmixing algorithms (Fauvel et al. 2008; Ghamisi, Benediktsson, and Ulfarsson 2014; Iordache, Bioucas-Dias, and Plaza 2012, 2014) pioneered the idea of incorporating pixel spatial information, since neighboring pixels in an image tend to have similar abundances. These methods incorporate spatial information by using a fixed-size window to ascertain when neighboring pixels have similar abundances. Unlike remote sensing images, biological microscopy images tend to have distinct morphologies for different types of structures, and these morphologies can potentially be learned from the unmixed reference images using dictionary learning methods (Aharon, Elad, and Bruckstein 2006; Mairal, Bach, et al. 2009). The proposed method overcomes these limitations, as explained below.

2.3 Review of Infant Decoding and the Mirror Neuron System (MNS)

Action understanding and action production are essential to developing socio-cognitive skills early in life. The neural basis of action understanding and production in goal-oriented actions has been attributed to a postulated analog of the mirror neuron system (MNS) (Iacoboni 2005; di Pellegrino et al. 1992) in human adults. The MNS has been proposed to constitute a basis for important social-cognition components such as imitation and communication (Gallese, Keysers, and Rizzolatti 2004; Rizzolatti and Craighero 2004). An EEG signature of the MNS in human adults (Pineda, Allison, and Vankov 2000) has been identified as the suppression of 8-13 Hz power during action observation and execution, also known as μ suppression when found in sources encompassing the primary somatosensory and motor areas (Makeig et al., 2004).

Table 3: Summary of Neural Classification

Few studies have examined the emergence and development of the MNS in human infants during unconstrained social interactions, and this remains mostly unexplored. In one study, EEG power suppression in the 6–9 Hz range was reported for 14-month-old infants during action observation in a social context (Marshall et al., 2011). The authors examined a range of 4-9 Hz across several scalp regions, and found a broad distribution of infant mu-suppression. Recently, the mu-rhythm representation of MNS activation in natural social interaction has been further characterized for toddlers (mean age 41 ± 4 months), relating ICA-based cortical sources and spectral properties (Y. Liao et al. 2015). These studies reflect recent efforts to overcome the constraints of the spatial and spectral assumptions usually associated with the MNS in humans.

An understanding of neural sources is of crucial importance to investigating the functional and temporal mechanisms underlying the MNS in humans. However, there have been a limited number of efforts to extend EEG source imaging methods to infants and young children. Current source localization methods include boundary-element and finite-element models, constrained by conductivity estimates of the interposed tissues (Acar and Makeig, 2010). Using an age-appropriate anatomical model, it should be possible to estimate the source locations of coherent EEG activity in infants. This would represent an advance in documenting the early cortical substrate of social action processing. Optimal methods for processing EEG data, including source localization, require a biologically plausible approach to estimating cortical sources.

The classification and prediction of movement intent using invasive ECoG and non-invasive EEG methods has long been studied, usually in research related to brain-computer interfaces and neuro-prosthetics (Cruz-Garza et al. 2014; Hochberg et al. 2012). However, such studies generally focus on the prediction of the kinematics of functional movements; the prediction of emotional, expressive, and contextual properties of movements has not been as well studied (Cruz-Garza et al. 2014), even though such properties can affect the kinematics of a motion (Becchio et al. 2008). To the best of our knowledge, although the neural basis of action-intention has been studied, especially through changes in the μ-rhythm (E. N. Cannon et al. 2014), little is known about this basis in infants.

A few studies have focused on either the delta band (Hernandez et al. 2014) or the μ-rhythm (Marshall and Meltzoff 2011) for classification of neural signals. However, these methods rely on the few channels surrounding the sensorimotor area to predict behavior, and do not take into account the wider spectrum and other regions of the brain that could potentially improve classification results. Moreover, these algorithms solve simple two-class classification problems. For example, Wang and Makeig (Yijun Wang and Makeig 2009) predicted the intended direction of movement using EEG from the posterior parietal cortex, focusing on the 0-25 Hz frequency band for classification.

Mühl et al. (Mühl, Jeunet, and Lotte 2014) looked at multiple frequency bands, δ (1-4 Hz), θ (4-7 Hz), α (8-12 Hz), β (12-30 Hz), γ1 (30-40 Hz), and γ2 (53-90 Hz), for classifying stress vs. non-stress workload. For each of these bands, the band-pass filtered EEG trials are used to optimize spatial filters, i.e., linear combinations of the original EEG channels. These spatial filters are optimized using the Common Spatial Pattern (CSP) algorithm (Blankertz et al. 2008), which finds the optimal channel combination such that the power of the resulting spatially filtered signals is maximally discriminant between the two conditions (here, low and high workload). Next, the 18 most relevant of the 72 power features (12 CSP filters × 6 frequency bands) were selected by the maximum Relevance Minimum Redundancy (mRMR) algorithm (Peng, Long, and Ding 2005). Finally, these selected power features were used to train a shrinkage Linear Discriminant Analysis (LDA) classifier (Fabien Lotte and Guan 2010). Even though this technique examines multiple frequency bands, it is restricted to two-class classification problems, since the CSP filters are optimized for two classes. Moreover, the selection of the top 18 relevant features from the mRMR algorithm is arbitrary. A more robust technique is needed that can consider more features and select among them automatically by weighting them.
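The CSP step described above can be sketched as follows. This is a schematic of the standard generalized-eigenvalue formulation, not the exact pipeline of Mühl et al.; the function and variable names are ours:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b):
    """Common Spatial Patterns for a two-class problem.

    trials_a, trials_b: lists of (channels, samples) band-pass-filtered EEG
    trials.  Returns spatial filters as rows, ordered so the first row
    maximizes class-a variance relative to class-b.
    """
    def mean_cov(trials):
        # trace-normalized covariance, averaged over trials
        covs = [t @ t.T / np.trace(t @ t.T) for t in trials]
        return np.mean(covs, axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenproblem: Ca w = lambda (Ca + Cb) w
    eigvals, eigvecs = eigh(Ca, Ca + Cb)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order].T
```

Log-variances of the first and last filtered signals are the usual discriminative features; the two-class restriction noted above is visible here, since the objective contrasts exactly two covariance matrices.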

Recent progress in fast solvers for L0 and L1 minimization has garnered considerable attention in signal processing, especially sparse representation based classification (SRC). SRC has also been studied for EEG-based classification (Shin et al. 2012; Zhou, Yang, and Yu 2012).

In this study we aim to use EEG to study MNS development in freely behaving infants by analyzing neural correlates of imitation learning. A classification algorithm using label consistent dictionary learning was used to evaluate time- and frequency-domain features from EEG data for identification of the different behaviors elicited by the infant. In order to better understand the features driving the successful classification of varying neural and behavioral dynamics in natural behavior, we deployed an automatic feature selection algorithm based on the sparse group LASSO that could reveal the relevant frequency bands and channels corresponding to different behaviors.

2.4 LASSO and Group LASSO

In 1996, Robert Tibshirani proposed a new method for estimation in linear models. The LASSO (Least Absolute Shrinkage and Selection Operator) is a regression method that penalizes the absolute size of the regression coefficients: it minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant. By penalizing (or, equivalently, constraining) the sum of the absolute values of the estimates, some of the parameter estimates may become exactly zero; the larger the penalty applied, the further the estimates are shrunk towards zero. This is convenient when we want automatic feature/variable selection, or when dealing with highly correlated predictors. In our application, we use the LASSO to select distinguishing measurements, to interpret the profiling results, and to rebuild a model for further analysis.

In a typical linear regression model, assume the data of interest consist of N data points, each with p dimensions. The data matrix can be represented as X ∈ R^{N×p}, and a response (label) vector y ∈ R^N is given. In many applications we have p ≫ N. To address this, the problem is regularized by bounding the ℓ1 norm, minimizing the objective function

\[
\min_{\beta \in \mathbb{R}^p} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 . \tag{1}
\]

In the equation above, β is the coefficient vector. The solution of this optimization problem yields a sparse coefficient vector β, whose non-zero entries are interpreted as the significant features/measurements for the prediction/regression model.
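One standard way to solve (1) is iterative soft-thresholding (ISTA). The sketch below is illustrative, and not necessarily the solver used in this work:

```python
import numpy as np

def lasso_ista(X, y, lam, iters=500):
    """Solve min_beta ||y - X beta||_2^2 + lam * ||beta||_1
    by iterative soft-thresholding (ISTA)."""
    # step size = 1 / Lipschitz constant of the gradient 2 X^T (X beta - y)
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        z = beta - step * 2.0 * X.T @ (X @ beta - y)
        # soft-thresholding shrinks small coefficients exactly to zero
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta
```

The exact zeros produced by the soft-threshold are what make the LASSO a feature selector rather than only a shrinkage method.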

Further, consider applications where features/measurements can be divided into practically meaningful groups; for example, in gene expression data, a group of genes may regulate the expression of the same protein, or a group of features may be statistically correlated within groups but independent across groups. If this group information is given, or can be extracted from the dataset, a desired solution is one that is sparse at the group level.

Yuan & Lin proposed an algorithm for solving this problem. Suppose that the p predictors are divided into L groups, with p_l the size of group l. For ease of notation, we use a matrix X_l to represent the predictors corresponding to the l-th group, with corresponding coefficient vector β_l. Assume that y and X are normalized to zero mean. The algorithm is formulated as

\[
\min_{\beta \in \mathbb{R}^p} \; \Big\| y - \sum_{l=1}^{L} X_l \beta_l \Big\|_2^2 + \lambda \sum_{l=1}^{L} \sqrt{p_l}\, \|\beta_l\|_2 , \tag{2}
\]

where the √p_l terms account for the varying group sizes, and ‖·‖_2 is the Euclidean norm (not squared). This procedure acts like the lasso at the group level: depending on λ, an entire group of predictors may drop out of the model. In fact, if the group sizes are all one, it reduces to the lasso.
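Equation (2) can be solved by proximal gradient descent with block soft-thresholding. The following is a minimal sketch with our own naming, not necessarily the solver used here:

```python
import numpy as np

def group_lasso(X, y, groups, lam, iters=3000):
    """Proximal-gradient solver for
    min_beta ||y - X beta||_2^2 + lam * sum_l sqrt(p_l) * ||beta_l||_2,
    where `groups` is a list of index lists, one per group."""
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        z = beta - step * 2.0 * X.T @ (X @ beta - y)
        for g in groups:
            thresh = step * lam * np.sqrt(len(g))
            norm = np.linalg.norm(z[g])
            # block soft-thresholding: an entire group can drop to zero
            z[g] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * z[g]
        beta = z
    return beta
```

The block threshold zeroes a whole group at once, which is exactly the group-level drop-out behavior described above.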

2.5 Sparse Group LASSO

In the above algorithm, however, we cannot achieve sparsity within groups: if the coefficient vector for a group does not shrink to zero, all the member variables within the group will be non-zero. In our application, we need a regularization that yields sparsity both at the group level and at the individual variable level. To achieve this, we adopt the sparse group lasso model proposed by Noah Simon et al., which provides a more general criterion that also works for the standard group lasso with non-orthonormal model matrices. Consider the sparse group lasso criterion

\[
\min_{\beta \in \mathbb{R}^p} \; \Big\| y - \sum_{l=1}^{L} X_l \beta_l \Big\|_2^2 + \lambda_1 \sum_{l=1}^{L} \sqrt{p_l}\, \|\beta_l\|_2 + \lambda_2 \|\beta\|_1 , \tag{3}
\]

where β = (β_1, β_2, β_3, …, β_L) is the entire parameter vector.
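The proximal operator of the combined penalty in (3) factors into an elementwise soft-threshold followed by a group soft-threshold, which can be sketched as follows (a small illustration; names are ours):

```python
import numpy as np

def sgl_prox(z, groups, t_group, t_l1):
    """Proximal operator of t_group * sum_l ||b_l||_2 + t_l1 * ||b||_1:
    elementwise soft-threshold first, then block soft-threshold each group."""
    out = np.sign(z) * np.maximum(np.abs(z) - t_l1, 0.0)
    for g in groups:
        n = np.linalg.norm(out[g])
        out[g] = 0.0 if n <= t_group else (1.0 - t_group / n) * out[g]
    return out
```

This two-stage prox is what yields sparsity at both levels: the elementwise step can zero individual variables inside a surviving group, while the block step can still remove a group entirely.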


2.6 Review of Sparse Reconstruction Methods

Sparse coding has been successfully applied to a variety of problems in computer

Table 4: Overview of Sparse coding algorithms

  Type         Algorithm                                                  Authors
  Greedy       Matching Pursuit (MP)                                      Mallat et al., 1993 (Mallat 1993)
               Orthogonal Matching Pursuit (OMP)                          Pati et al., 1993 (Pati, Rezaiifar, and Krishnaprasad 1993)
               Stage-wise OMP (StOMP)                                     Donoho et al., 2006 (Donoho et al. 2006)
               Least Angle Regression (LARS)                              Efron et al. (Efron et al. 2015)
  Iterative    Focal Underdetermined System Solver (FOCUSS)               Gorodnitsky & Rao, 1997 (Gorodnitsky and Rao 1997)
               Iterated Shrinkage                                         Elad et al., 2006 (M. Elad et al. 2007)
               Block Coordinate Relaxation (BCR)                          Sardy et al. (Sardy, Bruce, and Tseng 2012)
  Relaxation   Basis Pursuit (BP)                                         Chen et al., 1998 (Chen, Donoho, and Saunders 1998)
               Least Absolute Shrinkage and Selection Operator (LASSO)    Tibshirani, 1996 (Tibshirani 1996)

vision, including image denoising (Michael Elad and Aharon 2006), image restoration (Mairal, Sapiro, and Elad 2008), and image classification (Ramirez, Sprechmann, and Sapiro 2010). Given a signal x ∈ R^n and a set of basis elements {d_1, …, d_K}, known as a dictionary and represented by D ∈ R^{n×K}, a sparse coding algorithm finds γ ∈ R^K that represents the signal x as a linear combination of a few basis elements of the dictionary D. The representation of the signal may be exact,

\[
x = D\gamma, \tag{4}
\]

or approximate, satisfying certain criteria:

\[
\|x - D\gamma\|_p \le \epsilon, \quad \text{where } p = 1, 2, \text{ or } \infty. \tag{5}
\]

If n < K and D is a full-rank matrix, then there are infinitely many solutions, so a constraint must be imposed on these problems. Many variants of sparse coding algorithms have been proposed in the literature, with minor variations in the way these constraints are imposed. Equations (6), (7), and (8) list three such variants:

\[
\min_{\gamma} \|\gamma\|_0 \quad \text{s.t.} \quad x = D\gamma, \tag{6}
\]

\[
\min_{\gamma} \|\gamma\|_0 \quad \text{s.t.} \quad \|x - D\gamma\|_p \le \epsilon, \tag{7}
\]

and

\[
\min_{\gamma} \|x - D\gamma\|_p \quad \text{s.t.} \quad \|\gamma\|_0 \le T, \tag{8}
\]

where ‖·‖_0 is the L0 norm, which counts the non-zero entries of a vector, and ‖·‖_p is the p-th norm with p = 1, 2, or ∞. Exact determination of sparse representations is proven to be an NP-hard problem (Natarajan, 1995). Several algorithms have been proposed in the literature to solve the above sparse coding problem, usually by relaxing the L0 penalty to

Table 5: Pseudo Code for Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP)

Input: signal x, dictionary D, and sparsity threshold T
Output: sparse coefficients γ
begin
  initialization:
    1. active set S = ∅;
    2. coefficients γ = 0; and
    3. residual r_0 = x
  for i = 1 to T do
    4. select the atom that reduces the objective:
         î ← arg max_{j ∈ S^C} |⟨d_j, r_{i−1}⟩|
    5. update the active set: S = S ∪ {î}
    6. update the coefficients
         Matching Pursuit (MP) — updates only the coefficient corresponding to the selected atom:
           γ[î] ← ⟨d_î, r_{i−1}⟩
         Orthogonal Matching Pursuit (OMP) — updates the coefficients of the entire active set:
           γ_S ← (D_S^T D_S)^{−1} D_S^T x
    7. update the residual:
         r_i ← x − Dγ
  end for
end

use the convex L1 penalty instead, as in the basis pursuit method (Chen et al., 1998). The Focal Underdetermined System Solver (FOCUSS) (Gorodnitsky & Rao, 1997) is very similar, using the L_p norm with p < 1 as a replacement for the L0 norm. Another approach is to use greedy algorithms such as Matching Pursuit (Mallat, 1993) or Orthogonal Matching Pursuit (OMP) (Pati et al., 1993), which select the dictionary atoms sequentially. In this work, we chose OMP to solve the sparse coding problem, with computational efficiency in mind.

All of the above sparse coding algorithms work with a known set of basis elements D, which can be either analytical, i.e., a pre-specified set of functions, or learned from the data. Pre-defined sets of functions, such as overcomplete curvelets, are chosen for their simplicity, and preference is generally given to tight frames, for which the pseudo-inverse can be easily computed. However, finding a sparse representation for a signal depends heavily on the quality and type of the dictionary D, and selecting pre-defined functions might not always yield an accurate sparse representation. In this work we aim to learn these dictionaries from the data to yield sparse representations of the signal, instead of using predefined dictionaries.

2.7 Review of Dictionary Learning

2.7.1. K-Singular Value Decomposition (K-SVD)

The pseudo code for the K-Singular Value Decomposition (K-SVD) algorithm is presented in Table 6. K-SVD is an iterative approach that alternates between two steps. First, the representation problem (equation (4) or equation (5)) is minimized with respect to Γ for a fixed D: given D, a sparse coding algorithm such as Orthogonal Matching Pursuit (Table 5) computes the sparse representation Γ of the signals X. The next step computes the atoms of the dictionary by minimizing

\[
\langle D \rangle = \arg\min_{D} \|X - D\Gamma\|_F^2 \quad \text{s.t.} \quad \forall i,\; \|\gamma_i\|_0 \le T.
\]

The above equation can be simplified to solve for one dictionary atom at a time,

\[
\langle d_k \rangle = \arg\min_{d_k} \left\| E_k - d_k \gamma_k^T \right\|_F^2 , \quad \text{where} \quad E_k = X - \sum_{j=1,\, j \ne k}^{K} d_j \gamma_j^T .
\]

This is solved using the Singular Value Decomposition (SVD) after restricting E_k to the columns corresponding to the examples that currently use d_k in their representation.
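The two alternating stages can be sketched as follows (a minimal illustration with a plain OMP coder; initialization and stopping details are simplified, and all names are ours):

```python
import numpy as np

def _omp(x, D, T):
    """Plain OMP used for the sparse-coding stage."""
    active, r = [], x.copy()
    gamma = np.zeros(D.shape[1])
    for _ in range(T):
        i = int(np.argmax(np.abs(D.T @ r)))
        if i not in active:
            active.append(i)
        coef, *_ = np.linalg.lstsq(D[:, active], x, rcond=None)
        r = x - D[:, active] @ coef
    gamma[active] = coef
    return gamma

def ksvd(X, K, T, iters=20, seed=0):
    """Minimal K-SVD sketch: alternate OMP sparse coding with rank-1 SVD
    updates of one atom at a time.  X is (n, N); returns (D, Gamma)."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(X.shape[0], K))
    D /= np.linalg.norm(D, axis=0)
    Gamma = np.zeros((K, X.shape[1]))
    for _ in range(iters):
        # sparse coding stage
        Gamma = np.column_stack([_omp(x, D, T) for x in X.T])
        # codebook update stage
        for k in range(K):
            users = np.nonzero(Gamma[k])[0]
            if users.size == 0:
                continue                     # atom unused this round
            # residual without atom k, restricted to the examples that use it
            E = X[:, users] - D @ Gamma[:, users] + np.outer(D[:, k], Gamma[k, users])
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                # first left singular vector
            Gamma[k, users] = S[0] * Vt[0]   # matching coefficients
    return D, Gamma
```

Updating the coefficients together with the atom, as the rank-1 SVD does, is what keeps the representations consistent between the two stages.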

2.7.2. Online Dictionary Learning

The online dictionary learning method (Mairal, Bach, et al. 2009), is based on stochastic approximations, and can easily scale up to millions of training examples. The dictionary learning problem is cases as the optimization of a smooth nonconvex objective function over a convex set by minimizing the expected cost when the training goes to infinity.

Table 6: Pseudo Code for K-SVD algorithm

Input: signal matrix X, sparsity threshold T, and number of iterations m
Output: dictionary D and sparse coefficients Γ
begin
  initialization:
    1. set D(0) ∈ R^{n×K} = DCT(n, K) or randn(n, K),
    2. iteration counter J = 1; and
    3. N = number of examples.
  repeat (until convergence or for some fixed number of iterations m):
    Sparse Coding Stage (use any sparse coding algorithm to compute Γ):
      4. for i = 1 to N
      5.   min_{γ_i} ‖x_i − Dγ_i‖_2 s.t. ‖γ_i‖_0 ≤ T
      6. end
    Codebook Update Stage:
      7. for k = 1 to K
      8.   define the group of examples that use d_k:
             ω_k = {i | 1 ≤ i ≤ N, Γ(k, i) ≠ 0},
      9.   compute E_k = X − Σ_{j=1, j≠k}^{K} d_j γ_j^T,
      10.  restrict E_k by choosing only the columns corresponding to ω_k, obtaining E_k^R,
      11.  apply the SVD decomposition E_k^R = UΔV^T,
      12.  update d_k = u_1 and γ_k^T = Δ(1,1)·v_1^T
      13. end
      14. set J = J + 1
  end
end

This iterative online algorithm solves the problem by minimizing, at each step, a quadratic surrogate of the empirical cost over the constraint set. The pseudo code of the algorithm is presented in Table 7.


2.7.3. Label Consistent Dictionary Learning for Classification

The sparse coefficients Γ can now be used directly as a feature vector for classification. An effective classifier f(γ) can be obtained by determining the model parameters W ∈ R^{m×K} satisfying

\[
\langle W \rangle = \arg\min_{W} \sum_i \mathcal{L}\{h_i, f(\gamma_i, W)\} + \lambda \|W\|_F^2 , \tag{9}
\]

Table 7: Pseudo code for Online Dictionary Learning

Input: signal matrix X, λ ∈ R, and number of iterations T
Output: dictionary D and sparse coefficients Γ
begin
  initialization:
    1. set D(0) ∈ R^{n×K} = DCT(n, K) or randn(n, K)
  2. for t = 1 to T do
    3. draw x_t
    4. sparse coding:
         γ_t ← arg min_{γ ∈ R^K} (1/2)‖x_t − D_{t−1}γ‖_2² + λ‖γ‖_1
    5. dictionary update:
         D_t ← arg min_{D ∈ C} (1/t) Σ_{i=1}^{t} [ (1/2)‖x_i − Dγ_i‖_2² + λ‖γ_i‖_1 ]
    6. end
end

where ℒ is the classification loss function, h_i are the labels (seed point / not a seed point), and λ is a regularization term incorporated to prevent overfitting. Widely used loss functions include the logistic, hinge, and quadratic losses. We used a linear predictive classifier in our study, so f(γ, W) = Wγ. However, separating the dictionary learning from the classifier learning makes D suboptimal for classification.

It is possible to jointly learn the dictionary and classification model by solving the following optimization,

\[
\langle D, W, \Gamma \rangle = \arg\min_{D, W, \Gamma} \; \|X - D\Gamma\|_F^2 + \beta \|H - W\Gamma\|_F^2 \quad \text{s.t.} \quad \forall i,\; \|\gamma_i\|_0 \le T, \tag{10}
\]

where H = {h_1, …, h_N} ∈ R^{m×N} denotes the class labels for N examples and m classes.

These approaches require relatively large dictionaries to achieve good classification, making it difficult to set the dictionary size (퐾). By introducing the label consistency term, a concise unified dictionary can be learned as explained in the next section.

The Label Consistent KSVD (LC-KSVD) algorithm learns the classifier model jointly with the dictionary. The resulting dictionary enables reliable classification using only a small, unified dictionary and a single multi-class linear classifier. The central idea behind this approach is to add two additional terms to the objective function for learning the dictionary, specifically a label consistency term, and a classification error term. The dictionary learned in this manner will adapt to the underlying structure of the training data, and generate discriminative sparse codes regardless of the size of the dictionary. Let

Q = {q_1, …, q_N} ∈ R^{K×N} denote such a discriminative set of sparse codes, let A denote a linear transformation matrix that transforms the original sparse codes to the most discriminative sparse codes in the sparse feature space R^K, and let H = {h_1, …, h_N} ∈ R^{m×N} denote the class labels for the N examples and m classes. With this notation, the LC-KSVD algorithm can be expressed as the following expanded optimization problem,

\[
\langle D, W, A, \Gamma \rangle = \arg\min_{D, W, A, \Gamma} \; \|X - D\Gamma\|_F^2 + \alpha \|Q - A\Gamma\|_F^2 + \beta \|H - W\Gamma\|_F^2 \quad \text{s.t.} \quad \forall i,\; \|\gamma_i\|_0 \le T, \tag{11}
\]

where the first term ‖X − DΓ‖_F² represents the squared reconstruction error. The second term ‖Q − AΓ‖_F² represents the discriminative sparse-coding error; it penalizes sparse codes Γ that deviate from the discriminative sparse codes Q, and intuitively forces signals from the same class to have similar representations. For example, if q_i is the discriminative sparse code corresponding to the input signal x_i, then the non-zero values of q_i occur at those indices where the input signal x_i and the dictionary element d_k share the same label. The third term ‖H − WΓ‖_F² represents the classification error, where H = {h_1, h_2, …, h_N} ∈ R^{m×N} is the matrix of class labels for N samples and m classes. As a concrete example, for a two-class problem, suppose that the size of the dictionary K is 4, i.e., D = {d_1, …, d_4}, and the number of examples N is 4, i.e., X = {x_1, …, x_4}. To create a discriminative dictionary, we want the examples of Class 1 (x_1, x_2) to use the first two dictionary elements {d_1, d_2}, and the examples of Class 2 (x_3, x_4) to use the remaining elements {d_3, d_4}. Therefore, the Q matrix can be written as

\[
Q = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}. \tag{12}
\]

The matrix Q is set automatically. For the above Q matrix the label matrix can be written as


\[
H = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}. \tag{13}
\]

The dictionary learned in this manner has excellent representational power and enforces strong discrimination. Importantly, the dictionary is selected to provide a sparse representation of the image: we are interested in constructing a dictionary such that a small number of its atoms can be linearly combined to represent the image. Following (Q. Zhang and Li 2010), the above optimization can be re-written as

\[
\langle D, W, A, \Gamma \rangle = \arg\min_{D, W, A, \Gamma} \left\| \begin{pmatrix} X \\ \sqrt{\alpha}\, Q \\ \sqrt{\beta}\, H \end{pmatrix} - \begin{pmatrix} D \\ \sqrt{\alpha}\, A \\ \sqrt{\beta}\, W \end{pmatrix} \Gamma \right\|_F^2 \quad \text{s.t.} \quad \forall i,\; \|\gamma_i\|_0 \le T. \tag{14}
\]

This can be recast concisely as

\[
\langle D', \Gamma \rangle = \arg\min_{D', \Gamma} \|X' - D'\Gamma\|_F^2 \quad \text{s.t.} \quad \forall i,\; \|\gamma_i\|_0 \le T, \tag{15}
\]

where

\[
X' = \begin{pmatrix} X \\ \sqrt{\alpha}\, Q \\ \sqrt{\beta}\, H \end{pmatrix}, \qquad D' = \begin{pmatrix} D \\ \sqrt{\alpha}\, A \\ \sqrt{\beta}\, W \end{pmatrix}.
\]

The above optimization is conducted using the Orthogonal Matching Pursuit (OMP) algorithm. Given the nature of the L0 norm, the non-zero values of q_i in the solution occur at those indices where the input signal x_i and the dictionary element d_k share the same label. We then solve equation (4) using the K-SVD method. The matrix D′ is L2-normalized column-wise. We then extract the matrices D = {d_1, …, d_K}, A = {a_1, …, a_K}, and W = {w_1, …, w_K} from D′. However, we cannot simply use D, A, and W for classifying new image patches, since they are jointly normalized in D′, i.e., ‖[d_k^T, √α a_k^T, √β w_k^T]^T‖_2 = 1. The normalized dictionary D̂, transform parameters Â, and classifier parameters Ŵ are therefore computed as

\[
\hat{D} = \left\{ \frac{d_1}{\|d_1\|_2}, \ldots, \frac{d_K}{\|d_K\|_2} \right\}, \quad
\hat{A} = \left\{ \frac{a_1}{\|d_1\|_2}, \ldots, \frac{a_K}{\|d_K\|_2} \right\}, \quad
\hat{W} = \left\{ \frac{w_1}{\|d_1\|_2}, \ldots, \frac{w_K}{\|d_K\|_2} \right\}. \tag{16}
\]
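The splitting and renormalization of equations (14) – (16) can be sketched as follows (names are ours; D′ is assumed column-normalized as described above):

```python
import numpy as np

def split_and_renormalize(D_prime, n, K, m, alpha, beta):
    """Recover D-hat, A-hat, W-hat from the L2-normalized stacked dictionary
    D' = [D; sqrt(alpha) A; sqrt(beta) W]  (equations (14)-(16)).

    D_prime: (n + K + m, K) array with unit-norm columns.
    """
    D = D_prime[:n, :]
    A = D_prime[n:n + K, :] / np.sqrt(alpha)
    W = D_prime[n + K:, :] / np.sqrt(beta)
    norms = np.linalg.norm(D, axis=0)        # ||d_k||_2, per column
    return D / norms, A / norms, W / norms
```

Dividing A and W by the norms of the corresponding D columns (rather than their own norms) is what keeps the classifier consistent with the renormalized dictionary, as equation (16) requires.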


CHAPTER 3 PROPOSED METHODS

3.1 Population-scale Three-dimensional Reconstruction and Quantitative Profiling of Microglia Arbors

Microglia consist of branching processes that form a tree, and the processes stem from a central soma that envelops the cell nucleus. Detecting the soma is the first step in reconstructing the microglia. For this, we proceed in two steps. First, we perform an automated segmentation of the Hoechst image channel (the nuclear label) to detect all cell nuclei (Al-Kofahi et al., 2010). From the segmentation results, we compute a set of features for each nucleus, including the 3-D location, volume, shape factor, and chromatin intensity variance (a texture measure). Next, we compute the total amount of the microglial marker IBA1 within a distance of 8 voxels of each nucleus (Bjornsson et al. 2008). We then isolate the microglial cell nuclei based on these features, using an active-learning-based classifier (Padmanabhan et al. 2014). Microglial cell somas that extend beyond the nuclei are segmented using a level-set algorithm. The centroids of the somas are used as the starting points for the tracing algorithms.

The soma segmentations allow us to eliminate any false seed points that lie within the soma. The remaining seeds are used to reconstruct the cellular arbors. As noted above, individual microglia are known a priori to have a tree topology. Furthermore, microglia do not form inter-connections with neighboring cells (Rouchdy et al. 2011). These constraints enable us to develop fast and scalable algorithms appropriate for tracing the arbors of these cells. The tree assumption is also helpful from the standpoint of extracting quantitative measurements of arbors, since existing computational neuroanatomy tools, e.g., the L-measure (Scorcioni, Polavaram, and Ascoli 2008), expect a tree topology. Based on these considerations, we designate the centroids of the microglial somas as the root points for their respective cellular arbors. Starting from all root points, we simultaneously construct a forest of R minimum spanning trees (MSTs), where R is the total number of detected root points. Each MST consists of nodes and directed edges. The nodes for a single cell include the root point, a subset of the seed points (referred to here as the primary nodes and denoted v ∈ V0), and all intermediate pixels, which are termed secondary nodes. The secondary nodes are used for calculating the costs associated with their neighbors (as explained further below).

Each node is identified by its 3-D coordinates in the image. The directed edges, denoted eij ∈ E, link the nodes, and their orientations always point towards the root point.

Each edge is associated with a weight value that is used to determine whether or not a given pixel belongs in the arbor reconstruction. The process of growing each MST is based on an adaptation of Prim's algorithm (Mariano et al. 2013; Teacher and Griffiths 2011). Starting from each root node, our algorithm connects each successive node to the nearest primary node (in the sense of the cost metric defined below) to form an edge (link). This process is repeated at each added node. The algorithm iterates as it expands all the trees, starting from their respective roots concurrently, to cover all the primary nodes V0. For each cell, the tree-growing procedure is stopped when a cumulative cost function (described next) exceeds a preset threshold. Once the computation of the MSTs is complete, we obtain a forest of trees T_k = (V_k, E_k), where k is the index of a tree, V_k is the list of all nodes belonging to tree k, and E_k is the list of all edges belonging to tree k.

Each tree captures the morphology of a single microglial cell.
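The concurrent tree-growing procedure can be illustrated schematically as a shortest-path forest with a cumulative-cost stopping rule. This is a generic sketch with placeholder costs, not the image-based cost terms of our method:

```python
import heapq

def grow_forest(roots, neighbors, edge_cost, max_cost):
    """Grow one tree per root concurrently; each node joins the tree that
    reaches it with the lowest cumulative cost, and growth stops once the
    cumulative cost exceeds max_cost."""
    parent = {r: None for r in roots}
    label = {r: r for r in roots}           # which root's tree a node belongs to
    acc = {r: 0.0 for r in roots}           # cumulative cost from the root
    heap = [(0.0, r) for r in roots]
    heapq.heapify(heap)
    while heap:
        c, u = heapq.heappop(heap)
        if c > acc.get(u, float("inf")):
            continue                        # stale heap entry
        for v in neighbors(u):
            nc = c + edge_cost(u, v)
            if nc <= max_cost and nc < acc.get(v, float("inf")):
                acc[v], parent[v], label[v] = nc, u, label[u]
                heapq.heappush(heap, (nc, v))
    return parent, label
```

Because all roots share one priority queue, every pixel is claimed by whichever cell reaches it most cheaply, which is also how microglia split across microscope fields can be resolved consistently.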

Quantitative tissue-scale microglia profiling requires three-dimensional (3-D) imaging of large numbers of cells spread across significant lateral and axial extents of brain tissue (multi-millimeter lateral extent, and 100 – 300 μm depth). For the experiments reported here, coronal sections of 4% paraformaldehyde-fixed rat brain tissue from the motor cortex region were cut into 100 μm thick sections and fluorescently labeled for the microglia-specific protein IBA1 to highlight microglia, and with the DNA stain Hoechst to highlight all nuclei. A Rolera EM-C2 camera (QImaging, Surrey, Canada; 1,004×1,002 pixels) mounted on an Olympus DSU spinning-disk confocal microscope with a ×30 objective (N.A. 1.05, silicone oil) was used for imaging individual 3-D image fields of interest (resolution of 0.267 μm/pixel, 14 bits/pixel (x, y), z-step size of 0.3 μm), and also to record a series of overlapping 3-D images for mapping tissue regions that are larger than the field of the microscope. For example, the dataset in Fig. 1 is a mosaic constructed from 12 rows and 7 columns of overlapping image tiles, each covering a 267 μm square region. The tiles are collected with a 15 – 20% overlap, spanning a 2.615 mm × 1.550 mm × 0.1 mm region of tissue. The image tiles were combined into a 3-D montage of the whole field using a previously published method, and displayed in Fig. 1A (Tsai et al. 2011). In addition to these large montages, several randomly selected fields from different animals were analyzed.

3.1.1. Sparsity Based Modeling of Microglia Processes

Given 3-D images I(x, y, z), our method first learns the structure of microglial processes from actual images to construct a computational "dictionary-based" model of microglial processes, and then utilizes this model to reconstruct the arbors. The term "dictionary" refers to a set of basis vectors that can be combined linearly to represent the given image.


These basis vectors are learned from image patches that are labeled as being foreground (microglial processes) or background. Several authors have shown that learning a dictionary directly from images, rather than using a predetermined analytical model, can lead to more effective image representations, and can therefore provide improved results in many practical image-processing applications such as restoration, segmentation, and classification (Rubinstein, Bruckstein, and Elad 2010). Our modeling approach is based on the sparse over-complete dictionary learning method described by Q. Zhang and Li (2010), which was inspired by the K-singular value decomposition (K-SVD) algorithm (Aharon, Elad, and Bruckstein 2006). Specifically, Q. Zhang and Li (2010) described an extension of the K-SVD method, known as the label-consistent K-singular value decomposition (LC-KSVD) method. In this method, small image patches that are labeled to indicate classes (e.g., foreground / background) are used as training examples. The patches in our work are quite small, typically 15 × 15 × 3 voxels for the current image stacks. These labeled image patches are extracted from representative training images. The LC-KSVD algorithm has the important advantage of simultaneously and consistently learning a single discriminative dictionary (typically more compact than that produced by the K-SVD method) and a linear classifier. In our work, the dictionary typically consists of 675- to 2,000-dimensional vectors (atoms). A mathematical description of our approach is presented next.

Let X = {x_1, …, x_N} ∈ ℝ^{n×N} denote a set of image patches drawn from the 3-D image I(x, y, z), where n denotes the number of voxels in each patch, and N denotes the total number of image patches. We wish to learn an over-complete dictionary, denoted D = {d_1, …, d_K} ∈ ℝ^{n×K} (K > n, making the dictionary over-complete), containing K basis elements, usually referred to as "atoms," for building a sparse representation of X. The patch data X is approximated by DΓ, where Γ = {γ_1, …, γ_N} ∈ ℝ^{K×N} is a matrix that is chosen to respect a sparsity constraint. Specifically, we are interested in representations that minimize the number of non-zero entries in γ_i for representing signal x_i. In practice, the number of non-zero entries is a user-selected parameter that is known as the sparsity constraint, and denoted T. This requires solving the following optimization,

⟨D, Γ⟩ = argmin_{D,Γ} ‖X − DΓ‖²_F ,    (17)

s.t. ∀ i, ‖γ_i‖₀ ≤ T.

The term ‖X − DΓ‖²_F denotes the squared signal reconstruction error, where F denotes the Frobenius norm. The reconstruction error is minimized subject to the L0 sparsity constraint ‖γ_i‖₀ ≤ T. The K-SVD algorithm is an iterative approach for implementing the optimization in equation (17), alternating between a sparse coding step (in which D is held fixed) and a dictionary update step (in which the atoms of D are updated one at a time via a singular value decomposition). Given D, the sparse coding step computes the sparse representation Γ of the signals X by solving,

⟨Γ⟩ = argmin_Γ ‖X − DΓ‖²_F    s.t. ∀ i, ‖γ_i‖₀ ≤ T.    (18)

An exact determination of sparse representations is proven to be an NP-hard problem (Natarajan 1995). Several algorithms have been proposed in the literature to solve the above sparse coding problem, usually by relaxing the L0 penalty to the convex L1 penalty, as in the basis pursuit method (Chen, Donoho, and Saunders 1998). The Focal Underdetermined System Solver (FOCUSS) (Gorodnitsky and Rao 1997) is similar in spirit, using the Lp norm with p < 1 as a replacement for the L0 norm. Another approach is to use greedy algorithms like Matching Pursuit (Mallat 1993) or Orthogonal Matching Pursuit (OMP) (Pati, Rezaiifar, and Krishnaprasad 1993), which select the dictionary atoms sequentially. In this work, we chose OMP to solve the sparse coding problem with computational efficiency in mind.
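The greedy atom-selection idea is easy to illustrate. The sketch below is hypothetical illustration code, not the toolboxes used in this work: it implements plain Matching Pursuit, whereas full OMP additionally re-fits all selected coefficients by least squares after each atom is chosen. Atoms are assumed L2-normalized, as in the text.

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

// Inner product of two equal-length vectors.
static double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// D: K atoms, each of dimension n; residual: the signal x (copied, then
// reduced in place); T: sparsity constraint. Returns the K-vector of
// coefficients gamma with at most T nonzero entries.
std::vector<double> matchingPursuit(const std::vector<std::vector<double>>& D,
                                    std::vector<double> residual, int T) {
    std::vector<double> gamma(D.size(), 0.0);
    for (int t = 0; t < T; ++t) {
        // Pick the atom most correlated with the current residual.
        std::size_t best = 0;
        double bestCorr = 0.0;
        for (std::size_t k = 0; k < D.size(); ++k) {
            double c = dot(D[k], residual);
            if (std::fabs(c) > std::fabs(bestCorr)) { bestCorr = c; best = k; }
        }
        if (bestCorr == 0.0) break;  // residual orthogonal to all atoms
        gamma[best] += bestCorr;
        // Subtract the selected atom's contribution from the residual.
        for (std::size_t i = 0; i < residual.size(); ++i)
            residual[i] -= bestCorr * D[best][i];
    }
    return gamma;
}
```

For an orthonormal dictionary, this greedy loop already recovers the exact coefficients; the least-squares refit of OMP matters when atoms are correlated.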

Figure 4. Flowchart of the proposed Microglia arbor tracing algorithm

The next step is to classify the voxels as seed points using these sparse features. The sparse codes from equation (17) can be used directly for classification. An effective classifier f(γ) can be obtained by determining the model parameters W ∈ ℝ^{m×K}, satisfying,

⟨W⟩ = argmin_W Σ_i ℒ{h_i, f(γ_i, W)} + λ‖W‖²_F ,    (19)

where ℒ is the classification loss function, h_i are the labels (seed point / not a seed point), and λ is a regularization parameter that is incorporated to prevent overfitting. Widely used loss functions include the logistic, hinge, and quadratic functions. We used a linear predictive classifier in our study, so f(γ, W) = Wγ. However, separating the dictionary learning from the classifier learning makes D suboptimal for classification (Jiang, Lin, and Davis 2011).

Figure 5. A sample of 3-D image patch dictionary entries displayed as maximum-intensity projections, showing two distinct structures – ones that represent parts of microglial arbors (highlighted with red boxes), and others background.

It is possible to jointly learn the dictionary and classification models (Mairal, Ponce, et al. 2009; J. Yang, Yu, et al. 2010) by implementing the following optimization,

⟨D, W, Γ⟩ = argmin_{D,W,Γ} ‖X − DΓ‖²_F + β‖H − WΓ‖²_F ,    (20)

s.t. ∀ i, ‖γ_i‖₀ ≤ T,

where H = {h_1, h_2, …, h_N} ∈ ℝ^{m×N} denotes the class labels for N examples and m classes. These approaches require relatively large dictionaries to achieve good classification (Jiang, Lin, and Davis 2011), making it difficult to set the dictionary size (K). By introducing a label consistency term, a concise unified dictionary can be learned, as explained in the next section.

The Label Consistent K-SVD (LC-KSVD) algorithm learns the classifier model jointly with the dictionary. The resulting dictionary enables reliable classification using only a small, unified dictionary and a single multi-class linear classifier. The central idea behind this approach is to add two additional terms to the objective function for learning the dictionary, specifically a label consistency term and a classification error term. The dictionary learned in this manner adapts to the underlying structure of the training data, and generates discriminative sparse codes regardless of the size of the dictionary, as described by Jiang, Lin, and Davis (2011). Let Q = {q_1, q_2, …, q_N} ∈ ℝ^{K×N} denote such a discriminative set of sparse codes. Let A denote a linear transformation matrix that transforms the original sparse codes to the most discriminative sparse codes in the sparse feature space ℝ^K. Let H = {h_1, h_2, …, h_N} ∈ ℝ^{m×N} denote the class labels for the N examples for m classes. With this notation, the LC-KSVD algorithm can be expressed as the following expanded optimization problem,

⟨D, W, A, Γ⟩ = argmin_{D,W,A,Γ} ‖X − DΓ‖²_F + α‖Q − AΓ‖²_F + β‖H − WΓ‖²_F ,    (21)

s.t. ∀ i, ‖γ_i‖₀ ≤ T,

where the first term ‖X − DΓ‖²_F represents the squared reconstruction error. The second term ‖Q − AΓ‖²_F represents the discriminative sparse-coding error. It is intended to penalize sparse codes Γ that deviate from the discriminative sparse codes Q. Intuitively, it forces signals from the same class to have similar representations. For example, if q_i is the discriminative sparse code corresponding to the input signal x_i, then the non-zero values of q_i occur at those indices where the input signal x_i and the dictionary element d_k share the same label. The third term ‖H − WΓ‖²_F represents the classification error, and H = {h_1, h_2, …, h_N} ∈ ℝ^{m×N} is the matrix of class labels for N samples and m classes. As a concrete example, for a two-class problem, suppose that the size of the dictionary K is 4, i.e., D = {d_1, …, d_4}, and the number of examples N is 4, i.e., X = {x_1, …, x_4}. To create a discriminative dictionary, we want the examples of Class 1 (x_1, x_2) to use only the first two dictionary elements (d_1, d_2), and the examples of Class 2 (x_3, x_4) to use only the remaining elements (d_3, d_4). Therefore, the Q matrix can be written as

Q = [ 1 1 0 0
      1 1 0 0
      0 0 1 1
      0 0 1 1 ].    (22)

The matrix Q is set automatically from the training labels. For the above Q matrix,

H = [ 1 1 0 0
      0 0 1 1 ].    (23)

The dictionary learned in this manner has excellent representational power, and enforces strong discrimination between the two classes (e.g., seed points vs. non-seed points), as detailed later. Importantly, the dictionary is selected to provide a sparse representation of the image. We are interested in constructing a dictionary such that a small number of its atoms can be linearly combined to represent the image. Following Q. Zhang and Li (2010), the above optimization can be re-written as

⟨D, W, A, Γ⟩ = argmin_{D,W,A,Γ} ‖ [X; √α·Q; √β·H] − [D; √α·A; √β·W] Γ ‖²_F ,    (24)

s.t. ∀ i, ‖γ_i‖₀ ≤ T,

where [·; ·; ·] denotes vertical (row-wise) stacking.

This can be recast concisely as


⟨D′, Γ⟩ = argmin_{D′,Γ} ‖X′ − D′Γ‖²_F ,

s.t. ∀ i, ‖γ_i‖₀ ≤ T, where    (25)

X′ = [X; √α·Q; √β·H], and D′ = [D; √α·A; √β·W].

The above optimization is conducted using the Orthogonal Matching Pursuit (OMP) algorithm. Given the nature of the L0 norm, the non-zero values of q_i in the solution occur at those indices where the input signal x_i and the dictionary element d_k share the same label. We then solve equation (25) using the K-SVD method. The matrix D′ is L2-normalized column-wise. We then extract the matrices D = {d_1, …, d_K}, A = {a_1, …, a_K}, and W = {w_1, …, w_K} from D′. However, we cannot simply use D, A, and W for classifying new image patches, since they are jointly normalized in D′, i.e., ‖(d_k^T, √α·a_k^T, √β·w_k^T)^T‖₂ = 1. The normalized dictionary D̂, transform parameters Â, and classifier parameters Ŵ are therefore computed as

D̂ = { d_1/‖d_1‖₂ , … , d_K/‖d_K‖₂ };

 = { a_1/‖d_1‖₂ , … , a_K/‖d_K‖₂ }; and    (26)

Ŵ = { w_1/‖d_1‖₂ , … , w_K/‖d_K‖₂ }.
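The rescaling in equation (26) can be sketched as follows. This is an illustrative stand-in for the MATLAB implementation, with hypothetical names: each column index k is assumed to hold the atom d_k together with its associated a_k and w_k, and every block is divided by the L2 norm of d_k alone.

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

// Holds the normalized dictionary, transform, and classifier columns.
struct NormalizedModel {
    std::vector<std::vector<double>> D, A, W;  // columns indexed by atom k
};

// Rescale each column k of D, A, and W by ||d_k||_2, as in equation (26).
NormalizedModel normalizeColumns(const std::vector<std::vector<double>>& D,
                                 const std::vector<std::vector<double>>& A,
                                 const std::vector<std::vector<double>>& W) {
    NormalizedModel out{D, A, W};
    for (std::size_t k = 0; k < D.size(); ++k) {
        double norm = 0.0;
        for (double v : D[k]) norm += v * v;
        norm = std::sqrt(norm);
        if (norm == 0.0) continue;             // skip degenerate atoms
        for (double& v : out.D[k]) v /= norm;  // d_k / ||d_k||
        for (double& v : out.A[k]) v /= norm;  // a_k / ||d_k||
        for (double& v : out.W[k]) v /= norm;  // w_k / ||d_k||
    }
    return out;
}
```

Note that A and W are divided by the norm of the dictionary block, not by their own norms, which is what undoes the joint normalization of D′.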

The inputs to the seed detection algorithm are the set of training image patches X, and a corresponding label matrix H. The algorithm requires two adjustable parameters, α and β. The setting of these parameters is described in the Experimental Results section. They merely serve to adjust the relative importance of the respective penalty terms in equation (21). The outputs of the algorithm consist of the dictionary D, classifier weights W, transformation matrix A, and the sparse features Γ. The sample dictionary shown in Figure 5 shows two distinct classes despite the variability, corresponding to the arbor processes (IBA1+ foreground) and the background. Once the dictionary is learned from the training images, it can be applied to seed detection in other (non-training) images that are collected using the same imaging protocols. Given the dictionary D, the sparse features Γ can be computed using equation (18) by any sparse coding algorithm, including OMP.

Once the sparse features Γ and the classifier W are learned, the next step is to classify each voxel in the test image as a seed point or otherwise, based on the sparse image representation. For this, we first extract the image patch x_i surrounding the voxel in question from the test image I_t(x, y, z), and compute the sparse feature γ_i with respect to the learned dictionary D. Then, we classify the voxel using the linear classifier W as follows:

i) Compute Wγ_i = [w_1; w_2],

ii) Set V(x, y, z) = { 1 or w_1 (seed point)  if w_1 > w_2 ;  0 (background)  otherwise },


Figure 6. (A) Sample IBA1+ cell (grayscale) overlaid with initially detected seed points (red). (B) Close-up of boxed region in Panel A. The white arrows indicate some valid seed points, and the yellow arrows indicate some invalid seed points.

where V is the seed point image, whose voxel value is set to 1 or w_1 for voxels that are classified as seed points, and to 0 otherwise. In Section 4, we present experimental data showing that the seed points detected by the above algorithm (Figure 6) are sufficiently reliable and accurate for reconstructing microglial arbors.
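The two-class decision rule above amounts to a pair of inner products. A minimal sketch, with a hypothetical helper name, C++ like the rest of our implementation:

```cpp
#include <vector>
#include <cstddef>

// Seed-point decision rule: given the sparse code gamma of a voxel's patch
// and the learned linear classifier W (2 x K for the two-class
// foreground/background problem), compute W*gamma = [w1, w2] and mark the
// voxel as a seed point iff w1 > w2.
bool isSeedPoint(const std::vector<std::vector<double>>& W,  // 2 rows, length K
                 const std::vector<double>& gamma) {
    double w1 = 0.0, w2 = 0.0;
    for (std::size_t k = 0; k < gamma.size(); ++k) {
        w1 += W[0][k] * gamma[k];  // foreground score
        w2 += W[1][k] * gamma[k];  // background score
    }
    return w1 > w2;
}
```

The seed image V of the text would then store w1 (or 1) at the voxels where this predicate is true, and 0 elsewhere.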

3.1.2. Automated Microglia Arbor Reconstruction

Microglia consist of branching processes that form a tree, and the processes stem from a central soma that envelops the cell nucleus. Detecting the soma is the first step in reconstructing the microglia. For this, we proceed in two steps. First, we perform an automated segmentation of the Hoechst image channel (the nuclear label) to detect all cell nuclei (Al-Kofahi et al. 2010). From the segmentation results, we compute a set of features including the 3-D location, volume, shape factor, and chromatin intensity variance (a texture measure) for each nucleus. Next, we compute the total amount of the microglial marker IBA1 within a distance of 8 voxels of each nucleus (Bjornsson et al. 2008). We then isolate the microglial cell nuclei based on these features, using an active-learning based classifier (Padmanabhan et al. 2014). Microglial cell somas that extend beyond the nuclei are segmented using a level-set algorithm. The centroids of the somas are used as the starting points for the tracing algorithms.

The soma segmentations allow us to eliminate any false seed points that lie within the soma. The remaining seeds are used to reconstruct the cellular arbors. As noted above, individual microglia are known a priori to have a tree topology. Furthermore, microglia do not form inter-connections with neighboring cells (Rouchdy et al. 2011). These constraints enable us to develop fast and scalable algorithms that are appropriate for tracing the arbors of these cells. The tree assumption is also helpful from the standpoint of extracting quantitative measurements of arbors, since existing computational neuroanatomy tools, e.g., the L-measure (Scorcioni, Polavaram, and Ascoli 2008), expect a tree topology. Based on these considerations, we designate the centroids of the microglial somas as the root points for their respective cellular arbors. Starting from all root points, we simultaneously construct a forest of R minimum spanning trees (MSTs), where R is the total number of detected root points. Each MST consists of nodes and directed edges. The nodes for a single cell include the root point; a subset of the seed points, referred to here as the primary nodes and denoted v ∈ V0; and all intermediate pixels, which are termed secondary nodes. The secondary nodes are used for calculating the costs associated with their neighbors (as explained further below).


Each node is identified by its 3-D coordinates in the image. The directed edges, denoted eij ∈ E, link the nodes, and their orientations always point towards the root point.

Each edge is associated with a weight value that is used for determining whether or not a given pixel belongs in the arbor reconstruction. The process of growing each MST is based on an adaptation of Prim's algorithm (Mariano et al. 2013; Teacher and Griffiths 2011). Starting from each root node, our algorithm connects each successive node to the nearest primary node (in the sense of the cost metric defined below), to form an edge (link). This process is repeated at each added node. The algorithm iterates as it expands all the trees starting from their respective roots concurrently to cover all the primary nodes V0. For each cell, the tree-growing procedure is stopped when a cumulative cost function (described next) exceeds a preset threshold. Once the computation of the MSTs is complete, we obtain a forest of trees T_k = (V_k, E_k), where k is the index of a tree, V_k is the list of all nodes belonging to tree k, and E_k is the list of all edges belonging to tree k.

Each tree captures the morphology of a single microglial cell.

3.1.3. Dynamic Cost Metric for Linking Nodes

The tree-growing process described above is guided by a cost metric that takes into account the normalized weight, denoted w_i, of the classified seed points, and the spatial proximity of the points being considered for a connection. Intuitively, this method traces microglial processes by linking nodes along the shortest geodesic path through the brightest (IBA1+) voxels representing the image foreground. With this in mind, we compute our connection costs using the Fast Marching Method (FMM) (Sethian 1996), which provides an efficient way to identify geodesic paths from each node to the corresponding root node. The FMM provides a fast approximation to the stationary Hamilton-Jacobi equation, commonly referred to as the Eikonal equation. This non-linear partial differential equation is used for path-planning problems in non-homogeneous media. Given a domain Ω ⊂ ℝⁿ, the Eikonal equation is of the form,

‖∇C(x, y, z)‖ = F(x, y, z),    C(x₀) = 0,    (27)

where C(x, y, z) is the cost function, F(x, y, z) is a positive-valued function, ∇ denotes the gradient operator, and ‖·‖ is the Euclidean norm. In our formulation, the function F(x, y, z) is set to the inverse of the score given by the seed detection algorithm, instead of the voxel intensity.

In our FMM implementation, a vertex v_i from a growing MST T_k can be thought of as emanating "waves" that propagate through the image, with their speed of propagation proportional to the normalized weight given by the classifier. The wave propagating from the vertex v_i is allowed to continue to the next closest vertex v_j that is not part of any previously detected tree. The new vertex v_j and the edge e_j at this point are included in V_k and E_k, respectively, resulting in an expansion of the tree T_k. This process is repeated concurrently for all trees (one per microglial cell) until all the primary nodes in V0 are linked, or the minimum distance between all possible links exceeds a pre-defined cost threshold τ. Nodes that are deemed more expensive than this threshold are not added to a tree. This is a mechanism to prevent unreliable points from being erroneously added to the distal portions of microglia reconstructions. In other words, τ sets the overall sensitivity of the reconstruction method: a higher threshold makes the algorithm more sensitive to image intensity values, and vice versa.


3.1.4. Priority-queue Based Reconstruction Algorithm

In order to implement our modified version of Prim's algorithm, we use a priority-queue based approach. Each element of the queue is a voxel in the original input image. Both primary and secondary nodes are added to the queue. The priority of each element is determined by its cost, with lower costs indicating a higher priority. The queue is initialized with the root nodes, which are the centroids of the microglial somas. These initial elements are all assigned a cost of zero, meaning that they have the highest priority. Each of these initial points becomes the root of a separate tree, since they each belong to a separate microglial cell. Once the priority queue has been initialized, we use our modified version of Prim's algorithm to construct the arbors of the microglia. An overview of this procedure follows.

Each iteration of the graph construction algorithm consists of the following steps:

1. Remove the node n with the lowest cost from the queue. Steps 2 – 4 are only executed if it is a primary node that has not already been assigned to a tree.

2. Find the closest tree t to node n. Here "closest" is meant in the sense of minimizing our cost metric. We will refer to the node closest to n that already belongs to t as a leaf node.

3. Assign to tree t all the nodes along the path from n to the leaf node.

4. Update the cost of all the nodes along this path.

5. Search node n's neighborhood for other low-cost nodes and add them to the priority queue.

We repeat this procedure until the queue is empty.
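The priority-queue growth above can be sketched as a multi-source, Dijkstra-style expansion over the voxel adjacency graph. The code below is a simplified hypothetical illustration, not the production tracer: a single per-edge cost stands in for the FMM-based cost metric, each root seeds one tree, every reachable node is claimed by the tree that reaches it at the lowest cumulative cost, and expansion stops at the threshold τ.

```cpp
#include <vector>
#include <queue>
#include <limits>
#include <utility>
#include <functional>
#include <cstddef>

// adj[u] lists (v, edgeCost) pairs. roots gives one start node per tree.
// Returns, per node, the index of the tree that claimed it, or -1 if no
// tree reached it within the cost threshold tau.
std::vector<int> growForest(
        const std::vector<std::vector<std::pair<int,double>>>& adj,
        const std::vector<int>& roots, double tau) {
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> cost(adj.size(), INF);
    std::vector<int> tree(adj.size(), -1);
    // Min-heap of (cumulative cost, (node, tree index)).
    using Item = std::pair<double, std::pair<int,int>>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    for (std::size_t r = 0; r < roots.size(); ++r) {
        cost[roots[r]] = 0.0;               // roots have the highest priority
        pq.push({0.0, {roots[r], static_cast<int>(r)}});
    }
    while (!pq.empty()) {
        Item it = pq.top(); pq.pop();
        double c = it.first;
        int u = it.second.first, k = it.second.second;
        if (c > cost[u]) continue;          // stale queue entry
        tree[u] = k;                        // node claimed by tree k
        for (const auto& e : adj[u]) {
            double nc = c + e.second;
            if (nc < cost[e.first] && nc <= tau) {  // stop past threshold
                cost[e.first] = nc;
                pq.push({nc, {e.first, k}});
            }
        }
    }
    return tree;
}
```

On a five-node path with roots at both ends, each root claims its own half of the path, and lowering τ leaves the middle nodes unclaimed, mirroring how the threshold limits the distal extent of each reconstruction.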

Once an MST is generated for each microglial cell, interpolation is required to smooth the arbor representations. For this, all nodes that have been added to the MST are considered, with the exception of the root and leaf nodes. The location of each such intermediate node is updated to the average of its prior position, its parent's position, and the positions of its children. This averaging is weighted by the image intensity at each location, considering that brighter voxels are more likely to reside on arbors.

Finally, a pruning process is performed to reduce the “footprint” or artifacts of MSTs without losing important morphological details. In this pruning process, each node is marked as “active” or “inactive” before any pruning actually takes place. Initially, all nodes are marked inactive. A node is marked as active if any of the following conditions are met:

• The node is a branch point, having more than one child.

• The node is a leaf node, having no children.

• The node is a root node, having no parent.

• The node’s parent is inactive and its child is also inactive.

This prevents removal of consecutive nodes, which would otherwise result in a loss of morphological detail. A node is explicitly marked as inactive if it is a leaf node whose distance from the branch point is less than an empirically set minimum offshoot length. In our experiments, this parameter was set to 15 voxels. This procedure results in the removal of small branches that are likely tracing artifacts rather than genuine microglial arbor segments. Once all the nodes are marked, those marked as inactive are removed. Children of inactive parents are re-assigned to be children of their closest active ancestor. Finally, once pruning is complete, the MSTs are re-interpolated.
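The first three activity rules above can be sketched as follows. This is a hypothetical helper for illustration only; the offshoot-length test and the consecutive-inactive rule are omitted for brevity. The tree is given as a parent array, with -1 marking the root.

```cpp
#include <vector>
#include <cstddef>

// Mark nodes as active per three of the rules in the text:
// roots (no parent), leaves (no children), and branch points
// (more than one child) are kept; simple pass-through nodes are not.
std::vector<bool> markActive(const std::vector<int>& parent) {
    std::size_t n = parent.size();
    std::vector<int> childCount(n, 0);
    for (std::size_t i = 0; i < n; ++i)
        if (parent[i] >= 0) ++childCount[parent[i]];
    std::vector<bool> active(n, false);
    for (std::size_t i = 0; i < n; ++i) {
        if (parent[i] < 0) active[i] = true;            // root node
        else if (childCount[i] == 0) active[i] = true;  // leaf node
        else if (childCount[i] > 1) active[i] = true;   // branch point
    }
    return active;
}
```

Under these three rules alone, a node on a straight run (exactly one child, with a parent) is the only kind marked inactive, which is exactly the redundancy the pruning step targets.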


3.1.5. Feature Computation

Given the reality that no single number captures all aspects of a complex cellular arbor, it is necessary to compute libraries of arbor features. Scorcioni et al. (2008) have compiled a comprehensive library called the L-measure, containing approximately 130 features per arbor that quantify various aspects of arbor morphology (e.g., size and shape of the soma, number of stems, branching and tapering profiles, etc.), as shown in Figure 7.

Figure 7. Pictorial representation of Microglial Arbor Features.

3.1.6. DIADEM metric

To the best of our knowledge, there is little literature on validating the seed points produced by different algorithms. However, a few metrics are available for validating the traces, one of which is the DIADEM metric, which was used in the DIADEM Challenge (DIgital reconstructions of Axonal and DEndritic Morphology) to compare different algorithms for automatically tracing representative neuroscience data sets.

The DIADEM metric is designed to compare the test data with the ground truth (or "gold standard"), which is generated manually. This metric is quite popular, and uses a geometric measure based on a nearest-neighbor search to map the branch points of the test data to those in the ground truth.

Figure 8. Test Trees A and B are compared with the ground truth in yellow.

The algorithm moves up towards the root node, scoring the results based on correctly detected nodes and the length of the fibers. It searches for each target node within a predefined (cylindrical) region around the corresponding node in the gold standard; if the target node is within the threshold, it marks the node as a correct match, and if not, it marks it as either a missed node or an excess node. By default, the weight of a target node is its degree (i.e., the number of terminations to which the node leads). Terminal test nodes that do not match any node in the gold standard tree are excess nodes, counted as misses with weight 1. Once the algorithm finds all the correct, excess, and missed nodes, the metric is calculated as

Grand Score = (# correct matches) / (# nodes in gold standard + # excess nodes).
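As a worked example, the Grand Score reduces to a one-line computation (a hypothetical helper, ignoring the per-node degree weighting described above):

```cpp
// Grand Score as given in the text: correct matches divided by
// (gold-standard node count + excess node count).
double grandScore(int correctMatches, int goldNodes, int excessNodes) {
    return static_cast<double>(correctMatches) / (goldNodes + excessNodes);
}
```

A perfect reconstruction with no excess nodes scores 1.0; every excess node inflates the denominator and so lowers the score even when all gold-standard nodes are matched.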


3.1.7. Software Implementation Methods

The proposed arbor reconstruction algorithm is implemented in C++ using widely used open-source libraries, including ITK for common image processing steps (Ibanez et al. 2003) and OpenMP for parallel processing (Dagum and Menon 1998). We used Rubinstein's KSVD library (Rubinstein, Bruckstein, and Elad 2010) and Zhang's LC-KSVD library (Q. Zhang and Li 2010) in MATLAB for the dictionary learning steps. For representing the cell reconstructions, we used the open-source SWC file format (R. C. Cannon et al. 1998). The design of our algorithm guarantees that the arbor reconstructions will be trees (no cycles).

The algorithms described above can be used for reconstructing individual 3-D image fields from the confocal microscope that fit into the computer's memory. Analyzing large mosaic fields like Figure 1 requires a more scalable implementation that is not limited by the computer memory. One idea in this regard is to trace the cells in the individual tiles, and then stitch together the resulting traces across the image boundaries. This task is extremely difficult, and a reliable method does not exist. With this in mind, we developed an indefinitely scalable "dice & trace" implementation that works as follows. The seamless mosaic image is divided into small "dices." Each dice is a small 3-D image that is carved out from the mosaic, centered on one microglial cell, and just large enough to include the cell completely. Each dice may contain more than one cell, or parts thereof. The size of the dice can be easily specified; for the current experiments it was set to 600 × 600 × 300 voxels, chosen sufficiently large that no arbors are clipped while carving the image. We run the multi-arbor reconstruction algorithm over each dice, but only retain the reconstruction for the cell in the center of the dice. These retained traces are then assembled together to form the reconstruction results for the large mosaic image. This approach has the disadvantage of performing repeated tracing of cells, since the reconstructed non-center cells in each dice are discarded. This is compensated adequately by the fact that dices can be processed in parallel by assigning one computing thread per dice, giving this approach the advantage of unlimited scalability. The total time to compute each dice is roughly 5 – 8 minutes on a Dell PowerEdge server with Intel Xeon(R) 4870 2.4 GHz processors and 500 GB RAM, and this includes the time for seed detection and tracing in each dice. If this process were done sequentially using the traditional approach, the time taken for the image with 3,300 cells would be 412.5 hours; with our parallel approach, the time taken is 5.2 hours.
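The serial estimate above can be checked with simple arithmetic: at the midpoint of the 5 – 8 minute range (7.5 minutes per dice), 3,300 dices require 3,300 × 7.5 / 60 = 412.5 hours, matching the figure quoted in the text. A hypothetical helper:

```cpp
// Serial runtime estimate for the dice-and-trace workload:
// total hours = number of cells * minutes per dice / 60.
double serialHours(int numCells, double minutesPerDice) {
    return numCells * minutesPerDice / 60.0;
}
```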


Table 8: Pseudo code: Summary of the proposed seed point detection algorithm

Algorithm 1: Detect Seed Points

Goal: To obtain seed points, given the dictionary D and the classifier weights W.

Step 1: For each voxel in the image I(x, y, z), extract the window centered at the voxel. (To reduce the computational cost, we applied a binarized mask and overestimated the foreground by reducing the threshold value.)

Step 2: Obtain the sparse features using OMP by solving

⟨D′, Γ⟩ = argmin_{D′,Γ} ‖X′ − D′Γ‖²_F    s.t. ∀ i, ‖γ_i‖₀ ≤ T,

where X′ = [X; √α·Q; √β·H], and D′ = [D; √α·A; √β·W].

Step 3: Given the sparse codes and the classifier weights W, compute WΓ = [w_1; w_2] and classify the voxel as

1 or w_1 — Class 1 (arbors), if w_1 > w_2;
0 — Class 2 (background), otherwise.

Step 4: Update the seed points list.

Step 5: Filter out seed points that are not on the centerline of the arbor, by retaining only those seed points that are at the maximum of the intensity profile in the 15 × 15 × 3 window (the same size as the patch).


Figure 9. “Click and Trace” feature of Trace Editor.

3.1.8. Click and Trace

We have also developed a plugin in the FARSIGHT toolkit to trace individual cells whose nuclei were missed due to segmentation errors or poor image quality. For microglia whose nuclei were missed by the automated process, the user can manually initiate automatic arbor tracing by clicking on the soma center, and specifying how far out to trace and the level of imaging noise to tolerate, as shown in Figure 9.


3.2 Spectral Unmixing by Morphologically-constrained Dictionary Learning for Multiplex Fluorescence Microscopy

The proposed Morphologically Constrained Spectral Unmixing (MCSU) algorithm is inspired by recent progress in blind source separation (Abolghasemi, Ferdowsi, and Sanei 2012; Starck et al. 2005). It augments the constrained least squares (CLS) method with a morphological constraint to enable morphologically constrained spectral analysis. The rationale behind this is quite intuitive – if the spectral angle between a pair of fluors is small, then the morphological constraint should be given more importance, and vice versa. For this reason, we set a tradeoff parameter (explained below) adaptively based on the spectral angle. The second proposed algorithm, Morphologically Constrained Spectral Unmixing with Total Variation (MCSU-TV), adds the morphological constraint to the constrained sparse unmixing by variable splitting, augmented Lagrangian, and total variation (CSUnSAL-TV) method, following the approach of Iordache et al. (2012). In effect, we add the constraint that neighboring pixels not only have similar abundance matrices, but also correspond to similar local morphologies. Both MCSU and MCSU-TV learn a patch-based dictionary using dictionary learning algorithms (Aharon, Elad, and Bruckstein 2006; Mairal, Bach, et al. 2009) from the reference images, as shown in Figure 10. These patch-based dictionaries are learned from the reference images as explained in the following section.


Figure 10. Illustrating the ability of the learned dictionary atoms to capture the morphologies of structures for the channels. The first column shows the reference images (R1 – R8). The remaining columns show sample dictionary atoms.

3.2.1. Reference Images

Single-stain reference slides were obtained for each marker, and imaged using the Vectra Multispectral Imaging System (Perkin-Elmer, Waltham, MA). A 200× image was taken every 10 nm within the range of each filter to generate a multispectral image cube. The filters used were DAPI (440 nm – 680 nm range), FITC (520 nm – 680 nm range), Cy3 (570 nm – 690 nm range), Texas Red (580 nm – 700 nm range), and Cy5 (670 nm – 720 nm range). A spectral library was generated with the Nuance image analysis software (Perkin-Elmer) using a combination of single-stained reference slides and multiplexed slides to compensate for spectral interactions. These multispectral single-stained images were unmixed using their spectral signatures to derive the reference images. These reference images were then used to learn the distinct morphologies of different cells.

3.2.2. The Morphologically Constrained Spectral Unmixing (MCSU) Algorithm

Let $Y \in \mathbb{R}^{\lambda \times n^2}$ denote the observed (mixed) data, where each row $y_{i:} \in \mathbb{R}^{n^2}$ is an image of size $n \times n$ acquired at a specific spectral band $\lambda$. Let $S$ denote the matrix of fractional abundances. Given these mixtures, the linear model can be written as

$$Y = AS + \Sigma, \qquad (28)$$

where $A \in \mathbb{R}^{\lambda \times s}$ denotes the spectral signature matrix, $S \in \mathbb{R}^{s \times n^2}$ is the source image matrix whose row vectors $s_{i:}^T \in \mathbb{R}^{n^2}$ represent the unmixed (source) images, and $\Sigma$ is the additive noise. Usually the only known information is $Y$; the mixing matrix $A$ and the source images $S$ are estimated by solving

$$\min_{A,S} \|Y - AS\|_2^2, \quad \text{s.t.} \; \sum_{i=1}^{s} s_{i:} = 1, \; S \geq 0. \qquad (29)$$

The linear equations are usually solved with the singular value decomposition (SVD) method. However, in some cases, the spectral library A can be derived prior to the unmixing by computing the reference spectra.
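When the spectral library $A$ is known in advance, estimating $S$ in (29) reduces to a per-pixel nonnegative least-squares fit followed by a sum-to-one normalization. The following is an illustrative sketch of that baseline, not the thesis implementation; the function name and array layouts are assumptions:

```python
import numpy as np
from scipy.optimize import nnls

def unmix_known_library(Y, A):
    """Estimate nonnegative abundances S from mixtures Y = A S.

    Y : (bands, pixels) observed multispectral data
    A : (bands, sources) known spectral signatures
    Returns S : (sources, pixels), nonnegative, columns normalized to sum to 1.
    """
    n_src, n_pix = A.shape[1], Y.shape[1]
    S = np.zeros((n_src, n_pix))
    for p in range(n_pix):
        # nonnegativity constraint handled by NNLS, one pixel at a time
        S[:, p], _ = nnls(A, Y[:, p])
    col = S.sum(axis=0, keepdims=True)
    S /= np.where(col > 0, col, 1.0)  # project onto the sum-to-one constraint
    return S
```

For well-conditioned libraries and noiseless mixtures this recovers the abundances exactly; with spectrally close signatures it degrades, which is the motivation for the morphological constraint below.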

At the end of the dictionary learning stage, we have dictionaries $\{D_i\}_{i=1}^{s}$, one from each reference image, that capture the local morphology. Given these dictionaries, we reformulate the linear unmixing problem in (29) as

$$\min_{A,S,\{\gamma_{ij}\}} \left( \|Y - AS\|_2^2 + \lambda_D \sum_{j=1}^{s} \sum_{i=1}^{N} \left\| \Re_i s_{j:}^T - D_j \gamma_{ij} \right\|_2^2 \right), \qquad (30)$$

$$\text{s.t.} \; \sum_{i=1}^{s} s_{i:} = 1, \; S \geq 0, \; \|\gamma_{ij}\|_0 \leq T.$$

The first term in equation (30) minimizes the spectral deviation, as in linear unmixing. The second term is the morphological constraint that we add to take advantage of the known morphologies learned during the initialization stage. The vector $\Re_i s_{j:}^T \in \mathbb{R}^r$ represents the $i$-th patch, of size $\sqrt{r} \times \sqrt{r}$; for notational simplicity, the $i$-th patch is expressed as a multiplication of the operator $\Re_i \in \mathbb{R}^{r \times n^2}$ and the $j$-th source image $s_{j:}^T \in \mathbb{R}^{n^2 \times 1}$. The regularization parameter $\lambda_D$ controls the relative importance of the spectral and morphological constraints. Intuitively, this parameter should be set to a higher value when the spectral signatures of the reference images are close, giving more weight to the morphological constraint, and to a lower value when the spectral signatures are well separated, giving more weight to the spectral constraint. Section 3.2.4 explains how our method adaptively selects this parameter. Equation (30) can be simplified further as

$$\min_{a_{:j},\, s_{j:},\, \{\gamma_i\}} \left( \left\| E_j - a_{:j} s_{j:}^T \right\|_2^2 + \lambda_D \sum_{i=1}^{N} \left\| \Re_i s_{j:}^T - D_j \gamma_i \right\|_2^2 \right), \qquad (31)$$

$$\text{s.t.} \; \sum_{i=1}^{s} s_{i:} = 1, \; S \geq 0, \; \|\gamma_i\|_0 \leq T,$$

where $E_j = Y - \sum_{i=1,\, i \neq j}^{s} a_{:i} s_{i:}^T$ is the $j$-th residual; thus we can solve the above problem for one source at a time. Since the problem is NP-hard, we solve it iteratively. For cases where the spectral signatures are unknown, the spectral signature matrix $A$ can first be computed by treating $S$ and the $\gamma_i$ in equation (31) as fixed, which gives the following optimization,

$$\min_{A} \left( \|Y - AS\|_2^2 \right), \qquad (32)$$

$$\text{s.t.} \; \sum_{i=1}^{s} s_{:i} = 1, \; s_{:i} \geq 0.$$

This is a constrained least-squares problem that is easily solved by computing the pseudo-inverse of $S$. Next, the source image $s_{j:}^T$ can be computed by taking the gradient of equation (31) with respect to $s_{j:}$, keeping all other terms fixed, and equating it to zero, which gives

$$s_{j:}^T = \left( I + \lambda_D \sum_i \Re_i^T \Re_i \right)^{-1} \left( E_j^T a_{:j} + \lambda_D \sum_i \Re_i^T D_j \gamma_i \right). \qquad (33)$$

This step is similar to the one described for the classic K-SVD algorithm for handling large images (Aharon, Elad, and Bruckstein 2006; Elad and Aharon 2006), and to blind source separation using adaptive dictionaries (Abolghasemi, Ferdowsi, and Sanei 2012). In our implementation, we learn the dictionaries during initialization, and the regularization parameter $\lambda_D$ is set adaptively at each iteration, as explained in the following section.

3.2.3. Morphological Constrained Spectral Unmixing – Total Variation (MCSU-tv)

Let $Y \in \mathbb{R}^{\lambda \times n^2}$ denote the observed mixtures and $S \in \mathbb{R}^{s \times n^2}$ the fractional abundance matrix, and let $\|S\|_F$ denote the Frobenius norm of $S$. Also, let $\|S\|_{1,1} = \sum_i \|s_i\|_1$, where $s_i$ denotes the $i$-th column of $S$. Next, let $\lambda \geq 0$, $\lambda_{TV} \geq 0$, and $\lambda_D \geq 0$ denote the regularization parameters, and let $D$ denote the unified dictionary learned from the reference images. Finally, let $A$ denote the spectral signature matrix. With these definitions, the MCSU-TV algorithm can be formulated as the following optimization,

$$\min_{S, D, \Gamma} \left( \|Y - AS\|_2^2 + \lambda \|S\|_{1,1} + \lambda_{TV}\, TV(S) + \lambda_D \|\Re S - D\Gamma\|_F^2 \right), \qquad (34)$$

$$\text{s.t.} \; S \geq 0, \; \|\gamma_i\|_0 \leq T,$$

where

$$TV(S) = \sum_{\{i,j\} \in \pi} \|s_i - s_j\|_1 \qquad (35)$$

is a vector extension of the non-isotropic total variation measure (Iordache, Bioucas-Dias, and Plaza 2012) that promotes uniformity of the abundances at neighboring pixels, and $\pi$ denotes the set of horizontal and vertical neighbors in the image. $\Re S \in \mathbb{R}^{r \times N_s}$ denotes the patches of size $\sqrt{r} \times \sqrt{r}$ extracted at $N_s$ pixels from all the reference images. The optimization problem in (34) can be solved iteratively, by first solving for the dictionary $D$ and sparse codes $\Gamma$, which gives

$$\min_{D, \Gamma} \left( \|\Re S - D\Gamma\|_F^2 \right), \quad \text{s.t.} \; \|\gamma_i\|_0 \leq T. \qquad (36)$$

Equation (36) can be solved using K-SVD (Aharon, Elad, and Bruckstein 2006). Given $D$ and $\Gamma$, we can solve for $S$ by reformulating equation (34) as

$$\min_{S} \left( \|Y' - A'S\|_2^2 + \lambda \|S\|_{1,1} + \lambda_{TV}\, TV(S) \right), \quad \text{s.t.} \; S \geq 0, \; \|\gamma_i\|_0 \leq T, \qquad (37)$$

where

$$Y' = \begin{bmatrix} \frac{1}{\lambda_D} Y \\ \Re^{+} D \Gamma \end{bmatrix} \quad \text{and} \quad A' = \begin{bmatrix} \frac{1}{\lambda_D} A \\ I \end{bmatrix}.$$

The optimization problem in equation (37) can be solved using the fast Sparse Unmixing via Variable Splitting and Augmented Lagrangian (SUnSAL) algorithm (Iordache et al. 2012), which uses the Alternating Direction Method of Multipliers (ADMM) (Eckstein and Bertsekas 1992) to solve the constrained optimization problem.
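For reference, the non-isotropic TV term in (35) simply accumulates L1 differences between horizontally and vertically adjacent pixels of each abundance map. A minimal sketch (illustrative; the function name and the (sources, h*w) layout are assumptions):

```python
import numpy as np

def total_variation(S, h, w):
    """Non-isotropic TV of abundance maps: sum of L1 differences over
    horizontal and vertical pixel neighbors.

    S : (sources, h*w) abundance matrix; each row is an h-by-w image.
    """
    imgs = S.reshape(S.shape[0], h, w)
    tv_h = np.abs(np.diff(imgs, axis=2)).sum()  # horizontal neighbor differences
    tv_v = np.abs(np.diff(imgs, axis=1)).sum()  # vertical neighbor differences
    return float(tv_h + tv_v)
```

A spatially constant abundance map has zero TV, so the penalty favors piecewise-smooth abundances.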


3.2.4. Adaptive Setting of the Regularization Parameter

Let the angle between two different spectral signatures $a_{:i}$ and $a_{:j}$ be denoted $\theta_{ij}$. Higher values of $\theta_{ij}$ indicate that the two spectral signatures are well separated, so simple linear unmixing can easily separate the two images. Lower values of $\theta_{ij}$ indicate that the two spectral signatures are not easily separated; in other words, the spectral crosstalk between the two source images is high, and a higher value of the regularization parameter $\lambda_D$ is needed in equation (30) to give more importance to the morphological constraint. The parameter can therefore be set adaptively at every iteration $w$ as

$$\lambda_D^{(w)} = e^{-\min_{i \neq j} \angle(a_{:i},\, a_{:j})}, \quad i, j = 1, \dots, s. \qquad (38)$$
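Equation (38) can be sketched directly: compute all pairwise spectral angles, take the minimum over distinct pairs, and exponentiate its negative. The sketch below assumes signatures are stored as columns and angles are in radians:

```python
import numpy as np

def adaptive_lambda(A):
    """Set the morphology weight from the smallest pairwise spectral angle.

    A : (bands, sources) matrix of spectral signatures (columns).
    Returns exp(-min angle): close signatures -> angle near 0 -> weight near 1;
    well-separated signatures -> larger angle -> smaller weight.
    """
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    s = A.shape[1]
    min_angle = np.pi
    for i in range(s):
        for j in range(i + 1, s):
            cos_ij = np.clip(A[:, i] @ A[:, j], -1.0, 1.0)
            min_angle = min(min_angle, np.arccos(cos_ij))
    return float(np.exp(-min_angle))
```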

3.3 Infant Data Decoding and Channel Relevance Using Sparse Group Lasso and Label Consistent Dictionary Learning

3.3.1. Subjects

Eighty healthy infants (41 female and 39 male) ranging from 6 to 24 months old, were recruited for this study under informed consent by their parent(s) or guardian(s). This study was approved under the University of Houston Institutional Review Board

(Application ID 2881). Parents were given a brief overview of the study. Data from sixty-one of these infants (28 male and 33 female) were used for analysis, since not all infants tolerated wearing the headset, and some became uncomfortable during the experiment. Of those, n=43 were white, n=6 were Asian, n=6 were African-American, and n=8 were from more than one race. In total, 54.1% of the children were female, with a mean age of 13.42 months (SD=4.96, range 6 – 24), and 45.90% were male, with a mean age of 11.57 months (SD=5.87, range 6 – 24). Infants between the ages of 6 and 10 months were predominantly male, and those between 12 and 18 months were predominantly female. After 20 months of age the distribution between genders was equal.

Figure 11. Distribution of the infants who participated in the study, by age and race. A total of 61 subjects from the Houston area participated in the study.

3.3.2. Experimental Protocol

The experimental methods have been described in detail elsewhere (Cruz-Garza et al. 2015; Hernandez et al. 2014); we summarize the protocol briefly here. The child was seated at a table accompanied by his/her caregiver, who sat behind the child. The child was prepared for the EEG by measuring her/his head circumference in centimeters and the distance from the nasion to the inion along the midsagittal plane of the scalp surface. The EEG cap was placed on the infant's head, aligning the Cz electrode with the vertex of the head and centering the Fp1 and Fp2 electrodes on the forehead. The cap was adjusted symmetrically along the midsagittal plane of the head.

Using a small syringe, gel was injected into the space between the scalp and the electrode. The IMUs were secured with straps on wrists, chest and the head of the infant.

3.3.3. Data Acquisition

To record brain activity, we used a 64-channel active EEG scalp cap (actiCAP, Brain Products GmbH) arranged according to the extended international 10-20 system montage. Reference and ground electrodes were placed at the FCz and AFz locations, respectively. Data collected from the amplifiers were sampled at 1000 Hz with a 0.1 μV resolution and band-pass filtered between 0.1 and 1000 Hz. A video camera recorded the experiment session; the recording was used for marking behavioral events enacted by the infant, as well as for confirming the start and end markers of the experiment session.

3.3.4. Experimental Task

The subject and an actor were seated facing one another, and a series of trials was conducted. The trials consisted of turn-taking actions in which the actor passed an object to the infant in an effort to initiate an imitative or exploratory response, then retrieved or asked for the object back to begin another trial. To reduce confounding effects of adaptation, different toys were used throughout the experimental session. At the beginning of the experiment, the experimenter sat still for 1 min to allow collection of rest data from the infant.

3.3.5. Data Pre-processing

All data analyses were performed offline using custom software developed in MATLAB (Mathworks, Natick, MA), including the FieldTrip toolbox (Oostenveld et al. 2011). The whole-session EEG data were first separated into trials based on manually identified behaviors from each subject. All behaviors except for rest were segmented by taking two seconds before and after the beginning time stamp of the behavior. The rest behavior, which could last up to 30 seconds, was divided into four-second segments within the beginning and end time stamps of the behavior. EEG and behavior time stamps were aligned temporally using external triggers during the experiment session. The trial-segmented data structure was then resampled to 500 Hz to speed up computation, and channels with high impedance (Z > 60 Ω) were removed. A spherical spline (Perrin et al. 1989) was applied to interpolate missing channels using triangulation-based channel neighbors. A fully automated online artifact removal technique for brain-machine interfacing (FORCe) was employed to reduce the effects of non-EEG signals, especially artifacts generated by muscle, eye, or electrode movement (Daly et al. 2015). This method has been shown to outperform other automated artifact removal methods such as Lagged Auto-Mutual Information Clustering (LAMIC) (Nicolaou and Nasuto 2007) and Fully Automated Statistical Thresholding for EEG artifact Rejection (FASTER) (Nolan, Whelan, and Reilly 2010). Further, for each infant and behavior type, we examined the maximum acceleration recorded by the IMUs for each trial. Trials that were more than three standard deviations away from the mean were identified as outliers, and the corresponding EEG trials were removed from further analysis.
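The IMU-based trial rejection step can be sketched as follows (illustrative; the function name and input layout are assumptions):

```python
import numpy as np

def reject_outlier_trials(max_accel, n_sd=3.0):
    """Return indices of trials to keep, given per-trial peak IMU acceleration.

    Trials whose maximum acceleration lies more than n_sd standard deviations
    from the mean are flagged as movement-artifact outliers, and the
    corresponding EEG trials are dropped.
    """
    max_accel = np.asarray(max_accel, dtype=float)
    mu, sd = max_accel.mean(), max_accel.std()
    keep = np.abs(max_accel - mu) <= n_sd * sd
    return np.flatnonzero(keep)
```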

3.3.6. EEG Data Analysis

In previous studies, only distinct frequency bands (e.g., the delta band) of the EEG signal were selected for classifying the neuronal response or studying channel relevance to localize the source (Cruz-Garza et al. 2014; Liao et al. 2014; Lotte et al. 2007). However, these approaches neglect information from other frequency bands that could be vital for classifying neuronal patterns or for discovering hidden bio-markers. We first identified the frequency bands of interest using power spectral density, and computed the μ-rhythm using a Morlet wavelet transform.

3.3.7. μ-Rhythm Identification

To compute the ERD, we first performed spectral analysis of each segmented trial using the wavelet transform. A Morlet wavelet was used to perform the normalized wavelet transform for the channels CP3, CP1, CPz, CP2, CP4, C4, C2, Cz, C1, C3, FC3, FC1, FC2, and FC4. We then computed the median of the wavelet transforms over all 'rest' trials. For each trial of each behavior type, we then computed the ERD as (b − a), where b is the wavelet transform for the behavior trial and a is the median over all rest trials. Finally, we identified the maximum suppression between 4 and 10 Hz.
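The wavelet step can be sketched as a bank of unit-energy complex Morlet filters, with the ERD obtained by subtracting the median rest power. This is a simplified sketch; the number of cycles and the normalization are assumptions, not the exact thesis settings:

```python
import numpy as np

def morlet_power(x, fs, freqs, n_cycles=6.0):
    """Normalized Morlet wavelet power of a 1-D signal at the given frequencies.

    Returns an array of shape (len(freqs), len(x)).
    """
    x = np.asarray(x, dtype=float)
    out = np.empty((len(freqs), len(x)))
    for k, f in enumerate(freqs):
        sigma_t = n_cycles / (2.0 * np.pi * f)              # wavelet width in time
        t = np.arange(-4 * sigma_t, 4 * sigma_t, 1.0 / fs)
        wav = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wav /= np.sqrt(np.sum(np.abs(wav) ** 2))            # unit-energy normalization
        out[k] = np.abs(np.convolve(x, wav, mode="same")) ** 2
    return out

def erd(trial_power, rest_power_median):
    """Event-related desynchronization: per-trial power minus median rest power."""
    return trial_power - rest_power_median
```

Negative ERD values in the 4 – 10 Hz rows then correspond to μ-band suppression relative to rest.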

3.3.8. Channel Relevance and Neural Classification

Based on the power spectral density, we decomposed the signal of every trial into different frequency bands. We then computed six features (average power, maximum amplitude, kurtosis, median absolute deviation, standard deviation, and mean) for each band and each electrode within a moving window of 0.5 seconds with an overlap of 0.25 seconds, making the feature vector size 64 (channels) × 6 (features) × 7 (frequency bands) = 2688. These features were computed for every trial and every behavior. Next, we presented these features to the Sparse-Group Lasso (SG-LASSO) (Jun Liu, Shuiwang Ji 2009) to determine the most important features.
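The per-band feature extraction can be sketched as follows; stacking the 6 features over 64 channels and 7 bands yields the stated 64 × 6 × 7 = 2688-dimensional vector. This is a simplified sketch: the window handling and feature order are assumptions.

```python
import numpy as np
from scipy.stats import kurtosis

def window_features(band_data, fs, win_s=0.5, step_s=0.25):
    """Six features per channel per 0.5 s window (0.25 s overlap).

    band_data : (channels, samples) band-limited EEG for one trial.
    Returns (windows, channels * 6) feature matrix.
    """
    win, step = int(win_s * fs), int(step_s * fs)
    feats = []
    for start in range(0, band_data.shape[1] - win + 1, step):
        seg = band_data[:, start:start + win]
        f = np.column_stack([
            np.mean(seg ** 2, axis=1),    # average power
            np.max(np.abs(seg), axis=1),  # max amplitude
            kurtosis(seg, axis=1),        # kurtosis
            np.median(np.abs(seg - np.median(seg, axis=1, keepdims=True)),
                      axis=1),            # median absolute deviation
            np.std(seg, axis=1),          # standard deviation
            np.mean(seg, axis=1),         # mean
        ])
        feats.append(f.ravel())
    return np.array(feats)
```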

This information about the weights of every channel and frequency band was then used as an input to the dictionary-learning-based classifier. Given the features X of the channels selected by the SG-LASSO, our model jointly learns a label-consistent dictionary D and a classifier W that capture the distinct characteristics of each class and separate the different behavioral actions, by solving the following optimization,

$$\langle D, W, A, \Gamma \rangle = \arg\min_{D, W, A, \Gamma} \|X - D\Gamma\|_F^2 + \alpha \|Q - A\Gamma\|_F^2 + \beta \|H - W\Gamma\|_F^2, \qquad (39)$$

$$\text{s.t.} \; \forall i, \; \|\gamma_i\|_0 \leq T.$$

Once the dictionary $D$ and classifier $W$ are learned from the training features, a test sample $X_{test}$ is classified by first solving for its sparse code $\Gamma_{test}$ and then selecting the behavior type as $\arg\max_i (W \Gamma_{test})_i$. This procedure of selecting features with SG-LASSO and classifying with label-consistent dictionary learning was repeated as a two-class classification to compare each behavior (explore, reach-grasp, reach-offer, and imitate) against rest for each subject.
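The test-time classification can be sketched with a simple orthogonal matching pursuit for the sparse-coding step, followed by the linear classifier. This is illustrative: OMP is a standard choice for computing the sparse code, and the helper names are assumptions.

```python
import numpy as np

def omp(D, x, T):
    """Orthogonal matching pursuit: sparse code gamma with ||gamma||_0 <= T."""
    resid, idx = x.copy(), []
    gamma = np.zeros(D.shape[1])
    for _ in range(T):
        idx.append(int(np.argmax(np.abs(D.T @ resid))))  # best-correlated atom
        sol, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)
        resid = x - D[:, idx] @ sol                       # re-fit on support
    gamma[idx] = sol
    return gamma

def classify(D, W, x_test, T):
    """Predict the behavior label: sparse-code x_test over D, then argmax of W @ gamma."""
    gamma = omp(D, x_test, T)
    return int(np.argmax(W @ gamma))
```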


CHAPTER 4 RESULTS AND VALIDATION

4.1 Microglia Arbor Tracing

For proofreading the 3-D arbor reconstructions, we used the open-source FARSIGHT Trace Editor (Luisi et al. 2011). This tool allows the user to view the reconstructions from any angle and edit them if desired. The editing capability was not utilized for the experiments described here, since our end goal was an analysis of microglial activation patterns rather than exact reconstructions of each cell. The core of the dice-and-trace method is also embedded into this tool for tracing individual cells interactively: the user can trigger the reconstruction of an entire missed arbor by manually specifying the root point. Manually reconstructing thousands of cells, or even editing automatically generated reconstructions at this scale, is not feasible. Our expectations of reconstruction accuracy are correspondingly pragmatic: we expect valid reconstructions with sufficient accuracy to analyze microglial activation patterns.

With these considerations in mind, our experiments are primarily designed to study the impact of parameter selection on reconstruction performance. For this, we randomly selected a population of 50 cells across multiple images from multiple animals. These cells were reconstructed manually by a pool of volunteers. Each cell was reconstructed by at least two different users using the Neuromantic tool (Myatt et al. 2012) and saved in the standard SWC file format. This is a labor-intensive activity that takes between 40 and 60 minutes per cell. The next step was to compute a metric to compare the automatic reconstructions against the manual reconstructions. The DIADEM metric (Gillette, Brown, and Ascoli 2011) is currently the most widely accepted metric for comparing two reconstructions. It has a few limitations, however: two reconstructions performed manually by different users do not necessarily concord with a perfect score of 1, and the metric was originally intended for neurons. Despite these limitations, we found it helpful for comparing reconstructions of the tree-like microglial arbors.

Table 9: Validation of Seed Points

Algorithm                % Seed points    % Seed points     Coverage (% of traces
                         on traces        not on traces     covered by seed points)
Wang et al. 2011         40.1             59.8              94.3
Galbreath et al. 2011    37.5             62.5              98.6
Our method               72.3             27.8              94.7

To compare the quality of the seed points, we computed three scores for every cell in the test set. The first score is the percentage of seed points in the vicinity (within a fixed radius of 5 voxels) of the manual trace; a higher value indicates that the detected seed points lie on the traces. However, this alone does not indicate the quality of the seed points. The second score is the percentage of seed points that lie in the background; this measure indicates the potential for spurious traces. The third score is the percentage of traces missed, i.e., not covered by the detected seed points; this measure indicates the potential for missed processes. The average scores for 25 cells are presented in Table 9.
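The seed-quality scores can be sketched as nearest-distance checks between seed points and manual trace points (illustrative; a brute-force distance computation is used here for clarity):

```python
import numpy as np

def seed_point_scores(seeds, trace_points, radius=5.0):
    """Seed-quality scores against a manual trace (both as (n, 3) voxel arrays).

    Returns (% seeds on trace, % seeds off trace, % trace points covered by a seed).
    """
    seeds = np.asarray(seeds, dtype=float)
    trace = np.asarray(trace_points, dtype=float)
    # pairwise distances: (num_seeds, num_trace_points)
    d = np.linalg.norm(seeds[:, None, :] - trace[None, :, :], axis=2)
    on_trace = (d.min(axis=1) <= radius).mean() * 100.0
    covered = (d.min(axis=0) <= radius).mean() * 100.0
    return on_trace, 100.0 - on_trace, covered
```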

4.1.1. Parameter Selection

Seed point selection is the primary determinant of reconstruction performance. For this reason, our first experiments are designed to evaluate the impact of the parameters that directly affect this step. The first step in seed detection is to learn the patch-based dictionary. Based on the expected range of diameters of the arbor processes, we used a fixed 3-D block (patch) size of 15 × 15 × 3 voxels. The labels for learning the dictionary and training the classifier within the LC-KSVD algorithm were selected randomly from the manual traces. A total of 50,000 seed points were randomly selected from the image foreground of the training images, and the same number of pixels was selected from the background. The dictionary D and the classifier W were derived jointly using the LC-KSVD method.

The seed detection algorithm has four parameters (T, K, α, β) for the dictionary learning step. The parameters α and β set the relative importance of the terms in the LC-KSVD objective. For our experiments, these parameters were set to 5 and 8, respectively, following (Jiang, Lin, and Davis 2011). We observed that changing these parameters did not have a substantial impact on the type of dictionary learned. We also compared the DIADEM metric for K-SVD (obtained by setting α to zero) versus LC-KSVD; our results demonstrated that discriminative dictionary learning (LC-KSVD) outperforms the K-SVD method. Furthermore, the DIADEM metric did not vary significantly with the size of the dictionary. During the classification stage there is only one parameter, the sparsity constraint T, that must be set by the user; it can be set to the same value used in the learning stage.

To examine how the reconstruction performance varies with different choices of T and K, we divided our dataset of 50 cells into two groups: a validation set containing 25 cells and a test set containing 25 cells. We varied T from 2 to 15 and K from 675 to 1,800, and plotted the average DIADEM metric for these 25 cells in Figures 12B and 12C.

Figure 12. (A) The mean DIADEM metric for different algorithms. Our algorithm was insensitive to (B) the size of the dictionary K and (C) variations in the sparsity constraint T. (D) Increasing the cost threshold τ led to reduced performance.

As pointed out in describing the LC-KSVD algorithm, the performance of the classification algorithm was not very sensitive to the size of the dictionary or the sparsity constraint, due to the discriminative term added during the dictionary and classifier learning stage. The performance of our algorithm was between 0.7 and 0.8 for different values of K and T, indicating that it is stable against parameter changes. The tracing algorithm is, by design, sensitive to one parameter: the cost threshold τ. Figure 12D presents a plot of the average DIADEM metric for 25 different cells against the cost threshold. As expected, the performance of the algorithm was sensitive to this parameter.

To test the robustness of the parameters to perturbations, we selected the parameters (K, T, τ) with the highest DIADEM metric on the validation set and traced the test set. The algorithm performed equally well on this dataset, with a mean DIADEM metric of 0.77 and a standard deviation of 0.07.

We also compared the performance of our algorithm with two state-of-the-art algorithms that detect seed points using the Laplacian of Gaussian (LoG) and vesselness measures (Galbreath 2011; Yu Wang, Narayanaswamy, and Roysam 2011) on microglia images. The results were found to be on par, and in some cases better, as shown in Figure 12A. However, our algorithm has the important advantage of performing reconstructions in a scalable and seamless manner across large 3-D image mosaics containing thousands of cells, as exemplified by Fig. 1B. This gives us the ability to perform arbor analytics seamlessly, on a large scale, over extended tissue regions.

4.1.2. Arbor Analysis by Harmonic Co-clustering of L-Measure Data

Given the reality that no single number captures all aspects of a complex cellular arbor, it is necessary to compute libraries of arbor features. Scorcioni et al. (2008) have compiled a comprehensive library called the L-Measure, containing approximately 130 features per arbor that quantify various aspects of arbor morphology (e.g., size and shape of the soma, number of stems, branching and tapering profiles, etc.). These features capture the seemingly abstract notion of “arbor morphology” in the aggregate. In this work, we adopted this methodology for microglia populations, taking advantage of the

fact that our reconstructions are guaranteed to be trees. This computation yields a large table of L-Measure data with 130 columns and as many rows as there are cells.

Table 10: Relative Abundance of Arbor Morphology Groups

              Group 1   Group 2   Group 3   Group 4   Total
Population    993       960       927       430       3,310
Percentage    31%       29%       27%       13%       100%

Interpreting this large table to extract usable insight is a difficult task for several reasons.

First, the data dimensionality is high, and individual L-Measure features represent dissimilar quantities (lengths, areas, volumes, ratios, counts, etc.). The most common need in microglia analysis is to organize the cell population into groups of similar cells, so that data interpretation can be attempted one group at a time. Another practical need is to identify features, and groups of features, that distinguish groups of cells. Co-clustering is a natural strategy for achieving these goals simultaneously. The choice of an appropriate distance measure is a central decision for co-clustering. Traditional metrics, such as the Euclidean distance, are quite inadequate for at least three reasons. First, the various features in the L-Measure are dissimilar quantities with different units (e.g., soma volume vs. process length). Second, the dimensionality of the data is high. Third, the data is inherently noisy. For these reasons, we prefer the non-linear stochastic diffusion distance (Coifman and Lafon 2006) to compare arbor morphologies. Coifman's unsupervised harmonic co-clustering algorithm (Lu et al. 2014) uses this measure in conjunction with Haar wavelet smoothing to robustly identify groups of cells and features in a hierarchical manner. Importantly, this algorithm has very few tunable parameters, and usable results are produced with default parameter settings.

The harmonic co-clustering algorithm clusters both cell reconstructions and cell features at multiple scales simultaneously. This reveals groups of cells with similar morphological properties, and the corresponding features that distinguish the cell groups.

Table 11: Selected Features of Arbor Morphology Groups

Group   Surface Area (μm²)   Volume (μm³)   Segments   Stems   Branch Point   Bifurcation   Skewness
1       159                  174            35         8       14             13            30.1
2       78                   84             19         5       7              6             28.3
3       42                   44             9          2       4              5             19.6
4       2                    1              0          0       0              0             0

The results of co-clustering are presented as a heat-map and dual-tree representation in Figure 13. Each row in the heat-map corresponds to one cell, and each column corresponds to one feature. The horizontal tree on the left-hand side of the heat-map shows the grouping of the cells, whereas the tree on the top shows the grouping of features. The analysis reveals that microglia in this dataset fall into four major groups (indicated by circles drawn on the horizontal tree). Close-up renderings of representative cells from the four groups are presented in panels G1 – G4. They clearly correspond to the known microglial activation states: ramified/resting, elongated, activated, and amoeboid/round, respectively. The percentages of these four groups in the population are listed in Table 10.

Figure 13. Quantitative analysis of the microglial cell population. (A) 3-D rendering of the reconstructed microglial field, with the arbor traces color-coded to indicate the automatically identified groups. (B) Heat-map representation of the co-clustering.

The ramified and elongated cells have the highest proportion in this data set, and they are evenly distributed, whereas the activated and amoeboid cells are the smallest populations. The morphological complexities of the four groups are in decreasing order. We also calculated the mean values of the significant features from each group in Table 11. The overall sizes (surface area and volume) of the four groups are dramatically different from each other: ramified and elongated cells are much larger than the activated and amoeboid cells. The complexities (segments, bifurcations, branch points, and skewness) of the different groups are also distinct: ramified and elongated cells exhibit higher complexity compared to the other two types. This example illustrates the practical utility of our method.


4.2 Morphologically Constrained Spectral Unmixing

In this section we evaluate the unmixing performance using four metrics: Mean Squared Error (MSE), peak signal-to-reconstruction-error ratio (RSNR), the Structural Similarity Index (SSIM), and the Feature Similarity Index (FSIM). Our experiments were designed to test 1) robustness to additive noise and 2) unmixing performance as the number of endmembers increases, on pancreatic ductal adenocarcinoma (PDAC) tissue.

4.2.1. Spectral Imaging

Archival formalin fixed paraffin embedded pancreas tumor tissue was obtained from

PDAC patients (n=10) upon tumor resection and cut into 5μm-thick tissue sections.

Paraffin sections were deparaffinized and tissues were fixed to slides with formaldehyde:methanol (1:10) prior to antigen retrieval. All slides went through seven sequential staining rounds, each including antigen retrieval, protein blocking and incubation in primary antibody followed by the corresponding secondary polymer and tertiary fluorophore. Antigen retrieval was performed in heated Citric Acid Buffer (pH

6.0) for 15 minutes (EZ Retriever microwave BioGenex) and slides were blocked with

1% BSA in TBST. Primary antibody and corresponding secondary polymers are as follows: CD4(1:25, BioCare, Super-Picture from Invitrogen), Collagen I (1:500, AbD

Serotec, Goat Polink from GBI), CD31 (1:3000, Santa Cruz, Goat Polink from GBI),

Cytokeratin 8 (CK8) (1:1000, Troma, Rat Probe/Polymer from BioCare), CD8 (1:100,

Dako, Super-Picture from Invitrogen), αSMA (1:2000, Dako, Super-Picture from

Invitrogen), and Foxp3 (1:50, Abcam, Super-Picture from Invitrogen). Fluorescent molecules were covalently attached to proteins of interest at the end of each round using

tyramide signal amplification (680, Coumarin, Cy3, Cy3.5, Cy5, FITC, and Biotin-Streptavidin Alexa Fluor 594, respectively, from Perkin Elmer, Waltham, MA, or Life Technologies) and counterstained with DAPI (Life Technologies).

Figure 14. Qualitative performance of different algorithms indicating the need for morphological constraints. (A & B) are the original images of Coumarin and DAPI, respectively; (E – P) are the unmixing results for different algorithms.

Single-stain reference slides were also obtained for each marker. Slides were imaged using the Vectra

Multispectral Imaging System (Perkin-Elmer). A 200x image was taken at every 10nm within the range for each filter to generate a multispectral image cube. Filters used were

DAPI (440nm-680nm range), FITC (520nm-680nm range), Cy3 (570nm-690nm range),

Texas Red (580nm-700nm range), and Cy5 (670nm-720nm range). A spectral library was generated with the Nuance image analysis software (Perkin-Elmer, Waltham, MA) using a combination of single stained reference slides and multiplexed slides to compensate for spectral interactions. The generated spectral library was used for multispectral unmixing of the image cubes using the Inform Image Analysis software (Perkin-Elmer, Waltham,

MA).


4.2.2. Parameter Selection

The parameter $\lambda_D$ in MCSU and MCSU-TV was calculated as described in Section 3.2.4, based on the spectral signatures, and the parameter $\lambda_{TV}$ in MCSU-TV was set empirically to 0.05 for the given dataset. A detailed analysis of the $\lambda_{TV}$ parameter is provided in SUnSAL-TV (Iordache, Bioucas-Dias, and Plaza 2012). The patch sizes used to learn the dictionary were set to 8×8 and 15×15, large enough to capture the local morphology, and the number of dictionary atoms was set to twice the patch dimension, making the dictionary over-complete.

4.2.3. Unmixing Performance

The experiments were primarily designed to test the performance of the algorithms under varying spectral noise and with an increasing number of endmembers. Starting from two and going up to eight, the single-stained images were mixed using their true spectral signatures with additive noise, and the performance of the different algorithms was measured using the following metrics: the Mean Squared Error, $\mathrm{MSE} = E[|X - \hat{X}|^2]$, where $X$ and $\hat{X}$ are the original and recovered images; the peak signal-to-reconstruction-error ratio, $\mathrm{RSNR} = 10 \log_{10}\!\left(\frac{E[|X|^2]}{\mathrm{MSE}}\right)$; and the Structural Similarity Index (SSIM) (Z. Wang et al. 2004) and the Feature Similarity Index (FSIM) (L. Zhang et al. 2011), to assess image quality from a structure-based and a human visual system (HVS) perspective.
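The first two metrics follow directly from their definitions and can be sketched as below (SSIM and FSIM require their respective reference implementations):

```python
import numpy as np

def mse(x, x_hat):
    """Mean squared error between the original and recovered images."""
    return float(np.mean(np.abs(x - x_hat) ** 2))

def rsnr(x, x_hat):
    """Peak signal-to-reconstruction-error ratio in dB: 10 log10(E[|X|^2] / MSE)."""
    return float(10.0 * np.log10(np.mean(np.abs(x) ** 2) / mse(x, x_hat)))
```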

As the signal-to-noise level increases, the performance of all the algorithms improves, i.e., the MSE decreases and the RSNR increases; however, we found that the proposed MCSU and MCSU-TV algorithms, and the contextual-information-based SUnSAL-TV algorithm, have lower mean squared error and higher peak signal-to-reconstruction-error ratio, as shown in Figure 15.

Figure 15. Performance of different algorithms as measured by MSE, RSNR, and FSIM. (A – C) Performance of the algorithms as the noise level was varied. (D – F) Performance of different algorithms with a varying number of endmembers.

These algorithms also performed well even when the signal-to-noise ratio was low. As we increased the number of endmembers from 2 to 8 and the noise from 0.1 to 50, the performance of all algorithms declined: with more endmembers, the spectral angle between endmembers decreased, making it difficult to unmix based on the spectral signatures alone. The results presented in Figures 15A, 15B, and 15C are the average values of MSE, RSNR, and FSIM for varying noise, and Figures 15D, 15E, and 15F are the average values for an increasing number of endmembers. Even in this case, we found that the algorithms relying on neighborhood information, making use of morphology and contextual information, performed better than the other algorithms.


The qualitative results for two endmembers, presented in Figure 14, clearly indicate the need for spatial and morphological information when unmixing multi-spectral data.

4.3 Analysis of Infant Neural Cortical Activity

4.3.1. Behavioral Results

The infants participated in turn-taking tasks in which the actor performed an action with an object to demonstrate how to manipulate it. Six different behaviors were identified across all the infants and segmented using the video recording: rest, observe, reach-grasp, explore, reach-offer, and imitate. The distribution of these behaviors varied as a function of age: the occurrence of ‘reach-offer’ and ‘imitate’ actions increased with age, as shown in Figure 16, and six-month-old infants demonstrated very little reach-offer and imitate behavior. The most common behavior across all ages was ‘reach-grasp’, followed by ‘observe’ and ‘explore’. The use of the mouth, exclusively for explore, drastically decreased after 12 months of age. The larger portion of trials was performed with the right hand, except for ‘explore’, where the subjects primarily used both hands, as shown in Figure 16.

Figure 16. The distribution of observed behaviors varied as a function of age.

Sixty-eight different toys were used in total across all subjects. The mean number of toys used in a single session was 10.05 (SD = 7.68). The six objects used most often to elicit imitative actions from the infants were the rattle, camera, car, experimenter’s hand gestures (‘hi-five’, waving, etc.), turtle, and phone, which corresponded to 50% of the objects used. In total, 8,560 behavioral trials were identified and segmented across all infants. The mean number of action trials performed per experimental session was 142.66 (SD = 49.56).

4.3.2. EEG Analysis

4.3.2.1. Neural Models

To design predictive models that extract mappings between neural activity and behavioral actions, the continuous EEG data from each experimental session were first segmented into trials based on the actions identified by the raters. All behaviors except ‘rest’ were segmented by taking two seconds before and after the beginning time stamp of each action. The ‘rest’ behavior, which could typically last 30 seconds, was divided into four-second segments between its beginning and end time stamps (EEG and action time stamps were aligned temporally using external triggers during the experimental session). The trial-segmented data were then resampled to 500 Hz to reduce computation time, and channels with high impedance (Z > 60 kΩ) were removed.
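A minimal sketch of the trial-segmentation step follows, under assumed channel counts and sampling rates; the naive every-other-sample decimation stands in for the anti-aliased resampler used in practice.

```python
import numpy as np

def segment_trial(eeg, fs, onset_s, pre_s=2.0, post_s=2.0):
    """Cut a [onset - pre_s, onset + post_s] window around an action's
    beginning time stamp (all behaviors except 'rest' used 2 s each side)."""
    start = int(round((onset_s - pre_s) * fs))
    stop = int(round((onset_s + post_s) * fs))
    return eeg[:, start:stop]

# Hypothetical session: 64 channels sampled at 1000 Hz for 60 s.
fs = 1000
rng = np.random.default_rng(0)
eeg = rng.standard_normal((64, 60 * fs))

trial = segment_trial(eeg, fs, onset_s=10.0)  # 4 s window -> (64, 4000)
trial_500 = trial[:, ::2]  # naive 2x decimation to 500 Hz (illustration only)
```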

A spherical spline (Perrin et al. 1989) was applied to interpolate missing channels using triangulation-based channel neighbors. Given the unconstrained nature of the tasks, artifactual components (e.g., motion, myoelectric activity from scalp muscles, eye blinks, and 60 Hz power-line noise) were expected to contaminate the EEG measurements. To remove the artifactual components from the EEG recordings, a fully automated online artifact removal technique for brain-machine interfacing (FORCe) was employed to reduce the effects of non-EEG signals, especially those generated from muscle, eye, or electrode-movement-related artifacts (Daly et al. 2015).

Figure 17. Time-frequency analysis of EEG showing event-related desynchronization (ERD). The plot depicts the maximum suppression across all trials and subjects.

This method has been shown to outperform other automated artifact-removal methods such as the Lagged Auto-Mutual

Information Clustering (LAMIC, (Nicolaou and Nasuto 2007)) and Fully Automated


Statistical Thresholding for EEG artifact Rejection (FASTER, (Nolan, Whelan, and

Reilly 2010)). Further, for each infant and action, we looked at the maximum acceleration for each trial. Trials that were more than three standard deviations away from the mean were identified as outliers, and the corresponding EEG trials were removed from any further analysis.

Figure 18. Neural classifier model can predict behavioral action from brain activity (EEG). A) Mean classification accuracy (%) for different frequency bands is indicated by colored asterisks. B) Mean confusion matrix (%) across all subjects.

4.3.2.2. Spectral Analysis

To estimate the spectral content in the 4-10 Hz band, the Morlet wavelet transform was first performed (Liao et al. 2015; Makeig 1993). To identify the μ-rhythm, which was expected to shift in frequency as a function of age (Cuevas et al. 2014; Liao et al. 2015; Marshall, Bar-Haim, and Fox 2002; Marshall and Meltzoff 2011; Stroganova, Orekhova, and Posikera 1999), the maximum suppression for each trial was estimated from the event-related desynchronization (ERD). To compute the ERD, we calculated the normalized wavelet transform for the channels CP3, CP1, CPz, CP2, CP4, C4, C2, Cz, C1, C3, FC3, FC1, FC2, and FC4. We then computed the median of all wavelet transforms for the ‘rest’ trials, which served as the control condition.

For each trial and each action type, we then computed the ERD as (b − a), where b is the wavelet transform for the action (per trial) and a is the median over all ‘rest’ trials. The maximum suppression between 4 and 10 Hz was then identified. For most infants, the maximum power suppression occurred after the movement onset, as indicated by the projection on the Time-Frequency plane in Figure 17. Overall, the frequency of maximum power suppression increased as a function of age, as shown in the Age-Frequency plane. The maximum ERD frequency also increased for each action type, with the exception of ‘reach-offer’.

4.3.2.3. Neural Decoding

A predictive model was designed to predict behavioral actions from scalp EEG of the subjects. To accomplish this goal, we first decomposed the EEG into different frequency bands (1-4 Hz, 4-6 Hz, 6-9 Hz, 7-11 Hz, 12-20 Hz, 20-35 Hz and 35-50 Hz) for every trial and subject. These frequency bands were identified based on the prior literature

(Cuevas et al. 2014; Marshall, Bar-Haim, and Fox 2002; Stroganova, Orekhova, and

Posikera 1999) and visual inspection of power spectral density and time-frequency plots.

Unlike previous studies that focused on a particular frequency band or a combination of a few, we inspected a wider range of frequency bands to identify relevant spatial, temporal, and spectral features for accurate classification.

Figure 19. Example of the relevance of EEG channels to the prediction of behavioral action. Scalp maps for each behavior shown by a 23-month-old male subject (RB23).

We computed six different features (average power, maximum amplitude, kurtosis, median absolute deviation, standard deviation, and mean) for each band and each electrode within a moving window of 0.5 seconds with an overlap of 0.25 seconds, making the feature-vector size 64 (channels) × 6 (features) × 7 (frequency bands) = 2,688. These features were computed for every trial and every behavior type.
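A sketch of the windowing step for a single band-filtered channel is below (the function name and the excess-kurtosis formula are ours, not the dissertation’s exact feature code); stacking the per-window rows across 64 channels and 7 bands gives the 2,688-dimensional feature vector.

```python
import numpy as np

def window_features(x, fs, win_s=0.5, step_s=0.25):
    """Six statistics per 0.5 s window (0.25 s step, i.e., 50% overlap) for
    one band-filtered channel: average power, max amplitude, kurtosis,
    median absolute deviation, standard deviation, and mean."""
    w, s = int(win_s * fs), int(step_s * fs)
    rows = []
    for start in range(0, len(x) - w + 1, s):
        seg = x[start:start + w]
        mu, sd = seg.mean(), seg.std()
        rows.append([
            np.mean(seg ** 2),                        # average power
            np.max(np.abs(seg)),                      # max amplitude
            np.mean(((seg - mu) / sd) ** 4) - 3.0,    # excess kurtosis
            np.median(np.abs(seg - np.median(seg))),  # median absolute deviation
            sd,                                       # standard deviation
            mu,                                       # mean
        ])
    return np.asarray(rows)

# One hypothetical 4 s trial for a single channel at 500 Hz:
rng = np.random.default_rng(0)
x = rng.standard_normal(4 * 500)
f = window_features(x, fs=500)   # shape: (n_windows, 6)
```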

Next, we identified the most relevant features (i.e., those with the most predictive power) using the Sparse-Group Lasso (SG-LASSO) method (Liu, Ji, and Ye 2009). The output of this method, which provided information about the relevance of each channel and each frequency band, was then used as the input to a dictionary-learning-based classifier (Jiang, Lin, and Davis 2011, 2013; Megjhani et al. 2015): given the features X from the {channels, frequencies} selected by the SG-LASSO, the model jointly learns a label-consistent dictionary and a classifier, storing the distinct characteristics of each class in a dictionary D and using a classifier W to separate the different behavioral actions.

Once the dictionary D and classifier W are learned from the training features, the classification of the test data X_test is obtained by first solving for the sparse codes Γ_test and then classifying the behavior type by solving max_i(W Γ_test)_i. To classify the six behavioral types, we first built a predictive model using the above SG-LASSO and dictionary-learning classifier. This enabled us to differentiate the behaviors based solely on neural activity. Figure 18A shows the classification accuracy for all the subjects, and the confusion matrix is shown in Figure 18B. ‘Rest’ had the highest classification accuracy, followed by ‘reach-offer’. The classifier was able to correctly distinguish

between ‘reach-grasp’ and ‘reach-offer’, indicating that different cognitive processes are involved. ‘Imitate’, ‘observe’, and ‘reach-grasp’ were often confused with ‘explore’. The overall mean accuracy of the classifier was 77%. Next, to study the unique characteristics (channels, frequencies) that differentiate each behavior from the ‘rest’ condition, we performed SG-LASSO and dictionary learning for two classes

(behavior type vs ‘rest’). Figure 19 shows the relevant electrodes for classifying different neuronal patterns related to the behavioral actions of interest for one participant.

‘Explore’ was predicted, in sensor space, by a sparse network of EEG channels containing information in higher frequency bands (20-50 Hz) while ‘imitate’ was better represented by information in lower frequency bands (4-9 Hz). Both reach behaviors

(‘reach-grasp’ and ‘reach-offer’) showed relevant channels over sensorimotor and parietal areas in the 7-11 Hz and 12-20 Hz frequency bands, and ‘observe’ showed relevant channels in similar areas except in the 1-4 Hz frequency band. Overall, the 20-35 Hz frequency band was apparent across all channels, which may reflect large differences in beta activity between the five movement-related behaviors and ‘rest’, as shown in Figure 20.
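The test-time rule described above (sparse-code a test feature vector over the learned dictionary D, then take max_i(W Γ_test)_i) can be sketched in simplified form; the greedy `omp` helper and all names here are illustrative stand-ins for the LC-KSVD sparse-coding step, not the dissertation’s implementation.

```python
import numpy as np

def omp(D, x, k):
    """Minimal orthogonal matching pursuit: greedily pick k dictionary
    atoms and refit their coefficients by least squares."""
    residual, support = x.copy(), []
    gamma = np.zeros(D.shape[1])
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    gamma[support] = coef
    return gamma

def classify(x_test, D, W, k=3):
    """Sparse-code the test feature vector over the learned dictionary D,
    then assign the class i maximizing (W @ gamma)_i."""
    gamma = omp(D, x_test, k)        # Gamma_test
    return int(np.argmax(W @ gamma)) # max_i (W Gamma_test)_i

# Toy check: with an identity dictionary and classifier, a pure atom
# should be assigned to its own class.
D = np.eye(4)
W = np.eye(4)
print(classify(D[:, 2].copy(), D, W, k=1))  # -> 2
```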

The significance of our findings is three-fold: 1) our neural decoding approach provides a window to study the development of intentionality, or goal-oriented behavior, in the first two years of life; 2) it challenges dogma regarding the neurophysiological correlates of mirror-neuron functioning, by showing that the activity of sparse neural networks in EEG sensor space can predict age-relevant behaviors; and 3) our approach opens the possibility of harnessing mirror neurons to develop brain-machine interfaces (BMIs) in the pediatric population to restore motor function in infants and children with neurological diseases or physical disabilities impacting social communication and movement, such as autism spectrum disorders (ASD).

Figure 20. Topographic scalp maps for each behavior and age group of all subject data applied to EEG classification (N=51).


CHAPTER 5 CONCLUSION

The rapid advancements in mobile brain imaging (MoBI), multispectral microscopy, and confocal microscopy have enabled us to acquire terabytes of data every day. These techniques have provided the means to study the whole brain, and to study human infants in a freely behaving environment. However, this vast and diverse data has presented several challenges to the signal-analysis community. Traditional methods relying on analytical models do not accurately address the variability within and across datasets. Moreover, methods relying on pre-defined analytical models do not represent big datasets concisely or consistently, as these models cater to only a few morphologies. Thus there is a need to address this variability using data-driven, label-consistent approaches that can learn from the data. In this thesis, we have presented a framework that can learn from the data and represent it concisely, accurately, and consistently using dictionary-learning approaches. We extended this framework to address three different applications: analysis of brain-activity data using EEG; spectral unmixing by morphologically constrained dictionary learning for multispectral data; and profiling of microglia (an immune cell of the brain) over large sections (mm) of brain tissue.

The results of the developmental analysis of behaviors related to neural activity demonstrate the feasibility of decoding intentional, goal-oriented behavioral actions from scalp EEG in freely behaving infants. These results show that scalp EEG contains valuable predictive information about the infant's intent. The experimental protocol provides an opportunity to explore neural and motion patterns in infants as they engage in social interaction to learn by imitation. The Sparse-Group Lasso enabled us to uncover different spectral-spatial maps during the developmental cycle of infants (6-24 months). The proposed dictionary-learning-based framework enabled us to classify the neural patterns associated with the different behaviors (accuracy ≈ 77%). We also noted that the frequency of maximum power suppression increased with age. These results can serve as biomarkers for the developmental stages of infants. Further analysis is required to study the clinical aspects of children's social and cognitive development.

The morphologically constrained spectral unmixing overcomes the fundamental challenge of separating fluorophores with very similar emission spectra by exploiting spatial cues that are often available in multispectral microscopy data. We see this as a way to escalate the level of fluorescence multiplexing, i.e., the ability to see multiple molecular markers simultaneously in their spatial context. Continued growth in the discovery of molecular markers that represent different cellular sub-populations, organelles, and functional states opens the door to the study of complex cell-cell interactions. Our algorithm synthesizes and extends the best ideas from the prior literature, including the use of sparse representations, constraints based on dictionary learning, and adaptive weighting of spectral and structural cues. Our experimental results demonstrate that adding the dictionary model as a morphological constraint and including the total-variation regularizer improves the performance of the unmixing algorithms. The average Mean Squared Error (MSE) and Signal Reconstruction Error (SRE) ratio of the proposed method were 39.6% lower and 8% higher, respectively, than the best of the algorithms that do not exploit these spatial constraints. In the absence of a spectral-signature library, MCSU also provides an option to extract the true spectra from the data itself. When the spectral library is available, MCSU-TV is a better alternative. This work establishes the fundamental mathematical basis for the next generation of spectral-unmixing algorithms. As a next step, a user-friendly implementation of this method is required that can be incorporated into laboratory imaging software to perform the unmixing on the fly.

Finally, we have developed a comprehensive solution for reconstructing and profiling large microglia arbor populations in a seamless manner from extended 3-D mosaics of brain tissue. Our pipeline incorporates methods for montaged three-dimensional brain-tissue imaging, scalable/parallel arbor-reconstruction algorithms, arbor-morphology quantification, visualization and proofreading, and appropriate “big data analytics” tools to interpret the resulting high-dimensional data. Our arbor-reconstruction algorithms and implementation methods are specifically designed for handling large collections of localized arbors in a scalable and seamless manner, rather than for the detailed reconstruction of individual neuronal arbors imaged at high resolution, as emphasized in the neuron-reconstruction literature (Meijering 2010). These distinct needs entailed different algorithm-design choices throughout. Our algorithms adapt to specific image collections using a data-driven machine-learning approach rather than relying on a predefined set of analytical models, allowing them to adapt to novel imaging protocols and variations in imaging quality. The pragmatic “dice and trace” implementation strategy ensures that our method can generate accurate reconstructions seamlessly and is indefinitely scalable.

Formulating the reconstruction problem in terms of minimal spanning trees is important: it guarantees that the reconstructions are trees. This allows the computation of L-measures that, in turn, enable “arbor analytics” by harmonic co-clustering. This is a powerful approach for translating the image data into usable insight: it is robust to noise in features, produces biologically meaningful yet concise results, and is scalable to large datasets (Lu et al. 2014). Our experimental results show that our method is able to identify morphologically distinct subpopulations of microglia that concord with the known biology. By combining this method with a linked visualization system such as the FARSIGHT trace editor (Luisi et al. 2011), it is possible to perform interactive 3-D visualization of cell sub-populations and to map the spatial distribution of microglia activation patterns over extended regions of brain tissue.

The strengths and weaknesses of the proposed framework are ultimately tied to its ability to capture the morphologies in the data. The performance of the method depends on the variability in the data; in other words, our algorithms perform better with more variability in the dataset. Since the dictionary-learning framework is dependent on the training samples, a large amount of training data is required. Incorrect learning, which could result from incorrect labelling, may have detrimental effects on the results. As an extension of this thesis, the training examples could be selected intelligently during learning by incorporating active-learning approaches that choose the examples carrying the most relevant information. A further extension of this work could be learning higher-level biological features inspired by deep-learning techniques. Our framework provides an opportunity to learn lower-level, edge-based features; hierarchical dictionaries, along with deep-learning techniques, would provide an interesting perspective on the different applications presented in this thesis.


REFERENCES

Abolghasemi, Vahid, Saideh Ferdowsi, and Saeid Sanei. 2012. “Blind Separation of

Image Sources via Adaptive Dictionary Learning.” IEEE Transactions on Image

Processing 21(6): 2921–30.

Aharon, M., M. Elad, and A. Bruckstein. 2006. “K-SVD: An Algorithm for Designing

Overcomplete Dictionaries for Sparse Representation.” IEEE Transactions on

Signal Processing 54(11): 4311–22.

Al-Kofahi, Yousef, Wiem Lassoued, William Lee, and Badrinath Roysam. 2010.

“Improved Automatic Detection and Segmentation of Cell Nuclei in Histopathology

Images.” IEEE Transactions on Biomedical Engineering 57(4): 841–52.

Al-Kofahi, Yousef, William Shain, and Badrinath Roysam. 2008. “Improved Detection

of Branching Points in Algorithms for Automated Neuron Tracing from 3D

Confocal Images.” Cytometry Part A 73(1): 36–43.

Arngren, Morten, Mikkel N. Schmidt, and Jan Larsen. 2010. “Unmixing of Hyperspectral

Images Using Bayesian Non-Negative Matrix Factorization with Volume Prior.”

Journal of Signal Processing Systems 65(3): 479–96.

Bas, Erhan, and Deniz Erdogmus. 2010. “Piecewise Linear Cylinder Models for 3-

Dimensional Axon Segmentation in Brainbow Imagery.” In Proceedings of the

IEEE International Symposium on Biomedical Imaging: From Nano to Macro, ,

1297–1300.

Becchio, Cristina, Luisa Sartori, Maria Bulgheroni, and Umberto Castiello. 2008. “The

Case of Dr. Jekyll and Mr. Hyde: A Kinematic Study on Social Intention.”

93

Consciousness and Cognition 17(3): 557–64.

Bioucas-Dias, José M., Antonio Plaza, Paul Gader, and Jocelyn Chanussot. 2012.

“Hyperspectral Unmixing Overview: Geometrical, Statistical, and Sparse

Regression-Based Approaches.” IEEE J. Selected Topics in Applied Earth

Observations and Remote Sensing 5(2): 354–79.

Bjornsson, Christopher S, Yousef Al-Kofahi, William Shain, and Badrinath Roysam.

2008. “Associative Image Analysis: A Method for Automated Quantification of 3D

Multi-Parameter Images of Brain Tissue.” Journal of Neuroscience Methods 170(1):

165–78.

Blankertz, Benjamin, Ryota Tomioka, Steven Lemm, and Klaus-robert Muller. 2008.

“Optimizing Spatial Filters for Robust EEG Single-Trial Analysis.” IEEE Signal

Processing Magazine 25(1): 41–56.

Breitenreicher, Dirk, Michal Sofka, Stefan Britzen, and Shaohua K. Zhou. 2013.

“Hierarchical Discriminative Framework for Detecting Tubular Structures in 3D

Images.” Lecture Notes in Computer Science (including subseries Lecture Notes in

Artificial Intelligence and Lecture Notes in Bioinformatics) 7917 LNCS: 328–39.

Cannon, Erin N., Kathryn H. Yoo, Ross E. Vanderwert, and R Lizio. 2014. “Action

Experience, More than Observation, Influences Mu Rhythm Desynchronization” ed.

Marco Iacoboni. PLoS ONE 9(3): e92002.

Cannon, R C, D A Turner, G K Pyapali, and H V Wheal. 1998. “An on-Line Archive of

Reconstructed Hippocampal Neurons.” Journal of neuroscience methods 84(1-2):

49–54.

94

Chan, Tsung-Han, Wing-Kin Ma, ArulMurugan Ambikapathi, and Chong-Yung Chi.

2011. “A Simplex Volume Maximization Framework for Hyperspectral Endmember

Extraction.” IEEE Transactions on Geoscience and Remote Sensing 49(11): 4177–

93.

Charles, Adam S., Bruno A. Olshausen, and Christopher J. Rozell. 2011. “Learning

Sparse Codes for Hyperspectral Imagery.” IEEE Journal of Selected Topics in

Signal Processing 5(5): 963–78.

Chen, Scott Shaobing, David L. Donoho, and Michael A. Saunders. 1998. “Atomic

Decomposition by Basis Pursuit.” SIAM Journal on Scientific Computing 20(1): 33–

61.

Chothani, Paarth, Vivek Mehta, and Armen Stepanyants. 2011. “Automated Tracing of

Neurites from Light Microscopy Stacks of Images.” Neuroinformatics 9(2-3): 263–

78.

Coifman, Ronald R., and Stéphane Lafon. 2006. “Diffusion Maps.” Applied and

Computational Harmonic Analysis 21(1): 5–30.

Cruz-Garza, Jesus G, Zachery R Hernandez, Berdakh Abibullaev, and Jose L Contreras-

Vidal. 2015. “A Novel Experimental and Analytical Approach to the Multimodal

Neural Decoding of Intent During Social Interaction in Freely-Behaving Human

Infants.” Journal of visualized experiments : JoVE (104).

Cruz-Garza, Jesus G, Zachery R Hernandez, Karen K Bradley, and Jose L Contreras-

Vidal. 2014. “Neural Decoding of Expressive Human Movement from Scalp

Electroencephalography (EEG).” Frontiers in human neuroscience 8: 188.

95

Cuevas, Kimberly, Erin N. Cannon, Kathryn Yoo, and Nathan A. Fox. 2014. “The Infant

EEG Mu Rhythm: Methodological Considerations and Best Practices.”

Developmental Review 34(1): 26–43.

Dagum, Leonardo, and Ramesh Menon. 1998. “OpenMP: An Industry Standard API for

Shared-Memory Programming.” Computational Science & Engineering, IEEE 5(1):

46–55.

Daly, Ian, Reinhold Scherer, Martin Billinger, and Gernot Müller-Putz. 2015. “FORCe:

Fully Online and Automated Artifact Removal for Brain-Computer Interfacing.”

IEEE transactions on neural systems and rehabilitation engineering : a publication

of the IEEE Engineering in Medicine and Biology Society 23(5): 725–36.

Dobigeon, Nicolas, Saïd Moussaoui, Jean-Yves Tourneret, and Cédric Carteret. 2009.

“Bayesian Separation of Spectral Sources under Non-Negativity and Full Additivity

Constraints.” Signal Processing 89(12): 2657–69.

Donoho, D L, Tsaig, Drori, and Strack. 2006. “Sparse Solution of Underdetetmined

Linear Equations by Stagewise Orthogonal Mathcing Pursuit.” 58(2): 39.

Efron, Bradley, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. 2015. “Least Angle

Regression.” The Annals of Statistics 32(2): 407–99.

Elad, M., B. Matalon, J. Shtok, and M. Zibulevsky. 2007. “A Wide-Angle View at

Iterated Shrinkage Algorithms.” In Optical Engineering + Applications, eds. Dimitri

Van De Ville, Vivek K. Goyal, and Manos Papadakis. International Society for

Optics and Photonics, 670102–670102 – 19.

Elad, Michael, and Michal Aharon. 2006. “Image Denoising Via Sparse and Redundant

96

Representations Over Learned Dictionaries.” IEEE Transactions on Image

Processing 15(12): 3736–45.

Eroglu, Emrah, Benjamin Gottschalk, Suphachai Charoensin, and Roland Malli. 2016.

“Development of Novel FP-Based Probes for Live-Cell Imaging of Nitric Oxide

Dynamics.” Nature communications 7: 10623.

Esser, Ernie, Michael Möller, Guillermo Sapiro, and Jack Xin. 2012. “A Convex Model

for Nonnegative Matrix Factorization and Dimensionality Reduction on Physical

Space.” IEEE transactions on image processing : a publication of the IEEE Signal

Processing Society 21(7): 3239–52.

Fauvel, Mathieu, Jón Atli Benediktsson, Jocelyn Chanussot, and Johannes R. Sveinsson.

2008. “Spectral and Spatial Classification of Hyperspectral Data Using SVMs and

Morphological Profiles.” IEEE Transactions on Geoscience and Remote Sensing

46(11): 3804–14.

Fields, R. Douglas. 2013. “Neuroscience: Map the Other Brain.” Nature 501(7465): 25–

27.

Galbreath, Zachary Stephen . 2011. “Tracing, Extracting Features, and Classifying

Microglia from Volumetric Images of Brain Tissue.” Rensselaer Polytechnic

Institute: 56.

Gallese, Vittorio, Christian Keysers, and Giacomo Rizzolatti. 2004. “A Unifying View of

the Basis of Social Cognition.” Trends in cognitive sciences 8(9): 396–403.

Gammon, Seth T, W Matthew Leevy, Shimon Gross, and David Piwnica-Worms. 2006.

“Spectral Unmixing of Multicolored Bioluminescence Emitted from Heterogeneous

97

Biological Sources.” Analytical chemistry 78(5): 1520–27.

Gehrmann, J, Y Matsumoto, and G W Kreutzberg. 1995. “Microglia: Intrinsic

Immuneffector Cell of the Brain.” Brain Research Reviews 20(3): 269–87.

Ghamisi, Pedram, Jon Atli Benediktsson, and Magnus Orn Ulfarsson. 2014. “Spectral–

Spatial Classification of Hyperspectral Images Based on Hidden Markov Random

Fields.” IEEE Transactions on Geoscience and Remote Sensing 52(5): 2565–74.

Gillette, Todd A, Kerry M Brown, and Giorgio A Ascoli. 2011. “The DIADEM Metric:

Comparing Multiple Reconstructions of the Same Neuron.” Neuroinformatics 9(2-

3): 233–45.

Gonzalez, German, Engin Turetken, Franccois Fleuret, and Pascal Fua. 2010.

“Delineating Trees in Noisy 2D Images and 3D Image-Stacks.” In 2010 IEEE

Computer Society Conference on and Pattern Recognition, IEEE,

2799–2806.

Gorodnitsky, I.F., and B.D. Rao. 1997. “Sparse Signal Reconstruction from Limited Data

Using FOCUSS: A Re-Weighted Minimum Norm Algorithm.” IEEE Transactions

on Signal Processing 45(3): 600–616.

He, Wenyun, Thomas A. Hamilton, Andrew R. Cohen, and Badrinath Roysam. 2003.

“Automated Three-Dimensional Tracing of Neurons in Confocal and Brightfield

Images.” Microscopy and Microanalysis 9(04): 296–310.

Hernandez, Zachery R, Jesus Cruz-Garza, Teresa Tse, and Jose L Contreras-Vidal. 2014.

“Decoding of Intentional Actions from Scalp Electroencephalography (EEG) in

Freely-Behaving Infants.” Conference proceedings : Annual International

98

Conference of the IEEE Engineering in Medicine and Biology Society. IEEE

Engineering in Medicine and Biology Society. Annual Conference 2014: 2115–18.

Hochberg, Leigh R., Daniel Bacher, Beata Jarosiewicz, and John P. Donoghue. 2012.

“Reach and Grasp by People with Tetraplegia Using a Neurally Controlled Robotic

Arm.” Nature 485(7398): 372–75.

Iacoboni, Marco. 2005. “Neural Mechanisms of Imitation.” Current opinion in

neurobiology 15(6): 632–37.

Ibanez, Luis, William Schroeder, Lydia Ng, and Josh Cates. 2003. “The ITK Software

Guide.” Kitware Inc.

Iordache, Marian-Daniel, Jose M. Bioucas-Dias, and Antonio Plaza. 2014. “Collaborative

Sparse Regression for Hyperspectral Unmixing.” IEEE Transactions on Geoscience

and Remote Sensing 52(1): 341–54.

Iordache, Marian-Daniel, José M. Bioucas-Dias, and Antonio Plaza. 2011a. “Sparse

Unmixing of Hyperspectral Data.” IEEE Transactions on Geoscience and Remote

Sensing 49(6): 2014–39.

Iordache, Marian-Daniel, José M. Bioucas-Dias, and Antonio Plaza. 2011b. “Sparse

Unmixing of Hyperspectral Data.” IEEE Transactions on Geoscience and Remote

Sensing 49(6): 2014–39.

Iordache, Marian-Daniel, José M. Bioucas-Dias, and Antonio Plaza. 2012. “Total

Variation Spatial Regularization for Sparse Hyperspectral Unmixing.” IEEE

Transactions on Geoscience and Remote Sensing 50(11): 4484–4502.

Jiang, Zhuolin, Zhe Lin, and Larry S Davis. 2013. “Label Consistent K-SVD: Learning a

99

Discriminative Dictionary for Recognition.” IEEE transactions on pattern analysis

and machine intelligence 35(11): 2651–64.

Jiang, Zhuolin, Zhe Lin, and Larry S. Davis. 2011. “Learning a Discriminative Dictionary

for Sparse Coding via Label Consistent K-SVD.” In IEEE Conference on Computer

Vision and Pattern Recognition (CVPR), Ieee, 1697–1704.

Jiménez, David, Demetrio Labate, Ioannis A Kakadiaris, and Manos Papadakis. 2014.

“Improved Automatic Centerline Tracing for Dendritic and Axonal Structures.”

Neuroinformatics: 1–18.

Jones, Susan S. 2009. “The Development of Imitation in Infancy.” Philosophical

transactions of the Royal Society of London. Series B, Biological sciences

364(1528): 2325–35.

Jun Liu, Shuiwang Ji, Jieping Ye. 2009. SLEP: Sparse Learning with Efficient

Projections. Arizona State University.

Keshava, N., and J.F. Mustard. 2002. “Spectral Unmixing.” IEEE Signal Processing

Magazine 19(1): 44–57.

Keshava, Nirmal. 2003. “A Survey of Spectral Unmixing Algorithms.” Lincoln

Laboratory Journal 14(1): 55–78.

Liao K., Xiao R., Gonzalez J., Ding L, Wolpert, DM. 2014. “Decoding Individual Finger

Movements from One Hand Using Human EEG Signals” ed. Wang Zhan. PLoS

ONE 9(1): e85192.

Liao, Yu, Zeynep Akalin Acar, Scott Makeig, and Gedeon Deak. 2015. “EEG Imaging of

Toddlers during Dyadic Turn-Taking: Mu-Rhythm Modulation While Producing or

100

Observing Social Actions.” NeuroImage 112: 52–60.

Lotte, F, M Congedo, A Lécuyer, and B Arnaldi. 2007. “A Review of Classification

Algorithms for EEG-Based Brain-Computer Interfaces.” Journal of neural

engineering 4(2): R1–13.

Lotte, Fabien, and Cuntai Guan. 2010. “Learning from Other Subjects Helps Reducing

Brain-Computer Interface Calibration Time.” In 2010 IEEE International

Conference on Acoustics, Speech and Signal Processing, IEEE, 614–17.

Lovisa, Sara, Valerie S LeBleu, Björn Tampe, and Raghu Kalluri. 2015. “Epithelial-to-

Mesenchymal Transition Induces Cell Cycle Arrest and Parenchymal Damage in

Renal Fibrosis.” Nature medicine 21(9): 998–1009.

Lu, Yanbin, Lawrence Carin, William Shain, and Badrinath Roysam. 2014. “Quantitative

Arbor Analytics: Unsupervised Harmonic Co-Clustering of Populations of Brain

Cell Arbors Based on L-Measure.” Neuroinformatics: 1–17.

Luisi, Jonathan, Arunachalam Narayanaswamy, Zachary Galbreath, and Badrinath

Roysam. 2011. “The FARSIGHT Trace Editor: An Open Source Tool for 3-D

Inspection and Efficient Pattern Analysis Aided Editing of Automated Neuronal

Reconstructions.” Neuroinformatics 9(2-3): 305–15.

Ma, Wing-Kin, Jose M. Bioucas-Dias, Tsung-Han Chan, and Chong-Yung Chi. 2014. “A

Signal Processing Perspective on Hyperspectral Unmixing: Insights from Remote

Sensing.” IEEE Signal Processing Magazine 31(1): 67–81.

Mairal, Julien, Francis Bach, Jean Ponce, and Guillermo Sapiro. 2009. “Online

Dictionary Learning for Sparse Coding.” In Proceedings of the 26th Annual

101

International Conference on Machine Learning - ICML ’09, New York, New York,

USA: ACM Press, 1–8.

Mairal, Julien, Jean Ponce, Guillermo Sapiro, and Francis R. Bach. 2009. “Supervised Dictionary Learning.” In Advances in Neural Information Processing Systems, 1033–40.

Mairal, Julien, Guillermo Sapiro, and Michael Elad. 2008. “Learning Multiscale Sparse Representations for Image and Video Restoration.” Multiscale Modeling & Simulation 7(1): 214–41.

Makeig, S. 1993. “Auditory Event-Related Dynamics of the EEG Spectrum and Effects of Exposure to Tones.” Electroencephalography and Clinical Neurophysiology 86(4): 283–93.

Mallat, S.G. 1993. “Matching Pursuits with Time-Frequency Dictionaries.” IEEE Transactions on Signal Processing 41(12): 3397–3415.

Mariano, Artur, Dongwook Lee, Andreas Gerstlauer, and Derek Chiou. 2013. “Hardware and Software Implementations of Prim’s Algorithm for Efficient Minimum Spanning Tree Computation.” In Embedded Systems: Design, Analysis and Verification, Heidelberg: Springer, 151–58.

Marshall, Peter J, Yair Bar-Haim, and Nathan A Fox. 2002. “Development of the EEG from 5 Months to 4 Years of Age.” Clinical Neurophysiology 113(8): 1199–1208.

Marshall, Peter J, and Andrew N Meltzoff. 2011. “Neural Mirroring Systems: Exploring the EEG μ Rhythm in Human Infancy.” Developmental Cognitive Neuroscience 1(2): 110–23.

Megjhani, Murad, Nicolas Rey-Villamizar, Amine Merouane, and Badrinath Roysam. 2015. “Population-Scale Three-Dimensional Reconstruction and Quantitative Profiling of Microglia Arbors.” Bioinformatics: btv109.

Meijering, Erik. 2010. “Neuron Tracing in Perspective.” Cytometry Part A 77(7): 693–704.

Meltzoff, A N. 1988. “Infant Imitation and Memory: Nine-Month-Olds in Immediate and Deferred Tests.” Child Development 59(1): 217–25.

Merouane, Amine, Nicolas Rey-Villamizar, Yanbin Lu, and Badrinath Roysam. 2015. “Automated Profiling of Individual Cell-Cell Interactions from High-Throughput Time-Lapse Imaging Microscopy in Nanowell Grids (TIMING).” Bioinformatics 31(19): 3189–97.

Mühl, Christian, Camille Jeunet, and Fabien Lotte. 2014. “EEG-Based Workload Estimation across Affective Contexts.” Frontiers in Neuroscience 8: 114.

Myatt, Darren R, Tye Hadlington, Giorgio A Ascoli, and Slawomir J Nasuto. 2012. “Neuromantic – from Semi-Manual to Semi-Automatic Reconstruction of Neuron Morphology.” Frontiers in Neuroinformatics 6: 4.

Narayanaswamy, Arunachalam, Yu Wang, and Badrinath Roysam. 2011. “3-D Image Pre-Processing Algorithms for Improved Automated Tracing of Neuronal Arbors.” Neuroinformatics 9(2–3): 219–31.

Nascimento, J.M.P., and J.M.B. Dias. 2005. “Vertex Component Analysis: A Fast Algorithm to Unmix Hyperspectral Data.” IEEE Transactions on Geoscience and Remote Sensing 43(4): 898–910.

Natarajan, B. K. 1995. “Sparse Approximate Solutions to Linear Systems.” SIAM Journal on Computing 24(2): 227–34.

Neher, Richard A, Miso Mitkovski, Frank Kirchhoff, and André Zeug. 2009. “Blind Source Separation Techniques for the Decomposition of Multiply Labeled Fluorescence Images.” Biophysical Journal 96(9): 3791–3800.

Nicolaou, N., and S. J. Nasuto. 2007. “Automatic Artefact Removal from Event-Related Potentials via Clustering.” The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology 48(1–2): 173–83.

Nicolich, Lorraine McCune. 1977. “Beyond Sensorimotor Intelligence: Assessment of Symbolic Maturity Through Analysis of Pretend Play.” Merrill-Palmer Quarterly.

Nolan, H, R Whelan, and R B Reilly. 2010. “FASTER: Fully Automated Statistical Thresholding for EEG Artifact Rejection.” Journal of Neuroscience Methods 192(1): 152–62.

Oostenveld, Robert, Pascal Fries, Eric Maris, and Jan-Mathijs Schoffelen. 2011. “FieldTrip: Open Source Software for Advanced Analysis of MEG, EEG, and Invasive Electrophysiological Data.” Computational Intelligence and Neuroscience 2011: 156869.

Ozdemir, Berna C, Tsvetelina Pentcheva-Hoang, Julienne L Carstens, and Raghu Kalluri. 2014. “Depletion of Carcinoma-Associated Fibroblasts and Fibrosis Induces Immunosuppression and Accelerates Pancreas Cancer with Reduced Survival.” Cancer Cell 25(6): 719–34.

Padmanabhan, Raghav K., Vinay H. Somasundar, Sandra D. Griffith, and William M. F. Lee. 2014. “An Active Learning Approach for Rapid Characterization of Endothelial Cells in Human Tumors.” Ed. Joseph Najbauer. PLoS ONE 9(3): e90495.

Parra, Lucas C, Clay Spence, Paul Sajda, and Klaus-Robert Müller. 2000. “Unmixing Hyperspectral Data.” In Advances in Neural Information Processing Systems (NIPS) 12: 942–48.

Pati, Y.C., R. Rezaiifar, and P.S. Krishnaprasad. 1993. “Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition.” In Proceedings of the 27th Asilomar Conference on Signals, Systems and Computers, IEEE Comput. Soc. Press, 40–44.

Patterson, G H, S M Knobel, W D Sharif, and D W Piston. 1997. “Use of the Green Fluorescent Protein and Its Mutants in Quantitative Fluorescence Microscopy.” Biophysical Journal 73(5): 2782–90.

di Pellegrino, G, L Fadiga, L Fogassi, and G Rizzolatti. 1992. “Understanding Motor Events: A Neurophysiological Study.” Experimental Brain Research 91(1): 176–80.

Peng, Hanchuan, Fuhui Long, and Chris Ding. 2005. “Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy.” IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8): 1226–38.

Peng, Hanchuan, Zongcai Ruan, Deniz Atasoy, and Scott Sternson. 2010. “Automatic Reconstruction of 3D Neuron Structures Using a Graph-Augmented Deformable Model.” Bioinformatics 26(12): i38–46.

Peng, Hanchuan, Jianyong Tang, Hang Xiao, and Fuhui Long. 2014. “Virtual Finger Boosts Three-Dimensional Imaging and Microsurgery as Well as Terabyte Volume Image Visualization and Analysis.” Nature Communications 5: 4342.

Perrin, F, J Pernier, O Bertrand, and J F Echallier. 1989. “Spherical Splines for Scalp Potential and Current Density Mapping.” Electroencephalography and Clinical Neurophysiology 72(2): 184–87.

Pineda, J A, B Z Allison, and A Vankov. 2000. “The Effects of Self-Movement, Observation, and Imagination on Mu Rhythms and Readiness Potentials (RP’s): Toward a Brain-Computer Interface (BCI).” IEEE Transactions on Rehabilitation Engineering 8(2): 219–22.

Ramirez, Ignacio, Pablo Sprechmann, and Guillermo Sapiro. 2010. “Classification and Clustering via Dictionary Learning with Structured Incoherence and Shared Features.” In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 3501–8.

Rizzolatti, Giacomo, and Laila Craighero. 2004. “The Mirror-Neuron System.” Annual Review of Neuroscience 27: 169–92.

Rouchdy, Youssef, Laurent D Cohen, Olivier Pascual, and Alain Bessis. 2011. “Minimal Path Techniques for Automatic Extraction of Microglia Extensions.” International Journal for Computational Vision and Biomechanics 4(1): 35–42.

Rouchdy, Youssef, and Laurent D. Cohen. 2013. “Geodesic Voting for the Automatic Extraction of Tree Structures. Methods and Applications.” Computer Vision and Image Understanding 117(10): 1453–67.

Rubinstein, Ron, Alfred M Bruckstein, and Michael Elad. 2010. “Dictionaries for Sparse Representation Modeling.” Proceedings of the IEEE 98(6): 1045–57.

Sardy, Sylvain, Andrew G. Bruce, and Paul Tseng. 2012. “Block Coordinate Relaxation Methods for Nonparametric Wavelet Denoising.”

Schmitt, Stephan, Jan Felix Evers, Carsten Duch, and Klaus Obermayer. 2004. “New Methods for the Computer-Assisted 3-D Reconstruction of Neurons from Confocal Image Stacks.” NeuroImage 23(4): 1283–98.

Scorcioni, Ruggero, Sridevi Polavaram, and Giorgio A Ascoli. 2008. “L-Measure: A Web-Accessible Tool for the Analysis, Comparison and Search of Digital Reconstructions of Neuronal Morphologies.” Nature Protocols 3(5): 866–76.

Sethian, J A. 1996. “A Fast Marching Level Set Method for Monotonically Advancing Fronts.” Proceedings of the National Academy of Sciences of the United States of America 93(4): 1591–95.

Shin, Younghak, Seungchan Lee, Junho Lee, and Heung-No Lee. 2012. “Sparse Representation-Based Classification Scheme for Motor Imagery-Based Brain-Computer Interface Systems.” Journal of Neural Engineering 9(5): 056002.

Starck, J.-L., Y. Moudden, J. Bobin, and D. L. Donoho. 2005. “Morphological Component Analysis.” In Optics & Photonics 2005, eds. Manos Papadakis, Andrew F. Laine, and Michael A. Unser. International Society for Optics and Photonics, 59140Q.

Streit, Wolfgang J. 2005. “Microglia and Neuroprotection: Implications for Alzheimer’s Disease.” Brain Research Reviews 48(2): 234–39.

Stroganova, T A, E V Orekhova, and I N Posikera. 1999. “EEG Alpha Rhythm in Infants.” Clinical Neurophysiology 110(6): 997–1012.

Teacher, A G F, and D J Griffiths. 2011. “HapStar: Automated Haplotype Network Layout and Visualization.” Molecular Ecology Resources 11(1): 151–53.

Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society, Series B 58(1): 267–88.

Tsai, C-L, J P Lister, W Shain, and B Roysam. 2011. “Robust, Globally Consistent and Fully Automatic Multi-Image Registration and Montage Synthesis for 3-D Multi-Channel Images.” Journal of Microscopy 243(2): 154–71.

Tsurui, H., H. Nishimura, S. Hattori, and T. Shirai. 2000. “Seven-Color Fluorescence Imaging of Tissue Samples Based on Fourier Spectroscopy and Singular Value Decomposition.” Journal of Histochemistry & Cytochemistry 48(5): 653–62.

Türetken, Engin, Fethallah Benmansour, Bjoern Andres, and Pascal Fua. 2014. “Reconstructing Curvilinear Networks Using Path Classifiers and Integer Programming.” IEEE Transactions on Pattern Analysis and Machine Intelligence.

Türetken, Engin, Germán González, Christian Blum, and Pascal Fua. 2011. “Automated Reconstruction of Dendritic and Axonal Trees by Global Optimization with Geometric Priors.” Neuroinformatics 9(2–3): 279–302.

Vasilkoski, Zlatko, and Armen Stepanyants. 2009. “Detection of the Optimal Neuron Traces in Confocal Microscopy Images.” Journal of Neuroscience Methods 178(1): 197–204.

Wall, W D. 1982. “Jean Piaget, 1896–1979.” Journal of Child Psychology and Psychiatry, and Allied Disciplines 23(2): 97–104.

Wang, Yijun, and Scott Makeig. 2009. “Predicting Intended Movement Direction Using EEG from Human Posterior Parietal Cortex.” 437–46.

Wang, Yu, Arunachalam Narayanaswamy, and Badrinath Roysam. 2011. “Novel 4-D Open-Curve Active Contour and Curve Completion Approach for Automated Tree Structure Extraction.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 1105–12.

Wang, Z., A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. 2004. “Image Quality Assessment: From Error Visibility to Structural Similarity.” IEEE Transactions on Image Processing 13(4): 600–612.

Xiao, Hang, and Hanchuan Peng. 2013. “APP2: Automatic Tracing of 3D Neuron Morphology Based on Hierarchical Pruning of a Gray-Weighted Image Distance-Tree.” Bioinformatics 29(11): 1448–54.

Xu, Yan, Murad Megjhani, William Shain, and Badrinath Roysam. 2016. “Unsupervised Profiling of Microglial Arbor Morphologies and Distribution Using a Nonparametric Bayesian Approach.” IEEE Journal of Selected Topics in Signal Processing 10(1): 115–29.

Yang, Jianchao, Kai Yu, and Thomas Huang. 2010. “Supervised Translation-Invariant Sparse Coding.” In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, 3517–24.

Yang, Shuyuan, HongHong Jin, Min Wang, and Licheng Jiao. 2014. “Data-Driven Compressive Sampling and Learning Sparse Coding for Hyperspectral Image Classification.” IEEE Geoscience and Remote Sensing Letters 11(2): 479–83.

Yang, Zuyuan, Yong Xiang, Shengli Xie, and Yue Rong. 2012. “Nonnegative Blind Source Separation by Sparse Component Analysis Based on Determinant Measure.” IEEE Transactions on Neural Networks and Learning Systems 23(10): 1601–10.

Zhang, Jin, Robert E Campbell, Alice Y Ting, and Roger Y Tsien. 2002. “Creating New Fluorescent Probes for Cell Biology.” Nature Reviews Molecular Cell Biology 3(12): 906–18.

Zhang, Lin, Lei Zhang, Xuanqin Mou, and David Zhang. 2011. “FSIM: A Feature Similarity Index for Image Quality Assessment.” IEEE Transactions on Image Processing 20(8): 2378–86.

Zhang, Qiang, and Baoxin Li. 2010. “Discriminative K-SVD for Dictionary Learning in Face Recognition.” In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2691–98.

Zhou, Wei, Ya Yang, and Zhuliang Yu. 2012. “Discriminative Dictionary Learning for EEG Signal Classification in Brain-Computer Interface.” In 12th International Conference on Control Automation Robotics & Vision (ICARCV), IEEE, 1582–85.