Analytic tools for content and shape analysis in 3D brain images

Pedro Alexander Díaz Quiroga Departamento de Ingeniería de Sistemas y Computación Universidad de los Andes

A thesis submitted for the degree of Master in Information Engineering (Maestría en Ingeniería de Información), August 2019


Acknowledgements

I would like to thank my thesis director, Professor Jose Tiberio Hernandez, for his guidance and advice during this thesis process. In the same way, I want to thank Dr. Nathalie Charpak for her constant support and answers to my concerns. I would also like to thank Dr. Jorge Marín, Dr. Diego Angulo and engineer Alejandra Castelblanco, who collaborated with me and framed the project within an even larger project of added value for industry and society. Finally, I thank my family.


Abstract

This document is the product of the author's project for the master's degree in information engineering. The project aims to add value to the information assets of the Canguro program (http://fundacioncanguro.co/) concerning the study of the development of premature patients and, specifically, the impact of different aspects of the environment on the brains of these patients. The project takes brain images of subjects of different ages, environments and genders and provides tools to visualize and group the subjects according to physical characteristics of their brain structures, such as volume and shape.


Contents

CHAPTER 1. INTRODUCTION

CHAPTER 2. GENERAL PROJECT DESCRIPTION

2.1 Objectives

2.2 Background

2.3 Problem Definition and Significance

CHAPTER 3. PROJECT STATEMENT AND SPECIFICATIONS

3.1 Problem description

3.2 Design description

3.3 Data profiling

CHAPTER 4. PROJECT DEVELOPMENT

4.1 Referential Framework

4.2 Design proposal

CHAPTER 5. IMPLEMENTATION

5.1 Description

5.2 Dependencies

5.3 Studied Parameters

CHAPTER 6. USE CASES

6.1 Application possibilities

6.2 Dimensionality reduction

6.3 Some Cases observed

CHAPTER 7. CONCLUSIONS

7.1 Future Work

REFERENCES


Chapter 1. Introduction

The purpose of this graduation project is to take advantage of the data that is available in the Canguro Foundation in Colombia, related to the brain images of patients taken during 20 years of follow-up since their premature birth. This data was exploited during the execution of the project, and the results were shown to medical experts for interpretation.

Exploitation of the data in this context means understanding it, transforming it into an adequate format and converting it into visual and statistical information for medical doctors to interact with. The characteristics of the data refer to the shape and content of the functional structures of the patients' brains. The work is based on the hypothesis that these characteristics are influenced by the environment in which the subject develops.

This document is structured as follows. First, a summary of the state-of-the-art context is presented; then the design proposal defined to achieve the project's objective is described; next, the elements that form part of the implemented solution are detailed; the case study, which has been designed in the simplest possible way, is described next; and finally the results and conclusions are presented.


Chapter 2. General project description

2.1 Objectives

Gather variables related to the volume, shape and content of brain structures and generate tools that support the analysis of experts, allowing them to create new hypotheses and/or generate new knowledge.

Present a summary of the review of the available tools in order to validate the value provided by the tools proposed in this project.

Contribute the computation of new variables related to the volume, shape and content of brain structures, such as the relative center of mass, and determine their possible contribution to the characterization of the study subjects.

Test the developed tools with expert users, using real data from the 20-year investigation of the kangaroo program in Bogotá.

2.2 Background

Several references serve as antecedents of this thesis. First, the following references suggest that the shapes of brain structures and functions can characterize subjects ([17] Sherbondy et al., 2005; [21] Wang, Qi et al., 2016; [1] Cabral, Joana et al., 2017). However, it is not easy to find a study that follows more than 100 subjects over a period longer than 3 years. Several factors make this kind of study difficult:

The process of taking brain images is not easy and requires time, money and patience from subjects and researchers.


Due to the so-called cerebral plasticity, the functions of the subject may not always be associated with certain structural forms. This makes the study complicated due to the large number of possibilities that may arise.

There are many possible variables to consider, which makes the problem hard to handle.

Many open tools already exist and have been refined. In many cases they present much more information than is actually used, either because the quantity and diversity of the information make a possible application hard to perceive, or because information engineers have not interacted with doctors and researchers to generate valuable knowledge from these sources.

There are references related to the comparison between the most important tools available, whose objective is to quantify the accuracy of the measurements. This is the case of ([16] Morey, Rajendra A et al., 2009), comparing structures such as the hippocampus and the amygdala.

Other references present techniques to characterize a 3D shape in brain research, such as the analysis of intracranial aneurysms ([15] Meuschke, Monique et al., 2018); tractography in the human brain ([17] Sherbondy et al., 2005); surface registration of cortical areas ([20] Tardif, Christine Lucas et al., 2015); cerebral microstructural subdivision ([7] Fischl, Bruce et al., 2018); analysis of the shape of the cerebral ventricles ([11] Gerig, G et al., 2001); and division of the brain based on connectivity ([21] Wang, Qi et al., 2016).

2.3 Problem Definition and Significance

The human brain is often described as the most complex system known. Much information can be extracted from it and used for a greater understanding of our nature. Understanding how the brain is affected by the factors that surround it will help to benefit it and therefore support human development.


This project studies the data from the Canguro Foundation ([30] http://fundacioncanguro.co/), whose mission is to apply science to humanize neonatology. The Canguro Foundation has a wide set of data on brain images and on medical, behavioral and social variables of patients with premature birth. The foundation was in charge of monitoring the patients who volunteered, gathering data during a 20-year period to enrich its scientific mission.

This valuable resource, unique in the world, merits the involvement of information engineers in an interdisciplinary group already composed of medical experts and scientific researchers.


Chapter 3. Project statement and specifications

3.1 Problem description

A friendly, interactive web platform for analyzing shape and content parameters of brain morphological structures, either for one individual or for groups of individuals, is currently not available.

3.2 Design description

Comparing parameters of shape and content in the morphology of the subjects' brains requires a preprocessing module that makes those parameters comparable by means of alignment, rotation, uniformization and displacement to a common reference system.

An adaptation layer for data processing is also required so that the information can be integrated into the analysis tool, which must be able to compare a large group of subjects as well as to examine the details of a single subject.

The platform must be scalable and integrable with other modules that perform other types of analysis, such as functional analysis, and must also be web accessible, so that it can be reached from any device, operating system and location.


3.3 Data profiling

The medical images of the brain produced by the measuring equipment come in a format called DICOM (Digital Imaging and Communications in Medicine), which contains the pixels of the image as well as a header with metadata and additional information. Each DICOM image contains one slice of the brain, that is, a single section of the 3D brain.

On the other hand, there is another format, NIFTI (Neuroimaging Informatics Technology Initiative), which represents all the slices of the brain; that is, the 3D brain image is obtained by stacking the individual slices on top of each other. Thus, a single NIFTI file can be generated from several DICOM files. The NIFTI format is easier to use because it contains the 3D information necessary for the analysis of volumetric images.
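As a minimal sketch of the stacking idea described above, the following Python snippet builds a 3D volume from a list of 2D slices. The slices are synthetic numpy arrays standing in for DICOM images; reading real DICOM files is not shown, and the function name `stack_slices` is hypothetical.

```python
import numpy as np


def stack_slices(slices):
    """Stack equally sized 2D slices into one 3D volume.

    The slice index becomes the last axis, so the result has
    shape (rows, cols, n_slices).
    """
    return np.stack(slices, axis=-1)


# Three synthetic 4x5 slices standing in for three DICOM images.
slices = [np.full((4, 5), fill_value=i, dtype=np.float32) for i in range(3)]
volume = stack_slices(slices)
print(volume.shape)  # (4, 5, 3)
```

Placing the slice index on the last axis is only one possible convention; the order chosen by a real conversion tool may differ.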

The medical images of the brain can be structural (MRI, magnetic resonance imaging) or functional (FMRI, functional magnetic resonance imaging). The difference between the two is that structural images have no time variable, while functional images follow the subject's brain over time. The structural images represent the morphology of the brain of the subjects under study.

Figure 3.1: Left, an example of functional MRI, which is acquired over time. Right, a structural MRI image, which does not take time into account and reveals structural details of the brain.


NIFTI images can be loaded in Python through the nibabel library and converted into 3D numpy arrays, which allows further processing.
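The loading step just described can be sketched as follows. This is an illustration under stated assumptions, not the code of the tool: the helper names `load_nifti_as_array` and `describe_volume` are hypothetical, and the demonstration uses a synthetic array so that it runs without an image file.

```python
import numpy as np


def load_nifti_as_array(path):
    """Load a NIFTI file and return (voxel array, affine matrix)."""
    # nibabel is imported here so the rest of the sketch runs without it.
    import nibabel as nib

    img = nib.load(path)                # reads the header; data is loaded lazily
    data = np.asarray(img.get_fdata())  # floating-point voxel array
    return data, img.affine


def describe_volume(data):
    """Return a few basic facts about a 3D voxel array."""
    return {
        "shape": data.shape,
        "min": float(data.min()),
        "max": float(data.max()),
    }


# Synthetic stand-in for a loaded brain volume.
demo = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
print(describe_volume(demo))
```

With a real file, the call would look like `data, affine = load_nifti_as_array("subject.nii.gz")`, where the file name is a placeholder.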

This project focuses on structural MRI images, which represent the morphological structure at a given time. In our case they correspond to the images of the young adults who participated in the kangaroo method because their birth was premature.

With preprocessing, images are “standardized”, that is, taken to the same reference system in size and content to allow comparison in these aspects. In our case this is done with the Freesurfer pipeline, although several other tools are available for this purpose.

3.3.1 Variables framework and profiling

Following the proposed model, the input variables that feed the tool are grouped into categories. The cataloging of the variables found for the specific case analyzed here (the Kangaroo subjects) is shown below.

Figure 3.2: Data model for this project

The more than 1500 variables taken during the previous kangaroo analysis were classified in order to catalog them within the above categories. Here is an example of the cataloging of these variables:


Figure 3.3: Data model for this project, which integrates an example of the variables.

Every variable represents certain characteristics of the subjects, for example their initial state, the intervention to which they were subjected (e.g. through the kangaroo program), their anatomical images and their intellectual performance. As a hypothesis, a certain relationship between the variables is expected; for example, if a subject had a poor initial state, the intervention to which the subject is exposed will alter the anatomical and performance results, compared to a subject that does not undergo such intervention.

This project deepens the study of the anatomical variables, analyzing in detail different features that could characterize a cerebral anatomical structure. Features related to the volume, shape and content of brain structures are considered interesting as a way to characterize them, and are therefore described in more detail below.


Subject anatomic result variables:

The characteristics that identify brain structures are listed below. The list includes those calculated with the Freesurfer pre-processing pipeline as well as those calculated within the present project by means of the tool:

Variable        Description                                Units          Calc By
NumVert         Number of Vertices                         unitless       Freesurfer
SurfArea        Surface Area                               mm^2           Freesurfer
GrayVol         Gray Matter Volume                         mm^3           Freesurfer
ThickAvg        Average Thickness                          mm             Freesurfer
ThickStd        Thickness StdDev                           mm             Freesurfer
MeanCurv        Integrated Rectified Mean Curvature        mm^-1          Freesurfer
GausCurv        Integrated Rectified Gaussian Curvature    mm^-2          Freesurfer
FoldInd         Folding Index                              unitless       Freesurfer
CurvInd         Intrinsic Curvature Index                  unitless       Freesurfer
Volume          Volume by number of voxels                 Voxels         Tool
SumOfContent    Sum of content voxels                      unitless       Tool
MaxOfContent    Max of content voxels                      unitless       Tool
MinOfContent    Min of content voxels                      unitless       Tool
MediaOfContent  Media of content voxels                    unitless       Tool
MeanOfContent   Mean of voxels content                     unitless       Tool
StdOfContent    Standard dev of voxels content             unitless       Tool
CenterOfMass    Center of mass of voxels content           Linear voxels  Tool
RCenterOfMass   Relative Center of mass of voxels content  Linear voxels  Tool

Volume:

Volume is the physical quantity, a scalar metric magnitude, defined as the extension in three dimensions of a region of space. It is a magnitude derived from length, since for a rectangular box it is found by multiplying length, width and height. Our concern here is the volume of the internal structures of the brain. Within this project, two types of volume were used.

Variable        Description                                Units          Calc By
GrayVol         Gray Matter Volume                         mm^3           Freesurfer
Volume          Volume by number of voxels                 Voxels         Tool
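The "Volume by number of voxels" variable can be illustrated with a short sketch. It assumes that the segmented image stores one integer label per voxel; the label value 1 and the function name `voxel_volume` are hypothetical and chosen only for this example.

```python
import numpy as np


def voxel_volume(labels, structure_label):
    """Count the voxels that carry the given segmentation label."""
    return int(np.count_nonzero(labels == structure_label))


# Synthetic 4x4x4 label image: a 2x2x2 block is marked as structure 1.
labels = np.zeros((4, 4, 4), dtype=np.int32)
labels[1:3, 1:3, 1:3] = 1
print(voxel_volume(labels, 1))  # 8
```

Multiplying the count by the physical size of one voxel would convert it to mm^3; with voxels of 1 mm per side the two numbers coincide.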

Shape:

In geometry, the shape of a physical object located in a space is a geometric description of the part of the space occupied by the object, as determined by its outer boundary, without taking into account its location and orientation in space, its size, or other properties such as color, content and material composition.


Within the previous catalog, the characteristics of curvature, surface area and thickness are considered specifically, with the following variables:

Curvature

Variable        Description                                Units          Calc By
MeanCurv        Integrated Rectified Mean Curvature        mm^-1          Freesurfer
GausCurv        Integrated Rectified Gaussian Curvature    mm^-2          Freesurfer
FoldInd         Folding Index                              unitless       Freesurfer
CurvInd         Intrinsic Curvature Index                  unitless       Freesurfer

Surface area

Variable        Description                                Units          Calc By
NumVert         Number of Vertices                         unitless       Freesurfer
SurfArea        Surface Area                               mm^2           Freesurfer

Thickness

Variable        Description                                Units          Calc By
ThickAvg        Average Thickness                          mm             Freesurfer
ThickStd        Thickness StdDev                           mm             Freesurfer

Content:

The word content has several dictionary meanings. We adopt a simple one that captures the characteristic we want to determine for each brain structure:

“Set of each one of the parts that consist in a unit”.

In our case, the content is equivalent to each of the units or voxels that make up the 3D image. Voxels are tiny cubes with an intensity value in the 3D image. The higher the voxel intensity value, the brighter the voxel appears, and vice versa.

Within this definition, the following variables describe the content of brain structures:

Variable        Description                                Units          Calc By
SumOfContent    Sum of content voxels                      unitless       Tool
MaxOfContent    Max of content voxels                      unitless       Tool
MinOfContent    Min of content voxels                      unitless       Tool
MediaOfContent  Media of content voxels                    unitless       Tool
MeanOfContent   Mean of voxels content                     unitless       Tool
StdOfContent    Standard dev of voxels content             unitless       Tool
CenterOfMass    Center of mass of voxels content           Linear voxels  Tool
RCenterOfMass   Relative Center of mass of voxels content  Linear voxels  Tool
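The scalar content variables in the table can be sketched with numpy as follows. This is an illustration, not the implementation of the tool: the function name `content_statistics` is hypothetical, the structure is selected with a boolean mask, reading MediaOfContent as the median is an assumption, and the population standard deviation is used because the source does not say which variant the tool computes.

```python
import numpy as np


def content_statistics(image, mask):
    """Summarize the intensities of the voxels selected by a boolean mask."""
    values = image[mask]
    return {
        "SumOfContent": float(values.sum()),
        "MaxOfContent": float(values.max()),
        "MinOfContent": float(values.min()),
        # Assumption: MediaOfContent is interpreted here as the median.
        "MedianOfContent": float(np.median(values)),
        "MeanOfContent": float(values.mean()),
        # Assumption: population standard deviation (numpy default, ddof=0).
        "StdOfContent": float(values.std()),
    }


# Synthetic image with a three-voxel "structure".
image = np.zeros((3, 3, 3))
mask = np.zeros((3, 3, 3), dtype=bool)
mask[1, 1, :] = True
image[1, 1, :] = [1.0, 2.0, 6.0]

stats = content_statistics(image, mask)
print(stats)
```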

Within the category of "content", two variables are proposed that are intended to characterize a complete brain structure:

CenterOfMass: Center of mass of voxels content

RCenterOfMass: Relative center of mass of voxels content

These variables are analogous to the physical center of mass of an object in the sense that they represent the average position of the “mass”, weighted by its amount, which in our case is the intensity of the voxels in the image, so the result is pulled towards regions with higher intensity values. This center of mass therefore depends on the shape, the number of curves of the structure, the volume and the content itself of the brain's internal structures. It thus summarizes all the characteristics that this project seeks to capture.

The center of mass in physics, in three dimensions, results from an independent calculation along each axis: a reference point is taken, and from it the distance to each mass is measured along that coordinate. Each mass is multiplied by its distance from the reference point, these products are added up, and the result is divided by the sum of the masses.
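Written out for the x axis, the description above corresponds to the standard formula below. The symbols are introduced here for illustration: m_i is the i-th mass and x_i its distance from the reference point along x; the same expression applies to the y and z axes.

```latex
x_{cm} = \frac{\sum_{i} m_i \, x_i}{\sum_{i} m_i}
```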

As an example, the center of mass of the following system was calculated along the “x” coordinate (the two masses are on the same horizontal line), taking the left line in the figure as the reference point.


Figure 3.4: Example of mass center calculation

The center of mass in the previous example is expressed in centimeters (the unit of distance from the reference point).

It can be deduced from the previous figure that the center of mass depends on both the distribution of the masses and the mass values themselves.
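A one-dimensional calculation of this kind can be reproduced with a few lines of Python. The masses and distances below are hypothetical values chosen for the illustration and are not those of Figure 3.4; the function name is also hypothetical.

```python
def center_of_mass_1d(masses, positions):
    """Mass-weighted mean position along one axis."""
    total_mass = sum(masses)
    weighted_sum = sum(m * x for m, x in zip(masses, positions))
    return weighted_sum / total_mass


# Hypothetical values (not those of Figure 3.4): two masses on one line.
masses = [2.0, 6.0]        # kg
positions = [10.0, 30.0]   # cm from the reference line
print(center_of_mass_1d(masses, positions))  # 25.0
```

The result lies closer to the heavier mass, which shows the dependence on both the positions and the mass values.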

CenterOfMass (Center of mass of voxels content)

The following figure illustrates how the “center of mass” is calculated for our 3D images. Suppose we want to calculate the center of mass of the hippocampus, highlighted in blue in the image. The hippocampus is taken from the image pre-processed and segmented by Freesurfer. A reference point is then chosen from which the variable is calculated (in our case, the (x, y, z) = (0, 0, 0) coordinates of the numpy array into which the image is transformed). Starting with one axis (for example, X), the sum of the voxel values in each section or slice perpendicular to that axis is multiplied by the distance of that slice from the reference point, measured as a number of voxels in the calculation direction. These products are added over all slices, and the total is divided by the total amount of "mass", which in our case is the sum of all the voxel values.


Figure 3.5: Example of mass center calculation for this project for different brain internal structures

The final result is a distance in voxels, measured from the reference point along the calculation coordinate (e.g. the “x” axis), which, as stated, represents the 3D figure in its dimensions, shape and content, since if any of these features is modified, the final result will also change.
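The slice-by-slice procedure described in this section can be sketched with numpy as follows. This is a minimal illustration under stated assumptions, not the code of the tool: the function name `center_of_mass` is hypothetical, the structure is selected with a boolean mask, and the image is synthetic.

```python
import numpy as np


def center_of_mass(image, mask):
    """Intensity-weighted mean voxel index of the masked structure, per axis.

    The reference point is index (0, 0, 0) of the array.
    """
    weights = np.where(mask, image, 0.0)
    total = weights.sum()
    center = []
    for axis in range(weights.ndim):
        # Sum the intensities of each slice perpendicular to this axis.
        other_axes = tuple(a for a in range(weights.ndim) if a != axis)
        slice_sums = weights.sum(axis=other_axes)
        # Distance of each slice from the reference point, in voxels.
        distances = np.arange(weights.shape[axis])
        center.append(float((slice_sums * distances).sum() / total))
    return tuple(center)


# Synthetic structure: three voxels along x, the last one brighter.
image = np.zeros((5, 5, 5))
mask = np.zeros((5, 5, 5), dtype=bool)
mask[1:4, 2, 2] = True
image[1:4, 2, 2] = [1.0, 1.0, 2.0]

print(center_of_mass(image, mask))  # (2.25, 2.0, 2.0)
```

The x coordinate is pulled from the geometric middle (2.0) towards the brighter voxel at x = 3. The function `scipy.ndimage.center_of_mass` performs a comparable calculation and could be used instead of the explicit loop.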

RCenterOfMass (Relative Center of mass of voxels content)

If the hippocampi of the test subjects differ in size, their centers of mass will also differ. For this reason the relative center of mass is proposed, defined as the distance between the center of mass (defined and calculated in the previous section) and the center of mass of an equivalent shape whose voxel intensity is constant and uniform, with a value of "1".

This removes the size of the structures from the comparison between subjects and directs it towards the deviation of each structure from a uniform reference structure with the same shape and volume. That is, this new measure is oriented towards characterizing the content only, unlike the center of mass, which is sensitive to volume, shape and content alike.


Put differently, the RCenterOfMass is a measure of how different the subject's image is from a completely flat image in terms of intensity values. If the subject's image had nearly constant values, this measure would be zero or very close to zero. The medical significance of this has not yet been discussed. Is great variation in this parameter to be expected for certain structures? Or, on the contrary, is great uniformity expected? These are questions we want to answer in this project.
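The definition of the relative center of mass can be sketched as follows. The text does not state whether the distance is reported per axis or as a single number, so this sketch returns both the per-axis offset and its Euclidean length; that choice, like the function names, is an assumption made for the illustration.

```python
import numpy as np


def weighted_centroid(weights):
    """Weighted mean voxel index along each axis."""
    grids = np.indices(weights.shape)
    total = weights.sum()
    return np.array([(g * weights).sum() / total for g in grids])


def relative_center_of_mass(image, mask):
    """Offset between the intensity-weighted and the uniform center of mass."""
    com = weighted_centroid(np.where(mask, image, 0.0))
    uniform = weighted_centroid(mask.astype(float))  # every voxel weighs 1
    offset = com - uniform
    return offset, float(np.linalg.norm(offset))


# Synthetic structure: three voxels along x, the last one brighter.
image = np.zeros((5, 5, 5))
mask = np.zeros((5, 5, 5), dtype=bool)
mask[1:4, 2, 2] = True
image[1:4, 2, 2] = [1.0, 1.0, 2.0]

offset, distance = relative_center_of_mass(image, mask)
print(offset, distance)

# A structure with constant intensity gives a distance of zero.
flat = np.where(mask, 7.0, 0.0)
print(relative_center_of_mass(flat, mask)[1])
```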

Canguro variables profiling:

The Canguro variables and the proposed variables are framed in the designed tools so that all of them can be seen in three different views. All the variables mentioned above have been profiled in order to understand them better and to see how they can be explored within the tool that shows the interrelations between them. Understanding each variable individually makes it easier to choose those to be analyzed together and to examine their interrelationships.

The profiles below analyze some of the most important variables of the Kangaroo study. Given the large number of variables, Annex A is used to extend this profiling. Here we show some variables that are important to highlight because they are key for further analysis within this project.

Python profiling was used to profile the variables of the kangaroo study, as shown below for some of them as examples:

Category: Subject initial state variables

SCB_agemother: Corresponds to the age of the mother at the moment of the baby's birth

Type            Numeric
Distinct count  22
Unique (%)      19.5%
Missing (%)     0.0%
Missing (n)     0
Infinite (%)    0.0%
Infinite (n)    0
Mean            28.283
Minimum         19
Maximum         40
Zeros (%)       0.0%

The previous histogram shows the numerical variable labeled Age of the mother (years). It has a roughly bimodal distribution with ages between 19 and 40 years, somewhat concentrated in the range of 19 to 30, and with a notable group around 30.
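The summary statistics shown in these profiles can be reproduced in outline with plain Python. The sketch below is not the profiling library used in the project; the function name and the sample values are hypothetical. The reported percentages are consistent with the distinct count divided by the number of observations (for example, 22 distinct values over 113 observations gives 19.5%), and the sketch follows that reading as an assumption.

```python
import math


def profile_numeric(values):
    """Compute a few summary statistics for a numeric column.

    Missing entries are represented by None.
    """
    n = len(values)
    present = [v for v in values if v is not None]
    finite = [v for v in present if not math.isinf(v)]
    distinct = len(set(present))
    return {
        "distinct_count": distinct,
        "unique_pct": 100.0 * distinct / n,
        "missing_n": n - len(present),
        "missing_pct": 100.0 * (n - len(present)) / n,
        "infinite_n": len(present) - len(finite),
        "mean": sum(finite) / len(finite),
        "minimum": min(finite),
        "maximum": max(finite),
        "zeros_pct": 100.0 * sum(1 for v in present if v == 0) / n,
    }


# Hypothetical ages, not the study data.
ages = [19, 22, 22, 30, 30, 30, 35, 40, None, 24]
profile = profile_numeric(ages)
print(profile)
```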

Category: Subject initial state variables

FOLLOW_Fragility_Rasch_746_2PL: Parameter of fragility (see footnote 1) of the baby at birth. The smaller or more negative the value, the lower the fragility, or the greater the robustness, of the baby

Type            Numeric
Distinct count  17
Unique (%)      15.0%
Missing (%)     0.0%
Missing (n)     0
Infinite (%)    0.0%
Infinite (n)    0
Mean            0.2142
Minimum         -1.1861
Maximum         1.6533
Zeros (%)       0.0%

1 Construction of a fragility index: As the initially randomized sample was slightly unbalanced 20 years later, instead of adjusting it by weight variables or propensity scores, we used a Rasch model to determine whether the groups differed at the start of the intervention with regard to general fragility (or vulnerability or limited development). A set of 15 binary indicators was selected to detect damage that might have occurred during pregnancy, birth or the neonatal period before randomization. Instead of a simple total of the indicators, the fragility index is based on individual factorial scores, on the assumption that a common latent variable measures the non-specific personal fragility of an infant.

The previous variable, labeled Fragility index 2PL including variables before randomization in the 746 infants from the original cohort, is numerical, with a distribution similar to the normal one and values ranging between -1.1 and 1.6. There are two ways to show this variable and relate it to the rest of the variables in the proposed heat map view. The first is as a membership variable, in which each value of the heat map belongs to a value of the variable (in this case fragility), which can be seen at the top of the map. The second is to integrate it into the heat map itself, for which the variable must undergo standardization and other treatments that allow it to be displayed with the color tones established for the range, which goes from negative numbers in blue to positive numbers in red.
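The standardization mentioned above can be sketched as a z-score transformation followed by a mapping to the two sides of a blue-to-red scale. The exact treatment applied by the tool is not described in the text, so the z-score, the function names and the sample values below are assumptions made for illustration.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)   # population standard deviation
    return [(v - mean) / std for v in values]


def color_side(z):
    """Map a standardized value to the side of a blue-to-red diverging scale."""
    if z < 0:
        return "blue"
    if z > 0:
        return "red"
    return "neutral"


# Hypothetical fragility-like values, not the study data.
values = [-1.0, 0.0, 1.0, 2.0]
z = standardize(values)
print([round(v, 3) for v in z])
print([color_side(v) for v in z])
```

After standardization the values have mean zero and unit standard deviation, so variables measured in different units can share one color scale.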

Category: Performance

WASI_PercRsngcompositescore

Type            Numeric
Distinct count  47
Unique (%)      41.6%
Missing (%)     0.0%
Missing (n)     0
Infinite (%)    0.0%
Infinite (n)    0
Mean            89.858
Minimum         52
Maximum         123
Zeros (%)       0.0%

WASI_PercRsngcompositescore is labeled PRI (Perceptual Reasoning Index Composite score). It is a variable that indicates the result of the perceptual reasoning test in the subjects at the age of 20 years. Its distribution has the form of a Gaussian bell, with some outliers.

Category: Subject initial state variables

BIRTH_apgar5_5

Type            Numeric
Distinct count  6
Unique (%)      5.3%
Missing (%)     0.0%
Missing (n)     0
Infinite (%)    0.0%
Infinite (n)    0
Mean            9.0442
Minimum         5
Maximum         10
Zeros (%)       0.0%

The Apgar score is a test to evaluate newborns shortly after birth. This test evaluates the baby's heart rate, muscle tone and other signs to determine whether the baby needs additional or emergency medical help.


Usually, the Apgar score (also known as the "Apgar test") is assessed twice: the first time one minute after birth, and again five minutes after birth. Sometimes, if the baby's physical condition is worrisome, the baby is evaluated a third time.

BIRTH_apgar5_5, labeled APGAR score at 5 minutes and shown above, concentrates its values around 9 (see footnote 2).

It is important to note that all the variables associated with each subject, which catalog the subjects and provide additional information regarding their initial status, the intervention to which they were subjected or their final performance (regardless of the variable type), may be visualized in the tool among the cataloging variables found in the upper part of the heat map, as shown in the following figure.

Figure 3.6: Variables related to the kangaroo study that can be represented at the top of the heat map

2 The word Apgar refers to "Appearance, Pulse, Irritability (Grimace), Activity and Breathing." In the test, these five factors are used to assess the health of the baby. Each factor is evaluated on a scale ranging from 0 to 2, with 2 being the highest possible score:

Appearance (skin color)
Pulse (heart rate)
Irritability (reflex response; from "Grimace" in English)
Activity (muscle tone)
Breathing (respiratory rate and respiratory effort)


A sample of the profiling of the most important variables that were considered for this project is included in Annex A.


Chapter 4. Project Development

4.1 Referential Framework

4.1.1 State of the art tools and platforms

In neuroimaging research there is increasing evidence that shape analysis of brain structures provides new information which is not available by conventional volumetric measurements. This motivates development of novel morphometric analysis techniques answering clinical research questions which have been asked for a long time but which remained unanswered due to the lack of appropriate measurement tools. The challenges are the choice of biologically meaningful shape representations, robustness to noise and small perturbations, and the ability to capture the shape properties of populations that represent natural biological shape variation.

The following is a compendium of the morphology measurement software tools for magnetic resonance imaging analysis that are currently available, as listed at:

https://omictools.com/morphometric-analysis-3-category

4.1.1.1 BoneJ

A plugin for bone image analysis in ImageJ. BoneJ provides free, open source tools for trabecular geometry and whole bone shape analysis. It calculates several trabecular, cross-sectional and particulate parameters in a convenient format. Java technology allows BoneJ to run on commodity computers, independent of scanner devices, fully utilising hardware resources. ImageJ’s plugin infrastructure provides a flexible working environment that can be tailored to diverse experimental setups. BoneJ is a working program and a starting point for further development, which will be directed by users’ requests and the emergence of new techniques.


Critical analysis: BoneJ is a multipurpose tool for analyzing the geometric characteristics of bones, but it does not focus on our main concern, brain images. Its main uses relate to oculocerebrorenal syndrome, fracture analysis, Marfan syndrome and pancreatic ductal carcinoma, among others. It is used mostly in the USA, Canada, Europe, Australia, Chile, Brazil and South Africa. It is a free, open source tool that can be downloaded at http://bonej.org/

Figure 4.1: Graphical output from BoneJ. (A) Trabecular thickness (Tb.Th) measured with the Thickness command. Yellow regions are thicker than blue regions. (B) Centroid and principal axes as calculated by Slice Geometry on a tomographic slice from the tarsometatarsal bone of a little spotted kiwi (Apteryx owenii). (C) Murine osteocyte lacunae imaged with synchrotron μCT and measured with Analyse Particles.

4.1.1.2 FSL

FSL allows users to work with magnetic resonance imaging (MRI) brain data. It can perform functional magnetic resonance imaging (FMRI) analysis (brain extraction, smoothing, statistics and registration). The tool can analyze a wide range of MR modalities (task FMRI, resting FMRI, ASL, diffusion, structure), and can be easily scripted and run on computing clusters.

Critical analysis: FSL was one of the most explored tools during this project, because it provides very important information regarding the characteristics that we wanted to analyze. FSL, like Freesurfer, provides a pre-processing pipeline that segments the images, which constitute the input of our tool. FSL is well established worldwide and has also been used in Colombia. The tool has been widely used in studies related to Alzheimer's disease, brain injuries, non-Hodgkin lymphoma, stroke and cerebral infarction, among others.

Figure 4.2: FSL GUI for graphical analysis

4.1.1.3 Brain Extraction Tool

BET removes non-brain tissue from an image of the whole head. It can also estimate the inner and outer skull surfaces, and the outer scalp surface, if the T1 and T2 input images are of good quality. It is very robust and accurate and has been tested on thousands of data sets from a wide variety of scanners and taken with a wide variety of magnetic resonance sequences. BET uses a deformable model that evolves to fit the brain's surface by the application of a set of locally adaptive model forces. The method is very fast and requires no preregistration or other pre-processing before being applied. BET takes about 5-20 sec to run on a modern desktop computer and is freely available as a standalone program that can be run from the command line or from a simple GUI, as part of FSL (FMRIB Software Library).

Critical analysis: BET, like FSL, was seriously considered for this project as a component of the pre-processing pipeline because it is lightweight and powerful. BET is a very mature tool that delivers clean brain images, which is a great help when starting any type of brain analysis. BET has been used worldwide and has contributed to studies similar to those mentioned for FSL.


Figure 4.3: Brain segmentation obtained with the Brain Extraction Tool

4.1.1.4 FracLac

FracLac offers a platform for fractal analysis and morphology functions. It is a module that can be used as an ImageJ or FIJI plugin. The application includes features such as: (i) analyzing complexity and heterogeneity; (ii) measuring difficult-to-describe geometrical forms; (iii) extracting patterns from several types of images for analysis. The software is suitable for images of biological cells as well as for other structures such as branching structures or known fractals.

Critical analysis: FracLac specializes in characterizing biological structures in general, which is precisely the objective of this project. Because it is general and not specialized in the brain, it would still require adjustments to be integrated into this project; however, it is an interesting tool with potential still to be exploited. In the Americas it is used only in Brazil and the USA; it is also used in a good part of Europe and Asia. The tool has been applied to problems related to neoplasms (similar to tumors), brain injuries and the detection of irregular forms in the human body.


Figure 4.4: FracLac for ImageJ to quantify cell complexity and shape. (a) Illustration of the FracLac box counting method to derive fractal dimension calculations of a microglia outline. (b) Illustration of the convex hull (blue), bounding circle (pink) and convex hull ellipse (orange) with accompanying longest length and width (dashed lines).

4.1.1.5 BrainVISA

Hosts heterogeneous tools dedicated to neuroimaging research. BrainVISA aims to help researchers in developing new neuroimaging tools, sharing data and distributing software. It offers a way to define viewers which may use any software. Thanks to its data management functions, the tool can define the data types handled by the software, associate key attributes for indexation, and filename patterns to make the link between the filesystem and the database schema.

Critical analysis: BrainVISA is widely used in the Americas (including Colombia), much of Europe, China and Australia. It is a tool that facilitates collaboration between neuroimaging researchers, which is exactly our area of concern. In more advanced stages of our process, it may be feasible to contribute within that environment. The tool has contributed to studies of Alzheimer's disease and glioma, among others.


Figure 4.5: BrainVISA IntrAnat module, a collection of software to manage and analyse anatomical (MRI, CT) and functional SEEG data (intracranial electrodes) of epileptic patients.

4.1.1.6 Mango (Multi-image Analysis GUI)

Automates regional behavioral analysis of human brain images. Mango provides analysis tools and a user interface to navigate image volumes. The tool is an easy-to-use, multi-platform Java application with extensive region-of-interest tools. It can add and update software as plugin modules and offers full access to a suite of image viewing and processing features. The software is able to rapidly determine regionally specific behaviors for researchers’ brain studies.

Critical analysis: Mango was another of the tools explored with some success in this project, owing to its ability to render realistic images of the brain in a web environment with the help of JavaScript. We considered adapting it for the volumetric rendering module of the brain, but finally decided to develop that module from scratch, since this made it easier to manipulate the segmented images according to the experts' requirements and, in the end, to adapt to their needs. Mango is used for many tasks related to the analysis of brain diseases and is used in much of Europe, the United States, Brazil and China.


Figure 4.6: Mango mobile-friendly research application for the Apple iPad. It features many of the same ROI and analysis tools as Mango and uses interoperable file formats and customization files such as ROIs and user-defined color tables.

4.1.1.7 Braviz (Brain Visualization)

Braviz is a Python library and a system with a graphical user interface that can be used to analyze brain data. The Braviz system is a collection of small applications tailored for specific tasks. The idea is that each application should be easy to understand and use. Nevertheless, the applications are connected to each other in several ways, which makes it possible to complete more complicated tasks. If none of the available applications fits the task at hand, a new application may be developed and integrated into the current system using Braviz as a library.

Critical analysis: Braviz is an in-house tool whose strength lies in its adaptation to the analysis problems that exist in Colombia. Braviz offers a wide range of subject-level analysis features, which allow individual characteristics to be analyzed in great detail. Its weaknesses are that the first version was not adapted to the web environment and that its group-level analysis needs to be strengthened. Braviz has been used in Colombia to study the brains of the Canguro subjects by a group of scientists from different parts of the world. The idea of the present project is precisely to contribute to a second version of Braviz.


Figure 4.7: An example of BRAVIZ running on a large display in a collaborative setting.

4.1.1.8 BrainSuite

Enables largely automated processing of magnetic resonance images (MRI) of the human brain. BrainSuite provides a sequence of low-level operations that can produce accurate brain segmentations in clinical time. It produces classified brain volumes that can be useful for quantitative studies of different regions of the brain. The tool consists of several modules that perform skull and scalp removal, nonuniformity correction, tissue classification, and object topology correction.

Critical analysis: BrainSuite is an interesting tool that should be explored in future work because of its image pre-processing capabilities; it appears to be friendlier than FreeSurfer and has strong analytical features. Unlike FreeSurfer and others, it runs on Windows, which allows more researchers to work with it. Scientists from Colombia, Argentina, China, Europe and the USA have used this tool.


Figure 4.8: BrainSuite magnetic resonance image analysis tools

4.1.1.9 FreeSurfer

FreeSurfer is a software package for the analysis and visualization of structural and functional neuroimaging data from cross-sectional or longitudinal studies. It is developed by the Laboratory for Computational Neuroimaging at the Athinoula A. Martinos Center for Biomedical Imaging. FreeSurfer is the structural MRI analysis software of choice for the Human Connectome Project.

Critical analysis: FreeSurfer is the basis for the pre-processing in this project because the data had already been pre-processed with this tool. FreeSurfer is a very mature and widely known tool in the field of brain imaging analysis. It has tools for analyzing structural brain data, and its pre-processing generates a large amount of related data; however, it does not have a web interface that would make it accessible to scientists from any place or system.


Figure 4.9: FreeSurfer GUI for 3D volume analysis of the brain's internal structures

4.1.1.10 Neurosynth

Neurosynth is a platform for large-scale, automated synthesis of functional magnetic resonance imaging (fMRI) data.

It takes thousands of published articles reporting the results of fMRI studies, chews on them for a bit, and then spits out images.

Critical analysis: Neurosynth is a very interesting tool that supports a very wide range of functional MRI analyses with great flexibility. It has a web interface that facilitates access by researchers, and it is open source, based on Python and JavaScript.

Figure 4.10: Neuroimaging, Reverse inference. A, when using Neurosynth for a traditional meta-analysis. B, when using Neurosynth in reverse inference mode. C, The same frontal areas that identified in a traditional meta-analysis for. D, In contrast, reverse inference for faces no longer identifies frontal cortex activity but rather locates the activation predicting highest probability for face percepts in the right FFA

4.1.1.11 MedInria

MedInria is a multi-platform medical image processing and visualization software. It is free and open-source. Through an intuitive user interface, medInria offers from standard to cutting-edge processing functionalities for medical images such as 2D/3D/4D image visualization, image registration, diffusion MR processing and tractography.

Critical analysis: MedInria is another very flexible, open source development that can be installed on various operating systems. It is oriented to the visualization of images and their processing: segmentation, registration, filtering and file management. It does not specialize in the characterization of shape and volume, which is our concern.

Figure 4.11: MedInria registration of two images

4.1.1.12 Invizian

Invizian lets users fly through and interact with hundreds of human brains to compare structural differences or carefully inspect individual specimens. It makes it possible to display and interact with hundreds of neuroimaging data sets at once on a personal computer, bringing together brain image data from some of the world’s best neuroscience research teams. Invizian empowers both researchers and students of neuroscience to explore and understand the human brain using a simple and powerful user interface for neuroimaging data exploration and discovery.

In this interface, cortical surfaces specific to each brain are positioned such that data sets whose neuroanatomy is most similar lie closest to one another, whereas brains with the most different cortical anatomy are positioned furthest away from one another. This creates a “brain cloud” based on their neuroanatomical similarity. Users may use their mouse to navigate through the space and then perform meta-data searches highlighting specific brains. Any brain may be clicked to provide interaction with a high-resolution version of the surface. Brains may be color-coded according to specific attributes or regional metrics. Brains may be grouped and systematically compared using a variety of data mining tools.

Critical analysis: Invizian has the great advantage of allowing the multivariate analysis of a large number of subjects by means of data mining techniques, which addresses a shortcoming we wanted to solve. It allows observing the whole set of subjects in a single view and selecting a specific one for detailed analysis, which is consistent with the Visual Information-Seeking Mantra of Shneiderman. Another advantage is that it can take time into account in the analyses, showing the evolution of the subjects and making it possible to link that evolution with intervention variables. The only flaw we can mention is that it does not offer a web option; even so, it is among the tools we found that are most closely oriented to our objective.


Figure 4.12: Invizian t-SNE view, color coded by age. The t-distributed stochastic neighbor embedding (t-SNE) algorithm reduces the number of dimensions for each scan in the input data set to two while preserving the local structure of the data.

4.1.1.13 Bids.neuroimaging.io

Neuroimaging experiments result in complicated data that can be arranged in many different ways. So far there is no consensus on how to organize and share data obtained in neuroimaging experiments. Even two researchers working in the same lab can opt to arrange their data in a different way. Lack of consensus (or a standard) leads to misunderstandings and time wasted on rearranging data or rewriting scripts that expect a certain structure. BIDS describes a simple and easy to adopt way of organizing neuroimaging and behavioral data.

Critical analysis: BIDS is a specification for organizing neuroimaging data in a standard way, so that all researchers can share their data in a common, organized structure; this makes the data easier to understand and simplifies the interfaces used to interact with them. The objective of BIDS is to reduce the time needed to exploit the data by unifying their structure under a standard. It is another initiative we should join, since it promotes a collaborative, standardized environment that saves time and facilitates collaboration.
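To illustrate the kind of structure BIDS prescribes, the following minimal sketch builds the path of a T1-weighted image from a subject label and an optional session label, following the `sub-<label>/[ses-<label>/]anat/` naming pattern of the specification. The function name `bids_t1w_path` and the dataset folder `canguro_bids` are hypothetical and are not part of this project.

```python
from pathlib import PurePosixPath


def bids_t1w_path(root, subject, session=None):
    """Build a BIDS-style path for a T1-weighted anatomical image.

    Hypothetical helper for illustration only.
    """
    entities = [f"sub-{subject}"]
    folder = PurePosixPath(root) / f"sub-{subject}"
    if session is not None:
        # The session level is optional in BIDS.
        entities.append(f"ses-{session}")
        folder = folder / f"ses-{session}"
    filename = "_".join(entities) + "_T1w.nii.gz"
    return folder / "anat" / filename


print(bids_t1w_path("canguro_bids", "001"))
# canguro_bids/sub-001/anat/sub-001_T1w.nii.gz
```

Because every dataset follows the same pattern, a script written for one study can locate the images of another without being rewritten.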


Figure 4.13: BIDS format example showing how folders are organized

4.1.1.14 SPHARM

SPHARM is a software application to perform 3-dimensional spherical harmonic analyses of triangular mesh surfaces. Spherical harmonic analyses are a 3D extension of Fourier analyses that generate a 3D mathematical model of an object's surface.

To construct a spherical harmonic model of an object using SPHARM, a triangular mesh representation of the object and a set of landmarks are needed to define the object. The triangular mesh consists of a dense coverage of points (i.e., vertices) on the object's surface and lines connecting the points to form a complete set of triangles (i.e., faces) that define the surface. The landmarks are used to orient and register a series of objects relative to one another so that they can be compared.

Critical analysis: SPHARM is another of the tools explored in the present project because it was found that it could characterize brain structures. The characterization describes the three-dimensional shape mathematically by its Fourier components, much as Fourier analysis characterizes a wave by decomposing it into sinusoidal components. In this way, the volume and the surface of the structure are uniquely characterized. This would be equivalent to having one characterization for the volume plus one for the surface.
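As a reminder of what such a description looks like, the usual spherical harmonic expansion of a closed surface can be written as below. This is the standard textbook form, given here as an assumption about the formulation; the notation ($\mathbf{v}$, $\mathbf{c}_l^m$, $L$) is ours and is not taken from the SPHARM documentation.

```latex
% Surface point as a function of the spherical angles (theta, phi),
% truncated at maximum degree L.
\mathbf{v}(\theta,\phi)
  = \sum_{l=0}^{L} \sum_{m=-l}^{l} \mathbf{c}_l^m \, Y_l^m(\theta,\phi),
\qquad
\mathbf{c}_l^m = \left( c_{x,l}^m,\; c_{y,l}^m,\; c_{z,l}^m \right)^{T}
```

Here $Y_l^m$ are the spherical harmonic basis functions and the coefficients $\mathbf{c}_l^m$ play the role that the sinusoidal amplitudes play in a one-dimensional Fourier series; a larger $L$ captures finer surface detail.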

Figure 4.14: SPHARM Algorithms to automatically quantify the geometric similarity of anatomical surfaces.

4.1.1.15 BrainBrowser

BrainBrowser is an open source JavaScript library exposing a set of web-based 3D visualization tools primarily targeting neuroimaging. Using open web-standard technologies, such as WebGL and HTML5, it allows for real-time manipulation and analysis of 3D imaging data through any modern web browser. The BrainBrowser Surface Viewer is a WebGL-based 3D viewer capable of displaying 3D surfaces in real-time and mapping various sorts of data to them. The BrainBrowser Volume Viewer is an HTML5 canvas-based viewer allowing slice-by-slice traversal of MINC, NIfTI and MGH/FreeSurfer volumetric data.

Critical analysis: BrainBrowser is one of the key tools for this work, since it was a fundamental guide for the module that renders realistic three-dimensional brain structures. BrainBrowser can display images in different formats through a web interface, one of the fundamental requirements of this project. In the end, we decided to render the volumes with the three.js library after processing the images with Python.


Figure 4.15: Brainbrowser web-enabled brain surface viewer that allows the user to explore in real time a 3D brain map expressed on a base surface. Typically, this map might be a statistical map derived from a group analysis of functional or structural imaging

4.1.1.16 Vtkweb

The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing, and visualization. Its implementation consists of an ES6 JavaScript class library that can be integrated into any web application. The toolkit leverages WebGL and supports a wide variety of visualization algorithms including scalar, vector, tensor, texture, and volumetric methods. VTK is part of Kitware’s collection of commercially supported open-source platforms for software development.

Critical analysis: Like BrainBrowser, VTK is a tool for image processing and visualization that allows realistic images to be rendered and manipulated. VTKweb is based on vtk.js, a web library for rendering 3D objects that is widely used in highly detailed medical and scientific applications.


Figure 4.16: Visualization Toolkit, cross-platform tool, VTK view

4.1.1.17 ANTS

Advanced Normalization Tools (ANTS) is an ITK-based suite of normalization, segmentation and template-building tools for quantitative morphometric analysis. Many of the ANTS registration tools are diffeomorphic, but deformation (elastic and BSpline) transformations are available. Unique components of ANTS include multivariate similarity metrics, landmark guidance, the ability to use label images to guide the mapping and both greedy and space-time optimal implementations of diffeomorphisms. The symmetric normalization (SyN) strategy is a part of the ANTS toolkit as is directly manipulated free form deformation (DMFFD).

Critical analysis: ANTS is a tool comparable to FreeSurfer. It handles many image pre-processing tasks, such as segmentation, registration (to bring images into a common reference) and co-registration (registration between images of the same subject to align them to a common point). This tool is the basis for many others that build on its preprocessing pipeline and allow the resulting data to be analyzed in detail later.


Figure 4.17: ANTS GUI for image registration and parcellation

4.1.1.18 Mindboggle

The Mindboggle project's mission is to improve the accuracy, precision, and consistency of automated labeling and shape analysis of human brain image data. Mindboggle promotes open science by making all software, data, and documentation freely and openly available.

Mindboggle’s open source brain morphometry platform takes in preprocessed T1-weighted MRI data, and outputs volume, surface, and tabular data containing label, feature, and shape information for further analysis. The software runs on Linux and is written in Python 3 and Python-wrapped C++ code called within a Nipype pipeline framework.

Critical analysis: Mindboggle was one of the tools considered for the present project because it can extract volume, shape and surface information from the output of other pre-processing pipelines (something that the present project does with its own development). This information, extracted from FreeSurfer and/or ANTs, makes it possible to contrast the two options and correct one or the other by manual means. Mindboggle also generates realistic interactive visualizations, as well as statistical visualizations of the group of subjects studied. In the end, this tool was not adopted, in order to keep greater control of the data in this project.

Figure 4.18: Mindboggle web GUI, named ROYGBIV

4.1.1.19 Discussion of the state of the art

A large number of platforms are available for our area of concern, namely the analysis of morphological brain images together with information about the subjects studied. Many of them are powerful and very competent within their specialty. However, a tool that brings both things together is not easy to find.

It is highly desirable to have a tool that allows several of them to be integrated and that also allows the categorical information of the Kangaroo method subjects to be interrelated in a flexible and friendly way, so that the tool can continue to evolve according to the needs of the experts.

4.1.2 Available Information for this project

4.1.2.1 Data Set


The available data come from the exams performed on the patients of the Canguro program. Brain images are available both at rest and during the performance of different tasks proposed for the functional study of the subjects’ brains. These images had already been pre-processed in FreeSurfer for analysis in the Braviz 1 project. Demographic information about the subjects is also available.

This data is held in different formats:

• 3D images in .mgz format, where the meaning of the voxels varies according to the processing stage. For example, the voxel values may result from a first stage of intensity normalization of the original image. They may also be the product of segmentation, in which case the values correspond to different brain structures; that is, a given voxel value represents a given morphological structure.

• Plain text files produced by the FreeSurfer pre-processing, which contain the volumetric, surface and shape data, such as the different types of curvature that FreeSurfer generates for the brain's internal structures.

• Excel files with the demographic data of the subjects.
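To make the idea of a segmented image concrete, the sketch below counts the voxels that carry each label in a small three-dimensional array and converts the counts to volumes. It is only an illustration under stated assumptions: the array is a toy nested list rather than an image loaded from a .mgz file, the label values 17 and 53 are used purely as examples, and the function name `structure_volumes` is hypothetical.

```python
from collections import Counter


def structure_volumes(label_volume, voxel_volume_mm3=1.0):
    """Return {label: volume in mm^3} for a 3D nested list of integer labels."""
    counts = Counter(
        label
        for plane in label_volume
        for row in plane
        for label in row
        if label != 0  # 0 is treated as background in this sketch
    )
    return {label: n * voxel_volume_mm3 for label, n in counts.items()}


# Toy 2 x 2 x 3 segmentation; each integer stands for one structure.
toy_segmentation = [
    [[0, 17, 17], [0, 53, 0]],
    [[17, 17, 0], [53, 53, 0]],
]
volumes = structure_volumes(toy_segmentation, voxel_volume_mm3=1.0)
print(volumes)  # {17: 4.0, 53: 3.0}
```

With real data the array would come from the segmented image (for example, read with Nibabel) and the voxel volume from the image header.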

4.1.2.2 Preprocessed data

As mentioned in 4.1.2.1, the data used in the project are taken after the FreeSurfer pre-processing phase, since we want to take advantage of the extensive shape and volume analysis that FreeSurfer performs.

The amount of information that FreeSurfer generates is large, and much of it goes unused because in many cases it is redundant and in others the interactive FreeSurfer tool is not as friendly as required.


For this reason, it is important to provide a flexible and simple tool that gives experts an overview of all the available information and also lets them select the variables they want to analyze in detail.

4.1.3 What is missing

Today we already have tools that generate a good amount of information related to the shape and volume of the morphological structures and, indirectly, their content. This is the case of FreeSurfer, which generates an extensive set of calculations related to shape and volume. What is not so easy to find is a web tool that gathers all this information, facilitates its analysis, and can also focus on the parameters studied in shape and content.

4.2 Design proposal

The design takes into consideration the data science lifecycle, which consists of a continuous cycle of phases with feedback, where the final objective is to solve a business problem. In our case, the problem is to provide the experts with knowledge about which factors of the Canguro program have the greatest impact on the patients’ brains.

4.2.1 Data framework detail

To obtain the greatest benefit from the proposed tools, the following data model is proposed. As stated above, it revolves around the analysis of the morphometric characteristics (volume, content and shape of the brain's internal structures) of the study subjects. This model structures the data as follows:

Data variable categories:

Subject initial state variables: Describe the initial state of the subjects before the analysis

Subject intervention variables: Indicate some type of intervention on the subjects

Subject anatomic result variables: Volumes, content and shape of internal brain structures

Subject performance variables: Describe the performance of the subject after the intervention period
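The four categories above can be pictured as one record per subject. The following is a minimal sketch of such a record, not the data structure actually used in the project; the class name `SubjectRecord`, every field name and every value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubjectRecord:
    """One subject, with variables grouped by the four proposed categories."""

    subject_id: str
    initial_state: Dict[str, float] = field(default_factory=dict)
    intervention: Dict[str, float] = field(default_factory=dict)
    anatomic_result: Dict[str, float] = field(default_factory=dict)
    performance: Dict[str, float] = field(default_factory=dict)

    def feature_vector(self, names: List[str]) -> List[float]:
        """Return the anatomic result variables in a fixed order."""
        return [self.anatomic_result[name] for name in names]


# Made-up values, for illustration only.
record = SubjectRecord(
    subject_id="S001",
    initial_state={"gestational_age_weeks": 32.0},
    intervention={"intervention_days": 20.0},
    anatomic_result={"left_hippocampus_mm3": 3900.0, "right_hippocampus_mm3": 4000.0},
    performance={"test_score": 95.0},
)
print(record.feature_vector(["left_hippocampus_mm3", "right_hippocampus_mm3"]))
# [3900.0, 4000.0]
```

Keeping the categories separate makes it straightforward to build the subject-by-feature matrix from the anatomic variables while retaining the other categories for grouping and color coding.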

The following represents the relationship between the different categories of variables and how they represent a state over time of the subjects. This scheme serves to understand how these variables describe the evolution of the subjects, showing the response to the initial and intervention variables.

Figure 4.19: Model of the Kangaroo study variables, which categorizes them for easier understanding.

The previous data model sits within a framework that leverages computer-based visualization, defined by [2] as: “computer-based visualization systems provide visual representations of datasets intended to help people carry out some task more effectively”.


Figure 4.20: Visual analytics framework proposed by Tamara Munzner that guides this project

According to the framework proposed by Munzner, the following are the basic premises we have to consider to fulfill the above definition in the present project.

• Human in the loop needs the details: Experts need to get information about the volume, shape and content of the subjects’ brain structures

• External representation: perception vs cognition: Choose the best channel to provide the information to experts

• Intended task: Determine the possible contribution of proposed variables in the characterization of subjects

• Measurable definitions of effectiveness: Validation of proposed tools against some known case

The interpretation of the data (volume, surface and content of the brain's internal structures) will be carried out by the experts based on the visual information that is delivered to them. The following diagram shows the three different representations of data proposed in this project:

Figure 4.21: Scheme of the proposed tools for this project

The information of volume, surface and content of internal brain structures is then represented as follows:

1. Heat map view that represents the values of the variables and shows where the largest or smallest values of the different variables are. The proposed map plots the subjects against their proposed features. This map allows observing clusters based on the cosine distance between full feature vectors or subject vectors. Statistical information that quantifies cluster membership is also provided.

2. Numerical information that complements the quantification carried out with the heat map is provided in a second screen. A histogram is also provided that shows the distribution of the content of any brain structure for several subjects, so that they can be compared.

3. Realistic volumetric representation of the analyzed structure where one or more subjects can be superimposed for comparison by the experts in a third screen.
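The histogram mentioned in the second view can be sketched as follows. This is a minimal illustration, assuming that the "content" of a structure is the list of voxel intensities inside it and that all subjects share the same bin edges so that their histograms are comparable; the function name `histogram` and the intensity values are hypothetical.

```python
def histogram(values, n_bins, lo, hi):
    """Count values into n_bins equal-width bins over the range [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # A value equal to hi is placed in the last bin.
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
    return counts


# Made-up voxel intensities of one structure in two subjects.
subject_a = [10, 12, 20, 35, 55, 60, 80, 100]
subject_b = [5, 15, 25, 45, 50, 70, 90, 95]

print(histogram(subject_a, n_bins=4, lo=0, hi=100))  # [3, 1, 2, 2]
print(histogram(subject_b, n_bins=4, lo=0, hi=100))  # [2, 2, 2, 2]
```

Using common bin edges is the design choice that matters here: it lets the two distributions be overlaid bin by bin.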

The tool was built following the Visual Information-Seeking Mantra of Shneiderman [1], which summarizes many visual design guidelines and provides an excellent framework for designing information visualization applications. The mantra dictates the following steps:

1. Overview: Have a complete overview of the characteristics of all study subjects

2. Zoom and Filter: Filter out an individual or a group of subjects to see the remaining ones in more detail

3. Details on demand: Ability to see the detail of a single subject or a few.


The design allows taking advantage of the visual design guidelines and provides a common and consistent framework to carry out the research process by the experts.

Figure 4.22: Scheme of the proposed tools for this project under the Visual Information-Seeking Mantra of Shneiderman

At the overview level, a large number of subjects and their morphometric characteristics can be observed in a single view, which makes it possible to detect outliers, clusters, and the lowest or highest values. For this, the data are passed through a standardizing function that maps them to a common range [-10, 10], regardless of their real magnitude. The mapped values mimic the behavior of the original data but in a common range, so that they can be analyzed by the eye of the expert.
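The exact standardizing function is not spelled out in the text, so the following is only a sketch under an assumption: a linear (min-max) rescaling of each feature to [-10, 10]. The function name `rescale` and the sample volumes are hypothetical.

```python
def rescale(values, new_min=-10.0, new_max=10.0):
    """Linearly map a list of numbers onto [new_min, new_max].

    Assumption: min-max scaling; the project's actual function may differ.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no contrast; place it at the midpoint.
        midpoint = (new_min + new_max) / 2.0
        return [midpoint for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


volumes_mm3 = [3000.0, 3500.0, 4000.0]  # made-up values
print(rescale(volumes_mm3))  # [-10.0, 0.0, 10.0]
```

A linear map keeps the ordering and relative spacing of the original values, which matches the stated goal of imitating the behavior of the data in a common range.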

The overview contains the following views:

• Subjects-Features: Allows grouping subjects of similar or opposite characteristics according to their features

• Features-Features: Allows grouping features of similar or opposite characteristics (which provide similar or opposite information, useful for dimension reduction)

• Subjects-Subjects: Allows focusing on subjects to analyze those that have similar or opposite characteristics

All the previous views can be grouped into clusters using the cosine distance of the vectors resulting from considering the values of the variables in the rows or columns, which generates data groupings both vertically and horizontally according to the need of the experts.
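As an illustration of the grouping criterion, the sketch below computes the cosine distance between rows of a small subject-by-feature matrix; transposing the matrix gives the feature-by-feature case. The helper names and the values are hypothetical, and the clustering step itself is left out.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v): 0 means same direction, 2 opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(rows):
    """Pairwise cosine distances between the rows of a matrix."""
    return [[cosine_distance(a, b) for b in rows] for a in rows]


# Rows are subjects, columns are standardized features (made-up values).
subjects = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],     # same profile as the first subject, larger magnitude
    [-1.0, -2.0, -3.0],  # opposite profile
]
by_subject = distance_matrix(subjects)

# Transposing the matrix compares features instead of subjects.
features = [list(column) for column in zip(*subjects)]
by_feature = distance_matrix(features)
```

In this toy matrix the first two subjects are at a distance close to 0 and the first and third at a distance close to 2, which is how the views separate "similar" from "opposite" characteristics.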

The zoom and filter feature makes it possible to filter subjects or features in the cluster view. Users can select those they want to analyze in detail and still apply the clustering criterion to observe similarities, differences and outliers.

For details on demand, two screens were created that allow selecting specific subjects and brain structures and viewing their volume and content in detail by means of the statistical characterization of their components. It is possible to compare two or more subjects and see whether they resemble each other in terms of their "center of mass" characteristic.

Finally, it is possible to see the realistic representation of the selected structures in the subjects chosen for a volumetric analysis by the experts.

The following diagram shows the independent processes we worked on to integrate the data into the framework and model already described.


Figure 4.23: Block diagram of the components of this project

The data coming from the processing pipeline arrive in very diverse formats, such as unstructured text files, semi-structured text files, segmented images or raw images. All these data sources go to the data parsing module (1), which locates the data of interest and sends them along two routes. The first standardizes the data (2) for later processing (3) and visual display in the heat map (4). The second computes new descriptive variables (5), which are later transformed (6) into the types and ranges expected by the visualization tools already described; these tools show the numerical detail of those descriptive variables (7) as well as the realistic views of the volumes in question (8).


Figure 4.24: Standardization function to adapt the data to the visualization

The following image represents the view of the variables categorized according to the established model and their visual relationship in the tools proposed for the analysis.

Figure 4.25: Data model and visualization tools proposed


It is observed that the tools proposed under Munzner's guidelines are associated with the established model of variables, so that the three proposed views show the possible relationships between the variables and let experts interact with them and take full advantage of them.

The following figure shows the general interaction model that experts have with the variables and particularly with the features of interest of this project:

Figure 4.26: Scheme that shows experts interacting with the data model and visualization tools

Beyond the central task proposed in this project, the proposed tools can be used more generally to solve research questions that the experts raise. A case of the Kangaroo Mother Care (KMC) Method will be used to test with experts the ability of the tool to generate insights.

The tool will then be a support in the scientific cycle of data as shown in the figure above.


Chapter 5

Implementation

Version 2 of Braviz follows the same philosophy as Braviz 1 in that it treats the user as the protagonist of the scientific process and therefore provides additional tools that facilitate the user's access. The first premise is that it should be a web tool that the user can access from any machine without installing any software component, which shortens time to production and eliminates startup problems. The second is that the modular architecture was maintained; it facilitates collaboration in joint development by establishing a main framework that communicates with the modules through defined interfaces.

A very important feature that Braviz 2 had to address was providing tools for comparing subjects, so that patterns could be found that allow subjects to be grouped according to the characteristics analyzed.

5.1 Description

The content and volume analysis module has a 3-layer architecture: a GUI, an adaptation layer, and a data extraction layer:

GUI: This layer allows easy interaction between the user and the application. This layer shows the different possibilities that the user has, as well as the reports generated for the subjects and their basic individual properties. It also provides a tool to analyze the relevant statistical measures of the sample and of the individual subjects.

Adaptation layer: Written in Python, this layer adapts the consolidated data so that it can be displayed appropriately. It transforms the data to uniform ranges so that values can be compared. It also performs the necessary processing on the data, either when the user requests it or as a pre-calculation in other cases, and it performs data checking and cleaning.

Data extraction layer: Transforms the data of interest into the appropriate format for further processing and visualization. Most of the data are 3D images; other data come from plain text files produced by the FreeSurfer pre-processing.
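To give an idea of what extracting data from such plain text files involves, the sketch below parses a small whitespace-separated table whose comment lines start with `#` and whose column names are announced on a `# ColHeaders` line. The layout is modelled on FreeSurfer segmentation statistics files; treating the project's files this way is an assumption, the function name `parse_stats_table` is hypothetical, and the sample values are made up.

```python
def parse_stats_table(text):
    """Parse a whitespace-separated table with '#' comment lines.

    Column names are read from the '# ColHeaders ...' comment line.
    """
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.lstrip("#").split()
            if parts and parts[0] == "ColHeaders":
                columns = parts[1:]
            continue
        if columns is None:
            raise ValueError("data row found before the ColHeaders line")
        rows.append(dict(zip(columns, line.split())))
    return rows


# Made-up sample in the assumed layout.
sample = """\
# Title Segmentation Statistics
# ColHeaders Index SegId NVoxels Volume_mm3 StructName
  1  17  4000  4000.0  Left-Hippocampus
  2  53  4100  4100.0  Right-Hippocampus
"""

table = parse_stats_table(sample)
volume_by_structure = {row["StructName"]: float(row["Volume_mm3"]) for row in table}
print(volume_by_structure)
# {'Left-Hippocampus': 4000.0, 'Right-Hippocampus': 4100.0}
```

The resulting dictionary is the kind of intermediate structure that the adaptation layer can then rescale and pass to the views.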

5.2 Dependencies

Several Python libraries were required throughout the development. Most of them are well-known libraries for data science in Python, although some are not included in the standard installation. These dependencies are:

NumPy Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays.

Nipy python project for analysis of structural and functional neuroimaging data

Nibabel This package provides read +/- write access to some common medical and neuroimaging file formats. We can read and write FreeSurfer geometry, annotation and morphometry files.

Pandas is a software library written for the Python programming language for data manipulation and analysis.

5.3 Studied Parameters

The central parameters of our analysis for the present project are:

• Shape and volume of the brain's internal structures
• Content of the brain's internal structures

As noted in section 4.1.3, much of the information related to these characteristics can be obtained from the pre-processing done by FreeSurfer. In the case of content, the information is available within the FreeSurfer output files, but processing is required to obtain valuable information for users. A brief description of these parameters follows.

5.3.1.1 Relevance in shape curvature

Human intellect is thought to be a result of the disproportionately large cortical surface area to whole brain volume compared to other species. This increase in cortical surface area is achieved by increased gyral folding. Interestingly this increase in cortical surface area does not come with similar increases in cortical thickness. In fact, the several orders of magnitude increase in the human gray matter surface area compared to mice and monkey is achieved with barely a doubling in the cortical thickness. Thus, it appears that cortical surface area to brain volume and cortical folding are important parameters of cognitive development and ability.

If we can quantify the development of gyral folding, we may improve our ability to characterize normal trajectories of brain development and detect deviation with disease. ([39] R. Pienaar et al)

5.3.1.2 Relevance in volume of brain structures

Neurodegenerative disorders, psychiatric disorders, and healthy aging are all frequently associated with structural changes in the brain. These changes can cause alterations in the imaging properties of brain tissue, as well as changes in morphometric properties of brain structures. Morphometric changes may include variations in the volume or shape of subcortical regions, as well as alterations in the thickness, area, and folding pattern of the cortex. While surface-based analyses that depend on models of the position and orientation of the cortical ribbon can provide an accurate assessment of cortical variability, volumetric techniques are required to detect changes in non-cortical structures. For example, changes in ventricular or hippocampal volume are frequently associated with a variety of diseases. This type of analysis has commonly been accomplished by having a trained anatomist or technician manually label some or all of the structures in the brain, a procedure that can take up to a week for high-resolution images. ([40] B. Fischl et al.)

5.3.2 Volume and content

5.3.2.1 Basic definitions

The concept of volume derives from the Latin word volūmen and describes the thickness or size of a given object. The term also identifies the physical magnitude that expresses the extent of a body in three dimensions (height, length and width). In the International System of Units, the corresponding unit is the cubic meter (m³).

At the content level, the analysis considers the elements that make up this volume: the count of the voxels and their intensity values, which reflect the type of brain tissue in that small volume (for example, the amount of water or fat tissue at that point). This set of voxels constitutes a population of elements whose fundamental characteristic is intensity; characterizing it by statistical methods is essential for comparing different structures.
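The statistical characterization of a voxel population can be sketched with NumPy. This is a minimal sketch under stated assumptions: the structure is given as a boolean mask over an intensity array, the function name `describe_structure` is hypothetical, and the tiny array stands in for a real image. The dictionary keys follow the variable names used later in the use cases (vol, sum, mean, median, std, min, max).

```python
import numpy as np

def describe_structure(intensity, mask):
    """Summarize the intensities of the voxels selected by a boolean mask."""
    values = intensity[mask]
    if values.size == 0:
        raise ValueError("the mask selects no voxels")
    return {
        "vol": int(values.size),            # voxel count
        "sum": float(values.sum()),
        "mean": float(values.mean()),
        "median": float(np.median(values)),
        "std": float(values.std()),         # population standard deviation
        "min": float(values.min()),
        "max": float(values.max()),
    }

# Hypothetical 4x4x4 image with a four-voxel "structure".
intensity = np.zeros((4, 4, 4))
intensity[1:3, 1:3, 1] = [[10.0, 20.0], [30.0, 40.0]]
mask = intensity > 0
stats = describe_structure(intensity, mask)
print(stats)
```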

5.3.2.2 Volume and content processing

To obtain the volume of the morphological structures, several preprocessing steps are required. The input is the set of 3D magnetic resonance imaging (MRI) images coming from the scanner.

The data go through the following process, which yields an image of the isolated brain, without surrounding organs and with normalized intensity. The image is then brought into a common reference system by means of rotation, translation and scaling.

Figure 5.1: Schematic of FreeSurfer pre-processing of 3D MRI images used in the present thesis project to feed the analysis tool.

The image is finally represented as a set of voxels, which are equivalent to building blocks, like stacked bricks that form a larger 3D object. For each voxel we can obtain information such as its coordinates along each axis and its corresponding scalar value.

The "population" of voxels in each image is analyzed in two aspects:

The statistical composition of the voxel "population" is analyzed; it serves as a descriptor of the structure and as a criterion for comparison with other structures in different subjects.

An equivalent of the "center of mass" or "centroid" is calculated, by analogy with physics; it represents the point around which the high voxel values are concentrated. This value is also a descriptor of the structure and helps characterize it for comparison with other subjects.

The centroid also serves as a reference for comparing two or more figures: it allows them to be aligned and compared voxel by voxel, so that the intersection and union of their elements can be calculated.
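The alignment and voxel-to-voxel comparison described above can be sketched as follows. This is an illustration, not the thesis implementation: it assumes binary masks, uses the unweighted geometric centroid rounded to the nearest voxel, and moves each structure to the array center with `np.roll`, which wraps around the edges and is therefore only safe when the structure is small relative to the array. All function names and sample arrays are hypothetical.

```python
import numpy as np

def centroid(mask):
    """Mean voxel coordinate of a binary mask, rounded to the nearest voxel."""
    coords = np.argwhere(mask)
    if coords.size == 0:
        raise ValueError("empty structure")
    return np.round(coords.mean(axis=0)).astype(int)

def move_to_center(mask):
    """Shift a 3D mask so that its centroid sits at the center of the array."""
    center = np.array(mask.shape) // 2
    shift = center - centroid(mask)
    return np.roll(mask, shift=tuple(int(s) for s in shift), axis=(0, 1, 2))

def intersection_and_union(mask_a, mask_b):
    """Voxel counts of the intersection and union after centroid alignment."""
    a, b = move_to_center(mask_a), move_to_center(mask_b)
    return int(np.logical_and(a, b).sum()), int(np.logical_or(a, b).sum())

# Two hypothetical structures of different size placed at different positions.
struct_a = np.zeros((9, 9, 9), dtype=bool)
struct_a[0:3, 0:3, 0:3] = True      # 27 voxels
struct_b = np.zeros((9, 9, 9), dtype=bool)
struct_b[2:7, 5:8, 5:8] = True      # 45 voxels
inter, union = intersection_and_union(struct_a, struct_b)
print(inter, union, inter / union)
```

The ratio of intersection to union gives a single overlap score between 0 and 1 that can be compared across pairs of subjects.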

5.3.3 Shape Curvature

5.3.3.1 Basic definitions

In mathematics, curvature is any of a number of loosely related concepts in different areas of geometry. Intuitively, curvature is the amount by which a geometric object such as a surface deviates from being a flat plane, or a curve from being straight as in the case of a line, but this is defined in different ways depending on the context. There is a key distinction between extrinsic curvature, which is defined for objects embedded in another space (usually a Euclidean space) in a way that relates to the radius of curvature of circles that touch the object, and intrinsic curvature, which is defined in terms of the lengths of curves within a Riemannian manifold (an n-dimensional space).

Figure 5.2: Examples of Euclidean geometry, spherical and hyperbolic or Riemann geometry

Figure 5.3: Conceptual illustration of curvature. The measurement of the "curvature" of a figure can be understood by taking a series of circles that simply "touch" gently each of the undulations of the curve.

The curvature of the figure is proportional to the sum of the inverse radii of each circle. In Figure 5.3, [A] and [B] show the effect on the curvature of "compressing" the sulcal extent; [C] and [D] show the effect of higher-order "bumps" on the overall curvature.

5.3.3.2 Curvature processing

Curvature in general is measured as 1/r, where r is the radius of an inscribed circle. Since the mean curvature is the average of the two principal curvatures, it has units of 1/mm. The Gaussian curvature is their product, so it has units of 1/mm^2.
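As a worked illustration of these units, let the two principal curvatures at a surface point be k_1 = 1/r_1 and k_2 = 1/r_2, with H the mean curvature and K the Gaussian curvature. The symbols k_1, k_2, r_1, r_2 and K and the radii of 2 mm and 4 mm are introduced here only for illustration.

```latex
\begin{align*}
k_1 &= \frac{1}{r_1} = \frac{1}{2\,\text{mm}} = 0.5\ \text{mm}^{-1}, \qquad
k_2 = \frac{1}{r_2} = \frac{1}{4\,\text{mm}} = 0.25\ \text{mm}^{-1} \\
H &= \frac{k_1 + k_2}{2} = 0.375\ \text{mm}^{-1} \\
K &= k_1 k_2 = 0.125\ \text{mm}^{-2}
\end{align*}
```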

Note that separating “positive” and “negative” curvatures from each other allows us a more accurate measure of the total curvature of the shape. A simple numerical mean of the curvatures found by summing the inverse radii of the osculating circles might mask the additional buckling that has occurred.

In addition to this average curvature data, FreeSurfer calculates the mean and Gaussian curvatures from a mathematical description of the surface covering the entire shape. The mean curvature H of a surface is a measure of curvature that comes from differential geometry and that locally describes the curvature of a surface embedded in some ambient space, e.g. Euclidean space.

5.3.4 “Center of mass”

It is not literally the center of mass spoken of in physics; however, it is analogous to the physical concept because it represents the point in space around which the intensity of the voxels of the morphological image is concentrated. Voxel-based morphometry is a computational approach to neuroanatomy that measures differences in local concentrations of brain tissue through a voxel-wise comparison of multiple brain images.

The 3-dimensional MRI image is built up of units called voxels. Each one represents a tidy cube of brain tissue, a 3-D image building block analogous to the 2-D pixel of computer screens, televisions or digital cameras. Each voxel can represent a million or so brain cells. Every structure in an image (hippocampus, amygdala, etc.) is actually a cluster of voxels, perhaps tens or hundreds of them.

The pixel intensity in magnetic resonance imaging (MRI) is proportional to the signal intensity of the corresponding voxel. Interpreting signal intensity in MR imaging poses a major problem: often there is no intuitive approach to signal behavior, because signal intensity is a very complicated function of the contrast-determining tissue parameters (proton density, T1 and T2) and the machine parameters TR and TE. For this reason, the terms T1-weighted image, T2-weighted image and proton-density-weighted image were introduced into clinical MR imaging.

Air and bone produce low-intensity, weaker signals with darker images. Fat and marrow produce high-intensity signals with brighter images.

In summary, the "center of mass" parameter measures the site where the brightest areas of the image are concentrated, that is, the areas with a high concentration of marrow and fat in the structure analyzed.

As mentioned in section 3.3.1 (Figure 3.4), the center of mass is calculated relative to a reference point: the line on the left of that figure, from which the distance "x" is measured. The reference point for this thesis is the center of the array that represents the image of 256x256x256 voxels. In this project, the distances from the reference center are measured, and the mass (equivalent to the sum of voxels) of the slice at each distance along the specific axis is calculated, as shown in the following figure.

Figure 5.4: Illustration of the way in which the center of mass is calculated for the structures of the present thesis.

The calculation is made separately for each axis. The result is a distance from the reference point along each axis, and together these distances define a point in three-dimensional space.
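The per-axis calculation can be sketched with NumPy. This is a minimal sketch rather than the thesis implementation: the function name `center_of_mass` is hypothetical, the "mass" of a slice is taken to be the sum of its voxel intensities, and the reference point is assumed to be index size/2 on each axis (128 for a 256-voxel axis).

```python
import numpy as np

def center_of_mass(volume):
    """Intensity-weighted center of mass, as signed distances from the array center."""
    total = volume.sum()
    if total == 0:
        raise ValueError("the volume has no mass")
    result = []
    for axis, size in enumerate(volume.shape):
        other_axes = tuple(a for a in range(volume.ndim) if a != axis)
        slice_mass = volume.sum(axis=other_axes)   # mass of each slice along this axis
        distance = np.arange(size) - size / 2.0    # distance of each slice from the center
        result.append(float((distance * slice_mass).sum() / total))
    return tuple(result)

# Hypothetical 8x8x8 image with two bright voxels of intensity 1 and 3.
volume = np.zeros((8, 8, 8))
volume[6, 4, 4] = 1.0
volume[6, 4, 2] = 3.0
cx, cy, cz = center_of_mass(volume)
print(cx, cy, cz)
```

In this example both voxels lie two slices beyond the center on the first axis and on the central slice of the second axis; on the third axis the heavier voxel pulls the result towards negative distances.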

Chapter 6

Use cases

The following two use cases illustrate how the tool is used and the investigative process it supports.

USE CASE-01: Determine if a variable can be used as an anomaly detector for the subject
Version: 1,0,0
Dependencies: No dependencies
Precondition: The expert or analyst has the hypothesis that a certain variable can be used as a subject anomaly detector.
Description: The following procedure is executed once the previous hypothesis has been formulated, in order to decide whether or not to accept it.
Normal sequence:
Step 1: Load the variable in the heat map for all the subjects. If clearly visible outliers are perceived, this variable can be considered a possible anomaly detector for the subjects within one or several chosen brain structures.
Step 2: Choose a group of variables that you want to analyze under this concept and load them into the heat map.
Step 3: Choose those variables that have the smallest variances and submit them to step 1 to check them by this procedure.
Post condition: There is a list of variables that have passed the test, which tells us that they are variables with relatively common values for most subjects. In subjects where this value is out of the ordinary, the reason should be analyzed.
Exceptions: Step 1: NA. Step 2: NA. Step 3: NA.
Comments: The variables determined with this procedure can indicate outliers; although the reason for these is not known, they can give clues about certain subjects that are worth analyzing in more detail.

The previous use case is illustrated below with an example:

Step 1

Load the variable in the heat map for all the subjects. If clearly visible outliers are perceived, this variable can be considered as a possible anomaly detector for the subjects within one or several chosen brain structures

The variable loaded in this case is the relative center of mass for the right and left caudate structures of 242 patients. Outliers can be clearly perceived towards the right side of the figure, in red, so the variable can be considered one that allows anomalies in subjects to be detected.

Figure 6.1: heat map for the variable observation use case

Step 2

Choose a group of variables that you want to analyze under this concept and load them into the heat map

The following variables have been selected to analyze for the hippocampus, amygdala, caudate and putamen for the 242 subjects:

cX: Center of mass in x
cY: Center of mass in y
cZ: Center of mass in z
dcX: Relative Center of mass in x
dcY: Relative Center of mass in y
dcZ: Relative Center of mass in z
disC: Relative Center of mass
max: Max value of voxel intensity for the structure
mean: Mean value of voxel intensity for the structure
median: Median value of voxel intensity for the structure
min: Min value of voxel intensity for the structure
std: Std value of voxel intensity for the structure
sum: Sum value of voxel intensity for the structure
vol: voxel counting for the structure

Figure 6.2: heat map for the variable overview observation use case

Step 3

Choose those variables that have the smallest variances and submit them to step 1 to check by this procedure.

As can be seen, once the variables are sorted by variance using the controls on the right side, those with the lowest variance are found in the lower part of the heat map.
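The ordering by variance used in step 3 can be sketched as follows. This is an illustration with NumPy, not the code behind the heat map controls: it assumes a matrix with one row per variable and one column per subject, already scaled to a common range so that variances are comparable, and the function name and values are hypothetical.

```python
import numpy as np

def sort_by_variance(matrix, names):
    """Order variables (rows) from highest to lowest variance across subjects."""
    variances = matrix.var(axis=1)
    order = np.argsort(variances)[::-1]   # highest variance first, lowest at the bottom
    return [names[i] for i in order], variances[order]

# Hypothetical scaled values for three variables over three subjects.
names = ["mean", "dcX", "vol"]
matrix = np.array([[0.4, 0.5, 0.6],    # mean: small spread
                   [0.5, 0.5, 0.5],    # dcX: constant across subjects
                   [0.0, 0.5, 1.0]])   # vol: large spread
sorted_names, sorted_variances = sort_by_variance(matrix, names)
print(sorted_names)
```

The variables at the end of the returned list are the candidates to load individually in step 1.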

Figure 6.3: heat map for the variable overview observation use case, sorting variables by variance

USE CASE-02: Volume, shape and content of the brain's internal structures: variables exploration
Version: 1,0,0
Dependencies: No dependencies
Precondition: The expert or analyst aims to perform an exploratory analysis of the variables, which will reveal whether there are trends, relationships or patterns. Pre-calculations for the subjects to study must be done beforehand.
Description: The following procedure aims to perform an exploratory analysis of the variables related to volume, shape and content of brain structures.
Normal sequence:
Step 1: Load all the variables, volume, shape and content (Destrieux Atlas structures), in the heat map for all the subjects.
Step 2: Sort these variables by the different available criteria, such as cluster, any of the variables, sum, variance, or the criteria that the analyst or expert wants to try according to their knowledge and/or intuition.
Step 3: If the analyst or expert wants to go into detail and compare one or more subjects, move on to the comparison of volumes of the brain's internal structures, which allows you to select the analysis subjects and the structures that you want to compare.
Post condition: The analyst or expert confirms or rejects hypotheses or suspicions. If it is not possible to confirm them in a first inspection, more focus should be placed on the variables or subjects of interest.
Exceptions: Step 1: NA. Step 2: NA. Step 3: NA.
Comments: After the analysis, the expert or analyst may want to filter several variables or subjects to focus on some of them according to their initial vision, intuition or knowledge. The process can then be restarted from step 1 with these new conditions.

Step 1

Load all the variables, volume, shape and content of the brains internal structures, (Destrieux Atlas structures), in heat map for all the subjects.

The following figure shows the analysis matrix of 176 subjects in terms of geometric parameters (volume, surface areas) of white matter. Red represents high values and blue represents low values.

Figure 6.4: heat map for the variable overview observation use case, sorting variables by clusters

In the figure above, two large clusters, group 1 and group 2, are observed. The two differ mostly in one geometric feature: the volume (Nvoxels) of the different structures. As can be seen below, there is no evidence of differences between hemispheres within the subjects that would allow us to conclude that one hemisphere differs from the other in this geometric aspect.

Step 2

Sort these variables by the different available criteria such as: cluster, any of the variables, sum, variance, or the criteria that the analyst or expert wants to experience according to their knowledge and / or intuition.

Figure 6.5: heat map for the variable overview observation use case, sorting variables by cluster, with additional variables shown at the top and on the left

Group 1 with smaller volumes is mostly made up of female subjects, while group 2 with larger volumes is mostly made up of male subjects.

Step 3

If the analyst or expert wants to go into detail and compare one or more subjects, move on to the comparison of volumes that allows you to select the analysis subjects and the structures that you want to compare.

At this point, the analyst can compare the volumes, intersection and internal content of two subjects, one from group 1 and one from group 2.

Note: at this time the image data must be placed on the same server in order to perform these calculations

Figure 6.6: Volume analysis module of the tools developed in this project

The previous figure shows the volume comparison for the hippocampus of subjects 8 and 940. A larger volume is observed in said structure for subject 8 with respect to 940.

Subject 8 has higher values for the mean, sum, median and standard deviation. Physically, this means that subject 8 has a greater volume and that its content, in terms of voxel intensity, is greater in all statistical parameters. The medical significance must still be analyzed and discussed with experts.

The first use case concerns the profiling of the variables that describe the subjects. This procedure can be performed whenever one wants to determine quickly whether a variable can serve as a subject outlier identifier. It is a method for quickly collecting variables that have common values for most subjects and vary markedly for only a few, that is, for detecting outliers that can be examined in more detail afterwards.

The second use case is a general way to test the results of the tool and contrast them with already established and tested results. This allows validating the observations and finding other possible additional results. Additionally, this use case serves to mark the general procedure of performing an analytical exploratory visualization of the data.

6.1 Application possibilities

The knowledge generation process is a non-trivial cyclical process. Applications that generate knowledge through the data science process must scale in a way that facilitates this series of cycles, modifying the variables and characteristics as needed, over and over again, until the information that contributes value to the generation of knowledge has been distilled.

The application is made up of 3 modules that allow the analysis of specific characteristics of the information relevant to this project:

Clustergrammer: a library tool for analyzing the variables and their characteristics in the form of a matrix. It shows a large amount of data in condensed form, between 500,000 and 1,000,000 cells, which is suitable for the more than 170 subjects of the Canguro program and a similar number of control subjects. Regarding the characteristics, it was possible to obtain up to 1900 features related to the shape and volume of the brain structures of the subjects. The tool shows the clusters of features and/or variables, arranged in columns or in rows, after a prior calculation of the cosine distances between them.

Volume Compare: a feature developed by the author to compare two or more brain structures of the subjects. It calculates the center of mass of the structures and places the structures at this center to calculate their intersection, which is one way to characterize two or more subjects. It also shows statistics of the contents, such as the histogram of voxel values and their maximum, minimum, mean, median and standard deviation.

3D: a tool to visualize the selected structures for the subjects chosen in the "Volume Compare" tab. The structures are shown displaced to the center of mass so that the intersection between them can be seen visually, along with protuberances and shapes that may capture the attention of experts.
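The cosine distance mentioned for the Clustergrammer module can be sketched with NumPy. This is only an illustration of the distance itself, computed between the rows of a matrix; it does not reproduce Clustergrammer's own code, and the function name and sample vectors are hypothetical.

```python
import numpy as np

def cosine_distance_matrix(rows):
    """Pairwise cosine distance (1 - cosine similarity) between the rows of a 2D array."""
    rows = np.asarray(rows, dtype=float)
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("cosine distance is undefined for an all-zero row")
    unit = rows / norms
    similarity = np.clip(unit @ unit.T, -1.0, 1.0)   # guard against rounding error
    return 1.0 - similarity

# Hypothetical feature vectors: the first two point in the same direction,
# the third is orthogonal to both.
vectors = np.array([[1.0, 0.0],
                    [2.0, 0.0],
                    [0.0, 3.0]])
distances = cosine_distance_matrix(vectors)
print(distances)
```

Because the vectors are normalized first, two variables that differ only by a scale factor have distance 0 and end up next to each other in the clustered matrix.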

For the tools to be more flexible, information must be loaded dynamically, with the option to search, filter and select as the scientific processing of the data is carried out.

The following are some views of the filtering facilities of Clustergrammer and Volume Compare. They allow subjects and features of the study to be selected from a text file pre-processed by FreeSurfer, through an adaptation layer that scales the values of the variables to uniform values that can be represented in Clustergrammer.

Figure 6.7: Main screen to filter variables for the shape and volume analysis cluster tool, with subjects and features already loaded and selected

Figure 6.8: Output of volume and content analysis screen for two subjects at hippocampal level.

Figure 6.9: histogram for both hippocampi in the selected subjects

Figure 6.10: 3D view for the two hippocampi in the selected subjects

Figure 6.11: 3D view for the two hippocampi in the selected subjects (changed point of view)

6.2 Dimensionality reduction

The features-vs-features view not only supports dimensionality reduction but also shows which variables are inversely related. Features that resemble each other are linked by a positive correlation, shown in red in the similarity matrix. Likewise, inversely related variables are identified by a negative correlation, shown in blue within the matrix.

Following is the case of the features vs features matrix for structure content characteristics:

The features are:

"Center of mass" (parameter in x) Cx of the structure
"Center of mass" (parameter in y) Cy of the structure
"Center of mass" (parameter in z) Cz of the structure
Voxel count of the structure
Sum of value of voxels of the structure
Average voxel value of the structure
Median voxel value of the structure
Standard deviation of voxel value of the structure

According to the following figure, there are red areas (apart from those on the diagonal) where there is a greater affinity between features, which indicates that these features contain relatively redundant information. On the other hand, the blue areas indicate opposite and complementary information.
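A features-vs-features matrix of this kind can be sketched as follows. The sketch assumes Pearson correlation computed with `np.corrcoef` as the similarity measure; the measure actually displayed by the tool may differ, and the feature values below are hypothetical.

```python
import numpy as np

def feature_correlation(data):
    """Pearson correlation between the columns (features) of a subjects x features array."""
    return np.corrcoef(data, rowvar=False)

# Hypothetical values for four subjects.
vol = np.array([10.0, 12.0, 14.0, 16.0])   # voxel count
total = 5.0 * vol                          # sum of voxel values, proportional to vol
cx = np.array([4.0, 3.0, 2.0, 1.0])        # center of mass in x, decreasing with vol
data = np.column_stack([vol, total, cx])
corr = feature_correlation(data)
print(np.round(corr, 3))
```

Here the first two features would appear in red (correlation close to +1, redundant information) and the third would appear in blue against both (correlation close to -1, inverse relationship).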

Figure 6.12: view of feature against feature matrix for dimensionality reduction

In the following figure, for example, it is observed that the voxel value summation variables and the individual sums of the same voxels provide broadly similar information, with a similarity of 67%.

Figure 6.13: detailed view of features with positive relationship (in red)

On the other hand, the following graph shows that the "center of mass" of the putamen moves in a positive direction (towards one side) when the volumes of the hippocampus, amygdala and putamen tend to be smaller.

Figure 6.14: detailed view of features with negative relationship (in blue)

6.3 Some Cases observed

The following example cases use real data to show some insights and the procedure followed to reach them.

Brain structures analysis

The following figure shows the analysis matrix of 176 subjects in terms of geometric parameters (volume, surface areas) of different brain structures. Red represents high values and blue represents low values.

Figure 6.15: two clusters to be analyzed with the cluster tool of the solution for Braviz 2

In the previous figure, there are two large clusters, group 1 and group 2. The two differ mainly in one geometric feature: the volume (Nvoxels) of the different structures of the brain. As can be seen below, there is no evidence of differences between hemispheres within the subjects that would allow us to conclude that one hemisphere differs from the other in this geometric aspect.

Figure 6.16: two clusters to be analyzed with the cluster tool of the solution for Braviz 2 (Quantification of P value)

In the previous figure it is evident (for a p-value less than 0.05) that there is a difference between the subjects of groups 1 and 2 in the volume and in the average and minimum values of that volumetric characteristic. The parameter BRITH_Sex is mostly sex 2 where the volume has higher values and mostly sex 1 where the volume is smaller.

Dimensionality reduction

In the following figure, the features NumVert (number of vertices of the tessellation) and surfArea (surface area of the structure) carry very similar information, so one of them is removed.

Figure 6.17: Dimension reduction based on behavioral similarity view across all observed subjects

Something similar happens with the variables GausCurv, FoldInd and CurvInd, of which only CurvInd is kept.
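In this thesis the redundant features were identified visually in the similarity view. As a hedged illustration of the same idea in code, the sketch below keeps a feature only if its absolute Pearson correlation with every feature already kept is below a threshold. The function name, the 0.95 threshold, the greedy order and the data are all assumptions made for the example.

```python
import numpy as np

def drop_redundant(data, names, threshold=0.95):
    """Greedily keep features whose |correlation| with all kept features is below threshold."""
    corr = np.abs(np.corrcoef(data, rowvar=False))
    kept = []
    for j in range(len(names)):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Hypothetical values for five subjects: NumVert is almost proportional to SurfArea.
surf_area = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
num_vert = np.array([101.0, 199.0, 302.0, 398.0, 501.0])
curv_ind = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
data = np.column_stack([surf_area, num_vert, curv_ind])
kept_names = drop_redundant(data, ["SurfArea", "NumVert", "CurvInd"])
print(kept_names)
```

Which member of a redundant group survives depends on the column order, so the features the expert prefers to keep should be listed first.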

Figure 6.18: Dimension reduction based on behavioral similarity view across all observed subjects

The reduction made above can be corroborated by looking at the lower cluster on the right side, where most of the items with the features NumVert, SurfArea and GrayVol reside. Something similar happens with the upper cluster, where most curvature features reside.

Figure 6.19: reduction of dimensions based on contribution on the characterization in subjects, based on the pvalue (curvature characteristics)

Figure 6.20: reduction of dimensions based on contribution on the characterization in subjects, based on the pvalue (surface characteristics)

If these variables are eliminated and the columns and rows are flipped, the following figure is obtained, in which male subjects show higher "surface area" values and female subjects show lower values.

Figure 6.21: difference between male and female subjects at the level of shape parameters having already made the reduction of dimensionality

Example of Canguro case

The following figures show a comparative view of the volume of the brain's internal structures for control subjects (on the left) and kangaroo subjects (on the right), both ordered by fragility (greater fragility on the left side), for specific structures (amygdala, putamen, caudate and hippocampus).

Canguro Subjects: Premature subjects who participate in the kangaroo method for a period.

Control subjects: Premature subjects who did not participate in the kangaroo method.

Figure 6.22: Comparative view between control subjects, (on the left), and kangaroo subjects, (on the right), both ordered by fragility, (greater fragility on the left side). Fragility is an “initial state variable”

In general, the volumes in red (those of greater value) are somewhat more lateralized to the right in the view of the control subjects and more evenly distributed throughout the graph of the kangaroo subjects³. In fact, the values are of greater magnitude in the latter graph, where red is more strongly dominant (especially in the caudate). All structures (hippocampus, putamen, caudate, amygdala) look a little redder (greater volume) on the Canguro side, even at high fragilities. Even at such high fragilities, the blue areas (lower volumes) are not as low as for the control subjects, since the shade of blue is a little weaker for the Canguro subjects.

Let's compare the most fragile subjects of each group (control and Canguro), that is, those at the left end of each of the previous figures, to analyze them in detail.

Feature                             Canguro group    Control group
'Subject ID'                        765              369
'NHPT_AverageMD'                    26.5             20.5
'FOLLOW_Fragility_Rasch_441_2PL'    2.1571           2.7289
'EX41_durPCconcontroles03'          22               0
'location'                          1                2

3 The time the baby is carried by the mother, the father or another family member is represented by the variable EX41_durPCconcontroles

Using the volume and content comparison tool we find the following information that allows focusing the analysis. In this case, the caudate of both subjects has been selected to see their internal and volumetric detail

Figure 6.23: Comparative view between the most fragile subjects in the control and Canguro groups. The subject of the control group has 27% lower volume in the Caudate.

The content values for both subjects are still comparable in terms of mean, median and standard deviation, as seen in the previous figure.

The following histogram shows the distribution of content values and their counts; the difference in volume between the caudates of the two subjects is clearly visible.

Figure 6.24: Comparative histogram for the most fragile subjects in the control and Canguro groups. A clear difference in both histograms is evident for those control and Canguro subjects

The difference is evident in the number of “infinitesimal” volumes that make up the total structure of the caudate and whose values are close to the mean value, which is similar for both caudates.

Through the tool, the volumetric structures can be observed and compared in an interactive 3D view, which gives experts a useful level of detail because they can see differences between the studied subjects that are not otherwise evident.

Figure 6.25: Volumetric comparative view between the caudate structures of the most fragile subjects for both Kangaroo and control

This type of view also shows the difference in volumes between the kangaroo and control subjects. Let's continue with some other views.

Figure 6.26: Comparative view between less fragile subjects and more fragile subjects for both control and Canguro. Less fragile have in general greater structure volume and vice versa.

Fragility comparison, greater than and less than 0.5 (left: greater than 0.5; right: less than 0.5), for both control and kangaroo subjects (mixed subjects). On the left, the fragility is greater than 0.5 (2.7289 to 0.5146); on the right, the fragility is lower (0.4882 to -1.0152). Greater volume values are observed the lower the fragility (right view), and vice versa in the figure on the left.

In the figure on the left, where the fragility is greater, the volume of the structures is observed to be greater when the carrying time in Canguro increases.

Figure 6.27: Comparative view between Canguro and control subjects ordered by NHPT_promedioMD (average dominant hand), which is a variable of type "subject performance variable".

In the left figure, the Canguro subjects are ordered by the variable NHPT_promedioMD (average dominant hand), with greater values on the left side of the figure (27 to 14.5). The carrying-duration variable is observed to form a kind of "clusters", with peak values and low values between them.

In the right figure, the subjects who were not carried (control subjects) are ordered by the same variable, NHPT_promedioMD, with greater values on the left side of the figure (33 to 13.5).

87

Both figures show a distribution of volumes in species of alternate "clusters" (of positive size and then of smaller than average size), more clearly visible for loaded subjects. It is evident that those that were not loaded, show lower volumes, (more blue and more intense components), than those that were loaded.

This shows the potential of this type of analysis for relating NHPT performance (NHPT_promedioMD) to the most fragile subjects and to the kangaroo carrying time, in comparison with control-group subjects of comparable fragility.
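The reading of these views (subjects ordered by a performance variable, cells colored by whether a volume is above or below average) can be sketched as follows. The records and values are hypothetical, and the z-score standardization is an assumption about how "above average" and "below average" are determined; the text does not state the normalization used by the tool.

```python
from statistics import mean, pstdev

# Hypothetical records: (subject code, NHPT_promedioMD, structure volume).
records = [
    (101, 27.0, 3500.0),
    (102, 14.5, 3900.0),
    (103, 21.0, 3300.0),
    (104, 18.0, 3700.0),
]


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Order subjects by the performance variable, greatest value first
# (the left side of the figure).
ordered = sorted(records, key=lambda r: r[1], reverse=True)
scores = z_scores([volume for _, _, volume in ordered])

for (code, nhpt, _), score in zip(ordered, scores):
    shade = "below average (blue)" if score < 0 else "above average"
    print(f"subject {code}: NHPT={nhpt:5.1f}  z={score:+.2f}  {shade}")
```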


Chapter 7

Conclusions

The results evidence the influence of factors such as the subject's sex, in addition to prematurity itself, on the surface of brain structures. In fact, a clear differentiation is observed between the male and female subjects of the Canguro program in terms of the volume and surface area of brain structures. The previous exercise also shows that kangaroo subjects generally have higher volumes in the structures studied (amygdala, putamen, caudate and hippocampus) than premature subjects who did not undergo the kangaroo method.

Use case 1 showed that the relative center of mass of the caudate has the lowest variance among the four structures mentioned above. In terms of content, the caudate is therefore the most uniform structure, so an irregularity within it can be detected more easily than in the others.
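A comparison of this kind can be sketched as follows. The definition used here is an assumption made for illustration: the intensity-weighted center of mass of the voxels of a structure, rescaled to the bounding box of that structure so that subjects of different sizes are comparable. The function names and the toy voxel data are hypothetical and do not reproduce the implementation used in this thesis.

```python
from statistics import pvariance


def relative_center_of_mass(voxels):
    """Intensity-weighted center of mass, rescaled to [0, 1] per axis.

    `voxels` is a list of (x, y, z, intensity) tuples for one structure.
    Rescaling by the bounding box is an assumption made for this sketch.
    """
    total = sum(v[3] for v in voxels)
    center = []
    for axis in range(3):
        coords = [v[axis] for v in voxels]
        lo, hi = min(coords), max(coords)
        com = sum(v[axis] * v[3] for v in voxels) / total
        center.append((com - lo) / (hi - lo) if hi > lo else 0.5)
    return tuple(center)


def com_variance(structure_per_subject):
    """Sum of the per-axis variances of the relative center of mass."""
    centers = [relative_center_of_mass(v) for v in structure_per_subject]
    return sum(pvariance([c[axis] for c in centers]) for axis in range(3))


# Toy data: a uniform 2x2x2 block and one whose x = 1 face is brighter.
uniform = [(x, y, z, 1.0) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
skewed = [(x, y, z, 3.0 if x == 1 else 1.0)
          for x in (0, 1) for y in (0, 1) for z in (0, 1)]

print(relative_center_of_mass(uniform))
print(relative_center_of_mass(skewed))
print(com_variance([uniform, skewed]))
```

Running `com_variance` once per structure over all subjects and comparing the results would identify the structure with the most uniform content under this definition.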

The knowledge search process must be continuous; otherwise, the results of several of the initial iterations will be lost. The data mining process is enriched with each cycle, and the key lies in very good documentation, publication and collaboration.

7.1 Future Work

The scientific data process is a non-trivial, continuous, cyclic process that evolves as it progresses over time. For this reason, it is necessary to continue with the process to refine the model and generate better results. It is very important to work continuously with brain experts so that their feedback refines the types of variables and features required, yielding more actionable knowledge. During this project, many insights were corroborated with the experts. This validates the tool and motivates continuing with the proposed cycles in the hope of finding additional insights. Several clusters were observed in the data mining exercise carried out together with the neurologist. These clusters need to be associated with the environment variables of the subject to confirm why and how the brain structures were affected.

The process and several of the techniques and tools used in this project can also be applied to analogous problems in which one wishes to find patterns that characterize volumetric or 2D images. Experts have already proposed applying this research to prostate analysis and to the analysis of retina images.

Much of the preprocessing in this project relied on the FreeSurfer tool. We wanted to take advantage of the work done in Braviz 1, in which all the information of the subjects was already pre-processed and available. FreeSurfer is a highly mature tool that generates a large amount of information which, according to this work, is in many cases redundant. Another part of the preprocessing was implemented by the author, as was the case for the "centers of mass" and the internal characterizations of the voxels of the structures. It would still be interesting to add further preprocessing sources (such as FSL or Mindboggle) for the parameters of interest of this thesis (shape and content), so that segmentations and parcellations that currently come from a single source can be refined.

It is also important to make the tool even more flexible, so that the variables can be filtered easily according to the needs of the experts, who can then explore them and generate as many insights as possible.


References

*This references contain information related to different studies at the level of volume and shape in different brain structures which show a relationship between these characteristics and different aspects of the subject such as tumors, diseases or problems in brain development:

***Thisreferences are related to the impact on different brain structures in premature patients. These references guide us about which areas may be more susceptible to premature birth. these references also deal with the relationships between the volume of brain structures and different pathologies in subjects

[1] J. Cabral, M. L. Kringelbach, and G. Deco, “Functional connectivity dynamically evolves on multiple time-scales over a static structural connectome: Models and mechanisms,” Neuroimage, vol. 160, pp. 84–96, Oct. 2017.

[2] C.-S. Chen, S.-Y. Lin, M.-H. Fan, and C.-H. Huang, “A Closed-Form Algorithm for Converting Hilbert Space-Filling Curve Indices *.”

[3] M. K. Chung, “Statistical Methods in Brain Image Analysis with MATLAB.”

[4] M. K. Chung, Statistical and computational methods in brain image analysis. .

[5] S. B. Eickhoff, B. T. T. Yeo, and S. Genon, “Imaging-based parcellations of the human brain,” Nat. Rev. Neurosci., vol. 19, no. 11, pp. 672–686, Nov. 2018.

[6] L. Fan et al., “The Human Brainnetome Atlas: A New Brain Atlas Based on Connectional Architecture,” Cereb. Cortex, vol. 26, no. 8, pp. 3508–3526, Aug. 2016.

[7] B. Fischl and M. I. Sereno, “Microstructural parcellation of the human brain,” Neuroimage, vol. 182, pp. 219–231, Nov. 2018.

[8] K. J. (Karl J. . Friston, J. Ashburner, S. Kiebel, T. Nichols, and W. D. Penny, Statistical parametric mapping : the analysis of funtional brain images. Elsevier/Academic Press, 2007.

91

[9] V. L. Galinsky and L. R. Frank, “Automated segmentation and shape characterization of volumetric data.,” Neuroimage, vol. 92, pp. 156–68, May 2014.

[10] I. García, F. Tutoras, D. Carmen, S. Gotarredona, D. Begoña, and A. Piñero, “Aportaciones a la Segmentación y Caracterización de Imágenes Médicas 3D.”

[11] G. Gerig, M. Styner, D. Jones, D. Weinberger, and J. Lieberman, “Shape analysis of brain ventricles using SPHARM.”

[12] D. Grado and P. R. Román, “ESTAD´ISTICAESTAD´ESTAD´ISTICA DESCRIPTIVA E INTRODUCCI´ONINTRODUCCI´ INTRODUCCI´ON A LA PROBABILIDAD.”

[13] Ian L. Dryden; Kanti V. Mardia, “17 Shape in images - Statistical Shape Analysis, 2nd Edition,” Wiley, 2016. [Online]. Available: https://www.safaribooksonline.com/library/view/statistical-shape- analysis/9780470699621/c17.xhtml. [Accessed: 03-Oct-2018].

[14] Joelle M. Abi-Rached; Nikolas Rose, “Two: The Visible Invisible - Neuro,” Princeton University Press, 2013. [Online]. Available: https://www.safaribooksonline.com/library/view/neuro/9781400846337/11 _Chapter2.xhtml. [Accessed: 03-Oct-2018].

[15] M. Meuschke, P. Berg, R. WickenhöferWickenh, B. Preim, and K. Lawonn, “Visual Analysis of Aneurysm Data using .”

[16] R. A. Morey et al., “A comparison of automated segmentation and manual tracing for quantifying hippocampal and amygdala volumes ☆,” Neuroimage.

[17] A. Sherbondy, “Shape Analysis of Fiber Tractography in the Human Brain.”

[18] P. Shilane and T. Funkhouser, “Selecting Distinctive 3D Shape Descriptors for Similarity Retrieval.”

[19] J. W. H. Tangelder and R. C. Veltkamp, “A Survey of Content Based 3D Shape Retrieval Methods.”

[20] C. L. Tardif, A. Schäfer, M. Waehnert, J. Dinse, R. Turner, and P.-L. Bazin, “Multi- contrast multi-scale surface registration for improved alignment of cortical areas,” Neuroimage, vol. 111, pp. 107–122, May 2015.

92

[21] Q. Wang, R. Chen, J. JaJa, Y. Jin, L. E. Hong, and E. H. Herskovits, “Connectivity- Based Brain Parcellation: A Connectivity-Based Atlas for Schizophrenia Research.,” Neuroinformatics, vol. 14, no. 1, pp. 83–97, Jan. 2016.

[22] J. Weissenbock, B. Frohler, E. Groller, J. Kastner, and C. Heinzl, “Dynamic Volume Lines: Visual Comparison of 3D Volumes through Space-filling Curves,” IEEE Trans. Vis. Comput. Graph., pp. 1–1, 2018.

[23] Z. Wu et al., “3D ShapeNets: A Deep Representation for Volumetric Shapes.”

[24] L. Zhang, M. João, D. Fonseca, and A. Ferreira, “União Europeia-Fundos Estruturais Governo da República Portuguesa PROJECTOS DE INVESTIGAÇÃO CIENTÍFICA E DESENVOLVIMENTO TECNOLÓGICO Survey on 3D Shape Descriptors.”

[25] J. Zhuo, “Introduction to Statistical Parametric Mapping.” [Online]. Available: https://www.fil.ion.ucl.ac.uk/spm/doc/intro/#_IV.__Statistical_Parametric Mapping. [Accessed: 04-Oct-2018].

[26] J. Zhuo, Handbook of Medical Image Processing and Analysis. Elsevier, 2009.

[27] “THE NIFTI-1 DATA FORMAT.”

[28] “Probabilidad condicionada - Wikipedia, la enciclopedia libre.” [Online]. Available: https://es.wikipedia.org/wiki/Probabilidad_condicionada. [Accessed: 04-Oct-2018].

[29] “What Is FMRI? - Center for Functional MRI - UC San Diego.” [Online]. Available: http://fmri.ucsd.edu/Research/whatisfmri.html. [Accessed: 21-Nov- 2018].

[30] “Fundación Canguro.” [Online]. Available: http://fundacioncanguro.co/. [Accessed: 21-Nov-2018].

[31] “SPM - Wikibooks, open books for an open world.” [Online]. Available: https://en.wikibooks.org/wiki/SPM. [Accessed: 04-Oct-2018].

[32] D. A. Angulo, C. Schneider, J. H. Oliver, N. Charpak, and J. T. Hernandez, “A Multi-facetted Visual Analytics Tool for Exploratory Analysis of Human Brain and Function Datasets.,” Front. Neuroinform., vol. 10, p. 36, 2016.

93

[33] S. C. Ng, “Principal component analysis to reduce dimension on digital image,” Procedia Comput. Sci., vol. 111, pp. 113–119, Jan. 2017.

[34] T. D. Pham et al., “The hidden-Markov brain: comparison and inference of white matter hyperintensities on magnetic resonance imaging (MRI),” J. Neural Eng., vol. 8, no. 1, p. 016004, Feb. 2011.

[35] G. A. Carmichael, “The Cohort and Period Approaches to Demographic Analysis,” 2016, pp. 85–128.

[36] G. A. Carmichael, Fundamentals of Demographic Analysis: Concepts, Measures and Methods, vol. 38. Cham: Springer International Publishing, 2016.

[37] G. A. Carmichael, “Comparison: Standardization and Decomposition,” Springer, Cham, 2016, pp. 49–84.

[38] M. M. Kazhdan, “SHAPE REPRESENTATIONS AND ALGORITHMS FOR 3D MODEL RETRIEVAL A DISSERTATION PRESENTED TO THE FACULTY OF PRINCETON UNIVERSITY IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY RECOMMENDED FOR ACCEPTANCE,” 2004.

[39] R. Pienaar, B. Fischl, V. Caviness, N. Makris, and P. E. Grant, “A METHODOLOGY FOR ANALYZING CURVATURE IN THE DEVELOPING BRAIN FROM PRETERM TO ADULT.,” Int. J. Imaging Syst. Technol., vol. 18, no. 1, pp. 42–68, Jun. 2008.

[40] B. Fischl et al., “Whole Brain Segmentation: Automated Labeling of Neuroanatomical Structures in the Human Brain,” Neuron, vol. 33, no. 3, pp. 341– 355, Jan. 2002.

[41] “Visual Information-Seeking Mantra - InfoVis:Wiki.” [Online]. Available: https://infovis-wiki.net/wiki/Visual_Information-Seeking_Mantra. [Accessed: 15-Jul-2019].

[42] T. Munzner and E. (Graphic artist) Maguire, Visualization analysis & design. .

[43] ***O. Potvin, A. Mouiha, L. Dieumegarde, and S. Duchesne, “Normative data for subcortical regional volumes over the lifetime of the adult human brain,” Neuroimage, vol. 137, 2016.

94

[44] * J. V. Manjón and P. Coupé, “volBrain: An Online MRI Brain Volumetry System,” Front. Neuroinform., vol. 10, 2016.

[45] *** D. Diaz, J. Villegas, J. A. Guerra-Gomez, N. Charpak, and J. T. Hernández, “Visual tools for the exploration of growth data in a cohort of kangaroo infants during their first year of life,” in 2017 IEEE Workshop on Visual Analytics in Healthcare, VAHC 2017, 2018.

[46] * A. A. Salam, T. Khalil, M. U. Akram, A. Jameel, and I. Basit, “Automated detection of glaucoma using structural and non structural features,” Springerplus, vol. 5, no. 1, 2016.

[47] M. Joseph Nitzken and M. Joseph, “Shape analysis of the human brain. Recommended Citation,” 2015.

[48] *** P. Coupé, G. Catheline, E. Lanuza, J. Manjón, and V. Manjón, “Towards a unified analysis of brain maturation and aging across the entire lifespan: A MRI analysis.”

[49] * S. S. Keller and N. Roberts, “Measurement of brain volume using MRI: software, techniques, choices and prerequisites,” 2009.

[50] * M. Styner, I. Oguz, S. Xu, C. Brechbühler, D. Pantazis, and G. Gerig, “Statistical Shape Analysis of Brain Structures using SPHARM-PDM.”

[51] * M. Styner et al., “Framework for the Statistical Shape Analysis of Brain Structures using SPHARM-PDM.,” Insight J., no. 1071, pp. 242–250, 2006.

[52] * G. Gerig, M. Styner, D. Jones, D. Weinberger, and J. Lieberman, “Shape analysis of brain ventricles using SPHARM,” in Proceedings IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA 2001), pp. 171– 178.

[53] M. E. Brummer, R. M. Mersereau, R. L. Eisner, and R. R. J. Lewine, “Automatic detection of brain contours in MRI data sets,” IEEE Trans. Med. Imaging, vol. 12, no. 2, pp. 153–166, Jun. 1993.

95

[54] * S. Ruan, C. Jaggi, J. Xue, J. Fadili, and D. Bloyet, “Brain tissue classification of magnetic resonance images using partial volume modeling,” IEEE Trans. Med. Imaging, vol. 19, no. 12, pp. 1179–1187, 2000.

[55] * S. Tootoonian, R. Abugharbieh, X. Huang, and M. J. McKeown, “Shape vs. Volume: Invariant Shape Descriptors for 3D Region of Interest Characterization in MRI,” in 3rd IEEE International Symposium on Biomedical Imaging: Macro to Nano, 2006., pp. 754–757.

[56] * B. Singh, N. Marshkole, K. Singh, and A. S. Thoke, “Texture and Shape based Classification of Brain Tumors using Linear Vector Quantization,” 2011.

[57] * G. Sanabria-Diaz et al., “Surface area and cortical thickness descriptors reveal different attributes of the structural human brain networks,” Neuroimage, vol. 50, no. 4, pp. 1497–1510, May 2010.

[58] * U. Castellani et al., “A New Shape Diffusion Descriptor for Brain Classification,” Springer, Berlin, Heidelberg, 2011, pp. 426–433.

[59] *** W. Y. Loh et al., “Neonatal basal ganglia and thalamic volumes: very preterm birth and 7-year neurodevelopmental outcomes.,” Pediatr. Res., vol. 82, no. 6, pp. 970–978, Dec. 2017.

[60] *** N. Charpak et al., “Twenty-year Follow-up of Kangaroo Mother Care Versus Traditional Care,” Pediatrics, vol. 139, no. 1, p. e20162063, Jan. 2017.

[61] *** M. H. Beauchamp et al., “Preterm infant hippocampal volumes correlate with later working memory deficits,” Brain, vol. 131, no. 11, pp. 2986–2994, Jun. 2008.

[62] *** Y. Lao et al., “Thalamic alterations in preterm neonates and their relation to ventral striatum disturbances revealed by a combined shape and pose analysis.,” Brain Struct. Funct., vol. 221, no. 1, pp. 487–506, Jan. 2016.

[63] *** C. Solé-Padullés et al., “Intrinsic connectivity networks from childhood to late adolescence: Effects of age and sex,” Dev. Cogn. Neurosci., vol. 17, pp. 35–44, Feb. 2016.

96

[64] * R. W. Cox, “AFNI: Software for Analysis and Visualization of Functional Magnetic Resonance Neuroimages,” Comput. Biomed. Res., vol. 29, no. 3, pp. 162– 173, Jun. 1996.

[65] *** C. M. Y. Chau et al., “Hippocampus, Amygdala, and Thalamus Volumes in Very Preterm Children at 8 Years: Neonatal Pain and Genetic Variation,” Front. Behav. Neurosci., vol. 13, p. 51, Mar. 2019.

[66] *** A. L. Cismaru et al., “Altered Amygdala Development and Fear Processing in Prematurely Born Infants.,” Front. Neuroanat., vol. 10, p. 55, 2016.

97

Annex A:

Profile description of the variables that the experts consider important for analyzing the kangaroo data with the tool. Each variable is listed with its group.

code
BirthDate: Initial_State
SCB_agemother: Initial_State
BIRTH_peso5: Initial_State
BIRTH_talla5: Initial_State
BIRTH_pc5: Initial_State
BIRTH_apgar1_5: Initial_State
BIRTH_apgar5_5: Initial_State
BIRTH_sexo5: Initial_State
BIRTH_retardocrecimientointraute: Initial_State
BIRTH_gestacat: Initial_State
BIRTH_semanas32: Initial_State
BIRTH_peso1200gr: Initial_State
NEO_gestasalcat: Initial_State
NEO_gestasal: Initial_State
NEO_peso6: Initial_State
FOLLOW_Fragility_Rasch_441_2PL: Initial_State
FOLLOW_Fragility_Rasch_746_2PL: Initial_State
FOLLOW_GA_day1intervention: Initial_State
FOLLOW_ageday1intervention: Initial_State
EX41_pesosalidaPC: Intervention
EX41_tallasalidaPC: Intervention
EX41_PCsalidaPC: Intervention
EX41_EGsalidaPC: Intervention
EX41_egestasal: Intervention
EX41_papacarga: Intervention
EX41_papacarga02: Intervention
EX41_mamacargo: Intervention
EX41_separationfrommother: Intervention
WASI_PercRsngcompositescore: Intervention

BIRTH_pc5 (Numeric). Distinct count: 21; Unique: 18.6%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 298.31; Minimum: 235; Maximum: 360; Zeros: 0.0%.

BIRTH_apgar5_5 (Numeric). Distinct count: 6; Unique: 5.3%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 9.0442; Minimum: 5; Maximum: 10; Zeros: 0.0%.

BIRTH_talla5 (Numeric). Distinct count: 17; Unique: 15.0%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 405.75; Minimum: 310; Maximum: 480; Zeros: 0.0%.

BIRTH_peso5 (Numeric). Distinct count: 31; Unique: 27.4%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 1547.1; Minimum: 800; Maximum: 1800; Zeros: 0.0%.

FOLLOW_ageday1intervention (Highly correlated). This variable is highly correlated with EX41_separationfrommother (correlation 0.91366) and should be ignored for analysis.

BIRTH_peso1200gr (Boolean). Distinct count: 2; Unique: 1.8%; Missing: 0.0% (n = 0). Value counts: 0: 99; 1: 14.

EX41_egestasal (Numeric). Distinct count: 57; Unique: 50.4%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 36.829; Minimum: 31.143; Maximum: 43.571; Zeros: 0.0%.

NEO_gestasal (Numeric). Distinct count: 42; Unique: 37.2%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 34.994; Minimum: 31.143; Maximum: 40.429; Zeros: 0.0%.

EX41_separationfrommother (Numeric). Distinct count: 39; Unique: 34.5%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 16.434; Minimum: 1; Maximum: 64; Zeros: 0.0%.

EX41_papacarga (Categorical). Distinct count: 3; Unique: 2.7%; Missing: 0.0% (n = 0). Value counts: 3: 53; 1: 42; 0: 18.

BIRTH_gestacat (Categorical). Distinct count: 3; Unique: 2.7%; Missing: 0.0% (n = 0). Value counts: 1: 53; 2: 37; 3: 23.

BirthDate (Date). Distinct count: 91; Unique: 80.5%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Minimum: 1993-06-18 00:00:00; Maximum: 1994-09-10 00:00:00.

BIRTH_apgar1_5 (Numeric). Distinct count: 8; Unique: 7.1%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 7.6018; Minimum: 1; Maximum: 10; Zeros: 0.0%.

NEO_peso6 (Numeric). Distinct count: 47; Unique: 41.6%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 1573; Minimum: 1025; Maximum: 1890; Zeros: 0.0%.

NEO_gestasalcat (Highly correlated). This variable is highly correlated with NEO_gestasal (correlation 0.98435) and should be ignored for analysis.

FOLLOW_GA_day1intervention (Numeric). Distinct count: 44; Unique: 38.9%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 34.604; Minimum: 30.571; Maximum: 40.429; Zeros: 0.0%.

SCB_agemother (Numeric). Distinct count: 22; Unique: 19.5%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 28.283; Minimum: 19; Maximum: 40; Zeros: 0.0%.

BIRTH_semanas32 (Boolean). Distinct count: 2; Unique: 1.8%; Missing: 0.0% (n = 0). Value counts: 0: 60; 1: 53.

FOLLOW_Fragility_Rasch_746_2PL (Numeric). Distinct count: 17; Unique: 15.0%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 0.2142; Minimum: -1.1861; Maximum: 1.6533; Zeros: 0.0%.

WASI_PercRsngcompositescore (Numeric). Distinct count: 47; Unique: 41.6%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 89.858; Minimum: 52; Maximum: 123; Zeros: 0.0%.

code (Numeric). Distinct count: 113; Unique: 100.0%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 531.9; Minimum: 9; Maximum: 1049; Zeros: 0.0%.

BIRTH_retardocrecimientointraute (Boolean). Distinct count: 2; Unique: 1.8%; Missing: 0.0% (n = 0). Value counts: 0: 81; 1: 32.

EX41_mamacargo (Boolean). Distinct count: 2; Unique: 1.8%; Missing: 0.0% (n = 0). Value counts: 1: 58; 0: 55.

BIRTH_sexo5 (Boolean). Distinct count: 2; Unique: 1.8%; Missing: 0.0% (n = 0). Value counts: 1: 64; 2: 49.

EX41_papacarga02 (Boolean). Distinct count: 2; Unique: 1.8%; Missing: 0.0% (n = 0). Value counts: 0: 71; 1: 42.

FOLLOW_Fragility_Rasch_441_2PL (Numeric). Distinct count: 102; Unique: 90.3%; Missing: 0.0% (n = 0); Infinite: 0.0% (n = 0); Mean: 0.36574; Minimum: -1.0152; Maximum: 2.7289; Zeros: 0.0%.
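The fields reported for each numeric variable in this annex can be reproduced with a short function. The sketch below uses only the Python standard library; the function name `profile_numeric` and the sample values are hypothetical and are not the study data. "Unique (%)" is interpreted here as the distinct count divided by the number of rows, which matches the figures above (for example, 21 distinct values over 113 subjects gives 18.6%).

```python
import math
from statistics import mean


def profile_numeric(values):
    """Summarize a numeric column; None marks a missing value."""
    n = len(values)
    present = [v for v in values if v is not None]
    finite = [v for v in present if not math.isinf(v)]
    distinct = len(set(present))
    return {
        "distinct_count": distinct,
        "unique_pct": 100.0 * distinct / n,
        "missing_n": n - len(present),
        "missing_pct": 100.0 * (n - len(present)) / n,
        "infinite_n": len(present) - len(finite),
        "mean": mean(finite),
        "minimum": min(finite),
        "maximum": max(finite),
        "zeros_pct": 100.0 * sum(1 for v in finite if v == 0) / n,
    }


# Hypothetical sample, not the study data.
sample = [235, 300, 300, 360, None]
summary = profile_numeric(sample)
for field, value in summary.items():
    print(f"{field}: {value}")
```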


Pearson's correlation
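The "highly correlated" flags reported in this annex for FOLLOW_ageday1intervention and NEO_gestasalcat can be illustrated with Pearson's coefficient, r = cov(x, y) / (sd(x) · sd(y)). The sketch below computes r for every pair of columns and flags the later variable of any pair whose absolute correlation exceeds a threshold. The threshold of 0.9 is an assumption (the annex reports only the values 0.91366 and 0.98435), and the column names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated(columns, threshold=0.9):
    """Return {flagged_variable: (kept_variable, r)} for pairs with |r| > threshold."""
    flagged = {}
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        if name_a in flagged or name_b in flagged:
            continue  # a flagged variable is no longer compared
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged[name_b] = (name_a, r)
    return flagged


# Hypothetical columns, not the study data.
columns = {
    "gestational_age": [31.0, 33.0, 35.0, 37.0, 40.0],
    "gestational_age_cat": [1.0, 1.0, 2.0, 2.0, 3.0],
    "mother_age": [30.0, 22.0, 35.0, 19.0, 28.0],
}
flagged = highly_correlated(columns)
for name, (other, r) in flagged.items():
    print(f"{name} is highly correlated with {other} (r = {r:.5f})")
```

Keeping the first variable of each pair and flagging the second is one possible design choice; which of the two variables to drop is ultimately a decision for the domain experts.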

All of the previous variables can be used to feed the tool regardless of their type (numerical or categorical), because the top zone of the heat map accepts category or membership variables, which constitute a type of hierarchy.
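One way to picture such a hierarchy is to order the subjects first by the category variables, in the order they are stacked, and then by a numeric variable within each group. The sketch below is an illustration under stated assumptions, not the code of the tool: the helper `group_columns` is hypothetical, and the records reuse the variable names BIRTH_sexo5 and EX41_mamacargo with made-up values.

```python
from itertools import groupby

# Hypothetical records: (code, BIRTH_sexo5, EX41_mamacargo, numeric value).
subjects = [
    (9, 1, 0, 0.42),
    (15, 2, 1, -0.10),
    (22, 1, 1, 1.30),
    (31, 2, 1, 0.75),
    (40, 1, 0, -0.55),
]


def group_columns(records, category_indexes, value_index):
    """Order records by the category hierarchy, then by descending value."""
    def key(record):
        categories = tuple(record[i] for i in category_indexes)
        return categories + (-record[value_index],)
    return sorted(records, key=key)


ordered = group_columns(subjects, category_indexes=(1, 2), value_index=3)
for categories, members in groupby(ordered, key=lambda r: (r[1], r[2])):
    codes = [r[0] for r in members]
    print(f"sexo5={categories[0]}, mamacargo={categories[1]}: subjects {codes}")
```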
