Analytic tools for content and shape analysis in 3D brain images
Pedro Alexander Díaz Quiroga Departamento de Ingeniería de Sistemas y Computación Universidad de los Andes
A thesis submitted for the degree of Master in Information Engineering (Maestría en Ingeniería de Información), August 2019
Acknowledgements
I would like to thank my thesis director, Professor Jose Tiberio Hernandez, for his guidance and advice during this thesis process. In the same way, I want to thank Dr. Nathalie Charpak for her constant support and answers to my concerns. I would also like to thank Dr. Jorge Marín, Dr. Diego Angulo and engineer Alejandra Castelblanco, who collaborated with me, framing the project within an even larger project of added value for industry and society. Finally, I thank my family.
Abstract
This document is the product of the author's project for the master's degree in information engineering. It aims to add value to the information assets of the Canguro program (http://fundacioncanguro.co/) concerning the study of the development of premature patients and, specifically, the impact of different aspects of the environment on the brains of those patients. The project takes brain images of subjects of different ages, environments and genders and provides tools to visualize and group the subjects according to physical characteristics of their brain structures, such as volume and shape.
Contents
CHAPTER 1. INTRODUCTION
CHAPTER 2. GENERAL PROJECT DESCRIPTION
2.1 Objectives
2.2 Background
2.3 Problem Definition and Significance
CHAPTER 3. PROJECT STATEMENT AND SPECIFICATIONS
3.1 Problem description
3.2 Design description
3.3 Data profiling
CHAPTER 4. PROJECT DEVELOPMENT
4.1 Referential Framework
4.2 Design proposal
CHAPTER 5. IMPLEMENTATION
5.1 Description
5.2 Dependencies
5.3 Studied Parameters
CHAPTER 6. USE CASES
6.1 Application possibilities
6.2 Dimensionality reduction
6.3 Some Cases observed
CHAPTER 7. CONCLUSIONS
7.1 Future Work
REFERENCES
Chapter 1. Introduction
The purpose of this graduation project is to take advantage of the data that is available in the Canguro Foundation in Colombia, related to the brain images of patients taken during 20 years of follow-up since their premature birth. This data was exploited during the execution of the project, and the results were shown to medical experts for interpretation.
Exploiting the data in this context means understanding it, transforming it into an adequate format and converting it into visual and statistical information that medical doctors can interact with. The characteristics of the data refer to the shape and content of the functional structures of the patients’ brains. The work is based on the hypothesis that these characteristics are influenced by elements of the environment in which the subject develops.
This document is structured as follows: first, a summary of the state-of-the-art context is presented; then the design proposal defined to achieve the project’s objective is described; next, the elements that form part of the solution in the implementation are detailed; the study case, which has been designed in the simplest possible way, is described next; and finally the results and conclusions are presented.
Chapter 2. General project description
2.1 Objectives
Gather variables related to the volume, shape and content of brain structures, and build tools that support the analysis of experts, allowing them to create new hypotheses and/or generate new knowledge.
Present a summary of the review of the available tools, in order to validate the value provided by the tools proposed in this project.
Contribute the computation of new variables related to the volume, shape and content of brain structures, such as the relative center of mass, and determine their possible contribution to the characterization of the study subjects.
Test the developed tools with expert users, using real data from the 20-year investigation of the kangaroo program in Bogotá.
2.2 Background
Several references serve as antecedents of this thesis. First, the following references suggest that the forms of brain structures and functions can characterize subjects ([17] Sherbondy et al., 2005; [21] Wang, Qi et al., 2016; [1] Cabral, Joana et al., 2017). However, it is not easy to find a study that follows more than 100 subjects over periods longer than 3 years. Several factors make this kind of study difficult:
The process of taking brain images is not easy and requires time, money and patience from subjects and researchers.
Due to so-called cerebral plasticity, the functions of the subject may not always be associated with certain structural forms. This complicates the study because of the large number of possibilities that may arise.
There are many possible variables to consider, which makes the problem hard to handle.
Many open tools already exist and have been refined, and in many cases they present much more information than is actually used. This happens because of the sheer quantity and diversity of that information, because a possible application is not perceived, or because information engineers have not interacted with doctors and researchers to generate valuable knowledge from these sources.
Some references compare the most important tools available, with the objective of quantifying the accuracy of their measurements. This is the case of ([16] Morey, Rajendra A et al., 2009), which compares structures such as the hippocampus and the amygdala.
Other references present techniques to characterize a 3D shape in brain investigations, such as the analysis of intracranial aneurysms ([15] Meuschke, Monique et al., 2018); tractography in the human brain ([17] Sherbondy et al., 2005); surface registration of cortical areas ([20] Tardif, Christine Lucas et al., 2015); cerebral microstructural subdivision ([7] Fischl, Bruce et al., 2018); analysis of the shape of the cerebral ventricles ([11] Gerig, G et al., 2001); and connectivity-based brain division ([21] Wang, Qi et al., 2016).
2.3 Problem Definition and Significance
The human brain is the most complex system known in the universe. From the brain much information can be extracted that can be used for a greater understanding of our nature. Understanding how our brain is affected by the factors that surround it will help to benefit it and therefore help human development.
This project studies the data from the Canguro Foundation ([30] http://fundacioncanguro.co/), whose mission is to apply science to humanize neonatology. The Canguro Foundation has a wide set of data on brain images and on medical, behavioral and social variables of patients with premature birth. The foundation was in charge of monitoring the patients who volunteered, gathering data during a 20-year period to enrich its scientific mission.
This valuable resource, unique in the world, merits the involvement of information engineers in an interdisciplinary group already composed of medical experts and scientific researchers.
Chapter 3. Project statement and specifications
3.1 Problem description
A user-friendly, interactive web platform for analyzing shape and content parameters of brain morphological structures, either for individual subjects or for groups of subjects, is currently not available.
3.2 Design description
Comparing shape and content parameters in the morphology of the subjects' brains requires a preprocessing module that makes those parameters comparable by means of alignment, rotation, uniformization and displacement with respect to a common reference system.
An adaptation layer for data processing is also required. It allows the information to be integrated into the analysis tool, which must be able to compare a large group of subjects as well as to examine the details of a single subject.
The platform must be scalable and integrable with other modules that perform other types of analysis, such as functional analysis. It must also be web accessible, so that it can be reached from any device, operating system and location.
3.3 Data profiling
The medical images of the brain produced by the measuring equipment come in a format called DICOM (Digital Imaging and Communications in Medicine), which contains the pixels of the image as well as a header with metadata and additional information. Each DICOM image contains one slice of the brain, that is, a single section of the 3D brain.
The NIFTI (Neuroimaging Informatics Technology Initiative) format, on the other hand, represents all the slices of the brain: the 3D brain image is obtained by stacking the individual slices on top of each other. A single NIFTI file can therefore be generated from several DICOM files. The NIFTI format is easier to use because it contains the 3D information necessary for the analysis of volumetric images.
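The stacking idea can be illustrated with a minimal sketch. It assumes the pixel data of each slice has already been read into a 2D NumPy array; the function name `stack_slices` and the sample data are hypothetical. Real DICOM-to-NIFTI converters also handle slice ordering, orientation and header metadata, which are not shown here.

```python
import numpy as np


def stack_slices(slices):
    """Stack equally sized 2D slices into one 3D volume (slice index last)."""
    shapes = {s.shape for s in slices}
    if len(shapes) != 1:
        raise ValueError("all slices must have the same shape")
    return np.stack(slices, axis=-1)


# Three hypothetical 4x5 slices standing in for the pixel data of three DICOM files.
slices = [np.full((4, 5), fill_value=k, dtype=np.int16) for k in range(3)]
volume = stack_slices(slices)
print(volume.shape)
```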
The medical images of the brain can be of the MRI type (magnetic resonance imaging, structural) or of the FMRI type (functional magnetic resonance imaging). The difference between the two is that structural images have no time variable, while functional images follow the subject's brain over time. The structural images represent the morphology of the brain of the subjects under study.
Figure 3.1: Left, an example of functional MRI, which is acquired over time. Right, a structural MRI image, which does not take time into account and is therefore a snapshot that reveals structural details of the brain.
NIFTI images can be loaded in Python through the nibabel library and converted into NumPy 3D arrays, which allows for further processing.
This project focuses on structural MRI images, which represent the morphological structure at a given time. In our case they correspond to the images of the young adult subjects who participated in the kangaroo method because they were born prematurely.
With preprocessing, images are “standardized”, that is, taken to the same reference system in size and content so that they can be compared in these aspects. In our case this is done with the FreeSurfer pipeline, although several other tools can perform this step.
3.3.1 Variables framework and profiling
Following the proposed data model, the variables that feed the tool are organized into categories. The cataloging of the variables found for the case analyzed here (the Kangaroo subjects) is shown below.
Figure 3.2: Data model for this project
The classification of the more than 1500 variables taken during the previous
kangaroo analysis was carried out to catalog them within the above categories.
Here is an example of the cataloging of these variables:
Figure 3.3: Data model for this project, which integrates an example of the variables.
Every variable represents certain characteristics of the subjects, for example their initial state, the intervention to which they were subjected (e.g. through the kangaroo program), their anatomical images and their intellectual performance. As a hypothesis, a certain relationship between the variables is expected. For example, if a subject had a bad initial state, the intervention to which the subject is subjected will alter the anatomical and performance results, compared to a subject who does not undergo such an intervention.
This project goes deeper into the anatomical variables and analyzes in detail different features that could characterize a cerebral anatomical structure. Features related to the volume, shape and content of brain structures are considered of interest as a way to characterize those structures, and they are therefore described in more detail below.
Subject anatomic result variables:
The characteristics that identify brain structures are listed below. The list includes those calculated with the FreeSurfer pre-processing pipeline as well as those calculated within the present project by means of the tool:
Variable          Description                                Units             Calculated by
NumVert           Number of vertices                         unitless          FreeSurfer
SurfArea          Surface area                               mm^2              FreeSurfer
GrayVol           Gray matter volume                         mm^3              FreeSurfer
ThickAvg          Average thickness                          mm                FreeSurfer
ThickStd          Thickness standard deviation               mm                FreeSurfer
MeanCurv          Integrated rectified mean curvature        mm^-1             FreeSurfer
GausCurv          Integrated rectified Gaussian curvature    mm^-2             FreeSurfer
FoldInd           Folding index                              unitless          FreeSurfer
CurvInd           Intrinsic curvature index                  unitless          FreeSurfer
Volume            Volume by number of voxels                 voxels            Tool
SumOfContent      Sum of voxel content                       unitless          Tool
MaxOfContent      Maximum of voxel content                   unitless          Tool
MinOfContent      Minimum of voxel content                   unitless          Tool
MediaOfContent    Median of voxel content                    unitless          Tool
MeanOfContent     Mean of voxel content                      unitless          Tool
StdOfContent      Standard deviation of voxel content        unitless          Tool
CenterOfMass      Center of mass of voxel content            voxels (linear)   Tool
RCenterOfMass     Relative center of mass of voxel content   voxels (linear)   Tool
Volume:
Volume is a scalar metric magnitude defined as the extension of a region of space in three dimensions. It is a magnitude derived from length, since it is found by multiplying length, width and height. Our concern is the volume of the internal structures of the brain. Within this project, two types of volume were used:
Variable          Description                                Units             Calculated by
GrayVol           Gray matter volume                         mm^3              FreeSurfer
Volume            Volume by number of voxels                 voxels            Tool
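As an illustration of the voxel-count volume, the sketch below counts the voxels that carry a given label in a segmentation array and, given the voxel size, converts the count to mm^3. The function name, the label id and the 1 mm voxel size are assumptions made for the example, not values taken from the tool.

```python
import numpy as np


def structure_volume(labels, label_id, voxel_size_mm=(1.0, 1.0, 1.0)):
    """Return (voxel count, volume in mm^3) for one label of a segmentation."""
    n_voxels = int(np.count_nonzero(labels == label_id))
    voxel_volume = float(np.prod(voxel_size_mm))  # mm^3 per voxel
    return n_voxels, n_voxels * voxel_volume


# Hypothetical segmentation: a 3 x 4 x 2 block of voxels labeled 17.
labels = np.zeros((10, 10, 10), dtype=np.int32)
labels[2:5, 2:6, 2:4] = 17
n_voxels, volume_mm3 = structure_volume(labels, 17)
print(n_voxels, volume_mm3)
```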
Shape:
In geometry, the shape of a physical object located in a space is a geometric description of the part of the space occupied by the object, as determined by its outer boundary, without taking into account its location and orientation in space, its size, or other properties such as color, content and material composition.
Within the previous cataloging, the characteristics of curvature, surface area and thickness are considered specifically, with the following variables:
Curvature
Variable          Description                                Units             Calculated by
MeanCurv          Integrated rectified mean curvature        mm^-1             FreeSurfer
GausCurv          Integrated rectified Gaussian curvature    mm^-2             FreeSurfer
FoldInd           Folding index                              unitless          FreeSurfer
CurvInd           Intrinsic curvature index                  unitless          FreeSurfer
Surface area
Variable          Description                                Units             Calculated by
NumVert           Number of vertices                         unitless          FreeSurfer
SurfArea          Surface area                               mm^2              FreeSurfer
Thickness
Variable          Description                                Units             Calculated by
ThickAvg          Average thickness                          mm                FreeSurfer
ThickStd          Thickness standard deviation               mm                FreeSurfer
Content:
The word content has several meanings according to the dictionary. We take a simple definition that captures the characteristic we want to determine for each brain structure:
“Set of each one of the parts that make up a unit”.
In our case, the content corresponds to each of the units, or voxels, that make up the 3D image. Voxels are tiny cubes with an intensity value in the 3D image. The higher the intensity value of a voxel, the brighter it appears, and vice versa.
Within this definition, the following variables describe the content of brain structures:
Variable          Description                                Units             Calculated by
SumOfContent      Sum of voxel content                       unitless          Tool
MaxOfContent      Maximum of voxel content                   unitless          Tool
MinOfContent      Minimum of voxel content                   unitless          Tool
MediaOfContent    Median of voxel content                    unitless          Tool
MeanOfContent     Mean of voxel content                      unitless          Tool
StdOfContent      Standard deviation of voxel content        unitless          Tool
CenterOfMass      Center of mass of voxel content            voxels (linear)   Tool
RCenterOfMass     Relative center of mass of voxel content   voxels (linear)   Tool
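The scalar content variables can be sketched as simple statistics over the voxels of one structure. The function below is an illustrative assumption, not the tool's code: it takes the image and a boolean mask of the structure, reads MediaOfContent as the median, and uses the population standard deviation.

```python
import numpy as np


def content_statistics(image, mask):
    """Summary statistics of the voxel intensities inside a boolean mask."""
    values = image[mask]
    if values.size == 0:
        raise ValueError("the mask selects no voxels")
    return {
        "SumOfContent": float(values.sum()),
        "MaxOfContent": float(values.max()),
        "MinOfContent": float(values.min()),
        "MediaOfContent": float(np.median(values)),  # read here as the median
        "MeanOfContent": float(values.mean()),
        "StdOfContent": float(values.std()),         # population standard deviation
    }


# Hypothetical structure made of three voxels with intensities 2, 4 and 9.
image = np.zeros((3, 3, 3))
mask = np.zeros((3, 3, 3), dtype=bool)
image[1, 1, :] = [2.0, 4.0, 9.0]
mask[1, 1, :] = True
stats = content_statistics(image, mask)
print(stats)
```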
Within the category of "content", two proposed variables are intended to characterize a complete brain structure:
CenterOfMass: center of mass of voxel content
RCenterOfMass: relative center of mass of voxel content
These variables are analogous to the physical center of mass of an object in the sense that they locate the point around which the “mass” is balanced. In our case the “mass” is the voxel intensity, so the point is pulled toward the regions of the image with higher intensity values. This center of mass therefore depends on the shape, the number of curves of the structure, the volume and the content itself of the brain's internal structures. It thus reflects all the variables that are to be characterized in this project.
In physics, the three-dimensional center of mass is calculated independently along each axis. A reference point is chosen, and the distance of each mass from that point is measured along the coordinate in question. Each distance is multiplied by the mass located at that distance, the products are summed, and the result is divided by the sum of the masses.
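Written as a formula, with symbols introduced here for illustration ($m_i$ for the $i$-th mass and $x_i$ for its distance from the reference point along the chosen axis), the description above reads:

```latex
x_{\mathrm{cm}} = \frac{\sum_{i} m_i \, x_i}{\sum_{i} m_i}
```

The same expression applies to the other two axes, with $y_i$ or $z_i$ in place of $x_i$.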
As an example, the center of mass of the following system was calculated along the “x” coordinate (the two masses lie on the same horizontal line), taking the left line in the figure as the reference point.
Figure 3.4: Example of mass center calculation
The center of mass in the previous example is expressed in centimeters, the unit of distance from the reference point.
It can be deduced from the previous figure that the center of mass depends on both the distribution of the masses and the mass values themselves.
CenterOfMass (Center of mass of voxels content)
The following figure illustrates how the “center of mass” is calculated for our 3D images. Suppose we want to calculate the center of mass of the hippocampus, highlighted in blue in the image. The hippocampus is taken from the image pre-processed and segmented by FreeSurfer. A reference point is then chosen; in our case it is the (0, 0, 0) position in x, y, z of the NumPy array into which the image is converted. For each slice perpendicular to the chosen axis (starting, for example, with the x coordinate), the sum of the voxel values in that slice is multiplied by the distance of the slice from the reference point, measured as a number of voxels along that axis. These products are summed over all slices, and the result is divided by the total amount of "mass", which in our case is the sum of all the voxel values of the structure.
Figure 3.5: Example of mass center calculation for this project for different brain internal structures
The final result is a distance in voxels, measured from the reference point along the calculation coordinate (e.g. the “x” axis). As stated, it represents the 3D figure in its dimensions, shape and content, since a change in any of these features also changes the final result.
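The calculation can be sketched as follows. This is a minimal illustration under assumptions, not the tool's actual code: the structure is given as a boolean mask over the image array, and the function name and sample values are hypothetical.

```python
import numpy as np


def center_of_mass(image, mask):
    """Intensity-weighted center of a masked structure, in voxel coordinates."""
    weights = np.where(mask, image, 0.0).astype(float)
    total = weights.sum()
    if total == 0:
        raise ValueError("structure has zero total intensity")
    com = []
    for axis in range(weights.ndim):
        other_axes = tuple(a for a in range(weights.ndim) if a != axis)
        slice_sums = weights.sum(axis=other_axes)   # "mass" of each slice along this axis
        distances = np.arange(weights.shape[axis])  # distance from index 0, in voxels
        com.append(float((slice_sums * distances).sum() / total))
    return tuple(com)


# Hypothetical structure: two voxels on the same line along x.
image = np.zeros((5, 5, 5))
image[1, 2, 2] = 1.0
image[3, 2, 2] = 3.0
mask = image > 0
com = center_of_mass(image, mask)
print(com)
```

The brighter voxel pulls the result toward x = 3. SciPy offers `scipy.ndimage.center_of_mass` for the same kind of computation; the explicit loop is kept here to mirror the per-axis description.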
RCenterOfMass (Relative center of mass of voxels content)
If the hippocampi of the test subjects differ in size, their centers of mass will also differ. For this reason the relative center of mass is proposed. It is defined as the distance between the center of mass (defined and calculated in the previous section) and the center of mass of the same shape with a constant, uniform voxel intensity of "1".
This removes the size of the structures from the comparison between subjects and directs it towards the deviation of each structure from a uniform reference structure with the same shape and volume as its own. That is, this new measure is oriented towards characterizing the content only, unlike the center of mass, which is sensitive to volume, shape and content alike.
Put differently, RCenterOfMass measures how different the subject's image is from an image whose intensity values are completely flat. If the intensity values of the subject's image were constant, this value would be zero or very close to zero. The medical significance of this measure has not yet been discussed. Is great variation in this parameter expected for certain structures, or, on the contrary, great uniformity? These are questions we want to answer in this project.
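A minimal sketch of the relative center of mass is given below. It assumes that the "distance" is the Euclidean distance between the two centers; the text does not specify whether a per-axis difference is reported instead. The function names and sample data are hypothetical.

```python
import numpy as np


def weighted_centroid(weights):
    """Center of mass of a non-negative weight array, in voxel coordinates."""
    total = weights.sum()
    if total == 0:
        raise ValueError("weights sum to zero")
    grids = np.indices(weights.shape)  # one coordinate grid per axis
    return np.array([(g * weights).sum() / total for g in grids])


def relative_center_of_mass(image, mask):
    """Distance between the intensity-weighted and the uniform-intensity centers."""
    mask = mask.astype(bool)
    intensity_center = weighted_centroid(np.where(mask, image, 0.0))
    uniform_center = weighted_centroid(mask.astype(float))  # every voxel has value 1
    return float(np.linalg.norm(intensity_center - uniform_center))


# Hypothetical structure: three voxels in a row along x.
mask = np.zeros((5, 5, 5), dtype=bool)
mask[1:4, 2, 2] = True
flat = np.full((5, 5, 5), 7.0)  # constant intensity everywhere
skewed = flat.copy()
skewed[3, 2, 2] = 21.0          # one brighter voxel at the end of the row
r_flat = relative_center_of_mass(flat, mask)
r_skewed = relative_center_of_mass(skewed, mask)
print(r_flat, r_skewed)
```

The flat image gives a value of zero, while the brighter voxel shifts the intensity-weighted center away from the geometric one.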
Canguro variables profiling:
The Canguro variables and the proposed variables are integrated into the tools so that all of them can be seen in three different views. All the variables mentioned above have been profiled in order to understand them better and to see how they can be combined within the tool used to examine their interrelations. Understanding each variable individually makes it easier to choose which ones to analyze together.
The profiles below cover some of the most important variables of the Kangaroo study. Given the large number of variables, Annexes A to C provide a more detailed profiling. Here we show some variables that are worth highlighting because they are key for further analysis within this project.
Python profiling was used to profile the variables of the kangaroo study, as shown below with some of them as examples:
Category: Subject initial state variables
SCB_agemother: Age of the mother at the moment of the baby's birth
Type: Numeric
Distinct count    22
Unique (%)        19.5%
Missing (%)       0.0%
Missing (n)       0
Infinite (%)      0.0%
Infinite (n)      0
Mean              28.283
Minimum           19
Maximum           40
Zeros (%)         0.0%
The histogram of this numeric variable, labeled Age of the mother (years), shows a roughly bimodal distribution. Ages range from 19 to 40 years, are somewhat more concentrated in the range of 19 to 30, and include a notable group around 30.
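The profiling library is not named here, so the sketch below only recomputes fields of the same kind with the Python standard library. The function name and the sample ages are hypothetical; they are not data from the study.

```python
import math


def profile_numeric(values):
    """Summary fields for one numeric variable; None and NaN count as missing."""
    n = len(values)
    if n == 0:
        raise ValueError("no values to profile")
    present = [v for v in values if v is not None and not math.isnan(v)]
    finite = [v for v in present if not math.isinf(v)]
    distinct = len(set(present))
    return {
        "Distinct count": distinct,
        "Unique (%)": 100.0 * distinct / n,
        "Missing (n)": n - len(present),
        "Missing (%)": 100.0 * (n - len(present)) / n,
        "Infinite (n)": len(present) - len(finite),
        "Mean": sum(finite) / len(finite),
        "Minimum": min(finite),
        "Maximum": max(finite),
        "Zeros (%)": 100.0 * sum(1 for v in finite if v == 0) / n,
    }


# Hypothetical sample, not data from the study.
ages = [19, 22, 22, 28, 30, 30, 30, 40, None, 35]
profile = profile_numeric(ages)
print(profile)
```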
Category: Subject initial state variables
FOLLOW_Fragility_Rasch_746_2PL: Fragility parameter (see footnote 1) of the baby at birth. The smaller or more negative the value, the lower the fragility, that is, the greater the robustness of the baby
Type: Numeric
Distinct count    17
Unique (%)        15.0%
Missing (%)       0.0%
Missing (n)       0
Infinite (%)      0.0%
Infinite (n)      0
Mean              0.2142
Minimum           -1.1861
Maximum           1.6533
Zeros (%)         0.0%
1 Construction of a fragility index: As the initially randomized sample was slightly unbalanced 20 years later, instead of adjusting it by weight variables or propensity scores, we used a Rasch model to determine whether the groups differed at the start of the intervention with regard to general fragility (or vulnerability or limited development). A set of 15 binary indicators was selected to detect damage that might have occurred during pregnancy, birth or the neonatal period before randomization. Instead of a simple total of the indicators, the fragility index is based on individual factorial scores, on the assumption that a common latent variable measures the non-specific personal fragility of an infant.
The previous variable, labeled Fragility index 2PL including variables before randomization in the 746 infants from the original cohort, is numeric, with a distribution similar to the normal one and values ranging from -1.1 to 1.6. There are two ways to show this variable and relate it to the rest of the variables in the proposed heat map view. The first is as a membership variable, in which each value of the heat map belongs to a value of the variable (in this case, fragility), which can be seen at the top of the map. The second is to integrate it into the heat map itself, for which the variable must be standardized and undergo other treatments so that it can be displayed with the color tones established for the range, which goes from negative numbers in blue to positive numbers in red.
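The second option can be sketched as follows, assuming that standardization means z-scoring and that colors fade from white towards blue or red. Both function names, the clipping limit and the sample values are assumptions made for illustration; the other treatments mentioned above are not shown.

```python
import numpy as np


def standardize(values):
    """Z-score: subtract the mean and divide by the standard deviation."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        return np.zeros_like(values)
    return (values - values.mean()) / std


def diverging_color(z, limit=3.0):
    """Map a standardized value to (r, g, b): blue for negative, red for positive."""
    t = float(np.clip(z / limit, -1.0, 1.0))
    if t < 0:
        return (1.0 + t, 1.0 + t, 1.0)  # white fades to blue
    return (1.0, 1.0 - t, 1.0 - t)      # white fades to red


# Hypothetical fragility values, not data from the study.
fragility = [-1.2, -0.4, 0.2, 0.9, 1.6]
z = standardize(fragility)
colors = [diverging_color(v) for v in z]
print(z, colors)
```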
Category: Performance
WASI_PercRsngcompositescore
Type: Numeric
Distinct count    47
Unique (%)        41.6%
Missing (%)       0.0%
Missing (n)       0
Infinite (%)      0.0%
Infinite (n)      0
Mean              89.858
Minimum           52
Maximum           123
Zeros (%)         0.0%
WASI_PercRsngcompositescore is labeled PRI (Perceptual Reasoning Index Composite score). It is a variable that gives the result of the perceptual reasoning test for the subjects at the age of 20 years. Its distribution has the form of a Gaussian bell, with some outliers.
Category: Subject initial state variables
BIRTH_apgar5_5
Type: Numeric
Distinct count    6
Unique (%)        5.3%
Missing (%)       0.0%
Missing (n)       0
Infinite (%)      0.0%
Infinite (n)      0
Mean              9.0442
Minimum           5
Maximum           10
Zeros (%)         0.0%
The Apgar score is a test to evaluate newborns shortly after birth. It evaluates the baby's heart rate, muscle tone and other signs to determine whether the baby needs additional or emergency medical help.
Usually, the Apgar test is given to the baby twice: first one minute after birth and again five minutes after birth. Sometimes, if the baby's physical condition is worrisome, the baby is evaluated a third time.
BIRTH_apgar5_5, labeled APGAR score at 5 minutes and shown above, concentrates its values around 9 (see footnote 2).
It is important to note that all the variables associated with each subject, which catalog the subjects and provide additional information on their initial status, the intervention to which they were subjected or their final performance (regardless of the variable type), can be visualized in the tool among the cataloging variables found in the upper part of the heat map, as shown in the following figure.
Figure 3.6: Variables related to the kangaroo study that can be represented at the
top of the heat map
2 The word Apgar refers to "Appearance, Pulse, Irritability (Grimace), Activity and Breathing." In the test, these five factors are used to assess the health of the baby. Each factor is evaluated on a scale ranging from 0 to 2, with 2 being the highest possible score:
Appearance (skin color)
Pulse (heart rate)
Irritability (reflex response; "Grimace" in English)
Activity (muscle tone)
Breathing (respiratory rate and respiratory effort)
A sample of the profiling of the most important variables that were considered for this project is included in Annex A.
Chapter 4. Project Development
4.1 Referential Framework
4.1.1 State of the art tools and platforms
In neuroimaging research there is increasing evidence that shape analysis of brain structures provides new information which is not available by conventional volumetric measurements. This motivates development of novel morphometric analysis techniques answering clinical research questions which have been asked for a long time but which remained unanswered due to the lack of appropriate measurement tools. The challenges are the choice of biologically meaningful shape representations, robustness to noise and small perturbations, and the ability to capture the shape properties of populations that represent natural biological shape variation.
The following is a compendium of the morphology measurement software tools currently available for magnetic resonance imaging analysis, based on the listing at:
https://omictools.com/morphometric-analysis-3-category
4.1.1.1 BoneJ
A plugin for bone image analysis in ImageJ. BoneJ provides free, open source tools for trabecular geometry and whole bone shape analysis. It calculates several trabecular, cross-sectional and particulate parameters in a convenient format. Java technology allows BoneJ to run on commodity computers, independent of scanner devices, fully utilising hardware resources. ImageJ’s plugin infrastructure provides a flexible working environment that can be tailored to diverse experimental setups. BoneJ is a working program and a starting point for further development, which will be directed by users’ requests and the emergence of new techniques.
Critical analysis: BoneJ is a multipurpose tool for analyzing the geometric characteristics of bones, but it does not focus on our main concern, which is brain images. Its main uses are related to oculocerebrorenal syndrome, analysis of fractures, Marfan syndrome and pancreatic ductal carcinoma, among others. It is used mostly in the USA, Canada, Europe, Australia, Chile, Brazil and South Africa. It is a free, open source tool that can be downloaded at http://bonej.org/
Figure 4.1: Graphical output from BoneJ. (A) Trabecular thickness (Tb.Th) measured with the Thickness command. Yellow regions are thicker than blue regions. (B) Centroid and principal axes as calculated by Slice Geometry on a tomographic slice from the tarsometatarsal bone of a little spotted kiwi (Apteryx owenii). (C) Murine osteocyte lacunae imaged with synchrotron μCT and measured with Analyse Particles.
4.1.1.2 FSL
Allows users to work with magnetic resonance imaging (MRI) brain data. FSL can perform functional magnetic resonance imaging (FMRI) analysis (brain extraction, smoothing, statistics, and registration). This tool can analyze a wide range of MR modalities (task FMRI, resting FMRI, ASL, diffusion, structure), and can be easily scripted and run over computing clusters.
Critical analysis: FSL was in fact one of the most explored tools during this project, because it provides very important information about the characteristics that we wanted to analyze. FSL, like FreeSurfer, provides a pre-processing pipeline that segments the images, which are the input to our tool. FSL is well established worldwide and has also been used in Colombia. The tool has been widely used in studies related to Alzheimer's disease, brain injuries, non-Hodgkin lymphoma, stroke and cerebral infarction, among others.
Figure 4.2: FSL GUI for graphical analysis
4.1.1.3 Brain Extraction Tool
Removes non-brain tissue from an image of the whole head. BET can also estimate the inner and outer skull surfaces, and the outer scalp surface, if T1 and T2 input images are of good quality. It is very robust and accurate and has been tested on thousands of data sets from a wide variety of scanners, taken with a wide variety of magnetic resonance sequences. BET uses a deformable model that evolves to fit the brain's surface by the application of a set of locally adaptive model forces. The method is very fast and requires no preregistration or other pre-processing before being applied. BET takes about 5-20 sec to run on a modern desktop computer and is freely available as a standalone program that can be run from the command line or from a simple GUI, as part of FSL (FMRIB Software Library).
Critical analysis: BET, like FSL, was seriously considered for this project as a component of the pre-processing pipeline because it is both lightweight and powerful. BET is a very mature tool that delivers clean brain images, which is a great help as a starting point for any type of brain analysis. BET has been used worldwide and has contributed to studies similar to those mentioned for FSL.
Figure 4.3: Brain segmentation obtained with the Brain Extraction Tool
4.1.1.4 FracLac
Offers a platform for fractal analysis and morphology functions. FracLac is a module that can be used as an ImageJ or FIJI plugin. The application includes features such as: (i) analyzing complexity and heterogeneity; (ii) measuring geometrical forms that are difficult to describe; (iii) extracting patterns from several types of images for analysis. The software is suitable for images of biological cells as well as for other structures such as branching structures or known fractals.
Critical analysis: FracLac specializes in characterizing biological structures in general, which is precisely the objective of this project. Because it is so general and not specialized in the brain, it would still require adjustments to be integrated into this project; however, it is an interesting tool that still has potential to be exploited. In the Americas it is used only in Brazil and the USA. It is also used in a good part of Europe and Asia. The tool has been applied to problems related to neoplasms (similar to tumors), brain injuries and the detection of irregular forms in the human body.
Figure 4.4: FracLac for ImageJ to quantify cell complexity and shape. (a) Illustration of FracLac box counting method to derive fractal dimension calculations of a microglia outline. (b) Schematic of the convex hull (blue), bounding circle (pink) and convex hull ellipse (orange) with accompanying longest length and width (dashed lines)
4.1.1.5 BrainVISA
Hosts heterogeneous tools dedicated to neuroimaging research. BrainVISA aims to help researchers in developing new neuroimaging tools, sharing data and distributing software. It offers a way to define viewers, which may use any visualization software. Thanks to its data management functions, the tool can define the data types handled by the software, associate key attributes for indexing, and define filename patterns that link the filesystem to the database schema.
Critical analysis: BrainVISA is widely used in the Americas (including Colombia), much of Europe, China and Australia. It facilitates collaboration between neuroimaging researchers, which is precisely our area of concern. In more advanced stages of our process, it may be feasible to contribute within that environment. The tool has contributed to studies of Alzheimer's disease and glioma, among others.
Figure 4.5: BrainVISA IntrAnat module, a collection of software to manage and analyse anatomical (MRI, CT) and functional SEEG data (intracranial electrodes) of epileptic patients.
4.1.1.6 Mango (Multi-image Analysis GUI)
Automates regional behavioral analysis of human brain images. Mango provides analysis tools and a user interface to navigate image volumes. It is an easy-to-use, multi-platform Java application with extensive region-of-interest tools. It can add and update software as plugin modules and offers full access to a suite of image viewing and processing features. The software can rapidly determine regionally specific behaviors for researchers' brain studies.
Critical analysis: Mango was another of the tools explored with considerable success in this project because of its ability to display realistic brain images in a web environment with the help of JavaScript. Adapting it as the volumetric rendering module was considered, but in the end the module was developed from scratch, since this made it easier to manipulate the segmented images according to the experts' requirements and gave greater adaptability to their needs. Mango is used for many tasks related to the analysis of brain diseases and is used in much of Europe, the United States, Brazil and China.
Figure 4.6: Mango mobile-friendly medical imaging research application for the Apple iPad. It features many of the same ROI and analysis tools as Mango and uses interoperable file formats and customization files such as ROIs and user-defined color tables.
4.1.1.7 Braviz (Brain Visualization)
Braviz is a Python library and a system with a graphical user interface that can be used to analyze brain data. The Braviz system is a collection of small applications tailored for specific tasks. The idea is that each application should be easy to understand and use. Nevertheless, the applications are connected to each other in several ways, which makes it possible to complete more complicated tasks. If none of the available applications fits the task at hand, a new application may be developed and integrated into the current system using Braviz as a library.
Critical analysis: Braviz is an in-house tool whose strength lies in its adaptation to the analysis problems that exist in Colombia. It offers a wide range of per-subject analysis features, which allows individual characteristics to be analyzed in great detail. Its weaknesses are that the first version was not adapted to the web environment and that its group-level analysis needs to be strengthened. Braviz has been used in Colombia by a group of scientists from different parts of the world to study the brains of the Canguro subjects. The aim of the present project is precisely to contribute to a second version of Braviz.
Figure 4.7: An example of BRAVIZ running on a large display in a collaborative setting.
4.1.1.8 BrainSuite
Enables largely automated processing of magnetic resonance images (MRI) of the human brain. BrainSuite provides a sequence of low-level operations that can produce accurate brain segmentations in clinical time. It produces classified brain volumes that can be useful for quantitative studies of different regions of the brain. The tool consists of several modules that perform skull and scalp removal, nonuniformity correction, tissue classification, and object topology correction.
Critical analysis: BrainSuite is an interesting tool that should be explored in future work because of its image preprocessing capabilities; it appears to be friendlier than FreeSurfer and has strong analytical features. Unlike FreeSurfer and others, it runs on Windows, which allows more researchers to work with it. Scientists from Colombia, Argentina, China, Europe and the USA have used this tool.
Figure 4.8: BrainSuite magnetic resonance image analysis tools
4.1.1.9 Freesurfer
FreeSurfer is a software package for the analysis and visualization of structural and functional neuroimaging data from cross-sectional or longitudinal studies. It is developed by the Laboratory for Computational Neuroimaging at the Athinoula A. Martinos Center for Biomedical Imaging. FreeSurfer is the structural MRI analysis software of choice for the Human Connectome Project.
Critical analysis: FreeSurfer is the basis for the preprocessing in this project because the data had already been preprocessed with this tool. FreeSurfer is a mature and widely known tool in the field of brain imaging analysis. It supports the analysis of structural brain data, and its preprocessing generates a large amount of related data; however, it does not have a web interface that would make it easy for scientists to connect from any place or system.
Figure 4.9: FreeSurfer GUI for analyzing the 3D volume of the brain's internal structures
4.1.1.10 Neurosynth
Neurosynth is a platform for large-scale, automated synthesis of functional magnetic resonance imaging (fMRI) data. It takes thousands of published articles reporting the results of fMRI studies, processes them automatically, and produces images.
Critical analysis: Neurosynth is a very interesting tool that supports a wide range of functional MRI analyses with great flexibility. It has a web interface that facilitates access by researchers, and it is open source, based on Python and JavaScript.
Figure 4.10: Neuroimaging, Reverse inference. A, when using Neurosynth for a traditional meta-analysis. B, when using Neurosynth in reverse inference mode. C, The same frontal areas that identified in a traditional meta-analysis for. D, In contrast, reverse inference for faces no longer identifies frontal cortex activity but rather locates the activation predicting highest probability for face percepts in the right FFA
4.1.1.11 MedInria
MedInria is a multi-platform medical image processing and visualization software. It is free and open source. Through an intuitive user interface, medInria offers standard to cutting-edge processing functionalities for medical images, such as 2D/3D/4D image visualization, image registration, diffusion MR processing and tractography.
Critical analysis: MedInria is another very flexible, open-source development that can be installed on various operating systems. It is oriented to the visualization of images and their treatment in terms of segmentation, registration, filtering and file management. It is not specifically specialized in the characterization of shape and volume, which is our concern.
Figure 4.11: MedInria registration of two images
4.1.1.12 Invizian
Invizian lets users fly through and interact with hundreds of human brains to compare structural differences or carefully inspect individual specimens. It makes it possible to display and interact with hundreds of neuroimaging data sets at once on a personal computer, bringing together brain image data from some of the world’s best neuroscience research teams. Invizian empowers both researchers and students of neuroscience to explore and understand the human brain using a simple and powerful user interface for neuroimaging data exploration and discovery.
In this interface, cortical surfaces specific to each brain are positioned such that data sets whose neuroanatomy is most similar lie closest to one another, whereas brains with the most different cortical anatomy are positioned furthest away from one another. This creates a “brain cloud” based on their neuroanatomical similarity. Users may use their mouse to navigate through the space and then perform metadata searches highlighting specific brains. Any brain may be clicked to provide interaction with a high-resolution version of the surface. Brains may be color-coded according to specific attributes or regional metrics. Brains may be grouped and systematically compared using a variety of data mining tools.
Critical analysis: Invizian has the great advantage of allowing the multivariate analysis of a large number of subjects by means of data mining techniques, which is a gap we wanted to address. It allows observing the whole set of subjects in a single view and selecting a specific one for detailed analysis, which is consistent with Shneiderman's Visual Information-Seeking Mantra. Another advantage is that it can take time into account in the analyses, showing the evolution of the subjects and making it possible to link that evolution with intervention variables. The only flaw worth mentioning is that it does not offer a web option; even so, it is one of the tools we found with the most features oriented to our objective.
Figure 4.12: Invizian t-SNE plot color-coded by age. The t-distributed stochastic neighbor embedding (t-SNE) algorithm reduces the number of dimensions for each scan in the input data set to two while preserving the local structure of the data.
4.1.1.13 Bids.neuroimaging.io
Neuroimaging experiments result in complicated data that can be arranged in many different ways. So far there is no consensus on how to organize and share data obtained in neuroimaging experiments. Even two researchers working in the same lab can opt to arrange their data in different ways. The lack of consensus (or of a standard) leads to misunderstandings and to time wasted on rearranging data or rewriting scripts that expect a certain structure. BIDS (Brain Imaging Data Structure) describes a simple and easy-to-adopt way of organizing neuroimaging and behavioral data.
Critical analysis: BIDS is a specification for organizing neuroimaging data in a standard way, so that researchers can share their data in a common, organized form that is easier to understand and that tools can rely on when interacting with the data. Its objective is to reduce the time needed to exploit the data by unifying their structure under a standard. It is an initiative we should join, since it promotes a collaborative and standardized environment that saves time and facilitates collaboration.
Figure 4.13: Example of how the BIDS format organizes folders
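As an illustration of the kind of folder and file naming that BIDS prescribes, the following minimal sketch builds the path of a T1-weighted anatomical image. The function name, the root folder and the subject and session labels are hypothetical; the sketch covers only the anatomical T1w case and is not a complete implementation of the specification.

```python
from pathlib import PurePosixPath

def bids_t1w_path(root, subject, session=None):
    """Build the path of a T1-weighted image following the BIDS naming scheme."""
    folders = [f"sub-{subject}"]
    name = f"sub-{subject}"
    if session is not None:
        # The session level is optional in BIDS.
        folders.append(f"ses-{session}")
        name += f"_ses-{session}"
    return PurePosixPath(root, *folders, "anat", f"{name}_T1w.nii.gz")

# Hypothetical study folder and labels, used only for illustration.
without_session = str(bids_t1w_path("study", "01"))
with_session = str(bids_t1w_path("study", "01", session="pre"))
```

Because every file name encodes the subject, the optional session and the image type, a script written for one BIDS data set can locate the same kind of image in another without modification.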
4.1.1.14 SPHARM
SPHARM is a software application to perform 3-dimensional spherical harmonic analyses of triangular mesh surfaces. Spherical harmonic analyses are a 3D extension of Fourier analyses that generate a 3D mathematical model of an object's surface.
To construct a spherical harmonic model of an object using SPHARM, a triangular mesh representation of the object and a set of landmarks are needed to define the object. The triangular mesh consists of a dense coverage of points (i.e., vertices) on the object's surface and lines connecting the points to form a complete set of triangles (i.e., faces) that define the surface. The landmarks are used to orient and register a series of objects relative to one another so that they can be compared.
Critical analysis: SPHARM is another of the tools explored in the present project because it was found that it could provide a characterization of brain structures. The characterization is done by describing the three-dimensional shape mathematically through its Fourier components, in the same way that Fourier analysis characterizes a wave by decomposing it into sinusoidal components. In this way, the volume and the surface of the structure are uniquely characterized. This would be equivalent to having one characterization for the volume plus one for the surface.
Figure 4.14: SPHARM Algorithms to automatically quantify the geometric similarity of anatomical surfaces.
4.1.1.15 BrainBrowser
BrainBrowser is an open source JavaScript library exposing a set of web-based 3D visualization tools primarily targeting neuroimaging. Using open web-standard technologies, such as WebGL and HTML5, it allows for real-time manipulation and analysis of 3D imaging data through any modern web browser. The BrainBrowser Surface Viewer is a WebGL-based 3D viewer capable of displaying 3D surfaces in real-time and mapping various sorts of data to them. The BrainBrowser Volume Viewer is an HTML5 canvas-based viewer allowing slice-by-slice traversal of MINC, NIfTI and MGH/FreeSurfer volumetric data.
Critical analysis: BrainBrowser is one of the key tools for this work, since it served as a fundamental guide for the module that renders brain structures realistically in three dimensions. BrainBrowser can display images in different formats through a web interface, which is one of the fundamental requirements of this project. In the end, it was decided to render the volumes with the three.js library after processing the images with Python.
Figure 4.15: Brainbrowser web-enabled brain surface viewer that allows the user to explore in real time a 3D brain map expressed on a base surface. Typically, this map might be a statistical map derived from a group analysis of functional or structural imaging
4.1.1.16 Vtkweb
The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing, and visualization. Its web implementation consists of an ES6 JavaScript class library that can be integrated into any web application. The toolkit leverages WebGL and supports a wide variety of visualization algorithms including scalar, vector, tensor, texture, and volumetric methods. VTK is part of Kitware’s collection of commercially supported open-source platforms for software development.
Critical analysis: Together with BrainBrowser, VTK is another tool for image processing and visualization that makes it possible to render realistic images and interact with them. VTKweb is based on vtk.js, a library for rendering 3D objects on the web that is widely used in highly detailed medical and scientific applications.
Figure 4.16: Visualization Toolkit, cross-platform tool VTK view
4.1.1.17 ANTS
Advanced Normalization Tools (ANTS) is an ITK-based suite of normalization, segmentation and template-building tools for quantitative morphometric analysis. Many of the ANTS registration tools are diffeomorphic, but deformation (elastic and BSpline) transformations are available. Unique components of ANTS include multivariate similarity metrics, landmark guidance, the ability to use label images to guide the mapping and both greedy and space-time optimal implementations of diffeomorphisms. The symmetric normalization (SyN) strategy is a part of the ANTS toolkit as is directly manipulated free form deformation (DMFFD).
Critical analysis: ANTS is a tool comparable to FreeSurfer. It performs many image preprocessing tasks, such as segmentation, registration (to bring images into a common reference) and co-registration (registration between images of the same subject to refer them to a common point). It is the basis for many other tools that build on its preprocessing pipeline and allow the resulting data to be analyzed in detail later.
Figure 4.17: ANTS GUI for image registration and parcellation
4.1.1.18 Mindboggle
The Mindboggle project's mission is to improve the accuracy, precision, and consistency of automated labeling and shape analysis of human brain image data. Mindboggle promotes open science by making all software, data, and documentation freely and openly available.
Mindboggle’s open source brain morphometry platform takes in preprocessed T1-weighted MRI data, and outputs volume, surface, and tabular data containing label, feature, and shape information for further analysis. The software runs on Linux and is written in Python 3 and Python-wrapped C++ code called within a Nipype pipeline framework.
Critical analysis: Mindboggle was one of the tools considered for the present project because of its ability to extract volume, shape and surface information from the output of other preprocessing pipelines (something that the present project does with its own development). This information, extracted from FreeSurfer and/or ANTs, makes it possible to contrast the two options and to correct one or the other manually. Mindboggle also generates realistic interactive visualizations, as well as statistical visualizations of the group of subjects studied. In the end, this tool was not adopted, in order to keep greater control of the data in this project.
Figure 4.18: Mindboggle web GUI, named ROYGBIV
4.1.1.19 Discussion of the state of the art
A large number of platforms are available for what concerns us, namely the analysis of morphological brain images together with information related to the subjects studied. Many of them are powerful and very competent within their specialty. However, a tool that brings both things together is not so easy to find.
It is highly desirable to have a tool that allows several of them to be integrated and that also allows the categorical information of the Kangaroo method subjects to be related to the image data in a flexible and friendly way, so that it can continue to evolve according to the needs of the experts.
4.1.2 Available Information for this project
4.1.2.1 Data Set
The available data comes from the exams performed on patients of the Canguro program. Brain images are available both at rest and during the performance of different tasks proposed for the functional study of the subjects’ brains. These images had already been preprocessed in FreeSurfer for analysis in the Braviz 1 project. Demographic information about the subjects is also available.
This data is held in different formats:
• 3D images in .mgz format, where the meaning of the voxels varies according to the processing stage. For example, the voxel values can result from a first stage of intensity normalization of the original image. They can also be the product of segmentation, in which case each value corresponds to a brain structure; that is, a certain voxel value represents a certain morphological structure.
• Plain text files produced by the FreeSurfer preprocessing, which contain the volumetric, surface and shape data, such as the different types of curvature that FreeSurfer generates for the brain's internal structures.
• Table files in Excel format with the demographic data of the subjects.
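To make the meaning of a segmented image concrete, the following minimal sketch counts the voxels that carry each label and converts the counts into volumes. It works on a small nested list rather than on a real .mgz file; the function name, the label values 17 and 53, the background label 0 and the voxel size of 1 mm^3 are assumptions chosen only for illustration.

```python
from collections import Counter

def structure_volumes(label_volume, voxel_volume_mm3=1.0):
    """Return {label: volume in mm^3} for a 3D nested list of integer labels."""
    counts = Counter(
        label
        for plane in label_volume
        for row in plane
        for label in row
        if label != 0  # label 0 is assumed to mean background
    )
    return {label: n * voxel_volume_mm3 for label, n in counts.items()}

# Toy 2 x 2 x 2 segmentation with two hypothetical structure labels.
toy_segmentation = [
    [[0, 17], [17, 17]],
    [[0, 53], [53, 0]],
]
volumes = structure_volumes(toy_segmentation, voxel_volume_mm3=1.0)
```

In a real segmented image the same counting would be applied to the voxel array read from the file, with the voxel size taken from the image header.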
4.1.2.2 Preprocessed data
As mentioned in 4.1.2.1, the data used in the project is taken after the FreeSurfer preprocessing phase, since we want to take advantage of the extensive analysis of shape and volume that FreeSurfer performs.
FreeSurfer generates a large amount of information, much of which goes unused because in many cases it is redundant and in others the interactive FreeSurfer tool is not as friendly as required.
For this reason, it is important to provide a tool with the flexibility and simplicity to give experts an overview of all the available information and to let them select the variables they want to analyze in detail.
4.1.3 What is missing
Today we already have tools that generate a good amount of information about the shape and volume of the morphological structures and, indirectly, their content. This is the case of FreeSurfer, which produces an extensive set of calculations related to shape and volume. What is not so easy to find is a web tool that gathers all this information, facilitates its analysis, and can also focus on the parameters studied in shape and content.
4.2 Design proposal
The design takes into consideration the data science lifecycle, a continuous cycle of feedback phases whose final objective is to solve a business problem. In our case, the problem is to provide the experts with knowledge about which factors of the Canguro program have the greatest impact on the patients’ brains.
4.2.1 Data framework detail
To obtain the greatest benefit from the proposed tools, the following data model is proposed. As stated above, it revolves around the analysis of the morphometric characteristics (volume, content and shape of the brain's internal structures) of the study subjects. The model structures the data as follows:
Data variable categories:
Subject initial state variables: describe the initial state of the subjects before the analysis
Subject intervention variables: indicate some type of intervention on the subjects
Subject anatomic result variables: volumes, content and shape of internal brain structures
Subject performance variables: describe the performance of the subject after the intervention period
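The four categories above can be pictured as a simple record per subject. The following sketch is only one possible representation; the class name, the field names and all the example variables and values are hypothetical and do not come from the Canguro data.

```python
from dataclasses import dataclass, field

@dataclass
class SubjectRecord:
    """One subject, with its variables grouped by the four proposed categories."""
    subject_id: str
    initial_state: dict = field(default_factory=dict)
    intervention: dict = field(default_factory=dict)
    anatomic_result: dict = field(default_factory=dict)
    performance: dict = field(default_factory=dict)

    def features(self):
        """Flatten all categories into one {category.name: value} mapping."""
        flat = {}
        for category in ("initial_state", "intervention", "anatomic_result", "performance"):
            for name, value in getattr(self, category).items():
                flat[f"{category}.{name}"] = value
        return flat

# Hypothetical subject with made-up variables, for illustration only.
record = SubjectRecord(
    subject_id="S001",
    initial_state={"gestational_age_weeks": 32},
    intervention={"kangaroo_care": True},
    anatomic_result={"structure_volume_mm3": 3500.0},
    performance={"test_score": 100},
)
flat = record.features()
```

Keeping the category as a prefix of each flattened feature name makes it possible to build one table of subjects against features while still knowing which category each column belongs to.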
The following diagram represents the relationship between the different categories of variables and how they represent a state over time of the subjects. This scheme serves to understand how these variables describe the evolution of the subjects, showing the response to the initial and intervention variables.
Figure 4.19: Model of the Kangaroo study variables, which catalogs them for easier understanding.
The previous data model is framed in a visual analytics framework that leverages computer visualization, defined by Tamara Munzner [2] as follows: “computer-based visualization systems provide visual representations of datasets intended to help people carry out some task more effectively”.
Figure 4.20: Visual analytics framework proposed by Tamara Munzner that guides this project
According to the framework proposed by Munzner, the following are the basic premises we have to consider to fulfill the above definition in the present project.
• Human in the loop needs the details: Experts need to get information about the volume, shape and content of the subjects' brain structures
• External representation: perception vs cognition: Choose the best channel to provide the information to experts
• Intended task: Determine the possible contribution of proposed variables in the characterization of subjects
• Measurable definitions of effectiveness: Validation of proposed tools against some known case
The interpretation of the data (volume, surface and content of the brain's internal structures) will be carried out by the experts based on the visual information that is delivered to them. The following diagram shows the three different representations of data proposed in this project:
Figure 4.21: Scheme of the proposed tools for this project
The information of volume, surface and content of internal brain structures is then represented as follows:
1. Heat map view that represents the values of the variables and shows where the largest or smallest values of each variable are. The proposed map plots the subjects against their proposed features. It allows clusters to be observed based on the cosine distance between the full feature vectors or subject vectors. Statistical information that quantifies cluster membership is also provided.
2. Numerical information that complements the quantification carried out with the heat map is provided in a second screen. In addition, a histogram shows the distribution of the content of any brain structure for several subjects so that they can be compared.
3. Realistic volumetric representation of the analyzed structure where one or more subjects can be superimposed for comparison by the experts in a third screen.
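The histogram mentioned in the second view can be sketched as follows, assuming that the content of a structure is represented by the intensity values of the voxels inside it. The function names, the number of bins and the sample values are hypothetical; normalizing the counts is one way to compare subjects whose structures have different sizes.

```python
def intensity_histogram(values, n_bins, value_range):
    """Count how many values fall into each of `n_bins` equal-width bins."""
    low, high = value_range
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:
            # The upper edge of the range is placed in the last bin.
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    return counts

def normalized(counts):
    """Convert counts to fractions so that different subjects are comparable."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

# Made-up voxel intensities of one structure in one subject.
sample_counts = intensity_histogram([0, 10, 20, 30, 40], n_bins=4, value_range=(0, 40))
sample_fractions = normalized(sample_counts)
```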
The tool was built following Shneiderman's Visual Information-Seeking Mantra [1], which summarizes many visual design guidelines and provides an excellent framework for designing information visualization applications. The mantra dictates the following steps:
1. Overview: Have a complete overview of the characteristics of all study subjects
2. Zoom and Filter: Being able to filter an individual or group of subjects to see more detail of the rest
3. Details on demand: Ability to see the detail of a single subject or a few.
The design allows taking advantage of the visual design guidelines and provides a common and consistent framework to carry out the research process by the experts.
Figure 4.22: Scheme of the proposed tools for this project under the Visual Information-Seeking Mantra of Shneiderman
At the overview level, a large number of subjects and their morphometric characteristics can be observed in a single view, which makes it possible to detect outliers, clusters, and the lowest or highest values. To this end, the data are passed through a standardizing function that maps them to the same range, [-10, 10], regardless of their real values. The function preserves the relative behavior of the data while placing them in a common range so that they can be analyzed by the expert's eye.
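A minimal sketch of such a standardizing function is given here. It assumes a linear min-max rescaling applied to one variable at a time; the actual function used in the project may differ, and the function name and the sample values are hypothetical.

```python
def standardize(values, low=-10.0, high=10.0):
    """Linearly rescale `values` so that min -> low and max -> high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant variable carries no contrast; place it mid-range.
        return [(low + high) / 2.0 for _ in values]
    scale = (high - low) / (v_max - v_min)
    return [low + (v - v_min) * scale for v in values]

# Made-up values of one feature across three subjects.
scaled = standardize([100.0, 150.0, 200.0])
```

A linear mapping keeps the ordering and the relative spacing of the values, which matches the requirement that the standardized data imitate the behavior of the original data within the common range.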
The overview contains the following views:
Subjects-Features: Allows grouping subjects with similar or opposite characteristics according to their features
Features-Features: Allows grouping features with similar or opposite characteristics (that is, features that provide similar or opposite information, for dimension reduction)
Subjects-Subjects: Allows focusing on subjects to analyze those that have similar or opposite characteristics
All the previous views can be grouped into clusters using the cosine distance of the vectors resulting from considering the values of the variables in the rows or columns, which generates data groupings both vertically and horizontally according to the need of the experts.
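The cosine distance used for this grouping can be sketched as follows. The function names and the toy vectors are assumptions for illustration; the sketch computes only the pairwise distances and leaves the clustering step itself to whatever algorithm consumes them.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); 0 = same direction, 2 = opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)

def distance_matrix(rows):
    """Pairwise cosine distances between the rows (subjects or features)."""
    return [[cosine_distance(a, b) for b in rows] for a in rows]

# Toy standardized feature vectors for three hypothetical subjects.
subjects = [[1.0, 2.0], [2.0, 4.0], [-1.0, -2.0]]
distances = distance_matrix(subjects)
```

Because the measure depends only on the direction of the vectors, two subjects whose features are proportional are at distance 0, while subjects with opposite profiles are at distance 2, which is what allows both similar and opposite characteristics to be detected. Applying the same function to the columns instead of the rows gives the Features-Features view.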
The zoom and filter feature makes it possible to filter subjects or features in the cluster view. The user can select those to be analyzed in detail and still apply the clustering criterion to observe similarities, differences and outliers.
For details on demand, two screens were created that allow selecting specific subjects and brain structures and viewing their volume and content in detail by means of the statistical characterization of their components. It is possible to compare two or more subjects and see whether they resemble each other in terms of their "center of mass" characteristic.
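One possible reading of the "center of mass" characteristic is the geometric centroid of the voxels that belong to a structure. The sketch below follows that assumption; the function name, the label value and the two toy volumes are hypothetical, and the coordinates are expressed in voxel indices rather than in millimeters.

```python
def center_of_mass(label_volume, label):
    """Mean (i, j, k) voxel index of all voxels equal to `label`."""
    count = 0
    sums = [0.0, 0.0, 0.0]
    for i, plane in enumerate(label_volume):
        for j, row in enumerate(plane):
            for k, value in enumerate(row):
                if value == label:
                    count += 1
                    sums[0] += i
                    sums[1] += j
                    sums[2] += k
    if count == 0:
        raise ValueError(f"label {label} not found in the volume")
    return tuple(s / count for s in sums)

# Two toy 2 x 2 x 2 segmentations of hypothetical subjects.
subject_a = [[[1, 1], [0, 0]], [[1, 1], [0, 0]]]
subject_b = [[[0, 0], [1, 1]], [[0, 0], [1, 1]]]
com_a = center_of_mass(subject_a, label=1)
com_b = center_of_mass(subject_b, label=1)
```

Comparing the resulting coordinates between subjects is meaningful only if their images share a common reference space.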
Finally, it is possible to see the realistic representation of the selected structures in the subjects chosen for a volumetric analysis by the experts.
The following diagram shows the independent processes that were developed to integrate the data so that everything fits the framework and model already described.
Figure 4.23: Block diagram of the components of this project
The data coming from the processing pipeline arrive in very diverse formats, such as unstructured text files, semi-structured text files, segmented images or raw images. All these data sources go to the data parsing module (1), which locates the data of interest and sends them along two routes. The first route standardizes the data (2) for later processing (3) and visual display in the heat map (4). The second route computes new descriptive variables (5), which are then transformed (6) into the types and ranges expected by the visualization tools already described, so that the numerical detail of those descriptive variables (7) and the realistic views of the volumes in question (8) can be shown.
Figure 4.24: Standardization function to adapt the data to the visualization
The following image represents the view of the variables categorized according to the established model and their visual relationship in the tools proposed for the analysis.
Figure 4.25: Data model and visualization tools proposed
The tools proposed under Munzner's guidelines are associated with the established model of variables, so that the three proposed views show the possible relationships between the variables and allow the experts to interact with them and take full advantage of them.
The following figure shows the general interaction model that experts have with the variables and particularly with the features of interest of this project:
Figure 4.26: Scheme that shows experts interacting with the data model and visualization tools
Beyond the central task proposed in this project, the proposed tools can be used more generally to solve research questions that the experts raise. A case of the Kangaroo Mother Care (KMC) Method will be used to test with experts the ability of the tool to generate insights.
The tool will then be a support in the scientific cycle of data as shown in the figure above.
Chapter 5
Implementation
Version 2 of Braviz follows the same philosophy as Braviz 1 in that it focuses on the user as the protagonist of the scientific process, and it therefore provides additional tools that facilitate user access. The first premise is that it should be a web tool that the user can access from any machine without installing any software component, which reduces setup time and eliminates startup problems. Second, the philosophy of modular architecture was maintained; it facilitates collaboration in joint development by establishing a main framework that communicates with the modules through defined interfaces.
A very important feature that Braviz 2 had to address was providing tools for comparison between subjects, so that patterns could be found that allow subjects to be grouped according to the characteristics analyzed.
5.1 Description
The content and volume analysis module consists of a 3-layer architecture: a GUI, an adaptation layer, and a data extraction layer:
GUI: This layer allows easy interaction between the user and the application. It shows the different options available to the user, as well as the reports generated for the subjects and their basic individual properties. It also provides a tool to analyze the relevant statistical measures of the sample and of the individual subjects.
Adaptation layer: Written in Python, this layer adapts the consolidated data so that it can be displayed appropriately. It transforms the data to uniform ranges so that they can be compared. It also performs the necessary processing on the data, either when the user requests it or by precalculating it in other cases, and it performs data checking and cleaning.
Data extraction layer: Transforms the data of interest into the appropriate format for further processing and visualization. Most of the data are 3D images; other data come from plain text files produced by the FreeSurfer preprocessing.
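As an illustration of the data extraction layer, the following sketch parses a plain text statistics table into a list of records. It assumes a layout in the style of FreeSurfer's segmentation statistics files, where comment lines start with "#" and one comment line beginning with "ColHeaders" names the columns. The function name is hypothetical, and the sample text and its numbers are made up for the example.

```python
def parse_stats_table(text):
    """Parse a '#'-commented, whitespace-separated stats table into dicts."""
    columns = []
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line[1:].split()
            # The column names are assumed to follow a 'ColHeaders' marker.
            if fields and fields[0] == "ColHeaders":
                columns = fields[1:]
            continue
        rows.append(dict(zip(columns, line.split())))
    return rows

# Made-up sample in the assumed layout; the values are not real measurements.
sample = """
# Title Segmentation Statistics
# ColHeaders Index SegId NVoxels Volume_mm3 StructName normMean
  1  17  3500  3500.0  Left-Hippocampus   60.0
  2  53  3600  3600.0  Right-Hippocampus  61.0
"""
rows = parse_stats_table(sample)
```

All values are returned as strings; converting the numeric columns and checking for missing entries would belong to the adaptation layer described above.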
5.2 Dependencies
Several Python libraries were required throughout the development. Most of them are well-known libraries for data science in Python, although some are not included in the standard installation. These dependencies are:
NumPy: Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays.
Nipy: Python project for the analysis of structural and functional neuroimaging data.
Nibabel: Package that provides read +/- write access to some common medical and neuroimaging file formats. It can read and write FreeSurfer geometry, annotation and morphometry files.
Pandas: Software library written for the Python programming language for data manipulation and analysis.
5.3 Studied Parameters
The central parameters of our analysis for the present project are:
Shape and volume of the brain's internal structures
Content of the brain's internal structures
As noted in section 4.1.3, much of the information related to these characteristics can be obtained from the pre-processing done by FreeSurfer. In the case of content, the information is available within the FreeSurfer output files, but further processing is required to obtain information that is valuable to users. A brief description of these parameters follows.
5.3.1.1 Relevance in shape curvature
Human intellect is thought to be a result of the disproportionately large cortical surface area to whole brain volume compared to other species. This increase in cortical surface area is achieved by increased gyral folding. Interestingly this increase in cortical surface area does not come with similar increases in cortical thickness. In fact, the several orders of magnitude increase in the human gray matter surface area compared to mice and monkey is achieved with barely a doubling in the cortical thickness. Thus, it appears that cortical surface area to brain volume and cortical folding are important parameters of cognitive development and ability.
If we can quantify the development of gyral folding, we may improve our ability to characterize normal trajectories of brain development and detect deviation with disease. ([39] R. Pienaar et al)
5.3.1.2 Relevance in volume of brain structures
Neurodegenerative disorders, psychiatric disorders, and healthy aging are all frequently associated with structural changes in the brain. These changes can cause alterations in the imaging properties of brain tissue, as well as changes in morphometric properties of brain structures. Morphometric changes may include variations in the volume or shape of subcortical regions, as well as alterations in the thickness, area, and folding pattern of the cortex. While surface-based analyses that depend on models of the position and orientation of the cortical ribbon can provide an accurate assessment of cortical variability, volumetric techniques are required to detect changes in non-cortical structures. For example, changes in ventricular or hippocampal volume are frequently associated with a variety of diseases. This type of analysis has commonly been accomplished by having a trained anatomist or technician manually label some or all of the structures in the brain, a procedure that can take up to a week for high-resolution images. ([40] B. Fischl et al.)
5.3.2 Volume and content
5.3.2.1 Basic definitions
The concept of volume takes its name from the Latin word volūmen and describes the size of an object. More precisely, the term identifies the physical magnitude that expresses the extent of a body in three dimensions (height, length and width). In the International System of Units, the corresponding unit is the cubic meter (m^3).
At the content level, the analysis considers the elements that make up this volume: the voxels are counted, together with their intensity values, which represent the type of brain tissue in that small volume (for example, the amount of water or fat tissue at that point). This set of voxels constitutes a population whose fundamental characteristic is intensity, and characterizing it by statistical methods is essential for comparing different structures.
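The statistical characterization of a voxel population can be sketched as follows. This is a minimal example, not the thesis implementation: it assumes that the intensity image and the segmentation labels are available as NumPy arrays of the same shape, and the function name `structure_statistics`, the label id 17 and the toy values are hypothetical.

```python
import numpy as np

def structure_statistics(intensity, labels, label_id):
    """Summarize the intensities of the voxels that belong to one labeled structure."""
    voxels = intensity[labels == label_id]
    if voxels.size == 0:
        raise ValueError(f"label {label_id} not present in the volume")
    return {
        "vol": int(voxels.size),          # voxel count
        "sum": float(voxels.sum()),
        "mean": float(voxels.mean()),
        "median": float(np.median(voxels)),
        "std": float(voxels.std()),       # population standard deviation
        "min": float(voxels.min()),
        "max": float(voxels.max()),
    }

# Toy 2x2x2 volume; label 17 is a hypothetical structure id.
intensity = np.array([[[10, 20], [30, 40]],
                      [[50, 60], [70, 80]]], dtype=float)
labels = np.array([[[17, 17], [0, 0]],
                   [[17, 0], [0, 17]]])
stats = structure_statistics(intensity, labels, 17)
```

The resulting dictionary holds one descriptor per statistic, which can then be compared across structures and subjects.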
5.3.2.2 Volume and content processing
To obtain the volume of the morphological structures, several preprocessing steps are required. Their input is the set of 3D magnetic resonance imaging (MRI) images coming from the scanner.
The data is subjected to the process shown in Figure 5.1, which results in an image of the isolated brain, without surrounding organs and with normalized intensity. The image is then brought into a common reference system by means of rotation, translation and scaling.
Figure 5.1: Schematic of the FreeSurfer pre-processing of 3D MRI images used in the present thesis project to feed the analysis tool.
The image is finally represented as a set of voxels, which are equivalent to building blocks, like stacked bricks that form a larger 3D object. For each voxel we can obtain information such as its coordinates along each axis and its corresponding scalar value.
The "population" of voxels in each image is analyzed in two aspects:
The statistical composition of the voxel "population" is analyzed. It serves as a descriptor of the structure and as a criterion for comparison with other structures in different subjects.
An equivalent to the "center of mass" or "centroid" is calculated, by analogy with physics. It represents the point around which the high voxel values are concentrated. This value is also a descriptor of the structure and helps characterize it for comparison with other subjects.
The centroid also serves as a reference for comparing two or more structures: aligning them on their centroids allows a voxel-to-voxel comparison and the calculation of the intersection and union of their elements.
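The centroid-based comparison can be sketched as follows. This is an illustrative example under stated assumptions: the structures are given as binary masks, the centroid is the unweighted mean of the voxel coordinates rounded to the nearest voxel, and the ratio of intersection to union is reported as an overlap score. The function names and the toy masks are hypothetical.

```python
import numpy as np

def centered_coordinates(mask):
    """Voxel coordinates of a binary mask, shifted so that its centroid is at the origin."""
    if not np.any(mask):
        raise ValueError("mask is empty")
    coords = np.argwhere(mask)
    centroid = np.rint(coords.mean(axis=0)).astype(int)
    return {tuple(c) for c in (coords - centroid).tolist()}

def overlap(mask_a, mask_b):
    """Intersection size, union size and their ratio after centroid alignment."""
    a = centered_coordinates(mask_a)
    b = centered_coordinates(mask_b)
    intersection = len(a & b)
    union = len(a | b)
    return intersection, union, intersection / union

# Toy structures: a 3x3x3 cube and a 3x3x1 slab located elsewhere in the grid.
mask_a = np.zeros((6, 6, 6), dtype=bool)
mask_a[1:4, 1:4, 1:4] = True
mask_b = np.zeros((6, 6, 6), dtype=bool)
mask_b[2:5, 2:5, 3:4] = True
result = overlap(mask_a, mask_b)
```

Because both masks are shifted to a common origin first, the overlap reflects differences in shape and size rather than differences in position within the image.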
5.3.3 Shape Curvature
5.3.3.1 Basic definitions
In mathematics, curvature is any of a number of loosely related concepts in different areas of geometry. Intuitively, curvature is the amount by which a geometric object such as a surface deviates from being a flat plane, or a curve from being straight as in the case of a line, but this is defined in different ways depending on the context. There is a key distinction between extrinsic curvature, which is defined for objects embedded in another space (usually a Euclidean space) in a way that relates to the radius of curvature of circles that touch the object, and intrinsic curvature, which is defined in terms of the lengths of curves within a Riemannian manifold (n-dimensional space).
Figure 5.2: Examples of Euclidean geometry, spherical and hyperbolic or Riemann geometry
Figure 5.3: Conceptual illustration of curvature. The measurement of the "curvature" of a figure can be understood by taking a series of circles that simply "touch" gently each of the undulations of the curve.
The curvature of the figure is proportional to the sum of the inverse radii of each circle. In figure 5.3 [A] and [B], the effect on the curvature when "compressing" the sulcal extent can be seen; [C] and [D], show the effect of higher order "bumps" in the general curvature.
5.3.3.2 Curvature processing
The curvature in general is measured as 1/r, where r is the radius of an inscribed circle. Since mean curvature is the average of the two principal curvatures it has the units of 1/mm. The Gaussian curvature is the product of them, so it is 1/mm^2.
Note that separating “positive” and “negative” curvatures from each other gives a more accurate measure of the total curvature of the shape. A simple numerical mean of the curvatures, found by summing the inverse radii of the osculating circles, might mask the additional buckling that has occurred.
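The following short example illustrates both points numerically. It is a sketch with made-up values: the sphere radius and the sampled signed curvatures are hypothetical, and the helper name `mean_and_gaussian_curvature` is introduced only for illustration.

```python
import numpy as np

def mean_and_gaussian_curvature(k1, k2):
    """Mean (1/mm) and Gaussian (1/mm^2) curvature from the two principal curvatures."""
    return (k1 + k2) / 2.0, k1 * k2

# A sphere of radius 5 mm has both principal curvatures equal to 1/r.
r = 5.0
H, K = mean_and_gaussian_curvature(1.0 / r, 1.0 / r)

# Hypothetical signed curvatures sampled along a folded contour (1/mm).
curvatures = np.array([0.5, -0.5, 0.25, -0.25])
plain_mean = curvatures.mean()                          # folds cancel out
positive_mean = curvatures[curvatures > 0].mean()       # outward bends only
negative_mean = curvatures[curvatures < 0].mean()       # inward bends only
```

Here the plain mean is zero even though the contour is clearly folded, while the separate positive and negative means retain the information about the folding.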
In addition to this average curvature data, FreeSurfer calculates the mean and Gaussian curvatures, based on a mathematical description of the surface covering the entire shape. The mean curvature H of a surface is a measure of curvature that comes from differential geometry and that locally describes the curvature of a surface embedded in some ambient space, e.g. Euclidean space.
5.3.4 “Center of mass”
It is not literally the center of mass of physics; however, it is analogous to the physical concept because it represents the point in space around which the voxel intensity of the morphological image is concentrated. Voxel-based morphometry is a computational approach to neuroanatomy that measures differences in local concentrations of brain tissue through a voxel-wise comparison of multiple brain images.
The 3-dimensional MRI image is built up of units called voxels. Each one represents a tidy cube of brain tissue, a 3D image building block analogous to the 2D pixel of computer screens, televisions or digital cameras. Each voxel can represent a million or so brain cells. Every structure in an image (hippocampus, amygdala, etc.) is actually a cluster of voxels, perhaps tens or hundreds of them.
The pixel intensity in magnetic resonance imaging (MRI) is proportional to the signal intensity of the corresponding voxel. Interpreting signal intensity in MR imaging poses a major problem: there is often no intuitive approach to signal behavior, because signal intensity is a very complicated function of the contrast-determining tissue parameters (proton density, T1 and T2) and the machine parameters TR and TE. For this reason, the terms T1-weighted image, T2-weighted image and proton-density-weighted image were introduced into clinical MR imaging.
Air and bone produce low-intensity, weaker signals with darker images. Fat and marrow produce high-intensity signals with brighter images.
In summary, the "center of mass" parameter measures the site where the brightest areas of the image are concentrated. That is, the areas with a high concentration of marrow and fat in the structure analyzed.
As mentioned in section 3.3.1 (Figure 3.4), the center of mass is calculated with respect to a reference point: the line on the left of that figure, from which the distance "x" is measured. The reference point for this thesis was the center of the array that represents the image, of 256x256x256 voxels. In this project, the distances from this reference center are measured and, for each axis, the mass of the slice at each distance (equivalent to the sum of its voxels) is calculated, as shown in the following figure.
Figure 5.4: Illustration of the way in which the center of mass is calculated for the structures of the present thesis.
The calculation is made separately for each axis. The result is a distance from the reference point along each axis, and together these give a point in three-dimensional space.
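The per-axis calculation can be sketched as an intensity-weighted average of slice distances. This is a minimal illustration rather than the thesis code: it assumes the standard weighted-mean formula and takes the array center as (size - 1) / 2 along each axis, which is one possible convention. The function name and the toy volume are hypothetical.

```python
import numpy as np

def center_of_mass(volume):
    """Intensity-weighted center of mass, measured from the center of the array."""
    volume = np.asarray(volume, dtype=float)
    total = volume.sum()
    if total == 0:
        raise ValueError("volume has no mass")
    result = []
    for axis, size in enumerate(volume.shape):
        # Mass of each slice perpendicular to this axis (sum of its voxels).
        other_axes = tuple(a for a in range(volume.ndim) if a != axis)
        slice_mass = volume.sum(axis=other_axes)
        # Signed distance of each slice from the array center.
        distance = np.arange(size) - (size - 1) / 2.0
        result.append(float((distance * slice_mass).sum() / total))
    return tuple(result)

# Toy 5x5x5 volume: one voxel at the center and a heavier one two slices away.
volume = np.zeros((5, 5, 5))
volume[2, 2, 2] = 1.0
volume[4, 2, 2] = 3.0
com = center_of_mass(volume)
```

In this toy case the center of mass is pulled along the first axis towards the heavier voxel and stays at the reference point along the other two axes.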
Chapter 6
Use cases
The following two use cases illustrate how the tool is used and the investigative process it supports.
USE CASE-01: Determine if a variable can be used as an anomaly detector for the subject
Version: 1,0,0
Dependencies: No dependencies
Precondition: The expert or analyst has the hypothesis that a certain variable can be used as a subject anomaly detector.
Description: The following procedure is executed once the previous hypothesis has been formulated, in order to decide whether to accept it.
Normal sequence (step, action):
1. Load the variable in the heat map for all the subjects. If clearly visible outliers are perceived, this variable can be considered a possible anomaly detector for the subjects within one or several chosen brain structures.
2. Choose a group of variables that you want to analyze under this concept and load them into the heat map.
3. Choose those variables that have the smallest variances and submit them to step 1 to check them by this procedure.
Post condition: There is a list of variables that have passed the test, which tells us that they are variables with relatively common values for most subjects. For subjects where this value is out of the ordinary, the reason should be analyzed.
Exceptions (step, action): 1. NA; 2. NA; 3. NA
Comments: The variables determined with this procedure can indicate outliers. Although the reason for these outliers is not known, they can point to certain subjects that are worth analyzing in more detail.
The use case above is illustrated with the following example:
Step 1
Load the variable in the heat map for all the subjects. If clearly visible outliers are perceived, this variable can be considered as a possible anomaly detector for the subjects within one or several chosen brain structures
The variable loaded in this case is the relative center of mass for the right and left caudate structures of 242 patients. Outliers are clearly visible in red towards the right side of the figure, so the variable can be considered one that allows anomalous subjects to be detected.
Figure 6.1: heat map for the variable observation use case
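In the tool, outliers are identified visually on the heat map. As a complementary, purely illustrative sketch, the same idea can be expressed numerically with the interquartile range (IQR) rule. This rule is an assumption introduced here, not the method used in the thesis, and the function name and sample values are hypothetical.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return np.flatnonzero(mask)

# Hypothetical relative center-of-mass values for eight subjects.
dcx = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 4.0]
flagged = iqr_outliers(dcx)
```

The flagged indices correspond to the subjects that would stand out in the heat map and deserve a closer look.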
Step 2
Choose a group of variables that you want to analyze under this concept and load them into the heat map
The following variables have been selected to analyze for the hippocampus, amygdala, caudate and putamen for the 242 subjects:
cX: Center of mass in x
cY: Center of mass in y
cZ: Center of mass in z
dcX: Relative center of mass in x
dcY: Relative center of mass in y
dcZ: Relative center of mass in z
disC: Relative center of mass
max: Maximum voxel intensity for the structure
mean: Mean voxel intensity for the structure
median: Median voxel intensity for the structure
min: Minimum voxel intensity for the structure
std: Standard deviation of voxel intensity for the structure
sum: Sum of voxel intensities for the structure
vol: Voxel count for the structure
Figure 6.2: heat map for the variable overview observation use case
Step 3
Choose those variables that have the smallest variances and submit them to step 1 to check by this procedure.
As can be seen, the variables with the lowest variance are those found in the lower part of the heat map, once they are sorted by this criterion, using the controls on the right side.
Figure 6.3: heat map for the variable overview observation use case, sorting variables by variance
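The sorting by variance done with the heat map controls can be sketched as follows. This is an illustration only: it assumes a matrix with variables in rows and subjects in columns whose values have already been scaled to a common range, and the function name and sample values are hypothetical.

```python
import numpy as np

def sort_by_variance(names, matrix):
    """Return (name, variance) pairs ordered from smallest to largest variance."""
    matrix = np.asarray(matrix, dtype=float)
    variances = matrix.var(axis=1)              # variance of each variable across subjects
    order = np.argsort(variances, kind="stable")
    return [(names[i], float(variances[i])) for i in order]

# Rows are variables, columns are subjects (hypothetical scaled values).
names = ["vol", "mean", "dcX"]
matrix = [[0.1, 0.9, 0.5, 0.3],
          [0.4, 0.5, 0.6, 0.5],
          [0.5, 0.5, 0.5, 0.5]]
ranked = sort_by_variance(names, matrix)
```

The variables at the start of the ranking are the candidates to be submitted to step 1.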
USE CASE-02: Exploration of the volume, shape and content variables of the brain's internal structures
Version: 1,0,0
Dependencies: No dependencies
Precondition: The expert or analyst aims to perform an exploratory analysis of the variables, which will show whether there are trends, relationships or patterns. The pre-calculations for the subjects under study must be done beforehand.
Description: The following procedure aims to perform an exploratory analysis of the variables related to volume, shape and content of brain structures.
Normal sequence (step, action):
1. Load all the variables (volume, shape and content) of the Destrieux Atlas structures in the heat map for all the subjects.
2. Sort these variables by the different available criteria, such as cluster, any of the variables, sum, variance, or any criterion that the analyst or expert wants to try according to their knowledge and/or intuition.
3. If the analyst or expert wants to go into detail and compare one or more subjects, move on to the comparison of volumes of the brain's internal structures, which allows selecting the subjects and the structures to compare.
Post condition: The analyst or expert confirms or rejects hypotheses or suspicions. If it is not possible to confirm them in a first inspection, the analysis should focus further on the variables or subjects of interest.
Exceptions (step, action): 1. NA; 2. NA; 3. NA
Comments: After the analysis, the expert or analyst may want to filter several variables or subjects to focus on some of them according to their initial vision, intuition or knowledge. The process can then be restarted from step 1 with these new conditions.
Step 1
Load all the variables, volume, shape and content of the brains internal structures, (Destrieux Atlas structures), in heat map for all the subjects.
The following figure shows the analysis matrix of 176 subjects in terms of geometric parameters (volume and surface areas) of white matter. Red represents high values and blue represents low values.
Figure 6.4: heat map for the variable overview observation use case, sorting variables by clusters
In the figure above, two large clusters, group 1 and group 2, are observed. The two differ mostly in one geometric feature, the volume (Nvoxels) of the different structures. As can be seen below, there is no evidence of differences between hemispheres within the subjects that would allow us to conclude that one hemisphere differs from the other in this geometric aspect.
Step 2
Sort these variables by the different available criteria such as: cluster, any of the variables, sum, variance, or the criteria that the analyst or expert wants to experience according to their knowledge and / or intuition.
Figure 6.5: heat map for the variable overview observation use case, sorting variables by cluster, presenting additional variables in upper and left
Group 1 with smaller volumes is mostly made up of female subjects, while group 2 with larger volumes is mostly made up of male subjects.
Step 3
If the analyst or expert wants to go into detail and compare one or more subjects, move on to the comparison of volumes that allows you to select the analysis subjects and the structures that you want to compare.
At this point, the analyst can compare the volumes, intersection and internal content of two subjects, one from group 1 and one from group 2.
Note: at this time, the image data must be placed on the same server in order to perform these calculations.
Figure 6.6: Volume analysis module of the tools worked on this project
The previous figure shows the volume comparison for the hippocampus of subjects 8 and 940. A larger volume is observed in said structure for subject 8 with respect to 940.
Subject 8 has higher values for the mean, sum, median and standard deviation. Physically, this means that subject 8 has a greater volume and that its content, in terms of voxel intensity, is greater in all statistical parameters. The medical significance must still be analyzed and discussed with experts.
The first use case concerns the profiling of the variables that describe the subjects. This procedure can be performed whenever you want to quickly determine whether a variable can serve as a subject outlier identifier. It is a quick method for collecting variables that have common values for most subjects and vary markedly for only a few, that is, for detecting outliers that can be examined in more detail afterwards.
The second use case is a general way to test the results of the tool and contrast them with already established and tested results. This allows validating the observations and finding other possible additional results. Additionally, this use case serves to mark the general procedure of performing an analytical exploratory visualization of the data.
6.1 Application possibilities
Knowledge generation is a non-trivial cyclical process. Applications that generate knowledge through the data science process must scale in a way that facilitates this series of cycles, modifying the variables and characteristics according to need, over and over again, until the information that contributes value to the generation of knowledge has been distilled.
The application is made up of three modules that allow the analysis of specific characteristics of the information relevant to this project:
Clustergrammer: A library tool that allows the variables and their characteristics to be analyzed in the form of a matrix. It shows a large amount of data in condensed form, between 500,000 and 1,000,000 cells, which is suitable for the more than 170 subjects of the Canguro program and a similar number of control subjects. Regarding the characteristics, it was possible to obtain up to 1900 features related to the shape and volume of the brain structures of the subjects. The tool shows the clusters of features and/or variables, using previously calculated cosine distances to arrange the variables in either columns or rows.
Volume Compare: A feature developed by the author to compare two or more brain structures of the subjects. It calculates the center of mass of each structure and places the structures on this common center to calculate their intersection, which is one way to characterize two or more subjects. It also shows statistics of the contents, such as the histogram of voxel values and their maximum, minimum, mean, median and standard deviation.
3D: A tool for visualizing the selected structures of the subjects chosen in the "Volume Compare" tab. The structures are displayed shifted to the center of mass, so that their intersection can be seen visually, along with the protuberances and shapes that may capture the attention of experts.
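The cosine distance mentioned for the clustering view can be sketched as follows. This is a generic illustration of the distance itself, not of Clustergrammer's internal code; the function name and the sample rows are hypothetical.

```python
import numpy as np

def cosine_distance_matrix(rows):
    """Pairwise cosine distance (1 - cosine similarity) between the rows of a matrix."""
    rows = np.asarray(rows, dtype=float)
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("cosine distance is undefined for all-zero rows")
    unit = rows / norms
    return 1.0 - unit @ unit.T

# Three hypothetical feature rows: the first two point in the same direction.
features = np.array([[1.0, 0.0],
                     [2.0, 0.0],
                     [0.0, 3.0]])
dist = cosine_distance_matrix(features)
```

Rows that differ only in magnitude have a distance of zero, which is why this distance groups variables by the pattern of their values rather than by their scale.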
For the tools to be more flexible, the information must be loaded dynamically, with the option to search, filter and select as the scientific processing of the data is carried out.
The following are some views of the filtering facilities for Clustergrammer and Volume Compare. They allow subjects and features of the study to be selected, taking as input a text file pre-processed by FreeSurfer and passing it through an adaptation layer that scales the values of the variables to uniform values that can be represented in Clustergrammer.
Figure 6.7: Main screen for filtering variables in the shape and volume analysis cluster tool, with subjects and features loaded (already selected)
Figure 6.8: Output of volume and content analysis screen for two subjects at hippocampal level.
Figure 6.9: histogram for both hippocampi in the selected subjects
Figure 6.10: 3D view for the two hippocampi in the selected subjects
Figure 6.11: 3D view for the two hippocampi in the selected subjects (changed point of view)
6.2 Dimensionality reduction
The features vs. features view not only supports dimensionality reduction but also shows which variables are inversely related. The features that most resemble each other are linked by a positive correlation, shown in red in the similarity matrix. Likewise, inversely related variables are identified by a negative correlation, shown in blue.
The following is the features vs. features matrix for the content characteristics of the structures:
The features are:
"Center of mass" (parameter in x) Cx of the structure
"Center of mass" (parameter in y) Cy of the structure
"Center of mass" (parameter in z) Cz of the structure
Voxel count of the structure
Sum of voxel values of the structure
Average voxel value of the structure
Median voxel value of the structure
Standard deviation of voxel values of the structure
According to the following figure, there are red areas where there is a greater affinity between features, (apart from those on the diagonal), which indicates that these features contain relatively redundant information. On the other hand, there are areas in blue that indicate opposite and complementary information.
Figure 6.12: view of feature against feature matrix for dimensionality reduction
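A features vs. features matrix of this kind can be sketched with a Pearson correlation matrix. This is an assumption made for illustration, since the exact similarity measure behind the view is not restated here, and the sample values are hypothetical.

```python
import numpy as np

# Rows are features, columns are subjects (hypothetical values).
features = np.array([
    [100.0, 120.0, 140.0, 160.0],      # voxel count of a structure
    [5000.0, 6100.0, 6900.0, 8000.0],  # sum of voxel values of the same structure
    [4.0, 3.0, 2.0, 1.0],              # center of mass in x
])

# Entry (i, j) is the correlation between feature i and feature j across subjects.
corr = np.corrcoef(features)
```

Strongly positive entries correspond to the red areas of the view (largely redundant features), while strongly negative entries correspond to the blue areas (inversely related features).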
In the following figure, for example, it is observed that the variables for the sum of voxel values and the individual count of the same voxels generally provide similar information (a similarity of 67%).
Figure 6.13: detailed view of features with positive relationship (in red)
On the other hand, the following graph shows that the value of the "center of mass" of the putamen moves in a positive direction (towards one side), if the volumes of the hippocampus, amygdala and putamen tend to be smaller.
Figure 6.14: detailed view of features with negative relationship (in blue)
6.3 Some Cases observed
The following are example cases with real data, presented to show some insights and the procedure carried out to reach them.
Brain structures analysis
The following figure shows the analysis matrix of 176 subjects in terms of geometric parameters (volume and surface areas) of different brain structures. Red represents high values and blue represents low values.
Figure 6.15: two clusters to be analyzed with the cluster tool of the solution for Braviz 2
In the previous figure, there are two large clusters, group 1 and group 2. The two differ mainly in one geometric feature, the volume (Nvoxels) of the different structures of the brain. As can be seen below, there is no evidence of differences between hemispheres within the subjects that would allow us to conclude that one hemisphere differs from the other in this geometric aspect.
Figure 6.16: two clusters to be analyzed with the cluster tool of the solution for Braviz 2 (Quantification of P value)
In the previous figure it is evident (for P values less than 0.05) that there is a difference between the subjects of groups 1 and 2 in the volume and in the average and minimum values of this volumetric characteristic. The parameter BRITH_Sex is mostly sex 2 where the volume has higher values and mostly sex 1 where the volume is smaller.
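The statistical test behind the reported P values is not specified here. As a hedged illustration of how a group difference can be quantified, the following sketch uses a two-sided permutation test on the difference in group means. The choice of test, the function name and the sample values are all assumptions made for illustration.

```python
import numpy as np

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_permutations):
        # Reassign subjects to groups at random and recompute the difference.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical scaled volumes for two clearly separated groups.
group_1 = [3.1, 3.0, 2.9, 3.2, 3.0, 2.8]
group_2 = [3.9, 4.1, 4.0, 4.2, 3.8, 4.0]
p = permutation_p_value(group_1, group_2)
```

A value of p below 0.05 would be read in the same way as in the figure: the difference between the two groups is unlikely to be due to chance alone.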
Dimensionality reduction
In the following figure it is observed that the features NumVert (number of vertices of the tessellation) and surfArea (surface area of the structure) carry very similar information, so we proceed to remove one of them.
Figure 6.17: Dimension reduction based on behavioral similarity view across all observed subjects
Something similar happens with the variables GausCurv, FoldInd and CurvInd, of which only CurvInd will remain.
Figure 6.18: Dimension reduction based on behavioral similarity view across all observed subjects
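In the thesis, redundant features are identified visually from the similarity view. A numerical counterpart can be sketched as a greedy filter on the correlation matrix. The threshold of 0.95, the greedy strategy, the function name and the sample values are assumptions introduced for illustration; which feature of a redundant pair is kept depends only on the order of the input.

```python
import numpy as np

def drop_redundant(names, matrix, threshold=0.95):
    """Greedily keep features whose |correlation| with every kept feature is <= threshold."""
    corr = np.corrcoef(np.asarray(matrix, dtype=float))
    kept = []
    for i in range(len(names)):
        if all(abs(corr[i, j]) <= threshold for j in kept):
            kept.append(i)
    return [names[i] for i in kept]

# Rows are features, columns are subjects (hypothetical values).
names = ["SurfArea", "NumVert", "CurvInd"]
matrix = [[10.0, 12.0, 14.0, 16.0],
          [101.0, 119.0, 142.0, 160.0],
          [0.3, 0.1, 0.4, 0.2]]
kept = drop_redundant(names, matrix)
```

With these toy values the second feature is dropped because it follows the first almost exactly, while the third is kept because it carries different information.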
The reduction done previously can be corroborated by looking at the lower cluster on the right side where most of the items with feature NumVert, SurfArea and GrayVol reside. Something similar happens with the upper cluster where most curvature features reside.
Figure 6.19: reduction of dimensions based on contribution on the characterization in subjects, based on the pvalue (curvature characteristics)
Figure 6.20: reduction of dimensions based on contribution on the characterization in subjects, based on the pvalue (surface characteristics)
If these variables are eliminated and the columns and rows are flipped, the following figure is obtained, where the male sex shows higher "surface area" values, while the female sex shows lower values.
Figure 6.21: difference between male and female subjects at the level of shape parameters having already made the reduction of dimensionality
Example of Canguro case
The following figures present a comparative view of the volume of the brain's internal structures between control subjects (on the left) and kangaroo subjects (on the right), both ordered by fragility (greater fragility on the left side), for specific structures (amygdala, putamen, caudate and hippocampus).
Canguro Subjects: Premature subjects who participate in the kangaroo method for a period.
Control subjects: Premature subjects who did not participate in the kangaroo method.
Figure 6.22: Comparative view between control subjects, (on the left), and kangaroo subjects, (on the right), both ordered by fragility, (greater fragility on the left side). Fragility is an “initial state variable”
In general, it is observed that the volumes in red (those of greater value) are somewhat more lateralized to the right in the view of the control subjects and more evenly distributed throughout the graph of the kangaroo subjects (see footnote 3). In fact, the values are of greater magnitude in the latter graph, because the red color is more strongly dominant (especially in the caudate). All structures (hippocampus, putamen, caudate, amygdala) look a little more reddish (with greater volume) on the Canguro side, even at high fragilities. Even at such high fragilities, the blue values (lower volumes) are not as low as for the control subjects, since the shade of blue is a little less strong for the Canguro subjects.
Let us compare in detail the most fragile subjects of each of the previous groups (control and Canguro respectively), that is, those at the left end of each of the previous figures.
Feature                              Canguro group   Control group
'Subject ID'                         765             369
'NHPT_AverageMD'                     26.5            20.5
'FOLLOW_Fragility_Rasch_441_2PL'     2.1571          2.7289
'EX41_durPCconcontroles03'           22              0
'location'                           1               2
Footnote 3: The time the baby is carried by the mother, father or another family member is represented by the variable EX41_durPCconcontroles.
Using the volume and content comparison tool we find the following information that allows focusing the analysis. In this case, the caudate of both subjects has been selected to see their internal and volumetric detail
Figure 6.23: Comparative view between the most fragile subjects in the control and Canguro groups. The subject of the control group has 27% lower volume in the Caudate.
The content values for both caudates are still comparable in terms of mean, median and standard deviation, as seen in the previous figure.
The following histogram shows the distribution of content values and their counts, where the difference in caudate volume between the two subjects can be clearly seen.
Figure 6.24: Comparative histogram for the most fragile subjects in the control and Canguro groups. A clear difference in both histograms is evident for those control and Canguro subjects
The difference is evident in the number of “infinitesimal” volumes (voxels) that make up the total structure of the caudate and whose values are close to the mean value, which is similar for both caudates.
Through the tool, it is possible to observe the volumetric structures and compare them in an interactive 3D view, which lets the experts see differences between the studied subjects that are not otherwise evident.
Figure 6.25: Volumetric comparative view between the caudate structures of the most fragile subjects for both Kangaroo and control
This type of view also shows the difference in volumes between the kangaroo and control subjects. Let us continue with some other views.
Figure 6.26: Comparative view between less fragile subjects and more fragile subjects for both control and Canguro. Less fragile have in general greater structure volume and vice versa.
Fragility comparison, greater than and less than 0.5 (left: greater than 0.5; right: less than 0.5), for both control and kangaroo subjects (mixed subjects). On the left, the fragility is the greatest, starting from the value 0.5 (2.7289 to 0.5146); on the right, the fragility is lower (0.4882 to -1.0152). Greater volume values are observed where fragility is lower (right view), and vice versa in the figure on the left.
In the figure on the left, where the fragility is greater, it is observed that the volume of the structures is greater when the carrying time in the Canguro group is longer.
Figure 6.27: Comparative view between Canguro and control subjects ordered by NHPT_promedioMD (average, dominant hand), which is a "subject performance variable".
In the left figure, the Canguro subjects are ordered by the variable NHPT_promedioMD (average, dominant hand), with greater values on the left side of the figure (27 to 14.5). It is observed that the carrying duration variable forms a kind of “clusters”, with peak values and low values between them.
In the right figure, the subjects who were not carried (control subjects) are ordered by the same variable, NHPT_promedioMD, with greater values on the left side of the figure (33 to 13.5).
Both figures show a distribution of volumes in a sort of alternating "clusters" (of larger-than-average and then smaller-than-average size), more clearly visible for the carried subjects. It is evident that those who were not carried show lower volumes (more numerous and more intense blue components) than those who were carried.
This shows the potential of this type of analysis for relating NHPT performance (NHPT_promedioMD) to the most fragile subjects and the kangaroo carrying time, compared with subjects of comparable fragility in the control group.
Chapter 7
Conclusions
The influence of different factors such as the sex of the subject, over his condition of being premature on the surface of brain structures is evidenced. In fact, a clear differentiation is observed between the male and female subjects of the Canguro program in terms of the volume and surface area of brain structures. The previous exercise shows that kangaroo subjects generally have higher volumes in structures, (Amygdala, Putamen, Caudate and hippocampus), than premature subjects who have not undergone the kangaroo method.
Use case 1 showed that the relative center of mass feature of the caudate has the lowest variance among the four structures mentioned above. At the level of content, the caudate is therefore the most uniform of these structures, so an irregularity within it can be detected more easily than in the others.
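To make the comparison of variances concrete, the following Python sketch computes a relative center of mass for one structure and the total variance of that feature across subjects. It is only one plausible definition: it assumes the center of mass is weighted by voxel intensity and expressed relative to the bounding box of the structure. The function names and the toy volume are hypothetical.

```python
import numpy as np


def relative_center_of_mass(intensity, mask):
    """Intensity-weighted center of mass of one structure, per axis in [0, 1].

    0 and 1 correspond to the two faces of the structure's bounding box.
    """
    coords = np.argwhere(mask)                 # (n_voxels, 3) voxel indices
    weights = intensity[mask].astype(float)    # same row-major order as coords
    center = (coords * weights[:, None]).sum(axis=0) / weights.sum()
    lower = coords.min(axis=0)
    extent = coords.max(axis=0) - lower
    extent = np.where(extent == 0, 1, extent)  # avoid dividing by zero on flat axes
    return (center - lower) / extent


def total_variance(centers):
    """Sum over the three axes of the variance across subjects."""
    return float(np.var(np.asarray(centers, dtype=float), axis=0).sum())


# Toy volume for illustration only: a uniform 3x3x3 cube inside a 5x5x5 image.
image = np.ones((5, 5, 5))
cube = np.zeros((5, 5, 5), dtype=bool)
cube[1:4, 1:4, 1:4] = True
center = relative_center_of_mass(image, cube)  # uniform content: the box center
```

Computing `total_variance` over the per-subject centers of each structure gives one number per structure, and the structure with the smallest value is the most uniform in the sense used above.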
The knowledge discovery process must be continuous; otherwise, the work of several of the initial iterations will be lost. The data mining process is enriched with each cycle, and the key lies in very good documentation, publication and collaboration.
7.1 Future Work
The scientific data process is a non-trivial, continuous and cyclic process that evolves over time. For this reason, it is necessary to continue the process in order to refine the model and generate better results. It is very important to work continuously with brain experts so that their feedback refines the types of variables and features required and leads to more actionable knowledge. During this project, many insights were corroborated together with the experts. This validates the tool and motivates continuing with the proposed cycles in the hope of finding additional insights. Several clusters were observed in the data mining exercise carried out together with the neurologist. These clusters need to be associated with the environment variables of the subjects to confirm why and how the brain structures were affected.
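A first step toward associating clusters with environment variables could be a simple contingency table. The sketch below, in Python, computes the share of each category of one variable within each cluster. The cluster labels and the values are toy data for illustration only, and the function name is hypothetical; EX41_mamacargo is used merely as an example of a categorical environment variable.

```python
import pandas as pd


def cluster_profile(labels, variable):
    """Share of each category of `variable` within each cluster (rows sum to 1)."""
    return pd.crosstab(labels, variable, normalize="index")


# Toy values for illustration only; they are not data from the study.
clusters = pd.Series([0, 0, 1, 1, 1, 1], name="cluster")
carried = pd.Series([1, 1, 0, 0, 0, 1], name="EX41_mamacargo")
profile = cluster_profile(clusters, carried)
```

A cluster whose row differs strongly from the overall distribution of the variable is a candidate for closer review with the experts. Such a table only suggests an association and does not by itself explain why the structures were affected.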
The process and several of the techniques and tools used in this project can also be applied to analogous problems in which one wishes to find patterns that characterize volumetric or 2D images. Experts have already proposed applying this research to prostate analysis and to the analysis of retina images.
Much of the preprocessing in this project relied on the FreeSurfer tool. We wanted to take advantage of the work done in Braviz 1, in which all the information on the subjects was already preprocessed and available. FreeSurfer is a highly mature tool that generates a large amount of information which, according to this work, is in many cases redundant and excessive. Another part of the preprocessing was implemented by the author, as in the case of the "centers of mass" and the internal characterizations of the voxels of the structures. It would still be interesting to add further preprocessing sources (such as FSL or Mindboggle) for the parameters of interest of this thesis (shape and content), so that the segmentations and parcellations, which currently come from a single source, can be refined.
It is important to make the tool even more flexible, so that the experts can filter the variables easily according to their needs and explore them freely, generating as many insights as possible.
References
*This references contain information related to different studies at the level of volume and shape in different brain structures which show a relationship between these characteristics and different aspects of the subject such as tumors, diseases or problems in brain development:
***Thisreferences are related to the impact on different brain structures in premature patients. These references guide us about which areas may be more susceptible to premature birth. these references also deal with the relationships between the volume of brain structures and different pathologies in subjects
[1] J. Cabral, M. L. Kringelbach, and G. Deco, “Functional connectivity dynamically evolves on multiple time-scales over a static structural connectome: Models and mechanisms,” Neuroimage, vol. 160, pp. 84–96, Oct. 2017.
[2] C.-S. Chen, S.-Y. Lin, M.-H. Fan, and C.-H. Huang, “A Closed-Form Algorithm for Converting Hilbert Space-Filling Curve Indices *.”
[3] M. K. Chung, “Statistical Methods in Brain Image Analysis with MATLAB.”
[4] M. K. Chung, Statistical and computational methods in brain image analysis. .
[5] S. B. Eickhoff, B. T. T. Yeo, and S. Genon, “Imaging-based parcellations of the human brain,” Nat. Rev. Neurosci., vol. 19, no. 11, pp. 672–686, Nov. 2018.
[6] L. Fan et al., “The Human Brainnetome Atlas: A New Brain Atlas Based on Connectional Architecture,” Cereb. Cortex, vol. 26, no. 8, pp. 3508–3526, Aug. 2016.
[7] B. Fischl and M. I. Sereno, “Microstructural parcellation of the human brain,” Neuroimage, vol. 182, pp. 219–231, Nov. 2018.
[8] K. J. (Karl J. . Friston, J. Ashburner, S. Kiebel, T. Nichols, and W. D. Penny, Statistical parametric mapping : the analysis of funtional brain images. Elsevier/Academic Press, 2007.
91
[9] V. L. Galinsky and L. R. Frank, “Automated segmentation and shape characterization of volumetric data.,” Neuroimage, vol. 92, pp. 156–68, May 2014.
[10] I. García, F. Tutoras, D. Carmen, S. Gotarredona, D. Begoña, and A. Piñero, “Aportaciones a la Segmentación y Caracterización de Imágenes Médicas 3D.”
[11] G. Gerig, M. Styner, D. Jones, D. Weinberger, and J. Lieberman, “Shape analysis of brain ventricles using SPHARM.”
[12] D. Grado and P. R. Román, “ESTAD´ISTICAESTAD´ESTAD´ISTICA DESCRIPTIVA E INTRODUCCI´ONINTRODUCCI´ INTRODUCCI´ON A LA PROBABILIDAD.”
[13] Ian L. Dryden; Kanti V. Mardia, “17 Shape in images - Statistical Shape Analysis, 2nd Edition,” Wiley, 2016. [Online]. Available: https://www.safaribooksonline.com/library/view/statistical-shape- analysis/9780470699621/c17.xhtml. [Accessed: 03-Oct-2018].
[14] Joelle M. Abi-Rached; Nikolas Rose, “Two: The Visible Invisible - Neuro,” Princeton University Press, 2013. [Online]. Available: https://www.safaribooksonline.com/library/view/neuro/9781400846337/11 _Chapter2.xhtml. [Accessed: 03-Oct-2018].
[15] M. Meuschke, P. Berg, R. WickenhöferWickenh, B. Preim, and K. Lawonn, “Visual Analysis of Aneurysm Data using Statistical Graphics.”
[16] R. A. Morey et al., “A comparison of automated segmentation and manual tracing for quantifying hippocampal and amygdala volumes ☆,” Neuroimage.
[17] A. Sherbondy, “Shape Analysis of Fiber Tractography in the Human Brain.”
[18] P. Shilane and T. Funkhouser, “Selecting Distinctive 3D Shape Descriptors for Similarity Retrieval.”
[19] J. W. H. Tangelder and R. C. Veltkamp, “A Survey of Content Based 3D Shape Retrieval Methods.”
[20] C. L. Tardif, A. Schäfer, M. Waehnert, J. Dinse, R. Turner, and P.-L. Bazin, “Multi- contrast multi-scale surface registration for improved alignment of cortical areas,” Neuroimage, vol. 111, pp. 107–122, May 2015.
92
[21] Q. Wang, R. Chen, J. JaJa, Y. Jin, L. E. Hong, and E. H. Herskovits, “Connectivity- Based Brain Parcellation: A Connectivity-Based Atlas for Schizophrenia Research.,” Neuroinformatics, vol. 14, no. 1, pp. 83–97, Jan. 2016.
[22] J. Weissenbock, B. Frohler, E. Groller, J. Kastner, and C. Heinzl, “Dynamic Volume Lines: Visual Comparison of 3D Volumes through Space-filling Curves,” IEEE Trans. Vis. Comput. Graph., pp. 1–1, 2018.
[23] Z. Wu et al., “3D ShapeNets: A Deep Representation for Volumetric Shapes.”
[24] L. Zhang, M. João, D. Fonseca, and A. Ferreira, “União Europeia-Fundos Estruturais Governo da República Portuguesa PROJECTOS DE INVESTIGAÇÃO CIENTÍFICA E DESENVOLVIMENTO TECNOLÓGICO Survey on 3D Shape Descriptors.”
[25] J. Zhuo, “Introduction to Statistical Parametric Mapping.” [Online]. Available: https://www.fil.ion.ucl.ac.uk/spm/doc/intro/#_IV.__Statistical_Parametric Mapping. [Accessed: 04-Oct-2018].
[26] J. Zhuo, Handbook of Medical Image Processing and Analysis. Elsevier, 2009.
[27] “THE NIFTI-1 DATA FORMAT.”
[28] “Probabilidad condicionada - Wikipedia, la enciclopedia libre.” [Online]. Available: https://es.wikipedia.org/wiki/Probabilidad_condicionada. [Accessed: 04-Oct-2018].
[29] “What Is FMRI? - Center for Functional MRI - UC San Diego.” [Online]. Available: http://fmri.ucsd.edu/Research/whatisfmri.html. [Accessed: 21-Nov- 2018].
[30] “Fundación Canguro.” [Online]. Available: http://fundacioncanguro.co/. [Accessed: 21-Nov-2018].
[31] “SPM - Wikibooks, open books for an open world.” [Online]. Available: https://en.wikibooks.org/wiki/SPM. [Accessed: 04-Oct-2018].
[32] D. A. Angulo, C. Schneider, J. H. Oliver, N. Charpak, and J. T. Hernandez, “A Multi-facetted Visual Analytics Tool for Exploratory Analysis of Human Brain and Function Datasets.,” Front. Neuroinform., vol. 10, p. 36, 2016.
93
[33] S. C. Ng, “Principal component analysis to reduce dimension on digital image,” Procedia Comput. Sci., vol. 111, pp. 113–119, Jan. 2017.
[34] T. D. Pham et al., “The hidden-Markov brain: comparison and inference of white matter hyperintensities on magnetic resonance imaging (MRI),” J. Neural Eng., vol. 8, no. 1, p. 016004, Feb. 2011.
[35] G. A. Carmichael, “The Cohort and Period Approaches to Demographic Analysis,” 2016, pp. 85–128.
[36] G. A. Carmichael, Fundamentals of Demographic Analysis: Concepts, Measures and Methods, vol. 38. Cham: Springer International Publishing, 2016.
[37] G. A. Carmichael, “Comparison: Standardization and Decomposition,” Springer, Cham, 2016, pp. 49–84.
[38] M. M. Kazhdan, “SHAPE REPRESENTATIONS AND ALGORITHMS FOR 3D MODEL RETRIEVAL A DISSERTATION PRESENTED TO THE FACULTY OF PRINCETON UNIVERSITY IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY RECOMMENDED FOR ACCEPTANCE,” 2004.
[39] R. Pienaar, B. Fischl, V. Caviness, N. Makris, and P. E. Grant, “A METHODOLOGY FOR ANALYZING CURVATURE IN THE DEVELOPING BRAIN FROM PRETERM TO ADULT.,” Int. J. Imaging Syst. Technol., vol. 18, no. 1, pp. 42–68, Jun. 2008.
[40] B. Fischl et al., “Whole Brain Segmentation: Automated Labeling of Neuroanatomical Structures in the Human Brain,” Neuron, vol. 33, no. 3, pp. 341– 355, Jan. 2002.
[41] “Visual Information-Seeking Mantra - InfoVis:Wiki.” [Online]. Available: https://infovis-wiki.net/wiki/Visual_Information-Seeking_Mantra. [Accessed: 15-Jul-2019].
[42] T. Munzner and E. (Graphic artist) Maguire, Visualization analysis & design. .
[43] ***O. Potvin, A. Mouiha, L. Dieumegarde, and S. Duchesne, “Normative data for subcortical regional volumes over the lifetime of the adult human brain,” Neuroimage, vol. 137, 2016.
94
[44] * J. V. Manjón and P. Coupé, “volBrain: An Online MRI Brain Volumetry System,” Front. Neuroinform., vol. 10, 2016.
[45] *** D. Diaz, J. Villegas, J. A. Guerra-Gomez, N. Charpak, and J. T. Hernández, “Visual tools for the exploration of growth data in a cohort of kangaroo infants during their first year of life,” in 2017 IEEE Workshop on Visual Analytics in Healthcare, VAHC 2017, 2018.
[46] * A. A. Salam, T. Khalil, M. U. Akram, A. Jameel, and I. Basit, “Automated detection of glaucoma using structural and non structural features,” Springerplus, vol. 5, no. 1, 2016.
[47] M. Joseph Nitzken and M. Joseph, “Shape analysis of the human brain. Recommended Citation,” 2015.
[48] *** P. Coupé, G. Catheline, E. Lanuza, J. Manjón, and V. Manjón, “Towards a unified analysis of brain maturation and aging across the entire lifespan: A MRI analysis.”
[49] * S. S. Keller and N. Roberts, “Measurement of brain volume using MRI: software, techniques, choices and prerequisites,” 2009.
[50] * M. Styner, I. Oguz, S. Xu, C. Brechbühler, D. Pantazis, and G. Gerig, “Statistical Shape Analysis of Brain Structures using SPHARM-PDM.”
[51] * M. Styner et al., “Framework for the Statistical Shape Analysis of Brain Structures using SPHARM-PDM.,” Insight J., no. 1071, pp. 242–250, 2006.
[52] * G. Gerig, M. Styner, D. Jones, D. Weinberger, and J. Lieberman, “Shape analysis of brain ventricles using SPHARM,” in Proceedings IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA 2001), pp. 171– 178.
[53] M. E. Brummer, R. M. Mersereau, R. L. Eisner, and R. R. J. Lewine, “Automatic detection of brain contours in MRI data sets,” IEEE Trans. Med. Imaging, vol. 12, no. 2, pp. 153–166, Jun. 1993.
95
[54] * S. Ruan, C. Jaggi, J. Xue, J. Fadili, and D. Bloyet, “Brain tissue classification of magnetic resonance images using partial volume modeling,” IEEE Trans. Med. Imaging, vol. 19, no. 12, pp. 1179–1187, 2000.
[55] * S. Tootoonian, R. Abugharbieh, X. Huang, and M. J. McKeown, “Shape vs. Volume: Invariant Shape Descriptors for 3D Region of Interest Characterization in MRI,” in 3rd IEEE International Symposium on Biomedical Imaging: Macro to Nano, 2006., pp. 754–757.
[56] * B. Singh, N. Marshkole, K. Singh, and A. S. Thoke, “Texture and Shape based Classification of Brain Tumors using Linear Vector Quantization,” 2011.
[57] * G. Sanabria-Diaz et al., “Surface area and cortical thickness descriptors reveal different attributes of the structural human brain networks,” Neuroimage, vol. 50, no. 4, pp. 1497–1510, May 2010.
[58] * U. Castellani et al., “A New Shape Diffusion Descriptor for Brain Classification,” Springer, Berlin, Heidelberg, 2011, pp. 426–433.
[59] *** W. Y. Loh et al., “Neonatal basal ganglia and thalamic volumes: very preterm birth and 7-year neurodevelopmental outcomes.,” Pediatr. Res., vol. 82, no. 6, pp. 970–978, Dec. 2017.
[60] *** N. Charpak et al., “Twenty-year Follow-up of Kangaroo Mother Care Versus Traditional Care,” Pediatrics, vol. 139, no. 1, p. e20162063, Jan. 2017.
[61] *** M. H. Beauchamp et al., “Preterm infant hippocampal volumes correlate with later working memory deficits,” Brain, vol. 131, no. 11, pp. 2986–2994, Jun. 2008.
[62] *** Y. Lao et al., “Thalamic alterations in preterm neonates and their relation to ventral striatum disturbances revealed by a combined shape and pose analysis.,” Brain Struct. Funct., vol. 221, no. 1, pp. 487–506, Jan. 2016.
[63] *** C. Solé-Padullés et al., “Intrinsic connectivity networks from childhood to late adolescence: Effects of age and sex,” Dev. Cogn. Neurosci., vol. 17, pp. 35–44, Feb. 2016.
96
[64] * R. W. Cox, “AFNI: Software for Analysis and Visualization of Functional Magnetic Resonance Neuroimages,” Comput. Biomed. Res., vol. 29, no. 3, pp. 162– 173, Jun. 1996.
[65] *** C. M. Y. Chau et al., “Hippocampus, Amygdala, and Thalamus Volumes in Very Preterm Children at 8 Years: Neonatal Pain and Genetic Variation,” Front. Behav. Neurosci., vol. 13, p. 51, Mar. 2019.
[66] *** A. L. Cismaru et al., “Altered Amygdala Development and Fear Processing in Prematurely Born Infants.,” Front. Neuroanat., vol. 10, p. 55, 2016.
97
Annex A:
Profile description of the variables that the experts consider important for analysis in the tool with kangaroo data.
Variable                          Group
code
BirthDate                         Initial_State
SCB_agemother                     Initial_State
BIRTH_peso5                       Initial_State
BIRTH_talla5                      Initial_State
BIRTH_pc5                         Initial_State
BIRTH_apgar1_5                    Initial_State
BIRTH_apgar5_5                    Initial_State
BIRTH_sexo5                       Initial_State
BIRTH_retardocrecimientointraute  Initial_State
BIRTH_gestacat                    Initial_State
BIRTH_semanas32                   Initial_State
BIRTH_peso1200gr                  Initial_State
NEO_gestasalcat                   Initial_State
NEO_gestasal                      Initial_State
NEO_peso6                         Initial_State
FOLLOW_Fragility_Rasch_441_2PL    Initial_State
FOLLOW_Fragility_Rasch_746_2PL    Initial_State
FOLLOW_GA_day1intervention        Initial_State
FOLLOW_ageday1intervention        Initial_State
EX41_pesosalidaPC                 Intervention
EX41_tallasalidaPC                Intervention
EX41_PCsalidaPC                   Intervention
EX41_EGsalidaPC                   Intervention
EX41_egestasal                    Intervention
EX41_papacarga                    Intervention
EX41_papacarga02                  Intervention
EX41_mamacargo                    Intervention
EX41_separationfrommother         Intervention
WASI_PercRsngcompositescore       Intervention
Numeric variables. For every variable in this table: Missing 0 (0.0%), Infinite 0 (0.0%), Zeros 0.0%.
Variable                        Distinct count  Unique (%)  Mean     Minimum  Maximum
BIRTH_pc5                       21              18.6%       298.31   235      360
BIRTH_apgar5_5                  6               5.3%        9.0442   5        10
BIRTH_talla5                    17              15.0%       405.75   310      480
BIRTH_peso5                     31              27.4%       1547.1   800      1800
EX41_egestasal                  57              50.4%       36.829   31.143   43.571
NEO_gestasal                    42              37.2%       34.994   31.143   40.429
EX41_separationfrommother       39              34.5%       16.434   1        64
BIRTH_apgar1_5                  8               7.1%        7.6018   1        10
NEO_peso6                       47              41.6%       1573     1025     1890
FOLLOW_GA_day1intervention      44              38.9%       34.604   30.571   40.429
SCB_agemother                   22              19.5%       28.283   19       40
FOLLOW_Fragility_Rasch_746_2PL  17              15.0%       0.2142   -1.1861  1.6533
WASI_PercRsngcompositescore     47              41.6%       89.858   52       123
code                            113             100.0%      531.9    9        1049
FOLLOW_Fragility_Rasch_441_2PL  102             90.3%       0.36574  -1.0152  2.7289
Date variable.
BirthDate: Distinct count 91, Unique 80.5%, Missing 0 (0.0%), Infinite 0 (0.0%), Minimum 1993-06-18 00:00:00, Maximum 1994-09-10 00:00:00.
Boolean and categorical variables. For every variable in this table: Missing 0 (0.0%). Value counts are given as value: number of subjects.
Variable                          Type         Distinct count  Unique (%)  Value counts
BIRTH_peso1200gr                  Boolean      2               1.8%        0: 99, 1: 14
EX41_papacarga                    Categorical  3               2.7%        3: 53, 1: 42, 0: 18
BIRTH_gestacat                    Categorical  3               2.7%        1: 53, 2: 37, 3: 23
BIRTH_semanas32                   Boolean      2               1.8%        0: 60, 1: 53
BIRTH_retardocrecimientointraute  Boolean      2               1.8%        0: 81, 1: 32
EX41_mamacargo                    Boolean      2               1.8%        1: 58, 0: 55
BIRTH_sexo5                       Boolean      2               1.8%        1: 64, 2: 49
EX41_papacarga02                  Boolean      2               1.8%        0: 71, 1: 42
Highly correlated variables. Each variable in this table is highly correlated with another variable and should be ignored for analysis.
Variable                      Highly correlated with      Correlation
FOLLOW_ageday1intervention    EX41_separationfrommother   0.91366
NEO_gestasalcat               NEO_gestasal                0.98435
Correlation (Pearson's)
All of the previous variables can be used to feed the tool regardless of their type (numerical or categorical), because the top zone of the heat map accepts category or membership variables, which constitute a type of hierarchy.
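Before feeding variables to the tool, the "highly correlated" flags reported in this annex could be reproduced with a short check such as the following Python sketch. It lists pairs of numeric columns whose absolute Pearson correlation exceeds a threshold. The threshold of 0.9 is an assumption (both correlations reported above exceed it), and the function name, column names and toy values are hypothetical.

```python
import pandas as pd


def highly_correlated(table, threshold=0.9):
    """List (variable, correlated_with, r) for numeric column pairs with |r| > threshold."""
    numeric = table.select_dtypes(include="number")
    corr = numeric.corr(method="pearson")
    columns = list(corr.columns)
    flagged = []
    for i, first in enumerate(columns):
        for second in columns[i + 1:]:
            r = corr.loc[first, second]
            if abs(r) > threshold:
                # The later column is reported as the one to drop.
                flagged.append((second, first, float(r)))
    return flagged


# Toy values for illustration only; they are not data from the study.
toy = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 11],
        "c": [5, 1, 4, 2, 3],
    }
)
flagged = highly_correlated(toy)
```

Which variable of a correlated pair is dropped is a design choice. Here the later column is reported, but an expert may prefer to keep the variable that is easier to interpret.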