PHM for Biomedical Analytics: A Case Study on

Neurophysiologic Data from Patients with Traumatic Brain

Injury

A thesis submitted to the Division of Research and Advanced Studies of the University of Cincinnati in partial fulfillment of the requirements

for the Degree of Master of Science

In the Department of Mechanical Engineering of the College of Engineering

2016

by

Laura Pahren

B.S. in Mechanical Engineering, The Ohio State University (2014)

Committee Chair: Dr. Jay Lee
Committee Member: Dr. Brandon Foreman
Committee Member: Dr. Jay Kim


ABSTRACT

Neurological data is the principal feedback for clinicians treating comatose patients in the Neuro-Intensive Care Unit (NICU), making this data critical in determining treatment, and hence patient outcomes. If this data is misinterpreted, patients can endure varying degrees of long-term cognitive disability, or death. Therefore, understanding the signals themselves, their relationships to patient outcomes, and the development of heterogeneous, patient-specific models has become a key area of interest.

This study was conducted on seven comatose patients who suffered traumatic brain injuries (TBI) and were treated in the University of Cincinnati's Neuro ICU Department. The primary signals of interest were 15 channels of cortical depth electroencephalogram (EEG) and intracranial pressure (ICP). Data collection began within 12 to 24 hours of injury and continued for 48 to 72 hours, with intermittent gaps. The aim of this project was to investigate the existence of an EEG-ICP signal relationship, develop a biomedical data cleaning protocol that accommodates future signals, and determine prominent ICP thresholds in relation to EEG variables. After various EEG features, such as energy in key sub-bands and the Hjorth parameters, were extracted, the data was classified into different peak ICP threshold ranges.

These feature data sets are then central to determining whether ICP changes can be quantified from the cortical EEG recordings and whether a common data element can be identified for a deeper understanding of these signal relationships. In the long term, by elucidating the complex causal relationships within neurological data, ICP might be assessed via surface EEG, eliminating the need to drill into the skull and the risks that entails. Moreover, further neurophysiological brain mapping can create knowledge that enables more informed decision-making for ICP-moderating interventions to reduce secondary brain injuries. The criteria and future work vital to determining the details of this relationship are assessed after a comprehensive case study verifying the existence of an EEG/ICP relationship by modeling EEG variables in a neural network-based self-organizing map (SOM). The accuracy of the clusters developed in the SOM is assessed using image processing techniques and external validity measures to estimate the map's ability to distinguish among the corresponding ICP threshold ranges.

Furthermore, to mitigate issues of dynamic brain states, the windows of time for the modeled data were selected from consistent segments of strongly negatively or positively correlated ICP and cerebral blood flow values, which can be indicative of intracranial compliance, cerebrospinal fluid regulation and cerebral autoregulation. From this analysis, the average estimated external validity was 85.3%, with an estimated external validity high of 98.0%. These results lay the groundwork for further defining the exact nature of this ICP/EEG relationship for clinical use.


ACKNOWLEDGEMENTS

I'd like to express my gratitude for the help and support of my committee chair, Dr. Jay Lee. I'd also like to thank Dr. Brandon Foreman for the incredible opportunity to collaborate with him in such a meaningful and impactful field of study, as well as for sharing his endless knowledge and insight. I would like to thank Dr. Jay Kim for his persistence in encouraging me in my studies. I could not have done it without Dr. Hossein Davari, who constantly guided me throughout this project. Many thanks are also due to many members of the Center for Intelligent Maintenance Systems: She Zhi, Jin Chao, Matt Buzza, Yuan Di, Patrick Brown and any others whom I may have forgotten.

Finally, I would like to thank my family: my husband for his patience and support throughout this process, my parents for all their support throughout my education to get me to this point, and my brothers, who have always remained my greatest role models.


TABLE OF CONTENTS

ABSTRACT ...... ii

ACKNOWLEDGEMENTS ...... v

TABLE OF CONTENTS ...... vi

LIST OF FIGURES ...... viii

LIST OF TABLES ...... x

1. INTRODUCTION ...... 1

1.1. Background ...... 1

1.2. Research Objectives ...... 6

1.3. Thesis Layout ...... 9

2. LITERATURE REVIEW ...... 11

2.1. Sensor Systems in the Neuro Intensive Care Unit ...... 11

2.2. Clinical Studies and Interpretation of Neurophysiologic Signals ...... 15

2.3. Application of Data Analytic Tools ...... 22

3. TECHNICAL APPROACH ...... 33

3.1. Overview ...... 33

3.2. ...... 34

3.3. EEG Signal Analysis ...... 35

3.3.1. EEG Data Cleaning ...... 36

3.3.2. EEG Signal Processing and Feature Extraction ...... 40


3.4. ICP Signal Analysis ...... 46

3.4.1. ICP Data Cleaning ...... 46

3.4.2. ICP Signal Processing and Feature Extraction ...... 48

3.5. EEG Feature Selection ...... 51

3.6. Brain State Segmentation...... 56

3.6.1. Cerebral Blood Flow Signal Processing ...... 56

3.6.2. Data Segmentation ...... 58

3.7. Pattern Recognition Using Self-Organizing Map ...... 60

4. RESULTS AND DISCUSSION ...... 63

4.1. ICP-EEG Relationships ...... 63

4.2. Data Cleaning Strategies ...... 73

5. CONCLUSIONS AND FUTURE WORK ...... 76

5.1. Research Findings ...... 76

5.2. Broader Impacts ...... 77

5.3. Recommendations for Future Work ...... 78

REFERENCES ...... 80


LIST OF FIGURES

Figure 1: Data Collection Schematic ...... 12

Figure 2: X-ray Showing Sensor Set-up ...... 14

Figure 3: Hemodynamics Feedback Loop (Ursino & Lodi, 1997) ...... 22

Figure 4: Adapted PHM-based Approach ...... 23

Figure 5: Relationship Model ...... 24

Figure 6: Fuzzy Logic Example ...... 25

Figure 7: Hierarchical Clustering ...... 27

Figure 8: K-means Example ...... 29

Figure 9: ANN Layers ...... 31

Figure 10: Overall Neurophysiologic Assessment Approach ...... 33

Figure 11: Specified Matlab GUI for Moberg Data for ICP signal with event data ...... 36

Figure 12: Data Rejection and Alignment ...... 38

Figure 13: Sample of Removed Window of Data ...... 39

Figure 14: EEG Spectrum Example ...... 40

Figure 15: Wavelet Decomposition ...... 44

Figure 16: Wavelet Coefficient Plots ...... 45

Figure 17: Overall Data Cleaning and Signal Processing ...... 48

Figure 18: ICP Peak Morphology ...... 49

Figure 19: ICP Peak Detection with Low-Pass Filtered Signal ...... 50

Figure 20: Window of ICP Peaks and Filtered EEG Signal ...... 51

Figure 21: Top 5 Fisher Results by Patient ...... 53

Figure 22: Cumulative Fisher Results ...... 54


Figure 23: Sample Feature Plots of A5me with ICP ...... 55

Figure 24: CBF Processing ...... 57

Figure 25: CBF/ICP/EEG window alignment ...... 58

Figure 26: Neurological Signal Correlations ...... 59

Figure 27: Self Organizing Map Results ...... 64

Figure 28: U-matrix Shifting and Scaling ...... 65

Figure 29: U-matrix and Desired Output Label Map Overlay ...... 66

Figure 30: Label Location and Text Reader ...... 68

Figure 31: Cluster Boundary Tracing...... 69

Figure 32: Confusion Matrix ...... 70

Figure 33: External Validity Time Relationship ...... 72

Figure 34: Biomedical Data Cleaning Strategy ...... 75


LIST OF TABLES

Table 1: Detailed Sensor List for Moberg Multimodality Monitoring ...... 13

Table 2: EEG Frequency Sub Bands ...... 18

Table 3: Patient Data Information ...... 35

Table 4: Information on Removed Patient Data ...... 37

Table 5: Wavelet Coefficient Frequency Sub Bands ...... 45

Table 6: List of Included EEG Features ...... 46

Table 7: Feature Names ...... 52

Table 8: Estimated Relationship Validity Results ...... 71

Table 9: Combined Patient Results ...... 73


1. INTRODUCTION

1.1. Background

Within the manufacturing industry, the impact of Prognostics and Health Management (PHM) practices has become widespread through research and development of intelligent models that enable asset health condition assessments, remaining useful life (RUL) predictions and fault type identification. These practices have allowed for optimized maintenance scheduling and reduction of costly downtime.

These intelligent models can typically be divided into two categories: physics-based models and data-driven models. Physics-based models require kinematic and/or kinetic knowledge of the system itself, while data-driven models rely solely on observational data. Data-driven models have been used for health prediction of manufacturing components and systems such as bearings, wind turbine gearboxes and electromechanical systems (Siegel et al., 2012; Zhao et al., 2013; Yuan et al., 2004). In fact, the US Environmental Protection Agency stated that an equipment productivity increase of 15-25% can be attained using Total Predictive Maintenance strategies, a subset of PHM (Environmental Protection Agency, 2003). PHM has proven its impact in many industries whose data meets standards for volume, velocity, variety and veracity (Sheth, 2014).

One industry that can benefit significantly from PHM tools is the medical industry. With an increasing amount of collected and stored clinical variables and biomedical data, intelligent models are being adopted for assessing patient conditions, predicting outcomes and identifying sources of deteriorating conditions. In fact, (Asri, 2015) found that healthcare data had increased to 150 exabytes (150 × 10^18 bytes) by 2013, growing at a rate of about 1.2 to 4 exabytes a year. This growing body of available data presents opportunities for risk modeling, condition assessment and optimization of treatment scheduling. Recently, (Kevorkova & Popov, 2016) presented a PHM-based predictive diagnostic tool to monitor astronaut health on space missions and determine root causes and contributing factors, enabling autonomous healthcare. This innovative application laid some of the initial groundwork for coupling PHM algorithms to human-centric data. Validation of some of these PHM-based tools with biomedical data has appeared in research on predicting patient readmissions and disease risk (Al-Sayouri, 2014; Wessler et al., 2015). Most intelligent modeling for medical data falls into the PHM category of data-driven models, given that the human body is one of the most complex systems known, making comprehensive physics-based physiological modeling difficult. Furthermore, by using these data-driven models for prediction, physicians can also learn more about poorly understood aspects of human physiology, especially in relation to disease or injury progression. One such example was (Goncalves et al., 2014), who used a combination of machine learning techniques to model 394 patients' vital signs, mechanical ventilator data, pharmaceutical data and laboratory results to predict sepsis levels, attaining 100% accuracy. Similarly, (Asadi et al., 2014) used an artificial neural network (ANN) to model demographic, procedural and clinical factors in predicting outcomes of acute ischemic stroke post intra-arterial therapy. This ANN achieved an accuracy of 80% between the target and output classes. Understanding the factors that contribute most to these models can keep healthcare providers alert to predictors of future issues and improve current treatment pathways. Additionally,

researchers can begin to investigate how diseases interact with the body and influence physiology. Challenges in modeling biomedical data include the limits of modern anatomy and physiology in explaining the physical relationships among biologic variables, and the indirect nature of most biomedical measurements. Moreover, the non-linearity of many biomedical signals, as well as their dependency on patient attributes (e.g., age, sex, height, medical history), poses additional modeling difficulties. Altogether, PHM methodologies have begun to emerge in human body modeling at various levels, but proper techniques and advancements may be necessary to deal with the many complexities unique to biomedical data.

One of the most complex systems within the human body itself is the central nervous system (CNS), comprising the brain and spinal cord, which control the activities of the body by processing information received from its other parts. Multivariate modeling has produced important knowledge and applications for the treatment of injury and disease related to the CNS. This is illustrated by (Shehadeh, 2014), who developed a brain aneurysm rupture prediction framework that achieved a prediction accuracy of 0.91 while also establishing the characteristics most important to the model.

These characteristics were determined to be dome-to-neck ratio, baseline Fisher grade, location, number of lobes and size. Hence, these variables can be considered important to understanding both the management of brain aneurysms and their overall development. Though there are numerous CNS-related variables and measurements, one particular measurement has become popular among biomedical data scientists: electroencephalography (EEG), which can be recorded intracranially (within the skull) or noninvasively, with electrodes placed along the scalp. Intelligent modeling specifically for the brain is often constructed with EEG as a central input, given its ability to capture brain activity. An example is (Günes et al., 2010), which used a k-nearest neighbor algorithm with weighted sleep stages to classify EEG into five stages of sleep, achieving an 82.15% success rate. In this situation, EEG signals become a powerful predictor of different states. More specifically, at the convergence of EEG and machine learning, seizure prediction has emerged as a predominant focus of research. Several studies have investigated methodologies to classify pre-ictal (before seizure) EEG data and ictal (during seizure) data with recognizable success. One such case was (Mirowski et al., 2009), who developed patient-specific classifiers for prediction of interictal and preictal EEG patterns with bivariate features of EEG synchronization, achieving 71% sensitivity with zero false positives. More recently, (Ma & Bliss, 2014) developed both intra-patient and inter-patient seizure prediction frameworks using features based on spatial-temporal covariance in conjunction with support vector machines (SVM). The intra-patient receiver operating characteristic (ROC) result was 0.977, while the inter-patient ROC was 0.822, signifying that EEG patterns may be similar among patients, but that more accurate results can be acquired through precision medicine, where modeling is performed on a patient-specific basis. Precision medicine can tailor treatments to a patient's individual records and data. These and similar studies support the theory that EEG signals hold predictive power in treating brain-related illnesses such as seizures, while also suggesting there may be value in handling data on a patient-to-patient basis.


Seizures fall into a class of potentially deadly neurological ailments that may benefit from intelligent modeling. Another ailment in this category is traumatic brain injury (TBI). In the United States, approximately 1.7 million people suffer a TBI every year, with TBI accounting for 30% of injury-related deaths; of these patients, 1.365 million are treated and released, 275,000 are hospitalized and 52,000 die (Faul et al., 2010). When an individual is diagnosed with TBI, the severity of the injury is also assessed, using the Glasgow Coma Scale (GCS) (Corrigan et al., 2010); a severe TBI classification corresponds to a GCS score of 3 to 8. The severity of the TBI is a key factor in patient prognosis, which encompasses the person's cognitive function, motor function, sensation and emotion. Damage to these abilities can lead to varying degrees of disability. According to a report produced under the Traumatic Brain Injury Act of 1996, about 5.3 million people in the United States were living with a disability caused by TBI (Langlois, 2006). Understandably, a lower GCS score is associated with a higher probability of disability. Patients with severe TBIs have only a 25-33% probability of positive outcomes for closed injuries (Association of Neurological Surgeons). Poor outcomes range from moderate to severe disabilities, to a persistent vegetative state, to death. Several previous studies have been dedicated to establishing a relationship between TBI treatments and overall outcomes, yet a gap exists in verifying how intervening treatments affect immediate neurophysiology and future brain cognition. Such knowledge may promote precision medicine for TBI treatment, where clinicians can better establish patient-specific thresholds for these intervening treatments to optimize patient outcomes and future quality of life. This study seeks to investigate potential methods for improving cognitive functional outcomes, with a specific focus on patients with severe TBIs, by using neurophysiologic signals to determine whether a relationship between current TBI treatment targets and EEG exists.

Modern TBI treatment is based on interpretation of raw bedside data and currently focuses on intracranial pressure (ICP) and ICP-related measurements. ICP, the pressure inside the skull, is a key measurement, and physicians in the NICU intervene with treatments to keep ICP within clinically based thresholds. Assessing these TBI treatment targets, in the form of the ICP signal, against EEG presents several challenges. First, patients are a highly dynamic monitored system and are often disconnected from the monitoring system for treatments, creating gaps in the data. Second, existing analyses are too simple to capture the latent patterns in EEG. Finally, any relationship may be affected by underlying brain states, which may be defined by other neurophysiologic signals, treatments and/or environmental factors.

1.2. Research Objectives

Several clinical trials have investigated neurophysiologic signals and their relationship with TBI outcomes (Budohoski et al., 2012; Manevich et al., 2014; Mehta et al., 2009), yet little has been done in this area using modern machine learning approaches. By managing medical data from a PHM-based perspective, this study aims to determine the presence of correlation between some of these signals, which can then enrich current medical knowledge and provide doctors in the Neurocritical Care Unit (NICU) with better information for establishing TBI treatment protocols. The chosen signals of interest were intracranial pressure (ICP) and cortical depth EEG. Increased ICP can compress important structures and restrict blood flow to the brain, which can damage both the brain and spinal cord and thereby affect patient outcomes. Furthermore, EEG has been extensively researched with respect to patient outcomes (Sandsmark et al., 2016; Nenadovic et al., 2014; Rundgren et al., 2006). EEG reflects post-synaptic potentials, the signaling that underlies brain functions such as cognition, and may hold an extended ability to assess these outcomes on a more immediate, minute-to-minute basis.

The overall goals of this investigation lead to a causality dilemma: to establish a complex relationship, is it best to model EEG from ICP, or ICP from EEG? Given their hypothesized mutual dependency, the answer is directed by medical considerations. From a clinical point of view, ICP is the focus of many TBI treatments, whereas EEG characteristics are less understood in a TBI context, making EEG the more desirable model input with ICP as the output. From these observations, the objectives are laid out in chronological order as follows:

• Develop a robust data cleaning strategy that can be adapted for expanded biomedical signals for future signal inclusions and comparisons.
• Find meaningful data patterns in EEG signals.
• Determine the patterns that may contribute most to the ICP levels.
• Find more basic correlations of ICP and other physiologic signals with linear dynamics to determine ideal states for segmentation of ICP and EEG data for models.
• Construct a model of ICP levels from these EEG data patterns.
• Translate this model into meaningful clinical information from the data-driven model of ICP and EEG characteristics.
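The correlation-based segmentation objective can be illustrated with a sliding-window Pearson correlation of the ICP and CBF streams. The sketch below is a minimal illustration only; the window length, threshold, function names and synthetic data are assumptions, not the procedure actually implemented in Chapter 3:

```python
import numpy as np

def windowed_correlation(icp, cbf, win):
    """Pearson correlation of ICP and CBF over consecutive non-overlapping windows."""
    n = min(len(icp), len(cbf)) // win
    r = np.empty(n)
    for i in range(n):
        seg = slice(i * win, (i + 1) * win)
        r[i] = np.corrcoef(icp[seg], cbf[seg])[0, 1]
    return r

def strong_windows(r, thresh=0.8):
    """Indices of windows with strong positive or negative ICP-CBF coupling."""
    return np.flatnonzero(np.abs(r) >= thresh)

# Synthetic example: ICP and CBF oscillating out of phase (strong negative coupling).
t = np.arange(600)                      # e.g., 600 one-second samples
icp = 15 + 5 * np.sin(t / 30)
cbf = 40 - 8 * np.sin(t / 30) + 0.1 * np.random.default_rng(0).standard_normal(600)
r = windowed_correlation(icp, cbf, win=60)
print(strong_windows(r))                # windows flagged for EEG/ICP segmentation
```

Windows flagged this way would then delimit the EEG and ICP segments passed to downstream modeling, mirroring the brain-state segmentation idea described in the abstract.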

The developed models will be evaluated with a cluster validity assessment. Moreover, the models will be evaluated on their time-dependency and their ability to classify both inter-patient and intra-patient EEG.

These investigations offer prospective benefits on two levels: the obvious TBI treatment application, but also an approach to characterizing an unknown relationship between two signals, which could be adapted for industrial practice. Through this alternative approach to understanding signal relationships, the expected benefit of these studies is groundwork for affirming, or offering alternative data-based evidence toward amending, current TBI treatment guidelines. These recommendations could lead to improved TBI patient outcomes by stabilizing ICP using patient-specific thresholds in an attempt to avoid changes in the patient's EEG. By aiming to reduce secondary brain injuries, overall patient outcomes may be improved, especially with consideration for future patient quality of life. Additionally, with TBI estimated to have cost $76.5 billion in 2010, optimal treatment planning can reduce these direct and indirect economic hardships on patients and their families ("Licenses Protein", 2014). Exploring these relationships may provide a foundation for deeper advancement toward these larger goals, which could also include using scalp EEG to predict ICP, negating the need for intracranial EEG and thus drilling into the patient's skull.

At a higher level, a similar approach could be used to predict unknown values, such as virtual metrology, in sectors such as semiconductor manufacturing. By relating easily measurable signals, such as sensor data, with quality control measures, such as metrology, in a classification model, semiconductor companies can model the measurable inputs to detect whether the virtual metrology is within specification without time-consuming (and often costly) direct quality control measurements. Moreover, researchers can determine which signals or signal characteristics are contributing to quality issues and optimize the overall process.

1.3. Thesis Layout

This thesis is constructed of five chapters, beginning with the Introduction. The Introduction explains the potential impact of and critical need for PHM-based methodologies in medical data, especially in the NICU, the impact of using intelligent models with respect to patient outcomes, and the specific framework for this approach to TBI signal exploration through neurophysiologic data modeling.

Chapter 2 looks at existing, published clinical trials that attempted to assess ICP and its relationship to patient outcomes, EEG-based classification studies and current PHM modeling strategies.

Chapter 3 provides the overall architecture and expands on the steps of the proposed approach for modeling ICP thresholds with EEG characteristics. An overview of the steps is presented first, followed by a detailed data cleaning strategy to deal with sensor disconnects and large artifacts. Then, the chosen signal processing methodologies and features are laid out in depth for both EEG and ICP. Next, a contribution assessment of the EEG patterns is discussed on a patient-to-patient basis. Following this, a simple correlation analysis is presented using a third neurophysiologic signal, cerebral blood flow (CBF), together with ICP to detect strongly correlated time periods for EEG and ICP segmentation. Finally, a clustering methodology is presented to model the EEG variables against the ICP values of interest within these states of CBF-ICP correlation.

Chapter 4 provides an evaluation of the achieved models from both a PHM perspective and a translated clinical perspective. Chapter 5 concludes with the research findings and lays out recommendations for future work in neurophysiologic modeling, both for this particular case study and for further investigations into TBI neurophysiologic modeling. A universal, yet generic, data cleaning strategy is also outlined for additional signals that may be encountered in future work.


2. LITERATURE REVIEW

2.1. Sensor Systems in the Neuro Intensive Care Unit

Monitoring in the NICU is particularly important since it can reveal the onset of neurologic decline, which can be either life-threatening or brain-damaging (Wright, 2007). Moreover, sensor systems are the principal feedback for physicians, and their reliability and translatability are important to treatments for disease and injury management. Of these sensor measurements, ICP and ICP-related signals are the cornerstone of severe TBI management (Brain Trauma Foundation, 2007). In fact, better patient outcomes have been associated with centers in the United States with higher rates of ICP monitoring (Bennett et al., 2012). In the Neuro Critical Care Department at the University of Cincinnati, the source of the provided data and medical knowledge, clinicians utilize the Moberg CNS EEG & Multimodal Neuromonitor and real-time HL7 feeds through Capsule. Signals provided by these devices are listed in Table 1 and concern three distinct areas: brain-related activity monitoring, captured by the Moberg system, and cardiovascular and respiratory monitoring, captured through the HL7 feeds. A schematic of the data sources and storage is shown in Figure 1, which follows the sensors to the data collection source and on to the analytical end user, represented by the Center for Intelligent Maintenance Systems (IMS), where the computational portion of these investigations was carried out.


Figure 1: Data Collection Schematic

The data collected is unique and powerful in its ability to capture minute-to-minute measurements, whereas most physiologic data collected for clinical use is reported on an hourly basis through paper charting or electronic medical record (EMR) systems, so potentially important events may be missed. For instance, hourly measurements have been reported to miss up to 40% of prolonged ICP elevations lasting over 10 minutes (Chambers et al., 2008). With this improved data resolution, predictive modeling has the added ability to generate valuable medical insight beyond a human's mental integration of independent raw bedside data.
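To illustrate the resolution argument, the sketch below flags sustained ICP elevations in minute-resolution data, using the commonly cited 20 mmHg threshold and a 10-minute minimum duration. It is a hypothetical illustration rather than the detection method used in this study; the function name, threshold and duration are configurable assumptions:

```python
import numpy as np

def prolonged_elevations(icp_per_min, thresh=20.0, min_len=10):
    """(start, end) index pairs of runs where minute-resolution ICP stays
    above `thresh` mmHg for at least `min_len` consecutive minutes."""
    above = np.concatenate(([0], (np.asarray(icp_per_min) > thresh).astype(int), [0]))
    edges = np.diff(above)
    starts = np.flatnonzero(edges == 1)   # rising edges: run begins
    ends = np.flatnonzero(edges == -1)    # falling edges: run ends
    return [(s, e) for s, e in zip(starts, ends) if e - s >= min_len]

# Two hours of minute-resolution ICP with a 15-minute elevation between hourly samples.
icp = np.full(120, 12.0)
icp[30:45] = 25.0
print(prolonged_elevations(icp))        # the elevation is captured: [(30, 45)]
print(icp[[0, 60]])                     # hourly charting sees only normal values
```

The hourly samples at minutes 0 and 60 bracket the elevation without ever observing it, which is exactly the kind of event the minute-resolution recording preserves.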


Table 1: Detailed Sensor List for Moberg Multimodality Monitoring

Data Modality Sensor Type Resolution Units Device

Brain Intracranial Pressure Strain-gauge 128 Hz mmHg Raumedic PTO (ICP) Brain Tissue Oxygen Fiber optic 128 Hz mmHg Raumedic PTO (PbtO2) Sensor Thermal Regional Cerebral Blood Diffusion 128 Hz mL/100g/min Hemedex Flow Sensor Brain Temperature Thermistor 128 Hz C Hemedex Thermal Brain Water Content Diffusion Variable K Hemedex Sensor Intracranial Platinum 256 Hz μV AD-Tech Electroencephalogram Electrode Scalp Continuous Ag-AgCl Cup 256 Hz μV Natus Electroencephalogram Electrode Cardiovascular Gel Adhesive Electrocardiogram 128 Hz mV Medline Electrode Edwards Pressure Arterial Blood Pressure 128 Hz mmHg Lifescience Transducer TruWave PX 6000I Respiratory Infrared SpO2.com (Masimo Plethysmography (SpO ) Photoelectric 128 Hz % 2 Corp.) Sensor Infrared End-tidal Capnography Photoelectric 128 Hz mmHg Philips (EtCO ) 2 Sensor Impedance Respiratory Rate 128 Hz Ohms (Ω) Philips Sensor

Neuromonitoring is an invasive measurement technique that is utilized only when required for patient treatment. The Brain Trauma Foundation states that when a patient enters the hospital with a TBI, physicians should determine whether to perform intracranial monitoring based on the presence of an abnormal CT scan, or on whether the patient is over the age of 40, has a systolic blood pressure under 90 mmHg, or exhibits motor posturing (Brain Trauma Foundation, 2007). Intracranial monitoring has the benefit of providing doctors with comprehensive second-by-second updates on various intracranial activities. However, intracranial monitoring carries the risks associated with the invasive drilling into the patient's skull required for many of the monitoring devices, such as intracranial pressure (ICP) measurements and intracranial electroencephalogram (EEG) recordings. Figure 2 illustrates the depth and invasiveness required for these sensors to acquire the appropriate data.

Figure 2: X-ray Showing Sensor Set-up

In TBI patients, ICP is a crucial signal given its ability to detect the onset of hydrocephalus, in which a blockage from cerebrospinal fluid build-up occurs within a ventricle or the subarachnoid space (Hall & O'Kane, 2016). Hydrocephalus can be deadly, making treatment of elevated ICP imperative. EEG, on the other hand, can provide information about seizure onsets, consciousness levels and overall brain cognition (Wright, 2007).

Multimodal monitoring systems hold several promising abilities, accompanied by challenges that include substantial investment in and maintenance of the overall infrastructure (Wright, 2007). Aside from these challenges, multimodal monitoring gives promise to guided, patient-specific therapies and to understanding physiological relationships through investigation of the time-locked data streams of the different signals. With the ability to better explore and understand biomedical signals, medical professionals can improve the delivery of critical medical care. (De Georgia et al., 2005) further emphasized the potential of these multimodal monitoring technologies in the Neurological Intensive Care Unit, underscoring their ability to provide crucial information about both brain physiology and metabolism and, in combination with computational techniques, to enhance management of neurologic diseases and injuries.

2.2. Clinical Studies and Interpretation of Neurophysiologic Signals

ICP levels have been widely used for TBI management, with numerous studies on the relationship between ICP and TBI outcomes. In the study that established the universal guidelines for ICP monitoring, an elevated ICP that could not be reduced increased the incidence of severe disability, vegetative state and death (Narayan et al., 1982). Similar findings were reported in (Miller et al., 1981), where a significant correlation was found between ICP raised over 20 mmHg and disability, vegetative state, or death. Additionally, they suggested that, based on their findings, an ICP greater than 40 mmHg may be responsible for brain ischemia and severe or fatal neurological dysfunction.


Though the ideal ICP threshold is unclear, a standard of 20 mmHg has been widely adopted from clinical studies like the preceding one. In studying the relation of ICP to mortality, (Maramou et al., 1991) found that the explanatory power of ICP proportion variables peaked at this 20 mmHg standard. Moreover, (Tsutsumi et al., 1986) found that 40 mmHg was the critical upper limit for ICP. (Badri et al., 2012) confirmed this well-illustrated relationship between ICP and mortality; however, no correlation between average ICP and neuropsychological functioning for TBI patients at the 6-month follow-up could be established. Altogether, there is little conclusive evidence for the ideal method of interpreting ICP, and much less when other neurophysiologic signals are considered and how they can provide complementary or discordant information about the progression of the injury.

In coma patients, EEG recordings can be of great importance in assessing brain activity. EEG has been used in a broad range of classification solutions, from the neurodevelopment of preterm infants to mild cognitive decline in the elderly (Millichap, 2012; Giannakopoulos, 2009). At the forefront of these EEG-based classifications is seizure prediction, which often relies on classifying EEG data based on expert labels for preictal (before seizure) and ictal (during seizure) periods. Seizure prediction and detection approaches have been explored using time domain, frequency domain, wavelet transform, singular value decomposition, empirical mode decomposition, Principal Component Analysis (PCA) and Independent Component Analysis (ICA) algorithms (Alotaiby et al., 2014). By using these methods, the EEG signals for this investigation can be summarized into more explanatory variables than the raw, recorded EEG values.


Time domain analysis, the simplest of these approaches, has been used for EEG signal analysis, especially with respect to measures of chaos and complexity. In a basic example, (Yucelbas et al., 2016) used 50 time domain variables, including the Hjorth parameters, as features for EEG and EMG signals to classify 5 stages of sleep for 15 subjects. Using an ANN, a classification high of 91.03% was recorded for the classification of stages after Non-REM II, and a low of 75.42% was recorded for predicting Non-REM I epochs. An even more basic example of time-series variables in EEG classification is provided in (Khorshidtalab et al., 2013), who used a multi-class SVM for motor imagery classification with 15 time domain variables of EEG, including kurtosis, skewness, maximum, and RMS. The classification results from the SVM yielded an accuracy of over 80% for every subject. These results were relatively successful, especially considering the simplicity of the variables used in the models.

With increasingly complex features, such as frequency domain variables, the accuracy of models of non-stationary waveforms may significantly increase. Of these frequency domain features, EEG frequency sub bands have been well utilized throughout the advancement of EEG research. These frequency sub bands have been found to contain rhythmic oscillatory patterns that appear during different physiologic states. For instance, (Abdullah et al., 2010) related these frequency bands to ECG heart rate variability at different sleep stages, where select sub bands were associated with cardiac HRV parameters. Another example lies with (Yaghouby et al., 2015), who classified canine seizure EEG data for a seizure prediction study, using mean power frequency and mean phase coherence in a random forest classification. This strategy yielded a true positive rate of 100% for all subjects and a maximum true negative rate of 98.72%. Once more, in (Haddad et al., 2014), temporal seizures were predicted using conventional EEG frequency sub-bands (delta, theta, alpha, beta and gamma). In that study, the seizure signature was extracted from the delta and high frequencies (gamma band and 60-120 Hz sub-band) to yield an anticipation accuracy of 72% and a false-positive rate of 0%. The use of these standard EEG frequency sub bands is common across studies because of their relationship to brain states, functions and pathologies. These sub-bands characterize different types of brain activity and have been an important overall contribution to EEG research. A brief overview of these standard sub band relations to some examined human activities is outlined in Table 2 (Freeman & Quiroga, 2013).

Table 2: EEG Frequency Sub Bands

Band Name | Approximate Frequency Range | Associated Activity | Location
Delta | 0.5 - 4 Hz | Deep sleep in adults | Frontal locations
Theta | 4 - 8 Hz | Deep sleep; high amplitude can be related to epilepsy in adults | Found in different areas of the brain
Alpha | 8 - 12 Hz | Awake and relaxed, often with eyes closed | Occipital locations
Beta | 12 - 30 Hz | During focused activities or deep thought | Central and frontal locations
Gamma | > 30 Hz | Sensory processing, stress, anxiety | Parietal locations

Frequency domain features offer information about the spectral content of the EEG signal, but lack the event-order information that time domain features offer. Time-frequency analysis, however, carries both representations, with a tradeoff in balancing their content. Time-frequency analysis has the added advantage of showing how the signal's energy is distributed across both the time and frequency domains. For example, (Musselman & Djurdjanovic, 2012) used bilinear time-frequency features in an epilepsy prediction with a multi-class SVM and achieved high accuracy. Expanding to time-frequency analysis, one of the most common feature sets applied to EEG signals for classification is the discrete wavelet transform (Srinivasan et al., 2013; Lee et al., 2014; Ocak, 2009). Wavelets are ideal for non-stationary signals given their effective time-scale representation. A specific wavelet example is provided by (Chen, 2014), who used a dual-tree complex wavelet tested with a multitude of classification algorithms and achieved a lowest seizure classification accuracy of 92.22% and several 100% seizure classification rates. Wavelet methods have seen significant recent use in EEG-based studies and will continue to be adapted for non-stationary biomedical data analyses. Features from these various domains and applications can then be translated toward exploring the relationship between the less perceivable information of the EEG waveform and the concurrent ICP values.

Moreover, a quantitative relationship between EEG and ICP may bring about insight into how ICP intervention may affect cognitive outcomes (Foreman & Claassen, 2012).

(Claassen et al., 2013) looked at physiological measurements and intracranial EEG for patients with seizures after acute brain injury, where a rise in ICP was seen after seizure onset. This association sheds light on a possibly definable relationship between these signals, but little has been published about how modulating ICP might affect the EEG.


There have been few exploratory works on the actual relationship between ICP and EEG patterns. Those who have investigated this relationship have generally obtained either outcome-based or observational results, rather than using the machine learning approach targeted here. (Chen et al., 2012) used a Spearman's correlation coefficient to find a relationship between ICP measured via lumbar puncture and EEG for subjects with central nervous system disorders. Using 20 randomly selected 5-second segments, they concluded there was a -0.849 correlation between the ICP value and the Pressure Index (PI), which they defined using the median frequency and delta ratio. Their results appear to provide evidence of a significant negative correlation; however, the sample size is relatively small and may have excluded critical instances captured in long-term, continuous data collection. Furthermore, it is unknown how a patient population with central nervous system disorders may differ from comatose TBI patients. (Connolly et al., 2016) used an overall larger data set with 246 ICP segments, from a limited patient population of two NICU patients, where 79.6% of ICP increases were followed by a burst suppression in the EEG data. These promising results are still somewhat limited in overall size (64 minutes of data), as well as in the limited brain states of the burst suppression patterns. This issue of evolving brain states was of potential concern when trying to include the entire EEG dataset in these investigations.

To mitigate these issues, it was decided that EEG and ICP investigations should only be carried out on some regulated portion of data. This regulation was based on the correlation of ICP and cerebral blood flow (CBF), both of which are stationary neurophysiologic measurements. This relationship has been more heavily investigated than the ICP/EEG relationship (Gobiet et al., 1975; Tackla et al., 2015; Partington & Farmery, 2014). In fact, (Gilkes & Whitfield, 2009) provided support for a negative correlation between these signals. These signals can be modeled by the following equations describing the CBF and ICP relationship as related through cerebral perfusion pressure (CPP), mean arterial pressure (MAP) and cerebrovascular resistance (CVR):

CBF = CPP / CVR    (1)

CPP = MAP − ICP    (2)

CBF = (MAP − ICP) / CVR    (3)

From these equations, a relatively constant MAP and CVR would give a negative relationship between CBF and ICP. In a more complex scenario, (Ursino & Lodi, 1997) used mathematical modeling to develop a brain hemodynamics feedback loop which showed the instances in which the ICP/CBF correlation could be both positive and negative, depending on cerebral autoregulation, intracranial compliance and cerebrospinal fluid circulation. This relationship is illustrated in Figure 3.


Figure 3: Hemodynamics Feedback Loop (Ursino & Lodi, 1997)

Because the ICP/CBF relationship can be physically modeled, these two stationary neurophysiologic signals became the targets used to limit the modeled brain states when investigating the overall ICP/EEG relationship.

2.3. Application of Data Analytic Tools

PHM seeks to take large data sets and process the data into easily understandable predictions of asset lifecycles by determining deviations from normal, baseline data characteristics. The framework used for PHM solutions will be similarly applied toward medical innovations in the NICU. A typical PHM approach consists of some assortment of the following steps: Signal Processing, Feature Extraction, Feature Selection/Dimension Reduction, Health Assessment, Health Prediction, Health Diagnosis and Visualization. "Health", in these applications, refers to the system's behavior, with a "healthy" system behaving normally and an "unhealthy" system behaving abnormally. These steps have been adapted to the approach in Figure 4.

Figure 4: Adapted PHM-based Approach

In order to build a model to monitor the non-linear, dynamic dependencies of neurophysiologic data, literature from the monitoring of other complex systems is employed. Industrial complex systems modeling often exercises fault detection and diagnosis algorithms. The faults in industrial applications, which typically identify the source or type of failure, here relate instead to anomalous physiologic states. The normal, baseline physiologic states are defined by current clinical definitions identified in medical literature. In terms of modeling this unknown relationship, a "black box" system model best characterizes the neurophysiologic system, where the system inputs are characteristics of the EEG signals and the system output is the physiologic state as specified by the patient ICP, as seen in Figure 5. Using these PHM fault diagnosis algorithms, departures from normal system behavior can be detected to identify the anomalous physiologic states, whereby the ICP rises above the identified ICP threshold.

Figure 5: Relationship Model

Fault diagnosis algorithms can be divided into two categories: pattern matching and pattern recognition methodologies. Pattern matching algorithms require training data for pattern comparison, while pattern recognition algorithms perform in an unsupervised fashion, self-assigning labels for class membership. Given the user-specified class labels, which depend on the selected ICP threshold for the physiologic state definitions, the pattern recognition algorithms were determined to be the more appropriate. Essentially, these labels are what are currently seen as meaningful medical assessments of a signal, but may not be certainly accurate once the data variables are actually derived and modeled. Pattern recognition, or clustering, algorithms have been widely used in applications associated with genetic data, marketing data and social media data, to name a few (Geyer-Schulz, 1995; Ye et al., 2013; Javed, 2016). These pattern recognition algorithms can be segregated based on several characteristics of the algorithms themselves, starting with the learning type: hard or fuzzy classifiers. A hard classifying algorithm deals with class assignment based on labels, whereas a fuzzy classifier outputs a "degree of membership". Fuzzy clustering can be implemented using fuzzy logic.

Fuzzy logic uses "degrees of truth" determined by membership functions, mapped between the extremes of 0 and 1 as exemplified in Figure 6. Once the inputs are determined through the membership function, the fuzzy rules are evaluated to form a final result, or fuzzy value. The output class is then obtained according to the membership function in a process called defuzzification (Mendil & Benmahammed, 2001). Fuzzy logic has been used in determining severity levels, for cases such as that of (Su et al., 2001), who used fuzzy clustering to measure changes in gait pattern following ankle arthrodesis. Figure 6 displays how the "degrees of membership" are determined.


Figure 6: Fuzzy Logic Example

One example of a fuzzy clustering algorithm is fuzzy c-means, where each cluster's centroid is calculated as the mean of the weighted observational values within that cluster. The weights are based on the degree of membership to that cluster. This centroid calculation is based on Equation 4, where C_j is the jth cluster centroid, x_i is a given data point, m is the fuzzifier, and w_j is the function giving the degree of membership to the jth cluster (Bezdek, 1981):

C_j = ( Σ_{i=1}^{N} w_j(x_i)^m x_i ) / ( Σ_{i=1}^{N} w_j(x_i)^m )    (4)

The clusters are organized using the objective function in Equation 5, where the optimal clustering is obtained when this function is minimized:

min Σ_{i=1}^{N} Σ_{j=1}^{C} w_ij^m ‖x_i − C_j‖²    (5)

with the membership weights given by

w_ij = 1 / Σ_{k=1}^{C} ( ‖x_i − C_j‖ / ‖x_i − C_k‖ )^(2/(m−1))    (6)
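As an illustration of Equations 4-6, a minimal fuzzy c-means loop can be sketched in a few lines of NumPy. This is a generic sketch, not the thesis's implementation; the two-blob test data and all parameter choices (c, m, iteration count) are illustrative:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=50, seed=0):
    """Alternate the centroid update (Eq. 4) and membership update (Eq. 6)."""
    rng = np.random.default_rng(seed)
    W = rng.random((len(X), c))
    W /= W.sum(axis=1, keepdims=True)              # memberships sum to 1 per point
    for _ in range(n_iter):
        Wm = W ** m
        centers = (Wm.T @ X) / Wm.sum(axis=0)[:, None]          # Eq. 4
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                      # guard against zero distance
        ratio = d[:, :, None] / d[:, None, :]      # ratio[i, j, k] = d_ij / d_ik
        W = 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(axis=2)      # Eq. 6
    return centers, W

# Two well-separated 2-D blobs: memberships should approach 0 or 1
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centers, W = fuzzy_c_means(X, c=2)
```

With m = 2, a point ten times closer to one centroid than the other receives a membership near 0.99 in that cluster, which is the "degree of membership" behavior described above.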

Aside from these soft clustering methodologies, (Rokach & Maimon, 2005) presented additional categories of pattern recognition algorithms, including hierarchical, partitioning, density-based and model-based methods. These particular areas are further explained with examples in the following sections.

Hierarchical clustering seeks to construct a dendrogram using either an agglomerative approach or a divisive approach. Agglomerative classifiers begin by assigning a single cluster label to each feature vector and continually merge clusters until certain criteria have been met. Conversely, divisive clustering begins by including all feature vectors in one cluster and sequentially distributes them into separate clusters until some criteria have been met. An agglomeratively constructed hierarchy is illustrated in Figure 7.


Figure 7: Hierarchical Clustering

The two most common types of hierarchical clustering are single-link and complete-link, which differ in the between-cluster distance criterion (of which there are also several different measures). Single-linkage implements distance minimization, while complete-linkage performs distance maximization, as shown in Equations 7 and 8, respectively, where d is the selected distance measure and X and Y are the sets of observations containing x and y ("The CLUSTER Procedure"):

min {푑(푥, 푦): 푥 ∈ 푋, 푦 ∈ 푌} (7)


max {푑(푥, 푦): 푥 ∈ 푋, 푦 ∈ 푌} (8)
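Equations 7 and 8 translate directly into code. The sketch below (an illustrative example, not from the thesis) evaluates both linkage criteria for two small clusters on a line:

```python
import numpy as np

def single_link(Xa, Xb):
    """Equation 7: minimum pairwise distance between the two clusters."""
    return np.linalg.norm(Xa[:, None] - Xb[None], axis=2).min()

def complete_link(Xa, Xb):
    """Equation 8: maximum pairwise distance between the two clusters."""
    return np.linalg.norm(Xa[:, None] - Xb[None], axis=2).max()

# Two clusters on the x-axis: nearest members are 2 apart, farthest are 5 apart
Xa = np.array([[0.0, 0.0], [1.0, 0.0]])
Xb = np.array([[3.0, 0.0], [5.0, 0.0]])
```

Here `single_link(Xa, Xb)` is 2.0 and `complete_link(Xa, Xb)` is 5.0, showing how the two criteria can rank the same pair of clusters very differently.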

The next category is partitioning methodologies, where the number of clusters is known. Most partitioning methods are defined by their ability to move observations from one cluster to another to iteratively optimize the clustering criteria. The most well-known partitioning technique is k-means. The k-means algorithm seeks to minimize the distances of cluster points to their cluster centroid. In a PHM case, (Siegel & Lee, 2011) used a k-means algorithm to partition mean wind speed values into two clusters representing the health status of each anemometer. K-means clustering can be performed from the minimization function in Equation 9 in order to cluster N observations into K clusters, where μ_i is the mean of the ith cluster C_i (Lloyd, 1982):

min Σ_{i=1}^{K} Σ_{n ∈ C_i} ‖n − μ_i‖²    (9)

Samples of a k-means clustering procedure can be observed in Figure 8, where the number of clusters was K = 3 for the given example.


Figure 8: K-means Example
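A bare-bones Lloyd's algorithm for the Equation 9 objective might look like the following. This is a generic sketch on hypothetical toy points (the thesis clusters EEG features, not these blobs), with one initial centroid deliberately seeded in each blob for a deterministic result:

```python
import numpy as np

def kmeans(X, init_centroids, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and
    centroid recomputation until the centroids stop moving."""
    centroids = init_centroids.astype(float).copy()
    k = len(centroids)
    for _ in range(n_iter):
        # Assign each observation to its nearest centroid
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its members
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

# Three blobs, K = 3 as in Figure 8; seed one centroid inside each blob
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.2, (20, 2)) for c in ((0, 0), (10, 0), (0, 10))])
centroids, labels = kmeans(X, init_centroids=X[[0, 20, 40]])
```

In practice k-means is sensitive to initialization; multiple restarts or a k-means++-style seeding are the usual remedies.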

The next category is distribution-based pattern recognition, which facilitates clustering based on a probability distribution. In distribution-based clustering, the goal is to optimize the fit of the intended distribution, whether it be a Gaussian distribution, exponential distribution, gamma distribution, etc. These probability distributions can then be used to make statistical inferences about the class label from the observational data. One of the most well-known distribution models is the Gaussian Mixture Model (GMM), where the feature vectors are assumed to follow individual normal distributions. A GMM is a weighted sum of a number of individual Gaussian models, where the weights are given by a mixing distribution (π). A GMM is given in Equation 10, where K is the total number of mixture components, each with a mean (μ_k) and a covariance matrix (Σ_k). An Expectation-Maximization algorithm can be used to estimate the model parameters (Yu, 2012).


p(x) = Σ_{k=1}^{K} π_k N(x | μ_k, Σ_k)    (10)
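To make Equation 10 concrete, a small one-dimensional, two-component EM fit can be written directly in NumPy. This is an illustrative sketch on synthetic data, not the thesis's model; initialization at the data extremes is a simple heuristic assumption:

```python
import numpy as np

def em_gmm_1d(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture (Equation 10) with EM."""
    mu = np.array([x.min(), x.max()])      # crude initialization at the extremes
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        Nk = r.sum(axis=0)
        pi = Nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / Nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pi, mu, var

# Balanced mixture of N(0, 1) and N(8, 1): EM should recover both modes
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(8, 1, 200)])
pi, mu, var = em_gmm_1d(x)
```

The E-step computes exactly the per-component terms π_k N(x | μ_k, σ_k²) of Equation 10, normalized into responsibilities.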

The last category discussed is model-based pattern recognition algorithms, where the input data is fit to some mathematical model. One such set of model-based classifiers is artificial neural networks (ANNs). Though computationally expensive, ANNs alter their weights over time, making them highly dynamic machine learning algorithms. ANNs can be used in both supervised and unsupervised paradigms. Through supervised learning, the ANN adjusts its weights to match the provided outputs, whereas, in unsupervised learning, the algorithm aims to minimize a cost function. A visualization of the ANN process shows the inputs to the ANN and the outputs of the ANN, as well as the hidden layer connections. The hidden layer is made up of different functions that can be applied to the input layer and whose activations are transformed back to the desired output scale by the output layer. This process is illustrated in Figure 9.


Figure 9: ANN Layers

One of the most dynamic ANNs is the self-organizing map (SOM). In this case study, the SOM becomes an ideal candidate for its adaptability and robustness and will be discussed in depth in the Technical Approach.
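Though the full SOM discussion is deferred to the Technical Approach, the core update rule can be sketched briefly: each sample pulls its best-matching unit (BMU) and that unit's grid neighbors toward itself, with a decaying learning rate and neighborhood radius. This is a minimal, generic SOM; the grid size, decay schedule and toy data are illustrative assumptions:

```python
import numpy as np

def train_som(X, grid=(4, 4), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal SOM training loop with a Gaussian grid neighborhood."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows * cols, X.shape[1]))
    # Fixed 2-D coordinates of each neuron on the map grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        frac = t / n_iter
        lr = lr0 * (1 - frac)                   # decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.5       # decaying neighborhood radius
        gdist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-gdist2 / (2 * sigma ** 2))  # Gaussian neighborhood
        weights += lr * h[:, None] * (x - weights)
    return weights

# Two distant point clusters should map to different winning neurons
X = np.vstack([np.full((50, 3), 0.0), np.full((50, 3), 1.0)])
W = train_som(X)
bmu_a = np.argmin(((W - X[0]) ** 2).sum(axis=1))
bmu_b = np.argmin(((W - X[-1]) ** 2).sum(axis=1))
```

The neighborhood term is what makes a SOM "self-organizing": nearby neurons on the grid learn similar prototypes, so distinct EEG pattern groupings land in distinct map regions.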

Clustering techniques require some way to assess the "clusterability", or validity, of the clusters. Several different measurements have been used to quantify the clusters produced by pattern recognition algorithms. (Jain et al.) described three types of cluster evaluations: internal, external and relative. Internal cluster validity looks at the clusters themselves and scores them based on maximizing the distance between clusters and minimizing the distance within each cluster. An example of an internal cluster validity measure is the Davies-Bouldin index (Davies & Bouldin, 1979):


DB = (1/N) Σ_{i=1}^{N} max_{j≠i} ( (σ_i + σ_j) / d(c_i, c_j) )    (11)

where N is the number of clusters, c_i are the cluster centroids, d is the distance between the clusters and σ_i is the average distance of the cluster's observational values to its centroid.
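Equation 11 translates directly into code; the sketch below (illustrative, with a hand-constructed example) computes the index from data and cluster labels:

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index (Equation 11): average, over clusters, of the
    worst-case ratio of summed within-cluster scatter to centroid distance."""
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    # sigma_i: mean distance of the cluster's points to its centroid
    sig = np.array([np.linalg.norm(X[labels == k] - cents[i], axis=1).mean()
                    for i, k in enumerate(ks)])
    n = len(ks)
    total = 0.0
    for i in range(n):
        total += max((sig[i] + sig[j]) / np.linalg.norm(cents[i] - cents[j])
                     for j in range(n) if j != i)
    return total / n

# Two tight clusters 10 apart: sigma = 0.5 each, d = 10, so DB = 0.1
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
```

Lower values indicate compact, well-separated clusters, which is why the index suits the internal validity role described above.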

External cluster validity assessment takes any predefined class labels and compares them to the labels achieved through the clusters. These include measures such as true positives (TP), true negatives (TN), false positives (FP), false negatives (FN) and numerous metrics made up of these values. One example is the cluster accuracy, scored according to how similar the clusters are to the classification labels, which can be defined using the Rand index (Rand, 1971):

RI = (TP + TN) / (TP + TN + FP + FN)    (12)
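For clustering, the counts in Equation 12 are taken over pairs of observations: a pair is a TP if it is grouped together in both labelings and a TN if it is separated in both. A direct, generic sketch (not thesis code):

```python
def rand_index(true_labels, cluster_labels):
    """Rand index: fraction of observation pairs on which the two
    labelings agree (grouped together in both, or apart in both)."""
    n = len(true_labels)
    agree, pairs = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            same_true = true_labels[i] == true_labels[j]
            same_clus = cluster_labels[i] == cluster_labels[j]
            agree += int(same_true == same_clus)   # TP or TN pair
            pairs += 1
    return agree / pairs
```

Note the index is invariant to a permutation of cluster labels: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, which is the desired behavior since cluster numbering is arbitrary.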

A confusion matrix is also a complete visualization table that summarizes the measures of TP, TN, FP and FN.

The final cluster validity measure is relative cluster validity, which can be used to evaluate optimization of, or differences between, clusterings. Measures such as the Davies-Bouldin index can be used for relative cluster validity by comparing clusters between different methodologies, rather than between different clusters of one methodology. Relative cluster validity is best served when trying to optimize results by benchmarking several clustering methodologies.


3. TECHNICAL APPROACH

3.1. Overview

The overall systematic framework for this approach includes data cleaning, signal processing, feature extraction, feature selection, data segmentation and data modeling, as illustrated in Figure 10.

Figure 10: Overall Neurophysiologic Assessment Approach


The EEG, CBF and ICP data cleaning all consist of detecting disruptions of the data collection system. Because of the multimodal monitoring scheme, these areas of disconnect must be removed from each signal, and data from the associated time gaps are removed from the other aligned signals. A continuous moving window can then be applied simultaneously across all data sets. Within each window, appropriate predetermined EEG features can be taken for the combined clustering methodology. Previous literature alluded to potential complications of attempting to correlate brain data over long periods of time because of changing states. These regimes have not been conclusively defined to date, making this investigation more complex in dealing with this obstacle. To limit these states, a simple moving window over two physiologic features, ICP and CBF, is used to define periods of high correlation. The clustering methodology will then find like patterns in the EEG features for data that meets the previously defined criteria. These similar patterns will be clustered based on their relationship likelihood using a model-based pattern recognition algorithm such as the neural network-based Self-Organizing Map. This model can ascertain distinguishable differences in an unknown number of EEG pattern groupings. A distinct boundary between EEG feature sets indicates that there is a determinable change in the EEG waveform. The labels, determined by the concurrent ICP window values, can then be compared to these EEG groupings. Labels that correspond to EEG changes at a specified ICP value can indicate a determinable relationship.
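The high-correlation gating step over the ICP and CBF streams can be sketched as follows. This is an illustrative reconstruction only: the 50-sample window and 0.8 threshold are hypothetical choices, not the thesis's actual settings:

```python
import numpy as np

def high_correlation_windows(icp, cbf, win=50, r_thresh=0.8):
    """Return start indices of non-overlapping windows in which the
    time-aligned ICP and CBF streams are strongly (anti-)correlated."""
    keep = []
    for start in range(0, len(icp) - win + 1, win):
        a, b = icp[start:start + win], cbf[start:start + win]
        if a.std() == 0 or b.std() == 0:
            continue                       # flat segment: correlation undefined
        if abs(np.corrcoef(a, b)[0, 1]) >= r_thresh:
            keep.append(start)
    return keep

# Synthetic check: CBF mirrors ICP in the first half, is noise in the second
rng = np.random.default_rng(0)
icp = np.sin(np.linspace(0, 8 * np.pi, 200))
cbf = np.concatenate([-icp[:100] + 0.01 * rng.normal(size=100),
                      rng.normal(size=100)])
selected = high_correlation_windows(icp, cbf)
```

Only the EEG windows aligned with the selected ICP/CBF windows would then pass to feature extraction and clustering, limiting the analysis to a more consistent brain state.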

3.2. Data Collection

Upon admittance to the University of Cincinnati Neuroscience Intensive Care Unit (NSICU), patients with severe, acute TBI who were < 48 hours from the occurrence of their injury were eligible for study enrollment. The severity of the TBI was rated using the Glasgow Coma Scale, with a score of 8 or less after initial resuscitation (Corrigan, 2010).

The resulting data set was provided for 7 TBI patients, with a total of approximately 825.24 hours of data. The figures in Table 3 demonstrate the breakdown of the recorded measurement time for each of these patients.

Table 3: Patient Data Information

Patient Identifier | Approximate Size | Total Recorded Time
14 | 3.54 GB | 37.34 hours
21 | 9.97 GB | 106.82 hours
22 | 18.8 GB | 203.21 hours
26 | 5.97 GB | 63.77 hours
27 | 18.6 GB | 198.91 hours
28 | 9.64 GB | 100.12 hours
30 | 11 GB | 115.07 hours

From these observations, the average data collection period was about 117.89 +/- 62.86 hours per patient.

3.3. EEG Signal Analysis

Due to the large volume of the supplied data (77.6 GB), a MATLAB user interface was necessary in order to observe the signals themselves before applying further analyses. This graphical user interface (GUI) was created to navigate between patients, signals and channels of interest. To cater to the specific signals, appropriate data cleaning and signal processing techniques were applied, discussed in detail in the following sections. The GUI allowed for detecting further necessary signal processing techniques for specific observed signal distortions, determining features that could be valuable through time domain, frequency domain and time-frequency domain observations, and examining reactions of the signals to some of the provided event data. An example of the constructed MATLAB GUI is seen in Figure 11.

Figure 11: MATLAB GUI for the Moberg data, showing the ICP signal with event data

3.3.1. EEG Data Cleaning

The EEG data was provided having already been segmented due to the length of the signals. The ICP files, sampled at 128 Hz (half the rate of the EEG), were first loaded in full for each patient. The computer's RAM could not handle the entirety of the EEG files, which had a sampling rate of 256 Hz and a total of 15 channels, as compared to the ICP's single channel. Therefore, the EEG files were loaded separately and cleared from memory after the features were extracted. Data cleaning is a crucial step in the analysis process. A robust outlier detection and filtering algorithm, combined in an overall architecture, is needed to automatically remove samples distorted by artifacts, interruptions in the data collection and false sensor measurements. As the EEG files were loaded one by one, they were filtered, cleaned and aligned with the corresponding segments of ICP data. EEG's non-stationary behavior makes cleaning the data increasingly difficult. As a result, a different approach was used to deal with distorted EEG instances that did not occur in line with the ICP distortions or disconnections. The ICP signal was chronologically cleaned first, so that any data outliers consistent across both signals were already removed. In order to remove outliers in the more complex EEG signal, further acceptance/rejection rules were applied after the data windowing, where the resulting ICP and EEG signals were windowed in 20-second, non-overlapping time intervals. It was believed the amount of rejected data windows would make this technique acceptable. The percentage of these ignored time windows for each patient is provided in Table 4.

Table 4: Information on Removed Patient Data

Patient Identifier | Percent of Data Unused
14 | 1.40 %
21 | 6.62 %
22 | 12.63 %
26 | 4.53 %
27 | 8.17 %
28 | 2.59 %
30 | 22.65 %


It is also noted that the patients with noticeably higher rejection rates contained, at the end of the patient data, entire files of EEG signal devoid of information, which had to be removed.

Figure 12: Data Rejection and Alignment

The discrepancies in the biological data can have a significant impact on the patterns of the data, and filtering them is vital to having an accurate output from the analysis. In order to effectively remove the unwanted samples from the data, both complex systems experts and medical professionals combined their data knowledge to develop rules to remove unwanted events from the stationary neurophysiologic signals. These rules included discrepancies in amplitudes and anomalous distances between amplitudes. This automated process is an effective and advantageous tool in data cleaning strategies for biomedical signals. The outline of this process is presented in Figure 12, where the ICP cleaning technique is discussed further in detail in Section 3.4.1.

A sample of data that was ignored during the window filtering process is shown below in Figure 13. This example was ignored because of an abnormally large amplitude, with points above 800 microvolts, which would have created anomalous EEG features in the resultant EEG feature dataset. It escaped the preceding outlier removal process because the signal was still continuous and mimicked characteristics of the expected signal shape, and so would be missed by the other disconnection and distortion removal techniques for the EEG signal.

Figure 13: Sample of Removed Window of Data
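The amplitude rule that caught the window in Figure 13 can be expressed as a simple keep/reject mask over the 20-second windows. This is a sketch of the idea only; the 800 microvolt limit follows the example above, and the function name is ours:

```python
import numpy as np

def amplitude_keep_mask(eeg_windows, amp_limit_uv=800.0):
    """True for windows whose absolute amplitude stays within a
    physiologically plausible range; False windows are dropped
    before feature extraction."""
    return np.array([np.max(np.abs(w)) <= amp_limit_uv for w in eeg_windows])

# One clean window and one with a 900-microvolt excursion
clean = 50.0 * np.sin(np.linspace(0, 20 * np.pi, 5120))   # 20 s at 256 Hz
spiky = clean.copy()
spiky[1000] = 900.0
mask = amplitude_keep_mask([clean, spiky])
```

Because the mask is computed per window, the corresponding windows of the aligned ICP stream can be dropped with the same mask, preserving the time-locked pairing.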


3.3.2. EEG Signal Processing and Feature Extraction

In the EEG spectrogram, peaks were apparent at 0 and 60 Hz in all the unfiltered EEG data files, attributed to the DC offset and machine interference, respectively. For each individual window, a bandpass filter was used with a high-pass cutoff of 0.5 Hz to remove the previously mentioned DC offset and a low-pass cutoff of 70 Hz to remove noise artifacts. Then, a notch filter was used to remove the 60 Hz powerline hum. This is shown in Figure 14. Windowing was used to divide the data into small portions from which more detailed information was then extracted in the form of various features.

Figure 14: EEG Spectrum Example

A comprehensive number of EEG features were calculated for each of these time windows, beginning with time domain features, which provide measurements of the time series signal. These variables were chosen based on previous literature and similar practices for handling features of industrial non-stationary signals such as vibration signals. These features were the mean, standard deviation (STD), RMS and peak to peak (P2P), which are calculated by:

Mean = (1/N) Σ_{i=1}^{N} A_i    (13)

STD = √( Σ_{i=1}^{N} (A_i − Mean)² / (N − 1) )    (14)

RMS = √( (1/N) Σ_{i=1}^{N} A_i² )    (15)

P2P = max(A_i) − min(A_i)    (16)

where A is the EEG amplitude, N is the length of the EEG window and i indexes the discrete points of the EEG signal. In addition to these statistical time series variables, a few other time domain variables were collected from previous literature, such as the Hjorth parameters and Shannon Entropy (SE), where Shannon entropy is defined as in (Coifman & Wickerhauser, 1992) and gives a measure of uncertainty in the signal:

SE = − Σ_{i=1}^{N} A_i² log(A_i²)    (17)

Another captured feature set was the higher order time domain parameters called Hjorth parameters. These parameters capture useful time and frequency domain information while retaining a relatively low computational requirement. Hjorth parameters have also appeared in many EEG classifications. For instance, (Hamida et al.) successfully utilized them in a k-means classifier for insomnia identification with a 91% sensitivity and a 91% specificity, using namely the mobility and complexity Hjorth parameters, motivating the use of Hjorth parameters for alternate EEG classifications. Mobility and complexity are just two of the three Hjorth parameters, which also include a measure known as Activity. From their definitions, Mobility is representative of the mean frequency and is proportional to the standard deviation of the power spectrum, while Complexity represents changes in the frequency spectrum. Activity was also added as the additional Hjorth parameter and is representative of the signal power. The computations for these features are provided in Equations 18, 19 and 20 below (Hjorth, 1970):

Activity = var(A(t))    (18)

Mobility = √( var(dA(t)/dt) / var(A(t)) )    (19)

Complexity = Mobility(dA(t)/dt) / Mobility(A(t))    (20)
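Equations 13-20 can be collected into one window-level feature function. This is an illustrative NumPy sketch, not the thesis's MATLAB code; derivatives are approximated with finite differences, and the pure-sine sanity check is ours:

```python
import numpy as np

def time_domain_features(a):
    """Time-domain features of one EEG window (Equations 13-20 and 17)."""
    feats = {
        "mean": a.mean(),                               # Eq. 13
        "std": a.std(ddof=1),                           # Eq. 14
        "rms": np.sqrt(np.mean(a ** 2)),                # Eq. 15
        "p2p": a.max() - a.min(),                       # Eq. 16
    }
    d1, d2 = np.diff(a), np.diff(a, n=2)                # discrete derivatives
    feats["activity"] = np.var(a)                       # Eq. 18
    feats["mobility"] = np.sqrt(np.var(d1) / np.var(a))            # Eq. 19
    feats["complexity"] = (np.sqrt(np.var(d2) / np.var(d1))
                           / feats["mobility"])         # Eq. 20
    e = a[a != 0] ** 2                                  # skip zero samples in the log
    feats["shannon_entropy"] = -np.sum(e * np.log(e))   # Eq. 17
    return feats

# Sanity check on one second of a 10 Hz unit sine at 256 Hz
t = np.arange(256) / 256.0
feats = time_domain_features(np.sin(2 * np.pi * 10 * t))
```

For a pure sinusoid the complexity is close to 1 and the mobility tracks the oscillation frequency, which matches the interpretation of these parameters given above.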

Due to the non-stationary nature of the EEG signal, frequency domain features are useful in assessing the signal's spectral content. The frequency sub bands highlighted in other EEG analyses and documented in the Literature Review were taken into consideration for this overall feature set, with the delta (0.5-4 Hz), theta (4-7 Hz), alpha (8-12 Hz) and beta (12-20 Hz) sub bands' power, as well as their relative powers and the total power. This power was computed as:

Power = Σ_{f=LF}^{HF} Y_f    (21)


Where Y is the power spectrum of the EEG window, LF is the lower frequency range of the band and HF is the higher frequency range of the band. The pressure index (PI) from

(Chen et al., 2012) was further included in the frequency feature set. This feature was calculated as:

PI = 1 / ((Delta Ratio)(MF))    (22)

Where the delta ratio is the ratio of delta power to the sum of the alpha and beta power.

Additionally, MF is the median frequency, i.e., the frequency below which 50% of the total EEG window power is contained. In addition to PI, the median frequency was also added to the feature set, as well as the spectral edge frequency, which is the frequency below which 95% of the total power in the EEG window is contained.
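The band powers, pressure index, median frequency and spectral edge frequency described above can be sketched from a simple periodogram as follows; the function names and the use of a bare FFT periodogram (rather than whatever spectral estimator the thesis used) are assumptions for illustration:

```python
import numpy as np

def band_power(psd, freqs, lo, hi):
    """Sum spectral power Y_f over a band [lo, hi) Hz (Eq. 21)."""
    mask = (freqs >= lo) & (freqs < hi)
    return psd[mask].sum()

def spectral_features(x, fs):
    """Frequency-domain features for one EEG window (illustrative helper;
    band edges follow the text, the periodogram estimator is an assumption)."""
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)   # simple periodogram
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    delta = band_power(psd, freqs, 0.5, 4)
    theta = band_power(psd, freqs, 4, 7)
    alpha = band_power(psd, freqs, 8, 12)
    beta = band_power(psd, freqs, 12, 20)
    total = band_power(psd, freqs, 0.5, 70)
    # cumulative power for median (50%) and spectral edge (95%) frequencies
    band = (freqs >= 0.5) & (freqs < 70)
    cum = np.cumsum(psd[band]) / psd[band].sum()
    mf = freqs[band][np.searchsorted(cum, 0.50)]
    sef = freqs[band][np.searchsorted(cum, 0.95)]
    delta_ratio = delta / (alpha + beta)
    pi = 1.0 / (delta_ratio * mf)                # pressure index (Eq. 22)
    return dict(delta=delta, theta=theta, alpha=alpha, beta=beta,
                total=total, mf=mf, sef=sef, pi=pi)
```

A dominant alpha-band oscillation should yield a median frequency near its tone and an alpha power that exceeds the other bands.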

The final feature category was time-frequency features, summarized in the form of wavelet features. A wavelet energy analysis uses wavelet decomposition as a filter which iteratively branches the signal into a lower frequency subdivision and a higher frequency subdivision. These branches are known as the approximation and detail coefficients, as shown in Figure 15. The approximation signal estimates the frequency content (f_A) provided in the range:

0 < f_A < f_nyquist / 2^(n+1)    (23)

Where n is the decomposition level. Furthermore, the detail signal estimates the remaining frequency content (f_D) from:

f_nyquist / 2^(n+1) < f_D < f_nyquist / 2^n    (24)
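Equations 23 and 24 imply a dyadic split of the spectrum. A small helper (hypothetical, with levels indexed from 1, which shifts the exponents by one relative to the 0-indexed form above) reproduces the ideal sub band edges of Table 5 for a 128 Hz sampling rate:

```python
def wavelet_band_edges(fs, levels):
    """Ideal frequency ranges (low_Hz, high_Hz) covered by each wavelet
    decomposition level for a dyadic filter bank, levels indexed from 1."""
    nyquist = fs / 2.0
    bands = {}
    for n in range(1, levels + 1):
        # detail coefficient at level n
        bands[f"D{n}"] = (nyquist / 2 ** n, nyquist / 2 ** (n - 1))
    # approximation coefficient at the final level covers the remainder
    bands[f"A{levels}"] = (0.0, nyquist / 2 ** levels)
    return bands
```

With fs = 128 Hz and 5 levels this yields D1 = 32-64 Hz down to A5 = 0-2 Hz, matching Table 5.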


Figure 15: Wavelet Decomposition

The decompositions originate from the original signal and then from the previous level’s approximation signal. The mother wavelet was chosen from (Hojjat et al., 2003) as the Daubechies 4 (‘db4’) mother wavelet for 5 levels, which aims to mimic the EEG’s transient signatures. In order to fit the wavelet coefficients to meaningful frequency ranges, the EEG waveform was first downsampled by a factor of 2 so that the new sampling rate would be 128 Hz. Coefficient ranges were selected based on their ability to capture meaningful frequency content. As a result, six wavelet coefficients were chosen, as displayed in Figure 16, using the detail coefficients for all five levels and the approximation coefficient at the final level to capture the remaining frequency content.


Figure 16: Wavelet Coefficient Plots

These wavelets, with the downsampling, matched up to the ideal sub band ranges shown in Table 5, which ties each coefficient and its frequency range to a characteristic EEG sub band. The actual features stored in the EEG feature matrix are the mean, standard deviation and ratios of the means of adjacent sub bands for the selected wavelet levels and chosen coefficients (Hojjat et al., 2003).

Table 5: Wavelet Coefficient Frequency Sub Bands

Wavelet | Corresponding Frequency Range | Closest Related EEG Frequency Sub Band
D1 – Detail Coefficient Level 1 | 32-64 Hz | Gamma
D2 – Detail Coefficient Level 2 | 16-32 Hz | Beta
D3 – Detail Coefficient Level 3 | 8-16 Hz | Alpha
D4 – Detail Coefficient Level 4 | 4-8 Hz | Theta
D5 – Detail Coefficient Level 5 | 2-4 Hz | Delta
A5 – Approximation Coefficient Level 5 | 0-2 Hz | Delta


A comprehensive list of all these detailed features is provided in Table 6 below, where each feature is categorized by time domain, frequency domain or wavelet.

Table 6: List of Included EEG Features

EEG Variables

Time Domain:
 Mean
 Maximum
 Peak to Peak
 RMS
 Hjorth Parameters: Activity, Complexity, Mobility
 Shannon Entropy

Frequency Domain:
 Delta Power (0.5-4 Hz) and Relative Delta Power (0.5-4 Hz / 0.5-70 Hz)
 Theta Power (4-7 Hz) and Relative Theta Power (4-7 Hz / 0.5-70 Hz)
 Alpha Power (8-12 Hz) and Relative Alpha Power (8-12 Hz / 0.5-70 Hz)
 Beta Power (12-20 Hz) and Relative Beta Power (12-20 Hz / 0.5-70 Hz)
 Total Power (0.5-70 Hz)
 Spectral Edge Frequency and Mean Frequency
 Pressure Index (PI)

Wavelet Features:
 Mean, Standard Deviation and Ratios of the Means of Adjacent Sub Bands for a Decomposition Level of 5 (Type: Daubechies 4 Wavelets)

3.4. ICP Signal Analysis

3.4.1. ICP Data Cleaning

In order to detect disconnections in the ICP measurement, a series of steps was developed to detect different types of noise in any biomedical signal. This process, specific to the ICP signal, involved removing data points greater than a maximum ICP value of 90 mmHg. Further, the derivatives of the ICP vector and the ICP time vector were also utilized to find extraneous corresponding derivatives for outlier removal, which were determined to be anything greater than the values of 5 mmHg or 0 mmHg, respectively. Some of the distortions in the ICP signal appeared not as a completely uniform signal, but rather as one with very small oscillations. To filter these out, a small 0.2 second moving window was used, where the standard deviation of the window was acquired and the signal was removed if that standard deviation was smaller than the expected ICP peak standard deviation, determined to be 0.5 mmHg. Similar to the EEG signal, gaps removed from the ICP signal were also removed from the EEG signal so as to properly align them.
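The ICP cleaning rules above (ceiling of 90 mmHg, a derivative limit, and near-flat 0.2 second windows) can be sketched as a validity mask; the thresholds and the per-window loop here are illustrative, not the thesis implementation:

```python
import numpy as np

def clean_icp(icp, fs, icp_max=90.0, deriv_max=5.0, flat_std=0.5, win_s=0.2):
    """Flag ICP samples as valid/invalid using rules like those in the text.

    Illustrative sketch: drops samples above icp_max mmHg, samples with
    extreme point-to-point jumps, and near-flat 0.2 s windows whose
    standard deviation falls below flat_std (suggesting a disconnected
    or distorted sensor).
    """
    icp = np.asarray(icp, dtype=float)
    valid = icp <= icp_max                       # physiologic ceiling
    d = np.abs(np.diff(icp, prepend=icp[0]))
    valid &= d <= deriv_max                      # extreme sample-to-sample jumps
    win = max(int(win_s * fs), 2)
    for start in range(0, len(icp) - win + 1, win):
        seg = slice(start, start + win)
        if np.std(icp[seg]) < flat_std:          # near-flat oscillation
            valid[seg] = False
    return valid
```

A high-amplitude spike, a flat-line disconnection and ordinary pulsatile variation are then separated by the returned mask.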

The specific rules developed from knowledge of each signal’s expected thresholds are outlined in Figure 17 for the EEG and ICP signals, along with the overall data cleaning, alignment, windowing and window rejection techniques, ending with the feature extraction of the ICP and EEG signals.

The more generalized steps in this cleaning process are also outlined further in the results section, which presents a plan for data cleaning for any stationary biomedical signal. The aforementioned EEG data cleaning process was adapted to the window acceptance/rejection method based on the unstable predictability that ensues with non-stationary signals. However, many of the disconnections that take place would be filtered out of the EEG during alignment with the removed disconnections in the ICP signal, before the windowing even occurs.


Figure 17: Overall Data Cleaning and Signal Processing

3.4.2. ICP Signal Processing and Feature Extraction

The ICP signal, after having undergone the data cleaning for signal distortions, was a relatively smooth signal with little signal processing required. However, the feature extraction proved to be challenging, given the continuously changing ICP morphology.

A single ICP pulse typically contains three peaks, designated as the percussion, tidal and dicrotic waveforms. The first peak, the percussion wave, relates to the arterial systole, while the second peak, the tidal wave, relates to ICP autoregulation (Singhi & Tiwari, 2009). Lastly, the dicrotic wave, or the third peak, relates to venous activity (Singhi & Tiwari, 2009). A normal, healthy ICP morphology is shown in Figure 18b with the location of these three sub-waveforms. Figure 18a displays a non-compliant ICP waveform, with the peak detection results provided in both. Because of the inconsistencies between compliant and non-compliant waves, only the highest peak of the ICP wave was targeted for the ICP feature extraction. For a tri-peak compliant ICP, the percussion peak is stored, while for tri-peak non-compliant waveforms, the tidal wave is stored. This decision was based on how ICP may be similarly quantified in the NICU when gauged over longer periods (typically hourly).


Figure 18: ICP Peak Morphology

Straightforward local maxima detection would identify every ICP sub peak.

Generally, as observed in the provided severe TBI patient data, non-compliant ICP waveforms are more often observed, making sub peak detection for all waveforms virtually impossible given the non-compliant peak’s less prominent outer sub peaks.

Therefore, with a focus on maximum sub peak detection only, a filter was utilized to distinguish where each major peak was located using a high-pass filter at 0.5 Hz. The

maximum ICP sub peak value surrounding each of these major peaks was stored as the peak ICP value for the ICP feature matrix. Samples of this process are shown in Figure 19, with the ICP signal and its sub peak detection at the top, the high-pass filtered ICP and its peak detection at the bottom, and the alignment of these peaks shown with the dotted lines.
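The major-peak detection described here can be sketched as follows; a moving-average subtraction stands in for the 0.5 Hz high-pass filter, and the function and its parameters are illustrative assumptions rather than the thesis code:

```python
import numpy as np

def detect_icp_peaks(icp, fs, min_dist_s=0.5):
    """Locate the maximum sub peak of each ICP pulse.

    Sketch: remove the slow baseline (moving-average subtraction as a
    stand-in for the 0.5 Hz high-pass filter), find local maxima of the
    filtered trace separated by at least min_dist_s seconds, then store
    the largest raw ICP value in a small neighborhood around each
    filtered-trace peak.
    """
    icp = np.asarray(icp, dtype=float)
    win = int(2 * fs)  # ~2 s moving average approximates the < 0.5 Hz trend
    baseline = np.convolve(icp, np.ones(win) / win, mode="same")
    hp = icp - baseline
    min_dist = int(min_dist_s * fs)
    # local maxima of the filtered signal
    idx = np.flatnonzero((hp[1:-1] > hp[:-2]) & (hp[1:-1] >= hp[2:])) + 1
    peaks, last = [], -min_dist
    for i in idx:
        if i - last >= min_dist and hp[i] > 0:
            # snap to the largest raw ICP value near this major peak
            lo = max(i - min_dist // 2, 0)
            hi = min(i + min_dist // 2, len(icp))
            peaks.append(lo + int(np.argmax(icp[lo:hi])))
            last = i
    return np.array(peaks)
```

On a synthetic 1 Hz pulsatile trace the detector should return roughly one peak per cardiac-rate cycle, snapped to the raw maxima.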

Figure 19: ICP Peak Detection with Low-Pass Filtered Signal

Once the ICP peaks have been denoted, the resulting data can be observed in

Figure 20, where the aligned signals contain the peak ICP detection and a cleaned and filtered EEG signal, which is then prepared for the EEG feature extraction. After this previously explained process occurs, the EEG features, the corresponding EEG feature legends and the ICP mean peak values are saved for each patient’s neurophysiologic data and stored for further analysis.

Figure 20: Window of ICP Peaks and Filtered EEG Signal

3.5. EEG Feature Selection

In order to determine which EEG features can best discriminate between patterns that are considered “Above Threshold” and “Below Threshold”, a feature selection methodology is necessary. In order to reduce the already large computational load, a simple feature selection strategy is preferred. The Fisher Criterion meets this requirement and is ideal for its ability to rank features by their ability to distinguish between two classes, as well as for determining each feature’s overall contribution to this discrimination. To compute the Fisher Criterion, Equation 25 is used.

J_k(n, a) = |μ_{n,k} − μ_{a,k}|² / (σ²_{n,k} + σ²_{a,k})    (25)

where ‘n’ and ‘a’ are the class indices, representing “Normal” (below threshold) and “Abnormal” (above threshold). The Fisher criterion score for the kth feature is given by Equation 25. This step is not only important to developing effective models in the following steps, but also provides important insight into the relationship between these two signals. If a pattern in high ranking features between patients or time periods can be detected, a potential hypothesis can be made about which EEG features correlate with ICP peak values for further exploration. Fisher values were scored for several models using the same threshold value of 40 mmHg, based on the majority of the data distributions and its clinical relevance. A record was constructed from the top four ranked features for each folder with sufficient data to produce a meaningful model. Figure 21 displays the frequency of each feature’s appearance in each patient’s folders using the Fisher criterion for a threshold at 40 mmHg. The feature name key is given below this figure, in Table 7.
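Equation 25 can be computed per feature column in a few lines; this helper and its argument layout are assumptions for illustration:

```python
import numpy as np

def fisher_score(features, labels):
    """Fisher criterion (Eq. 25) for each feature column.

    features: (samples, k) matrix; labels: boolean array, True for
    "Above Threshold". Returns one score per feature; higher scores
    indicate better separation between the two classes.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=bool)
    mu_a, mu_n = X[y].mean(axis=0), X[~y].mean(axis=0)
    var_a, var_n = X[y].var(axis=0), X[~y].var(axis=0)
    return (mu_n - mu_a) ** 2 / (var_n + var_a)
```

A feature whose class means are well separated relative to the within-class variances receives a large score; pure-noise features score near zero.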

Table 7: Feature Names

Shorthand Name | Explanation
Rel. Delta, Rel. Theta, Rel. Alpha, Rel. Beta | Relative Energy in Frequency Sub Band
P2P | Peak to Peak
Total | Total Power
Mob, Cpx, Act | Mobility, Complexity and Activity
D1me, D2me, D3me, D4me, D5me, A5me | Mean of the Detail or Approximation Wavelet Coefficient at Specified Level
D1s, D2s, D3s, D4s, D5s, A5s | Standard Deviation of the Detail or Approximation Wavelet Coefficient at Specified Level
sR1, sR2, sR3, sR4, sR5 | Ratios of Wavelet Coefficient Means (sR1: D1me/D2me, sR2: D2me/D3me, sR3: D3me/D4me, sR4: D4me/D5me, sR5: D5me/A5me)
PI | Pressure Index
MF | Mean Frequency
SEF | Spectral Edge Frequency
SE | Shannon Entropy


Figure 21: Top 4 Fisher Results by Patient

This 3D graph was generated in order to assess whether these features appear to be patient-dependent or patient-independent. From this figure, there is some visual correlation in the Fisher ranking of the EEG features with respect to the ICP mean peak values.


In order to decipher which exact features seem to be more uniformly important across all the patients, a second frequency plot was constructed in Figure 22, where the frequency was the cumulative feature occurrence in the top four for each possible model.


Figure 22: Cumulative Fisher Results

Figure 22 shows several clearly predominant features like ‘D5me’. In order to determine a frequency threshold on which features to include, a relative frequency threshold of 7 was used, which indicates the feature’s presence in approximately 23.3% of the files tested. Examples for one patient are shown in Figure 23, which displays feature plots of one of the top ranking EEG features, ‘A5me’, in relation to the mean peak ICP values in the corresponding time windows.


(Panel correlation coefficients: 0.62, −0.48 and −0.04)

Figure 23: Sample Feature Plots of A5me with ICP

From these plots, there was a noticeable change in some of these ICP/EEG correlations, which ranged from positive through negative to no correlation. It was believed that part of this issue was the previously mentioned brain states, which merited the use of data segmentation through CBF and ICP correlations; these correlations could reflect intracranial compliance, cerebral spinal fluid regulation and cerebral autoregulation.


3.6. Brain State Segmentation

3.6.1. Cerebral Blood Flow Signal Processing

From the existing literature, the CBF was a smooth signal, yet it was collected less frequently than many of the other signals given its calibration time and the 60 minute limit between recalibrations. Its inclusion in the analysis disqualified several patient data windows that did not contain a corresponding CBF data stream. The CBF was processed based on two other related signals, the perfusion temperature and the probe placement assistant (PPA) rating. When the perfusion temperature dipped below 36 °C or when the PPA rating exceeded 2, the corresponding CBF data was considered corrupt. A perfusion temperature less than 36 °C was an indication that the probe was too close to the surface. The PPA was rated from 0-10 for optimal placement of the perfusion probe and was said to indicate good placement when less than 2 (“BPM Neuromonitoring”). This CBF signal processing is illustrated in Figure 24 below, where the bottom two plots show the perfusion temperature and PPA, and the top plot shows the original and retained portions of the CBF signal.
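The CBF validity rules just described (perfusion temperature at or above 36 °C, PPA rating at most 2) reduce to a simple mask; a sketch with assumed argument names:

```python
import numpy as np

def valid_cbf_mask(perf_temp_c, ppa_rating, temp_min=36.0, ppa_max=2):
    """Mask of usable CBF samples per the rules in the text: discard
    samples where perfusion temperature < 36 degrees C (probe too close
    to the surface) or the probe placement assistant rating exceeds 2."""
    t = np.asarray(perf_temp_c, dtype=float)
    p = np.asarray(ppa_rating, dtype=float)
    return (t >= temp_min) & (p <= ppa_max)
```

Samples failing either condition are dropped before the CBF is windowed and aligned.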


Figure 24: CBF Processing

After undergoing this processing method, the CBF and ICP/EEG features were aligned with respect to beginning times, end times and gaps. The same 20 second moving window was used on the CBF signal, taking a simple mean for each window. This alignment process is laid out in Figure 25.


Figure 25: CBF/ICP/EEG window alignment

3.6.2. Data Segmentation

A segmentation methodology was desired to limit the number of brain states in the modeled data sets. In order to do this, a secondary moving window function was applied to take 20 minute data divisions of all three signals: the CBF, ICP and EEG. Then, a simple Pearson’s correlation coefficient, from Equation 26, was calculated for each combination of these 20 minute windows for ICP vs. EEG, EEG vs. CBF and ICP vs. CBF.

ρ = cov(X, Y) / (σ_X σ_Y)    (26)

Where ρ represents the Pearson’s correlation coefficient of two signals, X and Y, and σ represents the standard deviation of each signal. To visualize these results, a weighted color diagram was constructed where green denoted a strong positive correlation, red a strong negative correlation and yellow little to no correlation.

Results showed no visible patterns in the EEG vs. CBF or ICP vs. EEG correlations, which provides evidence for the necessity of deeper EEG modeling. Conversely, there were noticeable periods of strong ICP vs. CBF correlations. However, these correlation values were not always consistently positive or negative. Different patients’ entire data sets would exhibit both long stretches of positive and/or negative correlations lasting several hours, as shown in Figure 26. Unsure of what these correlations directly exhibit, the subsequent modeling would include separate instances of positive correlation, negative correlation and combined positive and negative correlation periods to compare similarities and differences. The segmentation would take periods of CBF vs. ICP correlation that were greater than 0.5 or less than -0.5 and lasted for a minimum of an hour, or the length of three of these secondary windows.
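The segmentation just described (20 minute windows, |ρ| > 0.5, runs of at least three consecutive windows) can be sketched as follows; the window length default and the return format are illustrative choices:

```python
import numpy as np

def correlated_segments(x, y, fs, win_s=1200, thresh=0.5, min_windows=3):
    """Segment two aligned signals into runs of strongly correlated
    20 minute windows (sketch of the CBF vs. ICP segmentation).

    Windows whose Pearson correlation exceeds +thresh (or falls below
    -thresh) are kept; runs of at least min_windows consecutive windows
    with the same sign (>= 1 hour for the defaults) are returned as
    (start_window, end_window, sign) tuples.
    """
    n = int(win_s * fs)
    k = min(len(x), len(y)) // n
    signs = []
    for i in range(k):
        seg = slice(i * n, (i + 1) * n)
        r = np.corrcoef(x[seg], y[seg])[0, 1]
        signs.append(0 if abs(r) < thresh else (1 if r > 0 else -1))
    runs, i = [], 0
    while i < len(signs):
        if signs[i] == 0:
            i += 1
            continue
        j = i
        while j < len(signs) and signs[j] == signs[i]:
            j += 1
        if j - i >= min_windows:
            runs.append((i, j, signs[i]))
        i = j
    return runs
```

Feeding in a signal pair whose second half is sign-flipped produces one positive and one negative run, mirroring the long correlated stretches observed in the patient data.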

[Figure 26 data: 20 minute window correlation coefficients for Patient 21, with columns for ICP vs. EEG, EEG vs. CBF and CBF vs. ICP; the CBF vs. ICP column is predominantly strongly negative.]

Figure 26: Neurological Signal Correlations


3.7. Pattern Recognition Using Self-Organizing Map

Of the methodologies discussed in the Literature Review, an ANN method is desired given its adaptability to different inputs. In an ANN, the weights are constantly changing with added input patterns until a criterion has been met. This ability is especially desirable given the unconfirmed theories on complete brain physiology. One of the most powerful of these methodologies, the self-organizing map (SOM), was specifically targeted as the pattern recognition algorithm for the calculated EEG variables.

The self-organizing map is a valuable tool for clustering input patterns, whose results are simplified in a color map displaying the distribution of an unknown set of classes. A color gradient is used, where lighter colors show similar patterns and darker colors display larger distances, creating boundaries around distinct groups of features. In this way, the SOM aims to distinguish distinct changes in the patient EEG, which would be visually displayed in the SOM topological map, also known as the U-matrix. Ideally, these clusters would relate to labels corresponding to actual ICP values or ranges of ICP values from the corresponding window, indicating that changes in the EEG, represented by variables of the EEG signal, are related to the patient ICP.

The first issue that arises with the use of the SOM is determining these ICP labels with limited prior knowledge of this relationship. Initially, it was determined that the ICP could be classified as either “Above Threshold” or “Below Threshold”. These thresholds were then tested using the clinically relevant value of 20 mmHg and the indicated critical high of 40 mmHg, depending on the data distributions. It was also understood that not all patients could be tested using both, one or neither of these thresholds depending on the ICP distribution of the patient’s data, which negated several daily accumulated data sets for the initial clustering.

One of the main attributes of the SOM is that it performs unsupervised clustering and does not know the desired number of clusters. Therefore, the trained SOM has the ability to detect any clusters of EEG data with little predetermined cluster criteria. This quality has to be taken into consideration when inputting the EEG features into the SOM. For example, if a patient’s data is made up of a large portion of above threshold data and a small portion of below threshold data, the SOM may seek out differences within the above threshold data patterns, which could be a result of other factors not representative of the ICP/EEG relationship. Moreover, the SOM may disregard comparably smaller changes that might exist between the above and below threshold clusters. In order to handle these dissimilar data ratios, the EEG feature sets were broken up by varying ICP mean peak thresholds and balanced to have a more equal number of “Above Threshold” and “Below Threshold” EEG vectors. Before these features could be input into the model, normalization was carried out to scale the variety of EEG feature magnitudes.

The self-organizing map (SOM) is an unsupervised learning technique and a versatile tool for classifying the patterns in the feature set. SOM is a non-linear, single layer feed-forward network used to map high-dimensional data into a lower dimensional space (Kohonen, 1982). SOM has the capacity to cluster large, complex data sets with no prior knowledge while summarizing its results into an easy-to-interpret map. In SOM, each node is given initial weights. As the feature vectors are input into the system as training data, the nodes’ weights, which have the same dimensions as the input vector, are compared to determine which node is most similar to the input vector. The node with the closest matching weights is known as the best matching unit (BMU), which is calculated using Equation 27:

‖x − w_c‖ = min_j {‖x − w_j‖}    (27)

where the Euclidean distance is calculated using the expression below in Equation 28:

‖x − w_j‖ = √( ∑_{i=1}^{n} (x_i − w_{j,i})² )    (28)

Here, j indexes the neurons in the SOM map, x is the input vector and w_j is the weight vector for each neuron, with the BMU weight vector represented as w_c. After the BMU is determined, the ‘radius of neighborhood’ finds nodes located near the BMU. This radius iteratively decreases with each time-step, while the nodes within this radius have their weights adjusted in accordance with the input feature vector. The change to these weights depends on the distance between the node and the BMU, as shown in Equation 29 for the adjusted node weight.

w_j(t+1) = w_j(t) + α(t) h_{j,wc}(t) (x − w_j(t))    (29)

The iteration step is symbolized by t, while h_{j,wc} gives the topological neighborhood kernel centered at the BMU. The neighborhood kernel function used was a Gaussian expression defined by Equation 30:

h_{j,wc}(t) = exp( −‖r_j − r_c‖² / (2σ²(t)) )    (30)

where r_j and r_c are the map locations of units j and c, and σ(t) is the width of the lattice at time t. Finally, α denotes the learning rate, which monotonically decreases with time and is bounded between [0, 1]. The learning rate used was an inverse-of-time function, shown in Equation 31, where α_0 is the initial learning rate and α_T is the final learning rate. Furthermore, t is the time and T is the training length.

α(t) = α_0 (α_T / α_0)^(t/T)    (31)

As this process undergoes each iteration, the training map is constructed, associating output nodes with clusters of characteristic feature patterns.
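The training loop of Equations 27 through 31 can be condensed into a short sketch; the grid size, initialization and shrinking-radius schedule here are illustrative choices, not the SOMToolbox settings used in the thesis:

```python
import numpy as np

def train_som(data, grid=(8, 8), epochs=20, a0=0.5, aT=0.01, seed=0):
    """Minimal SOM training loop following Eqs. 27-31 (illustrative
    sketch, not the SOMToolbox implementation used in the thesis)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    dim = data.shape[1]
    w = rng.standard_normal((rows * cols, dim))          # node weights
    # 2-D grid coordinates r_j of each node
    r = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    T = epochs * len(data)
    sigma0 = max(rows, cols) / 2.0
    t = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            alpha = a0 * (aT / a0) ** (t / T)            # Eq. 31
            sigma = sigma0 * (1.0 / sigma0) ** (t / T)   # shrinking radius
            c = np.argmin(((x - w) ** 2).sum(axis=1))    # BMU, Eqs. 27-28
            h = np.exp(-((r - r[c]) ** 2).sum(axis=1)
                       / (2 * sigma ** 2))               # Eq. 30
            w += alpha * h[:, None] * (x - w)            # Eq. 29
            t += 1
    return w, r
```

Trained on two well-separated blobs, points from different blobs should map to different best matching units, with each BMU sitting close to its blob.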

As these maps were constructed, it was necessary to find a way to quantitatively assess the “clusterability” of the input data. Though the SOM maps can often be assessed qualitatively, a better validating measure can distinguish between the “clusterability” of different input combinations and time periods of input features.

4. RESULTS AND DISCUSSION

4.1. ICP-EEG Relationships

Initially, results from the SOM maps were not translatable to quantitative information. Referring back to the initial concerns of modeling biomedical data, it was hypothesized that inconsistencies in the SOM map could relate to other states that the brain can react to, including environmental stimuli and other physiologic states. To limit these effects, the input data was further segmented into periods of correlated CBF vs. ICP. These periods were not limited in size, as long as they exceeded an hour of data and the correlation strength was relatively maintained.


With the constraints placed on the modeled EEG data features, only a limited number of SOM models remained. However, from these models, there were several observable clusters, which typically resolved into two groups. These clusters could be seen in the U-matrix, which is constructed from the EEG inputs with no previously conceived knowledge of the class labels. The class label map then verified that these clusters matched changes in the physiologic states in the form of ICP changes with respect to the designated threshold, as shown in Figure 27.

(A = Abnormal, N = Normal)

Figure 27: Self Organizing Map Results


There are several different ways to look at cluster validity. The cluster external validity was the primary concern for this application. External validity compares the clusters to the labels. However, the label map and U-matrix are not directly comparable, as seen in Figure 27. In order to achieve a quantitative estimate of this accuracy, several image processing techniques were combined and performed on the resulting SOM figure. First, the output figure from the SOMToolbox was divided into two separate .png files: one of the U-matrix and the other of the label map. The label map was scarcely a compatible size with the U-matrix. To best fit them, scaling and shifting were used until the fitted outer boundary boxes of both figures matched up, as seen in Figure 28.

Figure 28: U-matrix Shifting and Scaling

This process entailed, first, determining the outer boundary of the two maps. These outer boundaries were determined by converting the image to grayscale, keeping only the darkest pixels and identifying the remaining closed-form shapes, of which the largest was labeled as the whole map boundary. Once this was done for both the U-matrix (Step 1) and the label map (Step 2), the label map boundary was set over the U-matrix (Step 3). As seen in Figure 28, the boundaries of these maps did not align. Finding the difference in the horizontal and vertical positions of the upper left corners of the boundary boxes, the U-matrix was shifted to match the boundary box’s left corner position of the label map (Step 4). Finally, the U-matrix must be resized to fit the label map boundary box (Step 5). Now the two maps were compatible for a better estimation of the external cluster relationship validity; the overlay of the U-matrix and label map is shown in Figure 29.

Figure 29: U-matrix and Desired Output Label Map Overlay


Once the maps have been resized, the labels, label positions and cluster boundaries are needed. To find the actual label locations, the label map was converted to a black and white image. The boundaries around all resulting closed-shape regions were found by connecting the black pixels. Then, a rectangular box was fit to contain each closed form shape, where:

c_bottom,left = (x_min, y_min)
c_bottom,right = (x_max, y_min)    (32-35)
c_top,left = (x_min, y_max)
c_top,right = (x_max, y_max)

where x and y are the coordinates of the closed form shape and c are the boundary box corners.
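Equations 32 through 35 amount to an axis-aligned bounding box around the shape's pixel coordinates; a minimal helper (hypothetical names):

```python
def bounding_box(points):
    """Corners of the axis-aligned box containing a closed shape
    (Eqs. 32-35); points is a list of (x, y) pixel coordinates."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {"bottom_left": (min(xs), min(ys)),
            "bottom_right": (max(xs), min(ys)),
            "top_left": (min(xs), max(ys)),
            "top_right": (max(xs), max(ys))}
```

Each detected letter region then reduces to one such box, whose area later distinguishes the larger ‘N’ labels from the smaller ‘A’ labels.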

The largest boundary box was ignored to exclude the overall map boundary, and all boundary boxes outside that box were also ignored to exclude letters and labels outside of the actual map. The resulting boundary boxes included only the ‘A’s and ‘N’s. To attain the actual label values, a simple division between the larger and smaller remaining boundary boxes was made: the larger boundary boxes correlated with the ‘N’ labels, which have a greater overall area, and the smaller boundary boxes correlated with the ‘A’ labels. An example of the boundary boxes is shown in Figure 30.


Figure 30: Label Location and Text Reader

Now that the desired output labels and locations have been established, the clusters created by the U-matrix can be assessed. In an SOM map, clusters are observed through the boundaries defined by the color scale. By tracing these regions, the number of clusters and their locations can be determined. To do this, the RGB pixels which exceeded a threshold for red values were set to white. Then, the U-matrix was converted to a black and white image, where the lighter pixels, corresponding to the background and cluster boundaries, were set to white, while the remaining blue pixels were set to black. As a result, a similar process could be carried out to detect closed form shapes, yielding the outer perimeter tracings of each distinct cluster. Traced closed form shapes with an area below a specified threshold were excluded from the cluster identification to ignore small text. Finally, of the remaining shapes, those with the maximum x-position were excluded to ignore the SOM’s color bar. The resulting clusters are shown in Figure 31 below.


Figure 31: Cluster Boundary Tracing

Having established these clusters, cluster locations, labels and label locations, an estimated relationship validity can be calculated with respect to the EEG clusters and ICP labels. This estimated validity measure assumes the SOM would classify each cluster region as the majority of the ICP labels within that cluster’s regional boundary. Therefore, each cluster is labeled as an overall ‘N’ or ‘A’ cluster based on the majority of the true labels in that region.

[Figure content: 25 true negatives (51.0%), 0 false negatives, 1 false positive (2.0%), 23 true positives (46.9%); estimated relationship validity 98.0%.]

Figure 32: Confusion Matrix

As a next step, the number of output labels that correctly match the cluster labels is summed. Likewise, the incorrect matches are tallied. The estimated relationship validity is then determined using Equation 36, which compares the actual labels to the cluster classes. This information can be summarized in a confusion matrix, where the output classes are defined by the ICP labels and the target classes are obtained from the EEG SOM clusters. Furthermore, the layout of the confusion matrix and the results from the sample SOM are given in Figure 32.

ERV = (TP + TN) / (TP + TN + FP + FN) · 100%    (36)
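Equation 36 reduces to a simple agreement percentage once each window carries its true label and its cluster's majority label; a sketch:

```python
def estimated_relationship_validity(true_labels, cluster_labels):
    """Estimated relationship validity (Eq. 36): percentage of window
    labels that agree with their cluster's majority label."""
    correct = sum(t == c for t, c in zip(true_labels, cluster_labels))
    return 100.0 * correct / len(true_labels)
```

With 48 of 49 windows agreeing, as in the sample confusion matrix, the measure evaluates to roughly 98%.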

Results for periods of appropriate data length and trend are shown below in Table 8.

Table 8: Estimated Relationship Validity Results

Model | Patient | Time Interval | Threshold | Correlation | Estimated Relationship Validity
1 | 21 | 2.1 hours | 40 mmHg | (-) | 72.3 %
2 | 21 | 4.2 hours | 40 mmHg | (-) | 82.1 %
3 | 21 | 13.8 hours | 40 mmHg | (-) | 97.9 %
4 | 22 | 7.7 hours | 40 mmHg | (-) | 87.2 %
5 | 22 | 2.4 hours | 40 mmHg | (-) | 83.7 %
6 | 22 | 4.2 hours | 40 mmHg | (+) | 98.0 %
7 | 22 | 6.8 hours | 20 mmHg | (+) | 82.3 %
8 | 26 | 9.0 hours | 40 mmHg | (-) | 87.8 %
9 | 26 | 1.7 hours | 40 mmHg | (-) | 81.0 %
10 | 26 | 6.8 hours | 20 mmHg | (-) | 81.1 %

Additionally, the SOM estimated validity over time for one data file is shown in Figure 33, where the time interval was increased for the modeled data. From this figure, the validity measurement increased with the time interval. Overall, these results provide evidence of a complex EEG/ICP relationship that appears dependent upon brain states, which can exist during periods of ICP/CBF correlation. Furthermore, a high estimated relationship validity was found for cases of both positive and negative correlation, with an average relationship validity of 85.3%. However, better validity estimates were found for strongly negatively correlated periods using the 40 mmHg ICP threshold for ICP labeling.

Figure 33: External Validity Time Relationship

Combining divisions of correlated ICP data within the patient data sets with adequate data, the SOM models in Table 9 were assembled, where the validity measure decreased drastically from the individual results in Table 8. As a result, the EEG/ICP relationship can be said to be time-dependent. This time-dependency makes a global patient model even less likely: combining the results of Patient 21 and Patient 22 yielded a relationship validity of 50.5%, a result of cluster boundaries not being strong enough to distinguish with the validation method.


Table 9: Combined Patient Results

Patient | External Relationship Validity
21 | 60.3 %
22 | 51.9 %
26 | 61.1 %

4.2. Data Cleaning Strategies

Measurement-based treatments require reliable and accurate data from biomedical sensors. However, sensor artifacts and distortions from human activity in a dynamic environment are unavoidable. Data-driven methodologies require quality data, necessitating a proper data cleaning strategy to deal with these aforementioned distortions. Biomedical signals have several important aspects to take into account when developing sets of rules and procedures to ensure data quality while maintaining signal integrity:

 Distinctive signal features. Biomedical signals often have their own unique characteristics. Conferring with medical literature and specific medical professionals can bring further insight into what signal characteristics to preserve and what characteristics should be filtered out during the data cleaning process. Medical knowledge can provide information on normal signal amplitudes, as well as how these signals are interpreted in a medical environment, to determine the most important aspects of these signals. For any signal, removing large outliers and distortions requires an understanding of what normal values and behavior of the expected signal look like.

 Events that may occur and what distortions that may cause. Depending on the

patient population and the signal of interest, there are innumerable instances

73

that can cause changes in the signal that are not reflective of the actual

measured values. Knowing the possible courses of treatment and environmental

influences and how they affect these signals is essential in developing an

approach to remove these variables.

 The systematic way that the data is collected. Data outliers and distortions may

not only occur from influences outside of the sensor system, but in the data

collection system itself. Knowing how the data is collected helps determine

additional rules that may be necessary to filter out electric noise and further

understand data collection variables such as distance between points, which is

based on sampling rate.

With these three areas covered, general rules for outlier removal and filtering can be enacted, in a manner similar to this project's ICP processing, to properly clean other biomedical signals. These rules are outlined more generally in Figure 34 and can be considered a standard protocol for removing extraneous information from any biomedical signal. Further filtering, smoothing, or other signal processing strategies would then be selected specifically for the signal of interest. This data cleaning strategy becomes much more difficult for non-stationary signals, whose patterns are less predictable, which is why the acceptance/rejection method was used for the EEG signal. These rules can guide the development of a data cleaning strategy that captures data representative of the measurement of interest and discards data that is not.
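The three rule areas above can be sketched as a single cleaning pass for a stationary signal such as ICP. All thresholds below are placeholders; as noted above, in practice the plausible range would come from medical consultation:

```python
import numpy as np

def clean_stationary_signal(x, lo=0.0, hi=100.0, spike_z=5.0, med_win=5):
    """Illustrative cleaning pass for a stationary biomedical signal.

    lo/hi   : physiologically plausible range (placeholder values only).
    spike_z : z-score cutoff for residual statistical outliers (assumption).
    med_win : odd window length for the median smoothing step.
    Rejected samples are returned as np.nan rather than interpolated.
    """
    y = np.asarray(x, dtype=float).copy()
    # Rule 1: reject values outside the plausible physiological range.
    y[(y < lo) | (y > hi)] = np.nan
    # Rule 2: reject statistical outliers relative to the surviving samples.
    mu, sd = np.nanmean(y), np.nanstd(y)
    if sd > 0:
        y[np.abs(y - mu) > spike_z * sd] = np.nan
    # Rule 3: median-filter the surviving samples to suppress residual noise.
    half = med_win // 2
    out = y.copy()
    for i in range(len(y)):
        if np.isnan(y[i]):
            continue  # leave rejected samples flagged
        out[i] = np.nanmedian(y[max(0, i - half):i + half + 1])
    return out
```

Further, signal-specific processing (band-pass filtering, acceptance/rejection for non-stationary signals such as EEG) would follow this general pass.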


Figure 34: Biomedical Data Cleaning Strategy


5. CONCLUSIONS AND FUTURE WORK

5.1. Research Findings

Through research into biomedical signals, collaboration with experts in the medical field, and the integration of complex systems modeling, evidence for a relationship between ICP and EEG has been presented. This relationship creates the groundwork for further examination of the ICP/EEG relationship, as well as for the incorporation of additional neurophysiologic signals into the model. This approach also established a general framework for biomedical signal data cleaning.

Most stationary biomedical signals will exhibit some of the data cleaning issues presented in these studies. Disconnections and other signal disruptions and distortions are all too common in medical data, where the monitored asset, the patient, undergoes various treatments, procedures, and assessments that can influence data collection. By following the outlined data cleaning strategy, with proper medical consultation on the threshold settings, this process can be applied to any stationary biomedical signal. The data cleaning strategy has been designed to be as general as possible, whereas the signal processing methodologies will necessarily depend more on the signal itself.

Preliminary correlation analysis of the EEG/ICP variables suggests that this relationship is not a simple one, and that more complex modeling and considerations must be taken into account. Results from the Fisher ranking feature selection suggest that time-frequency features, represented by wavelet features, are the best way to explore this relationship. Modeling the relationship also appears to require still more complex considerations to fully understand it. The ultimate goal of these studies was to establish whether a relationship between EEG and ICP exists. Results from the SOM modeling of correlated ICP/CBF segments suggest that this EEG/ICP relationship does, in fact, exist; the estimated accuracy of these data sets ranged from 70.9 % to 97.8 %.
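The Fisher ranking referenced above scores each feature by its between-class separation relative to its within-class spread. A minimal sketch for a binary ICP labeling follows; the feature matrix and labels are illustrative, not the study's actual feature set:

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score of each feature under a two-class labeling.

    X : (samples, features) matrix, e.g. wavelet-energy features per epoch.
    y : binary labels, e.g. 0 = below and 1 = above an ICP threshold.
    Higher scores mark features that better separate the two classes.
    """
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)
    # Avoid division by zero for constant features.
    return num / np.where(den == 0, np.finfo(float).eps, den)

def rank_features(X, y):
    """Feature indices sorted from most to least discriminative."""
    return np.argsort(fisher_scores(X, y))[::-1]
```

The top-ranked indices would then select the feature subset fed to the SOM models.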

5.2. Broader Impacts

Eventually, these results may impact the understanding of treatments in the NICU. Now that evidence of a relationship has been recognized, further analysis of this relationship can be pursued, incorporating some of the suggestions discussed in the following section. This basis of knowledge can begin the expansion of neurophysiologic mapping using complex modeling techniques. Defining this relationship could ultimately lead to the following:

 Development of a way to measure ICP using scalp EEG, eliminating the need to drill into the patient's skull for cranial feedback.

 A definable relationship between ICP and EEG that allows examination of the immediate, minute-by-minute effects of ICP levels on EEG, and of whether these effects influence cognition.

 Improved clinical understanding of this relationship in a precision medicine approach, to establish an ICP intervention threshold unique to each patient given their neurophysiologic feedback.


5.3. Recommendations for Future Work

The work laid out here has been essential in establishing evidence of an EEG/ICP relationship in TBI patients. However, considerable work remains to establish a more detailed map of this relationship, as well as a better understanding of what the discussed brain states are and the conditions under which they occur.

This work focused specifically on determining a relationship between EEG and ICP; however, numerous additional physiological measurements taken in the NICU could be incorporated into the model. These additional signals may be necessary to separate the different neurophysiologic regimes of the EEG signal itself, which was a major concern of the current research and the reason only limited time segments of correlated CBF/ICP data were targeted. Completely mapping out this physiological brain model would be a major contribution to neurophysiologic therapies. This work was also directed toward modeling characteristics of physiological states such as elevated ICP; however, there are multiple other physiologic states and environmental factors that may contribute to further EEG patterns. For example, a patient's EEG patterns will be altered by wakefulness, arousal, sleep, etc., which could be further integrated into the model. These behavioral factors could contribute to different working regimes of the EEG signal itself. If a framework for determining these behavioral states can be formed, a more accurate physiological assessment may be achievable. Such data is not currently covered in the provided data set, but may be necessary to segment data into behavioral as well as physiologic states.


Beyond modifications to the current methodology and further neurophysiologic brain mapping, the presented technique requires several additional steps and review before any clinical application. Part of this entails testing additional features and models: other mother wavelet selections or wavelet features could be tried, and further feature selection techniques could be tested to improve the current model toward the detailed model targeted in the overall signal mapping.

These steps would be crucial in validating this data-driven approach towards discovering these neurophysiologic relationships.

