
Faculdade de Engenharia da Universidade do Porto

Computational Analysis of Magnetic Resonance Images of the Upper Airways: Algorithms and Applications

Jessica Condesso Delmoral

February 2015


Monograph in Preparation for the Dissertation of the Integrated Master in Bioengineering, Branch of Biomedical Engineering, Faculty of Engineering of the University of Porto

Supervisor: Prof. Doutor João Manuel R. S. Tavares, Assistant Professor, Department of Mechanical Engineering and Industrial Management, Faculdade de Engenharia da Universidade do Porto

Co-Supervisor: Dra. Sandra Rua Ventura, Escola Superior de Tecnologia da Saúde do Porto


Abstract

This monograph falls within the field of Computer Vision, specifically the three-dimensional (3D) reconstruction of anatomically relevant structures from medical images. Imaging and quantifying the tongue's anatomy is highly relevant for several reasons: the anatomical and physiological mechanisms that govern the tongue, including its role in breathing and speech production, are still incompletely understood; and such analysis can be applied to surgical planning, post-operative rehabilitation and the study of the adaptations acquired after changes in function of pathological origin, such as tongue cancer, surgical intervention or aging. The sparse knowledge of these mechanisms represents a prominent gap in the established anatomy of the human body. However, advances in Computer Vision over the years have allowed the development of new analysis tools that can be applied to medical images of the human body. These tools, building on improvements in image reconstruction and segmentation, can help fill the gap in the understanding of human tongue functionality. For the study of soft tissues, the state-of-the-art visualization technique is Magnetic Resonance Imaging (MRI), since it provides the best image contrast for soft tissues such as the muscular tissue of the tongue. This report describes the state-of-the-art image processing and computer vision techniques applied in the study of the human tongue, as well as the different stages of this analysis needed to obtain the best quality of analysis and reconstruction. These include various techniques of image registration using the Mutual Information similarity measure, and segmentation through supervised and semi-supervised seed growing and landmark-based procedures, among others.


Contents

ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
NOMENCLATURE

CHAPTER 1 - INTRODUCTION
  1.1. MOTIVATION
  1.2. OBJECTIVES
  1.3. REPORT ORGANIZATION

CHAPTER 2 - HUMAN UPPER AIRWAY ANATOMY
  2.1. ANATOMY OF THE TONGUE
  2.2. NEUROPHYSIOLOGICAL CONTROL OF THE TONGUE
  2.3. SPEECH PRODUCTION, RESPIRATION AND DEGLUTITION
  2.4. FINAL CONSIDERATIONS

CHAPTER 3 - MAGNETIC RESONANCE IMAGING IN THE CONTEXT OF UPPER AIRWAY STRUCTURAL COMPONENTS
  3.1. INTRODUCTION
  3.2. BASIC PRINCIPLES IN MAGNETIC RESONANCE IMAGING
  3.3. RELAXATION TIMES
  3.4. K-SPACE
  3.5. CONTRAST AND TISSUE SIGNAL IN MR
    3.5.1. TR, TE and Pulse Sequences
  3.6. LIMITATIONS AND DETERMINANT CONSIDERATIONS

CHAPTER 4 - STATE OF THE ART
  4.1 INTRODUCTION
  4.2 STATE OF THE ART
  4.3 MRI 3D VOLUMES IMAGE SEGMENTATION TECHNIQUES
    4.3.1. DICOM Standard Overview and Volumetric Data
    4.3.2. Slice-to-Volume Registration
    4.3.3. Super Resolution Volumes
    4.3.4. Segmentation
  4.4 3D RECONSTRUCTION

CHAPTER 5 - FINAL CONCLUSIONS AND FUTURE PERSPECTIVES
  5.1. FINAL CONCLUSIONS
  5.2. FUTURE PERSPECTIVES

REFERENCES

List of Figures

Figure 2.1 - MR Midsagittal image (slice) indicating the vocal tract’s structures. From Rua Ventura et al. (2011).

Figure 2.2 - Side view of the skull. The styloid process is just posterior to the mandible.

Figure 2.3 - Tongue’s attachments and neighboring structures in a sagittal anatomical view. From Gray (1918).

Figure 2.4 - Extrinsic muscle of the tongue. Styloglossus visible at center top (in red). From Gray1019 modified by Uwe Gille.

Figure 2.5 - Muscles of the tongue, from Takemoto (2001). GG genioglossus, T transversus, V verticalis, HG hyoglossus, IL inferior longitudinalis, S superior longitudinalis, PG palatoglossus, SG Styloglossus.

Figure 2.6 - Tongue contour extracted from midsagittal images, during production of vocalic sounds present in Portuguese language. From (Ventura et al., 2008).

Figure 2.7 - Abd-El-Malek (1955) illustration of the preparatory stage of mastication.

Figure 2.8 - Abd-El-Malek (1955) illustration of the throwing stage of mastication.

Figure 2.9 - Abd-El-Malek (1955) illustration of the guarding stage of mastication.

Figure 2.10 - Abd-El-Malek (1955) illustration of the initial stage of deglutition.

Figure 3.1 - Spin orientation of atoms within tissues under no external field (a), and under a horizontal external field (b). Adapted from (Rinck, 2001).

Figure 3.2 - Representation of equilibrium magnetization vector B0, and the modulated magnetic RF field, at Larmor frequency, which forms a spiral magnetic field, oriented in the z axis. Consequent spin precession phenomenon with angle θ of deflection is represented. Adapted from (Rinck, 2001).

List of Tables

Table I - Muscles of tongue movement. From Seikel et al. (2009).

Nomenclature

2D: Two Dimensional
3D: Three Dimensional
CAD: Computer-Aided Diagnosis
DTI: Diffusion Tensor Imaging
GG: Genioglossus muscle
GTF: Game-theoretic Framework
MI: Mutual Information
MRI: Magnetic Resonance Imaging
NMCs: Neuromuscular Compartments
PET: Positron Emission Tomography
RF: Radiofrequency
RW: Random Walker
SPECT: Single-Photon Emission Computed Tomography
US: Ultrasound

Chapter 1 - Introduction

The tongue is a unique anatomical structure among the organs of the human body. It is a specialized organ located in the oral cavity, which plays an important role in respiration, mastication, deglutition and speech production, and also in suction in children. Its ability to execute fast and precise movements during the production of vocalic and consonant sounds, across an extensive variety of languages, each with its characteristic sounds, makes the study of the tongue of great interest and importance. The muscles of the tongue contract in order to deform the body of the tongue itself, unlike most skeletal muscles in the human body, which act as force generators for the movement and stabilization of attached body structures. In speech production, the tongue deforms to modulate the flow and acoustic resonances of air through the vocal tract. In mastication, tongue movements transport the bolus through and around the appropriate surfaces; in deglutition, they propel it into the esophagus. Speech production and swallowing particularly affect survival and quality of life. For these functions to occur, the tongue needs to be able to execute a sequence of organized and integrated motor events, mediated by neuromotor stimuli, which is only feasible if the anatomical and physiological integrity of this structure is preserved. All of these functions are controlled by highly evolved neuromuscular systems under both voluntary and involuntary control. The purpose of this dissertation is to establish an automatic segmentation and 3D volume reconstruction tool for further understanding of how the tongue changes shape in response to muscular contraction, given that researchers have remarked that our knowledge of the tongue is extremely limited.

The tongue, in spite of always being recognized as the primary and absolutely indispensable organ for the articulation of speech sounds, seems to be somewhat left behind in the study of speech. One of the reasons for this may be that the anatomical structure of the tongue is far from simple and that the dynamics of lingual movement are as complex as the tongue itself. (Miyawaki, 1974, 29).


1.1. Motivation

The upper airways, in all of their functions, represent a vital means of survival for humans. This section is part of the respiratory system, limited from the to the trachea. Many functions are secured by the organs that constitute this section, the vocal tract being one of the most important and complex structures. One of its most important structures is the tongue, an organ controlled by complex neuromuscular mechanisms and capable of large deformations of its shape to accomplish the physiological tasks in which it takes part. The study of the detailed anatomy of this organ has recently gained significant relevance. The conformation of the tongue during functions such as deglutition, speech production and respiration has proven to play a key role in their correct execution, and speech impairments, respiratory disturbances and other pathological consequences need to be studied in further depth. Magnetic Resonance Imaging (MRI) is an imaging technique based on nuclear magnetic resonance, a phenomenon discovered independently in 1946 by Felix Bloch (Stanford University) and Edward Purcell (Harvard University), for which they shared the Nobel Prize in Physics in 1952. This technique revolutionized medical imaging, being comparable only to the discovery of X-rays by Wilhelm Conrad Röntgen, and was first applied to medical purposes in the 1970s (Rinck, 2001). Emerging research addresses the functional, mechanical and dynamic properties of the upper airways, and it is well established that targeting the tongue specifically is a matter of high relevance. Currently, there are no tools or exams that allow direct characterization or evaluation of tongue motion. The study from a Computer Vision point of view is therefore of high importance in this field, and the objective is the creation of Computer-Aided Diagnosis (CAD) tools for modelling and quantification.

Many advantages derive from tongue segmentation and reconstruction, ranging from the adequacy of image acquisition through MRI to the three-dimensional analysis needed to understand the tongue's conformation and dynamics. Diagnosis and surgical planning related to the structures of the upper airways therefore present a gap that can be filled through the development of an automatic Computer-Aided Diagnosis tool.

1.2. Objectives

For the development of this work the main goals are:

• Exploration of the potential of magnetic resonance images for the analysis of these structures, regarding their 3D conformation and motion during breathing;
• Description of landmark-based geometric morphometrics;
• Development of an automatic segmentation process for the structures, specifically the tongue;
• Development of a computational analysis of volumetric and quadratic properties of the structures, namely the tongue;
• Demonstration of the viability of this analysis for application as a Computer-Aided Diagnosis (CAD) system.

1.3. Report Organization

The analysis of the anatomy and functionality of the human tongue will be addressed through Magnetic Resonance Imaging, and the various stages of image analysis involved cover a wide spectrum of fields. Chapter 2 presents an overview of the anatomy and functionality of the tongue in a healthy subject. Chapter 3 presents an overview of the basic concepts of Magnetic Resonance image acquisition, which are the most determinant factor for the success of the analytical work developed. Chapter 4 presents an overview of the state of the art of tongue segmentation studies and techniques, starting from the earliest reports, in which the anatomy was poorly described and the complexity of its study was not widely addressed. Only in recent years have developments in Computer Vision allowed studies to address this organ with careful attention, recognizing its anatomy as one of the most complex in the human body. Conclusions and future perspectives for the dissertation work are presented in Chapter 5.

Chapter 2 - Human Upper Airway Anatomy

The human upper airway is regulated by many complex mechanisms and organs that sustain mastication, deglutition, respiration and speech production. The tongue contributes to these abilities by (1) positioning food within the whole vocal cavity, (2) keeping food in position for mastication, together with the buccinator muscle, (3) propelling food to the palate and then posteriorly into the pharynx, initiating deglutition, and (4) changing its conformation in order to alter the sounds produced during speech. The anatomical structure of the vocal tract (Figure 2.1) is well established, the tongue being a central organ of this system that plays a crucial role in the correct functioning of these tasks. The development of the anatomical structure of the human vocal tract is a complex process taking 6–8 years, sometimes as long as 10 years (Lieberman et al., 2001). During this process the shape and position of the tongue change gradually: the newborn tongue is initially flat and positioned almost entirely in the oral cavity; later, as it descends into the pharynx, it acquires a rounded posterior contour, carrying the larynx down with it. Suprapharyngeal horizontal and vertical proportions undergo proportional growth that reaches maturity by the age of 6-8 years (Lieberman et al., 2001). This is confirmed by Vorperian et al. (2009), based on a longitudinal study of 605 subjects using MRI and CT. However, the anatomical study of this structure has largely been ignored: only in very recent investigations have attempts been made to understand the tongue and its role in the execution of these tasks, aided by the redirection of available imaging techniques towards the characterization of this organ. References in the literature that devote some attention to the tongue's anatomy are very scarce. For instance, a gross anatomy of the tongue is found in early general anatomical works (Salter, 1852; Gray, 1918).

Figure 2.1 - MR Midsagittal image (slice) indicating the vocal tract’s structures. From Rua Ventura et al. (2011).

2.1. Anatomy of the tongue

The human tongue is an organ composed primarily of muscle tissue, located in the oral cavity, where it occupies a major portion of the volume. It is attached to the oral cavity through its posterior structures, namely via tendons and neighboring muscles, and to the floor of the mouth through the lingual frenulum. The tongue is attached to the supporting bone structures of this region, specifically the mandible, the hyoid bone and the styloid process of the skull. The styloid process and the bone structure of the skull are shown in Figure 2.2, and the bone attachments of the tongue are depicted in Figure 2.3.

Figure 2.2 - Side view of the skull. The styloid process is just posterior to the mandible.


The posterior connection of the tongue is made by an attachment to the hyoid bone, which is suspended in the larynx structure by muscles and cartilaginous tissue. Anteriorly, the tongue connects to the posterior aspect of the mandibular symphysis. The base of the tongue is connected by fascia to the mylohyoid, the supralaryngeal muscle that lies immediately inferior to the tongue and forms the muscular floor of the mouth. The tongue is a complex arrangement of muscles that can be grouped in two categories: intrinsic muscles, which are part of the tongue itself, have no bone insertions and are responsible for its shape-changing, flattening and lifting abilities; and extrinsic muscles, which are connected to the main structure and attached to bone, and are responsible for protrusion, retraction, lateral movement and shape modification (Seeley et al., 2008). The extrinsic muscles are the genioglossus, hyoglossus, styloglossus and palatoglossus. The remaining muscles, the transversus, verticalis, superior longitudinalis and inferior longitudinalis, are intrinsic to the tongue. A groove, named the terminal groove, divides the tongue into two portions. The portion anterior to the groove corresponds to two thirds of the surface of the tongue and is covered with taste buds containing taste receptor cells. The posterior third, in contrast, is deprived of taste buds, having only some terminal taste receptors on its surface, and is occupied by small glands and a large aggregate of lymphoid tissue belonging to the lingual tonsil.

Figure 2.3 - Tongue’s attachments and neighboring structures in a sagittal anatomical view. From Gray (1918).


The musculature of the tongue has been described as being composed of eight paired muscles, as illustrated in Figures 2.4 and 2.5.

Genioglossus
The genioglossus constitutes the main volumetric portion of the tongue posteriorly, having a fan or wedge shape. It is fixed by a musculo-tendinous origin to the inner surface of the symphysis menti, continuing from root to tip. Its anterior muscle fibers are arranged in a curved antero-dorsal direction, ending among the anterior fibers of the inferior longitudinal, hyoglossus and styloglossus muscles. Its posterior fibers run horizontally and backwards to the root of the tongue, towards the anterior surface of the hyoid bone and the anterior surface of the base of the epiglottis. Intermediate bundles of fibers diverge with different degrees of obliquity between these two portions. Its orientation can be identified in the parasagittal plane.

Hyoglossus
The hyoglossus radiates in a fan-shaped manner in its upper portion and has a quadrangular conformation at its base. Relative to the other tongue muscles, it is positioned medially, between the inferior longitudinal and genioglossus muscles. It arises from the body of the hyoid bone and interdigitates at its origin with superficial and deep fibers of the geniohyoid. In the posterior portion of the muscle, the fibers radiate in an antero-posterior direction. The anterior fibers run in an approximately longitudinal direction and terminate towards the tip of the tongue. The posterior portion therefore lies under cover of the styloglossus, terminating in a fusion with its fibers.

Styloglossus
The styloglossus departs from an insertion in the anterior and lateral surface of the styloid process, close to its apex, and continues in a descending and forward direction into the tongue. Its deep fibers interdigitate with the body muscle of the tongue. After inserting into the tongue, the fibers divide into two bundles. An anterior bundle continues anteriorly along the inferior surface of the inferior longitudinalis, lateral to the hyoglossus, ending in the tip of the tongue. A posterior bundle penetrates the hyoglossus and courses medially into the lingual septum.

Transversus
The transversus is part of the bulk of the tongue, along with the verticalis. It is located between the superior longitudinal muscle, dorsally, and the genioglossus and inferior longitudinal muscles, ventrally. The more superficial muscle fibers take a dorsal direction, and the deepest ones are disposed in a ventral direction.

Verticalis
The verticalis is the other muscle that constitutes the thickness of the tongue, sharing a tight joint surface with the transversus muscle. Verticalis fibers are generally vertical, spreading at their superior and inferior portions. The genioglossus, transversus and verticalis partially overlap with one another.

Figure 2.4 - Extrinsic muscle of the tongue. Styloglossus visible at center top (in red). From Gray1019 modified by Uwe Gille.

Superior Longitudinalis
The superior longitudinalis consists of a thin muscle stratum. Its fibers are directed longitudinally along the lamina propria, although this directionality is not clearly defined and is reported with disagreement in the anatomy literature. The muscle gradually reduces in thickness as it reaches the styloglossus, hyoglossus and inferior longitudinal muscles, laterally in the tongue.

Inferior Longitudinalis
The inferior longitudinalis is a narrow muscle that extends between the paramedian septum and the medial lamella of the lateral septum. It arises medially with the genioglossus muscle, having a lateral attachment from the body of the hyoid bone. It is positioned medially with the hyoglossus muscle. In the middle body of the muscle, it blends with the genioglossus, hyoglossus and styloglossus muscles, forming the tip of the tongue.

The description of the tongue musculature given above is based on the findings reported by Shafik Abd-El-Malek (Abd-El-Malek, 1939). Takemoto (2001) described and illustrated the relative positioning of the muscles, especially well for the extrinsic muscles, noting the difficulty of distinguishing between the genioglossus, transversus and verticalis, and produced a three-dimensional tongue model based on impressions from his tongue dissections, depicted in Figure 2.5. The tongue movements produced by each constitutive muscle have also been established, as presented in Table I.

Table I - Muscles of tongue movement. From Seikel et al. (2009).

Movement | Muscles involved
Elevate tongue tip | Superior longitudinal muscles
Depress tongue tip | Inferior longitudinal muscles
Deviate tongue tip | Left and right superior and inferior longitudinal muscles for left and right deviation, respectively
Relax lateral margin | Posterior genioglossus for protrusion; superior longitudinal for tip elevation; transverse intrinsic for pulling sides medially
Narrow tongue | Transverse intrinsic
Deep central groove | Genioglossus for depression of tongue body; vertical intrinsic for depression of central dorsum
Broad central groove | Moderate genioglossus for depression of tongue body; vertical intrinsic for depression of dorsum; superior longitudinal for elevation of margins
Protrude tongue | Posterior genioglossus for advancement of body; vertical muscles to narrow tongue; superior and inferior longitudinal to balance and point tongue
Retract tongue | Anterior genioglossus for retraction of tongue into oral cavity; superior and inferior longitudinal for shortening of tongue; Styloglossus for retraction of tongue into pharyngeal cavity
Elevate posterior tongue | Palatoglossus for elevation of sides; transverse intrinsic to bunch tongue
Depress tongue body | Genioglossus for depression of medial tongue; hyoglossus and for depression of sides if hyoid is fixed by infrahyoid muscles

Despite the unclear definition of the myoarchitecture, anatomical fiber orientation and 3D arrangement of the tongue, in the last ten years the scientific community has taken a new interest in the comprehensive analysis of this structure. To resolve these disparities, the lingual myoarchitecture has been studied in detail with new recordings obtained through diffusion tensor magnetic resonance imaging, or diffusion tensor imaging (DTI). This technique is very attractive for this type of study since it enables fiber orientation imaging and analysis in vivo. Gilbert and Napadow (2005) report imaging three human tongues statically, and Shinagawa et al. (2008) report imaging single sections of in vivo human tongues at rest and during protrusion. Electropalatography (EPG), X-ray imaging, ultrasound and cine-MRI have been reported in the study of lingual function relative to neighboring structures such as the hard palate (Shinagawa et al., 2008), along with other attempts at visualizing the tongue surface during movement and/or oral functions. Nevertheless, the activation of tongue muscle fibers for the deformation of its body is not well defined, and only in recent years has a clear understanding of these mechanisms in vivo been considered a matter deserving attention.

Figure 2.5 - Muscles of the tongue, from Takemoto (2001). GG genioglossus, T transversus, V verticalis, HG hyoglossus, IL inferior longitudinalis, S superior longitudinalis, PG palatoglossus, SG Styloglossus.


2.2. Neurophysiological control of the tongue

Neurophysiology is an advanced field that addresses the mechanisms governing the motor control system, especially at the level of last-order muscular output. Since the tongue is a purely muscular structure, understanding its complexity requires addressing the complex neural mechanisms of activation that rule its functionality. This analysis is of great importance, since the neural control of tongue movement is crucial to the rhythmic tasks of respiration and swallowing, and disruptions of these mechanisms have been associated with the highest mortality reported among the pathological problems that may arise (Sawczuk and Mosier, 2001). The neuromotor system is based on the activation of motor units, each consisting of a single motor neuron and the assortment of muscle fibers to which it is connected. Through this connection, electrical potential signals are sent along the specific motor neurons innervating the muscle fiber bundles that need to be activated, producing a simultaneous contraction of those fibers. Motor units are organized in motor pools, activated systematically by the central nervous system. Tongue muscle movement, contractile properties and generator-produced rhythmic modulation all derive from the innervation of the hypoglossal motoneuron complex. The motoneurons are clustered in the hypoglossal nucleus, part of the brainstem, from which departs the hypoglossal nerve, the twelfth cranial nerve (XII). The system of motor neurons that innervates this group of muscles is astonishing, evidencing the remarkable complexity of such an important organ in all its functions. The actual number of neurons involved in this structure is reported with high disparity; one study, for instance, places the total number of myelinated fibers at 9,900 (Atsumi and Miyatake, 1987).

In contrast, other larger muscles, such as the biceps or rectus femoris, are innervated by an average of 441.5 and 609 motor units, respectively (Hamilton et al., 2004). Electromyographic (EMG) studies have more recently been carried out in order to understand the complete muscle activity involved. Recent studies report that the genioglossus is the primary upper airway dilator muscle and that its internal motion activation is inhomogeneous. Neuronal control has been extensively studied in the last ten years, and specific conclusions have been established regarding the phases of control of the Hypoglossus. EMG findings reveal that inspiratory neuronal activity begins approximately 250 ms before the inspiratory process begins; during inspiration the neuronal stimulus increases, and during expiration the tonus level is maintained (Cheng et al., 2008). Although this basic neuronal source is established, the tongue is characterized by a complex mechanism of activation that is not yet known, and the greatest difficulty in understanding it is directly related to its anatomical complexity. In fact, the human tongue is not only more complex than that of other mammals, but knowledge of its anatomical nerve activation and gross neuroanatomy is also lacking. The most extensively studied of the tongue muscles is the genioglossus, responsible for protrusion and depression, which has been demonstrated to take part in most tongue movements. It is hypothesized in the literature, although it has not been directly reported, that neural control of the tongue may be achieved, as reported for skeletal muscle control in other mammals, by means of tissue composed of neuromuscular compartments (NMCs) that are morphologically and functionally activated by distinct neuromotor pools, an NMC being defined as the “smallest portion of a muscle to receive exclusive innervations by a set of motoneurons” (English et al., 1993). Mu & Sanders (2000) demonstrated a compartmental organization of the canine tongue, specifically of the innervation of the genioglossus, reporting the presence of two compartments, with horizontally and obliquely oriented fibers, as well as the subdivision of branches departing from the main genioglossus nucleus. This mechanism is reported to underlie the neuromuscular control of shoulder muscles (Wickham & Brown, 2012; Lucas-Osma & Collazos-Castro, 2009); however, even in these anatomically simpler muscles, NMC boundaries are not completely defined. Unfortunately, no careful anatomical data describing the neuronal organization of the human tongue, compartmental or otherwise, are found in the literature.

2.3. Speech Production, Respiration and Deglutition

Speech production, respiration and deglutition are the three main activities carried out by the vocal tract, with the determinant aid of tongue motion. Among these functions, speech production has been the most extensively studied by the scientific community, due to its multidisciplinary character. The human phonetic apparatus may be divided into organs responsible for sound production and organs of speech articulation. Sound production, or phonation, is achieved through the adduction of the vocal folds into the airstream of the airway, a process named vocal attack, followed by their fixation in a specific position that modulates the aerodynamics of airstream passage. The vocal tract acts as an acoustic filter for a source signal generated at the vocal folds within the larynx. Speech production complements simple phonation with the execution of an extremely well-organized and integrated sequence of movements of the speech articulators (lips, mandible, tongue and palatal velum), shaping the resonant cavities of the vocal tract and consequently altering the resulting acoustic output (Seikel et al., 2009). Tongue deformation is directly related to the production of vocalic sounds as well as palatal, velar and pharyngeal consonants. Many studies model tongue conformation during the production of specific sounds present in various languages worldwide, as presented in Figure 2.6 for vocalic sounds of the Portuguese language.
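As a rough numerical illustration of the acoustic-filter view described above, the sketch below computes the resonances of a uniform tube closed at one end (the glottis) and open at the other (the lips), using the standard quarter-wavelength relation F_n = (2n - 1)c / (4L). This idealized tube is not a model used in this report, and the function name, the tube length of 17.5 cm and the speed of sound of 350 m/s are illustrative assumptions rather than values taken from the cited studies.

```python
def uniform_tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonance frequencies (Hz) of an idealized uniform tube that is
    closed at one end and open at the other.

    Assumption: the vocal tract is approximated by a straight tube of
    constant cross-section, which ignores the shaping done by the tongue.
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    if n_formants < 1:
        raise ValueError("n_formants must be at least 1")
    # Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L)
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, n_formants + 1)
    ]


if __name__ == "__main__":
    # Hypothetical tube length of 17.5 cm (assumed value for illustration).
    for index, freq in enumerate(uniform_tube_formants(0.175), start=1):
        print(f"F{index} = {freq:.0f} Hz")
```

With these assumed values, the first three resonances fall near 500, 1500 and 2500 Hz. A shorter tube raises all of them, which hints at why the changes in vocal tract shape produced by the tongue shift the resonances of the acoustic output.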


Figure 2.6 - Tongue contour extracted from midsagittal images, during production of vocalic sounds present in Portuguese language. From (Ventura et al., 2008).

Moreover, the cross-sectional area along the supralaryngeal section of the vocal tract determines the formant frequencies, and records of studies addressing the deformation of the human tongue during speech production go back more than 150 years (Lieberman, 2012). The resonance cavities involved in phonation have accordingly received an extensive amount of research. Studying the anatomy and mechanisms of speech production is of extreme importance for understanding the mechanisms that allow the diversity of phonation capacity and how disturbances to the structures involved, of pathological origin or otherwise, may affect their functionality.

Deglutition consists in the passage of a bolus of food through the vocal tract, which triggers a swallowing reflex as it passes into the region behind the tongue and above the larynx; the larynx elevates, and the epiglottis (attached to the root of the tongue) drops down to cover the aditus. Food bolus formation was illustrated and explained in (Abd-El-Malek, 1955). His observation of subjects masticating nuts, gelatin and chewing gum led to the description of the following steps:
a) Preparatory stage - the tongue acquires a pouch-like form, to collect the food on its dorsum. Illustrated in Figure 2.7.
b) Throwing stage - a twisting movement towards one side deposits the bolus onto the molars. Illustrated in Figure 2.8.
c) Guarding stage - the tongue twists even more, making contact with the upper and lower teeth, in order to keep the bolus between the molars during mastication. Illustrated in Figure 2.9.
d) Bolus formation - after several chewing movements the cheeks move medially and the tongue moves side to side, mixing the bolus with saliva and coating it with mucus.
e) Deglutition - the tip of the tongue is raised and pressed against the posterior surface of the front teeth and the anterior part of the hard palate, so as to close off the mouth and pharynx. Illustrated in Figure 2.10.

Figure 2.7 - Abd-El-Malek (1955) illustration of the preparatory stage of mastication.

Figure 2.8 - Abd-El-Malek (1955) illustration of the throwing stage of mastication.

Figure 2.9 - Abd-El-Malek (1955) illustration of the guarding stage of mastication.

Figure 2.10 - Abd-El-Malek (1955) illustration of the initial stage of deglutition.

Muscle activation during this process is automatic, and important lines of study on deglutition assess the stiffness of the tongue’s surface or the force that the tongue is able to exert on the hard palate.

In humans, respiratory airway activity involves the important task of maintaining patency. Substantial studies suggest that this function is provided by the tongue’s genioglossus muscle (GG). Airway patency is a matter of extreme importance, and delicate to control, since the human pharynx has no rigid support except at its extreme upper and lower ends, where it is anchored to bone (upper extremity to hyoid bone) and cartilage (part of the larynx). The airway therefore depends on 20 skeletal muscles that dilate the oropharynx and keep it open (Dempsey et al., 2010). During respiration, tongue deformation has been analyzed through tagged MRI, a technique that arose later as a modality of MR imaging and allows quantification of physiological motion. Expiratory and inspiratory tasks create pressure differences in the airway, and the muscle tone of the involved structures must maintain adequate compliance. Inspiration generates a negative pressure that manifests at the level of the epiglottis and that has been directly correlated with neuronal firing of the genioglossus (Pillar et al., 2001). Cheng et al. (2008) report the muscle movements activated throughout the respiratory cycle. Their analysis of the genioglossus indicated posterior movement during expiration as opposed to anterior movement during inspiration, whereas the geniohyoid presented very little movement during respiration.

2.4. Final considerations

Airway patency needs to be further analyzed, since the exact mechanisms by which the tongue’s genioglossus maintains airway structure are not fully known. Various imaging methods have been reported in studies addressing airway structural geometry, including the activation of this muscle: endoscopic imaging (Kuna, 2004), X-ray fluoroscopy (Wheatley et al., 1991), acoustic reflection, computed tomography (CT) (Teguh et al., 2011), optical coherence tomography (OCT) (Togeiro et al., 2010), and magnetic resonance imaging (MRI) (Woo et al., 2012; Moon et al., 2010; Arens et al., 2003). Another important challenge in current clinical practice is that the large numbers of target and normal tissue structures present in the head and neck require manual delineation. One example is cancer patients, for whom contouring is tedious and time consuming. Also, certain courses of treatment, such as head-and-neck intensity-modulated radiotherapy, require accurate delineation of those structures, so that a more efficient delineation brings economic benefits besides the obvious improvement to the patient’s treatment.

It is essential to address the various health issues that still need further study, as well as to improve the imaging of the tongue and the understanding of the related mechanisms of regulation and function. Understanding speech disorders (Kent, 2004), understanding sleep apnea (Saboisky et al., 2007; Dempsey et al., 2010), planning and practicing surgery with computer models (Rodrigues et al., 2001), and understanding problems in tongue movement following surgery (Rodrigues et al., 1998) are some examples of problems that could be addressed in further studies of the tongue.

- Magnetic Resonance Imaging in the context of Upper Airway structural components

3.1. Introduction

The development of novel tissue imaging techniques has made it possible to study the in vivo anatomy of living organisms. Magnetic resonance imaging (MRI) is a diagnostic method that uses strong magnetic fields and radiofrequency (RF) waves to form images of the human body. It is a non-invasive imaging method with a wide range of potential clinical applications, and is nowadays well established among physicians for the evaluation and characterization of soft tissues. The technique presents major advantages over conventional imaging methods: it uses non-ionizing radiation, provides greater soft tissue contrast, and enables an analysis of the three-dimensional structures surrounding the upper airway. Compared with other imaging techniques, MRI yields more informative images, which allows the analysis to be oriented towards monitoring the respiratory airway during sleep and towards the structures that play a determinant role in the normally functioning upper airway, in comparison with the pathological upper airway. It therefore adds tremendous value to the screening, diagnosis, surgical planning and follow-up of patients for a variety of pathologies of the upper airways. A particular case where this imaging technique is advantageous and necessary is its application to children, in whom pathological scenarios may appear during the development of the upper airways and for whom non-ionizing radiation is preferable. Despite these advantages, the use of MRI is not as common as once envisioned, mainly because of the high cost of the technique.

This chapter describes the physical principles on which the technique is based, as well as the variable aspects that affect its quality and adequacy, in order to better understand its adaptability and potential for imaging the human tongue.

3.2. Basic Principles in Magnetic Resonance Imaging

The rotational movement (spin) of the protons in the nuclei of 1H atoms implies that each of them has an associated magnetic dipole moment (m.d.m.). The most abundant atoms in tissues are 1H atoms, with spin 1/2, which makes them more sensitive to the magnetic fields applied in Magnetic Resonance (MR). When a magnetic field is applied, the spins in a given volume element go from a state of null magnetization to a state in which the m.d.m.s tend to align with the orientation of the field (illustrated in Figure 3.1), so that the net magnetization becomes different from zero.

Figure 3.1 - Spin orientation of atoms within tissues under no external field (a), and under a horizontal external field (b). Adapted from (Rinck, 2001).

Most spins align parallel to the field, although a fraction of them does not. The spins also precess around the field direction at a given frequency, called the Larmor frequency (Rinck, 2001). An external pulse, applied as an oscillation of the magnetic field in the radiofrequency range at the Larmor frequency of those spins, forces them to precess in phase, which gives rise to the MR image signal.
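As a rough illustration of the Larmor frequency mentioned above, the precession frequency scales linearly with the strength of the static field, f = (γ/2π)·B0. The sketch below assumes the commonly quoted value γ/2π ≈ 42.58 MHz/T for 1H; the function name and the field strengths are illustrative choices and are not taken from the cited sources.

```python
# Gyromagnetic ratio of 1H divided by 2*pi, in MHz per tesla (assumed textbook value).
GAMMA_BAR_1H_MHZ_PER_T = 42.58


def larmor_frequency_mhz(b0_tesla: float) -> float:
    """Return the 1H Larmor frequency in MHz for a static field B0 given in tesla."""
    return GAMMA_BAR_1H_MHZ_PER_T * b0_tesla


# Illustrative field strengths only.
for b0 in (1.5, 3.0):
    print(f"B0 = {b0} T -> f = {larmor_frequency_mhz(b0):.2f} MHz")
```

The RF pulse has to be tuned to this frequency for the spins to be driven into phase precession.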

Figure 3.2 - Representation of equilibrium magnetization vector B0, and the modulated magnetic RF field, at Larmor frequency, which forms a spiral magnetic field, oriented in the z axis. Consequent spin precession phenomenon with angle θ of deflection is represented. Adapted from (Rinck, 2001).

The phenomenon can be examined in physical terms by considering Figure 3.2, where the magnetization vector lies along the z axis and precession makes the spins rotate around that axis with a deflection angle θ in the vertical plane containing it. In MR equipment, an antenna positioned in the xy plane therefore detects a variable electromagnetic field, producing an oscillatory signal that corresponds to the MR image signal. The method thus detects the energy released by relaxation, which occurs when the radiofrequency (RF) pulse ends and the spins start to relax to the minimum energy state.

3.3. Relaxation Times

There are two types of tissue relaxation. Longitudinal or spin-lattice relaxation (characterized by the time T1) is the recovery of the z component of the magnetization, Mz, after the magnetization has been tipped into the xy plane. Transverse or spin-spin relaxation (characterized by the time T2) arises from the additional dephasing of the magnetization induced by interactions between the spins of neighboring protons, which experience slightly different magnetic fields and therefore rotate at correspondingly different Larmor frequencies. This continuous loss of phase coherence becomes gradually more prominent with time. As a consequence, the T2 relaxation time is always shorter than T1: the process starts with the magnetization in the xy plane, which tends to zero, followed by an increase of the longitudinal magnetization until equilibrium is reached along the z axis. T1 relaxation results from the interaction with the lattice of atoms in the tissue, and is characterized by the evolution of the magnetization vector Mz through time, given by:

$M_z(t) = M_0 \left(1 - e^{-t/T_1}\right)$    Eq. 1

Figure 3.3 - Longitudinal relaxation, T1 feature described in Eq. 1.

This equation describes a recovery profile that tends to the thermodynamic equilibrium value of the magnetization. Evaluating Eq. 1 at t = T1 gives a factor of [1 - (1/e)], meaning that T1 is the time at which the longitudinal magnetization has recovered 63% of its equilibrium value (Rinck, 2001).
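The 63% figure can be checked numerically. The following is a minimal sketch of the exponential recovery of Eq. 1, in which `m0` denotes the equilibrium magnetization (the prefactor of Eq. 1) and the T1 value is a hypothetical number chosen only for illustration.

```python
import math


def longitudinal_recovery(t: float, t1: float, m0: float = 1.0) -> float:
    """Longitudinal magnetization after time t: m0 * (1 - exp(-t / t1))."""
    return m0 * (1.0 - math.exp(-t / t1))


t1_ms = 900.0  # hypothetical T1 in milliseconds, for illustration only
for k in (1, 2, 3, 5):
    fraction = longitudinal_recovery(k * t1_ms, t1_ms)
    print(f"t = {k} x T1 -> Mz / M0 = {fraction:.3f}")
```

At t = T1 the recovered fraction is 1 - 1/e, and the magnetization approaches its equilibrium value only after several multiples of T1.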

3.4. K-space

Spatial encoding of the image is another part of the acquisition mechanism, and includes:
- Slice selection - a gradient is applied in the direction perpendicular to the slice to be retrieved (along the z axis for an axial slice); the position of the slice is selected by the frequency of the pulse, and its thickness by the pulse bandwidth.
- Frequency encoding - a first gradient is applied along a specific direction, so that the signals emitted by the different volume elements are characterized by different frequencies.
- Phase encoding - a second gradient is applied along another direction, so that the different volume elements along that direction are characterized by different phases.

Therefore, for an axial acquisition, slice selection is done along the z axis, and the x and y axes are responsible for frequency and phase encoding. Each amplitude of the phase-encoding gradient Gy corresponds to a line (y position), and the frequency-encoding gradient determines the value of each column (x position) of that line; in this way the acquired samples are stored in a matrix called k-space. In the image reconstruction, each combination is afterwards mapped to its position, and its amplitude to the corresponding intensity, by applying the Fourier transform to the 2D distribution (Bernstein et al., 2004). The design of appropriate gradients is essential so that k-space samples can be acquired and then inverse Fourier transformed to obtain an image of the magnetization M(x, y). K-space must be sufficiently sampled according to the Nyquist criterion to avoid aliasing in the object domain. The extent of k-space coverage determines the resolution of the image.
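The relation between k-space and the image can be sketched with a discrete Fourier transform. The example below is only an illustration under simplifying assumptions (a synthetic 2D object, fully sampled Cartesian k-space, no noise or relaxation effects); it uses NumPy's FFT routines and shows that restricting the extent of k-space coverage degrades the reconstruction.

```python
import numpy as np

# Synthetic "object": a bright rectangle on a dark background (illustrative only).
image = np.zeros((64, 64))
image[24:40, 20:44] = 1.0

# Forward model: fully sampled k-space is the 2D Fourier transform of the object.
# fftshift moves the zero-frequency sample to the centre of the matrix.
kspace = np.fft.fftshift(np.fft.fft2(image))

# Reconstruction: inverse 2D Fourier transform of the k-space matrix.
recon_full = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace)))

# Keep only the central 16 x 16 samples: smaller k-space extent, lower resolution.
mask = np.zeros(kspace.shape, dtype=bool)
mask[24:40, 24:40] = True
recon_low = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace * mask)))

print("max error, full k-space:", np.max(np.abs(recon_full - image)))
print("max error, central k-space only:", np.max(np.abs(recon_low - image)))
```

With full coverage the object is recovered up to numerical precision, whereas the truncated k-space yields a blurred version with visible ringing at the edges.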

3.5. Contrast and tissue signal in MR

Contrast in MR arises from the specific relaxation phenomena that occur in the different tissues: it depends on the relaxation times T1 and T2 and on the proton density, which are characteristic of and intrinsic to each type of tissue. Tissues contain large numbers of chemical components that contribute to the measured magnetic resonance signal, and this composition characterizes each tissue type. MR images are acquired through specific sequences of RF pulses; because the relaxation phenomena are time dependent, contrast can be adjusted by choosing specific combinations of temporal acquisition parameters. In conventional MRI acquisition these phenomena are also influenced by technical, biologically extrinsic, factors of the acquisition. These include the magnetic field strength and homogeneity, and contrast is crucially determined by the pulse sequence parameters TR (repetition time), TE (echo time), TI (inversion time) and FA (flip angle). Since the discovery of the technique, the main objective has been to combine these parameters so as to emphasize a given contrast-determining factor, a given relaxation phenomenon, or a set of different factors.

3.5.1. TR, TE and Pulse Sequences

An acquisition pulse sequence consists of a sequence of signals sent to the tissues by the MR machine. It is made of repeated RF pulses, each of which causes a free induction decay (FID) characterized by a specific initial amplitude that depends on the pulse sequence parameters. The two time parameters that determine this method are the repetition time (TR) and the echo time (TE) of the pulse sequence. TR is the time interval between two successive RF pulses, and TE is the time at which the echo signal, the signal produced by induction of the spinning protons, reaches the detector of the machine and is measured. TR therefore determines the degree to which the protons relax back into alignment with the magnetic field: given the specific relaxation rates of the tissues, a TR shorter than the time needed for full relaxation decreases the signal retrieved from the analyzed tissues.
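The influence of TR and TE on the retrieved signal can be illustrated with the usual simplified spin-echo approximation, S ≈ PD · (1 - e^(-TR/T1)) · e^(-TE/T2). This expression is a standard textbook simplification and is not stated in the text above; the tissue parameters and the (TR, TE) settings below are hypothetical values chosen only to show the trend.

```python
import math


def spin_echo_signal(pd, t1_ms, t2_ms, tr_ms, te_ms):
    """Approximate spin-echo signal: PD * (1 - exp(-TR/T1)) * exp(-TE/T2)."""
    return pd * (1.0 - math.exp(-tr_ms / t1_ms)) * math.exp(-te_ms / t2_ms)


# Hypothetical tissues (PD, T1 and T2 chosen only for illustration).
tissues = {
    "tissue A": {"pd": 1.0, "t1_ms": 900.0, "t2_ms": 50.0},
    "tissue B": {"pd": 1.0, "t1_ms": 300.0, "t2_ms": 80.0},
}

# Two hypothetical (TR, TE) settings, in milliseconds.
settings = {
    "short TR, short TE": (500.0, 15.0),
    "long TR, long TE": (3000.0, 90.0),
}

for name, (tr, te) in settings.items():
    signals = {
        label: spin_echo_signal(tr_ms=tr, te_ms=te, **params)
        for label, params in tissues.items()
    }
    print(name, {label: round(value, 3) for label, value in signals.items()})
```

Under this approximation, shortening TR reduces the signal of the tissue with the longer T1 the most, which is the mechanism by which the choice of TR and TE shapes the contrast between tissues.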

3.6. Limitations and determinant considerations

The growing interest in the tongue’s role in breathing, swallowing and speech production has raised the importance of imaging the upper airway and its structures with the best imaging technique available. For the correct imaging of such complex structures, good contrast between tissues is fundamental to distinguish the different structures at their correct boundaries. These factors are of extreme importance for the dissertation work proposed here, since rigorous imaging of the structures under study determines the correct functioning of the subsequent computational tasks that retrieve the target structures. In spite of the image quality constraints referred to above, MRI is considered the best non-invasive and accurate imaging technique available for imaging the muscular organ under study.

- State of the Art

4.1. Introduction

The processing and analysis of medical images is a novel field that has gained relevance over the years, with remarkable developments in computer-aided diagnosis, in the improvement of imaging techniques, and in the analysis of aspects that cannot be visualized or retrieved by plain image observation. Volumetric imaging techniques can be used to reconstruct three-dimensional structures from serial two-dimensional images. This section provides a conceptual overview of those techniques by illustrating the reconstruction of the airway structures. Segmentation of the target anatomical structures from MRI is still a challenging process. Various methods for the segmentation of static MR images and volumes have been reported (Balafar et al., 2010). Their application to the segmentation of the tongue in particular is reported in a scarce number of instances, which highlights the need for further study of this organ and for the development of adequate tools.

4.2. State of the art

The imaging study of the tongue is a very underdeveloped field, which has limited the anatomical and functional characterization of this organ. Recent developments in Computer Vision and Machine Learning have made available new tools for computational image analysis, namely for 3D volume segmentation and reconstruction. The first imaging reports of the tongue were made through ultrasound (US) imaging (Sonies, 1981), with subsequent applications to the analysis of swallowing and articulation tasks using snakes (Unser and Stone, 1992) and using scale-space filtering for edge detection (Kelch and Wein, 1993). The main applied studies that specifically address this structure are speech studies, for which US imaging presented the best characteristics for the dynamic acquisition of multiple frames during speech production exercises. The first 3D modelling and reconstruction of the tongue was reported in (Watkin and Rubin, 1989), which describes a trigonometric transformation of the 2D coordinates into a volume. Later, more advanced segmentation methods were described by Akgul et al. (1998) and, more recently, segmented 2D motion analysis applying Markov random fields was described by Tang et al. (2012). Despite the demonstrated applicability of US to tongue modelling, further study of the tongue’s anatomy requires an imaging technique with higher contrast and resolution, such as MRI, which is more adequate for the intended study.

The first tongue anatomy imaging through MRI was reported in (Lufkin et al., 1983). The analysis of tongue anatomy and function has been reported in studies using both static volumetric MRI, the standard modality for 3D imaging, and cine or tagged MRI, a modality that has been extensively used for the temporal characterization of the tongue’s anatomy. Image analysis studies based on dynamic acquisitions reinforce the need for a proper segmentation in 3D studies to evaluate the dynamic processes the tongue is responsible for, such as swallowing and speech production (Lee et al., 2014). Other studies aim to support the biomechanical modelling of this structure, and therefore chose a high resolution imaging modality such as static volumetric MRI (Harandi et al., 2014). Given the emerging interest in the tongue’s deformation and functionality, the requirement for an automated method of image analysis for this kind of anatomic data is expected to gain relevance rapidly (Woo et al., 2012).

Reported studies on the segmentation of the tongue focus on both static and dynamic acquisitions. Dynamic acquisitions are of obvious relevance for the characterization of tongue motion. The processing needed is common to both, since it is always based on 2D image segmentation. Stone et al. (2010) is one of the first reports that focuses strictly on tongue segmentation, and establishes the relevance of this study for motion patterns during speech production. In this 2D study the images were simply registered and aligned through a landmark-based transformation algorithm, followed by principal component analysis for the motion study. The processes reported are usually divided into basic phases: 1) Resolution-wise pre-processing, 2) Registration, 3) Segmentation, 4) 3D volume reconstruction.

Lee et al. (2014) report an isotropic super-resolution volume reconstruction from dynamic tagged-MRI images, performed in order to address inter-slice resolution. The limitation of long acquisition time, extensively mentioned throughout this report, was tackled through the acquisition of three images with 6.0 mm thickness, which obviously affected the through-plane resolution. An up-sampling in the through-plane direction was developed using a fifth-order B-spline interpolation. Registration for inter-slice alignment is reported in various studies (Lee et al., 2014; Woo et al., 2012), where the Mutual Information (MI) similarity measure is applied to register sagittal with axial and coronal volumetric image stacks. After registration, a final intensity correction is made using a local intensity matching algorithm, followed by segmentation with the Random Walker (RW) algorithm.

The Random Walker algorithm was also used for the segmentation of 3D super-resolution volumes in Woo et al. (2012), for similar purposes and due to its attractive features. Tagged MRI is not adequate for these studies, given its implications for volume reconstruction, since its image quality is very low when compared with static volumetric MRI. A mesh modelling approach is reported in Harandi et al. (2014), in which the registration technique starts from an initial source model of the tongue. External forces are applied to its vertices, driving them towards the target boundaries through a process dictated by local intensity profile registration, with positions computed through normalized cross-correlation, and finalized by shape matching. The advantage of this algorithm is that it allows user input to correct the positioning of the mesh nodes. The most recent published study attempted to go further in the investigation of functional behavior, and describes a novel method for the segmentation of individual tongue muscles (Ibragimov et al., 2015), specifically the genioglossus and the inferior longitudinalis. That work adapted to muscle segmentation the game-theoretic framework (GTF) algorithm, which is based on landmark-based segmentation.

4.3. MRI 3D volume image segmentation techniques

Computer-aided modelling of the oropharyngeal structures is beneficial for 3D visualization and for the understanding of the associated physiology. Medical images are retrieved in a universal format, organized according to a predefined standard. The studies that address image segmentation of the tongue are limited, and therefore an overview of them is presented in the following points.

4.3.1. DICOM Standard Overview and Volumetric Data

The process of imaging has become extensive, including a wide variety of formats, imaging techniques and post-acquisition procedures. For this reason, in addition to the communication and network storage system named Picture Archiving and Communication System (PACS), a common format was created that allows correspondence between stations and safe data transfer. A PACS is essentially a network system that allows digital or digitized images from any modality to be retrieved, viewed and analyzed by an appropriate expert system at different workstations. This communication is safeguarded by DICOM (Digital Imaging and Communications in Medicine), a standard for the communication and management of medical imaging information and related data (ISO 12052). It was developed by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA), who in 1983 formed a working group with the objective of developing a model that would allow a fully digital workflow for image exchange. The initial ACR-NEMA versions, of which version 2.0 was published in 1988, created standardized terminology, an information structure and file encoding, whereas version 3.0 of the standard, published in 1993, finally addressed the standardized communication of digital image information. DICOM is defined as a set of standards for the treatment, storage and transfer of medical images and associated information in an electronic format, and was created with the purpose of standardizing the formatting of diagnostic images so that they can be exchanged between equipment, computers and hospitals (NEMA). The standard is of interest to a variety of medical fields, including cardiology, dentistry, endoscopy, mammography, ophthalmology, orthopedics, pathology, pediatrics, radiation therapy, radiology and surgery.

From the point of view of the scientific community, this standard enabled an open architecture for imaging systems, bridging hardware and software entities and allowing interoperability for the transfer of medical images and associated information between disparate systems (Dreyer et al., 2006). Furthermore, in the fields of Image Processing and Computer Vision, analysis and processing tools can now be developed in a standardized way, without format or organizational issues.

The data structure of a DICOM file consists of a set of data elements. A header portion includes general data elements related to the image. The image data is also contained in a data element, or in several data elements if the DICOM file holds more than one image. Each data element is stored as depicted in Figure 4.1 (The DICOM Standard, 2015). The header is followed by a dataset, which represents the content of the file. The dataset can be an image, a presentation state, a structured report or another DICOM object. For reading, the format relies on a data dictionary that stores all tag groups, so that every data element can be read correctly. Each kind of image (CT - computed tomography, MR - magnetic resonance tomography) has an identifier, as does each instance of such a class. There is no definition of 3D data storage in the DICOM standard.

Figure 4.1 - DICOM data set structure consists of several data elements.

A volume is usually represented by an ordered series of 2D DICOM files that correspond to parallel slices of the volume, each of which may have multiple components of the same size and representation.
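Assembling a volume from such a series amounts to ordering the slices along the slice axis and stacking their pixel arrays. The sketch below is a minimal illustration using NumPy only: the `z_mm` and `pixels` fields are hypothetical stand-ins for the slice position and pixel data that a DICOM reader would provide, and no particular DICOM library is assumed.

```python
import numpy as np

# Hypothetical slice records, deliberately listed out of order. In practice the
# position and pixel data would be read from the data elements of each file.
slices = [
    {"z_mm": z, "pixels": np.full((4, 4), z)}
    for z in (6.0, 0.0, 3.0, 9.0)
]


def stack_slices(records):
    """Sort 2D slices by position and stack them into a (slice, row, column) volume."""
    ordered = sorted(records, key=lambda record: record["z_mm"])
    shapes = {record["pixels"].shape for record in ordered}
    if len(shapes) != 1:
        raise ValueError("all slices must have the same size")
    volume = np.stack([record["pixels"] for record in ordered], axis=0)
    positions = np.array([record["z_mm"] for record in ordered])
    return volume, positions


volume, positions = stack_slices(slices)
print(volume.shape, positions)
```

The slice spacing follows from the differences between consecutive positions, which is the information needed later to interpret the through-plane resolution of the volume.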

4.3.2. Slice-to-Volume Registration

As previously stated, one of the issues that arise from MRI acquisition is the rather long time necessary to retrieve each 2D k-space image. Volumetric MRI data consists of a series of 2D images corresponding to a series of tissue slices of a given thickness. The slices are acquired consecutively, so the acquisition is very sensitive to tissue motion, which will almost inevitably cause some degree of inter-slice misalignment. In medical image processing this issue is covered by image registration, a topic to which extensive research has been devoted over a time span of 25 years, with applications to computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), single photon emission computed tomography (SPECT), and later also a growing number of applications to ultrasound (US) imaging (Pluim and Fitzpatrick, 2003). Other functional applications include intervention and treatment planning, computer-aided diagnosis and disease follow-up, surgery simulation, atlas building and comparison, radiation therapy, assisted and guided surgery, and finally, as will be applied in this work, registration-based segmentation (Oliveira and Tavares, 2012). Image registration has also been coupled with many organ or tissue imaging applications: the bibliography reports studies developed for brain (Jiang et al., 2007; Jenkinson et al., 2002), retina (Can et al., 2002; Hendargo et al., 2013), lung (Wang and Gu, 2013), breast (Hopp et al., 2013), abdomen (Joshi et al., 2013) and heart (Bai et al., 2013) imaging, among others.

In many cases registration is used to align different types of images, in order to retrieve and complement the information obtained from each one. The present study addresses a variant of image registration in which registration is made within the image under study, aligning the volumetric slices in order to establish coherence throughout the retrieved volumetric image. Image registration in this sense has been used in many imaging studies, such as for brain (Jiang et al., 2007), cardiovascular (Monti et al., 2008), prostate (Fei et al., 2003) and upper airway (Lee et al., 2014) images. This process is needed when correcting through-plane resolution, and it corrects for subject motion between acquisitions. Accurate registration is of great importance in this application, because small perturbations in alignment increase the variance of the intensity values at each spatial location and can lead to visible artifacts after applying the MAP-MRF reconstruction algorithm.

Mutual information (MI) (Maes et al., 1997) is one of the most popular similarity measures, and reports show that it has been successfully employed for non-rigid registration, although the technique has its limitations. A registration method using a mesh-to-volume technique represents a different approach to landmark generation, adapting a deformable surface model to the target volume. This registration is used in Harandi et al. (2014), where the positions of the mesh nodes are calculated from local gradient intensity profiles along the normal to the mesh surface.
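How MI acts as a similarity measure can be sketched with a joint-histogram estimate. The example below is a simplified illustration and not the implementation of any of the cited studies: it assumes a synthetic 2D image, a purely translational misalignment applied as a circular shift, an arbitrary number of histogram bins, and an exhaustive search over integer shifts.

```python
import numpy as np


def mutual_information(a, b, bins=32):
    """Estimate MI (in nats) between two equally sized images from their joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()              # joint probability
    px = pxy.sum(axis=1, keepdims=True)    # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of b, shape (1, bins)
    nonzero = pxy > 0                      # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))


# Synthetic "fixed" image: a smooth blob plus a little noise (illustrative only).
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
fixed = np.exp(-((yy - 30) ** 2 + (xx - 28) ** 2) / 200.0) + 0.05 * rng.random((64, 64))

# "Moving" image: the fixed image circularly shifted by 5 columns.
moving = np.roll(fixed, 5, axis=1)

# Exhaustive search over candidate shifts; the best one maximizes MI.
scores = {s: mutual_information(fixed, np.roll(moving, s, axis=1)) for s in range(-8, 9)}
best_shift = max(scores, key=scores.get)
print("best shift:", best_shift)
```

In an actual registration the exhaustive search would be replaced by an optimizer over the parameters of a rigid or non-rigid transformation, but the criterion being maximized is the same.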

4.3.3. Super Resolution Volumes

The time constraints of MRI acquisition protocols translate into images of limited resolution, and the high sensitivity of the acquisition to motion, which is likely to occur because of swallowing, degrades the acquired images. A 3D acquisition of the upper airway (head and neck imaging) takes at least 4-5 minutes. Keeping the tongue immobilized for such a time span is difficult, and involuntary motion and/or swallowing are likely to occur. Super-resolution algorithms can be categorized as being based on non-uniform interpolation, frequency domain, or spatial domain analysis methods. 3D MR images of the tongue can be produced from sets of orthogonal volumetric images, acquired at a lower resolution and combined using super-resolution techniques, as described in Woo et al. (2012). The production of super-resolution volumes may also imply adaptations of the acquisition protocols in order to obtain, for instance, volumetric acquisitions with specific target areas of super-resolution, as reported in Ibragimov et al. (2015) as an adapted kind of the orthogonal acquisition of (Woo et al., 2012). The success of this processing step strongly determines the success of the subsequent segmentation step.

4.3.4. Segmentation

Image processing includes segmentation tasks, in which the objective is to partition an image into a finite number of regions that are relevant in the scope of the image, such as anatomical or functional structures in medical images. Image segmentation can be defined as the process of decomposing an image into labeled regions, such that some measure of homogeneity is high inside each region and heterogeneity among different regions is maximal. When it comes to delimiting the airway contour, the process is hindered by the possible non-identification of organs or anatomic parts, or by the artificial inclusion of non-existing parts. The air-tissue boundaries of the vocal tract are hard to extract due to the similarity of the anatomic structures around them. High resolution MRI is known to provide a good representation of muscle anatomy. However, image quality often has to be traded against the acquisition of volumetric data when the image acquisition protocol is defined. This lowers boundary resolution and contrast, since pixel intensities are obtained by averaging the signal over each TR and over the space of the target volume. Problems in the segmentation of this structure may arise from poor visibility of the interface between the muscle and neighboring structures, intensity mismatches, blurring, blank regions, etc.

A supervised segmentation algorithm analyzes example training data and produces an inferred function that maps new data. Supervised segmentation algorithms typically operate under one of two paradigms for guidance: 1) specification of a portion of the boundary of the target object; 2) specification of a small set of pixels belonging to the desired object and (possibly) a set of pixels belonging to the background. Supervised algorithms therefore use only data labeled by one of these methods. Particular variants are also relevant in this study, such as semi-supervised algorithms, which also make use of unlabeled data for training, typically a small amount of labeled data together with a large amount of unlabeled data (Zhu et al., 2003). Within these categories, segmentation can be based on seed growing approaches, which require an operator/user to empirically select seeds and thresholds. Pixels around the seeds are examined and included in the region if they are within the thresholds, sometimes with the additional requirement that they be sufficiently similar to the pixels already in the region. Each added pixel then becomes a new seed whose neighbors are inspected for inclusion in the region. The random walker algorithm, which also starts from user-defined seeds, falls under this category.
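The seed growing procedure described above can be illustrated with a minimal sketch. It assumes a 2D image stored as a list of rows, a 4-neighborhood, and a fixed intensity interval as the inclusion criterion; the function name `region_grow`, the toy image and the thresholds are hypothetical choices made for illustration only.

```python
from collections import deque


def region_grow(image, seeds, low, high):
    """Grow a region from seed pixels on a 2D image (list of rows).

    A pixel joins the region when its intensity lies in [low, high];
    every accepted pixel then acts as a new seed for its 4-neighbors.
    """
    rows, cols = len(image), len(image[0])
    region = set()
    queue = deque()

    # Accept only the seeds that satisfy the threshold criterion.
    for r, c in seeds:
        if low <= image[r][c] <= high and (r, c) not in region:
            region.add((r, c))
            queue.append((r, c))

    # Breadth-first expansion: inspect the neighbors of each accepted pixel.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if low <= image[nr][nc] <= high:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Toy image (hypothetical values): a dark structure in the upper-left corner.
image = [
    [10, 12, 11, 80],
    [11, 13, 82, 85],
    [12, 81, 84, 83],
    [90, 88, 86, 87],
]
tongue = region_grow(image, seeds=[(0, 0)], low=5, high=20)
print(sorted(tongue))
```

In practice the seeds and thresholds are chosen empirically by the operator, which is the main source of user dependence in this family of methods.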

Random Walker segmentation algorithm

The Random Walker (RW) algorithm, described in Grady (2006), has been applied in several studies to the segmentation of the upper airway. Several characteristics make it well suited to this task: low computational cost, flexibility, simple user interaction, and a very accurate segmentation obtained with minimal interaction, through user-defined seeds. The insufficient image contrast between the structures to be segmented, such as the tongue and the adjacent soft tissues at its periphery, makes the segmentation task challenging (Lee et al., 2014). The algorithm performs a K-way image segmentation and is semi-automatic, since it requires user-defined regions corresponding to the K objects. The user defines these regions by marking a small number of pixels with labels as seeds (on the tongue and the vocal cavity). The algorithm represents the image as a graph and uses harmonic energy-minimizing functions, where low energy corresponds to a function that varies slowly over the graph (Zhu et al., 2003).
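The idea can be sketched on a toy image as follows. This is a minimal illustration under stated assumptions, not the implementation of Grady (2006): intensities are assumed normalized to [0, 1], edge weights use a Gaussian of the intensity difference with a hypothetical parameter `beta`, and the harmonic function is approximated by repeatedly replacing each unseeded pixel with the weighted mean of its 4-neighbors (a Gauss-Seidel iteration) instead of solving the sparse linear system directly.

```python
import math


def random_walker(image, seeds, beta=50.0, sweeps=500):
    """Label every pixel of a 2D image from a few seeded pixels.

    image: list of rows with intensities in [0, 1] (assumption).
    seeds: dict mapping (row, col) to a label.
    """
    rows, cols = len(image), len(image[0])
    labels = sorted(set(seeds.values()))

    def neighbors(r, c):
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                yield nr, nc

    def weight(r, c, nr, nc):
        # Similar intensities give a weight near 1, strong edges near 0.
        diff = image[r][c] - image[nr][nc]
        return math.exp(-beta * diff * diff)

    # prob[k][r][c]: probability that a walker starting at (r, c)
    # first reaches a seed carrying label k. Seeds are fixed at 0 or 1.
    prob = {
        k: [[1.0 if seeds.get((r, c)) == k else 0.0 for c in range(cols)]
            for r in range(rows)]
        for k in labels
    }

    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                if (r, c) in seeds:
                    continue
                nbrs = list(neighbors(r, c))
                ws = [weight(r, c, nr, nc) for nr, nc in nbrs]
                total = sum(ws)
                for k in labels:
                    prob[k][r][c] = sum(
                        w * prob[k][nr][nc] for w, (nr, nc) in zip(ws, nbrs)
                    ) / total

    # Each pixel takes the label with the highest probability.
    return [[max(labels, key=lambda k: prob[k][r][c]) for c in range(cols)]
            for r in range(rows)]


# Toy image: dark "tongue" on the left, bright "airway" on the right.
image = [[0.1, 0.1, 0.1, 0.9, 0.9, 0.9] for _ in range(4)]
seeds = {(0, 0): "tongue", (3, 5): "airway"}
segmentation = random_walker(image, seeds)
for row in segmentation:
    print(row)
```

A single seed per object is enough here because the intensity step between the two halves makes the cross-boundary weights negligible; with low-contrast boundaries more seeds or a different `beta` would likely be needed.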

Another supervised approach analyzes anatomical variation across images by generating a point distribution model that captures the shape of the object of interest and then augmenting this model with the intensities near the landmarks, as in Active Shape Modeling (ASM). This method can be adapted to a game-theoretic perspective, as validated by Ibragimov et al. (2012) and applied for the first time to the segmentation of individual tongue muscles by the same authors (Ibragimov et al., 2015).

Game-theoretic framework for landmark-based segmentation

This algorithm adapts game theory, a theory from Economics established in Neumann and Morgenstern (1947) that studies how the decisions of one player affect the other players during a game, to the definition of landmark positions in ASM segmentation. In this method, candidate points are defined for each landmark, and the likelihood that each candidate point represents a specific landmark is evaluated. Landmark detection is formulated mathematically as a game, with landmarks as players, landmark candidate points as strategies, and the likelihoods that each candidate point represents a landmark as payoffs. Once the optimal combination of candidate points is obtained, the boundaries connecting each pair of adjacent landmarks are defined, formulated as an optimal path searching problem. Image intensities in the area between adjacent landmarks are filtered by a control intensity function that minimizes, on the training images, the distance error to the ground truth boundary.
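The formulation of landmarks as players and candidate points as strategies can be sketched as follows. This is only an illustration under assumptions and not the method of Ibragimov et al. (2012): the payoff of a joint strategy is assumed to be the sum of the candidate likelihoods plus a pairwise term that rewards candidate pairs whose distance matches a distance learned from training, and the best joint strategy is found by exhaustive search. All names, coordinates, likelihood values and the tolerance `tol` are hypothetical.

```python
import math
from itertools import product


def best_joint_strategy(candidates, likelihood, expected_dist, tol=2.0):
    """Pick one candidate point per landmark by maximizing the total payoff.

    candidates:    {landmark: [(x, y), ...]}      strategies of each player
    likelihood:    {landmark: [float, ...]}       payoff of each strategy alone
    expected_dist: {(landmark_a, landmark_b): d}  distance assumed from training
    """
    names = list(candidates)
    best_choice, best_payoff = None, -math.inf

    # Enumerate every combination of one candidate per landmark.
    for choice in product(*(range(len(candidates[n])) for n in names)):
        picked = {n: candidates[n][i] for n, i in zip(names, choice)}
        payoff = sum(likelihood[n][i] for n, i in zip(names, choice))

        # Pairwise term: close to 1 when the pair respects the expected distance.
        for (a, b), d in expected_dist.items():
            payoff += math.exp(-abs(math.dist(picked[a], picked[b]) - d) / tol)

        if payoff > best_payoff:
            best_choice, best_payoff = picked, payoff
    return best_choice, best_payoff


candidates = {"tip": [(0, 0), (5, 5)], "base": [(10, 0), (0, 1)]}
likelihood = {"tip": [0.6, 0.7], "base": [0.5, 0.55]}
expected_dist = {("tip", "base"): 10.0}

positions, payoff = best_joint_strategy(candidates, likelihood, expected_dist)
print(positions, round(payoff, 3))
```

In this toy case each landmark, considered alone, would prefer its second candidate, but the joint payoff selects the first candidates because they respect the expected distance. Exhaustive search is only feasible for very few landmarks and candidates.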

Landmark-based atlasing using B-splines and Demons atlasing are other possible algorithms for non-rigid segmentation, based on transformations that map/align the landmarks defined in training to the landmarks identified in the new target image. Finally, Gabor filter banks have been reported to extract rotation-invariant descriptors, which provide information on boundary strength as well as direction and were shown to improve the segmentation of objects with low signal-to-noise ratio.

3D Reconstruction

Computer-aided modelling of the oropharyngeal structures is beneficial for 3D visualization and for the understanding of the associated physiology. MRI data are represented as a set of two-dimensional images of the study area, each one representing a depth level. The plain visualization of these images by surgeons or physicians requires some mental reconstruction and may therefore raise doubts about the relations of volume or position between structures. The process of reconstruction is based on estimating the position of a point in 3D space from multiple images. After segmentation, the reconstruction is simply based on stacking the MRI slice images, with the labeled volumes assigned to the corresponding image pixel positions. The astonishing results achieved in particular muscles segmentation reported in Harandi et al. (2014)

Chapter 5 – Final Conclusions and Future Perspectives

5.1. Final Conclusions

This report covers the first stage of the dissertation work, namely the bibliographic study of the state of the art in this area. In light of this report, the aim is to develop an algorithm for the segmentation of the tongue with the following key features:
a) Accuracy in boundary delimitation, with special concern for the boundary definition of the base of the tongue;
b) Fast computation, a feature that needs to be balanced against segmentation efficiency;
c) Overcoming the semi-automaticity reported in other studies, producing a fully automatic algorithm;
d) Overcoming the resolution limitations present in the MRI acquisition protocols used.

The most accurate method of analysis currently available is manual segmentation, which is extremely time-consuming. The validation of developed methods and algorithms is also only achieved through manual segmentation, which establishes the importance of developing a computational segmentation method. The study presented in this work covers a wide range of established segmentation techniques as well as very recent ones, which contributes to the acquisition of new competences in image processing and Computer Vision segmentation.

5.2. Future Perspectives

The main contributions that the dissertation work intends to achieve include:
a) A state-of-the-art review of tongue structural segmentation from MRI images;
b) The application and development of image analysis and computer vision concepts and techniques for biomedical image analysis;
c) A 3D computer visualization of the tongue;
d) A novel tool for Computer-Aided Diagnosis that can be applied to a variety of medical diagnosis methodologies and surgical intervention planning tasks.

The next stage of the dissertation work will include the development of the computational algorithms to be used, which will involve selecting a restricted number of techniques for MRI images, developing them, applying them to a predefined dataset, and evaluating, comparing and validating their quality. These techniques can be implemented in the C++ programming language, and the computational work can be developed using the VTK/ITK libraries (C++ based libraries for the visualization and processing of 2D/3D graphical objects) or the Matlab platform.

References

Bernstein, M.A., King, K.F., Zhou, X.J., 2004. Handbook of MRI Pulse Sequences. Elsevier.

Abd-El-Malek, S., 1955. The part played by the tongue in mastication and deglutition. J. Anat. 89, 250–254.

Akgul, Y.S., Kambhamettu, C., Stone, M., 1998. Extraction and tracking of the tongue surface from ultrasound image sequences, in: Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231). IEEE Comput. Soc, pp. 298–303.

Arens, R., McDonough, J.M., Corbin, A.M., Rubin, N.K., Carroll, M.E., Pack, A.I., Liu, J., Udupa, J.K., 2003. Upper airway size analysis by magnetic resonance imaging of children with obstructive sleep apnea syndrome. Am. J. Respir. Crit. Care Med. 167, 65–70.

Atsumi, T., Miyatake, T., 1987. Morphometry of the degenerative process in the hypoglossal in amyotrophic lateral sclerosis. Acta Neuropathol 73, 25–31.

Bai, W., Shi, W., O’Regan, D.P., Tong, T., Wang, H., Jamil-Copley, S., Peters, N.S., Rueckert, D., 2013. A probabilistic patch-based label fusion model for multi-atlas segmentation with registration refinement: Application to cardiac MR images. IEEE Trans. Med. Imaging 32, 1302–1315.

Balafar, M.A., Ramli, A.R., Saripan, M.I., Mashohor, S., 2010. Review of brain MRI image segmentation methods. Artif. Intell. Rev. 33, 261–274.

Can, A., Stewart, C. V., Roysam, B., Tanenbaum, H.L., 2002. A feature-based, robust, hierarchical algorithm for registering pairs of images of the curved human retina. IEEE Trans. Pattern Anal. Mach. Intell. 24, 347–364.

Cheng, S., Butler, J.E., Gandevia, S.C., Bilston, L.E., 2008. Movement of the tongue during normal breathing in awake healthy humans. J. Physiol. 586, 4283–4294.

Dempsey, J., Veasey, S., Morgan, B., O’Donnell, C., 2010. Pathophysiology of Sleep Apnea. Physiol. Rev. 90, 47–112.

English, A.W., Wolf, S.L., Segal, R.L., 1993. Compartmentalization of muscles and their motor nuclei: the partitioning hypothesis. Phys. Ther. 73, 857–867.

Fei, B., Duerk, J.L., Boll, D.T., Lewin, J.S., Wilson, D.L., 2003. Slice-to-volume registration and its potential application to interventional MRI-guided radio-frequency thermal ablation of prostate cancer. IEEE Trans. Med. Imaging 22, 515–525.

Gilbert, R.J., Napadow, V.J., 2005. Three-dimensional muscular architecture of the human tongue determined in vivo with diffusion tensor magnetic resonance imaging. Dysphagia 20, 1–7.

Grady, L., 2006. Random walks for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 28, 1768–83.

Hamilton, A.F.D.C., Jones, K.E., Wolpert, D.M., 2004. The scaling of motor noise with muscle strength and motor unit number in humans. Exp. Brain Res. 157, 417–430.

Harandi, N.M., Abugharbieh, R., Fels, S., 2014. 3D segmentation of the tongue in MRI: a minimally interactive model-based approach. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. doi:10.1080/21681163.2013.864958

Hendargo, H.C., Estrada, R., Chiu, S.J., Tomasi, C., Farsiu, S., Izatt, J.A., 2013. Automated non-rigid registration and mosaicing for robust imaging of distinct retinal capillary beds using speckle variance optical coherence tomography. Biomed. Opt. Express 4, 803–21.

Hopp, T., Dietzel, M., Baltzer, P.A., Kreisel, P., Kaiser, W.A., Gemmeke, H., Ruiter, N.V., 2013. Automatic multimodal 2D/3D breast image registration using biomechanical FEM models and intensity-based optimization. Med. Image Anal. 17, 209–218.

Ibragimov, B., Likar, B., Pernuš, F., Vrtovec, T., 2012. A game-theoretic framework for landmark-based image segmentation. IEEE Trans. Med. Imaging 31, 1761–1776.

Ibragimov, B., Prince, J.L., Murano, E.Z., Woo, J., Stone, M., Likar, B., Pernuš, F., Vrtovec, T., 2015. Segmentation of tongue muscles from super-resolution magnetic resonance images. Med. Image Anal. 20, 198–207.

Jenkinson, M., Bannister, P., Brady, M., Smith, S.M., 2002. Improved optimisation for the robust and accurate linear registration and motion correction of brain images. Neuroimage 17, 825–841.

Jiang, S., Xue, H., Glover, A., Rutherford, M., Rueckert, D., Hajnal, J. V., 2007. MRI of moving subjects using multislice Snapshot images with Volume Reconstruction (SVR): Application to fetal, neonatal, and adult brain studies. IEEE Trans. Med. Imaging 26, 967–980. doi:10.1109/TMI.2007.895456

Joshi, A.A., Hu, H.H., Leahy, R.M., Goran, M.I., Nayak, K.S., 2013. Automatic intra-subject registration-based segmentation of abdominal fat from water-fat MRI. J. Magn. Reson. Imaging 37, 423–430.

Kelch, J., Wein, B., 1993. Segmentation of the tongue surface in ultrasonic images using modified scale space filtering. Proc. IEEE Ultrason. Symp. 947–950.

Kuna, S.T., 2004. Regional effects of selective pharyngeal muscle activation on airway shape. Am. J. Respir. Crit. Care Med. 169, 1063–1069.

Lee, J., Woo, J., Xing, F., Murano, E.Z., Stone, M., Prince, J.L., 2014. Semi-automatic segmentation for 3D motion analysis of the tongue with dynamic MRI. Comput. Med. Imaging Graph. 38, 714–24.

Lieberman, D.E., McCarthy, R.C., Hiiemae, K.M., Palmer, J.B., 2001. Ontogeny of postnatal hyoid and larynx descent in humans. Arch. Oral Biol. 46, 117–128.

Lieberman, P., 2012. Vocal tract anatomy and the neural bases of talking. J. Phon. 40, 608–622.

Lucas-Osma, A.M., Collazos-Castro, J.E., 2009. Compartmentalization in the triceps brachii motoneuron nucleus and its relation to muscle architecture. J. Comp. Neurol. 516, 226–239.

Lufkin, R.B., Larsson, S.G., Hanafee, W.N., 1983. Work in progress: NMR anatomy of the larynx and tongue base. Radiology 148, 173–5.

Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P., 1997. Multimodality image registration by maximization of mutual information. IEEE Trans. Med. Imaging 16, 187–198.

Monti, L., Renifilo, E., Profili, M., Balzarini, L., 2008. Cardiovascular magnetic resonance features of caseous calcification of the mitral annulus. J. Cardiovasc. Magn. Reson. 5, 1–5.

Moon, I.J., Han, D.H., Kim, J.-W., Rhee, C.-S., Sung, M.-W., Park, J.-W., Kim, D.S., Lee, C.H., 2010. Sleep magnetic resonance imaging as a new diagnostic method in obstructive sleep apnea syndrome. Laryngoscope 120, 2546–54.

Mu, L., Sanders, I., 2000. Neuromuscular specializations of the pharyngeal dilator muscles: II. Compartmentalization of the canine genioglossus muscle. Anat. Rec. 260, 308–325.

Oliveira, F.P.M., Tavares, J.M.R.S., 2012. Medical image registration: a review. Comput. Methods Biomech. Biomed. Engin. 1–21.

Pillar, G., Fogel, R.B., Malhotra, A., Beauregard, J., Edwards, J.K., Shea, S.A., White, D.P., 2001. Genioglossal inspiratory activation: Central respiratory vs mechanoreceptive influences. Respir. Physiol. 127, 23–38.

Pluim, J.P.W., Fitzpatrick, J.M., 2003. Image registration. IEEE Trans. Med. Imaging 22, 1341–1343.

Rinck, P.A., 2001. Magnetic Resonance in Medicine Basic Textbook of the European Magnetic Resonance Forum, 4th ed. Oxford: Blackwell Scientific Publications.

Rodrigues, M.A.F., Gillies, D.F., Charters, P., 1998. Modelling and simulation of the tongue during laryngoscopy. Comput. Networks ISDN Syst. 30, 2037–2045.

Rodrigues, M.A.F., Gillies, D.F., Charters, P., 2001. Realistic deformable models for simulating the tongue during laryngoscopy. Proc. Int. Work. Med. Imaging Augment. Real.

Rua Ventura, S.M., Freitas, D.R.S., Tavares, J.M.R.S., 2011. Toward dynamic magnetic resonance imaging of the vocal tract during speech production. J. Voice 25, 511–518.

Saboisky, J.P., Butler, J.E., McKenzie, D.K., Gorman, R.B., Trinder, J.A., White, D.P., Gandevia, S.C., 2007. Neural drive to human genioglossus in obstructive sleep apnoea. J. Physiol. 585, 135–146.

Sawczuk, A., Mosier, K.M., 2001. Neural control of tongue movement with respect to respiration and swallowing. Crit. Rev. Oral Biol. Med. 12, 18–37.

Seeley, R., Stephens, T., Tate, P., 2008. Anatomia e Fisiologia, 8th ed. McGraw-Hill.

Seikel, J., King, D., Drumright, D., 2009. Anatomy and physiology for speech, language, and hearing. Delmar, Cengage Learning.

Sonies, B.C., 1981. Ultrasonic visualization of tongue motion during speech. J. Acoust. Soc. Am. 70, 683.

Stone, M., Liu, X., Chen, H., Prince, J.L., 2010. A preliminary application of principal components and cluster analysis to internal tongue deformation patterns. Comput. Methods Biomech. Biomed. Engin. 13, 493–503.

Tang, L., Bressmann, T., Hamarneh, G., 2012. Tongue contour tracking in dynamic ultrasound via higher-order MRFs and efficient fusion moves. Med. Image Anal. 16, 1503–20.

Teguh, D.N., Levendag, P.C., Voet, P.W.J., Al-Mamgani, A., Han, X., Wolf, T.K., Hibbard, L.S., Nowak, P., Akhiat, H., Dirkx, M.L.P., Heijmen, B.J.M., Hoogeman, M.S., 2011. Clinical validation of atlas-based auto-segmentation of multiple target volumes and normal tissue (swallowing/mastication) structures in the head and neck. Int. J. Radiat. Oncol. Biol. Phys. 81, 950–957.

Togeiro, S.M.G.P., Chaves, C.M., Palombini, L., Tufik, S., Hora, F., Nery, L.E., 2010. Evaluation of the upper airway in obstructive sleep apnoea. Indian J. Med. Res. 131, 230–235.

Unser, M., Stone, M., 1992. Automated detection of the tongue surface in sequences of ultrasound images. J. Acoust. Soc. Am. 91, 3001–3007.

Ventura, S.R., Diamantino, R.F., Tavares, J.M., 2008. Three-Dimensional modeling of tongue during speech using MRI data. C. 2008—8th Int. Symp. Comput. Methods Biomech. Biomed. Eng. 49–58.

Von Neumann, J., Morgenstern, O., 1994. Theory of Games and Economic Behavior. Princeton University Press.

Vorperian, H.K., Wang, S., Chung, M.K., Schimek, E.M., Durtschi, R.B., Kent, R.D., Ziegert, A.J., Gentry, L.R., 2009. Anatomic development of the oral and pharyngeal portions of the vocal tract: an imaging study. J. Acoust. Soc. Am. 125, 1666–1678.

Wang, J., Gu, X., 2013. High-quality four-dimensional cone-beam CT by deforming prior images. Phys. Med. Biol. 58, 231–46.

Watkin, K.L., Rubin, J.M., 1989. Pseudo-three-dimensional reconstruction of ultrasonic images of the tongue. J. Acoust. Soc. Am. 85, 496–9.

Wheatley, J.R., Kelly, W.T., Tully, A., Engel, L.A., 1991. Pressure-diameter relationships of the upper airway in awake supine subjects. J. Appl. Physiol. 70, 2242–2251.

Wickham, J.B., Brown, J.M.M., 2012. The function of neuromuscular compartments in human shoulder muscles. J. Neurophysiol. 107, 336–345.

Woo, J., Murano, E.Z., Stone, M., Prince, J.L., 2012. Reconstruction of high-resolution tongue volumes from MRI. IEEE Trans. Biomed. Eng. 59, 3511–24.

Zhu, X., Lafferty, J., Ghahramani, Z., 2003. Combining Active Learning and Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. Proc. ICML 2003 Work. Contin. from Labeled to Unlabeled Data Mach. Learn. Data Min. 58–65.

Zhu, X., Ghahramani, Z., Lafferty, J., 2003. Combining Active Learning and Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. … Data Mach. Learn. … 20, 912–919.
