ENHANCEMENT CLASSIFICATION OF GALAXY IMAGES

APPROVED BY SUPERVISING COMMITTEE:

Artyom Grigoryan, Ph.D., Chair

Walter Richardson, Ph.D.

David Akopian, Ph.D.

Accepted: Dean, Graduate School

Copyright 2014 John Jenkinson
All rights reserved.

DEDICATION

To my family.

ENHANCEMENT CLASSIFICATION OF GALAXY IMAGES

by

JOHN JENKINSON, M.S.

DISSERTATION
Presented to the Graduate Faculty of
The University of Texas at San Antonio
In Partial Fulfillment
Of the Requirements
For the Degree of

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

THE UNIVERSITY OF TEXAS AT SAN ANTONIO
College of Engineering
Department of Electrical and Computer Engineering
December 2014

ACKNOWLEDGEMENTS

My most sincere regard is given to Dr. Artyom Grigoryan for giving me the opportunity to learn to research and for being here for the students, to Dr. Walter Richardson, Jr. for teaching complex topics from the ground up and leading this horse of a student to mathematical waters applicable to my research, to Dr. Mihail Tanase for being the study group that I have never had, and to Dr. Azima Mottaghi for constant motivation, support and the remark, "You can finish it all in one day."

Additionally, this work progressed through discussions with Mehdi Hajinoroozi, Skei, hftf, and pavonia. I also acknowledge the UTSA Mexico Center for their support of this research.

December 2014

ENHANCEMENT CLASSIFICATION OF GALAXY IMAGES

John Jenkinson, B.S. The University of Texas at San Antonio, 2014

Supervising Professor: Artyom Grigoryan, Ph.D., Chair

With the advent of astronomical imaging technology developments, and the increased capacity of digital storage, the production of photographic atlases of the night sky has begun to generate volumes of data which need to be processed autonomously. As part of the Tonantzintla Digital Sky Survey construction, the present work involves software development for the digital image processing of astronomical images, in particular operations that preface feature extraction and classification. Recognition of galaxies in these images is the primary objective of the present work. Many galaxy images have poor resolution or contain faint galaxy features, resulting in the misclassification of galaxies. An enhancement of these images by the method of the Heap transform is proposed, and experimental results are provided which demonstrate that the image enhancement improves the presence of faint galaxy features, thereby improving classification accuracy. The feature extraction was performed using morphological features that have been widely used in previous automated galaxy investigations. Principal component analysis was applied to the original and enhanced data sets for a performance comparison between the original and reduced feature spaces. Classification was performed by the Support Vector Machine learning algorithm.

TABLE OF CONTENTS

Acknowledgements
Abstract
List of Tables
List of Figures

Chapter 1: Introduction
1.1 Galaxy Classification
1.1.1 Hubble Scheme
1.1.2 de Vaucouleurs Scheme
1.2 Digital Data Volumes in Modern Astronomy
1.2.1 Digitized Sky Surveys
1.2.2 Problem Motivation
1.3 Problem Description and Proposed Solution
1.4 Previous Work
1.4.1 Survey of Automated Galaxy Classification
1.4.2 Survey of Support Vector Machines
1.4.3 Survey of Enhancement Methods

Chapter 2: Morphological Classification and Image Analysis
2.1 Astronomical Data Collection
2.2 Image enhancement measure (EME)
2.3 Spatial domain image enhancement
2.3.1 Negative Image
2.3.2 Logarithmic Transformation
2.3.3 Power Law Transformation
2.3.4 Histogram Equalization
2.3.5 Median Filter
2.4 Transform-based image enhancement
2.4.1 Transforms
2.4.2 Enhancement methods
2.5 Image Preprocessing
2.5.1 Segmentation
2.5.2 Rotation, Shifting and Resizing
2.5.3 Canny Edge Detection
2.6 Data Mining and Classification
2.6.1 Feature Extraction
2.6.2 Principal Component Analysis
2.6.3 Support Vector Machines
2.7 Results and Discussion
2.8 Future Work

Appendix A: Project Software
A.1 Preprocessing and Feature Extraction codes
A.2 SVM Classification codes with data
A.2.1 Original data
A.2.2 Enhanced data

Bibliography

Vita

LIST OF TABLES

Table 1.1 Hubble's Original Classification of Nebulae
Table 2.1 Morphological Feature Descriptions
Table 2.2 Feature Values Per Class
Table 2.3 Galaxy list and relation between NED classification and current project classification
Table 2.4 Summary of classification results for original and enhanced data. Accuracy improved by 12.924% due to enhancement.

LIST OF FIGURES

Figure 1.1 Hubble Tuning Fork Diagram. Image from http://www.physast.uga.edu/rls/astro1020/ch20/ch26_fig26_9.jpg
Figure 1.2 Plate scan of Elliptical and Irregular Nebulae from Mount Wilson Observatory originally included in Hubble's paper, Extra-galactic Nebulae
Figure 1.3 Plate scan of Spiral and Barred Spiral Nebulae from Mount Wilson Observatory originally included in Hubble's paper, Extra-galactic Nebulae
Figure 1.4 A plane projection of the revised classification scheme
Figure 1.5 A 3-dimensional representation of the revised classification volume and notation system
Figure 1.6 Sloan Digital Sky Survey coverage map. http://www.sdss.org/sdss-surveys/
Figure 2.1 Schmidt Camera of Tonantzintla. Permission to use image from the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE)
Figure 2.2 Plate Sky Coverage. Permission to use image from the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE)
Figure 2.3 Digitized plate AC8431
Figure 2.4 Marked plate scan AC8431
Figure 2.5 Plate scan AC8409
Figure 2.6 Marked plate scan AC8409
Figure 2.7 Cropped galaxies from plate scans AC8431 and AC8409, read left to right and top to bottom: NGC 4251, 4274, 4278, 4283, 4308, 4310, 4314, 4393, 4414, 4448, 4559, 3985, 4085, 4088, 4096, 4100, 4144, 4157, 4217, 4232, 4218, 4220, 4346, 4258
Figure 2.8 Negative, log and power transformations
Figure 2.9 Top to bottom: Galaxy NGC 4258 and its negative image
Figure 2.10 Logarithmic and nth root transformations
Figure 2.11 γ-power transformation
Figure 2.12 Galaxy NGC 4217 power law transformations
Figure 2.13 Histogram processing to enhance Galaxy NGC 6070
Figure 2.14 Top to bottom: Histogram of original and enhanced image
Figure 2.15 Illustration of the median of a set of points in different dimensions
Figure 2.16 Signal-flow graph of determination of the five-point transformation by a vector x = (x0, x1, x2, x3, x4)
Figure 2.17 Network of the x-induced DsiHT of the signal z
Figure 2.18 Intensity values and spectral coefficients of Galaxy NGC 4242
Figure 2.19 Butterworth lowpass filtering performed in the Fourier (frequency) domain
Figure 2.20 α-rooting enhancement of Galaxy NGC 4242
Figure 2.21 Top: Galaxy PIA 14402, Bottom: NGC 5194, both processed by Heap transform
Figure 2.22 Computational scheme for galaxy classification
Figure 2.23 Background subtraction of Galaxy NGC 4274 by manual and Otsu's thresholding
Figure 2.24 Morphological opening for star removal from Galaxy NGC 5813
Figure 2.25 Rotation of Galaxy image NGC 4096 by galaxy second moment defined angle
Figure 2.26 Resizing of Galaxy NGC 4220
Figure 2.27 Canny edge detection
Figure 2.28 PCA rotation of axes for a bivariate Gaussian distribution
Figure 2.29 Pictorial representation of the development of the geometric margin
Figure 2.30 Maximum geometric margin
Figure 2.31 SVM applied to galaxy data
Figure 2.32 Classification iteration class pairs
Figure 2.33 PCA feature space iteration 1 classification
Figure 2.34 PCA feature space iteration 2 classification
Figure 2.35 PCA feature space iteration 3 classification
Figure 2.36 PCA feature space iteration 4 classification
Figure 2.37 PCA feature space iteration 1 classification of enhanced data
Figure 2.38 PCA feature space iteration 2 classification of enhanced data
Figure 2.39 PCA feature space iteration 3 classification of enhanced data
Figure 2.40 PCA feature space iteration 4 classification of enhanced data

Chapter 1: INTRODUCTION

1.1 Galaxy Classification

Why classify galaxies? It is an inherent characteristic of man to classify objects. Our country's government classifies families according to annual income to establish tax laws. Medical doctors classify our blood types, making successful transfusions possible. Organic genes are classified by genetic engineers so that freeze-resistant DNA from a fish can be used to "infect" a tomato cell, making the tomato less susceptible to cold. Words in the English language are assigned to the

categories noun, verb, adjective, adverb, pronoun, preposition, conjunction, determiner, and exclamation, allowing for the structured composition of sentences. Differential equations are classified as ordinary (ODEs) and partial (PDEs), with ODEs having sub-categories: linear homogeneous equations, exact differential equations, n-th order equations, etc., which allows ease of study and for solution methods to be developed for certain classes, such as the method of undetermined coefficients for ordinary linear differential equations with constant coefficients. If we say that a system is linear, there is no need to mention that the system's input-output relationship is observed to be additive and homogeneous. Classification pervades every industry, and enables improved communication,

organization and operation within society. For galaxy classification in particular, astrophysicists think that to understand the formation and subsequent evolution of galaxies one must first distinguish between the two main morphological classes of massive systems: spirals and early-type systems, which are also called ellipticals. Galaxies with spiral arms, for example, are normally rotating disks of stars, dust and gas with plenty of fuel for future star formation. Ellipticals, however, are normally more mature systems which long ago finished forming stars. The galaxies' histories are also revealed; dust-lane early-type galaxies are starburst systems formed in gas-rich mergers of smaller spiral galaxies. A galaxy's classification can reveal information about its environment. A morphology-density relationship has been observed in many studies; spiral galaxies tend to be located in low-density environments and ellipticals in more dense environments [1,2,3].

There are many physical parameters of galaxies that are useful for their classification, but this paper considers the classification of galaxies by their morphology, a word derived from the Greek morphē, meaning shape or form.

1.1.1 Hubble Scheme

Hubble's scheme was visually popularized by the "tuning fork" diagram, which displays examples of each nebula class, described in this section, in the transition sequence from early-type elliptical to late-type spiral. The tuning fork diagram is shown in Figure 1.1.

Figure 1.1: Hubble Tuning Fork Diagram. Image from http://www.physast.uga.edu/rls/astro1020/ch20/ch26_fig26_9.jpg.

While the basic classification of galaxy morphology assigns members to the categories of elliptical and spiral, the most prominent classification scheme was introduced by Edwin Hubble in his 1926 paper, "Extra-galactic Nebulae." This classification scheme is based on galaxy structure. The individual members of a class differ only in apparent size and luminosity. Originally, Hubble stated that the forms divide themselves naturally into two groups: those found in or near the Milky Way and those in moderate or high galactic latitudes. This paper, along with Hubble's classification scheme, will only consider the extra-galactic division. Table 1.1 shows that this scheme contains two main divisions, regular and irregular galaxies.

Table 1.1: Hubble's Original Classification of Nebulae

  Type                                Symbol   Example (N.G.C.)
  A. Regular:
    1. Elliptical                     En       3379 (E0), 221 (E2), 4621 (E5), 2117 (E7)
       (n = 1, 2, ..., 7 indicates the ellipticity of the image)
    2. Spirals:
      a) Normal spirals               S
         (1) Early                    Sa       4594
         (2) Intermediate             Sb       2841
         (3) Late                     Sc       5457
      b) Barred spirals               SB
         (1) Early                    SBa      2859
         (2) Intermediate             SBb      3351
         (3) Late                     SBc      7479
  B. Irregular                        Irr      4449

Within the regular division, three main classes exist: ellipticals, spirals, and barred spirals. The terms nebulae and galaxies are used interchangeably, with a brief discussion of the rationale for this at the end of this subsection. N.G.C. and U.G.C. are acronyms for the New General Catalogue and Uppsala General Catalogue, respectively, and are designations for deep sky objects. Elliptical galaxies range in shape from circular through flattening ellipses to a limiting lenticular figure in which the ratio of axes is about 1 to 3 or 4. They contain no apparent structure except for their luminosity distribution, which is maximum at the center of the galaxy and decreases to unresolved edges. The degree to which an elliptical nebula is flattened is determined by the criterion elongation, defined as (a − b)/a, where a and b are the semi-major and semi-minor axes, respectively, of an ellipse fitted to the nebula. The elongation mentioned here is different from, and not to be confused with, the morphic feature elongation that is introduced later in this paper. Elliptical nebulae are designated by the symbol "E," followed by the numerical value of ellipticity. The complete series is E0, E1, ..., E7, the last representing a definite limiting figure which marks the junction with spirals. Examples of nebulae with differing ellipticities are shown in Figure 1.2.

Figure 1.2: Plate scan of Elliptical and Irregular Nebulae from Mount Wilson Observatory originally included in Hubble's paper, Extra-galactic Nebulae.

All regular nebulae with ellipticities greater than about E7 are spirals, and no spirals are known with ellipticity less than this limit. Spirals are designated by the symbol "S". The classification criteria for spiral nebulae are: (1) relative size of the unresolved nuclear region; (2) extent to which the arms are unwound; (3) degree of resolution in the arms. The relative size of the nucleus decreases as the arms of the spiral open more widely. The stages of this transition of spiral galaxies are designated as "a" for early types, "b" for intermediate types, and "c" for late types. Nebulae intermediate between E7 and Sa are occasionally designated as S0, or lenticular.

Barred spirals are a class of spirals which have a bar of nebulosity extending diametrically across

the nucleus. This class is designated by the symbol "SB", with a sequence which parallels that of

normal spirals, leading to the subdivision of barred spirals designated by "SBa", "SBb", and "SBc" for early, intermediate and late type barred spirals, respectively. Examples of normal and barred

spirals along with their subclasses are shown in Figure 1.3.

Irregular nebulae are extra-galactic nebulae that lack both discriminating nuclei and rotational symmetry. Individual stars may emerge from an unresolved background in these galaxies.

For any given imaging system, there is a limiting resolution beyond which classification cannot be made with any confidence. Hubble designated galaxies within this category by the letter "Q."

On the usage of nebulae versus galaxy: the astronomical term nebulae has come down through the centuries as the name for permanent, cloudy patches in the sky that are beyond the limits of the solar system. In 1958, the term nebulae was used for two types of astronomical bodies: clouds of dust and gas which are scattered among the stars of the galactic system (galactic nebulae), and the remaining objects, which are now recognized as independent stellar systems scattered through space beyond the limits of the galactic system (extra-galactic nebulae). Some astronomers considered that since nebulae are now regarded as stellar systems, they should be designated by some other name which does not carry the connotation of clouds or mist. Today, those who adopt this consideration refer to other stellar systems as external galaxies. Since this paper only considers external galaxies, we will drop the adjective and employ the term galaxies for whole external stellar systems [4].

Figure 1.3: Plate scan of Spiral and Barred Spiral Nebulae from Mount Wilson Observatory originally included in Hubble's paper, Extra-galactic Nebulae.

1.1.2 de Vaucouleurs Scheme

The de Vaucouleurs Classification system is an extension of the Hubble Classification system, and is the most commonly used system. For this reason it is noted in this paper.

About 1935, Hubble undertook a systematic morphological study of the approximately 1000 brighter galaxies listed in the Shapley-Ames Catalogue, north of −30° declination, with a view to refining his original classification scheme. The main revisions include a) the introduction of the S0 and SB0 types, regarded as transition stages between ellipticals and spirals at the branching-off point of the tuning fork. S0, or lenticular, galaxies resemble spiral galaxies in luminosity but do not contain visible spiral arms. A visible lens surrounds these galaxies, bordered by a faint ring of nebulosity. Characteristics of lenticular galaxies are a bright nucleus in the center of a disc or lens. Near the perimeter of the galaxy there exists a faint rim or envelope with unresolved edges. Hubble separated the lenticulars into two groups, S0(1) and S0(2); these groups have a smooth lens and envelope, and some structure in the envelope in the form of a dark zone and ring, respectively. S0/a is the transition stage between S0 and Sa and shows apparent developing spiral structure in the envelope. SB0 objects are characterized by a bar through the central lens. Hubble distinguished three groups of SB0 objects: group SB0(1) have a bright lens with a broad, hazy bar and no ring, surrounded by larger, fainter envelopes, some being circular; group SB0(2) have a broad, weak bar across a primary ring, with faint outer secondary rings; and group SB0(3) have a well-developed bar and ring pattern, with the bar stronger than the ring.

c) Harlow Shapley proposed an extension to the normal spiral sequence beyond Sc, designating galaxies showing a very small, bright nucleus and many knotty irregular arms by Sd. A parallel extension of the barred spiral sequence beyond the stage SBc was introduced by de Vaucouleurs in 1955, which may be denoted SBd or SBm [5,6]. For Irregular-type galaxies related to the Magellanic Clouds, I(m), an important characteristic is their small diameter and low luminosity, which marks them as dwarf galaxies.

d) Shapley discovered the existence of dwarf ellipticals (dE) by observation of ellipticals with very low surface brightness. de Vaucouleurs noted that after all such types or variants have been assigned into categories, there remains a hard core of "irregular" objects which do not seem to fit into any of the recognized types. These outliers are presently discarded, and only isolated galaxies are considered in the present article.

The coherent classification scheme proposed by de Vaucouleurs, which included most of the current revisions and additions to the standard classification, is described here. Classification and notation of the scheme are illustrated in Figure 1.4, which may be considered as a plane projection of the three-dimensional representation in Figure 1.5. Four Hubble classes are retained: ellipticals E, lenticulars S0, spirals S, irregulars I.

Lenticulars and spirals were re-designated "ordinary" SA and "barred" SB, respectively, to allow for the use of the compound symbol SAB for the transition stage between these two classes. The symbol S alone is used when a spiral object cannot be more accurately classified as either SA or SB because of poor resolution, unfavorable tilt, etc. Lenticulars were divided into two subclasses, denoted SA0 and SB0, where SB0 galaxies have a bar structure across the lens and SA0 galaxies do not. SAB0 denotes objects with a very weak bar. The symbol S0 is now used for a lenticular object which cannot be more precisely classified as either SA0 or SB0; this is often the case for edgewise objects.

Two main varieties are recognized in each of the lenticular and spiral families: the "annular" or "ringed" type, denoted (r), and the "spiral" or "S-shaped" type, denoted (s). Intermediate types are noted (rs). In the "ringed" variety the structure includes circular (sometimes elliptical) arcs or rings (S0) or consists of spiral arms or branches emerging tangentially from an inner circular ring (S). In the "spiral" variety two main arms start at right angles from a globular or slightly elongated nucleus (SA) or from an axial bar (SB). The distinction between the two families A and B and between the two varieties (r) and (s) is most clearly marked at the transition stage S0/a between the S0 and S classes. It vanishes at the transition stage between E and S0 on the one hand, and at the transition stage between S and I on the other (cf. Figure 1.4).

Four subdivisions or stages are distinguished along each of the four spiral sequences SA(r), SA(s), SB(r), SB(s), viz. "early", "intermediate" and "late", denoted a, b, c as in the standard classification, with the addition of a "very late" stage, denoted d. Intermediate stages are noted Sab, Sbc, Scd. The transition stage towards the magellanic irregulars (whether barred or not) is noted Sm, e.g., the Large Magellanic Cloud is SB(s)m. Along each of the non-spiral sequences the signs + and − are used to denote "early" and "late" subdivisions; thus E+ denotes a "late" E, the first stage of the transition towards the S0 class. In both the SA0 and SB0 subclasses three stages, noted S0−, S0°, S0+, are thus distinguished; the transition stage between S0 and Sa, noted S0/a by Hubble, may also be noted Sa−. Notations such as Sa+, Sb−, etc. may be used occasionally in the spiral sequences, but the distinction is so slight between, say, Sa+ and Sb−, that for statistical purposes it is convenient to group them together as Sab, etc. Experience shows that this makes the transition subdivisions, Sab, Sbc, etc., as wide as the main subdivisions, Sa, Sb, etc.

Irregulars which do not clearly show the characteristic spiral structure are noted I(m).

Figure 1.4 shows a plane projection of the revised classification scheme; compare with Figure 1.5. The ordinary spirals SA are in the upper half of the figure, the barred spirals SB in the lower half. The ring types (r) are to the left, the spiral types (s) to the right. Ellipticals and lenticulars are near the center, magellanic irregulars near the rim. The main stages of the classification sequence from E to Im through S0−, S0, S0+, Sa, Sb, Sc, Sd, Sm are illustrated, approximately on the same scale, along each of the four main morphological series SA(r), SA(s), SB(r), SB(s). A few mixed or "intermediate" types SAB and S(rs) are shown along the horizontal and vertical diameters, respectively. This scheme is superseded by the slightly revised and improved system illustrated in Figure 1.5.

Figure 1.4: A plane projection of the revised classification scheme.

Figure 1.5 shows a 3-dimensional representation of the revised classification volume and notation system. From left to right are the four main classes: ellipticals E, lenticulars S0, spirals S, and irregulars I. Above are the ordinary families SA, below the barred families SB; on the near side are the S-shaped varieties S(s), on the far side the ringed varieties S(r). The shape of the volume indicates that the separation between the various sequences SA(s), SA(r), SB(r), SB(s) is greatest at the transition stage S0/a between lenticulars and spirals and vanishes at E and Im. A central cross-section of the classification volume illustrates the relative location of the main types and the notation system. There is a continuous transition of mixed types between the main families and varieties across the classification volume and between stages along each sequence; each point in the classification volume potentially represents a possible combination of morphological characteristics. For classification purposes this infinite continuum of types is represented by a finite number of discrete "cells" [5,6,7]. The classification scheme included here defers to [5, 6] for a complete description.

Figure 1.5: A 3-dimensional representation of the revised classification volume and notation system.

1.2 Digital Data Volumes in Modern Astronomy

1.2.1 Digitized Sky Surveys

Modern astronomy has produced massive volumes of data relative to those produced at the start of the 20th century. Digitized sky surveys attempt to construct a virtual photographic atlas of the universe through the identification and cataloging of observed celestial phenomena, for the purpose of understanding the large-scale structure of the universe, the origin and evolution of galaxies, the relationship between dark and luminous matter, and many other topics of research interest in astronomy. This idea is being realized through the efforts of multiple organizations and all-sky surveys. Notable surveys, their night sky coverage contributions, and their data collection are mentioned here.

The Sloan Digital Sky Survey (SDSS) is the most prominent ongoing all-sky survey; in its seventh data release, almost 1 billion objects have been identified in approximately 35% of the night sky. Comprehensive data collection for the survey, which uses electronic light detectors for imaging, is projected at 15 terabytes [8]. An image from the SDSS displaying the current coverage of the sky in orange, with selected regions displayed in higher resolution, is shown in Figure 1.6.

The Galaxy Evolution Explorer (GALEX), a NASA mission led by Caltech, has used microchannel plate detectors in two bands to image 2/3 of the night sky from the GALEX satellite between 2003 and the present in its survey [9]. In 1969, the Two-Micron Sky Survey (TMSS) scanned 70% of the sky and detected approximately 5,700 celestial sources of infrared radiation [10]. With the advancement of infrared sensing technology, the Two Micron All-Sky Survey (2MASS) achieved an 80,000-fold increase in sensitivity over the TMSS between 1997 and 2001. The 2MASS was conducted by two separate observatories, at Mount Hopkins, Arizona, and the Cerro Tololo Inter-American Observatory (CTIO), Chile, using 1.3-meter telescopes equipped with a 3-channel camera and a 256×256 electronic light detector. Each night of released data consisted of 250,000 point sources, 2,000 galaxies, and 5,000 images, weighing about 13.8 gigabytes per facility. The compiled catalog has over 1,000,000 galaxies, extracted from 99.998% sky coverage and 4,121,439 atlas images [11].

12 Figure 1.6: Sloan Digital Sky Survey coverage map. http://www.sdss.org/sdss-surveys/.

Sky coverage by the Space Telescope Science Institute's Guide Star Catalog 2 (GSC-2) survey, which occurred from 2000 to 2009, was 100%. The optical catalog produced by this survey used 1" resolution scans of 6.5×6.5 square degree photographic plates from the Palomar and UK Schmidt telescopes. Almost 1 billion point sources were imaged. Each plate was digitized using a modified microdensitometer with a pixel size of either 25 or 15 microns (1.7 or 1.0 arcsec, respectively). The digital images are 14000×14000 (0.4 GB) or 23040×23040 (1.1 GB) in size [12]. The second Palomar Observatory Sky Survey (POSS2) imaged 897 plates between the early 1980s and 1999, covering the entire northern celestial hemisphere using the Oschin Schmidt telescope [13].

One of the main objectives of the ROSAT All-Sky Survey was to conduct the first all-sky survey in X-rays with an imaging telescope, leading to a major increase in sensitivity and source location accuracy. ROSAT was conducted between 1990 and 1991, covering 99.7% of the sky [14]. The Faint Images of the Radio Sky at Twenty-centimeters (FIRST) project was designed to produce the radio equivalent of the Palomar Observatory Sky Survey over 10,000 square degrees of the North and South Galactic Caps. The survey began in 1993 and is currently active [15, 16]. The Deep Near Infrared Survey (DENIS) is a survey of the southern sky in two infrared bands and one optical band conducted at the European Southern Observatory at La Silla, Chile. The survey ran from 1996 through 2001 and cataloged 355 million point sources [17]. The present work is part of the Tonantzintla Digital Sky Survey, which is discussed in Chapter 2.

1.2.2 Problem Motivation

The image quantity and data volume produced by digital sky surveys present human analysis with an impossible task. Therefore, source detection and classification in modern astronomy necessitate automation of the image processing and analysis, providing the motivation for the present work. To address this problem, an algorithm for processing astronomical images to classify the galaxies contained therein is presented and implemented, in which detection is followed by class discrimination of the detected galaxies according to the scheme mentioned in Section 1.1.1. Class discrimination is performed using extracted galaxy feature values, which experience varying accuracy with different methods of segmentation. Faint regions of galaxies can be lost during segmentation, leading to increased error during feature extraction and subsequent classification. Enhancement of the galaxy image by multiple methods is proposed and implemented to reduce data loss during segmentation and improve the accuracy of feature extraction, as implied by the increase in classification performance.

1.3 Problem Description and Proposed Solution

This project is part of the ongoing work within the Tonantzintla Digital Sky Survey. The present work focuses on automated astronomical image processing and classification. The final performance criterion is 100% classification in the categories E0, ..., E7, S0, Sa, Sb, Sc, SBa, SBb, SBc, Irr, while the present work builds towards that goal by incremental improvement of classification performance with the categories elliptical "E," spiral "S," lenticular "S0," barred spiral "SB," and irregular "Irr." The intent in this work is to partially or fully resolve the classification performance limitations within the galaxy segmentation, edge detection and feature extraction stages of the image processing pipeline by enhancing the galaxy images by the method of the Heap transform, to preserve the faint regions of the galaxies which may be lost during the processing of images without enhancement. Classification is performed by the supervised machine learning algorithm Support Vector Machines (SVM).

1.4 Previous Work

1.4.1 Survey of Automated Galaxy Classification

Morphological classification of galaxies into 5 broad categories was performed by the artificial neural network (ANN) machine learning algorithm with back propagation, trained using 13 parameters, by Storrie-Lombardi in [18]. Odewahn classified galaxies from large sky surveys using ANNs in [35, 36, 37]. The development progress of an automatic star/galaxy classifier using Kohonen Self-Organizing Maps was presented in [38, 39], and using learning vector quantization and fuzzy classifiers with back-propagation based neural networks in [39]. An automatic system to classify images of varying resolution based on morphology was presented in [40]. Owens, in [19], shows comparable performance between machine learning algorithms of oblique decision trees induced with different impurity measures and the artificial neural network used in [18], and that classification of the original data could be performed with less well-defined categories. In [20] an artificial neural network was trained on the features of galaxies that were defined as a galaxy class mean by 6 independent experts. The network performed comparably to the overall root mean square dispersion between the experts. A comparison of the classification performance of an artificial neural network machine learning algorithm to that of human experts for 456 galaxies, with their source being the SDSS in [20], was detailed in [21]. Lahav showed the classification performance on galaxy images and spectra of an unsupervised artificial neural network trained with galaxy spectra de-noised and compressed by principal component analysis. A supervised artificial neural network was also trained with classes determined by human experts [22]. Folkes, Lahav and Maddox trained an artificial neural network using a small number of principal components selected from galaxy spectra with low signal-to-noise ratios characteristic of surveys. Classification was then performed into 5 broad morphological classes. It was shown that artificial neural networks are useful in discriminating normal and unusual galaxy spectra [23]. The use of the galaxy parameters luminosity and color and the image-structure parameters size, image concentration, asymmetry and surface brightness to classify galaxy images into three classes was performed by Bershady, Jangren and Conselice. It was determined that the essential features for discrimination were a combination of spectral index, e.g., color, and concentration, asymmetry, and surface brightness [24]. A comparison using ensembles of classifiers for the classification methods Naive Bayes, back propagation artificial neural network, and a decision-tree induction algorithm with pruning was performed by Bazell, which resulted in the artificial neural network producing the best results, and ensemble methods improving the performance of all classification methods [30]. A computational scheme to develop an automatic galaxy classifier using galaxy morphology was shown to provide robustness for classification using artificial neural networks in [26,34]. Bazell derived 22 morphological features, including asymmetry, which were used to train an artificial neural network for the classification of galaxy images, to determine which features were most important [27]. Strateva used visual morphology and spectral classification to show that two peaks in the galaxy color distribution correspond roughly to early (E, S0, Sa) and late-type (Sb, Sc, Irr) galaxies. It was also shown that the color of galaxies correlates with their radial profile [28]. The Gini coefficient, a statistic commonly used in econometrics to measure the distribution of wealth among a population, was used to quantify galaxy morphology based on galaxy light distribution in [29]. In [31], an algorithm for preprocessing galaxy images for morphological classification was proposed. In addition, a comparison of classification performance between an artificial neural network, locally weighted regression and homogeneous ensembles of classifiers was performed for 2 and 3 galaxy classes. Lastly, compression and discrimination by principal component analysis were performed. The artificial neural network performed best under all conditions. In [32], principal component analysis was applied to galaxy images and a structural type estimator named "ZEST" used 5 nonparametric diagnostics to classify galaxy structure. Finally, Banerji presented morphological classification by artificial neural networks for 3 classes, yielding 90% accuracy in comparison to human classifications [33].

1.4.2 Survey of Support Vector Machines

This method of class segregation is performed by hyperplanes which can be defined by a variety of functions, both linear and nonlinear. The development of this method is presented in Chapter 2. Support vector machines (SVMs) have been employed widely in the areas of pattern recognition and prediction. Here a limited survey of SVM applications is presented, which includes two surveys conducted by researchers in the field. Romano applied SVMs to photometric and geometric features computed from astronomical imagery for the identification of possible supernovae in [42]. M. Huertas-Company applied an SVM to 5 morphological features, luminosity and redshift calculated from galaxy images in [43]. Freed and Lee classified galaxies by morphological features into 3 classes using an SVM in [44]. Saybani conducted a survey of SVMs used in oil refineries in [45]. Xie proposed a method for predicting crude oil prices using an SVM in [90]. Petković used an SVM to predict the power level consumption of an oil refinery in [47]. Balabin performed near infrared spectroscopy for gasoline classification using nine different multivariate classification methods, including SVMs, in [48]. Byun and Lee conducted a comprehensive survey on applications of SVMs for pattern recognition and prediction in [41]. References contained therein are included here in support of the present survey. For classification with q classes (q > 2), classes are trained pairwise. The pairwise classifiers are arranged in trees where each tree node represents an SVM. A bottom-up tree originally proposed for recognition of 2D objects was applied to face recognition in [49, 50]. In contrast, an interesting approach was the top-down tree published in [51]. SVMs applied to improve the classification speed of face detection were presented in [63, 53]. Face detection from multiple views was presented in [56, 55, 54]. An SVM was applied to coarse eigenface detection followed by a fine detection in [57]. Frontal face detection using SVMs was discussed in [58]. [59] presented SVMs for face and eye detection. Independent component analysis features of faces were input to the SVM in [60], orthogonal Fourier-Mellin Moments in [61], and an overcomplete wavelet decomposition in [62]. A myriad of other applications have been ventured using SVMs, including but not limited to 2-D and 3-D object recognition [64, 65, 66], texture recognition [66], people and pose recognition [67,68,69,70,71], moving vehicle detection [72], radar target recognition [73, 76], handwritten character and digit recognition [74, 75, 71, 77], speaker or speech recognition [78,79,80,81], image retrieval [82,83,84,85], prediction of financial time series [86], bankruptcy [87], and other classifications such as gender [88], fingerprints [89], bullet-holes for auto scoring [90], white blood cells [91], spam categorization [92], hyperspectral data [93], storm cells [94], and image classification [95].

1.4.3 Survey of Enhancement Methods

Image enhancement is the process of visually improving the quality of a region of an image, or of the entire image, with respect to some measure of quality, e.g., the image enhancement measure (EME) introduced in Chapter 2. Enhancement methods can be classified as either spatial domain or transform domain methods, depending on whether the manipulation of the image is performed directly on the pixels or on the spectral coefficients, respectively. Here, a survey of both spatial and transform domain methods is presented for the enhancement of astronomical images and images in general. Spatial domain methods are commonly referred to as contrast enhancement methods. At the core of these methods are histogram equalization, logarithmic and inverse log transformations, negative and identity transformations, nth-power and nth-root transformations, histogram matching and local histogram processing. Adaptive histogram equalization, which uses local contrast stretching to calculate several histograms corresponding to distinct sections of the image, was applied after denoising to improve the contrast of astronomical images in [96, 99, 100, 34] and generic images in [106]. Traditional histogram equalization was applied to the Hale-Bopp comet image for enhancement in [98] and to other astronomical images in [97, 101, 103, 104, 105]. [102] included histogram equalization in the development of two algorithms for point extraction and matching for registration of infrared astronomical images. Astronomical images were logarithmically transformed for visualization in [108], and likewise generic images in [127]. Inverse log transformations, negative and identity transformations, nth-power and nth-root transformations, histogram matching and local histogram processing are introduced and applied to generic images in [107, 126, 127, 129]. At the core of transform domain methods for image enhancement are the discrete Fourier, Heap, Tensor, and Wavelet transforms and α-rooting. Astronomical image enhancement performed by the discrete Fourier transform was presented in [109, 111, 112], by the Wavelet transform in [110], by the Heap transform and α-rooting in [113], and by the Curvelet transform in [114, 98]. The enhancement of generic images can be seen in [115, 127, 128, 129] by the discrete Fourier and Cosine transforms, in [116] by the Heap transform, in [117,118,127,128] by α-rooting, in [119,120,121,122] by the Tensor or Paired transform, in [123,98,124] by the Wavelet transform, and in [124,125] by other methods of transform domain processing.

Chapter 2: MORPHOLOGICAL CLASSIFICATION AND IMAGE ANALYSIS

2.1 Astronomical Data Collection

Figure 2.1: Schmidt Camera of Tonantzintla. Permission to use image from the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE).

The Tonantzintla Schmidt camera was constructed in the Harvard Observatory shop under the guidance of Dr. Harlow Shapley, and started operation in 1942. The spherical mirror is 762 mm in diameter and is coupled to a 660.4 mm correcting plate. The camera is shown in Figure 2.1. The 8×8 inch² photographic plates cover a 5°×5° field with a plate scale of 95 arcsec/mm. The existing collection consists of a total of 14565 glass plates: 10445 taken in direct image mode, and 4120 through a 3.96° objective prism. Figure 2.2 shows the sky covered by the complete plate collection, marking the center of each observed field [130].

Figure 2.2: Plate Sky Coverage. Permission to use image from the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE).

The plates are first digitized at the maximum optical resolution of the scanner, 4800 dots per inch (dpi), then rebinned by a factor of 3 for a final pixel size of ~15 μm (1.51 arcsec/pixel) and transformed to the transparency (positive) mode. Each image has 12470×12470 pixels (about 350 MB in 16-bit mode) and is stored in FITS format.

The images in this project were received from the collection of digitized photographic plates at the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE). The present data set consists of 6 plate scans. All 6 plates were marked to indicate the galaxies contained within the image. The goal is to process the digitized plates automatically, i.e., segmenting galaxies within the image, calculating their features and performing classification. In initial attempts at processing the plate scans in Matlab on an Alienware M14x with an Intel Core i7-3840QM 2.80GHz CPU and 12.0GB DDRAM5, e.g., applying the watershed algorithm for segmentation, memory consumption errors were experienced. Consequently, the galaxies within each plate scan were cropped and processed individually. Figures 2.3, 2.4, 2.5, 2.6, and 2.7 show the original digitized plates AC8431 and AC8409, their marked versions indicating captured galaxies, and the cropped galaxies from both plates. Once automatic classification with the cropped images is performed, one of the University of Texas at San Antonio's (UTSA) high-performance computing clusters, SHAMU, will be used for the automatic classification of whole plate scans. SHAMU consists of twenty-two computational nodes and two high-end visualization nodes. Each computational node is powered by dual Quad-core Intel Xeon E5345 2.33GHz processors (8M Cache). SHAMU consists of twenty-three Sun Fire X4150 servers, four Penguin Relion 1800E servers, a DELL Precision R5400 and a DELL PowerEdge R5400. SHAMU utilizes the GlusterFS open-source file system over a high-speed InfiniBand connection. A Sun StorageTek 2530 SAS array, fully populated with twelve 500GB hard drives, acts as SHAMU's physical storage in a RAID 5 configuration. SHAMU is networked together with two DELL PowerConnect Ethernet switches and one QLogic SilverStorm InfiniBand switch.

2.2 Image enhancement measure (EME)

To measure the quality of images and select optimal processing parameters, we consider the quantitative measure of image enhancement described in [131, 128], which relates to Weber's law of the human visual system. This measure can be used for selecting the best parameters for image enhancement by the Fourier transform, as well as by other unitary transforms. The measure is defined as follows. A discrete image {f_{n,m}} of size N1 × N2 is divided into k1 k2 blocks of size L1 × L2, where the integers L_i = [N_i/k_i], i = 1, 2. The quantitative measure of enhancement of the processed image, M_a : {f_{n,m}} → {f̂_{n,m}}, is defined by

\[
\mathrm{EME}_a(\hat f) = \frac{1}{k_1 k_2} \sum_{k=1}^{k_1} \sum_{l=1}^{k_2} 20 \log_{10} \frac{\max_{k,l}(\hat f)}{\min_{k,l}(\hat f)},
\]

where max_{k,l}(f̂) and min_{k,l}(f̂) are, respectively, the maximum and minimum of the image f̂_{n,m} inside the (k, l)th block, and a is a parameter, or a vector parameter, of the enhancement algorithm.

Figure 2.3: Digitized plate AC8431

23 Figure 2.4: Marked plate scan AC8431

EME_a(f̂) is called a measure of enhancement, or measure of improvement, of the image f. We define the parameter a_0 at which EME_{a_0}(f̂) attains its maximum to be the best (or optimal) Φ-transform-based image enhancement vector parameter. Experimental results show that the discrete Fourier transform can be considered optimal when compared with the cosine, Hartley, Hadamard, and other transforms. When Φ is the identity transformation I, the EME of f̂ = f is called the enhancement measure of the image f, i.e., EME(f) = EME_I(f). EME values of the enhanced galaxy images are presented in subsequent subsections.
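To make the measure concrete, the following MATLAB sketch computes the EME of an image over a k1 × k2 block grid exactly as defined above; it is a minimal, unoptimized illustration (the function name and the guard against zero-valued block minima are our own assumptions, not the thesis code):

    % EME of image f over k1-by-k2 blocks (minimal sketch of the measure above).
    function e = eme(f, k1, k2)
        f = double(f);
        [N1, N2] = size(f);
        L1 = floor(N1/k1); L2 = floor(N2/k2);   % block sizes L_i = [N_i/k_i]
        e = 0;
        for k = 1:k1
            for l = 1:k2
                blk = f((k-1)*L1+1 : k*L1, (l-1)*L2+1 : l*L2);
                mx = max(blk(:)); mn = min(blk(:));
                if mn > 0                       % skip blocks where the ratio is undefined
                    e = e + 20*log10(mx/mn);
                end
            end
        end
        e = e / (k1*k2);
    end

For example, eme(img, 64, 64) evaluates the measure over a 64 × 64 block grid.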

24 Figure 2.5: Plate scan AC8409

2.3 Spatial domain image enhancement

Contrast enhancement is the process of improving image quality by manipulating the values of single pixels in an image. This processing is said to occur in the spatial domain, meaning that the image involved in processing is represented as a plane in 2-dimensional Euclidean space, which is why contrast enhancement methods are also called spatial domain methods. Contrast enhancement in the spatial domain is paralleled by transform-based methods which operate in the frequency domain, as

25 Figure 2.6: Marked plate scan AC8409 is shown in following subsections. The image enhancement is described by a transformation T

T : f(x, y) → g(x, y)=T[f(x, y)] where f(x, y) is the original image, g(x, y) is the processed image, and T is the enhancement operator. As a rule, T is considered to be a monotonic and invertible transformation.

26 Figure 2.7: Cropped galaxies from plate scans AC8431 and AC8409 read left to right and top to bottom: NGC 4251, 4274, 4278, 4283, 4308, 4310, 4314, 4393, 4414, 4448, 4559, 3985, 4085, 4088, 4096, 4100, 4144, 4157, 4217, 4232, 4218, 4220, 4346, 4258.

2.3.1 Negative Image

This transformation is especially useful for processing binary images, e.g., text-document images, and is described as

\[
T_n : f(x, y) \to g(x, y) = M - f(x, y)
\]

for every pixel (x, y) in the image plane, where M is the maximum intensity in the image f(x, y). Figure 2.8 shows this transformation for an image with 0 ≤ f(x, y) ≤ L − 1, where L is the number of intensity levels in the image. In the discrete case, M is the maximum level, M = L − 1, and T_n : r → s = L − 1 − r, where r is the original image intensity and s is the intensity mapped by the transformation. An example of an image negative is given in Figure 2.9.

[Plot of the point transformations on the intensity range 0-255: identity, negative, 46·log(1+r), 16·√(1+r), 40·(1+r)^(1/3), 0.004·r², and c·r³.]

Figure 2.8: Negative, log and power transformations.
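A minimal MATLAB sketch of the negative transformation T_n for an 8-bit image (the file names are placeholders):

    % Image negative for an 8-bit image: s = (L-1) - r with L = 256, M = 255.
    f = imread('galaxy.png');              % placeholder input file
    g = 255 - f;                           % T_n : r -> s = 255 - r
    imwrite(g, 'galaxy_negative.png');     % save the negative image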

2.3.2 Logarithmic Transformation

Figure 2.9: Top to bottom: Galaxy NGC 4258 and its negative image.

The logarithmic function is used in image enhancement because it is a monotonically increasing function. The transformation is described as

\[
T_l : f(x, y) \to g(x, y) = c_0 \log(1 + f(x, y)),
\]

where c_0 is a constant calculated as c_0 = M/log(1 + M) in order to preserve the gray-scale resolution of the enhanced image. For example, for the 256-gray-level image, c_0 ≈ 46.

Other versions of this transform are based on the use of nth roots instead of the log function, as shown in Figure 2.8. For example,

\[
T_2 : f(x, y) \to g(x, y) = c_0 \sqrt{1 + f(x, y)},
\]

where the constant c_0 = 16 when processing a 256-level gray-scale image. Examples of image enhancement by such transformations are given in Figure 2.10.

Figure 2.10: Logarithmic and nth root transformations. (a) Original image; (b) log transformation; (c) square root transformation; (d) 3rd root transformation.
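A minimal MATLAB sketch of the logarithmic and square-root transformations with the constants stated above (the file name is a placeholder):

    % Log and square-root point transformations for a 256-level gray-scale image.
    f = double(imread('galaxy.png'));      % placeholder input file
    c0 = 255/log(1 + 255);                 % ~46, preserves the gray-scale range
    g_log  = uint8(c0 * log(1 + f));       % T_l : r -> c0*log(1 + r)
    g_sqrt = uint8(16 * sqrt(1 + f));      % T_2 : r -> 16*sqrt(1 + r)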

2.3.3 Power Law Transformation

These transformations are parameterized by γ and described as

\[
T_\gamma : f(x, y) \to g(x, y) = c_\gamma (1 + f(x, y))^\gamma,
\]

where γ > 0 is a constant which is selected by the user. The constant c_γ is used to normalize the gray-scale levels within [0, M].

For 0 ≤ γ ≤ 1, the transform maps a narrow range of dark samples of the image into a wide range of bright samples, and it smooths the differences between intensities of bright samples of the original image. The power law transformation is shown with γ = 0.05, 0.85, 1.65, 2.45, 3.25, 4.05, and 4.85 in Figure 2.11.


Figure 2.11: γ-power transformation.

Examples of image enhancement by power law transformations are given in Figure 2.12.
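A minimal MATLAB sketch of the power law transformation, with c_γ chosen to normalize the output into [0, 255] (the file name and the value of γ are placeholders):

    % Power law (gamma) transformation with normalizing constant c_gamma.
    f = double(imread('galaxy.png'));      % placeholder input file
    gam = 0.3;                             % user-selected gamma > 0
    g = (1 + f).^gam;                      % (1 + r)^gamma
    g = uint8(255 * g / max(g(:)));        % c_gamma scales the result into [0, 255]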

2.3.4 Histogram Equalization

Consider an image of size N × N as a realization of a random variable ξ that takes values r from a range [r_min, r_max], and let h(r) = f_ξ(r) be the probability density function of ξ. It is desirable to transform the image in such a way that the new image will have the uniform distribution. This equates to a change of

Figure 2.12: Galaxy NGC 4217 power law transformations. (a) Original image; (b) γ = 0.005; (c) γ = 0.3; (d) γ = 0.9.

random variable

\[
\xi \to \hat\xi = w(\xi) \qquad (w : r \to s)
\]

such that w is a monotonically increasing function and

\[
h(s) = f_{\hat\xi}(s) = \frac{1}{w(r_{\max}) - w(r_{\min})}.
\]

The following fact is well known:

\[
h(s) = h(r)\frac{dr}{ds},
\]

or h(r)dr = h(s)ds. Integrating this equality yields

\[
\int_{r_{\min}}^{r} \frac{1}{w(r_{\max}) - w(r_{\min})}\, ds = \int_{r_{\min}}^{r} h(a)\, da,
\]

which yields, with s = w(r),

\[
\frac{w(r) - w(r_{\min})}{w(r_{\max}) - w(r_{\min})} = \int_{r_{\min}}^{r} h(a)\, da = F(r).
\]

In the particular case when r_min = 0 and w(r_min) = 0, the following result is obtained:

\[
w(r) = w(r_{\max}) F(r).
\]

In the case of a digital image, where the image has been sampled and quantized, the discrete version of this transform has the representation

\[
r \to s = \begin{cases} M \displaystyle\sum_{k=1}^{r} h(k), & \text{if } r = 1, 2, \ldots, M-1 \\ 0, & \text{if } r = 0 \end{cases}
\]

where r is the integer value of the original image, s is the quantized value of the transformed image, and h(k) is the histogram of the image.

So, independent of the image intensity probability density function, the intensity density function of the processed image is uniform:

\[
f_{\hat\xi}(s) = \frac{1}{w(r_{\max}) - w(r_{\min})}.
\]

Histogram equalization applied to galaxy NGC 6070 is shown in Figure 2.13, with the corresponding original and enhanced image histograms shown in Figure 2.14. The histogram equalization destroys the details of the galaxy image, indicating that spatial methods of enhancement are not suitable for all images. This is part of the motivation for using α-rooting, the Heap transform, and other transform-based methods, which are described in the next section.

Figure 2.13: Histogram processing to enhance Galaxy NGC 6070. (a) Original image; (b) histogram equalization.
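A minimal MATLAB sketch of the discrete mapping above, built from the normalized cumulative histogram of an 8-bit image (the file name is a placeholder; histcounts assumes a reasonably recent MATLAB release):

    % Histogram equalization via the cumulative distribution F(r), M = 256 levels.
    f = imread('galaxy.png');              % placeholder input file
    h = histcounts(f(:), 0:256);           % histogram h(k) of the 8-bit image
    h = h / numel(f);                      % normalize to a probability density
    F = cumsum(h);                         % discrete cumulative distribution F(r)
    map = uint8(255 * F);                  % s = M*sum_k h(k), scaled to [0, 255]
    g = map(double(f) + 1);                % map each pixel value r to s(r)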

2.3.5 Median Filter

A noteworthy spatial domain filter is the Median filter. This filter is based on order statistics. Given a set of numbers S = {1, 2, 1, 4, 2, 5, 6, 7}, the values in S are rearranged in order of descending value, i.e., 7, 6, 5, 4, 2, 2, 1, 1, and labeled as order statistics in ascending order, i.e., 7 is the 1st order statistic and the second 1 is the 8th order statistic. The 4 and adjacent 2 can both be considered as the median here, and the selection is made at the discretion of the user. In general, the highest order statistic is regarded as the nth order statistic.

34 9000

8000

7000

6000

5000

4000

3000

2000

1000

0 0 50 100 150 200 250 300

9000

8000

7000

6000

5000

4000

3000

2000

1000

0 0 50 100 150 200 250 300

Figure 2.14: Top to Bottom: Histogram of original and enhanced image.


The Median filter comes from the following problem in probability: given a set of points S = {x_1, x_2, ..., x_7} containing the median point m, i.e., m ∈ S, which point in the set is closest to every other point in the set? Figure 2.15 illustrates this in two different ways.

The median m is found by minimization of the following function:

\[
|m - x_1| + |m - x_2| + |m - x_3| + \cdots + |m - x_n| = \sum_{k=1}^{n} |x_k - m|.
\]

In signal filtration, the Median filter preserves the range and edges of the original signal, in contrast to the mean filter, which destroys the signal edges. For signals with many consecutive noisy points, the length of the median filter must be extended to retain this behavior. The Median filter has the root property, whereby the output of the filtration will be identical to the previous output after a certain number of filtration iterations. The Median filter is effective in removing salt and pepper noise.

Figure 2.15: Illustration of the median of a set of points in different dimensions. (a) median on the line; (b) median in space.
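A minimal MATLAB sketch of 1-D median filtration with a sliding window, illustrating the edge-preserving behavior described above (the toy signal and window length are our own):

    % 1-D median filter with a window of length 2k+1 (signal ends replicated).
    x = [1 1 1 9 1 1 5 5 5];               % toy signal with a salt-and-pepper spike
    k = 1;                                 % half-window, i.e., window length 3
    xp = [repmat(x(1), 1, k), x, repmat(x(end), 1, k)];
    y = zeros(size(x));
    for n = 1:length(x)
        y(n) = median(xp(n : n + 2*k));    % middle order statistic of the window
    end
    % y = [1 1 1 1 1 1 5 5 5]: the spike is removed, the step edge is preserved.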

2.4 Transform-based image enhancement

In parallel to directly processing image pixels in the spatial domain by contrast enhancement methods, transform-based methods of enhancement manipulate the spectral coefficients of an image in the domain of the transform. The primary benefits of these methods are low computational complexity and the usefulness of unitary transforms for filtering, coding, recognition, and restoration analysis in signal and image processing. First, the operators that transform the domain of the image are introduced, followed by methods of enhancement in the transform domain.

2.4.1 Transforms

Each of the following transforms, presented here in one dimension, can easily be extended into two dimensions, which is where the transforms are useful for image processing.

Fourier Transform

The one-dimensional discrete Fourier transform (1-D DFT) maps a real-valued sequence in the time domain to the complex domain, resulting in time domain signals being transformed into the frequency domain. The direct transform and inverse transform pair are defined, for a discrete function x_n, as

\[
F_p = \sum_{n=0}^{N-1} x_n \cos\frac{2\pi np}{N} - j\, x_n \sin\frac{2\pi np}{N}
\]

\[
x_n = \frac{1}{N} \sum_{p=0}^{N-1} F_p \cos\frac{2\pi np}{N} + j\, F_p \sin\frac{2\pi np}{N}
\]

where n = 0, 1, ..., N − 1 represents discrete time points and p = 0, 1, ..., N − 1 represents discrete frequency points. The basis functions for this transform are complex exponentials. The "real" and "imaginary" parts of this sum are considered as the sum of the cosine terms and the sum of the sine terms, respectively, and are computed by the fast Fourier transform.

37 Hartley Transform

Similar to the Fourier transform is the Hartley transform, which generates only real coefficients. This transform is defined in the one-dimensional case as

\[
H_p = \sum_{n=0}^{N-1} x_n \left( \cos\frac{2\pi np}{N} + \sin\frac{2\pi np}{N} \right) = \sum_{n=0}^{N-1} x_n\, \mathrm{cas}\frac{2\pi np}{N},
\]

where the basis function is cas(t) = cos(t) + sin(t). The inverse transform is calculated by

\[
x_n = \frac{1}{N} \sum_{p=0}^{N-1} H_p\, \mathrm{cas}\frac{2\pi np}{N}.
\]

Cosine Transform

The cosine transform or cosine transform of type 2 is determined by the following basis functions: ⎧ ⎪ 1 ⎨⎪ √ , if p =0 2N φp(n)=⎪ 1 π(n +1/2)p ⎩⎪ √ cos , if p =0 N N

for the p =0case as N−1 c 1 X0 = √ xn 2N n=0 and for the p =0 case as

N−1 c 1 π(n +1/2)p Xp = √ xncos N n=0 N

N−1          1 πn pπn πn pπn = √ xn cos cos − sin sin N n=0 2N N 2N N where p =1:(N − 1).

38 Paired Transform

The one dimensional unitary discrete paired transform (DPT), also known as the Grigoryan trans- form is described in the following way. The transform describes a frequency-time representation

of the signal by a set of short signals which are called the splitting-signals. Each such signal is

generated by a frequency and carries the spectral information of the original signal in a certain set of frequencies. These sets are disjoint. Therefore, the paired transform transfers the signal into a

space with frequency and time, or space which represents a source "bridge" between the time and

frequency. Consider the most interesting case, when the length of signals is N =2r, r>1.Let p, t ∈ XN = {0, 1,...,N − 1},andletχp,t(n) be the binary function ⎧ ⎨⎪ 1, if np = tmodN − χp,t(n)=⎪ n =0:(N 1). ⎩ 0, otherwise

Given a sample p ∈ XN and integer t ∈ [0,N/2], the function

χp,t(n)=χp,t(n) − χp,t+n/2(n) is called the 2-paired, or shortly the paired function.

The complete set of these functions is defined for frequency points p =2k,k =0,...,r− 1 and p =0, and time points 2kt. The binary paired functions can also be written as the following transformation of the consine function:

r−k χ2k,2kt(n)=M(cos(2π(n − t)/2 )), (χ0,0(n) ≡ 1), where t =0:(2r−k−1 − 1). M(x) is the real function which is not zero only on the bounds of the interval [−1, 1] and takes values M(−1) = −1 and M(1) = 1. The paired functions are determined by the extremal values of the consine functions, when they run through the interval with different frequencies.

39 The totality of the N paired functions

r−n−1 {χ2k,2kt; n =0:(r − 1),t=0:(2 − 1, 1}

is the complete and orthogonal set of functions [132,134].

Haar Transform

The Haar transform is the first orthogonal transform found after the Fourier transform, which is

now widely used in wavelets theory and in applications in image processing, in the N =2r, r>1 the transform is defined without normalization by the following matrix: ⎡ ⎤ ⎢ 11⎥ [HA2]=⎣ ⎦ 1 −1

⎡ ⎤

⎢ [HA2][HA2] ⎥ [HA4]=⎣ √ √ ⎦ , 2I2 − 2I2

where I2 is the unit matrix 2 × 2,andfork>2 ⎡ ⎤

⎢ [HA2k][HA2k] ⎥ [HA2k+1 ]=⎣ √ √ ⎦ . k k 2 I2k − 2 I2k

Heap Transform

The discrete Heap transform is a new concept which was introduced by Artyom Grigoryan in 2006 [135]. The basis functions of the transformation represent certain waves which are propagated in

the “field" which is associated with the signal generator. The composition of the N-point discrete

heap transform, T, is based on the special selection of a set of parameters ϕ1, ..., ϕm, or angles from the signal generator and given rules, where m ≥ N − 1. The transformation T is considered

40 separable, which means there exist such transformations Tϕ1 ,Tϕ2 , ..., Tϕm that

T = Tϕ1,...,ϕm = Tϕi(m) ...Tϕi(2) Tϕi(1)

where i(k) is a permutation of numbers k =1, 2, ..., m.

Consider the case when each transformation Tϕk changes only two components of the vec-

tor z =(z1, ..., zN−1) . These two components may be chosen arbitrarily and such a selection is

defined by a path of the transform. Thus, Tϕk is represented as

z → z z Tϕk : (z1, ..., zk1−1,fk1 ( ,ϕk),zk1+1, ..., zk2−1,fk2 ( ,ϕk),zk2+1, ..., zm). (2.1)

Here the pair of numbers (k1,k2) is uniquely defined by k, and 1 ≤ k1

as well as all functions fk2 (z,ϕ) equal to a function g(z,ϕ). The n-dimensional transformation

T = Tϕ1,...,ϕm is composed by the transformations

Tk1,k2 (ϕk): (zk1 ,zk2 ) → (f(zk1 ,zk2 ,ϕk),g(zk1 ,zk2 ,ϕk)).

The selection of parameters ϕk,k=1:m, is based on specified signal generators x, the num- ber of which is defined through the given decision equations, to achieve a uniqueness of parameters

and desired properties of the transformation T. Consider the case of two decision equations with one signal-generator.

Let f(x, y, ϕ) and g(x, y, ϕ) be functions of three variables; ϕ is referred to as the rotation parameter such as the angle, and x and y as the coordinates of a point (x, y) on the plane. It is

assumed that, for a specified set of numbers a, the equation g(x, y, ϕ)=a has a unique solution with respect to ϕ, for each point (x, y) on the plane or its chosen subset.

41 The system of equations ⎧ ⎨⎪ f(x, y, ϕ)=y0 ⎪ ⎩ g(x, y, ϕ)=a is called the system of decision equations [135]. First the value of ϕ is calculated from the second

equation which we call the angular equation. Then, the value of y0 is calculated from the given

input (x, y) as y0 = f(x, y, ϕ). It is also assumed that the two-point transformation

Tϕ :(z0,z1) → (z0,z1)=(f(z0,z1,ϕ),g(z0,z1,ϕ)),

which is derived from the given decision equations by Tϕ :(x, y) → (f(x, y, ϕ),a), is unitary. We call Tϕ the basic transformation. Example 1: Consider the following functions that describe the elementary rotation:

f(x, y, ϕ)=x cos ϕ − y sin ϕ,

g(x, y, ϕ)=x sin ϕ + y cos ϕ.

Given a real number, the basic transformation is defined as the rotation of the point (x, y) to the horizontal Y = a,

Tϕ :(x, y) → (x cos ϕ − y sin ϕ, a).

The rotation angle ϕ is calculated by     a y ϕ = arccos + arctan . x2 + y2 x

Thefirstpairtobeprocessedis(x0,x1),

(1) (x0,x1) → (x0 ,a),

42 the next is (y0,x2), (1) (2) (x0 ,x2) → (x0 ,a),

(2) with the new value of x0 = x0 , and so on. The first component of the signal is renewed and − − participates in calculation of all (N 1) basic transformations Tk = Tϕk ,k =1:(N 1). (k) Therefore, at the stage k, the first component of the transform is y0 = x0 . The complete transform of the signal-generator x is

(N−1) T (x)=(y0,a1,a2,...,aN−1), (y0 = x0 ).

The signal-flow graph of processing the five-point generator x is shown in Figure 2.16.

x y y y y 0 0 0 0 0

T 1 T 2 a 1 T 3 x 1 a 2 T 4

x 2 a 3

x T =T(φ ), k=1:4 3 k k a φ =r(y ,x ,a ) 4 k 0 k k x 4

Figure 2.16: Signal-flow graph of determination of the five-point transformation by a vector x = (x0,x1,x2,x3,x4) .

This transform is applied the the input signal zn in the same order, or path P , as the generator x. In the first stage the first two components are processed

(1) (1) Tϕ1 :(z0,z1) → (z0 ,z1 ), next (1) (2) (1) Tϕ2 :(z0 ,z2) → (z0 ,z2 ),

43 (N−2) z z(1) z(2) z (N−1) 0 0 0 0 z T T ... T 0 Level 2 φ φ φ 1 2 ... N−1 (1) (1) (1) z z z z z z 1 1 2 2 N−1 N−1

φ φ φ 1 2 N−1 (N−2) x x(1) x(2) x y 0 0 0 0 0 φ ,T φ ,T ... φ ,T Level 1 1 1 2 2 ... N−1 N−1 x x x 1 2 N−1

Figure 2.17:Networkofthex-induced DsiHT of the signal z. and so on. The result of the transform is

(n−1) (1) (1) (1) T [z]=(z0 ,z1 ,z2 ,...,zN−1),a=0.

Now consider the case when all parameters ak =0, i.e., when the whole energy of the vector x is collected in one heap, and then transfered to the first component. In other words, we consider the Givens rotations of vectors, or points (y0,xk) on the horizontal Y =0. Figure 2.16 shows

the transform-network of the transform of the signal z =(z0,z1,z2, ..., zN−1) . The parameters (angles) of the transformation are generated by the signal-generator x. In the 1st level and the kth (k−1) stage of the flow-graph, the angle ϕk is calculated by inputs (x0 ,xk), where k ∈{1,N − 1} (0) and x0 = x0. This angle is used in the basic transform Tk = Tϕk to define the next component (k) x0 , as well as to perform the transform of the input signal z, in the 2nd level. The full graph itself

represents a co-ordinated network of transformation of the vector z, under the action on x.

2.4.2 Enhancement methods

The common algorithm for image enhancement via a 2-D invertible transform consists of: The

frequency ordered system-based method can be represented as

−1 x → X = T(x) → O · X → T [O(X)] = x.

44 Algorithm 2.1 Transform based image enhancement 1. Perform the 2-D unitary transform

2. Multiply the transform coefficients, X(p, s) by some factor, O(p, s)

3. Perform the 2-D inverse unitary transform

O is an operator which could be applied on the coefficients X(p, s) of the transform or its real

and imaginary parts ap,s and bp,s if the transform is complex. For instance, they could be X(p, s),

α α α α ap,s, bp,s,orlog ap,s,log bp,s. The cases of greatest interest are when O(X)p,s is an operator of magnitude and when O(x)p,s is performed separately on the coefficients. Let X(p, s) be the transform coefficients and let the enhancement operator O be of the form

X(p, s) · C(p, s), where the latter is a real function of the magnitude of the coefficients, i.e.,

C(p, s)=f(|X|)(p, s). C(p, s) must be real since only modification of the magnitude and not phase information is desired. The following possibilities are a subset of methods for modifying the magnitude coefficients within this framework.

γ α−1 1. C1(p, s)=C(p, s) |X(p, s)| , 0 ≤ α<1 (which is the so-called modified α-rooting);   β λ 2. C2(p, s)=log |X(p, s)| +1 , 0 ≤ β,0 <λ;

3. C3(p, s)=C1(p, s) · C2(p, s).

α, λ,andβ are the parameters of the enhancement which are selected by the user to achieve the desired enhancement. Denoting by θ(p, s) ≥ 0 the phase of the transform coefficient X(p, s),the transform coefficient can be expressed as

jθ(p,s) X(p, s)=|X(p, s)|e

where |X(p, s)| is the magnitude of the coefficients. The investigation of the operator O applied to the modules of the transform coefficients instead of directly to the transform coefficients X(p, s)

45 will be performed as

O(X)(p, s)=O(|X|)(p, s)|e[jθ(p,s)].

The assumption that the enhancement operator O(|X|) takes one of the forms Ci(p, s)|X(p, s)|,i= 1, 2, 3 at every frequency point (p, s) is made. Figure 2.18 shows Galaxy NGC 4242 in the time domain (pixel intensity values) and frequency domain (spectral coefficients).

(a) intensity image (b) spectral coefficients

Figure 2.18: Intensity values and spectral coefficients of Galaxy NGC 4242.

Figure 2.19 shows Butterworth lowpass filtering for Galaxy UGC 7617 for n =2and D0 =

120. The transfer function of the filter of order n with cutoff frequency at a distance D0 from the origin is defined as 1 X(p, s)= 2n . 1+[D(p, s)/D0]

α-rooting

Figure 2.20 shows the enhancement of Galaxy NGC 4242 by method C1(p, s) with α =0.02.

Heap transform

Figure 2.21 shows the results of enhancing galaxy images PIA 14402 and NGC 5194 by the Heap transform.

46 (a) original image (b) low pass filtering

Figure 2.19: Butterworth lowpass filtering performed in the Fourier (frequency) domain.

(a) original image (b) enhancement by α =0.02

Figure 2.20: α-rooting enhancement of Galaxy NGC 4242.

2.5 Image Preprocessing

The steps taken to prepare the galaxy images for feature extraction are detailed in this section. The position, size, and orientation of the galaxy varies from image to image. Therefore, the prepro- cessing steps will produce a training set that is invariant to galaxy position, scale and orientation.

Individual galaxies were cropped from the digitized photographic plates and processed manually by adjusting parameters at several stages in the pipeline. Automatic selection of these parameters if part of future work. Figure 2.5 shows the computational scheme for the classification pipeline.

47 Figure 2.21: Top: Galaxy PIA 14402, Bottom: NGC 5194, both processed by Heap transform.

2.5.1 Segmentation

Other than the object of interest, galaxy images contain stars, gast, dust, and artifacts induced during the imaging and scanning process. For a galaxy to be recognized, such contents not included in the galaxy need to be removed. In general, this process involves denoising and inpainting. Here, the background is subtracted via a single threshold or Otsu’s method. Otsu’s method is calculated in Matlab by the command graythresh. Otsu’s method automatically selects a good threshold for images where there are few stars and the galaxy intensity varies greatly from the background. As the quantity and size of stars increase in the image, or when the background is close in intensity to the galaxy, Otsu’s method is not performing well. After background subtraction by thresholding, stars and other artifacts are removed by the morphological opening operation by different values of pixel connectivity using the Matlab function bwareaopen.

A grayscale image relates to a function f(x, y) that takes values from a finite interval [0,M]. In the discrete case, M is considered to be a positive integer. Consider an image with only one

48 Galaxy Images

Segmentation: Thresholding Morphological Opening

Feature Invariance: Rotation, Centering, Resizing

Canny Edge Detection

Feature Extraction: Elongation Form Factor Convexity Bounding-rectangle-to-fill-factor Bounding-rectangle-to-perimeter Asymmetry Index

Support Vector Machine

Galaxy Classes

Figure 2.22: Computational scheme for galaxy classification.

49 object ⎧ ⎨⎪ 1(x, y) ∈ O ⊂ X f(x, y)=⎪ ⎩ 0 otherwise

where O is the set of pixels in the object, and X is the whole domain of the image. The function f(x, y) represents a binary image. Any number can be used instead of 1, e.g., 255. Thresholding is defined as the following procedure ⎧ ⎨⎪ 1 f(x, y) ≥ T g(x, y)=gT (x, y)=⎪ ⎩ 0 otherwise where T is a positive number from the interval [0,M]. This number is called a threshold.

Otsu’s method begins by representing a grayscale image by L gray levels. ni represents the number of pixels at level i, and the total number of pixels N = n1 + n2 + ...+ nL.Theimage histogram is then described by a probability distribution

L ni pi = ,pi ≥ 0, pi =1. N i=1

The intensity values are then separated into two classes C0 and C1 by a threshold k,whereC0 represents the intensities [0,...,k] and C1,[k +1,...,L]. The occurrence, mean levels for each class are respectively given by

k w0 = Pr(C0)= pi = w(k) i=1

L w1 = Pr(C1)= pi =1− w(k) i=k+1 and k k ipi μ(k) μ0 = iPr(i|C0)= = 0 i=1 i=1 w w(k)

50 L L ipi μT − μ(k) μ1 = iPr(i|C1)= = 1 − i=k+1 i=k+1 w 1 w(k) where w(k) and μ(k) are the zeroth- and first-order moments up the the kth level, respectively, and

L μT = μ(L)= ipi i=1 is the total mean level of the original image. The following relationships are easily verified for any k

w0μ0 + w1μ1 = μT ,w0 + w1 =1. (2.2)

The class variances are given by

k k 2 2 2 (i − μ0) pi σ0 = (i − μ0) Pr(i|C0)= i=1 i=1 w0

L L 2 2 2 (i − μ1) pi σ1 = (i − μ1) Pr(i|C1)= . 1 i=k+1 i=k+1 w

The following criteria to measure k as an effective threshold are introduced from discriminant analysis 2 2 2 σB σT σB λ = 2 ,κ= 2 ,η= 2 , σW σW σT

where

2 2 2 σW = w0σ0 + w1σ1

2 2 2 σB = w0(μ0 − μT ) + w1(μ1 − μT )

and from equation 2.2 L 2 2 σT = (i − μT ) pi i=1 are the within-class variance, the between-class variance, and the total variance of levels, respec-

tively.

51 Through relationships between the criteria, the problem becomes finding the k that maximizes

2 the criterion η or equivalently σB by 2 σB η(k)= 2 σT

or 2 2 [μT w(k) − μ(k)] σB(k)= , w(k)[1 − w(k)] and, as shown in [136], the optimal threshold k∗, restricted to the range S∗ = {k;0

2 2 σB(k∗)= max σB(k). 1≤k

Figure 2.23 shows original images with subtracted backgrounds by different manual thresholds and Otsu’s method.

(a) Original image (b) T =60

(c) T =74 (d) Otsu’s T =85

Figure 2.23: Background subtraction of Galaxy NGC 4274 by manual and Otsu’s thresholding.

The average difference between single thresholds and thresholds by Otsu’s method for the enhanced data set was 6.67 with a standard deviation of 11.21. Mathematical morphology provides image processing with powerful nonlinear filters which

52 operate according to the Minkowski’s addition and subtraction. Given subsets X and B of Rn,

Minkowski’s addition, X ⊕ B,ofsetsX and B is the set

 X ⊕ B = {Xb = {x + b; x ∈ X}}. b∈B

For the set Bˇ = {−b; b ∈ B} symmetric to B with respect to the origin, the set X ⊕ Bˇ is called a

dilation of the set X by B.ThesetB is said to be a structuring element. So, in the symmetric case, if Bˇ = B, Minkowski’s addition of sets X and B and the dilation of X by B are the same concepts.

The dual operation to Minkowski’s addition of sets X and B is the subtractions, X B,which is defined as  c c X B =(X ⊕ B) = {Xb = {x + b; x ∈ X}}. b∈B

The set X Bˇ dual to the dilation X ⊕ Bˇ is called an erosion of the set X by B. By means of dilation and erosion of sets, the corresponding operations of opening, X ◦ Bˇ, and closing, X • Bˇ, can be defined as  X ◦ Bˇ =(X Bˇ) ⊕ Bˇ = {x + Bˇ; x + Bˇ ⊂ X}

X • Bˇ =(Xc ◦ Bˇ)c =(X ⊕ Bˇ) B.ˇ

Herewith, the operation of opening of X by B is dual to the operation of closing of X by B, i.e.,

X ◦ Bˇ =(Xc • Bˇ)c. Figure 2.24 shows star and artifact removal of Galaxy NGC 5813 with pixel connectivity P =64.

2.5.2 Rotation, Shifting and Resizing

To achieve invariance to orientation, position, and scale, the galaxies were shifted by their geomet- rical center, rotated by the angle between their first principal component and the image x-axis, and resized to a uniform size of 128x128 pixels, respectively.

53 (a) original image (b) thresholded image

(c) opened image

Figure 2.24: Morphological opening for star removal from Galaxy NGC 5813.

54 The geometrical center, or centroid, of an object in an image is the center of mass of the object. The center is the point where one can concentrate the whole mass of the object without changing the first moment relative to any axis. The first moment with respect to the x axis is defined by

μx f(x, y)dxdy = xf(x, y)dxdy. X X

The first moment with respect to the y axis is defined by

μy f(x, y)dxdy = yf(x, y)dxdy. X X

The coordinate of the object center is then (μx,μy).

In the discrete case, the first moment with respect to the axis x is defined by

μx fn,m = nfn,m = n fn,m n m n m n m

and with respect to the y axis

μy fn,m = mfn,m = m fn,m n m n m n m

where the summation is performed over all pixels (n, m) of the object O. The center of the object is defined as ⎛ ⎞ nfn,m mfn,m ⎜ ⎟ ⎜ n m n m ⎟ (μx,μy)=⎝ , ⎠ . fn,m fn,m n m n m

55 In the discrete binary case, the center is defined as ⎛ ⎞ ⎛ ⎞ ⎜ n m⎟ ⎜ n m⎟ ⎜ (n,m)∈O (n,m)∈O ⎟ ⎜ (n,m)∈O (n,m)∈O ⎟ ⎜ ⎟ ⎜ ⎟ (μx,μy)=⎝ , ⎠ = ⎝ , ⎠ 1 1 card(O) card(O) (n,m)∈O (n,m)∈O where card(O) is the cardinality of the set O that defines the binary image. To find the orientation of an object in an image, if possible or if such exists and is unique, consider a line along which the second moment is minimum. In other words, consider the integral

2 E = μ2(l)= r f(x, y)dxdy (2.3) l where r is the distance of point (x, y) from the line l, i.e., the length of the perpendicular emitted from point (x, y) to the line l. The line l is described by the equation

l : xsinθ − ycosθ + p =0 where p is the length of the perpendicular drawn from the origin (0, 0) to the line l. Therefore, 2.3 can be rewritten as

2 E = E(θ)= (xsinθ − ycosθ + p) f(x, y)dxdy. (2.4) l

The following two denotations are made to for the image coordinates shifted by the geometrical center of the object

x = x − μx,y= y − μy,

and the second moments of the shifted object are denoted

a = (x)2f(x, y)dxdy,c= (y)2f(x, y)dxdy,b= (x)2(y)2f(x, y)dxdy. l l l

56 E(θ) can then be rewritten as

E(θ)=asin2(θ) − bsin(θ)cos(θ)+ccos2(θ) 1 1 1 or E(θ)= (a + c) − (a − c)cos(2θ) − bsin(2θ). 2 2 2

Differentiating E by θ gives

b E(θ) =0→ tan(2θ)= (a = c = b). a − c

Therefore, the angle of the orientation line l(θ) is found by

b a − c sin(2θ)=± , cos(2θ)=± . b2 +(a − c)2 b2 +(a − c)2

The angle of the orientation line l(θ) was calculated for each galaxy image, and the used to rotate the image by the Matlab function imrotate. Figure 2.25 shows this rotation for galaxy image

NGC 4096 by angle −64 degrees. Note that the image x-axis of the image in Matlab is vertical, and the desired orientation of the galaxy’s first principal component being collinear with the horizontal axis of the image is achieved by rotating the galaxy an additional 90 degrees.

(a) segmented galaxy (b) rotated galaxy

Figure 2.25: Rotation of Galaxy image NGC 4096 by galaxy second moment defined angle.

57 Resizing an image involves either subsampling if the desired image size is less than the original image size and resampling if the desired image size is greater than the original image. Subsampling reduces the size of an image by creating a new image with pixel value a calculated from the values of a neighborhood of pixels about a in the original image. Resampling from the image size of 128 × 128 into 256 × 256 is calculated by ⎡ ⎤ ⎢ ······⎥ ⎡ ⎤ ⎢ ⎥ ⎢ ⎥ ···· ⎢ · aabb· ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ · · ⎥ ⎢ · · ⎥ ⎢ ab ⎥ ⎢ aabb ⎥ ⎢ ⎥ → ⎢ ⎥ . ⎢ ⎥ ⎢ ⎥ ⎢ · cd· ⎥ ⎢ · ccdd· ⎥ ⎣ ⎦ ⎢ ⎥ ⎢ ⎥ ···· ⎢ · ccdd· ⎥ ⎣ ⎦ ······

Another process of subsampling is defined by calculation of means, as follows below for the

2 × 2 subsampling example where

a1 + a2 + b1 + b2 a3 + a4 + b3 + b4 a = ,b= 4 4 c1 + c2 + d1 + d2 c3 + c4 + d3 + d4 c = ,d= . 4 4

Image resizing is peformed in Matlab by the function imresize. Figure 2.26 shows an example

of image resizing from size 138 × 197 into size 128 × 128 for galaxy image NGC 4220.

2.5.3 Canny Edge Detection

The Canny edge detection method was developed by John Canny in 1986. The Canny edge detector

was developed to satisfy the performance criteria: (1) Good detection (2) Good localization (3) Only one response to a single edge. Good detection means reducing false positives (non edges

being detected as edges) and false negatives (edges not being detected). Good localization means that minimal error exists between identified edge points and true edge points. Only one response

58 (a) cropped image size 138 × 197 (b) image resized to 128 × 128

Figure 2.26: Resizing of Galaxy NGC 4220. to a single edge ensures that the operator eliminates the multiple maxima output from the filter at step edges. Canny formulated each of these three criterion mathematically and found solutions through numerical optimization. The result is that impulse response of the first derivative of a

Gaussian approximately the optimal edge detector which optimizes the signal-to-noise ratio and localization, i.e., the first two criteria. The edge detected algorithm is presented below here. Let f(x, y) denote the input image and G(x, y) denote the Gaussian function

x2 + y2 − G(x, y)=e 2σ2 .

The convolution of these two functions results in a smoothing of the input image and is written as

s(x, y)=f(x, y) ∗ G(x, y), where σ controls the degree of smoothing of the image.

First order finite difference approximations are used to compute the gradient of s(x, y) which

59 is written as [sx,sy] where δs δs sx = ,sy = . δx δy

The gradient magnitude and orientation or angle are respectively computed by

2 2 M(x, y)= sx + sy

and −1 sy α(x, y)=tan . sx

The array of image magnitudes will contain large values in the directions of greatest change. The array is then thinned so that only the magnitudes at the points of greatest local change remain. This procedure is called non maxima suppression. An example presents this notion. Consider a

3 × 3 grid where 4 possible orientations are found through the center point in the grid: horizontal, vertical, +45 degrees, and −45 degrees. All possible orientations have been discretized into these 4 orientations. A range of orientations is then specified to quantize the orientations. Edge direction

is determined by the edge normal, computed by 2.5.3.

Let dk, k =1, 2,...,nrepresent the discrete orientations where n is the number of orientations. Using the 3 × 3 grid, every nonmaxima suppression scheme at every point (x, y) in α(x, y) can be formulated as where st(x, y) is the nonmaxima suppressed image.

Algorithm 2.2 Nonmaxima suppression algorithm

1. Find the orientation dk which is closest to α(x, y)

2. Set st(x, y)=0ifM(x, y) is less than at least one of its two neighbors along dk,otherwise, st(x, y)=M(x, y).

Finally, a hysteresis thresholding is applied to st(x, y) to reduce falsely detected edges. Two

thresholds are used here and are referred to as a weak (or low) threshold τ1 and a strong (or high)

threshold τ2. Too low of a threshold will retain false positives. Too high of a threshold will remove

60 correctly detected edges. The double threshold produces two new images written as

stw(x, y)=st(x, y) ≥ τ1

where stw(x, y) denotes the image created due to the weak threshold and

sts(x, y)=st(x, y) ≥ τ2

where sts(x, y) denotes the image created due to the strong threshold. Edges in sts(x, y) are linked into contours by searching through an 8 pixel neighborhood in stw(x, y) for edges that can be linked to the end of the current edge. The output of the algorithm is the image of all nonzero points in stw(x, y) appended to sts(x, y). Canny edge detection was performed using the Matlab function edge with τ1 =0.3, τ2 =0.9 and σ =1.5. Figure 2.27 shows the Canny edge detector for multiple galaxy images.

2.6 Data Mining and Classification

The canonical problem addressed in the field of data mining and classification is the following:

Given a very large family of vectors (signals, images, etc.) each of which lives in a high dimen- sional space, how can the set be effectively represented this data for storage and retrieval, for

recognizing patterns within the images, and for classifying objects. In the subsequent sections, a small subset of the tools used in statistics, data mining, and machine learning in astronomy will be

investigated to address the posed problem of the representation and classification of galaxy images.

2.6.1 Feature Extraction

A useful galaxy feature descriptor varies in value so that a classifier can discriminates between

input galaxies and place each galaxy into one of several classes. The shape, or morphologi- cal, features used in this paper are described in [26, 31, 137] and are Elongation (E), Form Fac-

61 (a) NGC 6070 original (b) NGC 6070 canny edge

(c) NGC 4460 original (d) NGC 4460 canny edge

(e) NGC 4283 original (f) NGC 4283 canny edge

Figure 2.27: Canny edge detection.

62 tor (F), Convexity (C), Bounding-rectangle-to-fill-factor (BFF), Bounding-rectangle-to-perimeter (BP), and Asymmetry Index (AI). Table ?? gives the average values of the original data for these features.

Elongation has higher values for spiral and lenticular galaxies and lower values for irregular and elliptical galaxies. This feature can be written as

(a − b) E = (a + b) where a is the major axis and b is the minor axis. Form factor is useful in dividing spiral galaxies from other classes. This feature can be written as A F = P 2 where A is the number of pixels in the galaxy and P is the number of pixels in the galaxy edge found by canny edge detection.

Convexity has larger for spirals with open winding arms and lower values for compact galaxies such as are in the class elliptical. This feature can be written as

P C = (2H +2W ) where P is as defined above and H and W are the height and width of minimum bounding rectangle for the galaxy.

Bounding-rectangle-to-fill-factor is... This feature is defined as

A BFF = HW where A, H,andW are as defined above. Bounding-rectangle-to-perimeter shows a decreasing trend from compact and circular galaxies

63 Table 2.1: Morphological Feature Descriptions Feature Formula E (a−b)/(a+b) Has higher values for s F A/P 2 Form factor is useful in dividing spiral gala C P/(2H+2W ) Convexity has larger for spirals with open winding arms and lower v BFF A/HW BP HW/(2H+2W )2 Bounding-rectangle-to-perim P P |I(i,j)−I180(i,j)| AI i,j / i,j |I(i,j)| The asymmetry index tends towa

Table 2.2: Feature Values Per Class Feature Elliptical Lenticular Simple Spiral Barred Spiral Irregular E 0.071 0.382 0.547 0.485 0.214 F 0.059 0.049 0.025 0.029 0.044 C 0.888 0.872 1.05 1.01 0.953 BFF 0.744 0.699 0.609 0.583 0.634 BP 0.062 0.052 0.043 0.048 0.059 AI 0.274 0.375 0.510 0.464 0.354

to open and edge-on galaxies. This feature can be written as

HW BP = (2H +2W )2 where H and W are as defined above. The asymmetry index tends towards zero when the image is invariant under a 180 degree rota- tion. This feature can be written as |I(i, j) − I180(i, j)| i,j AI = |I(i, j)| i,j where I is the original image and I180 is the image rotated by 180 degrees.

2.6.2 Principal Component Analysis

Data may be highly correlated, but represented such that its axes are not aligned with the directions

in which the data varies the most. A data set generated by N observations with K measurements

64 per observation lives in a K-Dimensional space, each dimension, or axis, representing a feature of the data. To represent the data in a more compact form, the axes can be rotated to be collinear with the directions of maximum variance in the data, thereby discriminating between the data points. In other words, this rotation results in the first feature being collinear with the direction of maximum variance, the second feature being orthogonal to the first and maximizing the residual variance, and so on. This dimensionality reduction technique is called Principal Component Analysis (PCA), also known as the Karhunen-Loéve transform or Hotelling transform, and is depicted in Figure 2.28 for a bivariate Gaussian distribution. Consider the data set xi with N observations and K features

Figure 2.28: PCA rotation of axes for a bivariate Gaussian distribution. written as the N × K matrix X. The covariance matrix of zero mean data is estimated as

1 T CX = X X N − 1

65 where N is the dimension of the matrix and division by N−1 is necessary for CX to be an un-biased estimate of the covariance matrix. Nonzero components in the off diagonal entries represent corre- lation between the features, whereas zero components represent uncorrelated data. PCA transform the original data into equivalent uncorrelated data so that the covariance matrix of the new data is diagonal and the diagonal entries decrease from top to bottom. To achieve this, PCA attempts to

find a nonsingular matrix R which transforms X into such an ideal matrix. The data transforms to

Y = XR and its covariance estimate to

T T T CY = R X XR = R CX R

The first column r1 of R is the first principal component, and is along the direction of the data with maximum variance. The columns of R which are called principal components form an orthonormal basis of the data space. The first principal component r1 can therefore be derived using Lagrangian multipliers and setting equal to zero the cost function φ(r1,λ) as

T T φ(r1,λ)=r1 CX r1 − λ1(r1 r1 − 1).

δφ(ri,λ) Setting =0set then gives δri

CX r1 − λ1r1 =0 or CX r1 = λ1r1.

This shows that λ1 is an eigenvalue of the covariance matrix CX, i.e., a root of (CX − λ1I)=0.

T λ1 = r1 CXr1 being the largest eigenvalue in CX equates to maximizing the variance along the first principal component. The remaining principal components are derived in the same manner.

CX The matrix CY is the transformation of CX in the basis consisting of the columns of R,

the eigenvectors of CX . This comes to have the new basis, i.e., the columns of R have a basis, of eigenvectors of CX .SinceCX is symmetric by definition, the Spectral Theorem guarantees that

the eigenvectors of CX are orthogonal. These eigenvectors can be listed in any order and CY will

66 remain diagonal. However, the requirement of PCA is to list them such that the diagonal entries of

CY be in decreasing order of their values, which comes to a unique order of the eigenvectors which make the columns of R. The order of the components (or dimensions) is the so named rank-order

T according to variance. With CX = RCY R and these eigenvectors in this order, the set of principal components is defined.

The morphological feature data described in 2.6.1 was reduced in dimension from 6 to 2 by keeping the first two principal components for both the comparison of classification performance

with compressed data and visualization. All classification figures in the following sections were

generated from the classification of PCA features.

2.6.3 Support Vector Machines

The Support Vector Machine (SVM) learning algorithm captures the structure of a multi-class training data set towards predicting class membership of unknown data with correctness and high

decision confidence. Classes are divided by a decision boundary or hyperplane defined by with the minimum distance between the boundary and nearest point in each class defining the margins of

the boundary, which the SVM optimizes. Points that lie on the margin are called support vectors.

Consider a linear classifier for a binary classification problem with labels y, y ∈{−1, 1},and features x. The classifier is written as

T hw,b(x)=g(w x + b),

and ⎧ ⎨⎪ 1 if z ≥ 0 g(z)=⎪ ⎩ −1 otherwise

67 where w is the weight vector, and b is the bias of the hyperplane. Given a training example (x(i),y(i)), the functional margin of (w, b) is defined with respect to the training example as

γ(i) = y(i)(wT x(i) + b).

If y(i) =1,thenwT (x(i) + b) need to be a large positive number for a large functional margin, and, conversely, if y(i) = −1,thenwT (x(i) + b) needs to be a large negative number. A large functional margin represents a confident and correct prediction.

With the chosen g,ifw and b are scaled by 2, the function margin is scaled by a factor of 2.

T T However, since g(w x + b)=g(2w x +2b), no change would occur in hw,b(x). This shows that

T hw,b(x) depends only on the sign, and not the magnitude, of g(w x + b).

Given a training set S = {(x(i),y(i)); i =1, 2,...,m}, the functional margin of (w, b) with respect to S is defined as the smallest functional margin of the individual training examples and is written as

(i) γ =minγ . i=1,...,m

Another type of margin is the geometric margin. Consider the training set in Figure 2.6.3.

The hyperplane defined by (w, b) is shown, along with vector w, which is normal to the hyper- plane. Point A represents positive training example x(i) with label y(i) =1. The geometric margin of point A, γ(i), has distance of line segment AB. Point B is defined by x(i) − γ(i)w/||w||.Since point B is on the decision boundary, which satisfies the equation wT x + b =0,then

T (i) (i) w w x − γ + b =0. ||w||

Solving for γ(i) yields

T (i) T (i) w x + b w (i) b γ = = x + . ||w|| ||w|| ||w||

In general, the geometric margin of (w, b) with respect to any training example (x(i),y(i)) is given

68 6

A u w Q  Q u u Q Q (i) Q γ QB Q u Q Q u Q Q u u e Q e Q u Q e Q Q e e Q e Q Q e Q e e -

Figure 2.29: Pictorial representation of the development of the geometric margin.

by   T (i) (i) w (i) b γ = y x + . ||w|| ||w||

Note that if ||w|| =1, then the geometric margin equals the functional margin. Additionally, the geometric margin is invariant to scaling the parameters w and b.

Given a training set S = {(x(i),y(i)); i =1, 2,...,m}, the geometric margin of (w, b) with respect to S is defined as the smallest geometric margin of the individual training examples and is written as

(i) γ =minγ . i=1,...,m

Assuming the training data is linearly separable, the problem of determining the boundary decision that maximizes the geometric margin is posed as the follow optimization problem

(i) T (i) max γ subject to y (w x + b) ≥ γ,i =1, 2,...,m and ||w|| =1. γ,w,b

The ||w|| =1constraint is non-convex. To work towards recasting the optimization problem as convex, first recall that γ = γ/ ||w||. With this relation, the problem can then be written as an

69 6 QQ QQ u Q QQ Q u u Q QQ Q QQ QQ Q QQ QQ Q u Q QQ QQ Q Q QQ QQ Q u Q QQ u u QQe Q QQ e QQ Q u Q QQ e QQ Q Q QQ e QQ Q e Q QQ e e QQ Q QQ Q e QeQ -

Figure 2.30: Maximum geometric margin.

optimization of the functional margin that achieves the geometric margin optimization:

γ max subject to y(i)(wT x(i) + b) ≥ γ,i =1, 2,...,m. γ,w,b ||w||

γ Again, the object function is non-convex, and the problem cannot be solved by standard ||w|| optimization software.

Recall that w and b can be scaled without affecting the decision of our classifier. The scaling constraint that the functional margin of (w, b, ) with respect to the training set must be 1 is intro- γ 1 1 duced, γ =1. then becomes , and since maximizing is equivalent to minimizing ||w|| ||w|| ||w|| ||w||, the geometric margin convex optimization problem is then posed as

1 2 (i) T (i) min ||w|| subject to y (w x + b) ≥ 1,i=1, 2,...,m. γ,w,b 2

Which can be solved by the commercial quadratic programming (QP) code. Figure ?? illustrates the geometric margin for a training set.

Whereas the previous problem is referred to as the primal form, optimization theory tells of

70 a dual form for expressing the primal problem. Constructing the Lagrangian for the optimization problem gives m 1 2 (i) T (i) L(w, b, α)= ||w|| − αi[y (w x + b) − 1]. (2.5) 2 i=1

To find the dual form of the problem, L(w, b, α) is minimized with respect to w and b for fixed α. Setting the derivatives of L with respect to w and b to zero gives

m (i) (i) wL(w, b, α)=w − αiy x =0 i=1

which implies that m (i) (i) w = αiy x =0. (2.6) i=1 Take the derivative with respect to b gives

m δ (i) L(w, b, α)= αiy =0. (2.7) δb i=1

Substituting the definition of w in (2.2) into the Lagrangian in (2.1) yields

m m m 1 (i) (j) (i) T (j) (i) L(w, b, α)= αi − y y αiαj(x ) x − b αiy , i=1 2 i,j=1 i=1 but from (3) the last term is equal to zero, which gives

m m 1 (i) (j) (i) T (j) L(w, b, α)= αi − y y αiαj (x ) x . i=1 2 i,j=1

This result, along with the constraints αi ≥ 0 and (10), the following dual optimization problem is obtained

m m 1 (i) (j) (i) (j) max W (α)= αi − y y αiαjx ,x  α i=1 2 i,j=1 m (i) subject to αi ≥ 0,i=1, 2,...,m, and αiy =0. i=1 71 Suppose the model’s parameters have been fit to a training set. The task now is to predict class

membership of a new point input x by calculating wT x + b and, if this quantity is grater than zero,

predict y =1. Using the expression for w in (2.2), this calculation can be written   m T T (i) (i) w x + b = αiy x x + b (2.8) i=1

m (i) (i) g(z)= αiy x ,x + b (2.9) i=1

(i) where the points x for which αi =0 are the support vectors. So far, the assumption for the data is linear separability. In application this assumption is

relaxed by the introduction of slack variables ξi leading to the primal minimization formulation

1 2 (i) T (i) min ||w|| subject to y (w x + b) ≥ 1 − ξi,i=1, 2,...,m. γ,w,b 2

with the following constraints limiting the amount of slack

ξi ≥ 0 and ξi ≤ C. i

Therefore, misclassification is bounded in quantity by C. Finally, the SVM optimization is equivalent to minimizing

m (i) 2 (1 − y g(xi))+ + λ||w|| , (2.10) i=1 where λ is related to the misclassification bound C and the index + indicates x+ = max(0,x). Figure 2.31 shows the SVM decision boundary computed for the data of 15 galaxies having class membership to either class Irregular or Regular. SVM maps data from the input space Υ to afeaturespaceF using a nonlinear function φ :Υ→Fcalled a kernel so that the discriminant

72 6 I (training) 5 I (classified) R (training) R (classified) 4 Support Vectors

3

2

1

0

−1

−2

−3 5 10 15 20 25 30

Figure 2.31: SVM applied to galaxy data. function becomes

T hw,b(x)=w φ(x)+b. (2.11)

Many kernel functions are possible, and the present work has used the quadratic kernel

T d K(x, x )=(x x +1) . (2.12)

2.7 Results and Discussion

The galaxy data used in this classification is listed in Table 2.3. The name of each galaxy is given

along with its corresponding classification obtained from the NASA/IPAC Extragalactic Database (NED) and the relation between the NED classification and the scheme used in the present work.

Only the major galaxy classes Elliptical "E," Lenticular "S0," Spiral "S," Barred Spiral "SB," and

73 Irregular "Irr" were used in classification. All subclasses listed in the table below such as Sa, Sd, SBm, etc... were, in the SVM training and validation, generalized to belong to their respective major class. Galaxy NGC 4457 has NED classification SAB0/a(s), which is interpreted as either

S0 or SBa for compliance with the present scheme, and was judicially assigned to class barred spiral (SB) due to similarities between the feature values of NGC 4457 and the SB class. Galaxy

NGC 4144 has NED classification SAB(s)cd? edge-on and was not used in classification since a definite relation to the present classification scheme was unable to be determined.

Table 2.3: Galaxy list and relation between NED classification and current project classification

Galaxy name N.E.D. Class Present Work Class

NGC 4278 E1-2 E

NGC 4283 E0 E

NGC 4308 E? E

NGC 5813 E1-2 E

NGC 5831 E3 E

NGC 5846 E0-1 E

NGC 5846A compact E2+ E

NGC 4346 S0 edge-on S0

NGC 4460 SB0ˆ+(s)? edge-on S0

NGC 4251 SB0? edge-on S0

NGC 4220 SA0ˆ+(r) S0

NGC 4346 S0 edge-on S0

NGC 4324 SA0ˆ+(r) S0

NGC 5854 SB0 S0

NGC 5838 SA0ˆ- S0

NGC 5839 SAB0ˆ0?(rs) S0

NGC 5864 SB0ˆ0(s)? edge-on S0 74 Table 2.3: Continued NGC 5865 SAB0ˆ- S0

NGC 5868 SAB0ˆ- S0

NGC 4310 SAB0ˆ+(r) S0

NGC 4218 Sa? Sa

NGC 4217 Sb edge-on Sb

NGC 4100 SA(rs)bc Sb/Sc

UGC 10288 Sc: edge-on Sc

NGC 6070 SA(s)cd Sc/Sd

UGC 07617 Sd Sd

NGC 4457 SAB0/a(s) (S0)/SBa

NGC 4314 SB(rs)a SBa

NGC 4274 SB(r)ab SBa/SBb

NGC 4448 SB(r)ab SBa/SBb

NGC 4157 SAB(s)b? edge-on SBb

NGC 5850 SB(r)b SBb

NGC 5806 SAB(s)b SBb

NGC 4232 SBb pec? SBb

NGC 4088 SAB(rs)bc SBb/SBc

NGC 4258 () SAB(s)bc SBb/SBc

NGC 4527 SAB(s)bc SBb/SBc

NGC 4389 SB(rs)bc pec? SBb/SBc

NGC 4496 SBc SBc

NGC 4085 SAB(s)c SBc

NGC 4096 SAB(rs)c SBc

NGC 4480 SAB(s)c SBc

75 Table 2.3: Continued UGC 10133 SAB(r)c SBc

NGC 4559 SAB(rs)cd SBc/SBd

NGC 4242 SAB(s)dm SBd

NGC 4393 SABd SBd

NGC 4288 SB(s)dm SBd/SBm

NGC 3985 SB(s)m SBm

NGC 4449 IBm Irr

UGC 07408 IAm Irr

UGC 07577 Im Irr

UGC 07639 Im Irr

UGC 07690 Im Irr

NGC 4496B IB(s)m Irr

NGC 4144 SAB(s)cd? edge-on not used

The classification scheme used in this project is a subset of Hubble’s classification scheme; galaxies are assigned to 1 of the 5 major classes: Elliptical "E," Lenticular "S0," Spiral "S," Barred

Spiral "SB," and Irregular "Irr." Classification was performed by two classes at a time using Sup- port Vector Machines (SVM) in Matlab’s Statistical Toolbox with both a linear and quadratic kernel and default parameters. The Matlab functions svmtrain and svmclassify were used to train the classifiers and and perform validation, respectively. The pairs used at each iteration is shown in Figure 2.32. The idea was to iteratively perform classification between the whole remaining set and a single class, removing the classified set from the remaining whole in the next iteration. The training and validation sets are separated such that approximately one third of the data was used for validation while the remainder was used for training. The extracted feature data is was listed in a spreadsheet and sorted by class starting from elliptical to lenticular through barred spirals and

76 Iteration 1 Irregular Regular

Iteration 2 Elliptical Not Elliptical

Iteration 3 Lenticular Spiral

Iteration 4 Simple Spiral Barred Spiral

Figure 2.32: Classification iteration class pairs. irregular. The bottom one third of each class was reserved fro validation while the top one third was used for training. This process was a single-fold validation.

For Iteration 1, galaxies were assigned to the Irregular or Regular class. Training was per- formed on a set of 40 galaxies consisting of 5 irregular and 35 regular, with class membership ranging from elliptical to spiral and barred spiral. All 6 morphic features were used in the training.

The validation set contained 15 galaxies: 1 irregular and 14 regular. Of the validation set, 7/15 galaxies were classified correctly giving an accuracy of 46.6667%. Principal component analysis

(PCA) was applied to the training and validation sets, and the data was projected onto the first two principal components. Using the reduced data as input the SVM yielded a classification accuracy of 13.3333%. Using the quadratic kernel for the SVM classification yielded 13/15 (86.6667%) and

12/15 (80%) accuracy for 6 and 2 features, respectively. Figure 2.33 shows classification in the PCA feature space for each kernel. The legend indicates the symbols Irregular (I) and Regular (R).

For all subsequent classification of un-enhanced galaxy images the irregular class was removed from the training and validation sets. The next pair of classes used for SVM training is Elliptical and Not Elliptical. The label vector

77 1.2 1.2 I (training) I (training) I (classified) I (classified) 1 1 R (training) R (training) R (classified) R (classified) 0.8 Support Vectors 0.8 Support Vectors

0.6 0.6

0.4 0.4

0.2 0.2

0 0

−0.2 −0.2

−0.4 −0.4 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8

(a) linear kernel (b) quadratic kernel

Figure 2.33: PCA feature space iteration 1 classification. used was binary with entries 1 for Elliptical and 0 for Not Elliptical. The training set consisted of

34 galaxies: 5 elliptical and 29 galaxies belonging to classes lenticular, spiral and barred spiral. The validation set contained 15 galaxies: 2 elliptical and 13 others. All 6 morphic features were used. Classification accuracy was 13/15 (86.6667%). PCA was applied to the data set and the data was projected onto the first two principal components. Classification in the reduced feature space was 12/15 correctly classified galaxies or 80% accuracy. Using the quadratic kernel for the SVM classification yielded 3/15 (20%) accuracy for both sets of 6 and 2 features. Figure 2.34 shows classification in the PCA feature space for each kernel. The legend indicates the symbols Elliptical

(1) and Not Elliptical (0).

Elliptical galaxies were then removed from the training and test sets for all subsequent classi- fication of un-enhanced galaxy images.

Lenticular and Spiral are the next two classes to be trained by the SVM. The training set con- sisted of 9 lenticular galaxies and 20 spiral galaxies, while the test set consisted of 4 lenticular and

9 spiral galaxies. All 6 morphic features were used. Classification accuracy was 11/13 (84.6154%) and 8/13 (61.5385%) with the linear and quadratic kernels, respectively. Classification accuracy of the two PCA features with the linear and quadratic kernels respectively was 9/13 (69.2308%) and

3/13 (23.0769%). Figure 2.35 shows classification in the PCA feature space for each kernel. The

78 1.2 1.2 0 (training) 0 (training) 0 (classified) 0 (classified) 1 1 1 (training) 1 (training) 1 (classified) 1 (classified) 0.8 Support Vectors 0.8 Support Vectors

0.6 0.6

0.4 0.4

0.2 0.2

0 0

−0.2 −0.2

−0.4 −0.4 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8

(a) linear kernel (b) quadratic kernel

Figure 2.34: PCA feature space iteration 2 classification. legend indicates the symbols Lenticular (1) and Spiral (0). Lenticular galaxies were then removed

1.4 1.4 0 (training) 0 (training) 0 (classified) 0 (classified) 1.2 1.2 1 (training) 1 (training) 1 (classified) 1 (classified) 1 Support Vectors 1 Support Vectors

0.8 0.8

0.6 0.6

0.4 0.4

0.2 0.2

0 0

−0.2 −0.2 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8

(a) linear kernel (b) quadratic kernel

Figure 2.35: PCA feature space iteration 3 classification. from the training and test sets for all subsequent classification of un-enhanced galaxy images.

The final categories to be trained by SVM are Simple Spiral, which are referred to as spiral, and Barred Spiral. The training set contained 5 simple spirals and 15 barred spirals. Validation was performed by 2 simple and 7 barred spirals. All 6 morphic features were used. The SVM clas- sified 7/9 galaxies correctly giving 77.7778% accuracy. After PCA, 2/9 galaxies were classified correctly giving 22.2222% accuracy. Using the quadratic kernel for the SVM classification yielded

79 8/9 (88.8889%) and 2/9 (22.2222%) accuracy for 6 and 2 features, respectively. Figure 2.36 shows classification in the PCA feature space for each kernel. The legend indicates the symbols Simple

Spiral (1) and Barred Spiral (0). Classification was then performed for the Heap transform en-

1 1 0 (training) 0 (training) 0 (classified) 0 (classified) 0.8 1 (training) 0.8 1 (training) 1 (classified) 1 (classified) Support Vectors Support Vectors 0.6 0.6

0.4 0.4

0.2 0.2

0 0

−0.2 −0.2

−0.4 −0.4 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6

(a) linear kernel (b) quadratic kernel

Figure 2.36: PCA feature space iteration 4 classification. hanced galaxy image set. Table 2.4 summarizes the classification results for both the original and enhanced data, 6 and 2 features, and linear and quadratic kernels. Figures 2.37, 2.38, 2.39, 2.40 show the classification results of the PCA feature space for the enhanced data set. The total classifi- cation accuracy for the original data was 51.570% and for the enhanced data was 64.494% giving an overall improvement in classification performance by galaxy image enhancement of 12.924%.

80 Classification Results Linear Kernel Original Data 6 Features 2 PCA Features Irregular/Regular 7/15 (46.6667%) 2/15 (13.3333%) Elliptical/Not Elliptical 13/15 (86.6667%) 3/15 (20%) Lenticular/Spiral 11/13 (84.6154%) 9/13 (69.2308%) Spiral/Barred Spiral 7/9 (77.7778%) 2/9 (22.2222%) Enhanced Data Irregular/Regular 4/15 (26.6667%) 2/15 (13.3333%) Elliptical/Not Elliptical 11/15 (73.3333%) 10/15 (66.6667%) Lenticular/Spiral 11/13 (84.6154%) 9/13 (60%) Spiral/Barred Spiral 8/9 (88.8889%) 7/9 (77.7778%) Quadratic Kernal Original Data 6 Features 2 PCA Features Irregular/Regular 13/15 (86.6667%) 12/15 (80%) Elliptical/Not Elliptical 10/15 (66.6667%) 3/15 (20%) Lenticular/Spiral 8/13 (61.5385%) 3/13 (23.0769%) Spiral/Barred Spiral 4/9 (44.4444%) 2/9 (22.2222%) Enhanced Data Irregular/Regular 12/15 (80%) 0/15 (0%) Elliptical/Not Elliptical 12/15 (80%) 13/15 (86.6667%) Lenticular/Spiral 11/13 (84.6154%) 9/13 (60%) Spiral/Barred Spiral 6/9 (66.6667%) 6/9 (66.6667%)

Table 2.4: Summary of classification results for original and enhanced data. Accuracy improved by 12.924% due to enhancement.

0.6 0.6 I (training) I (training) I (classified) I (classified) 0.4 R (training) 0.4 R (training) R (classified) R (classified) Support Vectors Support Vectors 0.2 0.2

0 0

−0.2 −0.2

−0.4 −0.4

−0.6 −0.6

−0.8 −0.8 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2

(a) linear kernel (b) quadratic kernel

Figure 2.37: PCA feature space iteration 1 classification of enhanced data.

81 0.6 0.6 0 (training) 0 (training) 0 (classified) 0 (classified) 0.4 1 (training) 0.4 1 (training) 1 (classified) 1 (classified) Support Vectors Support Vectors 0.2 0.2

0 0

−0.2 −0.2

−0.4 −0.4

−0.6 −0.6

−0.8 −0.8 0 0.5 1 1.5 2 0 0.5 1 1.5 2

(a) linear kernel (b) quadratic kernel

Figure 2.38: PCA feature space iteration 2 classification of enhanced data.

1 1 0 (training) 0 (training) 0.8 0 (classified) 0.8 0 (classified) 1 (training) 1 (training) 1 (classified) 1 (classified) 0.6 Support Vectors 0.6 Support Vectors

0.4 0.4

0.2 0.2

0 0

−0.2 −0.2

−0.4 −0.4

−0.6 −0.6

−0.8 −0.8 0 0.5 1 1.5 2 0 0.5 1 1.5 2

(a) linear kernel (b) quadratic kernel

Figure 2.39: PCA feature space iteration 3 classification of enhanced data.

2.8 Future Work

Improve the segmentation scheme to capture more accurately the shape of the galaxies. Extend the

classification scheme to include classes Sa,Sb,Sc,SBa,SBb,SBc,SBd,SBmand the elliptical subclasses E0,...,E7. Use a sparse dictionary to perform classification of image data. Download a data set from the CDS Strausburg to increase the size of training and validation sets. 5-fold and 10-fold cross validation for classification.Implement classification procedures in python. Develop graphical user interface for user driven or automated classification software.

82 0.8 0.8 0 (training) 0 (training) 0.7 0 (classified) 0.7 0 (classified) 1 (training) 1 (training) 1 (classified) 1 (classified) 0.6 Support Vectors 0.6 Support Vectors

0.5 0.5

0.4 0.4

0.3 0.3

0.2 0.2

0.1 0.1

0 0

−0.1 −0.1 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8

(a) linear kernel (b) quadratic kernel

Figure 2.40: PCA feature space iteration 4 classification of enhanced data.

83 Appendix A: PROJECT SOFTWARE

A list and brief description of the Matlab codes used in the paper is below,

• galaxy_processing.m: preprocessing and feature extraction as delineated in sections 2.5 and

2.6.1.

• centroid.m: calculates the center of brightness of the galaxy image to be used for shifting the image by the centroid.

• galaxy_shift.m: shifts the galaxy image so that the center of brightness and the image center are coincident.

• secondmoment.m: calculates the second moments of the galaxy image and the angle between

the first second moment and the vertical axis of the image.

• calculateEllipse: calculates and plots an ellipse by the centroid, ellipse axes and angle of rotation on the galaxy image.

• classification_Irr_Reg.m: original data classification for classes irregular and regular. (gen-

erated figure 2.33)

• classification_E_NE.m: " " elliptical and not elliptical. (generated figure 2.34)

• classification_S0_S.m: " " lenticular and spiral. (generated figure 2.35)

• classification_S_SB.m: " " simple spiral and barred spiral. (generated figure 2.36)

• heap_classification_Irr_Reg.m: enhanced data classification for classes irregular and regular.

(generated figure 2.37)

• heap_classification_E_NE.m: " " elliptical and not elliptical. (generated figure 2.38)

• heap_classification_S0_S.m: " " lenticular and spiral. (generated figure 2.39)

• heap_classification_S_SB.m: " " simple spiral and barred spiral. (generated figure 2.40)

84 A.1 Preprocessing and Feature Extraction codes

% call: galaxy_processing.m % % Background subtraction by thresholding. Threshold is determined % by either manual inspection of threshold image iterations of the % histogram levels or Otsu's method. Star/object removal by morphological % opening. Shift image so galaxy centroid and image center are coincident. % Galaxy rotation by angle between 2nd moment and vertical image axis. % Crop and resize image to 128x128. Edge detection and calculate best fit % ellipse for use in feature extraction by 6 morphological features. %% Read image in A=imread('AC8431_NGC3985.tif'); [N M L]=size(A); A=A(:,:,1); %% Find best threshold H=imhist(A,65535); %imhist(A,65535) for uint16 images figure; subplot(2,2,1) imshow(A) subplot(2,2,[3,4]) plot(H)

% x1=1*10^4; % for uint16 images for i=50:200 subplot(2,2,[3,4]) hold on; T=i; xx=[T T]; yy=[0 H(T)]; hline=line(xx,yy); set(hline,'Color',[1 0 0]); htext=text(T-5, H(T),'T'); set(htext,'Color',[1 0 0]);

85 Ab=(A>T); subplot(2,2,2) imshow(Ab) ss=sprintf('Thresholding by %g',T); stitle=title(ss); pause(.1) delete(htext); delete(hline); end % Thresholding bw=im2bw(A,19200/65535); bw=1-bw; bw2=bwareaopen(bw,256); % cc=bwconncomp(bw2); %use for more than 1 object % L=labelmatrix(cc); % L(L~=2)=0; % L=double(L); % imshow(L,[]) % X=double(A);

% g=L.*X; imshow(bw2,[]); %colormap(gray(65535)) X=double(A); g=bw2.*X; imshow(g,[]); %colormap(gray(65535)) %% Shifting image by centroid [xc,yc]=centroid(g,1); Y=galaxy_shift(g,xc,yc); %% Rotate image by angle defined by 2nd moments [m11,m20,m02]=secondmoment(g); theta=(1/2)*atan2(2*m11,m20-m02); alpha=theta*(180/pi); gr=imrotate(g,angle); imshow(gr,[])

86 % Crop galaxy % reduce size of rotated galaxy by size(gr)/n, n=1,2,... % use the reduced size to compose a new image I which contains % the galaxy. I=imcrop(gr,[102 214 129 125]); gs=imresize(I,[128 128]); imshow(gs,[]) [N M L]=size(gs); %% Calculating morphics features bs=im2bw(gs); p=regionprops(bs,'all'); p=p(1); xc=p.Centroid(1); yc=p.Centroid(2); a=p.MajorAxisLength/2; b=p.MinorAxisLength/2; BBox=round(p.BoundingBox); [X,Y]=calculateEllipse(p.Centroid(1),p.Centroid(2),a,b,0); % Edge detection [gCanny, gt]=edge(gs,'canny',[0.3 .9], 0.5); imshow(gCanny) G=find(gCanny>0); figure; imshow(gs,[]); hold on; plot(X,Y,'b*'); rectangle('Position',p.BoundingBox,'EdgeColor','r') plot(G,'g-'); % Elongation: (a-b)/(b+a). Elongation=(a-b)/(b+a) % Form Factor: ratio of the area of the galaxy % (number of pixels in the galaxy) to its perimeter % (number of pixels in canny edge detection).

87 numpixels_galaxy=0; for n=1:N for m=1:M if(gs(n,m)~=0) numpixels_galaxy=numpixels_galaxy+1; end end end numpixels_perimeter=numel(find(gCanny>0)); Formfactor=numpixels_galaxy/numpixels_perimeter % Convexity: ratio of the galaxy perimeter to the % perimeter of the minimum bounding rectangle. % imshow(A) %show bounding rectangle superimposed on galaxy. % rectangle('position',[xmin ymin width height],'EdgeColor','r'); rectangle_perimeter=2*BBox(3)+2*BBox(4); Convexity=numpixels_perimeter/rectangle_perimeter %Bounding-rectangle-to-fill-factor (BFF): area of the bounding rectangle %to the number of pixels within the rectangle. rectangle_area=BBox(3)*BBox(4); L1=BBox(1); W1=BBox(2); L=BBox(1)+BBox(3); W=BBox(2)+BBox(4); numpixels_bounding_box=0; for n=L1:L for m=W1:W numpixels_bounding_box=numpixels_bounding_box+1; end end BFF=rectangle_area/numpixels_bounding_box % Bounding-rectangle-to-perimeter: area of the bounding rectangle % to the number of pixels included in the perimeter.

Bounding_rectangle_to_perimeter=rectangle_area/rectangle_perimeter

% Asymmetry index: taking the difference between the galaxy image
% and the same image rotated 180 degrees about the center of the galaxy.
% The sum of the absolute value of the pixels in the difference image
% is divided by the sum of pixels in the original image to give the
% asymmetry parameter.
gs_rotated=imrotate(gs,180);
difference_image=gs-gs_rotated;
Asymmetry_index=sum(sum(abs(difference_image)))/sum(sum(gs))
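% --- Editor's sketch: collecting the six morphological features computed
% above into one row, as fed to the classifiers in Appendix A.2. The
% column ordering below simply follows the order of computation here; the
% ordering actually used to assemble the training matrices is not stated
% in the code, so treat it as an assumption.
feature_row=[Elongation, Formfactor, Convexity, BFF, ...
    Bounding_rectangle_to_perimeter, Asymmetry_index];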

%==========================================================
% call: centroid.m
%
% Calculate the first moment of an image. centroid(X,I) calculates
% the centroid for a binary or grayscale image X. If X is binary, I=0.
% If X is an intensity image, I=1.
% John Jenkinson, Dr. Artyom Grigoryan, ECE UTSA 2014.
function [xc,yc]=centroid(X,I)
[N,M,L]=size(X);
X=double(X(:,:,1));
xbar=0; ybar=0;
for n=1:N
    for m=1:M
        a=X(n,m);
        xbar=xbar+n*a;
        ybar=ybar+m*a;
    end
end
if(I==1)
    ss=sum(X(:)); % faster than sum(sum(X)) for type double
elseif(I==0)
    ss=N*M;
end
xc=round(xbar/ss);
yc=round(ybar/ss);
end
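% --- Editor's sketch: an equivalent vectorized centroid for an intensity
% image, with the same row/column weighting as the loops above
% (centroid_vec is a hypothetical helper, not part of the thesis code).
function [xc,yc]=centroid_vec(X)
X=double(X(:,:,1));
[N,M]=size(X);
[mm,nn]=meshgrid(1:M,1:N);   % nn holds row indices, mm column indices
ss=sum(X(:));
xc=round(sum(sum(nn.*X))/ss);
yc=round(sum(sum(mm.*X))/ss);
end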

%==========================================================
% call: galaxy_shift.m
%
% Shift the center of brightness to the image center.
% John Jenkinson, ECE UTSA 2014.
function [Y]=galaxy_shift(g,xc,yc)
[N,M,L]=size(g);
Y=zeros(N,M);
% note: if the centroid already coincides with the image center along an
% axis, no branch below fires and Y keeps zeros there (original behavior)
if(N/2-yc<0 && M/2-xc<0)
    Y(1:N+(N/2-yc),1:M+(M/2-xc))=g(1-(N/2-yc):N,1-(M/2-xc):M);
elseif(N/2-yc<0 && M/2-xc>0)
    Y(1:N+(N/2-yc),1+(M/2-xc):M)=g(1-(N/2-yc):N,1:M-(M/2-xc));
elseif(N/2-yc>0 && M/2-xc<0)
    Y(1+(N/2-yc):N,1:M+(M/2-xc))=g(1:N-(N/2-yc),1-(M/2-xc):M);
elseif(N/2-yc>0 && M/2-xc>0)
    Y(1+(N/2-yc):N,1+(M/2-xc):M)=g(1:N-(N/2-yc),1:M-(M/2-xc));
end
end
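% --- Editor's sketch: when the galaxy lies well inside the frame, so the
% wrapped border pixels are all background zeros, the four-case shift
% above reduces to a single circshift (galaxy_shift_c is a hypothetical
% helper, not part of the thesis code).
function Y=galaxy_shift_c(g,xc,yc)
[N,M]=size(g);
Y=circshift(g,[round(N/2-yc), round(M/2-xc)]);
end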

%==========================================================
% call: secondmoment.m
%
% Suppose you have an image A(n,m) of a galaxy of size NxM; the moments
% mu(11), mu(20) and mu(02) are calculated as follows.
% by Art Grigoryan, edited by John Jenkinson
function [m11,m20,m02]=secondmoment(A)
[N,M]=size(A);
m11=0; m20=0; m02=0;
for n=0:N-1
    n1=n+1;
    for m=0:M-1
        a=A(n1,m+1);
        ma=m*a;
        na=n*a;
        m11=m11+n*ma;
        m20=m20+n*na;
        m02=m02+m*ma;
    end
end
if(islogical(A)==1) % normalization
    ss=N*M;
    m11=m11/ss;
    m20=m20/ss;
    m02=m02/ss;
else % normalization
    ss=sum(sum(A));
    m11=round(m11/ss);
    m20=round(m20/ss);
    m02=round(m02/ss);
end
end

%==========================================================
% call: calculateEllipse.m
%
% Calculate points to draw an ellipse.
function [X,Y]=calculateEllipse(x,y,a,b,angle,steps)
% x     - x coordinate of the center
% y     - y coordinate of the center
% a     - semimajor axis
% b     - semiminor axis
% angle - angle of the ellipse (in degrees)
% steps - number of points (optional, default 36)
narginchk(5,6);
if nargin<6, steps=36; end

beta=-angle*(pi/180);
sinbeta=sin(beta);
cosbeta=cos(beta);

alpha=linspace(0,360,steps)'.*(pi/180);
sinalpha=sin(alpha);
cosalpha=cos(alpha);

X=x+(a*cosalpha*cosbeta - b*sinalpha*sinbeta);
Y=y+(a*cosalpha*sinbeta + b*sinalpha*cosbeta);

if nargout==1, X=[X Y]; end
end
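% --- Editor's usage example (hypothetical values): a 40x20-pixel ellipse
% centered at (64,64) and tilted 30 degrees, drawn as in the main script:
%   [Xe,Ye]=calculateEllipse(64,64,40,20,30);
%   plot(Xe,Ye,'b*');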

A.2 SVM Classification codes with data

A.2.1 Original data

training=[0.2379 0.031 1.1141 0.6371 0.0604 0.338
0.2066 0.0623 0.8261 0.7143 0.0595 0.7111
0.3681 0.0423 0.8803 0.586 0.0559 0.1604
0.1589 0.0275 1.1492 0.5895 0.0617 0.2602
0.2876 0.058 0.8281 0.6792 0.0586 0.3558
0.0577 0.059 0.8803 0.7329 0.0624 0.2386
0.0175 0.0585 0.9 0.7582 0.0624 0.1724
0.054 0.0497 0.9521 0.7206 0.0625 0.1144
0.0316 0.0767 0.7955 0.7769 0.0625 0.2979
0.1817 0.0733 0.7895 0.75 0.0609 0.303
0.5137 0.0393 0.8707 0.651 0.0458 0.4838
0.5666 0.0372 0.9038 0.6854 0.0444 0.1155
0.3609 0.0482 0.8878 0.6913 0.055 0.0932
0.6616 0.0284 0.9259 0.6455 0.0377 0.2113
0.3547 0.047 0.871 0.6524 0.0546 0.219
0.4334 0.0457 0.8917 0.6918 0.0525 0.1033
0.461 0.0428 0.8625 0.6395 0.0498 0.5098
0.2049 0.0629 0.84 0.74 0.06 0.2342
0.1287 0.0718 0.8158 0.7841 0.0609 0.2609
0.5203 0.032 0.9405 0.625 0.0454 0.64
0.1891 0.0412 1.011 0.6946 0.0606 0.1607
0.7442 0.0141 1.0774 0.5336 0.0306 0.5857
0.6239 0.0347 0.9126 0.7114 0.0406 0.4327
0.411 0.0415 0.9273 0.6789 0.0525 0.2029
0.7521 0.0145 1.0337 0.497 0.0312 1.0387
0.1306 0.0462 0.9239 0.6423 0.0614 0.3473
0.5882 0.0291 0.9153 0.5535 0.0441 0.1941
0.4012 0.0192 1.3214 0.6382 0.0527 0.4836
0.5257 0.0356 0.9414 0.7045 0.0448 0.1554
0.7802 0.0155 1 0.5871 0.0264 0.2839
0.5228 0.0184 1.2409 0.5717 0.0496 0.3557
0.505 0.0305 0.9783 0.5842 0.0499 0.3856
0.4325 0.0322 0.8718 0.4838 0.0506 0.443
0.5556 0.0282 0.8941 0.5242 0.0429 0.6762
0.521 0.0233 1.119 0.6573 0.0443 0.3256
0.453 0.044 0.8519 0.6118 0.0521 0.2796
0.7246 0.0227 0.9924 0.6446 0.0347 0.7385
0.5626 0.0248 1.115 0.6618 0.0466 0.9537
0.6077 0.0318 0.9091 0.657 0.04 0.5912
0.6071 0.0219 1.0536 0.5515 0.0441 0.3169];
%NGC 4449,UGC 7408,UGC 7577,UGC 7639,UGC 7690
%NGC 4278,NGC 4283,NGC 4308,NGC 5813,NGC 5831
%NGC 4346,NGC 4460,NGC 4251,NGC 4220,NGC 4346
%NGC 4324,NGC 5854,NGC 5838,NGC 5839,NGC 5864
%NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288
%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157
%NGC 5850,NGC 5806,NGC 4232,NGC 4088,NGC 4258
%NGC 4527,NGC 4389,NGC 4496,NGC 4085,NGC 4096
Y=['I'; 'I'; 'I'; 'I'; 'I'; 'R'; 'R'; 'R'; 'R'; 'R';...
   'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R';...
   'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R';...
   'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R'];
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(training,Y,'kernel_function',...
    'quadratic','showplot',true);
test=[0.0284 0.044 0.9241 0.6016 0.0625 0.299
0.0474 0.0469 0.9684 0.705 0.0624 0.3738
0.1105 0.0548 0.9314 0.7682 0.0619 0.4194
0.1687 0.0637 0.8448 0.75 0.0606 0.1961
0.1563 0.0692 0.85 0.8333 0.06 0.2
0.4373 0.051 0.8421 0.7037 0.0514 1.6172
0.3642 0.0147 1.5035 0.603 0.055 0.5245
0.7489 0.0199 0.9471 0.5502 0.0324 0.6252
0.304 0.0258 1.1216 0.5528 0.0588 0.4888
0.2894 0.0161 1.3588 0.4942 0.06 0.6418
0.6478 0.0129 1.3286 0.5558 0.0411 0.6026
0.3865 0.0333 0.9158 0.5154 0.0541 0.4956
0.3934 0.0403 0.8406 0.5123 0.0556 0.4945
0.484 0.0319 0.9286 0.4949 0.0556 0.5979
0.2565 0.0618 0.7857 0.6361 0.06 0.3743];
%test set is first row irregular, remaining regular.
%NGC 4496B,NGC 5846,NGC 5846A,NGC 5865,NGC 5868,NGC 4310
%NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,test,'showplot',true);
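% --- Editor's sketch: scoring the prediction above. The expected labels
% restate the comment on the test set (first row irregular, the remaining
% fourteen regular) and are an editorial reconstruction, not output of the
% original code.
expected=['I'; repmat('R',14,1)];
accuracy=mean(group==expected)
% Note that pca(test) fits a separate basis to the test set; projecting
% the test rows through the training-set coefficients instead,
%   reduced_test=test*coeff(:,1:2);
% keeps training and test data in the same reduced feature space.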

%==========================================================
%elliptical versus not elliptical
training=[0.0577 0.059 0.8803 0.7329 0.0624 0.2386
0.0175 0.0585 0.9 0.7582 0.0624 0.1724
0.054 0.0497 0.9521 0.7206 0.0625 0.1144
0.0316 0.0767 0.7955 0.7769 0.0625 0.2979
0.1817 0.0733 0.7895 0.75 0.0609 0.303
0.5137 0.0393 0.8707 0.651 0.0458 0.4838
0.5666 0.0372 0.9038 0.6854 0.0444 0.1155
0.3609 0.0482 0.8878 0.6913 0.055 0.0932
0.6616 0.0284 0.9259 0.6455 0.0377 0.2113
0.3547 0.047 0.871 0.6524 0.0546 0.219
0.4334 0.0457 0.8917 0.6918 0.0525 0.1033
0.461 0.0428 0.8625 0.6395 0.0498 0.5098
0.2049 0.0629 0.84 0.74 0.06 0.2342
0.1287 0.0718 0.8158 0.7841 0.0609 0.2609
0.1891 0.0412 1.011 0.6946 0.0606 0.1607
0.7442 0.0141 1.0774 0.5336 0.0306 0.5857
0.6239 0.0347 0.9126 0.7114 0.0406 0.4327
0.411 0.0415 0.9273 0.6789 0.0525 0.2029
0.7521 0.0145 1.0337 0.497 0.0312 1.0387
0.1306 0.0462 0.9239 0.6423 0.0614 0.3473
0.5882 0.0291 0.9153 0.5535 0.0441 0.1941
0.4012 0.0192 1.3214 0.6382 0.0527 0.4836
0.5257 0.0356 0.9414 0.7045 0.0448 0.1554
0.7802 0.0155 1 0.5871 0.0264 0.2839
0.5228 0.0184 1.2409 0.5717 0.0496 0.3557
0.505 0.0305 0.9783 0.5842 0.0499 0.3856
0.4325 0.0322 0.8718 0.4838 0.0506 0.443
0.5556 0.0282 0.8941 0.5242 0.0429 0.6762
0.521 0.0233 1.119 0.6573 0.0443 0.3256
0.453 0.044 0.8519 0.6118 0.0521 0.2796
0.7246 0.0227 0.9924 0.6446 0.0347 0.7385
0.5626 0.0248 1.115 0.6618 0.0466 0.9537
0.6077 0.0318 0.9091 0.657 0.04 0.5912
0.6071 0.0219 1.0536 0.5515 0.0441 0.3169];
%NGC 4278,NGC 4283,NGC 4308,NGC 5813,NGC 5831,NGC 4346,NGC 4460
%NGC 4251,NGC 4220,NGC 4346,NGC 4324,NGC 5854,NGC 5838
%NGC 5839,NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288
%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850
%NGC 5806,NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527
%NGC 4389,NGC 4496,NGC 4085,NGC 4096
% 1 for Elliptical 0 for Not Elliptical
Y=[1 1 1 1 ...
   1 0 0 0 0 0 ...
   0 0 0 0 0 0 ...
   0 0 0 0 0 0 ...
   0 0 0 0 0 0 ...
   0 0 0 0 0 0];
Y=Y';
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(reduced_training,Y,'kernel_function',...
    'quadratic','showplot',true);
% 'kernel_function','quadratic'
test=[0.0474 0.0469 0.9684 0.705 0.0624 0.3738
0.1105 0.0548 0.9314 0.7682 0.0619 0.4194
0.5203 0.032 0.9405 0.625 0.0454 0.64
0.1687 0.0637 0.8448 0.75 0.0606 0.1961
0.1563 0.0692 0.85 0.8333 0.06 0.2
0.4373 0.051 0.8421 0.7037 0.0514 1.6172
0.3642 0.0147 1.5035 0.603 0.055 0.5245
0.7489 0.0199 0.9471 0.5502 0.0324 0.6252
0.304 0.0258 1.1216 0.5528 0.0588 0.4888
0.2894 0.0161 1.3588 0.4942 0.06 0.6418
0.6478 0.0129 1.3286 0.5558 0.0411 0.6026
0.3865 0.0333 0.9158 0.5154 0.0541 0.4956
0.3934 0.0403 0.8406 0.5123 0.0556 0.4945
0.484 0.0319 0.9286 0.4949 0.0556 0.5979
0.2565 0.0618 0.7857 0.6361 0.06 0.3743];
%NGC 5846,NGC 5846A,NGC 5864,NGC 5865,NGC 5868,NGC 4310
%NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,reduced_test,'showplot',true);

%==========================================================
%lenticular versus spiral
clear all; close all; clc
training=[0.5137 0.0393 0.8707 0.651 0.0458 0.4838
0.5666 0.0372 0.9038 0.6854 0.0444 0.1155
0.3609 0.0482 0.8878 0.6913 0.055 0.0932
0.6616 0.0284 0.9259 0.6455 0.0377 0.2113
0.3547 0.047 0.871 0.6524 0.0546 0.219
0.4334 0.0457 0.8917 0.6918 0.0525 0.1033
0.461 0.0428 0.8625 0.6395 0.0498 0.5098
0.2049 0.0629 0.84 0.74 0.06 0.2342
0.1287 0.0718 0.8158 0.7841 0.0609 0.2609
0.1891 0.0412 1.011 0.6946 0.0606 0.1607
0.7442 0.0141 1.0774 0.5336 0.0306 0.5857
0.6239 0.0347 0.9126 0.7114 0.0406 0.4327
0.411 0.0415 0.9273 0.6789 0.0525 0.2029
0.7521 0.0145 1.0337 0.497 0.0312 1.0387
0.1306 0.0462 0.9239 0.6423 0.0614 0.3473
0.5882 0.0291 0.9153 0.5535 0.0441 0.1941
0.4012 0.0192 1.3214 0.6382 0.0527 0.4836
0.5257 0.0356 0.9414 0.7045 0.0448 0.1554
0.7802 0.0155 1 0.5871 0.0264 0.2839
0.5228 0.0184 1.2409 0.5717 0.0496 0.3557
0.505 0.0305 0.9783 0.5842 0.0499 0.3856
0.4325 0.0322 0.8718 0.4838 0.0506 0.443
0.5556 0.0282 0.8941 0.5242 0.0429 0.6762
0.521 0.0233 1.119 0.6573 0.0443 0.3256
0.453 0.044 0.8519 0.6118 0.0521 0.2796
0.7246 0.0227 0.9924 0.6446 0.0347 0.7385
0.5626 0.0248 1.115 0.6618 0.0466 0.9537
0.6077 0.0318 0.9091 0.657 0.04 0.5912
0.6071 0.0219 1.0536 0.5515 0.0441 0.3169];
%NGC 4346,NGC 4460,NGC 4251,NGC 4220,NGC 4346,NGC 4324,NGC 5854
%NGC 5838,NGC 5839,NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288
%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850,NGC 5806
%NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527,NGC 4389
%NGC 4496,NGC 4085,NGC 4096
% 1 for Lenticular 0 for Spiral
Y=[1 1 1 1 1 1 1 1 1 ...
   0 0 0 0 0 0 0 0 0 0 ...
   0 0 0 0 0 0 0 0 0 0];
Y=Y';
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(reduced_training,Y,'showplot',true);
% ,'kernel_function','quadratic'
test=[0.5203 0.032 0.9405 0.625 0.0454 0.64
0.1687 0.0637 0.8448 0.75 0.0606 0.1961
0.1563 0.0692 0.85 0.8333 0.06 0.2
0.4373 0.051 0.8421 0.7037 0.0514 1.6172
0.3642 0.0147 1.5035 0.603 0.055 0.5245
0.7489 0.0199 0.9471 0.5502 0.0324 0.6252
0.304 0.0258 1.1216 0.5528 0.0588 0.4888
0.2894 0.0161 1.3588 0.4942 0.06 0.6418
0.6478 0.0129 1.3286 0.5558 0.0411 0.6026
0.3865 0.0333 0.9158 0.5154 0.0541 0.4956
0.3934 0.0403 0.8406 0.5123 0.0556 0.4945
0.484 0.0319 0.9286 0.4949 0.0556 0.5979
0.2565 0.0618 0.7857 0.6361 0.06 0.3743];
%NGC 5864,NGC 5865,NGC 5868,NGC 4310,NGC 6070
%UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,reduced_test,'showplot',true);

%==========================================================
%simple spiral versus barred spiral
training=[0.1891 0.0412 1.011 0.6946 0.0606 0.1607
0.7442 0.0141 1.0774 0.5336 0.0306 0.5857
0.6239 0.0347 0.9126 0.7114 0.0406 0.4327
0.411 0.0415 0.9273 0.6789 0.0525 0.2029
0.7521 0.0145 1.0337 0.497 0.0312 1.0387
0.1306 0.0462 0.9239 0.6423 0.0614 0.3473
0.5882 0.0291 0.9153 0.5535 0.0441 0.1941
0.4012 0.0192 1.3214 0.6382 0.0527 0.4836
0.5257 0.0356 0.9414 0.7045 0.0448 0.1554
0.7802 0.0155 1 0.5871 0.0264 0.2839
0.5228 0.0184 1.2409 0.5717 0.0496 0.3557
0.505 0.0305 0.9783 0.5842 0.0499 0.3856
0.4325 0.0322 0.8718 0.4838 0.0506 0.443
0.5556 0.0282 0.8941 0.5242 0.0429 0.6762
0.521 0.0233 1.119 0.6573 0.0443 0.3256
0.453 0.044 0.8519 0.6118 0.0521 0.2796
0.7246 0.0227 0.9924 0.6446 0.0347 0.7385
0.5626 0.0248 1.115 0.6618 0.0466 0.9537
0.6077 0.0318 0.9091 0.657 0.04 0.5912
0.6071 0.0219 1.0536 0.5515 0.0441 0.3169];
%NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288,NGC 4457
%NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850,NGC 5806
%NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527
%NGC 4389,NGC 4496,NGC 4085,NGC 4096
% 1 for Spiral 0 for Barred Spiral
Y=[1 1 1 1 1 ...
   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
Y=Y';
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(reduced_training,Y,'kernel_function',...
    'quadratic','showplot',true);
% ,'kernel_function','quadratic'
test=[0.3642 0.0147 1.5035 0.603 0.055 0.5245
0.7489 0.0199 0.9471 0.5502 0.0324 0.6252
0.304 0.0258 1.1216 0.5528 0.0588 0.4888
0.2894 0.0161 1.3588 0.4942 0.06 0.6418
0.6478 0.0129 1.3286 0.5558 0.0411 0.6026
0.3865 0.0333 0.9158 0.5154 0.0541 0.4956
0.3934 0.0403 0.8406 0.5123 0.0556 0.4945
0.484 0.0319 0.9286 0.4949 0.0556 0.5979
0.2565 0.0618 0.7857 0.6361 0.06 0.3743];
%NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,reduced_test,'showplot',true);

A.2.2 Enhanced data

training=[0.2028 0.0399 1.0698 0.7504 0.0608 1.0613
0.2187 0.0379 1.02 0.646 0.0611 0.4876
0.4311 0.0116 1.5897 0.5422 0.0541 1.4891
0.0873 0.0179 1.4145 0.5727 0.0625 0.2709
0.1025 0.0493 0.9038 0.6488 0.0621 0.2294
0.0616 0.0223 1.3416 0.6442 0.0623 0.2499
0.0439 0.0594 0.8462 0.6845 0.0621 0.2609
0.0498 0.0386 1.0259 0.6544 0.062 0.4055
0.066 0.0297 1.2147 0.7027 0.0624 0.297
0.1106 0.0612 0.8429 0.7007 0.062 0.1972
0.563 0.0361 0.9012 0.6811 0.043 0.3022
0.5703 0.0343 0.9 0.6169 0.045 0.1646
0.4029 0.0437 0.85 0.6012 0.0525 0.203
0.6352 0.0297 0.8824 0.5779 0.04 0.3413
0.5132 0.0402 0.8814 0.6323 0.0494 0.0874
0.4404 0.0377 0.9557 0.6677 0.0516 0.2233
0.4778 0.0393 0.9455 0.7083 0.0496 0.4565
0.2595 0.0531 0.89 0.7148 0.0589 0.1188
0.1686 0.0687 0.7857 0.6927 0.0612 0.2556
0.5027 0.0415 0.8942 0.6748 0.0492 0.1393
0.1871 0.0386 1.0429 0.6948 0.0603 0.2403
0.6458 0.0125 1.1555 0.4424 0.0377 0.6822
0.606 0.0331 0.9237 0.6895 0.0409 0.3285
0.3777 0.047 0.8939 0.6923 0.0542 0.23
0.7385 0.0209 0.9314 0.5232 0.0347 0.8889
0.2116 0.044 0.9107 0.6126 0.0596 1.8642
0.5645 0.0352 0.9286 0.6695 0.0454 0.2674
0.4421 0.0279 1.1424 0.7056 0.0516 0.3175
0.5088 0.0314 1.0037 0.6476 0.0489 0.1525
0.7701 0.0192 0.9634 0.6308 0.0283 0.2815
0.4965 0.0186 1.2083 0.4951 0.0548 0.4783
0.487 0.0362 0.9202 0.6272 0.0488 0.3401
0.3847 0.043 0.8571 0.55 0.0574 0.3939
0.5871 0.0271 0.9656 0.6006 0.042 0.5289
0.5288 0.0309 0.9645 0.6443 0.0446 0.3304
0.4683 0.0366 0.9167 0.6028 0.051 0.4332
0.4643 0.042 0.9167 0.672 0.0525 0.4112
0.5166 0.0175 1.297 0.5687 0.0518 0.4288
0.6254 0.0331 0.8987 0.6617 0.0404 0.5547
0.6314 0.031 0.9389 0.6861 0.0398 0.1997];
%NGC 4449,UGC 7408,UGC 7577,UGC 7639,UGC 7690
%NGC 4278,NGC 4283,NGC 4308,NGC 5813,NGC 5831
%NGC 4346,NGC 4460,NGC 4251,NGC 4220,NGC 4346
%NGC 4324,NGC 5854,NGC 5838,NGC 5839,NGC 5864
%NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288
%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157
%NGC 5850,NGC 5806,NGC 4232,NGC 4088,NGC 4258
%NGC 4527,NGC 4389,NGC 4496,NGC 4085,NGC 4096
Y=['I'; 'I'; 'I'; 'I'; 'I'; 'R'; 'R'; 'R'; 'R'; 'R';...
   'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R';...
   'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R';...
   'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R'];
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(reduced_training,Y,'kernel_function',...
    'quadratic','showplot',true);
% ,'kernel_function','quadratic'
test=[0.1574 0.0321 1.1029 0.6243 0.0625 0.42
0.0188 0.0368 1.0887 0.6989 0.0625 0.3746
0.0763 0.0406 1.0671 0.7388 0.0625 0.9791
0.1338 0.0592 0.8514 0.6912 0.0621 0.2128
0.0194 0.0653 0.875 0.8 0.0625 0.3
0.4014 0.0365 0.9178 0.6007 0.0512 0.9695
0.3985 0.0389 0.9258 0.6263 0.0533 0.3471
0.7009 0.0124 1.1456 0.3995 0.0406 0.5291
0.2914 0.0296 1.067 0.577 0.0583 0.3457
0.246 0.0247 1.1345 0.5328 0.0597 0.5092
0.5692 0.0229 1.0414 0.5557 0.0447 0.5544
0.2172 0.0124 1.5533 0.4904 0.0611 0.5086
0.3022 0.0153 1.3596 0.4925 0.0576 0.7544
0.3798 0.029 1.0152 0.4938 0.0604 0.4923
0.2377 0.0559 0.8617 0.6898 0.0602 0.2997];
%test set is first row irregular, remaining regular.
%NGC 4496B,NGC 5846,NGC 5846A,NGC 5865,NGC 5868,NGC 4310
%NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
GROUP = svmclassify(svmStruct,reduced_test,'showplot',true);

%==========================================================
%elliptical versus not elliptical
training=[0.0616 0.0223 1.3416 0.6442 0.0623 0.2499
0.0439 0.0594 0.8462 0.6845 0.0621 0.2609
0.0498 0.0386 1.0259 0.6544 0.062 0.4055
0.066 0.0297 1.2147 0.7027 0.0624 0.297
0.1106 0.0612 0.8429 0.7007 0.062 0.1972
0.563 0.0361 0.9012 0.6811 0.043 0.3022
0.5703 0.0343 0.9 0.6169 0.045 0.1646
0.4029 0.0437 0.85 0.6012 0.0525 0.203
0.6352 0.0297 0.8824 0.5779 0.04 0.3413
0.5132 0.0402 0.8814 0.6323 0.0494 0.0874
0.4404 0.0377 0.9557 0.6677 0.0516 0.2233
0.4778 0.0393 0.9455 0.7083 0.0496 0.4565
0.2595 0.0531 0.89 0.7148 0.0589 0.1188
0.1686 0.0687 0.7857 0.6927 0.0612 0.2556
0.1871 0.0386 1.0429 0.6948 0.0603 0.2403
0.6458 0.0125 1.1555 0.4424 0.0377 0.6822
0.606 0.0331 0.9237 0.6895 0.0409 0.3285
0.3777 0.047 0.8939 0.6923 0.0542 0.23
0.7385 0.0209 0.9314 0.5232 0.0347 0.8889
0.2116 0.044 0.9107 0.6126 0.0596 1.8642
0.5645 0.0352 0.9286 0.6695 0.0454 0.2674
0.4421 0.0279 1.1424 0.7056 0.0516 0.3175
0.5088 0.0314 1.0037 0.6476 0.0489 0.1525
0.7701 0.0192 0.9634 0.6308 0.0283 0.2815
0.4965 0.0186 1.2083 0.4951 0.0548 0.4783
0.487 0.0362 0.9202 0.6272 0.0488 0.3401
0.3847 0.043 0.8571 0.55 0.0574 0.3939
0.5871 0.0271 0.9656 0.6006 0.042 0.5289
0.5288 0.0309 0.9645 0.6443 0.0446 0.3304
0.4683 0.0366 0.9167 0.6028 0.051 0.4332
0.4643 0.042 0.9167 0.672 0.0525 0.4112
0.5166 0.0175 1.297 0.5687 0.0518 0.4288
0.6254 0.0331 0.8987 0.6617 0.0404 0.5547
0.6314 0.031 0.9389 0.6861 0.0398 0.1997];
%NGC 4278,NGC 4283,NGC 4308,NGC 5813,NGC 5831,NGC 4346,NGC 4460
%NGC 4251,NGC 4220,NGC 4346,NGC 4324,NGC 5854,NGC 5838
%NGC 5839,NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288
%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850
%NGC 5806,NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527
%NGC 4389,NGC 4496,NGC 4085,NGC 4096
% 1 for Elliptical 0 for Not Elliptical
Y=[1 1 1 1 ...
   1 0 0 0 0 0 ...
   0 0 0 0 0 0 ...
   0 0 0 0 0 0 ...
   0 0 0 0 0 0 ...
   0 0 0 0 0 0];
Y=Y';
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(training,Y,'showplot',true);
% 'kernel_function','quadratic'
test=[0.0188 0.0368 1.0887 0.6989 0.0625 0.3746
0.0763 0.0406 1.0671 0.7388 0.0625 0.9791
0.5027 0.0415 0.8942 0.6748 0.0492 0.1393
0.1338 0.0592 0.8514 0.6912 0.0621 0.2128
0.0194 0.0653 0.875 0.8 0.0625 0.3
0.4014 0.0365 0.9178 0.6007 0.0512 0.9695
0.3985 0.0389 0.9258 0.6263 0.0533 0.3471
0.7009 0.0124 1.1456 0.3995 0.0406 0.5291
0.2914 0.0296 1.067 0.577 0.0583 0.3457
0.246 0.0247 1.1345 0.5328 0.0597 0.5092
0.5692 0.0229 1.0414 0.5557 0.0447 0.5544
0.2172 0.0124 1.5533 0.4904 0.0611 0.5086
0.3022 0.0153 1.3596 0.4925 0.0576 0.7544
0.3798 0.029 1.0152 0.4938 0.0604 0.4923
0.2377 0.0559 0.8617 0.6898 0.0602 0.2997];
%NGC 5846,NGC 5846A,NGC 5864,NGC 5865,NGC 5868,
%NGC 4310,NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,
%NGC 4242,NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,test,'showplot',true);

%==========================================================
%lenticular versus spiral
training=[0.563 0.0361 0.9012 0.6811 0.043 0.3022
0.5703 0.0343 0.9 0.6169 0.045 0.1646
0.4029 0.0437 0.85 0.6012 0.0525 0.203
0.6352 0.0297 0.8824 0.5779 0.04 0.3413
0.5132 0.0402 0.8814 0.6323 0.0494 0.0874
0.4404 0.0377 0.9557 0.6677 0.0516 0.2233
0.4778 0.0393 0.9455 0.7083 0.0496 0.4565
0.2595 0.0531 0.89 0.7148 0.0589 0.1188
0.1686 0.0687 0.7857 0.6927 0.0612 0.2556
0.1871 0.0386 1.0429 0.6948 0.0603 0.2403
0.6458 0.0125 1.1555 0.4424 0.0377 0.6822
0.606 0.0331 0.9237 0.6895 0.0409 0.3285
0.3777 0.047 0.8939 0.6923 0.0542 0.23
0.7385 0.0209 0.9314 0.5232 0.0347 0.8889
0.2116 0.044 0.9107 0.6126 0.0596 1.8642
0.5645 0.0352 0.9286 0.6695 0.0454 0.2674
0.4421 0.0279 1.1424 0.7056 0.0516 0.3175
0.5088 0.0314 1.0037 0.6476 0.0489 0.1525
0.7701 0.0192 0.9634 0.6308 0.0283 0.2815
0.4965 0.0186 1.2083 0.4951 0.0548 0.4783
0.487 0.0362 0.9202 0.6272 0.0488 0.3401
0.3847 0.043 0.8571 0.55 0.0574 0.3939
0.5871 0.0271 0.9656 0.6006 0.042 0.5289
0.5288 0.0309 0.9645 0.6443 0.0446 0.3304
0.4683 0.0366 0.9167 0.6028 0.051 0.4332
0.4643 0.042 0.9167 0.672 0.0525 0.4112
0.5166 0.0175 1.297 0.5687 0.0518 0.4288
0.6254 0.0331 0.8987 0.6617 0.0404 0.5547
0.6314 0.031 0.9389 0.6861 0.0398 0.1997];
%NGC 4346,NGC 4460,NGC 4251,NGC 4220,NGC 4346,NGC 4324,NGC 5854
%NGC 5838,NGC 5839,NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288
%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850,NGC 5806
%NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527,NGC 4389
%NGC 4496,NGC 4085,NGC 4096
% 1 for Lenticular 0 for Spiral
Y=[1 1 1 1 1 1 1 1 1 ...
   0 0 0 0 0 0 0 0 0 0 ...
   0 0 0 0 0 0 0 0 0 0];
Y=Y';
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(training,Y,'showplot',true);
% 'kernel_function','quadratic',
test=[0.5027 0.0415 0.8942 0.6748 0.0492 0.1393
0.1338 0.0592 0.8514 0.6912 0.0621 0.2128
0.0194 0.0653 0.875 0.8 0.0625 0.3
0.4014 0.0365 0.9178 0.6007 0.0512 0.9695
0.3985 0.0389 0.9258 0.6263 0.0533 0.3471
0.7009 0.0124 1.1456 0.3995 0.0406 0.5291
0.2914 0.0296 1.067 0.577 0.0583 0.3457
0.246 0.0247 1.1345 0.5328 0.0597 0.5092
0.5692 0.0229 1.0414 0.5557 0.0447 0.5544
0.2172 0.0124 1.5533 0.4904 0.0611 0.5086
0.3022 0.0153 1.3596 0.4925 0.0576 0.7544
0.3798 0.029 1.0152 0.4938 0.0604 0.4923
0.2377 0.0559 0.8617 0.6898 0.0602 0.2997];
%NGC 5864,NGC 5865,NGC 5868,NGC 4310,NGC 6070
%UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,test,'showplot',true);

%==========================================================
%simple spiral versus barred spiral
training=[0.1871 0.0386 1.0429 0.6948 0.0603 0.2403
0.6458 0.0125 1.1555 0.4424 0.0377 0.6822
0.606 0.0331 0.9237 0.6895 0.0409 0.3285
0.3777 0.047 0.8939 0.6923 0.0542 0.23
0.7385 0.0209 0.9314 0.5232 0.0347 0.8889
0.2116 0.044 0.9107 0.6126 0.0596 1.8642
0.5645 0.0352 0.9286 0.6695 0.0454 0.2674
0.4421 0.0279 1.1424 0.7056 0.0516 0.3175
0.5088 0.0314 1.0037 0.6476 0.0489 0.1525
0.7701 0.0192 0.9634 0.6308 0.0283 0.2815
0.4965 0.0186 1.2083 0.4951 0.0548 0.4783
0.487 0.0362 0.9202 0.6272 0.0488 0.3401
0.3847 0.043 0.8571 0.55 0.0574 0.3939
0.5871 0.0271 0.9656 0.6006 0.042 0.5289
0.5288 0.0309 0.9645 0.6443 0.0446 0.3304
0.4683 0.0366 0.9167 0.6028 0.051 0.4332
0.4643 0.042 0.9167 0.672 0.0525 0.4112
0.5166 0.0175 1.297 0.5687 0.0518 0.4288
0.6254 0.0331 0.8987 0.6617 0.0404 0.5547
0.6314 0.031 0.9389 0.6861 0.0398 0.1997];
%NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288,NGC 4457
%NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850,NGC 5806
%NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527
%NGC 4389,NGC 4496,NGC 4085,NGC 4096
% 1 for Spiral 0 for Barred Spiral
Y=[1 1 1 1 1 ...
   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
Y=Y';
coeff=pca(training);
reduced_training=training*coeff(:,1:2);
svmStruct=svmtrain(training,Y,'kernel_function',...
    'quadratic','showplot',true);
% 'kernel_function','quadratic',
test=[0.3985 0.0389 0.9258 0.6263 0.0533 0.3471
0.7009 0.0124 1.1456 0.3995 0.0406 0.5291
0.2914 0.0296 1.067 0.577 0.0583 0.3457
0.246 0.0247 1.1345 0.5328 0.0597 0.5092
0.5692 0.0229 1.0414 0.5557 0.0447 0.5544
0.2172 0.0124 1.5533 0.4904 0.0611 0.5086
0.3022 0.0153 1.3596 0.4925 0.0576 0.7544
0.3798 0.029 1.0152 0.4938 0.0604 0.4923
0.2377 0.0559 0.8617 0.6898 0.0602 0.2997];
%NGC 6070,UGC 7617,NGC 4480,UGC 10133,NGC 4559,NGC 4242
%NGC 4393,NGC 4288,NGC 3985
coeff2=pca(test);
reduced_test=test*coeff2(:,1:2);
group = svmclassify(svmStruct,test,'showplot',true);

VITA

John Jenkinson is from Austin, Texas. He graduated with a Bachelor of Science from the University of Texas at San Antonio. He is currently completing his Master of Science in Electrical Engineering degree at the University of Texas at San Antonio (UTSA). His future plans include attending a PhD program at UTSA.
