Evaluation and Comparison of Word Embedding Models, for Efficient Text Classification

Ilias Koutsakis, University of Amsterdam, Amsterdam, The Netherlands
George Tsatsaronis, Elsevier, Amsterdam, The Netherlands
Evangelos Kanoulas, University of Amsterdam, Amsterdam, The Netherlands

Abstract

Recent research on word embeddings has shown that they tend to outperform distributional models on word similarity and analogy detection tasks. However, there is not enough information on whether such embeddings can improve text classification of longer pieces of text, e.g. articles. More specifically, it is not yet clear whether the use of word embeddings has a significant effect on various text classifiers, or how word embeddings perform when trained with different numbers of dimensions (not only the standard size of 300).

In this research, we determine that the use of word embeddings can create feature vectors that not only provide a formidable baseline, but also outperform traditional, count-based methods (bag of words, tf-idf) for the same number of dimensions. We also show that word embeddings appear to improve accuracy even beyond the distributional models baseline when the amount of data is relatively small. In addition, averaging word embeddings appears to be a simple but effective way to keep a significant portion of the semantic information.

Besides the overall performance and the standard classification metrics (accuracy, recall, precision, F1 score), the time complexity of the embeddings will also be compared.

Keywords: text classification, word embeddings, fasttext, word2vec, vectorization

Acknowledgments

I would like to thank my supervisors, George Tsatsaronis and Evangelos Kanoulas, for guiding me and providing me with their years of experience, patience, and support during the completion of this thesis.

I would also like to thank Elsevier for giving me the chance to work with and contribute to the work of amazing individuals in such an establishment, and to take advantage of their know-how and data during my research.

Last, I would like to thank my mother, for constantly being on my side, supporting and pushing me.

1 Introduction

The amount of accumulated text data is increasing exponentially, day by day. This trend of constant production and analysis of textual information is not going to stop anytime soon. On the contrary, even non-traditional businesses, like banks¹, are starting to use Natural Language Processing for R&D and HR purposes. It is no wonder, then, that industries try to take advantage of new methodologies to create state-of-the-art products, in order to cover their needs and stay ahead of the curve.

¹ https://www.ibm.com/blogs/watson/2016/06/natural-language-processing-transforming-financial-industry-2/

This is how the concept of word embeddings became popular again in the last few years, especially after the work of Mikolov et al. [14], which showed that shallow neural networks can provide word vectors with some amazing geometrical properties. The central idea is that words can be mapped to fixed-size vectors of real numbers. Those vectors should be able to hold semantic context, and this is why they were very successful in sentiment analysis, word disambiguation and syntactic parsing tasks. However, those vectors can still represent single words only, something that, although useful, really limits the usage of word embeddings in a wider context.

1.1 Motivation

The motivation behind this thesis is to examine whether word embeddings can be used in document classification tasks. It is important to understand the main problem, which is how to form a document feature vector from the separate word vectors. We should also examine the possible performance issues that arise with different dataset sizes and, most importantly, with datasets that have different semantics.

In what follows, we present our findings, comparing the usage of word embeddings in text classification tasks, through the averaging of the word vectors, with the performance of baseline, distributional models (bag of words and tf-idf).

2 Classification and Evaluation Metrics

Classification is the process where, given a set of classes, we try to determine one or more predefined classes/labels that a given object belongs to [13]. More specifically, using a learning method or learning algorithm, we wish to train a classifier or classification function γ that maps documents to classes:

$$\gamma : X \rightarrow C$$

where X is the document space and C is the set of classes.

Figure 1. A representation of the classification procedure, showing the training and the prediction part. (source: NLTK Documentation)

This group of machine learning algorithms is called supervised learning because a supervisor (the human who defines the classes and labels training documents) serves as a teacher directing the learning process. We denote the supervised learning method by Γ and write Γ(D) = γ, where D is the training set containing the documents. The learning method Γ takes the training set D as input and returns the learned classification function γ.

It is important to note that binary datasets (with true/false labels only) and multiclass datasets (with a variety of classes to choose from) are not necessarily different problems. More specifically, the multiclass classification problem is usually reduced to a binary problem using the "one-vs-rest" method, where each class in turn becomes the "true" class and is evaluated against the rest of the classes (which collectively become the "false" class).

An example of the process is shown in Figure 1, taken from the NLTK documentation [12]. What follows is a short description of the classifiers used in this thesis, as well as the most common classifier evaluation metrics, both statistical and visual.

2.1 Common classifiers

For this thesis, we needed to test different classification algorithms and compare the results. We decided to settle on a few algorithms, representative of the different classification methods that exist. The implementations used are the ones found in Scikit-Learn, a machine learning library for Python [17].

The selected algorithms are the following:

2.1.1 Naive Bayes

Naive Bayes (NB) is a classifier based on applying Bayes' theorem with the "naive" assumption of independence between all the features. It is considered a particularly powerful machine learning algorithm, with multiple applications, especially in document classification and spam filtering [25].

In our experiments, we used the Gaussian Naive Bayes implementation from Scikit-Learn, which uses the following formula:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$

2.1.2 Logistic Regression

Logistic regression (LR) is a regression model where the dependent variable is categorical. This allows it to be used in classification, where, as an optimization problem [20], it minimizes the following cost function:

$$\text{minimize} \quad \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + \exp(-b_i a_i^T x)\right)$$

It was invented in 1958 [22], and it is similar to the naive Bayes classifier. However, instead of using probabilities to set the model's parameters, it searches for the parameters that will maximize the classifier performance [12].

2.1.3 Random Forest

Ensemble methods try to improve accuracy and generalization in predictions by using several estimators instead of one. Random Forest (RF) is an ensemble method based on Decision Trees. It uses the averaging method of ensembling, i.e. it averages the predictions of the classifiers used.

Decision Trees differ from the previously presented algorithms, as they are a non-parametric supervised learning method that tries to create a model that predicts the value of a target variable based on several input variables [24]. An example of the rules created is shown in Figure 2.

Figure 2. A tree showing survival of passengers on the Titanic. The figures under the leaves show the probability of survival and the percentage of observations in the leaf. (source: Wikipedia)

2.1.4 Support Vector Machines

Support Vector Machines/Classifiers (SVC) make use of hyperplanes in a high- or infinite-dimensional space, which can be used for classification. They are very effective, compared to other algorithms, in cases where [17]:

• the dataset is very high-dimensional; and/or
• the number of dimensions is higher than the number of samples.

Both of these conditions apply directly to text classification, and although SVCs are slower than other algorithms (especially Bayesian ones), they are very performant and memory efficient.

2.1.5 k-Nearest Neighbors

One of the simplest and most important algorithms in data mining is k-Nearest Neighbors. It is a non-parametric method used for classification, where the input consists of the k closest training examples in the feature space, and the output is the predicted class [2]. Every input item is classified by a majority vote of its neighbors.

It is considered very sensitive to the structure of the data, thus it is commonly used for datasets with a small number of samples.

[...]

• Accuracy:

$$\text{Accuracy} = \frac{tp + tn}{P + N}$$

• Precision: the ability of the classifier not to label as positive a sample that is negative, defined as:

$$\text{Precision} = \frac{tp}{tp + fp}$$

• Recall: the ability of the classifier to find all the positive samples, defined as:

$$\text{Recall} = \frac{tp}{tp + fn}$$

• F1 score: the harmonic mean of the precision and recall, with both weighted equally, defined as:

$$F1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$

2.3 Visual Evaluation Metrics - Confusion Matrix

In addition to the above, we can also make use of a visual metric, the confusion matrix. Also known as the error matrix, it is a special kind of contingency table, with dimensions equal to the number of classes. It summarizes the algorithm performance by exposing the false positives and false negatives.

2.4 Differentiating between binary and multiclass datasets

In order to differentiate the classification metrics used for
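To make the metric definitions concrete, the following is a minimal sketch in plain Python, not the Scikit-Learn implementation used in the experiments. The function names are hypothetical, P + N is taken to be the total number of samples (tp + fn + fp + tn), and returning 0.0 when a denominator is zero is a convention assumed here for illustration.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count tp, fp, fn, tn, treating `positive` as the "true" class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the four counts.

    Assumption: a metric with a zero denominator is reported as 0.0.
    """
    total = tp + fp + fn + tn  # P + N
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def confusion_matrix(y_true, y_pred, labels):
    """Square table: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
    print(binary_metrics(*confusion_counts(y_true, y_pred, positive=1)))
    print(confusion_matrix(y_true, y_pred, labels=[0, 1]))
```

The off-diagonal cells of the matrix are the false positives and false negatives that the confusion matrix is meant to expose.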
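The "one-vs-rest" reduction described in Section 2 can be sketched as follows. This is an illustrative outline only: the helper names are hypothetical, and the per-class scores passed to the second function are assumed to come from some binary classifier that is not shown here.

```python
def one_vs_rest_targets(labels):
    """Build one binary target list per class.

    For each class, that class becomes the "true" class (1) and all
    remaining classes collectively become the "false" class (0).
    """
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def predict_from_scores(scores_per_class):
    """Pick, for every sample, the class whose binary scorer is most confident.

    `scores_per_class` maps each class to a list of scores, one per sample
    (hypothetical outputs of the per-class binary classifiers).
    """
    classes = list(scores_per_class)
    n_samples = len(scores_per_class[classes[0]])
    return [max(classes, key=lambda c: scores_per_class[c][i])
            for i in range(n_samples)]


if __name__ == "__main__":
    labels = ["sports", "politics", "sports", "tech"]
    print(one_vs_rest_targets(labels))
    print(predict_from_scores({"sports": [0.9, 0.2], "tech": [0.1, 0.8]}))
```

Each binary target list can then be evaluated with the binary metrics of the previous sections, one class at a time.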
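The document feature vector discussed in Section 1.1, formed by averaging the separate word vectors, can be sketched as below. The tiny two-dimensional embedding table is made up for illustration (real word2vec or fastText vectors would typically have hundreds of dimensions), and both skipping out-of-vocabulary tokens and returning a zero vector for an empty document are assumptions of this sketch rather than choices confirmed by the thesis.

```python
def average_embedding(tokens, embeddings, dim):
    """Average the word vectors of the known tokens in a document.

    tokens:     list of word strings (already tokenized)
    embeddings: dict mapping word -> list of floats of length `dim`
    dim:        embedding dimensionality
    Assumption: unknown words are skipped; if no token is known,
    a zero vector is returned.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


if __name__ == "__main__":
    # Hypothetical toy embeddings, for illustration only.
    toy_embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
    doc = ["good", "movie", "unseenword"]
    print(average_embedding(doc, toy_embeddings, dim=2))
```

The resulting fixed-size vector has the same dimensionality as the word vectors regardless of document length, which is what allows it to be fed to any of the classifiers in Section 2.1.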