BUSEM at SemEval-2017 Task 4A: Sentiment Analysis with Word Embedding and Long Short Term Memory RNN Approaches


BUSEM at SemEval-2017 Task 4: Sentiment Analysis with Word Embedding and Long Short Term Memory RNN Approaches

Deger Ayata (1), Murat Saraclar (1), Arzucan Ozgur (2)
(1) Electrical & Electronics Engineering Department, Bogaziçi University
(2) Computer Engineering Department, Bogaziçi University
Istanbul, Turkey
{deger.ayata, murat.saraclar, arzucan.ozgur}@boun.edu.tr

Abstract

This paper describes our approach for SemEval-2017 Task 4: Sentiment Analysis in Twitter. We participated in Subtask A: Message Polarity Classification and developed two systems. The first system uses word embeddings for feature representation and Support Vector Machine, Random Forest and Naive Bayes algorithms for classifying Twitter messages into negative, neutral and positive polarity. The second system is based on Long Short Term Memory Recurrent Neural Networks and uses word indexes as a sequence of inputs for feature representation.

1 Introduction

Sentiment analysis is the extraction of subjective information from source materials via natural language processing, computational linguistics, text mining and machine learning. Classifying users' reviews about a concept or political view can enable a range of applications, including customer satisfaction rating, making the right recommendations to the right target, and categorization of users. Sentiment analysis is often referred to as subjectivity analysis, opinion mining or appraisal extraction, with some connections to affective computing. Sometimes whole documents are studied as a sentiment unit (Turney and Littman, 2003), but it is generally agreed that sentiment resides in smaller linguistic units (Pang and Lee, 2008).

This paper describes our approach for SemEval-2017 Task 4: Sentiment Analysis in Twitter. We participated in Subtask A: Message Polarity Classification and developed two systems. The first system uses word embeddings for feature representation and Support Vector Machine (SVM), Random Forest (RF) and Naive Bayes (NB) algorithms for classifying Twitter messages into negative, neutral and positive polarity. The second system is based on Long Short Term Memory Recurrent Neural Networks (LSTM) and uses word indexes as a sequence of inputs for feature representation.

The remainder of this article is structured as follows: Section 2 contains the system descriptions and Section 3 explains the methods, models, tools and software packages used in this work. Test cases and datasets are explained in Section 4. Results are given, with discussion, in Section 5. Finally, Section 6 summarizes the conclusions and future work.

2 System Description

We have developed two independent systems. The first system is word embedding centric and is described in subsection 2.1. The second is LSTM based and is described in subsection 2.2. Further details about both systems are given in Section 3.

2.1 Word Embedding based System Description

In the word embedding centric approach, each word in a tweet is represented with a vector. Tweets consist of words, and the vectorial values of those words (word vectors) are used to represent the tweets themselves as vectorial values. The word embedding system framework is shown in Figure 1.

[Figure 1: General framework of the system — (1) corpus, (2) generate word2vec model, (3) use Google News pre-trained word vectors model, (4) word vectors, (5) vectorize each tweet in the dataset using the word vectors, (6) dataset, (7) tweet vectors, (8) train a classifier with the training data and test it with the test data, (9) results.]

Two methods are used to obtain word vectors in this work. The first method generates word vectors by constructing a word2vec model from the SemEval corpus, as depicted in Figure 1, steps 1 and 2. The second method is based on the Google News pre-trained word vectors model (step 3).

In the first method, a word2vec model is constructed from the entire SemEval tweet corpus, creating a vector space that contains vector values for all unique words in the corpus. Words which occur in similar contexts are positioned closer together in this space. The parameters used in training the word2vec model affect the performance of the whole framework, so it is important to find optimal parameter values. This work focuses on the feature vector dimension size parameter, which determines the dimensionality of the word vectors generated via the word2vec model, and on its impact on overall performance.

The second method uses the Google News pre-trained word vectors model to obtain word vectors, as shown in Figure 1, step 3. This pre-trained model is a dictionary of word and vectorial value pairs, generated by a word2vec model trained on the Google News text corpus.
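As an illustration of the two ways of obtaining word vectors, a minimal sketch with the gensim library is shown below; gensim itself, the file name and all parameter values are assumptions for illustration, since the paper does not name its word2vec implementation here.

```python
# Hypothetical sketch of the two word-vector sources described above
# (assumes gensim >= 4.0; corpus and parameters are illustrative).
from gensim.models import Word2Vec, KeyedVectors

# Method 1: train word2vec on the tokenized SemEval tweets (toy corpus here).
tweets = [["great", "match", "today"], ["worst", "service", "ever"]]
model = Word2Vec(sentences=tweets, vector_size=200, window=5, min_count=1)
vec = model.wv["great"]  # one 200-dimensional word vector

# Method 2: load the Google News pre-trained vectors (300 dimensions).
# google = KeyedVectors.load_word2vec_format(
#     "GoogleNews-vectors-negative300.bin", binary=True)
# vec = google["great"]
```

Here `vector_size` plays the role of the feature vector dimension size parameter studied in this work.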
The next stage uses the obtained word vectors to vectorize the tweets in the dataset, which contains both training and test data (steps 5 and 6). This stage also includes preprocessing of the tweets, e.g. deleting http links and Twitter user names, and deleting duplicate tweets which occur multiple times in the dataset. The preprocessed and formatted tweets are then iterated over to generate a tweet vector for each tweet from the words it contains. The inputs of this stage are therefore the dataset and the model holding the word vectors, while its output is a set of tweet vectors for both the training and the test data.

The resulting tweet vectors are in a numerical format which can be given as input to multiple machine learning algorithms, either to classify the tweets into categories or to test an existing model (step 8). At this stage, each tweet in the dataset is represented as a vector with multiple dimensions (step 7). It is possible to train a new classifier model or to load a pre-trained and saved model. SVM, RF and NB classifier models are trained in this work, and tweets are categorized into three classes: negative, neutral and positive (step 9).

2.2 LSTM Based System Description

The pipeline of the second system consists of several steps: reading tweets from the SemEval datasets, preprocessing the tweets, representing each word with an index, representing each tweet as a sequence of word indexes, and training an LSTM classifier on the index sequences. The flowchart of this system is shown in Figure 2.

[Figure 2: LSTM based system pipeline — Dataset → Pre-processing → Indexing → LSTM.]

3 Methods and Tools

3.1 Word Embedding

Word embedding stands for a set of natural language processing methods in which words or phrases from the vocabulary are mapped to vectors of real numbers (Bengio et al., 2003). Embeddings have been shown to boost the performance of natural language processing tasks such as sentiment analysis (Socher et al., 2013). Vector representations of words can be used in vector operations like addition and subtraction, and vectors generated by word embedding can be used to represent sentences, tweets or whole documents as vectorial values. There are multiple methods for generating sentence vectors from word vectors; a modified version of the sum representation method proposed by Blacoe is used in this work (Blacoe et al., 2012).

The sum representation model, in its original state, sums the vectorial embeddings of the words a sentence contains. With Twt a tweet, twtVec its tweet vector, w a word, wdVec a word vector, n the number of words in the tweet, and Twt_i = (w_1^(i), ..., w_n^(i)) the words in tweet i, the related equation is:

$$twtVec[j] = \sum_{k=1,\ldots,n_i} wdVec_{w_k}[j] \tag{E.1}$$

A modified version, which takes the number of words into account, is used in this work. The related equations are:

$$Twt_i = (w_1^{(i)}, \ldots, w_n^{(i)}) \tag{E.2}$$

$$twtVec[j] = \frac{\sum_{k=1,\ldots,n_i} wdVec_{w_k}[j]}{n} \tag{E.3}$$
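A minimal sketch of this averaged sum representation is given below, assuming numpy and a dict-like word-vector lookup; the helper name and the handling of out-of-vocabulary words are our assumptions, as the paper does not specify them.

```python
import numpy as np

def tweet_vector(words, wd_vec, dim=200):
    """Averaged sum representation of a tweet (E.2/E.3).

    `wd_vec` maps word -> np.ndarray of length `dim`; skipping words
    that have no vector is an assumption, the paper does not discuss it.
    """
    if not words:
        return np.zeros(dim)
    total = np.zeros(dim)
    for w in words:
        if w in wd_vec:
            total += wd_vec[w]
    return total / len(words)  # divide by n, the number of words (E.3)
```

With the gensim model sketched earlier, `tweet_vector(["great", "match", "today"], model.wv)` yields one fixed-length feature vector per tweet, ready for the SVM, RF and NB classifiers of Section 3.2.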
3.2 Classification Models

3.2.1 Support Vector Machine

SVM finds a hyperplane separating the tweet vectors according to their classes while making the margin as large as possible. After training, it classifies test records according to which side of the hyperplane their positions fall on (Fradkin et al., 2000). We used SVM with the following parameters: {Kernel = PolyKernel, batchSize = 100}.

3.2.2 Random Forest

Random forests were first proposed by Ho (Ho, 1995) and later improved by Breiman. The prediction of the ensemble can be written as a weighted average over the training points, as given in equation (E.4). We used Random Forest with the following parameters: {bagSizePercent = 100, batchSize = 100}. With {(x_i, y_i)}_{i=1}^{n} the training set, y* the prediction, x' a new point to classify, W_j(x_i, x') y_i the weight of the i-th training point in the j-th tree, and W the weight function:

$$y^{*} = \frac{1}{m}\sum_{j=1}^{m}\sum_{i=1}^{n} W_{j}(x_{i}, x')\, y_{i} = \sum_{i=1}^{n}\left(\frac{1}{m}\sum_{j=1}^{m} W_{j}(x_{i}, x')\right) y_{i} \tag{E.4}$$

3.2.3 Naïve Bayes

Naïve Bayes is a probabilistic classifier based on Bayes' theorem with an assumption of independence between features (John et al., 1995). Its mathematical expression is given in equation (E.5), where c* is the predicted class, x is a sample, and h_NB is the naïve Bayes function:

$$c^{*} = h_{NB}(x) = \operatorname*{argmax}_{j=1,\ldots,m} P(c_{j}) \prod_{i} P(X_{i} = x_{i} \mid c_{j}) \tag{E.5}$$

3.2.4 Long Short Term Memory Recurrent Neural Nets

LSTM networks have an architecture similar to Recurrent Neural Nets (RNNs), except that they use different functions and architecture to compute the hidden state. They were introduced by Hochreiter & Schmidhuber (1997) to avoid the long-term dependency problem and were refined and popularized by many people in subsequent studies.

LSTMs have the form of a chain of repeating modules with a special kind of architecture. The memory units in LSTMs are called cells. Internally, these cells decide what to keep in and what to erase from memory.
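As a sketch of how the second system's classifier (subsections 2.2 and 3.2.4) could be wired up over padded word-index sequences, a minimal Keras example follows; Keras itself, the vocabulary size, sequence length and layer sizes are all assumptions, since the paper does not state its toolkit or hyperparameters here.

```python
# Hypothetical sketch of an LSTM polarity classifier over word-index
# sequences (negative/neutral/positive); all sizes are illustrative.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences

vocab_size, max_len = 20000, 40  # assumed vocabulary size and tweet length
x = pad_sequences([[12, 5, 933], [7, 2048]], maxlen=max_len)  # toy index sequences

model = Sequential([
    Embedding(vocab_size, 128),      # word index -> dense vector
    LSTM(64),                        # index sequence -> hidden state
    Dense(3, activation="softmax"),  # 3 polarity classes
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
# model.fit(x, y, epochs=...) with y holding one label in {0, 1, 2} per tweet
```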
Recommended publications
  • Comparative Study of Deep Learning Software Frameworks
Comparative Study of Deep Learning Software Frameworks. Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah. Research and Technology Center, Robert Bosch LLC. {Soheil.Bahrampour, Naveen.Ramakrishnan, fixed-term.Lukas.Schott, Mohak.Shah}@us.bosch.com

ABSTRACT: Deep learning methods have resulted in significant performance improvements in several application domains and as such several software frameworks have been developed to facilitate their implementation. This paper presents a comparative study of five deep learning frameworks, namely Caffe, Neon, TensorFlow, Theano, and Torch, on three aspects: extensibility, hardware utilization, and speed. The study is performed on several types of deep learning architectures and we evaluate the performance of the above frameworks when employed on a single machine for both (multi-threaded) CPU and GPU (Nvidia Titan X) settings. The speed performance metrics used here include the gradient computation time, which is important during the training phase of deep networks, and the forward time, which is important from the deployment perspective of trained networks.

[…] such as dropout and weight decay [2]. As the popularity of deep learning methods has increased over the last few years, several deep learning software frameworks have appeared to enable efficient development and implementation of these methods. The list of available frameworks includes, but is not limited to, Caffe, DeepLearning4J, deepmat, Eblearn, Neon, PyLearn, TensorFlow, Theano, Torch, etc. Different frameworks try to optimize different aspects of training or deployment of a deep learning algorithm. For instance, Caffe emphasises ease of use, where standard layers can be easily configured without hard-coding, while Theano provides automatic differentiation capabilities which facilitate flexibility to modify architectures for research and development. Several of these frameworks have received wide attention from the research community and are well-developed, allowing efficient training of deep networks with
  • Reinforcement Learning in Videogames
Reinforcement Learning in Videogames. Àlex Osés Laza. Final Project. Director: Javier Béjar Alonso. FIB • UPC. June 20, 2017.

Acknowledgements: First I would like to acknowledge and appreciate the work and effort that my director, Javier Béjar, has put into me. Whether solving a ton of questions that I had or guiding me throughout the whole project, he has helped me a lot to make a good final project. I would also love to thank my family and friends, who supported me throughout the entirety of the degree and especially during this project.

Abstract (English): While there are still a lot of projects and papers focused on: given a game, discover and measure which is the best algorithm for it, I decided to twist things around and focus on two algorithms and their parameters, to be able to tell which games are best approached with each. To do this, I will be implementing both the Q-Learning and SARSA algorithms, helping myself with Neural Networks to be able to represent the vast state space that the games have. The idea is to implement the algorithms as generally as possible. This way, in case someone wanted to use my algorithms for their game, it would take the least amount of time possible to adapt the game for the algorithm. I will be using some games that are used in Artificial Intelligence competitions, so I have a base to work with, leaving more time to focus on the actual algorithm implementation and results comparison.

Abstract (Catalan, translated): While there already exist many projects and studies focused on: given a game, discovering and measuring which is the best algorithm for that game, I decided to turn it around and focus on: given two algorithms and their parameters, being able to find which kinds of games benefit most from the given configuration.
• Distributed Negative Sampling for Word Embeddings
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17). Distributed Negative Sampling for Word Embeddings. Stergios Stergiou (Yahoo Research), Zygimantas Straznickas (MIT), Rolina Wu (University of Waterloo), Kostas Tsioutsiouliklis (Yahoo Research). [email protected] [email protected] [email protected] [email protected]

Abstract: Word2Vec recently popularized dense vector word representations as fixed-length features for machine learning algorithms and is in widespread use today. In this paper we investigate one of its core components, Negative Sampling, and propose efficient distributed algorithms that allow us to scale to vocabulary sizes of more than 1 billion unique words and corpus sizes of more than 1 trillion words.

Introduction: Recently, Mikolov et al. (Mikolov et al. 2013; Mikolov and Dean 2013) introduced Word2Vec, a suite of algorithms for unsupervised training of dense vector representations of words. Training for such large vocabularies presents several challenges. Word2Vec needs to maintain two d-dimensional vectors of single-precision floating point numbers for each vocabulary word. All vectors need to be kept in main memory to achieve acceptable training latency, which is impractical for contemporary commodity servers. As a concrete example, Ordentlich et al. (Ordentlich et al. 2016) describe an application in which they use word embeddings to match search queries to online ads. Their dictionary comprises ≈200 million composite words, which implies a main memory requirement of ≈800 GB for d = 500. A complementary challenge is that corpora sizes themselves increase: the largest reported dataset that has been used was 100 billion words (Mikolov and Dean 2013), which required training time on the order of days.
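As a quick check of the quoted figure (an editorial aside, assuming 4-byte single-precision floats): two 500-dimensional vectors per word over roughly 200 million words gives

$$2 \times (2 \times 10^{8}) \times 500 \times 4\ \text{bytes} = 8 \times 10^{11}\ \text{bytes} \approx 800\ \text{GB}.$$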
  • Comparative Study of Caffe, Neon, Theano, and Torch
Workshop track - ICLR 2016. COMPARATIVE STUDY OF CAFFE, NEON, THEANO, AND TORCH FOR DEEP LEARNING. Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah. Bosch Research and Technology Center. {Soheil.Bahrampour, Naveen.Ramakrishnan, fixed-term.Lukas.Schott, Mohak.Shah}@us.bosch.com

ABSTRACT: Deep learning methods have resulted in significant performance improvements in several application domains and as such several software frameworks have been developed to facilitate their implementation. This paper presents a comparative study of four deep learning frameworks, namely Caffe, Neon, Theano, and Torch, on three aspects: extensibility, hardware utilization, and speed. The study is performed on several types of deep learning architectures and we evaluate the performance of the above frameworks when employed on a single machine for both (multi-threaded) CPU and GPU (Nvidia Titan X) settings. The speed performance metrics used here include the gradient computation time, which is important during the training phase of deep networks, and the forward time, which is important from the deployment perspective of trained networks. For convolutional networks, we also report how each of these frameworks supports various convolutional algorithms and their corresponding performance. From our experiments, we observe that Theano and Torch are the most easily extensible frameworks. We observe that Torch is best suited for any deep architecture on CPU, followed by Theano. It also achieves the best performance on the GPU for large convolutional and fully connected networks, followed closely by Neon. Theano achieves the best performance on GPU for training and deployment of LSTM networks. Finally, Caffe is the easiest for evaluating the performance of standard deep architectures.
  • Using Long Short-Term Memory Recurrent Neural Network in Land Cover Classification on Landsat and Cropland Data Layer Time Series
Using Long Short-Term Memory Recurrent Neural Network in Land Cover Classification on Landsat and Cropland Data Layer time series. Ziheng Sun (a), Liping Di (a,*), Hui Fang (a). (a) Center for Spatial Information Science and Systems, George Mason University, Fairfax, United States. * [email protected]; Tel: +1-703-993-6114; Fax: +1-703-993-6127; mailing address: 4087 University Dr STE 3100, Fairfax, VA, 22030, United States. Dr. Ziheng Sun is a research assistant professor in the Center for Spatial Information Science and Systems, George Mason University. Dr. Liping Di is a professor of Geography and Geoinformation Science, George Mason University, and the director of the Center for Spatial Information Science and Systems, George Mason University. Hui Fang received her M.Sc. degree from the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University. This manuscript contains 6328 words.

Using Long Short-Term Memory Recurrent Neural Network in Land Cover Classification on Landsat Time Series and Cropland Data Layer: Land cover maps are significant in assisting agricultural decision making. However, the existing workflow of producing land cover maps is very complicated and the result accuracy is ambiguous. This work builds a long short-term memory (LSTM) recurrent neural network (RNN) model to take advantage of the temporal pattern of crops across image time series to improve the accuracy and reduce the complexity. An end-to-end framework is proposed to train and test the model. Landsat scenes are used as Earth observations, and some field-measured data together with CDL (Cropland Data Layer) datasets are used as reference data. The network is thoroughly trained using state-of-the-art techniques of deep learning.
• Learning Options in Deep Reinforcement Learning
Learning Options in Deep Reinforcement Learning. Jean Merheb-Harb, Computer Science, McGill University, Montreal. December 8, 2016. A thesis submitted to McGill University in partial fulfilment of the requirements of the degree of Master of Science. © Jean Merheb-Harb; December 8, 2016.

Acknowledgements: First and foremost, I would like to thank my supervisor, Doina Precup, for all the encouragement and for teaching me so much. A special thanks to Pierre-Luc Bacon, for the great discussions we've had and for enduring my constant bombardment of questions.

Abstract: Temporal abstraction is a key idea in decision making that is seen as crucial for creating artificial intelligence. Humans have the ability to reason over long periods of time and make decisions on multiple time scales. Options are a reinforcement learning framework that allows an agent to make temporally extended decisions. In this thesis, we present the deep option-critic, which combines the powerful representation learning capabilities of deep learning models with the option-critic to learn both state and temporal abstraction automatically from data in an end-to-end fashion. We apply the algorithm to the Arcade Learning Environment, where the agent must learn to play Atari 2600 video games, and analyze the performance and behaviours learned using the options framework.

Résumé (French, translated): Temporal abstraction is a key idea in decision making, considered crucial for creating artificial intelligence. Humans have the ability to reason over long periods of time and make decisions on multiple time scales.
  • Deep Learning: Concepts and Implementation Tools
Deep learning: concepts and implementation tools. Eddy Sánchez-DelaCruz and David Lara-Alabazares. Postgraduate Department, Technological Institute of Misantla, Veracruz, Mexico. eddsacx, [email protected]

Abstract: In this paper, the concepts and tools available for using deep learning in scholarly projects are given. We carry out experiments by combining meta-classifiers with a deep artificial neural network on four binary datasets. The results show optimal percentages of correct classification in some cases. The sampling criterion that prevailed in this study was a representative sample over traditional criteria.

Keywords: Artificial Intelligence · Deep Learning · Artificial Neural Network · Meta-classifiers

1 Introduction: At the beginning of the 21st century, Artificial Intelligence (AI), in the various disciplines that integrate it, has started, in a surprising way, to faithfully emulate human behavior and reasoning. As a result, remarkable progress has emerged in different fields such as computer-assisted medical diagnosis, classification of DNA sequences, data mining, artificial vision, voice recognition, analysis of written language, virtual games, robotics, and any others where reasoning is the main element [9]. Among the different disciplines of AI, deep learning is a novel alternative (from a few decades ago) whose main objective is to make an intelligent agent capable of making its own decisions, something that is currently only possible in science fiction. In this sense, different approaches to Artificial Neural Networks (ANN) are used in deep learning, with the goal of giving an agent a personality comparable to a human's. Among these approaches are Deep Neural Networks, Convolutional Neural Networks, and Deep Belief Networks.
  • An Exploration of Approaches to Integrating Neural Reranking Models in Multi-Stage Ranking Architectures
An Exploration of Approaches to Integrating Neural Reranking Models in Multi-Stage Ranking Architectures. Zhucheng Tu, Matt Crane, Royal Sequiera, Junchen Zhang, and Jimmy Lin. David R. Cheriton School of Computer Science, University of Waterloo, Ontario, Canada. {michael.tu, ma.crane, rdsequie, j345zhan, [email protected]

ABSTRACT: We explore different approaches to integrating a simple convolutional neural network (CNN) with the Lucene search engine in a multi-stage ranking architecture. Our models are trained using the PyTorch deep learning toolkit, which is implemented in C/C++ with a Python frontend. One obvious integration strategy is to expose the neural network directly as a service. For this, we use Apache Thrift, a software framework for building scalable cross-language services. In exploring alternative architectures, we observe that once trained, the feedforward evaluation of neural networks is quite straightforward. Therefore, we can extract the parameters of a trained CNN from PyTorch and import the model into Java, taking advantage of the Java Deeplearning4J library for feedforward evaluation.

[…] results. Recent efforts include the Lucene4IR workshop organized by Azzopardi et al. [3], the Lucene for Information Access and Retrieval Research (LIARR) Workshop [2] at SIGIR 2017, and the Anserini IR toolkit built on top of Lucene [24]. Given the already substantial and growing interest in applying deep learning to information retrieval [11], it would make sense to integrate existing deep learning toolkits with open-source search engines. In this paper, we explore different approaches to integrating a simple convolutional neural network (CNN) with the Lucene search engine, evaluating alternatives in terms of performance (latency and throughput) as well as ease of integration. Fundamentally, neural network models (and even more broadly, learning-to-rank approaches) for a variety of information retrieval
  • Using Deep Learning for Optimized Personalized Card Linked Offer
Using Deep Learning for optimized personalized card linked offer. March 5, 2018. Suqiang Song, Director, Chapter Leader of Data Engineering & AI, Shared Components and Security Solutions. LinkedIn: https://www.linkedin.com/in/suqiang-song-72041716/

Mastercard Big Data & AI Expertise: Differentiation starts with consumer insights from a massive worldwide payments network and our experience in data cleansing, analytics and modeling. Multi-sourced: 38MM+ merchant locations, 22,000 issuers, 2.4 billion global cards, 56 billion transactions/year. Cleansed, aggregated, anonymous, augmented: 1.5MM automated rules, continuously tested. Warehoused: 10 petabytes, 5+ year historic global view, rapid retrieval, above-and-beyond privacy protection and security. Transformed into actionable insights: reports, indexes, benchmarks; behavioral variables; models, scores, forecasting; econometrics. Mastercard enhanced its Artificial Intelligence capability with the acquisitions of Applied Predictive Technologies (2015) and Brighterion (2017).

Hierarchy of needs of AI: https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007?gi=b66e6ae1b71d

Our Explore & Evaluation Journey — Enterprise Requirements for Deep Learning: Analyze a large amount of data on the same Big Data clusters where the data are stored (HDFS, HBase, Hive, etc.) rather than move or duplicate data. Add deep learning capabilities to existing analytic applications and/or machine learning workflows rather than rebuild all of them. Leverage existing Big Data clusters; deep learning workloads should be managed and monitored together with other workloads (ETL, data warehouse, traditional ML, etc.) rather than requiring separate deployment, resource allocation, workload management and enterprise monitoring.

Challenges and limitations to production, considering some "super stars"…

©2016 Mastercard. Proprietary and Confidential.
  • Long Short-Term Memory Recurrent Neural Network Architectures for Generating Music and Japanese Lyrics
Long Short-Term Memory Recurrent Neural Network Architectures for Generating Music and Japanese Lyrics. Ayako Mikami. 2016 Honors Thesis. Advised by Professor Sergio Alvarez, Computer Science Department, Boston College.

Abstract: Recent work in deep machine learning has led to more powerful artificial neural network designs, including Recurrent Neural Networks (RNN) that can process input sequences of arbitrary length. We focus on a special kind of RNN known as a Long-Short-Term-Memory (LSTM) network. LSTM networks have enhanced memory capability, creating the possibility of using them for learning and generating music and language. This thesis focuses on generating Chinese music and Japanese lyrics using LSTM networks. For Chinese music generation, an existing LSTM implementation is used called char-RNN, written by Andrej Karpathy in the Lua programming language using the Torch deep learning library. I collected a data set of 2,500 Chinese folk songs in abc notation to serve as the LSTM training input. The network learns a probabilistic model of sequences of musical notes from the input data that allows it to generate new songs. To generate Japanese lyrics, I modified Denny Britz's GRU model into an LSTM network in the Python programming language, using the Theano deep learning library. I collected over 1MB of Japanese pop lyrics as the training data set. For both implementations, I discuss the overall performance, the design of the model, and adjustments made in order to improve performance.
  • Chained Anomaly Detection Models for Federated Learning: an Intrusion Detection Case Study
    applied sciences Article Chained Anomaly Detection Models for Federated Learning: An Intrusion Detection Case Study Davy Preuveneers 1,* , Vera Rimmer 1, Ilias Tsingenopoulos 1, Jan Spooren 1, Wouter Joosen 1 and Elisabeth Ilie-Zudor 2 1 imec-DistriNet-KU Leuven, Celestijnenlaan 200A, B-3001 Heverlee, Belgium; [email protected] (V.R.); [email protected] (I.T.); [email protected] (J.S.); [email protected] (W.J.) 2 MTA SZTAKI, Kende u. 13-17, H-1111 Budapest, Hungary; [email protected] * Correspondence: [email protected]; Tel.: +32-16-327853 Received: 4 November 2018; Accepted: 14 December 2018; Published: 18 December 2018 Abstract: The adoption of machine learning and deep learning is on the rise in the cybersecurity domain where these AI methods help strengthen traditional system monitoring and threat detection solutions. However, adversaries too are becoming more effective in concealing malicious behavior amongst large amounts of benign behavior data. To address the increasing time-to-detection of these stealthy attacks, interconnected and federated learning systems can improve the detection of malicious behavior by joining forces and pooling together monitoring data. The major challenge that we address in this work is that in a federated learning setup, an adversary has many more opportunities to poison one of the local machine learning models with malicious training samples, thereby influencing the outcome of the federated learning and evading detection. We present a solution where contributing parties in federated learning can be held accountable and have their model updates audited. We describe a permissioned blockchain-based federated learning method where incremental updates to an anomaly detection machine learning model are chained together on the distributed ledger.
• DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters
DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters. Xiaoyi Lu, Haiyang Shi, Rajarshi Biswas, M. Haseeb Javed, and Dhabaleswar K. Panda.

What is Big Data? "Big data" is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software. (Source: Wikipedia) What does "big" mean? According to Forbes, we generate 2.5 quintillion bytes of data each day. Properties of Big Data: variety, velocity, volume.

How is Big Data different from large data stored in relational databases?

Uses of Big Data: Predictive analytics, utilizing a variety of statistical techniques: data mining (discovering patterns in large data sets), predictive modeling (using statistics to predict outcomes), and machine learning. User Behaviour Analytics (UBA): analyzing human behavior patterns.

Big Data technology stack: https://www.slideshare.net/Khalid-Imran/big-data-technology-stack-nutshell

Big Data frameworks — Hadoop vs Spark: Spark is up to 100 times faster than Hadoop. Why? In-memory computation. Spark does not provide its own distributed storage; hence, Big Data projects involve installing Spark on top of Hadoop, so that Spark applications can use the Hadoop Distributed File System (HDFS).

Batch processing: processing happens on blocks of data that have already been stored over a period of time. Example: calculating monthly payroll summaries. (source: https://medium.com/@gowthamy/big-data-battle-batch-processing-vs-stream-processing-5d94600d8103) Stream processing: process data in real time as they arrive.