A Multilayer Perceptron Based Ensemble Technique for Fine-Grained Financial Sentiment Analysis

A Multilayer Perceptron based Ensemble Technique for Fine-grained Financial Sentiment Analysis A and B and C and D and E [email protected] Abstract in company stock prices and vice-versa. Benefits of such analysis are two-fold: (i). an individual In this paper, we propose a novel method can take informed decision before buying/selling for combining deep learning and classical his/her share; and (ii). an organization can utilize feature based models using a Multi-Layer this information to forecast its economic situation. Perceptron (MLP) network for financial Sentiment prediction is a core component of sentiment analysis. We develop various an end-to-end stock market forecasting business deep learning models based on Convolu- model. Thus, an efficient sentiment analysis sys- tional Neural Network (CNN), Long Short tem is required for real-time analysis of financial Term Memory (LSTM) and Gated Recur- text originating from the web. Sentiment analy- rent Unit (GRU). These are trained on sis in financial domain offers more challenges (as top of pre-trained, autoencoder-based, fi- compared to product reviews domains etc.) due nancial word embeddings and lexicon fea- to the presence of various financial and techni- tures. An ensemble is constructed by com- cal terms along with numerous statistics. Coarse- bining these deep learning models and a level sentiment analysis in financial texts usually classical supervised model based on Sup- ignores critical information towards a particular port Vector Regression (SVR). We eval- company, therefore making it unreliable for the uate our proposed technique on a bench- stock prediction. In fine-grained sentiment anal- mark dataset of SemEval 2017. The pro- ysis, we can emphasize on a given company with- pose model shows impressive results on out losing any critical information. For example, two datasets, i.e. microblogs and news in the following tweet sentiment towards $APPL headlines datasets. Comparisons show (APPLE Inc.) is positive while negative towards that our proposed model performs better $FB (Facebook Inc.). than the existing state-of-the-art systems for the above two datasets by 2.0 and 4.1 ‘$APPL going strong; $FB not so.’ cosine points, respectively. In literature, many methods for sentiment anal- 1 Introduction ysis from financial news have been described. O’Hare et al. (2009) used word-based approach Microblog messages and news headlines are freely on financial blogs to train a sentiment classifier for available on Internet in vast amount. Dynamic na- automatically determining the sentiment towards ture of these texts can be utilized effectively to companies and their stocks. Authors in (Schu- analyze the shift in the stock prices of any com- maker and Chen, 2009) use the bag-of-words and pany (Goonatilake and Herath, 2007). By keep- named entities for predicting stock market. They ing a track of microblog messages and news head- successfully showed that the stock market behav- lines for financial domain one can observe the ior is based on the opinions. A fine-grained sen- trend in stock prices, which in turn, allows an in- timent annotation scheme was incorporated by dividual to predict the future stock prices. An (de Kauter et al., 2015) for predicting the explicit increase in positive opinions towards a particular and implicit sentiment in the financial text. An ap- company would indicate that the company is do- plication of multiple regression model was devel- ing well and this would be reflected in the increase oped by (Oliveira et al., 2013). In this paper, we propose a novel Multi-Layer (Word2Vec: given a context, a word is predicted Perceptron (MLP) based ensemble technique for or vice-versa; GloVe: count-based model utiliz- fine-grained sentiment analysis. It combines the ing word co-occurrence matrix), some applica- outputs of four systems, one is feature-driven su- tions adapt well to Word2Vec while others per- pervised model and the rest three are deep learning form well on GloVe model. We, therefore, at- based. tempt to leverage the richness of both the models We further propose to develop an enhanced by using a stacked denoising autoencoder.Finally, word representation by learning through a stacked we combine predictions of these models using the denoising autoencoder network (Vincent et al., MLP network in order to obtain the final sentiment 2010) using word embeddings of Word2Vec scores. An overview of the proposed method is de- (Mikolov et al., 2013) and GloVe (Pennington picted in Figure1. et al., 2014) models. For evaluation purpose we use datasets of SemEval-2017 ‘Fine-Grained Sentiment Analysis on Financial Microblogs and News’ shared task (Keith Cortis and Davis, 2017). The dataset com- prises of financial short texts for two domains i.e. microblog messages and news headlines. Com- parisons with the state-of-the-art models show that our system produces better performance. Figure 1: MLP based ensemble architecture. The main contributions of the current work are summarized as follows: a) we effectively com- A. Convolution Neural Network (CNN): bine competing systems to work as a team via Lit- MLP based ensemble learning; b) develop an en- erature suggests that CNN architecture had been hanced word representation by leveraging the syn- successfully applied for sentiment analysis at tactic and semantic richness of the two distributed various level (Kim, 2014; Akhtar et al., 2016; word representation through a stacked denoising Singhal and Bhattacharyya, 2016). Most of these autoencoder; and c) build a state-of-the-art model works involve classification tasks, however, we for sentiment analysis in financial domain. adopt CNN architecture for solving the regression problem. Our proposed system employs a 2 Proposed Methodology convolution layer followed by a max pool layer, 2 fully connected layers and an output layer. We We propose a Multi-Layer Perceptron based en- use 100 different filters while sliding over 2, 3 and semble approach to leverage the goodness of 4 words at a time. We employ all these filters in various supervised systems. We develop three parallel. deep neural network architecture based models, viz. Convolution Neural Network (CNN) (Kim, B. Long Short Term Memory Network 2014), Long Shrort Term Memory (LSTM) net- (LSTM): LSTMs (Hochreiter and Schmidhuber, work (Hochreiter and Schmidhuber, 1997) and 1997) are a special kind of recurrent neural Gated Recurrent Unit (GRU) network (Cho et al., network (RNN) capable of learning long-term 2014)). The other model is based on Support dependencies by effectively handling the vanish- Vector Regression (SVR) (Smola and Scholkopf¨ , ing or exploding gradient problem. We use two 2004) based feature-driven system. LSTM layers on top of each other having 100 The classical feature based system utilizes a neurons in each. This was followed by 2 fully diverse set of features (c.f. Section 2.D). On connected layers and an output layer. the other hand we train a CNN, a LSTM and a GRU network on top of distributed word rep- C. Gated Recurrent Unit (GRU): GRUs (Cho resentations. We utilize Word2Vec and GloVe et al., 2014) are also a special kind of RNN which word representation techniques to learn our fi- can efficiently learn long-term dependencies. A nancial word embeddings. Since Word2Vec and key difference of GRU with LSTM is that, GRU’s GloVe models capture syntactic and semantic re- recurrent state is completely exposed at each time lations among words using different techniques step in contrast to LSTM’s recurrent state which controls its recurrent state. Thus, comparably lexicons (Kiritchenko et al., 2014; Mohammad GRUs have lesser parameters to learn and training et al., 2013) which associate a positive or negative is computationally efficient. We use two GRU score to a token. Following features are extracted layers on top of each other having 100 neurons for each of these: i) positive, negative and net in each. This was followed by 2 fully connected count. ii) maximum of positive and negative layers and an output layer. scores. iii) sum of positive, negative and net scores. D. Feature based Model (SVR): We extract and - Vader Sentiment: Vader sentiment (Gilbert, implement following set of features to learn a 2014) score is a rule-based method that generates Support Vector Regression (SVR) (Smola and a compound sentiment score for each sentence Scholkopf¨ , 2004) for predicting the sentiment between -1 (extreme negative) and +1 (extreme score in the continuous range of -1 to +1. positive). It also produces ratio of positive, - Word Tf-Idf: Term frequency-inverse docu- negative and neutral tokens in the sentence. We ment frequency (tf-idf) is a numerical statistic that obtain score and ratio of each instance in the is intended to reflect how important a word is datasets and use as feature for training. to a document in a corpus. We consider tf-idf weighted counts of continuous sequences of n- Network parameters for CNN, LSTM & GRU: grams (n=2,3,4,5) at a time. In the fully connected layers we use 50 and 10 - Lexicon Features: Sentiment lexicons are neurons , respectively for the two hidden layers. widely utilized resources in the field of sentiment We use Relu activations (Glorot et al., 2011) for analysis. Its application and effectiveness in senti- intermediate layers and tanh activation in the final ment prediction task had been widely studied. We layer. We employ 20% Dropout (Srivastava et al., employ two lexicons i.e. Bing Liu opinion lexi- 2014) in the fully connected layers as a measure con (Ding et al., 2008) and MPQA (Wilson et al., of regularization and Adam optimizer (Kingma 2005) subjectivity lexicon for news headlines do- and Ba, 2014) for optimization. main. First we compile a comprehensive list of positive and negative words form these lexicons E. MultiLayer Perceptron (MLP) based En- and then extract the following lexicon driven fea- semble: Ensemble of models improves the pre- tures.

A Multilayer Perceptron Based Ensemble Technique for Fine-Grained Financial Sentiment Analysis

Audio Event Classification Using Deep Learning in an End-To-End Approach

4 Nonlinear Regression and Multilayer Perceptron

Petroleum and Coal

Learning Sabatelli, Matthia; Loupe, Gilles; Geurts, Pierre; Wiering, Marco

The Architecture of a Multilayer Perceptron for Actor-Critic Algorithm with Energy-Based Policy Naoto Yoshida

Performance Analysis of Multilayer Perceptron in Profiling Side

Perceptron 1 History of Artificial Neural Networks

TF-IDF Vs Word Embeddings for Morbidity Identification in Clinical Notes: an Initial Study

Multilayer Perceptron

Imputation and Generation of Multidimensional Market Data

Lecture 7. Multilayer Perceptron. Backpropagation

A Multilayer Perceptron Solution to the Match