Stock Market Prediction with Deep Learning: a Character-Based Neural Language Model for Event-Based Trading

Leonardo dos Santos Pinheiro
Macquarie University
Capital Markets CRC
[email protected]

Mark Dras
Macquarie University
[email protected]

Leonardo Dos Santos Pinheiro and Mark Dras. 2017. Stock Market Prediction with Deep Learning: A Character-based Neural Language Model for Event-based Trading. In Proceedings of Australasian Language Technology Association Workshop, pages 6–15.

Abstract

In the last few years, machine learning has become a very popular tool for analyzing financial text data, with many promising results in stock price forecasting from financial news, a development with implications for the Efficient Markets Hypothesis (EMH) that underpins much economic theory. In this work, we explore recurrent neural networks with character-level language model pre-training for both intraday and interday stock market forecasting. In terms of predicting directional changes in the Standard & Poor's 500 index, both for individual companies and the overall index, we show that this technique is competitive with other state-of-the-art approaches.

1 Introduction

Predicting stock market behavior is an area of strong appeal for academic researchers and industry practitioners alike, as it is a challenging task and could lead to increased profits. Predicting stock market behavior from the arrival of new information is an even more interesting area, as economists frequently test it to challenge the Efficient Market Hypothesis (EMH) (Malkiel, 2003): a strict form of the EMH holds that any news is incorporated into prices without delay, while other interpretations hold that incorporation takes place over time.

In practice, the analysis of text data such as news announcements and commentary on events is one major source of market information and is widely used and analyzed by investors (Oberlechner and Hocking, 2004). Financial news conveys novel information to broad market participants, and a fast reaction to the release of new information is an important component of trading strategies (Leinweber and Sisk, 2011).

But despite the great interest, attempts to forecast stock prices from unstructured text data have had limited success and there seems to be much room for improvement. This can be attributed in great part to the difficulty involved in extracting the relevant information from the text. So far most approaches to analyzing financial text data are based on bag-of-words, noun phrase and/or named entity feature extraction combined with manual feature selection, but the capacity of these methods to extract meaningful information from the data is limited, as much information about the structure of the text is lost in the process.

In recent years, the trend for extracting features from text data has shifted away from manual feature engineering and there has been a resurgence of interest in neural networks due to their power for learning useful representations directly from data (Bengio et al., 2013). Even though deep learning has had great success in learning representations from text data (e.g. Mikolov et al. (2013a), Mikolov et al. (2013b) and Kiros et al. (2015)), successful applications of deep learning in textual analysis of financial news have been few, even though it has been demonstrated that its application to event-driven stock prediction is a promising area of research (Ding et al., 2015).

Finding the most informative representation of the data in a text classification problem is still an open area of research. In the last few years a range of different neural network architectures have been proposed for text classification, each one with strong results on different benchmarks (e.g. Socher et al. (2013), Kim (2014) and Kumar et al. (2016)), and each one proposing different ways to encode the textual information.

One of the most commonly used architectures for modeling text data is the Recurrent Neural Network (RNN). One technique to improve the training of RNNs, proposed by Dai and Le (2015) and widely used, is to pre-train the RNN with a language model. In that work, this approach outperformed training the same model from random initialization and achieved state-of-the-art results on several benchmarks.

Another strong trend in deep learning for text is the use of a word embedding layer as the main representation of the text. While this approach has notable advantages, word-level language models do not capture sub-word information, may inaccurately estimate embeddings for rare words, and can poorly represent domains with long-tailed frequency distributions. These were motivations for character-level language models, which Kim et al. (2016) and Radford et al. (2017) showed are capable of learning high-level representations despite their simplicity. These motivations seem applicable in our domain: character-level representations can for example generalise across numerical data like percentages (e.g. the terms 5% and 9%) and currency (e.g. $1,29), and can handle the large number of infrequently mentioned named entities. Character-level models are also typically much more compact.

In this work we propose an automated trading system that, given the release of news information about a company, predicts changes in stock prices. The system is trained to predict both changes in the stock price of the company mentioned in the news article and in the corresponding stock exchange index (S&P 500). We also test this system for both intraday changes, considering a window of one hour after the release of the news, and for changes between the closing price of the current trading session and the closing price of the next day's session. This comparative analysis allows us to infer whether the incorporation of new information is instantaneous or whether it occurs gradually over time. Our model consists of a recurrent neural network pre-trained by a character-level language model.

The remainder of the paper is structured as follows: In Section 2, we describe event-driven trading and review the relevant literature. In Section 3 we describe our model and the experimental setup used in this work. Section 5 presents and discusses the results. Finally, in Section 6 we summarize our work and suggest directions for future research.

2 Event-based Trading

In recent years, with the advances in computational power and in the ability of computers to process massive amounts of data, algorithmic trading has emerged as a strong trend in investment management (Ruta, 2014). This, combined with the advances in the fields of machine learning and natural language processing (NLP), has been pushing the use of unstructured text data as a source of information for investment strategies as well (Fisher et al., 2016).

The area of NLP with the biggest influence in stock market prediction so far has been sentiment analysis, or opinion mining (Pang et al., 2008). Earlier work by Tetlock (2007) used sentiment analysis to analyze the correlation between sentiment in news articles and market prices, concluding that media pessimism may affect both market prices and trading volume. Similarly, Bollen et al. (2011) used a system to measure collective mood through Twitter feeds and showed it to be highly predictive of the Dow Jones Industrial Average closing values. Following these results, other work has also used social media information for stock market forecasting (Nguyen et al., 2015; Oliveira et al., 2017, for example).

With respect to direct stock price forecasting from news articles, many systems based on feature selection have been proposed in the literature. Schumaker et al. (2012) built a system to evaluate the sentiment in financial news articles using a Support Vector Regression learner with features extracted from noun phrases and scored on a positive/negative subjectivity scale, but the results had limited success. Yu et al. (2013) achieved better accuracy with a selection mechanism based on a contextual entropy model which expanded a set of seed words by discovering similar emotion words and their corresponding intensities from online stock market news articles. Hagenau et al. (2013) also achieved good results by applying Chi-square and Bi-normal separation feature selection with n-gram features. As with all sentiment analysis, scope of negation can be an issue: Pröllochs et al. (2016) recently proposed a reinforcement learning method to predict negation scope and showed that it improved the accuracy on a dataset from the financial news domain.

A different approach to incorporating news into stock trading strategies was proposed by Nuij et al. (2014), who used an evolutionary algorithm to combine trading rules using technical indicators and events extracted from news with expert-defined impact scores. Although their way of extracting information from financial text data was far from optimal, their results indicated that news events were a component of optimal trading strategies.

[...] predict changes to the S&P 500 index. The architecture here was also multichannel, and incorporated a technical analysis input channel. The results from both pieces of work outperformed the former manual feature engineering approaches.

To the best of our knowledge, character-level sequence modeling has not been applied to stock price forecasting so far; neither has the use of language model pre-training. We note that the event models of Ding et al. (2014) and Ding et al. (2015) make use of generalization and back-off techniques to deal with data sparsity in terms of named entities etc., which, as mentioned earlier, character-level representations could help address. Also, character-level inputs are potentially complementary to other sorts such as word-level inputs or event representations, in particular with the multichannel architectures used for [...]
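To make two ideas from the Introduction concrete, the following is a minimal Python sketch, not the authors' implementation. It illustrates (a) why a character-level input can still represent a term such as "9%" that a word-level vocabulary has never seen, and (b) how a binary directional label might be derived from a price before and after a news release. The helper names (`build_char_vocab`, `encode_chars`, `direction_label`), the example headline, and the choice to treat an unchanged price as "not up" are all assumptions made for illustration.

```python
from typing import Dict, List


def build_char_vocab(texts: List[str]) -> Dict[str, int]:
    """Map every character seen in `texts` to an integer id.

    Id 0 is reserved for characters never seen during vocabulary building.
    """
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode_chars(text: str, vocab: Dict[str, int]) -> List[int]:
    """Encode `text` as a list of character ids (0 for unknown characters)."""
    return [vocab.get(ch, 0) for ch in text]


def direction_label(price_before: float, price_after: float) -> int:
    """Return 1 if the price rose over the window, otherwise 0.

    Treating an unchanged price as 0 is an assumption of this sketch.
    """
    return 1 if price_after > price_before else 0


# Hypothetical training headline, used only to build the vocabularies.
train_texts = ["Shares rose 5% to $19.30"]
char_vocab = build_char_vocab(train_texts)
word_vocab = {word for text in train_texts for word in text.split()}

unseen_term = "9%"
# The word-level vocabulary contains "5%" but not "9%".
print(unseen_term in word_vocab)
# Every character of "9%" is known, so no id falls back to 0.
print(encode_chars(unseen_term, char_vocab))

# Intraday label: price at release versus price one hour later (made-up prices).
print(direction_label(price_before=100.0, price_after=101.5))
```

The same `direction_label` helper could be applied to the interday setting by passing the closing price of the current session and the closing price of the next session.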
