
and Deep Reinforcement Learning for

Chi Zhang
Department of Computer Science
University of Southern California
Los Angeles, CA, 90089
[email protected]

Abstract

In algorithmic trading, we buy and sell stocks automatically using computers. While high-frequency algorithmic trading is common in financial markets, we focus on long-term algorithmic trading based on historical stock prices and news/tweets. To make the problem tractable, we model trading as a sequential decision-making problem. The transaction period is one day. The return is calculated using the close price with a fixed commission rate. In this midterm report, we implement three baseline models: RL using price-only observations, RL using news-only observations, and a pretrained sentiment classifier plus a manually designed policy. We implement the sentiment classifier as a 2-layer convolutional network using pretrained BERT embeddings. We test our approach on the Dow Jones Industrial Average (DJIA), and experiments show that all the methods overfit the training dataset, while performance on the test data amounts to random guessing. Our code is open source at https://github.com/vermouth1992/CSCI699ml4know/tree/master/project.

1 Introduction

Algorithmic trading is gaining attention with the extraordinary performance of deep learning algorithms in computer vision, natural language processing, and sequential decision making (Ganesh and Rakheja, 2018). Most algorithmic trading, however, is applied in the high-frequency domain (large transactions within fractions of a second), where the trend of the stock market is stationary and easily predictable. This is generally not true in long-term trading, since the underlying dynamics of the stock market are constantly changing. According to basic economic theory, the stock price indicates the future value of a company and reflects the sentiment of investors (Desjardins). If investors lose confidence in a certain company, its price will fall. The historical trend of a stock price reflects future trends to some extent. For example, if a stock has been growing for the last month, the probability that it keeps growing is high. However, historical statistics fail to capture social impacts on the stock market, such as political regulations, trade wars, and Brexit. Such information can be retrieved from news articles and from human commentary such as microblogs. Studies (Bollen et al., 2010) show a strong correlation between Twitter moods and the Dow Jones Industrial Average (DJIA).

In this work, we focus on long-term portfolio management using sentiment analysis and deep reinforcement learning. We will try two approaches:

1. Independent sentiment analysis system: we train a separate, independent sentiment analysis system using Twitter data that produces a confidence score ranging from 0 to 1. Then, we fix the sentiment analysis system and train a trading agent using reinforcement learning. The observation will be Twitter data and price data within a historical window.

2. Joint training: the sentiment analysis system is trained jointly with the RL system.

2 Related Work

2.1 Data Acquisition

Accurately labeled data is crucial for training sentiment analysis systems. Go et al. (2009) propose using emoticons as noisy labels. For example, :), :-), and :D are positive, while :( and :-( are negative. Pagolu et al. (2016) propose using the stock movement direction as labels. Samonte et al. (2017) use keyword tagging for labeling.

Available benchmarks for Twitter sentiment analysis include SemEval (Rosenthal et al., 2014) and STS-Gold (Saif et al., 2013). Other Twitter sentiment analysis datasets can be found in Kaggle competitions (KazAnova; Kaggle).

2.2 Sentiment-encoded Embedding

Word embedding is the key to applying neural network models to sentiment analysis. Directly applying context-free models like word2vec (Mikolov et al., 2013) may encounter problems, because words with similar contexts but opposite sentiment polarities (e.g., "good" and "bad") may be mapped to nearby vectors in the embedding space. Tang et al. (2014) propose Sentiment-Specific Word Embedding (SSWE) to embed both semantic and sentiment information in the learned word vectors. Bespalov et al. (2011) show that an n-gram model combined with a latent representation is a more suitable embedding for sentiment classification. In this paper, the goal is to extract sentiment for a certain stock. Thus, aspect-level features are also important. Vo and Zhang (2015) studied aspect-based Twitter sentiment classification by making use of rich automatic features, which are additional features obtained using unsupervised learning techniques.

2.3 Deep Learning Models

In this paper, we mainly survey deep learning models that can learn sentiment embeddings directly instead of relying on hand-crafted features. Kalchbrenner et al. (2014) propose using a convolutional neural network for semantic modeling of sentences. They design Dynamic k-Max Pooling, a global pooling operation over linear sequences that handles input sentences of varying length and induces a feature graph over the sentence to capture short- and long-range relations. Wang et al. (2017) propose the Coupled Multi-Layer Attention Model (CMLA) for co-extraction of aspect and opinion terms. The model consists of an aspect attention and an opinion attention using GRU units. Li and Lam (2017) further improve the model by using three LSTMs, of which two capture aspect and sentiment interactions. The third LSTM uses the sentiment polarity information as additional guidance. In stock market sentiment analysis, it is crucial to ignore noisy sentiment directed at other entities. He et al. (2017) propose an attention-based model for unsupervised aspect extraction that focuses more on aspect-related words while de-emphasizing aspect-irrelevant words.

3 Trading Model

In this project, we would like to manage a portfolio by distributing our investment across a number of stocks based on the market. We define our environment similarly to Jiang et al. (2017).

Concretely, we define $N$ to be the number of stocks in which we would like to invest. Without loss of generality, we assume our total investment volume is 1 dollar at the initial timestamp. We define the relative price vector as:

$$y_t = \left[1, \frac{v_{1,t+1,\mathrm{close}}}{v_{1,t,\mathrm{close}}}, \frac{v_{2,t+1,\mathrm{close}}}{v_{2,t,\mathrm{close}}}, \cdots, \frac{v_{N,t+1,\mathrm{close}}}{v_{N,t,\mathrm{close}}}\right] \tag{1}$$

where $\frac{v_{i,t+1,\mathrm{close}}}{v_{i,t,\mathrm{close}}}$ is the relative price of stock $i$ at timestamp $t$. Note that the first element represents the relative price of cash, which is always 1. We define the portfolio weight vector as:

$$w_t = [w_{0,t}, w_{1,t}, \cdots, w_{N,t}] \tag{2}$$

where $w_{i,t}$ represents the fraction of investment in stock $i$ at timestamp $t$ and $\sum_{i=0}^{N} w_{i,t} = 1$. Note that $w_{0,t}$ represents the fraction of cash that we maintain. Then the profit after timestamp $T$ is:

$$p_T = \prod_{t=1}^{T} y_t \cdot w_{t-1} \tag{3}$$

where $w_0 = [1, 0, \cdots, 0]$. If we consider a trading cost factor $\mu$, then the trading cost at each timestamp is:

$$\mu_t = \mu \sum_i \left| \frac{(y_t \odot w_{t-1})_i}{y_t \cdot w_{t-1}} - w_{t,i} \right| \tag{4}$$

where $\odot$ is the element-wise product. Then equation (3) becomes:

$$p_T = \prod_{t=1}^{T} (1 - \mu_t)\, y_t \cdot w_{t-1} \tag{5}$$

3.1 Key Assumptions and Goal

To model real-world market trades, we make several assumptions to simplify the problem:

• We can get any information about stock $i$ before timestamp $t$, e.g., the previous stock prices, the news, and tweets online.

• Our investment will not change how the market behaves.

• The way we calculate profit in equation (3) can be interpreted as follows: at timestamp $t$, we observe the historical prices and news up to and including today, and rearrange our portfolio percentages by buying/selling stocks at the close price. This may not hold in practice, because it is not always possible to buy/sell a stock at the close price.
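The profit computation in equations (1) to (5) can be illustrated with a short plain-Python sketch. This is a minimal example under stated assumptions, not the implementation from the project repository: the function names (`relative_price_vector`, `trading_cost`, `portfolio_value`) and the list-based data layout are hypothetical, the sum in equation (4) is assumed to run over all N+1 assets including cash, and the toy prices are made up.

```python
from typing import List, Sequence


def relative_price_vector(close_now: Sequence[float],
                          close_next: Sequence[float]) -> List[float]:
    """Equation (1): 1 for cash, then next-close / current-close per stock."""
    return [1.0] + [nxt / cur for cur, nxt in zip(close_now, close_next)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def trading_cost(y_t: Sequence[float], w_prev: Sequence[float],
                 w_t: Sequence[float], mu: float) -> float:
    """Equation (4): mu * sum_i |(y_t * w_prev)_i / (y_t . w_prev) - w_t[i]|."""
    growth = dot(y_t, w_prev)
    # Weights after the price move, before rebalancing to w_t.
    drifted = [y * w / growth for y, w in zip(y_t, w_prev)]
    return mu * sum(abs(d - w) for d, w in zip(drifted, w_t))


def portfolio_value(ys: Sequence[Sequence[float]],
                    ws: Sequence[Sequence[float]],
                    mu: float = 0.0) -> float:
    """Equation (5); with mu = 0 it reduces to equation (3).

    ys = [y_1, ..., y_T] and ws = [w_0, ..., w_T]; the start value is 1 dollar.
    """
    if len(ws) != len(ys) + 1:
        raise ValueError("need exactly one more weight vector than price vectors")
    value = 1.0
    for t in range(1, len(ys) + 1):
        y_t, w_prev, w_t = ys[t - 1], ws[t - 1], ws[t]
        mu_t = trading_cost(y_t, w_prev, w_t, mu)
        value *= (1.0 - mu_t) * dot(y_t, w_prev)
    return value


# Toy example with N = 1 stock (all numbers are made up).
ys = [relative_price_vector([10.0], [12.0])] * 2  # the stock gains 20% twice
ws = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]         # w_0 holds only cash
no_cost = portfolio_value(ys, ws)
with_cost = portfolio_value(ys, ws, mu=0.01)
print(no_cost, with_cost)
```

Because $w_0$ holds only cash, the first step contributes a factor of 1 regardless of the price move; the gain in this toy run comes entirely from the second step, and the commission lowers the final value slightly.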

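The goal of portfolio management is to maximize $p_T$ by choosing the portfolio weight vector $w_t$ at each timestamp $t$ based on historical stock information.

[Figure 1: General Policy Network Architecture. Diagram labels: News Data, Price Data, Sentiment Classifier, Previous Hidden, Actor, Next Hidden, Actions.]

3.2 POMDP formulation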

We formulate the problem as a Partially Observable Markov Decision Process (POMDP).

3.2.1 State and Action

We define $o_t$ as the observation at timestamp $t$. In our setting, $o_t$ is today's price and today's news. The action $a_t$ is simply the portfolio weight vector $w_t$. Note that this is a problem with continuous state and action spaces. Essentially, we want to train a policy network $\pi_\theta(a_t \mid o_t)$.

3.2.2 State Transition

The underlying state evolution is determined by the market, over which we have no control. What we can obtain is the observation, which consists of the price and the news. Since we collect the historical prices of various stocks, $o_t$ is provided by the dataset.

3.2.3 Reward

Instead of having reward 0 at each timestamp and $p_T$ at the end, we take the logarithm of equation (5):

$$\log p_T = \log \prod_{t=1}^{T} (1 - \mu_t)\, y_t \cdot w_{t-1} = \sum_{t=1}^{T} \log\left((1 - \mu_t)\, y_t \cdot w_{t-1}\right)$$

Thus, we receive a reward of $\log\left((1 - \mu_t)\, y_t \cdot w_{t-1}\right)$ at each timestamp, which avoids the sparse-reward problem.

3.2.4 Reward Shaping

To facilitate the reinforcement learning algorithm, we optionally perform reward shaping by using an indicator of whether the reward is positive or negative (whether we make money or lose money) instead of the actual amount. The advantage is that the agent now needs to avoid any loss of money and must make sure that we only invest when there is high confidence that the stock is going up. In the original reward formulation, the agent makes a tradeoff between its confidence and the amount of money lost.

4 Baseline Approaches

4.1 Reinforcement Learning Algorithm

In this work, we use Proximal Policy Optimization (PPO) (Schulman et al., 2017) as our default RL algorithm. It is a policy gradient algorithm that typically converges much faster than vanilla policy gradient. Since the RL algorithm is not the focus of this work, we refer readers to the original paper for its details.

4.2 Policy Network

We show a generic policy network in Figure 1. The green boxes show the inputs/outputs and the red boxes show the network. The news data consist of a fixed number of news items/tweets for $N$ stocks in one day. The price data consist of the open, high, low, and close prices. In partially observable sequential decision making, the action at the current step also depends on the previous step, which is denoted as the previous hidden state in Figure 1. The sentiment classifier takes news data as input and produces a sentiment score for each stock (the probability of going up). Combining the price data, the sentiment score, and the hidden state from the last step, the actor module produces the current action and the hidden state for the next step.

4.3 Fixed sentiment classifier

For baseline sentiment classification, we apply Convolutional Neural Networks (CNN) for sentence sentiment classification as in Kim (2014). We use a 2-layer CNN with batch normalization and dropout to prevent overfitting. We use the contextualized word embedding BERT (Devlin et al., 2018) to initialize our word embedding layer.

After training the sentiment classifier, we train the actor network with the fixed sentiment classifier using RL.

4.4 News-only Observation RL

We train an end-to-end model using only news as the observation.

4.5 Price-only Observation RL

We train an end-to-end model using only price as the observation.

5 Experimental Results

5.1 Datasets

We use the Dow Jones Industrial Average (DJIA) dataset from https://www.kaggle.com/aaron7sun/stocknews. It contains DJIA prices from 2008-08-08 to 2016-07-01. For each day, it also contains the top 25 news headlines. We label the data using the trend (up/down) of the next day's closing price. We split the whole dataset 0.8/0.2 into training/validation data.

5.2 Performance

Unfortunately, the validation accuracy of the classifier is around 50%, which is the same as random guessing. The accumulated reward of simulated trading using the agents trained by all the approaches is around zero, which is also the same as random guessing.

6 Discussions

We make several comments here to guide our future exploration.

1. There is a fundamental issue with the current sentence labeling approach. We use the stock price to indicate the label. However, this may not be correct. We probably need a general sentiment classification system trained on a large news and financial corpus, and should then average the scores of all news for a particular stock.

2. Predicting the daily stock trend using historical prices and news is almost impossible due to the huge fluctuations of the stock market. We may instead want to predict the trend over a longer period (say, a week) and fix the investment ratio within that period.

References

Dmitriy Bespalov, Bing Bai, Yanjun Qi, and Ali Shokoufandeh. 2011. Sentiment classification based on supervised latent n-gram analysis. Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11.

Johan Bollen, Huina Mao, and Xiao-Jun Zeng. 2010. Twitter mood predicts the stock market.

Desjardins. What causes stock prices to change? https://www.disnat.com/en/learning/trading-basics/stock-basics/what-causes-stock-prices-to-change.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.

Prakhar Ganesh and Puneet Rakheja. 2018. Deep reinforcement learning in high frequency trading. CoRR, abs/1809.01506.

Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. Processing, 150.

Ruidan He, Wee Sun Lee, Hwee Tou Ng, and Daniel Dahlmeier. 2017. An unsupervised neural attention model for aspect extraction. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

Zhengyao Jiang, Dixing Xu, and Jinjun Liang. 2017. A deep reinforcement learning framework for the financial portfolio management problem.

Kaggle. Twitter sentiment analysis. https://www.kaggle.com/c/twitter-sentiment-analysis2/data.

Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. 2014. A convolutional neural network for modelling sentences.

KazAnova. Sentiment140 dataset with 1.6 million tweets. https://www.kaggle.com/kazanova/sentiment140.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. CoRR, abs/1408.5882.

Xin Li and Wai Lam. 2017. Deep multi-task learning for aspect term extraction with memory interaction. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality.

Venkata Sasank Pagolu, Kamal Nayan Reddy Challa, Ganapati Panda, and Babita Majhi. 2016. Sentiment analysis of twitter data for predicting stock market movements.

Sara Rosenthal, Alan Ritter, Preslav Nakov, and Veselin Stoyanov. 2014. Semeval-2014 task 9: Sentiment analysis in twitter. pages 73–80.

Hassan Saif, Miriam Fernandez, Yulan He, and Harith Alani. 2013. Evaluation datasets for twitter sentiment analysis. a survey and a new dataset, the sts-gold. volume 1096.

Mary Jane C. Samonte, John Michael R. Garcia, Valerie Jade L. Lucero, and Shayann Celine B. Santos. 2017. Sentiment and opinion analysis on twitter about local airlines. In Proceedings of the 3rd International Conference on Communication and Information Processing, ICCIP '17, pages 415–422, New York, NY, USA. ACM.

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. CoRR, abs/1707.06347.

Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014. Learning sentiment-specific word embedding for twitter sentiment classification. volume 1, pages 1555–1565.

Duy-Tin Vo and Yue Zhang. 2015. Target-dependent twitter sentiment classification with rich automatic features. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, pages 1347–1353. AAAI Press.

Wenya Wang, Sinno Jialin Pan, Daniel Dahlmeier, and Xiaokui Xiao. 2017. Coupled multi-layer attentions for co-extraction of aspect and opinion terms. In AAAI.