Sentiment Analysis and Deep Reinforcement Learning for Algorithmic Trading

Chi Zhang
Department of Computer Science
University of Southern California
Los Angeles, CA, 90089

Abstract

In algorithmic trading, we buy and sell stocks automatically using computers. While high-frequency algorithmic trading is common in financial markets, we focus on long-term algorithmic trading based on historical stock prices and news/tweets. To make the problem tractable, we model the transaction as a sequential decision making problem. The transaction period is one day. The return is calculated using the close price plus a fixed rate of commission. In the midterm report, we implement three baseline models: RL using price-only observation, RL using news-only observation, and a pretrained sentiment classifier plus a manually designed policy. We implement the sentiment classifier as a 2-layer convolutional network using pretrained BERT embeddings. We test our idea on the Dow Jones Industrial Average (DJIA), and experiments show that all the methods overfit the training dataset and that the performance on testing data amounts to random guessing. Our code is open source at https://github.com/vermouth1992/CSCI699ml4know/tree/master/project.

1 Introduction

Algorithmic trading is gaining attention with the extraordinary performance of deep learning algorithms in computer vision, natural language processing and sequential decision making (Ganesh and Rakheja, 2018). Most algorithmic trading, however, is applied in the high-frequency domain (large transaction volume at fractions of a second), where the trend of the stock market is stationary and easily predictable. This is generally not true in long-term investment, since the underlying dynamics of the stock market are constantly changing. From basic economic theory, the stock price indicates the future value of a company and reflects the sentiment of the investors (Desjardins). If investors lose confidence in a certain company, the stock price will fall. The historical trend of a stock price reflects future trends to some extent. For example, if a stock has been growing for the last month, the probability that it keeps growing is very high. However, historical statistics fail to capture social impacts on the stock market such as political regulations, trade wars and Brexit. This information can be retrieved from news articles and human commentary such as microblogs. Studies (Bollen et al., 2010) show a strong correlation between Twitter moods and the Dow Jones Industrial Average (DJIA).

In this work, we focus on long-term portfolio management using sentiment analysis and deep reinforcement learning. We will try two approaches:

1. Independent sentiment analysis system: we train a separate, independent sentiment analysis system using Twitter data that produces a confidence score ranging from 0 to 1. Then, we fix the sentiment analysis system and train a trading agent using reinforcement learning. The observation will be Twitter data and price data within a historical window.

2. The sentiment analysis system will be jointly trained with the RL system.

2 Related Work

2.1 Data Acquisition

Accurately labeled data is crucial for training sentiment analysis systems. Go et al. (2009) propose using emoticons as noisy labels. For example, :), :-), :D are positive while :(, :-( are negative. Pagolu et al. (2016) propose using stock movement direction as labels. Samonte et al. (2017) use keyword tagging for labeling.

Available benchmarks for Twitter sentiment analysis include SemEval (Rosenthal et al., 2014) and STS-Gold (Saif et al., 2013). Other Twitter sentiment analysis datasets can be found in Kaggle competitions (KazAnova; Kaggle).

2.2 Sentiment-encoded Embedding

Word embedding is the key to applying neural network models to sentiment analysis. Directly applying context-free models like word2vec (Mikolov et al., 2013) may encounter problems because words with similar contexts but opposite sentiment polarities (e.g., “good” and “bad”) may be mapped to nearby vectors in the embedding space. Tang et al. (2014) propose Sentiment-specific Word Embedding (SSWE) to embed both semantic and sentiment information in the learned word vectors. Bespalov et al. (2011) show that an n-gram model combined with a latent representation is a more suitable embedding for sentiment classification. In this paper, the goal is to extract sentiment for a certain stock. Thus, aspect-level features are also important. Vo and Zhang (2015) studied aspect-based Twitter sentiment classification by making use of rich automatic features, which are additional features obtained using unsupervised learning techniques.

2.3 Deep Learning Models

In this paper, we mainly survey deep learning models that can learn sentiment embeddings directly instead of relying on hand-crafted features. Kalchbrenner et al. (2014) propose using a convolutional neural network for semantic modeling of sentences. They design Dynamic k-Max Pooling, a global pooling operation over linear sequences that handles input sentences of varying length and induces a feature graph over the sentence to capture short- and long-range relations. Wang et al. (2017) propose the Coupled Multi-Layer Attention Model (CMLA) for co-extraction of aspect and opinion terms. The model consists of an aspect attention and an opinion attention using GRU units. Li and Lam (2017) further improve the model by using three LSTMs, of which two capture aspect and sentiment interactions. The third LSTM uses the sentiment polarity information as additional guidance. In stock market sentiment analysis, it is crucial to ignore noisy sentiment directed at other entities. He et al. (2017) propose an attention-based model for unsupervised aspect extraction that focuses more on aspect-related words while de-emphasizing aspect-irrelevant words.

3 Trading Model

In this project, we would like to manage a portfolio by distributing our investment across a number of stocks based on the market. We define our environment similarly to Jiang et al. (2017).

Concretely, we define $N$ to be the number of stocks in which we would like to invest. Without loss of generality, we assume our total investment volume at the initial timestamp is 1 dollar. We define the relative price vector as:

\[
y_t = \left[1, \frac{v_{1,t+1,\mathrm{close}}}{v_{1,t,\mathrm{close}}}, \frac{v_{2,t+1,\mathrm{close}}}{v_{2,t,\mathrm{close}}}, \cdots, \frac{v_{N,t+1,\mathrm{close}}}{v_{N,t,\mathrm{close}}}\right] \tag{1}
\]

where $\frac{v_{i,t+1,\mathrm{close}}}{v_{i,t,\mathrm{close}}}$ is the relative price of stock $i$ at timestamp $t$. Note that the first element represents the relative price of cash, which is always 1. We define the portfolio weight vector as:

\[
w_t = [w_{0,t}, w_{1,t}, \cdots, w_{N,t}] \tag{2}
\]

where $w_{i,t}$ represents the fraction of investment in stock $i$ at timestamp $t$ and $\sum_{i=0}^{N} w_{i,t} = 1$. Note that $w_{0,t}$ represents the fraction of cash that we maintain. Then the profit after timestamp $T$ is:

\[
p_T = \prod_{t=1}^{T} y_t \cdot w_{t-1} \tag{3}
\]

where $w_0 = [1, 0, \cdots, 0]$. If we consider a trading cost factor $\mu$, then the trading cost at each timestamp is:

\[
\mu_t = \mu \sum \left| \frac{y_t \odot w_{t-1}}{y_t \cdot w_{t-1}} - w_t \right| \tag{4}
\]

where $\odot$ is the element-wise product and the sum runs over the elements of the vector. Then equation 3 becomes:

\[
p_T = \prod_{t=1}^{T} (1 - \mu_t)\, y_t \cdot w_{t-1} \tag{5}
\]

3.1 Key Assumptions and Goal

To model real-world market trades, we make several assumptions to simplify the problem:

• We can get any information about stock $i$ before timestamp $t$, e.g., the previous stock prices and the news and tweets online.

• Our investment will not change how the market behaves.

• The way we calculate profit in equation 3 can be interpreted as follows: at timestamp $t$, we observe historical prices and news, including today's, and rearrange our portfolio percentages by buying/selling stocks at the close price. This may not be true in practice, because it is not always possible to buy/sell a stock at the close price.

The goal of portfolio management is to maximize $p_T$ by choosing the portfolio weight vector $w$ at each timestamp $t$ based on historical stock information.

3.2 POMDP formulation

We formulate the problem as a Partially-Observable Markov Decision Process.

3.2.1 State and Action

We define $o_t$ as the observation at timestamp $t$. In our setting, $o_t$ is today's price and today's news. The action $a_t$ is just the portfolio weight vector $w_t$. Note that this is a continuous state and action space problem. Essentially, we want to train a policy network $\pi_\theta(a_t \mid o_t)$.

3.2.2 State Transition

The underlying state evolution is determined by the market, over which we have no control. What we can get is the observation, which is the price and the news. Since we collect historical prices of various stocks, $o_t$ is provided by the dataset.

3.2.3 Reward

Instead of having reward 0 at each timestamp and $p_T$ at the end, we take the logarithm of equation 5:

\[
\log p_T = \log \prod_{t=1}^{T} (1 - \mu_t)\, y_t \cdot w_{t-1} = \sum_{t=1}^{T} \log\left((1 - \mu_t)\, y_t \cdot w_{t-1}\right)
\]

[...] the stock is going up. In the original reward formulation, the agent makes a tradeoff between the confidence and the loss of money.

[Figure 1: General Policy Network Architecture. Boxes: News Data, Sentiment Classifier, Price Data, Previous Hidden, Actor, Next Hidden, Actions.]

4 Baseline Approaches

4.1 Reinforcement Learning Algorithm

In this work, we use Proximal Policy Optimization (PPO) (Schulman et al., 2017) as our default RL algorithm. It is a policy gradient algorithm that typically converges much faster than vanilla policy gradient. Since the RL algorithm is not the focus of this work, we leave the details of the algorithm to readers.

4.2 Policy Network

We show a generic policy network in Figure 1. The green box shows the input/output and the red box shows the network. The news data consist of a fixed number of news/tweets for the N stocks in one day. The price data are the open, high, low and close prices. In partially-observable sequential decision making, the current step action also depends on the previous step action, which is denoted as previous hidden in Figure 1.

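The action $a_t$ must be a valid portfolio weight vector, i.e., non-negative entries that sum to 1 (equation 2). The text does not state how the actor output is constrained; one common choice, assumed here purely for illustration, is to pass the $N+1$ raw actor outputs through a softmax. The name `softmax_weights` is hypothetical.

```python
import math

def softmax_weights(scores):
    """Map N+1 unconstrained actor outputs to non-negative weights summing to 1.
    Index 0 is interpreted as the cash weight w_{0,t}."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for cash and two stocks.
w = softmax_weights([0.0, 1.0, -1.0])
print(w, sum(w))
```

With this choice every action lies on the probability simplex by construction, so equations 3 to 5 can be applied to the policy output directly. Other parameterizations (for example, a Dirichlet policy) are equally possible.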