Master of Science (M.Sc.) in Software Engineering

Twitter sentiments for high frequency trading

Viktor Víðisson

Instructor: Jacqueline Clare Mallett
June 2021

Copyright Viktor Víðisson, June 2021

Abstract

Twitter has become an influential source of news that can affect market behaviour, with many users monitoring news agency feeds directly, without editorial review. Headlines quoting central bank leaders as well as top government officials can have a significant impact on the markets. In this thesis we examined whether there is any correlation between a group of tweets quoting the same headline and measurable differences in the market before and after that headline. Can this help high frequency trading firms tune their strategies, filtering important headlines from the vast ocean of headlines and decreasing exposure when market making? In this thesis tweets were grouped by the number of references to the headline within 1 hour. We compared ratios of trading measurements before and after the headline came out. We trained a Recurrent Neural Network (RNN) using Long Short Term Memory (LSTM) cells to tell us whether the sentiment in a headline affects the US treasury futures or the SP 500 stock market future. We found that initial training on datasets of tweets that started trending after causing a noticeable initial reaction in the market helped the neural network classify tweets as impactful or not. A network that can take in a live feed from Twitter and other news agency sources could help high frequency trading firms react automatically to incoming headlines.

Útdráttur (Icelandic abstract)

Twitter has become an influential source of news that can affect market behaviour, as many users follow news feeds and tweet directly without editorial review. Headlines from central bank leaders as well as high-ranking politicians can have a large impact on the market. In this thesis we examine the correlation between a group of people tweeting about a headline and measurable changes in market behaviour before and after the headline. Can this help high frequency trading firms tune their algorithms automatically by filtering important headlines out of the vast ocean of news, reducing the sizes quoted when market making? In this thesis tweets were grouped by how much coverage they received in the following hour. We compared ratios of market measurements before and after the headline. We trained a Recurrent Neural Network (RNN) using Long Short Term Memory (LSTM) cells to tell us whether headlines would affect the US treasury bond futures or the SP 500 stock index future. We found that initially training the network on a dataset of tweets that received wide coverage and had a visible impact on the market helped the network decide whether new tweets affected the market or not. Training a network that can take in a live stream of tweets and other news feeds can help high frequency trading firms react automatically to new headlines.

Twitter sentiments for high frequency trading

Master of Science (M.Sc.) in Software Engineering

Viktor Víðisson June 2021

Thesis of 60 ECTS credits submitted to the School of Science and Engineering at Reykjavík University in partial fulfillment of the requirements for the degree of Master of Science (M.Sc.) in Software Engineering

Student: Viktor Víðisson

Examining Committee: Jacqueline Clare Mallett

Examining Committee: Kristinn R Þórisson

Examining Committee: Stephan Schiffel

Examining Committee: Dean Christakos

Contents

List of Figures 8

List of Tables 10

1 Introduction 14
  1.1 Introduction ...... 14
  1.2 Related work ...... 16

2 Background 19
  2.1 Limit order book ...... 19
    2.1.1 Limit order book tutorial ...... 20
  2.2 Sweep info and Stamina info ...... 26
    2.2.1 Sweep info ...... 26
    2.2.2 Stamina info ...... 28
  2.3 Liquidity ...... 30
  2.4 Zoom plot explanation ...... 31
    2.4.1 Zoom plot tutorial ...... 31
    2.4.2 Matching algorithms ...... 33
  2.5 High Frequency Trading ...... 35
  2.6 Market Making ...... 36
  2.7 Market reactions ...... 38

3 Methods 42
  3.1 Experimental design data preparation ...... 42
  3.2 Code Flow ...... 45
  3.3 Preparing tweets ...... 45
    3.3.1 alike_next_hour ...... 48
  3.4 Market impact analysis ...... 50
  3.5 TA library ...... 52
  3.6 Create datasets ...... 53
  3.7 Dataset impact analysis ...... 55
  3.8 Data augment the input data ...... 56
  3.9 Prepare the data for the ANN ...... 57
  3.10 Define the ANN model ...... 60
  3.11 Training the ANN model ...... 63
  3.12 Optimizers for the ANNs ...... 65
  3.13 Regularization for the ANNs ...... 67
  3.14 Training schedule ...... 69
  3.15 Evaluation metrics ...... 70

4 Results 74
  4.1 Dataset impact analysis results ...... 74
  4.2 Evaluation metrics results ...... 76

5 Conclusion 80
  5.1 Future work ...... 80

References 83

6 Appendix 87
  6.1 Highest retweets ...... 87
  6.2 Best results per optimizer ...... 91
  6.3 Best results per evaluation metric ...... 96
  6.4 Code ...... 101
    6.4.1 Sort twitter ...... 101
    6.4.2 Alike next hour ...... 104
    6.4.3 Get product ...... 108
    6.4.4 Prepare market data ...... 111
    6.4.5 Tweets to train base ...... 113
    6.4.6 Technical analysis, TA library ...... 120
    6.4.7 Group sweep ...... 133
    6.4.8 Classification metrics ...... 135
    6.4.9 Regularize XY ...... 137
    6.4.10 Class weights Output biases ...... 139
    6.4.11 Get sentences ...... 140
    6.4.12 Plot history ...... 141
    6.4.13 Split XY ...... 142
    6.4.14 Define model ...... 142
    6.4.15 Fit model ...... 145
    6.4.16 Grid searching configurations ...... 146
    6.4.17 Predict sample ...... 150

List of Figures

1 Empty limit order book ...... 21
2 One open order in the limit order book on the bid price 9 ...... 21
3 Two open orders in the wide limit order book ...... 22
4 The book is one tick wide now ...... 22
5 Liquid limit order book ...... 23
6 An order crossing the book, 1st trade for 100 lots at price 10 ...... 23
7 A 1000 lot order crossed the book on the bid, matching the 200 lots resting on the bid and establishing the remaining 800 lots on the new best offer price 9 ...... 24
8 2500 resting orders on the bid price 8 get matched in two different sweep events, a) being a single trade report, b) being multiple trade reports ...... 25
9 Zoom plot demo part 1 ...... 33
10 Zoom plot demo part 2 ...... 33
11 LIFFE's matching algorithm for Short Sterling ...... 35
12 James Bullard, the president of the Federal Reserve Bank of St. Louis, talks about a 50 basis point rate cut in an interview ...... 38
13 John Williams, the president of the Federal Reserve Bank of New York, gave an interview the day before. The following day the bank made a statement that his interview got misinterpreted ...... 39
14 Fed Chair Jerome Powell gives his opinion about the shifting trend in the Nonfarm payroll number ...... 40
15 The trade meeting between Trump and Xi got postponed ...... 40
16 White House looking into the possibility of firing the Fed Chair Jerome Powell ...... 41
17 Trump announced he was going to delay increasing tariffs on Chinese imports ...... 41
18 All correlation cases, number of consecutive cases ...... 47
19 Zoomed in on correlation cases, number of consecutive cases ...... 47
20 Cases where a tweet had y-axis many tweets in the following hour with a minimum correlation of 0.5 to it ...... 49
21 Cases where a tweet had y-axis many tweets in the following hour with a minimum correlation of 0.5 to it ...... 49
22 FOMC members and White House advisors ...... 53
23 The Keras model when we use bidirectional LSTM layers and batch normalization ...... 62
24 Dataset impact analysis results ...... 76
25 Best results by evaluation metric ...... 77
26 Results by optimizer plotted against ANNs ...... 78

List of Tables

1 Sweep info, rows of data for each contract ...... 27
2 Stamina info, rows of data for each contract ...... 29
3 Spread contracts, rows of data for each contract. Notice the difference in the number of sweep info rows and stamina info rows ...... 30
4 Sweep info demo ...... 32
5 Stamina info demo ...... 32
7 Adam settings and results ...... 92
8 Adamax settings and results ...... 93
9 Nadam settings and results ...... 94
10 RMSprop settings and results ...... 95
11 Highest precision settings ...... 97
12 Highest recall settings ...... 97
13 Highest AUC settings ...... 98
14 Highest f score settings ...... 98
15 Highest MCC settings ...... 99
16 Highest Youden settings ...... 99
17 Highest C kappa settings ...... 100

List of Algorithms

1 Code flow ...... 45
2 Determining which future contract to use for market impact analysis of a tweet ...... 50
3 High level view of the market impact analysis ...... 51
4 Group sweep ...... 53
5 Creating datasets ...... 54
6 Dataset impact analysis ...... 55
7 Tokenizer ...... 58
8 Embedding matrix ...... 58
9 Define model ...... 60
10 Predict model ...... 64
11 Grid searching NN model settings ...... 68
12 The other binary settings in the grid search ...... 68
13 Get best classification results and model settings ...... 69
14 Training schedule ...... 70

List of Abbreviations

ANN Artificial Neural Network. 16, 78

AUC Area under the ROC curve. 70

CME Chicago Mercantile Exchange. 26

CNN Convolutional Neural Network. 17

DIJA Dow Jones Industrial Average. 16, 17

EST Eastern Standard Time. 42

FED Federal Reserve. 15

FIFO First In First Out. 33

FN False Negative. 70

FOMC Federal Open Market Committee. 15

FP False Positive. 71

HFT High Frequency Trading. 35

LSTM Long Short Term Memory. 2, 18

MCC Matthews correlation coefficient. 72

MM Market Maker. 15, 31

NLP Natural Language Processing. 16

Q Queue. 19

QE Quantitative Easing. 15, 81

RNN Recurrent Neural Network. 2

ROC Receiver Operating Characteristic. 71

SGD Stochastic Gradient Descent. 65

SVM Support Vector Machines. 17

TA Technical Analysis. 52

TN True Negative. 70

TP True Positive. 70

USTF US Treasury Futures. 43

1 Introduction

1.1 Introduction

Market actions are driven by supply and demand from the market participants. Market participants have their estimates of where prices should be and where they are likely to be in the future, and they place their orders accordingly. Many if not most traders take into consideration rate decisions made by central banks and changes in the economic environment. Many traders look at changes in foreign affairs between countries and in trade relationships. Many economic fundamentals can change when a central banker gives a statement about the future outlook or a politician hints that there might be upcoming changes in foreign or domestic affairs. It has become important for many traders and investors to monitor news agencies like Bloomberg, trying to analyse new headlines and decide whether to enter new positions or modify existing ones. In this sense, a position means either owning a financial product (a long position) or owing a financial product (a short position), which means selling the product first with the intention of repurchasing it later at a lower price. Twitter can be a reliable source of news if good traders or news agencies are followed that provide a good overview of the sentiment in the market. Which headlines are relevant and which are not depends on the financial product group being monitored, for example stock market indices or US treasuries. For example, if a headline comes out where a single company warns that its production targets were not met, it will have a significant impact on that stock's price, but it is unlikely to impact the US treasury futures market.

The former President Trump tweeted a lot¹. Most of his tweets did not have any impact on the market at all, but those which did often had significant impact on the market².

¹ He got banned from Twitter for spreading misinformation about the alleged election fraud, https://blog.twitter.com/en_us/topics/company/2020/suspension.html

There are many day traders, market researchers and small market news agencies competing to tweet as fast as they can, for example when Trump said anything noteworthy. They build their reputation and gain followers by being quick and relevant when tweeting headlines. Decisions made by the Federal Reserve (FED) have the most impact on the US treasury futures and the stock market indices. Companies borrow money for production and issue corporate bonds that are tied to current interest rates determined by the Federal Open Market Committee (FOMC), and the Fed has also been buying bonds through Quantitative Easing (QE). The Federal Reserve has the most influence over the market, although its goal is not to influence the market but to minimize unemployment and keep prices stable (stable inflation). The Fed is made up of several Federal Reserve banks in key areas of the USA, whose presidents take turns sitting on the FOMC voting committee. The Fed Chair (currently Jerome H. Powell) is by far the most influential member and has the biggest influence over monetary policy. Some members are permanent voting members, like the NY Fed president, the vice chair and the chairman of the board; the other Fed banks rotate their seats on the committee [23]. It is very important for a Market Maker (MM), which has open limit orders on both sides of the book, to know when a new headline comes in that will move the market. The market maker is looking for an equilibrium of buyers and sellers entering the market. The research questions being asked in this thesis are:

• Do trending Trump tweets have more impact on the market compared to those that do not trend?

• Do trending tweets about Fed members mentioning monetary policy have more impact on the market compared to those that do not trend?

² https://www.cnbc.com/2019/09/03/on-days-when-president-trump-tweets-a-lot-the-stock-market-falls-investment-bank-finds.html

• Can an Artificial Neural Network (ANN) learn to classify tweets as market-impactful or not?

• Null hypothesis: Tweets cannot be shown to impact the market.

To answer these questions we investigate the challenges of using neural networks for Natural Language Processing (NLP) to analyse the market. Recurrent neural networks have been trained on datasets of text manually marked as negative or positive in order to learn sentiment [6]. Hence it should be possible to do the same with financial market data: if the market is measured before and after a tweet is posted, the tweet can be marked as eventful or not. This thesis shows how to compare tweets to market data and view the market impact tweets have. It shows how to find trending tweets and use them to train an Artificial Neural Network (ANN). It shows how to grid-search ANN settings that work on different datasets. It shows how to analyse the market impact a subgroup of tweets has on the market, for example trending tweets mentioning Federal Reserve members, and then use those market impact analyses to create a training schedule for the ANN, training it first on the datasets that impacted the market the most.

1.2 Related work

There is an extensive literature on using algorithms for trading, and a lot has been done using Twitter and artificial neural networks for market speculation. Bollen, Mao and Zeng used their own artificial neural network to categorise tweets as positive or negative, then integrated a trained ANN made by Google that categorized the tweets into 6 dimensions (calm, alert, sure, vital, kind and happy) [26]. They tried to predict whether the Dow Jones Industrial Average

(DIJA) stock index closed higher or lower, and achieved 86.7% accuracy in predicting the daily up and down changes in the closing values of the DIJA from March 2008 to December 2008. Gilbert and Karahalios used a dataset of 20 million blog posts on LiveJournal to track the emotional state of web bloggers, estimating the anxiety, worry and fear in those posts [21]. They then applied Monte Carlo simulation and Granger-causal analysis to see whether a lot of fear put downward pressure on the SP 500 stock index and upward pressure on the VIX index, which is a volatility index. They found that their anxiety index was high for a long while before the stock market fell during the financial crisis of 2008, and that their anxiety index was negative when the stock market started recovering. Recurrent Neural Networks (RNN) are most commonly used to extract sentiment from text. Convolutional Neural Networks (CNN) have also been used to analyse sentiment [20]. Abid, Alam, Yasir and Li combined CNN and RNN to measure sentiment [22]: they used CNN layers to extract features and RNN layers to capture the long-term dependencies, and obtained better accuracy on the Stanford Twitter sentiment datasets than the group that only used a CNN. There are also those who use machine learning techniques other than deep learning [29]. Kiritchenko, Zhu and Mohammad used Support Vector Machines (SVM), but also took certain types of hashtags into consideration to get hints of the sentiment. Their system ranked first in both tasks at the SemEval-2013 competition (sentiment analysis in Twitter). ANNs excel at storing implicit knowledge, but they struggle with tasks that require long-term memory of temporal causal chains; they lack a working memory system like humans have. Recurrent neural networks provide a way to extend deep learning to sequential data with parameter sharing. The parameter sharing works as each member of the output is a function of the previous members of the output.
Each member of the output is produced using the same update rule applied to the previous outputs. This allows

RNNs to share parameters through a very deep computational graph. Long Short Term Memory (LSTM) networks have been shown to learn long-term dependencies more easily than simple recurrent architectures; LSTMs have been shown to connect events with large temporal distances between them [25] [1]. We were not able to find any research on measuring the micro trading conditions before and after headlines, nor on grouping together quoted headlines, nor on analysing the difference in market conditions after Fed member comments; see chapter 3.1.
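To make the LSTM gating concrete, the following is a minimal single-step LSTM cell in plain NumPy. This is an illustrative sketch, not the Keras model used in this thesis; the dimensions and weight layout are our own assumptions. The forget gate decides how much of the old cell state survives each step, which is what lets these networks bridge large temporal distances.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step. W: (4H, D) input weights, U: (4H, H)
    recurrent weights, b: (4H,) bias; gate order i, f, o, g is assumed."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = sigmoid(z[:H])        # input gate: how much new content to write
    f = sigmoid(z[H:2*H])     # forget gate: how much old cell state to keep
    o = sigmoid(z[2*H:3*H])   # output gate: how much state to expose
    g = np.tanh(z[3*H:])      # candidate content
    c_new = f * c + i * g     # long-term memory carried across steps
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Iterating this step over the tokens of a headline (with `h` and `c` carried forward) is the core of what the RNN layers in later chapters do.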

2 Background

The goal of this chapter is to give the reader an understanding of how trades are done on exchanges and of the kind of market data this thesis uses for analysing market impact, with a tutorial on how those metrics are best viewed against tweets, the different types of market participants, and different market reactions to headlines. There are multiple exchanges around the world trading a wide variety of financial products: the assets directly, or options, futures, swaps and so on written on the assets. The assets can be stocks, bonds, cattle, oil, cryptocurrencies, currencies, stock indices and many other things. What all of the exchanges, assets and contract types have in common is that almost all of them are traded via the limit order book.

2.1 Limit order book

The limit order book is a simple and very efficient method for trading. The price is determined solely by the supply and demand of market participants. The book consists of bids and asks ("offers" is a term often used instead of asks). The bid side has queues of limit orders on the bid prices and the ask side has queues of limit orders on the ask prices. The book builds as market participants enter their limit orders into the queues, specifying the price, the side (is_offer) and the size of the order. The order then enters at the back of the Queue (Q) at that price on that side. A limit order is an order to purchase or sell a security at a specified price or better: a buy limit order will be executed only at the limit price or lower, while a sell limit order will be executed only at the limit price or higher. This stipulation allows traders to better control the prices at which they trade³. If an incoming sell limit order is for a price that is currently a bid in the

³ https://www.investopedia.com/terms/l/limitorder.asp

limit order book, then it will match orders that are waiting in the Q on that bid price, whose owners buy the contract from the incoming seller; hence a trade occurs. The incoming sell limit order crossed the book. Crossing the book means either sending a buy order to the exchange at a price that is currently an offer price or, vice versa, sending a sell order to the exchange at a price that is currently a bid price. A trade consists of two market participants, but a single order that crosses the book can match multiple orders in the Q, and thus trade with multiple market participants. If the incoming sell limit order is for a price that is currently an offer in the limit order book, then it enters the line at the back of the Q at that price. The same goes for an incoming buy limit order: if it is for a price that is currently an offer in the limit order book, it will match orders waiting in the Q on that ask/offer price, whose owners sell the contract to the incoming buyer; hence a trade occurs. If the incoming buy limit order is for a price that is currently a bid in the limit order book, then it enters the line at the back of the Q at that price.
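The price-time-priority mechanics described above can be sketched as a toy matching engine. This is an illustrative sketch with our own class and method names, not exchange code or the firm's implementation; sizes are in lots.

```python
from collections import deque

class LimitOrderBook:
    """Minimal price-time-priority book: each side maps
    price -> FIFO queue of resting order sizes (lots)."""

    def __init__(self):
        self.bids = {}
        self.asks = {}

    def add(self, price, size, is_offer):
        """Insert a limit order; cross the book first if it is marketable."""
        book, opposite = (self.asks, self.bids) if is_offer else (self.bids, self.asks)
        remainder = self._match(opposite, price, size, is_offer)
        if remainder > 0:  # unmatched remainder rests in the queue
            book.setdefault(price, deque()).append(remainder)

    def _match(self, opposite, price, size, is_offer):
        # A sell crosses bids at or above its price; a buy crosses asks at or below.
        crossable = (lambda p: p >= price) if is_offer else (lambda p: p <= price)
        while size > 0 and opposite:
            best = max(opposite) if opposite is self.bids else min(opposite)
            if not crossable(best):
                break
            q = opposite[best]
            while size > 0 and q:
                fill = min(size, q[0])
                size -= fill
                q[0] -= fill
                if q[0] == 0:
                    q.popleft()
            if not q:
                del opposite[best]  # whole queue swept: the price "flips"
        return size
```

Replaying the tutorial in the next section against this class (resting 200-lot bid at 9, then an aggressive 1000-lot sell at 9) leaves the bid empty and 800 lots resting as the new best offer at 9.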

2.1.1 Limit order book tutorial

Let us take an example of a limit order book with a price step of 1, the difference between adjacent prices. It is only possible to place orders at prices on the price step interval, like 9, 10, 11, etc. If the price step were 3, it would be possible to place orders at prices 9, 12, 15, etc. It is not possible to place an order at price $9.01, for example, and thereby skip the Q at price 9; this used to be possible on some exchanges a decade ago. The book starts empty, as seen in figure 1.

Figure 1: Empty limit order book

A trader sends to the exchange a buy limit order for price 9, quantity: 10. Then price 9 becomes the best bid, see figure 2.

Figure 2: One open order in the limit order book on the bid price 9

A trader sends to the exchange a sell limit order for price 11, quantity 20, which enters the book. Price 11 becomes the best ask/offer. Now the book is two ticks wide, since there are no orders at price 10, as shown in figure 3.

Figure 3: Two open orders in the wide limit order book

A trader sends to the exchange a sell limit order for price 10, quantity 10, which enters the book. Now price 10 becomes the best offer, see figure 4.

Figure 4: The book is one tick wide now

Multiple limit orders placed by multiple market participants enter the book, each joining behind the existing entries. The book is now considered more liquid, see figure 5.

Figure 5: Liquid limit order book

If someone enters a buy order for 100 lots at price 10, they will trade with the first 100 lots in the Q at 10, and 1205 lots remain in the Q at offer price 10. The aggressive order, the order that crossed the spread, determines whether the trade is classified as a buy or a sell. In this case it was classified as a buy order, but bear in mind that for the other market participant it was a sell order, see figure 6.

Figure 6: An order crossing the book, 1st trade for 100 lots at price 10

If the aggressive order is larger than the total quantity in the Q, this is called a sweep. If an aggressive sell order for 1000 lots at price 9 comes in, it takes the available 200 lots, then changes ("flips") the price, and the remaining 800 lots go to the front of the Q on the ask/offer side. The price lowers from 10 to 9. This is how prices change: supply and demand in the limit order book, as shown in figure 7. In between the price changes, 100 lots were bought and 200 sold. Back and forth trading can be measured as the minimum of those two, here 100 lots, or by looking at the volume on the other side of the book, meaning the trading volume on the best offer before the best bid traded through, or the trading volume on the best bid before the best offer traded through. In this case both methods give 100 lots of back and forth volume.

Figure 7: A 1000 lot order crossed the book on the bid, matching the 200 lots resting on the bid and establishing the remaining 800 lots on the new best offer price 9
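The back and forth measurement described above reduces to a small helper. This is a sketch with an assumed input format, not the firm's actual implementation:

```python
def back_and_forth_volume(trades):
    """trades: list of (is_buy, lots) aggressions between two price changes.
    Back and forth volume is the overlap of buying and selling pressure,
    i.e. the smaller of total buys and total sells."""
    buys = sum(lots for is_buy, lots in trades if is_buy)
    sells = sum(lots for is_buy, lots in trades if not is_buy)
    return min(buys, sells)

# The tutorial's example: 100 lots bought and 200 sold before the flip.
assert back_and_forth_volume([(True, 100), (False, 200)]) == 100
```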

Now let's look at two different sweep event types, meaning the flow of events while a price is being swept.

(a) 1 large aggressive order takes out the whole Q on a price.

(b) Multiple orders arriving a few microseconds or milliseconds apart take out the Q and flip it.

Some exchanges report the size of the incoming aggressive order in the trades list, while others report the sizes of the passive limit orders in the Q that got filled. For case (b), if the timestamps of the incoming trades are near identical, it is more helpful to sum these trades up to 2500 lots instead of looking at the sweep event as a single 10 lot order that flipped the price, see figure 8.

Figure 8: 2500 resting orders on the bid price 8 get matched in two different sweep events, a) being a single trade report, b) being multiple trade reports
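The case (b) aggregation can be sketched as follows. The 500-microsecond gap is an assumed threshold for illustration, not the latency limit used in this thesis:

```python
def group_sweeps(trade_reports, gap_micros=500):
    """Group consecutive trade reports whose timestamps are within
    `gap_micros` microseconds into one sweep event (case b in the text).
    trade_reports: list of (ts_micros, lots), sorted by timestamp.
    Returns the total lots per sweep event."""
    sweeps = []  # list of (total_lots, last_ts)
    for ts, lots in trade_reports:
        if sweeps and ts - sweeps[-1][1] <= gap_micros:
            total, _ = sweeps[-1]
            sweeps[-1] = (total + lots, ts)  # same burst: extend the sweep
        else:
            sweeps.append((lots, ts))        # gap too large: new sweep event
    return [total for total, _ in sweeps]
```

Three reports microseconds apart collapse into the single 2500-lot sweep of figure 8, while a report arriving much later starts a new event.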

The data used in this thesis to compare against the tweets are measurements of those sweep events, and of what happens in between them to the best bid and best offer.

2.2 Sweep info and Stamina info

The data used in this thesis consists of different micro trading analyses of several futures contracts traded on the Chicago Mercantile Exchange (CME), collected and measured at our firm. Each row represents a price change: an analysis of the sweep event which changed the price and some micro trading behaviour in between the price changes. Update_level messages are sent from the exchange to all market participants when a trader places a new order into the limit order book or cancels an existing order. Book_traded messages are sent from the exchange to all market participants when a trade occurs between two traders. By micro trading behaviour we generally mean quantifying all update_level and book_traded messages coming from the exchange.

2.2.1 Sweep info

The sweep info analyses the sweep event and what has happened on the best bid and best offer since the previous sweep event. We use additional functions to compare sweep events, giving us a better view of trading conditions. See table 1 for an overview of the products we are using and how many rows of data each product has; most of them are active for 3 months, while some, such as the spread contracts, are mostly active for a week before the former contract expires. The more rows a product has, the more volatile it is likely to be, i.e. the more frequently prices change. The US treasury futures spreads are the least volatile products, being a spread between two very similar contracts, but they also have the largest sweep events because market makers are willing to put a lot of exposure on them, knowing they are not volatile. When a sweep event has started it has already been sent to the exchange, even though it might take a few milliseconds to report it back. The more trades are being matched, the longer it takes the exchange to report back to

the market participants; for example, the spreads have longer sweep events compared to a thin volatile contract. Because the spreads have a lot of open limit orders on the best bid and best offer, it takes the exchange a long time to match all those orders and report the trades back to all market participants during large spread sweep events.

         M8      U8      Z8      H9      M9      U9      Z9      H0
ZT    16564   25036   37294  134962  129558  304414  257903  255496
ZF   117916  187970  205769  221783  153225  452726  458669  427095
ZN    96820  131568  189642  171725  111946  349323  336588  367294
TN   194668  233915  208817  215739     NaN     NaN     NaN     NaN
ZB   250662  226857  327826  254717  203903  461816  556869  611663
UB   257485  310031  392533  329908  293608  588962  966248  934594
ES  1705286 1042102 4080647 2045909 1203395 2702206 2191826 8305085

Table 1: Sweep info, rows of data for each contract

Table 1 gives an overview of how many price changes happened in each contract, each row containing an analysis of the price changing event (sweep) and what happened in between the price changes. ZTM8 stands for the 2-Year US T-Note future which expired in June 2018; ZNZ9 stands for the 10-Year US T-Note future which expired in December 2019. An individual contract is called an outright, while a contract which trades two contracts expiring at different times at the same time is called a spread, for example ZTU8-ZTZ8. In this thesis we used the outrights that were active during 2018 and 2019. We have both older and newer data that can be used to further test and train the networks on, as well as spread contracts that trade actively for shorter periods of time and have few rows of data due to their non-volatile nature. Each row in the sweep info dataset has the following columns:

{
  'quant': Total trade quantity of the sweep event that changed the price,
  'price': Price,
  'is_offer': Did the offer or bid just trade through,
  'ts_epoch': Second timestamp,
  'ts_micros': Microsecond timestamp,
  'sells': How many contracts were sold in between this sweep event and the previous one,
  'buys': How many contracts were bought in between this sweep event and the previous one,
  'support_before': The liquidity on the price before the sweep event,
  'latency': How many microseconds the sweep event took,
  'latency_limit': Time limit on what we decide to be part of the sweep event wrt trade size,
  'latency_reached': Did the trade event take longer than the guesstimated time limit,
  'trade_count': How many trades were in the sweep event,
}

Other market making specific columns are:

['min_opp_support', 'last_opp_support', 'seed_support', 'seed_depth', 'seed_timing', 'continued', 'double_continued', 'bnf', 'offer_bnf', 'bid_bnf', 'Bad_sweep', 'double_Bad_sweep', 'Bad_sweep_latency', 'life_saving_volume', 'bnf_before_bs', 'Bad_sweep_bnf', 'ts_combined', 'volume', 'marker', 'boS', 'bbS', 'boBS', 'bbBS', 'boDBS', 'bbDBS']
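As a hedged illustration of how rows with these columns can feed the before/after comparison used later in the thesis, one can compare summed sweep quantities around a headline timestamp. The function name, the 300-second window and the simple ratio are our own assumptions, not the thesis's exact measurements:

```python
def sweep_quantity_ratio(rows, headline_epoch, window=300):
    """Ratio of summed sweep quantities after vs. before a headline,
    over a +/- `window` second interval. `rows` are sweep-info dicts
    with at least 'ts_epoch' and 'quant' (see the column list above)."""
    before = sum(r['quant'] for r in rows
                 if headline_epoch - window <= r['ts_epoch'] < headline_epoch)
    after = sum(r['quant'] for r in rows
                if headline_epoch <= r['ts_epoch'] < headline_epoch + window)
    # A ratio well above 1 suggests sweep activity picked up after the headline.
    return after / before if before else float('inf')
```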

2.2.2 Stamina info

Stamina is a measure which sums up back and forth trading for multiple liquidity thresholds on the best bids and best offers. Liquidity in this case means the number of open orders in the Q at a given time while spread-crossing orders (trades) are coming in. This measures activity in between price changes.

It captures how much liquidity market participants were willing to provide on the best prices while other market participants were crossing the spread. After market moving headlines this stamina usually lowers, with market makers being more passive, unwilling to offer liquidity on the best prices, and more prone to cancel open orders and dump their positions quickly in a moving market. After a market moving headline, once the price has moved a certain amount, there are often price levels where the market is uncertain whether the price has gone too far (an overshoot) or should go further still. During those times a decent amount of stamina can establish itself, with many spread-crossing sellers and buyers arriving at the same time. Crossing the spread means buying at the offer or selling at the bid, instead of placing a limit order and waiting to match with someone else as market makers do. See table 2 for an overview of the products and how many rows of stamina information each has.

        M8     U8     Z8     H9     M9     U9     Z9     H0
ZT    9862  26886  39379  14751   4276    632    710   1746
ZF    4800  11344  18576  12210  10790   4339   3640   7406
ZN   16193  31788  42301  24629  32616  16439  15742  27265
TN     769   1312   1274    921  11158    NaN    NaN    NaN
ZB    9566  17340  24863  11880   8986   8297   4318   9862
UB    6156   7622  14253   7265  18118   5459   2098   3398
ES    2218  14230  19494   8849  18681  18681  22599  21367

Table 2: Stamina info, rows of data for each contract

Columns for stamina_info:

{
  'ts_epoch': Second timestamp,
  'ts_micros': Microsecond timestamp,
  'is_offer': Was the stamina flushed on the offer or bid side,
  'clear_all': All stored stamina info is flushed due to a sweep event,
  'stamina': How much liquidity on the best prices was provided by market makers and others with open orders while trading on both sides was being established,
  'bnf': How many lots were bought and sold under the stamina liquidity, while the support was higher on both sides than a set stamina threshold. It is calculated for each stamina threshold.
}
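A minimal sketch of the stamina idea follows. The snapshot format and the thresholds are assumptions for illustration, not the firm's configuration: for each liquidity threshold we sum the volume traded while both best prices held at least that many resting lots.

```python
def stamina_by_threshold(samples, thresholds=(100, 500, 1000)):
    """samples: list of (bid_support, offer_support, traded_lots)
    snapshots taken between price changes. Returns, per threshold,
    the back and forth volume traded while BOTH best prices carried
    at least that much resting liquidity."""
    return {t: sum(lots for bid, offer, lots in samples
                   if bid >= t and offer >= t)
            for t in thresholds}
```

Higher totals at higher thresholds mean traders kept large orders open on the best prices under heavy trading, which is exactly the liquidity the stamina measure tracks.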

Due to the non volatile and high volume nature of the spreads, there is much more stamina data compared to the number of sweep events, as shown in table 3.

              ZTM8ZTU8  ZFM8ZFU8  ZNM8ZNU8  ZBM8ZBU8  UBM8UBU8
sweep_info         162       357       346       185       144
stamina_info    106956     96077     62439     45476    295848

Table 3: Spread contracts, rows of data for each contract. Notice the difference in the number of sweep info rows versus stamina info rows

2.3 Liquidity

A market is said to be liquid if traders can easily enter and exit their positions because there are always open limit orders on both sides of the book with whom they can trade. The term liquid can be taken further: a market can be said to be liquid when it is almost always one tick wide, meaning the price difference between the best bid and best offer is exactly one price step interval. In the limit, a market is liquid when open limit orders on the best bid and best offer stay open without being canceled when trading on that price starts accumulating. That liquidity is what the stamina info in the previous chapter measures against back and forth trading. The more willing traders are to keep open orders on the best prices under the trading volume, the higher the stamina is.

2.4 Zoom plot explanation

In this thesis we introduce a "zoom plot" which provides a view of how the market behaves on the best prices in between and during price changes. This is a method we developed to help determine visually whether a tweet had any impact on the market. The plots are viewed from the market maker's perspective. A Market Maker (MM) is a type of trader who has open limit orders on both sides of the book at the same time. Sweep events are marked red if the market continues in the same direction in the following sweep event, and purple if the market continues more than 1 price step consecutively. Back and forth trading is marked green, since that is what market makers seek. Stamina information is marked as stars with corresponding horizontal lines: the line stands for the liquidity threshold, and the star stands for the back and forth trading that established itself while the liquidity was above that threshold. The more thresholds there were whose back and forth trading surpassed the threshold limit, the more lines and stars there will be at that data point. The lines and stars have matching colors. For example, the 1000 limit threshold is purple: it will have a purple horizontal line and a purple star somewhere above it.

2.4.1 Zoom plot tutorial

Consider two datasets. One includes the sweep info and the other the stamina info for the same time period, see tables 4 and 5.

quant  price  is_offer  ts_epoch  ts_micros  sells  buys  support_before
 4500     10      True         0     286873    100  6000            4500
 1500     12      True         1     287050    500  3000            1500
 1300     14      True         2     287170   1200  2500            1300
 5000     16      True         3     310009    400  6000            5000
  400     16     False         4     396104   3000   100             400
 1500     16      True         5     396826    100  1500            1500
  600     18      True         6     418114    300   100             600
 1000     18     False         7     459542   3000    50            1000
 2000     16     False         8     464054   4000   150            2000
 3000     14     False         9     471792   3000  1000            3000
  100     14      True        10     481091    200    10             100
  200     14     False        11     490278   2000   600             200
 1000     14      True        12     535428    200  3000            1000

Table 4: Sweep info demo

ts_epoch  ts_micros  is_offer  clear_all  stamina   bnf
       0     286873      True      False        0     0
       1     287050     False      False      200   450
       1     287050     False      False      400    50
       2     287170     False      False      200  1200
       2     287170      True       True      400  1200
       2     287170      True       True      600  1100
       2     287170      True       True      800  1100
       2     287170      True       True     1000  1050
       3     310009     False      False      200   100
      11     490278     False      False      200   550
      11     490278     False      False      400   500
      11     490278     False      False      400   322
      12     535428      True       True      200   328

Table 5: Stamina info demo

The x-axis is artificial data for demo purposes. When iterating through each line, the zoom plot updates as follows: the initial 4500 quant sweep is marked blue to begin with, since it has not continued further yet, see the first plot in figure 9. In the second plot the market has continued further and that 4500 quant sweep is now marked red. Then it continues even further, so it becomes purple with a larger marker. The largest amount of stamina established with back and forth trading is at index 2, resulting in a couple of horizontal lines and corresponding stars, as seen in figures 9 and 10. The right y-axis is for the price that traded through (black line), the best bid (thin green line) and the best offer (thin red line). The left y-axis is for the sweep quantities and back and forth trading quantities (stamina).

Figure 9: Zoom plot demo part 1

Figure 10: Zoom plot demo part 2

2.4.2 Matching algorithms

Most exchanges use First In First Out (FIFO) matching algorithms, which means that if an incoming trade order crosses the book and matches orders that are waiting in the Q, the trade will match the orders that entered the Q first. Crossing the book means placing a buy order on the best offer or placing a sell order on the best bid. Some exchanges also use prorata algorithms, split prorata/FIFO algorithms, or even more complicated time-incentive prorata algorithms. Prorata algorithms take the incoming trade order which crossed the book and match it with the largest orders in the Q, giving the largest portion of the fill to the one who has the largest order in the Q. For example, if it is split prorata/FIFO with a 60%/40% ratio, such as the ZT contract on CME (2 year US treasury future), then 40% of the incoming order that crossed the book goes to the ones that came first into the Q and the remaining 60% is distributed to the largest orders in the Q. The purpose of the prorata algorithm is to give market participants an incentive to place large passive limit orders, hence increasing liquidity. This also has the effect that prices are swept with much larger sweep orders, creating more volume and generating more fees for the exchanges, since they charge per contract on top of other fixed costs. On the other hand, it can also reduce volume, since sweep events become more calculated and infrequent: there is less urgency in crossing the book, the books become less volatile, and they are less likely to move on small news headlines [24]. More complicated matching algorithms, like the ones on the LIFFE exchange in the UK that trades Short Sterling, Eurodollar and Euroswiss, use time-incentive prorata matching that takes into account both the size of an order in the Q and its position in the Q, see figure 11.
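The split prorata/FIFO allocation described above can be sketched as follows. This is a simplified illustration, not CME's actual matching logic: real exchanges apply minimum-allocation and rounding rules that are ignored here, and the function name and queue representation are our own.

```python
def split_prorata_fifo(incoming_qty, queue, prorata_share=0.6):
    """Allocate an incoming aggressive order across resting orders.

    queue: list of (order_id, qty) in time priority (earliest first).
    A fixed share is matched pro rata by resting size; the rest FIFO.
    Rounding and minimum-allocation rules of real exchanges are ignored.
    """
    fills = {oid: 0 for oid, _ in queue}
    total_resting = sum(q for _, q in queue)
    prorata_qty = int(incoming_qty * prorata_share)

    # Pro rata leg: the largest resting orders get the largest share.
    for oid, qty in queue:
        fills[oid] += int(prorata_qty * qty / total_resting)

    # FIFO leg: the remaining quantity matches in time priority.
    remaining = incoming_qty - sum(fills.values())
    for oid, qty in queue:
        take = min(remaining, qty - fills[oid])
        fills[oid] += take
        remaining -= take
        if remaining == 0:
            break
    return fills
```

With a 100 lot incoming order against a queue of 100 (earliest) and 300 lots, 60 lots are split pro rata (15 and 45) and the remaining 40 go FIFO to the earliest order.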

Figure 11: LIFFE's matching algorithm for Short Sterling

2.5 High Frequency Trading

The term High Frequency Trading (HFT) is open to interpretation. It is generally used to refer to those who trade often and hold positions for short periods of time, usually with computer algorithms that are co-located at the exchanges. Market makers are generally classified as high frequency traders, and event traders and exchange arbitrage traders often are as well. Many contracts are traded on multiple exchanges. If an exchange arbitrage trader sees a price difference between exchanges, he will buy the contract on the exchange where the price is lower and sell it on the exchange where the price is higher.

2.6 Market Making

Market making is a general concept, and it can be open to interpretation when a trading strategy can be classified as a market making strategy. A market maker can be a single tick market maker or a wide market maker. Some exchanges require that the majority of a participant's orders be passive limit orders in order for them to be classified as a market maker and possibly get a market making fee discount.4 A simple description of a single tick market maker's strategy is placing passive orders on all prices near the best prices. The single tick market maker is trying to buy passively with limit orders in the Q on the best bid, and sell passively with limit orders in the Q on the best offer, in between price changes. The market maker is looking for back and forth trading, preferably with good stamina conditions, and is trying to avoid large sweep events, especially the ones that continue further, with the following sweep events at higher prices for offer sweep events or at lower prices for bid sweep events. If a market maker sells on the best offer, becomes short, and then thinks the price is going to move higher, they will try to buy the same amount back, aggressively crossing the spread on the best offer. This is known as scratching the position. Vice versa, if they buy passively on the best bid and think the price is going to move down, they will try to sell by crossing the spread on the best bid, scratching the position. If the price event occurs before the market maker manages to scratch the position, they will enter the Q on the other side and try to get out of the swept position passively. If the sweep event continues further, meaning the next price will get swept, they will then try to get out on that price before it gets swept, or by entering the Q on the other side to get out passively for a 1 tick loss. A 1 tick loss means a loss of the value of one price step multiplied by the number of contracts.
For example with the 10 year US treasury future ZN, the minimum price

4https://www.cmegroup.com/content/dam/cmegroup/trading/interest-rates/ files/ir-incentive-program-guide.pdf

fluctuation is $15.625. This means that if a market maker lost 1 tick on a 100 lot position, they would lose $1562.50 plus clearing and exchange fees. Originally they were trying to sell 100 contracts on the best offer and buy 100 contracts on the best bid, or vice versa, winning $1562.50 minus clearing and exchange fees. It would be very valuable for the market maker to be able to predict these conditions: will they be favorable, with a lot of back and forth trading under good stamina, or will there be limited back and forth trading with large sweep events that continue further? Market makers contribute most of the liquidity provided on trading exchanges. They make the tick thin, meaning that those who enter the market aggressively by crossing the spread will almost always get the best price available. It is often claimed that if there were no market makers the book would be wide, hence traders coming into the market crossing the spread would have to buy the contract at a much higher price or sell at a much lower price. High frequency market making has increased volume significantly, which has allowed exchanges to lower trading fees and to invest in newer and better technology, moving from pit trading to electronic trading. The number of sweep events is far greater than the number of back and forth trading events, which makes it difficult for market makers to profit. Some high frequency trading firms have made agreements with online brokers to buy their order flows. The broker can then offer customers the opportunity to trade with zero fees. The firms then market make these incoming orders in their private dark pools, without competition on the public exchanges. Dark pools are limit order books within a single company. Although the trader gets to trade with zero fees, they may get worse prices if the dark pool maintains a wide spread.
In trader parlance this is known as spending a dime to save a nickel.
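The one-tick arithmetic from the ZN example above can be written out explicitly. This is a minimal sketch: `one_tick_pnl` is a hypothetical helper, and clearing and exchange fees are left out, as the text notes.

```python
def one_tick_pnl(tick_value, contracts, ticks=1):
    """P&L of a move of `ticks` price steps on a futures position:
    tick value in dollars times number of contracts times ticks.
    Fees are ignored."""
    return tick_value * contracts * ticks

# ZN (10 year US treasury future): minimum price fluctuation $15.625.
# Losing 1 tick on a 100 lot position costs $1562.50 before fees.
loss = one_tick_pnl(15.625, 100)
```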

2.7 Market reactions

Markets tend to react differently depending on the sentiment and on what fundamentals most market participants have priced in. In this chapter we look at several different market reactions to different headlines, using the zoom plot method described before. The federal funds rate, which is determined by the FOMC about every 7 weeks, is by far the most impactful number. If there is an expected change in the course of the rate path in the future, especially the near future, the market will react. The market will look for hints and clues when Fed members give speeches or interviews. The blue vertical line on the plots marks the time of the headline (tweet), see figure 12.

Figure 12: James Bullard, the president of the Federal Reserve Bank of St. Louis, talks about a 50 basis point rate cut in an interview

It can be hard for Fed members to give speeches and interviews, since they can say something that gets misinterpreted. Sometimes the market also reacts when headlines that moved the market earlier in the day, or a few days before, are walked back, see figure 13.

Figure 13: John Williams, the president of the Federal Reserve Bank of New York, gave an interview the day before. The following day the bank made a statement that his interview had been misinterpreted

Often the market reacts with different trading behaviour on a headline even though the headline is hard to spot just by looking at the price graph. Often the market will become much more sweep heavy on a headline while the price keeps trading on a similar trend and in a similarly tight range. Sweep heavy means there are mostly sweep events, and the sweep events become larger. There can be confusion about how, or whether, the headline had any impact. Here it is very important to know the current sentiment in the market, as shown in figure 14, where there is an increase in sweep event behaviour.

Figure 14: FOMC chair Jerome Powell gives his opinion about a shifting trend in the Nonfarm payroll number

It can also occur that headlines which are not important move the market sharply initially, then cause buyers and sellers to arrive at the same time in the confusion, after which the market drifts back to where it was before, as shown in figure 15.

Figure 15: The trade meeting between Trump and Xi got postponed

The Fed chair serves 4 years. The Fed is supposed to be independent but sometimes silly headlines seem to spike the market, see figure 16.

Figure 16: The White House is looking into the possibility of firing the Fed chair, Jerome Powell

The US treasury futures market trades 23 hours a day, 5 days a week, excluding the pre-market auctions. During overnight periods the market is less active and it can take a bit longer to react to a headline, even an important one. Trump delaying tariffs could be interpreted as showing good faith and as a sign that trade negotiations with China were going well, see figure 17.

Figure 17: Trump announced he was going to delay increasing tariffs on Chinese imports

3 Methods

3.1 Experimental design data preparation

There are two types of market impacts that we are looking for.

1. Will the headline move the market, increasing volume, bid ask spread [27] and volatility?

2. Will the headline not move the market, but make the market more illiquid afterwards and increase the volume of price changes (sweep quantity)?

The tweets are the independent variables in the experiments and the reactions by the market are the dependent variables. We measured this using previously computed trading analysis of the market before and after the tweet. We grouped the tweets together and counted how many of the tweeters we follow mentioned the headline, limited to a fixed time of 1 hour. We use the timestamp of the Trump tweet to split the market data analysis. Trading in between price changes and sweep quantity had been measured beforehand (see chapters 2.2.1 and 2.2.2). We compared the ratios between these two before and after the event. Volatility and range are calculated from the price changes, and volume is included in the market analysis. Fed members do not use Twitter directly and therefore do not have a timestamped tweet. We took the first tweet that mentioned the headline and split the market data one minute before that tweet. That is more than enough to put the market reaction into the post headline data-frame. We exclude tweets that happen on key time spots where economic numbers are usually released. For example, 8:30am Eastern Standard Time (EST) is a time slot used for releasing most of the important economic numbers. Other busy time slots are 10am EST, 1pm EST, and every 6 weeks when the Fed announces the rate decision at 2pm EST. At 9:30am EST the stock market opens, followed by increased volume and volatility in the futures market; similarly, increased volume is often seen when the stock market is about to close at 4pm EST. We used market data analysis for US Treasury Futures (USTF) and SP 500 futures, the E-mini. There are 6 different US treasury futures, which all have the same 4 expiries per year. The E-mini contract also has 4 expiries per year, about three weeks after the US treasury futures expire. All of the market data is timestamped down to the microsecond. In order to extract the sentiment out of the market, we categorize the number of references to a headline by multiple accounts in a short time window. This helps when determining whether an incoming tweet is impactful or not. We used LSTM neural networks for this purpose [25]. The Twitter accounts that quote the headlines are generally considered to be up to date with the overall sentiment in the market at any given point. Obviously there are many dependent variables, since we are looking at various different market reactions. There are also many independent variables:

• How many tweets referenced the headline.

• How many followers the user has.

• How important the person in the headline is.

• Keywords in the headline.

• Widespread effect of the tweet, for example whether a war has started.

• Whether the market was very volatile before the headline.

In this thesis we measured how much the tweets trended. We did not use the number of followers a user has. We created specific datasets of tweets that mention people who influence the markets, and specific datasets of tweets that include the word "says". We did not measure the widespread effect of the tweet. We compared the market conditions before a tweet was tweeted to the market conditions after.

We have observed that the most retweeted tweets are usually not the ones that have impact on the market. We deem the retweet count not to be a helpful input into the neural network. The reader can see the most retweeted tweets in the appendix, chapter 6.1. Most of them are obviously not tweets that impact the market.

3.2 Code Flow

The flow of the code can be seen in the following algorithm. The code being referenced in the algorithm can be found in the appendix under code.

Algorithm 1: Code flow
  -> Sort_twitter;
  -> Alike_next_hour;
  -> Get_product;
  if do_technical_analysis then
      -> Group_sweep;
      -> do_ta;
  end
  -> Prepare_market_data;
  -> Tweets_to_train_base;
  if do_data_augmentation then
      -> Reguralize_XY;
  end
  -> Class_weights;
  -> Output_biases;
  -> Get_sentences;
  -> Split_XY;
  -> Define_model;
  -> Fit_model;
  -> Predict_sample;
  -> Grid_searching_ANN_configurations;

3.3 Preparing tweets

We collected Twitter data between 2017 and 2020 into an HDF5 file, tweets.h5. Only tweets from users we monitored and deemed useful were selected; in total over 400 thousand tweets from these selected users were used. See the appendix, chapter 6.4.1, for how the tweets were modified. We wanted to find headlines that started trending, ergo those where other tweeters start covering the same headline. We used the SequenceMatcher library to compare tweets. In order to do that using vectorization, we created new columns where we shifted users and tweets before and after into new columns for each tweet. After that we could create new columns by running SequenceMatcher on the previous and following tweets and storing the correlation ratio between them. The SequenceMatcher class has a ratio() function which returns a measure of the sequences' similarity as a float in the range [0,1].5 This creates a dilemma: should we keep the correlation minimum low, to get more cases to train the ANN with, or high, to get a more accurate similarity between tweets? If we set the correlation minimum too high we end up only with retweets; if we keep it low, around 0.3, we may get headlines from the same person but not necessarily the same headline; if we keep it high we have fewer consecutive cases. For example,

• correlation minimum: 0.3:

– 2 sequential cases: 70k.
– 6 sequential cases: 1205.

• correlation minimum: 0.6:

– 2 sequential cases: 13k.
– 6 sequential cases: 6.

The SequenceMatcher documentation suggests, as a rule of thumb, that a ratio() value of 0.6 or above means the sequences are close matches.

5https://docs.python.org/3/library/difflib.html
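As a concrete illustration of the ratio() measure discussed above, the snippet below compares hypothetical headlines with difflib's SequenceMatcher; the headline texts are invented for the example.

```python
from difflib import SequenceMatcher

def headline_similarity(a: str, b: str) -> float:
    """Return difflib's similarity ratio in [0, 1] for two headlines."""
    return SequenceMatcher(None, a, b).ratio()

# Two wordings of the same (invented) headline score high...
near = headline_similarity(
    "Fed's Powell says rate path remains data dependent",
    "Powell says rate path remains data dependent",
)
# ...while unrelated headlines score much lower.
far = headline_similarity(
    "Powell says rate path remains data dependent",
    "Trump delays tariffs on Chinese imports",
)
```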

Figure 18: All correlation cases number of consecutive cases

Figure 19: Zoomed in on correlation cases number of consecutive cases

When we chose a minimum of 0.5, we usually got the same headline, and as seen in figure 18 it gives us 46% more cases compared to setting it to 0.6. We came to the 0.5 conclusion by trial and error, comparing tweets with different minimum ratios. Now consider the scenario in practice. There can be tweets coming in between that do not quote the headline. They might be about something totally different, or about something else that had just moved the market. We needed a tactic for finding how many tweets match a certain tweet over the following period of time, maybe 30 minutes, possibly longer. Some analysts have a big following not for being the fastest market tweeter but for being great analysts. They might quote the headline a couple of minutes later, even hours later. If so, it might be helpful for training the network. We use the timestamp of the first tweet. When it comes to using this practically, if we were to act immediately on the headline, we cannot wait for a minimum number of tweeters to quote the headline. But we can be more patient if we are warning other strategies that the sentiment has changed and might impact the trading behaviour in the market for the next minutes or hours. For this analysis we created the attribute alike_next_hour.

3.3.1 alike_next_hour

Creating the alike_next_hour attribute was more challenging than expected. The reader can look at chapter 6.4.2 in the appendix to see the different versions we tried when creating this attribute. We used the 4th method, marked as do_alike4 in the code. We compared each tweet to all of the tweets that came within 1 hour after it. As seen in the code comments, generating the alike_next_hour attribute takes a long time; we decided that using tweets that trended for an hour was sufficient. We used a correlation minimum of 0.5 for alike_next_hour as well. This gives us a dataset of 85k tweets, which is 4.5 times larger than the dataset that only looked at the correlation minimum of the following tweet.
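A plain-Python sketch of what the alike_next_hour attribute computes follows. The thesis's do_alike4 works on a dataframe and is vectorized; this O(n²) version is only meant to show the logic, and the example tweets in the usage are invented.

```python
from datetime import datetime, timedelta
from difflib import SequenceMatcher

def alike_next_hour(tweets, min_ratio=0.5, window=timedelta(hours=1)):
    """For each (timestamp, text) tweet, count later tweets within `window`
    whose SequenceMatcher ratio to it is at least `min_ratio`.

    tweets must be sorted by timestamp.
    """
    counts = []
    for i, (ts, text) in enumerate(tweets):
        n = 0
        for later_ts, later_text in tweets[i + 1:]:
            if later_ts - ts > window:
                break  # sorted input: everything after is also too late
            if SequenceMatcher(None, text, later_text).ratio() >= min_ratio:
                n += 1
        counts.append(n)
    return counts
```

A tweet quoting the same headline 5 minutes later raises the first tweet's count, while a repeat 2 hours later falls outside the window and does not.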

Figure 20: Cases where a tweet had y-axis many tweets in the following hour with a minimum correlation of 0.5 to it.

Similarly for Trump and his advisors.

Figure 21: Cases where a tweet had y-axis many tweets in the following hour with a minimum correlation of 0.5 to it.

Looking at figures 20 and 21 we can see how big a proportion of the trending tweets were discussing or referencing Fed members and how many were discussing or referencing White House members. The tweets that trend for the next hour are considered relevant. These groupings then played a big role when we trained the neural networks discussed in the following chapters.

3.4 Market impact analysis

In order to determine whether tweets were eventful or not, we need to compare them to actual market data at the time of the tweet. As discussed before, most of the future contracts are actively trading for about three months each. We need to create a dictionary for each future contract recording when it was active.

Algorithm 2: Determining which future contract to use for market impact analysis of a tweet
  -> Dictionary of start and end time for a future contract;
  for contract in list_of_future_contracts do
      -> Collect active start and end time into dictionary;
  end
  for tweet in tweet_dataset do
      -> Put the time index into the dictionary and get which contracts
         were active when the tweet came out;
  end

We have the option to drop tweets that come out on key time spots where economic numbers are usually released.

Algorithm 3: High level view of the market impact analysis
  for tweet in tweet_dataset do
      -> Check if it is in a key economic time spot;
      -> Check if it has the word "says" in it;
      -> Shift the timestamp of the tweet 1 minute earlier to make sure we
         get the market impact in the after part of the contract;
      for product in active_contracts at the time of the tweet do
          -> Create two samples from the future contract:
             --> a fixed time period before the tweet came out;
             --> a fixed time period after the tweet came out;
          for metric in list_of_metrics_to_analyse do
              -> Send both parts into the get_ratio() function;
              if significant change in metric before and after the tweet then
                  -> mark y_train[i][metric] as eventful;
              else
                  -> mark y_train[i][metric] as uneventful;
              end
          end
          if require_initial_impact then
              -> Create two samples from the future contract:
                 --> a fixed time period before the tweet came out;
                 --> a very short time period after the tweet came out;
              for metric in list_of_metrics_to_analyse do
                  -> Send both parts into the get_ratio() function;
                  if significant change in metric before and after the tweet then
                      -> mark y_train[i][metric] as eventful;
                  else
                      -> mark y_train[i][metric] as uneventful;
                  end
              end
          end
      end
  end

We used require_initial_impact because too many non market-impacting tweets came before the actual tweet that caused the change in the market. Those tweets might have been tweeted 10-15 minutes before the market reactions; they would also see changes in the metrics in the before and after samples of the future contract and get marked as eventful. See chapters 6.4.3 and 6.4.5 in the appendix for details on implementation. We ran require_initial_impact after the initial market-impact check because we wanted to see how many metrics the tweet impacted and by how much.
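The core before/after comparison in Algorithm 3 can be sketched as follows. The 2× significance threshold and the metric names are illustrative assumptions; the thesis's actual get_ratio() and per-metric significance criteria may differ.

```python
def get_ratio(before, after):
    """Ratio of a metric after the tweet to the same metric before it.
    Returns None when the metric was zero before (no baseline)."""
    return None if before == 0 else after / before

def eventful_metrics(before_sample, after_sample, threshold=2.0):
    """Mark a metric as eventful when it changed by at least `threshold`
    times in either direction between the before and after samples."""
    flags = {}
    for name, before in before_sample.items():
        r = get_ratio(before, after_sample.get(name, 0))
        flags[name] = r is not None and (r >= threshold or r <= 1 / threshold)
    return flags
```

For instance, a 3.5× jump in volume and a halving of the range would both be marked eventful, while a 1.25× change in sweep quantity would not.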

3.5 TA library

Technical Analysis (TA) trading is often looked down upon by some market participants, but some of the most successful traders in the world talk about how they use technical analysis for trading when interviewed in the Market Wizards book series written by Jack D. Schwager [18]. In the Hedge Fund Market Wizards book [17], many successful hedge fund managers describe how they use fundamental analysis to determine what to buy, then use technical analysis to determine when and at what price to buy and sell the asset. Based on this we integrated the technical analysis library (TA library [11]) into our data, to get more attributes to feed into the neural network. Most of its indicators are daily indicators, but we used our group_sweep() function to group our sweep data into an optional time interval, with closing, high and low prices for that interval, volume, etc. We can then generate more short term intraday technical analysis that could help us determine whether a tweet packed an impact or not. See chapters 6.4.7 and 6.4.6 in the appendix for details on how we integrated the TA library into our project with the help of group_sweep().

Algorithm 4: Group sweep
  -> Group the market data into groups of small time intervals, such as 15 minutes;
  -> Get the low price, high price, closing price and volume for each interval;
  -> Merge the groups together with the new low price, high price, closing price and volume values;
  -> Run TA on the market data using the new values;
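A minimal sketch of the grouping step in Algorithm 4, assuming sweep events arrive as (timestamp, price, quantity) tuples; the real group_sweep() operates on pandas dataframes, and the function name here is our own.

```python
def group_sweeps(sweeps, interval=900):
    """Group sweep events into fixed time buckets and compute
    open/high/low/close prices plus summed volume per bucket.

    sweeps: list of (ts_epoch, price, quantity) sorted by time.
    interval: bucket width in seconds (900 s = 15 minutes).
    """
    bars = {}
    for ts, price, qty in sweeps:
        bucket = ts - ts % interval  # start of the bucket this sweep falls in
        if bucket not in bars:
            bars[bucket] = {"open": price, "high": price,
                            "low": price, "close": price, "volume": 0}
        bar = bars[bucket]
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        bar["close"] = price  # last sweep seen in the bucket
        bar["volume"] += qty
    return bars
```

The resulting per-interval OHLCV bars are what the daily TA indicators are then run on at an intraday resolution.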

3.6 Create datasets

The Fed and Trump Advisor data frames include mentions of the people seen in figure 22.

Figure 22: FOMC members and White house advisors

Algorithm 5: Creating datasets
  -> Load in modified_tweets.h5, where Alike_next_hour has been generated, as well as the Fed, Powell, Advisor and Trump dataframes;
  -> Create a same_headline column where the following tweet has an alike_next_hour value one lower, locating the first instance of the headline;
  if Alike_next_hour >= 3 then
      -> Use the tweet for the alike_next_hour dataset;
  -> Load Trump's personal account;
  -> Determine which future contracts to compare to the tweets. We decided to use the 10yr US treasury future contracts that were trading during 2018 and 2019; it is the most traded US treasury future contract;
  for contract in contracts do
      -> Pre-process the contract as seen in chapter 6.4.4;
      for dataset in datasets do
          -> Run the market impact analysis as seen in chapter 3.4;
          -> Collect the x_train data, which are the tweets, and save them into the x_train.h5 HDF5 file;
          -> Collect the y_train data and save them into y_train.h5;
          -> Collect the y_trainB data and save them into y_trainB.h5;
  (y_trainB is binary, saying whether the tweet impacted any metric or not; y_train says which metrics specifically were impacted)

We deemed that shift_minute and drop_events should be true by default and only used those datasets. The datasets marked with shift_minute had the timestamps of the tweets shifted 1 minute back in time during the market impact analysis. The datasets marked with drop_events do not include tweets that were tweeted during key economic time spots. We have separate datasets that require the word "says" in the tweets. Many tweets start with, for example, "Powell says" or "Trump says". Those datasets should include mostly direct quotes from the person instead of discussions about the person. However, many tweets start with, for example, "Powell:" or "Trump:", with a colon instead of "says". Those tweets are not included in the datasets that require the keyword "says".

3.7 Dataset impact analysis

To answer the first two research questions we used the datasets we had created. We looked at how many of the tweets had any impact at all, and also summed up the number of metrics impacted for each dataset. Since the datasets differ in size, we divide the total number of metrics impacted by the size of the dataset. We used the datasets that include drop_events, which excludes tweets that were tweeted during key economic time spots, and require_initial_impact, which only includes tweets that impacted metrics within a minute after being tweeted.

Algorithm 6: Dataset impact analysis
  for dataset in datasets do
      -> Load the x_train, y_train and y_trainB generated in Creating datasets;
      -> Calculate the percentage of tweets that had any impact on the market;
      -> total_impacts = 0;
      for tweet in dataset do
          -> total_impacts += sum of metrics impacted by the tweet;
      end
      -> total_impact = total_impacts / length_of_dataset;
      -> Return the percentage of tweets that had any impact on the market, and total_impact;
  end
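Algorithm 6's two summary numbers can be computed as in this sketch, where y_train is represented as per-tweet lists of 0/1 metric flags (an assumption about the layout made for the example):

```python
def dataset_impact_summary(y_train):
    """y_train: list of per-tweet metric-impact flag lists (1 = impacted).

    Returns (share of tweets impacting any metric,
             average number of metrics impacted per tweet),
    normalising by dataset size as in Algorithm 6.
    """
    n = len(y_train)
    any_impact = sum(1 for flags in y_train if any(flags))
    total_impacts = sum(sum(flags) for flags in y_train)
    return any_impact / n, total_impacts / n
```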

3.8 Data augment the input data

The original data constituted a very uneven dataset. Most of the tweets do not impact the market, even though they are almost solely about the market, central banks and major political discussions. This is unlike the IMDB dataset,6 which is an even dataset of 25,000 movie reviews labeled by sentiment as positive or negative. Not only is it very hard to get the market sentiment out of the tweets, it is also hard to get the timing of it. The same headline can have very different impacts depending on the time it was tweeted. The market is often waiting on several outcomes at any given time: for example, did trade negotiations between China and the US go well or not? Will the central bank continue with asset purchasing or not? Will OPEC7 reach a production deal or not during their meeting?

We decided to even out the dataset by copying the tweets that packed the most impact. A tweet had to impact a minimum of 7 metrics in order to be used to even out the dataset. We did the same to the y_trainB dataset, performing a partial form of data augmentation. Partial, because we are not altering the tweets by removing random words or randomly shuffling words in the tweets.

To determine how many copies of each impactful tweet were needed, we defined the following variables:

• a = length of the y_train data frame where the number of metrics impacted >= the minimum impact; we used 7.

• b = length of the whole dataset.

6https://keras.io/api/datasets/imdb/
7https://www.opec.org/opec_web/en/

To get an even dataset of impact-full and non impact-full tweets we used the following formula:

0.5 = (x · a) / (b + x · a)  =>  x = b / a

This resulted in approximately doubling the size of all datasets.
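A minimal sketch of this balancing step, assuming the labels are the per-tweet metric counts and using the x = b/a rule above; the function and variable names are illustrative, not the thesis's code:

```python
def balance_dataset(x_train, y_train, min_impact=7):
    """Oversample impact-full tweets (>= min_impact metrics impacted)
    so that roughly half of the augmented dataset is impact-full."""
    positives = [(x, y) for x, y in zip(x_train, y_train) if y >= min_impact]
    a, b = len(positives), len(x_train)
    copies = b // a  # x = b / a copies of each impact-full tweet
    x_aug = list(x_train) + [x for x, _ in positives for _ in range(copies)]
    y_aug = list(y_train) + [y for _, y in positives for _ in range(copies)]
    return x_aug, y_aug
```

With a << b this roughly doubles the dataset, matching the text above.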

We saved the new x_train and y_trainB datasets into x_train_reguralize.h5 and y_trainB_regularize.h5 HDF5 files, in order to keep the datasets separate.

3.9 Prepare the data for the ANN

In order to prepare the data for the ANN we first had to choose which dataset to load, and whether we wanted to use the data augmented version or the original one. Keras gives the option to pay more attention to samples from an underrepresented class. The class_weight dictionary is used for weighting the loss function during training.8 When using an uneven dataset it can also be helpful to initialize the bias of the final sigmoid dense layer in the ANN with an output bias. For this thesis, however, all of the best results came when we did not use the output bias. We calculated the class_weights and output_biases for each of the datasets; see chapter 6.4.10 in the appendix for how they were calculated. We cleaned the text in the tweets with the get_sentences function, which removes punctuation from each word and removes words that are not alphabetical, see chapter 6.4.11 in the appendix.
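The class_weight dictionary and the output bias can be computed with the standard recipe for imbalanced binary data; this is a sketch under that assumption, not the thesis's chapter 6.4.10 code:

```python
import math

def class_weights_and_bias(labels):
    """Return a Keras class_weight dict and an initial output-layer bias
    for an uneven binary dataset (labels are 0 or 1)."""
    pos = sum(labels)
    neg = len(labels) - pos
    total = pos + neg
    # Weight each class inversely to its frequency, scaled so the total
    # loss magnitude stays comparable.
    class_weight = {0: total / (2.0 * neg), 1: total / (2.0 * pos)}
    # Bias that makes the untrained sigmoid output the base rate pos/total.
    output_bias = math.log(pos / neg)
    return class_weight, output_bias
```

The class_weight dict is passed to model.fit(), and the bias would be wrapped in tf.keras.initializers.Constant for the final sigmoid layer.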

8 https://keras.io/api/models/model_training_apis/

We ended up using the Keras Tokenizer instead of a custom built tokenizer or the Huggingface tokenizer,9 because we are using Keras for creating and training the neural networks. In the future, when Keras updates their code base, they should update their Tokenizer as well so that it keeps working with new code updates.

Algorithm 7: Tokenizer
-> Fit Keras tokenizer on the cleaned tweets;
-> vocab_size = len(tokenizer.word_index) + 1;
-> max_length = max( length of words in a tweet );
-> X_train_test = tokenizer.texts_to_sequences( tweets );
-> X_train_test = pad_sequences(X_train_test, max_length);
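Algorithm 7 maps onto the Keras preprocessing API roughly as follows; the helper name is an illustrative assumption:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

def tokenize(cleaned_tweets):
    """Fit the Keras Tokenizer and integer-encode the tweets."""
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(cleaned_tweets)
    vocab_size = len(tokenizer.word_index) + 1        # +1 for padding index 0
    max_length = max(len(t.split()) for t in cleaned_tweets)
    x = tokenizer.texts_to_sequences(cleaned_tweets)  # words -> integers
    x = pad_sequences(x, maxlen=max_length)           # pad to equal length
    return x, tokenizer, vocab_size, max_length
```
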

We used Stanford's GloVe 100d word embedding. We used Brownlee's method [12] to gather the weights into an embedding matrix, which is then used as the weights of the embedding layer of the ANN.

Algorithm 8: Embedding matrix
-> Create an embedding_index dictionary;
for line in Stanford's GloVe 100d file do
    -> Set the first word of the line as the key in the dictionary;
    -> The value is the coefficient array of that word;
-> Create a matrix of size of our vocabulary size and 100;
for word in our vocabulary do
    if word is in the embedding_index dictionary then
        -> Add the coefficient vector to the matrix for that word;
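Algorithm 8 can be sketched with NumPy as below; the function takes the GloVe file's lines and the tokenizer's word_index, and any word missing from GloVe keeps a zero row (names are illustrative):

```python
import numpy as np

def build_embedding_matrix(glove_lines, word_index, dim=100):
    """Map each vocabulary word to its GloVe vector.

    glove_lines: iterable of lines from glove.6B.100d.txt,
    word_index: dict word -> integer index from the tokenizer.
    """
    embedding_index = {}
    for line in glove_lines:
        parts = line.split()
        embedding_index[parts[0]] = np.asarray(parts[1:], dtype="float32")
    # Row 0 is reserved for padding; unknown words stay all-zero.
    matrix = np.zeros((len(word_index) + 1, dim))
    for word, i in word_index.items():
        vec = embedding_index.get(word)
        if vec is not None:
            matrix[i] = vec
    return matrix
```
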

9 https://huggingface.co/transformers/main_classes/tokenizer.html

We split the dataset into training and testing datasets for the ANN. We used 80% for training and 20% for testing. Before splitting we have the option to shuffle the dataset, so that we test on headlines whose sentiment is closer to the training data. The downside is that shuffling confuses the model about when a certain sentiment had a high impact on the market. The upside is that we then also test on tweets from 2018 and early 2019, not only on tweets from late 2019. The metrics we used to train the network on are true positives, false positives, true negatives, false negatives, binary accuracy, precision, recall and AUC (area under the ROC curve). We decided not to use do_fit_batch_size, a bool variable which, if true, makes sure the batch_size is a divisor of the split index used to split the training and testing data, because it often made the batch_size too small. The purpose of that setting is to increase the training speed.
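The 80/20 split with an optional shuffle can be sketched as follows; this is an illustrative helper, not the thesis's exact code:

```python
import numpy as np

def split_dataset(x, y, test_fraction=0.2, shuffle=False, seed=42):
    """Chronological 80/20 split by default; shuffle mixes 2018-2019
    tweets into the test set at the cost of temporal ordering."""
    idx = np.arange(len(x))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    split = int(len(x) * (1 - test_fraction))
    return x[idx[:split]], y[idx[:split]], x[idx[split:]], y[idx[split:]]
```
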

3.10 Define the ANN model

define_model() creates the LSTM model.

Algorithm 9: Define model
-> Takes in a tuple of configurations;
if do_embedding_matrix then
    -> Initialize the weights of the first embedding layer with the embedding_matrix generated from Stanford's embedding file;
if do_bidirectional then
    -> Put a bidirectional layer around the LSTM layers, Bidirectional(LSTM(...));
if do_batch_norm then
    -> Add BatchNormalization layers;
if L2 > 0 then
    -> Regularize the first LSTM layer with L2 weight decay;
    -> Regularize the kernel;
    -> Regularize the recurrence;
    -> Regularize the bias;
if do_output_bias then
    -> Use tf.keras.initializers.Constant( output_bias );
    -> Feed that into the final sigmoid Dense layer as the bias_initializer;
-> Set the optimizer;
-> Set the learning rate;
-> Set the dropout rate;
if plot_summary then
    -> Print the model summary;
    -> Plot the model;
    -> Always set to false when grid searching parameters to prevent this from spamming the terminal;

As shown in figure 23, the sequential model starts with an embedding layer which takes integer encoded tweets as input, followed by an LSTM layer. The tweets are fed through the LSTM cells consecutively. The goal was to use more LSTM layers, but since we are using a very large batch_size, the GPU memory limited us to two LSTM layers. We used a large batch_size in order to increase the probability of having positive instances in each batch. The output dimension of the embedding layer is the number of hidden states of the first LSTM layer. After the LSTM layers, optional batch normalization layers and dropout layers, we add two dense layers before the final sigmoid dense layer. For the final dense layer we optionally have do_categorical; if set, we use 2 units and make the Y data to_categorical, instead of having a single column with 0 or 1 in it. We compiled the model with the binary_crossentropy loss function because it is for binary classification. The model is set up like this because it is taking in a sequence of strings and making a binary classification prediction.
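A minimal sketch of this architecture in Keras, assuming the GloVe 100d embedding size; the layer widths and default hyper parameters are illustrative, not the thesis's configuration tuple:

```python
from tensorflow import keras
from tensorflow.keras import layers

def define_model(vocab_size, max_length, embedding_matrix=None,
                 units=100, dropout=0.2, lr=1e-3):
    """Embedding -> two bidirectional LSTM layers -> dense layers ->
    final sigmoid, compiled with binary cross-entropy."""
    emb_init = ("uniform" if embedding_matrix is None
                else keras.initializers.Constant(embedding_matrix))
    model = keras.Sequential([
        layers.Input(shape=(max_length,)),
        # Output dim 100 matches GloVe 100d and the first LSTM's units.
        layers.Embedding(vocab_size, 100, embeddings_initializer=emb_init),
        layers.Bidirectional(layers.LSTM(units, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(units)),
        layers.Dropout(dropout),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC(name="auc")])
    return model
```
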

Figure 23: The Keras model when we use bidirectional LSTM layers and batch normalization

Chapter 6.4.14 in the appendix shows how the model is defined and compiled.

3.11 Training the ANN model

When training the model we feed the following settings into a fit_model() function: x_train, y_train, x_test, y_test, batch_size, epochs, verbose, callbacks and class_weight. When iterating through multiple settings we only used the early stopping callback, otherwise the TensorBoard and checkpoint logs grew out of control. We decided to use another method for storing settings and their results when iterating model settings on each dataset. Instead of monitoring the loss or accuracy for early stopping we monitored val_auc, the area under the ROC curve, since it is often a good indicator of whether a model is good at classifying. When grid searching settings while fitting the networks (see chapter 6.4.16), we stored the results in a fit_results array. Each fit_result stores the model configuration tuple, how long it ran before being early stopped (if at all), the predict sample results and the model history. We stored the history, which is the dictionary of each value for each epoch during training of the ANN, and the results from the predict sample function. We plotted the history when a model ran through every epoch without getting stopped. When grid searching settings, we looped through all the bool variables, true and false. We looped through all the optimizers, a range of learning rates, a range of dropout rates and different monitoring targets: val_auc, val_recall and val_precision. To save a lot of time we often just used val_auc, since trying all three multiplies the size of the running schedule by 3. Instead of evaluating the whole network after each setting, which is very time consuming, we called a custom function named predict_model.
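Monitoring val_auc for early stopping looks roughly like this in Keras; the patience value and the commented-out fit() arguments are illustrative assumptions:

```python
from tensorflow import keras

# AUC should be maximised, so mode="max"; the default "auto" can guess
# wrongly for custom metric names.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_auc", mode="max",
    patience=10, restore_best_weights=True)

# model.fit(x_train, y_train,
#           validation_data=(x_test, y_test),
#           batch_size=4096,        # large batches to catch positives
#           epochs=100, verbose=2,
#           callbacks=[early_stop],
#           class_weight=class_weight)
```
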

Algorithm 10: Predict model
-> Takes in the ANN model and the tokenizer;
-> Result dictionary sorted by types of predictions;
for i in range (1,99) do
    -> Use percentile i of the length of the dataset in order to take evenly distributed test samples;
    -> Pre-process the tweet for the model;
    -> model.predict( pre-processed tweet );
    -> Collect the results under random in the result dictionary;
for i in range predict_positive_values do
    -> Find a tweet that did impact the market;
    -> Pre-process the tweet for the model;
    -> model.predict( pre-processed tweet );
    -> Collect the results under positive in the result dictionary;
-> Find the same number of tweets that did not impact the market;
for i in range predict_positive_values do
    -> Find a tweet that did not impact the market;
    -> Pre-process the tweet for the model;
    -> model.predict( pre-processed tweet );
    -> Collect the results under negative in the result dictionary;
-> Manually create a list of tweets that should impact the market;
for fake tweet in fake_positive_tweets do
    -> Pre-process the tweet for the model;
    -> model.predict( pre-processed tweet );
    -> Collect the results under fake_positive in the result dictionary;
-> Manually create a list of tweets that should not impact the market;
for fake tweet in fake_negative_tweets do
    -> Pre-process the tweet for the model;
    -> model.predict( pre-processed tweet );
    -> Collect the results under fake_negative in the result dictionary;
-> Calculate the total accuracy into the result dictionary under total;
-> Print out the overview and return the dictionary;

predict_model uses the network to predict random tweets from the dataset, tweets that are known to have impacted the market, tweets that are known not to have impacted the market and custom made tweets. It returns how accurately the network predicted on each test set. A model might have a good val_auc but still perform badly on the samples in the predict sample function.
See chapter 6.4.17 in the appendix on how Predict sample was implemented.

3.12 Optimizers for the ANNs

There are multiple different optimizers, and it is often considered difficult to know which optimizer works best for each model. The Deep Learning book [25]10 notes that nearly every deep learning algorithm relies on sampling based estimates: in practice we have only noisy or even biased estimates of the gradients, while most optimization algorithms are designed with the assumption that they have access to the exact gradient or Hessian matrix [25]. Much of the run time of training is due to the length of the trajectory needed to arrive at the solution, for example the time spent tracing out a wide arc around a mountain shaped structure. Since it is only possible to train one model at a time, meaning one optimizer with fixed settings and hyper parameters, we iterated through the optimizers by storing them in an array as objects. We tested six optimizers: SGD, Adagrad, RMSprop, Adam, Adamax and Nadam. Stochastic Gradient Descent (SGD) is probably the most used optimization algorithm for machine learning. It follows the gradient of randomly selected minibatches downhill. The crucial parameter for the SGD algorithm is the learning rate. We had little success using the SGD optimizer. We did not use a learning rate schedule, which modifies the learning rate while training; we tested different fixed learning rate parameters.

10 chapter 8

SGD can sometimes be slow; we did not tune the momentum nor the velocity because of limited resources. The more hyper parameters we grid search, the more dimensions of settings there are to train. We only had one GPU at our disposal. Adagrad adapts the learning rates of all model parameters by scaling them inversely proportional to the square root of the sum of all of the historical squared values of the gradient. The parameters with the largest partial derivative of the loss have a correspondingly rapid decrease in their learning rate, while the parameters with small partial derivatives have a relatively small decrease in their learning rate. The net effect is greater progress in the more gently sloped directions of parameter space. It is designed to converge rapidly when applied to a convex function [25]. We did not get good classifying results from the Adagrad optimizer either. Adam uses adaptive momentum. It is a combination of RMSProp and momentum. Adam estimates the first-order momentum of the gradient [25]. Adamax is a variant of Adam based on the infinity norm. It can be superior to Adam when using word embeddings [15], as we do in this thesis, but they performed similarly. Nadam is a variant of Adam with Nesterov momentum [16]. RMSProp modifies AdaGrad to perform better in the nonconvex setting by changing the gradient accumulation into an exponentially weighted moving average (EMA). It uses an exponentially decaying average to discard history from the extreme past so that it can converge rapidly after finding a convex bowl. We got multiple good classification results when using the RMSProp optimizer, as we did with the Adam optimizer and its variants. Unlike Adam, the RMSProp second-order moment estimate may have high bias early in training [25].
RMSProp, Adam, Adamax and Nadam all had multiple model settings that gave good classification scores: high F1 score, AUC, MCC, Youden and C_Kappa values from the validation sets.

For all of the optimizers we tested the following learning rates:

[1e-1, 1e-2, 1e-3, 1e-4, 1e-5]

3.13 Regularization for the ANNs

We did the same when using regularization as we did with the optimizers, since regularizers also have hyper parameters we can tune for our network. According to Goodfellow [25],11 "A central problem in machine learning is how to make an algorithm that will perform well not just on the training data, but also on new inputs." Regularization techniques can help prevent overfitting. We mostly used dropout, L2 weight decay, data augmentation and batch normalization for this purpose. L2 weight decay drives the weights closer to the origin by adding a regularization term to the objective function. The primary purpose of batch normalization is to improve optimization. We tested the dropout rates [0.1, 0.2, 0.5, 0.8] and the L2 weight decays [0, 1e-3, 1e-5].
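Applying L2 weight decay to the kernel, the recurrent weights and the bias of the first LSTM layer (as in Algorithm 9) can be sketched as:

```python
from tensorflow.keras import layers, regularizers

def regularized_lstm(units=100, l2=1e-3):
    """First LSTM layer with L2 weight decay on all three weight sets."""
    return layers.LSTM(
        units,
        return_sequences=True,                       # feeds the next LSTM layer
        kernel_regularizer=regularizers.l2(l2),      # input weights
        recurrent_regularizer=regularizers.l2(l2),   # recurrent weights
        bias_regularizer=regularizers.l2(l2))        # bias
```
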

11 chapter 7

Algorithm 11: Grid searching NN model settings
for optimizer in optimizers do
    for learning_rate in learning_rate_schedule do
        for dropout in dropout_schedule do
            for L2 in L2_schedule do
                for the other binary settings do
                    -> Define the model;
                    -> Fit the model;
                    -> Plot the model history when early stopping was not triggered;
                    -> Predict sample;
                    -> Collect the results;
                end
            end
        end
    end
end

Algorithm 12: The other binary settings in the grid search
for do_embedding_matrix in [True, False] do
    for do_output_bias in [True, False] do
        for do_batch_normalization in [True, False] do
            for do_bidirectional_layers in [True, False] do
                ...
            end
        end
    end
end

See chapter 6.4.16 in the appendix for how this was implemented.
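The nested loops of Algorithms 11 and 12 can be flattened with itertools.product; a sketch using the setting lists given in the text:

```python
import itertools

optimizers = ["sgd", "adagrad", "rmsprop", "adam", "adamax", "nadam"]
learning_rates = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]
dropouts = [0.1, 0.2, 0.5, 0.8]
l2_values = [0, 1e-3, 1e-5]
flags = [True, False]

# One tuple per model configuration to define, fit and evaluate.
grid = list(itertools.product(
    optimizers, learning_rates, dropouts, l2_values,
    flags,   # do_embedding_matrix
    flags,   # do_output_bias
    flags,   # do_batch_normalization
    flags))  # do_bidirectional_layers

# for config in grid:
#     model = define_model(config)   # section 3.10
#     ... fit the model, predict sample, collect the results
```
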

3.14 Training schedule

We saved the fit_results with pickle as binary files, since each value of fit_results consists of a tuple, an int, a small dictionary and a large dictionary, stored as a .data binary file.

Algorithm 13: Get best classification results and model settings
-> Load the fit_results binary file;
-> Initialize a Pandas DataFrame (df);
for each model setting do
    -> Get the final validation values;
    -> Create a no_early_stopping variable for those models that trained through all the epochs without being early stopped;
    -> Collect the values into the df;
    -> Get the NN model settings;
    -> Calculate the classification metrics;
-> Sort the df by the best classification results;
-> Get the NN model settings which gave those results;
for dataset in training_schedule do
    -> Load the weights from previous training;
    -> Use the NN model settings which gave the best classification results;
    -> Train the NN model on the new dataset;
    -> Save the NN model weights for the next dataset;

We ran the following dataset schedule.

Algorithm 14: Training schedule
-> Alike next hour first shift initial impact drop Events Powell;
-> Alike next hour first shift initial impact drop Events Trump;
-> Alike next hour first shift initial impact drop Events;
-> Trump shift initial impact drop Events;
-> Powell shift initial impact drop Events;
-> Fed shift initial impact drop Events;
-> Adviser shift initial impact drop Events;
-> realDonaldTrump shift initial Impact drop Events;
-> Tweets shift initial Impact drop Events says;
-> Tweets shift initial Impact drop Events;

3.15 Evaluation metrics

The dataset is very uneven, with far more tweets not impacting the market than impacting it. We are using a wide variety of subsets of our datasets when training, in which between 1.3% and 11.7% of tweets impact the market. When monitoring the training process of the ANNs we most often monitored the progress of the area under the ROC curve (AUC) when it came to early stopping.

Powers [10] notes that in a medical context, for example, it is usual to come up with potentially useful medications or tests, and then explore their effectiveness across a wide range of complaints. In this case markedness may be appropriate for the comparison of performance across different conditions.

• True Positive (TP), when we correctly predict a positive value.

• True Negative (TN), when we correctly predict a negative value.

• False Negative (FN), when we predict a negative value when the value is actually positive (Type II error)

• False Positive (FP), when we predict a positive value when the value is actually negative (Type I error)

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F1 score = 2 · (Precision · Recall) / (Precision + Recall)

True positive rate (TPR) = Recall = TP / (TP + FN)
False positive rate (FPR) = FP / (FP + TN)

The ROC curve is a useful tool to see how good a classifier is at classifying the dataset [9]. The Receiver Operating Characteristic (ROC) graph plots the true positive rate (TPR) against the false positive rate (FPR).

• y_axis - tpr [0,1].

• x_axis - fpr [0,1].

We calculated both val_F_score from val_precision and val_recall and F_score from precision and recall. We used val_F_score for our final results when determining how good a given ANN is at classifying tweets as being market impact-full or not. The highest possible F_score is 1.0 when both precision and recall are 1. A downside of the F_score is that it does not take into account true negatives.

The Matthews correlation coefficient (MCC) is named after the biochemist Brian W. Matthews. The coefficient returns values on the interval [-1,1]. MCC takes true negatives into account.

• +1 represents a perfect prediction.

• 0 equals random predictions.

• -1 represents total disagreement between prediction and observation.

MCC = (TP · TN − FP · FN) / sqrt( (TP + FP) · (TP + FN) · (TN + FP) · (TN + FN) )

Chicco and Jurman [4] discussed how the F1 score and accuracy can be misleading while MCC is not. We noticed when training our ANN that we often got an F1 score as high as 0.7 but at the same time an MCC score around 0. W. J. Youden proposed a new measurement for performance in diagnostic tests [5].

J = TP / (TP + FN) + TN / (TN + FP) − 1

The results are on the interval [0,1]. If the value is 1 there are no false positives and no false negatives and the test is perfect. If the value is 0 the test is useless. If the value is 0.5, we are getting the same proportion of positive results from a group of impact-full tweets and a group of non impact-full tweets. Cohen's kappa is another indicator for measuring the statistics from a classifier.

κ = (p0 − pe) / (1 − pe)

where

• p0 = (TP + TN) / (TP + FP + FN + TN)

• pe = pYes + pNo

– pYes = ((TP + FP) / (TP + FP + FN + TN)) · ((TP + FN) / (TP + FP + FN + TN))
– pNo = ((FN + TN) / (TP + FP + FN + TN)) · ((FP + TN) / (TP + FP + FN + TN))
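These formulas translate directly into code; a plain sketch (the thesis's own implementation is in chapter 6.4.8):

```python
import math

def classification_scores(tp, fp, tn, fn):
    """MCC, Youden's J and Cohen's kappa from confusion-matrix counts."""
    total = tp + fp + tn + fn
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    youden = tp / (tp + fn) + tn / (tn + fp) - 1
    p0 = (tp + tn) / total                       # observed agreement
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    pe = p_yes + p_no                            # chance agreement
    kappa = (p0 - pe) / (1 - pe)
    return mcc, youden, kappa
```

For a perfect classifier all three scores equal 1; random predictions put MCC and kappa near 0.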

See chapter 6.4.8 in the appendix for how we implemented these classification metrics into our model. In order to monitor and evaluate the classification metrics we made the ANNs track the following metrics while training.

METRICS = [
    keras.metrics.TruePositives(name='tp'),
    keras.metrics.FalsePositives(name='fp'),
    keras.metrics.TrueNegatives(name='tn'),
    keras.metrics.FalseNegatives(name='fn'),
    keras.metrics.BinaryAccuracy(name='accuracy'),
    keras.metrics.Precision(name='precision'),
    keras.metrics.Recall(name='recall'),
    keras.metrics.AUC(name='auc'),
]

We used the validation data when calculating the evaluation metrics.

4 Results

4.1 Dataset impact analysis results

In this thesis we measured how many market metrics each tweet impacted. A single tweet can impact a single metric, or several at the same time; the more metrics a tweet impacted, the more impact it generally had on the market. We created multiple datasets where some datasets were sparser, meaning a lower percentage of tweets had any impact on the market. Other datasets, for example those that included trending tweets, had a higher percentage of tweets impacting the market. For the dataset impact analysis we looked at the percentage of tweets that impacted any metric, then we looked at how many metrics were impacted in total by all of the tweets in the dataset. The datasets vary in size, so in order to determine the total impact of a dataset we summed up the total metrics impacted and divided it by the number of tweets in the dataset.

Σ Metrics impacted / Σ Tweets in dataset

The datasets that were analysed are listed below and summarized in figure 24:

1. The whole tweet dataset had more impact than expected. 1.3% of tweets impacted the market, where the sum of metrics impacted divided by the number of tweets was 0.4.

2. For Trump's personal account only 2.6% of his tweets had any impact on the market, where the sum of metrics impacted divided by the number of tweets was 0.68, which is high but not surprising after watching him wreak havoc on the market for 4 years.

3. 1.3% of the dataset that had the keywords "Trump" and "says" impacted the market, where the sum of metrics impacted divided by the number of tweets was 0.5.

4. The dataset that includes the keyword "Trump" but does not require the tweet to include "says" had more impact, where 8.5% of the tweets impacted the market and the number of metrics divided by the number of tweets was 2.95.

5. The dataset for Fed chair Powell had 8.7% of tweets impacting the market, where the sum of metrics divided by the number of tweets was 2.82.

6. The dataset for other Fed members had 1.49% of tweets impacting the market, and the sum of metrics divided by the size of the dataset was 0.61. This is not surprising because Fed members talk very frequently and they usually read a speech similar to their previous ones. It is not often that they say something that moves the market, but when they do it is usually impact-full.

7. The dataset for the White House advisors and Trump had 8.4% of the tweets impacting the market, where the sum of metrics divided by the number of tweets was 3.03. This is not surprising because the White House advisors, like Trump, seemed to be trying to move the markets with their comments.

8. The whole Alike_next_hour dataset, which had trending headlines, had 11.7% of tweets impacting the market, where the sum of metrics divided by the number of tweets was 3.97.

9. 8.1% of trending headlines mentioning Trump had an impact on the market, where the sum of metrics divided by the number of tweets was 8.60.

10. For Powell trending headlines, 9.6% had impact, where the sum of metrics divided by the number of tweets was 7.19.

Figure 24: Dataset impact analysis results

The first two research questions:

• Do trending Trump tweets have more impact on the market compared to those that do not trend?

• Do trending tweets about Fed members mentioning monetary policy have more impact on the market compared to those that do not trend?

can both be answered with a yes. The trending tweets mentioning Trump and Fed members had more impact on the market than the ones which did not trend.

4.2 Evaluation metrics results

The best results for each evaluation metric had similar ANN settings, except for recall, as shown in figure 25. We can exclude the recall result since a recall value of 1 means there were no false negatives, which tells us the ANN most likely over-fitted, only predicting positive values. Looking at the other metrics:

• They all had the same learning rate and dropout rate.

• They did not use any L2 weight decay on the first LSTM layer.

• They used the embedding matrix generated from the Stanford’s GloVe word embedding when initializing the weights of the first embedding layer in the ANN model.

• They did not use the output bias in the final sigmoid dense layer of the ANN model.

• They had bidirectional layers around the LSTM layers making the LSTM cells take in inputs from the previous cell and the following cell.

• The only difference between them is that precision got its best result using the Nadam optimizer and AUC using the Adamax optimizer, while the others got theirs using the Adam optimizer.

Figure 25: Best results by evaluation metric

We plotted, for each optimizer, the results of the key quality measurements of each setting. Notice that both the Adagrad and SGD optimizers never managed to get good enough results from these grid searching parameters. The plot only includes calculations from the validation metrics. The AUC stayed around 0.5 and the F1 score topped below 0.7 on both of them, with the MCC, Youden and C_kappa remaining flatlined near 0, as shown in figure 26. Adam, Nadam, Adamax and RMSprop, however, have multiple promising settings where all of the indicators show good promise. They had similar trends in their graphs, which is not surprising since those four optimizers are similar and they all got their best results using similar ANN settings. The SGD optimizer might need a learning rate schedule with a dropping learning rate to give better results.

Figure 26: Results by optimizer plotted against ANNs

The last two research questions:

• Can an Artificial Neural Network (ANN) learn to classify tweets as being market impact-full or not?

• Null hypothesis: Tweets can not be shown to impact the market.

Given our high classification score results we can conclude that an artificial neural network can learn to classify tweets as being market impact-full or not. With further research, more GPU resources and time we should be able to train them to be even more accurate at classifying tweets. In the appendix, the reader can view the best results for each optimizer under chapter 6.2 and the best results for each evaluation metric under chapter 6.3.

5 Conclusion

In this thesis we described how to measure the impact tweets have on the market, and how to train ANNs to classify tweets as being market impact-full or not. We developed a method for finding trending tweets, a method for comparing tweets to market data, a method for finding ANN settings that work on different datasets, a method for viewing market impact and a method for creating a training schedule for multiple datasets, beginning with the ones that impacted the market the most. We found that headlines that started trending impacted the market the most and helped in training the ANNs. We grid searched ANN configurations in order to find settings that gave good classification results. It was better to train the largest non-specific datasets using weights from models that had been trained on more specific datasets that had more impact on the market.

5.1 Future work

In the original random forest paper, Breiman [19]12 talks about how datasets with many weak inputs are becoming more common, i.e. in medical diagnosis, document retrieval, etc. Their common characteristic is that no single input or small group of inputs can distinguish between the classes. This type of data is difficult for the usual classifiers, neural nets and trees. This is something we are running into with our data: multiple different datasets of sweep event analysis and technical analysis, generated with the TA library. What we are most interested in with regards to random forests is the ranking forest idea [13]. Gérard and Erwan discussed it in their random forest overview [14]. Could we extract which input variables are the most important ones? Then we could use data augmentation on those attributes, put heavier weights on them, or train a neural network where we only input those attributes. It could be, for example, that only large continuous sweeps following tweets can be regarded as impact-full.

12 chapter 9

There has been a significant increase in the money supply during 2020 because the central banks have been printing money and buying assets.13 This tends to raise stock prices, but what about US treasury future prices? Does it have a different impact on the shorter end of the curve than on the longer end? The hard part with using the money supply in this thesis is that this thesis looks at short term market reactions. The money supply only affects the market in the long run, except when new Quantitative Easing (QE) or tapering headlines come out. We can create quantitative easing datasets that include mentions of QE or QE related lingo. We can create a dataset for when anyone mentions QE. We can create a second dataset for when a tweet includes a Fed member and also some of the QE lingo. The third dataset could be tweets including Powell and QE lingo. Quantitative easing is a touchy subject; the Fed often likes to phrase it differently. For example, the November 2020 FOMC statement says "In addition, over coming months the Federal Reserve will increase its holdings of Treasury securities and agency mortgage-backed securities at least at the current pace to sustain smooth market functioning and help foster accommodative financial conditions, thereby supporting the flow of credit to households and businesses."14 Traders also have their own lingo when it comes to Fed asset purchases. In this thesis we predicted short term impacts. It would also be helpful to be able to predict longer term effects of headlines. A change in the course of rates, for example, affects the market for the following years. The problem is determining which tweet of thousands of tweets was the one that caused the change in the trend. We could look at the tweets that had the most initial

13 https://fred.stlouisfed.org/series/M2
14 https://www.federalreserve.gov/newsevents/pressreleases/monetary20201105a.htm

impact on the market. Transformers are a recent innovation from Google, introduced in Attention Is All You Need [2]. The first Transformer was an encoder decoder model with a multi head attention mechanism. Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing the modeling of dependencies without regard to their distance in the input or output sequences. The Transformer model architecture eschews recurrence and instead relies entirely on the attention mechanism to draw global dependencies between input and output. Recurrent Neural Networks (RNNs) have been shown to have explicit memory. Transformers have been shown to have even better working memory than LSTMs. After extensive attempts we were not able to fit our data into a transformer ANN. We think transformers would greatly improve the ANN's ability to extract the market sentiment. One issue was dividing our tweets into the queries, keys and values needed to feed into the multi head attention layer. The multi head attention layer concatenates attention layers of queries, keys and values. BERT has been a popular transformer for NLP projects; after it was published by Google in 2019, it advanced the state of the art on eleven NLP tasks [3]. BERT is a method of pre-training language representations. It trains a general purpose language understanding model on a large text corpus. That model is then used for downstream NLP tasks like question answering. BERT outperformed previous methods because it was the first unsupervised, deeply bidirectional system for pre-training NLP. Unsupervised means that BERT was trained using only a plain text corpus. BERT could be a good choice for our model since it has pre-trained language representations. We also tried using Huggingface's BERT framework, without success so far.15

15https://huggingface.co/transformers/model_doc/bert.html
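The scaled dot-product attention at the core of the multi-head mechanism discussed above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the thesis implementation: the function name, the toy matrices and their shapes are assumptions made for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in Vaswani et al. [2]
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # query/key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # weighted sum of values

# Toy example: 3 tokens, model dimension 4 (random data)
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

A multi-head layer runs several such attentions in parallel on learned projections of Q, K and V and concatenates the results, which is the step we struggled to map our tweet data onto.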

Acknowledgements

I want to thank my wife for making this possible; she really ought to receive a diploma of some sort for this. I want to thank my parents for all the help babysitting my daughters, and their son. I want to thank Kristján Þór Jónsson for helping me collect the tweets. I want to thank my instructor for the multiple thorough reviews of the thesis, as well as the committee. I want to thank my daughters for being perfect.

References

[1] Felix A. Gers, Nicol N. Schraudolph, Jürgen Schmidhuber (2002), Learning Precise Timing with LSTM Recurrent Networks, Journal of Machine Learning Research, JMLR

[2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin (2017), Attention Is All You Need, Google Inc, arXiv

[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (2019), BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Google Inc, arXiv

[4] Davide Chicco, Giuseppe Jurman (2020), The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation, BMC Genomics, Springer

[5] W. J. Youden (1950), Index for rating diagnostic tests, Cancer volume 3, Wiley Online Library

[6] Saikiran Gogineni, Anjusha Pimpalshende (2020), Predicting IMDB Movie Rating Using Deep Learning, 2020 5th International Conference on Communication and Electronics Systems (ICCES), IEEE Xplore

[7] Gregory Carey (1998), Multivariate Analysis of Variance (MANOVA): I. Theory, Technical report, University of Colorado

[8] Russell Warne (2014), A Primer on Multivariate Analysis of Variance (MANOVA) for Behavioral Scientists, Practical Assessment, Research & Evaluation, Citeseer

[9] Tom Fawcett (2005), An introduction to ROC analysis, Pattern recogni- tion letters, Elsevier

[10] David M. W. Powers (2007), Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness & Correlation, Machine Learning, Cornell University

[11] Dario Lopez Padial, Technical Analysis Library in Python, https://technical-analysis-library-in-python.readthedocs.io/en/latest/ta

[12] Jason Brownlee (2017), Deep Learning for Natural Language Processing, machinelearningmastery.com, Machine Learning Mastery Pty. Ltd

[13] Stéphan Clémençon, Marine Depecker (2013), Ranking Forests, Journal of Machine Learning Research 14, JMLR

[14] Gérard Biau, Erwan Scornet (2016), A random forest guided tour, TEST 25, Springer

[15] Diederik P. Kingma, Jimmy Ba (2014), Adam: A Method for Stochastic Optimization, International Conference for Learning Representations, Cornell University

[16] Timothy Dozat (2016), Incorporating Nesterov Momentum into Adam, ICLR 2016 workshop paper, Stanford University

[17] Jack D. Schwager (2012), Hedge Fund Market Wizards: How Winning Traders Win, Gliden Media

[18] Jack D. Schwager (1989), Market Wizards: Interviews with Top Traders, Gliden Media

[19] Leo Breiman (2001), Random Forests, Machine Learning, Springer

[20] Cícero Nogueira dos Santos, Maíra Gatti (2014), Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts, Proceedings of COLING, Aclweb

[21] Eric Gilbert and Karrie Karahalios (2010), Widespread Worry and the Stock Market, International AAAI Conference on Web, PKP Publishing Services Network

[22] Fazeel Abid, Muhammad Alam, Muhammad Yasir, Chen Li (2019), Sentiment analysis through recurrent variants latterly on convolutional neural network of Twitter, Future Generation Computer Systems, Elsevier

[23] Federal Open Market Committee, https://www.federalreserve.gov/monetarypolicy/fomc.htm

[24] Angelo Aspris, Sean Foley, Peter O'Neill, Drew Harris (2014), Time and Pro-rata Matching: Evidence of a change in LIFFE STIR Futures, Journal of Futures Markets, Wiley Online Library

[25] Ian Goodfellow, Yoshua Bengio, Aaron Courville (2015), Deep Learning, The MIT Press

[26] Johan Bollen, Huina Mao, Xiaojun Zeng (2011), Twitter mood predicts the stock market, Journal of Computational Science, Elsevier

[27] Michael Fleming, Asani Sarkar (1999), Liquidity in U.S. Treasury spot and futures markets, Federal Reserve Bank of New York, page 4

[28] T. Rao, S. Srivastava (2014), Twitter Sentiment Analysis: How to Hedge Your Bets in the Stock Markets, State of the Art Applications of Social Network Analysis, Lecture Notes in Social Networks, Springer, Cham

[29] Svetlana Kiritchenko, Xiaodan Zhu, Saif M. Mohammad (2014), Sentiment Analysis of Short Informal Texts, JAIR, AI Access Foundation

6 Appendix

6.1 Highest retweets

• Retweets: 947 | Tweeter 1 | rt @staedtler: i am the very terror of a top attorney general with beetle brows and booming vowels both baritone and tenoral, i hold command of any court from magistrate’s to coroners, but practise mostly in the art of bellowing at foreigners https://t.co/eajjstk4i4

• Retweets: 51891 | Tweeter 2 | physicist stephen hawking has died at the age of 76, a spokesman for his family has said https://t.co/zfw9msxcbl

• Retweets: 42524 | Tweeter 3 | rt @rterdogan: türk silahlı kuvvetleri’miz suriye milli ordusu’yla birlikte suriye’nin kuzeyinde pkk/ypg ve deaş terör örgütlerine karşı barışpınarıharekatı’nı başlatmıştır. amacımız güney sınırımızda oluşturulmaya çalışılan terör koridorunu yok etmek ve bölgeye barış ve huzuru getirmektir.

• Retweets: 14428 | Tweeter 4 | rt @realdonaldtrump: very important that opec increase the flow of oil. world markets are fragile, price of oil getting too high. thank you!

• Retweets: 59079 | Tweeter 5 | rt @realdonaldtrump: happy new year to everyone, including the haters and the fake news media! 2019 will be a fantastic year for those not suffering from trump derangement syndrome. just calm down and enjoy the ride, great things are happening for our country!

• Retweets: 27949 | Tweeter 6 | rt @nbcnews: breaking: u.s. official in charge of protecting american elections from hacking tells nbc news that russians successfully penetrated voter registration rolls of several u.s. states prior to 2016 presidential election. https://t.co/qxnaa0zdbd more tonight on @nbcnightlynews.

• Retweets: 161718 | Tweeter 7 | rt @chaeronaea: please enjoy this video i found on reddit of a dog trying to steal another smaller dog https://t.co/tm82uk9xle

• Retweets: 243215 | Tweeter 8 | rt @gourdnibler: so dramatic! dude from the weather channel bracing for his life, as 2 dudes just stroll past. hurricaneflorence https://t.co/8frym4nlbl

• Retweets: 53172 | Tweeter 9 | rt @acosta: i walked out of the end of that briefing because i am totally saddened by what just happened. sarah sanders was repeatedly given a chance to say the press is not the enemy and she wouldn’t do it. shameful.

• Retweets: 201085 | Tweeter 10 | rt @vancityreynolds: these assholes told me it was a sweater party. @realhughjackman jakegyllenhaal https://t.co/qgla2a2o0z

• Retweets: 185568 | Tweeter 11 | rt @thedad: an 88-year-old dad is reunited with his 53-year-old down syndrome son after spending a week apart for the first time ever. https://t.co/5hvl0fkgks

• Retweets: 106702 | Tweeter 12 | rt @realdonaldtrump: to iranian president rouhani: never, ever threaten the united states again or you will suffer consequences the likes of which few throughout history have ever suffered before. we are no longer a country that will stand for your demented words of violence amp; death. be cautious!

• Retweets: 81769 | Tweeter 13 | rt @michaelbloch15: my client walked out of rikers island today after a jury acquitted him of all charges. he waited in jail for his trial for 3 years. he lost his job and missed his son’s first 3 birthdays simply because he couldn’t afford bail. he’s not unique - this is our system. reform

• Retweets: 224287 | Tweeter 14 | rt @bbcworld: the handshake that made history. https://t.co/jb09ce9mht

• Retweets: 181040 | Tweeter 15 | rt @chelseaclinton: good morning mr. president. it would never have occurred to my mother or my father to ask me. were you giving our country away? hoping not. https://t.co/4odjwzup0c

• Retweets: 1523 | Tweeter 16 | the united states national debt surpassed 22,000,000,000,000 dollars today.

• Retweets: 25101 | Tweeter 17 | rt @govmikehuckabee: i lived through pentagon papers and watergate, iran contra and whitewater and more but 1st time in my life seeing press trying to keep lid on govt actions rather than fully expose. they hate @realdonaldtrump that much. more than they love truth. sad. journalism is dead.

• Retweets: 2328 | Tweeter 18 | rt @joe_co_uk: david cameron: peerless in number 10... ge2015 https://t.co/ea5l4ddqtt

• Retweets: 228 | Tweeter 19 | the pound is on track for its biggest fall in 2 years in a big day for theresa may and brexit https://t.co/epvjxb3gij https://t.co/s2j8kzpaoe

• Retweets: 515 | Tweeter 20 | fed announces cease and desist order against u.s. operations of deutsche bank ag; fines firm 41 million dollars: https://t.co/rxvh55qq1w

• Retweets: 179085 | Tweeter 21 | rt @ziyatong: we could all learn a lesson from this baby bear: look up amp; don’t give up. https://t.co/nm0mcsyeqy

• Retweets: 50488 | Tweeter 22 | rt @sokane1: trump just called apple ceo tim cook “tim apple” https://t.co/gthhtjwvc9

• Retweets: 54317 | Tweeter 23 | rt @natt0: nbafinals https://t.co/a1upknezrq

• Retweets: 156238 | Tweeter 24 | rt @incredibleculk: i feel bad about all the burglars who never had the confidence to rob a house on christmas thanks to me.

• Retweets: 71200 | Tweeter 25 | rt @realdonaldtrump: now that the three players are out of china and saved from years in jail, lavar ball, the father of liangelo, is unaccepting of what i did for his son and that shoplifting is no big deal. i should have left them in jail!

• Retweets: 124031 | Tweeter 26 | rt @jebbush: sorry mom

• Retweets: 168824 | Tweeter 27 | rt @tiffanyrg9: this is quality content https://t.co/6mgpsqlrss

• Retweets: 257313 | realDonaldTrump | why would kim jong-un insult me by calling me "old," when i would never call him "short and fat?" oh well, i try so hard to be his friend - and maybe someday that will happen!

• Retweets: 106697 | Tweeter 29 | rt @realdonaldtrump: to iranian president rouhani: never, ever threaten the united states again or you will suffer consequences the likes of which few throughout history have ever suffered before. we are no longer a country that will stand for your demented words of violence amp; death. be cautious!

• Retweets: 180713 | Tweeter 30 | rt @miketyson: stop sending me this shit https://t.co/jfj8uchqvt

Highest number of retweets by user

6.2 Best results per optimizer

Tables 7, 8, 9 and 10 show, for each of the best performing optimizers, the ANN settings that gave a classification score above 80% on all of the metrics, with most scoring above 90%. All of them showed their best results when using a learning rate of 0.001 and when they did not use the output bias.
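For reference, the update rule behind the best-performing optimizer family can be sketched in a few lines; the hyper-parameters are Adam's defaults (lr = 0.001, as in the tables, see Kingma & Ba [15]), while the quadratic toy objective and step count are illustrative assumptions.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update on a scalar parameter
    m = b1 * m + (1 - b1) * grad           # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment (scale) estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimise f(x) = x^2 starting from x = 1.0
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)  # close to the minimum at 0
```

Adamax and Nadam in the tables are variants of this rule (infinity-norm scaling and Nesterov momentum [16], respectively), which is consistent with the four optimizers behaving so similarly here.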

NN opt lr dropout embedding batchnorm outputbias bidirectional precision recall auc f_score MCC Youden C_kappa
0 Adam 0.001 0.3 True True False True 0.92 0.99 0.97 0.95 0.90 0.90 0.90
1 Adam 0.001 0.3 False True False True 0.92 0.99 0.97 0.95 0.90 0.89 0.90
2 Adam 0.001 0.3 True True False True 0.94 0.99 0.97 0.96 0.93 0.92 0.92
3 Adam 0.001 0.3 False True False True 0.93 0.98 0.97 0.96 0.91 0.91 0.91
4 Adam 0.001 0.3 True True False False 0.93 0.98 0.96 0.95 0.90 0.90 0.90
5 Adam 0.001 0.3 False True False False 0.92 0.98 0.96 0.95 0.90 0.90 0.90
6 Adam 0.001 0.3 True True False False 0.92 0.99 0.96 0.95 0.90 0.90 0.90
7 Adam 0.001 0.3 False True False False 0.93 0.98 0.96 0.96 0.91 0.90 0.91
8 Adam 0.001 0.3 True False False True 0.91 0.99 0.97 0.95 0.89 0.88 0.89
9 Adam 0.001 0.3 False False False True 0.90 0.99 0.96 0.94 0.88 0.87 0.87
10 Adam 0.001 0.3 True False False True 0.91 0.99 0.96 0.95 0.89 0.88 0.88
11 Adam 0.001 0.3 False False False True 0.89 0.99 0.97 0.93 0.86 0.85 0.86
18 Adam 0.001 0.8 True True False True 0.90 0.95 0.94 0.93 0.85 0.84 0.85
20 Adam 0.001 0.8 True True False False 0.94 0.98 0.97 0.96 0.91 0.91 0.91
21 Adam 0.001 0.8 False True False False 0.90 0.96 0.95 0.93 0.85 0.85 0.85
22 Adam 0.001 0.8 True True False False 0.92 0.98 0.96 0.95 0.90 0.89 0.90
23 Adam 0.001 0.8 False True False False 0.90 0.97 0.93 0.94 0.87 0.86 0.86

Table 7: Adam settings and results

NN opt lr dropout embedding batchnorm outputbias bidirectional precision recall auc f_score MCC Youden C_kappa
320 Adamax 0.001 0.3 True True False True 0.92 0.99 0.97 0.95 0.90 0.90 0.90
321 Adamax 0.001 0.3 False True False True 0.91 0.99 0.97 0.95 0.89 0.89 0.89
322 Adamax 0.001 0.3 True True False True 0.93 0.99 0.97 0.96 0.91 0.91 0.91
323 Adamax 0.001 0.3 False True False True 0.92 0.99 0.96 0.95 0.90 0.90 0.90
324 Adamax 0.001 0.3 True True False False 0.92 0.99 0.96 0.95 0.89 0.89 0.89
325 Adamax 0.001 0.3 False True False False 0.91 0.98 0.96 0.94 0.87 0.87 0.87
326 Adamax 0.001 0.3 True True False False 0.92 0.98 0.96 0.95 0.90 0.89 0.89
327 Adamax 0.001 0.3 False True False False 0.85 0.97 0.93 0.91 0.81 0.80 0.80
328 Adamax 0.001 0.3 True False False True 0.92 0.99 0.97 0.95 0.90 0.89 0.90
329 Adamax 0.001 0.3 False False False True 0.90 0.99 0.97 0.94 0.87 0.86 0.87
330 Adamax 0.001 0.3 True False False True 0.92 0.99 0.97 0.95 0.90 0.89 0.89
331 Adamax 0.001 0.3 False False False True 0.89 0.99 0.96 0.94 0.87 0.86 0.86
340 Adamax 0.001 0.8 True True False False 0.88 0.98 0.95 0.93 0.84 0.83 0.84
342 Adamax 0.001 0.8 True True False False 0.89 0.98 0.95 0.93 0.86 0.85 0.85

Table 8: Adamax settings and results

NN opt lr dropout embedding batchnorm outputbias bidirectional precision recall auc f_score MCC Youden C_kappa
256 Nadam 0.001 0.3 True True False True 0.93 0.98 0.97 0.96 0.91 0.90 0.91
257 Nadam 0.001 0.3 False True False True 0.94 0.99 0.97 0.96 0.92 0.92 0.92
258 Nadam 0.001 0.3 True True False True 0.95 0.97 0.97 0.96 0.91 0.91 0.91
259 Nadam 0.001 0.3 False True False True 0.94 0.98 0.97 0.96 0.91 0.91 0.91
260 Nadam 0.001 0.3 True True False False 0.92 0.98 0.96 0.95 0.90 0.89 0.90
261 Nadam 0.001 0.3 False True False False 0.91 0.98 0.96 0.95 0.89 0.88 0.88
262 Nadam 0.001 0.3 True True False False 0.92 0.99 0.97 0.95 0.90 0.89 0.89
263 Nadam 0.001 0.3 False True False False 0.91 0.98 0.96 0.95 0.89 0.88 0.89
264 Nadam 0.001 0.3 True False False True 0.92 0.99 0.97 0.95 0.89 0.89 0.89
265 Nadam 0.001 0.3 False False False True 0.91 0.99 0.96 0.95 0.89 0.89 0.89
266 Nadam 0.001 0.3 True False False True 0.91 0.99 0.97 0.95 0.89 0.88 0.89
267 Nadam 0.001 0.3 False False False True 0.90 0.99 0.96 0.94 0.88 0.87 0.87
272 Nadam 0.001 0.8 True True False True 0.90 0.95 0.95 0.92 0.83 0.83 0.83
276 Nadam 0.001 0.8 True True False False 0.92 0.98 0.96 0.95 0.89 0.89 0.89
277 Nadam 0.001 0.8 False True False False 0.90 0.99 0.95 0.94 0.88 0.87 0.87
278 Nadam 0.001 0.8 True True False False 0.91 0.98 0.95 0.94 0.88 0.87 0.88
279 Nadam 0.001 0.8 False True False False 0.91 0.98 0.96 0.95 0.89 0.88 0.88

Table 9: Nadam settings and results

NN opt lr dropout embedding batchnorm outputbias bidirectional precision recall auc f_score MCC Youden C_kappa
128 RMSprop 0.001 0.3 True True False True 0.92 0.99 0.97 0.95 0.90 0.89 0.89
129 RMSprop 0.001 0.3 False True False True 0.88 0.98 0.96 0.92 0.84 0.83 0.83
130 RMSprop 0.001 0.3 True True False True 0.93 0.98 0.97 0.95 0.90 0.90 0.90
131 RMSprop 0.001 0.3 False True False True 0.94 0.99 0.97 0.96 0.92 0.91 0.92
132 RMSprop 0.001 0.3 True True False False 0.92 0.98 0.96 0.95 0.90 0.90 0.90
133 RMSprop 0.001 0.3 False True False False 0.92 0.98 0.97 0.95 0.90 0.89 0.90
134 RMSprop 0.001 0.3 True True False False 0.93 0.97 0.96 0.95 0.89 0.89 0.89
135 RMSprop 0.001 0.3 False True False False 0.91 0.99 0.96 0.94 0.88 0.88 0.88
136 RMSprop 0.001 0.3 True False False True 0.91 0.98 0.97 0.95 0.89 0.88 0.89
137 RMSprop 0.001 0.3 False False False True 0.89 0.97 0.96 0.93 0.86 0.85 0.85
138 RMSprop 0.001 0.3 True False False True 0.93 0.98 0.97 0.95 0.90 0.90 0.90
139 RMSprop 0.001 0.3 False False False True 0.91 0.98 0.96 0.94 0.88 0.87 0.88
148 RMSprop 0.001 0.8 True True False False 0.91 0.98 0.96 0.95 0.89 0.88 0.89
149 RMSprop 0.001 0.8 False True False False 0.89 0.98 0.95 0.93 0.86 0.85 0.85
150 RMSprop 0.001 0.8 True True False False 0.87 0.99 0.96 0.93 0.84 0.83 0.84
151 RMSprop 0.001 0.8 False True False False 0.89 0.97 0.95 0.93 0.85 0.84 0.85

Table 10: RMSprop settings and results

6.3 Best results per evaluation metric

Examining which model settings gave the best results for each classification metric, all of the best settings had do_output_bias set to false. Most of them used a learning rate of 0.001 and a dropout rate of 0.3, but some used a high dropout rate of 0.8. An even lower learning rate of 1e−5 gave perfect recall, meaning there were no false negatives, which clearly indicates an over-fitted model. The difference between the best model for Cohen's Kappa and the 5th best was that the best settings used a dropout rate of 0.3 instead of 0.8 and wrapped the LSTM layers in bidirectional layers. The only difference between the best model for F_score and the 5th best was that the best settings initialized the embedding layer with the embedding matrix generated from Stanford's GloVe word embeddings, while the 5th best did not.
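All of the metrics compared in these tables can be derived from a single binary confusion matrix; a minimal sketch, where the counts are made-up illustrative numbers rather than thesis data:

```python
import math

tp, fp, fn, tn = 90, 8, 2, 80   # illustrative confusion-matrix counts

precision = tp / (tp + fp)
recall = tp / (tp + fn)                      # sensitivity, true positive rate
specificity = tn / (tn + fp)
f_score = 2 * precision * recall / (precision + recall)
youden = recall + specificity - 1            # Youden's J statistic [5]
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))   # Matthews correlation [4]

# Cohen's kappa: observed agreement corrected for chance agreement
n = tp + fp + fn + tn
po = (tp + tn) / n
pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
kappa = (po - pe) / (1 - pe)
print(round(precision, 2), round(recall, 2), round(mcc, 2), round(kappa, 2))
```

Unlike precision and recall, MCC, Youden's J and Cohen's kappa all use the full confusion matrix, which is why a degenerate classifier can reach recall 1.0 (as in Table 12) while these three stay informative.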

opt lr dropout embedding batchnorm outputbias bidirectional precision

RMSprop 0.001 0.3 False True False True 0.94
Adam 0.001 0.8 True True False False 0.94
Nadam 0.001 0.3 False True False True 0.94
Adam 0.001 0.3 True True False True 0.94
Nadam 0.001 0.3 True True False True 0.95

Table 11: Highest precision settings

opt lr dropout embedding batchnorm outputbias bidirectional recall

RMSprop 0.00001 0.3 True False False False 1.0
Nadam 0.00001 0.8 False True False False 1.0
RMSprop 0.00001 0.3 False False False False 1.0
SGD 0.00100 0.8 True False False False 1.0
Adamax 0.00001 0.8 False False False False 1.0

Table 12: Highest recall settings

opt lr dropout embedding batchnorm outputbias bidirectional auc

Adamax 0.001 0.3 True True False True 0.97
Adam 0.001 0.3 False True False True 0.97
Adam 0.001 0.3 True True False True 0.97
Nadam 0.001 0.3 True True False True 0.97
Adamax 0.001 0.3 True True False True 0.97

Table 13: Highest AUC settings

opt lr dropout embedding batchnorm outputbias bidirectional f_score

Adam 0.001 0.3 False True False True 0.96
Adam 0.001 0.8 True True False False 0.96
RMSprop 0.001 0.3 False True False True 0.96
Nadam 0.001 0.3 False True False True 0.96
Adam 0.001 0.3 True True False True 0.96

Table 14: Highest f score settings

opt lr dropout embedding batchnorm outputbias bidirectional MCC

Nadam 0.001 0.3 True True False True 0.91
Adam 0.001 0.8 True True False False 0.91
RMSprop 0.001 0.3 False True False True 0.92
Nadam 0.001 0.3 False True False True 0.92
Adam 0.001 0.3 True True False True 0.93

Table 15: Highest MCC settings

opt lr dropout embedding batchnorm outputbias bidirectional Youden

Adam 0.001 0.8 True True False False 0.91
Nadam 0.001 0.3 True True False True 0.91
RMSprop 0.001 0.3 False True False True 0.91
Nadam 0.001 0.3 False True False True 0.92
Adam 0.001 0.3 True True False True 0.92

Table 16: Highest Youden settings

opt lr dropout embedding batchnorm outputbias bidirectional C_kappa

Adam 0.001 0.8 True True False False 0.91
Nadam 0.001 0.3 True True False True 0.91
RMSprop 0.001 0.3 False True False True 0.92
Nadam 0.001 0.3 False True False True 0.92
Adam 0.001 0.3 True True False True 0.92

Table 17: Highest C kappa settings

6.4 Code

6.4.1 Sort twitter

def has_powell(a, b):
    # a is either a keyword string or a list of keywords; b is the tweet text
    if type(a) == str:
        return a.lower() in b
    if type(a) == list:
        for i in a:
            if i.lower() in b:
                return True
    return False

def has_trump(a, b):
    if type(a) == str:
        return a.lower() in b
    if type(a) == list:
        for i in a:
            if i.lower() in b:
                return True
    return False

def sort_twitter(drop_retweets=False):

    t0 = time.time()
    with h5py.File(thepath + "tweets.h5", 'r') as tweets:
        tweet_keys = list(tweets.keys())
    print(f"tweet keys: {len(tweet_keys)}")
    # print("done, it took: {}".format(get_time(t0)))

    # t0 = time.time()
    dfs = []
    for t in tweet_keys:
        dfs.append(pd.read_hdf(thepath + "tweets.h5", t))
    tweets = pd.concat(dfs)
    tweets['id'] = tweets.index
    # tweets.index = tweets.Created

    tweets.Created = pd.to_datetime(tweets.Created)
    tweets.index = tweets.Created
    if drop_retweets:
        tweets = tweets[tweets['IsRetweet'] == False]

    # TODO: input variable date
    # tweets = tweets[tweets.index > pd.to_datetime("2019")]
    tweets.sort_index(axis=0, inplace=True)

    # print("done, it took: {}".format(get_time(t0)))
    users = tweets.User.unique()
    for u in users:
        tmp = tweets[tweets.User == u]
        print("{}: {}".format(u, len(tmp)))

    # tweets.Created = pd.to_datetime(tweets.Created)
    tweets['timedelta'] = tweets.Created.shift(-1) - tweets.Created
    tweets['seconds'] = tweets.timedelta.apply(lambda x: x.seconds)
    tweets['cumseconds'] = tweets.seconds.cumsum()

    # tweets['cumseconds'] = tweets['cumseconds'] - tweets['cumseconds'].shift(1)
    # tweets['reverseCumSum'] = tweets.seconds[::-1].cumsum()
    # tweets['reverseCumSum'] = tweets.loc[::-1, 'seconds'].cumsum()[::-1]
    tweets.head(10)
    mask5min = tweets.seconds < 60 * 5
    tweets5 = tweets[mask5min]
    print("{} vs {}, ratio: {}".format(len(tweets), len(tweets5), len(tweets5) / len(tweets)))

    # t0 = time.time()
    def similar(a, b):
        # https://stackoverflow.com/questions/17388213/find-the-similarity-metric-between-two-strings
        return SequenceMatcher(None, a, b).ratio()

    tweets['Text'] = tweets['Text'].str.lower()
    tweets['has_Powell'] = tweets.apply(lambda x: has_powell("powell", x.Text), axis=1)
    tweets['has_Fed'] = tweets.apply(lambda x: has_powell(Fed_list, x.Text), axis=1)
    tweets['has_Trump'] = tweets.apply(lambda x: has_trump("trump", x.Text), axis=1)
    tweets['has_Advisor'] = tweets.apply(lambda x: has_trump(Advisor_list, x.Text), axis=1)
    tweets['nextText'] = tweets.Text.shift(1)
    tweets['prevText'] = tweets.Text.shift(-1)
    tweets['nextUser'] = tweets.User.shift(1)
    tweets['prevUser'] = tweets.User.shift(-1)
    # tweets['corrAfter'] = similar(tweets.Text, tweets.nextText)
    # tweets['corrBefore'] = similar(tweets.Text, tweets.prevText)
    tweets.nextText.fillna("", inplace=True)
    tweets.prevText.fillna("", inplace=True)
    tweets['corrAfter'] = tweets.apply(lambda x: similar(x.Text, x.nextText), axis=1)
    tweets['corrBefore'] = tweets.apply(lambda x: similar(x.Text, x.prevText), axis=1)
    """I'm using 0.5, because there's often commentary on the headline;
    > 0.9 almost just gives retweets. I read through multiple cases
    around 0.5 and they were the same headline"""

    alikeB = tweets[tweets['corrBefore'] > 0.5]
    alikeA = tweets[tweets['corrAfter'] > 0.5]

    alikeB = alikeB[alikeB.User != alikeB.prevUser]
    alikeA = alikeA[alikeA.User != alikeA.nextUser]

    # t0 = time.time()
    days_alikeA = alikeA.groupby(alikeA.index.date)
    days_alikeB = alikeB.groupby(alikeB.index.date)
    datesA = []
    datesB = []
    for days in days_alikeA:
        datesA.append(days[0])
    for days in days_alikeB:
        datesB.append(days[0])
    print("done, it took: {}".format(get_time(t0)))

    return tweets, alikeA, alikeB, datesA, datesB

Preparing tweets and grouping by Fed members and White House members
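The 0.5 similarity threshold used in sort_twitter above can be checked interactively with difflib; the headlines below are made-up illustrative examples, not tweets from the dataset.

```python
from difflib import SequenceMatcher

def similar(a, b):
    # Same helper as in sort_twitter: ratio of matching characters
    return SequenceMatcher(None, a, b).ratio()

a = "fed announces cease and desist order against deutsche bank"
b = "rt @newswire: fed announces cease and desist order against deutsche bank"
c = "please enjoy this video i found on reddit of a dog"

print(similar(a, b))  # near-duplicate quote of the same headline, well above 0.5
print(similar(a, c))  # unrelated tweet, well below
```

SequenceMatcher.ratio() returns 2M/T where M is the number of matched characters and T the combined length, so quotes that add a short "rt @user:" prefix still score high while unrelated text does not, which is what makes 0.5 a workable cut-off.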

6.4.2 Alike next hour

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

def iterate_similar(i, a, df, max_time=3600, corr_min=0.5):
    alike = 0
    tweets_compared = 0
    for idx, row in df.iterrows():
        if (idx - i).total_seconds() > max_time:
            break
        tweets_compared += 1
        if similar(a, row.Text) > corr_min:
            alike += 1

    print("tweets_compared: {}".format(tweets_compared))
    print("alike: {}".format(alike))
    # https://stackoverflow.com/questions/25478528/updating-value-in-iterrow-for-pandas
    tweets.loc[i, 'alike_next_hour'] = alike

def iterate_similar2(i, a, df, max_time=3600, corr_min=0.5):
    alike = 0
    tweets_compared = 0
    for row in df.itertuples():
        if (getattr(row, "Created") - i).total_seconds() > max_time:
            break
        tweets_compared += 1
        if similar(a, getattr(row, "Text")) > corr_min:
            alike += 1
    print("tweets_compared: {}".format(tweets_compared))
    print("alike: {}".format(alike))
    tweets.loc[i, 'alike_next_hour'] = alike

do_alike = False   # This takes forever.... (over 5 seconds per row)
do_alike2 = False  # Can't time it
do_alike3 = False  # Takes a little bit longer than do_alike (over 6 seconds per row)
do_alike4 = False  # (0.13 seconds per row) .... True here ......
do_alike5 = False  # (0.15 seconds per row)

"""I set all of them to False becauseI 've finished running for all of the tweets""" if do_alike5: t0= time.time()

    i = 0
    for row in tweets.itertuples():
        i += 1
        t3 = time.time()
        iterate_similar2(getattr(row, "Created"), getattr(row, "Text"),
                         tweets[tweets.index > getattr(row, "Created")])
        print("i: {}, done, it took: {}".format(i, get_time(t3)))

    print("all done, it took: {}".format(get_time(t0)))

if do_alike4:
    t0 = time.time()
    j = 0
    for i in range(len(tweets)):
        if i < current_index:
            j = i
            continue
        t3 = time.time()
        iterate_similar2(tweets.iloc[i].Created, tweets.iloc[i].Text,
                         tweets[tweets.index > tweets.iloc[i].Created])
        print("i: {}, done, it took: {}".format(i, get_time(t3)))
        if i - j > 10000:
            save_tweet(tweets)
            j = i

    print("all done, it took: {}".format(get_time(t0)))

if do_alike3:
    t0 = time.time()

    for i in range(len(tweets)):
        t3 = time.time()
        iterate_similar(tweets.iloc[i].Created, tweets.iloc[i].Text,
                        tweets[tweets.index > tweets.iloc[i].Created])
        print("i: {}, done, it took: {}".format(i, get_time(t3)))

    print("all done, it took: {}".format(get_time(t0)))

if do_alike2:
    t0 = time.time()

    # tweets.apply(lambda x: iterate_similar(x.Created, x.Text, tweets[tweets.index > x.Created]))
    tweets = tweets.apply(lambda x: iterate_similar(x.Created, x.Text,
                                                    tweets[tweets.index > x.Created]), axis=1)

    print("all done, it took: {}".format(get_time(t0)))

if do_alike:
    t0 = time.time()
    tweets['alike_next_hour'] = 0
    i = 0
    for idx, row in tweets.iterrows():
        i += 1
        t3 = time.time()
        iterate_similar(idx, row.Text, tweets[tweets.index > idx])
        # if i > 6:
        #     break

        print("i: {}, done, it took: {}".format(i, get_time(t3)))

Different versions for generating Alike next hour
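Part of the speed gap noted in the do_alike comments above comes from how the rows are iterated: iterrows builds a full Series object per row, while itertuples yields lightweight namedtuples. A minimal comparison on synthetic data (the row count is an illustrative assumption):

```python
import time
import pandas as pd

df = pd.DataFrame({"Text": ["headline %d" % i for i in range(20000)]})

t0 = time.time()
n = 0
for _, row in df.iterrows():        # constructs a Series per row: slow
    n += len(row.Text)
t_iterrows = time.time() - t0

t0 = time.time()
m = 0
for row in df.itertuples():         # lightweight namedtuples: fast
    m += len(row.Text)
t_itertuples = time.time() - t0

print(n == m, t_iterrows > t_itertuples)
```

The rest of the per-row cost in iterate_similar is the SequenceMatcher comparison itself, which both variants pay equally; that is why do_alike4 and do_alike5 only improve the iteration overhead.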

6.4.3 Get product

range_dict2 = {}

def add_to_range_dict(the_product, df):
    if the_product not in range_dict2:
        range_dict2[the_product] = [df.head(1).index.date, df.tail(1).index.date]
        # print(range_dict2[the_product])

def in_this_range2(idx, symbol, df):
    # Replace this with what's in get_product.py
    if symbol not in range_dict2:
        add_to_range_dict(symbol, df)
    r = range_dict2[symbol]
    idx = pd.to_datetime(idx)
    try:
        # print("idx: {} | r[0]: {} | r[1]: {}".format(idx, r[0], r[1]))
        if idx > r[0] and idx < r[1]:
            return True
    except Exception as e:
        print(e)
    return False

def get_ratio(kind, before, after, symbol):
    if kind == 'volume':
        vol_after = after.buys.sum() + after.sells.sum()
        vol_before = before.buys.sum() + before.sells.sum()
        if vol_before > 0:
            return vol_after / vol_before
        else:
            return 0
    if kind == 'range':
        step_size = step_sizes[symbol[0:2]]  # Ticker
        range_after = (after.price.max() / 1e8 - after.price.min() / 1e8) / step_size
        range_before = max((before.price.max() / 1e8 - before.price.min() / 1e8) / step_size, 1)
        return range_after / range_before if range_before > 0 else 0
    if kind == 'bid_ask_spread':
        step_size = step_sizes[symbol[0:2]]  # Ticker
        after['bb'] = np.where(after.is_offer == True, after.price, after.price - step_size)
        after['bo'] = np.where(after.is_offer == True, after.price + step_size, after.price)
        before['bb'] = np.where(before.is_offer == True, before.price, before.price - step_size)
        before['bo'] = np.where(before.is_offer == True, before.price + step_size, before.price)

        after_gap = step_size
        before_gap = step_size
        if after.buys.sum() != 0 and after.sells.sum() != 0:
            after_gap = ((after.bo * after.buys).sum() / after.buys.sum()) \
                        - ((after.bb * after.sells).sum() / after.sells.sum())
        if before.buys.sum() != 0 and before.sells.sum() != 0:
            before_gap = ((before.bo * before.buys).sum() / before.buys.sum()) \
                         - ((before.bb * before.sells).sum() / before.sells.sum())
        return after_gap / before_gap if before_gap != 0 else 0
    if kind == 'stamina':
        st_after = 0
        st_before = 0
        for threshold in after.stamina:
            st_after += after[threshold]
        for threshold in before.stamina:
            st_before += before[threshold]
        if st_after > 0:
            return st_before / st_after
        else:
            return 0
    else:
        try:
            a = after[kind].sum() / thres[symbol]
            b = before[kind].sum() / thres[symbol]
            if b > 0:
                return a / b
            else:
                return a / thres[symbol]
        except:
            return 0

ECO_TIMES = [["07:44", "07:46"], ["08:14", "08:16"], ["08:29", "08:31"],
             ["09:29", "09:31"], ["09:59", "10:01"], ["13:00", "13:03"],
             ["13:59", "14:01"], ["17:00", "18:00"]]

def is_in_event(idx):
    # print(idx)
    for e in ECO_TIMES:
        # print("idx | e[0] | e[1]: {} | {} | {}".format(idx, e[0], e[1]))
        if idx >= e[0] and idx <= e[1]:
            # print("RETURN TRUE")
            return True
    return False

print("done, it took: {}".format(get_time(t0)))

Gather and prepare market data for tweet impact analysis
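The before/after ratio idea behind get_ratio can be shown on a toy trade series; the timestamps, the volume jump and the 20-minute window below are illustrative assumptions, not thesis data.

```python
import pandas as pd

# Toy per-minute trade volumes around a headline at 14:00
idx = pd.date_range("2020-01-02 13:40", "2020-01-02 14:20", freq="1min")
vol = pd.Series(1.0, index=idx)
vol.loc["2020-01-02 14:00":] = 3.0   # volume triples after the headline

headline = pd.Timestamp("2020-01-02 14:00")
before = vol[(vol.index < headline) & (vol.index >= headline - pd.DateOffset(minutes=20))]
after = vol[(vol.index > headline) & (vol.index <= headline + pd.DateOffset(minutes=20))]

ratio = after.sum() / before.sum() if before.sum() > 0 else 0
print(ratio)  # 3.0: the market got noticeably more active after the headline
```

A ratio well above 1 in volume, price range or bid/ask spread is what the thesis treats as a measurable market reaction to the headline.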

6.4.4 Prepare market data

PRODUCTS = ['ZNM8', 'ZNU8', 'ZNZ8', 'ZNH9', 'ZNM9', 'ZNU9', 'ZNZ9', 'ZNH0']
sweeps = {}
staminas = {}
thres = {}
num_columns = 0
sw_columns = []
for p in PRODUCTS:
    try:
        is_spread = '-' in p
        p_ticker = p[0:2] if not is_spread else '{}_spread'.format(p[0:2])

        fifo = fifos[p_ticker]

        steps = step_sizes[p_ticker]
        threshold = thresholds[p_ticker]
        thres[p] = threshold
        pid = str(products[p])
        print(f"{p} - {pid}")
        sw = pd.read_hdf(thepath + "sweep_info.h5", pid)
        sw = sw.iloc[1:]
        sw = sw.drop(['seed_support', 'seed_depth', 'seed_timing'], axis=1)
        st = pd.read_hdf(thepath + "stamina_info.h5", pid)
        sw = add_sweep_columns(sw)
        sw = add_sweep_today_columns(sw, steps)
        st = stamina_fifo(st, fifo)
        sw = localize_df(sw)
        st = localize_df(st)
        sw['dump_for_loss_bid_sweep'] = np.where(
            (sw['continued'] == False)
            & (sw['is_offer'] == False)
            & (sw['min_opp_support'].shift(-1) != -1)
            & (sw['min_opp_support'].shift(-1) < opp_dump_size[p_ticker]),
            sw['quant'], None)
        sw['dump_for_loss_offer_sweep'] = np.where(
            (sw['continued'] == False)
            & (sw['is_offer'] == True)
            & (sw['min_opp_support'].shift(-1) != -1)
            & (sw['min_opp_support'].shift(-1) < opp_dump_size[p_ticker]),
            sw['quant'], None)

        # sw = do_ta(sw)  # daily values, so the ratio would not change, hence never giving a value of 1 (impact)
        # But it does if we do sweepSweep, breaking it into smaller parts than min_offset.
        # For example groupby 5 minutes and min_offset=30

        sweeps[p] = sw
        staminas[p] = st
        num_columns = len(sw.columns)  # This will become higher if we do_ta
        sw_columns = list(sw.columns)
    except:
        print("Product: {}, did not work".format(p))

metrics = ['volume', 'range', 'bid_ask_spread']
num_columns += len(metrics)
comb = metrics + sw_columns

Prepare market data for the market impact analysis
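The best-bid/best-offer reconstruction in the bid_ask_spread branch of get_ratio (section 6.4.3 above) can be illustrated on toy trade prints; the prices, the is_offer flags and the tick size here are illustrative assumptions.

```python
import numpy as np
import pandas as pd

step_size = 0.015625  # illustrative tick size, not necessarily the thesis value

# Toy trade prints: price plus a flag for which side of the book traded
trades = pd.DataFrame({
    "price":    [110.0, 110.015625, 110.0, 110.015625],
    "is_offer": [True, True, False, True],
})

# Same np.where pattern as in get_ratio: whichever side the trade is
# attributed to, the other side of the book is assumed one tick away
trades["bb"] = np.where(trades.is_offer == True, trades.price, trades.price - step_size)
trades["bo"] = np.where(trades.is_offer == True, trades.price + step_size, trades.price)

print(trades[["bb", "bo"]])
```

By construction the implied spread bo - bb is always exactly one tick per print; the ratio get_ratio then computes is between the volume-weighted gaps after and before the headline, not the raw spread itself.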

6.4.5 Tweets to train base

def tweets_to_train_base(tweets, num_columns=1, min_offset=20, ratio_min=3,
                         do_plot=False, plot_name="Powell", shift_minute=False,
                         require_initial_impact=False, drop_events=False,
                         only_has_says=False):

    idxs = []
    x_train = []
    y_train = []
    y_trainB = []
    x_plot = {}  # if do_plot
    y_plot = {}  # -- || --
    for i in range(len(comb)):
        x_plot[comb[i]] = []
        y_plot[comb[i]] = []
    # x_plot = []  # if do_plot
    # y_plot = []  # -- || --

    # Making tweets tz_aware, because sweep_index is tz_aware
    try:
        # tweets.index = tweets.index.tz_localize("America/New_York", ambiguous='NaT', nonexistent='shift_forward')
        tweets.index = tweets.index.tz_localize(
            "UTC", ambiguous='NaT', nonexistent='shift_forward').tz_convert("America/New_York")
    except Exception as e:
        print(e)  # Most likely already tz_aware
    # Run this just once, but it does not work running it at the beginning of the file

    for idx, row in tweets.iterrows():
        if drop_events:
            if is_in_event(idx.strftime("%H:%M")):
                continue
        if "heads up:" not in row.Text.lower():
            pure_text = str(row.Text)
            pure_text = re.sub(r'\w+:\/{2}[\d\w-]+(\.[\d\w-]+)*(?:(?:\/[^\s/]*))*', '', row.Text)  # strip URLs
            if only_has_says and "says" not in row.Text.lower():
                continue
            # pure_text = re.sub(r'^https?:\/\/.*[\r\n]*', '', pure_text, flags=re.MULTILINE)
            # x_train.append(row.Text)

            x_train.append(pure_text)
            y_trainB.append(0)
            # y_train.append([-1] * num_columns)
            # I will iterate through all of the columns and get the ratio
            # for all of them, before and after a tweet
            y_train.append([0] * num_columns)
            idxs.append(idx)
    # return x_train, y_train, idxs
    i = 0
    plots = 0

    if shift_minute:
        plot_name += "_shift"
    if require_initial_impact:
        plot_name += "_initialImpact"
    if drop_events:
        plot_name += "_dropEvents"
    if only_has_says:
        plot_name += "_says"

    plot_name += "from2018"
    for idx in idxs:
        for p in PRODUCTS:
            if in_this_range2(idx, p, sweeps[p]):

                # print("in_range")
                pidx = pd.to_datetime(idx)
                if shift_minute:
                    pidx = pidx - pd.DateOffset(minutes=1)
                tmp = sweeps[p]
                tmpst = staminas[p]
                text = ""
                mark_y = False
                try:
                    before = tmp[(tmp.index < pidx) & (tmp.index > pidx - pd.DateOffset(minutes=min_offset))]
                    after = tmp[(tmp.index > pidx) & (tmp.index < pidx + pd.DateOffset(minutes=min_offset))]
                    both = tmp[(tmp.index > pidx - pd.DateOffset(minutes=min_offset)) & (tmp.index < pidx + pd.DateOffset(minutes=min_offset))]
                    beforest = tmpst[(tmpst.index < pidx) & (tmpst.index > pidx - pd.DateOffset(minutes=min_offset))]
                    afterst = tmpst[(tmpst.index > pidx) & (tmpst.index < pidx + pd.DateOffset(minutes=min_offset))]
                    bothst = tmpst[(tmpst.index > pidx - pd.DateOffset(minutes=min_offset)) & (tmpst.index < pidx + pd.DateOffset(minutes=min_offset))]
                    if len(before) > 0 and len(after) > 0:
                        for j in range(num_columns):
                            ratio = get_ratio(comb[j], before, after, p)
                            if ratio > ratio_min:
                                text += "{}: {}\n".format(comb[j], int(ratio))
                                mark_y = True
                                y_trainB[i] = 1
                                y_train[i][j] = 1  # FIXME: should I do =1 or =ratio (and then normalize the ratio)?
                                if do_plot:
                                    x_plot[comb[j]].append(pidx)
                                    y_plot[comb[j]].append(ratio)
                            else:
                                y_trainB[i] = 0
                                y_train[i][j] = 0
                    if require_initial_impact and mark_y:
                        mark_y = False
                        text += "---Initial impact:---\n"
                        before = tmp[(tmp.index < pidx) & (tmp.index > pidx - pd.DateOffset(minutes=min_offset))]
                        after = tmp[(tmp.index > pidx) & (tmp.index < pidx + pd.DateOffset(minutes=2))]
                        both = tmp[(tmp.index > pidx - pd.DateOffset(minutes=min_offset)) & (tmp.index < pidx + pd.DateOffset(minutes=min_offset))]
                        beforest = tmpst[(tmpst.index < pidx) & (tmpst.index > pidx - pd.DateOffset(minutes=min_offset))]
                        afterst = tmpst[(tmpst.index > pidx) & (tmpst.index < pidx + pd.DateOffset(minutes=2))]
                        bothst = tmpst[(tmpst.index > pidx - pd.DateOffset(minutes=min_offset)) & (tmpst.index < pidx + pd.DateOffset(minutes=min_offset))]
                        if len(before) > 0 and len(after) > 0:
                            for j in range(num_columns):
                                ratio = get_ratio(comb[j], before, after, p)
                                if ratio > ratio_min:
                                    text += "{}: {}\n".format(comb[j], int(ratio))
                                    mark_y = True
                                    y_trainB[i] = 1
                                    y_train[i][j] = 1  # FIXME: =1 or =ratio (then normalize)?
                                    if do_plot:
                                        x_plot[comb[j]].append(pidx)
                                        y_plot[comb[j]].append(ratio)
                                else:
                                    y_trainB[i] = 0
                                    y_train[i][j] = 0

                    if mark_y:
                        # zoom_plot(both, bothst, p, x_train[i], text, pdf, pidx, is_spread=False)
                        plots += 1
                except Exception as e:
                    print(e)
        i += 1
        # if plots > 100:
        #     break

    if do_plot:

        for key in x_plot:
            print("____{}____".format(key))
            plt.plot(x_plot[key], y_plot[key], marker="*")
            plt.show()

    df_y_trainB = pd.DataFrame(data=y_trainB, columns=['y_train'], index=idxs)
    df_y_train = pd.DataFrame(data=y_train, columns=comb, index=idxs)
    df_x_train = pd.DataFrame(data=x_train, columns=['x_train'], index=idxs)

    df_y_trainB.to_hdf("y_trainB.h5", key=plot_name, mode="a", complevel=9, complib="blosc", append=True, format="table")
    df_y_train.to_hdf("y_train.h5", key=plot_name, mode="a", complevel=9, complib="blosc", append=True, format="table")
    df_x_train.to_hdf("x_train.h5", key=plot_name, mode="a", complevel=9, complib="blosc", append=True, format="table")

    print("len tweets | i | plots: {}|{}|{}".format(len(tweets), i, plots))
    return x_train, y_train, idxs

Takes in a Twitter dataset and analyses the impact each tweet had on the market.
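The before/after windowing that drives the labelling above can be sketched in isolation. The data below is synthetic, and a simple mean-volume ratio stands in for the thesis's `get_ratio` helper; only the window construction mirrors the listing.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute volume series: the "tweet" lands at minute 20,
# after which volume quadruples
idx = pd.date_range("2021-03-01 14:00", periods=40, freq="1Min")
vol = np.r_[np.full(20, 100.0), np.full(20, 400.0)]
df = pd.DataFrame({"volume": vol}, index=idx)

pidx = idx[20]    # tweet timestamp
min_offset = 20   # window size in minutes, as in the listing
before = df[(df.index < pidx) & (df.index > pidx - pd.DateOffset(minutes=min_offset))]
after = df[(df.index > pidx) & (df.index < pidx + pd.DateOffset(minutes=min_offset))]

# Stand-in for get_ratio: mean volume after vs. before the tweet
ratio = after.volume.mean() / before.volume.mean()
print(ratio)  # 4.0, which would clear ratio_min = 3
```

With strict inequalities the tweet minute itself is excluded from both windows, so each window here holds 19 rows.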

6.4.6 Technical analysis, TA library

def pre_prep_sweep(sweep):

    # sweep = sweep.drop(['seed_support', 'seed_depth', 'seed_timing'], axis=1)
    date_sweep = sweep.groupby(sweep.index.date)
    sweep_dates = []
    for d in date_sweep:
        sweep_dates.append(d[0])
    tail = []
    for d in sweep_dates:
        tmp = sweep[sweep.index.date == d]
        after6 = tmp[tmp.index.hour > 18]
        try:
            tmp['after6'] = after6.head(1).price.item()
            tmp['price'] = np.where((tmp.index.hour >= 17) & (tmp.index.hour <= 18), tmp.after6, tmp.price)
        except Exception as e:
            print(e)
        tmp['Low'] = tmp.price.min()
        tmp['High'] = tmp.price.max()
        tmp['Close'] = tmp.tail(1).price.item()
        tmp['Volume'] = tmp.quant.cumsum()
        tail.append(tmp.tail(1))
    tails = pd.concat(tail)
    tails.sort_index(axis=0, inplace=True)
    return tails, sweep_dates


def do_ta(newSweep, do_momentum=True, do_volume=True, do_volatility=True, do_trend=True, do_print=False):
    sweep = newSweep.copy()
    newSweep, sweep_dates = pre_prep_sweep(newSweep)
    if do_momentum:
        t0 = time.time()
        if do_print:
            print("doing ta.momentum")

        """ta.momentum"""
        # TODO: there are new momentum indicators, see
        # https://technical-analysis-library-in-python.readthedocs.io/en/latest/ta.html
        newSweep['AwesomeOscillatorIndicator'] = ta.momentum.AwesomeOscillatorIndicator(low=newSweep.Low, high=newSweep.High).awesome_oscillator()
        newSweep['KAMAIndicator'] = ta.momentum.KAMAIndicator(close=newSweep.Close, fillna=True).kama()
        newSweep['ROCIndicator'] = ta.momentum.ROCIndicator(close=newSweep.Close).roc()
        newSweep['RSIIndicator'] = ta.momentum.RSIIndicator(close=newSweep.Close).rsi()
        newSweep['StochasticOscillator'] = ta.momentum.StochasticOscillator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).stoch()
        newSweep['StochasticOscillator_high'] = ta.momentum.StochasticOscillator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).stoch_signal()
        newSweep['TSIIndicator'] = ta.momentum.TSIIndicator(close=newSweep.Close).tsi()
        newSweep['UltimateOscillator'] = ta.momentum.UltimateOscillator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).ultimate_oscillator()
        newSweep['WilliamsRIndicator'] = ta.momentum.WilliamsRIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).williams_r()
        if do_print:
            print("momentum done, took: {},\n doing ta.volume".format(get_time(t0)))
    if do_volume:
        t0 = time.time()
        """ta.volume"""
        newSweep['AccDistIndexIndicator'] = ta.volume.AccDistIndexIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close, volume=newSweep.Volume).acc_dist_index()
        newSweep['ChaikinMoneyFlowIndicator'] = ta.volume.ChaikinMoneyFlowIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close, volume=newSweep.Volume).chaikin_money_flow()
        newSweep['EaseOfMovementIndicator'] = ta.volume.EaseOfMovementIndicator(low=newSweep.Low, high=newSweep.High, volume=newSweep.Volume).ease_of_movement()
        newSweep['EaseOfMovementIndicator_sma'] = ta.volume.EaseOfMovementIndicator(low=newSweep.Low, high=newSweep.High, volume=newSweep.Volume).sma_ease_of_movement()
        newSweep['ForceIndexIndicator'] = ta.volume.ForceIndexIndicator(close=newSweep.Close, volume=newSweep.Volume).force_index()
        newSweep['MFIIndicator'] = ta.volume.MFIIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close, volume=newSweep.Volume).money_flow_index()
        newSweep['NegativeVolumeIndexIndicator'] = ta.volume.NegativeVolumeIndexIndicator(close=newSweep.Close, volume=newSweep.Volume).negative_volume_index()
        newSweep['OnBalanceVolumeIndicator'] = ta.volume.OnBalanceVolumeIndicator(close=newSweep.Close, volume=newSweep.Volume).on_balance_volume()
        newSweep['VolumePriceTrendIndicator'] = ta.volume.VolumePriceTrendIndicator(close=newSweep.Close, volume=newSweep.Volume).volume_price_trend()
        newSweep['VolumeWeightedAveragePrice'] = ta.volume.VolumeWeightedAveragePrice(low=newSweep.Low, high=newSweep.High, close=newSweep.Close, volume=newSweep.Volume).volume_weighted_average_price()
        if do_print:
            print("volume done, took: {},\n doing ta.volatility".format(get_time(t0)))
    if do_volatility:
        t0 = time.time()
        """ta.volatility"""
        newSweep['AverageTrueRange'] = ta.volatility.AverageTrueRange(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).average_true_range()
        newSweep['BollingerBands_hband'] = ta.volatility.BollingerBands(close=newSweep.Close).bollinger_hband()
        newSweep['BollingerBands_hband_indicator'] = ta.volatility.BollingerBands(close=newSweep.Close).bollinger_hband_indicator()
        newSweep['BollingerBands_lband'] = ta.volatility.BollingerBands(close=newSweep.Close).bollinger_lband()
        newSweep['BollingerBands_lband_indicator'] = ta.volatility.BollingerBands(close=newSweep.Close).bollinger_lband_indicator()
        newSweep['BollingerBands_mavg'] = ta.volatility.BollingerBands(close=newSweep.Close).bollinger_mavg()
        newSweep['BollingerBands_pband'] = ta.volatility.BollingerBands(close=newSweep.Close).bollinger_pband()
        newSweep['BollingerBands_wband'] = ta.volatility.BollingerBands(close=newSweep.Close).bollinger_wband()
        newSweep['DonchianChannel_hband'] = ta.volatility.DonchianChannel(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).donchian_channel_hband()
        newSweep['DonchianChannel_lband'] = ta.volatility.DonchianChannel(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).donchian_channel_lband()
        newSweep['DonchianChannel_mband'] = ta.volatility.DonchianChannel(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).donchian_channel_mband()
        newSweep['DonchianChannel_pband'] = ta.volatility.DonchianChannel(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).donchian_channel_pband()
        newSweep['DonchianChannel_wband'] = ta.volatility.DonchianChannel(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).donchian_channel_wband()
        newSweep['KeltnerChannel_hband'] = ta.volatility.KeltnerChannel(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).keltner_channel_hband()
        newSweep['KeltnerChannel_hband_indicator'] = ta.volatility.KeltnerChannel(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).keltner_channel_hband_indicator()
        newSweep['KeltnerChannel_lband'] = ta.volatility.KeltnerChannel(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).keltner_channel_lband()
        newSweep['KeltnerChannel_lband_indicator'] = ta.volatility.KeltnerChannel(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).keltner_channel_lband_indicator()
        newSweep['KeltnerChannel_mband'] = ta.volatility.KeltnerChannel(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).keltner_channel_mband()
        newSweep['KeltnerChannel_pband'] = ta.volatility.KeltnerChannel(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).keltner_channel_pband()
        newSweep['KeltnerChannel_wband'] = ta.volatility.KeltnerChannel(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).keltner_channel_wband()
        if do_print:
            print("volatility done, took: {},\n doing ta.trend".format(get_time(t0)))
    if do_trend:
        t0 = time.time()
        """ta.trend"""
        # TODO: a few new trend indicators near the bottom of the list
        newSweep['ADXIndicator'] = ta.trend.ADXIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).adx()
        newSweep['ADXIndicator_neg'] = ta.trend.ADXIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).adx_neg()
        newSweep['ADXIndicator_pos'] = ta.trend.ADXIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).adx_pos()
        newSweep['AroonIndicator_down'] = ta.trend.AroonIndicator(close=newSweep.Close).aroon_down()
        newSweep['AroonIndicator_indicator'] = ta.trend.AroonIndicator(close=newSweep.Close).aroon_indicator()
        newSweep['AroonIndicator_up'] = ta.trend.AroonIndicator(close=newSweep.Close).aroon_up()
        newSweep['CCIIndicator'] = ta.trend.CCIIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).cci()
        newSweep['DPOIndicator'] = ta.trend.DPOIndicator(close=newSweep.Close).dpo()
        newSweep['EMAIndicator'] = ta.trend.EMAIndicator(close=newSweep.Close).ema_indicator()
        newSweep['IchimokuIndicator_a'] = ta.trend.IchimokuIndicator(low=newSweep.Low, high=newSweep.High).ichimoku_a()
        newSweep['IchimokuIndicator_b'] = ta.trend.IchimokuIndicator(low=newSweep.Low, high=newSweep.High).ichimoku_b()
        newSweep['IchimokuIndicator_base_line'] = ta.trend.IchimokuIndicator(low=newSweep.Low, high=newSweep.High).ichimoku_base_line()
        newSweep['IchimokuIndicator_conversion_line'] = ta.trend.IchimokuIndicator(low=newSweep.Low, high=newSweep.High).ichimoku_conversion_line()
        newSweep['KSTIndicator'] = ta.trend.KSTIndicator(close=newSweep.Close).kst()
        newSweep['KSTIndicator_diff'] = ta.trend.KSTIndicator(close=newSweep.Close).kst_diff()
        newSweep['KSTIndicator_sig'] = ta.trend.KSTIndicator(close=newSweep.Close).kst_sig()
        # newSweep['MACD'] = ta.trend.MACD(close=newSweep.Close).macd()
        # newSweep['MACD_diff'] = ta.trend.MACD(close=newSweep.Close, n=9).macd_diff()
        # newSweep['MACD_signal'] = ta.trend.MACD(close=newSweep.Close, n=9).macd_signal()
        # newSweep['MassIndex'] = ta.trend.MassIndex(low=newSweep.Low, high=newSweep.High, n=9).mass_index()  # FIXME: n=9 hardcoded
        newSweep['PSARIndicator'] = ta.trend.PSARIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).psar()
        newSweep['PSARIndicator_down'] = ta.trend.PSARIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).psar_down()
        newSweep['PSARIndicator_down_indicator'] = ta.trend.PSARIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).psar_down_indicator()
        newSweep['PSARIndicator_up'] = ta.trend.PSARIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).psar_up()
        newSweep['PSARIndicator_up_indicator'] = ta.trend.PSARIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).psar_up_indicator()
        # newSweep['SMAIndicator'] = ta.trend.SMAIndicator(close=newSweep.Close, n=9).sma_indicator()  # FIXME: n=9 hardcoded
        newSweep['TRIXIndicator'] = ta.trend.TRIXIndicator(close=newSweep.Close).trix()
        newSweep['VortexIndicator_diff'] = ta.trend.VortexIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).vortex_indicator_diff()
        newSweep['VortexIndicator_neg'] = ta.trend.VortexIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).vortex_indicator_neg()
        newSweep['VortexIndicator_pos'] = ta.trend.VortexIndicator(low=newSweep.Low, high=newSweep.High, close=newSweep.Close).vortex_indicator_pos()
        if do_print:
            print("trend done, took: {}".format(get_time(t0)))

    sweepTA = newSweep
    cc = []
    for c in sweep.columns:
        cc.append(c)
    sweepTA = sweepTA.drop(cc, axis=1)
    t0 = time.time()
    for c in sweepTA.columns:
        sweep[c] = 0
    t1 = time.time()
    for idx, row in sweepTA.iterrows():
        print("{}, last day took {}".format(idx, get_time(t1)))
        t1 = time.time()
        for c in sweepTA.columns:
            sweep[c] = np.where(sweep.index.date == idx, row[c], sweep[c])

    print("done, it took: {}".format(get_time(t0)))

    # return newSweep, sweep_dates
    return sweep

Technical analysis library preparation

6.4.7 Group sweep

def group_sweep(sweep, sweep_dates, minutes=10):
    freq = str(minutes) + 'Min'
    t1 = time.time()
    i = 0

    all_shorts = []
    for d in sweep_dates:
        t0 = time.time()
        i += 1
        tmp = sweep[sweep.index.date == d]

        # Try 60 min instead of 10 min
        m10 = tmp.groupby(pd.Grouper(freq=freq))

        minutes = []
        for m in m10:
            t = m[1]
            t['Low'] = t.price.min()
            t['High'] = t.price.max()
            try:
                t['Close'] = t.tail(1).price.item()
            except:
                t['Close'] = t.price.max()
            t['Volume'] = t.quant.cumsum()
            try:
                t = do_ta(t)
            except:
                pass
            t.fillna(t.shift(1), axis=0, inplace=True)

            minutes.append(t.tail(1))

        sweepShort = pd.concat(minutes)
        all_shorts.append(sweepShort)

        print("{} done, took: {}".format(d, get_time(t0)))
    sweepSweep = pd.concat(all_shorts)

    print("sweepSweep ready, took: {}".format(get_time(t1)))
    return sweepSweep


from get_ta import do_ta, plot_ta, ta_dec, ta_type, ta_cat

"""TODO: decide whether to use newSweep or the tail helper function inside do_ta"""
# sweepTA, sweep_dates = do_ta(sweep)
sweep_orig_columns = sweep.columns
sweep = do_ta(sweep)
sweep

Since the technical analysis (ta) library uses daily values to update its indicators, we created this group_sweep function to adapt the library for intraday use. It groups the market data into short time frames and runs the ta library on those short periods as if they were daily values. This lets us see whether the tweets change the intraday trends in those indicators.
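A minimal sketch of the grouping step, on synthetic tick data (the column names `price` and `quant` follow the listing; the data itself is made up):

```python
import numpy as np
import pandas as pd

# One hour of synthetic minute ticks
idx = pd.date_range("2021-03-01 09:30", periods=60, freq="1Min")
ticks = pd.DataFrame({"price": np.linspace(100.0, 101.0, 60), "quant": 1.0}, index=idx)

# Group into 10-minute buckets and build OHLCV-style bars, as group_sweep does
bars = []
for _, chunk in ticks.groupby(pd.Grouper(freq="10Min")):
    bars.append({
        "Low": chunk.price.min(),
        "High": chunk.price.max(),
        "Close": chunk.price.iloc[-1],
        "Volume": chunk.quant.sum(),
    })
bars = pd.DataFrame(bars)
print(len(bars))  # 6 ten-minute bars from 60 minutes
```

Feeding each such short bucket to the ta library as if it were a day of data is what makes the indicators respond intraday.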

6.4.8 Classification metrics

def f_score(precision, recall):
    if precision > 0 or recall > 0:
        return 2 * (precision * recall) / (precision + recall)
    else:
        return 0


def MCC(TP, TN, FP, FN):
    if (TP + FP) > 0 and (TP + FN) > 0 and (TN + FP) > 0 and (TN + FN) > 0:
        return (TP * TN - FP * FN) / (math.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)))
    else:
        return 0

def Youden(TP, TN, FP, FN):
    former = 0
    later = 0
    if (TP + FN) > 0:
        former = TP / (TP + FN)
    if (TN + FP) > 0:
        later = TN / (TN + FP)
    return max(0, former + later - 1)


def C_Kappa(TP, TN, FP, FN):
    if (TP + TN + FP + FN) > 0:
        p_yes = (TP + FP) / (TP + TN + FP + FN) * (TP + FN) / (TP + FP + FN + TN)
        p_no = (FN + TN) / (TP + TN + FP + FN) * (FP + TN) / (TP + FP + FN + TN)
        p_e = p_yes + p_no
        p_0 = (TP + TN) / (TP + TN + FP + FN)
        if (1 - p_e) > 0:
            return (p_0 - p_e) / (1 - p_e)
        else:
            print("1 - p_e == 0")
            return 0
    else:
        print("TP+TN+FP+FN == 0")
        return 0


df['val_f_score'] = df.apply(lambda x: f_score(x.val_precision, x.val_recall), axis=1)
df['f_score'] = df.apply(lambda x: f_score(x.precision, x.recall), axis=1)
df['val_MCC'] = df.apply(lambda x: MCC(x.val_tp, x.val_tn, x.val_fp, x.val_fn), axis=1)
df['MCC'] = df.apply(lambda x: MCC(x.tp, x.tn, x.fp, x.fn), axis=1)
df['val_Youden'] = df.apply(lambda x: Youden(x.val_tp, x.val_tn, x.val_fp, x.val_fn), axis=1)
df['Youden'] = df.apply(lambda x: Youden(x.tp, x.tn, x.fp, x.fn), axis=1)
df['val_C_kappa'] = df.apply(lambda x: C_Kappa(x.val_tp, x.val_tn, x.val_fp, x.val_fn), axis=1)
df['C_kappa'] = df.apply(lambda x: C_Kappa(x.tp, x.tn, x.fp, x.fn), axis=1)

All of the classification metrics were calculated both for the training values and the validation values.
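As a sanity check, the formulas above can be evaluated on a small hypothetical confusion matrix:

```python
import math

# Hypothetical confusion matrix
TP, TN, FP, FN = 40, 45, 5, 10

precision = TP / (TP + FP)
recall = TP / (TP + FN)
f = 2 * precision * recall / (precision + recall)
mcc = (TP * TN - FP * FN) / math.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
youden = TP / (TP + FN) + TN / (TN + FP) - 1
print(round(f, 3), round(mcc, 3), round(youden, 3))
```

Unlike accuracy, MCC and Youden's J stay informative on the imbalanced datasets used here, which is why the listing computes all of them.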

6.4.9 Regularize XY

"""
At first we sum up each row (how many metrics were impacted) and
extract that column to the dictionary df_total
"""
df_total = {}
for k in keys:
    total = df_y_train[k].sum(axis=1)
    df_y_train[k]['total'] = total
    df_total[k] = df_y_train[k][['total']]
    df_total[k] = df_total[k][df_total[k]['total'] > 0]

for k in keys:
    print("\nRegularizing key: {}".format(k))
    t0 = time.time()

    a = len(df_total[k])
    b = len(df_x_train[k])
    c = a / b
    d = 1 / c

    """
    If I want an even dataset I need
        0.5 = x*a / (b + x*a)
        -> x = b/a
    """
    min_impacts = 7  # Min number of metrics the tweet had an impact on
    tmp = df_total[k][df_total[k]['total'] >= min_impacts]
    a = len(tmp)
    x = round(b / a)
    print(x)
    df = df_x_train[k].copy()
    dfy = df_y_trainB[k].copy()
    print("a: {} | b: {} | x: {} | len df: {} | len dfy: {}".format(a, b, x, len(df), len(dfy)))
    for i in range(len(tmp)):
        # value = df_total[keys[key]].iloc[i].item()
        value = x

        test_index = tmp.index[i]
        tweet = df_x_train[k][df_x_train[k].index == test_index]
        y = df_y_trainB[k][df_y_trainB[k].index == test_index]
        for _ in range(value):
            # Here I'm testing if this will prevent df and dfy from growing
            # out of control. THIS worked!
            tweet_tmp = tweet.copy().head(1)
            y_tmp = y.copy().head(1)
            y_tmp['y_train'] = 1  # Just making sure we are evening the dataset correctly
            df = pd.concat([df, tweet_tmp]).sort_index()
            dfy = pd.concat([dfy, y_tmp]).sort_index()

    dfy.to_hdf("{}y_trainB_regularize.h5".format(path_name), key=k, mode="a", complevel=9, complib="blosc", append=True, format="table")
    df.to_hdf("{}x_train_regularize.h5".format(path_name), key=k, mode="a", complevel=9, complib="blosc", append=True, format="table")
    print("a: {} | b: {} | x: {} | len df: {} | len dfy: {}".format(a, b, x, len(df), len(dfy)))

    print("done, it took: {}".format(get_time(t0)))

Data augmentation of the XY values: the tweets that impacted the market the most are duplicated to even out the datasets.
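The augmentation can be illustrated on a toy label vector. The duplication factor x = b/a follows the derivation in the listing; the data below is made up.

```python
import pandas as pd

# Toy labels: 2 positives out of 10 rows
y = pd.Series([1, 1] + [0] * 8)
b = len(y)               # total rows
a = int((y == 1).sum())  # impactful rows
x = round(b / a)         # duplication factor from 0.5 = x*a/(b + x*a) -> x = b/a

# Append x - 1 extra copies of each positive so that positives total x*a
balanced = pd.concat([y] + [y[y == 1]] * (x - 1), ignore_index=True)
print(int((balanced == 1).sum()), int((balanced == 0).sum()))  # 10 8
```

The result is close to a 50/50 split, which is what makes the class weights and output bias of the next section come out near neutral on the regularized datasets.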

6.4.10 Class weights and output biases

class_weights = {}
output_biases = {}
pos = {}
neg = {}
i = 0

for k in keys:
    df = df_y_trainB[k]
    P = df[df['y_train'] == 1]
    N = df[df['y_train'] == 0]
    print("i: {} | {}: pos|neg: {}|{}".format(i, k, len(P), len(N)))
    p = len(P)
    n = len(N)
    t = p + n
    pos[k] = p
    neg[k] = n
    # class_weights[k] = {0: 1, 1: 1/(p/t)}
    """Could the class_weight calculation above explain the bad results?"""
    """Got better results using the class_weight calculation below"""
    class_weights[k] = {0: 1/n, 1: 1/p}
    """Note: the regularized datasets give output_biases around 0"""
    output_biases[k] = np.log(p / n)
    i += 1

Calculating the class_weights and output_biases for each dataset.
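A self-contained sketch of the same calculation on a hypothetical 1:9 imbalanced label vector:

```python
import numpy as np

labels = np.array([1] * 100 + [0] * 900)
p = int(labels.sum())
n = len(labels) - p

# Inverse-frequency class weights, as in the listing
class_weight = {0: 1 / n, 1: 1 / p}

# Initial output bias so the sigmoid starts at the base rate p/(p+n)
output_bias = np.log(p / n)
print(round(float(output_bias), 3))  # log(100/900) = -2.197
```

Initializing the output bias this way means the untrained network already predicts the base rate, so the first epochs are not spent learning the class imbalance.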

6.4.11 Get sentences

def get_sentences(tweets, get_type="list"):
    sentences = []
    for s in tweets:
        if type(s) != str:
            s = s[0]
        # prepare regex for char filtering
        re_punc = re.compile('[%s]' % re.escape(string.punctuation))
        tokens = s.split()

        # remove punctuation from each word
        tokens = [re_punc.sub('', w) for w in tokens]
        # remove remaining tokens that are not alphabetic
        tokens = [word for word in tokens if word.isalpha()]

        if get_type == "list":
            sentences.append(tokens)
        if get_type == "str":
            sen = ""
            for t in tokens:
                sen += t + " "
            sentences.append(sen)

    return sentences

Cleaning the tweets for the tokenizer.
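Applied to one hypothetical headline, the cleaning steps above strip punctuation and drop non-alphabetic tokens, which also removes mangled URL remnants:

```python
import re
import string

tweet = "Powell says: rates on hold, QE continues! https://t.co/abc123"
re_punc = re.compile('[%s]' % re.escape(string.punctuation))
tokens = [re_punc.sub('', w) for w in tweet.split()]
tokens = [w for w in tokens if w.isalpha()]
print(tokens)  # ['Powell', 'says', 'rates', 'on', 'hold', 'QE', 'continues']
```

The stripped URL becomes `httpstcoabc123`, which fails `isalpha()` because of the digits and is therefore discarded.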

6.4.12 Plot history

def plot_hist3(history):
    pairs = []
    for k in history.keys():
        for h in history.keys():
            if k in h and k != h:
                pairs.append([k, h])
    print(pairs)
    df = pd.DataFrame(data=history)
    for p in pairs:
        df[p[0]].plot()
        df[p[1]].plot()
        plt.legend()
        plt.show()

Helper function that automatically plots each metric together with its corresponding validation value.
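The pairing relies on each metric name being a substring of its `val_` counterpart. A sketch on a hypothetical history dict:

```python
# Each training metric pairs with its validation twin because
# e.g. 'auc' is a substring of 'val_auc'
history = {'auc': [0.7, 0.8], 'val_auc': [0.6, 0.7],
           'loss': [0.5, 0.4], 'val_loss': [0.6, 0.5]}
pairs = []
for k in history:
    for h in history:
        if k in h and k != h:
            pairs.append([k, h])
print(pairs)  # [['auc', 'val_auc'], ['loss', 'val_loss']]
```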

6.4.13 Split XY

def split_XY(X, Y, do_categorical=True, do_shuffle=False):
    split_index = int(0.8 * len(X))

    if do_shuffle:
        X, Y = sklearn.utils.shuffle(X, Y)

    x_train = X[:split_index]
    x_test = X[split_index:]
    y_train = Y[:split_index]
    y_test = Y[split_index:]
    y_train = np.array(y_train)
    y_test = np.array(y_test)

    if do_categorical:
        if not type(y_train[0]) == list:
            y_train = to_categorical(y_train, 2)
            y_test = to_categorical(y_test, 2)

    return x_train.astype(np.float32), x_test.astype(np.float32), y_train.astype(np.float32), y_test.astype(np.float32), split_index

Splits the data into training and validation sets.
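A minimal sketch of the 80/20 chronological split on made-up arrays:

```python
import numpy as np

X = np.arange(10)
Y = np.arange(10) % 2
split_index = int(0.8 * len(X))
x_train, x_test = X[:split_index], X[split_index:]
y_train, y_test = Y[:split_index], Y[split_index:]
print(len(x_train), len(x_test))  # 8 2
```

Because the split is positional, leaving `do_shuffle=False` keeps the validation set strictly later in time than the training set, which matters for time-series data like tweets.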

6.4.14 Define model

configs = (
    opt,
    METRICS,
    DROPOUT,
    do_bidirectional,
    do_embedding_matrix,
    do_categorical,
    do_batch_norm,
    do_output_bias,
    plot_summary,
    L2,
)


def define_model(*configs):

    tf.keras.backend.clear_session()

    model = Sequential()

    n_features = length
    if do_embedding_matrix:
        e = Embedding(vocab_size, output_dim=hidden_states, weights=[embedding_matrix],
                      input_length=x_train.shape[1], trainable=False, mask_zero=True)
    else:
        e = Embedding(vocab_size, output_dim=hidden_states,
                      input_length=x_train.shape[1], trainable=False, mask_zero=True)
    model.add(e)
    if do_bidirectional:
        if L2 > 0:
            model.add(Bidirectional(LSTM(hidden_states,
                                         kernel_regularizer=l2(L2),
                                         recurrent_regularizer=l2(L2),
                                         bias_regularizer=l2(L2),
                                         return_sequences=True)))
        else:
            model.add(Bidirectional(LSTM(hidden_states, return_sequences=True)))
    else:
        if L2 > 0:
            model.add(LSTM(hidden_states,
                           kernel_regularizer=l2(L2),
                           recurrent_regularizer=l2(L2),
                           bias_regularizer=l2(L2),
                           return_sequences=True))
        else:
            model.add(LSTM(hidden_states, return_sequences=True))
    if do_batch_norm:
        model.add(BatchNormalization())
    model.add(Dropout(DROPOUT))
    if do_bidirectional:
        model.add(Bidirectional(LSTM(32)))
    else:
        model.add(LSTM(32))
    model.add(Dropout(DROPOUT))
    model.add(Dense(64, activation="relu"))
    if do_batch_norm:
        model.add(BatchNormalization())
    model.add(Dense(16, activation="relu"))
    output_bias = None
    if do_output_bias:

        output_bias = output_biases[keys[key]]
        output_bias = np.log(pos[keys[key]] / neg[keys[key]])
        print("___\noutput_bias: {}\n___".format(output_bias))
        output_bias = tf.keras.initializers.Constant(output_bias)

    if do_categorical:
        model.add(Dense(2, activation="sigmoid", bias_initializer=output_bias))
    else:
        model.add(Dense(1, activation="sigmoid", bias_initializer=output_bias))

    model.compile(optimizer=opt, loss='binary_crossentropy', metrics=METRICS)

    if plot_summary:
        model.summary()
        plot_model(model, to_file="model.png", show_shapes=True)
    return model

define_model takes in a tuple of configs and builds the (optionally bidirectional) LSTM classifier.

6.4.15 Fit model

monitor = 'val_auc'
verbose = 1
early_stopping = tf.keras.callbacks.EarlyStopping(monitor=monitor, verbose=verbose,
                                                  patience=10, mode='max',
                                                  restore_best_weights=True)


def fit_model(model, *fit_configs):
    history = model.fit(x_train, y_train,
                        batch_size=BATCH_SIZE,
                        epochs=epochs,
                        verbose=verbose,
                        callbacks=[early_stopping],
                        # callbacks=[checkpoint, early_stopping, tensorBoard],
                        validation_data=(x_test, y_test),
                        class_weight=class_weight)

    return model, history

fit_model takes in a Keras TensorFlow model and a tuple of configs.

6.4.16 Grid searching configurations

fit_results = []
verbose = 1
optimizers = [Adam(), SGD(), RMSprop(), Adagrad(), Nadam(), Adamax()]
for i in range(len(optimizers)):
    for lr in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]:
        for DROPOUT in [0.1, 0.2, 0.5, 0.7]:
            for do_embedding_matrix in [True, False]:
                for do_batch_norm in [True, False]:
                    for do_output_bias in [True, False]:
                        for do_bidirectional in [True, False]:
                            # for monitor in ['val_auc', 'val_recall', 'val_precision']:
                            for monitor in ['val_auc']:
                                for L2 in [0, 1e-3, 1e-5]:
                                    print("Optimizer: {}\nlearning_rate: {}, L2: {} | monitor: {}".format(
                                        optimizers[i]._name, lr, L2, monitor))
                                    print("Dropout: {} | do_embedding_matrix: {} | do_batch_norm: {} | do_output_bias: {} | do_bidirectional: {}".format(
                                        DROPOUT, do_embedding_matrix, do_batch_norm, do_output_bias, do_bidirectional))
                                    opt = optimizers[i]
                                    opt.learning_rate.assign(lr)

                                    early_stopping = tf.keras.callbacks.EarlyStopping(
                                        monitor=monitor, verbose=verbose, patience=30,
                                        mode='max', restore_best_weights=True)

                                    configs = (opt, METRICS, DROPOUT, do_bidirectional,
                                               do_embedding_matrix, do_categorical,
                                               do_batch_norm, do_output_bias,
                                               plot_summary, L2)

                                    configs_to_save = (i, lr, DROPOUT, do_bidirectional,
                                                       do_embedding_matrix, do_categorical,
                                                       do_batch_norm, do_output_bias)

                                    fit_configs = (x_train, y_train, BATCH_SIZE, epochs,
                                                   # checkpoint,
                                                   early_stopping,
                                                   # tensorBoard,
                                                   class_weight, verbose)

                                    tf.keras.backend.clear_session()
                                    t1 = time.time()

                                    model = define_model(configs)
                                    model, history = fit_model(model, fit_configs)
                                    print("early_stopping epoch: {}{} ______{}: {}".format(
                                        "_" * (int(early_stopping.stopped_epoch) - 9),
                                        early_stopping.stopped_epoch, monitor,
                                        history.history[monitor][-1]))
                                    predict_accuracy = predict_sample(model, tokenizer, keys[key])

                                    if early_stopping.stopped_epoch == 0:
                                        plot_hist2(history)
                                        predict_sample(model, tokenizer, keys[key])

                                    fit_results.append([configs_to_save,
                                                        early_stopping.stopped_epoch,
                                                        history.history,
                                                        predict_accuracy])
                                    print("Done, it took: {}".format(get_time(t0)))

print("\n______\nAll Done, it took: {}".format(get_time(t0)))

Grid searching different configurations.
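The nested loops above enumerate the Cartesian product of the hyperparameter settings; the same search space can be expressed compactly with itertools.product. The values below are a small illustrative subset of the grid.

```python
import itertools

learning_rates = [1e-3, 1e-4]
dropouts = [0.2, 0.5]
l2s = [0, 1e-5]

grid = list(itertools.product(learning_rates, dropouts, l2s))
print(len(grid))  # 2 * 2 * 2 = 8 configurations
```

The full grid in the listing multiplies out to several thousand configurations, which is why early stopping with a large patience is used to keep each run short.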

6.4.17 Predict sample

def predict_sample(model, tokenizer, k, print_results=True, print_predictions=False):
    predictions_by_type = {
        'random': [],
        'positive': [],
        'negative': [],
        'fake_positive': [],
        'fake_negative': [],
    }
    predictions = []
    y_hats = []
    correct_false = []
    if print_results:

        print("______-=== Random picks ====-______")

    """
    I'm using percentiles instead of fixed values to get a more even
    distribution on datasets that vary in size, also preventing
    out-of-bounds errors on very small datasets.
    """
    ranges = range(len(df_x_train[k]))
    random_tests = []
    for i in range(1, 100):
        random_tests.append(int(np.percentile(ranges, i)))

    for i in random_tests:
        orig = df_x_train[k].values[i]
        y_hat = df_y_trainB[k].values[i]
        to_predict = tokenizer.texts_to_sequences(orig)
        to_predict = pad_sequences(to_predict, maxlen=length, padding='post')
        pred = round(model.predict(to_predict)[0][0])
        predictions.append(pred)
        y_hats.append(y_hat)
        if print_predictions:
            print(orig)
            print(pred)
        if y_hat == pred:
            correct_false.append(1)
            predictions_by_type['random'].append(1)
            if print_predictions:
                print("Correct!!!")
        else:
            correct_false.append(0)
            predictions_by_type['random'].append(0)
            if print_predictions:
                print("swing and a miss :/")

    predict_positive_values = 10
    if print_results:
        print("______-=== {} positive tweets ====-______".format(predict_positive_values))
    j = 0
    prev_orig = ""
    for i in range(len(df_y_trainB[k])):
        if df_y_trainB[k].iloc[i].item() == 1:
            orig = df_x_train[k].values[i]
            y_hat = df_y_trainB[k].values[i]
            if orig != prev_orig:
                to_predict = tokenizer.texts_to_sequences(orig)
                to_predict = pad_sequences(to_predict, maxlen=length, padding='post')
                pred = round(model.predict(to_predict)[0][0])
                predictions.append(pred)
                y_hats.append(y_hat)
                if print_predictions:
                    print(orig)
                    print(pred)
                if y_hat == pred:
                    correct_false.append(1)
                    predictions_by_type['positive'].append(1)
                    if print_predictions:
                        print("Correct!!!")
                else:
                    correct_false.append(0)
                    predictions_by_type['positive'].append(0)
                    if print_predictions:
                        print("swing and a miss :/")
            prev_orig = orig
            j += 1
            if j >= predict_positive_values:
                break

    if print_results:
        print("______-=== {} negative tweets ====-______".format(predict_positive_values))
    j = 0
    prev_orig = ""
    for i in range(len(df_y_trainB[k])):
        if df_y_trainB[k].iloc[i].item() == 0:
            orig = df_x_train[k].values[i]
            y_hat = df_y_trainB[k].values[i]
            if orig != prev_orig:
                to_predict = tokenizer.texts_to_sequences(orig)
                to_predict = pad_sequences(to_predict, maxlen=length, padding='post')
                pred = round(model.predict(to_predict)[0][0])
                predictions.append(pred)
                y_hats.append(y_hat)
                if print_predictions:
                    print(orig)
                    print(pred)
                if y_hat == pred:
                    correct_false.append(1)
                    predictions_by_type['negative'].append(1)
                    if print_predictions:
                        print("Correct!!!")
                else:
                    correct_false.append(0)
                    predictions_by_type['negative'].append(0)
                    if print_predictions:
                        print("swing and a miss :/")
            prev_orig = orig
            j += 1
            if j >= predict_positive_values:
                break

    """TODO: more custom tweets that should impact the market and should not impact the market"""
    if print_results:
        print("______-=== Should pack impact ====-______")
    fake_positive_tweets = [
        'trump fires powell says raises rates and more qe',
        'trump says will increase tariffs on china',
        'trump decleares war',
        'powell says will raise rates now',
        'powell will increase asset purchase',
    ]
    for orig in fake_positive_tweets:
        to_predict = tokenizer.texts_to_sequences(orig)
        to_predict = pad_sequences(to_predict, maxlen=length, padding='post')
        pred = round(model.predict(to_predict)[0][0])
        predictions.append(pred)
        y_hats.append(1)
        if print_predictions:
            print(orig)
            print(pred)
        if 1 == pred:
            correct_false.append(1)
            predictions_by_type['fake_positive'].append(1)
            if print_predictions:
                print("Correct!!!")
        else:
            correct_false.append(0)
            predictions_by_type['fake_positive'].append(0)
            if print_predictions:
                print("swing and a miss :/")

    if print_results:
        print("______-=== Should not pack impact ====-______")
    fake_negative_tweets = [
        'puppies are so cute',
        'john loves her',
        'these jeans match that outfit',
        'market is stable',
        'should we buy now',
    ]
    for orig in fake_negative_tweets:
        to_predict = tokenizer.texts_to_sequences(orig)
        to_predict = pad_sequences(to_predict, maxlen=length, padding='post')
        pred = round(model.predict(to_predict)[0][0])
        predictions.append(pred)
        y_hats.append(0)
        if print_predictions:
            print(orig)
            print(pred)
        if 0 == pred:
            correct_false.append(1)
            predictions_by_type['fake_negative'].append(1)
            if print_predictions:
                print("Correct!!!")
        else:
            correct_false.append(0)
            predictions_by_type['fake_negative'].append(0)
            if print_predictions:
                print("swing and a miss :/")

    accuracy = {}
    for key in predictions_by_type:
        if "_" in key:
            if print_results:
                print("Accuracy for {}:\t{}%".format(
                    key, int(100 * sum(predictions_by_type[key]) / float(len(predictions_by_type[key])))))
        else:
            if print_results:
                print("Accuracy for {}:\t\t{}%".format(
                    key, int(100 * sum(predictions_by_type[key]) / float(len(predictions_by_type[key])))))
        accuracy[key] = int(100 * sum(predictions_by_type[key]) / float(len(predictions_by_type[key])))
    if print_results:
        print("\nTotal model accuracy: {}%".format(int(100 * sum(correct_false) / float(len(correct_false)))))
    accuracy['total'] = int(100 * sum(correct_false) / float(len(correct_false)))
    return accuracy

Predict sample takes around 100 semi-random samples spread evenly across the dataset using percentiles of its length. It then predicts tweets that we know impacted the market, tweets that we know did not, made-up tweets that should impact the market, and made-up tweets that should not. It returns a dictionary with an overview of how the model did in each category and in total.
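The percentile-based sampling keeps the picks spread evenly across datasets of any size; a sketch on a hypothetical 1000-row dataset:

```python
import numpy as np

ranges = range(1000)
random_tests = [int(np.percentile(ranges, i)) for i in range(1, 100)]
print(len(random_tests))  # 99 indices, spread evenly across the dataset
```

Because percentiles are computed over the actual index range, the resulting indices can never fall out of bounds, even on very small datasets.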
