DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS STOCKHOLM, SWEDEN 2020

@TheRealDonaldTrump’s tweets correlation with stock market volatility

ISAK OLOFSSON

KTH SCHOOL OF ENGINEERING SCIENCES


Degree Projects in Applied Mathematics and Industrial Economics (15 hp)
Degree Programme in Industrial Engineering and Management (300 hp)
KTH Royal Institute of Technology, year 2020
Supervisor at KTH: Alessandro Mastrototaro
Examiner at KTH: Sigrid Källblad Nordin

TRITA-SCI-GRU 2020:116 MAT-K 2020:017

Royal Institute of Technology
School of Engineering Sciences
KTH SCI
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci

Abstract

The purpose of this study is to analyze whether there is any tweet-specific data posted by Donald Trump that has a correlation with the volatility of the stock market. If any details about president Trump’s tweets show correlation with the volatility, the goal is to find a subset of regressors with as high predictability as possible. The content of tweets is used as the base for regressors.

The method which has been used is a multiple linear regression with tweet and volatility data ranging from 2010 until 2020. As a measure of volatility, the Cboe VIX has been used, and the regressors in the model have focused on the content of tweets posted by Trump, using TF-IDF to evaluate the content of tweets.

The results from the study imply that the chosen regressors display a small but significant correlation, with an adjusted R² = 0.4501, between Trump’s tweets and the market volatility. The findings include 78 words that correlate with stock market volatility when part of president Trump’s tweets. The stock market is a large and complex system of many unknowns, which aggravates the process of simplifying and quantifying data from only one source into a regression model with high predictability.


Sammanfattning

The purpose of this study is to analyze whether there are any specific properties of the tweets published by Donald Trump that correlate with the volatility of the stock market. If properties of president Trump’s tweets show a relationship with the volatility, the goal is to find a subset of regressors that describes the relationship with as high significance as possible. The content of the tweets has been the focus and has been used as regressors.

The method used is a multiple linear regression with tweet and volatility data ranging from 2010 to 2020. As a measure of volatility the Cboe VIX has been used, and the regressors in the model have focused on the content of the tweets, where TF-IDF has been used to transform words into numerical values.

The results from the study show that the chosen regressors display a small but significant correlation, with an adjusted R² = 0.4501, between Trump’s tweets and market volatility. The results include 78 words that, when part of president Trump’s tweets, show a significant correlation with stock market volatility. The stock market is a large and complex system with many unknowns, which complicates the process of simplifying and quantifying data from only one source into a regression model with high predictability.


Contents

1 Introduction
  1.1 Background
  1.2 Purpose and Problem Statement
  1.3 Earlier research
    1.3.1 Volfefe Index
    1.3.2 Stock Price Expectations and Stock Trading
    1.3.3 Twitter mood predicts the stock market

2 Economical Theory of the Study
  2.1 The financial market
    2.1.1 The efficient market hypothesis
    2.1.2 The stock market
    2.1.3 News’ impact on the financial market
    2.1.4 Volatility and Cboe VIX Index
  2.2 Twitter and Sentiment Analysis

3 Mathematical Theory of the Study
  3.1 Multiple Linear Regression
    3.1.1 Assumptions of the linear regression model
    3.1.2 Ordinary Least Squares estimation
    3.1.3 Indicator variables
    3.1.4 Residual Analysis
  3.2 Model assessment and verification
    3.2.1 Leveraged and Influential points
    3.2.2 Multicollinearity
    3.2.3 Methods for dealing with multicollinearity
    3.2.4 Variable Selection
    3.2.5 Mallows Cp
  3.3 Quantitative Selection
    3.3.1 Selection using TF-IDF
    3.3.2 Stemming
  3.4 Transformation
    3.4.1 Box-Cox Transformation

4 Methodology
  4.1 Data Gathering
  4.2 General transformation of data points
    4.2.1 Transformation of Volatility
    4.2.2 Transformation of dates
  4.3 Initial models
    4.3.1 Model 1 - statistics of tweets
    4.3.2 Model 2 - Words from Volfefe Index
  4.4 Regression Model
    4.4.1 Data selection using TF-IDF
    4.4.2 Variable selection using Forward Selection
    4.4.3 Regression

5 Results
  5.1 Findings
    5.1.1 Interpretation
    5.1.2 Top regressors
  5.2 Residual analysis

6 Discussion
  6.1 Analysis of results
  6.2 Limitations
  6.3 Conclusion
  6.4 Further studies

1 Introduction

1.1 Background

On June 16, 2015, Donald Trump, a controversial and well-known figure, at least in the United States, announced his intention to run for president of the United States of America as the Republican Party’s candidate. The day before, Donald Trump’s account @realdonaldtrump had just under three million followers on the social media platform Twitter, reading the twenty-three thousand tweets he had posted, not including retweets. By the start of 2020 his audience had reached sixty-eight million accounts and his tweet legacy amounted to forty-one thousand tweets, again not counting retweets. During this period Donald Trump transitioned from being famous predominantly in the United States to becoming a household name worldwide. Soon ending his first term as president and just putting his reelection campaign into gear, with four more years in the White House as his target, his Twitter account continues to deliver daily tweets and replies. This form of direct communication from one of the world’s truly elite politicians is unprecedented.

Financial markets have always had a flavour of speculation to them, as the cycle of boom, bust, rinse and repeat has kept iterating ever since the seventeenth-century Tulip mania.[1] The motives behind these speculative moves are hard to formulate explicitly; someone able to do so would surely be able to retire rather quickly. That political decisions play their part to some degree is something that most, if not all, would agree on. Therefore, it is of interest to investigate the connection between president Trump’s tweets about his work and the financial market, which in this thesis will be done using a multiple regression analysis.

The market’s response to Trump’s tweeting will be measured with volatility. Volatility in the financial market is of high interest as it is used in pricing derivatives and therefore plays a great part in how financial markets move in the short-to-medium time span. For this thesis the daily closing price of the VIX index will be used.

1.2 Purpose and Problem Statement

The main purpose of this thesis is to investigate whether @realdonaldtrump’s tweets have an impact on market volatility, and if so, how much and in what way. This connection will be examined using a multiple linear regression model, with the characteristics of a day’s tweets as the regressors and the VIX day close for that day as the response variable.

However, we must first understand that president Trump tweets multiple times almost every day, and that very few of his tweets can be considered to influence market movements. Trump’s tweets vary considerably in content, which is a problem when performing the regression. Tweets concerning trade and monetary policy, which are more likely to affect the market, are prone to drown in the noise of misspelled and self-praising tweets. Therefore, this thesis will first deal with a selection of tweets in order to determine which tweets to include in the regression. Further, the connection between the tweets of significance and the market volatility will be quantified using a multiple linear regression.

This paper and its findings might be of interest to a large variety of people, including the general public and international traders investing in markets worldwide. Although this analysis is based on the American market, historical evidence implies that there, to a varying degree, exists a correlation between market returns around the world.[2] Specifically, given the methods used, the findings could be of interest for traders deploying quantitative trading models where volatility could determine size, timing and risk in potential trades.

Traditional model assessment and verification techniques for regression, such as analysis of residuals and multicollinearity, will be utilized. Also, a regressor indicating whether Donald Trump was president at the time a tweet was published will be included in the model, to investigate whether the impact of Trump’s tweets before and during his presidency varies.

An important demarcation to note is that the research is limited to the effect of Donald Trump’s tweets on market volatility on the same day as the tweet. The ambition is not that the findings will explain market volatility fully, but rather to examine which parameters of Donald Trump’s tweets correlate with and possibly impact market volatility.

1.3 Earlier research

1.3.1 Volfefe Index

The interest in President Trump’s controversial behavior on social media has been covered from many different angles. In September of 2019, the bank JPMorgan Chase created an index called the Volfefe Index, based on President Trump’s tweets.[4]

The Volfefe index was created to predict movements in treasury bonds, and in order to do this JP Morgan had to build an algorithm for assessing tweets. To do this, every tweet’s impact was categorised as significant or non-significant. Significant tweets were those that were followed by a move of ±0.25 basis points in 10-year Treasury yields within 5 minutes of trading from the publication of the tweet.

From the significant tweets, JP Morgan identified the 20 words most indicative of a market-moving tweet. Those words are, in order of decreasing significance:

1.
2. Billion
3. Products
4. Democrats
5. Great
6. Dollars
7. Tariffs
8. Country
9. Muller
10. Border
11. President
12. Congressman
13. People
14. Korea
15. Party
16. Years
17. Farmers
18. Going
19. Trade
20. Never

This research and its findings show that there, on a small time frame, exists a correlation between Donald Trump’s tweets and the financial market. Performing this classification of tweets is outside the scope of this project and will not be attempted. However, the 20 most influential words for market-moving tweets shared in the article by JP Morgan will be used.

1.3.2 Stock Price Expectations and Stock Trading

In a study from 2012, researchers from RAND published a paper for the National Bureau of Economic Research investigating stock price expectations in relation to market events. The findings from the paper suggest that, on average, subjective expectations of stock market behavior depend on stock price changes, meaning that past performance will influence future expectations on a stock. Moreover, stock trading responds to changes in expectations in a delayed manner, i.e. stock operators execute trades now even if the change in expectations occurred some time earlier. This implies that news impacts the market also after the time of publication, by building subjective momentum, but that the initial reaction to an event is of importance. Further, the paper also discusses the vast complexity behind market reactions and concludes that we still don’t fully understand how expectations on events are translated into action.[5]

1.3.3 Twitter mood predicts the stock market

Behavioral economics tells us that sentiment can profoundly influence decision-making and individual behavior. In a paper from 2011, researchers from Indiana University and the University of Manchester investigate whether this can be applied on a larger scale; that is, can societies have states of mood that affect collective decision making? In the paper the researchers use Twitter as a database of sentiment in society. More specifically, the paper investigates whether measurements of collective mood states derived from large-scale Twitter feeds are correlated to the value of the Dow Jones Industrial Average (DJIA). This is done using OpinionFinder and the Google-Profile of Mood States (GPOMS). The results from the study indicate that predictions of the DJIA can be significantly improved by including some specific public mood dimensions. The model presented in the paper is based on a Self-Organizing Fuzzy Neural Network and found an accuracy of 86.7% in predicting the daily up and down changes in the closing values of the DJIA, which, compared to methods not including the sentiment model, reduces the Mean Average Percentage Error by over 6%.[6]

Theoretical framework

2 Economical Theory of the Study

2.1 The financial market

2.1.1 The efficient market hypothesis

The efficient market hypothesis (EMH), made famous by Eugene Fama in the 1970s, is a theoretical concept used to model financial markets. EMH puts certain demands on a market and the pricing of securities listed on it. A financial market is said to be efficient if the prices of securities on the market fully reflect all available information at the time. By definition, the market is said to be efficient with respect to a set of information if that information, when revealed to all participants at the same time, would leave security prices unchanged. Having an efficient market with respect to some set of information, φ, implies that there exist no opportunities of arbitrage and that it is impossible to make economic profit trading on the already known information φ.[7]

2.1.2 The stock market

A stock market is a platform where buyers and sellers of stocks can meet and trade stocks. Each stock, also known as a share, traded on a stock market represents a piece of ownership in the company associated with the share. It is common to associate the term stock market with big companies listed on big and well-known stock exchanges such as NASDAQ or the Dow Jones. However, in 2020 there exist many other exchanges that facilitate the same function, but for other markets.

Historically, stock markets were physical places where people met and came to an agreement on the price and number of shares. This way of selling and buying stocks is today outdated, and instead almost all transactions occur via some form of digital platform. Different stocks trade on different stock markets. This division is partly due to practicality, addressing the implications of different time-zones and currencies around the world. However, there are also other factors that divide stocks into different markets, e.g. market value. Some stock exchanges only trade publicly listed shares, while other stock exchanges may also include securities that are privately traded. Exchanges are not only for equities but may also list other securities, for instance bonds or derivatives.[8]

The price change or movement of a stock is a function of supply and demand. Depending on the relative share of buyers and sellers the price is prone to fluctuate, and in a scenario where the number of sellers exceeds the number of buyers the price will decrease until it becomes low enough to encourage buyers, thereby increasing the demand. The current price per share is a function of all the current stock owners’ views on the company and the company’s future potential. The difficulty of a stock market is clearly to predict price movement, since present and historical data usually do not suffice to make accurate future projections. When predicting future prices, one must take every person’s sentiment about the company into calculation. Because of the complexity of this estimation, accurate predictions of future share prices are often seen as unattainable.[9]

From theoretical and empirical studies it is evident that the stock market has played a significant role within both advanced economies and emerging markets.[10] The stock market sentiment is in a way a reflection of the larger economic sentiment in society, with government policies, professional and recreational investors, companies and media all playing their own role in the stock market. All these institutions are ultimately controlled by humans, who are known to not always act rationally. The mood of market participants at a given point in time is referred to as market psychology. Emotions, including greed, fear, expectations, and circumstances are all factors that can contribute to market psychology at any time. As early as 1936, John Maynard Keynes described how these sentiments in society can trigger periods of “risk-on” and “risk-off”.[11] Conventional financial theory, mainly EMH, fails to explain the emotion involved in investing and how it contributes to irrational behavior. In other words, theories of market psychology are in conflict with the belief that markets are rational, when in reality they never fully are. This aspect of market psychology further adds to the complexity of predicting the performance of individual stocks and markets based on fundamental facts.[12]

2.1.3 News’ impact on the financial market

The stock market is driven by and relies on new information being unveiled. As part of EMH, expected news is priced into the price of a stock or an index, while unexpected news is not. Stock price movement depends on the constant change in supply and demand, making this relationship highly sensitive to the news of the moment. That said, anticipation might already have priced in an expected event even before it is published. On the other hand, unexpected news disclosing something new and not priced in must first be interpreted, making chasing the news a tricky strategy for trading.[13]

Financial markets never rest and constantly react to new information, making it even more difficult to isolate which event resulted in which price movement. Generally, indicators of general economic news are found to be better than firm-specific news when predicting price changes on the stock market.[14]

Nevertheless, in a study from early 2017, not long after Trump’s presidential inauguration, researchers from Harvard and the University of Zurich published a paper trying to model asset price responses to unexpected news in and around the election. On the morning of election day, Donald Trump was a fairly unlikely winner, with betting services pricing the chance of Trump being elected at between 18-27%. When Trump, to a lot of people’s surprise, won the election, markets reacted quickly. In the paper, the price Pn and return Rn are modeled around the presidential election of 2016, but in theory the model can be applied to all events with expectations on the outcome. Given two outcomes X and Y with probabilities πX and πY respectively, the current price before the event is given by

Pn = πX Pn,X + πY Pn,Y

Where Pn,X and Pn,Y are the expected prices given outcome X or Y. The expected return given outcome X then becomes

Rn = (Pn,X − Pn) / Pn

This is clearly a straightforward model including expectations and the outcome of an event. Using this model, the researchers found that individual stock price reactions to the election reflect the unexpected change in investor expectations on economic growth, taxes, and trade policy. More specifically, the market reacted quickly to the expected consequences of the election for US growth and tax policy, while it took the market longer to incorporate the consequences of shifts in trade policy. By evaluating the impacts of different news over a ten-day period after the election, the researchers found that the one-day responses varied between about 30-80% of their ten-day responses. This implies that the stock market reacts differently to different events and headlines; sometimes the implications of news are straightforward to interpret, while other times the effects of a headline are more cumbersome to assess.[15]
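As a minimal sketch of this two-outcome pricing model, the following Python snippet computes Pn and Rn for hypothetical values (the numbers are illustrative, not taken from the paper):

```python
# Hypothetical two-outcome pricing sketch; all values are illustrative only.
def expected_price(pi_x: float, p_x: float, p_y: float) -> float:
    """Pre-event price P_n = pi_X * P_{n,X} + pi_Y * P_{n,Y}, with pi_Y = 1 - pi_X."""
    return pi_x * p_x + (1.0 - pi_x) * p_y

def event_return(p_outcome: float, p_n: float) -> float:
    """Return R_n = (P_{n,outcome} - P_n) / P_n realized if that outcome occurs."""
    return (p_outcome - p_n) / p_n

# Suppose outcome X is priced at 25% probability, with conditional prices 90 and 110.
p_n = expected_price(0.25, 90.0, 110.0)   # 105.0
r_x = event_return(90.0, p_n)             # about -14.3% if outcome X occurs
```

The pre-event price is simply the probability-weighted average of the conditional prices, so the surprise return on the unlikely outcome is large exactly when that outcome was priced as improbable.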

2.1.4 Volatility and Cboe VIX Index

Volatility in a stock market measures the frequency and magnitude of price changes, both for movements up and down. This applies to all traded financial instruments during a certain period of time. The more dramatic the price fluctuations in an instrument, the larger the volatility. The volatility is defined and measured either using historical prices, called realized volatility, or as a measurement of implied volatility derived from option prices.[16]
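As an illustrative sketch of the realized-volatility alternative (the thesis itself uses the implied-volatility VIX, and the prices below are made up), historical volatility can be computed from daily closes:

```python
import numpy as np

# Realized (historical) volatility from daily closing prices, annualized
# with the common convention of 252 trading days. Prices are made up.
closes = np.array([100.0, 101.5, 100.8, 102.2, 103.0, 101.9])
log_returns = np.diff(np.log(closes))                  # daily log returns
realized_vol = log_returns.std(ddof=1) * np.sqrt(252)  # annualized volatility
```

The sample standard deviation of log returns captures the magnitude of fluctuations regardless of their direction, matching the definition above.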

The VIX, which stands for Volatility Index, is an index introduced by the Chicago Board Options Exchange in 1993 that tries to capture the 30-day implied volatility of the underlying equity index. The VIX uses the latter of the two measures above and is therefore a measurement of expected future volatility.[17] Cboe continuously updates the VIX index values during trading hours. The VIX measures the implied volatility of the S&P 500, which is one of the most common equity indices and one that many consider to be one of the best representations of the U.S. stock market. The VIX can be seen as a leading indicator of investor attitudes and market volatility relating to the listed options upon which the index is based.[18]

There is a phenomenon called volatility asymmetry, which refers to volatility being higher in down markets than in up markets.[19] This means that volatility generally is low during longer periods of economic growth and high during economic recessions. As a result, trading volatility, either through options or special derivatives, can be used as a hedge against a downturn in the stock market.

2.2 Twitter and Sentiment Analysis

Twitter was founded in 2006 and is one of the first social media platforms launched that still exists in 2020. The platform was launched in San Francisco, California and is now an international microblogging and social networking service. On Twitter all users can post and interact with short statements or messages known as ”tweets”. The platform is also open to unregistered users, with the restriction that unregistered users are limited to reading. Originally, tweets were restricted to a maximum of 140 characters, but in November 2017 this restriction was increased to 280 characters. Twitter is accessible both through its website interface and through its mobile-device application software.[20]

The growth of social media and social networking sites has been exponential in the past decade for platforms such as Twitter and Facebook. This widespread phenomenon of social media raises the possibility of tracking the preferences of citizens in an unprecedented manner. At the end of 2019 Twitter averaged 152 million daily users.[21] The opinions and flow of information spreading instantaneously on Twitter represent a valuable source of data that can be useful for gauging the general sentiment on a topic.[22] This source of data comes with a complexity: analyzing emotions on social media is difficult due to non-standard linguistics, intensive use of slang, emojis and incorrect grammar, aspects that people have no problem understanding but that nevertheless are troublesome for models to interpret. Another concern is that sentiment analysis on social media such as Twitter assumes that the findings are representative of the entire population,[23] something that might not always be true since not everyone is connected to social media.

3 Mathematical Theory of the Study

This part will walk the reader through the more rigorous mathematical aspects of the study. Unless otherwise stated, the theory found in section 3 is extracted from Montgomery, D.C., Peck, E.A. and Vining, G.G. (2012).[24]

3.1 Multiple Linear Regression

The hypothesis is that the daily volatility can be explained using a multiple linear regression with the measurements supplied in the data set, as presented in the following linear model for predicting the VIX index

y = β0 + β1x1 + β2x2 + β3x3 + ... + βkxk + ε

The interpretation of this formula is that xi is a measurement of a tweet and the goal is to find the corresponding coefficient, βi, to be inserted into the model in order to produce the best estimate of the VIX value, here represented by y. With n observations and k covariates the model in matrix notation is described as follows

y = Xβ + ε, where

β  y  1 x ··· x  0 ε  1 11 1k β  1 y  1 x ··· x   1 ε   2   21 2k  β   2  y =  . ,X = . . . . ,β =  2, ε =  . .  .  . . .. .   .   .   .  . . .   .   .   .  yn 1 xn1 ··· xnk εn βk

Here βi explains by how much the VIX is expected to change for every unit change in the measurement xi, and β0 is the intercept of the model.

3.1.1 Assumptions of the linear regression model

In the study of regression analysis, a number of major assumptions are stated. In order for the regression model to be valid these assumptions must hold; otherwise model inadequacies are inevitable. The assumptions are:

1. The relationship between the response variable y and the regressors x is approximately linear.

2. The error term ε has mean µ = 0 and constant variance σ².

3. The errors are uncorrelated, i.e. Corr(εi, εj) = 0 for i ≠ j.

4. The observation of y is fixed in repeated samples, meaning that resampling with the same independent variable values is possible.

5. The number of observations, n, is larger than the number of regressors, k. Also, there are no exact linear relationships between the xi’s.

When evaluating these conditions, residual analysis is a very useful method for diagnosing violations of the basic regression assumptions; this will be further explained later in the study.

3.1.2 Ordinary Least Squares estimation

The values of β will be estimated using the linear model lm() function in R, which utilises the ordinary least squares method and minimizes the sum of squares of the residuals. This means that the estimate of β is given by a solution to the normal equations, where the residual is defined as

e = y − Xβ

Minimizing the sum of squares of the residuals (SS_Res = e′e), the estimate is

\[
\hat{\beta} = \arg\min_{\beta} S(\beta), \qquad
S(\beta) = \sum_{i=1}^{n}\Big(y_i - \sum_{j} x_{ij}\beta_j\Big)^2 = \lVert y - X\beta \rVert^2
\]

This minimization problem has a unique solution, provided that the columns of the matrix X are linearly independent, given by the normal equations

(X′X)β̂ = X′y

Finally, rewriting this, we end up with the OLS estimate

β̂ = (X′X)⁻¹X′y

After the estimates of β have been produced, they have to be evaluated further to assure congruence with the model assumptions and the theory on quality of results.
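The thesis fits the model with R’s lm(); as an illustrative equivalent (a sketch on synthetic data, not the thesis code or data set), the same estimate β̂ = (X′X)⁻¹X′y can be computed directly with NumPy:

```python
import numpy as np

# Illustrative OLS sketch on synthetic data (not the thesis data set).
rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + k regressors
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Solve the normal equations (X'X) beta_hat = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

e = y - X @ beta_hat                 # residuals
ms_res = e @ e / (n - X.shape[1])    # residual mean square, estimates sigma^2
```

In practice np.linalg.lstsq (or lm() itself) is numerically preferable to explicitly forming X′X, but the normal-equations form mirrors the derivation above.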

3.1.3 Indicator variables

Unlike other regressor variables that have a quantitative value, indicator variables, or dummy variables, are qualitative variables. Since they have no natural numeric value they will in the regression model be represented via levels, either 1 or 0, assigned to them. In this study the indicator variable is Donald Trump’s occupation, divided into civilian (0) or president (1).
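A minimal sketch of how such a dummy could be coded (in Python rather than the thesis’s R; the inauguration date is the natural cutoff, though the exact coding used in the thesis is not shown here):

```python
import datetime as dt

# 0 = civilian, 1 = president; Trump was inaugurated on January 20, 2017.
INAUGURATION = dt.date(2017, 1, 20)

def president_dummy(tweet_date: dt.date) -> int:
    """Indicator level for Trump's occupation at the time of a tweet."""
    return 1 if tweet_date >= INAUGURATION else 0

print(president_dummy(dt.date(2015, 6, 16)))  # campaign announcement: 0
print(president_dummy(dt.date(2019, 9, 1)))   # during presidency: 1
```

In the regression, this column simply shifts the intercept for tweets posted during the presidency, which is what allows the before/during comparison described in section 1.2.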

3.1.4 Residual Analysis

The key assumption which constitutes the backbone of the whole project is that there is at least a reasonably linear relationship between the VIX and the regressors. By examining the produced residuals through various standardised tests and measurements, there is a higher chance of detecting model inadequacy.

Normal residuals

Normal residuals are defined as the difference between the observed value yi and the fitted value of the model, ŷi:

ei = yi − ŷi

The residual is interpreted as the deviation between the model and the actual data, making plotting the residuals an effective method for quickly detecting violations of model assumptions. In an ideal world where the model is effective, the sum of all residuals should be zero and their distribution should be Gaussian. Where this is not the case, something in the model is flawed, and examining the residuals will give important clues to what is wrong.

The residuals have zero mean, E(e) = 0, and their approximate average variance can be estimated using the residual sum of squares, which has n − k degrees of freedom associated with it since k parameters are estimated in the regression model. An estimate of the variance of the residuals is given by the residual mean square MS_Res:

\[
MS_{Res} = \frac{SS_{Res}}{n-k} = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-k}
\]

Scaled Residuals

Scaled residuals are obtained by transforming the normal residuals. The purpose of scaling is to make residuals comparable both with each other and with residuals from other models. These new residuals can offer further clues as to whether something is wrong with the model and how it could benefit from modifications. In order to understand the notation in the following transformations of residuals, we now introduce some concepts.

The total sum of squares SST is partitioned into a sum of squares due to regression, SSR, and a residual sum of squares, SSRes.

SST = SSR + SSRes

where the terms are defined as follows: \(SS_T = \sum_{i=1}^{n}(y_i - \bar{y})^2\), \(SS_R = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2\) and \(SS_{Res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = e'e\).

Standardized Residuals

Standardized residuals are obtained by normalising the residuals so that they can be approximated by a Gaussian distribution with mean zero and variance of approximately one. This modified residual is easier to analyse, as it can be compared with other standardised residuals. As a rule of thumb, a value of |di| > 3 is an indication of a possible outlier. The standardized residual is given by

\[
d_i = \frac{e_i}{\sqrt{MS_{Res}}}
\]

where MS_Res is an unbiased estimator of σ².

Studentized Residuals

Studentized residuals build on the standardized residuals, which were obtained by normalizing with the variance estimate MS_Res. For studentized residuals the scaling is instead based on the exact standard deviation of each i:th observation; ei is divided by the exact standard deviation of the given observation i.

Writing the residuals by use of the hat matrix H = X(X′X)⁻¹X′ gives

e = (I − H)y

which through substitution of y = Xβ + ε gives

e = (I − H)ε

showing that the residuals are the same transformation of ε as of y. The variance of the error ε is given by Var(ε) = σ²I, and since I − H is symmetric and idempotent, the residual covariance matrix is

Var(e) = Var[(I − H)ε] = (I − H)Var(ε)(I − H)′ = σ²(I − H)

With the variance of each residual given by the covariance matrix according to

Var(ei) = σ²(1 − hii)

where hii is the i:th diagonal element of the hat matrix H.

Using the found variance of ei we have that the studentized residual is calculated by

\[
r_i = \frac{e_i}{\sqrt{MS_{Res}(1 - h_{ii})}}
\]

The takeaway from the above formulas is that, in general, an xi closer to the center of the data has a larger residual variation, and thus violations of the model assumptions are more probable to be detected further out towards the edges; if everything about the model is sound, the studentized residuals will have variance 1. It could also be of use to know that as n goes to infinity, studentized residuals usually converge to standardised residuals. As in most cases with residuals, one lonely point far away from the rest may be influential on the whole fit. Such points should be further analysed.
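The quantities above can be computed directly from the design matrix. The following sketch (Python with synthetic data, not the thesis data set) contrasts standardized and studentized residuals:

```python
import numpy as np

# Synthetic regression data, not the thesis data set.
rng = np.random.default_rng(1)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.2]) + rng.normal(scale=0.3, size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix X(X'X)^{-1}X'
h = np.diag(H)                          # leverages h_ii
e = y - H @ y                           # residuals e = (I - H) y
ms_res = e @ e / (n - X.shape[1])

d = e / np.sqrt(ms_res)                 # standardized residuals
r = e / np.sqrt(ms_res * (1.0 - h))     # studentized residuals
```

Since 1 − hii < 1, each studentized residual is at least as large in magnitude as the corresponding standardized one; high-leverage points are scaled up the most, which is exactly why studentizing helps flag them.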

R-Studentized Residuals

When constructing R-studentized residuals, the variance is estimated by calculating Si², where observation i is removed from the estimation. This is done to examine how single data points influence the results, much as later described in the PRESS residuals section. The formula for calculating Si² is

\[
S_i^2 = \frac{(n-p)MS_{Res} - e_i^2/(1-h_{ii})}{n-p-1}
\]

where p is the number of estimated parameters. This estimate of σ² is then used to calculate the R-student residual according to

\[
t_i = \frac{e_i}{\sqrt{S_i^2(1-h_{ii})}}, \qquad i = 1, \ldots, n
\]

An observation i whose R-studentized residual differs greatly from the result obtained when estimating σ² using MS_Res indicates that observation i is an influential point.

PRESS Residuals

PRESS, Prediction Error Sum of Squares, is another method to examine the influence of a specified observation i in the set. It is produced by calculating the error sum of squares for every observation except i. The PRESS residual is defined as

e_(i) = e_i / (1 − h_ii), i = 1, ..., n

Where h_ii, an element of the hat matrix H, is large, the PRESS residual will also be large. If the sum obtained by excluding observation i greatly differs from the value obtained from the whole set, and from the sums obtained by excluding the other observations one by one, the isolated point i has a disproportionate effect on the regression and may skew the model. This means that a point that stands out in the PRESS diagram is a point where the model fits well, but a model excluding this point will predict poorly.
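The PRESS computation itself is a one-liner once the ordinary residuals and leverages are known. A minimal Python sketch (values below are made up purely for illustration):

```python
# Hypothetical ordinary residuals e_i and leverages h_ii
e = [0.12, -0.35, 0.28, -0.05]
h = [0.45, 0.20, 0.15, 0.60]

# PRESS residuals e_(i) = e_i / (1 - h_ii): the prediction error at point i
# when the model is fitted without observation i
press_res = [ei / (1 - hi) for ei, hi in zip(e, h)]

# The PRESS statistic is the sum of the squared PRESS residuals
PRESS = sum(pr ** 2 for pr in press_res)
```

Since 0 < h_ii < 1, every PRESS residual is at least as large in magnitude as the corresponding ordinary residual, so PRESS always exceeds the ordinary residual sum of squares.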

3.2 Model assessment and verification

3.2.1 Leveraged and Influential points

There are many forms of outliers that can be identified, and in this section we will be looking at leverage points and influential points. A point of high leverage is an observation with an unusually high x-value. If this point's y-value is in line with the rest of the regression it will not affect the fit of the model too much. However, if the point also has a deviating y-value, it becomes an influential point. Influential points have a large effect on the model, since they pull the entire regression towards them. Concluding, not all leverage points are influential for the fit.

When identifying these points the hat matrix H = X(X'X)^{-1}X' is crucial. Each diagonal element h_ii of the hat matrix tells us the leverage of the observation (x_i, y_i) on the fitted value ŷ_i. As a general rule, a point is said to be a leverage point if its diagonal element in the hat matrix exceeds double the average, h_ii > 2p/n.

Cook’s Distance

In order to find these points of interest, a useful diagnostics tool is Cook’s Distance. Cook’s Distance takes into account both the x-value for the observation as well as the response variable by taking the least square value from the observation to the fit. Cook’s distance for the i:th observation is calculated by deleting that observation and looking at the change in the model that results from

doing so. Cook's distance for the observation i removed can be calculated as below, where n is the number of observations.

D_i = (r_i² / k) · Var(ŷ_i)/Var(e_i) = (r_i² / k) · h_ii/(1 − h_ii), i = 1, 2, ..., n

where r_i is the i:th standardized residual and h_ii is a diagonal element of the hat matrix H. As a rule of thumb, points with D_i > 1 are considered to be influential points.
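Given standardized residuals and leverages, Cook's distance and the rule of thumb can be sketched in a few lines of Python (the r and h values here are hypothetical):

```python
# Cook's distance D_i = (r_i^2 / k) * h_ii / (1 - h_ii),
# where k is the number of model parameters.
k = 3
r = [0.5, -2.8, 1.1, 0.3]   # hypothetical standardized residuals
h = [0.10, 0.70, 0.25, 0.15]  # hypothetical leverages

D = [(ri ** 2 / k) * hi / (1 - hi) for ri, hi in zip(r, h)]

# Rule of thumb: flag observations with D_i > 1 as influential
influential = [i for i, Di in enumerate(D) if Di > 1]
```

In this toy example only the second observation, which combines a large residual with high leverage, is flagged.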

3.2.2 Multicollinearity

Multicollinearity occurs if the regressors are almost perfectly linearly dependent. Multicollinearity in the data may cause different degrees of interference in the model; symptoms range from inaccurate estimates to a model that is outright misleading and wrong. Understanding the data set and the source of the multicollinearity is key to treating it.

3.2.3 Methods for dealing with multicollinearity

Variance Inflation Factor

One method for detecting multicollinearity is to look at the variance inflation factor, or VIF for short. The VIF is defined as follows,

VIF_i = C_ii = (1 − R_i²)^{-1}

where C = (X'X)^{-1} and R_i² denotes the coefficient of determination obtained when x_i is regressed on the remaining regressors.

If x_i is nearly linearly dependent on some subset of the regressors, C_ii becomes very large. A VIF_i value exceeding 10 indicates multicollinearity which can result in poor estimates of β.
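With exactly two regressors the definition collapses to VIF = 1/(1 − r²), where r is the Pearson correlation between them. A self-contained Python sketch with hypothetical data, where x2 is nearly a linear function of x1:

```python
# Two hypothetical, strongly correlated regressors
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.0, 9.9]

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
var1 = sum((a - m1) ** 2 for a in x1)
var2 = sum((b - m2) ** 2 for b in x2)

# R_1^2: squared correlation of x1 with x2 (the "other regressors")
r2 = cov ** 2 / (var1 * var2)

vif = 1 / (1 - r2)  # far above 10 here, signalling multicollinearity
```

With more than two regressors, R_i² is instead obtained by actually regressing x_i on the others.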

Eigenvalue Analysis

One of the most common analyses for detecting multicollinearity is to look at the eigenvalues of X'X in our system. The easiest way to determine whether there is multicollinearity is to look at the condition number of X'X, defined as

k = λ_max / λ_min

The common rule of thumb is that

• k = 1 implies perfectly orthogonal regressors and no multicollinearity

• k < 100 implies weak multicollinearity

• 100 < k < 1000 relates to moderate to strong multicollinearity

• k > 1000 is sign of severe multicollinearity.
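For a 2 × 2 matrix the eigenvalues have a closed form, so the condition number of X'X for two standardized regressors can be checked by hand. A small Python sketch, assuming a hypothetical correlation of 0.95 between the regressors:

```python
import math

# X'X for two standardized regressors with correlation 0.95:
# [[1.0, 0.95], [0.95, 1.0]]
a, b, c = 1.0, 0.95, 1.0

# Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]]
mid = (a + c) / 2
spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
lam_max, lam_min = mid + spread, mid - spread

k = lam_max / lam_min  # condition number, here 1.95 / 0.05 = 39
```

Even a correlation as high as 0.95 only gives k = 39, i.e. weak multicollinearity by the rule of thumb above; the rule reacts to near-exact linear dependence.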

3.2.4 Variable Selection

A contradiction that occurs in any regression model is the problem with the number of variables. Firstly, it would be preferred to include as many regressors as possible for the purpose of having the largest scope of information. On the other hand, too many regressors inflate variance which will have a negative impact on the overall performance of the model.

All Possible Regression

This method fits all possible regression equations involving one candidate regressor, two candidate regressors, and so on. The optimal regression model is then selected based on some criterion, in our case the Bayesian Information Criterion, Mallows Cp and adjusted R². This technique is computationally heavy and is not suited for models with many regressors: a model with k candidate regressors results in 2^k total equations to be estimated and examined, so depending on the number of regressors in the model this technique may not be feasible. For models with under 30 regressors the method is acceptable using the computers of 2020. For models containing more than 30 regressors, variable selection can instead be achieved using forward, backward or stepwise elimination.

Forward Selection

Forward selection starts with a blank model solely including the intercept β0. The model then adds one regressor at a time in order to find the optimal subset of regressors. The first variable is chosen based on the largest simple correlation with the response variable. When adding the second regressor, the method again chooses the regressor with the largest correlation to the response variable, after adjusting for the first variable. The regressor with the highest partial correlation will produce the largest value of the F statistic for testing the significance of the regression.

Backward Elimination

The inverse of forward selection. The model starts with all regressors included. The F statistic is then evaluated for each regressor as if it were the last to enter the model, and the regressor with the smallest F statistic is removed. This is then repeated.

Bayesian Information Criterion

Bayesian information criterion, or BIC, is a criterion that balances the number of regressors to the number of observations. The BIC is used for variable selection and places penalty on adding regressors to the model. The BIC can be computed as

BIC = n ln(SS_Res / n) + k ln(n)

The variable k denotes the number of coefficients including the intercept and n is the number of observations.
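The penalty term k ln(n) is what makes the BIC prefer smaller models. A minimal Python sketch with hypothetical sums of squares, showing that an extra regressor which barely reduces SS_Res is rejected:

```python
import math

def bic(ss_res, n, k):
    # n = number of observations, k = number of coefficients incl. intercept
    return n * math.log(ss_res / n) + k * math.log(n)

# Hypothetical comparison: the larger model lowers SS_Res from 40.0 to
# only 39.5, which does not pay for the extra k*ln(n) penalty.
n = 100
bic_small = bic(40.0, n, 5)
bic_large = bic(39.5, n, 6)
# bic_small < bic_large, so the smaller model is preferred
```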

R2 and adjusted R2

Another way to evaluate the adequacy of the fitted model is to look at the R2 for the generated models.

R² = SS_R / SS_T

where SS_T is the total sum of squares and SS_R is the sum of squares due to regression. However, R² does not take the number of variables into consideration; it never decreases when a new variable is added to the model. This is why adjusted R² is used as a criterion instead, which takes the number of regressors into consideration.

R²_adj = 1 − (SS_Res/(n − p)) / (SS_T/(n − 1))

This is the definition of adjusted R² for a model with n observations and p parameters.
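The difference between the two criteria can be seen directly from the formulas. In the hypothetical Python sketch below, adding a near-useless regressor raises R² slightly but lowers adjusted R²:

```python
def r2(ss_res, ss_t):
    return 1 - ss_res / ss_t

def r2_adj(ss_res, ss_t, n, p):
    # n = number of observations, p = number of parameters
    return 1 - (ss_res / (n - p)) / (ss_t / (n - 1))

# Hypothetical fits: the 4-parameter model reduces SS_Res only marginally.
n, ss_t = 50, 100.0
plain_small, adj_small = r2(40.0, ss_t), r2_adj(40.0, ss_t, n, 3)
plain_large, adj_large = r2(39.5, ss_t), r2_adj(39.5, ss_t, n, 4)
# plain_large > plain_small, but adj_large < adj_small
```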

3.2.5 Mallows Cp

Mallows Cp presents a variance based criterion, defined as

C_p = SS_Res(p)/σ̂² − n + 2p,

where σ̂² is an estimator of the variance, e.g. MS_Res. It can be shown that if the p-term model is without bias, the expected value of Cp equals p. When using the Cp criterion, it can be helpful to visualize it in a plot of Cp as a function of p for each regression equation; this is exemplified in

figure 1. Models with little bias will have values of Cp that fall near the line Cp = p, while regression equations with bias, e.g. point B, lie above this line. Generally, small values of Cp are desirable. On the other hand, a small bias may be preferred for the sake of a simpler model; in the case illustrated in figure 1, C can be preferred over A even though it includes bias.

Figure 1: A Cp plot example
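The criterion itself is a one-liner. The Python sketch below (hypothetical numbers) also verifies the stated property that an unbiased p-parameter model, for which SS_Res ≈ σ²(n − p), lands on the line Cp = p:

```python
def mallows_cp(ss_res_p, sigma2_hat, n, p):
    # ss_res_p: residual sum of squares of the p-parameter model
    # sigma2_hat: estimate of sigma^2, e.g. MS_Res of the full model
    return ss_res_p / sigma2_hat - n + 2 * p

# Hypothetical unbiased model: SS_Res = sigma^2 * (n - p) gives Cp = p.
n, p, sigma2 = 30, 5, 2.0
cp = mallows_cp(sigma2 * (n - p), sigma2, n, p)  # equals p
```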

3.3 Quantitative Selection

3.3.1 Selection using TF-IDF

TF-IDF is a quantitative measure reflecting the importance of a single word in a sentence or collection of words. [25] TF-IDF is defined as the product of two terms, term frequency (TF) and inverse document frequency (IDF).

The term frequency is used to measure how frequent a word is in the given document. TF treats the problem of documents having different total word counts. To compensate for documents having unequal lengths the TF takes the occurrence of the specific word and divides it with the total word count.

TF = (Occurrences of word X in document Y) / (Word count in document Y)

For the second part, the inverse document frequency attempts to distinguish relevant and non-relevant terms by observing whether the term is common or rare across all documents. The IDF assigns lower values to common words and larger values to words that are rare. This is done by the logarithmically scaled inverse fraction of the documents that contain the word.

IDF = log(Number of documents / Number of documents that contain the term X)

And as previously stated, the TF-IDF is simply the term frequency multiplied by the inverse document frequency. Calculating the TF-IDF for all terms in a corpus will assign a numeric value of significance to each word in each document. This value represents how important a specific word is to the collection of documents. The higher the TF-IDF value, the greater the importance of the word.
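The two factors can be sketched directly from the definitions. A toy Python example on a hypothetical corpus of three daily "bags of words" (the thesis itself computes TF-IDF with the quanteda package in R):

```python
import math

# Hypothetical daily bags of words, already lower-cased and cleaned
docs = [
    "china trade deal china",
    "fake news media",
    "china tariffs trade",
]
tokens = [d.split() for d in docs]
N = len(docs)

def tf(term, doc):
    # relative frequency within one document
    return doc.count(term) / len(doc)

def idf(term):
    df = sum(term in doc for doc in tokens)  # documents containing the term
    return math.log(N / df)

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)
```

Here "china" appears in two of three documents, so its IDF is log(3/2); a word unique to one document, such as "media", gets the larger IDF log(3), and a word absent from a document scores 0 for that document.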

However, the method of TF-IDF is not without limitations. For instance, it does not retain the semantic context of words in the initial text. Moreover, TF-IDF is unaware of synonyms or even plural forms of words. [26] This can be handled through the process of stemming.

3.3.2 Stemming

In morphology and information retrieval, stemming is the process of reducing words to their core form. For instance, the words consultant, consultants, consultancy and consulting are all reduced to their stem form, consult. The word does not need to be an inflection of a word; it is enough that related words map to the same stem, [27] even if the stem in itself is not a valid root. All this is accomplished through algorithms. The process is embedded in our everyday life; for instance, many search engines treat words of the same stem as synonyms as a way to expand the query. [28]
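The idea can be illustrated with a deliberately naive suffix-stripping stemmer; real systems (such as the Porter stemmer used in most NLP toolkits) apply much more elaborate rule sets:

```python
# Toy suffix list, longest suffixes tried first; illustration only
SUFFIXES = ["ancy", "ants", "ant", "ing", "s"]

def stem(word):
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        # Require a minimum stem length so short words are left alone
        if word.endswith(suf) and len(word) - len(suf) >= 4:
            return word[: -len(suf)]
    return word

words = ["consultant", "consultants", "consultancy", "consulting"]
stems = [stem(w) for w in words]  # all map to "consult"
```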

3.4 Transformation

3.4.1 Box-Cox Transformation

Box-Cox is used in order to investigate whether the data set requires transformation to correct for non-constant variance or non-normality. If the model needs transformation this can be done by utilising the method presented by, and named after, Box and Cox. The method deploys the fact that yλ can be used to adjust for non-normality or non-constant variance. Lambda is a constant estimated by maximizing

L(λ) = −(n/2) ln(SS_Res(λ))

Plotting L(λ) and drawing the horizontal line L(λ̂) − ½ χ²_{α,1}, two intersections are found, and vertical lines through them mark the interval limits on the λ-axis. Here χ²_{α,1} is the upper α percentage point of the chi-square distribution with one degree of freedom, meaning that for α = 0.05 the x-values of the vertical lines indicate the borders of a 95 per cent confidence interval. If 1 is inside this CI it implies that no transformation is needed. In other cases the recommended transform is

y_i^(λ) = (y_i^λ − 1)/λ if λ ≠ 0,
y_i^(λ) = ln y_i if λ = 0.

This is the one-parameter Box-Cox transformation that will be used to transform the data. The exact value of λ lies within the 95 per cent confidence interval but is not exactly known. The transformation process therefore becomes one of trial and error, and the method can be repeated if a transformation was unsatisfactory.
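The transform itself is simple to state in code. A Python sketch of the one-parameter family; note that the power branch converges to the log branch as λ → 0, which is why the case split is continuous:

```python
import math

def box_cox(y, lam):
    # One-parameter Box-Cox transform; all y values must be positive.
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]
```

The profile likelihood maximization over λ is done by the fitting software (e.g. `boxcox` in R's MASS package); this sketch only covers the transform applied once λ is chosen.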

4 Methodology

In this section the iterative process of finding the final model is described. Before obtaining and evaluating the final model, two initial models were tested and discarded. In the model building, the majority of the time was spent on sorting and selecting different aspects of the tweets. In order to understand the selection and transformation of the data, the two data sets used are described below.

4.1 Data Gathering

Two data sets were used to carry out this analysis. The first contains the regressors: all of Trump's tweets with a date and time stamp, as well as the number of favourites and retweets.

The data on President Trump's tweets were found as an open source csv-file at Kaggle.com. [29] This data set contains all of Donald Trump's tweets from his very first tweet on May 4th 2009 all the way to January 20th 2020, summing to 41 060 unique tweets. The data do not include any retweets.

Figure 2: Tweet attributes and their descriptions

The second data set needed is that of the volatility of the stock market. Here there are plenty of options and a wealth of different data. The Cboe Volatility Index ($VIX) will be used, for two reasons: the index measures implied volatility, and it has the advantage of unweighted data. This data is found directly on Cboe's website. [30]

Figure 3: VIX attributes and their descriptions

In the following analysis a transformed value of VIX Close will be utilized.

4.2 General transformation of data points

4.2.1 Transformation of Volatility

When performing the regression analysis it is key that the response variable has a normal distribution. Below is a graph showing the VIX index, with a mean value of 18.34 and maximum and minimum values of 82.69 and 9.14 respectively during the last 10 years.

Figure 4: VIX index historical prices

As can be seen in figure 5, the distribution of the VIX price is clearly not normal. This is shown in the histogram of VIX close prices below.

Figure 5: VIX index histogram
Figure 6: Transformed VIX index histogram

A Box-Cox transformation was used to transform the VIX data. This resulted in the transformation

VIX_transformed = log(VIX)^{−2}

Observing figure 6, the histogram of the transformed data shows that the transformation normalised the VIX closing prices. This is further strengthened by the Box-Cox intervals in figure 7, where one can observe λ = 1 within the 95 per cent confidence interval, indicating that no further transformation of the response variable is necessary.

Figure 7: BoxCox parameter λ with the 95 % CI shown.

4.2.2 Transformation of dates

The tweet data include both the date and the exact time at which each tweet was published. Since the VIX data set only has day resolution, this had to be taken into consideration. The closing price for the VIX is set at 16:00 Eastern Time, so all tweets sent after 16:00 are regarded as belonging to the next trading day. This is done with the argument that tweets sent after closing hours simply are unable to impact the volatility the same day.
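The date assignment described above can be sketched in a few lines of Python. The function name is hypothetical, timestamps are assumed to already be in Eastern Time, and weekends and holidays (which the thesis removes in a later step) are ignored here:

```python
from datetime import date, datetime, timedelta

def trading_day(ts: datetime) -> date:
    # Map a tweet timestamp to the trading day whose 16:00 VIX close
    # it can still affect: tweets at or after 16:00 roll to the next day.
    d = ts.date()
    if ts.hour >= 16:
        d += timedelta(days=1)
    return d
```

For example, a tweet at 15:59 belongs to the same day, while one at 16:00 is assigned to the following day.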

4.3 Initial models

4.3.1 Model 1 - statistics of tweets

The first attempt was to build a model using only the quantitative data given by the tweets, disregarding their content. In order to achieve this the statistics for each day were summed. The regressors for this model are:

• Number of retweets for tweets posted that day

• Number of favorites for tweets posted that day

• Length of tweet in terms of characters

• Number of tweets that day

This model presented an adjusted R² of 0.024. Clearly, the statistics of the tweets alone do not tell us much about volatility.

4.3.2 Model 2 - Words from Volfefe Index

The second model constructed took the words of importance stated in the Volfefe study into consideration. This model simply sorted out all the tweets that did not contain any of the 20 key words given in the Volfefe study by J.P. Morgan. This model did not perform well, with an adjusted R² of 0.072. The main problem identified with this approach was that, with only 20 words, the many days on which none of these words were tweeted gave no input to the model. The key takeaway from this attempt was that as many days as possible need to be considered, and that using only 20 words left too many days and tweets out of the model.

4.4 Regression Model

Learning from the first two attempts, the final model focuses on selected words mentioned per day in the tweets of Donald Trump.

4.4.1 Data selection using TF-IDF

The first step was to gather all words tweeted during a day into a bag of words representing that day. In this step we also perform the first round of cleaning by removing punctuation, hyphens and other symbols. This operation is justified by arguing that symbols by themselves are without meaning unless there is context. By the same logic all stop words are removed. Stop words are words that, just like symbols, are meaningless without context; examples are whom, this, that, these, am and is. In total 175 words are considered and stripped using the package quanteda in R. [31]

By creating a matrix with days represented as rows and each word represented as a column, we have a matrix of 3 116 days by 38 516 words, where each cell holds an integer indicating how many times that word appeared in his tweets on that day. With the words set to act as regressors later, it is easy to understand that the number of words needs to be reduced further.

Secondly, the process of stemming is applied. After removing stop words and applying stemming to all remaining words, the number of unique words was reduced to 31 608. More importantly, stemming also gathers information of the same kind. For instance, the word China will be represented not only by occurrences of China but also by Chinese, China's, etc. The thought behind this is that words of the same stem essentially have the same meaning and refer to the same thing.

The next step is to calculate the TF-IDF of the matrix. This does not change the dimensions of the matrix. After this calculation, each word on each day is represented by a number: the TF-IDF, which is a decimal number if the word is tweeted during the day and 0 if it is not. This TF-IDF number can be interpreted as how many times Trump tweeted that word on that day compared to the other days. Figure 8 below shows a small sample of the TF-IDF data table.

Figure 8: TF-IDF data table

Thirdly, removing the bulk of the words, the number of appearances is considered. Words that do not appear more than 25 times are stripped from the data, arguing that it is hard to know what to make of words that appear so infrequently. Weekends, when markets are closed, are also removed. The regressors now have the dimensions 2 502 days by 1 706 words, with each word being a regressor whose value is the word's TF-IDF for that day.

Finally, a last regression variable was added and another one was made binary. The added regressor represents whether Donald Trump was president or not at the date of the tweet, represented as a factor with two levels, 0 = civilian and 1 = president. Moreover, the regressor 'pic.twitter.com', which represents whether a picture was attached to the tweet, was transformed to be binary, either one or zero.

4.4.2 Variable selection using Forward Selection

In order to further reduce the number of regressors, forward selection is used. This is done to avoid overfitting and to remove features that do not contribute to the performance of the model. Forward selection was chosen over the all possible regressions method, which in the case of 1 707 regressors is infeasible since the algorithm would have to search over 2^1707 feature combinations. The forward selection was then evaluated using the Bayesian Information Criterion, Mallows Cp and adjusted R², where we would like to minimize the BIC as well as the Cp with regard to the number of regressors, and of course maximize the adjusted R².

Figure 9: BIC
Figure 10: Mallows Cp
Figure 11: adjusted R2

Evaluating with regards to BIC we find the optimal model to consist of 79 regressors. Mallows Cp suggests that 392 regressors should be used in the model. Finally evaluating on adjusted R2 recommends 981 variables to be used.

These evaluation criteria all recommend quite different models, and there is no easy answer as to which to use. Despite its popularity and intuitiveness, the adjusted R² is not as well motivated in statistical theory as BIC and Cp. [32]

To decide on the final model, the models recommended by BIC and by Mallows Cp were both evaluated. This was done by performing the regression using the lm() command in R and observing the p-values of the individual regressors. For the model of regressors selected by BIC, all p-values are close to or equal to zero.

For the model of regressors selected by Cp, quite a few coefficients have large p-values, which is undesirable. Moreover, with many features we lose interpretability, while with fewer words we can gain more insight into them. Concluding, even though the model selected by the Bayesian Information Criterion has a lower adjusted R², this model will be used. The selected regressors for the final model are stated in section 5.1 Findings.

4.4.3 Regression

At this stage a multiple linear regression was carried out using the transformed value of the VIX, log(VIX)^{−2}, as response variable and the TF-IDF data table with the 79 words suggested by BIC as regressors. The regression is carried out in R using the command lm().

5 Results

5.1 Findings

The Output from the final regression model is

Figure 12: Output from regression

Figure 12 shows the full model with all regressors. We find the adjusted R² = 0.4501 and the sum of residuals to be −1.553852e−18. Some of the regressors' names will look a bit strange due to the process of stemming; for instance, leaving and leave are both included in the stem leav. Of the regressors marked in grey, 'president' is an indicator variable and 'pic.twitter.com' is binary. Apart from these two exceptions, all other regressors are represented by their TF-IDF value.

5.1.1 Interpretation

The β’s in the model are hard to interpret in their current state due to the transformation of volatility. The reverse transform is given by

VIX = exp( sqrt(1 / VIX_transformed) )

This means that the intercept β0, which in our model has a value of 0.1334, transforms to 15.45, a bit lower than the median VIX of 15.60. Another important aspect of this transformation is that negative coefficients β_i contribute to a higher price of the VIX, not lower; a positive β_i, such as that for 'oil', contributes to a lower price of the VIX.
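The back-transformation, and the intercept-only prediction of 15.45, can be checked in a couple of lines of Python (the function names are for illustration only):

```python
import math

def to_transformed(vix):
    # The response variable used in the regression: log(VIX)^(-2)
    return 1 / math.log(vix) ** 2

def to_vix(v_transformed):
    # Inverse transform back to the VIX scale
    return math.exp(math.sqrt(1 / v_transformed))

# The intercept alone, 0.1334, corresponds to a VIX of about 15.45
intercept_vix = to_vix(0.1334)
```

Round-tripping any VIX value through `to_transformed` and back recovers it, confirming the two functions are inverses.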

To calculate the expected VIX one would calculate the TF-IDF for each word during a day and put these values into the model. For example, a day counting only one of President Trump's more well-known tweets

Why would Kim Jong-un insult me by calling me "old," when I would NEVER call him "short and fat?" Oh well, I try so hard to be his friend - and maybe someday that will happen!

would not contribute to the model, since none of the words in the tweet is among the regressors. The model would then output β0 = 0.1334, which transformed relates to a VIX of 15.45. If Trump, however, were to tweet

Canada will now sell its oil to China because @BarackObama rejected Keystone. At least China knows a good deal when they see it.

where two key words are mentioned, oil and China, the TF-IDF values calculated for these words are 0.092301 and 0.11255 respectively. Inserted into the model this gives

VIX_transformed = β0 + β_president + β_oil · TF-IDF_oil + β_china · TF-IDF_china

With the values put in, our VIX_transformed becomes 0.07940, which corresponds to an estimated VIX of 34.7685. Comparing this to the median value of 15.6 for the VIX during the last 10 years, we understand that the estimate can be regarded as high, whilst still within the observed range of the VIX. This tweet was published on January 20th 2012; at that time the VIX traded at about 19, considerably lower than the estimate.

5.1.2 Top regressors

The table below lists the regressors with the ten highest and ten lowest coefficients.

Figure 13: Most influential regressors

Regressors highlighted in green, with negative coefficients, contribute to a higher value of the VIX, and regressors with positive coefficients, highlighted in red, contribute to a lower VIX after the transform. However, we must keep in mind that the TF-IDF value of each regressor also comes into play. Remembering that the TF-IDF value is a measure of "how many times Donald Trump tweeted a word on a specific day compared to the other days", a highly negative coefficient does not mean that the word will have a large impact on the volatility merely because it appears in tweets during a day. The word crime, which has a coefficient estimate of −0.8815, is a common word in Donald Trump's tweets, which results in a high document frequency and thus a low IDF value for each appearance of the word.

5.2 Residual analysis

Figure 14: Studentized residuals

Plotting studentized residuals against the corresponding fitted values is a quick way of detecting model inadequacies. The residuals seem to be normally distributed; the residual values are contained in a band centered around zero with no residuals much outside ±3, indicating that the model has no pronounced defects. Worth noting is that the residuals seem to follow the patterns of market movement.

Figure 15: PRESS residuals

The PRESS residuals plotted against the corresponding fitted values also imply that the mean of all errors is near zero. No single observation stands out as potentially influencing the overall fit of the model. The PRESS statistic, the sum of the squared PRESS residuals, has a value of 0.819.

Figure 16: Residuals vs Fitted

In figure 16, of residuals against fitted values, the deduction is made that the variance can be considered constant with a mean of zero. The observations are equally spread out around the red line with the variance of the error terms being constant; thus homoscedasticity holds.

QQ-plot and Residuals vs Leverage

Figure 17: Residuals vs Leverage Figure 18: QQ-plot

Looking at the Residuals vs Leverage plot above we can see the relationship between leverage and influence for all observations. The dashed lines in the top and bottom right corners mark the level D_i = 0.5. A few observations close in on this level; however, we can conclude that none of the observations are to be regarded as influential. Also, by observing the normal Q-Q plot, the confidence in the model is further strengthened, as the residuals align to a high degree with the expected outcome.

Variance Inflation Factor and Eigenvalues

The condition number has a value of 419.75, indicating some but not severe multicollinearity among the regressors. In order to further understand the multicollinearity between single regressors, the Variance Inflation Factor is consulted.

Figure 19: Variance Inflation Factor

The VIF values for the regressors mostly vary between 1 and 1.5, with the one exception being the regressor president, which has a VIF of 2.2. None of these VIFs are of major concern. In fact, some variance inflation is to be expected, as no words make much sense alone without other words providing context.

To summarize, all theoretical assumptions of linear regression stated in section 3.1.1 are satisfied. According to the normal QQ-plot, the relationship between the response variable y and the regressors x is approximately linear. From the analysis of residuals, we can state that the error term ε has mean μ = 0 and constant variance σ², and that the errors are uncorrelated, i.e. Corr(ε_i, ε_j) = 0, as seen in figure 16, residuals vs. fitted. Moreover, the low VIF values in figure 19 indicate that there is no or little multicollinearity. Lastly, we have from the variable selection that the number of observations exceeds the number of independent regressors.

6 Discussion

This part presents a short analysis of the results and a discussion of the limitations and constraints of the approach. Lastly, the conclusions and future work of this thesis are presented.

6.1 Analysis of results

The obtained model is able to predict the VIX with an adjusted R² of 0.45. The results suggest that the model does not yield satisfactory performance in predicting price movements of the Cboe VIX index. The results from the study imply that the chosen regressors display a small significant correlation between Trump's tweets and the market volatility. The stock market is volatile by nature and is influenced by many different global factors, ranging from economic and political to social and technological conditions. This large and complex system of many unknowns complicates the process of simplifying and quantifying data from only one source into a regression model with high predictability.

Disregarding the inadequate explanatory power of the model and focusing only on the key words, the findings are more promising. The chosen regressors do in fact have a clear correlation with market volatility. All the words chosen as regressors have a p-value of 0.01 or, in most cases, much less. This implies that these words, when part of the content of Donald Trump's tweets, almost certainly correlate with market volatility. Although the model with the chosen words is unable to fully explain market volatility, we can conclude that the words used as regressors do correlate with it. This is to be regarded as the practical contribution of this study: one could use the regressors presented here to further investigate the question or other aspects of Donald Trump's tweets.

Finally, one question that has to be asked is that of the chicken or the egg, which translated to our case becomes: do Trump's tweets influence volatility, or vice versa? This question can be discussed back and forth; however, with the results from this study we are not able to answer it. The reason is again that there is only one data point per day. The model is not able to differentiate between whether a tweet is the origin of a topic or whether the tweet simply comments on an already known topic; it is impossible to say whether the tweet contains disclosure or not. However, the fact that Donald Trump, in his role as President of the United States of America, has a wide reach of tens of millions of followers via Twitter alone makes the contents of his tweets more likely to have impact, regardless of whether they disclose any news or not.

6.2 Limitations

One of the most significant limitations of this study was the fact that observations of volatility were only gathered with day resolution. Data with higher resolution do exist but are expensive to obtain; apart from this, training a model on higher resolution data would also require more computational power than what was available during the writing of this thesis.

6.3 Conclusion

The aim of this study was to evaluate if it is possible to predict the VIX by choosing a set of regressors from President Trump's tweets. Modeling the tweets using words, it is clear that there is some form of correlation. The low adjusted R² of 0.45 means that, even if the coefficients are significant with low p-values, they cannot entirely explain the target variable. This is to be expected; a high R² would imply that Donald Trump has some control over stock market movements.

To make it clearer: if we want to predict some day's volatility based on what Trump tweeted that day, after computing the TF-IDF transformations and applying the model, the resulting value would not be very reliable, because the model cannot explain most of the variance; the predictions would not be very useful for estimating the true volatility. A linear model based on Trump's tweets alone would never be accurate enough to determine the VIX, but we can still get a clue about some trigger words that he uses, which may have a more or less tiny effect on the fluctuations.

The 79 words included as regressors in the model are, as of this time, May 2020, relevant and show a significant correlation with volatility on the American stock market. This means that applying the model in a real-time scenario would be possible, even though the predictions would often be far off the actual value. Twitter provides the ability to search and gather tweets in real time using its API. That said, the nature of the stock market and the media would force the model to be updated constantly. If it were not kept up to date, new key words for market volatility would fall outside the grasp of the model. The VIX has increased dramatically since the end of February 2020, and terms such as covid-19, coronavirus or pandemic, which as of April 2020 when this is written are highly relevant, would not affect the model at all. The only way to counteract this would be to keep the model constantly up to date, something that is computationally heavy and adds to the flaws of the model.

6.4 Further studies

In order to further investigate the connection between President Trump’s tweets and the stock market, one would need more detailed data on the stock market. More specifically, the resolution of VIX prices would have to be much higher in order to determine whether a specific tweet has a direct impact. Further studies could also build on the word stems presented in this thesis. One interesting approach would be to implement the presented words in a real-time trading algorithm. This would be possible if one were to determine the average implication of each word more precisely.

Another approach to improving the model would be to gather more data from Twitter, using more sources than Donald Trump. The Twitter API allows one to gather nearly unlimited data. Extracting data is free for up to seven days back in time, but gathering historical data from further back is also available at a cost. That said, having more inputs than just Donald Trump’s tweets would greatly increase the model’s potential. One would then be able to assess both the content and the validity of different sources.

Further improvement to the analysis could be achieved through more advanced techniques. One approach would be to combine tweet classification with natural language processing. This could be implemented by first labeling all tweets, for instance labeling every historical tweet as either market moving or non-market moving; this would be doable if one had market data with finer resolution. One could then use Random Forest, SVM, Naive Bayes or other classifiers to build a model that predicts the correct class. Once trained, the model would decide whether new tweets should be labeled as market moving or non-market moving. Using this approach and running training algorithms such as Random Forest is feasible given that one has, firstly, some criterion for labeling the tweets and, secondly, the computing power to train the model.
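The classification pipeline proposed above can be sketched in a few lines. This is a hypothetical illustration, not part of the thesis: the tweet texts and the market-moving labels are invented, and scikit-learn’s `RandomForestClassifier` is used as one of the classifiers the text mentions.

```python
# Hypothetical sketch of the proposed classification idea: label tweets as
# market moving (1) or not (0) and train a Random Forest on TF-IDF features.
# Texts and labels are invented purely for illustration.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = [
    "announcing new tariffs on all imports",
    "happy birthday to a great american",
    "trade deal talks have collapsed",
    "beautiful evening at the white house",
]
labels = [1, 0, 1, 0]  # 1 = market moving, 0 = non-market moving

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(tweets)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# Classify an unseen tweet as market moving or not
new_tweet = vectorizer.transform(["tariffs will be increased next week"])
print(clf.predict(new_tweet)[0])
```

In practice, as the paragraph notes, the hard part is not the classifier itself but obtaining finer-resolution market data from which trustworthy labels can be derived.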

Lastly, nonlinear models could be used to establish a better model. During the course of this study it has become clear that the question of correlation between Donald Trump’s tweets and market volatility is a complicated one, and that a linear regression model might be inadequate to explain it fully. Therefore, it would be of interest to extend the model to non-linear terms, possibly gaining a more detailed answer.

References

[1] Investopedia, Dutch Tulip Bulb Market Bubble Definition, https://www.investopedia.com/terms/d/dutch-tulip-bulb-market-bubble.asp, Online; accessed 21-May-2020

[2] Hyde Stuart J, Bredin Don P and Nguyen Nghia. Correlation Dynamics between Asia-Pacific, EU and US Stock Returns, 2007

[3] Cizeau et al. Volatility distribution in the S&P500 Stock Index, https://arxiv.org/abs/cond-mat/9708143 [cond-mat.stat-mech], Online; accessed 24-March-2020

[4] JP Morgan North America, Introducing the Volfefe Index, September 6 2019

[5] Hurd, Michael D and Rohwedder, Susann, Stock Price Expectations and Stock Trading,(2012), National Bureau of Economic Research, http://www.nber.org/papers/w17973, Online; accessed 20-May-2020

[6] Johan Bollen, Huina Mao, Xiaojun Zeng, Twitter mood predicts the stock market, Journal of Computational Science, Volume 2, Issue 1, 2011, Pages 1-8, ISSN 1877-7503, http://www.sciencedirect.com/science/article/pii/S187775031100007X, Online; accessed 20-May-2020

[7] Malkiel B.G. (1989) Efficient Market Hypothesis. In: Eatwell J., Milgate M., Newman P. (eds) Finance. The New Palgrave. Palgrave Macmillan, London

[8] Corporate Finance Institute,What is the Stock Market?, https://corporatefinanceinstitute.com/resources/knowledge/trading-investing/stock-market/ Online; accessed 05-May-2020

[9] R Harper, David (2020). Forces that move stock prices https://www.investopedia.com/articles/basics/04/100804.asp, Online; accessed 18- April-2020

[10] Masoud, Najeb. (2013). The Impact of Stock Market Performance upon Economic Growth. International Journal of Economics and Financial Issues. 3. 788-798.

[11] Keynes, John Maynard. (1936). The general theory of employment, interest and money. London

[12] Investopedia, Kenton, Will. What Is Market Psychology?, https://www.investopedia.com/terms/m/marketpsychology.asp, Online; accessed 18-May-2020

[13] Investopedia, Beers, Brian. How the News Affects Stock Prices, https://www.investopedia.com/ask/answers/155.asp, Online; accessed 18-May-2020

[14] Ormos, Mihály and Vázsonyi, Miklós. (2011). Impacts of Public News on Stock Market Prices: Evidence from S&P500. Interdisciplinary Journal of Research in Business. 1. 1-17.

[15] Alexander F. Wagner, Richard J. Zeckhauser and Alexandre Ziegler, 2018. Company stock price reactions to the 2016 election shock: Trump, taxes, and trade, Journal of Financial Economics, https://www.nber.org/papers/w23152.pdf, Online; accessed 05-May-2020

[16] Kuepper, Justin (2020). Volatility Definition https://www.investopedia.com/terms/v/volatility.asp Online; accessed 17-March-2020

[17] Chicago Board Options Exchange, VIX Index, http://www.cboe.com/vix, Online; accessed 21-Jan-2020

[18] Investopedia, S&P 500 Index – Standard & Poor’s 500 Index, https://www.investopedia.com/terms/s/sp500.asp, Online; accessed 22-March-2020

[19] For Financial Glossary: Asymmetric volatility. (n.d.) Financial Glossary. (2011). Retrieved April 20 2020 from https://financial-dictionary.thefreedictionary.com/Asymmetric+volatility Online; accessed 21-March-2020

[20] Wikipedia. Twitter, http://en.wikipedia.org/w/index.php?title=Twitter&oldid=952054004, 2020. Online; accessed 20-April-2020

[21] Twitter Q4 and Fiscal Year 2019, Earnings Highlights, https://investor.twitterinc.com/financial-information/quarterly-results/default.aspx Online; accessed 20-April-2020

[22] Ceron, A. et al. (2009). Every tweet counts? How sentiment analysis of social media can improve our knowledge of citizens’ political preferences with an application to Italy and France. In: New Media Society 16, pp. 340–358. http://nms.sagepub.com.focus.lib.kth.se/content/16/2/340.full.pdf+html, Online; accessed 18-April-2020

[23] Paltoglou, G. and M. Thelwall (2012). Twitter, MySpace, Digg: Unsupervised sentiment analysis in social media. In: ACM Transactions on Intelligent Systems and Technology 3.4. http://dl.acm.org.focus.lib.kth.se/citation.cfm?doid=2337542.2337551, Online; accessed 05-May-2020

[24] Montgomery, D.C., Peck, E.A. and Vining, G.G. (2012) Introduction to Linear Regression Analysis. Vol. 821, John Wiley & Sons, Hoboken.

[25] Qaiser, Shahzad and Ali, Ramsha. (2018). Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents. International Journal of Computer Applications.

[26] Aizawa, Akiko (2003). An information-theoretic perspective of tf–idf measures. Information Processing and Management. 39 (1): 45–65.

[27] Willett, Peter. (2006). The Porter stemming algorithm: Then and now. Program: electronic library and information systems.

[28] Uyar, A. (2009). Google stemming mechanisms. Journal of Information Science, 35(5), 499–514. https://doi.org/10.1177/1363459309336801 Online; accessed 18-March-2020

[29] Kaggle.com, Trump Tweets, Austin Reese, CC0 1.0 Public Domain Dedication. https://www.kaggle.com/austinreese/trump-tweets (trumptweets.csv), Online; accessed 21-Jan-2020

[30] www.cboe.com, VIX Index Historical Data, http://www.cboe.com/products/vix-index-volatility/vix-options-and-futures/vix-index/vix-historical-data, Online; accessed 21-Jan-2020

[31] www.quanteda.io Stopwords, https://quanteda.io/reference/stopwords.html Online; accessed 8-March-2020

[32] Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. An Introduction to Statistical Learning: with Applications in R. New York: Springer, 2013 (section 6.1.3)


