DEGREE PROJECT IN FINANCE PROGRAM: REAL ESTATE AND FINANCE FIRST CYCLE, 15 CREDITS STOCKHOLM, SWEDEN 2020

Coronavirus-Related Sentiment and Stock Prices: Measuring Sentiment Effects on Swedish Stock Indices

Olga Piksina Patricia Vernholmen

KTH

INSTITUTIONEN FÖR FASTIGHETER OCH BYGGANDE

Bachelor of Science Thesis

Title: Coronavirus-Related Sentiment and Stock Prices: Measuring Sentiment Effects on Swedish Stock Indices

Authors: Olga Piksina, Patricia Vernholmen

Institution: Institution of Real Estate and Construction Management

Bachelor Thesis number: TRITA-ABE-MBT-20482

Archive number:

Supervisor: Andreas Fili

Keywords: market sentiment, behavioural finance, market efficiency, coronavirus, Swedish stock market, text analytics, sentiment analysis, news mining

Abstract

This thesis examines the effect of coronavirus-related sentiment on Swedish stock market returns during the coronavirus pandemic. We study returns on the large cap and small cap price indices OMXSLCPI and OMXSSCPI during the period January 2, 2020 – April 30, 2020. Coronavirus sentiment proxies are constructed from news articles clustered into topics using latent Dirichlet allocation and scored through sentiment analysis. The impact of the sentiment proxies on the stock indices is then measured using a dynamic multiple regression model. The results show that the proxies representing fundamental changes in our model (Swedish Politics and Economic Policy) have a strongly significant impact on the returns of both indices, which is consistent with financial theory. We also find that the sentiment proxies Sport and Coronavirus Spread are statistically significant and impact Swedish stock prices. This implies that coronavirus-related news influenced market sentiment in Sweden during the research period and could be exploited to uncover arbitrage opportunities. Finally, the amount of sentiment-inducing news published daily is shown to have an impact on stock price volatility.

Examensarbete kandidatnivå

Titel: Coronavirus-relaterat sentiment och aktiepriser: En studie av sentimenteffekter på svenska aktieindex

Författare: Olga Piksina, Patricia Vernholmen

Institution: Fastigheter och Byggande

Examensarbete kandidatnivå nummer: TRITA-ABE-MBT-20482

Arkiv nummer:

Handledare: Andreas Fili

Nyckelord: marknadssentiment, beteendefinans, marknadseffektivitet, coronaviruset, svensk aktiemarknad, textanalys, sentimentanalys, news mining

Sammanfattning

Denna studie undersöker den effekt coronavirus-relaterat sentiment haft på avkastningen på svenska aktieindex under coronaviruspandemin. Vi studerar avkastningen på large cap- och small cap-prisindexen OMXSLCPI och OMXSSCPI under perioden 2 januari 2020 – 30 april 2020. Proxier för coronavirus- sentiment konstrueras från nyhetsartiklar som klustrats i ämnen genom latent Dirichlet-allokering och poängsatts genom sentimentanalys. Sentimentproxiernas påverkan på aktieindexen mäts sedan med en dynamisk multipel regressionsmodell. Resultaten visar att proxierna som representerar fundamentala förändringar i vår modell — svensk politik och ekonomisk policy — har en starkt signifikant inverkan på avkastningen på båda indexen, vilket är konsekvent med finansiell teori. Vi finner även att sentimentproxierna sport och spridning av coronaviruset är statistiskt signifikanta i sin påverkan på svenska aktiepriser. Detta innebär att coronavirus-relaterade nyheter påverkade marknadssentiment i Sverige under undersökningsperioden och skulle kunna användas för att upptäcka arbitrage. Slutligen visas mängden sentimentframkallande nyheter publicerade per dag ha en inverkan på aktieprisvolatilitet.

Acknowledgements

We would like to extend our genuine gratitude to our supervisor, Dr. Andreas Fili, whose guidance and support made this work possible. We wish to express our sincere thanks to Dr. Bertram Steininger for sharing meaningful insights into text analytics and recent research in the field. We would also like to thank Dr. Olga Rud for her recommendations on the material used for this study. Our special thanks go to Stephen Rosewarne, who kindly agreed to proofread this thesis. Furthermore, we are thankful to our family members and friends Christoffer Linné, Marina and Laura Vernholmen, Manne Svensson, Helene Törnqvist and Leo P. Thank you all for your unwavering support and inspiration.

Table of Contents

1. INTRODUCTION
1.1 Research Purpose and Questions
1.2 Contribution to the Field
1.3 Disposition

2. REVIEW OF THE LITERATURE
2.1 Fundamental Analysis vs. Technical Analysis
2.2 Efficient Market Hypothesis
2.3 Behavioural Finance
2.4 Event Studies and EMH
2.5 Text Analytics in Finance
2.5.1 Sentiment Analysis through Computational Linguistics

3. METHOD AND MATERIALS
3.1 Description
3.2 Limitations
3.3 Stock Indices
3.4 News
3.4.1 Collection
3.4.2 Preprocessing Textual Data
3.4.3 Topic Modelling and Scoring
3.4.4 Sentiment Proxies
3.4.5 Sentiment Analysis
3.4.6 Allocation of News to Dates
3.5 Economic Indicators
3.6 Multiple Regression
3.6.1 Model Specification

4. RESULTS
4.1 Autocorrelation Analysis
4.2 Cross-Correlation Matrices
4.3 Specified Regression Model
4.3.1 Variance Inflation Factors
4.4 Regression Outputs
4.5 Market Volatility and Coronavirus-Related News

5. ANALYSIS
5.1 Sustainability Aspects
5.2 Further Research

6. CONCLUSION

REFERENCES

Terminology list

OMXSLCPI Price index of all Large Cap companies listed on the Stockholm Stock Exchange (market value of 1 billion euro or more; Nasdaq, 2020).

OMXSSCPI Price index of all Small Cap companies listed on the Stockholm Stock Exchange (market value below 150 million euro; Nasdaq, 2020).

Market Sentiment “[...] a belief about future cash flows and risks that is not justified by the facts at hand” (Baker and Wurgler, 2007). Market value of an asset = fundamental value + sentiment value.

Proxy A proxy is “a variable used instead of the variable of interest when that variable of interest cannot be measured directly” (Oxford University Press, 2009). Proxies used in this study fall into two mutually exclusive categories: 1. Fundamental proxies, representing news which can cause fundamental change to asset values, and 2. Sentiment proxies, which reflect sentiment. Proxies belonging to one category have no influence on the other.

Web Scraping Automated gathering of data from the internet through any means other than a program interacting with an API (Mitchell, 2018).

Latent Dirichlet Allocation (LDA) Unsupervised algorithm used to cluster previously unlabelled text data according to topics (a method known as topic modelling). It finds the most common words appearing in the text and clusters them, thus uncovering themes in text (see Blei, Ng and Jordan, 2003).

Text Analytics “[...] large-scale, automated processing of plain text language in digital form to extract data that is converted into useful quantitative or qualitative information” (Das, 2014).

Corpus Collection of text documents which can be readily processed in an automated way.

Tokenisation Part of data preprocessing identifying basic units, known as tokens, in text corpora. Some methods tokenise by words or entities delimited by blank spaces while others make tokens of more complex entities such as idioms or expressions (Webster and Kit, 1992).

1. Introduction

On the 11th of March, 2020, the World Health Organisation (WHO) declared the novel coronavirus disease 2019 (COVID-19) outbreak a global pandemic. The outbreak originated in Wuhan, China in December 2019 and has since spread throughout the entire world. In addition to a serious health emergency, the spread of the disease in the majority of the world’s countries has led to a deep economic crisis predicted by many to become a recession similar to the Great Depression. Stock markets all over the globe have reacted with varying degrees of panic, and widespread future uncertainty has resulted in two of the largest single day drops in the Dow Jones Industrial Average. The Swedish stock market has also experienced days of historic decline (see Figure 1). At the time of writing, the pandemic is ongoing and there is no clear outlook on how it is going to develop or when it will end.

The coronavirus outbreak has also been remarkable due to its receiving unprecedented, near total media coverage across the globe. With a huge part of the world’s population being isolated or confined to their homes, the current pandemic has become a unique event with news rapidly spreading around the world. Information on the spread of the coronavirus and measures taken by governments in response to the crisis have alternated with news of skyrocketing unemployment rates, industries at risk of collapse, and the total economic impact of the pandemic on the global community.

Our primary hypothesis is that non-economic news has made a significant impact on how investors on the Swedish stock market have valued assets during the pandemic. We assume movement on the market is influenced by blanket media coverage of the pandemic as well as by the various measures the Swedish government and central bank have introduced in response to the crisis. Our hypothesis has its basis in behavioural finance theory, which indicates that investors’ mood, fear and emotions affect the decision-making process (Kahneman and Tversky, 1979; Statman, 1995; Shleifer and Vishny, 1997; Donadelli, Kizys and Riedel, 2017; Bukovina, 2016).

1.1 Research Purpose and Questions

In this paper, our aim is to analyse how coronavirus-related news has impacted the stock market in Sweden during the period January 2, 2020 – April 30, 2020, when several European countries and the United States became epicentres of the COVID-19 pandemic.

This study aims to address the following questions:

1. Did coronavirus-related news in Sweden generate sentiment that could be observed on the Swedish stock market?
2. Which type(s) of news had the greatest impact on stock returns?
3. Is it possible to use sentiment proxies extracted from coronavirus-related news to make profitable trades?

Figure 1: Source: NASDAQ Nordic, http://www.nasdaqomxnordic.com/

* News of the WHO’s pandemic declaration was first reported in the media after 17:00 CET on the evening of 11th March, 2020. The arrow points to the following day (12th March) because these first news reports could not affect stock prices on the day of the announcement.

1.2 Contribution to the Field

Sentiment analysis is a relatively new method of studying financial markets. The emerging field of behavioural finance has provided analysts with numerous studies on how sentiment impacts stock prices (Baker and Wurgler, 2007; Kaplanski and Levy, 2008; Statman, 2014). Investor sentiment is presumably unique to each market, which is why applying known sentiment effects from one market to another is ill-advised (Lang and Schaefers, 2015). This work contributes to existing research with an empirical study that analyses the effects of coronavirus-related news on Swedish stock prices, combining text analytics techniques with multiple regression methodology. This exploratory study deals with a unique and extreme sequence of events, and it is unclear whether these findings will prove useful to future sentiment research. However, we believe our conclusions could be of interest for future research on Swedish stock market changes related to media coverage of low-probability, high-consequence events.

1.3 Disposition

The structure of this thesis is as follows: Section 2 reviews relevant literature to present the current state of research. Section 3 details our research method. Section 4 presents our empirical findings which are then analysed in Section 5. Finally, Section 6 includes discussion of our results and conclusions.

2. Review of the Literature

2.1 Fundamental Analysis vs. Technical Analysis

Stock markets have always attracted investors wishing to grow their capital. Due to their dynamic and volatile nature, stock market investments are associated with high levels of risk. Financial analysis has been used for decades to understand the movement of capital markets and to forecast stock price development. When analysing capital markets to support their decisions to buy, sell or hold securities, investors have mainly used two techniques: fundamental analysis and technical analysis. Which type of analysis investors choose depends on what they believe about the characteristics of the market. According to Murphy (1999), both methodologies aim to satisfy the same need, namely understanding in which direction the market moves. The only difference is that a fundamentalist would want to know why the market behaves as it does, while a technician would solely analyse market action itself (Murphy, 1999).

Fundamental analysis aims to identify mispriced assets, thus the main belief of fundamentalists is that the market prices are often incorrect. The main purpose of this type of analysis is to identify the intrinsic value of securities and compare it with the actual market price. The intrinsic value is set at the equilibrium on the market by supply and demand forces (Griffioen, 2003). A fundamental analyst would look at companies’ financial statements and calculate important multiples and ratios. While this form of economic analysis is time consuming and tedious, it is not sufficient to provide a complete picture of why an asset may be mispriced. Other aspects of fundamental analysis include broader industry analysis, and subjecting individual companies to deeper levels of scrutiny (Griffioen, 2003). The fundamental analysis is therefore based not only on mathematical calculations and financial statements, but also on analysts’ knowledge of the market and on their assumptions and beliefs. This could hold as an explanation of why there are investors on the market willing to buy and sell the same assets at the same time. Investors presumably reach different interpretations and conclusions when presented with the same information.

Technical analysis is based on the premise that market prices move in trends and that those trends tend to repeat, leading to market swings. Technicians believe that market prices react to multiple factors and that analysing the causes of market moves would be excessive, because stock prices are instantaneously corrected in response to these underlying factors. What matters to technicians is recognising the direction of the market and, even more importantly, doing so before others recognise the move. Thus technical analysts assume that a fall in the price of a security can be derived from higher supply of or lower demand for that security, which in turn reflects changes in the fundamentals (Murphy, 1999). Technical analysis is therefore limited to studying charts and graphs of past market price (return) movements to make predictions about the future.

2.2 Efficient Market Hypothesis

While a detailed description of the evolution of traditional finance lies outside the scope of this study, one of the most important assumptions of standard finance needs to be mentioned. According to traditional finance theories, investors act rationally and always aim to maximise their profits while minimising their risks (De Bondt, 1995). The efficient market hypothesis (EMH) introduced and described by Eugene Fama represents one of the pillars of standard finance. The theory suggests that no analysis of publicly available information can be used to outperform the market, since stock prices adjust to new information instantaneously and without bias, making the idea of ‘beating the market’ utopian. The theory implies that financial analysts are worthless and that the best investment strategy would simply be to follow market indices. To gain abnormal returns, investors would need to have access to information that is not known to the public, that is, insider information (Fama, 1970).

EMH stands out among traditional finance theories because it recognises the existence of irrational investors. However, Fama concludes that irrational investors’ trading is insufficient in volume to impact asset prices significantly. He points out that such investors would quickly be corrected by arbitrageurs forcing security prices to move to their fundamental values. Fama also suggests that not all market participants have to process the entire body of information available in the market, because even a significant proportion of informed investors would take care of the efficient pricing of market securities (Fama, 1970).

Whether financial analysis is capable of providing investors with necessary insights for outperforming the market has been highly debatable. De Bondt (1995) calls financial analysts’ roles paradoxical. He wonders, on the one hand, how efficient market hypothesis proponents could explain the existence of well paid financial analysts on the market if their job is worthless. On the other hand, he implies a competing view of an irrational market has difficulty explaining why professional investors consistently fail to beat market indices. At the AIMR conference in 1995 De Bondt said: “Despite its many insights, modern finance offers only a set of asset-pricing theories for which no empirical support exists and a set of empirical facts for which no theory exists.”

2.3 Behavioural Finance

In standard or traditional finance humans are rational, therefore they aim to maximise their utility taking as little risk as possible. Pricing assets on capital markets is thus an unemotional and straightforward process and a correct price of an asset should be equal to the discounted present value of all future cash flows. For a period of time, traditional finance theories were used to explain market movements and develop investment strategies. The reality, however, has been steadily challenging the standard finance assumption of rational and unemotional market participants. In the light of efficient markets, events such as price bubbles (e.g. the Swedish real estate bubble of the 1990s, the dot-com bubble) or stock market crashes (e.g. the Black Monday crash of October 1987) have not been able to find satisfactory explanations in the field of traditional finance. This has given rise to a new finance paradigm where irrationality of people, individual biases and cognitive factors hit the spotlight.

In an attempt to explain market anomalies and improve market development analysis, financial researchers came to direct their views to other disciplines. Behavioural finance then emerged as an innovative new discipline combining aspects of sociology, psychology, anthropology and finance (Ricciardi and Simon, 2000) with the focus of studying the behaviour of people and the transmission of this behaviour into capital markets. In particular, this newer discipline started challenging the notions of standard finance such as human rationality and efficiency of the markets, suggesting there are other factors explaining market anomalies and volatility. As Statman (1995, p.15) put it: “People in standard finance are rational. People in behavioural finance are normal.”

Early research in the field presented the modern financial paradigm with behavioural investors who were not consistent in their attitudes towards risk. According to prospect theory, people are risk-averse in the domain of gains and risk-seeking in the domain of losses (Kahneman and Tversky, 1979). Simply put, investors feel a loss more strongly than a gain of the same amount of money, and are therefore prepared to pay to avoid losses as readily as they pay to generate profits. Prospect theory, with its empirical evidence, has since become a central concept in behavioural finance.

Another finding of behavioural finance is that investors exhibit bounded rationality; in other words, their decisions are limited by cognitive mistakes, psychological biases and emotions. The existence of fully rational investors cannot be verified in reality (Shleifer and Vishny, 1997). A constantly growing number of market participants has also come to influence market movements. Trading is no longer exclusively accessible to institutional investors. Developing technology and a growing number of trading platforms have given rise to non-professional “hobby” traders. They trade more often and act unpredictably, making markets even more volatile. Their access to professional financial forums is limited, and they often turn to information sources such as news outlets and social media to support their decisions on the stock market. These retail investors are also more likely to trade on market sentiment (Bukovina, 2016; Baker and Wurgler, 2007).

2.3.1 Market Sentiment

In the field of finance, sentiment is often described as the mood or emotions of people which may influence capital markets. This implies that asset pricing should not be fully associated with fundamental changes in the economy or in individual securities. Kaplanski and Levy (2008) state that sentiment is a much broader concept and could be described as “any misperception leading to asset mispricing”. Mood and fear can therefore be considered examples of how the potential for irrational behaviour in investors can result in shifts in market sentiment. Baker and Wurgler (2007) define market sentiment as “investors’ beliefs” about future returns and risks when these beliefs are not necessarily supported by available information related to fundamentals. In this paper we use “market sentiment” in the broader sense described in the aforementioned works, so that any factor beyond changes in fundamentals is seen as sentiment.

Behavioural finance studies have aimed at better understanding whether market sentiment affects investors’ decisions when pricing assets. Statman (2014) concluded that the phenomenon of investors selling securities when the market reaches its bottom could be explained by fear of growing risk and lower expectations of future returns. He also explains that many investors tend to buy securities when the market is already overheated, because excitement mitigates risk perception and exaggerates expectations of future returns. These expectations could probably explain the irrational exuberance behind speculative stock market bubbles much better than standard finance theories could. In a similar way, the disposition effect theory is based on a number of misleading emotions that influence investors’ decisions. Regret at realising losses is described as an emotion that leads investors to hold on to losing stocks for longer than is rational. Pride and thrill, on the other hand, make investors sell winning stocks too early, in a hurry to realise their gains (Shefrin and Statman, 1984). Symeonidis, Daskalakis and Markellos (2010) inferred that sunny weather could be associated with increased volatility of the markets in the US. They conclude that good weather could result in good moods and increased communication among investors, driving up trading volumes and thereby volatility. Events such as disease outbreaks or epidemics are known to increase the overall level of anxiety and pessimism in society and thus create negative sentiment (Donadelli, Kizys and Riedel, 2017). Evidence that financial decisions are driven by sentiment is plentiful, as behavioural finance studies provide results confirming a relationship between stock returns and market sentiment.

Sentiment studies of recent years often do not question whether sentiment has an impact on stock prices. Researchers have seemingly accepted that it does. A new direction of research does not simply study sentiment impact on the stock market but also points out that different stocks are disproportionately reactive to market sentiment. Baker and Wurgler (2007) ascertained that stocks of smaller, younger, high volatility and growth companies are more prone to market sentiment than large cap and mature stocks. They explain this disproportional sensitivity depends on two factors: smaller and younger stocks are more difficult to arbitrage and value. Baker and Wurgler (2007) pointed out that during the dot-com bubble, the majority of speculative stocks were small start-ups with no historical data to lean against, thus valuation mistakes were very probable considering general excitement around the Internet at that time. Consequently, while studying market sentiment, it is more reasonable to look at small and large cap stocks separately than analyse dependency of aggregate stock prices on sentiment.

The presence of market sentiment raises the question of how it could be extracted, measured and analysed. Studying news and social media content to extract market sentiment has become common in recent behavioural research. Many attempts have been made to use qualitative textual content for quantitative analysis and predictions (Bukovina, 2016). Researchers have been steadily studying how financial news could be processed and categorised with the help of different computer-based techniques (Lee et al., 2014). The amount of data involved is enormous, and its effective and quick analysis by humans is no longer feasible. Traders see an advantage in developing algorithms for textual data analysis and in creating predictive stock price models and automated trading systems (Atkins, Niranjan and Gerding, 2018). Evidence suggests such models can improve on the stock price predictions made by traditional financial analysts.

Studies on how non-financial news influences market sentiment and stock markets are numerous but plagued by a persistent difficulty in extracting and distilling the news relevant to the analysis. These studies have also most often analysed market sentiment driven by social media and have been carried out on the US market. It is therefore questionable whether the results can be replicated on other markets. Moreover, research results on the impact of sentiment on stock markets are contradictory (Lang and Schaefers, 2015) and suggest that market sentiment can change over time. Sentiment also appears to differ depending on event, market and even culture. Kaplanski and Levy (2014) used the example of football sentiment during the 2010 FIFA World Cup to show how sophisticated investors adjusted their trading strategies and weakened the football sentiment effect on the US stock market in the last stage of the tournament. A real challenge, however, has been to create a model that reliably establishes a connection between news and evolving stock prices, because adjustments for fundamental changes in the economy have to be made. The authors suggested that when studying market sentiment, one should analyse economic news in terms of its potential for both positive and negative impact, and include the economic news as a non-sentiment fundamental variable that impacts stock prices in the predictive model.

Another difficulty for sentiment analysis is to distinguish what market sentiment at different periods of time consists of, and which component of this sentiment plays the most significant role. Baker and Wurgler (2007) propose measuring market sentiment by breaking it down into sentiment proxies. According to them, extracting potential sentiment proxies can be useful for future models measuring sentiment impact on stock markets, even though these proxies are imperfect and noisy. They suggest combining different imperfect measures such as surveys on investors’ beliefs, option implied volatility measures, trading volume, retail investor trades and mood proxies. Baker and Wurgler (2007) have constructed a sentiment index level that includes several of these proxies to smooth out idiosyncrasies. Studying existing research on sentiment measures we arrive at a conclusion that there are not many established methods at the moment, and that researchers are actively exploring new methods by merging existing models and constructing their own ones.

2.4 Event Studies and EMH

One of the most frequently used methods to determine whether market sentiment related to a certain event is effectively incorporated into stock prices is the event study method. Event studies are most suitable for analysing the sentiment effect of a single event or a series of rare events on a stock’s price or a sector’s returns. One must define the event and estimation windows and subsequently calculate the stock’s abnormal return. The abnormal return is the stock’s return over the event window minus the expected return of the stock over the event window (MacKinlay, 1997). There are several methods of calculating abnormal returns in an event study. MacKinlay (1997) suggests calculating expected returns with either a constant mean return model or a market model. Measuring and analysing the stock’s abnormal returns provides a researcher with insights into whether the market has efficiently incorporated the event into the stock’s price. Discussion of an event study often deals with the implications of EMH and the market’s capacity to assess relevant information rationally and in a timely manner. Some events are more commonplace than others and are generally easier to analyse. It is not surprising that event studies often arrive at different conclusions in regard to market efficiency.
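The abnormal return calculation described above can be illustrated with a minimal Python sketch. It assumes the market model variant, in which the expected return is a linear function of the market return, fitted by ordinary least squares on the estimation window. The function names and all return figures are hypothetical and chosen only for illustration; they are not data or code from this study.

```python
from statistics import mean


def fit_market_model(stock_returns, market_returns):
    """OLS estimates of alpha and beta in: stock = alpha + beta * market + error."""
    m_bar = mean(market_returns)
    s_bar = mean(stock_returns)
    cov = sum((m - m_bar) * (s - s_bar)
              for m, s in zip(market_returns, stock_returns))
    var = sum((m - m_bar) ** 2 for m in market_returns)
    beta = cov / var
    alpha = s_bar - beta * m_bar
    return alpha, beta


def abnormal_returns(stock_event, market_event, alpha, beta):
    """Actual return minus the return expected under the fitted market model."""
    return [s - (alpha + beta * m) for s, m in zip(stock_event, market_event)]


# Hypothetical daily returns for the estimation window (before the event).
est_market = [0.01, -0.02, 0.015, 0.0, -0.005]
est_stock = [0.001 + 0.5 * m for m in est_market]  # lies exactly on a line

# Hypothetical daily returns for a two-day event window.
evt_market = [-0.03, 0.01]
evt_stock = [-0.05, 0.02]

alpha, beta = fit_market_model(est_stock, est_market)
ar = abnormal_returns(evt_stock, evt_market, alpha, beta)
car = sum(ar)  # cumulative abnormal return over the event window

print(f"alpha={alpha:.4f}, beta={beta:.2f}")
print("abnormal returns:", [round(x, 4) for x in ar])
print("cumulative abnormal return:", round(car, 4))
```

Under the constant mean return model, the expected return would instead be the average stock return over the estimation window, which corresponds to fixing beta at zero in the sketch.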

2.5 Text Analytics in Finance

The importance of analysing text springs from its “nuances and behavioural expression which is not possible to convey using numbers” (Das, 2014, p.4). Text analysis is used to convert textual data to quantitative or qualitative information. It includes everything from simple methods for summarising and visualising large bodies of text in order to make them easier to comprehend, to complex methods of quantifying vast amounts of unstructured text data. Depending on the goal, there are many different tools which can be employed. Text analytics in finance has primarily focused on measuring effects on stock prices and indices as well as analysing corporate reports (Cohen, Malloy and Nguyen, 2020). It can be performed using dictionaries, lexicons or machine learning (Das, 2014). The choice of method depends on the type of inputs and the desired outputs. In machine learning, for example, a regression model is used to predict continuous output variables, whereas classification models are used for discrete output variables. Also, supervised models are used for classification with prespecified outputs, whereas unsupervised ones are used for clustering inputs in previously unspecified ways (Mitchell, 1997; Das, 2014; Cohen, Malloy and Nguyen, 2020).

Several researchers have underlined the usefulness of unsupervised machine learning models on financial data, in quantifying and visualising financial stability tendencies (Li, et al., 2017), modelling the structure of the stock market (Doyle and Elkan, 2009) and specifically on financial text data for which no a priori categorisation of their content exists (Feuerriegel and Pröllochs, 2018). One example of an unsupervised machine learning model for clustering is topic modelling using Latent Dirichlet Allocation (LDA). It is a generative probabilistic model for collections of discrete data such as text corpora, which finds the words that occur the most throughout the corpus, clusters them into topics and calculates the probability of each document belonging to each topic (Blei, Ng and Jordan, 2003). Feuerriegel and Pröllochs (2018) measured the impact of topics within corporate filings on the stock market to identify topics which are of relevance to investors. They motivated the use of LDA by the fact that previous studies on the effect of specific disclosure topics on the market (Tetlock, 2007; Vuolteenaho, 2002 and Chan, 2003 cited in Feuerriegel and Pröllochs, 2018, p. 3) had evaluated the effect of one topic at a time and ignored disclosures not belonging to any of the given topics. They stated the advantages of employing LDA as avoiding subjective bias due to manual topic extraction, greater flexibility in selecting topics in accordance with the text corpus, and the ability to process vast amounts of text, which would be “prohibitively difficult and costly with manual labelling” (p.4).
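As a compact reminder of what LDA assumes, the generative process of the basic model in Blei, Ng and Jordan (2003) can be written as below. The notation is the commonly used one and is an assumption made here for illustration: \alpha is the Dirichlet hyperparameter, \theta_d the topic proportions of document d, z_{d,n} the topic assigned to the n-th word of document d, and \beta_k the word distribution of topic k. It sketches the model only, not the implementation used in this thesis.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha)
  && \text{topic proportions of document } d \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d)
  && \text{topic of the } n\text{-th word in } d \\
w_{d,n} \mid z_{d,n} &\sim \mathrm{Multinomial}(\beta_{z_{d,n}})
  && \text{word drawn from that topic's word distribution}
\end{align*}
```

Fitting the model inverts this process: given only the observed words w_{d,n}, it estimates the \beta_k (the words that characterise each topic) and the \theta_d (the probability of each document belonging to each topic).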

2.5.1 Sentiment Analysis through Computational Linguistics

Another form of text analysis, which has recently been highly recognised for its usefulness among academics and business people, is automated sentiment analysis. Part of mathematical language theory known as quantitative linguistics, it is a way of extracting subjective expressions from unstructured text and classifying them according to their sentiment (Alessia, et al., 2015). Many programming languages support this type of analysis, e.g. R with the packages dplyr, tidyr (Wickham and Henry, 2020), textdata (Hvitfeldt and Silge, 2020) and tidytext (Robinson and Silge, 2020), to mention a few. The steps in sentiment analysis, as described by Alessia, et al. (2015), are data collection, text preparation, sentiment detection, sentiment classification and presentation of output. They can be carried out using a lexicon-based approach, a machine learning-based approach or a hybrid of the two. Lexicons are dictionaries in which each word is given a sentiment orientation or score. The lexicon is joined with preprocessed, so-called “tidy” data (Wickham and Henry, 2020) to determine the affective content of the text and its polarity (Devitt and Ahmad, 2007). Sentiment analysis processes within computational linguistics have previously been widely used in financial contexts, with a few examples including forecasting stock prices (Tetlock, 2007; Day and Lee, 2016) and analysis of the market’s response to sentiment in financial press releases (Federal Reserve Bank of St. Louis, 2006).


3. Method and Materials

3.1 Description

Our findings were expected to explain whether or not investors’ perceptions of news on the disease, mortality and the sequence of events associated with the coronavirus influenced the behaviour of the Swedish stock market. Keeping in mind that small and large cap stocks can move differently and exhibit different levels of volatility, we considered them separately by analysing both OMXSSCPI and OMXSLCPI indices. Our model was intended to shed light upon the types of coronavirus-related news that had the biggest impact on investor sentiment and explore the possibility of using those news categories to adjust future trading strategies. We aimed to examine whether uncommon events may bring with them new arbitrage opportunities.

The impact of coronavirus-related sentiment on the Swedish stock indices OMXSSCPI and OMXSLCPI was measured using a multiple regression model. The independent variables were created through a three-step process. First, news articles related to the coronavirus were gathered as a source of text containing fundamentals and sentiment. Second, relevant news topics were chosen automatically from the text data, by their degree of importance in the total body of extracted news, through the use of latent Dirichlet allocation. The topics were weighted for each news article in accordance with probability measures, giving articles topic scores. Third, each news article was assigned a sentiment score using sentiment analysis to account for negative or positive events in each topic. The topic score was then combined with the sentiment score to provide an overall article-specific proxy score. The proxy scores of all articles allocated to the same date were then merged to create a single daily observation per business day. Finally, daily returns on OMXSSCPI and OMXSLCPI were regressed on the sentiment proxy variables.

A possible source of uncertainty in the models was that the news sentiment variables might show a certain degree of multicollinearity; e.g. news about deaths might affect the probability of upcoming news about restrictions aimed at containing the spread of the virus. This was assessed using the variance inflation factor (VIF), which indicates how strongly each independent variable is correlated with the other independent variables (Fox and Weisberg, 2018). Statistical software calculated a VIF for each independent variable.

A commonly used method for examining the impact of news on returns is the event study method previously described in the literature review. However, this method was not employed in the study of the impact of coronavirus-related news due to the daily media coverage. The extensive supply of news articles made it nearly impossible to isolate individual events from each other. Due to this overlap, an event window could not be constructed successfully and meaningful results could not be obtained. Also, an event study utilises excess returns on the observed stock or index compared to another index. This was not feasible to assess because the coronavirus is a global series of events, meaning there could be no comparison index left unaffected.

3.2 Limitations

Conducting this study in the midst of the ongoing pandemic posed a limitation as well as an advantage. Because the crisis had only just begun, the news showed only its initial effects and strong panic, without the crisis reversal that would be needed to fully examine the situation from a time-series perspective. In analysing the text data, a significant amount of noise was present due to the data being collected in real time. However, performing this research during the pandemic is also what makes it so valuable. At the beginning, one can make predictions and get results, then compare them to see what has changed and work on iteratively improving the accuracy of the model. This also demonstrates how models can be designed to account for the fact that they are analysing a continuously developing situation.

The lack of prior research on the topic is clear. The idea of analysing the impact of market sentiment on stock prices is not new, and numerous studies have been published over several decades, using both theoretical-manual and automated approaches such as the aforementioned event study and text analysis methodologies. However, to the best of our knowledge, there are very few studies, if any, analysing the effects of news-derived sentiment on the Swedish stock market in the context of uncommon events. The coronavirus pandemic is a unique event for the entire world, and particularly challenging for Sweden, because an analogy between the disease spread and other tragic events like natural disasters, disease outbreaks or wars cannot be drawn. Sweden has not experienced any calamities of this nature in modern times. In our study, we thus rely on international research with a prevalence of evidence from the United States. We recognise that assumptions about market inefficiency, weather influencing stock market prices or the behaviour of influential figures might not have the same effect in Sweden as they would in other countries.

We were also limited by the shortage of available data in performing the text analysis. Analysing news articles published in Swedish, we encountered not only a lack of notable Swedish dictionaries for sentiment analysis, but also of an epidemic-related lexicon in particular. Creating such a lexicon would require a study of its own and was outside our scope. In light of this limitation, we translated the news articles from Swedish to English using Microsoft Translator. We acknowledge this might have contributed to uncertain results in topic modelling; however, manual random sample controls yielded satisfactory results with the meaning of the articles intact.

As mentioned in the description of our methodology above, we recognise that stock price patterns are influenced by real changes in the economy and not primarily by sentiment. A reliable economic indicator index consisting of main economic factors, such as employment rate, GDP growth, consumption of durables and non-durables, service consumption and production index, is essential for building a robust statistical model. At this moment, such an index is not present in Sweden and is not feasible to construct from the data available due to a difference in time-series frequencies for the different indicators. For our work, we would have needed daily indicator data; however, the Swedish official statistics agency publishes these figures on monthly and quarterly bases. We recognise that using economic and political news articles published in a daily newspaper to model real changes in the Swedish economy is not ideal and acknowledge there is room to address this particular issue in a better way.

3.3 Stock Indices

Historical data for the Stockholm Small Cap and Large Cap price indices (OMXSSCPI and OMXSLCPI) were collected from the Nasdaq Nordic website (The Nasdaq Group, 2020) for the period of January 2nd, 2020, to April 30th, 2020.

3.4 News

The data source for collection of the news data was chosen with the target group in mind: investors in the Swedish stock market. Considering the reliability of the source as well as its readability and popularity among the target group, Dagens Nyheter, a renowned Swedish news site, was chosen. The site has a subsection aggregating news articles covering the coronavirus, containing both economic and non-economic news (Dagens Nyheter, 2020).

3.4.1 Collection

News articles on websites are unstructured strings of text. In order to use the data for text analysis, the website was scraped using the Google Chrome extension tool “Web Scraper” (Web Scraper, 2020). Using this tool, the text was extracted from the news website through the site’s CSS selectors (see the selector graph in Figure 2) and rendered as structured text in a .csv file. The start URL was https://www.dn.se/om/det-nya-coronaviruset/. Pagination was used to navigate to each page of the website to extract all news to one file.


Figure 2: Selector graph over the coronavirus news sitemap. TitleLink denotes the link to each news article on the start page and Text the content of the article.

3.4.2 Preprocessing Textual Data

Once the data was collected, it underwent preprocessing in order to be readily usable by quantitative linguistics programs. The text rendered from web scraping was as presented in the CSS selectors, containing punctuation, numbers and both upper and lower casing. The text was stripped of all of these. Following that, the text was tokenised using a whitespace tokeniser, which treats each part of the text separated by blank spaces as a token. The tokens were later stemmed using a Swedish Snowball stemmer, merging all tokens with the same word stem into the same token. Stop words, extremely common words of little value to the text (Manning, Raghavan and Schütze, 2008), were omitted. The preprocessing method differed for the topic modelling and sentiment analysis and will be described more in-depth in the following sub-chapters for each method.
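The cleaning steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `preprocess` and the tiny stop-word set are hypothetical, and stemming is left out because it would require a third-party Snowball stemmer.

```python
import re

# Hypothetical, very small stop-word set for illustration only;
# the study used a sourced Swedish stop-word list.
STOP_WORDS = {"och", "att", "det", "som", "en", "i", "på", "är", "av", "för"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation and digits, split on whitespace, drop stop words."""
    text = text.lower()
    # Replace everything that is not a letter (including å, ä, ö, é) or whitespace.
    text = re.sub(r"[^a-zåäöé\s]", " ", text)
    tokens = text.split()  # whitespace tokenisation
    return [t for t in tokens if t not in stop_words]


tokens = preprocess("Regeringen presenterade 3 nya åtgärder i dag, och det är bra!")
print(tokens)
```

Replacing punctuation with a space, rather than deleting it, is a design choice of this sketch: hyphenated compounds are split into two tokens instead of being fused into one.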

3.4.3 Topic Modelling and Scoring

As scraped news data is not classified a priori, LDA was employed in order to find the most commonly appearing words in the text corpus. The LDA was performed using the KNIME Analytics Platform, an open source data science software (Berthold, et al., 2020). The platform allows the user to perform data analysis and build, test and deploy models using built-in nodes. In order to be able to perform text analysis, the data, originally in strings, was transformed to a corpus with documents using the Strings to Documents node. Then, Swedish stop words were removed through the creation of a custom dictionary filter using a stop word list from a GitHub repository (Dahlgren, 2019). Further, punctuation was erased, and the documents were converted to lowercase and stemmed using a Swedish Snowball stemmer. The use of code and text from GitHub repositories brings with it risks and potential for errors because the authors are not renowned and trusted sources. To minimise errors, the preprocessed text was viewed and examined after the completion of each preprocessing step. The design choice between using Swedish text translated to English, to be able to use built-in stop word removers, and using the original text with a sourced stop word list was made in favour of the original text, due to concerns that important meaning could be lost in translation and topics modelled incorrectly as a consequence.

The Topic Extractor (Parallel LDA) node was configured to extract ten topics with 20 terms each after empirical testing of different combinations of topics and terms. The extracted terms are visualised in Figure 3.

Figure 3: Word cloud of the most common words appearing in Swedish Coronavirus-related news articles, sized according to relative weight in the corpus.

Feature extraction and topic assignment are crucial steps in determining which variables to extract as sentiment proxies from the coronavirus-related news. The words that appear in the news most often are likely to give a reasonable understanding of what the topic in question deals with.

We understand that one news article might contain information belonging to several topics. Therefore, in contrast to previous research (Feuerriegel and Pröllochs, 2018), we did not assign each document to one topic, but rather to multiple topics. We assigned each article to the topics for which its probability of belonging exceeded 0.25, yielding a topic score for each article of 1 for belonging topics and 0 for non-belonging topics. The aforementioned researchers used topic modelling to extract topics from financial texts that could be relevant for investors. We extracted topics from non-financial texts that theoretically should not be relevant for investors, but we hypothesised that they are. A trade-off was made here between allocating individual articles to a single topic, thus missing out on a lot of data, and introducing noise into the model by allocating articles to several topics each. We deemed it more important to make our data as exhaustive as possible, and therefore chose to allocate articles to several topics.
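The thresholding rule can be sketched in a few lines of Python. The function name `topic_scores` and the example probabilities are hypothetical; only the 0.25 cut-off and the 1/0 coding come from the text.

```python
THRESHOLD = 0.25  # cut-off stated in the text


def topic_scores(topic_probs, threshold=THRESHOLD):
    """Return 1 for each topic whose probability exceeds the threshold, else 0."""
    return {topic: int(p > threshold) for topic, p in topic_probs.items()}


# Hypothetical LDA output for a single article (probabilities sum to 1).
article_probs = {"ECON": 0.45, "POLIT": 0.30, "SPORT": 0.05, "SPREAD": 0.20}
scores = topic_scores(article_probs)
print(scores)  # {'ECON': 1, 'POLIT': 1, 'SPORT': 0, 'SPREAD': 0}
```

Because the topic probabilities of an article sum to one, a cut-off of 0.25 means that an article can belong to at most three topics; four probabilities strictly above 0.25 would sum to more than one.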

3.4.4 Sentiment Proxies

Coronavirus-related news is not an investor survey or mood proxy, because it does not directly reflect what investors think or feel. We should not forget that news articles are written by journalists, and not investors themselves. We aim to model the way the information contained within the articles can influence investors. We find it reasonable to assume that coronavirus-related news is sentiment-driving information which in turn can be deconstructed and analysed. Thus, the modelled topics are assumed to be potential proxies for coronavirus-related sentiment caused by the news. Below, we detail the topics and motivate our choice of proxies that can be useful for measuring disease-related news sentiment.

Swedish Restrictions. The Swedish strategy in response to the coronavirus pandemic has been very different to that of many other countries in the world. Understandably, it has drawn a lot of attention from international media as a consequence and remains widely discussed in Sweden, polarising society. Some people in Sweden supported the relatively passive line chosen by the Public Health Agency of Sweden (FoHM) and the government, while others were very critical and urged politicians to implement a more stringent regime of heavy social restrictions in line with those introduced in other countries. Although Sweden had at no stage closed down its economic activity or enacted a curfew for citizens, some considerable restrictions were implemented during March and April, 2020. For instance, people could not meet in groups of more than 50 people at a time. Travelling from one region to another within Sweden was not recommended and deemed inappropriate unless work-related or for serious personal reasons. Swedish institutions responsible for handling the crisis have held press conferences daily to inform citizens on regulations and recommendations in conjunction with the emerging crisis. The keywords with highest weights in this topic had roots like Swed-, close, school, Stockholm, Public Health Agency (one word in Swedish), institution, travel, pupil, spread, follow. This proxy was named REST.

Swedish Politics. This topic covered how politicians in Sweden had tackled the crisis during the first months of the coronavirus outbreak. The topic was different from the previous one in that it dealt with the political actions of Swedish leaders regarding economic, social and political concerns, and not the actions of the Public Health Agency in addressing the public health emergency. When the coronavirus pandemic began, Sweden had a social democratic minority government. Making prompt decisions and implementing active measures in response to the rapidly evolving situation would not have been possible without oppositional support, which is possibly why politics became a central topic. The keywords with highest weights in this topic had roots like Swed-, crisis, government, country, need, coronavirus, leading, politics, measure, economic, state. This proxy was named POLIT.

Economic Policy. Many companies suffered rapid declines in their revenues due to the coronavirus outbreak. This crisis was completely unpredictable and in no way caused by the businesses themselves, which is why nobody was prepared to handle it. The Swedish government had to act decisively and quickly to avoid widespread bankruptcy. Economic support packages were introduced, one after another, but businesses continued to ask for more help. Unemployment rates increased heavily, although much of the support was addressing employment issues. The keywords with highest weights in this topic had roots like company, percent, crowns, state, econom-, billions, employed, Swed-, government, support. This proxy was named ECON.

Sport. The coronavirus pandemic caused almost all organised sporting events around the world to be suspended, cancelled, delayed or moved. Delaying the Olympic Games in Tokyo and the UEFA European Football Championship were unprecedented measures taken in light of the coronavirus crisis. Because sport affects so many people, and we know from earlier studies that it also affects stock prices, it is not surprising sport became an important topic. The keywords with highest weights in this topic had roots like game, sport, match, Olympics, cancel, coronavirus, club, move. This proxy was named SPORT.

Coronavirus Contemplations. For many individuals, an important part of the debate around the coronavirus outbreak has centred upon “the new normal”, i.e. how to adapt and carry on living meaningful and fulfilling lives in an unusual and confronting situation characterised by social distancing, isolation and loneliness. The spread of the virus has impacted every aspect of our lives and in turn prompted widespread debate in the media. The keywords with highest weights in this topic had roots like world, death, person, time, self, life, live, years. This proxy was named FEEL.

Culture. Culture also had a central place in public discourse, as many cultural events were also cancelled completely or moved to unknown dates in the future. The arts, exactly like sport, depend on people gathering in large groups, and many culturally based industries rapidly fell into a deep crisis within a week. The keywords with highest weights in this topic had roots like culture, film, public, music, cancel, media, concert. This proxy was named CULTURE.


Instructions to Swedish People. The Swedish government and authorities repeatedly called on the public to follow existing instructions and advice to help slow the spread of the virus and protect the health care system from collapse. Their guidelines included working from home where possible, maintaining distance between one another where practicable, limiting social contact to members of one’s own household, and avoiding crowded places. People above the age of 70 were repeatedly instructed to follow these recommendations very carefully. This topic was different from the aforementioned Swedish Restrictions topic in that it did not deal with restrictions and limitations enforced on the public, but mostly with calls for personal responsibility and solidarity. The keywords with highest weights in this topic had roots like at home, keep distance, job, think, help, public, time, try, people. This proxy was named INSTR.

Swedish Healthcare System. Most countries in the world are simply not prepared to respond to a pandemic with overwhelming numbers of sick people at a time. The state of the healthcare system became crucial as it was associated with the potential outcome for both COVID-19 patients and medical staff. Due to the characteristics of the virus and its transmissibility, elderly care received a lot of attention in the media due to the high occurrence of deaths in nursing homes. A lack of protective equipment in hospitals and nursing homes, limited intensive care places and shortages of medical staff sparked intense debate in the Swedish media. The words with highest weights in this topic had roots like region, patient, medical care, Stockholm, staff, hospital, protective equipment, intensive care, commune, nursing home, Karolinska (a Swedish university and hospital institution). This proxy was named MED.

Coronavirus-Related Events in The World. Delivering information on the spread of coronavirus around the world has been a main goal of the media during the unfolding crisis. In the modern, globalised world, countries are dependent on each other's production and trade and it is reasonable to assume Swedish investors have meticulously followed not only local coronavirus-related news but also sought information from other countries. This is particularly true of Italy and the United States. The words with highest weights in this topic had roots like Chin-, coronavirus, Trump, USA, Ital-, country, president, infect-, close, quarantine. This proxy was named WORLD.

Coronavirus Spread. A lot of news articles published during our research period were dedicated to the virus and its spread. First there was news about a new virus outbreak in Wuhan, then about deaths associated with this new disease in China. After the virus had spread to other parts of the world and the WHO announced that coronavirus could be characterised as a pandemic, news articles began to centre upon statistical comparisons between countries, and the ways in which governments were responding to the crisis. The words with highest weights in this topic had roots like infect-, new, coronavirus, virus, disease, China, case. This proxy was named SPREAD.

3.4.5 Sentiment Analysis

Sentiment analysis was performed on each article separately in R (R Core Team, 2020), using the packages tidyr, tidytext and dplyr. The analysis was performed on the text data translated to English with Microsoft’s translator in Excel, using the bing sentiment lexicon (Hu and Liu, 2004) in R. Translation can introduce some sources of error, but because the sentiment analysis was done only to obtain the sentiment orientation of articles, we deemed this error insignificant. Another path would have been to use a Swedish sentiment lexicon (e.g. Dahlgren’s “sentimentlex”, 2019), but because bing is a built-in lexicon in R and heavily used in the literature, we deemed it more reliable and exhaustive than sentiment lexicons found in GitHub repositories. The bing lexicon contains affective words and their sentiment orientation (positive or negative). The weighted sentiment score allocated to each article was calculated as follows:

$$\text{Sentiment score}_a = \frac{\text{Amount of positive words}_a - \text{Amount of negative words}_a}{\text{Total amount of words}_a}$$

where a denotes the article. The net sentiment of each article was normalised through division with the total amount of words in the article to ensure that the sentiment score was not dependent on the length of the article, but on the relative proportion of sentiment words used. We believe the length of an article is not a primary determining factor in its potential impact on the reader, while its relative sentiment score is of greater importance. Each article’s sentiment score was then combined with the article's topic score using the following formula, yielding the amplitude for each article’s sentiment proxies.

$$\text{Sentiment proxy}_{p,a} = \text{Sentiment score}_a \times \text{Topic score}_{p,a}$$

where p denotes each proxy respectively (REST, POLIT, ECON, SPORT, FEEL, CULTURE, INSTR, MED, WORLD and SPREAD) and a denotes the article.
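The two formulas can be traced with a small Python sketch. The miniature lexicon, the function names and the example sentence are hypothetical stand-ins; the study itself used the bing lexicon in R. Returning 0.0 for an empty article is an assumption of this sketch.

```python
# Hypothetical miniature lexicon standing in for the bing lexicon.
LEXICON = {
    "good": "positive", "support": "positive",
    "crisis": "negative", "death": "negative", "fear": "negative",
}


def sentiment_score(tokens, lexicon=LEXICON):
    """(positive words - negative words) / total words for one article."""
    if not tokens:
        return 0.0  # assumption: an empty article carries no sentiment
    pos = sum(1 for t in tokens if lexicon.get(t) == "positive")
    neg = sum(1 for t in tokens if lexicon.get(t) == "negative")
    return (pos - neg) / len(tokens)


def sentiment_proxies(score, topic_scores):
    """Multiply the article's sentiment score by each 0/1 topic score."""
    return {topic: score * flag for topic, flag in topic_scores.items()}


tokens = "the crisis brought fear and death but support was good".split()
score = sentiment_score(tokens)  # 2 positive, 3 negative, 10 words in total
proxies = sentiment_proxies(score, {"ECON": 1, "SPORT": 0})
print(score, proxies)
```

An article therefore contributes its normalised net sentiment to every topic it belongs to and nothing to the remaining topics.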


3.4.6 Allocation of News to Dates

The topic and sentiment scoring was performed on individual news articles. To measure the impact of coronavirus-related news on the stock index, the scores per article were then aggregated to one data point per day. This was done because the stock index data has a daily frequency. News published on weekends or holidays was aggregated to the following trading day.
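A minimal Python sketch of this allocation rule follows. The function names are hypothetical, the holiday set is supplied by the caller, and summing the article scores per day is an assumption, since the text does not state the aggregation function.

```python
from collections import defaultdict
from datetime import date, timedelta


def next_trading_day(d, holidays=frozenset()):
    """Return d if it is a trading day, otherwise the following trading day."""
    while d.weekday() >= 5 or d in holidays:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d


def aggregate_daily(article_scores, holidays=frozenset()):
    """Sum article-level proxy scores per trading day (summation is an assumption)."""
    daily = defaultdict(float)
    for published, score in article_scores:
        daily[next_trading_day(published, holidays)] += score
    return dict(daily)


# Hypothetical scores for one proxy around Easter 2020.
articles = [
    (date(2020, 4, 9), -0.02),   # Thursday, trading day
    (date(2020, 4, 10), -0.01),  # Friday, treated as a holiday below
    (date(2020, 4, 11), 0.03),   # Saturday
    (date(2020, 4, 14), 0.01),   # Tuesday, trading day
]
holidays = {date(2020, 4, 10), date(2020, 4, 13)}  # illustrative holiday set
daily = aggregate_daily(articles, holidays)
print(daily)
```

In this example the articles from Friday and Saturday are both moved to Tuesday, April 14, so that day receives three articles' worth of scores.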

3.5 Economic Indicators

In the literature, the analysis of the effect of sentiment on stock markets has involved the use of explanatory variables encompassing fundamental economic factors, because important economic news most likely causes change in stock returns. Due to the period of our study and the daily frequency of the data in use, no suitable economic indicators for the Swedish market were available at the time of writing. Thus, we used economic news in Dagens Nyheter as proxies for changes in fundamentals. Even though Dagens Nyheter does not publish detailed economic reports on main macro- and microeconomic indicators, it presents economic news regularly and offers insights into the present state of the economy.

3.6 Multiple Regression

Two regression models were set up in this study, one explaining the variations in the Stockholm small cap index, the other the variations in the Stockholm large cap index, through our coronavirus-related sentiment indicators. The models were specified in accordance with econometric theory (Brooks and Tsolacos, 2010) and the regressions were run in R (R Core Team, 2020) using dynlm (Zeileis, 2019).

3.6.1 Model Specification

The empirical regression models were specified through statistical tests. First, we chose whether the dependent variables, the OMXSSCPI and OMXSLCPI indices, should be expressed in level points or in growth rates. This decision was based on an examination of the autocorrelation of the index levels and their returns for different lags. Then, a dynamic model with distributed lag variables was chosen, with the dependent variables explained by contemporaneous and lagged sentiment proxies. This was due to two factors: firstly, it could take time for news to reach the audience, and secondly, some news articles were published after the market had closed, meaning that this news could not possibly impact the indices on the day of publishing.

The sentiment proxies allocated to different topics during the modelling process were used as numerical variables in the regression as described by Kaplanski and Levy (2014), creating a multiple regression model with dependent variable “Return_t” and independent sentiment variables in the format “TopicX_t”. We understand that changes in stock prices are largely dictated by real changes in the economy and not by sentiment. To account for effects caused by important economic changes, we included the proxy “ECON_t” as a variable based on economic news. We recognise that measures imposed on businesses and the public by politicians have economic implications rather than represent coronavirus-related sentiment. Therefore the proxy “POLIT_t” also reflects economic fundamentals to a great extent. Furthermore, we admit that Swedish authorities’ restrictions imply changes in the economy, even though these restrictions were not enforced by law. Thus, we treated only the following proxies as coronavirus sentiment proxies: SPORT, FEEL, CULTURE, INSTR, MED, WORLD, SPREAD. $\varepsilon_t$ is an error term.

Figures 4 and 5 report the results on the regression models in Section 4 and an analysis of the results follows in Section 5.

4. Results

4.1 Autocorrelation Analysis

Firstly, we analysed stock price indices and returns to see if they exhibit serial correlation up to third order. Table 1 shows a high autocorrelation in the indices’ levels, as they are strongly associated with their previous values as suggested in Brooks and Tsolacos (2010). This correlation becomes much less apparent in the indices’ returns. Thus, to remove autocorrelation, index returns were chosen as the dependent variables for the regression models. The low values of present autocorrelation for the returns made the inclusion of lagged returns redundant.

Table 1: Autocorrelation for up to three lags of the indices and their returns.

| Lag | Large Cap index | Small Cap index | Large Cap return | Small Cap return |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 |
| 1 | 0.97670012 | 0.981824815 | –0.007561831 | 0.06283091 |
| 2 | 0.953023276 | 0.96121493 | 0.037710589 | 0.239080598 |
| 3 | 0.9271483 | 0.934710414 | 0.144588413 | 0.242961469 |
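The comparison between levels and returns can be reproduced in outline with the following Python sketch. The index values are hypothetical, the use of simple (rather than logarithmic) returns is an assumption, and the estimator shown is the standard sample autocorrelation, which may differ in detail from the routine used in the study.

```python
def simple_returns(levels):
    """Day-over-day relative change of an index level series."""
    return [(b - a) / a for a, b in zip(levels, levels[1:])]


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


# Hypothetical index levels, for illustration only.
levels = [100.0, 101.0, 99.0, 102.0, 98.0, 103.0, 97.0]
returns = simple_returns(levels)

for lag in range(4):
    print(lag, round(autocorrelation(levels, lag), 3), round(autocorrelation(returns, lag), 3))
```

At lag 0 the statistic equals one by construction, which is why the first row of Table 1 contains only ones.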


4.2 Cross-Correlation Matrices

Next, we studied whether our dependent variables, the returns on the OMXS price indices, exhibit high correlation with the sentiment proxies. The cross-correlation coefficients for the large cap and small cap returns are presented in Tables 2 and 3 respectively.

Table 2: Cross-correlation table for return on the large cap index OMXSLCPI with the proxy variables of up to five lags.

| Lag | REST | POLIT | ECON | SPORT | FEEL | CULTURE | INSTR | MED | WORLD | SPREAD |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.1313 | –0.1589 | 0.0333 | 0.1210 | –0.0270 | 0.2377 | 0.1072 | 0.0206 | 0.1141 | 0.1107 |
| 1 | 0.1356 | 0.0346 | 0.3070 | 0.3268 | 0.0095 | 0.1959 | 0.2138 | 0.0662 | 0.2118 | 0.1908 |
| 2 | –0.1220 | –0.0392 | –0.0009 | 0.0443 | –0.0953 | –0.0730 | 0.0027 | 0.1704 | –0.1652 | –0.0879 |
| 3 | –0.0243 | –0.1353 | –0.0084 | 0.0064 | –0.1984 | 0.0728 | 0.0843 | 0.1255 | –0.0137 | –0.0818 |
| 4 | –0.0246 | –0.2133 | 0.1191 | 0.1664 | –0.1720 | 0.0509 | 0.3259 | 0.0237 | 0.0108 | 0.0573 |
| 5 | –0.0628 | –0.1054 | 0.0767 | 0.0670 | –0.1505 | –0.0709 | –0.1229 | 0.1137 | –0.0135 | –0.0784 |

Table 3: Cross-correlation table for return on the small cap index OMXSSCPI with the proxy variables of up to five lags.

| Lag | REST | POLIT | ECON | SPORT | FEEL | CULTURE | INSTR | MED | WORLD | SPREAD |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.1031 | –0.2729 | 0.0992 | 0.1983 | –0.1201 | 0.1769 | 0.1674 | 0.0534 | 0.0191 | 0.0712 |
| 1 | 0.1402 | 0.0057 | 0.2867 | 0.2953 | –0.0438 | 0.1660 | 0.2080 | 0.1907 | 0.1956 | 0.1796 |
| 2 | –0.0172 | 0.0228 | 0.0834 | 0.0812 | –0.0177 | 0.0141 | 0.0509 | 0.2136 | –0.1435 | –0.0774 |
| 3 | 0.0564 | –0.1294 | 0.0182 | 0.0323 | –0.1322 | 0.1055 | 0.1026 | 0.1922 | 0.0386 | –0.0166 |
| 4 | 0.0908 | –0.1355 | 0.1239 | 0.1602 | –0.0835 | 0.1208 | 0.4008 | 0.1482 | 0.0857 | 0.1390 |
| 5 | –0.1865 | –0.2143 | 0.0040 | 0.0363 | –0.2133 | –0.0564 | –0.0843 | 0.0587 | –0.0655 | –0.0934 |
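A lagged cross-correlation of the kind reported in Tables 2 and 3 can be sketched in Python as follows. The data are hypothetical, and computing a pairwise Pearson correlation on the overlapping part of the two series is an assumption; the routine used in the study may normalise differently.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(returns, proxy, lag):
    """Correlation between the return on day t and the proxy on day t - lag."""
    if lag == 0:
        return pearson(returns, proxy)
    return pearson(returns[lag:], proxy[:-lag])


# Hypothetical series in which returns follow the proxy with a one-day delay.
proxy = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]
returns = [0.0, 0.1, -0.2, 0.3, 0.0, -0.1]

for lag in range(3):
    print(lag, round(lagged_correlation(returns, proxy, lag), 4))
```

In this constructed example the correlation peaks at lag 1, mirroring the pattern in the tables where several proxies correlate most strongly with the next day's return.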


4.3 Specified Regression Model

General specification for modelling the returns on both the large and small cap indices, OMXSLCPI and OMXSSCPI:

Formula 1: Dynamic regression model with contemporaneous and first-order lag proxy variables explaining index returns.
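Written out, a specification consistent with this caption and with the regressors listed in Table 4 takes the following form. The coefficient symbols $\alpha$, $\beta_p$ and $\gamma_p$ are notational assumptions and not taken from the original formula.

```latex
\mathrm{Return}_t = \alpha
  + \sum_{p \in P} \left( \beta_p \, p_t + \gamma_p \, p_{t-1} \right)
  + \varepsilon_t,
\qquad
P = \{\mathrm{REST}, \mathrm{POLIT}, \mathrm{ECON}, \mathrm{SPORT}, \mathrm{FEEL},
      \mathrm{CULTURE}, \mathrm{INSTR}, \mathrm{MED}, \mathrm{WORLD}, \mathrm{SPREAD}\}
```

Here $p_t$ is the daily value of proxy $p$, $p_{t-1}$ its first-order lag, and $\varepsilon_t$ the error term, giving twenty slope coefficients per index.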

4.3.1 Variance Inflation Factors

To check for multicollinearity in the models, the variance inflation factors (VIF) were computed using the vif function from the R package car (Fox and Weisberg, 2020). Results are shown in Table 4.

Table 4: VIF for the proxy variables as independent variables.

| Proxy | VIF, contemporaneous (t) | VIF, lagged (t–1) |
|---|---|---|
| REST | 2.18526 | 2.574774 |
| POLIT | 3.017019 | 3.244251 |
| ECON | 1.928681 | 1.97476 |
| SPORT | 2.181635 | 2.143464 |
| FEEL | 3.280461 | 2.689358 |
| CULTURE | 1.949739 | 1.693042 |
| INSTR | 1.860088 | 2.380085 |
| MED | 1.691282 | 1.541684 |
| WORLD | 4.05448 | 4.030508 |
| SPREAD | 4.550286 | 4.27107 |

Both return models have the same VIFs since they share the same explanatory variables. A commonly used rule-of-thumb threshold for the VIF is 5. All values in Table 4 fall below it, so our models show no problematic degree of multicollinearity.
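For reference, the VIF of a regressor is 1 / (1 − R²), where R² comes from regressing that regressor on all the others. The following pure-Python sketch illustrates the definition on hypothetical data; it is not the car::vif routine used in the study, and all function names are our own.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def r_squared(y, X):
    """R^2 of an OLS regression of y on the columns in X plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + X
    k = len(cols)
    XtX = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    Xty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def vif(X):
    """Variance inflation factor for each regressor; X is a list of columns."""
    return [1.0 / (1.0 - r_squared(X[j], X[:j] + X[j + 1:])) for j in range(len(X))]


# Hypothetical regressors: two correlated columns.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
print(vif([x1, x2]))
```

With only two regressors both VIFs coincide, because each equals 1 / (1 − r²) for the same pairwise correlation r. A VIF of 1 means a regressor is uncorrelated with the others.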

4.4 Regression Outputs

The outputs for the regression models explaining the small and large cap returns are shown in Figures 4 and 5 respectively.


Figure 4: Regression output for the regression of small cap index returns on contemporaneous and first-order lag sentiment proxy variables.

Figure 5: Regression output for the regression of large cap index returns on contemporaneous and first-order lag sentiment proxy variables.


4.5 Market Volatility and Coronavirus-Related News

To study whether market volatility could be explained by coronavirus-related sentiment, we would require a Swedish volatility index. In the absence of any such index, we constructed a graph showing how the returns on the small and large cap indices moved in relation to the number of news articles assigned to each day. This allows for straightforward visual interpretation of market volatility.

Figure 6: Daily returns relative to the number of coronavirus-related news articles published each day.

5. Analysis

The regression outputs show that economic and political news, which we treat as proxies for fundamentals, had a statistically significant impact on the stock price indices. There was also a slight difference between the small cap and large cap indices: the small cap index reacted contemporaneously to political news, whereas the large cap index did not. This is in line with Baker and Wurgler (2007), who suggest that small cap stocks are more reactive to market sentiment and thus more volatile. The immediate impact of political news seems to have been negative, meaning that positive political news was associated with lower same-day returns, whereas the one-day-lagged effect of political news was positive. This suggests that investors in small and large cap stocks interpreted the economic implications of political news in different ways.

Another interesting finding is that two of our sentiment proxies, news about sport and news about the spread of the coronavirus, showed a significant impact on the indices. Kaplanski and Levy (2014) find that sports-related sentiment has a significant impact on stock prices. They also report that the possibility for arbitrage during the 2010 FIFA World Cup diminished after some time: investors recognised the trend of decreasing stock prices due to negative sentiment, forcing prices back to their fundamental values. In our case, we can still observe arbitrage opportunities connected to sports-related sentiment. Sports-related sentiment is difficult to interpret during the coronavirus outbreak, but it is reasonable to assume that the absence of sporting events, in conjunction with social distancing, home working and the avoidance of social gatherings, could account for bad mood and pessimism.

News about the spread of the coronavirus had a more significant effect on the large cap index than on the small cap index, which contradicts existing research (Baker and Wurgler, 2007). This might be a result of large cap companies being more globalised and perhaps more dependent on international relations and trade, whereas small cap companies may be more confined to the Nordic region. Another explanation could be that larger stocks are owned by international investors to a greater extent than smaller ones, and that these investors are sensitive to sentiment relating to wider areas of the world. Moreover, real changes in international economies may be more significant than those observed in Sweden, mainly because Sweden has at no stage enacted a curfew drastically limiting economic activity. The differing reaction might also be due to large cap companies having more employees and thus being more strongly affected by news about the spread of the virus and its future implications for the company. For example, one prevalent issue in news articles allocated to this topic was the relatively ineffective transmission of the coronavirus in children, which affected whether children were allowed to attend school.

As seen in Figure 6, stock returns became more volatile for both indices as the number of coronavirus-related news articles per day increased. We also see that once a certain number of articles per day was reached, a further increase did not affect volatility. This might be due to investors experiencing a strong initial shock as the volume of pandemic-related news grew and later adjusting, becoming less affected and more resilient. It could also depend on the cumulative allocation of weekend news to the next working day, which weakened the volume effect. It is reasonable to assume that the shock value of the news was greater at the beginning of the pandemic, with events being repeated as time passed, so that the number of news articles correlated less with the volatility of returns over time. A brief examination of Figure 6 supports our hypothesis that coronavirus-related sentiment would influence stock market prices, not least by increasing volatility.
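The visual reading of Figure 6 could be complemented by a simple numerical check: group trading days by the number of articles assigned to them and compare the dispersion of returns across groups. The sketch below is one hypothetical way of doing this and was not part of the analysis. The function name, the bucket edges and the toy series are assumptions, with the toy numbers constructed only to mimic the pattern described above.

```python
from statistics import pstdev


def volatility_by_news_bucket(returns, counts, edges):
    """Standard deviation of daily returns within each article-count bucket.

    edges are ascending upper bounds: a day with count c falls in the first
    bucket whose edge satisfies c <= edge, otherwise in a final open bucket.
    Buckets with fewer than two days yield None.
    """
    buckets = [[] for _ in range(len(edges) + 1)]
    for r, c in zip(returns, counts):
        idx = next((i for i, e in enumerate(edges) if c <= e), len(edges))
        buckets[idx].append(r)
    return [pstdev(b) if len(b) > 1 else None for b in buckets]


# Hypothetical toy series (NOT the thesis data): article counts and returns in %.
counts = [2, 3, 1, 4, 12, 15, 11, 14, 40, 45, 38, 50]
returns = [0.1, -0.1, 0.2, -0.2, 1.5, -2.0, 1.0, -1.2, 1.4, -1.8, 1.1, -1.3]
vols = volatility_by_news_bucket(returns, counts, edges=[5, 20])
```

In this toy example the dispersion rises from the low-count bucket to the middle bucket and then levels off, which is the shape a saturation effect would produce. Whether the actual data show the same shape would need to be checked separately.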

A lot of noise is present in news articles, as can be discerned from the cross-correlation table. Many associations with the returns were as we hypothesised: positive and rather strong. However, many became stronger at greater lags and some even changed sign. The fact that so many variables exhibited strong correlations with returns at lags greater than one shows that the impact was not only immediate, but could also take some time to be observed on the market. A possible explanation is that we are in the midst of the ongoing coronavirus pandemic, and that massive media coverage of events created this noise, making it difficult to extract and distil relevant information for a sentiment study.
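A cross-correlation table of the kind referred to above pairs the sentiment score on day t - lag with the return on day t and reports the correlation for each lag. The following is a minimal sketch of that computation. The function name and the toy series are assumptions, and the toy returns are built to echo sentiment with a two-day delay so that the expected peak is known.

```python
from statistics import mean, pstdev


def cross_correlation(sentiment, returns, lag):
    """Pearson correlation between sentiment[t - lag] and returns[t], for lag >= 0."""
    x = sentiment[:len(sentiment) - lag]
    y = returns[lag:]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


# Hypothetical toy series (NOT the thesis data).
sentiment = [0.2, -0.5, 0.9, -0.1, 0.4, -0.8, 0.3, 0.7, -0.6, 0.1]
returns = [0.0, 0.0] + [2 * s for s in sentiment[:-2]]  # two-day delayed echo
table = {lag: cross_correlation(sentiment, returns, lag) for lag in range(4)}
strongest_lag = max(table, key=lambda lag: abs(table[lag]))
```

With real data the peak is rarely this clean, and correlations that strengthen or flip sign at longer lags, as observed here, are one symptom of the noise discussed above.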

There are several possible reasons why not all sentiment proxies showed a statistically significant impact. First, many topics modelled using latent Dirichlet allocation contained the same terms, spreading the impact of important terms across several variables. Moreover, it may be that the fundamental effects, and the sentiment tied to them, alone affected the indices. Articles conveying negative political and economic news are bound to have a negative impact on investors, which is perhaps enough to make the marginal impact of the other sentiment proxies on investor sentiment negligible. Secondly, timing could play a significant role in the analysis, i.e. the impact of many news articles may not have been reflected in market movements on the same day. March and April are months with numerous public holidays in Sweden, and the stock market closed earlier than usual on several occasions. Furthermore, the stream of news being published and potentially influencing investors did not observe business hours, and this delayed effect could have caused noise in the data. Finally, in accordance with the EMH, we could assume that, despite existing sentiment research and criticism of market efficiency, the stock market in Sweden was indeed efficient and showed only a slight sentiment effect. This would also be consistent with findings from previous research that markets react differently to the same events, and that some are more efficient than others, even in the midst of the pandemic.
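The timing issues raised above can be made concrete with a small sketch of how articles published on non-trading days might be carried forward to the next trading day. The weekend rule follows the allocation described in this thesis. Treating listed holidays the same way, the function names and the example dates are assumptions for illustration. Time of day and early market closes are not handled.

```python
from collections import Counter
from datetime import date, timedelta


def next_trading_day(day, holidays=frozenset()):
    """Roll a date forward to the first weekday that is not a listed holiday."""
    while day.weekday() >= 5 or day in holidays:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


def articles_per_trading_day(publication_dates, holidays=frozenset()):
    """Count articles per trading day, carrying non-trading days forward."""
    return Counter(next_trading_day(d, holidays) for d in publication_dates)


# Hypothetical publication dates around Easter 2020 (one article per day).
# Assumption: Good Friday (10 April) and Easter Monday (13 April) are non-trading days.
holidays = frozenset({date(2020, 4, 10), date(2020, 4, 13)})
published = [date(2020, 4, 9), date(2020, 4, 10), date(2020, 4, 11),
             date(2020, 4, 12), date(2020, 4, 13), date(2020, 4, 14)]
counts = articles_per_trading_day(published, holidays)
```

Here five days of news accumulate on a single trading day, which shows how such allocation can inflate the article count on the first trading day after a long break and blur the link between news volume and same-day returns.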

5.1 Sustainability Aspects

The development of increasingly precise prediction mechanisms and analyses of capital markets contributes to economic sustainability by making investments more efficient and capital allocation more accurate. Recognising and measuring investor sentiment, as opposed to valuing solely according to fair value principles and fundamentals, can help stock returns reflect a more accurate total risk of a company, including risks connected to sentiment. This could support the emergence of a new notion of the ‘rational investor’ based on the principles of behavioural finance set out by Statman, among others, and contribute to a new view of finance.

5.2 Further Research

Coronavirus-related sentiment will surely become a big part of behavioural research in the coming years. Thus, it would be useful to create text analysis models tailored to the coronavirus crisis in particular. Existing lexicons and classified data used to train machine learning models in past research are not directly applicable to text analysis of events as unique as the pandemic. Also, Swedish text analysis has not been practised as widely as English text analysis, leading to more uncertainty and fewer reliable, tested analytical models. As mentioned earlier, sentiment proxies derived from sentiment analysis of one market may not be directly transferable to other markets, meaning more research should be conducted in the Swedish environment to provide more empirical evidence and theoretical knowledge. Analysis of daily data has not been commonplace either, and with the increasing amount of data in social media and internet forums, this is likely to be explored in the near future. Upon visual inspection, it is clear that the increasing number of news articles published daily was associated with higher volatility of stocks (see Figure 6). No official Swedish volatility index exists, as opposed to the VIX index of the Chicago Board Options Exchange, making the analysis of sentiment effects on volatility difficult. Developing a nationally accepted Swedish volatility index would therefore prove very fruitful. For further disease-related news analysis, the labelled data produced in this study using latent Dirichlet allocation can be used as training data for a supervised machine learning classification model.

6. Conclusion

This study was designed to investigate whether coronavirus-related news contributed to market sentiment changes observable on the Swedish stock market, which may cause mispricing of stocks. It also set out to analyse how coronavirus-related news could be classified into themes, quantitatively scored and analysed in order to determine which topics had higher significance for Swedish investors. We also aimed to examine whether it is feasible to use sentiment for making profitable investments. Our results show that proxies intended to represent changes in market fundamentals had a genuine impact on stock index returns. The effect was significant and could be observed on the day the news was published as well as one day after. The lagged effect could probably be explained by the fact that news articles are often published outside the trading hours of the Swedish stock market, and likely reflects a delayed reaction by investors.

Two of our sentiment proxies constructed from discovered themes in the news, Sport and Coronavirus Spread, also showed a statistically significant impact on the indices, at 1% and 5% respectively. This finding has interesting implications, with regard to both the efficiency of the Swedish capital markets and in finding novel arbitrage opportunities. Not only has a text analytics method been employed on Swedish data yielding a significant result, but it has also shown that sentiment regarding sports still creates a possible arbitrage opportunity. As seen in Kaplanski and Levy (2014), sentiment effects springing from the 2010 FIFA World Cup created initial arbitrage opportunities which subsequently diminished due to public knowledge of this phenomenon. Following this previous research, one might suspect that sports news in general should not have an effect on markets. However, the results of this study indicate the opposite. This suggests known financial effects are disregarded and overpowered by negative sentiments when investors are placed under considerable stress. This is particularly true when they encounter unforeseen, potentially calamitous events such as the coronavirus pandemic. Should another major, similarly unforeseen set of circumstances arise, we can use the results of this study to be more prepared for how the markets will respond.

References

Alessia, D., Ferri, F., Grifoni, P. and Guzzo, T., 2015. Approaches, tools and applications for sentiment analysis implementation. International Journal of Computer Applications, 125(3), pp. 26–33. http://dx.doi.org/10.5120/ijca2015905866.
Atkins, A., Niranjan, M. and Gerding, E., 2018. Financial news predicts stock market volatility better than close price. The Journal of Finance and Data Science, 4(2), pp. 120–137. https://doi.org/10.1016/j.jfds.2018.02.002.
Baker, M. and Wurgler, J., 2007. Investor sentiment in the stock market. Journal of Economic Perspectives, 21(2), pp. 129–152. https://doi.org/10.1257/jep.21.2.129.
Blei, D.M., Ng, A.Y. and Jordan, M.I., 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(Jan), pp. 993–1022.
Brooks, C. and Tsolacos, S., 2010. Real estate modelling and forecasting. New York: Cambridge University Press.
Bukovina, J., 2016. Social media big data and capital markets — An overview. Journal of Behavioral and Experimental Finance, 11(C), pp. 18–26. https://doi.org/10.1016/j.jbef.2016.06.002.
Cohen, L., Malloy, C. and Nguyen, Q., 2020. Lazy Prices. The Journal of Finance, 75(3), pp. 1371–1415. https://doi.org/10.1111/jofi.12885.
Das, S.R., 2014. Text and Context: Language Analytics in Finance. Foundations and Trends in Finance, 8(3), pp. 145–260. http://dx.doi.org/10.1561/0500000045.
Day, M.Y. and Lee, C.C., 2016. Deep learning for financial sentiment analysis on finance news providers. In: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). Davis, CA, United States, 18–21 August 2016. New York: Institute of Electrical and Electronics Engineers' Communications Society (IEEE).
De Bondt, W.F., 1995. Investor psychology and the dynamics of security prices. AIMR Conference Proceedings, 1995(7), pp. 7–13.
Devitt, A. and Ahmad, K., 2007. Sentiment polarity identification in financial news: A cohesion-based approach. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. Prague, Czech Republic, June 2007. Stroudsburg, PA: The Association for Computational Linguistics.
Donadelli, M., Kizys, R. and Riedel, M., 2017. Dangerous infectious diseases: Bad news for Main Street, good news for Wall Street? Journal of Financial Markets, 35(Sep), pp. 84–103. http://dx.doi.org/10.1016/j.finmar.2016.12.003.
Doyle, G. and Elkan, C., 2009. Financial topic models. In: NIPS-2009 Workshop on Applications for Topic Models: Text and Beyond. Whistler, B.C., Canada, 11–12 December 2009.
Fama, E., 1970. Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2), pp. 383–417. http://dx.doi.org/10.2307/2325486.

Federal Reserve Bank of St. Louis, 2006. Beyond the numbers: An analysis of optimistic and pessimistic language in earnings press releases. [pdf] St. Louis, MO: Federal Reserve Bank of St. Louis. Available at: [Accessed 13 April 2020].
Feuerriegel, S. and Pröllochs, N., 2018. Investor reaction to financial disclosures across topics: An application of latent Dirichlet allocation. A Journal of the Decision Sciences Institute, special issue. http://dx.doi.org/10.1111/deci.12346.
Fox, J. and Weisberg, S., 2018. An R companion to applied regression. [e-book] London: Sage Publications. Available at: Google Books [Accessed 9 April 2020].
Griffioen, G.A.W., 2003. Technical Analysis in Financial Markets. PhD. Amsterdam School of Economics Research Institute. Available at: [Accessed 30 March 2020].
Hu, M. and Liu, B., 2004. Mining and summarizing customer reviews. In: ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-2004). Seattle, WA, United States.
Kahneman, D. and Tversky, A., 1979. Prospect Theory: An Analysis of Decision Under Risk. Econometrica, 47(2), pp. 263–91. Available at: [Accessed 19 April 2020].
Kaplanski, G. and Levy, H., 2008. Sentiment and Stock Prices: The Case of Aviation Disasters. Journal of Financial Economics, 95(2), pp. 174–201. Available at: [Accessed 21 April 2020].
Kaplanski, G. and Levy, H., 2014. Sentiment, irrationality and market efficiency: The case of the 2010 FIFA World Cup. Journal of Behavioral and Experimental Economics, 49, pp. 35–43. http://dx.doi.org/10.2139/ssrn.2016442.
Lang, S. and Schaefers, W., 2015. Examining the sentiment-return relationship in European real estate stock markets. Journal of European Real Estate, 8(1), pp. 24–45.
Lee, H., Surdeanu, M., MacCartney, B. and Jurafsky, D., 2014. On the Importance of Text Analysis for Stock Price Prediction. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland, May 2014. Paris: European Language Resources Association (ELRA).
Li, G., Zhu, X., Wang, J., Wu, D. and Li, J., 2017. Using LDA Model to Quantify and Visualize Textual Financial Stability Report. Procedia Computer Science, 122, pp. 370–76. http://dx.doi.org/10.1016/j.procs.2017.11.382.
MacKinlay, A.C., 1997. Event Studies in Economics and Finance. Journal of Economic Literature, 35(1), pp. 13–39. Available at: [Accessed 6 April 2020].
Manning, C.D., Raghavan, P. and Schütze, H., 2008. Introduction to information retrieval. New York: Cambridge University Press.
Mitchell, R., 2018. Web scraping with Python: Collecting more data from the modern web. 2nd ed. Sebastopol, CA: O'Reilly Media.
Mitchell, T.M., 1997. Machine learning. New York: McGraw-Hill.

Murphy, J., 1999. Technical analysis of the financial markets. Rev. ed. New York: New York Institute of Finance.
Ricciardi, V. and Simon, H., 2000. What is behavioral finance? Business, Education and Technology Journal, 2(2), pp. 1–9.
Shefrin, H.M. and Statman, M., 1984. Explaining investor preference for cash dividends. Journal of Financial Economics, 13, pp. 253–82.
Shleifer, A. and Vishny, R.W., 1997. The limits of arbitrage. The Journal of Finance, 52(1), pp. 35–55. http://dx.doi.org/10.1111/j.1540-6261.1997.tb03807.x.
Statman, M., 1995. Behavioral finance versus standard finance. AIMR Conference Proceedings, 1995(7), pp. 14–22.
Statman, M., 2014. Behavioral finance: Finance with normal people. Borsa Istanbul Review, 14(2), pp. 65–73. https://doi.org/10.1016/j.bir.2014.03.001.
Symeonidis, L., Daskalakis, G. and Markellos, R.N., 2010. Does the weather affect stock market volatility? Finance Research Letters, 7(4), pp. 214–23.
Tetlock, P.C., 2007. Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), pp. 1139–68. https://doi.org/10.1111/j.1540-6261.2007.01232.x.
Webster, J.J. and Kit, C., 1992. Tokenization as the initial phase in NLP. In: COLING 1992 Volume 4: The 15th International Conference on Computational Linguistics. 23–28 August 1992. Praha: International Committee for Computational Linguistics.

Data

Dagens Nyheter AB, 2020. Det nya coronaviruset. Dagens Nyheter, [online] Available at: [Accessed 11 May 2020].
The Nasdaq Group, Inc., 2020. Index Overview OMXSLCPI. [online] Nasdaq Global Indexes. Available at: https://indexes.nasdaqomx.com/Index/Overview/OMXSLCPI [Accessed 16 March 2020].
The Nasdaq Group, Inc., 2020. Index Overview OMXSSCPI. [online] Nasdaq Global Indexes. Available at: https://indexes.nasdaqomx.com/Index/Overview/OMXSSCPI [Accessed 16 March 2020].

Software and websites

Berthold, M.R., Cebron, N., Dill, F., Gabriel, T.R., Kötter, T., Meinl, T., Ohl, P., Sieb, C., Thiel, K. and Wiswedel, B., 2007. KNIME - the Konstanz information miner: version 2.0 and beyond. ACM SIGKDD Explorations Newsletter, 11(1). https://doi.org/10.1145/1656274.1656280.
Dahlgren, P.M., 2019. Svensk text. [online] GitHub. Available at: [Accessed 21 April 2020].
Fox, J. and Weisberg, S., 2020. An R Companion to Applied Regression, Third Edition. Thousand Oaks, CA: Sage. URL: https://socialsciences.mcmaster.ca/jfox/Books/Companion.
Hvitfeldt, E. and Silge, J., 2020. textdata: Download and Load Various Text Datasets. R package version 0.4.1.
R Core Team, 2020. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.R-project.org/.
Robinson, D. and Silge, J., 2020. tidytext: Text Mining and Analysis Using Tidy Data Principles in R. JOSS, 1(3). https://doi.org/10.21105/joss.00037.
RStudio Team, 2020. RStudio: Integrated Development for R. Version 1.2.5042. RStudio, Inc., Boston, MA. URL: http://www.rstudio.com/.
Web Scraper, 2020. Web Scraper Extension. [online] Available at: [Accessed 23 April 2020].
Wickham, H., François, R., Henry, L. and Müller, K., 2020. dplyr: A Grammar of Data Manipulation. R package version 0.8.5.
Wickham, H. and Henry, L., 2020. tidyr: Tidy Messy Data. R package version 1.0.3. https://CRAN.R-project.org/package=tidyr.
Zeileis, A., 2019. dynlm: Dynamic Linear Regression. R package version 0.3-6. Available at: [Accessed 23 April 2020].

Lexicons

Oxford University Press, 2009. A Dictionary of Economics. 3rd ed. Oxford: Oxford University Press. Available at: <https://www.oxfordreference.com> [Accessed 17 May 2020].
