Influence of Cryptocurrencies on LSE Twitter Cashtags
Antón Lorenzo García
Master’s Thesis presented to the Telecommunications Engineering School
Master’s Degree in Telecommunications Engineering
Supervisors: Rebeca P. Díaz Redondo, Ana Fernández Vilas
2018

Acknowledgements

This work is funded by the European Regional Development Fund (ERDF) and the Galician Regional Government under the agreement for funding the Atlantic Research Center for Information and Communication Technologies (AtlantTIC), and by the Spanish Ministry of Economy and Competitiveness under the National Science Program (TEC2014-54335-C4-3-R and TEC2017-84197-C4-2-R). We thank the Centro de Supercomputación de Galicia (CESGA) for its computational support during the research stay.

Vigo, July 15, 2018

Abstract

There is a general consensus about the good sensing and original characteristics of Twitter as an information medium for complex financial markets. Previous analyses establish Twitter as a relevant input for making decisions about the financial market, and even about fraudulent activities in that market. One of the main mechanisms used in Twitter to track financial tweets is the cashtag, a label formed by the ticker of a company preceded by the $ symbol. However, in recent months the emergence of cryptocurrencies has degraded the quality of the information obtained through this mechanism. This is because a few of them have tickers identical to those of some companies in the main markets, so a cashtag search returns results that refer both to listed companies and to cryptocurrencies. With the overall aim of deploying a classification system able to separate both types of tweets, a set of analyses was carried out to extract the distinctive features of both information sets.
To be precise, the interference between both types is studied for the main London Stock Exchange (LSE) companies, namely the constituents of its two main markets, the FTSE-100 and the AIM-100, between July 1, 2017 and February 15, 2018. In addition, different classification systems based on adapted heuristics and supervised methods are proposed, and their main advantages and limitations, as well as their useful lifespan, are analysed. The experimental results confirm that, in recent months, a change in behaviour can be observed in the data collected through the cashtags of some LSE companies for which a cryptocurrency with the same ticker exists. However, the analysis also shows that both types of tweets can be accurately separated using classifiers that consider the distinctive features of each type.

Key words: AIM-100, Cashtags, Cryptocurrencies, Data Analysis, FTSE-100, London Stock Exchange, Support Vector Machines, Twitter

Contents

Acknowledgements
Abstract
List of figures
List of tables
1 Introduction
2 Related Work
3 Motivation
  3.1 Original idea
  3.2 Cashtag behaviour change
4 Datasets
  4.1 Extraction process
  4.2 Tweet structure
5 Methodology
6 Tweet features
  6.1 Corpus information
  6.2 User information
  6.3 Tweet time and place
7 Application of filtering criteria and results
  7.1 Heuristic filters
  7.2 SVM classifiers
  7.3 Combined systems
  7.4 LSTM classifiers
  7.5 Logistic regression systems
  7.6 Conclusions and limitations
8 Conclusions and future lines
9 Appendix I: Classifiers
  9.1 Supervised Methods
  9.2 Measurements
10 Appendix II: LSTM networks
Bibliography

List of Figures

3.1 Searches on Google trend evolution
3.2 LSE-100 tweet time distribution
3.3 LSE-100 tweet time distribution, Homonym (black) vs No Homonym (blue)
3.4 Homonym tweets time distribution, LSE (blue) vs Cryptocurrency (black)
5.1 Block diagram
6.1 Cryptocurrency text word cloud
6.2 Company text word cloud
6.3 Ticker distribution, LSE (blue) vs Cryptocurrency (black)
6.4 Cryptocurrency hashtag word cloud
6.5 Company hashtag word cloud
6.6 Cryptocurrency user description word cloud
6.7 Company user description word cloud
6.8 Follower distribution by user, LSE (blue) vs Cryptocurrency (black)
6.9 Friend distribution by user, LSE (blue) vs Cryptocurrency (black)
6.10 Cryptocurrency default profile distribution
6.11 Company default profile distribution
6.12 Account creation time distribution, LSE (blue) vs Cryptocurrency (black)
6.13 Tweet time distribution, LSE (blue) vs Cryptocurrency (black)
7.1 AUC Basic SVM classifier
7.2 AUC Extended SVM classifier
7.3 AUC Combined SVM classifier
7.4 AUC Independent SVM classifier
7.5 AUC LSTM SVM classifier
7.6 AUC independent LSTM SVM classifier
10.1 Recurrent network model
10.2 LSTM cell structure
List of Tables

3.1 Homonym tickers
4.1 Cryptocurrencies captured
4.2 LSE-100 tickers
4.3 Datasets overview
4.4 Tweet structure
6.1 Tweet main features
7.1 Wordbase heuristic filter measurements
7.2 Cryptocurrencies used (Heuristic word filter)
7.3 Words used (Heuristic word filter)
7.4 Independent variables (Basic SVM classifier)
7.5 Vocabulary Extended SVM classifier
7.6 SVM classifier measurements
7.7 Combined SVM classifier
7.8 Independent SVM classifier
7.9 Independent variables (Independent SVM filter)
7.10 Vocabulary independent SVM classifier
7.11 LSTM SVM classifier
7.12 Independent LSTM SVM classifier
7.13 Logistic regression systems measurements

1 Introduction

The progressive adoption of technology in the stock market has led to continuous growth in its business, helping both businesses and individual investors harvest information about diverse topics such as a company's outlook, the opinions of its clients, news about significant changes, or rumours. As is well known, success in stock trading depends heavily on the quality and speed of the information that supports decision-making. As online social media became part of people's habits, companies, brokers and other key players in the financial market also began to share more and more useful information and professional opinions about the stock exchanges.
All this public information turns social media into one of the main information sources for brokers, if not the greatest. Currently, Twitter is one of the most used platforms for sharing financial information from companies, brokers, news agencies or individual investors. As Twitter usage in this context is clearly increasing, it is important to stress that, according to Sprenger et al. (2014), stock microblogs exhibit three characteristics that distinguish them from stock message boards: (1) Twitter’s public timeline may capture the natural market conversation more accurately and reflect up-to-date developments; (2) Twitter reflects a more ticker-like live conversation, which exposes twitter-bloggers to the most recent information on all stocks and does not require users to actively enter the forum for a particular stock; and (3) twitter-bloggers have a strong incentive to publish valuable information in order to maintain their reputation (increasing mentions, the rate of retweets, and their followers), while financial bloggers can be indifferent to their reputation in the forum. Given suitable methods for sensing, harvesting and analysing it, this information can be very useful for many stakeholders, such as businesses, individuals making investment decisions, stock market analysts or law enforcement agencies.

One of the main mechanisms provided by Twitter to track financial information about a listed company is the cashtag. A cashtag is a label formed by the ticker of a company preceded by the $ symbol. Recall that the ticker of a company is a short sequence of letters, and sometimes a few digits, that identifies a listed company in financial environments. For example, Vodafone's ticker is VOD and its cashtag is $VOD. This label is added to tweets, much as hashtags are, and indicates that the tweet contains financial information about the company the ticker references.
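The cashtag definition above can be illustrated with a minimal Python sketch that pulls cashtags out of a tweet's text. The regular expression is an assumption made for illustration (a $ followed by a letter and up to five further letters or digits); it is not Twitter's official matching rule, and the function names `extract_cashtags` and `mentions_ticker` are hypothetical.

```python
import re

# Assumed pattern (not Twitter's official rule): "$" followed by a letter and
# up to five further letters or digits. The lookbehind rejects a "$" glued to
# a preceding word character or another "$"; the lookahead rejects longer runs.
CASHTAG_RE = re.compile(r"(?<![\w$])\$([A-Za-z][A-Za-z0-9]{0,5})(?![A-Za-z0-9])")


def extract_cashtags(text):
    """Return the upper-cased tickers of all cashtags in text, in order."""
    return [match.group(1).upper() for match in CASHTAG_RE.finditer(text)]


def mentions_ticker(text, ticker):
    """Return True if text contains a cashtag for the given ticker."""
    return ticker.upper() in extract_cashtags(text)


if __name__ == "__main__":
    print(extract_cashtags("Vodafone results are out $VOD"))  # ['VOD']
```

Note that a match only recovers the ticker string. On its own it cannot tell whether the tweet refers to a listed company or to a cryptocurrency that shares the same ticker.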
Twitter also provides resources to track the tweets that contain a specific cashtag. All of this turns cashtags into one of