The Predictive Power of Stock Micro-Blogging Sentiment in Forecasting Stock Market Behaviour

This thesis is submitted for the degree of Doctor of Philosophy

By Alya Ali AL-Nasseri

Brunel Business School
Brunel University, London
February 2016

ABSTRACT

Online stock forums have become a vital investing platform for publishing relevant and valuable user-generated content (UGC), such as investment recommendations and other stock-related information, allowing investors to view the opinions of a large number of users and to share trading ideas. This thesis applies methods from computational linguistics and text-mining techniques to analyse and extract, on a daily basis, sentiments from stock-related micro-blogging messages called “StockTwits”. The primary aim of this research is to provide an understanding of the ability of stock micro-blogging sentiments to forecast future stock price movements by investigating the various roles played by investor sentiment in determining asset prices on the stock market.

The empirical analysis in this thesis consists of four main parts based on the predictive power and the role of investor sentiment in the stock market. The first part discusses the findings of the text-mining procedure for extracting and predicting sentiments from stock-related micro-blogging data. Its purpose is to compare different machine learning algorithms in order to select the most accurate text-mining technique for predicting sentiment on StockTwits, using two different approaches to feature selection, namely filter and wrapper approaches. The second part of the analysis focuses on investigating the predictive correlations between StockTwits features and the stock market indicators.
It aims to examine the power of StockTwits variables to explain the dynamic nature of different financial market indicators. The third part of the analysis investigates the role played by noise traders in determining asset prices. The aim is to show that stock returns, volatility and trading volumes are affected by investor sentiment; it also seeks to investigate whether changes in sentiment (bullish or bearish) have different effects on stock market prices. The fourth part offers an in-depth analysis of some tweet-market relationships which represent an open problem in the empirical literature (e.g. sentiment-return relations and volume-disagreement relations).

The results suggest that StockTwits sentiments have explanatory power for the dynamics of stock prices in the U.S. market. Combining text-mining techniques with feature selection methods has proved successful in predicting StockTwits sentiments. The applications of the approach presented in this thesis offer real-time investment ideas that may provide investors and their peers with a decision support mechanism. Investor sentiment plays a critical role in determining asset prices in capital markets. Overall, the findings suggest that investor sentiment among noise traders is a priced factor. The findings confirm the existence of asymmetric spillover effects of bullish and bearish sentiments on the stock market. They also suggest that sentiment is a significant factor in explaining stock price behaviour in the capital market and imply a positive role of the stock market in the formation of investor sentiment. Furthermore, the research findings demonstrate that disagreement is not only an important determinant of trading volumes but also a significant factor influencing asset prices and returns in capital markets.
Overall, the findings of the thesis provide empirical evidence that failing to consider the role of investor sentiment in traditional finance theory could lead to an imperfect picture when explaining the behaviour of stock prices in stock markets.

DEDICATION

Dedicated to the loving memory of my late grandmother and grandfather (may Allah grant them eternal peace), who always desired and prayed for my success but did not live to see this great accomplishment. Also, to my parents and family for their unconditional love and support, which helped me to achieve my aspiration.

ACKNOWLEDGEMENTS

I am immensely indebted to Allah (The Almighty God) for bestowing on me the knowledge of His creation. I pray to Him for forgiveness, guidance and assistance, and for His continued support towards success throughout my life in this world and the hereafter.

I am most grateful for the wise counsel of my ‘wonderful’ supervisor, Dr. Sergio de Cesare. I have been fortunate and honoured to know him and to work under his supervision. I would like to express my profound gratitude for all his great help, support, encouragement and continuous guidance throughout the thesis-writing process.

My deepest appreciation also goes to Dr. Allan Tucker (Department of Computer Science) for his invaluable guidance, experience, inspiration, support and encouragement throughout my PhD journey. I am particularly grateful to him for working with me on a couple of joint papers on Text Mining. I owe him my deepest gratitude not only for his very helpful comments and guidance but also for all his kindness to me in boosting my confidence. I feel very lucky to have worked under his guidance and advice.

I would also like to express my deepest appreciation to Dr. Faek Menla Ali (Department of Social Science). I am very grateful to him for his continuous support, guidance and invaluable comments and suggestions on the empirical finance part of this thesis.
I am very thankful to him for his great advice and valuable comments on writing up and structuring journal papers.

Without the unconditional love and endless support of my parents, I would not have been in a position to complete this PhD research. No words can express my gratitude to them. Thank you, Mum and Dad, for your loving support and endless encouragement.

I owe a substantial debt of gratitude and thanks to my lovely sisters, Aisha and Tasnim. Both of you, more than anyone, have always been there for me during the twists and turns and through the ups and downs. I am deeply grateful to you, my gorgeous and amazing sisters, and I can never repay even a little of what you have given me of your exceptional love, constant care and emotional support.

My special thanks must also go to my three amazing and wonderful brothers, Hossam, Mohammed and Mansoor, for their invaluable support, care and prayers. All three of you have given me constant support and I truly cannot thank you enough. I love you all.

Special thanks and sincere gratitude are due also to my friends and other family members who have been very supportive during this challenging period of my PhD study.

Finally, I would like to take the opportunity to express my gratitude to members of the StockTwits website for providing me with the relevant data for this research. Without their valuable input, this study would not have been possible.

DECLARATION

I grant powers of discretion to the Librarian of Brunel University to allow this thesis to be copied in whole or in part without the necessity to contact me for permission. This permission covers only single copies made for study purposes subject to the normal conditions of acknowledgment.

PUBLICATIONS AND CONFERENCES

The following journal and conference papers are outputs based on the research conducted during my PhD study:

Al Nasseri, A., Tucker, A. and de Cesare, S., 2015.
Quantifying StockTwits semantic terms’ trading behavior in financial markets: An effective application of decision tree algorithms. Expert Systems with Applications, 42(23), pp.9192-9210.

Al Nasseri, A., Tucker, A. and de Cesare, S., 2014, January. Big Data Analysis of StockTwits to Predict Sentiments in the Stock Market. In Discovery Science (pp. 13-24). Springer International Publishing.

Al Nasseri, A., 2014. The Predictive Value of Stock Micro-blogging Sentiments in Predicting Stock Market Behaviour. Proceedings of the British Academy of Management (BAM) Conference. 09-11 September, Belfast Waterfront, Northern Ireland.

Al Nasseri, A., 2014. The Predictive Value of Stock Micro-blogging Sentiments in Predicting Stock Market Behaviour. British Academy of Management (BAM): SIG Workshop, University of East Anglia, Norwich, 12-13 June 2014.

Al Nasseri, A., 2016. Dispersion of Stock Returns and Investor Sentiment: StockTwits Evidence. World Finance Conference. 29-31 July, New York, Manhattan.

Journal Articles Submitted

“Dispersion of Stock Returns and Investor Sentiment: StockTwits Evidence”, submitted to the Journal of Empirical Finance.

“Investors’ divergence of opinion and Trading Volume: Evidence from Online Stock Forum”, targeted journal: Journal of Banking and Finance.

Best Paper Award

“The Predictive Value of Stock Micro-blogging Sentiments in Predicting Stock Market Behaviour”, paper presented at the Brunel Business School PhD Doctoral Symposium 2014, Brunel University (Best Overall Paper Award).

Table of Contents

ABSTRACT
DEDICATION
ACKNOWLEDGEMENTS
DECLARATION