No News Is Bad News

Total Page:16

File Type:pdf, Size:1020Kb

No News Is Bad News n Market report News analytics No news is bad news How news analytics are being used to create a superior trading experience. Bob Giffords* s e-trading competition increase the relevance and A intensifies, traders are recall of searches carried looking for ways to ‘get out.” Besides searching a smarter’. One way is to tap wide range of internal and into a new range of auto- external documents, users mated news flows, mining can also specify their own the text for trading opportu- websites to monitor BNP’s nities and threats. “Current LEOnard information portal. algos are really flying blind if Both ‘push and pull’ tech- they just use market data,” nologies are being used, but maintains Ryan Terpstra, traders want to go further to director of global news feeds drive their quant models at Thomson Financial News. directly with news and not just Brokers are attacking the market data. An early adopter problem head on using a new is JRC Capital Management generation of decision sup- Consultancy & Research in port tools that combine Berlin, a hedge fund that also search engines with text ana- distributes models for invest- lytics. “We are using text ment firms to plug into their mining to structure our trading systems. “A large information resources German bank, for example, according to our own com- has created a certificate prod- petitive intelligence catego- uct using one model,” says ries,” says Valérie Savin- Petra Ristau, head of research Abdelli from the economics and development at JRC. “To research department at BNP stay at the forefront we are Paribas. “By applying our investing heavily in natural own tags in terms of what language processing, together © Digital Vision interests us we can greatly with several partners.” n the trade n issue 13 n jul-sep 2007 101 n Market report News analytics The hedge fund’s man- “Algorithmic news feeds are primarily being agement is very committed n used by quants on the buy-side or by to text processing. “They have seen how fast markets investment banks, which use them for risk can react and how extreme that can be for certain news management. The feeds alert them to changes in and announcements,” says the markets that might cause them to suspend an Ristau. “Most technical quant models can’t cope algo or change its parameters.” with large moves in volatility Ryan Terpstra, director of global news feeds, Thomson Financial News or market rises or falls.” JRC’s aim is to get an early “Traditional vendors like Dow ly this will ever match a spe- warning system that they can Jones and Reuters have cialist trader’s experience and use alongside their technical launched low latency news detailed understanding of the and fundamental analysis. flow offerings with some content. So traders will have Early results are promising, annotation,” says Michael to deploy the technology on a but sometimes surprising. Kearns, professor of computer large scale, achieving slightly Ravenpack, based in Spain, and information science at the better than chance over thou- provides the artificial intelli- University of Pennsylvania, sands of stocks, some thinly gence engine behind the Dow and head of a quant team at traded. That implies a lot of Jones News Analytics service, Banc of America Securities. heavy lifting.” Kearns would offering sentiment analysis of “There are also new entrants like to see much more empha- their news flow. They analysed like Relegence and Monitor sis on fact extraction. and charted the Dow Jones 20- 110, which mine the web and Some, like Dow Jones and year news archive, classifying blog space, as well as local, less Ravenpack, offer market senti- the stories as being S&P posi- accessible or more specialist ment indicators. Reuters has tive, negative or neutral. “What information sources,” he joined forces with Infonic, for- you get looks incredibly like observes. merly Corpora, to offer a soft- the S&P 500 index,” says Phil Approaches differ. The ware product for measuring Gagner, CTO. “Is it exact? No, basic services provide a stream sentiment that can be applied but strikingly similar.” of low latency metadata in to any data feed. XML format to categorise or “Traditionally, this process had Mining the flow tag the articles and extract been performed by a human Natural language processing some key facts like economic linguistic analyst in the PR/ has matured greatly in the last or corporate results or fore- marketing industry, at about few years, and the race to casts that trading algorithms six articles per hour,” says launch news flow products is can act on. “Current vendor Richard Brown, business man- now on. Investment firms can offerings are focused mainly at ager, Reuters NewsScope. “Our now either do this in-house the document level, classifying system is designed to score using text analytic software or the overall intent of an article about 10 articles per second buy pre-processed data servic- and pulling out a few facts,” and can scale to handle even es from the vendors. says Kearns. “It’s most unlike- the highest news flow volume, n the trade n issue 13 n jul-sep 2007 103 n Market report n Market report News analytics News analytics banks, and expects to launch it “Today, algorithmic news in the first quarter next year. feeds are primarily being used “We are focusing on extracting by quants on the buy-side or facts rather than assigning our by investment banks, which own view on sentiment,” says use them for risk manage- Terpstra at Thomson. “We’ve ment,” says Terpstra. “The heard very different views on feeds alert them to changes in sentiment analysis. Our cus- the markets that might cause tomers prefer to draw their them to suspend an algo or own conclusions on sentiment, change its parameters.” for example, by comparing an Much more, of course, is earnings per share announce- promised. “An economic event ment to our First Call esti- such as a better-than-expected mates database.” consumer confidence report Fact extraction requires the can kick off a basket trade to which typically occurs during “[News software to carefully map out purchase retail stocks and earnings season. This makes n analytics] the linguistic structure of the rotate out of other positions,” the analysis applicable to a can help an document and use other infor- says Brown at Reuters. “It can wide array of use cases, includ- algorithm to get mation sources to confirm and be done within milliseconds, ing algorithmic trading.” better behind improve the semantic interpre- saving valuable basis points on the scenes, tation. “We are integrating our the transactions.” Gauging sentiment they can help news with Thomson’s Kearns, who has been the client filter Running sentiment analysis Quantitative Analytics product researching machine learn- the news and software in-house gives con- identify what is suite,” says Terpstra, “to deliv- ing and computational trol. “The output of the sen- important, and er news that is correlated and finance for some years, timent engine is very robust can help the normalised to other content agrees: “For algo trading, and each client may inter- client select the sets including fundamentals, news flow can give tangible pret the results differently,” right algorithm estimates, ownership and his- benefits by listening for key says Brown. “Weighting by to use in the torical tick data.” The platform events or recognising market news source, particular indi- pre-trade will be data agnostic, so it will reactions to such events.” cators, or combinations of assessment of contain content from indicators, as well as varying market Thomson and third parties. Assessing the potential the lag times to market reac- conditions.” Terpstra believes this is very HSBC is still in a research and tions and constructing pro- Tom Middleton, important to quants that need development phase on news head of European prietary trading baskets, are algorithmic to see a comprehensive view at flow, reviewing vendor offer- just a few ways the sentiment trading, Citi any time. ings and exploring use cases. information can be highly All the vendors will pro- “The real challenge will be personalised.” vide a historical archive for assessing return versus invest- Thomson Financial News is back-testing and more or less ment,” says Kevin Bourne, developing its own news flow integration with other data, global head of equities execu- together with a number of including synchronisation of tion, HSBC CIBM. “Is this hedge funds and investment timestamps and symbology. going to become just another 104 n the trade n issue 13 n jul-sep 2007 n Market report n Market report News analytics News analytics layer of complex IT engineer- our own algorithms. For this ing which does not deliver a we’ll use different approaches clear net dollar benefit?” Such from simple heuristics, which caution is widespread. have a lot of merit, to com- One investment bank plex neural nets and genetic ready to move forward is searches to provide automated Citi. “We have just finished adaptive learning.” a due diligence, reviewing Meanwhile, the interest vendors of news flow analyt- grows and third party OMS/ ics and the kinds of analysis EMS vendors are beginning we should apply to them,” to incorporate the feeds into says Tom Middleton, head their platforms. “In June we of European algorithmic announced a dedicated trading at Citi. “We will now adapter to allow users to engage with one vendor and integrate the Dow Jones ‘ele- add this to our algorithmic mentised’ news feed into the “Is this including both technical and trading offering in the com- Apama trading platform,” n [news fractal analysis to deal with the ing months.” There are three says Dr John Bates, founder analytics] going high impact low probability ways news analytics can and vice president, Apama to become just events.
Recommended publications
  • Analysis of News Sentiment and Its Application to Finance
    ANALYSIS OF NEWS SENTIMENT AND ITS APPLICATION TO FINANCE By Xiang Yu A thesis submitted for the degree of Doctor of Philosophy School of Information Systems, Computing and Mathematics, Brunel University 6 May 2014 Dedication: To the loving memory of my mother. Abstract We report our investigation of how news stories influence the behaviour of tradable financial assets, in particular, equities. We consider the established methods of turning news events into a quantifiable measure and explore the models which connect these measures to financial decision making and risk control. The study of our thesis is built around two practical, as well as, research problems which are determining trading strategies and quantifying trading risk. We have constructed a new measure which takes into consideration (i) the volume of news and (ii) the decaying effect of news sentiment. In this way we derive the impact of aggregated news events for a given asset; we have defined this as the impact score. We also characterise the behaviour of assets using three parameters, which are return, volatility and liquidity, and construct predictive models which incorporate impact scores. The derivation of the impact measure and the characterisation of asset behaviour by introducing liquidity are two innovations reported in this thesis and are claimed to be contributions to knowledge. The impact of news on asset behaviour is explored using two sets of predictive models: the univariate models and the multivariate models. In our univariate predictive models, a universe of 53 assets were considered in order to justify the relationship of news and assets across 9 different sectors.
    [Show full text]
  • Text and Context: Language Analytics in Finance
    Text and Context: Language Analytics in Finance Text and Context: Language Analytics in Finance Sanjiv Ranjan Das Santa Clara University Leavey School of Business [email protected] Boston — Delft Foundations and Trends R in Finance Published, sold and distributed by: now Publishers Inc. PO Box 1024 Hanover, MA 02339 United States Tel. +1-781-985-4510 www.nowpublishers.com [email protected] Outside North America: now Publishers Inc. PO Box 179 2600 AD Delft The Netherlands Tel. +31-6-51115274 The preferred citation for this publication is S. R. Das. Text and Context: Language Analytics in Finance. Foundations and Trends R in Finance, vol. 8, no. 3, pp. 145–260, 2014. R This Foundations and Trends issue was typeset in LATEX using a class file designed by Neal Parikh. Printed on acid-free paper. ISBN: 978-1-60198-910-9 c 2014 S. R. Das All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers. Photocopying. In the USA: This journal is registered at the Copyright Clearance Cen- ter, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc for users registered with the Copyright Clearance Center (CCC). The ‘services’ for users can be found on the internet at: www.copyright.com For those organizations that have been granted a photocopy license, a separate system of payment has been arranged.
    [Show full text]
  • Sentiment Analysis of News Headlines for Stock Trend Prediction
    © 2020 IJRTI | Volume 5, Issue 12 | ISSN: 2456-3315 Sentiment Analysis of News Headlines For Stock Trend Prediction Oshi Gupta Student of Bachelor of Technology Department of Computer Science and Technology Symbiosis University of Applied Sciences, Indore, India Abstract: Stock market is the bone of fast emerging economies. Stock market data analysis needs the help of artificial intelligence and data mining techniques. The volatility of stock prices depends on gains or losses of certain companies. News articles are one of the most important factors which influence the stock market. This project is about taking non quantifiable data such as financial news articles about a company and predicting its future stock trend with news sentiment classification. Assuming that news articles have impact on stock market, this is an attempt to study relationship between news and stock trend. Index Terms: Text Mining, Sentiment analysis, Naive Bayes, Random Forest, Stock trends. 1. INTRODUCTION News has always been an important source of information to build perception of market investments. As the volumes of news and news sources are increasing rapidly, it’s becoming impossible for an investor or even a group of investors to find out relevant news from the big chunk of news available. But, it’s important to make a rational choice of investing timely in order to make maximum profit out of the investment plan. And having this limitation, computation comes into the place which automatically extracts news from all possible news sources, filter and aggregate relevant ones, analyse them in order to give them real time sentiments. Stock market is the bone of fast emerging economies such as India.
    [Show full text]
  • Its Impact on Liquidity and Trading Review Authors
    DRAFT DRAFT DRAFT Automated Analysis of News to Compute Market Sentiment: Its Impact on Liquidity and Trading Review Authors: 1 DRAFT DRAFT DRAFT CONTENTS 0. Abstract 1. Introduction 2. Consideration of asset classes for automated trading 3. Market micro structure and liquidity 4. Categorisation of trading activities 5. Automated news analysis and market sentiment 6. News analytics and market sentiment: impact on liquidity 7. News analytics and its application to trading 8. Discussions 9. References 2 DRAFT DRAFT DRAFT 0. Abstract Computer trading in financial markets is a rapidly developing field with a growing number of applications. Automated analysis of news and computation of market sentiment is a related applied research topic which impinges on the methods and models deployed in the former. In this review we have first explored the asset classes which are best suited for computer trading. We critically analyse the role of different classes of traders and categorise alternative types of automated trading. We present in a summary form the essential aspects of market microstructure and the process of price formation as this takes place in trading. We introduce alternative measures of liquidity which have been developed in the context of bid-ask of price quotation and explore its connection to market microstructure and trading. We review the technology and the prevalent methods for news sentiment analysis whereby qualitative textual news data is turned into market sentiment. The impact of news on liquidity and automated trading is critically examined. Finally we explore the interaction between manual and automated trading. 3 DRAFT DRAFT DRAFT 1. Introduction This report is prepared as a driver review study for the Foresight project: The Future of Computer Trading in Financial Markets.
    [Show full text]
  • Download File
    From Language to the Real World: Entity-Driven Text Analytics Boyi Xie Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2015 c 2015 Boyi Xie All Rights Reserved ABSTRACT From Language to the Real World: Entity-Driven Text Analytics Boyi Xie This study focuses on the modeling of the underlying structured semantic information in natural language text to predict real world phenomena. The thesis of this work is that a general and uniform representation of linguistic information that combines multiple levels, such as semantic frames and roles, syntactic dependency structure, lexical items and their sentiment values, can support challeng- ing classification tasks for NLP problems. The hypothesis behind this work is that it is possible to generate a document representation using more complex data structures, such as trees and graphs, to distinguish the depicted scenarios and semantic roles of the entity mentions in text, which can facil- itate text mining tasks by exploiting the deeper semantic information. The testbed for the document representation is entity-driven text analytics, a recent area of active research where large collection of documents are analyzed to study and make predictions about real world outcomes of the entity mentions in text, with the hypothesis that the prediction will be more successful if the representation can capture not only the actual words and grammatical structures but also the underlying semantic generalizations encoded in frame semantics, and the dependency relations among frames and words. The main contribution of this study includes the demonstration of the benefits of frame semantic features and how to use them in document representation.
    [Show full text]
  • How Conflict and Corruption Impact Financial Markets
    January 2018 Risk Systems That Read® Northfield Information Services Research Dan diBartolomeo After a series of research projects going back to 1997, Northfield has commercially introduced risk models augmented with quantified news flows for investors. News conditioned versions of our “near horizon” models became available to Northfield clients in January 2018. We believe this is the biggest step forward in risk modeling for asset management since the creation of the multi-factor risk model in the 1970s. News and Conditional Risk Models Almost all available risk models are “unconditional.” They are based on a sample of past history that is deemed relevant, possibly giving more weight to recent observations, or assuming a simple trend in volatility (e.g. GARCH). We recently surveyed industry practice and found models based on sample periods ranging from as short as sixty trading days to more than twenty years. Once the sample period is determined, the heroic assumption is made that the future will be like the past. This process omits everything we know about the present, and how the present is different from the past average conditions of the sample period. Using the information about the present to adjust the risk estimates has been standard in some Northfield models since 1997 and in all models since 2009. For our purposes, “News” is the set of information coming to investors that tell us how the present is different from the past. This definition implies that routine information affirming the “status quo” is not news irrespective of how it is delivered. We also recognize the extensive literature showing that investors respond differently to “announcements” (time of information release anticipated) than to “news” where both the content and timing are a surprise.
    [Show full text]
  • Multi-Factor Sentiment Analysis for Gauging Investors Fear by Ichihan Tai B.S. in Engineering in Computer Science, May 2012
    Multi-factor Sentiment Analysis for Gauging Investors Fear by Ichihan Tai B.S. in Engineering in Computer Science, May 2012, University of Michigan M.S. in Engineering in Industrial and Operations Engineering, December 2012, University of Michigan A Praxis submitted to The Faculty of The School of Engineering and Applied Science of The George Washington University in partial fulfillment of the requirements of the degree of Doctor of Engineering August 31, 2018 Praxis directed by Amir H. Etemadi Assistant Professor of Engineering and Applied Science Ebrahim Malalla Visiting Associate Professor of Engineering and Applied Science The School of Engineering and Applied Science of The George Washington University certifies that Ichihan Tai has passed the Final Examination for the degree of Doctor of Engineering as of the date of praxis defense, July 19, 2018. This is the final and approved form of the praxis. Multi-factor Sentiment Analysis for Gauging Investors Fear Ichihan Tai Praxis Research Committee: Amir H. Etemadi, Assistant Professor of Engineering and Applied Science, Praxis Co-Director Ebrahim Malalla, Visiting Associate Professor of Engineering and Applied Science, Praxis Co-Director Takeo Mitsuhashi, Chief Executive Officer & Chief Investment Officer, Committee Member ii © Copyright 2018 by Ichihan Tai All rights reserved iii Acknowledgements I am grateful to my academic advisors Dr. Amir H. Etemadi and Dr. Ebrahim Malalla, and previous academic advisors Dr. Bill Olson and Dr. Paul Blessner for all the support and guidance throughout my doctorial study. I appreciate my employer Takeo Mitsuhashi and Satoshi Tazaki for financial supports and accommodations to my doctorial study. I am thankful to my previous superiors Sunny Park, Andrew Ray, and David Carlebach for all the encouragements and supports to get me started in this doctorial journey.
    [Show full text]
  • Topic Modelling and Sentiment Analysis with the Bangla Language: a Deep Learning Approach Combined with the Latent Dirichlet Allocation
    Topic Modelling and Sentiment Analysis with the Bangla Language: A Deep Learning Approach Combined with the Latent Dirichlet Allocation by Mustakim Al Helal A thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements For the Degree of Master of Science in Computer Science University of Regina Regina, Saskatchewan September, 2018 UNIVERSITY OF REGINA FACULTY OF GRADUATE STUDIES AND RESEARCH SUPERVISORY AND EXAMINING COMMITTEE Mustakim Al Helal, candidate for the degree of Master of Science in Computer Science, has presented a thesis titled, Topic Modelling and Sentiment Analysis with the Bangla Language: A Deep Learning Approach Combined with the Latent Dirichlet Allocation, in an oral examination held on August 28, 2018. The following committee members have found the thesis acceptable in form and content, and that the candidate demonstrated satisfactory knowledge of the subject material. External Examiner: *Dr. Yllias Chali, University of Lethbridge Supervisor: Dr. Malek Mouhoub, Department of Computer Science Committee Member: Dr. Samira Sadaoui, Department of Computer Science Committee Member: Dr. David Gerhard, Department of Computer Science Chair of Defense: Dr. Maria Velez-Caicedo, Department of Geology *via SKYPE Abstract In this thesis, the Bangla language topic modelling and sentiment analysis has been researched. It has two contributions lining up together. In this regard, we have proposed different models for both the topic modelling and the sentiment analysis task. Many research exist for both of these works but they do not address the Bangla language. Topic modelling is a powerful technique for unsupervised analysis of large document collections. There are various efficient topic modelling techniques available for the English language as it is one of the most spoken lan- guages in the whole world, but not for the other spoken languages.
    [Show full text]
  • Predicting Returns with Text Data
    WORKING PAPER · NO. 2019-69 Predicting Returns with Text Data Zheng Tracy Ke, Brian Kelly, Dacheng Xiu MAY 2019 5757 S. University Ave. Chicago, IL 60637 Main: 773.702.5599 bfi.uchicago.edu Predicting Returns with Text Data∗ Zheng Tracy Ke Bryan Kelly Dacheng Xiu Department of Statistics Yale University, AQR Capital Booth School of Business Harvard University Management, and NBER University of Chicago May 14, 2019 Abstract We introduce a new text-mining methodology that extracts sentiment information from news articles to predict asset returns. Unlike more common sentiment scores used for stock return prediction (e.g., those sold by commercial vendors or built with dictionary-based methods), our supervised learning framework constructs a sentiment score that is specifically adapted to the problem of return prediction. Our method proceeds in three steps: 1) isolating a list of sentiment terms via predictive screening, 2) assigning sentiment weights to these words via topic modeling, and 3) aggregating terms into an article-level sentiment score via penalized likelihood. We derive theoretical guarantees on the accuracy of estimates from our model with minimal assumptions. In our empirical analysis, we text-mine one of the most actively monitored streams of news articles in the financial system|the Dow Jones Newswires|and show that our supervised sentiment model excels at extracting return-predictive signals in this context. Key words: Text Mining, Machine Learning, Return Predictability, Sentiment Analysis, Screen- ing, Topic Modeling, Penalized
    [Show full text]
  • Leveraging Text Mining and Analytical Technology to Enhance Financial Planning and Analysis
    Iowa State University Capstones, Theses and Creative Components Dissertations Spring 2021 Leveraging Text Mining and Analytical Technology to Enhance Financial Planning and Analysis Yih-Shan Sheu Follow this and additional works at: https://lib.dr.iastate.edu/creativecomponents Part of the Business Analytics Commons Recommended Citation Sheu, Yih-Shan, "Leveraging Text Mining and Analytical Technology to Enhance Financial Planning and Analysis" (2021). Creative Components. 810. https://lib.dr.iastate.edu/creativecomponents/810 This Creative Component is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Creative Components by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Leveraging Text Mining and Analytical Technology to Enhance Financial Planning and Analysis by Yih-Shan Sheu A creative component submitted to the graduate faculty in partial fulfillment of the requirements for the degree of MASTER OF BUSINESS ADMINISTRATION (MBA) and MASTER OF SCIENCE (MS) Major: Business Analytics (MBA) and Information Systems (MS) Program of Study Committee: Dr. Anthony M. Townsend, Major Professor (MSIS) Dr. Valentina Salotti (MBA) The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this creative component. The Graduate College will ensure this creative component is globally accessible and will not permit alterations after a degree is conferred. Iowa State University Ames, Iowa 2021 Copyright © Yih-Shan Sheu, 2021. All rights reserved. 1 TABLE OF CONTENTS LIST OF FIGURES ...........................................................................................................
    [Show full text]
  • The Relevance of High-Frequency News Analytics for Lower-Frequency Investment Strategies∗
    The relevance of high-frequency news analytics for lower-frequency investment strategies∗ David Happersbergerya,b Harald Lohrea,b Ingmar Noltea Carsten Rotherb,c aCentre for Financial Econometrics, Asset Markets and Macroeconomic Policy, Lancaster University Management School, United Kingdom bInvesco Quantitative Strategies, Frankfurt am Main, Germany cUniversity of Hamburg, Germany This version: January 15, 2020 Abstract This paper investigates whether lower-frequency investment strategies can be enhanced by the use of high-frequency news analytics. First, we study the cross-sectional characteristics of a broad set of indicators generated from news flow data. While long-short equity portfolios based on news sentiment indicators show promise in global and European stock universes, results for the US and Japan are rather modest. Second, we investigate the benefits of incorporating news-based equity factors into multi-factor investment strategies. Risk-based asset allocation strategies such as minimum-variance and risk parity strategies benefit from augmenting a portfolio of global equity factors by news sentiment-related equity factors. Also, active factor allocation strategies can be enhanced by utilizing the information embedded in news flow data. Factor timing using fundamental and technical time-series predictors generates statistically significant and economically relevant results. Similarly, a factor tilting strategy that exploits cross-sectional news-related information outperforms an equally weighted benchmark portfolio. JEL classification: G11, G12, G17. Keywords: News Analytics, News Sentiment, Portfolio Sorts, Factor Timing, Factor Tilting, Parametric Portfolio Policies. ∗We thank Stefan Mittnik, and the participants of the 2018 CEQURA Conference on Advances in Financial and Insurance Risk Management in Munich for fruitful discussions and suggestions.
    [Show full text]
  • Social Media Analytics: a Survey of Techniques, Tools and Platforms
    AI & Soc (2015) 30:89–116 DOI 10.1007/s00146-014-0549-4 OPEN FORUM Social media analytics: a survey of techniques, tools and platforms Bogdan Batrinca • Philip C. Treleaven Received: 25 February 2014 / Accepted: 4 July 2014 / Published online: 26 July 2014 Ó The Author(s) 2014. This article is published with open access at Springerlink.com Abstract This paper is written for (social science) scientists seeking to utilize social media scraping and researchers seeking to analyze the wealth of social media analytics either in their research or business. The data now available. It presents a comprehensive review of retrieval techniques that are presented in this paper are software tools for social networking media, wikis, really valid at the time of writing this paper (June 2014), but they simple syndication feeds, blogs, newsgroups, chat and are subject to change since social media data scraping APIs news feeds. For completeness, it also includes introduc- are rapidly changing. tions to social media scraping, storage, data cleaning and sentiment analysis. Although principally a review, the Keywords Social media Á Scraping Á Behavior paper also provides a methodology and a critique of social economics Á Sentiment analysis Á Opinion mining Á media tools. Analyzing social media, in particular Twitter NLP Á Toolkits Á Software platforms feeds for sentiment analysis, has become a major research and business activity due to the availability of web-based application programming interfaces (APIs) provided by 1 Introduction Twitter, Facebook and News services. This has led to an ‘explosion’ of data services, software tools for scraping and Social media is defined as web-based and mobile-based analysis and social media analytics platforms.
    [Show full text]