Pages 1–6

A statistical analysis of Twitterati's reaction, across geographies and gender, to the Panama Papers Leak

Pruthvi N. Shetty, Revathy Sridharan, Snehil Vishwakarma, Srikanth Kanuri and Vikas Jangam

Date 05-04-2016
Professor: Muhammad Abdul-Mageed

ABSTRACT

Motivation: The Panama Papers are a set of 11.5 million leaked documents that detail financial and attorney-client information for more than 214,000 offshore companies associated with the Panamanian law firm and corporate service provider Mossack Fonseca. The leaked documents contain the identities of the companies' shareholders and directors, as well as some financial transactions. Among other things, they illustrate how wealthy individuals, including public officials, can keep personal financial information private.[1] At the time of publication, the papers identified five then-heads of state or government leaders from Argentina, Iceland, Saudi Arabia, Ukraine, and the United Arab Emirates, as well as government officials, close relatives, and close associates of various heads of government of more than forty other countries. The British Virgin Islands was home to half of the companies listed, and Hong Kong contained the most affiliated banks, law firms, and middlemen.

The names of several national leaders appear in the documents, including President Khalifa bin Zayed Al Nahyan of the United Arab Emirates, President Petro Poroshenko of Ukraine, King Salman of Saudi Arabia, and Prime Minister Sigmundur Davíð Gunnlaugsson of Iceland. Former heads of state or government mentioned in the papers include Sudanese president Ahmed al-Mirghani; the former Emir of Qatar, Hamad bin Khalifa Al Thani; and former prime ministers Bidzina Ivanishvili of Georgia, Ayad Allawi of Iraq, Ali Abu al-Ragheb of Jordan, Hamad bin Jassim bin Jaber Al Thani of Qatar, Pavlo Lazarenko of Ukraine, and Ion Sturza of Moldova.[2] The leaked files identified 61 family members and associates of prime ministers, presidents and kings, including the deceased father of British prime minister David Cameron; the brother-in-law of China's paramount leader Xi Jinping; the son of Malaysian prime minister Najib Razak; the children of Pakistani prime minister Nawaz Sharif; the children of Azerbaijani president Ilham Aliyev; Clive Khulubuse Zuma, the nephew of South African president Jacob Zuma; Nurali Aliyev, the grandson of Kazakh president Nursultan Nazarbayev; Mounir Majidi, the personal secretary of Moroccan king Mohammed VI; Kojo Annan, the son of former United Nations Secretary-General Kofi Annan; Mark Thatcher, the son of former British prime minister Margaret Thatcher; and the "favourite contractor" of Mexican president Enrique Peña Nieto.[3]

Naturally, this led to public outrage on Twitter at an unprecedented global scale, reaching up to 400K tweets in the first 72 hours. We saw this as an opportunity to obtain a high-quality, diverse data source from which to harvest the sentiment and identify the polarity of Twitter users across geographies and gender. Also, since many of the accused individuals are active Twitter users, this gave us the opportunity to analyze Twitter's reaction to each accused individual.[4]

Fig. 1. Countries affected by the Panama Paper leak [4]

Results: The data we collected reflects a largely negative sentiment among Twitter users. We also see a sharp rebuke of the accused individuals from different corners of the world. Based on gender, we have seen that male users tend to be more critical in their tweets. Likewise, we have seen greater outrage from the countries directly affected by the scandal than from other countries.
Keywords: Twitter, Panama Papers, Sentiment, Gender, Geography, Social Media

Project Report - PanamaPaperLeaks

1 INTRODUCTION

In this project, we mined Twitter for tweets with the hashtags "PanamaPapers" and "panamapapersleak". We pulled the data in two categories: with location information and without location information. We used the packages 'tweepy' for Python and 'twitteR' and 'SocialMediaLab' for R. After collecting over 10,000 tweets per case, we analyzed the data in terms of gender and location to study the polarity and sentiment of Twitter users. For this purpose, we employed packages such as 'sentiment', 'tm', 'qdap' and 'ggplot'.

2 CHALLENGES

Sentiment and opinion mining can be useful in several ways. It can help marketers evaluate the success of an ad campaign or new product launch, determine which versions of a product or service are popular, and identify which demographics like or dislike particular product features. For example, a review on a website might be broadly positive about a digital camera, but specifically negative about how heavy it is. Being able to identify this kind of information in a systematic way gives the vendor a much clearer picture of public opinion than surveys or focus groups do, because the data is created by the customer.

There are several challenges in opinion mining. The first is that a word that is considered positive in one situation may be considered negative in another. Take the word "long", for instance. If a customer said a laptop's battery life was long, that would be a positive opinion. If the customer said that the laptop's start-up time was long, however, that would be a negative opinion. These differences mean that an opinion system trained to gather opinions on one type of product or product feature may not perform very well on another.

In this project, we had to carefully examine the tweets collected from users across countries and gender to form a credible corpus of data for further analysis. We also had to refine the training data, adding certain keywords specific to the Panama Papers leak, to accurately identify what each tweet implies.

3 PACKAGES

3.1 twitteR

twitteR is an R package which provides access to the Twitter API. Most functionality of the API is supported, with a bias towards API calls that are more useful in data analysis as opposed to daily interaction. It is an R-based Twitter client that provides an interface to the Twitter web API. It is authored by Jeff Gentry.

3.2 SocialMediaLab

VOSON SocialMediaLab is an R package that provides a suite of tools for collecting and constructing networks from social media data. It provides easy-to-use functions for collecting data across popular platforms (Instagram, Facebook, Twitter, and YouTube) and generating different types of networks for analysis. SocialMediaLab also collects the associated text data from social media platforms (e.g. Tweets, Facebook fan page posts and comments, YouTube video comments).

Fig. 2. Sentiment analysis across six variants.

3.3 TM & Sentiment

R provides two packages for working with unstructured text: tm and sentiment. tm can be installed in the usual way. Unfortunately, sentiment was archived in 2012 and is therefore more difficult to install. However, it can still be installed from an external repository. The sentiment package contains two handy functions serving our purposes:

• classify_emotion: This function analyzes some text and classifies it into different types of emotion: anger, disgust, fear, joy, sadness, and surprise. The classification can be performed using two algorithms: one is a naive Bayes classifier trained on Carlo Strapparava and Alessandro Valitutti's emotions lexicon; the other is a simple voter procedure.
• classify_polarity: In contrast to the classification of emotions, the classify_polarity function classifies some text as positive or negative. In this case, the classification can be done using a naive Bayes algorithm trained on Janyce Wiebe's subjectivity lexicon, or by a simple voter algorithm.

3.4 Wordcloud

As the name suggests, this R package is used to build word clouds: it takes text data as input and builds a word cloud from it. We perform a series of operations on the text data to simplify it. First, we create a corpus. Next, we convert the corpus to a plain text document. Then we remove all punctuation and stopwords. Stopwords are commonly used words in the English language such as I, me, my, etc.

There are a few ways to customize the word cloud:

• scale: This is used to indicate the range of sizes of the words.
• max.words and min.freq: These parameters are used to limit the number of words plotted. max.words will plot the specified number of words and discard the least frequent terms, whereas min.freq will discard all terms whose frequency is below the specified value.
• random.order: Setting this to FALSE plots the words with the highest frequency first. If we don't set this, the words are plotted in a random order, and the highest frequency words may not necessarily appear in the center.
• rot.per: This value determines the fraction of words that are plotted vertically.

[...] parameters. With the RESTful API, we cannot crawl for data older than 15 days from the current date. There are also other limitations when using the RESTful API for crawling data.

3.5.2 OAuth Authentication: Tweepy tries to make OAuth as painless as possible. To begin the process, we need to register our client application with Twitter. Create a new application and, once you are done, you should have your consumer token and secret. Keep these two handy; you'll need them. The next step is creating an OAuthHandler instance. Into this we pass the consumer token and secret obtained in the previous step:

    auth = tweepy.OAuthHandler(consumer_token, consumer_secret)
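The OAuth step described in Section 3.5.2 can be sketched roughly as follows. This is a minimal illustration and not the project's actual script. The environment variable names and the helper functions `load_credentials` and `build_api` are hypothetical. The access token and access token secret are also an assumption: the text above mentions only the consumer token and secret. Keeping credentials out of the source file is one reasonable design choice among several.

```python
import os


def load_credentials(env=None):
    """Collect the OAuth values from a mapping (defaults to os.environ).

    The variable names below are hypothetical placeholders, not names
    required by Twitter or tweepy.
    """
    env = os.environ if env is None else env
    names = {
        "consumer_token": "TWITTER_CONSUMER_TOKEN",
        "consumer_secret": "TWITTER_CONSUMER_SECRET",
        "access_token": "TWITTER_ACCESS_TOKEN",          # assumption: not in the text
        "access_token_secret": "TWITTER_ACCESS_SECRET",  # assumption: not in the text
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(sorted(missing)))
    return {key: env[var] for key, var in names.items()}


def build_api(creds):
    """Build a tweepy API object from the credentials; no request is sent here."""
    # Imported lazily so that load_credentials can be used without tweepy installed.
    import tweepy

    # The consumer token and secret come from registering the application.
    auth = tweepy.OAuthHandler(creds["consumer_token"], creds["consumer_secret"])
    # The access token pair lets the client act on behalf of one account.
    auth.set_access_token(creds["access_token"], creds["access_token_secret"])
    return tweepy.API(auth)
```

With the four variables set in the environment, `api = build_api(load_credentials())` would give an object that can then be used for the search calls discussed in this section.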
