Classification of Twitter Trends Using Feature Ranking and Feature Selection

Total Page:16

File Type:pdf, Size:1020Kb

Classification of Twitter Trends Using Feature Ranking and Feature Selection Classification of Twitter Trends using Feature ranking and Feature Selection A Thesis presented to the Faculty of the Graduate School at the University of Missouri In Partial Fulfillment of the Requirements for the Degree Master of Science by Abhishek Shah Dr. Wenjun Zeng, Thesis Supervisor December 2015 The undersigned, appointed by the dean of the Graduate School, have examined the thesis entitled Classification of Twitter Trends using Feature Ranking and Feature Selection Presented by Abhishek Shah, A candidate for the degree of Master of Science, And hereby certify that, in their opinion is worthy of acceptance. Professor Wenjun Zeng Professor Toni Kazic Professor Mike McKean Acknowledgments I would like to thank Dr. Wenjun Zeng for his tremendous support and guidance. He not only provided excellent guidance, but also asked tough questions and pushed for excellence. He took extra time out of his current job to provide guidance and assistance whenever necessary. Without his knowledge and guidance, this work would not have been possible. I would also like to thank Dr. Suman Deb Roy, my mentor, who brought this problem to my attention in the first place. He was the one who showed the pathway to this research. His support and guidance has also been very useful. I would like to acknowledge A. Zubiaga, D. Spina, V. Fresno and R. Martinez, who provided the dataset and whose results were used as a benchmark for our research. Last but not the least, I would like to thank my parents, my brother and Nikita Bhatia for providing unconditional support and affection throughout my academic life. ii Table of Contents List of Figures ........................................................................................................................... iv List of Tables ............................................................................................................................... v Abstract ...................................................................................................................................... vi Chapter 1: Introduction ......................................................................................................... 1 1.1 Background ............................................................................................................................ 3 1.2 Related Work ................................................................................................................................ 6 Chapter 2: Classification System and Dataset ................................................................ 8 2.1 System Overview ......................................................................................................................... 8 2.2 Data Collection .......................................................................................................................... 10 2.3 Trending Topic Categories .................................................................................................... 11 2.4 Data Pre Processing ................................................................................................................. 13 2.5 Data organization ..................................................................................................................... 14 2.6 Data Cleanup .............................................................................................................................. 14 Chapter 3: Feature Selection .............................................................................................. 17 3.1 Feature Ranking ....................................................................................................................... 17 3.2 Forward Selection .................................................................................................................... 21 Chapter 4: Classification ...................................................................................................... 23 4.1 N-different classifiers ............................................................................................................. 23 4.2 Training and Testing Dataset ............................................................................................... 24 4.3 Naïve Bayes Classifier ............................................................................................................. 24 4.4 Bayesian Network .................................................................................................................... 26 Chapter 5: Results .................................................................................................................. 28 5.1 Bag-of-Words Ranking Analysis .......................................................................................... 28 5.2 TF-IDF Ranking Analysis ........................................................................................................ 30 5.3 Bag-of-Words vs TF-IDF ......................................................................................................... 33 5.4 Class Precision Analysis ......................................................................................................... 34 Chapter 6: Discussion and Future Work ........................................................................ 37 6.1 Discussion ................................................................................................................................... 37 6.2 Recognition ................................................................................................................................. 38 6.3 Future Work ............................................................................................................................... 39 Chapter 7: Conclusion ........................................................................................................... 41 References ................................................................................................................................ 43 iii List of Figures Figure 1: An overview of the end-to-end classification system ............................................ 9 Figure 2: Bayesian Network for trending topic classification ............................................ 27 Figure 3: Feature Selection (Bag-of-words) vs. No Feature Selection ............................ 30 Figure 4: Feature Selection (TF-IDF) vs. No Feature Selection .......................................... 33 Figure 5: Meme class precision ...................................................................................................... 35 Figure 6: News class precision ....................................................................................................... 35 Figure 7: Ongoing-event class precision ..................................................................................... 36 Figure 8: Commemorative class precision ................................................................................. 36 iv List of Tables Table 1: Trending Topic Examples with a Sample Tweet ....................................................... 6 Table 2: Example and description of each category of trending topic ............................ 13 Table 3: Top 10 features using bag-of-words approach for each category ................... 18 Table 4: Top 10 features using TF-IDF for each category .................................................... 21 Table 5: Class Precision (%) for features selected using Frequency Count .................. 29 Table 6: Class precision comparison. Feature Selection (bag-of-words) vs No Feature Selection ......................................................................................................................................... 29 Table 7: Class precision (%) for features selected using TF-IDF ranking ..................... 31 Table 8: Feature Selection (TF-IDF) vs No Feature Selection ............................................ 32 v Abstract Twitter scales 500 million tweets per day and has 316 million monthly active users. The majority of tweets are in the form of natural language. Using natural language makes it difficult to understand Twitter's data programmatically. In our research, we attempt to solve this challenge using various machine learning techniques. This thesis includes a new approach for classifying Twitter trends by adding a layer of feature selection and feature ranking. A variety of feature ranking algorithms, such as TF-IDF and bag-of-words, are used to facilitate the feature selection process. This helps in surfacing the important features, while reducing the feature space and making the classification process more efficient. Four Naïve Bayes text classifiers (one for each class), backed by these sophisticated feature ranking and feature selection techniques, are used to successfully categorize Twitter trends. Using the bag-of-words and TF-IDF rankings, our research provides an average class precision improvement, over the current methodologies, of 33.14% and 28.67% correspondingly. vi Chapter 1: Introduction In recent years, with the sudden increase in popularity of various social networks, the way we produce and consume information has changed dramatically. There is a massive amount of information flowing through these social networks. It has forced news organizations, journalists, marketing companies, business organizations, musicians, actors, bloggers, programmers and almost all businesses and communities to change their approach to branding, marketing and networking.
Recommended publications
  • Twitter, Myspace and Facebook Demystified - by Ted Janusz
    Twitter, MySpace and Facebook Demystified - by Ted Janusz Q: I hear people talking about Web sites like Twitter, MySpace and Facebook. What are they? And, even more importantly, should I be using them to promote oral implantology? First, you are not alone. A recent survey showed that 70 percent of American adults did not know enough about Twitter to even have an opinion. Tools like Twitter, Facebook and MySpace are components of something else you may have heard people talking about: Web 2.0 , a popular term for Internet applications in which the users are actively engaged in creating and distributing Web content. Web 1.0 probably consisted of the Web sites you saw back in the late 90s, which were nothing more than fancy electronic brochures. Web 1.5 would have been something like Amazon or eBay, sites on which one could buy, sell and leave reviews. What Web 3.0 will look like is anybody's guess! Let's look specifically at the three applications that you mentioned. Tweet, Tweet Twitter - "Twitter is like text messaging, only you can also do it from the Web," says Dan Tynan, the author of the Tynan on Technology blog. "Instead of sending a message to just one person, you can send it to thousands of people at once. You can choose to follow anyone's update (called "tweets") simply by clicking the Follow button on their profile, or vice-versa. The only rule is that each tweet can be no longer than 140 characters." Former CEO of Twitter Jack Dorsey once accepted an award for Twitter by saying, "We'd like to thank you in 140 characters or less.
    [Show full text]
  • TREND ANALYSIS on TWITTER Tejal Rathod1, Prof
    TREND ANALYSIS ON TWITTER Tejal Rathod1, Prof. Mehul Barot2 1Research Scholar, Computer Department, LDRP-ITR, Gandhinagar Kadi Sarva Vishvavidhyalaya 2Assistant Prof. Computer Engineering Department, LDRP-ITR, Gandhinagar Abstract Twitter has standard syntax which listed follow: Twitter is most popular social media that . User Mentions: when a user mentions allows its user to spread and share another user in their tweet, @-sign is placed information. It Monitors their user postings before the corresponding username. Like @ and detect most discussed topic of the Username movement. They publish these topics on the . Retweets: Re-share of a tweet posted by list called “Trending Topics”. It show what is another user is called retweet. i.e., a retweet happening in the world and what people's means the user considers that the message in opinions are about it. For that it uses top 10 the tweet might be of interest to others. trending topic list. Twitter Use Method which When a user retweets, the new tweet copies provides efficient way to immediately and the original one in it. accurately categorize trending topics without . Replies: when a user wants to direct to need of external data. We can use different another user, or reply to an earlier tweet, Techniques as well as many algorithms for they place the @username mention at the trending topics meaning disambiguation and beginning of the tweet, e.g., @username I classify that topic in different category. totally agree with you. Among many methods there is no standard . Hashtags: Hashtags included in a tweet tend method which gave accuracy so comparative to group tweets in conversations or analysis is needed to understand which one is represent the main terms of the tweet, better and gave quality of the detected topic.
    [Show full text]
  • GOOGLE+ PRO TIPS Google+ Allows You to Connect with Your Audience in a Plethora of New Ways
    GOOGLE+ PRO TIPS Google+ allows you to connect with your audience in a plethora of new ways. It combines features from some of the more popular platforms into one — supporting GIFs like Tumblr and, before Facebook jumped on board, supporting hashtags like Twitter — but Google+ truly shines with Hangouts and Events. As part of our Google Apps for Education agreement, NYU students can easily opt-in to Google+ using their NYU emails, creating a huge potential audience. KEEP YOUR POSTS TIDY While there isn’t a 140-character limit, keep posts brief, sharp, and to the point. Having more space than Twitter doesn’t mean you should post multiple paragraphs; use Twitter’s character limit as a good rule of thumb for how much your followers are used to consuming at a glance. When we have a lot to say, we create a Tumblr post, then share a link to it on Google+. REMOVE LINKS AFTER LINK PREVIEW Sharing a link on Google+ automatically creates a visually appealing link preview, which means you don’t need to keep the link in your text. There may be times when you don’t want a link preview, in which case you can remove the preview but keep the link. Whichever you prefer, don’t keep both. We keep link previews because they are more interactive than a simple text link. OTHER RESOURCES Google+ Training Video USE HASHTAgs — but be sTRATEGIC BY GROVO Hashtags can put you into already-existing conversations and make your Learn More BY GOOGLE posts easier to find by people following specific tags, such as incoming How Google+ Works freshmen following — or constantly searching — the #NYU tag.
    [Show full text]
  • Effectiveness of Dismantling Strategies on Moderated Vs. Unmoderated
    www.nature.com/scientificreports OPEN Efectiveness of dismantling strategies on moderated vs. unmoderated online social platforms Oriol Artime1*, Valeria d’Andrea1, Riccardo Gallotti1, Pier Luigi Sacco2,3,4 & Manlio De Domenico 1 Online social networks are the perfect test bed to better understand large-scale human behavior in interacting contexts. Although they are broadly used and studied, little is known about how their terms of service and posting rules afect the way users interact and information spreads. Acknowledging the relation between network connectivity and functionality, we compare the robustness of two diferent online social platforms, Twitter and Gab, with respect to banning, or dismantling, strategies based on the recursive censor of users characterized by social prominence (degree) or intensity of infammatory content (sentiment). We fnd that the moderated (Twitter) vs. unmoderated (Gab) character of the network is not a discriminating factor for intervention efectiveness. We fnd, however, that more complex strategies based upon the combination of topological and content features may be efective for network dismantling. Our results provide useful indications to design better strategies for countervailing the production and dissemination of anti- social content in online social platforms. Online social networks provide a rich laboratory for the analysis of large-scale social interaction and of their social efects1–4. Tey facilitate the inclusive engagement of new actors by removing most barriers to participate in content-sharing platforms characteristic of the pre-digital era5. For this reason, they can be regarded as a social arena for public debate and opinion formation, with potentially positive efects on individual and collective empowerment6.
    [Show full text]
  • What Is Gab? a Bastion of Free Speech Or an Alt-Right Echo Chamber?
    What is Gab? A Bastion of Free Speech or an Alt-Right Echo Chamber? Savvas Zannettou Barry Bradlyn Emiliano De Cristofaro Cyprus University of Technology Princeton Center for Theoretical Science University College London [email protected] [email protected] [email protected] Haewoon Kwak Michael Sirivianos Gianluca Stringhini Qatar Computing Research Institute Cyprus University of Technology University College London & Hamad Bin Khalifa University [email protected] [email protected] [email protected] Jeremy Blackburn University of Alabama at Birmingham [email protected] ABSTRACT ACM Reference Format: Over the past few years, a number of new “fringe” communities, Savvas Zannettou, Barry Bradlyn, Emiliano De Cristofaro, Haewoon Kwak, like 4chan or certain subreddits, have gained traction on the Web Michael Sirivianos, Gianluca Stringhini, and Jeremy Blackburn. 2018. What is Gab? A Bastion of Free Speech or an Alt-Right Echo Chamber?. In WWW at a rapid pace. However, more often than not, little is known about ’18 Companion: The 2018 Web Conference Companion, April 23–27, 2018, Lyon, how they evolve or what kind of activities they attract, despite France. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3184558. recent research has shown that they influence how false informa- 3191531 tion reaches mainstream communities. This motivates the need to monitor these communities and analyze their impact on the Web’s information ecosystem. 1 INTRODUCTION In August 2016, a new social network called Gab was created The Web’s information ecosystem is composed of multiple com- as an alternative to Twitter.
    [Show full text]
  • Marketing and Communications for Artists Boost Your Social Media Presence
    MARKETING AND COMMUNICATIONS FOR ARTISTS BOOST YOUR SOCIAL MEDIA PRESENCE QUESTIONS FOR KIANGA ELLIS: INTERNET & SOCIAL MEDIA In 2012, social media remains an evolving terrain in which artists and organizations must determine which platforms, levels of participation, and tracking methods are most effective and sustainable for their own needs. To inform this process, LMCC invited six artists and arts professionals effectively using social media to share their approaches, successes, and lessons learned. LOWER MANHATTAN CULTURAL COUNCIL (LMCC): Briefly describe your work as an artist and any other roles or affiliations you have as an arts professional. KIANGA ELLIS (KE): Following a legal career on Wall Street in derivatives and commodities sales and trading, I have spent the past few years as a consultant and producer of art projects. My expertise is in patron and audience development, business strategy and communications, with a special focus on social media and Internet marketing. I have worked with internationally recognized institutions such as The Museum of Modern Art, The Metropolitan Museum of Art, Sotheby’s, SITE Santa Fe, and numerous galleries and international art fairs. I am a published author and invited speaker on the topic of how the Internet is changing the business of art. In 2011, after several months meeting with artists in their studios and recording videos of our conversations, I began exhibiting and selling the work of emerging and international artists through Kianga Ellis Projects, an exhibition program that hosts conversations about the studio practice and work of invited contemporary artists. I launched the project in Santa Fe, New Mexico ⎯Kianga Ellis Projects is now located in an artist loft building in Bedford Stuyvesant, Brooklyn.
    [Show full text]
  • Social Media Events
    University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations 2018 Social Media Events Katerina Girginova University of Pennsylvania, [email protected] Follow this and additional works at: https://repository.upenn.edu/edissertations Part of the Communication Commons Recommended Citation Girginova, Katerina, "Social Media Events" (2018). Publicly Accessible Penn Dissertations. 3446. https://repository.upenn.edu/edissertations/3446 This paper is posted at ScholarlyCommons. https://repository.upenn.edu/edissertations/3446 For more information, please contact [email protected]. Social Media Events Abstract Audiences are at the heart of every media event. They provide legitimation, revenue and content and yet, very few studies systematically engage with their roles from a communication perspective. This dissertation strives to fill precisely this gap in knowledge by asking how do social media audiences participate in global events? What factors motivate and shape their participation? What cultural differences emerge in content creation and how can we use the perspectives of global audiences to better understand media events and vice versa? To answer these questions, this dissertation takes a social-constructivist perspective and a multiple-method case study approach rooted in discourse analysis. It explores the ways in which global audiences are imagined and invited to participate in media events. Furthermore, it investigates how and why audiences actually make use of that invitation via an analytical framework I elaborate called architectures of participation (O’Reilly, 2004). This dissertation inverts the predominant top-down scholarly gaze upon media events – a genre of perpetual social importance – to present a much needed bottom-up intervention in media events literature. It also provides a more nuanced understanding of what it means to be a member of ‘the audience’ in a social media age, and further advances Dayan and Katz’ (1992) foundational media events theory.
    [Show full text]
  • Here's Your Grab-And-Go One-Sheeter for All Things Twitter Ads
    Agency Checklist Here's your grab-and-go one-sheeter for all things Twitter Ads Use Twitter to elevate your next product or feature launch, and to connect with current events and conversations Identify your client’s specific goals and metrics, and contact our sales team to get personalized information on performance industry benchmarks Apply for an insertion order if you’re planning to spend over $5,000 (or local currency equivalent) Set up multi-user login to ensure you have all the right access to your clients’ ad accounts. We recommend clients to select “Account Administrator” and check the “Can compose Promotable Tweets” option for agencies to access all the right information Open your Twitter Ads account a few weeks before you need to run ads to allow time for approval, and review our ad policies for industry-specific rules and guidance Keep your Tweets clear and concise, with 1-2 hashtags if relevant and a strong CTA Add rich media, especially short videos (15 seconds or less with a sound-off strategy) whenever possible Consider investing in premium products (Twitter Amplify and Twitter Takeover) for greater impact Set up conversion tracking and mobile measurement partners (if applicable), and learn how to navigate the different tools in Twitter Ads Manager Review our targeting options, and choose which ones are right for you to reach your audience Understand the metrics and data available to you on analytics.twitter.com and through advanced measurement studies .
    [Show full text]
  • Twitter: Explained
    Twitter: Explained Twitter is a micro-blogging tool where users opt-in to receive and send extremely brief content -- or tweets -- with others. Or, in layman’s terms, it’s a way to share thoughts and ideas in 140 characters or fewer. Getting Started When you first register at twitter.com, you will choose a username (also known as your “Twitter handle”). The tweet The “tweet,” or messages used in Twitter, are limited to 140 characters. This creates a wonderful practice of being concise with the message you would like to convey. Interacting If you want to reference or “tweet at” another user, you simply use the @ symbol followed by the Twitter handle of the user. For example, @marcwais. Who to Follow? Twitter is not only a useful tool to connect with your friends and keep abreast of their daily activities. The true power of Twitter is in the viral dissemination of information. You can follow news organizations, celebrities (both movie stars and industry & thought leaders), and other organizations. You could follow the Chronicle of Higher Ed (@chronicle), NCAA News (@ncaa), and other NYU departments & personalities. Organizing your feed When you start to follow users who update a lot throughout the day, you might be overwhelmed with the sheer number of posts. You can organize the people you follow into lists and focus just on one group of users at a time. You can also use a tool like Tweetdeck, which is an application that allows you to group your feed into different columns (for example: friends, news, work, or whatever category you choose to create).
    [Show full text]
  • THE ESSENTIAL SOCIAL MEDIA RESOURCE GUIDE Everything You Ever Wanted to Know About Pinterest, Linkedin, Google+, Twitter, and Facebook
    THE ESSENTIAL SOCIAL MEDIA RESOURCE GUIDE Everything you ever wanted to know about Pinterest, LinkedIn, Google+, Twitter, and Facebook. And then a whole lot more. Copyright © 2016 | Act-On Software www.Act-On.com Your Guide to a Successful Social Media Program If you’re like most people, you’ve spent a good amount of time reading blog posts, downloading eBooks and sharing information about social media so you can get a deeper understanding of it. There’s a lot of information out there – in fact, it may seem as though every time you start your computer, there’s yet more information that you need to digest and understand. The problem is that there’s no simple, easy-to-understand guide that provides a roadmap for setting up, launching, and managing an effective social media program. Oh, sure, there’s plenty of information, but information is useless unless it’s laid out in a clear, action- oriented format. That’s why we created this guide. You’ll get tips on getting started, an overview of advertising opportunities, how to track and measure your social campaigns, insider tips, and common mistakes to avoid. You’ll learn everything you need to know about the most popular social media 11k platforms and more. In the end, our goal is quite simple – we want to provide you with the + tools and techniques you need to run a successful social media program, SHARE actionable information that you can implement right away, and a straightforward, easy-to-use reference guide you can use as you run and manage your social campaigns.
    [Show full text]
  • An Unprecedented Global Communications Campaign for the Event Horizon Telescope First Black Hole Image
    An Unprecedented Global Communications Campaign Best for the Event Horizon Telescope First Black Hole Image Practice Lars Lindberg Christensen Colin Hunter Eduardo Ros European Southern Observatory Perimeter Institute for Theoretical Physics Max-Planck Institute für Radioastronomie [email protected] [email protected] [email protected] Mislav Baloković Katharina Königstein Oana Sandu Center for Astrophysics | Harvard & Radboud University European Southern Observatory Smithsonian [email protected] [email protected] [email protected] Sarah Leach Calum Turner Mei-Yin Chou European Southern Observatory European Southern Observatory Academia Sinica Institute of Astronomy and [email protected] [email protected] Astrophysics [email protected] Nicolás Lira Megan Watzke Joint ALMA Observatory Center for Astrophysics | Harvard & Suanna Crowley [email protected] Smithsonian HeadFort Consulting, LLC [email protected] [email protected] Mariya Lyubenova European Southern Observatory Karin Zacher Peter Edmonds [email protected] Institut de Radioastronomie de Millimétrique Center for Astrophysics | Harvard & [email protected] Smithsonian Satoki Matsushita [email protected] Academia Sinica Institute of Astronomy and Astrophysics Valeria Foncea [email protected] Joint ALMA Observatory [email protected] Harriet Parsons East Asian Observatory Masaaki Hiramatsu [email protected] Keywords National Astronomical Observatory of Japan Event Horizon Telescope, media relations, [email protected] black holes An unprecedented coordinated campaign for the promotion and dissemination of the first black hole image obtained by the Event Horizon Telescope (EHT) collaboration was prepared in a period spanning more than six months prior to the publication of this result on 10 April 2019.
    [Show full text]
  • The Fringe Insurgency Connectivity, Convergence and Mainstreaming of the Extreme Right
    The Fringe Insurgency Connectivity, Convergence and Mainstreaming of the Extreme Right Jacob Davey Julia Ebner About this paper About the authors This report maps the ecosystem of the burgeoning Jacob Davey is a Researcher and Project Coordinator at ‘new’ extreme right across Europe and the US, which is the Institute for Strategic Dialogue (ISD), overseeing the characterised by its international outlook, technological development and delivery of a range of online counter- sophistication, and overtures to groups outside of the extremism initiatives. His research interests include the traditional recruitment pool for the extreme-right. This role of communications technologies in intercommunal movement is marked by its opportunistic pragmatism, conflict, the use of internet culture in information seeing movements which hold seemingly contradictory operations, and the extreme-right globally. He has ideologies share a bed for the sake of achieving provided commentary on the extreme right in a range common goals. It examines points of connectivity of media sources including The Guardian, The New York and collaboration between disparate groups and Times and the BBC. assesses the interplay between different extreme-right movements, key influencers and subcultures both Julia Ebner is a Research Fellow at the Institute for online and offline. Strategic Dialogue (ISD) and author of The Rage: The Vicious Circle of Islamist and Far-Right Extremism. Her research focuses on extreme right-wing mobilisation strategies, cumulative extremism and European terrorism prevention initiatives. She advises policy makers and tech industry leaders, regularly writes for The Guardian and The Independent and provides commentary on broadcast media, including the BBC and CNN. © ISD, 2017 London Washington DC Beirut Toronto This material is offered free of charge for personal and non-commercial use, provided the source is acknowledged.
    [Show full text]