Analyzing Spread of Influence in Social Networks for Transportation Applications, 09/02/2016
STATE OF CALIFORNIA • DEPARTMENT OF TRANSPORTATION
TECHNICAL REPORT DOCUMENTATION PAGE
TR0003 (REV 10/98)

ADA Notice: For individuals with sensory disabilities, this document is available in alternate formats. For alternate format information, contact the Forms Management Unit at (916) 445-1233, TTY 711, or write to Records and Forms Management, 1120 N Street, MS-89, Sacramento, CA 95814.

1. Report Number: CA16-2875
2. Government Association Number:
3. Recipient's Catalog Number:
4. Title and Subtitle: Analyzing Spread of Influence in Social Networks for Transportation Applications
5. Report Date: 09/02/2016
6. Performing Organization Code:
7. Author: Lourdes V. Abellera
8. Performing Organization Report No.:
9. Performing Organization Name and Address: Civil Engineering Department, California State Polytechnic University, Pomona, 3801 West Temple Avenue, Pomona, CA 91768
10. Work Unit Number: 3762
11. Contract or Grant Number: 65A0529 TO 029
12. Sponsoring Agency and Address: University of California Center on Economic Competitiveness in Transportation (UCCONNECT), UC Berkeley, 2150 Shattuck Avenue, Suite 300, Berkeley, CA 94704-5940
13. Type of Report and Period Covered: Final Report, May 1, 2015 through April 1, 2016
14. Sponsoring Agency Code: Caltrans, DRISI
15. Supplementary Notes: Using Twitter data, a research tool was developed for generating a list of potentially influential individuals and/or organizations for particular transportation-related topics by counting the number of mentions of a specific Twitter user and the number of retweets of a particular tweet. Their locations are displayed on Google Maps. To date, this is the only study of influence that is transportation-related. The researchers believe that this tool will advance the state of the practice.
16. Abstract: This project analyzed the spread of influence in social media, in particular the Twitter social media site, and identified the individuals who exert the most influence on those they interact with.
There are published studies that use social media to assess public perception and sentiment regarding public transit. The project had two goals: first, to identify the online sources of opinions expressed in social media networks; and second, to identify a set of social network users who collectively have the largest spread of influence on the network. The sequence of methods used to accomplish these goals includes: 1) extraction of transportation-related social media messages; 2) extraction of spatial relations between social media users; 3) parameter identification of influence spread models; 4) influence maximization for transportation topics; 5) GIS visualization; and 6) datasets. After the methods were developed, the analysis produced by our approach was evaluated for correctness. We randomly selected subsets of the test data for manual annotation. Specifically, investigators manually determined whether these test messages were relevant to one of the predetermined transportation topics and whether they had been so identified by the topic modeling step. The deliverables for this project are: a) analysis of online social media communications to identify statements related to transportation; b) discovery of the originators of frequently transmitted messages on transportation; c) a set of the most influential social media users for specific topics (the number of such users can be varied by the planner); and d) output of results in a format that can be ingested by industry-standard GIS tools.
17. Key Words: UCCONNECT, UC Berkeley, Twitter, social media, GIS visualization, datasets, transportation-related messages, influence maximization, communications, transmitted messages, transportation planner, Cal Poly, Pomona, Caltrans, DRISI, data analysis
18. Distribution Statement: The readers can freely refer to and distribute this report. If there are any questions, please contact one of the authors.
19. Security Classification (of this report): No security issues
20. Number of Pages: 27
21. Cost of Report Charged: Free, for E-copy
Reproduction of completed page authorized.

DISCLAIMER STATEMENT
This document is disseminated in the interest of information exchange. The contents of this report reflect the views of the authors, who are responsible for the facts and accuracy of the data presented herein. The contents do not necessarily reflect the official views or policies of the State of California or the Federal Highway Administration. This publication does not constitute a standard, specification, or regulation. This report does not constitute an endorsement by the Department of any product described herein.
For individuals with sensory disabilities, this document is available in alternate formats. For information, call (916) 654-8899, TTY 711, or write to California Department of Transportation, Division of Research, Innovation and System Information, MS-83, P.O. Box 942873, Sacramento, CA 94273-0001.

Analyzing Spread of Influence in Social Networks for Transportation Applications
Final Report
UCCONNECT 2015-2016 - TO 029

Lourdes V. Abellera, PhD
Assistant Professor
California State Polytechnic University, Pomona

Anand Panangadan, PhD
Assistant Professor
California State University, Fullerton

Table of Contents
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENTS
DISCLAIMER STATEMENT
ABSTRACT
1. INTRODUCTION
1.1 Problem Statement
1.2 Relevance
1.3 Social Media and Influence
1.4 Measures of Influence in Twitter
3.1 Strengths and Limitations
3.2 An Example: California High Speed Rail
3.3 Recommendations and Policy Implications
3.4 Other Work
4. CONCLUSIONS AND FUTURE WORK
5. REFERENCES
APPENDIX: Poster Presentations of Related Work

LIST OF FIGURES
Figure 1: Available topics in the web app
Figure 2: Result of searching for mentions for the topic “High speed rail”
Figure 3: The tweets mentioning the tweets of Washington Examiner, the entity with the highest number of mentions
Figure 4: Result of searching for retweets for the topic “High speed rail”
Figure 5: The tweet by Scott Walker that was retweeted 68 times

LIST OF TABLES
Table 1: Keywords used to capture the live tweets from Twitter
Table 2: Search results of using mention as the measure of influence
Table 3: Search results of using retweet as the measure of influence

ACKNOWLEDGMENTS
The California Department of Transportation (Caltrans) provided financial support for this project through the UCCONNECT (University of California Center on Economic Competitiveness in Transportation) Faculty Research Grant for FY 2015-16. We are grateful for the assistance of Madonna Camel, the UCCONNECT Program Manager, and Christine Azevedo, the Caltrans Contract Manager.
The following are the main student contributors to this work:
Pritesh Pimpale, California State University, Fullerton
Nanwarin Chantarutai, California State Polytechnic University, Pomona
Diana Lin, California State Polytechnic University, Pomona

ABSTRACT
Using Twitter data, we developed a tool for generating a list of potentially influential individuals and/or organizations for particular transportation-related topics by counting the number of mentions of a specific Twitter user and the number of retweets of a particular tweet. Their locations are displayed on Google Maps. Although papers in the current literature propose different measures of influence using both contrived and real data, mentions and retweets are the most reliable measures of influence. To date, ours is the only study of influence that is transportation-related. We believe our tool will advance the state of the practice.
We describe many purposes that our tool can serve, including limiting misinformation and encouraging acceptance of a new transportation product or service.

1. INTRODUCTION
1.1 Problem Statement
In Southern California, public transit is an unpopular mode of transportation, for many reasons. Communities and business districts are so spread out that one person might live in Los Angeles and work in Claremont. In this situation it is generally impractical to ride the train or the bus, unless, for example, Metrolink offers a convenient alternative. People generally believe that public transportation is slow, unreliable, unsafe, and unclean. Some of these beliefs have a basis in fact, and some do not; for example, reliability depends on the particular route, and safety depends on the area surrounding a station. Most people who do not usually ride public transportation will never experience its benefits if they hear about the negative experiences and views of other riders.
Consider the following situation. A rider has an unpleasant experience: for instance, she misses her flight because the Los Angeles International Airport (LAX) FlyAway shuttle is late. The disgruntled passenger, especially if young, might tell her story on Twitter or Facebook, or write about the experience on her blog. Her followers will take note of this experience, which, depending on how much influence she has on them, will affect their decisions about whether to take the LAX FlyAway shuttle on their next trip to the airport.
In this project, we analyzed the concept of influence in social media, in particular on the Twitter social media site, and identified the individuals who exert the most influence on those they interact with, using two measures of influence: the Twitter mention and the retweet. There are several studies that use social media to assess