A Thesis Submitted for the Degree of PhD at the University of Warwick

Permanent WRAP URL: http://wrap.warwick.ac.uk/130202

Copyright and reuse: This thesis is made available online and is protected by original copyright. Please refer to the repository record for this item for information to help you to cite it. Our policy information is available from the repository home page.

For more information, please contact the WRAP Team.
warwick.ac.uk/lib-publications

Exploring Happiness Indicators In Cities and Industrial Sectors Using Twitter and Urban GIS Data

by Neha Gupta

Thesis Submitted to the University of Warwick in partial fulfilment of the requirements for admission to the degree of Doctor of Philosophy

Supervised by: Prof. Stephen Jarvis and Dr. Weisi Guo
Department of Computer Science
September 2018

Contents

List of Tables
List of Figures
Acknowledgments
Declarations
Abstract
Acronyms

Chapter 1 Introduction
  1.1 Motivation - Social Media Data to Study Sentiment
  1.2 Summary of Research Questions
  1.3 Thesis Contribution
  1.4 Conclusion and Thesis Outline

Chapter 2 Literature Review
  2.1 Overview of Social Media Analytics
    2.1.1 Application In Businesses
    2.1.2 Application In Government
    2.1.3 Twitter and GIS data
  2.2 Twitter Sentiment and Urban Spatial Analysis
    2.2.1 Sentiment Classification - Lexicon Based
    2.2.2 Sentiment Classification - Machine Learning Methods
    2.2.3 Analysing Spatial Patterns of Tweets
  2.3 Exploring urban sentiment
    2.3.1 Tweet Sentiment - A Proxy of Happiness
    2.3.2 Linking Twitter Happiness to Urban Demographic Features
  2.4 Twitter Sentiment for Industrial Landscape
    2.4.1 Dynamics of Twitter Usage - Who Contributes to Twitter
    2.4.2 Perceiving Happiness Indicators in Industry
  2.5 Twitter Sentiment during Political Events
    2.5.1 Twitter Sentiment during Brexit
    2.5.2 Comparing Surveys and Social Media Analytics

Chapter 3 Understanding Happiness in Cities using Twitter: Jobs, Children, and Transport
  3.1 Introduction
  3.2 Methods
    3.2.1 The Data
    3.2.2 Sentiment Labelling using Lexicons
    3.2.3 Metrics for Comparison
  3.3 Results
    3.3.1 Baseline Sentiment Data
    3.3.2 Employment Opportunities
    3.3.3 Number of Children
    3.3.4 Accessibility to Public Transport
    3.3.5 Linear Regression of Sentiment vs. Ward Level Parameters
  3.4 Discussion
  3.5 Conclusion

Chapter 4 Exploring Working Hours and Sentiment in Industry Using Twitter and UK Property Data
  4.1 Introduction
    4.1.1 Describing the Study Hypothesis
  4.2 Methods
    4.2.1 Datasets Description
    4.2.2 Creating Spatial Property Map for London
    4.2.3 Extracting Commercial (only) Properties Spatial Layer
    4.2.4 Sentiment Analysis using Machine Learning
    4.2.5 Spatially Filter Sentiment Labelled Tweets
    4.2.6 Extracting Temporal Variables from Tweet Timestamps
  4.3 Results
    4.3.1 Tweets Volume Comparison - 2012 and 2016 Data
    4.3.2 Tweets Volume Comparison to Proportion of People Working in the Business Sector
    4.3.3 Tweeting Activity During the Week
    4.3.4 Heatmaps of Tweeting Activity Across Sectors
    4.3.5 Sentiment across SIC Categories
    4.3.6 Tweet Hourly Trends within SIC Categories
    4.3.7 Impact of Working Hours (using tweeting intensities) on Tweet Sentiment
  4.4 Discussion
  4.5 Conclusions

Chapter 5 Exploring Twitter Strengths and Limitations to Detect Event Related Sentiment in Industry - A Brexit Case Study
  5.1 Introduction
  5.2 Data and Methodology
    5.2.1 Sourcing Twitter data
    5.2.2 Filtering Tweets in Industrial Sectors using Land Registry and INSPIRE Polygon Data
    5.2.3 Screening Tweets to Identify Brexit Keywords and Hashes
  5.3 Results and Discussion
    5.3.1 Industrial Participation in Brexit Conversation
    5.3.2 Brexit Related Tweet Sentiment in Selected Industry
    5.3.3 Discussing Limitations and Strengths - Twitter Data Linked to Industries
  5.4 Conclusion

Chapter 6 Discussion and Conclusion
  6.1 Thesis Summary
  6.2 Discussion
    6.2.1 Research results and urban science
  6.3 Study Limitations
  6.4 Ideas for Future Research
  6.5 General Conclusion

List of Tables

2.1 Twitter Analysis Off-the-shelf Tools
2.2 Classification Methods for Sentiment Analysis of Tweets
4.1 Number of tweets filtered for analysis
4.2 Parameters extracted from a tweet's timestamp for each industry SIC code type
5.1 Tweets Count in Industries during Brexit time window

List of Figures

2.1 Tweet JSON, STT features of Tweet
2.2 GIS modelling of real world
2.3 Lexicon-based sentiment analysis
2.4 Machine Learning Techniques
2.5 Spatial Analysis using QGIS
2.6 Tweet Sentiment to study happiness
2.7 Twitter Usage and variability of its content
3.1 Mapping the Sentiment in London: (a) 0.4 million geo-tagged Tweets in Greater London over a 2-week period. (b) Tweets labelled as negative (red triangle), positive (green diamond), or neutral (pale circle) on a scale of 11. (c) Ward level sentiment where dark red indicates negative sentiment and dark blue indicates positive sentiment.
3.2 Distribution of 2012 Twitter data
3.3 Sentiment Data Analysis: People who tweet more also express stronger aggregate sentiments, but on average express a lower sentiment per tweet.
3.4 Relating Average Sentiment per Person to Job Opportunities in London: (a) The number of jobs available in a ward is positively correlated with the sentiment in the ward (adjusted R² = 0.45). (b) The number of job opportunities (jobs normalised against working population) in a ward is positively correlated with the sentiment in the ward (adjusted R² = 0.47).
3.5 Relating Avg. Sentiment per Person to Number of Children and Access to Public Transport in London: (a) The percentage of the population that are children in a ward is negatively correlated with the sentiment in the ward (adjusted R² = 0.33). (b) The accessibility to public transport in a ward has a parabolic relationship with the sentiment in the ward (adjusted R² = 0.44), such that those with good access to public transport are happy and those who are in areas with poor public transport are also happy (rely on personal transport), whilst those that are in between are generally less happy.
3.6 Public Transport Access vs. Number of Private Vehicles: Those with poor public transport access levels (PTALs) own up to 4x more private vehicles per household, and the PTALs explain 71% of the variance in car ownership numbers.
3.7 Linear Regression Matrix of Sentiment vs. Ward Level Socioeconomic and Infrastructure Metrics. Sentiment correlations are boxed.
4.1 INSPIRE Polygon in pink - Blue dots as Tweeting activity
4.2 INSPIRE Polygon Map for London Properties
4.3 Methodology: (1) Link diverse sources of datasets to create a commercial properties polygon layer for Greater London. (2) Conduct a spatial join between tweets and commercial property polygons
4.4 Clipping Tweets in Commercial Property Polygons
4.5 Extracting regression parameters from aggregated tweet timestamps for each SIC code type
4.6 Two weeks Tweets Volume Comparison
4.7 Tweets Percentage compared to percentage of Jobs in each sector - 2012
4.8 Percentage distribution of tweets during different days of the week - 2016
4.9 Heat-maps of Tweeting density in London's Industrial Sectors - 2012
4.10 Heat-maps of Tweeting density in London's Industrial Sectors - 2016
4.11 Sentiment in 2012
4.12 Sentiment in 2016
4.13 Hourly normalised moving average tweet count within industry sectors
4.14 Multivariate regression p-values for the variables 'duration' and 'start time'
5.1 Tweeting density in different industries during Brexit
5.2 Brexit Hashtags, Accounts and Keywords
5.3 Tweet types during Brexit time window
5.4 Tweet sentiment during Brexit time window
5.5 SWOT analysis of Twitter and Survey Methods

Acknowledgments

Thank you Almighty for granting me an opportunity to fulfil my aspiration and blessing me with everything needed to pursue this dream. Thank you to my spiritual gurus, Mother and Sri Aurobindo, for their teachings and Grace.

Emotionally, I dedicate this Thesis to my father, Prof. (Dr.) Kaushal Kumar, who taught me to aspire high and preserve a positivity for life in all circumstances. Affectionately, I am indebted to my brother, Major Shubham Agarwal, for sharing those live words in a letter years ago, "A Brave Sister of an Army Officer...", which have kept me inspired and motivated. I hope this will make them both proud from wherever they are watching.

My special and heartfelt thanks to my supervisor, Professor Stephen Jarvis, who always showed trust in me and has been my mentor in this academic journey. It is his ever-encouraging support that has brought this work towards completion. He has truly been a 'Guru', showing me light on the darkest nights by offering both the professional and emotional support this arduous journey requires.