Harvesting and Visualizing Twitter Data
GIS Workshop Series, Spring 2016
Data Services, Mason Libraries

Goals
“Do you want to explore data generated from social media activities by a topic?”
• Introduce various tools to collect Twitter data
• Collect geotagged Twitter data for a certain topic
• CartoDB: mine tweets for geographic data and visualize them (a cloud-based GIS and web mapping platform)

Social Media Content
• Real-time (live) streaming
• Subject matter of interest
• Actionable information
• Historical archives
• Geospatial data available through social media

Why Twitter Data?
• Openly and easily accessible (about 1% of tweet data is available to the public through the Twitter APIs)
• Characteristics of tweets: 140 characters of text, #hashtags (a specific topic), @mentions, retweets (RT), geotagged information
• Emerging as data for academic research, both spatial and temporal (http://geosocial.gmu.edu)

Ways to Collect Geotagged Twitter Data
• Collect tweets without programming (NVivo NCapture/CartoDB)
• Collect tweets with the Streaming APIs (https://dev.twitter.com/streaming/public) plus a scripting language (Ruby, Python, JavaScript, PHP)
• Scrape with Python and the Twitter APIs (in-house library tool)
  --Capture live streaming data
  --Extract by a keyword, a geographic boundary (a specific location), geotagged information, and more
  --Results are saved as CSV

CartoDB Twitter Maps (new service)
• Search tweets directly and get all geotagged tweets on a specific topic without using the Twitter APIs or programming
• Download tweets from the last 30 days
• Perform geospatial analysis
• Visualize the data on a map, then share or publish
“One-stop shop via CartoDB”

Collecting Data/Visualization
• Exercise: CartoDB Twitter Maps (log in with your Google account or with your own CartoDB account). See example: http://www.portland-communications.com/2014/08/indyref-scottish-independence-debate-on-twitter/
• CartoDB Editor: SQL statements for queries (“PostGIS”)
• Visualization style (Simple, Density, Cluster, etc.)
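
The scripted collection route listed under "Ways to Collect Geotagged Twitter Data" (extract by keyword, by geographic boundary, and by geotag, then save as CSV) can be sketched in Python roughly as follows. This is a minimal illustration, not the in-house library tool. It assumes each tweet is already a dictionary shaped like the Twitter API JSON, with a "text" field and a GeoJSON "coordinates" field in longitude, latitude order. The function names, the sample tweets, and the bounding box are all hypothetical.

```python
import csv
import io

FIELDS = ["id", "created_at", "longitude", "latitude", "text"]


def filter_geotagged(tweets, keyword, bbox):
    """Yield one row per geotagged tweet that mentions keyword inside bbox.

    bbox is (min_lon, min_lat, max_lon, max_lat) in decimal degrees.
    Assumes the tweet dictionaries follow the Twitter API JSON layout.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    for tweet in tweets:
        point = tweet.get("coordinates")
        if not point:
            continue  # no geotag on this tweet
        lon, lat = point["coordinates"]  # GeoJSON order: longitude, latitude
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue  # outside the geographic boundary
        text = tweet.get("text", "")
        if keyword.lower() not in text.lower():
            continue  # keyword not mentioned (case-insensitive match)
        yield {
            "id": tweet.get("id_str", ""),
            "created_at": tweet.get("created_at", ""),
            "longitude": lon,
            "latitude": lat,
            "text": text,
        }


def write_csv(rows, out):
    """Write rows to the open text file `out`; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    written = 0
    for row in rows:
        writer.writerow(row)
        written += 1
    return written


# Hypothetical sample data standing in for a live stream.
SAMPLE_TWEETS = [
    {"id_str": "1", "text": "Messi scores again!",
     "coordinates": {"type": "Point", "coordinates": [2.12, 41.38]}},
    {"id_str": "2", "text": "Ronaldo hat-trick",
     "coordinates": {"type": "Point", "coordinates": [-3.69, 40.45]}},
    {"id_str": "3", "text": "messi is unreal", "coordinates": None},
    {"id_str": "4", "text": "Watching MESSI from afar",
     "coordinates": {"type": "Point", "coordinates": [-77.3, 38.83]}},
]
IBERIA_BBOX = (-10.0, 35.0, 5.0, 44.0)  # rough, illustrative boundary

if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv(filter_geotagged(SAMPLE_TWEETS, "messi", IBERIA_BBOX), buffer)
    print(buffer.getvalue())
```

A CSV written this way carries explicit longitude and latitude columns, which is the form a mapping tool such as CartoDB needs to place tweets as points.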
CartoDB: Basic Exercises
1) Connect to Twitter
2) Harvest your search
3) Download (various formats, including shapefile)
4) Edit/Visualize
  --Create new columns (latitude and longitude)
  --Create rows/Remove rows
  --Remove duplicates
  --Create a subset of the data (based on SQL queries)
  --Add another layer (polygon) and perform a spatial join (points falling in polygons)

Simple SQL Statements
    SELECT some_column(s)
    FROM some_data-source(s)
    WHERE some_condition(s);
• some_column(s): the names of the columns you want to select
• See more: http://workshops.boundlessgeo.com/postgis-intro/simple_sql.html

Exercise Data
• Download Twitter data for the exercise
  --Countries.csv
  --Twitter_messi
  --Twitter_ronaldo
  --countries.shp

Exercise SQL Statements
Example: Update the values in the “ronaldo” column with the number of tweets mentioning his name per country.
    UPDATE countries
    SET ronaldo = (
      SELECT count(m.cartodb_id)
      FROM twitter_ronaldo AS m, countries AS c
      WHERE ST_Intersects(m.the_geom, c.the_geom)
        AND countries.the_geom = c.the_geom
    );

Another SQL Example
Example: Update the “mostplayer” column with either “Messi” or “Ronaldo” by comparing the numbers in the “messi” and “ronaldo” columns.
    UPDATE countries
    SET mostplayer = CASE
      WHEN messi > ronaldo THEN 'Messi'
      WHEN ronaldo > messi THEN 'Ronaldo'
      ELSE 'N/A'
    END;

Another SQL Example
Example: Select the number of “Messi” tweets in each country, ordered from highest to lowest (spatial query).
    SELECT c.name, count(m.cartodb_id)
    FROM twitter_messi AS m, countries AS c
    WHERE ST_Intersects(m.the_geom, c.the_geom)
    GROUP BY c.name
    ORDER BY count DESC;

See More
• Backtweets.com
• Library tutorial: “Steps to Collect Twitter Data” http://infoguides.gmu.edu/gis/tutorials
• PostGIS SQL statements: http://workshops.boundlessgeo.com/postgis-intro/
• http://www.qsrinternational.com/support/faqs/what-is-ncapture
• https://www.youtube.com/channel/UCOjxVdT7wKbHKA5PWvFsW3g (Tips for GIS by Ahmad A.)
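
For readers who want to see what the exercise SQL computes, the per-country tweet count and the “mostplayer” comparison can be sketched outside CartoDB in plain Python. This is an illustration under stated assumptions, not how PostGIS implements ST_Intersects. It uses a simple ray-casting point-in-polygon test, treats each country as a single ring of (longitude, latitude) vertices, and may classify points lying exactly on a border differently from PostGIS. The function names, the three square "countries", and the tweet points are hypothetical.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the polygon ring.

    ring is a list of (lon, lat) vertices; the closing edge is implied.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def count_per_country(points, countries):
    """Count the points falling in each country polygon (the spatial join)."""
    return {
        name: sum(point_in_polygon(lon, lat, ring) for lon, lat in points)
        for name, ring in countries.items()
    }


def most_player(messi, ronaldo):
    """Mirror of the CASE expression that fills the mostplayer column."""
    if messi > ronaldo:
        return "Messi"
    if ronaldo > messi:
        return "Ronaldo"
    return "N/A"


# Hypothetical data: three square "countries" and a few tweet locations.
COUNTRIES = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
    "C": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
MESSI_POINTS = [(1, 1), (2, 2), (15, 5)]
RONALDO_POINTS = [(3, 3), (12, 4), (13, 6)]

if __name__ == "__main__":
    messi = count_per_country(MESSI_POINTS, COUNTRIES)
    ronaldo = count_per_country(RONALDO_POINTS, COUNTRIES)
    for name in COUNTRIES:
        print(name, messi[name], ronaldo[name],
              most_player(messi[name], ronaldo[name]))
```

In the workshop itself the database does this work: ST_Intersects plays the role of point_in_polygon, and the GROUP BY or correlated subquery plays the role of count_per_country.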
Qs and Contact
Joy Suh, Geospatial Resources Librarian ([email protected])
Ahmad Aburizaiza, GRA for GIS ([email protected])