Time to Check the Tweets: Harnessing Twitter As a Location-Indexed Source of Big Geodata

Total Page:16

File Type:pdf, Size:1020Kb

Time to Check the Tweets: Harnessing Twitter As a Location-Indexed Source of Big Geodata Time to Check the Tweets: Harnessing Twitter as a Location-indexed Source of Big Geodata Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Arts in the Graduate School of The Ohio State University By Kiril Vatev, B.S. Graduate Program in Geography The Ohio State University 2013 Thesis Committee: Karl Ola Ahlqvist, Advisor Ningchuan Xiao Copyright by Kiril Vatev 2013 Abstract Twitter, a popular online social network based on the microblogging format, is an important and significant source of both locational and archival data. It lends to a wide range of topics, from predicting the results of political elections, analysis of natural disasters, and tracking of disease epidemics, to studying political movements and protests, notably in cases of protest organization in Moldova and Iran, and recent events in the Arab Spring. However, many obstacles and barriers to entry are posed by Twitter’s system. Twitter has two main ways of obtaining data with different API requirements, and both returning different formats of data. System-wide authentication further complicates the data collection process. This research is based in creating an open-source software package to overcome the difficulties and barriers to obtaining a Twitter dataset, for both programming aficionados and those only interested in the data. The goals of the software are to ease the collection, storage, and future access to those interesting and important datasets. The resulting software package provides ways to drill down into the data through advanced queries not available from Twitter, and retains the data for extended analysis and reference over time. The software provides a friendly user interface, as well as plug-in capabilities for programmers. ii Vita 2011 ........................................... Bachelor’s of Science in Geography, specialization in GIS, The Ohio State University February 2012 ............................ “Strategies in Implementation of ArcGIS Server as a Data Provider for Online Geogames.” Association of American Geographers Annual Meeting. New York, NY. April 2012 ................................... “Green Revolution Geogame Demo.” Innovate OSU, Steal My Idea Showcase. Columbus, OH. July 2012 .................................... “Responsive Map-app: From Design to Implementation.” Esri International User Conference. San Diego, CA. June 2012 to August 2012 ......... Esri: Applications Prototype Lab Intern May 2011 to present .................. Research Assistant and Software Developer – The GeoGame Project, The Ohio State University: Department of Geography Fields of Study Major Field: Geography iii Table of Contents Abstract .......................................................................................................................................... ii Vita ................................................................................................................................................. iii List of Tables .................................................................................................................................. v List of Figures ................................................................................................................................. vi Introduction ................................................................................................................................... 1 Background .................................................................................................................................... 3 Understanding Twitter ...................................................................................................... 4 Sample previous work ....................................................................................................... 7 Geocoding ....................................................................................................................... 11 Software Architecture .................................................................................................................. 13 Data format ..................................................................................................................... 13 The development environment ...................................................................................... 14 The database ................................................................................................................... 17 Geocoding service ........................................................................................................... 19 Other considerations ...................................................................................................... 20 Software Implementation ............................................................................................................ 22 Sample Dataset and Testing ......................................................................................................... 25 Discussion ..................................................................................................................................... 30 Conclusion and Distribution ......................................................................................................... 33 References ................................................................................................................................... 34 Appendix A: The Manual .............................................................................................................. 38 iv List of Tables Table 1: Supervised location categories ...................................................................................... 26 Table 2: Supervised location counts ............................................................................................ 27 Table 3: Unsupervised location categories .................................................................................. 28 Table 4: Unsupervised location counts ........................................................................................ 28 Table 5: Classification problem areas .......................................................................................... 29 v List of Figures Figure 1: Software architecture .................................................................................................... 21 Figure 2: The task creation form .................................................................................................. 43 Figure 3: Task data page .............................................................................................................. 45 vi Introduction Locational data is no longer confined to the ubiquitous shape file (.shp), and today can be gathered from countless sources and data services: RSS feeds, social media, and Open 311 feeds to name a few. With over one billion GPS-equipped smartphones in use, a number predicted to double in the next three years (Strategy Analytics 2012), and advancements in Internet infrastructure, accurately placed geographic data can be produced more easily than ever before. However, putting this data to full use in geographic research is still problematic for a number of reasons. The data can come in any format that the data service deems appropriate, with any number of barriers to entry, and are almost always inaccessible to conventional geographic information systems (GIS), such as ArcMap. To further complicate data collection, location is often not standardized within a single data service – let alone across the various services – and in itself presents a challenge to data collection, often resulting in the necessary discarding of large amounts of data – at times as high as 99%, as I have found in my preliminary work. Twitter, in particular, is a major source of interesting and relevant geographic information. Twitter is a rapidly growing social network site created in 2006. It is currently the fourth largest social network globally (Global Wed Index 2013), and the largest with an open access. Twitter adopts – and arguably created – the microblogging format. The network is based entirely on users sharing 140 character messages, called tweets, openly with the public (Gonzalez et al. 2011). These messages can spread quickly through the general public, and as such, Twitter is often regarded as “electronic word of mouth,” (Tumasjan et al. 2011), consisting of small bits of information that are important to and reflective of the opinions of the author. With the wide array of data Twitter can provide, and in its wide array of formats, Gao et al. (2011) argue that there is no standardized system to collect such data, and not even a comprehensible one. Past efforts to do so in academia have been ad-hoc, and often the most 1 problematic part of research. However, data can be used on its own, or as a complement to other sources, to answer many questions raised by geographers. The first part of my thesis will focus on defining and examining the uses and limitations of Twitter as they exist today, while the second will be to develop a software package suitable for research purposes. My research collects and assembles the latest techniques in geographic data collection and location extrapolation and geocoding, and combines them in a standardized tool for collection and distribution of geographic data. Importance will be placed not only on the functionality and optimization of the software, but also on the user experience and software friendliness. The resulting software package,
Recommended publications
  • Metadata for Semantic and Social Applications
    etadata is a key aspect of our evolving infrastructure for information management, social computing, and scientific collaboration. DC-2008M will focus on metadata challenges, solutions, and innovation in initiatives and activities underlying semantic and social applications. Metadata is part of the fabric of social computing, which includes the use of wikis, blogs, and tagging for collaboration and participation. Metadata also underlies the development of semantic applications, and the Semantic Web — the representation and integration of multimedia knowledge structures on the basis of semantic models. These two trends flow together in applications such as Wikipedia, where authors collectively create structured information that can be extracted and used to enhance access to and use of information sources. Recent discussion has focused on how existing bibliographic standards can be expressed as Semantic Metadata for Web vocabularies to facilitate the ingration of library and cultural heritage data with other types of data. Harnessing the efforts of content providers and end-users to link, tag, edit, and describe their Semantic and information in interoperable ways (”participatory metadata”) is a key step towards providing knowledge environments that are scalable, self-correcting, and evolvable. Social Applications DC-2008 will explore conceptual and practical issues in the development and deployment of semantic and social applications to meet the needs of specific communities of practice. Edited by Jane Greenberg and Wolfgang Klas DC-2008
    [Show full text]
  • Marketing Cloud Published: August 12, 2021
    Marketing Cloud Published: August 12, 2021 The following are notices required by licensors related to distributed components (mobile applications, desktop applications, or other offline components) applicable to the services branded as ExactTarget or Salesforce Marketing Cloud, but excluding those services currently branded as “Radian6,” “Buddy Media,” “Social.com,” “Social Studio,”“iGoDigital,” “Predictive Intelligence,” “Predictive Email,” “Predictive Web,” “Web & Mobile Analytics,” “Web Personalization,” or successor branding, (the “ET Services”), which are provided by salesforce.com, inc. or its affiliate ExactTarget, Inc. (“salesforce.com”): @formatjs/intl-pluralrules Copyright (c) 2019 FormatJS Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    [Show full text]
  • Using RSS to Spread Your Blog
    10_588486 ch04.qxd 3/4/05 11:33 AM Page 67 Chapter 4 Using RSS to Spread Your Blog In This Chapter ᮣ Understanding just what a blog is ᮣ Creating the blog and the feed ᮣ Using your RSS reader ᮣ Creating a blog using HTML ᮣ Maintaining your blog ᮣ Publicizing your blog with RSS hat’s a blog, after all? Blog, short for Web log, is just a Web site with Wa series of dated entries, with the most recent entry on top. Those entries can contain any type of content you want. Blogs typically include links to other sites and online articles with the blogger’s reactions and com- ments, but many blogs simply contain the blogger’s own ramblings. Unquestionably, the popularity of blogging has fueled the expansion of RSS feeds. According to Technorati, a company that offers a search engine for blogs in addition to market research services, about one-third of blogs have RSS feeds. Some people who maintain blogs are publishing an RSS feed with- out even knowing about it, because some of the blog Web sites automatically create feeds (more about how this happens shortly). If you think that blogs are only personal affairs, remember that Microsoft has hundreds of them, and businesses are using them more and more to keep employees, colleagues, and customersCOPYRIGHTED up to date. MATERIAL In this chapter, I give a quick overview of blogging and how to use RSS with your blog to gain more readers. If you want to start a blog, this chapter explains where to go next.
    [Show full text]
  • Introduction to Web 2.0 Technologies
    Introduction to Web 2.0 Joshua Stern, Ph.D. Introduction to Web 2.0 Technologies What is Web 2.0? Æ A simple explanation of Web 2.0 (3 minute video): http://www.youtube.com/watch?v=0LzQIUANnHc&feature=related Æ A complex explanation of Web 2.0 (5 minute video): http://www.youtube.com/watch?v=nsa5ZTRJQ5w&feature=related Æ An interesting, fast-paced video about Web.2.0 (4:30 minute video): http://www.youtube.com/watch?v=NLlGopyXT_g Web 2.0 is a term that describes the changing trends in the use of World Wide Web technology and Web design that aim to enhance creativity, secure information sharing, increase collaboration, and improve the functionality of the Web as we know it (Web 1.0). These have led to the development and evolution of Web-based communities and hosted services, such as social-networking sites (i.e. Facebook, MySpace), video sharing sites (i.e. YouTube), wikis, blogs, etc. Although the term suggests a new version of the World Wide Web, it does not refer to any actual change in technical specifications, but rather to changes in the ways software developers and end- users utilize the Web. Web 2.0 is a catch-all term used to describe a variety of developments on the Web and a perceived shift in the way it is used. This shift can be characterized as the evolution of Web use from passive consumption of content to more active participation, creation and sharing. Web 2.0 Websites allow users to do more than just retrieve information.
    [Show full text]
  • Standardized Classification, Folksonomies, and Ontological Politics
    UCLA InterActions: UCLA Journal of Education and Information Studies Title Burning Down the Shelf: Standardized Classification, Folksonomies, and Ontological Politics Permalink https://escholarship.org/uc/item/74p477pz Journal InterActions: UCLA Journal of Education and Information Studies, 4(1) ISSN 1548-3320 Author Lau, Andrew J. Publication Date 2008-02-08 DOI 10.5070/D441000621 Peer reviewed eScholarship.org Powered by the California Digital Library University of California Colonialism has left its indelible mark on the world: cultures denied, traditions altered or erased, narratives ignored—all under the guise of noble, and perhaps sincere, intentions of spreading civility among “heathens” and the “wretched.” Power and authority have been assumed with little regard for the conquered as their voices have been disavowed, discounted as subhuman. Societal structures have been implemented as ideal solutions to the perceived inferiority of indigenous societal structures; language, culture, and traditions of a conqueror have been imposed on the conquered, an ontology imposed on another, despite cultural incongruence, and in consequence of power assumed by one over another. The colonized have been classified as the “others,” as “uncivilized,” as “ungodly in need of saving.” This has been the experience of China and Korea at the hands of Japan; the Philippines at the hands of Spain; India, Burma, and Singapore at the hands of Britain; and countless others throughout history. While the example of colonialism may be extreme as a metaphor for ontology, it still serves as a compelling metaphor; colonialism alludes to structures of power, as does an ontology imposing itself on another. In hindsight, history has revealed the terrors and consequences of colonialism, including loss of culture, loss of life, and loss of heritage.
    [Show full text]
  • Open Source Licenses Visionize Lab Suite the Software of Visionize Lab Suite Contains Open Source Licenses Included in This Document
    Open Source Licenses VisioNize Lab Suite The software of VisioNize Lab Suite contains open source licenses included in this document. Package name License Website/Repository Publisher/Author @angular-devkit/[email protected] MIT https://github.com/angular/angular-cli Angular Authors @angular-devkit/[email protected] MIT https://github.com/angular/angular-cli Angular Authors @angular-devkit/build-optimiz- MIT https://github.com/angular/angular-cli Angular Authors [email protected] @angular-devkit/build-web- MIT https://github.com/angular/angular-cli Angular Authors [email protected] @angular-devkit/[email protected] MIT https://github.com/angular/angular-cli Angular Authors @angular-devkit/[email protected] MIT https://github.com/angular/angular-cli Angular Authors @angular/[email protected] MIT https://github.com/angular/angular angular @angular/[email protected] MIT https://github.com/angular/angular-cli Angular Authors @angular/[email protected] MIT https://github.com/angular/angular angular @angular/[email protected] MIT https://github.com/angular/angular @angular/[email protected] MIT https://github.com/angular/angular angular @angular/[email protected] MIT https://github.com/angular/angular angular @angular/[email protected] MIT https://github.com/angular/angular angular @angular/[email protected] MIT https://github.com/angular/angular angular @angular/platform-browser-dynam- MIT https://github.com/angular/angular angular [email protected] @angular/[email protected] MIT https://github.com/angular/angular angular @angular/[email protected] MIT https://github.com/angular/angular angular
    [Show full text]
  • Blogging Glossary
    Blogging Glossary Glossary: General Terms Atom: A popular feed format developed as an alternative to RSS. Autocasting: Automated form of podcasting that allows bloggers and blog readers to generate audio versions of text blogs from RSS feeds. Audioblog: A blog where the posts consist mainly of voice recordings sent by mobile phone, sometimes with some short text message added for metadata purposes. (cf. podcasting) Blawg: A law blog. Bleg: An entry in a blog requesting information or contributions. Blog Carnival: A blog article that contains links to other articles covering a specific topic. Most blog carnivals are hosted by a rotating list of frequent contributors to the carnival, and serve to both generate new posts by contributors and highlight new bloggers posting matter in that subject area. Blog client: (weblog client) is software to manage (post, edit) blogs from operating system with no need to launch a web browser. A typical blog client has an editor, a spell-checker and a few more options that simplify content creation and editing. Blog publishing service: A software which is used to create the blog. Some of the most popular are WordPress, Blogger, TypePad, Movable Type and Joomla. Blogger: Person who runs a blog. Also blogger.com, a popular blog hosting web site. Rarely: weblogger. Blogirl: A female blogger Bloggernacle: Blogs written by and for Mormons (a portmanteau of "blog" and "Tabernacle"). Generally refers to faithful Mormon bloggers and sometimes refers to a specific grouping of faithful Mormon bloggers. Bloggies: One of the most popular blog awards. Blogroll: A list of other blogs that a blogger might recommend by providing links to them (usually in a sidebar list).
    [Show full text]
  • Backbone Tutorials
    Backbone Tutorials Beginner, Intermediate and Advanced ©2012 Thomas Davis This version was published on 2012-10-16 This is a Leanpub book, for sale at: http://leanpub.com/backbonetutorials Leanpub helps authors to self-publish in-progress ebooks. We call this idea Lean Publishing. To learn more about Lean Publishing, go to: http://leanpub.com/manifesto To learn more about Leanpub, go to: http://leanpub.com Tweet This Book! Please help Thomas Davis by spreading the word about this book on Twitter! The suggested hashtag for this book is #backbonetutorials. Find out what other people are saying about the book by clicking on this link to search for this hashtag on Twitter: https://twitter.com/search/#backbonetutorials Contents Why do you need Backbone.js? i Why single page applications are the future ............................ i So how does Backbone.js help? ................................... i Other frameworks .......................................... i Contributors .......................................... i What is a view? ii The “el” property ........................................... ii Loading a template .......................................... iii Listening for events .......................................... iii Tips and Tricks ............................................ iv Relevant Links ......................................... v Contributors .......................................... v What is a model? vi Setting attributes ........................................... vi Getting attributes ..........................................
    [Show full text]
  • Between Ontologies and Folksonomies
    BOF Between Ontologies and Folksonomies Michigan State University-Mi, US June 28, 2007 Workshop held in conjunction with Preface Today on-line communities, as well as individuals, produce a substantial amount of unstructured (and extemporaneous) content, arising from tacit and explicit knowledge sharing. Various approaches, both in the managerial and computer science fields, are seek- ing ways to crystallize the - somewhat volatile, but often highly valuable - knowl- edge contained in communities "chatters". Among those approaches, the most relevants appear to be those aimed at developing and formalizing agreed-upon semantic representations of specific knowledge domains (i.e. domain ontologies). Nonetheless, the intrinsic limits of technologies underpinning such approaches tend to push communities members towards the spontaneous adoption of less cumbersome tools, usually offered in the framework of the Web 2.0 (e.g. folkso- nomies, XML-based tagging, etc.), for sharing and retrieving knowledge. Inside this landscape, community members should be able to access and browse community knowledge transparently and in a personalized way, through tools that should be at once device-independent and context- and user-dependent, in order to manage and classify content for heterogeneous interaction channels (wired/wireless network workstations, smart-phones, PDA, and pagers) and dispa- rate situations (while driving, in a meeting, on campus). The BOF- Between Ontologies and Folksonomies workshop, held in conjunction with the third Communities and Technologies conference in June 2007 1, aimed at the development of a common understanding of the frontier technologies for shar- ing knowledge in communities. We are proposing here a selection of conceptual considerations, technical issues and "real-life case studies" presented during the workshop.
    [Show full text]
  • UCLA Electronic Theses and Dissertations
    UCLA UCLA Electronic Theses and Dissertations Title Leveraging Program Commonalities and Variations for Systematic Software Development and Maintenance Permalink https://escholarship.org/uc/item/25z5n0t6 Author Zhang, Tianyi Publication Date 2019 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA Los Angeles Leveraging Program Commonalities and Variations for Systematic Software Development and Maintenance A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Computer Science by Tianyi Zhang 2019 c Copyright by Tianyi Zhang 2019 ABSTRACT OF THE DISSERTATION Leveraging Program Commonalities and Variations for Systematic Software Development and Maintenance by Tianyi Zhang Doctor of Philosophy in Computer Science University of California, Los Angeles, 2019 Professor Miryung Kim, Chair Software is becoming increasingly pervasive and complex. During software development and maintenance, developers often make ad hoc decisions based on local program contexts and their own experience only, which may increase technical debt and raise unknown conse- quences as software evolves. This dissertation explores several opportunities to guide software development and maintenance by exploiting and visualizing the commonalities and varia- tions among similar programs. Our hypothesis is that unveiling what has and has not been done in other similar program contexts can help developers make more systematic decisions, explore implementation alternatives, and reduce the risk of unknown consequences. The inherent repetitiveness in software systems provides a solid foundation for identify- ing and exploiting similar code. This dissertation first presents two approaches that leverage the syntactic similarity and repetitiveness in local codebases to improve code reviews and software testing.
    [Show full text]
  • Overview of the TREC-2007 Blog Track
    Overview of the TREC-2007 Blog Track Craig Macdonald, Iadh Ounis Ian Soboroff University of Glasgow NIST Glasgow, UK Gaithersburg, MD, USA {craigm,ounis}@dcs.gla.ac.uk [email protected] 1. INTRODUCTION The number of permalink documents in the collection is over The goal of the Blog track is to explore the information seeking 3.2 million, while the number of feeds is over 100,000 blogs. The behaviour in the blogosphere. It aims to create the required in- permalink documents are used as a retrieval unit for the opinion frastructure to facilitate research into the blogosphere and to study finding task and its associated polarity subtask. For the blog distil- retrieval from blogs and other related applied tasks. The track was lation task, the feed documents are used as the retrieval unit. The introduced in 2006 with a main opinion finding task and an open collection has been distributed by the University of Glasgow since task, which allowed participants the opportunity to influence the March 2006. Further information on the collection and how it was determination of a suitable second task for 2007 on other aspects created can be found in [1]. of blogs besides their opinionated nature. As a result, we have created the first blog test collection, namely the TREC Blog06 3. OPINION FINDING TASK collection, for adhoc retrieval and opinion finding. Further back- Many blogs are created by their authors as a mechanism for self- ground information on the Blog track can be found in the 2006 expression. Extremely-accessible blog software has facilitated the track overview [2].
    [Show full text]
  • 8.5.512 Speechminer UI User Manual
    SpeechMiner UI User Manual 8.5.5 Date: 7/11/2018 Table of Contents Chapter 1: Introduction to SpeechMiner 1 New in this Release 4 SpeechMiner Deployments 15 Topics, Categories and Programs 17 Topics 19 Categories 21 Programs 23 Log into SpeechMiner 26 SpeechMiner Menu Reference 28 Chapter 2: Dashboard 30 Dashboard Menu Reference 31 Create a New Dashboard 32 Working with Dashboards 34 Widgets 36 Working with Widgets 37 Report Widget 40 My Messages Widget 43 Chapter 3: Explore 46 Explore Menu Reference 47 Search Results Grid 48 Working with the Search Results Grid 52 What is an Interaction? 55 Screen Recordings 58 Batch Actions 59 Create a New Search 64 Search Filter 66 Explore Terms 74 Saved Searches 75 Compare Saved Searches 77 Interaction Lists 79 Content Browser 82 Trending 84 Create a Trending Filter 86 Create a Custom Trending Cluster Task 88 Trending Filter Toolbar Description 91 Trending Chart Tool Tip Description 95 Trending Chart Data Description 97 Manage the Blacklist 101 Working with Saved Trending Filters 103 Export Trending Data 107 Chapter 4: Media Player 108 Playback 110 Playback Controls 111 Hidden Confidential Information 115 Dual-Channel Audio 116 Interaction Transcript 117 Interaction Comments 121 Interaction Events 124 Interaction Attributes 126 Media Player Options 127 Chapter 5: Reports 131 Reports Menu Reference 132 Create a New Report 133 Edit a Report 134 Run a Report 135 Analyze Report Data 137 Drill Down a Report 138 Working with Saved Reports 139 Schedule a Report 142 Report Templates 147 Report Template Layout
    [Show full text]