Using a Deep Convolutional Neural Network to Map Social Media Photo Topics Across Major Cities
Total Page:16
File Type:pdf, Size:1020Kb
USING A DEEP CONVOLUTIONAL NEURAL NETWORK TO MAP SOCIAL MEDIA PHOTO TOPICS ACROSS MAJOR CITIES by Sean Stuekerjuergen A Thesis Submitted to the Graduate Faculty of George Mason University in Partial Fulfillment of The Requirements for the Degree of Master of Science Geoinformatics and Geospatial Intelligence Committee: _________________________________________ Dr. Anthony Stefanidis, Thesis Director _________________________________________ Dr. Arie Croitoru, Committee Member _________________________________________ Dr. Ruixin Yang, Committee Member _________________________________________ Dr. Anthony Stefanidis, Department Chairperson _________________________________________ Dr. Donna M. Fox, Associate Dean, Office of Student Affairs & Special Programs, College of Science _________________________________________ Dr. Peggy Agouris, Dean, College of Science Date: __________________________________ Spring Semester 2017 George Mason University Fairfax, VA Using A Deep Convolutional Neural Network to Map Social Media Photo Topics Across Major Cities A Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science at George Mason University by Sean Stuekerjuergen Bachelors of Science Virginia Polytechnic Institute and State University, 2009 Director: Anthony Stefanidis, Professor Department of Geoinformatics and Geospatial Intelligence Spring Semester 2017 George Mason University Fairfax, VA Copyright 2017 Sean Stuekerjuergen All Rights Reserved ii DEDICATION This work is dedicated to my wife Miebi. iii ACKNOWLEDGEMENTS It isn’t an easy decision to go back to school and change careers. Fortunately I was still fairly young, but I had lost that youthful naïve confidence. I would not be where I am today if not for the advice and encouragement of many people. Carl Stuekerjuergen, for always being there to advise but encouraging me set my own path. Dr. Greg Koeln, for giving me a glimpse into industry. All of my professors, for welcoming me to GMU and teaching me so much, particularly Dr. Barry Haack, Dr. Arie Croitoru, Dr. Ruixin Yang, and especially Dr. Anthony Stefanidis for guiding me through it all. Finally, I want to acknowledge everyone at DigitalGlobe and our customers, for allowing me to collaborate and balance school and work. iv TABLE OF CONTENTS Page List of Tables ........................................................................................................................... vi List of Figures.........................................................................................................................vii List of Equations .................................................................................................................. viii List of Abbreviations and/or Symbols ................................................................................... ix Abstract ..................................................................................................................................... x 1. Introduction ....................................................................................................................... 1 1.1 Overview ................................................................................................................... 1 1.2 Thesis Structure......................................................................................................... 3 2. Image Classification and Deep Convolutional Neural Networks .................................. 4 2.1 Deep Convolutional Neural Networks..................................................................... 4 2.2 ImageNet ................................................................................................................... 5 2.3 Training ..................................................................................................................... 6 3. Twitter Dataset .................................................................................................................. 8 3.1 Twitter Data Structure .............................................................................................. 8 3.2 Twitter Data Processing............................................................................................ 9 4. Validated Association Scoring Technique .................................................................... 11 4.1 CNN Scoring ........................................................................................................... 11 4.2 Validated Association Scoring Technique ............................................................ 12 5. Results ............................................................................................................................. 15 5.1 Image Classification Results .................................................................................. 15 5.2 Geographic Distributions........................................................................................ 17 5.3 Twitter Text Comparison ....................................................................................... 20 6. Conclusion and Future Work ......................................................................................... 22 Appendix ................................................................................................................................. 24 References............................................................................................................................... 41 v LIST OF TABLES Table Page Table 1 Geolocated Tweet Counts Per City (2015) ............................................................. 10 Table 2 Number of Specific Categories Within Each Broad Category .............................. 16 vi LIST OF FIGURES Figure Page Figure 1 Top 1 VAST Score Histogram ............................................................................... 16 Figure 2 Natural Places in New York City ........................................................................... 17 Figure 3 Sports in Jakarta ..................................................................................................... 19 Figure 4 Sports, Ice Skate, and Stadium in Jakarta ............................................................. 19 Figure 5 New York City Natural Places Word Cloud ........................................................ 21 Figure 6 Jakarta Sports Word Cloud .................................................................................... 21 vii LIST OF EQUATIONS Equation Page Equation 1 VAST Score ........................................................................................................ 14 viii LIST OF ABBREVIATIONS AND/OR SYMBOLS Convolutional Neural Network ........................................................................................ CNN Open Street Map................................................................................................................ OSM ImageNet Large Scale Visual Recognition Competition.......................................... ILSVRC Graphics Processing Unit .................................................................................................. GPU Visual Geometry Group .................................................................................................... VGG Berkeley Vision and Learning Center............................................................................ BVLC Application Programing Interface ...................................................................................... API Validated Association Scoring Technique..................................................................... VAST OpenStreetMap .................................................................................................................. OSM ix ABSTRACT USING A DEEP CONVOLUTIONAL NEURAL NETWORK TO MAP SOCIAL MEDIA PHOTO TOPICS ACROSS MAJOR CITIES Sean Stuekerjuergen, M.S. George Mason University, 2017 Thesis Director: Dr. Anthony Stefanidis Recent research has demonstrated the potential of mining geotagged Twitter data in order to identify distinct places as spatial clusters of thematically congruent tweets posted from these locations. But social media interaction and participation is not only textual: social media platforms are multimedia in nature, encompassing imagery as well as text. Accordingly, a research question emerges on whether geotagged imagery posted in social media can also be analyzed to reveal thematic clusters, furthering our abilities to harvest platial content from such crowd-contributed content. Such studies can be enabled by the recent advent of convolutional neural networks that can be trained to automatically and accurately classify imagery. In this thesis we pursue a study of automatically classified crowd-contributed geotagged imagery from six major cities, in order to assess the emergence of spatial semantic associations. x 1. INTRODUCTION 1.1 Overview With the advent of cameras in cell phones and the open sharing of photos on the internet, in particular social media, a visual portion of the world is uploaded to the internet every day. Pixel data has very quickly become the most ubiquitous data on the internet. According to CICSCO, in 2014 64% of internet traffic was from video, and that number will grow to 80% by 2020 (CISCO 2016). Every 60 seconds, over 100 hours of YouTube video is uploaded, 56,000 new photos are uploaded on Instagram, and over 1,000 images are uploaded to Flickr (GO-GLOBE, Flickr API 2016). In that same timeframe, 430,000 tweets are sent.