VisCAT: Spatio-Temporal Visualization and Aggregation of Categorical Attributes in Twitter Data (Demo Paper)
Thanaa M. Ghanem1, Amr Magdy2, Mashaal Musleh3, Sohaib Ghani4, Mohamed F. Mokbel5
1,2,3,4,5 KACST GIS Technology Innovation Center, Umm Al-Qura University, Makkah, KSA
1 Dept. of Information and Computer Sciences, Metropolitan State University, Saint Paul, MN
2,5 Dept. of Computer Science and Engineering, University of Minnesota, Minneapolis, MN
[email protected], {2amr,5mokbel}@cs.umn.edu, {3mmusleh,4sghani}@gistic.org

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ACM SIGSPATIAL ’14 Dallas, Texas, USA
Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$15.00.

ABSTRACT

In the last few years, Twitter data has become so popular that it is used in a rich set of new applications, e.g., real-time event detection, demographic analysis, and news extraction. As user-generated data, the plethora of Twitter data motivates several analysis tasks that make use of the activeness of 255+ million Twitter users. This demonstration presents VisCAT, a tool for aggregating and visualizing categorical attributes in Twitter data. VisCAT outputs visual reports that provide spatial analysis through interactive map-based visualization of categorical attributes, such as tweet language or source operating system, at different zoom levels. The visual reports are built from user-selected data in arbitrary spatial and temporal ranges. For this data, VisCAT employs a hierarchical spatial data structure to materialize the count of each category at multiple spatial levels. We demonstrate VisCAT using a real Twitter dataset. The demonstration includes use cases on the tweet language and tweet source attributes in the region of the Gulf Arab states, which can be used for drawing conclusions about demographics and living standards in local societies.

1. INTRODUCTION

The Twitter microblogging service has become very popular in the last few years. Every day, 500+ million tweets are posted by 255+ million active users [19, 22]. Such unprecedented user activeness and huge volumes of user-generated data motivate several new applications and analysis tasks. These include real-time keyword search [5], spatio-temporal modeling [1], event detection [2, 10, 12, 13, 14, 16, 23], event analysis [8, 18], news extraction [15], photo extraction [7], and general analysis [9, 17]. Such applications are so important that major IT companies spend millions of dollars to offer them to their customers [3, 21].

The plethora of active Twitter users enables meaningful analysis tasks that can yield fruitful conclusions about the actual population. For example, a recent study on geotagged tweets [11] examined the relation between cultural diversity and Twitter language usage in different countries using real ground-truth data from international organizations. The results show a localized correlation, in various societies, with particular demographic characteristics and standards of living. The study also showed a strong correlation between the tweets posted in a country and its first spoken language. Thus, the availability of a large amount of Twitter data from a wide active user base around the world motivates a variety of potentially more accurate analysis tasks.

Among the underutilized Twitter data attributes are the categorical attributes: attributes that can take one of multiple discrete values. A prime example of an important categorical attribute in Twitter data is the language attribute, which Twitter added to tweets in February 2013 [20]. The language attribute indicates the natural language in which the tweet is written. This single attribute, along with geolocation information, made the whole study in [11] possible, and it enables even more analysis, e.g., of users' language usage. Another example of a Twitter categorical attribute is the tweet source, which indicates the OS, device, or application from which the tweet is posted. This attribute could enable more analysis tasks, such as the spread of different devices in the geo-located space and the analysis of standards of living in different regions. Thus, categorical attributes are important sources for Twitter data analysis that could be exploited to draw fruitful conclusions from the continuously flowing Twitter data.

In this demonstration, we present VisCAT (Visualization of Categorical Attributes in Twitter) as a web-based service that enables users to analyze categorical attributes of Twitter data. The VisCAT web interface lets users choose Twitter data from arbitrary spatial and temporal ranges and a particular categorical attribute to analyze. Then, a web-based visual report interface is generated that shows the aggregate counts of the different categories as pie charts, per spatial region and at different zoom levels. To this end, VisCAT employs a hierarchical spatial pyramid structure [4] that materializes the count of each category. All the counts are pre-computed and stored in the different pyramid levels. The pyramid structure, along with its aggregate counts, is stored on disk and loaded when the report interface is launched.

We demonstrate VisCAT using a real Twitter dataset, showing use cases for the language and source attributes of Twitter data in the region of the Gulf Arab states during the period from December 2013 to February 2014. The rest of this paper describes the VisCAT service in more detail, along with the demonstration scenarios.

2. VISCAT OVERVIEW

In this section, we present an overview of the VisCAT service features and components. First, we give a detailed overview of the process of generating visual reports, along with the supported features. Second, we describe the internals of the data structure employed in VisCAT.

The VisCAT service is a web-based asynchronous service that generates visual reports for aggregate analysis of categorical attributes of Twitter data. It allows users to submit a request for a specific visual report, takes its time processing the request in the back end, and then sends the visual report to the user by email. The requested visual report is characterized by two main things: (1) the data to be included in the report, and (2) the categorical attribute to be aggregated and visualized in the report. VisCAT users can select data in arbitrary spatial and temporal ranges. The geotagged tweets available in VisCAT have been crawled from the Twitter streaming APIs since October 2013. For the user-selected data, the user can select any categorical attribute, e.g., language or source, to generate an interactive visual report.

Once the user submits a request, the back end of VisCAT extracts the user-selected data through an extensive scan for geotagged tweets in the specified temporal range. Then, VisCAT creates an adaptive pyramid structure [4] (similar to a partial quad tree [6]) that stores the counts of the different attribute categories in the whole spatial range at different levels of granularity. Building the pyramid structure goes through two phases: (1) the structuring phase, and (2) the computation phase. The structuring phase determines the pyramid shape by inserting the actual individual tweets. Then, the computation phase precomputes the aggregate counts in each pyramid cell before discarding the individual tweets. The pyramid is initialized with one root cell that covers the whole spatial range and contains all the tweets of the report. The root cell is then divided into four disjoint child cells, each covering a quarter of the space.

[...] completed, the pyramid structure is stored on disk with its aggregate counts. Afterwards, the VisCAT back end generates an interactive web interface that visualizes the contents of the pyramid structure on a map-based interface at different zoom levels, where each map level corresponds to a pyramid level. This interface represents the output visual report. Whenever the report is launched, the stored pyramid is loaded from disk to visualize the precomputed aggregations. After the report interface is successfully generated, an email is sent to the user with a hyperlink to the report. It is important to note that we take the classification of the tweet language and source OS attributes from Twitter, and it may contain some errors. The accurate classification of such attributes is beyond the scope of this work.

3. DEMONSTRATION SCENARIOS

The VisCAT user interface and functionality are demonstrated using a real dataset that has been continuously crawled from the Twitter streaming APIs since October 2013. Our demo attendees would be able to interact with VisCAT as explained in the following scenarios.

3.1 Scenario 1: Submitting Report Request

The main user interface of VisCAT is shown in Figure 1, from which users would be able to submit spatio-temporal requests to generate visual reports on categorical attributes in Twitter data. The user would select the report spatial area using a map-based interface (the black rectangle in Figure 1) and the temporal range through the datepicker. Then, the user chooses an attribute to analyze. Finally, the user enters an email address to receive the output report and submits the request. VisCAT would process the report request in its back end to generate a report similar to those presented in the next sections. Once the report is generated, the user receives an email message with a link to the generated report. It is worth noting that the interface in Figure 1 does not let the user input the pyramid cell capacity. VisCAT sets it to 50 by default, which enables the spatial granularity to be street-level.

3.2 Scenario 2: Interactive Visual Reports

As described throughout the paper, VisCAT generates