Hierarchical Multi-Label Classification of Social Text Streams
Total Page:16
File Type:pdf, Size:1020Kb
Hierarchical Multi-Label Classification of Social Text Streams Zhaochun Ren Maria-Hendrike Peetz Shangsong Liang University of Amsterdam University of Amsterdam University of Amsterdam Amsterdam, The Netherlands Amsterdam, The Netherlands Amsterdam, The Netherlands [email protected] [email protected] [email protected] Willemijn van Dolen Maarten de Rijke Business School University of Amsterdam University of Amsterdam Amsterdam, The Netherlands Amsterdam, The Netherlands [email protected] [email protected] ABSTRACT For many social media applications, a document in a social text Hierarchical multi-label classification assigns a document to mul- stream usually belongs to multiple labels that are organized in a tiple hierarchical classes. In this paper we focus on hierarchical hierarchy. This phenomenon is widespread in web forums, ques- multi-label classification of social text streams. Concept drift, com- tion answering platforms, and microblogs [11]. In Fig.1 we show plicated relations among classes, and the limited length of docu- an example of several classes organized in a tree-structured hier- ments in social text streams make this a challenging problem. Our archy, of which several subtrees have been assigned to individual approach includes three core ingredients: short document expan- tweets. The tweet “I think the train will soon stop again because sion, time-aware topic tracking, and chunk-based structural learn- of snow . ” is annotated with multiple hierarchical labels: “Com- ing. We extend each short document in social text streams to a munication,” “Personal experience” and “Complaint.” Faced with more comprehensive representation via state-of-the-art entity link- many millions of documents every day, it is impossible to manually ing and sentence ranking strategies. From documents extended in classify social streams into multiple hierarchical classes. This mo- this manner, we infer dynamic probabilistic distributions over top- tivates the hierarchical multi-label classification (HMC) task for ics by dividing topics into dynamic “global” topics and “local” top- social text streams: classify a document from a social text stream ics. For the third and final phase we propose a chunk-based struc- using multiple labels that are organized in a hierarchy. tural optimization strategy to classify each document into multi- Recently, significant progress has been made on the HMC task, ple classes. Extensive experiments conducted on a large real-world see, e.g., [4,7, 10]. However, the task has not yet been examined dataset show the effectiveness of our proposed method for hierar- in the setting of social text streams. Compared to HMC on station- chical multi-label classification of social text streams. ary documents, HMC on documents in social text streams faces specific challenges: (1) Because of concept drift a document’s sta- Categories and Subject Descriptors tistical properties change over time, which makes the classification output different at different times. (2) The shortness of documents H.3.3 [Information Search and Retrieval]: Information filtering in social text streams hinders the classification process. Keywords In this paper, we address the HMC problem for documents in social text streams. We utilize structural support vector machines Twitter; tweet classification; topic modeling; structural SVM (SVMs) [41]. Unlike with standard SVMs, the output of struc- tural SVMs can be a complicated structure, e.g., a document sum- 1. INTRODUCTION mary, images, a parse tree, or movements in video [22, 45]. In The growth in volume of social text streams, such as microblogs our case, the output is a 0/1 labeled string representing the hi- and web forum threads, has made it critical to develop methods erarchical classes, where a class is included in the result if it is that facilitate understanding of such streams. Recent work has con- labeled as 1. For example, the annotation of the top left tweet firmed that short text classification is an effective way of assisting in Fig.1 is 1100010000100. Based on this structural learn- users in understanding documents in social text streams [25, 26, 29, ing framework, we use multiple structural classifiers to transform 46]. Straightforward text classification methods, however, are not our HMC problem into a chunk-based classification problem. In adequate for mining documents in social streams. chunk-based classification, the hierarchy of classes is divided into Permission to make digital or hard copies of all or part of this work for personal or multiple chunks. classroom use is granted without fee provided that copies are not made or distributed To address the shortness and concept drift challenges mentioned for profit or commercial advantage and that copies bear this notice and the full citation above, we proceed as follows. Previous solutions for working with on the first page. Copyrights for components of this work owned by others than the short documents rely on extending short documents using a large author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or external corpus [32]. In this paper, we employ an alternative strat- republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. egy involving both entity linking [30] and sentence ranking to col- SIGIR ’14, July 06–11, 2014, Gold Coast, QLD, Australia. lect and filter relevant information from Wikipedia. To address con- Copyright is held by the owner/author(s). Publication rights licensed to ACM. cept drift [1, 39], we track dynamic statistical distributions of topics ACM 978-1-4503-2257-7/14/07 ... $15.00. over time. Time-aware topic models, such as dynamic topic mod- http://dx.doi.org/10.1145/2600428.2609561 There are quite cramped trains 200,000 people travel with Those techniques can be classified into web search-based methods book as ticket I really feel like Smullers ... ... and topic-based ones. I think the train will soon Web search-based methods handle each short text as a query to stop again because of ROOT a search engine, and then improve short text classification perfor- snow... mance using external knowledge extracted from web search engine results [8, 44]. Such approaches face efficiency and scalability Communication Product Traveler challenges, which makes them ill-suited for use in our data-rich setting [13]. As to topic-based techniques, Phan et al. [32] extract Personal report Personal experience Retail on station Parking topic distributions from a Wikipedia dump based on the LDA [6] model. Similarly, Chen et al. [13] propose an optimized algorithm for extracting multiple granularities of latent topics from a large- scale external training set; see [37] for a similar method. Complaint Product Experience Smullers Incident Compliment Besides those two strategies, other methods have also been em- ployed. E.g., Nishida et al. [28], Sun [38] improve classification performance by compressing shorts text into entities. Zhang et al. Figure 1: An example of predefined labels in hierarchical [46] learn a short text classifier by connecting what they call the multi-label classification of documents in a social text stream. “information path,” which exploits the fact that some instances of Documents are shown as colored rectangles, labels as rounded test documents are likely to share common discriminative terms rectangles. Circles in the rounded rectangles indicate that the with the training set. Few previous publications on short text classi- corresponding document has been assigned the label. Arrows fication consider a streaming setting; none focuses on a hierarchical indicate hierarchical structure between labels. multiple-label version of the short text classification problem. els (DTM) [5], are not new. Compared to latent Dirichlet allocation 2.2 Hierarchical multi-label classification (LDA) [6], dynamic topic models are more sensitive to bursty top- In the machine learning field, multi-label classification problems ics. A global topic is a stationary latent topic extracted from the have received lots of attention. Discriminative ranking methods whole document set and a local topic is a dynamic latent topic ex- have been proposed in [36], while label-dependencies are applied tracted from a document set within a specific time period. To track to optimize the classification results by [18, 20, 31]. However, none dynamic topics, we propose an extension of DTM that extracts both of them can work when labels are organized hierarchically. global and local topics from documents in social text streams. The hierarchical multi-label classification problem is to classify Previous work has used Twitter data for streaming short text clas- a given document into multiple labels that are organized as a hier- sification [29]. So do we. We use a large real-world dataset of archy. Koller and Sahami [19] propose a method using Bayesian tweets related to a major public transportation system in a Euro- classifiers to distinguish labels; a similar approach uses a Bayesian pean country to evaluate the effectiveness of our proposed meth- network to infer the posterior distributions over labels after train- ods for hierarchical multi-label classification of documents in so- ing multiple classifiers [3]. As a more direct approach to the HMC cial text streams. The tweets were collected and annotated as part task, Rousu et al. [34] propose a large margin method, where a dy- of their online reputation management campaign. As we will see, namic programming algorithm is applied to calculate the maximum our proposed method offers statistically significant improvements structural margin for output classes. Decision-tree based optimiza- over state-of-the-art methods. tion has also been applied to the HMC task [7, 42]. Cesa-Bianchi Our contributions can be summarized as follows: et al. [10] develop a classification method using hierarchical SVM, • We present the task of hierarchical multi-label classification where SVM learning is applied to a node if and only if this node’s for streaming short texts. parent has been labeled as positive.