Cross-media Event Extraction and Recommendation

Di Lu1, Clare R. Voss2, Fangbo Tao3, Xiang Ren3, Rachel Guan1, Rostyslav Korolov1, Tongtao Zhang1, Dongang Wang4, Hongzhi Li4, Taylor Cassidy2, Heng Ji1, Shih-Fu Chang4, Jiawei Han3, William Wallace1, James Hendler1, Mei Si1, Lance Kaplan2

1 Rensselaer Polytechnic Institute
2 Army Research Laboratory, Adelphi, Maryland
3 Computer Science Department, University of Illinois at Urbana-Champaign
4 Electrical Engineering Department, Columbia University

Abstract

The sheer volume of unstructured multimedia data (e.g., texts, images, videos) posted on the Web during events of general interest is overwhelming and difficult to distill when seeking information relevant to a particular concern. We have developed a comprehensive system that searches, identifies, organizes and summarizes complex events from multiple data modalities. It also recommends events related to the user's ongoing search, based on previously selected attribute values and on the dimensions of the events being viewed. In this paper we briefly present the algorithms of each component and demonstrate the system's capabilities.1

1 The system demo is available at: http://nlp.cs.rpi.edu/multimedia/event/navigation_dark.html

1 Introduction

Every day, a vast amount of unstructured data in different modalities (e.g., texts, images and videos) is posted online for ready viewing. Complex event extraction and recommendation are critical for many information distillation tasks, including tracking current events, providing alerts, and predicting possible changes related to topics of ongoing concern. State-of-the-art Information Extraction (IE) technologies focus on extracting events from a single data modality and ignore cross-media fusion. More importantly, users are presented with extracted events in a passive way (e.g., in a temporally ordered event chronicle (Ge et al., 2015)). Such technologies do not leverage user behavior to identify the event properties of interest to users when selecting new scenarios for presentation.

In this paper we present a novel event extraction and recommendation system that incorporates advances in extracting events across multiple sources with data in diverse modalities, and thus yields a more comprehensive understanding of collective events, their importance, and their inter-connections. The novel capabilities of our system include:

• Event Extraction, Summarization and Search: extract concepts, events with their arguments (participants) and implicit attributes, as well as semantically meaningful visual patterns, from multiple data modalities, and organize them into Event Cubes (Tao et al., 2013) based on Online Analytical Processing (OLAP). We developed a search interface to these cubes with a novel back-end event summarization component, so that users can specify multiple dimensions in a query and receive single-sentence summary responses.

• Event Recommendation: recommend related events based on meta-paths derived from event arguments, and automatically adjust the ranking function by updating the weights of dimensions based on user browsing feedback.

2 Overview

2.1 System Architecture

The overall architecture of our system is illustrated in Figure 1. We first extract textual event information, as well as visual concepts, events and patterns, from the raw multimedia documents to construct event cubes. Based on the event cubes, the search interface (Figure 2) returns the document that best matches the user's query as the first primary event for display. A query may consist of multiple dimensions (Figure 3). The recommendation interface displays multiple dimensions and rich annotations of the primary event, and recommends similar and dissimilar events to the user.

Figure 1: System Workflow

2.2 Data Sets

To demonstrate the capabilities of our system, we use two event types, Protest and Attack, as our case studies. We collected the following data sets:

• Protest: 59 protest incidents that occurred between January 2009 and December 2010, from 458 text documents, 28 images and 31 videos.

• Attack: 52 attack incidents that occurred between January 2014 and December 2015, from 812 text documents, 46 images and 6 videos.

3 Event Cube Construction and Search

Figure 2: Search Interface. (a) Protest Event; (b) Attack Event

3.1 Event Extraction

We apply a state-of-the-art English IE system (Li et al., 2014) to jointly extract entities, relations and events from text documents. This system is based on a structured perceptron incorporating multiple levels of linguistic features. However, some important event attributes are not expressed by explicit textual clues. To enrich the profile of each protest event, the system identifies two additional implicit attributes derived from social movement theories (Della Porta and Diani, 2009; Furusawa, 2006; Dalton, 2013):

• Types of Protest, including demonstration, riot, strike, boycott, picket and individual protest.

• Demands of Protest, indicating the hidden behavioral intent of the protesters (what they desire to change or preserve), including pacifist, political, economic, retraction, financial, religious, human rights, race, justice, environmental, sports rivalry and social.

We annotated 92 news documents, 59 of which contained protests, to learn a set of heuristic rules for automatically extracting these implicit attributes from the IE outputs of these documents.
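The learned rules themselves are not given in the paper; purely for illustration, the sketch below shows the general shape of keyword-style heuristic rules that map IE output tokens to the two implicit attributes. The cue lexicons and the function names are our own assumptions, not the authors' implementation.

```python
# Hypothetical sketch of keyword-style heuristic rules that assign the two
# implicit attributes (protest type and demand) from IE output tokens.
# All lexicons below are illustrative; the actual rules were learned from
# 92 annotated news documents.

PROTEST_TYPE_CUES = {
    "strike": {"strike", "walkout", "work stoppage"},
    "riot": {"riot", "rioters", "looting"},
    "boycott": {"boycott"},
    "picket": {"picket", "picketers"},
    "demonstration": {"march", "rally", "demonstration"},
}

DEMAND_CUES = {
    "economic": {"wages", "austerity", "jobs"},
    "political": {"election", "government", "regime"},
    "human rights": {"rights", "detention", "censorship"},
    "environmental": {"pipeline", "pollution", "climate"},
}

def classify(tokens, cue_map, default):
    """Return the first label whose cue words occur among the tokens."""
    token_set = {t.lower() for t in tokens}
    for label, cues in cue_map.items():
        if token_set & cues:
            return label
    return default

def implicit_attributes(event_context_tokens):
    """Enrich one extracted protest event with the two implicit attributes."""
    return {
        "type_of_protest": classify(event_context_tokens, PROTEST_TYPE_CUES,
                                    "individual protest"),
        "demand_of_protest": classify(event_context_tokens, DEMAND_CUES,
                                      "social"),
    }

print(implicit_attributes(["Workers", "strike", "over", "unpaid", "wages"]))
# -> {'type_of_protest': 'strike', 'demand_of_protest': 'economic'}
```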
3.2 Event Cube Construction and Search

Event extraction converts unstructured texts into structured arguments and attributes (dimensions). However, a user may still need to "drill down," searching back through many documents and changing query selections, before finding an item of interest. We use EventCube (Tao et al., 2013) to effectively organize and efficiently search relevant events, and to measure event similarity based on multiple dimensions. EventCube is a general online analytical processing (OLAP) framework for importing any collection of multi-dimensional text documents and constructing a text-rich data cube. EventCube differs from traditional search engines in that it returns Top-Cells, in which relevant documents are aggregated by dimension combinations.

We regard each event as a data point associated with multiple dimension values. After a user inputs a multi-dimensional query in the search interface (Figure 2), we use an inverted index built over the dimensions in the Event Cube to return related events, which provides much more flexible matching than keyword search.

4 Multi-media Event Illustration and Summarization

After the search interface retrieves the most relevant event (the "primary event"), the user is directed to the recommendation interface (Figure 2b) and can start exploring various events. The user's initial query is displayed at the top (gray bar) and is updated to capture new user selections. The number of events that match the user's initial query and the number of documents associated with the primary event are displayed at the far right of the gray bar. In addition to the text IE results, we apply the following multi-media extraction and summarization techniques to illustrate and enrich event profiles.

4.1 Summarization

From each set of relevant documents, we apply a state-of-the-art phrase mining method (Liu et al., 2016) to mine the top-k representative phrases. We then construct an affinity matrix of sentences and apply spectral clustering to find several cluster centers (i.e., representative sentences containing the most important phrases), which form the summary. The user is also given two options: to show the original documents, or to show the document containing the summary.
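As a rough illustration of the clustering step above (not the authors' code, and with the phrase-based affinity replaced by a plain TF-IDF cosine affinity, which is our assumption), the following sketch clusters sentences spectrally and picks the most central sentence of each cluster:

```python
# Minimal sketch of the summarization step, assuming a TF-IDF cosine
# affinity between sentences (the paper builds its affinity from mined
# top-k phrases, which is omitted here).
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def summarize(sentences, n_clusters=3):
    """Cluster sentences spectrally; return one central sentence per cluster."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    affinity = cosine_similarity(tfidf)  # symmetric, non-negative matrix
    labels = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=0
    ).fit_predict(affinity)
    summary = []
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        # the cluster "center": the sentence most similar to its cluster-mates
        center = members[np.argmax(affinity[np.ix_(members, members)].sum(axis=1))]
        summary.append(sentences[center])
    return summary
```

Because the affinity matrix is passed in precomputed, a phrase-based affinity like the paper's could replace the TF-IDF one without changing the rest of the pipeline.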
4.2 Visual Information Extraction

For each event, we retrieve the most representative video or image online, using key phrases such as the date, location and entities as queries. Videos and images are often more striking and more efficient at conveying information than text. We first apply a pre-trained convolutional neural network (CNN) architecture (Kuznetsova et al., 2012) to extract visual concepts from each video key frame, based on the EventNet concept library (Ye et al., 2015). For example, the extracted visual concepts "crowd on street," "riot," "demonstration or protest" and "people marching" appear when the user's mouse hovers over the video of the primary event (Figure 2b). We then adopt the approach described in (Li et al., 2015), which applies a CNN and association rule mining to generate visual patterns, and extracts semantically meaningful relations between visual and textual information to name the patterns.

Figure 3: Recommendation Interface

5 Event Recommendation

We rank and recommend events based on meta paths (Sun et al., 2011), representing the whole data set as a heterogeneous network composed of multi-typed and interconnected objects (e.g., events, locations, protesters, targets of protesters). A meta path is a sequence of relations defined between different object types; a toy example is sketched after Section 6. For protest events, we define six meta paths.

6 Conclusions and Future Work

In this paper we present a cross-media event extraction and recommendation system which effectively aggregates and summarizes related complex events, and makes recommendations based on user interests. The current system interface incorporates a medium level of human agency (the capacity of an entity to act) by allowing a human user to provide relevance feedback while driving the browsing in-
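To make the meta-path ranking of Section 5 concrete, the toy sketch below (our assumption, not the paper's code) scores event similarity with PathSim (Sun et al., 2011) along a single event-location-event meta path; the deployed system combines six such meta paths whose weights are updated from user browsing feedback.

```python
# Toy PathSim scoring along a single Event-Location-Event meta path.
# The incidence matrix is fabricated data for illustration only.
import numpy as np

# A[i, k] = 1 if event i occurred at location k (toy data).
A = np.array([
    [1, 0, 1],  # event 0 -> locations 0, 2
    [1, 1, 0],  # event 1 -> locations 0, 1
    [0, 0, 1],  # event 2 -> location 2
])

M = A @ A.T  # commuting matrix: meta-path instance counts between events

def pathsim(i, j):
    """PathSim (Sun et al., 2011): 2 * M[i,j] / (M[i,i] + M[j,j])."""
    return 2.0 * M[i, j] / (M[i, i] + M[j, j])

# Rank the other events by similarity to event 0.
ranked = sorted(((pathsim(0, j), j) for j in (1, 2)), reverse=True)
print(ranked)  # [(0.666..., 2), (0.5, 1)]: event 2 is the closer match
```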