A Sequence-Oblivious Generation Method for Context-Aware Hashtag Recommendation

Junmo Kang, Jeonghwan Kim, Suwon Shin, Sung-Hyon Myaeng
School of Computing, KAIST, Daejeon, Republic of Korea
junmo.kang, jeonghwankim123, ssw0093, [email protected]
arXiv:2012.02957v1 [cs.CL] 5 Dec 2020

Abstract

Like search, a recommendation task accepts an input query or cue and provides desirable items, often based on a ranking function. Such a ranking approach rarely considers explicit dependency among the recommended items. In this work, we propose a generative approach to tag recommendation, where semantic tags are selected one at a time conditioned on the previously generated tags to model inter-dependency among the generated tags. We apply this tag recommendation approach to an Instagram data set where an array of context feature types (image, location, time, and text) are available for posts. To exploit the inter-dependency among the distinct types of features, we adopt a simple yet effective architecture using self-attention, making deep interactions possible. Empirical results show that our method is significantly superior to not only the usual ranking schemes but also autoregressive models for tag recommendation. They indicate that it is critical to fuse mutually supporting features at an early stage to induce an extensive and comprehensive view on inter-context interaction in generating tags in a recurrent feedback loop.

Figure 1: Ranking vs. Generation model. (The ranking model scores all tags #Tag1 … #TagN for a context in one pass; the generation model emits one tag per step from the context plus the tags generated so far, #Tag1 … #TagN-1.)

1 Introduction

From traditional term-based methods to deep neural network based models, recommendation functions in the Internet domain are widely adopted. While categorical features are often used for collaborative filtering types along with a user cue, recommending text or image, i.e. content-based recommendation, requires handling unstructured data based on relevance of items toward a user query. Like search, most of these recommendation systems rank and return items with top-k predicted logit values (Covington et al., 2016; Weston et al., 2014; Wu et al., 2018b). We posit that tag recommendation touches on the middle ground because while the tags can be seen as predefined categories, they can be generated as a sequence of tags like a natural language sentence.

Social networking service (SNS) platforms like Instagram and Twitter are example cases in which tag recommendation plays a significant role in aggregating and distributing information. Users tend to include hashtags when they post daily life stories or advertisements, expecting they would serve as keywords for the semantics and pragmatics of the unstructured content like image and text. Recommending appropriate hashtags to the users would increase global coherency of the tags used in the user population and subsequently facilitate grouping the posts of similar topics for easier navigation/search.

A number of hashtag recommendation methods have been proposed to date (Weston et al., 2014; Gong and Zhang, 2016; Wu et al., 2018a; Wang et al., 2019; Zhang et al., 2019; Yang et al., 2020b; Kaviani and Rahmani, 2020; Yang et al., 2020a). These methods focus on modeling latent topic distribution over hashtags (Godin et al., 2013), heavily relying on a late fusion approach to model the interaction between image and text input (Zhang et al., 2017; Yang et al., 2020b) or project words within the given SNS post and the hashtag embeddings (i.e. tag embeddings) to a common high-dimensional space and update the embeddings with pairwise ranking loss (Weston et al., 2014). However, prior studies taking the ranking approach to tag recommendation neglect inter-dependency among the generated hashtags for the given context (i.e., post).

We propose a recurrent hashtag feedback approach to tag recommendation, which enables the recommendation model to repeatedly consider previously generated tags in generating the next “relevant” tag. Our recurrent model using BERT (Devlin et al., 2019) generates hashtags conditioned on the assorted context information and previously generated hashtags as in Figure 1. Note that this recurrent BERT model is devised for the unique nature of syntax-free tag generation, instead of the usual RNN or BERT approach for language generation. On a different note, this approach can be seen as analogous to pseudo-relevance feedback (PRF) (Xu and Croft, 1996) where previously retrieved items are deemed relevant and additional query terms are extracted as relevance feedback. For tag recommendation, we also assume that previously generated tags can be trusted to be relevant but incorporate them for generating the next one. Nonetheless, they can be seen as part of the new “query” for generation of the next tag.

Our work also proposes an early fusion of multi-modal context features of Instagram posts (image, location, time, and text). Prior works (Denton et al., 2015; Wang et al., 2019; Gong et al., 2018; Li et al., 2016) on combining multi-modal features in hashtag recommendation tend to merge the representations with either a co-attention (Lu et al., 2016) or a bi-attention (Seo et al., 2017) mechanism after independently encoding features of differing modalities. Based on our intuition that context input features should affect the representation modeling process as early as possible, we exploit the self-attention based pre-trained BERT to fuse the different features and encode the relationships at an early stage of building representations. This approach has an added benefit of allowing for an investigation of how the features influence each other for tag generation, in addition to a usual ablation study where we can only reveal the role of each feature type for the overall performance.

Our experimental work shows that the proposed method outperforms the ranking approaches by a significant margin. To further differentiate and evaluate our model against a generative approach that has been used in other studies (Wang et al., 2019; Yang et al., 2020b), we build a BERT-based autoregressive (AR) model with a Transformer (Vaswani et al., 2017) decoder. The experimental results show that our model also outperforms the AR model by a large margin.

Our key contributions are summarized as follows:

• A generation framework, recurrent hashtag feedback, for tag recommendation, considering inter-tag dependency

• An early fusion approach enabling deep interactions among context features and tags, and

• Experiments showing the superiority of the proposed approaches and shedding light on the way the context features interact among each other for tag generation.

2 Related Work

The hashtag recommendation problem has been studied as a ranking problem (Park et al., 2016; Zangerle et al., 2011; Denton et al., 2015; Sedhai and Sun, 2014; Li et al., 2016; Weston et al., 2014; Wu et al., 2018b; Gong and Zhang, 2016). A representative approach is to apply a visual feature extractor to the input image and employ a multi-label classifier to calculate the score of each hashtag and provide top-k hashtag recommendations (Park et al., 2016). Others handle multiple input feature types (i.e. image, text) by mapping them into a common representation space and applying a pairwise ranking loss (Denton et al., 2015; Weston et al., 2014; Wu et al., 2018b), such as the weighted approximate-rank pairwise (WARP) loss (Weston et al., 2011), as the training objective. Many of the prior studies on hashtag recommendation (Godin et al., 2013; Ding et al., 2012; Li et al., 2019; Zhao et al., 2016) take topic modeling approaches with Latent Dirichlet Allocation (LDA), which is often used to discover general topics in a large collection of documents. Unlike our approach, however, such ranking approaches do not explicitly consider the inter-dependency among the generated hashtags.

Figure 2: Overall architecture of the proposed approach (sequential tag generation). (Each step feeds BERT the context and the hashtags generated so far, followed by a [MASK] token at which the next hashtag is predicted: I2 = [C, ht<2, [MASK]], I3, …, Is, where C = [img, loc, time, txt]. The special tokens [IMG], [LOC], [TIME] and [SEP] follow the image, location, time and text features, respectively.)

The use of multi-modal features is also evident (Denton et al., 2015; Wang et al., 2019; Gong et al., 2018; Li et al., 2016; Zhang et al., 2017; Yang et al., 2020a,b). The types of multi-modal features and the ways they are fused are quite distinct. For instance, Denton et al. (2015) incorporate user metadata (e.g. age, gender) with 3-way multiplicative gating along with image for hashtag recommendation. Another example (Wang et al., 2019) uses both the text description of a given tweet and the thread conversation by employing bi-attention (Seo et al., 2017). A more recent approach makes use of audio features (Yang et al., 2020a) for short video information. A common drawback of these previous models is that they capture a very limited amount of relationship among the input features of different modalities. Our approach of using a self-attention mechanism by employing a pre-trained BERT en- [...]

[...] immediately preceding token is pooled (Vaswani et al., 2017), our model fuses mutually supporting context features “directly” with the self-attention mechanism along with the incrementally appended hashtags. This early fusion approach is conducive to modeling our representation space because the output space is jointly modeled with the combined context-tag representation for every generation step. On the contrary, the commonly used late fusion of latent representations of multi-modal input vectors is limited to the aggregation of the projected information. The expected benefit of the early fusion is the extensive and comprehensive view on the contextual information in generating a hashtag.
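To make the difference between one-pass ranking and recurrent hashtag feedback concrete, the following is a minimal Python sketch, not the authors' implementation. It builds the step input as [C, previously generated tags, [MASK]], following the layout in Figure 2, and re-scores the tag vocabulary at every step. The names build_input, rank_top_k, generate_tags and toy_score_fn are hypothetical, the toy scorer merely stands in for BERT, the tag names and bonus values are made up, and the exclusion of already generated tags is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Sequence

MASK = "[MASK]"

def build_input(context: Sequence[str], generated: Sequence[str]) -> List[str]:
    """Step input I_s = [C, ht_<s, [MASK]]: context, tags so far, then a mask slot."""
    return [*context, *generated, MASK]

def rank_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Ranking baseline: score every tag once and keep the k best."""
    return sorted(scores, key=lambda tag: (-scores[tag], tag))[:k]

def generate_tags(
    context: Sequence[str],
    score_fn: Callable[[List[str]], Dict[str, float]],
    num_tags: int,
) -> List[str]:
    """Recurrent feedback: pick one tag per step, then feed it back as input."""
    generated: List[str] = []
    for _ in range(num_tags):
        scores = score_fn(build_input(context, generated))
        # Assumption of this sketch: a tag is never emitted twice.
        candidates = {t: s for t, s in scores.items() if t not in generated}
        if not candidates:
            break
        generated.append(max(candidates, key=candidates.get))
    return generated

# Hypothetical stand-in for the model: base scores plus made-up bonuses
# that fire when a related tag is already present in the input.
BASE = {"#beach": 0.43, "#food": 0.22, "#sunset": 0.15, "#travel": 0.03}
BONUS = {("#beach", "#sunset"): 0.2, ("#sunset", "#travel"): 0.3}

def toy_score_fn(tokens: List[str]) -> Dict[str, float]:
    scores = dict(BASE)
    for (trigger, target), bonus in BONUS.items():
        if trigger in tokens:
            scores[target] += bonus
    return scores

context = ["img", "loc", "time", "txt"]  # placeholders for the context features C
ranked = rank_top_k(toy_score_fn(build_input(context, [])), 3)
generated = generate_tags(context, toy_score_fn, 3)
print(ranked)     # ['#beach', '#food', '#sunset']
print(generated)  # ['#beach', '#sunset', '#travel']
```

With these toy scores, the ranking baseline returns the three tags that score best in isolation, whereas the feedback loop promotes #sunset and then #travel once related tags are already part of the input. This is the inter-tag dependency that a single ranking pass cannot express.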
