Memebot: Towards Automatic Image Meme Generation
Total Page:16
File Type:pdf, Size:1020Kb
memeBot: Towards Automatic Image Meme Generation Aadhavan Sadasivam, Kausic Gunasekar, Hasan Davulcu, Yezhou Yang Arizona State University, Tempe AZ, United States fasadasi1, kgunase3, hdavulcu, yz.yang [email protected] Abstract Input Sentence I am curiously waiting for my father to cook supper tonight Image memes have become a widespread tool used by people for interacting and exchang- Meme Template ing ideas over social media, blogs, and open Selection Module messengers. This work proposes to treat au- tomatic image meme generation as a transla- Selected Meme Template Name Encoder tion process, and further present an end to end Meme Template neural and probabilistic approach to generate Meme Template Meme an image-based meme for any given sentence Image Embedding using an encoder-decoder architecture. For a given input sentence, an image meme is gener- Decoder ated by combining a meme template image and a text caption where the meme template image Generated Caption is selected from a set of popular candidates us- ing a selection module, and the meme caption is generated by an encoder-decoder model. An encoder is used to map the selected meme tem- Figure 1: An illustrative figure of memeBot. It gen- plate and the input sentence into a meme em- erates an image meme for a given input sentence by bedding and a decoder is used to decode the combining the selected meme template image and the meme caption from the meme embedding. The generated meme caption. generated natural language meme caption is conditioned on the input sentence and the se- lected meme template. The model learns the dependencies between the meme captions and Over the internet, information exists in the form the meme template images and generates new of text, images, video, or a combination of these. memes using the learned dependencies. The However the information existing as a combina- quality of the generated captions and the gen- tion of image or video and short text often gets erated memes is evaluated through both auto- viral. Image memes are a combination of image, mated and human evaluation. An experiment text, and humor, making them a powerful tool to is designed to score how well the generated memes can represent the tweets from Twitter deliver information. The image memes are popular arXiv:2004.14571v1 [cs.CL] 30 Apr 2020 conversations. Experiments on Twitter data because they portray the culture and social choices show the efficacy of the model in generating embraced by the internet community and they have memes for sentences in online social interac- a strong influence on the cultural norms of how tion. specific demographics of people operate. For ex- ample, in Figure2, we present the memes used by 1 Introduction an online deep learning community to ridicule how An Internet meme commonly takes the form of the new pre-training methods are outperforming an image and is created by combining a meme the previous state-of-the-art models. template (image) and a caption (natural language The popularity of image-based memes can be at- sentence). The image typically comes from a set of tributed to the fact that visual information is easier popular image candidates and the caption conveys to process and understand when compared to read- the intended message through natural language. ing large blocks of text, and this fact is evident in any given sentence. • We compiled the first large-scale Meme Cap- tion dataset. • We design experiments based on human eval- uation and provide a thorough analysis of the Figure 2: Memes used by the online deep learning com- experimental results on using the generated munity on social media to ridicule the state-of-the-art pre-training models. memes for online social interaction. Figure2. The key role played by the image memes 2 Related Work in shaping the popular culture of the internet com- There are only a few studies on automatic image munity makes automatic meme image generation a meme generation and the existing approaches treat demanding research topic to delve into. meme generation as a caption selection or caption Davison(2012) separates a meme into three com- generation problem. Wang and Wen(2015) com- ponents - Manifestation, Behavior, and Ideal. In bined an image and its text description to select a an image meme, the Ideal is the idea that needs to meme caption from a corpus based on a ranking be conveyed. The Behavior is to select a suitable algorithm. Peirson et al.(2018) extends Natural meme template and caption to convey that idea and, Language Description Generation to generating the Manifestation is the final meme image with a a meme caption using an encoder-decoder model caption conveying the idea. Wang and Wen(2015) with an attention (Luong et al., 2015) mechanism. and Peirson et al.(2018) focus on the behavior and Although there is not much work on automatic manifestation of a meme, not much importance meme generation, the task of meme generation can is given to the ideal of a meme. Their approach be closely aligned with tasks like Sentiment Anal- of image meme generation is limited to selecting ysis, Neural Machine Translation, Image Caption the most appropriate meme caption or generating Generation and Controlled Natural Language Gen- a meme caption for the given image and template eration. name. In this work, we intend to automatically In Natural Language Understanding (NLU), re- generate an image meme to represent a given input searchers have explored classifying a sentence sentence (Ideal) as illustrated in Figure 1, which is based on their sentiment (Socher et al., 2013). We a challenging NLP task with immediate practical extend this idea to classify a sentence based on its applications for online social interaction. compatibility with a meme template. The idea of By taking a deep look into the process of image creating an encoded representation and decoding meme generation, we propose to co-relate image it into a desired target is well establish in Neural meme generation to Natural Language Translation. Machine Translation and Image Caption Genera- To translate a sentence from a source to target lan- tion. Sutskever et al.(2014), Bahdanau et al.(2014) guage, one has to decode the meaning of the sen- and Vaswani et al.(2017) use an encoder-decoder tence in its entirety, analyze its meaning, and then model to encode and decode a sentence from a encode that meaning of the source sentence into source to a targeted language. Vinyals et al.(2015), the target sentence. Similarly, a sentence can be Xu et al.(2015), Karpathy and Fei-Fei(2015) en- translated into a meme by encoding the meaning of code the visual features of an image and use a the sentence into a pair of image and caption capa- decoder to generate natural language description ble of conveying the same meaning or emotion as of the image. Fang et al.(2020) generate natu- that of the sentence. Motivated by this intuition for ral language description containing common sense meme generation, we develop a model that oper- knowledge from the encoded visual inputs. ates beyond the known approaches and extend the Our proposed model of meme generation shares capability of image meme generation to generate similar spirit with the above mentioned problems memes for a given input sentence. We summarize where we encode the given input sentence into a our contributions as follows: latent space followed by decoding it into a meme • We present an end to end encoder-decoder caption that can be combined with the meme image architecture to generate an image meme for to convey the same meaning as that of the input Generated Meme Meme Generated Template Meme Predicted Image Caption Meme Template Softmax Linear Meme Template Add & Norm Add & Norm Selection Module Meme Feed Forward Feed Forward Template Name "Please save the Input Add & Norm Add & Norm world from Sentence Corona Virus" Multi-Head Multi-Head Attention Attention Template Embedding Parts-of-Speech Meme Add & Norm Add & Norm Tagger Embedding Multi-Head Masked Multi-Head Attention Attention POS Vector POS Mask Position Position Embedding Embedding Masked Input Output Caption Output Input Embedding (Shifted Right) Embedding Sentence Figure 3: memeBot - model architecture. For a given input sentence, a meme is created by combining the meme image selected by the template selection module and the meme caption generated by the caption generation trans- former. sentence. However, the generated meme caption 3 Our Approach should represent the input sentence through the se- lected meme template, making it a conditioned or In this section, we describe our approach: an end- controlled text generation task. Controlled Natu- to-end neural and probabilistic architecture for ral Language Generation with desired emotions, meme generation. Our model has two components. semantics and keywords have been studied pre- First, a meme template selection module to identify viously. Huang et al.(2018) generate text with a compatible meme template (image) for the input desired emotions by embedding emotion represen- sentence. Second, a meme caption generator as tations into Seq2Seq models. Hu et al.(2017) illustrated in Figure3. concatenate a control vector to the latent space 3.1 Meme Template Selection Module of their model to generate text with designated se- mantics. Su et al.(2018) and Miao et al.(2019) Pre-trained language representations from trans- generate a sentence with desired emotions or key- former based architectures like BERT (Devlin et al., words using sampling techniques. While these 2019), XLNet (Yang et al., 2019) and Roberta (Liu approaches of controlled text generation involve et al., 2019) are being used in a wide range of Nat- relatively complex conditioning factors, we imple- ural Language Understanding tasks. Devlin et al.