A Neural Framework for Analyzing the Tweets of dril
Grant Storey
Cornell University
[email protected]

Abstract

Natural Language Processing is a major field within computer science research, but by definition it limits itself to common human language. Many interesting linguistic phenomena can be found in unnatural, absurd language, especially of the sort common in the twitterverse. In this work, we present the first scholarly study of the tweets of Twitter user dril. We gather a new dataset containing the text of dril tweets and associated absurdity rankings collected from mechanical turkers. Then, we suggest state-of-the-art neural techniques to perform the given classification task on these tweets. We report promising experimental results that suggest wider value for this specific task.

1 Introduction

Vast scholarly time and effort has been focused on solving various problems in Natural Language Processing, the sub-field of Computer Science which focuses on natural language processing. In spite of this, very little attention has been paid to linguistic corpora which are not strictly natural language but are quite similar to it: pig latin, gibberish, random yelling, 8-year-olds on Youtube, and wild Twitter accounts. Our work seeks to begin remedying this gap.

For our first task, we hope to analyze the absurdity of tweets from the Twitter user dril. No corpora concerning these tweets exist, so we utilize mechanical turkers to create a new dataset and curate it for optimal usability. We then utilize modern neural network techniques for text analysis to create a model for performing the analysis task given a tweet. Using this model on our dataset gives promising results.

Our contributions are threefold:

1. We collect and curate a new dataset for analyzing the absurdity of dril tweets.

2. We propose a state-of-the-art neural model for performing this analysis task.

3. We analyze applications for this research and directions for future work.

2 Related Work

Some prior work in Emoji language, including Barbieri et al.'s skip-gram semantic model (Barbieri et al., 2016) but also other analyses (Miller et al., 2016; Kelly and Watts, 2015), is tangentially related in analyzing slightly different forms of communication found on social media, often detached from natural human language.

Parker's work on echoes of Cthulhu's primal bellows seeks to provide structure to unnatural language, but is mostly focused on waveform analysis and is primarily concerned with non-human language (Parker, 1999). Our work, in contrast, seeks to study language originating from this planet.

Lastly, studies on train conductor announcements (Yarini and Fubata, 2002) and blackout frat boy heckling (Johnson, 2011) present analyses of edge cases of human language, but these works focus more on translating less coherent language into natural language than on analyzing the alternative languages in their own right.

3 Data

To the best of our knowledge, there is no prior work on analyzing absurd tweets. In order to make this task feasible, we collect a new dataset of dril's tweets, the Dril Tweet Foundational Dataset (DTF Dataset). We gathered all 7519 of dril's tweets, from the first tweet on September 15th, 2008 to 12:30 PM (EST) on March 4th, 2018. An example tweet is presented in Figure 1.

Figure 1: An example of one of dril's tweets.

We then gathered mechanical turkers to perform our labeling by harassing them via Facebook Messenger.
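For concreteness, one way a labeled tweet could be stored is as one JSON object per line. The sketch below is our own illustration (the field names and values are assumptions, not the actual DTF Dataset schema); it loads such records and attaches each tweet's median rating, the aggregation used for this dataset:

```python
import json
from statistics import median

# Hypothetical JSON-lines records; field names and contents are
# illustrative only, not the actual DTF Dataset schema.
raw_lines = [
    '{"text": "example tweet one", "ratings": [4, 5, 4]}',
    '{"text": "example tweet two", "ratings": [1, 2, 2]}',
]

def load_labeled_tweets(lines):
    """Parse records and pair each tweet with its median absurdity score."""
    labeled = []
    for line in lines:
        record = json.loads(line)
        labeled.append((record["text"], median(record["ratings"])))
    return labeled

print(load_labeled_tweets(raw_lines))
# [('example tweet one', 4), ('example tweet two', 2)]
```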
Turkers were asked to rate tweets on an absurdity scale of 1-5, with 1 being reasonably normal and 5 being completely unhinged. For each score from 1 to 5, users were given an example tweet with that score to calibrate their analysis. They were then presented with 50 tweets and 1 check (to ensure they were paying attention). These 50 initial tweets were the same for each user. After completing the 50 tweets, users could optionally continue and rate more random tweets in batches of 50.

A total of 5 users performed the task over the course of 10 days. All users passed every attention task, and 1 user completed more than the initial 50 tweets (they rated 108 tweets total, including the initial 50).

After gathering the data, we assigned a score to each rated tweet by choosing the median score provided to it. The Cronbach's alpha for the dataset was 0.8009; since a score above 0.8 represents "good" quality agreement, the amount of inter-annotator agreement was quite "good".

We then tokenized the text of each tweet so that punctuation was treated as its own token.

Figure 2: Illustration of the model architecture.

4 Method

Our model is designed to solve the task at hand: given a sequence of tokens representing the tweet, predict an absurdity score from the set {1, 2, 3, 4, 5}. We begin with preprocessing, feed the tokens through an LSTM, and then choose the most likely label for the given tweet.

4.1 Preprocessing

We analyze the entire data set and replace every token that appears only once with the special token <UNK> (unknown). 240 tokens appear at least twice and are mapped to a unique embedding, while the remaining 820 tokens (which appear once each) are mapped to the <UNK> embedding. We do not use pre-trained embeddings because many of the words used are misspellings or nonsense not present in any pre-trained embeddings.

4.2 Neural Model

See Figure 2 for an illustration of the model. Following prior work in Natural Language Processing, we use a Recurrent Neural Network with Long Short-Term Memory (Hochreiter and Schmidhuber, 1997) for analysis of the text. We first extract the 50-dimensional embedding associated with each of the tokens in the tweet.
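The rare-token replacement of Section 4.1 can be sketched as follows (a minimal illustration with our own function and example data, not the authors' implementation):

```python
from collections import Counter

def replace_rare_tokens(tokenized_tweets):
    """Map every token that appears only once across the corpus to <UNK>."""
    counts = Counter(tok for tweet in tokenized_tweets for tok in tweet)
    kept = {tok for tok, c in counts.items() if c >= 2}  # seen at least twice
    return [[tok if tok in kept else "<UNK>" for tok in tweet]
            for tweet in tokenized_tweets]

# Toy corpus (our own example, not dril data): "the" and "mayor" appear
# twice and keep their own embeddings; all singletons collapse to <UNK>.
corpus = [["the", "mayor", "is", "a", "ghoul"],
          ["the", "mayor", "lies"]]
print(replace_rare_tokens(corpus))
# [['the', 'mayor', '<UNK>', '<UNK>', '<UNK>'], ['the', 'mayor', '<UNK>']]
```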
Each tokenized tweet is paired with its median absurdity score across all raters. Finally, we divided our 108 data points into Training, Validation, and Test sets of size 88/10/10 respectively. The DTF Dataset can be found at http://www.grantstorey.com/SIGSEGV/2018/DTF.zip.

These embeddings are fed into a two-layer LSTM to generate a representation of the sentence. This representation is passed through a hidden layer with 128 dimensions to generate 5 output dimensions, which are in turn passed through a SOFTMAX layer to generate probabilities for each of the possible labels. The label with the highest probability is chosen, and the associated negative log-loss is calculated and back-propagated through the network. In general, the model can be described by the following equations:

A = P(pa + re * n * t) / l(y)
I = Ne * Ed
Equ_t = A + I_{0,N} * ~s
Li[k] ± = T(H_i * ~s)

4.3 Parameters

We trained our model for 150 epochs, with a batch size of 1, and beam search with a beam size large enough to obliterate Alderaan in one shot and crush any hope for the rebellion with our display of unstoppable military might.

4.4 Model Choice

We examined the model's performance on the validation set at each epoch, and chose the model from the last epoch with 100% training accuracy before validation accuracy began to go down: Epoch 26 in this case. See Figure 3 for the training and validation accuracy by epoch.

Figure 3: Training and validation accuracy by epoch.

5 Results and Discussion

The results for our model on the validation and test sets are compared to the performance of the majority-class baseline (which always chooses the most common label in the training data) in Table 1.

Model                      Validation Acc   Test Acc
Majority-Class Baseline    50%              50%
Our Model                  50%              30%

Table 1: Comparison on the Validation and Test sets of the performance of a baseline and our model.

The majority-class baseline clearly outperforms our model. Nevertheless, we consider this to be a positive result. The failure of state-of-the-art methods to solve this problem shows that it is a difficult task, and likely unsolvable by computational methods.

In fact, this difficulty is quite promising, as this task (or other similar tasks) could be used to prevent dangerous artificial intelligences from posing a threat to humanity by distracting them with unsolvable problems. Other problems in a similar class include convincing young children to sit still for an entire day, determining why kids love Cinnamon Toast Crunch, and finding meaning in an uncaring universe.

6 Conclusion

We set out to tackle a new task: determining the absurdity of dril tweets. We collected a robust dataset to allow evaluation of models on this task. Our proposed model was then applied to this task and found to have worse performance than a simple baseline. We use this result to argue that this task can stand between humanity and the robot apocalypse by presenting an unsolvable task for potentially dangerous Artificial Intelligences.

Acknowledgments

We would like to thank the brave souls who contributed to our Tweet dataset for their valuable contributions. We would also like to thank the world for providing us with dril, a national treasure. This work was generously funded by grant EM-45209 from Elon Musk's Artificial Stupidity initiative, with additional funding provided through Aperture Labs' W.H.E.A.T.L.E.Y.

References

Smyth Parker. 1999. Interpreting sounds of darkness with complex analysis. Studies in Necromancy, Voidwalking, Shadowcasting, and Other Forbidden Arts.

Sheila Yarini and Alice Fubata. 2002. Hrmnrm hmr next stop frmhrm: Translating new york subway announcements into english. In Trains! Trains! Trains!.