arXiv:2104.08620v1 [cs.CL] 17 Apr 2021

Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP

Josh Rozner, Christopher Potts, Kyle Mahowald

Abstract

Cryptic crosswords, the dominant English-language crossword variety in the United Kingdom, can be solved by expert humans using flexible, creative intelligence and knowledge of language. Cryptic clues read like fluent natural language, but they are adversarially composed of two parts: a definition and a wordplay cipher requiring sub-word or character-level manipulations. As such, they are a promising target for evaluating and advancing NLP systems that seek to process language in more creative, human-like ways. We present a dataset of cryptic crossword clues from a major newspaper that can be used as a benchmark, and we train a sequence-to-sequence model to solve them. We also develop related benchmarks that can guide development of approaches to this challenging task. We show that performance can be substantially improved using a novel curriculum learning approach in which the model is pre-trained on related tasks involving, e.g., unscrambling words, before it is trained to solve cryptics. However, even this curricular approach does not generalize to novel clue types in the way that humans can, and so cryptic crosswords remain a challenge for NLP systems and a potential source of future innovation.¹

¹ Code to download (from theguardian.com), clean, and process the dataset is publicly available at https://github.com/jsrozner/decrypt

1 Introduction

Modern computational models have made great progress at handling a variety of natural language tasks that require interpreting rich syntactic and semantic structures (Devlin et al., 2018; Radford et al., 2019; Manning et al., 2020; Rogers et al., 2020). However, in NLP (Bender and Koller, 2020; Marcus, 2020; Bisk et al., 2020), as in other areas of AI (Lake et al., 2016), machines still lag humans on tasks that require flexible problem solving, rapid learning of unseen tasks, and generalization to new domains. Just as complex games mastered by human experts, such as chess, Go, and video games, have proved a fertile ground for developing more flexible AI (Silver et al., 2018, 2016; Mnih et al., 2015), we propose that creative language games are a rich area for developing more flexible NLP models. In particular, we argue that linguistic tasks involving meta-linguistic reasoning (i.e., reasoning about language qua language) pose an important and significant challenge for state-of-the-art computational language systems.

One such domain is cryptic crossword puzzles. Cryptics are the most popular style of crossword in the United Kingdom and appear in major newspapers like The Times and The Guardian. They differ from American-style crossword puzzles in that they have both definition and wordplay components. Consider this NLP-centric cryptic crossword clue: “But everything’s really trivial, initially, for a transformer model (4).” The wordplay part is “But everything’s really trivial, initially.” The word initially in this context is used as an indicator, which tells the solver to take the initial letter of each of the preceding words (but everything’s really trivial) to get the answer word: “BERT”. The definition part of the clue is “a transformer model,” which is a semantically valid description of BERT. Because both the wordplay and the definition components lead to the same 4-letter word (which is what the enumeration calls for), we can be reasonably confident that BERT is the correct answer. Many clues require the application of multiple, potentially composed functions (i.e., the result of one transformation, like taking a synonym, becomes the input to the next), with multiple indicators, to solve the wordplay.

Figure 1: Illustration of how the cryptic crossword clue “But everything’s really trivial, initially, for a transformer model” is parsed. The word “initially” is an indicator to apply an initialism function to the string “But everything’s really trivial,” taking the first letter of each word to get BERT. The word “for” is an optional linking word (which connects the wordplay and the definition). The definition part is “a transformer model,” for which a valid answer is BERT. Because this matches the wordplay part and fits the enumeration (which indicates 4 letters), it is a valid answer.

| Clue type | Clue example | Explanation for this example |
|---|---|---|
| Anagram: An anagram indicator indicates the letters must be scrambled. | Confused, Bret makes a language model (4) | Rearrange the letters of “Bret” to get BERT. |
| Initialism: An initialism indicator indicates one must take the first letters of a phrase. | But everything’s really trivial, at first, for a language model (4) | Take the first letters of “But everything’s really trivial.” |
| Container: A container indicator indicates the answer is hidden within a larger phrase. | Language model in somber text (4) | Extract the word BERT from the phrase “somber text.” |
| Charade: For a charade clue, each part of the answer is clued sequentially. | A language model exist? Right time! (4) | “exist” becomes BE. A standard abbreviation for “right” is R. A standard crossword abbreviation for “time” is T. |
| Double definition: In a double definition clue, two synonyms or phrases appear next to each other, each of which can refer to the answer. | Model Sesame Street character (4) | Bert is a valid answer for “Sesame Street character”, and it is also a model. |

Table 1: Sample of five common clue types in cryptic crosswords, all demonstrating clues for the answer: “BERT”.

While cryptic crosswords pose a challenge to novice solvers, expert solvers draw on a combination of world knowledge, domain-specific cryptic crossword knowledge, linguistic knowledge, and general problem solving. Expert solvers know the rules that govern cryptic crosswords, but they also reason about them flexibly and apply them to solve novel clues. In the psychology literature, it has been claimed that cryptic crosswords depend on domain-general intelligence (Friedlander and Fine, 2020). Moreover, Friedlander and Fine (2018) cite cryptic crosswords as an ideal domain for studying the “aha moment” in humans because they can be solved by experts with high accuracy (but often not at first glance), can be easily created by expert humans, and require drawing on a diverse range of cognitive abilities.

Therefore, we believe that cryptic crosswords are an excellent domain for developing computational language systems that “learn and think like humans” (Lake et al., 2016). In particular, we argue that cryptic crossword clues pose an interesting and important challenge for modern NLP.

In this paper, we first present a dataset of cryptic crossword clues taken from theguardian.com, consisting of 142,380 cleaned clues from 5,518 puzzles over 21 years. Second, we present a series of computational benchmarks, along with a sequence-to-sequence approach that uses the pretrained T5 model, to characterize the problem space and to illustrate why cryptic crosswords pose a particular challenge for existing NLP approaches. Finally, we present a curriculum learning approach, in which our system is first trained on related tasks (e.g., an augmented word descrambling task) before being unleashed on real cryptic crossword clues. This approach meaningfully improves on our standard T5 sequence-to-sequence approach and on the best performing model in Efrat et al. (2021), concurrent work that presents a similar dataset and baseline T5 approach.

While we show that our sequence-to-sequence approach can learn interesting features of the problem space, including learning some meta-linguistic facts about word length, fully solving cryptic crosswords remains a challenge for NLP. In particular, we show that the Transformer-based approach suffers significantly when the test set is constructed so as to avoid having answer words that also appear in the training set. Moreover, although we introduce a novel method that considerably improves T5’s performance, we are still far below expert human performance. To that end, we hope that this dataset will serve as a challenge for future work.

2 Background and Related Work

2.1 Cryptic Crosswords

Standard cryptic crossword clues generally have two parts: a definition part and a wordplay part. What makes cryptics hard is that there is huge variety in the kind of wordplay that can occur. In the introduction, we discussed a type of cryptic clue that requires extracting the initial letters of a sequence of words. Another common type of wordplay is the anagram clue type, where a sequence of letters needs to be scrambled to find a new word. Anagrams are usually indicated by words like confused, drunk, mixed up, or any other word in the semantic space of motion, confusion, alteration, accident, etc. (It would be difficult to construct a list of all possible anagram indicators, but expert solvers learn to identify indicators of this type.) Clues also have an enumeration, which says how many words and letters are in the answer.

3 Dataset

We present a cleaned dataset of cryptic crossword clues drawn from puzzles published in The Guardian between July 23, 1999, and October 8, 2020.²

3.1 Preprocessing

To produce the clean dataset, we remove 15,360 clues that interact with other clues in the same puzzle, as well as 231 clues that are ill-formatted (e.g., have a solution length not matching the length enumeration, unparsed HTML characters, etc.). We further remove 1,611 exact duplicates, which are clues with identical answers and clue strings. Code to fully replicate our data download and preprocessing pipeline is available in our project code repository.

3.2 Dataset Properties
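To make two of the preprocessing filters from Section 3.1 concrete, the following is a minimal Python sketch of an enumeration-length check, an unparsed-HTML check, and exact-duplicate removal. It is an illustration only and not the pipeline in the project repository: the function names (`enumeration_lengths`, `matches_enumeration`, `looks_unparsed`, `clean_clues`), the `(clue, enumeration, answer)` tuple layout, and the specific HTML heuristic are all assumptions made for this example. The filter for clues that interact with other clues in the same puzzle is not sketched here.

```python
import html
import re
from typing import List, Tuple

Clue = Tuple[str, str, str]  # (clue text, enumeration, answer): hypothetical layout


def enumeration_lengths(enumeration: str) -> List[int]:
    """Parse an enumeration such as "(4)" or "(3,5)" into a list of lengths."""
    return [int(n) for n in re.findall(r"\d+", enumeration)]


def matches_enumeration(answer: str, enumeration: str) -> bool:
    """Check that each space- or hyphen-separated part of the answer has the stated length."""
    parts = re.split(r"[\s-]+", answer.strip())
    return [len(p) for p in parts] == enumeration_lengths(enumeration)


def looks_unparsed(clue: str) -> bool:
    """Assumed heuristic: leftover HTML entities or tags mark a clue as ill-formatted."""
    return html.unescape(clue) != clue or re.search(r"<[^>]+>", clue) is not None


def clean_clues(clues: List[Clue]) -> List[Clue]:
    """Drop ill-formatted clues, then exact duplicates (same clue string and answer)."""
    seen = set()
    kept = []
    for clue, enumeration, answer in clues:
        if looks_unparsed(clue) or not matches_enumeration(answer, enumeration):
            continue
        key = (clue, answer)
        if key in seen:
            continue
        seen.add(key)
        kept.append((clue, enumeration, answer))
    return kept


# Toy input built from the Table 1 examples; the bad rows are invented for illustration.
raw = [
    ("Confused, Bret makes a language model", "(4)", "BERT"),
    ("Confused, Bret makes a language model", "(4)", "BERT"),  # exact duplicate
    ("Language model in somber text", "(5)", "BERT"),          # length mismatch
    ("Model Sesame Street &amp; character", "(4)", "BERT"),    # unparsed HTML entity
    ("Model Sesame Street character", "(4)", "BERT"),
]
cleaned = clean_clues(raw)
print(len(cleaned))  # 2
```

Keying duplicates on the pair of clue string and answer mirrors the definition of exact duplicates given above; a clue string reused for a different answer would be kept under this sketch.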
