How to Make a Frenemy: Multitape FSTs for Portmanteau Generation
Aliya Deri and Kevin Knight
Information Sciences Institute, Department of Computer Science
University of Southern California
{aderi, knight}@isi.edu

Abstract

A portmanteau is a type of compound word that fuses the sounds and meanings of two component words; for example, “frenemy” (friend + enemy) or “smog” (smoke + fog). We develop a system, including a novel multitape FST, that takes an input of two words and outputs possible portmanteaux. Our system is trained on a list of known portmanteaux and their component words, and achieves 45% exact matches in cross-validated experiments.

1 Introduction

Portmanteaux are new words that fuse both the sounds and meanings of their component words. Innovative and entertaining, they are ubiquitous in advertising, social media, and newspapers (Figure 1). Some, like “frenemy” (friend + enemy), “brunch” (breakfast + lunch), and “smog” (smoke + fog), express such unique concepts that they permanently enter the English lexicon.

[Figure 1: A New Yorker headline portmanteau.]

  W1         W2         PM
  affluence  influenza  affluenza
  anecdote   data       anecdata
  chill      relax      chillax
  flavor     favorite   flavorite
  guess      estimate   guesstimate
  jogging    juggling   joggling
  sheep      people     sheeple
  spanish    english    spanglish
  zeitgeist  ghost      zeitghost

Table 1: Valid component words and portmanteaux.

Portmanteau generation, while seemingly trivial for humans, is actually a combination of two complex natural language processing tasks: (1) choosing component words that are both semantically and phonetically compatible, and (2) blending those words into the final portmanteau. An end-to-end system able to generate novel portmanteaux with minimal human intervention would be not only a useful tool in areas like advertising and journalism, but also a notable achievement in creative NLP.

Due to the complexity of both component word selection and blending, previous portmanteau generation systems have several limitations. The Nehovah system (Smith et al., 2014) combines words only at exact grapheme matches, making the generation of more complex phonetic blends like “frenemy” or “brunch” impossible. Özbal and Strapparava (2012) blend words phonetically and allow inexact matches, but rely on encoded human knowledge, such as sets of similar phonemes and semantically related words. Both systems are rule-based rather than data-driven, and neither is trained or tested on real-world portmanteaux.

In contrast to these approaches, this paper presents a data-driven model that accomplishes (2) by blending two given words into a portmanteau. That is, with an input of “friend” and “enemy,” we want to generate “frenemy.”

We take a statistical modeling approach to portmanteau generation, using training examples (Table 1) to learn weights for a cascade of finite state machines. To handle the 2-input, 1-output problem inherent in the task, we implement a multitape FST. This work's contributions can be summarized as:

- a portmanteau generation model, trained in an unsupervised manner on unaligned portmanteaux and component words,
- the novel use of a multitape FST for a 2-input, 1-output problem, and
- the release of our training data (available at both authors' websites).

2 Definition of a portmanteau

In this work, a portmanteau PM and its pronunciation PM_pron have the following constraints:

- PM has exactly 2 component words W1 and W2, with pronunciations W1_pron and W2_pron.
- All of PM's letters are in W1 and W2, and all phonemes in PM_pron are in W1_pron and W2_pron.
- All pronunciations use the Arpabet symbol set.
- Portmanteau building occurs at the phoneme level. PM_pron is built through the following steps (further illustrated in Figure 2; a small enumeration of this process is sketched in code after the figure):

1. 0+ phonemes from W1_pron are output.
2. 0+ phonemes from W2_pron are deleted.
3. 1+ phonemes from W1_pron are aligned with an equal number of phonemes from W2_pron. For each aligned pair of phonemes (x, y), either x or y is output.
4. 0+ phonemes from W1_pron are deleted, until the end of W1_pron.
5. 0+ phonemes from W2_pron are output, until the end of W2_pron.

[Figure 2: Derivations for friend + enemy → “frenemy” (F1 R1 EH3 N3 D4 over EH3 N3 AH5 M5 IY5) and tofu + turkey → “tofurkey” (T1 OW1 F1 UW3 over T2 ER3 K5 IY5). Subscripts indicate the step applied to each phoneme.]
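To make the five-step construction concrete, the following is a minimal Python sketch, ours rather than the authors' code: it exhaustively enumerates every pronunciation that steps 1-5 license for a pair of Arpabet phoneme sequences. The function name and list-of-symbols representation are our own; the paper encodes this process as a weighted FST (Section 3) rather than by explicit enumeration.

```python
from itertools import product

def candidate_pronunciations(w1, w2):
    """Enumerate all PM_pron candidates licensed by steps 1-5.

    w1, w2: pronunciations W1_pron and W2_pron as lists of Arpabet symbols.
    """
    n1, n2 = len(w1), len(w2)
    for i in range(n1 + 1):          # step 1: output w1[:i]
        for j in range(n2 + 1):      # step 2: delete w2[:j]
            # step 3: align k >= 1 phonemes of w1 with k phonemes of w2
            for k in range(1, min(n1 - i, n2 - j) + 1):
                pairs = zip(w1[i:i + k], w2[j:j + k])
                # each aligned pair (x, y) outputs either x or y; then
                # steps 4-5: delete the rest of w1, output the rest of w2
                for mid in product(*[(x, y) for x, y in pairs]):
                    yield w1[:i] + list(mid) + w2[j + k:]

friend = "F R EH N D".split()
enemy = "EH N AH M IY".split()
# the Figure 2 derivation of "frenemy" is among the candidates
assert "F R EH N AH M IY".split() in candidate_pronunciations(friend, enemy)
```

The candidate space is large even for short words, which is why the model weights these choices (Section 6) and ranks candidates rather than enumerating them exhaustively.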
3 Multitape FST model

Finite state machines (FSMs) are powerful tools in NLP and are frequently used in tasks like machine transliteration and pronunciation. Toolkits like Carmel and OpenFST allow rapid implementation of complex FSM cascades, machine learning algorithms, and n-best lists.

Both toolkits implement two types of FSMs: finite state acceptors (FSAs) and finite state transducers (FSTs), as well as their weighted counterparts (wFSAs and wFSTs). An FSA has one input tape; an FST has one input tape and one output tape.

What if we want one input tape and two output tapes for an FST? Or three input tapes for an FSA? Although infrequently explored in NLP research, these “multitape” machines are valid FSMs.

In the case of converting {W1_pron, W2_pron} to PM_pron, an interleaved reading of the two tapes would be impossible with a traditional FST. Instead, we model the problem with a 2-input, 1-output FST (Figure 3). Edges are labeled x : y : z to indicate input tapes W1_pron and W2_pron and output tape PM_pron, respectively.

[Figure 3: A 2-input, 1-output wFST for portmanteau pronunciation generation. Each step k of Section 2 corresponds to a state qk with an associated phoneme-action state qka; edge labels x : y : z read from W1_pron and W2_pron and write to PM_pron, with empty slots where a tape is not used.]

4 FSM Cascade

We include the multitape model as part of an FSM cascade that converts W1 and W2 into PM (Figure 4).

We first generate the pronunciations of W1 and W2 with FST A, which functions as a simple look-up from the CMU Pronouncing Dictionary (Weide, 1998). Next, wFST B, the multitape wFST from Figure 3, translates W1_pron and W2_pron into PM_pron. wFST C, built from aligned graphemes and phonemes from the CMU Pronouncing Dictionary (Galescu and Allen, 2001), spells PM_pron as PM'.

To improve PM', we then use three FSAs built from W1 and W2. The first, wFSA D, is a smoothed “mini language model” that strongly prefers letter trigrams from W1 and W2 (a trigram-scorer analog is sketched after Figure 4). The second and third, FSA E1 and FSA E2, accept all inputs except W1 and W2, so that the output is not simply a component word.

[Figure 4: The FSM cascade for converting W1 and W2 into a PM: W1, W2 → FST A → W1_pron, W2_pron → wFST B → PM_pron → wFST C → PM' → wFSA D → PM'' → FSA E1,2 → PM'''. Illustrative example: jogging (JH AH G IH NG) + juggling (JH AA G AH L IH NG) → PM_pron JH AH G AH L IH NG → candidate spellings joggaling, joggling, juggling → joggling.]
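The following sketch is a rough, hypothetical analog of wFSA D, ours rather than the paper's: instead of building a weighted acceptor, it scores a candidate spelling by smoothed letter-trigram statistics of the two component words. The smoothing constant alpha and the 27-symbol alphabet size are illustrative assumptions.

```python
import math
from collections import Counter

def mini_lm(w1, w2, alpha=0.01, alphabet_size=27):
    """Smoothed letter-trigram scorer built from the component words,
    approximating wFSA D's preference for trigrams from W1 and W2."""
    tri, bi = Counter(), Counter()
    for w in (w1, w2):
        s = "^^" + w + "$"                    # pad with boundary symbols
        for a, b, c in zip(s, s[1:], s[2:]):
            tri[a + b + c] += 1
            bi[a + b] += 1

    def logprob(candidate):
        s = "^^" + candidate + "$"
        return sum(
            math.log((tri[a + b + c] + alpha) /
                     (bi[a + b] + alpha * alphabet_size))
            for a, b, c in zip(s, s[1:], s[2:]))

    return logprob

score = mini_lm("jogging", "juggling")
# spellings whose trigrams all come from the component words score higher
assert score("joggling") > score("jogglxng")
```

In the actual cascade this preference is encoded as a wFSA so that it can be composed with the other machines.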
5 Data

We obtained examples of portmanteaux and component words from Wikipedia and Wiktionary lists (Wikipedia, 2013; Wiktionary, 2013). We reject any that do not satisfy our constraints: for example, portmanteaux with three component words (“turkey” + “duck” + “chicken” → “turducken”) or without any overlap (“arpa” + “net” → “arpanet”). From 571 examples, this yields 401 {W1, W2, PM} triples. We also use manual annotations of PM_pron for learning the multitape wFST B weights and for mid-cascade evaluation.

We randomly split the data for 10-fold cross-validation. For each iteration, 8 folds are used for training, 1 for dev, and 1 for test. Training data is used to learn wFST B weights (Section 6), and dev data is used to learn reranking weights (Section 7).

6 Training

FST A is unweighted and wFST C is pretrained; wFSA D and FSA E1,2 are built at runtime. We therefore only need to learn the wFST B weights, which we can reduce to weights on the transitions qk → qka and q3a → q3 from Figure 3. The weights qk → qka represent the probability of each step, or P(k). The weights q3a → q3 represent the probability of generating phoneme z from input phonemes x and y, or P(x, y → z). Learned values are shown in Tables 2 and 3; a toy scorer that applies them to a single derivation is sketched after the tables.

  phonemes       P(x, y → z)
  x   y   z      cond.   joint   mixed
  AA  AA  AA     1.000   0.017   1.000
  AH  ER  AH     0.424   0.007   0.445
  AH  ER  ER     0.576   0.009   0.555
  P   B   P      0.972   0.002   1.000
  P   B   B      0.028   N/A     N/A
  Z   SH  SH     1.000   N/A     N/A
  JH  AO  JH     1.000   N/A     N/A

Table 2: Sample learned phoneme alignment probabilities for each method.

  step k  description     P(k)
  1       W1_pron keep    0.68
  2       W2_pron delete  0.55
  3       align           0.74
  4       W1_pron delete  0.64
  5       W2_pron keep    0.76

Table 3: Learned step probabilities. The probabilities of keeping and aligning are higher than those of deleting, showing a tendency to preserve the component words.

  model   % exact        avg. dist.    % 1k-best
          dev    test    dev    test   dev    test
  cond    28.9   29.9    1.6    1.6    92.0   91.2
  joint   44.6   44.6    1.5    1.5    91.0   89.7
  mixed   31.9   33.4    1.6    1.5    92.8   91.0
  rerank  51.4   50.6    1.2    1.3    93.1   91.5

Table 4: PM results pre- and post-reranking.

  W1            W2         gold PM    hyp. PM
  affluence     influenza  affluenza  affluenza
  architecture  ecology    arcology   architecology
  chill         relax      chillax    chilax
  friend        enemy      frenemy    frienemy
  japan         english    japlish    japanglish
  jeans         shorts     jorts      js
  jogging       juggling   joggling   joggling

Table 5: Component words with gold and hypothesized portmanteaux.
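To show how these weights are used, here is a simplified derivation scorer, our sketch rather than the paper's implementation: it charges P(k) for every phoneme action at step k and P(x, y → z) for every step-3 alignment, using the step probabilities of Table 3 and mixed-column alignment values in the style of Table 2. The real wFST B also carries the complementary probabilities of advancing past each state, which this sketch omits, and the P_ALIGN entries for identical phonemes are assumptions.

```python
import math

P_STEP = {1: 0.68, 2: 0.55, 3: 0.74, 4: 0.64, 5: 0.76}   # Table 3

# Alignment probabilities in the style of Table 2 (mixed column);
# identical phonemes are assumed to align with probability ~1.0.
P_ALIGN = {("EH", "EH", "EH"): 1.0, ("N", "N", "N"): 1.0,
           ("AH", "ER", "AH"): 0.445, ("AH", "ER", "ER"): 0.555}

def derivation_logprob(trace):
    """Score a derivation trace: a list of (step, x, y, z) actions,
    with None marking a tape slot the step does not use."""
    lp = 0.0
    for step, x, y, z in trace:
        lp += math.log(P_STEP[step])         # cost of one action at step k
        if step == 3:                        # alignments also pay P(x,y -> z)
            lp += math.log(P_ALIGN.get((x, y, z), 1e-6))
    return lp

# the Figure 2 derivation of friend + enemy -> "frenemy"
frenemy = [(1, "F", None, "F"), (1, "R", None, "R"),
           (3, "EH", "EH", "EH"), (3, "N", "N", "N"),
           (4, "D", None, None),
           (5, None, "AH", "AH"), (5, None, "M", "M"), (5, None, "IY", "IY")]
print(derivation_logprob(frenemy))
```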