Neuramanteau: A Neural Network Ensemble Model for Lexical Blends
Kollol Das, Independent Researcher, Bangalore, India ([email protected])
Shaona Ghosh, University of Cambridge, Department of Engineering, Cambridge, UK ([email protected])

Abstract

The problem of blend formation in generative linguistics is interesting in the context of neologisms, their quick adoption in modern life, and the creative generative process guiding their formation. Blend quality depends on a multitude of factors with high degrees of uncertainty. In this work, we investigate whether modern neural network models can sufficiently capture and recognize the creative blend composition process. We propose recurrent neural network sequence-to-sequence models that are evaluated on multiple blend datasets available in the literature. We propose an ensemble neural and hybrid model that outperforms most of the baseline and heuristic models upon evaluation on test data.

1 Introduction

Blending is the formation and intentional coinage of new words from two or more existing words (Gries, 2004). These new words are called neologisms. Neologisms effectively trace changing cultures and the addition of new technologies, and blending is one way to create a neologism. A lexical blend is formed by combining parts of two or more words. Predicting a high-quality lexical blend is difficult due to the uncertainty in the formal structure of blends (Beliaeva, 2014). Merging the source words digital and camera, the common expectation is the blend digamera, although, in practice, digicam is more common (Beliaeva, 2014). There are multiple unknown factors, such as phonology, semantics, familiarity, recognizability and lexical creativity, that contribute to blend formation. Downstream applications that can leverage such blending systems include generating names of products, brands, businesses and advertisements, especially if coupled with contextual information about the business or sector.

Word1       | Word2       | Blend          | Structure        | Category                | Coverage
aviation    | electronics | avionics       | avi-onics        | Prefix + Suffix         | 16.74%
communicate | fake        | communifake    | communi-fake     | Prefix + Word           | 10.78%
speak       | typo        | speako         | speak-o          | Word + Letter           | 0.14%
west        | indies      | windies        | w-indies         | Letter + Word           | 0.89%
point       | broadcast   | pointcast      | point-cast       | Word + Suffix           | 22.56%
scientific  | fiction     | scientifiction | scienti-fic-tion | Word + Word overlap     | 22.56%
affluence   | influenza   | affluenza      | af-fluen-za      | Prefix + Suffix overlap | 13.98%
brad        | angelina    | brangelina     | br-a-ngelina     | Prefix + Word overlap   | 11.39%
subvert     | advertising | subvertising   | sub-vert-ising   | Word + Suffix overlap   | 16.31%

Table 1: Sample blends in our dataset along with their type and coverage. There are other types of rare blends that are beyond the scope of this work.

1.1 Related Work

Blends are compositional words consisting of a whole word and a splinter (part of a morpheme), or of two splinters (Lehrer, 2007). This creative process of blend formation has been studied by linguists in an attempt to recognize patterns that model human lexical creativity, or to identify source words and blend meaning, context and influence. With the popularity of deep neural networks (DNNs) (LeCun et al., 2015), we are interested in whether neural network models can be leveraged in a generative capacity to sufficiently explore or formalize the process of blend formation. Blends can be formed in several closely related, morphologically productive ways, as shown in Table 1. Our work targets blends that are ordered combinations of a prefix of the first word and a suffix of the second word. Examples include avionics (prefix + suffix), vaporware (word + suffix), robocop (prefix + word) and carmageddon (overlap). Several theories have been put forward as to the structure and mechanism of blending (Gries, 2012), without much consensus (Shaw et al., 2014). Implementation work on blenders and generators is quite sparse in the literature and mainly concerns explicit blending of vector embeddings.

Naive implementations of generators exist on the internet [1, 2, 3]; these are simply lists of all possible blend combinations (a minimal version is sketched at the end of this subsection). Some other work is based on greedy heuristics for better results [4, 5]. Andrew et al. (2014) built a statistical word blender as part of a password generator. Ozbal (2012) uses a combination of edit distances and phonetic metrics, and Pilichowski (2013) uses a similar technique. Our work empirically captures some of these mechanisms from the blend datasets used in the literature using neural networks. Numerous studies on deterministic rules for blend formation (Kelly, 1998; Gries, 2004, 2006, 2012; Bauer, 2012) do not find consensus on the blending process, mainly due to the 'human factor' involved in designing rules.

The closest related work uses novel multitape Finite State Transducers (FSTs) for blend creation (Deri et al., 2015). Our model is a neural model, different from the multitape FST model; the multitape FST model is similar to the baseline heuristic model with which we compare the neural model proposed in this paper. The primary benefit of our model is that it attempts to arrive at a consensus among various neural experts in the generative process of new blend creation.

[1] http://www.dcode.fr/word-contraction-generator
[2] http://werdmerge.com/
[3] http://www.portmanteaur.com/
[4] https://www.namerobot.com/namerobot/name-generators/name-factory/merger.html
[5] http://www.namemesh.com/company-name-generator
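The function below is a minimal illustration of that exhaustive strategy; it is our own sketch (the function name and parameters are not taken from any of the cited generators) and simply pairs every non-empty prefix of the first word with every non-empty suffix of the second.

```python
def naive_blends(word1, word2, min_len=3):
    """Enumerate every blend formed by a non-empty prefix of word1
    followed by a non-empty suffix of word2 (the exhaustive-list
    strategy of simple online generators)."""
    blends = []
    for i in range(1, len(word1) + 1):        # prefix length taken from word1
        for j in range(1, len(word2) + 1):    # suffix length taken from word2
            candidate = word1[:i] + word2[-j:]
            if len(candidate) >= min_len and candidate not in (word1, word2):
                blends.append(candidate)
    return blends

# The attested blends appear somewhere in the (long) candidate list:
print("avionics" in naive_blends("aviation", "electronics"))   # True
print("workoholic" in naive_blends("work", "alcoholic"))       # True
```

The candidate list grows quadratically with the word lengths and contains mostly implausible strings, which is exactly why such generators need ranking heuristics or, as proposed here, a learned model.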
1.2 Contributions

We summarize the main contributions of our work as follows:

1. We propose an ensemble neural network model to learn compositional lexical blends from two given source words.
2. We generalize the problem of lexical blending by leveraging character-based sequence-to-sequence hybrid bidirectional models in order to predict the blends.
3. We release a blending dataset and a demo of our neural blend generator, along with open source software.

2 Neural Lexical Blending Model

2.1 Sequence-to-sequence Models

Sequence-to-Sequence (Seq2Seq) neural network models (Sutskever et al., 2014) are a generalization of the recurrent neural network (RNN) sequence learning paradigm (Cho et al., 2014), where a source sequence x_1, ..., x_n is mapped to a fixed-size vector using an RNN often serving as the encoder, and another RNN maps the vector to the target sequence y_1, ..., y_n, functioning as the decoder. However, RNNs struggle to train sufficiently on long-term dependencies; therefore, Long Short-Term Memory models (LSTMs) and Gated Recurrent Units (GRUs) (Cho et al., 2014) are more common for such sequence-to-sequence learning.

2.2 Bidirectional Seq2Seq Model

We propose a bidirectional forward and reverse GRU encoder-decoder model for our lexical blends. To this end, both the encoder and the decoder are bidirectional, i.e., they see the ordered input source words in both the forward and the backward direction. The motivation is that, in order to sufficiently capture the splinter point, there should be a dependency on the neighbouring characters in both directions. Figure 1 shows the bidirectional Seq2Seq model that we propose.

[Figure 1: Architectural diagram of the bidirectional hybrid encoder-decoder model.]

Since every blend has two source words, each a sequence of characters, the input to our model is the two source words concatenated with a space and padded to align with the longest concatenated example. For example, in Figure 1, the source words work and alcoholic are concatenated as cilohocla krow and work alcoholic for the encoder and decoder respectively, which subsequently get reversed to work alcoholic and cilohocla krow for the reverse encoder and decoder units. The representation of a sequence of characters in the input pair of source words is the concatenation of the fixed-dimensional character-to-vector representations. The model's prediction corresponding to the two source words in the input is an order-preserving binary output ŷ, with one label in {0, 1} per character of the concatenated input; without padding, the prediction on the blended word is 11110000111111 for the target workoholic. The order is enforced implicitly in the concatenated inputs. The indices that are predicted 1 are the characters in the concatenated input that are included in the blend.
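To make this target encoding concrete, the helper below is our own illustrative sketch (the function name and the assumption that the separating space is always labelled 0 are ours); it reconstructs the per-character 0/1 labels for a prefix + suffix blend.

```python
def inclusion_labels(word1, word2, blend):
    """Return per-character 0/1 inclusion labels for the concatenated
    input "<word1> <word2>", assuming the blend is a prefix of word1
    followed by a suffix of word2 (the blend types targeted in this
    work). Returns None if no such split exists."""
    for i in range(len(word1), 0, -1):          # try the longest prefix of word1 first
        j = len(blend) - i                      # implied suffix length of word2
        if 0 < j <= len(word2) and word1[:i] + word2[len(word2) - j:] == blend:
            labels = ["1"] * i + ["0"] * (len(word1) - i)
            labels += ["0"]                     # the separating space is never kept
            labels += ["0"] * (len(word2) - j) + ["1"] * j
            return "".join(labels)
    return None

print(inclusion_labels("work", "alcoholic", "workoholic"))
# -> 11110000111111  (work kept entirely, "alc" dropped, "oholic" kept)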
The forward and reverse hidden states h^f_t and h^r_t of the encoder at time t are given by:

    h^f_t = GRU^f_enc(h^f_{t-1}, x_{t_r})    (1)

    h^r_t = GRU^r_enc(h^r_{t+1}, x_{t_r})    (2)

where GRU^f_enc and GRU^r_enc stand for the forward and reverse GRU encoder units described by Cho et al. (2014), and t_r = T - t + 1. Similarly, the forward y^f and reverse y^r intermediate outputs of the decoder are given by:

    y^f_t = GRU^f_dec(y^f_{t-1}, x_t)    (3)

    y^r_t = GRU^r_dec(y^r_{t+1}, x_t)    (4)

where y^f_0 = h^f_T and y^r_{T+1} = h^r_1. The final decoder output is the result of applying a smooth non-linear sigmoid transformation to the GRU outputs:

    z_t = σ(W_f y^f_t + U_r y^r_t)    (5)

where z_t is the character-wise probability of inclusion in the blend. We apply a sigmoid loss on z for training.

2.3 Ensemble Hybrid Bidirectional Seq2Seq Model

We propose an ensemble of the vanilla bidirectional encoder-decoder model discussed in Section 2.2 in order to capture different representations of the source words, of the hidden state, and a variety of blends. Blends can inherently vary in the structure of their formation, resulting in multiple possible blend candidates. Each member, or expert, in our ensemble has the same underlying architecture as described in Section 2.2 and shown in Figure 1. However, the experts in the ensemble do not share model parameters, the motivation being to capture a wider range of parameter values and thus adequately handle the variation and multiplicity in the blending process.
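As a rough PyTorch rendering of how a single expert and the ensemble consensus could look, the sketch below is our own simplification under stated assumptions, not the authors' released implementation: the hidden sizes, the use of one bidirectional nn.GRU per side, the way the encoder summary initialises the decoder, and the plain averaging of expert probabilities are all assumptions.

```python
import torch
import torch.nn as nn

class BlendExpert(nn.Module):
    """One ensemble member: character embeddings feed a bidirectional GRU
    encoder and a bidirectional GRU decoder; a sigmoid over a linear map of
    the concatenated forward/reverse decoder states gives the per-character
    probability of inclusion in the blend (in the spirit of Eqs. 1-5)."""

    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.GRU(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, 1)   # combines forward and reverse states

    def forward(self, char_ids):
        # char_ids: (batch, T) indices of the concatenated "<word1> <word2>" input
        x = self.embed(char_ids)                       # (batch, T, emb_dim)
        _, h_enc = self.encoder(x)                     # (2, batch, hidden_dim)
        y, _ = self.decoder(x, h_enc)                  # decoder initialised with encoder summary
        # Training would minimise a binary cross-entropy (sigmoid) loss between
        # these probabilities and the 0/1 inclusion labels.
        return torch.sigmoid(self.out(y)).squeeze(-1)  # (batch, T) inclusion probabilities


def ensemble_predict(experts, char_ids):
    """Average the per-character inclusion probabilities of independently
    parameterised experts (a simple consensus rule; the paper's exact
    aggregation may differ)."""
    with torch.no_grad():
        probs = torch.stack([expert(char_ids) for expert in experts], dim=0)
    return probs.mean(dim=0)
```

Thresholding the averaged probabilities at 0.5 and keeping the characters labelled 1 then yields a predicted blend string, following the encoding described in Section 2.2.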