Building a Spoonerism Detection System for Vietnamese

Thai-Hoang Pham, Alt Inc, Hanoi, Vietnam
Xuan-Khoai Pham, City University of Hong Kong, Kowloon, Hong Kong

PACLIC 32: 32nd Pacific Asia Conference on Language, Information and Computation, Hong Kong, 1-3 December 2018. Copyright 2018 by the authors.

Abstract

This paper presents the first automatic system for Vietnamese spoonerism detection. By combining hand-crafted rules with a Vietnamese language model, our system, although simple, achieves a promising result for this task. The proposed method achieves an overall F1 score of 95.47% on the Vietnamese spoonerism dataset.

1 Introduction

Spoonerism is a linguistic phenomenon in which parts of two spoken words, including consonants, vowels, and tones, are switched to construct two other implied words. The term is named after William Archibald Spooner, who was famous for making spoonerisms. Spoonerism can be seen as a kind of wordplay used to entertain or to criticize.

People often use spoonerisms both in written form, such as poetry, and in oral form, such as music and folk tales. To make a spoonerism, they apply rules that switch parts of one word with parts of another to form words with an implied meaning. Figure 1 shows a well-known example of spoonerism in English.

Figure 1: Spoonerism in English

Spoonerism in Vietnamese has a long history. It is found in Vietnamese poetry from the 17th century and in traditional folk tales and songs. Speakers often use spoonerisms in mocking or sarcastic contexts where directness is avoided. They also use spoonerisms as a linguistic challenge in riddles and parallel sentences. Vietnamese, because of its linguistic characteristics, is very well suited to spoonerism: people only need to switch parts of surface words to form meaningful Vietnamese words. For this reason, spoonerism is widely used in practice by Vietnamese people.

Spoonerism detection is an important step in some natural language processing applications such as question answering and sentiment analysis. It provides useful information that helps computers understand human conversation more deeply. However, to the best of our knowledge, there is no research on building an automatic system for spoonerism detection.

In this paper, we present the first spoonerism detection system for the Vietnamese language, which uses hand-crafted rules together with a Vietnamese language model. Our system, although simple, achieves promising results for this task. In particular, it obtains an F1 score of 95.47% on the Vietnamese spoonerism dataset. Moreover, our approach does not require training data, which matters because collecting data for this task is difficult. We also release our Vietnamese spoonerism detection system and Vietnamese spoonerism dataset for research purposes (https://github.com/khoaipx/Vietnamese-Spoonerism), which we believe will contribute positively to the long-term advancement of Vietnamese language processing.

Algorithm 1: Vietnamese Spoonerism Detection

    Input: Vietnamese sentence
    Output: Type 1 and Type 2 spoonerisms
    Step 1: Extract the syllable pair set from the input sentence;
    Step 2: for each syllable pair do
                Decompose the syllable pair;
                Generate reverse forms by using spoonerism rules;
                for each reverse form do
                    Compare with other syllable pairs;
                    if matched then
                        Move these pairs from the syllable pair set to the Type 2 candidate set;
                    end
                end
            end
    Step 3: Resolve Type 2 conflicts;
    Step 4: Select syllable pairs that do not overlap with syllables in the Type 2 set;
    Step 5: for each syllable pair do
                for each reverse form do
                    Use the language model and dictionary to find the reverse form with the highest score;
                end
                if score > threshold then
                    Move this pair to the Type 1 candidate set;
                end
            end
    Step 6: Resolve Type 1 conflicts;

The remainder of this paper is structured as follows. Section 2 presents Vietnamese characteristics including vowels, consonants, and tones, the history of spoonerism in Vietnamese, and rules for making Vietnamese spoonerisms. Section 3 describes our spoonerism detection system for Vietnamese. Section 4 gives experimental results and discussion. Finally, Section 5 concludes the paper.

2 Background

2.1 Vietnamese Language Characteristics

Vietnamese is a Mon-Khmer language spoken by more than 100 million people. It is the official and national language of Vietnam. There are three main dialects of Vietnamese: the northern (used in national broadcasts in Vietnam), the central, and the southern dialects. Vietnamese words are formed from one, two, or more syllables, and syllables are separated by spaces. Phonologically, each syllable is a combination of initial and final consonants (both optional), a vowel, and a tone. The details of each component of Vietnamese syllables are described as follows.

Vowel. Vietnamese has a comparatively large number of vowels (monophthongs), diphthongs, and triphthongs. They are given in Table 1 below.

                Front          Center                  Back
    Centering   ia, iê, iêu    ưa, ươ, ươu, ươi        ua, uô, uôi
    Close       i, iu          ư, ưu, ưi               u, ui
    Mid         ê, êu          ơ, â, âu, ơi, ây        ô, ôi
    Open        e, eo          a, ă, ao, au, ai, ay    o, oi

Table 1: Vowels, diphthongs, and triphthongs in Vietnamese

Consonant. A Vietnamese consonant is written with one or more characters. It can appear at the beginning or the end of a Vietnamese syllable. Vietnamese consonants are given in Table 2 below.

                              Labial  Alveolar  Retroflex  Palatal  Velar   Glottal
    Nasal                     m       n                    nh       ng/ngh
    Stop         Tenuis       p       t         tr         ch       c/k/q
                 Glottalized  b       đ
                 Aspirated            th                            kh
    Fricative    Voiceless    ph      x         s                           h
                 Voiced       v       d         r          gi       g/gh
    Approximant               u/o     l                    y/i

Table 2: Consonants in Vietnamese

Tone. Each Vietnamese syllable has its own tone. The tone is marked on the vowel of the syllable. The tones are given in Table 3 below.

    Name    Description               Mark          Example
    ngang   mid level                 no mark       a
    sắc     high rising               acute         á
    huyền   low falling               grave         à
    hỏi     mid dipping-rising        hook above    ả
    ngã     high breaking-rising      tilde         ã
    nặng    low falling constricted   dot below     ạ

Table 3: Tones in Vietnamese

2.2 Spoonerism in Vietnamese

Spoonerism in Vietnamese differs from spoonerism in other languages: it occurs between two syllables instead of two words. Moreover, Vietnamese has two important characteristics that make it suitable for spoonerism. First, the boundary between syllables is very clear. Second, almost all consonants can be combined with vowels and tones to make meaningful syllables (Le and Ho, 2013). For these reasons, Vietnamese spoonerism has been used widely in both written and oral form for hundreds of years.

In Vietnamese literature, spoonerism is used mainly in poetry. Ho Xuan Huong, the female poet who lived in the eighteenth century, is one of the most well-known Vietnamese poets to use spoonerism in her works. She used spoonerism as an art form to express her opinions about the status of women, male authority, Buddhist practices, and the social order of her times (Macken and Nguyen, 2006; Nguyen, 2010a; Le, 2011). Figure 2 shows an example of spoonerism in one of Ho Xuan Huong's poems; the English translation is from Macken and Nguyen (2006).

Figure 2: Spoonerism in Ho Xuan Huong's poem

In oral forms such as classic folk tales and folk songs, spoonerism is often used to entertain or to criticize (Mai, 2010; Nguyen, 2010b; Bui, 2011; Tran, 2011; Le, 2012).
Figure 3 shows examples of spoonerism in oral form (Vu, 2016); the English translation is from Macken and Nguyen (2006).

Figure 3: Spoonerism in Vietnamese oral form. Panels: (a) a riddle, (b) a folk song, (c) a pair of parallel sentences.

2.3 Vietnamese Spoonerism Rules

There are several ways to make a spoonerism in Vietnamese. Basically, a spoonerism occurs when parts of one syllable are switched with parts of another syllable to form Vietnamese words with a different meaning. In particular, a Vietnamese syllable is a combination of consonants, a vowel, and a tone, so switching any of these components between two syllables is likely to produce a spoonerism. Spoonerisms also occur among three, four, or more consecutive syllables. Previous works presented several rules for making Vietnamese spoonerisms, but these rules are neither sufficient nor unified. Therefore, in this paper, we synthesize Vietnamese spoonerism rules that cover almost all cases in practice and present extra rules that cover some exceptional cases.

2.3.1 Spoonerism Rules for 2 Syllables

Rule #1: Switching Vowel and Final Consonant. In this case, the vowel and final consonant are swapped together, leaving the initial consonant and tone in place. Figure 4 describes a transformation applying rule #1 between two syllables, together with an example.

Figure 4: Rule #1 diagram and example. C, V, T, and S are Consonant, Vowel, Tone, and Syllable respectively.

Rule #3: Switching Tone. In this case, the tone is swapped, leaving the vowel and the initial and final consonants in place. Figure 6 describes a transformation applying rule #3 between two syllables, together with an example.

Figure 6: Rule #3 diagram and example

Rule #4: Switching Initial and Final Consonants and Vowel. In this case, the initial and final consonants and the vowel are swapped, leaving the tone in place. Figure 7 describes a transformation applying rule #4 between two syllables, together with an example.

Figure 7: Rule #4 diagram and example

Rule #5: Switching Vowel, Final Consonant, and Tone. In this case, the vowel, final consonant, and tone are swapped, leaving the initial consonant in place. Figure 8 describes a transformation applying rule #5 between two syllables, together with an example.
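To make the two-syllable rules concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simplified consonant inventory based on Table 2, treats glides as part of the vowel as in Table 1, and places tone marks with a rough heuristic that does not cover every orthographic convention. The names decompose, compose, and apply_rule are hypothetical, and only rules #1, #3, #4, and #5 as described above are sketched. The example syllable pairs are illustrations chosen here, not taken from the paper's figures.

```python
import unicodedata

# Combining tone marks (as seen after NFD normalization) -> tone names (Table 3).
TONE_MARKS = {"\u0301": "sắc", "\u0300": "huyền", "\u0309": "hỏi",
              "\u0303": "ngã", "\u0323": "nặng"}
TONE_TO_MARK = {name: mark for mark, name in TONE_MARKS.items()}

# Simplified inventories (assumption); longer spellings first so "ngh" beats "ng".
INITIALS = ["ngh", "ng", "nh", "gh", "gi", "kh", "ph", "th", "tr", "ch",
            "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p", "q",
            "r", "s", "t", "v", "x"]
FINALS = ["ng", "nh", "ch", "c", "m", "n", "p", "t"]
# Vowel letters that carry a quality diacritic; they attract the tone mark.
QUALITY = set(unicodedata.normalize("NFC", "ăâêôơư"))


def decompose(syllable):
    """Split a syllable into (initial, vowel, final, tone)."""
    tone, kept = "ngang", []
    for ch in unicodedata.normalize("NFD", syllable.lower()):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]      # strip the tone mark, remember the tone
        else:
            kept.append(ch)
    base = unicodedata.normalize("NFC", "".join(kept))
    initial = next((c for c in INITIALS
                    if base.startswith(c) and len(base) > len(c)), "")
    rest = base[len(initial):]
    final = next((c for c in FINALS
                  if rest.endswith(c) and len(rest) > len(c)), "")
    vowel = rest[:len(rest) - len(final)]
    return initial, vowel, final, tone


def compose(initial, vowel, final, tone):
    """Rebuild a syllable; tone placement is a simplified heuristic."""
    if tone == "ngang":
        return initial + vowel + final
    marked = [i for i, ch in enumerate(vowel) if ch in QUALITY]
    if marked:
        pos = marked[-1]               # last letter with a quality diacritic
    elif final or len(vowel) == 1:
        pos = len(vowel) - 1           # closed syllable: last vowel letter
    else:
        pos = len(vowel) - 2           # open syllable: second-to-last letter
    toned = unicodedata.normalize("NFC", vowel[pos] + TONE_TO_MARK[tone])
    return initial + vowel[:pos] + toned + vowel[pos + 1:] + final


def apply_rule(rule, syl1, syl2):
    """Return the reverse form of a syllable pair under rule 1, 3, 4, or 5."""
    i1, v1, f1, t1 = decompose(syl1)
    i2, v2, f2, t2 = decompose(syl2)
    if rule == 1:      # swap vowel and final consonant
        a, b = (i1, v2, f2, t1), (i2, v1, f1, t2)
    elif rule == 3:    # swap tone only
        a, b = (i1, v1, f1, t2), (i2, v2, f2, t1)
    elif rule == 4:    # swap initial, vowel, and final; tone stays in place
        a, b = (i2, v2, f2, t1), (i1, v1, f1, t2)
    elif rule == 5:    # swap vowel, final consonant, and tone
        a, b = (i1, v2, f2, t2), (i2, v1, f1, t1)
    else:
        raise ValueError("only rules 1, 3, 4, and 5 are sketched here")
    return compose(*a), compose(*b)


if __name__ == "__main__":
    print(apply_rule(1, "cá", "đối"))    # vowel and final swapped
    print(apply_rule(4, "đầu", "tiên"))  # everything but the tone swapped
```

Under these assumptions, rule #1 turns the pair "cá đối" into "cối đá", and rule #4 turns "đầu tiên" into "tiền đâu". A detector along the lines of Algorithm 1 could call such a function for every rule to generate the reverse forms of each syllable pair before matching or scoring them.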
