WCCI 2012 IEEE World Congress on Computational Intelligence, June 10-15, 2012, Brisbane, Australia (FUZZ-IEEE)

A Critical Survey on the use of Fuzzy Sets in Speech and Natural Language Processing

Joao P. Carvalho, TULisbon - Instituto Superior Técnico, L2F - INESC-ID, R. Alves Redol 9, 1000-029 Lisboa, [email protected]
Fernando Batista, ISCTE-IUL - Lisbon University Institute, L2F - INESC-ID, R. Alves Redol 9, 1000-029 Lisboa, [email protected]
Luisa Coheur, TULisbon - Instituto Superior Técnico, L2F - INESC-ID, R. Alves Redol 9, 1000-029 Lisboa, [email protected]

Abstract - This paper shows how the use and applications of Fuzzy Sets (FS) in Speech and Natural Language Processing (SNLP) have seen a steady decline, to a point where FS are virtually unknown or unappealing for most of the researchers currently working in the SNLP field. It tries to find the reasons behind this decline, and proposes some guidelines on what could be done to reverse it and make FS assume a relevant role in SNLP.

Keywords: Fuzzy Sets, Natural Language Processing, Speech Processing.

1 This work was in part supported by FCT (INESC-ID multiannual funding) through the PIDDAC Program funds.

I. INTRODUCTION

Speech and natural language processing (SNLP) is a research field that joins linguistics, computer science and signal processing, and is concerned with the interactions between computers and human languages. This interdisciplinary field is often referred to by less encompassing names like computational linguistics - usually used by linguists - or simply natural language processing - the term originally preferred in computer science - even when making extensive use of speech processing. The ultimate goal of SNLP would be to develop a system simulating a conversational agent able to maintain a spoken dialogue or answer questions posed in human-like form. Such an agent would recognize and understand almost any question or discourse in a given spoken language, and would provide an appropriate spoken contribution to the conversation (or a reply to the question). Automatic language translation could also eventually have to be addressed somewhere in mid-process. As will be seen in section II, the complexity and number of tasks involved in obtaining such an agent is even more daunting than it appears to be, at least to someone not directly involved in those tasks.

From a fuzzy sets "point of view", i.e., for someone directly involved in fuzzy sets research, it would seem very natural for such an agent to use fuzzy sets. Actually, the first thought would be "why shouldn't it be implemented mainly using fuzzy systems?" The first reason is that SNLP is a huge field, currently tackling dozens of different problems for which specific evaluation metrics exist, and it is not possible to reduce the whole field to a single specific problem, as was done by Novak in [36]. Despite some potentially interesting works and results (see section III), fuzzy sets are virtually unknown or unappealing for most of the researchers currently working and publishing in SNLP journals and conferences. This could somehow be understandable if the SNLP community were small, or if SNLP were a niche research field. However, several yearly SNLP conferences attract hundreds of participants, and a few even attract a few thousand (e.g., INTERSPEECH, ICASSP, COLING). Therefore, from a fuzzy sets point of view, one would expect to find a considerable number of "fuzzy" contributions.

This paper presents an overview of the current use of fuzzy sets in SNLP, focusing on works that have been accepted by the SNLP community, and presents some suggestions and directions on how fuzzy sets can be useful to, and accepted by, that same community.

II. SPEECH AND NATURAL LANGUAGE PROCESSING: A SHORT OVERVIEW

SNLP originated in the late 1940s, and despite long-term divergences on the best way to approach it, like the famous "Two Camps" of the 1960s - the symbolic paradigm, associated with Chomsky's formal language theory and generative grammar, and the stochastic paradigm, associated with AI names like John McCarthy, Marvin Minsky and Claude Shannon, focused on stochastic, statistic and probabilistic models - the field has lately converged to the use of probabilistic finite state models and other machine learning techniques, especially unsupervised statistical approaches made possible and motivated by the creation and availability of large linguistic and text corpora.

In order to understand the complexity of the tasks involved in SNLP, let us analyze the major steps involved in obtaining an agent similar to the one described in section I, and the major techniques currently used to approach them.

A. From Speech to Text

When in the presence of an audio signal, the agent would need to recognize a sequence of words. If each word were pronounced the same in all contexts and by all speakers, speech recognition would be really easy.

But even if one focuses on a single language like English, what might seem an easy task becomes incredibly complex when one has to deal with gender and age variations, different accents and ways to pronounce and produce a word, different emotional states, rate of speech, disfluencies, noisy environments, etc. One must not forget that words are produced by the movement of the speech articulators (tongue, lips, velum) during speech production, which differs from person to person, changes continuously, and is subject to physical constraints like momentum. The same word can be represented by an infinite number of different signals, and the number of different words and word variations (number, gender, etc.) to be recognized is huge. Modern algorithms approach this problem using smaller units of speech, called phones, instead of words (phonetics is the science that studies phones). Note that even phones are very complex signals.

In addition to word recognition, there are further problems associated with speech-to-text transcription. Namely, one must deal with word boundary detection, sentence boundary detection, punctuation detection, speaker recognition (which can be important when several participants are involved), etc. The recognition and marking of the above items is a problem usually known as rich transcription.

Speech recognition has improved, and nowadays works rather well in restricted environments, like commands over a telephone or recognition of TV news. But even these tasks have problems in noisy environments. Over the last three decades, the bottleneck of natural speech recognition has always been the availability of data and the ability to process it efficiently. This availability has evolved a lot in recent years, and so have speech recognition methods and computational systems. The results obtained by Dragon technologies, the arrival of Siri on the iPhone 4S, and the spread of Google voice recognition have changed the public perception that speech recognition does not work. However, one must not forget that such technologies imply an online connection, lots of remote processing power, and corpora access.

Clearly, one can say that speech-to-text is mostly composed of classification problems, and the techniques currently used to approach it are classification techniques. The following keywords can be used to access the most used techniques in speech recognition and rich transcription: Hidden Markov Models (HMM); Neural Networks; Gaussian Mixture Models (GMM); Support Vector Machines (SVM); Viterbi; multipass decoding; LPC features; N-gram features; Mel Frequency Cepstral Coefficients (MFCC) features.
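To make this classification view concrete, the sketch below decodes a sequence of already-discretized acoustic observations into the most likely phone sequence with the Viterbi algorithm. The three-phone model and all probabilities are invented toy values; a real recognizer estimates such parameters from corpora and scores continuous acoustic features (e.g., MFCC vectors) with GMMs or neural networks:

```python
import math

# Toy HMM for illustration only: three phone states with invented,
# hand-picked probabilities. A real recognizer trains these on corpora
# and scores continuous acoustic frames (e.g., MFCC vectors) with GMMs
# or neural networks instead of the discrete symbols used here.
STATES = ["k", "ae", "t"]
START = {"k": 0.8, "ae": 0.1, "t": 0.1}
TRANS = {"k": {"k": 0.5, "ae": 0.4, "t": 0.1},
         "ae": {"k": 0.1, "ae": 0.5, "t": 0.4},
         "t": {"k": 0.1, "ae": 0.1, "t": 0.8}}
EMIT = {"k": {"A": 0.7, "B": 0.2, "C": 0.1},
        "ae": {"A": 0.2, "B": 0.6, "C": 0.2},
        "t": {"A": 0.1, "B": 0.2, "C": 0.7}}

def viterbi(observations):
    """Return the most likely phone sequence for the observations."""
    # Each trellis column maps state -> (best log-probability, backpointer).
    trellis = [{s: (math.log(START[s] * EMIT[s][observations[0]]), None)
                for s in STATES}]
    for obs in observations[1:]:
        previous, column = trellis[-1], {}
        for s in STATES:
            best = max(STATES, key=lambda p: previous[p][0] + math.log(TRANS[p][s]))
            score = previous[best][0] + math.log(TRANS[best][s]) + math.log(EMIT[s][obs])
            column[s] = (score, best)
        trellis.append(column)
    # Backtrace from the best final state.
    state = max(STATES, key=lambda s: trellis[-1][s][0])
    path = [state]
    for column in reversed(trellis[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))

print(viterbi(["A", "A", "B", "C"]))  # ['k', 'k', 'ae', 't']
```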
B. Words, Utterances and Grammars

Identifying sentences (or utterances) and the respective tokens is one of the first steps involved in text processing, and its impact on subsequent processing tasks is well known. Although this might seem to be a trivial task, it is not. As an example, one cannot assume that a full stop represents the end of a sentence, since expressions such as "Mr." or "John F. Kennedy" need special attention. But it gets worse if we consider, for instance, Chinese: as it is written without spaces between words, the segmentation process is extremely complicated (there are even competitions on this subject).

Other major steps include stemming, lemmatization, part-of-speech (POS) tagging, syntactic analysis, and Named Entity Recognition (NER).

The first two apply some transformation to words: stemming rules usually discard some suffixes; lemmatization maps variant forms of the same word into the same expression. This process is particularly interesting in tasks such as Information Retrieval, as the information we are seeking is probably not stated with the same words as our query. Stems and lemmas might help.

In POS tagging, each word is associated with one or more morpho-syntactic labels (noun, proper noun, verb, etc.). Several POS taggers exist for many languages, and some can reach almost perfect accuracy (considering that even linguists do not agree on every tag). Some of these taggers are rule-based; others use labeled corpora to train, for instance, HMM models [24]; and unsupervised POS tagging is also being explored [20].

Concerning syntactic analysis, there is an impressive collection of formalisms, algorithms and tools. Grammars can be roughly split into constituency and dependency grammars (and hybrids). The first set tries to group sequences of words (phrases) together; the second intends to establish relations (dependencies) between words. An example of a grammar from the first set is the well-known Context-Free Grammar (CFG) [9]; an example from the second set is the Link Grammar [43]. Many grammars are hand-crafted, but there are also systems that try to infer rules. Algorithms for parsing sentences are defined according to the involved formalism. For instance, the Earley algorithm [14] can be applied to any CFG, but CKY only to CFGs in a special form (Chomsky normal form). Tools implementing these algorithms can be found in many programming languages.

Named Entity Recognition is also a task that should be highlighted. It consists in labeling "entities" in text, usually locations, organizations and people's names. HMMs and their successors are widely used in this task, which is useful in many SNLP applications.

To conclude, one should note that a text extracted from an automatic speech transcript contains errors, is characterized by a weaker syntactic structure, and contains phenomena specific to speech data, such as repetitions and other disfluencies, that may impact further processing. Most tasks performed for written texts are also performed for speech transcripts, but additional specific problems need to be addressed. For instance, a grammar that suits well-formed sentences might be inappropriate for speech input.
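Before moving to understanding, the parsing algorithms just mentioned can be made concrete: a CKY recognizer for a toy CFG in Chomsky normal form fits in a few lines. Grammar, lexicon and sentence are invented examples; a real parser would also store backpointers to recover the parse trees:

```python
# Minimal CKY recognizer for a toy CFG in Chomsky normal form.
# Grammar, lexicon and sentence are invented for illustration; a real
# parser would also store backpointers to recover the parse trees.
BINARY_RULES = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "saw": "V"}

def cky_recognize(words):
    n = len(words)
    # chart[i][j] holds the nonterminals that derive words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1].add(LEXICON[word])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                  # split point
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        if (left, right) in BINARY_RULES:
                            chart[i][j].add(BINARY_RULES[(left, right)])
    return "S" in chart[0][n]

print(cky_recognize("the dog saw the cat".split()))   # True
```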

C. Natural Language Understanding

Natural Language Understanding (NLU) is the process of converting natural language utterances into a structure that the computer can deal with. This structure, which abstracts the meaning of the utterance, is highly dependent on the involved application, and can range from keywords to complex representations such as logical forms or frames (sets of attribute/value pairs). The process of mapping a given utterance into its semantic representation ranges from keyword spotting to linguistically motivated approaches, but machine learning also feeds current NLU research.

Considering early systems, we highlight ELIZA [49] and LUNAR [51]. ELIZA, which aimed at emulating a psychologist, had a great impact at its time, as many people believed it to be human despite the simplicity of the underlying framework, based on regular expressions. ELIZA can be seen as the predecessor of today's chatbots, such as Cleverbot (http://Cleverbot.com) and ALICE (http://alice.pandorabots.com/), which try to simulate human conversation. Many of these systems aim to pass the Turing test and win the Loebner Prize (http://www.loebner.net/Prizef/loebner-prize.html). In what concerns LUNAR, this natural language interface capable of answering questions about moon rocks also deserves to be highlighted, as it started the line of research responsible for a panoply of Natural Language Interfaces to Databases (NLIDB) in the 80s and 90s, which ended up converging into question answering (QA) systems. Both NLIDBs and QA systems share a process of question understanding, but in the former the target information is in the database, while in the latter the information needs to be extracted from the web or from collections of documents, that is, from possibly unstructured texts.

In early times, NLU was linguistically motivated and usually based on the following paradigm: each syntactic rule was associated with a semantic rule, and logical forms were generated in a bottom-up, compositional process, where the semantics of each level resulted from combining the semantics of the elements of the previous levels through the corresponding semantic rules [24][1]. Nevertheless, this approach presents several drawbacks: for instance, syntactic elements significant for semantic analysis could be spread across different syntactic rules. In addition, all these rules were hand-crafted, which made them difficult to develop and maintain.

Sub-symbolic techniques, as in so many other research fields, are currently being widely applied to NLU. For instance, Bhagat [4] treats NLU as a classification problem, i.e., the final goal is to be able to classify an utterance. Thus, Maximum Entropy and Support Vector Machines are used in some of their experiments. Other approaches try to label parts of the utterance that can later be used to create frames. HMMs can be used in such approaches. However, despite Bhagat's claim of training his system with little data, all these approaches rely on large annotated corpora, which are used for training.

NLU remains one of the most complicated problems to be solved in SNLP, not only because of the many possibilities of stating something (which should all be mapped into the same semantic representation), but also because it is almost impossible to find a representation that perfectly suits even the simplest problem.
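As a minimal illustration of the keyword-spotting end of the spectrum mentioned at the beginning of this subsection, the following sketch maps an utterance into a frame of attribute/value pairs. The flight-booking domain, slot names and patterns are invented for illustration:

```python
import re

# Invented patterns for a toy flight-booking domain: the crudest,
# keyword-spotting end of the NLU spectrum. Requiring a capitalized
# word after "from"/"to" is a crude stand-in for real named entity
# recognition.
PATTERNS = {
    "intent": (re.compile(r"\b(book|reserve)\b", re.IGNORECASE), "book_flight"),
    "origin": (re.compile(r"\bfrom ([A-Z]\w+)"), None),
    "destination": (re.compile(r"\bto ([A-Z]\w+)"), None),
}

def utterance_to_frame(utterance):
    """Map an utterance into a frame (a set of attribute/value pairs)."""
    frame = {}
    for slot, (regex, fixed_value) in PATTERNS.items():
        match = regex.search(utterance)
        if match:
            frame[slot] = fixed_value if fixed_value else match.group(1)
    return frame

print(utterance_to_frame("I would like to book a flight from Lisbon to Brisbane"))
# {'intent': 'book_flight', 'origin': 'Lisbon', 'destination': 'Brisbane'}
```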
D. Text to Speech

The final stage would involve generating speech from the text representing the agent's reply. Speech is usually generated by resorting to phones, but before even considering how to generate and concatenate phones, this stage involves multiple difficult steps, since the intention is to produce human-sounding, rather than robotic-sounding, speech.

The first step is text normalization, and includes tokenization and dealing with non-standard words. Tokenization uses any supervised machine learning method to train a classifier to mark sentence boundaries. Non-standard words are often ambiguous, i.e., they can be spoken in different ways according to context. Examples: 3/4 (should it be spoken as "March 4th" or as "three quarters"?); should an acronym be read as a word (IKEA), or letter by letter (IBM)?; "lead" (does it refer to a leash or to the metal? Note that each is pronounced differently); "Dr." ("drive" or "doctor"?); "St." ("saint" or "street"?). Dealing with non-standard words requires at least three steps: tokenization, to separate out and identify potential non-standard words; classification, to classify them into a given type; and expansion, to convert each type into a string of standard words. Classification itself might be difficult, but it is not ambiguous. Disambiguation of homographs can be done using POS tagging, except for a few special cases that are usually ignored in such systems.

The second step consists in phonetic analysis. Current techniques use dictionaries for known words; the problem lies in how to deal with proper names and with unknown words. Some systems have been proposed for names, but there is room for improvement. Non-standard words use machine-learning-based grapheme-to-phoneme conversion (g2p).

The next steps are related with prosodic features, namely prosodic phrasing and prominence - the rhythm, intonation, and relative emphasis of words in phrases. Phrases are usually divided into parts when spoken, and the last vowels of those parts are usually longer. Natural-sounding speech must include these features.

Phrase boundary prediction is generally treated as a binary classification task: given a word, the system has to decide whether or not to put a prosodic boundary after it. Deterministic rules can also be included, for example inserting boundaries after punctuation. More sophisticated models are based on machine learning classifiers trained on corpora annotated with prosodic boundaries. Prominence involves making some words sound louder or slower, slightly changing the tone, etc. The duration of phones is an issue related not only to the phone itself, but also to prosodics and a wide variety of contextual factors, which can be modelled by rule-based or statistical methods.
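A minimal sketch of the tokenize/classify/expand treatment of non-standard words described above; the type inventory and the expansion rules are invented stand-ins for what a real normalization front end would hand-craft or learn from data:

```python
# A toy version of the tokenize / classify / expand treatment of
# non-standard words. The type inventory and the expansion rules are
# invented, minimal stand-ins for what a real text-normalization front
# end would hand-craft or learn from data.
ABBREVIATIONS = {"Dr.": "doctor", "St.": "saint"}  # ambiguous in reality
LETTER_SEQUENCES = {"IBM", "HMM", "SVM"}           # read letter by letter
DIGIT_NAMES = "zero one two three four five six seven eight nine".split()

def classify(token):
    """Assign each token one of a few non-standard-word types."""
    if token in ABBREVIATIONS:
        return "abbreviation"
    if token in LETTER_SEQUENCES:
        return "letter_sequence"
    if token.isdigit():
        return "number"
    return "ordinary"

def expand(token):
    """Convert a token into a string of standard, speakable words."""
    kind = classify(token)
    if kind == "abbreviation":
        return ABBREVIATIONS[token]
    if kind == "letter_sequence":
        return " ".join(token)                       # "IBM" -> "I B M"
    if kind == "number":
        # Digit-by-digit only; a real system would read "1984" as a year.
        return " ".join(DIGIT_NAMES[int(d)] for d in token)
    return token

print(" ".join(expand(t) for t in "Dr. Smith joined IBM in 1984".split()))
# doctor Smith joined I B M in one nine eight four
```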

E. Other Problems

SNLP also deals with several other problems not described in the previous sections. Among others, these include: author recognition (who wrote a given text?); speaker recognition (who is talking?); summarization (extracting the main contents of a given text); machine translation (automatic translation between a pair of languages); detecting emotions in speech (and also in texts); etc. Currently, most of these problems are also addressed by resorting to statistical approaches and machine learning techniques.

III. A SURVEY OF FUZZY SETS RELATED WORKS IN SPEECH AND NATURAL LANGUAGE PROCESSING

Research in the various subareas of SNLP is spread across a wide number of journals and conference proceedings. Some have more emphasis on natural language processing and computational linguistics; others, related with speech processing, are usually associated with the signal processing field. There are also a few books that can be considered references in the field. In this section, one surveys the impact of fuzzy sets in the most relevant SNLP books, journals and conference proceedings. It is also possible to find some SNLP works in conferences related with artificial intelligence, like AAAI, for example, but those were not considered in this work.

A. SNLP Books

Currently, the most influential book in SNLP is Jurafsky and Martin's "Speech and Language Processing - An introduction to natural language processing, computational linguistics, and speech recognition" [24], which is used as the main textbook in many (most) graduate and postgraduate courses in SNLP. This work was originally printed in 2000, and had a revised 2nd edition in 2009. The book presents a very extensive overview of the field, giving emphasis to the statistical and machine learning approach to SNLP. "Fuzzy" is referenced in the book exactly once. However, the word is present only as an example of how plurals are constructed in English. Fuzzy sets are never even mentioned as a scientific area.

Other relevant books can be mentioned, like for example: "Challenges in Natural Language Processing", by Bates and Weischedel [2]; "Natural Language Processing for Online Applications", by Jackson and Moulinier [23]; "Computational Linguistics: Models, Resources, Applications", by Igor A. Bolshakov and Alexander Gelbukh [5]; or "Natural Language Understanding", by Allen [1]. Total number of times "fuzzy" is mentioned in all these books? One: in [2], chapter 2, Atkins writes about "fuzzy boundaries". However, she never uses any technique related with the fuzzy sets field when dealing with these "fuzzy boundaries". Finally, one can mention the only book where FS are indeed referred to as a proper technique, even if no details are given: the "Springer Handbook of Speech Processing" [3], where there are references to fuzzy queries and Fuzzy Vector Quantization (FVQ).

B. SNLP Journals

SNLP journals include "Computational Linguistics", "Natural Language Engineering", "ACM Transactions on Speech and Language Processing", "Speech Communication", "IEEE Transactions on Acoustics, Speech and Signal Processing", "IEEE Transactions on Speech and Audio Processing", and the "IEEE Transactions on Audio, Speech and Language Processing".

The Computational Linguistics journal (CL) is the primary archival forum for research on computational linguistics and natural language processing. The journal, sponsored by the Association for Computational Linguistics (ACL), has been published for the ACL by MIT Press since 1988. It was founded in 1974, publishes 4 issues per year, and currently has an ISI impact factor of 2.9. Over a period of 37 years, only 15 papers mentioning "fuzzy" have appeared in CL. Of those 15, 9 are short reviews of textbooks where fuzzy is mentioned. Of the other 5 articles, there is one that has dozens of citations and almost revolves around fuzzy, but uses the term "fuzzy" as a novelty, ignoring that "fuzzy" has been a coined word in the scientific community for almost half a century! [48]; two other papers use the term "fuzzy" a couple of times (as in "language is fuzzy"), but never use it as a scientific technique; another paper refers to fuzzy when listing the amount of AI papers referring to language, citing "Linguistic quantifiers modeled by Sugano integrals knowledge representation" (yes, "SugAno" instead of Sugeno), a work that has no particular relevance from a computational linguistics point of view; in one paper, the author mentions having proved in a previous conference paper that fuzzy sets can have a small role in CL, but never uses fuzzy sets in the paper that was accepted for the journal; finally, the remaining paper [15] uses fuzzy sets to try to find near-synonyms, but never makes any reference to any fuzzy sets work, or even mentions fuzzy sets as the scientific theory that supports most of the presented research. Concluding: there is only one paper using/applying fuzzy sets in the major computational linguistics related journal, and even in that paper, fuzzy sets are not acknowledged as a relevant scientific technique.

In Natural Language Engineering (NLE), in 18 volumes, one can find half a dozen papers where fuzzy sets are somehow referenced, and two works based on fuzzy sets theory: one published in 1996, related with POS tagging [27], and a rather recent paper on detecting emotions in short texts (SMS, tweets) [35].

In the ACM Transactions on Speech and Language Processing, a journal currently in its 8th volume, only two papers related to fuzzy sets have been published: one where fuzzy boundaries are applied in morpheme segmentation [10], and another concerning stemming in the Arabic language [16].

The balance in what concerns fuzzy sets in articles published in more speech-processing-oriented journals is not particularly better (publication years in parentheses):

• IEEE Trans. Acoustics, Speech and Signal Processing: 2 papers in 38 volumes (1979, 1983) [12][13];
• IEEE Trans. Speech and Audio Processing: 3 papers in 13 volumes (1994, 2000, 2000) [32][54][8];
• IEEE Trans. Audio, Speech and Language Processing: 1 paper in 7 volumes (2011) [47];
• Speech Communication: 3 papers in 53 volumes (1996, 2001, 2002) [7][52][53].

The above papers usually address small (but important) parts of speech processing problems by introducing fuzzy techniques such as: neural fuzzy networks [32][54]; fuzzy maximum entropy (ME) smoothing of n-gram models [8]; fuzzy relations for evaluating phonemic hypotheses [12]; fuzzy possibilities for the recognition of plosive consonants [13]; fuzzy masks in missing feature theory [47]; a two-stage neural fuzzy decision classifier for speaker identification [7]; a fuzzy search algorithm for telephone speech keyword spotting [52]; or a corpus-based fuzzy fragment-class Markov model (FFCMM), proposed to model the syntactic characteristics of a speech act and used to choose the speech act candidates [53].

Given that only 16 papers were found in 184 journal volumes - 16 papers among more than 6000 published papers - it is easy to conclude that the influence of fuzzy sets in the SNLP journal community is close to null.

C. SNLP Conferences

The most important conferences concerned with natural language processing and computational linguistics are naturally associated with the ACL, and include the annual proceedings of ACL, NAACL and EACL, and the biennial COLING conference. Related conferences include the proceedings of various ACL Special Interest Groups, such as the Conference on Natural Language Learning (CoNLL) and the Conference on Empirical Methods in Natural Language Processing (EMNLP), and also other conferences such as the Conference on Intelligent Text Processing and Computational Linguistics (CICLing) or the International Conference on Computational Linguistics (ICCL).

Research on speech recognition, understanding, and synthesis is presented at the annual INTERSPEECH conference - called the International Conference on Spoken Language Processing (ICSLP) and the European Conference on Speech Communication and Technology (EUROSPEECH) in alternating years - and at the annual IEEE International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP), among others.

The e-library of L2F - the Spoken Language Laboratory at INESC-ID - was used as a basis to study the influence of fuzzy sets in SNLP conferences. Even if the library obviously does not contain all the editions of all the conferences in the area, it is very extensive and can be used as a good indicator. The total number of indexed papers in the library is over 160000, of which around 90000 were considered in this study. Figure 1 shows the number of fuzzy sets related papers published per year since 1980. Figure 2 shows the percentage of fuzzy sets related papers among all papers.

Figure 1 - Number of Fuzzy Sets related papers in the L2F SNLP conference library per year (y-axis: 0 to 25 papers; x-axis: 1980 to 2010)

Figure 2 - Percentage of Fuzzy Sets related papers in the L2F SNLP conference library from 1980-81 until 2010-11 (y-axis: 0.00% to 0.60%)

The first noticeable fact in the figures is the very low percentage of fuzzy sets related papers (the maximum value is 0.54%). The second relevant fact is the declining tendency since the year 2000. The start of this declining period coincides with the publication of Jurafsky's referential textbook [24], and the decline appears to be correlated with the rise in influence of machine learning in SNLP.

It should also be mentioned that a relevant number of the works counted in the figures are related to, and were published in, speech processing conferences. The numbers and percentages in conferences closer to NLP are much lower overall. For example, only 1 paper related with fuzzy sets [18] has been accepted in the last four editions of COLING (see TABLE I). This low number is even more impressive if one mentions that COLING 2010 had 335 papers and was held in Beijing, China, a traditionally "fuzzy sets friendly" country with a large fuzzy sets community.

TABLE I: Number of papers in the last COLING editions

Year                          2004   2006   2008   2010
#papers                        204    284    181    335
#fuzzy sets related papers       0      0      0      1

The papers that were found cover a large number of interesting topics. Due to lack of space it is not possible to cite all the works, but an Internet keyword search should provide the necessary references. The following list of topics is representative but not exhaustive: fuzzy speech recognition; fuzzy vector quantization; fuzzy HMM; fuzzy neuro HMM; fuzzy model estimation for HMM; fuzzy entropy HMM; fuzzy speech segmentation using HMM; fuzzy GMM; fuzzy speech synthesis; rule-based fuzzy formant synthesis; vowel recognition using fuzzy sets; auditory representation combination using fuzzy sets; fuzzy multilevel acoustic-phonetic decoding; speech recognition using fuzzy clustering; noise adaptation algorithms for robust speech recognition using fuzzy clustering and fuzzy post-correction rules; fuzzy-logic-based speech detection algorithms for noisy environments; fuzzy speaker recognition; fuzzy emotion recognition in voice and in speech; fuzzy c-means clustering for speaker recognition; fuzzy SVM for age and gender classification in speech; fuzzy statistical language models; fuzzy topic detection; topic similarities using fuzzy sets; fuzzy word matching in document retrieval; speaker adaptation methods; fuzzy prosodics; fusion of different classification outputs using fuzzy integrals; audio-visual speech recognition (fuzzy logical model of perception); rule-based translation corpora where queries use fuzzy matches; fuzzy natural language interfaces for databases; N-gram fuzzy matching.

Some of the works compared fuzzy techniques with baselines and state-of-the-art techniques, but the results were not always very conclusive, nor the comparisons extensive. Some examples:

• In [29], speaker adaptation methods were compared, and fuzzy HMM produced worse results (1990), while in [34] fuzzy HMM were deemed at least as good in what concerns the robustness of speech recognition algorithms (1990);
• In [11], robust speech recognition using fuzzy clustering produced much better results than hard clustering (but no other methods were compared);
• In [46], fuzzy GMM were found to be more effective than GMM for speaker recognition;
• In [21], fuzzy sets yielded better results than MFCC when combining auditory representations (1994);
• In [45], fuzzy techniques for speech segmentation produced results equal to NN, but slightly worse than manual segmentation (2001).

Note that all of the above works are more than 10 years old.

A special mention should be made of the works of B. B. Rieger [39][38][40][41] (among others), one of the pioneers of the use of FS in Computational Linguistics.

D. SNLP Works published outside the SNLP community

Fuzzy SNLP works have also been published outside the SNLP community, and not necessarily within the FS community. It is our contention that, in order to gain impact in the SNLP community, researchers should focus on the forums directly related with SNLP. Therefore, although we mention some works in this section, the coverage was not meant to be as exhaustive as in the previous sections.

In fuzzy sets related conferences, the number of SNLP related works is rather low. As an example, one can find only 7 papers [50][17][44][42][30][6][37] in the last two Fuzz-IEEE conference proceedings, out of a total of around 1000 papers. One should note that these numbers do not include works that are somehow related to natural language topics that have been given particular attention in the fuzzy sets community, but are usually ignored in SNLP. These topics include:

• Fuzzy Query-Answering (not to be confused with Question Answering) - the use of a "controlled" natural language to query databases. The "controlled" term is used to indicate that these systems are usually not prepared to accept totally open NL questions, as in QA. There should be a huge overlap and interaction between researchers in this community and the NLU community, which is not usually verified. A good source for papers in this area is the FQAS (Flexible Query Answering Systems) conference series, usually published in Springer's LNCS - Lecture Notes in Computer Science.

• Zadeh's Computing with Words and Precisiated Natural Languages - a very well-known topic in FS [56] that deals with how it is possible to treat propositions drawn from a natural language as objects of computation. This topic is definitely related with Natural Language Understanding, but is not yet associated with SNLP, because it has not been integrated in the SNLP chain or compared with the SNLP NLU methods.

• Linguistic data mining and summarization - deals with creating linguistic summaries from numeric data [55][26][25].

Maybe an effort should be made to advertise and publish these works, where FS have assumed a very important role, in the SNLP area.

There are also some other works that should be referenced, since they explicitly address the connection of FS with SNLP, even if they only focus on some limited aspects of it: Novak [36]; Ma [33]; Inoue's work on the treatment of fuzziness in natural language [19]; and Langanke's work on fuzzy sets and Chomsky grammars [31].

IV. CAN FUZZY SETS BE USEFUL TO SNLP?

In the previous sections we have seen that, despite some attempts to use FS in SNLP, the fact is that, especially in the last decade, the SNLP community has basically ignored FS. Therefore, the big question is: "Can FS be useful to SNLP?" We believe that the answer is affirmative, but it is imperative to follow some guidelines in order for FS to be accepted and used in the SNLP community:

• Publish in the right places - SNLP researchers will never give credit to works published outside the main journals and conferences of their community;

• It is wrong to assume that something is better just because it uses fuzzy - simply stating that fuzzy is naturally adapted to deal with SNLP is worthless. Compare results against established and accepted baselines and gold standards, using well-known and publicly available corpora. Use statistical tests. Be aware that the SNLP community will not accept the proposed approach unless there is an improvement over state-of-the-art methods;

• Fuzzy Sets cannot solve everything - FS can have obvious advantages when dealing with specific problems, but using them does not make much sense in several approaches. One good example consists in using FS when tagging text: it is well known that the same word can belong to different syntactic classes depending on the context. E.g., "married" can be the past participle of the verb "marry", as in "Bob married Alice", or an adjective, as in "Bob is married". However, when someone uses "married" in an utterance, it should be assumed that he knows exactly how he is using that word: there is no fuzziness in the author's intention. When classifying "married" within an utterance using probabilistic methods, there is a given probability that the word belongs to one class or the other, but never (or rarely, in some languages) does it belong to more than one class. FS would only be useful if there were a possibility that the word would somehow fit partially into both classes simultaneously within the same utterance, or if the classes overlapped. On the other hand, there are problems where FS make much more sense than probabilities, such as, for example, when classifying phones in speech recognition, since a given phone can fit partially into several classes, and phones are subject to all sorts of variations; e.g., a "k" can sometimes be heard like a "g" (see section II.A).
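A minimal sketch of this last point, in fuzzy-c-means style: the membership of an acoustic frame in each phone class is computed from its distance to class prototypes, so an ambiguous frame can genuinely belong partially to /k/ and partially to /g/, instead of being forced into one class. The two-dimensional "features" and the prototypes are invented; a real system would use, e.g., MFCC vectors:

```python
# Graded membership of an acoustic frame in several phone classes,
# computed fuzzy-c-means style from distances to class prototypes.
# The two-dimensional "features" and the prototypes are invented for
# illustration; a real system would use, e.g., MFCC vectors.
def fuzzy_memberships(frame, prototypes, m=2.0):
    """Return a membership degree in [0, 1] for each phone class.

    Degrees sum to one but, unlike a hard classifier, a frame can
    genuinely belong to several classes at the same time.
    """
    dist = {p: max(1e-9, sum((a - b) ** 2 for a, b in zip(frame, v)) ** 0.5)
            for p, v in prototypes.items()}
    exponent = 2.0 / (m - 1.0)
    return {p: 1.0 / sum((dist[p] / dist[q]) ** exponent for q in prototypes)
            for p in prototypes}

PROTOTYPES = {"k": (1.0, 0.0), "g": (0.8, 0.4), "t": (0.0, 1.0)}
frame = (0.9, 0.2)   # a frame halfway between the /k/ and /g/ prototypes
print(fuzzy_memberships(frame, PROTOTYPES))
# roughly {'k': 0.49, 'g': 0.49, 't': 0.02}: partly /k/ and partly /g/
```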

Regarding what FS can bring to SNLP, lots of work on different topics has already been done, especially during the 90s (see section III.C). The reason why most of those works were not well received was probably the lack of proper testing and comparison with other approaches. The soundest of those "old" ideas could certainly be brought up to date and properly tested in order to check their validity. An effort should also be made to bring the FS works mentioned in section III.D to the SNLP community, especially to the NLU community.

To conclude, we suggest a list of some particular areas where the application of FS could provide interesting results:

• Speech phone recognition - phonetic variation occurs not only from speaker to speaker, but even within the same speaker, depending on multiple factors and contexts, like for example the shortening or reduction of sounds as a speaker talks faster or colloquially. The reason these changes happen is that the movement of the speech articulators during speech production is continuous and subject to physical constraints. Based on the fact that a speaker represents categories in his mind, instead of phones with all their details, one could use fuzzy distinctive features for phone classification (instead of binary variables), capturing generalizations across phone categories, in order to improve the detection of phones in speech recognition.

• Fuzzy syllabification - syllabification, the task of segmenting a sequence of phones into syllables, is extremely important in several tasks of both speech synthesis and recognition. One reason syllabification is a difficult computational task is that there is no completely agreed-upon definition of syllable boundaries. The most common syllabification methods are rule-based, and could possibly be improved if fuzzy rule-bases were used to deal with the uncertainty regarding syllable boundaries.

• Multipass decoding in speech recognition - use FS to improve or complement later recognition stages by using contexts or word associations, as humans do when they misunderstand a word due, for example, to a noisy environment: humans tend to consider which of the word hypotheses makes more sense given the sentence context.

• Speech emotion detection - FS make sense here, since emotions cannot be separated into crisp, separate classes.

• Speech synthesis - involves several steps where FS could be useful to give a speech synthesizer a more human-like character, especially in what concerns prosodics.

• Text authorship - very good preliminary results using FS have been presented recently [22].

• Fuzzy tagging of unknown words - usually, unknown words are tagged based on statistics about hapax legomena (words that have occurred only once), and need to use additional morphological information that could make use of fuzzy rule-bases (see the sketch after this list).

• Fusion and combination of features - needed in several SNLP problems, and can make good use of fuzzy rule-bases (and more), instead of probability estimators (and HMM) or maximum entropy models.
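As a sketch of the fuzzy tagging idea in the list above: graded, suffix-based evidence for POS classes, combined with a max (a fuzzy OR) over all matching rules. The membership degrees are invented; in practice they could be estimated, e.g., from hapax legomena counts in a tagged corpus:

```python
# Sketch of fuzzy tagging of unknown words: graded, suffix-based
# evidence for POS classes, combined with max (a fuzzy OR) over all
# matching rules. The membership degrees are invented; in practice
# they could be estimated from hapax legomena counts in a corpus.
SUFFIX_EVIDENCE = {
    "ly":   {"adverb": 0.9, "adjective": 0.3},
    "ing":  {"verb": 0.7, "noun": 0.4, "adjective": 0.4},
    "ness": {"noun": 0.9},
}

def fuzzy_pos(word):
    """Return graded memberships of an unknown word in POS classes."""
    memberships = {}
    for suffix, evidence in SUFFIX_EVIDENCE.items():
        if word.endswith(suffix):
            for tag, degree in evidence.items():
                memberships[tag] = max(memberships.get(tag, 0.0), degree)
    return memberships or {"noun": 0.5}   # weak default guess

print(fuzzy_pos("grumbling"))   # {'verb': 0.7, 'noun': 0.4, 'adjective': 0.4}
print(fuzzy_pos("flimsiness"))  # {'noun': 0.9}
```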
V. CONCLUSIONS

The use and applications of Fuzzy Sets in SNLP have seen a steady decline in the first decade of the 21st century, and this work has shown that nowadays FS are virtually unknown or unappealing for most of the researchers currently working and publishing in SNLP journals and conferences. This decline coincided with the publication of Jurafsky's textbook [24], and appears to be correlated with the rise in influence of machine learning methods in SNLP. Despite this fact, it is our belief that FS can have a relevant role in several areas of SNLP, as long as FS researchers make an effort to publish their works in the SNLP community. In order to accomplish this, researchers should focus on the assets that FS could bring to SNLP, and follow some important guidelines in what concerns the results they obtain.

REFERENCES

[1] Allen, J., "Natural Language Understanding", Benjamin/Cummings Pub. Co., 1995
[2] Bates, M., Weischedel, R., "Challenges in Natural Language Processing", Cambridge University Press, 1993
[3] Benesty, J., Sondhi, M.M., Huang, Y. (Eds.), "Springer Handbook of Speech Processing", Springer, 2008
[4] Bhagat, R., Leuski, A., Hovy, E., "Statistical shallow semantic parsing despite little training data", Proc. of the 9th International Workshop on Parsing Technology (Parsing '05), ACL, Stroudsburg, PA, USA, 186-187, 2005
[5] Bolshakov, I., Gelbukh, A., "Computational Linguistics: Models, Resources, Applications", IPN-UNAM-FCE, 2004
[6] Cao, J., Kubota, N., Liu, H., "A Two Stage Pattern Matching Method for Speaking Recognition of Partner Robots", Proceedings of the 2010 Fuzz-IEEE, WCCI 2010, Barcelona, Spain, 499-504, 2010
[7] Castellano, P., Sridharan, S., "A two stage fuzzy decision classifier for speaker identification", Speech Communication, Vol. 18 (2), pp. 139-149, 1996
[8] Chen, S.F., Rosenfeld, R., "A survey of smoothing techniques for ME models", IEEE Trans. Speech and Audio Processing, Vol. 8, pp. 37-50, 2000
[9] Chomsky, N., "Three models for the description of language", IRE Transactions on Information Theory, 2 (3), 113-124, 1956
[10] Creutz, M., Lagus, K., "Unsupervised models for morpheme segmentation and morphology learning", ACM Transactions on Speech and Language Processing (TSLP), Vol. 4 (1), ACM, 2007
[11] Cung, H.M., Normandin, Y., "Noise adaptation algorithms for robust speech recognition", Proceedings of SPAC 1992, ETRW on Speech Processing in Adverse Conditions, Cannes, France, 171-174, 1992
[12] De Mori, R., et al., "Inference of a knowledge source for the recognition of nasals in continuous speech", IEEE Trans. Acoust., Speech, Signal Processing, Vol. 27, pp. 538-549, 1979
[13] DeMichelis, P., et al., "Computer recognition of plosive sounds using contextual information", IEEE Trans. Acoust., Speech, Signal Processing, Vol. 31, pp. 359-377, 1983
[14] Earley, J., "An efficient context-free parsing algorithm", Commun. ACM 13 (2), 94-102, 1970
[15] Edmonds, P., Hirst, G., "Near-Synonymy and Lexical Choice", Computational Linguistics, Vol. 28 (2), 105-144, 2002

[16] El-Beltagy, S.R., Rafea, A., "An accuracy-enhanced light stemmer for Arabic text", ACM Transactions on Speech and Language Processing (TSLP), Vol. 4 (2), Article 2, ACM, 2011
[17] El-Fiqui, H., Petraki, E., Hussein, A., "A Computational Linguistic Approach for the Identification of Translator Stylometry using Arabic-English Text", Proceedings of the 2011 IEEE International Conference on Fuzzy Systems, Taipei, Taiwan, 2029-2045, 2011
[18] Fu, G., Wang, X., "Chinese Sentence-Level Sentiment Classification Based on Fuzzy Sets", Proceedings of COLING 2010, Beijing, China, 312-319, 2010
[19] Fujioka, R., et al., "Treatment of Fuzziness in Natural Language by Fuzzy Lingual System - FLINS", Proc. of the 5th IEEE International Conference on Fuzzy Systems and the 2nd International Fuzzy Engineering Symposium (IFES95 and FUZZ-IEEE95), Yokohama, Japan, Vol. 2, pp. 1045-1050, 1995
[20] Graca, J., Ganchev, K., Coheur, L., Pereira, F., Taskar, B., "Controlling Complexity in Part-of-Speech Induction", Journal of Artificial Intelligence Research, Vol. 41, 527-551, 2011
[21] Gransden, L.R., Beet, S.W., "Combining auditory representations using fuzzy sets", Proc. of ICSLP 94, 1047-1050, 1994
[22] Homem, N., Carvalho, J.P., "Authorship Identification and Author Fuzzy Fingerprints", Proc. of NAFIPS 2011 - 30th Annual Conference of the North American Fuzzy Information Processing Society, El Paso, TX, USA, 2011
[23] Jackson, P., Moulinier, I., "Natural Language Processing for Online Applications", John Benjamins Publishing Company, 2007
[24] Jurafsky, D., Martin, J., "Speech and Language Processing - An introduction to natural language processing, computational linguistics, and speech recognition", 2nd Edition, Prentice-Hall, 2009
[25] Kacprzyk, J., Wilbik, A., Zadrożny, S., "Linguistic summarization of time series using a fuzzy quantifier driven aggregation", Fuzzy Sets and Systems, 159 (12), 1485-1499, 2008
[26] Kacprzyk, J., Zadrozny, S., "Computing with Words and Systemic Functional Linguistics: Linguistic Data Summaries and Natural Language Generation", Proc. of IPMU, Advances in Intelligent and Soft Computing, Vol. 68, 23-36, 2010
[27] Kim, J-H., Kim, G-C., "Fuzzy network model for part-of-speech tagging under small training data", Natural Language Engineering, Vol. 2, pp. 95-110, 1996
[28] Kleene, S., "Representation of events in nerve nets and finite automata", in Shannon, C. and McCarthy, J. (Eds.), Automata Studies, pp. 3-41, Princeton University Press, Princeton, NJ, 1956
[29] Koo, M.W., et al., "A comparative study of speaker adaptation methods for HMM-based speech recognition", Proc. of ICSLP 90, 145-148, 1990
[30] Lai, L-F., et al., "Developing a Fuzzy Search Engine Based on Fuzzy Ontology and Semantic Search", Proceedings of the 2011 IEEE International Conference on Fuzzy Systems, Taipei, Taiwan, 2684-2689, 2011
[31] Langanke, U., "Language technology - Chomsky grammars and/or fuzzy sets?", Proc. of SAMI 2008, 6th International Symposium on Applied Machine Intelligence and Informatics, Kosice, Slovakia, 85-89, 2008
[32] Le Cerf, P., et al., "Multilayer perceptrons as labelers for hidden Markov models", IEEE Trans. Speech and Audio Processing, Vol. 2, pp. 185-193, 1994
[33] Ma, L., "Clarification on Linguistic Applications of Fuzzy Set Theory to Natural Language Analysis", Proc. of the 2011 International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), 811-815, 2011
[34] Minami, Y., et al., "On the robustness of HMM and ANN speech recognition algorithms", Proc. of ICSLP 90, 1345-1348, 1990
[35] Neviarouskaya, A., Prendinger, H., Ishizuka, M., "Affect Analysis Model: novel rule-based approach to affect sensing from text", Natural Language Engineering, Vol. 17, pp. 95-135, 2010
[36] Novak, V., "Fuzzy Sets in Natural Language Processing", in An Introduction to Fuzzy Logic Applications in Intelligent Systems, R.R. Yager et al. (Eds.), Kluwer Academic Publishers, 185-200, 1992
[37] Okamoto, W., Tano, S., Inoue, A., Fujioka, A., "A Generalized Four-Step Inference Method for Fuzzy Quantified and Truth-Qualified Natural Language Propositions", Proceedings of the 2010 Fuzz-IEEE, WCCI 2010, Barcelona, Spain, 191-198, 2010
[38] Rieger, B.B., "Feasible Fuzzy Semantics", Procs. of COLING 1978, 41-43, Bergen, Norway, 1978
[39] Rieger, B.B., "Fuzzy Structural Semantics", Proc. of the 3rd European Meeting on Cybernetics, Vienna, Austria, 1976
[40] Rieger, B.B., "Fuzzy Word Meaning Analysis and Representation in Linguistic Semantics. An Empirical Approach to the Reconstruction of Lexical Meanings in East- and West-German Newspaper Texts", Procs. of COLING 1980, 76-84, Tokyo, Japan, 1980
[41] Rieger, B.B., "Semantic Relevance and Aspect Dependency in a Given Subject Domain: Contents-driven algorithmic processing of fuzzy word meanings to form dynamic stereotype representations", Proc. of COLING 1984, 298-301, 1984
[42] Sharef, N., Shen, Y., "Text Fragment Extraction using Incremental Evolving Fuzzy Grammar Fragments Learning", Proceedings of the 2010 Fuzz-IEEE, WCCI 2010, Barcelona, Spain, 3026-3033, 2010
[43] Sleator, D., Temperley, D., "Parsing English with a Link Grammar", Carnegie Mellon University Computer Science technical report CMU-CS-91-196, October 1991
[44] Taylor, J., Raskin, V., "Understanding the Unknown: Unattested Input Processing in Natural Language", Proceedings of the 2011 IEEE International Conference on Fuzzy Systems, Taipei, Taiwan, 94-101, 2011
[45] Toledano, D.T., Gómez, L.H., "Local refinement of phonetic boundaries: a general framework and its application using different transition models", Proc. of EUROSPEECH 2001, Aalborg, Denmark, 1695-1698, 2001
[46] Tran, D., Le, T.V., Wagner, M., "Fuzzy Gaussian mixture models for speaker recognition", Proc. of ICSLP 98, Sydney, Australia, 1998
[47] Van Segbroeck, M., Van Hamme, H., "Advances in Missing Feature Techniques for Robust Large-Vocabulary Continuous Speech Recognition", IEEE Trans. on Audio, Speech, and Language Processing, Vol. 19 (1), 2011
[48] Way, A., Gough, N., "wEBMT: Developing and Validating an Example-Based Machine Translation System Using the World Wide Web", Computational Linguistics, 29 (3), 2003
[49] Weizenbaum, J., "ELIZA - A Computer Program for the Study of Natural Language Communication between Man and Machine", Communications of the ACM, 9 (1), 36-45, 1966
[50] White, E., Mazlack, L., "Discerning Suicide Notes Causality Using Fuzzy Cognitive Maps", Proceedings of the 2011 IEEE International Conference on Fuzzy Systems, Taipei, Taiwan, 2940-2947, 2011
[51] Woods, W.A., et al., "The lunar sciences natural language information system: Final report", Cambridge, Massachusetts: Bolt Beranek and Newman Inc., 1972
[52] Wu, C-H., Chen, Y-J., "Multi-keyword spotting of telephone speech using a fuzzy search algorithm and keyword-driven two-level CBSM", Speech Communication, Vol. 33 (3), pp. 197-212, 2001
[53] Wu, C-H., Yan, G-L., Lin, C-L., "Speech act modeling in a spoken dialog system using a fuzzy fragment-class Markov model", Speech Communication, Vol. 38 (1-2), pp. 183-199, 2002
[54] Wu, G-D., Lin, C-T., "Word boundary detection with mel-scale frequency bank in noisy environment", IEEE Trans. Speech and Audio Processing, Vol. 8, pp. 541-554, 2000
[55] Yager, R., "A new Approach to the Summarization of Data", Information Sciences, 28, 69-86, 1982
[56] Zadeh, L., "Precisiated Natural Language (PNL)", AI Magazine (AAAI), Vol. 25 (3), 2004
[57] Zadeh, L.A., "PRUF - A Meaning Representation Language for Natural Languages", Int. J. Man-Machine Studies, Vol. 10, 396-460, 1978
[58] Zadeh, L.A., "A Computational Approach to Fuzzy Quantifiers in Natural Languages", Comp. Math. with Applic., Vol. 9, 149-184, 1983
