A Morphological Analyzer for Japanese Nouns, Verbs and Adjectives∗ Yanchuan Sim Language Technologies Institute Carnegie Mellon University Pittsburgh, PA 15213
[email protected] April 26, 2013 Abstract We present an open source morphological analyzer for Japanese nouns, verbs and adjectives.1 The system builds upon the morphological analyzing capabilities of MeCab [Matsumoto et al., 1999] to incorporate finer details of classification such as politeness, tense, mood and voice attributes. We implemented our analyzer in the form of a finite state transducer using the open source finite state compiler FOMA toolkit [Hulden, 2009]. 1 Basic Information about Japanese The Japanese language is spoken by more than 100 million speakers (mostly in Japan). It is an agglutinative language with a SOV word order. The Japanese writing system makes extensive use of Chinese characters, also known as kanji, along with scripts, hiragana and katakana, which are syllabic. There are 3 main lexical classes in Japanese that exhibit morphology, they are the nouns, verbs and adjectives. In Japanese, adverbs are often adjectives suffixed with a special morpheme. As such, it is often not considered a separate class of words. In §4, we will discuss the phenomenon for each of these word classes. arXiv:1410.0291v2 [cs.CL] 16 Dec 2014 2 Past Work on the Japanese morphology In traditional Japanese morphological analysis, a lexicon is assumed. The lexicon is a list of pairs of a word and its corresponding part-of-speech. As Japanese is an unsegmented language, past work on Japanese language morphology analyzers require the use of a segmenter. Often, the segmentation step is conducted jointly with the morphological analysis step in a rule based [Matsumoto et al., 1991] or machine learning framework [Kudo et al., 2004, Matsumoto et al., 1999].