Automatic Conversion of Dialectal Tamil Text to Standard Written Tamil Text using FSTs Marimuthu K Sobha Lalitha Devi AU-KBC Research Centre, AU-KBC Research Centre, MIT Campus of Anna University, MIT Campus of Anna University, Chrompet, Chennai, India. Chrompet, Chennai, India.
[email protected] [email protected] oriented activities such as blogging, social media Abstract chats, and discussions in online forums. Given the unmediated nature of these services, users We present an efficient method to auto- conveniently share the contents in their native matically transform spoken language text languages in a more natural and informal way. to standard written language text for var- This has resulted in bringing together the con- ious dialects of Tamil. Our work is novel tents of various languages. More often these con- in that it explicitly addresses the problem tents are informal, colloquial, and dialectal in and need for processing dialectal and nature. The dialect is defined as a variety of a spoken language Tamil. Written language language that is distinguished from other varie- equivalents for dialectal and spoken lan- ties of the same language by features of phonol- guage forms are obtained using Finite ogy, grammar, and vocabulary and by its use by State Transducers (FSTs) where spoken a group of speakers who are set off from others language suffixes are replaced with ap- geographically or socially. The dialectal varia- propriate written language suffixes. Ag- tion refers to changes in a language due to vari- glutination and compounding in the re- ous influences such as geographic, social, educa- sultant text is handled using Conditional tional, individual and group factors.