Grammatical Disambiguation in the Tatar Language Corpus

Bulat Khakimov, Rinat Gilmullin, Ramil Gataullin
Research Institute of Applied Semiotics of the Tatarstan Academy of Sciences, Kazan Federal University, Kazan, Russia
Abstract

This article concerns the corpus-oriented study of the most frequent types of grammatical homonymy in the Tatar language and the possibilities for automating the disambiguation process in the corpus. The authors determine the relevance of alternative parses generated in the process of automatic morphological analysis in terms of real linguistic ambiguity. This work presents a variant of classification of frequent homoforms and methods for their disambiguation, and it estimates the potential impact on the corpus.

Keywords: linguistic corpus, Tatar language, grammatical homonymy, homoform, disambiguation

1 Introduction

The problem of grammatical ambiguity and its resolution is one of the most pressing problems in modern computer and corpus linguistics [1].

... corpus was developed [4]. Research on the specification and improvement of the metalanguage for the description of a Tatar wordform is currently being carried out [5]. The general conception of the corpus is presented in [6]. To implement grammatical disambiguation in the Tatar National Corpus, the developers have conducted a study of the contextual constraints of different types of grammatical homonyms, involving statistical corpus data, and suggested methods of automatic grammatical disambiguation for the Tatar language.

2 Statistical Characteristics of the Corpus

At the initial stage of work we obtained the statistical data on the frequency of wordforms with alternative parses, presented in Table 1, from the database of texts of the Tatar National