Mathematical Model and Algorithm for Calculating Complex Words in the Karakalpak Language

Bulletin of TUIT: Management and Communication Technologies Volume 2 Issue 1 Article 2 6-30-2019 MATHEMATICAL MODEL AND ALGORITHM FOR CALCULATING COMPLEX WORDS IN THE KARAKALPAK LANGUAGE Shaxnoza Abidova Bulletin of TUIT: Management and Communication Technologies, [email protected] Follow this and additional works at: https://uzjournals.edu.uz/tuitmct Part of the Other Computer Sciences Commons Recommended Citation Abidova, Shaxnoza (2019) "MATHEMATICAL MODEL AND ALGORITHM FOR CALCULATING COMPLEX WORDS IN THE KARAKALPAK LANGUAGE," Bulletin of TUIT: Management and Communication Technologies: Vol. 2 , Article 2. Available at: https://uzjournals.edu.uz/tuitmct/vol2/iss1/2 This Article is brought to you for free and open access by 2030 Uzbekistan Research Online. It has been accepted for inclusion in Bulletin of TUIT: Management and Communication Technologies by an authorized editor of 2030 Uzbekistan Research Online. For more information, please contact [email protected]. “Bulletin of TUIT: Management and Communication Technologies” 1 UDC 510.5, 519.768.2 MATHEMATICAL MODEL AND ALGORITHM FOR CALCULATING COMPLEX WORDS IN THE KARAKALPAK LANGUAGE Nazirova E. Sh., Abidova Sh.B. Abstract. The article examines the morphology of the Karakalpak language, which belongs to the Kipchak group of the Turkic language family. The forms of word formation in the Karakalpak language, their sequences and the affixes added to the core are analyzed. On the basis of the analyzed affixes and suffixes, a complex mathematical model of word formation in the Karakalpak language was developed. On the basis of the developed mathematical model, an algorithm for creating a complex word in the Karakalpak language was developed. Using the developed mathematical model, a four-stage scheme was created for creating complex words of the Karakalpak language. Keywords: morphology, mathematical model, electronic translator, affixes. Introduction Machine translation (MT) is the process of translating written texts from one natural language into As a result of the development of modern another using a special computer program. Sometimes information technologies and communications, access such an appeal turns out to be useful, as it allows you to to multilingual information is provided. As the flow of quickly understand the main content of the text, but multilingual information expands and becomes more much more often, especially when working with complex, information processing, its quality and rapid scientific and technical literature, such translations do translation from one language to another remain one of not stand up to criticism [8]. the most pressing issues. It takes a lot of time and Machine translation can use a method based money to solve this problem with a dictionary. on linguistic rules, which means that words will be Therefore, the demand for electronic translators is translated in a linguistic way - the most suitable (orally growing. Especially in the current period of the speaking) words of the target language will replace pandemic, a dictionary or the help of a translator is those in the original language. needed so that people can translate the information, It is often argued that the success of machine scientific and fiction books they need at home from one translation requires the problem of understanding language to another. During a pandemic, it is natural language to be addressed first. impossible to turn to a translator, and the dictionary is This article is devoted to morphological time consuming. Therefore, the use of electronic analysis, the creation of complex words in the translators is advisable in all respects. Karakalpak language, a mathematical model and an In translation studies, more and more attention algorithm for translating complex words. is paid to the analysis of electronic means, which make First of all, the morphological analysis should it possible to speed up and optimize the translation be briefly described. process. Domestic and foreign scientists-linguists, Morphological analysis is a procedure as a practitioners and theorists-translation studies, result of which information about its internal structure especially specialists in the field of translation is obtained from the form of the external design of a terminology and machine translation, noting the word in the text. Today, there are several dozen increasing importance of information technologies in morphological analysis algorithms for different linguistics in general and in translation in particular, are languages. The main directions of morphological developing various strategies and methods for their analysis are: most effective application in professional activities [1- 1. Analysis by dividing a word form into a 6]. stem and an intended ending, followed by checking for The article by A.O. Kazennikov presents a compatibility. method for removing morphological homonymy. The 2. Morphological analysis of the final proposed method combines classical morphological combination of letters. analysis and allows to simultaneously solve the 3. Universal mathematical models that can be problems of lemmatization and restoration of used by the methods of morphology for morphological morphological features [7]. The task of lemmatization analysis, in the form of open systems of equations, is to reduce a word or word form to a lemma or normal allowing by calculations to carry out the normalization form. of word forms, obtaining grammatical information and the synthesis of word forms. Nazirova E.Sh., Abidova Sh.B. 2019, 1 (44) 2 “Bulletin of TUIT: Management and Communication Technologies” When translating words using electronic - problems of synthesis of construction of all translators, morphological analysis of words is word forms in normal form. performed first. Since words must be separated by root Most of the popular algorithms implement and affixes. To create a machine translator, it is lemmatization (reduction to normal form) using the important to know the morphology of the translated stem of the word, the stemming algorithm [15]. languages. Let's analyze the two most popular It should be noted that different languages lemmatization algorithms based on different principles have different semantic and grammatical features, so - Porter and Yandex. often algorithms successfully used to process one Porter's stemming algorithm was published in language show very low efficiency in another language. 1980 by Martin Porter for the English language [15]. The complexities of natural language processing, The main idea of Porter's algorithm is that there is a however, do not exclude the possibility of identifying limited number of form and word-forming affixes, and narrower problems that can be solved algorithmically: the stem of a word is transformed without using any for example, determining parts of speech or splitting bases (dictionaries) of stems: only a lot of existing texts into logical groups. However, some features of affixes (while complex compound affixes are broken natural languages significantly reduce the effectiveness into simple ones) and manually set regulations. of these solutions. Thus, for example, taking into The fact that Porter's algorithm does not use account all word forms for each word in the any dictionaries and base bases is a plus for Karakalpak language increases the complexity of text performance and a range of applications (it does a good processing by an order of magnitude. Note that the job with non-existent words), and at the same time a Karakalpak language, which is part of a large group of minus in terms of the accuracy of the base selection. In Turkic languages, refers to an agglutinative language. addition, the human factor is often attributed to the The main feature of languages of the disadvantages of Porter's algorithm: the fact that the agglutinative type is that the forms of independent rules for checking are set manually and are sometimes words are formed with the help of unambiguous affixes associated with the grammatical features of the freely attached to the original form. The term ag-glu- language increases the likelihood of error [13, 15]. tinatio etymologically means "sticking, sticking". The Yandex algorithm (Mystem) is the The essential features of agglutinative development of Ilya Segalovich (Yandex, 1998). This languages include: morphological analysis algorithm is a dictionary one. • transparency of the syntagmatic structure The main feature of the algorithm is that for a word of the word, free articulation into morphemes; form that is not described in the dictionary or a non- • axial (axial) nature of the paradigmatic existent word, the algorithm generates its hypothetical structure, free constructability of word forms; model of the word change - one or more variants of the • the linear character of the word, the normal form of the word then, replenishing the coincidence of the stem with the root and with any dictionary with new tokens, the generated hypothetical word form that serves to build more complex word entries can be saved in this dictionary (or in another forms in terms of the number of grammatical meanings. dictionary of the same type) for further use [15]. Agglutinative affixes are characterized by the The advantages of this algorithm are that for following features: each variant of the normal form it offers all the • unambiguity: each affix usually expresses grammatical information (synthesized for non-existent one category; words), this data can be used in the future to select one • standard: the affix

Load more