Development of a Micro Telugu Opinion Wordnet and Aligning with TELOWN Ontology for Automatic Recognition of Opinion Words from Telugu Documents


INTERNATIONAL JOURNAL OF RESEARCH ISSN NO : 2236-6124

Benarji Tharini1, Dr. Vishnu Vardhan Bulusu2
1 Research Scholar, Rayalaseema University, Kurnool, AP, India. [email protected]
2 Professor in Department of CSE, Manthany JNTUH, TS, India. [email protected]

Abstract: With the advent of the Unicode standard, documents in Indian languages have emerged on the web in the recent past. Indian language content and its accessibility nevertheless remain minimal as far as linguistic processing is concerned. Unicode-based language tools are being created in order to prepare language-specific repositories and online dictionaries. Constructing wordnets and language constructs, and reasoning over semantically rich lexical synsets, is useful for linguistic processing in the Indian context. Over the last two decades the research world has been moving towards the construction of semantic models. To attain semantically rich linguistic models, a start must be made on creating a rich knowledge base. With this aim, a micro Telugu opinion wordnet is created and mapped to the Telugu Opinion WordNet Ontology (TELOWN), which holds semantic knowledge about positive and negative Telugu opinion words. The objective is to create an opinion wordnet in Telugu, together with its synsets, for the automatic recognition of opinion words in Telugu documents. SPARQL is used as the query language for retrieval at the backend.

Keywords: semantic web, ontology, Telugu, WordNet, opinion words, SPARQL

I. Introduction

In a multilingual nation like India, translation between Indian languages, and between English and Indian languages, is an essential task.
Equally essential is the task of cross-lingual search, where the query is made in an Indian language and documents are retrieved in English or Telugu (see Figure 1). All of these activities depend on lexical knowledge of high quality and coverage. This lexical knowledge takes the form of machine-readable dictionaries, ontologies (hierarchical organizations of concepts) and wordnets (large graph-like structures of words).

Volume 7, Issue VI, JUNE/2018. Page No:197

[Figure 1: Cross Lingual Search. A query in an Indian language (IL) passes through input processing, then search and document retrieval in English, in Telugu or in the IL, followed by optional translation of the output (English to IL, Telugu to IL), so that the output is delivered in the IL.]

The vast majority of the information on the World Wide Web is encoded as natural language text intended for people and is difficult for machines to understand. With the Internet boom of recent years, large volumes of unstructured text in different languages and forms are being added to data stores on a daily basis. With the advent of Unicode, this phenomenon has been observed in recent years for texts in Indian languages such as Telugu, Tamil and Bengali [1]. In general, these languages are poor in terms of the availability of established corpora, natural language processing tools and so on, and they have therefore become an important area of research in the Indian community.

Over the last two decades the world has seen huge growth in the Web content of Indian languages, which has made people comfortable using their native language online.
In particular, over the last few years there has been a huge increase in Telugu content on the web. Telugu is reported to be the fifth most widely spoken language, with 250 million speakers across the world, the majority of whom are from India [2]. To process content in a native language for meaningful information retrieval, a language-specific WordNet is required. WordNet [3] has emerged as a valuable resource for Natural Language Processing applications on English documents. Following the English WordNet, wordnets have been built for many languages of the world. IndoWordNet [4] was the first such effort for Indian languages.

Wordnets are lexical structures composed of synsets and semantic relations. Synsets are sets of synonyms. They are linked by semantic relations such as hypernymy (is-a), meronymy (part-of) and so on. Wordnets have emerged as crucial resources for Natural Language Processing (NLP). The first wordnet in the world was built for English at Princeton University. Wordnets for European languages then followed in EuroWordNet. Since 2000, wordnets for a number of Indian languages have been under construction, led by the IndoWordNet effort at the Indian Institute of Technology Bombay (IITB).

Opinionated content in Telugu is important to analyze for use by industry and government. Automatic reasoning over such natural language documents by a machine requires the support of a Telugu WordNet. When such a lexical resource is integrated with the concepts of an ontology, opinion words in Telugu documents can be recognized automatically and effectively.

II. Related Work

A good amount of research has been done on determining the orientation of opinion words in the Telugu language. The development of lexical resources for both traditional information retrieval and opinion mining tasks is the first step in this research.
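To make the idea of an opinion-oriented lexical resource more concrete, the following is a minimal sketch of how synsets carrying a polarity label might be stored and used to spot opinion words in a sentence. It is not the TELOWN implementation: the class name `OpinionSynset`, the identifiers, the two sample synsets and the whitespace tokenization are all assumptions made for illustration, and a real system would need Telugu morphological analysis, since inflected forms will not match a plain dictionary lookup.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class OpinionSynset:
    """A set of near-synonymous opinion words sharing one polarity (hypothetical structure)."""
    synset_id: str
    polarity: str              # "positive" or "negative"
    words: FrozenSet[str]


# Illustrative entries only; these are not taken from the paper's wordnet.
MICRO_WORDNET: List[OpinionSynset] = [
    OpinionSynset("pos-001", "positive", frozenset({"మంచి", "బాగుంది"})),
    OpinionSynset("neg-001", "negative", frozenset({"చెడు", "చెడ్డ"})),
]


def build_index(synsets: List[OpinionSynset]) -> Dict[str, OpinionSynset]:
    """Map every member word to the synset that contains it."""
    index: Dict[str, OpinionSynset] = {}
    for synset in synsets:
        for word in synset.words:
            index[word] = synset
    return index


def recognise_opinion_words(
    text: str, index: Dict[str, OpinionSynset]
) -> List[Tuple[str, str, str]]:
    """Return (word, polarity, synset_id) for each token found in the index."""
    hits: List[Tuple[str, str, str]] = []
    for token in text.split():
        token = token.strip(".,!?;:\"'()")   # crude punctuation stripping
        synset = index.get(token)
        if synset is not None:
            hits.append((token, synset.polarity, synset.synset_id))
    return hits


if __name__ == "__main__":
    idx = build_index(MICRO_WORDNET)
    for hit in recognise_opinion_words("ఈ సినిమా బాగుంది.", idx):
        print(hit)
```

Indexing by member word keeps the lookup constant-time per token; the synset identifier returned with each hit is the kind of key that could later be aligned with a concept in an ontology.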
IndoWordNet is a linked lexical knowledge base of wordnets of 18 scheduled languages of India, namely Assamese, Bangla, Bodo, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Meitei (Manipuri), Marathi, Nepali, Odia, Punjabi, Sanskrit, Tamil, Telugu and Urdu. The project took off in 2000 with the Hindi WordNet being created by the Natural Language Processing group at the Center for Indian Language Technology (CFILT) in the Computer Science and Engineering Department at IIT Bombay [5]. It was made publicly available in 2006 under the GNU license. The Hindi WordNet was created with support from the TDIL project of the Ministry of Communication and Information Technology, India, and also partially from the Ministry of Human Resources Development, India.

The wordnets follow the principles of minimality, coverage and replaceability for the synsets. That is, there should be at least a 'core' set of lexemes in the synset that uniquely gives the concept represented by the synset (minimality), e.g., {house, family} standing for the concept of 'family' ("she is from a noble house"). Then the synset should cover all the words representing the concept in the language (coverage), e.g., the word 'ménage' will have to appear in the 'family' synset, albeit towards the end of the synset, since its usage is rare. Finally, the words towards the beginning of the synset should be able to replace one another in a reasonable amount of corpora (replaceability), e.g., 'house' and 'family' can replace each other in the sentence "she is from a noble house".

IndoWordNet is highly similar to EuroWordNet. However, the pivot language is Hindi, which, of course, is linked to the English WordNet. Typical Indian language phenomena such as complex predicates and causative verbs are also captured in IndoWordNet. IndoWordNet is publicly browsable.
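The three synset principles can be illustrated with a small sketch. The code below is only an assumed representation, not the IndoWordNet tooling: a synset is modeled as a list ordered from most to least frequent member, the "core" is taken to be its first `core_size` words, and the helper names (`core`, `missing_words`, `replacement_candidates`) are hypothetical. Replaceability is ultimately a human judgment over a corpus, so the last function only generates the substituted sentences a lexicographer would then inspect.

```python
from typing import List, Set

# The 'family' synset from the example above, most frequent members first.
FAMILY_SYNSET: List[str] = ["house", "family", "ménage"]


def core(synset: List[str], core_size: int = 2) -> List[str]:
    """Minimality: the leading words that identify the concept."""
    return synset[:core_size]


def missing_words(synset: List[str], known_words: Set[str]) -> Set[str]:
    """Coverage: words known to express the concept but absent from the synset."""
    return set(known_words) - set(synset)


def replacement_candidates(
    sentence: str, synset: List[str], core_size: int = 2
) -> List[str]:
    """Replaceability: sentences obtained by swapping one core word for another."""
    tokens = sentence.split()
    core_words = core(synset, core_size)
    variants: List[str] = []
    for i, token in enumerate(tokens):
        if token in core_words:
            for other in core_words:
                if other != token:
                    variants.append(" ".join(tokens[:i] + [other] + tokens[i + 1:]))
    return variants


if __name__ == "__main__":
    print(core(FAMILY_SYNSET))
    print(missing_words(FAMILY_SYNSET, {"house", "family", "ménage"}))
    print(replacement_candidates("she is from a noble house", FAMILY_SYNSET))
```

Keeping the synset as an ordered list rather than an unordered set preserves the convention that rare members such as 'ménage' sit towards the end.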
The Indian language wordnet building efforts that form the subcomponents of the IndoWordNet project are the North East WordNet project, the Dravidian WordNet project and the Indradhanush project, all of which are funded by the TDIL project. Wordnets of other languages of India then followed suit. The large nationwide project of building Indian language wordnets was called the IndoWordNet project. The wordnets are being created using the expansion approach from the Hindi WordNet. The Hindi WordNet was created from first principles (minimality, coverage and replaceability) and was the first wordnet for an Indian language. The method adopted was the same as for the Princeton WordNet for English. The Polish WordNet is being mapped to the Princeton WordNet based on the strategy followed by IndoWordNet [6].

[Figure: Telugu word response on IndoWordNet]

Amitava Das and Bandyopadhyay created a SentiWordNet for the Bengali language [6]; 35,805 Bengali entries were reported from their experiment. Joshi et al. [7] created a SentiWordNet for an Indian language, the Hindi SentiWordNet (H-SWN), using the English SentiWordNet and English-Hindi WordNet mappings. Bakliwal et al. [8] built a Hindi subjective lexicon for Hindi text polarity classification. They developed this lexicon from Hindi adjectives and adverbs and their polarity scores.

The research work on linking WordNet with ontology is motivated by automated reasoning over natural language resources. The benefits of linking WordNet with ontology are manifold [10]. These are: (I) The formal specifications of the ontology are possible to