Gujarati Language: Research Issues, Resources and Proposed Method on Word Sense Disambiguation
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
An Efficient Database Design for Indowordnet Development Using
An Efficient Database Design for IndoWordNet Development Using Hybrid Approach Venkatesh Prabhu2 Shilpa Desai1 Hanumant Redkar1 N eha Prabhugaonkar1 Apurva N agvenkar1 Ramdas Karmali1 (1) GOA UNIVERSITY, Taleigao - Goa (2) THYWAY CREATIONS, Mapusa - Goa [email protected], [email protected], [email protected], [email protected], [email protected], [email protected] ABSTRACT WordNet is a crucial resource that aids in Natural Language Processing (NLP) tasks such as Machine Translation, Information Retrieval, Word Sense Disambiguation, Multi-lingual Dictionary creation, etc. The IndoWordNet is a multilingual WordNet which links WordNets of different Indian languages on a common identification number given to each concept. WordNet is designed to capture the vocabulary of a language and can be considered as a dictionary cum thesaurus and much more. WordNets for some Indian Languages are being developed using expansion approach. In this paper we have discussed the details and our experiences during the evolution of this database design while working on the Indradhanush WordNet Project. The Indradhanush WordNet Project is working on the development of WordNets for seven Indian languages. Our database design gives an efficient plan for storage of WordNet data for all languages. In addition it extends the design to hold specific concepts for a language. KEYWORDS: WordNet, IndoWordNet, synset, database design, expansion approach, semantic relation, lexical relation. Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing (SANLP), pages 229–236, COLING 2012, Mumbai, December 2012. 229 1 Introduction 1.1 WordNet and its storage methods WordNet (Miller, 1993) maintains the concepts in a language, relations between concepts and their ontological details. -
Exploring Resources in Word Sense Disambiguation for Marathi Language Amit Patil1, Chhaya Patil2, Dr
www.rspsciencehub.com Volume 02 Issue 10S October 2020 Special Issue of First International Conference on Advancements in Management, Engineering and Technology (ICAMET 2020) Exploring Resources in Word Sense Disambiguation for Marathi Language Amit Patil1, Chhaya Patil2, Dr. Rakesh Ramteke3, Dr. R. P. Bhavsar4, Dr. Hemant Darbari5 1,2 Assistant Professor, Department of Computer Application, RCPET’s IMRD, Maharashtra, India 3,4Professor, School of Computer Sciences, KBC North Maharashtra University, Maharashtra, India 5Director General, Centre for Development of Advanced Computing (C-DAC), Maharashtra, India Abstract Word Sense Disambiguation (WSD) is one of the most challenging problems in the research area of natural language processing. To find the correct sense of the word in a particular context is called Word Sense Disambiguation. As a human, we can get a correct sense of the word given in the sentence because of word knowledge of that particular natural language, but it is not an easy task for the machine to disambiguate the word. Developing any WSD system, it required sense repository and sense dictionary. It is very costly and time-consuming to build these resources. Many foreign languages have available these resources, that is why most of the foreign languages like English, German, Spanish etc lot of work is done in these Natural languages. When we look for Indian languages like Hindi, Marathi, Bengali etc. very less work is done. The reason behind this is resource-scarcity. In this paper, we majorly focus on Marathi Language Word Sense Disambiguation because of very less work is done in the Marathi Language as compared to Hindi and other Indian Languages. -
The Rise of Dalit Peasants Kolhi Activism in Lower Sindh
The Rise of Dalit Peasants Kolhi Activism in Lower Sindh (Original Thesis Title) Kolhi-peasant Activism in Naon Dumbālo, Lower Sindh Creating Space for Marginalised through Multiple Channels Ghulam Hussain Mahesar Quaid-i-Azam University Department of Anthropology ii Islamabad - Pakistan Year 2014 Kolhi-Peasant Activism in Naon Dumbālo, Lower Sindh Creating Space for Marginalised through Multiple Channels Ghulam Hussain Thesis submitted to the Department of Anthropology, Quaid-i-Azam University Islamabad, in partial fulfillment of the degree of ‗Master of Philosophy in Anthropology‘ iii Quaid-i-Azam University Department of Anthropology Islamabad - Pakistan Year 2014 Formal declaration I hereby, declare that I have produced the present work by myself and without any aid other than those mentioned herein. Any ideas taken directly or indirectly from third party sources are indicated as such. This work has not been published or submitted to any other examination board in the same or a similar form. Islamabad, 25 March 2014 Mr. Ghulam Hussain Mahesar iv Final Approval of Thesis Quaid-i-Azam University Department of Anthropology Islamabad - Pakistan This is to certify that we have read the thesis submitted by Mr. Ghulam Hussain. It is our judgment that this thesis is of sufficient standard to warrant its acceptance by Quaid-i-Azam University, Islamabad for the award of the degree of ―MPhil in Anthropology‖. Committee Supervisor: Dr. Waheed Iqbal Chaudhry External Examiner: Full name of external examiner incl. title Incharge: Dr. Waheed Iqbal Chaudhry v ACKNOWLEDGEMENT This thesis is the product of cumulative effort of many teachers, scholars, and some institutions, that duly deserve to be acknowledged here. -
Proposal for a Gujarati Script Root Zone Label Generation Ruleset (LGR)
Proposal for a Gujarati Root Zone LGR Neo-Brahmi Generation Panel Proposal for a Gujarati Script Root Zone Label Generation Ruleset (LGR) LGR Version: 3.0 Date: 2019-03-06 Document version: 3.6 Authors: Neo-Brahmi Generation Panel [NBGP] 1 General Information/ Overview/ Abstract The purpose of this document is to give an overview of the proposed Gujarati LGR in the XML format and the rationale behind the design decisions taken. It includes a discussion of relevant features of the script, the communities or languages using it, the process and methodology used and information on the contributors. The formal specification of the LGR can be found in the accompanying XML document: proposal-gujarati-lgr-06mar19-en.xml Labels for testing can be found in the accompanying text document: gujarati-test-labels-06mar19-en.txt 2 Script for which the LGR is proposed ISO 15924 Code: Gujr ISO 15924 Key N°: 320 ISO 15924 English Name: Gujarati Latin transliteration of native script name: gujarâtî Native name of the script: ગજુ રાતી Maximal Starting Repertoire (MSR) version: MSR-4 1 Proposal for a Gujarati Root Zone LGR Neo-Brahmi Generation Panel 3 Background on the Script and the Principal Languages Using it1 Gujarati (ગજુ રાતી) [also sometimes written as Gujerati, Gujarathi, Guzratee, Guujaratee, Gujrathi, and Gujerathi2] is an Indo-Aryan language native to the Indian state of Gujarat. It is part of the greater Indo-European language family. It is so named because Gujarati is the language of the Gujjars. Gujarati's origins can be traced back to Old Gujarati (circa 1100– 1500 AD). -
Parsi Theater, Urdu Drama, and the Communalization of Knowledge: a Bibliographic Essay
Parsi Theater, Urdu Drama, and the Communalization of Knowledge: A Bibliographic Essay I its remarkable century-long history traversing the colonial and nation- alist eras, the Parsi theater was unique as a site of communal harmony. The Parsi theater began in Bombay in the early s and fanned out across South and Southeast Asia by the s. During the twentieth cen- tury, major Parsi theatrical companies flourished in Lahore, Delhi, and Calcutta, exerting a huge impact on the development of modern drama, regional music, and the cinema. Parsis, Hindus, Muslims, Anglo-Indians, and Baghdadi Jews consorted amicably in both residential and traveling companies. Although company ownership usually remained in Parsi hands, actors were drawn from many communities, as were professional writers, musicians, painters, stage hands, and other personnel. As Såmn≥t^ Gupta makes clear, it was Parsis, non-Parsis, Hindus, Muslims, and Christians who spread the art of theatre by founding theatrical companies, who built playhouses and encouraged drama, who became actors and popularized the art of acting, who composed innumerable dramas in Gujarati, Hindi, and Urdu, who composed songs and defended classical music, and who wrote descriptions of the Parsi stage and related matters.1 Audiences similarly were heterogeneous, comprised of diverse relig- ious, ethnic, and linguistic groups and representing a wide range of class 1Såmn≥t^ Gupta, P≥rsµ T^iy®ªar: Udb^av aur Vik≥s (Allahabad: Låkb^≥ratµ Prak≥shan, ), dedication (samarpan), p. • T A U S positions. Sections of the public were catered to by particular narrative genres, including the Indo-Muslim fairy romance, the Hindu mythologi- cal, and the bourgeois social drama, yet no genre was produced exclu- sively for a particular viewership. -
WWDS Apis: Application Programming Interfaces for Efficient Manipulation of World Wordnet Database Structure
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) WWDS APIs: Application Programming Interfaces for Efficient Manipulation of World WordNet Database Structure Hanumant Redkar1, Sudha Bhingardive1, Kevin Patel1, Pushpak Bhattacharyya1 Neha Prabhugaonkar2, Apurva Nagvenkar2, Ramdas Karmali2 1Indian Institute of Technology Bombay, Mumbai, India 2Goa University, Goa, India {hanumantredkar, bhingardivesudha, kevin.svnit, pushpakbh}@gmail.com {nehapgaonkar.1920, apurv.nagvenkar, ramdas.karmali}@gmail.com Abstract example, developers can potentially extract information WordNets are useful resources for natural language from other WordNets through WWDS and its APIs that is processing. Various WordNets for different languages have missing in their source WordNet. The WWDS and WWDS been developed by different groups. Recently, World APIs are explained in the following sections. WordNet Database Structure (WWDS) was proposed by Redkar et. al (2015) as a common platformm to store these different WordNets. However, it is underutilized due to lack World WordNet Database Structure of programming interface. In this paper, we present WWDS APIs, which are designed to address this shortcoming. These WWDS is an efficient storage mechanism which uses WWDS APIs, in conjunction with WWDS, act as a wrapper that enables developers to utilize WordNets without multiple databases to accommodate different WordNets. Its worrying about the underlying storage structure. The APIs design is based on IndoWordNet database structure are developed in PHP, Java, and Python, as they are the (Prabhu et al., 2012). The language independent preferred programming languages of most developers and information such as semantic relations, ontology details, researchers working in language technologies. These APIs etc. is stored in a single master database named can help in various applications like machine translation, word sense disambiguation, multilingual information wordnet_master. -
Rita Kothari.P65
NMML OCCASIONAL PAPER PERSPECTIVES IN INDIAN DEVELOPMENT New Series 47 Questions in and of Language Rita Kothari Humanities and Social Sciences Department, Indian Institute of Technology, Gandhinagar, Gujarat Nehru Memorial Museum and Library 2015 NMML Occasional Paper © Rita Kothari, 2015 All rights reserved. No portion of the contents may be reproduced in any form without the written permission of the author. This Occasional Paper should not be reported as representing the views of the NMML. The views expressed in this Occasional Paper are those of the author(s) and speakers and do not represent those of the NMML or NMML policy, or NMML staff, fellows, trustees, advisory groups, or any individuals or organizations that provide support to the NMML Society nor are they endorsed by NMML. Occasional Papers describe research by the author(s) and are published to elicit comments and to further debate. Questions regarding the content of individual Occasional Papers should be directed to the authors. NMML will not be liable for any civil or criminal liability arising out of the statements made herein. Published by Nehru Memorial Museum and Library Teen Murti House New Delhi-110011 e-mail : [email protected] ISBN : 978-93-83650-63-7 Price Rs. 100/-; US $ 10 Page setting & Printed by : A.D. Print Studio, 1749 B/6, Govind Puri Extn. Kalkaji, New Delhi - 110019. E-mail : [email protected] NMML Occasional Paper Questions in and of Language* Rita Kothari Sorathgada sun utri Janjhar re jankaar Dhroojegadaanrakangra Haan re hamedhooje to gad girnaar re… (K. Kothari, 1973: 53) As Sorath stepped out of the fort Not only the hill in the neighbourhood But the walls of Girnar fort trembled By the sweet twinkle of her toe-bells… The verse quoted above is one from the vast repertoire of narrative traditions of the musician community of Langhas. -
FACTORS AFFECTING PROFICIENCY AMONG GUJARATI HERITAGE LANGUAGE LEARNERS on THREE CONTINENTS a Dissertation Submitted to the Facu
FACTORS AFFECTING PROFICIENCY AMONG GUJARATI HERITAGE LANGUAGE LEARNERS ON THREE CONTINENTS A Dissertation submitted to the Faculty of the Graduate School of Arts and Sciences of Georgetown University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Linguistics By Sheena Shah, M.S. Washington, DC May 14, 2013 Copyright 2013 by Sheena Shah All Rights Reserved ii FACTORS AFFECTING PROFICIENCY AMONG GUJARATI HERITAGE LANGUAGE LEARNERS ON THREE CONTINENTS Sheena Shah, M.S. Thesis Advisors: Alison Mackey, Ph.D. Natalie Schilling, Ph.D. ABSTRACT This dissertation examines the causes behind the differences in proficiency in the North Indian language Gujarati among heritage learners of Gujarati in three diaspora locations. In particular, I focus on whether there is a relationship between heritage language ability and ethnic and cultural identity. Previous studies have reported divergent findings. Some have found a positive relationship (e.g., Cho, 2000; Kang & Kim, 2011; Phinney, Romero, Nava, & Huang, 2001; Soto, 2002), whereas others found no correlation (e.g., C. L. Brown, 2009; Jo, 2001; Smolicz, 1992), or identified only a partial relationship (e.g., Mah, 2005). Only a few studies have addressed this question by studying one community in different transnational locations (see, for example, Canagarajah, 2008, 2012a, 2012b). The current study addresses this matter by examining data from members of the same ethnic group in similar educational settings in three multi-ethnic and multilingual cities. The results of this study are based on a survey consisting of questionnaires, semi-structured interviews, and proficiency tests with 135 participants. Participants are Gujarati heritage language learners from the U.K., Singapore, and South Africa, who are either current students or recent graduates of a Gujarati School. -
Religious and Social Life of Religious Minorities
RELIGIOUS AND SOCIAL LIFE OF RELIGIOUS MINORITIES A CASE STUDY OF BAHÁ’Í AND PARSI COMMUNITIES OF PAKISTAN Abdul Fareed 101-FU/PhD/F08 DEPARTMENT OF COMPARATIVE RELIGION FACULTY OF ISLAMIC STUDIES, INTERNATIONAL ISLAMIC UNIVERSITY ISLAMABAD RELIGIOUS AND SOCIAL LIFE OF RELIGIOUS MINORITIES A CASE STUDY OF BAHÁ’Í AND PARSI COMMUNITIES OF PAKISTAN A thesis submitted in partial fulfillment of the requirements for the degree of Doctorate of Philosophy (PhD) in Comparative Religion By Abdul Fareed Registration no. 101-FU/PhD/F08 Under the Supervision of Dr. Muhammad Imtiaz Zafar DEPARTMENT OF COMPARATIVE RELIGION FACULTY OF ISLAMIC STUDIES, INTERNATIONAL ISLAMIC UNIVERSITY ISLAMABAD ١ذو القعدة ١٤١٦ من الهجرة /Submitted on: August17, 2015 C.E Statement of Undertaking I Abdul Fareed Reg. No. 101/FU/PHD/F-08 and student of Ph.D. Comparative Religion, Faculty of Islamic Studies, International Islamic University Islamabad do hereby solemnly declare that the thesis entitled ‘ Religious and Social Life of the Religious Minorities: A case Study of Bahá’í and Parsi Communities of Pakistan’ submitted by me in partial fulfillment of the requirements for the Ph.D. is my original work, except where otherwise acknowledge in the text, and has not been submitted or published earlier and so not in future, be submitted by me for any degree this University or institution. Abdul Fareed APPROVAL It is certified that Mr. Abdul Fareed s/o Abdul Raheem Reg.No.101-FU/PhD/F08 has successfully defended his thesis titled: Religious and Social Life of the Religious Minorities: A case Study of Bahá’í and Parsi Communities of Pakistan in viva-voce examination held in the Department of Comparative Religion, Faculty of Islamic Studies( Usuluddin) , International Islamic University, Islamabad. -
Comparative Study on Currently Available Wordnets
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 10 (2018) pp. 8140-8145 © Research India Publications. http://www.ripublication.com Comparative Study on Currently Available WordNets Sreedhi Deleep Kumar Reshma E U PG Scholar PG Scholar Department of Computer Science and Engineering Department of Computer Science and Engineering Vidya Academy of Science and Technology Vidya Academy of Science and Technology Thrissur, India. Thrissur, India. Sunitha C Amal Ganesh Associate Professor Assistant Professor Department of Computer Science and Engineering Department of Computer Science and Engineering Vidya Academy of Science and Technology Vidya Academy of Science and Technology Thrissur, India. Thrissur, India. Abstract SinoTibetan, Tibeto-Burman and Austro-Asiatic. The major ones are the Indo-Aryan, spoken by the northern to western WordNet is an information base which is arranged part of India and Dravidian, spoken by southern part of India. hierarchically in any language. Usually, WordNet is The Eighth Schedule of the Indian Constitution lists 22 implemented using indexed file system. Good WordNets languages, which have been referred to as scheduled available in many languages. However, Malayalam is not languages and given recognition, status and official having an efficient WordNet. WordNet differs from the encouragement. dictionaries in their organization. WordNet does not give pronunciation, derivation morphology, etymology, usage notes, A Dictionary can be called as are source dealing with the or pictorial illustrations. WordNet depicts the semantic relation individual words of a language along with its orthography, between word senses more transparently and elegantly. In this pronunciation, usage, synonyms, derivation, history, work, a general comparison of currently browsable WordNets etymology, etc. -
Pandemic, Law,And Indigenous Languages in Pakistan
Vol. 11 No. 01 2021 p-ISSN 2202-2821 e-ISSN 1839-6518 (Australian ISSN Agency) 82801101202103 PANDEMIC, LAW, AND INDIGENOUS LANGUAGES IN PAKISTAN Muhammad Hassan Abbasi Bahria University, Karachi Campus, Pakistan Maya Khemlani David Asia Europe Institute, University of Malaya, Malaysia ABSTRACT – Pakistan is a multilingual state with 74 languages (Siddiqui, 2019), with Urdu being its national language while English is its official language (Article 251 of the Constitution of the Islamic Republic of Pakistan). However, the linguistic diversity, as per the law, has not been given proper status in Pakistan (Rahman, 2002). In the wake of Covid-19 pandemic, the role of medical health professionals, local police officers, media persons and educationists to create an awareness about the precautionary measures to fight Covid-19 among the indigenous communities in different regions of Pakistan is important. However, there is no practice prescribed in the law, to disseminate awareness in the local languages. Moreover, as most of the lexical items regarding the pandemic have been borrowed, the shift to local languages is more than challenging. In urban areas, indigenous communities are aware of the precautions to be taken during this pandemic as they use the mainstream languages (Ali, 2017 & Abbasi, 2019.) However, in the rural and northern areas of Pakistan this is not so prevalent. Some language activists and concerned members of the community in different parts of the state took this opportunity to educate the masses and started an awareness campaign about coronavirus pandemic in local languages (posters in local languages and short video messages on social media and YouTube). -
Improving Semantic Similarity with Cross-Lingual Resources: a Study in Bangla—A Low Resourced Language
informatics Article Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla—A Low Resourced Language Rajat Pandit 1,* , Saptarshi Sengupta 2, Sudip Kumar Naskar 3, Niladri Sekhar Dash 4 and Mohini Mohan Sardar 5 1 Department of Computer Science, West Bengal State University, Kolkata 700126, India 2 Department of Computer Science, University of Minnesota Duluth, Duluth, MN 55812, USA; [email protected] 3 Department of Computer Science & Engineering , Jadavpur University, Kolkata 700032, India; [email protected] 4 Linguistic Research Unit, Indian Statistical Institute, Kolkata 700108, India; [email protected] 5 Department of Bengali, West Bengal State University, Kolkata 700126, India; [email protected] * Correspondence: [email protected] Received: 17 February 2019; Accepted: 20 April 2019; Published: 5 May 2019 Abstract: Semantic similarity is a long-standing problem in natural language processing (NLP). It is a topic of great interest as its understanding can provide a look into how human beings comprehend meaning and make associations between words. However, when this problem is looked at from the viewpoint of machine understanding, particularly for under resourced languages, it poses a different problem altogether. In this paper, semantic similarity is explored in Bangla, a less resourced language. For ameliorating the situation in such languages, the most rudimentary method (path-based) and the latest state-of-the-art method (Word2Vec) for semantic similarity calculation were augmented using cross-lingual resources in English and the results obtained are truly astonishing. In the presented paper, two semantic similarity approaches have been explored in Bangla, namely the path-based and distributional model and their cross-lingual counterparts were synthesized in light of the English WordNet and Corpora.