Introduction to Gujarati Wordnet

Prof. C. K. Bhensdadia
Department of Computer Engg., Dharmsinh Desai University, Nadiad

Brijesh Bhatt
Department of Computer Science and Engineering, Indian Institute of Technology, Mumbai

Prof. Pushpak Bhattacharyya
Department of Computer Science and Engineering, Indian Institute of Technology, Mumbai

Abstract

Gujarati is the youngest member of IndoWordnet[1]. As a part of the IndoWordnet project, a Wordnet for the Gujarati language is being developed from the Hindi Wordnet using the expansion approach. This paper reviews the Gujarati Wordnet development process. It describes the basic features of the Gujarati language and evaluates the suitability of Hindi as a source language. The current status of the work and the issues in development are also described.

1. Introduction

WordNet[2] is a machine-readable lexical database for the English language developed at Princeton University. It has become one of the most valuable resources for natural language processing applications. Following the Princeton WordNet, wordnets for many other languages were developed across the globe. The first wordnet for an Indian language is the Hindi wordnet[3], developed at the Indian Institute of Technology, Bombay. Efforts are now under way to develop wordnets for many Indian languages. One such effort is to build the Gujarati wordnet from the Hindi wordnet using the expansion approach.

The layout of the paper is as follows. Section 2 introduces the Gujarati language. Section 3 describes the historic influence of other languages on Gujarati and justifies the use of Hindi as the base language for Gujarati Wordnet development. Section 4 describes the expansion approach selected for the Wordnet development. Section 5 describes the status of Gujarati Wordnet development and some issues related to synset linking.

2. Gujarati Language

Gujarati, a native language of the Indian state of Gujarat, is a member of the Indo-Aryan family of languages. There are over 50 million speakers of Gujarati, and it is one of the 22 official languages of India. Incidentally, Gujarati was the first language of Mohandas K. Gandhi (regarded as the father of the nation in India) and Mohammed Ali Jinnah (regarded as the father of Pakistan).

2.1 History

Initially, the Gujarati script was restricted to business writing, while literature was written in the Devanāgarī script. The poetic form of the language is much older, enriched by the work of poets like Narsinh Mehta. Gujarati prose writing and journalism started in the 19th century. Protest writing against colonialism led to a string of powerful essays that laid the foundation of modern Gujarati literature.

2.2 Features

Some features of the Gujarati language are as follows:

2.2.1 Writing system: The Gujarati script is a variant of the Devanāgarī script, differentiated by the loss of the characteristic horizontal line running above the letters and by a small number of modifications in the remaining characters.
For example:
Hindi: कमल (kamal)
Gujarati: કમળ

2.2.2 Vocabulary: As Gujarati is an Indo-Aryan language descended from Sanskrit, its vocabulary contains four general categories of words: Tatsam, Tadbhav, Native and Loan words.
Tatsam: Words accepted from Sanskrit unchanged.
Tadbhav: Words adopted from Sanskrit with a change in phonological form.
Native: Words which are specific to the Gujarati language.
Loan words: Words accepted from other languages, such as Persian, English and Portuguese. Section 3 describes such words in more detail.
It is also noteworthy that in some cases the tatsam and tadbhav words for the same Sanskrit word co-exist, with the same or different meanings.
For example:
(1) ધર્મ (Dharma) and ધરમ (Dharam) both mean 'Religion'.
(2) કર્મ (karma): Work, with religious connotation
    કરમ (karam): Work

2.2.3 Grammar: Gujarati follows Subject-Object-Verb word order. There are three genders and two numbers. There are no articles. Some significant features are as follows:

2.2.3.1 Gender: Gujarati distinguishes between three genders: masculine, feminine and neuter. However, the gender marker does not always represent biological gender.
For example:
છોકરો (chhokaro) (Boy)        છોકરી (chhokari) (Girl)
મંકોડો (mankodo) (Big Ant)     મંકોડી (mankodi) (Small Ant)

2.2.3.2 Adjective: An adjective agrees with its noun in gender and number. A feminine adjective does not take a plural marker when it agrees with a plural feminine noun.
For example:
(1) Masculine singular: સારો છોકરો (sar-o chhokar-o) Good boy
(2) Masculine plural: સારા છોકરાઓ (sar-a chhokara-o) Good boys
(3) Feminine singular: સારી છોકરી (sar-i chhokar-i) Good girl
(4) Feminine plural: સારી છોકરીઓ (sar-i chhokari-o) Good girls

2.2.3.3 Structure of verbs: Gujarati verbs have a root + infinitive structure. Gujarati extends the root verb to form causative sentences.
For example:
(1) ઝાડ પડ્યું. (Zaad paDyu) A tree fell.
(2) રામે ઝાડ પાડ્યું. (Rame Zaad paaDyu) Ram made the tree fall.
(3) કાને રામ પાસે ઝાડ પડાવ્યું. (Kane Ram paase Zaad padaVyu) Kan had Ram make the tree fall.

3. Influence of other languages on Gujarati

As an Indo-Aryan language, Gujarati is very similar to Hindi, Marathi and Punjabi. The grammar and vocabulary of Gujarati are very similar to those of Hindi, with a few exceptions. A brief comparison is as follows:
(1) Gender: As described in section 2, Gujarati defines three genders while Hindi has only two.
(2) Writing system: Gujarati dropped the horizontal line running above the letters, and a few characters are modified, as shown in the previous section.
(3) Causative verbs: Both Hindi and Gujarati handle causative verbs in the same fashion. For example,
Hindi: रोना (rona), रुलाना (rulana), रुलवाना (rulavana)
is similar to
Gujarati: રડવું (radvu), રડાવવું (radavavu), રડાવરાવવું (radavravavu)
(4) 'Want' and 'should': Both Hindi and Gujarati handle "I should ..." and "I want ..." in similar ways. Gujarati uses 'jo', which is similar to 'chah' in Hindi. For example, "I should go home now." is
in Hindi: मुझे घर जाना चाहिए।
in Gujarati: મારે ઘરે જવું જોઈએ. (mare ghare javu joiAe)

However, other languages have also influenced Gujarati. As parts of India were ruled by Muslim rulers, the English and the Portuguese, their languages have left a mark on Gujarati.

Urdu influence: The following words demonstrate Urdu influence on Gujarati:
Gujarati    Urdu      English
દાવો        dava      Claim
ફાયદો       fayda     Benefit
કાયદો       kayda     Law
ખરાબ        kharab    Bad

English influence: Most Indian languages have adopted many English words, and Gujarati is no exception.
For example:
બેંક : Bank
ફોન : Phone
ટેબલ : Table

Portuguese influence: The following are some of the Portuguese words adopted in Gujarati:
સાબુ : soap
બટાટા : potato
પાદરી : father (Christian priest)

Thus the Gujarati language has a rich set of words derived from Indian languages as well as foreign languages. This insight helps in selecting the approach for building the wordnet.

4. Gujarati Wordnet development using expansion approach

The Gujarati wordnet is being built using the expansion approach[4]. In this approach, instead of creating synsets from scratch, synsets are created by referring to the existing wordnet of a related language. Hindi is used as the source language to create the synsets of Gujarati. The benefits of this approach are:
(1) The wordnet development process becomes faster, as the gloss and synset of the source language are already available as a reference.
(2) It provides linking between the synsets of different languages, which can be used for machine translation applications.

The synset linkage tool provided by I.I.T. Bombay is used to create the Gujarati synsets. This tool provides a graphical user interface which shows the Hindi synset on the left side and provides an interface to enter the Gujarati synset on the right side.

As Gujarati is closely related to Hindi, most of the Gujarati synsets are created by translating the Hindi synset into a Gujarati synset. However, emphasis was placed on understanding the concept independently of language and only then creating the synset.

The task of synset development for Gujarati is further simplified by the online availability of milestone lexicon resources like 'Bhagavad Go Mandal'[5] and 'Gujarati Lexicon'[6]. 'Bhagavad Go Mandal' was created in the early twentieth century in the princely state of Gondal in Kathiawad. It contains around 8.2 lakh (820,000) words spread across 9 volumes. It is accepted as the standard reference for the Gujarati language by the 'Gujarat Sahitya Parishad' under the leadership of Mahatma Gandhi. 'Gujarati Lexicon' is another, more recent effort, by Ratilal Chandaria. The online interface of Gujarati Lexicon provides easy access to meanings, synonyms, antonyms, idioms, proverbs and phrases. These two resources provide great help in building synsets.

5. Observations

5.1 Synset linkage status

The synsets are divided into two categories: Core and Common. The status of the synsets developed under each category is as follows.
Core synset
No. of synsets: 1866
Total words: 7985
Unique words: 7078
Common synset
No. of synsets: 5632
Total words: 17245
Unique words: 13800

5.2 Issues related to synset development

Some Hindi synsets were not linked with Gujarati synsets for the following reasons:
(1) The concept does not exist in the Gujarati language.
(2) Difficulty in interpreting the gloss of the Hindi synset.
Some examples are as follows:
Core synset
(1) ID: 408
Concept: तुरही की तरह का एक बड़ा बाजा
Example: "नरसिंहा की आवाज दूर-दूर तक सुनाई देती है"
Synset: नरसिंहा, नरसिंगा, बाँकिया, गोमुख, सिंगा
No such concept is identified in the Gujarati language. However, there is a concept in Gujarati for a similar instrument which is used at the war-front to announce the beginning of a war.

There was no difficulty in linking verbs, adjectives or causative verbs. This is due to the similarity between Hindi and Gujarati
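
The linking and counting described in sections 4 and 5 can be illustrated with a minimal sketch. It assumes that each synset carries a numeric ID shared across languages and that a Hindi synset counts as linked when a Gujarati synset with the same ID and at least one word exists. The class and function names (Synset, link_synsets, word_counts) and the lotus entry with ID 1 are hypothetical and are not taken from the I.I.T. Bombay linkage tool; only synset 408 comes from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Synset:
    """One concept in one language: an ID, a gloss, and its member words."""
    synset_id: int
    gloss: str
    words: List[str] = field(default_factory=list)


def link_synsets(source: List[Synset], target: List[Synset]):
    """Pair source and target synsets that share an ID.

    Returns (linked, unlinked_ids): linked maps an ID to the
    (source, target) pair; unlinked_ids lists source IDs that have
    no non-empty counterpart in the target language.
    """
    target_by_id = {s.synset_id: s for s in target}
    linked: Dict[int, Tuple[Synset, Synset]] = {}
    unlinked_ids: List[int] = []
    for src in source:
        tgt = target_by_id.get(src.synset_id)
        if tgt is not None and tgt.words:
            linked[src.synset_id] = (src, tgt)
        else:
            unlinked_ids.append(src.synset_id)
    return linked, unlinked_ids


def word_counts(synsets: List[Synset]) -> Tuple[int, int]:
    """Return (total words, unique words) over a collection of synsets."""
    all_words = [w for s in synsets for w in s.words]
    return len(all_words), len(set(all_words))


# Hypothetical sample data: ID 1 is invented for illustration,
# ID 408 is the unlinked example quoted in section 5.2.
hindi = [
    Synset(1, "lotus (illustrative gloss)", ["कमल"]),
    Synset(408, "तुरही की तरह का एक बड़ा बाजा",
           ["नरसिंहा", "नरसिंगा", "बाँकिया", "गोमुख", "सिंगा"]),
]
gujarati = [
    Synset(1, "lotus (illustrative gloss)", ["કમળ"]),
]

linked, unlinked_ids = link_synsets(hindi, gujarati)
print("linked IDs:", sorted(linked))
print("unlinked IDs:", unlinked_ids)
print("Hindi (total, unique) words:", word_counts(hindi))
```

Under these assumptions, the gap between the total and unique word counts reported in section 5.1 would correspond to words that appear in more than one synset, which is what word_counts measures.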