A Python-Based API to Access Indian Language Wordnets
Recommended publications
A Collaborative Platform for Multilingual Ontology
PhD Dissertation International Doctorate School in Information and Communication Technologies DIT - University of Trento A COLLABORATIVE PLATFORM FOR MULTILINGUAL ONTOLOGY DEVELOPMENT Ahmed Maher Ahmed Tawfik Moustafa Advisor: Professor Fausto Giunchiglia Università degli Studi di Trento Abstract The world is extremely diverse, and its diversity is obvious in its cultural differences and in the large number of languages spoken all over the world. In this sense, we need to collect and organize a huge amount of knowledge obtained from multiple resources that differ from one another in many aspects. A possible approach is to design effective tools for the construction and maintenance of linguistic resources and localized domain ontologies, based on well-defined knowledge representation methodologies capable of dealing with diversity and the continuous evolution of human knowledge. In this thesis, we present a collaborative platform which allows for knowledge organization in a language-independent manner and provides the appropriate mapping from a language-independent concept to one specific lexicalization per language. This representation ensures a smooth multilingual enrichment process for linguistic resources and a robust construction of ontologies using language-independent concepts. The collaborative platform is designed following a workflow-based development methodology that models linguistic resources as a set of collaborative objects and assigns a customizable workflow to build and maintain each collaborative object in a community-driven manner, with extensive support for modern Web 2.0 social and collaborative features. Keywords Knowledge Representation, Multilingual Resources, Ontology Development, Computer Supported Collaborative Work Acknowledgments I am particularly grateful to my supervisor, Professor Fausto Giunchiglia, for the guidance and advice he has provided throughout my time as a PhD student.
PRASHANT KUMAR SHARMA 173059007 Computer Science & Engineering M.Tech
PRASHANT KUMAR SHARMA 173059007 Computer Science & Engineering M.Tech. Indian Institute of Technology Bombay Male Specialization: Computer Science and Engineering DOB: 24/12/1993 Examination University Institute Year CPI / % Post Graduation IIT Bombay IIT Bombay 2020 7.53 Undergraduate Specialization: Computer Science and Engineering Graduation Dr. A.P.J. Abdul Kalam Technical University Kamla Nehru Institute of Technology, Sultanpur 2016 84.02 Intermediate/+2 CBSE Kendriya Vidyalaya AFS Gorakhpur 2011 83.40 Matriculation CBSE Kendriya Vidyalaya AFS Gorakhpur 2009 90.80 FIELDS OF INTEREST • Machine Learning, Deep Learning, Natural Language Processing, Computer Vision, Reinforcement Learning M.TECH THESIS AND SEMINAR • Explainable Artificial Intelligence (M.Tech Project) Guide: Prof. Pushpak Bhattacharyya | Industry Collaborator: IBM (May’19 - Present) ◦ Product Objective: To develop an explainability framework which takes a trained deep learning model as input and interprets it with state-of-the-art methods such as Integrated Gradients and the LIME algorithm. This framework will be distributed as a Python package, which can be used either as a command-line tool or as a web service. ◦ Research Objective: To develop a model that captures causal relationships in natural languages and differentiates between causal and correlated factors using knowledge graphs and deep learning techniques. • Explainable Artificial Intelligence (M.Tech Seminar) Guide: Prof. Pushpak Bhattacharyya | Industry Collaborator: IBM (Sprint’19) ◦ Objective: Conducted an extensive literature survey on the explainability of deep neural networks: need, scope, challenges and state-of-the-art methods.
◦ Involved a comprehensive study of the explainability of deep neural network representations and outputs, and of algorithms for explainability: Sensitivity Analysis, Layer-wise Relevance Propagation, LIME, Integrated Gradients, Influence Functions, Sanity Checks for Saliency Maps, Attention mechanism, Semantically Equivalent Rules.
An Efficient Database Design for IndoWordNet Development Using
An Efficient Database Design for IndoWordNet Development Using Hybrid Approach Venkatesh Prabhu (2), Shilpa Desai (1), Hanumant Redkar (1), Neha Prabhugaonkar (1), Apurva Nagvenkar (1), Ramdas Karmali (1) (1) GOA UNIVERSITY, Taleigao - Goa (2) THYWAY CREATIONS, Mapusa - Goa ABSTRACT WordNet is a crucial resource that aids in Natural Language Processing (NLP) tasks such as Machine Translation, Information Retrieval, Word Sense Disambiguation, Multi-lingual Dictionary creation, etc. The IndoWordNet is a multilingual WordNet which links WordNets of different Indian languages on a common identification number given to each concept. WordNet is designed to capture the vocabulary of a language and can be considered a dictionary cum thesaurus and much more. WordNets for some Indian languages are being developed using the expansion approach. In this paper we discuss the details of, and our experiences during, the evolution of this database design while working on the Indradhanush WordNet Project. The Indradhanush WordNet Project is working on the development of WordNets for seven Indian languages. Our database design gives an efficient plan for the storage of WordNet data for all languages. In addition, it extends the design to hold concepts specific to a language. KEYWORDS: WordNet, IndoWordNet, synset, database design, expansion approach, semantic relation, lexical relation. Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing (SANLP), pages 229–236, COLING 2012, Mumbai, December 2012. 1 Introduction 1.1 WordNet and its storage methods WordNet (Miller, 1993) maintains the concepts in a language, relations between concepts and their ontological details.
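The excerpt above says that IndoWordNet links the WordNets of different languages through a common identification number per concept. A minimal sketch of that idea follows, assuming a plain in-memory mapping; the synset IDs, language codes, word lists and function names are all invented for illustration and are not taken from the paper or from any IndoWordNet release.

```python
from collections import defaultdict

# Hypothetical toy data: common synset ID -> language code -> member words.
# IDs and entries are invented for illustration only.
SYNSETS = {
    101: {"hin": ["पानी", "जल"], "mar": ["पाणी", "जल"], "eng": ["water"]},
    102: {"hin": ["घर"], "mar": ["घर"], "eng": ["house", "home"]},
}


def build_index(synsets):
    """Map each (language, word) pair to the set of synset IDs containing it."""
    index = defaultdict(set)
    for synset_id, by_language in synsets.items():
        for language, words in by_language.items():
            for word in words:
                index[(language, word)].add(synset_id)
    return index


def linked_words(word, source, target, synsets=SYNSETS):
    """Return {synset_id: target-language words} for every sense of `word`.

    The shared synset ID is the only link between languages: we look up the
    IDs for the source word, then read the target-language entry of each ID.
    """
    index = build_index(synsets)
    result = {}
    for synset_id in sorted(index.get((source, word), ())):
        result[synset_id] = synsets[synset_id].get(target, [])
    return result


if __name__ == "__main__":
    print(linked_words("water", "eng", "hin"))
```

Keying everything on the shared ID means a new language can be added by filling in one more entry per concept, without touching the entries for the other languages.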
Latent Semantic Network Induction in the Context of Linked Example Senses
Latent semantic network induction in the context of linked example senses Hunter Scott Heidenreich, Department of Computer Science, College of Computing and Informatics; Jake Ryland Williams, Department of Information Science, College of Computing and Informatics Abstract The Princeton WordNet is a powerful tool for studying language and developing natural language processing algorithms. With significant work developing it further, one line considers its extension through aligning its expert-annotated structure with other lexical resources. In contrast, this work explores a completely data-driven approach to network construction, forming a wordnet using the entirety of the open-source, noisy, user-annotated dictionary, Wiktionary. Comparing baselines to WordNet, we find compelling evidence that our network induction process constructs a network with useful semantic structure. With thousands of semantically-linked examples that demonstrate sense usage from basic lemmas to multiword expressions (MWEs), we believe this work motivates future research. 1 Introduction ... of a network, much like the Princeton WordNet (Miller, 1995; Fellbaum, 1998), that is constructed solely from the semi-structured data of Wiktionary. This relies on the noisy annotations of the editors of Wiktionary to naturally induce a network over the entirety of the English portion of Wiktionary. In doing so, the development of this work produces: • an induced network over Wiktionary, enriched with semantically linked examples, forming a directed acyclic graph (DAG); • an exploration of the task of relationship disambiguation as a means to induce network construction; and • an outline for directions of expansion, including increasing precision in disambiguation, cross-linking example usages, and aligning English Wiktionary with other languages.
Visualization Design for a Web Interface to the Large-Scale Linked Lexical Resource UBY
Visualization Design for a Web Interface to the Large-Scale Linked Lexical Resource UBY We present the results of a collaboration of visualization experts and computational linguists which aimed at the re-design of the visualization component in the Web user interface (Web UI) to the large-scale linked lexical resource UBY. UBY combines a wide range of information from expert- constructed (e.g., WordNet, FrameNet, VerbNet) and collaboratively constructed (e.g., Wiktionary, Wikipedia) resources for English and German, see https://www.ukp.tu-darmstadt.de/uby. All resources contained in UBY distinguish not only different words but also their senses. A distinguishing feature of UBY is that the different resources are aligned to each other at the word sense level, i.e. there are links connecting equivalent word senses from different resources in UBY. For senses that are linked, information from the aligned resources can be accessed and the resulting enriched sense representations can be used to enhance the performance of Natural Language Processing tasks. Targeted user groups of the UBY Web UI are researchers in the field of Natural Language Processing and in the Digital Humanities (e.g., lexicographers, linguists). In the context of exploring the usually large number of senses for an arbitrary search word, the UBY Web UI should support these user groups in assessing the added value of sense links for particular applications. It is important to emphasize that this is an open research question for most applications. We will present the results of our detailed requirements analysis that revealed a number of central requirements a visualization of all the senses for a given search word and the links between them must meet in order to be useful for this purpose. -
A Large, Interlinked, Syntactically-Rich Lexical Resource for Ontologies
Semantic Web 0 (0) 1, IOS Press. lemonUby - a large, interlinked, syntactically-rich lexical resource for ontologies Judith Eckle-Kohler (a), John Philip McCrae (b) and Christian Chiarcos (c) (a) Ubiquitous Knowledge Processing (UKP) Lab, Department of Computer Science, Technische Universität Darmstadt and Information Center for Education, German Institute for International Educational Research, Germany, http://www.ukp.tu-darmstadt.de (b) Cognitive Interaction Technology (CITEC), Semantic Computing Group, Universität Bielefeld, Germany, http://www.sc.cit-ec.uni-bielefeld.de (c) Applied Computational Linguistics (ACoLi), Department of Computer Science and Mathematics, Goethe-University Frankfurt am Main, Germany, http://acoli.cs.uni-frankfurt.de Abstract. We introduce lemonUby, a new lexical resource integrated in the Semantic Web which is the result of converting data extracted from the existing large-scale linked lexical resource UBY to the lemon lexicon model. The following data from UBY were converted: WordNet, FrameNet, VerbNet, English and German Wiktionary, the English and German entries of OmegaWiki, as well as links between pairs of these lexicons at the word sense level (links between VerbNet and FrameNet, VerbNet and WordNet, WordNet and FrameNet, WordNet and Wiktionary, WordNet and German OmegaWiki). We linked lemonUby to other lexical resources and linguistic terminology repositories in the Linguistic Linked Open Data cloud and outline possible applications of this new dataset. Keywords: Lexicon model, lemon, UBY-LMF, UBY, OLiA, ISOcat, WordNet, VerbNet, FrameNet, Wiktionary, OmegaWiki 1. Introduction Recently, the language resource community has begun to explore the opportunities offered by the Semantic ... numerous mappings and linkings of lexica, as well as standards for representing lexical resources, such as the ISO 24613:2008 Lexical Markup Framework (LMF) [13].
Mining Translations from the Web of Open Linked Data
Mining translations from the web of open linked data John Philip McCrae, University of Bielefeld; Philipp Cimiano, University of Bielefeld Abstract In this paper we consider the prospect of extracting translations for words from the web of linked data. By searching for entities that have labels in both English and German we extract 665,000 translations. We then also consider a linguistic linked data resource, lemonUby, from which we extract a further 115,000 translations. We combine these translations with the Moses statistical machine translation system, and we show that the translations extracted from the linked data can be used to improve the translation of unknown words. 1 Introduction In recent years there has been a massive explosion in the amount and quality of data available as linked data on the web. ... OmegaWiki on the web of data should ameliorate the process of harvesting translations from these resources. We consider two sources for translations from linked data: firstly, we consider mining labels for concepts from the data contained in the 2010 Billion Triple Challenge (BTC) data set, as well as DBpedia (Auer et al., 2007) and FreeBase (Bollacker et al., 2008). Secondly, we mine translations from lemonUby (Eckle-Kohler et al., 2013), a resource that integrates a number of distinct dictionary language resources in the lemon (Lexicon Model for Ontologies) format (McCrae et al., 2012), which is a model for representing rich lexical information including forms, sense, morphology and syntax of ontology labels. We then consider the process of including these extra translations into an existing translation system, namely
IITK Chronicle IITK Newsletter || Volume 2 Issue 2, April 2018 Highlights From the Director’s Desk
Indian Institute of Technology Kanpur IITK Chronicle IITK Newsletter || Volume 2 Issue 2, April 2018 Highlights Prof. Abhay Karandikar, Department of Electrical Engineering, IIT Bombay appointed as the Director, IIT Kanpur. Prof. Monika Katiyar appointed Head of the Department of Materials Science & Engineering. Prof. Amalendu Chandra appointed Head of the Department of Chemistry. From the Director’s Desk We are once again at the end of another academic session and in the last two months, we have successfully organized some of the most awaited annual events of our campus. Techkriti, our annual technological fest, saw record participation and the Institute Flower Show had the entire campus coming together for a fun show and fete. The institute's Women's Cell also organized great initiatives in the past month for gender sensitization. With these events, IIT Kanpur strives to set an example for a zero-tolerance policy towards sexual harassment on campus. Several of our faculty members have won notable awards and recognitions and the SIDBI Innovation and Incubation Centre won the Platinum Award under the "Smart Incubator of the Year" category by India Smart Grid Forum. For the coming months, we have many exciting things in the pipeline: the start of the new academic session and the Convocation always create a special buzz on campus. But most of all, we look forward to welcoming our new Director, Prof. Abhay Karandikar, and working with him to take IIT Kanpur to greater heights. Prof. Manindra Agrawal, Officiating Director, IIT Kanpur New Colleagues Prof. Subrahmanyam Saderla, Aerospace Engineering; Prof. Ankush Sharma, Electrical Engineering In Focus Techkriti 2018 Prof.
Word Sense Disambiguation Using IndoWordNet
Word Sense Disambiguation Using IndoWordNet Sudha Bhingardive, Department of Computer Science and Engineering, Indian Institute of Technology-Bombay, Powai, Mumbai, India. Pushpak Bhattacharyya, Department of Computer Science and Engineering, Indian Institute of Technology-Bombay, Powai, Mumbai, India. Abstract Word Sense Disambiguation (WSD) is considered one of the toughest problems in the field of Natural Language Processing. IndoWordNet is a linked structure of WordNets of major Indian languages. Recently, several IndoWordNet-based WSD approaches have been proposed and implemented for Indian languages. In this chapter, we present the usage of various other features of IndoWordNet in performing WSD. Here, we have used features such as linked WordNets and semantic and lexical relations. We have followed two unsupervised approaches, viz., (1) use of IndoWordNet in bilingual WSD for finding the sense distribution with the help of the Expectation Maximization algorithm, and (2) use of IndoWordNet in WSD for finding the most frequent sense using word and sense embeddings. Both these approaches justify the importance of IndoWordNet for word sense disambiguation for Indian languages, as the results are found to be promising and can beat the baselines. Keywords: IndoWordNet, WordNet, Word Sense Disambiguation, WSD, Bilingual WSD, Unsupervised WSD, Most Frequent Sense, MFS 1. Introduction 1.1. What is Word Sense Disambiguation? Word Sense Disambiguation or WSD is the task of identifying the correct meaning of a word in a given context. The necessary condition for a word to be disambiguated is that it should have multiple senses. Generally, in order to disambiguate a given word, we need both a context in which the word has been used and knowledge about the word; otherwise it becomes difficult to get the exact meaning of the word.
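The second approach mentioned above finds the most frequent sense using word and sense embeddings. One way this could work is sketched below, under assumptions the excerpt does not confirm: each sense embedding is taken to be the average of the embeddings of a few words associated with that sense, and the predicted sense is the one whose embedding has the highest cosine similarity to the word's own embedding. The vectors, sense names and helper functions are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def most_frequent_sense(word, senses, embeddings):
    """Pick the sense whose averaged embedding is closest to the word's.

    `senses` maps a sense ID to words associated with that sense (assumed
    here to come from a wordnet synset or gloss). Senses with no known
    words are skipped; returns None if no sense can be scored.
    """
    word_vector = embeddings[word]
    best_sense, best_score = None, -2.0  # cosine is always >= -1
    for sense_id, sense_words in senses.items():
        vectors = [embeddings[w] for w in sense_words if w in embeddings]
        if not vectors:
            continue
        score = cosine(word_vector, mean_vector(vectors))
        if score > best_score:
            best_sense, best_score = sense_id, score
    return best_sense


# Hypothetical 2-dimensional embeddings, invented for illustration.
EMBEDDINGS = {
    "bank": [0.9, 0.1],
    "money": [1.0, 0.0],
    "loan": [0.8, 0.2],
    "river": [0.0, 1.0],
    "shore": [0.1, 0.9],
}
SENSES = {
    "bank_finance": ["money", "loan"],
    "bank_river": ["river", "shore"],
}

if __name__ == "__main__":
    print(most_frequent_sense("bank", SENSES, EMBEDDINGS))
```

The intuition behind this sketch is that a word embedding trained on a corpus is dominated by the word's most common usage, so the nearest sense embedding is a reasonable guess at the most frequent sense without any sense-tagged data.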
Exploring Resources in Word Sense Disambiguation for Marathi Language Amit Patil1, Chhaya Patil2, Dr
www.rspsciencehub.com Volume 02 Issue 10S October 2020 Special Issue of First International Conference on Advancements in Management, Engineering and Technology (ICAMET 2020) Exploring Resources in Word Sense Disambiguation for Marathi Language Amit Patil (1), Chhaya Patil (2), Dr. Rakesh Ramteke (3), Dr. R. P. Bhavsar (4), Dr. Hemant Darbari (5) (1,2) Assistant Professor, Department of Computer Application, RCPET’s IMRD, Maharashtra, India (3,4) Professor, School of Computer Sciences, KBC North Maharashtra University, Maharashtra, India (5) Director General, Centre for Development of Advanced Computing (C-DAC), Maharashtra, India Abstract Word Sense Disambiguation (WSD) is one of the most challenging problems in natural language processing research. WSD is the task of finding the correct sense of a word in a particular context. A human can identify the correct sense of a word in a sentence by drawing on knowledge of that language, but disambiguating the word is not an easy task for a machine. Developing any WSD system requires a sense repository and a sense dictionary, and building these resources is costly and time-consuming. Such resources are available for many foreign languages, which is why a lot of work has been done for languages such as English, German and Spanish. For Indian languages such as Hindi, Marathi and Bengali, much less work has been done, and the reason is resource scarcity. In this paper, we focus mainly on Word Sense Disambiguation for Marathi, because far less work has been done for Marathi than for Hindi and other Indian languages.
ANNUAL REPORT 2019-20 IIT Bombay Annual Report 2019-20 Content
IIT BOMBAY ANNUAL REPORT 2019-20 Content 1) Director’s Report; 2) Academic Programmes; 3) Research and Development Activities; 4) Outreach Programmes; 5) Faculty Achievements and Recognitions; 6) Student Activities; 7) Placement; 8) Society For Innovation And Entrepreneurship; 9) IIT Bombay Research Park Foundation; 10) International Relations; 11) Alumni And Corporate Relations; 12) Institute Events; 13) Facilities: a) Infrastructure Development, b) Central Library, c) Computer Centre, d) Centre For Distance Engineering Education Programme; 14) Departments/ Centres/ Schools and Interdisciplinary Groups; 15) Publications; 16) Organization; 17) Summary of Accounts Director's Report By Prof. Subhasis Chaudhuri, Director, IIT Bombay Indian Institute of Technology Bombay (IIT Bombay) has a rich tradition of pursuing excellence and has continually re-invented itself in terms of academic programmes and research infrastructure. Students are exposed to challenging, research-based academics and a host of sport, cultural and organizational activities on its vibrant campus. The presence of world-class research facilities, vigorous institute-industry collaborations, international exchange programmes, interdisciplinary research collaborations and industrial training opportunities help the students of IIT Bombay to ... acknowledged for their research contributions. We have also been able to further our links with international and national peer universities, enabling us to enhance research and educational programmes at the Institute. IIT Bombay continues to make forays into newer territories pertinent to undergraduate and postgraduate education. At postgraduate level, a specially designed MA+PhD dual degree programme in Philosophy under the HSS department has been introduced. IDC, the Industrial Design Centre, celebrated 50 years of its golden existence earlier this year.
WWDS APIs: Application Programming Interfaces for Efficient Manipulation of World WordNet Database Structure
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) WWDS APIs: Application Programming Interfaces for Efficient Manipulation of World WordNet Database Structure Hanumant Redkar (1), Sudha Bhingardive (1), Kevin Patel (1), Pushpak Bhattacharyya (1), Neha Prabhugaonkar (2), Apurva Nagvenkar (2), Ramdas Karmali (2) (1) Indian Institute of Technology Bombay, Mumbai, India (2) Goa University, Goa, India {hanumantredkar, bhingardivesudha, kevin.svnit, pushpakbh}@gmail.com {nehapgaonkar.1920, apurv.nagvenkar, ramdas.karmali}@gmail.com Abstract WordNets are useful resources for natural language processing. Various WordNets for different languages have been developed by different groups. Recently, World WordNet Database Structure (WWDS) was proposed by Redkar et al. (2015) as a common platform to store these different WordNets. However, it is underutilized due to the lack of a programming interface. In this paper, we present WWDS APIs, which are designed to address this shortcoming. These WWDS APIs, in conjunction with WWDS, act as a wrapper that enables developers to utilize WordNets without worrying about the underlying storage structure. The APIs are developed in PHP, Java, and Python, as they are the preferred programming languages of most developers and researchers working in language technologies. These APIs can help in various applications like machine translation, word sense disambiguation, multilingual information ... example, developers can potentially extract information from other WordNets through WWDS and its APIs that is missing in their source WordNet. The WWDS and WWDS APIs are explained in the following sections. World WordNet Database Structure WWDS is an efficient storage mechanism which uses multiple databases to accommodate different WordNets. Its design is based on the IndoWordNet database structure (Prabhu et al., 2012). The language-independent information such as semantic relations, ontology details, etc.
is stored in a single master database named wordnet_master.
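The storage layout described above, a master database for language-independent information alongside separate databases for individual WordNets, can be sketched with SQLite's `ATTACH DATABASE`. Only the name `wordnet_master` comes from the excerpt; the per-language database name `wordnet_hindi`, the table and column names, the rows and the helper functions are hypothetical, and the real WWDS schema is not shown in this excerpt.

```python
import sqlite3


def build_demo():
    """Create one connection with a master and a per-language database.

    Each ATTACH of ':memory:' creates a separate in-memory database, which
    stands in here for the separate databases WWDS is described as using.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS wordnet_master")
    conn.execute("ATTACH DATABASE ':memory:' AS wordnet_hindi")  # assumed name
    # Language-independent relations between synset IDs live in the master.
    conn.execute(
        "CREATE TABLE wordnet_master.semantic_relation "
        "(synset_id INTEGER, relation TEXT, target_synset_id INTEGER)"
    )
    # Language-specific words for each synset live in the language database.
    conn.execute(
        "CREATE TABLE wordnet_hindi.synset_word (synset_id INTEGER, word TEXT)"
    )
    conn.executemany(
        "INSERT INTO wordnet_master.semantic_relation VALUES (?, ?, ?)",
        [(101, "hypernym", 100)],  # invented row
    )
    conn.executemany(
        "INSERT INTO wordnet_hindi.synset_word VALUES (?, ?)",
        [(100, "द्रव"), (101, "पानी"), (101, "जल")],  # invented rows
    )
    return conn


def related_words(conn, word, relation, lang_db="wordnet_hindi"):
    """Words reachable from `word` via `relation`, joined through the master.

    `lang_db` is interpolated into the SQL text, so it must be a trusted
    identifier, never user input.
    """
    query = f"""
        SELECT t.word
        FROM {lang_db}.synset_word AS s
        JOIN wordnet_master.semantic_relation AS r
          ON r.synset_id = s.synset_id
        JOIN {lang_db}.synset_word AS t
          ON t.synset_id = r.target_synset_id
        WHERE s.word = ? AND r.relation = ?
        ORDER BY t.word
    """
    return [row[0] for row in conn.execute(query, (word, relation))]


if __name__ == "__main__":
    connection = build_demo()
    print(related_words(connection, "पानी", "hypernym"))
```

Because the relation table stores only synset IDs, the same relation row serves every language: pointing `lang_db` at a different language database would return that language's words for the same concept.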