Natural Language Processing
Total Page:16
File Type:pdf, Size:1020Kb
Natural Language Processing Preeti Dubey Department of Computer Science Govt. College for Women, Parade, Jammu Abstract: The upcoming field of computational linguists namely natural language processing (NLP) is presented in this paper. Introduction to the basics of NLP, techniques of development and its importance is discussed by the author. Key words: Computational Linguists, NLP Introduction way that the meaning of the source text is Natural language processing is a preserved during this process. Machine branch of computer science that deals with the translation systems can be bilingual as well as processing of one human language into multilingual. Systems that produce translations another. It involves both computation and between only two particular languages are language. Natural language processing can be called bilingual systems and those that produce applied to text as well as speech. The translations for any given pair of languages are development of speech processing systems is called multilingual systems. Many approaches more challenging as compared to text are available for the development of machine processing. Natural language processing is a translation systems namely: rule based major technology that can be used to bridge approach, Corpus based and Hybrid. Present the gap between human communication and machine systems are also being developed digital data. The goal of natural language using deep learning methods. processing is to design and build software that will analyse, understand, and generate Rule Based Machine Translation languages that humans use naturally, so that Systems are developed to produce output using eventually people can address computers as the linguistic rules. It requires deep though they were addressing people. NLP has understanding of the languages undertaken. It many applications such as Machine uses machine readable dictionaries. Translation, Infor mation Extracton, Summarization, Question Answering etc. In Corpus Based Machine Translation the last decade, most of the effort in this field Systems use large corpora for the development is inclined towards machine translation. This of machine translation systems. The accuracy paper is more focused on the basics of text of these systems depends on both the quantity processing using machine translation. and quality of corpora used. Machine Translation Hybrid Machine Translation Systems are developed using more than one technique. MT is a multidisciplinary field of research. It applies the ideas from linguistics, A system developed using this method may be computer science, artificial intelligence, using the corpora as well has linguistic rules. statistics, mathematics, philosophy and many Deep Learning Machine Translation other fields. The text is translated in such a Systems are based on methods based on artificial neural networks. The choice of the approach to be used depends on the languages *Corresponding author(s): [email protected] (Preeti Dubey) JK Knowledge Initiative 2017; 1(2): 89 undertaken and the availability of Technology, Kanpur, 2) University of computational resources. Hyderabad, 3) National Center for Software Technology, Mumbai, 4) Center for Challenges in Machine Translation Development of Advanced Computing (CDAC), 5) Pune Indian Institute of The translation process is not mere translation Technology, Bombay, 6) International Institute of words but the translated output should be as of Information Technology, Hyderabad, 7) accurate as translated by some human. It is a Anna University, 8) KB Chandrasekhar challenging task. Some challenges faced in the Research Center, Chennai, 9) Jadavpur process of machine translation are: University, Kolkata, 10) Jawaharlal Nehru Non-availability of computational resources University (JNU), 11) Mahatma Gandhi required for developing machine translation International Hindi University (MGIHU) systems such as dictionaries, corpora, Some machine translation systems morphological analyzer etc. It requires a lot of developed in India are: finance and time to start from the scratch. Different Language structures: Language structure can be SVO (Subject- Verb- Object) Language S.No System Year SOV (Subject- Object- Verb). Some languages Undertaken have completely different structures. English to Indian 1. AnglaBharti 1991 Therefore, they need to be aligned. Languages(IL) 2. Anusaarka 1995 IL to IL Word Sense Ambiguity is another challenge faced by the developers of machine translation 3. Anubharati 1995 Hindi to English systems. Every natural language involves 4. Mantra 1999 English to Hindi English to ambiguous words. It is very important to 5. MAT 2002 handle ambiguity so as to develop accurate Kanadda Bengali - systems. 6. Vaasannubaada 2002 Assameese Discourse Integration is essential for 7. AnglaHindi 2003 English to Hindi developing efficient systems. The meaning of 8. AnglaBharti-II 2004 English to Hindi an individual sentence may depend on the 9. MaTra 2004 IL to IL sentences that precede it and may influence the 10. Anubharati-II 2004 IL to IL meanings of the sentences that follow it e.g. 11. Shiva & Shakti 2004 English to Hindi “He always wanted it”. The meaning of ‘It’ in English -Telugu 12. 2004 English to Telugu the sentence depends on the prior discourse MTS Telugu -Tamil context. References such as this are called 13. 2004 Telugu to Tamil anaphoric references or anaphora. MTS English to 14. Anubaad 2004 Pragmatic Analysis means to correctly Bengali interpret the sentence and analyze what was 15. Hinglish 2004 Hindi to English meant. E.g. “My house was broken into last Punjabi to Hindi 16. 2007 Punjabi to Hindi night. They took the jewelry” should be MTS English to English to interpret correctly and they should be 17. 2008 recognized as referring to thieves. Malayam MTS Malayam 18. Sampark 2009 IL-IL Key Systems Developed in India Hindi to Punjabi 19. 2009 Hindi to Punjabi MTS The earliest efforts in Machine Translation in English to English to 20. 2010 India started from the mid 80s. Some of the Sanskrit MTS Sanskrit prominent contributors in the field of language Hindi to Dogri 21. 2014 Hindi to Dogri translators are: 1) The research and MTS development projects at Indian Institute of JK Knowledge Initiative 2017; 1(2): 90 Translation Scenario in J&K Intelligence, 10(3), NCST, Banglore. India, 22-25. The state of Jammu and Kashmir has a Kommaluri Vijayanand, Sirajul Islam diversity of languages. The languages which Choudhury, Pranab Ratna. 2002. are majorly spoken in the state are: Urdu, VAASAANUBAADA - Automatic Dogri and Kashmiri. The translation of Hindi Machine Translation of Bilingual and English text into Urdu is available online. Bengali-Assamese News Texts. Google has its translation tool where English Language Engineering Conference. as well as Hindi can be converted to Urdu. Hyderabad, India. [Internet Kashmiri and Dogri are both low resourced Source:http://portal.acm.org/citation.cfm languages. Some translation tools for Kashmiri ?id=788716 have been developed by researchers in Preeti Dubey, The study and development of Kashmir University.The Department of Macine translation from Hindi language Information Technology (DIT), India has got to Dogri language: An important tool to some work done on software localization in bridge the digital divide. Thesis Dogri. Some of the software tools that are submitted in the Department of available in Dogri are: Open Office, Firefox, Computer Science & IT, University of Thunderbird, Pidgin messenger, Sunbird Jammu, 2014 calendar, scribus page layout application etc. R. M. K. Sinha, Jain R., Jain A. 2001. The only tool available for the translation of Translation from English to Indian Hindi text into Dogri is the Hindi-Dogri languages: ANGLABHARTI Approach. machine translation software developed by the In proceedings of Symposium on author. The author of the paper is also working Translation Support System STRANS for the development of Dogri-Hindi Machine 2001.February 15-17, IIT Kanpur, India. translation system. pp.167-172. Renu Jain, R.M.K.Sinha, and Ajai Jain, Conclusion Anubharti. 2001. Using Hybrid Natural language processing is an Example-Based Approach for Machine important technique to overcome the language Translation. In proceedings of barriers. In a multilingual country like India, Symposium on Translation Support there is a great need to develop such systems Systems (SYSTRAN2001), February 15- for content localization. The research in this 17, 2001. Kanpur. P.g.123-130 field is now focusing on development of systems to convert from Indian languages to other languages like IL to Japanese, IL to Spanish etc. whereas there are still many Indian languages which do not even have computational resources to develop these tools. A lot of effort in terms of time and finance is required to start from scratch. There is a great need to promote and provide funds for digitization of Indian languages. References Bharati, Akshar, Chaitanya, Vineet, Kulkarni, Amba P., Sangal,Rajeev. 1997. Anusaaraka: Machine Translation in stages. Vivek, A Quarterly in Artificial JK Knowledge Initiative 2017; 1(2): 91 .