Named Entity Recognition in Assamese: A Hybrid Approach

Padmaja Sharma, Department of CSE, Tezpur University, Assam, India 784028. Email: [email protected]
Utpal Sharma, Department of CSE, Tezpur University, Assam, India 784028. Email: [email protected]
Jugal Kalita, Department of CS, University of Colorado at Colorado Springs, Colorado, USA 80918. Email: [email protected]

Abstract—Most NER systems have been developed using one of two approaches, rule-based or machine-learning, each with its own strengths and weaknesses. In this paper, we propose a hybrid NER approach that combines rule-based and ML techniques to improve overall system performance for a resource-poor language like Assamese. The proposed hybrid approach recognizes four types of NEs: Person, Location, Organization and Miscellaneous. The empirical results indicate that the hybrid approach outperforms both the rule-based and the ML approach when each is applied independently. The hybrid Assamese NER obtains an F-measure of 85%-90%.

I. INTRODUCTION

The term Named Entity, which is used extensively in Natural Language Processing, was first introduced at the Sixth Message Understanding Conference [1], whose main goal was to identify entities that can be considered names in a set of documents and classify them into predefined categories. Tagging of Named Entities in text plays an important role in many NLP applications. In the Message Understanding Conferences (MUC) of the 1990s, it became clear that it is necessary to first identify certain classes of information in order to extract meaningful information from a given document. Later the conference established the Named Entity Recognition task, in which systems were asked to identify names, dates, times and numerical information. Thus Named Entity Recognition (NER) can be defined as the identification of proper nouns and their further classification into a set of classes such as person names, location names, organization names and miscellaneous names.

A few conventions for tagging Named Entities were established at the MUC conferences. These include ENAMEX tags for names (organization, person, location), NUMEX tags for numerical entities (monetary expressions, percentages) and TIMEX tags for temporal entities (time, date, year). For example, consider the sentence given below:

Mr. John visited U.S in July 2012.

Using an XML format, it can be marked up as follows:

<ENAMEX TYPE="PERSON">Mr. John</ENAMEX> visited <ENAMEX TYPE="LOCATION">U.S</ENAMEX> in <TIMEX TYPE="DATE">July 2012</TIMEX>.

Here, the markups show the named entities in the document.

NER has been applied in many applications such as Information Extraction, Question Answering and Event Extraction. Besides these, NER can also be applied in co-reference resolution, Web mining, molecular biology, bioinformatics, and medicine.

The rest of the paper is organized as follows. Section 2 describes the characteristics of Assamese and the challenges of NER in Indian languages. Approaches to NER are described in Section 3. Section 4 describes previous work on NER using hybrid approaches. Section 5 describes our work, and the last section concludes.

II. CHARACTERISTICS OF THE ASSAMESE LANGUAGE AND CHALLENGES OF NER

Assamese is a morphologically rich language, like other Indian languages. Although Assamese is an Indo-European language spoken by around 30 million people, very little computational linguistic work has been done for the language. It is written using the Assamese script, which consists of 11 vowels, 34 consonants and 10 digits. There are no uppercase or lowercase letters in the script. Assamese is also a relatively free word order language; for example, the sentence [E: I will go to play] can be written in any of several word orders. (In the examples that follow, E: gives the English gloss.)

The different types of ambiguities that occur in NER are as follows:

1) Person vs. location: In English, a word such as Washington or Cleveland can be the name of a person or of a location. Similarly, in Hindi, a word such as Kashi can be a person name as well as a location name.
2) Common noun vs. proper noun: Common nouns sometimes occur as person names. For example, Surya, which means sun, creates ambiguity between common noun and proper noun.
3) Organization vs. person name: Amulya may be the name of a person as well as that of an organization, creating ambiguity. An English example is Trump, which can be the name of a person as well as the name of a company or a brand.
4) Nested entities: Nested entities such as New York University also create ambiguity because they contain two or more proper nouns.

Such phenomena are abundant in Indian and other South Asian languages as well. These ambiguities in names can be categorized as structural ambiguity and semantic ambiguity.

A number of additional challenges need to be addressed in languages such as Hindi, Bengali, Assamese, Telugu, and Tamil. The key challenges are briefly described below. Although our examples are in specific languages, similar phenomena occur in all Indian languages, and in Assamese in particular.

• Lack of capitalization: Capitalization plays a major role in identifying NEs in English and some other European languages. However, Indian languages do not have the concept of capitalization.
• Ambiguity: In Indian languages, the problem of ambiguity between common nouns and proper nouns is more difficult since names of people are usually dictionary words, unlike Western names. For example, [akax] and [zun] mean sky and moon, respectively, in Assamese, but can also be person names. In fact, most people's names are dictionary words, used without capitalization.
• Nested entities: Indian languages also face the problem of nested entities. Consider the Assamese expression [nagaland bisHobidyaloi] [E: Nagaland University]. It creates a problem for NER in the sense that the word [nagaland] [E: Nagaland] refers to a location, whereas [bisHobidyaloi] [E: University] is a common noun, and thus [nagaland bisHobidyaloi] [E: Nagaland University] is an organization name. It therefore becomes difficult to assign the proper class.
• Agglutinative nature: Agglutination adds features to the root word to produce complex meanings. For example, in Assamese, [monipuR] [E: Manipur] refers to a location named entity, whereas [monipuRi] [E: Manipuri] is not a named entity, as it refers to the people who live in Manipur.
• Ambiguity in suffixes: Indian languages can have a number of postpositions attached to a root word to form a single word. In Assamese, the word [tEzpuR] [E: Tezpur] is a place name, but when the suffix [eeya] is attached, the resulting word has a different meaning from the original: the people of Tezpur.
• Resource constraints: NER approaches are either rule-based or machine-learning (ML) based. In either case, a good-sized corpus of the language under consideration is required. Such corpora of significant size are still lacking for most Indian languages. Basic resources such as part-of-speech (POS) taggers, good morphological analyzers, and name lists do not exist or are at the research stage for most Indian languages, whereas a number of resources are available for English.

III. DIFFERENT APPROACHES TO NER

Three broadly used approaches in NER are:
1) Rule-based,
2) Machine-learning based, and
3) Hybrid.

Rule-based NER focuses on the extraction of names using human-made rules. This approach lacks portability and robustness, and one needs a significant number of rules to maintain optimal performance, resulting in a high maintenance cost. There are several rule-based NER systems for English providing 88%-92% F-measure [2].

The main attraction of the machine learning (ML) approach is that it is trainable and can be adapted to different languages and domains. In addition, its maintenance cost is lower than that of the rule-based approach. The main goal of the ML approach is to identify proper names by employing statistical models that classify them. ML models can be broadly classified into three types:
1) Supervised,
2) Unsupervised, and
3) Semi-supervised.

1) Supervised: In supervised learning, the training data include both the input and the output. In this approach, the construction of proper training, validation and test sets is crucial. This method is usually fast and accurate, and as the program is taught with the right examples, it is "supervised". A large amount of training data is required for good performance of this model. Several supervised models used in NER are: the Hidden Markov Model (HMM) [2], [3], [4]; Conditional Random Fields (CRF) [5]; Support Vector Machines (SVM) [6]; and Maximum Entropy (ME) [7]. In addition, a variant of Brill's transformation-based rules [8] has been applied to the problem [9]. The HMM is widely used in NER due to the efficiency of the Viterbi algorithm [10], which is used to discover the most likely NE class state sequence.

2) Unsupervised: In the unsupervised learning method, the aim of the model is to build a representation from the data. It can be used to cluster the input data into classes on the basis of statistical properties. Unlike the rule-based approach, this approach is portable to different domains and languages. [11] discuss an unsupervised model for NE classification using unlabeled data. Work on NER using unsupervised models can also be found in [12] and [13].

3) Semi-supervised: The semi-supervised model makes use of both labeled and unlabeled data, usually resulting in high accuracy. Expertise is required to obtain labeled data, and the cost of labeling is high. Bootstrapping is a popular approach for this method. Work on NER using the semi-supervised approach can be found in [14], [15].
The gazetteer-based approach involves tagging NEs using look-up lists for location, person, and organization names.

A. Features used in Named Entity Recognition

Different types of contextual information, along with a variety of other features, are used to identify NEs. Prefixes and suffixes of words also play important roles in NER. The features used may be language-independent or language-dependent. Language-independent features used in NER include the following.

1) Context word features: Surrounding words, such as the previous and the next word of a particular word, serve as important features when finding NEs. For example, a word like [zila], [puR] or [paRa] indicates the presence of a location; these words are used to identify location names. Similarly, [ustad] [E: Expert], [kriRabid] [E: Sportsman] and [kobi] [E: Poet] denote that the next word is a person name.

2) NE information: The NE tags of the previous and the following words are important features in deciding the NE tag of the current word. For example, in [Ram oxomoloi gol] [E: Ram went to Assam], [Ram] is a person NE, which helps identify that the next word is likely to be an NE as well. Similarly, the Bengali sentence [Ram asame giyesil] [E: Ram went to Assam] can also help identify the person NE.

3) Digit features: Different types of digit features have been used in NER. These include whether the current token is a two-digit or four-digit number, or a combination of digits and periods, and so on. For example, [5 June 2011].

4) Organization suffix word list: Several known suffixes are used for organizations, and these help identify organization names. For example, a word like Ltd or Co is likely to be part of an organization's name. Similarly, Indian languages also have suffixes used for organization names, such as [got] [E: Group] and [soRkaR] [E: Government].

5) Length of words: Short words of fewer than three characters are usually not NEs. But there are exceptions, e.g., [Ram] [E: Ram], [sita] [E: Sita], [Ron] [E: Ron].

6) POS: Part-of-speech is an important feature in identifying NEs. For example, if two words in sequence are both verbs, the word before them is most likely to be a person name. Example: [komol douRi ahise] [E: Kamal came running]. Similarly, in Bengali we can say [komol kHeye gHumaise] [E: Kamal slept after eating].

Language-dependent features used in NER include the following.

1) Action verb list: Person names generally appear before action verbs. Examples of such action verbs in Assamese are [koisil] [E: told] and [goisil] [E: went], as in [kothatu rame koisil] [E: Ram told it] and [sihotor ghoRoloi hoRi goisil] [E: Hari went to their home]. But since Assamese is a free word order language, the first sentence can also be written as [rame kothatu koisil].

2) Word prefixes and suffixes: A fixed-length prefix or suffix of a word may be used as a feature. Many NEs share common prefix or suffix strings that help identify them. For example, in Assamese, [dada] [E: Older Brother] and [baidEu] [E: Older Sister] are used to identify person NEs. Similarly, in Bengali, [dada] [E: Older Brother] and [didi] [E: Older Sister] are used to identify person NEs.

3) Designation words: Words like Dr., Prof., etc., often indicate the position and occupation of named persons, serving as clues to detect person NEs. For example, in Assamese we can say [profEsor dAs] [E: Professor Das] and [montRi borai koi] [E: Minister Bora says].

IV. PREVIOUS WORK ON NER USING HYBRID APPROACHES

Some of the work on NER in Indian languages using hybrid approaches is briefly described below. [16] proposed NER for Punjabi using a hybrid approach in which rules are used with an HMM. [17] described a hybrid system that applied the Maximum Entropy model, language-specific rules and a gazetteer list to several Indian languages. [18] presented a combination of HMM and gazetteer methods for a tourism corpus. [19] discussed NER for Hindi using CRF, HMM and rule-based approaches. [20] used HMM and rule-based approaches for Kannada. [21] proposed a hybrid approach for Manipuri, combining CRF and rule-based approaches. The accuracies obtained by different authors for different languages using hybrid approaches are shown in Table I. We see that accuracy varies across languages. Differences in the datasets, the sizes of the training data, and the use of POS and morphological information, language-specific rules, and gazetteers are the main reasons for the low performance of some of the systems.

TABLE I
DIFFERENT WORK ON NER USING HYBRID APPROACHES

Reference | Language  | Approach                | F-measure(%)
[16]      | Punjabi   | HMM+Rule-based          | 74.56
[20]      | Kannada   | HMM+Rule-based          | 94.85
[19]      | Hindi     | CRF+ME+Rule-based       | 82.95
[17]      | Hindi     | ME+Rule-based           | 65.13
[17]      | Bengali   | ME+Rule-based           | 65.96
[17]      | Oriya     | ME+Rule-based           | 44.65
[17]      | Telugu    | ME+Rule-based           | 18.74
[17]      | Urdu      | ME+Rule-based           | 35.47
[21]      | Manipuri  | CRF+Rule-based          | 93.3
[22]      | Hindi     | HMM+Rule-based          | 94.61
[18]      | Hindi     | HMM+Gazetteer           | 98.37
[23]      | Hindi     | Rule-based+List-look-up | 96

V. OUR HYBRID APPROACH

A hybrid approach is one in which two or more approaches are combined to improve the performance of an NER system. To the best of our knowledge, there is no prior work on hybrid NER in Assamese. We develop a hybrid NER system that can extract four types of NEs. Each of the component approaches has its own strengths and weaknesses; here, we describe a hybrid architecture that produces better results than the rule-based approach or ML individually.

The processing goes through three main components: machine-learning, rule-based, and gazetteer-based. The machine-learning component involves two approaches, CRF and HMM; various NE features are used when implementing them. The rule-based component involves the rules that we have derived for different classes of NEs, and the gazetteer-based component involves look-up lists for location, person, and organization names.

We have also derived hand-coded rules to identify different classes of NEs.

1) Rules for organizations:
A: Find organization clue words based on the common organization gazetteer list.
B: If found, tag the word as the end word of an organization.
C: Search for middle clue words in the organization middle-words list.
D: If found, search for the previous word in the same gazetteer list; else
E: Search for the previous word in the surname gazetteer list.
F: If found, search for the previous word in the same gazetteer list; else
G: Search for the previous word in the title list.
H: If found, mark it as the beginning of the organization NE tag; else
I: Mark the next word as the beginning.

Here are examples of organization names recognized by the rules given above:
a) [belguRi utSotor madHyomik skul] [E: Belguri Higher Secondary School].
b) [oxom bisHobidyaloi] [E: Assam University].
c) [RadHa gobindo boRuwa kolez] [E: Radha Govinda Baruah College].
d) [sRi sRi soNkoRdeb madHyomik skul] [E: Sri Sri Shankardev Higher Secondary School].

An example of an organization name that is not recognized:
a) [oxom kaziRoNga bisHobidyaloi] [E: Assam Kaziranga University].

2) Rules for person names:
• Rules for multiword person names:
A: Find the surname based on the common surname gazetteer list.
B: If found, tag the word as the end word of a person NE.
C: Search for the previous word in the surname gazetteer list.
D: If found, search for the previous word in the same gazetteer list; else
E: Search for the previous word in the title list.
F: If found, mark it as the beginning of the person NE tag; else
G: Mark the next word as the beginning.

Here are examples of multiword person names recognized by the rules:
a) [sRi hiRen kumaR boRuwa] [E: Sri Hiren Kumar Baruah].
b) [Ram xorma deb] [E: Ram Sharma Deb].

• Rules for single word person names:
– If the previous and succeeding words are verbs, the current word is most likely to be a person name. Examples:
a) [bHat khai Ram kHeliboloi goise] [E: Ram went to play after eating rice]. Here [Ram] [E: Ram] is the NE; the previous word [khai] [E: eat] and the succeeding word [kHeliboloi] [E: play] are both verbs.
b) [kHai utHi Ram poRhi ase] [E: Ram is studying after eating]. Here [Ram] [E: Ram] is the NE; the previous word [utHi] [E: stand] and the succeeding word [poRhi] [E: read] are both verbs.
– If two words in sequence are both verbs, the word before them is most likely to be a person name. Example:
i) [komol douRi ahise] [E: Kamal came running]. Here [komol] [E: Kamal] is the NE, and [douRi ahise] [E: came running] are both verbs.
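The multiword person-name rule (steps A-G above) can be sketched as a small function. This is a simplified reading of the rule: the surname and title gazetteers are tiny stand-ins for the real lists, the tokens use the paper's transliterations, and the B-PER/I-PER tagging scheme is an assumption for illustration.

```python
# Simplified sketch of the multiword person-name rule (steps A-G):
# a surname from the gazetteer ends a person NE (A/B), preceding
# surnames extend it backwards (C/D), and a title word, if present,
# marks the beginning (E/F); otherwise the earliest matched word
# starts the NE (G). Gazetteers here are tiny illustrative stand-ins.

SURNAMES = {"boRuwa", "xorma", "deb", "dAs"}
TITLES = {"sRi", "profEsor", "montRi"}

def tag_multiword_persons(tokens):
    """Return one tag per token: B-PER/I-PER for person NEs, else O."""
    tags = ["O"] * len(tokens)
    i = len(tokens) - 1
    while i >= 0:
        if tokens[i] in SURNAMES:                    # A/B: surname ends a NE
            end = i
            j = i - 1
            while j >= 0 and tokens[j] in SURNAMES:  # C/D: preceding surnames
                j -= 1
            # E/F: a title marks the beginning; G: otherwise start at the
            # earliest matched word.
            start = j if j >= 0 and tokens[j] in TITLES else j + 1
            for k in range(start, end + 1):
                tags[k] = "B-PER" if k == start else "I-PER"
            i = start - 1
        else:
            i -= 1
    return tags

# [profEsor dAs ahise] [E: Professor Das came]
print(tag_multiword_persons(["profEsor", "dAs", "ahise"]))
# -> ['B-PER', 'I-PER', 'O']
```

Note that a given name between a title and a surname (as in [sRi hiRen kumaR boRuwa]) would need an extra bridging step not spelled out in steps A-G; the sketch deliberately stays with the literal rule.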
3) Rules for locations: If there exists a word like [nogoR] [E: town], [zila] [E: district], [sohoR] [E: city] or [paRot] [E: Lane], the previous word represents a location named entity. Examples: [kamRup zila] [E: Kamrup district], [sonitpuR zila] [E: Sonitpur district].

4) Rules for miscellaneous NEs:
– If the current word is a number and the next word represents a unit of measurement such as [kilo] [E: Kilo] or [gRam] [E: Gram], it represents a measurement NE. Example: [E: 1 Kilo].
– If the current word is a digit and the following word is a month name, it represents a date NE. Example: [E: 1 June].
– If the current word is a number and the next word is a month name followed by a digit, it represents a date NE. Example: [E: 5 June 2011].
– If the current word is a digit followed by a word like [bojat], [minit], [ghonta] or [sekend] [E: o'clock, minute, hour, second], it represents a time NE. Example: [E: 3 mins].
– If there exists a month name preceded by a digit, it represents a date NE. Example: [E: 6-7 June].
– If there exists a digit followed by the word [son] or [bosoR] [E: year], it represents a date NE. Examples: [E: In 1992], [E: 10 years].
– If there exists a digit followed by the word [sonR] [E: year], then a digit and a month name, it represents a date NE. Example: [E: 1980, 23 May].
– If a month name is followed by the word for month, it represents a month NE. Example: [E: May month].
– If a digit range is followed by a month name, it represents a date NE. Example: [E: 1-4 June].
– If a dot exists between consecutive letters, the token is most likely an organization NE. Example: [b.j.p].

TABLE II
SIZES OF GAZETTEER LISTS

List                     | Entries
Surnames                 | 6,000
Locations                | 12,000
Organization clue words  | 37
Organization middle words| 24
Location clue words      | 29
Pre-nominal words        | 120
Organization names       | 800
Person names             | 96,000

The sizes of the different gazetteer lists we prepared for our purpose are shown in Table II; the lists are available at http://www.tezu.ernet.in/nlp/.

We have observed that certain methods handle certain issues better than other models, and vice versa. Thus, we established a precedence among the methods, each applied to the output of another. Below are the steps used in our proposed hybrid model.

1) With a large amount of training data, ML approaches normally give better results than the other methods applied individually.
2) Apply ML to the raw test data.
3) The rules for multiword person names, organization names and location names give the best results when dealing with NEs that have clue words. Thus, these rules are applied to the output of the ML approach; this not only tags the left-out data but also overwrites the existing tagged data wherever applicable. This helps us effectively handle the errors encountered in implementing the HMM, i.e., the lack of word features and ambiguity in names.
4) Apply the gazetteer-based approach to the untagged data in the output of Step 2.
5) Apply the ML-based smoothing technique to the remaining left-out words.
6) Apply the rules for single word person names as the last step on the untagged data.

The overall architecture of the hybrid approach is shown in Fig. 1.

We conducted standard 3-fold cross-validation experiments. Cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (the training set) and validating the analysis on the other subset (the test set). In each fold, a learning model is created from the training data and evaluated on the test data. Out of 0.2 million wordforms, a set of 130K wordforms was manually tagged with four tags, namely person, location, organization and miscellaneous. This set is used as the training set for the NER system, and the remaining 70K wordforms are used as test data. Words unseen during the training phase are assigned the class 0. We present the precision, recall and F-measure for each of the 3-fold experiments in Tables III, IV and V, and the average result is shown in Table VI. We saw in Table I that the best results reported for hybrid systems were obtained for Kannada and Manipuri, with 94.85% and 93.3% accuracy; in general, the smaller the test corpus, the higher the accuracy.

TABLE III
NER RESULTS FOR SET 1 USING HYBRID APPROACH

Classes       | Precision | Recall | F-measure(%)
Person        | 87        | 86.1   | 86.4
Location      | 87        | 83.2   | 85.05
Organization  | 88.1      | 86     | 86.4
Miscellaneous | 90        | 88     | 88.8

Fig. 1. Hybrid NER Architecture.

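The 3-fold protocol described above can be sketched as follows; the data here are placeholder strings, not the actual tagged Assamese corpus.

```python
# Sketch of 3-fold cross-validation: the data are split into three
# complementary subsets, and each fold trains on two subsets and tests
# on the third. The data below are placeholders, not the real corpus.

def three_fold_splits(data, k=3):
    """Yield (train, test) pairs for k-fold cross-validation."""
    n = len(data)
    folds = [data[i * n // k:(i + 1) * n // k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = [f"word{i}" for i in range(9)]
for train, test in three_fold_splits(data):
    print(len(train), len(test))  # each fold: 6 train, 3 test
```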
TABLE IV
NER RESULTS FOR SET 2 USING HYBRID APPROACH

Classes       | Precision | Recall | F-measure(%)
Person        | 87.2      | 85     | 86
Location      | 86        | 84     | 84.8
Organization  | 85.1      | 86     | 85.5
Miscellaneous | 88        | 87     | 87.4

TABLE V
NER RESULTS FOR SET 3 USING HYBRID APPROACH

Classes       | Precision | Recall | F-measure(%)
Person        | 86.1      | 84     | 85
Location      | 85        | 87     | 85.9
Organization  | 85.1      | 86     | 85.5
Miscellaneous | 89.1      | 87     | 88

TABLE VI
NER AVERAGE RESULTS FOR THE HYBRID APPROACH

Classes       | F-measure(%)
Person        | 85.8
Location      | 85.25
Organization  | 85.8
Miscellaneous | 88.06

VI. CONCLUSION

The proposed hybrid system is capable of recognizing four different types of named entities, namely Person, Location, Organization and Miscellaneous. We have considered a large dataset, and the system has gone through an extensive testing process. Handcrafted rules were also derived for different classes of NEs, and we found that handcrafted rules work well provided they are carefully prepared. The hand-coded rules result in an accuracy of 70%-75%. NER using a gazetteer list was also implemented, resulting in an accuracy of 75%-85%, and the pure ML-based approaches yield an accuracy of 75%-83%, whereas the hybrid approach achieves an accuracy of 85%-88%. The experimental results show that the hybrid approach outperforms both the pure rule-based and the pure ML approach.

REFERENCES

[1] R. Grishman and B. Sundheim, "Message Understanding Conference-6: A Brief History," in Proceedings of the 16th International Conference on Computational Linguistics (COLING), Copenhagen, Denmark, 1996, pp. 466-471.
[2] D. M. Bikel, S. Miller, R. Schwartz, and R. Weischedel, "A High-Performance Learning Name-finder," in Proceedings of the Fifth Conference on Applied Natural Language Processing, Washington, DC, USA, 1997, pp. 194-201.
[3] S. Miller, M. Crystal, H. Fox, L. Ramshaw, R. Schwartz, R. Stone, R. Weischedel, and the Annotation Group, "BBN: Description of the SIFT System as Used for MUC-7," in Proceedings of the Seventh Message Understanding Conference (MUC-7), Fairfax, Virginia, 1998, pp. 1-17.
[4] S. Yu, S. Bai, and P. Wu, "Description of the Kent Ridge Digital Labs System Used for MUC-7," in Proceedings of the Seventh Message Understanding Conference (MUC-7), Fairfax, Virginia, 1998.
[5] J. Lafferty, A. McCallum, and F. Pereira, "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data," in Proceedings of the Eighteenth International Conference on Machine Learning (ICML-2001), Williams College, Williamstown, MA, USA, 2001, pp. 282-289.
[6] C. Cortes and V. Vapnik, "Support-Vector Networks," Machine Learning, vol. 20, pp. 273-297, 1995.
[7] A. Borthwick, "A Maximum Entropy Approach to Named Entity Recognition," Ph.D. thesis, Computer Science Dept., New York University, 1999.
[8] E. Brill, "Transformation-based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging," Computational Linguistics, vol. 21, 1995.
[9] J. Aberdeen, J. Burger, D. Day, L. Hirschman, P. Robinson, and M. Vilain, "MITRE: Description of the Alembic System Used for MUC-6," in Proceedings of the 6th Conference on Message Understanding, Columbia, Maryland, 1995, pp. 141-155.
[10] A. Viterbi, "Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm," IEEE Transactions on Information Theory, vol. 13, pp. 260-269, 1967.
[11] M. Collins and Y. Singer, "Unsupervised Models for Named Entity Classification," in Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, Stroudsburg, PA, USA, 1999, pp. 100-110.
[12] E. Alfonseca and S. Manandhar, "An Unsupervised Method for General Named Entity Recognition and Automated Concept Discovery," in Proceedings of the 1st International Conference on General WordNet, Mysore, India, 2002, pp. 34-43.
[13] Y. Shinyama and S. Sekine, "Named Entity Discovery Using Comparable News Articles," in Proceedings of the 20th International Conference on Computational Linguistics, Stroudsburg, PA, USA, 2004.
[14] R. Yangarber, W. Lin, and R. Grishman, "Unsupervised Learning of Generalized Names," in Proceedings of the 19th International Conference on Computational Linguistics, Taipei, Taiwan, 2002, pp. 1135-1141.
[15] S. Cucerzan and D. Yarowsky, "Language Independent Named Entity Recognition Combining Morphological and Contextual Evidence," in Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, Stroudsburg, PA, USA, 1999, pp. 90-99.
[16] K. S. Bajwa and A. Kaur, "Hybrid Approach for Named Entity Recognition," International Journal of Computer Applications, vol. 118, pp. 36-41, 2015.
[17] S. K. Saha, S. Chatterji, S. Dandapat, S. Sarkar, and P. Mitra, "A Hybrid Approach for Named Entity Recognition in Indian Languages," in Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian Languages, Hyderabad, India, January 2008, pp. 17-24.
[18] N. Jahan and S. M. D. Chopra, "Named Entity Recognition in Indian Languages Using Gazetteer Method and Hidden Markov Model: A Hybrid Approach," International Journal of Computer Science & Engineering Technology (IJCSET), vol. 3, pp. 621-628, 2012.
[19] S. Srivastava, M. Sanglikar, and D. Kothari, "Named Entity Recognition System for Hindi Language: A Hybrid Approach," International Journal of Computational Linguistics (IJCL), vol. 2, pp. 10-23, 2011.
[20] S. Amarappa and S. V. Sathyanarayana, "A Hybrid Approach for Named Entity Recognition, Classification and Extraction (NERCE) in Kannada Documents," in Proceedings of the International Conference on Multimedia Processing, Communication and Information Technology (MPCIT), Shimoga, India, 2013, pp. 173-179.
[21] J. L. and D. Kaur, "Named Entity Recognition System in Manipuri: A Hybrid Approach," Lecture Notes in Computer Science, vol. 8105, pp. 104-110, 2013.
[22] D. Chopra, N. Jahan, and S. Morwal, "Hindi Named Entity Recognition by Aggregating Rule Based Heuristics and Hidden Markov Model," International Journal of Information Sciences and Techniques, vol. 2, pp. 43-52, 2012.
[23] Y. Kaur and E. Kaur, "Named Entity Recognition System for Hindi Language Using Combination of Rule Based Approach and List Look Up Approach," International Journal of Scientific Research and Management (IJSRM), vol. 3, pp. 2300-2306, 2015.