HANDWRITTEN NANDINAGARI IMAGE RETRIEVAL SYSTEM BASED ON MACHINE LEARNING APPROACH USING BAG OF VISUAL WORDS
Prathima Guruprasad1, Dr. Jharna Majumdar2


1 Dept. of Computer Science and Engineering, NMIT, University of Mysore, PB No. 6429, Govindapura, Gollahalli, Yelahanka, Bangalore, India
2 Dept. of CSE (PG) and Center for Robotics Research, NMIT, University of Mysore, PB No. 6429, Govindapura, Gollahalli, Yelahanka, Bangalore, India

ABSTRACT
Human handwriting comes in different styles and is highly variable and inconsistent, which makes recognition of rare handwritten Nandinagari scripts extremely challenging. This paper proposes a Bag of Visual Words (BoVW) based approach to retrieve similar handwritten Nandinagari character images from a visual vocabulary. The technique supports efficient and accurate retrieval of images from a large database and helps achieve the required scalability. Character images are represented as histograms of the visual words present in the image, where visual words are quantized representations of local regions. In the proposed work, SIFT descriptors at key interest points are used as feature vectors. For an efficient handwritten Nandinagari character retrieval system, the optimal number of clusters is 32, with a maximum mean Average Precision (mAP) of 0.904.

Keywords: Character image retrieval, Bag of Visual Words, SIFT features, codebook generation, clustering, indexing and retrieval, visual vocabulary.

I. INTRODUCTION
The Indian regional languages are cursive in nature, so reliable and efficient recognition systems are still not available. Many of these languages appear in different forms, subject to change from region to region and over time. They carry a rich heritage preserved in palm-leaf manuscripts as well as in other handwritten forms. Recognition is even more challenging for rare ancient handwritten Nandinagari manuscripts, Nandinagari being an earlier form of the Devanagari script. In recent years, large-scale image indexing and retrieval have shown ample scope in both the corporate sector and the research community. The focus of this work is to use invariant descriptors to represent such rare manuscripts. Retrieval of similar images from a large image database is a challenging problem, because memory requirements, processing time, and indexing and retrieval times all grow with database size. Effective and accurate retrieval when the images vary in size, orientation, and occlusion is a further concern.

In the proposed work, the Bag of Visual Words (BoVW) model is used for handwritten Nandinagari character image retrieval, a technique widely used in text-based retrieval systems. It mirrors the way the human mind recognizes similar objects by forming visual words. In BoVW, a Nandinagari character image is represented as discrete visual words, i.e., quantized representations of the large set of feature descriptors extracted from the images. During recognition, similar images are retrieved in very little time by computing the histogram of visual-word frequencies and finding the closest histogram match. To process the handwritten Nandinagari script, a complex script of 52 characters comprising 15 vowels and 37 consonants, the following steps are used. First, the Nandinagari character dataset is prepared by converting the images to grayscale in different sizes and orientations. Then the interest points are detected and invariant feature descriptors are generated around the key points using the SIFT (Scale Invariant Feature Transform) technique; this is the pre-processing step for the BoVW approach. Next comes codebook generation, in which the local SIFT descriptors are quantized into visual words; this is also called vocabulary generation. The last step is indexing and searching for similar images in the visual vocabulary. The approach is highly scalable. Recognition accuracy is measured with the mean Average Precision method.

II. REVIEW OF LITERATURE
Large-scale image retrieval systems use the BoVW model, in which each image is viewed as a sparse vector of visual words, by analogy with text-retrieval systems [2]. The BoVW model is built on local descriptors that describe the regions around key points detected in the images [3]. Instead of global features, an image is represented by a set of patches around key interest points, and the 128-dimensional SIFT descriptor has proved to be a good way to represent the characteristics of these patches [4][5]. But when such local descriptors are extracted for every image, the total number of features is very large, and comparing and searching for matches to each local descriptor of a query image becomes too cumbersome. BoVW was therefore proposed to solve this problem by compacting the descriptors into visual words, which reduces the descriptor size [6][8] and enables scalable indexing and fast search in the vector space.

III. PROPOSED METHODOLOGY
The training phase of the proposed model for recognizing handwritten Nandinagari scripts using data visualization, shown in Figure 1, has the following steps. The resulting visual vocabulary is used to match query images in the recognition phase.

Algorithm steps:
Step 1: Preparation of the dataset. No standard dataset for handwritten Nandinagari characters is available. As input, handwritten Nandinagari characters of different orientations and styles are prepared. A total of 1049 handwritten Nandinagari character images are taken for result analysis, with minimal or no pre-processing beyond conversion to grayscale.
Step 2: Feature extraction. This is done with the SIFT method. First the key interest points are identified; they usually lie on high-contrast regions of the image, such as corners, and remain invariant under changes in scale, rotation, and illumination. Then a 128-element feature descriptor is extracted for each identified key point. The steps are shown in Figure 2.
Step 3: Representation as visual words. The large set of local SIFT feature descriptors is represented as visual words using the BoVW approach. Similar descriptors are grouped by k-means clustering [7], and a codebook of dimension k x 128 is generated from the k cluster centroids; these are called the k visual words. Each key-point descriptor of an image is assigned to the closest cluster centroid, and the assignments are aggregated into a histogram of visual-word frequencies.
Step 4: Indexing and retrieval. In the recognition phase, similar characters are retrieved from the codebook by querying with the nearest-neighbour approach, and the top N similar images are retrieved from the database.

IV. EXPERIMENTAL RESULTS
As input, handwritten Nandinagari characters of four sizes (256 x 256, 384 x 384, 512 x 512, and 640 x 640) and five rotations (0, 45, 90, 135, and 180 degrees) are considered, giving 1049 characters in the database. For SIFT feature extraction, a set of images (.jpg or .png files) in a folder is processed and the extracted features are written to text files: for each image the key points are identified first, and a 128-element feature descriptor is generated for each key point. In this way all 1049 input images are fed to the feature-extraction module, and the feature files, representing different types and styles of images, are written to a directory. The average number of interest points per image ranges from 63 to 268, the average execution time ranges from 45 ms to 218 ms, and the feature-descriptor size ranges from 35 MB to 166 MB. These statistics are shown in Figure 3.

Figure 1: Proposed model. Training path: input samples -> conversion to grayscale images -> extraction of invariant features with the SIFT method -> codebook generation from the feature descriptors using k-means -> visual-word representation in the vocabulary database. Query path: query images -> conversion to grayscale images -> extraction of invariant features with the SIFT method -> generation of the query vector -> indexing and retrieval of similar images with the nearest-neighbour method -> retrieved results.

Figure 2: Steps in the SIFT method. 1. Construct the scale space; 2. Take the difference of Gaussians; 3. Locate the DoG extrema (minima or maxima); 4. Remove edge and low-contrast responses; 5. Assign key-point orientations; 6. Generate the feature descriptors.

In the SIFT codebook-generation phase, the k-means clustering approach is used and the codebook is generated only once. The codebook file stores only the cluster centres, and their number depends on the number of clusters formed for a given value of k. To analyse performance, codebooks with 8, 16, 24, 32, and 52 cluster centroids are generated; these are called the visual words, and each internally maps to all the key points present in its cluster.

Figure 3: Average number of interest points, average execution time, and feature-descriptor size for the character images.

Table I. Codebook size and clustering time for different numbers of clusters
Sl. No.   No. of clusters   Index size (KB)   Indexing time (ms)
1         8                  8,176            95,250
2         16                16,895            88,513
3         24                24,648            93,430
4         32                33,251            91,417
5         52                53,761            85,466

In the recognition phase, the query image vector is matched with the nearest-neighbour method, and the top N similar images are retrieved from the visual vocabulary. From the image set, 10 representative query samples, shown in Table 3, are taken to analyse the performance of the proposed framework.

Table 3: Query image samples (representative entry: character 'A', size 512 x 512, orientation 0 degrees).

ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-4, ISSUE-4, 2017
INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)
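The pipeline of Steps 2-4 (descriptors -> k-means codebook -> visual-word histograms -> nearest-neighbour retrieval) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: toy 2-D descriptors stand in for the 128-D SIFT vectors, the image names and values are invented, and a small deterministic k-means replaces a full clustering library.

```python
import math

def build_codebook(descriptors, k, iters=10):
    """K-means (Lloyd's algorithm) with deterministic farthest-point seeding.
    Returns k cluster centroids -- the 'visual words' of the codebook."""
    centroids = [descriptors[0]]
    while len(centroids) < k:
        # Seed with the descriptor farthest from all chosen centroids.
        centroids.append(max(descriptors,
                             key=lambda d: min(math.dist(d, c) for c in centroids)))
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for d in descriptors:
            nearest = min(range(k), key=lambda i: math.dist(d, centroids[i]))
            buckets[nearest].append(d)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(x) / len(bucket) for x in zip(*bucket))
    return centroids

def encode(descriptors, codebook):
    """Represent an image as a normalized histogram of visual-word frequencies."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[min(range(len(codebook)), key=lambda i: math.dist(d, codebook[i]))] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def retrieve(query_hist, database, top_n):
    """Rank database entries (name, histogram) by nearest-neighbour distance."""
    ranked = sorted(database, key=lambda entry: math.dist(query_hist, entry[1]))
    return [name for name, _ in ranked[:top_n]]

# Toy stand-in for the dataset: two characters, each in two orientations.
# Descriptors of the same character fall near the same local patterns.
images = {
    "A_0deg":  [(0.1, 0.2), (0.2, 0.1), (5.0, 5.1)],
    "A_45deg": [(0.2, 0.2), (0.1, 0.1), (5.1, 5.0)],
    "B_0deg":  [(9.0, 0.1), (9.1, 0.2), (0.1, 9.0)],
    "B_45deg": [(9.2, 0.1), (9.0, 0.3), (0.2, 9.1)],
}
all_descriptors = [d for ds in images.values() for d in ds]
codebook = build_codebook(all_descriptors, k=3)
database = [(name, encode(ds, codebook)) for name, ds in images.items()]

query_hist = encode([(0.15, 0.15), (5.05, 5.05)], codebook)  # resembles an 'A'
print(retrieve(query_hist, database, top_n=2))  # → ['A_0deg', 'A_45deg']
```

Because both 'A' variants share the same visual words, their histograms match the query far more closely than the 'B' histograms do, which is exactly why the histogram representation tolerates the size and rotation variations described above.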
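Retrieval quality is scored with mean Average Precision, the metric behind the reported best value of 0.904 at 32 clusters. A compact sketch of the standard definition follows; the ranked lists and relevance sets here are illustrative, not data from the paper.

```python
def average_precision(ranked, relevant):
    """AP: mean of precision@i over the ranks i where a relevant item appears."""
    hits, precisions = 0, []
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(results):
    """results: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)

# Two toy queries: the first ranks both relevant items on top (AP = 1.0),
# the second finds them at ranks 1 and 3 (AP = (1/1 + 2/3) / 2 ≈ 0.833).
queries = [
    (["a1", "a2", "b1", "b2"], {"a1", "a2"}),
    (["a1", "b1", "a2", "b2"], {"a1", "a2"}),
]
print(round(mean_average_precision(queries), 3))  # → 0.917
```

Averaging AP over all queries rewards codebooks that consistently rank the matching character variants near the top, which is why mAP is a natural criterion for picking the cluster count.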