Mobile Application for Recognition of Japanese Writing System

Total Page:16

File Type:pdf, Size:1020Kb

Mobile Application for Recognition of Japanese Writing System Bachelor’s thesis Czech Technical University in Prague Faculty of Electrical Engineering F3 Department of Cybernetics Mobile Application for Recognition of Japanese Writing System Jan Zdeněk Program: Open Informatics Field: Computer and Information Science May 2014 Supervisor: prof. Ing. Pavel Zahradník, CSc. ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 2 Acknowledgement First of all, I would like to express my gratitude to my supervisor, prof. Ing. Pavel Zahradník, CSc., for giving me the opportunity to work on this project. I appreciate his guidance and comments on my work and I would like to thank him for giving me the freedom to explore on my own. I would like to acknowledge the help of my friend Jakub Černý who gave me his opinion on the thesis and provided me with helpful comments. Finally, I want to thank my parents for their love, encouragement and constant support throughout my studies. i Declaration / Prohlášení I declare that I worked out the presented thesis independently and I quoted all used sources of information in accord with Methodical instructions about ethical principles for writing academic thesis. In Prague, May 23, 2014 .................................................... Jan Zdeněk Prohlašuji, že jsem předloženou práci vypracoval samostatně a že jsem uvedl veškeré použité informační zdroje v souladu s Metodickým pokynem o dodržování etických principů při přípravě vysokoškolských závěrečných prací. V Praze dne, 23. května 2014 .................................................... Jan Zdeněk ii Abstract / Abstrakt The objective of this work was to implement and compare various methods which can be used for optical character recognition (OCR) of characters used in the Japanese language and create a mobile application which could recognize characters in an image captured by the camera of a device and present the user with a translation of the words into English. The engine for recognition has been trained on a database with 2,908 unique characters. Using the moment shape normalization, the Histogram of Oriented Gradients for feature ex- traction, and quadratic discriminant function for classification, recognition rate of 99.9% has been achieved on isolated characters in standard fonts used in print publications and the engine outperformed some of the free OCR engines for Japanese. An initial alpha version of a mobile application for Japanese OCR for Android operating system has been created and the engine with optimal results has been successfully implemented therein. The author intends to continue the development of the application and expand the range of its possible use. The OCR engine could also be used in other applications requiring Japanese OCR. Keywords: Pattern Recognition; OCR Cílem práce bylo implementovat a otestovat různé metody použitelné pro optické rozpozná- vání znaků (OCR) používaných v japonštině a vytvořit mobilní aplikaci, která bude rozpoznávat znaky v obraze zachyceném fotoaparátem zařízení a poskytne uživateli překlad textu do anglič- tiny. Systém pro rozpoznávání byl natrénován na databázi obsahující 2908 unikátních znaků. Za použití momentové normalizace pro normalizaci tvaru znaku, histogramu orientovaných gra- dientů pro extrakci příznaků a kvadratické diskriminační funkce pro klasifikaci bylo dosaženo úspěšnosti rozpoznání 99,9% na izolovaných znacích ve standardním fontu používaném v tiště- ných publikacích. Systém byl úspěšnější než některé existující volně šířené systémy pro OCR japonštiny. Byla vytvořena počáteční alfa verze aplikace pro operační systém Android a úspěšně v něm byla implementována verze systému dosahující nejlepších výsledků. Autor chce nadále pokračovat ve vývoji aplikace a rozšířit spektrum její použitelnosti. OCR systém aplikace by mohl bý použit i v dalších aplikacích vyžadujících OCR japonštiny. Klíčová slova: Rozpoznávání; optické rozpoznávání znaků iii Contents List of Figures vii List of Tables ix 1 Introduction 1 1.1 State of the Art and the Goal ............................. 1 1.2 Related Work ...................................... 3 1.3 Japanese Writing System ............................... 4 1.4 Thesis Structure .................................... 6 2 Dataset Generation 7 2.1 Supported Characters ................................. 7 2.2 Character Image Generation .............................. 8 3 Preprocessing and Feature Extraction 9 3.1 Grayscale Normalization ................................ 9 3.2 Shape Normalization .................................. 9 3.2.1 Aspect Ratio Adaptive Normalization .................... 10 3.2.2 Linear Normalization .............................. 10 3.2.3 Moment Normalization ............................ 11 3.2.4 Bi-moment Normalization ........................... 13 3.2.5 Smoothing ................................... 14 3.3 Feature Extraction ................................... 15 3.3.1 Zonal Line Density Method .......................... 16 3.3.2 Histogram of Oriented Gradients in Local Regions ............. 16 4 Learning and Classification 19 4.1 Dimensionality Reduction ............................... 19 4.1.1 Principal Component Analysis ........................ 19 4.1.2 Linear Discriminant Analysis ......................... 20 4.2 Classification Methods ................................. 21 4.2.1 k-Nearest Neighbors .............................. 22 4.2.2 Linear Discriminant Function ......................... 23 4.2.3 Quadratic Discriminant Function ....................... 24 4.3 Results .......................................... 24 iv Contents ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 5 Text Extraction 30 5.1 Binarization ....................................... 31 5.2 Black on White / White on Black Detection ..................... 32 5.3 Skew Correction .................................... 33 5.4 Character Segmentation ................................ 36 5.4.1 Projection Profile Analysis .......................... 36 6 Implementation 42 6.1 Training ......................................... 42 6.2 Comparison ....................................... 42 6.3 Application ....................................... 42 7 Conclusion 45 7.1 Summary ........................................ 45 7.2 Future Work ...................................... 46 References 47 A Abbreviations and Symbols 52 A.1 Abbreviations ...................................... 52 A.2 Symbols ......................................... 53 B CD Content 54 C Application Manual 56 D Dataset Format 57 E Matlab Demo for OCR 58 v List of Figures 1.1 Examples of hiragana, katakana, and kanji characters ............... 5 3.1 Examples of grayscale normalization ......................... 9 3.2 Examples of shape normalization ........................... 14 3.3 Example of image smoothing using a Gaussian filter ................ 15 3.4 Example of a character composed of multiple existing kanji ............ 15 3.5 Visualization of gradient direction and gradient strengths ............. 17 3.6 Division of a block during calculation of HOG .................... 18 4.1 Example of a k-NN classifier with 3 classes ..................... 22 4.2 Recognition error rates of tested shape normalization methods with varying sizes of feature vectors. Tested on standard print fonts. ................. 25 4.3 Recognition error rates of tested shape normalization methods with varying sizes of feature vectors. Tested on fonts resembling handwriting. ............ 25 4.4 Recognition error rates using different classification methods with varying sizes of feature vectors. Tested on standard print fonts. ................. 26 4.5 Recognition error rates using different classification methods with varying sizes of feature vectors. Tested on fonts resembling handwriting. ............ 26 4.6 Recognition error rates using different methods for feature dimensionality reduc- tion with varying sizes of feature vectors. Tested on standard print fonts. .... 26 4.7 Recognition error rates using different methods for feature dimensionality reduc- tion with varying sizes of feature vectors. Tested on fonts resembling handwriting. 26 4.8 Recognition error rates using different methods for feature extraction. Tested on standard print fonts. .................................. 27 4.9 Recognition error rates using different methods for feature extraction. Tested on fonts resembling handwriting. ............................. 27 4.10 Examples of correctly classified characters ...................... 28 4.11 Examples of incorrectly classified characters ..................... 28 5.1 Diagram of the steps from image acquisition to character classification. ...... 31 5.2 Example of binarization using Otsu’s thresholding method ............. 32 5.3 Radon transform mechanism ............................. 34 5.4 Example of Radon transform on a correctly aligned line of text .......... 35 5.5 Proposed method for skew correction using radon transform ............ 36 5.6 Example of correctly split multi-character projection block ............. 38 5.7 Example of correctly merged projection mark blocks ................ 40 5.8 Example of incorrectly merged projection blocks of two hiragana characters ... 41 vi List of Figures ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 5.9 An example of correctly segmented line of printed text. .............. 41 6.1 Screenshot of the Android application. ........................ 43 C.1 Screenshot of the Android application with description of elements. ........ 56 E.1 Screenshot of the demo application in Matlab. .................... 59 vii List of Tables 4.1 Minimum recognition error rate of different classification
Recommended publications
  • Specifications
    SPECIFICATIONS 9-PIN WIDE CARRIAGE IMPACT PRINTER The Versatile Data Reporter • Improved processing speed with a 64KB Input Data Buffer • Fast print speed of up to 337 Characters Per Second (12 CPI) • Achieve optimum efficiency with 5 Part Forms printout (1 original + 4 copies) • Greater connectivity with built-in USB, Serial and Parallel Interface options • Choice of 8 Built-in Bar Code formats for maximum versatility SPECIFICATIONS Weight – approx. 6.6 kg 9-PIN WIDE CARRIAGE IMPACT PRINTER 164mm PRINT DIRECTION Bi-directional with logic seeking PRINT SPEED HIGH SPEED DRAFT 10/12/15 CPI 300/337/337 cps HIGH SPEED DRAFT CONDENSED 17/20 CPI 321/300 cps 275mm DRAFT 10/12/15 CPI 225/270/225 cps 546mm DRAFT CONDENSED 17/20 CPI 191/225 cps NLQ 10/12/15/17/20 CPI 56/67/56/47/56 cps PRINT CHARACTER SETS 13 International character sets; 13 character code tables (Standard); Italic, PC437, PC850, PC860, PC861, CHARACTERISTICS PC863, PC865, Abicomp, BRASCII, Roman 8, ISO Latin 1, PC858, ISO 8859-15 BITMAP FONTS Epson Draft: 10, 12, 15 CPI; Epson Roman and Sans Serif: 10, 12, 15 CPI, Proportional BAR CODE FONT EAN-13, EAN-8, Interleaved 2 of 5, UPC-A, UPC-E, Code 39, Code 128, PostNet OPTIONS PRINTABLE PITCH(CPI) Character per line FABRIC RIBBON CARTRIDGE (BLACK) COLUMNS 10 CPI 136 FABRIC RIBBON PACK (BLACK) 12 CPI 163 SINGLE BIN CUT SHEET FEEDER 15 CPI 204 PULL TRACTOR UNIT 17 CPI CONDENSED 233 ROLL PAPER HOLDER 20 CPI CONDENSED 272 PAPER HANDLING PAPER PATH MANUAL INSERTION: Rear in, Top out PUSH TRACTOR: Rear in, Top out PULL TRACTOR: Rear/Bottom in, Top out CUT SHEET FEEDER: Rear in, Top out PAPER SIZE CUT SHEET MANUAL INSERTION Width: 148 ~ 420 mm (5.8 ~ 16.5") Length: 100 ~ 364 mm (3.9 ~ 14.3") Thickness: 0.065 ~ 0.14 mm (0.0025 ~ 0.0055") CSF SINGLE-BIN Width: 182 ~ 420 mm (7.2 ~ 16.5") Length: 210 ~ 364 mm (8.3 ~ 14.3") Thickness: 0.07 ~ 0.14 mm (0.0028 ~ 0.0055") MULTI PART Width: 148 ~ 420 mm (5.8 ~ 16.5") Length: 100 ~ 364 mm (3.9 ~ 14.3") Thickness: 0.12 ~ 0.39 mm (0.0047 ~ 0.015") (Total) ENVELOPE No.
    [Show full text]
  • ISO Basic Latin Alphabet
    ISO basic Latin alphabet The ISO basic Latin alphabet is a Latin-script alphabet and consists of two sets of 26 letters, codified in[1] various national and international standards and used widely in international communication. The two sets contain the following 26 letters each:[1][2] ISO basic Latin alphabet Uppercase Latin A B C D E F G H I J K L M N O P Q R S T U V W X Y Z alphabet Lowercase Latin a b c d e f g h i j k l m n o p q r s t u v w x y z alphabet Contents History Terminology Name for Unicode block that contains all letters Names for the two subsets Names for the letters Timeline for encoding standards Timeline for widely used computer codes supporting the alphabet Representation Usage Alphabets containing the same set of letters Column numbering See also References History By the 1960s it became apparent to thecomputer and telecommunications industries in the First World that a non-proprietary method of encoding characters was needed. The International Organization for Standardization (ISO) encapsulated the Latin script in their (ISO/IEC 646) 7-bit character-encoding standard. To achieve widespread acceptance, this encapsulation was based on popular usage. The standard was based on the already published American Standard Code for Information Interchange, better known as ASCII, which included in the character set the 26 × 2 letters of the English alphabet. Later standards issued by the ISO, for example ISO/IEC 8859 (8-bit character encoding) and ISO/IEC 10646 (Unicode Latin), have continued to define the 26 × 2 letters of the English alphabet as the basic Latin script with extensions to handle other letters in other languages.[1] Terminology Name for Unicode block that contains all letters The Unicode block that contains the alphabet is called "C0 Controls and Basic Latin".
    [Show full text]
  • Morse Code (Edited from Wikipedia)
    Morse Code (Edited from Wikipedia) SUMMARY Morse code is a method of transmitting text information as a series of on-off tones, lights, or clicks that can be directly understood by a skilled listener or observer without special equipment. It is named for Samuel F. B. Morse, an inventor of the telegraph. The International Morse Code encodes the ISO basic Latin alphabet, some extra Latin letters, the Arabic numerals and a small set of punctuation and procedural signals (prosigns) as standardized sequences of short and long signals called "dots" and "dashes", or "dits" and "dahs", as in amateur radio practice. Because many non-English natural languages use more than the 26 Roman letters, extensions to the Morse alphabet exist for those languages. Each Morse code symbol represents either a text character (letter or numeral) or a prosign and is represented by a unique sequence of dots and dashes. The duration of a dash is three times the duration of a dot. Each dot or dash is followed by a short silence, equal to the dot duration. The letters of a word are separated by a space equal to three dots (one dash), and the words are separated by a space equal to seven dots. The dot duration is the basic unit of time measurement in code transmission. To increase the speed of the communication, the code was designed so that the length of each character in Morse is shorter the more frequently it is used in the language. Thus the most common letter in English, the letter "E", has the shortest code, a single dot.
    [Show full text]
  • Living in Korea
    A Guide for International Scientists at the Institute for Basic Science Living in Korea A Guide for International Scientists at the Institute for Basic Science Contents ⅠOverview Chapter 1: IBS 1. The Institute for Basic Science 12 2. Centers and Affiliated Organizations 13 2.1 HQ Centers 13 2.1.1 Pioneer Research Centers 13 2.2 Campus Centers 13 2.3 Extramural Centers 13 2.4 Rare Isotope Science Project 13 2.5 National Institute for Mathematical Sciences 13 2.6 Location of IBS Centers 14 3. Career Path 15 4. Recruitment Procedure 16 Chapter 2: Visas and Immigration 1. Overview of Immigration 18 2. Visa Types 18 3. Applying for a Visa Outside of Korea 22 4. Alien Registration Card 23 5. Immigration Offices 27 5.1 Immigration Locations 27 Chapter 3: Korean Language 1. Historical Perspective 28 2. Hangul 28 2.1 Plain Consonants 29 2.2 Tense Consonants 30 2.3 Aspirated Consonants 30 2.4 Simple Vowels 30 2.5 Plus Y Vowels 30 2.6 Vowel Combinations 31 3. Romanizations 31 3.1 Vowels 32 3.2 Consonants 32 3.2.1 Special Phonetic Changes 33 3.3 Name Standards 34 4. Hanja 34 5. Konglish 35 6. Korean Language Classes 38 6.1 University Programs 38 6.2 Korean Immigration and Integration Program 39 6.3 Self-study 39 7. Certification 40 ⅡLiving in Korea Chapter 1: Housing 1. Measurement Standards 44 2. Types of Accommodations 45 2.1 Apartments/Flats 45 2.2 Officetels 46 2.3 Villas 46 2.4 Studio Apartments 46 2.5 Dormitories 47 2.6 Rooftop Room 47 3.
    [Show full text]
  • S P E C I F I C a T I O
    SPECIFIC A TIO N S CONTROL PANEL 24 pin Dot Matrix Printer 4 switches, 5 LEDs ENVIRONMENTAL CONDITIONS (OPERATING) TEMPERATURE 5 ~ 35°C HUMIDITY 10~80%RH ELECTRICAL SPECIFICATIONS RATED VOLTAGE AC 220V ~ 240V 210mm RATED FREQUENCY 50 ~ 60Hz POWER CONSUMPTION Approx. 37.5W (ISO/IEC 10561 letter pattern) Energy Star Compliant DIMENSIONS 480mm WIDTH x DEPTH x HEIGHT 480 x 370 x 210mm WEIGHT Approx. 6.8kg 370mm PAPER HANDLING PAPER PATH: Manual Insertion Front in, Front out Push Tractor Rear in, Front out PRINTER TYPE Optional Roll Paper holder Rear in, Front out MODEL LQ-690 PAPER SIZE: PRODUCT CODE C11CA13081 Cut Sheet Width 90 ~ 304.8mm (3.5 ~12.0”) PRINTING METHOD Impact dot matrix Length 70 ~ 420mm (2.8 ~16.5”) NUMBER OF PINS IN HEAD 24 pins Envelopes No.6, No.10 PRINT DIRECTION Bi-direction with logic seeking Continuous Paper Width 101.6 ~ 304.8 (4.0 ~12.0”) Length 76.2 ~ 558.8mm (3.0~22.0”) PRINT SPEED Label (Cut sheet) Base sheet width 101.6 ~ 304.8mm (4.0-12.0”) HIGH SPEED DRAFT 10cpi 440 cps Base sheet length 101.6 ~ 558.8mm (4.0-22.0”) 12cpi 529 cps Label (Continuous Paper) Base sheet width 100.0 ~ 210.0mm (3.9-8.3”) DRAFT 10cpi 330 cps Base sheet length 100.0 ~ 297.0mm (3.9-11.7”) 12cpi 396 cps PAPER THICKNESS: 15cpi 496 cps Cut Sheet Single Sheet 0.065 ~ 0.19mm (0.0025 ~ 0.0074”) 17cpi (Condensed) 283 cps Multi Part 0.12 ~ 0.49mm (0.0047 ~ 0.0019”) 20cpi (Condensed) 330 cps Envelopes 0.16 ~ 0.52mm (0.0063 ~ 0.0205”) LQ 10cpi 110 cps Continuous Paper Single Sheet/Multi Part 0.065 ~ 0.49 mm (0.0025 - 0.019 “) 12cpi 132 cps Label Thickness 0.16 ~ 0.19mm (0.0063 ~ 0.0075”) 15cpi 165 cps PAPER FEEDING: Friction Feed Front, Rear 17 cpi (Condensed) 188 cps Push tractor Feed Rear 20 cpi (Condensed) 220 cps Optional Feeder Friction Feeder Rear push tractor and Optional roll paper holder COPIES: 1 Original + 6 copies PRINT CHARACTERISTICS ACOUSTIC NOISE: Approx.
    [Show full text]
  • Fast Forward Your Business Efficiency
    DOT MATRIX PRINTER 9-PIN WIDE CARRIAGE DFX-9000 FAST FORWARD YOUR BUSINESS EFFICIENCY. Amplify your productivity with the Epson DFX-9000 dot matrix printer. Featuring speeds of up to 1,550 cps and a MTBF rating of 20,000 power- on hours (POH), this 9-pin, wide format printer can perform high-volume printing with great effi ciency and cost-savings. With a convenient LCD panel, versatile connectivity and advanced paper handling capabilities, the DFX-9000 is your reliable choice in a busy environment. 10-copies High Speed Power Cost Printing Printing Saving Effective Epson genuine ribbons ensure greater yield, better quality and longer lifespan. Remarkable Performance Effi cient and Reliable Flexibility and Versatility Print speeds of up to 1,550 characters per Enjoy resilient and long-lasting operations Prints up to 10-part forms and offers second (10 cpi) and a user-friendly LCD panel with low cost printing that goes easy on a wide range of connectivity and paper to maximise your productivity. your overheads. handling options. SPECIFICATIONS 9-PIN WIDE CARRIAGE DFX-9000 Model Number DFX-9000 Printer Type Dimensions & Weight Weight: Printing Method Impact Dot Matrix 34kg Number of Pins In Head 36 pins (9 × 4 staggered) Print Direction Bi-direction with logic seeking Control Code ESC/P®, IBM® PPDS emulation Print Speed High Speed Draft 10 / 12 cpi 1,550 / 1,450 cps 363mm Draft 10 / 12 cpi 1,320 / 1,320 cps NLQ 10 / 12 cpi 330 / 330 cps Print Characteristics Character Table For Standard version: 13 International character sets: Italic table,
    [Show full text]
  • Approximate Type © 2021 Dalat Specimen, Description Released: 2020 Designer: Kasper Pyndt Rasmussen Weights: 3 — Styles: 3 Av
    Approximate Type © 2021 Dalat Specimen, description Dalat Light Dalat Regular Dalat Bold The starting point of the project was a trip metric construction of the letters, while to Vietnam and a subsequent curiosity about the typeface’s soft serifs are derived from their prolific use of Cooper Black for street Chinese/Vietnamese calligraphy (thu phap) signage—ironically an archetypical American whose terminals tend to thicken and pool. typeface. Foreign conflicts and colonisation The assemblage of these styles results in a are deeply entangled in the story of Viet- typeface that points East as well as West nam which results in a very multi- facetted and finds its home in both contexts. With its visual culture. The typeface Dalat is an un- decorative and playful characteristics, Dalat likely hybrid and synthesis of these varying is mainly suitable for large text-sizes while references; French Art Deco and Russian maintaining good legibility in smaller sizes Constructivism comes across in the geo- as well. Released: 2020 Weights: 3 — Styles: 3 Designer: Kasper Pyndt Rasmussen Availability: Desktop, Web, App Approximate Type © 2021 Dalat Specimen, headlines (Light) EXPLICIT! 100 pt Tra#c Jam 15 CONFLICTS 70 pt Voyager In!ux? DANH VO, ARTIST. 50 pt Immediate A"ermath 218 CALLIGRAPHIC REFERENCES 30 pt Craftsmanship & Self-governance FRANCIS FORD COPPOLA OR OLIVER STONE?! 20 pt Experts in disagreement over Vietnam-!icks Approximate Type © 2021 Dalat Specimen, headlines (Regular) EXPLICIT! 100 pt Tra#c Jam 15 CONFLICTS 70 pt Voyager In!ux? DANH
    [Show full text]
  • Morse Code - Wikipedia Page 1 of 21
    Morse code - Wikipedia Page 1 of 21 Morse code From Wikipedia, the free encyclopedia Morse code is a method of transmitting text information as a series of on- off tones, lights, or clicks that can be directly understood by a skilled listener or observer without special equipment. It is named for Samuel F. B. Morse, an inventor of the telegraph. The International Morse Code[1] encodes the ISO basic Latin alphabet, some extra Latin letters, the Arabic numerals and a small set of punctuation and procedural signals (prosigns) as standardized sequences of short and long signals called "dots" and "dashes",[1] or "dits" and "dahs", as in amateur radio practice. Because many non-English natural languages use more than the 26 Roman letters, extensions to the Morse alphabet exist for those languages. Each Morse code symbol represents either a text character (letter or numeral) or a prosign and is represented by a unique sequence of dots and dashes. The duration of a dash is three times the duration of a dot. Each dot or dash is followed by a short silence, equal to the dot duration. The letters of a word are separated by a space equal to three dots (one dash), and the words are separated by a space equal to seven dots. The dot duration is the basic unit of time measurement in code transmission.[1] To increase the speed of the communication, the code was designed so that the length of each character in Morse varies approximately inversely to its frequency of occurrence in English. Thus the most common letter in English, the letter "E", has the shortest code, a single dot.
    [Show full text]
  • ISO Basic Latin Alphabet
    E 1 E For the mathematical constant, see e (mathematical constant). For other uses, see E (disambiguation). For technical reasons, "E#" redirects here. For E sharp, see E♯. ISO basic Latin alphabet Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz •• v •• t [1] • e Cursive script 'e' and capital 'E' E (named e /ˈiː/, plural ees)[2] is the fifth letter and a vowel in the ISO basic Latin alphabet. It is the most commonly used letter in many languages, including: Czech, Danish, Dutch, English, French, German, Hungarian, Latin, Norwegian, Spanish, and Swedish. E 2 History Egyptian Phoenician Etruscan Greek Roman/ hieroglyph He E Epsilon Cyrillic q’ E The Latin letter 'E' differs little from its derivational source, the Greek letter epsilon, 'Ε'. In etymology, the Semitic hê has been suggested to have started as a praying or calling human figure (hillul 'jubilation'), and was probably based on a similar Egyptian hieroglyph that indicated a different pronunciation. In Semitic, the letter represented /h/ (and /e/ in foreign words), in Greek hê became epsilon with the value /e/. Etruscans and Romans followed this usage. Although Middle English spelling used 'e' to represent long and short /e/, the Great Vowel Shift changed long /eː/ (as in 'me' or 'bee') to /iː/ while short /e/ (as in 'met' or 'bed') remained a mid vowel. No. Use in other languages In the International Phonetic Alphabet, /e/ represents the close-mid front unrounded vowel. In the orthography of many languages it represents either this or /ɛ/, or some variation (such as a nasalized version) of these sounds, often with diacritics (as: 〈e ê é è ë ē ĕ ě ẽ ė ẹ ę ẻ〉) to indicate contrasts.
    [Show full text]
  • Specifications
    SPECIFICATIONS 9-PIN WIDE CARRIAGE IMPACT PRINTER Minimum cost, maximum workload • Improved processing speed with a 64KB Input Data Buffer • Fast print speed of up to 337 Characters Per Second (12 CPI) • Achieve optimum efficiency with 5 Part Forms printout (1 original + 4 copies) • Greater connectivity with built-in USB, Serial and Parallel Interface options • Choice of 8 Built-in Bar Code formats for maximum versatility SPECIFICATIONS Weight – approx. 4.4 kg 9-PIN WIDE CARRIAGE IMPACT PRINTER 159mm PRINT DIRECTION Bi-directional with logic seeking PRINT SPEED HIGH SPEED DRAFT 10/12/15 CPI 300/337/337 cps 275mm HIGH SPEED DRAFT CONDENSED 17/20 CPI 321/300 cps DRAFT 10/12/15 CPI 225/270/225 cps DRAFT CONDENSED 17/20 CPI 191/225 cps 366mm NLQ 10/12/15/17/20 CPI 56/67/56/47/56 cps PRINT CHARACTER SETS 13 International character sets; 13 character code tables (Standard); Italic, PC437, PC850, PC860, PC861, CHARACTERISTICS PC863, PC865, Abicomp, BRASCII, Roman 8, ISO Latin 1, PC858, ISO 8859-15 BITMAP FONTS Epson Draft: 10, 12, 15 CPI; Epson Roman and Sans Serif: 10, 12, 15 CPI, Proportional BAR CODE FONT EAN-13, EAN-8, Interleaved 2 of 5, UPC-A, UPC-E, Code 39, Code 128, PostNet PRINTABLE PITCH (CPI) Character per line OPTIONS COLUMNS 10 CPI 80 FABRIC RIBBON CARTRIDGE (BLACK) 12 CPI 96 FABRIC RIBBON PACK (BLACK) 15 CPI 120 FABRIC RIBBON CARTRIDGE (COLOUR) 17 CPI CONDENSED 137 SINGLE BIN CUT SHEET FEEDER 20 CPI CONDENSED 160 PULL TRACTOR UNIT PAPER HANDLING COLOUR UPGRADE KIT PAPER PATH MANUAL INSERTION: Rear in, Top out ROLL PAPER HOLDER PUSH TRACTOR: Rear in, Top out PULL TRACTOR: Rear/Bottom in, Top out CUT SHEET FEEDER: Rear in, Top out PAPER SIZE CUT SHEET MANUAL INSERTION Width: 100 ~ 257 mm (3.9 ~ 10.1") Length: 100 ~ 364 mm (3.9 ~ 14.3") Thickness: 0.065 ~ 0.14 mm (0.0025 ~ 0.0055") CSF SINGLE-BIN Width: 182 ~ 216 mm (7.2 ~ 8.5") Length: 257 ~ 356 mm (10.1 ~ 14.0") Thickness: 0.07 ~ 0.14 mm (0.0028 ~ 0.0055") MULTI PART Width: 100 ~ 257 mm (3.9 ~ 10.1") Length: 100 ~ 364 mm (3.9 ~ 14.3") Thickness: 0.12 ~ 0.39 mm (0.0047 ~ 0.015") (Total) ENVELOPE No.
    [Show full text]
  • Crash Course on Character Encodings
    Crash Course on Character Encodings Yusuke Shinyama NYCNLP Oct. 27, 2006 Introduction 2 Are they the same? • Unicode • UTF 3 Two Mappings Character Byte Character Code Sequence A 64 64 182 216 1590 ﺽ 美 32654 231 190 142 4 Two Mappings Character Byte Character Code Sequence Unicode UTF-8 A 64 64 182 216 1590 ﺽ 美 32654 231 190 142 “Character Set” “Encoding Scheme” 5 Terminology • Character Set - Mapping from abstract characters to numbers. • Encoding Scheme - Way to represent (encode) a number in a byte sequence in a decodable way. - Only necessary for character sets that have more than 256 characters. 6 In ASCII... Character Byte Character Code Sequence ASCII 5 53 53 A 65 65 m 109 109 7 Character Sets 8 Character Sets • ≤ 256 characters: - ASCII (English) - ISO 8859-1 (English & Western European languages) - KOI8 (Cyrillic) - ISO-8859-6 (Arabic) • 256 < characters: - Unicode - GB 2312 (Simplified Chinese) - Big5 (Traditional Chinese) - JISX 0208 (Japanese) - KPS 9566 (North Korean) 9 Character Sets ISO ISO ASCII GB 2312 Unicode 8859-1 8859-6 A 65 65 65 65 65 ë - 235 - - 235 1590 - 214 - - ﺽ 美 - - - 50112 32654 ♥ - - - - 9829 10 Unicode Standard • History - ISO Universal Character Set (1989) - Unicode 1.0 (1991) ■ 16-bit fixed length codes. - Unicode 2.0 (1996) ■ Oops, we’ve got many more. ■ Extended to 32 bits. - Unicode 5.0 (2006) ■ Keep growing... 11 Unicode Standard • Hexadecimal notation (U+XXXX). • ISO 8859-1 is preserved as the first 256 characters. ISO Unicode 8859-1 U+0041 A 65 (65) U+00EB ë 235 (235) 12 Problems in Unicode • Politics (Microsoft, Apple, Sun, ...) • Lots of application specific characters.
    [Show full text]
  • ISO/IEC International Standard ISO/IEC 10646
    ISO/IEC International Standard ISO/IEC 10646 Final Committee Draft Information technology – Universal Coded Character Set (UCS) Technologie de l‘information – Jeu universel de caractères codés (JUC) Third edition, 2012 ISO/IEC 10646:2012 (E) Final Committee Draft (FCD) PDF disclaimer This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area. Adobe is a trademark of Adobe Systems Incorporated. Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF- creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the add ress given below. © ISO/IEC 2010 All rights reserved. Unless otherwise specified, no part of this publica tion may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the a d- dress below or ISO's member body in the country of the requester. ISO copyright office Case postale 56 CH-1211 Geneva 20 Tel. + 41 22 749 01 11 Fax + 41 22 749 09 47 E-mail [email protected] Web www.iso.ch Printed in Switzerland ii © ISO/IEC 2010 – All rights reserved ISO/IEC 10646:2012 (E) Final Committee Draft (FCD) CONTENTS Foreword................................................................................................................................................
    [Show full text]