Transliteration Rules - Arabic

Arabic Transliteration Rules in Document 9303
Mike ELLIS
ISO, Australia
V3.4.1 10/2011

Main points:
1. Identification: the name in ARABIC script is the only reliable basis
2. Countries that use the Arabic script should have the benefits of the MRZ

The Arabic name must appear in Latin characters in the VIZ
8.3 Languages and characters. When the mandatory elements of Zones I, II and III are in a national language that does not use the Latin alphabet, a transliteration shall also be provided.

Status quo: name in VIZ copied to MRZ
Many phonetic transcriptions of the same Arabic name, and possibly over 9,000 more...
MRZ same as VIZ
Much variation in (Latin) name

IDENTIFICATION: the Arabic name is unique
محمود عبدالرحيم
So the solution is to base the MRZ on the Arabic name.
But the MRZ can only contain OCR-B A-Z and <
9.4.1 Names in the MRZ are represented differently from those in the VIZ. National characters must be transliterated using only the allowed OCR character set [A..Z].

Solution: transliteration of the Arabic name into Latin
Transliteration table based on closest match + 'escape'

Transliteration table (Technical Report, Appendix 1)

Name                    MRZ          Unicode  Arabic letter
hamza                   XE           0621     ء
alef with madda above   XAA          0622     آ
alef with hamza above   XAE          0623     أ
waw with hamza above    XW           0624     ؤ
alef with hamza below   I            0625     إ
yeh with hamza above    XI           0626     ئ
alef                    A            0627     ا
beh                     B            0628     ب
teh marbuta             XTA/XAH [1]  0629     ة
teh                     T            062A     ت
theh                    XTH          062B     ث
jeem                    J            062C     ج
hah                     XH           062D     ح
khah                    XKH          062E     خ
dal                     D            062F     د
thal                    XDH          0630     ذ
reh                     R            0631     ر

[1] XTA is used generally, except when teh marbuta occurs at the end of the name component, in which case XAH is used.
Transliteration table, continued (Technical Report, Appendix 1)

Name                                MRZ          Unicode  Arabic letter
zain                                Z            0632     ز
seen                                S            0633     س
sheen                               XSH          0634     ش
sad                                 XSS          0635     ص
dad                                 XDZ          0636     ض
tah                                 XTT          0637     ط
zah                                 XZZ          0638     ظ
ain                                 E            0639     ع
ghain                               G            063A     غ
feh                                 F            0641     ف
qaf                                 Q            0642     ق
kaf                                 K            0643     ك
lam                                 L            0644     ل
meem                                M            0645     م
noon                                N            0646     ن
heh                                 H            0647     ه
waw                                 W            0648     و
alef maksura                        XAY          0649     ى
yeh                                 Y            064A     ي
shadda                              [DOUBLE] [1] 0651     ّ
alef wasla                          XXA          0671     ٱ
tteh                                XXT          0679     ٹ
teh with ring                       XRT          067C     ټ
peh                                 P            067E     پ
hah with hamza above                XKE          0681     ځ
hah with 3 dots above               XXH          0685     څ
tcheh                               XC           0686     چ
ddal                                XXD          0688     ڈ
dal with ring                       XDR          0689     ډ
rreh                                XXR          0691     ڑ
reh with ring                       XRR          0693     ړ
reh with dot below and dot above    XRX          0696     ږ
jeh                                 XJ           0698     ژ
seen with dot below and dot above   XXS          069A     ښ
keheh                               XKK          06A9     ک
kaf with ring                       XXK          06AB     ګ
ng                                  XNG          06AD     ڭ
gaf                                 XGG          06AF     گ
noon ghunna                         XNN          06BA     ں
noon with ring                      XXN          06BC     ڼ
heh doachashmee                     XDO          06BE     ھ
heh with yeh above                  XYH          06C0     ۀ
heh goal                            XXG          06C1     ہ
heh goal with hamza above           XGE          06C2     ۂ
teh marbuta goal                    XTG          06C3     ۃ
farsi yeh                           XYA          06CC     ی
yeh with tail                       XXY          06CD     ۍ
yeh                                 Y            06D0     ې
yeh barree                          XYB          06D2     ے
yeh barree with hamza above         XBE          06D3     ۓ

[1] Shadda denotes doubling: the Latin character or sequence is repeated, e.g. عبّاس becomes EBBAS; فضّة becomes FXDZXDZXAH.

Some Arabic characters have near matches

First name component: محمود
Meem  م  M
Hah   ح  XH   ('H' is already assigned to "Heh", so use 'X' as escape: Heh -> 'H', Hah -> 'XH')
Meem  م  M
Waw   و  W
Dal   د  D

Second name component: عبدالرحيم
Ain   ع  E    ("Ain" has no exact Latin equivalent, so assign 'E')
Beh   ب  B
Dal   د  D
Alef  ا  A
Lam   ل  L
Reh   ر  R
Hah   ح  XH
Yeh   ي  Y
Meem  م  M

Reiterate: the MRZ holds a different form of the name from the VIZ
9.4.1 Names in the MRZ are represented differently from those in the VIZ.
National characters must be transliterated using only the allowed OCR character set [A..Z].

Reiterate: the MRZ holds a different form of the name from the VIZ
9.1.3 The data in the MRZ are formatted in such a way as to be readable by machines with standard capability worldwide. It must be stressed that the MRZ is reserved for data intended for international use in conformance with international Standards for MRPs. The MRZ is a different representation of the data than is found in the VIZ.

Transliteration - advantage
Name in MRZ is unique (= Arabic name)
Arabic name can be read directly from the MRZ: محمود عبدالرحيم
Chip holds the name in DG11 (Unicode), so the two can be compared
Unicode (محمود): 0645, 062D, 0645, 0648, 062F
Unicode (عبدالرحيم): 0639, 0628, 062F, 0627, 0644, 0631, 062D, 064A, 0645

IMPLEMENTATION ISSUES

IMPLEMENTATION - Legacy Database and Border Control
MEHMOOD ABD AL RAHEEM
MAHMOOD ABDUL RAHIM
MAHMUT ABD AR-RAHEEM
MAHMUD ABDALRAHEEM
MAHMOUD ABD-AL-RAHIIM
MAHMUT ABDUL RAHIIM
Original Arabic form: محمود عبدالرحيم
Other variations can be derived.

IMPLEMENTATION - Name in VIZ
1. Use Civil Register (transcription), or
2. Status quo transcription, or
3. Use MRZ transliteration

IMPLEMENTATION - PNR
This is only a concern when PNR is used for API, not when PNR is for airline use.
The IATA/CAWG "API Statement of Principles", presented to the Facilitation (FAL) Division, Twelfth Session (Cairo, Egypt, 22 March to 2 April 2004), stated that:
"Required API data should be limited to the data contained in the machine-readable zone of travel documents or obtainable from existing government databases, such as those containing visa issuance information."

IMPLEMENTATION - AIRLINE CHECKING
1. PNR = MRZ
2. PNR = both VIZ and MRZ

CONCLUSION
1. Essential: solves the identity management issue, required for Interpol, aviation security, etc.
2. Using the original name in Arabic is the only way
3. The Arabic name is now machine readable
4. There will be transitional implementation problems; these are not impossible to solve, and the goal is worthwhile

Transliteration Rules - Arabic (WP17)
Mike Ellis
ISO (JTC1 SC17/WG3 TF3) on behalf of the New Technologies Working Group (NTWG)
Presented to ICAO TAG/MRTD 20, Montreal, September 2011
20th Meeting of the Technical Advisory Group on Machine Readable Travel Documents
V3.4 09/2011
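
The transliteration rules described in these slides (table lookup with the 'X' escape, shadda doubling, and the teh marbuta position rule) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Document 9303 procedure itself: only a subset of the table is included, and the helper names (transliterate_component, transliterate_name, recover_component, codepoints) are hypothetical names invented for this example. The reverse lookup assumes longest-match decoding, which works for this subset; it does not attempt to restore shadda or teh marbuta.

```python
# Subset of the slide table (Technical Report, Appendix 1): Arabic letter -> MRZ code.
# ASSUMPTION: only the letters needed for the slide examples are listed here.
MRZ_TABLE = {
    "\u0627": "A",    # alef
    "\u0628": "B",    # beh
    "\u062D": "XH",   # hah ('H' is taken by heh, so 'X' escapes it)
    "\u062F": "D",    # dal
    "\u0631": "R",    # reh
    "\u0633": "S",    # seen
    "\u0636": "XDZ",  # dad
    "\u0639": "E",    # ain
    "\u0641": "F",    # feh
    "\u0644": "L",    # lam
    "\u0645": "M",    # meem
    "\u0647": "H",    # heh
    "\u0648": "W",    # waw
    "\u064A": "Y",    # yeh
}
TEH_MARBUTA = "\u0629"
SHADDA = "\u0651"


def transliterate_component(component: str) -> str:
    """Transliterate one name component (hypothetical helper)."""
    out = []
    for i, ch in enumerate(component):
        if ch == SHADDA:
            # Shadda doubles the preceding Latin character or sequence.
            if not out:
                raise ValueError("shadda with no preceding letter")
            out.append(out[-1])
        elif ch == TEH_MARBUTA:
            # XAH at the end of the name component, XTA elsewhere.
            out.append("XAH" if i == len(component) - 1 else "XTA")
        else:
            out.append(MRZ_TABLE[ch])  # KeyError for letters outside the subset
    return "".join(out)


def transliterate_name(name: str) -> list[str]:
    """Transliterate each whitespace-separated component of a name."""
    return [transliterate_component(part) for part in name.split()]


# Reverse table for reading the Arabic letters back from the MRZ form.
REVERSE = {code: letter for letter, code in MRZ_TABLE.items()}
MAX_CODE_LEN = max(len(code) for code in REVERSE)


def recover_component(mrz: str) -> str:
    """Longest-match decode of an MRZ component (ASSUMPTION: subset only)."""
    out, i = [], 0
    while i < len(mrz):
        for n in range(MAX_CODE_LEN, 0, -1):
            letter = REVERSE.get(mrz[i:i + n])
            if letter is not None:
                out.append(letter)
                i += n
                break
        else:
            raise ValueError(f"no table entry at position {i} of {mrz!r}")
    return "".join(out)


def codepoints(text: str) -> list[str]:
    """Four-digit hexadecimal Unicode values, as listed on the DG11 slide."""
    return [f"{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    # The example name from the slides, written with escapes for clarity.
    name = "\u0645\u062D\u0645\u0648\u062F \u0639\u0628\u062F\u0627\u0644\u0631\u062D\u064A\u0645"
    print(transliterate_name(name))  # ['MXHMWD', 'EBDALRXHYM']
```

Running the sketch on the example name gives the same letter-by-letter result as the worked example in the slides, and codepoints applied to a value returned by recover_component yields the Unicode values that the slides list for comparison with DG11.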
