Domain Name in Pakistani Languages

www.crulp.org

Domain Name

Internationalized Domain Name

What letters of Pakistani languages should be allowed in Internationalized Domain Names (IDNs)?

- For each language?
- Collectively?

Morning Session

- Background: Unicode
- Internationalized Domain Names (IDNs)
- Issues and challenges related to Arabic IDNs
- Sample (tentative) solution for Urdu language

Afternoon Session

- Sample language tables for the following languages
  - Balochi
  - Pashto
  - Punjabi
  - Seraiki
  - Sindhi
  - Torwali
- Collective issues for multiple languages

Background: Unicode

- Everything in a computer is represented as numbers
- Initially, ASCII encoding:
  - A → 65
  - B → 66
  - …
- ASCII only supported Latin script, primarily English
- Other encodings were developed for other languages, but it is cumbersome to develop a separate encoding for each language of the world
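The character-to-number mapping above can be checked directly. This is a minimal sketch in Python; the helper name `is_ascii` is only illustrative.

```python
# Characters are stored as numbers: ord() gives the number, chr() the character.
for ch in "AB":
    print(ch, "->", ord(ch))


def is_ascii(text: str) -> bool:
    """Return True if every character fits the 7-bit ASCII range 0..127."""
    return all(ord(c) < 128 for c in text)
```

A Latin-script name such as `www.crulp.org` passes this check, while any Arabic-script letter does not, which is the limitation the rest of the deck addresses.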

Unicode

- Thus an effort started to develop a universal encoding, or Unicode
- The Unicode Consortium develops the Unicode standard
- Covers almost all writing systems in current use today
- First version, 'The Unicode Standard 1.0', published in 1991
- Current version, 'The Unicode Standard 5.1', published in April 2008

Unicode

- European scripts
  - Latin, Greek, Cyrillic, Armenian, Georgian, IPA
- Bidirectional (Middle Eastern) scripts
  - Hebrew, Arabic, Syriac, Thaana
- Indic (Indian and Southeast Asian) scripts
  - Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Khmer, Myanmar, Tibetan, Philippine
- East Asian scripts
  - Chinese (Han) characters, Japanese (Hiragana and Katakana), Korean (Hangul), Yi

Unicode

- Other modern scripts
  - Mongolian, Ethiopic, Cherokee, Canadian Aboriginal
- Historical scripts
  - Runic, Ogham, Old Italic, Gothic, Deseret
- Punctuation and symbols
  - Numerals, math symbols, scientific symbols, arrows, blocks, geometric shapes, Braille, musical notation, etc.

Unicode is SCRIPT based

- One code per character per script
  - To avoid duplicating codes for the same letter used by multiple languages
  - For example: the character code U+06A9 (ک) is the same in Urdu, Sindhi, Pashto, Punjabi, Farsi, …
- Different code blocks are reserved for different scripts
  - For Arabic script: 0600, 0601, …, 06FE, 06FF
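A short sketch of the two points above: one code per character, and a reserved block per script. The names `ARABIC_BLOCK`, `code_point` and `in_arabic_block` are illustrative, not part of any standard API.

```python
# The basic Arabic block named on the slide: U+0600 .. U+06FF.
ARABIC_BLOCK = range(0x0600, 0x0700)

KEHEH = "\u06A9"  # the same code is used for Urdu, Sindhi, Pashto, Punjabi, Farsi


def code_point(ch: str) -> str:
    """Format a character's code in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


def in_arabic_block(ch: str) -> bool:
    """True if the character's code lies in U+0600..U+06FF."""
    return ord(ch) in ARABIC_BLOCK
```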

Character Semantics

- The Unicode standard includes an extensive database that specifies a large number of character properties, including:
  - Name
  - Type (e.g., letter, digit, punctuation mark)
  - Decomposition
  - Case and case mappings (for cased letters)
  - Numeric value (for digits and numerals)
  - Combining class (for combining characters)
  - Cursive joining behavior
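Several of these properties can be read with Python's standard `unicodedata` module, as in the sketch below. The function name `describe` is illustrative. The module does not expose cursive joining behavior, so that property is left out, and the values returned depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few Unicode Character Database properties for one character."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),            # type, e.g. Lo, Nd, Mn
        "decomposition": unicodedata.decomposition(ch),  # empty string if none
        "combining": unicodedata.combining(ch),          # combining class
        "numeric": unicodedata.numeric(ch, None),        # None if not numeric
    }


for ch in ("\u0622", "\u06F5", "\u0653"):
    print(f"U+{ord(ch):04X}", describe(ch))
```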

Unicode

- Adopted by industry leaders such as Apple, HP, IBM, Microsoft, etc.
- Supported on many platforms including Java, Linux and Microsoft Windows, etc.
- Supported by many internationalized applications including Open Office, Firefox, Thunderbird, Microsoft Office, etc.

Unicode is the basis for Internationalized Domain Names

Morning Session

✓ Background: Unicode
- Internationalized Domain Names (IDNs)
- Issues and challenges related to Arabic IDNs
- Sample (tentative) solution for Urdu language

Domain Name System (DNS)

- A domain name is the address of a website, used to access it, e.g. www.crulp.org

Domain Name System (DNS)

[Figure: resolving www.crulp.org. (1) The user sends www.crulp.org to the ISP; (2) the ISP passes www.crulp.org to the Domain Name Server, which holds the entry www.crulp.org = 192.168.0.1; (3) the Domain Name Server returns 192.168.0.1; (4) the ISP contacts the host server at 192.168.0.1; (5) the requested page is found / not found; (6) the reply is returned to the user.]

Need of IDNs

- The domain name system (DNS) is in ASCII, i.e. Latin script
- This makes it difficult for people who do not understand English or Latin script to access the internet

IDNs

- The basic reason is that internet addresses map onto the 7-bit ASCII standard
- We cannot change the overall existing system
- The solution is to add a layer that works on top of the existing system
- An IDN is any domain name consisting of labels which can be converted to ASCII format
- Initial set of protocols defined in 2003

IDNs

- A layer that takes the address in local languages and converts it into ASCII format
- DNS continues to resolve ASCII format addresses
- IDNs may be resolved at the user's computer
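The conversion layer can be tried with Python's built-in `idna` codec, as a rough sketch. Note the assumptions: this codec implements the original 2003 protocol (RFC 3490), not the IDNA 200X revision, and the Urdu label اردو used here is only an example, not a registered name.

```python
def to_ascii(domain: str) -> str:
    """Convert each label of a domain name to its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert an ASCII-compatible domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


# Example label "اردو" written with escapes to keep the source ASCII-only.
name = "\u0627\u0631\u062F\u0648.com"
ace = to_ascii(name)
print(ace)              # an xn-- label followed by .com
print(to_unicode(ace))  # the original Unicode name
```

The DNS only ever sees the `xn--` form, so the existing resolution infrastructure is unchanged.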

Internationalizing Domain Names in Applications (IDNA)

[Figure: resolving an Urdu-script domain name. (1) The name is converted to ASCII Compatible Encoding, giving http://www.xn--mgbahbnpifd6na4a4c58gep.com/; (4) the Domain Name Server returns 192.16.0.1; (5) the ISP contacts the host server at 192.16.0.1; (6) the requested page is found / not found; (7) the reply is returned to the user.]

IDNA 200X

- Some issues observed in the original IDNA2003 protocol
  - Dependence on Unicode version 3.2
  - Hardcoded language-specific separators
  - …
- Decision to revise the original standard taken in 2006
- New standard, IDNA 200X, currently under development

IDNA 200X

- Assigns a value to every character in the Unicode Character Database (UCD) on the basis of its Unicode properties
  - PVALID (valid, or allowed)
  - DISALLOWED
  - CONTEXTO or CONTEXTJ (depends on the context)

Morning Session

✓ Background: Unicode
✓ Internationalized Domain Names (IDNs)
- Arabic IDNs
- Sample (tentative) solution for Urdu language

Arabic Script

- Arabic script is the second most widely used script after Latin script
- It is used for writing Arabic, Urdu, Persian, Balochi, Pashto, Sindhi and many other languages across Pakistan and the world

Arabic Script

- Arabic script is defined in the following ranges:
  - U+0600 to U+06FF
  - U+0750 to U+077F
  - U+FB50 to U+FDFF (obsolete presentation forms, except the U+FDFx sequence)
  - U+FE70 to U+FEFF (obsolete presentation forms)
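The four ranges listed above translate directly into a membership test. This is a minimal sketch; `ARABIC_RANGES` and `is_arabic_script` are illustrative names, and the ranges are copied from the slide rather than derived from the Unicode script property.

```python
# Inclusive code point ranges for Arabic script, as listed on the slide.
ARABIC_RANGES = [
    (0x0600, 0x06FF),
    (0x0750, 0x077F),
    (0xFB50, 0xFDFF),  # presentation forms
    (0xFE70, 0xFEFF),  # presentation forms
]


def is_arabic_script(ch: str) -> bool:
    """True if the character falls in one of the listed Arabic ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_RANGES)
```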

Arabic Script

- Cursive script
- Each letter may have up to four different shapes depending on its position (isolated, initial, medial or final)
- Written from right to left
- But numerals are written left to right
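The mixed direction of letters and numerals is recorded in each character's bidirectional class, which the standard `unicodedata` module exposes. A small sketch follows; the function name `bidi_classes` is illustrative.

```python
import unicodedata


def bidi_classes(text: str) -> list:
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(c) for c in text]


# AL = Arabic letter (right to left), EN = European number,
# AN = Arabic number, L = left-to-right letter.
print(bidi_classes("\u0627"))        # ALEF
print(bidi_classes("2001"))          # ASCII digits
print(bidi_classes("\u0662\u06F2"))  # Arabic-Indic 2, Extended Arabic-Indic 2
```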

Arabic Script

- Diacritics (optionally) used for vowels
- Stretched shapes used for text justification
- Shapes of letters highly context sensitive
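Because diacritics are optional, two spellings of a word may differ only in their vowel marks. The sketch below removes nonspacing marks so such spellings compare equal. The function name and the example word are illustrative, and whether diacritics should be ignored in domain names at all is a policy question this sketch does not settle.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop nonspacing marks (general category Mn), e.g. FATHA and KASRA."""
    return "".join(c for c in text if unicodedata.category(c) != "Mn")


with_fatha = "\u062A\u064E\u06CC\u0631"  # TEH + FATHA + FARSI YEH + REH
with_kasra = "\u062A\u0650\u06CC\u0631"  # TEH + KASRA + FARSI YEH + REH
print(strip_diacritics(with_fatha) == strip_diacritics(with_kasra))
```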

Positional Shapes of Different Letters

Isolated | Initial | Medial | Final
ﺏ | ﺑ | ﺒ | ﺐ
ﭺ | ﭼ | ﭽ | ﭻ
ﻭ | NA | NA | ﻮ
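The positional shapes can be looked up from the presentation-form ranges, whose decomposition entries name the position and the base letter. This is a sketch using the standard `unicodedata` module; `positional_forms` is an illustrative name, and the presentation forms are used only to display the shapes, since they are obsolete for encoding text.

```python
import unicodedata

POSITIONS = ("<isolated>", "<initial>", "<medial>", "<final>")


def positional_forms(letter: str) -> dict:
    """Map position name -> presentation-form character for one base letter."""
    target = f"{ord(letter):04X}"
    forms = {}
    # Scan Arabic Presentation Forms-A and Forms-B.
    for cp in list(range(0xFB50, 0xFE00)) + list(range(0xFE70, 0xFF00)):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] in POSITIONS and parts[1] == target:
            forms[parts[0].strip("<>")] = chr(cp)
    return forms


print(positional_forms("\u0628"))  # BEH: four shapes
print(positional_forms("\u0648"))  # WAW: isolated and final only
```

Letters such as WAW do not join to the following letter, which is why their initial and medial cells are marked NA.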

Issues in Arabic Script Encoding

- Similar character shapes across languages create confusion, e.g. the Urdu character ى and the Pashto character ى have similar shapes
- These different shapes are used as distinct letters in different languages. In Sindhi these are two different characters: ڪ (U+06AA) and ک (U+06A9)

Confusable Variants of Different Characters

Unicode | Isolated form | Initial form | Medial form | Final form | Remarks
U+0643 ARABIC LETTER KAF | ك | كا | بكا | بك |
U+06A9 ARABIC LETTER KEHEH | ک | کا | بکا | بک | used for Persian and Urdu
U+06AA ARABIC LETTER SWASH KAF | ڪ | ڪا | بڪا | بڪ | used for Sindhi
U+064A ARABIC LETTER YEH | ي | يع | ميل | بيلي |
U+06CC ARABIC LETTER FARSI YEH | ی | یع | میل | یلی | Arabic, Persian, Urdu
U+0649 ARABIC LETTER ALEF MAKSURA | ى | | | بى |

Optional Diacritics

- Words are normally written without diacritics, e.g. in Urdu تیر can be read as:
  - تَیر /t̪ær/ (swim)
  - تِیر /t̪ir/ (arrow)

Space

- No concept of space between words in Urdu
- Need a separator character
  - Words may assume wrong shapes without a separator, e.g. دس دن will be displayed erroneously as دسدن without a separator
- One solution is the Zero Width Non Joiner (ZWNJ), but users are unfamiliar with it
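The ZWNJ option can be sketched as follows. U+200C is the code point of ZERO WIDTH NON-JOINER; the helper name `join_words` is illustrative, and whether ZWNJ is accepted in a given label depends on the protocol's contextual rules, which are not modeled here.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER: invisible, but stops letters joining


def join_words(words: list) -> str:
    """Join words into one label, keeping word shapes apart with ZWNJ."""
    return ZWNJ.join(words)


das = "\u062F\u0633"  # first word of the slide's example
din = "\u062F\u0646"  # second word of the slide's example
label = join_words([das, din])
print(len(das + din), len(label))  # the label is one character longer
```

The two strings look almost identical on screen but are different labels, which is part of why unfamiliar users find ZWNJ hard to work with.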

Bidirectionality

[Example: an Urdu sentence containing the number 2001, mixing right-to-left text with a left-to-right numeral]

Normalization

- There are characters that can be typed in more than one way, e.g. U+0622 (آ) = U+0627 (ا) + U+0653
- We have to normalize these characters

Composed Form | Decomposed Form
U+0622 (آ) | U+0627 (ا) + U+0653
U+0623 (أ) | U+0627 (ا) + U+0654
U+0624 (ؤ) | U+0648 (و) + U+0654
U+0625 (إ) | U+0627 (ا) + U+0655
U+0626 (ئ) | U+064A (ي) + U+0654
U+0675 (ٵ) | U+0627 (ا) + U+0674
U+0676 (ٶ) | U+0648 (و) + U+0674
U+0677 (ٷ) | U+06C7 (ۇ) + U+0674
U+0678 (ٸ) | U+064A (ي) + U+0674
U+06C0 (ۀ) | U+06D5 (ە) + U+0654
U+06C2 (ۂ) | U+06C1 (ہ) + U+0654
U+06D3 (ۓ) | U+06D2 (ے) + U+0654
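The pairs in these tables can be checked with the standard `unicodedata.normalize` function. The sketch below uses NFC (canonical composition). One point worth noting: the pairs built on U+0674 (HIGH HAMZA) are compatibility decompositions, so NFC does not compose them and they only appear under the K forms (NFKD/NFKC). The function name `nfc` is illustrative.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC so equivalent spellings become one code sequence."""
    return unicodedata.normalize("NFC", text)


# Canonical pairs compose under NFC.
print(nfc("\u0627\u0653") == "\u0622")  # ALEF + MADDAH ABOVE
print(nfc("\u06D2\u0654") == "\u06D3")  # YEH BARREE + HAMZA ABOVE

# Compatibility pair: U+0675 only decomposes under the K forms.
print(unicodedata.normalize("NFKD", "\u0675") == "\u0627\u0674")
```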

Confusable Characters

U+0643 (ك) | U+06A9 (ک) | U+06AA (ڪ)
U+06CC (ی) | U+0649 (ى) | U+06CD (ۍ)
U+0647 (ه) | U+06C1 (ہ) | U+06D5 (ە)

Confusable Characters…

U+06F0 (۰) | U+0660 (٠)
U+06F1 (۱) | U+0661 (١)
U+06F2 (۲) | U+0662 (٢)
U+06F3 (۳) | U+0663 (٣)
U+06F5 (۵) | U+0665 (٥)
U+06F7 (۷) | U+0667 (٧)
U+06F8 (۸) | U+0668 (٨)
U+06F9 (۹) | U+0669 (٩)
U+0672 (ٲ) | U+0623 (أ)
U+0673 (ٳ) | U+0625 (إ)

Morning Session

✓ Background: Unicode
✓ Internationalized Domain Names (IDNs)
✓ Arabic IDNs
- Sample (tentative) solution for Urdu language

Urdu IDNs

- The following are CONTEXTO by IDNA200X but are not recommended for Urdu

Unicode | Character | Description | Current status in IDNA 200X | Recommendation
0600 | ؀ | ARABIC NUMBER SIGN | CONTEXTO | NO
0601 | ؁ | ARABIC SIGN SANAH | CONTEXTO | NO
0602 | ؂ | ARABIC FOOTNOTE MARKER | CONTEXTO | NO
0603 | ؃ | ARABIC SIGN SAFHA | CONTEXTO | NO
06DD | ۝ | ARABIC END OF AYAH | CONTEXTO | NO

Urdu IDNs

- The following are PVALID by IDNA200X but are not recommended for Urdu

Unicode | Character | Description | Current status in IDNA 200X | Recommendation
0615 | | ARABIC SMALL HIGH TAH | PVALID | NO
0640 | ـ | ARABIC TATWEEL | PVALID | NO
0657 | ٗ | ARABIC INVERTED DAMMA | PVALID | NO
0659 | | ARABIC ZWARAKAY | PVALID | NO
065A | | ARABIC VOWEL SIGN SMALL V ABOVE | PVALID | NO
065B | | ARABIC VOWEL SIGN INVERTED SMALL V ABOVE | PVALID | NO
065C | | ARABIC VOWEL SIGN DOT BELOW | PVALID | NO
065D | | ARABIC REVERSED DAMMA | PVALID | NO
065E | | ARABIC FATHA WITH TWO DOTS | PVALID | NO
0671 | ٱ | ARABIC LETTER ALEF WASLA | PVALID | NO
0672 | ٲ | ARABIC LETTER ALEF WITH WAVY HAMZA ABOVE | PVALID | NO
0673 | ٳ | ARABIC LETTER ALEF WITH WAVY HAMZA BELOW | PVALID | NO
0674 | ٴ | ARABIC LETTER HIGH HAMZA | PVALID | NO
067A | ٺ | ARABIC LETTER TTEHEH | PVALID | NO
067B | ٻ | ARABIC LETTER BEEH | PVALID | NO
067C | ټ | ARABIC LETTER TEH WITH RING | PVALID | NO
067D | ٽ | ARABIC LETTER TEH WITH THREE DOTS ABOVE DOWNWARDS | PVALID | NO
067F | ٿ | ARABIC LETTER TEHEH | PVALID | NO
0680 | ڀ | ARABIC LETTER BEHEH | PVALID | NO
0681 | ځ | ARABIC LETTER HAH WITH HAMZA ABOVE | PVALID | NO
0682 | ڂ | ARABIC LETTER HAH WITH TWO DOTS VERTICAL ABOVE | PVALID | NO
0683 | ڃ | ARABIC LETTER NYEH | PVALID | NO
0684 | ڄ | ARABIC LETTER DYEH | PVALID | NO
0685 | څ | ARABIC LETTER HAH WITH THREE DOTS ABOVE | PVALID | NO
0687 | ڇ | ARABIC LETTER TCHEHEH | PVALID | NO
0689 | ډ | ARABIC LETTER DAL WITH RING | PVALID | NO
068A | ڊ | ARABIC LETTER DAL WITH DOT BELOW | PVALID | NO
068B | ڋ | ARABIC LETTER DAL WITH DOT BELOW AND SMALL TAH | PVALID | NO
068C | ڌ | ARABIC LETTER DAHAL | PVALID | NO
068D | ڍ | ARABIC LETTER DDAHAL | PVALID | NO
068E | ڎ | ARABIC LETTER DUL | PVALID | NO
068F | ڏ | ARABIC LETTER DAL WITH THREE DOTS ABOVE DOWNWARDS | PVALID | NO
0690 | ڐ | ARABIC LETTER DAL WITH FOUR DOTS ABOVE | PVALID | NO
0692 | ڒ | ARABIC LETTER REH WITH SMALL V | PVALID | NO
0693 | ړ | ARABIC LETTER REH WITH RING | PVALID | NO
0694 | ڔ | ARABIC LETTER REH WITH DOT BELOW | PVALID | NO
0695 | ڕ | ARABIC LETTER REH WITH SMALL V BELOW | PVALID | NO
0696 | ږ | ARABIC LETTER REH WITH DOT BELOW AND DOT ABOVE | PVALID | NO
0697 | ڗ | ARABIC LETTER REH WITH TWO DOTS ABOVE | PVALID | NO
0699 | ڙ | ARABIC LETTER REH WITH FOUR DOTS ABOVE | PVALID | NO
069A | ښ | ARABIC LETTER SEEN WITH DOT BELOW AND DOT ABOVE | PVALID | NO
069B | ڛ | ARABIC LETTER SEEN WITH THREE DOTS BELOW | PVALID | NO
069C | ڜ | ARABIC LETTER SEEN WITH THREE DOTS BELOW AND THREE DOTS ABOVE | PVALID | NO
069D | ڝ | ARABIC LETTER SAD WITH TWO DOTS BELOW | PVALID | NO
069E | ڞ | ARABIC LETTER SAD WITH THREE DOTS ABOVE | PVALID | NO
069F | ڟ | ARABIC LETTER TAH WITH THREE DOTS ABOVE | PVALID | NO
06A0 | ڠ | ARABIC LETTER AIN WITH THREE DOTS ABOVE | PVALID | NO
06A1 | ڡ | ARABIC LETTER DOTLESS FEH | PVALID | NO
06A2 | ڢ | ARABIC LETTER FEH WITH DOT MOVED BELOW | PVALID | NO
06A3 | ڣ | ARABIC LETTER FEH WITH DOT BELOW | PVALID | NO
06A4 | ڤ | ARABIC LETTER VEH | PVALID | NO
06A5 | ڥ | ARABIC LETTER FEH WITH THREE DOTS BELOW | PVALID | NO
06A6 | ڦ | ARABIC LETTER PEHEH | PVALID | NO
06A7 | ڧ | ARABIC LETTER QAF WITH DOT ABOVE | PVALID | NO
06A8 | ڨ | ARABIC LETTER QAF WITH THREE DOTS ABOVE | PVALID | NO
06AA | ڪ | ARABIC LETTER SWASH KAF | PVALID | NO
06AB | ګ | ARABIC LETTER KAF WITH RING | PVALID | NO
06AC | ڬ | ARABIC LETTER KAF WITH DOT ABOVE | PVALID | NO
06AD | ڭ | ARABIC LETTER NG | PVALID | NO
06AE | ڮ | ARABIC LETTER KAF WITH THREE DOTS BELOW | PVALID | NO
06B0 | ڰ | ARABIC LETTER GAF WITH RING | PVALID | NO
06B1 | ڱ | ARABIC LETTER NGOEH | PVALID | NO
06B2 | ڲ | ARABIC LETTER GAF WITH TWO DOTS BELOW | PVALID | NO
06B3 | ڳ | ARABIC LETTER GUEH | PVALID | NO
06B4 | ڴ | ARABIC LETTER GAF WITH THREE DOTS ABOVE | PVALID | NO
06B5 | ڵ | ARABIC LETTER LAM WITH SMALL V | PVALID | NO
06B6 | ڶ | ARABIC LETTER LAM WITH DOT ABOVE | PVALID | NO
06B7 | ڷ | ARABIC LETTER LAM WITH THREE DOTS ABOVE | PVALID | NO
06B8 | ڸ | ARABIC LETTER LAM WITH THREE DOTS BELOW | PVALID | NO
06B9 | ڹ | ARABIC LETTER NOON WITH DOT BELOW | PVALID | NO
06BB | ڻ | ARABIC LETTER RNOON | PVALID | NO
06BC | ڼ | ARABIC LETTER NOON WITH RING | PVALID | NO
06BD | ڽ | ARABIC LETTER NOON WITH THREE DOTS ABOVE | PVALID | NO
06BF | ڿ | ARABIC LETTER TCHEH WITH DOT ABOVE | PVALID | NO
06C4 | ۄ | ARABIC LETTER WAW WITH RING | PVALID | NO
06C5 | ۅ | ARABIC LETTER KIRGHIZ OE | PVALID | NO
06C6 | ۆ | ARABIC LETTER OE | PVALID | NO
06C7 | ۇ | ARABIC LETTER U | PVALID | NO
06C8 | ۈ | ARABIC LETTER YU | PVALID | NO
06C9 | ۉ | ARABIC LETTER KIRGHIZ YU | PVALID | NO
06CA | ۊ | ARABIC LETTER WAW WITH TWO DOTS ABOVE | PVALID | NO
06CB | ۋ | ARABIC LETTER VE | PVALID | NO
06CD | ۍ | ARABIC LETTER YEH WITH TAIL | PVALID | NO
06CE | ێ | ARABIC LETTER YEH WITH SMALL V | PVALID | NO
06CF | ۏ | ARABIC LETTER WAW WITH DOT ABOVE | PVALID | NO
06D0 | ې | ARABIC LETTER E | PVALID | NO
06D1 | ۑ | ARABIC LETTER YEH WITH THREE DOTS BELOW | PVALID | NO
06D5 | ە | ARABIC LETTER AE | PVALID | NO
06D6 | | ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA | PVALID | NO
06D7 | | ARABIC SMALL HIGH LIGATURE QAF WITH LAM WITH ALEF MAKSURA | PVALID | NO
06D8 | | ARABIC SMALL HIGH MEEM INITIAL FORM | PVALID | NO
06D9 | | ARABIC SMALL HIGH LAM ALEF | PVALID | NO
06DA | | ARABIC SMALL HIGH JEEM | PVALID | NO
06DB | | ARABIC SMALL HIGH THREE DOTS | PVALID | NO
06DC | | ARABIC SMALL HIGH SEEN | PVALID | NO
06DF | | ARABIC SMALL HIGH ROUNDED ZERO | PVALID | NO
06E0 | | ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO | PVALID | NO
06E1 | | ARABIC SMALL HIGH DOTLESS HEAD OF KHAH | PVALID | NO
06E2 | | ARABIC SMALL HIGH MEEM ISOLATED FORM | PVALID | NO
06E3 | | ARABIC SMALL LOW SEEN | PVALID | NO
06E4 | | ARABIC SMALL HIGH MADDA | PVALID | NO
06E5 | | ARABIC SMALL WAW | PVALID | NO
06E6 | | ARABIC SMALL YEH | PVALID | NO
06E7 | | ARABIC SMALL HIGH YEH | PVALID | NO
06E8 | | ARABIC SMALL HIGH NOON | PVALID | NO
06EA | | ARABIC EMPTY CENTRE LOW STOP | PVALID | NO
06EB | | ARABIC EMPTY CENTRE HIGH STOP | PVALID | NO
06EC | | ARABIC ROUNDED HIGH STOP WITH FILLED CENTRE | PVALID | NO
06ED | | ARABIC SMALL LOW MEEM | PVALID | NO
06EE | | ARABIC LETTER DAL WITH INVERTED V | PVALID | NO
06EF | | ARABIC LETTER REH WITH INVERTED V | PVALID | NO
06FF | | ARABIC LETTER HEH WITH INVERTED V | PVALID | NO

Urdu Language Table For IDNs

- The following characters are DISALLOWED by IDNA 200X and are not required for Urdu

Unicode | Character | Description | Current status in IDNA 200X | Recommendation
060B | | AFGHANI SIGN | DISALLOWED | NO
060C | | ARABIC COMMA | DISALLOWED | NO
060D | | ARABIC DATE SEPARATOR | DISALLOWED | NO
060E | | ARABIC POETIC VERSE SIGN | DISALLOWED | NO
060F | | ARABIC SIGN MISRA | DISALLOWED | NO
061B | | ARABIC SEMICOLON | DISALLOWED | NO
061E | | ARABIC TRIPLE DOT PUNCTUATION MARK | DISALLOWED | NO
061F | | ARABIC QUESTION MARK | DISALLOWED | NO
066A | | ARABIC PERCENT SIGN | DISALLOWED | NO
066B | | ARABIC DECIMAL SEPARATOR | DISALLOWED | NO
066C | | ARABIC THOUSANDS SEPARATOR | DISALLOWED | NO
066D | | ARABIC FIVE POINTED STAR | DISALLOWED | NO
0675 | ٵ | ARABIC LETTER HIGH HAMZA ALEF | DISALLOWED | NO
0676 | | ARABIC LETTER HIGH HAMZA WAW | DISALLOWED | NO
0677 | | ARABIC LETTER U WITH HAMZA ABOVE | DISALLOWED | NO
0678 | | ARABIC LETTER HIGH HAMZA YEH | DISALLOWED | NO
06D4 | | ARABIC FULL STOP | DISALLOWED | NO
06DE | | ARABIC START OF RUB EL HIZB | DISALLOWED | NO
06E9 | | ARABIC PLACE OF SAJDAH | DISALLOWED | NO
06FD | | ARABIC SIGN SINDHI AMPERSAND | DISALLOWED | NO
FDF2 | ﷲ | ARABIC LIGATURE ALLAH ISOLATED FORM | DISALLOWED | NO
FDF3 | ﷳ | ARABIC LIGATURE AKBAR ISOLATED FORM | DISALLOWED | NO
FDF4 | ﷴ | ARABIC LIGATURE MOHAMMAD ISOLATED FORM | DISALLOWED | NO
FDF5 | | ARABIC LIGATURE SALAM ISOLATED FORM | DISALLOWED | NO
FDF6 | ﷶ | ARABIC LIGATURE RASOUL ISOLATED FORM | DISALLOWED | NO
FDF7 | ﷷ | ARABIC LIGATURE ALAYHE ISOLATED FORM | DISALLOWED | NO
FDF8 | ﷸ | ARABIC LIGATURE WASALLAM ISOLATED FORM | DISALLOWED | NO
FDF9 | ﷹ | ARABIC LIGATURE SALLA ISOLATED FORM | DISALLOWED | NO
FDFA | ﷺ | ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM | DISALLOWED | NO
FDFB | ﷻ | ARABIC LIGATURE JALLAJALALOUHOU | DISALLOWED | NO

Urdu IDNs

- The following characters are PVALID by IDNA200X and are required for Urdu

Unicode | Character | Description | Current status in IDNA 200X | Recommendation
0610 | | ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM | PVALID | YES
0611 | | ARABIC SIGN ALAYHE ASSALLAM | PVALID | YES
0612 | | ARABIC SIGN RAHMATULLAH ALAYHE | PVALID | YES
0613 | | ARABIC SIGN RADI ALLAHOU ANHU | PVALID | YES
0614 | | ARABIC SIGN TAKHALLUS | PVALID | YES
0621 | ء | ARABIC LETTER HAMZA | PVALID | YES
0622 | آ | ARABIC LETTER ALEF WITH MADDA ABOVE | PVALID | YES
0623 | أ | ARABIC LETTER ALEF WITH HAMZA ABOVE | PVALID | YES
0624 | ؤ | ARABIC LETTER WAW WITH HAMZA ABOVE | PVALID | YES
0625 | إ | ARABIC LETTER ALEF WITH HAMZA BELOW | PVALID | YES (Variant of Base Character)
0626 | ئ | ARABIC LETTER YEH WITH HAMZA ABOVE | PVALID | YES (Variant of Base Character)
0627 | ا | ARABIC LETTER ALEF | PVALID | YES
0628 | ب | ARABIC LETTER BEH | PVALID | YES
0629 | ة | ARABIC LETTER TEH MARBUTA | PVALID | YES (Variant of Base Character)
062A | ت | ARABIC LETTER TEH | PVALID | YES
062B | ث | ARABIC LETTER THEH | PVALID | YES
062C | ج | ARABIC LETTER JEEM | PVALID | YES
062D | ح | ARABIC LETTER HAH | PVALID | YES
062E | خ | ARABIC LETTER KHAH | PVALID | YES
062F | د | ARABIC LETTER DAL | PVALID | YES
0630 | ذ | ARABIC LETTER THAL | PVALID | YES
0631 | ر | ARABIC LETTER REH | PVALID | YES
0632 | ز | ARABIC LETTER ZAIN | PVALID | YES
0633 | س | ARABIC LETTER SEEN | PVALID | YES
0634 | ش | ARABIC LETTER SHEEN | PVALID | YES
0635 | ص | ARABIC LETTER SAD | PVALID | YES
0636 | ض | ARABIC LETTER DAD | PVALID | YES
0637 | ط | ARABIC LETTER TAH | PVALID | YES
0638 | ظ | ARABIC LETTER ZAH | PVALID | YES
0639 | ع | ARABIC LETTER AIN | PVALID | YES
063A | غ | ARABIC LETTER GHAIN | PVALID | YES
0641 | ف | ARABIC LETTER FEH | PVALID | YES
0642 | ق | ARABIC LETTER QAF | PVALID | YES
0643 | ك | ARABIC LETTER KAF | PVALID | YES (Variant of Base Character)
0644 | ل | ARABIC LETTER LAM | PVALID | YES
0645 | م | ARABIC LETTER MEEM | PVALID | YES
0646 | ن | ARABIC LETTER NOON | PVALID | YES
0647 | ه | ARABIC LETTER HEH | PVALID | YES (Variant of Base Character)
0648 | و | ARABIC LETTER WAW | PVALID | YES
0649 | ى | ARABIC LETTER ALEF MAKSURA | PVALID | YES (Variant of Base Character)
064A | ي | ARABIC LETTER YEH | PVALID | YES (Variant of Base Character)
064B | | ARABIC FATHATAN | PVALID | YES
064C | | ARABIC DAMMATAN | PVALID | YES
064D | | ARABIC KASRATAN | PVALID | YES
064E | | ARABIC FATHA | PVALID | YES
064F | | ARABIC DAMMA | PVALID | YES
0650 | | ARABIC KASRA | PVALID | YES
0651 | | ARABIC SHADDA | PVALID | YES
0652 | | ARABIC SUKUN | PVALID | YES
0653 | | ARABIC MADDAH ABOVE | PVALID | YES
0654 | | ARABIC HAMZA ABOVE | PVALID | YES
0655 | | ARABIC HAMZA BELOW | PVALID | YES
0656 | | ARABIC SUBSCRIPT ALEF | PVALID | YES
0658 | | ARABIC MARK NOON GHUNNA | PVALID | YES
0660 | ٠ | ARABIC-INDIC DIGIT ZERO | PVALID | YES (Variant of Base Character)
0661 | ١ | ARABIC-INDIC DIGIT ONE | PVALID | YES (Variant of Base Character)
0662 | ٢ | ARABIC-INDIC DIGIT TWO | PVALID | YES (Variant of Base Character)
0663 | ٣ | ARABIC-INDIC DIGIT THREE | PVALID | YES (Variant of Base Character)
0664 | ٤ | ARABIC-INDIC DIGIT FOUR | PVALID | YES (Variant of Base Character)
0665 | ٥ | ARABIC-INDIC DIGIT FIVE | PVALID | YES (Variant of Base Character)
0666 | ٦ | ARABIC-INDIC DIGIT SIX | PVALID | YES (Variant of Base Character)
0667 | ٧ | ARABIC-INDIC DIGIT SEVEN | PVALID | YES (Variant of Base Character)
0668 | ٨ | ARABIC-INDIC DIGIT EIGHT | PVALID | YES (Variant of Base Character)
0669 | ٩ | ARABIC-INDIC DIGIT NINE | PVALID | YES (Variant of Base Character)
0670 | | ARABIC LETTER SUPERSCRIPT ALEF | PVALID | YES
0679 | ٹ | ARABIC LETTER TTEH | PVALID | YES
067E | پ | ARABIC LETTER PEH | PVALID | YES
0686 | چ | ARABIC LETTER TCHEH | PVALID | YES
0688 | ڈ | ARABIC LETTER DDAL | PVALID | YES
0691 | ڑ | ARABIC LETTER RREH | PVALID | YES
0698 | ژ | ARABIC LETTER JEH | PVALID | YES
06A9 | ک | ARABIC LETTER KEHEH | PVALID | YES
06AF | گ | ARABIC LETTER GAF | PVALID | YES
06BA | ں | ARABIC LETTER NOON GHUNNA | PVALID | YES
06BE | ھ | ARABIC LETTER HEH DOACHASHMEE | PVALID | YES
06C0 | ۀ | ARABIC LETTER HEH WITH YEH ABOVE | PVALID | YES (Variant of Base Character)
06C1 | ہ | ARABIC LETTER HEH GOAL | PVALID | YES
06C2 | ۂ | ARABIC LETTER HEH GOAL WITH HAMZA ABOVE | PVALID | YES
06C3 | ۃ | ARABIC LETTER TEH MARBUTA GOAL | PVALID | YES
06CC | ی | ARABIC LETTER FARSI YEH | PVALID | YES
06D2 | ے | ARABIC LETTER YEH BARREE | PVALID | YES
06D3 | ۓ | ARABIC LETTER YEH BARREE WITH HAMZA ABOVE | PVALID | YES
06F0 | ۰ | EXTENDED ARABIC-INDIC DIGIT ZERO | PVALID | YES
06F1 | ۱ | EXTENDED ARABIC-INDIC DIGIT ONE | PVALID | YES
06F2 | ۲ | EXTENDED ARABIC-INDIC DIGIT TWO | PVALID | YES
06F3 | ۳ | EXTENDED ARABIC-INDIC DIGIT THREE | PVALID | YES
06F4 | ۴ | EXTENDED ARABIC-INDIC DIGIT FOUR | PVALID | YES
06F5 | ۵ | EXTENDED ARABIC-INDIC DIGIT FIVE | PVALID | YES
06F6 | ۶ | EXTENDED ARABIC-INDIC DIGIT SIX | PVALID | YES
06F7 | ۷ | EXTENDED ARABIC-INDIC DIGIT SEVEN | PVALID | YES
06F8 | ۸ | EXTENDED ARABIC-INDIC DIGIT EIGHT | PVALID | YES
06F9 | ۹ | EXTENDED ARABIC-INDIC DIGIT NINE | PVALID | YES
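A language table like the one above can be turned into a simple label check. The sketch below encodes the code points recommended (YES) for Urdu as inclusive ranges and reports any character outside that set. The names `URDU_RECOMMENDED` and `not_recommended` are illustrative, and the check covers only this tentative Arabic-script table: it does not model ASCII letters, hyphens, ZWNJ, or any contextual rules.

```python
# Inclusive ranges of code points marked YES in the tentative Urdu table.
_YES_RANGES = [
    (0x0610, 0x0614), (0x0621, 0x063A), (0x0641, 0x0656), (0x0658, 0x0658),
    (0x0660, 0x0669), (0x0670, 0x0670), (0x0679, 0x0679), (0x067E, 0x067E),
    (0x0686, 0x0686), (0x0688, 0x0688), (0x0691, 0x0691), (0x0698, 0x0698),
    (0x06A9, 0x06A9), (0x06AF, 0x06AF), (0x06BA, 0x06BA), (0x06BE, 0x06BE),
    (0x06C0, 0x06C3), (0x06CC, 0x06CC), (0x06D2, 0x06D3), (0x06F0, 0x06F9),
]
URDU_RECOMMENDED = {cp for lo, hi in _YES_RANGES for cp in range(lo, hi + 1)}


def not_recommended(label: str) -> list:
    """List the code points in a label that the Urdu table does not recommend."""
    return [f"U+{ord(c):04X}" for c in label if ord(c) not in URDU_RECOMMENDED]


print(not_recommended("\u0627\u0631\u062F\u0648"))  # all four letters are in the table
print(not_recommended("\u0633\u0646\u068C"))        # U+068C DAHAL is marked NO
```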

Afternoon Session

- Sample language tables for the following languages
  - Balochi
  - Pashto
  - Punjabi
  - Seraiki
  - Sindhi
  - Urdu
  - Torwali
- Collective issues for multiple languages

Collective Issues

- Separator
- Diacritics
- Honorifics
- Confusable characters
  - Kaf
  - Yay
  - Hay
  - Gol Tay
  - Others?
- Digits
- Space/ZWNJ
- Blocking vs. Bundling
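One way to picture bundling of confusable characters is to fold each variant onto a base character before comparing labels. The sketch below is purely illustrative: the mapping `VARIANT_TO_BASE` is an assumption made for this example (the tables mark some characters as variants but do not fix which base each maps to), and the function names are hypothetical.

```python
# Hypothetical variant -> base mapping for Urdu (an assumption, not a decision).
VARIANT_TO_BASE = {
    0x0643: 0x06A9,  # KAF -> KEHEH
    0x064A: 0x06CC,  # YEH -> FARSI YEH
    0x0649: 0x06CC,  # ALEF MAKSURA -> FARSI YEH
    0x0647: 0x06C1,  # HEH -> HEH GOAL
}
# Arabic-Indic digits -> Extended Arabic-Indic digits.
VARIANT_TO_BASE.update({0x0660 + i: 0x06F0 + i for i in range(10)})


def fold_variants(label: str) -> str:
    """Replace each variant character with its assumed base character."""
    return label.translate(VARIANT_TO_BASE)


def same_bundle(a: str, b: str) -> bool:
    """True if two labels differ only in variant characters."""
    return fold_variants(a) == fold_variants(b)
```

Under bundling, labels in the same bundle would be registered together for one owner; under blocking, registering one would simply make the others unavailable. Either policy could sit on top of a comparison like `same_bundle`.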