
Email addresses and domain names are non-Latin! Now what?
Jim DeLaHunt / IUC44 / 14 October 2020

Universal Acceptance +1,000,000,000
Internationalized domain names, email addresses

The next one billion internet users use a wide variety of languages and scripts. Standards allow email addresses, and domain names, in scripts they can easily read. This is an introduction to those standards.

To: مانيش@أشوكا.الهند
To: données@fußballplatz.technology
http://普遍接受-测试。世界

Abstract

Email addresses, and domain names, are no longer limited to ASCII Latin script. They can now be مانيش@أشوكا.الهند or données@fußballplatz.technology or http://普遍接受-测试。世界. Software, frameworks, and workflows will need to change to accommodate them. What are Internationalized Domain Names (IDN) and Email Address Internationalization (EAI)? What do you need to know? What do you do next?

This tutorial brings you up to speed. It explains IDN and EAI. It shows you the implications. It connects you to sources of information. It helps you understand what this will mean for you. Suitable for software developers, QA, marketers, system administrators, and management.

Agenda

Slides: http://go.jdlh.com/iuc44t5t5 (links in slides)
* Who we are: UASG, Jim DeLaHunt
* Context: the next one billion, and previous domain names, email
* What's new: so many top-level domain names, Internationalized Domain Names (IDNs), Email Address Internationalization (EAI)
* Benefits of, issues with IDNs, EAI
* More resources
* Next steps
* Q&A

Who we are

Universal Acceptance Steering Group (UASG)
* http://www.uasg.tech
* Community-led initiative, world-wide
* Raise awareness, identify problems, solve them
* Project of ICANN, the domain name system organisation

Jim DeLaHunt
* http://jdlh.com, ☎ +1-604-376-8953
* Vancouver, Canada
* Consultant in multilingual websites; software engineer
* UASG volunteer participant

UASG materials available

UASG operates primarily by public education. Participants write outreach materials and technical notes. They give presentations to industry meetings. They evaluate, report, and follow up on UA issue reports.

Technical Notes (selection)
* UASG-004 Use Cases for UA Readiness Evaluation
* UASG-010 Quick Guide to Linkification
* UASG-018 Programming Languages Evaluation Criteria
Plus C-level outreach papers, magazine articles, presentations, ….

Who you are

This talk is a tutorial for those who know email addresses and Internet domain names primarily as ASCII-only. We introduce internationalised domain names (IDNs) and email addresses (EAI). Software development skills are helpful for some advanced material, to which we link.

Primary audience
* Users of domain names and email addresses, technically inquisitive
* Application developers handling domain names and email addresses
* Dev, QA, marketers, system administrators, and management

Context

The next 1,000,000,000 Internet users

(Next) billion vs. first billion

Next billion
* China, India, Third World.
* Large share use non-Latin script.
* Little marginal North American, European increase.
* Mostly mobile and small-screen, lower share on desktop, laptop.
* Extending to mid-, lower-educated, less comfortable with Latin script.

First billion
* First world, N. America, Europe.
* Large share use Latin script.
* Includes large share of North American, European potential.
* Mostly desktop & laptop computers, mobile only later.
* Cream of highly-educated in each market, the best at Latin script.

Domain names

Domain names are the primary way to locate things on the internet. Original standards limited domain names to an ASCII subset, and thus to Latin script. This obstructs users of non-Latin languages. Names aren't just on-line (see: ads), or written (see: saying a domain name).

Domain name standards
* ASCII Letters, Digits, and Hyphen, max 63 (RFC 1035)
* Well known Top-Level Domains: .com, .org, .net, .jp, .ru, .cn, .in, …
* e.g. Amazon.com, XgenPlus.com
* Appear in many areas, e.g. email addresses, URLs, billboards, speech

Domain names, extended

Recent changes permit Internationalized Domain Names for Apps (IDNA). This allows new non-Latin TLDs, and non-Latin characters in the rest of the name. Parallel changes permit Latin TLDs with more than three characters. Thousands have been registered.

Domain name extensions
* Internationalized Domain Names for Apps "IDNA2008" (RFC 5890)
* Replaces earlier IDNA2003
* e.g. http://普遍接受-测试。世界
* .भारत ("bharat", India), .中国 (China), 「。」 as well as "."
* .tech, .museum, and hundreds more

Email addresses

Still a mainstay of Internet communication. Actually a stack of related specifications, including SMTP, POP3, IMAP, etc. Original standards limited email addresses to an ASCII subset, and thus to Latin script. This obstructs users with names from non-Latin-script languages.

Email standards
* Subset of ASCII, typically letters, digits, punctuation (RFC 2822)
* mailbox @ domain.name, e.g. info@unicode.org
* mailbox preferably similar to user's own name in own script
* Many implementations, some deviating from standards

Email addresses, extended

Domain name extensions bring change to the domain.name part of email addresses. Extensions to email address syntax permit almost any Unicode character in mailbox. Consequences ripple through SMTP, MIME, IMAP, POP3, and more.

Email Address Internationalization (EAI) standards
* EAI Overview and Framework (RFC 6530) + 6 more RFCs
* EAI requires changes to several protocols and components
* Change takes time, so EAI must interoperate with legacy email

What's new: So many top-level domain names!

The older, simpler top-level domain names

The top-level domain name is the part after the final ".". Until 2001, there was a small set of 3-letter generic top-level domains, plus 2-letter country code top-level domains. They all consisted of Latin letters.
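
One way to see how far the current list has moved from that small Latin-only set is to analyse the list of top-level domains that IANA publishes at http://data.iana.org/TLD/tlds-alpha-by-domain.txt. The following is a minimal sketch, not a definitive tool. The function names (`parse_tlds`, `summarise_tlds`, `fetch_tld_text`) and the `SAMPLE` data are made up for illustration, and the code assumes the file keeps its format of one TLD per line with `#` comment lines.

```python
from urllib.request import urlopen

# URL quoted in the slides; fetching it needs network access.
TLD_LIST_URL = "http://data.iana.org/TLD/tlds-alpha-by-domain.txt"


def parse_tlds(text):
    """Return the TLDs in the list, skipping blank lines and '#' comments."""
    tlds = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            tlds.append(stripped)
    return tlds


def summarise_tlds(tlds):
    """Count TLDs with and without the "XN--" prefix, plus a few extremes."""
    idn = [t for t in tlds if t.upper().startswith("XN--")]
    plain = [t for t in tlds if not t.upper().startswith("XN--")]
    return {
        "total": len(tlds),
        "xn_prefixed": len(idn),
        "not_xn_prefixed": len(plain),
        "longest_plain": max(plain, key=len, default=""),
        "three_character": sum(1 for t in tlds if len(t) == 3),
        "plain_with_digit_or_hyphen": sum(
            1 for t in plain if any(c.isdigit() or c == "-" for c in t)
        ),
    }


def fetch_tld_text(url=TLD_LIST_URL):
    """Download the current list; its numbers will differ from October 2020."""
    with urlopen(url) as response:
        return response.read().decode("ascii")


# Hypothetical hand-made sample in the same format, so the sketch runs offline.
SAMPLE = """# hypothetical sample, not the real file
AAA
CA
COM
NORTHWESTERNMUTUAL
XN--VERMGENSBERATER-CTB
ZW
"""

summary = summarise_tlds(parse_tlds(SAMPLE))
for name, value in summary.items():
    print(f"{name}: {value}")
```

To run it against the live list, replace `SAMPLE` with `fetch_tld_text()`. The counts change as TLDs are added and removed, so they need not match any figures quoted for a particular date.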

Top-level domains, up to 2001
* generic: com, edu, gov, mil, org
* country-code, 2-letter: e.g. .ca, .uk, .eu
* Based on ISO 3166-1 standard, with supplements
* Latin script, letters only

Top-level domains today

Resource
* http://data.iana.org/TLD/tlds-alpha-by-domain.txt (excerpt below)
* Consider analysing with spreadsheet or Perl/Python code.

# Version 2020101200…
AAA
AARP
…
BZH
CA
CAB
…
COM
…
NORTHWESTERNMUTUAL
…
XN--CLCHC0EA0B2G2A9GCD
…
XN--VERMGENSBERATER-CTB
XN--VERMGENSBERATUNG-PWB
…
ZUERICH
ZW

Questions:
* How many top-level domain names (TLDs) now?
* How many begin with "XN--" prefix? How many don't?
* What is the longest TLD not having "XN--" prefix?
* How many 3-character TLDs are there now?
* How many TLDs not having "XN--" prefix include digits or "-"?

Answers:
* How many top-level domain names (TLDs) now? A: 1507
* How many begin with "XN--" prefix? How many don't? A: 153 (11%), 1354
* Longest TLD not having "XN--" prefix? A: NORTHWESTERNMUTUAL
* How many 3-character TLDs are there now? A: 223
* How many TLDs not having "XN--" prefix include digits or "-"? A: 0

Internationalized Domain Names (IDNs)

IDN: Unicode names, LDH infrastructure

The Domain Name System was designed to permit only Letters, Digits, and Hyphens (LDH). It was reliable, but so important that change was cautious. When internationalising, rather than add more characters to the DNS, they mapped Unicode characters to LDH.

http://普遍接受测试。世界

● Prefix "xn--"
● Internationalized Domain Name for Applications (IDNA)
● Nameprep to normalise
● Punycode to separate non-ASCII and map to LDH
● xn--uorr18ad6bbt1e, A-Label
● 普遍接受测试, U-Label
● e.g.
  http://普遍接受测试。世界
  http://xn--uorr18ad6bbt1e.xn--rhqv96g

IDNA U-Labels, A-Labels, NR-LDH labels

Domain names are separated by period "." into labels. A label using anything outside Letters, Digits, and Hyphen (LDH) is a U-Label. The IDNA algorithm converts it to a corresponding A-Label made of LDH. The familiar LDH labels are "NR-LDH".

DNS and IDNA "labels"
* e.g. www.uasg.tech has three labels: "www", "uasg", and "tech"
* NR-LDH labels: must not start or end with "-", LDH only, max length 63
* A-Labels: LDH labels, start with "xn--", valid Punycode output
* U-Labels: Unicode string from reversing Punycode on A-Label
* A-Label ← IDNA (Nameprep, Punycode) algorithm → U-Label

Example U-Labels, A-Labels, NR-LDH labels

U-Label, A-Label pairs
* 中国 ⇔ xn--fiqs8s
* भारत ⇔ xn--h2brj9c
* résumé ⇔ xn--rsum-bpad
* après-ski ⇔ xn--aprs-ski-30a

NR-LDH labels
* com, gov, ca
* unicodeconference, iuc44
* apres-ski

What are these?
* munchen, münchen
* museum
* xn-trik-bpad, xn--trik-bpad

Try it!
* https://eai.xgenplus.com/Multilanguage-To-Punycode-Convertor.jsp

IDN uptake
* 8.3m IDNs registered (January 2020)
* 2.3% of approx. 366m total domain names registered
* 60% IDNs under Latin-script TLDs, e.g. .com, .xyz
* 40% under IDN top-level domain names

Examples
* .рус, .рф (Russian), …
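
The U-Label and A-Label pairs in the examples slide can be reproduced with the `punycode` codec in Python's standard library. The sketch below is a rough illustration under stated assumptions, not a full IDNA implementation. The helper names (`to_a_label`, `to_u_label`, `classify`) are hypothetical, NFC normalisation plus lowercasing stands in for the real Nameprep/IDNA mapping rules, and `classify` follows the simplified label definitions from the slides rather than the complete rules in the RFCs.

```python
import re
import unicodedata

ACE_PREFIX = "xn--"
LDH = re.compile(r"[A-Za-z0-9-]{1,63}")


def to_a_label(u_label):
    """Encode one label with Punycode and add the "xn--" prefix."""
    # Assumption: NFC + lowercase is only a stand-in for Nameprep.
    label = unicodedata.normalize("NFC", u_label).lower()
    if label.isascii():
        return label  # already LDH-style ASCII; nothing to encode
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_u_label(a_label):
    """Reverse Punycode on a label that carries the "xn--" prefix."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label
    encoded = a_label[len(ACE_PREFIX):].lower()
    return encoded.encode("ascii").decode("punycode")


def classify(label):
    """Sort a label into U-Label, A-Label, NR-LDH, or invalid (simplified)."""
    if not label.isascii():
        return "U-Label"
    if not LDH.fullmatch(label) or label.startswith("-") or label.endswith("-"):
        return "invalid"
    if label.lower().startswith(ACE_PREFIX):
        try:
            round_trip = to_a_label(to_u_label(label))
        except UnicodeError:
            return "invalid"
        # An A-Label must be what Punycode would produce for its U-Label.
        return "A-Label" if round_trip == label.lower() else "invalid"
    return "NR-LDH"


for u_label in ["中国", "भारत", "résumé", "après-ski"]:
    print(u_label, "->", to_a_label(u_label))

for label in ["munchen", "münchen", "museum", "xn-trik-bpad", "xn--trik-bpad"]:
    print(label, "is", classify(label))
```

Production code would normally rely on a maintained IDNA library instead of hand-rolled helpers, because the full standards add mapping, validity, and bidirectional-text rules that this sketch skips.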