Email Addresses and Domain Names Are Non-Latin! Now What?

Jim DeLaHunt / IUC44 / 14 October 2020

Universal Acceptance: +1,000,000,000
Internationalized domain names, email addresses

The next one billion internet users use a wide variety of languages and scripts. Standards allow email addresses, and domain names, in scripts they can easily read. This is an introduction to those standards.

To: مانيش@أشوكا.الهند
To: données@fußballplatz.technology
http://普遍接受-测试。世界

Abstract

Email addresses, and domain names, are no longer limited to ASCII Latin script. They can now be مانيش@أشوكا.الهند or données@fußballplatz.technology or http://普遍接受-测试。世界. Software, frameworks, and workflows will need to change to accommodate. What are Internationalized Domain Names (IDN) and Email Address Internationalization (EAI)? What do you need to know? What do you do next?

This tutorial brings you up to speed. It explains IDN and EAI. It shows you the implications. It connects you to sources of information. It helps you understand what this will mean for you. Suitable for software developers, QA, marketers, system administrators, and management.

Agenda

Slides: http://go.jdlh.com/iuc44t5t5 (links in slides)

* Who we are: UASG, Jim DeLaHunt
* Context: the next one billion, and previous domain names, email
* What's new: so many top-level domain names, Internationalized Domain Names (IDNs), Email Address Internationalization (EAI)
* Benefits of, issues with IDNs, EAI
* More resources
* Next steps
* Q&A

Who we are

Universal Acceptance Steering Group (UASG)
* http://www.uasg.tech
* Community-led initiative, world-wide
* Raise awareness, identify problems, solve them
* Project of ICANN, the domain name system organisation

Jim DeLaHunt
* http://jdlh.com, ☎ +1-604-376-8953
* Vancouver, Canada
* Consultant in multilingual websites; software engineer
* UASG volunteer participant

UASG materials available

UASG operates primarily by public education. Participants write outreach materials and technical notes. They give presentations to industry meetings. They evaluate, report, and follow up on UA issue reports.

Technical Notes (selection)
* UASG004 Use Cases for UA Readiness Evaluation
* UASG010 Quick Guide to Linkification
* UASG018 Programming Languages Evaluation Criteria

Plus: C-level outreach papers, magazine articles, presentations, …

Who you are

This talk is a tutorial for those who know email addresses and Internet domain names primarily as ASCII-only. We introduce internationalised domain names (IDNs) and email addresses (EAI). Software development skills are helpful for some advanced material, to which we link.

Primary audience
* Users of domain names and email addresses, technically inquisitive
* Application developers handling domain names and email addresses
* Dev, QA, marketers, system administrators, and management

Context

The next 1,000,000,000 Internet users

Next billion:
* China, India, Third World.
* Large share use non-Latin script.
* Little marginal North American, European increase.
* Mostly mobile and small-screen, lower share on desktop, laptop.
* Extending to mid-, lower-educated, less comfortable with Latin script.

First billions:
* First world, N. America, Europe.
* Large share use Latin script.
* Includes large share of North American, European potential.
* Mostly desktop & laptop computers, mobile only later.
* Cream of highly-educated in each market, the best at Latin script.

Domain names

Domain names are the primary way to locate things on the internet. Original standards limited domain names to an ASCII subset, and thus to Latin script. This obstructs users of non-Latin languages. Names aren't just on-line (see: ads), or written (see: saying a domain name).

Domain name standards
* ASCII Letters, Digits, and Hyphen, max 63 (RFC1035)
* Well known Top-Level Domains: .com, .org, .net, .jp, .ru, .cn, .in, …
* e.g. Amazon.com, XgenPlus.com
* Appear in many areas, e.g. email addresses, URLs, billboards, speech

Domain names, extended

Recent changes permit Internationalized Domain Names for Apps (IDNA). This allows new non-Latin TLDs, and non-Latin characters in the rest of the name. Parallel changes permit Latin TLDs with more than three characters. Thousands have been registered.

Domain name extensions
* Internationalized Domain Names for Apps “IDNA2008” (RFC5890)
* Replaces earlier IDNA2003
* e.g. http://普遍接受-测试。世界
* .भारत (“bharat”, India), .中国 (China), 「。」 as well as ‘.’
* .tech, .museum, and hundreds more

Email addresses

Still a mainstay of Internet communication. Actually a stack of related specifications, including SMTP, POP3, IMAP, etc. Original standards limited email addresses to an ASCII subset, and thus to Latin script. This obstructs users with names from non-Latin-script languages.

Email standards
* Subset of ASCII, typically letters, digits, punctuation (RFC2822)
* mailbox @ domain.name, e.g. info@unicode.org
* mailbox preferably similar to user's own name in own script
* Many implementations, some deviating from standards

Email addresses, extended

Domain name extensions bring change to the domain.name part of email addresses. Extensions to email address syntax permit almost any Unicode character in mailbox. Consequences ripple through SMTP, MIME, IMAP, POP3, and more.

Email Address Internationalization (EAI) standards
* EAI Overview and Framework (RFC6530) + 6 more RFCs
* EAI requires changes to several protocols and components
* Change takes time, so EAI must interoperate with legacy email

What's new: So many top-level domain names!

The older, simpler top-level domain names

The top-level domain name is the part after the final ‘.’. Until 2001, there used to be a small set of 3-letter generic top-level domains, plus 2-letter country code top-level domains. They all consisted of Latin letters.
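
Software written with the older pattern in mind may need to change, as the abstract notes. The following is a minimal Python sketch of what such a legacy assumption looks like. The function names `tld_of` and `fits_pre_2001_pattern` are hypothetical, chosen for illustration, and the check models only the description above (two or three ASCII Latin letters). It is not offered as a validation rule.

```python
def tld_of(domain: str) -> str:
    """Return the part after the final '.' of a domain name.

    Assumption: only the ASCII full stop is treated as a separator here;
    other separators such as the ideographic full stop are not handled.
    """
    name = domain.rstrip(".")          # ignore any trailing root dot
    return name.rsplit(".", 1)[-1]


def fits_pre_2001_pattern(tld: str) -> bool:
    """True if the TLD is 2 or 3 ASCII Latin letters, like 'ca' or 'com'."""
    return len(tld) in (2, 3) and tld.isascii() and tld.isalpha()


if __name__ == "__main__":
    samples = [
        "unicode.org",
        "example.ca",
        "fußballplatz.technology",
        "普遍接受-测试.世界",
    ]
    for name in samples:
        tld = tld_of(name)
        print(f"{name}: TLD={tld!r}, fits old pattern: {fits_pre_2001_pattern(tld)}")
```

Code that relies on a check like `fits_pre_2001_pattern` would turn away the last two sample names, which is the kind of behaviour worth looking for when reviewing existing validation logic.
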

Top-level domains, up to 2001
* generic: com, edu, gov, mil, org
* country-code, 2-letter: e.g. .ca, .uk, .eu
* Based on ISO 3166-1 standard, with supplements
* Latin script, letters only

Top-level domains today

Resource
* http://data.iana.org/TLD/tlds-alpha-by-domain.txt (excerpt below)
* Consider analysing with spreadsheet or Perl/Python code.

# Version 2020101200…
AAA, AARP, …, BOF, …, CA, CAB, …, COM, …, NORTHWESTERNMUTUAL, …, XN--CLCHC0EA0B2G2A9GCD, …, XN--VERMGENSBERATER-CTB, XN--VERMGENSBERATUNG-PWB, …, ZUERICH, …, ZW

Questions:
* How many top-level domain names (TLDs) now?
* How many begin with “XN--” prefix? How many don't?
* What is the longest TLD not having “XN--” prefix?
* How many 3-character TLDs are there now?
* How many TLDs not having “XN--” prefix include digits or ‘-’?

Answers:
* How many top-level domain names (TLDs) now? A: 1507
* How many begin with “XN--” prefix? How many don't? A: 153 (11%), 1354
* Longest TLD not having “XN--” prefix? A: NORTHWESTERNMUTUAL
* How many 3-character TLDs are there now? A: 223
* How many TLDs not having “XN--” prefix include digits or ‘-’? A: 0

Internationalized Domain Names (IDNs)

IDN: Unicode names, LDH infrastructure

The Domain Name System was designed to permit only Letters, Digits, and Hyphens (LDH). It was reliable, but so important that change was cautious. When internationalising, rather than add more characters to the DNS, they mapped Unicode characters to LDH.

http://普遍接受测试。世界

* Prefix ‘xn--’
* Internationalized Domain Name for Applications (IDNA)
* Nameprep to normalise
* Punycode to separate non-ASCII and map to LDH
* e.g.
  * 普遍接受, U-Label
  * xn--uorr18ad6bbt1e, A-Label
  * http://普遍接受测试。世界
  * http://xn--uorr18ad6bbt1e.xn--rhqv96g

IDNA U-Labels, A-Labels, NR-LDH labels

Domain names are separated by period ‘.’ into labels. A label using anything outside Letters, Digits, and Hyphen (LDH) is a U-Label. The IDNA algorithm converts it to a corresponding A-Label made of LDH. The familiar LDH labels are “NR-LDH”.

DNS and IDNA “labels”
* e.g. www.uasg.tech has three labels: “www”, “uasg”, and “tech”
* NR-LDH labels: must not start or end with “-”, LDH only, max length 63
* A-Labels: LDH labels, start with “xn--”, valid Punycode output
* U-Labels: Unicode string from reversing Punycode on A-Label
* A-Label ← IDNA (Nameprep, Punycode) algorithm → U-Label

Example U-Labels, A-Labels, NR-LDH labels

U-Label, A-Label pairs
* 中国 ⇔ xn--fiqs8s
* भारत ⇔ xn--h2brj9c
* résumé ⇔ xn--rsum-bpad
* après-ski ⇔ xn--aprs-ski-30a

NR-LDH labels
* com, gov, ca
* unicodeconference, iuc44
* apres-ski

What are these?
* munchen, münchen
* museum
* xn-trik-bpad, xn--trik-bpad

Try it!
* https://eai.xgenplus.com/Multilanguage-To-Punycode-Convertor.jsp

IDN uptake

* 8.3m IDNs registered (January 2020)
  * 2.3% of approx 366m total domain names registered
  * 60% IDNs under Latin-script TLDs e.g. .com, .xyz
  * 40% under IDN top-level domain names
* Examples
  * .рус, .рф (Russian)
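
The U-Label and A-Label pairs listed in the examples above can be reproduced with Python's standard `punycode` codec. The sketch below is a simplified illustration, not a full IDNA implementation: the helper names (`to_a_label`, `to_u_label`, `is_ldh`, `classify_label`) are hypothetical, the normalisation step is reduced to NFC plus lowercasing, and none of the IDNA2008 validity rules are checked. Python's built-in `idna` codec follows the older IDNA2003 rules, and the third-party `idna` package is one option for IDNA2008 processing in real code.

```python
import string
import unicodedata

ACE_PREFIX = "xn--"    # prefix that marks an A-Label
MAX_LABEL_LEN = 63     # length limit for an LDH label
LDH_CHARS = set(string.ascii_letters + string.digits + "-")


def to_a_label(label: str) -> str:
    """Convert one label to LDH form; ASCII labels pass through lowercased."""
    # Simplified stand-in for the IDNA mapping step: NFC and lowercase only.
    label = unicodedata.normalize("NFC", label).lower()
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_u_label(label: str) -> str:
    """Reverse Punycode on an A-Label; other labels pass through unchanged."""
    if label.lower().startswith(ACE_PREFIX):
        encoded = label[len(ACE_PREFIX):].lower()
        return encoded.encode("ascii").decode("punycode")
    return label


def is_ldh(label: str) -> bool:
    """Letters, digits, hyphen only; no leading or trailing hyphen; 1 to 63 chars."""
    return (
        0 < len(label) <= MAX_LABEL_LEN
        and set(label) <= LDH_CHARS
        and not label.startswith("-")
        and not label.endswith("-")
    )


def classify_label(label: str) -> str:
    """Return 'U-Label', 'A-Label', 'NR-LDH', or 'invalid' for one label."""
    if not label.isascii():
        return "U-Label"
    if not is_ldh(label):
        return "invalid"
    if label.lower().startswith(ACE_PREFIX):
        try:
            round_trip = to_a_label(to_u_label(label))
        except ValueError:  # UnicodeError is a subclass of ValueError
            return "invalid"
        return "A-Label" if round_trip == label.lower() else "invalid"
    return "NR-LDH"


if __name__ == "__main__":
    for sample in ["中国", "résumé", "après-ski", "apres-ski", "xn--h2brj9c"]:
        print(sample, classify_label(sample), to_a_label(sample), to_u_label(sample))
```

The round-trip test in `classify_label` reflects the description of an A-Label as valid Punycode output: a label that starts with “xn--” but does not decode and re-encode to itself is reported as invalid rather than accepted.
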
