BS ISO/IEC 10646:2017
BSI Standards Publication
Information technology — Universal Coded Character Set (UCS)
National foreword
This British Standard is the UK implementation of ISO/IEC 10646:2017.

The UK participation in its preparation was entrusted to Technical Committee IST/5, Programming languages, their environments and system software interfaces.

A list of organizations represented on this committee can be obtained on request to its secretary.

This publication does not purport to include all the necessary provisions of a contract. Users are responsible for its correct application.

© The British Standards Institution 2018
Published by BSI Standards Limited 2018

ISBN 978 0 580 90707 4

ICS 35.040.10

Compliance with a British Standard cannot confer immunity from legal obligations.

This British Standard was published under the authority of the Standards Policy and Strategy Committee on 31 August 2018.

Amendments/corrigenda issued since publication

Date    Text affected

INTERNATIONAL STANDARD ISO/IEC 10646

Fifth edition
2017-12
Information technology — Universal Coded Character Set (UCS)
Technologies de l'information — Jeu universel de caractères codés (JUC)
Reference number ISO/IEC 10646:2017(E)
© ISO/IEC 2017
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2017, Published in Switzerland

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.

ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
[email protected]
www.iso.org
CONTENTS

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Conformance
  4.1 General
  4.2 Conformance of information interchange
  4.3 Conformance of devices
5 General structure of the UCS
6 Basic structure and nomenclature
  6.1 Structure
  6.2 Coding of characters
  6.3 Types of code points
  6.4 Naming of characters
  6.5 Short identifiers for code points (UIDs)
  6.6 UCS Sequence Identifiers
  6.7 Octet sequence identifiers
7 Revision and updating of the UCS
8 Subsets
  8.1 General
  8.2 Limited subset
  8.3 Selected subset
9 UCS encoding forms
  9.1 General
  9.2 UTF-8
  9.3 UTF-16
  9.4 UTF-32 (UCS-4)
10 UCS Encoding schemes
  10.1 General
  10.2 UTF-8
  10.3 UTF-16BE
  10.4 UTF-16LE
  10.5 UTF-16
  10.6 UTF-32BE
  10.7 UTF-32LE
  10.8 UTF-32
11 Use of control functions with the UCS
12 Declaration of identification of features
  12.1 Purpose and context of identification
  12.2 Identification of a UCS encoding scheme
  12.3 Identification of subsets of graphic characters
  12.4 Identification of control function set
  12.5 Identification of the coding system of ISO/IEC 2022
13 Structure of the code charts and lists
14 Block and collection names
  14.1 Block names
  14.2 Collection names
15 Mirrored characters in bidirectional context
  15.1 Mirrored characters
  15.2 Directionality of bidirectional text
16 Special characters
  16.1 General
  16.2 Space characters
  16.3 Currency symbols
  16.4 Format characters
  16.5 Ideographic description characters
  16.6 Variation selectors and variation sequences
17 Presentation forms of characters
18 Compatibility characters
19 Order of characters
20 Combining characters
  20.1 Order of combining characters
  20.2 Combining class and canonical ordering
  20.3 Appearance in code charts
  20.4 Alternate coded representations
  20.5 Multiple combining characters
  20.6 Collections containing combining characters
  20.7 Combining Grapheme Joiner
21 Normalization forms
22 Special features of individual scripts and symbol repertoires
  22.1 Hangul syllable composition method
  22.2 Features of scripts used in India and some other South Asian countries
  22.3 Byzantine musical symbols
  22.4 Source references for pictographic symbols
23 Source references for CJK ideographs
  23.1 List of source references
  23.2 Source references file for CJK ideographs
  23.3 Source reference presentation for CJK Unified ideographs
  23.4 Source references presentation for CJK Compatibility ideographs
24 Source references for Tangut ideographs
  24.1 List of source references
  24.2 Source reference file for Tangut ideographs
  24.3 Source reference presentation for Tangut ideographs
25 Source references for Nüshu characters
  25.1 List of source references
  25.2 Source reference file for Nüshu characters
26 Character names and annotations
  26.1 Entity names
  26.2 Name formation
  26.3 Single name
  26.4 Name immutability
  26.5 Name uniqueness
  26.6 Character names for CJK ideographs
  26.7 Character names for Tangut ideographs
  26.8 Character names for Nüshu characters
  26.9 Character names for Hangul syllables
27 Named UCS Sequence Identifiers
28 Structure of the Basic Multilingual Plane
29 Structure of the Supplementary Multilingual Plane for scripts and symbols (SMP)
30 Structure of the Supplementary Ideographic Plane (SIP)
31 Structure of the Tertiary Ideographic Plane (TIP)
32 Structure of the Supplementary Special-purpose Plane (SSP)
33 Code charts and lists of character names
  33.1 General
  33.2 Code chart
  33.3 Character names list
  33.4 Summary of standardized variation sequences
  33.5 Code charts and lists of character names
Annex A (normative) Collections of graphic characters for subsets
  A.1 Collections of coded graphic characters
  A.2 Blocks lists
  A.3 Fixed collections of the whole UCS (except Unicode collections)
  A.4 CJK collections
  A.5 Other collections
  A.6 Unicode collections
Annex B (normative) List of combining characters
Annex C (normative) Transformation format for planes 01 to 10 of the UCS (UTF-16)
Annex D (normative) UCS Transformation Format 8 (UTF-8)
Annex E (normative) Mirrored characters in bidirectional context
Annex F (informative) Format characters
  F.1 General format characters
  F.2 Script-specific format characters
  F.3 Interlinear annotation characters
  F.4 Subtending format characters
  F.5 Shorthand format characters
  F.6 Invisible mathematical operators
  F.7 Western musical symbols
  F.8 Language tagging using Tag characters
Annex G (informative) Alphabetically sorted list of character names
Annex H (informative) The use of “signatures” to identify UCS
Annex I (informative) Ideographic description characters
  I.1 General
  I.2 Syntax of an ideographic description sequence
  I.3 Individual definitions of the ideographic description characters
Annex J (informative) Recommendation for combined receiving/originating devices with internal storage
Annex K (informative) Notations of octet value representations
Annex L (informative) Character naming guidelines
Annex M (informative) Sources of characters
Annex N (informative) External references to character repertoires
  N.1 Methods of reference to character repertoires and their coding
  N.2 Identification of ASN.1 character abstract syntaxes
  N.3 Identification of ASN.1 character transfer syntaxes
Annex P (informative) Additional information on CJK Unified ideographs
Annex Q (informative) Code mapping table for Hangul syllables
Annex R (informative) Names of Hangul syllables
Annex S (informative) Procedure for the unification and arrangement of CJK ideographs
  S.1 Unification procedure
  S.2 Arrangement procedure
  S.3 Source separation examples
  S.4 Non-unification examples
Annex T (informative) Language tagging using Tag Characters
Annex U (informative) Characters in identifiers
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the different types of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation on the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html

The committee responsible for this document is ISO/IEC JTC 1, Information technology, SC 2, Coded character sets.

This fifth edition of ISO/IEC 10646 cancels and replaces the fourth edition (ISO/IEC 10646:2014), which has been technically revised. It also incorporates ISO/IEC 10646:2014/Amd 1:2015 and ISO/IEC 10646:2014/Amd 2:2016.

This edition includes the following significant changes with respect to the previous edition:

- New scripts covered: Adlam, Bhaiksuki, Marchen, Masaram Gondi, Newa, Nüshu, Osage, Soyombo, Tangut, and Zanabazar Square,
- Existing scripts significantly extended: Cherokee, CJK Unified Ideographs (Extension F), New Emoji symbols.
Introduction

This International Standard specifies the Universal Coded Character Set (UCS). It is applicable to the representation, transmission, interchange, processing, storage, input and presentation of the written form of the languages of the world as well as of additional symbols.

By defining a consistent way of encoding multilingual text it enables the exchange of data internationally. The information technology industry gains data stability, greater global interoperability and data interchange. This International Standard has been widely adopted in new Internet protocols and implemented in modern operating systems and computer languages. This edition covers over 130 000 characters from the world’s scripts.
Information technology — Universal Coded Character Set (UCS)
1 Scope
This International Standard specifies the Universal Coded Character Set (UCS). It is applicable to the representation, transmission, interchange, processing, storage, input, and presentation of the written form of the languages of the world as well as of additional symbols.

This International Standard:

- specifies the architecture of this International Standard,
- defines terms used in this International Standard,
- describes the general structure of the UCS codespace,
- specifies the Basic Multilingual Plane (BMP) of the UCS,
- specifies supplementary planes of the UCS: the Supplementary Multilingual Plane (SMP), the Supplementary Ideographic Plane (SIP), the Tertiary Ideographic Plane (TIP), and the Supplementary Special-purpose Plane (SSP),
- defines a set of graphic characters used in scripts and the written form of languages on a world-wide scale,
- specifies the names for the graphic characters and format characters of the BMP, SMP, SIP, TIP, SSP and their coded representations within the UCS codespace,
- specifies the coded representations for control characters and private use characters,
- specifies three encoding forms of the UCS: UTF-8, UTF-16, and UTF-32,
- specifies seven encoding schemes of the UCS: UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE,
- specifies the management of future additions to this coded character set.

The UCS is an encoding system different from that specified in ISO/IEC 2022. The method to designate UCS from ISO/IEC 2022 is specified in 12.2.

A graphic character will be assigned only one code point in the standard, located either in the BMP or in one of the supplementary planes.

2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO/IEC 2022:1994, Information technology — Character code structure and extension techniques.

ISO/IEC 6429:1992, Information technology — Control functions for coded character sets.

Unicode Standard Annex, UAX #9, The Unicode Bidirectional Algorithm:
http://www.unicode.org/reports/tr9/tr9-35.html

Unicode Standard Annex, UAX #15, Unicode Normalization Forms:
http://www.unicode.org/reports/tr15/tr15-44.html

Unicode Technical Standard, UTS #37, Ideographic Variation Database:
http://www.unicode.org/reports/tr37/tr37-8.html
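
The planes, the three encoding forms and the seven encoding schemes enumerated in the Scope can be illustrated informally as follows. This is a minimal sketch, assuming Python's built-in codecs as stand-ins for the encoding schemes. The names SCHEME_TO_CODEC, FORM_TO_CODEC, plane_of, code_units and encode_all are hypothetical helpers introduced here for illustration and do not appear in this International Standard.

```python
# Informal sketch: Python's built-in codecs stand in for the UCS encoding
# schemes. The codec names are Python's own, not identifiers from the standard.
SCHEME_TO_CODEC = {
    "UTF-8": "utf-8",
    "UTF-16": "utf-16",        # Python's encoder writes a byte order mark first
    "UTF-16BE": "utf-16-be",
    "UTF-16LE": "utf-16-le",
    "UTF-32": "utf-32",        # Python's encoder writes a byte order mark first
    "UTF-32BE": "utf-32-be",
    "UTF-32LE": "utf-32-le",
}

# Encoding form -> (codec used to obtain the code units, code unit width in octets).
# Big-endian codecs are used only so the units can be read back unambiguously.
FORM_TO_CODEC = {
    "UTF-8": ("utf-8", 1),
    "UTF-16": ("utf-16-be", 2),
    "UTF-32": ("utf-32-be", 4),
}


def plane_of(code_point):
    """Return the plane number of a UCS code point (0 is the BMP)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the UCS codespace")
    return code_point >> 16


def code_units(text, form):
    """Return the list of code units of `text` in the given encoding form."""
    codec, width = FORM_TO_CODEC[form]
    data = text.encode(codec)
    return [int.from_bytes(data[i:i + width], "big")
            for i in range(0, len(data), width)]


def encode_all(text):
    """Serialize `text` to octets under each of the seven encoding schemes."""
    return {scheme: text.encode(codec)
            for scheme, codec in SCHEME_TO_CODEC.items()}


if __name__ == "__main__":
    # One BMP character (U+20AC) and one SMP character (U+10400).
    for ch in ("\u20ac", "\U00010400"):
        cp = ord(ch)
        print(f"U+{cp:04X} is in plane {plane_of(cp)}")
        for form in FORM_TO_CODEC:
            units = " ".join(f"{u:X}" for u in code_units(ch, form))
            print(f"  {form} code units: {units}")
        for scheme, octets in encode_all(ch).items():
            print(f"  {scheme} octets: {octets.hex(' ').upper()}")
```

In this sketch a character outside the BMP, such as U+10400, occupies four code units in UTF-8, two in UTF-16 and one in UTF-32. The byte order mark emitted for the unmarked UTF-16 and UTF-32 schemes is the behaviour of the Python codecs used here, and the octet order that follows it depends on the platform.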