BS ISO/IEC 10646:2017

BSI Standards Publication

Information technology — Universal Coded Character Set (UCS)

National foreword

This British Standard is the UK implementation of ISO/IEC 10646:2017.

The UK participation in its preparation was entrusted to Technical Committee IST/5, Programming languages, their environments and system software interfaces.

A list of organizations represented on this committee can be obtained on request to its secretary.

This publication does not purport to include all the necessary provisions of a contract. Users are responsible for its correct application.

© The British Standards Institution 2018
Published by BSI Standards Limited 2018

ISBN 978 0 580 90707 4

ICS 35.040.10

Compliance with a British Standard cannot confer immunity from legal obligations.

This British Standard was published under the authority of the Standards Policy and Strategy Committee on 31 August 2018.

Amendments/corrigenda issued since publication

Date    Text affected

INTERNATIONAL STANDARD

ISO/IEC 10646

Fifth edition

2017-12

Information technology — Universal Coded Character Set (UCS)

Technologies de l'information — Jeu universel de caractères codés (JUC)

Reference number
ISO/IEC 10646:2017(E)

© ISO/IEC 2017

COPYRIGHT PROTECTED DOCUMENT

© ISO/IEC 2017, Published in Switzerland

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.

ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
[email protected]
www.iso.org

CONTENTS

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Conformance
4.1 General
4.2 Conformance of information interchange
4.3 Conformance of devices
5 General structure of the UCS
6 Basic structure and nomenclature
6.1 Structure
6.2 Coding of characters
6.3 Types of code points
6.4 Naming of characters
6.5 Short identifiers for code points (UIDs)
6.6 UCS Sequence Identifiers
6.7 Octet sequence identifiers
7 Revision and updating of the UCS
8 Subsets
8.1 General
8.2 Limited subset
8.3 Selected subset
9 UCS encoding forms
9.1 General
9.2 UTF-8
9.3 UTF-16
9.4 UTF-32 (UCS-4)
10 UCS Encoding schemes
10.1 General
10.2 UTF-8
10.3 UTF-16BE
10.4 UTF-16LE
10.5 UTF-16
10.6 UTF-32BE
10.7 UTF-32LE
10.8 UTF-32
11 Use of control functions with the UCS
12 Declaration of identification of features
12.1 Purpose and context of identification
12.2 Identification of a UCS encoding scheme
12.3 Identification of subsets of graphic characters
12.4 Identification of control function set
12.5 Identification of the coding system of ISO/IEC 2022
13 Structure of the code charts and lists
14 Block and collection names
14.1 Block names
14.2 Collection names
15 Mirrored characters in bidirectional context
15.1 Mirrored characters
15.2 Directionality of bidirectional text
16 Special characters
16.1 General
16.2 Space characters
16.3 Currency symbols
16.4 Format characters
16.5 Ideographic description characters
16.6 Variation selectors and variation sequences
17 Presentation forms of characters
18 Compatibility characters
19 Order of characters
20 Combining characters
20.1 Order of combining characters
20.2 Combining class and canonical ordering
20.3 Appearance in code charts
20.4 Alternate coded representations
20.5 Multiple combining characters
20.6 Collections containing combining characters
20.7 Combining Grapheme Joiner
21 Normalization forms
22 Special features of individual scripts and symbol repertoires
22.1 Hangul syllable composition method
22.2 Features of scripts used in India and some other South Asian countries
22.3 Byzantine musical symbols
22.4 Source references for pictographic symbols
23 Source references for CJK ideographs
23.1 List of source references
23.2 Source references file for CJK ideographs
23.3 Source reference presentation for CJK Unified ideographs
23.4 Source references presentation for CJK Compatibility ideographs
24 Source references for Tangut ideographs
24.1 List of source references
24.2 Source reference file for Tangut ideographs
24.3 Source reference presentation for Tangut ideographs
25 Source references for Nüshu characters
25.1 List of source references
25.2 Source reference file for Nüshu characters
26 Character names and annotations
26.1 Entity names
26.2 Name formation
26.3 Single name
26.4 Name immutability
26.5 Name uniqueness
26.6 Character names for CJK ideographs
26.7 Character names for Tangut ideographs
26.8 Character names for Nüshu characters
26.9 Character names for Hangul syllables
27 Named UCS Sequence Identifiers
28 Structure of the Basic Multilingual Plane
29 Structure of the Supplementary Multilingual Plane for scripts and symbols (SMP)
30 Structure of the Supplementary Ideographic Plane (SIP)
31 Structure of the Tertiary Ideographic Plane (TIP)
32 Structure of the Supplementary Special-purpose Plane (SSP)
33 Code charts and lists of character names
33.1 General
33.2 Code chart
33.3 Character names list
33.4 Summary of standardized variation sequences
33.5 Code charts and lists of character names
Annex A (normative) Collections of graphic characters for subsets
A.1 Collections of coded graphic characters
A.2 Blocks lists
A.3 Fixed collections of the whole UCS (except Unicode collections)
A.4 CJK collections
A.5 Other collections
A.6 Unicode collections
Annex B (normative) List of combining characters
Annex C (normative) Transformation format for planes 01 to 10 of the UCS (UTF-16)
Annex D (normative) UCS Transformation Format 8 (UTF-8)
Annex E (normative) Mirrored characters in bidirectional context
Annex F (informative) Format characters
F.1 General format characters
F.2 Script-specific format characters
F.3 Interlinear annotation characters
F.4 Subtending format characters
F.5 Shorthand format characters
F.6 Invisible mathematical operators
F.7 Western musical symbols
F.8 Language tagging using Tag characters
Annex G (informative) Alphabetically sorted list of character names
Annex H (informative) The use of “signatures” to identify UCS
Annex I (informative) Ideographic description characters
I.1 General
I.2 Syntax of an ideographic description sequence
I.3 Individual definitions of the ideographic description characters
Annex J (informative) Recommendation for combined receiving/originating devices with internal storage
Annex K (informative) Notations of octet value representations
Annex L (informative) Character naming guidelines
Annex M (informative) Sources of characters
Annex N (informative) External references to character repertoires
N.1 Methods of reference to character repertoires and their coding
N.2 Identification of ASN.1 character abstract syntaxes
N.3 Identification of ASN.1 character transfer syntaxes
Annex P (informative) Additional information on CJK Unified ideographs
Annex Q (informative) Code mapping table for Hangul syllables
Annex R (informative) Names of Hangul syllables
Annex S (informative) Procedure for the unification and arrangement of CJK ideographs
S.1 Unification procedure
S.2 Arrangement procedure
S.3 Source separation examples
S.4 Non-unification examples
Annex T (informative) Language tagging using Tag Characters
Annex U (informative) Characters in identifiers


Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the different types of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation on the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html

The committee responsible for this document is ISO/IEC JTC 1, Information technology, SC 2, Coded character sets.

This fifth edition of ISO/IEC 10646 cancels and replaces the fourth edition (ISO/IEC 10646:2014), which has been technically revised. It also incorporates ISO/IEC 10646:2014/Amd 1:2015 and ISO/IEC 10646:2014/Amd 2:2016.

This edition includes the following significant changes with respect to the previous edition:

- New scripts covered: Adlam, Bhaiksuki, Marchen, Masaram Gondi, Newa, Nushu, Osage, Soyombo, Tangut, and Zanabazar Square,
- Existing scripts significantly extended: Cherokee, CJK Unified Ideographs (Extension F), New Emoji symbols.


Introduction

This International Standard specifies the Universal Coded Character Set (UCS). It is applicable to the representation, transmission, interchange, processing, storage, input and presentation of the written form of the languages of the world as well as of additional symbols.

By defining a consistent way of encoding multilingual text it enables the exchange of data internationally. The information technology industry gains data stability, greater global interoperability and data interchange. This International Standard has been widely adopted in new Internet protocols and implemented in modern operating systems and computer languages. This edition covers over 130 000 characters from the world’s scripts.


Information technology — Universal Coded Character Set (UCS)

1 Scope

This International Standard specifies the Universal Coded Character Set (UCS). It is applicable to the representation, transmission, interchange, processing, storage, input, and presentation of the written form of the languages of the world as well as of additional symbols.

This International Standard

- specifies the architecture of this International Standard,
- defines terms used in this International Standard,
- describes the general structure of the UCS codespace,
- specifies the Basic Multilingual Plane (BMP) of the UCS,
- specifies supplementary planes of the UCS: the Supplementary Multilingual Plane (SMP), the Supplementary Ideographic Plane (SIP), the Tertiary Ideographic Plane (TIP), and the Supplementary Special-purpose Plane (SSP),
- defines a set of graphic characters used in scripts and the written form of languages on a world-wide scale,
- specifies the names for the graphic characters and format characters of the BMP, SMP, SIP, TIP, SSP and their coded representations within the UCS codespace,
- specifies the coded representation of control characters and private use characters,
- specifies three encoding forms of the UCS: UTF-8, UTF-16, and UTF-32,
- specifies seven encoding schemes of the UCS: UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE,
- specifies the management of future additions to this coded character set.

The UCS is an encoding system different from that specified in ISO/IEC 2022. The method to designate UCS from ISO/IEC 2022 is specified in 12.2.

A graphic character will be assigned only one code point in the standard, located either in the BMP or in one of the supplementary planes.

2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO/IEC 2022:1994 Information technology — Character code structure and extension techniques.

ISO/IEC 6429:1992 Information technology — Control functions for coded character sets.

Unicode Standard Annex, UAX #9, The Unicode Bidirectional Algorithm: http://www.unicode.org/reports/tr9/tr9-35.html

Unicode Standard Annex, UAX #15, Unicode Normalization Forms: http://www.unicode.org/reports/tr15/tr15-44.html

Unicode Technical Standard, UTS #37, Ideographic Variation Database: http://www.unicode.org/reports/tr37/tr37-8.html
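NOTE (illustrative, not part of the standard): the Scope states that a graphic character is assigned a single code point, which can then be serialized by several encoding schemes. The following minimal Python sketch shows one supplementary-plane code point, U+20000, under the five schemes whose byte order is fixed. The names `SCHEMES`, `encode_all` and `plane_of` are hypothetical helpers introduced for this illustration, and the use of Python's built-in codecs is an assumption of the example, not something this document specifies. The unmarked UTF-16 and UTF-32 schemes are left out because their byte order depends on an optional signature.

```python
# Illustrative sketch only. Maps the scheme names used in the Scope to
# Python codec names (the codec names are Python's, not the standard's).
SCHEMES = {
    "UTF-8": "utf-8",
    "UTF-16BE": "utf-16-be",
    "UTF-16LE": "utf-16-le",
    "UTF-32BE": "utf-32-be",
    "UTF-32LE": "utf-32-le",
}


def plane_of(code_point: int) -> int:
    """Return the plane number (0 = BMP, 1 = SMP, 2 = SIP, ...)."""
    return code_point >> 16


def encode_all(code_point: int) -> dict:
    """Serialize one code point under each fixed-byte-order scheme.

    Returns a mapping from scheme name to a hex string of the octets.
    """
    ch = chr(code_point)
    return {
        name: ch.encode(codec).hex(" ").upper()
        for name, codec in SCHEMES.items()
    }


if __name__ == "__main__":
    cp = 0x20000  # a code point in plane 2 (SIP)
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")
    for name, octets in encode_all(cp).items():
        print(f"{name:9s} {octets}")
```

For U+20000 the sketch yields the octets F0 A0 80 80 for UTF-8, the surrogate pair D840 DC00 for UTF-16BE (byte-swapped for UTF-16LE), and 00 02 00 00 for UTF-32BE.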