<<

[Figure: sample document, a Christmas letter about typesetting a Chaldean New Testament, illustrated with a page of the Lindisfarne Gospels; the letter's text is not reproduced here.]

Fonts & Encoding: A Never Ending Problem?

Volker RW Schaa

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Darmstadt, Germany

JACoW Team Meeting 2009
DESY, Hamburg, Germany
November 2009

Image of Lindisfarne Gospels, f.211 © The British Library Board

1 History of fonts and encodings
    In pre-computer time
    What changed in the computer era?
    New standards
    Problem of interpretation

2 Fonts in print (encoding and subsetting)
    Encoding and naming
    Font maps
    Font problems part 1 – Font not embedded
    Font problems part 2 – Font embedded but characters missing

3 Conclusion
    What can we do?
    Thanks

In pre-computer times fonts were never a problem

• Characters were local and known
• If you needed foreign characters, you took them


• Characters were placed in the typesetter's tray by frequency of occurrence
• German, Swiss and other West-European languages used the following distribution


Computers introduced problems: Early days

• Introduction of ASCII in 1963 (7 bit, 94 printable characters, 33 control codes, and the invisible space)


• Introduction of EBCDIC by IBM in 1963 (8 bit, 94 printable characters, 65 control codes, 3 invisible spaces, soft hyphen)
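The figures quoted for ASCII, and the different numbering used by EBCDIC, can be checked with a few lines of Python. This is a minimal sketch that is not part of the talk. It assumes that "control codes" means code points 0 to 31 plus DEL (127), and it uses Python's built-in `cp037` codec as a stand-in for "EBCDIC"; other EBCDIC code pages place some punctuation elsewhere.

```python
# Part 1: partition the 128 ASCII code points (7 bits) into the groups
# named above. Assumption: "control codes" are 0-31 plus DEL (127).
control = [c for c in range(128) if c < 32 or c == 127]
space = [32]
printable = [c for c in range(128) if 33 <= c <= 126]
print(len(control), len(space), len(printable))  # 33 1 94

# Part 2: compare byte values in ASCII and in one EBCDIC variant.
# Assumption: the "cp037" codec stands in for EBCDIC here; it is only one
# of many EBCDIC code pages.
sample = "ij AZ09"
ascii_bytes = list(sample.encode("ascii"))
ebcdic_bytes = list(sample.encode("cp037"))
for ch, a, e in zip(sample, ascii_bytes, ebcdic_bytes):
    print(f"{ch!r}: ASCII 0x{a:02X}, EBCDIC 0x{e:02X}")

# ASCII letters are contiguous; EBCDIC has a gap between 'i' and 'j'.
ascii_gap = ascii_bytes[1] - ascii_bytes[0]
ebcdic_gap = ebcdic_bytes[1] - ebcdic_bytes[0]
print(ascii_gap, ebcdic_gap)
```

The three ASCII groups add up to 128, which is every value 7 bits can hold. The second part shows that no character in the sample has the same byte value in both encodings, so text written in one and read in the other is unreadable without explicit conversion.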

Both widespread ‘standards’ were implemented for US-English only. Even ASCII wasn’t usable for British English because the ‘£’ was missing; Canada used its own version, which supported French characters.

The later standard ISO/IEC 646, like ASCII, was a 7-bit character set. It did not make any additional codes available, so the same code points encoded different characters in different countries. A German, French, or Swedish programmer had to get used to reading and writing

    ä aÄiÜ='Ön'; ü

instead of

    { a[i]='\n'; }
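The C example above can be reproduced with a short Python sketch. It is an illustration only and not taken from the talk. The mapping table is an assumption based on the German national variant of ISO/IEC 646 (DIN 66003), and the names `GERMAN_646` and `as_seen_in_germany` are made up for this example.

```python
# Show how the same 7-bit codes look on a device that uses the German
# national variant of ISO/IEC 646 (assumed here: DIN 66003).
# GERMAN_646 is a hypothetical name. It maps the US-ASCII character at each
# reassigned code point to the character the German variant displays there.
GERMAN_646 = str.maketrans({
    "@": "§",
    "[": "Ä",
    "\\": "Ö",
    "]": "Ü",
    "{": "ä",
    "|": "ö",
    "}": "ü",
    "~": "ß",
})


def as_seen_in_germany(ascii_source: str) -> str:
    """Render US-ASCII text the way a German ISO 646 device would show it."""
    return ascii_source.translate(GERMAN_646)


c_code = "{ a[i]='\\n'; }"           # the C fragment from the slide
print(as_seen_in_germany(c_code))    # prints: ä aÄiÜ='Ön'; ü
```

The bytes themselves never change. Only the interpretation of eight code points differs, which is why the problem cannot be detected by looking at the data alone.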

UTF-EBCDIC

Each cell shows the character above its Unicode code point. The row label is the high nibble of the byte, the column label the low nibble.

    -0   -1   -2   -3   -4   -5   -6   -7   -8   -9   -A   -B   -C   -D   -E   -F
0-  NUL  SOH  STX  ETX  ST   HT   SSA  DEL  EPA  RI   SS2  VT   FF   CR   SO   SI
    0000 0001 0002 0003 009C 0009 0086 007F 0097 008D 008E 000B 000C 000D 000E 000F
1-  DLE  DC1  DC2  DC3  OSC  LF   BS   ESA  CAN  EM   PU2  SS3  FS   GS   RS   US
    0010 0011 0012 0013 009D 000A 0008 0087 0018 0019 0092 008F 001C 001D 001E 001F
2-  PAD  HOP  BPH  NBH  IND  NEL  ETB  ESC  HTS  HTJ  VTS  PLD  PLU  ENQ  ACK  BEL
    0080 0081 0082 0083 0084 0085 0017 001B 0088 0089 008A 008B 008C 0005 0006 0007
3-  DCS  PU1  SYN  STS  CCH  MW   SPA  EOT  SOS  SGCI SCI  CSI  DC4  NAK  PM   SUB
    0090 0091 0016 0093 0094 0095 0096 0004 0098 0099 009A 009B 0014 0015 009E 001A
4-  SP                                                     .    <    (    +    |
    0020                                                   002E 003C 0028 002B 007C
5-  &                                                 !    $    *    )    ;    ^
    0026                                              0021 0024 002A 0029 003B 005E
6-  -    /                                                 ,    %    _    >    ?
    002D 002F                                              002C 0025 005F 003E 003F
7-                                               `    :    #    @    '    =    "
                                                 0060 003A 0023 0040 0027 003D 0022
8-       a    b    c    d    e    f    g    h    i
         0061 0062 0063 0064 0065 0066 0067 0068 0069
9-       j    k    l    m    n    o    p    q    r
         006A 006B 006C 006D 006E 006F 0070 0071 0072
A-       ~    s    t    u    v    w    x    y    z                   [
         007E 0073 0074 0075 0076 0077 0078 0079 007A                005B
B-                                                                   ]
                                                                     005D
C-  {    A    B    C    D    E    F    G    H    I
    007B 0041 0042 0043 0044 0045 0046 0047 0048 0049
D-  }    J    K    L    M    N    O    P    Q    R
    007D 004A 004B 004C 004D 004E 004F 0050 0051 0052
E-  \         S    T    U    V    W    X    Y    Z
    005C      0053 0054 0055 0056 0057 0058 0059 005A
F-  0    1    2    3    4    5    6    7    8    9                             APC
    0030 0031 0032 0033 0034 0035 0036 0037 0038 0039                          009F

• From the Jargon book: "Today, IBM claims to be an open-systems company, but IBM’s own description of the EBCDIC variants and how to convert between them is still internally classified top-secret, burn-before-reading."

Volker RW Schaa Fonts & Encoding UTF-EBCDIC —0 —1 —2 —3 —4 —5 —6 —7 —8 —9 —A —B —C —D —E —F NUL SOH STX ETX ST HT SSA DEL EPA RI SS2 VT FF CR SO SI 0− 0000 0001 0002 0003 009C 0009 0086 007F 0097 008D 008E 000B 000C 000D 000E 000F 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 DLE DC1 DC2 DC3 OSC LF BS ESA CAN EM PU2 SS3 FS GS RS US 1− 0010 0011 0012 0013 009D 000A 0008 0087 0018 0019 0092 008F 001C 001D 001E 001F 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 PAD HOP BPH NBH IND NEL ETB ESC HTS HTJ VTS PLD PLU ENQ ACK BEL 2− 0080 0081 0082 0083 0084 0085 0017 001B 0088 0089 008A 008B 008C 0005 0006 0007 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 DCS PU1 SYN STS CCH MW SPA EOT SOS SGCI SCI CSI DC4 NAK PM SUB 3− 0090 0091 0016 0093 0094 0095 0096 0004 0098 0099 009A 009B 0014 0015 009E 001A 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 SP . < ( + | 4− 0020 002E 003C 0028 002B 007C 64 75 76 77 78 79 & ! $ * ) ; ^ Z From5− the0026 Jargon book: 0021 0024 002A 0029 003B 005E Both widely80 spread ‘standards’ were90 91 implemented92 93 94 95 for US-English - / , % _ > ? 
Today,6− 002D IBM002F claims to be an open-systems002C 0025 005F company,003E 003F but IBM’s onlyProfessor:96 97 "So the American government107 108 109 went110 111 to IBM to come ` : # @ ' = " own description7− of the EBCDIC0060 variants003A 0023 0040 and0027 003D how0022 to convert Evenup with ASCII an encryption wasn’t usable standard, for British-English121 and122 123 they124 125 came126 because127 up with—" the ‘£’ was between thema b isc stilld e internallyf g h i classified top-secret, Student:8− "EBCDIC!"0061 0062 0063 0064 0065 0066 0067 0068 0069 missing, Canada129 130 131 132 used133 134 an135 own136 137 version which supported French burn-before-reading.j k l m n o p q r characters9− 006A 006B 006C 006D 006E 006F 0070 0071 0072 145 146 147 148 149 150 151 152 153 ~ s t u v w x y z [ The laterA− 007E standard0073 0074 0075 ISO/IEC0076 0077 0078 0079 646,007A like ASCII,005B was a 7-bit character 161 162 163 164 165 166 167 168 169 173 set. It did not make any additional codes] available, so the same B− 005D 189 code points{ A encodedB C D E differentF G H charactersI in different countries. C− 007B 0041 0042 0043 0044 0045 0046 0047 0048 0049 192 193 194 195 196 197 198 199 200 201 a German,} J French,K L M orN Swedish,O P Q R etc., programmer had to D− 007D 004A 004B 004C 004D 004E 004F 0050 0051 0052 208 209 210 211 212 213 214 215 216 217 get used\ to readingS T U andV W writingX Y Z E− 005C 0053 0054 0055 0056 0057 0058 0059 005A ä aÄiÜ=’Ön’;224 226 227 228 229 ü 230 231 232 233 0 1 2 3 4 5 6 7 8 9 APC F− 0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 009F instead240 of241 242 243 244 245 246 247 248 249 255 —0 —1 —2 —3 —4 —5 —6 —7 —8 —9 —A —B —C —D —E —F { a[i]=’\n’; }

In pre-computer time History of fonts and encodings What changed in the computer era? Fonts in print (encoding and subsetting) New standards Conclusion Problem of interpretation

Computers introduced problems: Early days Introduction of ASCII in 1963 (7 Bit, 94 printable characters, 33 control codes, invisible space) Introduction of EBCDIC by IBM in 1963 (8 Bit, 94 printable characters, 65 control codes, 3 invisible spaces, softhyphen) Z A popular joke:

Volker RW Schaa Fonts & Encoding UTF-EBCDIC —0 —1 —2 —3 —4 —5 —6 —7 —8 —9 —A —B —C —D —E —F NUL SOH STX ETX ST HT SSA DEL EPA RI SS2 VT FF CR SO SI 0− 0000 0001 0002 0003 009C 0009 0086 007F 0097 008D 008E 000B 000C 000D 000E 000F 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 DLE DC1 DC2 DC3 OSC LF BS ESA CAN EM PU2 SS3 FS GS RS US 1− 0010 0011 0012 0013 009D 000A 0008 0087 0018 0019 0092 008F 001C 001D 001E 001F 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 PAD HOP BPH NBH IND NEL ETB ESC HTS HTJ VTS PLD PLU ENQ ACK BEL 2− 0080 0081 0082 0083 0084 0085 0017 001B 0088 0089 008A 008B 008C 0005 0006 0007 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 DCS PU1 SYN STS CCH MW SPA EOT SOS SGCI SCI CSI DC4 NAK PM SUB 3− 0090 0091 0016 0093 0094 0095 0096 0004 0098 0099 009A 009B 0014 0015 009E 001A 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 SP . < ( + | 4− 0020 002E 003C 0028 002B 007C 64 75 76 77 78 79 & ! $ * ) ; ^ Z From5− the0026 Jargon book: 0021 0024 002A 0029 003B 005E Both widely80 spread ‘standards’ were90 91 implemented92 93 94 95 for US-English - / , % _ > ? Today,6− 002D IBM002F claims to be an open-systems002C 0025 005F company,003E 003F but IBM’s only 96 97 107 108 109 110 111 ` : # @ ' = " own description7− of the EBCDIC0060 variants003A 0023 0040 and0027 003D how0022 to convert Even ASCII wasn’t usable for British-English121 122 123 124 125 126 because127 the ‘£’ was between thema b isc stilld e internallyf g h i classified top-secret, Student:8− "EBCDIC!"0061 0062 0063 0064 0065 0066 0067 0068 0069 missing, Canada129 130 131 132 used133 134 an135 own136 137 version which supported French burn-before-reading.j k l m n o p q r characters9− 006A 006B 006C 006D 006E 006F 0070 0071 0072 145 146 147 148 149 150 151 152 153 ~ s t u v w x y z [ The laterA− 007E standard0073 0074 0075 ISO/IEC0076 0077 0078 0079 646,007A like ASCII,005B was a 7-bit character 161 162 163 164 165 166 167 168 169 173 set. 
It did not make any additional codes] available, so the same B− 005D 189 code points{ A encodedB C D E differentF G H charactersI in different countries. C− 007B 0041 0042 0043 0044 0045 0046 0047 0048 0049 192 193 194 195 196 197 198 199 200 201 a German,} J French,K L M orN Swedish,O P Q R etc., programmer had to D− 007D 004A 004B 004C 004D 004E 004F 0050 0051 0052 208 209 210 211 212 213 214 215 216 217 get used\ to readingS T U andV W writingX Y Z E− 005C 0053 0054 0055 0056 0057 0058 0059 005A ä aÄiÜ=’Ön’;224 226 227 228 229 ü 230 231 232 233 0 1 2 3 4 5 6 7 8 9 APC F− 0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 009F instead240 of241 242 243 244 245 246 247 248 249 255 —0 —1 —2 —3 —4 —5 —6 —7 —8 —9 —A —B —C —D —E —F { a[i]=’\n’; }

In pre-computer time History of fonts and encodings What changed in the computer era? Fonts in print (encoding and subsetting) New standards Conclusion Problem of interpretation

Computers introduced problems: Early days

Introduction of ASCII in 1963 (7-bit, 94 printable characters, 33 control codes, invisible space)
Introduction of EBCDIC by IBM in 1963 (8-bit, 94 printable characters, 65 control codes, 3 invisible spaces, soft hyphen)

A popular joke:
Professor: "So the American government went to IBM to come up with an encryption standard, and they came up with ..."
Student: "EBCDIC!"

From the Jargon book: "Today, IBM claims to be an open-systems company, but IBM’s own description of the EBCDIC variants and how to convert between them is still internally classified top-secret, burn-before-reading."

UTF-EBCDIC code table. Each row lists the byte positions (decimal), the characters at those positions, and their Unicode code points (hex), all in the same order. Positions that are not listed are empty in the table.

Row 0-, bytes 0-15:
  NUL SOH STX ETX ST HT SSA DEL EPA RI SS2 VT FF CR SO SI
  0000 0001 0002 0003 009C 0009 0086 007F 0097 008D 008E 000B 000C 000D 000E 000F
Row 1-, bytes 16-31:
  DLE DC1 DC2 DC3 OSC LF BS ESA CAN EM PU2 SS3 FS GS RS US
  0010 0011 0012 0013 009D 000A 0008 0087 0018 0019 0092 008F 001C 001D 001E 001F
Row 2-, bytes 32-47:
  PAD HOP BPH NBH IND NEL ETB ESC HTS HTJ VTS PLD PLU ENQ ACK BEL
  0080 0081 0082 0083 0084 0085 0017 001B 0088 0089 008A 008B 008C 0005 0006 0007
Row 3-, bytes 48-63:
  DCS PU1 SYN STS CCH MW SPA EOT SOS SGCI SCI CSI DC4 NAK PM SUB
  0090 0091 0016 0093 0094 0095 0096 0004 0098 0099 009A 009B 0014 0015 009E 001A
Row 4-, bytes 64, 75-79:
  SP . < ( + |
  0020 002E 003C 0028 002B 007C
Row 5-, bytes 80, 90-95:
  & ! $ * ) ; ^
  0026 0021 0024 002A 0029 003B 005E
Row 6-, bytes 96-97, 107-111:
  - / , % _ > ?
  002D 002F 002C 0025 005F 003E 003F
Row 7-, bytes 121-127:
  ` : # @ ' = "
  0060 003A 0023 0040 0027 003D 0022
Row 8-, bytes 129-137:
  a b c d e f g h i
  0061 0062 0063 0064 0065 0066 0067 0068 0069
Row 9-, bytes 145-153:
  j k l m n o p q r
  006A 006B 006C 006D 006E 006F 0070 0071 0072
Row A-, bytes 161-169, 173:
  ~ s t u v w x y z [
  007E 0073 0074 0075 0076 0077 0078 0079 007A 005B
Row B-, byte 189:
  ]
  005D
Row C-, bytes 192-201:
  { A B C D E F G H I
  007B 0041 0042 0043 0044 0045 0046 0047 0048 0049
Row D-, bytes 208-217:
  } J K L M N O P Q R
  007D 004A 004B 004C 004D 004E 004F 0050 0051 0052
Row E-, bytes 224, 226-233:
  \ S T U V W X Y Z
  005C 0053 0054 0055 0056 0057 0058 0059 005A
Row F-, bytes 240-249, 255:
  0 1 2 3 4 5 6 7 8 9 APC
  0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 009F

Both widespread ‘standards’ were implemented for US English only.
Even ASCII was not usable for British English because the ‘£’ was missing; Canada used its own version, which supported French characters.
The later standard ISO/IEC 646, like ASCII, was a 7-bit character set. It did not make any additional codes available, so the same code points encoded different characters in different countries.
A German, French, or Swedish programmer (among others) had to get used to reading and writing
    ä aÄiÜ='Ön'; ü
instead of
    { a[i]='\n'; }
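The ISO/IEC 646 effect can be reproduced with a small sketch. It assumes the German national variant (DIN 66003), in which eight ASCII positions carry German letters instead; the table name `ASCII_TO_GERMAN_646` and the helper `show_as_german_646` are made up for this illustration. The byte values never change, only the glyph a terminal of that era would show for them.

```python
# Positions where the German ISO/IEC 646 variant (DIN 66003) differs from US-ASCII.
# Assumption for illustration: only these eight positions are remapped.
ASCII_TO_GERMAN_646 = str.maketrans({
    "@": "§",
    "[": "Ä",
    "\\": "Ö",
    "]": "Ü",
    "{": "ä",
    "|": "ö",
    "}": "ü",
    "~": "ß",
})


def show_as_german_646(ascii_text: str) -> str:
    """Render 7-bit text the way a German ISO 646 device would display it."""
    return ascii_text.translate(ASCII_TO_GERMAN_646)


# The C fragment from the slide; "\\n" is a literal backslash followed by n.
c_source = "{ a[i]='\\n'; }"
german_view = show_as_german_646(c_source)
print(german_view)  # ä aÄiÜ='Ön'; ü
```

Characters outside the eight remapped positions pass through unchanged, which is why plain English text looked the same everywhere while program source did not.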

Computers introduced problems: Later times

In April 1984 an eight-bit standard was adopted as ECMA-94 (later as ISO/IEC 8859).
It was derived from the character sets DEC-MCS and Mac OS Roman, developed as true extensions of ASCII.
It leaves the original character mapping intact and adds further character definitions after the first 128 (i.e., 7-bit) characters.

With ECMA-94 (ISO/IEC 8859-1/-2/-3/-4) we are now able to write and encode the following European languages: Albanian, Catalan, Czech, Danish, Dutch, English, Estonian, Faeroese, Finnish, French, Galician, German, Greenlandic, Hungarian, Icelandic, Irish, Italian, Lappish, Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese, Rumanian, Serbo-Croatian, Slovak, Slovene, Spanish, Swedish, and Turkish (plus Afrikaans and Esperanto).

So now we have ways of defining different characters in character sets, but ...
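A short check, using the codecs that ship with Python, illustrates both halves of that statement: the first 128 positions are identical in every part, while a position above 127 depends on the part chosen. The variable names are illustrative only, and byte 0xA3 was picked here simply because it is the pound sign in part 1.

```python
# The four parts named on the slide, under their Python codec names.
PARTS = ["iso8859_1", "iso8859_2", "iso8859_3", "iso8859_4"]

# Lower half: every 7-bit value decodes exactly as in ASCII.
ascii_bytes = bytes(range(128))
ascii_text = ascii_bytes.decode("ascii")
lower_half_identical = all(ascii_bytes.decode(part) == ascii_text for part in PARTS)

# Upper half: the same byte value is a different character in another part.
pound_in_latin1 = b"\xa3".decode("iso8859_1")      # POUND SIGN
same_byte_in_latin2 = b"\xa3".decode("iso8859_2")  # LATIN CAPITAL LETTER L WITH STROKE

print(lower_half_identical, pound_in_latin1, same_byte_in_latin2)
```

This is the "but ..." of the slide in miniature: the byte alone does not say which part it was written in.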

New standards: Problems solved?

Now imagine that somebody types (or, even better, cuts and pastes) an ‘Abstract’ text into the submission form of SPMS for the conference where you are the editor.

Now we look at the input text as the computer does. What do we (or the computer) see?

Perfect! But what does it mean?

We have (or the computer has) no idea where this might come from. We only know that we accept 8-bit values.
We have no idea whether it is a Chinese scientist at CERN using his laptop, a French scientist sitting at KEK, or ...
We have no knowledge of the character set or character encoding used.

If the computer switches to a different interpretation of the input, it might look like this.

It helps a bit, but we still have no idea about the characters that are not ASCII (i.e., not in the range 32 ... 126, or x’20’ ... x’7E’) ...
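What the computer knows at this point can be sketched as follows. The sample bytes and both function names are hypothetical and not taken from the talk. Only the printable ASCII range x'20' to x'7E' is safe to display; every other byte is an unknown until an encoding is chosen.

```python
def mask_non_ascii(raw: bytes, placeholder: str = "?") -> str:
    """Show printable ASCII (0x20..0x7E) as is and hide every other byte."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else placeholder for b in raw)


def unknown_byte_values(raw: bytes) -> list[int]:
    """Return the sorted distinct byte values that are not printable ASCII."""
    return sorted({b for b in raw if not 0x20 <= b <= 0x7E})


# Hypothetical submission: an ASCII prefix followed by six 8-bit values.
sample = b"Abstract: \xbf\xe0\xd8\xd2\xd5\xe2"

print(mask_non_ascii(sample))  # Abstract: ??????
print([hex(v) for v in unknown_byte_values(sample)])
```

The list of unknown byte values is all the evidence available for guessing the encoding, which is what the following slides attempt by trial.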

Let’s try to uncover the secret

To get an idea of what it could mean, we switch to West-European languages and select ISO 8859-1:
Not a West-European language!

To get an idea about what it could mean we switch to Z the Greek language and select ISO8859-7:  Doesn’t look Greek to me!

To get an idea about what it could mean, we switch to the Baltic languages and select ISO8859-13:

Not a Baltic language!

To get an idea about what it could mean, let’s try the Thai language and select ISO8859-11:

Looks funny but perhaps not Thai! (See question marks in black diamonds?)
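The question marks in black diamonds are typically how software displays U+FFFD, the Unicode replacement character, which stands in for bytes that have no character assigned in the chosen encoding. The sketch below counts such bytes for a few encodings. The helper name `count_undecodable` and the test input are assumptions for illustration.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER, often drawn as a black diamond with "?"

def count_undecodable(data: bytes, encoding: str) -> int:
    """Count bytes that have no character assigned in the given single-byte encoding."""
    text = data.decode(encoding, errors="replace")
    return text.count(REPLACEMENT)

# All byte values of the upper half of an 8-bit code page (hypothetical test input).
upper_half = bytes(range(0xA0, 0x100))

for name in ("latin_1", "iso8859_11", "iso8859_6"):
    print(name, count_undecodable(upper_half, name))
```

An encoding with many unassigned code points produces many black diamonds, which is a useful hint that the guess was wrong.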

To get an idea about what it could mean, we switch to the Armenian language and select ISO8859-16:

Looks good, but is it Armenian?

To get an idea about what it could mean, let’s stay in Europe for a moment, try Romanian and select Mac-Romanian:

Surely not Romanian (even on a Mac)

To get an idea about what it could mean, let’s try Arabic and switch to ISO8859-6:

Too many black diamonds! Not Arabic!

To get an idea about what it could mean, let us assume it’s our Chinese colleague at CERN, so we switch to Chinese-Simplified (ISO-2022-CN):

Still many black diamonds! Not Simplified Chinese!

To get an idea about what it could mean, what about traditional Chinese? Let’s try Chinese-Traditional:

Different but still wrong!

To get an idea about what it could mean, what about Farsi? We use Mac-Farsi:

Still no idea what it means!

To get an idea about what it could mean, what about the fifth most spoken language, Hindi? We switch to Mac-Hindi:

Still black diamonds!

To get an idea about what it could mean, what about our Japanese colleagues? Let’s try Japanese in Shift_JIS encoding:

Can anybody read this?

To get an idea about what it could mean, we now try Unicode. Let’s switch to UTF-16:

Too many question marks and daggers!

OK, we left out a language which is the eighth most spoken one: Russian! Let’s switch to Cyrillic in ISO8859-5 encoding:

That looks nice, even the line feeds are correct. Makzim??
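The trial-and-error procedure of the preceding slides can be sketched in a few lines of Python. The sample word and the list of candidate encodings are assumptions chosen for illustration, not the text from the slides. Only the last candidate yields the original word.

```python
def try_decodings(data: bytes, encodings):
    """Decode the same bytes under several candidate encodings."""
    views = {}
    for name in encodings:
        # errors="replace" puts U+FFFD where a byte has no meaning in that encoding
        views[name] = data.decode(name, errors="replace")
    return views

# Hypothetical sample: a Russian greeting stored as ISO 8859-5 bytes.
raw = "Привет".encode("iso8859_5")

candidates = ["latin_1", "iso8859_7", "iso8859_13", "iso8859_6", "iso8859_5"]
views = try_decodings(raw, candidates)
for name, text in views.items():
    print(f"{name:12} {text!r}")
```

The bytes never change; only the table used to interpret them does. Without a declared encoding, a human reader still has to judge which result looks plausible.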

So in the end it’s not an ‘Abstract’ but a spam message.

And this is the English translation (OK, mostly English):

Encoding and naming History of fonts and encodings Font character maps Fonts in print (encoding and subsetting) Font problems part 1 – Font not embedded Conclusion Font problems part 2 – Font embedded but characters missing

Encoding

We have now understood that the encoding is important, as it points to the right character:

x‘a4’ + ISO8859-15 ⇒ €

If we had used:

x‘a4’ + ISO8859-14 ⇒ Ċ

But how are Symbol fonts encoded? Is there a unique way to define an encoding for them?

Short answer: before Unicode, some had names (PostScript Level 3 named 180, including Greek letters, but nothing specific for math symbols) ≈

In Unicode it looks like this:
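The x‘a4’ example can be checked directly with Python's built-in codecs. This is only a small illustration; the third encoding, ISO8859-1, is added here for comparison and is not part of the slide.

```python
byte = bytes([0xA4])

for name in ("iso8859_15", "iso8859_14", "latin_1"):
    char = byte.decode(name)
    print(f"x'a4' + {name}: {char} (U+{ord(char):04X})")
```

The same byte value maps to three different Unicode code points, depending only on the encoding assumed.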

Mapping

A typical font (Type1, TrueType and OpenType) contains a map with the character names and their locations in the font.

Nowadays fonts have far more than the possible 192 characters (standard is 250, but there are fonts with more than 2 000 characters).

The PostScript (printer) driver looks up the requested font and accesses the character map table of this font. It scans the text to typeset and finds out about the requested Encoding and all the different characters that appear in the text.

Now it copies the needed Encoding vector and, depending on the parameters, copies either the whole set of characters, just the needed ones, or nothing (Embedded, Embedded Subset, nada).

If it finds characters which do not fit into one Encoding vector, the driver may copy a second map and the corresponding characters.
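The driver's steps can be sketched as follows. This is an illustrative model only, not how any particular PostScript driver is implemented: the font is reduced to a hypothetical dictionary from characters to glyph names, and `VECTOR_SIZE`, `build_subset` and the glyph names are assumptions.

```python
VECTOR_SIZE = 256  # assumed number of slots in one Encoding vector (one per 8-bit code)

def build_subset(font_map: dict, text: str, vector_size: int = VECTOR_SIZE):
    """Collect the glyphs a text needs and spread them over Encoding vectors.

    font_map: hypothetical character -> glyph name table of the font.
    Returns (list of vectors, list of characters missing from the font).
    """
    needed = sorted(set(text))  # all different characters in the text
    present = [c for c in needed if c in font_map]
    missing = [c for c in needed if c not in font_map]
    # If the characters do not fit into one vector, start a second one, and so on.
    vectors = [
        {slot: font_map[c] for slot, c in enumerate(present[i:i + vector_size])}
        for i in range(0, len(present), vector_size)
    ]
    return vectors, missing

# Tiny made-up font with four glyphs; vector_size=2 forces a second vector.
font_map = {"a": "a", "b": "b", "€": "Euro", "Ċ": "Cdotaccent"}
vectors, missing = build_subset(font_map, "abba €x", vector_size=2)
print(vectors)   # two vectors, because three glyphs do not fit into two slots
print(missing)   # characters the font cannot supply
```

Copying only the entries in `vectors` corresponds to "Embedded Subset"; copying all of `font_map` would correspond to "Embedded".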

Volker RW Schaa Fonts & Encoding The code which makes up a font Subset may look like this: So this all looks fine, but why do we face problems so often? Let us look into the Document Properties of a typical PDF file: We find 3 different Encodings (Ansi, Identity-H, and Roman) for one font, the 2 pairs of character sub-maps (Identity-H, Ansi) are surely not the same, and there are 3 different font format embedded: Type1, TrueType, and TrueType (CID) Positive: all fonts are embedded as subsets! But: we surely will have problems to change or edit such a file with Pitstop or Acrobat!

Encoding and naming History of fonts and encodings Font character maps Fonts in print (encoding and subsetting) Font problems part 1 – Font not embedded Conclusion Font problems part 2 – Font embedded but characters missing

Mapping and character copying How does a Encoding map looks like? An example of a Custom map which encodes more than the standard ISOLatin1Encoding by using some of the between x‘00’ and x‘1f’.

Volker RW Schaa Fonts & Encoding The code which makes up a font Subset may look like this: So this all looks fine, but why do we face problems so often? Let us look into the Document Properties of a typical PDF file: We find 3 different Encodings (Ansi, Identity-H, and Roman) for one font, the 2 pairs of character sub-maps (Identity-H, Ansi) are surely not the same, and there are 3 different font format embedded: Type1, TrueType, and TrueType (CID) Positive: all fonts are embedded as subsets! But: we surely will have problems to change or edit such a file with Pitstop or Acrobat!

Encoding and naming History of fonts and encodings Font character maps Fonts in print (encoding and subsetting) Font problems part 1 – Font not embedded Conclusion Font problems part 2 – Font embedded but characters missing

Mapping and character copying How does a Encoding map looks like? An example of a Custom map which encodes more than the standard ISOLatin1Encoding by using some of the code point between x‘00’ and x‘1f’.

Volker RW Schaa Fonts & Encoding So this all looks fine, but why do we face problems so often? Let us look into the Document Properties of a typical PDF file: We find 3 different Encodings (Ansi, Identity-H, and Roman) for one font, the 2 pairs of character sub-maps (Identity-H, Ansi) are surely not the same, and there are 3 different font format embedded: Type1, TrueType, and TrueType (CID) Positive: all fonts are embedded as subsets! But: we surely will have problems to change or edit such a file with Pitstop or Acrobat!

Encoding and naming History of fonts and encodings Font character maps Fonts in print (encoding and subsetting) Font problems part 1 – Font not embedded Conclusion Font problems part 2 – Font embedded but characters missing

Mapping and character copying How does a Encoding map looks like? An example of a Custom map which encodes more than the standard ISOLatin1Encoding by using some of the code point between x‘00’ and x‘1f’. The code which makes up a font Subset may look like this:

Volker RW Schaa Fonts & Encoding So this all looks fine, but why do we face problems so often? Let us look into the Document Properties of a typical PDF file: We find 3 different Encodings (Ansi, Identity-H, and Roman) for one font, the 2 pairs of character sub-maps (Identity-H, Ansi) are surely not the same, and there are 3 different font format embedded: Type1, TrueType, and TrueType (CID) Positive: all fonts are embedded as subsets! But: we surely will have problems to change or edit such a file with Pitstop or Acrobat!

Encoding and naming History of fonts and encodings Font character maps Fonts in print (encoding and subsetting) Font problems part 1 – Font not embedded Conclusion Font problems part 2 – Font embedded but characters missing

Mapping and character copying How does a Encoding map looks like? An example of a Custom map which encodes more than the standard ISOLatin1Encoding by using some of the code point between x‘00’ and x‘1f’. The code which makes up a font Subset may look like this:

Volker RW Schaa Fonts & Encoding Let us look into the Document Properties of a typical PDF file: We find 3 different Encodings (Ansi, Identity-H, and Roman) for one font, the 2 pairs of character sub-maps (Identity-H, Ansi) are surely not the same, and there are 3 different font format embedded: Type1, TrueType, and TrueType (CID) Positive: all fonts are embedded as subsets! But: we surely will have problems to change or edit such a file with Pitstop or Acrobat!

Encoding and naming History of fonts and encodings Font character maps Fonts in print (encoding and subsetting) Font problems part 1 – Font not embedded Conclusion Font problems part 2 – Font embedded but characters missing

Mapping and character copying How does a Encoding map looks like? An example of a Custom map which encodes more than the standard ISOLatin1Encoding by using some of the code point between x‘00’ and x‘1f’. The code which makes up a font Subset may look like this: So this all looks fine, but why do we face problems so often?

Mapping and character copying

What does an Encoding map look like? An example of a Custom map which encodes more than the standard ISOLatin1Encoding by using some of the code points between x‘00’ and x‘1f’.

The code which makes up a font Subset may look like this:

So this all looks fine, but why do we face problems so often? Let us look into the Document Properties of a typical PDF file:

We find 3 different Encodings (Ansi, Identity-H, and Roman) for one font.
The 2 pairs of character sub-maps (Identity-H, Ansi) are surely not the same.
There are 3 different font formats embedded: Type1, TrueType, and TrueType (CID).
Positive: all fonts are embedded as subsets!
But: we will surely have problems changing or editing such a file with PitStop or Acrobat!
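The Document Properties finding above (one font appearing with several encodings and formats) can also be checked mechanically. The following is a minimal sketch, assuming the font list has already been extracted as (name, type, encoding) tuples. The helper names `base_name`, `group_by_font` and `mixed_fonts` and the sample font names are hypothetical; the only fact relied on is the PDF convention that a subset font name starts with a six-uppercase-letter tag followed by `+`.

```python
import re
from collections import defaultdict

# A subset font name starts with six uppercase letters and a plus sign.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def base_name(font_name):
    """Strip the subset tag, e.g. 'ABCDEF+Times-Roman' -> 'Times-Roman'."""
    return SUBSET_TAG.sub("", font_name)


def group_by_font(fonts):
    """Map each base font name to the set of (type, encoding) pairs seen."""
    groups = defaultdict(set)
    for name, font_type, encoding in fonts:
        groups[base_name(name)].add((font_type, encoding))
    return dict(groups)


def mixed_fonts(fonts):
    """Return base names that occur with more than one type/encoding pair."""
    return sorted(n for n, pairs in group_by_font(fonts).items() if len(pairs) > 1)


# Hypothetical font list resembling the situation described on the slide.
sample = [
    ("ABCDEF+TimesNewRoman", "Type 1", "Ansi"),
    ("GHIJKL+TimesNewRoman", "TrueType", "Roman"),
    ("MNOPQR+TimesNewRoman", "TrueType (CID)", "Identity-H"),
    ("STUVWX+Symbol", "Type 1", "Custom"),
]

print(mixed_fonts(sample))
```

A font reported by `mixed_fonts` is a candidate for the editing problems mentioned above, since each subset carries its own character map.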

Why is a font not embedded?

Just for the statistics: in ≈ 35 % of the PDF files uploaded for a JACoW conference one or more fonts are missing.

A font was requested which is one of the 15 base fonts, and the switch “always embed fonts” had not been set in the PostScript produced by the author.
In the PostScript driver setup the option for TrueType fonts was set to “Substitute with Device Font”.
A font was requested in the PostScript file which is not installed on the editor’s computer; a substitute font is therefore used for display and PDF generation, but the font will not be included in the final PDF.
A TrueType, CID or OpenType font was used that contained the flag /FSType x def with a value x ≠ 0 (i.e. 2 [restricted license embedding], 4 [embedding allowed, but not editable], 256 [no subset embedding], 512 [bitmap embedding only]).
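The FSType values listed above are bit flags, so a single value can combine several restrictions. The following is a minimal sketch of decoding such a value. The function name `describe_fstype` is hypothetical, and the table covers only the four values named on the slide; any other bit is reported as unlisted rather than interpreted.

```python
# Bit values as listed on the slide; the labels repeat the slide text.
FSTYPE_BITS = {
    2: "restricted license embedding",
    4: "embedding allowed, but not editable",
    256: "no subset embedding",
    512: "bitmap embedding only",
}


def describe_fstype(value):
    """Return human-readable labels for the bits set in an FSType value."""
    if value == 0:
        return ["no embedding restrictions"]
    labels = [label for bit, label in FSTYPE_BITS.items() if value & bit]
    # Report bits that are set but not covered by the table above.
    unlisted = value & ~sum(FSTYPE_BITS)
    if unlisted:
        labels.append("unlisted bits: 0x%x" % unlisted)
    return labels


print(describe_fstype(4))
print(describe_fstype(260))  # 256 + 4
```

If the fontTools library is available, the value for a TrueType or OpenType font file can be read from its OS/2 table with `TTFont(path)["OS/2"].fsType` and passed to this function.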

Why are characters missing (or shown as a substitute symbol)?

A font was requested in a higher version (a newer release containing more characters) than the font installed on the editor’s computer (characters therefore missing in display and/or print) or on the printer (missing in print). [As happened during PAC’09 on Mac computers]

From “Missing Characters in PDF”, http://americanprinter.com/mag/printing_best_pdfs/:

“On the Mac, as any user knows, fonts can be found just about anywhere and it’s very easy to have duplicate fonts. To make matters worse, OS X includes Apple’s new dfonts, which are simply the Mac version of the most commonly used fonts, like Helvetica and Times. When it comes to PDF creation, this variability can result in the wrong version of a font being embedded in a PDF file with the possible outcome of different spacing, kerning or even inclusion of a glyph that is entirely different from the one originally set in the page layout.”
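The version problem described above comes down to a set difference: characters used in the text that the installed font version does not cover. The following is a minimal sketch under that assumption. The function name `missing_characters` and the two code point sets standing in for an older and a newer font release are hypothetical.

```python
def missing_characters(text, supported_codepoints):
    """Return the characters of `text` not covered by the font, in first-use order."""
    missing = []
    for ch in text:
        if ord(ch) not in supported_codepoints and ch not in missing:
            missing.append(ch)
    return missing


# Hypothetical coverage: the older release has printable ASCII only,
# the newer release adds U+2264 (less-than or equal) and U+03BC (Greek mu).
older_release = set(range(0x20, 0x7F))
newer_release = older_release | {0x2264, 0x03BC}

text = "beam size \u2264 5 \u03bcm"

print(missing_characters(text, older_release))
print(missing_characters(text, newer_release))
```

With fontTools, the coverage of a real font file could be obtained from `TTFont(path).getBestCmap()`, whose keys are the supported code points.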

Why are characters missing (or shown as a substitute symbol), cont.?

In the PostScript driver setup the option for TrueType fonts was set to “Substitute with Device Font”, but the printer device has a font with a different layout or fewer characters.

What can we do for the editors to make life easier?

Make sure when installing Word on PC or Mac that the newest fonts are installed and that there is only one version of each on the computer.
Make sure that the newest firmware is loaded on printers used in the editorial office.
The editors should double-check that fonts are embedded.
We should provide a set of fonts (on JACoW) which are no longer distributed with the OS (e.g. the original Times family) for installation on computers in the editorial office.
Ease the configuration of Tools & Utilities → Listen to Raphael’s talk about a new step to ease configuration woes with a “Generic PostScript Printer for JACoW”!
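The check that fonts are embedded can be scripted. The following is a minimal sketch that parses the table printed by the `pdffonts` command-line tool (from Poppler or Xpdf) and lists the fonts that are not embedded. The column layout in `SAMPLE` is an assumption about the tool's usual output, the sample rows are invented, and the parser assumes font names contain no spaces. The function names are hypothetical.

```python
def parse_pdffonts(output):
    """Parse the table printed by `pdffonts` into a list of dicts."""
    fonts = []
    for line in output.splitlines()[2:]:  # skip header and dashed rule
        tokens = line.split()
        if len(tokens) < 8:
            continue  # not a complete table row
        # Parse from the right: the type column may contain spaces ("Type 1").
        fonts.append({
            "name": tokens[0],
            "type": " ".join(tokens[1:-6]),
            "encoding": tokens[-6],
            "embedded": tokens[-5] == "yes",
            "subset": tokens[-4] == "yes",
        })
    return fonts


def not_embedded(output):
    """Return the names of fonts that are not embedded."""
    return [f["name"] for f in parse_pdffonts(output) if not f["embedded"]]


# Invented sample in the assumed pdffonts layout.
SAMPLE = """\
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
ABCDEF+TimesNewRomanPSMT             TrueType          WinAnsi          yes yes no       8  0
Helvetica                            Type 1            Custom           no  no  no      12  0
GHIJKL+Symbol                        CID TrueType      Identity-H       yes yes yes     15  0
"""

print(not_embedded(SAMPLE))
```

In practice the `output` string would come from running `pdffonts` on each uploaded file; an empty result means every font in the file is embedded.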

Thanks for your attention!