
11/13/2019

Other Encodings

ASCII, Unicode, BCD and EBCDIC

ASCII

• American Standard Code for Information Interchange

• Representation of printable (and related) characters as bit patterns

• Basic ASCII is a 7-bit code - the 8th bit was used for parity (primitive error checking)

• Extended ASCII: ISO8859-1, CP437, etc. - 8-bit extensions of 7-bit ASCII - include graphics symbols, European characters - not consistent with one another - all include 7-bit ASCII as the first 128 characters
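A minimal sketch of the parity idea, assuming even parity (odd parity was also used in practice). The function names `with_even_parity` and `parity_ok` are hypothetical, chosen for illustration. The 7-bit code goes in the low bits and the parity bit in bit 7, so the whole byte always carries an even number of 1s.

```python
def with_even_parity(ch: str) -> int:
    """Return the 7-bit ASCII code of ch with an even-parity bit placed in bit 7."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    parity = bin(code).count("1") % 2  # 1 if the seven data bits hold an odd number of 1s
    return (parity << 7) | code


def parity_ok(byte: int) -> bool:
    """Under even parity, a correctly received byte has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


for ch in "ACK":
    b = with_even_parity(ch)
    print(ch, format(b, "08b"), parity_ok(b))

# Flipping one bit in transit breaks parity and is detected;
# flipping two bits is not, which is why parity is only primitive error checking.
print(parity_ok(with_even_parity("C") ^ 0b0000_0001))
```

The single-bit versus double-bit behavior in the last line is the main limitation of parity as an error check.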


ASCII tables – see montcs.bloomu.edu/Information/Encoding/ASCII-EBCDIC.html

Range              Hexadecimal values   Usage                              Examples
First 32 values    0x00 - 0x1f          Control characters                 Ctrl-…, ‘\…’, Escape
Second 32 values   0x20 - 0x3f          Punctuation, digits                (, ), 0..9, =
Third 32 values    0x40 - 0x5f          Uppercase letters                  A..Z, [, ], @
Fourth 32 values   0x60 - 0x7f          Lowercase letters                  a..z, {, }, ~
Last 128 values    0x80 - 0xff          Extended ASCII: various            ¶, ü, ┝, ┤
                                        non-English characters,
                                        “ASCII graphics”
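The 32-value groups in this table can be checked mechanically. The sketch below classifies a byte by group and uses the fact that uppercase and lowercase letters sit exactly 0x20 apart, so toggling bit 5 swaps case. `GROUPS`, `ascii_group`, and `toggle_case` are hypothetical names, and the group labels are informal.

```python
# Ranges follow the 32-value groups of the ASCII table; labels are informal.
GROUPS = [
    (0x00, 0x1F, "control characters"),
    (0x20, 0x3F, "punctuation, digits"),
    (0x40, 0x5F, "uppercase letters"),
    (0x60, 0x7F, "lowercase letters"),  # 0x7f (DEL) is in fact a control character
]


def ascii_group(code: int) -> str:
    for lo, hi, label in GROUPS:
        if lo <= code <= hi:
            return label
    if 0x80 <= code <= 0xFF:
        return "extended ASCII (meaning depends on the code page)"
    raise ValueError("not a byte value")


def toggle_case(ch: str) -> str:
    """Uppercase and lowercase letters are exactly 0x20 apart, so XOR with 0x20 swaps case."""
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        return chr(ord(ch) ^ 0x20)
    return ch  # non-letters are left alone


for c in "E3{":
    print(repr(c), hex(ord(c)), ascii_group(ord(c)), toggle_case(c))
```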

ISO8859-1, a.k.a. Latin-1


Code Page 437 – the IBM PC character set
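To see that extended ASCII sets are "not consistent", the sketch below decodes the same byte with Python's built-in `latin-1` (ISO8859-1) and `cp437` codecs. The helper name `decode_both` is hypothetical. Both code pages agree on the first 128 values, but they assign different characters above 0x7f.

```python
def decode_both(byte_value: int) -> tuple:
    """Decode a single byte under two different 'extended ASCII' code pages."""
    raw = bytes([byte_value])
    return raw.decode("latin-1"), raw.decode("cp437")


print(decode_both(0xFC))       # same byte, two different characters
print("ü".encode("latin-1"))   # b'\xfc'
print("ü".encode("cp437"))     # b'\x81'
print("Hello".encode("latin-1") == "Hello".encode("cp437"))  # True: 7-bit ASCII is shared
```

In ISO8859-1 the byte 0xfc is ü, while in CP437 it is a superscript n; CP437 places ü at 0x81 instead.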

Unicode family

• Multi-… successor to ASCII - represents characters by "code points" - UTF-8 and other encodings of code points » 1 byte for ASCII » expands to 2, 3, or 4 bytes for other character sets

• Support for many languages - Greek - Cyrillic - Mandarin - and many others
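A quick way to see the "1 byte for ASCII, 2 to 4 bytes otherwise" rule is to encode a few characters with Python's built-in UTF-8 codec. The sample characters are arbitrary choices for illustration: a Latin letter, a Latin-1 letter, a Cyrillic letter, a currency symbol, and an emoji.

```python
# Latin, Latin-1, Cyrillic, currency symbol, emoji (written as escapes to show code points)
samples = ["A", "\u00e9", "\u0416", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}  {len(encoded)} byte(s): {encoded.hex(' ')}")
```

The byte counts come out as 1, 2, 2, 3, and 4; for example, the euro sign U+20AC becomes e2 82 ac.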


partial Unicode code table – see montcs.bloomu.edu/Information/Encodings/unicode.html

Unicode and Emoji

• Different representations for code points

• More than 1700 emojis currently defined
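One reading of "different representations for code points" is that a single code point has different byte forms in each Unicode encoding. The slide does not say this explicitly, so treat it as an assumption. The sketch encodes one emoji, U+1F600 (GRINNING FACE), in UTF-8, UTF-16, and UTF-32 (big-endian, no byte-order mark) using Python's built-in codecs.

```python
grinning = "\U0001F600"  # GRINNING FACE emoji, code point U+1F600

for codec in ("utf-8", "utf-16-be", "utf-32-be"):
    print(f"{codec:10} {grinning.encode(codec).hex(' ')}")
```

UTF-8 gives f0 9f 98 80. UTF-16 needs a surrogate pair, d8 3d de 00, because U+1F600 lies outside the first 65,536 code points. UTF-32 simply stores the number, 00 01 f6 00.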


UTF-8 Encodings

• Unicode currently defines code points U+0000 through U+10ffff - 1,114,112 code points (somewhat over 1 million) in 17 planes of 65,536 each

• UTF-8 uses up to four bytes to represent these code points - the 5- and 6-byte sequences of the original UTF-8 design are no longer needed
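The sketch below encodes a code point by hand from the UTF-8 bit layouts and checks the result against Python's built-in encoder. It shows why four bytes are enough. The 4-byte form 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx carries 3 + 6 + 6 + 6 = 21 payload bits, which covers everything up to U+10ffff. `utf8_encode` and `plane` are hypothetical helper names. Surrogates (U+D800 to U+DFFF) are rejected because UTF-8 does not encode them.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by hand using the UTF-8 bit layouts."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if cp <= 0x7F:                      # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 21 payload bits, enough for U+10FFFF
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def plane(cp: int) -> int:
    """Each of the 17 planes holds 0x10000 (65,536) code points."""
    return cp >> 16


for cp in (0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF):
    print(f"U+{cp:04X}  plane {plane(cp):2}  {utf8_encode(cp).hex(' ')}")
```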

Unicode and UTF-8 – A Few Example Alphabets

Character Set         Code Points          UTF-8 of first       UTF-8 of last
                                           code point           code point
Basic Latin (ASCII)   U+0000 – U+007f      0x00                 0x7f
Latin-1               U+0080 – U+00ff      0xc2 0x80            0xc3 0xbf
Latin Extended-A      U+0100 – U+017f      0xc4 0x80            0xc5 0xbf
Latin Extended-B      U+0180 – U+024f      0xc6 0x80            0xc9 0x8f
…                     U+0250 – U+036f      0xc9 0x90            0xcd 0xaf
Greek and Coptic      U+0370 – U+03ff      0xcd 0xb0            0xcf 0xbf
…                     U+0400 – U+07ff      0xd0 0x80            0xdf 0xbf
Samaritan             U+0800 – U+083f      0xe0 0xa0 0x80       0xe0 0xa0 0xbf
…                     U+0840 – U+10ffff    0xe0 0xa1 0x80       ???
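The byte values for the named ranges in this table can be reproduced with Python's built-in encoder. The sketch below prints the UTF-8 bytes of the first and last code point in each range. `RANGES` and `endpoints` are hypothetical names, and the unnamed "…" rows are left out. Running it shows that Latin-1 starts at 0xc2 0x80. Lead bytes 0xc0 and 0xc1 could only form overlong encodings and never occur in valid UTF-8.

```python
# Named ranges from the table (the unnamed "…" rows are omitted here).
RANGES = [
    ("Basic Latin (ASCII)", 0x0000, 0x007F),
    ("Latin-1", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
    ("Greek and Coptic", 0x0370, 0x03FF),
    ("Samaritan", 0x0800, 0x083F),
]


def endpoints(first: int, last: int) -> tuple:
    """UTF-8 bytes (as hex) of the first and last code points of a range."""
    return (chr(first).encode("utf-8").hex(" "), chr(last).encode("utf-8").hex(" "))


for name, first, last in RANGES:
    lo, hi = endpoints(first, last)
    print(f"{name:20} U+{first:04X}-U+{last:04X}  {lo:>9} .. {hi}")
```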


Bit Pattern   BCD (Binary Coded Decimal)   Unsigned Binary
0000          0                            0
0001          1                            1
0010          2                            2
0011          3                            3
0100          4                            4
0101          5                            5
0110          6                            6
0111          7                            7
1000          8                            8
1001          9                            9
1010          -                            10
1011          -                            11
1100          -                            12
1101          -                            13
1110          -                            14
1111          -                            15

• BCD – scheme for encoding the 10 decimal digits - bit patterns 1010 – 1111 unused

• Two BCD digits per byte - examples: 13 = 0001 0011, 64 = 0110 0100 - the 00-99 range is smaller than the unsigned binary range of 0-255

• Hardware support is more complicated - addition and subtraction are full of "special cases" requiring additional circuitry
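A minimal sketch of packed BCD, with two digits per byte and the tens digit in the high nibble. It includes one example of the "special cases" in addition. Plain binary addition is followed by a +6 correction whenever a digit exceeds 9, which skips the six unused patterns 1010 to 1111. The function names are hypothetical, and the +6 adjustment is one common way to do decimal correction, not necessarily the circuit any particular machine used.

```python
def to_packed_bcd(n: int) -> int:
    """Pack 0..99 into one byte: tens digit in the high nibble, ones digit in the low."""
    if not 0 <= n <= 99:
        raise ValueError("a packed BCD byte holds only 00-99")
    return ((n // 10) << 4) | (n % 10)


def from_packed_bcd(b: int) -> int:
    hi, lo = b >> 4, b & 0x0F
    if hi > 9 or lo > 9:
        raise ValueError("nibbles 1010-1111 are not BCD digits")
    return hi * 10 + lo


def bcd_add(a: int, b: int):
    """Add two packed-BCD bytes; return (result byte, decimal carry out)."""
    lo = (a & 0x0F) + (b & 0x0F)
    half_carry = 0
    if lo > 9:
        lo = (lo + 6) & 0x0F   # +6 skips the unused patterns 1010-1111
        half_carry = 1
    hi = (a >> 4) + (b >> 4) + half_carry
    carry = 0
    if hi > 9:
        hi = (hi + 6) & 0x0F   # same correction for the tens digit
        carry = 1
    return (hi << 4) | lo, carry


print(format(to_packed_bcd(13), "08b"))   # 00010011
print(format(to_packed_bcd(64), "08b"))   # 01100100
print(hex(0x58 + 0x67))                   # plain binary sum 0xbf is not valid BCD
result, carry = bcd_add(0x58, 0x67)       # 58 + 67 = 125
print(carry, hex(result))                 # 1 0x25
```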

Using BCD: a 1979 HP-9845 interfaces with a 1967 HP voltmeter


EBCDIC

• Extended Binary Coded Decimal Interchange Code - BCD is embedded within it

• Alternative to ASCII - 8 bits

• Created for IBM mainframes - suited to 80-column punched cards - support for business applications

• Country-specific versions were not mutually consistent

• Little used today
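Python ships an EBCDIC codec, `cp037` (the US/Canada variant, one of several country-specific versions), which makes two of the points above visible. The digits 0 to 9 encode as 0xf0 to 0xf9, so the low nibble is the BCD digit. The letters also fall in separate runs rather than one contiguous block. The helper name `ebcdic_hex` is hypothetical, and other EBCDIC variants such as `cp500` differ for some punctuation.

```python
def ebcdic_hex(text: str) -> str:
    """Encode text with Python's cp037 codec (EBCDIC, US/Canada variant) and show the bytes."""
    return text.encode("cp037").hex(" ")


print(ebcdic_hex("0123456789"))  # zone nibble 0xf, low nibble is the BCD digit
print(ebcdic_hex("ABCHIJ"))      # letters are not contiguous: I = 0xc9, J = 0xd1
print(ebcdic_hex("RS"))          # another gap: R = 0xd9, S = 0xe2
print(ebcdic_hex("a"))           # lowercase sits below uppercase, unlike ASCII
```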

EBCDIC


EBCDIC table – see montcs.bloomu.edu/Information/Encoding/ASCII-EBCDIC.html

a punched card showing EBCDIC
