Inside ASCII

By Ft. W. Bemer The data alphabet called ASCII [Figure 1. page '38. and STICKS «ll-T ; Reference 111, also has two other names—International ASCII. as a then code. is usually represented in 3 col- Standard 1545 {the 150 Code [Reference 2]} and Alphabet um ns of 16 positions. The row positions are (flit) through No. 5 ol CCI'I'T {the International Consultative Committee 1111. the tow-order 4 hits. C through 15 in decimal. The for Telephone and Telegraph}. It is used throughout the columns are CUE! through 111, the next higher 3 hits. Ci world. incorporated in billions ofdollars of equipment. through ? in decimal. For some reason, the developers But is it used correctly and wisely“? Not alwavs. There of ASCII found it convenient to refer to these eight col- are misinterpretations. and gaps in definition that per- urnns as ”sticks.” So shall we. Each position will he rep- mit nonstandard usage. This article (in three parts] will resented in this article lav its usual decimal represents- give you the background, peculiarities. preferred prac' tion. For example. capital A is position 4“. Figure 2 is a tices. and. new developments. for ASCII. You will find a representation of ASCII that is more convenient to those lot of information not too generally known or realized; it working in octal. rather than hexadecimal, notation. should help in the correct and safe usage of ASCII. For additional help. vou can reference the various national and international standards given in Table 1. Some other detailed articles are listed in References 3, A and S. HlflH DFIDER LE'I'II' CIHDE H BET-At. “h CHSTAL DIGETS EMGIT 1H “'5‘ I'll Fl” {1| I1. I1 “HIT it'll 9.1” HuL'oLE st- ' '1] than-Tma In“ I. IJJ-I‘IFF 'I :Eil1i “*3: "TI- LI tum TIN-4‘1! sot-loot t ‘ Eta-ruff? 5-1"! III-.W fifflittfi tn! W“ I? [LII-HF! In! IVE! {b51957 [Aorist-H use ETI‘. till-Follvt“ 1" "I! ILI-tu-I'HI- .15 Ila-LBJ Ill Int-linking. IT!" I.'§_F§ EDT ‘ lfl'i'fiwl [lei'il III I'll {Pl’lfif‘ll‘ ill-mini IL“- Em c-en nu on 1|. 11- HHA mitt-In to}: ACE 5TH ink [itli'lJEp'l HIM 1‘], “Jul-”hf. 5!. I'MLH “'11 F]..{Illt|.w “:11 ETE accentuate Ipfit 11 rg: 1m lrxmrn 3175 I'M “up-I- haul-Mil. III-I: can: an 4.! Ik3~l'?ri'l:l-l [lira-nor 5!: lb”! ‘ HICH CHEER LCIH' DHUEH {twitter “1'. lo!- M F I : .ill'll "1HEIFI are: ”re: III-twat NIH- I'J l-l- . II- “3"? this“ 1 15”.? H 'II'|-.i".i Ell-PH”? in! far if” 1-! I '“Wifl'l‘f'! twill“ “'5! J LEEEWII it. [SE - Internattonal Stendirds Greenfzatfofi L ECFA - European computer Manufacturers Assoctation ANSI - Airricln Rational Standards Institute H FIFE - Federal Interaction Processlog Standard EBA - Elnadtlfl Standards Associatice H 55 - Brittsh Standard D A5 - Australian Standard CEITT — Eonsultlttve Eennfttee loternattenet, Telcencne E Teardfedh l H5 - Jaaanese IfiGJstrfaL Standard Figure 2. ‘ BUST - USSR Standard Table 1. 95 INTERFACE AGE MA? WES The first positions of sticks it and E are respectively Also in sticks 4-? are three diacritical marl-ts. They are the “commercial at" and “accent grave.” Then the upper accent grave t“! in Etfl. circumflex It} in SIM. and tilde tau} and lower case Ftoman alphabets follow. This ottset of ene in ms. These are called secondary national usage posi- position is historical them the United Kingdom}. and ot tions. In some countries the tilde is a straight overline. no importance as long as you remember that it is so. But it is the circumflex where we have a lot of confu- Foitowihg the alphabet in both sticks 5 and T are three sion. Teletype first made it an “up arrow“ in an earlier positions each that one must be very cautious about. In version of ASCII. to serve as an exponentiation symbol. ASCII they are assigned as [. I". and ] in stick 5 — i. l". primarily for BASIC. But that doesn’t do very well. be- and I in stick T. Eiut in the ISO Code and CCIT’T versions cause the exponentiation for FflFtTHAN is a double they are reserved for national usage. Table It gives the asterisk! The FDRTHAH version is preferable in France. national use assignment tor these positions. Surely you certainly. because they use such words as crane. cote. remember that the Scandinavian alphabet has 29 letters. cout. and so on. not 26'? My friend Grier Heen in field is very protective of A companion problem exists in position bits. with the these positlons. He says ”If you Americans want to sell underscore. The underscore is neither national nor dia- computers and software abrcad. don't use the ASCII critical; all countries use it just as underscore {and for characters for these positions in your software.” typesetting it is a US. convention to indicate italics. but To be more precise. position-s E11. E12. 5i13. Till. in Italy it means beldface. except when it is the last THE. and 1113 {noted above] are called primary national character in a line!"l. But Teletype's early version of usage positions. Se is 4.13. where ASCII has the “com- ASCII used it as a ”left arrow” — probably for an assign- mercial at.” Honeywell. for example. uses the “at” in its ment symbol equivalent to := in ALGDL. The up and left timesharlng systems for deleting the previous character arrow have been carried over from Teletype into many upon entry. But this isn't too serious. because many na- video terminals. Ask your terminal manufacturer to cease tions also have the “at” in their primary sets. and desist and retrofit. It's not ASCII and will only cause trouble forever. currency tat T nah-mil 7 du- die [fl 3" rial-twat till The last character in sticks 4-? is the Delete. symbol at: an: axe 5H] site sets emf etc MT Hz F'r'lJ DEL. in position THT. It was put here because the binary Metrerlmtli-J - I ' . | "1 f I code is 1111111. which would be all punched holes in Australia I ‘ t . Hull mm all ‘ perforated {not always paper!) tape. and that is the only t'I" 4r'mafiy— A t. US way to make sure that it cannot be misread as some Japan *— other character- ASCII is a complete set; all positions Uh it C lllly -; I. I are assigned to have meaning. fimttlr lend ml- Fun“ ~ ill. :- HE‘SP i...__. 2311ch 2-3 . Heme r term — E These are usually called the sticks for digits and spe- Helium —H _._._.__ I when — B 1 It. cials. Remember that they are the “digits" El 1:! 9: not Emu-Ithaca EH numbers. not numerals. not anything but digits! They llfly— E l t IIII‘II'I Eva-fireman —»t: are in am through SIB so that the Iowverder it bits are the Hungary l fr representations for packed decimal. Griginally we con- it: Germany — B i“ t- It: Fm Svr-tnrllnd =-D r-gau—A- sidered the possibility of a special it+bit set for numeri- hug-.- Semen I ' I £‘—L-.-- entry in Table Ia}. but it Ftfiufid H cal applications [see the fifth a. Denmark h out that computer hardware became inexpensive "i‘ri‘niL‘ifilfii-II turned RIPE-ii Helm-y :ga-z-a-J-v- Slum mifieafififlfl‘nfififiz enough to not deprive ourselves of the extra capabitities of the Tabit and 3-bit sets. Table 2. H111“ TQM i‘NTEfiFACE AGE 9? hbfhlhi. hhnhabu mm 0001 [—J 1:] l3 T123 1131:} El '1 C] {—1 [Jr E“ 959 E3 [:1 SD CICI x = SC! IZJIZI D L] II E] D E! I] 3 LS] Hit: 1 TH“ H? pm mm in um sun In: mm unl- - 2 in: :unmr. 'r' prim-*1: nil-anal * fl‘ilfl.flld3iflml'¥ .‘ which if! ducrihnl my, I when fluid-d hr HEP 1hr Humilarvkm Win an [rum in the m hum Figural 1. 93 FM TEHFAGE AGE HAY 19H Position as is officially called "space.“ I don't and HANDPHIHTIHG FDR STICKS 2-? didn't like it. and would have prelerred ”blank.” Which is The classical contusion for many years was “between why the IBM community often uses a lower case "bee” the digit zero and the letter “oh.” but there are other with a slash through the vertical as its symbol. From the possibilities for contusion. American Standard 323.45 Univac side. the space has the official symbol “delta." specifies the handwritten character shapes shown in Having mentioned packed decimal. where two digits Figure 3. go into each 8-bit group (”byte” to the American. “octet" to the French}. a word of caution on the plus and minus signs — they are in stick 2. rather than stick 3 with the digits. But the low order 4 bits are distinct. and + should be used only as tbtt. — only as 1101. I mention this because the nonstandard code EBCD’IC permits multiple representations of + and — in packed decimal. And the ASCII representations are not even coincident NB CDEEEHII with any of these. with obvious dangers! Watch out for the “currency" positions, 33 and a4. They also have national variations. ln ASCII they are we tomarily it and S. but there are some things to be remem- bored: III is not “number sign” for many countries. most of which use ”lilo." or ”Mr." for that purpose. And when it is “number." it must precede the digits.

Inside ASCII

End-Of-Line Hyphenation of Chemical Names (IUPAC Provisional

ISO Basic Latin Alphabet

The Stata Journal

Unicode and Code Page Support

SAS 9.3 UTF-8 Encoding Support and Related Issue Troubleshooting

The Not So Short Introduction to Latex2ε

Basis Technology Unicode対応ライブラリスペックシート文字コードその他の名称 Adobe-Standard-Encoding A

JS Character Encodings

CONTENTS 1. Introduction...1 2. RPL Principles

Solaris Internationalization Guide for Developers

Unicode Unicode

Nuts 'N' Bolts: Legal-Writing Mechanics—Part I Gerald Lebovits