Writing Gregg with METAFONT and LATEX

Stanislav Jan Sarmanˇ Computing Centre Clausthal University of Technology Erzstr. 51 38678 Clausthal Germany Sarman (at) rz dot tu-clausthal dot de http://www3.rz.tu-clausthal.de/~rzsjs/steno/Gregg.php

Abstract We present an online system, which converts English text into Gregg shorthand, a phonetic pen used in the U.S. and Ireland. Ÿ Shorthand is defined [2] as a method of writing de- two forms± of signs exist for§ s and th¨: the right (s) , rived from the spelling system and the script of a (th) and the left (S) , (Th) , respectively. particular language. In order to approximate the Blended consonant signs: speed of speech: (ld) (nd) (rd) (tm) (tn) (td)(dd) • The graphic redundancy of written characters dn•¹ º ­ and ligatures derived from longhand letters is reduced. can be thought of as built-in ligatures. signs: • Silent letters are ignored, and complex ortho- for sound(s) as in graphic rules are simplified. The spelling only ¢  [a] at, art, ate serves as a reference system of what is largely / ‹ phonetic writing. [e]/[i] x pit, pet, peatŽ  (o) pot, port , pour • Abbreviations are utilized. à À ¿ Gregg shorthand was first published in 1888. In (u) tuck, took , tomb the following three sections we want to show how its Elementary (triphone) signs: inventor J. R. Gregg has applied the above general for soundsJ as in principles in the creation of his shorthand system [ai] high $ and how the current system version, the centennial [aee] O client ( edition [7], can be implemented in METAFONT [5]. [iie] create When writing shorthand the curvilinear motion 1 Simplified writing of a pen in the slope of longhand traces ellipse or 1.1 The of Gregg Shorthand oval-based curves. We shape these elementary short- METAFONT 1 hand signs in as continuous curvature Signs of consonant phonemes written forward: splines with zero curvature at the end points. (k) (g) (r) (l) (n) (m) (t) (d) (th)(Th) XB“amfª)±¨ Half-consonants h, w, prefixes and/ suffixes are realized as markers, e.g. a dot over [i] such as Signs of consonant phonemes written downward: H in “he”; means preceding h; another dot under (p) (b) (f) (v) (sh) (ch) (jh) (s)(S) ª I (along) the sign (t) such as in denotes the „2Å WŸ§ suffix -ing in “heating”. The signs of voiced consonant phonemes are larger 1.2 Metaform versions of their unvoiced opposites. As the “mini- mal pairs”2 of words with phonemes s vs. z, th vs. Gregg shorthand is an alphabetic system — written dh and sh vs. zh are very rare, there are only the words are combined together from elementary signs unvoiced versions of these signs. On the other hand by joining, so that in general the Gregg shorthand glyphs have the following metaform: 1 Phonemes are denoted in typewriter type, their signs { ({p,}C{,s}) | [+|-][{p,}V {,s}] }+ are parenthesized or bracketed. where V are circular , C denotes (blended) 2 such as face vs. phase, sooth vs. soothe, mesher vs. mea- sure, pronunciations of which differ only in one place consonants and vowels u and o, optional p and s

458 TUGboat, Volume 29 (2008), No. 3 — Proceedings of the 2008 Annual Meeting Writing Gregg Shorthand with METAFONT and LATEX

stand for markers of semiconsonants, prefixes or suf- κ1 κ1 κ κ fixes. In more detail: 0 0 V ::= a | e | i | ai | aee | iie (r) (k) (r)(k) C ::= b | d | dd ... o | u (2) p ::= h | w | ... over | under ... Figure 1: An example of a G continuous s ::= ing | ly | ... CC joining Examples: metaform H 1.3.2 Joinings with circular signs he -[h,i,] I heating -[h,i,](,t,ing) q The signs of consonants can be classified into seven need (n)-[i](d) r categories. Looped vowels before or after a conso- needed (n)-[i](dd) s nant are written within the curves or combined with needing (n)-[i](,d,ing) straight consonants as right curves, so that the fol- The metaform corresponds directly to the META- lowing VC, CV prototype joinings can occur: FONT X“fª „2 program code, e.g.: (n)-[i](,d,ing) ↔ I(,n,); V(-1,,i,); C(,d,ing); £0 i.e. the shorthand glyph of this particular character VC is initiated with (n), then the signs of right vowel Y”h¬¢ 3 CV -[i] and the sign of consonant d with suffix ing are appended. Observe the realization of as VC or CV joinings, too: 1.3 Joining ƒ for soundsL as in We distinguish between joinings where circular signs -[a](u) | howÁ are involved and joinings without them. (o)+[i] 1 toy 7 -[i](u) few 1.3.1 CC Joinings All VC or CV joinings are done in such a man- Examples of CC joinings ordered by their connec- ner that both components have identical tangents (0) (2) tivity grade G –G follow: and curvature equal to zero at the connection point. - ~ U (2) continuous: (d)(n), (u)(n), (t)(S) As an example of G connecting of a consonant (with turning point!) ¢sign with a vowel signX consider prepending£ the sign 2 a = of -[a] before (k) as in : tangent„ continuous:“  + à =„ Ä(f)(l), + = (p)(r), + = (u)(p), 0 −1/r −α B Å F α r + = (g)(v) 0 0 “X Ì r curvature continuous: + = (r)(k) (k) -[a] -[a](k) The first two types of joinings are handled by META- (2) FONT means; G(2) continuous connecting can be Figure 2: An example of a G continuous done only in special cases using Hermite interpo- VC joining lation for B´eziersplines, as follows. t Here the spline approximation of (k) is such If two endpoints zi, unit tangents di, didi = 1 that the curvature equals zero at the end points; and curvatures κi, i = 1, 2 are given, the control + − this is the case with other consonants, too. The points z0 = z0 + δ0d0, z1 = z1 − δ1d1 have to be determined from the following system of nonlinear spline approximation of right circle with radius r equations [3]: for the sign -[a] consists of 8 segments such that the curvature κ = 0 at the top point and κ ≈ −1/r (d × d )δ = (a × d ) − 3/ κ δ2 (1) 0 1 0 1 2 1 1 at the other points. Let α be the angle of the tangent 3 2 (d0 × d1)δ1 = (d0 × a) − /2κ0δ0 (2) in the zeroth point of (k). Placing the -[a] circle at (a = z1 − z0). the point with distance r from the zeroth point of If the tangents are parallel, i.e. (d0×d1) = 0 and (k) and rotating it by the angle of −α a curvature the curvatures κi have opposite signs, the equations continuous joining is obtained. (1)–(2) are trivially solvable, so that G(2) joining is For a G(2) continuous spline approximation of possible (here after some kerning): the unit circle with curvature κ= 0 in one point we

TUGboat, Volume 29 (2008), No. 3 — Proceedings of the 2008 Annual Meeting 459 Stanislav Jan Sarmanˇ

use the traditional B´ezierspline circle segment ap- 2 Phonetic writing th proximation for the segments 1–7 (see the 7 seg- Gregg shorthand uses its own orthography, which 1 ment on the left) z( /2) = 1, so that δ0 = δ1 = must be acquired by the shorthand learner and in 4 4 /3 tan(θ/4) and κi = −(1 − sin (θ/4)). Using this a system such as ours, the pronunciation of a word th value as left point curvature of the 8 segment (see has to be known. We use the Unisyn multi-accent on the right) and demanding κ1 = 0, the Hermite lexicon4 with about 118,000 entries [4] comprising interpolation with the equations (1)–(2) for the un- British and American spelling variants. The fields knowns δ0 und δ1 can be solved for segments with of lexicon entries are: spelling, optional identifier, ◦ θ < 60 . part-of-speech, pronunciation, enriched orthography, and frequency. Examples are: κ = 0 + −1 owed;;VBD/VBN; { * ouw }> d >;{owe}>ed>;2588 z0 z1 z1 live;1;VB/VBP; { l * i v } ;{live};72417 live;2;JJ; { l * ae v } ;{live};72417} κ1 ≈ −1 a z1 z0 Homonyma cases such as “live” above, in which κ0 ≈ −1 − the pronunciation helps to identify word meaning z1 are much more rarer than the cases in which the a + use of pronunciation yields homophones resulting in z0 shorthand homographs. Consider the very frequent θ θ right, rite, wright, write: { r * ai t } with short- z0 › κ0 ≈ −1 hand glyph (r)+[ai](t) or the heterophones read;1;VB/NN/NNP/VBP; { r * ii d };{read};94567 Figure 3: 7th and 8th segment (θ = 45◦) of our unit read;2;VBN/VBD; { r * e d };{read};94567 right circle spline approximation – both written as (r)+[e|i](d). Thus phonetic writing may speed up the shorthand recording, but For CVC joinings 7×7 cases are possible in the newly-created homographs complicate the deci- all — many of them can be transformed by reflection phering of written notes. and rotation into one another. All these joinings can The pronunciation of a word has to be trans- be made G(2) continuous. It must be decided before formed to the above defined metaform, e.g.: writing a right consonant, whether the loop is to be needed: { n * ii d }.> I7 d >⇒ (n)-[i](dd) written as a left or¯ as a rightl loop, compare: needing: { n * ii d }.> i ng >⇒ (n)-[i](,d,ing) team (t)+[i](m) vs. (m)-[i](t) meet This major task consists of a number of trans- X“fª „2 formations done in this order: CVCX Z_]`^[ • Elimination of minor (redundant) vowels such “ š—œž ™˜ as the schwa @r in fewer: { f y * iu }> @r r >. f koijt vu There is a general problem with the schwas (@). Which are to be ruled out and which of them are ª µ°¯«®¸¶ to be backtransformed5 to the spelling equiva- "!#£¦ ¥¤ lent? Consider „ ‡ˆŒ†Š‰ data: { d * ee . t == @ } ⇒ (d)-[a](t]-[a] upon: { @ . p * o n } ⇒ (u)(p)(n) 2 96:<; 58 Both @ are backtransformed, but stressed o is cancelled. Technically speaking the shorthand glyphs are • Finding of prefixes (un-, re-, . . . ), suffixes (-ing, realized as -ly, . . . ), handling of semiconsonants h, w. (0) (2) • an array of G –G continuous METAFONT- • Creation of ligatures such as dd, parenthesizing paths and consonants and bracketing circular vowels. • an array of discontinous marker paths. • Finding the proper orientation of loops. Slanting (default 22.5◦) and tilting3 of charac- These tranformations are done through a series of ters is done at the time of their shipping. cascaded context dependent rewrite rules, realized 4 which itself is a part of a text-to-speech system 3 G(2) continuity is invariant under affine transformations. 5 by a lex program

460 TUGboat, Volume 29 (2008), No. 3 — Proceedings of the 2008 Annual Meeting Writing Gregg Shorthand with METAFONT and LATEX

by finite state transducers (FSTs). An example of are entered in the form latex (i.e. latex#1) and such a rule is latex#2, respectively. " I7" -> 0 || [ t | d ] " }.>" _ " d >" The task of the online conversion of given text which zeroes the string " I7" in the left context of to shorthand notes is done in the following four consonant t or d and the right context of "d >"; i.e.: steps: { n * ii d }.> I7 d > ⇒ { n * ii d }.> dd > 1. The input stream, stripped of LATEX commands, Effectively this rule creates the ligature (dd) and is tokenized.8 There are two kinds of tokens: can be thought of as the reverse of the phonetic rule • Punctuation marks, numbers and common " e" -> " I7" || [ t | d ] " }.>" _ " d >" words for which a metaform entry in the which determines the pronunciation of perfect tense abbreviation dictionary of brief forms and suffix > e d > in the left context of t or d: phrases exists and { n * ii d }.> I7 d > ⇐ { n * ii d }.> e d > • the other words. 3 Abbreviations 2. At first for a word of the latter category its The Gregg shorthand, like every other shorthand pronunciation has to be found in Unisyn lex- system since (43 B.C.), takes advan- icon [4]. From this its metaform is built by a tage of the statistics of speaking6 and defines about program coded as the tokenizer above in the 380 so called brief forms. Here are the most com- XEROX-FST tool xfst [1]. mon English words: 3. In a mf run for each of the tokens using its the and a to of was it in that metaform a shorthand glyph (i.e. a METAFONT ´ »yÇTP ³ character) is generated on-the-fly. (th) (nd) (a) (t)(o) (o) (u)(S) (t) (n) (th)-[a] 4. Then the text is set with LATEX, rendered in the Writing the most common bigrams together, e.g.: pipeline → dvips → gs → ppmtogif and sent ofz the in R the to ¾ the on } the to ¼ be from the server to the browser. PDF output with better resolution can be generated. Also a djvu-backend exists, which produces an anno- (o)(th) (n)(th) (t)(o)(th) (O)(n)(th) (t)(b) tated and searchable shorthand record. PostScript spares lifting the pen between words thus increasing Type 3 vector fonts can be obtained, too. shorthand speed. These bigrams are entries in the dictionary of References about 420 phrases as well as the following examples:  [1] Kenneth R. Beesley and Lauri Karttunen. Finite as soon as possible: -[a](s)(n)(S)(p): State Morphology. CSLI Publications, Stanford, N 2003. if you cannot: +[i](f)(u)(k)(n): [2] Florian Coulmas. Blackwell Encyclopedia of M Writing Systems. Blackwell Publishers, 1999. if you can be: +[i](f)(u)(k)(b): [3] C. deBoor, K. H¨ollig,and M. Sabin. High accu- ² racy geometric hermite interpolation. Computer thank you for your order: (th)(U)(f)(u)(d): Î Aided Geometric Design, 4:269–278, 1987. you might have: (u)(m)-[ai](t)(v): [4] Susan Fitt. Unisyn multi-accent lexicon, The abbreviation dictionary is coded in lexc [1]. 2006. http://www.cstr.ed.ac.uk/projects/ unisyn/. 4 text2Gregg [5] Donald E. Knuth. The METAFONTbook, vol- Our text2Gregg software is an online system,7 which ume C of Computers and Typesetting. Addison- records input text as Gregg shorthand. LAT X input Wesley Publishing Company, Reading, Mass., E th notation is accepted. 5 edition, 1990. Homonyma variants such as [6] Stanislav Jan Sarman.ˇ DEK-Verkehrsschrift b mit METAFONT und LAT X schreiben. Die latex;1,rubber; { l * ee . t e k s } E c T Xnische Kom¨odie, to be published. latex;2,computing; { l * ee . t e k } E [7] Charles E. Zoubek. Gregg Shorthand Dictionary, Centennial Edition, Abridged Version. Glencoe 6 The 15 most frequent words in any text make up 25% Divison of Macmillan/McGraw-Hill, 1990. of it; the first 100, 60%, . . . . 7 see project URL and also DEK.php for the German short- 8 Care has to be taken for proper nouns, which are under- hand DEK counterpart [6] lined in Gregg shorthand.

TUGboat, Volume 29 (2008), No. 3 — Proceedings of the 2008 Annual Meeting 461