<<

DOTSYS II1: A Portable Translator

J. . Sullivan, The Mitre Corporation

and then the right. Each position may have Background a dot or not; thus there are 26=64 possible combinations. In a given piece of text, there The subject of braille translation will typically be a number of cells that is not a widely familiar one, so that a stand one-for-one for a character in the brief introduction to that topic would inkprint counterpart. However, the general seem to be in order before taking up rule is that a group of one or more braille DOTSYS III. Most people know what braille cells will correspond to a group of one or is in general--a coding system employing more inkprint characters. In order to re- raised dots so that the sense of touch duce the space required by braille and alone suffices to read. However, contrary correspondingly improve the reading rate, to the impression one gets from those the rules are so contrived that the braille little cards, the most widely used codes group is almost always shorter. The chief are not "substitution ciphers"--that is, device for doing this is the contraction. the braille equivalent of a given "ink- One-hundred-elghty-nine commonly-occurring print" text is generally not a simple letter groups have a short braille form transliteration but rather a kind of that substitutes for full spelling in some translation. This is because the rules circumstances. Generally, these circum- for transcription involve not only the stances are dictated by the fact that a spelling of words, but also their syllabi- given braille sign may have different mean- fication and pronounciation. And since ings depending on context--e.., the sign these occasionally vary with the meaning for a period and the letter-group "dis" are of the word (e.g., the verb "do" and the one and the same, so that a contraction may musical note "do") the transcriber must be made when the letters appear at the be- understand what he is transcribing, at ginning of a word (e.g., "dislike") but for least at a superficial semantic level. the obvious reason full spelling must be This is characteristics of translation, used when they occur at the end ("Chary- although of course the process is not bdis"). Braille is riddled with such con- nearly as difficult as the translation of text dependencies, including "case" and one natural language to another. "escape" codes to permit representation of inkprint effects such as italics and to The actual rules of braille will not shift interpretation modes--even the digits be treated in any great depth here; the use the same sign as the letters a - . various codes differ from language to lan- However intricate rules covering these guage and, even within a language, from cases may be, they are still mechanical and situation to situation. In this country thus not the worst complication in braille there are three standard codes: one for translation. The difficulty lles in a general, "literary" use [i], one for sci- group of rules that forbid use of contrac- entific notation [2], and another for mu- tions where syllabification or pronuncia- sical notation [3]. A fourth standard [4] tion would likely be thrown off; for ex- covers formatting rules. In addition to ample, the "the" contraction should not be these, a number of more-or-less widely used in "sweetheart" because the letter- used codes exist, ranging from simple group straddles a strong syllable boundary. tranliteration codes such as "grade I" and And, as previously noted, syllabification "computer braille" to a braille shorthand. and pronunciation cannot always be deter- Unless otherwise indicated, we shall be mined without ascending to the level of referring to the literary code; sometimes meaning. called grade II braille. This code re- quires some sixty pages of rules to define. General Function of DOTSYS III The basic unit of braille is a "cell" comprising six dot positions, three posi- DOTSYS III is a computer program to tions high and two wide. These are label- translate inkprint into braille as auto- ed i-6, counting downward first on the left matically as possible. But since the

398 determination and representation of deep DIS I 6 3 50 semantics is beyond the current state of the programming art--at least for natural In order for an entry to qualify, three languages used in unrestricted context--it distinct tests must be passed. First, the is clear that a program to produce perfect left end of the window must match the en- braille automatically is not technically try as far as the "I" • Secondly, the feasible. DOTSYS IIl represents a compro- next character beyond this point must fall mise between the perfect and the possible. into the "right-context class" designated The occasional flaws in its translation by the second field of the entry. In this can be removed by human intervention or, case, "L" has been defined (by another in some situations, simply left in as an table-) to mean "letter." Thus it is clear acceptable level of "misprints." at a glance that the first two qualifica- tions are met. The third qualification is The program proper does not embody determined by the "input class"--in this the rules of , but rather a case, class 6. This class selects an en- generalized process appropriate to the try in a decision table and in turn this larger inkprint-to-braille problem. A set entry imposes a boolean test on a set of of tables specify the process for a parti- state variables. Roughly speaking, the cular code or related group of codes. A test is a disjunction of conjunctions. The single set of tables (with many versions~) state variables themselves are simply true- presently implements English Braille, with false indicators deriving "meaning" only intermixed Grade I and computer braille from the logic of a particular set of permitted• It appears that a table for tables. For the literary tables, the set Grade II can be implemented of state variables and their approximate quite easily. This flexibility of changing meaning is as follows: translation rules by altering tables is motivated not only by the possibility of I. after the start of a number switching codes but also by the expectation 2. after the start of a word that, even for a particular code, a long process of refinement (with occasional 3. grade i translation special-purpose temporary modifications) 4. inside a quotation will take place. 5. inside italicized text Thus the program was designed with 6. not at the start of a prefix two primary goals: fidelity to the standard or stem braille rules and flexibility to adapt to the various codes, their interpretations, 7. part-way through a word or and application contexts. Two additional phrase too long for one entry goals were: "portability" of the system 8. just after a space or A-J, fol- from environment to environment, and com- lowing a number prehensibility of the system so that it would be a known quantity to its users, Getting back to our example, the decision and to facilitate local adaptation. It was table entry for class 6 will demand that chiefly with portability in mind that COBOL states 2, 3, and 7 all be "off"--i.e., we was chosen as the system programming lan- must be at the beginning of a word, in guage [ii], even though languages more grade 2 translation, and not looking for capable in other respects were available. the second part of a double entry• All these will apply in this case, so the entry qualifies. Had it not qualified for any The Alsorithm reason, then the translation table search would have continued; and if no entry The basic algorithm is a modified qualified, the single leftmost character finite-state process devised by Dr. Jonathan would be translated according to an assum- Millen [5]. It is perhaps easiest to ed (default) entry. understand the algorithm by observing it in action on a particular case, using the Once an entry is accepted, three English Braille tables. actions take place. First up to four braille signs and/or program control codes First, let us assume that some text are issued. In this case "50" selects the has already been translated. The input, dots-2-5-6 cell representing the contrac- seen through a 10-character "sliding win- tion "dis" Second, the window is shifted dow" (underlined), looks like a certain number of places, in this case 3:

• THIS DISEASE IS NOT SERIOUS • THIS DISEASE IS NOT SERIOUS .

The first step is to locate a qualifying Finally the input class selects an entry entry in the main translation table, which in the transition table. This entry deter- in this case contains about i000 entries. mines the new setting of each state vari- In principle, though not in actuality, the able: set, reset, changed or unaltered. table is searched serially• The entry in In this case state variable i, 7, and 8 question is: will be left unchanged.

399 The input class is thus the key to much of the program was running to consider the state setting and testing logic. It a strict restructuring along these lines. tends to be the same for most contractions Nevertheless, these structural modularity of a given type--class 6, for example, is concepts are present in sufficient degree characteristic of those contractions used to form a model for explanation. only at the beginning of a word. The first module is the primary input It may be noted at this point that processor. Its main Job is to accept the "disease" is one of those odd words whose inkprint text stream, discarding certain ex- contraction depends on meaning. When used traneous control markings (such as record in the usual sense, the next two letters boundaries) and filler blanks, and produce (ea) should be contracted as a dot-2 cell. the "window" for the translator. When However, when used in the archaic sense of "self-checklng" is in effect (see the next "lack of ease," the following rules combine section), the input processor also computes to forbid the use of the contraction: the "correct translation" of each inkprint word and places it on a ring buffer for Rule Par. 42: "The lower-sign con- later comparison. tractions for ea and (certain others)... must never begin or end a word." Hence The second module is the translator "ease" is spelled out. itself. The main logic has already been discussed. The output of this processor is Rule X Par. 34 (1): "... a contrac- simply the stream of braille signs and spe- tion must not be used where the usual cial control codes exemplified by the "50" braille form of the base word would be al- in the sample shown. tered by the addition of a prefix or suf- fix." [1] An important submodule of the trans- lator is the main table look-up, which im- Naturally, the standard table will plements a logically serial search but contract the ea. If the input preparer capitalizes on the fact that whole portions knew that the archaic meaning was intended, of the table can often be bypassed when he could flag the input to inhibit contrac- failure to match the window contents occurs tion (as will be discussed later under on a particular entry. There is an input "Special Features") or, if a large work ordering restriction: entries whose first used the term so often that this approach letters are alike must be grouped to- would be irksome, another entry--perhaps gether. However, this does not reduce the generality of logic supported by the table. DISEASE - 6 6 5~17~114 Using at most one extra pointer per entry, ""chains" are set up so that all first en- inserted before the DIS entry, woul~ suf- tries for characteristic second-letter fice to alter the logic for Just this word groups are linked and so forth. A signifi- and its derivatives. cant reduction in search time, at minimal cost in space, is effected by this tech- nique. Orsanizatlon of the Program A third module, the stacker, operates Although the process Just described on the codes issued by the translator. In is the heart of the matter, there are a the simplest case, these codes are merely good many other details for the program to accumulated until the code for the blank attend to: input handling, output control braille sign is received--i.e., a braille for a variety of devices, llne and format- word is finished. This word, constituting ting including running titles, pagination one entry in the stack, is normally then and several special formats, and provision removed from the stack and issued as output for several special braille representation to the llne composer module. In some cir- rules. Of these special rules, one of the cumstances, two or more words may be held most troublesome is the requirement that in a stack before release. This may be to a numeric quantity followed by a unit of effect order reversal, or to group words for measure must be represented in a standard running titles or centered headings. If form with the measure first followed by "self-checking" is in effect, it is the the quantity. The order reversal is con- stacker that compares the actual translation trolled by the tables but provision to with the correct translation left on the perform this operation must be supplied by ring buffer by the input processor. The the program. ring buffer allows for the logical lag in the various processing stages, and permits The program comprises five modules, recovery from synchronization problems that or, if you prefer, "levels of abstraction." occur when, as occasionally called for in The processor at each level is organized the rules, several inkprint words are run around some characteristic resource, and together into one braille "word." operates as a logical coroutine, sharing narrow interfaces with the other levels. The line composer, the fourth module, At least, that is the basic Idea--or operates on the braille words produced by rather, the idea that crystallzed after too the stacker. It concerns itself with

400 arranging the words on a braille llne, Direct Translation Control starting a new line or new page when nec- essary, page numbering and titles, and so Although not strictly a function of forth. the program, the standard tables provide a means for inhibiting a contraction that The fifth module, the output writer, would otherwise be made, or for forcing a is actually a collection of proGessors, one contraction that would normally not occur. for each distinct mode of output, as listed In most cases of mlstranslated words, this in the next section. is the easiest way to correct the braille.

SPECIAL FEATURES Self-Checkin 5

Formattin 5 Early in the development of DOTSYS Iii, it became apparent that some automatic The program incorporates a number of means was necessary for checking on the special formatting options, including auto- quality of braille translation achieved by matic pagination and numbering, centered a given table--and especially after a titles, and a "poetry" (hanging indent) series of changes, to see that no unwanted format. An extensive system of "typed" side effects were introduced. This need tabs permit easy alignment of columnar ma- was met by preparing an input text contain- terial on the left, right or decimal point ing 5808 "problem" words used in training within the entries. In general, the selec- human transcribers [7], with special mark- tion of options, the setting of modes and ings to indicate the correct translation. tab values, and the issuing of direct for- The program compares the normal output with mat controls (e.g., "skip to new page") is the correct braille, flagging misses with accomplished by commands embedded in the a code in braille and a large TSK on the input text. proof output.

Output Devices HISTORICAL NOTES AND WORK TO BE DONE

At present, the programcan drive five DOTSYS III is by no means the only output devices, or any combination. Two of braille translation program, although it these are "printer braille": a form of is one of very few that attempts to pro- embossing accomplished by printing periods duce braille in accord with the existing on a standard line printer with a resiliant standards. The programs at the American backing inserted between the paper and the Printing House for the Blind [9, 12, and platen. The periods dimple the paper so 17] were in use for some years prior to that the reverse is a crude form of braille. DOTSYS, but unfortunately they are in With a printer set for normal spacing the machine language and hence not portable. size of the braflle cell is far off the A later PL/I program by Dr. Lois Leffler standard, but a special IBM modification, [I] also adheres to the standard. in conjunction with slightly different pro- gram output logic (hence the two "devices"), Those programs that deviate from the can bring the spacing back into acceptable standard usually do so for eminently prac- range. This is the technique used by the tical reasons, for as we have seen the Atlanta Public Schools. braille rules are not designed for easy automation, and in many situations quantity A hlgh-speed embosser developed at and speed is simply an overriding consid- MIT [6] can also be driven directly by the eration. Beyond this, there are those who program. In a typical configuration, the argue that the rules are not even well program is running under a time-sharing designed from the human standpoint, and system such as Interactive Data Corporation' that a major redesign friendly to man and virtual machine system, with the embosser machine alike is called for [14]. Despite slaved to the user's terminal teletype. much technical merit in such proposals, it The teletype can be used for entry and is clear that the investment already sunk editing of text as well as an intermediary into training and literature inventory for the embosser output. (and--yes--computer programs~) will weigh heavily in favor of the existing standards, One form of output, the "proof," is with perhaps some local improvements, for intended for the use of a sighted editor. some time to come. The proof contains an echo of the input and the resulting braille printed as dots. The development of DOTSYS III was carried out at MITRE under the direction of Finally, a card output, containing Robert A. J. Gildea, presently chairman of numeric codes for the braille signs, is SIGCAPH, under contract with the Massachusetts provided for general interface purposes. Institute of Technology Sensory Aids Center and the Atlanta Public Schools Computer Braille Project. Professor Robert Mann of MIT and Dr. Marion Boyles of Atlanta

401 directed the project at their respective Flexibility of the tables is a built- establishments [16]. Messrs. Vito Proscia in fact, but flexibility of the program it- and George Dalrymple of MIT, and Robert self--to add new formatting capabilities, Lagrone and Donald Bell of Atlanta, were or new output devices--appears to work out also instrumental in various phases of the only for the original programmers, as in project, especially in helping to improve the case of the MIT embosser. This speaks the translation quality while integrating unfavorably of the program's comprehensi- the program into their operating environ- bility, which seems to be the chief user ments. complaint despite extensive functional documentation [10]. Some of the difficulty The original technical development may no doubt be chalked up to the mutual was Dr. Jonathan Millen's DOTSYS II program interference of three "cooks" working on [8]. This pilot program implemented the one program, or the fact that none of the basic algorithm, and even with a relatively three had programmed in COBOL before (and simple set of tables produced braille good they haven' since), or the verbosity and enough that further development into a pro- lack of block structure in COBOL, or simply duction capability was clearly in order. to the fact that perhaps 80% of the present Mr. . Reid Gerhart and the author there- 1600 lines of code is retrofitted. In the upon set out to refine the table and to re- opinion of the author, however, the basic trofit the program with most of its present problem was a very familiar phenomenon: formatting capability, including units-of- the proper structure of the program was measure order reversal. In the process, the not--and could not, as a practical matter-- table search was improved to its present be recognized until the point was past form and self-checking was introduced. Some where a total restructuring was thinkable. modularity was introduced into the output Deadlines, a tight budget and a bird in section, so that new device types could be hand in the form of running code can con- introduced easily. In fact, the MIT spire in a sort of tyranny over one's Braillemboss module was not added until options despite what is obviously desirable after the first delivery of the program to technically. Atlanta in August of 1970. That brought the program to essentially its present In any event, this issue has been form, although a number of minor fixes and revived in the form of an effort, under improvements have been made since, many by Air Force auspices, to rewrite DOTSYS III the users themselves. In this regard as a case study in the principles of Dr. Judy Gunnerman and Messrs. Philip Bagley structured programming. and John Covlci of Information Engineering deserve special mention for considerable Finally, plans are being made to work with the program, including the pre- expand the scope of application from the paration of some additional program logic English literary code to other languages documentation. and possibly the scientific (Nemeth) code. The former will probably require no more As expected, evaluation of the tables than a table change in most cases. For is a continuing process. Most of the en- the latter, a preprocessor will probably tries are determined in the light of the be necessary to convert from a reasonable hindsight arising from a particular mis- input language, yet to be designed, into translation. A table of contexts for a text that defines complicated formats contraction use [15] is typically used to unambiguously and is acceptable to DOTSYS aid in making the table entry specific to III and some set of tables. the desired cases, and the self-checker is used to verify that no bugs have been in- troduced. REFERENCES

It might be well at this point to 1. En$1ish Braille~ American Edition 1959. examine how well DOTSYS III has met its Revised 1968. Americ'an Printing House goals of fidelity, portability, flexibili- for the Blind, Louisville, Kentucky ty and comprehensibility. As for fidelity, 1969. there seems to be general agreement that 2. The Nemeth Code of Braille Mathematics a consistently high quality braille is and Scientific Notation. American achieved. Even in the 5808 "problem" word Printing House for the Blind, Louisville, text, for example, fewer than 200 mistrans- Kentucky 1965. (Revision to be pub- lations were made--most of them far-fetched lished in 1973) concoctions such as "abecedarian" and the archaic "disease." In more typical text, . Spanner, . . (Compiler). Revised the error rate is fewer than one in ten International Manual of pages. Portability also seems to have been Notation. 1956. American Printing successfully achieved, thanks largely to House for the Blind, Louisville, the ubiquitous COBOL, although the size of Kentucky 1961. the program (59K bytes, with the tables, 4. Code of Braille Textbook Formats and on a 360-370) is more :restrictive on mini- Techniques. Revised 1970. American mum size than we had hoped would be the Printing House for the Blind, Louisville, case. Kentucky 1961.

40Z 5. Millen, Jonathan ., Finite-State-Syntax Directed Braille Translation. Technical Report MTR-1829, MITRE Corporation, Bedford, Massachusetts July 2, 1970. 6. Development of a High-Speed Brailler Sys- tem for More Rapid and Extensive Produc- tion of Informational Material for the Blind. (Final Report) Sensory Aids, Evaluation and Development Center, Massachusetts Institute of Technology, Cambridge, Massachusetts September 29, 1970. 7. Krebo, B. ., Transcribers' Guide to English Braille. She Jewish Guild for the Blind, New York, New York 1967. 8. Millen, Jonathan K., DOTSYS II User's Guide and Transfer and Maintenance Manual. Technical Report MTR-I~53, MITRE Corporation, Bedford, Massachusetts July 2, 1970. 9. Schack, Ann S. and Mertz, . T., Braille Translation System for the IBM 70~. Mathematics and Applications Department, IBM Corporation, New York, New York 1961. I0. Gerhart, W. Reid; Millen, Jonathan K. and Sullivan, Joseph E., DOTSYS III: A Portable Program for Grade 2 Braille Translation. Technical Report MTR-2119, MITRE Corporation, Bedford, Massachusetts May 14, 1971. ii. Millen, Jonathan K., Choice of COBOL for Braille Translation. Technical Report MTR-1743, MITRE Corporation, Bedford, Massachusetts December 1969. 12. Siems, John R., Report of New Braille Translation Program at APH. Proc. Conf. on New Processes for Braille Manufacture. American Printing House for the Blind, Louisville, Kentucky 1968. 13. Leffler, Lois ., The Development of a Computerized Grade II Braille Transla- tion Algorithm. Wescon Technical Papers 1971, Session 30. August 24, 1971. 14. Allen, Jonathan and Borroz, W. Terry, R.ecent Improvements in Braille Tran- scriptlon. Proc. ACM Conference, 1972. 15. Key Braille Contraction Contexts. Am- erican Printing House for the Blind, Louisville, Kentucky 1969. 16. Boyles, Marion P. and LaGrone, Robert E., Computer Braille Translation of the Atlanta School System. Wescon Technical Papers 1971, Session 30. August 24, 1971. 17. Haynes, Robert L., An Automated Braille Translation System. Wescon Technical Papers 1971, Session 30. August 24, 1971.

403