On Formalism in Specifications

Bertrand Meyer, University of California, Santa Barbara

A critique of a natural-language specification, followed by presentation of a mathematical alternative, demonstrates the weakness of natural language and the strength of formalism in requirements specifications.

Specification is the life-cycle phase concerned with precise definition of the tasks to be performed by the system. Although software engineering textbooks emphasize its necessity, the specification phase is often overlooked in practice. Or, more precisely, it is confused with either the preceding phase, definition of system objectives, or the following phase, design. In the first case, considered here in particular, a natural-language requirements document is deemed sufficient to proceed to system design, without further specification activity.

This article emphasizes the drawbacks of such an informal approach and shows the usefulness of formal specifications. To avoid possible misunderstanding, however, let's clarify one point at the outset: We in no way advocate formal specifications as a replacement for natural-language requirements; rather, we view them as a complement to natural-language descriptions and, as will be illustrated by an example, as an aid in improving the quality of natural-language specifications.

Readers already convinced of the benefits of formal specifications might find in this article some useful arguments to reinforce their viewpoint. Readers not sharing this view will, we hope, find some interesting ideas to ponder.

IEEE Software, January 1985. 0740-7459/85/0001/0006$01.00 © 1985 IEEE

The seven sins of the specifier

The study of requirements documents, as they are routinely produced in industry, yields recurring patterns of deficiencies. Table 1 lists seven classes of deficiencies that we have found to be both common and particularly damaging to the quality of requirements.

The classification is interesting for two reasons. First, by showing the pitfalls of natural-language requirements documents, it gives some weight to the thesis that formal specifications are needed as an intermediate step between requirements and design. Second, since natural-language requirements are necessary whether or not one accepts the thesis that they should be complemented with formal specifications, it provides writers of such requirements with a checklist of common mistakes. Writers of most kinds of software documentation (user manuals, programming language manuals, etc.) should find this list useful; we'll demonstrate its use through an example that exhibits all the defects except the last one.

Table 1. The seven sins of the specifier.

Noise: The presence in the text of an element that does not carry information relevant to any feature of the problem. Variants: redundancy; remorse.
Silence: The existence of a feature of the problem that is not covered by any element of the text.
Overspecification: The presence in the text of an element that corresponds not to a feature of the problem but to features of a possible solution.
Contradiction: The presence in the text of two or more elements that define a feature of the system in an incompatible way.
Ambiguity: The presence in the text of an element that makes it possible to interpret a feature of the problem in at least two different ways.
Forward reference: The presence in the text of an element that uses features of the problem not defined until later in the text.
Wishful thinking: The presence in the text of an element that defines a feature of the problem in such a way that a candidate solution cannot realistically be validated with respect to this feature.

[Figure: the software life cycle, with phases requirements, specification, global design, detailed design, implementation, validation, distribution, and operation.]
This "waterfall model" of the software life cycle originated with W. W. Royce ("Managing the Development of Large Software Systems: Concepts and Techniques," Wescon Proc., Aug. 1970), but many variants have been published. A well-known one is in Boehm (1975). The IEEE Standard on Software Quality Assurance (Standard P732) also defines a variant.

A requirements document

The reader is invited to study, in light of the previous list, some of the software documentation available to him. We could do the same here and discuss actual requirements documents, taken from industrial software projects, as we did in a previous version of this article [1]. But such a discussion is not entirely satisfactory; the reader may feel that the examples chosen are not representative. Also, one sometimes hears the remark that nothing is inherently wrong with natural-language specifications. All one has to do, the argument continues, is to be careful when writing them or hire people with good writing skills. Although well-written requirements are obviously preferable to poorly written ones, we doubt that they solve the problem. In our view, natural-language descriptions of any significant system, even ones of good quality, exhibit deficiencies that make them unacceptable for rigorous software development.

To support this view, we have chosen a single example, which, although openly academic in nature, is especially suitable because it was explicitly and carefully designed to be a "good" natural-language specification. This example is the specification of a well-known text-processing problem. The problem first appeared in a 1969 paper by Peter Naur where it was described as reproduced here in Figure 1.

Naur's paper was on a method for program construction and program proving; thus, the problem statement in Figure 1 was accompanied by a program and by a proof that the program indeed satisfied the requirements.

The problem appeared again in a paper by Goodenough and Gerhart, which had two successive versions. Both versions included a criticism of Naur's original specification.

Goodenough and Gerhart's work was on program testing. To explain why a paper on program testing included a criticism of Naur's text, it is necessary to review the methodological dispute surrounding the very concept of testing. Some researchers dismiss testing as a method for validating software because a test can cover only a fraction of significant cases. In the words of E. W. Dijkstra [2], "Testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence." Thus, in the view of such critics, testing is futile; the only acceptable way to validate a program is to prove its correctness mathematically.

Since Goodenough and Gerhart were discussing test data selection methods, they felt compelled to refute this a priori objection to any research on testing. They dealt with it by showing significant errors in programs whose "proofs" had been published. Among the examples was Naur's program, in which they found seven errors, some minor, some serious.

Our purpose here is not to enter the testing-versus-proving controversy. The Naur-Goodenough/Gerhart problem is interesting, however, because it exhibits in a particularly clear fashion some of the difficulties associated with natural-language specifications. Goodenough and Gerhart mention that the trouble with Naur's paper was partly due to inadequate specification; since their paper proposed a replacement for Naur's program, they gave a corrected specification. This specification was prepared with particular care and was changed as the paper was rewritten. Apparently somebody criticized the initial version, since the last version contains the following footnote:

    Making these specifications precise is difficult and is an excellent example of the specification task. The specifications here should be compared with those in our original paper.

Thus, when we examine the final specification, it is only fair to consider it not as an imperfect document written under the schedule constraints usually imposed on software projects in industry, but as the second version of a carefully thought-out text, describing what is really a toy problem, unplagued by any of the numerous special considerations that often obscure real-life problems. If a natural-language specification of a programming problem has ever been written with care, this is it. Yet, as we shall see, it is not without its own shadows.

Figure 2 (see p. 11) gives Goodenough and Gerhart's final specification, which should be read carefully at this point. For the remainder of this article, numbers in parentheses, for example (21), refer to lines of text as numbered in Figure 2.

Analysis of the specification

The first thing one notices in looking at Goodenough and Gerhart's specification is its length: about four times that of Naur's original by a simple character count. Clearly, the authors went to great pains to leave nothing out and to eliminate all ambiguity. As we shall see, this overzealous effort actually introduced problems. In any case, such length seems inappropriate for specifying a problem that, after all, looks fairly simple to the unprejudiced observer.

Before embarking on a more detailed analysis of this text, we should emphasize that the aim of the game is not to criticize this particular paper; the official subject matter of Goodenough and Gerhart's work was testing, not specification, and the prescription period has expired anyway. We take the paper as an example because it provides a particularly compact basis for the study of common mistakes.

[Illustration: Rococo interior with fashionable pair dancing; engraving by Gravelot, 1770.]

Noise. "Noise" elements are identified by solid underlines in Figure 2. Noise is not necessarily a bad thing in itself; in fact, it can play the same role as comments in programs. Often, however, noise elements actually obscure the text. When first encountering such an element, the reader thinks it brings new information, but upon closer examination, he realizes that the element

    Given a text consisting of words separated by BLANKS or by NL (new line) characters, convert it to a line-by-line form in accordance with the following rules:
    (1) line breaks must be made only where the given text has BLANK or NL;
    (2) each line is filled as far as possible, as long as
    (3) no line will contain more than MAXPOS characters.

Figure 1. Naur's original statement of a well-known text-processing problem.
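To make Naur's three rules concrete, here is a minimal Python sketch of one possible reading of Figure 1. It is an illustration under stated assumptions, not Naur's program: the name `fill_lines` is hypothetical, any run of whitespace is treated as a single separator, and an oversize word raises an error, a case on which Figure 1 itself is silent.

```python
def fill_lines(text: str, maxpos: int) -> list[str]:
    """One greedy reading of Figure 1 (hypothetical helper, not Naur's code).

    Lines are broken only at separators, each line is filled as far as
    possible, and no line holds more than maxpos characters.
    """
    lines: list[str] = []
    current = ""
    # Assumption: str.split() treats any run of whitespace as one separator.
    for word in text.split():
        if len(word) > maxpos:
            # Figure 1 does not say what to do here; raising is our choice.
            raise ValueError(f"word longer than MAXPOS: {word!r}")
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= maxpos:
            current += " " + word          # the word still fits on this line
        else:
            lines.append(current)          # line is full: start a new one
            current = word
    if current:
        lines.append(current)
    return lines
```

With MAXPOS = 10, `fill_lines("the quick brown fox", 10)` gives the two lines `the quick` and `brown fox`. The greedy, left-to-right strategy is itself a design decision that Figure 1 does not state.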


cerned. In a technical document, however, the rule to observe is exactly the opposite: namely, the same concept should always be denoted by the same words, lest the reader be confused.

An interesting variant of noise is remorse, a restriction to the description of a certain specification element made not where the element is defined but where it is used, as if the specifier suddenly regretted his initial definition. An example here is "the output text, if any" (20). Up to this point, the specification freely used the notion of output text (12, 17); nowhere was there any hint that such a text might not exist. If the reader wondered about this problem, the specification did not provide an answer. Now, suddenly, when the discussion is focusing on something else, the reader is "reminded" that there might be no such thing as an output text, but no precise criterion is given as to when there is and when there isn't.

Another instance of remorse is the late definition of the "line" concept (24), to which we will return. We will meet again the tendency to say too much, which generates noise, as a source of contradiction and ambiguity.

Silence. In spite of all his efforts, the specifier often leaves, along with overdocumented elements, undefined features. Commonly, these features are fairly obvious to a community of application specialists, who are close to the initial customers, but they will be more obscure to those outside this circle. An example is the concept of "line," which is not really defined except in a parenthetical bit of remorse toward the end of the text (24), where it is described as a sequence of characters "between successive NL characters." (By the way, are those characters part of the line?)

An interesting point here is the cultural background necessary to understand this concept. In ASCII-oriented environments, "New Line" is a character; thus, people working on ASCII environments (DEC machines, for example) will probably understand easily the specification's basic hypothesis, namely, that NL is treated as an ordinary character upon input but triggers a carriage return upon output. These concepts are foreign, however, to somebody working in an EBCDIC environment, especially on IBM OS systems, on which files are made up of a sequence of "records" (corresponding, for example, to lines), each made up of a sequence of characters. A person coming from such an environment would not have written the above specification and will probably have trouble understanding it.

Besides, the late definition of line is plainly wrong. It applies only to lines that are neither at the very beginning nor at the very end of the text. In both these cases, a line is not "between successive NL characters" but between the beginning of the file and an NL, or between an NL and the end of the file, that is, between an NL and an ET. If we accept the authors' definition, the first and last lines of the output may be of arbitrary length; in fact, an output containing no NL at all is acceptable regardless of its length, since it does not have lines according to the definition given! This is obviously absurd and not what the authors had in mind, but the use of natural language leads naturally to such slips of the pen.

Another interesting silence concerns the variable Alarm. Line 16 specifies that this variable should be set to TRUE in case of an error, but nothing is said about what happens to it in other cases. The answer is obvious, of course; but the matter can only be brushed aside as minor by programmers who have never run into a bug due to an uninitialized variable ...

It must be pointed out that Goodenough and Gerhart corrected a notable silence in Naur's original description. Naur's text does not explain what should be done with consecutive groups of more than one break character; this is one of the seven errors analyzed in Goodenough and Gerhart's paper. Their specification corrects it by requiring that such groups be reduced to a single break character in the output. Although something had to be done about the problem, note that this solution is, to some extent, obtained at the expense of simplicity. Eliminating redundant break characters and dividing a text into lines are two unrelated problems; merging them into a single specification complicates the whole affair.

It is probably better to deal with these two requirements separately, and this is what we do in the formal specification given below. Some of the current trends in programming methodology emphasize this approach, most notably under the influence of the Unix programming environment, which, at least in principle, favors tools that are simple and composable rather than large and multipurpose.

[Illustration: Ball at the home of a German baron; engraving circa 1750. The Bettmann Archive.]

Contradictions. There is another problem with the concept of line. Given a type T, one should distinguish between the types seq[T], whose elements are finite sequences of objects of type T, and seq[seq[T]], whose elements are sequences of sequences of objects of type T. Such a confusion can be found in Figure 2, where we are first told (1) that the input is a "stream," or sequence, of characters and later (10) that it "can be viewed" as a sequence of words and breaks. As any Lisp programmer knows, the sequences [sequence of objects] and «a>

[Figure 2: Goodenough and Gerhart's final specification, with numbered lines]

     1  The program's input is a stream of characters whose end is
     2  signaled with a special end-of-text character, ET. There is exactly
     3  one ET character in each input stream. Characters are classified
     4  as:
     5    • break characters - BL (blank) and NL (new line);
     6    • nonbreak characters - all others except ET;
     7    • the end-of-text indicator - ET.
     8  A word is a nonempty sequence of nonbreak characters. A
     9  break is a sequence of one or more break characters. Thus, the
    10  input can be viewed as a sequence of words separated by breaks,
    11  with possibly leading and trailing breaks, and ending with ET.
    12  The program's output should be the same sequence of words
    13  as in the input, with the exception that an oversize word (i.e., a
    14  word containing more than MAXPOS characters, where MAXPOS
    15  is a positive integer) should cause an error exit from the program
    16  (i.e., a variable, Alarm, should have the value TRUE). Up to the
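The consequence of the late definition of "line" (24), criticized under Silence above, can be checked mechanically. The Python sketch below takes that definition literally: a line is whatever lies between two successive NL characters. The function names are hypothetical, and the code only illustrates the argument; it is not taken from Goodenough and Gerhart's paper.

```python
NL = "\n"


def lines_by_late_definition(text: str) -> list[str]:
    """Segments lying strictly between two successive NL characters."""
    nl_positions = [i for i, ch in enumerate(text) if ch == NL]
    # Pair each NL with the next one; text before the first NL and after
    # the last NL is not "between successive NL characters".
    return [text[a + 1:b] for a, b in zip(nl_positions, nl_positions[1:])]


def respects_maxpos(text: str, maxpos: int) -> bool:
    """True if every line, in the literal sense above, fits within maxpos."""
    return all(len(line) <= maxpos for line in lines_by_late_definition(text))


# A 50-character output with no NL has no lines at all under this reading,
# so the length constraint holds vacuously even for MAXPOS = 10.
print(lines_by_late_definition("x" * 50))
print(respects_maxpos("x" * 50, 10))
```

Under this literal reading, the first and last segments of any output escape the MAXPOS constraint, which is exactly the absurdity noted in the text.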

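The difference between a sequence of characters and a sequence of words, raised under Contradictions, can also be made concrete. In this Python sketch the names are hypothetical, and BL and NL are assumed to be the space and new-line characters. Two different character sequences yield the same word sequence, so the word view alone loses the information about leading and trailing breaks.

```python
import re


def words(chars: str) -> list[str]:
    """View a character sequence (seq[CHAR]) as a word sequence (seq[seq[CHAR]])."""
    # Assumption: break characters are the space (BL) and the new line (NL).
    return [w for w in re.split(r"[ \n]+", chars) if w]


def has_leading_break(chars: str) -> bool:
    return chars[:1] in (" ", "\n")


def has_trailing_break(chars: str) -> bool:
    return chars[-1:] in (" ", "\n")


a = "  UNIX IS\nA"     # leading break, no trailing break
b = "UNIX  IS A \n"    # no leading break, trailing break
print(words(a) == words(b))                           # same words
print(has_leading_break(a), has_trailing_break(a))    # but different breaks
print(has_leading_break(b), has_trailing_break(b))
```

The two boolean helpers carry exactly the extra information that the word sequence by itself does not.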

put." This last comment is remarkable since neither the input nor the output is a sequence of words. Worse yet, even if we parse the input into a sequence of words, this sequence is not sufficient to determine the output: one also needs two binary pieces of information, namely whether there is a leading and/or a trailing break.

The same sentence (9-11), in its overzealous effort to leave no stone unturned, ends up introducing another contradiction. An unbiased reader would be puzzled. How can the input "end with [the character] ET" (11) and at the same time have a "trailing break" (11)? "Trailing," precisely, means "at the end"! What's the last character if there is a "trailing" break: ET or a break character?

A more experienced reader, such as a programmer, will have no difficulty resolving this contradiction; his experience will tell him that "end" markers follow "trailing" characters. But this reliance on intuition and knowledge of the application domain can be particularly damaging when transposed to large requirements documents, which will be handed down to a group of system designers and implementors of diverse backgrounds and abilities.

Overspecification. Overspecification in requirements can be annoyingly close to silence. The reader is told too much about the solution while he is desperately trying to grasp the problem and figure out, by himself, features not covered by the text. Overspecification is typically, although certainly not exclusively, found in requirements documents written by programmers. Psychologically, this is understandable. An implementation-level concept is good, concrete, technical stuff, whereas true requirements deal with much less tangible material. To a computer specialist, a stack is easier to visualize than, say, the flow of information in a company or the needs of a radar operator. Thus, many specifiers have a natural tendency to cling to programming concepts. There is a price to pay for this: Implementation decisions taken too early may turn out to be wrong, and important problem features can be overlooked.

The example text contains an overspecification right from the first sentence: the notion of the end-of-text character ET. The only reason for the presence of this notion is Goodenough and Gerhart's desire to correct Naur's original program. Input-output facilities of the version of Algol 60 used by Naur (and, for fairness, by Goodenough and Gerhart) do not provide for end-of-file detection when reading, so one must assume the presence of a special character at the end of the file to make up for this deficiency. But ET is an implementation detail and should not be included in an abstract specification. Conceptually, the input is a finite sequence of characters; it should be transformed into an output that is a sequence of lines or, depending on the interpretation chosen, a sequence of characters. It is a programmer's vice to insist that finite sequences be specially marked at the end.

Why does the ET character receive such emphasis in Goodenough and Gerhart's specification? The reason is one of the errors in Naur's original program, which would go into an infinite loop unless the input was incorrect (that is, contained an oversize word). Upon closer examination, however, a case can be made for Naur's solution (without the other errors, of course). It is not so unrealistic to consider the required program as a potentially infinite process, which takes characters as input and produces lines as output, working somewhat like a device handler (for instance one that drives a printer) in an operating system. Such an interpretation should, of course, be clearly described in the specification, which was not the case with Naur's text. That decision would be less arbitrary than the one taken by Goodenough and Gerhart: their inclusion of ET changes the data structure at the specification level to accommodate the programming language used at the implementation stage.

The unacceptability of the change is further evidenced by the fact that the output does not satisfy the requirement on the input. Is it realistic to expect an existing file to be terminated by an explicit marker? If it is, the output produced by the program should satisfy that condition; however, examination of the specification, which is not completely clear on this matter, and, as a final criterion, of the proposed program, shows that ET will not be passed on to the output file. Assume that we want to write another program, for, say, right-justifying the text, that will take Goodenough and Gerhart's output (in "pipe" mode a la Unix). In designing that program, we will not be able to make the same assumption on its input. Thus, the overspecification has opened the way to serious inconsistencies.

[Illustration: Dancing the minuet in the open air; copper engraving by Charles Eisen.]

Another overspecification in the text is the concept of "error exit" (16), which causes a "variable," Alarm, to have the value TRUE. Clearly, the notion of a variable belongs to the world of programs, not specifications. This piece of overspecification would have been less shocking if the problem had been defined as the task of writing a procedure, with Alarm as one of its parameters, or as

[Figure 3: a character grid with columns numbered 1 to 10. Row 1: UNIX IS A. Row 2: TRADEMARK. Row 3: OF BELL.]
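The composition problem caused by ET can be illustrated with a toy filter. The Python sketch below is not Goodenough and Gerhart's program: the function `reformat`, the character chosen for ET, and the trivial one-word-per-line output are all assumptions made for illustration. It mirrors only the two properties discussed above: the input must end with exactly one ET, and ET is not passed on to the output.

```python
ET = "\x03"  # hypothetical choice of end-of-text character


def reformat(stream: str) -> str:
    """Toy filter: requires an ET-terminated input, emits output without ET."""
    if stream.count(ET) != 1 or not stream.endswith(ET):
        raise ValueError("input must contain exactly one ET, at the end")
    # Placeholder for the real line filling: one word per output line.
    return "\n".join(stream[:-1].split())


first = reformat("UNIX IS A" + ET)   # accepted: the input ends with ET
try:
    reformat(first)                   # rejected: the output carries no ET
except ValueError as error:
    print("cannot be piped into itself:", error)
```

A second tool placed after `reformat` in a pipeline cannot rely on the input convention that `reformat` itself demands, which is the inconsistency the text describes.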


the point of an error" is susceptible to at least two interpretations.

The text says that up to (and presumably including) the point of the error, the program's output should correspond to the input. But where is the "point of the error" in Figure 3? Is it [line 4, column 10], last acceptable letter, or [3, 7], end of the last acceptable word? Nothing in the text allows the reader to decide between these two interpretations.

Another interesting ambiguity is connected with the basic constraint on acceptable solutions (23): "As many words as possible should be placed on each line." If we have, say, MAXPOS = 10 and the input text

    WHO WHAT WHEN

there are two equally correct two-line solutions (WHAT may be on either the first or second line). This ambiguity may be acceptable since neither solution appears superior to the other; the specification as such is nondeterministic. We suspect (perhaps wrongly) that this nondeterminism was not intentional and that there was an implicit overspecification in the authors' minds: they considered it obvious that the input would be processed sequentially, so any ambiguity, as in the example above, would be solved by placing as many words as possible on the earlier line (giving line WHO WHAT followed by line WHEN). In this interpretation, property 3 (23-24) actually means, "As many words as possible should be placed on each line as it is encountered in the se

case, the specification should state it precisely.

Another potential source of ambiguity is the use of imprecise or poorly defined terms, for example, the use of "stream" (1) rather than the more standard "sequence." The expression "error exit" (15), stemming from the overspecification seen above, is ambiguous, and the reader is not comforted by the explanation that follows it ("i.e., a variable, Alarm, should have the value TRUE"); the notion of assigning a value to a variable does not by itself imply the idea of an "exit," which also means that the program stops in some fashion. We have seen that the concept of "line" is not well defined (24). Also note that the expression "new line" is to be parsed as a single entity (the new line character) in its first appearance (5) and as separate words ("a new line should start ...") in its second (19).

Forward references. In a requirements document, not all forward references are bad. Some, corresponding to a top-down presentation of the concepts ("the notion of ... will be studied in detail in section ..."), might even be considered good practice, provided there are not too many. But implicit forward references (that is, uses of a concept that come before the proper definition of the concept, without particular warning to the reader) can present much more of a problem. They make a document extremely hard to read, especial

should be a part of all requirements specifications and other software documents.

Here, of course, the text is very short, so the annoyance caused by forward references is nowhere near what it can be with full-size documents. Note, however, that ET is used three times (2, 3, 6) before it is defined (7), that the notion of line, defined not quite satisfactorily (24), has been used earlier (19-20), and that MAXPOS is used just before its definition (14).

So what? In dissecting Goodenough and Gerhart's specification, we identified a significant number of problems in a text that may seem innocuous to a superficial observer. Not all the problems were equally serious, and the reader may have felt that we were a bit pedantic at times. We submit, however, that one must be pedantic in dealing with such matters. Inconsistencies, ambiguities, and the like may not warrant the gallows when the problem is to split up a sequence of characters into lines. But keep in mind how the above defects transpose to more serious matters: a nuclear reactor control system, a missile guidance system, or even just a payroll program. The computer that executes the code resulting from a faulty specification is more pedantic than any human referee could ever be.

Thus, we should consider Goodenough and Gerhart's specification not only as an object of study in itself but also, and more importantly, as a microcosm for conveniently observing
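The WHO WHAT WHEN example from the ambiguity discussion can be verified by brute force. The following Python sketch (the function names are ours and hypothetical) enumerates every way of joining the words with a blank or a new line, keeps the layouts whose lines fit within MAXPOS, and retains those with the fewest lines.

```python
from itertools import product


def layouts(words: list[str], maxpos: int) -> list[list[str]]:
    """All layouts of the word sequence whose lines hold at most maxpos characters."""
    results = []
    # Each gap between two consecutive words is either a blank or a new line.
    for seps in product((" ", "\n"), repeat=max(len(words) - 1, 0)):
        text = words[0] if words else ""
        for sep, word in zip(seps, words[1:]):
            text += sep + word
        lines = text.split("\n")
        if all(len(line) <= maxpos for line in lines):
            results.append(lines)
    return results


def minimal_line_layouts(words: list[str], maxpos: int) -> list[list[str]]:
    """Among the admissible layouts, those using the fewest lines."""
    candidates = layouts(words, maxpos)
    if not candidates:          # e.g. an oversize word: no admissible layout
        return []
    fewest = min(len(layout) for layout in candidates)
    return [layout for layout in candidates if len(layout) == fewest]


for layout in minimal_line_layouts(["WHO", "WHAT", "WHEN"], 10):
    print(layout)
```

With MAXPOS = 10 the search finds exactly two two-line layouts, WHO WHAT / WHEN and WHO / WHAT WHEN, confirming that the constraint as written does not single out one output. The enumeration is exponential in the number of words and is meant only as a check on small examples.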

[Illustration: Two people doing the minuet; copper engraving by Nilsson.]

though the text was written with great care, we have witnessed how the au­ _ on fOrmal spedfIcatIon thors, who started out to improve Many languages have been designed in recem years. upon Naur's terse but simple text, A few are listed here. without any claim to exhaustivity. sentence after sentence became a little Jean.Raymond Abrial. Stephen A. Schuman, and Benrand Meyer, "A Specifica­ more entangled in their own rosary of tion Language. " in On the Construction 0/ Programs. R. McNaughten and R.C. caveats. This says a lot about why in­ McKeag, eds .• Cambridge University Press, 1980. terminable manuals occupy so much Rod M. Burstall and Joe A. Goguen, "Putting Theories Together to Make shelf space in programmers' offices Specifications," Proc. Fifth Int'f Joint Con! Artificiaf Inreffigence, Cambridge. and computer rooms. Mass, 1977, PP. 1045-1058. In our opinion. the situation can be Oiff B. , Jones, So/twore Development: A Rigorous Approach. Prentice-Hall. significantly improved by a reasoned Engle ..... ood Clim, N.J., 1980 use of more formal specifications. But R. Locasso, John Scheid, Val Schorre, and Paul R. Eggen, "The Ina Jo SpecifICa­ again, let's emphasize that such speci­ tion Languase Reference Manual." Technical Repon TM-(L)-/602I100Il00, fications are a complement to natural System Developmenl Corporation. Santa Monica. Calif., June 1980. language documents. not a replace­ David R. Musser, " Specification in the AFFIRM System," ment. In fact, we' ll show how a detour IEEE Trans. So/rware Engineering. Vol. SE-6. No. I, Jan. 1980, pp. 24-32. through formal specification may L. Robinson and Olivier Roubine, Speciaf Re/eren('e Manuaf. Stanford Research eventually lead to a better English de­ Institute, 1980. scription. This and other benefits of formal approaches more than com­ pensate for the effort needed to write malical notation. The style of exposi­ and underSland methemalical nota­ breaks and limifed~/engfh, and a tions. 
tion will be similar to that found in function called FEWEST_ LINES. We will now introduce such nota­ mathematical texts; translation to a (Throughout the discussion of the for­ tions, which will allow us to give a for­ specific formal specification language mal specification, the reader may wish mal specification of the Naur-Good­ should not be hard, provided the lan­ to refer to Figure 4 for a picture of the enough/Gerhart problem. guage supports the relevant concepts. overall structure of the relations and functions involved.) Elements for a Overview . Perhaps the only difficult Relation shorLbreaks holds be­ formal specification part of the Naur-Goodenough/Ger­ tween two sequences of characters a Many formal specification lan­ hart problem is thal the processing to and b if and only if b is identical to a. guages have been designed in recent be performed on the text involves three except that breaks in a (i.e., successive years (see box). Choosing one of these aspects; reducing breaks to a single break characters) have been reduced to languages would force the reader to break character, making sure no line single break characters in b. learn its particular notation and would has more than MAXPOS characters, Relation limifelLlellgfh holds bc+ obscure the essential fact-namely. and filling lines as mu!::h as possible. If tween two sequences of characters b that their underlying concepts are, for these three requirements are sepa­ and c if and only if c is a "limited the most part, well-known mathemat­ rated, things become much simpler. length version" of b: that is, no line in ical notions like sets, functions, rela­ Consequently. we will define the prob­ c has length greater than MAXPOS, tions, and sequences. We thus prefer lem formally by considering two sim­ and cis identical to b except thai some to use a more-or-less standard mathe- ple binary relations, called sho,,_ blanks may have been replaced with

new lines and/or some new lines with blanks.

By applying these two relations successively, we associate with any sequence of characters a all sequences of characters that are "made of the same words," separated only by single breaks, and fit on lines no longer than MAXPOS. Given such a set of sequences, say SSC, then FEWEST_LINES (SSC) is the subset of SSC containing those sequences that consist of a minimum number of lines and thus are acceptable outputs for the program.

We'll now define these notions formally, but a few simple conventions are needed first.

Box: A reminder on functions and relations

Consider two sets, for example INPUT and OUTPUT. A binary relation between these two sets is a set of pairs

$$\{\langle i_1, o_1\rangle, \langle i_2, o_2\rangle, \ldots\}$$

where each $i_k$ belongs to set INPUT and each $o_k$ belongs to set OUTPUT. Such a relation can be represented pictorially, with an arrow from $i$ to $o$ for each pair. If goal is a relation, then we write goal (i, o) to express that the pair $\langle i, o\rangle$ belongs to the relation.

The domain of such a relation, written dom (goal), is the subset of INPUT containing only those elements i such that goal (i, o) holds for at least one element o in OUTPUT. Thus, in the example pictured, $i_1$, $i_2$, and $i_4$, but not $i_3$, belong to the domain of the relation.

A function is a relation f such that for any i there is at most one o for which f (i, o) holds; if o exists, then one may write o = f (i). The relation pictured is not a function, since $i_1$, for instance, has two buddies $o_1$ and $o_2$. Note that the domain of a function is made of those elements of INPUT for which there is exactly one corresponding element in OUTPUT.

Basic form of the specification. As a general convention, we use uppercase for sets and for functions whose results are sets, and lowercase for other functions, elements of sets (except for MAXPOS, which we write in uppercase as in the original specification), sequences, and relations.

The program to be written is the implementation of a function

$$\mathit{sol}: \mathit{INPUT} \rightarrow \mathit{OUTPUT}$$

where INPUT and OUTPUT are the sets of possible inputs and outputs, which we will describe below as sets of sequences. Function sol must satisfy certain constraints, which it is the role of the specification to express.

As noted above, there may be more than one correct output for a given input; in other words, a truly general specification of the problem should be nondeterministic. We will represent this fact by defining a binary relation between sets INPUT and OUTPUT. We call this binary relation goal; then a function sol will be a correct solution if and only if the following two conditions are satisfied (readers who are not so sure about functions and relations are referred to the refresher in the adjacent box):

• function sol is defined wherever relation goal is defined, that is, sol (i) exists for any i in the domain of goal;
• for any i for which goal is defined, sol (i) yields a "solution" to goal, that is, goal (i, sol (i)) holds.

This definition is expressed in mathematical notation by writing that sol is an acceptable function if and only if

$$\forall i \in \mathrm{dom}(\mathit{goal}),\quad i \in \mathrm{dom}(\mathit{sol}) \ \text{and}\ \mathit{goal}(i, \mathit{sol}(i))$$

where dom (sol) is the domain of function sol. Note that there may be some inputs for which there is no acceptable solution (those not in the domain of goal), so sol may be a partial function.
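The acceptability condition just stated can be checked mechanically on small finite examples. The following Python sketch is an illustration added here, not part of the original specification: it assumes a relation is represented as a finite set of (input, output) pairs and a candidate function sol as a dictionary, and the names `is_acceptable`, `good_sol`, `bad_sol`, and `partial_sol` are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def dom(relation: Set[Pair]) -> Set[Hashable]:
    """Domain of a relation given as a finite set of (input, output) pairs."""
    return {i for (i, _) in relation}


def is_acceptable(sol: Dict[Hashable, Hashable], goal: Set[Pair]) -> bool:
    """True iff every i in dom(goal) is in dom(sol) and goal(i, sol(i)) holds."""
    return all(i in sol and (i, sol[i]) in goal for i in dom(goal))


# Hypothetical toy relation: i1 has two acceptable outputs, i3 has none.
goal = {("i1", "o1"), ("i1", "o2"), ("i2", "o3"), ("i4", "o3")}

good_sol = {"i1": "o2", "i2": "o3", "i4": "o3"}   # picks one output per input
bad_sol = {"i1": "o3", "i2": "o3", "i4": "o3"}    # (i1, o3) is not in goal
partial_sol = {"i1": "o1"}                        # undefined on i2 and i4
```

Because goal relates i1 to two outputs, several different dictionaries would pass the check; this is the nondeterminism the relation is meant to capture.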


In more concise notation, the above property can simply be expressed by writing that the domain of sol is at least as large as the domain of goal, and that sol is included in goal (both being defined as sets of pairs):

$$\mathrm{dom}(\mathit{goal}) \subseteq \mathrm{dom}(\mathit{sol}) \quad\text{and}\quad \mathit{sol} \subseteq \mathit{goal}$$

This way of presenting a specification is of very general applicability for programs performing input-to-output transformations. Such a program may be viewed as the implementation of a certain function (sol) which must ensure that a certain relation (goal) is satisfied between its argument and its result; in mathematical terms, the function is included in (is a subset of) the relation. To specify the problem is to define the relation; to construct the program is to find an implementable function sol satisfying the above conditions.

Figure 4. Overall structure of the specification: (r) indicates a relation, (F) a function. [Diagram not reproduced; its labels are short_breaks (r), COMPACTED (F), limited_length (r), TRIMMED (F), FEWEST_LINES, and "acceptable outputs."]

Box: Basic set and logic notations

The definitions marked (*) introduce predicates, that is, expressions which may have value "true" or "false."

$\{a, b, c, \ldots\}$: the set made up of elements a, b, c, ...
$x \in A$: x is an element of A (*).
$x \notin A$: x is not an element of A (*).
$A \subseteq B$: A is a subset of B (all elements of A are elements of B) (*).
$\{x \in A \mid P(x)\}$: the (possibly empty) subset of A made up of those elements x which satisfy property P.
$\forall x \in A,\ P(x)$: all elements x of A, if any, satisfy property P (or: no element of A violates P); holds in particular whenever A is empty (*).
$\exists x \in A,\ P(x)$: there is at least one element x in A which satisfies property P; may only hold if A is nonempty (*).
$a \Rightarrow b$: a implies b.
$a\,..\,b$: the integer interval containing all the integers i such that $a \le i \le b$; empty if $a > b$. This notation is borrowed from Pascal.
The symbol $\triangleq$ means "is defined as."

Characters and sequences. The principal set of interest in our problem is the set of characters, which we denote by CHAR. The only property of CHAR that matters here is that CHAR contains two elements of particular interest, blank and new_line. We call BREAK_CHAR the subset of CHAR consisting of these two elements:

$$\mathit{BREAK\_CHAR} \triangleq \{\mathit{blank}, \mathit{new\_line}\}$$

The basic concept in this problem is that of sequence. If X is a set, we denote by seq [X] the set whose elements are finite sequences of elements of X. Such a sequence is written, for example, as […].


A sequence has a length that is a nonnegative integer; thus, length is a function from seq [X] to the set of natural numbers. Elements are numbered starting at 1; the i-th element of a sequence s (for $1 \le i \le \mathit{length}(s)$) is written s(i). A subsequence of s is a sequence made of zero or more of the elements of s, in the same order as in s; for example, $\langle a, b, c, d\rangle$ is one of the subsequences of the example sequence s. On the other hand, $\langle b, d, c\rangle$ is not a subsequence of s because the original order of its elements in s is not preserved.

The set of subsequences of s will be written SUBSEQUENCES (s).

The concept of sequences is well known, and we rely on the reader's understanding here. A formal definition of sequences and of the above notions is given in the box on the adjacent page.

Box: A definition of sequences

The following presentation is based on the formal specification of sequences given in the Z reference manual [11]. N will denote the set of natural numbers.

Definition: seq [X], the set of finite sequences of elements of X, is defined as the set of partial functions from N to X whose domains are intervals of the form 1..n for some natural number n.

So a sequence is defined as a partial function: for example, the sequence $s = \langle a, b, a, c\rangle$ is the function defined for arguments 1, 2, 3, and 4 only, and whose value is a for 1 and 3, b for 2, and c for 4. The following is a tabular representation of s:

    N:  1  2  3  4
    s:  a  b  a  c

Note that the above definition allows n = 0 (empty interval, thus empty function, that is, empty sequence) and that it justifies the notation s(i) for the i-th element of sequence s (which is the result of applying function s to element i).

The length of a sequence is defined as the largest integer for which the associated partial function is defined (i.e., n in the above definition).

Now let s be a sequence of elements of X and g be a (total) function from X to some set Y. The composition

$$g \circ s$$

is a partial function from the set of natural numbers to Y, which has the same domain as s; thus, it is a sequence of elements of Y, with the same length as s. This sequence is obtained from s by applying g to all the elements of s. Again, a picture may help (we set g(a) = a', etc.):

    N:      1   2   3   4
    s:      a   b   a   c     (elements of X)
    g o s:  a'  b'  a'  c'    (elements of Y)

Now take for X the set N of natural numbers. A sorted sequence of natural numbers is an element s of seq [N] such that

$$\forall i \in 2\,..\,\mathit{length}(s),\quad s(i-1) \le s(i)$$

With this definition, it becomes easy to formally define the notion of subsequence used in the text.

Definition: Let s be an element of seq [X] for some set X. A subsequence of s is a sequence of the form $s \circ u$ where u is a sorted sequence of natural numbers.

In the picture that illustrates this definition in the original, the sorted sequence u of natural numbers used is $\langle 3, 4, 5, 7\rangle$; $\langle 1, 3, 5, 7\rangle$ or $\langle 1, 4, 5, 7\rangle$ would also work.

Minima and maxima. If X is a set, and f is a function from X to the set of natural numbers,

$$\mathit{MIN\_SET}(X, f)$$

denotes the subset of X consisting of the elements for which the value of f is minimum. For example, if X is a set containing four sequences

$$X \triangleq \{\ldots\}$$

and f is the length function on sequences, then MIN_SET (X, f) will be the set consisting of the shortest of these sequences, namely, the second and last.

In the same fashion, we denote by

$$\mathit{MAX\_SET}(X, f)$$

the subset of X consisting of the elements for which the value of f is maximum; thus, in the above case, MAX_SET (X, f) is a set containing just one sequence.

MAX_SET, however, is not always defined; we have to be careful to apply it only to sets X which are finite; otherwise, there might be no maximum value for f. Note that the results of MIN_SET and MAX_SET are a subset of X rather than a single element, since there may be more than one element with minimum or maximum f value. These subsets are nonempty if and only if X is nonempty.

We will also need a way to denote the minimum and maximum elements of a set of natural numbers SN. They will be written, in the usual fashion, min (SN) and max (SN). Thus, if SN is the set

$$\mathit{SN} \triangleq \{341, 7, 3, 654\}$$

then min (SN) is 3 and max (SN) is 654. Note that min and max, contrary to MIN_SET and MAX_SET, yield a natural number, not a set. Also in contrast to MIN_SET and MAX_SET, which are defined for empty sets (they yield an empty result), both min and max are defined only if the set SN is not empty; max further requires that SN be finite. It is essential to check for these conditions whenever using these functions.

Input and output sets. In the problem at hand, the input is a sequence of characters; we choose to describe the output as a sequence of characters as well. Thus, we define the two sets:

$$\mathit{INPUT} \triangleq \mathrm{seq}\,[\mathit{CHAR}]$$
$$\mathit{OUTPUT} \triangleq \mathrm{seq}\,[\mathit{CHAR}]$$

Note that, as mentioned above, another interpretation could have defined the set of possible outputs as seq [LINE], with LINE itself being defined as seq [CHAR] (or possibly seq [WORD] with WORD $\triangleq$ seq [CHAR], plus information on leading and trailing breaks).

We will now define the relations short_breaks and limited_length and the function FEWEST_LINES.

The formal specification

Short breaks. Let a be a sequence of characters. We define SINGLE_BREAKS (a) as the set of subsequences of a such that no two consecutive characters are break characters:

$$\mathit{SINGLE\_BREAKS}(a) \triangleq \{\, s \in \mathit{SUBSEQUENCES}(a) \mid \forall i \in 2\,..\,\mathit{length}(s),\ s(i-1) \in \mathit{BREAK\_CHAR} \Rightarrow s(i) \notin \mathit{BREAK\_CHAR} \,\}$$

Note that we use the Pascal notation, a..b, to denote the (possibly empty) set of integers i such that $a \le i \le b$.

Next, we define COMPACTED (a) as the subset of SINGLE_BREAKS (a) containing those sequences of maximum length:

$$\mathit{COMPACTED}(a) \triangleq \mathit{MAX\_SET}(\mathit{SINGLE\_BREAKS}(a), \mathit{length})$$

As stated above, MAX_SET (X, f) may be undefined if X is an infinite set. This cannot occur here, however, since SINGLE_BREAKS (a) is a subset of SUBSEQUENCES (a) which, for any sequence of characters a, is finite.

Note that any sequence b in COMPACTED (a) must have retained from a all nonbreak characters (if such a character had been omitted, it could be inserted into b and yield a longer element of SINGLE_BREAKS (a)), and has a single break character where a had one or more consecutive break characters.

Thus, the relation short_breaks (a, b), which holds between a and b if and only if a and b are made of the same sequences of words and breaks but the breaks in b consist of a single break character, can be expressed simply by

$$\mathit{short\_breaks}(a, b) \triangleq b \in \mathit{COMPACTED}(a)$$

Limited length. The relation limited_length (b, c) holds between sequences b and c if and only if

• c is the same sequence as b, except that it may have a new_line wherever b has a blank, or conversely; and
• the maximum line length of c, defined as the maximum number of consecutive characters none of which is a new_line, is less than or equal to MAXPOS.

This is expressed more precisely as follows:

$$\mathit{limited\_length}(b, c) \triangleq c \in \mathit{TRIMMED}(b)$$
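The definitions of MIN_SET, MAX_SET, SINGLE_BREAKS, COMPACTED, and short_breaks given above can be transcribed almost literally into executable form. The sketch below is an illustration added here under stated assumptions: sequences of characters are represented as Python strings, blank as " " and new_line as "\n", and the lowercase function names are hypothetical. It enumerates all subsequences, so it is exponential in the length of the input and only meant for experimenting with short texts.

```python
from itertools import combinations

BLANK, NEW_LINE = " ", "\n"          # assumed encodings of blank and new_line
BREAK_CHAR = {BLANK, NEW_LINE}


def min_set(xs, f):
    """MIN_SET(X, f): elements of X for which f is minimum (empty if X is empty)."""
    xs = set(xs)
    if not xs:
        return set()
    lowest = min(f(x) for x in xs)
    return {x for x in xs if f(x) == lowest}


def max_set(xs, f):
    """MAX_SET(X, f): elements of a finite X for which f is maximum."""
    xs = set(xs)
    if not xs:
        return set()
    highest = max(f(x) for x in xs)
    return {x for x in xs if f(x) == highest}


def subsequences(s):
    """SUBSEQUENCES(s): every selection of elements of s, original order kept."""
    return {"".join(s[k] for k in idx)
            for n in range(len(s) + 1)
            for idx in combinations(range(len(s)), n)}


def single_breaks(a):
    """SINGLE_BREAKS(a): subsequences of a with no two consecutive breaks."""
    return {s for s in subsequences(a)
            if not any(s[k - 1] in BREAK_CHAR and s[k] in BREAK_CHAR
                       for k in range(1, len(s)))}


def compacted(a):
    """COMPACTED(a) = MAX_SET(SINGLE_BREAKS(a), length)."""
    return max_set(single_breaks(a), len)


def short_breaks(a, b):
    """short_breaks(a, b) holds iff b is in COMPACTED(a)."""
    return b in compacted(a)
```

For an input whose break run mixes blanks and new_lines, `compacted` returns more than one sequence, one for each break character that can be retained from the run.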

In this definition, TRIMMED (b) and the notions on which it relies are defined as follows:

$$\mathit{TRIMMED}(b) \triangleq \{\, s \in \mathit{EQUIVALENT}(b) \mid \mathit{max\_line\_length}(s) \le \mathit{MAXPOS} \,\}$$

$$\mathit{EQUIVALENT}(b) \triangleq \{\, s \in \mathrm{seq}\,[\mathit{CHAR}] \mid \mathit{length}(s) = \mathit{length}(b) \ \text{and}\ (\forall i \in 1\,..\,\mathit{length}(b),\ s(i) \ne b(i) \Rightarrow s(i) \in \mathit{BREAK\_CHAR} \ \text{and}\ b(i) \in \mathit{BREAK\_CHAR}) \,\}$$

$$\mathit{max\_line\_length}(s) \triangleq \max(\{\, j - i \mid 0 \le i \le j \le \mathit{length}(s) \ \text{and}\ (\forall k \in i+1\,..\,j,\ s(k) \ne \mathit{new\_line}) \,\})$$

A few explanations may help in understanding these definitions. If s is a sequence of characters, max_line_length (s) is the maximum length of a line in s, expressed as the maximum number of consecutive characters, none of which is a new_line. In other words, it is the maximum value of j - i such that s(k) is not a new_line for any k in the interval i+1..j. (We will have more to say about this definition below.) EQUIVALENT (b) is the set of sequences that are "equivalent" to sequence b in the sense of being identical to b, except that new_line characters may be substituted for blank characters or vice versa. Finally, TRIMMED (b) is the set of sequences which are "equivalent" to b and have a maximum line length less than or equal to MAXPOS.

Fewest lines. Let SSC be a set of sequences of characters. These sequences can be interpreted as consisting of lines separated by new_line characters. We define the set FEWEST_LINES (SSC) as the subset of SSC consisting of those sequences that have as few lines as possible:

$$\mathit{FEWEST\_LINES}(\mathit{SSC}) \triangleq \mathit{MIN\_SET}(\mathit{SSC}, \mathit{number\_of\_new\_lines})$$

where the function number_of_new_lines is defined by:

$$\mathit{number\_of\_new\_lines}(s) \triangleq \mathrm{card}(\{\, i \in 1\,..\,\mathit{length}(s) \mid s(i) = \mathit{new\_line} \,\})$$

and card (X), defined for any finite set X, is the number of elements (cardinal) of X.

The basic relation. The above definitions allow us to define the basic relation of the problem, relation goal, precisely. Relation goal (i, o) holds between input i and output o, both of which are sequences of characters, if and only if

$$o \in \mathit{FEWEST\_LINES}(\mathit{TRANSF}(i))$$

where TRANSF (i) is the set of sequences related to i by the composition of the two relations short_breaks and limited_length:

$$\mathit{TRANSF}(i) \triangleq \{\, s \in \mathrm{seq}\,[\mathit{CHAR}] \mid \mathit{tr}(i, s) \,\}$$

with

$$\mathit{tr} \triangleq \mathit{limited\_length} \cdot \mathit{short\_breaks}$$

The dot operator denotes the composition of relations (see box). A look at Figure 4 may help explain the role of the various functions and relations in the above specification.

Box: Composition of relations

Let r and t be two relations; r is from X to Y and t is from Y to Z. The composition of these two relations, written $t \cdot r$ (note the order), is the relation w between sets X and Z such that w (x, z) holds if and only if there is (at least) one element y in Y such that both r (x, y) and t (y, z) hold.

$$w = t \cdot r$$

Existence of solutions. Once we have a formal specification, what can we do with it? Relying on the specification as a basis for the next stages of the software life cycle, program design and implementation (e.g., translating $\forall\ldots$s into loops), is the most obvious use. However, we'd like to emphasize two others. One use, studied in the next section, is as a starting point for better natural-language requirements. The other, to which we now turn, is querying the specification to learn as much as possible about properties of the problem and valid solutions.

What can the given specification teach us about the Naur-Goodenough/Gerhart problem and its solution? First, let's determine when solutions do exist. It is trivial to prove that, given a sequence of characters a, there is always at least one sequence b such that relation short_breaks (a, b) holds. Given b, however, the necessary and sufficient condition for the existence of at least one sequence c such that limited_length (b, c) holds is that b contains no word (i.e., contiguous subsequence of non-break characters) of length greater than MAXPOS. This follows from the definitions of TRIMMED and max_line_length used in the definition of limited_length.

Thus, the domain of definition of the relation tr, which is also the domain of the function TRANSF and thus of the relation goal, is the set of input texts containing no word longer than MAXPOS. This can be formulated as a theorem:

$$\mathrm{dom}(\mathit{goal}) = \{\, s \in \mathrm{seq}\,[\mathit{CHAR}] \mid \forall i \in 1\,..\,\mathit{length}(s) - \mathit{MAXPOS},\ \exists j \in i\,..\,i + \mathit{MAXPOS},\ s(j) \in \mathit{BREAK\_CHAR} \,\}$$

The property expressed by this theorem is that the domain of relation goal consists of sequences such that, if a character c is followed by MAXPOS other characters, at least one character among c and the other characters must be a break.

An important problem, not addressed here, is how the specification deals with erroneous cases, that is, with inputs not in the domain of the goal relation, like sequences with oversize words. Clearly, a robust and complete specification should include (along with goal) another relation, say, exceptional_goal, whose domain is INPUT - dom (goal) (set difference); this relation would complement goal by defining alternative results (usually some kind of error message) for erroneous inputs. Formal specification of erroneous cases falls beyond the scope of this article, but a discussion of the problem and precise definitions of terms such as "error," "failure," and "exception" can be found in a paper by Cristian [4].

Discussion. What we have obtained is an abstract specification, that is, a […] of the problem. It would be difficult to criticize this specification as being oriented toward a particular implementation.
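The whole relation goal, together with the theorem on its domain, can be explored on short inputs with a direct transcription of the definitions. The sketch below is an illustration added here, not a proposed implementation: it assumes strings as sequences, " " and "\n" as the two break characters, and MAXPOS passed as a parameter `maxpos`; all function names are hypothetical. It enumerates every candidate and is therefore exponential in the input length.

```python
from itertools import combinations, product

BLANK, NEW_LINE = " ", "\n"          # assumed encodings of blank and new_line
BREAK_CHAR = {BLANK, NEW_LINE}


def compacted(a):
    """COMPACTED(a): longest subsequences of a with no two consecutive breaks."""
    subs = {"".join(a[k] for k in idx)
            for n in range(len(a) + 1)
            for idx in combinations(range(len(a)), n)}
    ok = {s for s in subs
          if not any(s[k - 1] in BREAK_CHAR and s[k] in BREAK_CHAR
                     for k in range(1, len(s)))}
    longest = max(len(s) for s in ok)      # ok always contains the empty sequence
    return {s for s in ok if len(s) == longest}


def equivalent(b):
    """EQUIVALENT(b): identical to b except that breaks may be exchanged."""
    choices = [(BLANK, NEW_LINE) if ch in BREAK_CHAR else (ch,) for ch in b]
    return {"".join(p) for p in product(*choices)}


def max_line_length(s):
    """max{ j - i | 0 <= i <= j <= length(s), no new_line in s(i+1..j) }."""
    return max(j - i
               for i in range(len(s) + 1)
               for j in range(i, len(s) + 1)
               if NEW_LINE not in s[i:j])  # s[i:j] is s(i+1..j) in 1-based terms


def trimmed(b, maxpos):
    """TRIMMED(b): equivalent sequences whose lines fit within maxpos."""
    return {s for s in equivalent(b) if max_line_length(s) <= maxpos}


def fewest_lines(ssc):
    """FEWEST_LINES(SSC) = MIN_SET(SSC, number_of_new_lines)."""
    if not ssc:
        return set()
    fewest = min(s.count(NEW_LINE) for s in ssc)
    return {s for s in ssc if s.count(NEW_LINE) == fewest}


def transf(i, maxpos):
    """TRANSF(i): every c with short_breaks(i, b) and limited_length(b, c)."""
    return {c for b in compacted(i) for c in trimmed(b, maxpos)}


def goal(i, o, maxpos):
    """goal(i, o) holds iff o is in FEWEST_LINES(TRANSF(i))."""
    return o in fewest_lines(transf(i, maxpos))


def in_domain(s, maxpos):
    """Right-hand side of the theorem: every window of maxpos + 1 has a break."""
    return all(any(s[j - 1] in BREAK_CHAR for j in range(i, i + maxpos + 1))
               for i in range(1, len(s) - maxpos + 1))
```

With `maxpos` set to 5 and the input "to be  or\nnot", two outputs satisfy `goal`, which illustrates why the specification is stated as a relation rather than a function.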


If followed to the letter, the specification would lead to a program that (as illustrated in Figure 4) would first generate all possible distributions of the input […]

[…] conclusion

Although natural language is the ideal notation for most aspects of human communication, from love let[…]

[…] position i in sequence s, may be defined as a minimum:

line_length (s, i) $\triangleq$ min […]

[…] i = j = 0), even if s is an empty sequence (see box).

Box: The reasoning behind formal specifications: the example of max_line_length

How does one obtain a formal expression such as the one defining max_line_length? Let's analyze the different steps involved.

We want to express the fact that max_line_length (s) is the maximum length of a line in s. A definition that avoids the pitfalls mentioned in the analysis of Goodenough and Gerhart's text is, informally, "the maximum number of consecutive characters, none of which is a new_line."

To translate this definition into a formal description, we have to express the notion of a contiguous subsequence of s that does not contain a new_line. A contiguous subsequence can be given by its end indices, say, i and j. The sequence comprising the elements between indices i and j will have length j - i + 1; if it is to yield a line length, then s(k) should be a character other than new_line for any k between i and j, inclusive. Thus, a first try might yield

$$\mathit{max\_line\_length}(s) \triangleq \max(\mathit{LINE\_LENGTHS})$$

where the set LINE_LENGTHS is defined as

$$\mathit{LINE\_LENGTHS} \triangleq \{\, j - i + 1 \mid 1 \le i \le j \le \mathit{length}(s) \ \text{and}\ (\forall k \in i\,..\,j,\ s(k) \ne \mathit{new\_line}) \,\}$$

But beware! One should only apply max to nonempty sets. With the above convention, we can end up with LINE_LENGTHS being empty if s is an empty sequence or all its characters are new_line; in either case, no i, j pair satisfies the condition. Now, if we write a program for the Naur-Goodenough/Gerhart problem and put it into a library, sooner or later someone will apply it to a sequence that is empty or entirely made of new_line characters, so we had better deal with these cases in a clean fashion.

The culprit is the condition $i \le j$, which prevents us from finding a satisfactory i and j in the borderline cases mentioned. The problem disappears, however, if we replace this condition by $i - 1 \le j$. Then, for a sequence having only new_line characters or no character at all, the set LINE_LENGTHS will contain one element, 0, obtained for i = 1 and j = 0. For these values, the interval i..j is empty; thus, the $\forall\ldots$ clause is true. (Remember that a property of the form $\forall x \in E,\ P(x)$ is always true when the set E is empty, regardless of what property P is.) Thus, we obtain the following replacement:

$$\mathit{LINE\_LENGTHS} \triangleq \{\, j - i + 1 \mid 0 \le i - 1 \le j \le \mathit{length}(s) \ \text{and}\ (\forall k \in i\,..\,j,\ s(k) \ne \mathit{new\_line}) \,\}$$

(The first condition has been written $0 \le i - 1$ instead of $1 \le i$.) We have chosen to simplify slightly the writing of this condition by a change of variable (use i for i - 1, thus eliminating the +1 and -1 terms):

$$\mathit{LINE\_LENGTHS} \triangleq \{\, j - i \mid 0 \le i \le j \le \mathit{length}(s) \ \text{and}\ (\forall k \in i+1\,..\,j,\ s(k) \ne \mathit{new\_line}) \,\}$$

This new version is defined in all cases.

It should be noted that this kind of analysis, which at first sight might seem quite remote from programmers' concerns, is in fact closely connected to typical patterns of reasoning about programs. Anyone who has tried to debug a loop that sometimes goes one iteration too few or too many, or works improperly for empty inputs or other borderline cases, will recognize the line followed in the above discussion. It is our contention, however, that such analysis is better performed at the specification level, dealing with simple and well-defined mathematical concepts, than at program debugging time, when the issues are obscured by many irrelevant details, implementation-dependent features, and idiosyncrasies of programming languages.

Natural language definition. Once such a mathematical definition has been produced, it may in return influence the natural language definition. In this example, the formal definition suggests that we should refrain from trying to define the concept of "a line in the text" which, although intuitively clear, is slightly tricky when one attempts to specify it precisely, as Goodenough and Gerhart's text shows. Instead, we should focus on the notion of "maximum line length," which is always defined, even for a text consisting of new_line characters only. Once we have obtained the specification of max_line_length, we can build on it and include in the English problem definition a sentence such as

    The maximum number of consecutive characters, none of which is a new_line, should not exceed MAXPOS.

This sentence, a direct translation from the formal definition, is not, admittedly, of the most gracious style; but it is easy to remove the double negation, yielding

    Any consecutive MAXPOS + 1 characters should include a new_line.

The main advantage of natural language texts is their understandability. One should concentrate on this asset rather than trying to use natural language for precision and rigor, qualities for which it is hopelessly inadequate. Understandability is seriously hindered when natural language requirements become ridiculously long in a vain attempt to chase away silence, ambiguity, contradiction, etc. Such attempts, as shown by the text studied here, only make matters worse. The length of many requirements documents found in actual industrial practice, often extending over hundreds or even thousands of pages, is due to such misuse of natural language. Natural language descriptions should remain reasonably short; the exact description of fine points, special cases, precise details, etc., should be left to a formal specification.

The advantages of brevity cannot be overemphasized. It could even be argued that Naur's specification, once the problems of termination and consecutive break characters are tackled properly, is preferable to Goodenough and Gerhart's because it is shorter and doesn't fuss unnecessarily.

New specification. It would be fair game for the reader at this point to ask what natural-language specification we have to offer in lieu of both Naur's and Goodenough and Gerhart's texts. To answer such a request, we'd try to capitalize on the lessons gained from writing the mathematical definition. We'd propose something like the text in Figure 5, which is directly deduced from that definition (see in particular its relation to Figure 4).

No doubt this text deserves some criticism of its own. In particular, it still needs to be refined. For example, the implementor must know how to "report the error" before embarking upon detailed design and coding; he must know what the allowable characters are apart from blank and new_line, etc. Also note that this text avoids defining specific concepts (e.g., line length, word) explicitly; rather, it substitutes the definition for the concept when needed. Although this device can lead to interesting literary experiments [8], it is certainly not recommended for large requirements documents where one must repeatedly refer to the same basic concepts.

It seems to us, however, that the above statement of the requirements embodies the essential elements of the problem and achieves a reasonable tradeoff between the imprecision of Naur's and the verbosity of Goodenough and Gerhart's specifications. (Its length is in fact slightly more than double the former's and half the latter's.) Its most important feature is that it draws heavily from the lessons gained in writing the formal specification, while retaining (we hope) clarity and simplicity.

End-users. An objection that is often voiced against formal specifications relates to the needs of end-users, who request easily understandable documents. Such an objection, we think, is based on an incorrect assessment of what specification is about. There is a need for requirements documents that must be read, checked, and discussed by noncomputer scientists, but there is also a need for technical documents used by computer professionals. The difference is the same as that between user requirements and engineering specifications in other engineering disciplines. Of course, there must be a way to communicate back the contents of technical specifications (for example, in the case of changes). As we have seen, the existence of a good mathematical specification is a great asset for improving a natural-language description.

Other ways can be found for translating formal elements into forms that are more easily understood. Many people like graphical descriptions, which play a basic role in such (nonformal) specification methods as SADT [9] or SREM [10]. A picture may be worth a thousand words at times, but it can also be dangerously misleading. On the other hand, a pictorial explanation of a well-defined concept certainly does no harm. If the picture

    A ---> [ f ] ---> B

is considered more understandable than the function definition

$$f: A \rightarrow B$$

then why not have graphics tools generate the picture from the formula for the benefit of those who want it? There is certainly a great need for software tools of this kind in specification systems.

Techniques. The last point we want to emphasize is that formal specification is not necessarily difficult. The reader who is familiar with specification techniques will have noted that the


example did not rely (at least explicitly) on such notions as abstract data types, finite-state machines, and attribute grammars. In fact, it used only very simple notions from elementary set theory and logic. These notions are no more difficult than the basic core of college calculus, even if most of today's university students are regrettably less at ease dealing with such concepts as sets, relations, partial functions, composition, and predicate calculus than with other mathematical objects and operations that are better established in the traditional curriculum.
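As an illustration of how elementary these notions are, the two definitions of LINE_LENGTHS examined in the box on max_line_length translate almost symbol for symbol into set comprehensions. The Python sketch below is an addition made here, assuming sequences are represented as strings and new_line as "\n"; the function names are hypothetical. It shows the borderline behavior discussed in the box: the first try yields an empty set for an empty sequence, while the revised definition always contains 0.

```python
NEW_LINE = "\n"   # assumed encoding of new_line


def line_lengths_first_try(s):
    """{ j - i + 1 | 1 <= i <= j <= length(s), no new_line in s(i..j) }."""
    n = len(s)
    return {j - i + 1
            for i in range(1, n + 1)
            for j in range(i, n + 1)
            if all(s[k - 1] != NEW_LINE for k in range(i, j + 1))}


def line_lengths(s):
    """{ j - i | 0 <= i <= j <= length(s), no new_line in s(i+1..j) }."""
    n = len(s)
    return {j - i
            for i in range(0, n + 1)
            for j in range(i, n + 1)
            if all(s[k - 1] != NEW_LINE for k in range(i + 1, j + 1))}


def max_line_length(s):
    """Safe in all cases: line_lengths(s) always contains 0."""
    return max(line_lengths(s))
```

Calling `max` on the result of `line_lengths_first_try("")` would raise an error, which is the programming counterpart of applying max to an empty set.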

Of course, the example studied here is a small problem. Experience with the Z language [11, 12] and subsequent work prompted by this experience [13-15] shows, however, that the same basic concepts can be carried through to the description of much more complex systems. The main limitation of the problem studied here is that it is defined by a simple input-to-output relation, whereas most significant programs can be characterized, in our view, as systems that offer various services in response to possible user requests. We are currently working on methods, notations, and tools for the modular specification of such systems [16].

Figure 5. Yet another statement of the requirements.

    Given are a nonnegative integer MAXPOS and a character set including two "break characters," blank and new_line.

    The program shall accept as input a finite sequence of characters and produce as output a sequence of characters satisfying the following conditions:

    • It only differs from the input by having a single break character wherever the input has one or more break characters;
    • any MAXPOS + 1 consecutive characters include a new_line;
    • the number of new_line characters is minimal.

    If (and only if) an input sequence contains a group of MAXPOS + 1 consecutive nonbreak characters, there exists no such output. In this case, the program shall produce the output associated with the initial part of the sequence, up to and including the MAXPOS-th character of the first such group, and report the error.
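The three conditions of Figure 5 are precise enough to be checked mechanically for a proposed output. The following Python sketch is an illustration added here under stated assumptions: strings stand for sequences, " " and "\n" for blank and new_line, MAXPOS is passed as `maxpos`, and the function names are hypothetical. It checks minimality by brute force over all candidates, and it deliberately leaves out the error case, whose reporting mechanism Figure 5 does not define.

```python
import re
from itertools import product

BLANK, NEW_LINE = " ", "\n"   # assumed encodings of the two break characters


def tokens(text):
    """Split text into maximal runs of break characters and of other characters."""
    return re.findall(r"[ \n]+|[^ \n]+", text)


def candidates(inp):
    """Condition 1: every run of breaks in the input becomes a single break."""
    parts = [(BLANK, NEW_LINE) if t[0] in (BLANK, NEW_LINE) else (t,)
             for t in tokens(inp)]
    return {"".join(p) for p in product(*parts)}


def fits(out, maxpos):
    """Condition 2: any maxpos + 1 consecutive characters include a new_line."""
    return all(NEW_LINE in out[k:k + maxpos + 1]
               for k in range(len(out) - maxpos))


def satisfies_figure_5(inp, out, maxpos):
    """True iff out meets conditions 1 and 2 with a minimal number of new_lines."""
    ok = {c for c in candidates(inp) if fits(c, maxpos)}
    if out not in ok:
        return False
    return out.count(NEW_LINE) == min(c.count(NEW_LINE) for c in ok)
```

A checker of this kind does not replace an efficient implementation, but it can serve as a test oracle for one on short inputs.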

Reuse. An essential requirement of a good specification formalism is that it should favor reuse of previously written elements of specifications. For example, the notion of sequence and the associated operations should be available as predefined specification elements. Languages Z and Affirm, among others, provide for such libraries of basic specifications. More work is needed to share and reuse the work of formal specifiers. Along with the availability of simple and efficient software tools, this is one of the conditions that must be met before formal specifications become for software engineers what, say, differential equations are for engineers in other fields.

Acknowledgments

I learned most of what I know about specification from Jean-Raymond Abrial. Much of the material was contained in an earlier article, written in French and published in 1979 in a newsletter [1]. I am grateful to Axel van Lamsweerde for reminding me of the existence of that article and suggesting that it might be of interest to a wider audience (and to him and Jean-Pierre Finance for some heated discussions on specification). I also thank Flaviu Cristian for important comments on a previous version. The referees' comments were also useful.

This article also benefited from the involuntary contributions made by the authors of all the system requirements and other software documentation I have had to struggle with over a number of years.

References

1. Bertrand Meyer, "Sur le Formalisme dans les Spécifications," Globule, Newsletter of the AFCET (French Computer Society) Working Group on Software Engineering, No. 1, 1979, pp. 81-122.
2. Edsger W. Dijkstra, "The Humble Programmer," Comm. ACM, Vol. 15, No. 10, Oct. 1972, pp. 859-866.
3. Bertrand Meyer, "A Basis for the Constructive Approach to Programming," in Information Processing 80 (Proc. IFIP World Computer Congress, Tokyo, Japan, Oct. 6-9, 1980), S. H. Lavington, ed., North-Holland, Amsterdam, 1980, pp. 293-298.
4. Flaviu Cristian, "On Exceptions, Failures and Errors," Technology and Science of Informatics, Vol. 4, No. 1, Jan. 1985.
5. Michael A. Jackson, Principles of Program Design, Academic Press, London, 1975.
6. Rod M. Burstall and Joe A. Goguen, "Putting Theories Together to Make Specifications," Proc. Fifth Int'l Joint Conf. Artificial Intelligence, Cambridge, Mass., 1977, pp. 1045-1058.
7. I. D. Hill, "Wouldn't It Be Nice If We Could Write Computer Programs In Ordinary English - Or Would It?" BCS Computer Bulletin, Vol. 16, No. 6, June 1972, pp. 306-312.
8. Oulipo, Ouvroir de Littérature Potentielle, Gallimard, Paris, 1967.
9. Douglas T. Ross and Kenneth E. Schoman, Jr., "Structured Analysis for Requirements Definition," IEEE Trans. Software Engineering, Vol. SE-3, No. 1, Jan. 1977, pp. 6-15.
10. Mack W. Alford, "A Requirements Engineering Methodology for Real-Time Processing Requirements," IEEE Trans. Software Engineering, Vol. SE-3, No. 1, Jan. 1977, pp. 60-69.
11. Jean-Raymond Abrial, Stephen A. Schuman, and Bertrand Meyer, "A Specification Language," in On the Construction of Programs, R. McNaughten and R. C. McKeag, eds., Cambridge University Press, 1980.
12. Jean-Raymond Abrial, "The Specification Language Z: Syntax and Semantics," Oxford University Computing Laboratory, Programming Research Group, Oxford, Apr. 1980.
13. Jean-Raymond Abrial and Stephen A. Schuman, "Specification of Parallel Processes," in Semantics of Concurrent Computation (Proc. Int'l Symp., Evian, France, July 2-4, 1979), Gilles Kahn, ed., Springer-Verlag, Berlin-New York, 1979.
14. Carroll Morgan and Bernard Sufrin, "Specification of the Unix File System," IEEE Trans. Software Engineering, Vol. SE-10, No. 2, Mar. 1984, pp. 128-142.
15. Bernard Sufrin, "Formal Specification of a Display-Oriented Text Editor," Science of Computer Programming, Vol. 1, No. 2, May 1982.
16. Bertrand Meyer, "A System Description Method," in Int'l Workshop on Models and Languages for Software Specification and Design, Robert G. Babb II and Ali Mili, eds., Orlando, Fla., Mar. 1984, pp. 42-46.

Bertrand Meyer is a visiting associate professor at the University of California, Santa Barbara, on leave from Electricité de France, where he was division head in the research and development branch. He is interested both in doing research and in making the results of research useful to practitioners. He has published papers on programming methodology, software tools and environments, specification, interactive systems and user interfaces, algorithms, programming languages, and supercomputer programming, as well as a compendium on programming methodology and techniques, Méthodes de Programmation (Eyrolles, Paris, 1978, with Claude Baudoin). He is the editor-in-chief of the French journal Technology and Science of Informatics, a member of ACM, AFCET and the IEEE, and a former member of the AFCET council.

Meyer's address is Department of Computer Science, University of California, Santa Barbara, CA 93106.
