The Syntax of Programming Languages: A Survey

ROBERT W. FLOYD

Manuscript received February 3, 1964; revised May 4, 1964. The author is with Computer Associates, Inc., Wakefield, Mass.

Summary: The syntactic rules for many programming languages have been expressed by formal grammars, generally variants of phrase-structure grammars. The syntactic analysis essential to translation of programming languages can be done entirely mechanically for such languages. Major problems remain in rendering analyzers efficient in use of space and time and in finding fully satisfactory formal grammars for present and future programming languages.

INTRODUCTION

In recent years, few programming languages designed for widespread use have escaped having the more orderly part of their formation rules and restrictions presented in one of several simple tabular forms, somewhat like the axioms of a formal mathematical system. ALGOL, JOVIAL, FORTRAN, NELIAC, COBOL, BALGOL, MAC, APT, and their offshoots have all been defined in such a fashion (see Sections A and B of the bibliography). For some of these languages, the formalism is easy and natural. For others, it is not; FORTRAN [A9] suffers needlessly, bound in the unaccustomed corsetry of her younger rival's design. Whatever the merits of formal grammars in general, some languages are best defined in words. Where formal grammars are appropriate, however, mathematical and linguistic analysis provides compilers of lower cost and high reliability, and theoretical knowledge about the structure and value of the language itself.

PHRASE-STRUCTURE GRAMMARS

The most representative and fruitful example of the use of a formal grammar in defining a programming language is the use of a phrase-structure grammar to specify most of the syntactic rules of ALGOL 60 [A7], [A8], [B1], [B5]. The form for grammatical rules used in the report which officially defines ALGOL 60 is typified [A1] by

    <for statement> ::= <for clause> <statement> | <label>: <for statement>

This assertion can be read "A for-statement is defined to be a for-clause followed by a statement, or a label followed by a colon ':' followed by a for-statement." The symbol '::=' stands for 'is defined to be'; '|' stands for 'or' and is used to separate alternative forms of the definiendum. The angular brackets '< >' are used to enclose each name of a phrase type, distinguishing it as a name, rather than the thing named. This is the reverse of the way quotation marks are used in English, for the same purpose, to distinguish

    The baby can say "one word"

from

    The baby can say one word.

Names of phrase types appearing in the text are hyphenated to show explicitly that the separate words of a name need have no individual meaning and that the name as a whole is used as a technical term, without such connotations as its individual words may suggest. The angular brackets enclosing a name in a grammatical rule share this function. A complete set of grammatical rules for a language written in a format equivalent to that of the example is a phrase-structure grammar (PSG); a language definable by a PSG is a phrase-structure language (PSL).¹

¹ The type of grammar described here is sometimes called a context-free phrase-structure grammar, as distinguished from a more general type of grammar, the context-dependent phrase-structure grammar. The latter has no known applications to programming languages, the term "phrase-structure" is not necessarily appropriate for a context-dependent grammar, and the term "context-free" has certain misleading implications; we will therefore use the short term "phrase-structure grammar" for what is sometimes also called a context-free phrase-structure grammar [C6], simple phrase structure grammar [C1], or Type 2 grammar [C4].

In general, a phrase-structure grammar, taken as a set of definitions, provides a list of alternative constructions in a definition for each syntactic type, where each construction is a list of characters and syntactic type names. A construction represents the set of phrases which can be formed by replacing each syntactic type name with a phrase of that type; the phrases of a certain type are all those represented by some construction in the definition of that type. There is usually a single syntactic type, called "program" (or "sentence"), which is used in the definition of no other type; the set of phrases of this type is the language defined by the grammar. On the one hand, PSG's can define some languages of considerable complexity; on the other, such simple sets of strings as that consisting of 'abc,' 'aabbcc,' 'aaabbbccc,' etc., are demonstrably not definable by any phrase-structure grammar [C4].

It is evident that a complete definition of a programming language may be expressed far more concisely by a PSG than by the corresponding English sentences and that it is humanly impossible to read or write those sentences, with their hundreds of occurrences of 'is defined to be,' 'followed by,' and 'or.' If a phrase-structure grammar is nearly adequate to define a language, then most of the rules defining the language can be neatly and compactly listed without explanation, conserving space, time, and clarity; attention may be concentrated on the few syntactic rules which do not fit the pattern of phrase definitions.

As mentioned above, the rules of a PSG are analogous to axioms. One who has somehow obtained a program and an understanding of its structure can use a PSG to prove the program is well formed and to demonstrate the structure to others. The usefulness of a PSG to a programmer writing in the language, or to the compiler which translates it into machine language coding, is less apparent. A grammar does not tell us how to synthesize a specific program; it does not tell us how to analyze a particular given program.²

In order to construct programs in a phrase-structure language, one may interpret every rule of its grammar as a permit to perform certain acts of substitution. Assigning a symbol to each syntactic type of a grammar, let us interpret each rule as allowing the substitution, for the definiendum, of any one of the alternative definientes. Applying these substitution rules repeatedly to the symbol designating the syntactic type 'program' or 'sentence,' we arrive eventually at a sequence of symbols in which no further substitutions can take place; this string is a program or sentence in the language, the process by which it was produced being an abbreviated proof of its sentencehood. The symbols designating syntactic types, for which substitutions may be made, are called nonterminal characters; those undefined symbols which form sentences are the terminal characters.

Take, for instance, the grammar:

    a) <sentence> → <noun><predicate>
    b) <predicate> → <verb><noun>
    c) <noun> → John | Mary
    d) <verb> → loves

Successive substitution, starting with <sentence>, gives the sequence

    1) <sentence>
    2) <noun><predicate>
    3) John <predicate>
    4) John <verb><noun>
    5) John <verb> Mary
    6) John loves Mary

This sequence, a derivation of the sentence "John loves Mary," is an abbreviation of the following proof:

    1) Any sentence is a sentence.
    a) A noun followed by a predicate is a sentence.
    2) Any noun followed by any predicate is a sentence.
    c) 'John' is a noun.
    3) 'John' followed by any predicate is a sentence.
    b) A verb followed by a noun is a predicate.
    4) 'John' followed by any verb followed by any noun is a sentence.
    5) 'John' followed by any verb followed by 'Mary' is a sentence.
    d) 'Loves' is a verb.
    6) 'John loves Mary' is a sentence.

From this point of view, a sentence in a phrase-structure language is the last line of a derivation from the symbol '<sentence>,' provided that no further substitutions are possible.³ The grammar is regarded not as an axiom scheme for validating sentences but as a device for generating them. When a PSG is considered as a generative grammar, its rules are commonly called productions. The two viewpoints are substantially equivalent, but the generative viewpoint, by making explicit the process by which sentences are constructed, makes the grammar a more tractable object of study. A writer of programs in a PSL can now be thought of as a device to generate sentences, with choices between alternatives governed, for example, by the structure of a flow chart of the program. Not enough is known about linguistic behavior to specify the mechanism of choice in detail.

A compact representation of a derivation is the syntax tree [C3], [G1]; the syntax tree for the derivation above is:

    <sentence>
        <noun>
            John
        <predicate>
            <verb>
                loves
            <noun>
                Mary

In general, a syntax tree is like a genealogical tree for a family whose common ancestor is <sentence>, where the immediate descendants (sons) of a symbol form one of the alternatives of the definition of that symbol and where only the terminal characters fail to have descendants. Such a tree represents a derivation of the sentence formed by its terminal characters. It also illustrates the structure of the sentence; the terminal descendants of any node on the tree form a phrase in the sentence, of the type designated by that node. In a language satisfactorily described by its grammar, the phrases of a sentence are its meaningful units. Some compilers take advantage of this, creating a syntax tree as a structured representation of the information contained in the source program. Suitable processes then translate the tree into a computer program, or a derivation tree for an equivalent sentence in another language or a related sentence in the same language.

SYNTAX-DIRECTED ANALYSIS

A syntax-directed analyzer might be defined as any

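The generative reading of grammar a) through d) under PHRASE-STRUCTURE GRAMMARS can be checked mechanically. The following Python sketch is illustrative only and is not code from the paper; the names `GRAMMAR`, `DERIVATION`, `is_step`, and `generate` are hypothetical. It encodes each rule as a list of alternative constructions, confirms that every line of the derivation 1) through 6) follows from the line before it by a single substitution, and enumerates every sentence the grammar generates.

```python
# A hypothetical encoding of grammar a)-d). Each nonterminal maps to its
# list of alternative constructions; each construction is a tuple of
# symbols. A symbol is treated as nonterminal exactly when it has an
# entry here; every other symbol is a terminal character.
GRAMMAR = {
    "<sentence>": [("<noun>", "<predicate>")],
    "<predicate>": [("<verb>", "<noun>")],
    "<noun>": [("John",), ("Mary",)],
    "<verb>": [("loves",)],
}

# The derivation 1)-6) from the text, one sentential form per line.
DERIVATION = [
    ("<sentence>",),
    ("<noun>", "<predicate>"),
    ("John", "<predicate>"),
    ("John", "<verb>", "<noun>"),
    ("John", "<verb>", "Mary"),
    ("John", "loves", "Mary"),
]


def is_step(before, after, grammar=GRAMMAR):
    """True if `after` results from `before` by replacing one occurrence
    of a nonterminal with one of its alternatives (at any position)."""
    for i, symbol in enumerate(before):
        for alt in grammar.get(symbol, []):
            if before[:i] + alt + before[i + 1:] == after:
                return True
    return False


def generate(grammar=GRAMMAR, start="<sentence>"):
    """Return, sorted, every terminal string derivable from `start`.
    Always expands the leftmost nonterminal. Terminates here because no
    rule of this grammar is recursive, so its language is finite."""
    sentences, pending = set(), [(start,)]
    while pending:
        form = pending.pop()
        # Position of the leftmost nonterminal, or None if there is none.
        i = next((k for k, s in enumerate(form) if s in grammar), None)
        if i is None:
            # No further substitution is possible: `form` is a sentence.
            sentences.add(" ".join(form))
            continue
        for alt in grammar[form[i]]:
            pending.append(form[:i] + alt + form[i + 1:])
    return sorted(sentences)


if __name__ == "__main__":
    pairs = zip(DERIVATION, DERIVATION[1:])
    print(all(is_step(a, b) for a, b in pairs))  # True
    print(generate())
    # ['John loves John', 'John loves Mary', 'Mary loves John', 'Mary loves Mary']
```

Because the derivation in the text expands the second <noun> before <verb> in going from 4) to 5), `is_step` accepts a substitution at any position, not only the leftmost. `generate`, by contrast, always expands the leftmost nonterminal; for a phrase-structure grammar this loses no sentences, since every sentence also has a derivation that substitutes leftmost first.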

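A syntax tree for a given sentence can likewise be recovered mechanically from grammar a) through d). The sketch below assumes one simple technique, chosen for brevity: a top-down analyzer that tries each alternative in turn and backtracks on failure. It is an illustration under that assumption, not a method attributed to the paper, and it cannot handle left-recursive rules. It redefines the hypothetical `GRAMMAR` so the block stands alone, represents a tree as a pair (type, subtrees) with terminal characters as plain strings, and lists the phrase covered by each node, that is, its terminal descendants. The function names `parse`, `parse_seq`, `analyze`, `leaves`, `phrases`, and `show` are likewise hypothetical.

```python
# The same hypothetical encoding of grammar a)-d), repeated so this
# block stands alone. Symbols without an entry are terminal characters.
GRAMMAR = {
    "<sentence>": [("<noun>", "<predicate>")],
    "<predicate>": [("<verb>", "<noun>")],
    "<noun>": [("John",), ("Mary",)],
    "<verb>": [("loves",)],
}


def parse(symbol, words, pos, grammar):
    """Yield (tree, next_pos) for each way `symbol` can cover words
    starting at index `pos`. A tree is either a terminal string or a
    pair (nonterminal, list_of_subtrees)."""
    if symbol not in grammar:
        # Terminal character: it must match the next word exactly.
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for alt in grammar[symbol]:
        for subtrees, end in parse_seq(alt, words, pos, grammar):
            yield (symbol, subtrees), end


def parse_seq(symbols, words, pos, grammar):
    """Yield (subtrees, next_pos) for each way a construction (a tuple
    of symbols) can cover words starting at `pos`. Because these are
    generators, a failure later in the sequence resumes the search
    with the next alternative for an earlier symbol (backtracking)."""
    if not symbols:
        yield [], pos
        return
    for first, mid in parse(symbols[0], words, pos, grammar):
        for rest, end in parse_seq(symbols[1:], words, mid, grammar):
            yield [first] + rest, end


def analyze(sentence, grammar=GRAMMAR, start="<sentence>"):
    """Return the first syntax tree covering the whole sentence, or None."""
    words = sentence.split()
    for tree, end in parse(start, words, 0, grammar):
        if end == len(words):
            return tree
    return None


def leaves(tree):
    """The terminal descendants of a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for sub in tree[1] for word in leaves(sub)]


def phrases(tree):
    """(type, phrase) for every nonterminal node, in preorder."""
    if isinstance(tree, str):
        return []
    result = [(tree[0], " ".join(leaves(tree)))]
    for sub in tree[1]:
        result.extend(phrases(sub))
    return result


def show(tree, depth=0):
    """Render the tree as an indented outline, one node per line."""
    if isinstance(tree, str):
        return ["    " * depth + tree]
    lines = ["    " * depth + tree[0]]
    for sub in tree[1]:
        lines.extend(show(sub, depth + 1))
    return lines


if __name__ == "__main__":
    tree = analyze("John loves Mary")
    print("\n".join(show(tree)))
    for kind, phrase in phrases(tree):
        print(kind, "covers", repr(phrase))
    print(analyze("Mary John loves"))  # None
```

For "John loves Mary", `show` prints the tree as an indented outline, one node per line, and `phrases` reports that the <noun> node covers 'John', the <predicate> node covers 'loves Mary', and the root <sentence> covers the whole string; an ungrammatical string such as "Mary John loves" yields None. Backtracking keeps the sketch short but can take time exponential in the input length on less tidy grammars.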