Denotational Semantics of ANSI C

Denotational Semantics of ANSI C

Computer Standards & Interfaces 23Ž. 2001 169–185 www.elsevier.comrlocatercsi Denotational semantics of ANSI C Nikolaos S. Papaspyrou) Department of Electrical and Computer Engineering, Software Engineering Laboratory, National Technical UniÕersity of Athens, Polytechnioupoli, 15780 Zografou, Athens, Greece Received 20 September 2000; received in revised form 28 December 2000; accepted 5 January 2001 Abstract The semantics of C is described in the ANSIrISO standard using natural language. This paper contains a brief summary, more descriptive than technical, of our research in specifying a complete and accurate formal semantics for ANSI C. We follow the denotational approach and divide the specification in three distinct phases: static, typing and dynamic semantics. Moreover, we have developed a direct implementation of the semantics, using the programming language Haskell. We argue that our formal specification results in a better understanding of the semantics of ANSI C and comment on its readability, precision, abstraction and applications. q 2001 Elsevier Science B.V. All rights reserved. Keywords: ANSI C programming language; ISOrIEC 9899:1999 standard; Formal definition; Denotational semantics; Monads 1. Introduction which is currently considered as the official language documentationwx 15 . The standard is nowadays ac- C is a well-known and very popular general pur- cepted as a common basis by the developers and the pose programming language which represents, to- users of C implementations and other tools. gether with its descendants, a strong and indisputable Every person who uses a programming language status quo in the current software industry. It is a to develop programs must understand its semantics medium-level language, mainly characterized by its at some level of abstraction. Programmers usually economy of expression, its large set of operators and understand the semantics by means of examples, data types, and its concern for source code portabil- intuition and descriptions in natural language. Such ity. The general feeling towards C can probably be semantic descriptions are informal and typically best summarized in a statement made by its inventor, based on a set of assumptions about the reader’s Dennis Ritchie: AC is quirky, flawed and an enor- knowledge, understanding and an agreed upon com- mous successB wx32 . In 1990, ISOrIEC 9899:1990 mon interpretation of terms. Informal semantic de- was adopted as the first standard for the ANSI C scriptions are inherently ambiguous, as is always the programming languagewx 14 . It was later amended by case with natural languages. In the best case, a a few complementary documents. A long review programmer’s intuition fills the missing points in the process completed in 1999, resulting in the revised description and leads to the correct understanding of standard ISOrIEC 9899:1999, nicknamed AC9XB, a language’s semantics. In the worst case, the de- scription is fatally ambiguous or even misleading ) Tel.: q30-1-772-2486; fax: q30-1-772-2519. and the programmer is prone to misinterpretations, E-mail address: [email protected]Ž. N.S. Papaspyrou . which often lead to programming errors. 0920-5489r01r$ - see front matter q 2001 Elsevier Science B.V. All rights reserved. PII: S0920-5489Ž. 01 00059-9 170 N.S. PapaspyrourComputer Standards & Interfaces 23() 2001 169–185 The semantics of ANSI C is informally defined tics for C is given as an evolving algebra. Again, a in the standardwx 15 using natural language. This number of simplifications are made, e.g. no inter- causes a number of ambiguities and problems of leaving is possible in expression evaluation and side interpretation, clearly manifested in numerous dis- effects are assumed to take place at the same time cussions which often take place in the news-group that they are generated. In the work of Cook and It is worth noticing that members Subramanianwx 7 , an incomplete semantics for C is of the standardization committee and other distin- developed in the theorem prover Nqthm. Cook et al. guished researchers participating in the discussions wx6 have also developed a denotational semantics for often give contradictory answers when asked about C based on temporal logic, which again makes a the intended semantics of surprisingly small pro- number of simplifying assumptions mainly concern- grams, and that their answers are usually based on ing evaluation order. Finally, in the work of Norrish different possible interpretations of the standard. With wx27 , a complete operational semantics for C is given all this in mind, the necessity for a formal descrip- using small-step reductions. To the best of our tion of the semantics of C becomes apparent. Such a knowledge, this is the only approach that formalizes description would serve as a rigid mathematical correctly C’s unspecified order of evaluation and model for the language, without restricting the tech- sequence points. No similar denotational approach is niques used in implementations. Moreover, it would known to us. provide a basis for reasoning about properties of C In the present paper, we summarize the results of programs and would be a valuable tool for the our researchwx 28 , aiming at the development of a analysis, evaluation and possible redesign of the complete and accurate formal description for the language. semantics of ANSI C. For this purpose, we have The semantics of many popular programming lan- chosen the denotational approachwx 26,35 and employ guages have been formally specified in literature monads and monad transformerswx 20,23,38 in order using various formalisms. However, in most cases to improve the modularity and elegance of the devel- the specifications are incomplete, inaccurate or both, oped semantics. Our formalization is based on the to some extent. By incomplete we mean that they do 1990 version of the standard and all references to not specify the semantics of the whole language but paragraphs and pages are with respect to that version that of a subset, often leaving out the most compli- wx14 . cated features. By inaccurate we mean that the The rest of the paper is structured as follows. In formal descriptions are not entirely correct, either Section 2, we identify a few common misunderstand- because of intended simplifications or by mistake. ings concerning the semantics of C. We discuss their Few real programming languages, i.e. high-level lan- causes, consequences and to what extent they can be guages that are widely used in industry for software attributed to faults and weeknesses in the language’s development, have been given formal semantics as standard. Section 3 contains aŽ more descriptive than part of their definition. Among them one should technical. summary of our approach, dividing the mention Schemewx 1,12,13 and Standard ML wx 22,41 . specification of C’s formal semantics in three dis- Other languages that have been at least partly for- tinct phasesŽ. static, typing and dynamic semantics malized include: Adawx 25,30 ; Algol 60 wx 2,10 ; Cqq and illustrating their collaboration. In Section 4, we wx39,40 ; Cobol wx 37 ; Modula-2 wx 9,24 ; PLr1 present an evaluation of our semantics and comment wxwxwx21,31,42 ; Pascal 3,11,36 ; Prolog 5 ; and Smalltalk on its possible applications. Finally, in Section 5, we 80wx 4,43 . summarize the contribution of our research and show Significant research has been conducted recently directions for future work. concerning the semantics of C. In what seems to be the earliest formal approach, Sethiwx 33,34 addresses mainly the semantics of pre-ANSI C declarations 2. Misinterpretations in the semantics of C and control structures, using the denotational ap- proach and making several simplifications. In the C is probably the most widely spread program- work of Gurevich and Hugginswx 8 , a formal seman- ming language in today’s software industry. We N.S. PapaspyrourComputer Standards & Interfaces 23() 2001 169–185 171 believe, however, that there is a large number of lent. A question that may seem easy at first is: What programmers who are confident of their understand- are the members of the structure pointed by ? ing of C, but whose understanding is unfortunately Possible candidates are obviously and , de- subjective and incorrect, i.e. they do not understand pending on which of the declarations for structure the language in the way that is intended in the is in effect at line 4. But which one is it? The correct standard. To support this opinion, we present four answer is that the two programs are not equivalent simple program segments that are sources of com- and the only member of the structure is in the mon misinterpretations among C programmers. In case of segmentŽ. A , and in the case of segment the first three cases, taken respectively from the Ž.B . The rationale behind this answer can be found in areas of static, typing and dynamic semantics as Section 6.5.2.3 of the standard. In brief, line 3 of distinguished in our research, all doubts vanish when segmentŽ. B defines a new incomplete structure type one reads the standard carefully. and overrides the definition of in the enclosing scope, whereas line 3 of segmentŽ. A does not have 2.1. Case 1 the same effect. Consider the two small program segments shown 2.2. Case 2 in Fig. 1. The two segments differ only in the Next, consider the following program segment, presence of identifier in line 3. This identi- which is intended to increase the contents of a fier is never used within function and therefore, variable representing the number of counted men or one might assume that the two segments are equiva- women, depending on the value of a boolean flag. The question now is: Is this program segment Before attempting an answer, one should be fa- legal? miliar with some key notions regarding the seman- The answer depends on whether a conditional tics of C.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    17 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us