Defining the Undefinedness of C
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
1.1 Introduction to C Language
1.1 Introduction to C Language 1 Department of CSE Objectives • To understand the structure of a C-Language Program • To write a minimal C program • To introduce the include preprocessor command • To be able to create good identifiers for quantities in a program • To be able to list, describe and use the basic data types in C • To be able to create and use variables and constants in a C program 2 Department of CSE Agenda • Background of C Language • Structure of a C program • C Comments • Identifiers in C • Data types in C • Variables in C • Constants in C 3 Department of CSE Background of C • C is a middle level language it combines the best elements of high-level languages with the control and flexibility of assembly language • Like most modern languages, C is also derived from ALGOL 60 (1960) • Developed by Dennis Ritchie in 1972 using many concepts from its predecessors – ALGOL,BCPL and B and by adding the concept of data types • American National Standards Institute (ANSI) began the standardization of C in 1983 which was approved in 1989 • In 1990 International Standards Organization (ISO) adopted the ANSI standard version known as C89 • Minor changes were made to C89 in 1995 which came to be known as C95 • Much more significant updates were made in 1999 and named as C99 4 Department of CSE Features of C • C is a structured programming language which allows compartmentalization of code and data • A structured language offers a variety of programming possibilities. For example, structured languages typically support several loop constructs, such as while, do-while, and for. -
Undefined Behaviour in the C Language
FAKULTA INFORMATIKY, MASARYKOVA UNIVERZITA Undefined Behaviour in the C Language BAKALÁŘSKÁ PRÁCE Tobiáš Kamenický Brno, květen 2015 Declaration Hereby I declare, that this paper is my original authorial work, which I have worked out by my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source. Vedoucí práce: RNDr. Adam Rambousek ii Acknowledgements I am very grateful to my supervisor Miroslav Franc for his guidance, invaluable help and feedback throughout the work on this thesis. iii Summary This bachelor’s thesis deals with the concept of undefined behavior and its aspects. It explains some specific undefined behaviors extracted from the C standard and provides each with a detailed description from the view of a programmer and a tester. It summarizes the possibilities to prevent and to test these undefined behaviors. To achieve that, some compilers and tools are introduced and further described. The thesis contains a set of example programs to ease the understanding of the discussed undefined behaviors. Keywords undefined behavior, C, testing, detection, secure coding, analysis tools, standard, programming language iv Table of Contents Declaration ................................................................................................................................ ii Acknowledgements .................................................................................................................. iii Summary ................................................................................................................................. -
Defining the Undefinedness of C
Technical Report: Defining the Undefinedness of C Chucky Ellison Grigore Ros, u University of Illinois {celliso2,grosu}@illinois.edu Abstract naturally capture undefined behavior simply by exclusion, be- This paper investigates undefined behavior in C and offers cause of the complexity of undefined behavior, it takes active a few simple techniques for operationally specifying such work to avoid giving many undefined programs semantics. behavior formally. A semantics-based undefinedness checker In addition, capturing the undefined behavior is at least as for C is developed using these techniques, as well as a test important as capturing the defined behavior, as it represents suite of undefined programs. The tool is evaluated against a source of many subtle program bugs. While a semantics other popular analysis tools, using the new test suite in of defined programs can be used to prove their behavioral addition to a third-party test suite. The semantics-based tool correctness, any results are contingent upon programs actu- performs at least as well or better than the other tools tested. ally being defined—it takes a semantics capturing undefined behavior to decide whether this is the case. 1. Introduction C, together with C++, is the king of undefined behavior—C has over 200 explicitly undefined categories of behavior, and A programming language specification or semantics has dual more that are left implicitly undefined [11]. Many of these duty: to describe the behavior of correct programs and to behaviors can not be detected statically, and as we show later identify incorrect programs. The process of identifying incor- (Section 2.6), detecting them is actually undecidable even rect programs can also be seen as describing which programs dynamically. -
ACCU 2015 “New” Features in C
"New" Features in C ACCU 2015 “New” Features in C Dan Saks Saks & Associates www.dansaks.com 1 Abstract The first international standard for the C programming language was C90. Since then, two newer standards have been published, C99 and C11. C99 introduced a significant number of new features. C11 introduced a few more, some of which have been available in compilers for some time. Curiously, many of these added features don’t seem to have caught on. Many C programmers still program in C90. This session explains many of these “new” features, including declarations in for-statements, typedef redefinitions, inline functions, complex arithmetic, extended integer types, variable- length arrays, flexible array members, compound literals, designated initializers, restricted pointers, type-qualified array parameters, anonymous structures and unions, alignment support, non-returning functions, and static assertions. 2 Copyright © 2015 by Daniel Saks 1 "New" Features in C About Dan Saks Dan Saks is the president of Saks & Associates, which offers training and consulting in C and C++ and their use in developing embedded systems. Dan has written columns for numerous print publications including The C/C++ Users Journal , The C++ Report , Software Development , and Embedded Systems Design . He currently writes the online “Programming Pointers” column for embedded.com . With Thomas Plum, he wrote C++ Programming Guidelines , which won a 1992 Computer Language Magazine Productivity Award . He has also been a Microsoft MVP. Dan has taught thousands of programmers around the world. He has presented at conferences such as Software Development and Embedded Systems , and served on the advisory boards for those conferences. -
The Art, Science, and Engineering of Fuzzing: a Survey
1 The Art, Science, and Engineering of Fuzzing: A Survey Valentin J.M. Manes,` HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J. Schwartz, and Maverick Woo Abstract—Among the many software vulnerability discovery techniques available today, fuzzing has remained highly popular due to its conceptual simplicity, its low barrier to deployment, and its vast amount of empirical evidence in discovering real-world software vulnerabilities. At a high level, fuzzing refers to a process of repeatedly running a program with generated inputs that may be syntactically or semantically malformed. While researchers and practitioners alike have invested a large and diverse effort towards improving fuzzing in recent years, this surge of work has also made it difficult to gain a comprehensive and coherent view of fuzzing. To help preserve and bring coherence to the vast literature of fuzzing, this paper presents a unified, general-purpose model of fuzzing together with a taxonomy of the current fuzzing literature. We methodically explore the design decisions at every stage of our model fuzzer by surveying the related literature and innovations in the art, science, and engineering that make modern-day fuzzers effective. Index Terms—software security, automated software testing, fuzzing. ✦ 1 INTRODUCTION Figure 1 on p. 5) and an increasing number of fuzzing Ever since its introduction in the early 1990s [152], fuzzing studies appear at major security conferences (e.g. [225], has remained one of the most widely-deployed techniques [52], [37], [176], [83], [239]). In addition, the blogosphere is to discover software security vulnerabilities. At a high level, filled with many success stories of fuzzing, some of which fuzzing refers to a process of repeatedly running a program also contain what we consider to be gems that warrant a with generated inputs that may be syntactically or seman- permanent place in the literature. -
Software II: Principles of Programming Languages
Software II: Principles of Programming Languages Lecture 6 – Data Types Some Basic Definitions • A data type defines a collection of data objects and a set of predefined operations on those objects • A descriptor is the collection of the attributes of a variable • An object represents an instance of a user- defined (abstract data) type • One design issue for all data types: What operations are defined and how are they specified? Primitive Data Types • Almost all programming languages provide a set of primitive data types • Primitive data types: Those not defined in terms of other data types • Some primitive data types are merely reflections of the hardware • Others require only a little non-hardware support for their implementation The Integer Data Type • Almost always an exact reflection of the hardware so the mapping is trivial • There may be as many as eight different integer types in a language • Java’s signed integer sizes: byte , short , int , long The Floating Point Data Type • Model real numbers, but only as approximations • Languages for scientific use support at least two floating-point types (e.g., float and double ; sometimes more • Usually exactly like the hardware, but not always • IEEE Floating-Point Standard 754 Complex Data Type • Some languages support a complex type, e.g., C99, Fortran, and Python • Each value consists of two floats, the real part and the imaginary part • Literal form real component – (in Fortran: (7, 3) imaginary – (in Python): (7 + 3j) component The Decimal Data Type • For business applications (money) -
Intel ® Fortran Programmer's Reference
® Intel Fortran Programmer’s Reference Copyright © 1996-2003 Intel Corporation All Rights Reserved Issued in U.S.A. Version Number: FWL-710-02 World Wide Web: http://developer.intel.com Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSO- EVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PAR- TICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications. This Intel® Fortran Programmer’s Reference as well as the software described in it is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this manual is furnished for infor- mational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corpora- tion. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "unde- fined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibil- ities arising from future changes to them. -
A Simple, Possibly Correct LR Parser for C11 Jacques-Henri Jourdan, François Pottier
A Simple, Possibly Correct LR Parser for C11 Jacques-Henri Jourdan, François Pottier To cite this version: Jacques-Henri Jourdan, François Pottier. A Simple, Possibly Correct LR Parser for C11. ACM Transactions on Programming Languages and Systems (TOPLAS), ACM, 2017, 39 (4), pp.1 - 36. 10.1145/3064848. hal-01633123 HAL Id: hal-01633123 https://hal.archives-ouvertes.fr/hal-01633123 Submitted on 11 Nov 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. 14 A simple, possibly correct LR parser for C11 Jacques-Henri Jourdan, Inria Paris, MPI-SWS François Pottier, Inria Paris The syntax of the C programming language is described in the C11 standard by an ambiguous context-free grammar, accompanied with English prose that describes the concept of “scope” and indicates how certain ambiguous code fragments should be interpreted. Based on these elements, the problem of implementing a compliant C11 parser is not entirely trivial. We review the main sources of difficulty and describe a relatively simple solution to the problem. Our solution employs the well-known technique of combining an LALR(1) parser with a “lexical feedback” mechanism. It draws on folklore knowledge and adds several original as- pects, including: a twist on lexical feedback that allows a smooth interaction with lookahead; a simplified and powerful treatment of scopes; and a few amendments in the grammar. -
Eliminating Scope and Selection Restrictions in Compiler Optimizations
ELIMINATING SCOPE AND SELECTION RESTRICTIONS IN COMPILER OPTIMIZATION SPYRIDON TRIANTAFYLLIS ADISSERTATION PRESENTED TO THE FACULTY OF PRINCETON UNIVERSITY IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY RECOMMENDED FOR ACCEPTANCE BY THE DEPARTMENT OF COMPUTER SCIENCE SEPTEMBER 2006 c Copyright by Spyridon Triantafyllis, 2006. All Rights Reserved Abstract To meet the challenges presented by the performance requirements of modern architectures, compilers have been augmented with a rich set of aggressive optimizing transformations. However, the overall compilation model within which these transformations operate has remained fundamentally unchanged. This model imposes restrictions on these transforma- tions’ application, limiting their effectiveness. First, procedure-based compilation limits code transformations within a single proce- dure’s boundaries, which may not present an ideal optimization scope. Although aggres- sive inlining and interprocedural optimization can alleviate this problem, code growth and compile time considerations limit their applicability. Second, by applying a uniform optimization process on all codes, compilers cannot meet the particular optimization needs of each code segment. Although the optimization process is tailored by heuristics that attempt to a priori judge the effect of each transfor- mation on final code quality, the unpredictability of modern optimization routines and the complexity of the target architectures severely limit the accuracy of such predictions. This thesis focuses on removing these restrictions through two novel compilation frame- work modifications, Procedure Boundary Elimination (PBE) and Optimization-Space Ex- ploration (OSE). PBE forms compilation units independent of the original procedures. This is achieved by unifying the entire application into a whole-program control-flow graph, allowing the compiler to repartition this graph into free-form regions, making analysis and optimization routines able to operate on these generalized compilation units. -
A Theory of Type Qualifiers
A Theory of Type Qualifiers∗ Jeffrey S. Foster Manuel F¨ahndrich Alexander Aiken [email protected] [email protected] [email protected] EECS Department University of California, Berkeley Berkeley, CA 94720-1776 Abstract Purify [Pur]), which test for properties in a particular pro- gram execution. We describe a framework for adding type qualifiers to a lan- A canonical example of a type qualifier from the C world guage. Type qualifiers encode a simple but highly useful is the ANSI C qualifier const. A variable with a type an- form of subtyping. Our framework extends standard type notated with const can be initialized but not updated.1 A rules to model the flow of qualifiers through a program, primary use of const is annotating pointer-valued function where each qualifier or set of qualifiers comes with addi- parameters as not being updated by the function. Not only tional rules that capture its semantics. Our framework al- is this information useful to a caller of the function, but it lows types to be polymorphic in the type qualifiers. We is automatically verified by the compiler (up to casting). present a const-inference system for C as an example appli- Another example is Evans's lclint [Eva96], which intro- cation of the framework. We show that for a set of real C duces a large number of additional qualifier-like annotations programs, many more consts can be used than are actually to C as an aid to debugging memory usage errors. One present in the original code. such annotation is nonnull, which indicates that a pointer value must not be null. -
C and C++ Programming for the Vector Processor
C AND C++ PROGRAMMING FOR THE VECTOR PROCESSOR Geir Johansen, Cray Inc., Mendota Heights, Minnesota, USA Abstract: The Cray Standard C and C++ compilers have the ability to perform many types of optimizations to improve the performance of the code. This paper outlines strategies for C and C++ programming that will give the Cray Standard C and C++ compilers a greater opportunity to generate optimized code. 1.0 Scope of the Paper PDGCS (Program Dependence Graph Compiling System) is the area of the compiler The purpose of this is to give guidance to C that performs most of the optimization of the and C++ developers in writing code, so that the code. PDGCS has been a part of the compiler Cray compiler will produce better optimized since the Cray YMP. The Cray C/C++ compiler code. An algorithm can be coded in several greatly benefits from the fact that PDGCS is also different ways; however, the Cray compiler can the backend for the Cray Fortran compiler. The more easily optimize certain styles of coding. Cray C/C++ compiler leverages the extensive The more optimization that is initially performed testing and development to PDGCS for the Cray by the compiler, the less time the user needs to Fortran compiler. Optimizations that PDGCS spend on analyzing and manually optimizing the perform include: code. The paper is not intended for a specific Cray architecture, so in general, machine specific ∑ Reduction loops optimization issues such as memory contention ∑ Loop fusing are not discussed. Also, the paper does not ∑ Loop unrolling discuss optimization techniques, such as loop ∑ Loop unwinding fusing, that can be performed by the compiler. -
Towards Restrict-Like Aliasing Semantics for C++ Abstract
Document Number: N3988 Date: 2014-05-23 Authors: Hal Finkel, [email protected] Hubert Tong, [email protected] Chandler Carruth, [email protected], Clark Nelson, [email protected], Daveed Vandevoode, [email protected], Michael Wong, [email protected] Project: Programming Language C++, EWG Reply to: [email protected] Revision: 1 Towards restrict-like aliasing semantics for C++ Abstract This paper is a follow-on to N3635 after a design discussion with many compiler implementers at Issaquah, and afterwards by telecon, and specifically proposes a new alias_set attribute as a way to partition type-based aliasing to enable more fine grain control, effectively giving C++ a more complete restrict-like aliasing syntax and semantics. Changes from previous papers: N3988: this paper updates with syntax and semantics N3635: Outlined initial idea on AliasGroup which became alias_set. The alternative of newType was rejected as too heavy weight of a solution. 1. Problem Domain There is no question that the restrict qualifier benefits compiler optimization in many ways, notably allowing improved code motion and elimination of loads and stores. Since the introduction of C99 restrict, it has been provided as a C++ extension in many compilers. But the feature is brittle in C++ without clear rules for C++ syntax and semantics. Now with the introduction of C++11, functors are being replaced with lambdas and users have begun asking how to use restrict in the presence of lambdas. Given the existing compiler support, we need to provide a solution with well-defined C++ semantics, before use of some common subset of C99 restrict becomes widely used in combination with C++11 constructs.