
MASARYK UNIVERSITY
FACULTY OF INFORMATICS

Undefined behaviour in language C

MASTER’S THESIS

Branislav Koščák

Brno, autumn 2014

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Branislav Koščák

Advisor: Mgr. Marek Grác, Ph.D.

Acknowledgement

I am very grateful to my supervisor Miroslav Franc for his guidance, invaluable help and feedback throughout the work on this thesis.

Abstract

This Master’s Thesis deals with the concept of undefined behaviour in the C language from the point of view of a tester and a developer. It further describes what the purpose of undefined behaviour is and why it is dangerous, and it provides a summary of the current state of prevention and detection of undefined behaviour. Blind spots for which no tool is currently available are identified. The thesis contains a set of sample programs.

Keywords

undefined behaviour, C, testing, programming language, exploit, standard, detection

Contents

1 Introduction
1.1 ISO requirements
1.2 Analysis Tools
1.3 Goals
2 Undefined Behaviour
2.1 Reason for Undefined Behaviour
2.1.1 Performance – compile/run-time
2.1.2 Support of HW portability
2.2 Drawbacks of Undefined Behaviour
3 Analyzed Undefined Behaviours
3.1 Object Lifetime
3.1.1 Detection
3.2 Integer Overflow
3.2.1 Detection
3.3 Integer Conversion
3.3.1 Detection
3.4 Void Expression
3.5 Pointer Conversion
3.6 Pointer to Function
3.6.1 Detection
3.7 Modifying String Literal
3.8 Sequence Points
3.8.1 Detection
3.9 Function Prototype
3.10 Pointer Type
3.11 Use of Library Functions
3.11.1 Detection
3.12 Pointer Arithmetic
3.12.1 Detection
3.13 Shifting Overflow
3.13.1 Detection
3.14 Pointers Comparison
3.14.1 Detection
3.15 Integer Constant Expression
3.16 Longjmp/Setjmp
3.16.1 Detection
3.17 Multiple External Definitions
3.18 Pointer to FILE
3.19 Output Functions
3.20 Copy of Overlapping Objects
3.20.1 Detection
4 Conclusion
Bibliography
A Shared Library Implementation
B Content of the CD-ROM

1 Introduction

Nowadays there are various types of programming languages available. They differ in programming style (paradigm), in the syntax and semantics they use, in whether they are strongly or weakly typed, and in other characteristics. Naturally, a single programming language cannot handle all tasks equally efficiently. Performing quick queries on a large database requires a different approach than low-level GPU programming. Every language has its advantages and limitations. Mastering both minimizes errors and leads to better performance and more effective code.

The C programming language was developed in 1972.
It arose from a combination of the principles and main features of the BCPL and B programming languages. It soon spread among universities and came to be used by several organizations. The problem was that different versions of C were in use, which caused compatibility problems. It was necessary to establish a standard definition of C. The first was ANSI C (later adopted by ISO and thus referred to as ISO C). The standard has been evolving, undergoing several changes and adding features that were lacking in previous versions. The current standard for the C programming language is C11 [9].

C is an imperative programming language. It is flexible, and an important feature is that it allows low-level programming while using the syntax of a high-level language. It allows fast program execution and can manipulate memory at a low level. C is weakly typed. The absence of bounds checking and pointer validity checking improves the performance of C programs. This is a powerful tool for an experienced programmer. On the other hand, it can lead to severe problems and undesired results when used inappropriately.

1.1 ISO requirements

The C programming language comes with additional concepts that a programmer needs to know. Unspecified behaviour, implementation-defined behaviour and undefined behaviour are aspects of C which may cause programs to misbehave even though the compiler does not report any error. However, as often happens, after a change in the compiler implementation or the language specification, the program may suddenly stop working.

Unspecified behaviour is defined as the use of an unspecified value, or other behaviour for which the standard provides two or more possibilities and does not specify which one is chosen. An example is the order in which the operands of an assignment operator are evaluated. A program that contains unspecified behaviour but is correct in all other respects is considered to be correct.
Implementation-defined behaviour is unspecified behaviour for which each implementation documents the choice that is made. An example is the number of bits in a byte.

In ISO C undefined behaviour is described as "behaviour, upon use of a non-portable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements". As there are no constraints, a program containing undefined behaviour can behave in the expected way, it can crash, or it can run with unpredictable results, with or without the issuance of a diagnostic message.

The C language specification contains a full list of program constructs that lead to undefined behaviour. They occur in many different places: in preprocessor directives, function declarations, expression evaluation, or calls to library functions. Undefined behaviour should be avoided, as it may lead to undesired interpretations. In the better case the program simply does not work. But because the results can be arbitrary, there is also a possibility that it could be exploited.

As stated in the C specification, signed integer overflow causes undefined behaviour. A real example [7] of a situation in which this could have been exploited is the Linux kernel 2.4.25 and earlier, where an integer overflow in the SCTP_SOCKOPT_DEBUG_NAME SCTP socket option in socket.c allowed local users to execute arbitrary code.

Since the C programming language imposes only a few constraints on a programmer, problems that are difficult to discover can easily arise. There are many tools available to help programmers avoid some of the problems that can occur in code and may remain hidden because they are not reported by a compiler. Analysis can be static (compile-time) or dynamic (run-time), and it can be performed on source code or on a binary.
1.2 Analysis Tools

One of the main problems with undefined behaviour is that the range of possible errors is quite wide: the C specification lists more than 190 types of undefined behaviour. Unless a programmer knows the C standard, it is easy to make a mistake. The programmer may not even know that there is something wrong with the code. For example, a null pointer dereference is a well-known bug which usually leads to a program crash. This kind of error is frequent and programmers usually take care to prevent it. By contrast, a statement like a[i] = i++; does not look suspicious, yet the standard considers it undefined. Again, consequences may or may not occur. Expressions like this usually do not cause the program to crash; instead, the results of the computation can be unexpected and it may be difficult to find the reason.

    i = i + 1;    //Well defined behaviour
    i = i++ + 1;  //Undefined behaviour

Figure 1.1: Defined versus undefined behaviour

The need for automated code checking is obvious. Both static and dynamic analyzers are available for the detection of undefined behaviour. Clang is a front-end for the LLVM compiler [12] which comes with several sanitizers, such as the memory, address and undefined behaviour sanitizers.

Analyzers that do not explicitly look for undefined behaviour can be used as well. One example is Valgrind [17], a programming tool that detects errors at run-time. Even though the Valgrind tool Memcheck is mainly used for memory issues, it can discover problems resulting from undefined behaviour (out-of-bounds access, freeing the same memory more than once).

Various static analyzers are available too, such as Clang Static Analyzer [13], Cppcheck [3] and Splint [16]. They provide checkers that can detect numerous problems. Apart from performance, there are areas where the results of static analysis differ from those of dynamic code checking.

1.3