Effective Types: Examples (P1796R0) 2 3 4 PETER SEWELL, University of Cambridge 5 KAYVAN MEMARIAN, University of Cambridge 6 VICTOR B
Total Page:16
File Type:pdf, Size:1020Kb
1 Effective types: examples (P1796R0) 2 3 4 PETER SEWELL, University of Cambridge 5 KAYVAN MEMARIAN, University of Cambridge 6 VICTOR B. F. GOMES, University of Cambridge 7 JENS GUSTEDT, INRIA 8 HUBERT TONG 9 10 11 This is a collection of examples exploring the semantics that should be allowed for objects 12 and subobjects in allocated regions – especially, where the defined/undefined-behaviour 13 boundary should be, and how that relates to compiler alias analysis. The examples are in 14 C, but much should be similar in C++. We refer to the ISO C notion of effective types, 15 but that turns out to be quite flawed. Some examples at the end (from Hubert Tong) show 16 that existing compiler behaviour is not consistent with type-changing updates. 17 This is an updated version of part of n2294 C Memory Object Model Study Group: 18 Progress Report, 2018-09-16. 19 1 INTRODUCTION 20 21 Paragraphs 6.5p{6,7} of the standard introduce effective types. These were added to 22 C in C99 to permit compilers to do optimisations driven by type-based alias analysis, 23 by ruling out programs involving unannotated aliasing of references to different types 24 (regarding them as having undefined behaviour). However, this is one of the less clear, 25 less well-understood, and more controversial aspects of the standard, as one can see from 1 2 26 various GCC and Linux Kernel mailing list threads , blog postings , and the responses to 3 4 27 Questions 10, 11, and 15 of our survey . See also earlier committee discussion . 28 Moreover, the ISO text seems not to capture existing mainstream compiler behaviour. 29 The ISO text (recalled below) is in terms of the types of the lvalues used for access, but 30 compilers appear to do type-based alias analysis based on the construction of the lvalues, 31 not just the types of the lvalues as a whole. Additionally, some compilers seem to differ 32 from ISO in requiring syntactic visibility of union definitions in order to allow accesses to 33 structures with common prefixes inside unions. The ISO text also leaves several questions 34 unclear, e.g. relating to memory initialised piece-by-piece and then read as a struct or 35 array, or vice versa. 36 Additionally, several major systems software projects, including the Linux Ker- 37 nel, the FreeBSD Kernel, and PostgreSQL disable type-based alias analysis with the 38 -fno-strict-aliasing compiler flag. The semantics of this (as for other dialects ofC)is 39 currently not specified by the ISO standard; it is debatable whether it would be usefulto 40 do that. 41 1https://gcc.gnu.org/ml/gcc/2010-01/msg00013.html, https://lkml.org/lkml/2003/2/26/158, and http: 42 //www.mail-archive.com/[email protected]/msg01647.html 2 http://blog.regehr.org/archives/959, http://cellperformance.beyond3d.com/articles/2006/06/ 43 understanding-strict-aliasing.html, http://davmac.wordpress.com/2010/02/26/c99-revisited/, 44 http://dbp-consulting.com/tutorials/StrictAliasing.html, and http://stackoverflow.com/questions/ 45 2958633/gcc-strict-aliasing-and-horror-stories 46 3https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html (N2014), http://www. 47 open-std.org/jtc1/sc22/wg14/www/docs/n2015.pdf (N2015) 4 48 http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1409.htm and http://www.open-std.org/jtc1/ sc22/wg14/www/docs/n1422.pdf (p14) 49 50 2019. 51 Draft of June 18, 2019 52 2 Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, and Hubert Tong 1.1 The ISO standard text 53 54 The C11 standard says, in 6.5: 55 6 The effective type of an object for an access to its stored value is the declared type 56 of the object, if any87). If a value is stored into an object having no declared type 57 through an lvalue having a type that is not a character type, then the type of the 58 lvalue becomes the effective type of the object for that access and for subsequent 59 accesses that do not modify the stored value. If a value is copied into an object having 60 no declared type using memcpy or memmove, or is copied as an array of character type, 61 then the effective type of the modified object for that access and for subsequent 62 accesses that do not modify the value is the effective type of the object from which 63 the value is copied, if it has one. For all other accesses to an object having no declared 64 type, the effective type of the object is simply the type of the lvalue used forthe 65 access. 66 7 An object shall have its stored value accessed only by an lvalue expression that has 67 one of the following types:88) 68 – a type compatible with the effective type of the object, 69 – a qualified version of a type compatible with the effective type of the object, 70 – a type that is the signed or unsigned type corresponding to the effective type of the 71 object, 72 – a type that is the signed or unsigned type corresponding to a qualified version of 73 the effective type of the object, 74 – an aggregate or union type that includes one of the aforementioned types among its 75 members (including, recursively, a member of a subaggregate or contained union), 76 or 77 – a character type. 78 Footnote 87) Allocated objects have no declared type. 79 Footnote 88) The intent of this list is to specify those circumstances in which an object 80 may or may not be aliased. 81 82 As Footnote 87 says, allocated objects (from malloc, calloc, and presumably any fresh 83 space from realloc) have no declared type, whereas objects with static, thread, or automatic 84 storage durations have some declared type. 85 For the latter, 6.5p{6,7} say that the effective types are fixed and that their values 86 can only be accessed by an lvalue that is similar (“compatible”, modulo signedness and 87 qualifiers), an aggregate or union containing such a type, or (to access its representation) 88 a character type. 89 For the former, the effective type is determined by the type of the last write, or, ifthat 90 is done by a memcpy, memmove, or user-code char array copy, the effective type of the source. 91 2 EFFECTIVE TYPE EXAMPLES 92 93 2.1 Basic Effective Types 94 Q73. Can one do type punning between arbitrary types? 95 This basic example involves a write of a uint32_t that is read as a float (assuming 96 that the two have the same size, and, unchecked in the code, that the latter does not 97 require a stronger alignment constraint, and that casts between those two pointer types are 98 implementation-defined to work). The example is clearly and uncontroversially forbidden 99 by the standard text, and this fact is exploited by current compilers, which use the types 100 of the arguments of f to reason that pointers p1 and p2 cannot alias. 101 102 // effective_type_1.c 103 Draft of June 18, 2019 104 Effective types: examples (P1796R0) 3 #include <stdio.h> 105 #include <inttypes.h> 106 #include <assert.h> 107 void f(uint32_t *p1, float *p2) { 108 *p1 = 2; 109 *p2 = 3.0; // does this have defined behaviour? 110 printf("f: *p1 = %" PRIu32 "\n",*p1); 111 } 112 int main() { _ 113 assert(sizeof(uint32 t)==sizeof(float)); uint32_t 114 i = 1; uint32_t p1 = &i; 115 * float *p2; 116 p2 = (float *)p1; 117 f(p1, p2); 118 printf("i=%" PRIu32 " *p1=%" PRIu32 119 " *p2=%f\n",i,*p1,*p2); 120 } 121 122 With -fstrict-aliasing (the default for GCC), GCC assumes in the body of f that the 123 write to *p2 cannot affect the value of *p1, printing 2 (instead of the integer value of the 124 representation of 3.0 that would the most recent write in a concrete semantics): while 125 with -fno-strict-aliasing it does not assume that. The former behaviour can be justified 126 by regarding the program as having undefined behaviour, due to the write of the uint32_t 127 i with a float lvalue. 128 129 2.2 Structs and their members 130 Q91. Can a pointer to a structure alias with a pointer to one of its members? 131 In this example f is given a pointer to a struct and an aliased pointer to its first member, 132 writing via the struct pointer and reading via the member pointer. We presume this 133 is intended to be allowed. The ISO text permits it if one reads the first bullet “a type 134 compatible with the effective type of the object” as referring to the int subobject of s and 135 not the whole st typed object s, but the text is generally unclear about the status of 136 subobjects. 137 138 // effective_type_2c.c 139 #include <stdio.h> 140 typedef struct { int i; } st; void f(st sp, int p) { 141 * * sp->i = 2; 142 *p = 3; 143 printf("f: sp->i=%i *p=%i\n",sp->i,*p); // prints 3,3 not 2,3 ? 144 } 145 int main() { 146 st s = {.i = 1}; 147 st *sp = &s; 148 int *p = &(s.i); 149 f(sp, p); 150 printf("s.i=%i sp->i=%i *p=%i\n", s.i, sp->i, *p); } 151 152 153 Q76. After writing a structure to a malloc’d region, can its members can be 154 accessed via pointers of the individual member types? 155 Draft of June 18, 2019 156 4 Peter Sewell, Kayvan Memarian, Victor B.