The Complete Guide to return x;

Arthur O’Dwyer
2021-05-04
I also do C++ training! [email protected]

Outline [3]

● The “return slot”; NRVO; C++17 “deferred materialization” [4–23]
● C++11 implicit move [24–29]. Question break.
● Problems in C++11; solutions in C++20 [30–46]. Question break.
● The reference_wrapper saga; pretty tables of vendor divergence [47–55]
● Quick sidebar on coroutines and related topics [56–65]. Question break.
● P2266 proposed for C++23 [66–79]. Questions!

Hey look! Slide numbers!

x86-64 calling convention [4]

    int f() {                _Z1fv:
        int i = 42;              movl $42, -4(%rsp)
        return i;                movl -4(%rsp), %eax
    }                            retq

    int test()               _Z4testv:
    {                            callq _Z1fv
        int j = f();             addl $1, %eax
        return j + 1;            retq
    }

On x86-64, the function’s return value usually goes into the %eax register.

x86-64 calling convention [5]

    int f() {
        int i = 42;
        printf("%p\n", &i);    // prints "0x9ff00020"
        return i;
    }

    int test() {
        int j = f();
        printf("%p\n", &j);    // prints "0x9ff00040"
        return j + 1;
    }

[Stack diagram: frames for f (holding i), test (holding j), and main.]

Since f and test each have their own stack frame, i and j naturally are different variables. j is initialized with a copy of i: C++ loves copy semantics.

x86-64 calling convention [6]

    struct S { int m; };

    S f() {
        S i = S{42};
        printf("%p\n", &i);    // prints "0x9ff00020"
        return i;
    }

    S test() {
        S j = f();
        printf("%p\n", &j);    // prints "0x9ff00040"
        return j;
    }

[Stack diagram: frames for f (holding i), test (holding j), and main.]

Even for class types, C++ does “return by copy.” The return value is still passed in a machine register when possible.

x86-64 calling convention [7]

    struct S { int m[3]; };

    S f() {
        S i = S{{1,3,5}};
        printf("%p\n", &i);
        return i;
    }

    S test() {
        S j = f();
        printf("%p\n", &j);
        return j;
    }

[Stack diagram: f’s frame holds i; test’s frame holds the return slot and j.]

But what about when S is too big to fit in a register? Then x86-64 says that the caller should pass an extra parameter, pointing to space in the caller’s own stack frame big enough to hold the result. This is the “return slot.”

x86-64 calling convention [8]

[Code as on slide 7. Stack diagram: f’s frame holds i; test’s frame holds the return slot and j.]

The following explanation is the baseline C++98 explanation, with no optimizations or tricks. Don’t worry, the tricks are coming.

x86-64 calling convention [9]

[Code as on slide 7. Stack diagram: f’s frame holds i; test’s frame holds the return slot and j.]

f knows the return slot’s address because test passed that address to f as a hidden parameter (in register %rdi). At the return statement, i’s value is copied into the return slot.

x86-64 calling convention [10]

[Code as on slide 7. Stack diagram: f’s frame is gone; test’s frame holds the return slot and j.]

Now f is done and i is gone. Think of the value of f() as the value “in the return slot.”

x86-64 calling convention [11]

[Code as on slide 7. Stack diagram: test’s frame holds the return slot and j.]

Finally, the value in the return slot is used to copy-initialize j.

Again! Faster! [12]

[Code as on slide 7. Stack diagram: test’s frame holds the return slot and j.]

This step is slow. test can do better. Since test controls the allocations of both the return slot and j, and test knows that the return slot will be used to copy-initialize j, test can allocate them both at the same memory address and avoid having to do the copy!

Again! Faster! [13–14]

[Code as on slide 7. Stack diagram: f’s frame holds i; test’s frame holds a single object labeled “return slot, also j.”]

Again! Faster! [15]

[Code as on slide 7. Stack diagram: f’s frame holds i; test’s frame holds “return slot, also j.”]

And we’re done! Notice that the optimizer can do this optimization all by itself, under the as-if rule, because the “copying” of S has no user-visible side effects.

C++98 “copy elision” [16]

    struct S { int m[3]; S(S&&); };

    S f() {
        S i = S{{1,3,5}};
        printf("%p\n", &i);
        return i;
    }

    S test() {
        S j = f();
        printf("%p\n", &j);
        return j;
    }

[Stack diagram: f’s frame holds i; test’s frame holds “return slot, also j.”]

If we give the “copy” visible side effects, guess what? We can still do the optimization! The C++98 standard permitted us to elide even a visible constructor call when initializing an object with a temporary of the same type.

But wait, C++17 made it even better!...

C++17 “deferred prvalue materialization” [17]

[Code as on slide 16.]

C++17 changed the high-level formal semantics of a prvalue expression like “f()”.

In C++14, f() eagerly evaluated into a temporary object that had to be moved into j. Copy-elision was permitted by a special case.

In C++17, a prvalue is more like a recipe for initializing an object, known as the expression’s “result object.” Here, that result object ultimately turns out to be j itself.

The formal semantics changed. The machine code did not!

Again! Faster! [18]

[Code as on slide 16. Stack diagram: f’s frame holds i; test’s frame holds “return slot, also j.”]

This step is slow. f can do better. Since f controls the allocation of i, and knows that it will be used to initialize the return slot, f can allocate i in the return slot, and avoid having to do the copy!

Let’s run through that...

Named Return Value Optimization [19]

[Code as on slide 16. Stack diagram: test’s frame holds “return slot, also j.”]

test allocates its stack frame as usual, and passes f a pointer to the return slot, in which f will construct its result. (...in which the result object of the prvalue f() will be materialized.)

Named Return Value Optimization [20]

[Code as on slide 16. Stack diagram: f’s frame holds no object of its own; test’s frame holds a single object labeled “i, also return slot, also j.”]

Named Return Value Optimization [21]

[Code as on slide 16. Stack diagram: f’s frame holds no object of its own; test’s frame holds “i, also return slot, also j.”]

i is already located in the return slot, so this return statement corresponds to zero machine code.

This optimization is done automatically under the as-if rule, but even when the constructor call would be visible, C++98 permits the compiler to elide it as a special case.

Note that i is not a prvalue, so this was not affected by C++17’s “deferred materialization” business.

Conditions for NRVO to kick in [22]

Many things have to go right for NRVO to happen.

● There must be a return slot. Trivial types can just be returned in registers
  ○ But types with non-trivial SMFs will always be returned via return slot
● The allocation of “return variable” i must be under f’s control
  ○ Otherwise f can’t allocate i into the return slot!
● i must have the exact same type (modulo cv-qualification) as f’s return slot
  ○ Otherwise i won’t fit into the return slot!
● One mental (not physical) caveat: The return’s operand must be exactly a (possibly parenthesized) id-expression, such as i. Nothing more complicated.
  ○ Otherwise things could get very confusing for the human programmer!

Examples of NRVO not happening [23]

    struct Trivial { int m; };
    struct S { S(); ~S(); };
    struct D : public S { int n; };

    Trivial f() { Trivial x; return x; }   // no return slot
    S x; S g1() { return x; }              // g doesn’t control allocation of x
    S g2() { static S x; return x; }       // same deal
    S g3(S x) { return x; }                // same deal (params are caller-allocated!)
    S h() { D x; return x; }               // D is too big for the return slot

Introducing move semantics [24]

● C++11 added move constructors and move-only types, such as unique_ptr.
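The NRVO conditions on slide 22 and the counterexamples on slide 23 can be observed directly by counting constructor calls. The following is a minimal sketch, not code from the talk: the names Tally, g_tally, g_addr_of_i, measure, and f_result_shares_address are hypothetical, and this S gets instrumented copy and move constructors (unlike the slides’ S) so that every non-elided copy or move is tallied. It assumes a C++11-or-later compiler; the exact tallies depend on the language mode and on whether the compiler chooses to perform NRVO, which is permitted but not required.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical instrumentation (not from the slides): counts every copy
// and move construction of S that the compiler did not elide.
struct Tally { int copies = 0; int moves = 0; };
Tally g_tally;

struct S {
    int m[3];
    S() : m{1, 3, 5} {}
    S(const S& o) : m{o.m[0], o.m[1], o.m[2]} { ++g_tally.copies; }
    S(S&& o) noexcept : m{o.m[0], o.m[1], o.m[2]} { ++g_tally.moves; }
};

// Address of f's local i, stored as an integer so it can be compared
// safely after i's lifetime has ended.
std::uintptr_t g_addr_of_i = 0;

// NRVO candidate: i is a local of exactly the return type, returned by name.
S f() {
    S i;
    g_addr_of_i = reinterpret_cast<std::uintptr_t>(&i);
    return i;
}

// Not a candidate: g1 does not control where global_x is allocated.
S global_x;
S g1() { return global_x; }

// Not a candidate: parameters are allocated by the caller.
S g3(S x) { return x; }

// Resets the tally, initializes a fresh S named j from make(), and
// reports how many copies and moves that initialization performed.
template <class F>
Tally measure(F make) {
    g_tally = Tally{};
    S j = make();
    (void)j;
    return g_tally;
}

// Mirrors the slides' printf("%p") experiment: did f's i live at the
// same address as the caller's j?
bool f_result_shares_address() {
    S j = f();
    assert(j.m[1] == 3);  // the value must arrive intact either way
    return g_addr_of_i == reinterpret_cast<std::uintptr_t>(&j);
}
```

Calling measure with a lambda such as [] { return f(); } returns the tally for that one initialization. On a compiler that performs NRVO, one would expect no copies and no moves for f, one copy for g1, and one move for g3; f_result_shares_address() should then report true, matching the “i, also return slot, also j” picture on slide 20.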
