The Complete Guide to return x;


Arthur O’Dwyer
2021-05-04
I also do C++ training!

Outline

● The “return slot”; NRVO; C++17 “deferred materialization”
● C++11 implicit move. Question break.
● Problems in C++11; solutions in C++20. Question break.
● The reference_wrapper saga; pretty tables of vendor divergence
● Quick sidebar on coroutines and related topics. Question break.
● P2266 proposed for C++23. Questions!

x86-64 calling convention

    int f() {                 _Z1fv:
        int i = 42;               movl $42, -4(%rsp)
        return i;                 movl -4(%rsp), %eax
    }                             retq

    int test() {              _Z4testv:
        int j = f();              callq _Z1fv
        return j + 1;             addl $1, %eax
    }                             retq

On x86-64, the function’s return value usually goes into the %eax register.

    int f() {
        int i = 42;
        printf("%p\n", &i);    // prints “0x9ff00020”
        return i;
    }

    int test() {
        int j = f();
        printf("%p\n", &j);    // prints “0x9ff00040”
        return j + 1;
    }

Since f and test each have their own stack frame, i and j naturally are different variables. j is initialized with a copy of i. C++ loves copy semantics.

    struct S { int m; };

    S f() {
        S i = S{42};
        printf("%p\n", &i);    // prints “0x9ff00020”
        return i;
    }

    S test() {
        S j = f();
        printf("%p\n", &j);    // prints “0x9ff00040”
        return j;
    }

Even for class types, C++ does “return by copy.” The return value is still passed in a machine register when possible.

    struct S { int m[3]; };

    S f() {
        S i = S{{1,3,5}};
        printf("%p\n", &i);
        return i;
    }

    S test() {
        S j = f();
        printf("%p\n", &j);
        return j;
    }

But what about when S is too big to fit in a register? Then x86-64 says that the caller should pass an extra parameter, pointing to space in the caller’s own stack frame big enough to hold the result. This is the “return slot.” (Stack segment: f’s frame holds i; test’s frame holds the return slot and j.)

The following explanation is the baseline C++98 explanation, with no optimizations or tricks. Don’t worry, the tricks are coming.

f knows the return slot’s address because test passed that address to f as a hidden parameter (in register %rdi). At the return statement, i’s value is copied into the return slot.

Now f is done and i is gone. Think of the value of f() as the value “in the return slot.”

Finally, the value in the return slot is used to copy-initialize j.

Again! Faster!

This last step, copying from the return slot into j, is slow. test can do better. Since test controls the allocations of both the return slot and j, and test knows that the return slot will be used to copy-initialize j, test can allocate them both at the same memory address and avoid having to do the copy! (Stack segment: f’s frame holds i; test’s frame holds one object that is both the return slot and j.)

And we’re done! Notice that the optimizer can do this optimization all by itself, under the as-if rule, because the “copying” of S has no user-visible side effects.

C++98 “copy elision”

    struct S { int m[3]; S(S&&); };

    S f() {
        S i = S{{1,3,5}};
        printf("%p\n", &i);
        return i;
    }

    S test() {
        S j = f();
        printf("%p\n", &j);
        return j;
    }

If we give the “copy” visible side effects, guess what? We can still do the optimization! The C++98 standard permitted us to elide even a visible constructor call when initializing an object with a temporary of the same type. But wait, C++17 made it even better!...

C++17 “deferred prvalue materialization”

C++17 changed the high-level formal semantics of a prvalue expression like “f()”.

In C++14, f() eagerly evaluated into a temporary object that had to be moved into j. Copy-elision was permitted by a special case.

In C++17, a prvalue is more like a recipe for initializing an object, known as the expression’s “result object.” Here, that result object ultimately turns out to be j itself.

The formal semantics changed. The machine code did not!

Again! Faster!

The remaining step, copying i into the return slot, is slow. f can do better. Since f controls the allocation of i, and knows that it will be used to initialize the return slot, f can allocate i in the return slot, and avoid having to do the copy! Let’s run through that...

Named Return Value Optimization

test allocates its stack frame as usual, and passes f a pointer to the return slot, in which f will construct its result. (...in which the result object of the prvalue f() will be materialized.) (Stack segment: f’s frame no longer holds i; test’s frame holds one object that is i, the return slot, and j.)

i is already located in the return slot, so this return statement corresponds to zero machine code.

This optimization is done automatically under the as-if rule, but even when the constructor call would be visible, C++98 permits the compiler to elide it as a special case.

Note that i is not a prvalue, so this was not affected by C++17’s “deferred materialization” business.

Conditions for NRVO to kick in

Many things have to go right for NRVO to happen.

● There must be a return slot. Trivial types can just be returned in registers
  ○ But types with non-trivial SMFs (special member functions) will always be returned via return slot
● The allocation of “return variable” i must be under f’s control
  ○ Otherwise f can’t allocate i into the return slot!
● i must have the exact same type (modulo cv-qualification) as f’s return slot
  ○ Otherwise i won’t fit into the return slot!
● One mental (not physical) caveat: The return’s operand must be exactly a (possibly parenthesized) id-expression, such as i. Nothing more complicated.
  ○ Otherwise things could get very confusing for the human programmer!

Examples of NRVO not happening

    struct Trivial { int m; };
    struct S { S(); ~S(); };
    struct D : public S { int n; };

    Trivial f() { Trivial x; return x; }    // no return slot
    S x; S g1() { return x; }               // g1 doesn’t control allocation of x
    S g2() { static S x; return x; }        // same deal
    S g3(S x) { return x; }                 // same deal (params are caller-allocated!)
    S h() { D x; return x; }                // D is too big for the return slot

Introducing move semantics

● C++11 added move constructors and move-only types, such as unique_ptr.