A Byte Type Definition
Total Page:16
File Type:pdf, Size:1020Kb
Project: Programming Language C++, Library Evolution Working Group Document number: P0298R1 Date: 2016-07-10 Reply-to: Neil MacIntosh [email protected] A byte type definition Contents Changelog ..................................................................................................................................................... 2 Changes from R0 ....................................................................................................................................... 2 Introduction .................................................................................................................................................. 2 Motivation and Scope ................................................................................................................................... 2 Design Decisions ........................................................................................................................................... 3 std::byte is not an integer and not a character ........................................................................................ 3 std::byte is storage of bits ......................................................................................................................... 3 Implementation Alternatives ........................................................................................................................ 4 Proposed Wording Changes .......................................................................................................................... 5 Acknowledgements ..................................................................................................................................... 10 References .................................................................................................................................................. 10 1 Changelog Changes from R0 Following advice from LEWG, expanded range of bitwise operations to include compound operations (|=, &= etc) and to appropriately qualify all operations as constexpr and noexcept. Added wording for operations on byte. Corrected minor wording errors and removed use of non-standard extensions in wording for byte operations. Simplified wording describing type in Alternative B following feedback from CWG. Merged in wording content from P0257 as directed by CWG. Increased wording changes to include std::byte in 8.5.12 where appropriate, as suggested by CWG. Added editing instruction to update P0137R1 to include std::byte should both that paper and this paper end up being accepted for inclusion in the standard. Remove Alternative B (implementation-defined type) after straw polling in LWG Corrected wording based on review by LWG Added note about lack of implicit conversion in boolean contexts. Introduction The core of this proposal is to introduce a distinct type implementing the concept of byte as specified in the C++ language definition. The proposal suggests a very simple definition of that type, named std::byte. It is argued that C++17 is expressive enough for a simple library definition, as opposed to a keyword (however uglified). An additional small fix is suggested to allow std::byte to embody an aliasing property for access to arbitrary object representations. Taken altogether, std::byte is just as “builtin” as if it was designated by dedicated keyword (which would have to be ugly or incur source breaking changes.); an illustration of C++’s expressive power. Motivation and Scope Many programs require byte-oriented access to memory. Today, such programs must use either the char, signed char, or unsigned char types for this purpose. However, these types perform a “triple duty”. Not only are they used for byte addressing, but also as arithmetic types, and as character types. This multiplicity of roles opens the door for programmer error – such as accidentally performing arithmetic on memory that should be treated as a byte value – and confusion for both programmers and tools. Having a distinct byte type improves type-safety, by distinguishing byte-oriented access to memory from accessing memory as a character or integral value. It improves readability. Having the type would also make the intent of code clearer to readers (as well as tooling for understanding and transforming programs). It increases type-safety by removing ambiguities in expression of programmer’s intent, thereby increasing the accuracy of analysis tools. 2 Design Decisions std::byte is not an integer and not a character The key motivation here is to make byte a distinct type – to improve program safety by leveraging the type system. This leads to the design that std::byte is not an integer type, nor a character type. It is a distinct type for accessing the bits that ultimately make up object storage. As its underlying type is unsigned char, to facilitate bit twiddling operations, convenience conversion operations are provided for mapping a byte to an unsigned integer type value. They are provided through the function template: namespace std { template <class IntegerType> // constrained appropriately IntegerType to_integer(byte b); } This supports code that, for example, wants to produce a integer hash of an object to take a sequence of std::byte, convert them to integers and then perform arithmetic operations on them as needed. Conversely, conversions in the other direction, e.g. from an integer value to std::byte type is now made simpler, and safer with the relaxation of enum value construction rule in C++17. Now, you can just write std::byte{i} when i is a non-negative integer value no more than std::numeric_limits<unsigned char>::max(). During a previous review by CWG at the Spring 2015 meeting in Jacksonville, FL, John Spicer suggested an alternative definition that would leave much to “implementation defined” to avoid dependency on <cstddef>. That dependency is not a substantial issue in practice, for any interesting program. So, we do not recommend that path as it would add complexity to the wording and definition of std::byte. std::byte is storage of bits A byte is a collection of bits, as described in 1.7/1. std::byte would provide operations that allow manipulation of the bits that it contains. The definition suggested below is kept deliberately simple with a minimum interface. As illustration, these operations would be declared as follows: namespace std { // IntType would be constrained to be true for is_integral_v<IntType> template <class IntType> constexpr byte& operator<<=(byte& b, IntType shift) noexcept; template <class IntType> constexpr byte operator<<(byte b, IntType shift) noexcept; template <class IntType> constexpr byte& operator>>=(byte& b, IntType shift) noexcept; 3 template <class IntType> constexpr byte operator>>(byte b, IntType shift) noexcept; constexpr byte operator|(byte l, byte r) noexcept; constexpr byte& operator|=(byte& l, byte r) noexcept; constexpr byte operator&(byte l, byte r) noexcept; constexpr byte& operator&=(byte& l, byte r) noexcept; constexpr byte operator~(byte b) noexcept; constexpr byte operator^(byte l, byte r) noexcept; constexpr byte& operator^=(byte& l, byte r) noexcept; } Their semantics would be the simple ones you might expect for a non-numeric type that is a collection of bits. Right-shift will shift bits rightwards, filling trailing places with zero bits. Left-shift will shift bits leftwards, filling vacated places with zero bits. Similarly, std::byte can be compared, as comparing and ordering instances is a sensible and useful operation. Given its underlying storage type, the comparison operators would give the same results as if performed on the underlying type. Note that because std::byte is a scoped enumeration there is no way to overload an operator to provide implicit conversions in boolean contexts. It would be possible to add implicit conversion in boolean contexts, by further modifying core language wording but that is left beyond the scope of the present paper. Users can still test variables of std::byte type in boolean contexts, but must use explicit comparisons to do so: constexpr std::byte zero_byte{0x00}; if (b != zero_byte) do_something(); Because std::byte uses unsigned char as its underlying type, initialization from an integer literal is possible, which is a convenience feature. // example of initializing a byte, relying on syntax proposed for C++17 byte b { 0x01 }; Implementation Alternatives Two possible definitions of std::byte were originially considered: // Alternative A: namespace std { enum class byte : unsigned char {}; } 4 and // Alternatve B: namespace std { using byte = /* implementation defined */; } We find alternative A easier to present and explain to ordinary programmers. For all practical purposes, it has the right semantics. After presenting both alternatives to the Library Working Group, straw polling showed unanimous support for the Alternative A, which is the form presented in this paper. Proposed Wording Changes The following proposed changes are relative to N4567 [1]. Note: the wording in P0137R1 should be updated to reference std::byte as appropriate, should that proposal also be accepted for inclusion in the standard. 3.8 Object lifetime [basic.life] 5 Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that refers to the storage location where the object will be or was located