
Volume 18 | Issue 3 May-June 2020


CONTENTS

FEATURES

Data on the Outside vs. Data on the Inside 43
Services are essential to building large applications today. Each service has its own data, and that data may reside inside or outside of that service. Where it resides determines how that data should be treated.
PAT HELLAND

The History, Status, and Future of FPGAs 71
From the early days of telecom, through the high-performance computing and data centers of today, field-programmable gate arrays have been hitting a nerve in the ASIC community.
OSKAR MENCER ET AL.

Scrum Essentials Cards 83
The popular agile framework Scrum can improve the way a development team works together. Here we present a set of cards based on the Essence standard, which can make Scrum more effective.
JEFF SUTHERLAND, IVAR JACOBSON, BRIAN KERR

COLUMNS / DEPARTMENTS

COMMIT TO MEMORY
The Life of a Data Byte 5
As we all know, the state-of-the-art in storage media has come a ridiculously long way, from paper tape to flash. And it's still evolving to ever faster, smaller storage technology.
JESSIE FRAZELLE

EVERYTHING SYSADMIN
Five Nonobvious Remote Work Techniques 29
If ever there were a time to refine the practice of working remotely, it is now. Stack Overflow has been doing it for a while and offers some pointers for emulating in-person efficiency.
THOMAS A. LIMONCELLI

KODE VICIOUS
Sanity vs. Invisible Markings 39
Python and a few other programming languages make significant use of white space. This is a long-time practice that needs to change.
GEORGE V. NEVILLE-NEIL

ACM, the world's largest educational and scientific computing society, delivers resources that advance computing as a science and profession. ACM provides the computing field's premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

Queue (ISSN 1542-7730) is published bi-monthly by ACM, 1601 Broadway, 10th Floor, New York, NY 10019, USA. T (212) 869-7440, F (212) 869-0481.

CONTACT POINTS
[email protected]
[email protected]
[email protected]
http://queue.acm.org

SUBSCRIPTIONS
A one-year subscription (six bi-monthly issues) to the digital magazine is $19.99 (free to ACM Professional Members) and available through your favorite merchant store (Mac App Store/Play). A browser-based version is available through the acmqueue website (queue.acm.org).

SINGLE COPIES
Single copies are available through the same venues and are $6.99 each (free to ACM Professional Members).

COMMIT TO MEMORY

The Life of a Data Byte
Be kind and rewind

JESSIE FRAZELLE

A byte of data has been stored in a number of different ways through the years as newer, better, and faster storage media are introduced. A byte is a unit of digital information that most commonly refers to eight bits. A bit is a unit of information that can be expressed as 0 or 1, representing a logical state. Let's take a brief walk down memory lane to learn about the origins of bits and bytes.

Going back in time to Babbage's Analytical Engine, you can see that a bit was stored as the position of a mechanical gear or lever. In the case of paper cards, a bit was stored as the presence or absence of a hole in the card at a specific place. For magnetic storage devices, such as tapes and disks, a bit is represented by the polarity of a certain area of the magnetic film. In modern DRAM (dynamic random-access memory), a bit is often represented as two levels of electrical charge stored in a capacitor, a device that stores electrical energy in an electric field. (In the early 1960s, the paper cards used to input programs for IBM mainframes were known as Hollerith cards, named after their inventor, Herman Hollerith from the Tabulating Machines Company, which through numerous mergers is what is now known as IBM.)

In June 1956, Werner Buchholz (archive.computerhistory.org) coined the word byte (archive.org) to refer to a group of bits used to encode a single character of text (bobbemer.com). Let's address character encoding, starting with ASCII (American Standard Code for Information Interchange). ASCII was based on the English alphabet; therefore, every letter, digit, and symbol (a-z, A-Z, 0-9, +, -, /, ", !, etc.) was represented as a seven-bit integer between 32 and 127. This wasn't very friendly to other languages. To support other languages, Unicode extended ASCII so that each character is represented as a code point. For example, a lowercase j is U+006A, where U stands for Unicode, followed by a hexadecimal number. UTF-8 is the standard for representing characters as eight bits, allowing every code point from 0 to 127 to be stored in a single byte. This is fine for English characters, but other languages often have characters that are expressed as two or more bytes. UTF-16 is the standard for representing characters as 16 bits, and UTF-32 is the standard for 32 bits. In ASCII every character is a byte, but in Unicode that's not true: a character can be one, two, three, or more bytes. Groups of characters might also be referred to as words, as in this linked Univac ad (archive.org) calling out "1 kiloword or 12,000 characters." This article refers throughout to different-sized groupings of bits, since the number of bits in a byte has varied according to the design of the storage medium.

This article also travels in time through various storage media, diving into how data has been stored throughout history. By no means does this include every single storage medium ever manufactured, sold, or distributed. This article is meant to be fun and informative but not encyclopedic. It wraps up with a look at the current and future technologies for storage.
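As a quick illustrative sketch (Python 3, an editorial aside rather than part of the original article), the same character can occupy one, two, or more bytes depending on the encoding:

```python
# Bytes needed to store one character in different Unicode encodings.
# The "-le" variants are used so a byte-order mark isn't counted.
for ch in ("j", "é", "字"):
    print(
        f"U+{ord(ch):04X}:",
        len(ch.encode("utf-8")), "byte(s) in UTF-8,",
        len(ch.encode("utf-16-le")), "in UTF-16,",
        len(ch.encode("utf-32-le")), "in UTF-32",
    )
```

An ASCII letter such as j stays a single byte in UTF-8, while the accented and CJK characters need two and three bytes, respectively.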


To get started, let's assume we have a byte of data to be stored: the letter j, or as an encoded byte 6a, or in binary 01101010. As we travel through time, this data byte will come into play in some of the storage technologies covered here.
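As a quick check (a Python 3 aside, not from the original text), you can confirm the hex and binary forms of this data byte:

```python
# The letter j as a code point, a hex byte, and eight bits.
byte = ord("j")
print(hex(byte), format(byte, "08b"))  # 0x6a 01101010
```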

1951
The story begins in 1951 with the Uniservo tape drive for the Univac 1 computer, the first tape drive made for a commercial computer. The tape was three pounds of a thin strip (half-inch) of nickel-plated phosphor bronze, called vicalloy, which was 1,200 feet long. Our data byte could be stored at a rate of 7,200 characters per second (computerhistory.org) on tape moving at 100 inches per second. At this point in history, you could measure the speed of a storage algorithm by the distance the tape traveled.
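To make that concrete, a back-of-the-envelope sketch (using only the figures quoted above) of how much tape one character consumed:

```python
# 100 inches of tape per second / 7,200 characters per second
inches_per_char = 100 / 7_200
print(f"{inches_per_char:.4f} inches of tape per character")
```

Roughly a hundredth of an inch of vicalloy per character.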

1952
Let's fast-forward a year to May 21, 1952, when IBM announced its first magnetic tape unit, the IBM 726. Our data byte could now be moved off Uniservo metal tape onto IBM's magnetic tape. This new home would be roomy for our very small data byte, since the tape could store up to 2 million digits. This magnetic seven-track tape moved at 75 inches per second with a transfer rate of 12,500 digits or 7,500 characters per second (ibm.com); characters were called copy groups at the time. For reference, this article has 35,123 characters.

Seven-track tapes had six tracks for data and one to maintain parity by ensuring that the total number of 1-bits in the string was even or odd. Data was recorded at 100 bits per linear inch. This system used a vacuum-channel method of keeping a loop of tape circulating between two points, which allowed the tape drive to start and stop the tape in a split second. This was done by placing long vacuum columns between the tape reels and the read/write heads to absorb sudden increases in tension in the tape, without which the tape would typically have broken. A removable plastic ring on the back of the tape reel provided write protection. About 1.1 MB could be stored on one reel of tape (spectrum.ieee.org).

Think back to VHS tapes: What was required before returning a movie to Blockbuster? Rewinding the tape! The same could be said for the tape used for computers. Programs could not hop around a tape, or randomly access data; they had to read and write in sequential order.
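The parity scheme described above is easy to sketch in code (an illustrative model, not IBM's actual implementation):

```python
def parity_bit(data_bits, even=True):
    """Choose the seventh-track bit so the total count of 1s across
    all seven tracks is even (or odd)."""
    p = sum(data_bits) % 2
    return p if even else p ^ 1

frame = [1, 0, 1, 1, 0, 0]   # six data tracks
print(parity_bit(frame))     # three 1s, so even parity adds one more
```

A single flipped bit on readback changes the count of 1s and is caught by rechecking parity; two flipped bits cancel out, which is parity's well-known blind spot.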

1956
The era of magnetic-disk storage began in 1956 with IBM's completion of the 305 RAMAC computer, delivered to Zellerbach Paper in San Francisco (ibm.com). This computer was the first to use a moving-head HDD (hard disk drive). The RAMAC disk drive consisted of 50 magnetically coated 24-inch-diameter metal platters capable of storing about 5 million characters of data, at seven bits per character, and spinning at 1,200 revolutions per minute. The storage capacity was about 3.75 MB. RAMAC allowed realtime random access to large amounts of data, unlike magnetic tape or punch cards. IBM advertised the RAMAC as being able to store the equivalent of 64,000 punched cards (youtube.com). Previously, transactions were held until a group of data was accumulated and batch processed. The RAMAC introduced the concept of continuously processing transactions as they occurred, so data could be retrieved immediately when it was fresh. Our data byte could now be accessed in the RAMAC at 100,000 bits per second (ibm.com). Prior to this, with tapes, people had to write and read sequential data and could not randomly jump to various parts of the tape. Realtime random access of data was truly revolutionary at this time.
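For a sense of scale, a quick calculation (an editorial aside, assuming MB here means 2^20 bytes) of how long streaming the whole RAMAC would take at its quoted access rate:

```python
capacity_bits = 3.75 * 1024 * 1024 * 8   # ~3.75 MB of storage
rate_bits_per_s = 100_000
print(f"{capacity_bits / rate_bits_per_s:.0f} seconds to stream the whole drive")
```

About five minutes for the entire disk, ignoring seek time.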

1963
DECtape was introduced in 1963. Its namesake was the Digital Equipment Corporation, known as DEC for short. DECtape was inexpensive and reliable, so it was used in many generations of DEC computers. This ¾-inch tape was laminated and sandwiched between two layers of Mylar on a four-inch reel. DECtape could be carried by hand, as opposed to its large, weighty predecessors, making it great for personal computers. In contrast to seven-track tape, DECtape had six data tracks, two mark tracks, and two clock tracks. Data was recorded at 350 bits per inch. Our data byte, which is eight bits but could be expanded to 12, could be transferred to DECtape at 8,325 12-bit words per second with a tape speed of 93 ±12 inches per second (pdp8.net). This is eight percent more digits per second than the Uniservo metal tape from 1951.


1967
Four years later, in 1967, a small team at IBM started working on the IBM floppy-disk drive, code-named Minnow (ibm.com). At the time, the team was tasked with developing a reliable and inexpensive way to load microcode into IBM System/370 mainframes (archive.org). The project was then reassigned and repurposed to load microcode into the controller for the IBM 3330 Direct Access Storage Facility, code-named Merlin. Our data byte could now be stored on read-only eight-inch flexible Mylar disks coated with magnetic material, known as floppy disks. At the time of release, the result of the project was named the IBM 23FD Floppy Disk Drive System. The disks could hold 80 KB of data. Unlike hard drives, a user could easily transfer a floppy in its protective jacket from one drive to another. Later, in 1973, IBM released a read/write floppy-disk drive, which then became an industry standard (web.archive.org).

1969
In 1969, the AGC (Apollo Guidance Computer) read-only rope memory was launched into space aboard Apollo 11, which carried American astronauts to the moon and back. This rope memory was made by hand and could hold 72 KB of data. Manufacturing rope memory was laborious, slow, and required skills analogous to textile work; it could take months to weave a program into the rope memory (authors.library.caltech.edu), shown in figure 1. But it was the right tool for the job at the time to resist the harsh rigors of space. When a wire went through one of the circular cores, it represented a 1. Wires that went around a core represented a 0. Our data byte would take a human a few minutes to weave into the rope.

[FIGURE 1: ROPE MEMORY]

1977
The Commodore PET, the first (successful) mass-market personal computer, was released in 1977. Built into the PET was a Commodore 1530 Datasette (a portmanteau of data plus cassette). The PET converted data into analog sound signals that were then stored on cassettes (wav-prg.sourceforge.net). This made for a cost-effective and reliable storage solution, albeit very slow. Our small data byte could be transferred at a rate of around 60-70 bytes per second (c64-wiki.com). The cassettes could hold about 100 KB per 30-minute side, with two sides per tape. For example, you could fit about two ("rickroll" warning) 55KB images (blog.jessfraz.com) on one side of the cassette. The Datasette also appeared in the Commodore VIC-20 and Commodore 64.
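Those two figures are roughly consistent with each other, as a quick check shows (an editorial aside, assuming KB means 1,024 bytes; the real rate depended on the encoding scheme):

```python
side_bytes = 100 * 1024   # ~100 KB per 30-minute side
side_seconds = 30 * 60
print(f"{side_bytes / side_seconds:.0f} bytes per second")
```

which lands just under the quoted 60-70 bytes per second.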

1978
Let's jump ahead a year to 1978, when the LaserDisc was introduced as Discovision by MCA and Philips. Jaws was the first film sold on a LaserDisc in North America. The audio and video quality on a LaserDisc were far better than the competitors', but too expensive for most consumers. As opposed to VHS tape, which consumers could use to record TV programs, the LaserDisc could not be written to. LaserDiscs used analog video with analog FM stereo sound and PCM (pulse-code modulation) digital audio (tools.ietf.org). The disks were 12 inches in diameter and composed of two single-sided aluminum disks layered in plastic. The LaserDisc is remembered today as being the foundation that CDs and DVDs were built upon.

1979
A year later, in 1979, Alan Shugart and Finis Conner founded Seagate Technology with the idea of scaling down a hard disk drive to the same size as a 5¼-inch floppy disk, which at the time was the standard. Their first product, in 1980, was the Seagate ST506, the first HDD for microcomputers. The 5¼-inch disk held 5 MB of data, which at the time was five times more than the standard floppy disk. It was a rigid, metallic platter coated on both sides with a thin layer of magnetic material to store data. Our data byte could be transferred at a speed of 625 KB/s (pcmag.com) onto the disk. That's about one (second and final "rickroll" warning) 625KB animated GIF (blog.jessfraz.com) per second.

1981
A couple of years later, Sony introduced the first 3½-inch floppy drive. Hewlett-Packard was the first adopter of the technology in 1982 with its HP-150. This put the 3½-inch floppy disk on the map and gave it wide distribution in the industry (jstor.org). The disks were single-sided with a formatted capacity of 161.2 KB and an unformatted capacity of 218.8 KB. In 1982 the double-sided version was made available, and the Microfloppy Industry Committee, a consortium of 23 media companies, based a spec for a 3½-inch floppy on Sony's original designs, cementing the format into history (americanradiohistory.com). Our data byte could now be stored on the early version of one of the most widely distributed storage media: the 3½-inch floppy disk.

1984
In 1984 the CD-ROM (compact disk read-only memory), holding 550 MB of prerecorded data, was announced by Sony and Philips. This format grew out of CD-DA (compact disk digital audio), developed by the two companies in 1982. The CD-DA, which was used for distributing music, had a capacity of 74 minutes. When Sony and Philips were negotiating the standard for a CD-DA, legend has it that one of the four people involved insisted it be able to hold all of Beethoven's Ninth Symphony (wired.com). The first product released on CD-ROM was Grolier's Electronic Encyclopedia, which came out in 1985. The encyclopedia contained 9 million words, occupying only 12 percent of the available space, which was 553 mebibytes (books.google.co.uk). There would be more than enough room for the encyclopedia and our data byte. Shortly thereafter, in 1985, computer and electronics companies worked together to create a standard for the disks so any computer would be able to access the information.
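For perspective, 74 minutes of CD-DA audio at the Red Book parameters (44.1 kHz, stereo, 16-bit samples) works out to:

```python
seconds = 74 * 60
bytes_per_second = 44_100 * 2 * 2   # sample rate x channels x bytes per sample
print(f"{seconds * bytes_per_second / 1_000_000:.0f} MB of raw audio")
```

The 550 MB CD-ROM figure is lower because data discs spend extra bits on error correction. (This arithmetic is an editorial aside, not from the original article.)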

1984
Also in 1984, Fujio Masuoka published his work on a new type of floating-gate memory, called flash memory, which was capable of being erased and reprogrammed multiple times.

Let's first review how floating-gate memory works. It uses transistors, which are electrical gates that can be switched on and off individually. Since each transistor can be in two distinct states (on or off), it can store two different numbers: 0 and 1. Floating gate refers to the second gate added to the middle transistor. This second gate is insulated by a thin oxide layer. These transistors use a small voltage applied to the gate of the transistor to denote whether it is on or off, which in turn translates to a 0 or 1. With a floating gate, when a suitable voltage is applied across the oxide layer, the electrons tunnel through it and get stuck on the floating gate. Therefore, even if the power is disconnected, the electrons remain present on the floating gate. When no electrons are on the floating gate, it represents a 1; when electrons are trapped on the floating gate, it represents a 0. Reversing this process and applying a suitable voltage across the oxide layer in the opposite direction causes the electrons to tunnel off the floating gate and restores the transistor to its original state. Thus, the cells are made programmable and nonvolatile (economist.com). Our data byte could be programmed into the transistors as 01101010, with electrons trapped in the floating gates to represent the zeros.

Masuoka's design was a bit more affordable but less flexible than EEPROM (electrically erasable programmable read-only memory), since it required multiple groups of cells to be erased together, but this also accounted for its speed. At the time, Masuoka was working for Toshiba but quit the company shortly after to become a professor at Tohoku University. He was displeased with Toshiba for not rewarding him for his work and sued the company, demanding compensation. The case settled in 2006 with a one-time payment of ¥87m, equivalent to $758,000. This still seems low, given how impactful flash memory has been on the industry.

While on the topic of flash memory, let's look at the difference between NOR and NAND flash. Flash stores information in memory cells made up of floating-gate transistors. The names of the technologies are tied directly to the way the memory cells are organized. In NOR flash, individual memory cells are connected in parallel, allowing random access. This architecture enables the short read times required for the random access of microprocessor instructions. NOR flash is ideal for lower-density applications that are mostly read-only. This is why most CPUs typically load their firmware from NOR flash. Masuoka and colleagues presented the invention of NOR flash in 1984 and NAND flash in 1987 (ieeexplore.ieee.org).

In contrast, NAND flash designers gave up the ability for random access in a tradeoff to gain a smaller memory-cell size. This also has the benefits of a smaller chip size and lower cost per bit. NAND flash's architecture consists of an array of eight memory transistors connected in series. This leads to high storage density, smaller memory-cell size, and faster write and erase, since it can program blocks of data at a time. This comes at the cost of having to overwrite data when it is not sequentially written and data already exists in a block (aturing.umcs.maine.edu).
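That NAND constraint, where programming can only clear erased bits and changing data in place means erasing a whole block first, can be captured in a toy model (purely illustrative; no real flash controller exposes this interface):

```python
class NandBlock:
    """Toy model of a NAND block: programming can only clear bits
    (1 -> 0); getting a 1 back requires erasing the entire block."""
    def __init__(self, size=8):
        self.cells = [1] * size            # erased state is all 1s

    def program(self, bits):
        if any(b == 1 and c == 0 for b, c in zip(bits, self.cells)):
            raise ValueError("erase required: cannot flip a 0 back to 1")
        self.cells = [c & b for c, b in zip(self.cells, bits)]

    def erase(self):
        self.cells = [1] * len(self.cells)

block = NandBlock()
block.program([0, 1, 1, 0, 1, 0, 1, 0])   # our data byte, 0x6a
print(block.cells)
```

This is why rewriting a byte in the middle of occupied flash is expensive: the controller must read the block, erase it, and program the whole thing back.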

1991
Let's jump ahead to 1991, when a prototype SSD (solid-state disk) module was made for evaluation by IBM from SanDisk, at the time known as SunDisk (computerhistory.org). This design combined a flash storage array and nonvolatile memory chips with an intelligent controller to detect and correct defective cells automatically. The disk was 20 MB in a 2½-inch form factor and sold for around $1,000 (meseec.ce.rit.edu). IBM wound up using it in the ThinkPad pen computer (westerndigital.com).

1994
In 1994 Iomega released the Zip disk, a 100MB cartridge in a 3½-inch form factor, a bit thicker than a standard 3½-inch disk. Later versions of the Zip disk could store up to 2 GB. These disks had the convenience of being as small as a floppy disk but with the ability to hold a larger amount of data, which made them compelling. Our data byte could be written onto a Zip disk at 1.4 MB/s. At the time, a 1.44MB 3½-inch floppy would write at about 16 kB/s. In a Zip drive, heads are noncontact read/write and fly above the surface, which is similar to a hard drive but unlike other floppies. Because of reliability problems and the affordability of CDs, Zip disks eventually became obsolete.

1994
Also in 1994, SanDisk introduced CompactFlash, which was widely adopted into consumer devices such as digital and video cameras. Like CD-ROMs, CompactFlash speed is based on x-ratings (8x, 20x, 133x, etc.). The maximum transfer rate is calculated based on the original audio CD transfer rate of 150 kB/s. This winds up looking like R = K × 150 kB/s, where R is the transfer rate and K is the speed rating. For 133x CompactFlash, our data byte would be written at 133 × 150 kB/s, or around 19,950 kB/s or 19.95 MB/s. The CompactFlash Association was founded in 1995 to create an industry standard for flash-based memory cards.
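The x-rating formula is simple enough to write down directly (a trivial helper for illustration):

```python
def cf_max_rate_kb_s(x_rating):
    """CompactFlash maximum transfer rate: R = K x 150 kB/s."""
    return x_rating * 150

print(cf_max_rate_kb_s(133))   # 19950 kB/s, about 19.95 MB/s
```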

1997
A few years later, in 1997, the CD-RW (compact disc rewritable) was introduced. These optical discs were used for data storage, as well as for backing up and transferring files to various devices. CD-RWs can be rewritten only about 1,000 times, which at the time was not a limiting factor, since users rarely overwrote data on one disk. CD-RWs are based on phase-change technology. During a phase change of a given medium, certain properties of the medium change. In the case of CD-RWs, phase shifts in a special compound, composed of silver, tellurium, and indium, cause reflecting lands and nonreflecting bumps, each representing a 0 or 1. When the compound is in a crystalline state, it is translucent, which indicates a 1. When the compound is melted into an amorphous state, it becomes opaque and nonreflective, which indicates a 0 (computer.howstuffworks.com). We could write our data byte 01101010 as nonreflecting bumps and reflecting lands this way. CD-RWs eventually lost much of their market share to DVDs.

1999
In 1999 IBM introduced the smallest hard drives in the world at the time: the IBM Microdrive, in 170MB and 340MB capacities. These were small hard disks, one inch in size, designed to fit into CompactFlash Type II slots. The intent was to create a device to be used like CompactFlash but with more storage capacity. These were soon replaced by USB flash drives, however, and larger CompactFlash cards once they became available. Like other hard drives, Microdrives were mechanical and contained small, spinning disk platters.

2000
USB flash drives were introduced in 2000. These consisted of flash memory encased in a small form factor with a USB interface. Depending on the version of the USB interface used, the speed varies: USB 1.1 is limited to 1.5 MB/s, whereas USB 2.0 can handle 35 MB/s, and USB 3.0 can handle 625 MB/s (diffen.com). The first USB 3.1 Type-C drives were announced in March 2015 and have read/write speeds of 530 MB/s (web.archive.org). Unlike floppy and optical disks, USB devices are harder to scratch but still deliver the same use cases of data storage and transferring and backing up files. Because of this, drives for floppy and optical disks have since faded in popularity, replaced by USB ports.
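To see what those interface speeds mean in practice, here is a rough transfer-time comparison for a hypothetical 1GB file, treating the figures quoted above as megabytes per second of theoretical peak throughput (real drives are slower):

```python
file_mb = 1_024   # a hypothetical 1GB file
for name, mb_per_s in (("USB 1.1", 1.5), ("USB 2.0", 35), ("USB 3.0", 625)):
    print(f"{name}: {file_mb / mb_per_s:.0f} s")
```

Eleven minutes, half a minute, or about two seconds, depending on the generation.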

2005
HDD manufacturers started shipping products using PMR (perpendicular magnetic recording) in 2005 (youtube.com). Interestingly, this happened at the same time that Apple announced the iPod Nano, which used flash as opposed to the one-inch hard drives in the iPod Mini, causing a bit of an industry hoohaw (eetimes.com).

A typical hard drive contains one or more rigid disks coated with a magnetically sensitive film consisting of tiny magnetic grains. Data is recorded when a magnetic write head flies just above the spinning disk, much like a record player and a record, except that a needle is in physical contact with the record. As the platters spin, the air in contact with them creates a slight breeze. Just as air on an airplane wing generates lift, the air generates lift on the head's airfoil (books.google.com). The write head rapidly flips the magnetization of one magnetic region of grains so that its magnetic pole points up or down, to denote a 1 or a 0.

The predecessor to PMR was LMR (longitudinal magnetic recording). PMR can deliver more than three times the storage density of LMR. The key difference between the two is that the grain structure and the magnetic orientation of the stored data in PMR media is columnar instead of longitudinal. PMR has better thermal stability and improved SNR (signal-to-noise ratio) as a result of better grain separation and uniformity. It also benefits from better writability because of stronger head fields and better magnetic alignment of the media. Like LMR, PMR's fundamental limitations are based on the thermal stability of magnetically written bits of data and the need for sufficient SNR to read back the information.

2007
Hitachi Global Storage Technologies announced the first 1TB HDD in 2007. The Hitachi Deskstar 7K1000 used five 3.5-inch 200GB platters and rotated at 7,200 RPM. This is in stark contrast to the world's first HDD, the IBM RAMAC 350, which had a storage capacity of approximately 3.75 MB. How far we have come in 51 years! But wait, there's more.


2009
In 2009 technical work was beginning on NVMe (nonvolatile memory express) (flashmemorysummit.com). NVM is a type of memory that has persistence, in contrast to volatile memory, which needs constant power to retain data. NVMe filled the need for a scalable host controller interface for PCIe (peripheral component interconnect express)-based SSDs (nvmexpress.org). More than 90 companies were part of the working group that developed the design. This was all based on prior work to define the NVMHCIS (nonvolatile memory host controller interface specification). Open up a modern server and you will likely find some NVMe drives. The best NVMe drives today can do about 3,500 MB/s reads and 3,300 MB/s writes (pcgamer.com). For the data byte we started with, the character j, that is extremely fast compared with the couple of minutes it took to hand-weave rope memory for the Apollo Guidance Computer.

TODAY AND THE FUTURE
Now that we have traveled through time a bit, let's take a look at the state of the art for SCM (storage class memory). Like NVM, SCM is persistent, but it goes further by also providing performance better than or comparable to primary memory, as well as byte addressability (ieee.org). SCM aims to address some of the problems faced by caches today, such as the low density of SRAM (static random access memory). DRAM (arxiv.org) provides better density, but this comes at a cost of slower access times. DRAM also suffers from requiring constant power to refresh memory.


Let's break this down a bit. Power is required since the electric charge on the capacitors leaks off little by little; this means that without intervention, the data on the chip would soon be lost. To prevent this leakage, DRAM requires an external memory-refresh circuit that periodically rewrites the data in the capacitors, restoring them to their original charge.

To solve the problems with density and power leakage, a few SCM technologies are in development: PCM (phase-change memory), STT-RAM (spin-transfer torque random access memory), and ReRAM (resistive random access memory). One nice aspect of all these technologies is their ability to function as MLCs (multilevel cells). This means they can store more than one bit of information, compared with SLCs (single-level cells), which can store only one bit per memory cell, or element. Typically, a memory cell consists of one MOSFET (metal-oxide-semiconductor field-effect transistor). MLCs reduce the number of MOSFETs required to store the same amount of data as SLCs, making them denser or smaller to deliver the same amount of storage as technologies using SLCs. Let's go over how each of these SCM technologies works.
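The MLC density win can be made concrete with a little arithmetic (an illustrative sketch, ignoring error correction and addressing overhead):

```python
def cells_needed(n_bytes, bits_per_cell):
    """Cells required to store n_bytes: a cell that distinguishes 2**b
    charge levels stores b bits (SLC: b = 1; MLC: b >= 2)."""
    return -(-n_bytes * 8 // bits_per_cell)   # ceiling division

print(cells_needed(1_024, 1))   # SLC
print(cells_needed(1_024, 2))   # two-bit MLC: half the cells
```

Doubling the bits per cell halves the cell count for the same capacity, which is exactly the density argument made above.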

Phase-change memory
PCM is similar to the phase change used for CD-RWs, described earlier. Its phase-change material is typically GST, or GeSbTe (germanium-antimony-tellurium), which can exist in two different states: amorphous and crystalline. The amorphous state has a higher resistance, denoting a 0, than the crystalline state, denoting a 1. By assigning data values to intermediate resistances, PCM can be used to store multiple states as an MLC (ieeexplore.ieee.org).

Spin-transfer torque random access memory
STT-RAM consists of two ferromagnetic, permanent magnetic layers separated by a dielectric, an insulator that can transmit electric force without conduction. It stores bits of data based on differences in magnetic directions. One magnetic layer, called the reference layer, has a fixed magnetic direction, while the other magnetic layer, called the free layer, has a magnetic direction that is controlled by passing current. For a 1, the magnetization directions of the two layers are aligned. For a 0, the two layers have opposing magnetic directions.

Resistive random access memory
A ReRAM cell consists of two metal electrodes separated by a metal-oxide layer. This is similar to Masuoka's original flash-memory design, where electrons would tunnel through the oxide layer and get stuck in the floating gate or vice versa. With ReRAM, however, the state of the cell is determined by the concentration of oxygen vacancies in the metal-oxide layer.

SCM downsides and upsides
While these SCM technologies are promising, they still have downsides. PCM and STT-RAM have high write latencies. PCM's latencies are 10 times those of DRAM, while STT-RAM's are 10 times those of SRAM. PCM and ReRAM have a limit on write endurance before a hard error occurs, meaning a memory element gets stuck at a particular value (arxiv.org).


In August 2015 Intel announced Optane, a product built on 3DXPoint, pronounced 3D cross-point (anandtech.com). Optane claims performance 1,000 times faster than NAND SSDs, with 1,000 times the endurance, while being four to five times the price of flash memory. Optane is proof that SCM is not just experimental. It will be interesting to watch how these technologies evolve.

Helium hard disk drive (HHDD)

An HHDD (helium hard disk drive) is a high-capacity HDD filled with helium and hermetically sealed during manufacturing. Like other hard disks, covered earlier, it is much like a record player with a rotating magnetic-coated platter. Typical HDDs would just have air inside the cavity, but that air causes an amount of drag on the spin of the platters. Helium balloons float, so we know helium is lighter than air. Helium is, in fact, one-seventh the density of air, therefore reducing the amount of drag on the spin of the platters, causing a reduction in the amount of energy required for the disks to spin. This is actually a secondary feature; the primary advantage of helium is that it allows for packing seven platters in the same form factor that would typically hold only five. Attempting this with air-filled drives would cause turbulence. If you recall the airplane-wing analogy from earlier, this ties in perfectly. Helium reduces drag, thus eliminating turbulence. As everyone knows, however, helium-filled balloons start to sink after a few days because the gas is escaping the balloons. The same could be said for these drives. It took years before manufacturers had created a container


that prevented the helium from escaping the form factor for the life of the drive. Backblaze found that HHDDs had a lower annualized error rate of 1.03 percent, while standard hard drives came in at 1.06 percent. Of course, that is so small a difference that it is hard to conclude much from it (backblaze.com). A helium-filled form factor can have an HDD encapsulated that uses PMR, or it could contain an MAMR (microwave-assisted magnetic recording) or HAMR (heat-assisted magnetic recording) drive. Any magnetic storage technology can be paired with helium instead of air. In 2014 HGST (Hitachi Global Storage Technologies) combined two cutting-edge technologies into its 10TB HHDD that used host-managed SMR (shingled magnetic recording).

Shingled magnetic recording

PMR, covered earlier, was SMR’s predecessor. In contrast to PMR, SMR writes new tracks that overlap part of the previously written magnetic track, which in turn makes the previous track narrower, allowing for higher track density. The technology’s name stems from the fact that the overlapping tracks resemble roof shingles. SMR results in a much more complex writing process, since writing to one track winds up overwriting an adjacent track. This doesn’t come into play when a disk platter is empty and data is sequential. Once you are writing to a series of tracks that already contain data, however, this process is destructive to existing adjacent data. If an adjacent track contains valid data, then it must be rewritten. This is quite similar to NAND flash, as covered earlier. Device-managed SMR devices hide this complexity


by having the device firmware manage it, resulting in an interface like any other hard disk you might encounter. Host-managed SMR devices, on the other hand, rely on the host to know how to handle the complexity of the drive. Seagate started shipping SMR drives in 2013, claiming a 25 percent greater density than PMR (anandtech.com).
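The destructive-write cascade described above can be illustrated with a toy model of a shingled band (the band size and cost model here are illustrative, not taken from any particular drive):

```python
def smr_rewrite_cost(band_size: int, track: int) -> int:
    """Tracks that must be written to update one track in a shingled band.

    Writing track i re-narrows the overlapping track i+1, which must
    then be rewritten, and so on to the end of the band.
    """
    return band_size - track

# Appending to the last free track costs one write; rewriting the
# first track of a 10-track band cascades into 10 track writes.
```

This is why sequential appends are cheap on SMR while in-place updates are expensive, and why firmware (or the host) buffers and reorders writes.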

Microwave-assisted magnetic recording

MAMR is an energy-assisted magnetic storage technology—like HAMR, which is covered next—that uses 20- to 40-GHz frequencies to bombard the disk platter with a circular microwave field. This lowers its coercivity, meaning that the magnetic material of the platter has a lower resistance to changes in magnetization. As already discussed, changes in magnetization of a region of the platter are used to denote a 0 or a 1, so since the platter has a lower resistance to changes in magnetization, the data can be written more densely. The core of this new technology is the spin torque oscillator used to generate the microwave field without sacrificing reliability. Western Digital, also known as WD, unveiled this technology in 2017 (storagereview.com). Toshiba followed shortly after in 2018 (theregister.co.uk). While WD and Toshiba are busy pursuing MAMR, Seagate is betting on HAMR.

Heat-assisted magnetic recording

HAMR is an energy-assisted magnetic storage technology for greatly increasing the amount of data that can be stored on a magnetic device, such as an HDD, by using heat


delivered by a laser to help write data onto the surface of a platter. The heat causes the data bits to be much closer together on the platter, which allows greater data density and capacity. This technology is quite difficult to achieve. A 200-milliwatt laser heats a teensy area of the region to 750 degrees F (400 degrees C) quickly before writing the data, while not interfering with or corrupting the rest of the data on the disk (fstoppers.com). The process of heating, writing the data, and cooling must be completed in less than a nanosecond. These challenges required the development of nano-scale surface plasmons, also known as a surface-guided laser, instead of direct laser-based heating, as well as new types of glass platters and heat-control coatings to tolerate rapid spot-heating without damaging the recording head or any nearby data, among various other technical hurdles that had to be overcome (seagate.com). Seagate first demonstrated this technology, despite many skeptics, in 2013 (computerworld.com). It started shipping the first drives in 2018 (blog.seagate.com).

END OF TAPE, REWIND

This article started off with the state of the art in storage media in 1951 and concludes after looking at the future of storage technology. Storage has changed a lot over time—from paper tape to metal tape, magnetic tape, rope memory, spinning disks, optical disks, flash, and others. Progress has led to faster, smaller, and more performant devices for storing data. Comparing NVMe to the 1951 Uniservo metal tape


shows that NVMe can read 486,111 percent more digits per second. Comparing NVMe to the Zip disks of 1994, you see that NVMe can read 213,623 percent more digits per second. One thing that remains true is the storing of 0s and 1s. The means by which that is done vary greatly. I hope the next time you burn a CD-RW with a mix of songs for a friend, or store home videos in an optical disc archive (yup, you heard that right) (pro.sony), you think about how the nonreflective bumps translate to a 0 and the reflective lands of the disk translate to a 1. If you are creating a mixtape on a cassette, remember that those are closely related to the Datasette used in the Commodore PET. Lastly, remember to be kind and rewind (this is a tribute to Blockbuster), but there are still open formats for using tape today (wikipedia.org).

Related articles
- The Most Expensive One-byte Mistake. Did Ken, Dennis, and Brian choose wrong with NUL-terminated text strings? Poul-Henning Kamp. https://queue.acm.org/detail.cfm?id=2010365
- Should You Upload or Ship Big Data to the Cloud? The accepted wisdom does not always hold true. Sachin Date. https://queue.acm.org/detail.cfm?id=2933408
- Injecting Errors for Fun and Profit. Error-detection and correction features are only as good as our ability to test them. Steve Chessin. https://queue.acm.org/detail.cfm?id=1839574

Acknowledgments

Thank you to Robert Mustacchi (https://twitter.com/rmustacc), Trammell Hudson (https://twitter.com/qrs), and Rick Altherr (https://twitter.com/kc8apf) for tidbits (I can’t help myself) throughout this article.


Jessie Frazelle is the cofounder and chief product officer of the Oxide Computer Company. Before that, she worked on various parts of Linux, including containers, as well as the Go programming language. Copyright © 2020 held by owner/author. Publication rights licensed to ACM.


EVERYTHING SYSADMIN

Five Nonobvious Remote Work Techniques
Emulating the efficiency of in-person conversations

THOMAS A. LIMONCELLI

This article reveals five nonobvious techniques that make remote work successful at Stack Overflow. Remote work has been part of the engineering culture at Stack Overflow since the company began. Eighty percent of the engineering department works remotely. This enables the company to hire top engineers from around the world, not just from the New York City area. (Forty percent of the company worked remotely prior to the COVID-19 lockdown; 100 percent during the lockdown.) Even employees who do not work remotely must work in ways that are remote-friendly. For some companies, working remotely was a new thing when the COVID-19 pandemic lockdowns began. At first the problems were technical: IT departments had to ramp up VPN (virtual private network) capacity, human resources and infosec departments had to adjust policies, and everyone struggled with microphones, cameras, and videoconferencing software. Once those technical issues were resolved, the social issues became more apparent. How do you strike up a conversation as you used to do in the office? How do you know when it is appropriate to reach out to someone? How do you prevent loneliness and isolation? Here are my top five favorite techniques Stack Overflow uses to make remote work successful on a social level.


TIP #1: IF ANYONE IS REMOTE, WE’RE ALL REMOTE

Meetings should be either 100 percent in-person or 100 percent remote; no mixed meetings. Ever been in a conference room with a bunch of people plus one person participating by phone or videoconference? It never works. The one remote participant can’t hear the conversation, can’t see what everyone else is seeing, and so on. He or she can’t authentically participate. At Stack Overflow we recognized this years ago and adopted a rule: If one person is remote, we’re all remote. This means everyone who is physically present leaves the conference room, goes back to their desks, and we conduct the meeting using desktop videoconferencing. During the COVID-19 lockdown your entire company may be remote, but this is a good policy to adopt when you return to the office. This may not be an option for companies with open floor plans, however, where participants videoconferencing from their desks may disturb their neighbors. How can you make mixed meetings work? Where I’ve observed them working well, two ingredients were required: First, the conference-room design was meticulously refined and adjusted over time (this is rarer—and more expensive—than you would think); second, and the biggest determinant, was the degree to which all those in the meeting were aware of the remote participants. It requires a learned skill of being vigilant for clues that someone is having difficulty in participating and then taking corrective action. Everyone, not just the facilitator, needs to be mindful of this.


TIP #2: ACCURATE CHAT STATUS

Set your chat status to away when you are away. Set it to available when you are available. A chat system isn’t just about text chatting. It fulfills the subtler purpose of indicating if a participant is present or not. In meatspace you don’t walk up to someone’s desk and just start talking. First you notice if the person is there or not. Talking to an empty chair is ineffective and may worry your coworkers who witness it. Instead, you look for social clues. A shut door indicates someone needs privacy. A worker in a cubicle wearing headphones may be indicating a need for extreme focus. Someone standing up to take a quick stretch is signaling a break. Two people talking about last night’s game indicates that others may join in, or may interrupt for work-related matters. The status presented in your chat system conveys similar information. I prefer to have my status set automatically. For example, I use a feature that automatically changes my status to “In a meeting” if my Google Calendar indicates that. People get automated warnings if they try to chat to me outside of normal working hours. A “Do not disturb” feature helps control when my phone beeps for new messages or when I can simply sleep. Since many schools and child-care facilities are closed now, parents are balancing child care and work in unexpected ways, often with highly dynamic schedules. One way we can support them is to respect status


messages such as “Child care... be back in an hour” or simply “Child care,” more likely to be used when there is no time to set a more detailed explanation because the kid is painting the cat with chocolate milk. Set your status to indicate if you are away, present, present but too busy to be disturbed, and so on. Encourage your team to do the same. Speaking of chat rooms, don’t make finding the right room a guessing game. Yes, it’s cute and funny to have #kittens as the chat room of the team run by a manager known for loving cats. A month later, however, nobody is laughing and it’s just annoying to remember the name. At Stack Overflow we’re not super strict about names, but most rooms have prefixes such as #team-, #project-, #account-, #fun-, and so on. Need to find the SRE team? You can bet we’re in #team-sre. Very little guessing. By using prefixes, not suffixes, the long list of room names sorts related names together.
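The payoff of prefixes over suffixes shows up in a plain lexicographic sort (the room names here are invented):

```python
rooms = ["#team-sre", "#fun-movies", "#team-web",
         "#project-tomato", "#fun-cooking", "#project-helium"]

# Sorting clusters rooms by their prefix, so every #team-* room
# lands next to the others in the channel list; with suffixes
# (#sre-team, #movies-fun) related rooms would be scattered.
grouped = sorted(rooms)
```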

TIP #3: THE QUICK CHAT PROTOCOL

Establish a low-overhead way to start a quick conversation. In meatspace you can ask a quick question by just blurting it out. You are sitting near a coworker, so you can just say, “Got a second? Let’s talk about the Tomato Incident.” If the person is available, you just start talking about it. Chat rooms are good for quick questions that can be explained in a few sentences. Some questions, however, are more easily explained conversationally. In such situations people often do a complicated dance in an attempt to be polite.


Me: Hi!
You: Hi.
Me: Got a sec?
You: Yeah.
Me: I have a question about the Tomato Incident.
You: OK.
Me: Can we chat by video?
You: Sure. Want me to make the room or should you?
Me: I’ll do it.
Me: [link]

By the time we’ve negotiated when, how, and where we’re going to talk, we’ve lost momentum. Take all that overhead, multiply it by the number of casual conversations you have with coworkers, and it totals to a big waste of time. Such etiquette is useful when talking to someone you don’t know well. For a direct coworker, however, you need a protocol with low overhead—as low as peeking over a cubicle wall. At Stack Overflow, we follow some easy rules to avoid the extended conversations. We say in chat, for example, “Quick hangout? Tomato Incident.” If available, your contact replies with a link to a chatroom. No formalities, no small talk, no confusion over who creates the session. Otherwise, the contact responds, “Not now,” and it is up to the requester to make an appointment. It looks like this:

Me: Quick chat? Tomato Incident
You: https://link_to_a_chat_room

or


Me: Quick chat? Tomato Incident
You: Busy right now, put something on my gcal please.

When teams have prearranged to compress conversations that way, everyone benefits. In general, we want the overhead of starting a conversation to match the type of meeting. Important meetings with many people require planning, careful scheduling, and perhaps even a rehearsal. That’s an appropriate amount of overhead. Impromptu meetings should have minimal startup overhead. If a brief conversation requires custom-printed invitations with embossed lettering, a save-this-date card, and a return envelope, you’re doing it wrong. The quick chat protocol has become the virtual equivalent of social customs such as visiting someone’s office, striking up a quick conversation when you pass by someone in the hallway, or spotting someone at the water cooler and approaching with a question. One feature that enables low-overhead conversations is a video-chat system that supports permanent meeting-room URLs. This eliminates the need to pause and “create a room.” Zoom has PMIs (personal meeting IDs). As of this writing, Google Meet doesn’t have an equivalent, but you can schedule a meeting in Google Calendar in the future and use the embedded Meet URL as your PMI. (There are services that will automate this for you, as well as one open source project: https://github.com/g3rv4/GMeet.)
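The whole protocol can be captured as one tiny rule. A sketch (the URL and the busy flag are hypothetical stand-ins for a real presence system, not Stack Overflow's actual tooling):

```python
PERMA_URL = "https://meet.example.com/perma-team-sre"  # hypothetical permanent room

def quick_chat_reply(busy: bool) -> str:
    """Reply to a 'Quick chat? <topic>' message per the low-overhead protocol."""
    if busy:
        return "Busy right now, put something on my gcal please."
    # A permanent room URL removes the "who creates the room?" negotiation.
    return PERMA_URL
```

The point is not the code but the convention: one message out, one message back, zero ceremony.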

TIP #4: IDLING IN A VIDEOCONFERENCE ROOM

Work silently together in a videoconference room.


Working remotely can be lonely. Chat rooms and video conferences go only so far. At Stack Overflow many teams hang out together in a videoconference even when not having a meeting. This is in addition to their regular text chat room. People just stay connected to the video chat, often with audio muted, and silently work independently. They unmute for quick questions or to consult on an idea. People drop out if they have another meeting, need to be alone, or need to focus. Think of this as emulating the physical world where many people work in the same room or work in an open floor plan. A manager who consistently hangs out in the videoconference is simulating an open-door policy, signaling permission to go in and talk. A friend who works in child care calls this “nerd parallel play.” I’m going to assume she means it as a compliment. It may seem like a waste of bandwidth to be in a videoconference when nobody is talking, but it helps fight loneliness and builds team cohesion. Some teams do this more than others. Some do it during specific hours of the day. The phrase “I’m idling in perma” translates to “I’m working. I’ll be in the permanent video chat room. Feel free to join and not talk.”

TIP #5: SCHEDULE VIDEO-CHAT SOCIAL EVENTS

Create social events specifically for remote workers. Stack hosts regular social events via videoconference. During the lockdown your family might be doing this for holidays, birthdays, anniversaries, and so on. Why not do it at work too?


A coworker set up daily video chats during lunch. Volunteers have been doing weekly cooking lessons. A film club has popped up; they agree to all watch the same film during the week and then discuss it during lunch on Friday. Diversity employee resource groups and affinity groups host regular video meetups with no agenda other than to be social. Like many startups, we normally have a “beer bash” at the end of the week. Our virtual version of this is called “the remote bev bash.” Before the lockdown this was a smaller event; now it is companywide. It is interesting to see what new concoctions people bring each week.

LET’S STOP APOLOGIZING FOR NORMALITY

While this is not an official Stack Overflow policy, I advocate that we should stop apologizing for normality. I’ve noticed that some people spend an inordinate amount of time apologizing in videoconferences for technical problems, difficulty finding the mute button, or children running into the room unexpectedly. In the new world of remote work these events are normal. Until videoconference software works perfectly, makes the mute button easier to find, and schools reopen, these situations will just be part of everyday life. If they are normal, we shouldn’t have to apologize for them. It is a waste of time, multiplied by the number of people in the meeting. Instead, we should simply acknowledge what happened and move on by saying something like “Thank you for waiting while I fixed my camera” or “Ah, there’s the unmute button,” or “That’s my kid who just ran by singing the Lego song, no need to applaud.”


CONCLUSION

The physical world has social conventions around conversations and communication that we use without even thinking. As we move to a remote-work world, we have to be more intentional about creating such conventions. Developing these social norms is an ongoing commitment that outlasts the initial technical details of VPN and desktop videoconference software configuration. Companies that previously forbade remote work can no longer deny its benefits. Once the pandemic-related lockdowns are over, many people will continue working remotely. Those who return to the office will need to work in ways that are compatible with their remotely working associates. Every company is different. The techniques discussed in this article work for Stack Overflow but might not work everywhere. Give them a try and see what works for you.

Related articles
- Culture Surprises in Remote Software Development Teams. “When in Rome” doesn’t help when your team crosses time zones—and your deadline doesn’t. Judith S. Olson, Gary M. Olson. https://queue.acm.org/detail.cfm?id=966804
- A Passage to India. Pitfalls that the outsourcing vendor forgot to mention. Mark Kobayashi-Hillary. https://queue.acm.org/detail.cfm?id=1046948
- Bound by the Speed of Light. There’s only so much you can do to optimize NFS over a WAN. Kode Vicious. https://queue.acm.org/detail.cfm?id=1900007

Acknowledgments

I’d like to thank my coworkers at Stack Overflow Inc. for their feedback.


Thomas A. Limoncelli is the SRE manager at Stack Overflow Inc., makers of Stack Overflow for Teams, a tool for remote workers. His books include The Practice of System and Network Administration (http://the-sysadmin-book.com), The Practice of Cloud System Administration (http://the-cloud-book.com), and Time Management for System Administrators (http://shop.oreilly.com/product/9780596007836.do). He blogs at EverythingSysadmin.com and tweets at @YesThatTom. He holds a B.A. in computer science from Drew University. Copyright © 2020 held by owner/author. Publication rights licensed to ACM.


KODE VICIOUS

Sanity vs. Invisible Markings
Tabs vs. spaces

Dear KV,

My team resurrected some old Python code and brought it up to version 3. The process was made worse by the new restriction of not mixing tabs and spaces in the source code. An automatic cleanup that allowed the code to execute by replacing the tabs with spaces caused a lot of havoc with the comments at the ends of lines. Why does anyone make a language in which white space matters this much?

White Out

Dear White,

Ever edited a makefile? Although there is a long tradition of the significant use of white space in programming languages, all traditions need to change. We no longer sacrifice virgins to make the printer work, or at least KV hasn’t sacrificed one recently. In Python, many people have taken issue with the choice to have white space—and not braces—to indicate the limits of blocks of code, but since the developers didn’t change their minds on this with version 3 of Python, I suspect we’re all stuck with it for quite a bit longer, and I am quite sure that there will be other languages, big and small, where white space remains significant. If I could change one thing in the minds of all programming language designers, it would be to impress


upon them—forcefully—the idea that anything that is significant to the syntactic or structural meaning of a program must be easily visible to the human reader, as well as easily understood by the systems with which koders write kode. Let’s deal with that last point first. Making it easy for tools to understand the structure of software is one of the keys to having tools that help programmers prepare proper programs for computers. Since the earliest days of software development, programmers have tried to build tools that show them—before the inevitable edit-compile-test-fail-edit endless loop—where there might be issues in the program text. Code editors have added colorization, syntax highlighting, folding, and a host of other features in a desperate, and some might say fruitless, attempt to improve the productivity of programmers. When a new language comes along, it is important for these signifiers in the code to be used consistently; otherwise your editor of choice has little or no ability to deploy these helpful hints to improve productivity. Allowing any two symbols to represent the same concept, for example, is a definite no-no. Imagine if you could have two types of braces to delineate blocks of code, just because two different parts of the programming community wanted them, or if there were multiple syntactic ways to dereference a variable. The basic idea is that there must be one clear way to do each thing that a language must do, both for human understanding and for the sanity of editor developers. This is why the use of invisible, or near-invisible, markings in code, especially tabs and spaces, to indicate structure or syntax, is such a problem.
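White's pain point is easy to screen for mechanically. A minimal sketch (this is a cruder check than CPython's actual TabError logic, which only rejects indentation whose meaning is ambiguous):

```python
def mixed_indent_lines(source: str) -> list[int]:
    """Return 1-based numbers of lines whose indentation mixes tabs and spaces."""
    bad = []
    for n, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is everything removed by lstrip().
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent and " " in indent:
            bad.append(n)
    return bad
```

Running such a check in CI would have flagged the offending lines long before the version 3 port.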


Invisible and near-invisible markings bring us to the human part of the problem—not that code editor authors aren’t human, but most of us will not write new editors, though all of us will use editors. As we all know, once upon a time computers had small memories and the difference between a tab, which is a single byte, and a corresponding number of spaces (8) could be a significant difference between the size of source code stored on a precious disk, and also transferred, over whatever primitive and slow bus, from storage into memory. Changing the coding standard from eight spaces to four might improve things, but let’s face it, none of this has mattered for several decades.

Related articles
- File-system Litter. Cleaning up your storage space quickly and efficiently. Kode Vicious. https://queue.acm.org/detail.cfm?id=2003323
- A Generation Lost in the Bazaar. Quality happens only when someone is responsible for it. Poul-Henning Kamp. https://queue.acm.org/detail.cfm?id=2349257
- Demo Data as Code. Automation helps collaboration. Thomas A. Limoncelli. https://queue.acm.org/detail.cfm?id=3355565

Now, the only reason for the use of these invisible markings is to clearly represent the scope of a piece of code relative to the pieces of code around it. In point of fact, it would be better to pick a single character that is not a tab and not a space and not normally used in a program—for example, Unicode code point U+1F4A9—and to use that as the universal indentation character. Editors would then be free to indent code in any consistent way based on the user’s preferences. The user could have any number of


blank characters used per indent character—8, 4, 2, some prime number, whatever they like—and programmers could choose their very own personal views of the scope. On disk, this format would cost only one character (two bytes) per indent, and if you wanted to see the indent characters, a common feature of modern editors, you flip a switch, and voila, there they all are. Everyone would be happy, and we would have finally solved the age-old conundrum of tabs vs. spaces.

KV
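KV's proposal is easy to prototype. A sketch of the display side, assuming (hypothetically) U+1F4A9 as the universal indent character and an editor that expands it to the reader's preferred width:

```python
INDENT = "\U0001F4A9"  # one stored character per indentation level

def render(source: str, width: int = 4) -> str:
    """Expand the per-level indent character into the reader's preferred width."""
    out = []
    for line in source.splitlines():
        # Count leading indent characters, then replace them with spaces.
        levels = len(line) - len(line.lstrip(INDENT))
        out.append(" " * (width * levels) + line[levels:])
    return "\n".join(out)
```

One programmer renders at two spaces per level, a teammate at eight, and the file on disk never changes.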

George V. Neville-Neil works on networking and operating- system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are code spelunking, operating systems, and rewriting your bad code (OK, maybe not that last one). He earned his bachelor’s degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. He is an avid bicyclist and traveler who currently lives in New York City. Copyright © 2020 held by owner/author. Publication rights licensed to ACM.

DATA

Data on the Outside vs. Data on the Inside

Data kept outside SQL has different characteristics from data kept inside.

PAT HELLAND

Recently, there has been a lot of interest in services. These can be microservices or just services. In each case, the service provides a function with its own code and data, and operates independently of partners. This article argues that there are a number of seminal differences between data encapsulated inside a service and data sent into the space outside of the service boundary. SQL data is encapsulated within a service to ensure it is protected by application code. When sending data across services, it is outside that trust boundary. The first question this article asks is what trust means to a service and its encapsulated data. This is answered by looking at transactions and boundaries, data kept inside versus data kept outside of services. Also to be considered


is how services compose using operators (requesting stuff) and operands (refining those requests). Then the article looks at time and service boundaries. When data in a database is unlocked, it impacts notions of time. This leads to an examination of the use of immutability in the composition of services with messages, schema, and data flowing between these boundaries. The article then looks at data on the outside of these trust boundaries called services. How do you structure that data so it is meaningful across both space and time as it flows in a world not inside a service? What about data inside a service? How does it relate to stuff coming in and out? Finally, the characteristics of SQL and JSON (JavaScript Object Notation), and other semi-structured representations, are considered. What are their strengths and weaknesses? Why do the solutions seem to use both of them for part of the job?

ESSENTIAL SERVICES

Services are essential to building large applications today. While there are many examples of large enterprise solutions that leverage services, the industry is still learning about services as a design paradigm. This section describes how the term service is used and introduces the notions of data residing inside services and outside services.

Services

Big and complex systems are typically collections of independent and autonomous services. Each service consists of a chunk of code and data that is private to that


service. Services are different from the classic application living in application silos, in that services are primarily designed to interact with other services via messaging. Indeed, that interaction, its data, and how it all works is an interesting topic. Services communicate with each other exclusively through messages. No knowledge of the partner service is shared other than the message formats and the sequences of the expected messages. It is explicitly allowed (and, indeed, expected) that the partner service may be implemented with heterogeneous technology at all levels of the stack including hardware, operating system, database, middleware, programming language, and/or application vendor or implementation team. The essence of a service lies in its independence and how it encapsulates (and protects) its data.

Bounding trust via encapsulation

Services interact with a collection of messages whose formats (schema) and business semantics are well defined. Each service will do only limited things for its partner services based upon well-defined messages. The act of defining a limited set of behaviors provides a firm encapsulation of the service. An important part of trust is limiting the things you'll do for outsiders.

To interact with a service, you have to follow its rules and constraints. Each message you send fits a prescribed role. The only way to interact with data in another service is through its rules and business logic. Data is, in general, never allowed out of a service unless it is processed by application logic.


For example, when using your bank's ATM, you expect to have only a few supported operations such as withdrawal, deposit, etc. Banks do not allow direct access to database connections via ATMs. The only way to change the bank's database is through the bank's application logic in the ATM and the back-end system. This is how a service protects its data.
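The ATM idea can be sketched in a few lines. This is a hypothetical illustration, not code from the article: the class names and operations are invented, but the point is the article's, namely that the private data is reachable only through the small set of operations the service chooses to offer.

```python
# Sketch of encapsulation (all names hypothetical): the only way to touch
# the balance data is through the two operations the service offers.
class BankService:
    def __init__(self):
        self._balances = {}          # private "inside" data

    def deposit(self, account: str, amount: int) -> int:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balances[account] = self._balances.get(account, 0) + amount
        return self._balances[account]

    def withdraw(self, account: str, amount: int) -> int:
        balance = self._balances.get(account, 0)
        if amount <= 0 or amount > balance:
            raise ValueError("invalid withdrawal")   # business rule enforced
        self._balances[account] = balance - amount
        return self._balances[account]

atm = BankService()
atm.deposit("acct-1", 100)
assert atm.withdraw("acct-1", 30) == 70
```

There is deliberately no method that hands back the raw dictionary; a caller who wants to change the data must go through the business logic.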

Encapsulating both changes and reads

Services encapsulate changes to their data with their application logic. The app logic ensures the integrity of the service's data and work. Only the service's trusted application logic can change the data. Services encapsulate access to read their data. This controls the privacy of what is exported.

While this autonomy is powerful, it can also cause some challenges. Before your business separated its work into independent services, all of its data was in a big database. Now you have a bunch of services, and they have a bunch of databases running on a bunch of computers with a bunch of different operating systems. This is awesome for the independent development, support, and evolution of the different services, but it's a royal hassle when you want to do analytics across all your data.

Frequently each service will choose to export carefully sanitized subsets of data for consumption by partner services. Of course, this requires some work ensuring proper authorization to see this data (as well as authenticating the curious service). Still, the ability to sanitize and control the data being exposed is crucial.


Trust and transactions

Participating in an ACID (atomicity, consistency, isolation, durability) transaction means one system can be locked up waiting for another system to decide to commit or abort the transaction. If you are stuck holding locks waiting for another system, that can really cause trouble for your availability. With rare exceptions, services don't trust other services like that. In the late 1990s, there were efforts to formalize standards for transaction coordination across trust boundaries. Fortunately, these standards died a horrible death.

Data inside and outside services

The premise of this article is that data residing inside a service is different in many essential ways from data residing outside or between services:

- Data on the inside refers to the encapsulated private data contained within the service itself. As a sweeping statement, this is the data that has always been considered "normal"—at least in your database class in college. The classic data contained in a SQL database and manipulated by a typical application is inside data.
- Data on the outside refers to the information that flows between these independent services. This includes messages, files, and events. It's not your classic SQL data.

Operators and operands

Messages flowing between services contain operators, which correspond to the intended purpose of the message. Frequently the operator reflects a business function in the domain of the service. For example, a service implementing a banking application may have operators in its messages for deposits, withdrawals, and other banking functions. Sometimes operators reflect more mundane reasons for sending messages, such as "Here's Tuesday's price list."

Messages may contain operands to the operators, shown in figure 1. The operands are additional stuff needed by the operator message to qualify the intent of the message fully. Operands may be obtained from reference data, published to describe those operands. A message requesting a purchase from an e-commerce site may include product IDs, requested numbers to be purchased, expected price, customer ID, and more. This is covered in more detail later.
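The e-commerce purchase message described above might look something like the following sketch. All field names here are hypothetical; the article does not prescribe a wire format, only that the operator names the intent and the operands qualify it.

```python
import json

# Hypothetical purchase message: the operator names the intent; the operands
# (product IDs, quantities, expected prices) are gleaned from reference data
# previously published by the service being invoked.
message = {
    "operator": "purchase",
    "operands": {
        "customerId": "cust-42",
        "items": [{"productId": "sku-7", "quantity": 2, "expectedPrice": 999}],
        "priceListVersion": "2020-05-01",   # which published snapshot priced this
    },
}

wire = json.dumps(message)              # what actually crosses the boundary
assert json.loads(wire)["operator"] == "purchase"
```

Note that the operands carry the version of the price list they were taken from, which matters later when the receiving service must decide whether those prices are still acceptable.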

[Figure 1: Operands. A message sent to a service carries an operator (e.g., deposit) along with operands drawn from published reference data.]

DATA: THEN AND NOW

This section examines the temporal implications of not sharing ACID transactions across services and examines the nature of work inside the boundaries of an ACID transaction. This provides a crisp sense of "now" for operations against inside data. The situation for data on the outside of the service, however, is different. The fact that it is unlocked means that the data is no longer in the now. Furthermore, operators are requests for operations that have not yet occurred and actually live in the future (assuming they come to fruition). Different services live in their own private temporal domains. This is an intrinsic part of using distrusting services. Trust and time carry implications about how to think about applications.

Transactions, inside data and now

Transactions have been historically defined using ACID properties [1]. These properties reflect the semantics of the transaction. Much work has been done to describe transaction serializability, in which transactions executing on a system or set of related systems perceive their work as applied in a serial order even in the face of concurrent execution [2]. Transactional serializability makes you feel alone. A rephrasing of serializability is that each transaction sees all other transactions to be in one of three categories:

- Those whose work preceded this one.
- Those whose work follows this one.
- Those whose work is completely independent of this one.

This looks just like the executing transaction is all alone. ACID transactions live in the now. As time marches forward and transactions commit, each new transaction perceives the impact of the transactions that preceded it. The executing logic of the service lives with a clear and crisp sense of now.

Blast from the past

Messages may contain data extracted from the local service's database. The sending application logic may look in its belly to extract that data from its database. By the time the message leaves the service, that data will be unlocked. The destination service sees the message; the data on the sender's service may be changed by subsequent transactions. It is no longer known to be the same as it was when the message was sent. The contents of a message are always from the past, never from now.

There is no simultaneity at a distance. Similar to the speed of light bounding information, by the time you see a distant object, it may have changed. Likewise, by the time you see a message, the data may have changed. Services, transactions, and locks bound simultaneity:

- Inside a transaction, things are simultaneous.
- Simultaneity exists only inside a transaction.
- Simultaneity exists only inside a service.

All data seen from a distant service is from the "past." By the time you see data from a distant service, it has been unlocked and may change. Each service has its own perspective. Its inside data provides its framework of "now." Its outside data provides its framework of the "past." My inside is not your inside, just as my outside is not your outside.

Using services rather than a single centralized database is like going from Newton's physics to Einstein's physics:

- Newton's time marched forward uniformly with instant knowledge at a distance.
- Before services, distributed computing strove to make many systems look like one, with RPC (remote procedure call), two-phase commit, etc.
- In Einstein's universe, everything is relative to one's perspective.
- Within each service, there is a "now" inside, and the "past" arriving in messages.

Hope for the future

Messages contain operators that define requests for work from a service, shown in figure 2. If Service A sends a message with an operator request to Service B, it is hopeful that Service B will do the requested operation. In other words, it is hopeful for the future. If Service B complies and performs the work, that work becomes part of Service B's future, and its state is forever changed. Once Service A receives a reply describing either success or failure of the operation, Service A's future is changed.

[Figure 2: Requests for work. Service A, minding its own business, decides to issue a request and is hopeful for the future; Service B alters the future by doing the request; the response tells Service A its hopes were fulfilled.]

Life in the "then"

Operands may live in either the past or the future, depending on their usage pattern. They live in the past if they have copies of unlocked information from a distant service. They live in the future if they contain proposed values that hopefully will be used if the operator is successfully completed.

Between the services, life is in the world of "then." Operators live in the future. Operands live in either the past or the future. Life is always in the then when you are outside the confines of a service. This means that data on the outside lives in the world of then. It is past or future, but it is not now. Each separate service has its own separate "now," illustrated in figure 3. The domains of transaction serializability are disjoint, and each has its own temporal environment. The only way they interact is through data on the outside, which lives in the world of then.

Dealing with now and then

Services must cope with making the now meet the then. Each service lives in its own now and interacts with incoming and outgoing notions of then. The application logic for the service must reconcile these.


[Figure 3: Services with different "now"s. Each of four services has its own "now"; there is no notion of "now" in between services.]

Consider, for example, what’s involved when a business accepts an order: The business may publish daily prices, but it probably wants to accept yesterday’s prices for a while after midnight. Therefore, the service’s application logic must manually cope with the differences in prices during the overlap. Similarly, a business that says its product “usually ships in 24 hours” must consider the following: Order processing has old information; the available inventory is deliberately fuzzy; both sides must cope with different time domains.
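The daily-prices example can be made concrete. The sketch below is an assumed policy, not one the article specifies: an order priced from yesterday's list is honored for a grace window, while anything older is rejected, forcing the caller to re-price.

```python
from datetime import date, timedelta

# Hypothetical published price lists, keyed by the date of the snapshot.
price_lists = {date(2020, 5, 1): 90, date(2020, 5, 2): 100}  # price per unit

def acceptable_price(order_price: int, order_list_date: date, today: date) -> bool:
    """Reconcile the buyer's 'then' (an old price list) with the seller's 'now'."""
    if order_list_date == today:
        return order_price == price_lists[today]
    if today - order_list_date == timedelta(days=1):
        return order_price == price_lists[order_list_date]  # honor yesterday's list
    return False                                            # too stale: re-price

assert acceptable_price(90, date(2020, 5, 1), date(2020, 5, 2))      # yesterday: ok
assert not acceptable_price(90, date(2020, 5, 1), date(2020, 5, 3))  # too old
```

The point is that this overlap is handled by explicit application logic; no transaction mechanism will do it for you across service boundaries.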


The world is no longer flat:

- Services with private data support more than one computer working together.
- Services and their service boundaries mean multiple trust domains and different transaction domains.
- Multiple transaction domains mean multiple time domains.
- Multiple time domains force you to cope with ambiguity to allow coexistence, cooperation, and joint work.

DATA ON THE OUTSIDE: IMMUTABILITY

This section discusses properties of data on the outside. First, each data item needs to be uniquely identified and have immutable contents that do not change as copies of it move around. Next, anomalies can be caused in the interpretation of data in different locations and at different times; the notion of "stable" data avoids these anomalies. The section also discusses schemas and the messages they describe. This leads to the mechanisms by which one piece of outside data can refer to another piece of data and the implications of immutability. Finally, what does outside data look like when it is being created by a collection of independent services, each living in its own temporal domain?

Immutable and/or versioned data

Data may be immutable. Once immutable data is written and given an identifier, its contents will remain the same for that identifier. Once it is written, it cannot be changed. In many environments, the immutable data may be deleted and the identifier will subsequently be mapped to an indication of "no present data," but it will never return data other than the original contents. Immutable data is the same no matter when or where it is referenced.

Versioned data is immutable. If you specify a specific version of some collection of data, you will always get the same contents. In many cases, a version-independent identifier is used to refer to a collection of data. An example is the New York Times. A new version of the newspaper is produced each day (and, indeed, because of regional editions, multiple versions are produced each day).

To bind a version-independent identifier to the underlying data, it is necessary first to convert to a version-dependent identifier. For example, the request for a recent New York Times is converted into a request for the New York Times on January 4, 2005, California edition. This is a version-dependent identifier that yields the immutable contents of that region's edition of that day's paper. The contents of this edition for that day will never change no matter when or where you request it. Either the information about the contents of that specific newspaper is available or it is not. If it is available, the answer is always the same.
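The two-step binding can be sketched as follows. The data structures and names are hypothetical; what matters is the shape: a mutable pointer resolves a version-independent name to a version-dependent identifier, and the version-dependent identifier always yields the same immutable bytes.

```python
# Hypothetical identifier scheme modeled on the newspaper example.
editions = {
    ("nytimes", "2005-01-04", "california"): b"...front page bytes...",
}
latest = {"nytimes": ("2005-01-04", "california")}   # the only mutable piece

def resolve(name: str) -> tuple:
    """Convert a version-independent id into a version-dependent one."""
    version, region = latest[name]
    return (name, version, region)

def fetch(edition_id: tuple) -> bytes:
    return editions[edition_id]      # immutable: same id, same bytes, forever

eid = resolve("nytimes")
assert fetch(eid) == fetch(eid)      # stable across repeated reads
```

Publishing a new edition changes only the `latest` pointer; every already-issued version-dependent identifier keeps meaning exactly what it meant.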

Immutability, messages, and outside data

One reality of messaging is that messages sometimes get lost. To ensure delivery, the message must be retried. It is essential that retries have the same contents. The message itself must be immutable. Once a message is sent, it cannot be unsent any more than a politician can unsay something on television. It is best to consider each message as uniquely identified, and that identifier must yield immutable contents for the message. This means the same bits are always returned for the message.

Stability of data

Immutability isn't enough to ensure a lack of confusion. The interpretation of the contents of the data must be unambiguous. Stable data has an unambiguous and unchanging interpretation across space and time.

For example, a monthly bank statement is stable data. Its interpretation is invariant across space and time. On the other hand, the words President Bush had a different meaning in 2005 than they did in 1990. These words are not stable in the absence of additional qualifying data. Similarly, anything called current (e.g., current inventory) is not stable.

To ensure the stability of data, it is important to design for values that are unambiguous across space and time. One excellent technique for the creation of stable data is the use of time-stamping and/or versioning. Another important technique is to ensure that important identifiers such as customer IDs are never reused.

Immutable schema and immutable messages

As discussed previously, when a message is sent, it must be immutable and stable to ensure its correct interpretation. In addition, the schema for the message must be immutable. For this reason, it is recommended that all message schemas be versioned and each message use the version-dependent identifier of the precise definition of the message format. Alternatively, the schema can be embedded in the message. This is popular when using JSON or other semi-structured formats.
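In the versioned-schema variant, each message simply names the exact schema version it was written against. The field names below are hypothetical, but the shape is the common one: the receiver dispatches on the version-dependent schema identifier, so the message's interpretation can never drift after it is sent.

```python
import json

# Sketch: every message carries the version-dependent identifier of its schema.
message = {
    "schema": "bank.deposit/v3",     # immutable, versioned schema identifier
    "operator": "deposit",
    "operands": {"account": "acct-1", "amount": 100},
}
wire = json.dumps(message)           # immutable once sent; retries resend these bits

received = json.loads(wire)
assert received["schema"] == "bank.deposit/v3"   # dispatch on the exact version
```

Publishing `bank.deposit/v4` later does not change how any v3 message already in flight is interpreted.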

References to data, immutability, and DAGs

Sometimes it is essential to refer to other data. When referencing from outside data, the identifier used for the reference must specify data that is immutable. If you find an immutable document that tells you to read today's New York Times to find out more details, that doesn't do you any good without more details (specifically the date and region of the paper).

As new data is generated, it may have references to complex graphs of other data items, each of which is immutable and uniquely identified. This creates a DAG (directed acyclic graph) of referenced data items. Note that this model allows for each data item to refer to its schema using simply another arc in the DAG. Over time, independent services, each within its own temporal domain, will generate new data items blithely ignorant of the recent contributions of other services. The creation of new immutable data items that are interrelated by membership in this DAG is what gives outside data its special charm.
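One way such a DAG can arise (an assumed design, not something the article mandates) is content addressing: an item's identifier is the hash of its bytes, so every reference is necessarily to immutable content, and a schema is just another item reached by another arc.

```python
import hashlib
import json

# A minimal content-addressed DAG of immutable data items.
store = {}

def put(obj: dict) -> str:
    """Store an item; its identifier is the hash of its canonical bytes."""
    data = json.dumps(obj, sort_keys=True).encode()
    item_id = hashlib.sha256(data).hexdigest()
    store[item_id] = data            # same id always maps to the same bytes
    return item_id

schema_id = put({"fields": ["account", "amount"]})      # the schema is just data
msg_id = put({"schema": schema_id,                      # arc to its schema
              "operands": {"account": "acct-1", "amount": 100}})

loaded = json.loads(store[msg_id])
assert loaded["schema"] == schema_id                    # follow the DAG arc
```

Because an identifier depends on the item's contents, an item can only reference items that already exist, which is exactly why the graph stays acyclic.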

DATA ON THE OUTSIDE: REFERENCE DATA

Reference data refers to a type of information that is created and/or managed by a single service and published to other services for their use. Each piece of reference data has both a version-independent identifier and multiple versions, each of which is labeled with a version-dependent identifier. For each piece, there is exactly one publishing service.


This section discusses the publication of versions, then moves on to the various uses of reference data.

Publishing versioned reference data

The idea here is quite simple. A version-independent identifier is created for some data. One service is the owner of that data and periodically publishes a new version that is labeled with a version-dependent identifier. It is important that the version's identifier is known to be increasing as subsequent versions are transmitted. When a version of the reference data is transmitted, it must be assumed to be somewhat out of date. The information is clearly from the past and not now. It is reasonable to consider these versions as snapshots.
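A publisher of this kind can be sketched as below. The class and its shape are hypothetical; the two properties it illustrates come straight from the text: version identifiers are strictly increasing, and each published snapshot is frozen.

```python
# Sketch of the owning service's publishing side.
class ReferencePublisher:
    def __init__(self, name: str):
        self.name = name             # version-independent identifier
        self._version = 0
        self.snapshots = {}

    def publish(self, data: dict) -> tuple:
        self._version += 1           # versions only ever increase
        vid = (self.name, self._version)
        self.snapshots[vid] = dict(data)   # frozen copy: snapshots are immutable
        return vid

prices = ReferencePublisher("price-list")
v1 = prices.publish({"sku-7": 999})
v2 = prices.publish({"sku-7": 899})
assert v2[1] > v1[1]                 # subscribers can totally order snapshots
assert prices.snapshots[v1] == {"sku-7": 999}   # old versions never change
```

A subscriber that receives v2 after v1 can safely discard v1 for current use while still being able to interpret old operands that cite it.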

Uses of reference data

There are three broad usage categories for reference data, at least so far:

- Operands contain information published by a service in anticipation that another service will submit an operator using these values.
- Historic artifacts describe what happened in the past within the confines of the sending service.
- Shared collections contain information that is held in common across a set of related services and evolves over time. One service is the custodian and manages the application of changes to a part of the collection. The other services use somewhat older versions of the information.


Operands

As previously discussed, messages contain operators that map to the functions provided by the service. These operators frequently require operands as additional data describing the details of the requested work. Operands are gleaned from reference data that is typically published by the service being invoked. A department store catalog, for example, is reference data used to fill out the order form. An online retailer's price list, product catalog, and shipping-cost list are operands.

Historic artifacts

Historic artifacts report on what happened in the past. Sometimes these snapshots of history need to be sent from one service to another. Serious privacy issues can result unless proper care is exercised in the disclosure of historic artifacts from one service to another. For this reason, this usage pattern is often seen across services that have some form of trust relationship, such as quarterly results of sales, a monthly bank statement, or inventory status at the end of the quarter.

Shared collections

The most challenging usage pattern for reference data is the shared collection. In this case, many different services need to have a recent view of some interesting data. Frequently cited examples include the employee database and the customer database. In each of these, lots of separate services want both to examine and to change the contents of the data in these collections.

Many large enterprises experience this problem writ large. Lots of different applications think they can change the customer database, and now that these applications are running on many servers, there are many replicas of the customer database (frequently with incompatible schemas). Changes made to one replica gradually percolate to the others with information loss caused by schema transformations and conflicting changes. A shared collection offers a mechanism for rationalizing the desire to have multiple updaters and allowing controlling business logic to enforce policies on the data.

A shared collection has one special service that actually owns the authoritative perspective of the collection. It enforces business rules that ensure the integrity of the data. The owning service periodically publishes versions of the collection and supports incoming requests whose operators request changes. Note that this is not optimistic concurrency control. The owning service has complete control over the changes to be made to the data. Some fields may be updatable, and others may not. Business constraints may be applied as each requested change is considered.

Consider changes to a customer's address. This is not just a simple update but complex business logic:

- You don't simply update an address. You append the new address while remembering that the old address was in effect for a range of dates.
- Changing the address may affect the tax location.
- Changing the address may affect the sales district.
- Shipments may need to be rerouted.
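The append-only address rule can be sketched as follows. The record layout is hypothetical; the behavior is the one described above: a "change of address" closes the old record's date range and appends a new one, and the owning service's logic is where the downstream consequences would be handled.

```python
from datetime import date

# One customer's address history inside the owning service (hypothetical schema).
addresses = [
    {"address": "1 Old Rd", "from": date(2018, 1, 1), "to": None},
]

def change_address(history: list, new_address: str, effective: date) -> None:
    history[-1]["to"] = effective    # old address was in effect until now
    history.append({"address": new_address, "from": effective, "to": None})
    # Real business logic would also re-derive the tax location and sales
    # district, and trigger rerouting of in-flight shipments.

change_address(addresses, "2 New Ave", date(2020, 5, 1))
assert len(addresses) == 2                      # nothing was overwritten
assert addresses[0]["to"] == date(2020, 5, 1)   # old record's range is closed
```

Because nothing is overwritten, an invoice dated 2019 can still be matched to the address that was in effect when it was issued.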


DATA ON THE INSIDE

As previously described, inside data is encapsulated behind the application logic of the service. This means that the only way to modify the data is via the service's application logic. Sometimes a service will export a subset of its inside data for use on the outside as reference data.

This section examines the following facets of data on the inside: (1) the temporal environment in which SQL's schema definition language operates; (2) how outside data is handled as it arrives into a service; and (3) the extensibility seen in data on the outside and the challenges inherent in storing copies of that data inside in a shredded fashion to facilitate its use in relational form.

SQL, DDL, and serializability

SQL's DDL (Data Definition Language) is transactional. Like other operations in SQL, updates to the schema via DDL occur under the protection of a transaction and are atomically applied. These schema changes may make a significant difference in the ways that data stored within the database is interpreted. It is essential that transactions preceding a DDL operation be based on the existing schema, and those that follow the DDL operation be based on the schema as changed by the operation. In other words, changes to the schema participate in the serializable semantics of the database.

Both SQL and DDL live in the now. Each transaction is meaningful only within the context of the schema defined by the preceding transactions. This notion of now is the temporal domain of the service consisting of the service's logic and its data contained in this database.
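Transactional DDL is easy to observe in a database that supports it. SQLite is one such engine (assumed here purely as a convenient demonstration vehicle): roll back the transaction, and the CREATE TABLE never happened, exactly as if it were ordinary data.

```python
import sqlite3

# Demonstrate DDL participating in a transaction (SQLite supports this).
conn = sqlite3.connect(":memory:", isolation_level=None)  # manage txns explicitly
conn.execute("BEGIN")
conn.execute("CREATE TABLE accounts (id TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('acct-1', 100)")
conn.execute("ROLLBACK")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
assert "accounts" not in tables      # the schema change rolled back with the data
```

Engines differ here; some databases implicitly commit on DDL, in which case the schema change would not be undone by the rollback.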

Storing incoming data

When data arrives from the outside, most services copy it inside their local SQL database. Although inside data is not, in general, immutable, most services choose to implement a convention by which they immutably retain the data. It is not uncommon to see the incoming data syntactically converted to a more convenient form for the service. This is called shredding (figure 4). Many times, an incoming message is kept as an exact binary copy for auditing and non-repudiation while still converting the contents to a form easier to use within the service itself.
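The audit-copy convention might look like the following sketch (the table layout is hypothetical): the incoming message is retained byte-for-byte, and any convenient form is derived from those authoritative bytes rather than replacing them.

```python
import json
import sqlite3

# Keep an exact binary copy of each incoming message for audit/non-repudiation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inbox (msg_id INTEGER PRIMARY KEY, raw BLOB)")

incoming = b'{"operator": "deposit", "operands": {"amount": 100}}'
conn.execute("INSERT INTO inbox (raw) VALUES (?)", (incoming,))
conn.commit()

# The convenient form is derived; the stored bytes stay untouched.
raw = conn.execute("SELECT raw FROM inbox").fetchone()[0]
assert raw == incoming                              # exact copy preserved
assert json.loads(raw)["operands"]["amount"] == 100
```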

Extensibility versus shredding

[Figure 4: Shredding. Incoming data is converted into the service's inside data.]

Frequently the outside data is kept in a semi-structured representation such as JSON, which has a number of wonderful qualities for this, including extensibility. JSON's extensibility allows other services to add information to a message that was not declared in the schema for the message. Basically, the sender of the message has added stuff that you didn't expect when the schema was defined. Extensibility is in many ways like scribbling on the margins of a paper form. It frequently gets the desired results, but there are no guarantees.

As incoming outside data is copied into the SQL database, there are advantages to shredding it. Shredding is the process of converting the hierarchical semi-structured data into a relational representation. Normalizing the incoming outside data is not a priority. Normalization is designed to eliminate or reduce update anomalies. Even though you're stuffing the data into a SQL database, you're not going to update it. You are capturing the outside data in a fashion that's easier to use inside SQL. Shredding is, however, of great interest for business analytics. The better the relational mapping, the better you will be able to analyze the data.

It is interesting that extensibility fights shredding. Mapping unplanned extensions to planned tables is difficult. Many times, partial shredding is performed wherein the incoming information that does comply with well-known and regular schema representations is cleanly shredded into a relational representation, and the remaining data (including extensions) is kept without shredding.


REPRESENTATIONS OF DATA

Let's consider the characteristics of these two prominent representations of data: JSON and SQL.

Representing data in JSON

JSON is a standard for representing semi-structured data. It is an interchange format with a human-readable text for storing and transmitting attribute-value pairs. Sometimes a schema for the data is kept outside the JSON document. Sometimes the metadata is embedded (as attribute-value pairs) into the hierarchical structure of the document. JSON documents are frequently identified with a URL (uniform resource locator), which gives the document a unique identity and allows references to it.

It is this combination of human readability, self-describing attribute-value pairs, and global identity through URLs that makes JSON so popular. Of course, its excellent and easy-to-use libraries in multiple languages help too.

Representing data in SQL

SQL represents relationships by values contained in cells within rows and tables. Being value-based allows it to "relate" different records to each other by their value. This is the essence of the relational backbone of SQL. It is precisely this value-based nature of the representation that enables the amazing query technology that has emerged over the past few decades. SQL is clearly the leader as a representation for inside data.


Bounded and unbounded

Let's contrast SQL's value-based mechanism with JSON's identity- and reference-based mechanism.

Relational representations must be bounded. For the value-based comparisons to work correctly, there must be both temporal and spatial bounds. Value-based comparisons are meaningful only if the contents of both records are defined within the same schema. Multiple schemas can have well-defined meaning only when they can be (and are) updated within the same temporal scope (i.e., with ACID semantics in the same database). This effectively yields a single schema. SQL is semantically based on a centrally managed single schema. Attempts over the past 20 years to create distributed SQL databases are fine but must include a single transactional scope and a single DDL schema. If not, the semantics of relational algebra are placed under pressure. SQL only works inside a single database.

JSON is unbounded. In JSON, data is referenced using URIs (uniform resource identifiers) and not values. These URIs are universally defined and unique. Of course, every URL is a legitimate URI so they're cool, too. URIs can be used on any machine to uniquely identify the referenced data. When used with the proper discipline, this can result in the creation of DAGs of JSON documents, each of which may be created by independent services living in independent temporal (and schema) domains.

Characteristics of inside and outside data

Let's consider the various characteristics discussed so far for inside and outside data, as shown in figure 5.


FIGURE 5: Inside and outside data

                                                        Outside Data   Inside Data
Immutable?                                              yes            no
Identity-based references?                              yes            no
Open schema?                                            yes            no
Represent in JSON or other semi-structured fashion?     yes            no
Encapsulation useful?                                   no             yes
Long-lived evolving data with evolving schema?          no             yes
Business intelligence desirable over data?              yes            yes
Durable storage in SQL inside the service?              yes            yes

Immutability, identity-based references, open schema, and JSON representation apply to outside data, not to inside data. This is all part of a package deal in the form of the representation of the data, and it suits the needs of outside data well. The immutable data items can be copied throughout the network and new ones generated by any service. Indeed, the open and independent schema mechanisms allow independent definition of new formats for messages, further empowering the independence of separate services.

Next, consider encapsulation and realize that outside data is not protected by code. There is no formalized notion of ensuring that access to the data is mediated by a body of code. Rather, there is a design point that says if you have access to the raw contents of a message, you should be able to understand it. Inside data is always encapsulated by the service and its application logic.

Consider data and its relationship to its schema. Outside data is immutable, and each data item's schema remains immutable. Note that the schema may be versioned and the new version applied to subsequent similar data items, but that does not change the fact that once a specific immutable item is created, its schema remains immutable. This is in stark contrast to the mechanisms employed by SQL for inside data. SQL's DDL is designed to allow powerful transformations to existing schema while the database is populated.

Finally, let's consider the desirability of performing business intelligence analysis over the data. Experience shows that those analysis folks want to slice and dice anything they can get their hands on. Existing analytics operate largely over inside data, which will certainly continue as fodder for analysis. But there is little doubt about the utility of analyzing outside data as well.

The dynamic duo of data representations

Now, let's compare the strengths and weaknesses of these two representations of data, SQL and JSON:

- SQL, with its bounded schema, is fantastic for comparing anything with anything (but only within bounds).
- JSON, with its unbounded schema, supports independent definitions of schema and data. Extensibility is cool too.

        Arbitrary queries    Independent definition of shared data
SQL     Outstanding!         Impossible. SQL data definition is centralized, not independent.
JSON    Problematic          Outstanding!


Consider what it takes to perform arbitrary queries: SQL is outstanding because of its value-based nature and tightly controlled schema, which ensure alignment of the values, hence facilitating the comparison semantics that underlie queries. JSON is problematic because of schema inconsistency. It is precisely the independence of the definition that poses the challenges of alignment of the values. The hierarchical shape and forms of the data may also be a headache. Still, you can project consistent schema in a form easily queried. It might be a lossy projection where not all the knowledge is available to be queried.

Next, consider independent definition of shared data: SQL is impossible because it has centralized schema. As already discussed, this is intrinsic to its ability to support value-based querying in a tightly controlled environment. JSON is outstanding. It specializes in independent definition of schema and independent generation of documents containing the data. That is a huge strength of JSON and other semi-structured data representations.

Each model’s strength is simultaneously its weakness. What makes SQL exceptional for querying makes it dreadful for independent definition of shared data. JSON is wonderful for the independent definition, but it stinks for querying. You cannot add features to either of these models to address its weaknesses without undermining its strengths.
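This trade-off can be made concrete with a small sketch (the documents, field names, and `person` table here are invented for illustration): JSON documents from independent producers disagree on shape, so before they can be queried uniformly they are projected, lossily, into a fixed SQL schema.

```python
import json
import sqlite3

# Two independently produced JSON documents with inconsistent schemas.
docs = [
    '{"name": "Ada", "city": "London", "hobbies": ["math"]}',
    '{"fullName": "Alan", "address": {"city": "Wilmslow"}}',
]

# A lossy projection onto a centralized SQL schema: fields that do not
# fit (e.g., the hobbies list) are simply dropped.
def project(doc: str) -> tuple:
    d = json.loads(doc)
    name = d.get("name") or d.get("fullName")
    city = d.get("city") or d.get("address", {}).get("city")
    return (name, city)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, city TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [project(d) for d in docs])

# Once the values are aligned, arbitrary value-based queries are easy.
rows = conn.execute("SELECT name FROM person WHERE city = 'Wilmslow'").fetchall()
print(rows)  # [('Alan',)]
```

The projection drops whatever does not fit the centralized schema (here, the hobbies list), which is exactly the lossiness described above.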


CONCLUSION
This article describes the impact of services and trust on the treatment of data. It introduces the notions of inside data as distinct from outside data. After discussing the temporal implications of not sharing transactions across the boundaries of services, the article considers the need for immutability and stability in outside data. This leads to a depiction of outside data as a DAG of data items being independently generated by disparate services. The article then examines the notion of reference data and its usage patterns in facilitating the interoperation of services. It presents a brief sketch of inside data with a discussion of the challenges of shredding incoming data in the face of extensibility. Finally, JSON and SQL are seen as representations of data, and their strengths are compared and contrasted. This leads to the conclusion that each of these models has strength in one usage that complements its weakness in another usage. It is common practice today to use JSON to represent data on the outside and SQL to store the data on the inside. Both of these representations are used in a fashion that plays to their respective strengths.

Related articles
- Beyond Relational Databases
  There is more to data access than SQL.
  Margo Seltzer
  https://queue.acm.org/detail.cfm?id=1059807
- The Singular Success of SQL
  SQL has a brilliant future as a major figure in the pantheon of data representations.
  Pat Helland
  https://queue.acm.org/detail.cfm?id=2983199
- A co-Relational Model of Data for Large Shared Data Banks
  Contrary to popular belief, SQL and noSQL are really just two sides of the same coin.
  Erik Meijer and Gavin Bierman
  https://queue.acm.org/detail.cfm?id=1961297


This is an update to the original paper by the same name presented at CIDR 2005 (Conference on Innovative Data Systems Research). At that time, XML was more commonly used than JSON. Similarly, SOA (service-oriented architecture) was used more then, while today, it’s more common to say simply, “service.” In this article, “service” is used to mean a database encapsulated by its service or application code. It does not mean a microservice. That’s for a separate paper. Nomenclature aside, not much has changed.

References
1. Bernstein, P. A., Hadzilacos, V., Goodman, N. 1987. Concurrency Control and Recovery in Database Systems. Addison-Wesley; http://sigmod.org/publications/dblp/db/books/dbtext/bernstein87.html.
2. Gray, J., Reuter, A. 1993. Transaction Processing: Concepts and Techniques. Morgan Kaufmann.

Pat Helland has been implementing transaction systems, databases, application platforms, distributed systems, fault-tolerant systems, and messaging systems since 1978. For recreation, he occasionally writes technical papers. He currently works at Salesforce. Copyright © 2020 held by owner/author. Publication rights licensed to ACM.

The History, Status, and Future of FPGAs
Hitting a nerve with field-programmable gate arrays

OSKAR MENCER, DENNIS ALLISON, ELAD BLATT, MARK CUMMINGS, MICHAEL J. FLYNN, JERRY HARRIS, CARL HEWITT, QUINN JACOBSON, MAYSAM LAVASANI, MOHSEN MOAZAMI, HAL MURRAY, MASOUD NIKRAVESH, ANDREAS NOWATZYK, MARK SHAND, AND SHAHRAM SHIRAZI

This article is a summary of a three-hour discussion at Stanford University in September 2019 among the authors. It has been written with combined experiences at and with organizations such as Zilog, Altera, Xilinx, Achronix, Intel, IBM, Stanford, MIT, Berkeley, University of Wisconsin, the Technion, Fairchild, Bigstream, Google, DIGITAL (DEC), SUN, Nokia, SRI, Hitachi, Silicom, Maxeler Technologies, VMware, Xerox PARC, Cisco, and many others. These organizations are not responsible for the content, but may have inspired the authors in some ways, to arrive at the colorful ride through FPGA space described here.

Field-programmable gate arrays (FPGAs) have been hitting a nerve in the ASIC community since their inception. In the mid-1980s, Ross Freeman and his colleagues bought the technology from Zilog and started Xilinx, targeting the ASIC emulation and education markets. (Zilog came out of Exxon, since in the 1970s people were already afraid that oil would run out in


30 years, which is still true today.) In parallel, Altera was founded with similar technology at its core. An FPGA is a chip that is programmed by a circuit. It is said to “emulate” that circuit. This emulation runs slower than the actual circuit would run if it were implemented in an ASIC—it has a slower clock frequency and uses more power, but it can be reprogrammed every few hundred milliseconds. People who make ASICs started using FPGAs to emulate their ASICs before committing them to a mask and sending them out to the factory to be manufactured. Intel, AMD, and many other companies use FPGAs to emulate their chips before manufacturing them.

HITTING A NERVE IN TELECOM The telecom industry has been a heavy user of FPGAs. Telecom standards keep changing and building telecom equipment is hard, so the company that ships telecom solutions first tends to capture the biggest chunk of the market. Since ASICs take a long time to make, FPGAs offer an opportunity for a shortcut. FPGAs started to be adopted for first versions of telecom equipment, which initiated the FPGA price conflict. While the price of the FPGA does not matter to the ASIC emulation market, the price of a chip for telecom is important. Many years ago, AT&T and Lucent made their own FPGAs, called ORCAs (optimized reconfigurable cell arrays), but they were not competitive with Xilinx or Altera in terms of speed or size of the silicon. Today, Huawei is the largest customer for FPGAs. It is possible that the recent tension between the United States


and China began with FPGAs from the United States giving Huawei an edge in delivering 5G telecom equipment two years before any of the other vendors around the world got ready to play.

FPGA PRICE HITS A NERVE Early on, FPGAs were used for SDRs (software-defined radios), building radios for communication on many different standards at the same time, in essence having a single phone speaking many languages. This use of FPGAs hit a huge nerve. There was a split in how SDR technology was implemented. Commercial vendors developed cost-effective solutions, and today every base station on the planet has SDR technology in it. In the defense community, on the other hand, SDRs were built by large defense contractors with profitable legacy product lines to protect. The result was that the price of FPGA-based radio products was so high that part of the U.S. defense market developed a persistent allergic reaction to their use. Next, FPGAs tried to grow in the DSP (digital signal processor) and embedded markets. FPGAs with small hard microprocessors in the corner started to appear. The pressure to sell these new FPGAs was so high that if customers rejected the new family of chips, they were put on a blacklist, and sometimes even refused service for a few months. Pressure to grow the FPGA market was and still is immense, as is the magnitude of the failures of FPGA companies to conquer new markets, given the impossibility of reducing the price of FPGA products because of their enormous surface area and layers of intellectual property.


HITTING A NERVE IN HPC AND DATACENTERS For the past few years, FPGAs have tried to grow in the HPC (high-performance computing) and datacenter markets. In 2017 Microsoft announced its use of Altera FPGAs in the datacenter, and Intel bought Altera. In 2018 Xilinx announced its “Datacenter First” strategy, with the Xilinx CEO declaring in front of an audience of analysts that Xilinx is not an FPGA company anymore. This may have been a slight dramatization, but historically there is relevance. In HPC and datacenter usage of FPGAs, the main obstacle today is place and route—the time it takes to run the proprietary FPGA vendor software that maps the circuit onto the FPGA elements. On large FPGAs and on a fast CPU server, place and route takes up to three days, and many times even after three days the software fails to find a mapping.

HITTING A NERVE IN OIL AND GAS In oil and gas, however, a niche opened up around 2007. The time it took classical computers to simulate drilling holes in the earth to find oil was longer than the actual building of a drilling site and the drilling itself. The use of FPGA accelerators dramatically changed this upside-down timing. The first FPGAs in the datacenter of an oil company, computing seismic images, were built by Maxeler Technologies and delivered to Chevron.3 The use of FPGAs in oil and gas expanded for a few years, until pressure from the ASIC industry led to a return to standard CPU technology. Today prediction and simulations in oil and gas are still important, and seismic


imaging is mostly done on CPUs and GPUs, but the FPGA opportunity still exists. We are reminded that “today’s new stuff is tomorrow’s legacy,” and, of course, today’s new stuff is AI and a focus on data. Despite all of this, FPGAs remain a quick way to market, a simple way to obtain competitive advantage, and an indispensable technology for many mission-critical situations—even though they are expensive on a per-chip basis compared with ASICs. In HPC and the datacenter, however, FPGAs have significantly lower operational costs than running software on CPUs or GPUs. Fewer FPGAs are needed, requiring much less cooling than both CPUs and GPUs. FPGAs make for smaller datacenters, hitting a nerve with operators who fear their datacenters might shrink.

ASIC VS. FPGA Another way to use FPGAs is to complement ASICs. ASICs are built to hold fixed functionality, while FPGAs are added to provide some flexibility for last-minute changes or adaptivity of the products to different markets. Modern FPGAs are integrating more and more hard functionality and becoming more and more like ASICs, while ASICs are sometimes adding a bit of FPGA fabric into their design for debugging, testing, in-field fixes, and flexibility in adding little bits of functionality as needed. Nevertheless, ASIC teams always fight the FPGA concept. ASIC designers ask, "Which functionality do you want?" and are impatient if the answer is, "I don't know yet." One such new battleground is the autonomous car industry. Since algorithms are constantly changing, and laws could change when cars are in the field, requiring


driver updates, the solution needs to be flexible. Since FPGAs consume less power and are physically smaller (and have much smaller heat sinks because of their lower clock frequency) than CPUs and GPUs, they are the obvious choice. Nevertheless, GPUs are easier to program and do not require a three-day place and route. Moreover, it is critical to be able to run the same code in the car and in the cloud (primarily for simulation and testing), so FPGAs would have to be available in the cloud before they could be used in the car. For these reasons, many developers prefer GPUs.

EVOLUTION OF FPGAS FPGAs are evolving. Modern interfaces are trying to make FPGAs easier to program, more modular, and more cooperative with other technologies. FPGAs support AXI (Advanced Extensible Interface) buses, which make them easier to program but also introduce enormous inefficiencies and make FPGAs less performant and ultimately much less competitive. Academic work, such as Eric Chung's paper on dynamic networks for FPGAs,1 helps with the routing problem, but such advanced ideas have not yet been picked up by industry. How are FPGAs connected? For HPC workloads with large flows of data, you can use PCI Express and deploy communication-hiding techniques. But how about small workloads, such as those found in NFV (network function virtualization), serving a large number of users at the same time? For NFV and acceleration of virtual machines in general, the FPGA must connect directly to the CPU, possibly using cache coherency as a communication


mechanism, as investigated these days by VMware. Of course, a key feature is the ability to crash the FPGA without crashing the CPU, and vice versa. Hyperscale technology companies are rediscovering requirements from IBM mainframe days, driving more and more complexity into standardized platforms.

There are also opportunities for the masses. In offering FPGA platforms, organizations without the budgets for ASIC development and without knowledge of the latest silicon fabrication challenges and solutions can develop circuits and build competitive advantage into their products, such as the newly emerging opportunities for computing at the edge of the IoT (Internet of things) network, close to sensors, displays, or just in-line at the wire, as data flows through.

Meanwhile, FPGA companies are pushing vertically up the stack and into the CPU socket, where Intel is dominating the market, including, for example, special instructions for NFV. The key barriers to entry for new CPUs and FPGAs in the datacenter are not just speed and cost, but also the availability of software and drivers for all possible I/O devices. Key to making FPGAs work in the datacenter is to make them easier to use, for example, with automatic tools that drive the use of FPGAs without place and route difficulties. Microsoft pioneered the use of FPGAs in a hyperscale datacenter for accelerating Bing, NFV, and AI algorithms. Microsoft also built abstractions, domain-specific languages, and flexible hardware infrastructures. Commercially, the main problem with FPGAs is the go-to-market strategy.
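The communication-hiding technique mentioned for PCI Express-attached FPGAs can be sketched with a toy model (this is our illustration in plain Python, not an actual FPGA or driver API): a bounded buffer lets the transfer of upcoming blocks overlap with computation on earlier ones, hiding transfer latency behind useful work.

```python
import threading
import queue

# Toy model of communication hiding: a producer thread "transfers" blocks
# (standing in for PCIe DMA) while the consumer computes on earlier blocks,
# so transfer latency overlaps with useful work instead of adding to it.
def transfer(blocks, q):
    for block in blocks:
        q.put(block)          # simulate moving a block to the accelerator
    q.put(None)               # sentinel: no more data

def compute(q, results):
    while True:
        block = q.get()
        if block is None:
            break
        results.append(sum(block))  # stand-in for the accelerator kernel

blocks = [[1, 2, 3], [4, 5], [6]]
q = queue.Queue(maxsize=2)    # bounded buffer, as in double/triple buffering
results = []
t = threading.Thread(target=transfer, args=(blocks, q))
t.start()
compute(q, results)
t.join()
print(results)  # [6, 9, 6]
```

The bounded queue is the essence of double buffering: the transfer of block N+1 proceeds while block N is being computed.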


Building new chips and then starting to think about the software is too late. How do you extract value from existing software by adapting the hardware to serve the software? This also brings an opportunity to rethink FPGA architecture. A word of warning, however: The silicon industry devours cash. Building ASICs is a poker game with minimum bets rising over the years. It’s a winner-take-all game, and any threats such as FPGAs get eliminated early in the race. FPGAs are creating additional and undesirable risks for silicon projects.

NICHE TECHNOLOGY While a software designer will always say, "If it can be done in software, it will be done in software," the ASIC designer will say, "If it can be done in an ASIC, it will be done in an ASIC." Most interestingly, "If it can be done in software, you don't have to deal with the guy who thinks like an FPGA." FPGAs have a tiny community of sometimes eccentric programmers, compared with the armies needed to make ASICs and with the world population of software programmers. The FPGA companies are small. The FPGA community is small. Intel is driving FPGAs for flexibility. It is the most successful company following the principle of building the hardware to run existing software. FPGAs can be faster than CPUs and GPUs, but the hard lesson from industry and the investment community is that most of the time during a computer's existence, speed does not matter, and realtime does not matter. Therefore, buying a computer for speed alone is rare. It happens, but


it’s more of a random event than a market on which to build a business. In addition, FPGAs have no standard, open source, enjoyable programming model—and, therefore, no standard marketplace for FPGA programs that work on all FPGA chips or can be easily cross-compiled. Maxeler Technologies has a high-level solution to provide such an interface, but wide industry adoption requires trust. To go from early adopters to benefiting everyone, trust requires alignment and support from established vendors in the datacenter space. Applications people in the real world say, “I don’t care what it is, just give me a way to do what I want to do.” What are the possible application areas for FPGAs that have not been widely explored yet? For realtime computing, there is manufacturing. For computer vision on drones, it’s the weight and power advantage of FPGAs. On a satellite it is very expensive to do hardware upgrades, so FPGAs provide long-term flexibility that can be critical. FPGAs need to find a product that resonates, and they need to be easy to program. It’s not just the hardware or software, it’s the ecosystem. It’s the complete solution. One way to expand beyond current market confines is realtime compilation and automatic FPGA program generation. This is easier said than done, but the opportunity is growing with AI tearing up the application space. These days, everything is done with AI; even traditional algorithms such as seismic imaging for oil and gas are incorporating AI. A science and engineering solution is needed to deal with AI blocks. FPGAs might be a good starting point, maybe initially to connect the AI blocks and then to incorporate them into the FPGA fabric


such as the next-generation chips from Xilinx—with AI fabric, CPUs, 100G interfaces, and FPGA cells all in the same 7-nm chip. From another perspective, with AI chips producing and consuming vast amounts of data, FPGAs will be needed to feed the beast and move outputs away swiftly. With all the new ASICs for AI processing coming out, FPGAs could provide differentiation to AI chip companies.

PREDICTIONS
Could the following developments have been predicted 10 or 25 years ago?2 While the world changes, the predictions seem to stay the same.
1. There will be successful CPU+FPGA server chips, or FPGAs with direct access to the CPU's cache hierarchy. Some say yes, and some say no.
2. SoC (system on a chip) FPGA chips will grow and expand, driving the medical, next-generation telecom, and automotive industries, among others.
3. Developers will use FPGAs to do amazing things and make the world a better place but will have to hide the fact that there is an FPGA inside.
4. The FPGA name will remain, and chips called FPGAs will be built, but everything inside will be completely different.
5. As we forgo (dataflow) optimization in order to make FPGAs easier to program, the performance of FPGAs will be reduced so they are no longer competitive with CPUs, which will always be easier to program.
6. There will be FPGAs with dynamic routing, evolving interconnect, and runtime-flexible data movement.


7. Place and route software, as well as the complete software stack on top of FPGAs, will be open source. There are already initial efforts with Yosys and Lattice FPGAs.
8. All semiconductor architectures will be combined into single chips with combinations of TPUs, GPUs, CPUs, ASICs, and FPGAs. Some may be combinations of the whole of each. Others will be combinations of parts of each.
9. More chips will be focused on limited application spaces, and fewer on general-purpose chips. In a way, everything is becoming an SoC.

FINAL COMMENT
How many conflicts are resolved with this article, and how many new ones are created? In this sense, a conflict is a challenge to an existing way of doing things. Such an existing way of doing things may have implications for the way people think, and, therefore, for the way they act. But maybe more importantly, there will be implications for how we developers earn a living.

Related articles
- FPGA Programming for the Masses
  The programmability of FPGAs must improve if they are to be part of mainstream computing.
  David F. Bacon, Rodric Rabbah, Sunil Shukla
  https://queue.acm.org/detail.cfm?id=2443836
- FPGAs in Data Centers
  Expert-curated Guides to the Best of CS Research
  Gustavo Alonso
  https://queue.acm.org/detail.cfm?id=3231573
- Reconfigurable Future
  The ability to produce cheaper, more compact chips is a double-edged sword.
  Mark Horowitz
  https://queue.acm.org/detail.cfm?id=1388771


References
1. Chung, E. 2011. CoRAM: an in-fabric memory architecture for FPGA-based computing. Ph.D. thesis, Carnegie Mellon University.
2. Field-programmable Custom Computing Machines. 2012. FCCM predictions; https://www.fccm.org/past/2012/Previous.html.
3. Nemeth, T., Stefani, J., Liu, W., Dimond, R., Pell, O., Ergas, R. 2008. An implementation of the acoustic wave equation. In Proceedings of the 78th Society of Exploration Geophysicists Meeting, Las Vegas.

Copyright © 2020 held by owner/author. Publication rights licensed to ACM.


Scrum Essentials Cards
Experiences of Scrum Teams Improving with Essence

JEFF SUTHERLAND, IVAR JACOBSON, AND BRIAN KERR

This article presents a series of examples and case studies on how people have used the Scrum Essentials cards to benefit their teams and improve how they work. Scrum is one of the most popular agile frameworks used successfully all over the world. It has been taught and used for 15-plus years. It is by far the most-used practice when developing software, and it has been generalized to be applicable for not just software but all kinds of products. It has been taught to millions of developers, all based on the Scrum Guide.6 Today all our Scrum training is accompanied by a set of cards designed using the Essence standard, which has given us a better Scrum. Scrum is described succinctly in fewer than 20 pages in the Scrum Guide. Although Scrum is relatively simple in structure, teams can always improve their application of it and increase their effectiveness.


One way of improving is to use Essence,2,3 an industry standard for describing practices such as Scrum. It takes the key content of a practice and represents it in a standard way, allowing practices from many places to be understood, compared, and used together. Making the practice content tangible on cards allows teams to see and directly manipulate the ideas and concepts. The essential elements of Scrum are represented on the 21 cards seen in figure 1, showing all the values, principles, roles, activities, and things produced. The cards are moved, grouped, or ordered by the team in collaborative sessions as they make decisions about their own way of working. The card metaphor naturally prompts the right kind of conversation and encourages all team members to be involved in a hands-on, interactive, and fun way.

FIGURE 1: Essential elements of Scrum

The rest of this paper shows a few of the ways that the cards can be used:


- Individuals and teams learning new practices and enhancing training events.
- Team retrospectives improving the team's way of working.
- Planning activities while efficiently coordinating the team and other stakeholders.
- Agreeing on responsibilities within the team and, importantly, with external stakeholders.
- Tracking progress of the key items under development.
- Determining the current status of the endeavor.

No doubt you will come up with your own when you have the cards in your hands.

LEARNING WITH THE CARDS One of the biggest challenges in implementing and scaling Scrum is having all team members agree on what Scrum is and how it is to be implemented in the organization. The Essence cards provide a standard definition of Scrum with all of its 21 components as defined in the Scrum Guide. We start with an exercise called “Build Your Own Scrum” using the Essence cards. Every team in the organization puts a copy of the cards on a flip chart with lines and labels that connect and explain the interrelationship between Scrum components, as shown in figure 2. When the teams have completed an image of Scrum, each one presents its Scrum to all the other teams. We then ask the teams to identify which components of Scrum are properly implemented, which are poorly implemented, and which are not implemented at all. On average, teams implement a third of the components well,


FIGURE 2: Chart showing interrelationship between components

one third poorly, and one third not at all. This clarifies why they are having trouble with their Scrum implementations. If one third of the components of a car were missing, the teams would understand they might have trouble driving the car to a destination.

IMPROVING WITH RETROSPECTIVES An important part of Scrum is for the team to regularly


inspect and improve their way of working. Using the Scrum Essentials cards on a board, as seen in figure 3, allows the team to have good dynamic discussions while being prompted not to forget things. The board has two dimensions. The vertical axis represents how important that concept is to the team, while the horizontal axis is how well the team feels they performed in that area. The team then places the cards appropriately and moves them around as they discuss and assess how they work. The brief content on each card can be relied upon to summarize the most important information from the Scrum Guide. This helps reduce any circular discussions from team members about their personal interpretations of each concept and keeps the Scrum Guide alive and at the team’s fingertips. One common weakness when teams perform a

FIGURE 3: Use of cards on a board


Retrospective is that they talk about the problems without then agreeing on the improvement actions to plan for the next sprint. A simple column on the side of the board prompts the team to identify these improvements. The problems on which to focus will be in the top right-hand corner, which represents the high-priority but poorly performed items.
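The board's selection logic is simple enough to model (the cards' scores below are invented for illustration): the improvement candidates are the cards that score high on importance but low on performance.

```python
# Toy model of the retrospective board: each card gets an importance and a
# performance score (1-5, invented for illustration). Improvement candidates
# are the high-importance, low-performance cards.
cards = {
    "Daily Scrum":                {"importance": 5, "performance": 4},
    "Product Backlog Refinement": {"importance": 5, "performance": 2},
    "Sprint Review":              {"importance": 3, "performance": 3},
    "Definition of Done":         {"importance": 4, "performance": 1},
}

def improvement_candidates(cards, min_importance=4, max_performance=2):
    # The "top right-hand corner": important to the team, performed poorly.
    return sorted(
        name for name, s in cards.items()
        if s["importance"] >= min_importance and s["performance"] <= max_performance
    )

print(improvement_candidates(cards))
# ['Definition of Done', 'Product Backlog Refinement']
```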

AGREEING ON RESPONSIBILITIES Software development is usually a team sport requiring the coordination of people with many different skillsets. Another simple board on which to place the Essence cards is one focusing on responsibilities, seen in figure 4.

FIGURE 4: Responsibility board

The board has a simple line across it, separating the roles that will agree on how they will work together. This type of active discussion has proved especially


useful among different parts of an organization or even when the roles come from different organizations, such as in this example where a Product Owner is working with a development team from an external supplier. The cards’ vertical position shows who is wholly responsible, jointly responsible, or taking the lead for each concept. With a collaborative framework such as Scrum, it’s no surprise that many items straddle the line, so the team here has added notes to either side to describe the nature of their expected involvement. This team realized they had to make some compromises with their remote Product Owner and would have to carefully make best use of their time as they wouldn’t be able to have access to the Product Owner on demand during each sprint. When things get more formal, and perhaps contracts are drawn up between organizations intending to work jointly with an agile approach, this kind of facilitated session between parties is a good starting point for discussions before things get turned into legalese.

PLANNING ACTIVITIES In this example the Scrum Essentials cards are used to visualize the practicalities of performing the team’s activities. Most teams are already up and running, so a two-step approach can be used. First, the team distinguishes between the activities that are currently being performed OK and the ones that need to be set up for the first time or definitely need to be improved. This can be done using the Retrospective board, as described previously.


The second step is for the team to agree on how best to do these activities, prompted by the general questions of Who?, When?, Where?, Why?, and How? The activities and roles start off on the far left-hand side, and the activity cards to be focused on are then moved right, to the Activity column, as shown in figure 5. In this example the two activities of interest from before are the Daily Scrum and Product Backlog Refinement. They are discussed by the team, and the Scrum Team role card has been placed in the Who column with some notes added for clarification. Notes have similarly been added to capture the team decisions made when prompted by the questions in the remaining columns.

FIGURE 5: Who, When, Where, Why, and How?


WHERE ARE WE IN THE BIG PICTURE? Most teams are used to tracking the state of the smaller things like individual Product Backlog Items or Improvements. Essence helps by providing a clear description of the states they go through, but even so it is easy for a team to lose sight of the bigger picture of their endeavor. Essence refers to these things a team wants to track as Alphas, each with a defined lifecycle of states and a checklist for each state. Essence also describes some of these key Alphas that always have to be considered, regardless of the specific practices being used. They are universally applicable and help a team see where they are in the big picture. These elements, known as the Essence Kernel, describe the overall Opportunity, Stakeholders, Requirements, System, Team, Work, and Way of Working. Figure 6 shows

FIGURE 6: Opportunity alpha


an example of the Opportunity alpha with its lifecycle and a checklist for one of the states. Laying the cards out like an abacus on a table showing the sequence of states for these top-level items allows the team to reason about where they are. They can consider each state in turn, examine the checklist, and move the card across if they feel they have adequately achieved it. Many teams find they make a lot of progress with their system but often forget key things to do with understanding the original opportunity, or not involving the right stakeholders in the right way. Of course, it’s not just a way to see where you are; the states and checklists also indicate what you should consider next.
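The Alpha idea, a tracked item with an ordered lifecycle of states each gated by a checklist, can be modeled as a small state machine. In this sketch the state names follow the Essence Kernel's Opportunity alpha, while the checklist items are abbreviated paraphrases rather than the standard's exact text.

```python
# Minimal sketch of an Essence Alpha: an ordered list of states, each with a
# checklist. The card advances only when every item in the current state's
# checklist is checked. Checklist wording is abbreviated, not the standard's.
OPPORTUNITY_STATES = [
    ("Identified", ["idea conceived", "stakeholders identified"]),
    ("Solution Needed", ["need for a solution confirmed"]),
    ("Value Established", ["value of a solution quantified"]),
    ("Viable", ["solution outlined", "risk acceptable"]),
    ("Addressed", ["solution in use", "value starting to accrue"]),
    ("Benefit Accrued", ["anticipated value achieved"]),
]

class Alpha:
    def __init__(self, name, states):
        self.name = name
        self.states = states
        self.index = 0
        self.checked = set()

    def current_state(self):
        return self.states[self.index][0]

    def check(self, item):
        state, checklist = self.states[self.index]
        if item in checklist:
            self.checked.add(item)
        # Move the card across once the whole checklist is achieved.
        if self.checked == set(checklist) and self.index < len(self.states) - 1:
            self.index += 1
            self.checked = set()

opportunity = Alpha("Opportunity", OPPORTUNITY_STATES)
opportunity.check("idea conceived")
opportunity.check("stakeholders identified")
print(opportunity.current_state())  # Solution Needed
```

Laying out the cards like an abacus, as the text describes, is exactly this: considering each state's checklist in turn and moving the card across when it is achieved.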

BEYOND THE SCRUM ESSENTIALS The examples shown here have used the Scrum Essentials practice, but there are countless other practices in use around the world. Most teams use combinations of practices that suit their circumstances, and Essence allows these to be seamlessly combined into a single way of working using a consistent language. Some practices fill in the areas for which Scrum itself doesn’t provide guidance, such as User Stories, Use Case 2.0, or Scrum@Scale practices. Other practices complement the essential content of Scrum and help in accelerating its adoption, such as Scrum health checks and other patterns. Recently we were asked to run a workshop for a set of experienced Scrum Masters in an organization who were looking for a refresher on Scrum and how they


could improve. We designed this exercise using the Scrum Essentials practice and some complementary practices. It had the following structure:
1. Start with a "Build your own Scrum" exercise. The Scrum Masters capture and explain how the Scrum elements fit together, allowing any misunderstandings of the Scrum practice itself to be surfaced and dealt with first. This is combined with their initial views on which parts are done well, done poorly, or not done at all.
2. Then perform a Scrum Health Check against the Scrum Pillars and Values. Many Scrum Masters focus on the mechanics of Scrum but do not always reflect on its higher-level Pillars and Values. The Scrum Health Check practice has a set of pattern cards with statements of what it's like to be "Awesome" or "Awful" at each one; teams compare themselves against these to identify areas that may need more attention, as shown in figure 7.

FIGURE 7: Focus Health Check


3. Conclude by looking forward at examples of how to become a high-performing team.5 The Scrum for Hyper-Productive Teams practice has a set of pattern cards describing proven ways that a team can become high performing. Here the Scrum Masters can see which patterns they are already following and which ones they may want to prioritize for adoption.
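The health-check step above can be thought of as a simple rating exercise: for each pillar or value, the team places itself somewhere on the spectrum between the "Awful" and "Awesome" pattern statements. The sketch below is a hypothetical illustration only; the actual practice is a discussion around cards, not a numeric score, and the scale and names here are our own.

```python
# Hypothetical sketch of a Scrum health check: the team rates each
# pillar or value on a scale anchored by the "Awful" and "Awesome"
# pattern statements; low ratings flag areas needing attention.

SCALE = {"awful": 1, "poor": 2, "fair": 3, "good": 4, "awesome": 5}

def health_check(ratings, threshold=3):
    """ratings: mapping of pillar/value name -> rating word.
    Returns the names scoring below the threshold, sorted."""
    return sorted(name for name, word in ratings.items()
                  if SCALE[word] < threshold)

ratings = {
    "Transparency": "good",
    "Inspection": "fair",
    "Adaptation": "poor",
    "Focus": "awful",
}
print(health_check(ratings))  # ['Adaptation', 'Focus']
```

The output is simply the list of pillars and values the team should discuss first, which is what the card comparison in the workshop produces in conversation.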

SCRUM@SCALE After the success of using the Essence cards in our teaching of Scrum, we are now using Essence cards when teaching Scrum@Scale.4 It goes way beyond teaching, however, and enables teams to make decisions about how they will adopt and adapt the Scrum@Scale practices to their circumstances. When training groups of people online, we use an electronic whiteboard tool that allows them to collaboratively manipulate the practices on a series of gameboards. As the number of practice elements increases, having all the card information visible at once can be overwhelming and a distraction from seeing the big picture. An easy way to overcome this is to use tokens for each element and provide an alternative way of looking up the detail if needed. The example in figure 8 shows an online gameboard featuring a picture of the Scrum@Scale framework with its two overlapping circles. Both Scrum and Scrum@Scale practices have been described as Essence practices, and we use electronic movable tokens bearing only the name of each practice element. First the team discusses and places the Scrum practice


FIGURE 8: The Scrum@Scale gameboard

tokens on the board to understand how they relate to the Scrum@Scale framework and to see the coverage that the Scrum practice provides. Then the team places the Scrum@Scale tokens on the board. This shows not only the similarities to nonscaled Scrum, but also the additional things needed to manage Scrum in a scaled environment of many teams. We also make the cards available on each participant's phone (figure 9) so they can use them as a reference while manipulating the tokens on the shared board with their team.


FIGURE 9: Cards on phone

Subsequent exercises get the participants to use the same tokens to build a picture of how they will implement the concepts in their own organization. They decide how many and what kinds of teams and scaled teams they will need, choosing from the options described in the practices. They also decide how many and what kinds of backlogs they will have, which teams will share them, and what the nature of the backlog items will be. With Essence, we have at last found a great way to comprehend a scaled way of working in an organization, ensuring it is understood by all so that everyone can contribute and have ownership of the key decisions about how they work.


THE FUTURE IS SMART The Essence cards shown in this article are good for face-to-face collaboration and for allowing teams to make decisions. If you are mixing and matching practices from many sources, or keeping track of your status and using the checklists in your day-to-day work, then some kind of electronic support becomes invaluable. A new app, Essence in Practice, is being developed by Ivar Jacobson International; a preview is shown in figure 10. On the right is a map of the alphas and work products with their relationships within the Essence Kernel. For example, within the Requirements there is one Product Backlog and many Product Backlog Items.

FIGURE 10: Preview of Essence in Practice

The states and levels of detail of these items


are shown on the map, while a detailed card on the left shows more about the particular item being focused on. In this case the focus is on the Coherent state of Requirements; there is a brief description and a checklist of what it means to be considered in that state. The team can check these items off as they are achieved, with the overall progression shown by shading the states on the map. The other thing to note in this example is how an activity relates to the other items, by showing an arrow to the states or levels of detail it either contributes to or achieves. These few features enable the beginning of what we call Live Guidance.1 This is a combination of helping teams assess their current status with the checklists, guiding them as to what to progress to next, and showing how to do that via related activities that can help achieve it. These planned achievements can be turned into to-dos for the team, prepopulated with the checklists as a starting point, and as they are ticked off the current state is updated. Live Guidance then knows where the team is and can prompt what to do next, and so the cycle repeats.

Related articles
- Breaking the Major Release Habit. Can agile development make your team more productive? Damon Poole. https://queue.acm.org/detail.cfm?id=1165768
- Voyage in the Agile Memeplex. In the world of agile development, context is key. Philippe Kruchten. https://queue.acm.org/detail.cfm?id=1281893
- Is There a Single Method for the Internet of Things? Essence can keep software development for the IoT from becoming unwieldy. Ivar Jacobson, Ian Spence, Pan-Wei Ng. https://queue.acm.org/detail.cfm?id=3123501
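The Live Guidance cycle just described, checklists become to-dos, ticking them off updates the current state, and the next state's checklist becomes the new to-do list, can be sketched as a simple loop. This is a hypothetical illustration only, not the Essence in Practice app itself; the function and the checklist wording are our own, though the state names come from the kernel's Requirements alpha.

```python
# Hypothetical sketch of the Live Guidance cycle: walk the ordered
# states, treating each checklist as prepopulated to-dos. The first
# state with unfinished to-dos is where the team is working, and its
# remaining items are what to prompt next.

def live_guidance(states, is_done):
    """states: ordered (state_name, checklist) pairs.
    is_done: callback asking whether a checklist item is achieved.
    Returns (states reached, current state or None, remaining to-dos)."""
    reached = []
    for state, checklist in states:
        todos = list(checklist)                      # prepopulate to-dos
        remaining = [t for t in todos if not is_done(t)]
        if remaining:
            return reached, state, remaining         # prompt what to do next
        reached.append(state)                        # state achieved; shade it
    return reached, None, []                         # everything achieved


# Illustrative checklist text (not quoted from the Essence standard).
states = [
    ("Conceived", ["The need for a new system is agreed"]),
    ("Bounded", ["The purpose of the system is agreed"]),
    ("Coherent", ["The big picture of the requirements is clear"]),
]
done = {"The need for a new system is agreed",
        "The purpose of the system is agreed"}

reached, current, todos = live_guidance(states, lambda t: t in done)
print(current, todos)
# Coherent ['The big picture of the requirements is clear']
```

Each pass of this loop corresponds to one turn of the cycle in the text: ticking a to-do changes what `is_done` reports, rerunning the loop updates the shaded states on the map, and the returned `remaining` list is the prompt for what to do next.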


The future vision is for an ecosystem of practices that are evolved by the community as a whole. Teams will be able to adopt and scale their approach practice by practice, and feed back improvements and experiences of how well it worked in their situations. Depending on what the team sets out to achieve, the system can then recommend which proven practices or specific activities and patterns can help achieve it—even practices that the team is currently unaware of.

FINAL WORDS Scrum and Scrum@Scale are not limited to software but are applicable for all kinds of endeavors: systems, hardware, innovation. When combined with Essence, they can be elegantly extended with all kinds of practices, technical, human, and business, to create trustworthy products.

ACKNOWLEDGMENTS We thank Ian Spence for transforming Scrum and Scrum@ Scale into Essence practices, and Roly Stimson for formulating and documenting some games.

References
1. Jacobson, I. 2019. Live Guidance: a revolution in software development practice; https://www.linkedin.com/pulse/live-guidance-revolution-software-development-ivar-jacobson/.
2. Jacobson, I., Lawson, H., et al. 2019. The Essentials of Modern Software Engineering: Free the Practices from the Method Prisons! ACM Books; https://dl.acm.org/doi/book/10.1145/3277669.
3. Jacobson, I., Stimson, R. 2018. Tear down the method prisons! Set free the practices. acmqueue 16(5); https://queue.acm.org/detail.cfm?id=3301760.
4. Scrum@Scale; https://www.scrumatscale.com.
5. Sutherland, J., Harrison, N., Riddle, J. 2014. Teams that finish early accelerate faster: a pattern language for high-performing Scrum teams. In Proceedings of the 47th Hawaii International Conference on System Sciences; https://www.scruminc.com/scrum-papers/teamsthatfinishearlyacceleratefaster/.
6. Sutherland, J., Schwaber, K. 2018. Scrum Guide; https://scrumguides.org.

Jeff Sutherland, the inventor and co-creator of Scrum, has worked with thousands of companies deploying Scrum and recently launched two global trainer programs, for Licensed Scrum Trainer and Certified Scrum@Scale Trainer, in addition to creating two independent companies, Scrum Inc Japan and Scrum@Scale LLC. He has been VP of engineering and CTO or CEO of 11 software companies. In the first four companies he prototyped Scrum, and in the fifth he created Scrum as it is now used in 74 percent of agile software companies in more than 100 countries. In 2006 Sutherland established his own company, Scrum Inc., now recognized as the premier source of Scrum training in the world. Ivar Jacobson is the creator of Use Cases and the Unified Process, a widely adopted method. He is also one of the three


original developers of the Unified Modeling Language. His current company, Ivar Jacobson International, is focused on using methods and tools in a smart, superlight, and agile way. This work resulted in the creation of a worldwide network, SEMAT (Software Engineering Method and Theory), which has the mission of revolutionizing software development based on a kernel of software engineering. The kernel has been realized as a formal OMG (Object Management Group) standard called Essence. Brian Kerr is a principal consultant and one of the founders of Ivar Jacobson International in the UK. He is an experienced agile coach, consultant, trainer, and change agent specializing in the adoption of key software development practices in a pragmatic and sustainable way. He works at all levels, with individuals, teams, teams of teams, and programs, coaching senior management and providing strategic support to organizational improvement programs. He has been involved in the development of the Essence standard since the beginning, and in particular formed the early ideas of making the process tangible and representing it on cards to enable serious games. He is currently working on Essence tooling. Copyright © 2020 held by owner/author. Publication rights licensed to ACM.
