Apple M1 Foreshadows Rise of RISC-V
The M1 is the beginning of a paradigm shift, which will benefit RISC-V microprocessors, but not in the way you think.

Erik Engheim · Dec 2020 · 15 min read

By now it is pretty clear that Apple's M1 chip is a big deal, and the implications for the rest of the industry are gradually becoming clearer. In this story I want to talk about a connection to RISC-V microprocessors which may not be obvious to most readers.

Let me give you some background first: Why Is Apple's M1 Chip So Fast?

In that story I talked about two factors driving M1 performance. One was the use of a massive number of decoders and Out-of-Order Execution (OoOE). Don't worry if that sounds like technological gobbledegook to you. This story will be all about the other part: heterogeneous computing. Apple has aggressively pursued a strategy of adding specialized hardware units, which I will refer to as coprocessors throughout this article:

- GPU (Graphics Processing Unit) for graphics and many other tasks with a lot of data parallelism (doing the same operation on many elements at the same time).
- Neural Engine: specialized hardware for doing machine learning.
- Digital signal processing hardware for image processing.
- Video encoding in hardware.

Instead of adding a lot more general purpose processors, Apple has started adding a lot more coprocessors. You could also use the term accelerator. This isn't an entirely new trend: my good old Amiga 1000 from 1985 had coprocessors to speed up audio and graphics. Modern GPUs are essentially coprocessors. Google's Tensor Processing Units are a form of coprocessor used for machine learning.

Google TPUs are Application Specific Integrated Circuits (ASICs). I will refer to them as coprocessors.

What is a Coprocessor?

Unlike a CPU, a coprocessor cannot live alone. You cannot make a computer by just sticking a coprocessor into it. Coprocessors are special purpose processors which do a particular task really well. One of the earliest examples of a coprocessor was the Intel 8087 floating point unit (FPU). The humble Intel 8086 microprocessor could perform integer arithmetic but not floating point arithmetic. What is the difference?

Intel 8087. One of the early coprocessors used for performing floating point calculations.

Integers are whole numbers like 43, -5, 92 and 4. These are fairly easy for computers to work with. You could probably wire together a solution to add integer numbers with some simple chips yourself.

The problem starts when you want decimals. Say you want to add or multiply numbers such as 4.25, 84.7 or 3.1415. These are examples of floating point numbers. If the number of digits after the point were fixed, we would call them fixed point numbers. Money is often treated this way: you usually have two decimals after the point.
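To make the fixed point idea a bit more concrete, here is a small made up sketch in C where amounts are stored as whole cents. Nothing here is specific to any real system; it just shows that fixed point arithmetic is ordinary integer arithmetic:

    #include <stdio.h>

    /* Made up sketch: money stored as whole cents, so there are always
       exactly two digits after the point and plain integer arithmetic
       is all we need. */
    typedef long cents;

    int main(void) {
        cents price = 425;          /* 4.25 */
        cents tax   = 34;           /* 0.34 */
        cents total = price + tax;  /* integer addition, no FPU needed */

        printf("Total: %ld.%02ld\n", total / 100, total % 100);
        return 0;
    }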
You can however emulate floating point arithmetic with integers, it is just slower. That is akin to how early microprocessors could not multiply integers either. They could only add and subtract. However one could still perform multiplication: you just had to emulate it with multiple additions. For instance 3 × 4 is simply 4 + 4 + 4.

It is not important to understand the code example below, but it may help you understand how multiplication can be performed by a CPU using only addition, subtraction and branching (jumping in code).

    loadi r3, 0          ; Load 0 into register r3
    multiply:
        add r3, r1       ; r3 ← r3 + r1
        dec r2           ; r2 ← r2 - 1
        bgt r2, multiply ; goto multiply if r2 > 0

If you do want to understand microprocessors and assembly code, read my beginner friendly intro: How Does a Modern Microprocessor Work?

In short, you can always achieve more complex math operations by repeating simpler ones. What all coprocessors do is similar to this. There is always a way for the CPU to do the same task as the coprocessor, but it will usually require the repetition of multiple simpler operations. The reason we got GPUs early on was that repeating the same calculations on millions of polygons or pixels was really time consuming for a CPU.

How Data is Transmitted to and from Coprocessors

Let us look at the diagram below to get a better sense of how a coprocessor works together with the microprocessor (CPU), or general purpose processor, if you will.

Overview of how a microprocessor works. Numbers move along colored lines. Input/Output can be coprocessors, mouse, keyboard and other devices.

We can think of the green and light blue buses as pipes. Numbers are pushed through these pipes to reach different functional units of the CPU (drawn as gray boxes). The inputs and outputs of these boxes are connected to the pipes. You can think of the inputs and outputs of each box as having valves. The red control lines are used to open and close these valves. Thus the Decoder, in charge of the red lines, can open the valves on two gray boxes to make numbers flow between them.

You can think of data buses as pipes with valves opened and closed by the red control lines. In electronics, however, this is done with what we call multiplexers, not actual valves.

This lets us explain how data is fetched from memory. To perform operations on numbers we need them in registers. The Decoder uses the control lines to open the valves between the gray Memory box and the Registers box. This is specifically how it happens (a small code sketch follows the list):

1. The Decoder opens a valve on the Load Store Unit (LSU), which causes a memory address to flow out on the green address bus.
2. Another valve is opened on the Memory box, so it can receive the address. It gets delivered by the green pipe (address bus). All other valves are closed, so e.g. Input/Output cannot receive the address.
3. The memory cell with the given address is selected. Its content flows out onto the blue data bus, because the Decoder has opened the valve to the data bus.
4. The data in the memory cell could flow anywhere, but the Decoder has only opened the input valve to the Registers.
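The plumbing metaphor can also be written down as code. The following is only a made up toy model in C, not how any real CPU is implemented, but it mirrors the four steps above:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy model of the load sequence. The arrays stand in for the gray
       Memory and Registers boxes, the local variables for the green
       address bus and blue data bus. Sizes and names are made up. */
    #define MEM_SIZE 256
    #define NUM_REGS 8

    static uint8_t memory[MEM_SIZE];
    static uint8_t registers[NUM_REGS];

    static void load(int reg, uint8_t address) {
        uint8_t address_bus = address;             /* steps 1-2: address goes out on the address bus    */
        uint8_t data_bus    = memory[address_bus]; /* step 3: the selected cell flows onto the data bus */
        registers[reg]      = data_bus;            /* step 4: only the Registers valve is open          */
    }

    int main(void) {
        memory[84] = 42;   /* pretend this memory cell holds something useful */
        load(1, 84);       /* roughly: load r1, 84 */
        printf("r1 = %d\n", registers[1]);
        return 0;
    }

In a real CPU there are of course no function calls: the Decoder's control lines select which unit drives each bus every clock cycle, which is what the multiplexers do.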
Things like the mouse, keyboard, the screen, the GPU, the FPU, the Neural Engine and other coprocessors correspond to the Input/Output box. We access them just like memory locations. Hard drives, mice, keyboards, network cards, GPUs, DMA (direct memory access) controllers and coprocessors all have memory addresses mapped to them.

Hardware is accessed just like memory locations, by specifying addresses.

What exactly do I mean by that? Well, let me just make up some addresses. If your processor attempts to read from memory address 84, that may mean the x-coordinate of your computer mouse, while say 85 means the y-coordinate. So to get the mouse coordinates you would do something like this in assembly code:

    load r1, 84    ; get x-coordinate
    load r2, 85    ; get y-coordinate

For a DMA controller there might be addresses 110, 111 and 113 which have special meaning. Here is an unrealistic, made up assembly code program using them to interact with the DMA controller:

    loadi r1, 1024   ; set register r1 to source address
    loadi r2, 50     ; bytes to copy
    loadi r3, 2048   ; destination address
    store r1, 110    ; tell DMA controller the start address
    store r2, 111    ; tell DMA to copy 50 bytes
    store r3, 113    ; tell DMA where to copy the 50 bytes to

Everything works in this manner. You read and write to special memory addresses. Of course a regular software developer never sees this; it is done by device drivers. The programs you use only see virtual memory addresses, where all of this is invisible, but the drivers will have these hardware addresses mapped into their virtual address space.

I am not going to say too much about virtual memory. Essentially the hardware has real, physical addresses, and the addresses on the green address bus get translated from virtual addresses to physical ones.
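Seen from a device driver written in C rather than assembly, the same made up addresses could be used roughly like this. This is only a sketch with invented names; real register addresses come from the hardware documentation and are mapped into the driver's virtual address space by the operating system:

    #include <stdint.h>

    /* Sketch of memory mapped I/O as a device driver might see it. The
       addresses are the same made up ones as in the assembly above (84, 85
       for the mouse; 110, 111, 113 for the DMA controller). On real hardware
       the registers would be wider, word aligned and documented by the chip
       maker. The volatile qualifier tells the compiler that every read and
       write must actually reach the device. */
    #define MOUSE_X   ((volatile uint32_t *)0x54)  /*  84 */
    #define MOUSE_Y   ((volatile uint32_t *)0x55)  /*  85 */
    #define DMA_SRC   ((volatile uint32_t *)0x6E)  /* 110 */
    #define DMA_COUNT ((volatile uint32_t *)0x6F)  /* 111 */
    #define DMA_DST   ((volatile uint32_t *)0x71)  /* 113 */

    /* Read the mouse position, just like "load r1, 84". */
    uint32_t read_mouse_x(void) { return *MOUSE_X; }
    uint32_t read_mouse_y(void) { return *MOUSE_Y; }

    /* Program the DMA controller, just like the three stores above. */
    void dma_copy(uint32_t src, uint32_t dst, uint32_t nbytes) {
        *DMA_SRC   = src;     /* start address          */
        *DMA_COUNT = nbytes;  /* how many bytes to copy */
        *DMA_DST   = dst;     /* where to copy them to  */
    }

Calling dma_copy(1024, 2048, 50) would then correspond to the six assembly instructions above.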