
Introduction to Architecture

Lecture 1 Overview: Architecture and Instructions

Wayne Luk Department of Computing Imperial College London

https://www.doc.ic.ac.uk/~wl/teachlocal/arch/

[email protected] wl 2021 1.1

Welcome to this module on Introduction to Architecture. The slides are supported by notes embedded with the presentations. The teaching materials are available on the course homepage: https://www.doc.ic.ac.uk/~wl/teachlocal/arch/

They are also available from the Department of Computing’s “materials” page: https://materials.doc.ic.ac.uk/manage/2021/40005

The lectures are delivered by live streaming, and recorded versions will be available in Teams soon after each lecture.

TPU:

Systolic Array

Source: N.P. Jouppi et al.

Let’s have a look at one of the latest processors, the TPU developed by Google to support deep neural networks for applications involving artificial intelligence. This processor, like many others, contains four kinds of resources: resources for computation, for data buffering (using memory to store data), for control, and for input and output (I/O). Interestingly, the core computational element of the TPU is a systolic array, enabling effective matrix multiplication. If you wish to find out what systolic arrays are and how to design them, come to my Custom Computing course in two years’ time…

What is a computer?

• what makes it general purpose?
• simplest description?

[Diagram: CPU connected to Main Memory]

So what exactly is a computer? What makes it general purpose so that it can be so useful? What is the simplest way we can describe a computer? One cannot get much simpler than two connected blocks, so let’s start from here…

What is a computer?

• what makes it general purpose?
• simplest description?

[Diagram: CPU (Registers, Control Unit, Arithmetic & Logic Unit (ALU)) connected to Main Memory (holding P)]

We can begin to fill in the details. The block on the left is often known as the Central Processing Unit, CPU. The CPU contains registers as fast memory for storing data, an ALU for computation, and a Control Unit for managing the operations. The block on the right is the memory, containing programs P and data D. Storing programs in memory allows the CPU to become general-purpose: by supplying different programs to the CPU, it can perform different computations.

What is a computer?

• what makes it general purpose?
• simplest description?

[Diagram: CPU (Registers, Control Unit, Arithmetic & Logic Unit (ALU)) connected to Main Memory (RAM) and to Input/Output Controllers]

Input/Output devices: Hard Disk; Mouse, Keyboard; Ethernet; Cameras, Sound; USB/DVD Drive; Monitor, Printer; Modem
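The stored-program idea in the notes above can be illustrated with a small sketch. It is only a toy model under stated assumptions: the accumulator machine, its opcode names (LOAD, ADD, SUB, STORE, HALT) and the memory layout are hypothetical and are not MIPS or anything defined in this lecture. It suggests how the same CPU loop performs different computations when a different program P is placed in memory.

```python
# Toy stored-program machine (hypothetical, not MIPS).
# Memory holds both the program P (addresses 0..) and the data D (addresses 10..12).

def run(memory):
    """Fetch, decode and execute instructions from `memory` until HALT."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # a single register (accumulator)
    while True:
        opcode, operand = memory[pc]      # fetch the instruction
        pc += 1
        if opcode == "LOAD":              # acc = M[operand]
            acc = memory[operand]
        elif opcode == "ADD":             # acc = acc + M[operand]
            acc += memory[operand]
        elif opcode == "SUB":             # acc = acc - M[operand]
            acc -= memory[operand]
        elif opcode == "STORE":           # M[operand] = acc
            memory[operand] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


def make_memory(program, a, b):
    """Build a 16-cell memory with the program at address 0 and data at 10 and 11."""
    memory = [None] * 16
    memory[:len(program)] = program       # program P
    memory[10], memory[11] = a, b         # data D
    return memory


ADD_PROGRAM = [("LOAD", 10), ("ADD", 11), ("STORE", 12), ("HALT", 0)]
SUB_PROGRAM = [("LOAD", 10), ("SUB", 11), ("STORE", 12), ("HALT", 0)]

print(run(make_memory(ADD_PROGRAM, 7, 5))[12])   # 12
print(run(make_memory(SUB_PROGRAM, 7, 5))[12])   # 2
```

The `run` function never changes; only the contents of memory do, which is the sense in which the machine is general purpose.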

Let’s add the remaining resources: those for input/output. Most computers follow this architecture, which is sometimes called the von Neumann architecture, after one of the computing pioneers, John von Neumann. But is it the most efficient? Why does the TPU adopt a systolic array rather than an ALU in the CPU? We will address these questions later…

Why study architecture? The Good..

• understand:
  – how computers work; bridge hardware/software gap
  – choices and constraints for computer engineers/architects
  – how to manage complexity of architecture description/design
• undergoing rapid development
  – applications: internet, medical imaging, cloud computing
  – billions of gates: build your own reconfigurable processor
    customisable parallelism: trading speed, power, accuracy...

[Callout: new super-computer!]

It would be unimaginable if computer scientists did not understand how computers work. And computer architecture is so exciting! It provides the foundation of modern civilization, enabling so many things that we now take for granted, from mobile phones to the internet. And architectures are evolving rapidly; there are now opportunities to, for example, build your own processor which can be reconfigured to support the best trade-off in speed, power consumption, etc. for particular applications. This capability comes from the technological advances allowing billions of gates, used in implementing computation, and a large number of storage units, such as flip-flops, used in implementing memory, to be placed on a single chip.

Summary

But this was not the case many years ago. If you go to the Science Museum, you can find the gates and flip-flops in the PDP-8 computer, which was released in 1965. They are the size of a medium-sized laptop…

Why study architecture? The Good..

• understand:
  – how computers work; bridge hardware/software gap
  – choices and constraints for computer engineers/architects
  – how to manage complexity of architecture description/design
• undergoing rapid development
  – applications: internet, medical imaging, cloud computing
  – 5 billion gates: build your own reconfigurable processor
    customisable parallelism: trading speed, power, accuracy...
  – technology: non-graphics programs on graphics processor
• it has impact on almost all aspects of computing and engineering, on both theory and practice
• example: accelerator architectures for data centres

[Callout: new super-computer!]

So advances in computer architecture have a significant impact on many aspects of our society. We will have a quick look at data centre acceleration.

Accelerate clouds: Microsoft + Amazon

www..org/news/microsoft-goes-all-in-for-fpgas-to-build-out-cloud-based-ai/

aws.amazon.com/ec2/instance-types/f1/

Many cloud computing providers, including Microsoft and Amazon, are providing resources for accelerating data centre computing for demanding applications, such as those involving artificial intelligence. Such resources are often based on FPGA (field programmable gate array) technology, which we shall discuss further later. Google’s data centres are increasingly accelerated by the TPU, which was covered at the beginning of this lecture.

…the Bad and the Ugly

[Figure: datapath and control of a simple processor. Labelled elements: PC, Instruction memory, Registers, ALU, ALU control, Data memory, Sign extend (16 to 32 bits), Shift left 2, two adders (Add), multiplexers (Mux). Control signals: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite. Instruction fields: [31–26], [25–21], [20–16], [15–11], [15–0], [5–0].]

…and this is not the most complex!

This shows the architecture of a simple processor. While it looks complex, we will learn the secrets of how it works. One technique is, at each moment, to focus on the elements that are active and to ignore those which are inactive. We will show how this technique helps us manage the complexity of understanding this architecture…

Hints for success

• come to lectures
• come to tutorials and ask questions
• read notes and course textbook
• attempt unassessed coursework without reading the solutions
• discuss the material regularly with friends
• explain ideas to non-specialists
• follow latest industrial and research development

How can you get the most out of this module? Here are some hints. If you follow some of them, you’ll have a good chance of gaining a good grasp of the key material, and a good reward from the assessments…

Module Plan: Part 1

Week (starting) | Sessions (Tuesday, Friday) | Remarks
2 (18/1) | Lecture 1, Ex 1, Lecture 2 | Module overview; instructions; performance; evaluation
3 (25/1) | Lecture 3, Ex 2, Lecture 4 | Instruction format; architecture comparison; arithmetic; ALU
4 (01/2) | Lecture 5, Ex 3, Lecture 6 | Multiply and divide implementation; and control
5 (08/2) | Lecture 7, Ex 4, Lecture 8 | Single- and multi-cycle datapath; microprogram;
6 (15/2) | Lecture 9, Part 2 | Exceptions; pipelining; recent advances; summary

This module has two parts. I will cover Part 1 for the first 4.5 weeks, introducing the relevant topics based on the MIPS processor. Dr Maria Valera-Espina will cover Part 2 for the remaining weeks.

Approach

• learn Computer Organisation and Design based on
  Patterson & Hennessy, 5th edition, Morgan Kaufmann 2014 (P&H)
  - chapters 1 to 4, appendices B and D

[Callout: must have access to]

This course is based on a well-known textbook, “Computer Organisation and Design”, 5th edition. You are expected to have access to this book. We will cover chapters 1 to 4, appendices B and D.

ISCA, May 2002

The textbook is written by two renowned computer architects, Professor John Hennessy (left) of Stanford University and Professor David Patterson (right) of UC Berkeley. I took this photo of them during a boat trip in Alaska during the International Symposium on Computer Architecture in 2002, where I gave a tutorial; see https://iscaconf.org/isca2002/Tutorials.html

Approach

• learn Computer Organisation and Design based on
  Patterson & Hennessy, 5th edition, Morgan Kaufmann 2014 (P&H)
  - chapters 1 to 4, appendices B and D
• the 6th edition is ready, use it for reference
• compare different architectures e.g. MIPS and 68000
  Tanenbaum: Structured Computer Organization;
  H & P: Computer Architecture: A Quantitative Approach

[Callout: must have access to]

The 6th edition is just becoming available; use it for reference if you have a copy. There are other useful books on computer architecture, such as “Structured Computer Organization” and “Computer Architecture: A Quantitative Approach”. If you do not have time, just focus on relevant chapters in “Computer Organisation and Design” 5th edition.

What is computer architecture?

• architecture = instruction set architecture (ISA) + machine organisation
• ISA examples: x86, ARM, MIPS, SPARC, RISC-V
• instruction set: how abstract? how complex? support for general / special-purpose computing? Compatibility?
• how to choose an implementation for a given instruction set?

Computer architecture has two parts: instruction set architecture (ISA) and machine organisation that implements a given ISA. So how can we design an ISA? And how to choose an implementation for an ISA?

ISA: between software and hardware

Software:
  Application (eg: browser)
  Operating System (Win, )
  Assembler
Instruction Set Architecture
Hardware:
  Processor, Memory, I/O system
  Datapath & Control
  Digital Design
  Circuit Design
  transistors

Source: Garcia

ISA provides an abstraction of hardware resources to software

The ISA is an abstraction of the hardware resources of a given machine. It forms the lowest level of software, on which higher levels of software can be built. Note that one ISA can be implemented in many ways. For example, both Intel and AMD support the x86 ISA.

Design approaches

• Complex Instruction Set Computers, CISC
  – dense code, simple compiler
  – powerful instruction set, variable format

• Reduced Instruction Set Computers, RISC
  – simple instructions, fixed format, optimising compiler
  – speed, low development cost, adapt to new technology

There are two main ISA approaches: CISC and RISC. CISC includes more complex instructions, which reduces the workload of the compiler (which translates high-level languages like C into assembly language). However, this also means that CISC instructions take more time to execute, as they are more complex. Since compilers are becoming better, RISC is becoming faster and more adaptive to new technologies. You will learn the details of their differences in the third lecture.

Most ISAs are now RISC based, except the x86 ISA. Part 1 of the module will cover RISC, while Part 2 will cover CISC.

Instructions: Overview (P&H: p.62-120)

• instruction = opcode (what it does) + operand (register / memory / data)
• MIPS instructions: 3 main types: R, I, J
• design principles for RISCs: good performance + easy to implement
• use MIPS processor to illustrate ideas in this module

An instruction for a processor usually has two parts: opcode and operand. The opcode specifies its function, while the operand specifies the information or data needed to carry out that function. For example, a well-known processor called MIPS has 3 main instruction types, which will be detailed later. We will use MIPS to explain ideas in computer architecture.

Where are MIPS processors?

MIPS processors are often used for embedded processing, as shown in the above examples.

MIPS architecture

• representative of modern RISC architectures
• 32 registers $0..$31, 32 bits each
• $0 wired to 0, the others general-purpose
• register-register or load-store architecture
  – most instructions involve registers only: fast
      add $1, $2, $3          # reg1 = reg2 + reg3      [Callout: comment]
  – special memory access instructions: possibly multicycle
      lw $8, Astart($19)      # reg8 = M[Astart + reg19]
• goal: minimise memory access; why?

MIPS has a simple architecture, with 32 registers, each of 32 bits. Most instructions related to computation involve only registers and not memory, since registers are much faster than memory. There are special instructions for memory access which are relatively slow; so once data are brought into the CPU, they should stay there as long as possible before being sent back to memory.
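The register-register (load-store) style can be sketched with a small model. This is a simplified illustration, not the lecture's material: the class name `Mips`, the dictionary used as memory and the base address `ASTART = 100` are assumptions, `sw` (store word) is a real MIPS instruction that the slide does not show, and arithmetic simply wraps to 32 bits instead of modelling overflow exceptions.

```python
class Mips:
    """Minimal model: 32 registers of 32 bits and a memory of 32-bit words."""

    MASK = 0xFFFFFFFF

    def __init__(self):
        self.reg = [0] * 32
        self.mem = {}              # byte address -> 32-bit word (assumed aligned)

    def write_reg(self, n, value):
        if n != 0:                 # $0 is wired to 0: writes to it are ignored
            self.reg[n] = value & self.MASK

    def add(self, rd, rs, rt):     # add $rd, $rs, $rt   (registers only)
        self.write_reg(rd, self.reg[rs] + self.reg[rt])

    def lw(self, rt, offset, rs):  # lw $rt, offset($rs) (memory -> register)
        self.write_reg(rt, self.mem.get(self.reg[rs] + offset, 0))

    def sw(self, rt, offset, rs):  # sw $rt, offset($rs) (register -> memory)
        self.mem[self.reg[rs] + offset] = self.reg[rt]


ASTART = 100                       # hypothetical base address of an array
m = Mips()
m.mem[ASTART + 8] = 42
m.write_reg(19, 8)

m.lw(8, ASTART, 19)                # reg8 = M[Astart + reg19]: one memory access
m.add(1, 8, 8)                     # reg1 = reg8 + reg8: registers only
m.add(0, 8, 8)                     # attempt to write $0: ignored
m.sw(1, ASTART, 19)                # result goes back to memory only at the end

print(m.reg[8], m.reg[1], m.reg[0], m.mem[ASTART + 8])   # 42 84 0 84
```

Only `lw` and `sw` touch `mem`; everything else works on `reg`, which is why keeping data in registers between a load and a store reduces the number of slow memory accesses.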

MIPS instructions: R-type

• 3 types, fixed size: 32 bits
    R-type (register)
    I-type (immediate)
    J-type (jump)

• R-type: arithmetic, comparison, logical, …
    add $8, $17, $18        # reg8 = reg17 + reg18

    6 bits   5 bits     5 bits     5 bits   5 bits   6 bits
    0        17         18         8        0        32
    opcode   source 1   source 2   dest.    shift    function

• MIPS format: usually destination comes first

The three types of MIPS instructions are R, I and J. The R-type covers instructions for computation, including arithmetic and logical operations.
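As a rough illustration of the R-type layout, the six fields can be packed into a single 32-bit word. The function name `encode_r_type` and its parameter names are assumptions made for this sketch; the field widths and the values for `add $8, $17, $18` are those given on the slide.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack the R-type fields (6 + 5 + 5 + 5 + 5 + 6 bits) into one 32-bit word."""
    fields = [(opcode, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        word = (word << width) | value   # append this field to the right
    return word


# add $8, $17, $18: opcode 0, source 1 = 17, source 2 = 18, dest. = 8,
# shift = 0, function = 32
word = encode_r_type(rs=17, rt=18, rd=8, shamt=0, funct=32)
print(f"{word:032b}")
print(f"0x{word:08X}")   # 0x02324020
```

Note that the destination comes first in the assembly text but third among the register fields of the encoded word.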

MIPS instructions: I-type

• immediate (I-type):
    memory access
    conditional branches
    arithmetic involving constants
• memory access:
    lw $8, Astart($19)      # reg8 = M[Astart + reg19]

    6 bits   5 bits   5 bits   16 bits
    35       19       8        Astart
    opcode   source   dest.    immediate constant
• arithmetic:
    addi $1, $2, 100        # reg1 = reg2 + 100

The I-type instructions cover memory access and conditional branching. They can also be used for arithmetic when, for example, a constant value is involved.
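The I-type layout can be sketched in the same way. The function name `encode_i_type` and the value `ASTART = 1200` are assumptions for illustration only (the slide leaves Astart symbolic); opcode 35 for `lw` is from the slide, and opcode 8 for `addi` is the standard MIPS value.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack opcode (6 bits), rs (5), rt (5) and a signed 16-bit immediate."""
    for value, width in ((opcode, 6), (rs, 5), (rt, 5)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
    if not -(1 << 15) <= immediate < (1 << 15):
        raise ValueError("immediate does not fit in 16 signed bits")
    # Negative immediates are stored in two's complement in the low 16 bits.
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


LW, ADDI = 35, 8
ASTART = 1200    # hypothetical value for the symbolic constant Astart

print(hex(encode_i_type(LW, rs=19, rt=8, immediate=ASTART)))   # 0x8e6804b0
print(hex(encode_i_type(ADDI, rs=2, rt=1, immediate=100)))     # 0x20410064
```

Because the constant shares the word with the opcode and two register numbers, it is limited to 16 bits.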

MIPS instructions: J-type

• jump (J-type): unconditional jump to instruction in memory
    j 1236                  # jump to instruction at address 1236

    6 bits   26 bits
    2        1236
    opcode

  memory[0], [4], … [4,294,967,292]; byte wide
• jal: jump and link        # save address of next instruction
                            # in register before jumping
• “jump” instructions can be I-type or R-type
    I-type: bne $19,$20,Label   # if reg19 ≠ reg20 goto Label
    R-type: jr $ra              # jump to address in register ra

J-type instructions cover unconditional jumps to instructions in memory. The jal instruction saves the address of the next instruction, which enables returning to the instruction right after the jal instruction. However, note that “jump” instructions can be I-type or R-type.
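The interplay of jal and jr can be traced with a small sketch. It rests on simplifying assumptions: the `run` function, the tuple representation of instructions and the `nop` placeholder are hypothetical, the return address is taken as PC + 4, and hardware details such as branch delay slots are ignored.

```python
RA = 31   # $ra is register 31 in MIPS


def run(program, start=0, max_steps=100):
    """Execute {byte address: instruction tuple}; return the addresses visited."""
    reg = [0] * 32
    pc = start
    trace = []
    for _ in range(max_steps):
        if pc not in program:        # stop when control leaves the program
            break
        trace.append(pc)
        op, *args = program[pc]
        next_pc = pc + 4             # default: the next instruction in memory
        if op == "j":                # unconditional jump
            next_pc = args[0]
        elif op == "jal":            # jump and link: save the return address
            reg[RA] = pc + 4
            next_pc = args[0]
        elif op == "jr":             # jump to the address held in a register
            next_pc = reg[args[0]]
        elif op != "nop":
            raise ValueError(f"unknown instruction {op!r}")
        pc = next_pc
    return trace


program = {
    0:    ("jal", 1236),   # call the routine at address 1236
    4:    ("nop",),        # execution resumes here after the routine
    1236: ("nop",),        # body of the routine
    1240: ("jr", RA),      # return to the saved address
}
print(run(program))        # [0, 1236, 1240, 4]
```

The trace shows control leaving address 0, running the routine, and coming back to address 4, the instruction right after the jal.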

Example

• if (i == j) f = g+h; else f = g-h;
• allocate reg16 = f, reg17 = g, reg18 = h, reg19 = i, reg20 = j

•       bne $19, $20, Else    # if i ≠ j goto Else
        add $16, $17, $18     # f = g+h (if i = j)
        j Exit                # goto Exit
  Else: sub $16, $17, $18     # f = g-h (if i ≠ j)
  Exit:
• while-loop: similar

This example shows how variables are allocated to registers, and how an if-then-else statement can then be implemented using a conditional branch (bne) and an unconditional jump (j).
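To see which instructions execute on each path, the four-instruction sequence from the example can be stepped through in a small sketch. The function name `run_if_else` and the use of list indices in place of byte addresses for the labels Else and Exit are assumptions of this illustration.

```python
def run_if_else(g, h, i, j):
    """Step through the bne/add/j/sub sequence and return f (register 16)."""
    reg = [0] * 32
    reg[17], reg[18], reg[19], reg[20] = g, h, i, j   # register allocation

    program = [
        ("bne", 19, 20, "Else"),   # if i != j goto Else
        ("add", 16, 17, 18),       # f = g + h
        ("j", "Exit"),             # goto Exit
        ("sub", 16, 17, 18),       # Else: f = g - h
    ]
    labels = {"Else": 3, "Exit": 4}   # instruction indices, not byte addresses

    pc = 0
    while pc < len(program):
        op, *a = program[pc]
        pc += 1                        # default: fall through
        if op == "bne":
            if reg[a[0]] != reg[a[1]]:
                pc = labels[a[2]]
        elif op == "add":
            reg[a[0]] = reg[a[1]] + reg[a[2]]
        elif op == "sub":
            reg[a[0]] = reg[a[1]] - reg[a[2]]
        elif op == "j":
            pc = labels[a[0]]
    return reg[16]


print(run_if_else(g=5, h=3, i=1, j=1))   # 8  (i = j, so f = g + h)
print(run_if_else(g=5, h=3, i=1, j=2))   # 2  (i ≠ j, so f = g - h)
```

The branch tests the opposite condition (i ≠ j) so that the "then" part can follow it directly, and the `j Exit` keeps the "then" path from falling into the "else" part.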

Remarks

• only 2 conditional branches, bne and beq
• need slt (set on less than)
    slt $1, $16, $17        # if reg16 < reg17 then reg1 = 1
                            # else reg1 = 0
• implement branch to L on reg16 < reg17 as
    slt $1, $16, $17        # …
    bne $1, $0, L           # if reg1 ≠ 0 then goto L (reg0 always 0)
• load constant hex 000A000B to register 5, use load upper/lower immediate (lui/lli)
    lui $5, 10              # reg5 = 000A0000
    addi $5, $5, 11         # reg5 = reg5 + 000B

MIPS has relatively few instructions, so some tasks need two instructions rather than one. For example, it has only two conditional branch instructions, bne and beq, so a branch on “less than” needs two instructions: slt to compute the comparison, followed by bne to test its result. As another example, initialising a register with a 32-bit constant needs two instructions, lui and addi. We shall continue our discussions about the MIPS processor in the next few lectures.
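Both two-instruction idioms can be checked with a few lines of arithmetic. This is a sketch under assumptions: the helper names are hypothetical, registers are modelled as plain integers, and `addi` is modelled with the sign extension of its 16-bit immediate that MIPS performs.

```python
MASK32 = 0xFFFFFFFF


def slt(a, b):
    """Set on less than: 1 if a < b, else 0."""
    return 1 if a < b else 0


def branch_if_less(reg16, reg17):
    """Mirror 'slt $1,$16,$17' then 'bne $1,$0,L'; True means the branch is taken."""
    reg1 = slt(reg16, reg17)
    return reg1 != 0               # $0 always reads as 0


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def lui(imm):
    """Load upper immediate: imm goes into the upper 16 bits, lower bits are 0."""
    return (imm << 16) & MASK32


def addi(reg, imm):
    """Add a sign-extended 16-bit immediate, wrapping to 32 bits."""
    return (reg + sign_extend16(imm)) & MASK32


reg5 = lui(10)                     # reg5 = 000A0000
reg5 = addi(reg5, 11)              # reg5 = reg5 + 000B
print(f"{reg5:08X}")               # 000A000B
print(branch_if_less(3, 7), branch_if_less(7, 3))   # True False
```

Because addi sign-extends its immediate, this pairing only gives the intended constant when the lower half is below hex 8000; for larger lower halves, MIPS code typically uses ori (which zero-extends) in place of addi.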
