PowerPC Core Tutorial Introduction

PURPOSE:
- To explain the features and functions of the PowerPC Core.

OBJECTIVES:
- Identify the main components of the PowerPC Core.
- Describe the PowerPC Core programming model.
- Identify the different PowerPC Core instruction types.
- Describe the PowerPC conditional branching logic.
- Describe the PowerPC addressing capabilities.
- Identify the features and functions of the PowerPC memory management unit.
- Describe PowerPC cache operations.
- Describe PowerPC exception handling.

CONTENTS:
- 56 pages
- 10 questions

LEARNING TIME:
- 115 minutes

PREREQUISITE:
- NetComm Roadmap tutorial

Welcome to this tutorial on the PowerPC Core. This tutorial describes the features and functions of the PowerPC Core, which is the central control for all MPC860 functions.

Upon completion of this tutorial, you’ll be able to identify the main components of the PowerPC Core and describe its programming model. You’ll be able to identify the different instruction types, including conditional branching. You’ll also be able to describe the important features of the PowerPC, including addressing modes, memory management, cache operations, and exception handling.

It’s assumed that you have a basic understanding of the MPC860 and the other NetComm products.

Page 1 MPC860 Components

[Figure: MPC860 block diagram. Embedded PowerPC core: 4K instruction cache, I-MMU, 4K data cache, and D-MMU. System Interface Unit: system functions, real-time clock, PCMCIA interface, and bus interface unit. Communications Processor Module: 32-bit RISC µController and program ROM, internal memory space, internal timers, 4 general-purpose timers, 16 serial DMAs and 2 virtual IDMAs, parallel I/O, baud rate generators, parallel interface port, SCC1-SCC4, SMC1-SMC2, SPI, I2C, serial interface, and time slot assigner, connected by the peripheral bus.]

Let’s begin this tutorial with an overview of the MPC860 communications processor. The three main functional blocks are the Embedded PowerPC, the System Interface Unit, and the Communications Processor Module.

The Embedded PowerPC (EPPC) is the main processor unit including caches and memory management units (MMUs).

The System Interface Unit (SIU) provides access to the external bus and the sub-blocks shown.

The Communications Processor Module (CPM) sends and receives data over eight different controllers, either multiplexed or non-multiplexed. A 32-bit RISC microcontroller controls the transfer of data.

This section of the NetComm training module describes the EPPC portion of the MPC8xx family of devices.

Page 2 EPPC Main Components

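[Figure: EPPC block diagram. The sequencer (next address generation, branch unit, and instruction queue) fetches through the I-cache/I-MMU interface. Source buses (4 slots/clock) feed the special registers, the GPR file (32 x 32) and its history file, the IMUL/IDIV unit, the ALU/BFU, and the load/store (LDST) address and fixed-point data queues. Results return on the write-back bus (2 slots/clock), and a control bus connects the units. The load/store queues reach the D-cache/D-MMU interface over the L-Addr and L-Data buses.]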

EPPC Main Components, part 1

Next, let’s take a look at the main components of the EPPC.

The sequencer provides centralized control of instruction flow to the execution units.

The sequencer includes an instruction address generator, a branch unit, and an instruction queue. The instruction address generator determines the address of the next instruction based on information from the sequential fetcher and the branch unit. The branch unit extracts branch instructions from the fetcher and uses static branch prediction on unresolved conditional branches so that the fetcher can keep fetching instructions along the predicted path. The instruction queue holds and distributes the next instructions to be executed.

The special purpose register file contains the registers for control, status, and exception handling information.

The general purpose register file, or GPR file, contains the registers for normal integer and pointer operations. The GPR history file holds the results from instructions that have been dispatched but not yet retired.

Page 3 EPPC Main Components


EPPC Main Components, part 2

The integer multiply/divide unit executes all integer multiply and divide instructions. The arithmetic logic and bit field unit executes all other integer and bit instructions.

The EPPC includes two queues for load and store operations. The load/store address queue is a two-entry queue shared by all load/store instructions. The load/store fixed-point data queue is a two-entry, 32-bit-wide queue that holds integer data, also known as fixed-point data.

Page 4 EPPC Programming Model Overview

• EPPC computations are performed register to register.
• No stacking mechanism.
• Supervisor mode:
  - The operating system runs in this mode.
  - Entire model is available.
  - Changes for different EPPC implementations.
• User mode:
  - Doesn’t contain system functions.
  - Used for applications.
  - Same for all EPPC implementations.

In the EPPC, all computations are performed register to register. Information is saved to registers and restored from registers. There is no dedicated stack pointer or automatic stacking mechanism. These functions must be provided using software.

The EPPC programming model includes two modes, supervisor mode and user mode. Supervisor mode has the highest privilege and is the mode in which the operating system runs. In supervisor mode, the entire programming model is available to the CPU. Note that the supervisor programming model differs between EPPC implementations.

In user mode, also called problem mode, resources that have system-wide impact are not available. Applications and untested programs run in this mode. Applications can run on any EPPC implementation because all implementations share the same user programming model.

The user programming model is essentially a subset of the supervisor model. Let’s take a detailed look at these two models.

Page 5 EPPC Programming Model

[Figure: EPPC programming model.
User programming model (same for all EPPC implementations): general-purpose registers GPR0-GPR31 (bits 0-31); special-purpose registers XER, LR, CTR, TB, and TBU; the instruction pointer; and the condition register.
Supervisor programming model: the machine state register, MSR (changes for different EPPC implementations), which contains state information, exception enables, and MMU enables. Standard SPRs: DSISR, DAR, DEC, SRR0, SRR1, SPRG0-SPRG3, TB (R/W), TBU (R/W), and PVR. Additional SPRs: SPR80-82 (bit manipulation of MSR[RI] and MSR[EE]), SPR144-630 (debug and development support), SPR560-570 (I-cache and D-cache control/status), and SPR784-826 (MMU programming model).]

EPPC Programming Model, Part 1

The user programming model includes thirty-two general-purpose registers, each 32 bits wide. All of these registers operate the same way, except for some special cases involving GPR0. Another register included in the user programming model is the condition register, CR, which consists of eight 4-bit fields. We’ll discuss this register in more detail later in the tutorial.

The user programming model includes five special purpose registers. SPR1 is the integer exception register, XER, which is used for multi-precision arithmetic. This register includes bits to record overflow, carry, and summary overflow. The link register, LR, stores the return address when a subroutine is called. The count register, CTR, is commonly used as a loop counter or to hold the target address for some branch instructions. The other two special purpose registers, TB and TBU, hold the Time Base, which is part of the PowerPC architecture. The Time Base is a 64-bit value that increments periodically and can be used as a time stamp; TBU holds the upper 32 bits and TB the lower 32 bits. In user mode, you can read the Time Base through these registers but cannot write it.

Note that the Time Base registers are read-write in Supervisor mode but read-only in User mode. These registers also have different SPR numbers in the two modes.
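Because user code reads the 64-bit Time Base as two 32-bit halves, a carry from TB into TBU between the two reads can produce a torn value. A common remedy is to read TBU, then TB, then TBU again, and retry if the two upper readings differ. The following is a minimal portable C sketch of that loop. `read_tbu()` and `read_tbl()` are hypothetical stand-ins for the `mftbu` and `mftb` instructions, and the simulated counter exists only so the example runs on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulated Time Base: a 64-bit counter that advances by one
   tick every time either half is read, so the lower half can carry into
   the upper half between reads. On real hardware, read_tbu() and
   read_tbl() would be the mftbu and mftb instructions. */
static uint64_t sim_time_base = 0x00000000FFFFFFFFull;

static uint32_t read_tbu(void) {
    uint32_t upper = (uint32_t)(sim_time_base >> 32);
    sim_time_base++;
    return upper;
}

static uint32_t read_tbl(void) {
    uint32_t lower = (uint32_t)sim_time_base;
    sim_time_base++;
    return lower;
}

/* Read a consistent 64-bit Time Base value from two 32-bit halves. */
uint64_t read_time_base(void) {
    uint32_t upper, lower, upper_again;
    do {
        upper = read_tbu();
        lower = read_tbl();
        upper_again = read_tbu();
    } while (upper != upper_again); /* lower half wrapped: try again */
    return ((uint64_t)upper << 32) | lower;
}
```

Starting just below the 32-bit boundary, the first pass reads an upper half of 0 and, after the carry, a lower half of 0, which would report a time of 0. The mismatched second TBU read forces a retry, and the function returns 0x0000000100000003.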

In the supervisor programming model, the machine state register, MSR, contains information about the machine state, such as whether exceptions and the MMUs are enabled. There is also a set of standard special purpose registers. The first two standard SPRs are the data storage interrupt source register (DSISR) and the data address register (DAR). These two registers record information when certain exceptions occur, such as data access and alignment exceptions. The decrementer register, DEC, is also defined by the PowerPC architecture. Its value decrements continuously, and it can generate an interrupt when the count passes zero.

Page 6 EPPC Programming Model


EPPC Programming Model, Part 2

The next two registers, the save and restore registers (SRR0 and SRR1), are always used in exception processing. The processor’s exception handling mechanism saves the MSR and the instruction pointer in these two registers. The next four registers, SPRG0-SPRG3, are available for the operating system to use as it requires. The supervisor can access the TB and TBU registers and write a new value to the Time Base. The PVR contains the processor version and revision number.
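The save-and-restore behavior of SRR0 and SRR1 can be modeled in a few lines. This is a simplified sketch, not the hardware’s exact rules: real implementations copy only selected MSR bits into SRR1, and whether SRR0 holds the faulting instruction or the next one depends on the exception type. The `core_state` structure, the vector address, and the MSR values used in the check are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the state an exception touches. */
typedef struct {
    uint32_t ip;    /* instruction pointer (not software-visible on real hardware) */
    uint32_t msr;   /* machine state register */
    uint32_t srr0;  /* save/restore register 0: where to resume */
    uint32_t srr1;  /* save/restore register 1: machine state to restore */
} core_state;

/* Take an exception: save IP and MSR, then enter the handler. */
void take_exception(core_state *s, uint32_t vector, uint32_t handler_msr) {
    assert((vector & 3u) == 0);   /* instructions are word-aligned */
    s->srr0 = s->ip;
    s->srr1 = s->msr;             /* real hardware saves only selected bits */
    s->msr = handler_msr;
    s->ip = vector;
}

/* rfi: restore the MSR from SRR1 and resume at the address in SRR0. */
void return_from_interrupt(core_state *s) {
    s->msr = s->srr1;
    s->ip = s->srr0;
}
```

Because a second exception would overwrite SRR0 and SRR1, a handler that can be interrupted again must save them first, which is one typical use of the operating-system SPRGs.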

Note that the supervisor programming model includes additional SPRs that support MSR bit manipulation, debug and development support, Icache and Dcache control, and the MMU programming model.

Like many RISC processors, the PowerPC doesn’t have an explicit, software-accessible program counter. The mechanism that points to the next instruction is actually part of the sequencer, and can’t be directly accessed or manipulated. To distinguish this mechanism from a traditional program counter, we call this the instruction pointer.

Page 7 Question

Which mode of the EPPC programming model is the same for all PowerPC implementations?

a) User mode b) Supervisor mode

Answer: The EPPC user mode is the same for all PowerPC implementations.

Page 8 Data (Operand) Formats

[Figure: data placement in GPRn. A byte (bits 0-7) loads into bits 24-31; a halfword (bits 0-15) loads into bits 16-31; a word (bits 0-31) fills all 32 bits.]

The EPPC supports three data sizes: 8-bit bytes, 16-bit halfwords, and 32-bit words. The PowerPC architecture is derived from IBM’s POWER architecture, which uses the big endian data format. As a reminder, big endian means that multi-byte variables are stored with the most-significant byte at the lowest numerical address and the least-significant byte at the highest numerical address. In the PowerPC, this “endian-ness” extends to the bit level: bit 0 is the MSB of registers and buses. In the examples, you can see that byte and halfword data are loaded into the least-significant bits of the general purpose registers.
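The following C sketch illustrates big-endian byte order and PowerPC bit numbering on any host. The helper names are hypothetical; `load_halfword_zero` mimics the register placement described above for a zero-extending halfword load.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit word in big-endian order: most-significant byte at the
   lowest address, as the EPPC does in its default (big endian) mode. */
void store_word_be(uint8_t *mem, uint32_t value) {
    mem[0] = (uint8_t)(value >> 24);
    mem[1] = (uint8_t)(value >> 16);
    mem[2] = (uint8_t)(value >> 8);
    mem[3] = (uint8_t)value;
}

/* Load a big-endian halfword and zero-extend it into the
   least-significant 16 bits of a 32-bit register. */
uint32_t load_halfword_zero(const uint8_t *mem) {
    return ((uint32_t)mem[0] << 8) | mem[1];
}

/* Convert a PowerPC bit number (bit 0 = MSB of a 32-bit register)
   into a conventional C mask (bit 0 = LSB). */
uint32_t ppc_bit_mask(unsigned ppc_bit) {
    assert(ppc_bit < 32);
    return 1u << (31 - ppc_bit);
}
```

For example, storing 0x12345678 places 0x12 at the lowest address, and PowerPC bit 0 corresponds to the C mask 0x80000000.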

Note that the EPPC will operate in three “endian” modes: true little endian, PowerPC little endian, and big endian. The EPPC is in big endian mode when it comes out of reset, and most of the available real-time operating systems conform to the PowerPC EABI which specifies big endian operation. Therefore, our discussion will focus on the big endian operational mode.

Page 9 Instruction Formats

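Immediate format
General syntax: Instr_i rD,rA,0xXXXX
Encoding: Opcode (bits 0-5) | rD (bits 6-10) | rA (bits 11-15) | d (bits 16-31)
Examples: addi r3,r4,750 and ori r14,r5,0x100

Register format
General syntax: Instr rD,rA,rB
Encoding: Opcode (bits 0-5) | rD (bits 6-10) | rA (bits 11-15) | rB (bits 16-20) | Subopcode (bits 21-30) | 0 (bit 31)
Examples: add r3,r4,r6 and or r16,r12,r3

d = UIMM (16-bit unsigned immediate data) or SIMM (16-bit signed immediate data)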

The instruction size for all EPPC processors is a 32-bit word. Instructions are word-aligned, so the two low-order bits of an instruction address are always zero. The EPPC, like most 32-bit RISC processors, has a triadic (three-operand) register model. This means that most data manipulation instructions have two source operands and a separate destination operand. The destination operand and at least one of the source operands must be general purpose registers. The other source operand can be a general purpose register or 16-bit immediate data. The table shows the two main instruction formats.

In the first instruction format, the operation is performed with a GPR, rA, and 16-bit immediate data (sign or zero extended to 32 bits).

The second instruction format performs an operation with two GPRs, rA and rB. Both instructions place the results into a destination GPR, rD. Operations are always 32 bits and write 32 bits to rD.
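To make the two formats concrete, the sketch below packs fields into 32-bit instruction words using the bit positions from the table, where bit 0 is the MSB. The opcode values in the checks (primary opcode 14 for addi; primary opcode 31 with subopcode 266 for add) come from the PowerPC architecture opcode tables rather than from this tutorial, so treat them as an assumption to verify against the Programming Environments Manual. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Place a field so that it ends at PowerPC bit `last` (bit 0 = MSB). */
static uint32_t field(uint32_t value, unsigned last) {
    return value << (31 - last);
}

/* Immediate format: opcode (0-5), rD (6-10), rA (11-15), d (16-31). */
uint32_t encode_d_form(uint32_t opcode, uint32_t rd, uint32_t ra, uint16_t d) {
    assert(opcode < 64 && rd < 32 && ra < 32);
    return field(opcode, 5) | field(rd, 10) | field(ra, 15) | d;
}

/* Register format: opcode (0-5), rD (6-10), rA (11-15), rB (16-20),
   subopcode (21-30), and bit 31. */
uint32_t encode_x_form(uint32_t opcode, uint32_t rd, uint32_t ra, uint32_t rb,
                       uint32_t subopcode, uint32_t bit31) {
    assert(opcode < 64 && rd < 32 && ra < 32 && rb < 32);
    assert(subopcode < 1024 && bit31 < 2);
    return field(opcode, 5) | field(rd, 10) | field(ra, 15) | field(rb, 20) |
           field(subopcode, 30) | bit31;
}
```

With these values, addi r3,r4,750 encodes as 0x386402EE and add r3,r4,r6 as 0x7C643214.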

Page 10 Order of Operands and Operations

Instruction syntax    Algebraic operation
instr rD,rA,rB        rD = rA opr rB
add rD,rA,rB          rD = rA + rB
sub rD,rA,rB          rD = rA - rB
mul rD,rA,rB          rD = rA * rB
div rD,rA,rB          rD = rA ÷ rB
subf rD,rA,rB         rD = rB - rA

opr = operation (+, -, *, ÷)

The table compares the order of the register operands in a PowerPC assembly language instruction with the corresponding algebraic statement. For example, the instruction “add rD,rA,rB” means that rD receives the value in rA plus the value in rB. The mul and div rows are generic; the actual 32-bit mnemonics are mullw and divw. Note that the operand order for the last instruction differs from the others: “subf”, which stands for “subtract from”, subtracts rA from rB.
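The architecture defines subf rD,rA,rB as ¬(rA) + (rB) + 1, that is, rB plus the two’s complement of rA. The sketch below models that on 32-bit unsigned integers and shows how the simplified mnemonic sub rD,rA,rB maps onto subf with its source operands swapped. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* subf rD,rA,rB ("subtract from"): rD = ~rA + rB + 1, which equals rB - rA. */
uint32_t subf(uint32_t ra, uint32_t rb) {
    return ~ra + rb + 1u;
}

/* sub rD,rA,rB is a simplified mnemonic for subf rD,rB,rA, so rD = rA - rB. */
uint32_t sub(uint32_t ra, uint32_t rb) {
    return subf(rb, ra);
}
```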

Page 11 EPPC Instruction Set Overview

• Load and store instructions
• Processor and cache instructions
• Arithmetic and logical instructions
• Flow control, including branching logic
• Integer comparisons

The EPPC instruction set includes instructions for loading and storing information in the general purpose registers, for processor and cache control, for synchronization, for arithmetic and logical operations, for flow control, and for integer comparisons.

Let’s take a look at each instruction type beginning with the load and store instructions.

Page 12 Load and Store Instructions

[Figure: load instructions move data from memory into GPR0-GPR31; store instructions move data from the GPRs to memory.]
Load instructions: lbz lbzu lbzux lbzx lha lhau lhaux lhax lhbrx lhz lhzu lhzux lhzx lmw lswi lswx lwarx lwbrx lwz lwzu lwzux lwzx
Store instructions: stb stbu stbux stbx sth sthbrx sthu sthux sthx stmw stswi stswx stw stwbrx stwu stwux stwx stwcx.

Load instructions load information from memory into the general purpose registers. Byte loads zero-extend the data to 32 bits, and halfword loads either zero-extend (lhz) or sign-extend (lha) it, so the entire register is always updated. The store instructions store information from the general purpose registers to memory.

Some of the EPPC instructions operate on multiple general purpose registers. These are the lmw, lswi, and lswx load instructions and the stmw, stswi, and stswx store instructions. These instructions have been included to provide compatibility with the POWER architecture. They typically execute more slowly than the equivalent sequence of individual load or store instructions. Consequently, their use is not recommended, and most compilers don’t generate them.

The PowerPC Core doesn’t provide any atomic read-modify-write bus transactions to support semaphores and other synchronizing techniques for multi-processor or multi-tasking environments. The lwarx and stwcx. instructions provide the hardware support required to accomplish these tasks in place of the atomic bus transactions.
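As background, lwarx loads a word and sets a reservation on its address, and stwcx. performs the store only if the reservation is still held, reporting success in CR0[EQ]. Software wraps the pair in a retry loop. The sketch below expresses the same load, modify, conditional-store loop with C11 atomics. On PowerPC targets, compilers commonly implement `atomic_compare_exchange_weak` with a lwarx/stwcx. sequence, but that mapping is a toolchain detail and an assumption here; the function name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add `delta` to *counter and return the previous value,
   retrying until the conditional store succeeds. */
uint32_t fetch_add_with_retry(_Atomic uint32_t *counter, uint32_t delta) {
    uint32_t old = atomic_load(counter);   /* first read of the value */
    /* The compare-exchange performs the reserved load and the conditional
       store. It fails if the value changed or the reservation was lost
       (weak form), refreshing `old` with the current value. */
    while (!atomic_compare_exchange_weak(counter, &old, old + delta)) {
        /* recompute with the refreshed `old` and try again */
    }
    return old;
}
```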

Note that the optional eciwx and ecowx instructions are not implemented in the embedded EPPC Core.

Page 13 Processor and Cache Instructions

[Figure: processor and cache instructions. Data moves between GPR0-GPR31 and the MSR (mfmsr, mtmsr) or the SPRs (mfspr, mtspr, mftb).]
Cache instructions: dcbt dcbtst dcbz dcbst dcbf dcbi icbi
TLB instructions: tlbia tlbie

eieio ; I/O control: the next load or store waits until all prior loads/stores are done
isync ; waits for all prior operations to complete and flushes the instruction queue
sync  ; waits for all prior operations to complete

The processor and cache instructions are used for control and synchronization. Since the values in SPRs cannot be operated on directly, you must copy the SPR to or from a GPR. All of the cache instructions can be executed in user mode except the data cache block invalidate instruction, dcbi. The tlb instructions are used to configure the translation lookaside buffer entries in the MMUs. We’ll discuss the cache and MMU instructions later in this tutorial.

The enforce in-order execution of I/O instruction, eieio, can be inserted between load and store instructions to ensure that they occur in program order. The optional tlbsync instruction is not supported and is treated as a no-op. The isync and sync instructions are required whenever performing operations that can change processor context, such as a move to machine state register instruction that enables the MMUs. Note that the isync instruction also flushes the instruction queue, which causes the following instructions to be fetched again. Although the sync instruction doesn’t cause refetching, it does wait until all pending load and store instructions are complete before proceeding.

Page 14 Arithmetic and Logical Instructions

• Instructions are 32-bit operations.
• Instructions update all 32 bits in the GPR.

addx addcx addex addi addic addic. addmex addzex andx andcx andi. andis. cntlzw divwx divwux eqv extsbx extshx mulhwx mulhwux mulli mullwx nandx negx norx orx orcx ori oris rlwimix rlwinmx rlwnmx slwx srawx srawix srwx subfx subfcx subfex subfic subfmex subfzex xorx xori xoris

The EPPC includes many instructions for performing arithmetic and logical operations. An italicized “x” at the end of a mnemonic stands for optional suffixes that select variants of the instruction: a period records condition codes in field 0 of the condition register (CR0), and, for instructions that support it, an “o” records overflow in the XER. Instructions that end with a period, whether written explicitly (such as andi.) or through the italicized “x”, update CR0. We’ll discuss the condition register in more detail later in the tutorial.

Note that all instructions are 32-bit operations that update all 32 bits of the destination GPR.

Page 15 Flow Control Instructions

Type                    Instructions                                 Registers
Branching               bx bcx bcctrx bclrx                          Counter and Link
Compare                 cmp cmpi cmpl cmpli                          Condition Register
Integer compare bits    lt gt eq so                                  Condition Register
Condition Register      crand crandc creqv crnand crnor cror         Condition Register
                        crorc crxor mcrf mcrxr mfcr mtcrf
Return from Interrupt   rfi (privileged instruction)                 SRR0 and SRR1
System Call             sc                                           SRR0 and SRR1
Trap                    twi tw                                       GPRn, SRR0, SRR1, MSR

The table shows the branch, compare, condition register, return from interrupt, system call, and trap instructions. Note that the compare instructions also affect the condition register. The counter and link registers can contain the target address for the branch instruction. Additionally, the link register can be updated with the next sequential address in the program flow, so it will contain the return-from-subroutine address.

In addition to the instructions listed, there are predefined forms of many instructions with special mnemonics enumerated in Appendix F of the Programming Environments Manual.

Page 16 Interpreting Condition Codes

Condition Register (bits 0-31): CR0 (bits 0-3), CR1 (4-7), CR2 (8-11), CR3 (12-15), CR4 (16-19), CR5 (20-23), CR6 (24-27), CR7 (28-31). Each 4-bit field holds lt, gt, eq, and so.

Integer Computations and Comparisons

CR0 not updated   CR0 updated
add               add.
subf              subf.
neg               neg.
mulhw             mulhw.
divw              divw.
and               and.
xor               xor.

Bit   Condition   For comparisons                For computations
0     LT          (rA) < (rB), simm, or uimm     Negative
1     GT          (rA) > (rB), simm, or uimm     Positive (> 0)
2     EQ          (rA) = (rB), simm, or uimm     Zero
3     SO          copy of XER[SO]                copy of XER[SO]

simm = signed 16-bit data; uimm = unsigned 16-bit data

Next, let’s look at how condition codes are represented in the condition register. The condition register consists of eight 4-bit fields, CR0 through CR7. Each field can represent the result of a comparison or of an integer computation (add, subf, and so on).

Each condition field records four conditions: less than, greater than, equal to, and summary overflow.

Note that integer instructions do not inherently record condition codes, except for comparisons. The programmer must explicitly indicate with a period that condition codes are to be recorded. Computational instructions record condition codes only to the CR0 field.
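The rule for CR0 after a record-form instruction can be written as a small function: the 32-bit result is compared as a signed value against zero, and SO is copied from XER[SO]. This is an illustrative C sketch. The enum names are hypothetical, and the field is returned as a 4-bit value with LT as its most-significant bit, matching the PowerPC bit order.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of a 4-bit CR field; PowerPC numbers them 0-3 starting at the MSB. */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* CR0 after a record-form instruction such as add.: the 32-bit result is
   treated as signed and compared with zero; SO is a copy of XER[SO]. */
unsigned cr0_from_result(uint32_t result, unsigned xer_so) {
    assert(xer_so <= 1u);
    unsigned field;
    if (result & 0x80000000u)
        field = CR_LT;          /* negative */
    else if (result != 0)
        field = CR_GT;          /* positive */
    else
        field = CR_EQ;          /* zero */
    return field | (xer_so ? CR_SO : 0u);
}
```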

Page 17 Compare Instruction Syntax

Syntax Compare Instructions

cmp crx,size,rA,rB ; compare rA & rB algebraic (signed)

cmpl crx,size,rA,rB ; compare rA & rB logical (unsigned)

cmpi crx,size,rA,simm ; compare rA & value as signed

cmpli crx,size,rA,uimm ; compare rA & value as unsigned

simm = sign-extended to 32 bits; uimm = zero-extended to 32 bits; size = the L field, which must be 0 (32-bit comparison) on 32-bit implementations

The table shows the syntax of the different compare instructions. A compare instruction can write its result to any field of the condition register; the crx operand explicitly selects the field.
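The sketch below shows why cmp and cmpl can disagree on the same bit pattern, and where a 4-bit result lands when crx selects a field other than CR0. All names are illustrative; the field placement follows the condition register layout, with CR0 in the most-significant nibble.

```c
#include <assert.h>
#include <stdint.h>

enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* Place a 4-bit result in field crx of the 32-bit CR. CRn occupies
   PowerPC bits 4n..4n+3, so CR0 is the most-significant nibble. */
uint32_t set_cr_field(uint32_t cr, unsigned crx, unsigned bits) {
    assert(crx < 8 && bits < 16);
    unsigned shift = 28 - 4 * crx;
    return (cr & ~(0xFu << shift)) | ((uint32_t)bits << shift);
}

/* cmpl: logical (unsigned) comparison of rA and rB. */
unsigned compare_logical(uint32_t ra, uint32_t rb, unsigned xer_so) {
    unsigned bits = ra < rb ? CR_LT : ra > rb ? CR_GT : CR_EQ;
    return bits | (xer_so ? CR_SO : 0u);
}

/* cmp: algebraic (signed) comparison. Flipping the sign bit maps two's
   complement order onto unsigned order without implementation-defined casts. */
unsigned compare_signed(uint32_t ra, uint32_t rb, unsigned xer_so) {
    return compare_logical(ra ^ 0x80000000u, rb ^ 0x80000000u, xer_so);
}
```

For 0xFFFFFFFF versus 1, cmp reports LT (it sees -1 < 1) while cmpl reports GT.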

Page 18 Integer Exception Register (XER)

XER (SPR1): bit 0 SO, bit 1 OV, bit 2 CA; the following bits are reserved (zero), and the low-order bits ending at bit 31 hold a byte count.

SO (summary overflow): set whenever an instruction sets the overflow (OV) bit. Once set, it stays set until cleared by software with an mtspr instruction (or by mcrxr).
OV (overflow): set whenever an overflow occurs, else cleared. Multiply and divide instructions set OV if the result is too big for the register.
CA (carry): set whenever a carry out of the msb occurs, else cleared. Extended-precision instructions use CA as an operand.

The integer exception register, XER, contains arithmetic information useful for extended precision and for detecting signed arithmetic errors.

The summary overflow bit, SO, is set whenever an instruction sets the overflow bit, OV. Once the SO bit is set, it stays set until software explicitly clears it with an mtspr instruction that writes the XER (the mcrxr instruction also clears it).

The OV bit is set whenever an overflow occurs. Multiply and divide instructions set the OV bit when the result is too big to be stored in the register.

The carry bit, CA, is set whenever a carry out of the most-significant-bit occurs. Extended precision instructions use the CA bit as an operand.

Arithmetic instructions do not inherently use or update bits in the XER; the mnemonic indicates which XER bits are used or updated. A ‘c’ in the mnemonic (as in addc) records a carry out in the CA bit. An ‘e’ (as in adde) uses CA as an operand and records a carry out in the CA bit. An ‘o’ in place of the italicized “x” (as in addo) records an overflow in the OV and SO bits.
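Carry handling is what makes extended-precision arithmetic work: addc adds the low words and records the carry in CA, and adde adds the high words plus CA. The following C sketch models that pair on 32-bit halves to build a 64-bit add. The function names mirror the mnemonics but are hypothetical helpers, and the carry is passed explicitly instead of living in a real XER.

```c
#include <assert.h>
#include <stdint.h>

/* addc: return rA + rB and record the carry out of the MSB in *ca. */
uint32_t addc(uint32_t ra, uint32_t rb, unsigned *ca) {
    uint32_t sum = ra + rb;
    *ca = (sum < ra) ? 1u : 0u;   /* wraparound means a carry occurred */
    return sum;
}

/* adde: return rA + rB + CA and record the new carry in *ca. */
uint32_t adde(uint32_t ra, uint32_t rb, unsigned *ca) {
    assert(*ca <= 1u);
    uint64_t wide = (uint64_t)ra + rb + *ca;
    *ca = (unsigned)(wide >> 32);
    return (uint32_t)wide;
}

/* 64-bit add from 32-bit halves: addc on the low words, adde on the high. */
uint64_t add64(uint64_t a, uint64_t b) {
    unsigned ca = 0;
    uint32_t lo = addc((uint32_t)a, (uint32_t)b, &ca);
    uint32_t hi = adde((uint32_t)(a >> 32), (uint32_t)(b >> 32), &ca);
    return ((uint64_t)hi << 32) | lo;
}
```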

Page 19 Question

The compare instructions record condition codes in which fields of the condition register?

a) CR0 b) CR4 c) CR7 d) CR0-CR7 e) a and b

Now, let’s review what we’ve discussed by considering a couple of questions.

Answer: Compare instructions can record condition codes in any of the condition register fields CR0-CR7.

Page 20 Question

Which of the following instructions use the CA bit as an operand?

a) mullwo rD, rA, rB b) adde rD, rA, rB c) addc. rD, rA, rB d) divwo rD, rA, rB e) a and d

Answer: b) adde uses the CA bit as an operand. The ‘e’ in the mnemonic adds CA into the result, and the instruction also records the carry out in the CA bit.

Page 21 Branch Operation

[Figure: Branch operation. Branch/fail logic: bdnz, bdz, bdnzt, and bdzt first compute CNTR = count - 1, then test CNTR <> 0 (bdnz, bdnzt) or CNTR = 0 (bdz, bdzt); bdnzt, bdzt, and the conditional branches bc/bca also test whether the condition is true. A “No” at any test goes to the next sequential instruction; b/ba bypass the tests. bl, bcl, bla, bclrl, and bcctrl write the next sequential instruction address to the Link Register (SPR8). Branch target address calculation: bcctr uses the Count Register (SPR9), bclr uses the Link Register (SPR8), ba and bca use the sign-extended BD/LI field with 00 appended, and b and bc add that value to the current instruction address.]

Branching Operation, part 1

Let’s continue with a discussion of branching operations. The diagram shows the operation of all branch instruction types and the different ways that branch target addresses are calculated. Branch instruction program flow can be controlled by a condition code flag (LT, GT, EQ, SO), by a decremented count, by both, or by neither, as in the unconditional branches b and ba. Note that the diagram doesn’t show the order of operations.

The top half of the diagram shows how the branch variations decide whether to branch or to fall through to the next instruction.

Unconditional branches, b and ba, bypass the decision logic, calculate the target address, and branch.

Branch instructions that use the counter (bdnz, bdz, bdnzt, and bdzt) decrement the counter first, then check whether the decremented counter is not zero (bdnz and bdnzt) or equal to zero (bdz and bdzt). If the counter test fails, the branch falls through to the next instruction. If it succeeds, bdz and bdnz branch to the calculated target address, while bdzt and bdnzt branch only if the condition is also true; otherwise they fall through.

Page 22 Branch Operation

Branching Operation, part 2

Conditional branch instructions that do not use the counter, such as bc and bca, check a bit in the condition register for the set or cleared state. If the bit is in the proper state, then the condition is true and the instruction branches to the calculated target address. Otherwise, the branch falls through to the next instruction.

Branch instructions that include the “L” option (bl, bla, bcl, bclrl, and bcctrl) save the return address, which is the next sequential instruction address, to the link register whether or not the branch is taken.

Depending on the branch instruction, the branch target address is calculated using one of three methods. The first method, used by bclr and bcctr, takes the contents of the link or counter register. The second method, used by absolute branches such as ba and bca, uses the sign-extended displacement field (BD or LI, with two zero bits appended) directly as the target address. The third method, used by relative branches such as b and bc, adds that sign-extended displacement to the current instruction address.

Note that the mnemonics used in this diagram are predominantly from Appendix F of the Programming Environments Manual.
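The three target-address methods can be sketched in C. This assumes the standard field widths of a 24-bit LI field for b/ba and a 14-bit BD field for bc/bca (the tutorial shows the BD width but not the LI width), each extended with two zero bits; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` (two's complement, kept unsigned). */
static uint32_t sign_extend(uint32_t value, unsigned bits) {
    uint32_t sign = 1u << (bits - 1);
    value &= (1u << bits) - 1u;
    return (value ^ sign) - sign;
}

/* b/ba (24-bit LI) and bc/bca (14-bit BD): the field with 0b00 appended,
   sign-extended (extending first and then shifting gives the same 32-bit
   result), is used directly when AA = 1 or added to the current address. */
uint32_t branch_target_immediate(uint32_t current_address, uint32_t field,
                                 unsigned field_bits, unsigned aa) {
    assert(field_bits == 24 || field_bits == 14);
    uint32_t displacement = sign_extend(field, field_bits) << 2;
    return aa ? displacement : current_address + displacement;
}

/* bclr/bcctr: the target comes from LR or CTR with the two low bits cleared. */
uint32_t branch_target_register(uint32_t lr_or_ctr) {
    return lr_or_ctr & ~3u;
}
```

For example, a relative bc at 0x1000 with BD = 0x3FFF (that is, -1) targets 0x0FFC, one instruction back.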

Page 23 Conditional Branch Instructions

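Syntax: bc BO,BI,target   (BI = bit number of the condition register, 0-31)

bc encoding: opcode 16 (bits 0-5) | BO (bits 6-10) | BI (bits 11-15) | BD (bits 16-29) | AA (bit 30) | LK (bit 31)

BO selects how branching is determined; Y, the least-significant bit of BO (bit 10), is used for branch prediction.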

Let’s take a closer look at conditional branch instructions. The 5-bit BO field of the opcode, which includes the Y bit, controls whether a condition from the condition register and/or a count from the counter, SPR9, will determine the branching. The Y bit is the LSB of the “BO” field and is used for branch prediction.

The 5-bit BI field controls which bit of the condition register is evaluated. Therefore, this field controls which condition (LT,GT,EQ,SO) in which CRx field is evaluated.

The branch displacement field, BD, is used in the branching operation to determine the target address.
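The BO and BI fields together decide whether a conditional branch is taken. The sketch below decodes BO bit by bit, using the PowerPC convention that BO[0] is the most-significant of the five bits; its behavior can be checked against the BO table on the next page. Names are hypothetical, and the Y bit is deliberately ignored because it affects only prediction, not the outcome.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decide whether a bc-style branch is taken.
   BO bit values (BO[0] is the MSB, worth 16):
     16 (BO[0]) set: ignore the condition bit
      8 (BO[1])    : condition value that must match
      4 (BO[2]) set: do not decrement or test the counter
      2 (BO[3])    : 1 = branch if CTR == 0, 0 = branch if CTR != 0
      1 (BO[4])    : Y, the prediction hint (no effect on the outcome) */
bool branch_taken(unsigned bo, unsigned bi, uint32_t cr, uint32_t *ctr) {
    assert(bo < 32 && bi < 32);
    bool ignore_cond = (bo & 0x10u) != 0;
    bool want_true   = (bo & 0x08u) != 0;
    bool ignore_ctr  = (bo & 0x04u) != 0;
    bool want_zero   = (bo & 0x02u) != 0;

    if (!ignore_ctr)
        *ctr -= 1;                                  /* decrement first */
    bool ctr_ok  = ignore_ctr || ((*ctr == 0) == want_zero);
    bool cr_bit  = ((cr >> (31u - bi)) & 1u) != 0;  /* CR bit 0 is the MSB */
    bool cond_ok = ignore_cond || (cr_bit == want_true);
    return ctr_ok && cond_ok;
}
```

In this model, BO = 16 behaves like bdnz, and BO = 12 with BI = 2 tests CR0[EQ], which is how beq is encoded.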

Page 24 Branch Prediction

BO (Y=0)   BO (Y=1)   Description
0          1          Decrement CTR, branch if decremented CTR <> 0 and condition is false
2          3          Decrement CTR, branch if decremented CTR = 0 and condition is false
4          5          Branch if condition is false
8          9          Decrement CTR, branch if decremented CTR <> 0 and condition is true
10         11         Decrement CTR, branch if decremented CTR = 0 and condition is true
12         13         Branch if condition is true
16         17         Decrement CTR, branch if decremented CTR <> 0
18         19         Decrement CTR, branch if decremented CTR = 0
20         -          Branch always

The table shows the decimal values of the BO field for both values of Y. These values select how the counter register and the condition register bit are used to determine whether the counter register is decremented and whether the branch is taken. The Y bit affects only the prediction. For relative and absolute branches with Y = 0, the branch is predicted taken if the BD field is negative (a backward branch). In all other cases, that is, a positive value in the BD field or a target address that comes from the link or counter register, the branch is predicted not taken. Setting Y = 1 reverses the prediction.

Since the EPPC performs branch folding, it’s this prediction mechanism that determines which instruction replaces the branch instruction. The new instruction gets folded into the instruction stream by the instruction dispatch unit, or sequencer. These instructions begin executing based on the predicted outcome of the conditional test and are said to be speculatively executing. If the prediction later turns out to be incorrect, the results from the speculative instructions are discarded, the values in the GPRs are restored, and the processor begins executing instructions in the correct path.
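The static prediction rule for the Y bit fits in a few lines. In this sketch, `bd` is the already sign-extended displacement and `uses_register` marks the bclr and bcctr forms whose target comes from LR or CTR; both names are hypothetical. Predicting backward branches as taken is a common heuristic because loops usually close with a backward branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Default prediction (Y = 0): taken only for a bc/bca with a negative
   displacement. Y = 1 reverses whatever the default would be. */
bool predict_taken(unsigned y, bool uses_register, int32_t bd) {
    assert(y <= 1u);
    bool default_taken = !uses_register && bd < 0;
    return y ? !default_taken : default_taken;
}
```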

Page 25 Question

Which branch instruction updates the link register with the next sequential instruction address regardless of whether the branch takes or fails?

a) bca b) bcctrl c) bdzt d) bclr

Answer: b) bcctrl. Branch instructions that include the “L” option save the next sequential instruction address to the link register whether or not the branch is taken.

Page 26 Overview of Addressing Capabilities

1. Register Indirect with Immediate Index: Effective Address = rA + d

2. Register Indirect with Index: Effective Address = rA + rB

3. Register Indirect (used only for string loads and stores): Effective Address = rA

Immediate mode: 16-bit signed or unsigned data embedded in the opword of arithmetic and logical instructions

Next, let’s discuss EPPC addressing capabilities. The EPPC supports three standard addressing modes and an immediate mode. Note that immediate mode is not technically considered an addressing mode because the data is embedded in the opcode of the arithmetic or logical instruction. The addressing modes are used by the load, store, and cache instructions. Here rA and rB are general purpose registers (GPR0-GPR31), d is a 16-bit immediate index sign-extended to 32 bits, and the effective address is the calculated memory address.

Page 27 Addressing Mode Flow Chart

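[Figure: addressing mode flow chart. If rA = GPR0, the value 0 is used; otherwise the contents of GPR (rA). This value is added to d, rB, or 0 to form the 32-bit effective address. If the update option is selected, the effective address is written back to rA. The memory access then loads into, or stores from, GPR (rD/rS).]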

A special case of the first two addressing modes is when rA = GPR0. In this case, literal zero is used instead of the value in GPR0. This reduces the effective address to a single variable or constant. Note that an update option causes rA to be updated with the calculated effective address. This allows a pointer to be incremented or decremented and is typically used to implement stacks and heaps.

The figure shows the flow chart for the three addressing modes. First, rA is checked to determine whether it is GPR0. If it is, a value of zero is used; otherwise, the contents of the specified register are used. The next step depends on the addressing mode: for Register Indirect with Index, this value is added to the contents of rB; for Register Indirect with Immediate Index, it is added to d; and for Register Indirect, it is added to zero.
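The addressing rules, including the rA = GPR0 special case and the update option, can be sketched as follows. The `regs` structure is a hypothetical register file. The check that rejects rA = GPR0 in the update form reflects the architecture’s rule that such update forms are invalid, which this tutorial does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file used only for this illustration. */
typedef struct {
    uint32_t gpr[32];
} regs;

/* (rA|0): GPR0 used as a base register reads as literal zero. */
static uint32_t base_or_zero(const regs *r, unsigned ra) {
    assert(ra < 32);
    return ra == 0 ? 0u : r->gpr[ra];
}

/* Register indirect with immediate index: EA = (rA|0) + sign-extended d. */
uint32_t ea_immediate(const regs *r, unsigned ra, int16_t d) {
    return base_or_zero(r, ra) + (uint32_t)(int32_t)d;
}

/* Register indirect with index: EA = (rA|0) + (rB). */
uint32_t ea_indexed(const regs *r, unsigned ra, unsigned rb) {
    assert(rb < 32);
    return base_or_zero(r, ra) + r->gpr[rb];
}

/* Register indirect (string loads/stores): EA = (rA|0). */
uint32_t ea_indirect(const regs *r, unsigned ra) {
    return base_or_zero(r, ra);
}

/* Update form of the immediate mode (e.g. lwzu, stwu): rA receives the EA.
   rA = GPR0 is rejected because such update forms are invalid. */
uint32_t ea_immediate_update(regs *r, unsigned ra, int16_t d) {
    assert(ra != 0);
    uint32_t ea = ea_immediate(r, ra, d);
    r->gpr[ra] = ea;
    return ea;
}
```

A call such as `ea_immediate_update(&r, 1, -16)` mirrors the common stack-frame idiom stwu r1,-16(r1), in which r1 serves as the stack pointer by software convention (as in the PowerPC EABI). This is how the software-managed stack mentioned earlier is typically built from the update option.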

Page 28 Initializing Constants and Pointers

Example 1: addis with rA = GPR0

Instruction: addis r6,r0,0x8234
Simplified mnemonic: lis r6,0x8234

  0x00000000   literal zero (r0 = 0x00032000 is not used)
+ 0x82340000   d (shifted to upper halfword)
= 0x82340000   r6

Example 2: addi with rA = GPR0

Instruction: addi r6,r0,0x8234
Simplified mnemonic: li r6,0x8234

  0x00000000   literal zero (r0 = 0x00032000 is not used)
+ 0xffff8234   d (sign-extended to 32 bits)
= 0xffff8234   r6

Example 3: ori

Instruction: ori r6,r6,0x5671

   0x82340000   r6 from addis example
OR 0x00005671   d (zero-extended to 32 bits)
=  0x82345671   r6 complete 32-bit value

Recall that immediate data is embedded in the opcode of some arithmetic and logical instructions and is at most 16 bits wide. Therefore, it takes two instructions to load pointers or constants wider than 16 bits. These examples show how to initialize full 32-bit values. The add immediate-shifted instruction, addis, can be used to load the upper 16 bits. A special case of the add immediate and add immediate-shifted instructions occurs when rA = GPR0. In this case, a literal zero is used instead of the contents of r0, which reduces the instruction to a load immediate operation.

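In the second example, the immediate value is not shifted. Instead, it is sign-extended, so any value with bit 15 set (such as 0x8234) fills the upper halfword with ones. This may cause unintended results: for example, following lis r6,0x8234 with addi r6,r6,0x8234 produces 0x82338234 rather than 0x82348234. Therefore, the recommended way to load the lower 16 bits of a constant value is with the or-immediate instruction, ori, as shown in the third example, because ori zero-extends its immediate.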
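The arithmetic behind the three examples can be checked with a small C model. The helper names mirror the mnemonics but are hypothetical functions, not an assembler API; the last helper reproduces the pitfall of using addi for the low halfword.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit immediate to 32 bits. */
static uint32_t exts16(uint16_t imm)
{
    return (imm & 0x8000u) ? (0xFFFF0000u | imm) : imm;
}

/* Hypothetical helpers named after the mnemonics; lis and li assume rA = GPR0. */
static uint32_t lis(uint16_t simm)               { return (uint32_t)simm << 16; } /* addis rD,r0,SIMM */
static uint32_t li(uint16_t simm)                { return exts16(simm); }        /* addi rD,r0,SIMM */
static uint32_t ori(uint32_t rs, uint16_t uimm)  { return rs | uimm; }           /* zero-extended */
static uint32_t addi(uint32_t ra, uint16_t simm) { return ra + exts16(simm); }   /* sign-extended */

/* Recommended sequence for a 32-bit constant: lis rD,hi then ori rD,rD,lo. */
static uint32_t load_constant32(uint32_t value)
{
    return ori(lis((uint16_t)(value >> 16)), (uint16_t)(value & 0xFFFFu));
}
```

Because ori never alters the upper halfword, the lis/ori pair works for any 32-bit value, including those whose low halfword has bit 15 set.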

Page 29 Overview of MMU Functions

• Memory access control:
  - Allow or inhibit caching
  - Control data cache mode
  - Control read-only or read/write access
  - Control user access and privilege access
• Address translation

Next, let’s discuss the features and functions of the MMU. The two primary functions performed by the MMU are memory access control and address translation. The MMU transparently monitors all accesses to memory, controls whether they are cached, and controls the data cache mode. It also controls read-only and read/write access as well as user and privilege access. If an access is not defined or not allowed, the MMU generates an interrupt.

The MMU address translation capability is particularly useful when you have multiple tasks that all want to use the same logical addresses for programs and data. Using the MMU, these tasks can be located in distinct physical memory locations.

While it is possible to run with the MMUs disabled so that the effective address becomes the real address, this also means that access protection and cache control cannot differ between areas of memory for accesses of the same type (instruction or data). While this may work fine for the instruction stream, most applications need the performance benefits of the data cache while ensuring that areas such as I/O and FIFO entries are non-cacheable. The MMU must be used to accomplish this.

Page 30 Basic MMU Functions

[Figure: basic MMU functions: the MMU maps memory pages for Task A and Task B with attributes such as read-only, read/write, cacheable, cache-inhibited, shared, and access not allowed; one Task B page resides on the hard disk]

The figure illustrates the many functions performed by the MMU of the MPC8xx processors. The MMU in 8xx processors supports the 4-kbyte page size defined in the PowerPC Virtual Environments Architecture. It also supports page sizes of 16 kbytes, 512 kbytes, and 8 Mbytes. Additionally, it can provide protection for 1-kbyte subpages.

The example shows two tasks, A and B, and their allocated memory pages.

The MMU transparently monitors all accesses. If access is not allowed, the MMU will generate an interrupt. For example, task B cannot access the non-shareable memory pages of task A.

Although rarely used in embedded applications, the MMU supports a demand-paged virtual memory environment. Software programs request (demand) memory simply by attempting an access. The operating system then swaps unused pages out to disk and brings new pages from disk into memory as they are demanded. Because the operating system manages which pages are resident, all of a task’s memory space appears to exist in memory (it is virtual).

The MMU can operate in three different modes that control page size and access protection resolution. In this tutorial, we’ll discuss the simplest of these modes, Mode 1.

Page 31 Two-Level Organization

[Figure: two-level organization: a Level 1 table of 1024 descriptors (#0-1023), each covering 4 Mbytes and pointing to a Level 2 table of 1024 descriptors (#0-1023), each describing a 4-kbyte page]

This example shows the organization of a two-level page table. The basic element for address translation and access control is a structure called a page descriptor. Each descriptor contains a number of fields that describe the access attributes and physical location of a page in memory. Page descriptors are organized into tables in memory, each with no more than 1024 entries; because they are stored in tables, descriptors are also called page table entries, or PTEs. Since a Level 2 table for 4-kbyte pages can describe only 4 Mbytes (1024 x 4 kbytes) of the 4-Gbyte address space of the EPPC, multiple Level 2 tables are required to fully describe all of the memory physically resident in a system. Consequently, there is a hierarchical system of descriptors. The entries in the Level 1 table contain pointers to the individual Level 2 tables, as well as additional access control fields.

Note that page tables are also called translation tables. The example shows what the two-level translation tables would look like with enough 4-kbyte pages to represent a fully populated system with 4 Gbytes of memory.
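For 4-kbyte pages, the effective address splits into a Level 1 index, a Level 2 index, and a page offset. The sketch below assumes 1024-entry tables at both levels and uses PowerPC bit numbering (bit 0 is the most significant bit); the type and function names are hypothetical. It also checks the coverage figures: one Level 1 entry spans 1024 x 4 kbytes = 4 Mbytes, and the full Level 1 table spans 4 Gbytes.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u     /* 4-kbyte page */
#define L1_ENTRIES 1024u
#define L2_ENTRIES 1024u

typedef struct {
    uint32_t l1_index;     /* EA bits 0-9: one of 1024 Level 1 descriptors */
    uint32_t l2_index;     /* EA bits 10-19: one of 1024 Level 2 descriptors */
    uint32_t offset;       /* EA bits 20-31: byte within the 4-kbyte page */
} ea_split;

static ea_split split_ea(uint32_t ea)
{
    ea_split s;
    s.l1_index = ea >> 22;
    s.l2_index = (ea >> PAGE_SHIFT) & (L2_ENTRIES - 1u);
    s.offset   = ea & ((1u << PAGE_SHIFT) - 1u);
    assert(s.l1_index < L1_ENTRIES);
    return s;
}

/* Bytes covered by one Level 2 table, i.e. by one Level 1 entry. */
static uint64_t bytes_per_l1_entry(void)
{
    return (uint64_t)L2_ENTRIES << PAGE_SHIFT;
}

/* Bytes covered by the whole Level 1 table. */
static uint64_t bytes_per_address_space(void)
{
    return (uint64_t)L1_ENTRIES * bytes_per_l1_entry();
}
```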

Page 32 Overview of TLB Characteristics

• 860 TLBs contain 32 entries.
• 850 and 823 TLBs contain eight entries.
• TLB features:
  - Each TLB is fully associative.
  - TLB entries describe page size, access control, and address translation.
  - A hit in the TLB results in a one-clock cycle delay.

Let’s take a closer look at MMU operations. With two-level tables, each instruction or data access to cache or memory would first require two additional memory accesses to fetch the Level 1 and Level 2 descriptors for the PTE. To reduce this enormous overhead, the MMU also has a specialized, fully associative cache that contains the critical translation and access control information found in the translation tables. In some architectures, this caching mechanism is called an address translation cache, or ATC. In the PowerPC, it’s called a translation look-aside buffer, or TLB. The two terms are essentially synonymous.

Each 860-derivative TLB contains 32 entries; 850 and 823 TLBs contain eight entries. Each TLB is fully associative. Other TLB features in 8xx devices include:

• Individual TLB entries can independently describe any allowable page size.
• Each entry specifies attributes, such as cacheability, access protection, and address translation.
• A hit in a TLB incurs only a one-clock cycle delay.

Page 33 TLB Operation Flow Chart

[Figure: TLB operation flow chart. Start: effective address asserted. Address match in TLB? No: TLB miss interrupt; the ISR executes a tablewalk and the result goes into the TLB. Yes: protection match? No: TLB error interrupt, handled by an application-specific ISR. Yes: real address asserted. End.]

This flow chart shows that mismatches for addresses or protection will result in interrupts to the CPU. When an interrupt occurs, the CPU must perform the appropriate corrective action in the interrupt service routine before processing can continue.

TLB entries must be initialized after coming out of reset. The process of initializing and updating this information is known as a “tablewalk”. If there are sufficient TLB entries to fully map all of the I/O and memory in a system, including all of the on-chip resources, then no TLB miss will be incurred. In that case, the page tables are no longer needed once the TLBs are loaded, and their memory can be reused for other information. Otherwise, after all available TLB entries have been initialized at reset, the interrupt service routine must perform a tablewalk to load a TLB entry with the appropriate page table entry whenever software demands a page that is not mapped through the TLBs.

Note that some members of the PowerPC family include specialized hardware that performs tablewalks without software intervention. The EPPC performs its tablewalks with exception servicing software, assisted by some specialized hardware.
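The decision flow of the TLB flow chart can be modeled in C. This is a simplified, hypothetical sketch: each entry maps one 4-kbyte page and carries a single write-permission flag, whereas real 8xx entries hold the full set of page size, protection, and cache attributes. The linear search reflects full associativity, in which any entry may hold any page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TLB entry for a 4-kbyte page. */
typedef struct {
    bool     valid;
    uint32_t epn;        /* effective page number (EA >> 12) */
    uint32_t rpn;        /* real page number */
    bool     writable;
} tlb_entry;

typedef enum { TLB_HIT, TLB_MISS, TLB_ERROR } tlb_result;

/* Address match? -> protection match? -> real address asserted. */
static tlb_result tlb_translate(const tlb_entry *tlb, size_t n,
                                uint32_t ea, bool is_write, uint32_t *real)
{
    assert(tlb != NULL && real != NULL);
    uint32_t epn = ea >> 12;
    for (size_t i = 0; i < n; i++) {             /* compare against every entry */
        if (tlb[i].valid && tlb[i].epn == epn) {
            if (is_write && !tlb[i].writable)
                return TLB_ERROR;                /* TLB error interrupt */
            *real = (tlb[i].rpn << 12) | (ea & 0xFFFu);
            return TLB_HIT;                      /* real address asserted */
        }
    }
    return TLB_MISS;                             /* TLB miss interrupt: tablewalk */
}
```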

Page 34 Access Attributes

Guarded:
        bc    done
        ...
done:   lbz   Rx,0(Ry)

Permissions:
        Instruction              Data
        Supervisor   User        Supervisor   User
        EX           N/A         R/W          N/A
        EX           EX          R/W          R-O
                                 R/W          R/W
                                 R-O          R-O

EX = Executable, N/A = No Access, R-O = Read-only, R/W = Read & Write

Cache Control

I-cache: Inhibited or Enabled
D-cache: Inhibited or Enabled; Write-Through or Copy-Back

The access attributes supported by the MMU of the MPC8xx include guarded, user-level and supervisor-level permissions, and cache control.

The guarded attribute prevents speculative loading and instruction fetching from the addressed memory location. In the example shown, as the branch conditional to done (bc done) instruction enters the sequencer, the branch unit predicts the branch to be taken. This causes the program counter to jump to the instruction at “done”, so the sequencer prefetches and issues the lbz instruction before the branch condition is known. If the memory location pointed to by register Ry is not guarded, the data is loaded speculatively. If it is guarded, the data is not loaded until the branch is resolved. Likewise, if the lbz instruction itself is in a guarded page, it is not fetched until the branch condition is resolved. Note that if the guarded instruction or data is already in the cache, the guarded bit has no effect. A page should be guarded if it is subject to destructive reads, and it’s usually recommended that guarded pages also be marked as non-cacheable.

Next we see the possible combinations for a Page Table Entry for access protection in resolution mode 1. As you can see, distinctions can be made between CPU privilege levels as well as transaction types.
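One way to read the permission table is as a lookup indexed by row, privilege level, and access type. In this hypothetical sketch the row numbers simply follow the order of the table (two instruction rows, four data rows); they are NOT the encoding of the PP field in a real page descriptor, which is given in the device User Manual.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PERM_NA, PERM_RO, PERM_RW, PERM_EX } mmu_perm;
typedef enum { SUPERVISOR = 0, USER = 1 } cpu_level;
typedef enum { FETCH, LOAD, STORE } access_kind;

/* Rows of the permission table as {supervisor, user}; row order is illustrative. */
static const mmu_perm insn_perms[2][2] = {
    { PERM_EX, PERM_NA },
    { PERM_EX, PERM_EX },
};
static const mmu_perm data_perms[4][2] = {
    { PERM_RW, PERM_NA },
    { PERM_RW, PERM_RO },
    { PERM_RW, PERM_RW },
    { PERM_RO, PERM_RO },
};

static bool access_allowed(int row, cpu_level level, access_kind kind)
{
    if (kind == FETCH) {
        assert(row >= 0 && row < 2);
        return insn_perms[row][level] == PERM_EX;
    }
    assert(row >= 0 && row < 4);
    mmu_perm p = data_perms[row][level];
    return kind == LOAD ? (p == PERM_RO || p == PERM_RW) : (p == PERM_RW);
}
```

For instance, in the second data row a user-mode load is allowed but a user-mode store is not, which is the distinction by privilege level and transaction type that the text describes.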

The MMU uses cache control to determine whether transactions to a given memory page are cacheable. The data cache mode (write-through or copy-back) is also controlled here. For very sophisticated environments, you can assign up to 16 separate categories of tasks, each with its own address space ID. At the very minimum, this requires a separate translation table for each address space. Note that very few systems actually use this feature.

Page 35 Initializing TLB Entries

(steps 1 and 2 of 6)

Step 1: Initialize an L1 descriptor (Level 1 Descriptor)
  - L2BA: Level 2 table pointer
  - APG: access protection group
  - G: guarded attribute
  - PS: page size
  - WT: write-through or copy-back
  - V: valid

Step 2: Initialize an L2 descriptor (Level 2 Descriptor)
  - RPN: real page number
  - PP: page protection
  - E: encoding
  - C: changed
  - TLBH: TLB hit
  - SPS: small page size
  - SH: shareable
  - CI: cache inhibit
  - V: valid

Next, let’s look at the steps to initialize a TLB entry. Typically, this is required during initialization after reset. In the first two steps, the necessary descriptor tables are created in memory using the Level 1 and Level 2 descriptors. These are the actual entries in the Level 1 and Level 2 tables.

Page 36 Initializing TLB Entries

(steps 3-6 of 6)

Step 3: Initialize Mx_CTR, the MMU control register (x = I or D)
  - GPM: group protection mode
  - PPM: page protection mode
  - CIDEF: default cache-inhibit mode
  - WTDEF: default write-through mode
  - RSV4I/RSV4D: reserve TLB entries
  - TWAM: tablewalk assist mode
  - PPCS: privilege state compare
  - INDX: TLB entry index

Step 4: Initialize Mx_EPN, the effective page number register (x = I or D)
  - EA: effective page number
  - EV: entry valid
  - ASID: address space ID

Step 5: Initialize Mx_TWC, the tablewalk control register (x = I or D)
  - Access information from the Level 1 descriptor

Step 6: Initialize Mx_RPN, the real page number register (x = I or D)
  - Real page number and access information from the Level 2 descriptor

Next, initialize the two MMU control registers (MI_CTR and MD_CTR). Note that the fields in the data and instruction MMU control registers differ slightly. The data MMU control register includes the WTDEF bit to control the default write-through attribute of the data cache, as well as the tablewalk assist mode bit (TWAM) that defines the 1-kbyte subpage protection resolution.

The next step is to initialize the effective page number register (Mx_EPN). This is where the logical effective address is defined. Then set the tablewalk control register (Mx_TWC).

Finally, initialize the real page number register (Mx_RPN). This is where the physical address is defined. You can set the real page number equal to the effective page number. In this case, the address generated by the CPU equals the address presented to the caches and/or bus unit. This provides a means of controlling the access attributes without any effective address translation. Most embedded systems actually operate in this manner.

When step 6 is executed, the contents of the Mx_EPN, Mx_TWC, and Mx_RPN are automatically copied to the TLB entry pointed to by the index field in the respective MMU control register (Mx_CTR). As the software demands pages which are not mapped through TLBs, the interrupt service routine must perform a tablewalk to load a TLB with the appropriate page table entry.

Page 37 Example: Reloading TLBs

Data TLB Tablewalk:

dtlb_swtw:
        mtspr   M_TW, R1        # Save R1
        mfspr   R1, M_TWB       # Load R1 with address of Level 1 descriptor
        lwz     R1, 0(R1)       # Load Level 1 page entry
        mtspr   MD_TWC, R1      # Save Level 2 base pointer and Level 1 attributes
        mfspr   R1, MD_TWC      # Load R1 with Level 2 pointer, accounting for the page size
        lwz     R1, 0(R1)       # Load Level 2 page entry
        mtspr   MD_RPN, R1      # Write TLB entry
        mfspr   R1, M_TW        # Restore R1
        rfi                     # Return from the interrupt

You can use this routine to reload a TLB entry in the event of a data TLB miss interrupt. It uses the MMU tablewalk assist registers (M_TWB and MD_TWC) to perform the tablewalk; a similar routine handles a miss in the instruction MMU. There are some differences between the data TLB tablewalk routine shown above and the instruction TLB tablewalk routine. The data TLB miss service routine takes 20 clocks when running from zero wait-state memory, while the instruction TLB handler requires 23 clocks running from the same zero wait-state memory.

Note that both routines are provided in the User Manual for the specific device. For the MPC860, they are provided in section 9.10.1.1.
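The two descriptor loads in dtlb_swtw can be mirrored by a software walk over simulated memory. This sketch assumes 4-kbyte pages, a Level 1 descriptor whose upper 20 bits hold the Level 2 base address (L2BA), a Level 2 descriptor whose upper 20 bits hold the RPN, and a valid bit in the least-significant position; check the device User Manual for the exact descriptor layouts. The simulated memory and helper names are hypothetical, and the address assist that M_TWB and MD_TWC provide is replaced by explicit index arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_WORDS 2048u                /* 8 kbytes of simulated physical memory */
#define DESC_V    0x1u                 /* assumed: valid bit is the LSB */

static uint32_t sim_mem[SIM_WORDS];

static uint32_t read_word(uint32_t pa)
{
    assert(pa % 4 == 0 && pa / 4 < SIM_WORDS);
    return sim_mem[pa / 4];
}

static void write_word(uint32_t pa, uint32_t v)
{
    assert(pa % 4 == 0 && pa / 4 < SIM_WORDS);
    sim_mem[pa / 4] = v;
}

/* Two memory reads per walk, as in dtlb_swtw. Returns false where a
 * real handler would have to report a fault instead of loading the TLB. */
static bool tablewalk(uint32_t l1_base, uint32_t ea, uint32_t *l2_desc)
{
    uint32_t l1 = read_word(l1_base + (ea >> 22) * 4u);           /* Level 1 entry */
    if (!(l1 & DESC_V))
        return false;
    uint32_t l2_base = l1 & 0xFFFFF000u;                           /* L2BA */
    uint32_t l2 = read_word(l2_base + ((ea >> 12) & 0x3FFu) * 4u); /* Level 2 entry */
    if (!(l2 & DESC_V))
        return false;
    *l2_desc = l2;                     /* RPN plus access bits, ready for Mx_RPN */
    return true;
}
```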

Page 38 Reserving TLB Entries

[Figure: ITLB and DTLB entries 0-31; setting RSV4I (ITLB) or RSV4D (DTLB) to 1 reserves entries 28-31, while with RSV4I/RSV4D = 0 all entries take part in replacement]

The index field in the MMU control register decrements every time the RPN register is written, as happens in a tablewalk for a TLB miss interrupt. This provides a round-robin approach to TLB replacement: the oldest TLB entry is the next one to be replaced. Often you will want certain portions of memory, such as the operating system kernel and the task dispatch table, to always map into a TLB. The MMUs in the 8xx family allow you to reserve a subset of the TLB entries so that the index field never points to them. This is controlled by the reserve bit (RSV4I or RSV4D) of the respective control register.

These diagrams show that entries can be independently reserved in both MMUs. These examples are for 860 variants, which have 32 TLB entries per MMU. In these cases, the reserve bit protects the four highest TLB entries (28-31). For 850 and 823 variants, which have only eight TLB entries per MMU, only two entries per MMU can be reserved. However, the basic process for reserving TLB entries is the same as for 860 variants.

Note that replacement counters in the index fields of the control registers are cleared to zero after executing the TLB invalidate all, or tlbia, instruction.
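The replacement counter can be modeled as follows. This is a hypothetical sketch of the behavior described above: writing the RPN value copies the staged EPN, TWC, and RPN values into the entry selected by INDX, and INDX then decrements. The wrap-around rule (back to the highest entry that is not reserved) is an assumption made for the model; the User Manual defines the exact counter behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 860 variants: 32 entries, 4 reserved (28-31) when RSV4x = 1.
 * 850 and 823 variants would use 8 entries with 2 reserved. */
#define TLB_SIZE     32u
#define TLB_RESERVED 4u

typedef struct { uint32_t epn, twc, rpn; } tlb_slot;   /* hypothetical latched copy */

typedef struct {
    tlb_slot entry[TLB_SIZE];
    unsigned index;        /* models the INDX field of Mx_CTR */
    bool     rsv;          /* models RSV4I or RSV4D */
    uint32_t epn, twc;     /* values already written to Mx_EPN and Mx_TWC */
} mmu_model;

/* Writing Mx_RPN: copy EPN, TWC, and RPN into TLB[INDX], then decrement INDX.
 * Assumed wrap rule: below 0, INDX wraps to the highest non-reserved entry. */
static void write_rpn(mmu_model *m, uint32_t rpn)
{
    assert(m->index < TLB_SIZE);
    m->entry[m->index] = (tlb_slot){ m->epn, m->twc, rpn };
    unsigned top = m->rsv ? TLB_SIZE - TLB_RESERVED - 1u : TLB_SIZE - 1u;
    m->index = (m->index == 0u) ? top : m->index - 1u;
}

/* tlbia: invalidate every entry and clear the replacement counter. */
static void tlbia(mmu_model *m)
{
    for (unsigned i = 0; i < TLB_SIZE; i++)
        m->entry[i] = (tlb_slot){ 0u, 0u, 0u };
    m->index = 0u;
}
```

Loading a reserved entry then amounts to clearing rsv, setting index into the reserved range, and calling write_rpn, which mirrors steps 2, 4, and 6 of the procedure for loading reserved entries.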

Page 39 Summary - Loading Reserved TLB Entries

1. Disable the MMU: clear IR or DR in the MSR.
2. Clear the RSV bit in the Mx_CTR.
3. Invalidate the effective address of the reserved page using tlbia or tlbie.
4. Set the index field in the control register:
   - between 28 and 31 for 860 variants
   - 6 or 7 for 850 and 823 variants.
5. Load the appropriate EPN register:
   - effective page number (EA)
   - address space ID of reserved page (ASID)
   - set EV bit = 1.
6. Run the tablewalk routine to load the TLB entry.
7. If needed, repeat steps 4-6 to load other TLB entries.
8. Set the TLB index field outside of the reserved range.
9. Set the RSV bit in the Mx_CTR.

Let’s review the process for loading a single reserved entry into a TLB.

The first step is to disable the MMU by clearing the IR or DR bit of the MSR, as needed. In the second step, clear the reserve bit (RSV) in the appropriate MMU control register. Then invalidate the effective address of the reserved page using the tlbia or tlbie instruction.

In step four, set the index field of the appropriate control register. The value of the index field should be between 28 and 31 for 860 variants, and 6 or 7 for 850 and 823 variants. In step five, load the appropriate EPN register with the effective page number and the address space ID of the reserved page, and set the EV bit to 1. In step six, execute the Tablewalk routine to load the appropriate entry into the TLB. Repeat steps 4-6 as needed to load other TLB entries.

Finally, set the TLB index field outside of the reserved range and set the reserve bit in the appropriate control register.

Page 40 Question

Which instruction is recommended for loading the lower 16 bits of a constant value?

a) addze. rD, rA
b) addis rD, rA, SIMM
c) ori rA, rS, UIMM

Let’s consider a few questions to check your understanding of PowerPC addressing and memory management.


Answer: The recommended way for loading the lower 16 bits of a constant value is with the or-immediate instruction, ori.

Page 41 Question

In the 823 device, how many entries does each translation look-aside buffer, or TLB, contain?

a) 8
b) 16
c) 32
d) 64


Answer: The 823 TLBs contain eight entries.

Page 42 Question

Which TLB tablewalk routine executes faster when running from zero wait-state memory?

a) data tablewalk routine
b) instruction tablewalk routine

Answer: The data tablewalk routine. It executes in 20 clock cycles when running from zero wait-state memory, while the instruction TLB handler executes in 23 clock cycles when running from the same zero wait-state memory.

Page 43 Basic Cache Operation

[Figure: basic cache operation: external memory and a cache line (block) with Tag, D, V, L, and Data fields; a comparator matches the current address against the tag to detect a hit]

Let’s continue our tutorial with a discussion of instruction and data caches. The diagram shows a simplified representation of a cache operation. A cache is a specialized block of high-speed memory. Typically this block contains an image of a portion of external, slower memory.

When the requested data is not in the cache, the Cache Controller performs an external access. It then loads the data into a cache line, tags the line with the address the data came from, and marks it valid. Caches have many cache lines; in a set-associative cache, the lines that share the same index form a set. Each cache line holds several adjacent words from memory. The address of each subsequent memory access is compared to the tag; when a match, or hit, occurs, the data is sent to the requester in a fraction of the time of an external access.

In addition to an address tag, each cache line contains one or more status bits. The dirty bit, or D-bit, indicates that data has been written to the cache but not yet to external memory. Note that instruction cache lines do not have this bit because they can be updated only as a result of instruction fetch transactions. The lock bit, or L-bit, indicates that the cache line can be accessed but not replaced. Finally, a set valid bit, or V-bit, indicates that the cache line holds valid data (it is neither empty nor invalidated).

Page 44 MPC8xx Cache Characteristics

Instruction Cache Characteristics
                      823/850   860/855T   823e/860P
Cache size (bytes)    2 k       4 k        16 k
# of ways             2         2          4
# of sets             64        128        256
Tag size (bits)       22        21         20

Data Cache Characteristics
                      823/850   860/855T   823e/860P
Cache size (bytes)    1 k       4 k        8 k
# of ways             2         2          2
# of sets             32        128        256
Tag size (bits)       23        21         20

The tables describe the cache characteristics for various members of the MPC8xx product family. Both the data and instruction caches for MPC8xx variants are n-way set associative physically addressed caches with a 4-word line size. As shown in the tables, the cache characteristics differ from variant to variant. The caches vary in size, the number of sets, the number of ways, and the number of most significant address bits for tag comparisons.
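The table values are consistent with two relationships: cache size = ways x sets x 16 bytes, and tag size = 32 - log2(sets) - 4, since 4 address bits select the byte within a 16-byte line. The C sketch below computes both for any row of the tables; the function names are hypothetical.

```c
#include <assert.h>

#define LINE_BYTES 16u   /* 4-word line */
#define ADDR_BITS  32u

/* log2 of a power of two. */
static unsigned log2u(unsigned v)
{
    assert(v != 0u && (v & (v - 1u)) == 0u);
    unsigned n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* sets = size / (ways x line size) */
static unsigned cache_sets(unsigned size_bytes, unsigned ways)
{
    assert(ways != 0u && size_bytes % (ways * LINE_BYTES) == 0u);
    return size_bytes / (ways * LINE_BYTES);
}

/* Tag = address bits not used for the set index or the byte-in-line offset. */
static unsigned tag_bits(unsigned size_bytes, unsigned ways)
{
    return ADDR_BITS - log2u(cache_sets(size_bytes, ways)) - log2u(LINE_BYTES);
}
```

For the 860/855T instruction cache, for example, 4096 / (2 x 16) gives 128 sets, and 32 - 7 - 4 gives the 21-bit tag listed in the table.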

The instruction cache can only supply instruction words. Therefore, the only data it can contain is 16-bit immediate data embedded in an instruction word. It cannot be written to by the CPU.

The data cache provides data for load instructions and accepts data from store instructions. Note that the data cache cannot provide instructions.

Page 45 Instruction Cache Processing Flow

[Figure: instruction cache processing flow (860/855T): instruction pointer bits 0-20 go to the MMU, bits 21-27 select one of 128 sets (set0-set127) in way 0 and way 1, and bits 28-29 select the word; two 21-bit tag comparators produce hit0 and hit1, and a 2-to-1 bidirectional mux passes the 128-bit line to the line buffer or from the burst buffer; an LRU array tracks each set]

Next, we’ll look at some specific examples using the cache organization of the 860 and 855T devices. For other variants, you can adjust the variant characteristics in the examples using the values shown on the previous page. This example describes how the instruction cache operates.

The first 21 bits (bits 0-20) of the instruction pointer are translated by the instruction MMU into the physical address. At the same time, the next 7 bits (bits 21-27) index into the cache to select the set whose tags are to be compared.

Next, the physical address from the MMU is compared to the tags in both way 0 and way 1 of the indexed set. If one of the tags matches the physical address (a hit), the word selected by bits 28-29 of the instruction pointer is sent to the Core. If neither tag matches (a miss), an external burst transfer begins with the word requested by the instruction unit (critical word first), followed by the remaining three words. As the missed instruction is received, it is sent both to the instruction unit and to the burst buffer.

Once the burst buffer is full, the line is written to an empty way in the selected set or, if neither way is empty, replaces the least recently used (LRU) way that is not locked.

Notice that there is only one valid bit and one lock bit for each cache line.
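The lookup and fill steps can be modeled for the 860/855T instruction cache (2 ways x 128 sets). In this hypothetical sketch the caller supplies the physical address produced by the MMU; the set index (bits 21-27) and word select (bits 28-29) fall within the 4-kbyte page offset, so they are the same in the effective and physical addresses, which is what allows indexing to proceed at the same time as translation. One LRU bit per set stands in for the LRU array, and the fill rule follows the text: an empty way first, otherwise the least recently used way that is not locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ICACHE_SETS 128u
#define ICACHE_WAYS 2u

typedef struct {
    bool     valid, locked;   /* one V bit and one L bit per line */
    uint32_t tag;             /* address bits 0-20 */
    uint32_t word[4];
} icache_line;

typedef struct {
    icache_line way[ICACHE_WAYS];
    unsigned    lru;          /* way to replace next */
} icache_set;

static uint32_t addr_tag(uint32_t pa)  { return pa >> 11; }          /* bits 0-20 */
static uint32_t addr_set(uint32_t pa)  { return (pa >> 4) & 0x7Fu; } /* bits 21-27 */
static uint32_t addr_word(uint32_t pa) { return (pa >> 2) & 0x3u; }  /* bits 28-29 */

/* Returns true on a hit and stores the selected word in *insn. */
static bool icache_lookup(icache_set *sets, uint32_t pa, uint32_t *insn)
{
    assert(sets != NULL && insn != NULL);
    icache_set *s = &sets[addr_set(pa)];
    for (unsigned w = 0; w < ICACHE_WAYS; w++) {
        if (s->way[w].valid && s->way[w].tag == addr_tag(pa)) {
            *insn = s->way[w].word[addr_word(pa)];
            s->lru = 1u - w;                       /* other way is now least recent */
            return true;
        }
    }
    return false;
}

/* Place a filled burst buffer: empty way first, else the LRU unlocked way.
 * Returns false if both ways are locked. */
static bool icache_fill(icache_set *sets, uint32_t pa, const uint32_t line[4])
{
    icache_set *s = &sets[addr_set(pa)];
    int victim = -1;
    for (unsigned w = 0; w < ICACHE_WAYS && victim < 0; w++)
        if (!s->way[w].valid)
            victim = (int)w;
    if (victim < 0 && !s->way[s->lru].locked)
        victim = (int)s->lru;
    if (victim < 0 && !s->way[1u - s->lru].locked)
        victim = (int)(1u - s->lru);
    if (victim < 0)
        return false;
    icache_line *l = &s->way[victim];
    l->valid = true;
    l->tag = addr_tag(pa);
    for (unsigned i = 0; i < 4; i++)
        l->word[i] = line[i];
    s->lru = 1u - (unsigned)victim;
    return true;
}
```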

Page 46 Data Cache Processing Flow

[Figure: data cache processing flow (860/855T): effective address bits 0-20 go to the MMU, bits 21-27 select one of 128 sets (set0-set127) in way 0 and way 1, and bits 28-31 select the byte; two 21-bit tag comparators produce hit0 and hit1, and a 2-to-1 bidirectional mux passes the 128-bit line to or from the line buffer/burst buffer; an LRU array tracks each set]

This example describes how the data cache operates. The data cache adds a dirty bit to each line (each way of each set), because the data cache can be written to as well as read from. Note that the dirty bit is set only for cache lines that map to pages where copy-back (write-back) operation is permitted.

The effective address contains four low-order address bits (bits 28-31) used to select not only a particular word in a line, but even individual bytes in a word when that is required.

Next, let’s discuss the different data cache operations. A read operation is the same as the instruction cache read operation. Since the data cache can be updated by store operations, there are two methods to handle store operations: write-back mode and write-through mode. As we learned earlier, the MMU controls which memory pages are cacheable and whether the data cache should operate in write-back mode or write-through mode in a particular page.

Page 47 Data Cache Processing Flow


Write-back mode, also called copy-back, updates only the necessary elements of the cache line without writing the transaction out to external memory. Typically, this is the preferred operating mode since it can significantly reduce memory system bandwidth requirements.

In this mode, a write operation with a hit is similar to the read operation. The cache line written to is changed to the modified-valid state, where both the dirty and the valid bits are set, and the operation is concluded without a bus transaction to the corresponding external memory location.

A write operation with a miss first causes a burst read operation from external memory. It then proceeds with the write operation as in the case where a hit occurs.

In write-through mode, write operations to the cache also update the respective locations in external memory. Although programs normally execute slower when the data cache is in this mode, it’s the only way to ensure that an external master can read the correct data when reading these locations. This is because the MPC8xx data cache cannot “snoop” bus transactions of alternate masters.

In this mode, a write operation with a hit updates both cache and external memory. The cache line remains in the unmodified-valid state, with D = 0 and V = 1. A write operation with a miss writes to external memory and does not affect cache.
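The store behavior in the two modes can be summarized in a toy model. To keep it short, this hypothetical sketch uses a single cache line holding one word and a 16-word external memory, and it counts bus transactions; real MPC8xx lines hold four words in a set-associative array. Writing back a dirty line before it is replaced is standard copy-back behavior that the text does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { COPY_BACK, WRITE_THROUGH } wmode;

typedef struct {
    bool     valid, dirty;     /* V and D bits */
    uint32_t tag, data;        /* toy line: one word, tagged by word address */
} dline;

typedef struct {
    dline    line;             /* a single cache line */
    uint32_t ext_mem[16];      /* toy external memory, indexed by word address */
    unsigned bus_reads, bus_writes;
} dsys;

static void store_word(dsys *s, uint32_t addr, uint32_t value, wmode mode)
{
    assert(addr < 16u);
    bool hit = s->line.valid && s->line.tag == addr;

    if (mode == WRITE_THROUGH) {
        s->ext_mem[addr] = value;          /* external memory is always updated */
        s->bus_writes++;
        if (hit)
            s->line.data = value;          /* D bit is not set by this store */
        return;                            /* miss: the cache is not affected */
    }

    if (!hit) {                            /* copy-back miss: allocate the line */
        if (s->line.valid && s->line.dirty) {
            s->ext_mem[s->line.tag] = s->line.data;   /* write back the dirty victim */
            s->bus_writes++;
        }
        s->line = (dline){ true, false, addr, s->ext_mem[addr] };
        s->bus_reads++;                    /* burst read from external memory */
    }
    s->line.data = value;                  /* then proceed as for a hit */
    s->line.dirty = true;                  /* modified-valid: D = 1, V = 1 */
}
```

In copy-back mode, repeated stores to the same line cost no bus transactions after the first miss, which is the bandwidth saving described above; in write-through mode, every store reaches external memory, so an external master always sees current data.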

Page 48 Cache Instructions

Cache Instruction                              Operation
dcbf   - Data Cache Block Flush                If modified, writes the line to memory, then invalidates the line (modified or not)
dcbst  - Data Cache Block Store                Writes the line to memory
dcbt   - Data Cache Block Touch                Loads the line from memory into cache
dcbtst - Data Cache Block Touch for Store      Loads the line from memory into cache for store
dcbz   - Data Cache Block Set to Zero          Zeroes the line in cache
dcbi   - Data Cache Block Invalidate           Invalidates the line (modified or not)
icbi   - Instruction Cache Block Invalidate    Invalidates the line

Let’s take a closer look at the cache control instructions described earlier in the tutorial. There are two basic ways to manipulate the caches to control when data is loaded into a cache or emptied from a cache. In most cases, the Cache Controller mechanism is adequate and no special software is required. However, there are times when you might want to perform a special function that can’t be handled by the Cache Controller. With the EPPC, this can be done using the cache control instructions or the EPPC special purpose cache control registers.

Cache control instructions require a memory address to specify the line to be accessed. The dcbi instruction is privileged because modified data may be lost. For the dcbf instruction, the dirty bit of the line determines whether it is written to memory before being invalidated. Note that in the EPPC, the dcbtst instruction operates differently than described in the PowerPC Virtual Environment Architecture: it behaves exactly like the dcbt instruction.

Page 49 MPC860 Cache Special Purpose Registers

D-cache   I-cache   Description
DC_CST    IC_CST    D/I-cache control and status register
DC_ADR    IC_ADR    D/I-cache address register
DC_DAT    IC_DAT    D/I-cache data port (read only)

The EPPC includes special purpose registers that can be used for cache operations. These registers control the I-cache and D-cache and are accessible only in supervisor mode. They can also be used by a debugger to aid in debugging operations.

Page 50 MPC860 Cache Register Operations

Operation                                 Comments
Cache Enable/Disable                      Permit/prohibit cache operation
Data Cache Block Lock                     Useful for fast and deterministic accesses
Instruction Cache Block Load and Lock     Useful for fast and deterministic accesses
Cache Block Unlock                        Locked lines cannot be flushed/invalidated
Cache Invalidate All                      Must be done after reset
Cache Unlock All                          Must be done after reset
Data Cache Flush Cache Line               Similar to dcbf but does not compare tag
Cache Read Tags                           Useful for testing and debugging
Cache Read Registers                      Useful for testing and debugging

The table describes the operations that can be applied to the EPPC caches using the special purpose registers. Programming these registers provides a flexible, dynamic way to optimize application performance.

Page 51 EPPC Exceptions

[Figure: exception paths A-D between the main program and exception service routines 1 and 2; at each exception the IP is saved in SRR0 and the MSR in SRR1]

The last topic we’ll discuss is the MPC860 Core exception handling. An exception is an event which causes a deviation from normal processing. Some examples of exceptions are: interrupts, resets, and bus errors. When an exception occurs, all previously issued instructions are allowed to complete. Then, the address of the next instruction to be executed is saved in SPR save and restore register zero, SRR0, and the current copy of the machine state register is saved in SRR1.

In this example, the lines in the diagram show the processing for these events:
- path A: any exception
- path B: a return from interrupt, or rfi
- path C: an exception that occurs during exception service routine 1
- path D: an rfi where you need to check that the RI bit in SRR1 is set to 1

If execution flow is from A to B, the exception is ordered, meaning that the program state is not lost.

If execution flow is A - C - D, and if C is caused by a machine check, non-maskable interrupt (NMI), or synchronous exception, then the exception may be unordered because the program state for path B in the diagram may be lost.

You can program the execution flow A - C - D - B for interrupt nesting, which is necessary to recover from unordered exceptions. Note that synchronous exceptions, which can cause unordered exceptions, are caused by an instruction, such as a system call or trap. Asynchronous exceptions, which are often recoverable, are caused by anything other than an instruction.

The device User Manual includes a chapter that describes the different exceptions and how they are ordered and prioritized by the EPPC.
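The save and restore sequence behind paths A-D can be modeled in C. In this hypothetical sketch, take_exception plays the role of the hardware (save IP and MSR in SRR0 and SRR1, then clear EE, IR, DR, and RI, a subset of the bits the hardware changes), while handler_save and handler_return stand for the prologue and epilogue of an exception service routine. Setting RI only after SRR0 and SRR1 are saved, and clearing it before they are reloaded, is what lets a nested exception be reported as recoverable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_EE 0x8000u    /* bit 16 */
#define MSR_IR 0x0020u    /* bit 26 */
#define MSR_DR 0x0010u    /* bit 27 */
#define MSR_RI 0x0002u    /* bit 30 */

typedef struct {
    uint32_t ip, msr, srr0, srr1;
    uint32_t saved[8][2];    /* hypothetical handler stack for SRR0/SRR1 */
    unsigned depth;
} cpu;

/* Hardware side: save state, clear EE, IR, DR, and RI, and jump to the vector. */
static void take_exception(cpu *c, uint32_t vector)
{
    c->srr0 = c->ip;
    c->srr1 = c->msr;
    c->msr &= ~(MSR_EE | MSR_IR | MSR_DR | MSR_RI);
    c->ip = vector;
}

/* Handler prologue: once SRR0/SRR1 are saved, a nested exception is recoverable. */
static void handler_save(cpu *c)
{
    assert(c->depth < 8u);
    c->saved[c->depth][0] = c->srr0;
    c->saved[c->depth][1] = c->srr1;
    c->depth++;
    c->msr |= MSR_RI;
}

/* Handler epilogue: clear RI before SRR0/SRR1 become live again, then rfi. */
static void handler_return(cpu *c)
{
    assert(c->depth > 0u);
    c->msr &= ~MSR_RI;
    c->depth--;
    c->srr0 = c->saved[c->depth][0];
    c->srr1 = c->saved[c->depth][1];
    c->ip = c->srr0;         /* rfi: IP <- SRR0 */
    c->msr = c->srr1;        /* rfi: MSR <- SRR1 */
}

/* A nested exception is recoverable only if RI was set when it was taken. */
static bool recoverable(const cpu *c)
{
    return (c->srr1 & MSR_RI) != 0u;
}
```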

Page 52 EPPC Exception Vector Table

VECTOR OFFSET (HEX)   EXCEPTION TYPE
00000                 RESERVED
00100                 SYSTEM RESET/NMI
00200                 MACHINE CHECK
00300                 DATA STORAGE
00400                 INSTRUCTION STORAGE
00500                 EXTERNAL INTERRUPT
00600                 ALIGNMENT
00700                 PROGRAM
00800                 FLOATING-POINT UNAVAILABLE
00900                 DECREMENTER
00A00                 RESERVED
00B00                 RESERVED
00C00                 SYSTEM CALL
00D00                 TRACE
00E00                 FLOATING-POINT ASSIST
01000                 IMPLEMENTATION DEPENDENT SOFTWARE EMULATION
01100                 IMPLEMENTATION DEPENDENT INSTRUCTION TLB MISS
01200                 IMPLEMENTATION DEPENDENT DATA TLB MISS
01300                 IMPLEMENTATION DEPENDENT INSTRUCTION TLB ERROR
01400                 IMPLEMENTATION DEPENDENT DATA TLB ERROR
01500 - 01BFF         RESERVED
01C00                 IMPLEMENTATION DEPENDENT DATA BREAKPOINT
01D00                 IMPLEMENTATION DEPENDENT INSTRUCTION BREAKPOINT
01E00                 IMPLEMENTATION DEPENDENT PERIPHERAL BREAKPOINT
01F00                 IMPLEMENTATION DEPENDENT NON MASKABLE DEVELOPMENT PORT

The exception vector table defines where the program control finds the necessary instructions when an exception occurs. The base address of the table is controlled by the IP bit (bit 25) in the machine state register. If this bit is set to 0, the table is located at address $0. If the bit is set to 1, the table is located at address $FFF00000. The state of the Interrupt Prefix at reset can be controlled by the IP bit in the hard reset configuration word. The initialization of the hard reset configuration word is described in the System Interface Unit (SIU) tutorial.

The exception vector entries are not pointers to exception service routines. Instead, the processor begins executing instructions at the start of each exception vector. Each vector has room for 64 words (instructions), or 256 bytes, before the next vector starts. If more space is needed, the code must branch outside of the table.

The first vector in the table, at offset zero, is reserved. You can load this vector with special values to help track programming errors such as dereferencing null pointers. Notice that the non-maskable interrupt shares the same vector as a reset and will execute the same initialization software unless the reset status register is queried to determine the program flow.

Page 53 Machine State Register (MSR)

MSR - MACHINE STATE REGISTER

Bit(s)   Field
0-12     Reserved
13       POW
14       -
15       ILE
16       EE
17       PR
18       FP
19       ME
20       -
21       SE
22       BE
23       -
24       -
25       IP
26       IR
27       DR
28       Reserved
29       Reserved
30       RI
31       LE
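PowerPC documentation numbers register bits from the most significant end, so bit 0 is the MSB and bit 31 the LSB. The sketch below converts the bit positions in the MSR diagram into C masks. The macro names (PPC_BIT, MSR_EE, and so on) and the msr_test helper are illustrative assumptions, not taken from a specific header.

```c
#include <stdint.h>

/* PowerPC bit numbering: bit 0 is the most significant bit of a
   32-bit register, bit 31 the least significant. */
#define PPC_BIT(n) (1u << (31 - (n)))

/* MSR fields at the positions shown in the MSR diagram. */
#define MSR_POW PPC_BIT(13)
#define MSR_ILE PPC_BIT(15)
#define MSR_EE  PPC_BIT(16)
#define MSR_PR  PPC_BIT(17)
#define MSR_FP  PPC_BIT(18)
#define MSR_ME  PPC_BIT(19)
#define MSR_SE  PPC_BIT(21)
#define MSR_BE  PPC_BIT(22)
#define MSR_IP  PPC_BIT(25)
#define MSR_IR  PPC_BIT(26)
#define MSR_DR  PPC_BIT(27)
#define MSR_RI  PPC_BIT(30)
#define MSR_LE  PPC_BIT(31)

/* Hypothetical helper: nonzero if any bit in mask is set in msr. */
static inline int msr_test(uint32_t msr, uint32_t mask)
{
    return (msr & mask) != 0;
}
```

With these masks, EE is 0x8000, IP is 0x40, and an MSR value of 0x1032 has ME, IR, DR, and RI set while EE and PR are clear.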

Next, let’s look at the fields within the machine state register (MSR).

Bit 13 of the MSR, the power management enable bit (POW), selects whether the MPC860 is in normal power mode or reduced power mode.

The exception little endian bit, ILE, determines if the EPPC should be in little endian mode when taking an exception. The LE bit determines if the EPPC should be in little endian mode during regular operation.

The PR bit determines whether the EPPC is in user mode (PR = 1) or supervisor mode (PR = 0). The FP bit would enable the floating-point unit if the MPC860 were capable of floating-point operations. The ME bit enables machine check exceptions; when ME is cleared, a machine check is treated like a double bus fault. Therefore, if a machine check occurs out of reset, before software has set ME, the machine goes to the checkstop, or halt, state.

Setting the SE bit enables single-step trace, and setting the BE bit enables branch trace.

The IP bit selects the interrupt prefix. The IR and DR bits enable the instruction and data MMUs when set. These bits get cleared by all exceptions, thus disabling the MMU functions.

The EE bit enables external interrupts and decrementer exceptions, and hardware clears it whenever an exception is taken. In your software, you can re-enable these exceptions by setting this bit after the machine state has been saved. If the RI bit is not set when an exception occurs, the interrupt is not recoverable. In this case, the operating system should halt or reinitialize the system. The RI bit is cleared automatically by hardware after the original MSR is copied to SRR1. After the exception handler saves SRR0 and SRR1 to memory, additional exceptions may be recoverable if the software can set the RI bit before another exception occurs.
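The ordering rules for RI and EE can be sketched as a software model of a handler prologue. This is not real hardware access: on the MPC860 these steps would use mfspr/mtmsr instructions, whereas here the registers are plain variables. The exc_frame structure and the exception_recoverable and handler_prologue functions are hypothetical names chosen for illustration.

```c
#include <stdint.h>

#define PPC_BIT(n) (1u << (31 - (n)))
#define MSR_EE PPC_BIT(16)
#define MSR_RI PPC_BIT(30)

/* Hypothetical saved-state frame that a handler writes to memory. */
struct exc_frame {
    uint32_t srr0, srr1;
};

/* SRR1 holds the pre-exception MSR, so its RI bit tells the handler
   whether the interrupted context can be resumed. */
static int exception_recoverable(uint32_t srr1)
{
    return (srr1 & MSR_RI) != 0;
}

/* Returns 1 and updates *msr if the interrupted context is recoverable;
   returns 0 if the operating system should halt or reinitialize. */
static int handler_prologue(uint32_t *msr, uint32_t srr0, uint32_t srr1,
                            struct exc_frame *frame)
{
    if (!exception_recoverable(srr1))
        return 0;
    frame->srr0 = srr0;   /* 1. save SRR0 and SRR1 to memory first */
    frame->srr1 = srr1;
    *msr |= MSR_RI;       /* 2. a nested exception is now recoverable */
    *msr |= MSR_EE;       /* 3. only then re-enable external interrupts */
    return 1;
}
```

The design point is the order: RI is set only after SRR0 and SRR1 are safely in memory, and EE is set last so that an external interrupt cannot overwrite SRR0 and SRR1 before they are saved.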

Page 54 EPPC Exception Processing Flow

Start

Copy MSR to SRR1

Load SRR0 with the next instruction address or the address of the instruction that caused the exception.

Change MSR (typically to all zeroes) including MSR.PR, MSR.EE, and MSR.RI

Point to the exception vector and begin the service routine

End

This flow diagram describes EPPC exception processing.

The first step in the exception process causes the contents of the MSR to be copied to SRR1.

Next, SRR0 is loaded with the address of the next instruction or the address of the instruction that caused the exception.

The third step causes changes in the MSR, typically setting all the bits to zero, including the PR, EE, and RI fields.

After the first three steps are completed, the EPPC can move on to the exception service routine. Note that for most exceptions, the machine state is saved only in SRR0 and SRR1, but some exceptions also save information in the data storage interrupt status register (DSISR) and the data address register (DAR). The exception service routine must copy SRR0 and SRR1 to other locations to ensure that subsequent exceptions are recoverable.

After the exception service routine has completed, the rfi (return from interrupt) instruction loads the value in SRR1 into the MSR, and instruction execution resumes at the address held in SRR0.
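The exception processing flow and the matching rfi can be modeled in a few lines of C. This is a simplified sketch under stated assumptions: cpu_model, take_exception, and rfi are hypothetical names, and the MSR change is simplified to keep only ME and IP while clearing everything else (including PR, EE, RI, IR, and DR). The exact per-exception MSR rules are in the device User Manual.

```c
#include <stdint.h>

#define PPC_BIT(n) (1u << (31 - (n)))
#define MSR_EE PPC_BIT(16)
#define MSR_PR PPC_BIT(17)
#define MSR_ME PPC_BIT(19)
#define MSR_IP PPC_BIT(25)
#define MSR_IR PPC_BIT(26)
#define MSR_DR PPC_BIT(27)
#define MSR_RI PPC_BIT(30)

/* Hypothetical software model of the registers involved. */
struct cpu_model {
    uint32_t msr, srr0, srr1, pc;
};

/* The four steps of the flow diagram. */
static void take_exception(struct cpu_model *c, uint32_t return_addr,
                           uint32_t vector_offset)
{
    c->srr1 = c->msr;                 /* 1. copy MSR to SRR1 */
    c->srr0 = return_addr;            /* 2. load SRR0 */
    c->msr &= (MSR_ME | MSR_IP);      /* 3. simplified MSR change */
    c->pc = ((c->msr & MSR_IP) ? 0xFFF00000u : 0u)
            + vector_offset;          /* 4. jump to the vector */
}

/* rfi: restore the MSR from SRR1 and resume at SRR0. */
static void rfi(struct cpu_model *c)
{
    c->msr = c->srr1;
    c->pc = c->srr0;
}
```

For example, an external interrupt taken in user mode with translation enabled lands at offset 0500 in supervisor mode with the MMUs off, and rfi returns to the saved address with the original MSR.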

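Page 55 Machine Check Exception Processing Flow

Start

TEA* asserts or a parity error occurs

If MSR[ME] = 1: if debug mode is enabled and DER[MCIE] = 1, the PowerPC enters debug mode. Otherwise, a machine check exception occurs; if SRR1[RI] = 1, the interrupt is recoverable, and if SRR1[RI] = 0, it is not.

If MSR[ME] = 0: if debug mode is enabled and DER[CHSTPE] = 1, the PowerPC enters debug mode. Otherwise, the core enters the checkstop state with instruction processing disabled; if PLPRCR[CSR] = 1, a reset occurs, and if PLPRCR[CSR] = 0, the core remains halted.

End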

This flow diagram describes how the EPPC processes a machine check exception. The machine check exception is important to consider, because processing this type of exception can potentially interact with other resources in the 8xx device.

There are three possible outcomes from a machine check exception.

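If debug mode is disabled (or DER[MCIE] is cleared) and MSR[ME] is set, the core takes the machine check exception and proceeds to the machine check exception service routine. SRR1[RI] then indicates whether the interrupt is recoverable.

If debug mode is disabled and MSR[ME] is cleared, the machine enters the checkstop, or halt, state. A reset is the only way to exit the halted state; setting PLPRCR[CSR] makes this reset occur automatically.

You can also enter debug mode, if it is enabled and the corresponding DER bit (MCIE or CHSTPE) is set. In this state, the processor responds to commands over the debug interface.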
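The decision logic of the machine check flow diagram can be written as a small C function. This is a sketch of the diagram only, not driver code: the mc_state structure, its field names, and machine_check_outcome are hypothetical, and each boolean simply stands for the register bit or condition named in the diagram.

```c
#include <stdbool.h>

/* End states of the machine check flow diagram. */
enum mc_outcome {
    MC_DEBUG_MODE,        /* PowerPC enters debug mode */
    MC_RECOVERABLE,       /* exception taken, SRR1[RI] = 1 */
    MC_NOT_RECOVERABLE,   /* exception taken, SRR1[RI] = 0 */
    MC_CHECKSTOP_RESET,   /* checkstop, PLPRCR[CSR] = 1 forces a reset */
    MC_CHECKSTOP_HALT     /* checkstop, PLPRCR[CSR] = 0: core halted */
};

/* Hypothetical snapshot of the bits the diagram tests. */
struct mc_state {
    bool msr_me, debug_enabled, der_mcie, der_chstpe, plprcr_csr, srr1_ri;
};

static enum mc_outcome machine_check_outcome(const struct mc_state *s)
{
    if (s->msr_me) {
        /* Machine checks enabled: debug mode or the exception handler. */
        if (s->debug_enabled && s->der_mcie)
            return MC_DEBUG_MODE;
        return s->srr1_ri ? MC_RECOVERABLE : MC_NOT_RECOVERABLE;
    }
    /* Machine checks disabled: debug mode or checkstop. */
    if (s->debug_enabled && s->der_chstpe)
        return MC_DEBUG_MODE;
    return s->plprcr_csr ? MC_CHECKSTOP_RESET : MC_CHECKSTOP_HALT;
}
```

For example, with MSR[ME] cleared, debug mode disabled, and PLPRCR[CSR] cleared, the function returns MC_CHECKSTOP_HALT, matching the halted state described above.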

Page 56 Question

In the 823e instruction cache, what is the tag size in bits? Click on your choice.

a) 20 b) 21 c) 22 d) 23

Let’s complete this tutorial with a few questions to check your understanding of PowerPC caches and exception handling.

In the 823e instruction cache, what is the tag size, in bits?

Answer: In the 823e instruction cache, the tag size is 20 bits.

Page 57 Question

Which instruction can be used to invalidate a cache line? Click on your choice.

a) dcbf b) dcbi c) icbi d) all of the above

Which instruction can be used to invalidate a cache line?

Answer: All of the above. The dcbi and icbi instructions invalidate a data or instruction cache line, and dcbf writes a modified data cache line back to memory before invalidating it.

Page 58 Question

Which exception type shares the same exception vector as a reset? Click on your choice.

a) external interrupt b) system call c) non-maskable interrupt d) trace

Which exception type shares the same interrupt vector as a reset? Click on your choice.

Answer: The non-maskable interrupt shares the same interrupt vector as a reset.

Page 59 PowerPC Core Conclusion

- PowerPC Components and Programming Model
- PowerPC Instruction Set and Branching Logic
- PowerPC Memory Management and Addressing Capabilities
- PowerPC Instruction and Data Caching
- PowerPC Exception Handling

This completes our training on the PowerPC Core of the MPC860. In this tutorial, we examined the main features and functions of the PowerPC Core. We took a detailed look at the PowerPC instruction set, including conditional branching logic, and discussed PowerPC memory management and addressing capabilities. We also discussed PowerPC cache operations and exception handling.

Page 60