At the core of the user experience.®

The MIPS32® 34K™ Processor: Ultimate Design Flexibility for Embedded Applications

Ryan Kinter, Lead Microarchitect Hotchips 18, August 22, 2006

© 2006 MIPS Technologies, Inc. All rights reserved. At the core of the user experience.®

Agenda

Î 34K core multi-threaded architecture

Î Applications for multi-threading

Î Microarchitectural features

Î 34K core specs

© 2006 MIPS Technologies, Inc. All rights reserved. 2 At the core of the user experience.®

MIPS32® 34K™ Processor Core Family

Î Based on the proven 24K®/24KE™ core pipeline

Î First implementation of MIPS’ multithreading architecture

Î Includes DSP extensions

Î First released in October 2005

Î Working silicon running 5 threads in May 2006

Î Supports configuration options of the 24KE core plus MT specific options Î Customizable policy manager & gating storage modules Î 1 or 2 virtual processing elements Î 1 to 9 thread contexts

© 2006 MIPS Technologies, Inc. All rights reserved. 3 At the core of the user experience.®

Thread Contexts

Register File ASID Î The hardware associated with a Program Counter Instn Buffer software thread TC1 Î A TC executes instructions

Î Includes user state ASID Program Counter Instn Buffer Î 34K core can have 1 to 9 TCs TCn Î Minimal area per TC

Î Can dedicate TCs to specific, critical tasks Î Great for embedded applications

© 2006 MIPS Technologies, Inc. All rights reserved. 4 At the core of the user experience.®

Virtual Processing Elements Common Hardware (Fetch, Decode, Execute, Caches)

Î Contains system MMU MMU state Cache Control Cache Control Î With a TC, looks like a complete Exception Exception MIPS32 processor to software TC TC TC TC TC TC Î 34K core can have 1 or 2 VPEs

VPE1 VPE2

OS1 OS2

Î Each VPE can run an independent operating system Î 2 OSes without virtualization software

© 2006 MIPS Technologies, Inc. All rights reserved. 5 At the core of the user experience.®

Multi-threading Benefits

Î Fills short-term pipeline bubbles

Î Runs other threads during memory accesses Î Improves performance by increasing IPC

Î Removes overhead from switching threads Î Improves performance by reducing instruction count Î Reduces response time

© 2006 MIPS Technologies, Inc. All rights reserved. 6 At the core of the user experience.®

Application: 1-1 mapping of processes

Î In an RTOS with few processes, each one can have its own thread context. Î Never need to do a context switch! Î Reduces the number of instructions that need to be executed

Î For example, processing parallel data streams Î 1 control TC + up to 8 data TCs

Î If there is an unknown number of processes or more than the number of TCs, hybridize: Î Dedicate hardware for critical tasks Î Switch processes in and out on remaining TCs.

© 2006 MIPS Technologies, Inc. All rights reserved. 7 At the core of the user experience.®

Example System

RTOS handling Multi-tasking OS VoIP on other running residential Application Application VPE gateway Residential VoIP Gateway applications on multiple TCs within OS OS Dedicated TCs one VPE ThreadX for each function TC TC TC TC TC TC AppApplications App App Enc Dec E.C. VPE 0 VPE 1 QoS Policy manager Common Hardware sets relative priority of the tasks

© 2006 MIPS Technologies, Inc. All rights reserved. 8 At the core of the user experience.®

Application: Interrupt Service Thread

Thread A Thread B Î Allocate TCs to service system interrupts Time Startup Î Ready to run immediately Wait_on_IO Î “Zero” interrupt latency Application

Application I/O Signal

Application Read_from_IO

Application Service_IO

Application Wait_on_IO

© 2006 MIPS Technologies, Inc. All rights reserved. 9 At the core of the user experience.®

Interrupt service threads

Î YIELD instruction specifies that the thread should wait until a particular system event occurs

Î Following instructions are prefetched and stored in an instruction buffer

Î When system event occurs, the thread can resume execution.

Î TC has a dedicated register file, so there is no need for a context save or restore

© 2006 MIPS Technologies, Inc. All rights reserved. 10 At the core of the user experience.®

Interrupt service threads

Without Interrupt Service Thread Take Restore Exception Save Registers Registers 5 3 20 X 20 Time Fetch Handler Process Interrupt

With Interrupt Service Thread

5 3 20 X 20 Time

© 2006 MIPS Technologies, Inc. All rights reserved. 11 At the core of the user experience.®

Interrupt Service Threads

Î Thread scheduling policies can set priority of service thread versus other threads

Î Could have multiple interrupt service threads for various interrupt sources

Î Can be generalized to other types of tasks Î Periodic timer triggering profiling task Î I/O coherence request

© 2006 MIPS Technologies, Inc. All rights reserved. 12 At the core of the user experience.®

How to support 9 TCs?

Î Wide range of configurations requires novel solutions to have an attractive product with larger configurations as well as the more typical small ones Î Cannot have a large area or speed penalty for adding TCs Î Cannot sacrifice performance

Î Need to be selective about when to replicate resources per TC

© 2006 MIPS Technologies, Inc. All rights reserved. 13 At the core of the user experience.®

Example: Micro TLB

Î 24K core had 4 entry ITLB IF IS VA Î Need more entries for 9 TCs Entry0 24K Entry1 Tag Î Larger array would impact Entry2 P Compare speed Entry3 A

Î 1 private entry IPF IF IS VA per TC, plus 3 Shared shared TC0 Shared Tag 34K Compare Shared P Private A TC8

© 2006 MIPS Technologies, Inc. All rights reserved. 14 At the core of the user experience.®

Example: Micro TLB

Î Lookup is still across 4 entries, no speed penalty

Î When a single TC is running, it will have to 4 entries Î Same as it would have on a 24K core

Î Dedicated entry simplifies control logic

IPF IF IS VA Shared TC0 Shared Tag 34K Compare Shared P Private A TC8

© 2006 MIPS Technologies, Inc. All rights reserved. 15 At the core of the user experience.®

Example: Instruction Buffers

Î 24K core features 8 entry instruction buffer Î Decouples the fetch unit from the execution unit Î Fetch unit executes ahead to try to keep the buffer full

Î Replicated per TC on 34K core

Î Instruction buffer still decouples the fetch unit from the execution unit

Î Also isolates the fetch unit from the policy manager

© 2006 MIPS Technologies, Inc. All rights reserved. 16 At the core of the user experience.®

Instruction Buffers

Î Replicating moves the thread selection point further down the pipe Î Core can respond faster Î Reduces the number of instructions killed

Long stall condition detected A1 RF AG EX MS ER B1 RF AG EX EX A2 RF AG AG EX MS ER B2 RF RF Only 1 additional instn killed A3 RF AG EX MS ER A4 RF AG EX MS ER A5 RF AG EX MS ER Start executing exclusively from running thread

© 2006 MIPS Technologies, Inc. All rights reserved. 17 At the core of the user experience.®

Instruction Buffers

Î Instruction buffer also acts as a skid buffer Î Dispatched instructions are held in buffer Î Stalling instructions can be removed from the pipeline ƒ Move buffer pointer to restore A4 A3 A2 TC A

A3 B2 A2 B1 A1

TC B B3 B2 B1

© 2006 MIPS Technologies, Inc. All rights reserved. 18 At the core of the user experience.®

Load Data Queue

Î Buffer in Load/Store Unit LDQ 0-3 Î Holds load miss info Reg # Align Data Î 4 entries on 24K core

Î Option to increase to 9 entries on 34K core LDQ 0-8 Î Enough for 1 per TC for TCID Reg # Align Data gating storage

Î Shared resource

© 2006 MIPS Technologies, Inc. All rights reserved. 19 At the core of the user experience.®

34K™ Core Specs

Î Speed: 500MHz

Î TSMC 90G, nominal Vt Î Worst-case PVT

Î Area: 5.1mm2 with 2 VPEs, 4 TCs (Including 32KB/32KB caches) 2 Î 0.1 - 0.2mm for a TC 2 Î 0.2 - 0.3mm for a VPE

Î Power: 0.59 mW/MHz @1.0V Î Running four instances of Dhrystone

Î Licensable now Î First released in October 2005 Î Working silicon running 5 threads

© 2006 MIPS Technologies, Inc. All rights reserved. 20 At the core of the user experience.®

Summary

Î 34K core features multi-threading Î Wide range of applications

Î Dedicated hardware threads Î Reduce instruction count Î Improve response time

Î Low thread cost

© 2006 MIPS Technologies, Inc. All rights reserved. 21