Challenges to EDA System from the View Point of and Technology Drivers

Mitsuo Saito

Chief Fellow / VP Corporation Semiconductor Company

Copyright © 2009 Toshiba Corporation, All rights reserved.

Contents

Semiconductor Technology and Trend
Our Processor Design Experiences
- Early Struggles for Processor Design
- Meet with Playstation
  • Development of Graphics and for Playstation 2
- Towards Broadband Engine (Cell)
  • Development of Cell Processor
  • Targets of Cell Processor
- Towards SpursEngine
  • What is SpursEngine
Makimoto's Wave and What we have done?
- Historical Analysis of Development
Today's Situation and Challenge to EDA system
- What is necessary now?
- What will happen next?
Summary

The Semiconductor Roadmap Is Still Working: Moore's Law

One billion transistors per chip in 2010 (250M gates)

[Figure: number of transistors per chip (log scale, 1.E+03 to 1.E+10) versus year, 1980 to 2010. From SEMATECH data.]

First Microprocessor Was Born in 1971

4004
Package: DIP16P
Address: 12 bit
Data bus: 4 bit
Clock: 741 kHz
Process: P-MOS 10um
Transistors: 2,300
Date: 11/15/1971

From : Wikipedia
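As a rough cross-check of the roadmap claim, the two data points quoted on these slides (2,300 transistors for the 4004 in 1971, one billion transistors per chip in 2010) can be turned into an average doubling period. The function name below is illustrative and not from the original talk; this is a back-of-the-envelope sketch only.

```python
import math

def doubling_period_years(n_start, year_start, n_end, year_end):
    """Average number of years per doubling between two data points."""
    doublings = math.log2(n_end / n_start)
    return (year_end - year_start) / doublings

# Data points taken from the slides: the 4004 (2,300 transistors, 1971)
# and the roadmap figure of one billion transistors per chip in 2010.
period = doubling_period_years(2300, 1971, 1e9, 2010)
print(f"{period:.2f} years per doubling")
```

The result comes out at roughly two years per doubling, which is in line with the usual statement of Moore's Law.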

Processor Clock Rate per Year

[Figure: clock rate (MHz, log scale, 10 to 10000) versus year of chip or system shipment, 1991 to 2007. Series: high-end processors (21064, 21164, 21264), PC processors (486DX2, Pentium, Pentium Pro, Pentium II, Pentium III, Athlon, Pentium 4, Core2 Ext) and game consoles, plotted against the 1999 SIA local clock and chip-across clock projections (Hi-Perf.). Annotated eras: RISC Era ("Simpler the Better"), Standard Processor Era (Out of Order Execution), Multi core Era (Freq. ≠ Perf.). Game console launch dates: PlayStation 1994/12/3, Nintendo64 1996/6/18, Dreamcast 1998/11/24, PlayStation2 2000/3/4, GameCube 2001/9/14, XBox 2001/11/15, XBox360 2005/12/10, PLAYSTATION3 2006/11/11.]

Waves to Standardization and Customization

[Figure: Makimoto's Wave along a 1957 to 2017 time axis, swinging between "Mass production (Standardization): Factory efficiency" (about 10 million/month) and "Various & small production (Customization): Customer satisfaction" (about 100 designs/month, co-design of dedicated H/W and S/W). Labels along the wave include: transistor ('47) and diode; IC invention (Kilby patent); TV, calculator, watch; 1K DRAM and 4-bit MPU (Intel); memory and MPU; MOS GA, ASIC, CAD revolution; RISC, SPARC (SUN), MPU (MIPS); x86 and chipset as the standard PC, with memory in Asia and MPU in the US; system on a chip and SW+HW engines (Emotion Engine, Cell/B.E.); software centric design. Forces noted: severe competition, surplus supply, price slump, short life cycle, design crisis, too much integration, power consumption.]

Source: Makimoto's Wave

AI (1986)

Maybe the last paper-and-pencil-based design!
The processor was board based.
About 100 sets were sold, with reasonable revenue.

AI Processor Chip (1988)

•Silicon Compiler “Genesil” was Used

Performance 12 MIPS @Dhrystone

Registers 32bit × 32

Buses 32bit × 2

Bus Cycle 1 Clock min

Clock Freq. 16MHz Max

Process 1.2um 2 Layer Metal CMOS

Tr.Cnt 113K

Package 299pin PGA

First TRON Processor TX1 (1988)

•Proprietary HDL (H2DL) was used

Performance 5 MIPS

Registers 32bit × 16

Buses 32bit × 2

Bus Cycle 2 Clock min

Clock Freq. 25MHz Max

Process 1.0um 2 Layer Metal CMOS

Package 155-pin PGA

Incomplete GL Processor (1990)

•Proprietary HDL (H2DL) was used

Freq. 65MHz

Inst. 4 Way SuperScalar

Dhry. 100MIPS

Linpack 66MFlops(Single) 35MFlops(Double)

Power 3.3V/5~6W

Process 0.5μmCMOS

Package 448pin PGA

Tr.cnt 2.5M

RISC Processor (1993)

• was used

Jointly developed with Inc.

R8000 Based Board

Designed by Silicon Graphics Inc.

Embedded RISC Processor TX39 (1995)

•Verilog was used

•First Embedded RISC

•32bit MIPS

•40MHz/43MIPS

•500mW

•4KB I / 1KB D cache

•4K page MMU (32 entry)

“GSP” in 1988

・Gouraud Shading: 10M Pixel/sec, 30K Polygons/sec
・Flat Shading: 160M Pixel/sec
・Lines: 1M Lines/sec
・Clock: 20MHz
・Technology: 1.2μm CMOS
・130K Transistors
・Chip size: 11.5×11.3mm
・Package: 144pin Flat Package

Molecular model image by GSP

5160 Polygons 6.4 frames/Sec 8 bit/pixel 16bit Z Buffer

First Generation Playstation

Source : Wikipedia

Playstation Shipment History

From: http://www.scei.co.jp/corporate/data/

[Figure: Playstation shipments by fiscal quarter, 96/3 to 08/3. One axis shows quarterly shipments (x10K units per quarter, 0 to 1000), the other cumulative totals (x10K units, 0 to 16000). Series: PlayStation, PlayStation 2 and Playstation 3 per quarter, plus the cumulative total for each.]

Playstation was very successful, though

The original business model was to make money not from hardware but from software licenses.
In reality, the hardware also made money, mainly because semiconductor costs came down.
The 3D graphics earned a very good reputation.
But its performance was not yet good enough: the emotion of the characters couldn't be presented!
Considering the game console life cycle, hardware performance should stay competitive for at least a few years.
An epoch-making graphics chip was the key for the next generation.
Embedded DRAM technology based graphics was employed.
The main processor had to have enough performance to match the very high performance graphics.
A special processor had to be developed.

“Graphics Synthesizer” (GS)

16.6mm×16.8mm
32Mbit Embedded DRAM
0.25μm
47.7M transistors
Designed by , SCE and Toshiba
Fabricated by SONY

Rendering Performance Trend

[Figure: rendering performance (polygons/sec, log scale, 100,000 to 100,000,000) versus year, 1993 to 1999, for game consoles, personal computers and graphics workstations. Labeled game consoles: PlayStation, N64(?), Dream Cast, PS2.]

How Was Emotion Engine Born?

Toshiba developed and provided the graphics chip for the first generation Playstation. However, the processor chip was provided by a US company.
Some engineers in the Toshiba semiconductor division began to consider proposing not only the graphics but also the main processor for the next generation Playstation. This processor was later called “Emotion Engine”.
They started to approach SCEI a few months after the first generation Playstation was launched (12/3/1994).
SCEI was interested in the proposed Geometry Engine architecture, and a joint technical investigation started.
After the main concept of Emotion Engine was settled, the Toshiba team started to develop the processor portion at the Silicon Valley office in 1996, before an agreement was signed.

Emotion Engine Photo

15.02mm x 15.04mm
Frequency: 300MHz
Transistors: 13.5M
Power: 18 Watts
Design Rule: 0.25um
Gate Length: 0.18um
Designed by SCE and Toshiba
Fabricated by Toshiba

Emotion Engine Block Diagram

[Figure: Emotion Engine block diagram. Block labels: CPU (processor core, COP1 FPU, COP2, I$, D$, SP-RAM); VPU0 (VU0, VIF0, instruction and data memories); VPU1 (VU1, EFU, VIF1, instruction and data memories); GIF to the Rendering Engine (GS LSI); 128-bit system bus; 10-channel DMAC; IPU; memory interface to external memory; I/O interface to peripherals. Memory sizes shown on the diagram: 16KB and 4KB.]

Emotion Engine Peak Performance

[Figure: bar chart (scale 0 to 7) comparing integer performance (GIPS), bus bandwidth (Gbyte/sec) and floating point performance (GFLOPS) for Pentium2 (400MHz), Pentium3 (600MHz) and EE (300MHz).]

Emotion Engine Development History

Spring 1997: Joint Development Agreement was signed (schedule was defined)
Spring 1998: Design completion was delayed because of many CPU bugs and insufficient speed
- Had to report a six-month delay from the original schedule
Spring 1998: Reorganized and enhanced the development team (82 engineers from Toshiba Japan), and reset the target date and specification
- #1 Design Completion 7/1998
- Chip size = 17 x 14.1 mm2
Aug. 1998: EE#1 design completion (8-month delay)
- Relaxation of the target frequency (300MHz => 200MHz)
- Agreed to relax the voltage and temperature constraints
Oct. 1998: Shipped the first sample, and SCE started the system evaluation
Dec. 31 1998: Shipped revised samples usable for software development
Feb. 1999: Presented at the conference (ISSCC), fair reputation
Mar. 1999: Next generation Playstation technical overview was presented at PlayStation Meeting 1999, with a big impact all over the world!
EE#2 design completed as scheduled and achieved the 300MHz target
EE#3 shrunken version (0.18um) development started

Timing Closure

With the 0.25/0.18 um design rule, controlling interconnect delay has become a significant task in LSI design projects.

The key is “quick timing analysis and feedback to RTL and floor plan”. The designer is able to do “what-if” analysis.
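A minimal sketch of what a pre-layout "what-if" timing check could look like, assuming a simple linear wire-load estimate. The path names, delays, and the `WIRE_PS_PER_MM` coefficient are hypothetical and are not taken from the Emotion Engine design; real wire-load models are more elaborate.

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    gate_delay_ps: float   # sum of cell delays along the path
    wire_length_mm: float  # estimated from the floor plan

def slack_ps(path, clock_period_ps, wire_delay_ps_per_mm):
    """Pre-layout slack: positive means the path meets timing."""
    arrival = path.gate_delay_ps + path.wire_length_mm * wire_delay_ps_per_mm
    return clock_period_ps - arrival

def worst_slack(paths, clock_period_ps, wire_delay_ps_per_mm):
    """Slack of the most critical path."""
    return min(slack_ps(p, clock_period_ps, wire_delay_ps_per_mm) for p in paths)

# Hypothetical numbers for illustration only.
CLOCK_PS = 1e6 / 300      # 300 MHz target, about 3333 ps
WIRE_PS_PER_MM = 100.0    # assumed linear wire-load estimate

baseline = [Path("fpu_to_regfile", 2600, 9.0), Path("vu0_to_cache", 2900, 3.0)]
# What-if: move the FPU closer to the register file in the floor plan.
what_if = [Path("fpu_to_regfile", 2600, 5.0), Path("vu0_to_cache", 2900, 3.0)]

print(worst_slack(baseline, CLOCK_PS, WIRE_PS_PER_MM))  # negative: violation
print(worst_slack(what_if, CLOCK_PS, WIRE_PS_PER_MM))   # positive: timing met
```

Because the estimate is cheap to recompute, a designer can try a floor plan change and see its effect on the worst slack without running place and route.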

Timing Design Flow

[Figure: timing design flow. RTL design and floor plan feed a quick-turnaround pre-layout timing analysis (15.4 hours), made accurate by wire load estimation. Place & route follows with well-controlled repeater insertion, IPO and ECO, then post-layout timing analysis. "Timing met?" No: iterate. Yes: automated clock buffer tuning (clock skew < 116ps), then tape out.]

Timing Closure -Results-

[Figure: maximum timing violation value and the number of timing error paths versus design iterations over time, from design start to tape out. Annotations: "Applied partitioning and wire load estimation" and "reconsidered partition size".]

Clock Skew Management

Clock skew must be minimized for
- the maximum clock rate,
- evading a race condition.
Automated clock design, e.g. CTS, is established in standard-cell-rich ASIC. In contrast, half of EE is occupied by custom blocks (CBs). Manual tuning is time-consuming.
=> We developed an automated clock tuning method.
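The slides do not describe the tuning algorithm itself. The following is only a toy sketch of the general idea, assuming each clock sink can be delayed in fixed buffer steps; the sink names, arrival times, and the 10ps step are hypothetical. The 116ps target is the skew bound quoted in the timing design flow.

```python
def clock_skew_ps(arrivals):
    """Skew is the spread between the latest and earliest clock arrival."""
    return max(arrivals.values()) - min(arrivals.values())

def tune_clock_buffers(arrivals, target_skew_ps, step_ps=10.0):
    """Return the delay to add per sink so that skew falls under the target.

    Greedy approach: each early sink is delayed in fixed steps until it is
    within the target of the latest sink. Assumes step_ps <= target_skew_ps.
    """
    latest = max(arrivals.values())
    added = {}
    for sink, t in arrivals.items():
        steps = 0
        while latest - (t + steps * step_ps) >= target_skew_ps:
            steps += 1
        added[sink] = steps * step_ps
    return added

# Hypothetical clock arrival times (ps) at custom blocks and standard cells.
arrivals = {"cb_fpu": 420.0, "cb_vu0": 510.0, "std_cells": 640.0, "cb_cache": 600.0}
added = tune_clock_buffers(arrivals, target_skew_ps=116.0)
tuned = {sink: t + added[sink] for sink, t in arrivals.items()}

print(clock_skew_ps(arrivals), "->", clock_skew_ps(tuned))
```

A production flow would also account for buffer sizing, load, and process variation; this sketch only shows the skew bookkeeping.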

Playstation2

From Wikipedia

Beyond Playstation2

Jan. 1999: Investigation of the processor for the next-generation Playstation started internally in Toshiba
- At first, investigated the possibility of developing it in Silicon Valley
- Investigated IBM as a partner; they were not very interested at first
From the experience of Playstation2
- We didn't have to use an existing processor architecture, because Playstation can have enough volume to be a standard by itself
- From the DVD capability experience, it will be not only a game console but the center of home entertainment
- Broadband networks will be popular and have a very big impact
Basic concept of the next generation processor
- Not just for game consoles, but should be useful in various applications
- So, it must be scalable
- Real-time computation will be the key for gaming and networking
- New should be employed for the new world
Development Team
- After Kutaragi-san talked directly to IBM, SCEI, IBM, and Toshiba agreed to set up a joint development team
- We agreed to develop a next generation general purpose processor, with the three companies joining the project and sharing the resources and the rights evenly

Cell BE Development Project

Established the main campus at IBM Austin
- For a quick startup
- To secure excellent engineers
Multiple sites were mandatory
- Overcame management difficulty with frequent teleconferences
- Network infrastructure was very effective for information sharing
A joint development team was established
- Each part consisted of a mixture of engineers from the three companies
- For good communication and smooth collaboration
- Also necessary for technology transfer to each company
After the target was clarified, development was relatively smooth
- Culture friction was resolved quickly
- Flexible resource allocation was possible

Cell BE Die Photo

Cell Block Diagram

[Figure: Cell block diagram. One PPE (PPU with L1, plus L2 cache) and eight SPEs (each with SPU, LS and MFC) are connected by the Element Interconnect Bus (EIB). An XDR I/F (XIO, 2 channels) connects to high-speed XDR* DRAM, and 2 FlexIO* interfaces connect to Cell, GPU, Super Companion Chip, etc.]

PowerPC Processor Element (PPE): CPU core based on the 64-bit PowerPC architecture, suitable for general programming
Synergistic Processor Element (SPE): simple SIMD processor core, suitable for media processing
Element Interconnect Bus (EIB): a broadband, high-speed internal ring bus which connects all the elements inside Cell
FlexIO: broadband I/O bus
MFC: Memory Flow Controller
LS: Local Storage
* Trademarks of Rambus, Inc.

Cell BE Major Development Topic

Apr. 2000: Architecture investigation started among the three companies
- Big culture gap, and a lot of friction
- Various candidates were investigated, and the conclusion was almost the same as the present architecture
Mar. 2001: Public announcement of joint development
- Started the project by setting up the campus in Austin
Sep. 2001: Toshiba started to investigate the function of the peripheral chip
Apr. 2004: Cell BE first sample was completed and worked!
Feb. 2005: First technical presentation at the ISSCC conference
- Big impact all over the world
Aug. 2005: Toshiba presented the peripheral chip and the SPE assignment software idea
Oct. 2005: Toshiba demonstrated at the CEATEC exhibition
- One of the demonstrations with the biggest impact
- All major TV stations mentioned it in their news programs
Apr. 2006: Toshiba started to provide the Cell BE development set
Sep. 2006: IBM won a Cell BE based supercomputer project for a national research lab
Nov. 2006: SCEI launched Playstation3 with Cell BE

Cell BE (SPU)

[Figure: SPU development schedule, 2001 to 2003, running from STI Design Center open through High Level Design and Debug/Timing Tune to RIT. Milestones: HLD (High Level Design) exit (macro I/F fix); basic instruction VHDL running at unit level; basic instruction VHDL running at core; VHDL complete (functional); VHDL complete (pervasive); Gold (function); Gold (pervasive). Timing slack scale: -50ps, -75ps, -100ps, -200ps.]

Each Reference Model for SPU Development Phase

[Figure: a Common Reference Model underlies the Instruction Simulator, the Pipeline Simulator, the test case generator and the expected value generator. SPU development phases: Micro Architecture Definition, ISA Definition, RTL Implementation, Physical Implementation, running alongside compiler development and software development. Uses of the reference model: validation of ISA completeness, performance analysis, RTL verification, and applications that use the reference model.]

SPU Verification Strategy

Coverage Based Verification
- Test items are written as coverage events
- Executes random simulation test cases until all the coverage events are hit
  • Random instruction sequences
  • Random external transactions
- The checkers check SPU logic correctness during simulation
- Coverage files are generated as simulation results

[Figure: random instruction sequences and random external transactions (from an External Transaction Driver with driver parameters) stimulate the SPU (Datapath, Control, Register File, Local Storage, Channel I/F, DMA I/F). Checkers monitor the design and a coverage file is produced.]

Synergistic Processor Element Feature

New architecture for data processing
- Media and floating point computation
RISC type instruction set
- High level language oriented
SIMD based instruction scheme
- 128 bit width parallel execution (e.g. 4x32bit)
Large 128bit width 128 entry
256KB Local store, not Cache
Rich performance monitoring capability

[Figure: Cell block diagram with PPE (PPU, L2 cache), eight SPEs (SPU, LS, MFC), EIB, XIO to XDR DRAM at 25.6GB/s, and FlexIO at 76.8GB/s to Cell/GPU, Super Companion Chip, etc.]

PPE: Power Processor Element
SPE: Synergistic Processor Element
LS: Local Storage
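To illustrate the "128 bit width parallel execution (e.g. 4x32bit)" point, here is a small emulation in plain Python. It models a 128-bit register as four packed 32-bit floats and applies one operation to all four lanes at once. The function names are made up for this sketch and are not SPU instruction names.

```python
import struct

def pack_lanes(values):
    """Pack four 32-bit floats into one 16-byte (128-bit) register image."""
    assert len(values) == 4
    return struct.pack("<4f", *values)

def unpack_lanes(register):
    """Unpack a 128-bit register image into its four float lanes."""
    return struct.unpack("<4f", register)

def simd_madd(a, b, c):
    """Lane-wise a*b + c on 128-bit register images (four float lanes)."""
    lanes = zip(unpack_lanes(a), unpack_lanes(b), unpack_lanes(c))
    return pack_lanes([x * y + z for x, y, z in lanes])

a = pack_lanes([1.0, 2.0, 3.0, 4.0])
b = pack_lanes([0.5, 0.5, 0.5, 0.5])
c = pack_lanes([1.0, 1.0, 1.0, 1.0])

result = simd_madd(a, b, c)
print(len(result) * 8, "bits:", unpack_lanes(result))
```

One "instruction" here does four multiply-adds, which is the source of the media and floating point throughput the slide refers to.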

Cell BE Positioning as a Processor

[Figure: map of integer performance versus media performance. PC processors: media performance is enhanced by adding media instructions but is still not enough, while integer performance is too much ("I can't feel performance difference even higher frequency"). Media processors: video quality and size are not enough yet. Cell BE is placed toward high media performance.]

Cell BE Proposes New Computer Direction

New computer direction proposed by Cell BE
- Expose hardware features as they are to the user
  • Programs must be written with the HW structure and memory size in mind
  • Knowledge of the HW is needed
- In return, the user knows what is going on in the computer
  • Easier performance tuning
  • Especially for real-time systems
- Overwhelming cost performance is achieved
  • A very small SPE sometimes provides twice the performance of a PC processor
Maybe the first new direction since RISC was proposed
- More than 20 years ago

Real-time Face Tracking

A camera captures the face of the person facing the “mirror”, and the rendered image is projected in the “mirror” in real time.

[Figure: pipeline Camera Capture → Cell Processing → Display CG Image, with Virtual Make-up and Virtual Hair Style examples.]

Hand Gesture User Interface

Operate products using hand gestures (hand position, motion)
- Deviceless remote control.
- Applicable to a variety of daily life objects.

Three hand shapes can be recognized
GUI control by recognized image

[Figure: hand-gesture-controlled media player equipment. Examples: operate A/C, open and close curtains, operate TV (on-screen text: "Program 2/5").]

Trend of HW→SW

[Figure: required performance/cost versus year, 1946 to 2010, divided into a "HW or No Solution" region and a "SW Solution" region. Applications shown, from earliest to latest: military (ENIAC (programmable), EDVAC); general purpose main frame (1965, money); process control on mini computers; personal use (calculator, 1980); signal processing (voice) on DSP; two dimensional graphics and multi-window on WS and PC; TV/DVD/PC image processing (MPEG); three dimensional graphics on Game/PC; HDTV/HD DVD HD image processing on CELL. Diagonal bands are labeled General Purpose, Server Computation and For Personal Application.]

48 Streams (MPEG2) Decode Demo

MPEG2 (about 4Mbps), 720 x 480
Read out 48 streams from HDD and decode
Shrink to 1/3 and arrange into one picture for display on the TV
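A quick arithmetic sketch of what the demo implies. The stream count, bitrate, and frame size come from the slide. Two things are assumptions: that "shrink to 1/3" applies to each dimension, and that the tiles are laid out as 8 columns by 6 rows (the slide does not state the grid).

```python
STREAMS = 48
BITRATE_MBPS = 4            # "about 4 Mbps" per stream
SRC_W, SRC_H = 720, 480
SHRINK = 3                  # assumed: 1/3 in each dimension

# Aggregate read rate from the HDD.
aggregate_mbps = STREAMS * BITRATE_MBPS
aggregate_mbytes_per_s = aggregate_mbps / 8

# Size of one shrunken tile.
tile_w, tile_h = SRC_W // SHRINK, SRC_H // SHRINK

# Assumed layout: 8 columns x 6 rows. The slide does not state the grid.
COLS, ROWS = 8, 6
mosaic_w, mosaic_h = COLS * tile_w, ROWS * tile_h

print(f"{aggregate_mbps} Mbps total ({aggregate_mbytes_per_s} MB/s)")
print(f"tile {tile_w}x{tile_h}, mosaic {mosaic_w}x{mosaic_h}")
```

Under these assumptions the 48 tiles fit inside a 1920x1080 frame, and the disk only needs to sustain a few tens of megabytes per second, so the decode work is the demanding part.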

Hard to Find Customers in this Field

So, we focus on video applications for PC!

SpursEngine™ Block Diagram

[Figure: four SPEs at 1.5GHz (advanced applications by the quad GHz core) and Full-HD video codec hardware IPs (MPEG-2 encoder, H.264 encoder, MPEG-2 decoder, H.264 decoder) on the EIB (Element Interconnect Bus), together with the SCP (controller), DMAC, memory controller and PCI-Express host CPU I/F.]

PCI-Express: max 4 lanes, In: 1GB/s, Out: 1GB/s
Video CODECs
  MPEG-2: MP@HL support
  H.264: High Profile@Level 4.1 support
  max. 1920x1080 60i/24p, 1280x720 60p
XDR DRAM: 6.4 GB/s (1 pcs XDR) ~ 12.8 GB/s (2 pcs XDR)

XDR™ DRAM is a trademark of Rambus Inc. in the United States and other countries.

What is the Next Target for PC? The Answer is HD.

Emergence of HD Contents
- The High Definition (HD) era has been quickly emerging. For instance, video content from digital terrestrial broadcasting, digital video cameras, and optical disks is all HD ready.
HD contents require much more processing power than SD contents.
- HD needs 6 times higher bandwidth than Standard Definition (SD).
Conventional PC architecture of CPU and GPU can only decode HD video in real-time.
- The CPU, even though it keeps getting faster, will not be capable of encoding HD video in real time in the near future.
Users demand an innovative HD solution!

[Figure: HD use cases around the PC, fed by digital/analog terrestrial broadcasting: importing, playback (decode), burning, sharing, streaming, scaling, authoring, transcoding and editing, with a "Break Through" needed beyond what the conventional PC handles.]
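The "6 times" figure is consistent with a simple pixel count, assuming SD means 720 x 480 (the frame size in the MPEG2 demo) and HD means 1920 x 1080 (the maximum resolution listed for the SpursEngine codecs). This is a sketch of the arithmetic, not a statement of how the original figure was derived.

```python
def pixels(width, height):
    """Number of pixels in one frame."""
    return width * height

sd_pixels = pixels(720, 480)     # SD frame size used in the MPEG2 demo
hd_pixels = pixels(1920, 1080)   # Full-HD frame size
hd_to_sd = hd_pixels / sd_pixels

print(f"HD/SD pixel ratio: {hd_to_sd}")
```

At the same frame rate and bits per pixel, raw pixel bandwidth scales by the same factor.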

Demand: Video Indexing and Searching

Future demands: not only transcoding, but also indexing and searching features, so that users can easily find the video they want to watch.

Video sources: PC, live broadcasting (terrestrial digital/analog), home videos, YouTube archive on the Internet (http://www.youtube.com/)
Short movies shot with a DSC will be much more popular and casual in the near future.
Most videos are un-tagged raw data, so it is difficult for users to find the video they want.
Locally: 1000+ files, 10000+ hours (>1TB disk)
On the Internet: infinite number and hours of video

Further Advantages: New Applications by SpursEngine

Super realtime Transcoding: Transcoding at faster than real-time
Super Resolution: Picture resolution up-scaling for HDTV
Indexing: Video categorizing during HDD storing, DVD burning
Gesture I/F: Control various devices in the living room by hand gestures
Face Tracking: Realtime 3D face tracking for communication tools
Interactive Gaming: New type of realtime game with Gesture I/F and Face Tracking
Editing: Video editing of consumer generated content

[Figure: SpursEngine in the PC, connected to DVC, terrestrial digital/analog broadcasting, HDD, DVD, the Internet (http://www.youtube.com/), mobile devices and HDTV, providing super resolution, super realtime transcoding, indexing, streaming, gesture I/F, gaming, editing and face tracking.]

Toshiba PC: Qosmio with SpursEngine

Toshiba released Qosmio empowered by SpursEngine in July 2008.

Reference URLs: http://explore.toshiba.com/laptops/qosmio/G50 http://explore.toshiba.com/innovation-lab/quad-core-processor

Waves to Standardization and Customization

[Figure: Makimoto's Wave along a 1957 to 2017 time axis, swinging between "Mass production (Standardization): Factory efficiency" (about 10 million/month) and "Various & small production (Customization): Customer satisfaction" (about 100 designs/month, co-design of dedicated H/W and S/W). Labels along the wave include: transistor ('47) and diode; IC invention (Kilby patent); TV, calculator, watch; 1K DRAM and 4-bit MPU (Intel); memory and MPU; MOS GA, ASIC, CAD revolution; RISC, SPARC (SUN), R3000 (MIPS); x86 and chipset as the standard PC, with memory in Asia and MPU in the US; EDA revolution; system on a chip and SW+HW engines (Emotion Engine, Cell/B.E.); software centric design. Forces noted: severe competition, surplus supply, price slump, short life cycle, design crisis, new product development race, too much integration, power consumption.]

Source: Makimoto's Wave

Starting ASIC, RISC Processor Era (From '87)

Starting HDL Design by Semiconductor Company's Proprietary CAD
AI Workstation (1985 to 1987)
- Last attempt at designing a processor using standard parts
3D Graphics Processor (from 1986)
- First chip using proprietary HDL CAD H2DL, and
- Revolutionary design efficiency; a small design team was allowed
AI Processor Chip (1987 to 1988)
- Designed using the Silicon Compiler “Genesil”
- Defeated by severe RISC competition as the AI boom declined
TRON Processor (1986 to 1989)
- Disappeared in the CISC/RISC debates
GL Processor (1989 to 1991)
- Again a relatively large team, and couldn't reach the final stage
- Prologue to R8000

Processor Development Became a Big Project Again

Proprietary EDA was expelled by third party EDA vendors
R8000 for Server (1991-1993)
- Successor of GL Processor Team
- Joint development with Silicon Graphics Inc
- Verilog HDL and a logic synthesis tool were used
- Random verification and C reference model
- Start of Standard EDA Era
 for Server and Workstation (1994-1997)
- Very early out of order processor
- One year project delay
- Big competition era
TX39, 49 for Embedded Application (1993-1998)
- Embedded RISC processors emerged
- Good power performance
- Started to have revenue, though...

Standard Processor Era Since 1997

For leading edge development, proprietary EDA was necessary in combination with the standard flow
X86 architecture became standard (RISC was defeated)
- Less advantage from superscalar architecture
- Out of order execution left no advantage for RISC
Emotion Engine (1996-1999)
- Multi-core architecture was adopted
- STA, Formal Verification, Extraction
Cell Broadband Engine (2000-2005)
- Almost entirely IBM proprietary EDA was used
- DFT, Assertion, DFM, Power Estimation
- Was good enough for the fastest processor
- Too early for replacing HW solution

Processor Clock Rate per Year

[Figure: clock rate (MHz, log scale, 10 to 10000) versus year of chip or system shipment, 1991 to 2007. Series: high-end processors (21064, 21164, 21264, R4000), x86 processors (486DX2, Pentium, Pentium Pro, Pentium II, Pentium III, Athlon, Pentium 4, Core2 Ext) and game consoles (PlayStation 1994/12/3 through PLAYSTATION3 2006/11/11), plotted against the 1999 SIA local clock and chip-across clock projections (Hi-Perf.). Annotated eras: RISC Era ("Simpler the Better"), Standard Processor Era (Out of Order Execution), Multi core Era (Freq. ≠ Perf.).]

What is going on to the MPU and SOC Now?

Just in transition from the Standard MPU Era to the Custom SOC Era
- Change of technology direction: a big chance that comes once in ten years
- Out of order execution changed the direction ten years ago
  • RISC→CISC
- From frequency competition to multi-core
  • Players increase as the technology matures
- What is SOC as a technology driver?
  • Combination of processor and fixed function units
  • SpursEngine will be one of the early examples
- Big change is necessary for EDA too
  • Backend design will change too: no custom blocks
  • Model based design with coverage based random verification
  • High level design including software is the next key
Manufacturing technology is also changing
- DFM is needed
- A lot of new materials, for low power
- New transistor structure

SpursEngine™ Physical Implementation

Process:
- 65nm bulk CMOS with 7 levels of copper layers
Die Size:
- 9.98mm x 10.31mm, 102.89mm2
Fmax: 1.5GHz
Transistor Counts:
- 239.1M
  • Logic: 134.3M
  • SRAM: 104.8M
TDP
- < 20W (depending on application sets)
Package:
- FC BGA 624

[Figure: die floor plan showing four SPEs with LS, EIB, SCP, shared memory, memory controller, H.264 and MPEG2 encoders and decoders, PCIe and PCIe PHY, PLL, eFuse and XIO*.]

* XIO™ is a trademark of Rambus Inc. in the United States and other countries.

Key Design Feature: Synthesizable Over GHz SPE

Synthesizable design
- Shorter design TAT
  • (estimation: 1/4 compared with custom migration)
- Technology independent
  • To develop different target SoCs
Key features
- floor plan optimization
- hybrid standard cell height
- RTL abstraction
- register retiming
- semi-custom clock tree
- wire width control

Floor Plan Optimization

Original floor plan (Cell/B.E.™, 65nm): 2.09mm x 5.30mm
New floor plan (SpursEngine™, 65nm): 2.07mm x 3.93mm

Square and compact: more cell placement flexibility
Area:
- 27% smaller
Total wire length:
- 28% shorter

[Figure: the two floor plans, showing L/S, data path, control logic and MFC.]
L/S: Local Storage
MFC: Memory Flow Controller
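The quoted area saving can be checked directly from the two floor plan dimensions on the slide. The variable names below are illustrative only.

```python
# Floor plan dimensions in mm, as quoted on the slide (both at 65nm).
cell_w, cell_h = 2.09, 5.30      # original floor plan (Cell/B.E.)
spurs_w, spurs_h = 2.07, 3.93    # new floor plan (SpursEngine)

cell_area = cell_w * cell_h
spurs_area = spurs_w * spurs_h
reduction = 1 - spurs_area / cell_area

print(f"{cell_area:.2f} mm2 -> {spurs_area:.2f} mm2, {reduction:.1%} smaller")
```

The computed reduction rounds to the 27% stated on the slide. The wire length figure cannot be checked this way, since it depends on the placement and routing.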

Introduction of C Model for DTV-SoC Design Flow

[Figure: the C model of each module, the verification environment and the software are combined in a C level functional simulator, which takes an input stream and produces voice output and image output.]

Effect of C Model: Shorten TAT

[Figure: bug count over time, without and with the C model. Milestones on the time axis: development start, start using C model, SoC design decision, 1st sample accomplished, evaluation, debug finish. With the C model, bugs detectable on the C model are found earlier (acceleration) and debug finishes sooner, shortening TAT. Some bugs remain undetectable without real silicon.]

Coverage-Based Random Verification

The limitations of Directed Test
- Many good engineers are necessary to write enough test cases
- Test cases must be rewritten after design changes
- Nobody can write test cases for unknown situations
Random Test is Mandatory, though
- No hand-written assembler-level test cases
- No way to cover all the test cases
- Randomizing many parameters is the most important thing
- Simulation cycle count doesn't mean verification quality
→ Coverage-based verification

[Figure: two panels, "w/o Coverage" and "w/ Coverage"]
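The idea that coverage, not simulation cycle count, measures verification quality can be sketched in a few lines. The example below is a toy illustration, not the flow used in the project. The operation set, the operand classes, and all function names are hypothetical. Stimulus is randomized until every coverage bin (operation crossed with the class of each operand) has been hit, and the loop stops on coverage closure rather than after a fixed number of tests.

```python
import random
from itertools import product

OPS = ("add", "sub", "and", "or")    # hypothetical operations of a toy 8-bit unit
CLASSES = ("zero", "mid", "max")     # hypothetical operand classes


def classify(value):
    """Map an 8-bit operand to its coverage class."""
    if value == 0:
        return "zero"
    if value == 0xFF:
        return "max"
    return "mid"


def random_operand(rng):
    """Bias toward corner values so that rare bins are actually reached."""
    return rng.choice((0, 0xFF, rng.randrange(1, 0xFF)))


def run_until_covered(seed=0, max_tests=10_000):
    """Generate random tests until all coverage bins are hit.

    Returns (tests_run, bins_hit, bins_total).
    """
    rng = random.Random(seed)
    goal = set(product(OPS, CLASSES, CLASSES))
    hit = set()
    tests = 0
    while hit != goal and tests < max_tests:
        op = rng.choice(OPS)
        a, b = random_operand(rng), random_operand(rng)
        # A real flow would drive the design under test here.
        hit.add((op, classify(a), classify(b)))
        tests += 1
    return tests, len(hit), len(goal)


if __name__ == "__main__":
    tests, hit, total = run_until_covered()
    print(f"{hit}/{total} bins covered after {tests} random tests")
```

With uniformly random 8-bit operands instead of the biased generator, the "zero" and "max" bins would be hit only rarely, so a long simulation could still leave corner cases untested. That is the sense in which cycle count alone says little about quality.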

Random Simulator for SoC

[Figure: two verification flows side by side. Processor Verification: a Program runs on both the RTL Simulator and a Simulator; the two Log/Trace outputs are compared (=?) to give the Verification Results. Stream SoC Verification: a Program and a Testbench feed a Simulator and, through a Signal Driver, the RTL Simulator; the two Log/Trace outputs are compared (=?) to give the Verification Results.]
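The "=?" comparison in the figure, where the same program runs on an RTL simulator and on a reference simulator and the two traces are compared, can be sketched as follows. This is a minimal sketch under stated assumptions: a toy 8-bit accumulator machine stands in for the processor, `buggy_step` stands in for an RTL model with an injected bug, and every name here is hypothetical.

```python
import random

MASK = 0xFF  # assumed 8-bit accumulator for this toy machine


def reference_step(acc, op, val):
    """Reference (golden) model of one instruction."""
    if op == "add":
        return (acc + val) & MASK
    if op == "xor":
        return acc ^ val
    raise ValueError(f"unknown op: {op}")


def buggy_step(acc, op, val):
    """Stand-in for the RTL model, with a hypothetical bug:
    'add' saturates at 0xFF instead of wrapping around."""
    if op == "add":
        return min(acc + val, MASK)
    return reference_step(acc, op, val)


def run(program, step):
    """Execute a program and return its trace of (op, val, acc) records."""
    acc, trace = 0, []
    for op, val in program:
        acc = step(acc, op, val)
        trace.append((op, val, acc))
    return trace


def first_mismatch(ref_trace, dut_trace):
    """Index of the first differing trace record, or None if the traces match."""
    for i, (ref, dut) in enumerate(zip(ref_trace, dut_trace)):
        if ref != dut:
            return i
    if len(ref_trace) != len(dut_trace):
        return min(len(ref_trace), len(dut_trace))
    return None


def random_program(seed, length=200):
    """Random instruction stream; no hand-written test cases are needed."""
    rng = random.Random(seed)
    return [(rng.choice(("add", "xor")), rng.randrange(256)) for _ in range(length)]


if __name__ == "__main__":
    program = random_program(seed=0)
    mismatch = first_mismatch(run(program, reference_step), run(program, buggy_step))
    print("traces match" if mismatch is None else f"first mismatch at step {mismatch}")
```

Reporting the first mismatching step, rather than only pass or fail, points the debugger at the instruction where the two models diverge.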

EDA for Technology Driver SoC

Now, it is not considered as an EDA

Software Development Environment
- Software environment for hardware engine control sequences
- Automatic random vector generation
- System level development

High Level Hardware Design Environment
- Highly abstracted EDA for defining hardware architecture
- Coverage-based random simulation
- Higher level description
- System level development

Fusion of Two Environments is the Key Point!

Software Development Cost is Dominant Even Today

[Pie chart: cost breakdown]
Software Cost: 59.7%
Electrical HW Cost: 20.7%
Mechanical HW Cost: 11.9%
Etc.: 7.7%

Source: Embedded Software Investigation by METI (2007)

HW Variation is Very Effective for Software Environment

[Pie charts: 1Q08 PC shipment share by vendor]
1Q08 PC Shipments in US: Dell 31%, HP 25%, Acer 9%, Apple 7%, Toshiba 6%, Others 22%
1Q08 PC Shipments WW: HP 18%, Dell 15%, Acer 10%, Lenovo 7%, Toshiba 4%, Others 46%

Apple is supporting a proprietary OS with only 7% US market share

Source: Gartner (April 2008)

Towards Software Centric Era (After 2017?)

Software Development Cost will be much more than Today
- It will be impossible to support many platforms
HW Design Cost will also Increase a Lot
- Design methodology won’t keep up with the increase in design complexity
- Verification will be the critical issue
- Single purpose LSI will not pay off any more
- More expensive prototyping
  • Complicated mask process
Enough Performance with Software Solution
- Less cost and power for the same function
  • It will be possible to replace dedicated HW with software, as in software radio
- Protocols will be more complex and more varied
  • MPEG2 lasted for 10 years, but what comes next is confusing: H264, DIVX, VC1
  • MPEG4 has more than 20 derivatives
  • Voice codecs are innumerable (AAC3, MP3, WMA, ...)
- Softness will have bigger advantage
New Revolution will be Necessary for After Software Centric Era

Software solution of HW makes it possible to…

Freedom from standardization

[Figure labels: Program, Contents]

Quick service startup with new technology becomes possible
- By delivering contents and decoder simultaneously
Proprietary Contents delivery
- Better compression technology than standard
- Hi-level Contents protection
- Apply various charging systems
- Efficient usage of Network Infrastructure
Communication with proprietary protocol
Any kind of contents are available
Proprietary cipher system

Network Transparent Model Makes…

Heavy functions are achievable through collaboration of multiple machines on the network
The same program is applicable to machines of various performance levels, from Cell phone to Server, while keeping real-time-ness

[Figure: a Cell Program shared across collaborating machines. Labels: Collaboration of High-end Game Console; Digital TV and HDD Recorder; Cell Phone and Server.]

Scalability of Cell BE

[Figure: Cell BE configurations scaled across platforms, from multiple CELL PEs with SPEs down to a single SPE. Platforms shown: Cell Server, Engineering WS, DTV, Home Server, DVR, Car Platform, Mobile Phone, PDA.]

Towards Cell World

By taking advantage of overwhelming performance, Cell BE is aiming at the new software standard world

[Figure: many Cell-equipped machines in the Home (including a Home Server with HDD and DVD), at a Company, and at a Service Provider, connected through the Broadband Internet into a Ubiquitous Network. Labels: SW for machine collaboration and Man Machine Interface; Service Collaboration; Machine Collaboration.]

Summary

Semiconductor Roadmap is Straight Forward, But application is not
- Moore’s Law is forever!?
- Technology Driver is changing
- Makimoto’s Wave is still surviving
Our Microprocessor Development History was Also Following the Wave
- Design Methodology is Dramatically Effective for this Wave
- Requirement for EDA for Technology Driver was Changing
  • Description, Timing, Backend, Verification
- Positioning of Emotion Engine and Cell was Standard Processors
Today is Transition from Standard Processor to SoC
- No More Frequency Increase
- Need Purpose Oriented SoC’s
- SpursEngine is One of the Typical Examples
- Verification Methodology for Processor began to be applied for designing SoC’s
- Needs High Level and Software Oriented Design Environment
Next Wave will be Software Centric
- Will take some more time for new revolution after software era
Take Advantage of Today’s Transition

Crisis = 危機 (Danger, Chance)
Transition = 変転 (Change, Revolution)

Cell Everywhere!!

Thank you
