Mpsoc with Multi Configurable Processors and Design Environment

Mpsoc with Multi Configurable Processors and Design Environment

MPSoC with multi Configurable Processors and Design Environment Jack, Kazutoshi Wakabayashi EDA R&D center, Central Res. Labs. NEC corp. and chief architect, CWB business promotion, NEC System Technology, Ltd. System IP Core Res. Labs, NEC Corp, Visiting Professor, JAIST URL:www.cyberworkbench.com Email: [email protected] June 28 (Th) 2008 Agenda One feature of NEC’s ASSP (DAV,Mobile) is wide usage of C-based synthesis and verification. This impacts ASSP architecture and design flow these five years. (We applied Behavior synthesis even for CPUs) Keywords: 1. Multiple Configurable Processors (custom CPU + custom DSP) 2. Dynamic Reconfigurable Processors (flexible Hardware and controller) 3. C-based Synthesis and Verification 4. Reverse Trend : SW to HW HW solution for RT operation, power efficiency K. Wakabayashi © NEC Corporation 2007 2 Global trend: Roll of SW(CPU) is increasing ROMROM IFIF SDRAMSDRAM IFIF MPEG2MPEG2 A/VA/V GraphicsGraphics DecoderDecoder TimerTimer UARTUART TSTS DEMUXDEMUX SmartCardSmartCard VideoVideo EncoderEncoder CPUCPU DescramblerDescrambler HighSpeedPortHighSpeedPort VideoDACVideoDAC For Low Power Multi Custom Processors SW: just controlling HW blocks uPD61030 (1998) CPU SW: DSP MPEG2MPEG2 StreamStream MPEG2MPEG2 MIPSMIPS DisplayDisplay VideoVideo function Enc.Enc. Proc.Proc. Dec.Dec. DSPDSP Proc.Proc. EncoderEncoder DSP DSP Custom CPU,DSP 1998 2000 2002 2004 2007 K. Wakabayashi © NEC Corporation 2007 3 Global trend: MPSoC: Software-oriented system V850E General CPU: for embedded SW : a few MPSoC Custom CPU,DSP configurable Proc. : ~20 MIPS: MIPS: 4KE 4KE V850E 14 12 EMMA2RL 12 10 EMMA2R V_Bus Custom 8 9 DSP 6 EMMA2 . 7 4 SIF EMMA1 2 Custom 3 Custom 0 CPU DSP 1998年11999 1999年 2000 2000年 2001 2001年 2001 2002年 2003 2003年 2004 2004年 2005 2005年 Custom CPU EMMArchitecture Roadmap K. Wakabayashi © NEC Corporation 2007 4 Custom CPUs exist usually inside HW module # of “custom CPU” is 20 or more, but they are not Main CPUs. Main CPU is just a few… (not so-called MPSoC) (though, custom CPU communicates with Main CPU) Main General CPU CPU OS (RTOS) HI/F OUTPUT HI/F OUTPUT HI/F OUTPUT HI/F OUTPUT INPUT INPUT INPUT INPUT Local ローカルCustom CPU Local Local Mem メモリ Mem. Mem CPU CPU CPU CPU アクセラ Accele- DMAC accelerator DMAC HW DMAC Accelerator DMAC レータ rator module K. Wakabayashi © NEC Corporation 2007 5 Our ASSP architecture A few General CPUs + Multi Custom Processors + HW modules (+ DRP ) General General CPU CPU MIPS custom General CPU DRP: DSP Dynamic (every 1ns) reconfigurable (multi context) custom DRP FPGA or Programable V850 DSP . array MEM Custom Custom CPU CPU Custom Custom CPU CPU Custom CPU, DRP, HW modules are based on C-synthesis. So, they can be replace into another. e.g. C source for DRP can be synthesized into pure HW!! CPU tasks are implemented in DRP (really happens often)!! K. Wakabayashi © NEC Corporation 2007 6 “All-in-C” : All Modules with C-based Synthesis FLASH DMAC HOSTHOST CPUCPU DMAC ROM ARM,V850ARM,V850等等 (( HOST)HOST) CPU BUS HIFHIF GPIOGPIO DSPDSP SRAM SPX,TISPX,TI等等 DSP BUS BRIDGEBRIDGE (HOST- (HOST- SPRAM DSP)DSP) DMACDMAC (( APPLIAPPLI )) APPLI BUS SDRAM BRIDGEBRIDGE ROTATORROTATOR RESIZERRESIZER LCDCLCDC BRIDGEBRIDGE (( DSP-DSP- (( HOST-HOST- APPLIAPPLI )) APPLIAPPLI )) Control Datapath Dominated circuit K. Wakabayashi © NEC Corporation 2007 7 1. Multi Custom Processors Types of CPUs on our ASSP RTOS, C developing environment ARM (Debugger, Performance analysis, etc) MIPS General CPU Middlewares, V850 C Open for “SW engineer” or ASSP user General CPUs Instr. Set Semi- CPUs for + Special Inst. Set Custom -Tel com, CPU (Environment: same as above) -Digital AV Assembler C-synthesis (video, audio, No OS, Assembler lang. graphic, etc ) Custom CPU Full Custom High Performance -Encryption CPU / DSP Less Memory size etc. “HW engineer” or ASSP supplier “screw&nail” processor K. Wakabayashi © NEC Corporation 2007 8 Merits of Custom CPU/DSP 1) HW control (sequencer)->SW control more flexible, debuggable after chip fab. 2) Special Instruction Set : higher Performance Low Power and low clock frequencies, memory save Requirements for our favorite custom CPU/DSP - Real time function : No Cache, avoid bus collision - HW-like Performance for special procedures - Smaller size of Memory (Instruction & Data) - Bit handing, Branch with bit - read/write of I/O reg. - Easy to design/ Modify ( few weeks for design) K. Wakabayashi © NEC Corporation 2007 9 Types of Additional Instruction Set zALU extension :Easy, but very limited Additional IS CPU ●Co-Processor Co-processor core Pros: Easy even with RTL based Design Cons: no resource sharing among co-processors and core-CPU slow data transfer CPU-core and Co-processor(e.g. memory mapped IO) ●Complex Instruction into CPU core Pro: sharing registers, FUs with core CPU and Extended Instructions Direct read of multi public register -> higher performance CPU Cons: CPU core has to be changed core (difficult for RTL design, timing closure ) Additonal (CPU core should be described in C, C-synthesis!) IS “C-base behavior synthesis” supports all types K. Wakabayashi © NEC Corporation 2007 10 Tool Flow for Custom CPUs C behavior of Custom CPU Source Code Debugging C (BDL) Behevioral Synthesizer Cyber Cycle Accurate Models For other HW modules RTL C++ + General CPU models Verilog, VHDL Cycle Accurate++ C++ Sim.Model HW-SW co-sim FPGA emu. (ISS) K. Wakabayashi © NEC Corporation 2007 11 Configurable Processors Generation RTL-based vs Behavioral-synthesis-based Logic Synthesis based Behavioral Synthesis based Adding Instructions : RTL Co-processors Adding Instructions: Melted into Core Behavioral C Processor A B C D A B core Processor core Smaller Beh.Syn. Lower power C D Special IS: Limited Special IS: less limited Base Processor is not flexible Base Processor is flexible K. Wakabayashi © NEC Corporation 2007 12 EffectsEffects ofof ResourceResource SharingSharing Base I.S. : 81 Adding I.S. : 24 Base Processor : 33.9K Gate For Stream Processing After Adding IS: 42.6K Gate e.g.CRC, check-sum, only 8.7K(25%) increase start-code check... Kind Before After Necessary Bit width Diff. Effects of FU sharing among adding adding FUs in new IS >> 32 1 3 2 8 >> 33 1 1 0 0 Base IS’s and Adding IS’s + 3 1 1 0 0 + 12 2 2 0 0 + 14 0 1 1 4 + 16 4 5 1 3 + 32 3 3 0 6 <= 14 4 4 0 0 <= 32 1 1 0 0 Adding IS’s include - 3 1 1 0 0 - 5 0 1 1 2 Six Add Operations(32bit), - 32 2 2 0 0 << 32 1 1 0 0 But No FU increases << 33 1 1 0 0 * 8 1 1 0 0 > 16 0 2 2 2 > 32 1 1 0 0 Adding extra Instractions : >= 32 1 1 0 0 < 16 0 2 2 2 Not big area impact like co-processors < 32 1 1 0 0 K. Wakabayashi © NEC Corporation 2007 13 Summary for our custom CPU/DSP design We are “C-maniac” all modules are C-synthesis: this changes ASSP platform 1) CPU is described in behavioral C, and RTL is synthesized with our behavioral synthesizer (Cyber) 2) Additional Instruction set is also described in behavioral C 1)CPU basic Instractions and 2)additional Instructions share resources. 3) ISS of the CPU is also generated with Cyber, used for entire SoC simulation including HW modules is constructed 4) C compiler or C-like structured assembler ( semi-custom CPU case, RTOS and development env. can be used. Additional IS is handled as macro instruction.) K. Wakabayashi © NEC Corporation 2007 14 What size of additional Instruction is appropriate? Fine grain Instruction : Flexible, but low performance Coarse grain Instruction: HW like performance, but low flexibility 1) Find bottleneck contains many general CPU instructions. 2) Hierarchical Instruction (for flexibility) ex. Level 3 : function level instruction (DES encryption) Level 2: SIMD, Protocol, combination (2bit shift&mask) Level 1: Reg access, single ALU, testing, status checks Bug, Spec. Change : Insted of ECO, HW bugs in level 3, can be realized with level 2 or 1. ( use current chip until next ES) K. Wakabayashi © NEC Corporation 2007 15 Example of Additional Instruction basic example:several codes -> one instruction e.g. Huffman Decoding Instruction Only with Basic Instractions Additional Instruction which Original CPU core has tmp32A = FAST_LEN_TABLE; tmp32A = memr32(tmp16a + tmp32A); tmp16b = tmp32A >> 16; tmp16a = tmp32A; Len = VLD_CalcLenDist (tmp32A, TableData1, 1); if (tmp16b != 0) { tmp16b = VLD_GetBitReverse (tmp16b); Huffman } CPU Len = tmp16a + tmp16b; Instruction 7 instructions → 1 instructions 7 cycles 2 cycles ☆Large procedure containing hundreds instructions, might be synthesized as a HW accelerator. (e.g. highly parallel=many FUs, low power, routability) K. Wakabayashi © NEC Corporation 2007 16 Additional IS containing control flow: special branch, loop instruction branch with bit calculation: common for DAV, mobile Function: If 25th bit of R3 is 0, then goto L1 Base Instructions New Instruction For custom CPU mov r2,0xfeffffff and r3,r2 If(bit(r3,25)==0) goto L1 jz L1 1 instruction 3 instruction (1 cycle ) (3 cycle) K. Wakabayashi © NEC Corporation 2007 17 Result: “ Screw & snail” CPU C(BDL) 2,267 Line Basic:81, RTL(verilog) 12,985 Line Special:24 # of IS 105 clock 108 MHz Gate size 42.6 K Gate Less than 1% area -> dozens is fine, but for 1000, 42KG is too large. Man Power (design) 3.0 MM First Trial (C++ simulation) 3.5 MM Next version took Much less Effects of C-based design Just inserting Specials 1. Flexible enough to follow spec changes 2.HW/SW co-design SW and HW guys discussed on additional ISs 3 Smaller circuits with resource sharing K. Wakabayashi © NEC Corporation 2007 18 Comparison of Custom Processors with C-Synthesis vs. Manual RTL Configurable Behavior C RTL Rate Processor Code Size 1.3KL 9.2KL 7.6X Simulation 61Kc/s 0.3K 203X (*1) Gate Size 19KG 18KG ~5%+ “Screw&nail” processor should be with C-synthesis, since custom CPU design effort should be small enough! *1 Pentium3@1GHz, Testbench, debug included K.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    32 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us