Tolapai A with Integrated Accelerators

Rumi Zahir, Sr. Principal Engineer (Contributions from S. Lakshmanamurthy and D. Parikh)

INTEL Confidential- CNDA required Hot Chips

All products, dates and figures are preliminary, for planning purposes only,(Revision and are subject 1.0) to change August without any2007 notice. may make changes to specifications, product descriptions, and plans at any time, without notice.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. Results on Foil #13 have been simulated and are provided for informational purposes only. Results were derived using simulations run on an architecture simulator. Any difference in system hardware or software design or configuration may affect actual performance. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit http://www.intel.com/performance/resources/benchmark_limitations.htm Intel, , and the Intel logo are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2007, Intel Corporation.

All products, dates and figures are preliminary, for planning purposes only, and are subject to change without any notice. Intel may make 2 changes to specifications, product descriptions, and plans at any time, without notice. Tolapai SOC – Intel® Architecture (IA) Computing & Secure Packet Processing on 1 Die Single Die integrates • IA CPU @ 600, 1066 and 1200MHz • DDR2 memory controller (MCH) • PCI Express* • Standard IA PC peripherals (ICH) • 3x Gigabit Ethernet MACs • 3x TDM high-speed serial interfaces for 12 T1/E1 or Slic/Codec connections • Intel® QuickAssist Integrated Accelerator – For high-performance security and IP telephony applications

Vital Statistics • 148 Million transistors • 1,088-ball FCBGA w/1.092 mm pitch • 37.5 mm x 37.5 mm package • Intel's first integrated IA processor, chipset and memory controller since 1994's 80386EX.

All products, dates and figures are preliminary, for planning purposes only, and are subject to change without any notice. Intel may make 3 changes to specifications, product descriptions, and plans at any time, without notice.

Wireless Tolapai Market Segments Comms Infrastructure

Industrial Storage Control Tolapai is expected to play in a diverse set of Security Federal embedded, storage & comms segments.

Print Imaging Infotainment Interactive Clients Medical Enterprise Print Imaging Voice

All products, dates and figures are preliminary, for planning purposes only, and are subject to change without any notice. Intel may make 4 changes to specifications, product descriptions, and plans at any time, without notice. Tolapai Agenda

Objectives Block Diagram, Interconnect Topology Security/Packet Processing Models How IA and Accelerators Share Memory Memory Controller Design Expected Tolapai Benefits

All products, dates and figures are preliminary, for planning purposes only, and are subject to change without any notice. Intel may make 5 changes to specifications, product descriptions, and plans at any time, without notice.

Tolapai Objectives

1. IA code base • Run existing mainstream IA OS, driver and application software • Standard tool chains & platform mechanisms (e.g., BIOS, ACPI) 2. Low power • 13-20+ Watts TDP, based on application • Reduce operating frequency and voltage 3. Integration • Intel® QuickAssist Acceleration for security and IP telephony applications • PCI Express, Gigabit Ethernet, TDM (T1/E1) connectivity 4. Bill of Materials & Time to Market • Minimize pin count • Standard PC/ICH interfaces • Standard mainstream DDR2 memory technology • Wide (1.092mm) ball pitch helps reduce PCB layer count

All products, dates and figures are preliminary, for planning purposes only, and are subject to change without any notice. Intel may make 6 changes to specifications, product descriptions, and plans at any time, without notice. Acceleration ‡ Services Unit Local MDIO (x1) Expansion TDM GigE GigE GigE CAN (x2) ‡ Tolapai Security Bus Interface MAC MAC MAC ‡ SSP (x1) Services Unit (16b @ (12 E1/T1) #2 #1 #0 IEEE -1588 (3DES, AES, (A)RC4, 80 MH z ) IA CPU Core w/ 256KB L2 cache MD5, SHA -x , PKE , TRNG) • Intel® Pentium® M processor derivative 256 KB Integrated Memory Controller ASU SRAM

• 1 channel 64-bit DDR2 Acceleration and I /O Complex ‡ Enabling software required . 4 channel DMA engine • IA Complex IMCH Transparent PCI Express* (1x8, 2x4, or 2x1) EDMA • PCI-to-PCI Bridge Intel® QuickAssist Acceleration • Multi-core, Multi-threaded Engines (256 KB) (256 L2 Cache

256KB Internal SRAM core IA-32 • FSB Memory Controller Hub • Security Hardware Acceleration for –Bulk: AES, 3DES, (A)RC4 –Hash: MD5, SHA-x – Public Key – RSA, DSA, DH IICH – Internal True Random Number APIC, DMA, Timers , Watch Dog Timer, RTC, HPET (x3) Generator (TRNG) PCI Integrated I/O Interfaces Express Interface Memory Controller • 3x TDM (12 T1/E1) (x1) (DDR-2 400 /533/667/800 , • 3x GbE MAC (RGMII or RMII) (Gen 1, 64b with ECC ) 1x 8, 2 x4 or • 1x Local Expansion Bus (16b) UART (x 2) 2 x1 root complex ) 2x Controller Area Network (CAN) SATA 2. 0 USB 2.0 GPIO (x37) • (x 2) (x2) SMBus (x 2) • 1x Sync Serial Port (SSP) LPC 1. 1 • 2x UART, 37x GPIO, • 2x SMBus/I2C, LPC • 2x USB, 2x SATA • WDT, RTC All products, dates and figures are preliminary, for planning purposes only, and are subject to change without any notice. Intel may make 7 changes to specifications, product descriptions, and plans at any time, without notice.

Acceleration ‡ Services Unit Local MDIO (x1) Expansion TDM GigE GigE GigE Packet Processing CAN (x2) ‡ Security Bus Interface MAC MAC MAC ‡ SSP (x1) Services Unit (16b @ (12 E1/T1) #2 #1 #0 IEEE -1588 (3DES, AES, (A)RC4, 80 MH z ) Flows MD5, SHA -x , PKE , TRNG)

• Classic IA (blue) 256 KB • GigE Rx DMA packets to DRAM ASU SRAM (includes IA snoop) Acceleration and I /O Complex ‡ Enabling software required .

• IA interrupt IA Complex IMCH Transparent EDMA • IA CPU runs protocol PCI-to-PCI Bridge • IA CPU controls GigE TX • Fastpath (red) (256 KB) (256 L2 Cache IA-32 core IA-32 FSB Memory Controller Hub • GigE Rx DMA packets to DRAM • Interrupt routed to accelerator

• Accelerator operates on packet IICH • Forwarding/filtering and security APIC, DMA, Timers , Watch Dog Timer, RTC, HPET (x3) functions can be handled w/o IA PCI CPU intervention Express Interface Memory Controller • Accelerator controls GigE Tx (x1) (DDR-2 400 /533/667/800 , (Gen 1, 64b with ECC ) • Exception Packets (green) 1x 8, 2 x4 or UART (x 2) 2 x1 root complex ) SATA 2. 0 USB 2.0 GPIO (x37) • Move packet to coherent DRAM (x 2) (x2) SMBus (x 2) (includes IA snoop) LPC 1. 1 • Accelerator signals IA CPU

All products, dates and figures are preliminary, for planning purposes only, and are subject to change without any notice. Intel may make 8 changes to specifications, product descriptions, and plans at any time, without notice. IPsec VPN, TCP Tolapai Security/Packet TCP / UDP SSL / SRTP NAT, Proxy/ Termination Termination Processing Models firewall Splice

Protocol / IA Classic Look- FastPath FastPath Inline Inline SW Layer (no accel.) aside L3 L4 L4 (L5+)

Application App App App App App App SSL/SRTP L5-L7 L5-7 Acc L5-7 Acc L5-7 Acc L5-7 Acc L5-7 Acc TCP/UDP L4 L4 L4 L4 L4 L4 IP & IPsec L3 L3 Acc L3 Acc L3 Acc L3 Acc L3 Acc MAC L2 L2 L2 L2 L2 L2 PHY L1 L1 L1 L1 L1 L1

Color IA OS code (generic) Processing Coding: Protocol Device Driver Platform Flows Accelerator Device Driver Optimized

All products, dates and figures are preliminary, for planning purposes only, and are subject to change without any notice. Intel may make 9 changes to specifications, product descriptions, and plans at any time, without notice.

How IA and Accelerators share Memory

Accelerators Run in physical address space

IA sees them as PCI device Non- Tolapai- Coherent • Run concurrently, async. to IA CPU Accelerator aware • IA CPU could be sleeping/throttled Private IA Driver (UC) IA and Accelerators share DRAM Most cost effective Coherent Shared Tolapai- • Share devices, pins, i.e. bandwidth IA aware DRAM Partitioned Accelerator IA Driver MAX_PHYSICAL (WB) Non-Coherent (Not Snooped) • Private to Accelerator IA Host • Physically contiguous portion hidden Operating from OS System • Accessible to Tolapai-aware IA driver • Separate high performance data path Coherent (Snooped) • Shared physical address space with IA

All products, dates and figures are preliminary, for planning purposes only, and are subject to change without any notice. Intel may make 10 changes to specifications, product descriptions, and plans at any time, without notice. The need to optimize DRAM Efficiency

Typical IA chipset Simple Bank Sorting Scheduler (Unified FIFO)

• Optimized for 64B cache line bursts 85% • Use page open policy (locality) 80% 75% Typical Tolapai Access Pattern 70% Client Chipset • Comparatively I/O heavy 65% • Highly interleaved address streams 60% Comms 55% 8-bank • DRAM controller is subjected to 50% Comms

Efficiency 4-bank randomized address stream 45% Exploration 40% 35% • Expect decreasing DRAM efficiencies 30% • Tolapai cost-limited to single memory 25% channel 20% 400 500 600 700 800 • Device Trc approximately constant (~50- DDR2 Frequency 60ns) • Development of Tolapai bank scheduling memory controller that overlaps per bank Trc

All products, dates and figures are preliminary, for planning purposes only, and are subject to change without any notice. Intel may make 11 changes to specifications, product descriptions, and plans at any time, without notice.

Non-Coherent Coherent Non-coherent Coherent CMD FIFO CMD FIFO WR BUF read returns read returns Tolapai Memory Controller Sort into independent bank FIFOs

Interleave across banks Simple R R Ar biter Bank Sort /transaction sort Schedule each bank to Tolapai Memory Controller maximize pin bandwidth Data Path Bank 0 Bank 0 Bank 7 Bank 7 and Simulation shows 60-70% Coh FIFO Non-Coh FIFO Coherent FIFO Non-Coh FIFO Write Data Buffers efficiency for mixed random address patterns Bank Scheduler Bandwidth (simulated) 50%R/ 50%W RR or WRR [Mixed random traffic] 67%R/ 33%W 4,500 ECC Logic Pin State Machine 4,000

D ata Bus ( 64b + EC C ) 3,500 CMD & Address 3,000 2,500 2,000

MBytes/sec 1,500 1,000 500 - 400 533 667 800 All products, dates and figures are preliminary, for planning purposes only, and are subject to change without any notice. Intel may make 12 changes to specifications, productDDR-2 descriptions Frequency, and plans at any time, without notice. Expected Tolapai Benefits Example - IPSec VPN Appliance using Tolapai

1600 17 in2 200 32 in2 Mbps 10% 25W (110 cm2) Mbps 100% 31W (206 cm2)

IPsec Thruput CPU Power Area IPsec Thruput CPU Power Area (Lookaside) Utilization (Fastpath) Utilization Traditional 4-chip IA Solution 1-chip Tolapai Solution (simulated) (CPU + MCH + ICH + PCI Crypto Accelerator) Estimated improvements based on simulation and are provided for informational purposes only. Results were derived using simulations run on an architecture simulator. Any difference in system hardware or Assumptions: software design or configuration may affect actual performance. Compares Intel® Pentium® M processor-based platform with external PCI crypto accelerator to 1-chip Tolapai solution 256 byte packets with 2048 IPsec VPN tunnels

All products, dates and figures are preliminary, for planning purposes only, and are subject to change without any notice. Intel may make 13 changes to specifications, product descriptions, and plans at any time, without notice.

Summary

Tolapai is a new Intel system-on-a-chip that integrates an IA core, an MCH, an ICH and Intel® QuickAssist Integrated Accelerators. Tolapai’s accelerators focus on widely used Internet security functions. Accelerators provide power-efficient security performance and enable headroom for IA applications. Memory controller optimized to get maximum performance out of a single DDR2 memory channel for both IA processor and accelerator traffic.

The Tolapai platform helps customers to rapidly build cost-effective fully IA-compatible systems and embedded IP-enabled appliances.

All products, dates and figures are preliminary, for planning purposes only, and are subject to change without any notice. Intel may make 14 changes to specifications, product descriptions, and plans at any time, without notice.