NP-4™ Network Processor

NETWORK PROCESSOR PRODUCT BRIEF †

Wire-speed 100Gbps NPU for Carrier Ethernet Applications

Mellanox's NP-4 is a highly flexible network processor providing wire-speed packet processing with both an integrated traffic manager and a control CPU. The NP-4 offers the speed of an ASIC combined with the flexibility of a programmable microprocessor. It provides the silicon core of next-generation Carrier Ethernet Switches and Routers (CESR). Through programming, the NP-4 delivers a variety of applications such as L2 switching, QVLAN stacking, MPLS and VPLS, and IPv4/IPv6 routing, coupled with QoS for providing flow-based service level agreements (SLA).

HIGHLIGHTS
– Single-chip, programmable, 100-Gigabit throughput (50-Gigabit full duplex) network processor
– Line card, services card, pizza box and switch card applications
– Based on Mellanox's NP-3 with performance scaling and an enhanced feature set
– On-chip CPUs for control CPU offload
– On-chip Fabric Interface Controller for interfacing to Ethernet fabrics as well as third-party fabric solutions
– System-wide traffic management with hierarchical scheduling
– Flexible processing with programmable packet parsing, classifying, modifying and forwarding
– IP reassembly
– Enhanced support for video streams and IPTV
– Embedded search engine eliminating the need for external co-processors
– On-chip OAM protocol processing offload
– Serdes interfaces configurable to various network interfaces:
  • Ten XAUI/RXAUI interfaces
  • 24 quad-speed SGMII ports or 48 tri-speed QSGMII ports
  • Three Interlaken MACs, with support for OC-768 framer, switch fabrics and external 100G Ethernet MAC
  • Single 40G MAC
– Internal TCAM
– On-chip hardware time-stamping supporting IEEE 1588v2
– Support for Synchronous Ethernet ITU-T G.8261
– PCI-Express external host interface
– Comprehensive on-chip diagnostic hardware support

The NP-4 integrates into a single chip several functions that would normally be found in separate chips:
– 50-Gigabit full-duplex processing
– Classification search engines
– 50/100-Gigabit traffic manager
– On-chip Control CPU
– On-chip Quality of Service CPU
– Fabric Interface Chip (FIC) functionality
– Integrated MACs:
  • 48 1-Gigabit MACs
  • Ten 10/20-Gigabit Ethernet MACs
  • Three Interlaken MACs
  • One 40-Gigabit MAC

FLEXIBLE PACKET PROCESSING

NP-4 provides exceptionally flexible packet processing, enabling system designers to future-proof their designs and support new protocols and features through software updates. Packet parsing is supported for any field anywhere in the packet. Various table lookup options are provided, with support for long lookup keys and results. Flows are classified based on any combination of extracted packet information. Any packet header and content can be edited as required by Circuit Emulation Services, and packets can easily be replicated to support multicast applications. A 'run to completion' processing model guarantees support for processing scenarios of any complexity. Large code space is provided to support complex applications as well as true hitless code updates.

©2017 Mellanox Technologies. All rights reserved.
† For illustration only. Actual products may vary.
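The parse-and-classify flow described under Flexible Packet Processing (extract fields from anywhere in the packet, combine them into a lookup key, classify the flow) can be illustrated with a short conceptual model. This is a Python sketch for illustration only; it is not NP-4 microcode and uses no Mellanox API. It assumes an untagged Ethernet II frame carrying IPv4, locates each field as an offset from the start of the L3 or L4 header (so the L4 position follows the IPv4 header length), concatenates the extracted bytes into one key, and performs an exact-match lookup. The field list, the functions `header_bases`, `extract_key`, `classify` and `build_frame`, and the class labels are all hypothetical.

```python
import struct
from typing import Dict, List, Tuple

ETH_HEADER_LEN = 14  # assumption: untagged Ethernet II, no VLAN tags

# Hypothetical field descriptors: (name, base header, byte offset, length).
# Any field can be added here; the key is the concatenation of all of them.
FIELDS: List[Tuple[str, str, int, int]] = [
    ("ip_proto", "l3", 9, 1),
    ("src_ip", "l3", 12, 4),
    ("dst_ip", "l3", 16, 4),
    ("src_port", "l4", 0, 2),
    ("dst_port", "l4", 2, 2),
]


def header_bases(frame: bytes) -> Dict[str, int]:
    """Return byte offsets of the L3 (IPv4) and L4 headers within the frame."""
    l3 = ETH_HEADER_LEN
    ihl = (frame[l3] & 0x0F) * 4  # IPv4 header length in bytes
    return {"l3": l3, "l4": l3 + ihl}


def extract_key(frame: bytes, fields=FIELDS) -> bytes:
    """Concatenate the raw bytes of every selected field into one lookup key."""
    bases = header_bases(frame)
    return b"".join(
        frame[bases[base] + off : bases[base] + off + length]
        for _, base, off, length in fields
    )


def classify(frame: bytes, flow_table: Dict[bytes, str], default: str = "best-effort") -> str:
    """Exact-match lookup of the extracted key; unknown flows get a default class."""
    return flow_table.get(extract_key(frame), default)


def build_frame(proto: int, src: Tuple[int, ...], dst: Tuple[int, ...], sport: int, dport: int) -> bytes:
    """Build a minimal Ethernet/IPv4/L4 frame for testing (checksums left at zero)."""
    eth = bytes(12) + b"\x08\x00"  # zeroed MAC addresses, EtherType 0x0800 (IPv4)
    ip = bytearray(20)
    ip[0] = 0x45  # version 4, IHL 5 (20 bytes, no options)
    ip[9] = proto
    ip[12:16] = bytes(src)
    ip[16:20] = bytes(dst)
    l4 = struct.pack("!HH", sport, dport) + bytes(4)  # ports, then length/checksum zeroed
    return eth + bytes(ip) + l4


if __name__ == "__main__":
    voice = build_frame(17, (10, 0, 0, 1), (10, 0, 0, 2), 5000, 5060)
    table = {extract_key(voice): "voice"}
    print(classify(voice, table))
    print(classify(build_frame(6, (10, 0, 0, 1), (10, 0, 0, 2), 5000, 5060), table))
```

A dictionary stands in for the lookup table here, so only exact matches are found; wildcard rules of the kind used for ACLs would need a TCAM-style value/mask search instead.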
FEATURES SUMMARY

Integrated Traffic Management
– 180Mpps throughput
– Dynamic hitless resource allocation
– Dynamic hitless reconfiguration
– LAG shaping
– Work-conserving and non-work-conserving schedulers
– Frame sizes from 1 byte to 11 KB
– Total frame memory up to 4 Gbytes
– Up to 8M frames
– Per Flow Queuing (PFQ) with 5-level hierarchical scheduling:
  • 32 interfaces
  • 256 ports
  • 4K subports
  • 32K classes/users
  • 256K flows
– Policing: per-flow metering, marking and policing for millions of flows
– Configurable WRED profiles
– Per-flow, per-color WRED statistics
– Shaping: single and dual leaky bucket on committed/peak rate/bursts (CIR, CBS, PIR, PBS), with IFG emulation for accurate rate control
– Scheduling: WFQ and priority scheduling at each hierarchy level
– Per-frame timestamp and timeout drop
– Hardware flow control per port and per TM Interface/Port/Subport and Class
– Link-level flow control generation management scheme based on flexible traffic aggregation per source/destination TM congestion
– Class-based flow control

Power Management
– Per-interface power-up/power-down
– Configurable number of active TOP engines at each stage, for best power optimization per application

Physical Specifications
– Package: HFCBGA, 1895 pins, 45x45 mm
– Process: 55nm
– Power supply: 1.0V core voltage
– Typical power dissipation: NP-4: 35W; NP-4L: 25W

Models
– NP-4 with 100-Gigabit throughput (50G full duplex)
– NP-4L with 50-Gigabit throughput (25G full duplex)
– Both devices have the same package, pinout and interfaces, and are software compatible

Stateful Classifying & Processing
– Access to all 7 layers for classify and modify
– Maintains state of millions of sessions simultaneously
– On-chip state updates and learning of millions of sessions per second

Programming
– Large code space memory for multiple and complex applications
– Hitless code upgrades
– Single-image programming model with no parallel programming or multi-threading
– Automatic ordering of frames
– Automatic allocation of frames to processing engines (TOPs)
– Automatic passing of messages among TOPs
– Microcode compatible with Mellanox's NP-2, NP-3 and NPA network processors

Statistics and Counters
– Up to 16M 64-bit counters via external memory
– Per-flow statistics for programmable events, traffic metering, policing and shaping
– Programmable threshold settings and threshold-exceeded notification
– Dynamic allocation and auto association between counters and flows; counters are automatically recycled when a flow is deleted or aged
– Auto implementation of token bucket per flow (srTCM, trTCM or MEF5):
  • Hardware implementation of token bucket calculations and coloring (green, yellow, red)

Packet Manipulation & Reassembly
– TOPs control of TM buffered data
– Data reordering
– Data reassembly (e.g. IP reassembly)

Enhanced Video Transmission
– Caching video streams for retransmission
– Video data awareness, IPTV fast channel zapping
– Video de-multiplexing
– Video streams path redundancy

Integrated Search Engines
– Flexibly defined switching, routing, classification and policy lookup tables with millions of entries per table
– Programmable keys and results (associated information) per table
– Support for long keys and long results per table entry
– Table entries stored in DRAM to reduce power dissipation and cost and provide large lookup tables headroom

OAM Offload
– KeepAlive frame generation for precise and accurate session maintenance operations
– KeepAlive watchdog timers for fastest detection time
– 802.1ag-compliant message generation/termination offload
– Per OAM session state tracking and reporting
– Flexible statistics and performance monitoring

Interfaces
– Serdes interfaces configurable to various network interfaces
– 40 Gigabit Ethernet MAC compatible with the 802.3ba standard over XLAUI Multi Lane Distribution over 8 physical lanes
– Ten XAUI interfaces:
  • Ten on-chip 10G/20G MACs
  • 3.125Gbps; 6.25Gbps per lane
  • Channelized operation with up to 256 transmit channels
  • In-band and out-of-band flow control
  • Connection to Ethernet and TDM framers
  • Support for SPAUI packet mode
  • Support for RXAUI protocol
– 24 quad-speed SGMII/1000Base-X Ethernet interfaces or 48 tri-speed QSGMII Ethernet interfaces
– Three Interlaken MACs
– External Host interface:
  • 1-lane PCI-Express 2.5Gbps for control CPU interface
  • Additional 2xSGMII GE ports
  • MDC/MDIO master interface for external PHY control; continuous polling mode by HW
– LED interface for port status exporting
– External memory interfaces:
  • External TM memory interface (optional):
    • DDR3 SDRAM
    • 666 MHz DDR; 8x16 bit
    • ECC protected data
  • External lookup table memory interface:
    • DDR3 SDRAM
    • 666 MHz DDR; 8x16 bit or 16x8 bit
    • ECC protected structures
  • External statistics memory interface:
    • RLDRAM2-SIO
    • 533 MHz DDR; 2x18 bit, 1 or 2 devices
    • ECC protected counters
– External TCAM interface:
  • Especially useful for fast lookups through large tables with wildcards, such as Access Control Lists (ACL)

SAMPLE APPLICATIONS

NP-4's flexibility and integration allow system vendors to deliver cost-effective solutions that can easily adapt to changing market requirements.

Figure 1. Interfaces Diagram (NP-4 with CPU/host, network port, lookup table/TCAM, optional TM memory and statistics memory interfaces)

Typical applications include:
– Line cards in modular chassis:
  • Metro Switches
  • Edge and Core Routers
  • Wireless Backhaul Aggregation Switch/Routers
  • Enterprise Backbone Switches
– Stand-alone box solutions:
  • Ethernet Aggregation Nodes
  • EPON/GPON OLTs and cable CMTS
  • Firewalls,
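The features summary lists per-flow token-bucket coloring in hardware (srTCM, trTCM or MEF5) and shaping specified by committed and peak parameters (CIR, CBS, PIR, PBS). As a hedged illustration of what such coloring computes, the Python sketch below models the two-rate three-color marker of RFC 2698 in color-blind mode. It is a software model only, not the NP-4 hardware implementation or its configuration interface. The class name `TwoRateThreeColorMarker`, the units (bytes, bytes per second) and caller-supplied timestamps in seconds are assumptions; srTCM (RFC 2697) and the MEF5 profile use different bucket rules and are not modeled here.

```python
class TwoRateThreeColorMarker:
    """Color-blind trTCM (RFC 2698): two token buckets, P (peak) and C (committed)."""

    def __init__(self, cir: float, cbs: float, pir: float, pbs: float) -> None:
        # Assumed units: rates in bytes per second, burst sizes in bytes.
        if pir < cir:
            raise ValueError("PIR must be greater than or equal to CIR")
        self.cir, self.cbs = cir, cbs
        self.pir, self.pbs = pir, pbs
        self.tc = cbs  # committed bucket starts full
        self.tp = pbs  # peak bucket starts full
        self.last = 0.0  # time of the previous update, in seconds

    def _refill(self, now: float) -> None:
        """Add tokens for the elapsed time, capping each bucket at its burst size."""
        elapsed = max(0.0, now - self.last)
        self.last = max(self.last, now)
        self.tc = min(self.cbs, self.tc + elapsed * self.cir)
        self.tp = min(self.pbs, self.tp + elapsed * self.pir)

    def mark(self, size: int, now: float) -> str:
        """Return 'green', 'yellow' or 'red' for a packet of `size` bytes at time `now`."""
        self._refill(now)
        if self.tp < size:  # exceeds the peak rate: red, no tokens consumed
            return "red"
        if self.tc < size:  # within peak but exceeds committed rate: yellow
            self.tp -= size
            return "yellow"
        self.tp -= size  # within committed rate: green, consume from both buckets
        self.tc -= size
        return "green"


if __name__ == "__main__":
    m = TwoRateThreeColorMarker(cir=1000, cbs=1500, pir=2000, pbs=3000)
    print([m.mark(1000, 0.0) for _ in range(4)])  # ['green', 'yellow', 'yellow', 'red']
    print(m.mark(1000, 1.0))  # green: one second refills C to 1500 and P to 2000
```

Red packets consume no tokens, so traffic above the peak rate does not drain the buckets needed by later conforming packets; this follows the color-blind rules in RFC 2698.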
