Programming the Data Path in Network Processor-Based Routers
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
System Design for a Computational-RAM Logic-In-Memory Parailel-Processing Machine
System Design for a Computational-RAM Logic-In-Memory ParaIlel-Processing Machine Peter M. Nyasulu, B .Sc., M.Eng. A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Doctor of Philosophy Ottaw a-Carleton Ins titute for Eleceical and Computer Engineering, Department of Electronics, Faculty of Engineering, Carleton University, Ottawa, Ontario, Canada May, 1999 O Peter M. Nyasulu, 1999 National Library Biôiiothkque nationale du Canada Acquisitions and Acquisitions et Bibliographie Services services bibliographiques 39S Weiiington Street 395. nie WeUingtm OnawaON KlAW Ottawa ON K1A ON4 Canada Canada The author has granted a non- L'auteur a accordé une licence non exclusive licence allowing the exclusive permettant à la National Library of Canada to Bibliothèque nationale du Canada de reproduce, ban, distribute or seU reproduire, prêter, distribuer ou copies of this thesis in microform, vendre des copies de cette thèse sous paper or electronic formats. la forme de microficbe/nlm, de reproduction sur papier ou sur format électronique. The author retains ownership of the L'auteur conserve la propriété du copyright in this thesis. Neither the droit d'auteur qui protège cette thèse. thesis nor substantial extracts fkom it Ni la thèse ni des extraits substantiels may be printed or otherwise de celle-ci ne doivent être imprimés reproduced without the author's ou autrement reproduits sans son permission. autorisation. Abstract Integrating several 1-bit processing elements at the sense amplifiers of a standard RAM improves the performance of massively-paralle1 applications because of the inherent parallelism and high data bandwidth inside the memory chip. -
Dop – a Cpu Core for Teaching Basics of Computer Architecture
DOP – A CPU CORE FOR TEACHING BASICS OF COMPUTER ARCHITECTURE Milos Becvar, Alois Pluhacek and Jiri Danecek Department of Computer Science and Engineering Faculty of Electrical Engineering Czech Technical University in Prague, Abstract: A simple 16-bit processor core called DOP and its teaching environment is presented. The DOP processor illustrates the basic principles of computer organization and is therefore used in the introductory hardware course. Its major features are simplicity, availability of an FPGA implementation and a C compiler. This paper presents the description of the core, HW and SW tools and teaching methodology. computing performance. The DOP processor core was 1. INTRODUCTION developed at our department together with various SW and HW visualization tools (Danecek et al., 1994a). An introductory computer hardware course should teach students to the fundamental principles of computer The goal of this paper is to describe this processor core internal functionality. Students, who are familiar with and its learning environment for teaching basics of programming in high-level languages, are required to computer organization. The paper is organized as understand the interaction between a processor, a follows - section 2 outlines the introductory course and memory and I/O devices, an internal organization of characterizes the students, section 3 describes the DOP processor, computer arithmetic and basics of digital processor core, section 4 describes the SW and HW design. Our experience has shown that it is not an easy tools supporting this processor and finally section 5 task for most of them. The functionality of the processor outlines the use of the DOP in our introductory course. -
C5ENPA1-DS, C-5E NETWORK PROCESSOR SILICON REVISION A1
Freescale Semiconductor, Inc... SILICON REVISION A1 REVISION SILICON C-5e NETWORK PROCESSOR Sheet Data Rev 03 PRELIMINARY C5ENPA1-DS/D Freescale Semiconductor,Inc. F o r M o r G e o I n t f o o : r w m w a t w i o . f n r e O e n s c T a h l i e s . c P o r o m d u c t , Freescale Semiconductor, Inc... Freescale Semiconductor,Inc. F o r M o r G e o I n t f o o : r w m w a t w i o . f n r e O e n s c T a h l i e s . c P o r o m d u c t , Freescale Semiconductor, Inc... Freescale Semiconductor,Inc. Silicon RevisionA1 C-5e NetworkProcessor Data Sheet Rev 03 C5ENPA1-DS/D F o r M o r Preli G e o I n t f o o : r w m w a t w i o . f n r e O e n s c T a h l i e s . c P o r o m m d u c t , inary Freescale Semiconductor, Inc... Freescale Semiconductor,Inc. F o r M o r G e o I n t f o o : r w m w a t w i o . f n r e O e n s c T a h l i e s . c P o r o m d u c t , Freescale Semiconductor, Inc. C5ENPA1-DS/D Rev 03 CONTENTS . -
Design and Implementation of a Stateful Network Packet Processing
610 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 25, NO. 1, FEBRUARY 2017 Design and Implementation of a Stateful Network Packet Processing Framework for GPUs Giorgos Vasiliadis, Lazaros Koromilas, Michalis Polychronakis, and Sotiris Ioannidis Abstract— Graphics processing units (GPUs) are a powerful and TCAMs, have greatly reduced both the cost and time platform for building the high-speed network traffic process- to develop network traffic processing systems, and have ing applications using low-cost hardware. The existing systems been successfully used in routers [4], [7] and network tap the massively parallel architecture of GPUs to speed up certain computationally intensive tasks, such as cryptographic intrusion detection systems [27], [34]. These systems offer a operations and pattern matching. However, they still suffer from scalable method of processing network packets in high-speed significant overheads due to critical-path operations that are still environments. However, implementations based on special- being carried out on the CPU, and redundant inter-device data purpose hardware are very difficult to extend and program, transfers. In this paper, we present GASPP, a programmable net- and prohibit them from being widely adopted by the industry. work traffic processing framework tailored to modern graphics processors. GASPP integrates optimized GPU-based implemen- In contrast, the emergence of commodity many-core tations of a broad range of operations commonly used in the architectures, such as multicore CPUs and modern graph- network traffic processing applications, including the first purely ics processors (GPUs) has proven to be a good solution GPU-based implementation of network flow tracking and TCP for accelerating many network applications, and has led stream reassembly. -
Embedded Multi-Core Processing for Networking
12 Embedded Multi-Core Processing for Networking Theofanis Orphanoudakis University of Peloponnese Tripoli, Greece [email protected] Stylianos Perissakis Intracom Telecom Athens, Greece [email protected] CONTENTS 12.1 Introduction ............................ 400 12.2 Overview of Proposed NPU Architectures ............ 403 12.2.1 Multi-Core Embedded Systems for Multi-Service Broadband Access and Multimedia Home Networks . 403 12.2.2 SoC Integration of Network Components and Examples of Commercial Access NPUs .............. 405 12.2.3 NPU Architectures for Core Network Nodes and High-Speed Networking and Switching ......... 407 12.3 Programmable Packet Processing Engines ............ 412 12.3.1 Parallelism ........................ 413 12.3.2 Multi-Threading Support ................ 418 12.3.3 Specialized Instruction Set Architectures ....... 421 12.4 Address Lookup and Packet Classification Engines ....... 422 12.4.1 Classification Techniques ................ 424 12.4.1.1 Trie-based Algorithms ............ 425 12.4.1.2 Hierarchical Intelligent Cuttings (HiCuts) . 425 12.4.2 Case Studies ....................... 426 12.5 Packet Buffering and Queue Management Engines ....... 431 399 400 Multi-Core Embedded Systems 12.5.1 Performance Issues ................... 433 12.5.1.1 External DRAMMemory Bottlenecks ... 433 12.5.1.2 Evaluation of Queue Management Functions: INTEL IXP1200 Case ................. 434 12.5.2 Design of Specialized Core for Implementation of Queue Management in Hardware ................ 435 12.5.2.1 Optimization Techniques .......... 439 12.5.2.2 Performance Evaluation of Hardware Queue Management Engine ............. 440 12.6 Scheduling Engines ......................... 442 12.6.1 Data Structures in Scheduling Architectures ..... 443 12.6.2 Task Scheduling ..................... 444 12.6.2.1 Load Balancing ................ 445 12.6.3 Traffic Scheduling ................... -
10. Assembly Language, Models of Computation
10. Assembly Language, Models of Computation 6.004x Computation Structures Part 2 – Computer Architecture Copyright © 2015 MIT EECS 6.004 Computation Structures L10: Assembly Language, Models of Computation, Slide #1 Beta ISA Summary • Storage: – Processor: 32 registers (r31 hardwired to 0) and PC – Main memory: Up to 4 GB, 32-bit words, 32-bit byte addresses, 4-byte-aligned accesses OPCODE rc ra rb unused • Instruction formats: OPCODE rc ra 16-bit signed constant 32 bits • Instruction classes: – ALU: Two input registers, or register and constant – Loads and stores: access memory – Branches, Jumps: change program counter 6.004 Computation Structures L10: Assembly Language, Models of Computation, Slide #2 Programming Languages 32-bit (4-byte) ADD instruction: 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 opcode rc ra rb (unused) Means, to the BETA, Reg[4] ß Reg[2] + Reg[3] We’d rather write in assembly language: Today ADD(R2, R3, R4) or better yet a high-level language: Coming up a = b + c; 6.004 Computation Structures L10: Assembly Language, Models of Computation, Slide #3 Assembly Language Symbolic 01101101 11000110 Array of bytes representation Assembler 00101111 to be loaded of stream of bytes 10110001 into memory ..... Source Binary text file machine language • Abstracts bit-level representation of instructions and addresses • We’ll learn UASM (“microassembler”), built into BSim • Main elements: – Values – Symbols – Labels (symbols for addresses) – Macros 6.004 Computation Structures L10: Assembly Language, Models -
Programmable Digital Microcircuits - a Survey with Examples of Use
- 237 - PROGRAMMABLE DIGITAL MICROCIRCUITS - A SURVEY WITH EXAMPLES OF USE C. Verkerk CERN, Geneva, Switzerland 1. Introduction For most readers the title of these lecture notes will evoke microprocessors. The fixed instruction set microprocessors are however not the only programmable digital mi• crocircuits and, although a number of pages will be dedicated to them, the aim of these notes is also to draw attention to other useful microcircuits. A complete survey of programmable circuits would fill several books and a selection had therefore to be made. The choice has rather been to treat a variety of devices than to give an in- depth treatment of a particular circuit. The selected devices have all found useful ap• plications in high-energy physics, or hold promise for future use. The microprocessor is very young : just over eleven years. An advertisement, an• nouncing a new era of integrated electronics, and which appeared in the November 15, 1971 issue of Electronics News, is generally considered its birth-certificate. The adver• tisement was for the Intel 4004 and its three support chips. The history leading to this announcement merits to be recalled. Intel, then a very young company, was working on the design of a chip-set for a high-performance calculator, for and in collaboration with a Japanese firm, Busicom. One of the Intel engineers found the Busicom design of 9 different chips too complicated and tried to find a more general and programmable solu• tion. His design, the 4004 microprocessor, was finally adapted by Busicom, and after further négociation, Intel acquired marketing rights for its new invention. -
The Hpc C Compiler 2.1 Introduction
™ MICROCONTROLLER DEVELOPMENT SUPPORT ( MOLE HPC™ C COMPILER USER'S MANUAL ( ( ( ( ~ National Semiconductor Corporation Customer Order Number 424410883-001 NSC Publication Number 424410883-001C October 1988 HPC™ C Compiler User's Manual @l 1988 National Semiconductor Corporation 2900 Semiconductor Drive P.O. Box 58090 Santa Clara. California 95052-8090 CONTENTS Chapter 1 OVERVIEW 1.1 INTRODUCTION............................. 1-1 1.2 MANUAL ORGANIZATION. .. 1-2 1.3 DOCUMENTATION CONVENTIONS. .. 1-2 1.3.1 General Conventions . .. 1-2 1.3.2 Conventions in Syntax Descriptions ............. 1-2 1.3.3 Example Conventions. .. 1-3 1.3.4 Additional Conventions .................... 1-3 Chapter 2 THE HPC C COMPILER 2.1 INTRODUCTION............................. 2-1 2.2 COMPILER COMMAND SYNTAX 2-1 Chapter 3 BASIC DEFINITIONS 3.1 INTRODUCTION............................. 3-1 3.2 NAMES.................................. 3-1 3.3 CONSTANTS............................... 3-1 3.4 ESCAPE SEQUENCES . .. 3-2 3.5 COMMENTS............................... 3-3 3.6 DATA TYPES. .. 3-3 3.7 PREPROCESSOR DIRECTIVES . .. 3-4 3.8 PROGRAM ORGANIZATION. .. 3-4 3.9 INITIALIZATION OF VARIABLES . .. 3-4 3.10 OPERATORS . .. 3-5 3.11 IN-LINE MICROASSEMBLER CODE . .. 3-5 Chapter 4 IMPLEMENTATION-DEPENDENT CONSIDERATIONS 4.1 INTRODUCTION............................. 4-1 4.2 MEMORy................................. 4-1 4.3 STORAGE CLASSES . .. 4-1 4.3.1 Storage Class Modifiers. .. 4-1 4.4 C STACK FORMAT . .. 4-3 4.5 USING IN-LINE MICROASSEMBLER CODE. .. 4-4 4.6 EFFICIENCY CONSIDERATIONS . .. 4-6 4.6.1 Declaration Syntax . .. 4-9 4.7 STATEMENTS AND IMPLEMENTATION .............. 4-10 4.8 RUN-TIME NOTES ........................... 4-11 v Appendix A CCHPC SPECIFICATIONS Appendix B CONVERTING BETWEEN STANDARD C AND CCHPC Appendix C INVOCATION LINE SYNTAX C.l INTRODUCTION............................ -
WCAE 2003 Workshop on Computer Architecture Education
WCAE 2003 Proceedings of the Workshop on Computer Architecture Education in conjunction with The 30th International Symposium on Computer Architecture DQG 2003 Federated Computing Research Conference Town and Country Resort and Convention Center b San Diego, California June 8, 2003 Workshop on Computer Architecture Education Sunday, June 8, 2003 Session 1. Welcome and Keynote 8:45–10:00 8:45 Welcome Edward F. Gehringer, workshop organizer 8:50 Keynote address, “Teaching and teaching computer Architecture: Two very different topics (Some opinions about each),” Yale Patt, teacher, University of Texas at Austin 1 Break 10:00–10:30 Session 2. Teaching with New Architectures 10:30–11:20 10:30 “Intel Itanium floating-point architecture,” Marius Cornea, John Harrison, and Ping Tak Peter Tang, Intel Corp. ................................................................................................................................. 5 10:50 “DOP — A CPU core for teaching basics of computer architecture,” Miloš BeþváĜ, Alois Pluháþek and JiĜí DanƟþek, Czech Technical University in Prague ................................................................. 14 11:05 Discussion Break 11:20–11:30 Session 3. Class Projects 11:30–12:30 11:30 “Superscalar out-of-order demystified in four instructions,” James C. Hoe, Carnegie Mellon University ......................................................................................................................................... 22 11:50 “Bridging the gap between undergraduate and graduate experience in -
Network Processors: Building Block for Programmable Networks
NetworkNetwork Processors:Processors: BuildingBuilding BlockBlock forfor programmableprogrammable networksnetworks Raj Yavatkar Chief Software Architect Intel® Internet Exchange Architecture [email protected] 1 Page 1 Raj Yavatkar OutlineOutline y IXP 2xxx hardware architecture y IXA software architecture y Usage questions y Research questions Page 2 Raj Yavatkar IXPIXP NetworkNetwork ProcessorsProcessors Control Processor y Microengines – RISC processors optimized for packet processing Media/Fabric StrongARM – Hardware support for Interface – Hardware support for multi-threading y Embedded ME 1 ME 2 ME n StrongARM/Xscale – Runs embedded OS and handles exception tasks SRAM DRAM Page 3 Raj Yavatkar IXP:IXP: AA BuildingBuilding BlockBlock forfor NetworkNetwork SystemsSystems y Example: IXP2800 – 16 micro-engines + XScale core Multi-threaded (x8) – Up to 1.4 Ghz ME speed RDRAM Microengine Array Media – 8 HW threads/ME Controller – 4K control store per ME Switch MEv2 MEv2 MEv2 MEv2 Fabric – Multi-level memory hierarchy 1 2 3 4 I/F – Multiple inter-processor communication channels MEv2 MEv2 MEv2 MEv2 Intel® 8 7 6 5 y NPU vs. GPU tradeoffs PCI XScale™ Core MEv2 MEv2 MEv2 MEv2 – Reduce core complexity 9 10 11 12 – No hardware caching – Simpler instructions Î shallow MEv2 MEv2 MEv2 MEv2 Scratch pipelines QDR SRAM 16 15 14 13 Memory – Multiple cores with HW multi- Controller Hash Per-Engine threading per chip Unit Memory, CAM, Signals Interconnect Page 4 Raj Yavatkar IXPIXP 24002400 BlockBlock DiagramDiagram Page 5 Raj Yavatkar XScaleXScale -
Series 3000 Reference Manua
inter Series 3000 Family Of Computing Elements - The Total System Solution. Since its introduction in September, 1974, the Series 3000 family of computing elements has found acceptance in a wide range of high performance applications from disk controllers to airborne CPU's. The Series 3000 family represents more than a simple collection of bipolar components, it is a complete family of computing elements and hardware/software support that greatly simplifies the task of transforming a design from concept to production. The Series 3000 Component Family A complete set of computing elements that are designed as a system requiring a minimum amount of ancillary circuitry. 3001 Microprogram Control Unit. 3002 Central Processing Element. 3003 . Look-Ahead Carry Generator. 3212 Multi-Mode Latch Buffer. 3214 Interrupt Control Unit. 3216/26 Parallel Bi-directional Bus Driver. ROMs/PROMs A complete set of bipolar ROMs and PROMs. RAMs A Complete family of MOS and bipolar RAMs. rhe Series 3000 Support A comprehensive support system that assists the designer in writing microprograms, debugging hardware and microcode, and programming prototype and production PROMs. CROMIS Cross microprogram assembler. MDS-800 Microcomputer development system with TTY/CRT, line printer, diskette, PROM programmer and high speed paper tape reader facilities. ICE-30 In-circuit emulation for the 3001 MCU. ROM-SIM ROM simulation for all of Intel's Bipolar ROMs and PROMs. Application Central processor and disk controller designs and Notes system timing considerations. Customer Comprehensive 3 day course covering the component Course family, CPU and controller designs, microprogramming and the MDS-800, ICE-30 and ROM-SIM operation. -
Intel® IXP42X Product Line of Network Processors with ETHERNET Powerlink Controlled Node
Application Brief Intel® IXP42X Product Line of Network Processors With ETHERNET Powerlink Controlled Node The networked factory floor enables the While different real-time Ethernet solutions adoption of systems for remote monitoring, are available or under development, EPL, long-distance support, diagnostic services the real-time protocol solution managed and the integration of in-plant systems by the open vendor and user group EPSG with the enterprise. The need for flexible (ETHERNET Powerlink Standardization Group), connectivity solutions and high network is the only deterministic data communication bandwidth is driving a fundamental shift away protocol that is fully conformant with Ethernet from legacy industrial bus architectures and networking standards. communications protocols to industry EPL takes the standard approach of IEEE standards and commercial off-the shelf (COTS) 802.3 Ethernet with a mixed polling and time- solutions. Standards-based interconnect slicing mechanism to transfer time-critical data technologies and communications protocols, within extremely short and precise isochronous especially Ethernet, enable simpler and more cycles. In addition, EPL provides configurable cost-effective integration of network elements timing to synchronize networked nodes with and applications, from the enterprise to the www.intel.com/go/embedded high precision while asynchronously transmitting factory floor. data that is less time-critical. EPL is the ideal The Intel® IXP42X product line of network solution for meeting the timing demands of processors with ETHERNET Powerlink (EPL) typical high performance industrial applications, software helps manufacturers of industrial such as automation and motion control. control and automation devices bridge Current implementations have reached 200 µs between real-time Ethernet on the factory cycle-time with a timing deviation (jitter) less floor and standard Ethernet IT networks in than 1 µs.