
CEC 450 Real-Time Systems
Lecture 9 – Device Interfaces and I/O
October 20, 2020
Sam Siewert

View to End of Semester
Exam 1 – Ave 77.74, High 98
  – Exam #1 Solutions - Go over in class (Canvas T/F, Multiple Choice)
  – Code Q&A
  – Grades Posted on Canvas – Discuss, Q&A
Assignment #4 – Last Practice Oriented Exercise
Assignment #5 – Propose In-Depth Creative RT Project, or Outline Requirements for Standard Project (and team if you wish)
  – Or Standard External Clock / Metronome Machine Vision Project
Exam #2 - Extension to I/O, Memory and Mission Critical RT
  – Can do on Canvas rather than in class
  – Will be take home, open notes, book, etc., with some simple coding
Assignment #6 – Complete Project
Final Oral Exam – Present Project

LUB vs. N&S (CT/SP) vs. Safe Criteria
LUB - always safe, but fails some workable service sets
N&S (CT/SP) - exact, but could have low to zero margin (harmonic), and therefore, potentially unsafe
Safe Criteria - Must pass N&S test AND have say 10% margin
  – Margin criteria s.t. margin < LUB required for m services
  – E.g. margin < 30%, but more than 0
  – Compromise that tolerates some WCET and T error, jitter, drift

Test            Has margin?                   Is feasible?
LUB             yes, f(m)                     some false failures
N&S CT/SP       some zero margin (harmonic)   exact
Safe Criteria   yes, fixed                    fails if insufficient margin

Bus I/O
Interfacing to Off-Chip Devices

Recall: Conceptual View of HW/SW Interface
Three-Space View of Utilization Requirements
  – CPU Margin?
  – IO Latency (and Bandwidth) Margin?
  – Memory Capacity (and Latency) Margin?
Upper Right Front Corner – Low-Margin
Origin – High-Margin
[Figure: three-space utilization view with axes CPU-Util, IO-Util, Memory-Util]

VITA VME/VXS/VXI vs. PCI / PCIe

VITA VME (VESA Module Expansion - History)   PCI 2.1, 2.2, 2.3 (Peripheral Component Interconnect)
Asynchronous 20 MHz                          Synch Clock 33/66 MHz
A32, A24, A16 Addr Bus                       Muxed 32/64 bit A/D Bus
D32, D24, D16 Data Bus
Word or Block Transfer                       Burst Transfer Always
Daisy-Chained Prio Interrupts                Int A-D Routed to APIC, MSI added
Interrupt Data Cycle                         Map onto IRQ 0…15
                                             Built-in Hidden Arbiter
Device Designed in MMIO                      Plug ‘n’ Play Configuration Space
Custom Bus Integration on 6U                 PCI-to-PCI Bridge Scalability
3U/6U D-shell form factor                    CPCI, PMC, PC/104+, PCI-X
VME, VXS Bus, VXI Bus                        PCI-Express

Card / Backplane I/O Expansion
Scalable Embedded Systems
DoD, Commercial Aviation, etc.

PCI Revisions Compared

Bus                       Frequency   Potential Bandwidth   Number of Devices
PCI 2.x 32-bit            33 MHz      133 Mbytes/sec        4-5
PCI 2.x 32-bit            66 MHz      266 Mbytes/sec        1-2
PCI-X 1.0a                133 MHz     533 Mbytes/sec        1-2
PCI-X 2.0                 266 MHz     1066 Mbytes/sec       1 Point-to-Point Bus
PCI-E x8 bi-directional   2.5 GHz     4 GBytes/sec          Switched Scalable Differential Serial Byte Lanes

Original PCI System - Intel Reference
Key concepts:
1) NB - Memory Controller, CPU core(s) and cache, integrated graphics
2) SB - I/O Controller interfaced to Memory Controller (MMIO) with low rate programmed I/O devices and high-rate DMA
Evolved into PCIe gen3/4
  • 8/16 G-transfers/sec
  • Differential serial byte lanes
FSB - Front side bus evolved into QPI (Quick Path Interconnect)
[Figure: block diagram with CPU, FSB, North Bridge, Graphics Adapter (AGP), SDRAM/DDR, PCI 2.x Bus, South Bridge, IDE, Ethernet, Expansion Slots, ISA Bus, Audio, Super IO, COM-A, COM-B]

x86 and ARM SoCs - with PCIe Bus
Key Distinctions between MCUs and an SoC
  – Both are Single Chip Solutions
  – SoC has more processing, memory, and scalable I/O
x86 PC System Architecture for Memory and I/O Bus Interfaces to Peripherals
Multi-Core CPU
Memory Controller (Local Bus)
  – On-chip Memory (E.g. SRAM)
  – Off-chip Memory Expansion (E.g. DRAM)
  – On-chip and Off-chip Persistent Memory (NAND, NOR Flash)
I/O Bus (e.g. PCIe - Express)
  – Expansion I/O Bus (PCIe)
  – On-chip I/O Bus
[Figures: NVIDIA Tegra K1 Quad Cortex-A15; Intel Altera Cyclone V HPS, Cyclone SoC Dual Core ARM Cortex-A9]

Tegra K1 SoC Detailed Block Diagram
[Figure: Tegra K1 SoC block diagram]

PCI Key Concepts
PCI SIG Industry Consortium for Standard
  – PCI 2.1, 2.2, 2.3
  – PCI-X 1.0a/b, PCI-X 2.0
  – PCI-Express
North Bridge: CPU, Memory, AGP, PCI Bus
South Bridge: PCI Bus, APIC, ISA Bus
  – Legacy I/F for x86 IRQs and SuperIO ISA Chipsets
  – Not Required, but PCI NB/SB Often on a Single Chip
PCI-to-PCI Bus Bridges for Scalability
  – Type 0 Configuration Transaction for Bus 0
  – Type 1 Configuration Transaction for Bridged Bus

PCI Key Concepts
Plug ‘n’ Play (Resource Config Space)
  – OS and/or BIOS can probe configuration space registers at well known port address
      For Each Bus (256), Probe to Find all Devices (32)
  – Vendor ID and Device ID
  – Setup Each Device Function (8)
  – Setup Interrupts A-D for Each Function
      Program Command Register for MMIO, IO, and Mastering
      Program BAR 0-5 for MMIO or IO
      Program Int A-D if Applicable
Hidden Arbitration
  – Req/Gnt During Master to Target Bursts
  – Master Latency Timer is Minimum Burst

PCI Form Factors
PC Edge Connected 32 bit and 64 bit slots
CPCI Backplane Pin Connectors
PC/104+ PCI and ISA Stackable
PMC PCI Mezzanine Cards
PCI-Express Scalable Differential Serial
  – PCI Compatible Message Transport Protocol
  – x1 to x32 PCI Express Byte Lanes (8 to 256 bits)
  – Root Complex (Replaces North Bridge)
  – Byte Lanes are 2.5 Gb/s/direction Differential Serial 8b/10b Encoded
  – Switched Architecture
  – Slots, Embedded, Cables
MiniPCI

PCI-Express
Design Goals:
  – Highest Bandwidth / pin (2.5 Gb/sec/direction)
      PCI-Express       [(2.5 Gb/s/dir X 8b/dir) X (1B/8b)]/40 pins = 100 MB/s/pin
      PCI               [(32b X 33 MHz) X (1B/8b)]/84 pins = 1.58 MB/s/pin
      PCI-X 2.0 (DDR)   [(64b X 266 MHz) X (1B/8b)]/150 pins = 7.09 MB/s/pin
  – De-couple Physical Signaling from Protocol
  – Switched Architecture
  – Message Protocol with Minimized Side-Band Signals
      MSI and MSI-X Message-Based Interrupts
      Data Transport Management
      Power Management Side-Band Signals
  – Compatible with PCI-2.x, PCI-X 1.0a/b, PCI-X 2.0
      PCI buses Bridged with PCI-Express Switches on I/O Bridge
North Bridge Becomes Root Complex
  – Memory Bridge
  – I/O Bridge

PCI-Express
Each Tx/Rx Differential Pair (HSOp,n and HSIp,n) forms a Byte Lane
  – Byte Lanes Ganged x1, x2, x4, x8, x12, x16, x32
  – Serializer/Deserializer on Each Byte Lane
  – Driver, Buffering, and PLL on Each Byte Lane
  – Lane-to-Lane Deskewing Done in Phy
Data Tx/Rx with Packet Protocol
[Figure: packet layering from PCI PnP Driver down through transaction (HDR), link (Seq#, CRC), and physical (Frame ... Frame) layers]

I/O Trends
High Speed Differential Serial Overtaking Parallel Buses?
  – USB 2.x, 3.x
  – PCI-Express
  – 1/10G Ethernet (1G Cat-5 UTP Copper, 10G Fiber)
  – SAS
Parallel Buses Hit Signal Integrity Limits (Skew, Crosstalk)
  – DDR (266, 400 MHz)
  – PCI-X 2.0 – Quad-Rate

Single Board Computer SoCs
SBC = Single Board Computer (Instead of Backplane)
For RT Systems 2 Boards are Used for High Rate I/O (with Co-Processing)
  – Jetson TK-1 – Multi-Core CPU + GPU Co-Processor
  – DE1-SoC – Multi-Core CPU + FPGA Co-Processor
For Low Rate, Texas Instruments Tiva TM4C is also an Option
SBCs are Less Scalable than a CPCI or VXS/VXI Backplane, But SoC Packs Multiple Cores and I/O onto a Single Chip!
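
The bandwidth and bandwidth-per-pin figures quoted for PCI 2.x and PCI-Express x8 can be cross-checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the 33.33 MHz nominal PCI clock is an assumption (the slides round it to 33 MHz), and the pin counts (84 and 40) are the ones quoted in the PCI-Express design goals.

```python
def parallel_bus_mbytes_per_sec(width_bits, clock_mhz):
    """Peak MB/s for a parallel bus that moves one word per clock."""
    return width_bits * clock_mhz / 8.0


def pcie_gen1_mbytes_per_sec(lanes, directions=2, line_rate_gbps=2.5):
    """Peak MB/s for 8b/10b-encoded lanes: 8 payload bits per 10 line bits."""
    payload_gbps = lanes * line_rate_gbps * 8.0 / 10.0  # per direction
    return payload_gbps * directions * 1000.0 / 8.0


def per_pin(mbytes_per_sec, pins):
    """Bandwidth divided evenly over the connector pin count."""
    return mbytes_per_sec / pins


if __name__ == "__main__":
    pci = parallel_bus_mbytes_per_sec(32, 33.33)  # assumed nominal clock
    pcie_x8 = pcie_gen1_mbytes_per_sec(8)         # both directions
    print(f"PCI 32-bit/33 MHz: {pci:.1f} MB/s, {per_pin(pci, 84):.2f} MB/s/pin")
    print(f"PCIe x8 (gen1):    {pcie_x8:.1f} MB/s, {per_pin(pcie_x8, 40):.2f} MB/s/pin")
```

With these assumptions, PCIe x8 works out to 4000 MB/s in both directions combined, or 100 MB/s/pin over 40 pins, and 32-bit PCI comes to roughly 133 MB/s. The 8b/10b factor is what separates the 2.5 Gb/s line rate from the 2 Gb/s of payload each lane carries per direction.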

Embedded MCUs – Texas Instruments
TIVA - ARM Cortex M4 Microprocessor, IAR or Code Composer IDE, Cyclic Executive or FreeRTOS
ARM M-Series SBC: TIVA TM4C123G DEV BOARD (used in lab), TM4C123G MCU (used in lab), ARM Cortex M4 (used in lab), TM4C123GXL Launch Pad
ARM M-Series IoT Boards: IoT Enabled TM4C129X Connected Development Kit, TM4C1294XL Connected Launch Pad

Embedded GP-GPU SoCs - Jetson
Older Jetson TK1 CPU+GPU
  – NVIDIA "4-Plus-1" 2.32GHz ARM quad-core Cortex-A15
  – NVIDIA Kepler "GK20a" GPU with 192 SM3.2 CUDA cores (up to 326 GFLOPS)
Newer Jetson Nano
  – Competitive with R-Pi, TI OMAP, etc. in terms of price, no fan, etc.
  – Same Tegra K1 SoC
  – Much more compact
  – Good for student projects involving machine vision, AI
https://developer.nvidia.com/embedded/jetson-nano-developer-kit
[Figure: Tegra K1]

Embedded FPGA SoC Devices – DE1-SoC
Reconfigurable SoC with FPGA Co-processing
Dual-Core ARM Cortex A9, Linux or FreeRTOS

Comparison of FPGA and GP-GPU SoC
Both use PCIe to interface to co-processing, both < 12 Watts peak, 2-5 Watts idle
FPGA
  – FPGA is power efficient, with scaling limited by gates (LE logic elements, or LUTs Look-Up Tables)
  – Power consumption is more constant
GP-GPU
  – GP-GPU scales with number of synergistic processors and gridding of workload
  – Power consumption is efficient, but scales up to saturation of 4 GP cores + 192 SP cores
Full comparison here - for image proc
[Figures: Tegra K1; DE1-SoC FPGA]

Processor Trends
Yesterday’s Board, Today’s Chipset, Tomorrow’s ASIC
System on a Chip - SoC
  – Core(s) + IO (PowerPC 8xx, 82xx)
  – Reconfigurable (Virtex II)
  – Configurable (Tensilica)
  – IP Modules (CPU Cores, Memory Controller, Local Bus)
Offloading to HW (Today’s SW is Tomorrow’s HW)
Multi-Core SoCs
  – Cache Coherency – e.g.