Mobile Application and Multimedia Processors
5/11/21
Ch. 7: Application and Multimedia Processors
Multimedia Systems
Prof. Ben Lee
School of Electrical Engineering and Computer Science
Oregon State University

Outline
Introduction
ARM
ARM CPU cores
The ARM Architecture
Case Study: Application Processors

Introduction
An Application Processor is a System on Chip (SoC) designed to support a variety of mobile applications running on a mobile OS:
◦ Smartphones
◦ Tablets
◦ Netbooks
◦ Smart TVs
◦ Car Info System
◦ Gaming Console
◦ ...
High performance, low power, and low system cost.

Smartphone Block Diagram
[Figure: smartphone block diagram. Labeled blocks include the multi-core CPU, GPU, video codec, and audio codec.]

ARM
CPU cores for Application Processors are based on ARM:
◦ Except for Apple and some Samsung devices.
Two types of approaches:
◦ Processor (core) license
◦ Architecture (ISA) license
So what is ARM?

Advanced RISC Machines
Founded in November 1990:
◦ A spin-off from Acorn Computer.
◦ Acorn designed the CPU for Apple Newton (1993)
◦ ARM originally stood for “Acorn RISC Machine”.
Designs ARM RISC cores.
Licenses ARM core designs to partners who fabricate and sell to their customers:
◦ ARM is fabless.
Also develops technologies to assist with products using ARM cores:
◦ Software tools, boards, debug hardware, application software, bus architectures, peripherals, etc.
[Figure: Acorn A5000 (1991): 25 or 33 MHz ARM3 CPU, 2-4 MB RAM, 20 MB HDD]
[Figure: Acorn RISC PC 600 (1994): 30 MHz ARM6 CPU, 4 MB RAM]

ARM Dominance
ARM processors are in just about any mobile device you can think of:
◦ Smartphones
◦ Netbooks
◦ Pads/tablets
◦ Embedded devices
◦ …
ARM only licenses its technology as Intellectual Property (IP), rather than manufacturing its own CPUs.
As of 2021, over 180 billion chips based on ARM designs had been manufactured!

ARM Business Model
[Figure: ARM business model. Labels: IPs (ISA, Processors, Interconnects, Graphics, Codecs, …); Qualcomm; Samsung, LG, Google, Motorola, etc.]
Buses and interfaces: free
Upfront fees: $1M ~ $10M
Royalty: 1% ~ 2%
Two types of licenses:
◦ Processor license: most companies rely on processor IPs from ARM.
◦ Architecture (ISA) license: Apple’s A series CPUs, Nvidia’s Denver CPUs.

ARM Processor Family
[Figure: ARM processor family arranged by ARM ISA version. Recoverable labels: Cortex-A17; Cortex-A78, A77, A76; Cortex-A75, A73, A72, A57; Cortex-A55, A53; ARMv8 (32-bit/64-bit). Annotation: "Top-of-the-line processors!"]

ARM Cortex-A Cores
[Figure only; no text recovered.]

ARM Architecture Evolution
Architecture   Example core    Year
ARMv4          ARM920T         ~2000
ARMv5          ARM926          2001
ARMv6          ARM1176         2004
ARMv7-A/R      Cortex-A5-15    2006
ARMv8-A        Cortex-A50      2014
Features introduced along the way (from the figure):
◦ Instruction set: 16-bit Thumb (ARMv4T), Thumb-2 (ARMv6T2)
◦ Java: Jazelle (direct Java bytecode execution), Jazelle (executed by SW), ThumbEE (Jazelle-RCT)
◦ Vector Floating-Point: VFPv1/2, VFPv3/v4
◦ Multimedia extensions: DSP, SIMD, NEON (Adv. SIMD)
◦ Security: TrustZone
◦ ARMv8-A: AArch32 (key feature: ARMv7-A compatibility) and AArch64, each with Cryptography ext.

Advanced Microcontroller Bus Architecture (AMBA)
On-chip interconnect specification:
◦ Latest (2013) => AMBA 5 CHI (Coherent Hub Interface)
◦ Advanced High-performance Bus (AHB)
◦ Advanced Peripheral Bus (APB)
[Figure: AMBA-based system. Labels: ARM core (Reset, nIRQ, nFIQ), Arbiter, TIC, on-chip RAM, DMA, External Bus Interface (external ROM and RAM), System Bus (AHB or ASB), Bridge, I/O Bus (APB) with Interrupt Controller, Timer, and low-bandwidth I/O devices.]

IPs and Tools
Processors
◦ Cortex series
◦ SecureCore series
System IP
◦ AMBA
◦ Cache-coherent interconnects
Multimedia IPs
◦ Graphics
◦ Video
◦ Audio
Physical IPs
◦ Power management
◦ Memory
◦ I/O
Security
◦ TrustZone
◦ SecureCore
IoT
◦ Device platform
Tools
◦ Software development tools
◦ Debugging tools
◦ Development boards

An ARM System
[Figure: ARM system block diagram. Labels: Multi-core Processor, GPU, Video Processor, Display Processor, Generic Interrupt Controller, Memory Management Unit, Cache Coherent Interconnect, Network Interconnect, System Control Processor (Power Management), Memory Controller, Debug and Trace.]

ARM Cortex-A75
Quad-core, ARMv8-A
Multimedia Extensions
Introduced in 2017
3-way superscalar, out-of-order (OoO), 11/13+ stage pipeline, up to 3 GHz
8 execution units
64 KB instruction and 64 KB data L1 cache
256 KB or 512 KB L2 cache
Optional 512 KB - 4 MB L3 cache
big core
[Figure: Cortex-A75 pipeline. Labels: Address Generation Unit (AGU); simple branch µops bypass Rename/Dispatch.]

ARM Cortex-A55
2-way superscalar, in-order, 8/10 stage pipeline
8 execution units
16 KB, 32 KB, or 64 KB L1 cache
Up to 256 KB L2 cache
Energy-efficient LITTLE core
[Figure: Cortex-A55 pipeline. Label: Address Generation Unit (AGU).]

ARM big.LITTLE Technology
Responsiveness of mobile applications requires high performance, but only for brief bursts of time.
Solution => Distribute tasks across both low power cores and high performance cores:
◦ A cluster of low power cores => LITTLE cores
◦ A cluster of high performance cores => big cores
Delivers responsive apps and longer battery life!
[Figure: a cluster of big cores (CPU0-CPU3, used during high loads) and a cluster of LITTLE cores (CPU0-CPU3, used during low loads), joined by a Cache Coherent Interconnect that leads to memory.]

Cluster Migration Model
Exclusive use of the clusters:
◦ Cluster migration => big.LITTLE processing with cluster migration
◦ Core migration => big.LITTLE processing with core migration
Inclusive use of the clusters:
◦ Core migration => big.LITTLE MP

big.LITTLE Processing with Cluster Migration
Only one core cluster is active at any time.
Low workloads run on the LITTLE core cluster:
◦ Background tasks, audio, or video.
Migration to the big core cluster => If the workload becomes higher than the maximum performance of the LITTLE core cluster.
◦ Also related to Dynamic Voltage Scaling (DVS).
◦ Cluster switch requires ~30k cycles.
Example:
◦ Samsung Exynos 5 Octa 5410 (2013)

big.LITTLE Processing with Core Migration
Each LITTLE core is paired with a big core counterpart; only one core of each pair is active at any time.
Low workloads run on the LITTLE cores:
◦ Background tasks, audio, or video.
Switch to a big core counterpart => If the workload becomes higher than the maximum performance of a LITTLE core.
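
The core-migration rule above (stay on the LITTLE core until the workload exceeds what it can deliver, then switch to its big counterpart) can be sketched as a tiny simulation. This is only an illustration under stated assumptions: the threshold `LITTLE_MAX`, the workload units, and the function names `pick_core` and `schedule` are hypothetical and do not come from the slides or from any real scheduler. Real switchers also account for hysteresis, switch cost, and voltage/frequency scaling.

```python
# Hypothetical capacity in arbitrary work units; not taken from any real SoC.
LITTLE_MAX = 40  # assumed peak throughput of one LITTLE core


def pick_core(workload: int) -> str:
    """Return which core of a big/LITTLE pair should run the workload."""
    # Stay on the LITTLE core while it can keep up; otherwise use the big core.
    return "LITTLE" if workload <= LITTLE_MAX else "big"


def schedule(trace: list[int]) -> list[str]:
    """Map a per-interval workload trace to the active core in each interval."""
    return [pick_core(w) for w in trace]


if __name__ == "__main__":
    # A mostly light workload with a brief burst in the middle.
    print(schedule([10, 25, 80, 95, 30, 5]))
```

With this trace the big core is selected only for the two burst intervals, which is the behavior the slides motivate: high performance for brief bursts, low power the rest of the time.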

big.LITTLE Processing with MP
The OS schedules tasks on all cores of both clusters.
Tasks can run on, or be moved between, LITTLE cores and big cores.
Also referred to as Heterogeneous MP (HMP).
Example:
◦ Samsung Exynos 5 Octa 5420 (2013)
[Figure: big core cluster and LITTLE core cluster.]

DynamIQ big.LITTLE
[Figure: DynamIQ block diagram. Labels and annotations:]
◦ Streamlines traffic across bridges
◦ Support for multiple performance domains
◦ Snoop Control Unit
◦ Accelerator Coherency Port
◦ Low latency interfaces closely couple accelerators
◦ AXI Coherence Extensions
◦ Support large amounts of local memory

NEON
Software processing of audio, video, graphics, gaming, voice recognition, etc.:
◦ MP3 decoding
Advanced SIMD:
◦ Supports 8-, 16-, 32-, and 64-bit integer and single-precision FP operations
◦ Up to 16 operations at the same time
◦ 1 B x 16 = 16 B => 1 quad word
Examples:
    VADD.I16  D0, D1, D2    ; four 16-bit lane additions: D0 = D1 + D2
    VMULL.S16 Q0, D2, D3    ; widening multiply: four signed 16-bit products, 32-bit results in Q0

Mali-V550 Video Processor
The second video IP from ARM with a multi-standard codec.
Multiple simultaneous encode/decode streams:
◦ Each core can support up to 1080p @ 60 fps to 4K @ 120 fps
Encoder: HEVC/H.265, H.264, VP8, JPEG
Decoder: HEVC/H.265, H.264, H.263, MPEG-4, MPEG-2, VC-1/WMV, Real, VP8, JPEG

Mali-G72 GPU
Second generation.
Supports all modern graphics APIs: OpenGL, Vulkan, OpenCL

General Characteristics
32-bit (data) architecture (ARMv8 is 64-bit):
◦ Handles byte, half word, and word data
Load-store architecture
3-address format:
◦ 2 source operands (register and/or immediate) & 1 destination register
Most ARM cores implement two instruction sets:
◦ 32-bit ARM Instruction Set
◦ 16-bit Thumb Instruction Set
Jazelle cores can also execute Java bytecode

Unique Features of ARM
Conditional execution and flags
Extensive use of ALU and shifter
LD/ST Multiple instructions
Thumb instruction set
Fast interrupt handling

Conditional Execution & Flags
Instructions can execute conditionally by postfixing them with the appropriate condition code field:
◦ Improves code density and performance by reducing the number of forward branch instructions.

Typical RISC              ARM
    CMP  r3,#0                CMP   r3,#0
    BEQ  skip                 ADDNE r0,r1,r2
    ADD  r0,r1,r2
skip:

By default, data processing instructions do not affect the condition code flags, but the flags can optionally be set by appending “S” to the instruction.
The CMP (Compare) instruction does not need “S”.
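
To make the comparison concrete, the following Python sketch models both instruction sequences from the slide and checks that they leave the registers in the same state. It is a minimal illustration under stated assumptions: the register dictionary, the Z-flag model, and the function names are hypothetical, and 32-bit wraparound and the other flags (N, C, V) are not modeled.

```python
def cmp_sets_z(value: int, imm: int) -> bool:
    """Model CMP: compute value - imm and return the Z flag (True when equal)."""
    return (value - imm) == 0


def branch_version(regs: dict) -> dict:
    """Typical RISC sequence: CMP r3,#0 ; BEQ skip ; ADD r0,r1,r2 ; skip:"""
    r = dict(regs)
    z = cmp_sets_z(r["r3"], 0)
    if not z:  # BEQ skip is not taken when Z is clear, so the ADD executes
        r["r0"] = r["r1"] + r["r2"]
    return r


def conditional_version(regs: dict) -> dict:
    """ARM sequence: CMP r3,#0 ; ADDNE r0,r1,r2 (no branch needed)."""
    r = dict(regs)
    z = cmp_sets_z(r["r3"], 0)
    # ADDNE writes r0 only when the NE condition holds (Z clear).
    r["r0"] = r["r1"] + r["r2"] if not z else r["r0"]
    return r


if __name__ == "__main__":
    for r3 in (0, 7):
        regs = {"r0": 99, "r1": 2, "r2": 3, "r3": r3}
        print(r3, branch_version(regs), conditional_version(regs))
```

Both versions compute the same result; the ARM form simply folds the test into the ADD, which removes one instruction and the forward branch.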