DSP Ringer ASIC

EMBEDDED TECHNOLOGY FOR MOBILE MULTIMEDIA
19 June 2008
김철우 (Kim Chul-woo), Director / Marketing / Ph.D. in Engineering
QUALCOMM CDMA TECHNOLOGIES KOREA

MOBILE MARKET UPDATE
Source: SA, Jul07

[Figure: stacked bar chart of handset shipments by air interface, 2004-2010, in millions of units (CDMA, WCDMA, HSPA, GSM/GPRS, EDGE, Others). The values are tabulated below.]

By Interface (Mil)     2004     2005     2006     2007     2008     2009     2010
CDMA                 132.58   155.58   159.79   170.58   187.76   201.23   203.53
WCDMA                 17.57    41.98    89.07   141.83   170.61   151.45   157.26
HSPA                   0.00     0.00     2.57    23.90    66.52   185.04   286.84
GSM/GPRS             458.27   521.98   532.35   403.57   224.54   110.99    48.55
EDGE                  11.54    79.97   188.88   365.29   552.31   617.36   594.41
Others                56.64    33.61    25.78    17.37     8.98    10.45    14.63
Total                676.61   833.12   998.45  1122.55  1210.72  1276.53  1305.23

QUALCOMM MOBILE PLATFORM
• Background of Multimedia Integration
• Qualcomm Platform Overview
• Comparison with other Platforms
• Qualcomm Implementation of Features

INTEGRATED MULTIMEDIA SOLUTION
Non-Integrated Solution vs. Integrated QCT Solution

[Figure: block diagram contrasting a non-integrated design built from discrete parts with the integrated QCT chip. Labels: GPS, Video ASIC, Camera Module with DSP, Camera Module, Ringer ASIC, Ringer Chip, Apps Co-Processor, Apps Processor, 3D Gaming ASIC.]

• Complete / Integrated Solution
• Lower Device Costs
• Proven Interoperability

INCREASING MIPS FOR MOBILE MULTIMEDIA

[Figure: processing power (MIPS, 0 to 2400) of handset platforms by year, 1996 to 2008. Usage eras: Paging, Messaging, Voice Mail; Personal Information Manager; Multimedia Platform; Enhanced Platform; Convergence. Labeled platforms run from MSM2 (Intel 80186) through MSM2300, MSM3000, MSM6500, MSM6550 and MSM7200 (dual-core ARM9 + ARM11, 400 MHz, up to 740 MIPS*) to the dual-core Snapdragon CPU + ARM9 platform (1GHz, up to 2370 MIPS*). Other labels: ARM MCU, ARM7TDMI, ARM9 (146 MHz), ARM9 (225 MHz); 10 MIPS (~2.5 MHz), < 20 MIPS, 23 MIPS* (27 MHz), up to 160 MIPS*, up to 250 MIPS*.]

[Figure: storage capacity (GB, 0.01 to 10), 1994 to 2006, for Desktop PCs (3.5 inch HDD form factor) and Mobile Phones (1 inch HDD form factor). Annotation: Toshiba W41T, a cell phone with a 4 GB hard drive.]

Source: Gartner Dataquest and 3G Today (www.3gtoday.com), 2006

QUALCOMM UMTS LINE-UP
TECHNOLOGY
- INDUSTRY LEADER IN MODEM TECHNOLOGY
- BEST-IN-CLASS RF & PMIC
- HIGHEST LEVEL OF SYSTEM INTEGRATION
  - Modem, Multimedia, RF, PMIC, Software
- LEADING-EDGE PROCESS & PACKAGING
CUSTOMERS
• Over 30 customers
• Over 250 devices in design & production

[Figure: product tiers. QSD: Wireless & Consumer Convergence. MSM 7K: HLOS Phones, HSUPA / MBMS, VGA/WVGA. MSM 6K: Mainstream MM, Broadcast, QVGA/WQVGA. QSC: Single Chip, UMTS / HSDPA, Entry Level.]

QSD 8250 ARCHITECTURE

[Figure: QSD8x50 block diagram. External devices: Stereo Speakers, Stereo Earphones, Stereo Headphone, Microphone, RF, Touch Screen, Keyboard, IrDA, CE-ATA, Broadcast (UBM), Aux Display, USIM, 802.11, SD, NOR (2), NAND, BT. Peripheral Bus controllers: SD, Touch Screen, UART, Fast I2C, I2S, PCM, GPIO, SSBI, EBI2, Aux In, Aux Out. On the Cross-bar Bus: MODEM Subsystem (ARM9, QDSP4), CPU Subsystem (Core CPU, L2 Cache/TCM), DSP Subsystem (QDSP6, L2 Cache/TCM), Multimedia Accelerators (Display Proc, Video FE, Video Proc, Graphics Proc, Audio Proc), Data Subsystem (Data Mover, Crypto, IMEM), GPS, VIC & Timers. Memory Subsystem and display interfaces: EBI1, SMI1, LPDDR, TV DAC, LCD Ctlr, MDDI Host, MDDI Client, USB 2.0, USB 2.0 HS, serving the TV, Primary Display and Camera.]

SNAPDRAGON MICROPROCESSOR
• Snapdragon Core: low power, high performance superscalar CPU developed by QCT
• First to develop 1GHz CPU for battery powered wireless applications
• Low power, low leakage, 65-nm process
• Specifically designed and optimized for MSM solutions
• ARM v7 compliant; QCT is an ARM architecture licensee
• VeNum 128 bit SIMD: low power, high performance multimedia coprocessor
• Up to 2X performance boost for multimedia applications

CPU Delivers Up To 16x Performance Over Previous QCT Generations

[Figure: relative performance (scale 0 to 20) of ARM9, ARM11, SD Core, and SD Core + VeNum.]

Collaboration Between QCT and ARM on v7 Architecture

Feature            Marvell XScale PXA320   Cortex-A8 (Tiger)                    Snapdragon Core
Frequency          Up to 800MHz            < 600MHz (TI OMAP 3 is 550MHz)       up to 1GHz
Instruction set    V5                      V7                                   V7
SIMD               WMMX                    64 bit Neon                          128 bit VeNum
Performance        1000 DMIPS              1200 DMIPS                           2100 DMIPS
Power              480 mWatts              300 mWatts                           200 mWatts @ 600 MHz

NOTE: SD Core is based on ARM v7 architecture. VeNum is based on ARM SIMD Neon architecture.

MSM 7000 / QSD 8000 ARCHITECTURE: DUAL CORE SYSTEM

            Application side                   Communication side
SOFTWARE    Applications / BREW etc            Comm Protocol Stack
            L4 / WM / Linux / Symbian          L4 / Iguana
            Drivers                            Drivers
            (the two sides communicate through IPC)
HARDWARE    ARM 1136 Application Processor     ARM 926 Communication Processor
            QDSP Application DSP               QDSP Communication DSP
            HARDWARE ACCELERATOR

MULTIMEDIA IMPLEMENTATION
Qualcomm Implementation: ARM + DSP + HW
• Low Power
• Easy IPC
MONAHAN Implementation

[Figure: side-by-side block diagrams of the two implementations. Processing blocks: ARM 11 (400 MHz) or Scorpion (1GHz); Application DSP (250/600 MHz); Hardware Accelerator; ARM 11 (860 MHz); WMMX. Functions mapped onto them: Video Processor (HD support), Other Applications, Speech / Audio codec, 3D Graphics, Camera processing, 3D Graphics (Imageon, Yamato), Camera ISP, Mobile Display Processor, Stereo DAC etc.]

PERFORMANCE COMPARISON: SIMPLIFIED DESCRIPTION
Hardware or Software?

[Figure: performance of App A, App B, App C and App D on Monahan versus MSM 7200.]

QTV DECODER EXAMPLE (6260/45/55A)
• Audio codecs optimized on DSP
• Video Decode Functions split between ARM & DSP
• Support for QCIF Decoding resolutions

[Figure: decoder data flow. The bitstream arrives from EFS or DS Sockets and passes through Bitstream Unpack & Demultiplex. Audio: Ping Pong Buffer, Audio Codec, Speaker. Bitstream (Motion): Variable Length Decoding, Motion Decoding, Motion Compensation, VOP Reconstruction. Bitstream (Texture): Variable Length Decoding, Inverse Scan, Inverse AC/DC Prediction (Intra Blocks), Inverse Quantization, Inverse DCT. Output: QCT Video Post Processing Engine, Color Conversion, Off-screen Buffers, LCD. Memory: 76 K Byte Data RAM, Previously Reconstructed VOP. Legend: ARM Processing, QDSP Blocks, Video Core, External Memory.]

QTV DECODER EXAMPLE (6275/6280)
• More dedicated hardware blocks to support higher bit rate and larger image size decoding
• MDP (Mobile Display Processing) hardware based assist for post processing, color conversion & display
• Support for QVGA resolutions

[Figure: decoder data flow. Bitstream Unpack & Demultiplex. Bitstream (Motion): Variable Length Decoding, Motion Decoding, Motion Compensation, VOP Reconstruction. Bitstream (Texture): Variable Length Decoding, Inverse Scan, Inverse AC/DC Prediction (Intra Blocks), Inverse Quantization, Inverse DCT, De-blocking. Memory: VOP Buffers of 465K Bytes (3 CIF buffers), Data RAM, Previously Reconstructed VOP. Legend: ARM Processing, QDSP Blocks, Video Core, External Memory.]

VIDEO CODEC FUNCTIONALITY IMPLEMENTATION
The figure below shows schematically how the MSM 6245 and the MSM 6280 each handle the individual parts of the video codec. As the figure shows, the Motion Vector Decoder, which runs on the ARM in the 6245, is implemented on the QDSP in the 6280. In addition, some functions that are implemented on the QDSP in the 6245 are implemented as a HW core in the 6280.

[Figure: the codec blocks VLC decoding, Motion Vector decoding, Inverse Prediction, Inverse Scan, Inverse Q, Inverse DCT, VOP reconstruct, Motion Compensation and Deblocking F, partitioned across ARM and QDSP for the MSM 6245 and across ARM, QDSP and HW CORE for the MSM 6280. Callouts: "Some functions handled by the QDSP in the MSM 6245 are implemented as a HW CORE in the MSM 6280!" and "The MV decoder block moves to the QDSP."]

For this reason the load on the ARM and the DSP is lower in the 6280 than in the 6245, which gives the MSM 6280 the headroom to support QVGA resolution.

INTEGRATED MULTIMEDIA FEATURES
• Migration to Integrated Solution
• Advantage & Disadvantage Analysis

INTEGRATION OF FUNCTIONALITIES
Single-Chip to accommodate the sweet-spot (volume-zone)

[Figure: discrete functions (Video, Graphics, Audio, Camera, Modem, Bluetooth etc) migrating into a single chip.]

Migration is likely to happen when the target product/market is proven with discrete components.
The first single chip usually covers only a portion of what is required!
Target Product Varies!

MODEM ENHANCEMENT & INTEGRATION

[Figure: timeline from Modem Technology Phase 1 (e.g. HSDPA 1.8, e.g. MSM 6000) to Modem Tech Phase 2 (e.g. HSUPA, e.g. MSM 7000). Shaded Area: Multimedia Component Enhancement.]

6 months for multimedia evolution
2 Years for modem tech update / Feature integration

INTEGRATED SOLUTION
ADVANTAGES
• Reduced Size
• Power Saving
• Lower Cost
• Easy IPC
DISADVANTAGES
• Long Development Period
• Increased code size
• Difficulty in test & analysis
• Redundant features
The key point is that a complete software solution must accompany the integrated solution to reinforce its advantages!
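
The texture decoding path named on the QTV decoder slides (Inverse Scan, Inverse Quantization, Inverse DCT) can be illustrated with a minimal sketch. It is not Qualcomm's implementation: the function names are hypothetical, the uniform dequantizer (coefficient = level * step) is a simplifying assumption rather than the codec's actual rule, the inverse AC/DC prediction stage is left out, and the inverse DCT is the direct textbook formula rather than the optimized form a DSP or hardware core would use.

```python
import math

N = 8  # edge length of an 8x8 texture block


def zigzag_order(n=N):
    """Return (row, col) pairs in zigzag scan order for an n x n block."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def inverse_scan(levels, n=N):
    """Place a zigzag-ordered list of n*n levels back into an n x n block."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(levels, zigzag_order(n)):
        block[r][c] = value
    return block


def inverse_quantize(block, step):
    """Assumed uniform dequantizer: coefficient = level * step."""
    return [[level * step for level in row] for row in block]


def inverse_dct(block, n=N):
    """Direct (unoptimized) 2-D inverse DCT of an n x n coefficient block."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            total = 0.0
            for u in range(n):
                for v in range(n):
                    total += (c(u) * c(v) * block[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[x][y] = (2 / n) * total
    return out


def decode_texture_block(levels, step):
    """Run the three texture stages in the order shown on the slides."""
    return inverse_dct(inverse_quantize(inverse_scan(levels), step))


# A block that carries only a DC level decodes to a flat block of pixels.
flat = decode_texture_block([4] + [0] * 63, step=2)
```

The four nested loops in inverse_dct cost 64 multiply-accumulate steps per output sample, which gives a feel for why the slides move this stage from the ARM to the QDSP and then to dedicated hardware as the resolution grows from QCIF to QVGA.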

IMPORTANCE OF OPEN OS
• Open OS Market Update
• Migration to Open Platform

OPEN OS HANDSET MARKET
Source: Informa Telecoms & Media, 9th Edition

[Figure: Unit Sales (Million, 0 to 450) of Open OS and Proprietary OS handsets, with Market Share (%, 0 to 35), 2006 to 2012.]

OPEN OS MARKET STATUS
Source: Informa Telecoms & Media, 9th Edition

[Figure: Unit Sales (Million, 0 to 250), 2006 to 2012, by OS: Symbian, Microsoft, Linux, Palm & Others, Proprietary OS.]

* Recent movement by Google Android should be considered

IMPORTANCE OF OPEN OS

[Figure: two software stacks. Left: applications A, B ... N on OS or Middleware (Rex) on Hardware Platform (MSM). Right: applications A, B, N, Q ... Z on OPEN OS on Various Hardware Platforms.]

• Growth of diversified multimedia applications
• Reuse of existing software resources
• Compatibility with wired Internet environment

CONCLUSION

CONSIDERATIONS (Mobile Multimedia)
WHEN to launch?
• Target release date is critical
WHICH market to aim for?
• Target customer
• Chipset Features
HOW much is the price?
• Feasibility of penetration
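
The By Interface table on the MOBILE MARKET UPDATE slide can be cross-checked with a few lines of arithmetic. The sketch below copies the figures from that table; the names SHIPMENTS, REPORTED_TOTAL, column_totals and share are hypothetical helpers introduced only for this illustration.

```python
YEARS = [2004, 2005, 2006, 2007, 2008, 2009, 2010]

# Shipments in millions of units, copied from the By Interface table.
SHIPMENTS = {
    "CDMA":     [132.58, 155.58, 159.79, 170.58, 187.76, 201.23, 203.53],
    "WCDMA":    [17.57, 41.98, 89.07, 141.83, 170.61, 151.45, 157.26],
    "HSPA":     [0.00, 0.00, 2.57, 23.90, 66.52, 185.04, 286.84],
    "GSM/GPRS": [458.27, 521.98, 532.35, 403.57, 224.54, 110.99, 48.55],
    "EDGE":     [11.54, 79.97, 188.88, 365.29, 552.31, 617.36, 594.41],
    "Others":   [56.64, 33.61, 25.78, 17.37, 8.98, 10.45, 14.63],
}

# The Total row as printed on the slide.
REPORTED_TOTAL = [676.61, 833.12, 998.45, 1122.55, 1210.72, 1276.53, 1305.23]


def column_totals():
    """Sum every interface row for each year."""
    return [sum(row[i] for row in SHIPMENTS.values()) for i in range(len(YEARS))]


def share(interface, year):
    """Fraction of that year's reported total held by one interface."""
    i = YEARS.index(year)
    return SHIPMENTS[interface][i] / REPORTED_TOTAL[i]


for year, computed, reported in zip(YEARS, column_totals(), REPORTED_TOTAL):
    print(f"{year}: computed {computed:.2f}, reported {reported:.2f}")

print(f"HSPA share in 2010: {share('HSPA', 2010):.1%}")
```

The computed column sums agree with the printed Total row to within rounding, and share() turns the table into the kind of per-interface percentage that the stacked chart only shows visually.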