GPU Developments 2017T

Total Page:16

File Type:pdf, Size:1020Kb

GPU Developments 2017T GPU Developments 2017 2018 GPU Developments 2017t © Copyright Jon Peddie Research 2018. All rights reserved. Reproduction in whole or in part is prohibited without written permission from Jon Peddie Research. This report is the property of Jon Peddie Research (JPR) and made available to a restricted number of clients only upon these terms and conditions. Agreement not to copy or disclose. This report and all future reports or other materials provided by JPR pursuant to this subscription (collectively, “Reports”) are protected by: (i) federal copyright, pursuant to the Copyright Act of 1976; and (ii) the nondisclosure provisions set forth immediately following. License, exclusive use, and agreement not to disclose. Reports are the trade secret property exclusively of JPR and are made available to a restricted number of clients, for their exclusive use and only upon the following terms and conditions. JPR grants site-wide license to read and utilize the information in the Reports, exclusively to the initial subscriber to the Reports, its subsidiaries, divisions, and employees (collectively, “Subscriber”). The Reports shall, at all times, be treated by Subscriber as proprietary and confidential documents, for internal use only. Subscriber agrees that it will not reproduce for or share any of the material in the Reports (“Material”) with any entity or individual other than Subscriber (“Shared Third Party”) (collectively, “Share” or “Sharing”), without the advance written permission of JPR. Subscriber shall be liable for any breach of this agreement and shall be subject to cancellation of its subscription to Reports. Without limiting this liability, Subscriber shall be liable for any damages suffered by JPR as a result of any Sharing of any Material, without advance written permission of JPR. Important Note re Sharing of Material. If Subscriber wishes to Share any Material with any Shared Third Party, it shall contact JPR and request permission to do so. JPR shall forthwith prepare an appropriate Sharing agreement, which shall set forth the name and contact information of the Shared Third Party, the terms and conditions of such Sharing, the compensation to JPR for such Sharing, and the agreement by Subscriber and the Shared Third Party that the Shared Third Party shall not, under any circumstances, Share any of the Material with any other entity or person. Disclaimers. JPR Reports contain “reviews” of various products and information. The content of these Reports represents the interpretation and analysis of statistics and information that are either generally available to the public or released by entities or individuals deemed responsible. The information in the Reports is believed to be reliable but is not guaranteed as to its accuracy or completeness. The Reports do not endorse or attempt to sell any products and should not be interpreted differently. JPR assumes no responsibility for the correct or incorrect usage of any trademarks or service marks. Page 1 of 107 © Copyright Jon Peddie Research 2018 GPU Developments 2017t Table of Contents Executive Summary ................................................................................................. 7 Introduction ............................................................................................................. 8 Scope of report......................................................................................................... 9 First quarter 2017 ........................................................................................................................... 9 AMD .......................................................................................................................................... 9 Vega10 architecture and technology ............................................................................................... 10 High Bandwidth Cache (HBC) and HBM2 technology .................................................................. 11 New NCU ....................................................................................................................................... 13 Enhanced geometry pipeline ........................................................................................................... 15 Next generation pixel engine .......................................................................................................... 16 FreeSync 2 display technology ....................................................................................................... 18 Radeon Crimson Software .............................................................................................................. 22 What do we think? .......................................................................................................................... 22 AMD Embedded claimed 75% of discrete graphics market share in global casino gaming ............ 23 AMD accused companies of infringing patents............................................................................... 25 Arm .......................................................................................................................................... 25 Imagination Technologies ........................................................................................................ 26 New PowerVR Series8XE Plus GPUs for mid-range markets ........................................................ 26 Inside Series8XE Plus GPUs ................................................................................................... 29 Software tools and availability ........................................................................................ 30 What do we think? ........................................................................................................... 31 MediaTek got some serious graphics ............................................................................... 31 What do we think? ........................................................................................................... 32 PowerVR Furian GPU architecture ................................................................................................. 32 Furian architecture highlights .......................................................................................... 33 What do we think? ........................................................................................................... 35 Intel .......................................................................................................................................... 36 Intel Xeon E3-1200 ......................................................................................................................... 38 Nvidia ....................................................................................................................................... 38 GeForce GTX 1050 Ti and 1050 for laptops ................................................................................... 38 Nvidia GTX 1080 Ti ....................................................................................................................... 39 What do we think? ........................................................................................................... 40 Qualcomm ................................................................................................................................ 40 Verisilicon-Vivanti .................................................................................................................. 44 Adjacent developments ............................................................................................................ 44 The API path ................................................................................................................................... 45 Adoption and capabilities grew for Vulkan ..................................................................................... 47 Vulkan extension sets ...................................................................................................... 48 One size fits all? ............................................................................................................... 48 What do we think? ........................................................................................................... 48 I’m seeing more than I want to, or maybe not enough .................................................................... 48 Closing comments ............................................................................................................ 51 Micron’s 3D Xpoint memory structure .................................................................................... 51 Recipient of foundation graphics patents made claims.................................................................... 52 What do we think? ........................................................................................................... 59 Page 2 of 107 © Copyright Jon Peddie Research 2018 GPU Developments 2017t Epilog ............................................................................................................................... 59 Second quarter 2016 .................................................................................................................... 59 AMD ........................................................................................................................................ 59 Arm .........................................................................................................................................
Recommended publications
  • Wind River Vxworks Platforms 3.8
    Wind River VxWorks Platforms 3.8 The market for secure, intelligent, Table of Contents Build System ................................ 24 connected devices is constantly expand- Command-Line Project Platforms Available in ing. Embedded devices are becoming and Build System .......................... 24 VxWorks Edition .................................2 more complex to meet market demands. Workbench Debugger .................. 24 New in VxWorks Platforms 3.8 ............2 Internet connectivity allows new levels of VxWorks Simulator ....................... 24 remote management but also calls for VxWorks Platforms Features ...............3 Workbench VxWorks Source increased levels of security. VxWorks Real-Time Operating Build Configuration ...................... 25 System ...........................................3 More powerful processors are being VxWorks 6.x Kernel Compatibility .............................3 considered to drive intelligence and Configurator ................................. 25 higher functionality into devices. Because State-of-the-Art Memory Host Shell ..................................... 25 Protection ..................................3 real-time and performance requirements Kernel Shell .................................. 25 are nonnegotiable, manufacturers are VxBus Framework ......................4 Run-Time Analysis Tools ............... 26 cautious about incorporating new Core Dump File Generation technologies into proven systems. To and Analysis ...............................4 System Viewer ........................
    [Show full text]
  • Reviving the Development of Openchrome
    Reviving the Development of OpenChrome Kevin Brace OpenChrome Project Maintainer / Developer XDC2017 September 21st, 2017 Outline ● About Me ● My Personal Story Behind OpenChrome ● Background on VIA Chrome Hardware ● The History of OpenChrome Project ● Past Releases ● Observations about Standby Resume ● Developmental Philosophy ● Developmental Challenges ● Strategies for Further Development ● Future Plans 09/21/2017 XDC2017 2 About Me ● EE (Electrical Engineering) background (B.S.E.E.) who specialized in digital design / computer architecture in college (pretty much the only undergraduate student “still” doing this stuff where I attended college) ● Graduated recently ● First time conference presenter ● Very experienced with Xilinx FPGA (Spartan-II through 7 Series FPGA) ● Fluent in Verilog / VHDL design and verification ● Interest / design experience with external communication interfaces (PCI / PCIe) and external memory interfaces (SDRAM / DDR3 SDRAM) ● Developed a simple DMA engine for PCI I/F validation w/Windows WDM (Windows Driver Model) kernel device driver ● Almost all the knowledge I have is self taught (university engineering classes were not very useful) 09/21/2017 XDC2017 3 Motivations Behind My Work ● General difficulty in obtaining meaningful employment in the digital hardware design field (too many students in the field, difficulty obtaining internship, etc.) ● Collects and repairs abandoned computer hardware (It’s like rescuing puppies!) ● Owns 100+ desktop computers and 20+ laptop computers (mostly abandoned old stuff I
    [Show full text]
  • Intel® Quick Sync Video Technology Guide
    WP80 Superguide 3: THE CLOUD VIDEO SUPERGUIDE JANUARY/FEBRUARY 2015 SPONSORED CONTENT Intel® Quick Sync Video Technology and Intel® Xeon® Processor Based Servers— Flexible Transcode Performance and Quality Video transcoding involves converting provides enterprise-quality HEVC and and boost image quality. Some key one compressed video format to audio codecs, Intel® VTune™ Amplifier improvements include the following: another. In the past this process has XE performance analysis tools and Video • Additional JPEG/MJPEG decode in been a compute-intensive task which Quality Caliper stream quality analyzer. the multi-format codec engine. This demanded a large amount of precious Additionally, product family members, support is on top of existing energy- CPU resources. Intel® Quick Synch Video Intel® Video Pro Analyzer and Intel® efficient, high-performance AVC (QSV) can enable hardware-accelerated Stress Bitstreams and Encoder bundles encode/decode that sustains multiple transcoding to deliver better performance enable production–scale validation and 4K and Ultra HD video streams. than transcoding on the CPU without debug of encode, transcode, and decode • A dedicated new video quality engine sacrificing quality. and playback applications. to provide extensive video processing First introduced in 2011, Intel Quick Intel Media Server Studio SDK at low power consumption Sync technology is available in the Intel® implements many codec and tools • Programmable and media-optimized Xeon® Processor E3-1200 v3 with Intel components initially in software, and EU (execution units)/samplers for HD Graphics P4600/4700 and Iris™ Pro later as hybrid (software and hardware) high quality P5200. (From here on, we’ll simply refer to or entirely in hardware.
    [Show full text]
  • Download Speeds Performance and Optimized for Long Battery Life
    ® QUALCOMM FEATURING THE LATEST IN MOBILE TECHNOLOGY. TM SNAPDRAGON Capture sharper, higher-quality images, in challenging lighting situations Qualcomm’s Spectra 14-bit dual image signal processors tap into the enhanced performance and feature enhancements that Hexagon 680 DSP’s HVX 820MOBILE PROCESSOR performance adds with amazing features like Low Light Photo and Video and Touch-to-Track where ISP and DSP work intelligently together to enhance imaging as well as track movements and improve zoom. Enabling a more immersive, intuitive and Immersive, life-like connected experience. virtual reality The Snapdragon 820 mobile processor Experience realistic, visual and audio immersion and offers many advantages: smooth VR action enabled by Snapdragon 820’s • New X12 LTE: Industry leading Heterogeneous compute platform, designed for high connectivity with LTE download speeds performance and optimized for long battery life. of up to 600 Mbps and multi-gigabit 802.11ad Wi-Fi • New Qualcomm® Kryo CPU: Delivering Next-generation maximum performance and low power consumption Kryo is QTI’s first custom computer vision 64-bit quad-core CPU, manufactured Drive more safely with object detection and enhanced in advanced 14nm FinFET LPP process navigation and enhance your smartphone camera • New Qualcomm® Adreno 530: Up to capability with features that can track faces and 40% better graphics and compute objects for a more intelligent mobile experience. performance with the Adreno 530 GPU • Qualcomm Spectra™ 14-bit dual image signal processors (ISPs) deliver high Deeply immersive resolution DSLR-quality images using heterogeneous compute for advanced 3D gaming processing and additional power The combination of Snapdragon 820’s Adreno 530 GPU savings and Kryo CPU creates enough compute performance • New Hexagon 680 DSP includes to enable console quality games and exciting, next Hexagon Vector eXtensions and Sensor generation virtual reality applications.
    [Show full text]
  • Intel® Atom™ Processor E6x5c Series-Based Platform for Embedded Computing
    PlAtfOrm brief Intel® Atom™ Processor E6x5C Series Embedded Computing Intel® Atom™ Processor E6x5C Series-Based Platform for Embedded Computing Platform Overview Available with industrial and commercial The Intel® Atom™ processor E6x5C series temperature ranges, this processor series delivers, in a single package, the benefits of provides embedded lifecycle support and the Intel® Atom™ processor E6xx combined is supported by the broad Intel® archi- with a Field-Programmable Gate Array tecture ecosystem as well as standard (FPGA) from Altera. This series offers Altera development tools. Additionally, a exceptional flexibility to incorporate a compatible, dedicated Power Management wide range of standard and user-defined Integrated Circuit (PMIC) solution may be I/O interfaces, high-speed connectivity, obtained from leading PMIC suppliers to memory interfaces, and process accelera- help minimize platform part count and tion to meet the requirements of a variety reduce bill of material costs and design of embedded applications in industrial, medi- complexity. Options include separate PMIC cal, communication, vision systems, voice and clock generator chips (available from over Internet protocol (VoIP), military, high- ROHM Co., Ltd.) or a single-chip solution performance programmable logic control- that integrates the voltage regulator and lers (PLCs) and embedded computers. clock generator (available from Dialog Semiconductor). The Intel Atom processor E6x5C series is a multi-chip, single-package device that Product Highlights reduces board footprint, lowers compo- • Single-package: A compact 37.5 x nent count, and simplifies inventory 37.5 mm, 0.8 mm ball pitch, multi-chip control and manufacturing. This compact device internally connects the Intel Atom design offers single-vendor support while processor E6xx with a user-programma- providing Intel Atom processors for ble FPGA.
    [Show full text]
  • Overview: Graphics Processing Units
    Overview: Graphics Processing Units l advent of GPUs l GPU architecture n the NVIDIA Fermi processor l the CUDA programming model n simple example, threads organization, memory model n case study: matrix multiply using shared memory n memories, thread synchronization, scheduling n case study: reductions n performance considerations: bandwidth, scheduling, resource conflicts, instruction mix u host-device data transfer: multiple GPUs, NVLink, Unified Memory, APUs l the OpenCL programming model l directive-based programming models refs: Lin & Snyder Ch 10, CUDA Toolkit Documentation, An Even Easier Introduction to CUDA (tutorial); NCI NF GPU page, Programming Massively Parallel Processors, Kirk & Hwu, Morgan-Kaufman, 2010; Cuda By Example, by Sanders and Kandrot; OpenCL web page, OpenCL in Action, by Matthew Scarpino COMP4300/8300 L18,19: Graphics Processing Units 2021 JJJ • III × 1 Advent of General-purpose Graphics Processing Units l many applications have massive amounts of mostly independent calculations n e.g. ray tracing, image rendering, matrix computations, molecular simulations, HDTV n can be largely expressed in terms of SIMD operations u implementable with minimal control logic & caches, simple instruction sets l design point: maximize number of ALUs & FPUs and memory bandwidth to take advantage of Moore’s’ Law (shown here) n put this on a co-processor (GPU); have a normal CPU to co-ordinate, run the operating system, launch applications, etc l architecture/infrastructure development requires a massive economic base for its development (the gaming industry!) n pre 2006: only specialized graphics operations (integer & float data) n 2006: ‘General Purpose’ (GPGPU): general computations but only through a graphics library (e.g.
    [Show full text]
  • EEMBC and the Purposes of Embedded Processor Benchmarking Markus Levy, President
    EEMBC and the Purposes of Embedded Processor Benchmarking Markus Levy, President ISPASS 2005 Certified Performance Analysis for Embedded Systems Designers EEMBC: A Historical Perspective • Began as an EDN Magazine project in April 1997 • Replace Dhrystone • Have meaningful measure for explaining processor behavior • Developed business model • Invited worldwide processor vendors • A consortium was born 1 EEMBC Membership • Board Member • Membership Dues: $30,000 (1st year); $16,000 (subsequent years) • Access and Participation on ALL Subcommittees • Full Voting Rights • Subcommittee Member • Membership Dues Are Subcommittee Specific • Access to Specific Benchmarks • Technical Involvement Within Subcommittee • Help Determine Next Generation Benchmarks • Special Academic Membership EEMBC Philosophy: Standardized Benchmarks and Certified Scores • Member derived benchmarks • Determine the standard, the process, and the benchmarks • Open to industry feedback • Ensures all processor/compiler vendors are running the same tests • Certification process ensures credibility • All benchmark scores officially validated before publication • The entire benchmark environment must be disclosed 2 Embedded Industry: Represented by Very Diverse Applications • Networking • Storage, low- and high-end routers, switches • Consumer • Games, set top boxes, car navigation, smartcards • Wireless • Cellular, routers • Office Automation • Printers, copiers, imaging • Automotive • Engine control, Telematics Traditional Division of Embedded Applications Low High Power
    [Show full text]
  • Intel Atom® Processor C2000 Product Family for Microservers
    Intel® Atom™ Processor C2000 Product Family for Microservers PRODUCT BRIEF Maximize Efficiency for Your Lightweight Scale-Out Workloads Extreme Density and Energy-Efficiency for Low-End, Scale-Out Workloads With a need to rapidly deliver new services, cope with massive data growth, and contain costs, cloud service providers and hosters seek increasingly efficient ways to handle the demands on their infrastructure. Today’s servers based on Intel® Xeon® processors provide leadership performance and performance per watt with the flexibility to handle a wide range of workloads and peak demands. 2nd generation 64-bit However, certain lightweight, scale-out workloads—such as basic dedicated hosting, low-end static Intel® Atom™ Processor web serving, and simple content delivery can sometimes be hosted more efficiently on larger C2000 Product Family numbers of smaller servers built for extreme power efficiency. for microservers To address this need, Intel worked with a broad giving you the flexibility to right-size your infra- • Up to 7x higher7 perfor- ecosystem of leading server manufacturers to structure without limiting software mobility and mance, up to 6x better8 develop and deliver a variety of extreme low-power interoperability as your applications evolve. systems to support an emerging server category— performance/watt Optimized Platform Support microservers. With up to a 1,000 nodes1 per Intel provides complete microserver platform 9 rack and shared power, cooling, and networking • 6-20 watt TDP solutions that simplify implementation, improve resources, microservers can help you improve data overall efficiency and enable higher node density. • Up to 1000+ server center efficiency by right-sizing infrastructure for New innovations include: 10 relatively light processing requirements.
    [Show full text]
  • GPU Developments 2018
    GPU Developments 2018 2018 GPU Developments 2018 © Copyright Jon Peddie Research 2019. All rights reserved. Reproduction in whole or in part is prohibited without written permission from Jon Peddie Research. This report is the property of Jon Peddie Research (JPR) and made available to a restricted number of clients only upon these terms and conditions. Agreement not to copy or disclose. This report and all future reports or other materials provided by JPR pursuant to this subscription (collectively, “Reports”) are protected by: (i) federal copyright, pursuant to the Copyright Act of 1976; and (ii) the nondisclosure provisions set forth immediately following. License, exclusive use, and agreement not to disclose. Reports are the trade secret property exclusively of JPR and are made available to a restricted number of clients, for their exclusive use and only upon the following terms and conditions. JPR grants site-wide license to read and utilize the information in the Reports, exclusively to the initial subscriber to the Reports, its subsidiaries, divisions, and employees (collectively, “Subscriber”). The Reports shall, at all times, be treated by Subscriber as proprietary and confidential documents, for internal use only. Subscriber agrees that it will not reproduce for or share any of the material in the Reports (“Material”) with any entity or individual other than Subscriber (“Shared Third Party”) (collectively, “Share” or “Sharing”), without the advance written permission of JPR. Subscriber shall be liable for any breach of this agreement and shall be subject to cancellation of its subscription to Reports. Without limiting this liability, Subscriber shall be liable for any damages suffered by JPR as a result of any Sharing of any Material, without advance written permission of JPR.
    [Show full text]
  • Differentiate Your Android Game with Tegra & Allegorithmic Substance
    FLAP HIGHER THAN THE BIRDS: DIFFERENTIATE YOUR ANDROID GAME WITH TEGRA & ALLEGORITHMIC SUBSTANCE Andrew Edelsten, NVIDIA Dr Sebastien Deguy, Allegorithmic GOOD MORNING! . Welcome to GTC 2014 . And yes, that is the time! GOOD MORNING! OVERVIEW . Today’s Android and Google Play Store . NVIDIA Technology — SHIELD — Tegra K1 & the Kepler GPU architecture . Hands on with Allegorithmic Substance . NVIDIA GameWorks — Libraries & Samples — Tools — TegraZone . Wrap up ANDROID IN 2014 . The world’s most popular mobile OS . Over 1 billion devices (as of September 2013) . 1 million new devices every day . Devices in over 190 countries GOOGLE PLAY STORE . Play Store has over 1 million apps and games . Over 1.5 billion apps and games downloaded each month . With great success comes great problems — How do you get noticed through the noise? GETTING NOTICED GETTING NOTICED . Today’s mobile games are often too simple or clones — Angry Birds and Flappy Birds (and 2048?) will still happen — You may win the lottery . The Wild West is becoming more mainstream — Quality games are trending higher more often . Quality games — “Quality” is a nebulous term — Game design is hugely important — Differentiation . General niceties . Graphics and next-gen visuals TODAY’S THEME Human’s are visual creatures; we like things that are different, pretty, and which stand out. Stand out from the crowd: . Be different . Get noticed . Succeed NVIDIA TECHNOLOGY NVIDIA SHIELD . Tegra 4 powered . 5 inch 720p & multitouch display . Console-grade controller . High speed Wi-Fi . Full connectivity (HDMI, Miracast, USB, MicroSD, headphone, Bluetooth) . Tuned port base reflex speakers . Pure Android . 3D dashboard SHIELD DEVELOPMENT CONSIDERATIONS .
    [Show full text]
  • CTL RFP Proposal
    State of Maine Department of Education in coordination with the National Association of State Procurement Officials PROPOSAL COVER PAGE RFP # 201210412 MULTI-STATE LEARNING TECHNOLOGY INITIATIVE Bidder’s Organization Name: CTL Chief Executive - Name/Title: Erik Stromquist / COO Tel: 800.642.3087 x 212 Fax: 503.641.5586 E-mail: [email protected] Headquarters Street Address: 3460 NW Industrial St. Headquarters City/State/Zip: Portland, OR 97210 (provide information requested below if different from above) Lead Point of Contact for Proposal - Name/Title: Michael Mahanay / GM, Sales & Marketing Tel: 800.642.3087 x 205 Fax: 503.641.5586 E-mail: [email protected] Street Address: 3460 NW Industrial St. City/State/Zip: Portland, OR 97219 Proposed Cost: $294/yr. The proposed cost listed above is for reference purposes only, not evaluation purposes. In the event that the cost noted above does not match the Bidder’s detailed cost proposal documents, then the information on the cost proposal documents will take precedence. This proposal and the pricing structure contained herein will remain firm for a period of 180 days from the date and time of the bid opening. No personnel on the multi-state Sourcing Team or any other involved state agency participated, either directly or indirectly, in any activities relating to the preparation of the Bidder’s proposal. No attempt has been made or will be made by the Bidder to induce any other person or firm to submit or not to submit a proposal. The undersigned is authorized to enter into contractual obligations on behalf of the above-named organization.
    [Show full text]
  • An Evolution of Mobile Graphics
    AN EVOLUTION OF MOBILE GRAPHICS Michael C. Shebanow Vice President, Advanced Processor Lab Samsung Electronics July 20, 20131 DISCLAIMER • The views herein are my own • They do not represent Samsung’s vision nor product plans 2 • The Mobile Market • Review of GPU Tech • GPU Efficiency • User Experience • Tech Challenges • Summary 3 The Rise of the Mobile GPU & Connectivity A NEW WORLD COMING? 4 DISCRETE GPU MARKET Flattening 5 MOBILE GPU MARKET Smart • In 2012, an estimated 800+ Phones million mobile GPUs shipped “Phablets” • ~123M tablets • ~712M smart phones Tablets • Will easily exceed 1B in the coming years • Trend: • Discrete GPU relatively flat • Mobile is growing rapidly 6 WW INTERNET TRAFFIC • Source: Cisco VNI Mobile INET IP Traffic growth Traffic • Internet traffic growth Year (TB/sec) rate (TB/sec) rate is staggering 2005 0.9 0.00 2006 1.5 65% 0.00 • 2012 total traffic is 2007 2.5 61% 0.01 13.7 GB per person 2008 3.8 54% 0.01 per month 2009 5.6 45% 0.04 2010 7.8 40% 0.10 • 2012 smart phone 2011 10.6 36% 0.23 traffic at 2012 12.4 17% 0.34 0.342 GB per person per month • 2017 smart phone traffic expected at 2.7 GB per person per month 7 WHERE ARE WE HEADED?… • Enormous quantity of GPUs • Large amount of interconnectivity • Better I/O 8 GPU Pipelines A BRIEF REVIEW OF GPU TECH 9 MOBILE GPU PIPELINE ARCHITECTURES Tile-based immediate mode rendering IA VS CCV RS PS ROP (TBIMR) Tile-based deferred IA VS CCV scene rendering (TBDR) RS PS ROP IA = input assembler VS = vertex shader CCV = cull, clip, viewport transform RS = rasterization,
    [Show full text]