Compiler Aspects of Hardware Accelerators
Total Pages: 16
File Type: pdf, Size: 1020 KB
Recommended publications
-
An Area Efficient Real- and Complex-Valued Multiply-Accumulate SIMD Unit for Digital Signal Processors
Lukas Gerlach, Guillermo Payá-Vayá, and Holger Blume
Cluster of Excellence Hearing4all, Institute of Microelectronic Systems, Leibniz Universität Hannover, Appelstr. 4, 30167 Hannover, Germany
Email: {gerlach, guipava, blume}@ims.uni-hannover.de

Abstract: This paper explores a real- and complex-valued multiply-accumulate (MAC) functional unit for digital signal processors. MAC units with single-instruction-multiple-data (SIMD) support are often used to increase the processing performance in modern signal processors. Compared to a real-valued SIMD-MAC unit, the proposed unit uses the same multipliers to also support complex-valued SIMD-MAC and butterfly operations. The area overhead for the complex mode is small. Complex-valued operations speed up signal processing algorithms and make the execution more efficient in terms of power consumption. As a case study, a fast Fourier transform (FFT) is implemented for a VLIW processor with a complex-valued SIMD butterfly extension. The proposed functional unit is quantitatively evaluated in terms of performance, silicon area, and power consumption.

In the signal processing field, the fast Fourier transform (FFT) is one of the most used transformations, which greatly pushes the performance requirements. The data parallelism inherent in FFT processing allows operating with many independent MAC operations simultaneously. Therefore, a performance increment can be achieved by MAC units with SIMD mechanisms, but many instructions are still needed to operate on the real and imaginary parts of complex numbers separately. The use of single instructions in DSPs, executing operations with complex numbers, can lead to a significant performance gain in many signal processing algorithms. A SIMD-MAC unit that can handle both complex and
-
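To make the complex-valued MAC and butterfly operations named in the Gerlach et al. abstract concrete, the following is a minimal software sketch of the arithmetic only. It assumes the textbook decomposition of one complex multiply into four real multiplies and a radix-2 butterfly of the form (x + w*y, x - w*y); the function names `complex_mac` and `butterfly` and the (re, im) tuple representation are illustrative choices, not taken from the paper, and nothing here models the proposed hardware unit or its SIMD lanes.

```python
def complex_mac(acc, a, b):
    """Return acc + a*b, where each argument is a (re, im) tuple.

    Uses four real multiplies, the operations a real-valued MAC
    datapath already provides (illustrative decomposition only).
    """
    acc_re, acc_im = acc
    a_re, a_im = a
    b_re, b_im = b
    re = acc_re + a_re * b_re - a_im * b_im
    im = acc_im + a_re * b_im + a_im * b_re
    return (re, im)


def butterfly(x, y, w):
    """Radix-2 butterfly on (re, im) tuples: returns (x + w*y, x - w*y)."""
    t_re, t_im = complex_mac((0, 0), w, y)  # t = w * y
    upper = (x[0] + t_re, x[1] + t_im)
    lower = (x[0] - t_re, x[1] - t_im)
    return upper, lower
```

Written this way, the point of a single complex instruction is visible: one call to `complex_mac` stands in for four real multiplies and four real additions that would otherwise be issued as separate instructions on the real and imaginary parts.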
TSMC Integrated Fan-Out (Info) Package Apple A10
Electronic Costing & Technology Experts
21 rue la Nouë Bras de Fer, 44200 Nantes, France
Phone: +33 (0) 240 180 916, email: [email protected], www.systemplus.fr
September 2016, Version 1, written by Stéphane ELISABETH

DISCLAIMER: System Plus Consulting provides cost studies based on its knowledge of the manufacturing and selling prices of electronic components and systems. The given values are realistic estimates which do not bind System Plus Consulting nor the manufacturers quoted in the report. System Plus Consulting is in no case responsible for the consequences related to the use which is made of the contents of this report. The quoted trademarks are property of their owners.
© 2016 by SYSTEM PLUS CONSULTING, all rights reserved.

Glossary
1. Overview / Introduction
– Executive Summary
– Reverse Costing Methodology
2. Company Profile
– Apple Inc.
– Apple Series Application processor
– Fan-Out Packaging
– TSMC Port-Folio
– TSMC inFO packaging
3. Physical Analysis
– Physical Analysis Methodology
– iPhone 7 Plus Teardown
– A10 Die removal
– A10 Package-on-Package Analysis
– A10 Package View, Dimensions
– A10 Package XRay View
– A10 Package Opening
– A10 Package Marking
– A10 Package Cross-Section
– A10 Package Cross-Section – Adhesive & Passivation
– A10 Package Cross-Section – TIVs
– A10 Package Cross-Section – Solder Balls
– A10 Package Cross-Section – RDL
– Land-Side Decoupling Capacitor Analysis
– A10 Die Analysis
– A10 Die View, Dimensions & Marking
– A10 Die Cross-Section
– A10 Die Process Characteristics
– Comparison with previous generation
– A9 vs. A10 PoP
– A9 vs. A10 Process
4. Manufacturing Process Flow
– Chip Fabrication Unit
– Packaging Fabrication Unit
– inFO Reconstitution Flow
5. Cost Analysis
– Synthesis of the cost analysis
– Main steps of economic analysis
– Yields Hypotheses
– Die Cost Analysis
– Wafer Cost
– Die Cost
– inFO Packaging Cost Analysis
– Packaging Wafer Cost
– Packaging Cost per process Steps
– Component Cost
6.
-
A Many-Core Architecture for In-Memory Data Processing
A Many-core Architecture for In-Memory Data Processing
Sandeep R Agrawal, Sam Idicula, Arun Raghavan, Evangelos Vlachos, Venkatraman Govindaraju, Venkatanathan Varadarajan (venkatanathan.varadarajan@oracle.com), Cagri Balkesen, Georgios Giannikis, Charlie Roth, Nipun Agarwal, Eric Sedlar
Oracle Labs

ABSTRACT
For many years, the highest energy cost in processing has been data movement rather than computation, and energy is the limiting factor in processor design [21]. As the data needed for a single application grows to exabytes [56], there is clearly an opportunity to design a bandwidth-optimized architecture for big data computation by specializing hardware for data movement. We present the Data Processing Unit or DPU, a shared memory many-core that is specifically designed for high bandwidth analytics workloads. The DPU contains a unique Data Movement System (DMS), which provides hardware acceleration for data movement and partitioning operations at the memory controller that is sufficient to keep up with DDR bandwidth.

ACM Reference format:
Sandeep R Agrawal, Sam Idicula, Arun Raghavan, Evangelos Vlachos, Venkatraman Govindaraju, Venkatanathan Varadarajan, Cagri Balkesen, Georgios Giannikis, Charlie Roth, Nipun Agarwal, and Eric Sedlar. 2017. A Many-core Architecture for In-Memory Data Processing. In Proceedings of MICRO-50, Cambridge, MA, USA, October 14–18, 2017, 14 pages. https://doi.org/10.1145/3123939.3123985

1 INTRODUCTION
A large number of data analytics applications in areas varying from business intelligence, health sciences and real time log and telemetry analysis already benefit from working sets that span
-
NVIDIA BlueField-2 Datasheet
NVIDIA BLUEFIELD-2 DPU
DATA CENTER INFRASTRUCTURE ON A CHIP

The NVIDIA® BlueField®-2 data processing unit (DPU) is the world’s first data center infrastructure-on-a-chip optimized for traditional enterprises’ modern cloud workloads and high performance computing. It delivers a broad set of accelerated software-defined networking, storage, security, and management services with the ability to offload, accelerate and isolate data center infrastructure. With its 200Gb/s Ethernet or InfiniBand connectivity, the BlueField-2 DPU enables organizations to transform their IT infrastructures into state-of-the-art data centers that are accelerated, fully programmable, and armed with “zero trust” security to prevent data breaches and cyber attacks.

By combining the industry-leading NVIDIA ConnectX®-6 Dx network adapter with an array of Arm® cores and infrastructure-specific offloads, BlueField-2 offers purpose-built, hardware-acceleration engines with full software programmability. Sitting at the edge of every server, BlueField-2 empowers agile, secured and high-performance cloud and artificial intelligence (AI) workloads, all while reducing the total cost of ownership and increasing data center efficiency.

The NVIDIA DOCA™ software framework enables developers to rapidly create applications and services for the BlueField-2 DPU. NVIDIA DOCA makes it easy to leverage DPU hardware accelerators, providing breakthrough data center performance, efficiency and security.

Key Features
Security
> Hardened isolation layer
> Hardware root of trust
> IPsec/TLS and AES-XTS encryption acceleration
> Connection tracking for stateful firewall and IDS/IPS
> Regular expression (RegEx) matching processor
Storage
> NVIDIA GPUDirect® Storage
> Elastic block storage enabled by BlueField SNAP storage virtualization
> Compression and decompression acceleration
> NVMe-oF acceleration
> VirtIO-blk acceleration
Networking
> RoCE, Zero Touch RoCE
-
BlueField as Platform
VISION ON NETWORKING IN THE AGE OF AI
January 2021
J.J. Vegas Olmos, L. Liss, T. Oved, Z. Binshtock, D. Goldenberg

DISCLAIMER
• I only have 15 minutes and the deck is certainly long
• I will go through some slides quickly
• We can always engage in a conversation off-line: [email protected]
• You will notice throughout the presentation that we are transmission-media agnostic
• Fiber, copper, wireless… not really relevant
• Each medium has its place and time
• Whenever I say “Currently it is like this, but it will be like that”, that is a research line that requires effort

DATA PROCESSING UNIT VISION
NVIDIA’s DPUs (Data Processing Units) are the smallest DC you can have
• A DPU is
• A network interface
• A SoC for programmability
• A GPU
• An acceleration engine….
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

A LITTLE BIT OF BACKGROUND: THE DATA CENTER IS THE NEW UNIT OF COMPUTING
Accelerated Disaggregated Infrastructure (ADI)
• Accelerated Computing
• Software defined, Hardware-accelerated
• GPU: AI & machine learning
• DPU (data processing unit)
• GPU critical for AI & machine learning
• Every workload will become AI Accelerated
• DPU essential to disaggregate resources & make composable ADI
• NVIDIA Networking
You can do 3 things with data: transport it, process it, use it. As of 2020, NVIDIA covers these three pillars with the incorporation of Mellanox Technologies

DISAGGREGATION & COMPOSABILITY
All Resources Become Virtualized and Composable
GPUs, CPUs, Ethernet Switches, Ethernet Adapter Cards (NICs), Storage, MEM/PMEM
The NVIDIA Network is the Backplane of the Data Center | Dynamically Compose the Computer You Need!
-
Apple iPhone SE 16GB 4-inch Powerhouse Smartphone!
Apple iPhone SE 16GB 4-inch Powerhouse Smartphone!
Apple makes the compact smartphone king again with its new iPhone SE. A 4-inch powerhouse that easily fits in your pocket yet delivers the same high speeds and stunning graphics as the iPhone 6s. It’s designed with an improved battery, a rear-facing 12-megapixel iSight camera with Focus Pixels, True Tone Flash and Live Photos, and FaceTime HD camera with Retina Flash for gorgeous selfies.

Same powerful chip as iPhone 6s
Engineered to bring desktop-class architecture to the iPhone, the 64-bit A9 chip delivers incredible performance and console-standard graphics in apps and games. Overall computing performance is improved by up to 70% over the previous generation, while graphics are as much as 90% faster, ensuring you get a more reactive, immersive experience. Built within the A9 chip is the M9 motion coprocessor, which is connected to the accelerometer, compass and gyroscope to improve city navigation and fitness tracking. It also works to strengthen your connection to Siri, so you don’t have to hold up your iPhone to ask your questions.

Capture amazing photos and 4K videos
Take sharp, detailed photos and brilliant 4K video with the new 12-megapixel iSight camera. It’s faster, with improved tone mapping and image stabilisation, so you can just tap the shutter and let iPhone do the rest. And with four times the resolution of 1080p Full HD video, the detail you can capture in videos is simply extraordinary.

Live Photos
Don’t just capture a still; capture the moment with Live Photos.
-
A Microarchitectural Study on Apple's A11 Bionic Processor
A Microarchitectural Study on Apple’s A11 Bionic Processor
Debrath Banerjee
Department of Computer Science, Arkansas State University, Jonesboro, AR, USA
debrath.banerjee@smail.astate.edu

Abstract: Over 10 years of iPhone generations, the world has seen revolutionary advances in iPhone processors, beginning with the APL0098 processor embedded in the first-generation iPhone. After rapid progress in microarchitecture, the iPhone market is currently dominated by Apple's new A11 Bionic SoC, shipped in the iPhone 8 and iPhone X and based on the ARM big.LITTLE architecture. The A11 is built on two performance cores that handle heavy-duty multithreaded workloads and four efficiency cores that cover more mundane tasks when required, in order to reduce power consumption. The A11 sports a new heavy-duty performance controller that allows the chip to use all six cores at the same time, a major departure from the A10 processor.

... ARM Cortex A9 CPU with ARM’s advanced SIMD extension called NEON and a dual-core PowerVR SGX543MP2 GPU. According to Apple, the A5 was clocked at 1 GHz on the iPad 2, while it could dynamically adjust its frequency to save battery life. The A5 processor came in two variants, 45 nm and 32 nm, where the 32 nm variant was said to provide 12% better battery life.

The high-performance variant, the Apple A5X, was introduced when Apple launched the third-generation iPad. This SoC had a quad-core graphics unit instead of the previous dual-core one, as well as a quad-channel memory controller that provided a memory bandwidth of 12.8 GB/s, which was about three times
-
Apple iPhone 6s Plus 5.5" 128GB 4G Silver
Apple iPhone 6s Plus 5.5" 128GB 4G Silver

Description
Apple iPhone 6s Plus - Smartphone - 4G LTE Advanced - 128 GB - CDMA / GSM - 5.5" - 1920 x 1080 pixels (401 ppi) - Retina HD - 12 MP (5 MP front camera) - silver

Features
The next generation of Multi-Touch
When the first iPhone came into the world, it brought Multi-Touch with it, a feature that has changed the way people experience technology ever since. Now comes 3D Touch, which lets you do things that were simply not possible before. And because your iPhone can now also sense how hard you press on the screen, you can interact with it, and with everything on it, in a completely new way. Faster and easier than before, too. And thanks to the new Taptic Engine, you also get real-time feedback in the form of subtle taps.

Memories for life
The world's most popular camera has become even more advanced. The 12-megapixel iSight camera takes sharp, detailed pictures, it records brilliant 4K video with up to four times the resolution of 1080p HD video, and the built-in 5-megapixel FaceTime HD camera turns your selfies into small works of art. iPhone 6s also comes with a brand-new feature: Live Photos, a new way to relive your memories. Live Photos captures the seconds before and after you take a still image, so you can see a living version of the picture with both motion and sound.

The most advanced processor in a smartphone
iPhone 6s runs on a custom-designed 64-bit A9 processor.
-
Technical Specifications
TECHNICAL SPECIFICATIONS

1. Desktop Minimum Requirements

APPLE MAC MINI BUILD (HARDWARE: MINIMUM REQUIREMENTS)
1. Motherboard: MAC MINI Motherboard
2. Processor / CPU: i3 8th Gen Intel Processor
3. Memory / RAM: 8GB DDR4
4. Hard Disk Drive or Solid State Drive: 256GB SSD
5. PSU / Power Supply: MAC MINI Built-in PSU
6. Monitor: 1920x1080 or 1080p Resolution Monitor
7. Camera/Webcam w/ Microphone: 720p / 1080p HD Web Cam (You will buy separately)

INTEL BUILD (HARDWARE: MINIMUM REQUIREMENTS)
1. Motherboard: LGA 1151 Motherboard
2. Processor / CPU: i3 7th Generation
3. Memory / RAM: 4GB RAM DDR4
4. Hard Disk Drive or Solid State Drive: 250 GB SSD
5. PSU / Power Supply: Generic Power Supply
6. Monitor: 1366x768, 720p Wide Monitor
7. Camera/Webcam w/ Microphone: 720 / 1080p HD

AMD BUILD (HARDWARE: MINIMUM REQUIREMENTS)
1. Motherboard: AM4 Motherboard
2. Processor / CPU: AMD Ryzen 3 Processors
3. Memory / RAM: 4GB RAM DDR4
4. Hard Disk Drive or Solid State Drive: 250 GB SSD
5. PSU / Power Supply: Generic Power Supply
6. Monitor: 1366x768, 720p Wide Monitor
7. Camera/Webcam w/ Microphone: 720 / 1080p HD

2. Laptop Minimum Requirements

APPLE / MAC LAPTOPS (HARDWARE: MINIMUM REQUIREMENTS)
1. Processor / CPU: i3 10th Generation Intel Processor
2. Memory / RAM: 8GB LPDDR4X Memory
3. Hard Disk Drive or Solid State Drive: 256GB SSD
4. Screen: 2560x1600 Native Resolution
5. Camera/Webcam w/ Microphone: 720p Facetime HD

INTEL LAPTOPS (HARDWARE: MINIMUM REQUIREMENTS)
1. Processor / CPU: i3 8th Generation Intel Processor
2. Memory / RAM: 4 or 8GB Memory
3. Hard Disk Drive or Solid State Drive: 128/250GB SSD
4. Screen: 13 - 15 inches, 1366x768, 720p Resolution
5.
-
SmartNICs: Current Trends in Research and Industry
SmartNICs: Current Trends in Research and Industry
Tristan Döring, Henning Stubbe∗, Kilian Holzinger∗
∗Chair of Network Architectures and Services, Department of Informatics, Technical University of Munich, Germany
Email: [email protected], [email protected], [email protected]

Abstract: With ever rising demand, modern cloud environments have had to evolve fast in recent years. One of the novel problems is the increasing speed requirements in combination with present Software-Defined Networks (SDNs). This paper gives an overview of a new hardware trend resulting from this. We illustrate the demand, development, implementation and use of the network-accelerating SmartNICs. SmartNICs tackle existing problems of NIC hardware such as the lack of flexibility, a requirement for virtualized networks. Furthermore, the SmartNIC term will be analyzed to provide a universal definition.

Index Terms: SmartNIC, network accelerator, data processing unit, fpga-based smartnic, asic-based smartnic, soc-based smartnic

2. Trends and Technological Demands in Cloud/Hosting Industry
Before diving deeper into the topic of SmartNICs, this section will give a short overview of the current state of the industry. The new guiding trend is the virtualization of networks, storage, GPUs etc. These techniques generate network-related workloads not only on network devices, as virtualization cannot independently run on e.g. NICs. The network processing can be divided into two categories, i.e. the data plane and the control plane. The control plane is responsible for the control of the network structure, i.e. communications between network devices and assigning tasks to network devices. In other words, it is the implementation of network policies.
-
Opportunities for Near Data Computing in MapReduce Workloads
OPPORTUNITIES FOR NEAR DATA COMPUTING IN MAPREDUCE WORKLOADS
by Seth Hintze Pugsley
A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science
School of Computing, The University of Utah, May 2015
Copyright © Seth Hintze Pugsley 2015. All Rights Reserved.

The University of Utah Graduate School
STATEMENT OF DISSERTATION APPROVAL
The dissertation of Seth Hintze Pugsley has been approved by the following supervisory committee members:
Rajeev Balasubramonian, Chair, date approved 12 May 2014
Alan L. Davis, Member, date approved 12 May 2014
Erik L. Brunvand, Member, date approved 3 June 2014
Feifei Li, Member, date approved 15 May 2014
Vijayalakshmi Srinivasan, Member, date approved 19 June 2014
and by Ross Whitaker, Chair/Dean of the Department/College/School of Computing, and by David B. Kieda, Dean of The Graduate School.

ABSTRACT
In-memory big data applications are growing in popularity, including in-memory versions of the MapReduce framework. The move away from disk-based datasets shifts the performance bottleneck from slow disk accesses to memory bandwidth. MapReduce is a data-parallel application, and is therefore amenable to being executed on as many parallel processors as possible, with each processor requiring high amounts of memory bandwidth. We propose using Near Data Computing (NDC) as a means to develop systems that are optimized for in-memory MapReduce workloads, offering high compute parallelism and even higher memory bandwidth. This dissertation explores three different implementations and styles of NDC to improve MapReduce execution. First, we use 3D-stacked memory+logic devices to process the Map phase on compute elements in close proximity to database splits.
-
2019 Technology Outlook
Global Information Technology
7 January 2019
2019 Technology Outlook
A year of new leadership

We see 2 milestones in 2019: 1) BigData/IoT overtaking MCD in revenue terms, 2) 5G kicking off, helping boost the new demand cycle
We expect compound-semiconductor makers to capture the most incremental value from the 5G cellular upgrade
We flag 4 secular themes for investment: bandwidth (5G/FO), HMI (3D/multi-cam), AI and density, along with 15 related stock picks

Rick Hsu, (886) 2 8758 6261, [email protected]
Robert Hsu, (886) 2 8758 6251, [email protected]

See important disclosures, including any required research certifications, beginning on page 115

What's new: 2018 was the Year of the Dog for many tech stocks in Asia, with the sector hit by turbulence caused by trade tensions, crypto volatility and iPhone weakness, on top of investors’ inflated expectations. With expectations since reset, valuations rationalised, inventory manageable, and demand diversified, 2019 should be a “refresh” year for investment with new demand leadership.

Daiwa’s 2019 tech picks
Stock       Ticker       Rating   TP*
Bandwidth
MediaTek    2454 TT      Buy      280
Inari       INRI MK      Buy      2.25
SEMCO       009150 KS    Buy      172,000