Compiler Aspects of Hardware Accelerators

Xiaochu Liu
Department of Computer Science and Engineering
University of California San Diego
La Jolla, California 92092
Email: [email protected]

Abstract—Hardware accelerators are first-class building blocks in modern computing devices. Ranging from data centers to mobile devices, hardware accelerators are designed and developed in order to run certain applications more efficiently. The interaction between hardware accelerator design and compiler support has become increasingly critical to making systems more efficient. In this report, I describe key efforts in hardware accelerators and their philosophy for interacting with compilers. I then describe my current work on building tools to generate and integrate hardware accelerators automatically.

Param. | Description            | Relation          | Classical scaling | Leakage scaling
B      | power budget           |                   | 1     | 1
A      | chip size              |                   | 1     | 1
Vt     | threshold voltage      |                   | 1/S   | 1
Vdd    | supply voltage         | ~ Vt              | 1/S   | 1
tox    | oxide thickness        |                   | 1/S   | 1/S
W, L   | transistor dimensions  |                   | 1/S   | 1/S
Isat   | saturation current     | W·Vdd/tox         | 1/S   | 1
p      | device power           | Isat·Vdd          | 1/S^2 | 1
Cgate  | gate capacitance       | W·L/tox           | 1/S   | 1/S
F      | device frequency       | Isat/(Cgate·Vdd)  | S     | S
D      | devices per chip       | A/(W·L)           | S^2   | S^2
P      | chip power             | D·p               | 1     | S^2
U      | utilization            | B/P               | 1     | 1/S^2

Table 1: Technology scaling table [3].

1. INTRODUCTION

Chip technology scaling is now a limiting factor of hardware system efficiency [1], [2], [3], [4]. Previously, in the classical scaling era, as the transistor size shrank, the power needed to drive more transistors on the same chip area did not change [5]; hence the frequency increased for free. However, in the leakage scaling era (where we are right now), the threshold voltage (Vt) no longer shrinks, which stops the supply voltage (Vdd) from shrinking any further (refer to Table 1). Given a constraint on the supply power of the chip, only part of the chip can be actively switched on and off at the same time. The other portions, which have to stay off due to the limited supply power, are idle chip area and are referred to as dark silicon. Trading the dark silicon for hardware accelerators is generally a profitable choice in terms of reducing power consumption.

Hardware accelerators are common building blocks nowadays. In addition to specialized functional units such as the floating-point unit, Intel processors added AVX for vector processing and AES-NI for cryptographic operations [6], [7]. ARM has instruction set extensions to support AES encryption/decryption and advanced SIMD (NEON) for mapping computations into hardware accelerators [8]. ARM also supports co-processor (hardware accelerator) integration through physical register mapping. GPUs are essentially large external accelerators that perform intensive parallel tasks. The latest Apple A9 chip has an M9 motion coprocessor as an accelerator to gather data from sensors (accelerometer, gyroscope, compass, and barometer) or even receive Siri commands [9]. Adding hardware accelerators has become the consensus in industry and academia for increasing system efficiency in the face of the dark silicon problem.

These hardware accelerators can be built as extensions of CPUs/GPUs, in either an ASIC (Application-Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array). CPUs and GPUs themselves are general-purpose and trade efficiency for flexibility. For CPUs, parallelism and memory bandwidth are limited. GPUs have a massive amount of parallelism; however, they are power hungry and show varying performance across domains. Accelerators built as ASICs or on FPGAs can remedy the shortcomings of CPUs and GPUs. Though an ASIC requires higher design and manufacturing costs, it can be specialized for high performance and low power for a particular application. Since it is rare to change anything after computations are hardened into an ASIC, flexibility of design becomes very important for domain-specific ASIC accelerators. An FPGA is less energy efficient than an ASIC but is re-programmable, which lowers the cost and provides a way to prototype the hardware before the actual manufacturing.

High-Level Synthesis (HLS) converts programs written in high-level programming languages into hardware written in Hardware Description Languages (HDLs). It automates the process of hardware design by shifting the burden of hardware design onto its software counterpart. Several HLS tools are available as products or research prototypes: BlueSpec, Catapult C, C-To-Silicon, LegUp, and Vivado [10], [11], [12], [13], [14]. The generated hardware can be mapped to an FPGA or an ASIC. Leveraging HLS tools to generate hardware accelerators and integrate them into systems can significantly reduce the design effort.

This report explores the design decisions of key hardware accelerator-based systems and focuses on their compiler aspects. I synthesize the compiler aspects of these efforts and divide the accelerators into three categories based on those aspects: ISA-based, configuration-based, and automatically generated. A brief introduction to my current work on generating and integrating hardware accelerators automatically is also presented at the end.

The rest of the report is organized as follows. In Sections 2, 3, and 4, the designs of key hardware accelerator-based architectures are described and their compiler aspects are summarized.
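To make the utilization row of Table 1 concrete, here is a small numeric sketch (my own illustration, not from the report; the scaling factor S = 1.4 and the unit power budget are assumed example values) deriving chip power P = D·p and utilization U = B/P under both scaling regimes:

```python
# Numeric sketch of the last two rows of Table 1.
# S is the linear scaling factor per technology generation
# (assumed example: S ~ 1.4 for a ~30% dimension shrink).
S = 1.4

def chip_power_and_utilization(device_power, device_density, budget=1.0):
    """P = D * p (total chip power), U = B / P (fraction of chip usable)."""
    P = device_density * device_power
    U = budget / P
    return P, U

# Classical (Dennardian) scaling: p shrinks as 1/S^2 while D grows as S^2,
# so chip power stays flat and the whole chip can remain active (U = 1).
P_classical, U_classical = chip_power_and_utilization(1 / S**2, S**2)

# Leakage-limited scaling: Vt and Vdd stop shrinking, so p stays ~1 while
# D still grows as S^2; chip power grows by S^2 and utilization drops 1/S^2.
P_leakage, U_leakage = chip_power_and_utilization(1.0, S**2)

print(f"classical: P={P_classical:.2f}, U={U_classical:.2f}")  # both ~1.00
print(f"leakage:   P={P_leakage:.2f}, U={U_leakage:.2f}")      # P ~1.96, U ~0.51
```

The unusable fraction compounds each generation: after two leakage-era generations U falls to roughly 1/S^4, which is exactly the dark-silicon effect described above.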
In Section 5, I introduce my recent work on building tools to generate hardware accelerators for irregular code. In Section 6, I summarize the content and give some future directions for improving the compiler aspects of hardware accelerators.

2. ISA-based accelerators

First, I describe key efforts in ISA-based hardware accelerators. They serve various domains and embody particular styles of ISA design. The ISA designs target high flexibility while maintaining the specialization of the hardware accelerator. I summarize the design decisions on compiler aspects at the end of this section.

2.1. Large-scale neural network accelerator

Diannao is a hardware accelerator targeting large-scale CNNs (Convolutional Neural Networks) and DNNs (Deep Neural Networks) [15], [16], [17]. It focuses on increasing performance and energy efficiency by reducing the run-time memory footprint. It supports rotations, which enable the reuse of neurons and avoid reloading the same neurons again and again. The Diannao hardware accelerator is synthesized into a hardware design in order to benchmark the hardware for power numbers.

Figure 1: Diannao system architecture [15].

Diannao is designed for large-scale CNNs and DNNs. For small-scale neural networks, memory is only used to store the input and output results; all of the neuron and synapse computations are hardened in the accelerator, which minimizes the execution and communication overhead between neurons. However, this fully hardened, low-overhead design does not scale, since a large neural network would take too much hardware die area.

2.2. Convolution-operation accelerator

The Convolution Engine (CE) is a customized hardware accelerator for the convolution operation, composed of map and reduce steps [18]. Using SIMD machines to compute convolutions requires many registers (quadratic in the size of the block). GPGPUs increase performance by 10 times compared with SIMD machines but cost 100 times more energy [19]. CE is designed to perform this computation pattern efficiently by reducing the unnecessary operations. The convolution pattern is widely used in computational photography and in image and video processing. A standard discrete 2-dimensional convolution has the general formula:

    (Img ∗ f)[n, m] = Σ_{l=−∞}^{+∞} Σ_{k=−∞}^{+∞} Img[k, l] × f[n−k, m−l]        (1)

Function f is a filter and Img is a mapping from locations to pixel values. The formula contains a map operation (the product of the filter function and the pixels) and a reduce operation (the summation of all the products). Abstracting these two operations to a more general format:

    (Img ∗ f)[n, m] = ...
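As an illustration of the map-and-reduce structure in Equation (1), the sketch below evaluates one output pixel of a discrete 2-D convolution. The function name and the pluggable map/reduce operators are my own hypothetical constructions, not CE's actual interface, but swapping in other operator pairs shows the more general pattern the abstraction above is reaching for:

```python
# Map/reduce sketch of Eq. (1): one output location of a 2-D convolution.
def conv2d_at(img, flt, n, m,
              map_op=lambda pixel, tap: pixel * tap,  # "map": multiply
              reduce_op=lambda acc, v: acc + v,       # "reduce": sum
              init=0):
    """Evaluate (img * flt)[n, m] over the finite support of `flt`.

    `img` and `flt` are lists of lists; out-of-range pixels are treated as
    zero padding (skipped). The defaults give standard convolution; e.g.
    map_op=lambda p, t: abs(p - t) with reduce_op=max computes a different
    stencil using the same map/reduce skeleton.
    """
    acc = init
    for k, row in enumerate(flt):
        for l, tap in enumerate(row):
            # Filter tap (k, l) pairs with pixel (n-k, m-l); by the change
            # of variables k' = n-k, l' = m-l this matches Eq. (1).
            i, j = n - k, m - l
            if 0 <= i < len(img) and 0 <= j < len(img[i]):
                acc = reduce_op(acc, map_op(img[i][j], tap))
    return acc

img = [[1, 2], [3, 4]]
box = [[1, 1], [1, 1]]  # 2x2 box filter
print(conv2d_at(img, box, 1, 1))  # sums all four pixels: 10
```

CE's efficiency argument is visible even in this toy: the inner pair of loops touches each pixel/tap pair exactly once, with no index arithmetic beyond the shift, which is the access pattern a convolution datapath can harden directly.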