Xilinx ML Suite Overview

Jim Heaton, Sr. FAE
© Copyright 2018 Xilinx

Deep Learning is Re-defining Many Applications
Deep learning explores the study of algorithms that can learn from and make predictions on data. Applications include cloud acceleration, e-commerce, security, financial, social, industrial, medical, autonomous vehicles, surveillance, IoT, and bioinformatics.

Accelerating AI Inference into Your Cloud Applications
Deep learning applications run in the cloud, on premises, and at the edge, featuring the most powerful FPGA in the cloud: the Virtex® UltraScale+™ VU9P for cloud and on-premises deployment, and the Zynq® UltraScale+™ MPSoC at the edge.

Want to try out the Xilinx ML Suite?
https://github.com/Xilinx/ml-suite

Xilinx ML Suite - AWS Marketplace
https://aws.amazon.com/marketplace/pp/B077FM2JNS?qid=1544477354556&sr=0-2&ref_=srh_res_product_title
ML Suite supported frameworks:
‒ Caffe
‒ MxNet
‒ Tensorflow
‒ Keras
‒ Python support
‒ Darknet
Jupyter notebooks available:
‒ Image Classification with Caffe
‒ Using the xfDNN Compiler w/ a Caffe Model
‒ Using the xfDNN Quantizer w/ a Caffe Model
Pre-trained models:
‒ Caffe 8/16-bit: GoogLeNet v1, ResNet50, Flowers102, Places365
‒ Python 8/16-bit: YOLOv2
‒ MxNet 8/16-bit: GoogLeNet v1
xfDNN tools:
‒ Compiler
‒ Quantizer
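
The pre-trained Caffe models listed above (for example GoogLeNet v1) can first be exercised at FP32 with the standard Caffe Python API to establish an accuracy baseline before the xfDNN tools are applied. The sketch below is a minimal baseline only; the file names are placeholders, not files shipped with the ML Suite.

```python
import numpy as np
import caffe  # standard Caffe Python bindings

# Hypothetical local copies of a pre-trained GoogLeNet v1 deploy/weights pair.
PROTOTXT = "googlenet_v1_deploy.prototxt"
WEIGHTS = "googlenet_v1.caffemodel"

caffe.set_mode_cpu()
net = caffe.Net(PROTOTXT, WEIGHTS, caffe.TEST)

# Preprocess a single image into the network's expected NCHW layout.
transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
transformer.set_transpose("data", (2, 0, 1))     # HWC -> CHW
transformer.set_raw_scale("data", 255)           # [0, 1] -> [0, 255]
transformer.set_channel_swap("data", (2, 1, 0))  # RGB -> BGR

image = caffe.io.load_image("test_image.jpg")
net.blobs["data"].data[...] = transformer.preprocess("data", image)

# FP32 forward pass; the top-5 classes serve as the reference against which
# the quantized 8/16-bit xfDNN deployment can later be compared.
output = net.forward()
probs = output["prob"][0]
print("top-5 class indices:", np.argsort(probs)[-5:][::-1])
```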

Unified Simple User Experience from Cloud to Alveo
The same develop, publish, and deploy flow applies in the cloud and on Alveo: user guides, tutorials, and examples support development; deployment is either launching a cloud instance or downloading to hardware of the user's choice; the same xfDNN apps and xDNN binaries are used in both cases.

Deep Learning Models
• Multi-Layer Perceptron: classification, universal function approximation, autoencoders
• Convolutional Neural Network: feature extraction, object detection, image segmentation
• Recurrent Neural Network: sequence and temporal data, speech to text, language translation
(Example tasks: classification ("Dog"), object detection, segmentation.)

Overlay Architecture
Custom processors exploiting Xilinx FPGA flexibility: customized overlays with an ISA architecture for optimized implementation, and easy plug-and-play with the software stack.
‒ Random Forest: configurable RF classification implementation
‒ MLP Engine: scalable sparse and dense implementation
‒ xDNN: CNN engine for large 16 nm Xilinx devices, available on AWS
‒ Deephi ESE: LSTM speech-to-text engine
‒ Deephi DPU: flexible CNN engine with an embedded focus

Rapid Feature and Performance Improvement
xDNN-v1 (Q4CY17):
‒ 500 MHz
‒ URAM for feature maps, without caching
‒ Array of accumulators with 16-bit (batch 1) and 8-bit (batch 2) support
‒ Instructions: Convolution, ReLU, MaxPool, AveragePool, Elementwise
‒ Flexible (square) kernel sizes and strides
‒ Programmable scaling
xDNN-v2 (Q2CY18):
‒ 500 MHz
‒ All xDNN-v1 features
‒ DDR caching: larger images and CNN networks
‒ Instructions: depth-wise convolution, deconvolution, convolution transpose, upsampling
‒ Rectangular kernels
‒ Non-blocking caching and pooling
xDNN-v3 (Q4CY18):
‒ 700 MHz
‒ Feature compatible with xDNN-v2
‒ New systolic array implementation: 50% higher FMAX and 2.2x lower latency
‒ Batch of 1 for the 8-bit implementation

Break Through on Peak Performance
GPUs introduce new architectures and silicon; Xilinx adapts the breakthroughs of emerging domain knowledge.
(Charts: GPU vs. FPGA deep learning peak power efficiency in GOPS/W, 2012-2021. GPU generations: Kepler, Maxwell, Pascal, Volta, with mixed FP16 for Tensor Cores and INT8 optimization. FPGA generations: UltraScale with native INT8 operation, UltraScale+, and ACAP architectures with adaptable precision.)

Seamless Deployment with Open Source Software
The stack combines open-source software from the community with the xfDNN middleware, tools, and runtime, and, from Xilinx, the xDNN processing engine.

xfDNN Flow
https://github.com/Xilinx/ml-suite
Frontend: CNTK, Caffe2, PyTorch, Tensorflow, MxNet, and Caffe framework tensor graphs are converted to ONNX, then to the Xilinx tensor graph, followed by xfDNN tensor graph optimization. The model weights and a calibration set feed the xfDNN Compiler and xfDNN Compression. At inference time an image is passed to the xfDNN runtime (Python API), which partitions execution between CPU layers and FPGA layers.

xfDNN Inference Toolbox
• Graph Compiler: Python tools to quickly compile networks from common frameworks – Caffe, MxNet and Tensorflow
• Network Optimization: automatic network optimizations for lower latency by fusing layers and buffering in on-chip memory
• xfDNN Quantizer: quickly reduces the precision of trained models for deployment, maintaining 32-bit accuracy at 8 bits within 2%

xfDNN Graph Compiler
Pass in a network and microcode for xDNN is produced, for example:

    0  XNConv conv1 7 2 16 26 2 1 1 0x1c0000 224 3 0x0 112 64
    2  XNMaxPool pool1 3 2 0 0x0 112 64 0x1c0000 56
    3  XNConv res2a_branch1 1 1 16 26 2 0 1 0x1c0000 56 64 0x0 56 256
    4  XNConv res2a_branch2a 1 1 16 26 2 1 1 0x1c0000 56 64 0x230000 56 64
    6  XNConv res2a_branch2b 3 1 16 26 2 1 1 0x230000 56 64 0x1c0000 56 64
    8  XNConv res2a_branch2c 1 1 16 26 2 0 1 0x1c0000 56 64 0x230000 56 256
    9  XNEltWise 1 0 1 0x0 0x230000 56 256 0x0
    9  XNUpload 0x0 56 256
    11 XNConv res2b_branch2a 1 1 16 26 2 1 1 0x0 56 256 0x1c0000 56 64
    13 XNConv res2b_branch2b 3 1 16 26 2 1 1 0x1c0000 56 64 0x3f0000 56 64
    15 XNConv res2b_branch2c 1 1 16 26 2 0 1 0x3f0000 56 64 0x1c0000 56 256
    16 XNEltWise 1 0 1 0x0 0x1c0000 56 256 0x0
    16 XNUpload 0x0 56 256
    18 XNConv res2c_branch2a 1 1 16 26 2 1 1 0x0 56 256 0x3f0000 56 64
    20 XNConv res2c_branch2b 3 1 16 26 2 1 1 0x3f0000 56 64 0x460000 56 64
    22 XNConv res2c_branch2c 1 1 16 26 2 0 1 0x460000 56 64 0x1c0000 56 256
    23 XNEltWise 1 0 1 0x0 0x1c0000 56 256 0x0
    23 XNUpload 0x0 56 256
    25 XNConv res3a_branch1 1 2 16 26 2 0 1 0x0 56 256 0x1c0000 28 512
    26 XNConv res3a_branch2a 1 2 16 26 2 1 1 0x0 56 256 0x3f0000 28 128
    28 XNConv res3a_branch2b 3 1 16 26 2 1 1 0x3f0000 28 128 0x428000 28 128
    30 XNConv res3a_branch2c 1 1 16 26 2 0 1 0x428000 28 128 0x2a0000 28 512
    31 XNEltWise 1 0 1 0x1c0000 0x2a0000 28 512 0x1c0000
    31 XNUpload 0x1c0000 28 512

xfDNN Network Optimization
(Diagram: an unoptimized model runs layer to layer, with separate Conv, Bias, and ReLU nodes and DDR-buffered data between layers; xfDNN intelligently fuses layers into [ReLU + Bias + Conv] nodes, streaming-optimized and buffered in on-chip URAM.)

xfDNN Network Deployment
Fused layer optimizations:
• The compiler can merge nodes: (Conv or EltWise) + ReLU, and Conv + Batch Norm
• The compiler can split nodes: Conv 1x1 stride 2 -> MaxPool + Conv 1x1 stride 1
On-chip buffering reduces latency and increases throughput:
• xfDNN analyzes the network's memory needs and optimizes the scheduler
• Applies to both fused and "One Shot" deployment
"One Shot" deploys the entire network to the FPGA:
• Optimized for fast, low-latency inference
• The entire network, schedule, and weights are loaded only once to the FPGA
(Diagram: pipelines of hardware in-line [ReLU + Bias + Conv] stages with pooling, from input to output.)
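
The "Conv + Batch Norm" merge listed above is a standard graph rewrite: the batch-norm scale and shift are folded into the convolution's weights and bias, so the fused node computes the same result in a single pass. The NumPy sketch below illustrates that folding; it is not the xfDNN compiler's actual code.

```python
import numpy as np

def fold_batchnorm_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x, W) + b - mean) / sqrt(var + eps) + beta
    into an equivalent convolution with weights W' and bias b'.

    W: conv weights, shape (out_ch, in_ch, kH, kW); b: conv bias, shape (out_ch,)
    gamma, beta, mean, var: per-output-channel batch-norm parameters
    """
    scale = gamma / np.sqrt(var + eps)          # per-output-channel scale
    W_folded = W * scale[:, None, None, None]   # scale each output filter
    b_folded = (b - mean) * scale + beta        # fold the shift into the bias
    return W_folded, b_folded

# Toy check: the folded conv matches conv followed by batch norm (1x1 conv case).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3, 1, 1)); b = rng.standard_normal(8)
gamma, beta = rng.standard_normal(8), rng.standard_normal(8)
mean, var = rng.standard_normal(8), rng.random(8) + 0.1
x = rng.standard_normal(3)                      # one pixel, 3 input channels

conv = W[:, :, 0, 0] @ x + b
bn = gamma * (conv - mean) / np.sqrt(var + 1e-5) + beta
Wf, bf = fold_batchnorm_into_conv(W, b, gamma, beta, mean, var)
assert np.allclose(bn, Wf[:, :, 0, 0] @ x + bf)
```

The same algebra applies per output channel for any kernel size, which is why the merged node can replace the Conv + Batch Norm pair with no change in results.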

xfDNN Quantizer: FP to Fixed-Point Quantization
Problem: nearly all trained models are in 32-bit floating-point, and the available Caffe and TensorFlow quantization tools take hours and produce inefficient models.
Introducing the xfDNN Quantizer: a customer-friendly toolkit that automatically analyzes floating-point ranges layer by layer and produces the fixed-point encoding that loses the least amount of information.
‒ Quantizes GoogLeNet in under a minute
‒ Quantizes 8-bit fixed-point networks to within 1-3% accuracy of 32-bit floating-point networks
‒ Extensible toolkit to maximize performance by searching for minimal viable bitwidths and pruning sparse networks
(Chart: xfDNN quantized model accuracy – fp32 vs. 8-bit top-1 and top-5 accuracy for ResNet-50, ResNet-101, ResNet-152, and GoogLeNet v1.)

xfDNN Quantizer: Fast and Easy
1) Provide the FP32 network and model, e.g. prototxt and caffemodel
2) Provide a small sample set, no labels required: 16 to 512 images
3) Specify the desired precision: quantizes to <8 bits to match Xilinx's DSP

Xilinx ML Processing Engine – xDNN
Highlights: programmable feature-set, tensor-level instructions, 700+ MHz DSP frequency (VU9P), custom network acceleration, and an expanded set of supported operations.
Supported operations:
‒ Convolution / Deconvolution / Convolution Transpose: kernel sizes W 1-15, H 1-15; strides W 1,2,4,8, H 1,2,4,8; padding Same or Valid; dilation factor 1, 2, 4
‒ Activation: ReLU
‒ Bias: value per channel
‒ Scaling: scale and shift value per channel
‒ Max Pooling: kernel sizes W 1-15, H 1-15; strides W 1,2,4,8, H 1,2,4,8; padding Same or Valid
‒ Avg Pooling: kernel sizes W 1-15, H 1-15; strides W 1,2,4,8, H 1,2,4,8; padding Same or Valid
‒ Element-wise Add: width and height must match; depth can mismatch
‒ Memory support: on-chip buffering, DDR caching
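
As an illustration of the range analysis behind the quantizer, and of the per-channel scale-and-shift support listed in the xDNN table above, the sketch below shows generic symmetric 8-bit quantization driven by observed value ranges. It is an assumption-laden example of the general technique, not the xfDNN quantizer's implementation.

```python
import numpy as np

def quantize_symmetric_int8(x, axis=0):
    """Per-channel symmetric quantization to int8.

    Each channel's scale is derived from its observed absolute maximum,
    mirroring a calibration pass over a small sample set (e.g. 16-512 images).
    Returns the int8 tensor and the per-channel scales needed to dequantize.
    """
    reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
    abs_max = np.abs(x).max(axis=reduce_axes, keepdims=True)
    scale = abs_max / 127.0                    # map [-abs_max, abs_max] -> [-127, 127]
    scale = np.where(scale == 0, 1.0, scale)   # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Example: quantize a conv weight tensor per output channel and measure error.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 3, 7, 7)).astype(np.float32)  # (out_ch, in_ch, kH, kW)
q, scale = quantize_symmetric_int8(w, axis=0)
print("max absolute quantization error:", np.abs(w - dequantize(q, scale)).max())
```

The actual quantizer goes further, searching layer by layer for the fixed-point encoding that loses the least information and, per the slides, for minimal viable bitwidths.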