Multi Core Dsps Vs Gpus TI for Distribution

Total Page:16

File Type:pdf, Size:1020Kb

Multi Core Dsps Vs Gpus TI for Distribution MultiCore DSP vs GPUs Nissim Saban Multi Core Regional Application Manager GPU DSP Texas Instruments [email protected] MultiCore DSP vs GPUs • Today’s system developer’s needs Multi Core DSP • Multicore DSP Architechture • Shared GPU and DSP features • Differences highlighted • Summery GPU Today’s system developer’s needs • Processing power (floating point) – Systems that use GPU/DSP need high processing power , usually floating point operations. – Today’s DSP/GPU can provide many GFLOPS • Connectivity – Fast data movements (bandwidth) • GPU uses PCIe • DSP has PCIe, SRIO (20Gb/s),GMII(1Gb/s), HYPER LINK(50Gb/S) – Common Standards compliance (PCIe,SRIO,LAN,USB) • High level programming tools – Ease of use and reuse (OPENCL,OPENMP,CUDA……) – Easy Platform migration • Size • Price $ The next slides discuss other parameters as well Specialization (app oriented acceleration) • When a specific well defined task is needed usually the silicon contains HW accelerators • 2 examples follow in the next slides • The HW accelerator has API that the controlling DSP can use to offload it’s computations • The accelerators are equivalent to many additional GFLOPS • Typical accelerators: Video Compress/Decompress , Communication protocols , FFT , Encryption……….. App oriented acceleration - HD Video Transcode, Encode & Decode 1GHz - TMS320DM6467T Processor Features TMS320DM6467TZUT1 Digital Video Interfaces Core Capture • ARM926EJ-S™ (MPU) at 500 MHz HD VICP 0 2x BT.656 ARM DSP 1x BT.1120 • TMS320C64x+™ DSP Core at 1 GHz TCM RAM Subsystem Subsystem Display ME IPDE 2x BT.656 Memory ARM C64x+TM MC LF 1x BT.1120 • ARM: 16K I-Cache, 8K D-Cache, 32K TCM RAM, CALC ECD 8K Boot ROM 926EJ-S DSP Stream I/O • DSP: 32KB L1P Cache/SRAM, 32KB L1D Cache/SRAM, CPU Core HD VICP 1 128KB L2 Cache/SRAM, 64K Boot ROM 500 MHz 1 GHz Video Data Conversion TCM RAM Engine Video Encoder/Decoder Capabilities DownScaler Chroma MC LF HW Menu • Real-Time HD-HD Transcoding Up to 1080p Sampler Overlay • Multi-format (mf) HD to mf HD or mf SD CALC ECD • Up to 2 real time for HD-to-SD Transcoder • Real-time HD-HD transcoding for PVR Switched Central Resource (SCR) • Video Encode and Decode • HD 1080p30 H.264 BP encode • HD 1080p60 H.264 BP decode Peripherals System Connectivity • Dual HD 1080i60/p30 MPEG-2 MP@HL decoding • Simultaneous 720p30 H.264 BP encode USB PCI G- Timer EMAC WD and 1080p30 BP/HP decode PWM EDMA 2.0 VLYNQ • Simultaneous 720p60 H.264 BP encode and BP decode 2 Timer 2 With PHY* HPI MDIO Benefits Serial Interfaces Program/Data Storage • Scalable video engine building on high-performance C64x+ DDR2 Async media DSP, low-cost local controllers, and rich suite of multi- McASP McASP 2 UART format video accelerators I C SPI Controller EMIF/ ATA 4 ch 1 ch 3 (16b/32b) NAND Applications • Transcoding (HD-HD, HD-SD) HD-Video Conferencing, HD- IP Set-Top Boxes, Digital Media Adapters, Video Surveillance, Medical Imaging bak Nyquist (C6670) with accelerators • SoC for Baseband C6670 (Nyquist) Hyper – 4x Performance of Faraday Link Layer 1 Acceleration – RTOS, Linux and GCC • Fixed/Floating C66x™ CorePac Turbo Encoder Turbo Decoder – Eight C66x DSP cores @ > 1.0 GHz New C6x Viterbi Decoder FFT / DFT – 1 MB Local L2 core – >128 GMAC, >64 GFLOP L2 Memory / Cache, CQI, Reed TFCI – 2.0 MB shared memory 1MB Muller • Navigator Uplink CR Downlink CR – Make Multi-core easy – Packet Infrastructure Memory System Interference Cancellation b Multicore Memory • Wireless Acceleration for Layer 1 & Layer 2 44 66 hh h – Enable 2 cells on Single Chip Controller cc c 33 tt t - ii – Chip architecture tailor made for 3G/4G packet i R Shared Memory ww w data D Layer 2 Acceleration D SS 2 MB S • Network Processing Engine tt t ii – IP Fast Path in hardware i RoHC Air Ciphering bb b aa a rr • Low Power Consumption System Elements r QoS RLC / MAC ee – Adaptive Voltage Scaling e TT Power Mgmt SysMon T Various and many Power saving techniques • Hyperlink Debug EDMA Scheduler Fast Path – Expansion port – Transparent to Software Peripherals and I/O • Terabit Switch Network Processing Engine sRIO 4x AIF2 6x – Reduces bottlenecks and unleashes full IPsec / GTP-U Fast Path performance of the chip SGMII 2x PCIe 2x • Multicore Debugging GPIO, I2C UART, SPI Cryptographic IEEE 1588 Multi-core Infrastructure Navigator connectivity Power (cooling , reliability….) • The DSP is general purpose and intended to be used in embedded system also • Power is a major concern today to system designers and has many impacts on price , reliability • DSPs are available in industrial temperature range • Next slide shows a Power/Performance GPP/GPU/DSP comparison example GeForce GTX 580 Source : http://www.nvidia.com/object/product-geforce-gtx-580-us.html GeForce GTX 580 FFT benchmark comparison with GPP/GPU Platform Effective Time to Power Energy complete 1024 point (Watts) per FFT complex to complex (µJ) FFT (single precision), µs GPU: nVidia Tesla C2070 0.16 225 36 GPU: nVidia Tesla C1060 0.3 188 56.4 GPP: Intel Xeon Core Duo 1.8 95 171 @ 3 GHz1 GPP: Intel Nehalam Quad 1.2 130 156 Core @ 3.2 GHz1 DSP: TI C6678 @ 1.2 GHz1 0.86 10 8.6 1For GPP and DSP, we show the effective time of completion assuming that multiple FFTs are run in parallel to the multiple cores available • When performance is distributed over power – DSP >4x better than GPU – DSP >18x better than GPP GPP/GPU information sources: • http://www.microway.com/pdfs/TeslaC2050-Fermi-Performance.pdf • http://www.fftw.org/speed/CoreDuo-3.0GHz-icc/ Real time systems (HW connectivity ,controlled response timing…..) • The typical system we are discussing has very fast inputs and outputs • The system has to meet real time constraint as frame rate , communications , latency • DSP operating system and HW architecture enables real time reliable response • Following slides show typical connections of GPU and DSP in real world • The DSP is designed to connect easily to FPGA and other HW inputs/outputs GPU environment DSP environment Please note connectivity to FPGA and to external buses C6678 DSP Architecture • The next slide shows the world of embedded processing today 16,32 bits, Accelerators, Multicores • A slide with the internal architechture of the Multicore DSP follows • The DSP has 8 functional units that can run 8 instructions per cycle • In term of “CUDA cores” each DSP has 16 “cores” • In C6678 the HYPER LINK enables to connect DSPs in pairs so you can get 32 cores @1.2GHZ Embedded processing portfolio TI Embedded Processors Microcontrollers (MCUs) ARM®-Based Processors Digital Signal Processors (DSPs) 16-bit ultra- 32-bit 32-bit ARM Ultra ARM DSP Multi-core low power real-time Cortex™-M3 Low power Cortex-A8 & DSP+ARM DSP MCUs MCUs MCUs ARM9™ MPUs DSP ™ ™ C2000 ® Sitara™ C6000 Stellaris ™ ™ ™ ™ ™ ARM Cortex-A8 DaVinci C6000 C5000 MSP430 Delfino ARM Cortex-M3 ™ & ARM9 video processors Piccolo OMAP™ Up to 40MHz to Up to 375MHz to 300MHz to >1Ghz 320 GMACS Up to 300 MHz 25 MHz 300 MHz 100 MHz >1GHz +Accelerator +Accelerator Flash Flash, RAM Flash Cache, Cache Cache Up to 320KB RAM 1 KB to 256 KB 16 KB to 512 KB 8 KB to 256 KB RAM, ROM RAM, ROM RAM, ROM Up to 128KB ROM Analog I/O, ADC PWM, ADC, USB, ENET USB, CAN, SATA, USB, ENET, SRIO, EMAC USB, ADC LCD, USB, RF 2 MAC+PHY CAN, SPI, PCIe, EMAC PCIe, SATA, SPI DMA, PCIe McBSP, SPI, I2C CAN, SPI, I C ADC, PWM, SPI Measurement, Motor Control, Connectivity,Security, Industrial automation, Floating/Fixed Point Telecom test & meas, Audio, Voice Sensing, General Digital Power, Motion Control, HMI, POS & portable Video, Audio, Voice, media gateways, Purpose Lighting, Ren. Energy Industrial Automation data terminals Security, Conferencing base stations Medical, Biometrics $0.25 to $9.00 $1.50 to $20.00 $1.00 to $8.00 $5.00 to $25.00 $5.00 to $200.00 $40.00 to $200.00 $3.00 to $10.00 Software & Dev. Tools MPUs – Microprocessors TMX320C6678/4/2 processors are ideal for Applications such as Key Advantages • Mission Critical • Military • Support for fixed & floating point • Avionics processing • Public Safety • Focused security & network • Medical Imaging coprocessors • Ultrasound • Multiple high speed peripherals • Endoscopy • Enhanced memory architecture • MRI / CT Scan with large on chip memory • Emerging modalities • New multicore navigator for fast • Test & Automation data transfers • Wafer/LCD inspection • Niche printing/scanning • Switch fabric with 2 terabits of • Emerging Video bandwidth • New multicore focused debug and performance profiling capabilities back Multicore High Performance DSP C66x Core Multicore Navigator Next generation Fixed / Floating-Point Data transfer engine that is architected to move DSP core with clock speeds ranging data between various system elements without using any CPU overhead so maximum system from 1GHz – 1.25GHz and Up to 8 efficiency is achieved core options a b Network Co- Processor and Accelerators A cost effective implementation to Memory Architecture b off-load the wireless and secure Up to 8MB of combined memory /%t '' / /%t '' networking functions from the DSP with an enhanced memory architecture that allows fast on- and [ / w!a Y. tttt off-chip memory access including a eeee a { NNNN aaaa rrrr !!!!!! a a " # !!!!!! a a " # { 9 DDR3-1600MHz (64-bit) interface eeee /$ /$ TTTT %%% t %%% t {#' a $ www {#' a $ (8GB of addressable space). www {+ a " # 555 {+ a " # a " 555 a " TeraNet 555 555 a . 5! 95a ! Switch fabric that has 2 Terabits D 9 { of bandwidth which allows t Lh maximum data transfer between {" w0 system components to realize full
Recommended publications
  • OMAP2430 Applications Processor
    TM Technology for Innovators OMAP2430 applications processor All-in-One Key features entertainment • Advanced Imaging, Video and Audio Accelerator (IVA™ 2) boosts video for 3G mobile performance in mobile phones by up to 4X and imaging performance by up to 1.5X phones • Delivering a multimedia experience with consumer electronics quality to the handset • Multi-engine parallel processing architecture for supporting complex usage scenarios • Built-in M-Shield™ mobile security technology enables value-added services and terminal security • Support for all major High Level Operating Systems (HLOS) aids applications development PRODUCT BULLETIN Leading imaging The new OMAP2430 processor from Texas Instruments (TI) delivers a new and video level of multimedia performance to third-generation (3G) mobile phones and performance other handheld systems. Leveraging TI’s proven OMAP™ 2 architecture, the OMAP2430 features an advanced imaging, video and audio accelerator (IVA™ 2) that provides a 4X improvement in video processing and 1.5X improvement in image processing over previously available solutions for mobile phones. The processor’s high video performance enables advanced codec algorithms that promote higher compression ratios allowing networks to support more data, bringing down costs for service providers and allowing them to deploy revenue- generating services such as mobile digital TV and mobile-to-mobile gaming. Support for these services, plus high-resolution, high-speed decode of standard video compression algorithms, brings a TV-like viewing quality and the familiar features of consumer electronics to mobile communications. The OMAP2430 is optimized for the complex applications characteristic of 3G wireless communications. Offering even higher performance than first-generation OMAP 2 processors while at a lower cost, the OMAP2430 processor provides the ultimate balance between multimedia performance, flexibility, power and cost.
    [Show full text]
  • NVIDIA GEFORCE GT 630 Step up to Better Video, Photo, Game, and Web Performance
    NVIDIA GEFORCE GT 630 Step up to better video, photo, game, and web performance. Every PC deserves dedicated graphics. The NVIDIA® GeForce® GT 630 graphics card delivers a premium multimedia experience “Sixty percent of integrated and reliable gaming—every time. Insist on NVIDIA dedicated graphics owners want dedicated graphics for the faster, more immersive performance your graphics in their next PC1”. customers want. GEFORCE GT 630 TECHNICAL SPECIFICATIONS CUDA CORES > 96 GRAPHICS CLOCK > 810 MHz PROCESSOR CLOCK > 1620 MHz Deliver more Accelerate performance by up Give customers the freedom performance—and fun— to 5x over today’s integrated to connect their PC to any MEMORY INTERFACE with amazing HD video, graphics solutions2 and provide 3D-enabled TV for a rich, > 128 Bit photo, web, and gaming. additional dedicated memory. cinematic Blu-ray 3D experience FRAME BUFFER in their homes4. > 512/1024 MB GDDR5 or 1024 MB DDR3 MICROSOFT DIRECTX > 11 BEST-IN-CLASS PERFORMANCE MICROSOFT DIRECTCOMPUTE > Yes BLU-RAY 3D4 > Yes TRUEHD AND DTS-HD AUDIO BITSTREAMING > Yes NVIDIA PHYSX™-READY > Yes MAX. ANALOG / DIGITAL RESOLUTION > 2048x1536 (analog) / 2560x1600 (digital) DISPLAY CONNECTORS > Dual Link DVI, HDMI, VGA NVIDIA GEFORCE GT 630 | SELLSHEET | MAY 12 NVIDIA® GEFORCE® GT 630 Features Benefits HD VIDEOS Get stunning picture clarity, smooth video, accurate color, and precise image scaling for movies and video with NVIDIA PureVideo® HD technology. WEB ACCELERATION Enjoy a 2x faster web experience with the latest generation of GPU-accelerated web browsers (Internet Explorer, Google Chrome, and Firefox) 5. PHOTO EDITING Perfect and share your photos in half the time with popular apps like Adobe® Photoshop® and Nik Silver EFX Pro 25.
    [Show full text]
  • GS40 0.11-Μm CMOS Standard Cell/Gate Array
    GS40 0.11-µm CMOS Standard Cell/Gate Array Version 1.0 January 29, 2001 Copyright Texas Instruments Incorporated, 2001 The information and/or drawings set forth in this document and all rights in and to inventions disclosed herein and patents which might be granted thereon disclosing or employing the materials, methods, techniques, or apparatus described herein are the exclusive property of Texas Instruments. No disclosure of information or drawings shall be made to any other person or organization without the prior consent of Texas Instruments. IMPORTANT NOTICE Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete. All products are sold subject to the terms and conditions of sale supplied at the time of order acknowledgement, including those pertaining to warranty, patent infringement, and limitation of liability. TI warrants performance of its semiconductor products to the specifications applicable at the time of sale in accordance with TI’s standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this war- ranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements. Certain applications using semiconductor products may involve potential risks of death, personal injury, or severe property or environmental damage (“Critical Applications”). TI SEMICONDUCTOR PRODUCTS ARE NOT DESIGNED, AUTHORIZED, OR WAR- RANTED TO BE SUITABLE FOR USE IN LIFE-SUPPORT DEVICES OR SYSTEMS OR OTHER CRITICAL APPLICATIONS.
    [Show full text]
  • Integrated Step-Down Converter with SVID Interface for Intel® CPU
    Product Order Technical Tools & Support & Folder Now Documents Software Community TPS53820 SLUSE33 –FEBRUARY 2020 Integrated Step-Down Converter with SVID Interface for Intel® CPU Power Supply 1 Features 3 Description The TPS53820 device is D-CAP+ mode integrated 1• Single chip to support Intel VR13.HC SVID POL applications step-down converter for low current SVID rails of Intel CPU power supply. It provides up to two outputs to • Two outputs to support VCCANA (5.5 A) and power the low current SVID rails such as VCCANA P1V8 (4 A) (5.5 A) and P1V8 (4 A). The device employs D-CAP+ • D-CAP+™ Control for Fast Transient Response mode control to provide fast load transient • Wide Input Voltage (4.5 V to 15 V) performance. Internal compensation allows ease of use and reduces external components. • Differential Remote Sense • Programmable Internal Loop Compensation The device also provides telemetry, including input voltage, output voltage, output current and • Per-Phase Cycle-by-Cycle Current Limit temperature reporting. Over voltage, over current and • Programmable Frequency from 800 kHz to 2 MHz over temperature protections are provided as well. 2 • I C System Interface for Telemetry of Voltage, The TPS53820 device is packaged in a thermally Current, Output Power, Temperature, and Fault enhanced 35-pin QFN and operates between –40°C Conditions and 125°C. • Over-Current, Over-Voltage, Over-Temperature (1) protections Table 1. Device Information • Low Quiescent Current PART NUMBER PACKAGE BODY SIZE (NOM) • 5 mm × 5 mm, 35-Pin QFN, PowerPAD Package TPS53820 RWZ (35) 5 mm × 5 mm (1) For all available packages, see the orderable addendum at 2 Applications the end of the data sheet.
    [Show full text]
  • Data Sheet: Quadro GV100
    REINVENTING THE WORKSTATION WITH REAL-TIME RAY TRACING AND AI NVIDIA QUADRO GV100 The Power To Accelerate AI- FEATURES > Four DisplayPort 1.4 Enhanced Workflows Connectors3 The NVIDIA® Quadro® GV100 reinvents the workstation > DisplayPort with Audio to meet the demands of AI-enhanced design and > 3D Stereo Support with Stereo Connector3 visualization workflows. It’s powered by NVIDIA Volta, > NVIDIA GPUDirect™ Support delivering extreme memory capacity, scalability, and > NVIDIA NVLink Support1 performance that designers, architects, and scientists > Quadro Sync II4 Compatibility need to create, build, and solve the impossible. > NVIDIA nView® Desktop SPECIFICATIONS Management Software GPU Memory 32 GB HBM2 Supercharge Rendering with AI > HDCP 2.2 Support Memory Interface 4096-bit > Work with full fidelity, massive datasets 5 > NVIDIA Mosaic Memory Bandwidth Up to 870 GB/s > Enjoy fluid visual interactivity with AI-accelerated > Dedicated hardware video denoising encode and decode engines6 ECC Yes NVIDIA CUDA Cores 5,120 Bring Optimal Designs to Market Faster > Work with higher fidelity CAE simulation models NVIDIA Tensor Cores 640 > Explore more design options with faster solver Double-Precision Performance 7.4 TFLOPS performance Single-Precision Performance 14.8 TFLOPS Enjoy Ultimate Immersive Experiences Tensor Performance 118.5 TFLOPS > Work with complex, photoreal datasets in VR NVIDIA NVLink Connects 2 Quadro GV100 GPUs2 > Enjoy optimal NVIDIA Holodeck experience NVIDIA NVLink bandwidth 200 GB/s Realize New Opportunities with AI
    [Show full text]
  • Download Gtx 970 Driver Download Gtx 970 Driver
    download gtx 970 driver Download gtx 970 driver. Completing the CAPTCHA proves you are a human and gives you temporary access to the web property. What can I do to prevent this in the future? If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware. If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices. Another way to prevent getting this page in the future is to use Privacy Pass. You may need to download version 2.0 now from the Chrome Web Store. Cloudflare Ray ID: 67a229f54fd4c3c5 • Your IP : 188.246.226.140 • Performance & security by Cloudflare. GeForce Windows 10 Driver. NVIDIA has been working closely with Microsoft on the development of Windows 10 and DirectX 12. Coinciding with the arrival of Windows 10, this Game Ready driver includes the latest tweaks, bug fixes, and optimizations to ensure you have the best possible gaming experience. Game Ready Best gaming experience for Windows 10. GeForce GTX TITAN X, GeForce GTX TITAN, GeForce GTX TITAN Black, GeForce GTX TITAN Z. GeForce 900 Series: GeForce GTX 980 Ti, GeForce GTX 980, GeForce GTX 970, GeForce GTX 960. GeForce 700 Series: GeForce GTX 780 Ti, GeForce GTX 780, GeForce GTX 770, GeForce GTX 760, GeForce GTX 760 Ti (OEM), GeForce GTX 750 Ti, GeForce GTX 750, GeForce GTX 745, GeForce GT 740, GeForce GT 730, GeForce GT 720, GeForce GT 710, GeForce GT 705.
    [Show full text]
  • Datasheet Quadro K600
    ACCELERATE YOUR CREATIVITY NVIDIA® QUADRO® K620 Accelerate your creativity with FEATURES ® ® > DisplayPort 1.2 Connector NVIDIA Quadro —the world’s most > DisplayPort with Audio > DVI-I Dual-Link Connector 1 powerful workstation graphics. > VGA Support ™ The NVIDIA Quadro K620 offers impressive > NVIDIA nView Desktop Management Software power-efficient 3D application performance and Compatibility capability. 2 GB of DDR3 GPU memory with fast > HDCP Support bandwidth enables you to create large, complex 3D > NVIDIA Mosaic2 SPECIFICATIONS models, and a flexible single-slot and low-profile GPU Memory 2 GB DDR3 form factor makes it compatible with even the most Memory Interface 128-bit space and power-constrained chassis. Plus, an all-new display engine drives up to four displays with Memory Bandwidth 29.0 GB/s DisplayPort 1.2 support for ultra-high resolutions like NVIDIA CUDA® Cores 384 3840x2160 @ 60 Hz with 30-bit color. System Interface PCI Express 2.0 x16 Quadro cards are certified with a broad range of Max Power Consumption 45 W sophisticated professional applications, tested by Thermal Solution Ultra-Quiet Active leading workstation manufacturers, and backed by Fansink a global team of support specialists, giving you the Form Factor 2.713” H × 6.3” L, Single Slot, Low Profile peace of mind to focus on doing your best work. Whether you’re developing revolutionary products or Display Connectors DVI-I DL + DP 1.2 telling spectacularly vivid visual stories, Quadro gives Max Simultaneous Displays 2 direct, 4 DP 1.2 you the performance to do it brilliantly. Multi-Stream Max DP 1.2 Resolution 3840 x 2160 at 60 Hz Max DVI-I DL Resolution 2560 × 1600 at 60 Hz Max DVI-I SL Resolution 1920 × 1200 at 60 Hz Max VGA Resolution 2048 × 1536 at 85 Hz Graphics APIs Shader Model 5.0, OpenGL 4.53, DirectX 11.24, Vulkan 1.03 Compute APIs CUDA, DirectCompute, OpenCL™ 1 Via supplied adapter/connector/bracket | 2 Windows 7, 8, 8.1 and Linux | 3 Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process when available.
    [Show full text]
  • NVIDIA AI-On-5G Computing Platform Adopted by Leading Service and Network Infrastructure Providers
    NVIDIA AI-on-5G Computing Platform Adopted by Leading Service and Network Infrastructure Providers Fujitsu, Google Cloud, Mavenir, Radisys and Wind River to Deliver Solutions for Smart Hospitals, Factories, Warehouses and Stores GTC -- NVIDIA today announced it is teaming with Fujitsu, Google Cloud, Mavenir, Radisys and Wind River to develop solutions for NVIDIA’s AI-on-5G platform, which will speed the creation of smart cities and factories, advanced hospitals and intelligent stores. Enterprises, mobile network operators and cloud service providers that deploy the platform will be able to handle both 5G and edge AI computing in a single, converged platform. The AI-on-5G platform leverages the NVIDIA Aerial™ software development kit with the NVIDIA BlueField®-2 A100 — a converged card that combines GPUs and DPUs including NVIDIA’s “5T for 5G” solution. This creates high-performance 5G RAN and AI applications in an optimal platform to manage precision manufacturing robots, automated guided vehicles, drones, wireless cameras, self-checkout aisles and hundreds of other transformational projects. “In this era of continuous, accelerated computing, network operators are looking to take advantage of the security, low latency and ability to connect hundreds of devices on one node to deliver the widest range of services in ways that are cost-effective, flexible and efficient,” said Ronnie Vasishta, senior vice president of Telecom at NVIDIA. “With the support of key players in the 5G industry, we’re delivering on the promise of AI everywhere.” NVIDIA and several collaborators in this new AI-on-5G ecosystem are members of the O-RAN Alliance, which is developing standards for more intelligent, open, virtualized and fully interoperable mobile networks.
    [Show full text]
  • Nvidia A100 Supercharges Cfd with Altair and Google Cloud
    NVIDIA A100 SUPERCHARGES CFD WITH ALTAIR AND GOOGLE CLOUD Image courtesy of Altair Ultra-Fast CFD powered by Altair, Google TAILORED FOR YOU Cloud, and NVIDIA > Get exactly as many GPUs as you need, for exactly as long as you need them, and pay The newest A2 VM family on Google Cloud Platform was designed to just for that. meet today’s most demanding applications. With up to 16 NVIDIA® A100 Tensor Core GPUs in a node, each offering up to 20X the compute FASTER OUT OF THE BOX performance of the previous generation and a whopping total of up to > The new Ampere GPU architecture on Google Cloud A2 instances delivers 640GB of GPU memory, CFD simulations with Altair ultraFluidX™ were the unprecedented acceleration and flexibility ideal candidate to test the performance and benefits of the new normal. to power the world’s highest-performing elastic data centers for AI, data analytics, With ultraFluidX, highly resolved transient aerodynamics simulations and HPC applications. can be performed on a single server. The Lattice Boltzmann Method is a perfect fit for the massively parallel architecture of NVIDIA GPUs, THINK THE UNTHINKABLE and sets the stage for unprecedented turnaround times. What used to > With a total of up to 312 TFLOPS FP32 peak be overnight runs on single nodes now becomes possible within working performance in the same node linked with NVLink technology, with an aggregate hours by utilizing state-of-the-art GPU-optimized algorithms, the new bandwidth of 9.6TB/s, new solutions to NVIDIA A100, and Google Cloud’s A2 VM family, while delivering the high many tough challenges are within reach.
    [Show full text]
  • NVIDIA and Tech Leaders Team to Build GPU-Accelerated Arm Servers for New Era of Diverse HPC Architectures
    NVIDIA and Tech Leaders Team to Build GPU-Accelerated Arm Servers for New Era of Diverse HPC Architectures Arm, Ampere, Cray, Fujitsu, HPE, Marvell to Build NVIDIA GPU-Accelerated Servers for Hyperscale-Cloud to Edge, Simulation to AI, High-Performance Storage to Exascale Supercomputing SC19 -- NVIDIA today introduced a reference design platform that enables companies to quickly build GPU-accelerated Arm®-based servers, driving a new era of high performance computing for a growing range of applications in science and industry. Announced by NVIDIA founder and CEO Jensen Huang at the SC19 supercomputing conference, the reference design platform -- consisting of hardware and software building blocks -- responds to growing demand in the HPC community to harness a broader range of CPU architectures. It allows supercomputing centers, hyperscale-cloud operators and enterprises to combine the advantage of NVIDIA's accelerated computing platform with the latest Arm-based server platforms. To build the reference platform, NVIDIA is teaming with Arm and its ecosystem partners -- including Ampere, Fujitsu and Marvell -- to ensure NVIDIA GPUs can work seamlessly with Arm-based processors. The reference platform also benefits from strong collaboration with Cray, a Hewlett Packard Enterprise company, and HPE, two early providers of Arm-based servers. Additionally, a wide range of HPC software companies have used NVIDIA CUDA-X™ libraries to build GPU-enabled management and monitoring tools that run on Arm-based servers. “There is a renaissance in high performance computing,'' Huang said. “Breakthroughs in machine learning and AI are redefining scientific methods and enabling exciting opportunities for new architectures. Bringing NVIDIA GPUs to Arm opens the floodgates for innovators to create systems for growing new applications from hyperscale-cloud to exascale supercomputing and beyond.'' Rene Haas, president of the IP Products Group at Arm, said: “Arm is working with ecosystem partners to deliver unprecedented performance and efficiency for exascale-class Arm-based SoCs.
    [Show full text]
  • Notes to Portfolio of Investments—March 31, 2021 (Unaudited)
    Notes to portfolio of investments—March 31, 2021 (unaudited) Shares Value Common stocks: 99.36% Communication services: 9.92% Interactive media & services: 8.14% Alphabet Incorporated Class C † 4,891 $ 10,117,669 Facebook Incorporated Class A † 21,120 6,220,474 16,338,143 Wireless telecommunication services: 1.78% T-Mobile US Incorporated † 28,582 3,581,039 Consumer discretionary: 10.99% Automobiles: 1.91% General Motors Company † 66,899 3,844,017 Internet & direct marketing retail: 3.59% Amazon.com Incorporated † 2,329 7,206,112 Multiline retail: 1.69% Dollar General Corporation 16,717 3,387,199 Specialty retail: 3.80% Burlington Stores Incorporated † 15,307 4,573,732 Ulta Beauty Incorporated † 9,936 3,071,913 7,645,645 Consumer staples: 2.81% Food & staples retailing: 1.60% Sysco Corporation 40,883 3,219,127 Household products: 1.21% Church & Dwight Company Incorporated 27,734 2,422,565 Financials: 7.38% Capital markets: 4.85% CME Group Incorporated 7,970 1,627,713 Intercontinental Exchange Incorporated 24,575 2,744,536 S&P Global Incorporated 9,232 3,257,696 The Charles Schwab Corporation 32,361 2,109,290 9,739,235 Insurance: 2.53% Chubb Limited 13,181 2,082,203 Marsh & McLennan Companies Incorporated 24,610 2,997,498 5,079,701 Health care: 14.36% Biotechnology: 1.11% Alexion Pharmaceuticals Incorporated † 14,516 2,219,642 Health care equipment & supplies: 6.29% Align Technology Incorporated † 5,544 3,002,242 Boston Scientific Corporation † 79,593 3,076,269 See accompanying notes to portfolio of investments Wells Fargo VT Opportunity
    [Show full text]
  • Quadro Vws on Google Cloud Platform
    Industry Support for Quadro vWS on Google Cloud Platform “Access to NVIDIA Quadro Virtual Workstation on the Google Cloud Platform will empower many of our customers to deploy and start using Autodesk software quickly, from anywhere. For certain workflows, customers leveraging NVIDIA T4 and RTX technology will see a big difference when it comes to rendering scenes and creating realistic 3D models and simulations. We’re excited to continue to collaborate with NVIDIA and Google to bring increased efficiency and speed to artist workflows." - Eric Bourque, Senior Software Development Manager, Autodesk “Many organizations are seeing the need for greater flexibility as they transition into a multi and hybrid cloud environment. NVIDIA Quadro Virtual Workstation alongside Citrix Workspace cloud services on the GCP Marketplace ensures today's workforce can have instant access to graphically rich 3D applications, from anywhere. With support for the latest Turing architecture-based NVIDIA T4 GPUs, we can assure Citrix Workspace users can experience interactive, high-fidelity rendering with RTX technology on GCP.” - James Hsu, Technical Alliance Director, Citrix “CL3VER has changed the way customers engage with products. Our cloud-based real-time rendering solution enables customers to interact with and visualize 3D content directly to the web. With Quadro vWS from the GCP marketplace powered by NVIDIA T4 and RTX technology, customers can now experience products on physical showrooms virtually with photorealistic quality, visualize 3D scenes in real-time,
    [Show full text]