NVIDIA® TESLA® GPU COMPUTING: REVOLUTIONIZING HIGH PERFORMANCE COMPUTING

GPUS ARE REVOLUTIONIZING COMPUTING

The high performance computing (HPC) industry’s need for computation is increasing, as large and complex computational problems become commonplace across many industry segments. Traditional CPU technology, however, is no longer capable of scaling in performance sufficiently to address this demand.

The parallel processing capability of the GPU allows it to divide complex computing tasks into thousands of smaller tasks that can be run concurrently. This ability is enabling computational scientists and researchers to address some of the world's most challenging computational problems up to several orders of magnitude faster.

This advancement represents a dramatic shift in HPC. In addition to dramatic improvements in speed, GPUs also consume less power than conventional CPU-only clusters. GPUs deliver performance increases of 10x to 100x to solve problems in minutes instead of hours, while outpacing the performance of traditional computing with x86-based CPUs alone.

From climate modeling to advances in medical tomography, NVIDIA® Tesla™ GPUs are enabling a wide variety of segments in science and industry to progress in ways that were previously impractical, or even impossible, due to technological limitations.

"The convergence of new, fast GPUs optimized for computation as well as 3-D graphics acceleration and industry-standard software development tools marks the real beginning of the GPU computing era."
Nathan Brookwood, Principal Analyst & Co-Founder, Insight64

[Figure: Representative GPU speedups across industries: 5x, digital content creation (Adobe); 18x, video transcoding (Elemental Technologies); 20x, 3D ultrasound (TechniScan); 30x, gene sequencing (University of Maryland); 36x, molecular dynamics (University of Illinois, Urbana); 50x, MATLAB computing (AccelerEyes); 80x, weather modeling (Tokyo Institute of Technology); 100x, astrophysics (RIKEN); 146x, medical imaging (University of Utah); 149x, financial simulation (Oxford).]

WHY GPU COMPUTING?

With the ever-increasing demand for more computing performance, the HPC industry is moving toward a hybrid computing model, where GPUs and CPUs work together to perform general-purpose computing tasks. Co-processing refers to the use of an accelerator, such as a GPU, to offload the CPU and increase computational efficiency.

As parallel processors, GPUs excel at tackling large amounts of similar data because the problem can be split into hundreds or thousands of pieces and calculated simultaneously. As sequential processors, CPUs are not designed for this type of computation, but they are adept at more serial tasks such as running operating systems and organizing data. NVIDIA believes in applying the most relevant processor to the specific task at hand.

Conventional CPU computing architecture can no longer support the growing HPC needs. Tesla GPU computing is delivering transformative increases in performance for a wide range of HPC industry segments.

[Figure: Single-processor performance growth relative to VAX, 1978 to 2016. CPU performance grew roughly 25% per year, then 52% per year, and has slowed to about 20% per year, while GPU growth is on track for a 100x performance advantage by 2021. Source: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th Edition.]

[Figure: The CUDA architecture, a combination of hardware and software. GPU computing applications are built on libraries and middleware, language solutions (C, C++, Fortran, Java and Python interfaces), and device-level APIs (OpenCL™, DirectCompute), all running on NVIDIA GPUs with the CUDA Parallel Computing Architecture.]

[Figure: Core comparison between a CPU and a GPU: a CPU has multiple cores, while a GPU has hundreds of cores.]

CUDA PARALLEL COMPUTING ARCHITECTURE

Multi-core programming with x86 CPUs is difficult and often results in marginal performance gains when going from 1 core to 4 cores to 16 cores. Beyond 4 cores, memory bandwidth becomes the bottleneck to further performance increases.

CUDA™ is NVIDIA's parallel computing architecture. Applications that leverage the CUDA architecture can be developed in a variety of languages and APIs, including C, C++, Fortran, OpenCL, and DirectCompute. The CUDA architecture contains hundreds of cores capable of running many thousands of parallel threads, while the CUDA programming model lets programmers focus on parallelizing their algorithms rather than the mechanics of the language.

To harness the parallel computing power of GPUs, programmers can simply modify the performance-critical portions of an application to take advantage of the hundreds of parallel cores in the GPU. The rest of the application remains the same, making the most efficient use of all cores in the system. Running a function on the GPU involves rewriting that function to expose its parallelism, then adding a few function calls to indicate which functions will run on the GPU and which on the CPU. With these modifications, the performance-critical portions of the application can now run significantly faster on the GPU (a minimal sketch of this workflow appears at the end of this section).

With support for C++, GPUs based on the Fermi architecture make parallel processing easier and accelerate performance on a wider array of applications than ever before. Just a few of the applications that can experience significant performance benefits include finite element analysis, high-precision scientific computing, sparse linear algebra, sorting, and search algorithms.

The latest generation CUDA architecture, codenamed "Fermi", is the most advanced GPU computing architecture ever built. With over three billion transistors, Fermi is making GPU and CPU co-processing pervasive by addressing the full spectrum of computing applications.

"I believe history will record Fermi as a significant milestone."
Dave Patterson, Director, Parallel Computing Research Laboratory, U.C. Berkeley, and co-author of Computer Architecture: A Quantitative Approach
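To make the porting workflow above concrete, here is a minimal illustrative CUDA C sketch; it is not taken from this brochure, and the function and variable names are hypothetical. A simple array-scaling loop is rewritten as a kernel so that each of thousands of GPU threads handles one element, while the rest of the program stays on the CPU.

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    /* Original CPU version: one loop iteration per element, executed serially. */
    void scale_cpu(float *data, float alpha, int n) {
        for (int i = 0; i < n; ++i)
            data[i] *= alpha;
    }

    /* GPU version: the loop body becomes a kernel; each thread scales one element. */
    __global__ void scale_gpu(float *data, float alpha, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= alpha;
    }

    int main(void) {
        const int n = 1 << 20;              /* one million elements */
        size_t bytes = n * sizeof(float);

        float *h_data = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i)
            h_data[i] = 1.0f;

        /* A few added calls move the data to the GPU and back. */
        float *d_data;
        cudaMalloc((void **)&d_data, bytes);
        cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

        /* Launch enough threads to cover every element in parallel. */
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scale_gpu<<<blocks, threads>>>(d_data, 2.0f, n);

        cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
        printf("h_data[0] = %f\n", h_data[0]);   /* expect 2.0 */

        cudaFree(d_data);
        free(h_data);
        return 0;
    }

The only GPU-specific additions are the __global__ qualifier, the memory-management calls, and the <<<blocks, threads>>> launch, i.e., the "few function calls" described above. The serial scale_cpu version is kept only for comparison.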

WHO BENEFITS FROM GPU COMPUTING? Computational scientists and researchers who are using GPUs to accelerate their applications are seeing results in days instead of months, even minutes instead of days.

University of Illinois: Accelerated molecular modeling enables rapid response to H1N1

Challenge: A first step in mitigating a global pandemic, like H1N1, requires quickly developing drugs to effectively treat a virus that is new and likely to evolve. This requires a compute-intensive process to determine how, in the case of H1N1, mutations of the flu virus protein could disrupt the binding pathway of Tamiflu, rendering it potentially ineffective. This determination involved a daunting simulation of a 35,000-atom system, something a group of University of Illinois, Urbana-Champaign scientists, led by John Stone, decided to tackle in a new way using GPUs. Conducting this kind of simulation on a CPU would take more than a month to calculate, and that would only amount to a single simulation, not the multiple simulations that constitute a complete study.

Solution: Stone and his team turned to the NVIDIA CUDA parallel processing architecture running on Tesla GPUs to perform their molecular modeling calculations and simulate the drug resistance of H1N1 mutations. Thanks to GPU technology, the scientists could efficiently run multiple simulations and achieve potentially life-saving results faster. In the midst of the H1N1 pandemic, the use of improved algorithms based on CUDA and running on Tesla GPUs made it possible to produce actionable results about the efficacy of Tamiflu during a single afternoon. This would have taken weeks or months of computing to produce the same results using conventional approaches.

Impact: The GPU-accelerated calculation was completed in just over an hour. The almost thousand-fold improvement in performance available through GPU computing and advanced algorithms empowered the scientists to perform "emergency computing" to study biological problems of extreme current relevance and share their results with the medical research community. This speed and performance increase not only enabled researchers to fulfill their original goal of testing Tamiflu's efficacy in treating H1N1 and its mutations, but also bought them time to make other important discoveries. Further calculations showed that genetic mutations which render the swine or avian flu resistant to Tamiflu had actually disrupted the "binding funnel," providing new understanding of a fundamental mechanism behind drug resistance.

"The benefits of GPU computing can be replicated in other research areas as well. All of this work is made speedier and more efficient thanks to GPU technology, which for us means quicker results as well as dollars and energy saved," said Stone.

[Figure: Ribosome for protein synthesis.]

Harvard University: Finding Hidden Heart Problems Faster

Challenge: Heart attacks are the leading cause of death worldwide. They are caused when plaque that has built up on artery walls dislodges and blocks the flow of blood to the heart. Up to 80% of heart attacks are caused by plaque that is not detectable by conventional medical imaging. Even viewing the 20% that is detectable requires invasive endoscopic procedures, which involve running several feet of tubing into the patient in an effort to take pictures of arterial plaque. This level of uncertainty with regard to the exact location of potentially deadly plaque poses a significant challenge for cardiologists. Historically, it has been a guessing game for heart specialists to determine if and where to place arterial stents in patients with blockages. Knowing the location of the plaque could greatly improve patient care and save lives.

Solution: A team of researchers, including doctors at Harvard Medical School and Brigham & Women's Hospital in Boston, Massachusetts, has discovered a non-invasive way to find dangerous plaque in a patient's arteries. Tapping into the computational power of GPUs, they can create a highly individualized map of blood flow within a patient in a study of blood flow called hemodynamics.

The buildup of plaque is highly correlated to the shape, or geometry, of a patient's arterial structure. Bends in an artery tend to be areas where dangerous plaque is especially concentrated. Using imaging devices like a CT scan, scientists are able to create a model of a patient's circulatory system. From there, an advanced fluid dynamics simulation of the blood flow through the patient's arteries can be conducted on a computer to identify areas of reduced endothelial shear stress on the arterial wall. A complex simulation like this one requires billions of fluid elements to be modeled as they pass through an artery system. An area of reduced shear stress indicates that plaque has formed on the interior artery walls, preventing the bloodstream from making contact with the inner wall. The overall output of the simulation provides doctors with an atherosclerotic risk map. The map provides cardiologists with the location of hidden plaque and can serve as an indicator of where stents may eventually need to be placed, and all of this knowledge is gained without invasive imaging techniques or exploratory surgery.

Impact: GPUs provide 20x more computational power and an order of magnitude more performance per dollar to the application of image reconstruction and blood flow simulation, finally making such advanced simulation techniques practical at the clinical level. Without GPUs, the amount of computing equipment required, in terms of size and expense, would render a hemodynamics approach unusable. Because it can detect dangerous arterial plaque earlier than any other method, this breakthrough is expected to save numerous lives when it is approved for deployment in hospitals and research centers.

[Figure: The 320 detector-row CT has enabled single-heart-beat coronary imaging so that the entire coronary contrast opacification can be evaluated at a single time point. The full 3D course of the arteries, in turn, allows researchers to simulate the blood flowing through it using computational fluid flow simulations and subsequently compute the endothelial shear stress.]

MotionDSP: GPUs' increasing importance in the Armed Forces

Challenge: Unmanned Aerial Vehicles (UAVs) represent the latest in high-tech weaponry deployed to strengthen and improve the military's capabilities. But with new technologies come new challenges, such as capturing actionable intelligence while flying at speeds upwards of 140 mph, 10 miles above the earth. One key feature of the UAV is that it is capable of providing a real-time stream of detailed images taken with multiple cameras situated on the vehicle itself. The challenge is that the resulting images need to be rendered, stabilized and enhanced in real time and across vast distances in order to be useful and accurate. Once they have been processed, the images can give infantry critical information about potential challenges ahead, the end goal being to ensure the safety and protection of military personnel in the field. Using CPUs alone, this process is very time consuming and doesn't allow information to be viewed in real time. As a result, military action could be based on potentially outdated intelligence data and inaccurate guides.

Solution: MotionDSP, a software company based in San Mateo, California, has developed "super-resolution" algorithms that allow it to reconstruct video with better and cleaner detail, increased resolution and reduced noise, all of which are ideal for the live streaming of video from the cameras attached to a UAV. MotionDSP's product, Ikena ISR, targets NVIDIA's CUDA GPU parallel computing architecture, allowing it to render, stabilize and enhance live video faster and more accurately than its competitors. Ikena ISR features computationally intense, advanced motion-tracking algorithms that provide the basis for sophisticated image stabilization and super-resolution video reconstruction. Perhaps most importantly, it can all be run on off-the-shelf Windows laptops and servers.

Impact: Using only CPUs to execute the kind of sophisticated video post-processing algorithms required for effective reconnaissance would result in up to six hours of processing for each hour of video, not a viable solution when real-time results are critical. In contrast, Tesla GPUs enable MotionDSP's Ikena ISR software to process any live video source in real time with less than 200 ms of latency. Moreover, instead of requiring expensive CPU-clustered computing systems to complete the work, Ikena can perform at full capacity on a standard workstation small enough to fit inside the military's field vehicles. Using NVIDIA Tesla GPUs, MotionDSP's customers, which include a variety of military-funded research groups, are making UAVs safer and more reliable while reducing deployment costs, improving simulation accuracy and dramatically boosting performance.

"With the GPU, we're bringing better video to all ISR platforms, including smaller UAVs. We're helping these platforms perform to their potential by being smarter about how we utilize COTS PC technology," said Sean Varah, chief executive officer of MotionDSP. "Our technology makes the impossible possible, and this is making our military safer and better prepared."

[Figure: "Super-resolution" algorithms allow MotionDSP to reconstruct video with better and cleaner detail, increased resolution and reduced noise.]

Bloomberg: GPUs increase accuracy and reduce processing time for bond pricing

Challenge: Financial engineering is integral to today's buying and selling decisions. Getting a mortgage and buying a home is a complex financial transaction, and for lenders, the competitive pricing and management of that mortgage is an even greater challenge. Transactions involving thousands of mortgages at once are a routine occurrence in financial markets, spurred by banks that wish to sell off loans to get their money back sooner. Known as collateralized debt obligations (CDOs) and collateralized mortgage obligations (CMOs), baskets of thousands of loans are publicly traded financial instruments. For the banks and institutional investors who buy and sell these baskets, timely pricing updates are essential because of fast-changing market conditions.

Bloomberg, one of the world's leading financial services organizations, prices CDO/CMO baskets for its customers by running powerful algorithms that model the risks and determine the price. This technique requires calculating huge amounts of data, from interest rate volatility to the payment behavior of individual borrowers. These data-intensive calculations can take hours to run with a CPU-based computing grid. Time is money, and Bloomberg wanted a new solution that would allow them to get pricing updates to their customers faster.

Solution: Bloomberg implemented an NVIDIA Tesla GPU computing solution in their datacenter. By porting this application to run on the NVIDIA CUDA parallel processing architecture to harness the power of GPUs, Bloomberg received dramatic improvements across the board. Large calculations that had previously taken up to two hours can now be completed in two minutes. Smaller runs that had taken 20 minutes can now be done in just seconds. In addition, the capital outlay for the new GPU-based solution was one-tenth the cost of an upgraded CPU solution, and further savings are being realized due to the GPU's efficient power and cooling needs.

Impact: As Bloomberg customers make CDO/CMO buying and selling decisions, they now have access to the best and most current pricing information, giving them a serious competitive trading advantage in a market where timing is everything.

"One of the challenges Bloomberg always faces is that we have very large scale. We're serving all the financial and business community and there are a lot of different instruments and models people want calculated."
Shawn Edwards, CTO, Bloomberg

ffA: Accelerating 3D seismic interpretation with GPUs for Better Results

Challenge: In the search for oil and gas, the geological information provided by seismic images of the earth is vital. By interpreting the data produced by seismic imaging surveys, geoscientists can identify the likely presence of hydrocarbon reserves and understand how to extract resources most effectively. Today, sophisticated visualization systems and computational analysis tools are used to streamline what was previously a subjective and labor-intensive process. Geoscientists must process increasing amounts of data as dwindling reserves require them to pinpoint smaller, more complex reservoirs with greater speed and accuracy.

Solution: UK-based company ffA provides world-leading 3D seismic analysis software and services to the global oil and gas industry. Its software tools extract detailed information from 3D seismic data, providing a greater understanding of complex 3D geology, improving productivity and reducing uncertainty within the interpretation process. The sophisticated tools are compute-intensive, so it can take hours, or even days, to produce results on conventional high performance workstations. With the recent release of its CUDA-enabled 3D seismic analysis application, ffA users routinely achieve over an order of magnitude speed-up compared with performance on high end multi-core CPUs.

Impact: NVIDIA CUDA is allowing ffA to provide scalable high performance computation for seismic data on hardware platforms equipped with one or more NVIDIA Tesla and FX GPUs. The latest benchmarking results using Tesla GPUs have produced performance improvements of up to 37x versus high end dual quad-core CPU workstations. This step change in performance significantly increases the amount of data that geoscientists can analyze in a given timeframe. Plus, it allows them to fully exploit the information derived from 3D seismic surveys to improve subsurface understanding and reduce risk in oil and gas exploration and exploitation.

"Access to high performance, high quality 3D computational tools on a workstation platform drastically changes the productivity curve in 3D seismic analysis and seismic interpretation, giving our users a real edge in oil and gas exploration and de-risking field development. CUDA-based GPUs power large computation tasks and interactive computational workflows that we could not hope to implement effectively otherwise," said Steve Purves, ffA Technical Director.

[Figure: The latest benchmarking results using Tesla GPUs show speedups of up to 37x for a Tesla GPU plus CPU versus a quad-core CPU alone. Data courtesy of RMOTC.]

The Shift to Parallel Computing and the Path to Exascale

Approximately every 10 years, the world of supercomputing experiences a fundamental shift in computing architectures. It was around 10 years ago that cluster-based computing largely superseded vector-based computing as the de facto standard for large-scale computing installations, and this shift saw the supercomputing industry move beyond the petaFLOP performance barrier. With the next performance target being exascale-class computing, it is time for a new shift in computing architectures: the move to parallel computing.

The shift is already underway.

In November 2008, Tokyo Institute of Technology became the first supercomputing center to enter the Top500 with a GPU-based hybrid system, a system that uses GPUs and CPUs together to deliver transformative increases in performance without breaking the bank with regards to energy consumption. The system, called TSUBAME 1.2, entered the list at number 24; it was the first Tesla GPU-based hybrid cluster to reach the Top500.

Fast forward to June 2010, and hybrid systems have started to make appearances even higher up the list. "Nebulae", a system installed at the Shenzhen Supercomputing Center in China and equipped with 4640 Tesla 20-series GPUs, made its entry into the list at number 2, just one spot behind Oak Ridge National Lab's "Jaguar", the fastest supercomputer in the world.

What is even more impressive than its overall performance is the power that it consumes. Jaguar delivers 1.77 petaflops, yet it consumes more than 7 megawatts of power to do so. To put that into context, 7 megawatts is enough energy to power 15,000 homes. In contrast, Nebulae delivers 1.27 petaflops within a power budget of just 2.55 megawatts (about 0.50 petaflops per megawatt versus roughly 0.25 for Jaguar), making it twice as power-efficient as Jaguar. This difference in computational throughput and power is owed to the massively parallel architecture of GPUs, where hundreds of cores work together within a single processor, delivering unprecedented compute density for the next generation of supercomputers.

Another very notable entry into this year's Top500 was the Chinese Academy of Sciences (CAS). The "Mole-8.5" supercomputer at CAS uses 2200 Tesla 20-series GPUs to deliver 207 teraflops, which puts it at number 19 in the Top500.

CAS is one of the world's most dynamic research and educational facilities. Prof. Wei Ge, Professor of Chemical Engineering at the Institute of Process Engineering at CAS, introduced GPU computing to the Beijing facility in 2007 to help with discrete particle and molecular dynamics simulations. Since then, parallel computing has enabled the advancement of research in dozens of other areas, including real-time simulations of industrial facilities, design and optimization of multi-phase and turbulent industrial reactors using computational fluid dynamics, optimization of secondary and tertiary oil recovery using multi-scale simulation of porous materials, simulation of nano- and micro-flow in chemical and bio-chemical processes, and much more.

These computational problems represent a tiny fraction of the entire landscape of computational challenges that we face today, and these problems are not getting any smaller. The sheer quantity of data that many scientists, engineers and researchers must analyze is increasing exponentially, and supercomputing centers are over-subscribed as demand outpaces the supply of computational resources. If we are to maintain our rate of innovation and discovery, we must take computational performance to a level where it is 1000 times faster than it is today. The GPU is a disruptive force in supercomputing and represents the only viable strategy for building exascale systems that are affordable and not prohibitive to build and operate.

"Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs."
Jack Dongarra, University Distinguished Professor, University of Tennessee

"Five years from now, the bulk of serious HPC is going to be done with some kind of accelerated heterogeneous architecture," said Steve Scott, CTO, Cray, Inc.

[Figure: "Nebulae", powered by 4640 Tesla 20-series GPUs, is one of the fastest supercomputers in the world.]

NVIDIA TESLA: World's First Computational GPU

The Tesla brand of products is designed for high-performance computing and offers exclusive computing features.

Superior Performance
> Highest double precision floating point performance
> Large HPC data sets supported by larger on-board memory
> Faster communication with InfiniBand using NVIDIA GPUDirect™

Highly Reliable
> ECC protection for uncompromised data reliability
> Stress tested for zero error tolerance
> Manufactured by NVIDIA
> Enterprise-level support that includes a three-year warranty

Designed for HPC
> Integrated into leading OEM workstations, servers, and blades

Tesla Data Center Products

Available from OEMs and certified resellers, Tesla GPU computing products are designed to supercharge your computing cluster.

TESLA GPU COMPUTING SOLUTIONS

The Tesla 20-series GPU computing solutions are designed from the ground up for high-performance computing and are based on NVIDIA's latest CUDA GPU architecture, code named "Fermi". It delivers many "must have" features for HPC, including ECC memory for uncompromised accuracy and scalability, C++ support, and 7x double precision performance compared to CPUs. Compared to typical quad-core CPUs, Tesla 20-series GPU computing products can deliver equivalent performance at 1/10th the cost and 1/20th the power consumption.

[Figure: Highest performance, highest efficiency. Comparing a CPU 1U server (2x Intel Xeon X5550 "Nehalem" 2.66 GHz, 48 GB memory, $7K, 0.55 kW) with a GPU-CPU 1U server (2x Tesla C2050 + 2x Intel Xeon X5550, 48 GB memory, $11K, 1.0 kW) on performance (Gflops), performance per dollar (Gflops/$K), and performance per watt (Gflops/kW): GPU-CPU server solutions deliver up to 8x higher Linpack performance, along with roughly 4x to 5x gains in performance per dollar and per watt.]
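As a brief, hypothetical sketch (not part of the original material), the CUDA runtime API can report the Fermi-class capabilities mentioned above, such as compute capability and whether ECC is enabled, for each Tesla board installed in a server:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void) {
        int count = 0;
        cudaGetDeviceCount(&count);

        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, dev);

            printf("Device %d: %s\n", dev, prop.name);
            /* Compute capability 2.x indicates a Fermi-class GPU. */
            printf("  Compute capability : %d.%d\n", prop.major, prop.minor);
            printf("  Global memory      : %.1f GB\n",
                   prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
            printf("  ECC enabled        : %s\n", prop.ECCEnabled ? "yes" : "no");
        }
        return 0;
    }

On a Tesla C2050, for example, such a query would be expected to report compute capability 2.0; the exact output depends on the driver version and on whether the administrator has enabled ECC.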

Tesla M2050/M2070 GPU Computing Modules enable the use of GPUs and CPUs together in an individual server node or blade form factor.

Tesla S2050 GPU Computing System is a 1U system powered by 4 Tesla GPUs and connects to a CPU server.

Tesla Workstation Products

Designed to deliver cluster-level performance on a workstation, NVIDIA Tesla GPU Computing Processors fuel the transition to parallel computing while making personal supercomputing possible, right at your desk. The Tesla C2050/C2070 GPU Computing Processor delivers the power of a cluster in the form factor of a workstation. Workstations powered by Tesla GPUs outperform conventional CPU-only solutions in life science applications.

[Figure: Highest performance, highest efficiency. CUDA speedups for a Tesla C2050 versus an Intel Xeon X5550 CPU across representative workloads: MIDG (discontinuous Galerkin solvers for PDEs), AMBER molecular dynamics (mixed precision), FFT (cuFFT 3.1, double precision), OpenEye ROCS virtual drug screening, and radix sort from the CUDA SDK.]

DEVELOPER ECOSYSTEM AND WORLDWIDE EDUCATION

In just a few years, an entire software ecosystem has developed around the CUDA architecture, from more than 350 universities worldwide teaching the CUDA programming model to a wide range of libraries, compilers and middleware that help users optimize applications for GPUs.

[Figure: The NVIDIA GPU computing ecosystem: a rich ecosystem of software applications, libraries, programming language solutions, and service providers supporting the CUDA parallel computing architecture, spanning research and education, libraries, integrated development environments, mathematical packages, languages and APIs, consultants, training and certification, and tools and partners across all major platforms.]

NVIDIA's CUDA architecture has the industry's most robust language and API support for GPU computing developers, including C, C++, OpenCL, DirectCompute, and Fortran. NVIDIA Parallel Nsight, a fully integrated development environment for Microsoft Visual Studio, is also available. Used by more than six million developers worldwide, Visual Studio is one of the world's most popular development environments for Windows-based applications and services. Adding functionality specifically for GPU computing developers, Parallel Nsight makes the power of the GPU more accessible than ever before.

In addition to the CUDA C development tools, math libraries, and hundreds of code samples in the NVIDIA GPU Computing SDK, there is also a rich ecosystem of solution providers (a brief library-usage sketch follows the Parallel Nsight overview below):

Libraries and Middleware
> Acceleware FDTD libraries
> CUBLAS, complete BLAS library*
> CUFFT, high-performance FFT routines*
> CUSP
> EM Photonics CULA Tools, heterogeneous LAPACK implementation
> NVIDIA OptiX and other AXEs*
> NVIDIA Performance Primitives for image and video processing (www.nvidia.com/npp)*
> Thrust

Compilers and Language Solutions
> CAPS HMPP
> NVIDIA CUDA C Compiler (NVCC), supporting both CUDA C and CUDA C++*
> Par4All
> PGI CUDA Fortran
> PGI Accelerator Compilers for C and Fortran
> PyCUDA

Math Packages
> Jacket by AccelerEyes
> Mathematica 8 by Wolfram
> MATLAB Distributed Computing Server (MDCS) by MathWorks
> MATLAB Parallel Computing Toolbox (PCT) by MathWorks

GPU Debugging Tools
> Allinea DDT
> Fixstars Eclipse plug-in
> NVIDIA cuda-gdb*
> NVIDIA Parallel Nsight for Visual Studio
> TotalView Debugger

GPU Performance Analysis Tools
> NVIDIA Visual Profiler*
> NVIDIA Parallel Nsight for Visual Studio
> TAU CUDA
> Vampir

Cluster and Grid Management Solutions
> Bright Cluster Manager
> NVIDIA system management interface (nvidia-smi)
> Platform Computing

NVIDIA Parallel Nsight Development Environment

NVIDIA Parallel Nsight software is the industry's first development environment for massively parallel computing integrated into Microsoft Visual Studio, the world's most popular development environment. It integrates CPU and GPU development, allowing developers to create optimal GPU-accelerated applications.

Parallel Nsight supports Microsoft Visual Studio 2010 and Tesla 20-series GPUs built on the latest generation CUDA parallel computing architecture, codenamed "Fermi", and provides advanced debugging and analysis capabilities. For more information, visit developer.nvidia.com/object/nsight.html.
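Many of the libraries listed above let developers tap the GPU without writing kernels by hand. As a hypothetical sketch (not taken from the original material), sorting and summing a large array with the Thrust template library takes only a few lines of CUDA C++:

    #include <cstdlib>
    #include <iostream>
    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/reduce.h>
    #include <thrust/copy.h>

    int main(void) {
        /* Generate 16 million random integers on the host (CPU). */
        thrust::host_vector<int> h_vec(16 * 1024 * 1024);
        for (size_t i = 0; i < h_vec.size(); ++i)
            h_vec[i] = std::rand();

        /* Copying into a device_vector moves the data to the GPU. */
        thrust::device_vector<int> d_vec = h_vec;

        /* Sort and reduce run entirely on the GPU. */
        thrust::sort(d_vec.begin(), d_vec.end());
        long long sum = thrust::reduce(d_vec.begin(), d_vec.end(), 0LL);

        /* Copy the sorted data back to the host. */
        thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());

        std::cout << "sum = " << sum << std::endl;
        return 0;
    }

Thrust manages the device memory and selects the GPU algorithms; comparable one-call interfaces are provided by libraries such as CUBLAS and CUFFT for dense linear algebra and FFTs.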

Consulting and Training
Consulting and training services are available to support you in porting applications and learning about developing with CUDA. For more information, visit www.nvidia.com/object/cuda_consultants.html.

Education and Certification
> CUDA Certification (www.nvidia.com/certification)
> CUDA Center of Excellence (research.nvidia.com)
> CUDA Research Centers (research.nvidia.com/content/cuda-research-centers)
> CUDA Teaching Centers (research.nvidia.com/content/cuda-teaching-centers)
> CUDA and GPU computing books (www.nvidia.com/object/cuda_books.html)

* Available with the latest CUDA toolkit at www.nvidia.com/getcuda

For more information about the CUDA Certification Program, visit www.nvidia.com/certification. To learn more about NVIDIA Tesla, go to www.nvidia.com/tesla.