ArrayFire: Open Source


An Introduction to the ArrayFire Library
● We make code run faster
  ○ Started in 2007 by Georgia Tech researchers

ArrayFire—The People
● Diverse research background
  ○ HPC
  ○ Computer vision
  ○ Machine learning
  ○ Image processing
  ○ Social network analysis
  ○ Optical interferometry
  ○ Computer graphics
● From all over: Georgia Tech, U. Penn., Clemson, GSU, Texas A&M, UCSD, and more
● Passionate about making things faster

ArrayFire—R&D
● Case studies on image analysis for massive datasets
● Large scale triangle counting on multiple GPUs
● Virtual fitting of glasses
● Computer aided logo detection
● Sleep disorder diagnosis

ArrayFire—The Library
● The Library
  ○ GPU accelerated high performance library
  ○ Extensive set of built in functions
  ○ Applicable for numerous domains
    ■ Science
    ■ Engineering
    ■ Finance
    ■ Biology
  ○ Great for mobile and embedded computing

Libraries are Great

Eliminate Hidden Costs

Library Types
● Specialized GPU library
  ○ Targeted at a specific set of operators (functionality)
    ■ Precision tools
  ○ Optimized for specific systems
  ○ C-like interface
  ○ Raw pointer interface
Images taken from: http://classroom.synonym.com/learn-surgical-instruments-2429.html

Library Types (cont.)
● General GPU library
  ○ Manage GPU resources using containers
  ○ Applicable to a large set of applications and domains
  ○ Portable across multiple architectures
  ○ Higher level functions
  ○ Can make use of specialized GPU libraries
Images taken from: http://wordlesstech.com/2010/12/09/swiss-army-knife-giant-elite/

ArrayFire Capabilities
● Hundreds of parallel functions for multi-disciplinary work
  ○ Image processing
  ○ Machine learning
  ○ Graphics
  ○ Sets
● Support for multiple languages
  ○ C/C++, Fortran, Java and R
● Multiple backends - great for portability
  ○ CUDA
  ○ OpenCL
  ○ CPU (requires gcc)

ArrayFire Capabilities (cont.)
● Linux, Windows, Mac OS X
● OpenGL based graphics
● JIT
  ○ Combine multiple operations into one kernel

ArrayFire Functions
● Hundreds of highly-optimized parallel functions
  ○ Signal/image processing
    ■ Convolution
    ■ FFT
    ■ Histograms
    ■ Interpolation
    ■ Connected components
  ○ Linear Algebra
    ■ Matrix multiply
    ■ Linear system solving
    ■ Factorization

ArrayFire Functions
● Supports hundreds of parallel functions
  ○ Building blocks
    ■ Reductions
    ■ Scan
    ■ Set operations
    ■ Sorting
    ■ Statistics
    ■ Basic matrix manipulation
Images taken from:
http://technogems.blogspot.com/2011/06/sorting-included-files-by-importance.html
http://www.cmsoft.com.br/tutorialOpenCL/CLMatrixMultExplanationSubMatrixes.png

ArrayFire—Data Structures
● Built around a flexible data structure named "array"
  ○ Lightweight wrapper around the data on the compute device
  ○ Manages the data and basic metadata such as size, type and dimensions
● You can transfer data into an array using constructors
● Column major
    float hA[6] = {0, 1, 2, 3, 4, 5};
    array A(2, 3, hA);  // 2 x 3 array; columns are {0, 1}, {2, 3}, {4, 5}

Introducing... Open source!
https://www.github.com/arrayfire/arrayfire

ArrayFire—Indexing
    #include <arrayfire.h>
    #include <af/utils.h>
    using namespace af;  // array, span and sum live in namespace af

    void af_example()
    {
        float f[8] = {1, 2, 4, 8, 16, 32, 64, 128};
        array a(2, 4, f);                      // 2 rows x 4 cols, filled column by column from f
        array sumSecondCol = sum(a(span, 1));  // reduce-sum over the second column (4 + 8)
        print(sumSecondCol);                   // 12
    }

ArrayFire Example—Swap R and B
    // Option 1: swap through a temporary
    array tmp = img(span,span,0);         // save the R channel
    img(span,span,0) = img(span,span,2);  // R channel gets values of B
    img(span,span,2) = tmp;               // B channel gets value of R

    // Option 2: rebuild the image with join
    array swapped = join(2, img(span,span,2),   // blue
                            img(span,span,1),   // green
                            img(span,span,0));  // red

    // Option 3: index the channels in reverse order
    array swapped = img(span,span,seq(2,-1,0));

ArrayFire Functions
[Image panels: Original, Grayscale, Box Filter Blur, Gaussian Blur, Image Negative]

Erosion
ArrayFire
    // erode an image, 8-neighbor connectivity
    array mask8 = constant(1, 3, 3);
    array img_out = erode(img_in, mask8);

    // erode an image, 4-neighbor connectivity
    const float h_mask4[] = { 0.0, 1.0, 0.0,
                              1.0, 1.0, 1.0,
                              0.0, 1.0, 0.0 };
    array mask4 = array(3, 3, h_mask4);
    array img_out = erode(img_in, mask4);

Filtering
ArrayFire
    array R = convolve(img, ker);         // 1, 2 and 3d convolution filter
    array R = convolve(fcol, frow, img);  // Separable convolution
    array R = filter(img, ker);           // 2d correlation filter

Histograms
ArrayFire
    int nbins = 256;
    array hist = histogram(img, nbins);

Transforms
ArrayFire
    array half = resize(0.5, img);
    array rot90 = rotate(img, af::Pi/2);
    array warped = approx2(img, xLocations, yLocations);

Image smoothing
ArrayFire
    array S = bilateral(I, sigma_r, sigma_c);
    array M = meanshift(I, sigma_r, sigma_c, iter);
    array R = medfilt(img, 3, 3);
    // Gaussian blur
    array gker = gaussiankernel(ncols, ncols);
    array res = convolve(img, gker);

FFT

FFT (API)
    array R1 = fft2(I);  // 2d fft.
    // (also check fft and fft3)
    array R2 = fft2(I, M, N);  // fft2 with padding
    array R3 = ifft2(fft2(I, M, N) * fft2(K, M, N));  // convolve using fft2

JIT Code Generation
● Run time kernel generation
● Combines multiple element wise operations into one kernel
● Reduces number of kernel launches
● Improves cache performance
  ○ Intermediate data not allocated

JIT—Monte Carlo Pi Simulation
    array temp = (sqrt(x*x+y*y)<1);
    return sum<float>(temp);
No JIT:
● 5 function calls (2 "*", 1 "+", 1 "<", 1 "√")
With JIT:
● One function call
● Fewer cache misses
  ○ "Capacity" cache misses avoided

JIT Performance
[Chart: results are from a K20 GPU; lower is better]

Additional Performance Results

OpenCV
● Open source computer vision library
● C++ interface
● Some CUDA supported functionality

OpenCV—ArrayFire Interoperability
● Helper Functions
  ○ https://github.com/arrayfire-community/arrayfire_opencv.git
    Mat R;                                  // OpenCV API
    Rodrigues(poses(Rect(0, 0, 1, 3)), R);  // OpenCV function call
    af::array af_R = mat_to_array(R);       // Results transferred to array data-structure

Feature Tracking—FAST Corner Detection
● In comparison with OpenCV CPU implementation
● Larger data-sets can be analyzed
● Single-threaded CPU implementation

Feature Tracking—Harris
[Chart: Speedup]
● Feature Detection algorithm
● Better performance
● Larger data-sets can be analyzed
● Single-threaded CPU implementation

Feature Tracking—ORB
[Chart: Speedup]
● Better performance
● Uses the FAST algorithm
  ○ At multiple scales
● Single-threaded CPU implementation

Feature Tracking—Example (ORB)
Speedup for sample: 21.6x

Performance Results—Bilateral Filtering

Performance Results—Convolution

Performance Results—Image Rotate

Performance Results—Bilinear Image Resize

Performance insights
● Large images - GPUs are ideal.
● Smaller images - GPUs are great, but better performance can be achieved through batching and data parallelism.
  ○ With batching or data parallelism, GPUs are also ideal.
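As a rough illustration of the fusion idea on the Monte Carlo Pi slide above, the sketch below contrasts one pass per element-wise operation with a single fused pass. It is plain standard C++ running on the CPU, not ArrayFire's JIT. The names count_inside_unfused, count_inside_fused and estimate_pi are hypothetical and were chosen for this example only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Unfused: one pass and one temporary per element-wise operation,
// mirroring the five separate calls listed on the slide.
std::size_t count_inside_unfused(const std::vector<float>& x,
                                 const std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    std::vector<float> xx(n), yy(n), s(n), r(n), inside(n);
    for (std::size_t i = 0; i < n; ++i) xx[i] = x[i] * x[i];     // "*"
    for (std::size_t i = 0; i < n; ++i) yy[i] = y[i] * y[i];     // "*"
    for (std::size_t i = 0; i < n; ++i) s[i] = xx[i] + yy[i];    // "+"
    for (std::size_t i = 0; i < n; ++i) r[i] = std::sqrt(s[i]);  // sqrt
    for (std::size_t i = 0; i < n; ++i)                          // "<"
        inside[i] = (r[i] < 1.0f) ? 1.0f : 0.0f;
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) count += (inside[i] != 0.0f);  // sum
    return count;
}

// Fused: the same arithmetic in a single pass with no temporaries.
std::size_t count_inside_fused(const std::vector<float>& x,
                               const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::sqrt(x[i] * x[i] + y[i] * y[i]) < 1.0f) ++count;
    }
    return count;
}

// Draw n points uniformly in the unit square and estimate pi as
// 4 * (fraction of points that fall inside the quarter circle).
double estimate_pi(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> x(n), y(n);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = dist(gen);
        y[i] = dist(gen);
    }
    return 4.0 * static_cast<double>(count_inside_fused(x, y)) /
           static_cast<double>(n);
}
```

Calling estimate_pi with a few hundred thousand samples should give a value near 3.14. The unfused version allocates five temporaries of length n, which is the intermediate data the JIT slide says is not allocated once the operations are combined.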
Conway’s Game of Life
    // Convolve gets neighbors
    af::array nHood = convolve(state, kernel, false);
    // Generate conditions for life
    af::array C0 = (nHood == 2);
    af::array C1 = (nHood == 3);
    // Update state
    state = state * C0 + C1;

Open Source Roadmap
● Create capabilities for
  ○ Streaming video
  ○ Large number of images
  ○ Machine learning
  ○ Data analysis
  ○ Dynamic data
● Faster visualization/rendering utilities for large scale data sets

Open Source Roadmap
● Support for sparse linear algebra
● Additional language wrappers
● More machine learning and computer vision functions
● Additional “big-data” infrastructure

Look Us Up
Company website: www.arrayfire.com
Open source library: https://github.com/arrayfire/arrayfire
Language wrapper {Fortran, R, Java}: https://github.com/arrayfire/arrayfire_$lang

Q & A
Speaker: Oded Green ([email protected])
Engineer: Shehzan Mohammed ([email protected])
Sales: Scott Blakeslee ([email protected])
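The update rule on the Game of Life slide (state = state * C0 + C1) can be checked on the CPU with a small sketch in plain standard C++. This is not the ArrayFire code from the deck. Grid, neighbour_counts and step are hypothetical names, and treating cells outside the grid as dead is an assumption, since the slide does not show the kernel or the boundary handling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // 1 = alive, 0 = dead

// Count the live neighbours of every cell (the role convolve plays on the
// slide). Cells outside the grid count as dead; that is an assumption.
Grid neighbour_counts(const Grid& state) {
    const std::size_t rows = state.size();
    const std::size_t cols = rows ? state[0].size() : 0;
    for (const auto& row : state) assert(row.size() == cols);  // rectangular
    Grid counts(rows, std::vector<int>(cols, 0));
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    if (dr == 0 && dc == 0) continue;  // skip the cell itself
                    const long rr = static_cast<long>(r) + dr;
                    const long cc = static_cast<long>(c) + dc;
                    if (rr < 0 || cc < 0 ||
                        rr >= static_cast<long>(rows) ||
                        cc >= static_cast<long>(cols)) continue;
                    counts[r][c] += state[static_cast<std::size_t>(rr)]
                                         [static_cast<std::size_t>(cc)];
                }
            }
        }
    }
    return counts;
}

// One generation: a cell with exactly 2 live neighbours keeps its state (C0),
// a cell with exactly 3 becomes or stays alive (C1), every other cell dies.
Grid step(const Grid& state) {
    const Grid n = neighbour_counts(state);
    Grid next = state;
    for (std::size_t r = 0; r < state.size(); ++r) {
        for (std::size_t c = 0; c < state[r].size(); ++c) {
            const int c0 = (n[r][c] == 2);
            const int c1 = (n[r][c] == 3);
            next[r][c] = state[r][c] * c0 + c1;  // same expression as the slide
        }
    }
    return next;
}
```

Because C0 and C1 can never both be 1 for the same cell, state * C0 + C1 always stays in {0, 1}. A horizontal three-cell "blinker" passed through step turns vertical, and a second call restores the original grid.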