
Fast numerical computation in C++: Expression Templates and Beyond to Lazy Code Generation (LzCG)

B. Nikolic
Cavendish Laboratory/Kavli Institute, University of Cambridge
BoostCon 2011, May 2011

Overview of ideas

1. 'Standard' rules of C++ lead to inefficient numerical code
2. New rules (≡ sub-languages) can be implemented using expression templates
   2.1 Types are used to confer information about expressions
   2.2 Translated to 'standard' C++ at compile-time
3. Makes high-performance numerical C++ libraries possible and successful
4. But is it enough?
   4.1 Most efficient algorithm not obvious at compile-time
   4.2 Convenience/flexibility of generating code in C++
5. Types retain information about expressions in signatures in object code
   5.1 Can re-generate expression template implementations post-compilation-time

Outline

- Introduction
- Numerical algorithms, libraries and their performance
- Interlude: Profiling on Linux
- Expression template generalities
- Lazy code generation – what it is & how it works
- LzCG example
- Summary

About myself: ALMA telescope

Largest ground-based astronomy project in the world. Currently being commissioned at an altitude of 5000 m in Chile. Will have 66 telescopes separated by up to 15 km and observe at wavelengths between 7 and 0.35 mm.

About myself: Green Bank Telescope

Largest steerable telescope in the world. Main reflector is 100 × 110 m in size, total height 160 m. Entire structure is accurate to 0.25 mm.
About myself: Thermal radio emission from Messier 66

- Colour scale is emission from dust at 0.024 mm wavelength
- Contours represent emission at 3 mm from hot electron gas
- Both appear to be powered by recent star formation

General interests

- Model optimisation and statistical inference (maximum-likelihood, Markov Chain Monte Carlo, Nested Sampling techniques)
- Pricing and risk-management of derivative contracts
- Remote sensing of Earth's atmosphere
- Radiative transfer and other physical simulations

⇒ All very numerically intensive applications...
Aperture synthesis radio-astronomy

- Revolutionised the radio view of the universe – Nobel prize in 1974
- Development of the technique closely tied to computers:
  - Lots of Fourier Transforms
  - Large quantities of data to be binned, inspected, discarded if necessary
  - Instruments inherently unstable so calibration is critical
- Atacama Large Millimetre Array: eventually 66 antennas, ∼20 Mb/s average output data rate:
  - Computational issues inconvenient, reduce scientist productivity
- Square Kilometre Array (SKA): 1000s of antennas, wide field of view, ∼ a few Gb/s average output data rate:
  - Computational issues limiting factor in scientific output

Risk management of 'derivative' contracts in finance

Requirements in just one product line (e.g., credit derivatives). Typically calculations involve either: solving PDEs using finite differences; or computing FFTs; or Monte-Carlo (MC) simulations.

- 2000 nodes × 1 kW/node + 50% air-conditioning cost = 3 MW
- 3 MW × 10 p/kWh × 8500 hr/yr ≈ 2.5 × 10^6 GBP/yr!
- Additional costs / number of nodes:
  - Installation, maintenance, software licenses (even Excel sometimes!)
  - Floor-space (in expensive buildings)
  - Standby backup power generation costs

Numerical performance: (why) does it matter?
Easily parallelisable:
- Cost
- Heat, power, floor space
- Environmental impact
- Time to scale-up
- Access to capital

Difficult to parallelise:
- Feasibility
- Latency
- User patience

Parallelisation is usually the most important aspect of high-performance numerical computing
- Not directly considered in this talk, although much of the material is relevant

Small problems ≡ simple solutions

Many practical scientific and industrial problems can be accelerated in a simple way.

Listing 1: By-hand coding + SIMD intrinsics

    #include <cstddef>
    #include <vector>
    #include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64 only)

    // res = v1 + v2 element by element, two doubles per SSE2 instruction.
    // Assumes v1, v2 and res already have the same size.
    void add2Vect(const std::vector<double> &v1,
                  const std::vector<double> &v2,
                  std::vector<double> &res)
    {
      const std::size_t n = v1.size();
      const double *src1 = v1.data();
      const double *src2 = v2.data();
      double *dest = res.data();
      std::size_t i = 0;
      // SIMD loop: unaligned loads/stores, since std::vector storage
      // is not guaranteed to be 16-byte aligned.
      for (; i + 2 <= n; i += 2)
      {
        __m128d a = _mm_loadu_pd(src1 + i);
        __m128d b = _mm_loadu_pd(src2 + i);
        _mm_storeu_pd(dest + i, _mm_add_pd(a, b));  // addpd
      }
      // Scalar tail: handles the last element when n is odd.
      for (; i < n; ++i)
        dest[i] = src1[i] + src2[i];
    }

Simple problems are common in real life but not really the subject of this talk!
Hand coding unsuitable for large systems

- Correctness
- Maintainability, readability, portability
- Algorithms need adjustment over time
  - Experiment with different implementations of algorithms
  - Approximations: how much precision, what accuracy is necessary?

⇒ These can be difficult to achieve with complex hand-crafted code!

Warning! "Don't try this at home" – try existing libraries first

Writing numerical libraries is difficult and error prone – always carefully consider alternatives!

- Can you use standard existing libraries ("C" or "C++")?
- Are you writing a general purpose library or an application?
- Can you, in advance, identify a subset of the algorithm which is likely to consume most time but can present a clean, data-only, interface?
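To illustrate the last question, a minimal sketch of what a "clean, data-only" interface to a time-critical kernel might look like. The names `scaled_add` and `update` are hypothetical, chosen only for this example. The kernel performs the same operation as the BLAS routine `daxpy` (y ← a·x + y), so under these assumptions its body could later be hand-tuned or swapped for a library call without touching the calling code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hot-spot kernel: y <- a*x + y over n elements.
// Only raw data crosses this boundary (a length, a scalar, two pointers),
// so the body can be optimised or replaced independently of its callers.
extern "C" void scaled_add(std::size_t n, double a,
                           const double* x, double* y) {
  for (std::size_t i = 0; i < n; ++i) {
    y[i] += a * x[i];
  }
}

// Hypothetical application-side wrapper: the application keeps its own
// containers and hands the kernel nothing but their underlying storage.
void update(std::vector<double>& y, const std::vector<double>& x, double a) {
  assert(x.size() == y.size());
  scaled_add(y.size(), a, x.data(), y.data());
}
```

The design choice is that everything above the boundary can use whatever C++ abstractions it likes, while the part that consumes most of the time sees only pointers and sizes.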
Numerical algorithms, libraries and their performance

Requirements for good numerical performance

- Maximise parallelism
  - Use all of the nodes/processors/cores/execution units
  - Use Single-Instruction-Multiple-Data (SIMD)
- Minimise memory access
  - Keep data that are processed together close together in memory
  - Use algorithms that process small chunks of input data at a time
  - Avoid temporaries
- Minimise 'branching'
  - Keep the pipeline and speculative fetches good
  - But, need enough code at hand to execute
- Minimise quantity of transcendental calculations
  - Includes division in this set
  - Reducing precision or accuracy makes these faster

Optimisation Challenges I

- Want: to describe the algorithm in a simple, readable, re-usable way

    // This:
    R=A+B+C+D+E;
    // Not this:
    addFiveVect Double Double Double Double
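A minimal sketch, assuming C++14, of how an expression template can let `R=A+B+C+D+E;` be written naturally yet run as a single loop with no temporary vectors. The names `Vec`, `Sum` and `is_expr` are hypothetical and not taken from any particular library; production expression-template libraries handle many more operations, scalar operands and aliasing.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Unevaluated sum of two operands: stores references only and computes
// nothing until an element is requested through operator[].
template <typename L, typename R>
struct Sum {
  const L& l;
  const R& r;
  double operator[](std::size_t i) const { return l[i] + r[i]; }
  std::size_t size() const { return l.size(); }
};

// Concrete vector that owns its data.
struct Vec {
  std::vector<double> data;

  explicit Vec(std::size_t n, double value = 0.0) : data(n, value) {}

  double operator[](std::size_t i) const { return data[i]; }
  double& operator[](std::size_t i) { return data[i]; }
  std::size_t size() const { return data.size(); }

  // Assignment from any expression: one fused loop over the elements,
  // so no temporary vectors are allocated for intermediate sums.
  template <typename E>
  Vec& operator=(const E& e) {
    assert(e.size() == data.size());
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
    return *this;
  }
};

// Trait restricting operator+ to the vector and expression types above.
template <typename T> struct is_expr : std::false_type {};
template <> struct is_expr<Vec> : std::true_type {};
template <typename L, typename R>
struct is_expr<Sum<L, R>> : std::true_type {};

// Writing A + B only records the operation in the return type.
template <typename L, typename R,
          typename = std::enable_if_t<is_expr<L>::value && is_expr<R>::value>>
Sum<L, R> operator+(const L& l, const R& r) {
  return {l, r};
}
```

Here the type of `A + B + C + D + E` is `Sum<Sum<Sum<Sum<Vec, Vec>, Vec>, Vec>, Vec>`, so the structure of the expression is carried in its type, and the loop in `Vec::operator=` evaluates all four additions element by element. Because the intermediate nodes are temporaries held by reference, such an expression must be evaluated in the statement that creates it rather than stored (e.g. with `auto`) and used later.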