Fast Numerical Computation in C++: Expression Templates and Beyond

Fast numerical computation in C++: Expression Templates and Beyond to Lazy Code Generation (LzCG)

B. Nikolic
Cavendish Laboratory/Kavli Institute, University of Cambridge
BoostCon 2011, May 2011

Overview of ideas

  1. ‘Standard’ rules of C++ lead to inefficient numerical code
  2. New rules (≡ sub-languages) can be implemented using expression templates
    2.1 Types are used to confer information about expressions
    2.2 Translated to ‘standard’ C++ at compile-time
  3. Makes high-performance numerical C++ libraries possible and successful
  4. But is it enough?
    4.1 Most efficient algorithm not obvious at compile-time
    4.2 Convenience/flexibility of generating code in C++
  5. Types retain information about expressions in signatures in object code
    5.1 Can re-generate expression template implementations post-compilation-time

Outline

  • Introduction
  • Numerical algorithms, libraries and their performance
  • Interlude: Profiling on Linux
  • Expression template generalities
  • Lazy code generation – what it is & how it works
  • LzCG example
  • Summary

About myself: ALMA telescope

Largest ground-based astronomy project in the world.

Currently being commissioned at an altitude of 5000 m in Chile. Will have 66 telescopes separated by up to 15 km, observing at wavelengths between 7 and 0.35 mm.

About myself: Green Bank Telescope

Largest steerable telescope in the world.

Main reflector is 100 × 110 m in size, total height 160 m. Entire structure is accurate to 0.25 mm.

About myself: Thermal radio emission from Messier 66

  • Colour scale is emission from dust at 0.024 mm wavelength
  • Contours represent emission at 3 mm from hot electron gas
  • Both appear to be powered by recent star formation

General Interests

  • Model optimisation and statistical inference (maximum-likelihood, Markov Chain Monte Carlo, Nested Sampling techniques)
  • Pricing and risk-management of derivative contracts
  • Remote sensing of Earth’s atmosphere
  • Radiative transfer and other physical simulations

⇒ All very numerically intensive applications...

Aperture synthesis radio-astronomy

  • Revolutionised the radio view of the universe: Nobel prize in 1974
  • Development of the technique closely tied to computers:
    - Lots of Fourier Transforms
    - Large quantities of data to be binned, inspected, discarded if necessary
    - Instruments inherently unstable, so calibration is critical
  • Atacama Large Millimetre Array: eventually 66 antennas, ∼20 Mb/s average output data rate:
    - Computational issues inconvenient, reduce scientist productivity
  • Square Kilometre Array (SKA): 1000s of antennas, wide field of view, ∼few Gb/s average output data rate:
    - Computational issues are the limiting factor in scientific output

Risk management of ‘derivative’ contracts in finance

Requirements in just one product line (e.g., credit derivatives).

Typically calculations involve either: solving PDEs using finite differences; or computing FFTs; or Monte-Carlo (MC) simulations.

  • 2000 nodes × 1 kW/node + 50% aircon cost = 3 MW
  • 3 MW × 10 p/kWh × 8500 hr/yr ≈ 2.5 × 10^6 GBP/yr!
  • Additional costs scale with the number of nodes:
    - Installation, maintenance, software licenses (even Excel sometimes!)
    - Floor-space (in expensive buildings)
    - Standby backup power generation costs

Numerical performance: (why) does it matter?

  Easily parallelisable:
  • Cost
  • Heat, power, floor space
  • Environmental impact
  • Time to scale-up
  • Access to capital

  Difficult to parallelise:
  • Feasibility
  • Latency
  • User patience

Parallelisation is usually the most important aspect of high-performance numerical computing.

  • Not directly considered in this talk, although much of the material is relevant

Small problems ≡ simple solutions

Many practical scientific and industrial problems can be accelerated in a simple way.

Listing 1: By-hand coding + SIMD intrinsics

    void add2Vect(const std::vector<double> &v1,
                  const std::vector<double> &v2,
                  std::vector<double> &res)
    {
      // GCC vector type holding two doubles (16 bytes).
      typedef double v2df __attribute__((vector_size(16)));
      // Assumes res.size() == v1.size() == v2.size() and that the
      // vectors' storage is 16-byte aligned.
      v2df *dest = (v2df *)&(*res.begin());
      const size_t n = v1.size();
      const v2df *src1 = (const v2df *)&v1[0];
      const v2df *src2 = (const v2df *)&v2[0];
      if (n % 2 == 0)
      {
        // Even length: add two doubles per instruction.
        for (size_t i = 0; i < n / 2; ++i)
        {
          dest[i] = __builtin_ia32_addpd(src1[i], src2[i]);
        }
      }
      else
      {
        // Odd length: fall back to element-by-element addition.
        for (size_t i = 0; i < n; ++i)
        {
          res[i] = v1[i] + v2[i];
        }
      }
    }

Simple problems are common in real life but not really the subject of this talk!
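
As a hedged variant of Listing 1, the sketch below uses the SSE2 intrinsics from `<emmintrin.h>` with unaligned loads and a scalar tail loop, so it does not rely on the alignment of `std::vector` storage and also vectorises odd-length inputs. The function name `add2VectPortable` is hypothetical and not from the talk. Resizing `res` inside the function is a design choice made here for safety, and the code falls back to a plain loop when SSE2 is not available.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics: _mm_loadu_pd, _mm_add_pd, _mm_storeu_pd
#endif

// Hypothetical helper (not from the talk): element-wise res = v1 + v2.
void add2VectPortable(const std::vector<double> &v1,
                      const std::vector<double> &v2,
                      std::vector<double> &res)
{
  assert(v1.size() == v2.size());
  const std::size_t n = v1.size();
  res.resize(n);  // make sure the destination can hold the result
  std::size_t i = 0;
#if defined(__SSE2__)
  // Process two doubles per iteration; unaligned loads and stores
  // avoid any assumption about how the vectors were allocated.
  for (; i + 2 <= n; i += 2)
  {
    const __m128d a = _mm_loadu_pd(&v1[i]);
    const __m128d b = _mm_loadu_pd(&v2[i]);
    _mm_storeu_pd(&res[i], _mm_add_pd(a, b));
  }
#endif
  // Scalar tail: the last element for odd n, or everything without SSE2.
  for (; i < n; ++i)
  {
    res[i] = v1[i] + v2[i];
  }
}
```

Calling `add2VectPortable(a, b, r)` with two equal-length vectors fills `r` with their element-wise sum regardless of whether the length is even.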

Hand coding unsuitable for large systems

  • Correctness
  • Maintainability, readability, portability
  • Algorithms need adjustment over time
  • Experiment with different implementations of algorithms
  • Approximations: how much precision, what accuracy is necessary?

⇒ These can be difficult to achieve with complex hand-crafted code!

Warning! “Don’t try this at home” – try existing libraries first

Writing numerical libraries is difficult and error prone – always carefully consider alternatives!

  • Can you use standard existing libraries (“C” or “C++”)?
  • Are you writing a general purpose library or an application?
  • Can you, in advance, identify a subset of the algorithm which is likely to consume most of the time but can present a clean, data-only, interface?

Requirements for good numerical performance

  • Maximise parallelism
    - Use all of the nodes/processors/cores/execution units
    - Use Single-Instruction-Multiple-Data (SIMD)
  • Minimise memory access
    - Keep data that are processed together close together in memory
    - Use algorithms that process small chunks of input data at a time
    - Avoid temporaries
  • Minimise ‘branching’
    - Keep the pipeline and speculative fetches effective
    - But, need enough code at hand to execute
  • Minimise the quantity of transcendental calculations
    - Division is included in this set
    - Reducing precision or accuracy makes these faster

Optimisation Challenges I

  • Want: to describe the algorithm in a simple, readable, re-usable way

    // This:
    R=A+B+C+D+E;
    // Not this:
    addFiveVect_Double_Double_Double_Double
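
To make the `R=A+B+C+D+E` goal concrete, here is a minimal expression-template sketch; it is an illustration under stated assumptions, not the implementation from the talk. The names `Expr`, `Sum` and `Vec` are hypothetical. Each `+` only builds a lightweight node whose type records the expression, and the single loop in `Vec::operator=` evaluates the whole sum element by element, so no temporary vectors are created.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base (hypothetical name): tags a type as a vector expression.
template <class E>
struct Expr {
  const E &self() const { return static_cast<const E &>(*this); }
};

// Node for the element-wise sum of two sub-expressions. It stores
// references only; no arithmetic happens when it is constructed.
template <class L, class R>
class Sum : public Expr<Sum<L, R>> {
public:
  Sum(const L &l, const R &r) : l_(l), r_(r) { assert(l.size() == r.size()); }
  double operator[](std::size_t i) const { return l_[i] + r_[i]; }
  std::size_t size() const { return l_.size(); }
private:
  const L &l_;
  const R &r_;
};

// Concrete vector that owns its data.
class Vec : public Expr<Vec> {
public:
  explicit Vec(std::size_t n, double v = 0.0) : d_(n, v) {}
  double operator[](std::size_t i) const { return d_[i]; }
  double &operator[](std::size_t i) { return d_[i]; }
  std::size_t size() const { return d_.size(); }

  // Assignment from any expression: one fused loop, no temporaries.
  template <class E>
  Vec &operator=(const Expr<E> &e) {
    const E &expr = e.self();
    assert(expr.size() == size());
    for (std::size_t i = 0; i < size(); ++i) d_[i] = expr[i];
    return *this;
  }
private:
  std::vector<double> d_;
};

// operator+ builds a Sum node; the expression is encoded in the type.
template <class L, class R>
Sum<L, R> operator+(const Expr<L> &l, const Expr<R> &r) {
  return Sum<L, R>(l.self(), r.self());
}
```

With five `Vec` objects of equal size, `R = A + B + C + D + E;` compiles to a single pass over the data. Because the nodes hold references to temporaries, the expression should be assigned in the same statement rather than stored in an `auto` variable.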
