Intel® Math Kernel Library

Release 7.0

March 2005

Intel® MKL Purpose
• Performance, performance, performance!
• Intel's scientific and engineering floating-point math library
• Initially only basic linear algebra subroutines (BLAS) and fast Fourier transforms (FFT)
• Addresses:
  – Solvers such as the linear algebra package (LAPACK) and BLAS
  – Eigenvector/eigenvalue solvers (BLAS, LAPACK)
  – Some quantum chemistry needs (dgemm)
  – PDEs, signal processing, seismic, solid-state physics (FFTs)
  – General scientific, financial: vector transcendental functions, vector math library (VML)
• Tuned for Intel processors, current and future

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries.

Intel® MKL Purpose – Don'ts
• But don't use Intel MKL on …

Geometric transformation:

  (X', Y', Z', W') = 4x4 transformation matrix × (X, Y, Z, W)

  But you could use Intel® IPP¹

• Don't use Intel MKL on "small" counts
• Don't call vector math functions on small n

¹ Intel Integrated Performance Primitives
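For a transform this small, the work is only 16 multiply-adds, so a plain loop is a reasonable alternative to a library call. The following is a minimal sketch, assuming a row-major 4x4 single-precision matrix; the name transform4 is hypothetical and is not part of Intel MKL or Intel IPP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: out = m * in for a 4x4 row-major matrix.
   out must not alias in. */
void transform4(const float m[16], const float in[4], float out[4])
{
    assert(m != NULL && in != NULL && out != NULL && in != out);
    for (int row = 0; row < 4; ++row) {
        float acc = 0.0f;
        for (int col = 0; col < 4; ++col)
            acc += m[4 * row + col] * in[col];   /* dot product of one row with the input */
        out[row] = acc;
    }
}
```

With a translation matrix in m and W = 1 in the input, the call moves the point by the translation column and leaves W unchanged.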

Intel® MKL Contents

BLAS (basic linear algebra subroutines)
• Level 1 BLAS – vector-vector operations
  – 15 function types
  – 48 functions
• Level 2 BLAS – matrix-vector operations
  – 26 function types
  – 66 functions
• Level 3 BLAS – matrix-matrix operations
  – 9 function types
  – 30 functions
• Extended BLAS – level 1 BLAS for sparse vectors
  – 8 function types
  – 24 functions

Intel® MKL Contents

LAPACK (linear algebra package)
  – Solvers and eigensolvers, hundreds of routines!
  – More than 1000 user-callable and support routines
FFTs (fast Fourier transforms)
  – One- and two-dimensional
  – With and without frequency ordering (bit reversal)
VML (vector math library)
  – Set of vectorized transcendental functions
  – Most of the libm functions, but faster
Direct sparse solver (PARDISO*)

Intel® MKL Contents
• Most of Intel MKL has a Fortran interface
• Legacy of high-performance computation
• BLAS and LAPACK are both Fortran and make up most of the library
• CBLAS interface – more convenient for a C/C++ programmer to call BLAS

Intel® MKL Contents – Environment
• Supports cdecl and CVF default interfaces
• Supports Intel and CVF Fortran compilers – this support matters because of the run-time libraries
• Supports Linux* and Windows* OS
• Static and dynamically linked libraries
• Supports all processors – 32-bit and 64-bit
• Large set of tests and examples
• Extensive documentation

Threading
• Most of Intel® MKL could be threaded, but …
  – The limited resource is memory bandwidth
  – Threading level 1 and level 2 BLAS is mostly ineffective ( O(n) )
• Numerous opportunities for threading
  – Level 3 BLAS ( O(n^3) )
  – LAPACK ( O(n^3) )
  – FFTs ( O(n log n) )
  – VML? Depends on processor and function
• All threading uses OpenMP*
• All of Intel MKL is designed and compiled for thread safety
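The memory-bandwidth argument can be made concrete with a back-of-the-envelope count of floating-point operations per array element touched. The sketch below uses the usual textbook estimates for square problems of order n (they are not measurements, and they ignore caches); the function names are hypothetical.

```c
#include <assert.h>

/* Flops per matrix/vector element touched, for problems of order n.
   Counts are the usual textbook estimates, not measurements. */
double axpy_flops_per_element(double n)   /* level 1: y := a*x + y */
{
    assert(n > 0.0);
    return (2.0 * n) / (3.0 * n);              /* read x, read y, write y */
}

double gemv_flops_per_element(double n)   /* level 2: y := A*x + y */
{
    assert(n > 0.0);
    return (2.0 * n * n) / (n * n + 3.0 * n);  /* read A and x, read and write y */
}

double gemm_flops_per_element(double n)   /* level 3: C := A*B + C */
{
    assert(n > 0.0);
    return (2.0 * n * n * n) / (4.0 * n * n);  /* read A, B, C; write C */
}
```

Under these assumptions the level 1 and level 2 ratios stay below 1 and 2 respectively however large n becomes, while the level 3 ratio grows as n/2. Extra threads therefore mostly compete for the same memory bandwidth in the first two cases and have real arithmetic to share in the third.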

How to Link With MKL on Itanium®
• Set the path to the installation directory
  – E.g. export MKLPATH=/opt/intel/mkl
• Static sample:
  – ld myprog.o $MKLPATH/libmkl_lapack.a $MKLPATH/libmkl_ipf.a -L$MKLPATH -lguide -lpthread
  – Static linking of LAPACK and kernels for Itanium®-based processors. The processor dispatcher calls the appropriate kernel for the system at run time.
• Dynamic sample:
  – ld myprog.o -L$MKLPATH -lmkl_lapack64 -lmkl -lguide -lpthread
  – Dynamic linking on Itanium®-based platforms: LAPACK library (double-precision functions) and Itanium-based processor kernels. The shared-object dispatcher dynamically loads the shared object with the specific kernel for the system at run time.

BLAS Review
• 3 "levels" of functions + sparse
  – Level 1: vector-vector operations
  – Level 2: matrix-vector operations
  – Level 3: matrix-matrix operations
  – Sparse: level 1 operations on sparse vectors
• "Levels" follow history
  – Level 1 in the early 70's
  – Level 2 in the mid-80's, followed immediately by level 3
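To make the three levels concrete, here is one unoptimized reference loop per level. This is only an illustrative sketch, not how Intel MKL implements these operations; the names ref_dot, ref_gemv, and ref_gemm are hypothetical, and the matrices are assumed to be stored column-major (Fortran order) with no padding.

```c
#include <assert.h>

/* Level 1 (vector-vector): returns the dot product of x and y. */
double ref_dot(int n, const double *x, const double *y)
{
    assert(n >= 0);
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}

/* Level 2 (matrix-vector): y := A*x + y, A is m x n, column-major. */
void ref_gemv(int m, int n, const double *a, const double *x, double *y)
{
    assert(m >= 0 && n >= 0);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            y[i] += a[i + j * m] * x[j];
}

/* Level 3 (matrix-matrix): C := A*B + C,
   A is m x k, B is k x n, C is m x n, all column-major. */
void ref_gemm(int m, int n, int k, const double *a, const double *b, double *c)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j)
        for (int p = 0; p < k; ++p)
            for (int i = 0; i < m; ++i)
                c[i + j * m] += a[i + p * m] * b[p + j * k];
}
```

The loop nests are one, two, and three deep, which is where the level numbers come from.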

BLAS Naming Conventions
• General scheme:
• Precision: one or two letters
  – 1 letter implies input and output are the same type: s = single, d = double, c = single complex, z = double complex
  – 2 letters: real and complex types are mixed. sc, dz: complex in, real out (e.g. scasum, dznrm2); cs, zd: complex vector with a real scalar (e.g. csscal, zdscal)

BLAS Naming Conventions
• Level 1 BLAS: where modifiers are
  – c: conjugated (cdotc)
  – u: unconjugated (cdotu)
  – g: Givens (srotg)
• Level 2, 3 BLAS:
  – g: general – ge: general; gb: band
  – s: symmetric – sy: symmetric; sp: packed; sb: band
  – h: Hermitian – he: Hermitian; hp: packed; hb: band
  – t: triangular – tr: triangular; tp: packed; tb: band
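The Level 2 and Level 3 naming pattern can be read mechanically: one precision letter, two matrix-type letters, then the letters that name the operation. The decoder below is a hypothetical illustration of that reading (split_blas_name is not a BLAS or MKL function) and handles only names with a single precision letter.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical decoder for Level 2/3 names such as "dgemm" or "dsyr2k":
   1 precision letter, 2 matrix-type letters, then the operation.
   Returns 0 on success, -1 if the name does not fit that shape. */
int split_blas_name(const char *name, char *precision,
                    char type[3], char op[8])
{
    assert(name != NULL && precision != NULL && type != NULL && op != NULL);
    size_t len = strlen(name);
    if (len < 4 || len > 10)               /* at least 1 operation letter; op holds at most 7 */
        return -1;
    if (strchr("sdcz", name[0]) == NULL)   /* s, d, c, z precision prefixes */
        return -1;
    *precision = name[0];
    type[0] = name[1];
    type[1] = name[2];
    type[2] = '\0';
    strcpy(op, name + 3);                  /* remaining letters name the operation */
    return 0;
}
```

For example, "dsyr2k" splits into precision d (double), matrix type sy (symmetric), and operation r2k.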

BLAS Naming Conventions
• Level 2
  – mv: matrix-vector; sv: solve (vector operations); r: rank update; r2: rank-2 update
  – dger: double-precision general rank update: A := alpha * x * y' + A
• Level 3
  – mm: matrix-matrix; sm: solve (matrix operations); rk: rank-k update; r2k: rank-2k update
  – dsyr2k: double-precision symmetric rank-2k update
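The dger formula A := alpha * x * y' + A can be spelled out as a reference loop. This is a minimal sketch under stated assumptions, not the library's implementation: ref_ger is a hypothetical name, A is column-major with leading dimension lda, and x and y are assumed to have unit stride (the real dger also takes incx and incy arguments).

```c
#include <assert.h>

/* Reference rank-1 update A := alpha * x * y' + A.
   A is m x n, column-major with leading dimension lda >= max(1, m).
   Unit strides are assumed for x and y. */
void ref_ger(int m, int n, double alpha,
             const double *x, const double *y, double *a, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j) {
        double t = alpha * y[j];           /* scale factor for column j */
        for (int i = 0; i < m; ++i)
            a[i + j * lda] += x[i] * t;    /* A(i,j) += alpha * x(i) * y(j) */
    }
}
```

Each column j of A receives x scaled by alpha * y(j), which is why the update has rank 1.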

RNG Functions

• Gaussian (RPM, Box-Muller methods)
• Exponential
• Laplace
• Uniform (a,b), (-a,a)
• Weibull
• Rayleigh
• Cauchy
• Lognormal
• Discrete uniform [a,b)
• Geometric
• Bernoulli
• Others

DGEMM on IPF2: N = 1024

[Figure: surface plot of DGEMM performance (% of peak, 0% to 100% in 10% bands) as a function of m and k. 1.0 GHz Itanium™ 2 processor, in 6.0 update.]

LINPACK on 1.0 GHz IPF2

[Figure: LINPACK performance in MFLOPS (0 to 16000) versus number of equations (160 to 14000), for 1, 2, and 4 CPUs.]

2D DFTs* on 900 MHz IPF2

[Figure: 2D DFT performance in MFLOPS (0 to 4000) versus transform size (128 to 2250), for 1P and 2P. *Single precision complex. MKL 6.0 β update.]

1D DFTs* on 900 MHz IPF2

[Figure: 1D DFT performance in MFLOPS (0 to 3000) versus transform size (96 to 1920), for 1P. *Single precision complex. MKL 6.0 β update.]

MKL Status, Plans
• Current production release is 7.2, available in 2 versions
  – Standard MKL
  – Cluster MKL: Standard MKL + ScaLAPACK
• Version 7.2.1 – to be released in Q1/2005
  – Improvements on Itanium®
  – BLAS: DGEMM: 1-3% improvement for TN and TT cases
  – BLAS: *TRMV, ZGERC, ZGERU: 20-30% improvement
  – VML – vdPowx: improved for special cases
  – To be released in Q3/2004

Future Releases of MKL

• New capabilities
  – C++ wrappers
  – Iterative sparse solver
  – LAPACK 4.0 support
  – Additional statistical functions
  – Support for upcoming Intel processors
  – More…
• Intel® Cluster MKL
  – Distributed-memory DFTs
  – Distributed-memory sparse solver
  – Additional ScaLAPACK performance optimizations

MKL Summary
• Easy way to portable code for all Intel® architectures, Linux* and Windows*
• MKL for the Itanium® processor: a path to easy high performance for applications
• Technical computation support
  – Linear algebra (BLAS, LAPACK)
  – FFTs
  – Vector transcendentals (VML)
  – Cluster computing being added