Intel® Math Kernel Library

Release 7.0, March 2005

Intel® MKL Purpose
• Performance, performance, performance!
• Intel’s scientific and engineering floating-point math library
• Initially only basic linear algebra subroutines (BLAS) and fast Fourier transforms (FFT)
• Addresses:
  – Solvers such as the linear algebra package (LAPACK) and BLAS
  – Eigenvector/eigenvalue solvers (BLAS, LAPACK)
  – Some quantum chemistry needs (dgemm)
  – PDEs, signal processing, seismic, solid-state physics (FFTs)
  – General scientific and financial computing: vector transcendental functions in the vector math library (VML)
• Tuned for Intel processors, current and future

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

Intel® MKL Purpose – Don’ts
• But don’t use Intel MKL on …
  [Figure: geometric transformation, (X’, Y’, Z’, W’) = 4x4 transformation matrix × (X, Y, Z, W). But you could use Intel® IPP¹.]
• Don’t use Intel MKL on “small” counts
• Don’t call vector math functions on small n
¹ Intel Integrated Performance Primitives

Intel® MKL Contents
BLAS (basic linear algebra subroutines)
• Level 1 BLAS – vector-vector operations
  – 15 function types
  – 48 functions
• Level 2 BLAS – matrix-vector operations
  – 26 function types
  – 66 functions
• Level 3 BLAS – matrix-matrix operations
  – 9 function types
  – 30 functions
• Extended BLAS – level 1 BLAS for sparse vectors
  – 8 function types
  – 24 functions

Intel® MKL Contents
LAPACK (linear algebra package)
  – Solvers and eigensolvers, hundreds of routines!
  – More than 1000 user-callable and support routines
FFTs (fast Fourier transforms)
  – One- and two-dimensional
  – With and without frequency ordering (bit reversal)
VML (vector math library)
  – Set of vectorized transcendental functions
  – Most of the libm functions, but faster
Direct sparse solver (Pardiso*)

Intel® MKL Contents
• Most of Intel MKL has a Fortran interface
• Legacy of high-performance computation
• BLAS and LAPACK are both Fortran and make up most of the library
• CBLAS interface – more convenient for C/C++ programmers calling BLAS

Intel® MKL Contents – Environment
• Supports cdecl and CVF default interfaces
• Supports Intel and CVF Fortran compilers – the significance of this support relates to runtime libraries
• Supports Linux* and Windows* OS
• Static and dynamically linked libraries
• Supports all processors – 32-bit and 64-bit
• Large set of tests and examples
• Extensive documentation

Threading
• Most of Intel® MKL could be threaded but …
  – The limited resource is memory bandwidth
  – Threading level 1 and level 2 BLAS is mostly ineffective ( O(n) )
• Numerous opportunities for threading
  – Level 3 BLAS ( O(n^3) )
  – LAPACK ( O(n^3) )
  – FFTs ( O(n log(n)) )
  – VML? Depends on processor and function
• All threading uses OpenMP*
• All of Intel MKL is designed and compiled for thread safety

How to Link With MKL on Itanium®
• Set the path to the installation directory
  – E.g. export MKLPATH=/opt/intel/mkl
• Static sample:
    ld myprog.o $MKLPATH/libmkl_lapack.a $MKLPATH/libmkl_ipf.a -L$MKLPATH -lguide -lpthread
  – Static linking of LAPACK and the Itanium®-based processor kernels. The processor dispatcher calls the appropriate kernel for the system at runtime.
• Dynamic sample:
    ld myprog.o -L$MKLPATH -lmkl_lapack64 -lmkl -lguide -lpthread
  – Dynamic linking on Itanium®-based platforms of the LAPACK library (double-precision functions) and the Itanium-based processor kernels. The shared-object dispatcher dynamically loads the shared object with the appropriate kernel for the system at runtime.

BLAS Review
• 3 “levels” of functions + sparse
  – Level 1: vector-vector operations
  – Level 2: matrix-vector operations
  – Level 3: matrix-matrix operations
  – Sparse: level 1 operations on sparse vectors
• “Levels” follow history
  – Level 1 in the 1970s
  – Level 2 in the mid-1980s, followed shortly by level 3

BLAS Naming Conventions
• General scheme: <precision><name><modifier>
• precision: one or two letters
  – 1 letter implies input and output are the same type
    s = single, d = double, c = single complex, z = double complex
  – 2 letters imply input and output are different types
    cs, zd: complex in, real out; sc, dz: real in, complex out

BLAS Naming Conventions
• Level 1 BLAS: <precision><name><modifier>, where the modifiers are
    c: conjugated (cdotc), u: unconjugated (cdotu), g: Givens (srotg)
• Level 2, 3 BLAS <name>:
    g: general    - ge: general; gb: band
    s: symmetric  - sy: symmetric; sp: packed; sb: band
    h: Hermitian  - he: Hermitian; hp: packed; hb: band
    t: triangular - tr: triangular; tp: packed; tb: band

BLAS Naming Conventions
• Level 2 <modifier>
    mv: matrix-vector; sv: solve (vector operations); r: rank update; r2: rank-2 update
    dger: double-precision general rank update: A := alpha * x * y’ + A
• Level 3 <modifier>
    mm: matrix-matrix; sm: solve (matrix operations); rk: rank-k update; r2k: rank-2k update
    dsyr2k: double-precision symmetric rank-2k update

RNG Functions
• Gaussian (RPM, Box-Muller methods)
• Exponential
• Laplace
• Uniform (a,b), (-a,a)
• Weibull
• Rayleigh
• Cauchy
• Lognormal
• Discrete Uniform [a,b)
• Geometric
• Bernoulli
• Others

DGEMM on IPF2: N = 1024
[Figure: percent of peak as a function of m and k, shaded in 10% bands from 0% to 100%. 1.0 GHz Itanium™ 2 processor, in 6.0 update.]

LINPACK on 1.0 GHz IPF2
[Figure: MFLOPS versus number of equations (160 to 14000) for 1, 2, and 4 CPUs.]

2D DFTs* on 900 MHz IPF2
[Figure: MFLOPS versus transform size (128 to 2250) for 1P and 2P. *Single precision complex. MKL 6.0 β update.]

1D DFTs* on 900 MHz IPF2
[Figure: MFLOPS versus transform size (96 to 1920) for 1P. *Single precision complex. MKL 6.0 β update.]

MKL Status, Plans
• Current production release is 7.2, available in 2 versions
  – Standard MKL
  – Cluster MKL: Standard MKL + ScaLAPACK
• Version 7.2.1 – to be released in Q1/2005
  – Improvements on Itanium®
  – BLAS: DGEMM: 1-3% improvement for TN and TT cases
  – BLAS: *TRMV, ZGERC, ZGERU: 20-30% improvement
  – VML – vdPowx: improved for special cases
  – To be released in Q3/2004

Future Releases of MKL
• New capabilities
  – C++ wrappers
  – Iterative sparse solver
  – LAPACK 4.0 support
  – Additional statistical functions
  – Support for upcoming Intel processors
  – More…
• Intel® Cluster MKL
  – Distributed-memory DFTs
  – Distributed-memory sparse solver
  – Additional ScaLAPACK performance optimizations

MKL Summary
• Easy way to portable code for all Intel® architectures, Linux* and Windows*
• MKL for the Itanium® processor is the path to easy high performance for applications
• Technical computation support
  – Linear algebra (BLAS, LAPACK)
  – FFTs
  – Vector transcendentals (VML)
  – Cluster computing being added
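
Worked example: decoding a BLAS name
The <precision><name><modifier> scheme from the naming-convention slides can be read mechanically for Level 2 and Level 3 routines. The sketch below is a hypothetical helper, not part of MKL or any BLAS distribution; the function name decode_blas_name and the wording of the descriptions are assumptions made for illustration. It covers only one-letter precisions and the matrix types and modifiers listed on the slides, using the standard Level 3 rank-update spellings rk and r2k.

```python
# Hypothetical helper (not an MKL API): decode a Level 2/3 BLAS routine name
# of the form <precision><name><modifier>, using the tables from the slides.

PRECISIONS = {
    "s": "single",
    "d": "double",
    "c": "single complex",
    "z": "double complex",
}

MATRIX_TYPES = {
    "ge": "general", "gb": "general band",
    "sy": "symmetric", "sp": "symmetric packed", "sb": "symmetric band",
    "he": "Hermitian", "hp": "Hermitian packed", "hb": "Hermitian band",
    "tr": "triangular", "tp": "triangular packed", "tb": "triangular band",
}

OPERATIONS = {
    # Level 2 modifiers
    "mv": "matrix-vector", "sv": "solve (vector operations)",
    "r": "rank update", "r2": "rank-2 update",
    # Level 3 modifiers
    "mm": "matrix-matrix", "sm": "solve (matrix operations)",
    "rk": "rank-k update", "r2k": "rank-2k update",
}


def decode_blas_name(name):
    """Split a name such as 'dgemm' into (precision, matrix type, operation)."""
    name = name.lower()
    precision = PRECISIONS.get(name[:1])    # first letter: precision
    matrix = MATRIX_TYPES.get(name[1:3])    # next two letters: matrix type
    operation = OPERATIONS.get(name[3:])    # remainder: modifier
    if precision is None or matrix is None or operation is None:
        raise ValueError(f"not a Level 2/3 name covered by this sketch: {name!r}")
    return precision, matrix, operation


if __name__ == "__main__":
    for routine in ("dgemm", "dger", "dsyr2k", "ztrsv"):
        print(routine, "->", decode_blas_name(routine))
```

Names outside the slide tables, such as Level 1 routines (cdotc) or conjugated rank updates (zgerc), raise ValueError here rather than being guessed at.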

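Worked example: what dger computes
The naming slide gives the dger operation as A := alpha * x * y’ + A. The sketch below spells that update out in plain Python to make the arithmetic concrete. It is a reference illustration only, under two assumptions: the function name ger_reference is hypothetical, and it returns a new matrix stored as a list of rows, whereas the real routine updates A in place and takes leading-dimension and stride arguments that are omitted here. It says nothing about how MKL implements the routine.

```python
# Reference sketch of the general rank update A := alpha * x * y' + A.
# Hypothetical helper for illustration; not an MKL or BLAS API.

def ger_reference(alpha, x, y, a):
    """Return alpha * x * y' + a for an m-by-n matrix a given as a list of rows."""
    m, n = len(x), len(y)
    if len(a) != m or any(len(row) != n for row in a):
        raise ValueError("a must have len(x) rows and len(y) columns")
    # Element (i, j) of the outer product x * y' is x[i] * y[j].
    return [[a[i][j] + alpha * x[i] * y[j] for j in range(n)] for i in range(m)]


if __name__ == "__main__":
    x = [1.0, 2.0]
    y = [3.0, 4.0, 5.0]
    a = [[1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0]]
    print(ger_reference(2.0, x, y, a))
```

The update touches each of the m*n matrix elements once and does a constant amount of arithmetic per element, which is consistent with the threading slide's point that level 2 BLAS is limited by memory bandwidth.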