Intel® Math Kernel Library
Release 7.0
March 2005

Intel® MKL Purpose
• Performance, performance, performance!
• Intel’s scientific and engineering floating-point math library
• Initially only basic linear algebra subroutines (BLAS) and fast Fourier transforms (FFTs)
• Addresses:
  – Solvers such as the linear algebra package (LAPACK) and BLAS
  – Eigenvector/eigenvalue solvers (BLAS, LAPACK)
  – Some quantum chemistry needs (dgemm)
  – PDEs, signal processing, seismic, solid-state physics (FFTs)
  – General scientific, financial: vector transcendental functions, vector math library (VML)
• Tuned for Intel processors, current and future
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

Intel® MKL Purpose – Don’ts
• But don’t use Intel MKL on …
  – Geometric transformation: (X’, Y’, Z’, W’) = 4x4 transformation matrix × (X, Y, Z, W)
  – But you could use Intel® IPP¹
• Don’t use Intel MKL on “small” counts
• Don’t call vector math functions on small n

¹ Intel Integrated Performance Primitives
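
To make the "small counts" point concrete, here is a minimal pure-Python sketch of the 4x4 geometric transformation mentioned above. The function name `transform_point` and the sample translation matrix are illustrative assumptions, not part of Intel MKL or Intel IPP.

```python
def transform_point(matrix, point):
    """Apply a 4x4 transformation matrix to a homogeneous point (X, Y, Z, W)."""
    if len(matrix) != 4 or any(len(row) != 4 for row in matrix) or len(point) != 4:
        raise ValueError("expected a 4x4 matrix and a 4-element point")
    # Only 16 multiplications in total: too little work to amortize the
    # call overhead of a routine tuned for large problems.
    return [sum(matrix[i][j] * point[j] for j in range(4)) for i in range(4)]


# Hypothetical example: translate a point by (1, 2, 3).
translate = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
moved = transform_point(translate, [10.0, 20.0, 30.0, 1.0])
```

With so few operations per call, a hand-written loop or a small-matrix library is usually the better fit.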

Intel® MKL Contents
BLAS (basic linear algebra subroutines)
• Level 1 BLAS – vector-vector operations
  – 15 function types
  – 48 functions
• Level 2 BLAS – matrix-vector operations
  – 26 function types
  – 66 functions
• Level 3 BLAS – matrix-matrix operations
  – 9 function types
  – 30 functions
• Extended BLAS – level 1 BLAS for sparse vectors
  – 8 function types
  – 24 functions

Intel® MKL Contents
LAPACK (linear algebra package)
  – Solvers and eigensolvers, hundreds of routines!
  – More than 1000 user-callable and support routines
FFTs (fast Fourier transforms)
  – One- and two-dimensional
  – With and without frequency ordering (bit reversal)
VML (vector math library)
  – Set of vectorized transcendental functions
  – Most of libm functions, but faster
Direct sparse solver (Pardiso*)
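
The "frequency ordering (bit reversal)" item refers to the index permutation that in-place radix-2 FFT algorithms typically leave their output in; skipping the reordering saves a pass over the data. The sketch below computes that permutation in plain Python. It is a generic illustration, not the MKL implementation, and the function name `bit_reverse_permutation` is an assumption.

```python
def bit_reverse_permutation(n):
    """Return the bit-reversed index order for a power-of-two length n."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1  # number of bits needed to index n elements
    order = []
    for i in range(n):
        reversed_index = 0
        for b in range(bits):
            if i & (1 << b):
                # Bit b of i becomes bit (bits - 1 - b) of the result.
                reversed_index |= 1 << (bits - 1 - b)
        order.append(reversed_index)
    return order


order8 = bit_reverse_permutation(8)
```

For a length-8 transform, index 1 (binary 001) maps to 4 (binary 100), index 3 (011) maps to 6 (110), and so on.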

Intel® MKL Contents
• Most of Intel MKL is Fortran interface
• Legacy of high performance computation
• BLAS, LAPACK are both Fortran, make up most of library
• CBLAS interface – more convenient for C/C++ programmer to call BLAS
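
One practical consequence of the Fortran heritage is storage order: Fortran BLAS expects column-major arrays, while C arrays are row-major, and CBLAS routines take a layout argument to account for this. The sketch below shows how the same 2x3 matrix lands in memory under each convention. The helper names are illustrative assumptions, not library calls.

```python
def column_major_offset(i, j, leading_dim):
    """Offset of element (i, j) when columns are contiguous (Fortran order)."""
    return i + j * leading_dim


def row_major_offset(i, j, leading_dim):
    """Offset of element (i, j) when rows are contiguous (C order)."""
    return i * leading_dim + j


rows, cols = 2, 3
matrix = [[1, 2, 3],
          [4, 5, 6]]
fortran_buffer = [0] * (rows * cols)
c_buffer = [0] * (rows * cols)
for i in range(rows):
    for j in range(cols):
        # Leading dimension is the row count for column-major storage
        # and the column count for row-major storage.
        fortran_buffer[column_major_offset(i, j, rows)] = matrix[i][j]
        c_buffer[row_major_offset(i, j, cols)] = matrix[i][j]
```

Passing a row-major buffer to a routine that assumes column-major storage silently operates on the transpose, which is why the layout has to be stated explicitly.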

Intel® MKL Contents - Environment
• Supports cdecl and CVF default interfaces
• Supports Intel and CVF Fortran compilers – import for this support relates to runtime libraries
• Supports Linux* and Windows* OS
• Static and dynamically linked libraries
• Supports all processors – 32-bit and 64-bit
• Large set of tests and examples
• Extensive documentation

Threading
• Most of Intel® MKL could be threaded but …
  – Limited resource is memory bandwidth
  – Threading level 1, level 2 BLAS mostly ineffective ( O(n) )
• Numerous opportunities for threading
  – Level 3 BLAS ( O(n^3) )
  – LAPACK ( O(n^3) )
  – FFTs ( O(n log n) )
  – VML? Depends on processor and function
• All threading uses OpenMP*
• All Intel MKL is designed and compiled for thread safety
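
The memory-bandwidth argument can be made concrete by counting floating-point operations per memory word touched. The sketch below uses textbook operation counts for one representative routine per BLAS level and assumes every operand is read once and every result written once, ignoring caches. The function name `flops_per_word` is an illustrative assumption.

```python
def flops_per_word(level, n):
    """Approximate floating-point operations per memory word touched."""
    if level == 1:      # axpy: y = a*x + y on length-n vectors
        flops, words = 2 * n, 3 * n
    elif level == 2:    # gemv: y = A*x + y with an n-by-n matrix
        flops, words = 2 * n * n, n * n + 3 * n
    elif level == 3:    # gemm: C = A*B + C with n-by-n matrices
        flops, words = 2 * n ** 3, 4 * n * n
    else:
        raise ValueError("level must be 1, 2, or 3")
    return flops / words


ratios = {level: flops_per_word(level, 1000) for level in (1, 2, 3)}
```

Under these assumptions, levels 1 and 2 perform at most about two operations per word regardless of n, so extra threads mostly wait on memory, while level 3 reuses each word roughly n/2 times.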

How to Link With MKL on Itanium®
• Set path to installation directory
  – E.g. export MKLPATH=/opt/intel/mkl
• Static sample:
  – ld myprog.o $MKLPATH/libmkl_lapack.a $MKLPATH/libmkl_ipf.a -L$MKLPATH -lguide -lpthread
  – Itanium®-based processor static linking of LAPACK and kernels. Processor dispatcher will call the appropriate kernel for the system at runtime.
• Dynamic sample:
  – ld myprog.o -L$MKLPATH -lmkl_lapack64 -lmkl -lguide -lpthread
  – Dynamic linking on Itanium®-based platforms, LAPACK library (double precision functions), Itanium-based processor kernels. Shared object dispatcher will dynamically load the appropriate shared object with specific kernel for the system at runtime.

BLAS Review
• 3 “levels” of functions + sparse
  – Level 1: vector-vector operations
  – Level 2: matrix-vector operations
  – Level 3: matrix-matrix operations
  – Sparse: level 1 operations on sparse vectors
• “Levels” follow history
  – Level 1 in the 1970s (published 1979)
  – Level 2 in the 1980s (published 1988), followed shortly by level 3 (published 1990)
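
As a rough illustration of the three levels, the sketch below gives unoptimized pure-Python reference versions of one operation per level. The names echo the BLAS routine suffixes (axpy, gemv, gemm), but the simplified signatures are assumptions for illustration and do not match the real BLAS argument lists.

```python
def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(a, x):
    """Level 2 (matrix-vector): return A*x for a row-list matrix A."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def gemm(a, b):
    """Level 3 (matrix-matrix): return A*B for row-list matrices."""
    columns_of_b = list(zip(*b))  # transpose B so columns can be iterated
    return [[sum(p * q for p, q in zip(row, col)) for col in columns_of_b]
            for row in a]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.0, 1.0], [1.0, 0.0]]
x = [1.0, 1.0]
y = [10.0, 20.0]
level1 = axpy(2.0, x, y)
level2 = gemv(a, x)
level3 = gemm(a, b)
```

A tuned library computes the same results; the difference lies in blocking, vectorization, and threading, not in the mathematics.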

BLAS Naming Conventions
• General scheme:

BLAS Naming Conventions
• Level 1 BLAS:

BLAS Naming Conventions
• Level 2
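
As a hedged illustration of the standard BLAS naming pattern for Level 2 and Level 3 routines (a precision prefix, a two-letter matrix-type code, then an operation suffix), the sketch below decodes a few routine names. The lookup tables cover only a subset of the codes, and the function name `decode_blas_name` is an assumption; Level 1 names such as daxpy carry no matrix-type code and are rejected by this simple decoder.

```python
PRECISIONS = {
    "s": "single real",
    "d": "double real",
    "c": "single complex",
    "z": "double complex",
}
# Partial table of matrix-type codes, enough for the examples below.
MATRIX_TYPES = {
    "ge": "general",
    "sy": "symmetric",
    "he": "Hermitian",
    "tr": "triangular",
    "gb": "general band",
}


def decode_blas_name(name):
    """Split a Level 2/3 BLAS name into (precision, matrix type, operation)."""
    name = name.lower()
    if len(name) < 4 or name[0] not in PRECISIONS or name[1:3] not in MATRIX_TYPES:
        raise ValueError(f"unrecognized routine name: {name!r}")
    return PRECISIONS[name[0]], MATRIX_TYPES[name[1:3]], name[3:]


dgemm = decode_blas_name("DGEMM")
ztrmv = decode_blas_name("ztrmv")
```

Here DGEMM reads as double precision, general matrix, matrix-matrix multiply, and ZTRMV as double complex, triangular matrix, matrix-vector multiply.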

RNG Functions
• Gaussian (RPM, Box-Muller methods)
• Exponential
• Laplace
• Uniform (a,b), (-a,a)
• Weibull
• Rayleigh
• Cauchy
• Lognormal
• Discrete Uniform [a,b)
• Geometric
• Bernoulli
• Others

DGEMM on IPF2: N = 1024
[Figure: DGEMM performance as a percentage of peak versus matrix dimensions m and k, shaded in 10% bands from 0% to 100%. 1.0 GHz Itanium™ 2 processor, in 6.0 update.]

LINPACK on 1.0 GHz IPF2
[Figure: LINPACK performance in MFLOPS (axis 0 to 16000) versus number of equations (160 to 14000) for 1, 2, and 4 CPUs.]

2D DFTs* on 900 MHz IPF2
[Figure: 2D DFT performance in MFLOPS versus transform size (128 to 2250) for 1P and 2P. *Single precision complex. MKL 6.0 β update.]

1D DFTs* on 900 MHz IPF2
[Figure: 1D DFT performance in MFLOPS versus transform size (96 to 1920) for 1P. *Single precision complex. MKL 6.0 β update.]

MKL Status, Plans
• Current production release is 7.2, available in 2 versions
  – Standard MKL
  – Cluster MKL
    – Standard MKL + ScaLAPACK
• Version 7.2.1 – to be released in Q1/2005
  – Improvements on Itanium®
  – BLAS: DGEMM: 1-3% improvement for TN and TT cases
  – BLAS: *TRMV, ZGERC, ZGERU: 20-30% improvement
  – VML – vdPowx: improved for special cases
  – To be released in Q3/2004

Future Releases of MKL
• New capabilities
  – C++ wrappers
  – Iterative sparse solver
  – LAPACK 4.0 support
  – Additional statistical functions
  – Support for upcoming Intel processors
  – More…
• Intel® Cluster MKL
  – Distributed memory DFTs
  – Distributed memory sparse solver
  – Additional ScaLAPACK performance optimizations

MKL Summary
• Easy way to portable code for all Intel® architectures, Linux* and Windows*
• MKL for Itanium® processor: path to easy high performance for applications
• Technical computation support
  – Linear algebra (BLAS, LAPACK)
  – FFTs
  – Vector transcendentals (VML)
  – Cluster computing being added