Run Time Comparison of MATLAB, Scilab and GNU Octave on Various Benchmark Programs

Version 0.2 – 07/26/2016
Roland Baudin, <[email protected]>, July 2016

1. Introduction

This document presents a comparison of the run times of MATLAB, Scilab and GNU Octave (abbreviated as "Octave" in the sequel) on 50 benchmark programs found on the Internet or developed by the author.

2. Hardware and software used

2.1. Hardware

The computer used for the benchmarks is an HP Z820 workstation equipped with an Intel Xeon E5-2687W processor @ 3.10 GHz, with 16 cores, 32 threads and 128 GB RAM.

2.2. Software

The operating system is Ubuntu Linux 16.04 Xenial Xerus, amd64 (64-bit) architecture. The MATLAB, Scilab and Octave versions used are given in the following table. It has to be noted that Octave was compiled with the experimental Just-In-Time (JIT) compiler enabled (see Appendix 7.1 for details). There is no such feature in Scilab, while in MATLAB the JIT feature is enabled by default.

Software | Version
---------|--------------------
MATLAB   | R2014a (8.3.0.532)
Scilab   | 5.5.2
Octave   | 4.0.2 (JIT enabled)

Table 2.1: Software versions

The three packages use the same linear algebra library, LAPACK. This library, in turn, relies on the BLAS (Basic Linear Algebra Subprograms) library for its internal computations. There are several implementations of the BLAS library that can be used within math packages like MATLAB, Scilab and Octave (and others, like Python SciPy). The four major implementations are Reference BLAS (RefBLAS), Atlas BLAS (Atlas), OpenBLAS and the Intel Math Kernel Library (MKL).

RefBLAS, Atlas and OpenBLAS are free software and are directly available through the package system of the Ubuntu Linux distribution. MKL is a commercial library developed by Intel. It is free for non-commercial use, and Intel provides a specific bundle that has to be installed manually in Ubuntu.
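On Debian-based systems such as Ubuntu, the packaged BLAS implementations register themselves with the alternatives system, so the active library can be inspected and switched globally. A minimal sketch, assuming libblas3, libatlas3-base and libopenblas-base are installed; the alternative names `libblas.so.3` and `liblapack.so.3` are the ones used on Ubuntu 16.04 and may differ on other releases (the Intel MKL bundle does not register an alternative):

```shell
# List the BLAS implementations known to the alternatives system.
update-alternatives --list libblas.so.3

# Interactively select which implementation libblas.so.3 points to;
# the LAPACK alternative should be switched consistently with it.
sudo update-alternatives --config libblas.so.3
sudo update-alternatives --config liblapack.so.3
```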
RefBLAS provides a reference single-threaded implementation of BLAS, but it is slow on some complex linear algebra operations. Atlas and OpenBLAS are highly optimized multithreaded BLAS implementations and are usually faster. MKL is a multithreaded implementation of BLAS highly optimized for Intel processors and is usually very fast on such hardware.

Scilab and Octave are free software and can be compiled under Ubuntu Linux in such a way that it is easy to switch between the four implementations of the BLAS library (see Appendix 7.2 for the details of the procedure). MATLAB, as commercial software, is shipped with a given version of the MKL, so it is not possible to switch to another BLAS implementation.

The following table gives the versions of the various BLAS libraries used in this comparison.

BLAS library | Version    | Package name in Ubuntu
-------------|------------|-----------------------
RefBLAS      | 3.6.0      | libblas3
Atlas        | 3.10.2     | libatlas3-base
OpenBLAS     | 0.2.18     | libopenblas-base
MKL          | 11.3.3.210 | N/A
MKL (MATLAB) | 11.0.5     | N/A

Table 2.2: BLAS versions

3. Description of the benchmarks

We used four different sets of benchmark programs. The first three were found on the Internet and only slightly modified for the purpose of this comparison. The fourth was written specifically by the author. The four benchmark sets are named after their directory names in the benchmark package. They are described below.

3.1. pincon

This benchmark set was written by Bruno Pinçon (hence the benchmark name) and is extensively described in [1]. It consists of 31 programs that target various operations: loops, random generation, recursive calls, quick sorts, extractions and insertions, matrix operations, permutations, comparisons, cell reads and writes. This benchmark set does not make intensive use of the BLAS routines, because its aim is mainly to test the speed of the interpreter and of general matrix operations.

3.2. poisson

This benchmark set contains only two programs.
It was developed by Neeraj Sharma and Matthias K. Gobbert and is fully described in [2]. The purpose of this benchmark is to provide a complex problem that could be encountered in real life. The implemented problem results from a finite difference discretization of the Poisson equation (hence the name) in two spatial dimensions. The problem is solved using two algorithms: Gaussian elimination (GE) and conjugate gradient (CG).

3.3. ncrunch

This benchmark set consists of 15 programs. It was designed by Stefan Steinhaus and is described in [3]. The purpose of this benchmark is to be a general "number crunching" benchmark (hence the name). It tests loops, random generation, sorting, FFT, determinants, matrix inverses, eigenvalues, Cholesky decomposition, principal components, special functions and linear regression. This benchmark makes intensive use of the BLAS routines and is therefore very sensitive to the BLAS implementation.

3.4. optim

This benchmark set contains two programs and was developed specifically by the author of this comparison. These programs implement two popular optimization metaheuristics, called Particle Swarm Optimization (PSO) and Differential Evolution (DE). These metaheuristics are described in [4] and [5]. Since many variants of these methods exist, we chose to implement the "standard" versions, which are the most popular and the simplest. The PSO method is used to find the global minimum of the Rosenbrock function in 30 dimensions. The DE method is used to find the global minimum of the Rastrigin function in 10 dimensions. The Rosenbrock and Rastrigin functions are standard test functions used for the evaluation of global optimization methods. Both methods are loop intensive by nature. However, the standard PSO method can easily be vectorized, while only some parts of the DE method can be vectorized. Thus, the DE method is slower than the PSO.
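As an illustration of the kind of loop-intensive code this benchmark set measures (not the benchmark programs themselves, which are written for MATLAB, Scilab and Octave), the standard global-best PSO loop can be sketched in Python, here applied to a low-dimensional Rosenbrock function. The parameter values below (swarm size, inertia weight w, acceleration coefficients c1 and c2, search bounds) are typical textbook defaults, not the settings used in the benchmark:

```python
import random

def rosenbrock(x):
    # Standard Rosenbrock test function; global minimum 0 at (1, ..., 1).
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))

def pso(f, dim, n_particles=40, iters=2000, lo=-2.0, hi=2.0,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    # Standard (global-best) PSO: each particle remembers its own best
    # position and is also attracted toward the swarm's best position.
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best, best_val = pso(rosenbrock, dim=5)
print(best_val)
```

The velocity update above is fully vectorizable over particles and dimensions, which is why the PSO benchmark runs faster than the DE one, whose selection step forces an explicit loop.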
This benchmark set does not make much use of the BLAS routines and is rather insensitive to the choice of the BLAS library.

4. Results

4.1. Detailed results

The detailed results (i.e. run times in seconds, lower is better) of the four benchmark sets are given in tables 4.1 to 4.4. We comment on these results below for each benchmark set.

– pincon

The first observation is that the MKL gives by far the worst total results for Scilab and Octave on this benchmark set. Indeed, looking at the individual benchmark results, we see that some programs have nearly the same performance with all BLAS implementations (fibonacci, random generation, quick sort, fannkuch, etc.), while others (subsets, crible, loop calls, etc.) have much higher run times with MKL. A similar phenomenon can be seen in MATLAB (though there is no comparison point with another BLAS implementation in MATLAB): for example, it is quite strange that the hmeans benchmark (a simple class partitioning algorithm) has a run time of ~3 s in MATLAB, while it is only ~0.4 s in Scilab and ~0.1 s in Octave. The same observation can be made for the make_perm (permutation algorithm), loop calls, crible and simplexe benchmarks. A possible explanation is that the MKL is well optimized for complex linear algebra operations (as will be seen later) but that this optimization has some negative impact on simpler matrix operations. However, this is only a conjecture, since we do not know the internals of the MKL library.

It has to be noted that Scilab gives bad results on the cell test (construction and reading), but this should be fixed in the next generation of the program (Scilab 6).

If we disregard the Scilab and Octave results with MKL, the three packages give comparable results on the pincon benchmark set, whatever the BLAS library. Overall, MATLAB is slightly faster, Scilab comes next, and Octave is last.
Benchmark name               | MATLAB (MKL) | Scilab (RefBLAS) | Scilab (Atlas) | Scilab (OpenBLAS) | Scilab (MKL) | Octave (RefBLAS) | Octave (Atlas) | Octave (OpenBLAS) | Octave (MKL)
-----------------------------|--------------|------------------|----------------|-------------------|--------------|------------------|----------------|-------------------|-------------
fib(24)                      | 1.11         | 0.52             | 0.54           | 0.59              | 0.67         | 2.04             | 2.06           | 2.03              | 1.99
subsets(1:20,7)              | 0.76         | 0.51             | 0.57           | 0.57              | 3.31         | 2.24             | 2.37           | 2.25              | 2.22
irn55_rand(1.1e6)            | 0.78         | 0.44             | 0.44           | 0.46              | 0.48         | 2.28             | 2.24           | 2.26              | 2.25
merge_sort, 10000 elts       | 0.29         | 0.42             | 0.45           | 0.46              | 0.47         | 2.48             | 2.48           | 2.44              | 2.41
qsort, 40000 elts, 25 times  | 1.28         | 0.62             | 0.66           | 0.68              | 0.70         | 2.83             | 2.85           | 2.83              | 2.73
harmvect(1e6)                | 0.46         | 0.22             | 0.23           | 0.21              | 0.22         | 0.22             | 0.23           | 0.21              | 0.22
harmloop(1e6), 3 times       | 0.02         | 0.47             | 0.51           | 0.51              | 0.50         | 0.01             | 0.01           | 0.02              | 0.01
fannkuch(7)                  | 0.46         | 0.29             | 0.30           | 0.31              | 0.30         | 2.37             | 2.39           | 2.34              | 2.30
my_lu 400x400                | 2.04         | 0.15             | 0.15           | 0.14              | 1.90         | 0.10             | 0.10           | 0.10              | 1.21
crible(3e6)                  | 1.19         | 0.12             | 0.11           | 0.11              | 1.46         | 0.09             | 0.09           | 0.09              | 1.19
p = make_perm(20000), 4 times| 3.80         | 0.44             | 0.44           | 0.49              | 2.28         | 2.21             | 2.24           | 2.21              | 4.50
q = inv_perm(p)              | 0.01         | 0.26             | 0.30           | 0.28              | 0.29         | 0.05             | 0.06           | 0.05              | 0.05
fftx 2^15 elts, 30 times     | 1.79         | 2.76             | 2.86           | 2.89              | 2.92         | 4.10             | 4.13           | 4.08              | 4.02
pascal(1029)                 | 0.20         | 0.34             | 0.36           | 0.36              | 0.36         | 0.50             | 0.50           | 0.51              | 0.50
hmeans                       |              |                  |                |                   |              |                  |                |                   |
