The Design and Implementation of FFTW3 Matteo Frigo and Steven G

Total Page:16

File Type:pdf, Size:1020Kb

The Design and Implementation of FFTW3 Matteo Frigo and Steven G 1 The Design and Implementation of FFTW3 Matteo Frigo and Steven G. Johnson (Invited Paper) Abstract— FFTW is an implementation of the discrete Fourier of a multi-dimensional array. (Most implementations sup- transform (DFT) that adapts to the hardware in order to port only a single DFT of contiguous data.) maximize performance. This paper shows that such an approach • FFTW supports DFTs of real data, as well as of real can yield an implementation that is competitive with hand- optimized libraries, and describes the software structure that symmetric/antisymmetric data (also called discrete co- makes our current FFTW3 version flexible and adaptive. We sine/sine transforms). further discuss a new algorithm for real-data DFTs of prime size, The interaction of the user with FFTW occurs in two a new way of implementing DFTs by means of machine-specific stages: planning, in which FFTW adapts to the hardware, “SIMD” instructions, and how a special-purpose compiler can and execution, in which FFTW performs useful work for the derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm. user. To compute a DFT, the user first invokes the FFTW planner, specifying the problem to be solved. The problem is Index Terms— FFT, adaptive software, Fourier transform, a data structure that describes the “shape” of the input data— cosine transform, Hartley transform, I/O tensor. array sizes and memory layouts—but does not contain the data itself. In return, the planner yields a plan, an executable data I. INTRODUCTION structure that accepts the input data and computes the desired DFT. Afterwards, the user can execute the plan as many times FTW [1] is a widely used free-software library that as desired. computes the discrete Fourier transform (DFT) and its F The FFTW planner works by measuring the actual run time various special cases. Its performance is competitive even with of many different plans and by selecting the fastest one. This vendor-optimized programs, but unlike these programs, FFTW process is analogous to what a programmer would do by hand is not tuned to a fixed machine. Instead, FFTW uses a planner when tuning a program to a fixed machine, but in FFTW’s case to adapt its algorithms to the hardware in order to maximize no manual intervention is required. Because of the repeated performance. The input to the planner is a problem, a multi- performance measurements, however, the planner tends to be dimensional loop of multi-dimensional DFTs. The planner time-consuming. In performance-critical applications, many applies a set of rules to recursively decompose a problem into transforms of the same size are typically required, and there- simpler sub-problems of the same type. “Sufficiently simple” fore a large one-time cost is usually acceptable. Otherwise, problems are solved directly by optimized, straight-line code FFTW provides a mode of operation where the planner quickly that is automatically generated by a special-purpose compiler. returns a “reasonable” plan that is not necessarily the fastest. This paper describes the overall structure of FFTW as well as The planner generates plans according to rules that recur- the specific improvements in FFTW3, our latest version. sively decompose a problem into simpler sub-problems. When FFTW is fast, but its speed does not come at the expense of the problem becomes “sufficiently simple,” FFTW produces flexibility. In fact, FFTW is probably the most flexible DFT a plan that calls a fragment of optimized straight-line code library available: that solves the problem directly. These fragments are called • FFTW is written in portable C and runs well on many codelets in FFTW’s lingo. You can envision a codelet as architectures and operating systems. computing a “small” DFT, but many variations and special • FFTW computes DFTs in O(n log n) time for any cases exist. For example, a codelet might be specialized to length n. (Most other DFT implementations are either compute the DFT of real input (as opposed to complex). 2 restricted to a subset of sizes or they become Θ(n ) for FFTW’s speed depends therefore on two factors. First, the certain values of n, for example when n is prime.) decomposition rules must produce a space of plans that is rich • FFTW imposes no restrictions on the rank (dimension- enough to contain “good” plans for most machines. Second, ality) of multi-dimensional transforms. (Most other im- the codelets must be fast, since they ultimately perform all the plementations are limited to one-dimensional, or at most real work. two- and three-dimensional data.) FFTW’s codelets are generated automatically by a special- • FFTW supports multiple and/or strided DFTs; for exam- purpose compiler called genfft. Most users do not interact ple, to transform a 3-component vector field or a portion with genfft at all: the standard FFTW distribution contains a set of about 150 pre-generated codelets that cover the most M. Frigo is with the IBM Austin Research Laboratory, 11501 Burnet Road, Austin, TX 78758. He was supported in part by the Defense Advanced common uses. Users with special needs can use genfft Research Projects Agency (DARPA) under contract No. NBCH30390004. to generate their own codelets. genfft is useful because S. G. Johnson is with the Massachusetts Institute of Technology, 77 of the following features. From a high-level mathematical Mass. Ave. Rm. 2-388, Cambridge, MA 02139. He was supported in part by the Materials Research Science and Engineering Center program of the description of a DFT algorithm, genfft derives an optimized National Science Foundation under award DMR-9400334. implementation automatically. From a complex-DFT algo- Published in Proc. IEEE, vol. 93, no. 2, pp. 216–231 (2005). rithm, genfft automatically derives an optimized algorithm result. The most important FFT (and the one primarily used in for the real-input DFT. We take advantage of this property to FFTW) is known as the “Cooley-Tukey” algorithm, after the implement real-data DFTs (Section VII), as well as to exploit two authors who rediscovered and popularized it in 1965 [14], machine-specific “SIMD” instructions (Section IX). Similarly, although it had been previously known as early as 1805 by genfft automatically derives codelets for the discrete cosine Gauss as well as by later re-inventors [15]. The basic idea (DCT) and sine (DST) transforms (Section VIII). We summa- behind this FFT is that a DFT of a composite size n = n1n2 rize genfft in Section VI, while a full description appears can be re-expressed in terms of smaller DFTs of sizes n1 and in [2]. n2—essentially, as a two-dimensional DFT of size n1 × n2 We have produced three major implementations of FFTW, where the output is transposed. The choices of factorizations each building on the experience of the previous system. of n, combined with the many different ways to implement the FFTW1 [3] (1997) introduced the idea of generating codelets data re-orderings of the transpositions, have led to numerous automatically, and of letting a planner search for the best implementation strategies for the Cooley-Tukey FFT, with combination of codelets. FFTW2 (1998) incorporated a new many variants distinguished by their own names [16], [17]. version of genfft [2]. genfft did not change much in FFTW implements a space of many such variants, as described FFTW3 (2003), but the runtime structure was completely later, but here we derive the basic algorithm, identify its key rewritten to allow for a much larger space of plans. This paper features, and outline some important historical variations and describes the main ideas common to all FFTW systems, the their relation to FFTW. runtime structure of FFTW3, and the modifications to genfft The Cooley-Tukey algorithm can be derived as follows. If since FFTW2. n can be factored into n = n1n2, Eq. (1) can be rewritten by Previous work on adaptive systems includes [3]–[11]. In letting j = j1n2 + j2 and k = k1 + k2n1. We then have: particular, SPIRAL [9], [10] is another system focused on Y [k1 + k2n1] = (2) optimization of Fourier transforms and related algorithms, but it has distinct differences from FFTW. SPIRAL searches n2−1 n1−1 X X j1k1 j2k1 j2k2 X[j1n2 + j2]ω ω ω . at compile-time over a space of mathematically equivalent n1 n n2 formulas expressed in a “tensor-product” language, whereas j2=0 j1=0 FFTW searches at runtime over the formalism discussed in Thus, the algorithm computes n2 DFTs of size n1 (the inner Section IV, which explicitly includes low-level details, such as sum), multiplies the result by the so-called twiddle factors j2k1 strides and memory alignments, that are not as easily expressed ωn , and finally computes n1 DFTs of size n2 (the outer using tensor products. SPIRAL generates machine-dependent sum). This decomposition is then continued recursively. The code, whereas FFTW’s codelets are machine-independent. literature uses the term radix to describe an n1 or n2 that FFTW’s search uses dynamic programming [12, chapter 16], is bounded (often constant); the small DFT of the radix is while the SPIRAL project has experimented with a wider traditionally called a butterfly. range of search strategies including machine-learning tech- Many well-known variations are distinguished by the radix niques [13]. alone. A decimation in time (DIT) algorithm uses n2 as the The remainder of this paper is organized as follows. We radix, while a decimation in frequency (DIF) algorithm uses n1 begin with a general overview of fast Fourier transforms in as the radix. If multiple radices are used, e.g. for n composite Section II. Then, in Section III, we compare the performance but not a prime power, the algorithm is called mixed radix.
Recommended publications
  • The Fast Fourier Transform
    The Fast Fourier Transform Derek L. Smith SIAM Seminar on Algorithms - Fall 2014 University of California, Santa Barbara October 15, 2014 Table of Contents History of the FFT The Discrete Fourier Transform The Fast Fourier Transform MP3 Compression via the DFT The Fourier Transform in Mathematics Table of Contents History of the FFT The Discrete Fourier Transform The Fast Fourier Transform MP3 Compression via the DFT The Fourier Transform in Mathematics Navigating the Origins of the FFT The Royal Observatory, Greenwich, in London has a stainless steel strip on the ground marking the original location of the prime meridian. There's also a plaque stating that the GPS reference meridian is now 100m to the east. This photo is the culmination of hundreds of years of mathematical tricks which answer the question: How to construct a more accurate clock? Or map? Or star chart? Time, Location and the Stars The answer involves a naturally occurring reference system. Throughout history, humans have measured their location on earth, in order to more accurately describe the position of astronomical bodies, in order to build better time-keeping devices, to more successfully navigate the earth, to more accurately record the stars... and so on... and so on... Time, Location and the Stars Transoceanic exploration previously required a vessel stocked with maps, star charts and a highly accurate clock. Institutions such as the Royal Observatory primarily existed to improve a nations' navigation capabilities. The current state-of-the-art includes atomic clocks, GPS and computerized maps, as well as a whole constellation of government organizations.
    [Show full text]
  • Bring Down the Two-Language Barrier with Julia Language for Efficient
    AbstractSDRs: Bring down the two-language barrier with Julia Language for efficient SDR prototyping Corentin Lavaud, Robin Gerzaguet, Matthieu Gautier, Olivier Berder To cite this version: Corentin Lavaud, Robin Gerzaguet, Matthieu Gautier, Olivier Berder. AbstractSDRs: Bring down the two-language barrier with Julia Language for efficient SDR prototyping. IEEE Embedded Systems Letters, Institute of Electrical and Electronics Engineers, 2021, pp.1-1. 10.1109/LES.2021.3054174. hal-03122623 HAL Id: hal-03122623 https://hal.archives-ouvertes.fr/hal-03122623 Submitted on 27 Jan 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. AbstractSDRs: Bring down the two-language barrier with Julia Language for efficient SDR prototyping Corentin LAVAUD∗, Robin GERZAGUET∗, Matthieu GAUTIER∗, Olivier BERDER∗. ∗ Univ Rennes, CNRS, IRISA, [email protected] Abstract—This paper proposes a new methodology based glue interface [8]. Regarding programming semantic, low- on the recently proposed Julia language for efficient Software level languages (e.g. C++/C, Rust) offers very good runtime Defined Radio (SDR) prototyping. SDRs are immensely popular performance but does not offer a good prototyping experience. as they allow to have a flexible approach for sounding, monitoring or processing radio signals through the use of generic analog High-level languages (e.g.
    [Show full text]
  • Lecture 11 : Discrete Cosine Transform Moving Into the Frequency Domain
    Lecture 11 : Discrete Cosine Transform Moving into the Frequency Domain Frequency domains can be obtained through the transformation from one (time or spatial) domain to the other (frequency) via Fourier Transform (FT) (see Lecture 3) — MPEG Audio. Discrete Cosine Transform (DCT) (new ) — Heart of JPEG and MPEG Video, MPEG Audio. Note : We mention some image (and video) examples in this section with DCT (in particular) but also the FT is commonly applied to filter multimedia data. External Link: MIT OCW 8.03 Lecture 11 Fourier Analysis Video Recap: Fourier Transform The tool which converts a spatial (real space) description of audio/image data into one in terms of its frequency components is called the Fourier transform. The new version is usually referred to as the Fourier space description of the data. We then essentially process the data: E.g . for filtering basically this means attenuating or setting certain frequencies to zero We then need to convert data back to real audio/imagery to use in our applications. The corresponding inverse transformation which turns a Fourier space description back into a real space one is called the inverse Fourier transform. What do Frequencies Mean in an Image? Large values at high frequency components mean the data is changing rapidly on a short distance scale. E.g .: a page of small font text, brick wall, vegetation. Large low frequency components then the large scale features of the picture are more important. E.g . a single fairly simple object which occupies most of the image. The Road to Compression How do we achieve compression? Low pass filter — ignore high frequency noise components Only store lower frequency components High pass filter — spot gradual changes If changes are too low/slow — eye does not respond so ignore? Low Pass Image Compression Example MATLAB demo, dctdemo.m, (uses DCT) to Load an image Low pass filter in frequency (DCT) space Tune compression via a single slider value n to select coefficients Inverse DCT, subtract input and filtered image to see compression artefacts.
    [Show full text]
  • Parallel Fast Fourier Transform Transforms
    Parallel Fast Fourier Parallel Fast Fourier Transform Transforms Massimiliano Guarrasi – [email protected] MassimilianoSuper Guarrasi Computing Applications and Innovation Department [email protected] Fourier Transforms ∞ H ()f = h(t)e2πift dt ∫−∞ ∞ h(t) = H ( f )e−2πift df ∫−∞ Frequency Domain Time Domain Real Space Reciprocal Space 2 of 49 Discrete Fourier Transform (DFT) In many application contexts the Fourier transform is approximated with a Discrete Fourier Transform (DFT): N −1 N −1 ∞ π π π H ()f = h(t)e2 if nt dt ≈ h e2 if ntk ∆ = ∆ h e2 if ntk n ∫−∞ ∑ k ∑ k k =0 k=0 f = n / ∆ = ∆ n tk k / N = ∆ fn n / N −1 () = ∆ 2πikn / N H fn ∑ hk e k =0 The last expression is periodic, with period N. It define a ∆∆∆ between 2 sets of numbers , Hn & hk ( H(f n) = Hn ) 3 of 49 Discrete Fourier Transforms (DFT) N −1 N −1 = 2πikn / N = 1 −2πikn / N H n ∑ hk e hk ∑ H ne k =0 N n=0 frequencies from 0 to fc (maximum frequency) are mapped in the values with index from 0 to N/2-1, while negative ones are up to -fc mapped with index values of N / 2 to N Scale like N*N 4 of 49 Fast Fourier Transform (FFT) The DFT can be calculated very efficiently using the algorithm known as the FFT, which uses symmetry properties of the DFT s um. 5 of 49 Fast Fourier Transform (FFT) exp(2 πi/N) DFT of even terms DFT of odd terms 6 of 49 Fast Fourier Transform (FFT) Now Iterate: Fe = F ee + Wk/2 Feo Fo = F oe + Wk/2 Foo You obtain a series for each value of f n oeoeooeo..oe F = f n Scale like N*logN (binary tree) 7 of 49 How to compute a FFT on a distributed memory system 8 of 49 Introduction • On a 1D array: – Algorithm limits: • All the tasks must know the whole initial array • No advantages in using distributed memory systems – Solutions: • Using OpenMP it is possible to increase the performance on shared memory systems • On a Multi-Dimensional array: – It is possible to use distributed memory systems 9 of 49 Multi-dimensional FFT( an example ) 1) For each value of j and k z Apply FFT to h( 1..
    [Show full text]
  • Evaluating Fourier Transforms with MATLAB
    ECE 460 – Introduction to Communication Systems MATLAB Tutorial #2 Evaluating Fourier Transforms with MATLAB In class we study the analytic approach for determining the Fourier transform of a continuous time signal. In this tutorial numerical methods are used for finding the Fourier transform of continuous time signals with MATLAB are presented. Using MATLAB to Plot the Fourier Transform of a Time Function The aperiodic pulse shown below: x(t) 1 t -2 2 has a Fourier transform: X ( jf ) = 4sinc(4π f ) This can be found using the Table of Fourier Transforms. We can use MATLAB to plot this transform. MATLAB has a built-in sinc function. However, the definition of the MATLAB sinc function is slightly different than the one used in class and on the Fourier transform table. In MATLAB: sin(π x) sinc(x) = π x Thus, in MATLAB we write the transform, X, using sinc(4f), since the π factor is built in to the function. The following MATLAB commands will plot this Fourier Transform: >> f=-5:.01:5; >> X=4*sinc(4*f); >> plot(f,X) In this case, the Fourier transform is a purely real function. Thus, we can plot it as shown above. In general, Fourier transforms are complex functions and we need to plot the amplitude and phase spectrum separately. This can be done using the following commands: >> plot(f,abs(X)) >> plot(f,angle(X)) Note that the angle is either zero or π. This reflects the positive and negative values of the transform function. Performing the Fourier Integral Numerically For the pulse presented above, the Fourier transform can be found easily using the table.
    [Show full text]
  • Fourier Transforms & the Convolution Theorem
    Convolution, Correlation, & Fourier Transforms James R. Graham 11/25/2009 Introduction • A large class of signal processing techniques fall under the category of Fourier transform methods – These methods fall into two broad categories • Efficient method for accomplishing common data manipulations • Problems related to the Fourier transform or the power spectrum Time & Frequency Domains • A physical process can be described in two ways – In the time domain, by h as a function of time t, that is h(t), -∞ < t < ∞ – In the frequency domain, by H that gives its amplitude and phase as a function of frequency f, that is H(f), with -∞ < f < ∞ • In general h and H are complex numbers • It is useful to think of h(t) and H(f) as two different representations of the same function – One goes back and forth between these two representations by Fourier transforms Fourier Transforms ∞ H( f )= ∫ h(t)e−2πift dt −∞ ∞ h(t)= ∫ H ( f )e2πift df −∞ • If t is measured in seconds, then f is in cycles per second or Hz • Other units – E.g, if h=h(x) and x is in meters, then H is a function of spatial frequency measured in cycles per meter Fourier Transforms • The Fourier transform is a linear operator – The transform of the sum of two functions is the sum of the transforms h12 = h1 + h2 ∞ H ( f ) h e−2πift dt 12 = ∫ 12 −∞ ∞ ∞ ∞ h h e−2πift dt h e−2πift dt h e−2πift dt = ∫ ( 1 + 2 ) = ∫ 1 + ∫ 2 −∞ −∞ −∞ = H1 + H 2 Fourier Transforms • h(t) may have some special properties – Real, imaginary – Even: h(t) = h(-t) – Odd: h(t) = -h(-t) • In the frequency domain these
    [Show full text]
  • Fourier Transform, Convolution Theorem, and Linear Dynamical Systems April 28, 2016
    Mathematical Tools for Neuroscience (NEU 314) Princeton University, Spring 2016 Jonathan Pillow Lecture 23: Fourier Transform, Convolution Theorem, and Linear Dynamical Systems April 28, 2016. Discrete Fourier Transform (DFT) We will focus on the discrete Fourier transform, which applies to discretely sampled signals (i.e., vectors). Linear algebra provides a simple way to think about the Fourier transform: it is simply a change of basis, specifically a mapping from the time domain to a representation in terms of a weighted combination of sinusoids of different frequencies. The discrete Fourier transform is therefore equiv- alent to multiplying by an orthogonal (or \unitary", which is the same concept when the entries are complex-valued) matrix1. For a vector of length N, the matrix that performs the DFT (i.e., that maps it to a basis of sinusoids) is an N × N matrix. The k'th row of this matrix is given by exp(−2πikt), for k 2 [0; :::; N − 1] (where we assume indexing starts at 0 instead of 1), and t is a row vector t=0:N-1;. Recall that exp(iθ) = cos(θ) + i sin(θ), so this gives us a compact way to represent the signal with a linear superposition of sines and cosines. The first row of the DFT matrix is all ones (since exp(0) = 1), and so the first element of the DFT corresponds to the sum of the elements of the signal. It is often known as the \DC component". The next row is a complex sinusoid that completes one cycle over the length of the signal, and each subsequent row has a frequency that is an integer multiple of this \fundamental" frequency.
    [Show full text]
  • SEISGAMA: a Free C# Based Seismic Data Processing Software Platform
    Hindawi International Journal of Geophysics Volume 2018, Article ID 2913591, 8 pages https://doi.org/10.1155/2018/2913591 Research Article SEISGAMA: A Free C# Based Seismic Data Processing Software Platform Theodosius Marwan Irnaka, Wahyudi Wahyudi, Eddy Hartantyo, Adien Akhmad Mufaqih, Ade Anggraini, and Wiwit Suryanto Seismology Research Group, Physics Department, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Sekip Utara Bulaksumur, Yogyakarta 55281, Indonesia Correspondence should be addressed to Wiwit Suryanto; [email protected] Received 4 July 2017; Accepted 24 December 2017; Published 30 January 2018 Academic Editor: Filippos Vallianatos Copyright © 2018 Teodosius Marwan Irnaka et al. Tis is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Seismic refection is one of the most popular methods in geophysical prospecting. Nevertheless, obtaining high resolution and accurate results requires a sophisticated processing stage. Tere are many open-source seismic refection data processing sofware programs available; however, they ofen use a high-level programming language that decreases its overall performance, lacks intuitive user-interfaces, and is limited to a small set of tasks. Tese shortcomings reveal the need to develop new sofware using a programming language that is natively supported by Windows5 operating systems, which uses a relatively medium-level programming language (such as C#) and can be enhanced by an intuitive user interface. SEISGAMA was designed to address this need and employs a modular concept, where each processing group is combined into one module to ensure continuous and easy development and documentation.
    [Show full text]
  • Chapter B12. Fast Fourier Transform Taking the FFT of Each Row of a Two-Dimensional Matrix: § 1236 Chapter B12
    http://www.nr.com or call 1-800-872-7423 (North America only), or send email to [email protected] (outside North Amer readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, v Permission is granted for internet users to make one paper copy their own personal use. Further reproduction, or any copyin Copyright (C) 1986-1996 by Cambridge University Press. Programs Copyright (C) 1986-1996 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN FORTRAN 90: THE Art of PARALLEL Scientific Computing (ISBN 0-521-57439-0) Chapter B12. Fast Fourier Transform The algorithms underlying the parallel routines in this chapter are described in §22.4. As described there, the basic building block is a routine for simultaneously taking the FFT of each row of a two-dimensional matrix: SUBROUTINE fourrow_sp(data,isign) USE nrtype; USE nrutil, ONLY : assert,swap IMPLICIT NONE COMPLEX(SPC), DIMENSION(:,:), INTENT(INOUT) :: data INTEGER(I4B), INTENT(IN) :: isign Replaces each row (constant first index) of data(1:M,1:N) by its discrete Fourier trans- form (transform on second index), if isign is input as 1; or replaces each row of data by N times its inverse discrete Fourier transform, if isign is input as −1. N must be an integer power of 2. Parallelism is M-fold on the first index of data. INTEGER(I4B) :: n,i,istep,j,m,mmax,n2 REAL(DP) :: theta COMPLEX(SPC), DIMENSION(size(data,1)) :: temp COMPLEX(DPC) :: w,wp Double precision for the trigonometric recurrences.
    [Show full text]
  • Julia: a Modern Language for Modern ML
    Julia: A modern language for modern ML Dr. Viral Shah and Dr. Simon Byrne www.juliacomputing.com What we do: Modernize Technical Computing Today’s technical computing landscape: • Develop new learning algorithms • Run them in parallel on large datasets • Leverage accelerators like GPUs, Xeon Phis • Embed into intelligent products “Business as usual” will simply not do! General Micro-benchmarks: Julia performs almost as fast as C • 10X faster than Python • 100X faster than R & MATLAB Performance benchmark relative to C. A value of 1 means as fast as C. Lower values are better. A real application: Gillespie simulations in systems biology 745x faster than R • Gillespie simulations are used in the field of drug discovery. • Also used for simulations of epidemiological models to study disease propagation • Julia package (Gillespie.jl) is the state of the art in Gillespie simulations • https://github.com/openjournals/joss- papers/blob/master/joss.00042/10.21105.joss.00042.pdf Implementation Time per simulation (ms) R (GillespieSSA) 894.25 R (handcoded) 1087.94 Rcpp (handcoded) 1.31 Julia (Gillespie.jl) 3.99 Julia (Gillespie.jl, passing object) 1.78 Julia (handcoded) 1.2 Those who convert ideas to products fastest will win Computer Quants develop Scientists prepare algorithms The last 25 years for production (Python, R, SAS, DEPLOY (C++, C#, Java) Matlab) Quants and Computer Compress the Scientists DEPLOY innovation cycle collaborate on one platform - JULIA with Julia Julia offers competitive advantages to its users Julia is poised to become one of the Thank you for Julia. Yo u ' v e k i n d l ed leading tools deployed by developers serious excitement.
    [Show full text]
  • Praktikum Iz Softverskih Alata U Elektronici
    PRAKTIKUM IZ SOFTVERSKIH ALATA U ELEKTRONICI 2017/2018 Predrag Pejović 31. decembar 2017 Linkovi na primere: I OS I LATEX 1 I LATEX 2 I LATEX 3 I GNU Octave I gnuplot I Maxima I Python 1 I Python 2 I PyLab I SymPy PRAKTIKUM IZ SOFTVERSKIH ALATA U ELEKTRONICI 2017 Lica (i ostali podaci o predmetu): I Predrag Pejović, [email protected], 102 levo, http://tnt.etf.rs/~peja I Strahinja Janković I sajt: http://tnt.etf.rs/~oe4sae I cilj: savladavanje niza programa koji se koriste za svakodnevne poslove u elektronici (i ne samo elektronici . ) I svi programi koji će biti obrađivani su slobodan softver (free software), legalno možete da ih koristite (i ne samo to) gde hoćete, kako hoćete, za šta hoćete, koliko hoćete, na kom računaru hoćete . I literatura . sve sa www, legalno, besplatno! I zašto svake godine (pomalo) updated slajdovi? Prezentacije predmeta I engleski I srpski, kraća verzija I engleski, prezentacija i animacije I srpski, prezentacija i animacije A šta se tačno radi u predmetu, koji programi? 1. uvod (upravo slušate): organizacija nastave + (FS: tehnička, ekonomska i pravna pitanja, kako to uopšte postoji?) (≈ 1 w) 2. operativni sistem (GNU/Linux, Ubuntu), komandna linija (!), shell scripts, . (≈ 1 w) 3. nastavak OS, snalaženje, neki IDE kao ilustracija i vežba, jedan Python i jedan C program . (≈ 1 w) 4.L ATEX i LATEX 2" (≈ 3 w) 5. XCircuit (≈ 1 w) 6. probni kolokvijum . (= 1 w) 7. prvi kolokvijum . 8. GNU Octave (≈ 1 w) 9. gnuplot (≈ (1 + ) w) 10. wxMaxima (≈ 1 w) 11. drugi kolokvijum . 12. Python, IPython, PyLab, SymPy (≈ 3 w) 13.
    [Show full text]
  • Introduction to Matlab
    Introduction to Numerical Libraries Shaohao Chen High performance computing @ Louisiana State University Outline 1. Introduction: why numerical libraries? 2. Fast Fourier transform: FFTw 3. Linear algebra libraries: LAPACK 4. Krylov subspace solver: PETSc 5. GNU scientific libraries: GSL 1. Introduction It is very easy to write bad code! • Loops : branching, dependencies • I/O : resource conflicts, • Memory : long fetch-time • Portability : code and data portable? • Readability : Can you still understand it? A solution: look for existing libraries! Why numerical libraries? • Many functions or subroutines you need may have already been coded by others. Just use them. • Not necessary to code every line by yourself. • Check available libs before starting to write your program. • Save your time and efforts! Advantages of using numerical libraries: • Computing optimizations • Portability • Easy to debug • Easy to read What you will learn in this training • General knowledge of numerical libs. • How to check available libs on HPC systems and the web. • How to use numerical libs on HPC systems. • Basic programming with numerical libs. List of notable numerical libraries • Linear Algebra Package (LAPACK): computes matrix and vector operations. [Fortran] • Fastest Fourier Transform in the West (FFTW): computes Fourier and related transforms. [C/C++] • GNU Scientific Library (GSL): provides a wide range of mathematical routines. [C/C++] • Portable, Extensible Toolkit for Scientific Computation (PETSc): a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations. [C/C++] • NumPy: adds support for the manipulation of large, multi-dimensional arrays and matrices; also includes a large collection of high-level mathematical functions.
    [Show full text]