Computational Model
Computational Methods for Oil Recovery PASI: Scientific Computing in the Americas
The Challenge of Massive Parallelism
Luis M. de la Cruz Salas
Instituto de Geofísica, Universidad Nacional Autónoma de México
January 2011, Valparaíso, Chile
Comp EOR LMCS 1 / 47 Computational Model
Table of contents
1 Computational Model
    OOP & Generic Programming
    TUNA
    CUDA & Seldon
    Final Remarks
Structured Programming: subroutines, functions, procedures (C, F77, ...).
Modular Programming: a module contains a set of procedures along with the data they work on (F90, C++, ...).
Object-Oriented Programming: construction of user-defined Abstract Data Types (ADTs), called classes of objects. Behaviour and data are encapsulated in classes (C++, Java, F90+).
Generic Programming: we focus on the implementation of models defined by a set of requirements bundled into a concept (C++). E.g., the study of linear algebra algorithms produces Matrix and Vector concepts.
OOP & Generic Programming
Generic Programming (GP) I

A methodology for developing efficient and reusable software libraries [6].
Main idea: many algorithms can be abstracted away from the particular data structures on which they operate.
For example, the algorithm accumulate() successively applies a binary operator to each element of a container. A typical use is to sum the elements of a container using the addition operator.
The accumulate() algorithm in C++:
template <class InputIterator, class T>
T accumulate(InputIterator first, InputIterator last, T init);

double x[10];
vector ...
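The accumulate() idea above can be sketched as a small self-contained example. The name `my_accumulate` and the three helper functions are hypothetical, chosen only to avoid clashing with `std::accumulate`; the sketch assumes the default binary operator is addition. The same algorithm is applied to a raw array, a `std::vector` and a `std::list`, because all three provide iterators that can be compared, de-referenced and incremented.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical re-implementation of the accumulate idea (not std::accumulate itself).
// Requirements on InputIterator: comparable, de-referenceable, incrementable.
template <class InputIterator, class T>
T my_accumulate(InputIterator first, InputIterator last, T init) {
    for (; first != last; ++first)
        init = init + *first;  // apply the binary operator (here +) to each element
    return init;
}

// Raw pointers into an array are valid input iterators.
double sum_of_array() {
    double x[4] = {1.0, 2.0, 3.0, 4.0};
    return my_accumulate(x, x + 4, 0.0);
}

// std::vector: ten elements, each equal to 0.5.
double sum_of_vector() {
    std::vector<double> y(10, 0.5);
    return my_accumulate(y.begin(), y.end(), 0.0);
}

// std::list: a non-contiguous container works with the same algorithm.
int sum_of_list() {
    std::list<int> z;
    for (int i = 1; i <= 5; ++i) z.push_back(i);
    return my_accumulate(z.begin(), z.end(), 0);
}
```

For instance, `assert(sum_of_list() == 15)` holds, and the array and vector versions return 10.0 and 5.0 respectively.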
Generic Programming (GP) II

The algorithm can be used with any container that exports the InputIterator interface:
An InputIterator can be de-referenced.
An InputIterator can be incremented.

GP process (lifting):
Identify useful and efficient algorithms.
Focus on finding commonality among similar implementations of the same algorithm, and then find its generic representation.
Derive a set of (minimal) requirements that allow these algorithms to run, and to run efficiently.
Construct a framework based on classifications of requirements.

The lifting process provides us with suitable abstractions, so that a single generic algorithm can cover many concrete implementations.
Generic Programming (GP) III

Concept: describes a set of abstractions, each of which meets all of the requirements.
Graph algorithms will produce Graph concepts.
Linear algebra algorithms will produce Matrix and Vector concepts.
Model: an abstraction that meets all the requirements of a concept.

Summary:
The GP process gives us a better understanding of the problem domain.
GP is the creation of a systematic organization of abstract and efficient software components.
GP is programming with concepts.
GP is not about language features.
The Standard Template Library (STL) uses generic programming.
Thrust uses many concepts from the STL.
Classes and Objects

IMPES Algorithm
1: while (t < Tmax) do
2:   Calc. coeff. of pressure equation.
3:   Solve the pressure equation implicitly.
4:   Calc. coeff. of saturation equation.
5:   Solve the saturation eq. explicitly.
6:   t ← t + ∆t
7: end while

A C++ Implementation
while (t <= Tmax) {
    pressure.calcCoefficients();
    Solver::TDMA1D(pressure);
    pressure.update();

    saturation.calcCoefficients();
    Solver::solExplicit(saturation);
    saturation.update();

    t += dt;
}
An object has a name, attributes and behaviour. A class is a construct that is used as a blueprint to create objects. An object of a given class is called an instance of the class.
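The IMPES loop above calls `Solver::TDMA1D`, whose implementation is not shown in the slides. As a hedged sketch of what a one-dimensional tridiagonal solver of that kind could do, the following implements the Thomas algorithm. The function name `tdma`, its array-based interface and the test system are assumptions made for illustration; they are not the Solver API used by the author.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Thomas algorithm for a tridiagonal system (hypothetical interface).
// a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal (c[n-1] unused),
// d: right-hand side. Assumes n >= 1 and that no pivoting is needed.
std::vector<double> tdma(const std::vector<double>& a, std::vector<double> b,
                         const std::vector<double>& c, std::vector<double> d) {
    const std::size_t n = d.size();
    // Forward elimination on local copies of b and d.
    for (std::size_t i = 1; i < n; ++i) {
        const double m = a[i] / b[i - 1];
        b[i] -= m * c[i - 1];
        d[i] -= m * d[i - 1];
    }
    // Back substitution.
    std::vector<double> x(n);
    x[n - 1] = d[n - 1] / b[n - 1];
    for (std::size_t i = n - 1; i-- > 0;)
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i];
    return x;
}

// 3x3 system with diagonal 2 and off-diagonals -1; the exact solution is (1, 2, 3).
double tdma_example_error() {
    const double av[] = {0.0, -1.0, -1.0};
    const double bv[] = {2.0, 2.0, 2.0};
    const double cv[] = {-1.0, -1.0, 0.0};
    const double dv[] = {0.0, 0.0, 4.0};
    std::vector<double> x = tdma(std::vector<double>(av, av + 3), std::vector<double>(bv, bv + 3),
                                 std::vector<double>(cv, cv + 3), std::vector<double>(dv, dv + 3));
    return std::fabs(x[0] - 1.0) + std::fabs(x[1] - 2.0) + std::fabs(x[2] - 3.0);
}
```

The cost is linear in the number of unknowns, which is why a direct tridiagonal solve is a natural choice for the implicit pressure step in 1D.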
Class declaration

class Vector {
private:
    int size;
    double *data;
public:
    Vector();
    Vector(int);
    Vector(const Vector&);
    int getDimension() const;
    void setDimension(int);
    void resize(int);
    double& operator()(int i) const { return data[i]; }
};

class Matrix {
private:
    int rows, cols;
    Vector *d_vectors;
public:
    Matrix(int, int);
    Matrix(const Matrix&);
    int getCols() const;
    int getRows() const;
    void setCols(int);
    void setRows(int);
};

Vector operator+(const Vector&, const Vector&);

Vector x(10), y(10), z(10);
Matrix A(10,10), B(10,10);
z = x + y;
A = B * z;
Templates
Classes Functions template(s1 , s2 ) Vector a r r a y i ( 1 0 ) ; r e t u r n 0 ; Vector
Substitution of type parameters at compile time: template instantiation.
C++ templates support functions, classes and member functions.
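A minimal sketch of a class template and a function template, to illustrate instantiation. The `Vector` class below and the helpers `max_of`, `largest_int` and `largest_double` are hypothetical stand-ins written for this example; they are not the classes shown on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Class template: the element type T is a compile-time parameter.
template <typename T>
class Vector {
public:
    explicit Vector(std::size_t n) : data_(n, T()) {}
    std::size_t size() const { return data_.size(); }
    T& operator()(std::size_t i) { return data_[i]; }
    const T& operator()(std::size_t i) const { return data_[i]; }
private:
    std::vector<T> data_;
};

// Function template: works for any T that supports operator<.
template <typename T>
const T& max_of(const T& a, const T& b) { return (a < b) ? b : a; }

// Each distinct set of template arguments triggers its own instantiation.
int largest_int() {
    Vector<int> array_i(10);                 // instantiates Vector<int>
    array_i(0) = 3;
    array_i(1) = 7;
    return max_of(array_i(0), array_i(1));   // instantiates max_of<int>
}

double largest_double() {
    Vector<double> array_d(10);              // instantiates Vector<double>
    array_d(0) = 2.5;
    array_d(1) = -1.0;
    return max_of(array_d(0), array_d(1));   // instantiates max_of<double>
}
```

The compiler generates `Vector<int>` and `Vector<double>` as two unrelated classes, so no run-time dispatch is involved.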
Dynamic polymorphism

In C++, dynamic polymorphism relies on virtual functions, which can introduce run-time overhead:

class Matrix {
public:
    virtual double operator()(int i, int j);
};

class SymmetricMatrix : public Matrix {
public:
    virtual double operator()(int i, int j);
};

class UpperTriangularMatrix : public Matrix {
public:
    virtual double operator()(int i, int j);
};
//...
double sum(Matrix &A) {
    double suma = 0.0;                      // the accumulator must be initialized
    for (int j = 0; j < A.cols(); ++j)
        for (int i = 0; i < A.rows(); ++i)
            suma += A(i, j);                // one virtual call per element
    return suma;
}
//...
SymmetricMatrix A(N, N);
double suma = sum(A);
Engines

class Symmetric {
    //...
};
class UpperTriangular {
    //...
};
template
template
Matrix is a template class with the T engine parameter. Symmetric and UpperTriangular are two different engines that hide the particular implementation for each kind of matrix. All engines must have the same set of operations.
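A hedged sketch of the engine idea described above. The member names (`get`, `set`, `rows`, `cols`), the packed storage layout and the helper functions are assumptions made for this example; the slides only state that every engine must offer the same set of operations.

```cpp
#include <cassert>
#include <vector>

// Engine for symmetric matrices: stores only the lower triangle (packed).
class Symmetric {
public:
    explicit Symmetric(int n) : n_(n), data_(n * (n + 1) / 2, 0.0) {}
    int rows() const { return n_; }
    int cols() const { return n_; }
    double get(int i, int j) const { return data_[index(i, j)]; }
    void set(int i, int j, double v) { data_[index(i, j)] = v; }
private:
    static int index(int i, int j) {
        if (i < j) { int t = i; i = j; j = t; }  // map (i,j) onto the lower triangle
        return i * (i + 1) / 2 + j;
    }
    int n_;
    std::vector<double> data_;
};

// Engine for upper triangular matrices: entries below the diagonal read as zero.
class UpperTriangular {
public:
    explicit UpperTriangular(int n) : n_(n), data_(n * n, 0.0) {}
    int rows() const { return n_; }
    int cols() const { return n_; }
    double get(int i, int j) const { return (i > j) ? 0.0 : data_[i * n_ + j]; }
    void set(int i, int j, double v) { if (i <= j) data_[i * n_ + j] = v; }
private:
    int n_;
    std::vector<double> data_;
};

// Matrix delegates every operation to its engine; the choice is made at compile time.
template <class Engine>
class Matrix {
public:
    explicit Matrix(int n) : engine_(n) {}
    int rows() const { return engine_.rows(); }
    int cols() const { return engine_.cols(); }
    double operator()(int i, int j) const { return engine_.get(i, j); }
    void set(int i, int j, double v) { engine_.set(i, j, v); }
private:
    Engine engine_;
};

// One generic sum for every engine, with no virtual calls.
template <class Engine>
double sum(const Matrix<Engine>& A) {
    double s = 0.0;
    for (int j = 0; j < A.cols(); ++j)
        for (int i = 0; i < A.rows(); ++i)
            s += A(i, j);
    return s;
}

double symmetric_sum() {
    Matrix<Symmetric> A(2);
    A.set(0, 0, 1.0); A.set(1, 0, 2.0); A.set(1, 1, 3.0);  // A(0,1) mirrors A(1,0)
    return sum(A);
}

double upper_sum() {
    Matrix<UpperTriangular> U(2);
    U.set(0, 0, 1.0); U.set(0, 1, 2.0); U.set(1, 1, 3.0);
    U.set(1, 0, 5.0);                                      // ignored: below the diagonal
    return sum(U);
}
```

Compared with the virtual-function version, the call `A(i, j)` can be inlined because the engine type is known when `sum` is instantiated.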
Curiously Recurring Template Pattern (CRTP) [3]
Also known as Barton & Nackman trick [2]. template
double operator()(int i, int j) { return asLeaf()(i ,j); } } ; class SymmetricMatrix : public Matrix
SymmetricMatrix A; double suma = sum(A);
The base class takes as parameter the type of the derived class.
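A minimal, self-contained sketch of the pattern, assuming a toy `SymmetricMatrix` whose entries are computed rather than stored (2 on the diagonal, 1 elsewhere). The members `size()` and the helper `crtp_example()` are hypothetical additions for this example; only `asLeaf()` and `operator()` appear in the slide.

```cpp
#include <cassert>

// The base class receives the derived (leaf) type as its template parameter.
template <class Leaf>
class Matrix {
public:
    const Leaf& asLeaf() const { return static_cast<const Leaf&>(*this); }
    // Both calls are resolved at compile time: no virtual table is involved.
    double operator()(int i, int j) const { return asLeaf()(i, j); }
    int size() const { return asLeaf().size(); }
};

// Toy leaf class: a symmetric n x n matrix with computed entries.
class SymmetricMatrix : public Matrix<SymmetricMatrix> {
public:
    explicit SymmetricMatrix(int n) : n_(n) {}
    int size() const { return n_; }
    double operator()(int i, int j) const { return (i == j) ? 2.0 : 1.0; }
private:
    int n_;
};

// Written once against the base class template; works for any leaf.
template <class Leaf>
double sum(const Matrix<Leaf>& A) {
    double suma = 0.0;
    for (int j = 0; j < A.size(); ++j)
        for (int i = 0; i < A.size(); ++i)
            suma += A(i, j);
    return suma;
}

double crtp_example() {
    SymmetricMatrix A(3);
    return sum(A);  // 3 diagonal entries of 2 plus 6 off-diagonal entries of 1
}
```

The design trade-off is that the leaf type must be known at compile time, so matrices of different kinds cannot be stored behind a single base pointer as with virtual functions.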
Template Metaprogramming ([4]) I

Template metaprogramming is about writing programs that represent and manipulate other programs or themselves.
Metaprograms are executed at compile time.
Prime number calculation: Erwin Unruh [ANSI X3J16-94-0075/ISO WG21-462, 1994]. The program does not compile:

unruh.cpp 15: Cannot convert 'enum' to D<2> in function Prime_print
unruh.cpp 15: Cannot convert 'enum' to D<3> in function Prime_print
unruh.cpp 15: Cannot convert 'enum' to D<5> in function Prime_print
unruh.cpp 15: Cannot convert 'enum' to D<7> in function Prime_print
unruh.cpp 15: Cannot convert 'enum' to D<11> in function Prime_print
unruh.cpp 15: Cannot convert 'enum' to D<13> in function Prime_print
unruh.cpp 15: Cannot convert 'enum' to D<17> in function Prime_print
unruh.cpp 15: Cannot convert 'enum' to D<19> in function Prime_print

But it actually runs: the primes are computed by the compiler and appear in its error messages.
Template Metaprogramming ([4]) II

Metainformation is information that is available at compile time and cannot be changed at run time: types, constants, non-type template parameters, names of functions or member functions.

Factorial calculation:

template <int N>
struct Factorial {
    enum { value = N * Factorial<N - 1>::value };
};

// Specialization that ends the recursion.
template <>
struct Factorial<0> {
    enum { value = 1 };
};

const int fact5 = Factorial<5>::value;
Some compilers have limits on template recursion. TM can generate many MB of executable code.
Template Metaprogramming ([4]) III
Unrolling
template
template
template struct meta_dot { template
template<> struct meta_dot<0> { template
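A minimal sketch of loop unrolling with a template metaprogram, in the spirit of the dot-product example in [4]. The exact signatures here are assumptions: this version works on raw pointers and a compile-time length `N`, and the helper `dot3_example()` is hypothetical. The recursion over the index is resolved by the compiler, so no run-time loop remains.

```cpp
#include <cassert>

// Recursive case: contributes a[I]*b[I] and recurses on I-1.
template <int I>
struct meta_dot {
    template <class T>
    static T f(const T* a, const T* b) {
        return a[I] * b[I] + meta_dot<I - 1>::f(a, b);
    }
};

// Base case: stops the recursion at index 0.
template <>
struct meta_dot<0> {
    template <class T>
    static T f(const T* a, const T* b) { return a[0] * b[0]; }
};

// Entry point: N is the vector length, known at compile time.
template <int N, class T>
inline T dot(const T* a, const T* b) {
    return meta_dot<N - 1>::f(a, b);
}

double dot3_example() {
    const double a[3] = {1.0, 2.0, 3.0};
    const double b[3] = {4.0, 5.0, 6.0};
    // Expands to a[2]*b[2] + a[1]*b[1] + a[0]*b[0].
    return dot<3>(a, b);
}
```

This is only worthwhile for small, fixed sizes; for long vectors the generated code grows with `N`.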
Template Metaprogramming ([4]) IV
TinyVector
Flow control
------| template
Expression Templates ([5]) I
Pair evaluation: source code | pseudo code ------+------template
Expression Templates ([5]) II

The basic idea of ET is to overload operators so that they build expression trees:
template
Vector A, B, C, D; D = A + B + C;
X< X< Vector, plus, Vector>, plus, Vector >
Leaves of the tree should store info about arrays.
Expression Templates ([5]) III
Simple code: class plus { }; class Vector { };
template
template
Vector A, B, C, D; D = A + B + C; = X
Expression Templates ([5]) IV

More complete code:
class plus {
public:
    static double apply(double a, double b) { return a + b; }
};
template
X(Left t1, Right t2) : leftnode_(t1), rightnode_(t2) { }
double operator()(int i) { return Op::apply( leftnode_(i), rightnode_(i) ); } };
Expression Templates ([5]) V
class Vector { public: Vector(double* data, int N) : data_(data), N_(N) { }
template
private: double* data_; int N_; };
template
Expression Templates ([5]) VI

Let's see how it works:
D = A + B + C;
D = X<Vector, plus, Vector>(A, B) + C;
D = X< X
D.operator=( X< X
for(int i = 0; i < N_; ++i) data_(i) = plus::apply( X
for(int i = 0; i < N_; ++i) data_(i) = plus::apply( plus::apply( A(i), B(i) ), C(i) );
for(int i = 0; i < N_; ++i) data_(i) = plus::apply( A(i)+B(i),C(i) );
for(int i = 0; i < N_; ++i) data_(i) = A(i) + B(i) + C(i);
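The walk-through above can be reproduced with the following self-contained sketch. It reuses the names `plus`, `X` and `Vector` from the slides, but the details are assumptions: `Vector` is a non-owning view over an existing array, only `operator+` is provided, and the two `operator+` overloads and the helper `et_example()` are written for this example rather than taken from the original code.

```cpp
#include <cassert>

// Operation tag: applies + to two scalars.
struct plus {
    static double apply(double a, double b) { return a + b; }
};

// Expression node: stores its two operands and evaluates element i on demand.
template <class Left, class Op, class Right>
class X {
public:
    X(Left l, Right r) : leftnode_(l), rightnode_(r) {}
    double operator()(int i) const { return Op::apply(leftnode_(i), rightnode_(i)); }
private:
    Left leftnode_;
    Right rightnode_;
};

// Non-owning view over N doubles, so copying it into a node is cheap.
class Vector {
public:
    Vector(double* data, int N) : data_(data), N_(N) {}

    // Assigning an expression runs a single loop; no temporary vectors are created.
    template <class Left, class Op, class Right>
    void operator=(const X<Left, Op, Right>& expr) {
        for (int i = 0; i < N_; ++i) data_[i] = expr(i);
    }
    double operator()(int i) const { return data_[i]; }
private:
    double* data_;
    int N_;
};

// Vector + Vector builds a leaf-level node.
inline X<Vector, plus, Vector> operator+(const Vector& a, const Vector& b) {
    return X<Vector, plus, Vector>(a, b);
}

// (expression) + Vector wraps the existing tree in a new node.
template <class L, class Op, class R>
X<X<L, Op, R>, plus, Vector> operator+(const X<L, Op, R>& a, const Vector& b) {
    return X<X<L, Op, R>, plus, Vector>(a, b);
}

double et_example() {
    double a[3] = {1.0, 2.0, 3.0};
    double b[3] = {10.0, 20.0, 30.0};
    double c[3] = {100.0, 200.0, 300.0};
    double d[3] = {0.0, 0.0, 0.0};
    Vector A(a, 3), B(b, 3), C(c, 3), D(d, 3);
    // Right-hand side has type X< X<Vector, plus, Vector>, plus, Vector >.
    D = A + B + C;
    return d[0] + d[1] + d[2];
}
```

After the assignment, `d` holds {111, 222, 333}, computed in one pass over the data.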
Expression Templates ([5]) VII

Linear algebra libraries that use ET:
Blitz++ (sourceforge.net/projects/blitz/files)
FLENS (flens.sourceforge.net): interfaced with BLAS and LAPACK.
Seldon (seldon.sourceforge.net): interfaced with BLAS, LAPACK, MUMPS, SuperLU, UmfPack; Python interface generated by Swig.
Eigen (eigen.tuxfamily.org)
uBlas (www.boost.org/doc/libs/1_38_0/libs/numeric/ublas)
Krylov algorithms

Conjugate Gradient in FLENS

Algorithm:
for k = 1 to ... do
    if $p_k = 0$ then
        $x_k$ is solution of $Ax = b$
    else
        $\alpha \leftarrow \dfrac{r_k^T r_k}{p_k^T A p_k}$
        $x_{k+1} \leftarrow x_k + \alpha p_k$
        $r_{k+1} \leftarrow r_k - \alpha A p_k$
        $\beta_k \leftarrow \dfrac{r_{k+1}^T r_{k+1}}{r_k^T r_k}$
        $p_{k+1} \leftarrow r_{k+1} + \beta_k p_k$
    end if
end for

FLENS code:
template ...
    r = b - A*x;
    p = r;
    rNormSquare = r*r;
    for (long k=1; k<=maxIterations; k++) {
        if (rNormSquare<=tol) {
            return k-1;
        }
        Ap = A*p;
        alpha = rNormSquare/(p*Ap);
        x += alpha*p;
        r -= alpha*Ap;
        rNormSquarePrev = rNormSquare;
        rNormSquare = r*r;
        beta = rNormSquare/rNormSquarePrev;
        p = beta*p + r;
    }
    return maxIterations;
}
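For readers without FLENS at hand, the same iteration can be sketched in plain C++. This is not the FLENS code: the dense `std::vector` storage, the helper names `dot` and `matvec`, the signature of `cg` and the 2x2 test system are all assumptions made for illustration. As in the code above, `tol` is compared against the squared residual norm.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Mat;  // dense row-major matrix, for illustration only

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

static Vec matvec(const Mat& A, const Vec& x) {
    Vec y(x.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

// x holds the initial guess on entry and the approximate solution on exit.
// Returns the number of iterations performed. A is assumed symmetric positive definite.
int cg(const Mat& A, const Vec& b, Vec& x, double tol, int maxIterations) {
    const std::size_t n = b.size();
    Vec r = matvec(A, x);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - r[i];  // r = b - A*x
    Vec p = r;
    double rNormSquare = dot(r, r);
    for (int k = 1; k <= maxIterations; ++k) {
        if (rNormSquare <= tol) return k - 1;
        const Vec Ap = matvec(A, p);
        const double alpha = rNormSquare / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rNormSquarePrev = rNormSquare;
        rNormSquare = dot(r, r);
        const double beta = rNormSquare / rNormSquarePrev;
        for (std::size_t i = 0; i < n; ++i) p[i] = beta * p[i] + r[i];
    }
    return maxIterations;
}

// Solves [[4,1],[1,3]] x = (1,2); the exact solution is x = (1/11, 7/11).
double cg_example_error() {
    Mat A(2, Vec(2, 0.0));
    A[0][0] = 4.0; A[0][1] = 1.0; A[1][0] = 1.0; A[1][1] = 3.0;
    Vec b(2);
    b[0] = 1.0; b[1] = 2.0;
    Vec x(2, 0.0);
    cg(A, b, x, 1e-20, 100);
    return std::fabs(x[0] - 1.0 / 11.0) + std::fabs(x[1] - 7.0 / 11.0);
}
```

With expression templates, the element-wise loops in this sketch collapse into the one-line vector statements of the FLENS version without creating temporaries.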
TUNA

Template Units for Numerical Applications I

TUNA uses several C++ template techniques.
It is based on Blitz++, which uses template metaprogramming (TM) and expression templates (ET).
Template Units for Numerical Applications II
GeneralEquation class.
namespace Tuna {
template
TwoPhaseEquation class:
namespace Tuna { //... template
inline Tscheme& asDerived() { return static_cast<Tscheme&>(*this); }
Template Units for Numerical Applications III
BLIP1 adaptor.
template
Template Units for Numerical Applications IV
Generic IMPES:
TwoPhaseEquation< BLIP1
while (t <= Tmax) {
    pressure.calcCoefficients();
    Solver::TDMA1D(pressure);
    pressure.update();

    saturation.calcCoefficients();
    Solver::solExplicit(saturation);
    saturation.update();

    t += dt;
}

To change how the coefficients are calculated, use:
TwoPhaseEquation< MyPRE
You need to provide the MyPRE and MySAT implementations.
CUDA & Seldon

Buckley–Leverett in 3D

Property    Value
Length      300 m
k           1.0E-15 m^2
φ           0.2
µw          1.0E-03 Pa.s
µo          1.0E-03 Pa.s
Srw         0
Sro         0.2
gin         3.4722E-07 m.s^-1
pout        1E+07 Pa

Zero capillary pressure.
Pressure eq.: $-\nabla \cdot (k \lambda \nabla p) = 0$
Saturation eq.: $\phi \dfrac{\partial S}{\partial t} - \nabla \cdot (k \lambda_w \nabla p) = 0$
Steps for IMPES in CUDA

Ongoing work by Daniel Monsivais, MSc student in computer science.
Allocate memory for arrays.
Initialize pressure-related arrays.
Initialize saturation-related arrays.
IMPES loop:
    Solve pressure implicitly (IMP) ⇐ BiCGStab (CUSPARSE).
    Update arrays.
    Solve saturation explicitly (ES) ⇐ CUDA kernel.
    Update arrays.
Visualize and finish.
IMP (BiCGStab): CUDA vs Seldon I to III (figures)
Explicit Saturation Kernel I

$$
\begin{aligned}
S(i,j,k) ={}& S_0(i,j,k) + sp(i,j,k) \\
& - A_P(i,j,k)\, p(i,j,k) \\
& + A_E(i,j,k)\, p(i+1,j,k) + A_W(i,j,k)\, p(i-1,j,k) \\
& + A_N(i,j,k)\, p(i,j+1,k) + A_S(i,j,k)\, p(i,j-1,k) \\
& + A_F(i,j,k)\, p(i,j,k+1) + A_B(i,j,k)\, p(i,j,k-1)
\end{aligned}
$$

First two terms ($S_0 + sp$): computed with 2 axpy's (CUBLAS).
Last 7 terms: a Mat * Vec operation in CSR format.
Explicit Saturation Kernel II
cuda_explicit(double * val, int * I, int * J, double * p, double *phi, double *phi_0, double *sp, unsigned int N, unsigned int num_blocks , int num_threads ) { dim3 blocks(num_blocks,1); dim3 threads(num_threads,1); first_sat<<
__global__ void first_sat(double *val, int *I, int *J, double *p, double *phi, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per matrix row
    double sum = 0;

    if (idx < N) {
        int initial = I[idx];      // CSR row pointers: first entry of this row
        int final = I[idx + 1];    // one past the last entry of this row

        for (int i = initial; i < final; i++) {
            int j = J[i];          // column index of entry i
            // The diagonal entry (A_P) enters with a minus sign.
            sum += (j != idx) ? (val[i] * p[j]) : (-val[i] * p[j]);
        }
        phi[idx] = sum;
    }
}
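A CPU reference for the row-wise loop in the kernel can be handy when checking GPU results. The sketch below mirrors the kernel's logic in plain C++; the function names `csr_signed_matvec` and `csr_example` and the small tridiagonal test matrix are assumptions made for this example. Here `rowPtr`, `colIdx` and `val` play the roles of `I`, `J` and `val` in the kernel.

```cpp
#include <cassert>
#include <vector>

// y = A' * p, where A is stored in CSR and A' is A with its diagonal negated.
std::vector<double> csr_signed_matvec(const std::vector<double>& val,
                                      const std::vector<int>& rowPtr,
                                      const std::vector<int>& colIdx,
                                      const std::vector<double>& p) {
    const int n = static_cast<int>(rowPtr.size()) - 1;
    std::vector<double> y(n, 0.0);
    for (int row = 0; row < n; ++row) {  // the kernel assigns one GPU thread per row
        double sum = 0.0;
        for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k) {
            const int col = colIdx[k];
            // Off-diagonal entries are added, the diagonal entry is subtracted.
            sum += (col != row) ? val[k] * p[col] : -val[k] * p[col];
        }
        y[row] = sum;
    }
    return y;
}

// 3x3 tridiagonal matrix with diagonal 2 and off-diagonals 1, applied to p = (1, 2, 3).
std::vector<double> csr_example() {
    const double v[] = {2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0};
    const int r[] = {0, 2, 5, 7};
    const int c[] = {0, 1, 0, 1, 2, 1, 2};
    const double pv[] = {1.0, 2.0, 3.0};
    return csr_signed_matvec(std::vector<double>(v, v + 7), std::vector<int>(r, r + 4),
                             std::vector<int>(c, c + 7), std::vector<double>(pv, pv + 3));
}
```

Row by row, the example evaluates to -2 + 2 = 0, 1 - 4 + 3 = 0 and 2 - 6 = -4.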
Intel i5 vs Nvidia Tesla C1060 I

Device 0: "Tesla C1060"

CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 1.3
Total amount of global memory: 4294770688 bytes
Multiprocessors x Cores/MP = Cores: 30 (MP) x 8 (Cores/MP) = 240 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.30 GHz
...
Intel i5 vs Nvidia Tesla C1060 II to IV (figures)
*Uploading and downloading data to the device was taken into account
Final Remarks

Oil reservoir simulation is a grand challenge problem.
It requires skills from many areas of study.
Software engineering can be very helpful in organizing your developments.
Parallel computing is essential (MPI, OpenMP, CUDA, ...).
Reuse code.
Thank you very much!
References

[1] I. Jacobson, G. Booch and J. Rumbaugh, The Unified Software Development Process, Addison-Wesley, 1999.
[2] J. Barton and L. R. Nackman, Scientific and Engineering C++: An Introduction with Advanced Techniques and Examples, Addison-Wesley Professional, 1994.
[3] D. Vandevoorde and N. M. Josuttis, C++ Templates: The Complete Guide, Addison-Wesley, 2002.
[4] T. Veldhuizen, Using C++ template metaprograms, C++ Report, 7(4):36–43, May 1995. Reprinted in C++ Gems, ed. Stanley Lippman.
[5] T. L. Veldhuizen, Expression templates, C++ Report, 7(5):26–31, June 1995. Reprinted in C++ Gems, ed. Stanley Lippman.
[6] Generic Programming, http://www.generic-programming.org/.