Numerical Analysis, Modelling and Simulation

Edited by Griffin Cook

ISBN: 978-1-9789-1530-5

© 2018 Library Press

Published by Library Press, 5 Penn Plaza, 19th Floor, New York, NY 10001, USA

Cataloging-in-Publication Data

Numerical analysis, modelling and simulation / edited by Griffin Cook.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-9789-1530-5
1. Numerical analysis. 2. Mathematical models. 3. Simulation methods. I. Cook, Griffin.
QA297 .N86 2018
518--dc23

This book contains information obtained from authentic and highly regarded sources. All chapters are published with permission under the Creative Commons Attribution Share Alike License or equivalent. A wide variety of references are listed. Permissions and sources are indicated; for detailed attributions, please refer to the permissions page. Reasonable efforts have been made to publish reliable data and information, but the authors, editors and publisher cannot assume any responsibility for the validity of all materials or the consequences of their use.

Copyright of this ebook is with Library Press, rights acquired from the original print publisher, Larsen and Keller Education.

Trademark Notice: All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

The publisher’s policy is to use permanent paper from mills that operate a sustainable forestry policy. Furthermore, the publisher ensures that the text paper and cover boards used have met acceptable environmental accreditation standards.

Table of Contents

Preface

Chapter 1 Introduction to Numerical Analysis

Chapter 2 Understanding Simulation
• Simulation
• Computer Simulation
• Dynamic Simulation
• Discrete Event Simulation
• List of Computer Simulation Software

Chapter 3 Modeling: An Overview
• Mathematical Model
• Conceptual Model
• Conceptual Model (Computer Science)
• Multiscale Modeling
• Ontology (Information Science)
• Statistical Model

Chapter 4 Theorems in Approximation Theory
• Approximation Theory
• Stone–Weierstrass Theorem
• Fejér’s Theorem
• Bernstein’s Theorem (Approximation Theory)
• Favard’s Theorem
• Müntz–Szász Theorem

Chapter 5 Methods and Techniques of Numerical Analysis
• Numerical Methods for Ordinary Differential Equations
• Series Acceleration
• Minimum Polynomial Extrapolation
• Richardson Extrapolation
• Shanks Transformation
• Interpolation
• Van Wijngaarden Transformation
• Matrix Splitting
• Gaussian Elimination
• Convex Optimization

Chapter 6 Essential Aspects of Numerical Analysis
• Numerical Integration
• Monte Carlo Method
• Monte Carlo Integration
• Mathematical Optimization
• Optimization Problem
• Multi-objective Optimization
• Eigendecomposition of a Matrix
• Singular Value Decomposition
• System of Linear Equations

Chapter 7 Various Numerical Analysis Software
• List of Numerical Analysis Software
• TK Solver
• LAPACK
• DataMelt
• Analytica (Software)
• GNU Octave
• Julia (Programming Language)

Chapter 8 Applications of Simulation
• Flight Simulator
• Robotics Suite
• Reservoir Simulation
• UrbanSim
• Traffic Simulation
• Stochastic Simulation

Permissions

Index

Preface

Numerical analysis, modeling and simulation are complex and intricate subjects, which play a crucial role in the visual (2D and 3D) representation of concepts and objects that are otherwise not visible at a phenomenal level. They are used in fields like systems theory, science education, knowledge visualization and even the philosophy of science. The text presents these difficult subjects in comprehensible and easy-to-understand language. It includes topics that are important for a holistic understanding of the subject matter, and it studies and analyses the pillars of the subject and their significance in modern times. A coherent flow of topics, student-friendly language and extensive use of examples make this textbook an invaluable source of knowledge.

Given below is the chapter wise description of the book:

Chapter 1- Numerical analysis is a topic in mathematics that concerns itself with the study of algorithms. Numerical analysis has applications in all fields of engineering; recently it has been adopted by the life sciences also. This chapter will provide an integrated understanding of numerical analysis.

Chapter 2- The process of re-creating the operations of the processes that occur in the real world over time is referred to as simulation. Simulation can only take place when the model it is imitating has already been fully developed. This section on simulation offers an insightful focus, keeping in mind the subject matter.

Chapter 3- A mathematical model is a description of a system using mathematical concepts; the process of developing such a model is known as mathematical modeling. The major elements of mathematical modeling are governing equations, constitutive equations, constraints, kinematic equations etc. These major components are discussed in this section.

Chapter 4- Approximation theory concerns itself with how functions can be approximated by simpler functions. The theorems explained in this section include the Stone-Weierstrass theorem, Fejér’s theorem, Bernstein’s theorem and Favard’s theorem. The chapter encompasses and incorporates the theorems used in approximation theory, providing a complete understanding.

Chapter 5- The methods and techniques of numerical analysis include series acceleration, minimum polynomial extrapolation, Richardson extrapolation, the Shanks transformation and interpolation. Series acceleration improves the rate of convergence of a series. It is also used to obtain a variety of identities on special functions. The aspects elucidated in this chapter are of vital importance and provide a better understanding of numerical analysis.

Chapter 6- Numerical integration is a broad family of algorithms for calculating the numerical value of a definite integral. The aspects of numerical analysis explained in this section are the Monte Carlo method, Monte Carlo integration, mathematical optimization, optimization problems, singular value decomposition etc. The topics discussed in the section are of great importance in broadening the existing knowledge of numerical analysis.


Chapter 7- This chapter lists numerical analysis software; examples include TK Solver, LAPACK, DataMelt, Analytica and GNU Octave. TK Solver is mathematical modeling software that is based on a declarative, rule-based language. Analytica is software developed by Lumina Decision Systems for creating and analyzing quantitative decision models. The section serves as a source for understanding numerical analysis software.

Chapter 8- Simulation has numerous applications; some of these applications are flight simulators, robotics suites, reservoir simulation, UrbanSim and traffic simulation. A flight simulator is a device that artificially re-creates aircraft flight, along with the environment in which the aircraft flies, and is used for pilot training. This chapter helps the readers in understanding the applications of simulation in today’s time.

At the end, I would like to thank all those who dedicated their time and efforts for the successful completion of this book. I also wish to convey my gratitude towards my friends and family who supported me at every step.

Editor

1 Introduction to Numerical Analysis

Numerical analysis is a topic in mathematics that concerns itself with the study of algorithms. Numerical analysis has applications in all fields of engineering; recently it has been adopted by life sciences also. This chapter will provide an integrated understanding of numerical analysis.

Numerical analysis is the study of algorithms that use numerical approximation (as opposed to general symbolic manipulations) for the problems of mathematical analysis (as distinguished from discrete mathematics).

Babylonian clay tablet YBC 7289 (c. 1800–1600 BC) with annotations. The approximation of the square root of 2 is given to four sexagesimal figures, which is about six decimal figures: 1 + 24/60 + 51/60² + 10/60³ = 1.41421296...

One of the earliest mathematical writings is a Babylonian tablet from the Yale Babylonian Collection (YBC 7289), which gives a sexagesimal numerical approximation of √2, the length of the diagonal in a unit square. Being able to compute the sides of a triangle (and hence, being able to compute square roots) is extremely important, for instance, in astronomy, carpentry and construction.

Numerical analysis continues this long tradition of practical mathematical calculations. Much like the Babylonian approximation of √2, modern numerical analysis does not seek exact answers, because exact answers are often impossible to obtain in practice. Instead, much of numerical analysis is concerned with obtaining approximate solutions while maintaining reasonable bounds on errors.

Numerical analysis naturally finds applications in all fields of engineering and the physical sciences, but in the 21st century the life sciences and even the arts have also adopted elements of scientific computations. Ordinary differential equations appear in celestial mechanics (planets, stars and galaxies); numerical linear algebra is important for data analysis; stochastic differential equations and Markov chains are essential in simulating living cells for medicine and biology.


Before the advent of modern computers numerical methods often depended on hand interpolation in large printed tables. Since the mid 20th century, computers calculate the required functions instead. These same interpolation formulas nevertheless continue to be used as part of the software algorithms for solving differential equations.

General Introduction

The overall goal of the field of numerical analysis is the design and analysis of techniques to give approximate but accurate solutions to hard problems, the variety of which is suggested by the following:

• Advanced numerical methods are essential in making numerical weather prediction feasible.

• Computing the trajectory of a spacecraft requires the accurate numerical solution of a system of ordinary differential equations.

• Car companies can improve the crash safety of their vehicles by using computer simulations of car crashes. Such simulations essentially consist of solving partial differential equations numerically.

• Hedge funds (private investment funds) use tools from all fields of numerical analysis to attempt to calculate the value of stocks and derivatives more precisely than other market participants.

• Airlines use sophisticated optimization algorithms to decide ticket prices, airplane and crew assignments and fuel needs. Historically, such algorithms were developed within the overlapping field of operations research.

• Insurance companies use numerical programs for actuarial analysis.

History

The field of numerical analysis predates the invention of modern computers by many centuries. Linear interpolation was already in use more than 2000 years ago. Many great mathematicians of the past were preoccupied by numerical analysis, as is obvious from the names of important algorithms like Newton’s method, Lagrange interpolation polynomial, Gaussian elimination, or Euler’s method.

To facilitate computations by hand, large books were produced with formulas and tables of data such as interpolation points and coefficients. Using these tables, often calculated out to 16 decimal places or more for some functions, one could look up values to plug into the formulas given and achieve very good numerical estimates of some functions. The canonical work in the field is the NIST publication edited by Abramowitz and Stegun, a 1000-plus page book of a very large number of commonly used formulas and functions and their values at many points. The function values are no longer very useful when a computer is available, but the large listing of formulas can still be very handy.

The mechanical calculator was also developed as a tool for hand computation. These calculators

evolved into electronic computers in the 1940s, and it was then found that these computers were also useful for administrative purposes. But the invention of the computer also influenced the field of numerical analysis, since now longer and more complicated calculations could be done.

Direct and Iterative Methods

Direct vs. Iterative Methods

Consider the problem of solving

3x³ + 4 = 28

for the unknown quantity x.

Direct method:
3x³ + 4 = 28.
Subtract 4: 3x³ = 24.
Divide by 3: x³ = 8.
Take cube roots: x = 2.

For the iterative method, apply the bisection method to f(x) = 3x³ − 24. The initial values are a = 0, b = 3, f(a) = −24, f(b) = 57.

Iterative method:
a        b        mid      f(mid)
0        3        1.5      −13.875
1.5      3        2.25     10.17...
1.5      2.25     1.875    −4.22...
1.875    2.25     2.0625   2.32...

We conclude from this table that the solution is between 1.875 and 2.0625. The algorithm might return any number in that range with an error less than 0.2.
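The iteration above is easy to express in code. The following Python sketch reproduces the bisection run from the table; the function names and the tolerance of 0.2 are our own choices for illustration.

```python
# Bisection applied to f(x) = 3x^3 - 24 on [0, 3], reproducing the table above.
def f(x):
    return 3 * x**3 - 24

def bisect(f, a, b, tol=0.2):
    """Halve [a, b] until it is shorter than tol; f(a) and f(b) must differ in sign."""
    while b - a > tol:
        mid = (a + b) / 2
        if f(a) * f(mid) <= 0:
            b = mid   # the sign change, and hence the root, lies in the left half
        else:
            a = mid   # the sign change lies in the right half
    return (a + b) / 2

print(bisect(f, 0.0, 3.0))   # 1.96875, within 0.2 of the true root 2
```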

Discretization and Numerical Integration

In a two-hour race, we have measured the speed of the car at three instants and recorded them in the following table.


Time    0:20    1:00    1:40
km/h    140     150     180

A discretization would be to say that the speed of the car was constant from 0:00 to 0:40, then from 0:40 to 1:20 and finally from 1:20 to 2:00. For instance, the total distance traveled in the first 40 minutes is approximately (2/3 h × 140 km/h) = 93.3 km. This would allow us to estimate the total distance traveled as 93.3 km + 100 km + 120 km = 313.3 km, which is an example of numerical integration using a Riemann sum, because displacement is the integral of velocity.
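As a quick check (a minimal sketch of our own, not taken from the text), the same Riemann-sum estimate can be computed directly:

```python
# Speeds measured at 0:20, 1:00 and 1:40; each sample is treated as the constant
# speed over a 40-minute (2/3 h) interval, as in the discretization above.
speeds_kmh = [140, 150, 180]
interval_h = 2 / 3

distance_km = sum(v * interval_h for v in speeds_kmh)
print(round(distance_km, 1))   # 313.3
```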

Ill-conditioned problem: Take the function f(x) = 1/(x − 1). Note that f(1.1) = 10 and f(1.001) = 1000: a change in x of less than 0.1 turns into a change in f(x) of nearly 1000. Evaluating f(x) near x = 1 is an ill-conditioned problem.

Well-conditioned problem: By contrast, evaluating the same function f(x) = 1/(x − 1) near x = 10 is a well-conditioned problem. For instance, f(10) = 1/9 ≈ 0.111 and f(11) = 0.1: a modest change in x leads to a modest change in f(x).

Direct methods compute the solution to a problem in a finite number of steps. These methods would give the precise answer if they were performed in infinite precision arithmetic. Examples include Gaussian elimination, the QR factorization method for solving systems of linear equations, and the simplex method of linear programming. In practice, finite precision is used and the result is an approximation of the true solution (assuming stability).

In contrast to direct methods, iterative methods are not expected to terminate in a finite number of steps. Starting from an initial guess, iterative methods form successive approximations that converge to the exact solution only in the limit. A convergence test, often involving the residual, is specified in order to decide when a sufficiently accurate solution has (hopefully) been found. Even using infinite precision arithmetic these methods would not reach the solution within a finite number of steps (in general). Examples include Newton’s method, the bisection method, and Jacobi iteration. In computational matrix algebra, iterative methods are generally needed for large problems.

Iterative methods are more common than direct methods in numerical analysis. Some methods are direct in principle but are usually used as though they were not, e.g. GMRES and the conjugate gradient method. For these methods the number of steps needed to obtain the exact solution is so large that an approximation is accepted in the same manner as for an iterative method.

Discretization

Furthermore, continuous problems must sometimes be replaced by a discrete problem whose solution is known to approximate that of the continuous problem; this process is called discretization. For example, the solution of a differential equation is a function. This function must be represented by a finite amount of data, for instance by its value at a finite number of points in its domain, even though this domain is a continuum.


Generation and Propagation of Errors

The study of errors forms an important part of numerical analysis. There are several ways in which error can be introduced in the solution of the problem.

Round-off

Round-off errors arise because it is impossible to represent all real numbers exactly on a machine with finite memory (which is what all practical digital computers are).

Truncation and Discretization Error

Truncation errors are committed when an iterative method is terminated or a mathematical procedure is approximated, and the approximate solution differs from the exact solution. Similarly, discretization induces a discretization error because the solution of the discrete problem does not coincide with the solution of the continuous problem. For instance, in the iteration in the sidebar to compute the solution of 3x³ + 4 = 28, after 10 or so iterations, we conclude that the root is roughly 1.99 (for example). We therefore have a truncation error of 0.01.

Once an error is generated, it will generally propagate through the calculation. For instance, we have already noted that the operation + on a calculator (or a computer) is inexact. It follows that a calculation of the type a + b + c + d + e is even more inexact.

What does it mean when we say that the truncation error is created when we approximate a mathematical procedure? We know that to integrate a function exactly requires one to find the sum of infinitely many trapezoids. But numerically one can find the sum of only finitely many trapezoids, and hence the approximation of the mathematical procedure. Similarly, to differentiate a function, the differential element approaches zero, but numerically we can only choose a finite value of the differential element.

Numerical Stability and Well-posed Problems

Numerical stability is an important notion in numerical analysis. An algorithm is called numerically stable if an error, whatever its cause, does not grow to be much larger during the calculation. This happens if the problem is well-conditioned, meaning that the solution changes by only a small amount if the problem data are changed by a small amount. To the contrary, if a problem is ill-conditioned, then any small error in the data will grow to be a large error.

Both the original problem and the algorithm used to solve that problem can be well-conditioned and/or ill-conditioned, and any combination is possible.

So an algorithm that solves a well-conditioned problem may be either numerically stable or numerically unstable. An art of numerical analysis is to find a stable algorithm for solving a well-posed mathematical problem. For instance, computing the square root of 2 (which is roughly 1.41421) is a well-posed problem. Many algorithms solve this problem by starting with an initial approximation x0 to √2, for instance x0 = 1.4, and then computing improved guesses x1, x2, etc. One such method is the famous Babylonian method, which is given by x_{k+1} = x_k/2 + 1/x_k. Another method, which we will call Method X, is given by x_{k+1} = (x_k² − 2)² + x_k. We have calculated a few iterations of each scheme in table form below, with initial guesses x0 = 1.4 and x0 = 1.42.


Babylonian (x0 = 1.4)     Babylonian (x0 = 1.42)      Method X (x0 = 1.4)     Method X (x0 = 1.42)
x1 = 1.4142857...         x1 = 1.41422535...          x1 = 1.4016             x1 = 1.42026896
x2 = 1.414213564...       x2 = 1.41421356242...       x2 = 1.4028614...       x2 = 1.42056...
...                       ...                         ...                     ...
x1000000 = 1.41421...                                                         x27 = 7280.2284...

Observe that the Babylonian method converges quickly regardless of the initial guess, whereas Method X converges extremely slowly with initial guess x0 = 1.4 and diverges for initial guess x0 = 1.42. Hence, the Babylonian method is numerically stable, while Method X is numerically unstable.
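A short Python sketch of the two iterations makes this contrast easy to reproduce; the function names are ours, and thirty iterations are enough to show the divergence of Method X.

```python
# Babylonian rule x_{k+1} = x_k/2 + 1/x_k versus "Method X"
# x_{k+1} = (x_k^2 - 2)^2 + x_k, started from the two initial guesses above.
def babylonian(x):
    return x / 2 + 1 / x

def method_x(x):
    return (x**2 - 2) ** 2 + x

for x0 in (1.4, 1.42):
    xb = xx = x0
    for _ in range(30):
        xb = babylonian(xb)
        xx = method_x(xx)
    # Babylonian settles at 1.414214...; Method X crawls for x0 = 1.4
    # and blows up for x0 = 1.42.
    print(f"x0 = {x0}: Babylonian -> {xb:.6f}, Method X -> {xx:.6g}")
```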

Numerical stability is also affected by the number of significant digits the machine keeps. If we use a machine that keeps only the four most significant decimal digits, a good example of loss of significance is given by the two equivalent functions

f(x) = x(√(x + 1) − √x)   and   g(x) = x / (√(x + 1) + √x).

If we compare the results of

f(500) = 500(√501 − √500) = 500(22.38 − 22.36) = 500(0.02) = 10

and

g(500) = 500 / (√501 + √500) = 500 / (22.38 + 22.36) = 500 / 44.74 = 11.17

by looking at the two results above, we realize that loss of significance (caused here by catastrophic cancellation) has a huge effect on the results, even though both functions are equivalent, as shown below:

f(x) = x(√(x + 1) − √x)
     = x(√(x + 1) − √x) · (√(x + 1) + √x) / (√(x + 1) + √x)
     = x · ((√(x + 1))² − (√x)²) / (√(x + 1) + √x)
     = x · (x + 1 − x) / (√(x + 1) + √x)
     = x / (√(x + 1) + √x)
     = g(x)


The desired value, computed using infinite precision, is 11.174755...

• The example is a modification of one taken from Mathews, Numerical Methods Using MATLAB, 3rd ed.
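The effect is easy to reproduce. The sketch below simulates a machine that keeps four significant decimal digits by explicit rounding; the rounding helper is our own device, not part of the original example.

```python
from math import sqrt

def round4(v):
    """Keep only four significant decimal digits, mimicking the machine above."""
    return float(f"{v:.4g}")

x = 500.0
s1 = round4(sqrt(x + 1))   # 22.38
s0 = round4(sqrt(x))       # 22.36

f_val = x * (s1 - s0)      # 500 * 0.02  ≈ 10      (catastrophic cancellation)
g_val = x / (s1 + s0)      # 500 / 44.74 ≈ 11.176  (close to the true 11.1747...)
print(f_val, g_val)
```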

Areas of Study

The field of numerical analysis includes many sub-disciplines. Some of the major ones are:

Computing Values of Functions

Interpolation: We have observed the temperature to vary from 20 degrees Celsius at 1:00 to 14 degrees at 3:00. A linear interpolation of this data would conclude that it was 17 degrees at 2:00 and 18.5 degrees at 1:30pm.

Extrapolation: If the gross domestic product of a country has been growing an average of 5% per year and was 100 billion dollars last year, we might extrapolate that it will be 105 billion dollars this year.

Regression: In linear regression, given n points, we compute a line that passes as close as possible to those n points.

Optimization: Say you sell lemonade at a lemonade stand, and notice that at $1, you can sell 197 glasses of lemonade per day, and that for each increase of $0.01, you will sell one glass of lemonade less per day. If you could charge $1.485, you would maximize your profit, but due to the constraint of having to charge a whole cent amount, charging $1.48 or $1.49 per glass will both yield the maximum income of $220.52 per day.
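A brute-force check of the lemonade example, written as a small sketch (the demand rule of one fewer glass per extra cent comes from the text; everything else is our own):

```python
def revenue(cents):
    price = cents / 100
    glasses = 197 - (cents - 100)   # one fewer glass for each cent above $1.00
    return price * glasses

best_cents = max(range(100, 198), key=revenue)
print(best_cents, revenue(best_cents))   # 148 (or 149): revenue ≈ 220.52
```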


Differential equation: If you set up 100 fans to blow air from one end of the room to the other and then you drop a feather into the wind, what happens? The feather will follow the air currents, which may be very complex. One approximation is to measure the speed at which the air is blowing near the feather every second, and advance the simulated feather as if it were moving in a straight line at that same speed for one second, before measuring the wind speed again. This is called the Euler method for solving an ordinary differential equation.
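The feather-and-fan setup is too loosely specified to code directly, but the Euler method itself fits in a few lines. The sketch below applies it to the simple model problem y′ = −y, y(0) = 1, chosen only for illustration.

```python
def euler(f, y0, t0, t1, steps):
    """Advance y' = f(t, y) from t0 to t1 using Euler steps of equal size."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)   # move in a straight line at the current slope
        t += h
    return y

print(euler(lambda t, y: -y, 1.0, 0.0, 1.0, 100))   # ≈ 0.3660, vs exp(-1) ≈ 0.3679
```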

One of the simplest problems is the evaluation of a function at a given point. The most straightforward approach, of just plugging in the number in the formula, is sometimes not very efficient. For polynomials, a better approach is using the Horner scheme, since it reduces the necessary number of multiplications and additions. Generally, it is important to estimate and control round-off errors arising from the use of floating point arithmetic.
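For a concrete (invented) example of the Horner scheme, the polynomial p(x) = 2x³ − 6x² + 2x − 1 can be evaluated with three multiplications instead of computing each power separately:

```python
def horner(coeffs, x):
    """Evaluate a polynomial; coeffs run from the highest-degree term to the constant."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

print(horner([2, -6, 2, -1], 3.0))   # 2*27 - 6*9 + 2*3 - 1 = 5
```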

Interpolation, Extrapolation, and Regression

Interpolation solves the following problem: given the value of some unknown function at a number of points, what value does that function have at some other point between the given points?

Extrapolation is very similar to interpolation, except that now we want to find the value of the unknown function at a point which is outside the given points.

Regression is also similar, but it takes into account that the data is imprecise. Given some points, and a measurement of the value of some function at these points (with an error), we want to determine the unknown function. The least-squares method is one popular way to achieve this.
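As a small sketch of least-squares line fitting (the data points below are made up), the slope and intercept of y ≈ ax + b follow from the normal equations:

```python
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
b = (sy - a * sx) / n                           # intercept
print(a, b)   # roughly a ≈ 0.99, b ≈ 1.04 for these data
```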

Solving Equations and Systems of Equations

Another fundamental problem is computing the solution of some given equation. Two cases are commonly distinguished, depending on whether the equation is linear or not. For instance, the equation 2x + 5 = 3 is linear while 2x² + 5 = 3 is not.

Much effort has been put in the development of methods for solving systems of linear equations. Standard direct methods, i.e., methods that use some matrix decomposition, are Gaussian elimination, LU decomposition, Cholesky decomposition for symmetric (or Hermitian) and positive-definite matrices, and QR decomposition for non-square matrices. Iterative methods such as the Jacobi method, Gauss–Seidel method, successive over-relaxation and conjugate gradient method are


usually preferred for large systems. General iterative methods can be developed using a matrix splitting.
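As an illustration of such an iterative method, the sketch below runs Jacobi iteration on a small diagonally dominant system of our own choosing (diagonal dominance guarantees convergence here).

```python
A = [[4.0, 1.0, 1.0],
     [1.0, 5.0, 2.0],
     [1.0, 2.0, 6.0]]
b = [6.0, 8.0, 9.0]

x = [0.0, 0.0, 0.0]
for _ in range(50):
    # Each new component uses only the previous iterate (Jacobi, not Gauss-Seidel).
    x = [(b[i] - sum(A[i][j] * x[j] for j in range(3) if j != i)) / A[i][i]
         for i in range(3)]
print(x)   # converges to [1.0, 1.0, 1.0], the exact solution of A x = b
```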

Root-finding algorithms are used to solve nonlinear equations (they are so named since a root of a function is an argument for which the function yields zero). If the function is differentiable and the derivative is known, then Newton’s method is a popular choice. Linearization is another technique for solving nonlinear equations.
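Newton’s method is equally compact. The sketch below applies it to the earlier example 3x³ + 4 = 28, i.e. f(x) = 3x³ − 24 with derivative f′(x) = 9x²; the starting point and tolerance are arbitrary choices of ours.

```python
def newton(f, fprime, x, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step                # x_{k+1} = x_k - f(x_k)/f'(x_k)
        if abs(step) < tol:
            break
    return x

root = newton(lambda x: 3 * x**3 - 24, lambda x: 9 * x**2, x=3.0)
print(root)   # ≈ 2.0
```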

Solving Eigenvalue or Singular Value Problems

Several important problems can be phrased in terms of eigenvalue decompositions or singular value decompositions. For instance, the spectral image compression algorithm is based on the singular value decomposition. The corresponding tool in statistics is called principal component analysis.

Optimization

Optimization problems ask for the point at which a given function is maximized (or minimized). Often, the point also has to satisfy some constraints.

The field of optimization is further split in several subfields, depending on the form of the objective function and the constraint. For instance, linear programming deals with the case that both the objective function and the constraints are linear. A famous method in linear programming is the simplex method.

The method of Lagrange multipliers can be used to reduce optimization problems with constraints to unconstrained optimization problems.

Evaluating Integrals

Numerical integration, in some instances also known as numerical quadrature, asks for the value of a definite integral. Popular methods use one of the Newton–Cotes formulas (like the midpoint rule or Simpson’s rule) or Gaussian quadrature. These methods rely on a “divide and conquer” strategy, whereby an integral on a relatively large set is broken down into integrals on smaller sets. In higher dimensions, where these methods become prohibitively expensive in terms of computational effort, one may use Monte Carlo or quasi-Monte Carlo methods, or, in modestly large dimensions, the method of sparse grids.
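A minimal sketch of one of the simplest such rules, the composite midpoint rule, applied to ∫₀¹ x² dx = 1/3 as a check (the example integrand is ours):

```python
def midpoint_rule(f, a, b, n):
    """Split [a, b] into n panels and sample f at each panel's midpoint."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

print(midpoint_rule(lambda x: x * x, 0.0, 1.0, 1000))   # ≈ 0.333333, exact is 1/3
```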

Differential Equations

Numerical analysis is also concerned with computing (in an approximate way) the solution of differential equations, both ordinary differential equations and partial differential equations.

Partial differential equations are solved by first discretizing the equation, bringing it into a finite-dimensional subspace. This can be done by a finite element method, a finite difference method, or (particularly in engineering) a finite volume method. The theoretical justification of these methods often involves theorems from functional analysis. This reduces the problem to the solution of an algebraic equation.
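As a small sketch of a finite difference discretization, consider the one-dimensional boundary value problem −u″(x) = 1 on (0, 1) with u(0) = u(1) = 0, whose exact solution is u(x) = x(1 − x)/2; the problem and the Gauss–Seidel sweeps used to solve the resulting tridiagonal system are our own choices.

```python
n = 9                     # interior grid points
h = 1.0 / (n + 1)
u = [0.0] * (n + 2)       # u[0] and u[n+1] hold the boundary values

# Discrete equation: (2*u[i] - u[i-1] - u[i+1]) / h^2 = 1, swept repeatedly.
for _ in range(2000):
    for i in range(1, n + 1):
        u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h)

print(u[5], 0.5 * 0.5 * (1 - 0.5))   # value at x = 0.5 vs the exact 0.125
```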


Software

Since the late twentieth century, most algorithms are implemented in a variety of programming languages. The Netlib repository contains various collections of software routines for numerical problems, mostly in Fortran and C. Commercial products implementing many different numerical algorithms include the IMSL and NAG libraries; a free alternative is the GNU Scientific Library.

There are several popular numerical computing applications such as MATLAB, TK Solver, S-PLUS, LabVIEW, and IDL as well as free and open source alternatives such as FreeMat, Scilab, GNU Octave (similar to Matlab), IT++ (a C++ library), R (similar to S-PLUS) and certain variants of Python. Performance varies widely: while vector and matrix operations are usually fast, scalar loops may vary in speed by more than an order of magnitude.

Many computer algebra systems such as Mathematica also benefit from the availability of arbitrary precision arithmetic which can provide more accurate results.

Also, any spreadsheet software can be used to solve simple problems relating to numerical analysis.


2 Understanding Simulation

The process of re-creating the operations of the processes that occur in the real world over time is referred to as simulation. Simulation can only take place when the model it is imitating has already been fully developed. This section on simulation offers an insightful focus, keeping in mind the subject matter.

Simulation

Simulation is the imitation of the operation of a real-world process or system over time. The act of simulating something first requires that a model be developed; this model represents the characteristics or behaviors/functions of the selected physical or abstract system or process. The model represents the system itself, whereas the simulation represents the operation of the system over time.

3D simulation of a Grain Terminal Model.

Simulation is used in many contexts, such as simulation of technology for performance optimization, safety engineering, testing, training, education, and video games. Often, computer experiments are used to study simulation models. Simulation is also used with scientific modelling of natural systems or human systems to gain insight into their functioning. Simulation can be used to show the eventual real effects of alternative conditions and courses of action. Simulation is also used when the real system cannot be engaged, because it may not be accessible, or it may be dangerous or unacceptable to engage, or it is being designed but not yet built, or it may simply not exist.

Key issues in simulation include acquisition of valid source information about the relevant selection of key characteristics and behaviours, the use of simplifying approximations and assumptions within the simulation, and fidelity and validity of the simulation outcomes. Procedures and protocols for model verification and validation are an ongoing field of academic study, refinement, research and development in simulations technology or practice, particularly in the field of computer simulation.


Classification and Terminology

Historically, simulations used in different fields developed largely independently, but 20th century studies of systems theory and cybernetics combined with spreading use of computers across all those fields have led to some unification and a more systematic view of the concept.

Human-in-the-loop simulation of outer space

Physical simulation refers to simulation in which physical objects are substituted for the real thing (some circles use the term for computer simulations modelling selected laws of physics). These physical objects are often chosen because they are smaller or cheaper than the actual object or system.

Visualization of a direct numerical simulation model.

Interactive simulation is a special kind of physical simulation, often referred to as a human in the loop simulation, in which physical simulations include human operators, such as in a flight simulator or a driving simulator.

Simulation Fidelity is used to describe the accuracy of a simulation and how closely it imitates the real-life counterpart. Fidelity is broadly classified as 1 of 3 categories: low, medium, and high. Specific descriptions of fidelity levels are subject to interpretation but the following generalization can be made:


• Low - the minimum simulation required for a system to accept inputs and provide outputs

• Medium - responds automatically to stimuli, with limited accuracy

• High - nearly indistinguishable or as close as possible to the real system

Human in the loop simulations can include a computer simulation as a so-called synthetic environment.

Simulation in failure analysis refers to simulation in which we create an environment or conditions to identify the cause of equipment failure. This is often the best and fastest method of identifying the failure cause.

Computer Simulation

A computer simulation (or “sim”) is an attempt to model a real-life or hypothetical situation on a computer so that it can be studied to see how the system works. By changing variables in the simulation, predictions may be made about the behaviour of the system. It is a tool to virtually investigate the behaviour of the system under study.

Computer simulation has become a useful part of modeling many natural systems in physics, chemistry and biology, and human systems in economics and social science (e.g., computational sociology) as well as in engineering to gain insight into the operation of those systems. A good example of the usefulness of using computers to simulate can be found in the field of network traffic simulation. In such simulations, the model behaviour will change each simulation according to the set of initial parameters assumed for the environment.

Traditionally, the formal modeling of systems has been via a mathematical model, which attempts to find analytical solutions enabling the prediction of the behaviour of the system from a set of parameters and initial conditions. Computer simulation is often used as an adjunct to, or substitution for, modeling systems for which simple closed form analytic solutions are not possible. There are many different types of computer simulation; the common feature they all share is the attempt to generate a sample of representative scenarios for a model in which a complete enumeration of all possible states would be prohibitive or impossible.

Several software packages exist for running computer-based simulation modeling (e.g. Monte Carlo simulation, stochastic modeling, multimethod modeling) that make all the modeling almost effortless.

Modern usage of the term “computer simulation” may encompass virtually any computer-based representation.

Computer Science

In computer science, simulation has some specialized meanings: Alan Turing used the term “simulation” to refer to what happens when a universal machine executes a state transition table (in modern terminology, a computer runs a program) that describes the state transitions, inputs and


outputs of a subject discrete-state machine. The computer simulates the subject machine. Accordingly, in theoretical computer science the term simulation is a relation between state transition systems, useful in the study of operational semantics.

Less theoretically, an interesting application of computer simulation is to simulate computers using computers. In computer architecture, a type of simulator, typically called an emulator, is often used to execute a program that has to run on some inconvenient type of computer (for example, a newly designed computer that has not yet been built or an obsolete computer that is no longer available), or in a tightly controlled testing environment. For example, simulators have been used to debug a microprogram or sometimes commercial application programs, before the program is downloaded to the target machine. Since the operation of the computer is simulated, all of the information about the computer’s operation is directly available to the programmer, and the speed and execution of the simulation can be varied at will.

Simulators may also be used to interpret fault trees, or test VLSI logic designs before they are constructed. Symbolic simulation uses variables to stand for unknown values.

In the field of optimization, simulations of physical processes are often used in conjunction with evolutionary computation to optimize control strategies.

Simulation in Education and Training

Simulation is extensively used for educational purposes. It is frequently used by way of adaptive hypermedia.

Simulation is often used in the training of civilian and military personnel. This usually occurs when it is prohibitively expensive or simply too dangerous to allow trainees to use the real equipment in the real world. In such situations they will spend time learning valuable lessons in a “safe” virtual environment yet living a lifelike experience (or at least it is the goal). Often the convenience is to permit mistakes during training for a safety-critical system. There is a distinction, though, between simulations used for training and Instructional simulation.

Training simulations typically come in one of three categories:

• “live” simulation (where actual players use genuine systems in a real environment);

• “virtual” simulation (where actual players use simulated systems in a synthetic environment), or

• “constructive” simulation (where simulated players use simulated systems in a synthetic environment). Constructive simulation is often referred to as “wargaming” since it bears some resemblance to table-top war games in which players command armies of soldiers and equipment that move around a board.

In standardized tests, “live” simulations are sometimes called “high-fidelity”, producing “samples of likely performance”, as opposed to “low-fidelity”, “pencil-and-paper” simulations producing only “signs of possible performance”, but the distinction between high, moderate and low fidelity remains relative, depending on the context of a particular comparison.


Simulations in education are somewhat like training simulations. They focus on specific tasks. The term ‘microworld’ is used to refer to educational simulations which model some abstract concept rather than simulating a realistic object or environment, or in some cases model a real world environment in a simplistic way so as to help a learner develop an understanding of the key concepts. Normally, a user can create some sort of construction within the microworld that will behave in a way consistent with the concepts being modeled. Seymour Papert was one of the first to advocate the value of microworlds, and the Logo programming environment developed by Papert is one of the most famous microworlds. As another example, the Global Challenge Award online STEM learning web site uses microworld simulations to teach science concepts related to global warming and the future of energy. Other projects for simulations in education are Open Source Physics, NetSim etc.

Project Management Simulation is increasingly used to train students and professionals in the art and science of project management. Using simulation for project management training improves learning retention and enhances the learning process.

Social simulations may be used in social science classrooms to illustrate social and political processes in anthropology, economics, history, political science, or sociology courses, typically at the high school or university level. These may, for example, take the form of civics simulations, in which participants assume roles in a simulated society, or international relations simulations in which participants engage in negotiations, alliance formation, trade, diplomacy, and the use of force. Such simulations might be based on fictitious political systems, or be based on current or historical events. An example of the latter would be Barnard College’s Reacting to the Past series of historical educational games. The National Science Foundation has also supported the creation of reacting games that address science and math education.

In recent years, there has been increasing use of social simulations for staff training in aid and development agencies. The Carana simulation, for example, was first developed by the United Nations Development Programme, and is now used in a very revised form by the World Bank for training staff to deal with fragile and conflict-affected countries.

Common User Interaction Systems for Virtual Simulations

Virtual simulations represent a specific category of simulation that utilizes simulation equipment to create a simulated world for the user. Virtual simulations allow users to interact with a virtual world. Virtual worlds operate on platforms of integrated software and hardware components. In this manner, the system can accept input from the user (e.g., body tracking, voice/sound recognition, physical controllers) and produce output to the user (e.g., visual display, aural display, haptic display). Virtual simulations use the aforementioned modes of interaction to produce a sense of immersion for the user.

Virtual Simulation Input Hardware

There is a wide variety of input hardware available to accept user input for virtual simulations. The following list briefly describes several of them:

Body tracking: The motion capture method is often used to record the user’s movements and translate the captured data into inputs for the virtual simulation. For example, if a user physically turns


their head, the motion would be captured by the simulation hardware in some way and translated to a corresponding shift in view within the simulation.

Motorcycle simulator at the Bienal do Automóvel exhibition, in Belo Horizonte, Brazil.

• Capture suits and/or gloves may be used to capture movements of the user’s body parts. The systems may have sensors incorporated inside them to sense movements of different body parts (e.g., fingers). Alternatively, these systems may have exterior tracking devices or marks that can be detected by external ultrasound, optical receivers or electromagnetic sensors. Internal inertial sensors are also available on some systems. The units may transmit data either wirelessly or through cables.

• Eye trackers can also be used to detect eye movements so that the system can determine precisely where a user is looking at any given instant.

Physical controllers: Physical controllers provide input to the simulation only through direct manipulation by the user. In virtual simulations, tactile feedback from physical controllers is highly desirable in a number of simulation environments.

• Omni directional treadmills can be used to capture the user’s locomotion as they walk or run.

• High fidelity instrumentation such as instrument panels in virtual aircraft cockpits provides users with actual controls to raise the level of immersion. For example, pilots can use the actual global positioning system controls from the real device in a simulated cockpit to help them practice procedures with the actual device in the context of the integrated cockpit system.

Voice/sound recognition: This form of interaction may be used either to interact with agents within the simulation (e.g., virtual people) or to manipulate objects in the simulation (e.g., information). Voice interaction presumably increases the level of immersion for the user.

• Users may use headsets with boom microphones, lapel microphones or the room may be equipped with strategically located microphones.


Current Research into User Input Systems

Research in future input systems holds a great deal of promise for virtual simulations. Systems such as brain–computer interfaces (BCIs) offer the ability to further increase the level of immersion for virtual simulation users. Lee, Keinrath, Scherer, Bischof, and Pfurtscheller proved that naïve subjects could be trained to use a BCI to navigate a virtual apartment with relative ease. Using the BCI, the authors found that subjects were able to freely navigate the virtual environment with relatively minimal effort. It is possible that these types of systems will become standard input modalities in future virtual simulation systems.

Virtual Simulation Output Hardware

There is a wide variety of output hardware available to deliver stimulus to users in virtual simulations. The following list briefly describes several of them:

Visual display: visual displays provide the visual stimulus to the user.

• Stationary displays can vary from a conventional desktop display to 360-degree wrap around screens to stereo three-dimensional screens. Conventional desktop displays can vary in size from 15 to 60+ inches. Wrap around screens are typically utilized in what is known as a cave automatic virtual environment (CAVE). Stereo three-dimensional screens produce three-dimensional images either with or without special glasses—depending on the design.

• Head-mounted displays (HMDs) have small displays that are mounted on headgear worn by the user. These systems are connected directly into the virtual simulation to provide the user with a more immersive experience. Weight, update rates and field of view are some of the key variables that differentiate HMDs. Naturally, heavier HMDs are undesirable as they cause fatigue over time. If the update rate is too slow, the system is unable to update the displays fast enough to correspond with a quick head turn by the user. Slower update rates tend to cause simulation sickness and disrupt the sense of immersion. Field of view, the angular extent of the world that is seen at a given moment, can vary from system to system and has been found to affect the user’s sense of immersion.

Aural display: Several different types of audio systems exist to help the user hear and localize sounds spatially. Special software can be used to produce 3D audio effects, creating the illusion that sound sources are placed within a defined three-dimensional space around the user.

• Stationary conventional speaker systems may be used to provide dual or multi-channel surround sound. However, external speakers are not as effective as headphones in producing 3D audio effects.

• Conventional headphones offer a portable alternative to stationary speakers. They also have the added advantages of masking real world noise and facilitate more effective 3D audio sound effects.

Haptic display: These displays provide sense of touch to the user (haptic technology). This type of output is sometimes referred to as force feedback.

• Tactile tile displays use different types of actuators such as inflatable bladders, vibrators,


low frequency sub-woofers, pin actuators and/or thermo-actuators to produce sensations for the user.

• End effector displays can respond to users’ inputs with resistance and force. These systems are often used in medical applications for remote surgeries that employ robotic instruments.

Vestibular display: These displays provide a sense of motion to the user (motion simulator). They often manifest as motion bases for virtual vehicle simulation such as driving simulators or flight simulators. Motion bases are fixed in place but use actuators to move the simulator in ways that can produce the sensations of pitching, yawing or rolling. The simulators can also move in such a way as to produce a sense of acceleration on all axes (e.g., the motion base can produce the sensation of falling).

Clinical Healthcare Simulators

Medical simulators are increasingly being developed and deployed to teach therapeutic and diagnostic procedures as well as medical concepts and decision making to personnel in the health professions. Simulators have been developed for training procedures ranging from the basics such as blood draw, to laparoscopic surgery and trauma care. They are also important for prototyping new devices for biomedical engineering problems. Currently, simulators are applied to research and develop tools for new therapies, treatments and early diagnosis in medicine.

Many medical simulators involve a computer connected to a plastic simulation of the relevant anatomy. Sophisticated simulators of this type employ a life size mannequin that responds to injected drugs and can be programmed to create simulations of life-threatening emergencies. In other simulations, visual components of the procedure are reproduced by computer graphics techniques, while touch-based components are reproduced by haptic feedback devices combined with physical simulation routines computed in response to the user’s actions. Medical simulations of this sort will often use 3D CT or MRI scans of patient data to enhance realism. Some medical simulations are developed to be widely distributed (such as web-enabled simulations and procedural simulations that can be viewed via standard web browsers) and can be interacted with using standard computer interfaces, such as the keyboard and mouse.

Another important medical application of a simulator—although, perhaps, denoting a slightly different meaning of simulator—is the use of a placebo drug, a formulation that simulates the active drug in trials of drug efficacy.

Improving Patient Safety

Patient safety is a concern in the medical industry. Patients have been known to suffer injuries and even death due to management error, and lack of using best standards of care and training. According to Building a National Agenda for Simulation-Based Medical Education (Eder-Van Hook, Jackie, 2004), “A health care provider’s ability to react prudently in an unexpected situation is one of the most critical factors in creating a positive outcome in medical emergency, regardless of whether it occurs on the battlefield, freeway, or hospital emergency room.” Eder-Van


Hook (2004) also noted that medical errors kill up to 98,000 people per year, with an estimated cost of between $37 and $50 million, and $17 to $29 billion for preventable adverse events. “Deaths due to preventable adverse events exceed deaths attributable to motor vehicle accidents, breast cancer, or AIDS” (Eder-Van Hook, 2004).

Innovative simulation training solutions are now being used to train medical professionals in an attempt to reduce the number of safety concerns that have adverse effects on the patients. However, according to the article “Does Simulation Improve Patient Safety? Self-efficacy, Competence, Operational Performance, and Patient Safety” (Nishisaki A., Keren R., and Nadkarni, V., 2007), the jury is still out. Nishisaki states that “There is good evidence that simulation training improves provider and team self-efficacy and competence on manikins. There is also good evidence that procedural simulation improves actual operational performance in clinical settings.” However, no evidence yet shows that crew resource management training through simulation, despite its promise, improves team operational performance at the bedside. Although evidence that simulation-based training actually improves patient outcome has been slow to accrue, today the ability of simulation to provide hands-on experience that translates to the operating room is no longer in doubt.

One such attempt to improve patient safety through the use of simulations training is pediatric care to deliver just-in-time service or/and just-in-place. This training consists of 20 minutes of simulated training just before workers report to shift. It is hoped that the recentness of the training will increase the positive and reduce the negative results that have generally been associated with the procedure. The purpose of this study is to determine if just-in-time training improves patient safety and operational performance of orotracheal intubation and decreases occurrences of undesired associated events and “to test the hypothesis that high fidelity simulation may enhance the training efficacy and patient safety in simulation settings.” The conclusion, as reported in “Abstract P38: Just-In-Time Simulation Training Improves ICU Physician Trainee Airway Resuscitation Participation without Compromising Procedural Success or Safety” (Nishisaki A., 2008), was that simulation training improved resident participation in real cases but did not sacrifice the quality of service. It could therefore be hypothesized that by increasing the number of highly trained residents through the use of simulation training, simulation training does in fact increase patient safety. This hypothesis would have to be researched for validation and the results may or may not generalize to other situations.

History of Simulation in Healthcare

The first medical simulators were simple models of human patients. Since antiquity, these representations in clay and stone were used to demonstrate clinical features of disease states and their effects on humans. Models have been found from many cultures and continents. These models have been used in some cultures (e.g., Chinese culture) as a “diagnostic” instrument, allowing women to consult male physicians while maintaining social laws of modesty. Models are used today to help students learn the anatomy of the musculoskeletal system and organ systems.

In 2002, the Society for Simulation in Healthcare (SSH) was formed to become a leader in international, interprofessional advances in the application of medical simulation in healthcare.


The need for a “uniform mechanism to educate, evaluate, and certify simulation instructors for the health care profession” was recognized by McGaghie et al. in their critical review of simulation-based medical education research. In 2012 the SSH piloted two new certifications to provide recognition to educators in an effort to meet this need.

Types of Models

Active Models

Active models that attempt to reproduce living anatomy or physiology are recent developments. The famous “Harvey” mannequin was developed at the University of Miami and is able to recreate many of the physical findings of the cardiology examination, including palpation, auscultation, and electrocardiography.

Interactive Models

More recently, interactive models have been developed that respond to actions taken by a student or physician. Until recently, these simulations were two dimensional computer programs that acted more like a textbook than a patient. Computer simulations have the advantage of allowing a student to make judgments, and also to make errors. The process of iterative learning through assessment, evaluation, decision making, and error correction creates a much stronger learning environment than passive instruction.

Computer Simulators

Simulators have been proposed as an ideal tool for assessment of students for clinical skills. For patients, “cybertherapy” can be used for sessions simulating traumatic experiences, from fear of heights to social anxiety.

3DiTeams learner is percussing the patient’s chest in a virtual field hospital.

Programmed patients and simulated clinical situations, including mock disaster drills, have been used extensively for education and evaluation. These “lifelike” simulations are expensive, and lack reproducibility. A fully functional “3Di” simulator would be the most specific tool available for teaching and measurement of clinical skills. Gaming platforms have been applied to create these virtual medical environments to create an interactive method for learning and application of information in a clinical context.


Immersive disease state simulations allow a doctor or HCP to experience what a disease actually feels like. Using sensors and transducers, symptomatic effects can be delivered to a participant, allowing them to experience the patient’s disease state.

Such a simulator meets the goals of an objective and standardized examination for clinical com- petence. This system is superior to examinations that use “standard patients” because it permits the quantitative measurement of competence, as well as reproducing the same objective findings.

Simulation in Entertainment

Simulation in entertainment encompasses many large and popular industries such as film, televi- sion, video games (including serious games) and rides in theme parks. Although modern simula- tion is thought to have its roots in training and the military, in the 20th century it also became a conduit for enterprises which were more hedonistic in nature.

History of Visual Simulation in Film and Games

Early History (1940s and 1950s)

The first simulation game may have been created as early as 1947 by Thomas T. Goldsmith Jr. and Estle Ray Mann. This was a straightforward game that simulated a missile being fired at a target; the curve of the missile and its speed could be adjusted using several knobs. In 1958, a computer game called "Tennis for Two" was created by William Higinbotham; it simulated a tennis game between two players who could both play at the same time using hand controls and was displayed on an oscilloscope. This was one of the first electronic video games to use a graphical display.

1970s and Early 1980s

Computer-generated imagery was used in film to simulate objects as early as 1972 in A Computer Animated Hand, parts of which were shown on the big screen in the 1976 film Futureworld. Many will remember the "targeting computer" that young Skywalker turns off in the 1977 film Star Wars.

The film Tron (1982) was the first film to use computer-generated imagery for more than a couple of minutes.

Advances in technology in the 1980s caused 3D simulation to become more widely used and it began to appear in movies and in computer-based games such as Atari’s Battlezone (1980) and Acornsoft’s Elite (1984), one of the first wire-frame 3D graphics games for home computers.

Pre-virtual Cinematography Era (Early 1980s to 1990s)

Advances in technology in the 1980s made computers more affordable and more capable than they had been in previous decades, which facilitated the rise of computer and console gaming (later exemplified by platforms such as the Xbox). The first video game consoles released in the 1970s and early 1980s fell prey to the industry crash in 1983, but in 1985 Nintendo released the Nintendo Entertainment System (NES), which became one of the best-selling consoles in video game history. In the 1990s, computer games became widely popular with the release of such games as The Sims and Command & Conquer and the still increasing power of desktop computers. Today, computer simulation games such as World of Warcraft are played by millions of people around the world.

In 1993, the film Jurassic Park became the first popular film to use computer-generated graphics extensively, integrating the simulated dinosaurs almost seamlessly into live action scenes.

This event transformed the film industry; in 1995, the film Toy Story was the first film to use only computer-generated images and by the new millennium computer generated graphics were the leading choice for special effects in films.

Virtual Cinematography (Early 2000s–Present)

The advent of virtual cinematography in the early 2000s has led to an explosion of movies that would have been impossible to shoot without it. Classic examples are the digital look-alikes of Neo, Smith and other characters in the Matrix sequels and the extensive use of physically impossible camera runs in The Lord of the Rings film trilogy.

The terminal in the TV series Pan Am no longer existed during the filming of the series (aired 2011-2012), which was no problem, as it was recreated through virtual cinematography, using automated viewpoint finding and matching in conjunction with the compositing of real and simulated footage; this has been the bread and butter of movie artists in and around film studios since the early 2000s.

Computer-generated imagery is "the application of the field of 3D computer graphics to special effects". This technology is used for visual effects because it is high in quality, controllable, and can create effects that would not be feasible using any other technology, whether because of cost, resources or safety. Computer-generated graphics can be seen in many live action movies today, especially those of the action genre. Further, computer-generated imagery has almost completely supplanted hand-drawn animation in children's movies, which are increasingly computer-generated only. Examples of movies that use computer-generated imagery include Finding Nemo, 300 and Iron Man.

Examples of Non-film Entertainment Simulation

Simulation Games

Simulation games, as opposed to other genres of video and computer games, represent or simulate an environment accurately. Moreover, they represent the interactions between the playable characters and the environment realistically. These kinds of games are usually more complex in terms of game play. Simulation games have become incredibly popular among people of all ages. Popular simulation games include SimCity and Tiger Woods PGA Tour. There are also flight simulator and driving simulator games.

Theme Park Rides

Simulators have been used for entertainment since the Link Trainer in the 1930s. The first modern simulator ride to open at a theme park was Disney's Star Tours in 1987, soon followed by Universal's The Funtastic World of Hanna-Barbera in 1990, which was the first ride to be done entirely with computer graphics.


Simulator rides are the progeny of military training simulators and commercial simulators, but they are different in a fundamental way. While military training simulators react realistically to the input of the trainee in real time, ride simulators only feel like they move realistically and move according to prerecorded motion scripts. One of the first simulator rides, Star Tours, which cost $32 million, used a hydraulic motion-based cabin whose movement was programmed with a joystick. Today's simulator rides, such as The Amazing Adventures of Spider-Man, include elements that increase the amount of immersion experienced by the riders, such as 3D imagery, physical effects (spraying water or producing scents), and movement through an environment. Examples of simulation rides include Mission: Space and The Simpsons Ride; theme parks such as Disney and Universal operate many others, including Flintstones, Earthquake, Time Machine and King Kong rides.

Simulation and Manufacturing

Manufacturing represents one of the most important applications of simulation. This technique represents a valuable tool used by engineers when evaluating the effect of capital investment in equipment and physical facilities like factory plants, warehouses, and distribution centers. Sim- ulation can be used to predict the performance of an existing or planned system and to compare alternative solutions for a particular design problem.

Another important goal of simulation in manufacturing systems is to quantify system performance. Common measures of system performance include the following (a rough sketch of how such measures can be read off a simulation run follows this list):
• Throughput under average and peak loads;
• System cycle time (how long it takes to produce one part);
• Utilization of resources, labor, and machines;
• Bottlenecks and choke points;
• Queuing at work locations;
• Queuing and delays caused by material-handling devices and systems;
• WIP (work-in-process) storage needs;
• Staffing requirements;
• Effectiveness of scheduling systems;
• Effectiveness of control systems.
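As a rough, hedged illustration (not drawn from this text), the following Python sketch simulates a single hypothetical workstation with randomly generated arrival and processing times and reports throughput, utilization, average cycle time and average queue length; all numerical parameters are invented for the example.

```python
import random

def simulate_workstation(sim_time=10_000.0, mean_interarrival=6.0, mean_service=5.0, seed=1):
    """Single hypothetical workstation with exponential inter-arrival and service times."""
    random.seed(seed)
    t = 0.0                                   # simulation clock (minutes)
    next_arrival = random.expovariate(1.0 / mean_interarrival)
    next_departure = float("inf")
    queue = []                                # arrival times of jobs waiting for the machine
    in_service_arrival = None                 # arrival time of the job currently on the machine
    busy_time = completed = total_cycle_time = queue_time_area = 0.0

    while t < sim_time:
        t_next = min(next_arrival, next_departure)
        queue_time_area += len(queue) * (t_next - t)      # integrate queue length over time
        t = t_next
        if t == next_arrival:                             # a new job arrives
            queue.append(t)
            next_arrival = t + random.expovariate(1.0 / mean_interarrival)
        else:                                             # the job in service finishes
            completed += 1
            total_cycle_time += t - in_service_arrival
            in_service_arrival = None
            next_departure = float("inf")
        if in_service_arrival is None and queue:          # start the next job if the machine is idle
            in_service_arrival = queue.pop(0)
            service = random.expovariate(1.0 / mean_service)
            busy_time += service
            next_departure = t + service

    return {
        "throughput (parts/hour)": 60.0 * completed / sim_time,
        "machine utilization": busy_time / sim_time,
        "average cycle time (min)": total_cycle_time / completed,
        "average queue length (WIP)": queue_time_area / sim_time,
    }

for measure, value in simulate_workstation().items():
    print(f"{measure}: {value:.2f}")
```

Rerunning the sketch with a shorter mean inter-arrival time shows utilization and queuing rising toward a bottleneck, which is exactly the kind of question the measures above are meant to answer.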

More Examples of Simulation

Automobiles

An automobile simulator provides an opportunity to reproduce the characteristics of real vehicles in a virtual environment. It replicates the external factors and conditions with which a vehicle in- teracts enabling a driver to feel as if they are sitting in the cab of their own vehicle. Scenarios and events are replicated with sufficient reality to ensure that drivers become fully immersed in the experience rather than simply viewing it as an educational experience.


Car racing simulator

The simulator provides a constructive experience for the novice driver and enables more complex exercises to be undertaken by the more mature driver. For novice drivers, truck simulators pro- vide an opportunity to begin their career by applying best practice. For mature drivers, simulation provides the ability to enhance good driving or to detect poor practice and to suggest the necessary steps for remedial action. For companies, it provides an opportunity to educate staff in the driving skills that achieve reduced maintenance costs, improved productivity and, most importantly, to ensure the safety of their actions in all possible situations.

A soldier tests out a heavy-wheeled-vehicle driver simulator.

Biomechanics

One open-source simulation platform creates dynamic mechanical models built from combinations of rigid and deformable bodies, joints, constraints, and various force actuators. It is specialized for creating biomechanical models of human anatomical structures, with the intention to study their function and eventually assist in the design and planning of medical treatment.

A biomechanics simulator is used to analyze walking dynamics, study sports performance, simulate surgical procedures, analyze joint loads, design medical devices, and animate human and animal movement.

A neuromechanical simulator combines biomechanical and biologically realistic neural network simulation, allowing the user to test hypotheses on the neural basis of behavior in a physically accurate 3-D virtual environment.


City and Urban

A city simulator can be a city-building game but can also be a tool used by urban planners to under- stand how cities are likely to evolve in response to various policy decisions. AnyLogic is an example of modern, large-scale urban simulators designed for use by urban planners. City simulators are generally agent-based simulations with explicit representations for land use and transportation. UrbanSim and LEAM are examples of large-scale urban simulation models that are used by met- ropolitan planning agencies and military bases for land use and transportation planning.

Classroom of The Future

The “classroom of the future” will probably contain several kinds of simulators, in addition to tex- tual and visual learning tools. This will allow students to enter the clinical years better prepared, and with a higher skill level. The advanced student or postgraduate will have a more concise and comprehensive method of retraining—or of incorporating new clinical procedures into their skill set—and regulatory bodies and medical institutions will find it easier to assess the proficiency and competency of individuals.

The classroom of the future will also form the basis of a clinical skills unit for continuing education of medical personnel; and in the same way that the use of periodic flight training assists airline pilots, this technology will assist practitioners throughout their career.

The simulator will be more than a "living" textbook; it will become an integral part of the practice of medicine. The simulator environment will also provide a standard platform for curriculum development in institutions of medical education.

Communication Satellites

Modern satellite communications systems (SatCom) are often large and complex with many interact- ing parts and elements. In addition, the need for broadband connectivity on a moving vehicle has in- creased dramatically in the past few years for both commercial and military applications. To accurately predict and deliver high quality of service, satcom system designers have to factor in terrain as well as atmospheric and meteorological conditions in their planning. To deal with such complexity, system designers and operators increasingly turn towards computer models of their systems to simulate real world operational conditions and gain insights into usability and requirements prior to final product sign-off. Modeling improves the understanding of the system by enabling the SatCom system designer or planner to simulate real world performance by injecting the models with multiple hypothetical at- mospheric and environmental conditions. Simulation is often used in the training of civilian and mili- tary personnel. This usually occurs when it is prohibitively expensive or simply too dangerous to allow trainees to use the real equipment in the real world. In such situations they will spend time learning valuable lessons in a “safe” virtual environment yet living a lifelike experience (or at least it is the goal). Often the convenience is to permit mistakes during training for a safety-critical system.

Digital Lifecycle

Simulation solutions are being increasingly integrated with CAx (CAD, CAM, CAE, etc.) solutions and processes. The use of simulation throughout the product lifecycle, especially at the earlier concept and design stages, has the potential to provide substantial benefits. These benefits range from direct cost issues, such as reduced prototyping and shorter time-to-market, to better performing products and higher margins. However, for some companies, simulation has not provided the expected benefits.

Simulation of airflow over an engine

The research firm Aberdeen Group has found that nearly all best-in-class manufacturers use simulation early in the design process, as compared to three or four laggards who do not.

The successful use of simulation, early in the lifecycle, has been largely driven by increased inte- gration of simulation tools with the entire CAD, CAM and PLM solution-set. Simulation solutions can now function across the extended enterprise in a multi-CAD environment, and include solu- tions for managing simulation data and processes and ensuring that simulation results are made part of the product lifecycle history. The ability to use simulation across the entire lifecycle has been enhanced through improved user interfaces such as tailorable user interfaces and “wizards” which allow all appropriate PLM participants to take part in the simulation process.

Disaster Preparedness

Simulation training has become a method for preparing people for disasters. Simulations can rep- licate emergency situations and track how learners respond thanks to a lifelike experience. Di- saster preparedness simulations can involve training on how to handle terrorism attacks, natural disasters, pandemic outbreaks, or other life-threatening emergencies.

One organization that has used simulation training for disaster preparedness is CADE (Center for Advancement of Distance Education). CADE has used a video game to prepare emergency workers for multiple types of attacks. As reported by News-Medical.Net, “The video game is the first in a series of simulations to address bioterrorism, pandemic flu, smallpox and other disasters that emergency personnel must prepare for.” Developed by a team from the University of Illinois at Chicago (UIC), the game allows learners to practice their emergency skills in a safe, controlled environment.

The Emergency Simulation Program (ESP) at the British Columbia Institute of Technology (BCIT) in Vancouver, British Columbia, Canada is another example of an organization that uses simulation to train for emergency situations. ESP uses simulation to train on the following situations: forest fire fighting, oil or chemical spill response, earthquake response, law enforcement, municipal fire fighting, hazardous material handling, military training, and response to terrorist attack. One feature of the simulation system is the implementation of a "Dynamic Run-Time Clock," which allows simulations to run in a 'simulated' time frame, "'speeding up' or 'slowing down' time as desired". Additionally, the system allows session recordings, picture-icon based navigation, file storage of individual simulations, multimedia components, and the launching of external applications.

At the University of Québec in Chicoutimi, a research team at the outdoor research and expertise laboratory (Laboratoire d’Expertise et de Recherche en Plein Air–LERPA) specializes in using wil- derness backcountry accident simulations to verify emergency response coordination.

Instructionally, the benefits of emergency training through simulations are that learner performance can be tracked through the system. This allows the developer to make adjustments as necessary or to alert the educator to topics that may require additional attention. Another advantage is that the learner can be guided or trained on how to respond appropriately before continuing to the next emergency segment, an aspect that may not be available in a live environment. Some emergency training simulators also allow for immediate feedback, while other simulations may provide a summary and instruct the learner to engage with the learning topic again.

In a live-emergency situation, emergency responders do not have time to waste. Simulation-train- ing in this environment provides an opportunity for learners to gather as much information as they can and practice their knowledge in a safe environment. They can make mistakes without risk of endangering lives and be given the opportunity to correct their errors to prepare for the real-life emergency.

Economics

In economics and especially macroeconomics, the effects of proposed policy actions, such as fiscal policy changes or monetary policy changes, are simulated to judge their desirability. A mathemat- ical model of the economy, having been fitted to historical economic data, is used as a proxy for the actual economy; proposed values of government spending, taxation, open market operations, etc. are used as inputs to the simulation of the model, and various variables of interest such as the inflation rate, the unemployment rate, the balance of trade deficit, the government budget deficit, etc. are the outputs of the simulation. The simulated values of these variables of interest are com- pared for different proposed policy inputs to determine which set of outcomes is most desirable.
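As a purely illustrative sketch of this workflow, the toy linear model below stands in for a model that would in practice be fitted to historical data; the coefficients, variable names and policy settings are invented for the example and carry no economic authority.

```python
# Toy policy experiment. The linear model below stands in for a model whose
# coefficients would, in practice, be estimated from historical economic data;
# every number here is invented for illustration.
def simulate_policy(gov_spending, tax_rate, years=5):
    inflation, unemployment = 2.0, 5.0            # assumed starting conditions (%)
    path = []
    for _ in range(years):
        demand_push = 0.8 * gov_spending - 6.0 * tax_rate      # invented coefficients
        inflation = 0.7 * inflation + 0.05 * demand_push
        unemployment = max(0.0, 0.8 * unemployment - 0.03 * demand_push + 1.0)
        path.append((round(inflation, 2), round(unemployment, 2)))
    return path

# Compare two proposed policy settings (spending level, tax rate) on the same model.
for spending, tax in [(20.0, 0.30), (25.0, 0.25)]:
    print(f"spending={spending}, tax={tax}:", simulate_policy(spending, tax))
```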

Engineering, Technology, and Processes

Simulation is an important feature in engineering systems or any system that involves many processes. For example, in electrical engineering, delay lines may be used to simulate propagation delay and phase shift caused by an actual transmission line. Similarly, dummy loads may be used to simulate impedance without simulating propagation; they are used in situations where propagation is unwanted. A simulator may imitate only a few of the operations and functions of the unit it simulates.

Most engineering simulations entail mathematical modeling and computer-assisted investigation. There are many cases, however, where mathematical modeling is not reliable. Simulation of fluid dynamics problems often requires both mathematical and physical simulations. In these cases the physical models require dynamic similitude. Physical and chemical simulations also have direct practical uses, rather than research uses; in chemical engineering, for example, process simulations are used to give the process parameters immediately used for operating chemical plants, such as oil refineries. Simulators are also used for plant operator training. This type of simulator is called an Operator Training Simulator (OTS) and has been widely adopted by many industries, from chemical to oil and gas to power. An OTS provides a safe and realistic virtual environment in which to train board operators and engineers. Mimic, for example, is capable of providing high-fidelity dynamic models of nearly all chemical plants for operator training and control system testing.

Equipment

Due to the dangerous and expensive nature of training on heavy equipment, simulation has be- come a common solution across many industries. Types of simulated equipment include cranes, mining reclaimers and construction equipment, among many others. Often the simulation units will include pre-built scenarios by which to teach trainees, as well as the ability to customize new scenarios. Such equipment simulators are intended to create a safe and cost effective alternative to training on live equipment.

Ergonomics

Ergonomic simulation involves the analysis of virtual products or manual tasks within a virtual environment. In the engineering process, the aim of ergonomics is to develop and improve the design of products and work environments. Ergonomic simulation utilizes an anthropometric virtual representation of the human, commonly referred to as a mannequin or Digital Human Model (DHM), to mimic the postures, mechanical loads, and performance of a human operator in a simulated environment such as an airplane, automobile, or manufacturing facility. DHMs are recognized as an evolving and valuable tool for performing proactive ergonomics analysis and design. The simulations employ 3D graphics and physics-based models to animate the virtual humans. Ergonomics software uses inverse kinematics (IK) capability for posing the DHMs. Several ergonomic simulation tools have been developed, including Jack, SAFEWORK, RAMSIS, and SAMMIE.

The software tools typically calculate biomechanical properties, including individual muscle forces, joint forces and moments. Most of these tools employ standard ergonomic evaluation methods such as the NIOSH lifting equation and Rapid Upper Limb Assessment (RULA). Some simulations also analyze physiological measures including metabolism, energy expenditure, and fatigue limits. Cycle time studies, design and process validation, user comfort, reachability, and line of sight are other human factors that may be examined in ergonomic simulation packages.
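Because the NIOSH lifting equation is cited above as a standard evaluation method, a minimal sketch of the revised equation is given here. The frequency and coupling multipliers are normally read from published NIOSH tables, so they are passed in as parameters, and the task dimensions in the example are invented.

```python
def niosh_rwl(h_cm, v_cm, d_cm, a_deg, fm, cm, lc_kg=23.0):
    """Recommended Weight Limit from the revised NIOSH lifting equation.

    h_cm: horizontal distance of the load from the body
    v_cm: vertical height of the hands at the start of the lift
    d_cm: vertical travel distance of the lift
    a_deg: asymmetry angle (degrees of trunk twist)
    fm, cm: frequency and coupling multipliers (normally taken from NIOSH tables)
    """
    hm = min(1.0, 25.0 / max(h_cm, 25.0))          # horizontal multiplier
    vm = 1.0 - 0.003 * abs(v_cm - 75.0)            # vertical multiplier
    dm = min(1.0, 0.82 + 4.5 / max(d_cm, 25.0))    # distance multiplier
    am = 1.0 - 0.0032 * a_deg                      # asymmetry multiplier
    return lc_kg * hm * vm * dm * am * fm * cm

# Hypothetical task: load held 40 cm out, lifted from 30 cm to 100 cm, 30 degree twist.
rwl = niosh_rwl(h_cm=40, v_cm=30, d_cm=70, a_deg=30, fm=0.94, cm=0.95)
print(f"Recommended weight limit: {rwl:.1f} kg")
print(f"Lifting index for a 15 kg load: {15.0 / rwl:.2f}")   # > 1 indicates elevated risk
```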

Modeling and simulation of a task can be performed by manually manipulating the virtual hu- man in the simulated environment. Some ergonomics simulation software permits interactive, real-time simulation and evaluation through actual human input via motion capture technologies. However, motion capture for ergonomics requires expensive equipment and the creation of props to represent the environment or product.

Some applications of ergonomic simulation include the analysis of solid waste collection, disaster management tasks, interactive gaming, automotive assembly lines, virtual prototyping of rehabilitation aids, and aerospace product design. Ford engineers use ergonomics simulation software to perform virtual product design reviews. Using engineering data, the simulations assist in evaluating assembly ergonomics. The company uses Siemens' Jack and Jill ergonomics simulation software to improve worker safety and efficiency, without the need to build expensive prototypes.

Finance

In finance, computer simulations are often used for scenario planning. Risk-adjusted net present value, for example, is computed from well-defined but not always known (or fixed) inputs. By imitating the performance of the project under evaluation, simulation can provide a distribution of NPV over a range of discount rates and other variables.
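A minimal sketch of such a scenario analysis is shown below: the uncertain inputs are drawn from assumed distributions and the NPV is recomputed many times to obtain a distribution. The cash flows, distributions and probabilities are illustrative only.

```python
import random
import statistics

def npv(rate, cash_flows):
    """Net present value of cash flows, where cash_flows[0] occurs today."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

def simulate_npv(n=10_000, seed=42):
    random.seed(seed)
    results = []
    for _ in range(n):
        # Uncertain inputs drawn from assumed (illustrative) distributions.
        rate = random.uniform(0.06, 0.12)                 # discount rate
        annual = random.gauss(30_000, 8_000)              # uncertain yearly cash flow
        results.append(npv(rate, [-100_000] + [annual] * 5))
    return results

results = simulate_npv()
print(f"mean NPV:   {statistics.mean(results):,.0f}")
print(f"5th pct:    {sorted(results)[len(results) // 20]:,.0f}")
print(f"P(NPV < 0): {sum(r < 0 for r in results) / len(results):.2%}")
```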

Simulations are frequently used in financial training to engage participants in experiencing various historical as well as fictional situations. There are stock market simulations, portfolio simulations, risk management simulations or models, and forex simulations. Such simulations are typically based on stochastic asset models. Using these simulations in a training program allows for the application of theory to something akin to real life. As with other industries, the use of simulations can be technology- or case-study-driven.

Flight

Flight Simulation Training Devices (FSTD) are used to train pilots on the ground. In comparison to training in an actual aircraft, simulation based training allows for the training of maneuvers or situations that may be impractical (or even dangerous) to perform in the aircraft, while keeping the pilot and instructor in a relatively low-risk environment on the ground. For example, electrical system failures, instrument failures, hydraulic system failures, and even flight control failures can be simulated without risk to the pilots or an aircraft.

Instructors can also provide students with a higher concentration of training tasks in a given peri- od of time than is usually possible in the aircraft. For example, conducting multiple instrument ap- proaches in the actual aircraft may require significant time spent repositioning the aircraft, while in a simulation, as soon as one approach has been completed, the instructor can immediately preposition the simulated aircraft to an ideal (or less than ideal) location from which to begin the next approach.

Flight simulation also provides an economic advantage over training in an actual aircraft. Once fuel, maintenance, and insurance costs are taken into account, the operating costs of an FSTD are usually substantially lower than the operating costs of the simulated aircraft. For some large transport category airplanes, the operating costs may be several times lower for the FSTD than the actual aircraft.

Some people who use simulator software, especially flight simulator software, build their own simulator at home. Some people, to further the realism of their homemade simulator, buy used cards and racks that run the same software used by the original machine. While this involves solving the problem of matching hardware and software, and the problem that hundreds of cards plug into many different racks, many still find that solving these problems is well worthwhile. Some are so serious about realistic simulation that they will buy real aircraft parts, like complete nose sections of written-off aircraft, at aircraft boneyards. This permits people to simulate a hobby that they are unable to pursue in real life.

Marine

Bearing resemblance to flight simulators, marine simulators train ships' personnel. The most common marine simulators include:
• Ship's bridge simulators
• Engine room simulators
• Cargo handling simulators
• Communication / GMDSS simulators
• ROV simulators

Simulators like these are mostly used within maritime colleges, training institutions and navies. They often consist of a replication of a ship's bridge, with operating console(s), and a number of screens on which the virtual surroundings are projected.

Military

Military simulations, also known informally as war games, are models in which theories of war- fare can be tested and refined without the need for actual hostilities. They exist in many different forms, with varying degrees of realism. In recent times, their scope has widened to include not only military but also political and social factors (for example, the NationLab series of strategic exer- cises in Latin America). While many governments make use of simulation, both individually and collaboratively, little is known about the model’s specifics outside professional circles.

Payment and Securities Settlement System

Simulation techniques have also been applied to payment and securities settlement systems. Among the main users are central banks, who are generally responsible for the oversight of market infrastructure and entitled to contribute to the smooth functioning of the payment systems.

Central banks have been using payment system simulations to evaluate things such as the adequacy or sufficiency of the liquidity available to participants (mainly banks), in the form of account balances and intraday credit limits, to allow efficient settlement of payments. The need for liquidity also depends on the availability and the type of netting procedures in the systems, so some of the studies focus on system comparisons.

Another application is to evaluate risks related to events such as communication network breakdowns or the inability of participants to send payments (e.g. in case of a possible bank failure). This kind of analysis falls under the concepts of stress testing or scenario analysis.

A common way to conduct these simulations is to replicate the settlement logic of the real payment or securities settlement systems under analysis and then use real observed payment data. In the case of system comparison or system development, the other settlement logics naturally also need to be implemented.


To perform stress testing and scenario analysis, the observed data needs to be altered, e.g. some payments delayed or removed. To analyze the levels of liquidity, initial liquidity levels are varied. System comparisons (benchmarking) or evaluations of new netting algorithms or rules are per- formed by running simulations with a fixed set of data and varying only the system setups.

Inference is usually done by comparing the benchmark simulation results to the results of altered simulation setups, using indicators such as unsettled transactions or settlement delays.
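A toy version of such an exercise might look like the sketch below, which uses an invented payment data set and a deliberately simplified settlement rule (settle immediately if the sender has sufficient balance, otherwise queue and retry when funds arrive); real studies would instead replay observed payment data through the actual settlement logic.

```python
def settle(payments, balances):
    """Very simplified settlement logic: payments are processed in time order; a payment
    settles immediately if the sender has enough liquidity, otherwise it is queued and
    retried whenever incoming funds arrive. Returns settlement delays and the number of
    payments left unsettled at the end of the day."""
    queued, delays = [], []

    def retry_queued(now):
        progress = True
        while progress:                      # settling one queued payment may free up another
            progress = False
            for item in list(queued):
                t, sender, receiver, amount = item
                if balances[sender] >= amount:
                    balances[sender] -= amount
                    balances[receiver] += amount
                    delays.append(now - t)
                    queued.remove(item)
                    progress = True

    for t, sender, receiver, amount in sorted(payments):
        if balances[sender] >= amount:
            balances[sender] -= amount
            balances[receiver] += amount
            delays.append(0)
            retry_queued(t)
        else:
            queued.append((t, sender, receiver, amount))
    return delays, len(queued)

# Invented intraday payments: (time, sender, receiver, amount).
payments = [(1, "A", "B", 50), (2, "B", "C", 40), (3, "C", "A", 30), (4, "A", "C", 20)]
# Vary bank A's opening balance to see how liquidity affects settlement.
for liquidity in (100, 20):
    delays, unsettled = settle(payments, {"A": liquidity, "B": 0, "C": 0})
    print(f"opening liquidity {liquidity}: settled {len(delays)}, unsettled {unsettled}")
```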

Project Management

Project management simulation is simulation used for project management training and analysis. It is often used as training simulation for project managers. In other cases it is used for what-if analysis and for supporting decision-making in real projects. Frequently the simulation is con- ducted using software tools.

Robotics

A robotics simulator is used to create embedded applications for a specific (or not) robot without being dependent on the ‘real’ robot. In some cases, these applications can be transferred to the real robot (or rebuilt) without modifications. Robotics simulators allow reproducing situations that cannot be ‘created’ in the real world because of cost, time, or the ‘uniqueness’ of a resource. A simulator also allows fast robot prototyping. Many robot simulators feature physics engines to simulate a robot’s dynamics.

Production

Simulations of production systems are used mainly to examine the effect of improvements or investments in a production system. Most often this is done using a static spreadsheet with process times and transportation times. For more sophisticated simulations, Discrete Event Simulation (DES) is used, with the advantage of being able to simulate dynamics in the production system. A production system is very much dynamic, depending on variations in manufacturing processes, assembly times, machine set-ups, breaks, breakdowns and small stoppages. Much software is commonly used for discrete event simulation; the packages differ in usability and markets but often share the same foundation.

Sales Process

Simulations are useful in modeling the flow of transactions through business processes, such as in the field of sales process engineering, to study and improve the flow of customer orders through various stages of completion (say, from an initial proposal for providing goods/services through order acceptance and installation). Such simulations can help predict the impact of how improve- ments in methods might impact variability, cost, labor time, and the quantity of transactions at various stages in the process. A full-featured computerized process simulator can be used to depict such models, as can simpler educational demonstrations using spreadsheet software, pennies be- ing transferred between cups based on the roll of a die, or dipping into a tub of colored beads with a scoop.


Sports

In sports, computer simulations are often done to predict the outcome of events and the perfor- mance of individual sportspeople. They attempt to recreate the event through models built from statistics. The increase in technology has allowed anyone with knowledge of programming the ability to run simulations of their models. The simulations are built from a series of mathematical algorithms, or models, and can vary with accuracy. Accuscore, which is licensed by companies such as ESPN, is a well known simulation program for all major sports. It offers detailed analysis of games through simulated betting lines, projected point totals and overall probabilities.

With the increased interest in fantasy sports, simulation models that predict individual player performance have gained popularity. Companies like What If Sports and StatFox specialize in using their simulations not only to predict game results but also to forecast how well individual players will do. Many people use models to determine who to start in their fantasy leagues.

Another way simulations are helping the sports field is in the use of biomechanics. Models are derived and simulations are run from data received from sensors attached to athletes and video equipment. Sports biomechanics aided by simulation models answer questions regarding training techniques such as: the effect of fatigue on throwing performance (height of throw) and biome- chanical factors of the upper limbs (reactive strength index; hand contact time).

Computer simulations allow their users to take models which before were too complex to run, and give them answers. Simulations have proven to be some of the best insights into both play perfor- mance and team predictability.

Space Shuttle Countdown

Simulation is used at Kennedy Space Center (KSC) to train and certify Space Shuttle engineers during simulated launch countdown operations. The Space Shuttle engineering community participates in a launch countdown integrated simulation before each shuttle flight. This simulation is a virtual sim- ulation where real people interact with simulated Space Shuttle vehicle and Ground Support Equip- ment (GSE) hardware. The Shuttle Final Countdown Phase Simulation, also known as S0044, involves countdown processes that integrate many of the Space Shuttle vehicle and GSE systems. Some of the Shuttle systems integrated in the simulation are the main propulsion system, main engines, solid rock- et boosters, ground liquid hydrogen and liquid oxygen, external tank, flight controls, navigation, and avionics. The high-level objectives of the Shuttle Final Countdown Phase Simulation are:

Firing Room 1 configured for space shuttle launches


• To demonstrate Firing Room final countdown phase operations.

• To provide training for system engineers in recognizing, reporting and evaluating system problems in a time critical environment.

• To exercise the launch team’s ability to evaluate, prioritize and respond to problems in an integrated manner within a time critical environment.

• To provide procedures to be used in performing failure/recovery testing of the operations performed in the final countdown phase.

The Shuttle Final Countdown Phase Simulation takes place at the Kennedy Space Center Launch Control Center Firing Rooms. The firing room used during the simulation is the same control room where real launch countdown operations are executed. As a result, equipment used for real launch countdown operations is engaged. Command and control computers, application software, engi- neering plotting and trending tools, launch countdown procedure documents, launch commit cri- teria documents, hardware requirement documents, and any other items used by the engineering launch countdown teams during real launch countdown operations are used during the simula- tion. The Space Shuttle vehicle hardware and related GSE hardware is simulated by mathematical models (written in Shuttle Ground Operations Simulator (SGOS) modeling language) that behave and react like real hardware. During the Shuttle Final Countdown Phase Simulation, engineers command and control hardware via real application software executing in the control consoles – just as if they were commanding real vehicle hardware. However, these real software applications do not interface with real Shuttle hardware during simulations. Instead, the applications interface with mathematical model representations of the vehicle and GSE hardware. Consequently, the simulations bypass sensitive and even dangerous mechanisms while providing engineering mea- surements detailing how the hardware would have reacted. Since these math models interact with the command and control application software, models and simulations are also used to debug and verify the functionality of application software.

Satellite Navigation

The only true way to test GNSS receivers (commonly known as sat-navs in the commercial world) is by using an RF constellation simulator. A receiver that may, for example, be used on an aircraft can be tested under dynamic conditions without the need to take it on a real flight. The test conditions can be repeated exactly, and there is full control over all the test parameters. This is not possible in the 'real world' using the actual signals. For testing receivers that will use the new Galileo satellite navigation system there is no alternative, as the real signals do not yet exist.

Weather

Predicting weather conditions by extrapolating/interpolating previous data is one of the real uses of simulation. Most weather forecasts use information published by weather bureaus. Such simulations help in predicting and forewarning about extreme weather conditions, such as the path of an active hurricane or cyclone. Numerical weather prediction for forecasting involves complicated numerical computer models that predict the weather accurately by taking many parameters into account.


Simulation Games

Strategy games—both traditional and modern—may be viewed as simulations of abstracted deci- sion-making for the purpose of training military and political leaders.

Many other video games are simulators of some kind. Such games can simulate various aspects of reality, from business, to government, to construction, to piloting vehicles.

Historical Usage

Historically, the word had negative connotations:

…therefore a general custom of simulation (which is this last degree) is a vice, using either of a natural falseness or fearfulness…
— Francis Bacon, Of Simulation and Dissimulation, 1597

…for Distinction Sake, a Deceiving by Words, is commonly called a Lye, and a Deceiving by Actions, Gestures, or Behavior, is called Simulation…
— Robert South, 1697, p. 525

However, the connection between simulation and dissembling later faded out and is now only of linguistic interest.

Computer Simulation

A computer simulation is a simulation, run on a single computer, or a network of computers, to reproduce behavior of a system. The simulation uses an abstract model (a computer model, or a computational model) to simulate the system. Computer simulations have become a useful part of mathematical modeling of many natural systems in physics (computational physics), astrophysics, climatology, chemistry and biology, human systems in economics, psychology, social science, and engineering. Simulation of a system is represented as the running of the system’s model. It can be used to explore and gain new insights into new technology and to estimate the performance of systems too complex for analytical solutions.

Process of building a computer model, and the interplay between experiment, simulation, and theory.


Computer simulations are computer programs that can be either small, running almost instantly on small devices, or large-scale programs that run for hours or days on network-based groups of computers. The scale of events being simulated by computer simulations has far exceeded anything possible (or perhaps even imaginable) using traditional paper-and-pencil mathematical modeling. Over 10 years ago, a desert-battle simulation of one force invading another involved the modeling of 66,239 tanks, trucks and other vehicles on simulated terrain around Kuwait, using multiple supercomputers in the DoD High Performance Computer Modernization Program. Other examples include a 1-billion-atom model of material deformation; a 2.64-million-atom model of the complex protein-producing organelle of all living organisms, the ribosome, in 2005; a complete simulation of the life cycle of Mycoplasma genitalium in 2012; and the Blue Brain project at EPFL (Switzerland), begun in May 2005 to create the first computer simulation of the entire human brain, right down to the molecular level.

Because of the computational cost of simulation, computer experiments are used to perform infer- ence such as uncertainty quantification.

Simulation Versus Model

A computer model is the algorithms and equations used to capture the behavior of the system be- ing modeled. By contrast, computer simulation is the actual running of the program that contains these equations or algorithms. Simulation, therefore, is the process of running a model. Thus one would not “build a simulation”; instead, one would “build a model”, and then either “run the mod- el” or equivalently “run a simulation”.

History

Computer simulation developed hand-in-hand with the rapid growth of the computer, following its first large-scale deployment during the Manhattan Project in World War II to model the pro- cess of nuclear detonation. It was a simulation of 12 hard spheres using a Monte Carlo algorithm. Computer simulation is often used as an adjunct to, or substitute for, modeling systems for which simple closed form analytic solutions are not possible. There are many types of computer simula- tions; their common feature is the attempt to generate a sample of representative scenarios for a model in which a complete enumeration of all possible states of the model would be prohibitive or impossible.

Data Preparation

The external data requirements of simulations and models vary widely. For some, the input might be just a few numbers (for example, simulation of a waveform of AC electricity on a wire), while others might require terabytes of information (such as weather and climate models).

Input sources also vary widely:

• Sensors and other physical devices connected to the model;

• Control surfaces used to direct the progress of the simulation in some way;

• Current or historical data entered by hand;


• Values extracted as a by-product from other processes;

• Values output for the purpose by other simulations, models, or processes.

Lastly, the time at which data is available varies:

• “invariant” data is often built into the model code, either because the value is truly invari- ant (e.g., the value of π) or because the designers consider the value to be invariant for all cases of interest;

• data can be entered into the simulation when it starts up, for example by reading one or more files, or by reading data from a preprocessor;

• data can be provided during the simulation run, for example by a sensor network.

Because of this variety, and because diverse simulation systems have many common elements, there are a large number of specialized simulation languages. The best-known may be Simula (sometimes called Simula-67, after the year 1967 when it was proposed). There are now many others.

Systems that accept data from external sources must be very careful in knowing what they are receiving. While it is easy for computers to read values from text or binary files, what is much harder is knowing what the accuracy (compared to measurement resolution and precision) of the values is. Often the values are expressed as "error bars", a minimum and maximum deviation defining the range within which the true value is expected to lie. Because digital computer arithmetic is not exact, rounding and truncation errors compound this error, so it is useful to perform an "error analysis" to confirm that values output by the simulation will still be usefully accurate.

Even small errors in the original data can accumulate into substantial error later in the simulation. While all computer analysis is subject to the “GIGO” (garbage in, garbage out) restriction, this is especially true of digital simulation. Indeed, observation of this inherent, cumulative error in dig- ital systems was the main catalyst for the development of chaos theory.
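The accumulation of rounding error is easy to demonstrate. The short sketch below (an illustration, not a technique described in this text) repeatedly adds a value that has no exact binary representation and compares a naive running sum with a compensated (Kahan) sum.

```python
def naive_sum(values):
    total = 0.0
    for v in values:
        total += v                      # each addition rounds to the nearest double
    return total

def kahan_sum(values):
    """Compensated summation: carries the low-order bits lost at each step."""
    total = compensation = 0.0
    for v in values:
        y = v - compensation
        t = total + y
        compensation = (t - total) - y
        total = t
    return total

values = [0.1] * 1_000_000              # 0.1 has no exact binary representation
print(naive_sum(values))                # drifts measurably away from 100000
print(kahan_sum(values))                # limited only by the rounding of 0.1 itself
print(abs(naive_sum(values) - 100_000)) # the accumulated rounding error of the naive sum
```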

Types

Computer models can be classified according to several independent pairs of attributes, including:

• Stochastic or deterministic (and as a special case of deterministic, chaotic)

• Steady-state or dynamic

• Continuous or discrete (and as an important special case of discrete, discrete event or DE models)

• Dynamic system simulation, e.g. electric systems, hydraulic systems or multi-body mechanical systems (described primarily by DAEs), or dynamics simulation of field problems, e.g. CFD or FEM simulations (described by PDEs).

• Local or distributed.


Another way of categorizing models is to look at the underlying data structures. For time-stepped simulations, there are two main classes:

• Simulations which store their data in regular grids and require only next-neighbor access are called stencil codes. Many CFD applications belong to this category (a minimal stencil example follows this list).

• If the underlying graph is not a regular grid, the model may belong to the meshfree method class.
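A minimal example of a stencil code, referenced in the first bullet above, is an explicit finite-difference update for the one-dimensional heat equation, where each grid point is updated from its own value and those of its two neighbours; the grid size and coefficients below are arbitrary.

```python
def heat_step(u, alpha=0.25):
    """One explicit time step of the 1-D heat equation u_t = u_xx on a regular grid.
    alpha = dt/dx**2 must be <= 0.5 for this explicit scheme to remain stable."""
    new = u[:]                                                       # boundary values stay fixed
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])   # next-neighbour stencil
    return new

# A hot spot in the middle of a cold rod diffuses outwards over 50 steps.
u = [0.0] * 21
u[10] = 100.0
for _ in range(50):
    u = heat_step(u)
print([round(x, 1) for x in u[8:13]])
```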

• Steady-state simulations use equations that define the relationships between elements of the modeled system and attempt to find a state in which the system is in equilibrium. Such models are often used in simulating physical systems, as a simpler modeling case before dynamic simulation is attempted.

• Dynamic simulations model changes in a system in response to (usually changing) input signals.

• Stochastic models use random number generators to model chance or random events.

• A discrete event simulation (DES) manages events in time. Most computer, logic-test and fault-tree simulations are of this type. In this type of simulation, the simulator maintains a queue of events sorted by the simulated time they should occur. The simulator reads the queue and triggers new events as each event is processed. It is not important to execute the simulation in real time. It is often more important to be able to access the data produced by the simulation and to discover logic defects in the design or the sequence of events. (A minimal sketch of such an event queue follows this list.)

• A continuous dynamic simulation performs numerical solution of differential-algebraic equations or differential equations (either partial or ordinary). Periodically, the simulation program solves all the equations and uses the numbers to change the state and output of the simulation. Applications include flight simulators, construction and management sim- ulation games, chemical process modeling, and simulations of electrical circuits. Original- ly, these kinds of simulations were actually implemented on analog computers, where the differential equations could be represented directly by various electrical components such as op-amps. By the late 1980s, however, most “analog” simulations were run on conven- tional digital computers that emulate the behavior of an analog computer.

• A special type of discrete simulation that does not rely on a model with an underlying equa- tion, but can nonetheless be represented formally, is agent-based simulation. In agent- based simulation, the individual entities (such as molecules, cells, trees or consumers) in the model are represented directly (rather than by their density or concentration) and pos- sess an internal state and set of behaviors or rules that determine how the agent’s state is updated from one time-step to the next.

• Distributed models run on a network of interconnected computers, possibly through the Internet. Simulations dispersed across multiple host computers like this are often referred to as “distributed simulations”. There are several standards for distributed simulation, in- cluding Aggregate Level Simulation Protocol (ALSP), Distributed Interactive Simulation (DIS), the High Level Architecture (simulation) (HLA) and the Test and Training Enabling Architecture (TENA).
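The sketch below illustrates the event-queue mechanism described in the discrete-event bullet above: events are kept in a priority queue ordered by simulated time, and processing one event may schedule further events. The machine failure/repair scenario and its timings are invented for the example.

```python
import heapq

class Simulator:
    """Tiny discrete-event simulation core: a time-ordered event queue."""
    def __init__(self):
        self.clock = 0.0
        self.queue = []          # heap of (time, sequence, action)
        self.seq = 0             # tie-breaker so actions never need to be compared

    def schedule(self, delay, action):
        heapq.heappush(self.queue, (self.clock + delay, self.seq, action))
        self.seq += 1

    def run(self, until):
        while self.queue and self.queue[0][0] <= until:
            self.clock, _, action = heapq.heappop(self.queue)
            action(self)         # an event handler may schedule new events

# Example: a machine that fails and is repaired at fixed (illustrative) intervals.
def failure(sim):
    print(f"t={sim.clock:5.1f}  machine fails")
    sim.schedule(2.0, repair)

def repair(sim):
    print(f"t={sim.clock:5.1f}  machine repaired")
    sim.schedule(8.0, failure)

sim = Simulator()
sim.schedule(5.0, failure)
sim.run(until=40.0)
```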


Visualization

Formerly, the output data from a computer simulation was sometimes presented in a table or a matrix showing how data were affected by numerous changes in the simulation parameters. The use of the matrix format was related to traditional use of the matrix concept in mathematical models. However, psychologists and others noted that humans could quickly perceive trends by looking at graphs or even moving-images or motion-pictures generated from the data, as displayed by computer-generated-imagery (CGI) animation. Although observers could not necessarily read out numbers or quote math formulas, from observing a moving weather chart they might be able to predict events (and “see that rain was headed their way”) much faster than by scanning tables of rain-cloud coordinates. Such intense graphical displays, which transcended the world of numbers and formulae, sometimes also led to output that lacked a coordinate grid or omitted timestamps, as if straying too far from numeric data displays. Today, weather forecasting models tend to bal- ance the view of moving rain/snow clouds against a map that uses numeric coordinates and nu- meric timestamps of events.

Similarly, CGI computer simulations of CAT scans can simulate how a tumor might shrink or change during an extended period of medical treatment, presenting the passage of time as a spin- ning view of the visible human head, as the tumor changes.

Other applications of CGI computer simulations are being developed to graphically display large amounts of data, in motion, as changes occur during a simulation run.

Computer Simulation in Science

Generic examples of types of computer simulations in science, which are derived from an underly- ing mathematical description:

Computer simulation of the process of osmosis

• a numerical simulation of differential equations that cannot be solved analytically. Theories that involve continuous systems, such as phenomena in physical cosmology, fluid dynamics (e.g., climate models, roadway noise models, roadway air dispersion models), continuum mechanics and chemical kinetics, fall into this category.

• a stochastic simulation, typically used for discrete systems where events occur probabilistically and which cannot be described directly with differential equations (this is a discrete simulation in the above sense). Phenomena in this category include genetic drift and biochemical or gene regulatory networks with small numbers of molecules.

Specific examples of computer simulations follow:

• statistical simulations based upon an agglomeration of a large number of input profiles, such as the forecasting of equilibrium temperature of receiving waters, allowing the gamut of meteorological data to be input for a specific locale. This technique was developed for thermal pollution forecasting.

• agent based simulation has been used effectively in ecology, where it is often called “in- dividual based modeling” and is used in situations for which individual variability in the agents cannot be neglected, such as population dynamics of salmon and trout (most purely mathematical models assume all trout behave identically).

• time-stepped dynamic models. In hydrology there are several such hydrology transport models, such as the SWMM and DSSAM models developed by the U.S. Environmental Protection Agency.

• computer simulations have also been used to formally model theories of human cognition and performance, e.g., ACT-R.

• computer simulation using molecular modeling for drug discovery.

• computer simulation for studying the selective sensitivity of bonds by mechanochemistry during grinding of organic molecules.

• Computational fluid dynamics simulations are used to simulate the behaviour of flowing air, water and other fluids. One-, two- and three-dimensional models are used. A one-di- mensional model might simulate the effects of water hammer in a pipe. A two-dimensional model might be used to simulate the drag forces on the cross-section of an aeroplane wing. A three-dimensional simulation might estimate the heating and cooling requirements of a large building.

• An understanding of statistical thermodynamic molecular theory is fundamental to the appreciation of molecular solutions. Development of the Potential Distribution Theorem (PDT) allows this complex subject to be simplified to down-to-earth presentations of mo- lecular theory.

Notable, and sometimes controversial, computer simulations used in science include: Donella Meadows’ World3 used in the Limits to Growth, James Lovelock’s Daisyworld and Thomas Ray’s Tierra.

Simulation Environments for Physics and Engineering

Graphical environments to design simulations have been developed. Special care was taken to handle events (situations in which the simulation equations are not valid and have to be changed). The open project Open Source Physics was started to develop reusable libraries for simulations in Java, together with Easy Java Simulations, a complete graphical environment that generates code based on these libraries.


Computer Simulation in Practical Contexts

Computer simulations are used in a wide variety of practical contexts, such as:

• analysis of air pollutant dispersion using atmospheric dispersion modeling

• design of complex systems such as aircraft and also logistics systems.

• design of noise barriers to effect roadway noise mitigation

• modeling of application performance

• flight simulators to train pilots

• weather forecasting

• forecasting of risk

• simulation of electrical circuits

• simulation of other computers is emulation.

• forecasting of prices on financial markets

• behavior of structures (such as buildings and industrial parts) under stress and other con- ditions

• design of industrial processes, such as chemical processing plants

• strategic management and organizational studies

• reservoir simulation in petroleum engineering to model the subsurface reservoir

• process engineering simulation tools.

• robot simulators for the design of robots and robot control algorithms

• urban simulation models that simulate dynamic patterns of urban development and re- sponses to urban land use and transportation policies.

• traffic engineering to plan or redesign parts of the street network from single junctions over cities to a national highway network to transportation system planning, design and operations.

• modeling car crashes to test safety mechanisms in new vehicle models.

• crop-soil systems in agriculture, via dedicated software frameworks (e.g. BioMA, OMS3, APSIM)

The reliability and the trust people put in computer simulations depend on the validity of the simulation model; therefore, verification and validation are of crucial importance in the development of computer simulations. Another important aspect of computer simulations is the reproducibility of the results, meaning that a simulation model should not provide a different answer for each execution. Although this might seem obvious, it is a special point of attention in stochastic simulations, where the random numbers should actually be pseudo-random numbers generated from a fixed seed. An exception to reproducibility are human-in-the-loop simulations such as flight simulations and computer games; here a human is part of the simulation and thus influences the outcome in a way that is hard, if not impossible, to reproduce exactly.

Vehicle manufacturers make use of computer simulation to test safety features in new designs. By building a copy of the car in a physics simulation environment, they can save the hundreds of thou- sands of dollars that would otherwise be required to build and test a unique prototype. Engineers can step through the simulation milliseconds at a time to determine the exact stresses being put upon each section of the prototype.

Computer graphics can be used to display the results of a computer simulation. Animations can be used to experience a simulation in real-time, e.g., in training simulations. In some cases anima- tions may also be useful in faster than real-time or even slower than real-time modes. For example, faster than real-time animations can be useful in visualizing the buildup of queues in the simula- tion of humans evacuating a building. Furthermore, simulation results are often aggregated into static images using various ways of scientific visualization.

In debugging, simulating a program execution under test (rather than executing natively) can detect far more errors than the hardware itself can detect and, at the same time, log useful debugging information such as instruction trace, memory alterations and instruction counts. This technique can also detect buffer overflow and similar “hard to detect” errors as well as produce performance information and tuning data.

Pitfalls

Although sometimes ignored in computer simulations, it is very important to perform a sensitivity analysis to ensure that the accuracy of the results is properly understood. For example, the probabilistic risk analysis of factors determining the success of an oilfield exploration program involves combining samples from a variety of statistical distributions using the Monte Carlo method. If, for instance, one of the key parameters (e.g., the net ratio of oil-bearing strata) is known to only one significant figure, then the result of the simulation might not be more precise than one significant figure, although it might (misleadingly) be presented as having four significant figures.
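The Python sketch below is a toy version of such an analysis; the distributions and parameter values are made up purely for illustration. The net oil-bearing ratio is assumed to be known only to one significant figure, and the spread of the Monte Carlo output shows how little meaning any extra reported digits would carry.

import random

rng = random.Random(0)

def recoverable_volume(n=100_000):
    # Toy risk model: volume = area * thickness * net oil-bearing ratio.
    # The net ratio is only known to about one significant figure (~0.3),
    # so it is sampled uniformly from [0.25, 0.35); the other factors are better known.
    samples = []
    for _ in range(n):
        area      = rng.gauss(50.0, 1.0)       # km^2 (hypothetical)
        thickness = rng.gauss(20.0, 0.5)       # m (hypothetical)
        net_ratio = rng.uniform(0.25, 0.35)    # dominant source of uncertainty
        samples.append(area * thickness * net_ratio)
    mean = sum(samples) / n
    spread = (sum((x - mean) ** 2 for x in samples) / (n - 1)) ** 0.5
    return mean, spread

mean, spread = recoverable_volume()
print(f"mean = {mean:.4f}, spread = {spread:.1f}")   # the spread swamps the extra digits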

Model Calibration Techniques

The following three steps should be used to produce accurate simulation models: calibration, verification, and validation. Computer simulations are good at portraying and comparing theoretical scenarios, but in order to accurately model actual case studies they have to match what is actually happening today. A base model should be created and calibrated so that it matches the area being studied. The calibrated model should then be verified to ensure that the model is operating as expected based on the inputs. Once the model has been verified, the final step is to validate the model by comparing the outputs to historical data from the study area. This can be done by using statistical techniques and ensuring an adequate R-squared value. Unless these techniques are employed, the simulation model created will produce inaccurate results and not be a useful prediction tool.

Model calibration is achieved by adjusting any available parameters in order to adjust how the model operates and simulates the process. For example, in traffic simulation, typical parameters include look-ahead distance, car-following sensitivity, discharge headway, and start-up lost time. These parameters influence driver behavior such as when and how long it takes a driver to change lanes, how much distance a driver leaves between their car and the car in front of it, and how quickly a driver starts to accelerate through an intersection. Adjusting these parameters has a direct effect on the amount of traffic volume that can traverse the modeled roadway network by making the drivers more or less aggressive. These are examples of calibration parameters that can be fine-tuned to match characteristics observed in the field at the study location. Most traffic models have typical default values but they may need to be adjusted to better match the driver behavior at the specific location being studied.

Model verification is achieved by obtaining output data from the model and comparing them to what is expected from the input data. For example, in traffic simulation, traffic volume can be verified to ensure that actual volume throughput in the model is reasonably close to traffic volumes input into the model. Ten percent is a typical threshold used in traffic simulation to determine if output volumes are reasonably close to input volumes. Simulation models handle model inputs in different ways so traffic that enters the network, for example, may or may not reach its desired destination. Additionally, traffic that wants to enter the network may not be able to, if congestion exists. This is why model verification is a very important part of the modeling process.
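A minimal sketch of such a throughput check, written in Python with a hypothetical 10% tolerance and made-up volumes, might look like this:

def verify_throughput(input_volume, output_volume, tolerance=0.10):
    # Flag a link whose simulated throughput deviates from the demand fed into
    # the model by more than the given relative tolerance (10% here).
    deviation = abs(output_volume - input_volume) / input_volume
    return deviation <= tolerance, deviation

ok, deviation = verify_throughput(input_volume=1200, output_volume=1085)   # hypothetical counts
print(ok, f"{deviation:.1%}")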

The final step is to validate the model by comparing the results with what is expected based on historical data from the study area. Ideally, the model should produce similar results to what has happened historically. This is typically verified by nothing more than quoting the R-squared statistic from the fit. This statistic measures the fraction of variability that is accounted for by the model. A high R-squared value does not necessarily mean the model fits the data well. Another tool used to validate models is graphical residual analysis. If model output values drastically differ from historical values, it probably means there is an error in the model. Before using the model as a base to produce additional models, it is important to verify it for different scenarios to ensure that each one is accurate. If the outputs do not reasonably match historic values during the validation process, the model should be reviewed and updated to produce results more in line with expectations. It is an iterative process that helps to produce more realistic models.
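As an illustration, the R-squared statistic can be computed directly from paired observed and modeled values; the Python sketch below uses hypothetical traffic counts purely as an example.

def r_squared(observed, modeled):
    # Fraction of the variability in the observed data accounted for by the model.
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

observed = [120, 150, 180, 210, 260]   # historical counts (hypothetical)
modeled  = [125, 140, 185, 200, 270]   # corresponding model outputs (hypothetical)
print(r_squared(observed, modeled))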

Validating traffic simulation models requires comparing traffic estimated by the model to observed traffic on the roadway and transit systems. Initial comparisons are for trip interchanges between quadrants, sectors, or other large areas of interest. The next step is to compare traffic estimated by the models to traffic counts, including transit ridership, crossing contrived barriers in the study area. These are typically called screenlines, cutlines, and cordon lines and may be imaginary or actual physical barriers. Cordon lines surround particular areas such as a city’s central business district or other major activity centers. Transit ridership estimates are commonly validated by comparing them to actual patronage crossing cordon lines around the central business district.


Three sources of error can cause weak correlation during calibration: input error, model error, and parameter error. In general, input error and parameter error can be adjusted easily by the user. Model error, however, is caused by the methodology used in the model and may not be as easy to fix. Simulation models are typically built using several different modeling theories that can produce conflicting results. Some models are more generalized while others are more detailed. If model error occurs as a result, it may be necessary to adjust the model methodology to make results more consistent.

These are the necessary steps to ensure that simulation models are functioning properly and produce realistic results. Simulation models can be used as a tool to verify engineering theories, but they are only valid if calibrated properly. Once satisfactory estimates of the parameters for all models have been obtained, the models must be checked to assure that they adequately perform the intended functions. The validation process establishes the credibility of the model by demonstrating its ability to replicate actual traffic patterns. The importance of model validation underscores the need for careful planning, thoroughness and accuracy in the input data collection program undertaken for this purpose. Efforts should be made to ensure collected data is consistent with expected values. For example, in traffic analysis it is typical for a traffic engineer to perform a site visit to verify traffic counts and become familiar with traffic patterns in the area. The resulting models and forecasts will be no better than the data used for model estimation and validation.

Dynamic Simulation

Dynamic simulation (or dynamic system simulation) is the use of a computer program to model the time-varying behavior of a system. The systems are typically described by ordinary differential equations or partial differential equations. As mathematical models incorporate real-world constraints, like gear backlash and rebound from a hard stop, the equations become nonlinear, which requires numerical methods to solve them. A numerical simulation is done by stepping through a time interval and approximating the integral of the derivatives as the area under the derivative curves. Some methods use a fixed step through the interval, while others use an adaptive step that can shrink or grow automatically to maintain an acceptable error tolerance. Some methods can use different time steps in different parts of the simulation model.

Industrial uses of dynamic simulation are many and range from nuclear power, steam turbines, six-degree-of-freedom vehicle modeling, electric motors, econometric models, biological systems, robot arms, mass-spring-dampers and hydraulic systems to drug dose migration through the human body, to name a few. These models can often be run in real time to give a virtual response close to the actual system. This is useful in process control and mechatronic systems for tuning the automatic control systems before they are connected to the real system, or for human training before they control the real system.

Simulation is also used in computer games and animation and can be accelerated by using a physics engine, the technology used in many powerful computer programs, like 3ds Max, Maya, Lightwave, and many others, to simulate physical characteristics. In computer animation, things like hair, cloth, liquid, fire, and particles can be easily modeled, while the human animator animates simpler objects. Computer-based dynamic animation was first used at a very simple level in the 1989 Pixar Animation Studios short film Knick Knack to move the fake snow in the snowglobe and pebbles in a fish tank.
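To make the stepping idea concrete, the Python sketch below integrates a simple mass-spring-damper with a fixed time step using the explicit Euler method; the parameter values are arbitrary, and an adaptive-step method would instead adjust dt automatically to meet an error tolerance.

def simulate_mass_spring_damper(m=1.0, c=0.5, k=4.0, x0=1.0, v0=0.0,
                                dt=0.001, t_end=10.0):
    # Fixed-step explicit Euler integration of  m*x'' + c*x' + k*x = 0.
    x, v, t = x0, v0, 0.0
    history = []
    while t < t_end:
        a = -(c * v + k * x) / m   # acceleration from the equation of motion
        x += v * dt                # advance position using the current velocity
        v += a * dt                # advance velocity using the current acceleration
        t += dt
        history.append((t, x))
    return history

trajectory = simulate_mass_spring_damper()
print(trajectory[-1])   # (time, position) at the end of the run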

Discrete Event Simulation

A discrete-event simulation (DES) models the operation of a system as a discrete sequence of events in time. Each event occurs at a particular instant in time and marks a change of state in the system. Between consecutive events, no change in the system is assumed to occur; thus the simulation can directly jump in time from one event to the next.

This contrasts with continuous simulation, in which the simulation continuously tracks the system dynamics over time. Instead of being event-based, that approach is called activity-based simulation: time is broken up into small time slices and the system state is updated according to the set of activities happening in each time slice. Because discrete-event simulations do not have to simulate every time slice, they can typically run much faster than the corresponding continuous simulation.

A more recent method is the three-phased approach to discrete event simulation (Pidd, 1998). In this approach, the first phase is to jump to the next chronological event. The second phase is to execute all events that unconditionally occur at that time (these are called B-events). The third phase is to execute all events that conditionally occur at that time (these are called C-events). The three-phase approach is a refinement of the event-based approach in which simultaneous events are ordered so as to make the most efficient use of computer resources. The three-phase approach is used by a number of commercial simulation software packages, but from the user’s point of view, the specifics of the underlying simulation method are generally hidden.

Example

A common exercise in learning how to build discrete-event simulations is to model a queue, such as customers arriving at a bank to be served by a teller. In this example, the system entities are Customer-Queue and Tellers. The system events are Customer-Arrival and Customer-Departure. (The event of Teller-Begins-Service can be part of the logic of the arrival and departure events.) The system states, which are changed by these events, are Number-of-Customers-in-the-Queue (an integer from 0 to n) and Teller-Status (busy or idle). The random variables that need to be characterized to model this system stochastically are Customer-Interarrival-Time and Teller-Service-Time. An agent-based framework for performance modeling of an optimistic parallel discrete event simulator is another example of a discrete event simulation.

Components

In addition to the logic of what happens when system events occur, discrete event simulations include the following:


State

A system state is a set of variables that captures the salient properties of the system to be studied. The state trajectory over time, S(t), can be mathematically represented by a step function whose values change in correspondence with discrete events.

Clock

The simulation must keep track of the current simulation time, in whatever measurement units are suitable for the system being modeled. In discrete-event simulations, as opposed to continuous simulations, time ‘hops’ because events are instantaneous – the clock skips to the next event start time as the simulation proceeds.

Events List

The simulation maintains at least one list of simulation events. This is sometimes called the pending event set because it lists events that are pending as a result of a previously simulated event but have yet to be simulated themselves. An event is described by the time at which it occurs and a type, indicating the code that will be used to simulate that event. It is common for the event code to be parametrized, in which case the event description also contains parameters to the event code.

When events are instantaneous, activities that extend over time are modeled as sequences of events. Some simulation frameworks allow the time of an event to be specified as an interval, giving the start time and the end time of each event.

Single-threaded simulation engines based on instantaneous events have just one current event. In contrast, multi-threaded simulation engines and simulation engines supporting an interval-based event model may have multiple current events. In both cases, there are significant problems with synchronization between current events.

The pending event set is typically organized as a priority queue, sorted by event time. That is, regardless of the order in which events are added to the event set, they are removed in strictly chronological order. Several general-purpose priority queue algorithms have proven effective for discrete-event simulation, most notably, the splay tree. More recent alternatives include skip lists, calendar queues, and ladder queues.
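A minimal pending event set built on a binary-heap priority queue (Python's heapq module) might look like the sketch below; the class and method names are illustrative and not taken from any particular simulation library.

import heapq
import itertools

class PendingEventSet:
    # Events come out in strictly chronological order, regardless of the
    # order in which they were scheduled.
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker for simultaneous events

    def schedule(self, time, event_type, **params):
        heapq.heappush(self._heap, (time, next(self._counter), event_type, params))

    def pop_next(self):
        time, _, event_type, params = heapq.heappop(self._heap)
        return time, event_type, params

    def __bool__(self):
        return bool(self._heap)

pes = PendingEventSet()
pes.schedule(5.0, "departure")
pes.schedule(2.5, "arrival")
print(pes.pop_next())   # the arrival at time 2.5 comes out first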

Typically, events are scheduled dynamically as the simulation proceeds. For example, in the bank example noted above, the event CUSTOMER-ARRIVAL at time t would, if the CUSTOMER-QUEUE was empty and TELLER was idle, include the creation of the subsequent event CUSTOMER-DEPARTURE to occur at time t+s, where s is a number generated from the SERVICE-TIME distribution.

Random-number Generators

The simulation needs to generate random variables of various kinds, depending on the system model. This is accomplished by one or more pseudorandom number generators. The use of pseudo-random numbers as opposed to true random numbers is a benefit should a simulation need to be rerun with exactly the same behavior.


One of the problems with the random number distributions used in discrete-event simulation is that the steady-state distributions of event times may not be known in advance. As a result, the initial set of events placed into the pending event set will not have arrival times representative of the steady-state distribution. This problem is typically solved by bootstrapping the simulation model: only a limited effort is made to assign realistic times to the initial set of pending events; these events, however, schedule additional events, and with time the distribution of event times approaches its steady state. In gathering statistics from the running model, it is important either to disregard events that occur before the steady state is reached or to run the simulation for long enough that the bootstrapping behavior is overwhelmed by steady-state behavior. (This use of the term bootstrapping can be contrasted with its use in both statistics and computing.)
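A common and simple way to handle this in practice is to discard an initial warm-up portion of the collected observations before averaging, as in the hypothetical Python sketch below (the 20% cutoff and the sample series are arbitrary).

def steady_state_mean(samples, warm_up_fraction=0.2):
    # Discard observations gathered before the model has bootstrapped into
    # steady state, then average the remainder.
    cutoff = int(len(samples) * warm_up_fraction)
    steady = samples[cutoff:]
    return sum(steady) / len(steady)

queue_lengths = [0, 1, 3, 6, 8, 9, 9, 10, 9, 10, 9, 10, 10, 9]   # hypothetical time series
print(steady_state_mean(queue_lengths))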

Statistics

The simulation typically keeps track of the system’s statistics, which quantify the aspects of interest. In the bank example, it is of interest to track the mean waiting times. In a simulation model, performance metrics are not analytically derived from probability distributions, but rather estimated as averages over replications, that is, different runs of the model. Confidence intervals are usually constructed to help assess the quality of the output.
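As a sketch, a normal-approximation confidence interval for a metric averaged over independent replications can be computed as below; the waiting-time values are hypothetical, and for a small number of replications a Student-t quantile would be more appropriate than the fixed z value used here.

import statistics

def mean_confidence_interval(replications, z=1.96):
    # Approximate 95% confidence interval for a performance metric,
    # computed over independent replications (runs) of the model.
    n = len(replications)
    mean = statistics.mean(replications)
    half_width = z * statistics.stdev(replications) / n ** 0.5
    return mean - half_width, mean + half_width

mean_waits = [4.2, 3.9, 4.8, 4.1, 4.5, 3.7, 4.4]   # mean wait per replication (hypothetical)
print(mean_confidence_interval(mean_waits))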

Ending Condition

Because events are bootstrapped, theoretically a discrete-event simulation could run forever. So the simulation designer must decide when the simulation will end. Typical choices are “at time t” or “after processing n number of events” or, more generally, “when statistical measure X reaches the value x”.

Simulation Engine Logic

The main loop of a discrete-event simulation is something like this (a small runnable sketch in Python follows the outline):

Start

• Initialize Ending Condition to FALSE.

• Initialize system state variables.

• Initialize Clock (usually starts at simulation time zero).

• Schedule an initial event (i.e., put some initial event into the Events List).

“Do Loop” or “While Loop”

While (Ending Condition is FALSE) then do the following:

• Set clock to next event time.

• Do next event and remove from the Events List.


• Update statistics.

End

• Generate statistical report.
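Putting the loop together with the bank-teller example described earlier, a minimal single-teller discrete-event simulation might look like the Python sketch below; the arrival and service parameters are arbitrary, and the statistics gathered are deliberately kept simple.

import heapq
import random

def bank_simulation(t_end=480.0, mean_interarrival=2.0, mean_service=1.5, seed=1):
    # Initialize clock, state and the pending event set, then repeatedly jump
    # the clock to the next event and execute it, as in the loop above.
    rng = random.Random(seed)
    clock, queue_length, teller_busy, served = 0.0, 0, False, 0
    events = [(rng.expovariate(1.0 / mean_interarrival), "arrival")]
    while events and clock < t_end:
        clock, kind = heapq.heappop(events)                  # set clock to next event time
        if kind == "arrival":
            heapq.heappush(events, (clock + rng.expovariate(1.0 / mean_interarrival), "arrival"))
            if teller_busy:
                queue_length += 1                            # join the queue
            else:
                teller_busy = True                           # begin service immediately
                heapq.heappush(events, (clock + rng.expovariate(1.0 / mean_service), "departure"))
        else:                                                # departure
            served += 1
            if queue_length > 0:
                queue_length -= 1                            # next customer begins service
                heapq.heappush(events, (clock + rng.expovariate(1.0 / mean_service), "departure"))
            else:
                teller_busy = False
    return served

print(bank_simulation())   # customers served in an eight-hour day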

Common Uses

Diagnosing Process Issues

Simulation approaches are particularly well equipped to help users diagnose issues in complex environments. The Goal (Theory of Constraints) illustrates the importance of understanding bottlenecks in a system. Only process ‘improvements’ at the bottlenecks will actually improve the overall system. In many organizations bottlenecks become hidden by excess inventory, overproduction, variability in processes and variability in routing or sequencing. By accurately documenting the system inside a simulation model it is possible to gain a bird’s eye view of the entire system.

A working model of a system allows management to understand performance drivers. A simulation can be built to include any number of performance indicators such as worker utilization, on-time delivery rate, scrap rate, cash cycles, and so on.

Hospital Applications

An operating theater is generally shared between several surgical disciplines. Through better understanding the nature of these procedures it may be possible to increase patient throughput. For example: if a heart surgery takes on average four hours, changing an operating room schedule from eight available hours to nine will not increase patient throughput. On the other hand, if a hernia procedure takes on average twenty minutes, providing an extra hour may also not yield any increased throughput if the capacity and average time spent in the recovery room are not considered.

Lab Test Performance Improvement Ideas

Many systems improvement ideas are built on sound principles and proven methodologies (Lean, Six Sigma, TQM, etc.), yet fail to improve the overall system. A simulation model allows the user to understand and test a performance improvement idea in the context of the overall system.

Evaluating Capital Investment Decisions

Simulation modeling is commonly used to model potential investments. By modeling investments, decision-makers can make informed decisions and evaluate potential alternatives.

Network Simulators

Discrete event simulation is used in network simulators to simulate new protocols for different network traffic scenarios before deployment.


List of Computer Simulation Software

Free or Open-source

• Advanced Simulation Library - open-source hardware accelerated multiphysics simulation software.

• ASCEND - open-source equation-based modelling environment.

• DWSIM - an open-source CAPE-OPEN compliant chemical process simulator.

• Elmer - an open-source multiphysical simulation software for Windows/Mac/Linux.

• Facsimile - a free, open-source discrete-event simulation library.

• Freemat - a free environment for rapid engineering, scientific prototyping and data processing using the same language as Matlab and GNU Octave.

• Galatea - a multi-agent, multi-programming language, simulation platform.

• GNU Octave - open-source mathematical modeling and simulation software very similar to MATLAB, using largely the same language as Matlab and Freemat.

• Minsky (economic simulator) - an open-source visual computer program for dynamic simulation of economic models.

• Modelica - an open standard for modeling software.

• Mobility Testbed - an open-source multi-agent simulation testbed for transport coordination algorithms.

• NetLogo - an open-source multi-agent simulation software.

• ns-3 - an open-source network simulator.

• OpenFOAM - open-source software used for computational fluid dynamics (or CFD)

• OpenEaagles - multi-platform simulation framework to prototype and build simulation applications.

• Open Source Physics - an open-source Java software project for teaching and studying physics.

• OpenSim - an open-source software system for biomechanical modeling.

• Physics Abstraction Layer - an open-source physics simulation package.

• Project Chrono - an open-source multi-physics simulation framework.

• SageMath - a system for algebra and geometry experimentation via Python.

• Scilab - free open-source software for numerical computation and simulation similar to MATLAB/Simulink.

• SimPy - an open-source discrete-event simulation package based on Python.

• SOFA - an open-source framework for multi-physics simulation with an emphasis on medical simulation.

• Stanford University Unstructured - an open-source framework for computational fluid dynamics simulation and optimal shape design.

• Step - an open-source two-dimensional physics simulation engine (KDE).

• Tortuga - an open-source software framework for discrete-event simulation in Java.

Proprietary

• 20-sim - bond graph-based multi-domain simulation software.

• Actran - finite element-based simulation software to analyze the acoustic behavior of mechanical systems and parts.

• ACSL and acslX - an advanced continuous simulation language.

• AMESim - a platform to analyze multi-domain, intelligent systems and predict and optimize multi-disciplinary performance. Developed by Siemens PLM Software.

• AnyLogic - a multi-method simulation modeling tool for business and science. Developed by The AnyLogic Company.

• APMonitor - a tool for dynamic simulation, validation, and optimization of multi-domain systems with interfaces to Python and MATLAB.

• AutoCAST - metal casting design and simulation software developed by Advanced Reasoning Technologies.

• Automation Studio - a fluid power, electrical and control systems design and simulation software developed by Famic Technologies Inc.

• Chemical WorkBench - a chemical kinetics simulation software tool developed by Kintech Lab.

• CircuitLogix - an electronics simulation software developed by Logic Design Inc.

• COMSOL Multiphysics (formerly FEMLAB) - a finite element analysis, solver and simulation software package for various physics and engineering applications, especially coupled phenomena, or multi-physics.

• DX Studio - a suite of tools for simulation and visualization.

• Dymola - modeling and simulation software based on the Modelica language.

• Ecolego - a simulation software tool for creating dynamic models and performing deterministic and probabilistic simulations.

• EcosimPro - continuous and discrete modelling and simulation software.

• Enterprise Architect - a tool for simulation of UML behavioral modeling, coupled with Win32 interaction.

• Enterprise Dynamics - a simulation software platform developed by INCONTROL Simulation Solutions.

• ExtendSim - simulation software for discrete event, continuous, discrete rate and agent-based simulation.

• Flexsim - discrete event simulation software.

• Fluent, Inc. - simulation software for fluid flow, turbulence, heat transfer, and reactions for industrial applications.

• GoldSim - simulation software for system dynamics and discrete event simulation, embedded in a Monte Carlo framework.

• HyperWorks - multi-discipline simulation software

• Isaac dynamics - dynamic process simulation software for conventional and renewable power plants.

• Khimera - a chemical kinetics simulation software tool developed by Kintech Lab.

• Lanner WITNESS - a discrete event simulation platform for modelling processes and experimentation.

• Lanner L-SIM Server - Java-based simulation engine for simulating BPMN2.0 based process models.

• Maple - a general-purpose computer algebra system developed and sold commercially by Waterloo Maple Inc.

• MapleSim - a multi-domain modeling and simulation tool developed by Waterloo Maple Inc.

• MATLAB - a programming, modeling and simulation tool developed by MathWorks.

• Mathematica - a computational software program based on symbolic mathematics, developed by Wolfram Research.

• ModelCenter - a framework for integration of third-party modeling and simulation tools/scripts, workflow automation, and multidisciplinary design analysis and optimization from Phoenix Integration.

• NEi Nastran - software for engineering simulation of stress, dynamics, and heat transfer in structures.


• NetSim - network simulation software for defense applications, network design validation and network research and development.

• NI Multisim - an electronic schematic capture and simulation program.

• Plant Simulation - plant, line and process simulation and optimization software, developed by Siemens PLM Software.

• PLECS - a tool for system-level simulations of electrical circuits. Developed by Plexim.

• PRO/II - software for steady state chemical process simulation and extensively used by oil and gas refineries.

• Promodel - a discrete event simulation software.

• Project Team Builder - a project management simulator used for training and education.

• PSF Lab - calculates the point spread function of an optical microscope under various imaging conditions based on a rigorous vectorial model.

• RoboLogix - robotics simulation software developed by Logic Design Inc.

• Ship Simulator - a vehicle simulation computer game by VSTEP which simulates maneuvering various ships in different environments.

• Simcad Pro - dynamic discrete and continuous simulation software.

• SimEvents - a MathWorks tool which adds discrete event simulation to the MATLAB/Simulink environment.

• Simio - an object-oriented discrete event and agent based simulation software developed by Simio LLC.

• SimScale - a web-based simulation platform, with CFD, FEA, and thermodynamics capabilities.

• SIMUL8 - software for discrete event or process based simulation.

• Simulations Plus - modeling and simulation software for pharmaceutical research

• SimulationX - modeling and simulation software based on the Modelica language.

• Simulink - a tool for block diagrams, electrical and mechanical systems and machines, from MathWorks.

• TRNSYS - software for dynamic simulation of renewable energy systems, HVAC systems, building energy use and both passive and active solar systems.

• Vensim - system dynamics and continuous simulation software for business and public policy applications.

• VisSim - system simulation and optional C-code generation of electrical, process, control, bio-medical, mechanical and UML state chart systems.

• Vortex (software) - a high-fidelity, realtime physics engine that simulates rigid body dynamics, collision detection, contact determination, and dynamic reactions.

• Wolfram SystemModeler – modeling and simulation software based on the Modelica language.

• Working Model – a dynamic simulator with connections to SolidWorks.

• VisualSim Architect – an electronic system-level software for modeling and simulation of electronic systems, embedded software and semiconductors

Minsky (Economic Simulator)

Minsky is an open source visual computer program for simulation of economic models, using a system dynamics approach to model the kind of nonlinear complex systems found in national economies. Its name is a tribute to the American economist Hyman Minsky (1919 – 1996), who pioneered the underlying economic model. The originator of the Minsky simulator project is Dr. Steve Keen.

Description

With Minsky, economic models are built using a causal loop diagram and a double entry bookkeeping system across whole sectors of the economy known as a Godley table.

Development

Minsky has been developed for the Australian economist Steve Keen by lead programmer Russell K. Standish, with Nathan Moses and Kevin Pereira. The development of Minsky was funded first by an academic research grant of $128,000 from INET, the Institute for New Economic Thinking; 1,000 hours of programming work went into it. Then, in February 2013, Keen launched a crowdfunding project for further development of the program. The Kickstarter campaign ran from February 9, 2013 to March 17, 2013; it exceeded its goal of $50,000 and, at the end, $78,025 had been contributed to the project.

The objective of the project is to develop a standalone dynamic monetary macroeconomic modelling tool that would be more suited to financial flows than existing system dynamics programs such as Simulink, Vensim, Vissim and Stella. Keen envisages Minsky being used for educational and research purposes. Keen himself uses Minsky for his own work based on the Financial Instability Hypothesis of Hyman Minsky. Minsky is released as open source and was SourceForge’s open source project of the month in January 2014.


ASCEND

ASCEND is an open source mathematical modelling system developed at Carnegie Mellon University since late 1978. ASCEND is an acronym which stands for Advanced System for Computations in Engineering Design. Its main uses have been in the field of chemical process modelling although its capabilities are general. It was a pioneering piece of software in the chemical process modelling field, with its novel modelling language conventions and powerful solver, although it has never been commercialized and remains an open source software project.

ASCEND includes nonlinear algebraic solvers, differential/algebraic equation solvers, nonlinear optimization and modelling of multi-region ‘conditional models’. Its matrix operations are supported by an efficient sparse matrix solver called mtx.

ASCEND differs from earlier modelling systems because it separates the solving strategy from model building. So domain experts (people writing the models) and computational engineers (people writing the solver code) can work separately in developing ASCEND. Together with a number of other early modelling tools, its architecture helped to inspire newer languages such as Modelica. It was recognised for its flexible use of variables and parameters, which it always treats as solvable, if desired.

The software remains an active open-source software project, and has been part of the Google Summer of Code programme in 2009, 2010, 2011, 2012 and 2013 (under the Python Software Foundation), and has been accepted for the 2015 programme as well.

NetLogo

NetLogo is an agent-based programming language and integrated modeling environment.

About

NetLogo was designed, in the spirit of the Logo programming language, to be “low threshold and no ceiling”. It teaches programming concepts using agents in the form of turtles, patches, links and the observer. NetLogo was designed with multiple audiences in mind, in particular: teaching children in the education community, and domain experts without a programming background modeling related phenomena. Many scientific articles have been published using NetLogo.

The NetLogo environment enables exploration of emergent phenomena. It comes with an extensive models library including models in a variety of domains, such as economics, biology, physics, chemistry, psychology, and system dynamics. NetLogo allows exploration by modifying switches, sliders, choosers, inputs, and other interface elements. Beyond exploration, NetLogo allows authoring of new models and modification of existing models.

NetLogo is freely available from the NetLogo website. It is in use in a wide variety of educational contexts from elementary school to graduate school. Many teachers make use of NetLogo in their curricula.

NetLogo was designed and authored by Uri Wilensky, director of Northwestern University’s Center for Connected Learning and Computer-Based Modeling.


Online Courses

Several massive open online courses are currently being offered that use NetLogo for assignments and/or demonstrations:

• Introduction to Complexity (Melanie Mitchell, Santa Fe Institute)

• Social Network Analysis (Lada Adamic, University of Michigan – this course is not yet available)

• Model Thinking (Scott E. Page, University of Michigan)

Technical Foundation

NetLogo is free and open source software, under a GPL license. Commercial licenses are also available. It is written in Scala and Java and runs on the Java Virtual Machine. At its core is a hybrid interpreter/compiler that partially compiles user code to JVM bytecode.

NetLogo Web is a version that runs on JavaScript, instead of the JVM, so models may be run in a web browser. It does not have all features of the desktop version.

Examples

A simple multiagent model in NetLogo is the Wolf-Sheep Predation model, which models the population growth of a predator/prey system over time. It has the following characteristics:


• There are two breeds of turtles, called sheep and wolves.

• Sheep and wolves move randomly and have limited energy.

• Wolves and sheep lose energy by moving. If a wolf or sheep has zero energy, it dies.

• Sheep gain energy by eating grass.

• Wolves gain energy by eating sheep.

• Both wolves and sheep can reproduce, sharing energy with their offspring.

HubNet

HubNet is a technology that uses NetLogo to run participatory simulations in the classroom. In a participatory simulation, a whole group of users takes part in enacting the behavior of a system. Using an individual device, such as a networked computer or Texas Instruments graphing calculator, each user acts as a separate, independent agent. One example of a HubNet activity is “Tragedy of the Commons”, which models the economic problem called tragedy of the commons.

Scilab

Scilab is an open source, cross-platform numerical computational package and a high-level, numerically oriented programming language. It can be used for signal processing, statistical analysis, image enhancement, fluid dynamics simulations, numerical optimization, and modeling and simulation of explicit and implicit dynamical systems and (if the corresponding toolbox is installed) symbolic manipulations.

Scilab is one of the two major open-source alternatives to MATLAB, the other one being GNU Octave. Scilab is similar enough to MATLAB that some book authors (who use it) argue that it is easy to transfer skills between the two systems. Scilab however puts less emphasis on (bidirectional) syntactic compatibility with MATLAB than Octave does.

Overview

Scilab is a high-level, numerically oriented programming language. The language provides an interpreted programming environment, with matrices as the main data type. By using matrix-based computation, dynamic typing, and automatic memory management, many numerical problems may be expressed in a reduced number of code lines, as compared to similar solutions using traditional languages, such as Fortran, C, or C++. This allows users to rapidly construct models for a range of mathematical problems. While the language provides simple matrix operations such as multiplication, the Scilab package also provides a library of high-level operations such as correlation and complex multidimensional arithmetic. The software can be used for signal processing, statistical analysis, image enhancement, fluid dynamics simulations, and numerical optimization. Scilab also includes a free package called Xcos (based on Scicos) for modeling and simulation of explicit and implicit dynamical systems, including both continuous and discrete sub-systems. Xcos is the open source equivalent to Simulink from the MathWorks.


As the syntax of Scilab is similar to MATLAB, Scilab includes a translator for assisting the conversion of code from MATLAB to Scilab. Scilab is available free of cost under an open source license. Due to the open source nature of the software, some user contributions have been integrated into the main program.

License

Scilab family 5 is distributed under the GPL-compatible CeCILL license. Prior to version 5, Scilab was semi-free software according to the nomenclature of the Free Software Foundation. The reason for this is that earlier versions’ licenses prohibited commercial distribution of modified versions of Scilab.

Syntax

Scilab syntax is largely based on the MATLAB language. The simplest way to execute Scilab code is to type it in at the prompt, --> , in the graphical command window. In this way, Scilab can be used as an interactive mathematical shell. Hello World! in Scilab:

disp("Hello World!")

Plotting a 3D surface function:

// A simple plot of z = f(x,y)
t = [0:0.3:2*%pi]';
z = sin(t)*cos(t');
plot3d(t,t,z)

LaTeX Engine

Scilab renders formulas in mathematical notation using its own Java-based rendering engine, JLaTeXMath, a fork of the JMathTeX project.

Toolboxes

Scilab has many contributed toolboxes for different tasks:

• Scilab Image Processing Toolbox (SIP) and its variants (such as SIVP)

• Scilab Wavelet Toolbox

• Scilab Java and .NET Module

• Scilab Remote Access Module

• Scilab MySQL

• Equalis Communication Systems Module


• Equalis Signal Processing Module

• SoftCruncher Performance Accelerator

Many more toolboxes are available on ATOMS Portal or the Scilab forge.

History

Scilab was created in 1990 by researchers from INRIA and École nationale des ponts et chaussées (ENPC). It was initially named Ψlab (Psilab). The Scilab Consortium was formed in May 2003 to broaden contributions and promote Scilab as worldwide reference software in academia and industry. In July 2008, in order to improve the technology transfer, the Scilab Consortium joined the Digiteo Foundation.

Scilab 5.1, the first release compiled for Mac, was available in early 2009, and supported Mac OS X 10.5, a.k.a. Leopard. Thus, OSX 10.4, Tiger, was never supported except by porting from sources. Linux and Windows builds had been released since the beginning, with Solaris support dropped with version 3.1.1, and HP-UX dropped with version 4.1.2 after spotty support.

In June 2010, the Consortium announced the creation of Scilab Enterprises. Scilab Enterprises develops and markets, directly or through an international network of affiliated services providers, a comprehensive set of services for Scilab users. Scilab Enterprises also develops and maintains the Scilab software. The ultimate goal of Scilab Enterprises is to help make the use of Scilab more effective and easy.

Since July 2012, Scilab is developed and published by Scilab Enterprises.

Tortuga (Software)

Tortuga is a software framework for discrete event simulation in Java. A Tortuga simulation can be written either as interacting processes or as scheduled events. A Tortuga simulation can have thousands of entities, and can be part of a larger Java system.

Run-time and Development Environment

Tortuga simulations run on Windows XP and Windows Vista as well as on Linux, macOS, BSD and other platforms. They can also be used in an applet environment, although this typically requires a signed applet. As part of its support for simulation, Tortuga employs tools from aspect-oriented programming, or AOP. You need not be familiar with AOP to use Tortuga: your simulation classes are written in standard Java. However, the use of AOP in Tortuga requires more elaborate compilation than mere javac. This has been wrapped up in an Ant task included in tortuga.jar, which is why Tortuga-based simulations are assumed to be built with Ant.

Tortuga

Tortuga utilizes a programming paradigm that greatly reduces the burden on the simulation developer. Tortuga treats each simulation entity as a separate thread and allows the user to specify


a run method. This allows the developer to focus on simulation specifics without littering event handler code all over the place. Unfortunately, this means that a Tortuga simulation is inherently limited by the number of threads the JVM is able to support. This limit becomes an upper bound on the number of actors, and with the 1.5 Sun-based JRE the limit was about 6,000.

Author and Maintainer

Tortuga was developed by Dr. Fred Kuhl and Dr. Richard Weatherly of The MITRE Corporation in 2004-2006 and they continue to maintain it.

Advanced Simulation Library

Advanced Simulation Library (ASL) is a free and open source hardware-accelerated multiphysics simulation platform. It enables users to write customized numerical solvers in C++ and deploy them on a variety of massively parallel architectures, ranging from inexpensive FPGAs, DSPs and GPUs up to heterogeneous clusters and supercomputers. Its internal computational engine is written in OpenCL and utilizes matrix-free solution techniques. ASL implements a variety of advanced numerical methods, including the level set method, lattice Boltzmann methods and the immersed boundary method. The mesh-free, immersed boundary approach allows users to move from CAD directly to simulation, significantly reducing pre-processing effort and the number of potential errors. ASL can be used to model various coupled physical and chemical phenomena, especially in the field of computational fluid dynamics. It is distributed under the free GNU Affero General Public License with an optional commercial license.

Illustrated application examples include computer-assisted cryosurgery, multicomponent flow, simulation of a microfluidic device for separating mixtures of proteins, a coating procedure employing the physical vapor deposition (PVD) method, image-guided neurosurgery with brain deformation simulation, and the aerodynamics of a locomotive in a tunnel.

History

Advanced Simulation Library is being developed by Avtech Scientific, an Israeli company. Its source code was released to the community on 14 May 2015, and community members packaged it for the scientific sections of all major Linux distributions shortly thereafter. Subsequently, the Khronos Group acknowledged the significance of ASL and listed it on its website among OpenCL-based resources.

Application Areas

• Computational fluid dynamics

• Computer-assisted surgery

• Virtual sensing

• Industrial process data validation and reconciliation

• Multidisciplinary design optimization

• Design space exploration

• Computer-aided engineering

• Crystallography

• Microfluidics


Advantages and Disadvantages

Advantages

• Easy to learn C++ API (no OpenCL knowledge required)

• Mesh-free, immersed boundary approach allows users to move from CAD directly to computations, eliminating pre-processing effort and reducing the number of potential errors

• High performance, memory efficiency

• Dynamic compilation enables an additional layer of optimization at run-time (e.g., for the specific parameter set the application was provided with)

• Automatic hardware acceleration and parallelization of applications

• Deployment of same program on a variety of parallel architectures - GPU, APU, FPGA, DSP, multicore CPUs

• Ability to deal with complex boundaries

• Ability to incorporate microscopic interactions

• Availability of the source code

Disadvantages

• Absence of detailed documentation (besides the Developer Guide generated from the source code comments)

• Not all OpenCL drivers are mature enough for the library

Features

ASL provides a range of features to solve a number of problems - from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid mechanics and elasticity.

• Interfacing: VTK/ParaView, MATLAB (export).

o import file formats: .stl .vtp .vtk .vti .mnc .dcm

o export file formats: .vti .mat

• Geometry:

o flexible and complex geometry using a simple rectangular grid

o mesh-free, immersed boundary approach

o generation and manipulation of geometric primitives

• Implemented phenomena:

o Transport processes: multicomponent transport processes; compressible and incompressible fluid flow

o Chemical reactions: electrode reactions

o Elasticity: homogeneous isotropic elasticity; homogeneous isotropic poroelasticity

o Interface tracking: evolution of an interface; evolution of an interface with crystallographic kinetics

Uses

• ACTIVE - Active Constraints Technologies for Ill-defined or Volatile Environments (European FP7 Project)


Modeling: An Overview

Mathematical modeling is the description of a system using mathematical concepts and language. The process of developing such a model is known as mathematical modeling. The major elements of a mathematical model are governing equations, constitutive equations, constraints, kinematic equations, etc. These components are discussed in this chapter.

Mathematical Model

A mathematical model is a description of a system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modeling. Mathematical models are used in the natural sciences (such as physics, biology, earth science, meteorology) and engineering disciplines (such as computer science, artificial intelligence), as well as in the social sciences (such as economics, psychology, sociology, political science). Physicists, engineers, statisticians, operations research analysts, and economists use mathematical models most extensively. A model may help to explain a system and to study the effects of different components, and to make predictions about behaviour.

Elements of A Mathematical Model

Mathematical models can take many forms, including dynamical systems, statistical models, differential equations, or game theoretic models. These and other types of models can overlap, with a given model involving a variety of abstract structures. In general, mathematical models may include logical models. In many cases, the quality of a scientific field depends on how well the mathematical models developed on the theoretical side agree with results of repeatable experiments. Lack of agreement between theoretical mathematical models and experimental measurements often leads to important advances as better theories are developed.

The traditional mathematical model contains four major elements. These are:

1. Governing equations

2. Constitutive equations

3. Constraints

4. Kinematic equations

Classifications

Mathematical models are usually composed of relationships and variables. Relationships can be described by operators, such as algebraic operators, functions, differential operators, etc. Variables are abstractions of system parameters of interest that can be quantified. Several classification criteria can be used for mathematical models according to their structure:

• Linear vs. nonlinear: If all the operators in a mathematical model exhibit linearity, the resulting mathematical model is defined as linear. A model is considered to be nonlinear otherwise. The definition of linearity and nonlinearity is dependent on context, and linear models may have nonlinear expressions in them. For example, in a statistical linear model, it is assumed that a relationship is linear in the parameters, but it may be nonlinear in the predictor variables. Similarly, a differential equation is said to be linear if it can be written with linear differential operators, but it can still have nonlinear expressions in it. In a mathematical programming model, if the objective functions and constraints are represented entirely by linear equations, then the model is regarded as a linear model. If one or more of the objective functions or constraints are represented with a nonlinear equation, then the model is known as a nonlinear model. Nonlinearity, even in fairly simple systems, is often associated with phenomena such as chaos and irreversibility. Although there are exceptions, nonlinear systems and models tend to be more difficult to study than linear ones. A common approach to nonlinear problems is linearization, but this can be problematic if one is trying to study aspects such as irreversibility, which are strongly tied to nonlinearity.

• Static vs. dynamic: A dynamic model accounts for time-dependent changes in the state of the system, while a static (or steady-state) model calculates the system in equilibrium, and thus is time-invariant. Dynamic models typically are represented by differential equations.

• Explicit vs. implicit: If all of the input parameters of the overall model are known, and the output parameters can be calculated by a finite series of computations, the model is said to be explicit. But sometimes it is the output parameters which are known, and the corresponding inputs must be solved for by an iterative procedure, such as Newton’s method or Broyden’s method (a small Newton-iteration sketch follows this list). For example, a jet engine’s physical properties such as turbine and nozzle throat areas can be explicitly calculated given a design thermodynamic cycle (air and fuel flow rates, pressures, and temperatures) at a specific flight condition and power setting, but the engine’s operating cycles at other flight conditions and power settings cannot be explicitly calculated from the constant physical properties.

• Discrete vs. continuous: A discrete model treats objects as discrete, such as the particles in a molecular model or the states in a statistical model; while a continuous model represents the objects in a continuous manner, such as the velocity field of fluid in pipe flows, temperatures and stresses in a solid, and the electric field that applies continuously over the entire model due to a point charge.

• Deterministic vs. probabilistic (stochastic): A deterministic model is one in which every set of variable states is uniquely determined by parameters in the model and by sets of previous states of these variables; therefore, a deterministic model always performs the same way for a given set of initial conditions. Conversely, in a stochastic model—usually called a “statistical model”—randomness is present, and variable states are not described by unique values, but rather by probability distributions.

______WORLD TECHNOLOGIES ______64 Numerical Analysis, Modelling and Simulation

• Deductive, inductive, or floating: A deductive model is a logical structure based on a theory. An inductive model arises from empirical findings and generalization from them. The floating model rests on neither theory nor observation, but is merely the invocation of expected structure. Application of mathematics in social sciences outside of economics has been criticized for unfounded models. Application of catastrophe theory in science has been characterized as a floating model.
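To illustrate the implicit case from the explicit-versus-implicit distinction above, the Python sketch below recovers the input of a known explicit model by Newton iteration; the model g and all numerical values are hypothetical.

def newton_solve(f, df, x0, tol=1e-10, max_iter=50):
    # Generic Newton iteration: find x such that f(x) = 0.
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Implicit use of an explicit model: the output y_target of the map g is known,
# and the corresponding input x must be found iteratively.
g  = lambda x: x ** 3 + 2.0 * x        # hypothetical explicit input-to-output map
dg = lambda x: 3.0 * x ** 2 + 2.0      # its derivative
y_target = 10.0
x = newton_solve(lambda v: g(v) - y_target, dg, x0=1.0)
print(x, g(x))   # g(x) is close to y_target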

Significance in The Natural Sciences

Mathematical models are of great importance in the natural sciences, particularly in physics. Physical theories are almost invariably expressed using mathematical models.

Throughout history, more and more accurate mathematical models have been developed. Newton’s laws accurately describe many everyday phenomena, but at certain limits relativity theory and quantum mechanics must be used; even these do not apply to all situations and need further refinement. It is possible to obtain the less accurate models in appropriate limits, for example relativistic mechanics reduces to Newtonian mechanics at speeds much less than the speed of light. Quantum mechanics reduces to classical physics when the quantum numbers are high. For example, the de Broglie wavelength of a tennis ball is insignificantly small, so classical physics is a good approximation to use in this case.

It is common to use idealized models in physics to simplify things. Massless ropes, point particles, ideal gases and the particle in a box are among the many simplified models used in physics. The laws of physics are represented with simple equations such as Newton’s laws, Maxwell’s equations and the Schrödinger equation. These laws serve as a basis for making mathematical models of real situations. Many real situations are very complex and are thus modeled approximately on a computer: a model that is computationally feasible to compute is made from the basic laws or from approximate models made from the basic laws. For example, molecules can be modeled by molecular orbital models that are approximate solutions to the Schrödinger equation. In engineering, physics models are often made by mathematical methods such as finite element analysis.

Different mathematical models use different geometries that are not necessarily accurate descriptions of the geometry of the universe. Euclidean geometry is much used in classical physics, while special relativity and general relativity are examples of theories that use geometries which are not Euclidean.

Some Applications

Since prehistoric times, simple models such as maps and diagrams have been used.

Often when engineers analyze a system to be controlled or optimized, they use a mathematical model. In analysis, engineers can build a descriptive model of the system as a hypothesis of how the system could work, or try to estimate how an unforeseeable event could affect the system. Similarly, in control of a system, engineers can try out different control approaches in simulations.

A mathematical model usually describes a system by a set of variables and a set of equations that establish relationships between the variables. Variables may be of many types; real or integer numbers, boolean values or strings, for example. The variables represent some properties of the system, for example, measured system outputs often in the form of signals, timing data, counters, and event occurrence (yes/no). The actual model is the set of functions that describe the relations between the different variables.

Building Blocks

In business and engineering, mathematical models may be used to maximize a certain output. The system under consideration will require certain inputs. The system relating inputs to outputs depends on other variables too: decision variables, state variables, exogenous variables, and random variables.

Decision variables are sometimes known as independent variables. Exogenous variables are sometimes known as parameters or constants. The variables are not independent of each other, as the state variables are dependent on the decision, input, random, and exogenous variables. Furthermore, the output variables are dependent on the state of the system (represented by the state variables).

Objectives and constraints of the system and its users can be represented as functions of the output variables or state variables. The objective functions will depend on the perspective of the model’s user. Depending on the context, an objective function is also known as an index of performance, as it is some measure of interest to the user. Although there is no limit to the number of objective functions and constraints a model can have, using or optimizing the model becomes more involved (computationally) as the number increases.

For example, economists often apply linear algebra when using input-output models. Complicated mathematical models that have many variables may be consolidated by use of vectors, where one symbol represents several variables.

A Priori Information

To analyse something with a typical “black box approach”, only the behavior of the stimulus/response will be accounted for, to infer the (unknown) box. The usual representation of this black box system is a data flow diagram centered in the box.

Mathematical modeling problems are often classified into black box or white box models, according to how much a priori information on the system is available. A black-box model is a system of which there is no a priori information available. A white-box model (also called glass box or clear box) is a system where all necessary information is available. Practically all systems are somewhere between the black-box and white-box models, so this concept is useful only as an intuitive guide for deciding which approach to take.

Usually it is preferable to use as much a priori information as possible to make the model more accurate. Therefore, the white-box models are usually considered easier, because if you have used the information correctly, then the model will behave correctly. Often the a priori information comes in the form of knowing the type of functions relating different variables. For example, if we make a model of how a medicine works in a human system, we know that usually the amount of medicine in the blood is an exponentially decaying function. But we are still left with several unknown parameters: how rapidly does the medicine amount decay, and what is the initial amount of medicine in the blood? This example is therefore not a completely white-box model. These parameters have to be estimated through some means before one can use the model.
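As an illustration of this kind of parameter estimation, the following Python sketch fits the assumed exponential form c(t) = c0·exp(−k·t) to measured data. The measurement values, the function name and the starting guess are illustrative assumptions, not taken from the text.

```python
# Minimal sketch: estimating the two unknown parameters of the partly
# white-box drug model c(t) = c0 * exp(-k * t) from hypothetical data.
import numpy as np
from scipy.optimize import curve_fit

def drug_concentration(t, c0, k):
    """Exponentially decaying amount of medicine in the blood."""
    return c0 * np.exp(-k * t)

# Hypothetical measurements: time in hours, concentration in mg/L.
t_data = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 12.0])
c_data = np.array([9.2, 8.4, 7.1, 5.0, 2.6, 1.3])

(c0_est, k_est), _ = curve_fit(drug_concentration, t_data, c_data, p0=(10.0, 0.1))
print(f"estimated initial amount c0 = {c0_est:.2f}, decay rate k = {k_est:.3f} per hour")
```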

In black-box models one tries to estimate both the functional form of relations between variables and the numerical parameters in those functions. Using a priori information we could end up, for example, with a set of functions that probably could describe the system adequately. If there is no a priori information we would try to use functions as general as possible to cover all different models. An often used approach for black-box models is neural networks, which usually do not make assumptions about incoming data. Alternatively, the NARMAX (Nonlinear AutoRegressive Moving Average model with eXogenous inputs) algorithms, which were developed as part of nonlinear system identification, can be used to select the model terms, determine the model structure, and estimate the unknown parameters in the presence of correlated and nonlinear noise. The advantage of NARMAX models compared to neural networks is that NARMAX produces models that can be written down and related to the underlying process, whereas neural networks produce an approximation that is opaque.

Subjective Information

Sometimes it is useful to incorporate subjective information into a mathematical model. This can be done based on intuition, experience, or expert opinion, or based on convenience of mathematical form. Bayesian statistics provides a theoretical framework for incorporating such subjectivity into a rigorous analysis: we specify a prior probability distribution (which can be subjective), and then update this distribution based on empirical data.

An example of when such an approach would be necessary is a situation in which an experimenter bends a coin slightly and tosses it once, recording whether it comes up heads, and is then given the task of predicting the probability that the next flip comes up heads. After bending the coin, the true probability that the coin will come up heads is unknown, so the experimenter would need to make a decision (perhaps by looking at the shape of the coin) about what prior distribution to use. Incorporation of such subjective information might be important to get an accurate estimate of the probability.
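A minimal numerical sketch of this kind of Bayesian update is given below. The specific prior parameters are an illustrative, subjective assumption (the text does not prescribe them); the conjugate Beta prior is just one convenient way to encode a belief about the probability of heads.

```python
# Minimal sketch of the bent-coin example: a subjective Beta prior on the
# probability of heads is updated with the single observed toss (Bayes' rule).
from scipy.stats import beta

a_prior, b_prior = 3.0, 2.0             # assumed prior: bending seems to favour heads
observed_heads, observed_tails = 1, 0   # the one recorded toss came up heads

# Beta prior + Bernoulli likelihood gives a Beta posterior (conjugacy).
a_post = a_prior + observed_heads
b_post = b_prior + observed_tails

print("prior mean P(heads)    :", a_prior / (a_prior + b_prior))
print("posterior mean P(heads):", a_post / (a_post + b_post))
print("95% credible interval  :", beta.interval(0.95, a_post, b_post))
```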

Complexity

In general, model complexity involves a trade-off between simplicity and accuracy of the model. Occam’s razor is a principle particularly relevant to modeling; the essential idea being that among models with roughly equal predictive power, the simplest one is the most desirable. While added complexity usually improves the realism of a model, it can make the model difficult to understand and analyze, and can also pose computational problems, including numerical instability. Thomas Kuhn argues that as science progresses, explanations tend to become more complex before a paradigm shift offers radical simplification.


(Figure: schematic representation of three types of mathematical models of complex systems, with the level of their mechanistic understanding.)

For example, when modeling the flight of an aircraft, we could embed each mechanical part of the aircraft into our model and would thus acquire an almost white-box model of the system. However, the computational cost of adding such a huge amount of detail would effectively inhibit the usage of such a model. Additionally, the uncertainty would increase due to an overly complex system, because each separate part induces some amount of variance into the model. It is therefore usually appropriate to make some approximations to reduce the model to a sensible size. Engineers often can accept some approximations in order to get a more robust and simple model. For example, Newton’s classical mechanics is an approximated model of the real world. Still, Newton’s model is quite sufficient for most ordinary-life situations, that is, as long as particle speeds are well below the speed of light, and we study macro-particles only.

Training

Any model which is not pure white-box contains some parameters that can be used to fit the model to the system it is intended to describe. If the modeling is done by a neural network, the optimization of parameters is called training. In more conventional modeling through explicitly given mathematical functions, parameters are determined by curve fitting.

Model Evaluation

A crucial part of the modeling process is the evaluation of whether or not a given mathematical model describes a system accurately. This question can be difficult to answer as it involves several different types of evaluation.

Fit to Empirical Data

Usually the easiest part of model evaluation is checking whether a model fits experimental measurements or other empirical data. In models with parameters, a common approach to test this fit is to split the data into two disjoint subsets: training data and verification data. The training data are used to estimate the model parameters. An accurate model will closely match the verification data even though these data were not used to set the model’s parameters. This practice is referred to as cross-validation in statistics.
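The sketch below illustrates the train/verification split described above on synthetic data: parameters of a simple linear model are estimated from the training subset only and the fit is then judged on the held-out verification subset. All data and model choices here are illustrative assumptions.

```python
# Minimal sketch of a train/verification split for model evaluation.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 60)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)   # noisy "measurements"

# Disjoint random subsets: 40 points for training, 20 for verification.
indices = rng.permutation(x.size)
train_idx, ver_idx = indices[:40], indices[40:]

slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)   # fit on training data
residual = y[ver_idx] - (slope * x[ver_idx] + intercept)           # error on unseen data
print("fitted slope and intercept:", slope, intercept)
print("RMS error on verification data:", np.sqrt(np.mean(residual**2)))
```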

Defining a metric to measure distances between observed and predicted data is a useful tool for assessing model fit. In statistics, decision theory, and some economic models, a loss function plays a similar role.


While it is rather straightforward to test the appropriateness of parameters, it can be more difficult to test the validity of the general mathematical form of a model. In general, more mathematical tools have been developed to test the fit of statistical models than models involving differential equations. Tools from non-parametric statistics can sometimes be used to evaluate how well the data fit a known distribution or to come up with a general model that makes only minimal assumptions about the model’s mathematical form.

Scope of The Model

Assessing the scope of a model, that is, determining what situations the model is applicable to, can be less straightforward. If the model was constructed based on a set of data, one must determine for which systems or situations the known data is a “typical” set of data.

The question of whether the model describes well the properties of the system between data points is called interpolation, and the same question for events or data points outside the observed data is called extrapolation.

As an example of the typical limitations of the scope of a model, in evaluating Newtonian classical mechanics, we can note that Newton made his measurements without advanced equipment, so he could not measure properties of particles travelling at speeds close to the speed of light. Likewise, he did not measure the movements of molecules and other small particles, but macro particles only. It is then not surprising that his model does not extrapolate well into these domains, even though his model is quite sufficient for ordinary life physics.

Philosophical Considerations

Many types of modeling implicitly involve claims about causality. This is usually (but not always) true of models involving differential equations. As the purpose of modeling is to increase our understanding of the world, the validity of a model rests not only on its fit to empirical observations, but also on its ability to extrapolate to situations or data beyond those originally described in the model. One can think of this as the differentiation between qualitative and quantitative predictions. One can also argue that a model is worthless unless it provides some insight which goes beyond what is already known from direct investigation of the phenomenon being studied.

An example of such criticism is the argument that the mathematical models of Optimal foraging theory do not offer insight that goes beyond the common-sense conclusions of evolution and other basic principles of ecology.

Examples

• One of the popular examples in computer science is the mathematical models of various machines, an example is the deterministic finite automaton which is defined as an abstract mathematical concept, but due to the deterministic nature of a DFA, it is implementable in hardware and software for solving various specific problems. For example, the following is a DFA M with a binary alphabet, which requires that the input contains an even number of 0s.


The state diagram for M

M = (Q, Σ, δ, q0, F), where
• Q = {S1, S2},
• Σ = {0, 1},
• q0 = S1,
• F = {S1}, and
• δ is defined by the following state transition table:

            0     1
    S1      S2    S1
    S2      S1    S2

The state S1 represents that there has been an even number of 0s in the input so far, while S2 signifies an odd number. A 1 in the input does not change the state of the automaton. When the input ends, the state will show whether the input contained an even number of 0s or not. If the input did contain an even number of 0s, M will finish in state S1, an accepting state, so the input string will be accepted. The language recognized by M is the regular language given by the regular expression 1*( 0 (1*) 0 (1*) )*, where “*” is the Kleene star, e.g., 1* denotes any non-negative number (possibly zero) of symbols “1”.
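Because the automaton M is fully specified above, it can be transcribed directly into a short program. The following Python sketch simply encodes the transition table and accepting state given in the text; the function and variable names are illustrative.

```python
# Direct transcription of the automaton M: transition table, start state,
# accepting states, and a small acceptance check.
TRANSITIONS = {
    ("S1", "0"): "S2", ("S1", "1"): "S1",
    ("S2", "0"): "S1", ("S2", "1"): "S2",
}
START, ACCEPTING = "S1", {"S1"}

def accepts(word: str) -> bool:
    """Return True if the string over {0, 1} contains an even number of 0s."""
    state = START
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING

assert accepts("")            # zero 0s (even) -> accepted
assert accepts("0110")        # two 0s -> accepted
assert not accepts("1011")    # one 0 -> rejected
```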

• Many everyday activities carried out without a thought are uses of mathematical models. A geographical map projection of a region of the earth onto a small, plane surface is a model which can be used for many purposes such as planning travel.

• Another simple activity is predicting the position of a vehicle from its initial position, direction and speed of travel, using the equation that distance traveled is the product of time and speed. This is known as dead reckoning when used more formally. Mathematical modeling in this way does not necessarily require formal mathematics; animals have been shown to use dead reckoning.

• Population growth. A simple (though approximate) model of population growth is the Malthusian growth model. A slightly more realistic and widely used population growth model is the logistic function, and its extensions.

• Individual-based cellular automata models of population growth


• Model of a particle in a potential field. In this model we consider a particle as being a point of mass which describes a trajectory in space which is modeled by a function giving its coordinates in space as a function of time. The potential field is given by a function V : R³ → R and the trajectory, that is a function r : R → R³, is the solution of the differential equation:

−m d²r(t)/dt² = ∂V[r(t)]/∂x x̂ + ∂V[r(t)]/∂y ŷ + ∂V[r(t)]/∂z ẑ,

which can also be written as

m d²r(t)/dt² = −∇V[r(t)].

Note this model assumes the particle is a point mass, which is certainly known to be false in many cases in which we use this model; for example, as a model of planetary motion.
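A minimal numerical sketch of this model is given below: the equation m d²r/dt² = −∇V[r(t)] is integrated with the velocity Verlet scheme. The quadratic potential V(r) = ½k|r|², the parameter values and the initial conditions are illustrative assumptions, not part of the original text.

```python
# Minimal sketch: numerically integrating m d^2r/dt^2 = -grad V[r(t)]
# with velocity Verlet, for an assumed potential V(r) = 0.5*k*|r|^2.
import numpy as np

m, k, dt, steps = 1.0, 4.0, 0.01, 1000

def grad_V(r):
    return k * r                       # gradient of 0.5*k*|r|^2

r = np.array([1.0, 0.0, 0.0])          # initial position
v = np.array([0.0, 2.0, 0.0])          # initial velocity
a = -grad_V(r) / m

for _ in range(steps):                 # velocity Verlet integration loop
    r = r + v * dt + 0.5 * a * dt**2
    a_new = -grad_V(r) / m
    v = v + 0.5 * (a + a_new) * dt
    a = a_new

print("position after", steps * dt, "time units:", r)
```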

• Model of rational behavior for a consumer. In this model we assume a consumer faces a choice of n commodities labeled 1, 2, ..., n, each with a market price p₁, p₂, ..., pₙ. The consumer is assumed to have a cardinal utility function U (cardinal in the sense that it assigns numerical values to utilities), depending on the amounts of commodities x₁, x₂, ..., xₙ consumed. The model further assumes that the consumer has a budget M which is used to purchase a vector x₁, x₂, ..., xₙ in such a way as to maximize U(x₁, x₂, ..., xₙ). The problem of rational behavior in this model then becomes an optimization problem, that is:

max U(x₁, x₂, ..., xₙ)

subject to

p₁x₁ + p₂x₂ + ... + pₙxₙ ≤ M,

xᵢ ≥ 0 for all i ∈ {1, 2, ..., n}.

This model has been used in general equilibrium theory, particularly to show existence and Pareto efficiency of economic equilibria. However, the fact that this particular formulation assigns numerical values to levels of satisfaction is the source of criticism (and even ridicule). However, it is not an essential ingredient of the theory and again this is an idealization.
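The consumer problem above can be solved numerically once a concrete utility function is chosen. The sketch below assumes a log (Cobb-Douglas style) utility, prices and budget purely for illustration; the text does not fix any particular functional form for U.

```python
# Minimal sketch of the consumer problem: maximize U(x) subject to p.x <= M
# and x >= 0, with an assumed log utility U(x) = sum_i a_i * log(x_i).
import numpy as np
from scipy.optimize import minimize

p = np.array([2.0, 3.0, 5.0])                 # market prices p_1..p_n (assumed)
M = 30.0                                      # budget (assumed)
a = np.array([0.5, 0.3, 0.2])                 # assumed preference weights

def neg_utility(x):
    return -np.sum(a * np.log(x + 1e-9))      # minimize -U to maximize U

budget = {"type": "ineq", "fun": lambda x: M - p @ x}    # M - p.x >= 0
bounds = [(0.0, None)] * 3                               # x_i >= 0

result = minimize(neg_utility, x0=np.full(3, 1.0),
                  method="SLSQP", bounds=bounds, constraints=[budget])
print("optimal bundle:", result.x, " total spend:", p @ result.x)
```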

• Neighbour-sensing model explains the mushroom formation from the initially chaotic fungal network.

• Computer science: models in Computer Networks, data models, surface model,...

• Mechanics: movement of rocket model,...

Modeling requires selecting and identifying relevant aspects of a situation in the real world.


Major Elements of Mathematical Model

Governing Equation

Mathematical models can take many forms, including dynamical systems, statistical models, differential equations, game theoretic models, recurrence relations, an algorithm for calculation of a sequence of related states (e.g. equilibrium states), and possibly even more forms. The governing equations of a mathematical model describe how the unknown variables (i.e. the dependent variables) will change. The change of variables with respect to time may be explicit (i.e. a governing equation includes a derivative with respect to time) or implicit (e.g. a governing equation has velocity or flux as an unknown variable, or an algorithm). The classic governing equations in continuum mechanics are:
• Balance of mass
• Balance of (linear) momentum
• Balance of angular momentum
• Balance of energy
• Balance of entropy

For isolated systems the first four equations are the familiar conservation equations in physics. A governing equation may also take the form of a flux equation, like the diffusion equation or the heat conduction equation. In these cases the flux itself is a variable describing the change of the unknown variable or property (e.g. mole concentration, internal energy or temperature). A governing equation may also be an approximation and adaptation of the above basic equations to the situation or model in question. A governing equation may also be derived directly from experimental results and therefore be an empirical equation. A governing equation may also be an equation describing the state of the system, and thus actually be a constitutive equation that has “stepped up the ranks” because the model in question was not meant to include a time-dependent term in the equation. This is the case for a model of a petroleum processing plant: results from one thermodynamic equilibrium calculation are input data to the next equilibrium calculation together with some new state parameters, and so on. In this case the algorithm and sequence of input data form a chain of actions, or calculations, that describes the change of states from the first state (based solely on input data) to the last state that finally comes out of the calculation sequence.

Some examples using differential equations are

• Lotka-Volterra equations are predator-prey equations

• Hele-Shaw flow

• Plate theory

o Kirchhoff–Love plate theory or Bending of Kirchhoff-Love plates

o Mindlin–Reissner plate theory or Bending of thick Mindlin plates or Bending of Reissner-Stein cantilever plates

• Vortex shedding


• Annular fin

• Astronautics

• Finite volume method for unsteady flow

• Acoustic theory

• Precipitation hardening

• Kelvin’s circulation theorem

• Kernel function for solving integral equation of surface radiation exchanges

• Nonlinear acoustics

• Large eddy simulation

• Föppl–von Kármán equations

• Timoshenko beam theory

Constitutive Equation

In physics and engineering, a constitutive equation or constitutive relation is a relation between two physical quantities (especially kinetic quantities as related to kinematic quantities) that is specific to a material or substance, and approximates the response of that material to external stimuli, usually as applied fields or forces. They are combined with other equations governing physical laws to solve physical problems; for example, in fluid mechanics the flow of a fluid in a pipe, in solid state physics the response of a crystal to an electric field, or in structural analysis, the connection between applied stresses or forces to strains or deformations.

Some constitutive equations are simply phenomenological; others are derived from first principles. A common approximate constitutive equation frequently is expressed as a simple proportionality using a parameter taken to be a property of the material, such as electrical conductivity or a spring constant. However, it is often necessary to account for the directional dependence of the material, and the scalar parameter is generalized to a tensor. Constitutive relations are also modified to account for the rate of response of materials and their non-linear behavior.

Mechanical Properties of Matter

The first constitutive equation (constitutive law) was developed by Robert Hooke and is known as Hooke’s law. It deals with the case of linear elastic materials. Following this discovery, this type of equation, often called a “stress-strain relation” in this example, but also called a “constitutive assumption” or an “equation of state”, was commonly used. Walter Noll advanced the use of constitutive equations, clarifying their classification and the role of invariance requirements, constraints, and definitions of terms like “material”, “isotropic”, “aeolotropic”, etc. The class of “constitutive relations” of the form stress rate = f (velocity gradient, stress, density) was the subject of Walter Noll’s dissertation in 1954 under Clifford Truesdell.


In modern condensed matter physics, the constitutive equation plays a major role.

Definitions

Each definition below gives the quantity (common name), its common symbol(s), defining equation, SI units and dimension.

• General stress, pressure: P, σ; σ = F/A, where F is any force applied perpendicular to an area A; SI units Pa = N m⁻²; dimension [M][T]⁻²[L]⁻¹.

• General strain: ε; ε = ΔD/D, where D is a dimension (length, area or volume) and ΔD is the change in that dimension of the material; dimensionless.

• General elastic modulus: E_mod; E_mod = σ/ε; SI units Pa = N m⁻²; dimension [M][T]⁻²[L]⁻¹.

• Young’s modulus: E, Y; Y = σ/(ΔL/L); SI units Pa = N m⁻²; dimension [M][T]⁻²[L]⁻¹.

• Shear modulus: G; G = σ/(Δx/L); SI units Pa = N m⁻²; dimension [M][T]⁻²[L]⁻¹.

• Bulk modulus: K, B; B = P/(ΔV/V); SI units Pa = N m⁻²; dimension [M][T]⁻²[L]⁻¹.

• Compressibility: C; C = 1/B; SI units Pa⁻¹ = m² N⁻¹; dimension [L][T]²[M]⁻¹.

Deformation of Solids

Friction

Friction is a complicated phenomenon. Macroscopically, the friction force F between the interface of two materials can be modelled as proportional to the reaction force R at a point of contact between the two interfaces, through a dimensionless coefficient of friction μ_f which depends on the pair of materials:

F = μ_f R.

This can be applied to static friction (friction preventing two stationary objects from slipping on their own), kinetic friction (friction between two objects scraping/sliding past each other), or rolling (frictional force which prevents slipping but causes a torque to exert on a round object). Surprisingly, the friction force does not depend on the surface area of common contact.

Stress and Strain

The stress-strain constitutive relation for linear materials is commonly known as Hooke’s law. In its simplest form, the law defines the spring constant (or elasticity constant) k in a scalar equation, stating the tensile/compressive force is proportional to the extended (or contracted) displacement x:


F_i = −k x_i

meaning the material responds linearly. Equivalently, in terms of the stress σ, Young’s modulus E, and strain ε (dimensionless):

σ = Eε

In general, forces which deform solids can be normal to a surface of the material (normal forces) or tangential (shear forces); this can be described mathematically using the stress tensor:

σ_ij = C_ijkl ε_kl,   ε_ij = S_ijkl σ_kl,

where C is the elasticity tensor and S is the compliance tensor.
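For an isotropic material the tensor relation above reduces to σ_ij = λ δ_ij tr(ε) + 2μ ε_ij in terms of the two Lamé parameters. The sketch below evaluates this isotropic special case; the Lamé parameters and the strain values are illustrative assumptions, not material data from the text.

```python
# Minimal sketch of the tensor form of Hooke's law for an isotropic material:
# sigma_ij = lambda * delta_ij * tr(eps) + 2 * mu * eps_ij, the isotropic
# special case of sigma_ij = C_ijkl eps_kl.
import numpy as np

lam, mu = 1.2e10, 0.8e10                      # assumed Lame parameters (Pa)

eps = np.array([[1.0e-4, 2.0e-5, 0.0],        # small symmetric strain tensor
                [2.0e-5, 5.0e-5, 0.0],
                [0.0,    0.0,    0.0]])

sigma = lam * np.trace(eps) * np.eye(3) + 2.0 * mu * eps
print(sigma)                                   # stress tensor in Pa
```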

Solid-state Deformations

Several classes of deformations in elastic materials are the following:
• Elastic: The material recovers its initial shape after deformation.
• Anelastic: If the material is close to elastic, but the applied force induces additional time-dependent resistive forces (i.e. depend on rate of change of extension/compression, in addition to the extension/compression). Metals and ceramics have this characteristic, but it is usually negligible, although not so much when heating due to friction occurs (such as vibrations or shear stresses in machines).
• Viscoelastic: If the time-dependent resistive contributions are large, and cannot be neglected. Rubbers and plastics have this property, and certainly do not satisfy Hooke’s law. In fact, elastic hysteresis occurs.
• Plastic: The applied force induces non-recoverable deformations in the material when the stress (or elastic strain) reaches a critical magnitude, called the yield point.
• Hyperelastic: The applied force induces displacements in the material following a strain energy density function.

Collisions

The relative speed of separation v_separation of an object A after a collision with another object B is related to the relative speed of approach v_approach by the coefficient of restitution, defined by Newton’s experimental impact law:

e = |v_separation| / |v_approach|,

which depends on the materials A and B are made from, since the collision involves interactions at the surfaces of A and B. Usually 0 ≤ e ≤ 1, in which e = 1 for completely elastic collisions, and e = 0 for completely inelastic collisions. It is possible for e ≥ 1 to occur, for superelastic (or explosive) collisions.


Deformation of Fluids

The drag equation gives the drag force D on an object of cross-section area A moving through a fluid of density ρ at velocity v (relative to the fluid):

D = ½ c_d ρ A v²,

where the drag coefficient c_d (dimensionless) depends on the geometry of the object and the drag forces at the interface between the fluid and object.

For a Newtonian fluid of viscosity μ, the shear stress τ is linearly related to the strain rate (transverse flow velocity gradient) ∂u/∂y (units s⁻¹). In a uniform shear flow:

τ = μ ∂u/∂y,

with u(y) the variation of the flow velocity u in the cross-flow (transverse) direction y. In general,

for a Newtonian fluid, the relationship between the elements τ_ij of the shear stress tensor and the deformation of the fluid is given by

τ_ij = 2μ (e_ij − Δ δ_ij / 3),   with   e_ij = ½ (∂v_i/∂x_j + ∂v_j/∂x_i)   and   Δ = Σ_k e_kk = div v,

where v_i are the components of the flow velocity vector in the corresponding x_i coordinate directions, e_ij are the components of the strain rate tensor, Δ is the volumetric strain rate (or dilatation rate) and δ_ij is the Kronecker delta. The ideal gas law is a constitutive relation in the sense that the pressure p and volume V are related to the temperature T, via the number of moles n of gas:

pV = nRT,

where R is the gas constant (J K−1 mol−1).
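The Newtonian constitutive relation above translates directly into a few lines of array arithmetic. In the sketch below, the viscosity and the velocity-gradient values are illustrative assumptions; the code merely forms the strain-rate tensor and the deviatoric shear stress from them.

```python
# Minimal sketch: Newtonian shear stress tau_ij = 2*mu*(e_ij - Delta*delta_ij/3)
# from an assumed velocity gradient dv_i/dx_j.
import numpy as np

mu = 1.0e-3                                   # viscosity (Pa s), roughly water-like
grad_v = np.array([[0.0, 5.0, 0.0],           # dv_i/dx_j (1/s), assumed values
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

e = 0.5 * (grad_v + grad_v.T)                 # strain-rate tensor e_ij
dilatation = np.trace(e)                      # Delta = div v
tau = 2.0 * mu * (e - dilatation / 3.0 * np.eye(3))
print(tau)
```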

Electromagnetism

Constitutive Equations in Electromagnetism and Related Areas

In both classical and quantum physics, the precise dynamics of a system form a set of coupled differential equations, which are almost always too complicated to be solved exactly, even at the level of statistical mechanics. In the context of electromagnetism, this remark applies to not only the dynamics of free charges and currents (which enter Maxwell’s equations directly), but also the dynamics of bound charges and currents (which enter Maxwell’s equations through the constitutive relations). As a result, various approximation schemes are typically used.

For example, in real materials, complex transport equations must be solved to determine the time and spatial response of charges, for example, the Boltzmann equation or the Fokker–Planck equation or the Navier-Stokes equations. For example, see magnetohydrodynamics, fluid dynamics, electrohydrodynamics, superconductivity, plasma modeling. An entire physical apparatus for dealing with these matters has developed.

These complex theories provide detailed formulas for the constitutive relations describing the electrical response of various materials, such as permittivities, permeabilities, conductivities and so forth.

It is necessary to specify the relations between the displacement field D and E, and the magnetic H-field H and B, before doing calculations in electromagnetism (i.e. applying Maxwell’s macroscopic equations). These equations specify the response of bound charge and current to the applied fields and are called constitutive relations.

Determining the constitutive relationship between the auxiliary fields D and H and the E and B fields starts with the definition of the auxiliary fields themselves:

D(r, t) = ε₀ E(r, t) + P(r, t),

H(r, t) = (1/μ₀) B(r, t) − M(r, t),

where P is the polarization field and M is the magnetization field which are defined in terms of microscopic bound charges and bound current respectively. Before getting to how to calculate M and P it is useful to examine the following special cases.

Without Magnetic or Dielectric Materials

In the absence of magnetic or dielectric materials, the constitutive relations are simple:

D = ε₀E,   H = B/μ₀,

where ε0 and μ0 are two universal constants, called the permittivity of free space and permeability of free space, respectively.

Isotropic Linear Materials

In an (isotropic) linear material, where P is proportional to E, and M is proportional to B, the con- stitutive relations are also straightforward. In terms of the polarization P and the magnetization M they are:

P = ε₀ χₑ E,   M = χₘ H,

where χₑ and χₘ are the electric and magnetic susceptibilities of a given material respectively. In terms of D and H the constitutive relations are:

D = εE,   H = B/μ,

where ε and μ are constants (which depend on the material), called the permittivity and permeability, respectively, of the material. These are related to the susceptibilities by:

ε/ε₀ = εᵣ = χₑ + 1,   μ/μ₀ = μᵣ = χₘ + 1.

General Case

For real-world materials, the constitutive relations are not linear, except approximately. Calculating the constitutive relations from first principles involves determining how P and M are created from a given E and B. These relations may be empirical (based directly upon measurements), or theoretical (based upon statistical mechanics, transport theory or other tools of condensed matter physics). The detail employed may be macroscopic or microscopic, depending upon the level necessary to the problem under scrutiny.

In general, the constitutive relations can usually still be written:

D = εE,   H = μ⁻¹B,

but ε and μ are not, in general, simple constants, but rather functions of E, B, position and time, and tensorial in nature. Examples are:

• Dispersion and absorption where ε and μ are functions of frequency. (Causality does not permit materials to be nondispersive; for example, Kramers–Kronig relations.) Neither do the fields need to be in phase, which leads to ε and μ being complex. This also leads to absorption.

• Nonlinearity where ε and μ are functions of E and B.

• Anisotropy (such as birefringence or dichroism) which occurs when ε and μ are second-rank tensors,

D_i = Σ_j ε_ij E_j,   B_i = Σ_j μ_ij H_j.

• Dependence of P and M on E and B at other locations and times. This could be due to spatial inhomogeneity; for example in a domained structure, heterostructure or a liquid crystal, or most commonly in the situation where there are simply multiple materials occupying different regions of space. Or it could be due to a time varying medium or due to hysteresis. In such cases P and M can be calculated as:

P(r, t) = ε₀ ∫ d³r′ dt′ χ̂ₑ(r, r′, t, t′; E) E(r′, t′),

M(r, t) = (1/μ₀) ∫ d³r′ dt′ χ̂ₘ(r, r′, t, t′; B) B(r′, t′),

in which the permittivity and permeability functions are replaced by integrals over the more general electric and magnetic susceptibilities. In homogeneous materials, dependence on other locations is known as spatial dispersion.

As a variation of these examples, in general materials are bianisotropic where D and B depend on


both E and H, through the additional coupling constants ξ and ζ:

D = εE + ξH,   B = μH + ζE.

In practice, some materials properties have a negligible impact in particular circumstances, permitting neglect of small effects. For example: optical nonlinearities can be neglected for low field strengths; material dispersion is unimportant when frequency is limited to a narrow bandwidth; material absorption can be neglected for wavelengths for which a material is transparent; and metals with finite conductivity often are approximated at microwave or longer wavelengths as perfect metals with infinite conductivity (forming hard barriers with zero skin depth of field penetration).

Some man-made materials such as metamaterials and photonic crystals are designed to have customized permittivity and permeability.

Calculation of Constitutive Relations

The theoretical calculation of a material’s constitutive equations is a common, important, and sometimes difficult task in theoretical condensed-matter physics and materials science. In general, the constitutive equations are theoretically determined by calculating how a molecule responds to the local fields through the Lorentz force. Other forces may need to be modeled as well, such as lattice vibrations in crystals or bond forces. Including all of the forces leads to changes in the molecule which are used to calculate P and M as a function of the local fields.

The local fields differ from the applied fields due to the fields produced by the polarization and magnetization of nearby material; an effect which also needs to be modeled. Further, real materials are not continuous media; the local fields of real materials vary wildly on the atomic scale. The fields need to be averaged over a suitable volume to form a continuum approximation.

These continuum approximations often require some type of quantum mechanical analysis such as quantum field theory as applied to condensed matter physics.

A different set of homogenization methods (evolving from a tradition in treating materials such as conglomerates and laminates) are based upon approximation of an inhomogeneous material by a homogeneous effective medium (valid for excitations with wavelengths much larger than the scale of the inhomogeneity).

The theoretical modeling of the continuum-approximation properties of many real materials often relies upon experimental measurement as well. For example, ε of an insulator at low frequencies can be measured by making it into a parallel-plate capacitor, and ε at optical-light frequencies is often measured by ellipsometry.

Thermoelectric and Electromagnetic Properties of Matter

These constitutive equations are often used in crystallography, a field of solid-state physics.

Electromagnetic properties of solids (each entry gives the property/effect, its constitutive equation, the stimuli/response parameters of the system and the constitutive tensor of the system):

• Hall effect: E_k = ρ_kij J_i H_j, where E is the electric field strength (N C⁻¹), J the electric current density (A m⁻²) and H the magnetic field intensity (A m⁻¹); constitutive tensor ρ = electrical resistivity (Ω m).

• Direct piezoelectric effect: P_i = d_ijk σ_jk, where σ is the stress (Pa) and P the (dielectric) polarization (C m⁻²); constitutive tensor d = direct piezoelectric coefficient (C N⁻¹).

• Converse piezoelectric effect: ε_ij = d_ijk E_k, where ε is the strain (dimensionless) and E the electric field strength (N C⁻¹); constitutive tensor d = piezoelectric coefficient (m V⁻¹, equivalent to C N⁻¹).

• Piezomagnetic effect: M_i = q_ijk σ_jk, where σ is the stress (Pa) and M the magnetization (A m⁻¹); constitutive tensor q = piezomagnetic coefficient (A m⁻¹ Pa⁻¹).

Thermoelectric properties of solids (each entry gives the property/effect, its constitutive equation, the stimuli/response parameters of the system and the constitutive tensor of the system):

• Pyroelectricity: ΔP_j = p_j ΔT, where P is the (dielectric) polarization (C m⁻²) and T the temperature (K); constitutive tensor p = pyroelectric coefficient (C m⁻² K⁻¹).

• Electrocaloric effect: ΔS = p_i ΔE_i, where S is the entropy (J K⁻¹) and E the electric field strength (N C⁻¹); constitutive tensor p = pyroelectric coefficient (C m⁻² K⁻¹).

• Seebeck effect: E_i = −β_ij ∂T/∂x_j, where E is the electric field strength (N C⁻¹ = V m⁻¹), T the temperature (K) and x the displacement (m); constitutive tensor β = thermopower (V K⁻¹).

• Peltier effect: q_j = Π_ji J_i, where J is the electric current density (A m⁻²) and q the heat flux (W m⁻²); constitutive tensor Π = Peltier coefficient (W A⁻¹).

Photonics

Refractive index

The (absolute) refractive index of a medium n (dimensionless) is an inherently important property of geometric and physical optics, defined as the ratio of the luminal speed in vacuum c₀ to that in the medium c:

n = c₀/c = √(εμ / (ε₀μ₀)) = √(εᵣμᵣ),


where ε is the permittivity and εᵣ the relative permittivity of the medium; likewise μ is the permeability and μᵣ the relative permeability of the medium. The vacuum permittivity is ε₀ and the vacuum permeability is μ₀. In general, n (and also εᵣ) are complex numbers. The relative refractive index is defined as the ratio of the two refractive indices: the absolute index is for one material, while the relative index applies to every possible pair of interfaces;

n_AB = n_A / n_B.

Speed of light in matter

As a consequence of the definition, the speed of light in matter is

c = 1/√(εμ).

For the special case of vacuum, where ε = ε₀ and μ = μ₀,

c₀ = 1/√(ε₀μ₀).
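The relations above are easy to evaluate numerically. The sketch below computes n = √(εᵣμᵣ) and the phase speed in the medium for an assumed, roughly glass-like, non-magnetic material; the material values are illustrative, not from the text.

```python
# Minimal sketch: refractive index and light speed in a simple linear medium,
# n = sqrt(eps_r * mu_r) and c = c0 / n = 1 / sqrt(eps * mu).
import math
from scipy.constants import c as c0, epsilon_0, mu_0

eps_r, mu_r = 2.25, 1.0                  # assumed: roughly glass-like, non-magnetic
n = math.sqrt(eps_r * mu_r)
c_medium = 1.0 / math.sqrt(eps_r * epsilon_0 * mu_r * mu_0)

print("refractive index n =", n)
print("speed in medium    =", c_medium, "m/s  (compare c0/n =", c0 / n, ")")
```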

Piezooptic effect

The piezooptic effect relates the stresses in solids σ to the dielectric impermeability a, which are coupled by a fourth-rank tensor called the piezooptic coefficient Π (units Pa⁻¹):

a_ij = Π_ijpq σ_pq.

Transport Phenomena

Definitions

Definitions (thermal properties of matter); each entry gives the quantity, its common symbol(s), defining equation, SI units and dimension:

• General heat capacity: C = heat capacity of the substance; q = CT; SI units J K⁻¹; dimension [M][L]²[T]⁻²[Θ]⁻¹.

• Linear thermal expansion: α = coefficient of linear thermal expansion, L = length of material (m), ε = strain tensor (dimensionless); ∂L/∂T = αL and ε_ij = α_ij ΔT; SI units K⁻¹; dimension [Θ]⁻¹.

• Volumetric thermal expansion: β, γ; V = volume of object (m³), p = constant pressure of surroundings; (∂V/∂T)_p = γV; SI units K⁻¹; dimension [Θ]⁻¹.

• Thermal conductivity: κ, K, λ; A = surface cross section of material (m²), P = thermal current/power through the material (W), ∇T = temperature gradient in the material (K m⁻¹); λ = −P/(A·∇T); SI units W m⁻¹ K⁻¹; dimension [M][L][T]⁻³[Θ]⁻¹.

• Thermal conductance: U; U = λ/Δx; SI units W m⁻² K⁻¹; dimension [M][T]⁻³[Θ]⁻¹.

• Thermal resistance: R; R = 1/U = Δx/λ, with Δx = displacement of heat transfer (m); SI units m² K W⁻¹; dimension [M]⁻¹[L][T]³[Θ].

Definitions (electrical/magnetic properties of matter); each entry gives the quantity, its common symbol(s), defining equation, SI units and dimension:

• Electrical resistance: R; R = V/I; SI units Ω = V A⁻¹ = J s C⁻²; dimension [M][L]²[T]⁻³[I]⁻².

• Resistivity: ρ; ρ = RA/l; SI units Ω m; dimension [M][L]³[T]⁻³[I]⁻².

• Resistivity temperature coefficient (linear temperature dependence): α; ρ − ρ₀ = ρ₀α(T − T₀); SI units K⁻¹; dimension [Θ]⁻¹.

• Electrical conductance: G; G = 1/R; SI units S = Ω⁻¹; dimension [M]⁻¹[L]⁻²[T]³[I]².

• Electrical conductivity: σ; σ = 1/ρ; SI units Ω⁻¹ m⁻¹; dimension [M]⁻¹[L]⁻³[T]³[I]².

• Magnetic reluctance: R, R_m; R_m = F_m/Φ_B (magnetomotive force over magnetic flux); SI units A Wb⁻¹ = H⁻¹; dimension [M]⁻¹[L]⁻²[T]²[I]².

• Magnetic permeance: P, P_m, Λ; Λ = 1/R_m; SI units Wb A⁻¹ = H; dimension [M][L]²[T]⁻²[I]⁻².

Definitive Laws

There are several laws which describe the transport of matter, or properties of it, in an almost identical way. In every case, in words they read:

Flux (density) is proportional to a gradient; the constant of proportionality is the characteristic of the material.


In general the constant must be replaced by a 2nd rank tensor, to account for directional dependences of the material.

Each law below gives the property/effect, nomenclature and defining equation.

• Fick’s law of diffusion (defines the diffusion coefficient D): J_j = −D_ij ∂C/∂x_i, where J = diffusion flux of the substance (mol m⁻² s⁻¹), D = mass diffusion coefficient (m² s⁻¹) and ∂C/∂x = (1d) concentration gradient of the substance (mol dm⁻⁴).

• Darcy’s law for porous flow in matter (defines the permeability κ): q_j = −(κ/μ) ∂P/∂x_j, where q = discharge flux of the substance (m s⁻¹), κ = permeability of the medium (m²), μ = fluid viscosity (Pa s) and ∂P/∂x = (1d) pressure gradient of the system (Pa m⁻¹).

• Ohm’s law of electric conduction (defines electric conductivity, and hence resistivity and resistance): the simplest form is V = IR; more general forms are ∂V/∂x_i = ρ_ij J_j and J_i = σ_ij ∂V/∂x_j, where V = potential difference in the material (V), I = electric current through the material (A), R = resistance of the material (Ω), ∂V/∂x = potential gradient (electric field) through the material (V m⁻¹), J = electric current density through the material (A m⁻²), σ = electric conductivity of the material (Ω⁻¹ m⁻¹) and ρ = electrical resistivity of the material (Ω m).

• Fourier’s law of thermal conduction (defines thermal conductivity λ): q_j = −λ_ij ∂T/∂x_i, where λ = thermal conductivity of the material (W m⁻¹ K⁻¹), q = heat flux through the material (W m⁻²) and ∂T/∂x = temperature gradient in the material (K m⁻¹).

• Stefan–Boltzmann law of black-body radiation (defines emissivity ε): for a single radiator I = εσT⁴; for a temperature difference I = εσ(T_sys⁴ − T_ext⁴), where I = radiant intensity (W m⁻²), σ = Stefan–Boltzmann constant (W m⁻² K⁻⁴), T_sys = temperature of the radiating system (K), T_ext = temperature of the external surroundings (K) and ε = emissivity (dimensionless), with 0 ≤ ε ≤ 1 (ε = 0 for a perfect reflector, ε = 1 for a perfect absorber, i.e. a true black body).
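The shared “flux proportional to gradient” pattern is simple to evaluate. The sketch below works through Fourier’s law in one dimension and the single-radiator Stefan–Boltzmann law; the material values and temperatures are illustrative assumptions.

```python
# Minimal sketch of two of the laws above, evaluated with assumed numbers.
from scipy.constants import Stefan_Boltzmann as sigma

# Fourier's law (1-d): q = -lambda * dT/dx
thermal_conductivity = 0.6        # W m^-1 K^-1, roughly water-like
dT_dx = -50.0                     # temperature gradient, K m^-1
heat_flux = -thermal_conductivity * dT_dx
print("conductive heat flux q =", heat_flux, "W/m^2")

# Stefan-Boltzmann law for a single radiator: I = eps * sigma * T^4
emissivity, T = 0.9, 350.0        # dimensionless, K (assumed)
radiant_intensity = emissivity * sigma * T**4
print("radiant intensity I =", radiant_intensity, "W/m^2")
```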

Constraint (Mathematics)

In mathematics, a constraint is a condition of an optimization problem that the solution must satisfy. There are several types of constraints: primarily equality constraints, inequality constraints, and integer constraints. The set of candidate solutions that satisfy all constraints is called the feasible set.


Example

The following is a simple optimization problem:

min f(x) = x₁² + x₂⁴

subject to

x₁ ≥ 1

and

x₂ = 1,

where x denotes the vector (x₁, x₂). In this example, the first line defines the function to be minimized (called the objective function, loss function, or cost function). The second and third lines define two constraints, the first of which is an inequality constraint and the second of which is an equality constraint. These two constraints are hard constraints, meaning that it is required that they be satisfied; they define the feasible set of candidate solutions.

Without the constraints, the solution would be (0, 0), where f(x) has the lowest value. But this solution does not satisfy the constraints. The solution of the constrained optimization problem stated above is x = (1, 1), which is the point with the smallest value of f(x) that satisfies the two constraints.
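The problem above can be checked numerically. The following sketch solves it with SciPy’s SLSQP method (one of many possible solvers); the starting point is an arbitrary assumption.

```python
# Minimal sketch of the constrained problem above: min x1^2 + x2^4
# subject to x1 >= 1 and x2 = 1.
import numpy as np
from scipy.optimize import minimize

objective = lambda x: x[0]**2 + x[1]**4
constraints = [
    {"type": "ineq", "fun": lambda x: x[0] - 1.0},   # x1 - 1 >= 0
    {"type": "eq",   "fun": lambda x: x[1] - 1.0},   # x2 - 1 == 0
]

result = minimize(objective, x0=np.array([2.0, 2.0]),
                  method="SLSQP", constraints=constraints)
print(result.x)   # approximately [1, 1], matching the solution in the text
```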

Terminology

• If an inequality constraint holds with equality at the optimal point, the constraint is said to be binding, as the point cannot be varied in the direction of the constraint even though doing so would improve the value of the objective function.

• If an inequality constraint holds as a strict inequality at the optimal point (that is, does not hold with equality), the constraint is said to be non-binding, as the point could be varied in the direction of the constraint, although it would not be optimal to do so. If a constraint is non-binding, the optimization problem would have the same solution even in the absence of that constraint.

• If a constraint is not satisfied at a given point, the point is said to be infeasible.

Hard and Soft Constraints

If the problem mandates that the constraints be satisfied, as in the above discussion, the constraints are sometimes referred to as hard constraints. However, in some problems, called flexible constraint satisfaction problems, it is preferred but not required that certain constraints be satisfied; such non-mandatory constraints are known as soft constraints. Soft constraints arise in, for example, preference-based planning. In a MAX-CSP problem, a number of constraints are allowed to be violated, and the quality of a solution is measured by the number of satisfied constraints.


Kinematics Equations

Kinematics equations refers to the constraint equations of a mechanical system such as a robot manipulator that define how input movement at one or more joints specifies the configuration of the device, in order to achieve a task position or end-effector location. Kinematics equations are used to analyze and design articulated systems ranging from four-bar linkages to serial and parallel robots.

Kinematics equations are constraint equations that characterize the geometric configuration of an articulated mechanical system. Therefore, these equations assume the links are rigid and the joints provide pure rotation or translation. Constraint equations of this type are known as holonomic constraints in the study of the dynamics of multi-body systems.

Loop Equations

The kinematics equations for a mechanical system are formed as a sequence of rigid transformations along links and around joints in a mechanical system. The principle that the sequence of transformations around a loop must return to the identity provides what are known as the loop equations. An independent set of kinematics equations is assembled from the various sets of loop equations that are available in a mechanical system.

Transformations

In 1955, Jacques Denavit and Richard Hartenberg introduced a convention for the definition of the joint matrices [Z] and link matrices [X] to standardize the coordinate frames for spatial linkages. This convention positions the joint frame so that it consists of a screw displacement along the Z-axis

[Z_i] = [ cos θ_i   −sin θ_i   0    0
          sin θ_i    cos θ_i   0    0
             0          0      1   d_i
             0          0      0    1 ],

and it positions the link frame so it consists of a screw displacement along the X-axis,

[X_i] = [ 1        0               0            a_{i,i+1}
          0   cos α_{i,i+1}   −sin α_{i,i+1}        0
          0   sin α_{i,i+1}    cos α_{i,i+1}        0
          0        0               0                1 ].

The kinematics equations are obtained using a rigid transformation [Z] to characterize the relative movement allowed at each joint and a separate rigid transformation [X] to define the dimensions of each link.

The result is a sequence of rigid transformations alternating joint and link transformations from the base of the chain around a loop back to the base to obtain the loop equation,


[Z₁][X₁][Z₂][X₂] … [X_{n−1}][Z_n] = [I].

The series of transformations equates to the identity matrix because they return to the beginning of the loop.

Serial Chains

The kinematics equations for a serial chain robot are obtained by formulating the loop equations in terms of a transformation [T] from the base to the end-effector, which is equated to the series of transformations along the robot. The result is,

[T] = [Z₁][X₁][Z₂][X₂] … [X_{n−1}][Z_n].

These equations are called the kinematics equations of the serial chain.
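As a concrete illustration, the sketch below builds the [Z] and [X] matrices above and multiplies them for a planar two-link arm. The joint angles, link lengths and zero twist angles are illustrative assumptions used to show the matrix product [T] = [Z₁][X₁][Z₂][X₂].

```python
# Minimal sketch of serial-chain forward kinematics with Denavit-Hartenberg
# style joint [Z] and link [X] matrices.
import numpy as np

def Z(theta, d):
    """Screw displacement about/along the local Z-axis (joint matrix)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, d],
                     [0,  0, 0, 1]])

def X(alpha, a):
    """Screw displacement about/along the local X-axis (link matrix)."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[1, 0,  0, a],
                     [0, c, -s, 0],
                     [0, s,  c, 0],
                     [0, 0,  0, 1]])

# Planar two-link arm with assumed link lengths 1.0 and 0.8.
theta1, theta2 = np.radians(30.0), np.radians(45.0)
T = Z(theta1, 0.0) @ X(0.0, 1.0) @ Z(theta2, 0.0) @ X(0.0, 0.8)
print("end-effector position:", T[:3, 3])
```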

Parallel Chains

The kinematics equations for a parallel chain, or parallel robot, formed by an end-effector supported by multiple serial chains are obtained from the kinematics equations of each of the supporting serial chains. Suppose that m serial chains support the end-effector; then the transformation from the base to the end-effector is defined by m equations,

[T] = [Z_{1,j}][X_{1,j}][Z_{2,j}][X_{2,j}] … [X_{n−1,j}][Z_{n,j}],   j = 1, …, m.

These equations are the kinematics equations of the parallel chain.

Forward Kinematics

The kinematics equations of serial and parallel robots can be viewed as relating parameters, such as joint angles, that are under the control of actuators to the position and orientation [T] of the end-effector.

From this point of view the kinematics equations can be used in two different ways. The first, called forward kinematics, uses specified values for the joint parameters to compute the end-effector position and orientation. The second, called inverse kinematics, uses the position and orientation of the end-effector to compute the joint parameter values.

Remarkably, while the forward kinematics of a serial chain is a direct calculation of a single matrix equation, the forward kinematics of a parallel chain requires the simultaneous solution of multiple matrix equations which presents a significant challenge.

Conceptual Model

A conceptual model is a representation of a system, made of the composition of concepts which are used to help people know, understand, or simulate a subject the model represents. Some models


are physical objects; for example, a toy model which may be assembled, and may be made to work like the object it represents.

The term conceptual model may be used to refer to models which are formed after a conceptualization or generalization process. Conceptual models are often abstractions of things in the real world, whether physical or social. Semantics studies are relevant to various stages of concept formation and use, as semantics is basically about concepts, the meaning that thinking beings give to various elements of their experience.

Models of Concepts and Models That are Conceptual

The term conceptual model is ambiguous. It could mean “a model of concept” or it could mean “a model that is conceptual.” A distinction can be made between what models are and what models are models of. With the exception of iconic models, such as a scale model of Winchester Cathedral, most models are concepts. But they are, mostly, intended to be models of real world states of affairs. The value of a model is usually directly proportional to how well it corresponds to a past, present, future, actual or potential state of affairs. A model of a concept is quite different because in order to be a good model it need not have this real world correspondence. In artificial intelligence, conceptual models and conceptual graphs are used for building expert systems and knowledge-based systems; here the analysts are concerned to represent expert opinion on what is true, not their own ideas on what is true.

Type and Scope of Conceptual Models

Conceptual models (models that are conceptual) range in type from the more concrete, such as the mental image of a familiar physical object, to the formal generality and abstractness of mathematical models which do not appear to the mind as an image. Conceptual models also range in terms of the scope of the subject matter that they are taken to represent. A model may, for instance, represent a single thing (e.g. the Statue of Liberty), whole classes of things (e.g. the electron), and even very vast domains of subject matter such as the physical universe. The variety and scope of conceptual models is due to the variety of purposes of the people using them.

Overview

Conceptual modeling is the activity of formally describing some aspects of the physical and social world around us for the purposes of understanding and communication.

A conceptual model’s primary objective is to convey the fundamental principles and basic functionality of the system which it represents. Also, a conceptual model must be developed in such a way as to provide an easily understood system interpretation for the model’s users. A conceptual model, when implemented properly, should satisfy four fundamental objectives.

1. Enhance an individual’s understanding of the representative system

2. Facilitate efficient conveyance of system details between stakeholders

3. Provide a point of reference for system designers to extract system specifications


4. Document the system for future reference and provide a means for collaboration

The conceptual model plays an important role in the overall system development life cycle. Figure 1 below depicts the role of the conceptual model in a typical system development scheme. It is clear that if the conceptual model is not fully developed, the execution of fundamental system properties may not be implemented properly, giving way to future problems or system shortfalls. These failures do occur in the industry and have been linked to lack of user input, incomplete or unclear requirements, and changing requirements. Those weak links in the system design and development process can be traced to improper execution of the fundamental objectives of conceptual modeling. The importance of conceptual modeling is evident when such systemic failures are mitigated by thorough system development and adherence to proven development objectives/techniques.

Techniques

As systems have become increasingly complex, the role of conceptual modeling has dramatically expanded. With that expanded presence, the effectiveness of conceptual modeling at capturing the fundamentals of a system is being realized. Building on that realization, numerous conceptual modeling techniques have been created. These techniques can be applied across multiple disciplines to increase the user’s understanding of the system to be modeled. A few techniques are briefly described in the following text; however, many more exist or are being developed. Some commonly used conceptual modeling techniques and methods include: Workflow Modeling, Workforce Modeling, Rapid Application Development, Object Role Modeling, and the Unified Modeling Language (UML).

Data Flow Modeling

Data flow modeling (DFM) is a basic conceptual modeling technique that graphically represents elements of a system. DFM is a fairly simple technique; however, like many conceptual modeling techniques, it is possible to construct higher and lower level representative diagrams. The data flow diagram usually does not convey complex system details such as parallel development considerations or timing information, but rather works to bring the major system functions into context.

______WORLD TECHNOLOGIES ______88 Numerical Analysis, Modelling and Simulation

Data flow modeling is a central technique used in systems development that utilizes the Structured Systems Analysis and Design Method (SSADM).

Entity Relationship Modeling

Entity-relationship modeling (ERM) is a conceptual modeling technique used primarily for software system representation. Entity-relationship diagrams, which are a product of executing the ERM technique, are normally used to represent models and information systems. The main components of the diagram are the entities and relationships. The entities can represent independent functions, objects, or events. The relationships are responsible for relating the entities to one another. To form a system process, the relationships are combined with the entities and any attributes needed to further describe the process. Multiple diagramming conventions exist for this technique; IDEF1X, Bachman, and EXPRESS, to name a few. These conventions are just different ways of viewing and organizing the data to represent different system aspects.

Event-driven Process Chain

The event-driven process chain (EPC) is a conceptual modeling technique which is mainly used to systematically improve business process flows. Like most conceptual modeling techniques, the event-driven process chain consists of entities/elements and functions that allow relationships to be developed and processed. More specifically, the EPC is made up of events which define what state a process is in or the rules by which it operates. In order to progress through events, a function/active event must be executed. Depending on the process flow, the function has the ability to transform event states or link to other event-driven process chains. Other elements exist within an EPC, all of which work together to define how and by what rules the system operates. The EPC technique can be applied to business practices such as resource planning, process improvement, and logistics.

Joint Application Development

The Dynamic Systems Development Method (DSDM) uses a specific process called JEFFF to conceptually model a system’s life cycle. JEFFF is intended to focus more on the higher level development planning that precedes a project’s initialization. The JAD process calls for a series of workshops in which the participants work to identify, define, and generally map a successful project from conception to completion. This method has been found to not work well for large scale applications; however, smaller applications usually report some net gain in efficiency.

Place/Transition Net

Also known as Petri nets, this conceptual modeling technique allows a system to be constructed with elements that can be described by direct mathematical means. The Petri net, because of its nondeterministic execution properties and well defined mathematical theory, is a useful technique for modeling concurrent system behavior, i.e. simultaneous process executions.

State Transition Modeling

State transition modeling makes use of state transition diagrams to describe system behavior. These state transition diagrams use distinct states to define system behavior and changes. Most current modeling tools contain some kind of ability to represent state transition modeling. The use of state transition models can be most easily recognized as logic state diagrams and directed graphs for finite state machines.

Technique Evaluation and Selection

Because the conceptual modeling method can sometimes be purposefully vague to account for a broad area of use, the actual application of concept modeling can become difficult. To alleviate this issue, and to shed some light on what to consider when selecting an appropriate conceptual modeling technique, the framework proposed by Gemino and Wand will be discussed in the following text. However, before evaluating the effectiveness of a conceptual modeling technique for a particular application, an important concept must be understood: comparing conceptual models by way of specifically focusing on their graphical or top level representations is shortsighted. Gemino and Wand make a good point when arguing that the emphasis should be placed on a conceptual modeling language when choosing an appropriate technique. In general, a conceptual model is developed using some form of conceptual modeling technique. That technique will utilize a conceptual modeling language that determines the rules for how the model is arrived at. Understanding the capabilities of the specific language used is inherent to properly evaluating a conceptual modeling technique, as the language reflects the technique’s descriptive ability. Also, the conceptual modeling language will directly influence the depth at which the system is capable of being represented, whether it be complex or simple.

Considering Affecting Factors

Building on some of their earlier work, Gemino and Wand acknowledge some main points to consider when studying the affecting factors: the content that the conceptual model must represent, the method in which the model will be presented, the characteristics of the model's users, and the conceptual modeling language's specific task. The conceptual model's content should be considered in order to select a technique that would allow relevant information to be presented. The presentation method for selection purposes would focus on the technique's ability to represent the model at the intended level of depth and detail. The characteristics of the model's users or participants are an important aspect to consider. A participant's background and experience should coincide with the conceptual model's complexity; otherwise, misrepresentation of the system or misunderstanding of key system concepts could lead to problems in that system's realization. The conceptual modeling language's task will further allow an appropriate technique to be chosen. The difference between creating a system conceptual model to convey system functionality and creating a system conceptual model to interpret that functionality could involve two completely different types of conceptual modeling languages.

Considering Affected Variables

Gemino and Wand go on to expand the affected variable content of their proposed framework by considering the focus of observation and the criterion for comparison. The focus of observation considers whether the conceptual modeling technique will create a "new product", or whether the technique will only bring about a more intimate understanding of the system being modeled. The criterion for comparison would weigh the ability of the conceptual modeling technique to be efficient or effective. A conceptual modeling technique that allows for development of a system model which takes all system variables into account at a high level may make the process of understanding the system functionality more efficient, but the technique lacks the necessary information to explain the internal processes, rendering the model less effective.

When deciding which conceptual technique to use, the recommendations of Gemino and Wand can be applied in order to properly evaluate the scope of the conceptual model in question. Understanding the conceptual model's scope will lead to a more informed selection of a technique that properly addresses that particular model. In summary, when deciding between modeling techniques, answering the following questions would allow one to address some important conceptual modeling considerations.

1. What content will the conceptual model represent?

2. How will the conceptual model be presented?

3. Who will be using or participating in the conceptual model?

4. How will the conceptual model describe the system?

5. What is the conceptual model's focus of observation?

6. Will the conceptual model be efficient or effective in describing the system?

Another function of the simulation conceptual model is to provide a rational and factual basis for assessment of simulation application appropriateness.

Models in Philosophy and Science

Mental Model

In cognitive psychology and philosophy of mind, a mental model is a representation of something in the mind, but a mental model may also refer to a nonphysical external model of the mind itself.

Metaphysical Models

A metaphysical model is a type of conceptual model which is distinguished from other conceptual models by its proposed scope. A metaphysical model intends to represent reality in the broadest possible way. This is to say that it explains the answers to fundamental questions such as whether matter and mind are one or two substances; or whether or not humans have free will.

Conceptual Model vs. Semantic Model

Epistemological Models

An epistemological model is a type of conceptual model whose proposed scope is the known and the knowable, and the believed and the believable.

Logical Models

In logic, a model is a type of interpretation under which a particular statement is true. Logical models can be broadly divided into ones which only attempt to represent concepts, such as mathematical models, and ones which attempt to represent physical objects and factual relationships, among which are scientific models.

Model theory is the study of (classes of) mathematical structures such as groups, fields, graphs, or even universes of set theory, using tools from mathematical logic. A system that gives meaning to the sentences of a formal language is called a model for the language. If a model for a language moreover satisfies a particular sentence or theory (set of sentences), it is called a model of the sentence or theory. Model theory has close ties to algebra and universal algebra.

Mathematical Models

Mathematical models can take many forms, including but not limited to dynamical systems, statistical models, differential equations, or game theoretic models. These and other types of models can overlap, with a given model involving a variety of abstract structures.

A more comprehensive type of mathematical model uses a linguistic version of category theory to model a given situation. Akin to entity-relationship models, custom categories or sketches can be directly translated into database schemas. The difference is that logic is replaced by category theory, which brings powerful theorems to bear on the subject of modeling, especially useful for translating between disparate models (as functors between categories).

Scientific Models

A scientific model is a simplified abstract view of a complex reality. A scientific model represents empirical objects, phenomena, and physical processes in a logical way. Attempts to formalize the principles of the empirical sciences use an interpretation to model reality, in the same way logicians axiomatize the principles of logic. The aim of these attempts is to construct a formal system for which reality is the only interpretation. The world is an interpretation (or model) of these sciences, only insofar as these sciences are true.

Statistical Models

A statistical model is a probability distribution function proposed as generating data. In a parametric model, the probability distribution function has variable parameters, such as the mean and variance in a normal distribution, or the coefficients for the various exponents of the independent variable in linear regression. A nonparametric model has a distribution function without parameters, such as in bootstrapping, and is only loosely confined by assumptions. Model selection is a statistical method for selecting a distribution function within a class of them; e.g., in linear regression where the dependent variable is a polynomial of the independent variable with parametric coefficients, model selection is selecting the highest exponent, and may be done with nonparametric means, such as with cross validation.

In statistics there can be models of mental events as well as models of physical events. For example, a statistical model of customer behavior is a model that is conceptual (because behavior is physical), but a statistical model of customer satisfaction is a model of a concept (because satisfaction is a mental, not a physical, event).


Social and Political Models

Economic Models

In economics, a model is a theoretical construct that represents economic processes by a set of variables and a set of logical and/or quantitative relationships between them. The economic model is a simplified framework designed to illustrate complex processes, often but not always using mathematical techniques. Frequently, economic models use structural parameters. Structural parameters are underlying parameters in a model or class of models. A model may have various parameters and those parameters may change to create various properties.

Models in Systems Architecture

A system model is the conceptual model that describes and represents the structure, behavior, and other views of a system. A system model can represent multiple views of a system by using two different approaches. The first one is the non-architectural approach and the second one is the architectural approach. The non-architectural approach picks a separate model for each view. The architectural approach, also known as system architecture, instead of picking many heterogeneous and unrelated models, uses only one integrated architectural model.

Business Process Modelling

Abstraction for Business process modelling

In business process modelling the enterprise process model is often referred to as the business process model. Process models are core concepts in the discipline of process engineering. Process models are:

• Processes of the same nature that are classified together into a model.

• A description of a process at the type level.

• Since the process model is at the type level, a process is an instantiation of it.

The same process model is used repeatedly for the development of many applications and thus, has many instantiations.

One possible use of a process model is to prescribe how things must/should/could be done, in contrast to the process itself, which is really what happens. A process model is roughly an anticipation of what the process will look like. What the process shall be will be determined during actual system development.


Models in Information System Design

Conceptual Models of Human Activity Systems

Conceptual models of human activity systems are used in soft systems methodology (SSM), which is a method of systems analysis concerned with the structuring of problems in management. These models are models of concepts; the authors specifically state that they are not intended to represent a state of affairs in the physical world. They are also used in information requirements analysis (IRA), which is a variant of SSM developed for information system design.

Logico-linguistic Models

Logico-linguistic modeling is another variant of SSM that uses conceptual models. However, this method combines models of concepts with models of putative real-world objects and events. It is a graphical representation of modal logic in which modal operators are used to distinguish statements about concepts from statements about real-world objects and events.

Data Models

Entity-relationship Model

In software engineering, an entity-relationship model (ERM) is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements in a top-down fashion. Diagrams created by this process are called entity-relationship diagrams, ER diagrams, or ERDs.

Entity-relationship models have had wide application in the building of information systems intended to support activities involving objects and events in the real world. In these cases they are models that are conceptual. However, this modeling method can also be used to build computer games or a family tree of the Greek gods; in these cases it would be used to model concepts.
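As a rough sketch of how an entity-relationship model is realized in a relational database, each entity becomes a table and each one-to-many relationship becomes a foreign key. The customer/order entities below are hypothetical, and the code only illustrates the translation using Python's built-in sqlite3 module.

# Hypothetical ER model: a Customer places Orders (one-to-many relationship).
# Each entity becomes a table; the relationship becomes a foreign key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE "order" (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute('INSERT INTO "order" VALUES (10, 1, 25.0)')
row = conn.execute(
    'SELECT c.name, o.total FROM "order" o JOIN customer c USING (customer_id)'
).fetchone()
print(row)  # ('Ada', 25.0)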

Domain Model

A domain model is a type of conceptual model used to depict the structural elements and their conceptual constraints within a domain of interest (sometimes called the problem domain). A domain model includes the various entities, their attributes and relationships, plus the constraints governing the conceptual integrity of the structural model elements comprising that problem domain. A domain model may also include a number of conceptual views, where each view is pertinent to a particular subject area of the domain or to a particular subset of the domain model which is of interest to a stakeholder of the domain model.

Like entity-relationship models, domain models can be used to model concepts or to model real world objects and events.


Conceptual Model (Computer Science)

A mental model captures ideas in a problem domain, while a conceptual model represents ‘con- cepts’ (entities) and relationships between them.

A conceptual model in the field of computer science is a special case of a general conceptual model. To distinguish it from other types of models, it is also known as a domain model. Conceptual modeling should not be confused with other modeling disciplines such as data modelling, logical modelling and physical modelling. The conceptual model is explicitly chosen to be independent of design or implementation concerns, for example, concurrency or data storage. The aim of a conceptual model is to express the meaning of terms and concepts used by domain experts to discuss the problem, and to find the correct relationships between different concepts. The conceptual model attempts to clarify the meaning of various, usually ambiguous terms, and to ensure that problems with different interpretations of the terms and concepts cannot occur. Such differing interpretations could easily cause confusion amongst stakeholders, especially those responsible for designing and implementing a solution, where the conceptual model provides a key artifact of business understanding and clarity. Once the domain concepts have been modeled, the model becomes a stable basis for subsequent development of applications in the domain. The concepts of the conceptual model can be mapped into physical design or implementation constructs using either manual or automated code generation approaches. The realization of conceptual models of many domains can be combined into a coherent platform.

A conceptual model can be described using various notations, such as UML, ORM or OMT for object modelling, or IE or IDEF1X for entity-relationship modelling. In UML notation, the conceptual model is often described with a class diagram in which classes represent concepts, associations represent relationships between concepts, and role types of an association represent role types taken by instances of the modelled concepts in various situations. In ER notation, the conceptual model is described with an ER diagram in which entities represent concepts, and cardinality and optionality represent relationships between concepts. Regardless of the notation used, it is important not to compromise the richness and clarity of the business meaning depicted in the conceptual model by expressing it directly in a form influenced by design or implementation concerns.

This is often used for defining different processes in a particular company or institute.

Multiscale Modeling

In engineering, mathematics, physics, chemistry, bioinformatics, computational biology, meteorology and computer science, multiscale modeling or multiscale mathematics is the field of solving problems which have important features at multiple scales of time and/or space. Important problems include multiscale modeling of fluids, solids, polymers, proteins, nucleic acids as well as various physical and chemical phenomena (like adsorption, chemical reactions, diffusion).


History

Horstemeyer (2009, 2012) presented a historical review of the different disciplines (solid mechanics, numerical methods, mathematics, physics, and materials science) for solid materials related to multiscale materials modeling.

The recent surge of multiscale modeling from the smallest scale (atoms) to full system level (e.g., autos) related to solid mechanics, which has now grown into an international multidisciplinary activity, was born from an unlikely source. Since the US Department of Energy (DOE) national labs started to reduce nuclear underground tests in the mid-1980s, with the last one in 1992, the idea of simulation-based design and analysis was born. Multiscale modeling was key to garnering more precise and accurate predictive tools. In essence, the number of large-scale systems-level tests that were previously used to validate a design was reduced to nothing, thus warranting the increase in simulation results of the complex systems for design verification and validation purposes.

Essentially, the idea of filling the space of system-level "tests" was then proposed to be filled by simulation results. After the Comprehensive Test Ban Treaty of 1996, in which many countries pledged to discontinue all systems-level nuclear testing, programs like the Advanced Strategic Computing Initiative (ASCI) were born within the Department of Energy (DOE) and managed by the national labs within the US. Within ASCI, the basic recognized premise was to provide more accurate and precise simulation-based design and analysis tools. Because of the requirements for greater complexity in the simulations, parallel computing and multiscale modeling became the major challenges that needed to be addressed. With this perspective, the idea of experiments shifted from the large-scale complex tests to multiscale experiments that provided material models with validation at different length scales. If the modeling and simulations were physically based and less empirical, then a predictive capability could be realized for other conditions. As such, various multiscale modeling methodologies were independently being created at the DOE national labs: Los Alamos National Lab (LANL), Lawrence Livermore National Laboratory (LLNL), Sandia National Laboratories (SNL), and Oak Ridge National Laboratory (ORNL). In addition, personnel from these national labs encouraged, funded, and managed academic research related to multiscale modeling. Hence, the creation of different methodologies and computational algorithms for parallel environments gave rise to different emphases regarding multiscale modeling and the associated multiscale experiments.

The advent of parallel computing also contributed to the development of multiscale modeling. Since more degrees of freedom could be resolved by parallel computing environments, more accurate and precise algorithmic formulations could be admitted. This thought also drove the political leaders to encourage the simulation-based design concepts.

At LANL, LLNL, and ORNL, the multiscale modeling efforts were driven from the materials science and physics communities with a bottom-up approach. Each had different programs that tried to unify computational efforts, materials science information, and applied mechanics algorithms with different levels of success. Multiple scientific articles were written, and the multiscale activities took different lives of their own. At SNL, the multiscale modeling effort was an engineering top-down approach starting from a continuum mechanics perspective, which was already rich with a computational paradigm. SNL tried to merge the materials science community into the continuum mechanics community to address the lower-length-scale issues that could help solve engineering problems in practice.

Once this management infrastructure and associated funding were in place at the various DOE institutions, different academic research projects started, initiating various satellite networks of multiscale modeling research. Technology transfer also spread to other labs within the Department of Defense and industrial research communities.

The growth of multiscale modeling in the industrial sector was primarily due to financial motivations. From the DOE national labs' perspective, the shift from the large-scale systems experiments mentality occurred because of the 1996 Nuclear Ban Treaty. Once industry realized that the notions of multiscale modeling and simulation-based design were invariant to the type of product and that effective multiscale simulations could in fact lead to design optimization, a paradigm shift began to occur, in various measures within different industries, as cost savings and accuracy in product warranty estimates were rationalized (Mark Horstemeyer, Integrated Computational Materials Engineering (ICME) for Metals, Chapter 1, Section 1.3).

The aforementioned DOE multiscale modeling efforts were hierarchical in nature. The first concurrent multiscale model occurred when Michael Ortiz (Caltech) took the code Dynamo (developed by Mike Baskes at Sandia National Labs) and, with his students, embedded it into a finite element code for the first time. In 2013, Martin Karplus, Michael Levitt, and Arieh Warshel were awarded the Nobel Prize in Chemistry for the development of a multiscale model method using both classical and quantum mechanical theory, which was used to model large complex chemical systems and reactions.

Areas of Research

In physics and chemistry, multiscale modeling is aimed at the calculation of material properties or system behavior on one level using information or models from different levels. On each level, particular approaches are used for the description of a system. The following levels are usually distinguished: level of quantum mechanical models (information about electrons is included), level of molecular dynamics models (information about individual atoms is included), coarse-grained models (information about atoms and/or groups of atoms is included), mesoscale or nano level (information about large groups of atoms and/or molecule positions is included), level of continuum models, and level of device models. Each level addresses a phenomenon over a specific window of length and time. Multiscale modeling is particularly important in integrated computational materials engineering since it allows the prediction of material properties or system behavior based on knowledge of the process-structure-property relationships.

In operations research, multiscale modeling addresses challenges for decision makers which come from multiscale phenomena across organizational, temporal and spatial scales. This theory fuses decision theory and multiscale mathematics and is referred to as multiscale decision-making. Multiscale decision-making draws upon the analogies between physical systems and complex man-made systems.

In meteorology, multiscale modeling is the modeling of the interaction between weather systems of different spatial and temporal scales that produces the weather that we experience. The most challenging task is to model the way in which the weather systems interact, as models cannot see beyond the limit of the model grid size. In other words, running an atmospheric model with a grid size small enough (~500 m) to resolve every possible cloud structure for the whole globe is computationally very expensive. On the other hand, a computationally feasible global climate model (GCM), with a grid size of ~100 km, cannot see the smaller cloud systems. So we need to find a balance point so that the model remains computationally feasible while not losing much information, with the help of making some rational guesses, a process called parametrization.

Besides the many specific applications, one area of research is methods for the accurate and efficient solution of multiscale modeling problems. The primary areas of mathematical and algorithmic development include:

• Analytical modeling

• Center manifold and slow manifold theory

• Continuum modeling

• Discrete modeling

• Network-based modeling

• Statistical modeling

Ontology (Information Science)

In computer science and information science, an ontology is a formal naming and definition of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse. It is thus a practical application of philosophical ontology, with a taxonomy.

An ontology compartmentalizes the variables needed for some set of computations and establishes the relationships between them.

The fields of artificial intelligence, the Semantic Web, systems engineering, software engineering, biomedical informatics, library science, enterprise bookmarking, and information architecture all create ontologies to limit complexity and to organize information. The ontology can then be applied to problem solving.

Etymology and Definition

The term ontology has its origin in philosophy and has been applied in many different ways.


The core meaning within computer science is a model for describing the world that consists of a set of types, properties, and relationship types. There is also generally an expectation that the features of the model in an ontology should closely resemble the real world (related to the object).

Overview

What many ontologies have in common in both computer science and in philosophy is the representation of entities, ideas, and events, along with their properties and relations, according to a system of categories. In both fields, there is considerable work on problems of ontological relativity (e.g., Quine and Kripke in philosophy, Sowa and Guarino in computer science), and debates concerning whether a normative ontology is viable (e.g., debates over foundationalism in philosophy, and over the Cyc project in AI). Differences between the two are largely matters of focus. Computer scientists are more concerned with establishing fixed, controlled vocabularies, while philosophers are more concerned with first principles, such as whether there are such things as fixed essences or whether enduring objects must be ontologically more primary than processes.

Other fields make ontological assumptions that are sometimes explicitly elaborated and explored. For instance, the definition and ontology of economics (also sometimes called the political economy) is hotly debated, especially in Marxist economics where it is a primary concern, but also in other subfields. Such concerns intersect with those of information science when a simulation or model is intended to enable decisions in the economic realm; for example, to determine what capital assets are at risk and if so by how much. Some claim all social sciences have explicit ontology issues because they do not have hard falsifiability criteria like most models in physical sciences and that indeed the lack of such widely accepted hard falsification criteria is what defines a social or soft science.

History

Historically, ontologies arise out of the branch of philosophy known as metaphysics, which deals with the nature of reality, of what exists. This fundamental branch is concerned with analyzing various types or modes of existence, often with special attention to the relations between particulars and universals, between intrinsic and extrinsic properties, and between essence and existence. The traditional goal of ontological inquiry in particular is to divide the world "at its joints" to discover those fundamental categories or kinds into which the world's objects naturally fall.

During the second half of the 20th century, philosophers extensively debated the possible methods or approaches to building ontologies without actually building any very elaborate ontologies themselves. By contrast, computer scientists were building some large and robust ontologies, such as WordNet and Cyc, with comparatively little debate over how they were built.

Since the mid-1970s, researchers in the field of artificial intelligence (AI) have recognized that capturing knowledge is the key to building large and powerful AI systems. AI researchers argued that they could create new ontologies as computational models that enable certain kinds of automated reasoning. In the 1980s, the AI community began to use the term ontology to refer to both a theory of a modeled world and a component of knowledge systems. Some researchers, drawing inspiration from philosophical ontologies, viewed computational ontology as a kind of applied philosophy.

In the early 1990s, the widely cited Web page and paper "Toward Principles for the Design of Ontologies Used for Knowledge Sharing" by Tom Gruber is credited with a deliberate definition of ontology as a technical term in computer science. Gruber introduced the term to mean a specification of a conceptualization:

An ontology is a description (like a formal specification of a program) of the concepts and relationships that can formally exist for an agent or a community of agents. This definition is consistent with the usage of ontology as a set of concept definitions, but more general. And it is a different sense of the word than its use in philosophy.

According to Gruber (1993):

Ontologies are often equated with taxonomic hierarchies of classes, class definitions, and the subsumption relation, but ontologies need not be limited to these forms. Ontologies are also not limited to conservative definitions, that is, definitions in the traditional logic sense that only introduce terminology and do not add any knowledge about the world. To specify a conceptualization, one needs to state axioms that do constrain the possible interpretations for the defined terms.

Components

Contemporary ontologies share many structural similarities, regardless of the language in which they are expressed. As mentioned above, most ontologies describe individuals (instances), classes (concepts), attributes, and relations.

Common components of ontologies include:

Individuals

Instances or objects (the basic or “ground level” objects)

Classes

Sets, collections, concepts, classes in programming, types of objects, or kinds of things

Attributes

Aspects, properties, features, characteristics, or parameters that objects (and classes) can have

Relations

Ways in which classes and individuals can be related to one another

Function terms

Complex structures formed from certain relations that can be used in place of an individual term in a statement

Restrictions

Formally stated descriptions of what must be true in order for some assertion to be accepted as input

Rules

Statements in the form of an if-then (antecedent-consequent) sentence that describe the logical inferences that can be drawn from an assertion in a particular form

Axioms

Assertions (including rules) in a logical form that together comprise the overall theory that the ontology describes in its domain of application. This definition differs from that of "axioms" in generative grammar and formal logic. In those disciplines, axioms include only statements asserted as a priori knowledge. As used here, "axioms" also include the theory derived from axiomatic statements

Events

The changing of attributes or relations

Ontologies are commonly encoded using ontology languages.
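The components listed above can be illustrated with a deliberately tiny, hand-rolled structure in Python. This is not an ontology language; the domain, class names, individuals, and the single rule below are all invented for the example.

# A toy ontology: classes with subclass links, individuals with types,
# attributes, and a binary relation, plus one hard-coded rule.
classes = {"Animal": None, "Dog": "Animal", "Person": None}   # class -> superclass
individuals = {"rex": "Dog", "alice": "Person"}               # individual -> class
attributes = {"rex": {"age": 4}, "alice": {"age": 30}}        # individual -> properties
relations = {("alice", "owns", "rex")}                        # subject, predicate, object

def is_a(individual, cls):
    """True if the individual's class equals cls or is a subclass of it."""
    c = individuals[individual]
    while c is not None:
        if c == cls:
            return True
        c = classes[c]
    return False

# Rule: anyone who owns something that is an Animal counts as a "pet owner".
pet_owners = {s for (s, p, o) in relations if p == "owns" and is_a(o, "Animal")}
print(is_a("rex", "Animal"), pet_owners)   # True {'alice'}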

Types

Domain Ontology

A domain ontology (or domain-specific ontology) represents concepts which belong to part of the world. Particular meanings of terms applied to that domain are provided by the domain ontology. For example, the word card has many different meanings. An ontology about the domain of poker would model the "playing card" meaning of the word, while an ontology about the domain of computer hardware would model the "punched card" and "video card" meanings.

Since domain ontologies represent concepts in very specific and often eclectic ways, they are often incompatible. As systems that rely on domain ontologies expand, they often need to merge domain ontologies into a more general representation. This presents a challenge to the ontology designer. Different ontologies in the same domain arise due to different languages, different intended usage of the ontologies, and different perceptions of the domain (based on cultural background, education, ideology, etc.).

At present, merging ontologies that are not developed from a common foundation ontology is a largely manual process and therefore time-consuming and expensive. Domain ontologies that use the same foundation ontology to provide a set of basic elements with which to specify the meanings of the domain ontology elements can be merged automatically. There are studies on generalized techniques for merging ontologies, but this area of research is still largely theoretical.

Upper Ontology

An upper ontology (or foundation ontology) is a model of the common objects that are generally applicable across a wide range of domain ontologies. It usually employs a core glossary that contains the terms and associated object descriptions as they are used in various relevant domain sets.

There are several standardized upper ontologies available for use, including BFO, BORO method, Dublin Core, GFO, OpenCyc/ResearchCyc, SUMO, the Unified Foundational Ontology (UFO), and DOLCE. WordNet, while considered an upper ontology by some, is not strictly an ontology. However, it has been employed as a linguistic tool for learning domain ontologies.

Hybrid Ontology

The Gellish ontology is an example of a combination of an upper and a domain ontology.

Visualization

A survey of ontology visualization techniques is presented by Katifori et al. An evaluation of the two most established ontology visualization techniques, indented tree and graph, is discussed in the literature. A visual language for ontologies represented in OWL is specified by the Visual Notation for OWL Ontologies (VOWL).

Engineering

Ontology engineering (or ontology building) is a subfield of knowledge engineering. It studies the ontology development process, the ontology life cycle, the methods and methodologies for building ontologies, and the tool suites and languages that support them.

Ontology engineering aims to make explicit the knowledge contained within software applications, and within enterprises and business procedures for a particular domain. Ontology engineering offers a direction towards solving the interoperability problems brought about by semantic obstacles, such as the obstacles related to the definitions of business terms and software classes. Ontology engineering is a set of tasks related to the development of ontologies for a particular domain.

Known challenges with ontology engineering include:

1. Ensuring the ontology is current with domain knowledge and term use

2. Providing sufficient specificity and concept coverage for the domain of interest, thus minimizing the content completeness problem

3. Ensuring the ontology can support its use cases

Learning

Ontology learning is the automatic or semi-automatic creation of ontologies, including extracting a domain's terms from natural language text. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process. Information extraction and text mining methods have been explored to automatically link ontologies to documents, e.g. in the context of the BioCreative challenges.


Languages

An ontology language is a formal language used to encode the ontology. There are a number of such languages for ontologies, both proprietary and standards-based:

• Common Algebraic Specification Language is a general logic-based specification language developed within the IFIP working group 1.3 "Foundations of System Specifications" and functions as a de facto standard in the area of software specifications. It is now being applied to ontology specifications in order to provide modularity and structuring mechanisms.

• Common logic is ISO standard 24707, a specification for a family of ontology languages that can be accurately translated into each other.

• The Cyc project has its own ontology language called CycL, based on first-order predicate calculus with some higher-order extensions.

• DOGMA (Developing Ontology-Grounded Methods and Applications) adopts the fact-oriented modeling approach to provide a higher level of semantic stability.

• The Gellish language includes rules for its own extension and thus integrates an ontology with an ontology language.

• IDEF5 is a software engineering method to develop and maintain usable, accurate, domain ontologies.

• KIF is a syntax for first-order logic that is based on S-expressions. SUO-KIF is a derivative version supporting the Suggested Upper Merged Ontology.

• MOF and UML are standards of the OMG.

• Olog is a category-theoretic approach to ontologies, emphasizing translations between ontologies using functors.

• OBO, a language used for biological and biomedical ontologies.

• OntoUML is an ontologically well-founded profile of UML for conceptual modeling of domain ontologies.

• OWL is a language for making ontological statements, developed as a follow-on from RDF and RDFS, as well as earlier ontology language projects including OIL, DAML, and DAML+OIL. OWL is intended to be used over the World Wide Web, and all its elements (classes, properties and individuals) are defined as RDF resources, and identified by URIs.

• Rule Interchange Format (RIF) and F-Logic combine ontologies and rules.

• Semantic Application Design Language (SADL) captures a subset of the expressiveness of OWL, using an English-like language entered via an Eclipse plug-in.

• SBVR (Semantics of Business Vocabularies and Rules) is an OMG standard adopted in industry to build ontologies.

• TOVE Project, TOronto Virtual Enterprise project


Libraries

The development of ontologies for the Web has led to the emergence of services providing lists or directories of ontologies with search facility. Such directories have been called ontology libraries.

The following are libraries of human-selected ontologies.

• COLORE is an open repository of first-order ontologies in Common Logic with formal links between ontologies in the repository.

• DAML Ontology Library maintains a legacy of ontologies in DAML.

• Ontology Design Patterns portal is a wiki repository of reusable components and practices for ontology design, and also maintains a list of exemplary ontologies.

• Protégé Ontology Library contains a set of OWL, Frame-based and other format ontologies.

• SchemaWeb is a directory of RDF schemata expressed in RDFS, OWL and DAML+OIL.

The following are both directories and search engines. They include crawlers searching the Web for well-formed ontologies.

• OBO Foundry is a suite of interoperable reference ontologies in biology and biomedicine.

• Bioportal (ontology repository of NCBO).

• OntoSelect Ontology Library offers similar services for RDF/S, DAML and OWL ontologies.

• Ontaria is a “searchable and browsable directory of semantic web data” with a focus on RDF vocabularies with OWL ontologies. (NB Project “on hold” since 2004).

• Swoogle is a directory and search engine for all RDF resources available on the Web, including ontologies.

• Open Ontology Repository initiative.

• ROMULUS is a foundational ontology repository aimed at improving semantic interoperability. Currently there are three foundational ontologies in the repository: DOLCE, BFO and GFO.

Examples of Applications

In general, ontologies can be used beneficially in:

• enterprise applications. A more concrete example is SAPPHIRE (Health care), or Situational Awareness and Preparedness for Public Health Incidences and Reasoning Engines, which is a semantics-based health information system capable of tracking and evaluating situations and occurrences that may affect public health.

• geographic information systems, which bring together data from different sources and therefore benefit from ontological metadata that helps to connect the semantics of the data.


Statistical Model

A statistical model is a class of mathematical model, which embodies a set of assumptions concerning the generation of some sample data, and similar data from a larger population. A statistical model represents, often in considerably idealized form, the data-generating process. The assumptions embodied by a statistical model describe a set of probability distributions, some of which are assumed to adequately approximate the distribution from which a particular data set is sampled. The probability distributions inherent in statistical models are what distinguishes statistical models from other, non-statistical, mathematical models. A statistical model is usually specified by mathematical equations that relate one or more random variables and possibly other non-random variables. As such, "a model is a formal representation of a theory" (Herman Adèr quoting Kenneth Bollen). All statistical hypothesis tests and all statistical estimators are derived from statistical models. More generally, statistical models are part of the foundation of statistical inference.

Formal Definition

In mathematical terms, a statistical model is usually thought of as a pair $(S, \mathcal{P})$, where $S$ is the set of possible observations, i.e. the sample space, and $\mathcal{P}$ is a set of probability distributions on $S$.

The intuition behind this definition is as follows. It is assumed that there is a "true" probability distribution induced by the process that generates the observed data. We choose $\mathcal{P}$ to represent a set (of distributions) which contains a distribution that adequately approximates the true distribution. Note that we do not require that $\mathcal{P}$ contains the true distribution, and in practice that is rarely the case. Indeed, as Burnham & Anderson state, "A model is a simplification or approximation of reality and hence will not reflect all of reality", whence the saying "all models are wrong".

The set $\mathcal{P}$ is almost always parameterized: $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$. The set $\Theta$ defines the parameters of the model. A parameterization is generally required to have distinct parameter values give rise to distinct distributions, i.e. $P_{\theta_1} = P_{\theta_2} \Rightarrow \theta_1 = \theta_2$ must hold (in other words, the map must be injective). A parameterization that meets the condition is said to be identifiable.

An Example

Height and age are each probabilistically distributed over humans. They are stochastically related: when we know that a person is of age 10, this influences the chance of the person being 5 feet tall. We could formalize that relationship in a linear regression model with the following form:

$$\text{height}_i = b_0 + b_1 \, \text{age}_i + \varepsilon_i,$$

where $b_0$ is the intercept, $b_1$ is a parameter that age is multiplied by to get a prediction of height, $\varepsilon_i$ is the error term, and $i$ identifies the person. This implies that height is predicted by age, with some error.

An admissible model must be consistent with all the data points. Thus, the straight line ($\text{height}_i = b_0 + b_1 \, \text{age}_i$) is not a model of the data. The line cannot be a model, unless it exactly fits all the data points, i.e. all the data points lie perfectly on a straight line. The error term, $\varepsilon_i$, must be included in the model, so that the model is consistent with all the data points.


To do statistical inference, we would first need to assume some probability distributions for the $\varepsilon_i$. For instance, we might assume that the $\varepsilon_i$ distributions are i.i.d. Gaussian, with zero mean. In this instance, the model would have 3 parameters: $b_0$, $b_1$, and the variance of the Gaussian distribution.

We can formally specify the model in the form $(S, \mathcal{P})$ as follows. The sample space, $S$, of our model comprises the set of all possible pairs (age, height). Each possible value of $\theta = (b_0, b_1, \sigma^2)$ determines a distribution on $S$; denote that distribution by $P_\theta$. If $\Theta$ is the set of all possible values of $\theta$, then $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$. (The parameterization is identifiable, and this is easy to check.)

In this example, the model is determined by (1) specifying $S$ and (2) making some assumptions relevant to $\mathcal{P}$. There are two assumptions: that height can be approximated by a linear function of age, and that errors in the approximation are distributed as i.i.d. Gaussian. The assumptions are sufficient to specify $\mathcal{P}$, as they are required to do.
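The height-age example can also be fitted numerically. The sketch below uses NumPy and invented data points to estimate the three parameters $b_0$, $b_1$, and $\sigma^2$ by ordinary least squares.

# Fit height_i = b0 + b1*age_i + eps_i by least squares and estimate
# the error variance; the data points are invented for illustration.
import numpy as np

age    = np.array([4.0, 6.0, 8.0, 10.0, 12.0, 14.0])
height = np.array([100.0, 113.0, 127.0, 138.0, 149.0, 160.0])  # centimetres

b1, b0 = np.polyfit(age, height, deg=1)        # slope, intercept
residuals = height - (b0 + b1 * age)
sigma2 = residuals.var(ddof=2)                 # 2 fitted parameters used up

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, sigma^2 = {sigma2:.3f}")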

General Remarks

A statistical model is a special class of mathematical model. What distinguishes a statistical model from other mathematical models is that a statistical model is non-deterministic. Thus, in a statistical model specified via mathematical equations, some of the variables do not have specific values, but instead have probability distributions; i.e. some of the variables are stochastic. In the example above, $\varepsilon$ is a stochastic variable; without that variable, the model would be deterministic. Statistical models are often used even when the physical process being modeled is deterministic. For instance, coin tossing is, in principle, a deterministic process; yet it is commonly modeled as stochastic (via a Bernoulli process).

There are three purposes for a statistical model, according to Konishi & Kitagawa:

• Predictions

• Extraction of information

• Description of stochastic structures

Dimension of A Model

Suppose that we have a statistical model $(S, \mathcal{P})$ with $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$. The model is said to be parametric if $\Theta$ has a finite dimension. In notation, we write $\Theta \subseteq \mathbb{R}^d$, where $d$ is a positive integer ($\mathbb{R}$ denotes the real numbers; other sets can be used, in principle). Here, $d$ is called the dimension of the model.

As an example, if we assume that data arise from a univariate Gaussian distribution, then we are assuming that

$$\mathcal{P} = \left\{ P_{\mu,\sigma}(x) \equiv \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) : \mu \in \mathbb{R}, \; \sigma > 0 \right\}.$$

In this example, the dimension, d, equals 2.

As another example, suppose that the data consist of points (x, y) that we assume are distributed according to a straight line with i.i.d. Gaussian residuals (with zero mean). Then the dimension of the statistical model is 3: the intercept of the line, the slope of the line, and the variance of the distribution of the residuals. (Note that in geometry, a straight line has dimension 1.)

A statistical model is nonparametric if the parameter set $\Theta$ is infinite dimensional. A statistical model is semiparametric if it has both finite-dimensional and infinite-dimensional parameters. Formally, if $d$ is the dimension of $\Theta$ and $n$ is the number of samples, both semiparametric and nonparametric models have $d \to \infty$ as $n \to \infty$. If $d/n \to 0$ as $n \to \infty$, then the model is semiparametric; otherwise, the model is nonparametric.

Parametric models are by far the most commonly used statistical models. Regarding semiparametric and nonparametric models, Sir David Cox has said: "These typically involve fewer assumptions of structure and distributional form but usually contain strong assumptions about independencies".

Nested Models

Two statistical models are nested if the first model can be transformed into the second model by imposing constraints on the parameters of the first model. For example, the set of all Gaussian distributions has, nested within it, the set of zero-mean Gaussian distributions: we constrain the mean in the set of all Gaussian distributions to get the zero-mean distributions.

In that example, the first model has a higher dimension than the second model (the zero-mean model has dimension 1). Such is usually, but not always, the case. As a different example, the set of positive-mean Gaussian distributions, which has dimension 2, is nested within the set of all Gaussian distributions.

Comparing Models

It is assumed that there is a “true” probability distribution underlying the observed data, induced by the process that generated the data. The main goal of model selection is to make statements about which elements of  are most likely to adequately approximate the true distribution.

Models can be compared to each other by exploratory data analysis or confirmatory data analysis. In exploratory analysis, a variety of models are formulated and an assessment is performed of how well each one describes the data. In confirmatory analysis, a previously formulated model or models are compared to the data. Common criteria for comparing models include R², Bayes factor, and the likelihood-ratio test together with its generalization, relative likelihood.
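For nested models, the likelihood-ratio test mentioned above can be carried out in a few lines. The sketch below uses simulated data and SciPy to compare a zero-mean Gaussian model (the constrained model) against a free-mean Gaussian model; the data and seed are arbitrary.

# Likelihood-ratio test: zero-mean Gaussian (null) nested in free-mean Gaussian.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.0, size=200)   # simulated data

def gauss_loglik(data, mu, sigma):
    return stats.norm(mu, sigma).logpdf(data).sum()

# Maximum-likelihood fits under each model (MLE of sigma uses ddof=0).
ll_full = gauss_loglik(x, x.mean(), x.std())
ll_null = gauss_loglik(x, 0.0, np.sqrt(np.mean(x**2)))

lr = 2 * (ll_full - ll_null)                   # asymptotically chi-squared, df = 1
p_value = stats.chi2.sf(lr, df=1)
print(f"LR = {lr:.2f}, p = {p_value:.4g}")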

Konishi & Kitagawa state: "The majority of the problems in statistical inference can be considered to be problems related to statistical modeling. They are typically formulated as comparisons of several statistical models." Relatedly, Sir David Cox has said, "How [the] translation from subject-matter problem to statistical model is done is often the most critical part of an analysis".


4 Theorems in Approximation Theory

Approximation theory concerns itself with how functions can be approximated by simpler functions. The other theorems explained in this section are the Stone-Weierstrass theorem, Fejér's theorem, Bernstein's theorem and Favard's theorem. The chapter strategically encompasses and incorporates the theorems used in approximation theory, providing a complete understanding.

Approximation Theory

In mathematics, approximation theory is concerned with how functions can best be approximated with simpler functions, and with quantitatively characterizing the errors introduced thereby. Note that what is meant by best and simpler will depend on the application.

A closely related topic is the approximation of functions by generalized Fourier series, that is, approximations based upon summation of a series of terms based upon orthogonal polynomials.

One problem of particular interest is that of approximating a function in a computer mathematical library, using operations that can be performed on the computer or calculator (e.g. addition and multiplication), such that the result is as close to the actual function as possible. This is typically done with polynomial or rational (ratio of polynomials) approximations.

The objective is to make the approximation as close as possible to the actual function, typically with an accuracy close to that of the underlying computer's floating point arithmetic. This is accomplished by using a polynomial of high degree, and/or narrowing the domain over which the polynomial has to approximate the function. Narrowing the domain can often be done through the use of various addition or scaling formulas for the function being approximated. Modern mathematical libraries often reduce the domain into many tiny segments and use a low-degree polynomial for each segment.
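Domain narrowing can be sketched for exp: write $x = k \ln 2 + r$ with $|r| \le \tfrac{1}{2}\ln 2$, approximate $\exp(r)$ by a low-degree polynomial on that small interval, and scale the result by $2^k$. The coefficients below are simply a truncated Taylor series rather than a tuned minimax polynomial, so the sketch only illustrates the structure of such a library routine.

# Range reduction for exp(x): x = k*ln2 + r, exp(x) = 2**k * exp(r),
# with exp(r) approximated by a degree-6 Taylor polynomial on |r| <= ln2/2.
import math

COEFFS = [1 / math.factorial(i) for i in range(7)]   # 1, 1, 1/2, ..., 1/720

def exp_approx(x):
    k = round(x / math.log(2))
    r = x - k * math.log(2)                          # |r| <= ln(2)/2, about 0.347
    poly = 0.0
    for c in reversed(COEFFS):                       # Horner evaluation
        poly = poly * r + c
    return math.ldexp(poly, k)                       # poly * 2**k

print(exp_approx(3.7), math.exp(3.7))                # agree to roughly 7 digits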

Left: Error between the optimal polynomial and log(x) (red), and between the Chebyshev approximation and log(x) (blue), over the interval [2, 4]. Vertical divisions are $10^{-5}$. Maximum error for the optimal polynomial is $6.07 \times 10^{-5}$.

Right: Error between the optimal polynomial and exp(x) (red), and between the Chebyshev approximation and exp(x) (blue), over the interval [−1, 1]. Vertical divisions are $10^{-4}$. Maximum error for the optimal polynomial is $5.47 \times 10^{-4}$.


Optimal Polynomials

Once the domain (typically an interval) and degree of the polynomial are chosen, the polynomial itself is chosen in such a way as to minimize the worst-case error. That is, the goal is to minimize the maximum value of $|P(x) - f(x)|$, where $P(x)$ is the approximating polynomial, $f(x)$ is the actual function, and $x$ varies over the chosen interval. For well-behaved functions, there exists an $N$th-degree polynomial that will lead to an error curve that oscillates back and forth between $+\varepsilon$ and $-\varepsilon$ a total of $N+2$ times, giving a worst-case error of $\varepsilon$. It is seen that an $N$th-degree polynomial can interpolate $N+1$ points in a curve. Such a polynomial is always optimal. It is possible to make contrived functions $f(x)$ for which no such polynomial exists, but these occur rarely in practice.

For example, the graphs shown to the right show the error in approximating log(x) and exp(x) for $N = 4$. The red curves, for the optimal polynomial, are level, that is, they oscillate between $+\varepsilon$ and $-\varepsilon$ exactly. Note that, in each case, the number of extrema is $N+2$, that is, 6. Two of the extrema are at the end points of the interval, at the left and right edges of the graphs.

Error P(x) − f(x) for level polynomial (red), and for purported better polynomial (blue)

To prove this is true in general, suppose $P$ is a polynomial of degree $N$ having the property described, that is, it gives rise to an error function that has $N + 2$ extrema, of alternating signs and equal magnitudes. The red graph to the right shows what this error function might look like for $N = 4$. Suppose $Q(x)$ (whose error function is shown in blue to the right) is another $N$th-degree polynomial that is a better approximation to $f$ than $P$. In particular, $Q$ is closer to $f$ than $P$ for each value $x_i$ where an extremum of $P - f$ occurs, so

$$|Q(x_i) - f(x_i)| < |P(x_i) - f(x_i)|.$$

When a maximum of $P - f$ occurs at $x_i$, then

$$Q(x_i) - f(x_i) \le |Q(x_i) - f(x_i)| < |P(x_i) - f(x_i)| = P(x_i) - f(x_i),$$

and when a minimum of $P - f$ occurs at $x_i$, then

$$f(x_i) - Q(x_i) \le |Q(x_i) - f(x_i)| < |P(x_i) - f(x_i)| = f(x_i) - P(x_i).$$


So, as can be seen in the graph, $[P(x) - f(x)] - [Q(x) - f(x)]$ must alternate in sign for the $N + 2$ values of $x_i$. But $[P(x) - f(x)] - [Q(x) - f(x)]$ reduces to $P(x) - Q(x)$, which is a polynomial of degree $N$. This function changes sign at least $N+1$ times so, by the intermediate value theorem, it has $N+1$ zeroes, which is impossible for a polynomial of degree $N$.

Chebyshev Approximation

One can obtain polynomials very close to the optimal one by expanding the given function in terms of Chebyshev polynomials and then cutting off the expansion at the desired degree. This is similar to the Fourier analysis of the function, using the Chebyshev polynomials instead of the usual trigonometric functions.

If one calculates the coefficients in the Chebyshev expansion for a function:

$$f(x) \sim \sum_{i=0}^{\infty} c_i T_i(x)$$

and then cuts off the series after the $T_N$ term, one gets an $N$th-degree polynomial approximating $f(x)$.

The reason this polynomial is nearly optimal is that, for functions with rapidly converging power series, if the series is cut off after some term, the total error arising from the cutoff is close to the first term after the cutoff. That is, the first term after the cutoff dominates all later terms. The same is true if the expansion is in terms of Chebyshev polynomials. If a Chebyshev expansion is cut off after $T_N$, the error will take a form close to a multiple of $T_{N+1}$. The Chebyshev polynomials have the property that they are level: they oscillate between +1 and −1 in the interval [−1, 1]. $T_{N+1}$ has $N+2$ level extrema. This means that the error between $f(x)$ and its Chebyshev expansion out to $T_N$ is close to a level function with $N+2$ extrema, so it is close to the optimal $N$th-degree polynomial.

In the graphs above, note that the blue error function is sometimes better than (inside of) the red function, but sometimes worse, meaning that it is not quite the optimal polynomial. Note also that the discrepancy is less serious for the exp function, which has an extremely rapidly converging power series, than for the log function.

Chebyshev approximation is the basis for Clenshaw–Curtis quadrature, a numerical integration technique.

Remez’s Algorithm

The Remez algorithm (sometimes spelled Remes) is used to produce an optimal polynomial P(x) approximating a given function f(x) over a given interval. It is an iterative algorithm that converges to a polynomial that has an error function with N+2 level extrema. By the theorem above, that polynomial is optimal.

Remez’s algorithm uses the fact that one can construct an Nth-degree polynomial that leads to level and alternating error values, given N+2 test points.

Given N+2 test points x_1, x_2, ..., x_{N+2} (where x_1 and x_{N+2} are presumably the end points of the interval of approximation), these equations need to be solved:


P(x_1) − f(x_1) = +ε

P(x_2) − f(x_2) = −ε

P(x_3) − f(x_3) = +ε

  ⋮

P(x_{N+2}) − f(x_{N+2}) = ±ε.

The right-hand sides alternate in sign.

That is,

P_0 + P_1 x_1 + P_2 x_1^2 + P_3 x_1^3 + … + P_N x_1^N − f(x_1) = +ε

P_0 + P_1 x_2 + P_2 x_2^2 + P_3 x_2^3 + … + P_N x_2^N − f(x_2) = −ε

Since x_1, ..., x_{N+2} were given, all of their powers are known, and f(x_1), ..., f(x_{N+2}) are also known.

That means that the above equations are just N+2 linear equations in the N+2 variables P_0, P_1, ..., P_N, and ε. Given the test points x_1, ..., x_{N+2}, one can solve this system to get the polynomial P and the number ε.

The graph below shows an example of this, producing a fourth-degree polynomial approximating e^x over [−1, 1]. The test points were set at −1, −0.7, −0.1, +0.4, +0.9, and 1. Those values are shown in green. The resultant value of ε is 4.43 × 10^−4.

Error of the polynomial produced by the first step of Remez's algorithm, approximating e^x over the interval [−1, 1]. Vertical divisions are 10^−4.

Note that the error graph does indeed take on the values ±ε at the six test points, including the end points, but that those points are not extrema. If the four interior test points had been extrema (that is, the function P(x) − f(x) had maxima or minima there), the polynomial would be optimal.
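For illustration, the linear-system step described above can be written out in a few lines of MATLAB/Octave, here using the six test points quoted for the exp example; the variable names are purely illustrative.

f = @exp;                          % function being approximated, as in the example above
x = [-1; -0.7; -0.1; 0.4; 0.9; 1]; % the N+2 = 6 test points from the text
N = numel(x) - 2;                  % polynomial degree
V = x.^(0:N);                      % columns 1, x, x^2, ..., x^N evaluated at the test points
s = (-1).^(0:N+1)';                % alternating signs +1, -1, +1, ...
sol = [V, -s] \ f(x);              % solve P(x_i) - s_i*eps = f(x_i) for [P_0 ... P_N, eps]
P = sol(1:N+1);                    % polynomial coefficients, lowest degree first
epsLevel = sol(end);               % level error; the text reports roughly 4.43e-4 for these points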


The second step of Remez's algorithm consists of moving the test points to the approximate locations where the error function had its actual local maxima or minima. For example, one can tell from looking at the graph that the point at −0.1 should have been at about −0.28. The way to do this in the algorithm is to use a single round of Newton's method. Since one knows the first and second derivatives of P(x) − f(x), one can calculate approximately how far a test point has to be moved so that the derivative will be zero.

Calculating the derivatives of a polynomial is straightforward. One must also be able to calculate the first and second derivatives of f(x). Remez's algorithm requires an ability to calculate f′(x) and f″(x) to extremely high precision. The entire algorithm must be carried out to higher precision than the desired precision of the result.

After moving the test points, the linear equation part is repeated, getting a new polynomial, and Newton's method is used again to move the test points again. This sequence is continued until the result converges to the desired accuracy. The algorithm converges very rapidly. Convergence is quadratic for well-behaved functions: if the test points are within 10^−15 of the correct result, they will be approximately within 10^−30 of the correct result after the next round.

Remez's algorithm is typically started by choosing the extrema of the Chebyshev polynomial T_{N+1} as the initial points, since the final error function will be similar to that polynomial.

Stone–Weierstrass Theorem

In mathematical analysis, the Weierstrass approximation theorem states that every continuous function defined on a closed interval [a, b] can be uniformly approximated as closely as desired by a polynomial function. Because polynomials are among the simplest functions, and because computers can directly evaluate polynomials, this theorem has both practical and theoretical relevance, especially in polynomial interpolation. The original version of this result was established by Karl Weierstrass in 1885 using the Weierstrass transform.

Marshall H. Stone considerably generalized the theorem (Stone 1937) and simplified the proof (Stone 1948). His result is known as the Stone–Weierstrass theorem. The Stone–Weierstrass theorem generalizes the Weierstrass approximation theorem in two directions: instead of the real interval [a, b], an arbitrary compact Hausdorff space X is considered, and instead of the algebra of polynomial functions, approximation with elements from more general subalgebras of C(X) is investigated. The Stone–Weierstrass theorem is a vital result in the study of the algebra of continuous functions on a compact Hausdorff space.

Further, there is a generalization of the Stone–Weierstrass theorem to noncompact Tychonoff spaces, namely, any continuous function on a Tychonoff space can be approximated uniformly on compact sets by algebras of the type appearing in the Stone–Weierstrass theorem and described below.

A different generalization of Weierstrass' original theorem is Mergelyan's theorem, which generalizes it to functions defined on certain subsets of the complex plane.


Weierstrass Approximation Theorem

The statement of the approximation theorem as originally discovered by Weierstrass is as follows:

Weierstrass Approximation Theorem. Suppose f is a continuous real-valued function defined on the real interval [a, b]. For every ε > 0, there exists a polynomial p(x) such that for all x in [a, b], we have | f (x) − p(x)| < ε, or equivalently, the supremum norm || f − p|| < ε.

A constructive proof of this theorem using Bernstein polynomials is outlined on that page.

Applications

As a consequence of the Weierstrass approximation theorem, one can show that the space C[a, b] is separable: the polynomial functions are dense, and each polynomial function can be uniformly approximated by one with rational coefficients; there are only countably many polynomials with rational coefficients. Since C[a, b] is Hausdorff and separable it follows that C[a, b] has cardinality equal to 2^{ℵ_0} — the same as the cardinality of the reals. (Remark: this cardinality result also follows from the fact that a continuous function on the reals is uniquely determined by its restriction to the rationals.)

Stone–Weierstrass Theorem, Real Version

The set C[a, b] of continuous real-valued functions on [a, b], together with the supremum norm || f || = sup_{a ≤ x ≤ b} | f (x)|, is a Banach algebra (i.e. an associative algebra and a Banach space such that || fg|| ≤ || f ||·||g|| for all f, g). The set of all polynomial functions forms a subalgebra of C[a, b] (i.e. a vector subspace of C[a, b] that is closed under multiplication of functions), and the content of the Weierstrass approximation theorem is that this subalgebra is dense in C[a, b].

Stone starts with an arbitrary compact Hausdorff space X and considers the algebra C(X, R) of real-valued continuous functions on X, with the topology of uniform convergence. He wants to find subalgebras of C(X, R) which are dense. It turns out that the crucial property that a subalgebra must satisfy is that it separates points: a set A of functions defined on X is said to separate points if, for every two different points x and y in X there exists a function p in A with p(x) ≠ p(y). Now we may state:

Stone–Weierstrass Theorem (real numbers). Suppose X is a compact Hausdorff space and A is a subalgebra of C(X, R) which contains a non-zero constant function. Then A is dense in C(X, R) if and only if it separates points.

This implies Weierstrass’ original statement since the polynomials on [a, b] form a subalgebra of C[a, b] which contains the constants and separates points.

Locally Compact Version

A version of the Stone–Weierstrass theorem is also true when X is only locally compact. Let C_0(X, R) be the space of real-valued continuous functions on X which vanish at infinity; that is, a continuous function f is in C_0(X, R) if, for every ε > 0, there exists a compact set K ⊂ X such that | f | < ε on X \ K. Again, C_0(X, R) is a Banach algebra with the supremum norm. A subalgebra A of C_0(X, R) is said to vanish nowhere if not all of the elements of A simultaneously vanish at a point; that is, for


every x in X, there is some f in A such that f (x) ≠ 0. The theorem generalizes as follows:

Stone–Weierstrass Theorem (locally compact spaces). Suppose X is a locally compact Hausdorff space and A is a subalgebra of C_0(X, R). Then A is dense in C_0(X, R) (given the topology of uniform convergence) if and only if it separates points and vanishes nowhere.

This version clearly implies the previous version in the case when X is compact, since in that case C_0(X, R) = C(X, R). There are also more general versions of the Stone–Weierstrass theorem that weaken the assumption of local compactness.

Applications

The Stone–Weierstrass theorem can be used to prove the following two statements which go beyond Weierstrass's result.

• If f is a continuous real-valued function defined on the set [a, b] × [c, d] and ε > 0, then there exists a polynomial function p in two variables such that | f (x, y) − p(x, y) | < ε for all x in [a, b] and y in [c, d].

• If X and Y are two compact Hausdorff spaces and f : X × Y → R is a continuous function, then for every ε > 0 there exist n > 0 and continuous functions f_1, ..., f_n on X and continuous functions g_1, ..., g_n on Y such that || f − ∑ f_i g_i || < ε.

The theorem has many other applications to analysis, including:

• Fourier series: The set of linear combinations of functions e_n(x) = e^{2πinx}, n ∈ Z, is dense in C([0, 1]/{0, 1}), where we identify the endpoints of the interval [0, 1] to obtain a circle. An important consequence of this is that the e_n are an orthonormal basis of the space L^2([0, 1]) of square-integrable functions on [0, 1].

Stone–Weierstrass Theorem, Complex Version

Slightly more general is the following theorem, where we consider the algebra C(X, C) of complex-valued continuous functions on the compact space X, again with the topology of uniform convergence. This is a C*-algebra with the *-operation given by pointwise complex conjugation.

Stone–Weierstrass Theorem (complex numbers). Let X be a compact Hausdorff space and let S be a subset of C(X, C) which separates points. Then the complex unital *-algebra generated by S is dense in C(X, C).

The complex unital *-algebra generated by S consists of all those functions that can be obtained from the elements of S by throwing in the constant function 1 and adding them, multiplying them, conjugating them, or multiplying them with complex scalars, and repeating finitely many times.

This theorem implies the real version, because if a sequence of complex-valued functions uniformly approximates a given function f, then the real parts of those functions uniformly approximate the real part of f. As in the real case, an analog of this theorem is true for locally compact Hausdorff spaces.


Stone–Weierstrass Theorem, Quaternion Version

Following John C. Holladay (1957): consider the algebra C(X, H) of quaternion-valued continuous functions on the compact space X, again with the topology of uniform convergence. If a quaternion q is written in the form q = a + ib + jc + kd, then the scalar part a is the real number (q − iqi − jqj − kqk)/4. Likewise, being the scalar parts of −qi, −qj and −qk respectively, b, c and d are the real numbers (−qi − iq + jqk − kqj)/4, (−qj − iqk − jq + kqi)/4 and (−qk + iqj − jqi − kq)/4. Then we may state:

Stone–Weierstrass Theorem (quaternion numbers). Suppose X is a compact Hausdorff space and A is a subalgebra of C(X, H) which contains a non-zero constant function. Then A is dense in C(X, H) if and only if it separates points.

Stone-Weierstrass Theorem, C*-Algebra Version

The space of complex-valued continuous functions on a compact Hausdorff space X, i.e. C(X, C), is the canonical example of a unital commutative C*-algebra A. The space X may be viewed as the space of pure states on A, with the weak-* topology. Following the above cue, a non-commutative extension of the Stone–Weierstrass theorem, which has remained unsolved, is as follows:

Conjecture. If a unital C*-algebra A has a C*-subalgebra B which separates the pure states of A, then A = B.

In 1960, Jim Glimm proved a weaker version of the above conjecture.

Stone-Weierstrass theorem (C*-algebras). If a unital C*-algebra A has a C*-subalgebra B which separates the pure state space (i.e. the weak-* closure of the pure states) of A, then A = B.

Lattice Versions

Let X be a compact Hausdorff space. Stone's original proof of the theorem used the idea of lattices in C(X, R). A subset L of C(X, R) is called a lattice if for any two elements f, g ∈ L, the functions max{ f, g}, min{ f, g} also belong to L. The lattice version of the Stone–Weierstrass theorem states:

Stone–Weierstrass Theorem (lattices). Suppose X is a compact Hausdorff space with at least two points and L is a lattice in C(X, R) with the property that for any two distinct elements x and y of X and any two real numbers a and b there exists an element f ∈ L with f (x) = a and f (y) = b. Then L is dense in C(X, R).

The above versions of Stone–Weierstrass can be proven from this version once one realizes that the lattice property can also be formulated using the absolute value | f |, which in turn can be approximated by polynomials in f. A variant of the theorem applies to linear subspaces of C(X, R) closed under max (Hewitt & Stromberg 1965, Theorem 7.29):

Stone–Weierstrass Theorem. Suppose X is a compact Hausdorff space and B is a family of functions in C(X, R) such that


1. B separates points.

2. B contains the constant function 1.

3. If f ∈ B then af ∈ B for all constants a ∈ R.

4. If f, g ∈ B, then f + g, max{ f, g} ∈ B.

Then B is dense in C(X, R).

More precise information is available:

Suppose X is a compact Hausdorff space with at least two points and L is a lattice in C(X, R). The function φ ∈ C(X, R) belongs to the closure of L if and only if for each pair of distinct points x and y in X and for each ε > 0 there exists some f ∈ L for which | f (x) − φ(x)| < ε and | f (y) − φ(y)| < ε.

Bishop's Theorem

Another generalization of the Stone–Weierstrass theorem is due to Errett Bishop. Bishop's theorem is as follows (Bishop 1961):

Let A be a closed subalgebra of the Banach space C(X, C) of continuous complex-valued functions on a compact Hausdorff space X. Suppose that f ∈ C(X, C) has the following property:

f |_S ∈ A|_S for every maximal set S ⊂ X such that all real functions of A|_S are constant. Then f ∈ A.

Glicksberg (1962) gives a short proof of Bishop's theorem using the Krein–Milman theorem in an essential way, as well as the Hahn–Banach theorem: the process of Louis de Branges (1959).

Nachbin’s Theorem

Nachbin's theorem gives an analog of the Stone–Weierstrass theorem for algebras of complex-valued smooth functions on a smooth manifold (Nachbin 1949). Nachbin's theorem is as follows (Llavona 1986):

Let A be a subalgebra of the algebra C∞(M) of smooth functions on a finite-dimensional smooth manifold M. Suppose that A separates the points of M and also separates the tangent vectors of M: for each point m ∈ M and tangent vector v in the tangent space at m, there is an f ∈ A such that df(m)(v) ≠ 0. Then A is dense in C∞(M).

Fejér's Theorem

In mathematics, Fejér's theorem, named for Hungarian mathematician Lipót Fejér, states that if f : R → C is a continuous function with period 2π, then the sequence (σ_n) of Cesàro means of the sequence (s_n) of partial sums of the Fourier series of f converges uniformly to f on [−π, π]. Explicitly,

s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx},

where

c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, e^{-ikt}\, dt,

and

\sigma_n(x) = \frac{1}{n} \sum_{k=0}^{n-1} s_k(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x-t)\, F_n(t)\, dt,

with F_n being the nth-order Fejér kernel.

A more general form of the theorem applies to functions which are not necessarily continuous (Zygmund 1968, Theorem III.3.4). Suppose that f is in L^1(−π, π). If the left and right limits f(x_0 ± 0) of f(x) exist at x_0, or if both limits are infinite of the same sign, then

\sigma_n(x_0) \to \tfrac{1}{2}\big( f(x_0 + 0) + f(x_0 - 0) \big).

Existence or divergence to infinity of the Cesàro mean is also implied. By a theorem of Marcel Riesz, Fejér's theorem holds precisely as stated if the (C, 1) mean σ_n is replaced with the (C, α) mean of the Fourier series (Zygmund 1968, Theorem III.5.1).

Bernstein’s Theorem (Approximation Theory)

In approximation theory, Bernstein’s theorem is a converse to Jackson’s theorem. The first results of this type were proved by Sergei Bernstein in 1912.

For approximation by trigonometric polynomials, the result is as follows:

Let f : [0, 2π] → C be a 2π-periodic function, and assume r is a natural number and 0 < α < 1. If there exists a number C(f) > 0 and a sequence of trigonometric polynomials {P_n}_{n ≥ n_0} such that

\deg P_n = n, \qquad \sup_{0 \le x \le 2\pi} | f(x) - P_n(x) | \le \frac{C(f)}{n^{r+\alpha}},

then f = P_{n_0} + φ, where φ has a bounded r-th derivative which is α-Hölder continuous.


Favard’s Theorem

In mathematics, Favard's theorem, also called the Shohat–Favard theorem, states that a sequence of polynomials satisfying a suitable 3-term recurrence relation is a sequence of orthogonal polynomials. The theorem was introduced in the theory of orthogonal polynomials by Favard (1935) and Shohat (1938), though essentially the same theorem was used by Stieltjes in the theory of continued fractions many years before Favard's paper, and was rediscovered several times by other authors before Favard's work.

Statement

Suppose that y_0 = 1, y_1, ... is a sequence of polynomials where y_n has degree n. If this is a sequence of orthogonal polynomials for some positive weight function then it satisfies a 3-term recurrence relation. Favard's theorem is roughly a converse of this, and states that if these polynomials satisfy a 3-term recurrence relation of the form

y_{n+1} = (x - c_n)\, y_n - d_n\, y_{n-1}

for some numbers c_n and d_n, then the polynomials y_n form an orthogonal sequence for some linear functional Λ with Λ(1) = 1; in other words Λ(y_m y_n) = 0 if m ≠ n.

The linear functional Λ is unique, and is given by Λ(1) = 1, Λ(y_n) = 0 if n > 0.

The functional Λ satisfies Λ(y_n^2) = d_n Λ(y_{n−1}^2), which implies that Λ is positive definite if (and only if) the numbers c_n are real and the numbers d_n are positive.

Müntz–Szász Theorem

The Müntz–Szász theorem is a basic result of approximation theory, proved by Herman Müntz in 1914 and Otto Szász (1884–1952) in 1916. Roughly speaking, the theorem shows to what extent the Weierstrass theorem on polynomial approximation can have holes dug into it, by restricting certain coefficients in the polynomials to be zero. The form of the result had been conjectured by Sergei Bernstein before it was proved.

The theorem, in a special case, states that a necessary and sufficient condition for the monomials

x^n

to span a dense subset of the Banach space C[a, b] of all continuous functions with complex number values on the closed interval [a, b] with a > 0, with the uniform norm, when the n form a subset S of the natural numbers, is that the sum

\sum_{n \in S} \frac{1}{n}

of the reciprocals, taken over S, should diverge, i.e. S is a large set. For an interval [0, b], the constant functions are necessary: assuming therefore that 0 is in S, the condition on the other exponents is as before.

More generally, one can take exponents from any strictly increasing sequence of positive real numbers, and the same result holds. Szász showed that for complex number exponents, the same condition applied to the sequence of real parts.

There are also versions for the L^p spaces.

References • K.-G. Steffens, “The History of Approximation Theory: From Euler to Bernstein,” Birkhauser, Boston 2006 ISBN 0-8176-4353-2.

• Jan Brinkhuis & Vladimir Tikhomirov (2005) Optimization: Insights and Applications, Princeton University Press ISBN 978-0-691-10287-0 MR 2168305.

• Llavona, José G. (1986), Approximation of continuously differentiable functions, Amsterdam: North-Holland, ISBN 9780080872414

• Subbotin, Yu. N. (2001), “Favard Theorem”, in Hazewinkel, Michiel, Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4

Methods and Techniques of Numerical Analysis

The methods and techniques of numerical analysis treated in this chapter include series acceleration, minimum polynomial extrapolation, Richardson extrapolation, the Shanks transformation and interpolation. Series acceleration improves the rate of convergence of a series; it is also used to obtain a variety of identities on special functions. The aspects elucidated in this chapter are of vital importance and provide a better understanding of numerical analysis.

Numerical Methods for Ordinary Differential Equations

Numerical methods for ordinary differential equations are methods used to find numerical approximations to the solutions of ordinary differential equations (ODEs). Their use is also known as "numerical integration", although this term is sometimes taken to mean the computation of integrals.

Illustration of numerical integration for the differential equation y′ = y, y(0) = 1. Blue: the Euler method; green: the midpoint method; red: the exact solution y = e^t. The step size is h = 1.0.

Many differential equations cannot be solved using symbolic computation ("analysis"). For practical purposes, however – such as in engineering – a numeric approximation to the solution is often sufficient. The algorithms studied here can be used to compute such an approximation. An alternative method is to use techniques from calculus to obtain a series expansion of the solution.

Ordinary differential equations occur in many scientific disciplines, for instance in physics, chemistry, biology, and economics. In addition, some methods in numerical partial differential equations convert the partial differential equation into an ordinary differential equation, which must then be solved.


The same illustration for h = 0.25. It is seen that the midpoint method converges faster than the Euler method.

The Problem

A first-order differential equation is an Initial value problem (IVP) of the form,

y'(t) = f(t, y(t)), \qquad y(t_0) = y_0, \qquad (1)

where f is a function that maps [t_0, ∞) × R^d to R^d, and the initial condition y_0 ∈ R^d is a given vector. First-order means that only the first derivative of y appears in the equation, and higher derivatives are absent.

Without loss of generality to higher-order systems, we restrict ourselves to first-order differential equations, because a higher-order ODE can be converted into a larger system of first-order equations by introducing extra variables. For example, the second-order equation y″ = −y can be rewritten as two first-order equations: y′ = z and z′ = −y.

We will describe numerical methods for IVPs, and remark that boundary value problems (BVPs) require a different set of tools. In a BVP, one defines values, or components of the solution y at more than one point. Because of this, different methods need to be used to solve BVPs. For example, the shooting method (and its variants) or global methods like finite differences, Galerkin methods, or collocation methods are appropriate for that class of problems.

The Picard–Lindelöf theorem states that there is a unique solution, provided f is Lipschitz-continuous.

Methods

Numerical methods for solving first-order IVPs often fall into one of two large categories: linear multistep methods, or Runge–Kutta methods. A further division can be realized by dividing methods into those that are explicit and those that are implicit. For example, implicit linear multistep methods include Adams–Moulton methods and backward differentiation formulas (BDF), whereas implicit Runge–Kutta methods include diagonally implicit Runge–Kutta (DIRK), singly diagonally implicit Runge–Kutta (SDIRK), and Gauss–Radau (based on Gaussian quadrature) numerical methods. Explicit examples from the linear multistep family include the Adams–Bashforth methods, and any Runge–Kutta method with a lower diagonal Butcher tableau is explicit. A loose


rule of thumb dictates that stiff differential equations require the use of implicit schemes, whereas non-stiff problems can be solved more efficiently with explicit schemes.

The so-called general linear methods (GLMs) are a generalization of the above two large classes of methods.

Euler Method

From any point on a curve, you can find an approximation of a nearby point on the curve by moving a short distance along a line tangent to the curve.

Starting with the differential equation (1), we replace the derivative y′ by the finite difference approximation

y'(t) \approx \frac{y(t+h) - y(t)}{h}, \qquad (2)

which when re-arranged yields the following formula

y(t+h) \approx y(t) + h\, y'(t)

and using (1) gives:

y(t+h) \approx y(t) + h\, f(t, y(t)). \qquad (3)

This formula is usually applied in the following way. We choose a step size h, and we construct the sequence t_0, t_1 = t_0 + h, t_2 = t_0 + 2h, … We denote by y_n a numerical estimate of the exact solution y(t_n). Motivated by (3), we compute these estimates by the following recursive scheme

y_{n+1} = y_n + h\, f(t_n, y_n). \qquad (4)

This is the Euler method (or forward Euler method, in contrast with the backward Euler method, to be described below). The method is named after Leonhard Euler who described it in 1768.

The Euler method is an example of an explicit method. This means that the new value y_{n+1} is defined in terms of things that are already known, like y_n.
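A minimal MATLAB/Octave sketch of scheme (4); the right-hand side, initial condition, step size and number of steps are illustrative assumptions rather than values from the text.

f = @(t, y) -y.^2;          % example right-hand side f(t, y)
t = 0;  y = 1;              % initial condition y(t_0) = y_0
h = 0.1;                    % step size
for n = 1:50
    y = y + h*f(t, y);      % forward Euler update y_{n+1} = y_n + h f(t_n, y_n)
    t = t + h;
end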

Backward Euler Method

If, instead of (2), we use the approximation

y'(t) \approx \frac{y(t) - y(t-h)}{h}, \qquad (5)

we get the backward Euler method:

y_{n+1} = y_n + h\, f(t_{n+1}, y_{n+1}). \qquad (6)

The backward Euler method is an implicit method, meaning that we have to solve an equation


to find y_{n+1}. One often uses fixed-point iteration or (some modification of) the Newton–Raphson method to achieve this.

It costs more time to solve this equation than explicit methods; this cost must be taken into consideration when one selects the method to use. The advantage of implicit methods such as (6) is that they are usually more stable for solving a stiff equation, meaning that a larger step size h can be used.
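As a sketch of how such an implicit step can be carried out, the loop below applies scheme (6) and resolves the implicit equation for y_{n+1} with a few fixed-point iterations, one of the options mentioned above; all names and parameter values are illustrative.

f = @(t, y) -y.^2;
t = 0;  y = 1;  h = 0.1;
for n = 1:50
    yNext = y;                          % initial guess for y_{n+1}
    for iter = 1:20
        yNext = y + h*f(t + h, yNext);  % iterate y_{n+1} = y_n + h f(t_{n+1}, y_{n+1})
    end
    y = yNext;
    t = t + h;
end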

First-order Exponential Integrator Method

Exponential integrators describe a large class of integrators that have recently seen a lot of development. They date back to at least the 1960s.

In place of (1), we assume the differential equation is either of the form

y'(t) = -A\,y + \mathcal{N}(y), \qquad (7)

or it has been locally linearized about a background state to produce a linear term −Ay and a nonlinear term \mathcal{N}(y).

Exponential integrators are constructed by multiplying (7) by e^{At}, and exactly integrating the result over a time interval [t_n, t_{n+1} = t_n + h]:

y_{n+1} = e^{-Ah}\, y_n + \int_0^h e^{-(h-\tau)A}\, \mathcal{N}\big(y(t_n + \tau)\big)\, d\tau.

This integral equation is exact, but it doesn’t define the integral.

The first-order exponential integrator can be realized by holding \mathcal{N}(y(t_n + \tau)) constant over the full interval:

y_{n+1} = e^{-Ah}\, y_n + A^{-1}\big(1 - e^{-Ah}\big)\, \mathcal{N}(y(t_n)). \qquad (8)
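A sketch of formula (8) for a scalar test problem with a stiff linear part; the splitting y′ = −Ay + 𝒩(y), the values of A and h, and the nonlinear term chosen here are assumptions made only for illustration.

A = 10;                        % stiff linear coefficient (assumed)
N = @(y) sin(y);               % nonlinear remainder (assumed)
h = 0.01;  y = 1;
for n = 1:100
    y = exp(-A*h)*y + (1/A)*(1 - exp(-A*h))*N(y);   % first-order exponential integrator (8)
end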

Generalizations

The Euler method is often not accurate enough. In more precise terms, it only has order one (the concept of order is explained below). This caused mathematicians to look for higher-order methods.

One possibility is to use not only the previously computed value y_n to determine y_{n+1}, but to make the solution depend on more past values. This yields a so-called multistep method. Perhaps the simplest is the leapfrog method, which is second order and (roughly speaking) relies on two time values.

Almost all practical multistep methods fall within the family of linear multistep methods, which have the form

\alpha_k y_{n+k} + \alpha_{k-1} y_{n+k-1} + \cdots + \alpha_0 y_n = h\left[ \beta_k f(t_{n+k}, y_{n+k}) + \beta_{k-1} f(t_{n+k-1}, y_{n+k-1}) + \cdots + \beta_0 f(t_n, y_n) \right].


Another possibility is to use more points in the interval [t_n, t_{n+1}]. This leads to the family of Runge–Kutta methods, named after Carl Runge and Martin Kutta. One of their fourth-order methods is especially popular.
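The especially popular classical fourth-order Runge–Kutta method can be sketched as follows; the right-hand side and parameters are again illustrative assumptions.

f = @(t, y) -y.^2;
t = 0;  y = 1;  h = 0.1;
for n = 1:50
    k1 = f(t,       y);
    k2 = f(t + h/2, y + h*k1/2);
    k3 = f(t + h/2, y + h*k2/2);
    k4 = f(t + h,   y + h*k3);
    y = y + (h/6)*(k1 + 2*k2 + 2*k3 + k4);   % classical RK4 update
    t = t + h;
end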

Advanced Features

A good implementation of one of these methods for solving an ODE entails more than the time-stepping formula.

It is often inefficient to use the same step size all the time, so variable step-size methods have been developed. Usually, the step size is chosen such that the (local) error per step is below some tolerance level. This means that the methods must also compute an error indicator, an estimate of the local error.

An extension of this idea is to choose dynamically between different methods of different orders (this is called a variable order method). Methods based on Richardson extrapolation, such as the Bulirsch–Stoer algorithm, are often used to construct various methods of different orders.

Other desirable features include:

• dense output: cheap numerical approximations for the whole integration interval, and not only at the points t_0, t_1, t_2, ...

• event location: finding the times where, say, a particular function vanishes. This typically requires the use of a root-finding algorithm.

• support for parallel computing.

• when used for integrating with respect to time, time reversibility

Alternative Methods

Many methods do not fall within the framework discussed here. Some classes of alternative methods are:

• multiderivative methods, which use not only the function f but also its derivatives. This class includes Hermite–Obreschkoff methods and Fehlberg methods, as well as methods like the Parker–Sochacki method or Bychkov–Scherbakov method, which compute the coefficients of the Taylor series of the solution y recursively.

• methods for second order ODEs. We said that all higher-order ODEs can be transformed to first-order ODEs of the form (1). While this is certainly true, it may not be the best way to proceed. In particular, Nyström methods work directly with second-order equations.

• geometric integration methods are especially designed for special classes of ODEs (e.g., symplectic integrators for the solution of Hamiltonian equations). They take care that the numerical solution respects the underlying structure or geometry of these classes.

• Quantized State Systems Methods are a family of ODE integration methods based on the idea of state quantization. They are efficient when simulating sparse systems with frequent discontinuities.


Parallel-in-time Methods

For applications that require parallel computing on supercomputers, the degree of concurrency offered by a numerical method becomes relevant. In view of the challenges from exascale computing systems, numerical methods for initial value problems which can provide concurrency in temporal direction are being studied. Parareal is a relatively well known example of such a parallel-in-time integration method, but early ideas go back into the 1960s.

Analysis

Numerical analysis is not only the design of numerical methods, but also their analysis. Three central concepts in this analysis are:

• convergence: whether the method approximates the solution,

• order: how well it approximates the solution, and

• stability: whether errors are damped out.

Convergence

A numerical method is said to be convergent if the numerical solution approaches the exact solution as the step size h goes to 0. More precisely, we require that for every ODE (1) with a Lipschitz function f and every t* > 0,

\lim_{h \to 0^+} \max_{n = 0, 1, \ldots, \lfloor t^*/h \rfloor} \| y_{n,h} - y(t_n) \| = 0.

All the methods mentioned above are convergent. In fact, a numerical scheme has to be convergent to be of any use.

Consistency and Order

Suppose the numerical method is

y_{n+k} = \Psi(t_{n+k};\, y_n, y_{n+1}, \ldots, y_{n+k-1};\, h).

The local (truncation) error of the method is the error committed by one step of the method. That is, it is the difference between the result given by the method, assuming that no error was made in earlier steps, and the exact solution:

\delta^h_{n+k} = \Psi\big(t_{n+k};\, y(t_n), y(t_{n+1}), \ldots, y(t_{n+k-1});\, h\big) - y(t_{n+k}).

The method is said to be consistent if

\lim_{h \to 0} \frac{\delta^h_{n+k}}{h} = 0.

The method has order p if


\delta^h_{n+k} = O(h^{p+1}) \quad \text{as } h \to 0.

Hence a method is consistent if it has an order greater than 0. The (forward) Euler method (4) and the backward Euler method (6) introduced above both have order 1, so they are consistent. Most methods being used in practice attain higher order. Consistency is a necessary condition for convergence, but not sufficient; for a method to be convergent, it must be both consistent and zero-stable.

A related concept is the global (truncation) error, the error sustained in all the steps one needs to reach a fixed time t. Explicitly, the global error at time t is y_N − y(t) where N = (t − t_0)/h. The global error of a pth-order one-step method is O(h^p); in particular, such a method is convergent. This statement is not necessarily true for multistep methods.
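These notions can be checked numerically. Under the assumption that the forward Euler method is applied to y′ = −y, y(0) = 1, the global error at t = 1 should shrink roughly in proportion to h, consistent with order 1:

exact = exp(-1);                              % exact solution of y' = -y at t = 1
for p = 1:5
    h = 2^(-p);  y = 1;
    for n = 1:round(1/h)
        y = y + h*(-y);                       % forward Euler step
    end
    fprintf('h = %-8g global error = %g\n', h, abs(y - exact));
end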

Stability and Stiffness

For some differential equations, application of standard methods —such as the Euler method, explicit Runge–Kutta methods, or multistep methods (e.g., Adams–Bashforth methods)— exhibits instability in the solutions, though other methods may produce stable solutions. This "difficult behaviour" in the equation (which may not necessarily be complex itself) is described as stiffness, and is often caused by the presence of different time scales in the underlying problem. For example, a collision in a mechanical system like in an impact oscillator typically occurs at a much smaller time scale than the time for the motion of objects; this discrepancy makes for very "sharp turns" in the curves of the state parameters.

Stiff problems are ubiquitous in chemical kinetics, control theory, solid mechanics, weather forecasting, biology, plasma physics, and electronics. One way to overcome stiffness is to extend the notion of differential equation to that of differential inclusion, which allows for and models non-smoothness.

History

Below is a timeline of some important developments in this field.

• 1768 - Leonhard Euler publishes his method.

• 1824 - Augustin Louis Cauchy proves convergence of the Euler method. In this proof, Cauchy uses the implicit Euler method.

• 1855 - First mention of the multistep methods of John Couch Adams in a letter written by F. Bashforth.

• 1895 - Carl Runge publishes the first Runge–Kutta method.

• 1905 - Martin Kutta describes the popular fourth-order Runge–Kutta method.

• 1910 - Lewis Fry Richardson announces his extrapolation method, Richardson extrapolation.

• 1952 - Charles F. Curtiss and Joseph Oakland Hirschfelder coin the term stiff equations.

• 1963 - Germund Dahlquist introduces A-stability of integration methods.


Numerical Solutions to Second-order One-dimensional Boundary Value Problems

Boundary value problems (BVPs) are usually solved numerically by solving an approximately equivalent matrix problem obtained by discretizing the original BVP. The most commonly used method for numerically solving BVPs in one dimension is called the Finite Difference Method. This method takes advantage of linear combinations of point values to construct finite difference coefficients that describe derivatives of the function. For example, the second-order central difference approximation to the first derivative is given by:

\frac{u_{i+1} - u_{i-1}}{2h} = u'(x_i) + \mathcal{O}(h^2),

and the second-order central difference for the second derivative is given by:

\frac{u_{i+1} - 2u_i + u_{i-1}}{h^2} = u''(x_i) + \mathcal{O}(h^2).

In both of these formulae, h = x_i − x_{i−1} is the distance between neighbouring x values on the discretized domain. One then constructs a linear system that can then be solved by standard matrix methods. For instance, suppose the equation to be solved is:

\frac{d^2 u}{dx^2} - u = 0,

u(0) = 0,

u(1) = 1.

The next step would be to discretize the problem and use linear derivative approximations such as

u''_i = \frac{u_{i+1} - 2u_i + u_{i-1}}{h^2}

and solve the resulting system of linear equations. This would lead to equations such as:

\frac{u_{i+1} - 2u_i + u_{i-1}}{h^2} - u_i = 0, \qquad \forall\, i = 1, 2, 3, \ldots, n-1.

On first viewing, this system of equations appears to have difficulty associated with the fact that the equation involves no terms that are not multiplied by variables, but in fact this is false. At i = 1 and n − 1 there is a term involving the boundary values u(0) = u_0 and u(1) = u_n, and since these two values are known, one can simply substitute them into this equation and as a result have a non-homogeneous linear system of equations that has non-trivial solutions.
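The discretization just described can be assembled and solved directly. The sketch below uses an assumed grid of n = 10 subintervals; the matrix entries follow from rewriting each equation as (1/h^2)u_{i+1} − (2/h^2 + 1)u_i + (1/h^2)u_{i−1} = 0 and moving the known boundary values to the right-hand side.

n = 10;  h = 1/n;                      % grid spacing; interior unknowns u_1, ..., u_{n-1}
main = -(2/h^2 + 1)*ones(n-1, 1);      % diagonal entries
off  = (1/h^2)*ones(n-2, 1);           % off-diagonal entries
A = diag(main) + diag(off, 1) + diag(off, -1);
b = zeros(n-1, 1);
b(end) = -(1/h^2);                     % boundary value u(1) = 1 moved to the right-hand side
u = A \ b;                             % approximate interior values of u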


Series Acceleration

In mathematics, series acceleration is one of a collection of sequence transformations for improving the rate of convergence of a series. Techniques for series acceleration are often applied in numerical analysis, where they are used to improve the speed of numerical integration. Series acceleration techniques may also be used, for example, to obtain a variety of identities on special functions. Thus, the Euler transform applied to the hypergeometric series gives some of the classic, well-known hypergeometric series identities.

Definition

Given a sequence

S = \{ s_n \}_{n \in \mathbb{N}}

having a limit

\lim_{n \to \infty} s_n = \ell,

an accelerated series is a second sequence

S' = \{ s'_n \}_{n \in \mathbb{N}}

which converges faster to ℓ than the original sequence, in the sense that

\lim_{n \to \infty} \frac{s'_n - \ell}{s_n - \ell} = 0.

If the original sequence is divergent, the sequence transformation acts as an extrapolation method to the antilimit ℓ. The mappings from the original to the transformed series may be linear, or non-linear. In general, the non-linear sequence transformations tend to be more powerful.

Overview

Two classical techniques for series acceleration are Euler's transformation of series and Kummer's transformation of series. A variety of much more rapidly convergent and special-case tools have been developed in the 20th century, including Richardson extrapolation, introduced by Lewis Fry Richardson in the early 20th century but also known and used by Katahiro Takebe in 1722, the Aitken delta-squared process, introduced by Alexander Aitken in 1926 but also known and used by Takakazu Seki in the 18th century, the epsilon algorithm given by Peter Wynn in 1956, the Levin u-transform, and the Wilf–Zeilberger–Ekhad method or WZ method. For alternating series, several powerful techniques, offering convergence rates from 5.828^{−n} all the way to 17.93^{−n} for a summation of n terms, are described by Cohen et al.


Euler’s Transform

A basic example of a linear sequence transformation, offering improved convergence, is Euler's transform. It is intended to be applied to an alternating series; it is given by

\sum_{n=0}^{\infty} (-1)^n a_n = \sum_{n=0}^{\infty} (-1)^n \frac{\Delta^n a_0}{2^{n+1}}

where ∆ is the forward difference operator:

n nkn ∆=aa0 ∑( − 1)nk− . k =0 k

If the original series, on the left hand side, is only slowly converging, the forward differences will tend to become small quite rapidly; the additional power of two further improves the rate at which the right hand side converges.
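As a small illustration of the formula above, the sketch below applies Euler's transform to the alternating series 1 − 1/2 + 1/3 − ⋯ = log 2, computing the forward differences Δ^n a_0 directly; the number of terms used is an arbitrary choice.

a = 1./(1:12)';                        % a_n = 1/(n+1), terms of the alternating series for log(2)
d = a;  s = 0;
for n = 0:numel(a)-1
    s = s + (-1)^n*d(1)/2^(n+1);       % add the term (-1)^n * Delta^n a_0 / 2^{n+1}
    d = d(2:end) - d(1:end-1);         % advance to the next forward difference
end
% s now approximates log(2) far more closely than the raw 12-term partial sum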

A particularly efficient numerical implementation of the Euler transform is the van Wijngaarden transformation.

Conformal Mappings

A series

S = \sum_{n=0}^{\infty} a_n

can be written as f(1), where the function f(z) is defined as

f(z) = \sum_{n=0}^{\infty} a_n z^n

The function f(z) can have singularities in the complex plane (branch point singularities, poles or essential singularities), which limit the radius of convergence of the series. If the point z = 1 is close to or on the boundary of the disk of convergence, the series for S will converge very slowly. One can then improve the convergence of the series by means of a conformal mapping that moves the singularities such that the point that is mapped to z = 1 ends up deeper in the new disk of convergence.

The conformal transform z = Φ(w) needs to be chosen such that Φ(0) = 0, and one usually chooses a function that has a finite derivative at w = 0. One can assume that Φ(1) = 1 without loss of generality, as one can always rescale w to redefine Φ. We then consider the function

g(w) = f(Φ(w)).

Since Φ(1) = 1, we have f(1) = g(1). We can obtain the series expansion of g(w) by putting in the series expansion of f(z) because z = Φ(w); the first n terms of the series expansion for f(z) will yield the first n terms of the series expansion for g(w) if Φ′(0) ≠ 0. Putting w = 1 in that series expansion will thus yield a series such that if it converges, it will converge to the same value as the original series.

Non-linear Sequence Transformations

Examples of such nonlinear sequence transformations are Padé approximants, the Shanks transformation, and Levin-type sequence transformations.

Nonlinear sequence transformations in particular often provide powerful numerical methods for the summation of divergent series or asymptotic series that arise for instance in perturbation theory, and may be used as highly effective extrapolation methods.

Aitken Method

A simple nonlinear sequence transformation is the Aitken extrapolation or delta-squared method,

\mathbb{A} : S \to S' = \mathbb{A}(S) = (s'_n)_{n \in \mathbb{N}}

defined by

s'_n = s_{n+2} - \frac{(s_{n+2} - s_{n+1})^2}{s_{n+2} - 2 s_{n+1} + s_n}.

This transformation is commonly used to improve the rate of convergence of a slowly converging sequence; heuristically, it eliminates the largest part of the absolute error.
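A short MATLAB/Octave sketch of the transformation, applied here to the partial sums of a geometric series, an assumed example in which the error is exactly of the form αq^n, so the transformed values hit the limit (here, 2) exactly:

s = cumsum(0.5.^(0:10));               % partial sums of 1 + 1/2 + 1/4 + ..., limit 2
sPrime = s(3:end) ...
    - (s(3:end) - s(2:end-1)).^2 ./ (s(3:end) - 2*s(2:end-1) + s(1:end-2));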

Minimum Polynomial Extrapolation

In mathematics, minimum polynomial extrapolation is a sequence transformation used for convergence acceleration of vector sequences, due to Cabay and Jackson.

While Aitken’s method is the most famous, it often fails for vector sequences. An effective method for vector sequences is the minimum polynomial extrapolation. It is usually phrased in terms of the fixed point iteration:

x_{k+1} = f(x_k).

Given iterates x_1, x_2, ..., x_k in \mathbb{R}^n, one constructs the n × (k − 1) matrix U = (x_2 − x_1, x_3 − x_2, ..., x_k − x_{k−1}) whose columns are the k − 1 differences. Then one computes the vector c = −U^+ (x_{k+1} − x_k), where U^+ denotes the Moore–Penrose pseudoinverse of U. The number 1 is then appended to the end of c, and the extrapolated limit is

s = \frac{X c}{\sum_{i=1}^{k} c_i},


where X = (x_2, x_3, ..., x_{k+1}) is the matrix whose columns are the k iterates starting at 2. The following 4-line MATLAB code segment implements the MPE algorithm:

U = x(:,2:end-1) - x(:,1:end-2);       % columns are the differences x_{j+1} - x_j

c = -pinv(U)*(x(:,end) - x(:,end-1));  % least-squares coefficient vector

c(end+1,1) = 1;                        % append the number 1

s = (x(:,2:end)*c)/sum(c);             % extrapolated limit
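One possible way to exercise these four lines (not taken from the text) is to generate iterates of a contractive linear fixed-point map, store them as the columns of x, and compare s with the true fixed point.

n = 4;
M = 0.5*eye(n) + 0.1*ones(n);          % assumed iteration matrix with spectral radius 0.9
b = ones(n, 1);
x = zeros(n, 8);                       % columns hold the iterates x_1, ..., x_8
for k = 1:7
    x(:, k+1) = M*x(:, k) + b;         % fixed-point iteration x_{k+1} = M x_k + b
end
% running the MPE segment above on this x gives s, which should be close to (eye(n) - M)\b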

Richardson Extrapolation

In numerical analysis, Richardson extrapolation is a sequence acceleration method, used to im- prove the rate of convergence of a sequence. It is named after Lewis Fry Richardson, who intro- duced the technique in the early 20th century. In the words of Birkhoff and Rota, “its usefulness for practical computations can hardly be overestimated.”

Practical applications of Richardson extrapolation include Romberg integration, which applies Richardson extrapolation to the trapezoid rule, and the Bulirsch–Stoer algorithm for solving ordinary differential equations.

Example of Richardson Extrapolation

Suppose that we wish to approximate A^*, and we have a method A(h) that depends on a small parameter h, so that

A(h) = A^* + C h^n + O(h^{n+1}).

Define a new method

R(h, k) := \frac{k^n A(h) - A(kh)}{k^n - 1}

Then

R(h, k) = \frac{k^n \big(A^* + C h^n + O(h^{n+1})\big) - \big(A^* + C k^n h^n + O(h^{n+1})\big)}{k^n - 1} = A^* + O(h^{n+1}).

R(h, k) is called the Richardson extrapolation of A(h), and has a higher-order error estimate O(h^{n+1}) compared to A(h).

Very often, it is much easier to obtain a given precision by using R(h) rather than A(h’) with a much smaller h’ , which can cause problems due to limited precision (rounding errors) and/or due to the increasing number of calculations needed.


General Formula

Let A(h) be an approximation of A that depends on a positive step size h with an error formula of the form

A - A(h) = a_0 h^{k_0} + a_1 h^{k_1} + a_2 h^{k_2} + \cdots

where the a_i are unknown constants and the k_i are known constants such that h^{k_i} > h^{k_{i+1}}. The exact value sought can be given by

A = A(h) + a_0 h^{k_0} + a_1 h^{k_1} + a_2 h^{k_2} + \cdots

which can be simplified with Big O notation to be

A = A(h) + a_0 h^{k_0} + O(h^{k_1}).

Using the step sizes h and h / t for some t, the two formulas for A are:

A = A(h) + a_0 h^{k_0} + O(h^{k_1})

A = A\!\left(\frac{h}{t}\right) + a_0 \left(\frac{h}{t}\right)^{k_0} + O(h^{k_1}).

Multiplying the second equation by t^{k_0} and subtracting the first equation gives

(t^{k_0} - 1)\, A = t^{k_0} A\!\left(\frac{h}{t}\right) - A(h) + O(h^{k_1})

which can be solved for A to give

k h t0 A− Ah() t k A= + Oh(1 ). t k0 −1

By this process, we have achieved a better approximation of A by subtracting the largest term in the error, which was O(h^{k_0}). This process can be repeated to remove more error terms to get even better approximations.

A general recurrence relation beginning with A_0(h) = A(h) can be defined for the approximations by

A_{i+1}(h) = \frac{t^{k_i} A_i\!\left(\frac{h}{t}\right) - A_i(h)}{t^{k_i} - 1}

where k_{i+1} satisfies


A = A_{i+1}(h) + O(h^{k_{i+1}}).

The Richardson extrapolation can be considered as a linear sequence transformation.

Additionally, the general formula can be used to estimate k_0 when neither its value nor A is known a priori. Such a technique can be useful for quantifying an unknown rate of convergence. Given approximations of A from three distinct step sizes h, h/t, and h/s, the exact relationship

kkhh t00 A−− Ah() s A Ah() tskk A= +=Oh()11 +Oh() tskk00−−11

yields an approximate relationship

hh A−− Ah() A Ah() hhts  AA+ ≈+ ttkk00−−11 ss

which can be solved numerically to estimate k_0.

Example

Using Taylor's theorem about h = 0,

f(x+h) = f(x) + f'(x)\, h + \frac{f''(x)}{2} h^2 + \cdots

the derivative of f(x) is given by

f'(x) = \frac{f(x+h) - f(x)}{h} - \frac{f''(x)}{2} h + \cdots.

If the initial approximations of the derivative are chosen to be

A_0(h) = \frac{f(x+h) - f(x)}{h}

then k_i = i + 1. For t = 2, the first formula extrapolated for A would be

A = 2 A_0\!\left(\frac{h}{2}\right) - A_0(h) + O(h^2).

For the new approximation


A_1(h) = 2 A_0\!\left(\frac{h}{2}\right) - A_0(h)

we can extrapolate again to obtain

A = \frac{4 A_1\!\left(\frac{h}{2}\right) - A_1(h)}{3} + O(h^3).

Example Pseudocode for Richardson Extrapolation

The following pseudocode in MATLAB style demonstrates Richardson extrapolation to help solve the ODE y′(t) = −y^2, y(0) = 1 with the trapezoidal method. In this example we halve the step size h each iteration and so in the discussion above we'd have that t = 2. The error of the trapezoidal method can be expressed in terms of odd powers so that the error over multiple steps can be expressed in even powers, and so we take powers of 4 = 2^2 = t^2 in the pseudocode. We want to find the value of y(5), which has the exact value 1/(5 + 1) = 1/6 = 0.1666..., since the exact solution of the ODE is y(t) = 1/(1 + t). This pseudocode assumes that a function called Trapezoidal(f, tStart, tEnd, h, y0) exists which performs the trapezoidal method on the function f, with starting point y0 and tStart, step size h, and attempts to compute y(tEnd).

Starting with too small an initial step size can potentially introduce error into the final solution. Although there are methods designed to help pick the best initial step size, one option is to start with a large step size and then to allow the Richardson extrapolation to reduce the step size each iteration until the error reaches the desired tolerance.

tStart = 0 %Starting time
tEnd = 5 %Ending time
f = -y^2 %The derivative of y, so y' = f(t, y(t)) = -y^2
% The solution to this ODE is y = 1/(1 + t)
y0 = 1 %The initial position (i.e. y0 = y(tStart) = y(0) = 1)
tolerance = 10^-11 %10 digit accuracy is desired
maxRows = 20 %Don't allow the iteration to continue indefinitely
initialH = tEnd - tStart %Pick a large initial step size
haveWeFoundSolution = false %Were we able to find the solution to the desired tolerance? not yet.

h = initialH


%Create a 2D matrix of size maxRows by maxRows to hold the Richardson extrapolates
%Note that this will be a lower triangular matrix and that at most two rows are actually
% needed at any time in the computation.
A = zeroMatrix(maxRows, maxRows)

%Compute the top left element of the matrix
A(1, 1) = Trapezoidal(f, tStart, tEnd, h, y0)

%Each row of the matrix requires one call to Trapezoidal
%This loop starts by filling the second row of the matrix, since the first row was computed above
for i = 1 : maxRows - 1 %Starting at i = 1, iterate at most maxRows - 1 times
    h = h/2 %Halve the previous value of h since this is the start of a new row

    %Call the Trapezoidal function with this new smaller step size
    A(i + 1, 1) = Trapezoidal(f, tStart, tEnd, h, y0)

    for j = 1 : i %Go across the row until the diagonal is reached
        %Use the value just computed (i.e. A(i + 1, j)) and the element from the
        % row above it (i.e. A(i, j)) to compute the next Richardson extrapolate
        A(i + 1, j + 1) = ((4^j).*A(i + 1, j) - A(i, j))/(4^j - 1);
    end

    %After leaving the above inner loop, the diagonal element of row i + 1 has been computed
    % This diagonal element is the latest Richardson extrapolate to be computed
    %The difference between this extrapolate and the last extrapolate of row i is a good
    % indication of the error
    if(absoluteValue(A(i + 1, i + 1) - A(i, i)) < tolerance) %If the result is within tolerance
        print("y(5) = ", A(i + 1, i + 1)) %Display the result of the Richardson extrapolation
        haveWeFoundSolution = true
        break %Done, so leave the loop
    end
end


if(haveWeFoundSolution == false) %If we weren't able to find a solution to within the desired tolerance
    print("Warning: Not able to find solution to within the desired tolerance of ", tolerance);
    print("The last computed extrapolate was ", A(maxRows, maxRows))
end

Shanks Transformation

In numerical analysis, the Shanks transformation is a non-linear series acceleration method to increase the rate of convergence of a sequence. This method is named after Daniel Shanks, who rediscovered this sequence transformation in 1955. It was first derived and published by R. Schmidt in 1941.

One can calculate only a few terms of a perturbation expansion, usually no more than two or three, and almost never more than seven. The resulting series is often slowly convergent, or even divergent. Yet those few terms contain a remarkable amount of information, which the investigator should do his best to extract. This viewpoint has been persuasively set forth in a delightful paper by Shanks (1955), who displays a number of amazing examples, including several from fluid mechanics.

Milton D. Van Dyke (1975) Perturbation methods in fluid mechanics.

Formulation

For a sequence \{a_m\}_{m \in \mathbb{N}}, the series

A = \sum_{m=0}^{\infty} a_m

is to be determined. First, the partial sum A_n is defined as:

A_n = \sum_{m=0}^{n} a_m

and forms a new sequence \{A_n\}_{n \in \mathbb{N}}. Provided the series converges, A_n will also approach the limit A as n → ∞. The Shanks transformation S(A_n) of the sequence A_n is the new sequence defined by

S(A_n) = \frac{A_{n+1} A_{n-1} - A_n^2}{A_{n+1} - 2 A_n + A_{n-1}} = A_{n+1} - \frac{(A_{n+1} - A_n)^2}{(A_{n+1} - A_n) - (A_n - A_{n-1})}

where this sequence S(A_n) often converges more rapidly than the sequence A_n. Further speed-up may be obtained by repeated use of the Shanks transformation, by computing S^2(A_n) = S(S(A_n)), S^3(A_n) = S(S(S(A_n))), etc.

Note that the non-linear transformation as used in the Shanks transformation is essentially the same as used in Aitken's delta-squared process, so that as with Aitken's method, the right-most expression in S(A_n)'s definition (i.e. S(A_n) = A_{n+1} - \frac{(A_{n+1} - A_n)^2}{(A_{n+1} - A_n) - (A_n - A_{n-1})}) is more numerically stable than the expression to its left (i.e. S(A_n) = \frac{A_{n+1} A_{n-1} - A_n^2}{A_{n+1} - 2 A_n + A_{n-1}}). Both Aitken's method and the Shanks transformation operate on a sequence, but the sequence the Shanks transformation operates on is usually thought of as being a sequence of partial sums, although any sequence may be viewed as a sequence of partial sums.

Example

As an example, consider the slowly convergent series

Absolute error as a function of n in the partial sums A_n and after applying the Shanks transformation once or several times: S(A_n), S^2(A_n) and S^3(A_n). The series used is 4(1 − 1/3 + 1/5 − 1/7 + ⋯), which has the exact sum π.

∞ k 1 111 4∑ (− 1) = 4 1 −+−+ k =0 2k + 1 357 which has the exact sumπ ≈ 3.14159265. The partial sum A6 has only one digit accuracy, while six-figure accuracy requires summing about 400,000 terms.

In the table below, the partial sums A_n, the Shanks transformation S(A_n) on them, as well as the repeated Shanks transformations S^2(A_n) and S^3(A_n) are given for n up to 12. The figure to the right shows the absolute error for the partial sums and Shanks transformation results, clearly showing the improved accuracy and convergence rate.


n    A_n           S(A_n)        S^2(A_n)      S^3(A_n)
0    4.00000000    —             —             —
1    2.66666667    3.16666667    —             —
2    3.46666667    3.13333333    3.14210526    —
3    2.89523810    3.14523810    3.14145022    3.14159936
4    3.33968254    3.13968254    3.14164332    3.14159086
5    2.97604618    3.14271284    3.14157129    3.14159323
6    3.28373848    3.14088134    3.14160284    3.14159244
7    3.01707182    3.14207182    3.14158732    3.14159274
8    3.25236593    3.14125482    3.14159566    3.14159261
9    3.04183962    3.14183962    3.14159086    3.14159267
10   3.23231581    3.14140672    3.14159377    3.14159264
11   3.05840277    3.14173610    3.14159192    3.14159266
12   3.21840277    3.14147969    3.14159314    3.14159265

The Shanks transformation S(A_1) already has two-digit accuracy, while the original partial sums only establish the same accuracy at A_{24}. Remarkably, S^3(A_3) has six-digit accuracy, obtained from repeated Shanks transformations applied to the first seven terms A_0, ..., A_6. As said before, A_n only obtains six-digit accuracy after summing about 400,000 terms.
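The first column of transforms in the table can be reproduced with a short MATLAB/Octave sketch; the vector names are illustrative.

k = 0:12;
A = cumsum(4*(-1).^k ./ (2*k + 1));    % partial sums A_0, ..., A_12 of the series for pi
S = A(3:end) - (A(3:end) - A(2:end-1)).^2 ...
    ./ ((A(3:end) - A(2:end-1)) - (A(2:end-1) - A(1:end-2)));   % S(A_1), ..., S(A_11)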

Motivation

The Shanks transformation is motivated by the observation that — for larger n — the partial sum A_n quite often behaves approximately as

A_n = A + \alpha q^n, \qquad |q| < 1,

so that the sequence converges transiently to the series result A for n → ∞. So for n − 1, n and n + 1 the respective partial sums are:

A_{n-1} = A + \alpha q^{n-1}, \qquad A_n = A + \alpha q^n \qquad \text{and} \qquad A_{n+1} = A + \alpha q^{n+1}.

These three equations contain three unknowns: A, α and q. Solving for A gives

A = \frac{A_{n+1} A_{n-1} - A_n^2}{A_{n+1} - 2 A_n + A_{n-1}}.

In the (exceptional) case that the denominator is equal to zero: then A_n = A for all n.

Generalized Shanks Transformation

The generalized kth-order Shanks transformation is given as the ratio of the determinants:


S_k(A_n) = \frac{\begin{vmatrix} A_{n-k} & \cdots & A_{n-1} & A_n \\ \Delta A_{n-k} & \cdots & \Delta A_{n-1} & \Delta A_n \\ \Delta A_{n-k+1} & \cdots & \Delta A_n & \Delta A_{n+1} \\ \vdots & & \vdots & \vdots \\ \Delta A_{n-1} & \cdots & \Delta A_{n+k-2} & \Delta A_{n+k-1} \end{vmatrix}}{\begin{vmatrix} 1 & \cdots & 1 & 1 \\ \Delta A_{n-k} & \cdots & \Delta A_{n-1} & \Delta A_n \\ \Delta A_{n-k+1} & \cdots & \Delta A_n & \Delta A_{n+1} \\ \vdots & & \vdots & \vdots \\ \Delta A_{n-1} & \cdots & \Delta A_{n+k-2} & \Delta A_{n+k-1} \end{vmatrix}},

with \Delta A_p = A_{p+1} - A_p. It is the solution of a model for the convergence behaviour of the partial sums A_n with k distinct transients:

A_n = A + \sum_{p=1}^{k} \alpha_p q_p^n.

This model for the convergence behaviour contains 2k + 1 unknowns. By evaluating the above equation at the elements A_{n−k}, A_{n−k+1}, …, A_{n+k} and solving for A, the above expression for the kth-order Shanks transformation is obtained. The first-order generalized Shanks transformation is equal to the ordinary Shanks transformation: S_1(A_n) = S(A_n). The generalized Shanks transformation is closely related to Padé approximants and Padé tables.

Interpolation

In the mathematical field of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points.

In engineering and science, one often has a number of data points, obtained by sampling or experimentation, which represent the values of a function for a limited number of values of the independent variable. It is often required to interpolate (i.e. estimate) the value of that function for an intermediate value of the independent variable. This may be achieved by curve fitting or regression analysis.

A different problem which is closely related to interpolation is the approximation of a complicated function by a simple function. Suppose the formula for some given function is known, but too complex to evaluate efficiently. A few known data points from the original function can be used to create an interpolation based on a simpler function. Of course, when a simple function is used to estimate data points from the original, interpolation errors are usually present; however, depending on the problem domain and the interpolation method used, the gain in simplicity may be of greater value than the resultant loss in precision.

In the examples below, if we consider x as a variable in a topological space and the function f as mapping into different Banach spaces, then the problem is treated as “interpolation of operators”. The classical


results about interpolation of operators are the Riesz–Thorin theorem and the Marcinkiewicz theorem. There are also many other subsequent results.

An interpolation of a finite set of points on an epitrochoid. The points through which the curve is splined are shown in red; the blue curve connecting them is the interpolation.

Example

For example, suppose we have a table like this, which gives some values of an unknown func- tion f.

Plot of the data points as given in the table.

x    f(x)
0     0
1     0.8415
2     0.9093
3     0.1411
4    −0.7568
5    −0.9589
6    −0.2794

Interpolation provides a means of estimating the function at intermediate points, such as x = 2.5.

There are many different interpolation methods, some of which are described below. Some of the concerns to take into account when choosing an appropriate algorithm are: How accurate is the method? How expensive is it? How smooth is the interpolant? How many data points are needed?


Piecewise Constant Interpolation

The simplest interpolation method is to locate the nearest data value, and assign the same value. In simple problems, this method is unlikely to be used, as linear interpolation is almost as easy, but in higher-dimensional multivariate interpolation, this could be a favourable choice for its speed and simplicity.

Piecewise constant interpolation, or nearest-neighbor interpolation.

Linear Interpolation

One of the simplest methods is linear interpolation (sometimes known as lerp). Consider the above example of estimating f(2.5). Since 2.5 is midway between 2 and 3, it is reasonable to take f(2.5) midway between f(2) = 0.9093 and f(3) = 0.1411, which yields 0.5252.

Plot of the data with linear interpolation superimposed

Generally, linear interpolation takes two data points, say (xa,ya) and (xb,yb), and the interpolant is given by:

$$y = y_a + (y_b - y_a)\,\frac{x - x_a}{x_b - x_a} \quad \text{at the point } (x, y),$$

$$\frac{y - y_a}{y_b - y_a} = \frac{x - x_a}{x_b - x_a},$$

$$\frac{y - y_a}{x - x_a} = \frac{y_b - y_a}{x_b - x_a}.$$

This previous equation states that the slope of the new line between $(x_a, y_a)$ and $(x, y)$ is the same as the slope of the line between $(x_a, y_a)$ and $(x_b, y_b)$.
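A minimal sketch of this formula in Python (the function name is ours, not from the original text); applied to the table above it reproduces f(2.5) ≈ 0.5252:

def lerp(xa, ya, xb, yb, x):
    # Linear interpolant between (xa, ya) and (xb, yb), evaluated at x.
    return ya + (yb - ya) * (x - xa) / (xb - xa)

print(lerp(2.0, 0.9093, 3.0, 0.1411, 2.5))  # 0.5252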


Linear interpolation is quick and easy, but it is not very precise. Another disadvantage is that the interpolant is not differentiable at the point $x_k$.

The following error estimate shows that linear interpolation is not very precise. Denote the function which we want to interpolate by $g$, and suppose that $x$ lies between $x_a$ and $x_b$ and that $g$ is twice continuously differentiable. Then the linear interpolation error is

$$|f(x) - g(x)| \le C\,(x_b - x_a)^2 \quad \text{where} \quad C = \frac{1}{8}\max_{r \in [x_a, x_b]} |g''(r)|.$$

In words, the error is proportional to the square of the distance between the data points. The error in some other methods, including polynomial interpolation and spline interpolation (described below), is proportional to higher powers of the distance between the data points. These methods also produce smoother interpolants.

Polynomial Interpolation

Polynomial interpolation is a generalization of linear interpolation. Note that the linear interpo- lant is a linear function. We now replace this interpolant with a polynomial of higher degree.

Plot of the data with polynomial interpolation applied

Consider again the problem given above. The following sixth degree polynomial goes through all the seven points:

$$f(x) = -0.0001521x^6 - 0.003130x^5 + 0.07321x^4 - 0.3577x^3 + 0.2255x^2 + 0.9038x.$$

Substituting x = 2.5, we find that f(2.5) = 0.5965.
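A short sketch (not from the original text; numpy is assumed) that fits the degree-6 interpolating polynomial to the table and evaluates it at x = 2.5; the value should come out to about 0.5965, matching the polynomial above up to rounding:

import numpy as np

x = np.arange(7)                                   # 0, 1, ..., 6
y = np.array([0, 0.8415, 0.9093, 0.1411, -0.7568, -0.9589, -0.2794])
coeffs = np.polyfit(x, y, 6)                       # degree-6 interpolant through all 7 points
print(np.polyval(coeffs, 2.5))                     # about 0.5965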

Generally, if we have n data points, there is exactly one polynomial of degree at most n−1 going through all the data points. The interpolation error is proportional to the distance between the data points to the power n. Furthermore, the interpolant is a polynomial and thus infinitely dif- ferentiable. So, we see that polynomial interpolation overcomes most of the problems of linear interpolation.

However, polynomial interpolation also has some disadvantages. Calculating the interpolating polynomial is computationally expensive compared to linear interpolation. Furthermore, polynomial interpolation may exhibit oscillatory artifacts, especially at the end points.


Polynomial interpolation can estimate local maxima and minima that are outside the range of the samples, unlike linear interpolation. For example, the interpolant above has a local maximum at x ≈ 1.566, f(x) ≈ 1.003 and a local minimum at x ≈ 4.708, f(x) ≈ −1.003. However, these maxima and minima may exceed the theoretical range of the function—for example, a function that is always positive may have an interpolant with negative values, and whose inverse therefore contains false vertical asymptotes.

More generally, the shape of the resulting curve, especially for very high or low values of the independent variable, may be contrary to common sense, i.e. to what is known about the experimental system which has generated the data points. These disadvantages can be reduced by using spline interpolation or restricting attention to Chebyshev polynomials.

Spline Interpolation

Remember that linear interpolation uses a linear function for each of the intervals $[x_k, x_{k+1}]$. Spline interpolation uses low-degree polynomials in each of the intervals, and chooses the polynomial pieces such that they fit smoothly together. The resulting function is called a spline.

Plot of the data with spline interpolation applied

For instance, the natural cubic spline is piecewise cubic and twice continuously differentiable. Furthermore, its second derivative is zero at the end points. The natural cubic spline interpolating the points in the table above is given by

$$f(x) = \begin{cases} -0.1522x^3 + 0.9937x, & \text{if } x \in [0,1], \\ -0.01258x^3 - 0.4189x^2 + 1.4126x - 0.1396, & \text{if } x \in [1,2], \\ 0.1403x^3 - 1.3359x^2 + 3.2467x - 1.3623, & \text{if } x \in [2,3], \\ 0.1579x^3 - 1.4945x^2 + 3.7225x - 1.8381, & \text{if } x \in [3,4], \\ 0.05375x^3 - 0.2450x^2 - 1.2756x + 4.8259, & \text{if } x \in [4,5], \\ -0.1871x^3 + 3.3673x^2 - 19.3370x + 34.9282, & \text{if } x \in [5,6]. \end{cases}$$

In this case we get f(2.5) = 0.5972.
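A hedged sketch of the same computation (not from the original text; SciPy is assumed to be available, and bc_type='natural' selects the natural boundary conditions described above):

import numpy as np
from scipy.interpolate import CubicSpline

x = np.arange(7)
y = np.array([0, 0.8415, 0.9093, 0.1411, -0.7568, -0.9589, -0.2794])
spline = CubicSpline(x, y, bc_type='natural')   # natural cubic spline through the data
print(spline(2.5))                              # about 0.5972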

Like polynomial interpolation, spline interpolation incurs a smaller error than linear interpolation and the interpolant is smoother. However, the interpolant is easier to evaluate than the high-degree polynomials used in polynomial interpolation. It also does not suffer from Runge's phenomenon.


Interpolation Via Gaussian Processes

Gaussian process is a powerful non-linear interpolation tool. Many popular interpolation tools are actually equivalent to particular Gaussian processes. Gaussian processes can be used not only for fitting an interpolant that passes exactly through the given data points but also for regression, i.e., for fitting a curve through noisy data. In the geostatistics community Gaussian process regression is also known as Kriging.

Other Forms of Interpolation

Other forms of interpolation can be constructed by picking a different class of interpolants. For instance, rational interpolation is interpolation by rational functions using Padé approximant, and trigonometric interpolation is interpolation by trigonometric polynomials using Fourier series. Another possibility is to use wavelets.

The Whittaker–Shannon interpolation formula can be used if the number of data points is infinite.

Sometimes, we know not only the value of the function that we want to interpolate, at some points, but also its derivative. This leads to Hermite interpolation problems.

When each data point is itself a function, it can be useful to see the interpolation problem as a par- tial advection problem between each data point. This idea leads to the displacement interpolation problem used in transportation theory.

In Higher Dimensions

Comparison of some 1- and 2-dimensional interpolations. Black and red/yellow/green/blue dots correspond to the interpolated point and neighbouring samples, respectively. Their heights above the ground correspond to their values.

Multivariate interpolation is the interpolation of functions of more than one variable. Methods include bilinear interpolation and bicubic interpolation in two dimensions, and trilinear interpo- lation in three dimensions. They can be applied to gridded or scattered data.

Panel titles: bicubic, bilinear, nearest neighbor.


Interpolation in Digital Signal Processing

In the domain of digital signal processing, the term interpolation refers to the process of converting a sampled digital signal (such as a sampled audio signal) to that of a higher sampling rate (upsampling) using various digital filtering techniques (e.g., convolution with a frequency-limited impulse signal). In this application there is a specific requirement that the harmonic content of the original signal be preserved without creating aliased harmonic content of the original signal above the original Nyquist limit of the signal (i.e., above fs/2 of the original signal sample rate). An early and fairly elementary discussion on this subject can be found in Rabiner and Crochiere's book Multirate Digital Signal Processing.

Related Concepts

The term extrapolation is used to find data points outside the range of known data points.

In curve fitting problems, the constraint that the interpolant has to go exactly through the data points is relaxed. It is only required to approach the data points as closely as possible (within some other constraints). This requires parameterizing the potential interpolants and having some way of measuring the error. In the simplest case this leads to least squares approximation.

Approximation theory studies how to find the best approximation to a given function by another function from some predetermined class, and how good this approximation is. This clearly yields a bound on how well the interpolant can approximate the unknown function.

Van Wijngaarden Transformation

In mathematics and numerical analysis, in order to accelerate convergence of an alternating series, Euler’s transform can be computed as follows.

Compute a row of partial sums :

$$s_{0,k} = \sum_{n=0}^{k} (-1)^n a_n$$

and form rows of averages between neighbors,

$$s_{j+1,k} = \frac{s_{j,k} + s_{j,k+1}}{2}.$$

The first column $s_{j,0}$ then contains the partial sums of the Euler transform. Adriaan van Wijngaarden's contribution was to point out that it is better not to carry this procedure through to the very end, but to stop two-thirds of the way. If $a_0, a_1, \ldots, a_{12}$ are available, then $s_{8,4}$ is almost always a better approximation to the sum than $s_{12,0}$. The Leibniz formula for $\pi$,

$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \frac{\pi}{4} = 0.7853981\ldots,$$

gives the partial sum $s_{0,12} = 0.8046006\ldots$ ($+2.4\%$), the Euler transform partial sum $s_{12,0} = 0.7854002\ldots$ ($+2.6 \times 10^{-6}$)


and the van Wijngaarden result $s_{8,4} = 0.7853982\ldots$ ($+4.7 \times 10^{-8}$) (relative errors are in round brackets).

Each row $j$ of the table below contains $s_{j,k}$ for $k = 0, 1, \ldots, 12 - j$:

j = 0:  1.00000000 0.66666667 0.86666667 0.72380952 0.83492063 0.74401154 0.82093462 0.75426795 0.81309148 0.76045990 0.80807895 0.76460069 0.80460069
j = 1:  0.83333333 0.76666667 0.79523810 0.77936508 0.78946609 0.78247308 0.78760129 0.78367972 0.78677569 0.78426943 0.78633982 0.78460069
j = 2:  0.80000000 0.78095238 0.78730159 0.78441558 0.78596959 0.78503719 0.78564050 0.78522771 0.78552256 0.78530463 0.78547026
j = 3:  0.79047619 0.78412698 0.78585859 0.78519259 0.78550339 0.78533884 0.78543410 0.78537513 0.78541359 0.78538744
j = 4:  0.78730159 0.78499278 0.78552559 0.78534799 0.78542111 0.78538647 0.78540462 0.78539436 0.78540052
j = 5:  0.78614719 0.78525919 0.78543679 0.78538455 0.78540379 0.78539555 0.78539949 0.78539744
j = 6:  0.78570319 0.78534799 0.78541067 0.78539417 0.78539967 0.78539752 0.78539847
j = 7:  0.78552559 0.78537933 0.78540242 0.78539692 0.78539860 0.78539799
j = 8:  0.78545246 0.78539087 0.78539967 0.78539776 0.78539829
j = 9:  0.78542166 0.78539527 0.78539871 0.78539803
j = 10: 0.78540847 0.78539699 0.78539837
j = 11: 0.78540273 0.78539768
j = 12: 0.78540021

This table results from the J formula ‘b11.8’8!:2-:&(}:+}.)^:n+/\(_1^n)*%1+2*n=.i.13

In many cases the diagonal terms do not converge in one cycle, so the process of averaging must be repeated with the diagonal terms, by bringing them into a new row. This is needed, for example, for a geometric series with ratio −4. This process of successively averaging the averages of partial sums can be replaced by using a formula to calculate the diagonal terms directly.
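A small Python sketch (not part of the original text; the J one-liner above is the original's) that builds the same averaging table for the Leibniz series:

def averaging_table(terms):
    # Row 0: partial sums s_{0,k} of the alternating series sum (-1)^n a_n.
    rows, s = [[]], 0.0
    for n, a in enumerate(terms):
        s += (-1) ** n * a
        rows[0].append(s)
    # Each further row averages neighbouring entries of the previous row.
    while len(rows[-1]) > 1:
        prev = rows[-1]
        rows.append([(prev[k] + prev[k + 1]) / 2 for k in range(len(prev) - 1)])
    return rows

rows = averaging_table([1.0 / (2 * n + 1) for n in range(13)])
print(rows[0][12], rows[12][0], rows[8][4])   # s_{0,12}, s_{12,0}, s_{8,4}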

Matrix Splitting

In the mathematical discipline of numerical linear algebra, a matrix splitting is an expression which represents a given matrix as a sum or difference of matrices. Many iterative methods (for example, for systems of differential equations) depend upon the direct solution of matrix equations involving matrices more general than tridiagonal matrices. These matrix equations can often be solved directly and efficiently when written as a matrix splitting. The technique was devised by Richard S. Varga in 1960.


Regular Splittings

We seek to solve the matrix equation

$$Ax = k, \qquad (1)$$

where $A$ is a given $n \times n$ non-singular matrix, and $k$ is a given column vector with $n$ components. We split the matrix $A$ into

$$A = B - C, \qquad (2)$$

where $B$ and $C$ are $n \times n$ matrices. If, for an arbitrary $n \times n$ matrix $M$, $M$ has nonnegative entries, we write $M \ge 0$. If $M$ has only positive entries, we write $M > 0$. Similarly, if the matrix $M_1 - M_2$ has nonnegative entries, we write $M_1 \ge M_2$.

Definition: $A = B - C$ is a regular splitting of $A$ if and only if $B^{-1} \ge 0$ and $C \ge 0$.

We assume that matrix equations of the form

$$Bx = g, \qquad (3)$$

where $g$ is a given column vector, can be solved directly for the vector $x$. If (2) represents a regular splitting of $A$, then the iterative method

$$Bx^{(m+1)} = Cx^{(m)} + k, \qquad m = 0, 1, 2, \ldots, \qquad (4)$$

where $x^{(0)}$ is an arbitrary vector, can be carried out. Equivalently, we write (4) in the form

$$x^{(m+1)} = B^{-1}Cx^{(m)} + B^{-1}k, \qquad m = 0, 1, 2, \ldots \qquad (5)$$

The matrix $D = B^{-1}C$ has nonnegative entries if (2) represents a regular splitting of $A$.

It can be shown that if $A^{-1} > 0$, then $\rho(D) < 1$, where $\rho(D)$ represents the spectral radius of $D$, and thus $D$ is a convergent matrix. As a consequence, the iterative method (5) is necessarily convergent.

If, in addition, the splitting (2) is chosen so that the matrix B is a diagonal matrix (with the diago- nal entries all non-zero, since B must be invertible), then B can be inverted in linear time.

Matrix Iterative Methods

Many iterative methods can be described as a matrix splitting. If the diagonal entries of the matrix A are all nonzero, and we express the matrix A as the matrix sum

$$A = D - U - L, \qquad (6)$$

where $D$ is the diagonal part of $A$, and $U$ and $L$ are respectively strictly upper and strictly lower triangular $n \times n$ matrices, then we have the following.


The Jacobi method can be represented in matrix form as a splitting

$$x^{(m+1)} = D^{-1}(U + L)x^{(m)} + D^{-1}k. \qquad (7)$$

The Gauss-Seidel method can be represented in matrix form as a splitting

$$x^{(m+1)} = (D - L)^{-1}Ux^{(m)} + (D - L)^{-1}k. \qquad (8)$$

The method of successive over-relaxation can be represented in matrix form as a splitting

$$x^{(m+1)} = (D - \omega L)^{-1}[(1 - \omega)D + \omega U]x^{(m)} + \omega(D - \omega L)^{-1}k. \qquad (9)$$

Example Regular Splitting

In equation (1), let

6−− 23   5    Ak=−−=−1 4 2  ,  12 .    (10) −−3 1 5   10 Let us apply the splitting (7) which is used in the Jacobi method: we split A in such a way that B consists of all of the diagonal elements of A, and C consists of all of the off-diagonal elements of A, negated. (Of course this is not the only useful way to split a matrix into two matrices.) We have

(11) 600  023     BC= 0 4 0, = 1 0 2,     005  310 

1 00 18 13 16 6 −−11 AB11= 11 21 15 ,=  0 0 , 47 4 13 12 22   1 00 5

11 05 32   6 −−1111  D= BC = 0 ,Bk= − 3. 42  2 31  0  55

Since $B^{-1} \ge 0$ and $C \ge 0$, the splitting (11) is a regular splitting. Since $A^{-1} > 0$, the spectral radius $\rho(D) < 1$. (The approximate eigenvalues of $D$ are $\lambda_i \approx -0.4599820, -0.3397859, 0.7997679$.) Hence, the matrix $D$ is convergent and the method (5) necessarily converges for the problem (10). Note that the diagonal elements of $A$ are all greater than zero, the off-diagonal elements of $A$ are all less than zero and $A$ is strictly diagonally dominant.

The method (5) applied to the problem (10) then takes the form


11 05 32  6 (m+ 1) 11(m) xx= 0 +−3 , m= 0,1,2,... 42 2 31  0  55 (12)

The exact solution to equation (12) is

$$x = \begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix}. \qquad (13)$$

The first few iterates for equation (12) are listed in the table below, beginning with $x^{(0)} = (0.0, 0.0, 0.0)^T$. From the table one can see that the method is evidently converging to the solution (13), albeit rather slowly.

x1^(m)     x2^(m)     x3^(m)
0.0        0.0        0.0
0.83333    -3.0000    2.0000
0.83333    -1.7917    1.9000
1.1861     -1.8417    2.1417
1.2903     -1.6326    2.3433
1.4608     -1.5058    2.4477
1.5553     -1.4110    2.5753
1.6507     -1.3235    2.6510
1.7177     -1.2618    2.7257
1.7756     -1.2077    2.7783
1.8199     -1.1670    2.8238
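As a hedged sketch of how the splittings (7)-(9) can be run in practice (not part of the original text; numpy is assumed and names are illustrative), the iterates in the table above can be reproduced by iterating x^(m+1) = B^{-1}(C x^(m) + k) with the appropriate choice of B and C:

import numpy as np

def splitting_iteration(B, C, k, x0, steps):
    # Iterate x^(m+1) = B^{-1} (C x^(m) + k), i.e. the scheme (4)/(5).
    x = x0.astype(float)
    for _ in range(steps):
        x = np.linalg.solve(B, C @ x + k)
    return x

A = np.array([[6., -2., -3.], [-1., 4., -2.], [-3., -1., 5.]])
k = np.array([5., -12., 10.])
D = np.diag(np.diag(A))
L = -np.tril(A, -1)          # strictly lower part of A, negated (A = D - U - L)
U = -np.triu(A, 1)           # strictly upper part of A, negated
x0 = np.zeros(3)

jacobi = splitting_iteration(D, U + L, k, x0, 50)              # splitting (7)
gauss_seidel = splitting_iteration(D - L, U, k, x0, 50)        # splitting (8)
w = 1.1
sor = splitting_iteration(D - w * L, (1 - w) * D + w * U, w * k, x0, 50)  # splitting (9)
print(jacobi, gauss_seidel, sor)   # all approach the exact solution (2, -1, 3)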

Jacobi Method

As stated above, the Jacobi method (7) is the same as the specific regular splitting (11) demonstrated above.

Gauss-Seidel Method

Since the diagonal entries of the matrix A in problem (10) are all nonzero, we can express the ma- trix A as the splitting (6), where


600  023  000      DUL= 0 4 0, = 0 0 2, = 1 0 0.  005  000  310      (14)

We then have

20 0 0 −1 1  (DL−= ) 5 30 0 , 120  13 6 24

0 40 60  100 −−1111   (DL− ) U = 0 10 75  , (DL−=− ) k 335 . 120   120  0 26 51  233

The Gauss-Seidel method (8) applied to the problem (10) takes the form

0 40 60  100 (15) (mm+ 1) 11 ()  xx=0 10 75 +− 335 ,m = 0,1,2, … 120  120  0 26 51  233 The first few iterates for equation 15( ) are listed in the table below, beginning with x(0) = (0.0, 0.0, 0.0)T. From the table one can see that the method is evidently converging to the solution (13), somewhat faster than the Jacobi method described above.

x1^(m)     x2^(m)     x3^(m)
0.0        0.0        0.0
0.8333     -2.7917    1.9417
0.8736     -1.8107    2.1620
1.3108     -1.5913    2.4682
1.5370     -1.3817    2.6459
1.6957     -1.2531    2.7668
1.7990     -1.1668    2.8461
1.8675     -1.1101    2.8985
1.9126     -1.0726    2.9330
1.9423     -1.0479    2.9558
1.9619     -1.0316    2.9708

Successive Over-relaxation Method

Let ω = 1.1. Using the splitting (14) of the matrix A in problem (10) for the successive over-relax- ation method, we have

2 00 −1 1  (D−=ω L) 0.55 3 0 12  1.441 0.66 2.4


−1.2 4.4 6.6 −1 1  (D−ω L) [(1 −+= ωω )D U] − 0.33 0.01 8.415 , 12  −0.8646 2.9062 5.0073

11 −1 1  ωω(D−=− L) k 36.575 . 12  25.6135

The successive over-relaxation method (9) applied to the problem (10) takes the form

−1.2 4.4 6.6  11 + 11   xx(mm 1) = −0.33 0.01 8.415 () +−36.575 ,m = 0,1,2, … 12  12  −0.8646 2.9062 5.0073  25.6135    (16)

The first few iterates for equation (16) are listed in the table below, beginning with x(0) = (0.0, 0.0, 0.0)T. From the table one can see that the method is evidently converging to the solution (13), slightly faster than the Gauss-Seidel method described above.

x1^(m)     x2^(m)     x3^(m)
0.0        0.0        0.0
0.9167     -3.0479    2.1345
0.8814     -1.5788    2.2209
1.4711     -1.5161    2.6153
1.6521     -1.2557    2.7526
1.8050     -1.1641    2.8599
1.8823     -1.0930    2.9158
1.9314     -1.0559    2.9508
1.9593     -1.0327    2.9709
1.9761     -1.0185    2.9829
1.9862     -1.0113    2.9901

Gaussian Elimination

In linear algebra, Gaussian elimination (also known as row reduction) is an algorithm for solving systems of linear equations. It is usually understood as a sequence of operations performed on the corresponding matrix of coefficients. This method can also be used to find the rank of a matrix, to calculate the determinant of a matrix, and to calculate the inverse of an invertible square matrix.


The method is named after Carl Friedrich Gauss (1777–1855), although it was known to Chinese mathematicians as early as 179 CE.

To perform row reduction on a matrix, one uses a sequence of elementary row operations to modi- fy the matrix until the lower left-hand corner of the matrix is filled with zeros, as much as possible. There are three types of elementary row operations: 1) Swapping two rows, 2) Multiplying a row by a non-zero number, 3) Adding a multiple of one row to another row. Using these operations, a matrix can always be transformed into an upper triangular matrix, and in fact one that is in row echelon form. Once all of the leading coefficients (the left-most non-zero entry in each row) are 1, and every column containing a leading coefficient has zeros elsewhere, the matrix is said to be in reduced row echelon form. This final form is unique; in other words, it is independent of the sequence of row operations used. For example, in the following sequence of row operations (where multiple elementary operations might be done at each step), the third and fourth matrices are the ones in row echelon form, and the final matrix is the unique reduced row echelon form.

131 9  13 1 9 13 1 9 102−− 3       1111−  → 0228 −−−→ 0228 −−−→ 0114  3 11 5 35  0 2 2 8 0 0 0 0 0 0 0 0 

Using row operations to convert a matrix into reduced row echelon form is sometimes called Gauss–Jordan elimination. Some authors use the term Gaussian elimination to refer to the process until it has reached its upper triangular, or (non-reduced) row echelon form. For computational reasons, when solving systems of linear equations, it is sometimes preferable to stop row operations before the matrix is completely reduced.

Definitions and Example of Algorithm

The process of row reduction makes use of elementary row operations, and can be divided into two parts. The first part (sometimes called Forward Elimination) reduces a given system to row echelon form, from which one can tell whether there are no solutions, a unique solution, or infinitely many solutions. The second part (sometimes called back substitution) continues to use row operations until the solution is found; in other words, it puts the matrix into re- duced row echelon form.

Another point of view, which turns out to be very useful to analyze the algorithm, is that row reduction produces a matrix decomposition of the original matrix. The elementary row operations may be viewed as the multiplication on the left of the original matrix by elementary matrices. Alternatively, a sequence of elementary operations that reduces a single row may be viewed as multiplication by a Frobenius matrix. Then the first part of the algorithm computes an LU decomposition, while the second part writes the original matrix as the product of a uniquely determined invertible matrix and a uniquely determined reduced row echelon matrix.

Row Operations

There are three types of elementary row operations which may be performed on the rows of a ma- trix:


Type 1: Swap the positions of two rows.

Type 2: Multiply a row by a nonzero scalar.

Type 3: Add to one row a scalar multiple of another.

If the matrix is associated to a system of linear equations, then these operations do not change the solution set. Therefore, if one’s goal is to solve a system of linear equations, then using these row operations could make the problem easier.

Echelon Form

For each row in a matrix, if the row does not consist of only zeros, then the left-most non-zero entry is called the leading coefficient (or pivot) of that row. So if two leading coefficients are in the same column, then a row operation of type 3 could be used to make one of those coefficients zero. Then by using the row swapping operation, one can always order the rows so that for every non-zero row, the leading coefficient is to the right of the leading coefficient of the row above. If this is the case, then the matrix is said to be in row echelon form. So the lower left part of the matrix contains only zeros, and all of the zero rows are below the non-zero rows. The word "echelon" is used here because one can roughly think of the rows being ranked by their size, with the largest being at the top and the smallest being at the bottom.

For example, the following matrix is in row echelon form, and its leading coefficients are shown in red.

02 11−  003 1 000 0

It is in echelon form because the zero row is at the bottom, and the leading coefficient of the second row (in the third column), is to the right of the leading coefficient of the first row (in the second column).

A matrix is said to be in reduced row echelon form if furthermore all of the leading coefficients are equal to 1 (which can be achieved by using the elementary row operation of type 2), and in every column containing a leading coefficient, all of the other entries in that column are zero (which can be achieved by using elementary row operations of type 3).

Example of The Algorithm

Suppose the goal is to find and describe the set of solutions to the following system of linear equations:

$$\begin{aligned} 2x + y - z &= 8 && (L_1) \\ -3x - y + 2z &= -11 && (L_2) \\ -2x + y + 2z &= -3 && (L_3) \end{aligned}$$


The table below is the row reduction process applied simultaneously to the system of equations, and its associated augmented matrix. In practice, one does not usually deal with the systems in terms of equations but instead makes use of the augmented matrix, which is more suitable for computer manipulations. The row reduction procedure may be summarized as follows: eliminate

x from all equations below L1, and then eliminate y from all equations below L2. This will put the system into triangular form. Then, using back-substitution, each unknown can be solved for.

The second column describes which row operations have just been performed. So for the first step,

the $x$ is eliminated from $L_2$ by adding $\tfrac{3}{2}L_1$ to $L_2$. Next, $x$ is eliminated from $L_3$ by adding $L_1$ to $L_3$. These row operations are labelled in the table as

$$L_2 + \tfrac{3}{2}L_1 \to L_2,$$

$$L_3 + L_1 \to L_3.$$

Once $y$ is also eliminated from the third row, the result is a system of linear equations in triangular form, and so the first part of the algorithm is complete. From a computational point of view, it is faster to solve the variables in reverse order, a process known as back-substitution. One sees the solution is $z = -1$, $y = 3$, and $x = 2$. So there is a unique solution to the original system of equations.
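As a quick hedged check (not part of the original text; numpy is assumed), the same solution can be confirmed numerically:

import numpy as np

A = np.array([[2., 1., -1.], [-3., -1., 2.], [-2., 1., 2.]])
b = np.array([8., -11., -3.])
print(np.linalg.solve(A, b))   # [ 2.  3. -1.], i.e. x = 2, y = 3, z = -1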

Instead of stopping once the matrix is in echelon form, one could continue until the matrix is in reduced row echelon form, as it is done in the table. The process of row reducing until the matrix is reduced is sometimes referred to as Gauss-Jordan elimination, to distinguish it from stopping after reaching echelon form.

History

The method of Gaussian elimination appears in the Chinese mathematical text Chapter Eight Rect- angular Arrays of The Nine Chapters on the Mathematical Art. Its use is illustrated in eighteen problems, with two to five equations. The first reference to the book by this title is dated to 179 CE, but parts of it were written as early as approximately 150 BCE. It was commented on by Liu Hui in the 3rd century.

The method in Europe stems from the notes of Isaac Newton. In 1670, he wrote that all the alge- bra books known to him lacked a lesson for solving simultaneous equations, which Newton then supplied. Cambridge University eventually published the notes as Arithmetica Universalis in 1707 long after Newton left academic life. The notes were widely imitated, which made (what is now called) Gaussian elimination a standard lesson in algebra textbooks by the end of the 18th century. Carl Friedrich Gauss in 1810 devised a notation for symmetric elimination that was adopted in the 19th century by professional hand computers to solve the normal equations of least-squares prob- lems. The algorithm that is taught in high school was named for Gauss only in the 1950s as a result of confusion over the history of the subject.

Some authors use the term Gaussian elimination to refer only to the procedure until the matrix is in echelon form, and use the term Gauss-Jordan elimination to refer to the procedure which ends in reduced echelon form. The name is used because it is a variation of Gaussian elimination as described by Wilhelm Jordan in 1888. However, the method also appears in an article by Clasen published in the same year. Jordan and Clasen probably discovered Gauss–Jordan elimination independently.

Applications

The historically first application of the row reduction method is for solving systems of linear equations. Here are some other important applications of the algorithm.

Computing Determinants

To explain how Gaussian elimination allows the computation of the determinant of a square ma- trix, we have to recall how the elementary row operations change the determinant:

• Swapping two rows multiplies the determinant by -1

• Multiplying a row by a nonzero scalar multiplies the determinant by the same scalar

• Adding to one row a scalar multiple of another does not change the determinant.

If the Gaussian elimination applied to a square matrix A produces a row echelon matrix B, let d be the product of the scalars by which the determinant has been multiplied, using above rules. Then the determinant of A is the quotient by d of the product of the elements of the diagonal of B: det(A) = ∏diag(B) / d.
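A hedged sketch of this determinant computation (not from the original text; only row swaps and row additions are used here, so d merely records the sign changes from pivoting):

def determinant(A):
    # Gaussian elimination with partial pivoting; d tracks how the determinant
    # has been multiplied along the way (a factor of -1 per row swap).
    A = [list(map(float, row)) for row in A]
    n, d = len(A), 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))   # pivot row
        if A[p][k] == 0.0:
            return 0.0
        if p != k:
            A[k], A[p] = A[p], A[k]
            d = -d
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
    prod = 1.0
    for k in range(n):
        prod *= A[k][k]     # product of the diagonal of the echelon matrix B
    return prod / d         # det(A) = prod(diag(B)) / d

print(determinant([[6, -2, -3], [-1, 4, -2], [-3, -1, 5]]))  # 47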

Computationally, for an $n \times n$ matrix, this method needs only $O(n^3)$ arithmetic operations, while solving by elementary methods requires $O(2^n)$ or $O(n!)$ operations. Even on the fastest computers, the elementary methods are impractical for $n$ above 20.

Finding The Inverse of A Matrix

A variant of Gaussian elimination called Gauss–Jordan elimination can be used for finding the inverse of a matrix, if it exists. If $A$ is an $n$ by $n$ square matrix, then one can use row reduction to compute its inverse matrix, if it exists. First, the $n$ by $n$ identity matrix is augmented to the right of $A$, forming an $n$ by $2n$ block matrix $[A \mid I]$. Now through application of elementary row operations, find the reduced echelon form of this $n$ by $2n$ matrix. The matrix $A$ is invertible if and only if the left block can be reduced to the identity matrix $I$; in this case the right block of the final matrix is $A^{-1}$. If the algorithm is unable to reduce the left block to $I$, then $A$ is not invertible.

For example, consider the following matrix

2− 10  A =−−1 2 1. 0− 12

To find the inverse of this matrix, one takes the following matrix augmented by the identity and row-reduces it as a 3 by 6 matrix:

2− 1 0 100  [AI | ]=−− 1 2 1 0 1 0. 0− 1 2 001

By performing row operations, one can check that the reduced row echelon form of this augmented matrix is:

311 100 424 11 [I | B]=  0 1 0 1 . 22  113 001 424

One can think of each row operation as the left product by an elementary matrix. Denoting by B the product of these elementary matrices, we showed, on the left, that BA = I, and therefore, B = A−1. On the right, we kept a record of BI = B, which we know is the inverse desired. This procedure for finding the inverse works for square matrices of any size.
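A hedged numpy sketch of this [A | I] procedure (our illustration, not the original's code); for the 3 by 3 example above it returns the inverse shown:

import numpy as np

def inverse_via_gauss_jordan(A):
    n = len(A)
    M = np.hstack([A.astype(float), np.eye(n)])   # augmented matrix [A | I]
    for k in range(n):
        p = k + np.argmax(np.abs(M[k:, k]))       # partial pivoting
        if M[p, k] == 0:
            raise ValueError("matrix is not invertible")
        M[[k, p]] = M[[p, k]]                     # swap rows k and p
        M[k] /= M[k, k]                           # make the leading coefficient 1
        for i in range(n):
            if i != k:
                M[i] -= M[i, k] * M[k]            # clear the rest of the column
    return M[:, n:]                               # the right block is A^-1

A = np.array([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])
print(inverse_via_gauss_jordan(A))   # [[0.75 0.5 0.25], [0.5 1. 0.5], [0.25 0.5 0.75]]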

Computing Ranks and Bases

The Gaussian elimination algorithm can be applied to any $m \times n$ matrix $A$. In this way, for example, some $6 \times 9$ matrices can be transformed to a matrix that has a row echelon form like

$$T = \begin{pmatrix} a & * & * & * & * & * & * & * & * \\ 0 & 0 & b & * & * & * & * & * & * \\ 0 & 0 & 0 & c & * & * & * & * & * \\ 0 & 0 & 0 & 0 & 0 & 0 & d & * & * \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & e \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

where the *s are arbitrary entries and a, b, c, d, e are nonzero entries. This echelon matrix T contains a wealth of information about A : the rank of A is 5 since there are 5 non-zero rows in T ; the vector space spanned by the columns of A has a basis consisting of the first, third, fourth, seventh and ninth column of A (the columns of a, b, c, d, e in T ), and the *s tell you how the other columns of A can be written as linear combinations of the basis columns. This is a consequence of the distributivity of the dot product in the expression of a linear map as a matrix.

All of this applies also to the reduced row echelon form, which is a particular row echelon form.


Computational Efficiency

The number of arithmetic operations required to perform row reduction is one way of measuring the algorithm's computational efficiency. For example, to solve a system of $n$ equations for $n$ unknowns by performing row operations on the matrix until it is in echelon form, and then solving for each unknown in reverse order, requires $n(n+1)/2$ divisions, $(2n^3 + 3n^2 - 5n)/6$ multiplications, and $(2n^3 + 3n^2 - 5n)/6$ subtractions, for a total of approximately $2n^3/3$ operations. Thus it has arithmetic complexity of $O(n^3)$. This arithmetic complexity is a good measure of the time needed for the whole computation when the time for each arithmetic operation is approximately constant. This is the case when the coefficients are represented by floating point numbers or when they belong to a finite field. If the coefficients are integers or rational numbers exactly represented, the intermediate entries can grow exponentially large, so the bit complexity is exponential. However, there is a variant of Gaussian elimination, called the Bareiss algorithm, that avoids this exponential growth of the intermediate entries and, with the same arithmetic complexity of $O(n^3)$, has a bit complexity of $O(n^5)$.

This algorithm can be used on a computer for systems with thousands of equations and unknowns. However, the cost becomes prohibitive for systems with millions of equations. These large systems are generally solved using iterative methods. Specific methods exist for systems whose coefficients follow a regular pattern.

To put an $n$ by $n$ matrix into reduced echelon form by row operations, one needs $n^3$ arithmetic operations, which is approximately 50% more computation steps.

One possible problem is numerical instability, caused by the possibility of dividing by very small numbers. If, for example, the leading coefficient of one of the rows is very close to zero, then to row-reduce the matrix one would need to divide by that number so the leading coefficient is 1. This means any error that existed for the number which was close to zero would be amplified. Gaussian elimination is numerically stable for diagonally dominant or positive-definite matrices. For general matrices, Gaussian elimination is usually considered to be stable when using partial pivoting, even though there are examples of stable matrices for which it is unstable.

Generalizations

The Gaussian elimination can be performed over any field, not just the real numbers.

Gaussian elimination does not generalize in any simple way to higher order tensors (matrices are array representations of order 2 tensors); even computing the rank of a tensor of order greater than 2 is a difficult problem.

Pseudocode

As explained above, Gaussian elimination writes a given m × n matrix A uniquely as a product of an invertible m × m matrix S and a row-echelon matrix T. Here, S is the product of the matrices corresponding to the row operations performed.

The formal algorithm to compute $T$ from $A$ follows. We write $A[i, j]$ for the entry in row $i$, column $j$ in matrix $A$ with 1 being the first index. The transformation is performed in place, meaning that the original matrix $A$ is lost and successively replaced by $T$.

for k = 1 ... min(m, n):
    # Find the k-th pivot:
    i_max := argmax (i = k ... m, abs(A[i, k]))
    if A[i_max, k] = 0
        error "Matrix is singular!"
    swap rows(k, i_max)
    # Do for all rows below pivot:
    for i = k + 1 ... m:
        f := A[i, k] / A[k, k]
        # Do for all remaining elements in current row:
        for j = k + 1 ... n:
            A[i, j] := A[i, j] - A[k, j] * f
        # Fill lower triangular matrix with zeros:
        A[i, k] := 0

This algorithm differs slightly from the one discussed earlier, because before eliminating a variable, it first exchanges rows to move the entry with the largest absolute value to the pivot position. Such partial pivoting improves the numerical stability of the algorithm; some other variants are used.
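For concreteness, here is a runnable Python rendering of the same procedure (our sketch, not part of the original text), operating in place on a list-of-lists matrix:

def forward_eliminate(A):
    # Runnable version of the pseudocode above: partial pivoting, in place.
    m, n = len(A), len(A[0])
    for k in range(min(m, n)):
        # Find the k-th pivot: row with the largest absolute value in column k.
        i_max = max(range(k, m), key=lambda i: abs(A[i][k]))
        if A[i_max][k] == 0:
            raise ValueError("Matrix is singular!")
        A[k], A[i_max] = A[i_max], A[k]          # swap rows k and i_max
        for i in range(k + 1, m):                # for all rows below the pivot
            f = A[i][k] / A[k][k]
            for j in range(k + 1, n):            # remaining elements in row i
                A[i][j] -= A[k][j] * f
            A[i][k] = 0.0                        # fill lower triangle with zeros
    return A

A = [[2.0, 1.0, -1.0, 8.0], [-3.0, -1.0, 2.0, -11.0], [-2.0, 1.0, 2.0, -3.0]]
forward_eliminate(A)   # A is now in row echelon form, ready for back-substitution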

Upon completion of this procedure the augmented matrix will be in row-echelon form and may be solved by back-substitution.

With modern computers, Gaussian elimination is not always the fastest algorithm to compute the row echelon form of a matrix. There are computer libraries, like BLAS, that exploit the specifics of the computer hardware and of the structure of the matrix to choose the best algorithm automatically.

Convex Optimization

Convex minimization, a subfield of optimization, studies the problem of minimizing convex functions over convex sets. The convexity property can make optimization in some sense "easier" than the general case - for example, any local minimum must be a global minimum.

Given a real vector space $X$ together with a convex, real-valued function

$$f: \mathcal{X} \to \mathbb{R}$$

defined on a convex subset $\mathcal{X}$ of $X$, the problem is to find any point $x^*$ in $\mathcal{X}$ for which the number $f(x^*)$ is smallest, i.e., a point $x^*$ such that

$$f(x^*) \le f(x) \quad \text{for all } x \in \mathcal{X}.$$

The convexity of f makes the powerful tools of convex analysis applicable. In finite-dimensional normed spaces, the Hahn–Banach theorem and the existence of subgradients lead to a particularly satisfying theory of necessary and sufficient conditions for optimality, a duality theory generalizing that for linear programming, and effective computational methods.

Convex minimization has applications in a wide range of disciplines, such as automatic control systems, estimation and signal processing, communications and networks, electronic circuit design, data analysis and modeling, statistics (optimal design), and finance. With recent improvements in computing and in optimization theory, convex minimization is nearly as straightforward as linear programming. Many optimization problems can be reformulated as convex minimization problems. For example, the problem of maximizing a concave function $f$ can be re-formulated equivalently as a problem of minimizing the function $-f$, which is convex.

Convex Optimization Problem

The general form of an optimization problem (also referred to as a mathematical programming problem or minimization problem) is to find some $x^* \in \mathcal{X}$ such that

$$f(x^*) = \min\{f(x) : x \in \mathcal{X}\},$$

for some feasible set $\mathcal{X} \subset \mathbb{R}^n$ and objective function $f(x): \mathbb{R}^n \to \mathbb{R}$. The optimization problem is called a convex optimization problem if $\mathcal{X}$ is a convex set and $f(x)$ is a convex function defined on $\mathbb{R}^n$. Alternatively, an optimization problem of the form

$$\begin{aligned} &\text{minimize} && f(x) \\ &\text{subject to} && g_i(x) \le 0, \quad i = 1, \ldots, m \end{aligned}$$

is called convex if the functions $f, g_1, \ldots, g_m : \mathbb{R}^n \to \mathbb{R}$ are all convex functions.

Theory

The following statements are true about the convex minimization problem:

• if a local minimum exists, then it is a global minimum.

• the set of all (global) minima is convex.

• for each strictly convex function, if the function has a minimum, then the minimum is unique.

These results are used by the theory of convex minimization along with geometric notions from functional analysis (in Hilbert spaces) such as the Hilbert projection theorem, the separating hy- perplane theorem, and Farkas’ lemma.


Standard Form

Standard form is the usual and most intuitive form of describing a convex minimization problem. It consists of the following three parts:

• A convex function $f(x): \mathbb{R}^n \to \mathbb{R}$ to be minimized over the variable $x$

• Inequality constraints of the form $g_i(x) \le 0$, where the functions $g_i$ are convex

• Equality constraints of the form $h_i(x) = 0$, where the functions $h_i$ are affine. In practice, the terms "linear" and "affine" are often used interchangeably. Such constraints can be expressed in the form $h_i(x) = a_i^T x + b_i$, where $a_i$ is a column vector and $b_i$ a real number.

A convex minimization problem is thus written as

$$\begin{aligned} &\underset{x}{\text{minimize}} && f(x) \\ &\text{subject to} && g_i(x) \le 0, \quad i = 1, \ldots, m \\ &&& h_i(x) = 0, \quad i = 1, \ldots, p. \end{aligned}$$

Note that every equality constraint $h(x) = 0$ can be equivalently replaced by a pair of inequality constraints $h(x) \le 0$ and $-h(x) \le 0$. Therefore, for theoretical purposes, equality constraints are redundant; however, it can be beneficial to treat them specially in practice.

Following from this fact, it is easy to understand why $h_i(x) = 0$ has to be affine as opposed to merely being convex. If $h_i(x)$ is convex, $h_i(x) \le 0$ is convex, but $-h_i(x) \le 0$ is concave. Therefore, the only way for $h_i(x) = 0$ to be convex is for $h_i(x)$ to be affine.
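As a hedged sketch of solving a small problem in this standard form (not from the original text; SciPy's general-purpose solver is assumed, and note that its 'ineq' convention is c(x) >= 0, so the constraint g(x) <= 0 is passed as -g):

import numpy as np
from scipy.optimize import minimize

# Convex objective f(x) = ||x - (1, 2)||^2, convex inequality g(x) = x0 + x1 - 1 <= 0.
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
g = lambda x: x[0] + x[1] - 1.0

res = minimize(f, x0=np.zeros(2), method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda x: -g(x)}])
print(res.x)   # roughly (0, 1): the projection of (1, 2) onto the half-plane x0 + x1 <= 1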

Examples

The following problems are all convex minimization problems, or can be transformed into convex minimization problems via a change of variables:

• Least squares

• Linear programming

• Convex quadratic minimization with linear constraints

• quadratic minimization with convex quadratic constraints

• Conic optimization

• Geometric programming

• Second order cone programming

• Entropy maximization with appropriate constraints


Lagrange Multipliers

Consider a convex minimization problem given in standard form by a cost function $f(x)$ and inequality constraints $g_i(x) \le 0$ for $1 \le i \le m$. Then the domain $\mathcal{X}$ is:

$$\mathcal{X} = \{x \in X \mid g_1(x), \ldots, g_m(x) \le 0\}.$$

The Lagrangian function for the problem is

$$L(x, \lambda_0, \lambda_1, \ldots, \lambda_m) = \lambda_0 f(x) + \lambda_1 g_1(x) + \cdots + \lambda_m g_m(x).$$

For each point $x$ in $X$ that minimizes $f$ over $X$, there exist real numbers $\lambda_0, \lambda_1, \ldots, \lambda_m$, called Lagrange multipliers, that satisfy these conditions simultaneously:

1. $x$ minimizes $L(y, \lambda_0, \lambda_1, \ldots, \lambda_m)$ over all $y \in X$,

2. $\lambda_0, \lambda_1, \ldots, \lambda_m \ge 0$, with at least one $\lambda_k > 0$,

3. $\lambda_1 g_1(x) = \cdots = \lambda_m g_m(x) = 0$ (complementary slackness).

If there exists a "strictly feasible point", that is, a point $z$ satisfying

$$g_1(z), \ldots, g_m(z) < 0,$$

then the statement above can be strengthened to require that $\lambda_0 = 1$.

Conversely, if some $x$ in $X$ satisfies (1)–(3) for scalars $\lambda_0, \ldots, \lambda_m$ with $\lambda_0 = 1$, then $x$ is certain to minimize $f$ over $X$.

Methods

Convex minimization problems can be solved by the following contemporary methods:

• "Bundle methods" (Wolfe, Lemaréchal, Kiwiel),
• Subgradient projection methods (Polyak), and
• Interior-point methods (Nemirovskii and Nesterov).

Other methods of interest:

• Cutting-plane methods
• Ellipsoid method
• Subgradient method
• Dual subgradients and the drift-plus-penalty method

Subgradient methods can be implemented simply and so are widely used. Dual subgradient meth- ods are subgradient methods applied to a dual problem. The drift-plus-penalty method is similar to the dual subgradient method, but takes a time average of the primal variables.


Convex Minimization With Good Complexity: Self-concordant Barriers

The efficiency of iterative methods is poor for the class of convex problems, because this class includes "bad guys" whose minimum cannot be approximated without a large number of function and subgradient evaluations; thus, to have practically appealing efficiency results, it is necessary to make additional restrictions on the class of problems. Two such classes are problems with special barrier functions: first, self-concordant barrier functions, according to the theory of Nesterov and Nemirovskii, and second, self-regular barrier functions according to the theory of Terlaky and coauthors.

Quasiconvex Minimization

Problems with convex level sets can be efficiently minimized, in theory. Yurii Nesterov proved that quasi-convex minimization problems could be solved efficiently, and his results were extended by Kiwiel. However, such theoretically "efficient" methods use "divergent-series" stepsize rules, which were first developed for classical subgradient methods. Classical subgradient methods using divergent-series rules are much slower than modern methods of convex minimization, such as subgradient projection methods, bundle methods of descent, and nonsmooth filter methods.

Solving even close-to-convex but non-convex problems can be computationally intractable. Mini- mizing a unimodal function is intractable, regardless of the smoothness of the function, according to results of Ivanov.

Convex Maximization

Conventionally, the definition of the convex optimization problem (we recall) requires that the objective function f to be minimized and the feasible set be convex. In the special case of linear programming (LP), the objective function is both concave and convex, and so LP can also consider the problem of maximizing an objective function without confusion. However, for most convex minimization problems, the objective function is not concave, and therefore such problems are formulated in the standard form of convex optimization problems, that is, minimizing the convex objective function.

For nonlinear convex minimization, the associated maximization problem obtained by substituting the supremum operator for the infimum operator is not a problem of convex optimization, as conventionally defined. However, it is studied in the larger field of convex optimization as a problem of convex maximization.

The convex maximization problem is especially important for studying the existence of maxima. Consider the restriction of a convex function to a compact convex set: then, on that set, the function attains its constrained maximum only on the boundary. Such results, called "maximum principles", are useful in the theory of harmonic functions, potential theory, and partial differential equations.

The problem of minimizing a quadratic multivariate polynomial on a cube is NP-hard. In fact, the quadratic minimization problem is NP-hard even when the matrix has only one negative eigenvalue.


Extensions

Advanced treatments consider convex functions that can attain positive infinity, also; the indicator function of convex analysis is zero for every $x \in \mathcal{X}$ and positive infinity otherwise.

Extensions of convex functions include biconvex, pseudo-convex, and quasi-convex functions. Partial extensions of the theory of convex analysis and iterative methods for approximately solving non-convex minimization problems occur in the field of generalized convexity (“abstract convex analysis”).

References

• Bertsekas, Dimitri P.; Nedic, Angelia; Ozdaglar, Asuman (2003). Convex Analysis and Optimization. Belmont, MA.: Athena Scientific. ISBN 1-886529-45-0.

• Bertsekas, Dimitri P. (2009). Convex Optimization Theory. Belmont, MA.: Athena Scientific. ISBN 978-1-886529-31-1.

• Boyd, Stephen P.; Vandenberghe, Lieven (2004). Convex Optimization. Cambridge University Press. ISBN 978-0-521-83378-3. Retrieved October 15, 2011.

• Kiwiel, Krzysztof C. (1985). Methods of Descent for Nondifferentiable Optimization. Lecture Notes in Mathematics. New York: Springer-Verlag. ISBN 978-3-540-15642-0.

Essential Aspects of Numerical Analysis

Numerical integration is a broad family of algorithms for calculating the numerical value of a definite integral. The aspects of numerical analysis explained in this section are the Monte Carlo method, Monte Carlo integration, mathematical optimization, optimization problems, singular value decomposition, etc. The topics discussed in this section are of great importance to broaden the existing knowledge of numerical analysis.

Numerical Integration

In numerical analysis, numerical integration constitutes a broad family of algorithms for calculat- ing the numerical value of a definite integral, and by extension, the term is also sometimes used to describe the numerical solution of differential equations. The term numerical quadrature (often abbreviated to quadrature) is more or less a synonym for numerical integration, especially as applied to one-dimensional integrals. Some authors refer to numerical integration over more than one dimension as cubature; others take quadrature to include higher-dimensional integration.

Numerical integration consists of finding numerical approximations for the value S

The basic problem in numerical integration is to compute an approximate solution to a definite integral

$$\int_a^b f(x)\,dx$$

to a given degree of accuracy. If f(x) is a smooth function integrated over a small number of dimen- sions, and the domain of integration is bounded, there are many methods for approximating the integral to the desired precision.

History

The term “numerical integration” first appears in 1915 in the publication A Course in Interpolation and Numeric Integration for the Mathematical Laboratory by David Gibb.


Quadrature is a historical mathematical term that means calculating area. Quadrature problems have served as one of the main sources of mathematical analysis. Mathematicians of Ancient Greece, according to the Pythagorean doctrine, understood calculation of area as the process of constructing geometrically a square having the same area (squaring). That is why the process was named quadrature. For example, a quadrature of the circle, Lune of Hippocrates, The Quadrature of the Parabola. This construction must be performed only by means of compass and straightedge.

The ancient Babylonians used the trapezoidal rule to integrate the motion of Jupiter along the ecliptic.

Antique method to find the Geometric mean

For a quadrature of a rectangle with the sides $a$ and $b$ it is necessary to construct a square with the side $x = \sqrt{ab}$ (the geometric mean of $a$ and $b$). For this purpose it is possible to use the following fact: if we draw the circle with the sum of $a$ and $b$ as the diameter, then the height $BH$ (from a point of their connection to crossing with a circle) equals their geometric mean. The similar geometrical construction solves a problem of a quadrature for a parallelogram and a triangle.

The area of a segment of a parabola

Problems of quadrature for curvilinear figures are much more difficult. The quadrature of the circle with compass and straightedge had been proved in the 19th century to be impossible. Nevertheless, for some figures (for example the Lune of Hippocrates) a quadrature can be performed. The quadratures of a sphere surface and a parabola segment done by Archimedes became the highest achievement of the antique analysis.

• The area of the surface of a sphere is equal to quadruple the area of a great circle of this sphere.

• The area of a segment of the parabola cut from it by a straight line is 4/3 the area of the triangle inscribed in this segment.

For the proof of the results Archimedes used the Method of exhaustion of Eudoxus.


In medieval Europe the quadrature meant calculation of area by any method. More often the Method of indivisibles was used; it was less rigorous, but more simple and powerful. With its help Galileo Galilei and Gilles de Roberval found the area of a cycloid arch, Grégoire de Saint-Vincent investigated the area under a hyperbola (Opus Geometricum, 1647), and Alphonse Antonio de Sarasa, de Saint-Vincent’s pupil and commentator noted the relation of this area to logarithms.

John Wallis algebrised this method: he wrote in his Arithmetica Infinitorum (1656) series that we now call the definite integral, and he calculated their values. Isaac Barrow and James Gregory made further progress: quadratures for some algebraic curves and spirals. Christiaan Huygens successfully performed a quadrature of some Solids of revolution.

The quadrature of the hyperbola by Saint-Vincent and de Sarasa provided a new function, the nat- ural logarithm, of critical importance.

With the invention of integral calculus came a universal method for area calculation. In response, the term quadrature has become traditional, and instead the modern phrase “computation of a univariate definite integral” is more common.

Reasons for Numerical Integration

There are several reasons for carrying out numerical integration.

1. The integrand f(x) may be known only at certain points, such as obtained by sampling. Some embedded systems and other computer applications may need numerical integra- tion for this reason.

2. A formula for the integrand may be known, but it may be difficult or impossible to find an antiderivative that is an elementary function. An example of such an integrand is $f(x) = \exp(-x^2)$, the antiderivative of which (the error function, times a constant) cannot be written in elementary form.

3. It may be possible to find an antiderivative symbolically, but it may be easier to compute a numerical approximation than to compute the antiderivative. That may be the case if the antiderivative is given as an infinite series or product, or if its evaluation requires a special function that is not available.

Methods for One-dimensional Integrals

Numerical integration methods can generally be described as combining evaluations of the inte- grand to get an approximation to the integral. The integrand is evaluated at a finite set of points called integration points and a weighted sum of these values is used to approximate the integral. The integration points and weights depend on the specific method used and the accuracy required from the approximation.

An important part of the analysis of any numerical integration method is to study the behavior of the approximation error as a function of the number of integrand evaluations. A method that yields a small error for a small number of evaluations is usually considered superior. Reducing the number of evaluations of the integrand reduces the number of arithmetic operations involved, and


therefore reduces the total round-off error. Also, each evaluation takes time, and the integrand may be arbitrarily complicated.

A ‘brute force’ kind of numerical integration can be done, if the integrand is reasonably well-be- haved (i.e. piecewise continuous and of bounded variation), by evaluating the integrand with very small increments.

Quadrature Rules Based on Interpolating Functions

A large class of quadrature rules can be derived by constructing interpolating functions that are easy to integrate. Typically these interpolating functions are polynomials. In practice, since polynomials of very high degree tend to oscillate wildly, only polynomials of low degree are used, typically linear and quadratic.

Illustration of the rectangle rule.

The simplest method of this type is to let the interpolating function be a constant function (a polynomial of degree zero) that passes through the point ((a+b)/2, f((a+b)/2)). This is called the midpoint rule or rectangle rule.

b ab+ f() x dx≈− ( b a ) f  . ∫a 2

Illustration of the trapezoidal rule.

The interpolating function may be a straight line (an affine function, i.e. a polynomial of degree 1) passing through the points (a, f(a)) and (b, f(b)). This is called the trapezoidal rule.

b fa()+ fb () ∫ f() x dx≈− ( b a ). a 2

Illustration of Simpson’s rule.


For either one of these rules, we can make a more accurate approximation by breaking up the interval [a, b] into some number n of subintervals, computing an approximation for each subinterval, then adding up all the results. This is called a composite rule, extended rule, or iterated rule. For example, the composite trapezoidal rule can be stated as

− b bafa−() n 1 ba − fb() f() x dx ≈ ++f a k + ∫a ∑ nn22k =1 

where the subintervals have the form [a + kh, a + (k+1)h], with h = (b−a)/n and k = 0, 1, 2, ..., n−1.
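As a concrete illustration (not part of the original text), a minimal Python sketch of the composite trapezoidal rule above might look as follows; the function name and the test integrand are chosen here purely for demonstration.

import math

def composite_trapezoid(f, a, b, n):
    # Composite trapezoidal rule with n equal subintervals of width h = (b - a)/n.
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return h * total

# Example: integrate exp(-x^2) over [0, 1]; the exact value is about 0.746824.
print(composite_trapezoid(lambda x: math.exp(-x * x), 0.0, 1.0, 100))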

Interpolation with polynomials evaluated at equally spaced points in [a, b] yields the Newton–Cotes formulas, of which the rectangle rule and the trapezoidal rule are examples. Simpson’s rule, which is based on a polynomial of order 2, is also a Newton–Cotes formula.

Quadrature rules with equally spaced points have the very convenient property of nesting. The corresponding rule with each interval subdivided includes all the current points, so those integrand values can be re-used.

If we allow the intervals between interpolation points to vary, we find another group of quadrature formulas, such as the Gaussian quadrature formulas. A Gaussian quadrature rule is typically more accurate than a Newton–Cotes rule that requires the same number of function evaluations, if the integrand is smooth (i.e., if it is sufficiently differentiable). Other quadrature methods with varying intervals include Clenshaw–Curtis quadrature (also called Fejér quadrature) methods, which do nest.

Gaussian quadrature rules do not nest, but the related Gauss–Kronrod quadrature formulas do.
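As a hedged sketch of a quadrature rule with non-equally-spaced points, Gauss–Legendre quadrature can be applied using the nodes and weights provided by NumPy's leggauss routine; the helper name and test integrand below are illustrative only.

import numpy as np

def gauss_legendre(f, a, b, n):
    # n-point Gauss-Legendre rule: nodes and weights on [-1, 1], mapped to [a, b].
    nodes, weights = np.polynomial.legendre.leggauss(n)
    x = 0.5 * (b - a) * nodes + 0.5 * (a + b)
    return 0.5 * (b - a) * np.sum(weights * f(x))

# A 5-point rule already integrates the smooth function exp(-x^2) on [0, 1]
# to high accuracy (error well below 1e-6).
print(gauss_legendre(lambda x: np.exp(-x**2), 0.0, 1.0, 5))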

Adaptive Algorithms

If f(x) does not have many derivatives at all points, or if the derivatives become large, then Gaussian quadrature is often insufficient. In this case, an algorithm similar to the following will perform better:

def calculate_definite_integral_of_f(f, initial_step_size):
    '''This algorithm calculates the definite integral of a function
    from 0 to 1, adaptively, by choosing smaller steps near
    problematic points.'''
    # The quadrature routine, the two error tests, and the step-size helpers
    # below are placeholders to be supplied by the implementer.
    x = 0.0
    h = initial_step_size
    accumulator = 0.0
    while x < 1.0:
        if x + h > 1.0:
            h = 1.0 - x
        if error_too_big_in_quadrature_of_over_range(f, [x, x + h]):
            h = make_h_smaller(h)
        else:
            accumulator += quadrature_of_f_over_range(f, [x, x + h])
            x += h
            if error_too_small_in_quadrature_of_over_range(f, [x, x + h]):
                h = make_h_larger(h)  # Avoid wasting time on tiny steps.
    return accumulator

Some details of the algorithm require careful thought. For many cases, estimating the error from quadrature over an interval for a function f(x) is not obvious. One popular solution is to use two different quadrature rules and to use their difference as an estimate of the quadrature error. The other problem is deciding what “too large” or “very small” signify. A local criterion for “too large” is that the quadrature error should not be larger than t · h, where t, a real number, is the tolerance we wish to set for the global error. Then again, if h is already tiny, it may not be worthwhile to make it even smaller even if the quadrature error is apparently large. A global criterion is that the sum of errors on all the intervals should be less than t. This type of error analysis is usually called “a posteriori” since we compute the error after having computed the approximation.
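The two-rule error estimate mentioned above can be made concrete. The following Python sketch is an assumption-laden illustration, not the algorithm from the text: it steps across [a, b], compares the trapezoidal and Simpson values on each step, and uses their difference as the a posteriori error estimate that drives the step-size control.

import math

def adaptive_integrate(f, a, b, tol=1e-6, h0=0.1):
    # Adaptive left-to-right integration; the local error on [x, x + h] is
    # estimated by the difference between Simpson's rule and the trapezoidal rule.
    def trapezoid(x0, x1):
        return 0.5 * (x1 - x0) * (f(x0) + f(x1))

    def simpson(x0, x1):
        m = 0.5 * (x0 + x1)
        return (x1 - x0) / 6.0 * (f(x0) + 4.0 * f(m) + f(x1))

    x, h, total = a, h0, 0.0
    while x < b:
        if x + h > b:
            h = b - x
        s = simpson(x, x + h)
        err = abs(s - trapezoid(x, x + h))   # a posteriori error estimate
        if err > tol * h and h > 1e-12:      # local criterion: error larger than t*h
            h *= 0.5                          # step too inaccurate: shrink it
        else:
            total += s                        # accept the Simpson value
            x += h
            if err < 0.01 * tol * h:
                h *= 2.0                      # step very accurate: grow it
    return total

# Example: an integrand with an unbounded derivative at 0 (exact value 2/3).
print(adaptive_integrate(lambda u: math.sqrt(u), 0.0, 1.0))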

Extrapolation Methods

The accuracy of a quadrature rule of the Newton–Cotes type is generally a function of the number of evaluation points. The result is usually more accurate as the number of evaluation points increases, or, equivalently, as the width of the step size between the points decreases. It is natural to ask what the result would be if the step size were allowed to approach zero. This can be answered by extrapolating the result from two or more nonzero step sizes, using series acceleration methods such as Richardson extrapolation. The extrapolation function may be a polynomial or rational function. Extrapolation methods are described in more detail by Stoer and Bulirsch (Section 3.4) and are implemented in many of the routines in the QUADPACK library.
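As a brief sketch of the idea (assuming the trapezoidal rule as the underlying method, which is not prescribed by the text), one Richardson extrapolation step combines results at step sizes h and h/2 to cancel the leading O(h²) error term; this is the first column of a Romberg table.

import math

def richardson_trapezoid(f, a, b, n):
    # T(h) has error ~ C*h^2, so (4*T(h/2) - T(h)) / 3 removes the leading term.
    def trap(m):
        h = (b - a) / m
        return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, m)))
    return (4.0 * trap(2 * n) - trap(n)) / 3.0

print(richardson_trapezoid(math.sin, 0.0, math.pi, 8))  # exact integral is 2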

Conservative (A Priori) Error Estimation

Let f have a bounded first derivative over [a,b]. The mean value theorem for f, where x < b, gives

$$(x-a)\,f'(v_x) = f(x) - f(a)$$

for some v_x in [a, x] depending on x. If we integrate in x from a to b on both sides and take absolute values, we obtain


$$\left|\int_a^b f(x)\,dx - (b-a)\,f(a)\right| = \left|\int_a^b (x-a)\,f'(v_x)\,dx\right|.$$

We can further approximate the integral on the right-hand side by bringing the absolute value into the integrand, and replacing the term in f’ by an upper bound:

$$\left|\int_a^b f(x)\,dx - (b-a)\,f(a)\right| \le \frac{(b-a)^2}{2}\,\sup_{a\le x\le b}\left|f'(x)\right| \qquad (**)$$

Hence, if we approximate the integral $\int_a^b f(x)\,dx$ by the quadrature rule (b − a)f(a), our error is no greater than the right-hand side of (**). We can convert this into an error analysis for the Riemann sum (*), giving an upper bound of

$$\frac{n^{-1}}{2}\,\sup_{0\le x\le 1}\left|f'(x)\right|$$

for the error term of that particular approximation. (Note that this is precisely the error we calculated for the example f(x) = x.) Using more derivatives, and by tweaking the quadrature, we can do a similar error analysis using a Taylor series (using a partial sum with remainder term) for f. This error analysis gives a strict upper bound on the error, if the derivatives of f are available.

This integration method can be combined with interval arithmetic to produce computer proofs and verified calculations.

Integrals Over Infinite Intervals

Several methods exist for approximate integration over unbounded intervals. The standard technique involves specially derived quadrature rules, such as Gauss–Hermite quadrature for integrals on the whole real line and Gauss–Laguerre quadrature for integrals on the positive reals. Monte Carlo methods can also be used, or a change of variables to a finite interval; e.g., for the whole line one could use

$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_{-1}^{+1} f\!\left(\frac{t}{1-t^2}\right)\frac{1+t^2}{(1-t^2)^2}\,dt,$$

and for semi-infinite intervals one could use

$$\int_a^{+\infty} f(x)\,dx = \int_0^1 f\!\left(a + \frac{t}{1-t}\right)\frac{dt}{(1-t)^2},$$

a 1 1− t dt f() x dx= f a − ∫∫−∞ 0 tt2

as possible transformations.
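For instance (an illustrative sketch, not taken from the text), the whole-line substitution x = t/(1 − t²) above can be combined with Gauss–Legendre nodes, which lie strictly inside (−1, 1) and therefore avoid the endpoints:

import numpy as np

def integrate_whole_line(f, n=200):
    # Map (-1, 1) onto the whole real line with x = t / (1 - t^2) and apply
    # the corresponding Jacobian (1 + t^2) / (1 - t^2)^2 from the formula above.
    t, w = np.polynomial.legendre.leggauss(n)
    x = t / (1.0 - t**2)
    jacobian = (1.0 + t**2) / (1.0 - t**2)**2
    return np.sum(w * f(x) * jacobian)

# The Gaussian integral of exp(-x^2) over the real line is sqrt(pi) ~ 1.7724539.
print(integrate_whole_line(lambda x: np.exp(-x**2)))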


Multidimensional Integrals

The quadrature rules discussed so far are all designed to compute one-dimensional integrals. To compute integrals in multiple dimensions, one approach is to phrase the multiple integral as repeated one-dimensional integrals by appealing to Fubini’s theorem. This approach requires a number of function evaluations that grows exponentially as the number of dimensions increases. Three methods are known to overcome this so-called curse of dimensionality.

Monte Carlo

Monte Carlo methods and quasi-Monte Carlo methods are easy to apply to multi-dimensional integrals, and may yield greater accuracy for the same number of function evaluations than repeated integrations using one-dimensional methods.

A large class of useful Monte Carlo methods are the so-called Markov chain Monte Carlo algorithms, which include the Metropolis-Hastings algorithm and Gibbs sampling.

Sparse Grids

Sparse grids were originally developed by Smolyak for the quadrature of high-dimensional functions. The method is always based on a one-dimensional quadrature rule, but performs a more sophisticated combination of univariate results.

Bayesian Quadrature

Bayesian quadrature is a statistical approach to the numerical problem of computing integrals and falls under the field of probabilistic numerics. It can provide a full handling of the uncertainty over the solution of the integral, expressed as a Gaussian process posterior variance. It is also known to provide very fast convergence rates, which can be up to exponential in the number of quadrature points n.

Connection With Differential Equations

The problem of evaluating the integral

$$F(x) = \int_a^x f(u)\,du$$

can be reduced to an initial value problem for an ordinary differential equation by applying the first part of the fundamental theorem of calculus. By differentiating both sides of the above with respect to the argument x, it is seen that the function F satisfies
$$\frac{dF(x)}{dx} = f(x), \qquad F(a) = 0.$$

Methods developed for ordinary differential equations, such as Runge–Kutta methods, can be applied to the restated problem and thus be used to evaluate the integral. For instance, the standard fourth-order Runge–Kutta method applied to the differential equation yields Simpson’s rule from above.
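A small sketch (illustrative only) makes the connection explicit: applying the classical fourth-order Runge–Kutta method to F′(x) = f(x), F(a) = 0 reduces each step to Simpson's rule, because the right-hand side does not depend on F.

import math

def integrate_via_rk4(f, a, b, n):
    # Solve F'(x) = f(x), F(a) = 0 with n classical RK4 steps and return F(b).
    h = (b - a) / n
    F, x = 0.0, a
    for _ in range(n):
        k1 = f(x)
        k2 = f(x + 0.5 * h)
        k3 = f(x + 0.5 * h)   # identical to k2, since f does not depend on F
        k4 = f(x + h)
        F += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        x += h
    return F

print(integrate_via_rk4(math.cos, 0.0, math.pi / 2, 50))  # exact value is 1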


The differential equation F′(x) = f(x) has a special form: the right-hand side contains only the independent variable (here x) and not the dependent variable (here F). This simplifies the theory and algorithms considerably. The problem of evaluating integrals is thus best studied in its own right.

Monte Carlo Method

Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. Their essential idea is using randomness to solve problems that might be deterministic in principle. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches. Monte Carlo methods are mainly used in three distinct problem classes: optimization, numerical integration, and generating draws from a probability distribution.

In physics-related problems, Monte Carlo methods are quite useful for simulating systems with many coupled degrees of freedom, such as fluids, disordered materials, strongly coupled solids, and cellular structures. Other examples include modeling phenomena with significant uncertainty in inputs, such as the calculation of risk in business and, in mathematics, evaluation of multidimensional definite integrals with complicated boundary conditions. In application to space and oil exploration problems, Monte Carlo–based predictions of failure, cost overruns and schedule overruns are routinely better than human intuition or alternative “soft” methods.

In principle, Monte Carlo methods can be used to solve any problem having a probabilistic interpretation. By the law of large numbers, integrals described by the expected value of some random variable can be approximated by taking the empirical mean (a.k.a. the sample mean) of independent samples of the variable. When the probability distribution of the variable is parametrized, mathematicians often use a Markov Chain Monte Carlo (MCMC) sampler. The central idea is to design a judicious Markov chain model with a prescribed stationary probability distribution. By the ergodic theorem, the stationary distribution is approximated by the empirical measures of the random states of the MCMC sampler.

In other problems, the objective is generating draws from a sequence of probability distributions satisfying a nonlinear evolution equation. These flows of probability distributions can always be interpreted as the distributions of the random states of a Markov process whose transition probabilities depend on the distributions of the current random states. In other instances we are given a flow of probability distributions with an increasing level of sampling complexity (path space models with an increasing time horizon, Boltzmann–Gibbs measures associated with decreasing temperature parameters, and many others). These models can also be seen as the evolution of the law of the random states of a nonlinear Markov chain. A natural way to simulate these sophisticated nonlinear Markov processes is to sample a large number of copies of the process, replacing in the evolution equation the unknown distributions of the random states by the sampled empirical measures. In contrast with traditional Monte Carlo and Markov chain Monte Carlo methodologies, these mean field particle techniques rely on sequential interacting samples. The terminology mean field reflects the fact that each of the samples (a.k.a. particles, individuals, walkers, agents, creatures, or phenotypes) interacts with the empirical measures of the process. When the size of the system tends to infinity, these random empirical measures converge to the deterministic distribution of the random states of the nonlinear Markov chain, so that the statistical interaction between particles vanishes.

Introduction

Monte Carlo method applied to approximating the value of π. After placing 30000 random points, the estimate for π is within 0.07% of the actual value.

Monte Carlo methods vary, but tend to follow a particular pattern:

1. Define a domain of possible inputs.

2. Generate inputs randomly from a probability distribution over the domain.

3. Perform a deterministic computation on the inputs.

4. Aggregate the results.

For example, consider a circle inscribed in a unit square. Given that the circle and the square have a ratio of areas that is π/4, the value of π can be approximated using a Monte Carlo method:

1. Draw a square, then inscribe a circle within it.

2. Uniformly scatter objects of uniform size over the square.

3. Count the number of objects inside the circle and the total number of objects.

4. The ratio of the two counts is an estimate of the ratio of the two areas, which is π/4. Multiply the result by 4 to estimate π.

In this procedure the domain of inputs is the square that circumscribes our circle. We generate random inputs by scattering grains over the square then perform a computation on each input (test whether it falls within the circle). Finally, we aggregate the results to obtain our final result, the approximation of π.
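The four steps above translate directly into a short program. The following Python sketch (names and sample size chosen for illustration only) scatters points uniformly over the square [−1, 1] × [−1, 1] and scales the hit ratio by 4:

import random

def estimate_pi(num_points=30000, seed=0):
    # Count points of the unit square that fall inside the inscribed circle.
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_points):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / num_points

print(estimate_pi())  # typically within a fraction of a percent of pi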

There are two important points to consider here: Firstly, if the grains are not uniformly distributed, then our approximation will be poor. Secondly, there should be a large number of inputs. The approximation is generally poor if only a few grains are randomly dropped into the whole square. On average, the approximation improves as more grains are dropped.

Uses of Monte Carlo methods require large amounts of random numbers, and it was their use that spurred the development of pseudorandom number generators, which were far quicker to use than the tables of random numbers that had been previously used for statistical sampling.

History

Before the Monte Carlo method was developed, simulations tested a previously understood deterministic problem, and statistical sampling was used to estimate uncertainties in the simulations. Monte Carlo simulations invert this approach, solving deterministic problems using a probabilistic analog.

An early variant of the Monte Carlo method can be seen in the Buffon’s needle experiment, in which π can be estimated by dropping needles on a floor made of parallel and equidistant strips. In the 1930s, Enrico Fermi first experimented with the Monte Carlo method while studying neutron diffusion, but did not publish anything on it.

The modern version of the Markov Chain Monte Carlo method was invented in the late 1940s by Stanislaw Ulam, while he was working on nuclear weapons projects at the Los Alamos National Laboratory. Immediately after Ulam’s breakthrough, John von Neumann understood its importance and programmed the ENIAC computer to carry out Monte Carlo calculations. In 1946, physicists at Los Alamos Scientific Laboratory were investigating radiation shielding and the distance that neutrons would likely travel through various materials. Despite having most of the necessary data, such as the average distance a neutron would travel in a substance before it collided with an atomic nucleus, and how much energy the neutron was likely to give off following a collision, the Los Alamos physicists were unable to solve the problem using conventional, deterministic mathematical methods. Stanislaw Ulam had the idea of using random experiments. He recounts his inspiration as follows:

The first thoughts and attempts I made to practice [the Monte Carlo Method] were suggested by a question which occurred to me in 1946 as I was convalescing from an illness and playing solitaires. The question was what are the chances that a Canfield solitaire laid out with 52 cards will come out successfully? After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method than “abstract thinking” might not be to lay it out say one hundred times and simply observe and count the number of successful plays. This was already possible to envisage with the beginning of the new era of fast computers, and I immediately thought of problems of neutron diffusion and other questions of mathematical physics, and more generally how to change processes described by certain differential equations into an equivalent form interpretable as a succession of random operations. Later [in 1946], I described the idea to John von Neumann, and we began to plan actual calculations.

–Stanislaw Ulam

Being secret, the work of von Neumann and Ulam required a code name. A colleague of von Neumann and Ulam, Nicholas Metropolis, suggested using the name Monte Carlo, which refers to the Monte Carlo Casino in Monaco, where Ulam’s uncle would borrow money from relatives to gamble. Using lists of “truly random” random numbers was extremely slow, but von Neumann developed a way to calculate pseudorandom numbers using the middle-square method. Though this method has been criticized as crude, von Neumann was aware of this: he justified it as being faster than any other method at his disposal, and also noted that when it went awry it did so obviously, unlike methods that could be subtly incorrect.

Monte Carlo methods were central to the simulations required for the Manhattan Project, though severely limited by the computational tools at the time. In the 1950s they were used at Los Alamos for early work relating to the development of the hydrogen bomb, and became popularized in the fields of physics, physical chemistry, and operations research. The Rand Corporation and the U.S. Air Force were two of the major organizations responsible for funding and disseminating information on Monte Carlo methods during this time, and they began to find a wide application in many different fields.

The theory of more sophisticated mean field type particle Monte Carlo methods had certainly started by the mid-1960s, with the work of Henry P. McKean Jr. on Markov interpretations of a class of nonlinear parabolic partial differential equations arising in fluid mechanics. We also quote an earlier pioneering article by Theodore E. Harris and Herman Kahn, published in 1951, using mean field genetic-type Monte Carlo methods for estimating particle transmission energies. Mean field genetic-type Monte Carlo methodologies are also used as heuristic natural search algorithms in evolutionary computing. The origins of these mean field computational techniques can be traced to 1950 and 1954 with the work of Alan Turing on genetic type mutation-selection learning machines and the articles by Nils Aall Barricelli at the Institute for Advanced Study in Princeton, New Jersey.

Quantum Monte Carlo, and more specifically diffusion Monte Carlo methods, can also be interpreted as a mean field particle Monte Carlo approximation of Feynman–Kac path integrals. The origins of Quantum Monte Carlo methods are often attributed to Enrico Fermi and Robert Richtmyer, who developed in 1948 a mean field particle interpretation of neutron-chain reactions, but the first heuristic-like and genetic type particle algorithm (a.k.a. Resampled or Reconfiguration Monte Carlo methods) for estimating ground state energies of quantum systems (in reduced matrix models) is due to Jack H. Hetherington in 1984. In molecular chemistry, the use of genetic heuristic-like particle methodologies (a.k.a. pruning and enrichment strategies) can be traced back to 1955 with the seminal work of Marshall N. Rosenbluth and Arianna W. Rosenbluth.

The use of Sequential Monte Carlo in advanced signal processing and Bayesian inference is more recent. It was in 1993 that Gordon et al. published in their seminal work the first application of a Monte Carlo resampling algorithm in Bayesian statistical inference. The authors named their algorithm ‘the bootstrap filter’, and demonstrated that, compared to other filtering methods, their bootstrap algorithm does not require any assumption about the state space or the noise of the system. We also quote another pioneering article in this field, by Genshiro Kitagawa, on a related “Monte Carlo filter”, and the ones by Pierre Del Moral and by Himilcon Carvalho, Pierre Del Moral, André Monin and Gérard Salut on particle filters published in the mid-1990s. Particle filters were also developed in signal processing in the early 1990s (1989–1992) by P. Del Moral, J. C. Noyer, G. Rigal, and G. Salut at the LAAS-CNRS (the Laboratory for Analysis and Architecture of Systems) in a series of restricted and classified research reports with STCAN (Service Technique des Constructions et Armes Navales) and the IT company DIGILOG, on RADAR/SONAR and GPS signal processing problems. These Sequential Monte Carlo methodologies can be interpreted as an acceptance-rejection sampler equipped with an interacting recycling mechanism.

From 1950 to 1996, all the publications on Sequential Monte Carlo methodologies, including the pruning and resample Monte Carlo methods introduced in computational physics and molecular chemistry, present natural and heuristic-like algorithms applied to different situations without a single proof of their consistency, nor a discussion on the bias of the estimates or on genealogical and ancestral tree based algorithms. The mathematical foundations and the first rigorous analysis of these particle algorithms are due to Pierre Del Moral in 1996. Branching type particle methodologies with varying population sizes were also developed at the end of the 1990s by Dan Crisan, Jessica Gaines and Terry Lyons, and by Dan Crisan, Pierre Del Moral and Terry Lyons. Further developments in this field were made in 2000 by P. Del Moral, A. Guionnet and L. Miclo.

Definitions

There is no consensus on how Monte Carlo should be defined. For example, Ripley defines most probabilistic modeling as stochastic simulation, with Monte Carlo being reserved for Monte Carlo integration and Monte Carlo statistical tests. Sawilowsky distinguishes between a simulation, a Monte Carlo method, and a Monte Carlo simulation: a simulation is a fictitious representation of reality, a Monte Carlo method is a technique that can be used to solve a mathematical or statistical problem, and a Monte Carlo simulation uses repeated sampling to determine the properties of some phenomenon (or behavior). Examples:

• Simulation: Drawing one pseudo-random uniform variable from the interval [0,1] can be used to simulate the tossing of a coin: If the value is less than or equal to 0.50 designate the outcome as heads, but if the value is greater than 0.50 designate the outcome as tails. This is a simulation, but not a Monte Carlo simulation.

• Monte Carlo method: Pouring out a box of coins on a table, and then computing the ratio of coins that land heads versus tails is a Monte Carlo method of determining the behavior of repeated coin tosses, but it is not a simulation.

• Monte Carlo simulation: Drawing a large number of pseudo-random uniform variables from the interval [0,1], and assigning values less than or equal to 0.50 as heads and greater than 0.50 as tails, is a Monte Carlo simulation of the behavior of repeatedly tossing a coin.

Kalos and Whitlock point out that such distinctions are not always easy to maintain. For example, the emission of radiation from atoms is a natural stochastic process. It can be simulated directly, or its average behavior can be described by stochastic equations that can themselves be solved using Monte Carlo methods. “Indeed, the same computer code can be viewed simultaneously as a ‘natural simulation’ or as a solution of the equations by natural sampling.”

Monte Carlo and Random Numbers

Monte Carlo simulation methods do not always require truly random numbers to be useful, although for some applications, such as primality testing, unpredictability is vital. Many of the most useful techniques use deterministic, pseudorandom sequences, making it easy to test and re-run simulations. The only quality usually necessary to make good simulations is for the pseudo-random sequence to appear “random enough” in a certain sense.

What this means depends on the application, but typically the numbers should pass a series of statistical tests. Testing that the numbers are uniformly distributed or follow another desired distribution when a large enough number of elements of the sequence are considered is one of the simplest and most common tests. Weak correlation between successive samples is also often desirable or necessary.

Sawilowsky lists the characteristics of a high quality Monte Carlo simulation:

• the (pseudo-random) number generator has certain characteristics (e.g., a long “period” before the sequence repeats)

• the (pseudo-random) number generator produces values that pass tests for randomness

• there are enough samples to ensure accurate results

• the proper sampling technique is used

• the algorithm used is valid for what is being modeled

• it simulates the phenomenon in question.

Pseudo-random number sampling algorithms are used to transform uniformly distributed pseudo-random numbers into numbers that are distributed according to a given probability distribution.
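One common such transformation is inverse-transform sampling. The short sketch below is an illustration under the assumption of an exponential target distribution (not a statement about any particular library); it maps a uniform variate through the inverse cumulative distribution function:

import math
import random

def sample_exponential(rate, rng=random):
    # Inverse CDF of the exponential distribution: x = -ln(1 - u) / rate.
    u = rng.random()
    return -math.log(1.0 - u) / rate

samples = [sample_exponential(2.0) for _ in range(100000)]
print(sum(samples) / len(samples))  # should be close to the mean 1/rate = 0.5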

Low-discrepancy sequences are often used instead of random sampling from a space as they ensure even coverage and normally have a faster order of convergence than Monte Carlo simulations using random or pseudorandom sequences. Methods based on their use are called quasi-Monte Carlo methods.

Monte Carlo Simulation Versus “What If” Scenarios

There are ways of using probabilities that are definitely not Monte Carlo simulations, for example, deterministic modeling using single-point estimates. Each uncertain variable within a model is assigned a “best guess” estimate. Scenarios (such as best, worst, or most likely case) for each input variable are chosen and the results recorded.

By contrast, Monte Carlo simulations sample from a probability distribution for each variable to produce hundreds or thousands of possible outcomes. The results are analyzed to get probabilities of different outcomes occurring. For example, a comparison of a spreadsheet cost construction model run using traditional “what if” scenarios, and then running the comparison again with Monte Carlo simulation and triangular probability distributions, shows that the Monte Carlo analysis has a narrower range than the “what if” analysis. This is because the “what if” analysis gives equal weight to all scenarios, while the Monte Carlo method hardly samples in the very low probability regions. The samples in such regions are called “rare events”.


Applications

Monte Carlo methods are especially useful for simulating phenomena with significant uncertainty in inputs and systems with a large number of coupled degrees of freedom. Areas of application include:

Physical Sciences

Monte Carlo methods are very important in computational physics, physical chemistry, and related applied fields, and have diverse applications from complicated quantum chromodynamics calculations to designing heat shields and aerodynamic forms as well as in modeling radiation transport for radiation dosimetry calculations. In statistical physics Monte Carlo molecular modeling is an alternative to computational molecular dynamics, and Monte Carlo methods are used to compute statistical field theories of simple particle and polymer systems. Quantum Monte Carlo methods solve the many-body problem for quantum systems. In radiation materials science, the binary collision approximation for simulating ion implantation is usually based on a Monte Carlo approach to select the next colliding atom. In experimental particle physics, Monte Carlo methods are used for designing detectors, understanding their behavior and comparing experimental data to theory. In astrophysics, they are used in such diverse manners as to model both galaxy evolution and microwave radiation transmission through a rough planetary surface. Monte Carlo methods are also used in the ensemble models that form the basis of modern weather forecasting.

Engineering

Monte Carlo methods are widely used in engineering for sensitivity analysis and quantitative probabilistic analysis in process design. The need arises from the interactive, co-linear and non-linear behavior of typical process simulations. For example,

• In microelectronics engineering, Monte Carlo methods are applied to analyze correlated and uncorrelated variations in analog and digital integrated circuits.

• In geostatistics and geometallurgy, Monte Carlo methods underpin the design of mineral processing flowsheets and contribute to quantitative risk analysis.

• In wind energy yield analysis, the predicted energy output of a wind farm during its lifetime is calculated giving different levels of uncertainty (P90, P50, etc.)

• Impacts of pollution are simulated, and diesel is compared with petrol.

• In fluid dynamics, in particular rarefied gas dynamics, where the Boltzmann equation is solved for finite Knudsen number fluid flows using the direct simulation Monte Carlo method in combination with highly efficient computational algorithms.

• In autonomous robotics, Monte Carlo localization can determine the position of a robot. It is often applied to stochastic filters such as the Kalman filter or particle filter that forms the heart of the SLAM (simultaneous localization and mapping) algorithm.

• In telecommunications, when planning a wireless network, the design must be proved to work for a wide variety of scenarios that depend mainly on the number of users, their locations and the services they want to use. Monte Carlo methods are typically used to generate these users and their states. The network performance is then evaluated and, if results are not satisfactory, the network design goes through an optimization process.

• In reliability engineering, one can use Monte Carlo simulation to generate mean time between failures and mean time to repair for components.

• In signal processing and Bayesian inference, particle filters and sequential Monte Carlo techniques are a class of mean field particle methods for sampling and computing the posterior distribution of a signal process given some noisy and partial observations using interacting empirical measures.

Climate Change and Radiative Forcing

The Intergovernmental Panel on Climate Change relies on Monte Carlo methods in probability density function analysis of radiative forcing.

Probability density function (PDF) of ERF due to total GHG, aerosol forcing and total anthropogenic forcing. The GHG consists of WMGHG, ozone and stratospheric water vapour. The PDFs are generated based on uncertainties provided in Table 8.6. The combination of the individual RF agents to derive total forcing over the Industrial Era is done by Monte Carlo simulations and based on the method in Boucher and Haywood (2001). The PDF of the ERF from surface albedo changes and combined contrails and contrail-induced cirrus is included in the total anthropogenic forcing, but not shown as a separate PDF. We currently do not have ERF estimates for some forcing mechanisms: ozone, land use, solar, etc.

Computational Biology

Monte Carlo methods are used in various fields of computational biology, for example for Bayesian inference in phylogeny, or for studying biological systems such as genomes, proteins, or membranes. The systems can be studied in the coarse-grained or ab initio frameworks depending on the desired accuracy. Computer simulations allow us to monitor the local environment of a particular molecule to see if some chemical reaction is happening, for instance. In cases where it is not feasible to conduct a physical experiment, thought experiments can be conducted (for instance: breaking bonds, introducing impurities at specific sites, changing the local/global structure, or introducing external fields).

Computer Graphics

Path tracing, occasionally referred to as Monte Carlo ray tracing, renders a 3D scene by randomly tracing samples of possible light paths. Repeated sampling of any given pixel will eventually cause the average of the samples to converge on the correct solution of the rendering equation, making it one of the most physically accurate 3D graphics rendering methods in existence.

Applied Statistics

The standards for Monte Carlo experiments in statistics were set by Sawilowsky. In applied statistics, Monte Carlo methods are generally used for three purposes:


1. To compare competing statistics for small samples under realistic data conditions. Although type I error and power properties of statistics can be calculated for data drawn from classical theoretical distributions (e.g., normal curve, Cauchy distribution) for asymptotic conditions (i.e., infinite sample size and infinitesimally small treatment effect), real data often do not have such distributions.

2. To provide implementations of hypothesis tests that are more efficient than exact tests such as permutation tests (which are often impossible to compute) while being more accurate than critical values for asymptotic distributions.

3. To provide a random sample from the posterior distribution in Bayesian inference. This sample then approximates and summarizes all the essential features of the posterior.

Monte Carlo methods are also a compromise between approximate randomization and permutation tests. An approximate randomization test is based on a specified subset of all permutations (which entails potentially enormous housekeeping of which permutations have been considered). The Monte Carlo approach is based on a specified number of randomly drawn permutations (exchanging a minor loss in precision, if a permutation is drawn twice or more frequently, for the efficiency of not having to track which permutations have already been selected).

Artificial Intelligence for Games

Monte Carlo methods have been developed into a technique called Monte-Carlo tree search that is useful for searching for the best move in a game. Possible moves are organized in a search tree and a large number of random simulations are used to estimate the long-term potential of each move. A black box simulator represents the opponent’s moves.

The Monte Carlo tree search (MCTS) method has four steps:

1. Starting at the root node of the tree, select optimal child nodes until a leaf node is reached.

2. Expand the leaf node and choose one of its children.

3. Play a simulated game starting with that node.

4. Use the results of that simulated game to update the node and its ancestors.

The net effect, over the course of many simulated games, is that the value of a node representing a move will go up or down, hopefully corresponding to whether or not that node represents a good move.

Monte Carlo Tree Search has been used successfully to play games such as Go, Tantrix, Battleship, Havannah, and Arimaa.

Design and Visuals

Monte Carlo methods are also efficient in solving coupled integral differential equations of radiation fields and energy transport, and thus these methods have been used in global illumination computations that produce photo-realistic images of virtual 3D models, with applications in video games, architecture, design, computer generated films, and cinematic special effects.


Search and Rescue

The US Coast Guard utilizes Monte Carlo methods within its computer modeling software SAROPS in order to calculate the probable locations of vessels during search and rescue operations. Each simulation can generate as many as ten thousand data points which are randomly distributed based upon provided variables. Search patterns are then generated based upon extrapolations of these data in order to optimize the probability of containment (POC) and the probability of detection (POD), which together will equal an overall probability of success (POS). Ultimately this serves as a practical application of probability distribution in order to provide the swiftest and most expedient method of rescue, saving both lives and resources.

Finance and Business

Monte Carlo methods in finance are often used to evaluate investments in projects at a business unit or corporate level, or to evaluate financial derivatives. They can be used to model project schedules, where simulations aggregate estimates for worst-case, best-case, and most likely durations for each task to determine outcomes for the overall project. Monte Carlo methods are also used in option pricing and default risk analysis.

Use in Mathematics

In general, Monte Carlo methods are used in mathematics to solve various problems by generating suitable random numbers and observing that fraction of the numbers that obeys some property or properties. The method is useful for obtaining numerical solutions to problems too complicated to solve analytically. The most common application of the Monte Carlo method is Monte Carlo integration.

Integration

Monte-Carlo integration works by comparing random points with the value of the function

Deterministic numerical integration algorithms work well in a small number of dimensions, but encounter two problems when the functions have many variables. First, the number of function evaluations needed increases rapidly with the number of dimensions. For example, if 10 evaluations provide adequate accuracy in one dimension, then 10^100 points are needed for 100 dimensions, far too many to be computed. This is called the curse of dimensionality. Second, the boundary of a multidimensional region may be very complicated, so it may not be feasible to reduce the problem to an iterated integral. 100 dimensions is by no means unusual, since in many physical problems, a “dimension” is equivalent to a degree of freedom.

Errors reduce by a factor of 1/√N.

Monte Carlo methods provide a way out of this exponential increase in computation time. As long as the function in question is reasonably well-behaved, it can be estimated by randomly selecting points in 100-dimensional space, and taking some kind of average of the function values at these points. By the central limit theorem, this method displays 1/√N convergence; i.e., quadrupling the number of sampled points halves the error, regardless of the number of dimensions.

A refinement of this method, known as importance sampling in statistics, involves sampling the points randomly, but more frequently where the integrand is large. To do this precisely one would have to already know the integral, but one can approximate the integral by an integral of a similar function or use adaptive routines such as stratified sampling, recursive stratified sampling, adaptive umbrella sampling or the VEGAS algorithm.

A similar approach, the quasi-Monte Carlo method, uses low-discrepancy sequences. These sequences “fill” the area better and sample the most important points more frequently, so quasi-Monte Carlo methods can often converge on the integral more quickly.

Another class of methods for sampling points in a volume is to simulate random walks over it (Markov chain Monte Carlo). Such methods include the Metropolis-Hastings algorithm, Gibbs sampling, Wang and Landau algorithm, and interacting type MCMC methodologies such as the sequential Monte Carlo samplers.

Simulation and Optimization

Another powerful and very popular application for random numbers in numerical simulation is in numerical optimization. The problem is to minimize (or maximize) functions of some vector that often has a large number of dimensions. Many problems can be phrased in this way: for example, a computer chess program could be seen as trying to find the set of, say, 10 moves that produces the best evaluation function at the end. In the traveling salesman problem the goal is to minimize distance traveled. There are also applications to engineering design, such as multidisciplinary design optimization. It has been applied with quasi-one-dimensional models to solve particle dynamics problems by efficiently exploring large configuration space.

The traveling salesman problem is what is called a conventional optimization problem. That is, all the facts (distances between each destination point) needed to determine the optimal path to follow are known with certainty, and the goal is to run through the possible travel choices to come up with the one with the lowest total distance. However, let us assume that instead of wanting to minimize the total distance traveled to visit each desired destination, we wanted to minimize the total time needed to reach each destination. This goes beyond conventional optimization since travel time is inherently uncertain (traffic jams, time of day, etc.). As a result, to determine our optimal path we would want to use simulation-optimization: first understand the range of potential times it could take to go from one point to another (represented by a probability distribution in this case rather than a specific distance), and then optimize our travel decisions to identify the best path to follow, taking that uncertainty into account.

Inverse Problems

Probabilistic formulation of inverse problems leads to the definition of a probability distribution in the model space. This probability distribution combines prior information with new information obtained by measuring some observable parameters (data). As, in the general case, the theory linking data with model parameters is nonlinear, the posterior probability in the model space may not be easy to describe (it may be multimodal, some moments may not be defined, etc.).

When analyzing an inverse problem, obtaining a maximum likelihood model is usually not sufficient, as we normally also wish to have information on the resolution power of the data. In the general case we may have a large number of model parameters, and an inspection of the marginal probability densities of interest may be impractical, or even useless. But it is possible to pseudorandomly generate a large collection of models according to the posterior probability distribution and to analyze and display the models in such a way that information on the relative likelihoods of model properties is conveyed to the spectator. This can be accomplished by means of an efficient Monte Carlo method, even in cases where no explicit formula for the a priori distribution is available.

The best-known importance sampling method, the Metropolis algorithm, can be generalized, and this gives a method that allows analysis of (possibly highly nonlinear) inverse problems with complex a priori information and data with an arbitrary noise distribution.

In Popular Culture

• The Monte Carlo Method, the 1998 album by the southern California indie rock band Nothing Painted Blue (Scat, 1998).

Monte Carlo Integration

In mathematics, Monte Carlo integration is a technique for numerical integration using random numbers. It is a particular Monte Carlo method that numerically computes a definite integral. While other algorithms usually evaluate the integrand at a regular grid, Monte Carlo randomly chooses points at which the integrand is evaluated. This method is particularly useful for higher-dimensional integrals.


An illustration of Monte Carlo integration. In this example, the domain D is the inner circle and the domain E is the square. Because the square’s area (4) can be easily calculated, the area of the circle (π·1²) can be estimated by the ratio (0.8) of the points inside the circle (40) to the total number of points (50), yielding an approximation for the circle’s area of 4 × 0.8 = 3.2 ≈ π·1².

There are different methods to perform a Monte Carlo integration, such as uniform sampling, stratified sampling, importance sampling, sequential Monte Carlo (a.k.a. particle filter), and mean field particle methods.

Overview

In numerical integration, methods such as the trapezoidal rule use a deterministic approach. Monte Carlo integration, on the other hand, employs a non-deterministic approach: each realization provides a different outcome. In Monte Carlo, the final outcome is an approximation of the correct value with respective error bars, and the correct value is likely to be within those error bars.

The problem Monte Carlo integration addresses is the computation of a multidimensional definite integral

$$I = \int_{\Omega} f(\mathbf{x})\,d\mathbf{x},$$

where Ω, a subset of R^m, has volume

$$V = \int_{\Omega} d\mathbf{x}.$$

The naive Monte Carlo approach is to sample points uniformly on Ω: given N uniform samples,

xx1,, N ∈Ω ,

I can be approximated by

$$I \approx Q_N \equiv V\,\frac{1}{N}\sum_{i=1}^{N} f(\mathbf{x}_i) = V\,\langle f\rangle.$$

This is because the law of large numbers ensures that

$$\lim_{N\to\infty} Q_N = I.$$


Given the estimation of I from Q_N, the error bars of Q_N can be estimated by the sample variance, using the unbiased estimate of the variance:

$$\mathrm{Var}(f) \equiv \sigma_N^2 = \frac{1}{N-1}\sum_{i=1}^{N}\bigl(f(\mathbf{x}_i) - \langle f\rangle\bigr)^2,$$

which leads to

$$\mathrm{Var}(Q_N) = \frac{V^2}{N^2}\sum_{i=1}^{N}\mathrm{Var}(f) = V^2\,\frac{\mathrm{Var}(f)}{N} = V^2\,\frac{\sigma_N^2}{N}.$$

As long as the sequence

$$\{\sigma_1^2,\ \sigma_2^2,\ \sigma_3^2,\ \ldots\}$$

is bounded, this variance decreases asymptotically to zero as 1/N. The estimation of the error of

Q_N is thus

$$\delta Q_N \approx \sqrt{\mathrm{Var}(Q_N)} = V\,\frac{\sigma_N}{\sqrt{N}},$$

which decreases as $1/\sqrt{N}$. This is the standard error of the mean multiplied by V. This result does not depend on the number of dimensions of the integral, which is the promised advantage of Monte Carlo integration against most deterministic methods that depend exponentially on the dimension. It is important to notice that, unlike in deterministic methods, the estimate of the error is not a strict error bound; random sampling may not uncover all the important features of the integrand, which can result in an underestimate of the error.
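A minimal sketch of the naive estimator and its error bar (function names and the test integrand are illustrative assumptions, not part of the text) might read:

import numpy as np

def mc_integrate(f, a, b, n=100000, seed=0):
    # Naive Monte Carlo: Q_N = V * <f>, with error bar V * sigma_N / sqrt(N).
    rng = np.random.default_rng(seed)
    x = rng.uniform(a, b, n)
    values = f(x)
    volume = b - a
    estimate = volume * values.mean()
    error = volume * values.std(ddof=1) / np.sqrt(n)
    return estimate, error

est, err = mc_integrate(lambda x: np.exp(-x**2), 0.0, 1.0)
print(est, "+/-", err)  # exact value is about 0.746824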

While the naive Monte Carlo works for simple examples, this is not the case in most problems. A large part of the Monte Carlo literature is dedicated to developing strategies to improve the error estimates. In particular, stratified sampling (dividing the region into sub-domains) and importance sampling (sampling from non-uniform distributions) are two examples of such techniques.

Example

Relative error as a function of the number of samples, showing the 1/√N scaling.


A paradigmatic example of a Monte Carlo integration is the estimation of π. Consider the function

$$H(x,y) = \begin{cases} 1 & \text{if } x^2 + y^2 \le 1 \\ 0 & \text{else} \end{cases}$$

and the set Ω = [−1,1] × [−1,1] with V = 4. Notice that

$$I_\pi = \int_{\Omega} H(x,y)\,dx\,dy = \pi.$$

Thus, a crude way of calculating the value of π with Monte Carlo integration is to pick N random numbers on Ω and compute

$$Q_N = \frac{4}{N}\sum_{i=1}^{N} H(x_i, y_i).$$

In the figure on the right, the relative error $|Q_N - \pi|/\pi$ is measured as a function of N, confirming the $1/\sqrt{N}$ scaling.

Wolfram Mathematica Example

The code below describes a process of integrating the function
$$f(x) = \frac{1}{1 + \sinh(2x)\,\log(x)^2}$$

over the range 0.8 < x < 3 using the Monte Carlo method in Mathematica:

code:

func[x_] := 1/(1 + Sinh[2*x]*(Log[x])^2);
(*Sample from truncated normal distribution to speed up convergence*)
Distrib[x_, average_, var_] := PDF[NormalDistribution[average, var], 1.1*x - 0.1];
n = 10;
RV = RandomVariate[TruncatedDistribution[{0.8, 3}, NormalDistribution[1, 0.399]], n];
Int = 1/n Total[func[RV]/Distrib[RV, 1, 0.399]]*Integrate[Distrib[x, 1, 0.399], {x, 0.8, 3}]
NIntegrate[func[x], {x, 0.8, 3}] (*Compare with real answer*)


Recursive Stratified Sampling

An illustration of recursive stratified sampling. In this example, the function
$$f(x,y) = \begin{cases} 1 & x^2 + y^2 < 1 \\ 0 & x^2 + y^2 \ge 1 \end{cases}$$
from the above illustration was integrated within a unit square using the suggested algorithm. The sampled points were recorded and plotted. Clearly the stratified sampling algorithm concentrates the points in the regions where the variation of the function is largest.

Recursive stratified sampling is a generalization of one-dimensional adaptive quadratures to multi-dimensional integrals. On each recursion step the integral and the error are estimated using a plain Monte Carlo algorithm. If the error estimate is larger than the required accuracy, the integration volume is divided into sub-volumes and the procedure is recursively applied to the sub-volumes.

The ordinary ‘dividing by two’ strategy does not work for multi-dimensions as the number of sub-volumes grows far too quickly to keep track. Instead one estimates along which dimension a subdivision should bring the most dividends and only subdivides the volume along this dimension.

The stratified sampling algorithm concentrates the sampling points in the regions where the variance of the function is largest, thus reducing the grand variance and making the sampling more effective, as shown on the illustration.

The popular MISER routine implements a similar algorithm.

MISER Monte Carlo

The MISER algorithm is based on recursive stratified sampling. This technique aims to reduce the overall integration error by concentrating integration points in the regions of highest variance.

The idea of stratified sampling begins with the observation that for two disjoint regions a and b with Monte Carlo estimates of the integral $E_a(f)$ and $E_b(f)$ and variances $\sigma_a^2(f)$ and $\sigma_b^2(f)$, the variance Var(f) of the combined estimate

$$E(f) = \tfrac{1}{2}\bigl(E_a(f) + E_b(f)\bigr)$$

is given by,

$$\mathrm{Var}(f) = \frac{\sigma_a^2(f)}{4N_a} + \frac{\sigma_b^2(f)}{4N_b}.$$


It can be shown that this variance is minimized by distributing the points such that
$$\frac{N_a}{N_a + N_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}.$$

Hence the smallest error estimate is obtained by allocating sample points in proportion to the standard deviation of the function in each sub-region.

The MISER algorithm proceeds by bisecting the integration region along one coordinate axis to give two sub-regions at each step. The direction is chosen by examining all d possible bisections and selecting the one which will minimize the combined variance of the two sub-regions. The variance in the sub-regions is estimated by sampling with a fraction of the total number of points available to the current step. The same procedure is then repeated recursively for each of the two half-spaces from the best bisection. The remaining sample points are allocated to the sub-regions using the formula for Na and Nb. This recursive allocation of integration points continues down to a user-specified depth where each sub-region is integrated using a plain Monte Carlo estimate. These individual values and their error estimates are then combined upwards to give an overall result and an estimate of its error.
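A one-dimensional toy version of this allocation rule (a simplified sketch, not the MISER implementation itself) can be written as follows; the pilot-sample sizes and the test integrand are arbitrary choices for illustration.

import numpy as np

def stratified_estimate(f, a, b, n_total=10000, n_pilot=500, seed=0):
    # Bisect [a, b], estimate sigma in each half from a pilot sample, then
    # allocate the remaining points in proportion to the standard deviations,
    # following N_a / (N_a + N_b) = sigma_a / (sigma_a + sigma_b).
    rng = np.random.default_rng(seed)
    m = 0.5 * (a + b)
    sa = f(rng.uniform(a, m, n_pilot)).std(ddof=1)
    sb = f(rng.uniform(m, b, n_pilot)).std(ddof=1)
    n_rest = n_total - 2 * n_pilot
    na = int(n_rest * sa / (sa + sb)) if sa + sb > 0 else n_rest // 2
    nb = n_rest - na
    est_a = (m - a) * f(rng.uniform(a, m, na)).mean()
    est_b = (b - m) * f(rng.uniform(m, b, nb)).mean()
    return est_a + est_b

# x**4 varies much more on [0.5, 1] than on [0, 0.5], so the right half
# receives most of the samples; the exact integral over [0, 1] is 0.2.
print(stratified_estimate(lambda x: x**4, 0.0, 1.0))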

Importance Sampling VEGAS Monte Carlo

The VEGAS algorithm takes advantage of the information stored during the sampling, and uses it and importance sampling to efficiently estimate the integral I. It samples points from the probability distribution described by the function |f| so that the points are concentrated in the regions that make the largest contribution to the integral.

In general, if the Monte Carlo integral of f is sampled with points distributed according to a probability distribution described by the function g, we obtain an estimate:

$$E_g(f;N) = E\!\left(\frac{f}{g};\,N\right)$$

with a corresponding variance,

$$\mathrm{Var}_g(f;N) = \mathrm{Var}\!\left(\frac{f}{g};\,N\right).$$

If the probability distribution is chosen as

$$g = \frac{|f|}{I(|f|)},$$

then it can be shown that the variance $\mathrm{Var}_g(f;N)$ vanishes, and the error in the estimate will be zero. In practice it is not possible to sample from the exact distribution g for an arbitrary function, so importance sampling algorithms aim to produce efficient approximations to the desired distribution.

The VEGAS algorithm approximates the exact distribution by making a number of passes over the integration region, which creates the histogram of the function f. Each histogram is used to define a sampling distribution for the next pass. Asymptotically this procedure converges to the desired distribution. In order to avoid the number of histogram bins growing like K^d, the probability distribution is approximated by a separable function:

$$g(x_1, x_2, \ldots) = g_1(x_1)\,g_2(x_2)\cdots$$

so that the number of bins required is only K·d. This is equivalent to locating the peaks of the function from the projections of the integrand onto the coordinate axes. The efficiency of VEGAS depends on the validity of this assumption. It is most efficient when the peaks of the integrand are well-localized. If an integrand can be rewritten in a form which is approximately separable this will increase the efficiency of integration with VEGAS.

VEGAS incorporates a number of additional features, and combines both stratified sampling and importance sampling. The integration region is divided into a number of “boxes”, with each box getting a fixed number of points (the goal is 2). Each box can then have a fractional number of bins, but if bins/box is less than two, VEGAS switches to a kind of variance reduction (rather than importance sampling).

This routine uses the VEGAS Monte Carlo algorithm to integrate the function f over the dim-dimensional hypercubic region defined by the lower and upper limits in the arrays xl and xu, each of size dim. The integration uses a fixed number of function calls. The result and its error estimate are based on a weighted average of independent samples.

The VEGAS algorithm computes a number of independent estimates of the integral internally, according to the iterations parameter described below, and returns their weighted average. Random sampling of the integrand can occasionally produce an estimate where the error is zero, particularly if the function is constant in some regions. An estimate with zero error causes the weighted average to break down and must be handled separately.

Importance Sampling Algorithm

Importance sampling provides a very important tool to perform Monte Carlo integration. The main result of importance sampling is that the uniform sampling of x is a particular case of a more generic choice, in which the samples are drawn from any distribution p(x). The idea is that p(x) can be chosen to decrease the variance of the measurement Q_N. Consider the following example, where one would like to numerically integrate a Gaussian function, centered at 0, with σ = 1, from −1000 to 1000. Naturally, if the samples are drawn uniformly on the interval [−1000, 1000], only a very small part of them would be significant to the integral. This can be improved by choosing a different distribution from which the samples are drawn, for instance by sampling according to a Gaussian distribution centered at 0, with σ = 1. Of course the “right” choice strongly depends on the integrand.

Formally, given a set of samples chosen from a distribution

p(x): x_1, …, x_N ∈ V,

the estimator for I is given by


Q_N ≡ (1/N) Σ_{i=1}^{N} f(x_i) / p(x_i)

Intuitively, this says that if we pick a particular sample twice as often as other samples, we weight it half as much as the other samples. This estimator is naturally valid for uniform sampling, the case where p(x) is constant.
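As a concrete, hedged illustration of the Gaussian example above, the following plain-Python sketch estimates the integral of exp(−x²/2)/√(2π) over [−1000, 1000] (true value ≈ 1) once with uniform sampling and once with importance sampling from a standard normal p(x); the sample size and seed are arbitrary choices made only for the demonstration:

import math, random

def f(x):
    # integrand: standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def p(x):
    # sampling density for importance sampling: also a standard normal
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def uniform_estimate(n):
    # uniform p(x) = 1/2000 on [-1000, 1000], so f(x)/p(x) = 2000*f(x)
    return sum(2000.0 * f(random.uniform(-1000.0, 1000.0)) for _ in range(n)) / n

def importance_estimate(n):
    # weight each sample by f(x)/p(x); samples outside the region count as zero
    total = 0.0
    for _ in range(n):
        x = random.gauss(0.0, 1.0)
        if abs(x) <= 1000.0:
            total += f(x) / p(x)   # here p matches f, so the weight is exactly 1
    return total / n

random.seed(0)
print("uniform:   ", uniform_estimate(10_000))     # noisy: most samples wasted
print("importance:", importance_estimate(10_000))  # essentially exact

Because p is proportional to |f| here, the weights f(x_i)/p(x_i) are constant and the estimator has essentially zero variance, matching the zero-variance result quoted earlier; with a less well-matched p the weights would vary and the variance would grow, though typically it would remain far smaller than under uniform sampling.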

The Metropolis–Hastings algorithm is one of the most widely used algorithms to generate x from p(x), thus providing an efficient way of computing integrals.

Multiple and Adaptive Importance Sampling

When different proposal distributions p_n(x), n = 1, …, N, are jointly used for drawing the samples x_1, …, x_N ∈ V, different proper weighting functions can be employed. In an adaptive setting, the proposal distributions p_{n,t}(x), n = 1, …, N, t = 1, …, T, are updated at each iteration t of the adaptive importance sampling algorithm. Hence, since a population of proposal densities is used, several suitable combinations of sampling and weighting schemes can be employed.

Mathematical Optimization

In mathematics, computer science and operations research, mathematical optimization, also spelled mathematical optimisation, alternatively named mathematical programming or simply optimization or optimisation, is the selection of a best element (with regard to some criterion) from some set of available alternatives.

Graph of a paraboloid given by z = f(x, y) = −(x² + y²) + 4. The global maximum at (x, y, z) = (0, 0, 4) is indicated by a blue dot.

In the simplest case, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function. The generalization of optimization theory and techniques to other formulations comprises a large area of applied mathematics. More generally, optimization includes finding “best available” values of some objective function given a defined domain (or input), including a variety of different types of objective functions and different types of domains.

Nelder-Mead minimum search of Simionescu’s function. Simplex vertices are ordered by their value, with 1 having the lowest (best) value.

Optimization Problems

An optimization problem can be represented in the following way:

Given: a function f : A → R from some set A to the real numbers

Sought: an element x0 in A such that f(x0) ≤ f(x) for all x in A (“minimization”) or such that f(x0) ≥ f(x) for all x in A (“maximization”).

Such a formulation is called an optimization problem or a mathematical programming problem (a term not directly related to computer programming, but still in use for example in linear programming). Many real-world and theoretical problems may be modeled in this general framework. Problems formulated using this technique in the fields of physics and computer vision may refer to the technique as energy minimization, speaking of the value of the function f as representing the energy of the system being modeled.

Typically, A is some subset of the Euclidean space R^n, often specified by a set of constraints, equalities or inequalities that the members of A have to satisfy. The domain A of f is called the search space or the choice set, while the elements of A are called candidate solutions or feasible solutions.

The function f is called, variously, an objective function, a loss function or cost function (minimiza- tion), a utility function or fitness function (maximization), or, in certain fields, an energy function or energy functional. A feasible solution that minimizes (or maximizes, if that is the goal) the ob- jective function is called an optimal solution.

In mathematics, conventional optimization problems are usually stated in terms of minimization. Generally, unless both the objective function and the feasible region are convex in a minimization problem, there may be several local minima. A local minimum x* is defined as a point for which there exists some δ > 0 so that for all x such that


‖x − x*‖ ≤ δ,

the expression

f(x*) ≤ f(x)

holds; that is to say, on some region around x* all of the function values are greater than or equal to the value at that point. Local maxima are defined similarly.

While a local minimum is at least as good as any nearby points, a global minimum is at least as good as every feasible point. In a convex problem, if there is a local minimum that is interior (not on the edge of the set of feasible points), it is also the global minimum, but a nonconvex problem may have more than one local minimum not all of which need be global minima.

A large number of algorithms proposed for solving nonconvex problems—including the majority of commercially available solvers—are not capable of making a distinction between locally optimal solutions and globally optimal solutions, and will treat the former as actual solutions to the original problem. Global optimization is the branch of applied mathematics and numerical analysis that is concerned with the development of deterministic algorithms that are capable of guaranteeing convergence in finite time to the actual optimal solution of a nonconvex problem.

Notation

Optimization problems are often expressed with special notation. Here are some examples.

Minimum and Maximum Value of a Function

Consider the following notation:

min_{x ∈ R} (x^2 + 1)

This denotes the minimum value of the objective function x^2 + 1, when choosing x from the set of real numbers R. The minimum value in this case is 1, occurring at x = 0. Similarly, the notation

maxx∈ 2x

asks for the maximum value of the objective function 2x, where x may be any real number. In this case, there is no such maximum as the objective function is unbounded, so the answer is “infinity” or “undefined”.

Optimal Input Arguments

Consider the following notation:

arg min_{x ∈ (−∞, −1]} (x^2 + 1)


or equivalently

arg min_x (x^2 + 1), subject to: x ∈ (−∞, −1].

This represents the value (or values) of the argument x in the interval (−∞ , − 1] that minimizes (or minimize) the objective function x2 + 1 (the actual minimum value of that function is not what the problem asks for). In this case, the answer is x = –1, since x = 0 is infeasible, i.e. does not belong to the feasible set.

Similarly, arg max_{x ∈ [−5, 5], y ∈ R} x cos(y),

or equivalently

arg max_{x, y} x cos(y), subject to: x ∈ [−5, 5], y ∈ R,

represents the (x, y) pair (or pairs) that maximizes (or maximize) the value of the objective function x cos(y), with the added constraint that x lie in the interval [−5, 5] (again, the actual maximum value of the expression does not matter). In this case, the solutions are the pairs of the form (5, 2kπ) and (−5, (2k+1)π), where k ranges over all integers. arg min and arg max are sometimes also written argmin and argmax, and stand for argument of the minimum and argument of the maximum.
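For readers who want to check these notation examples numerically, the following plain-Python sketch performs a crude grid search; truncating the unbounded intervals to finite grids is an assumption made purely for this illustration:

import math

def grid_argmin(func, points):
    # brute-force argmin over a finite candidate set
    return min(points, key=func)

# arg min of x**2 + 1 over (-inf, -1], searched on a grid truncated to [-10, -1]
xs = [-10 + 0.001 * i for i in range(9001)]
print(grid_argmin(lambda x: x * x + 1, xs))          # approximately -1.0

# arg max of x*cos(y) for x in [-5, 5], y restricted to one period [0, 2*pi]
candidates = [(-5 + 0.1 * i, 0.01 * j)
              for i in range(101) for j in range(629)]
best = max(candidates, key=lambda p: p[0] * math.cos(p[1]))
print(best)                                          # approximately (5, 0), i.e. (5, 2*k*pi)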

History

Fermat and Lagrange found calculus-based formulas for identifying optima, while Newton and Gauss proposed iterative methods for moving towards an optimum.

The term “linear programming” for certain optimization cases was due to George B. Dantzig, although much of the theory had been introduced by Leonid Kantorovich in 1939. (Programming in this context does not refer to computer programming, but derives from the use of program by the United States military to refer to proposed training and logistics schedules, which were the problems Dantzig studied at that time.) Dantzig published the Simplex algorithm in 1947, and John von Neumann developed the theory of duality in the same year.

Other major researchers in mathematical optimization include the following:

• Aharon Ben-Tal
• Richard Bellman
• Roger Fletcher
• Ronald A. Howard
• Fritz John
• Narendra Karmarkar
• William Karush
• Leonid Khachiyan
• Bernard Koopman
• Harold Kuhn
• László Lovász
• Arkadi Nemirovski
• Yurii Nesterov
• Boris Polyak
• Lev Pontryagin
• James Renegar
• R. Tyrrell Rockafellar
• Cornelis Roos
• Naum Z. Shor
• Michael J. Todd
• Albert Tucker

Major Subfields

• Convex programming studies the case when the objective function is convex (minimization) or concave (maximization) and the constraint set is convex. This can be viewed as a particular case of nonlinear programming or as a generalization of linear or convex quadratic programming.

o Linear programming (LP), a type of convex programming, studies the case in which the objective function f is linear and the constraints are specified using only linear equalities and inequalities. Such a set is called a polyhedron or a polytope if it is bounded.

o Second order cone programming (SOCP) is a convex program, and includes certain types of quadratic programs.

o Semidefinite programming (SDP) is a subfield of convex optimization where the underlying variables are semidefinite matrices. It is a generalization of linear and convex quadratic programming.

o Conic programming is a general form of convex programming. LP, SOCP and SDP can all be viewed as conic programs with the appropriate type of cone.

o Geometric programming is a technique whereby objective and inequality constraints expressed as posynomials and equality constraints as monomials can be transformed into a convex program.

• Integer programming studies linear programs in which some or all variables are con- strained to take on integer values. This is not convex, and in general much more difficult than regular linear programming.

• Quadratic programming allows the objective function to have quadratic terms, while the feasible set must be specified with linear equalities and inequalities. For specific forms of the quadratic term, this is a type of convex programming.

• Fractional programming studies optimization of ratios of two nonlinear functions. The special class of concave fractional programs can be transformed to a convex optimization problem.

• Nonlinear programming studies the general case in which the objective function or the constraints or both contain nonlinear parts. This may or may not be a convex program. In general, whether the program is convex affects the difficulty of solving it.

• Stochastic programming studies the case in which some of the constraints or parameters depend on random variables.

• Robust programming is, like stochastic programming, an attempt to capture uncertainty in the data underlying the optimization problem. Robust optimization targets to find solu- tions that are valid under all possible realizations of the uncertainties.

• Combinatorial optimization is concerned with problems where the set of feasible solutions is discrete or can be reduced to a discrete one.

• Stochastic optimization is used with random (noisy) function measurements or random inputs in the search process.

• Infinite-dimensional optimization studies the case when the set of feasible solutions is a subset of an infinite-dimensional space, such as a space of functions.

• Heuristics and metaheuristics make few or no assumptions about the problem being optimized. Usually, heuristics do not guarantee that any optimal solution need be found. On the other hand, heuristics are used to find approximate solutions for many complicated optimization problems.

• Constraint satisfaction studies the case in which the objective function f is constant (this is used in artificial intelligence, particularly in automated reasoning).

o Constraint programming is a programming paradigm wherein relations between variables are stated in the form of constraints.

• Disjunctive programming is used where at least one constraint must be satisfied but not all. It is of particular use in scheduling.

In a number of subfields, the techniques are designed primarily for optimization in dynamic contexts (that is, decision making over time):

• Calculus of variations seeks to optimize an action integral over some space to an extremum by varying a function of the coordinates.

• Optimal control theory is a generalization of the calculus of variations which introduces control policies.

• Dynamic programming studies the case in which the optimization strategy is based on splitting the problem into smaller subproblems. The equation that describes the relation- ship between these subproblems is called the Bellman equation.

• Mathematical programming with equilibrium constraints is where the constraints include variational inequalities or complementarities.

Multi-objective Optimization

Adding more than one objective to an optimization problem adds complexity. For example, to op- timize a structural design, one would desire a design that is both light and rigid. When two objec- tives conflict, a trade-off must be created. There may be one lightest design, one stiffest design, and an infinite number of designs that are some compromise of weight and rigidity. The set of trade-off designs that cannot be improved upon according to one criterion without hurting another criterion is known as the Pareto set. The curve created plotting weight against stiffness of the best designs is known as the Pareto frontier.

A design is judged to be “Pareto optimal” (equivalently, “Pareto efficient” or in the Pareto set) if it is not dominated by any other design: If it is worse than another design in some respects and no better in any respect, then it is dominated and is not Pareto optimal.

The choice among “Pareto optimal” solutions to determine the “favorite solution” is delegated to the decision maker. In other words, defining the problem as multi-objective optimization signals that some information is missing: desirable objectives are given but not their detailed combina- tion. In some cases, the missing information can be derived by interactive sessions with the deci- sion maker.

Multi-objective optimization problems have been generalized further into vector optimization problems where the (partial) ordering is no longer given by the Pareto ordering.

Multi-modal Optimization

Optimization problems are often multi-modal; that is, they possess multiple good solutions. They could all be globally good (same cost function value) or there could be a mix of globally good and locally good solutions. Obtaining all (or at least some of) the multiple solutions is the goal of a multi-modal optimizer.

Classical optimization techniques, due to their iterative approach, do not perform satisfactorily when they are used to obtain multiple solutions, since it is not guaranteed that different solutions will be obtained even with different starting points in multiple runs of the algorithm. Evolutionary algorithms, however, are a very popular approach to obtaining multiple solutions in a multi-modal optimization task.

Classification of Critical Points and Extrema

Feasibility Problem

The satisfiability problem, also called the feasibility problem, is just the problem of finding any feasible solution at all without regard to objective value. This can be regarded as the special case of mathematical optimization where the objective value is the same for every solution, and thus any solution is optimal.

Many optimization algorithms need to start from a feasible point. One way to obtain such a point is to relax the feasibility conditions using a slack variable; with enough slack, any starting point is feasible. Then, minimize that slack variable until slack is null or negative.

Existence

The extreme value theorem of Karl Weierstrass states that a continuous real-valued function on a compact set attains its maximum and minimum value. More generally, a lower semi-continuous function on a compact set attains its minimum; an upper semi-continuous function on a compact set attains its maximum.

Necessary Conditions for Optimality

One of Fermat’s theorems states that optima of unconstrained problems are found at stationary points, where the first derivative or the gradient of the objective function is zero. More generally, they may be found at critical points, where the first derivative or gradient of the objective function is zero or is undefined, or on the boundary of the choice set. An equation (or set of equations) stat- ing that the first derivative(s) equal(s) zero at an interior optimum is called a ‘first-order condition’ or a set of first-order conditions.

Optima of equality-constrained problems can be found by the Lagrange multiplier method. The optima of problems with equality and/or inequality constraints can be found using the ‘Karush– Kuhn–Tucker conditions’.

Sufficient Conditions for Optimality

While the first derivative test identifies points that might be extrema, this test does not distinguish a point that is a minimum from one that is a maximum or one that is neither. When the objective function is twice differentiable, these cases can be distinguished by checking the second derivative or the matrix of second derivatives (called the Hessian matrix) in unconstrained problems, or the matrix of second derivatives of the objective function and the constraints called the bordered Hessian in constrained problems. The conditions that distinguish maxima, or minima, from other stationary points are called ‘second-order conditions’. If a candidate solution satisfies the first-or- der conditions, then satisfaction of the second-order conditions as well is sufficient to establish at least local optimality.


Sensitivity and Continuity of Optima

The envelope theorem describes how the value of an optimal solution changes when an underlying parameter changes. The process of computing this change is called comparative statics.

The maximum theorem of Claude Berge (1963) describes the continuity of an optimal solution as a function of underlying parameters.

Calculus of Optimization

For unconstrained problems with twice-differentiable functions, some critical points can be found by finding the points where the gradient of the objective function is zero (that is, the stationary points). More generally, a zero subgradient certifies that a local minimum has been found for minimization problems with convex functions and other locally Lipschitz functions.

Further, critical points can be classified using the definiteness of the Hessian matrix: if the Hessian is positive definite at a critical point, then the point is a local minimum; if the Hessian matrix is negative definite, then the point is a local maximum; finally, if indefinite, then the point is some kind of saddle point.
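The following Python sketch illustrates the second-order test just described for a two-variable function, estimating the Hessian entries by finite differences and reading off the sign pattern; the test function is a hypothetical example chosen so that the origin is a saddle point:

def f(x, y):
    return x * x - y * y          # saddle at the origin

def hessian(func, x, y, h=1e-4):
    # central finite differences for the three distinct Hessian entries
    fxx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h**2
    fyy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h**2
    fxy = (func(x + h, y + h) - func(x + h, y - h)
           - func(x - h, y + h) + func(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

def classify(func, x, y):
    fxx, fxy, fyy = hessian(func, x, y)
    det = fxx * fyy - fxy * fxy   # product of the two eigenvalues
    if det > 0 and fxx > 0:
        return "local minimum (positive definite Hessian)"
    if det > 0 and fxx < 0:
        return "local maximum (negative definite Hessian)"
    if det < 0:
        return "saddle point (indefinite Hessian)"
    return "inconclusive (semidefinite Hessian)"

print(classify(f, 0.0, 0.0))      # saddle point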

Constrained problems can often be transformed into unconstrained problems with the help of Lagrange multipliers. Lagrangian relaxation can also provide approximate solutions to difficult constrained problems.

When the objective function is convex, then any local minimum will also be a global minimum. There exist efficient numerical techniques for minimizing convex functions, such as interior-point methods.

Computational Optimization Techniques

To solve problems, researchers may use algorithms that terminate in a finite number of steps, or iterative methods that converge to a solution (on some specified class of problems), or heuristics that may provide approximate solutions to some problems (although their iterates need not con- verge).

Optimization Algorithms

• Simplex algorithm of George Dantzig, designed for linear programming.

• Extensions of the simplex algorithm, designed for quadratic programming and for lin- ear-fractional programming.

• Variants of the simplex algorithm that are especially suited for network optimization.

• Combinatorial algorithms

Iterative Methods

The iterative methods used to solve problems of nonlinear programming differ according to whether they evaluate Hessians, gradients, or only function values. While evaluating Hessians (H) and gradients (G) improves the rate of convergence, for functions for which these quantities exist and vary sufficiently smoothly, such evaluations increase the computational complexity (or computational cost) of each iteration. In some cases, the computational complexity may be excessively high.

One major criterion for optimizers is just the number of required function evaluations, as this is often already a large computational effort, usually much more effort than within the optimizer itself, which mainly has to operate over the N variables. The derivatives provide detailed information for such optimizers, but are even harder to calculate; e.g. approximating the gradient takes at least N+1 function evaluations. For approximations of the 2nd derivatives (collected in the Hessian matrix) the number of function evaluations is on the order of N². Newton’s method requires the 2nd-order derivatives, so for each iteration the number of function calls is on the order of N², but for a simpler pure gradient optimizer it is only N. However, gradient optimizers usually need more iterations than Newton’s algorithm. Which one is best with respect to the number of function calls depends on the problem itself.
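The evaluation counts quoted above are easy to verify directly. The following minimal Python sketch (the counter and test function are purely illustrative) approximates a gradient by forward differences and confirms that this costs exactly N + 1 function evaluations:

calls = 0

def f(x):
    global calls
    calls += 1
    return sum((xi - 1.0) ** 2 for xi in x)   # minimum at (1, ..., 1)

def approx_gradient(func, x, h=1e-6):
    fx = func(x)                               # 1 evaluation
    grad = []
    for i in range(len(x)):                    # N more evaluations
        xph = list(x)
        xph[i] += h
        grad.append((func(xph) - fx) / h)
    return grad

x = [0.0] * 5
g = approx_gradient(f, x)
print(g)             # approximately [-2, -2, -2, -2, -2]
print(calls)         # 6 = N + 1 evaluations for N = 5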

• Methods that evaluate Hessians (or approximate Hessians, using finite differences):

o Newton’s method

o Sequential quadratic programming: A Newton-based method for small-medium scale constrained problems. Some versions can handle large-dimensional prob- lems.

o Interior point methods: This is a large class of methods for constrained optimization. Some interior-point methods use only (sub)gradient information, while others require the evaluation of Hessians.

• Methods that evaluate gradients, or approximate gradients in some way (or even subgra- dients):

o Coordinate descent methods: Algorithms which update a single coordinate in each iteration

o Conjugate gradient methods: Iterative methods for large problems. (In theory, these methods terminate in a finite number of steps with quadratic objective func- tions, but this finite termination is not observed in practice on finite–precision computers.)

o Gradient descent (alternatively, “steepest descent” or “steepest ascent”): A (slow) method of historical and theoretical interest, which has had renewed interest for finding approximate solutions of enormous problems.

o Subgradient methods - An iterative method for large locally Lipschitz functions using generalized gradients. Following Boris T. Polyak, subgradient–projection methods are similar to conjugate–gradient methods.

o Bundle method of descent: An iterative method for small–medium-sized problems with locally Lipschitz functions, particularly for convex minimization problems. (Similar to conjugate gradient methods.)

o Ellipsoid method: An iterative method for small problems with quasiconvex ob- jective functions and of great theoretical interest, particularly in establishing the polynomial time complexity of some combinatorial optimization problems. It has similarities with Quasi-Newton methods.

o Reduced gradient method (Frank–Wolfe) for approximate minimization of spe- cially structured problems with linear constraints, especially with traffic networks. For general unconstrained problems, this method reduces to the gradient method, which is regarded as obsolete (for almost all problems).

o Quasi-Newton methods: Iterative methods for medium-large problems (e.g. N<1000).

o Simultaneous perturbation stochastic approximation (SPSA) method for stochastic optimization; uses random (efficient) gradient approximation.

• Methods that evaluate only function values: If a problem is continuously differentiable, then gradients can be approximated using finite differences, in which case a gradient-based method can be used.

o Interpolation methods

o Pattern search methods, which have better convergence properties than the Nelder– Mead heuristic (with simplices), which is listed below.

Global Convergence

More generally, if the objective function is not a quadratic function, then many optimization methods use other methods to ensure that some subsequence of iterations converges to an optimal solution. The first and still popular method for ensuring convergence relies on line searches, which optimize a function along one dimension. A second and increasingly popular method for ensuring convergence uses trust regions. Both line searches and trust regions are used in modern methods of non-differentiable optimization. Usually a global optimizer is much slower than advanced local optimizers (such as BFGS), so often an efficient global optimizer can be constructed by starting the local optimizer from different starting points.
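As an illustration of the line-search idea, the sketch below runs gradient descent on a simple quadratic and picks each step length by backtracking (Armijo) line search along the descent direction; the test function and all constants are illustrative choices rather than a prescribed method:

def f(x):
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2

def grad(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]

def backtracking_step(x, g, alpha=1.0, beta=0.5, c=1e-4):
    # shrink alpha until the Armijo sufficient-decrease condition holds
    fx = f(x)
    g_norm_sq = sum(gi * gi for gi in g)
    while f([xi - alpha * gi for xi, gi in zip(x, g)]) > fx - c * alpha * g_norm_sq:
        alpha *= beta
    return alpha

x = [0.0, 0.0]
for _ in range(100):
    g = grad(x)
    step = backtracking_step(x, g)
    x = [xi - step * gi for xi, gi in zip(x, g)]

print(x)   # converges to roughly (3, -1)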

Heuristics

Besides (finitely terminating) algorithms and (convergent) iterative methods, there are heuristics. A heuristic is any algorithm which is not guaranteed (mathematically) to find the solution, but which is nevertheless useful in certain practical situations. List of some well-known heuristics:

• Memetic algorithm

• Differential evolution

• Evolutionary algorithms


• Dynamic relaxation
• Genetic algorithms
• Hill climbing with random restart
• Nelder–Mead simplicial heuristic: A popular heuristic for approximate minimization (without calling gradients)
• Particle swarm optimization
• Gravitational search algorithm
• Artificial bee colony optimization
• Simulated annealing
• Stochastic tunneling
• Tabu search
• Reactive Search Optimization (RSO) implemented in LIONsolver

Applications

Mechanics

Problems in rigid body dynamics (in particular articulated rigid body dynamics) often require mathematical programming techniques, since one can view rigid body dynamics as attempting to solve an ordinary differential equation on a constraint manifold; the constraints are various nonlinear geometric constraints such as “these two points must always coincide”, “this surface must not penetrate any other”, or “this point must always lie somewhere on this curve”. Also, the problem of computing contact forces can be addressed by solving a linear complementarity problem, which can also be viewed as a QP (quadratic programming) problem.

Many design problems can also be expressed as optimization programs. This application is called design optimization. One subset is engineering optimization, and another recent and growing subset of this field is multidisciplinary design optimization, which, while useful in many problems, has in particular been applied to aerospace engineering problems.

Economics

Economics is closely enough linked to optimization of agents that an influential definition relatedly describes economics qua science as the “study of human behavior as a relationship between ends and scarce means” with alternative uses. Modern optimization theory includes traditional optimization theory but also overlaps with game theory and the study of economic equilibria.

In microeconomics, the utility maximization problem and its dual problem, the expenditure minimization problem, are economic optimization problems. Insofar as they behave consistently, consumers are assumed to maximize their utility, while firms are usually assumed to maximize their profit. Also, agents are often modeled as being risk-averse, thereby preferring to avoid risk. Asset prices are also modeled using optimization theory, though the underlying mathematics relies on optimizing stochastic processes rather than on static optimization. Trade theory also uses optimization to explain trade patterns between nations. The optimization of market portfolios is an example of multi-objective optimization in economics.

Since the 1970s, economists have modeled dynamic decisions over time using control theory. For example, microeconomists use dynamic search models to study labor-market behavior. A crucial distinction is between deterministic and stochastic models. Macroeconomists build dynamic sto- chastic general equilibrium (DSGE) models that describe the dynamics of the whole economy as the result of the interdependent optimizing decisions of workers, consumers, investors, and gov- ernments.

Electrical Engineering

Some common applications of optimization techniques in electrical engineering include active filter design, stray field reduction in superconducting magnetic energy storage systems, space mapping design of microwave structures, handset antennas, and electromagnetics-based design. Electromagnetically validated design optimization of microwave components and antennas has made extensive use of an appropriate physics-based or empirical surrogate model and space mapping methodologies since the discovery of space mapping in 1993.

Operations Research

Another field that uses optimization techniques extensively is operations research. Operations re- search also uses stochastic modeling and simulation to support improved decision-making. In- creasingly, operations research uses stochastic programming to model dynamic decisions that adapt to events; such problems can be solved with large-scale optimization and stochastic optimi- zation methods.

Control Engineering

Mathematical optimization is used in much modern controller design. High-level controllers such as model predictive control (MPC) or real-time optimization (RTO) employ mathematical optimi- zation. These algorithms run online and repeatedly determine values for decision variables, such as choke openings in a process plant, by iteratively solving a mathematical optimization problem including constraints and a model of the system to be controlled.

Geophysics

Optimization techniques are regularly used in geophysical parameter estimation problems. Given a set of geophysical measurements, e.g. seismic recordings, it is common to solve for the physical properties and geometrical shapes of the underlying rocks and fluids.

Molecular Modeling

Nonlinear optimization methods are widely used in conformational analysis.


Optimization Problem

In mathematics and computer science, an optimization problem is the problem of finding the best solution from all feasible solutions. Optimization problems can be divided into two categories depending on whether the variables are continuous or discrete. An optimization problem with discrete variables is known as a combinatorial optimization problem. In a combinatorial optimization problem, we are looking for an object such as an integer, permutation or graph from a finite (or possibly countably infinite) set. Problems with continuous variables include constrained problems and multimodal problems.

Continuous Optimization Problem

The standard form of a (continuous) optimization problem is

minimize_x f(x)

subject to g_i(x) ≤ 0, i = 1, …, m

h_i(x) = 0, i = 1, …, p

where

n • fx( ):→ is the objective function to be minimized over the variable x , ≤ • gxi () 0are called inequality constraints, and = • hxi () 0are called equality constraints. By convention, the standard form defines a minimization problem. A maximization problem can be treated by negating the objective function.

Combinatorial Optimization Problem

Formally, a combinatorial optimization problem A is a quadruple (I, f, m, g), where

• I is a set of instances;
• given an instance x ∈ I, f(x) is the set of feasible solutions;

• given an instance x and a feasible solution y of x, m(x, y) denotes the measure of y, which is usually a positive real;

• g is the goal function, and is either min or max.

The goal is then to find for some instance x an optimal solution, that is, a feasible solution y with m(x, y) = g { m(x, y′) | y′ ∈ f(x) }.

For each combinatorial optimization problem, there is a corresponding decision problem that asks whether there is a feasible solution for some particular measure m_0. For example, if there is a graph G which contains vertices u and v, an optimization problem might be “find a path from u to v that uses the fewest edges”. This problem might have an answer of, say, 4. A corresponding decision problem would be “is there a path from u to v that uses 10 or fewer edges?” This problem can be answered with a simple ‘yes’ or ‘no’.
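The shortest-path example above can be made concrete with a few lines of Python: breadth-first search answers the optimization version (“fewest edges”), and the decision version (“at most k edges”) follows immediately from it. The graph used here is hypothetical:

from collections import deque

graph = {
    "u": ["a", "b"],
    "a": ["c"],
    "b": ["c", "v"],
    "c": ["v"],
    "v": [],
}

def fewest_edges(g, start, goal):
    """Optimization version: return the minimum number of edges, or None."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for nxt in g[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None

def path_within(g, start, goal, k):
    """Decision version: is there a path using at most k edges?"""
    best = fewest_edges(g, start, goal)
    return best is not None and best <= k

print(fewest_edges(graph, "u", "v"))     # 2  (u -> b -> v)
print(path_within(graph, "u", "v", 10))  # True
print(path_within(graph, "u", "v", 1))   # False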

In the field of approximation algorithms, algorithms are designed to find near-optimal solutions to hard problems. The usual decision version is then an inadequate definition of the problem since it only specifies acceptable solutions. Even though we could introduce suitable decision problems, the problem is more naturally characterized as an optimization problem.

NP Optimization Problem

An NP-optimization problem (NPO) is a combinatorial optimization problem with the following additional conditions. Note that the polynomials referred to below are functions of the size of the respective functions’ inputs, not the size of some implicit set of input instances.

• the size of every feasible solution y ∈ f(x) is polynomially bounded in the size of the given instance x,

• the languages { x | x ∈ I } and { (x, y) | y ∈ f(x) } can be recognized in polynomial time, and

• m is polynomial-time computable.

This implies that the corresponding decision problem is in NP. In computer science, interesting optimization problems usually have the above properties and are therefore NPO problems. A problem is additionally called a P-optimization (PO) problem if there exists an algorithm which finds optimal solutions in polynomial time. Often, when dealing with the class NPO, one is interested in optimization problems for which the decision versions are NP-complete. Note that hardness relations are always with respect to some reduction. Due to the connection between approximation algorithms and computational optimization problems, reductions which preserve approximation in some respect are preferred for this subject over the usual Turing and Karp reductions. An example of such a reduction is the L-reduction. For this reason, optimization problems with NP-complete decision versions are not necessarily called NPO-complete.

NPO is divided into the following subclasses according to their approximability:

• NPO(I): Equals FPTAS. Contains the Knapsack problem.

• NPO(II): Equals PTAS. Contains the Makespan scheduling problem.

• NPO(III): The class of NPO problems that have polynomial-time algorithms which compute solutions with a cost at most c times the optimal cost (for minimization problems) or a cost at least 1/c of the optimal cost (for maximization problems). In Hromkovič’s book, all NPO(II) problems are excluded from this class unless P = NP. Without the exclusion, equals APX. Contains MAX-SAT and metric TSP.

• NPO(IV): The class of NPO problems with polynomial-time algorithms approximating the optimal solution by a ratio that is polynomial in a logarithm of the size of the input. In Hromkovič’s book, all NPO(III) problems are excluded from this class unless P = NP. Contains the set cover problem.

• NPO(V): The class of NPO problems with polynomial-time algorithms approximating the optimal solution by a ratio bounded by some function of n. In Hromkovič’s book, all NPO(IV) problems are excluded from this class unless P = NP. Contains the TSP and Max Clique problems.

Another class of interest is NPOPB, NPO with polynomially bounded cost functions. Problems with this condition have many desirable properties.

Multi-objective Optimization

Multi-objective optimization (also known as multi-objective programming, vector optimization, multicriteria optimization, multiattribute optimization or Pareto optimization) is an area of mul- tiple criteria decision making, that is concerned with mathematical optimization problems involv- ing more than one objective function to be optimized simultaneously. Multi-objective optimization has been applied in many fields of science, including engineering, economics and logistics where optimal decisions need to be taken in the presence of trade-offs between two or more conflicting objectives. Minimizing cost while maximizing comfort while buying a car, and maximizing perfor- mance whilst minimizing fuel consumption and emission of pollutants of a vehicle are examples of multi-objective optimization problems involving two and three objectives, respectively. In practi- cal problems, there can be more than three objectives.

For a nontrivial multi-objective optimization problem, there does not exist a single solution that simultaneously optimizes each objective. In that case, the objective functions are said to be conflicting, and there exists a (possibly infinite) number of Pareto optimal solutions. A solution is called nondominated, Pareto optimal, Pareto efficient or noninferior, if none of the objective functions can be improved in value without degrading some of the other objec- tive values. Without additional subjective preference information, all Pareto optimal solutions are considered equally good (as vectors cannot be ordered completely). Researchers study multi-objective optimization problems from different viewpoints and, thus, there exist differ- ent solution philosophies and goals when setting and solving them. The goal may be to find a representative set of Pareto optimal solutions, and/or quantify the trade-offs in satisfying the different objectives, and/or finding a single solution that satisfies the subjective preferences of a human decision maker (DM).

Introduction

A multi-objective optimization problem is an optimization problem that involves multiple objective functions. In mathematical terms, a multi-objective optimization problem can be formulated as

min_{x ∈ X} (f_1(x), f_2(x), …, f_k(x))

where the integer k ≥ 2 is the number of objectives and the set X is the feasible set of decision vectors. The feasible set is typically defined by some constraint functions. In addition, the vector-valued objective function is often defined as


f: X → R^k, f(x) = (f_1(x), …, f_k(x))^T.

If some objective function is to be maximized, it is equivalent to minimize its negative. The image of X is denoted by Y ⊆ R^k. An element x* ∈ X is called a feasible solution or a feasible decision. A vector z* := f(x*) ∈ R^k for a feasible solution x* is called an objective vector or an outcome. In multi-objective optimization, there does not typically exist a feasible solution that minimizes all objective functions simultaneously. Therefore, attention is paid to Pareto optimal solutions; that is, solutions that cannot be improved in any of the objectives without degrading at least one of the other objectives. In mathematical terms, a feasible solution x^1 ∈ X is said to (Pareto) dominate another solution x^2 ∈ X, if

1. f_i(x^1) ≤ f_i(x^2) for all indices i ∈ {1, 2, …, k}, and
2. f_j(x^1) < f_j(x^2) for at least one index j ∈ {1, 2, …, k}.

A solution x* ∈ X (and the corresponding outcome f(x*)) is called Pareto optimal if there does not exist another solution that dominates it. The set of Pareto optimal outcomes is often called the Pareto front or Pareto boundary.
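The dominance test and the Pareto front just defined translate directly into code. The following brute-force Python sketch (all objectives minimized; the sample outcome vectors are hypothetical) filters a finite set of outcomes down to its non-dominated subset:

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def pareto_front(points):
    # keep only the points that no other point dominates
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

outcomes = [(1.0, 9.0), (2.0, 7.0), (3.0, 8.0), (4.0, 3.0), (6.0, 2.0), (7.0, 6.0)]
print(pareto_front(outcomes))
# [(1.0, 9.0), (2.0, 7.0), (4.0, 3.0), (6.0, 2.0)]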

The Pareto front of a multi-objective optimization problem is bounded by a so-called nadir objective vector z^nad and an ideal objective vector z^ideal, if these are finite. The nadir objective vector is defined as

z_i^nad = sup { f_i(x) : x ∈ X is Pareto optimal }, for all i = 1, …, k

and the ideal objective vector as

z_i^ideal = inf { f_i(x) : x ∈ X }, for all i = 1, …, k.

In other words, the components of a nadir and an ideal objective vector define upper and lower bounds for the objective function values of Pareto optimal solutions, respectively. In practice, the nadir objective vector can only be approximated as, typically, the whole Pareto optimal set is unknown. In addition, a utopian objective vector z^utopian, with

z_i^utopian = z_i^ideal − ε for all i = 1, …, k,

where ε > 0 is a small constant, is often defined for numerical reasons.

Examples of Multi-objective Optimization Applications

Economics

In economics, many problems involve multiple objectives along with constraints on what combinations of those objectives are attainable. For example, consumer demand for various goods is determined by the process of maximization of the utilities derived from those goods, subject to a constraint based on how much income is available to spend on those goods and on the prices of those goods. This constraint allows more of one good to be purchased only at the sacrifice of consuming less of another good; therefore, the various objectives (more consumption of each good is preferred) are in conflict with each other. A common method for analyzing such a problem is to use a graph of indifference curves, representing preferences, and a budget constraint, representing the trade-offs that the consumer is faced with.

Another example involves the production possibilities frontier, which specifies what combinations of various types of goods can be produced by a society with certain amounts of various resources. The frontier specifies the trade-offs that the society is faced with — if the society is fully utilizing its resources, more of one good can be produced only at the expense of producing less of another good. A society must then use some process to choose among the possibilities on the frontier.

Macroeconomic policy-making is a context requiring multi-objective optimization. Typically a central bank must choose a stance for monetary policy that balances competing objectives — low inflation, low unemployment, low balance of trade deficit, etc. To do this, the central bank uses a model of the economy that quantitatively describes the various causal linkages in the economy; it simulates the model repeatedly under various possible stances of monetary policy, in order to obtain a menu of possible predicted outcomes for the various variables of interest. Then in prin- ciple it can use an aggregate objective function to rate the alternative sets of predicted outcomes, although in practice central banks use a non-quantitative, judgement-based, process for ranking the alternatives and making the policy choice.

Finance

In finance, a common problem is to choose a portfolio when there are two conflicting objectives — the desire to have the expected value of portfolio returns be as high as possible, and the desire to have risk, often measured by the standard deviation of portfolio returns, be as low as possible. This problem is often represented by a graph in which the efficient frontier shows the best combi- nations of risk and expected return that are available, and in which indifference curves show the investor’s preferences for various risk-expected return combinations. The problem of optimizing a function of the expected value (first moment) and the standard deviation (square root of the sec- ond central moment) of portfolio return is called a two-moment decision model.

Optimal Control

In engineering and economics, many problems involve multiple objectives which are not describable as the-more-the-better or the-less-the-better; instead, there is an ideal target value for each objective, and the desire is to get as close as possible to the desired value of each objective. For example, energy systems typically have a trade-off between performance and cost, or one might want to adjust a rocket’s fuel usage and orientation so that it arrives both at a specified place and at a specified time; or one might want to conduct open market operations so that both the inflation rate and the unemployment rate are as close as possible to their desired values.

Often such problems are subject to linear equality constraints that prevent all objectives from being simultaneously perfectly met, especially when the number of controllable variables is less than the number of objectives and when the presence of random shocks generates uncertainty. Commonly a multi-objective quadratic objective function is used, with the cost associated with an objective rising quadratically with the distance of the objective from its ideal value. Since these problems typically involve adjusting the controlled variables at various points in time and/or evaluating the objectives at various points in time, intertemporal optimization techniques are employed.

Optimal Design

Product and process design can be largely improved using modern modeling, simulation and opti- mization techniques. The key question in optimal design is the measure of what is good or desirable about a design. Before looking for optimal designs it is important to identify characteristics which contribute the most to the overall value of the design. A good design typically involves multiple cri- teria/objectives such as capital cost/investment, operating cost, profit, quality and/or recovery of the product, efficiency, process safety, operation time etc. Therefore, in practical applications, the performance of process and product design is often measured with respect to multiple objectives. These objectives typically are conflicting, i.e. achieving the optimal value for one objective requires some compromise on one or more of other objectives.

For example, in the paper industry, when designing a paper mill one can seek to decrease the amount of capital invested in the mill and enhance the quality of the paper simultaneously. If the design of a paper mill is defined by large storage volumes and paper quality is defined by quality parameters, then the problem of optimal design of a paper mill can include objectives such as: i) minimization of expected variation of those quality parameters from their nominal values, ii) minimization of expected time of breaks and iii) minimization of investment cost of storage volumes. Here, the maximum volumes of the storage towers are the design variables. This example of optimal design of a paper mill is a simplification of the model used in the original reference. Multi-objective design optimization has also been implemented in engineering systems, e.g. design of nano-CMOS semiconductors, design of solar-powered irrigation systems, optimization of sand mould systems, engine design, optimal sensor deployment and optimal controller design.

Process Optimization

Multi-objective optimization has also been increasingly employed in chemical engineering. In Fiandaca and Fraga (2009), the multi-objective genetic algorithm (MOGA) was used to optimize the design of the pressure swing adsorption process (a cyclic separation process). The design problem considered in that work involves the bi-objective maximization of nitrogen recovery and nitrogen purity. The results obtained in that work provided a good approximation of the Pareto frontier with acceptable trade-offs between the objectives. A multi-objective chemical process problem for the thermal processing of food was solved by Sendín et al. (2010). In that work, two case studies (bi-objective and triple-objective problems) with nonlinear dynamic models were tackled. A hybrid approach consisting of the weighted Tchebycheff and the Normal Boundary Intersection approaches was utilized. This novel hybrid approach was successful in constructing a Pareto optimal set for the thermal processing of foods. The multi-objective optimization of the combined carbon dioxide reforming and partial oxidation of methane was carried out in the work of Ganesan et al. (2013). In that work, the objective functions were methane conversion, carbon monoxide selectivity and the hydrogen to carbon monoxide ratio. The problem was tackled using the Normal Boundary Intersection (NBI) method in conjunction with two swarm-based techniques (the Gravitational Search Algorithm (GSA) and Particle Swarm Optimization (PSO)). Similar multi-objective problems were encountered and solved for applications involving chemical extraction and bioethanol production processes. In Abakarov et al. (2013), an alternative technique to solve multi-objective optimization problems arising in food engineering was proposed. The Aggregating Functions Approach, the Adaptive Random Search Algorithm, and the Penalty Functions Approach were used to compute the initial set of the non-dominated or Pareto-optimal solutions. The well-known AHP method and the Tabular Method were used simultaneously for choosing the best alternative among the computed subset of non-dominated solutions for osmotic dehydration processes.

Radio Resource Management

The purpose of radio resource management is to satisfy the data rates that are requested by the users of a cellular network. The main resources are time intervals, frequency blocks, and transmit powers. Each user has its own objective function that, for example, can represent some combination of the data rate, latency, and energy efficiency. These objectives are conflicting since the frequency resources are very scarce, and thus there is a need for tight spatial frequency reuse, which causes immense inter-user interference if not properly controlled. Multi-user MIMO techniques are nowadays used to reduce the interference by adaptive precoding. The network operator would like to both bring great coverage and high data rates, thus the operator would like to find a Pareto optimal solution that balances the total network data throughput and the user fairness in an appropriate subjective manner.

Radio resource management is often solved by scalarization; that is, selection of a network utility func- tion that tries to balance throughput and user fairness. The choice of utility function has a large impact on the computational complexity of the resulting single-objective optimization problem. For example, the common utility of weighted sum rate gives an NP-hard problem with a complexity that scales expo- nentially with the number of users, while the weighted max-min fairness utility results in a quasi-con- vex optimization problem with only a polynomial scaling with the number of users.

Electric Power Systems

Reconfiguration, by exchanging the functional links between the elements of the system, represents one of the most important measures which can improve the operational performance of a distribution system. The problem of optimization through the reconfiguration of a power distribution system, in terms of its definition, is a historical single-objective problem with constraints. Since 1975, when Merlin and Back introduced the idea of distribution system reconfiguration for active power loss reduction, until nowadays, a lot of researchers have proposed diverse methods and algorithms to solve the reconfiguration problem as a single-objective problem. Some authors have proposed Pareto optimality based approaches (including active power losses and reliability indices as objectives). For this purpose, different artificial intelligence based methods have been used: microgenetic, branch exchange, particle swarm optimization and non-dominated sorting genetic algorithm.

Solving A Multi-Objective Optimization Problem

As there usually exist multiple Pareto optimal solutions for multi-objective optimization problems, what it means to solve such a problem is not as straightforward as it is for a conventional single-ob- jective optimization problem. Therefore, different researchers have defined the term “solving a multi-objective optimization problem” in various ways.


Many methods convert the original problem with multiple objectives into a single-objective optimization problem. This is called a scalarized problem. If scalarization is done neatly, Pareto optimality of the solutions obtained can be guaranteed.

Solving a multi-objective optimization problem is sometimes understood as approximating or computing all or a representative set of Pareto optimal solutions.

When decision making is emphasized, the objective of solving a multi-objective optimization problem is understood as supporting a decision maker in finding the most preferred Pareto optimal solution according to his/her subjective preferences. The underlying assumption is that one solution to the problem must be identified to be implemented in practice. Here, a human decision maker (DM) plays an important role. The DM is expected to be an expert in the problem domain.

The most preferred results can be found using different philosophies. Multi-objective optimization methods can be divided into four classes. In so-called no preference methods, no DM is expected to be available, but a neutral compromise solution is identified without preference information. The other classes are so-called a priori, a posteriori and interactive methods and they all involve preference information from the DM in different ways.

In a priori methods, preference information is first asked from the DM and then a solution best satisfying these preferences is found. In a posteriori methods, a representative set of Pareto opti- mal solutions is first found and then the DM must choose one of them. In interactive methods, the decision maker is allowed to iteratively search for the most preferred solution. In each iteration of the interactive method, the DM is shown Pareto optimal solution(s) and describes how the solu- tion(s) could be improved. The information given by the decision maker is then taken into account while generating new Pareto optimal solution(s) for the DM to study in the next iteration. In this way, the DM learns about the feasibility of his/her wishes and can concentrate on solutions that are interesting to him/her. The DM may stop the search whenever he/she wants to. More infor- mation and examples of different methods in the four classes are given in the following sections.

Scalarizing Multi-Objective Optimization Problems

Scalarizing a multi-objective optimization problem is an a priori method, which means formu- lating a single-objective optimization problem such that optimal solutions to the single-objective optimization problem are Pareto optimal solutions to the multi-objective optimization problem. In addition, it is often required that every Pareto optimal solution can be reached with some param- eters of the scalarization. With different parameters for the scalarization, different Pareto optimal solutions are produced. A general formulation for a scalarization of a multiobjective optimization is thus

min_x g(f_1(x), …, f_k(x), θ)

s.t. x ∈ X_θ,

where θ is a vector parameter, the set X_θ ⊆ X is a set depending on the parameter θ, and g: R^{k+1} → R is a function. Very well-known examples are the so-called


• linear scalarization

min_{x ∈ X} Σ_{i=1}^{k} w_i f_i(x),

where the weights of the objectives w_i > 0 are the parameters of the scalarization (a small numerical sketch of this weighting appears after this list), and the

• ε-constraint method

min_{x ∈ X} f_j(x)

s.t. f_i(x) ≤ ε_i for i ∈ {1, …, k} \ {j},

where the upper bounds ε_i are parameters as above and f_j is the objective to be minimized. Somewhat more advanced examples are the achievement scalarizing problems of Wierzbicki. One example of an achievement scalarizing problem can be formulated as

min max_{i=1,…,k} [ (f_i(x) − z_i) / (z_i^nad − z_i^utopian) ] + ρ Σ_{i=1}^{k} f_i(x) / (z_i^nad − z_i^utopian)

subject to x ∈ S,

where the term ρ Σ_{i=1}^{k} f_i(x) / (z_i^nad − z_i^utopian) is called the augmentation term, ρ > 0 is a small constant, and z^nad and z^utopian are the nadir and utopian vectors, respectively. In the above problem, the parameter is the so-called reference point z which represents objective function values preferred by the decision maker.
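As a small numerical sketch of linear scalarization, the snippet below applies different weight vectors to a finite set of outcome vectors (the same hypothetical set as in the Pareto-front sketch earlier) and shows that different weights select different Pareto optimal outcomes; note that linear scalarization can miss points in non-convex parts of a Pareto front, so this is purely illustrative:

outcomes = [(1.0, 9.0), (2.0, 7.0), (3.0, 8.0), (4.0, 3.0), (6.0, 2.0), (7.0, 6.0)]

def linear_scalarization(points, weights):
    # pick the outcome with the smallest weighted sum of objectives
    return min(points, key=lambda p: sum(w * fi for w, fi in zip(weights, p)))

for w in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
    print(w, "->", linear_scalarization(outcomes, w))
# (0.9, 0.1) -> (1.0, 9.0)
# (0.5, 0.5) -> (4.0, 3.0)
# (0.1, 0.9) -> (6.0, 2.0)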

For example, portfolio optimization is often conducted in terms of mean-variance analysis. In this context, the efficient set is a subset of the portfolios parametrized by the portfolio mean return μ_P in the problem of choosing portfolio shares so as to minimize the portfolio’s variance of return σ_P subject to a given value of μ_P. Alternatively, the efficient set can be specified by choosing the portfolio shares so as to maximize the function μ_P − bσ_P; the set of efficient portfolios consists of the solutions as b ranges from zero to infinity.

No-preference Methods

Multi-objective optimization methods that do not require any preference information to be explicitly articulated by a decision maker can be classified as no-preference methods. A well-known example is the method of global criterion, in which a scalarized problem of the form

\min_{x \in X} \left\| f(x) - z^{\mathrm{ideal}} \right\|

is solved. In the above problem, \|\cdot\| can be any L_p norm, with common choices including L_1, L_2 and L_\infty. The method of global criterion is sensitive to the scaling of the objective functions, and


thus, it is recommended that the objectives are normalized into a uniform, dimensionless scale.
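A minimal sketch of the method of global criterion follows; the two objectives, the ideal point and the choice of the L2 norm are illustrative assumptions, with scipy.optimize.minimize again used as the single-objective solver.

import numpy as np
from scipy.optimize import minimize

# Hypothetical bi-objective problem f(x) = (f1(x), f2(x)) on X = [0, 2].
f = lambda x: np.array([x[0] ** 2, (x[0] - 2.0) ** 2])

# Ideal objective vector: the individual minima of f1 and f2 over X.
z_ideal = np.array([0.0, 0.0])

# Method of global criterion with the L2 norm (objectives assumed comparably scaled).
res = minimize(lambda x: np.linalg.norm(f(x) - z_ideal, ord=2),
               x0=[1.0], bounds=[(0.0, 2.0)])
print("compromise solution:", res.x)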

A Priori Methods

A priori methods require that sufficient preference information is expressed before the solution process. Well-known examples of a priori methods include the utility function method, the lexicographic method, and goal programming.

In the utility function method, it is assumed that the decision maker's utility function is available. A mapping u : Y → ℝ is a utility function if for all y^1, y^2 ∈ Y it holds that u(y^1) > u(y^2) if the decision maker prefers y^1 to y^2, and u(y^1) = u(y^2) if the decision maker is indifferent between y^1 and y^2. The utility function specifies an ordering of the decision vectors (recall that vectors can be ordered in many different ways). Once u is obtained, it suffices to solve

\max \; u(f(x)) \quad \text{subject to} \quad x \in X,

but in practice it is very difficult to construct a utility function that would accurately represent the decision maker's preferences, particularly since the Pareto front is unknown before the optimization begins.

The lexicographic method assumes that the objectives can be ranked in the order of importance. We can assume, without loss of generality, that the objective functions are in the order of importance, so that f_1 is the most important and f_k the least important to the decision maker. The lexicographic method consists of solving a sequence of single-objective optimization problems of the form

\min_{x \in X} f_l(x) \quad \text{s.t.} \quad f_j(x) \le y_j^*, \; j = 1, \ldots, l-1,

where y_j^* is the optimal value of the above problem with l = j. Thus, y_1^* := \min\{ f_1(x) \mid x \in X \}, and each new problem in the sequence adds one new constraint as l goes from 1 to k.
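The lexicographic loop can be sketched as follows; the three toy objectives, the ordering, and the small tolerance delta used when fixing each optimal value are assumptions made for this example.

import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

# Hypothetical objectives, ordered from most to least important, on X = [0, 1]^2.
objectives = [
    lambda x: (x[0] - 1.0) ** 2,
    lambda x: (x[1] - 1.0) ** 2,
    lambda x: x[0] ** 2 + x[1] ** 2,
]
bounds = [(0.0, 1.0), (0.0, 1.0)]
x0, delta = np.array([0.5, 0.5]), 1e-6

constraints, y_star = [], []
for f_l in objectives:
    res = minimize(f_l, x0, bounds=bounds, constraints=constraints)
    y_star.append(res.fun)
    x0 = res.x  # warm-start the next subproblem
    # Fix the already-optimized objective near its optimal value y_l^*.
    constraints = constraints + [NonlinearConstraint(f_l, -np.inf, res.fun + delta)]

print("lexicographic solution:", x0, "optimal values:", y_star)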

A Posteriori Methods

A posteriori methods aim at producing all the Pareto optimal solutions or a representative subset of the Pareto optimal solutions. Most a posteriori methods fall into either one of the following two classes: mathematical programming-based a posteriori methods, where an algorithm is repeated and each run of the algorithm produces one Pareto optimal solution, and evolutionary algorithms, where one run of the algorithm produces a set of Pareto optimal solutions.

Well-known examples of mathematical programming-based a posteriori methods are the Normal Boundary Intersection (NBI), Modified Normal Boundary Intersection (NBIm), Normal Constraint (NC), Successive Pareto Optimization (SPO) and Directed Search Domain (DSD) methods, which solve the multi-objective optimization problem by constructing several scalarizations. The solution to each scalarization yields a Pareto optimal solution, whether locally or globally. The scalarizations of the NBI, NBIm, NC and DSD methods are constructed with the target of obtaining evenly distributed Pareto points that give a good evenly distributed approximation of the real set of Pareto points.

Evolutionary algorithms are popular approaches to generating Pareto optimal solutions to a multi-objective optimization problem. Currently, most evolutionary multi-objective optimization (EMO) algorithms apply Pareto-based ranking schemes. Evolutionary algorithms such as the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) and Strength Pareto Evolutionary Algorithm 2 (SPEA-2) have become standard approaches, although some schemes based on particle swarm optimization and simulated annealing are also significant. The main advantage of evolutionary algorithms, when applied to solve multi-objective optimization problems, is that they typically generate sets of solutions, allowing computation of an approximation of the entire Pareto front. Their main disadvantages are their lower speed and the fact that Pareto optimality of the generated solutions cannot be guaranteed: it is only known that none of the generated solutions dominates the others.

In 2015, a new paradigm for multi-objective optimization based on novelty was presented using evolutionary algorithms. This new paradigm searches for novel solutions in objective space (i.e., novelty search on objective space) in addition to the search for non-dominated solutions. Novelty search is like stepping stones guiding the search to previously unexplored places. It is especially useful in overcoming bias and plateaus as well as guiding the search in many-objective optimization problems. The Subpopulation Algorithm based on Novelty is the first algorithm based on this paradigm.

Commonly known a posteriori methods are listed below:

• Normal Boundary Intersection (NBI)

• Modified Normal Boundary Intersection (NBIm)

• Normal Constraint (NC)

• Successive Pareto Optimization (SPO)

• Directed Search Domain (DSD)

• NSGA-II

• PGEN (Pareto surface generation for convex multi-objective instances)

• IOSO (Indirect Optimization on the basis of Self-Organization)

• SMS-EMOA (S-metric selection evolutionary multi-objective algorithm)

• Reactive Search Optimization (using machine learning for adapting strategies and objectives), implemented in LIONsolver

• Benson’s algorithm for linear vector optimization problems

• Multi-objective particle swarm optimization

• Subpopulation Algorithm based on Novelty

Interactive Methods

In interactive methods, the solution process is iterative and the decision maker continuously interacts with the method when searching for the most preferred solution. In other words, the decision maker is expected to express preferences at each iteration in order to get Pareto optimal solutions


that are of interest to him/her and learn what kind of solutions are attainable. The following steps are commonly present in interactive methods:

1. initialize (e.g. calculate ideal and approximated nadir objective vectors and show them to the decision maker)

2. generate a Pareto optimal starting point (by using e.g. some no-preference method or a solution given by the decision maker)

3. ask for preference information from the decision maker (e.g. aspiration levels or number of new solutions to be generated)

4. generate new Pareto optimal solution(s) according to the preferences and show it/them and possibly some other information about the problem to the decision maker

5. if several solutions were generated, ask the decision maker to select the best solution so far

6. stop, if the decision maker wants to; otherwise, go to step 3.

Above, aspiration levels refer to desirable objective function values forming a reference point. Instead of mathematical convergence, which is often used as a stopping criterion in mathematical optimization methods, psychological convergence is emphasized in interactive methods. Generally speaking, a method is terminated when the decision maker is confident that (s)he has found the most preferred solution available.

Different interactive methods involve different types of preference information. For example, three types can be identified: methods based on 1) trade-off information, 2) reference points and 3) classification of objective functions. A fourth type, based on generating a small sample of solutions, has also been identified in the literature. An example of an interactive method utilizing trade-off information is the Zionts-Wallenius method, where the decision maker is shown several objective trade-offs at each iteration, and (s)he is expected to say whether (s)he likes, dislikes or is indifferent with respect to each trade-off. In reference point based methods, the decision maker is expected at each iteration to specify a reference point consisting of desired values for each objective, and a corresponding Pareto optimal solution(s) is then computed and shown to him/her for analysis. In classification based interactive methods, the decision maker is assumed to give preferences in the form of classifying the objectives at the current Pareto optimal solution into different classes indicating how the values of the objectives should be changed to get a more preferred solution. Then, the classification information given is taken into account when new (more preferred) Pareto optimal solution(s) are computed. In the satisficing trade-off method (STOM) three classes are used: objectives whose values 1) should be improved, 2) can be relaxed, and 3) are acceptable as such. In the NIMBUS method, two additional classes are also used: objectives whose values 4) should be improved until a given bound and 5) can be relaxed until a given bound.

Hybrid Methods

Different hybrid methods exist, but here we consider hybridizing MCDM (multi-criteria decision making) and EMO (evolutionary multi-objective optimization). A hybrid algorithm in the context of multi-objective optimization is a combination of algorithms/approaches from these two fields.


Hybrid algorithms of EMO and MCDM are mainly used to overcome the shortcomings of one approach by utilizing the strengths of the other. Several types of hybrid algorithms have been proposed in the literature, e.g. incorporating MCDM approaches into EMO algorithms as a local search operator, or using them to lead a DM to the most preferred solution(s). A local search operator is mainly used to enhance the rate of convergence of EMO algorithms.

The roots of hybrid multi-objective optimization can be traced to the first Dagstuhl seminar organized in November 2004. Here some of the best minds in EMO (Professor Kalyanmoy Deb, Professor Jürgen Branke, etc.) and MCDM (Professor Kaisa Miettinen, Professor Ralph E. Steuer, etc.) realized the potential in combining ideas and approaches of the MCDM and EMO fields to develop hybrids of them. Subsequently many more Dagstuhl seminars have been arranged to foster collaboration. Recently, hybrid multi-objective optimization has become an important theme in several international conferences in the area of EMO and MCDM.

Visualization of The Pareto Front

Visualization of the Pareto front is one of the a posteriori preference techniques of multi-objective optimization. The a posteriori preference techniques provide an important class of multi-objective optimization techniques. Usually the a posteriori preference techniques include four steps: (1) the computer approximates the Pareto front, i.e. the Pareto optimal set in the objective space; (2) the decision maker studies the Pareto front approximation; (3) the decision maker identifies the preferred point on the Pareto front; (4) the computer provides the Pareto optimal decision, whose output coincides with the objective point identified by the decision maker. From the point of view of the decision maker, the second step of the a posteriori preference techniques is the most complicated one. There are two main approaches to informing the decision maker. First, a number of points of the Pareto front can be provided in the form of a list (an interesting discussion and references are given in the literature) or using heatmaps. An alternative idea consists in visualizing the Pareto front.

Visualization in Bi-objective Problems: Tradeoff Curve

In the case of bi-objective problems, informing the decision maker concerning the Pareto front is usually carried out by its visualization: the Pareto front, often named the tradeoff curve in this case, can be drawn at the objective plane. The tradeoff curve gives full information on objective values and on objective tradeoffs, which inform how improving one objective is related to deteriorating the second one while moving along the tradeoff curve. The decision maker takes this information into account while specifying the preferred Pareto optimal objective point. The idea to approximate and visualize the Pareto front was introduced for linear bi-objective decision problems by S. Gass and T. Saaty. This idea was developed and applied in environmental problems by J. L. Cohon. A review of methods for approximating the Pareto front for various decision problems with a small number of objectives (mainly, two) is provided in the literature.

Visualization in High-order Multi-objective Optimization Problems

There are two generic ideas on how to visualize the Pareto front in high-order multi-objective decision problems (problems with more than two objectives). One of them, which is applicable in the case of a relatively small number of objective points that represent the Pareto front, is based on using the visualization techniques developed in statistics (various diagrams, etc). The second idea

proposes the display of bi-objective cross-sections (slices) of the Pareto front. It was introduced by W.S. Meisel in 1973, who argued that such slices inform the decision maker on objective tradeoffs. The figures that display a series of bi-objective slices of the Pareto front for three-objective problems are known as decision maps. They give a clear picture of tradeoffs between three criteria. Disadvantages of such an approach are related to the following two facts. First, the computational procedures for constructing the bi-objective slices of the Pareto front are not stable since the Pareto front is usually not stable. Second, it is applicable only in the case of three objectives. In the 1980s, the idea of W.S. Meisel was implemented in a different form, as the Interactive Decision Maps (IDM) technique.

Multi-objective Optimization Software

The software packages are listed below by name, with their license and brief info:

• BENSOLVE (GPL): free VLP (vector linear programs) solver; an implementation of Benson's algorithm for solving vector linear programs, in particular multiobjective linear programs.

• Distributed Evolutionary Algorithms in Python (LGPL): a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data structures transparent, and it works in perfect harmony with different parallelisation mechanisms.

• MOEA Framework (LGPL): Java-based framework for multi-objective optimization with real, discrete, grammatical, or program representations.

• Decisionarium (free for academic use): global space for decision support (for academic use).

• GUIMOO (LGPL): graphical user interface for multi-objective optimization from INRIA; this is only for analyzing the results in multi-objective optimization.

• IDSS Software (free for non-profit activities): MCDM software of the Laboratory of Intelligent Decision Support Systems (University of Poznan, Poland).

• IND-NIMBUS (proprietary): implementation of the interactive NIMBUS method that can be connected with different simulation and modelling tools.

• interalg (BSD): solver with specifiable accuracy from OpenOpt, a free universal cross-platform numerical optimization framework written in Python using NumPy arrays.

• jMetal (LGPL): Java-based framework for multi-objective optimization with metaheuristics.

• MakeItRational (proprietary): AHP based decision software.

• Midacomo (proprietary/free): Multi-Objective extension for MIDACO (in Matlab, Python, C/C++ and Fortran); solves (constrained) problems with continuous, discrete and mixed integer variables.

• MOIP_AIRA (free for academic use): an improved recursive algorithm for multi-objective integer programming which uses an extended LP file format with an arbitrary number of objectives and returns the set of nondominated solutions.

• Collection of Multiple Criteria Decision Support Software (different licenses): by Dr. Roland Weistroffer.

• zweifel library (Apache): implemented multi-objective algorithms: GDE3 and Subpopulation Algorithm based on Novelty (SAN).


• MGO (Multiple Goal Optimization) (GNU Affero GPLv3 software licence): a Scala library based on the cake pattern for multi-objective evolutionary algorithms; it also works with OpenMOLE using Netlogo and other agent based model software.

• WWW-NIMBUS (free for academic use): for solving nonlinear (and even nondifferentiable) multiobjective optimization problems in an interactive way; operates via the Internet.

• ParadisEO-MOEO (CeCill): module specifically devoted to multiobjective optimization in ParadisEO, a software framework for the design and implementation of metaheuristics, hybrid methods as well as parallel and distributed models, from INRIA.

• PaGMO / PyGMO (free): Parallel Global Multiobjective Optimizer (and its Python alter ego PyGMO) offers user-friendly access to a wide array of global and local optimization algorithms and problems. The main purpose of the software is to provide a parallelization engine common to all algorithms through the 'generalized island model'. Initially developed within the European Space Agency, the code was intended to help the automated design of interplanetary trajectories and spacecraft transfers in general. The user can implement his own problem and algorithm both in C++ and in Python.

Weistroffer et al. have written a book chapter on multi-objective optimization software.

Eigendecomposition of A Matrix

In the mathematical discipline of linear algebra, eigendecomposition or sometimes spectral decomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in this way.

Fundamental Theory of Matrix Eigenvectors and Eigenvalues

A (non-zero) vector v of dimension N is an eigenvector of a square (N×N) matrix A if it satisfies the linear equation A v = λ v, where λ is a scalar termed the eigenvalue corresponding to v. That is, the eigenvectors are the vectors that the linear transformation A merely elongates or shrinks, and the amount that they elongate/shrink by is the eigenvalue. The above equation is called the eigenvalue equation or the eigenvalue problem.

Rewriting the eigenvalue equation as (A − λI)v = 0 and requiring a non-zero solution v yields an equation for the eigenvalues,

p(\lambda) := \det(A - \lambda I) = 0.

We call p(λ) the characteristic polynomial, and the equation, called the characteristic equation, is an Nth order polynomial equation in the unknown λ. This equation will have Nλ distinct solutions,


where 1 ≤ Nλ ≤ N . The set of solutions, that is, the eigenvalues, is called the spectrum of A. We can factor p as

p(\lambda) = (\lambda - \lambda_1)^{n_1} (\lambda - \lambda_2)^{n_2} \cdots (\lambda - \lambda_k)^{n_k} = 0.

The integer ni is termed the algebraic multiplicity of eigenvalue λi. The algebraic multiplicities sum to N:

\sum_{i=1}^{N_\lambda} n_i = N.

For each eigenvalue, λi, we have a specific eigenvalue equation

(A - \lambda_i I) v = 0.

There will be 1 ≤ m_i ≤ n_i linearly independent solutions to each eigenvalue equation. The m_i solutions are the eigenvectors associated with the eigenvalue λ_i. The integer m_i is termed the geometric multiplicity of λ_i. It is important to keep in mind that the algebraic multiplicity n_i and geometric multiplicity m_i may or may not be equal, but we always have m_i ≤ n_i. The simplest case is of course when m_i = n_i = 1. The total number of linearly independent eigenvectors, N_v, can be calculated by summing the geometric multiplicities

\sum_{i=1}^{N_\lambda} m_i = N_v.

The eigenvectors can be indexed by eigenvalues, i.e. using a double index, with v_{i,j} being the j-th eigenvector for the i-th eigenvalue. The eigenvectors can also be indexed using the simpler notation of a single index v_k, with k = 1, 2, ..., N_v.
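As an illustrative check of these definitions, the following sketch (NumPy; the 3×3 matrix is an arbitrary example, not taken from the text) computes the eigenvalues and eigenvectors of a small matrix and verifies the eigenvalue equation for each pair.

import numpy as np

# Arbitrary example matrix: triangular, so the eigenvalues sit on the diagonal.
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are the eigenvectors

for i, lam in enumerate(eigvals):
    v = eigvecs[:, i]
    # Each pair must satisfy the eigenvalue equation A v = lambda v.
    assert np.allclose(A @ v, lam * v)

print("eigenvalues:", eigvals)  # 2 appears once; 3 has algebraic multiplicity 2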

Eigendecomposition of A Matrix

Let A be a square (N×N) matrix with N linearly independent eigenvectors, q_i (i = 1, …, N). Then A can be factorized as

A = Q \Lambda Q^{-1},

where Q is the square (N×N) matrix whose i-th column is the eigenvector q_i of A, and Λ is the diagonal matrix whose diagonal elements are the corresponding eigenvalues, i.e., Λ_{ii} = λ_i. Note that only diagonalizable matrices can be factorized in this way. For example, the defective matrix

\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}

cannot be diagonalized. The eigenvectors q_i (i = 1, …, N) are usually normalized, but they need not be. A non-normalized set of eigenvectors, v_i (i = 1, …, N), can also be used as the columns of Q. That can be understood by noting that the magnitude of the eigenvectors in Q gets canceled in the decomposition by the presence of Q^{-1}.
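A quick numerical check of the factorization, as a sketch with NumPy and an arbitrary example matrix: rescaling the eigenvector columns (i.e., using non-normalized eigenvectors) does not change the reconstructed A, since the scaling cancels against Q^{-1}.

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])            # arbitrary diagonalizable example

lam, Q = np.linalg.eig(A)
Lam = np.diag(lam)

# A = Q Lambda Q^{-1}
assert np.allclose(Q @ Lam @ np.linalg.inv(Q), A)

# Rescaled (non-normalized) eigenvector columns still reconstruct A.
Q_scaled = Q * np.array([3.0, -0.5])  # scale each column differently
assert np.allclose(Q_scaled @ Lam @ np.linalg.inv(Q_scaled), A)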


Example

Taking a 2 × 2 real matrix

A = \begin{bmatrix} 1 & 0 \\ 1 & 3 \end{bmatrix}

as an example to be decomposed into a diagonal matrix through multiplication of a non-singular matrix

B = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in \mathbb{R}^{2 \times 2}.

Then

\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} 1 & 0 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} x & 0 \\ 0 & y \end{bmatrix},

for some real diagonal matrix \begin{bmatrix} x & 0 \\ 0 & y \end{bmatrix}. Shifting B to the right hand side:

\begin{bmatrix} 1 & 0 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x & 0 \\ 0 & y \end{bmatrix}

The above equation can be decomposed into 2 simultaneous equations:

\begin{bmatrix} 1 & 0 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} a \\ c \end{bmatrix} = \begin{bmatrix} ax \\ cx \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} b \\ d \end{bmatrix} = \begin{bmatrix} by \\ dy \end{bmatrix}

Factoring out the eigenvalues x and y :

\begin{bmatrix} 1 & 0 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} a \\ c \end{bmatrix} = x \begin{bmatrix} a \\ c \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} b \\ d \end{bmatrix} = y \begin{bmatrix} b \\ d \end{bmatrix}

Letting \vec{a} = \begin{bmatrix} a \\ c \end{bmatrix}, \; \vec{b} = \begin{bmatrix} b \\ d \end{bmatrix}, this gives us two vector equations:

A \vec{a} = x \vec{a}, \qquad A \vec{b} = y \vec{b}

These can be represented by a single vector equation involving two solutions as eigenvalues:

A u = \lambda u,

where λ represents the two eigenvalues x and y, and u represents the vectors \vec{a} and \vec{b}.

Shifting λu to the left hand side and factorizing u out


(A - \lambda I) u = 0

Since B is non-singular, it is essential that u is non-zero, and therefore A − λI must be singular.

Considering the determinant of (A − λI),

\det \begin{bmatrix} 1 - \lambda & 0 \\ 1 & 3 - \lambda \end{bmatrix} = 0

Thus (1 - \lambda)(3 - \lambda) = 0

This gives the eigenvalues of the matrix A as λ = 1 or λ = 3, and the resulting diagonal matrix from the eigendecomposition of A is thus

\begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}.

Putting the solutions back into the above simultaneous equations:

\begin{bmatrix} 1 & 0 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} a \\ c \end{bmatrix} = 1 \begin{bmatrix} a \\ c \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} b \\ d \end{bmatrix} = 3 \begin{bmatrix} b \\ d \end{bmatrix}

Solving the equations, we have a = −2c, c ∈ ℝ and b = 0, d ∈ ℝ. Thus the matrix B required for the eigendecomposition of A is

B = \begin{bmatrix} -2c & 0 \\ c & d \end{bmatrix}, \quad c, d \in \mathbb{R} \ (\text{non-zero}),

i.e.:

\begin{bmatrix} -2c & 0 \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} 1 & 0 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} -2c & 0 \\ c & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}.
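The result can be verified numerically; the sketch below picks the (illustrative) values c = 1, d = 1 for the free parameters.

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 3.0]])

c, d = 1.0, 1.0                      # any non-zero choice works
B = np.array([[-2.0 * c, 0.0],
              [c, d]])

D = np.linalg.inv(B) @ A @ B         # should be diag(1, 3)
assert np.allclose(D, np.diag([1.0, 3.0]))
print(D)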

Matrix Inverse Via Eigendecomposition

If matrix A can be eigendecomposed and if none of its eigenvalues are zero, then A is nonsingular and its inverse is given by

A^{-1} = Q \Lambda^{-1} Q^{-1}

Furthermore, because Λ is a diagonal matrix, its inverse is easy to calculate:

\left[\Lambda^{-1}\right]_{ii} = \frac{1}{\lambda_i}
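A short sketch (NumPy, arbitrary example) comparing the eigendecomposition-based inverse with a direct inverse:

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])           # arbitrary nonsingular, diagonalizable example

lam, Q = np.linalg.eig(A)
A_inv = Q @ np.diag(1.0 / lam) @ np.linalg.inv(Q)   # A^{-1} = Q Lambda^{-1} Q^{-1}

assert np.allclose(A_inv, np.linalg.inv(A))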


Practical Implications

When eigendecomposition is used on a matrix of measured, real data, the inverse may be less valid when all eigenvalues are used unmodified in the form above. This is because as eigenvalues become relatively small, their reciprocals in the inverse become large, so their contribution to the inversion dominates. Eigenvalues near zero or at the "noise" of the measurement system will have undue influence and could hamper solutions (detection) using the inverse.

Two mitigations have been proposed: 1) truncating small/zero eigenvalues, 2) extending the low- est reliable eigenvalue to those below it.

The first mitigation method is similar to a sparse sample of the original matrix, removing components that are not considered valuable. However, if the solution or detection process is near the noise level, truncating may remove components that influence the desired solution.

The second mitigation extends the eigenvalue so that lower values have much less influence over inversion, but do still contribute, such that solutions near the noise will still be found.

The reliable eigenvalue can be found by assuming that eigenvalues of extremely similar and low value are a good representation of measurement noise (which is assumed low for most systems).

If the eigenvalues are rank-sorted by value, then the reliable eigenvalue can be found by minimization of the Laplacian of the sorted eigenvalues:

\min \left| \nabla^2 \lambda_s \right|,

where the eigenvalues are subscripted with an ‘s’ to denote being sorted. The position of the mini- mization is the lowest reliable eigenvalue. In measurement systems, the square root of this reliable eigenvalue is the average noise over the components of the system.
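A rough sketch of the first mitigation (truncating small eigenvalues when forming the inverse of a symmetric matrix of measured data); the relative threshold and the example matrix are assumptions made for illustration.

import numpy as np

def truncated_inverse(A_sym, rel_tol=1e-6):
    # Inverse of a symmetric matrix with eigenvalues below a relative
    # threshold discarded (first mitigation; rel_tol is illustrative).
    lam, Q = np.linalg.eigh(A_sym)               # eigh: symmetric/Hermitian input
    keep = np.abs(lam) > rel_tol * np.abs(lam).max()
    inv_lam = np.zeros_like(lam)
    inv_lam[keep] = 1.0 / lam[keep]
    return Q @ np.diag(inv_lam) @ Q.T

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 1e-12]])                # last mode sits at the noise level
print(truncated_inverse(A))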

Functional Calculus

The eigendecomposition allows for much easier computation of power series of matrices. If f(x) is given by

f(x) = a_0 + a_1 x + a_2 x^2 + \cdots

then we know that

f(A) = Q f(\Lambda) Q^{-1}.

Because Λ is a diagonal matrix, functions of Λ are very easy to calculate:

\left[ f(\Lambda) \right]_{ii} = f(\lambda_i).

The off-diagonal elements of f(Λ) are zero; that is, f(Λ) is also a diagonal matrix. Therefore, calculating f(A) reduces to just calculating the function on each of the eigenvalues. A similar technique works more generally with the holomorphic functional calculus, using

A^{-1} = Q \Lambda^{-1} Q^{-1}


from above. Once again, we find that

\left[ f(\Lambda) \right]_{ii} = f(\lambda_i).

Examples

A^2 = (Q \Lambda Q^{-1})(Q \Lambda Q^{-1}) = Q \Lambda (Q^{-1} Q) \Lambda Q^{-1} = Q \Lambda^2 Q^{-1}

A^n = Q \Lambda^n Q^{-1}
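A sketch of the functional calculus in NumPy/SciPy (the example matrix is arbitrary): matrix powers and the matrix exponential are obtained by applying the function to the eigenvalues only, and the results are compared against direct computations.

import numpy as np
from scipy.linalg import expm

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])                       # arbitrary diagonalizable example
lam, Q = np.linalg.eig(A)
Q_inv = np.linalg.inv(Q)

# A^3 via the eigendecomposition: apply the power to the eigenvalues only.
A_cubed = Q @ np.diag(lam ** 3) @ Q_inv
assert np.allclose(A_cubed, np.linalg.matrix_power(A, 3))

# exp(A) via the eigendecomposition, compared against scipy.linalg.expm.
exp_A = Q @ np.diag(np.exp(lam)) @ Q_inv
assert np.allclose(exp_A, expm(A))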

Decomposition for Special Matrices

Normal Matrices

A complex normal matrix (A^* A = A A^*) has an orthonormal eigenvector basis, so a normal matrix can be decomposed as

A = U \Lambda U^*,

where U is a unitary matrix. Further, if A is Hermitian (A = A^*), which implies that it is also complex normal, the diagonal matrix Λ has only real values, and if A is unitary, Λ takes all its values on the complex unit circle.

Real Symmetric Matrices

As a special case, for every N×N real symmetric matrix, the eigenvalues are real and the eigenvectors can be chosen such that they are orthogonal to each other. Thus a real symmetric matrix A can be decomposed as

A = Q \Lambda Q^T,

where Q is an orthogonal matrix, and Λ is a diagonal matrix whose entries are the eigenvalues of A.

Useful Facts

Useful Facts Regarding Eigenvalues

• The product of the eigenvalues is equal to the determinant of A

\det(A) = \prod_{i=1}^{N_\lambda} \lambda_i^{n_i}

Note that each eigenvalue is raised to the power n_i, the algebraic multiplicity.

• The sum of the eigenvalues is equal to the trace of A

\operatorname{tr}(A) = \sum_{i=1}^{N_\lambda} n_i \lambda_i


Note that each eigenvalue is multiplied by n_i, the algebraic multiplicity.

• If the eigenvalues of A are λ_i, and A is invertible, then the eigenvalues of A^{-1} are simply λ_i^{-1}.

• If the eigenvalues of A are λ_i, then the eigenvalues of f(A) are simply f(λ_i), for any holomorphic function f.

Useful Facts Regarding Eigenvectors

• If A is Hermitian and full-rank, the basis of eigenvectors may be chosen to be mutually orthogonal. The eigenvalues are real.

• The eigenvectors of A−1 are the same as the eigenvectors of A.

• Eigenvectors are defined up to a phase, i.e. if A v = λ v then e^{iθ} v is also an eigenvector, and specifically so is −v.

• In the case of degenerate eigenvalues (an eigenvalue appearing more than once), the eigenvectors have an additional freedom of rotation, i.e. any linear (orthonormal) combination of eigenvectors sharing an eigenvalue (i.e. lying in the degenerate subspace) is itself an eigenvector (lying in that subspace).

Useful Facts Regarding Eigendecomposition

• A can be eigendecomposed if and only if

N_v = N.

• If p(λ) has no repeated roots, i.e. N_λ = N, then A can be eigendecomposed.

• The statement "A can be eigendecomposed" does not imply that A has an inverse.

• The statement “A has an inverse” does not imply that A can be eigendecomposed.

Useful Facts Regarding Matrix Inverse

• A can be inverted if and only if

\lambda_i \ne 0 \quad \forall i

• If λ_i ≠ 0 for all i and N_v = N, the inverse is given by

A^{-1} = Q \Lambda^{-1} Q^{-1}

Numerical Computations

Numerical Computation of Eigenvalues

Suppose that we want to compute the eigenvalues of a given matrix. If the matrix is small, we can compute them symbolically using the characteristic polynomial. However, this is often impossible for larger matrices, in which case we must use a numerical method.


In practice, eigenvalues of large matrices are not computed using the characteristic polynomial. Computing the polynomial becomes expensive in itself, and exact (symbolic) roots of a high-degree polynomial can be difficult to compute and express: the Abel–Ruffini theorem implies that the roots of high-degree (5 or above) polynomials cannot in general be expressed simply using nth roots. Therefore, general algorithms to find eigenvectors and eigenvalues are iterative. Iterative numerical algorithms for approximating roots of polynomials exist, such as Newton's method, but in general it is impractical to compute the characteristic polynomial and then apply these methods. One reason is that small round-off errors in the coefficients of the characteristic polynomial can lead to large errors in the eigenvalues and eigenvectors: the roots are an extremely ill-conditioned function of the coefficients. A simple and accurate iterative method is the power method: a random vector v is chosen and a sequence of unit vectors is computed as

\frac{Av}{\|Av\|}, \; \frac{A^2 v}{\|A^2 v\|}, \; \frac{A^3 v}{\|A^3 v\|}, \; \ldots

This sequence will almost always converge to an eigenvector corresponding to the eigenvalue of greatest magnitude, provided that v has a nonzero component along this eigenvector in the eigenvector basis (and also provided that there is only one eigenvalue of greatest magnitude). This simple algorithm is useful in some practical applications; for example, Google uses it to calculate the page rank of documents in their search engine. Also, the power method is the starting point for many more sophisticated algorithms. For instance, by keeping not just the last vector in the sequence, but instead looking at the span of all the vectors in the sequence, one can get a better (faster converging) approximation of the eigenvector, and this idea is the basis of Arnoldi iteration. Alternatively, the important QR algorithm is also based on a subtle transformation of the power method.
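A minimal power-iteration sketch in NumPy (the random example matrix, the starting vector and the iteration count are arbitrary choices), with the eigenvalue estimated by the Rayleigh quotient mentioned below:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = A + A.T                          # symmetric, so the eigenvalues are real

v = rng.standard_normal(5)
for _ in range(200):                 # iteration count chosen arbitrarily
    v = A @ v
    v /= np.linalg.norm(v)           # keep the iterate a unit vector

rayleigh = v @ A @ v                 # eigenvalue estimate (Rayleigh quotient)
print(abs(rayleigh), np.max(np.abs(np.linalg.eigvalsh(A))))  # should agree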

Numerical Computation of Eigenvectors

Once the eigenvalues are computed, the eigenvectors could be calculated by solving the equation

(A - \lambda_i I) v_{i,j} = 0

using Gaussian elimination or any other method for solving matrix equations. However, in practical large-scale eigenvalue methods, the eigenvectors are usually computed in other ways, as a byproduct of the eigenvalue computation. In power iteration, for example, the eigenvector is actually computed before the eigenvalue (which is typically computed by the Rayleigh quotient of the eigenvector). In the QR algorithm for a Hermitian matrix (or any normal matrix), the orthonormal eigenvectors are obtained as a product of the Q matrices from the steps in the algorithm. (For more general matrices, the QR algorithm yields the Schur decomposition first, from which the eigenvectors can be obtained by a backsubstitution procedure.) For Hermitian matrices, the divide-and-conquer eigenvalue algorithm is more efficient than the QR algorithm if both eigenvectors and eigenvalues are desired.


Additional Topics

Generalized Eigenspaces

Recall that the geometric multiplicity of an eigenvalue can be described as the dimension of the associated eigenspace, the nullspace of λI − A. The algebraic multiplicity can also be thought of as a dimension: it is the dimension of the associated generalized eigenspace (1st sense), which is the nullspace of the matrix (λI − A)^k for any sufficiently large k. That is, it is the space of generalized eigenvectors (1st sense), where a generalized eigenvector is any vector which eventually becomes 0 if λI − A is applied to it enough times successively. Any eigenvector is a generalized eigenvector, and so each eigenspace is contained in the associated generalized eigenspace. This provides an easy proof that the geometric multiplicity is always less than or equal to the algebraic multiplicity.

Conjugate Eigenvector

A conjugate eigenvector or coneigenvector is a vector sent after transformation to a scalar multiple of its conjugate, where the scalar is called the conjugate eigenvalue or coneigenvalue of the linear transformation. The coneigenvectors and coneigenvalues represent essentially the same information and meaning as the regular eigenvectors and eigenvalues, but arise when an alternative coordinate system is used. The corresponding equation is

A v = \lambda v^*.

For example, in coherent electromagnetic scattering theory, the linear transformation A represents the action performed by the scattering object, and the eigenvectors represent polarization states of the electromagnetic wave. In optics, the coordinate system is defined from the wave’s viewpoint, known as the Forward Scattering Alignment (FSA), and gives rise to a regular eigenvalue equation, whereas in radar, the coordinate system is defined from the radar’s viewpoint, known as the Back Scattering Alignment (BSA), and gives rise to a coneigenvalue equation.

Generalized Eigenvalue Problem

A generalized eigenvalue problem (2nd sense) is the problem of finding a vector v that obeys

A v = \lambda B v,

where A and B are matrices. If v obeys this equation, with some λ, then we call v the generalized eigenvector of A and B (in the 2nd sense), and λ is called the generalized eigenvalue of A and B (in the 2nd sense) which corresponds to the generalized eigenvector v. The possible values of λ must obey the following equation:

\det(A - \lambda B) = 0.

In the case that we can find n linearly independent vectors \{v_1, \ldots, v_n\} so that for every i \in \{1, \ldots, n\}, A v_i = \lambda_i B v_i for some scalar \lambda_i, then we define the matrices P and D such that


| |  ()vv11  ()n 1    = ≡ P vv1 n      | |  ()vv1 n  ()nn

(D)_{ij} = \begin{cases} \lambda_i, & \text{if } i = j \\ 0, & \text{otherwise} \end{cases}

Then the following equality holds

A = B P D P^{-1}.

And the proof is

| ||  | | | | |      = = = λλ= = AP A v1 vn A v1 A vn11 B v nn BB v v1  B vn D BPD      | ||  | | | | |

And since P is invertible, we multiply the equation from the right by its inverse, finishing the proof.

The set of matrices of the form A − λB, where λ is a complex number, is called a pencil; the term matrix pencil can also refer to the pair (A, B) of matrices. If B is invertible, then the original problem can be written in the form

B^{-1} A v = \lambda v,

which is a standard eigenvalue problem. However, in most situations it is preferable not to perform the inversion, but rather to solve the generalized eigenvalue problem as stated originally. This is especially important if A and B are Hermitian matrices, since in this case B^{-1} A is not generally Hermitian and important properties of the solution are no longer apparent.

If A and B are Hermitian and B is a positive-definite matrix, the eigenvalues λ are real and eigenvectors v_1 and v_2 with distinct eigenvalues are B-orthogonal (v_1^* B v_2 = 0). Also, in this case it is guaranteed that there exists a basis of generalized eigenvectors (it is not a defective problem). This case is sometimes called a Hermitian definite pencil or definite pencil.
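A short sketch using SciPy (the example matrices are arbitrary; A is symmetric and B symmetric positive definite, so the pair forms a Hermitian definite pencil). scipy.linalg.eigh solves the generalized problem without forming B^{-1}A.

import numpy as np
from scipy.linalg import eigh

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # symmetric
B = np.array([[4.0, 1.0],
              [1.0, 2.0]])           # symmetric positive definite

# Generalized eigenvalue problem A v = lambda B v.
lam, V = eigh(A, B)

for i in range(len(lam)):
    assert np.allclose(A @ V[:, i], lam[i] * (B @ V[:, i]))

# Eigenvectors with distinct eigenvalues are B-orthogonal: V^T B V is diagonal (here = I).
print(V.T @ B @ V)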

Singular Value Decomposition

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix. It is the generalization of the eigendecomposition of a positive semidefinite normal matrix (for example, a symmetric matrix with positive eigenvalues) to any m × n matrix via an extension of the polar decomposition. It has many useful applications in signal processing and statistics. Formally, the singular value decomposition of an m × n real or complex matrix M is a factorization of the form UΣV*, where U is an m × m real or complex unitary matrix, Σ is an m × n rectangular


diagonal matrix with non-negative real numbers on the diagonal, and V is an n × n real or complex

unitary matrix. The diagonal entries σ_i of Σ are known as the singular values of M. The columns of U and the columns of V are called the left-singular vectors and right-singular vectors of M, respectively.

The singular value decomposition can be computed using the following observations:

• The left-singular vectors of M are a set of orthonormal eigenvectors of MM*.

• The right-singular vectors of M are a set of orthonormal eigenvectors of M*M.

• The non-zero singular values of M (found on the diagonal entries of Σ) are the square roots of the non-zero eigenvalues of both M*M and MM*.

Applications that employ the SVD include computing the pseudoinverse, least squares fitting of data, multivariable control, matrix approximation, and determining the rank, range and null space of a matrix.

Statement of the Theorem

Suppose M is an m × n matrix whose entries come from the field K, which is either the field of real numbers or the field of complex numbers. Then there exists a factorization, called a singular value decomposition of M, of the form

M = U \Sigma V^*,

where

• U is an m × m unitary matrix,

• Σ is a diagonal m × n matrix with non-negative real numbers on the diagonal, and

• V* is an n × n unitary matrix over K (if K = ℝ, unitary matrices are orthogonal matrices); V* is the conjugate transpose of the n × n unitary matrix V.

The diagonal entries σ_i of Σ are known as the singular values of M. A common convention is to list the singular values in descending order. In this case, the diagonal matrix Σ is uniquely determined by M (though not the matrices U and V).

Intuitive Interpretations

[Figure: the SVD of a 2 × 2 shear matrix M illustrated on the unit disc. Upper left: the unit disc with the two canonical unit vectors. Upper right: the unit disc transformed with M, with the singular values σ_1 and σ_2 indicated. Lower left: the action of V* on the unit disc; this is just a rotation. Lower right: the action of ΣV* on the unit disc; Σ scales vertically and horizontally. In this special case the singular values are Φ and 1/Φ, where Φ is the golden ratio; V* is a (counter-clockwise) rotation by an angle α with tan(α) = −Φ, and U is a rotation by an angle β with tan(β) = Φ − 1.]


Rotation, Scaling

In the special, yet common case when M is an m × m real square matrix with positive determinant, U, V*, and Σ are real m × m matrices as well, Σ can be regarded as a scaling matrix, and U, V* can be viewed as rotation matrices. Thus the expression UΣV* can be intuitively interpreted as a composition of three geometrical transformations: a rotation or reflection, a scaling, and another rotation or reflection. For instance, the figure above explains how a shear matrix can be described as such a sequence.

Using the polar decomposition theorem, we can also consider M = RP as the composition of a stretch (positive definite normal matrix P = VΣV*) with eigenvalue scale factors σ_i along the orthogonal eigenvectors V_i of P, followed by a single rotation (unitary matrix R = UV*). If the rotation is done first, M = P'R, then R is the same and P' = UΣU* has the same eigenvalues, but is stretched along different (post-rotated) directions. This shows that the SVD is a generalization of the eigenvalue decomposition of pure stretches in orthogonal directions (symmetric matrix P) to arbitrary matrices (M = RP) which both stretch and rotate.

Singular Values as Semiaxes of An Ellipse or Ellipsoid

As shown in the figure, the singular values can be interpreted as the semiaxes of an ellipse in 2D. This concept can be generalized to n-dimensional Euclidean space, with the singular values of any n × n square matrix being viewed as the semiaxes of an n-dimensional ellipsoid.

The Columns of U and V are Orthonormal Bases

Since U and V* are unitary, the columns of each of them form a set of orthonormal vectors, which can be regarded as basis vectors. The matrix M maps the basis vector V_i to the stretched unit vector σ_i U_i. By the definition of a unitary matrix, the same is true for their conjugate transposes U* and V, except the geometric interpretation of the singular values as stretches is lost. In short, the columns of U, U*, V, and V* are orthonormal bases.

Example

Consider the 4 × 5 matrix

M = \begin{bmatrix} 1 & 0 & 0 & 0 & 2 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \end{bmatrix}

A singular value decomposition of this matrix is given by UΣV*


001 0  010 0 U =  000− 1  100 0 20 0 00  03 0 00 Ó=  00 500  00000 0 100 0  0 010 0 V* = 0.2 0 0 0 0.8  0 001 0  − 0.8 0 0 0 0.2

Notice Σ is zero outside of the diagonal and one diagonal element is zero. Furthermore, because the matrices U and V* are unitary, multiplying by their respective conjugate transposes yields identity matrices, as shown below. In this case, because U and V* are real valued, they each are an orthogonal matrix.

001 0 00 0 1 1000  010 0 01 0 0 0100 UUT =⋅= 000− 1 10 0 0 0010  100 0 00− 10 0001 0 0 0.2 0− 0.8 0 100 0 10000  0 010 0  1000 0 01000 =I VVT = 0 1 0 0 0⋅  0.2 0 0 0 0.8 = 00100= I 4 5 00 0 1 0 0 001 0 00010  0 0 0.8 0 0.2− 0.8 0 0 0 0.2 00001

This particular singular value decomposition is not unique. Choosing V such that

V^* = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ \sqrt{0.2} & 0 & 0 & 0 & \sqrt{0.8} \\ \sqrt{0.4} & 0 & 0 & \sqrt{0.5} & -\sqrt{0.1} \\ -\sqrt{0.4} & 0 & 0 & \sqrt{0.5} & \sqrt{0.1} \end{bmatrix}

is also a valid singular value decomposition.
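The decomposition of this example can be reproduced numerically; note that np.linalg.svd returns the singular values in descending order (3, √5, 2, 0), so the columns appear in a different order than written above.

import numpy as np

M = np.array([[1.0, 0.0, 0.0, 0.0, 2.0],
              [0.0, 0.0, 3.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0, 0.0]])

U, s, Vt = np.linalg.svd(M)           # full_matrices=True by default
print(s)                              # [3.0, sqrt(5) ~ 2.236, 2.0, 0.0]

# Rebuild M = U Sigma V* from the factors.
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(U @ Sigma @ Vt, M)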


Singular Values, Singular Vectors, and their Relation to the SVD

A non-negative real number σ is a singular value for M if and only if there exist unit-length vectors u in K^m and v in K^n such that

M v = \sigma u \quad \text{and} \quad M^* u = \sigma v.

The vectors u and v are called left-singular and right-singular vectors for σ, respectively.

In any singular value decomposition

M = U \Sigma V^*,

the diagonal entries of Σ are equal to the singular values of M. The first p = min(m, n) columns of U and V are, respectively, left- and right-singular vectors for the corresponding singular values. Consequently, the above theorem implies that:

• An m × n matrix M has at most p distinct singular values.

• It is always possible to find a unitary basis U for K^m with a subset of basis vectors spanning the left-singular vectors of each singular value of M.

• It is always possible to find a unitary basis V for K^n with a subset of basis vectors spanning the right-singular vectors of each singular value of M.

A singular value for which we can find two left (or right) singular vectors that are linearly independent is called degenerate. If u_1 and u_2 are two left-singular vectors which both correspond to the singular value σ, then any normalized linear combination of the two vectors is also a left-singular vector corresponding to the singular value σ. The similar statement is true for right-singular vectors. The number of independent left and right singular vectors coincides, and these singular vectors appear in the same columns of U and V corresponding to diagonal elements of Σ all with the same value σ.

As an exception, the left and right singular vectors of singular value 0 comprise all unit vectors in the kernel and cokernel, respectively, of M, which by the rank–nullity theorem cannot be the same dimension if m ≠ n. Even if all singular values are nonzero, if m > n then the cokernel is nontrivial, in which case U is padded with m − n orthogonal vectors from the cokernel. Conversely, if m < n, then V is padded by n − m orthogonal vectors from the kernel. However, if the singular value of 0 exists, the extra columns of U or V already appear as left or right singular vectors.

Non-degenerate singular values always have unique left- and right-singular vectors, up to multiplication by a unit-phase factor e^{iφ} (for the real case, up to a sign). Consequently, if all singular values of a square matrix M are non-degenerate and non-zero, then its singular value decomposition is unique, up to multiplication of a column of U by a unit-phase factor and simultaneous multiplication of the corresponding column of V by the same unit-phase factor. In general, the SVD is unique up to arbitrary unitary transformations applied uniformly to the column vectors of both U and V spanning the subspaces of each singular value, and up to arbitrary unitary transformations on vectors of U and V spanning the kernel and cokernel, respectively, of M.


Applications of the SVD

Pseudoinverse

The singular value decomposition can be used for computing the pseudoinverse of a matrix. Indeed, the pseudoinverse of the matrix M with singular value decomposition M = UΣV* is

M^+ = V \Sigma^+ U^*,

where Σ+ is the pseudoinverse of Σ, which is formed by replacing every non-zero diagonal entry by its reciprocal and transposing the resulting matrix. The pseudoinverse is one way to solve linear least squares problems.
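A sketch (NumPy) of the pseudoinverse built directly from the SVD, compared against np.linalg.pinv; the matrix reuses the 4 × 5 example above.

import numpy as np

M = np.array([[1.0, 0.0, 0.0, 0.0, 2.0],
              [0.0, 0.0, 3.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0, 0.0]])

U, s, Vt = np.linalg.svd(M)

# Sigma^+: reciprocal of the non-zero singular values, then transpose.
s_plus = np.zeros_like(s)
nz = s > 1e-12
s_plus[nz] = 1.0 / s[nz]
Sigma_plus = np.zeros((M.shape[1], M.shape[0]))
Sigma_plus[:len(s), :len(s)] = np.diag(s_plus)

M_plus = Vt.T @ Sigma_plus @ U.T      # M^+ = V Sigma^+ U^* (real case)
assert np.allclose(M_plus, np.linalg.pinv(M))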

Solving Homogeneous Linear Equations

A set of homogeneous linear equations can be written as Ax = 0 for a matrix A and vector x. A typical situation is that A is known and a non-zero x is to be determined which satisfies the equation. Such an x belongs to A's null space and is sometimes called a (right) null vector of A. The vector x can be characterized as a right-singular vector corresponding to a singular value of A that is zero. This observation means that if A is a square matrix and has no vanishing singular value, the equation has no non-zero x as a solution. It also means that if there are several vanishing singular values, any linear combination of the corresponding right-singular vectors is a valid solution. Analogously to the definition of a (right) null vector, a non-zero x satisfying x*A = 0, with x* denoting the conjugate transpose of x, is called a left null vector of A.

Total Least Squares Minimization

A total least squares problem refers to determining the vector x which minimizes the 2-norm of a vector Ax under the constraint ||x|| = 1. The solution turns out to be the right-singular vector of A corresponding to the smallest singular value.

Range, Null Space and Rank

Another application of the SVD is that it provides an explicit representation of the range and null space of a matrix M. The right-singular vectors corresponding to vanishing singular values of M span the null space of M, and the left-singular vectors corresponding to the non-zero singular values of M span the range of M. E.g., in the above example the null space is spanned by the last two columns of V and the range is spanned by the first three columns of U.

As a consequence, the rank of M equals the number of non-zero singular values which is the same as the number of non-zero diagonal elements in Σ. In numerical linear algebra the singular values can be used to determine the effective rank of a matrix, as rounding error may lead to small but non-zero singular values in a rank deficient matrix.

Low-rank Matrix Approximation

Some practical applications need to solve the problem of approximating a matrix M with another matrix \tilde{M}, said to be truncated, which has a specific rank r. In the case that the approximation is based


on minimizing the Frobenius norm of the difference between M and \tilde{M} under the constraint that \operatorname{rank}(\tilde{M}) = r, it turns out that the solution is given by the SVD of M, namely

\tilde{M} = U \tilde{\Sigma} V^*,

where Ó is the same matrix as Σ except that it contains only the r largest singular values (the other singular values are replaced by zero). This is known as the Eckart–Young theorem, as it was proved by those two authors in 1936 (although it was later found to have been known to earlier authors).

Separable Models

The SVD can be thought of as decomposing a matrix into a weighted, ordered sum of separable matrices. By separable, we mean that a matrix A can be written as an outer product of two vectors

A = u \otimes v, or, in coordinates, A_{ij} = u_i v_j. Specifically, the matrix M can be decomposed as

M = \sum_i A_i = \sum_i \sigma_i\, U_i \otimes V_i^\dagger.

Here U_i and V_i are the i-th columns of the corresponding SVD matrices, σ_i are the ordered singular values, and each A_i is separable. The SVD can be used to find the decomposition of an image processing filter into separable horizontal and vertical filters. Note that the number of non-zero σ_i is exactly the rank of the matrix.

Separable models often arise in biological systems, and the SVD factorization is useful to analyze such systems. For example, some visual area V1 simple cells’ receptive fields can be well described by a Gabor filter in the space domain multiplied by a modulation function in the time domain. Thus, given a linear filter evaluated through, for example, reverse correlation, one can rearrange the two spatial dimensions into one dimension, thus yielding a two-dimensional filter (space, time) which can be decomposed through SVD. The first column of U in the SVD factorization is then a Gabor while the first column of V represents the time modulation (or vice versa). One may then define an index of separability,

\alpha = \frac{\sigma_1^2}{\sum_i \sigma_i^2},

which is the fraction of the power in the matrix M which is accounted for by the first separable matrix in the decomposition.

Nearest Orthogonal Matrix

It is possible to use the SVD of a square matrix A to determine the orthogonal matrix O closest to A. The closeness of fit is measured by the Frobenius norm of O − A. The solution is the product UV*. This intuitively makes sense because an orthogonal matrix would have the decomposition UIV*, where I is the identity matrix, so that if A = UΣV* then the product O = UV* amounts to replacing the singular values with ones.
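A sketch (NumPy, random example): the nearest orthogonal matrix in the Frobenius norm is obtained by replacing Σ with the identity.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))

U, s, Vt = np.linalg.svd(A)
O = U @ Vt                            # nearest orthogonal matrix to A

assert np.allclose(O @ O.T, np.eye(3))          # O is orthogonal
print(np.linalg.norm(O - A, 'fro'))             # minimal Frobenius distance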


A similar problem, with interesting applications in shape analysis, is the orthogonal Procrustes problem, which consists of finding an orthogonal matrix O which most closely maps A to B. Specifically,

O = \arg\min_{\Omega} \| A \Omega - B \|_F \quad \text{subject to} \quad \Omega^T \Omega = I,

where ⋅ F denotes the Frobenius norm. This problem is equivalent to finding the nearest orthogonal matrix to a given matrix M = ATB.

The Kabsch Algorithm

The Kabsch algorithm (called Wahba's problem in other fields) uses SVD to compute the optimal rotation (with respect to least-squares minimization) that will align a set of points with a corresponding set of points. It is used, among other applications, to compare the structures of molecules.

Signal Processing

The SVD and pseudoinverse have been successfully applied to signal processing and big data, e.g., in genomic signal processing.

Other Examples

The SVD is also applied extensively to the study of linear inverse problems, and is useful in the analysis of regularization methods such as that of Tikhonov. It is widely used in statistics, where it is related to principal component analysis and to correspondence analysis, and in signal processing and pattern recognition. It is also used in output-only modal analysis, where the non-scaled mode shapes can be determined from the singular vectors. Yet another usage is latent semantic indexing in natural language text processing.

The SVD also plays a crucial role in the field of quantum information, in a form often referred to as the Schmidt decomposition. Through it, states of two quantum systems are naturally decomposed, providing a necessary and sufficient condition for them to be entangled: the two systems are entangled if the rank of the Σ matrix is larger than one.

One application of SVD to rather large matrices is in numerical weather prediction, where Lanczos methods are used to estimate the most linearly quickly growing few perturbations to the central numerical weather prediction over a given initial forward time period, i.e., the singular vectors corresponding to the largest singular values of the linearized propagator for the global weather over that time interval. The output singular vectors in this case are entire weather systems. These perturbations are then run through the full nonlinear model to generate an ensemble forecast, giving a handle on some of the uncertainty that should be allowed for around the current central prediction.

SVD has also been applied to reduced order modelling. The aim of reduced order modelling is to reduce the number of degrees of freedom in a complex system which is to be modelled. SVD was coupled with radial basis functions to interpolate solutions to three-dimensional unsteady flow problems.


Singular value decomposition is used in recommender systems to predict people’s item ratings. Distributed algorithms have been developed for the purpose of calculating the SVD on clusters of commodity machines.

Low-rank SVD has been applied for hotspot detection from spatiotemporal data with application to disease outbreak detection. A combination of SVD and higher-order SVD has also been applied for real-time event detection from complex data streams (multivariate data with space and time dimensions) in disease surveillance.

Relation to Eigenvalue Decomposition

The singular value decomposition is very general in the sense that it can be applied to any m × n matrix, whereas eigenvalue decomposition can only be applied to certain classes of square matrices. Nevertheless, the two decompositions are related.

Given an SVD of M, as described above, the following two relations hold:

M^* M = V \Sigma^* U^* U \Sigma V^* = V (\Sigma^* \Sigma) V^*

M M^* = U \Sigma V^* V \Sigma^* U^* = U (\Sigma \Sigma^*) U^*

The right-hand sides of these relations describe the eigenvalue decompositions of the left-hand sides. Consequently:

• The columns of V (right-singular vectors) are eigenvectors of M*M.

• The columns of U (left-singular vectors) are eigenvectors of MM*.

• The non-zero elements of Σ (non-zero singular values) are the square roots of the non-zero eigenvalues of M*M or MM*.

In the special case that M is a normal matrix, which by definition must be square, the spectral theorem says that it can be unitarily diagonalized using a basis of eigenvectors, so that it can be written M = UDU* for a unitary matrix U and a diagonal matrix D. When M is also positive semi-definite, the decomposition M = UDU* is also a singular value decomposition. Otherwise, it can be recast as

an SVD by moving the phase of each σi to either its corresponding Vi or Ui. The natural connection of the SVD to non-normal matrices is through the polar decomposition theorem: M=SR, where S=UΣU* is positive semidefinite and normal, and R=UV* is unitary.

Thus while related, the eigenvalue decomposition and SVD differ except for positive semi-definite normal matrices M: the eigenvalue decomposition is M = UDU^{-1}, where U is not necessarily unitary and D is not necessarily positive semi-definite, while the SVD is M = UΣV*, where Σ is diagonal and positive semi-definite, and U and V are unitary matrices that are not necessarily related except through the matrix M. While only non-defective square matrices have an eigenvalue decomposition, any m × n matrix has an SVD.

Existence

An eigenvalue λ of a matrix M is characterized by the algebraic relation Mu = λu. When M is Hermitian, a variational characterization is also available. Let M be a real n × n symmetric matrix. Define

f : \mathbb{R}^n \to \mathbb{R}, \qquad f(x) = x^T M x

By the extreme value theorem, this continuous function attains a maximum at some u when restricted to the closed unit sphere {||x|| ≤ 1}. By the Lagrange multipliers theorem, u necessarily satisfies

\nabla f = \nabla\, x^T M x = \lambda \cdot \nabla\, x^T x,

where the nabla symbol, ∇, is the del operator.

A short calculation shows the above leads to Mu = λu (symmetry of M is needed here). Therefore, λ is the largest eigenvalue of M. The same calculation performed on the orthogonal complement of u gives the next largest eigenvalue and so on. The complex Hermitian case is similar; there f(x) = x* M x is a real-valued function of 2n real variables.

Singular values are similar in that they can be described algebraically or from variational principles, although, unlike the eigenvalue case, Hermiticity or symmetry of M is no longer required.

Based on the Spectral Theorem

Let M be an m × n complex matrix. Since M*M is positive semi-definite and Hermitian, by the spectral theorem, there exists a unitary n × n matrix V such that

$$V^{*} M^{*} M V = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}$$

where D is diagonal and positive definite. Partition V appropriately so we can write

$$\begin{bmatrix} V_1^{*} \\ V_2^{*} \end{bmatrix} M^{*} M \begin{bmatrix} V_1 & V_2 \end{bmatrix} = \begin{bmatrix} V_1^{*} M^{*} M V_1 & V_1^{*} M^{*} M V_2 \\ V_2^{*} M^{*} M V_1 & V_2^{*} M^{*} M V_2 \end{bmatrix} = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}$$

Therefore:

$$V_1^{*} M^{*} M V_1 = D, \qquad V_2^{*} M^{*} M V_2 = 0.$$

The second equation implies MV2 = 0. Also, since V is unitary:
$$V_1^{*} V_1 = I_1, \qquad V_2^{*} V_2 = I_2, \qquad V_1 V_1^{*} + V_2 V_2^{*} = I$$


where the subscripts on the identity matrices are there to keep in mind that they are of different dimensions. Define

$$U_1 = M V_1 D^{-\frac{1}{2}}$$

Then

$$U_1 D^{\frac{1}{2}} V_1^{*} = M V_1 D^{-\frac{1}{2}} D^{\frac{1}{2}} V_1^{*} = M (I - V_2 V_2^{*}) = M - (M V_2) V_2^{*} = M$$

since MV2 = 0.

We see that this is almost the desired result, except that U1 and V1 are not unitary in general, since they might not be square. However, we do know that the number of rows of U1 is no smaller than its number of columns, since the dimensions of D are no greater than m and n. Also, since

$$U_1^{*} U_1 = D^{-\frac{1}{2}} V_1^{*} M^{*} M V_1 D^{-\frac{1}{2}} = D^{-\frac{1}{2}} D D^{-\frac{1}{2}} = I_1$$

the columns in U1 are orthonormal and can be extended to an orthonormal basis. This means we can choose U2 such that the following matrix is unitary:

$$U = \begin{bmatrix} U_1 & U_2 \end{bmatrix}$$

For V1 we already have V2 to make it unitary. Now, define
$$\Sigma = \begin{bmatrix} D^{\frac{1}{2}} & 0 \\ 0 & 0 \end{bmatrix}$$

where extra zero rows are added or removed to make the number of zero rows equal the number of columns of U2. Then
$$\begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} D^{\frac{1}{2}} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1 & V_2 \end{bmatrix}^{*} = U_1 D^{\frac{1}{2}} V_1^{*} = M$$

which is the desired result: M = UΣV*.

Notice the argument could begin with diagonalizing MM* rather than M*M (This shows directly that MM* and M*M have the same non-zero eigenvalues).
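The construction above translates almost directly into code. The sketch below (a minimal NumPy illustration for a real matrix of full column rank with m ≥ n, so that V2 is empty and U1 = MVD^(-1/2) already has orthonormal columns; the function name is illustrative) mirrors the proof rather than calling a library SVD.

    import numpy as np

    def svd_from_spectral_theorem(M):
        # Build a thin SVD of a full-column-rank M (m >= n) as in the proof:
        # diagonalize M*M = V D V*, then set U1 = M V D^(-1/2).
        MhM = M.conj().T @ M
        eigvals, V = np.linalg.eigh(MhM)       # ascending eigenvalues, unitary V
        order = np.argsort(eigvals)[::-1]      # sort singular values descending
        eigvals, V = eigvals[order], V[:, order]
        s = np.sqrt(eigvals)                   # singular values
        U1 = M @ V @ np.diag(1.0 / s)          # U1 = M V D^(-1/2)
        return U1, s, V

    rng = np.random.default_rng(1)
    M = rng.normal(size=(6, 4))
    U1, s, V = svd_from_spectral_theorem(M)
    print(np.allclose(U1 @ np.diag(s) @ V.conj().T, M))   # True: M = U1 diag(s) V*
    print(np.allclose(U1.conj().T @ U1, np.eye(4)))        # columns of U1 orthonormal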


Based on Variational Characterization

The singular values can also be characterized as the maxima of u^T M v, considered as a function of u and v, over particular subspaces. The singular vectors are the values of u and v where these maxima are attained.

Let M denote an m × n matrix with real entries. Let Sm−1 and Sn−1 denote the sets of unit 2-norm vectors in Rm and Rn respectively. Define the function

$$\sigma(u, v) = u^{T} M v, \qquad u \in S^{m-1},\ v \in S^{n-1}.$$

Consider the function σ restricted to S^{m−1} × S^{n−1}. Since both S^{m−1} and S^{n−1} are compact sets, their product is also compact. Furthermore, since σ is continuous, it attains a largest value for at least one pair of vectors u ∈ S^{m−1} and v ∈ S^{n−1}. This largest value is denoted σ1 and the corresponding vectors are denoted u1 and v1. Since σ1 is the largest value of σ(u, v), it must be non-negative. If it were negative, changing the sign of either u1 or v1 would make it positive and therefore larger.

Statement. u1, v1 are left- and right-singular vectors of M with corresponding singular value σ1.

Proof: Similar to the eigenvalues case, by assumption the two vectors satisfy the Lagrange multiplier equation:

$$\nabla \sigma = \nabla\, u^{T} M v - \lambda_1 \cdot \nabla\, u^{T} u - \lambda_2 \cdot \nabla\, v^{T} v$$

After some algebra, this becomes

$$M v_1 = 2 \lambda_1 u_1 + 0, \qquad M^{T} u_1 = 0 + 2 \lambda_2 v_1$$

Multiplying the first equation from the left by u_1^{T} and the second equation from the left by v_1^{T}, and taking ||u|| = ||v|| = 1 into account, gives

$$\sigma_1 = 2\lambda_1 = 2\lambda_2.$$

Plugging this into the pair of equations above, we have

$$M v_1 = \sigma_1 u_1, \qquad M^{T} u_1 = \sigma_1 v_1$$

This proves the statement.

More singular vectors and singular values can be found by maximizing σ(u, v) over normalized u, v which are orthogonal to u1 and v1, respectively. The passage from real to complex is similar to the eigenvalue case.


Geometric Meaning

Because U and V are unitary, we know that the columns U1, ..., Um of U yield an orthonormal basis of K^m and the columns V1, ..., Vn of V yield an orthonormal basis of K^n (with respect to the standard scalar products on these spaces).

The linear transformation

The linear transformation
$$T : K^{n} \to K^{m}, \qquad x \mapsto M x$$
has a particularly simple description with respect to these orthonormal bases: we have

$$T(V_i) = \sigma_i U_i, \qquad i = 1, \ldots, \min(m, n),$$

where σi is the i-th diagonal entry of Σ, and T(Vi) = 0 for i > min(m, n). The geometric content of the SVD theorem can thus be summarized as follows: for every linear map T : K^n → K^m one can find orthonormal bases of K^n and K^m such that T maps the i-th basis vector of K^n to a non-negative multiple of the i-th basis vector of K^m, and sends the left-over basis vectors to zero. With respect to these bases, the map T is therefore represented by a diagonal matrix with non-negative real diagonal entries.

To get a more visual flavour of singular values and SVD factorization — at least when working on real vector spaces — consider the sphere S of radius one in R^n. The linear map T maps this sphere onto an ellipsoid in R^m. Non-zero singular values are simply the lengths of the semi-axes of this ellipsoid. Especially when n = m, and all the singular values are distinct and non-zero, the SVD of the linear map T can be easily analysed as a succession of three consecutive moves: consider the ellipsoid T(S) and specifically its axes; then consider the directions in R^n sent by T onto these axes. These directions happen to be mutually orthogonal. Apply first an isometry V* sending these directions to the coordinate axes of R^n. On a second move, apply an endomorphism D diagonalized along the coordinate axes and stretching or shrinking in each direction, using the semi-axes lengths of T(S) as stretching coefficients. The composition D ∘ V* then sends the unit sphere onto an ellipsoid isometric to T(S). To define the third and last move U, apply an isometry to this ellipsoid so as to carry it over T(S). As can be easily checked, the composition U ∘ D ∘ V* coincides with T.

Calculating the SVD

Numerical Approach

The SVD of a matrix M is typically computed by a two-step procedure. In the first step, the matrix is reduced to a bidiagonal matrix. This takes O(mn²) floating-point operations (flops), assuming that m ≥ n. The second step is to compute the SVD of the bidiagonal matrix. This step can only be done with an iterative method (as with eigenvalue algorithms). However, in practice it suffices to compute the SVD up to a certain precision, like the machine epsilon. If this precision is considered constant, then the second step takes O(n) iterations, each costing O(n) flops. Thus, the first step is more expensive, and the overall cost is O(mn²) flops (Trefethen & Bau III 1997, Lecture 31).


The first step can be done using Householder reflections for a cost of 4mn² − 4n³/3 flops, assuming that only the singular values are needed and not the singular vectors. If m is much larger than n then it is advantageous to first reduce the matrix M to a triangular matrix with the QR decomposition and then use Householder reflections to further reduce the matrix to bidiagonal form; the combined cost is 2mn² + 2n³ flops (Trefethen & Bau III 1997, Lecture 31).

The second step can be done by a variant of the QR algorithm for the computation of eigenvalues, which was first described by Golub & Kahan (1965). The LAPACK subroutine DBDSQR implements this iterative method, with some modifications to cover the case where the singular values are very small (Demmel & Kahan 1990). Together with a first step using Householder reflections and, if appropriate, QR decomposition, this forms the DGESVD routine for the computation of the singular value decomposition.

The same algorithm is implemented in the GNU Scientific Library (GSL). The GSL also offers an alternative method, which uses a one-sided Jacobi orthogonalization in step 2 (GSL Team 2007). This method computes the SVD of the bidiagonal matrix by solving a sequence of 2 × 2 SVD problems, similar to how the Jacobi eigenvalue algorithm solves a sequence of 2 × 2 eigenvalue problems (Golub & Van Loan 1996, §8.6.3). Yet another method for step 2 uses the idea of divide-and-conquer eigenvalue algorithms (Trefethen & Bau III 1997, Lecture 31).
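In practice these LAPACK routines are usually reached through a library call. The snippet below (a small illustration, assuming SciPy is available) selects between the two drivers mentioned above: 'gesvd' follows the Golub–Kahan bidiagonalization route, while 'gesdd' is the divide-and-conquer variant.

    import numpy as np
    from scipy.linalg import svd

    A = np.random.default_rng(2).normal(size=(300, 200))

    # QR-iteration-based bidiagonal SVD (xGESVD) vs. divide-and-conquer (xGESDD).
    U1, s1, Vh1 = svd(A, full_matrices=False, lapack_driver='gesvd')
    U2, s2, Vh2 = svd(A, full_matrices=False, lapack_driver='gesdd')  # SciPy default

    print(np.allclose(s1, s2))                     # same singular values
    print(np.allclose(U1 @ np.diag(s1) @ Vh1, A))  # both reconstruct A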

There is an alternative way which does not explicitly use the eigenvalue decomposition. Usually the singular value problem of a matrix M is converted into an equivalent symmetric eigenvalue problem, such as MM*, M*M, or
$$\begin{bmatrix} 0 & M \\ M^{*} & 0 \end{bmatrix}.$$

The approaches using eigenvalue decompositions are based on the QR algorithm, which is well developed to be stable and fast. Note that the singular values are real and that the right- and left-singular vectors are not required to form similarity transformations. One can instead alternate iteratively between the QR decomposition and the LQ decomposition to find the real diagonal matrix: the QR decomposition gives M ⇒ QR and the LQ decomposition of R gives R ⇒ LP*. Thus, at every iteration we have M ⇒ QLP*, update M ⇐ L, and repeat the orthogonalizations. Eventually, the QR and LQ decompositions iteratively provide the unitary matrices for the left- and right-singular vectors, respectively. This approach does not come with any acceleration such as the spectral shifts and deflation used in the QR algorithm, because the shift method is not easily defined without similarity transformations, but it is very simple to implement where speed does not matter. It also gives a useful interpretation: just as the QR algorithm can compute the eigenvalue decomposition, the SVD can be obtained using only orthogonal/unitary transformations.
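The alternation just described is easy to prototype. The following sketch (an illustrative, unoptimized implementation for a real matrix with m ≥ n; the function name is ours, convergence is slow and is only expected for generic matrices) keeps the exact invariant M = U A Vᵀ at every step while A is driven toward a diagonal matrix whose entries are the singular values up to sign.

    import numpy as np

    def svd_by_qr_lq(M, iters=200):
        # Alternate QR and LQ decompositions (no shifts, no deflation).
        U = np.eye(M.shape[0])         # becomes m x n after the first step
        V = np.eye(M.shape[1])
        A = M.copy()
        for _ in range(iters):
            Q, R = np.linalg.qr(A)     # A = Q R                 (QR step)
            P, Lt = np.linalg.qr(R.T)  # R^T = P L^T, so R = L P^T (LQ step)
            U = U @ Q
            V = V @ P
            A = Lt.T                   # update M <= L and repeat
        return U, A, V                 # M = U @ A @ V.T holds at every step

    M = np.random.default_rng(3).normal(size=(5, 4))
    U, A, V = svd_by_qr_lq(M)
    print(np.allclose(U @ A @ V.T, M))              # invariant holds exactly
    print(np.sort(np.abs(np.diag(A)))[::-1])         # approx. singular values
    print(np.linalg.svd(M, compute_uv=False))        # reference values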

Analytic Result of 2 × 2 SVD

The singular values of a 2 × 2 matrix can be found analytically. Let the matrix be

$$M = z_0 I + z_1 \sigma_1 + z_2 \sigma_2 + z_3 \sigma_3$$

where zi ∈ are complex numbers that parameterize the matrix, I is the identity matrix, and σ i denote the Pauli matrices. Then its two singular values are given by


$$\sigma_\pm = \sqrt{|z_0|^2+|z_1|^2+|z_2|^2+|z_3|^2 \pm \sqrt{\bigl(|z_0|^2+|z_1|^2+|z_2|^2+|z_3|^2\bigr)^2 - \bigl|z_0^2-z_1^2-z_2^2-z_3^2\bigr|^2}}$$
$$\;\;= \sqrt{|z_0|^2+|z_1|^2+|z_2|^2+|z_3|^2 \pm 2\sqrt{(\operatorname{Re} z_0 z_1^{*})^2+(\operatorname{Re} z_0 z_2^{*})^2+(\operatorname{Re} z_0 z_3^{*})^2+(\operatorname{Im} z_1 z_2^{*})^2+(\operatorname{Im} z_2 z_3^{*})^2+(\operatorname{Im} z_3 z_1^{*})^2}}$$
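A quick numerical check of this closed form (an illustrative sketch; the random parameters z_i are arbitrary) against a general-purpose SVD routine:

    import numpy as np

    # Pauli matrices and the identity.
    I2 = np.eye(2, dtype=complex)
    s1 = np.array([[0, 1], [1, 0]], dtype=complex)
    s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
    s3 = np.array([[1, 0], [0, -1]], dtype=complex)

    rng = np.random.default_rng(4)
    z = rng.normal(size=4) + 1j * rng.normal(size=4)
    M = z[0] * I2 + z[1] * s1 + z[2] * s2 + z[3] * s3

    S = np.sum(np.abs(z) ** 2)                               # sum of |z_i|^2
    d = np.abs(z[0]**2 - z[1]**2 - z[2]**2 - z[3]**2) ** 2   # |det M|^2
    sigma = np.sqrt([S + np.sqrt(S**2 - d), S - np.sqrt(S**2 - d)])

    print(sigma)                                  # analytic singular values
    print(np.linalg.svd(M, compute_uv=False))     # numerical singular values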

Reduced SVDs

In applications it is quite unusual for the full SVD, including a full unitary decomposition of the null-space of the matrix, to be required. Instead, it is often sufficient (as well as faster, and more economical for storage) to compute a reduced version of the SVD. The following can be distin- guished for an m×n matrix M of rank r:

Thin SVD

$$M = U_n \Sigma_n V^{*}$$

Only the n column vectors of U corresponding to the row vectors of V* are calculated. The remaining column vectors of U are not calculated. This is significantly quicker and more economical than the full SVD if n ≪ m. The matrix U_n is thus m×n, Σ_n is n×n diagonal, and V is n×n.

The first stage in the calculation of a thin SVD will usually be a QR decomposition of M, which can make for a significantly quicker calculation if n ≪ m.

Compact SVD

$$M = U_r \Sigma_r V_r^{*}$$

Only the r column vectors of U and r row vectors of V* corresponding to the non-zero singular values Σ_r are calculated. The remaining vectors of U and V* are not calculated. This is quicker and more economical than the thin SVD if r ≪ n. The matrix U_r is thus m×r, Σ_r is r×r diagonal, and V_r* is r×n.

Truncated SVD

$$\tilde{M} = U_t \Sigma_t V_t^{*}$$

Only the t column vectors of U and t row vectors of V* corresponding to the t largest singular values Σ_t are calculated. The rest of the matrix is discarded. This can be much quicker and more economical than the compact SVD if t ≪ r. The matrix U_t is thus m×t, Σ_t is t×t diagonal, and V_t* is t×n.

Of course the truncated SVD is no longer an exact decomposition of the original matrix M, but as discussed above, the approximate matrix M̃ is in a very useful sense the closest approximation to M that can be achieved by a matrix of rank t.
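All three reduced forms are easy to obtain from a standard routine. The sketch below (a minimal NumPy illustration; the matrix and the truncation rank t are arbitrary) computes a thin SVD, extracts a compact SVD by dropping zero singular values, and forms a rank-t truncated approximation.

    import numpy as np

    rng = np.random.default_rng(5)
    M = rng.normal(size=(100, 8)) @ rng.normal(size=(8, 20))   # 100x20, rank <= 8

    # Thin SVD: only the first n columns of U are computed.
    U, s, Vh = np.linalg.svd(M, full_matrices=False)   # U: 100x20, s: 20, Vh: 20x20

    # Compact SVD: keep only the r non-zero singular values.
    r = int(np.sum(s > 1e-10 * s[0]))
    Ur, sr, Vhr = U[:, :r], s[:r], Vh[:r, :]
    print(r, np.allclose(Ur @ np.diag(sr) @ Vhr, M))   # 8 True

    # Truncated SVD: keep only the t largest singular values (best rank-t approx.).
    t = 3
    Mt = U[:, :t] @ np.diag(s[:t]) @ Vh[:t, :]
    print(np.linalg.norm(M - Mt, 2), s[t])             # spectral error equals sigma_{t+1}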


Norms

Ky Fan Norms

The sum of the k largest singular values of M is a matrix norm, the Ky Fan k-norm of M.

The first of the Ky Fan norms, the Ky Fan 1-norm, is the same as the operator norm of M as a linear operator with respect to the Euclidean norms of K^m and K^n. In other words, the Ky Fan 1-norm is the operator norm induced by the standard ℓ² Euclidean inner product. For this reason, it is also called the operator 2-norm. One can easily verify the relationship between the Ky Fan 1-norm and singular values. It is true in general, for a bounded operator M on (possibly infinite-dimensional) Hilbert spaces, that
$$\|M\| = \|M^{*}M\|^{\frac{1}{2}}.$$
But, in the matrix case, (M*M)^{1/2} is a normal matrix, so ||M*M||^{1/2} is the largest eigenvalue of (M*M)^{1/2}, i.e. the largest singular value of M.

The last of the Ky Fan norms, the sum of all singular values, is the trace norm (also known as the ‘nuclear norm’), defined by ||M|| = Tr[(M* M)½] (the eigenvalues of M* M are the squares of the singular values).

Hilbert–Schmidt Norm

The singular values are related to another norm on the space of operators. Consider the Hilbert–Schmidt inner product on the n × n matrices, defined by
$$\langle M, N \rangle = \operatorname{trace}(N^{*} M).$$
So the induced norm is
$$\|M\| = \sqrt{\langle M, M \rangle} = \sqrt{\operatorname{trace}(M^{*} M)}.$$
Since the trace is invariant under unitary equivalence, this shows

$$\|M\|^{2} = \sum_i \sigma_i^{2}$$

where σi are the singular values of M. This is called the Frobenius norm, Schatten 2-norm, or Hilbert–Schmidt norm of M. Direct calculation shows that the Frobenius norm of M = (mij) coincides with:

$$\sqrt{\sum_{i,j} |m_{ij}|^{2}}.$$
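These identities can be verified numerically; the sketch below (illustrative only, using a random matrix) checks the operator 2-norm, the trace (nuclear) norm, and the Frobenius norm against the singular values.

    import numpy as np

    M = np.random.default_rng(6).normal(size=(7, 4))
    s = np.linalg.svd(M, compute_uv=False)

    print(np.isclose(np.linalg.norm(M, 2), s[0]))              # Ky Fan 1-norm = sigma_1
    print(np.isclose(np.linalg.norm(M, 'nuc'), s.sum()))        # trace (nuclear) norm
    print(np.isclose(np.linalg.norm(M, 'fro'),
                     np.sqrt(np.sum(s**2))))                    # Frobenius norm
    print(np.isclose(np.linalg.norm(M, 'fro'),
                     np.sqrt(np.sum(np.abs(M)**2))))            # sqrt of sum |m_ij|^2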

Tensor SVD

Two types of tensor decompositions exist, which generalise the SVD to multi-way arrays. One of them decomposes a tensor into a sum of rank-1 tensors, which is called a tensor rank decomposition. The second type of decomposition computes the orthonormal subspaces associated with the different factors appearing in the tensor product of vector spaces in which the tensor lives. This decomposition is referred to in the literature as the higher-order SVD (HOSVD) or Tucker3/TuckerM. In addition, multilinear principal component analysis in multilinear subspace learning involves the same mathematical operations as Tucker decomposition, being used in a different context of dimensionality reduction.

Bounded Operators on Hilbert Spaces

The factorization M = UΣV* can be extended to a bounded operator M on a separable Hilbert space H. Namely, for any bounded operator M, there exist a partial isometry U, a unitary V, a measure space (X, μ), and a non-negative measurable f such that

$$M = U T_f V^{*}$$

where T_f is multiplication by f on L²(X, μ).

This can be shown by mimicking the linear algebraic argument for the matricial case above. V T_f V* is the unique positive square root of M*M, as given by the Borel functional calculus for self-adjoint operators. The reason why U need not be unitary is because, unlike the finite-dimensional case, given an isometry U1 with nontrivial kernel, a suitable U2 may not be found such that

$$\begin{bmatrix} U_1 & U_2 \end{bmatrix}$$

is a unitary operator.

As for matrices, the singular value factorization is equivalent to the polar decomposition for operators: we can simply write

$$M = U V^{*} \cdot V T_f V^{*}$$

and notice that U V* is still a partial isometry while V T_f V* is positive.

Singular Values and Compact Operators

To extend the notion of singular values and left/right-singular vectors to the operator case, one needs to restrict to compact operators. It is a general fact that compact operators on Banach spaces have only discrete spectrum. This is also true for compact operators on Hilbert spaces, since Hilbert spaces are a special case of Banach spaces. If T is compact, every non-zero λ in its spectrum is an eigenvalue. Furthermore, a compact self-adjoint operator can be diagonalized by its eigenvectors. If M is compact, so is M*M. Applying the diagonalization result, the unitary image of its positive square root T_f has a set of orthonormal eigenvectors {e_i} corresponding to strictly positive eigenvalues {σ_i}. For any ψ ∈ H,

$$M\psi = U T_f V^{*}\psi = \sum_i \left\langle U T_f V^{*}\psi, U e_i \right\rangle U e_i = \sum_i \sigma_i \left\langle \psi, V e_i \right\rangle U e_i$$

where the series converges in the norm topology on H. Notice how this resembles the expression from the finite-dimensional case. The σ_i are called the singular values of M. {U e_i} (resp. {V e_i}) can be considered the left-singular (resp. right-singular) vectors of M.


Compact operators on a Hilbert space are the closure of finite-rank operators in the uniform op- erator topology. The above series expression gives an explicit such representation. An immediate consequence of this is:

Theorem. M is compact if and only if M*M is compact.

History

The singular value decomposition was originally developed by differential geometers, who wished to determine whether a real bilinear form could be made equal to another by independent orthogonal transformations of the two spaces it acts on. Eugenio Beltrami and Camille Jordan discovered independently, in 1873 and 1874 respectively, that the singular values of the bilinear forms, represented as a matrix, form a complete set of invariants for bilinear forms under orthogonal substitutions. James Joseph Sylvester also arrived at the singular value decomposition for real square matrices in 1889, apparently independently of both Beltrami and Jordan. Sylvester called the singular values the canonical multipliers of the matrix A. The fourth mathematician to discover the singular value decomposition independently was Autonne in 1915, who arrived at it via the polar decomposition. The first proof of the singular value decomposition for rectangular and complex matrices seems to be by Carl Eckart and Gale Young in 1936; they saw it as a generalization of the principal axis transformation for Hermitian matrices.

In 1907, Erhard Schmidt defined an analog of singular values for integral operators (which are compact, under some weak technical assumptions); it seems he was unaware of the parallel work on singular values of finite matrices. This theory was further developed by Émile Picard in 1910, who was the first to call the numbers σ_k singular values (or in French, valeurs singulières).

Practical methods for computing the SVD date back to Kogbetliantz in 1954, 1955 and Hestenes in 1958, resembling closely the Jacobi eigenvalue algorithm, which uses plane rotations or Givens rotations. However, these were replaced by the method of Gene Golub and William Kahan published in 1965, which uses Householder transformations or reflections. In 1970, Golub and Christian Reinsch published a variant of the Golub/Kahan algorithm that is still the one most used today.

System of Linear Equations

A linear system in three variables determines a collection of planes. The intersection point is the solution.

In mathematics, a system of linear equations (or linear system) is a collection of two or more linear equations involving the same set of variables. For example,


$$3x + 2y - z = 1$$
$$2x - 2y + 4z = -2$$
$$-x + \tfrac{1}{2}y - z = 0$$

is a system of three equations in the three variables x, y, z. A solution to a linear system is an assignment of values to the variables such that all the equations are simultaneously satisfied. A solution to the system above is given by
$$x = 1, \qquad y = -2, \qquad z = -2$$

since it makes all three equations valid. The word “system” indicates that the equations are to be considered collectively, rather than individually.

In mathematics, the theory of linear systems is the basis and a fundamental part of linear algebra, a subject which is used in most parts of modern mathematics. Computational algorithms for finding the solutions are an important part of numerical linear algebra, and play a prominent role in engineering, physics, chemistry, computer science, and economics. A system of non-linear equations can often be approximated by a linear system, a helpful technique when making a mathematical model or computer simulation of a relatively complex system.

Very often, the coefficients of the equations are real or complex numbers and the solutions are searched for in the same set of numbers, but the theory and the algorithms apply for coefficients and solutions in any field. For solutions in an integral domain like the ring of the integers, or in other algebraic structures, other theories have been developed. Integer linear programming is a collection of methods for finding the “best” integer solution (when there are many). Gröbner basis theory provides algorithms when coefficients and unknowns are polynomials. Also tropical geometry is an example of linear algebra in a more exotic structure.

Elementary Example

The simplest kind of linear system involves two equations and two variables:
$$2x + 3y = 6$$
$$4x + 9y = 15.$$

One method for solving such a system is as follows. First, solve the top equation for x in terms of y:
$$x = 3 - \tfrac{3}{2} y.$$

Now substitute this expression for x into the bottom equation:


$$4\left(3 - \tfrac{3}{2}y\right) + 9y = 15.$$

This results in a single equation involving only the variable y. Solving gives y = 1, and substituting this back into the equation for x yields x = 3/2. This method generalizes to systems with additional variables.
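The same elimination can be delegated to a linear solver; a minimal check of this example with NumPy (illustrative only):

    import numpy as np

    # 2x + 3y = 6
    # 4x + 9y = 15
    A = np.array([[2.0, 3.0],
                  [4.0, 9.0]])
    b = np.array([6.0, 15.0])

    print(np.linalg.solve(A, b))   # [1.5 1. ]  i.e. x = 3/2, y = 1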

General Form

A general system of m linear equations with n unknowns can be written as

$$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1$$
$$a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2$$
$$\vdots$$
$$a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m.$$

Here x_1, x_2, ..., x_n are the unknowns, a_{11}, a_{12}, ..., a_{mn} are the coefficients of the system, and b_1, b_2, ..., b_m are the constant terms.

Often the coefficients and unknowns are real or complex numbers, but integers and rational numbers are also seen, as are polynomials and elements of an abstract algebraic structure.

Vector Equation

One extremely helpful view is that each unknown is a weight for a column vector in a linear combination.

$$x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$

This allows all the language and theory of vector spaces (or more generally, modules) to be brought to bear. For example, the collection of all possible linear combinations of the vectors on the left-hand side is called their span, and the equations have a solution just when the right-hand vector is within that span. If every vector within that span has exactly one expression as a linear combination of the given left-hand vectors, then any solution is unique. In any event, the span has a basis of linearly independent vectors that do guarantee exactly one expression; and the number of vectors in that basis (its dimension) cannot be larger than m or n, but it can be smaller. This is important because if we have m independent vectors a solution is guaranteed regardless of the right-hand side, and otherwise not guaranteed.

Matrix Equation

The vector equation is equivalent to a matrix equation of the form


$$Ax = b$$

where A is an m×n matrix, x is a column vector with n entries, and b is a column vector with m entries.

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \qquad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$

The number of vectors in a basis for the span is now expressed as the rank of the matrix.

Solution Set

A solution of a linear system is an assignment of values to the variables x1, x2, ..., xn such that each of the equations is satisfied. The set of all possible solutions is called the solution set.

The solution set for the equations x − y = −1 and 3x + y = 9 is the single point (2, 3).

A linear system may behave in any one of three possible ways:

1. The system has infinitely many solutions.

2. The system has a single unique solution.

3. The system has no solution.

Geometric Interpretation

For a system involving two variables (x and y), each linear equation determines a line on the xy- plane. Because a solution to a linear system must satisfy all of the equations, the solution set is the intersection of these lines, and is hence either a line, a single point, or the empty set.

For three variables, each linear equation determines a plane in three-dimensional space, and the solution set is the intersection of these planes. Thus the solution set may be a plane, a line, a single point, or the empty set. For example, as three parallel planes do not have a common point, the solution set of their equations is empty; the solution set of the equations of three planes intersecting at a point is a single point; if three planes pass through two points, their equations have at least two common solutions; in fact the solution set is infinite and consists of the entire line passing through these points.

For n variables, each linear equation determines a hyperplane in n-dimensional space. The solu- tion set is the intersection of these hyperplanes, which may be a flat of any dimension.

General Behavior

In general, the behavior of a linear system is determined by the relationship between the number of equations and the number of unknowns:

• Usually, a system with fewer equations than unknowns has infinitely many solutions, but it may have no solution. Such a system is known as an underdetermined system.

• Usually, a system with the same number of equations and unknowns has a single unique solution.

• Usually, a system with more equations than unknowns has no solution. Such a system is also known as an overdetermined system.

In the first case, the dimension of the solution set is usually equal to n − m, where n is the number of variables and m is the number of equations.

The following pictures illustrate this trichotomy in the case of two variables:

One equation Two equations Three equations

The first system has infinitely many solutions, namely all of the points on the blue line. The second system has a single unique solution, namely the intersection of the two lines. The third system has no solutions, since the three lines share no common point.

Keep in mind that the pictures above show only the most common case. It is possible for a system of two equations and two unknowns to have no solution (if the two lines are parallel), or for a system of three equations and two unknowns to be solvable (if the three lines intersect at a single point). In general, a system of linear equations may behave differently from expected if the equa- tions are linearly dependent, or if two or more of the equations are inconsistent.

Properties

Independence

The equations of a linear system are independent if none of the equations can be derived algebra- ically from the others. When the equations are independent, each equation contains new informa- tion about the variables, and removing any of the equations increases the size of the solution set.


For linear equations, logical independence is the same as linear independence.

The equations x − 2y = −1, 3x + 5y = 8, and 4x + 3y = 7 are linearly dependent.

For example, the equations
$$3x + 2y = 6 \quad\text{and}\quad 6x + 4y = 12$$

are not independent — they are the same equation when scaled by a factor of two, and they would produce identical graphs. This is an example of equivalence in a system of linear equations.

For a more complicated example, the equations
$$x - 2y = -1$$
$$3x + 5y = 8$$
$$4x + 3y = 7$$

are not independent, because the third equation is the sum of the other two. Indeed, any one of these equations can be derived from the other two, and any one of the equations can be removed without affecting the solution set. The graphs of these equations are three lines that intersect at a single point.

Consistency

The equations 3x + 2y = 6 and 3x + 2y = 12 are inconsistent.

A linear system is inconsistent if it has no solution, and otherwise it is said to be consistent. When the system is inconsistent, it is possible to derive a contradiction from the equations, that may always be rewritten as the statement 0 = 1. For example, the equations
$$3x + 2y = 6 \quad\text{and}\quad 3x + 2y = 12$$

are inconsistent. In fact, by subtracting the first equation from the second one and multiplying both sides of the result by 1/6, we get 0 = 1. The graphs of these equations on the xy-plane are a pair of parallel lines.

It is possible for three linear equations to be inconsistent, even though any two of them are consistent together. For example, the equations
$$x + y = 1$$
$$2x + y = 1$$
$$3x + 2y = 3$$

are inconsistent. Adding the first two equations together gives 3x + 2y = 2, which can be subtracted from the third equation to yield 0 = 1. Note that any two of these equations have a common solution. The same phenomenon can occur for any number of equations.

In general, inconsistencies occur if the left-hand sides of the equations in a system are linearly dependent, and the constant terms do not satisfy the dependence relation. A system of equations whose left-hand sides are linearly independent is always consistent. Putting it another way, according to the Rouché–Capelli theorem, any system of equations (overdetermined or otherwise) is inconsistent if the rank of the augmented matrix is greater than the rank of the coefficient matrix. If, on the other hand, the ranks of these two matrices are equal, the system must have at least one solution. The solution is unique if and only if the rank equals the number of variables. Otherwise the general solution has k free parameters, where k is the difference between the number of variables and the rank; hence in such a case there is an infinitude of solutions. The rank of a system of equations can never be higher than [the number of variables] + 1, which means that a system with any number of equations can always be reduced to a system that has a number of independent equations that is at most equal to [the number of variables] + 1.

Equivalence

Two linear systems using the same set of variables are equivalent if each of the equations in the second system can be derived algebraically from the equations in the first system, and vice versa. Two systems are equivalent if either both are inconsistent or each equation of each of them is a lin- ear combination of the equations of the other one. It follows that two linear systems are equivalent if and only if they have the same solution set.

Solving A Linear System

There are several algorithms for solving a system of linear equations.

Describing The Solution

When the solution set is finite, it is reduced to a single element. In this case, the unique solution is described by a sequence of equations whose left-hand sides are the names of the unknowns and right-hand sides are the corresponding values, for example (x = 3, y = −2, z = 6). When an order on the unknowns has been fixed, for example the alphabetical order, the solution may be described as a vector of values, like (3, −2, 6) for the previous example.

It can be difficult to describe a set with infinite solutions. Typically, some of the variables are designated as free (or independent, or as parameters), meaning that they are allowed to take any value, while the remaining variables are dependent on the values of the free variables.

For example, consider the following system:
$$x + 3y - 2z = 5$$
$$3x + 5y + 6z = 7$$

The solution set to this system can be described by the following equations:
$$x = -7z - 1 \quad\text{and}\quad y = 3z + 2.$$

Here z is the free variable, while x and y are dependent on z. Any point in the solution set can be obtained by first choosing a value for z, and then computing the corresponding values for x and y.

Each free variable gives the solution space one degree of freedom, the number of which is equal to the dimension of the solution set. For example, the solution set for the above equation is a line, since a point in the solution set can be chosen by specifying the value of the parameter z. An infinite solution of higher order may describe a plane, or higher-dimensional set.

Different choices for the free variables may lead to different descriptions of the same solution set. For example, the solution to the above equations can alternatively be described as follows:
$$y = -\tfrac{3}{7}x + \tfrac{11}{7} \quad\text{and}\quad z = -\tfrac{1}{7}x - \tfrac{1}{7}.$$

Here x is the free variable, and y and z are dependent.

Elimination of Variables

The simplest method for solving a system of linear equations is to repeatedly eliminate variables. This method can be described as follows:

1. In the first equation, solve for one of the variables in terms of the others.

2. Substitute this expression into the remaining equations. This yields a system of equations with one fewer equation and one fewer unknown.

3. Continue until you have reduced the system to a single linear equation.

4. Solve this equation, and then back-substitute until the entire solution is found.


For example, consider the following system:
$$x + 3y - 2z = 5$$
$$3x + 5y + 6z = 7$$
$$2x + 4y + 3z = 8$$

Solving the first equation for x gives x = 5 + 2z − 3y, and plugging this into the second and third equation yields
$$-4y + 12z = -8$$
$$-2y + 7z = -2$$

Solving the first of these equations for y yields y = 2 + 3z, and plugging this into the second equation yields z = 2. We now have:
$$x = 5 + 2z - 3y$$
$$y = 2 + 3z$$
$$z = 2$$

Substituting z = 2 into the second equation gives y = 8, and substituting z = 2 and y = 8 into the first equation yields x = −15. Therefore, the solution set is the single point (x, y, z) = (−15, 8, 2).
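A direct solver reproduces the same point; a minimal NumPy check of this 3×3 example (illustrative only):

    import numpy as np

    A = np.array([[1.0, 3.0, -2.0],
                  [3.0, 5.0,  6.0],
                  [2.0, 4.0,  3.0]])
    b = np.array([5.0, 7.0, 8.0])

    print(np.linalg.solve(A, b))   # [-15.   8.   2.]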

Row Reduction

In row reduction, the linear system is represented as an augmented matrix:

13− 25  3 5 6 7. 24 3 8

This matrix is then modified using elementary row operations until it reaches reduced row echelon form. There are three types of elementary row operations:

Type 1: Swap the positions of two rows.

Type 2: Multiply a row by a nonzero scalar.

Type 3: Add to one row a scalar multiple of another.

Because these operations are reversible, the augmented matrix produced always represents a lin- ear system that is equivalent to the original.

There are several specific algorithms to row-reduce an augmented matrix, the simplest of which are Gaussian elimination and Gauss-Jordan elimination. The following computation shows Gauss-Jordan elimination applied to the matrix above:

$$\left[\begin{array}{ccc|c} 1 & 3 & -2 & 5 \\ 3 & 5 & 6 & 7 \\ 2 & 4 & 3 & 8 \end{array}\right] \sim \left[\begin{array}{ccc|c} 1 & 3 & -2 & 5 \\ 0 & -4 & 12 & -8 \\ 2 & 4 & 3 & 8 \end{array}\right] \sim \left[\begin{array}{ccc|c} 1 & 3 & -2 & 5 \\ 0 & -4 & 12 & -8 \\ 0 & -2 & 7 & -2 \end{array}\right] \sim \left[\begin{array}{ccc|c} 1 & 3 & -2 & 5 \\ 0 & 1 & -3 & 2 \\ 0 & -2 & 7 & -2 \end{array}\right]$$
$$\sim \left[\begin{array}{ccc|c} 1 & 3 & -2 & 5 \\ 0 & 1 & -3 & 2 \\ 0 & 0 & 1 & 2 \end{array}\right] \sim \left[\begin{array}{ccc|c} 1 & 3 & -2 & 5 \\ 0 & 1 & 0 & 8 \\ 0 & 0 & 1 & 2 \end{array}\right] \sim \left[\begin{array}{ccc|c} 1 & 3 & 0 & 9 \\ 0 & 1 & 0 & 8 \\ 0 & 0 & 1 & 2 \end{array}\right] \sim \left[\begin{array}{ccc|c} 1 & 0 & 0 & -15 \\ 0 & 1 & 0 & 8 \\ 0 & 0 & 1 & 2 \end{array}\right].$$

The last matrix is in reduced row echelon form, and represents the system x = −15, y = 8, z = 2.

A comparison with the example in the previous section on the algebraic elimination of variables shows that these two methods are in fact the same; the difference lies in how the computations are written down.

Cramer’s Rule

Cramer’s rule is an explicit formula for the solution of a system of linear equations, with each variable given by a quotient of two determinants. For example, the solution to the system
$$x + 3y - 2z = 5$$
$$3x + 5y + 6z = 7$$
$$2x + 4y + 3z = 8$$
is given by

$$x = \frac{\begin{vmatrix} 5 & 3 & -2 \\ 7 & 5 & 6 \\ 8 & 4 & 3 \end{vmatrix}}{\begin{vmatrix} 1 & 3 & -2 \\ 3 & 5 & 6 \\ 2 & 4 & 3 \end{vmatrix}}, \qquad y = \frac{\begin{vmatrix} 1 & 5 & -2 \\ 3 & 7 & 6 \\ 2 & 8 & 3 \end{vmatrix}}{\begin{vmatrix} 1 & 3 & -2 \\ 3 & 5 & 6 \\ 2 & 4 & 3 \end{vmatrix}}, \qquad z = \frac{\begin{vmatrix} 1 & 3 & 5 \\ 3 & 5 & 7 \\ 2 & 4 & 8 \end{vmatrix}}{\begin{vmatrix} 1 & 3 & -2 \\ 3 & 5 & 6 \\ 2 & 4 & 3 \end{vmatrix}}.$$

For each variable, the denominator is the determinant of the matrix of coefficients, while the nu- merator is the determinant of a matrix in which one column has been replaced by the vector of constant terms.

Though Cramer’s rule is important theoretically, it has little practical value for large matrices, since the computation of large determinants is somewhat cumbersome. (Indeed, large determinants are most easily computed using row reduction.) Further, Cramer’s rule has very poor numerical properties, making it unsuitable for solving even small systems reliably, unless the operations are performed in rational arithmetic with unbounded precision.
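For small systems, however, Cramer's rule is direct to code. The sketch below is illustrative only (the function name is ours, and the determinants are computed with numpy.linalg.det, which is itself based on an LU factorization).

    import numpy as np

    def cramer(A, b):
        # Solve Ax = b for square, nonsingular A by Cramer's rule.
        det_A = np.linalg.det(A)
        x = np.empty(len(b))
        for i in range(len(b)):
            Ai = A.copy()
            Ai[:, i] = b                 # replace column i by the constant terms
            x[i] = np.linalg.det(Ai) / det_A
        return x

    A = np.array([[1.0, 3.0, -2.0],
                  [3.0, 5.0,  6.0],
                  [2.0, 4.0,  3.0]])
    b = np.array([5.0, 7.0, 8.0])
    print(cramer(A, b))          # [-15.   8.   2.]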

Matrix Solution

If the equation system is expressed in the matrix form Ax = b, the entire solution set can also be expressed in matrix form. If the matrix A is square (has m rows and n = m columns) and has full rank (all m rows are independent), then the system has a unique solution given by

$$x = A^{-1} b$$

where A^{−1} is the inverse of A. More generally, regardless of whether m = n or not and regardless of the rank of A, all solutions (if any exist) are given using the Moore–Penrose pseudoinverse of A, denoted A^g, as follows:

$$x = A^{g} b + (I - A^{g} A) w$$

where w is a vector of free parameters that ranges over all possible n×1 vectors. A necessary and sufficient condition for any solution(s) to exist is that the potential solution obtained using w = 0 satisfy Ax = b — that is, that A A^g b = b. If this condition does not hold, the equation system is inconsistent and has no solution. If the condition holds, the system is consistent and at least one solution exists. For example, in the above-mentioned case in which A is square and of full rank, A^g simply equals A^{−1} and the general solution equation simplifies to
$$x = A^{-1} b + (I - A^{-1} A) w = A^{-1} b + (I - I) w = A^{-1} b,$$
as previously stated, where w has completely dropped out of the solution, leaving only a single solution. In other cases, though, w remains and hence an infinitude of potential values of the free parameter vector w give an infinitude of solutions of the equation.
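The pseudoinverse formula can be exercised directly. The sketch below is a minimal illustration using numpy.linalg.pinv; the single-equation underdetermined system and the random choices of w are arbitrary.

    import numpy as np

    # One equation, three unknowns: the underdetermined system x + 3y - 2z = 5.
    A = np.array([[1.0, 3.0, -2.0]])
    b = np.array([5.0])

    Ag = np.linalg.pinv(A)                       # Moore-Penrose pseudoinverse
    print(np.allclose(A @ (Ag @ b), b))          # consistency check: A A^g b = b

    rng = np.random.default_rng(7)
    for _ in range(3):
        w = rng.normal(size=3)                   # free parameter vector
        x = Ag @ b + (np.eye(3) - Ag @ A) @ w    # general solution formula
        print(A @ x)                             # always [5.], i.e. a valid solution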

Other Methods

While systems of three or four equations can be readily solved by hand, computers are often used for larger systems. The standard algorithm for solving a system of linear equations is based on Gaussian elimination with some modifications. Firstly, it is essential to avoid division by small numbers, which may lead to inaccurate results. This can be done by reordering the equations if necessary, a process known as pivoting. Secondly, the algorithm does not exactly do Gaussian elimination, but it computes the LU decomposition of the matrix A. This is mostly an organizational tool, but it is much quicker if one has to solve several systems with the same matrix A but different vectors b.

If the matrix A has some special structure, this can be exploited to obtain faster or more accurate algorithms. For instance, systems with a symmetric positive definite matrix can be solved twice as fast with the Cholesky decomposition. Levinson recursion is a fast method for Toeplitz matrices. Special methods exist also for matrices with many zero elements (so-called sparse matrices), which appear often in applications.

A completely different approach is often taken for very large systems, which would otherwise take too much time or memory. The idea is to start with an initial approximation to the solution (which does not have to be accurate at all), and to change this approximation in several steps to bring it closer to the true solution. Once the approximation is sufficiently accurate, this is taken to be the solution to the system. This leads to the class of iterative methods.


Homogeneous Systems

A system of linear equations is homogeneous if all of the constant terms are zero:

$$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = 0$$
$$a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = 0$$
$$\vdots$$
$$a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = 0.$$

A homogeneous system is equivalent to a matrix equation of the form Ax = 0, where A is an m × n matrix, x is a column vector with n entries, and 0 is the zero vector with m entries.

Solution Set

Every homogeneous system has at least one solution, known as the zero solution (or trivial solution), which is obtained by assigning the value of zero to each of the variables. If the system has a non-singular matrix (det(A) ≠ 0) then it is also the only solution. If the system has a singular matrix then there is a solution set with an infinite number of solutions. This solution set has the following additional properties:

1. If u and v are two vectors representing solutions to a homogeneous system, then the vector sum u + v is also a solution to the system.

2. If u is a vector representing a solution to a homogeneous system, and r is any scalar, then ru is also a solution to the system.

These are exactly the properties required for the solution set to be a linear subspace of R^n. In particular, the solution set to a homogeneous system is the same as the null space of the corresponding matrix A. Numerical solutions to a homogeneous system can be found with an SVD.
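The null-space computation mentioned above can be done with an SVD; the sketch below is illustrative only (the matrix is arbitrary and the rank cutoff is a simple tolerance on the singular values).

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])          # rank 1, so the null space is 2-D

    U, s, Vh = np.linalg.svd(A)
    tol = 1e-10
    rank = int(np.sum(s > tol))
    null_basis = Vh[rank:, :]                # rows of V* spanning the null space

    print(np.allclose(A @ null_basis.T, 0))  # True
    x = null_basis.T @ np.array([1.0, -2.0]) # any combination is a solution
    print(np.allclose(A @ x, 0))             # True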

Relation to Nonhomogeneous Systems

There is a close relationship between the solutions to a linear system and the solutions to the corresponding homogeneous system: Ax = b and Ax = 0. Specifically, if p is any specific solution to the linear system Ax = b, then the entire solution set can be described as

$$\{\, p + v : v \text{ is any solution to } Ax = 0 \,\}.$$

Geometrically, this says that the solution set for Ax = b is a translation of the solution set for Ax = 0. Specifically, the flat for the first system can be obtained by translating the linear subspace for the homogeneous system by the vector p.

This reasoning only applies if the system Ax = b has at least one solution. This occurs if and only if the vector b lies in the image of the linear transformation A.


Various Numerical Analysis Softwares

This chapter lists the numerical analysis softwares; some of these softwares are TK Solver, LAPACK, DataMelt, Analytica and GNU Octave. TK Solver is a mathematical modeling software based on a declarative, rule-based language. Analytica is a software developed by Lumina Decision Systems for creating and analyzing quantitative decision models. The section serves as a source to understand the numerical analysis softwares.

List of Numerical Analysis Software

Numerical Software Packages

• TK Solver is a mathematical modeling and problem solving software system based on a declarative, rule-based language, commercialized by Universal Technical Systems, Inc.

• DataMelt (or DMelt) is a free math software for numerical computation and 2D/3D visual- ization. Supports Java, Python/Jython, BeanShell, JRuby and Apache Groovy.

• Analytica is a widely used proprietary tool for building and analyzing numerical models. It is a declarative and visual programming language based on influence diagrams.

• MATLAB is a widely used program for performing numerical calculations. It comes with its own programming language, in which numerical algorithms can be implemented.

• GNU Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with MATLAB. Octave includes an experimental GUI as of Version 3.8, released December 31, 2013. A number of independently developed Linux programs (Cantor, KAlgebra) also offer GUI front-ends to Octave. An active community provides technical support to users.

• Plotly – Plotting library, Python command line, and graphical interface for analyzing data and creating browser-based graphs. Available for R, Python, MATLAB, Julia, and Perl.

• Julia (programming language) is a new high-level dynamic language with a surface similarity to MATLAB.

• FlexPro is a program for data analysis and presentation of measurement data. It provides a rich Excel-like user interface and its built-in vector programming language FPScript has a syntax similar to MATLAB.


• Scilab is an advanced numerical analysis package similar to MATLAB or Octave. It comes with a complete GUI and Xcos, which is an alternative to Simulink. (free software, GPL-compatible CeCILL license)

• is a deep learning library with support for manipulation, statistical analysis and presentation of Tensors.

• LAPACK provides Fortran 90 routines for solving systems of simultaneous linear equa- tions, least-squares solutions of linear systems of equations, eigenvalue problems, and sin- gular value problems and the associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, and generalized Schur).

• ScaLAPACK is a library of high-performance linear algebra routines for parallel distributed memory machines that features functionality similar to LAPACK (solvers for dense and banded linear systems, least squares problems, eigenvalue problems, and singular value problem).

• NAG Library is an extensive software library of highly optimized numerical analysis rou- tines for various programming environments.

• FreeMat, an open-source MATLAB-like environment with a GPL license.

• Rlab is another free software computer program which bears a strong resemblance to MATLAB. Rlab development ceased for several years but it was revived as RlabPlus.

• Sysquake is a computing environment with interactive graphics for mathematics, physics and engineering. Like other applications from Calerga, it is based on a MATLAB-compat- ible language.

• LabVIEW offers both textual and graphical programming approaches to numerical analy- sis. Its text-based programming language MathScript uses .m file script syntax providing some compatibility with MATLAB and its clones.

• O-Matrix

• jLab, a research platform for building an open source MATLAB-like environment in pure Java and Groovy. Currently supports interpreted j-Scripts (MATLAB-like) and compiled GroovySci (extension to Groovy) scripts that provides direct interfacing to Java code and scripting access to many popular Java scientific libraries (e.g. and JSci ) and appli- cation Wizards. (Project Page: )

• pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

• SmartXML, a free programming language with integrated development environment (IDE) for mathematical calculations. SmartXML supports calculations involving big numbers, which can have up to 100,000,000 decimal digits and up to 100,000,000 whole digits. SmartXML offers a context aware code completion, code details tool tips, automatic error check, and many other user friendly features.


Add-ons:

• Jacket, A proprietary GPU Toolbox for MATLAB, enabling some MATLAB computations to be offloaded to the GPU for acceleration and data visualization purposes.

• XLfit, A plugin to Excel which provides curve fitting and statistical analysis.

General-purpose Computer Algebra Systems

• Macsyma, a general-purpose computer algebra system, which has a free GPL-licensed version called Maxima.

• Maple, a general-purpose commercial mathematics software package.

offers a WYSIWYG interface and the ability to generate publication-quality math- ematical equations.

• Mathematica offers numerical evaluation, optimization and visualization of a very wide range of numerical functions. It also includes a programming language and computer al- gebra capabilities.

• PARI/GP is a widely used computer algebra system designed for fast computations in number theory (factorizations, algebraic number theory, elliptic curves...), but it also contains a large number of other useful functions to compute with mathematical entities such as matrices, polynomials, power series, algebraic numbers etc., and a lot of transcendental functions. PARI is also available as a C library to allow for faster computations.

• SageMath is an open-source math software, with a unified Python interface which is available as a text interface or a graphical web-based one. Includes interfaces for open-source and proprietary general purpose CAS, and other numerical analysis programs, like PARI/GP, GAP, , Magma, and Maple.

• Speakeasy is an interactive numerical environment also featuring an interpreted program- ming language. Born in the mid ‘60s for matrix manipulation and still in continuous evo- lution, it pioneered the most common paradigms of this kind of tools, featuring dynamic typing of the structured data objects, dynamic allocation and garbage collection, operators overloading, dynamic linking of compiled or interpreted additional modules contributed by the community of the users and so on.

• Trilinos is a collection of open source, object-oriented libraries for use in scientific and engineering applications. Trilinos is based on scalable, parallel linear algebra algorithms.

Interface-oriented

• Baudline is a time-frequency browser for numerical signals analysis and scientific visualization.

• COMSOL Multiphysics is a finite element analysis, solver and Simulation software / FEA Software package for various physics and engineering applications, especially coupled phe- nomena, or multiphysics.


is provided by NIST.

• DADiSP is a commercial program focused on DSP that combines the numerical capability of MATLAB with a spreadsheet like interface.

• EJS is an open source software tool, written in Java, for generating simulations.

• Euler Mathematical Toolbox is a powerful numerical laboratory with a programming lan- guage that can handle real, complex and interval numbers, vectors and matrices. It can produce 2D/3D plots.

• DataMelt is a free data-analysis framework written in Java which uses Jython for scripting (although other Java scripting can also be used)

• FEATool is an easy to use multiphysics GUI toolbox for Matlab and Octave to solve PDEs with the Finite Element Method.

• FEniCS Project is a collection of projects for automated solution of PDEs.

• Hermes is a C++ library of advanced adaptive finite element algorithms to solve PDEs and multiphysics coupled problems.

• Fityk is a curve fitting and data analysis program. Primarily used for peak fitting and ana- lyzing peak data.

• FlexPro is a commercial program for interactive and automated analysis and presentation of mainly measurement data. It supports many binary instrument data formats and has its own vectorized programming language.

• IGOR Pro, a software package with emphasis on , image analysis, and curve fit- ting. It comes with its own programming language and can be used interactively.

• LabPlot is a data analysis and visualization application built on the KDE Platform.

• MCSim a Monte Carlo simulation tool.

• Origin, a software package that is widely used for making scientific graphs. It comes with its own C/C++ compiler that conforms quite closely to ANSI standard.

• PAW is a free data analysis package developed at CERN.

• SPSS, an application for statistical analysis.

• QtiPlot is a data analysis and scientific visualisation program, similar to Origin.

• ROOT is a free object oriented multipurpose data analysis package, developed at CERN.

is a free software that provides a generic platform for pre- and post-processing for numerical simulation.

• Shogun, an open source Large Scale Machine Learning toolbox that provides several SVM implementations (like libSVM, SVMlight) under a common framework and interfaces to Octave, MATLAB, Python, R

• Waffles is a free-software collection of command-line tools designed for scripting machine learning operations in automated experiments and processes.

• Weka is a suite of machine learning software written at the University of Waikato.

Language-oriented

• acslX is a software application for modeling and evaluating the performance of continuous systems described by time-dependent, nonlinear differential equations.

• ADMB is a software suite for non-linear statistical modeling based on C++ which uses au- tomatic differentiation.

• AMPL is a mathematical modeling language for describing and solving high complexity problems for large scale optimization.

• Ch, a commercial C/C++ based interpreted language with computational array for scientif- ic numerical computation and visualization.

• APMonitor: APMonitor is a mathematical modeling language for describing and solving representations of physical systems in the form of differential and algebraic equations.

• Armadillo is C++ template library for linear algebra; includes various decompositions, fac- torisations, and statistics functions; its syntax (API) is similar to MATLAB.

• DataMelt is a scientific package which uses Jython to call numerical and graphical libraries written in Java.

• Julia is designed with cloud parallel scientific computing in mind, using an LLVM-based JIT as a backend. Lightweight “green” threading (coroutines). Direct calls of C functions from code (no wrappers or special APIs needed), support for Unicode. Powerful shell-like capabilities for managing other processes. Lisp-like macros and other metaprogramming facilities.

• ELKI a software framework for development of data mining algorithms in Java.

• GAUSS, a matrix programming language for mathematics and statistics.

• GNU Data Language, a free compiler designed as a drop-in replacement for IDL.

• IDL, a commercial interpreted language based on FORTRAN with some vectorization. Widely used in the solar physics, fusion, atmospheric sciences and medical communities. The GNU Data Language is a free alternative.

• ILNumerics.Net, a C# math library that brings numeric computing functions for science, engineering and financial analysis to the .NET Framework.

• KPP generates Fortran 90, FORTRAN 77, C, or Matlab code for the integration of ordinary differential equations (ODEs) resulting from chemical reaction mechanisms.


• Madagascar, an open-source software package for multidimensional data analysis and re- producible computational experiments.

• MLPACK is an open-source library for machine learning, providing a simple and consis- tent API, while exploiting C++ language features to provide maximum performance and flexibility

• NCAR Command Language is an interpreted language designed specifically for scientific data analysis and visualization.

• O-Matrix - a matrix programming language for mathematics, engineering, science, and financial analysis.

• OptimJ is a mathematical Java-based modeling language for describing and solving high complexity problems for large scale optimization.

• Perl Data Language, also known as PDL, an array extension to Perl ver.5, used for data manipulation, statistics, numerical simulation and visualization.

• R is a widely used system with a focus on data manipulation and statistics which imple- ments the S language. Many add-on packages are available (free software, GNU GPL li- cense).

• SAS, a system of software products for statistics

• VisSim is a visual block diagram language for simulation of nonlinear dynamic systems and model based embedded development. Its fast ODE engine supports real-time simulation of complex large scale models. The highly efficient fixed point code generator allows targeting of low cost fixed-point embedded processors.

• Wolfram Language, which is used within many Wolfram technologies such as Mathematica and the Wolfram Cloud

• World Programming System (WPS), supports the SAS language for statistics

• Yorick is an interpreted programming language designed for numerics, graph plotting and simulation.

• SmartXML is a compiled language. A program written in the SmartXML language is compiled into C# code and is executed by the SmartXML integrated development environment (IDE).

Historically Significant

• Expensive Desk Calculator written for the TX-0 and PDP-1 in the late 1950s or early 1960s.

• S is an (array-based) programming language with strong numerical support. The R language is a continuation of the S project.


TK Solver

TK Solver (originally TK!Solver) is a mathematical modeling and problem solving software system based on a declarative, rule-based language, commercialized by Universal Technical Systems, Inc.

History

Invented by Milos Konopasek in the late 1970s and initially developed in 1982 by Software Arts, the company behind VisiCalc, TK Solver was acquired by Universal Technical Systems in 1984 after Software Arts fell into financial difficulty and was sold to Lotus Software. Konopasek's goal in inventing the TK Solver concept was to create a problem solving environment in which a given mathematical model built to solve a specific problem could be used to solve related problems (with a redistribution of input and output variables) with minimal or no additional programming required: once a user enters an equation, TK Solver can evaluate that equation as is, without isolating unknown variables on one side of the equals sign.

Core Technology

TK Solver's core technologies are a declarative programming language, an algebraic equation solver, an iterative equation solver, and a structured, object-based interface. The interface comprises nine classes of objects that can be shared between and merged into other TK files:
• Rules: equations, formulas and function calls, which may include logical conditions
• Variables: a listing of the variables that are used in the rules, along with values (numeric or non-numeric) that have been entered by the user or calculated by the software
• Units: all unit conversion factors, in a single location, to allow automatic update of values when units are changed
• Lists: ranges of numeric and non-numeric values which can be associated with a variable or processed directly by procedure functions
• Tables: collections of lists displayed together
• Plots: line charts, scatter plots, bar charts, and pie charts
• Functions: rule-based, table look-up, and components
• Formats: settings for displaying numeric and string values
• Comments: for explanation and documentation

Each class of object is listed and stored on its own worksheet—the Rule Sheet, Variable Sheet, Unit Sheet, etc. Within each worksheet, each object has properties summarized on subsheets or viewed in a property window. The interface uses toolbars and a hierarchical navigation bar that resembles the directory tree seen on the left side of the Windows Explorer.

The declarative programming structure is embodied in the rules, functions and variables that form the core of a mathematical model.


Rules, Variables and Units

All rules are entered in the Rule Sheet or in user-defined functions. Unlike a spreadsheet or imperative programming environment, the rules can be in any order or sequence and are not expressed as assignment statements. "A + B = C / D" is a valid rule in TK Solver and can be solved for any of its four variables. Rules can be added and removed as needed in the Rule Sheet without regard for their order and incorporated into other models. A TK Solver model can include up to 32,000 rules, and the library that ships with the current version includes utilities for higher mathematics, statistics, engineering and science, finances, and programming. Variables in a rule are automatically posted to the Variable Sheet when the rule is entered, and the rule is displayed in mathematical format in the MathLook View window at the bottom of the screen. Any variable can operate as an input or an output, and the model will be solved for the output variables depending on the choice of inputs. A database of unit conversion factors also ships with TK Solver, and users can add, delete, or import unit conversions in a way similar to that for rules. Each variable is associated with a "calculation" unit, but variables can also be assigned "display" units, and TK automatically converts the values. For example, rules may be based upon meters and kilograms, but units of inches and pounds can be used for input and output.
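
TK Solver's rule language itself is proprietary and is not reproduced here. As a rough sketch of the same idea in Python, using the sympy library as an assumed stand-in (not TK Solver's engine), a single declarative relation can be solved for whichever of its variables happens to be unknown:

# Illustrative sketch only: solving the declarative rule "A + B = C / D"
# for any of its variables with sympy (not TK Solver itself).
from sympy import symbols, Eq, solve

A, B, C, D = symbols("A B C D")
rule = Eq(A + B, C / D)                          # the rule, stated once, with no fixed output

print(solve(rule, D))                            # treat D as the unknown: [C/(A + B)]
print(solve(rule.subs({A: 2, B: 3, D: 4}), C))   # same rule, C as the unknown: [20]

The same relation serves every choice of inputs and outputs, which mirrors the behaviour described above.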

Problem-solving

TK Solver has three ways of solving systems of equations. The "direct solver" solves a system algebraically by the principle of consecutive substitution. When multiple rules contain multiple unknowns, the program can trigger an iterative solver which uses the Newton–Raphson algorithm to successively approximate based on initial guesses for one or more of the output variables. Procedure functions can also be used to solve systems of equations. Libraries of such procedures are included with the program and can be merged into files as needed. A list solver feature allows variables to be associated with ranges of data or probability distributions, solving for multiple values, which is useful for generating tables and plots and for running Monte Carlo simulations. The premium version now also includes a "Solution Optimizer" for direct setting of bounds and constraints in solving models for minimum, maximum, or specific conditions.
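
TK Solver's iterative solver itself is not shown here; the general Newton–Raphson scheme described above can be sketched with NumPy for an invented two-rule, two-unknown system (all numbers and names are illustrative only):

# Minimal Newton–Raphson sketch for two rules with two unknowns (illustrative only).
import numpy as np

def residuals(v):
    x, y = v
    # Two rules: x^2 + y^2 = 4 and x*y = 1, written as expressions that should equal zero
    return np.array([x**2 + y**2 - 4.0, x * y - 1.0])

def jacobian(v):
    x, y = v
    return np.array([[2.0 * x, 2.0 * y],
                     [y,       x]])

v = np.array([2.0, 0.5])                  # initial guesses, as the iterative solver requires
for _ in range(20):
    step = np.linalg.solve(jacobian(v), -residuals(v))
    v += step
    if np.linalg.norm(step) < 1e-12:      # stop once the update is negligible
        break
print(v, residuals(v))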

TK Solver includes roughly 150 built-in functions: mathematical, trigonometric, Boolean, numerical calculus, database access, and programming functions, including string handling and calls to externally compiled routines. Users may also define three types of functions: declarative rule functions; list functions, for table lookups and other operations involving pairs of lists; and procedure functions, for loops and other procedural operations which may also process or result in arrays (lists of lists). The complete NIST database of thermodynamic and transport properties is included, with built-in functions for accessing it. TK Solver is also the platform for engineering applications marketed by UTS, including Advanced Spring Design, Integrated Gear Software, Interactive Roark's Formulas, Heat Transfer on TK, and Dynamics and Vibration Analysis.

Data Display and Sharing

Tables, plots, comments, and the MathLook notation display tool can be used to enrich TK Solver models. Models can be linked to other components with Microsoft and .NET tools, or they can be web-enabled using the RuleMaster product or linked with Excel using the Excel Toolkit product. There is also a DesignLink option linking TK Solver models with CAD drawings and solid models. In the premium version, standalone models can be shared with others who do not have a TK license, opening them in Excel or the free TK Player.

LAPACK

LAPACK (Linear Algebra Package) is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition. LAPACK was originally written in FORTRAN 77, but moved to Fortran 90 in version 3.2 (2008). The routines handle both real and complex matrices in both single and double precision.
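
As a small illustration (not part of LAPACK itself), the same factorizations and solvers can be reached from Python through SciPy, which wraps LAPACK; the matrix below is invented:

# LAPACK-backed factorizations via SciPy (scipy.linalg wraps LAPACK routines).
import numpy as np
from scipy import linalg

A = np.array([[4.0, 2.0], [2.0, 3.0]])       # symmetric positive definite example
b = np.array([1.0, 2.0])

P, L, U = linalg.lu(A)                       # LU factorization with partial pivoting
Q, R = linalg.qr(A)                          # QR factorization
C = linalg.cholesky(A)                       # Cholesky factor (upper triangular by default)
U_, s, Vt = linalg.svd(A)                    # singular value decomposition

x = linalg.solve(A, b)                       # linear system solve
w, V = linalg.eigh(A)                        # symmetric eigenvalue problem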

LAPACK was designed as the successor to the linear equations and linear least-squares routines of LINPACK and the eigenvalue routines of EISPACK. LINPACK, written in the 1970s and 1980s, was designed to run on the then-modern vector computers with shared memory. LAPACK, in contrast, was designed to effectively exploit the caches on modern cache-based architectures, and thus can run orders of magnitude faster than LINPACK on such machines, given a well-tuned BLAS implementation. LAPACK has also been extended to run on distributed-memory systems in later packages such as ScaLAPACK and PLAPACK.

LAPACK is licensed under a three-clause BSD style license, a permissive free software license with few restrictions.

Naming Scheme

Subroutines in LAPACK have a characteristic naming convention which makes the identifiers short but rather obscure. This was necessary because the first Fortran standards only supported identifiers up to six characters long, so the names had to be shortened to fit into this limit.

A LAPACK subroutine name is in the form pmmaaa, where:

• p is a one-letter code denoting the type of numerical constants used. S, D stand for real floating point arithmetic respectively in single and double precision, while C and Z stand for complex arithmetic with respectively single and double precision. The newer version, LAPACK95, uses generic subroutines in order to overcome the need to explicitly specify the data type.

• mm is a two-letter code denoting the kind of matrix expected by the algorithm. The codes for the different kinds of matrices are reported below; the actual data are stored in a different format depending on the specific kind; e.g., when the code DI is given, the subroutine expects a vector of length n containing the elements on the diagonal, while when the code GE is given, the subroutine expects an n×n array containing the entries of the matrix.

• aaa is a one- to three-letter code describing the actual algorithm implemented in the subroutine; e.g., SV denotes a subroutine to solve a linear system, while R denotes a rank-1 update.


For example, the subroutine to solve a linear system with a general (non-structured) matrix using real double-precision arithmetic is called DGESV.
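
As an illustrative aside, this routine can be called from Python through SciPy's low-level LAPACK wrappers (an assumption about the reader's environment, not part of LAPACK itself); the numbers are invented:

# Calling LAPACK's DGESV (Double precision, GEneral matrix, SolVe) via SciPy.
import numpy as np
from scipy.linalg import lapack

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

lu, piv, x, info = lapack.dgesv(A, b)   # LU factors, pivot indices, solution, status
print(x)                                # solution of A x = b -> [2. 3.]
print(info)                             # 0 means successful exit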

Matrix types in the LAPACK naming scheme
• BD: Bidiagonal matrix
• DI: Diagonal matrix
• GB: General band matrix
• GE: General matrix (i.e., unsymmetric, in some cases rectangular)
• GG: General matrices, generalized problem (i.e., a pair of general matrices)
• GT: General tridiagonal matrix
• HB: (complex) Hermitian band matrix
• HE: (complex) Hermitian matrix
• HG: Upper Hessenberg matrix, generalized problem (i.e., a Hessenberg and a triangular matrix)
• HP: (complex) Hermitian matrix, packed storage
• HS: Upper Hessenberg matrix
• OP: (real) Orthogonal matrix, packed storage
• OR: (real) Orthogonal matrix
• PB: Symmetric or Hermitian positive definite band matrix
• PO: Symmetric or Hermitian positive definite matrix
• PP: Symmetric or Hermitian positive definite matrix, packed storage
• PT: Symmetric or Hermitian positive definite tridiagonal matrix
• SB: (real) Symmetric band matrix
• SP: Symmetric matrix, packed storage
• ST: (real) Symmetric tridiagonal matrix
• SY: Symmetric matrix
• TB: Triangular band matrix
• TG: Triangular matrices, generalized problem (i.e., a pair of triangular matrices)
• TP: Triangular matrix, packed storage
• TR: Triangular matrix (or in some cases quasi-triangular)
• TZ: Trapezoidal matrix
• UN: (complex) Unitary matrix
• UP: (complex) Unitary matrix, packed storage

Details on this scheme can be found in the Naming scheme section in LAPACK Users’ Guide.

Use With Other Programming Languages

Many programming environments today support the use of libraries with C binding. The LAPACK routines can be used like C functions if a few restrictions are observed.

Several alternative language bindings are also available:

• LAPACK++ for C++


• Armadillo for C++

• IT++ for C++

• Lacaml for OCaml

DataMelt

DataMelt (or, in short, DMelt), a computation and visualization environment, is an interactive framework for scientific computation, data analysis and data visualization designed for scientists, engineers and students. DataMelt is multiplatform since it is written in Java; it runs anywhere the Java virtual machine can be installed.

The program is designed for interactive scientific plots in 2D and 3D and contains numerical scientific libraries implemented in Java for mathematical functions, random numbers, statistical analysis, curve fitting and other data mining algorithms. DataMelt uses high-level programming languages, such as Jython, Groovy and JRuby, but Java coding can also be used to call DataMelt numerical and graphical libraries.

DataMelt is an attempt to create a data-analysis environment using open-source packages with a coherent user interface and tools competitive with commercial programs. The idea behind the project is to incorporate open-source mathematical and numerical software packages with GUI-type user interfaces into a coherent program in which the main user interface is based on short-named Java/Python classes. This was required to build an analysis environment using the Java scripting concept. A typical example is shown below.

Scripts and Java code (in the case of Java programming) can be run either in the GUI editor of DataMelt or as batch programs. The graphical libraries of DataMelt can be used to create applets. All charts (or "Canvases") used for data representation can be embedded into Web browsers.

DataMelt can be used wherever an analysis of large numerical data volumes, data mining, statistical data analysis and mathematics are essential. The program can be used in natural sciences, engineering, and modeling and analysis of financial markets. While the program falls into the category of open source software, it is not completely free for commercial usage: no source code is available on the home page, and all documentation and even bug reporting require "membership".

Overview

DataMelt has several features useful for data analysis:

• uses Jython, BeanShell, Groovy, JRuby scripting, or the standard Java. The GNU Octave mode is also available for symbolic calculations;

• can be integrated with the Web in the form of applets or Java Web-start applications; thus it is suited for a distributed analysis environment via the Internet;


• DataMelt is designed from the ground up to support programming with multiple threads;

• has a full-featured IDE with syntax highlighting, syntax checker, code completion and analyser. It includes a version of IDE for small-screen devices;

• includes a help system with a code completion based on the Java reflection technology;

• uses a platform-neutral I/O based on Google’s Protocol Buffers. Data can be written in C++ and analyzed using Java/Jython.

• supports several databases (object databases and SQL-based databases);

• has a browser for serialized objects and objects created using Google Protocol Buffers;

• includes packages for statistical calculations;

• error (uncertainty) propagation using a linear expansion or a Monte Carlo approach for arbitrary functions (a sketch of both approaches follows this list);

• symbolic calculations similar to those found in the GNU Octave project or MATLAB, but rewritten in Java (jMathLab project).
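
DataMelt's own propagation classes are not reproduced here. The following library-agnostic Python sketch illustrates the two approaches named in the list above, linear expansion versus Monte Carlo, for an invented function f(x, y) = x*y:

# Sketch of error propagation for f(x, y) = x*y with uncertain inputs
# (generic NumPy illustration, not DataMelt's API).
import numpy as np

x, sx = 10.0, 0.5     # mean and standard deviation of x
y, sy = 4.0, 0.2      # mean and standard deviation of y
f = lambda a, b: a * b

# Linear (first-order Taylor) expansion: sigma_f^2 = (df/dx*sx)^2 + (df/dy*sy)^2
dfdx, dfdy = y, x                        # partial derivatives of x*y
sf_linear = np.hypot(dfdx * sx, dfdy * sy)

# Monte Carlo: sample the inputs, push them through f, look at the spread
rng = np.random.default_rng(0)
samples = f(rng.normal(x, sx, 100000), rng.normal(y, sy, 100000))
sf_mc = samples.std()

print(sf_linear, sf_mc)   # the two estimates agree closely for this smooth function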

Data-analysis Features

The package supports several mathematical, data-analysis and data mining features:

• 2D and 3D interactive visualization of data, functions, histograms, charts.

• analytic calculations using Matlab or Octave syntax

• histograms in 2D and 3D, as well as profile histograms

• random numbers and statistical samples

• functions, including parametric equations in 3D

• contour plots, scatter plots

• neural networks

• linear regression and curve fitting using several minimization techniques (a generic curve-fitting sketch follows this list)

• cluster analysis (K-means clustering, single and multi-pass; the fuzzy C-means algorithm; agglomerative clustering)

• input/output for all data objects (arrays, functions, histograms) is based on Java serialization. There is also support for I/O from/to C++ and other languages using Google's Protocol Buffers format. Several databases are supported (Java-object databases and SQL-based)

• Cellular automaton

• output to high-quality vector graphics. Support for PostScript, EPS, PDF and raster formats
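
The fitting classes bundled with DataMelt are not shown here. As a generic illustration of the least-squares curve fitting referenced in the list above, the following sketch uses SciPy as an assumed stand-in, with an invented Gaussian data set:

# Generic curve-fitting sketch: least-squares fit of a Gaussian to noisy synthetic data
# (SciPy illustration, not DataMelt's fitting API).
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amplitude, mean, sigma):
    return amplitude * np.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
x = np.linspace(-5.0, 5.0, 200)
y = gaussian(x, 3.0, 0.5, 1.2) + rng.normal(0.0, 0.1, x.size)   # synthetic data

popt, pcov = curve_fit(gaussian, x, y, p0=[1.0, 0.0, 1.0])      # p0 holds the initial guesses
perr = np.sqrt(np.diag(pcov))                                   # rough parameter uncertainties
print(popt, perr)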


Symbolic and Numeric Calculations

Symbolic calculations use the GNU Octave scripting language. The following methods are available:

• solving systems of polynomial equations

• vectors and matrix algebra

• Factorization

• derivatives

• integrals (rational functions)

• boolean algebra

• simplification

• geometric algebra

Input and Output

DataMelt includes the native Java and Python methods for file input and output. In addition, it allows writing data in the following formats:

• The HFile format based on Java serialization. Optionally, compression and XML serialization are supported. Data can be written sequentially or using key-value maps.

• The PFile format based on the Protocol Buffers engine for multiplatform input output

• The HBook format, which is a simplified XML format to write large data structures without XML tags

• Arbitrary Java objects can be written into object databases with the file system as a back-end. This allows writing large data collections to files which normally do not fit into the computer memory.

• Several SQL database engines are included as external packages

• AIDA (computing) file format (read only)

• ROOT file format (read only)

Data stored in external files can be viewed using browsers for convenient visualization.

History

DataMelt has its roots in particle physics, where data mining is a primary task. It was created as the jHepWork project in 2005 and was initially written for data analysis in particle physics, using the Java software concept for the International Linear Collider project developed at SLAC. Later versions of jHepWork were modified for general public use (for scientists, engineers and students for educational purposes) since the International Linear Collider project has stalled. In 2013, jHepWork was renamed to DataMelt and became a general-purpose, community-supported project. The main source of reference is the book "Scientific Data Analysis using Jython Scripting and Java", which discusses in depth data analysis methods using Java and Jython scripting.

The string "HEP" in the project name "jHepWork" abbreviates "High-Energy Physics". Due to its wide popularity outside this area of physics, the project was renamed SCaViS (Scientific Computation and Visualization Environment). That project existed for three years before it was renamed DataMelt (or, in short, DMelt).

DataMelt is hosted by the jWork.ORG portal.

Supported Platforms

DataMelt runs on the Windows, Linux, Mac and Android platforms. The package for Android is called AWork.

License Terms

The core source code of the numerical and graphical libraries is licensed under the GNU General Public License. The integrated development environment (IDE) used by DataMelt has some restrictions for commercial usage, since language files, documentation files, examples, the installer, code-assist databases and the interactive help are licensed under a Creative Commons license. Full members of the DataMelt project have several benefits, such as a license for commercial usage, access to the source repository, an extended help system, a user script repository and access to the complete documentation.

The commercial licenses cannot apply to source code that was imported or contributed to DataMelt from other authors.

Examples

Jython Scripts

Here is an example of how to show 2D bar graphs by reading a CSV file downloaded from the World Bank web site.

from jhplot.io.csv import *
from java.io import *
from jhplot import *

d = {}
reader = CSVReader(FileReader("ny.gdp.pcap.cd_Indicator_en_csv_v2.csv"))
while True:
    nextLine = reader.readNext()
    if nextLine is None:
        break
    xlen = len(nextLine)
    if xlen < 50:
        continue
    d[nextLine[0]] = float(nextLine[xlen-2])   # key=country, value=GDP

c1 = HChart("2013", 800, 400)
# c1.setGTitle("2013 Gross domestic product per capita")
c1.visible()
c1.setChartBar()
c1.setNameY("current US $")
c1.setNameX("")
c1.setName("2013 Gross domestic product per capita")
name1 = "Data Source: World Development Indicators"
set_value = lambda name: c1.valueBar(d[name], name, name1)
set_value(name="Russia")
set_value(name="Poland")
set_value(name="Romania")
set_value(name="Bulgaria")
set_value(name="Belarus")
set_value(name="Ukraine")
c1.update()

The execution of this script plots a bar chart in a separate window. The image can be saved in a number of formats.

Here is another simple example which illustrates how to fill a 2D histogram and display it on a canvas. The script also exports the figure as vector graphics. This script illustrates how to glue and mix native Java classes (from the package java.util) and DataMelt classes (the package jhplot) inside a script written using the Python syntax.


from java.util import Random
from jhplot import *

c1 = HPlot3D("Canvas")     # create an interactive canvas
c1.setGTitle("Global title")
c1.setNameX("X")
c1.setNameY("Y")
c1.visible()
c1.setAutoRange()

h1 = H2D("2D histogram", 25, -3.0, 3.0, 25, -3.0, 3.0)
rand = Random()
for i in range(200):
    h1.fill(rand.nextGaussian(), rand.nextGaussian())
c1.draw(h1)
c1.export("jhplot3d.eps")  # export to EPS vector graphics

This script can be run either using the DataMelt IDE or using a stand-alone Jython after specifying the classpath to the DataMelt libraries. The output is shown below:

3D histogram

Groovy Scripts

The same example can also be coded in the Groovy programming language, which is supported by DataMelt.

import java.util.Random
import jhplot.*

c1 = new HPlot3D("Canvas")    // create an interactive canvas
c1.setGTitle("Global title")
c1.setNameX("X")
c1.setNameY("Y")
c1.visible()
c1.setAutoRange()

h1 = new H2D("2D histogram", 25, -3.0, 3.0, 25, -3.0, 3.0)
rand = new Random()
// Loop 200 times; (0..<200).each{ } or a Java-style for loop would also work.
// If the loop counter is needed, it is available as "it" inside the closure,
// e.g. (0..<200).each{ println "step: ${it+1}" }
(1..200).each{
    h1.fill(rand.nextGaussian(), rand.nextGaussian())
}
c1.draw(h1)
c1.export("jhplot3d.eps")     // export to EPS vector graphics

Groovy is better integrated with Java and can be a factor of three faster than Jython for long loops over primitives.

Analytica (Software)

Analytica is a visual software package developed by Lumina Decision Systems for creating, analyzing and communicating quantitative decision models. As a modeling environment, it is notable for the way it combines hierarchical influence diagrams for visual creation and viewing of models, intelligent arrays for working with multidimensional data, Monte Carlo simulation for analyzing risk and uncertainty, and optimization, including linear and nonlinear programming. Its design, especially its influence diagrams and treatment of uncertainty, is based on ideas from the field of decision analysis. As a computer language, it is notable for combining a declarative (non-procedural) structure for referential transparency, array abstraction, and automatic dependency maintenance for efficient sequencing of computation.

Hierarchical Influence Diagrams

Analytica models are organized as influence diagrams. Variables (and other objects) appear as nodes of various shapes on a diagram, connected by arrows that provide a visual representation of dependencies. Analytica influence diagrams may be hierarchical, in which a single module node on a diagram represents an entire submodel.

Hierarchical influence diagrams in Analytica serve as a key organizational tool. Because the visual layout of an influence diagram matches natural human spatial and visual abilities, both in space and in the level of abstraction, people are able to take in far more information about a model's structure and organization at a glance than is possible with less visual paradigms, such as spreadsheets and mathematical expressions. Managing the structure and organization of a large model can be a significant part of the modeling process, but is substantially aided by the visualization of influence diagrams.

Influence diagrams also serve as a tool for communication. Once a quantitative model has been created and its final results computed, it is often the case that an understanding of how the results are obtained, and how various assumptions impact the results, is far more important than the specific numbers computed. The ability of a target audience to understand these aspects is critical to the modeling enterprise. The visual representation of an influence diagram quickly communicates an understanding at a level of abstraction that is normally more appropriate than detailed representations such as mathematical expressions or formulae. When more detail is desired, users can drill down to increasing levels of detail, speeded by the visual depiction of the model's structure.

The existence of an easily understandable and transparent model supports communication and debate within an organization, and this effect is one of the primary benefits of investing in quantitative model building. When all interested parties are able to understand a common model structure, debates and discussions will often focus more directly on specific assumptions, can cut down on “cross-talk”, and therefore lead to more productive interactions within the organization. The influence diagram serves as a graphical representation that can help to make models accessible to people at different levels.

Intelligent Multidimensional Arrays

Analytica uses index objects to track the dimensions of multidimensional arrays. An index object has a name and a list of elements. When two multidimensional values are combined, for example in an expression such as

Profit = Revenue − Expenses

where Revenue and Expenses are each multidimensional, Analytica repeats the profit calculation over each dimension, but recognizes when the same dimension occurs in both values and treats it as a single dimension during the calculation, in a process called intelligent array abstraction. Unlike most programming languages, there is no inherent ordering to the dimensions in a multidimensional array. This avoids duplicated formulas and explicit FOR loops, both common sources of modeling errors. The simplified expressions made possible by intelligent array abstraction allow the model to be more accessible, interpretable, and transparent.
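
Analytica's array engine is proprietary; a rough sketch of the same dimension-aware behaviour in Python, using the xarray library as an assumed stand-in, looks like this (the data are invented):

# Dimension-aware arithmetic with named dimensions (xarray sketch, not Analytica).
import xarray as xr

revenue = xr.DataArray([[100.0, 120.0], [90.0, 95.0]],
                       dims=("Country", "Year"),
                       coords={"Country": ["US", "DE"], "Year": [2023, 2024]})
expenses = xr.DataArray([80.0, 85.0],
                        dims="Year",
                        coords={"Year": [2023, 2024]})

# "Year" is recognised as the same dimension in both operands and aligned;
# "Country" is carried through without any explicit loop.
profit = revenue - expenses
print(profit)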

Another consequence of intelligent array abstraction is that new dimensions can be introduced or removed from an existing model, without requiring changes to the model structure or changes to variable definitions. For example, while creating a model, the model builder might assume a particular variable, for example discount_rate, contains a single number. Later, after constructing a model, a user might replace the single number with a table of numbers, perhaps discount_rate broken down by Country and by Economic_scenario. These new divisions may reflect the fact that the effective discount rate is not the same for international divisions of a company, and that different rates are applicable to different hypothetical scenarios. Analytica automatically propagates these new dimensions to any results that depend upon discount_rate, so for example, the result for Net present value will become multidimensional and contain these new dimensions. In essence, Analytica repeats the same calculation using the discount rate for each possible combination of Country and Economic_scenario.

This flexibility is important when exploring computation tradeoffs between the level of detail, computation time, available data, and overall size or dimensionality of parametric spaces. Such adjustments are common after models have been fully constructed as a way of exploring what-if scenarios and overall relationships between variables.

Uncertainty Analysis

Incorporating uncertainty into model outputs helps to provide more realistic and informative projections. Uncertain quantities in Analytica can be specified using a distribution function. When evaluated, distributions are sampled using either Latin hypercube or Monte Carlo sampling, and the samples are propagated through the computations to the results. The sampled result distribution and summary statistics can then be viewed directly (mean, fractile bands, probability density function (PDF), cumulative distribution function (CDF)). Analytica supports collaborative decision analysis and probability management through the use of the DIST standard.
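
As a small sketch of the two sampling schemes mentioned above (done here with SciPy rather than Analytica, and with an invented two-input model):

# Latin hypercube versus plain Monte Carlo sampling of two uncertain inputs
# pushed through a toy model (SciPy illustration, not Analytica).
import numpy as np
from scipy.stats import qmc, norm

model = lambda a, b: a ** 2 + b           # invented deterministic model
n = 1000

# Latin hypercube: stratified samples on [0, 1)^2, mapped to normal inputs
lhs = qmc.LatinHypercube(d=2, seed=0).random(n)
a = norm.ppf(lhs[:, 0], loc=1.0, scale=0.1)
b = norm.ppf(lhs[:, 1], loc=5.0, scale=1.0)
out_lhs = model(a, b)

# Plain Monte Carlo for comparison
rng = np.random.default_rng(0)
out_mc = model(rng.normal(1.0, 0.1, n), rng.normal(5.0, 1.0, n))

# Summary statistics comparable to the mean and fractile views described above
print(out_lhs.mean(), np.percentile(out_lhs, [5, 50, 95]))
print(out_mc.mean(), np.percentile(out_mc, [5, 50, 95]))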

Systems Dynamics Modeling

System dynamics is an approach to simulating the behaviour of complex systems over time. It deals with feedback loops and time delays and their effect on the behaviour of the entire system. The Dynamic() function in Analytica allows the definition of variables with cyclic dependencies, such as feedback loops. It expands the influence diagram notation, which does not normally allow cycles. At least one link in each cycle includes a time lag, depicted as a gray influence arrow to distinguish it from standard black arrows without time lags.
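
Analytica's Dynamic() function is not reproduced here; the underlying idea of a feedback loop broken by a one-period time lag can be sketched as a plain discrete-time loop in Python (all names and rates are invented):

# Discrete-time sketch of a feedback loop with a one-step time lag:
# the value at time t depends on the value at time t-1 (logistic growth).
growth_rate = 0.05        # invented growth parameter
capacity = 1000.0         # invented carrying capacity

population = [100.0]      # initial condition at t = 0
for t in range(1, 51):
    prev = population[t - 1]                        # the lagged value breaks the cycle
    population.append(prev + growth_rate * prev * (1.0 - prev / capacity))

print(population[-1])     # approaches the carrying capacity over time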

As A Programming Language

Analytica includes a general language of operators and functions for expressing mathematical relationships among variables. Users can define functions and libraries to extend the language.

Analytica has several features as a programming language designed to make it easy to use for quantitative modeling: It is a visual programming language, where users view programs (or "models") as influence diagrams, which they create and edit visually by adding and linking nodes. It is a declarative language, meaning that a model declares a definition for each variable without specifying an execution sequence as required by conventional imperative languages. Analytica determines a correct and efficient execution sequence using the dependency graph. It is a referentially transparent functional language, in that execution of functions and variables has no side effects, i.e. changing other variables. Analytica is an array programming language, where operations and functions generalize to work on multidimensional arrays.

Applications of Analytica

Analytica has been used for policy analysis, business modeling, and risk analysis. Areas in which Analytica has been applied include energy, health and pharmaceuticals, environmental risk and emissions policy analysis, wildlife management, ecology, climate change, technology and defense,


strategic financial planning, R&D planning and portfolio management, financial services, aerospace, manufacturing and environmental health impact assessment.

Editions

The Analytica software runs on Microsoft Windows operating systems. Three editions (Professional, Enterprise, Optimizer), each with more functions and higher cost, are purchased by users interested in building models. A free edition is available, called Analytica Free 101, which allows users to build models of up to 101 user objects. Free 101 also allows users to view models with more than 101 objects, change inputs, and compute results, which enables free sharing of models for review. A more capable but non-free Power Player enables users to save inputs and utilize database connections. The Analytica Cloud Player allows models to be shared over the web and accessed and run via a web browser.

The most recent release of Analytica is version 4.6, released in May 2015.

History

Analytica's predecessor, called Demos, grew from the research on tools for policy analysis by Max Henrion as a PhD student and later professor at Carnegie Mellon University between 1979 and 1990. Henrion founded Lumina Decision Systems in 1991 with Brian Arnold. Lumina continued to develop the software and apply it to environmental and public policy analysis applications. Lumina first released Analytica as a product in 1996.

GNU Octave

GNU Octave is software featuring a high-level programming language, primarily intended for numerical computations. Octave helps in solving linear and nonlinear problems numerically, and in performing other numerical experiments using a language that is mostly compatible with MATLAB. It may also be used as a batch-oriented language. Since it is part of the GNU Project, it is free software under the terms of the GNU General Public License.

Octave is one of the major free alternatives to MATLAB, others being FreeMat and Scilab. Scilab, however, puts less emphasis on (bidirectional) syntactic compatibility with MATLAB than Octave does.

History

The project was conceived around 1988. At first it was intended to be a companion to a chemical reactor design course. Real development was started by John W. Eaton in 1992. The first alpha release dates back to January 4, 1993 and on February 17, 1994 version 1.0 was released. Version 4.0.0 was released on May 29, 2015.

The program is named after Octave Levenspiel, a former professor of the principal author. Levenspiel is known for his ability to perform quick back-of-the-envelope calculations.


Developments

In addition to use on desktops for personal scientific computing, Octave is used in academia and industry. For example, Octave was used on a massively parallel computer at the Pittsburgh Supercomputing Center to find vulnerabilities related to guessing social security numbers.

Technical Details
• Octave is written in C++ using the C++ standard library.
• Octave uses an interpreter to execute the Octave scripting language.
• Octave is extensible using dynamically loadable modules.
• The Octave interpreter has an OpenGL-based graphics engine to create plots, graphs and charts and to save or print them. Alternatively, gnuplot can be used for the same purpose.
• Octave versions 3.8.0 and later include a Graphical User Interface (GUI) in addition to the traditional Command Line Interface (CLI).

Octave, The Language

The Octave language is an interpreted programming language. It is a structured programming language (similar to C) and supports many common C standard library functions, and also certain UNIX system calls and functions. However, it does not support passing arguments by reference. Octave programs consist of a list of function calls or a script. The syntax is matrix-based and provides various functions for matrix operations. It supports various data structures and allows object-oriented programming. Its syntax is very similar to MATLAB, and careful programming of a script will allow it to run on both Octave and MATLAB. Because Octave is made available under the GNU General Public License, it may be freely changed, copied and used. The program runs on Microsoft Windows and most Unix and Unix-like operating systems, including OS X.

Notable Features

Command and Variable Name Completion

Typing a TAB character on the command line causes Octave to attempt to complete variable, function, and file names (similar to Bash's tab completion). Octave uses the text before the cursor as the initial portion of the name to complete.

Command History

When running interactively, Octave saves the commands typed in an internal buffer so that they can be recalled and edited.


Data Structures

Octave includes a limited amount of support for organizing data in structures. In this example, we see a structure “x” with elements “a”, “b”, and “c”, (an integer, an array, and a string, respectively):

octave:1> x.a = 1; x.b = [1, 2; 3, 4]; x.c = "string";
octave:2> x.a
ans = 1
octave:3> x.b
ans =

   1   2
   3   4

octave:4> x.c
ans = string
octave:5> x
x =
{
  a = 1
  b =

     1   2
     3   4

  c = string
}

Short-circuit Boolean Operators

Octave's '&&' and '||' logical operators are evaluated in a short-circuit fashion (like the corresponding operators in the C language), in contrast to the element-by-element operators '&' and '|'.

Increment and Decrement Operators

Octave includes the C-like increment and decrement operators ‘++’ and ‘--’ in both their prefix and postfix forms. Octave also does augmented assignment, e.g. ‘x += 5’.


Unwind-protect

Octave supports a limited form of exception handling modelled after the ‘unwind_protect’ of Lisp. The general form of an unwind_protect block looks like this:

unwind_protect
    body
unwind_protect_cleanup
    cleanup
end_unwind_protect

As a general rule, GNU Octave recognizes as termination of a given 'block' either the keyword 'end' (which is compatible with the MATLAB language) or a more specific keyword 'end_block'. As a consequence, an 'unwind_protect' block can be terminated either with the keyword 'end_unwind_protect' as in the example, or with the more portable keyword 'end'.

The cleanup part of the block is always executed. In case an exception is raised by the body part, cleanup is executed immediately before propagating the exception outside the block ‘unwind_protect’.

GNU Octave also supports another form of exception handling (compatible with the MATLAB language):

try
    body
catch
    exception_handling
end

This latter form differs from an 'unwind_protect' block in two ways. First, exception_handling is only executed when an exception is raised by body. Second, after the execution of exception_handling the exception is not propagated outside the block (unless a 'rethrow( lasterror )' statement is purposely inserted within the exception_handling code).

Variable-length Argument Lists

Octave has a mechanism for handling functions that take an unspecified number of arguments without explicit upper limit. To specify a list of zero or more arguments, use the special argument varargin as the last (or only) argument in the list.

function s = plus (varargin)
  if (nargin==0)
    s = 0;
  else
    s = varargin{1} + plus (varargin{2:nargin});
  end
end


Variable-length Return Lists

A function can be set up to return any number of values by using the special return value varargout. For example:

function varargout = multiassign (data)
  for k=1:nargout
    varargout{k} = data(:,k);
  end
end

C++ Integration

It is also possible to execute Octave code directly in a C++ program. For example, here is a code snippet for calling rand([10,1]):

#include <octave/oct.h>
...
ColumnVector NumRands(2);
NumRands(0) = 10;
NumRands(1) = 1;
octave_value_list f_arg, f_ret;
f_arg(0) = octave_value(NumRands);
f_ret = feval("rand", f_arg, 1);
Matrix unis(f_ret(0).matrix_value());

C and C++ code can be integrated into GNU Octave by creating oct files, or using the MATLAB-compatible MEX files.

MATLAB Compatibility

Octave has been built with MATLAB compatibility in mind, and shares many features with MATLAB:
1. Matrices as fundamental data type.
2. Built-in support for complex numbers.
3. Powerful built-in math functions and extensive function libraries.
4. Extensibility in the form of user-defined functions.

In fact, Octave treats incompatibility with MATLAB as a bug; therefore, it can be considered a software clone, which does not infringe software copyright as per the Lotus v. Borland court case.

MATLAB scripts from the MathWorks' FileExchange repository are compatible with Octave, but cannot be used legally due to the Terms of use. While often provided and uploaded by users under an Octave-compatible and proper open source BSD license, the FileExchange's Terms of use prohibit any usage besides MathWorks' proprietary MATLAB.

Syntax Compatibility

There are a few purposeful, albeit minor, syntax additions:
1. Comment lines can be prefixed with the # character as well as the % character;
2. Various C-based operators ++, --, +=, *=, /= are supported;
3. Elements can be referenced without creating a new variable by cascaded indexing, e.g. [1:10](3);
4. Strings can be defined with the " character as well as the ' character;
5. When the variable type is single, Octave calculates the "mean" in the single domain (MATLAB in the double domain), which is faster but gives less accurate results;
6. Blocks can also be terminated with more specific Control structure keywords, i.e., endif, endfor, endwhile, etc.;
7. Functions can be defined within scripts and at the Octave prompt;
8. All operators perform automatic broadcasting or singleton expansion;
9. Presence of a do-until loop (similar to do-while in C).

Function Compatibility

Many of the numerous MATLAB functions are available in GNU Octave, some of them accessible through packages via Octave-Forge, but not all MATLAB functions are available in GNU Octave. A list of unavailable functions exists in Octave, and developers are seeking help to implement them. Looking for the function __unimplemented.m__ leads to the list of unimplemented functions.

Unimplemented functions are also categorized in the Image, Mapping, Optimization, Signal, and Statistics packages.

When an unimplemented function is called the following error message is shown:

octave:1> quad2d
warning: quad2d is not implemented.  Consider using dblquad.
Please read to learn how you can contribute missing functionality.
warning: called from
  __unimplemented__ at line 523 column 5
error: 'quad2d' undefined near line 1 column 1


User Interfaces

Until version 3.8, Octave did not come with a graphical user interface (GUI)/integrated development environment (IDE) by default. However, an official graphical interface based on Qt has now been migrated to the main source repository and is available with Octave 3.8, though not as the default interface. It became the default interface with the release of Octave 4.0. Several third-party graphical front-ends have also been developed.

Julia (Programming Language)

Julia is a high-level dynamic programming language designed to address the requirements of high-performance numerical and scientific computing while also being effective for general-purpose programming, web use or as a specification language.

Distinctive aspects of Julia's design include a type system with parametric types in a fully dynamic programming language and multiple dispatch as its core programming paradigm. It allows concurrent, parallel and distributed computing, and direct calling of C and Fortran libraries without glue code.

Julia is garbage-collected, uses eager evaluation and includes efficient libraries for floating-point calculations, linear algebra, random number generation, fast Fourier transforms and regular ex- pression matching.

Language Features

According to the official website, the main features of the language are:
• Multiple dispatch: providing the ability to define function behavior across many combinations of argument types
• Dynamic type system: types for documentation, optimization, and dispatch
• Good performance, approaching that of statically-typed languages like C
• Built-in package manager
• Lisp-like macros and other metaprogramming facilities
• Call Python functions: use the PyCall package
• Call C functions directly: no wrappers or special APIs
• Powerful shell-like capabilities for managing other processes
• Designed for parallelism and distributed computation
• Coroutines: lightweight "green" threading
• User-defined types are as fast and compact as built-ins
• Automatic generation of efficient, specialized code for different argument types


• Elegant and extensible conversions and promotions for numeric and other types

• Efficient support for Unicode, including but not limited to UTF-8

Multiple dispatch (also known as multimethods in Lisp) is a generalization of single dispatch, the polymorphic mechanism used in common object-oriented (OO) languages, which dispatches on the type of a single argument via inheritance. In Julia, all concrete types are subtypes of abstract types, and directly or indirectly subtypes of the "Any" type, which is the top of the type hierarchy. Concrete types cannot themselves be subtyped; composition is used instead of the inheritance used by traditional object-oriented languages.

Julia draws significant inspiration from various dialects of Lisp, including Scheme and Common Lisp, and it shares many features with Dylan (such as an ALGOL-like free-form infix syntax rather than a Lisp-like prefix syntax, while in Julia "everything" is an expression) – also a multiple-dispatch-oriented dynamic language – and with Fortress, another numerical programming language with multiple dispatch and a sophisticated parametric type system. While CLOS adds multiple dispatch to Common Lisp, not all functions are generic functions.

In Julia, Dylan and Fortress, extensibility is the default, and the system's built-in functions are all generic and extensible. In Dylan, multiple dispatch is as fundamental as it is in Julia: all user-defined functions and even basic built-in operations like + are generic. Dylan's type system, however, does not fully support parametric types, which are more typical of the ML lineage of languages. By default, CLOS does not allow for dispatch on Common Lisp's parametric types; such extended dispatch semantics can only be added as an extension through the CLOS Metaobject Protocol. By convergent design, Fortress also features multiple dispatch on parametric types; unlike Julia, however, Fortress is statically rather than dynamically typed, with separate compilation and execution phases. The language features are summarized in the following table:

Language       Type system   Generic functions   Parametric types
Julia          dynamic       default             yes
Common Lisp    dynamic       opt-in              yes (but no dispatch)
Dylan          dynamic       default             partial (no dispatch)
Fortress       static        default             yes

By default, the Julia runtime must be pre-installed to run the source code you provide; alternatively, a standalone "executable that doesn't require any Julia source code" can be built with BuildExecutable.jl.

Julia's syntactic macros (used for metaprogramming), like Lisp macros, are more powerful than and different from the text-substitution macros used in the preprocessor of some other languages such as C, because they work at the level of abstract syntax trees (ASTs). Julia's macro system is hygienic, but also supports deliberate capture when desired (as for anaphoric macros) using the esc construct.

Interaction

The Julia official distribution includes an interactive session shell, called Julia's REPL, which can be used to experiment and test code quickly. The following fragment represents a sample session on the REPL:

julia> p(x) = 2x^2 + 1; f(x, y) = 1 + 2p(x)y
julia> println("Hello world!", " I'm on cloud ", f(0, 4), " as Julia supports recognizable syntax!")
Hello world! I'm on cloud 9 as Julia supports recognizable syntax!

The REPL gives the user access to the system shell and to help mode, by pressing ; or ? after the prompt (preceding each command), respectively. The REPL also keeps the history of commands, even between sessions. Example code can be tested inside Julia's interactive session or saved into a file with a .jl extension and run from the command line by typing (for example):

$ julia <filename>

Julia is also supported by Jupyter, an online interactive "notebooks" environment (project Jupyter is a multi-language extension that evolved from the IPython command shell; it now includes IJulia).

Use With Other Languages

Julia’s ccall keyword is used to call C-exported or Fortran shared library functions individually.

Julia has Unicode 9.0 support, with UTF-8 used for source code (and by default for strings), and it optionally allows common math symbols for many operators, such as ∈ for the in operator.

Julia has packages supporting markup languages such as HTML (and also for HTTP), XML, JSON and BSON.

Implementation

Julia's core is implemented in C and C++ (the LLVM dependency is in C++), its parser in Scheme ("femtolisp"), and the LLVM compiler framework is used for just-in-time (JIT) generation of 64-bit or 32-bit optimized machine code (i.e. not for a VM) depending on the platform Julia runs on. With some exceptions (e.g., libuv), the standard library is implemented in Julia itself. The most notable aspect of Julia's implementation is its speed, which is often within a factor of two relative to fully optimized C code (and thus often an order of magnitude faster than Python or R). Development of Julia began in 2009 and an open-source version was publicized in February 2012.

Julia, the 0.5.x line, is on a monthly release schedule where bugs are fixed and some new features from 0.6-dev are backported (and possibly also to 0.4.x).

Current and Future Platforms

Julia uses JIT compilation (MCJIT from LLVM): it generates native machine code directly the first time a function is run (not bytecode that is run on a VM, as with e.g. Java/JVM or Java/Dalvik on Android).

Current support is for 32- and 64-bit x86 processors (all except ancient pre-Pentium 4-era models, to optimize for newer ones), with downloads of executables or source code also available for other architectures. "Experimental and early support for ARM, AARCH64, and POWER (little-endian) is available too." This includes support for the Raspberry Pi 1 and later (e.g. "requires at least armv6").

Support for GNU Hurd is being worked on.

Julia version 0.6 is planned for 2016 and 1.0 for 2017, and some features are being discussed for version 2.0 and later, which is also planned, e.g. "multiple inheritance for abstract types".

Julia2C Source-to-source Compiler

A Julia2C source-to-source compiler from Intel Labs is available. This source-to-source compiler is a fork of Julia that implements the same Julia language syntax, but emits C code (for compatibility with more CPUs) instead of native machine code, for functions or whole programs. The compiler is also meant to allow analyzing code at a higher level than C.

Intel's ParallelAccelerator.jl can be thought of as a partial Julia-to-C++ compiler, but the objective is parallel speedup (which can be "100x over plain Julia" for the older 0.4 version, and could in some cases also speed up serial code manyfold for that version), not compiling the full language to C++ (that is only an implementation detail that may be dropped later). It need not compile all syntax, as the rest is handled by Julia.

References

• Chekanov, Sergei V. (2016). Numeric Computation and Statistical Data Analysis on the Java Platform. Springer. p. 700. ISBN 978-3-319-28529-0.

• Chekanov, Sergei V. (2010). Scientific Data analysis using Jython Scripting and Java. Springer-Verlag. p. 497. ISBN 978-1-84996-286-5.

• Trappenberg, Thomas (2010). Fundamentals of Computational Neuroscience. Oxford University Press. p. 361. ISBN 978-0-19-956841-3.

• Megrey, Bernard A.; Moksness, Erlend (2008). Computers in Fisheries Research. Springer Science & Business Media. p. 345. ISBN 978-1-4020-8636-6.

• Kapuno, Raul Raymond (2008). Programming for Chemical Engineers Using C, C++, and MATLAB. Jones & Bartlett Publishers. p. 365. ISBN 978-1-934015-09-4.

• P.R. Richard (2003), Incorporating Uncertainty in Population Assessments, Canadian Science Advisory Secretariat Research Document. Archived April 3, 2012, at the Wayback Machine.

Applications of Simulation

Simulation has numerous applications; some of these are flight simulators, robotics suites, reservoir simulation, UrbanSim and traffic simulation. A flight simulator is a device that artificially re-creates aircraft flight and the environment in which it flies; it is used for pilot training. This chapter helps readers understand the applications of simulation in use today.

Flight Simulator

A flight simulator is a device that artificially re-creates aircraft flight and the environment in which it flies, for pilot training, design, or other purposes. It includes replicating the equations that govern how aircraft fly, how they react to applications of flight controls, the effects of other aircraft systems, and how the aircraft reacts to external factors such as air density, turbulence, wind shear, cloud, precipitation, etc. Flight simulation is used for a variety of reasons, including flight training (mainly of pilots), the design and development of the aircraft itself, and research into aircraft characteristics and control handling qualities.
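
The flight models used in real simulators are far more elaborate than can be shown here; a toy longitudinal point-mass model, integrated with simple Euler steps in Python, gives the general flavour (every coefficient below is invented):

# Toy longitudinal point-mass "flight model" with Euler integration
# (invented coefficients; real simulators solve full six-degree-of-freedom equations).
import math

m, g = 1000.0, 9.81            # mass [kg], gravity [m/s^2]
rho, S = 1.225, 16.0           # air density [kg/m^3], wing area [m^2]
CL, CD0, k = 0.6, 0.03, 0.04   # invented lift and drag coefficients
thrust = 800.0                 # constant thrust [N]

v, gamma, h = 41.0, 0.0, 1000.0    # airspeed [m/s], flight-path angle [rad], altitude [m]
dt = 0.05
for _ in range(2000):              # 100 seconds of simulated flight
    qS = 0.5 * rho * v * v * S     # dynamic pressure times wing area
    lift, drag = qS * CL, qS * (CD0 + k * CL * CL)
    dv = (thrust - drag - m * g * math.sin(gamma)) / m
    dgamma = (lift - m * g * math.cos(gamma)) / (m * v)
    v += dv * dt
    gamma += dgamma * dt
    h += v * math.sin(gamma) * dt

print(round(v, 1), round(math.degrees(gamma), 1), round(h, 1))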

A military Flight simulator at Payerne air base, Switzerland

History of Flight Simulation

World War I (1914–18)

One area of training was air gunnery, handled by the pilot or by a specialist air gunner. Firing at a moving target requires aiming ahead of the target (which involves the so-called lead angle) to allow for the time the bullets require to reach the vicinity of the target. This is sometimes also called "deflection shooting" and requires skill and practice. During World War I, some ground-based simulators were developed to teach this skill to new pilots.


The 1920s and 1930s

The best-known early flight simulation device was the Link Trainer, produced by Edwin Link in Binghamton, New York, USA, which he started building in 1927. He later patented his design, which was first available for sale in 1929. The Link Trainer was a basic metal frame flight simulator usually painted in its well-known blue color. Some of these early war era flight simulators still exist, but it is becoming increasingly difficult to find working examples.

Link Trainer

The Link family firm in Binghamton manufactured player pianos and organs, and Ed Link was therefore familiar with such components as leather bellows and reed switches. He was also a pilot, but dissatisfied with the amount of real flight training that was available, he decided to build a ground-based device to provide such training without the restrictions of weather and the availability of aircraft and flight instructors. His design had a pneumatic motion platform driven by inflatable bellows which provided pitch and roll cues. A vacuum motor similar to those used in player pianos rotated the platform, providing yaw cues. A generic replica cockpit with working instruments was mounted on the motion platform. When the cockpit was covered, pilots could practice flying by instruments in a safe environment. The motion platform gave the pilot cues as to real angular motion in pitch (nose up and down), roll (wing up or down) and yaw (nose left and right).

Initially, aviation flight schools showed little interest in the "Link Trainer". Link also demonstrated his trainer to the U.S. Army Air Force (USAAF), but with no result. However, the situation changed in 1934 when the Army Air Force was given a government contract to fly the postal mail. This included having to fly in bad weather as well as good, for which the USAAF had not previously carried out much training. During the first weeks of the mail service, nearly a dozen Army pilots were killed. The Army Air Force hierarchy remembered Ed Link and his trainer. Link flew in to meet them at Newark Field in New Jersey, and they were impressed by his ability to arrive on a day with poor visibility, due to practice on his training device. The result was that the USAAF purchased six Link Trainers, and this can be said to mark the start of the world flight simulation industry.

World War II (1939–1945)

The principal pilot trainer used during World War II was the Link Trainer. Some 10,000 were produced to train 500,000 new pilots from allied nations, many in the USA and Canada because many pilots were trained in those countries before returning to Europe or the Pacific to fly combat missions. Almost all US Army Air Force pilots were trained in a Link Trainer.


A Link Trainer at Freeman Field, Seymour, Indiana, 1943

A different type of World War II trainer was used for navigating at night by the stars. The Celestial Navigation Trainer of 1941 was 13.7 m (45 ft) high and capable of accommodating the navigation team of a bomber crew. It enabled sextants to be used for taking "star shots" from a projected display of the night sky.

1945 to the 1960s

In 1954 United Airlines bought four flight simulators at a cost of $3 million from Curtiss-Wright that were similar to the earlier models, with the addition of visuals, sound and movement. This was the first of today’s modern flight simulators for commercial aircraft.

Types of Flight Training Devices in Service for Pilot Training

Several different devices are utilized in modern flight training. Cockpit Procedures Trainers (CPTs) are used to practice basic cockpit procedures, such as processing emergency checklists, and for cockpit familiarization. Certain aircraft systems may or may not be simulated. The aerodynamic model is usually extremely generic, if present at all.

Cockpit of a twinjet flight simulator.


Rudder control system trainer for a Grumman S-2 Tracker

Technology

Motion

Statistically significant assessments of skill transfer based on training on a simulator and leading to handling an actual aircraft are difficult to make, particularly where motion cues are concerned. Large samples of pilot opinion are required and many subjective opinions tend to be aired, particularly by pilots not used to making objective assessments and responding to a structured test schedule. For many years, it was believed that 6 DOF motion-based simulation gave the pilot closer fidelity to flight control operations and aircraft responses to control inputs and external forces, and gave a better training outcome for students than non-motion-based simulation. This is described as “handling fidelity”, which can be assessed by test flight standards such as the numerical Cooper-Harper rating scale for handling qualities. Recent scientific studies have shown that the use of technology such as vibration or dynamic seats within flight simulators can be as effective in the delivery of training as large and expensive 6-DOF FFS devices.

Qualification and Approval Procedure

When a manufacturer wishes to have an ATD model approved, a document that contains the specifications for the model line and that proves compliance with the appropriate regulations is submitted to the FAA. Once this document, called a Qualification Approval Guide (QAG), has been approved, all future devices conforming to the QAG are automatically approved and individual evaluation is neither required nor available.

Flight Simulator “Levels” and Other Categories

The following levels of qualification are currently being granted for both airplane and helicopter FSTDs:

US Federal Aviation Administration (FAA) Aviation Training Device (ATD)


• FAA Basic ATD (BATD) - Provides an adequate training platform and design for both procedural and operational performance tasks specific to the ground and flight training requirements for the Private Pilot Certificate and instrument rating per Title 14 of the Code of Federal Regulations.

• FAA Advanced ATD (AATD) - Provides an adequate training platform for both procedural and operational performance tasks specific to the ground and flight training requirements for the Private Pilot Certificate, instrument rating, Commercial Pilot Certificate, Airline Transport Pilot (ATP) Certificate, and Flight Instructor Certificate.

Flight Training Devices (FTD)

• FAA FTD Level 4 - Similar to a Cockpit Procedures Trainer (CPT), but for helicopters only. This level does not require an aerodynamic model, but accurate systems modeling is required.

• FAA FTD Level 5 - Aerodynamic programming and systems modeling are required, but the device may represent a family of aircraft rather than only one specific model.

• FAA FTD Level 6 - Aircraft-model-specific aerodynamic programming, control feel, and physical cockpit are required.

• FAA FTD Level 7 - Model specific, helicopter only. All applicable aerodynamics, flight controls, and systems must be modeled. A vibration system must be supplied. This is the first level to require a visual system.

Full Flight Simulators (FFS)

• FAA FFS Level A - A motion system is required with at least three degrees of freedom. Airplanes only.

• FAA FFS Level B - Requires three-axis motion and a higher-fidelity aerodynamic model than does Level A. The lowest level of helicopter flight simulator.

• FAA FFS Level C - Requires a motion platform with all six degrees of freedom, as well as lower transport delay (latency) than Levels A and B. The visual system must have an outside-world horizontal field of view of at least 75 degrees for each pilot.

• FAA FFS Level D - The highest level of FFS qualification currently available. Requirements are those for Level C with additions. The motion platform must have all six degrees of freedom, and the visual system must have an outside-world horizontal field of view of at least 150 degrees, with a collimated (distant-focus) display. Realistic sounds in the cockpit are required, as well as a number of special motion and visual effects.

European Aviation Safety Agency (EASA, ex JAA)

Flight Navigation and Procedures Trainer (FNPT)

• EASA FNPT Level I

• EASA FNPT Level II


• EASA FNPT Level III

• MCC - Not a true “level” of qualification, but an add-on that allows any level of FNPT to be used for Multi Crew Coordination training.

Flight Training Devices (FTD)

• EASA FTD Level 1

• EASA FTD Level 2

• EASA FTD Level 3 - Helicopter only.

Full Flight Simulators (FFS)

• EASA FFS Level A

• EASA FFS Level B

• EASA FFS Level C

• EASA FFS Level D

Modern High-end Flight Simulators

Stewart platform

Vertical Motion Simulator (VMS) at NASA/Ames

The largest flight simulator in the world is the Vertical Motion Simulator (VMS) at NASA Ames Research Center in “Silicon Valley” south of San Francisco. This has a very large-throw motion system with 60 feet (+/- 30 ft) of vertical movement (heave). The heave system supports a horizontal beam on which are mounted 40 ft rails, allowing lateral movement of a simulator cab of +/- 20 feet. A conventional 6-degree of freedom hexapod platform is mounted on the 40 ft beam, and an interchangeable cabin is mounted on the platform. This design permits quick switching of different aircraft cabins. Simulations have ranged from blimps, commercial and military aircraft to the Space Shuttle. In the case of the Space Shuttle, the large Vertical Motion Simulator was used to investigate a longitudinal pilot-induced oscillation (PIO) that occurred on an early Shuttle flight just before landing. After identification of the problem on the VMS, it was used to try different longitudinal control algorithms and recommend the best for use in the Shuttle program.


Disorientation Training

AMST Systemtechnik GmbH (AMST) of Austria and Environmental Tectonics Corporation (ETC) of Philadelphia, US, manufacture a range of simulators for disorientation training that have full freedom in yaw. The most complex of these devices is the Desdemona simulator at the TNO Research Institute in The Netherlands, manufactured by AMST. This large simulator has a gimballed cockpit mounted on a framework which adds vertical motion. The framework is mounted on rails attached to a rotating platform. The rails allow the simulator cab to be positioned at different radii from the centre of rotation, and this gives a sustained-G capability of up to about 3.5.

Robotics Suite

A robotics suite is a visual environment for robot control and simulation. Robotics suites are typically end-to-end platforms for robotics development and include tools for visual programming and for creating and debugging robot applications. Developers can often interact with robots through web-based or visual interfaces.

One objective of a robotics suite is to support a variety of different robot platforms through a common programming interface. The key point about a robotics suite is that the same code runs either on a simulated robot or on the corresponding real robot without modification. Some robotics suites are based on free software, free hardware, or both.

Suites

• Fedora Robotics

Reservoir Simulation

Reservoir simulation is an area of reservoir engineering in which computer models are used to predict the flow of fluids (typically, oil, water, and gas) through porous media.

A simulated depth map of the geology in a full field model from the Merlin finite difference simulator


Uses

Reservoir simulation models are used by oil and gas companies in the development of new fields. Also, models are used in developed fields where production forecasts are needed to help make investment decisions. As building and maintaining a robust, reliable model of a field is often time-consuming and expensive, models are typically only constructed where large investment decisions are at stake. Improvements in simulation software have lowered the time to develop a model. Also, models can be run on personal computers rather than more expensive workstations.

For new fields, models may help development by identifying the number of wells required, the optimal completion of wells, the present and future needs for artificial lift, and the expected production of oil, water and gas.

For ongoing reservoir management, models may help in improved oil recovery by hydraulic fracturing. Highly deviated or horizontal wells can also be represented. Specialized software may be used in the design of hydraulic fracturing, then the improvements in productivity can be included in the field model. Also, future improvement in oil recovery with pressure maintenance by re-injection of produced gas or by water injection into an aquifer can be evaluated. Water flooding resulting in the improved displacement of oil is commonly evaluated using reservoir simulation.

The application of enhanced oil recovery (EOR) processes requires that the field possesses the necessary characteristics to make application successful. Model studies can assist in this evaluation. EOR processes include miscible displacement by natural gas, CO2, or nitrogen and chemical flooding (polymer, alkaline, surfactant, or a combination of these). Special features in simulation software are needed to represent these processes. In some miscible applications, the “smearing” of the flood front, also called numerical dispersion, may be a problem.

Reservoir simulation is used extensively to identify opportunities to increase oil production in heavy oil deposits. Oil recovery is improved by lowering the oil viscosity by injecting steam or hot water. Typical processes are steam soaks (steam is injected, then oil produced from the same well) and steam flooding (separate steam injectors and oil producers). These processes require simulators with special features to account for heat transfer to the fluids present and the formation, the subsequent property changes and heat losses outside of the formation.

A recent application of reservoir simulation is the modeling of coalbed methane (CBM) production. This application requires a specialized CBM simulator. In addition to the normal fractured (fissured) formation data, CBM simulation requires gas content data values at initial pressure, sorption isotherms, diffusion coefficient, and parameters to estimate the changes in absolute permeability as a function of pore-pressure depletion and gas desorption.

Fundamentals

Representation of an underground fault by a structure map generated by contour-mapping software for an 8,500 ft deep gas and oil reservoir in the Erath field, Vermilion Parish, Erath, Louisiana. The left-to-right gap near the top of the contour map indicates a fault line. This fault line lies between the blue/green contour lines and the purple/red/yellow contour lines. The thin red circular contour line in the middle of the map indicates the top of the oil reservoir. Because gas floats above oil, the thin red contour line marks the gas/oil contact zone.


Traditional finite difference simulators dominate both theoretical and practical work in reservoir simulation. Conventional FD simulation is underpinned by three physical concepts: conservation of mass, isothermal fluid phase behavior, and the Darcy approximation of fluid flow through porous media. Thermal simulators (most commonly used for heavy crude oil applications) add conservation of energy to this list, allowing temperatures to change within the reservoir.
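
To make the finite-difference approach concrete, the following sketch (a minimal illustration, not taken from any commercial simulator) advances a single-phase pressure-diffusion equation on a 1-D grid with an explicit update; the grid size, diffusivity and boundary pressures are illustrative assumptions.

    import numpy as np

    # 1-D single-phase pressure diffusion, dp/dt = alpha * d2p/dx2, where alpha
    # lumps permeability, viscosity, porosity and compressibility (illustrative).
    nx, dx = 50, 10.0            # number of cells and cell size (m)
    alpha = 0.5                  # hydraulic diffusivity (m^2/s), illustrative value
    dt = 0.4 * dx**2 / alpha     # respects the explicit stability limit (<= 0.5)

    p = np.full(nx, 20.0e6)      # initial reservoir pressure (Pa)
    p_well = 15.0e6              # constant-pressure producer at the left boundary

    for step in range(2000):
        p[0] = p_well                         # Dirichlet boundary (producer cell)
        lap = np.zeros_like(p)
        lap[1:-1] = (p[2:] - 2 * p[1:-1] + p[:-2]) / dx**2
        lap[-1] = (p[-2] - p[-1]) / dx**2     # no-flow outer boundary
        p = p + alpha * dt * lap              # explicit Euler update

    print("pressure in the first cells (MPa):", np.round(p[:10] / 1e6, 3))

A full reservoir simulator replaces this toy update with multi-phase mass balances, Darcy transmissibilities between cells and an implicit or IMPES time discretization, but the cell-by-cell bookkeeping is of the same kind.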

Numerical techniques and approaches that are common in modern simulators include:

• Most modern FD simulation programs allow for construction of 3-D representations for use in either full-field or single-well models. 2-D approximations are also used in various conceptual models, such as cross-sections and 2-D radial grid models.

• Theoretically, finite difference models permit discretization of the reservoir using both structured and more complex unstructured grids to accurately represent the geometry of the reservoir. Local grid refinements (a finer grid embedded inside of a coarse grid) are also a feature provided by many simulators to more accurately represent the near wellbore multi-phase flow effects. This “refined meshing” near wellbores is extremely important when analyzing issues such as water and gas coning in reservoirs.

• Representation of faults and their transmissibilities is an advanced feature provided in many simulators. In these models, inter-cell flow transmissibilities must be computed for non-adjacent layers outside of conventional neighbor-to-neighbor connections.

• Natural fracture simulation (known as dual-porosity and dual-permeability) is an advanced feature which models hydrocarbons in tight matrix blocks. Flow occurs from the tight matrix blocks to the more permeable fracture networks that surround the blocks, and then to the wells.

• A black oil simulator does not consider changes in the composition of the hydrocarbons as the field is produced. The compositional model is a more complex model in which the PVT properties of the oil and gas phases have been fitted to an equation of state (EOS), as a mixture of components. The simulator then uses the fitted EOS to dynamically track the movement of both phases and components in the field.

Correlating relative permeability

The simulation model computes the saturation change of the three phases (oil, water and gas) and the pressure of each phase in each cell at each time step. As a result of declining pressure, as in a reservoir depletion study, gas will be liberated from the oil. If pressures increase as a result of water or gas injection, the gas is re-dissolved into the oil phase.

A simulation project of a developed field usually requires “history matching”, in which historical field production and pressures are compared to calculated values. In recent years, optimisation tools such as MEPO have helped to accelerate this process, as well as improve the quality of the match obtained. The model’s parameters are adjusted until a reasonable match is achieved on a field basis and usually for all wells. Commonly, producing water cuts or water-oil ratios and gas-oil ratios are matched.

Other types of simulators include finite element and streamline simulators.

Other Engineering Approaches

Without FD models, recovery estimates and oil rates can also be calculated using numerous analytical techniques, which include material balance equations (including the Havlena-Odeh and Tarner methods), fractional flow curve methods (such as the Buckley-Leverett one-dimensional displacement method, the Dietz method for inclined structures, or coning models), sweep efficiency estimation techniques for water floods, and decline curve analysis. These methods were developed and used prior to traditional or “conventional” simulation tools as computationally inexpensive models based on simple homogeneous reservoir descriptions. Analytical methods generally cannot capture all the details of a given reservoir or process, but are typically numerically fast and, at times, sufficiently reliable. In modern reservoir engineering they are generally used as screening or preliminary evaluation tools. Analytical methods are especially suitable for the evaluation of potential assets when data are limited and time is critical, or for broad studies as a pre-screening tool if a large number of processes and/or technologies are to be evaluated. The analytical methods are often developed and promoted in academia or in-house; however, commercial packages also exist.
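
As one concrete illustration of these analytical tools, the water fractional flow used in the Buckley-Leverett displacement method can, neglecting gravity and capillary pressure, be written as (standard fractional-flow theory, stated here for illustration rather than quoted from this text):

f_w = \frac{1}{1 + \dfrac{k_{ro}\,\mu_w}{k_{rw}\,\mu_o}}

where k_{rw} and k_{ro} are the relative permeabilities to water and oil and \mu_w and \mu_o are the phase viscosities; the Buckley-Leverett solution then propagates each water saturation at a speed proportional to df_w/dS_w.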

Software

Many software packages, proprietary, open-source or commercial, are available for reservoir simulation. The best known (in alphabetical order) are:

Open Source:

• BOAST - The Black Oil Applied Simulation Tool (BOAST) is a free software package for reservoir simulation available from the U.S. Department of Energy. BOAST is an IMPES numerical simulator (finite-difference, implicit pressure-explicit saturation) which finds the pressure distribution for a given time step first, then calculates the saturation distribution for the same time step; the formulation is isothermal. The last release was in 1986, but it remains a good simulator for educational purposes.

• MRST - The MATLAB Reservoir Simulation Toolbox (MRST) is developed by SINTEF Applied Mathematics as a MATLAB® toolbox. The toolbox consists of two main parts: a core offering basic functionality and single- and two-phase solvers, and a set of add-on modules offering more advanced models, viewers and solvers. MRST is mainly intended as a toolbox for rapid prototyping and demonstration of new simulation methods and modeling concepts on unstructured grids. Despite this, many of the tools are quite efficient and can be applied to surprisingly large and complex models.

• OPM - The Open Porous Media (OPM) initiative provides a set of open-source tools centered on the simulation of flow and transport of fluids in porous media.

Commercial:

• CMG Suite (IMEX, GEM and STARS) - Computer Modelling Group currently offers three simulators: a black oil simulator called IMEX, a compositional/unconventional simulator called GEM, and a thermal and advanced processes simulator called STARS.

• Amarile RE-Studio - a pre- and post-processor designed by reservoir engineers for reservoir engineers. RE-Studio provides a single working environment for reservoir engineers involved in hydrocarbon dynamic flow simulation with oil and gas industry standard simulators, acting as one interface to any such simulator.

• Schlumberger ECLIPSE - ECLIPSE is an oil and gas reservoir simulator originally developed by ECL (Exploration Consultants Limited) and currently owned, developed, marketed and maintained by SIS (formerly known as GeoQuest), a division of Schlumberger. The name ECLIPSE was originally an acronym for “ECL’s Implicit Program for Simulation Engineering”. Simulators include black oil, compositional, thermal finite-volume, and streamline simulation. Add-on options include local grid refinements, coalbed methane, gas field operations, advanced wells, reservoir coupling, and surface networks.

• Tempest MORE is a next-generation reservoir simulator offering black oil, compositional and thermal options.

• ExcSim - a fully implicit 3-phase 2D modified black oil reservoir simulator for the Excel® platform.

• Landmark Nexus - Nexus is an oil and gas reservoir simulator originally developed as ‘Falcon’ by Amoco, Los Alamos National Laboratory and Cray Research. It is currently owned, developed, marketed and maintained by Landmark Graphics, a product service line of Halliburton. Nexus will gradually replace VIP, or Desktop VIP, Landmark’s earlier generation of simulator.

• Stochastic Simulation ResAssure - ResAssure is a stochastic simulation software solution, powered by a robust and extremely fast reservoir simulator.

• Rock Flow Dynamics tNavigator supports black oil, compositional and thermal compositional simulations for workstations and High Performance Computing clusters.

• GrailQuest’s ReservoirGrail employs a unique patented approach called Time Dynamic Volumetric Balancing to simulate reservoirs during primary and secondary recovery.

• Gemini Solutions Merlin is a fully implicit 3-phase finite difference reservoir simulator originally developed at the Texaco research department and currently used by the Bureau of Ocean Energy Management and the Bureau of Safety and Environmental Enforcement to calculate Worst Case Discharge rates and burst/collapse pressures on casing shoes and blowout preventers.

UrbanSim

UrbanSim is an open source urban simulation system designed by Paul Waddell (University of California, Berkeley) and developed with numerous collaborators to support metropolitan land use, transportation, and environmental planning. It has been distributed on the web since 1998, with regular revisions and updates, from www.urbansim.org. Synthicity Inc coordinates the development of UrbanSim and provides professional services to support its application. The development of UrbanSim has been funded by several grants from the National Science Foundation, the U.S. Environmental Protection Agency and the Federal Highway Administration, as well as by support from states, metropolitan planning agencies and research councils in Europe and South Africa. Reviews of UrbanSim and comparisons to other urban modeling platforms may be found in the references.

Applications

The first documented application of UrbanSim was a prototype application to the Eugene-Springfield, Oregon setting. Later applications of the system have been documented in several U.S. cities, including Detroit, Michigan; Salt Lake City, Utah; San Francisco, California; and Seattle, Washington. In Europe, UrbanSim has been applied in Paris, France; Brussels, Belgium; and Zurich, Switzerland, with various other applications not yet documented in published papers.

Architecture

The initial version of UrbanSim was implemented in Java. The software architecture was modularized and reimplemented in Python beginning in 2005, making extensive use of the NumPy numerical library. The software has been generalized and abstracted from the UrbanSim model system, and is now referred to as the Open Platform for Urban Simulation (OPUS), in order to facilitate a plug-in architecture for models such as activity-based travel, dynamic traffic assignment, emissions, and land cover change. OPUS includes a graphical user interface and a concise expression language to facilitate access to complex internal operations by non-programmers.

Design

Earlier urban model systems were generally based on deterministic solution algorithms, such as Spatial Interaction or Spatial Input-Output, which emphasize repeatability and uniqueness of convergence to an equilibrium but rest on strong assumptions about behavior, such as agents having perfect information about all the alternative locations in the metropolitan area, transactions being costless, and markets being perfectly competitive. Housing booms and busts, and the financial crisis, are relatively clear examples of market imperfections that motivate the use of less restrictive assumptions in UrbanSim. Rather than calibrating the model to a cross-sectional equilibrium, or base-year set of conditions, statistical methods have been developed to calibrate the uncertainty in UrbanSim arising from its use of Monte Carlo methods and from uncertainty in data and models, against observed data over a longitudinal period, using a method known as Bayesian melding.

In addition to its less strong assumptions about markets, UrbanSim departs from earlier model designs that used high levels of aggregation, both of geography into large zones and of agents such as households and jobs into large groups assumed to be homogeneous. Instead, UrbanSim adopts a microsimulation approach, meaning that it represents individual agents within the simulation. It is an agent-level model system, but unlike most agent-based models it does not focus exclusively on the interactions of adjacent agents. Households, businesses or jobs, buildings, and land areas represented alternatively by parcels, gridcells, or zones are used to represent the agents and locations within a metropolitan area. The parcel-level modeling applications allow for the first time the representation of accessibility at a walking scale, something that cannot be effectively done at high levels of spatial aggregation.

Engagement

One of the motivations for the UrbanSim project is to not only provide robust predictions of the potential outcomes of different transportation investments and land use policies, but also to facilitate more deliberative civic engagement in what are often contentious debates about transportation infrastructure, or land policies, with uneven distributions of benefits and costs. Initial work on this topic has adopted an approach called Value Sensitive Design. Recent work has also emerged to integrate new forms of visualization, including 3D simulated landscapes.

Traffic Simulation

Traffic simulation, or the simulation of transportation systems, is the mathematical modeling of transportation systems (e.g., freeway junctions, arterial routes, roundabouts, downtown grid systems, etc.) through the application of computer software to better help plan, design and operate transportation systems. Simulation of transportation systems started over forty years ago and is an important discipline within traffic engineering and transportation planning today. Various national and local transportation agencies, academic institutions and consulting firms use simulation to aid in their management of transportation networks.

Simulation in transportation is important because it can study models too complicated for analytical or numerical treatment, can be used for experimental studies, can study detailed relations that might be lost in analytical or numerical treatment, and can produce attractive visual demonstrations of present and future scenarios.

To understand simulation, it is important to understand the concept of system state, which is a set of variables that contains enough information to describe the evolution of the system over time. System state can be either discrete or continuous. Traffic simulation models are classified according to discrete and continuous time, state, and space.


Traffic simulation types

Theory

Simulation methods in transportation can employ a selection of theories, including probability and statistics, differential equations and numerical methods.

Traffic Models

• Monte Carlo method

One of the earliest discrete event simulation models is the Monte Carlo simulation, where a series of random numbers are used to synthesise traffic conditions.

• Cellular automata model

This was followed by the cellular automata model that generates randomness from deterministic rules.

• Discrete event and continuous-time simulation

More recent methods use either discrete event simulation or continuous-time simulation. Discrete event simulation models are both stochastic (with random components) and dynamic (time is a variable). Single server queues, for instance, can be modeled very well using discrete event simulation, as servers are usually at a single location and so are discrete (e.g. traffic lights). Continuous-time simulation, on the other hand, can address the shortcoming of discrete event simulation in cases where the model is required to have input, state and output trajectories within a time interval. The method requires the use of differential equations, specifically numerical integration methods. These equations can range from simple methods, such as Euler’s method, to higher-order methods such as Heun’s method and Runge-Kutta methods.

• Car-following models


A class of microscopic continuous-time models, known as car-following models, is also based on differential equations. Significant models include the Pipes model, the intelligent driver model and Gipps’ model. They model the behavior of each individual vehicle (“microscopic”) in order to see its implications for the whole traffic system (“macroscopic”). Employing a numerical method with a car-following model (such as Gipps’ model with Heun’s method) can generate important information about traffic conditions, such as system delays and the identification of bottlenecks.
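
A minimal sketch of this idea follows; it pairs the intelligent driver model (one of the car-following models named above, in its standard published form) with Heun’s method to advance a single follower behind a constant-speed leader. All parameter values are illustrative assumptions, and the code is a didactic sketch rather than part of any simulation package.

    import math

    # Intelligent driver model (IDM) parameters -- illustrative values.
    v0, T, a_max, b, s0, delta, length = 30.0, 1.5, 1.0, 2.0, 2.0, 4, 5.0
    x_lead0, v_lead = 100.0, 25.0          # leader: initial position (m), constant speed (m/s)

    def idm_accel(gap, v, dv):
        # IDM acceleration for a follower with bumper-to-bumper gap `gap`,
        # speed v and approach rate dv = v - v_leader.
        s_star = s0 + v * T + v * dv / (2.0 * math.sqrt(a_max * b))
        return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

    def rates(x, v, t):
        # Time derivatives (dx/dt, dv/dt) of the follower state.
        x_lead = x_lead0 + v_lead * t      # leader moves at constant speed
        gap = x_lead - x - length
        return v, idm_accel(gap, v, v - v_lead)

    # Heun's method (improved Euler): Euler predictor, then average the two slopes.
    dt, x, v = 0.5, 0.0, 20.0
    for step in range(240):                # simulate 120 seconds
        t = step * dt
        dx1, dv1 = rates(x, v, t)
        dx2, dv2 = rates(x + dt * dx1, v + dt * dv1, t + dt)
        x += 0.5 * dt * (dx1 + dx2)
        v += 0.5 * dt * (dv1 + dv2)

    print("follower speed after 120 s: %.2f m/s" % v)

Tracking the gap and speed of every vehicle in a stream in this way is what yields system-level quantities such as delay and bottleneck locations.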

Systems Planning

The methods noted above are generally used to model the behavior of an existing system, and are often focused on specific areas of interest under a range of conditions (such as a change in layout, lane closures, and different levels of traffic flow). Transport planning and forecasting can be used to develop a wider understanding of traffic demands over a broad geographic area and to predict future traffic levels at different links (sections) in the network, incorporating different growth scenarios, with feedback loops to incorporate the effect of congestion on the distribution of trips.

Applications in Transportation Engineering

Traffic simulation models are useful from microscopic, macroscopic and sometimes mesoscopic perspectives. Simulation can be applied both to transportation planning and to transportation design and operations. In transportation planning, simulation models evaluate the impacts of regional urban development patterns on the performance of the transportation infrastructure. Regional planning organizations use these models to evaluate what-if scenarios in the region, such as air quality, to help develop land use policies that lead to more sustainable travel. Modeling of transportation system operations and design, on the other hand, focuses on a smaller scale, such as a highway corridor and pinch-points. Lane types, signal timing and other traffic-related questions are investigated to improve local system effectiveness and efficiency. While certain simulation models are specialized to model either operations or system planning, certain models have the capability to model both to some degree.

Whether it is for planning or for systems operations, simulations can be used for a variety of transportation modes.

Roadway and Ground Transportation

Map displaying the results of simulating pedestrian traffic at the National September 11 Memorial & Museum site, based on modeling by the Louis Berger Group


Ground transportation for both passenger and goods movement is perhaps the area where simulation is most widely used. Simulation can be carried out at a corridor level, or at a more complex roadway grid network level to analyze planning, design and operations such as delay, pollution, and congestion. Ground transportation models can include all modes of roadway travel, including vehicles, trucks, buses, bicycles and pedestrians. In traditional road traffic models, aggregate representation of traffic is typically used, where all vehicles of a particular group obey the same rules of behavior; in micro-simulation, driver behavior and network performance are included so that complete traffic problems (e.g. Intelligent transportation system, shockwaves) can be examined.

Rail Transportation

Rail is an important mode of travel for both freight and passengers. Modeling railways for freight movement is important to determine the operational efficiency and rationalize planning decisions. Freight simulation can include aspects such as dedicated truck lanes, commodity flow, corridor and system capacity, traffic assignment/network flow, and freight plans that involve travel demand forecasting.

Maritime and Air Transportation

Maritime and air transportation present two areas that are important for the economy. Maritime simulation primarily includes container terminal modeling, which deals with the logistics of container handling to improve system efficiency. Air transportation simulation primarily involves modeling of airport terminal operations (baggage handling, security checkpoints) and runway operations.

Other

In addition to simulating individual modes, it is often important to simulate a multi-modal network, since in reality modes are integrated and present complexities that each individual mode on its own can overlook. Inter-modal network simulation can also provide a better understanding of the impact of a given network from a comprehensive perspective, in order to represent its impact more accurately and realize important policy implications. An example of an inter-modal simulator is Commuter, developed by Azalient, which introduces both dynamic route and mode choice by agents during simulation - this type of modeling is referred to as nanosimulation because it considers demand and travel at a finer level of detail than traditional microsimulation.

Simulation in transportation can also be integrated with urban environment simulation, where a large urban area is simulated which includes roadway networks, to better understand land use and other planning implications of the traffic network on the urban environment.

Software Programs

Simulation software is getting better in a variety of different ways. With new advancements in mathematics, engineering and computing, simulation software programs are increasingly becoming faster, more powerful, more detail oriented and more realistic.

Transportation models generally can be classified into microscopic, mesoscopic, macroscopic, and metascopic models. Microscopic models study individual elements of transportation systems, such as individual vehicle dynamics and individual traveler behavior. Mesoscopic models analyze transportation elements in small groups, within which elements are considered homogeneous; typical examples are vehicle platoon dynamics and household-level travel behavior. Macroscopic models deal with aggregated characteristics of transportation elements, such as aggregated traffic flow dynamics and zonal-level travel demand analysis.

Microsimulation

Microsimulation models track individual vehicle movements on a second or subsecond basis. Microsimulation relies on random numbers to generate vehicles, select routing decisions, and determine behavior. Because of this variation, it is necessary to run the model several times with different random number seeds to obtain the desired accuracy. There will be a ‘warm-up’ period before the system reaches a steady state, and this period should be excluded from the results.
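
As a purely illustrative sketch of this workflow, the toy “model” below is a made-up stand-in for a microsimulation run (it just draws noisy per-second delays), but it shows the pattern of running several seeds, discarding the warm-up period and then aggregating across runs:

    import random
    import statistics

    def run_model(seed, horizon=3600, warm_up=600):
        # Toy stand-in for one microsimulation run: noisy per-second delay values.
        rng = random.Random(seed)
        delays = []
        for t in range(horizon):
            base = 10.0 if t < warm_up else 14.0   # system loads up, then settles
            delays.append(rng.gauss(base, 2.0))
        return statistics.mean(delays[warm_up:])   # exclude the warm-up period

    # Several runs with different random number seeds, then mean and spread.
    results = [run_model(seed) for seed in range(10)]
    print("mean delay: %.2f s, st. dev. across seeds: %.2f s"
          % (statistics.mean(results), statistics.stdev(results)))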

Microsimulation models usually produce two types of results: animated displays, and numerical output in text files. It is important to understand how the software has accumulated and summarized the numerical results to prevent incorrect interpretation. Animation can allow the analyst to quickly assess the performance; however, it is limited to qualitative comparisons. The main indication of a problem that can be seen in an animation is the forming of persistent queues.

‘Measures of Effectiveness’ (MOEs) may be calculated or defined in a manner which is unique to each simulation program. MOEs are the system performance statistics that categorize the degree to which a particular alternative meets the project objectives. The following MOEs are most common when analyzing simulation models:

• ‘VMT’ (vehicle miles traveled) is computed as a combination of the number of vehicles in the system and the distance they traveled.

• ‘VHT’ (vehicle hours of travel) is computed as the product of the link volume and the link travel time, summed over all links.

• ‘Mean system speed’ is equal to VMT/VHT (a short computed example of these three measures follows this list).

• ‘Total system delay’ is one of the most effective ways to evaluate different congestion-relieving alternatives, and it is usually the MOE that the travelling public notices. Delay can be calculated several ways. Some consider it to be only that delay which is above free-flow conditions. Others include the baseline delay which occurs as a result of traffic control devices. Some even include acceleration and deceleration delay, while others include only stopped delay.
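
Taken directly from the definitions above, and using made-up link records purely for illustration, the three basic measures can be computed as follows:

    # Illustrative link records: (volume in vehicles, length in miles, travel time in hours).
    links = [
        (1200, 0.50, 0.012),
        (800,  0.75, 0.020),
        (450,  1.10, 0.031),
    ]

    vmt = sum(volume * length for volume, length, _ in links)     # vehicle miles traveled
    vht = sum(volume * time for volume, _, time in links)         # vehicle hours of travel
    mean_speed = vmt / vht                                        # mean system speed (mph)

    print("VMT = %.0f veh-mi, VHT = %.2f veh-h, mean speed = %.1f mph"
          % (vmt, vht, mean_speed))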

Other commonly reported metrics from traffic simulation tools include:

• Link road section speeds, flow, density, travel time, delay, stop time

• Intersection turning volumes and delay

• Journey times

______WORLD TECHNOLOGIES ______302 Numerical Analysis, Modelling and Simulation

• Loop detector records for speed, occupancy, headway, gap

• Vehicle trajectories and speed vs. distance plots

Comparing Simulation Results with the US Highway Capacity Manual

The output of a microsimulation model is different from that of the US Federal Highway Capacity Manual (HCM). For example, most HCM procedures assume that the operation of one intersection will not be affected by the conditions of an adjacent roadway (with the exception of HCS 2000 Freeways). ‘Rubbernecking’ and long queues from one location interfering with another location would contradict this assumption.

The HCM 2010 provides revised guidance on which types of output from traffic simulation software are most suitable for analysis in, and comparison to, the HCM, for example vehicle trajectories and raw loop detector output.

Comparison with HCM Delay and Level of Service

In the HCM, delay is used to estimate the level of service (LOS) for intersections. However, there are distinct differences between the way microsimulation programs and the HCM define delay. The HCM bases its delay on adjusted flow, using mean control delay for the highest 15-minute period within the hour. The distinction between total delay and control delay is important: control delay is incurred when a signal control causes a group of vehicles to slow down or stop. It is important to look at the software’s documentation to understand how it calculates delay. In order to use microsimulation outputs to find LOS, the delay must be accumulated over 15-minute intervals and averaged over several runs with different random seeds. Because the HCM uses adjusted flow, another way to compare delay is to divide the simulation input’s 15-minute peak volume by the peak hour factor (PHF) to increase the simulation’s volume.
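
For reference, the standard definitions involved (stated from the usual HCM conventions, with illustrative numbers rather than values from this text) are

\mathrm{PHF} = \frac{V}{4\,V_{15}}, \qquad v = \frac{V}{\mathrm{PHF}} = 4\,V_{15},

so, for example, an hourly volume V = 1800 veh/h with a peak 15-minute volume V_{15} = 500 veh gives PHF = 1800/2000 = 0.90 and an adjusted flow rate v = 1800/0.90 = 2000 veh/h.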

Comparison with HCM Queues

HCM 2000 defines a queue as a line of vehicles, bicycles, or persons waiting to be served by the system, in which the flow rate from the front of the queue determines the average speed within the queue. Slowly moving vehicles or people joining the rear of the queue are usually considered part of the queue. These definitions are somewhat relative and can be ambiguous. In most microsimulation programs the queue length cannot exceed the storage capacity for that turn bay or lane. Overflows into the adjacent link or off the network are usually not accounted for, even though this may affect the results. (If this is the case, a work-around can be to temporarily ignore those effects and extend the network or storage area for the link to include the maximum queue length.)

Stochastic Simulation

A stochastic simulation is a simulation that traces the evolution of variables that can change stochastically (randomly) with certain probabilities.


With a stochastic model, we create a projection which is based on a set of random values. Outputs are recorded and the projection is repeated with a new set of random values of the variables. These steps are repeated until a sufficient amount of data is gathered. In the end, the distribution of the outputs shows the most probable estimates as well as a frame of expectations regarding what ranges of values the variables are more or less likely to fall in.

Etymology

Stochastic originally meant “pertaining to conjecture”; from Greek stokhastikos “able to guess, conjecturing”: from stokhazesthai “guess”; from stokhos “a guess, aim, target, mark”. The sense of “randomly determined” was first recorded in 1934, from German Stochastik.

Discrete-event Simulation

In order to determine the next event in a stochastic simulation, the rates of all possible changes to the state of the model are computed and then ordered in an array. Next, the cumulative sum of the array is taken, and the final cell contains the number R, where R is the total event rate. This cumulative array is now a discrete cumulative distribution, and can be used to choose the next event by picking a random number z ~ U(0, R) and choosing the first event such that z is less than the cumulative rate associated with that event.
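
A minimal sketch of this selection step is given below; the event rates are placeholders, and the exponential waiting time with parameter R is included because that is how the standard direct method advances the clock.

    import random
    import itertools

    rates = [0.5, 1.2, 0.3, 2.0]            # placeholder rates of the possible events

    def next_event(rates, rng=random):
        cumulative = list(itertools.accumulate(rates))
        R = cumulative[-1]                  # total event rate
        z = rng.uniform(0.0, R)             # z ~ U(0, R)
        for index, c in enumerate(cumulative):
            if z < c:                       # first event whose cumulative rate exceeds z
                break
        dt = rng.expovariate(R)             # waiting time until that event
        return index, dt

    event, dt = next_event(rates)
    print("fire event", event, "after", round(dt, 3), "time units")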

Probability Distributions

A probability distribution is used to describe the potential outcome of a random variable.

A discrete probability distribution limits the outcomes so that the variable can only take on discrete values.

Bernoulli Distribution

A random variable X is Bernoulli-distributed with parameter p if it has only two possible outcomes, usually encoded as 1 (success or default) or 0 (failure or survival).

Example: Toss of coin

Define X = 1 if a head comes up and X = 0 if a tail comes up. Both realizations are equally likely:

P(X = 1) = P(X = 0) = 1/2.

Of course, the two outcomes may not be equally likely (e.g. the success of a medical treatment).

Binomial Distribution

A binomially distributed random variable Y with parameters n and p is obtained as the sum of n independent and identically Bernoulli-distributed random variables X1, X2, ..., Xn.


Example: A coin is tossed three times. Find the probability of getting exactly two heads. This problem can be solved by looking at the sample space; there are three ways to get two heads:

HHH, HHT, HTH, THH, TTH, THT, HTT, TTT

The answer is 3/8 (= 0.375).
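
The same answer follows directly from the binomial probability mass function with n = 3 and p = 1/2:

P(Y = 2) = \binom{3}{2}\left(\frac{1}{2}\right)^{2}\left(\frac{1}{2}\right)^{1} = \frac{3}{8} = 0.375.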

Poisson Distribution

The Poisson distribution depends on only one parameter, λ, and can be interpreted as an approximation to the binomial distribution when the parameter p is small and the number of trials n is large. A Poisson-distributed random variable is usually used to describe the random number of events occurring over a certain time interval.

Typical example problem: if 3% of the electric bulbs manufactured by a company are defective, find the probability that in a sample of 100 bulbs exactly 5 bulbs are defective. Here the Poisson parameter is λ = np = 100 × 0.03 = 3 (and e^{-3} ≈ 0.0498).
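
Completing the example with the Poisson probability mass function (the arithmetic is supplied here for illustration):

P(X = k) = \frac{e^{-\lambda}\,\lambda^{k}}{k!}, \qquad P(X = 5) = \frac{e^{-3}\,3^{5}}{5!} \approx 0.101.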

Methods

Direct and First Reaction Methods

These methods were published by Dan Gillespie in 1977; the direct method performs a linear search on the cumulative array.

Gillespie’s Stochastic Simulation Algorithm (SSA) is essentially an exact procedure for numerically simulating the time evolution of a well-stirred chemically reacting system by taking proper account of the randomness inherent in such a system.

It is rigorously based on the same microphysical premise that underlies the chemical master equation and gives a more realistic representation of a system’s evolution than the deterministic reaction rate equation (RRE) represented mathematically by ODEs.

As with the chemical master equation, the SSA converges, in the limit of large numbers of reactants, to the same solution as the law of mass action.

Next Reaction Method

Published in 2000. This is an improvement over the first reaction method, in which the unused reaction times are reused. To make the sampling of reactions more efficient, an indexed priority queue is used to store the reaction times. To make the recomputation of propensities more efficient, a dependency graph is used. This dependency graph tells which reaction propensities to update after a particular reaction has fired.

Optimised and Sorting Direct Methods

Published in 2004 and 2005. These methods sort the cumulative array to reduce the average search depth of the algorithm. The former runs a presimulation to estimate the firing frequency of reactions, whereas the latter sorts the cumulative array on the fly.


Logarithmic Direct Method

Published in 2006. This is a binary search on the cumulative array, thus reducing the worst-case time complexity of reaction sampling to O(log M).

Partial-propensity Methods

Published in 2009, 2010, and 2011 (Ramaswamy 2009, 2010, 2011). These methods use factored-out, partial reaction propensities to reduce the computational cost so that it scales with the number of species in the network, rather than the (larger) number of reactions. Four variants exist:

• PDM, the partial-propensity direct method. Has a computational cost that scales linearly with the number of different species in the reaction network, independent of the coupling class of the network (Ramaswamy 2009).

• SPDM, the sorting partial-propensity direct method. Uses dynamic bubble sort to reduce the pre-factor of the computational cost in multi-scale reaction networks where the reaction rates span several orders of magnitude (Ramaswamy 2009).

• PSSA-CR, the partial-propensity SSA with composition-rejection sampling. Reduces the computational cost to constant time (i.e., independent of network size) for weakly coupled networks (Ramaswamy 2010) using composition-rejection sampling (Slepoy 2008).

• dPDM, the delay partial-propensity direct method. Extends PDM to reaction networks that incur time delays (Ramaswamy 2011) by providing a partial-propensity variant of the delay-SSA method (Bratsun 2005, Cai 2007).

The use of partial-propensity methods is limited to elementary chemical reactions, i.e., reactions with at most two different reactants. Every non-elementary chemical reaction can be equivalently decomposed into a set of elementary ones, at the expense of a linear (in the order of the reaction) increase in network size.

Approximate Methods

A general drawback of stochastic simulations is that for big systems, too many events happen which cannot all be taken into account in a simulation. The following methods can dramatically improve simulation speed by making some approximations.

Tau-Leaping Method

Since the SSA method keeps track of each transition, it would be impractical to implement for certain applications due to its high time complexity. Gillespie proposed an approximation procedure, the tau-leaping method, which decreases the computational time with minimal loss of accuracy. Instead of taking incremental steps in time, keeping track of X(t) at each time step as in the SSA method, the tau-leaping method leaps from one subinterval to the next, approximating how many transitions take place during a given subinterval. It is assumed that the value of the leap, τ, is small enough that there is no significant change in the value of the transition rates along the subinterval [t, t + τ]. This condition is known as the leap condition. The tau-leaping method thus has the advantage of simulating many transitions in one leap while not losing significant accuracy, resulting in a speed-up in computational time.
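
A minimal sketch of the idea for a single reaction channel is shown below (a birth process whose propensity is proportional to the population). The rate constant, leap size and horizon are illustrative assumptions, and a practical tau-leaping code would also check the leap condition and adapt τ.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy system: one species X produced by a single reaction with propensity c * X,
    # advanced with fixed-tau leaping.
    c, tau, t_end = 0.1, 0.5, 20.0
    x, t = 10, 0.0
    while t < t_end:
        propensity = c * x
        # Number of firings in [t, t + tau) is approximated as Poisson(propensity * tau);
        # this is only valid while the propensity stays nearly constant (leap condition).
        firings = rng.poisson(propensity * tau)
        x += firings
        t += tau

    print("population after", t_end, "time units:", x)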

Conditional Difference Method

This method approximates reversible processes (which include random walk/diffusion processes) by taking into account only the net rates of the opposing events of a reversible process. The main advantage of this method is that it can be implemented with a simple if-statement replacing the previous transition rates of the model with new, effective rates. The model with the replaced transition rates can thus be solved, for instance, with the conventional SSA.

Continuous Simulation

While in a discrete state space there is a clear distinction between particular states (values), in a continuous state space this is not possible because of continuity. If the system changes over time, the variables of the model change continuously as well. Continuous simulation thereby simulates the system over time, given differential equations determining the rates of change of the state variables. Examples of continuous systems are the predator/prey model and cart-pole balancing.
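
As an illustration of continuous simulation, the classic Lotka-Volterra predator/prey equations can be advanced with a small time step; the coefficients and initial populations below are illustrative assumptions, and a higher-order integrator would normally be preferred over the plain Euler update shown here.

    # Lotka-Volterra predator/prey model:
    #   dx/dt = a*x - b*x*y   (prey)
    #   dy/dt = d*x*y - g*y   (predators)
    a, b, d, g = 1.0, 0.1, 0.075, 1.5      # illustrative coefficients
    x, y = 10.0, 5.0                       # initial prey and predator populations
    dt, steps = 0.001, 20000               # small step, 20 time units in total

    for _ in range(steps):
        dx = a * x - b * x * y
        dy = d * x * y - g * y
        x += dt * dx                       # explicit Euler update
        y += dt * dy

    print("prey: %.2f, predators: %.2f after 20 time units" % (x, y))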

Probability Distributions

Normal Distribution

The random variable X is said to be normally distributed with parameters μ and σ, abbreviated X ~ N(μ, σ²), if its density is given by the formula

f_X(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \qquad x \in \mathbb{R}.

Many things actually are normally distributed, or very close to it. For example, height and intelligence are approximately normally distributed; measurement errors also often have a normal distribution.

Exponential Distribution

Exponential distribution describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate.

The exponential distribution is popular, for example, in queuing theory when we want to model the time we have to wait until a certain event takes place. Examples include the time until the next client enters the store, the time until a certain company defaults or the time until some machine has a defect.

Student’s T-Distribution

Student’s t-distribution is used in finance as a probabilistic model of asset returns. The density function of the t-distribution is given by the following equation:

f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu+1}{2}},

where ν is the number of degrees of freedom and Γ is the gamma function.


For large values of ν, the t-distribution does not differ significantly from a standard normal distribution. Usually, for values ν > 30, the t-distribution is considered equal to the standard normal distribution.

Other Distributions

• Generalized extreme value distribution

Combined Simulation

It is often possible to model one and the same system using completely different world views. Discrete event simulation of a problem, as well as continuous event simulation of it (continuous simulation with discrete events that disrupt the continuous flow), may eventually lead to the same answers. Sometimes, however, the techniques can answer different questions about a system. If we need to answer all the questions, or if we do not know what purposes the model is going to be used for, it is convenient to apply a combined continuous/discrete methodology. Similar techniques can change from a discrete, stochastic description to a deterministic, continuum description in a time- and space-dependent manner. The use of this technique enables the capturing of noise due to small copy numbers, while being much faster to simulate than the conventional Gillespie algorithm. Furthermore, the use of the deterministic continuum description enables the simulation of arbitrarily large systems.

Monte Carlo Simulation

Monte Carlo is an estimation procedure. The main idea is that if it is necessary to know the average value of some random variable and its distribution cannot be stated, but it is possible to take samples from the distribution, then we can estimate the average by taking the samples independently and averaging them. If there are sufficient samples, then the law of large numbers says the average must be close to the true value. The central limit theorem says that the average has a Gaussian distribution around the true value.

Simple example: we need to measure the area of a shape with a complicated, irregular outline. The Monte Carlo approach is to draw a square around the shape and measure the square. Then we throw darts into the square, as uniformly as possible. The fraction of darts falling on the shape gives the ratio of the area of the shape to the area of the square. In fact, it is possible to cast almost any integral problem, or any averaging problem, into this form. It is necessary to have a good way to tell whether a dart lands inside the outline, and a good way to figure out how many darts to throw. Last but not least, we need to throw the darts uniformly, i.e., we need a good random number generator.
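
A minimal sketch of the dart-throwing example, using a unit circle inside a bounding square so that the exact answer (π) is known in advance:

    import random

    random.seed(1)
    n_darts, hits = 100000, 0
    for _ in range(n_darts):
        # Throw a dart uniformly into the square [-1, 1] x [-1, 1].
        x, y = random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:           # did the dart land inside the circle?
            hits += 1

    square_area = 4.0
    estimate = square_area * hits / n_darts
    print("estimated circle area:", round(estimate, 4), "(exact: 3.1416)")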

Application

There are wide possibilities for the use of the Monte Carlo method:

• Statistical experiments using generation of random variables (e.g. dice)

• Sampling methods

• Mathematics (e.g. numerical integration, multiple integrals)

• Reliability engineering

• Project management (Six Sigma)

• Experimental particle physics

• Simulations

• Risk measurement/risk management (e.g. portfolio value estimation)

• Economics (e.g. finding the best-fitting demand)

• Process simulation

• Operations research

Random Number Generators

For simulation experiments (including Monte Carlo) it is necessary to generate random numbers (as values of variables). The problem is that the computer is a highly deterministic machine: behind each process there is always an algorithm, a deterministic computation changing inputs into outputs. It is therefore not easy to generate uniformly spread random numbers over a defined interval or set.

A random number generator is a device capable of producing a sequence of numbers which cannot be “easily” identified with deterministic properties. This sequence is then called a sequence of stochastic numbers.

The algorithms typically rely on pseudo-random numbers, computer-generated numbers mimicking true random numbers, to generate a realization, one possible outcome of a process.
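
For illustration, one of the oldest pseudo-random schemes is the linear congruential generator; the constants below are the well-known Park-Miller “minimal standard” parameters, and the sketch is included only to make the deterministic nature of such sequences concrete, not as a recommended generator.

    # Minimal-standard linear congruential generator:
    #   x_{k+1} = (a * x_k) mod m, with uniform variates u_k = x_k / m.
    a, m = 16807, 2**31 - 1

    def lcg(seed, count):
        x = seed
        out = []
        for _ in range(count):
            x = (a * x) % m                # fully deterministic update
            out.append(x / m)              # map to (0, 1)
        return out

    print(lcg(seed=42, count=5))           # the same seed always reproduces this sequence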

Methods for obtaining random numbers have existed for a long time and are used in many different fields (such as gaming). However, these numbers suffer from certain biases. Currently the best methods, expected to produce truly random sequences, are natural methods that take advantage of the random nature of quantum phenomena.

References

• Peter John Davison. “A summary of studies conducted on the effect of motion in flight simulator pilot training” (PDF). MPL Simulator Solutions. Retrieved September 18, 2016.

• Spill, F.; et al. “Hybrid approaches for multiple-species stochastic reaction–diffusion models”. Journal of Computational Physics. 299: 429–445. doi:10.1016/j.jcp.2015.07.002.

• Beard, Steven; et al. “Space Shuttle Landing and Rollout Training at the Vertical Motion Simulator” (PDF). AIAA. Retrieved 5 February 2014.

• Fly Away Simulation (12 July 2010). “Flight Simulator Technology Through the Years”. Archived from the original on 12 October 2011. Retrieved 20 April 2011.

Permissions

All chapters in this book are published with permission under the Creative Commons Attribution Share Alike License or equivalent. Every chapter published in this book has been scrutinized by our experts. Their significance has been extensively debated. The topics covered herein carry significant information for a comprehensive understanding. They may even be implemented as practical applications or may be referred to as a beginning point for further studies.

We would like to thank the editorial team for lending their expertise to make the book truly unique. They have played a crucial role in the development of this book. Without their invaluable contributions this book wouldn’t have been possible. They have made vital efforts to compile up to date information on the varied aspects of this subject to make this book a valuable addition to the collection of many professionals and students.

This book was conceptualized with the vision of imparting up-to-date and integrated information in this field. To ensure the same, a matchless editorial board was set up. Every individual on the board went through rigorous rounds of assessment to prove their worth. After which they invested a large part of their time researching and compiling the most relevant data for our readers.

The editorial board has been involved in producing this book since its inception. They have spent rigorous hours researching and exploring the diverse topics which have resulted in the successful publishing of this book. They have passed on their knowledge of decades through this book. To expedite this challenging task, the publisher supported the team at every step. A small team of assistant editors was also appointed to further simplify the editing procedure and attain best results for the readers.

Apart from the editorial board, the designing team has also invested a significant amount of their time in understanding the subject and creating the most relevant covers. They scrutinized every image to scout for the most suitable representation of the subject and create an appropriate cover for the book.

The publishing team has been an ardent support to the editorial, designing and production team. Their endless efforts to recruit the best for this project, has resulted in the accomplishment of this book. They are a veteran in the field of academics and their pool of knowledge is as vast as their experience in printing. Their expertise and guidance has proved useful at every step. Their uncompromising quality standards have made this book an exceptional effort. Their encouragement from time to time has been an inspiration for everyone.

The publisher and the editorial board hope that this book will prove to be a valuable source of knowledge for students, practitioners and scholars across the globe.

Index

A
A Posteriori Methods, 210, 212-213
A Priori Methods, 210, 212
Analytica (Software), 272
Approximation Theory, 108-109, 111, 113, 115, 117-119, 145
Ascend, 48, 52-53

B
Backward Euler Method, 122, 126
Bayesian Quadrature, 171
Bernoulli Distribution, 303
Bernstein's Theorem (Approximation Theory), 117
Binomial Distribution, 303-304
Bishop's Theorem, 116

C
Calculus of Optimization, 198
Chebyshev Approximation, 108, 110
Combinatorial Optimization Problem, 203-204
Computer Simulation, 11, 13-14, 22, 34-35, 38-41, 47, 244, 255
Continuous Optimization Problem, 203
Convex Maximization, 162
Convex Optimization, 158-159, 162-163, 194-195, 209
Convex Optimization Problem, 159, 162, 195, 209

D
DataMelt, 256, 259-260, 266-271
Differential Equations, 1-2, 9, 37-38, 43, 62-63, 68, 71, 75, 91, 120-122, 126, 131, 146, 162, 164, 171, 174-175, 180, 260, 298-299, 306
Discrete Event Simulation, 44, 61, 298, 307
Discretization Error, 5
Dynamic Simulation, 37, 43, 48-49, 51

E
Euler Method, 8, 120-123, 126
Evaluating Integrals, 9, 172
Exponential Distribution, 306
Extrapolation, 7-8, 68, 120, 124, 126, 128, 130-131, 133-135, 145, 169

F
Favard's Theorem, 108, 118
Fejér's Theorem, 108, 116-117
First-order Exponential Integrator Method, 123
Fit to Empirical Data, 67
Flight Simulator, 12, 22, 29, 285-290, 308

G
Gaussian Elimination, 2, 4, 8, 151-152, 154-158, 224, 251, 253
GNU Octave, 10, 256, 266-268, 275, 278-280

H
Heuristics, 169, 195, 198, 200

I
Importance Sampling Algorithm, 189-190
Interpolation, 2, 7-8, 68, 112, 120, 139-145, 164, 200

J
Julia (Programming Language), 256, 281

L
Lagrange Multipliers, 9, 161, 198
LAPACK, 239, 256-257, 264-265
Linear Interpolation, 2, 7, 141-144
List of Computer Simulation Software, 47
List of Numerical Analysis Software, 256

M
Mathematical Model, 13, 27, 33, 62-64, 66-67, 71, 91, 104-105, 244, 262
Mathematical Optimization, 164, 190, 193, 197, 202, 205, 214
Matrix Splitting, 9, 146-147
Minimum Polynomial Extrapolation, 120, 130
Minsky (Economic Simulator), 48, 52
MISER Monte Carlo, 187
Monte Carlo Integration, 164, 176, 181, 183-186
Monte Carlo Method, 41, 164, 172-174, 176-178, 181-183, 298, 307
Multi-objective Optimization, 196, 202, 205-217
Multi-objective Optimization Software, 216-217
Multidimensional Integrals, 171
Multiple and Adaptive Importance Sampling, 190

N
Nachbin's Theorem, 116
NetLogo, 48, 53-55, 217
No-preference Methods, 211
Normal Distribution, 91, 186, 306-307
NP Optimization Problem, 204
Numerical Integration, 3-4, 9, 110, 120, 128, 164, 166-167, 172, 181, 183-184, 298, 307
Numerical Methods for Ordinary Differential Equations, 120

O
Optimal Input Arguments, 192
Optimal Polynomials, 109
Optimization Algorithms, 2, 197-198, 217
Optimization Problem, 70, 82-83, 159, 162, 164, 182, 190-191, 195-196, 202-206, 209-210, 212

P
Parallel-in-time Methods, 125
Piecewise Constant Interpolation, 141
Poisson Distribution, 304
Polynomial Interpolation, 112, 142-143

Q
Quadrature, 9, 110, 121, 164-167, 169-171
Quasiconvex Minimization, 162

R
Recursive Stratified Sampling, 182, 187
Regression, 7-8, 91, 104, 139, 144, 267
Remez's Algorithm, 110-112
Reservoir Simulation, 291-294
Richardson Extrapolation, 120, 124, 126, 128, 131, 133-135, 169
Robotics Suite, 291

S
Scilab, 10, 48, 55-57, 257, 275
Series Acceleration, 120, 128, 136, 169
Shanks Transformation, 120, 130, 136-139
Simulation, 2, 4, 6, 8, 10-61, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94-96, 98, 100, 102, 104, 106-107, 110, 112, 114, 116, 118, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 166, 170, 172, 174, 176-184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254-255, 258-262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284-308
Singular Value Decomposition, 9, 164, 226-231, 234, 239, 243, 264
Sparse Grids, 9, 171
Spline Interpolation, 142-143
Stochastic Simulation, 38, 176, 295, 302-304
Stone–Weierstrass Theorem, 108, 112-116
Student's t-distribution, 306

T
TK Solver, 10, 256, 262-264
Tradeoff Curve, 215
Traffic Simulation, 13, 42, 285, 297-299, 301-302
Truncation, 5, 36, 125-126

U
UrbanSim, 25, 285, 296-297

V
Van Wijngaarden Transformation, 129, 145
VEGAS Monte Carlo, 188-189

W
Weierstrass Approximation Theorem, 112-113
