Assessing Excel VBA Suitability for Monte Carlo Simulation Alexei Botchkarev Ryerson University, Alex.Bot@Gsrc.Ca

Spreadsheets in Education (eJSiE)

Volume 8 | Issue 2 Article 3

7-18-2015

Follow this and additional works at: http://epublications.bond.edu.au/ejsie

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Recommended Citation Botchkarev, Alexei (2015) Assessing Excel VBA Suitability for Monte Carlo Simulation, Spreadsheets in Education (eJSiE): Vol. 8: Iss. 2, Article 3. Available at: http://epublications.bond.edu.au/ejsie/vol8/iss2/3

This Regular Article is brought to you by the Bond Business School at ePublications@bond. It has been accepted for inclusion in Spreadsheets in Education (eJSiE) by an authorized administrator of ePublications@bond. For more information, please contact Bond University's Repository Coordinator.

Abstract

Monte Carlo (MC) simulation includes a wide range of stochastic techniques used to quantitatively evaluate the behavior of complex systems or processes. Microsoft Excel with Visual Basic for Applications (VBA) is, arguably, the most commonly employed general-purpose tool for MC simulation. Despite the popularity of Excel in many industries and educational institutions, it has been repeatedly criticized for its flaws and often described as questionable, if not completely unsuitable, for statistical problems. The purpose of this study is to assess the suitability of Excel (specifically its 2010 and 2013 versions) with VBA programming as a tool for MC simulation. The results of the study indicate that Microsoft Excel (versions 2010 and 2013) is a strong Monte Carlo simulation application offering a solid framework of core simulation components, including spreadsheets for data input and output, the VBA development environment and summary statistics functions. This framework should be complemented with an external high-quality pseudo-random number generator added as a VBA module. A large and diverse category of Excel’s incidental simulation components, which includes statistical distributions, linear and non-linear regression and other statistical, engineering and business functions, requires due diligence to determine its suitability for a specific MC project.

Editorial note

Although the present article is not directly concerned with “spreadsheets in education”, the decision to publish it was made on more general grounds. Microsoft Excel has received plenty of bad press concerning its statistical functions over a period of many years, and Microsoft has been slow to address many of these issues. However, the situation appears to have improved significantly since the early publications of McCullough & Wilson in the journal Computational Statistics & Data Analysis (references [72] and [73] herein), in which problems with Excel’s statistical functions were highlighted. We have chosen to publish the present article so readers have some more up-to-date information on which to base their decisions. As a final comment, readers should also be aware of the existence of RExcel, a port of the statistical package R to Excel as an add-in. It is a free download and is the work of one of our own editorial board members, Emeritus Professor Erich Neuwirth, University of Vienna. Thus, one may have one’s cake and eat it too: the acknowledged accuracy and respectability of R, along with the friendly, well-known interface of Excel.

Keywords simulation, Monte Carlo, Excel, VBA, spreadsheets, suitability, errors, limitations

Cover Page Footnote The author is grateful to Natalia Botchkareva, a financial analyst with expert knowledge and skills in Excel, for constant support and advice. The author is thankful to an anonymous referee who rejected the author’s previous paper in another journal because its use of Excel spreadsheet MC simulation was deemed inappropriate. That motivated the work on this article. The views, opinions and conclusions expressed in this document are those of the author alone and do not necessarily represent the views of the Ontario Ministry of Health and Long-Term Care or any other organizations the author is affiliated with.

This regular article is available in Spreadsheets in Education (eJSiE): http://epublications.bond.edu.au/ejsie/vol8/iss2/3

Published by ePublications@bond, 2015 1 Spreadsheets in Education (eJSiE), Vol. 8, Iss. 2 [2015], Art. 3

1. Introduction

Monte Carlo (MC) methods [1, 2, 3] denote a wide range of stochastic techniques that model uncertainty by generating inputs from probability distributions and sampling them randomly over many repeated runs (simulations) in order to quantitatively evaluate the characteristics and behavior of complex systems or processes. MC methods are embedded in several computational algorithms. They underpin uncertainty and sensitivity analysis of mathematical (deterministic) models, bootstrapping and other resampling methods in non-parametric statistical inference, numerical integration in general, and Markov chain Monte Carlo simulation, which is a core computational tool for Bayesian statistical inference. MC methods are widely used by scientists, engineers, mathematicians and statisticians to solve problems in engineering [4], physics [5], applied statistics [6], medicine [7], nanobiotechnology [8], economics [9, 10], finance [11], manufacturing and business [12] and many other fields.

Computer implementation of the Monte Carlo method can be achieved through several approaches [10, 13]. First, MC models can be coded in a scientific programming language. Programming languages commonly used for numerical algorithms include Fortran, C, C++ and Java. These languages have libraries of frequently used statistical functions to facilitate program development. Usually, this approach is used to develop tailor-made programs that address specific situations. Second, dedicated software packages, many of them commercial, provide MC simulation environments and components to facilitate modelling and simulation, e.g. ExtendSim [14], Stata [15], Arena Simulation Software [16], gretl [17] and Simulink, as well as the general-purpose statistical packages SAS, MATLAB, R and SPSS. Third, spreadsheet software packages are commonly used general-purpose tools for MC simulation [20, 21]. Schriber dubbed spreadsheet-based MC simulation “simulation for the masses” [22].
Spreadsheet simulation gained its popularity (e.g. [10, 23, 28, 29]) due to its availability, the knowledge and skills users have already developed, simplicity, intuitive visualization and broad range of applications, e.g. [23]. The availability of the Visual Basic for Applications (VBA) programming language extends the number of tasks that can be solved with Excel spreadsheets, e.g. [10, 29].

Multiple papers study the use of spreadsheets for MC simulation and apply spreadsheets to solve practical problems. Barreto and Howland studied application of MC simulation with Excel to econometrics problems [37]. Menn and Holle used Excel with VBA for health economic evaluations [38]. Kying explored Excel multivariate MC simulation as it applies to economic valuation of complex financial contracts [39]. Wang et al presented a practical approach to slope stability reliability analysis using spreadsheet MC simulations [36]. Gedam and Beaudet used spreadsheet MC simulation for predicting reliability of complex systems [45]. Dobrican proposed using Excel MC simulation to forecast demand for automotive aftermarket inventories [46]. Rozycki used Excel-based MC simulation as a capital budgeting risk management tool [35]. Wang and Cao applied spreadsheet MC simulation to geotechnical analysis [40]. Au and Wang used Excel MC simulation for engineering risk assessment [41].

In addition, there is an area of research dedicated to using Excel spreadsheets as a tool to teach MC simulation. Mielczarek and Zabawa discussed spreadsheet MC simulation in teaching management science [23, 27]. Lee demonstrated effective use of spreadsheet simulation to teach project management [28]. Briand and Hill used Excel to teach Monte Carlo experiments to undergraduates in an econometrics course [29]. Yin and Leon taught data resampling to students from business, accounting and economics using Excel for MC simulation [31]. Pecherska and Merkuryev used spreadsheets in teaching simulation concepts [42].

The studies on teaching and applying spreadsheet MC simulation mentioned above focus on applying Excel to solve practical problems. However, all of them presume that Excel is suitable for this task. The researchers skip the due diligence step of verifying that Excel is suitable for their specific problems. None of these studies undertakes, or refers to, work that verifies the suitability of Excel for MC simulation. This is what we address in this article.

Despite the popularity of Excel in many industries and educational institutions, it has been repeatedly criticized for its flaws, such as a low-quality random number generator (RNG) and inaccuracies in its statistical functions, e.g. [43, 44].
Unfortunately, Microsoft’s corrective efforts to address the concerns of the statistical community were too slow and came too late: some identified and well-documented Excel deficiencies were not rectified over several Excel upgrades. These delays aggravated criticism, e.g. [72, 73, 75, 76, 79, 81, 82, 87, 91]. Critics may leave the impression that Excel MC simulation is completely inaccurate, comparable to a Buffon’s needle experiment that calculates π from 300 tosses, and could be used only when numerical results do not matter.

This paper is focused on Excel. The negative remarks presented above should not be construed to mean that other statistical tools are perfect. Excel is not the only statistical tool that has been subjected to criticism. McCullough assessed three very popular statistical packages (SAS, SPSS and S-Plus) and found flaws in all three areas of evaluation: estimation, random number generation, and statistical distributions [69]. The same applies to the results of assessments of several econometric software packages (EViews, LIMDEP, SHAZAM and TSP) [70, 71], and to comparative studies of the commercial simulation packages performed by Abu-Taieh et al [18] and Pezotta et al [19].

The motivation for this study was to tackle an existing knowledge gap represented by two disconnected trends in the academic literature on Monte Carlo spreadsheet simulation. The first is devoted to the use of Excel for MC simulation to solve practical industry-specific problems without considering the risks and limitations related to the use of the tool. The second (mostly expressed by the statistical community) is focused on the generic limitations of Excel as a statistical tool and describes Excel as a questionable, if not completely unsuitable, tool for MC simulations without considering the specifics of the practical problems the tool is used to solve.

In this paper we have taken a more holistic view of the role of Excel within the context of MC simulation. Selection of the software tool for MC simulation should not be based exclusively on the availability of the tool (“Excel is the only one we have”), nor on the assumption that the tool is appropriate because “everybody else is using it”. At the same time, the tool should not be discarded from a list of potential options because of generic limitations which might be irrelevant for the particular problem to be solved. Selection of the tool (and that applies to selection of any software tool for any purpose) should be based on matching the requirements dictated by the problem at hand against known software limitations.

The purpose of this study is to assess the suitability of Excel (specifically its 2010 and 2013 versions) with VBA programming as a tool for MC simulation. To achieve the purpose of the study, the following research questions were formulated:
- What is the status of the statistical capabilities of the most recent Excel versions (in view of the prior known problems)?
- What errors and uncertainties are typical for studies using MC simulation, and how should they be dealt with?
- Is it appropriate to use Excel 2010 and Excel 2013 with VBA programming for MC simulation?
- What are the limitations (if any) of using Excel 2010 and Excel 2013 with VBA programming for MC simulation?

Several methodologies were used to achieve the research objectives: critical literature review, spreadsheet numerical experiment, critical thinking and inductive reasoning.

The scope of the article is defined as follows. The focus is mostly on the recent versions, 2010 and later. Earlier Excel versions are mentioned only where a historic trend needs to be shown. Excel add-ins (e.g. Crystal Ball from Oracle, @RISK from Palisade, Risk Solver add-in from Frontline Systems, etc. [51, 54]) are not included in the scope. Excel deficiencies that do not directly limit MC simulations have been left out of scope. Sometimes, Excel is criticized for not having certain capabilities. For example, the Data Analysis ToolPak does not contain tools for calculating nonparametric test statistics [47]. Another example is the missing function for right-tail probabilities of the Poisson distribution [86]. We consider these issues out of scope.

The paper is intended for academics and researchers interested in MC simulation. It can also be used by practitioners in a broad range of subject areas for evaluating potential tools for MC simulation.

The rest of the paper is organized as follows. In Section 2, we present a literature review of the main Excel statistical components that are commonly used in MC studies: random number generators, statistical distributions, summary statistics, and linear and non-linear regression. We also analyse typical errors encountered in MC simulation and emphasize that the errors induced by the statistical tool are only one of a variety of error sources inherent to MC modelling and simulation. The section also presents the best practices for dealing with errors and uncertainties in MC studies through validation and verification techniques. In Section 3, we present the results of a numerical experiment testing several Excel PRNGs. Section 4 is devoted to the discussion of the results and Section 5 offers concluding remarks.

2. Literature Review

The literature review was conducted in several subject areas that are of interest to most types of MC simulation: random number generators, univariate summary statistics, statistical distributions, and linear and non-linear regression.

2.1. Random Number Generators

Overview of Random Number Generators. Random number generators or, to be precise, pseudo-random number generators (PRNGs) are the key components of MC computer simulators. “An ideal random number generator would provide numbers that are uniformly distributed, uncorrelated, satisfy any statistical test of randomness, have a large period of repetition, can be changed by adjusting an initial "seed" value, are repeatable, portable, and can be generated rapidly using minimal computer memory.” [24]

A PRNG should meet certain desirable quality criteria [5, 34, 48, 89, 94]:
- Good statistical properties of the output numbers.
- Large period. By definition, algorithmic PRNG output is periodic. The expectation is that the length of the period should be significant. Expectations regarding the minimum required period vary from 2^128 [30], to 10^50 (~2^166) [34], to 2^200. Very long periods have been cited (WELL1024a with period 2^1024 − 1, WELLRNG44497 with period 2^44497 − 1 [13]). Sufficiency of the period is determined by the needs of the application.
- Theoretical foundation. A PRNG should be based on a solid mathematical foundation allowing for analytical derivation of the PRNG properties, including period length.
- Repeatability. A PRNG should be able to reproduce exactly the same sequence of numbers from a given seed.
- Computational efficiency. A PRNG should generate numbers at high speed and use a minimum amount of memory.
- Portability. A PRNG should perform the same way on different software/hardware platforms.

There is no mandatory set of requirements that any “usable” PRNG must meet. Only in relation to a specific application can certain quality criteria be given higher or lower priority or discarded altogether.

Many PRNGs have been developed using a variety of underlying mathematical algorithms and with varying output qualities [e.g. 26, 34, 89, 94]. In case of insufficient period, two or more PRNGs can be combined to avoid regular patterns and increase the period [32, 33]. The Mersenne Twister algorithm has been recognized as a valid choice for MC simulation [96]. PRNGs based on this algorithm passed statistical tests and have large periods: e.g. MT11213 has a period of 2^11213 − 1 and MT19937 has a period of 2^19937 − 1 (approximately 10^6001) [34, 94]. Recently, another group of generators was proposed by Panneton, L’Ecuyer and Matsumoto: Well Equidistributed Long-period Linear (WELL) [97]. It has implementations with periods of up to 2^44497 − 1 and alleviates some imperfections of the Mersenne Twister PRNGs.

A variety of statistical tests have been designed to evaluate the statistical qualities of PRNGs. Their primary purpose is to detect correlation and deviations from uniformity.
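As a minimal illustration of what a test for deviations from uniformity looks like, the sketch below computes a Pearson chi-square statistic over equal-width bins. It is far simpler than the test batteries discussed in this section and is not taken from any of them. The function name chi_square_uniform, the seed and the bin count are our own illustrative choices, and Python's standard generator is used only as a convenient source of numbers.

```python
import random

def chi_square_uniform(samples, bins=10):
    """Pearson chi-square statistic for uniformity of samples in [0, 1)."""
    counts = [0] * bins
    for u in samples:
        # min() guards against a value landing exactly on the upper edge
        counts[min(int(u * bins), bins - 1)] += 1
    expected = len(samples) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(2015)  # arbitrary seed, chosen for repeatability
draws = [rng.random() for _ in range(100_000)]
stat = chi_square_uniform(draws)

# With 10 bins there are 9 degrees of freedom; the 1% critical value of the
# chi-square distribution is about 21.67. A larger statistic suggests
# non-uniform output.
print(f"chi-square statistic = {stat:.2f}")
```

A statistic below the critical value does not prove that the generator is good. It only means that this one test did not detect a problem.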
The practical importance of statistical testing of PRNGs cannot be overestimated, because certain types of correlations and other stochastic imperfections can lead to large systematic errors in MC simulation [25, 92]. The most commonly accepted tests are TestU01 [52] and the Diehard tests [53], along with Dieharder, an expanded open-source version of Diehard [95]. They involve large data sets and multiple algorithms.

New and more comprehensive tests are regularly being suggested. To a certain extent, this reflects scientists’ growing and changing understanding of the phenomenon of randomness. Newly developed tests may even “take out of business” PRNGs that have been used and trusted for years. For example, that happened to RANDU, a standard PRNG for the IBM 360 and 370 series [2, 94]: unacceptable correlation was found in the output numbers. This happened even though, arguably, almost everything we used to fly, launch and fire in the second half of the 20th century was based on MC simulation using RANDU. L’Ecuyer was right in stating that: “Of course, the quality of a generator can never be proven by any statistical test.” [32]

Another practical approach to testing the quality of a generator is to use it in an MC simulation of a problem that has a proven theoretical solution. The result of the simulation can be compared to the known value. Two obvious conditions for using this approach are that the exact solution should be known and that the MC algorithm should be validated. With this approach, any simulation problem can become a statistical test. Coddington has shown that many widely used generators that passed certain randomness tests, were recommended in many textbooks and were included in some commercial software products failed this “reality check” [24, 48]. It should be noted that passing this test does not guarantee the quality of the PRNG in general or in any other application. Advice given by Ferrenberg, Landau & Wong long ago is still valid: “…a specific algorithm must be tested together with the random number generator being used regardless of the tests which the generator has passed.” [25]

Recommendations to practitioners regarding PRNGs sometimes sound a bit relaxed. Dupree and Fraley state: “Fortunately, imperfections of the random number generators are frequently minor and can usually be tolerated by Monte Carlo practitioners.” [49] Halton states: “Since we can neither prove that any particular process or device is a priori random, nor test its output exhaustively, the search for randomness is evidently futile. This would be very discouraging, were it not for the fact, when “random numbers” are used in practice, we generally require only a few of the properties of randomness, and all others are immaterial.” [50] We believe that the above quotations do not declare randomness tests irrelevant. The point is that the requirements of a specific simulation task, to a large extent, determine whether a particular PRNG can be considered suitable.
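The known-answer approach described above can be sketched with the classic estimate of π from random points in the unit square. This is an illustrative sketch, not one of the tests used by Coddington. The function names, sample size and seed are our own assumptions, and the generator is Python's standard `random` module, which is documented as using the Mersenne Twister.

```python
import math
import random

def estimate_pi(n_points, seed):
    """Estimate pi as 4 * (fraction of random points inside the quarter circle)."""
    rng = random.Random(seed)  # seeded, so the run is repeatable
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_points

def standard_error(n_points):
    """Theoretical standard error of the estimator for an ideal generator."""
    p = math.pi / 4  # probability that a point falls inside the quarter circle
    return 4.0 * math.sqrt(p * (1 - p) / n_points)

n = 200_000
estimate = estimate_pi(n, seed=12345)
z = (estimate - math.pi) / standard_error(n)
print(f"estimate = {estimate:.5f}, z-score = {z:+.2f}")
```

A z-score far outside the range of roughly ±3 would be a warning sign for the combination of generator and algorithm. As noted above, a small z-score says nothing about how the same generator behaves in a different application.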
Wrapping up this quick overview of the general status of PRNG development, we should note that despite extensive research in the field of random number generation and serious successes over the last two to three decades, an ideal PRNG is still a matter for the future. There is no theoretical method to determine the suitability of a PRNG for a specific application. In each case, a combination of PRNG (or several generator options), MC algorithm and computer hardware/software should be tested and matched for an efficient result [5, p. 33; 92]. In any case, as a starting point for testing any combination of PRNG and computer environment, it is much better to use a validated PRNG (especially if one is available) than a generator with undocumented qualities.
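The correlation that discredited RANDU can be reproduced in a few lines. The sketch below assumes the textbook definition of RANDU, x(k+1) = 65539 · x(k) mod 2^31 with an odd seed, which is not spelled out in this article. Because 65539^2 ≡ 6 · 65539 − 9 (mod 2^31), every output is an exact linear combination of the two preceding ones, so consecutive triples fall on a small number of planes.

```python
M = 2 ** 31
A = 65539  # 2**16 + 3

def randu(seed, count):
    """Return `count` successive RANDU states, starting from an odd seed."""
    values = []
    x = seed
    for _ in range(count):
        x = (A * x) % M
        values.append(x)
    return values

xs = randu(seed=1, count=1000)

# Count the triples that break x[k+2] = 6*x[k+1] - 9*x[k] (mod 2**31).
violations = sum(
    1
    for k in range(len(xs) - 2)
    if xs[k + 2] != (6 * xs[k + 1] - 9 * xs[k]) % M
)
print("triples violating the linear relation:", violations)  # prints 0
```

No triple violates the relation, which is the kind of structural defect that a simple one-dimensional uniformity test would not reveal.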

Excel 2010 Random Number Generators. Table 1 combines information on several types of Excel’s PRNGs. All generators in versions up to and including Excel 2007 were known to be inadequate because of short period lengths and unclear algorithms [52, 77, 87]. There are three PRNGs in Excel 2010 [87, 89]:
- The generator called by the RAND function.
- The generator in the Analysis ToolPak add-in.
- The generator called by the VBA RND function.

Melard shows that the PRNGs of the Analysis ToolPak and VBA have not been changed from previous versions [87]. So these generators are still inadequate for simulations and should not be used.

Microsoft indicates that the RAND generator has been improved for Excel 2010 [88]. Melard tested the new PRNG with a modified (simplified) TestU01 [52] and found that it has satisfactory statistical qualities [87]. Some authors state that the new RAND PRNG employs the Mersenne Twister algorithm [84, 87]. This statement is based on a reference to the Microsoft blog [88]. When we accessed this page, there were no specifics regarding the type of the new PRNG. The Microsoft knowledge article on the RAND function still (as of March 2015) indicates that the Excel 2010 RAND function is based on the Wichmann-Hill algorithm [90]. We agree with Levy that, despite the acknowledged improvement in the quality of the RAND PRNG, there is much uncertainty about the RAND function: its algorithm, its implementation and its period length [89].
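For orientation, the Wichmann-Hill algorithm named in the Microsoft knowledge article [90] can be sketched in its published textbook form (Algorithm AS 183), which combines three small linear congruential generators. This is an assumption-laden illustration: the class name and seeds are ours, and nothing here is claimed to match Excel's actual implementation, which, as noted above, is undocumented.

```python
class WichmannHill:
    """Textbook Wichmann-Hill (AS 183) generator; NOT Excel's actual code."""

    def __init__(self, s1=1, s2=1, s3=1):
        # Seeds are conventionally integers between 1 and 30000.
        self.s1, self.s2, self.s3 = s1, s2, s3

    def random(self):
        # Advance the three component generators.
        self.s1 = (171 * self.s1) % 30269
        self.s2 = (172 * self.s2) % 30307
        self.s3 = (170 * self.s3) % 30323
        # Combine them: the fractional part of the sum is the output in [0, 1).
        return (self.s1 / 30269 + self.s2 / 30307 + self.s3 / 30323) % 1.0

gen = WichmannHill(1, 2, 3)  # arbitrary illustrative seeds
sample = [gen.random() for _ in range(5)]
print(sample)
```

The period of this textbook form is roughly 7 × 10^12, many orders of magnitude below the minimum periods quoted in the overview above. That gap is one reason why the question of which algorithm RAND really uses matters for large simulations.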

2.2. Excel 2010 accuracy of the statistical distributions

Table 1 combines results of testing of statistical distributions for several generations of Excel. Inaccuracies of statistical distributions in Excel have been well documented for versions up to and including Excel 2007. Many commonly used distributions (e.g. Poisson, Beta, Gamma, Binomial, etc.) were considered inadequate [74, 80, 84].

Excel 2010 marks a step in a long-awaited direction. Two papers reporting test results for the Excel 2010 statistical distributions were published by authors known for testing and critically discussing Excel’s prior versions: Keeling and Pavur [84] and Knusel [86]. Both articles confirm serious improvements and the elimination of many previously identified errors. Some distributions were found completely error-free in both articles, e.g. Poisson, Binomial, Hypergeometric. Several distributions were found inaccurate for extremely small probability values, e.g. Gamma, inverse normal [86].

Some results need to be verified with respect to the testing conditions and parameters. For the inverse t-distribution, article [84] reports perfect accuracy (the same as the R statistical package), but article [86] reports that the results of this function can be completely wrong. Also, according to [84], all tested distributions are very accurate except inverse Chi-square, but no errors in this distribution were found in [86]. The most recent contribution, by Melard, confirms improvements (compared to prior Excel versions) in most distributions [87].
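One simple form of due diligence for a distribution function and its inverse is a round-trip check: applying the inverse to the cumulative probability of x should return x. The sketch below illustrates the idea on the standard normal distribution using Python's `statistics.NormalDist`. It is not the methodology of [84] or [86], and the helper name and the test grid are our own assumptions.

```python
from statistics import NormalDist

def max_round_trip_error(dist, points):
    """Largest |inv_cdf(cdf(x)) - x| over the given points."""
    return max(abs(dist.inv_cdf(dist.cdf(x)) - x) for x in points)

standard_normal = NormalDist(mu=0.0, sigma=1.0)
grid = [i / 10 for i in range(-30, 31)]  # x from -3.0 to 3.0 in steps of 0.1
error = max_round_trip_error(standard_normal, grid)
print(f"max round-trip error: {error:.2e}")
```

The same check could be applied to a pair of spreadsheet functions by exporting their values. Extending the grid far into the tails probes the small-probability region where problems were reported.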

2.3. Excel 2010 univariate summary statistics

Univariate summary statistics include the mean, the sample standard deviation, the correlation coefficient, mode, median, maximum and minimum. Excel commands for computing these quantities are: ‘Average’, ‘Stdev’, ‘Pearson’, ‘Median’, ‘Mode’, ‘Max’, ‘Min’. Arguably, these functions are used in any MC simulation study. Tests show that in Excel 2010 basic descriptive statistics can be used with confidence. Means are accurate to 15 significant digits [81, 84].

2.4. Excel 2010 regression functions

Calculation of linear regression has been deemed acceptable since the Excel 2007 version [83, 91]. Excel 2010 also demonstrates acceptable performance, although its linear regression accuracy on some data sets is inferior to that of the previous version [84]. Non-linear regression in Excel 2007 was very weak [79, 83]. No changes were made in Excel 2010 [87].

2.5. Excel 2013

Excel 2013 has 104 functions in the statistical category [98], including six new functions (BINOM.DIST.RANGE, GAMMA, GAUSS, PERMUTATIONA, PHI, SKEW.P [99]). We could not find any third-party testing of this version, except an article on Excel 2013 in the cloud [93] (see below). So there is no other third-party information on the quality of the Excel 2013 statistical functions.

One of the Excel services offered by Microsoft is Office Online (formerly Office Web Apps). The most current version of the application is offered through the online service. McCullough and Yalta examined this Excel service [93]. Various Excel Web App functions in the cloud demonstrated mixed levels of accuracy. However, accuracy seems not to be the major consideration in this case. The testing revealed a portability issue: the same spreadsheet opened in the cloud and on the desktop may give different results [93]. An even bigger concern is that the cloud option leaves researchers with little or no information about the hardware and software used by the service provider, not to mention that elements of the infrastructure could be changed (with the best intention of improving services) at any time without notice [93].
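To illustrate why the accuracy of even basic summary statistics (Section 2.3) cannot be taken for granted, the sketch below compares two algebraically equivalent variance formulas in double-precision arithmetic. It is a general floating-point illustration with made-up data and hypothetical function names. It is not a reconstruction of the code used by any Excel version.

```python
def variance_one_pass(data):
    """'Calculator' formula: (sum of squares - n * mean^2) / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    total_sq = sum(x * x for x in data)
    return (total_sq - n * mean * mean) / (n - 1)

def variance_two_pass(data):
    """Compute the mean first, then sum the squared deviations from it."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

# Three values with a large mean and a tiny spread; the exact variance is 1.
data = [1_000_000_001.0, 1_000_000_002.0, 1_000_000_003.0]
print("one-pass:", variance_one_pass(data))
print("two-pass:", variance_two_pass(data))
```

The one-pass formula subtracts two numbers near 3 × 10^18 that agree in almost all of their digits, so rounding error swamps the true difference. The two-pass formula works with small deviations and returns the exact value for this data set.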

Table 1: Suitability of Excel functions by version

| Function | Excel 97 | Excel 2003 | Excel 2007 | Excel 2010 |
|---|---|---|---|---|
| Random number generator | | | | |
| RAND | Inadequate [72] | Inadequate [73, 77, 78] | Inadequate [77, 78] | Improved [87]. Undocumented [89]. |
| RND | | | Inadequate | Unchanged [87] |
| Analysis Toolpak (ATP) | | Inadequate [73, 77, 78] | Inadequate [79] | Unchanged [87] |
| Statistical distributions | Inadequate [72] | | | |
| Standard normal | Erroneous [73] | Fixed for function call NORMSINV(RAND). Not fixed for analysis toolpak (ATP) [73] | | No errors found for Norm.S.Dist and Norm.Dist [86]. Improved [87]. |
| Inverse standard normal | Weak [73] | Fixed. Exact [73] | Inadequate [84] | Exact result [84]. Functions Norm.Inv and Norm.S.Inv can give errors for small values of probabilities [86]. |
| Poisson | Bugs [74] | Bugs [74] | Inadequate [80, 91] | Performs perfectly along with R and SAS [84]. No errors found for Poisson.Dist when computing point probabilities [86]. Improved [87]. |
| Binomial | Bugs [74] | Bugs [74] | Inadequate [80, 91] | 100% correct [84]. No errors found for Binomial.Dist [86]. Improved [87]. |
| Inverse binomial | | | | No errors found for Binom.Inv [86]. |
| Gamma | Bugs [74] | Questionable [74] | Inadequate [84]. Questionable [91] | Acceptable [84]. Gamma.Dist can give errors for small values of probabilities [86]. Improved [87]. |
| Inverse Gamma | | | | Gamma.Inv can give errors for large values of the parameter a. Gamma.Inv and Gamma.Dist are not consistent [86]. |
| Beta | | | Inadequate [84] | Acceptable [84]. No errors found in Beta.Dist for density function. Left tail probabilities can be chaotic [86]. Improved [87]. |
| Inverse Beta | Bugs [74] | Questionable [74] | Inadequate [80]. Unreliable [91] | Beta.Inv can give errors for certain parameters [86]. |
| Student’s t | | | | For T.Dist, T.Dist.2T, T.Dist.RT result is not always correct [86]. Improved [87]. |
| Inverse student’s t | | | Inadequate [80, 84, 91] | Acceptable [84]. For T.Inv and T.Inv.2T results can be completely wrong [86]. |
| F | | | | F.Dist and F.Dist.RT: no errors found for density function, even for small probabilities [86]. Improved [87]. |
| Inverse F | | | Inadequate [80, 84, 91] | Acceptable [84]. No errors found with F.Inv and F.Inv.RT [86]. |
| Chi-square | | | | Chisq.Dist, Chisq.Dist.Rt. No errors found for density function. Results can be wrong if probabilities are small [86]. Improved [87]. |
| Inverse Chi-square | | | Inadequate [84]. Unreliable [91] | Questionable. One of 15 distributions miscalculated [84]. No errors found in Chisq.Inv. Chisq.Inv.RT and Chisq.Dist.RT are not consistent [86]. |
| Hypergeometric | | | Inadequate [84] | 100% correct [84]. No errors found for Hypgeom.Dist [86]. Improved [87]. |
| Univariate summary statistics (mean, the sample standard deviation, the correlation coefficient, mode, median, maximum, minimum). Excel commands for computing these quantities are: ‘Average’, ‘Stdev’, ‘Pearson’, ‘Median’, ‘Mode’, ‘Max’, ‘Min’. | Inadequate [72] | Acceptable [73]. Standard deviation problems corrected [75] | Basic descriptive statistics can be used with confidence [81, 84] | Acceptable. Means accurate to 15 significant digits [84]. |
| Regression | | | | |
| Linear regression | | Acceptable [73]. Improved. Performance on computing R-squares and beta coefficients is similar to that of SAS [75] | Acceptable. Performs better than previous version [83, 91]. Except no warning of detecting perfectly collinear data [79] | Acceptable. On some data sets inferior to previous version [84]. |
| Nonlinear regression | | Unacceptable [73, 75]. Accuracy is far below major statistical packages [75] | No changes. Inadequate [79]. Results are better than in other studies [83]. | No changes [87] |
| Other Functions | | | | |
| Exponential smoothing | | | Inadequate [79] | |
| LOGEST | | | Inadequate [79] | |
| GROWTH | | | Inadequate [79] | |
| Trendline (ATP) | | Inadequate [85] | Inadequate [79, 85] | |

2.6. MC simulation sources of error

By its very nature, the Monte Carlo method deals with uncertainties and errors. Despite their variety, these uncertainties and errors can be categorized into several groups according to the phases of modelling and simulation (described in the next subsection) [55, 56, 66]. It should be noted that the numerical error values mentioned below apply only to the specific simulations they represent and may not generalize to other situations. They are given to demonstrate the potential levels of error from different sources.

Table 2: Sources of error in MC simulation

Sources of Error

Simulation project conception
Errors due to unclear definition of simulation project objectives, in-scope system parameters, assessment criteria and required accuracies.
Errors due to linguistic uncertainty, e.g. vagueness, ambiguity [61].

Input data analysis
Errors due to imperfections of input data, e.g. errors, inaccuracies of collected data characterizing modelled systems or processes [55]. Robinson presented a simulation case of a simple queue in a bank [55]. The simulation showed that underestimation or overestimation of the service time by 10% led to a significant underestimation of 30% or overestimation of 60%.
Errors due to inadequate sample size of collected data characterizing modelled systems or processes [55].
Errors due to imperfect descriptions of the input data with selected or fitted probability distributions.
Errors due to incorrect assumptions about input data in situations where real systems or processes are not available for empirical data collection.
Errors due to incorrect modelling of randomness, e.g. selecting probability distributions that incorrectly represent the randomness of the real system or process [56]. Law and McComas presented a simple simulation case of a single machine tool system with exponential interarrival times [56]. They showed that incorrect selection of normal or lognormal input distributions (instead of the correct distribution, which is Weibull in this case) led to large output errors in the estimated average delays of 39% and 65%, respectively.
Errors due to incorrect selection of the parameters of probability distributions or replacing distributions by their means [56].

Conceptual modelling
Errors of conceptual modelling arising from incomplete knowledge and understanding of the modelled system or from simplification of the real system behaviour, e.g. ignoring correlations and covariance in input distributions [55, 58].
Errors due to over-complication of the conceptual model with multiple estimated parameters (parametric uncertainty) [59, 60].
Errors of mathematical modelling due to approximations and simplifications [66].

Converting conceptual to computer model
Errors of converting a conceptual model into a computer model due to misinterpretation of the model description or incorrect application of mathematical theory [5, 55, 62].
Systematic errors due to programming errors.
Errors due to misuse of simulation software by underqualified users [56], e.g. using default settings which are unacceptable for the specific problem.
Errors due to typing errors [62].

Experimentation
Errors due to ignoring the model’s initial transient period or, more generally, incorrect initial conditions [55]. In the simulation of a simple queue in a bank, presented by Robinson, the output error from ignoring the initial transient period was 4.57% [55].
Errors due to an insufficient number of replications (simulation runs) [5, 55, 65].
Errors due to insufficient searching of the solution space [55], e.g. system parameters and influencing factors are changed in limited ranges and do not reveal the full behaviour of the model.

MC software incurred errors (part of experimentation)
Errors due to low quality of the PRNG.
Errors due to imperfect transformation of the PRNG output to other probability distributions.
Errors due to numerical approximations [58].
Errors due to limited resolution in space and time (discretization) [58].
Errors due to bugs in software [58].
Round-off errors induced by the simulation computer hardware and operating system due to limited word length [5, 67, 71].
Truncation errors induced by the simulation computer hardware and operating system due to limited word length [5, 67, 71].
Errors due to algorithmic inefficiencies, e.g. using Newton’s method for nonlinear equations [67, 68].

Simulation output data analysis
Errors due to incorrect estimation of the output probability distributions of variables of interest and their parameters (e.g. mean and variance) [64].
Errors induced by visualization of results [66].

Each MC simulation research effort is unique and, depending on multiple factors (such as the purpose of the simulation, the size and complexity of the modelled system or process, the application area, etc.), will experience different types and levels of uncertainties and errors. A complex combination of these errors and uncertainties will propagate through the model and affect the accuracy of the estimated output parameters.
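One of the experimentation errors in Table 2, an insufficient number of replications, can be illustrated with a short sketch. The example is written in Python rather than VBA, and the "model" is a hypothetical toy (a single uniform draw per replication, true mean 0.5); the function name and seed are assumptions made for illustration. It only shows that the standard error of an MC estimate shrinks roughly in proportion to 1/sqrt(n).

```python
import math
import random
import statistics


def mc_estimate(n_replications, seed=12345):
    """Estimate the mean of a toy model output and its standard error.

    The model is hypothetical: one uniform(0, 1) draw per replication,
    so the true mean is 0.5.
    """
    rng = random.Random(seed)  # seeded, so the run is reproducible
    outputs = [rng.random() for _ in range(n_replications)]
    mean = statistics.fmean(outputs)
    std_error = statistics.stdev(outputs) / math.sqrt(n_replications)
    return mean, std_error


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        mean, se = mc_estimate(n)
        # 1.96 * se approximates the 95% confidence half-width
        print(f"n={n:>7,}  mean={mean:.5f}  95% half-width={1.96 * se:.5f}")
```

Increasing the number of replications a hundredfold narrows the confidence interval by about a factor of ten, which is why the acceptable output accuracy should be fixed before the number of runs is chosen.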

2.7. MC Verification and Validation

Some of the errors and uncertainties outlined above can be decreased or completely eliminated (e.g. programming bugs). Others are inherent and irreducible. Verification and validation processes are part of the overall simulation model development effort; they are undertaken to assure that the models are built according to the user objectives, correctly represent the real-world systems or processes of the application domain, and produce credible and reliable results. For the purpose of this paper, we are interested in those aspects of verification and validation that relate to computerized simulation tools and programming:

“Operational validation is defined as determining that the model’s output behaviour has a satisfactory range of accuracy for the model’s intended purpose over the domain of the model’s intended applicability.” [63]

“Computerized model verification is defined as assuring that the computer programming and implementation of the conceptual model are correct.” [63]

It is important to note that MC simulation validation does not aim at maximum accuracy. The required accuracy is determined by the purpose of the simulation, i.e. by the objectives of the decisions to be made based on the simulation results. Intuitively, different simulation scenarios require different model accuracy. For example, a simulation that reproduces in detail the history of the stochastic movement of a high-energy particle, including its trajectory and interactions with surrounding particles, will most likely require higher accuracy than a model of a waiting line for a bank teller. A computerized tool or program that is unsuitable for the physics scenario because of insufficient accuracy may perfectly fit the needs of the bank’s operational evaluation.

A crucial point here is that a discussion of the accuracy of a simulation tool (and hence its suitability for MC simulation) cannot be meaningful without relation to a specific situation (task, purpose, users, decisions to be made), which determines the requirements to the acceptable levels of the simulation output accuracies.

3. Numerical Experiment

The purpose of the empirical experiment was to perform small-scale comparative testing of several Excel PRNGs. The numerical experiment involved two computers, whose characteristics are shown in Table 3. Computer 1 (C1) had Windows 7 and Excel 2010, and computer 2 (C2) had Windows 8.1 and Excel 2013. Both Excel applications were 32-bit installations. The word length of the Excel application therefore determined the operation of the computers, despite the fact that the hardware of the second computer was 64-bit.

Table 3: Computers used in numerical experiments

                        Computer 1               Computer 2
Operating System        Windows 7 Professional   Windows 8.1
Service Pack            Service Pack 1
Word Length, bit        32                       64
Processor               Intel, Core i7-3520M     Intel, Core i5-3210M
CPU Speed, GHz          2.90                     2.50
Memory Installed, GB    4.0                      6.0
Memory Usable, GB       2.96                     5.87

On both computers, three PRNGs were used: the two Excel built-in functions RND and RAND, and the Mersenne Twister PRNG. Because RAND is not directly available in VBA, it was called through Evaluate("=Rand()"). MTwister is a translation by Jerry Wang of the C program Mersenne Twister MT19937 into an Excel VBA module, "MersenneTwisterVBAModule" (as of October 2014). This module is freely available at the website of the Mersenne Twister algorithm creators, Nishimura and Matsumoto, at the University of Hiroshima [100], and is compatible with Excel 2010 and Excel 2013.
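The practical benefit of an explicitly seeded Mersenne Twister can be sketched outside Excel. The example below uses Python, whose standard `random` module is based on MT19937; it is not the VBA module used in this study. The helper name `draw` and the seed values are assumptions for illustration, and because Python's seeding routine differs from the reference C initialization, the numbers it produces need not match those of "MersenneTwisterVBAModule".

```python
import random


def draw(seed, count):
    """Return `count` uniform [0, 1) numbers from a seeded Mersenne Twister."""
    rng = random.Random(seed)  # CPython's random module is based on MT19937
    return [rng.random() for _ in range(count)]


if __name__ == "__main__":
    first = draw(5489, 5)
    second = draw(5489, 5)
    print(first == second)      # same seed, so the sequence repeats exactly
    print(draw(1, 5) == first)  # a different seed gives a different stream
```

Recording the seed alongside the results is what allows a simulation run to be repeated later.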

Visual testing. Figs 1 through 3 show the visual distribution of 1,000 random numbers generated by the RND, RAND and MTwister PRNGs, respectively. By inspection, none of the generators reveals any irregularities or repeating patterns.

Fig 1: Visual representation of 1,000 numbers generated by RND PRNG (C1 Excel 2010)

Fig 2: Visual representation of 1,000 numbers generated by RAND PRNG (C1 Excel 2010)

Fig 3: Visual representation of 1,000 numbers generated by MTwister PRNG (C1 Excel 2010)

The column charts in Fig 4 and Fig 5 show the distribution of random numbers generated by RND and MTwister, respectively. These charts allow an approximate comparison of the PRNGs’ uniformity. MTwister demonstrates better uniformity. Figures 4 and 1, and Figures 5 and 3, are based on the same datasets.

Fig 4: Chart for RND PRNG

Fig 5: Chart for MTwister PRNG

Chi-square Test. Ten thousand random numbers were used from MTwister PRNG (from 1,000,001st to 1,010,000th with a default seed 5489). The Chi-square goodness-of-fit uniformity test followed a procedure described in [101]. Calculations and results are shown in Fig. 6.

Fig 6: Spreadsheet calculations and results of the Chi-square test of MTwister PRNG

The results indicate that the MTwister PRNG implemented in VBA generates uniformly distributed data. The test statistic equals 8.7, which is well below the critical value of 16.9, so the hypothesis of uniformity can be accepted at the 95% confidence level.
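A chi-square uniformity test of this kind can be sketched in a few lines. The version below is in Python, not the spreadsheet of Fig 6, and makes two assumptions: ten equal-width bins (inferred from the critical value 16.9, which corresponds to 9 degrees of freedom at the 5% level) and Python's own MT19937-based generator, whose stream differs from that of the VBA module. It will therefore not reproduce the value 8.7; the function names are hypothetical.

```python
import random


def chi_square_uniform(numbers, bins=10):
    """Chi-square goodness-of-fit statistic against a uniform [0, 1) law."""
    counts = [0] * bins
    for x in numbers:
        counts[min(int(x * bins), bins - 1)] += 1  # clamp guards x == 1.0
    expected = len(numbers) / bins
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    return statistic, counts


def sample_after_skip(seed, skip, count):
    """Discard `skip` numbers, then return the next `count` numbers."""
    rng = random.Random(seed)
    for _ in range(skip):
        rng.random()
    return [rng.random() for _ in range(count)]


if __name__ == "__main__":
    CRITICAL_95_DF9 = 16.919  # chi-square critical value, 9 d.f., alpha = 0.05
    data = sample_after_skip(seed=5489, skip=1_000_000, count=10_000)
    stat, counts = chi_square_uniform(data)
    verdict = "not rejected" if stat < CRITICAL_95_DF9 else "rejected"
    print(f"chi-square = {stat:.2f}; uniformity {verdict} at the 5% level")
```

A statistic below the critical value means only that the sample gives no evidence against uniformity; a single test of this kind does not replace a full test battery.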

Computational speed testing. Each speed test involved recording the time needed to generate 1,000,000 random numbers and write them to a spreadsheet. The results are presented in Table 4.

Table 4: Computational speed testing (time in seconds)

           RND    RAND   MTwister
C1         4      20     10
C2         25     46     12
Average    14.5   33     11

Computational speeds of the RND and MTwister are approximately at the same level. Slower speed of the RAND generation can be explained by the fact that it was called through the EVALUATE function.
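The structure of such a speed test can be sketched as follows. The example is in Python and writes to an in-memory text buffer that stands in for the worksheet, so its timings are not comparable with the VBA figures in Table 4; the function name, the seed and the buffer are assumptions for illustration only.

```python
import io
import random
import time


def time_generation(count, seed=5489):
    """Time generating `count` numbers and writing them out, one per row."""
    rng = random.Random(seed)
    sink = io.StringIO()  # in-memory stand-in for the worksheet
    start = time.perf_counter()
    for _ in range(count):
        sink.write(f"{rng.random()!r}\n")
    elapsed = time.perf_counter() - start
    rows = sink.getvalue().count("\n")
    return elapsed, rows


if __name__ == "__main__":
    elapsed, rows = time_generation(1_000_000)
    print(f"{rows:,} numbers generated and written in {elapsed:.2f} s")
```

Timing generation and output together, as here, reflects how a simulation actually uses the generator, but it also means that the output step can dominate the measured time.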

4. Discussion

MC is used in many industrial and knowledge fields to solve a wide variety of problems. Real-world problems may require a specific mathematical apparatus to model them and, hence, numerous types of computer functions to support simulation of the models. The functionality of MC simulation software can be subdivided into two categories. The first, which we call Core, includes data input and results output facilities, a programming environment, a PRNG and functions for calculating summary statistics. All of the core components are arguably used in any simulation project and form the minimum scope of MC software functionality. For many applications, these components may be the only ones necessary and sufficient to complete the project. The second category, Incidental, includes statistical distributions, linear and non-linear regression and other statistical, engineering and business functions. Commonly, only some of the incidental components are required for a given project. Incidental components cannot form a complete simulation project by themselves: they are used in addition to the core components. Core and incidental components of the MC simulation tools are shown on the left side of Figure 7. A component can represent either one function (e.g. the PRNG) or a group of many functions with a similar purpose or application (e.g. the group of statistical functions includes over 100 individual Excel functions).

Fig 7: Components of the MC simulation tools and suitability of Excel functions

The right two columns of Figure 7 show how Excel 2010 and Excel 2013, respectively, qualify in each of the MC simulation components. At a high level, suitability can be expressed by four categories. The first is Use with confidence: the component has been tested and displayed positive results. If the component contains more than one function, all of its functions were tested and proved to be good. The second is Do not use: test results are available, but they were negative. This category can also include a component with several functions, none of which should be used. The third is Test before use. It applies to components that either were not tested in third-party reviews or produced mixed results. These components require careful assessment and testing before they can be included in MC models. This category is arguably the most numerous: only a relatively small number of Excel functions (compared to the total number available) have received third-party testing. Finally, the fourth category represents components with multiple functions with mixed levels of usability.

The qualifying characteristics of Excel 2010 and Excel 2013 shown in the two right columns of Figure 7 are based on the published results of Excel function testing (combined in Table 1) and on the analysis and numerical experiment of this study. For both Excel 2010 and Excel 2013, most core components display good results, i.e. input and output with spreadsheets, the VBA development environment and summary statistics can be used with confidence in MC simulations.

The only exception among the core components is the Excel PRNG, which needs more comment. As demonstrated in Sections 2 and 3, Excel built-in PRNGs have serious deficiencies. The VBA generator RND has been known to have poor statistical qualities. The RAND PRNG, although updated in Excel 2010, has not been documented, so its period length is unclear. The periodicity requirement is determined by the number of calls that will be made during one simulation. The weaker requirement is that the period length cannot be less than the number of calls to the PRNG in one simulation, to avoid repetition of the same random numbers. The stronger requirement should also take into account the risk of long-range correlations in number sequences. To alleviate this risk, it is recommended to use only a small part of the period: usually no more than 10% of the numbers in the period [50], or even that the period of a PRNG should be at least 200 times greater than the square of the number of pseudo-random numbers needed, i.e. no more than SQRT(length/200) numbers should be drawn [89, 103]. In some situations, the requirement may be more stringent. For example, L’Ecuyer and Simard have shown that for two-dimensional uniformity of pseudo-random numbers generated by a PRNG, one should not use sample sizes larger than approximately the cube root of the period length [57]. Not knowing the exact period length of the RAND PRNG leaves researchers uncertain about the suitability of the software tool. Also, RAND is not a built-in VBA function, and calling it through the EVALUATE function slows down simulations. Another serious issue with Excel built-in PRNGs, which is not commonly mentioned, is that they use the current machine time as a seed number.
Researchers cannot exercise seed control, which means that simulation results become irreproducible [77, 89]. Reproducibility of research is an emerging ethical issue, a matter of valid testing and a requirement of contemporary science. In general, reproducibility is a very broad notion that includes documenting and archiving input and output data, describing the model and code used, etc. [102]. For the MC simulation field, the inability to set and document a PRNG seed number makes any other efforts aimed at reproducibility of research futile. So the current PRNGs add an ethical problem to the acknowledged technical issues of Excel. Aside from purely ethical implications, lack of seed control may also have tangible consequences if research results become a matter of litigation [89] or of an audit under the Sarbanes-Oxley Act.

The overall recommendation is not to use built-in Excel PRNGs for MC simulation. Fortunately, this restriction can be easily overcome with the use of an external PRNG implemented as a VBA module. In this study, we used the Mersenne Twister MT19937 PRNG as an Excel VBA module, "MersenneTwisterVBAModule", available at the Mersenne Twister home page at the University of Hiroshima [100]. As a Mersenne Twister type generator, it has a period length of 2^19937-1, which can satisfy most random-number-intensive applications. This PRNG demonstrated seed control, easy integration with a VBA program and random number generation speeds comparable with those of the RND built-in function. We believe that this PRNG can be used in MC simulation to replace Excel built-in PRNGs and eliminate their problems. The same website provides another VBA version of the Mersenne Twister PRNG [100]. It should be noted that the literature search did not retrieve any academic papers with extensive tests of the VBA PRNGs, so in recommending these PRNGs we rely on the reputation of the Mersenne Twister algorithm creators, who offer these programs on their website. We are not questioning the skills of the people who translated the Mersenne Twister programs from C into VBA; however, there could be subtle differences between the two languages that incur a risk of “something being lost in translation”. Thorough tests of the randomness qualities of the VBA PRNGs using the TestU01 [52] or Diehard tests [53] are highly desirable.

To complete the discussion of the core components, we reiterate that input and output data with spreadsheets, the VBA development environment and summary statistics can be used with confidence in MC simulations. Complemented with an external PRNG added as a VBA module (e.g. "MersenneTwisterVBAModule" or similar), the core components of Excel 2010 and Excel 2013 can serve as a solid simulation framework. Based on our definition of the core components, it is clear that this framework can be used to implement a broad variety of MC simulation projects. In fact, any other elements of the simulation model (if necessary) could simply be programmed in VBA. Whether this approach is feasible or makes sense from a workload point of view depends on the specific project.

Incidental components represent several groups with multiple and disparate functions (even within a single group) (see Fig 7).
Linear regression and non-linear regression are easier to qualify: linear regression can be used without concerns and non-linear regression should not be used (see Table 1). The suitability of the functions included in the statistical distributions component varies from those which were tested as completely error-free and can be used with confidence (e.g. Normal, Poisson, Binomial distributions) to those with mixed test results (e.g. inverse t-distribution). The same may be said of the other statistical, engineering or business functions. When considering functions from incidental components, due diligence should be exercised by analysing the results of third-party reviews (see Table 1) or conducting additional tests of candidate functions. Special caution should be taken with Excel 2013, as there is a lack of available reviews (at least at present). Newly added functions should be tested before they can be recommended for use in simulation models. Even improved functions (which tested positive in Excel 2010) should be re-tested, as the vendor has some history of making new mistakes while correcting previous ones.

A certain miscommunication can be found in the literature. Many papers analyze statistical software and the requirements for this type of solution. Because MC simulation is based on statistical methods, there is a temptation to generalize and directly apply all requirements for statistical software to MC simulation software. This may not always be a correct approach. A conclusion of this study is that “Excel 2010 is a strong MC simulation tool…” with certain conditions and limitations. It may be perceived as incompatible with the widely publicized notion that “Excel should not be used as a statistical software package…”; after all, we are talking about the same software application with the same set of statistical capabilities. However, there is no contradiction. The conclusions reflect business needs, which translate into varying assessment expectations, requirements and assumptions. First, when Excel is assessed as a statistical software package, the expectation is that the accuracy of the results must be perfect (usually, exact to 15 significant digits). Because this is not always achieved, the tool is deemed inappropriate. When Excel is assessed as an MC simulation tool, the accuracy must be good enough to satisfy the objectives of the decisions to be made based on the simulation results. The accuracy requirements may vary from very high to rather relaxed, depending on the needs of a specific simulation project, and Excel may or may not meet them. Second, when Excel is assessed as a statistical software package, another expectation is that all statistical functions must work perfectly and that the range of functions must match the variety of functions available in other statistical software. Because this is not always achieved, the tool is considered inappropriate.
When Excel is assessed as an MC simulation tool, the expectation is that the core components must provide a reliable simulation framework, while the use of the incidental components has to be determined through the validation and verification process. If certain incidental functions are not available or do not perform well enough, the VBA development environment may be used to program the missing components. By analogy, evaluators of statistical software packages are working on competitive Formula 1 race cars, and their requirements may not always apply to the selection of a family car. Also, it should be noted that Excel 2010 and Excel 2013 have accumulated many improvements (maybe not as quickly and thoroughly as users would need or expect), and certain critical statements regarding the statistical capabilities of this application, which can still be found in the literature, no longer apply.
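The period-length rules of thumb quoted in the discussion of RAND above (use at most 10% of the period; period at least 200 times the square of the sample size; sample size at most about the cube root of the period) can be turned into a small calculation. The sketch below is in Python rather than VBA; the function names are hypothetical, and the period 2**24 is used purely as an illustrative small period, not as a documented property of any Excel generator.

```python
import math


def icbrt(n):
    """Integer cube root: the largest r with r**3 <= n."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def max_draws(period):
    """Largest sample sizes allowed by three period-length rules of thumb."""
    return {
        "ten_percent": period // 10,             # use at most 10% of the period
        "sqrt_rule": math.isqrt(period // 200),  # period >= 200 * n**2
        "cube_root": icbrt(period),              # n <= period ** (1/3)
    }


if __name__ == "__main__":
    print(max_draws(2 ** 24))  # hypothetical small period, for illustration
    mt = max_draws(2 ** 19937 - 1)  # Mersenne Twister period from the text
    for rule, limit in mt.items():
        print(f"{rule}: roughly 2**{limit.bit_length() - 1} draws")
```

For a small period the stricter rules leave only a few hundred usable numbers, whereas for the Mersenne Twister period every rule leaves far more numbers than any practical simulation would draw.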

The importance of this work lies in offering an objective view of the recent Excel versions through the lens of MC simulation needs. The practical implication of the paper is a pragmatic approach to using Excel’s strengths and avoiding mistakes in simulation projects. We also believe that the general doubt that tends to overshadow all MC simulations using Excel will be lifted (at least from implementations that use Excel with caution and follow the best V&V practices). Future research will focus on in-depth testing of the Mersenne Twister VBA PRNGs and on accuracy assessments of the Excel 2013 statistical distributions.

5. Conclusions

1. Microsoft Excel (versions 2010 and 2013) is a strong Monte Carlo simulation application. It offers a solid framework of core simulation components, including spreadsheets for data input and output, the VBA development environment and summary statistics functions, which have been tested to provide reliable performance. Researchers should complement this framework with an external high-quality PRNG added as a VBA module (e.g. "MersenneTwisterVBAModule" or similar). Even the Excel framework of core simulation components alone can be used to implement a broad variety of MC simulation projects.

2. Excel also offers a large and diverse category of incidental simulation components that includes statistical distributions, linear and non-linear regression and other statistical, engineering and business functions. Using these components can expedite the development of simulation models. The suitability of the functions in this category varies from completely error-free to those with mixed test results or with unknown qualities. Due diligence should be exercised when considering functions from the incidental components for a project: conduct statistical tests of candidate functions and/or analyze the results of third-party reviews (e.g. see Table 1).

3. The general suitability of Excel 2010 for MC simulation, stated in the first point of the conclusions, does not imply that Excel is appropriate for any simulation task. Each simulation project is unique and should be based on the best verification and validation practices, which involve, among other steps, determining the acceptable range of accuracy of the estimated output parameters to satisfy the needs of decision makers; identifying the Excel functions that are required for the model and assessing their statistical qualities and related errors; and identifying all model errors and uncertainties and evaluating their propagation through the model and their impact on the overall accuracy (including the contribution of the Excel-induced errors). Only after comparing acceptable and expected errors and performing test runs of the model in the project-specific computer environment can a researcher make an informed decision on the suitability of Excel for a given simulation project.

References

[1] Robert, C., & Casella, G. (2013). Monte Carlo statistical methods. Springer Science & Business Media.
[2] Rubinstein, R. Y., & Kroese, D. P. (2011). Simulation and the Monte Carlo method (Vol. 707). John Wiley & Sons.
[3] Shreider, Y. A. (Ed.). (2014). The Monte Carlo Method: The Method of Statistical Trials. Elsevier.
[4] Zio, E. (2013). The Monte Carlo simulation method for system reliability and risk analysis (p. 198p). London, UK: Springer.
[5] Landau, D. P., & Binder, K. (2014). A guide to Monte Carlo simulations in statistical physics. Cambridge University Press.
[6] Boos, D. D., & Stefanski, L. A. (2013). Monte Carlo Simulation Studies. In Essential Statistical Inference (pp. 363-383). Springer New York.
[7] Ljungberg, M., Strand, S. E., & King, M. A. (Eds.). (2012). Monte Carlo calculations in nuclear medicine: Applications in diagnostic imaging. CRC Press.
[8] Eom, K. (Ed.). (2011). Simulations in Nanobiotechnology. CRC Press.
[9] Creal, D. (2012). A survey of sequential Monte Carlo methods for economics and finance. Econometric Reviews, 31(3), 245-296.
[10] Brandimarte, P. (2014). Handbook in Monte Carlo Simulation: Applications in Financial Engineering, Risk Management, and Economics. John Wiley & Sons.
[11] McLeish, D. L. (2011). Monte Carlo simulation and finance (Vol. 276). John Wiley & Sons.
[12] Jahangirian, M., Eldabi, T., Naseer, A., Stergioulas, L. K., & Young, T. (2010). Simulation in manufacturing and business: A review. European Journal of Operational Research, 203(1), 1-13.
[13] Raychaudhuri, S. (2008, December). Introduction to Monte Carlo simulation. In Simulation Conference, 2008. WSC 2008. Winter (pp. 91-100). IEEE.
[14] Krahl, D. (2012, December). ExtendSim: a history of innovation. In Proceedings of the Winter Simulation Conference (p. 430). Winter Simulation Conference.
[15] Adkins, L. C., & Gade, M. N. (2012). Monte Carlo Experiments Using Stata: A Primer with Examples. Advances in Econometrics, 30, 429-477.

[16] Pezzotta, G., Pinto, R., Pirola, F., & Cavalieri, S. (2014). Evaluation and Assessment of Two Simulation Software for Service Engineering. In Serviceology for Services (pp. 263-272). Springer Japan.
[17] Adkins, L. C. (2011). Using gretl for Monte Carlo experiments. Journal of Applied Econometrics, 26(5), 880-885. DOI: 10.1002/jae.1228.
[18] Abu-Taieh, E., & El Sheikh, A. A. R. (2007). Commercial simulation packages: a comparative study. International Journal of Simulation, 8(2), 66-76.
[19] Pezzotta, G., Pinto, R., Pirola, F., Gaiardelli, P., & Cavalieri, S. (2013). A Critical Evaluation and Comparison of Simulation Packages for Service Process Engineering. In XVIII Summer School Francesco Turco. A CHALLENGE FOR THE FUTURE: the role of industrial engineering in a global sustainable economy.
[20] Chew, G., & Walczyk, T. (2012). A Monte Carlo approach for estimating measurement uncertainty using standard spreadsheet software. Analytical and bioanalytical chemistry, Vol. 402, no. 7, pp. 2463-2469.
[21] Farrance, I., & Frenkel, R. (2014). Uncertainty in Measurement: A Review of Monte Carlo Simulation Using Microsoft Excel for the Calculation of Uncertainties Through Functional Relationships, Including Uncertainties in Empirically Derived Constants. The Clinical Biochemist Reviews, Vol. 35, no. 1, 37. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3961998/pdf/cbr-35-37.pdf
[22] Schriber, T. J. (2009, December). Simulation for the masses: Spreadsheet-based Monte Carlo simulation. In Simulation Conference (WSC), Proceedings of the 2009 Winter (pp. 1-11). IEEE.
[23] Mielczarek, B., & Zabawa, J. (2011). Integrating Monte Carlo Simulation with Other Spreadsheet-Based Modelling Methods to Teach Management Science. In Proceedings 25th European Conference on Modelling and Simulation (ECMS) (pp. 348-354). http://www.scs-europe.net/conf/ecms2011/ecms2011%20accepted%20papers/ibs_ECMS_0049.pdf
[24] Coddington, P. D. (1994). Analysis of random number generators using Monte Carlo simulation. International Journal of Modern Physics C, 5(03), 547-560.
[25] Ferrenberg, A. M., Landau, D. P., & Wong, Y. J. (1992). Monte Carlo simulations: Hidden errors from "good" random number generators. Physical Review Letters, 69(23), 3382.

[26] Vattulainen, I., Kankaala, K., Saarinen, J., & Ala-Nissila, T. (1995). A comparative study of some pseudorandom number generators. Computer Physics Communications, 86(3), 209-226.

[27] Zabawa, J., & Mielczarek, B. (2007, June). Tools of Monte Carlo simulation in inventory management problems. In Proceedings 21st European Conference on Modelling and Simulation (ECMS) (pp. 56-61). doi:10.7148/2007-0056.

[28] Lee, W. L. (2011, December). Spreadsheet based experiential learning environment for project management. In Simulation Conference (WSC), Proceedings of the 2011 Winter (pp. 3877-3887). IEEE.

[29] Briand, G., & Hill, R. C. (2013). Teaching basic econometric concepts using Monte Carlo simulations in Excel. International Review of Economics Education, 12, 60-79.

[30] Neves, S., & Araujo, F. (2014). Engineering Nonlinear Pseudorandom Number Generators. In Parallel Processing and Applied Mathematics (pp. 96-105). Springer Berlin Heidelberg.

[31] Yin, T., & Leong, W. (2008). Spreadsheet Data Resampling for Monte Carlo Simulation. Spreadsheets in Education (eJSiE), 3(1), 6.

[32] L'Ecuyer, P. (1990). Random numbers for simulation. Communications of the ACM, 33(10), 85-97.

[33] Wang, M., Guo, F., Qu, H., & Li, S. (2011, May). Combined random number generators: A review. In Communication Software and Networks (ICCSN), 2011 IEEE 3rd International Conference on (pp. 443-447). IEEE.

[34] Kroese, D. P., Taimre, T., & Botev, Z. I. (2013). Handbook of Monte Carlo Methods (Vol. 706). John Wiley & Sons.

[35] Rozycki, J. (2011). Excel-Based Monte Carlo Simulation as a Capital Budgeting Risk Management Tool. Journal of Financial Education, 101-128.

[36] Wang, Y., Cao, Z., & Au, S. K. (2010). Practical reliability analysis of slope stability by advanced Monte Carlo simulations in a spreadsheet. Canadian Geotechnical Journal, 48(1), 162-172.

[37] Barreto, H., & Howland, F. (2005). Introductory econometrics: using Monte Carlo simulation with Microsoft Excel. Cambridge University Press.

[38] Menn, P., & Holle, R. (2009). Comparing three software tools for implementing Markov models for health economic evaluations. Pharmacoeconomics, 27(9), 745-753.

[39] Kyng, T. J., & Konstandatos, O. (2014). Multivariate Monte-Carlo Simulation and Economic Valuation of Complex Financial Contracts: An Excel Based Implementation. Spreadsheets in Education (eJSiE), 7(2), 5.

[40] Wang, Y., & Cao, Z. (2014). Practical reliability analysis and design by Monte Carlo Simulation in spreadsheet. Risk and Reliability in Geotechnical Engineering, 301.

[41] Au, S. K., & Wang, Y. (2014). Engineering risk assessment with subset simulation. John Wiley & Sons.

[42] Pecherska, J., & Merkuryev, Y. (2005, June). Teaching simulation with spreadsheets. In Proceedings of the 19th European Conference on Modelling and Simulation (pp. 1-6).

[43] McCullough, B. D., & Wilson, B. (2005). On the accuracy of statistical procedures in Microsoft Excel 2003. Computational Statistics & Data Analysis, 49(4), 1244-1252.

[44] McCullough, B. D., & Heiser, D. A. (2008). On the accuracy of statistical procedures in Microsoft Excel 2007. Computational Statistics & Data Analysis, 52(10), 4570-4578.

[45] Gedam, S. G., & Beaudet, S. T. (2000). Monte Carlo simulation using Excel (r) spreadsheet for predicting reliability of a complex system. In Reliability and Maintainability Symposium, 2000. Proceedings. Annual (pp. 188-193). IEEE.

[46] Dobrican, O. (2013). Forecasting Demand for Automotive Aftermarket Inventories. Informatica Economica, 17(2), 119-129.

[47] Freeman Jr, G. L. (2014). Microsoft Excel 2010 Improved for Teaching Statistics but Caution Advised. National Social Science, 43(1), 21-32.

[48] Accardi, L., & Gaebler, M. (2011). Statistical analysis of random number generators. Quantum Bio-Informatics IV, QP–PQ: Quantum Probabilities and White Noise Analysis, 28, 117-128.

[49] Dupree, S. A., & Fraley, S. K. (2001). A Monte Carlo primer: A Practical approach to radiation transport (Vol. 1). Springer Science & Business Media.

[50] Halton, J. H. (1970). A retrospective and prospective survey of the Monte Carlo Method. SIAM Review, 12(1), 1-63.

[51] Sugiyama, S. (2008). Monte Carlo simulation/risk analysis on a spreadsheet: review of three software packages. Foresight, 9, 36-39.

[52] L'Ecuyer, P., & Simard, R. (2007). TestU01: A C library for empirical testing of random number generators. ACM Transactions on Mathematical Software (TOMS), 33(4), 22.

[53] The Marsaglia Random Number CDROM including the Diehard Battery of Tests of Randomness. http://stat.fsu.edu/pub/diehard/

[54] Šukys, J., Mishra, S., & Schwab, C. (2012). Static load balancing for multi-level Monte Carlo finite volume solvers. In Parallel Processing and Applied Mathematics (pp. 245-254). Springer Berlin Heidelberg.

[55] Robinson, S. (1999, December). Three sources of simulation inaccuracy (and how to overcome them). In Proceedings of the 31st conference on Winter simulation: Simulation---a bridge to the future-Volume 2 (pp. 1701-1708). ACM.

[56] Law, A. M., & McComas, M. G. (1986, December). Pitfalls in the simulation of manufacturing systems. In Proceedings of the 18th conference on Winter simulation (pp. 539-542). ACM.

[57] L'Ecuyer, P., & Simard, R. (2001). On the performance of birthday spacings tests with certain families of random number generators. Mathematics and Computers in Simulation, 55(1), 131-137.

[58] Refsgaard, J. C., van der Sluijs, J. P., Højberg, A. L., & Vanrolleghem, P. A. (2007). Uncertainty in the environmental modelling process–a framework and guidance. Environmental Modelling & Software, 22(11), 1543-1556.

[59] McLean, K. A., Wu, S., & McAuley, K. B. (2012). Mean-squared-error methods for selecting optimal parameter subsets for estimation. Industrial & Engineering Chemistry Research, 51(17), 6105-6115.

[60] Eghtesadi, Z., Wu, S., & McAuley, K. B. (2013). Development of a Model Selection Criterion for Accurate Model Predictions at Desired Operating Conditions. Industrial & Engineering Chemistry Research, 52(35), 12297-12308.

[61] Regan, H. M., Colyvan, M., & Burgman, M. A. (2002). A taxonomy and treatment of uncertainty for ecology and conservation biology. Ecological Applications, 12(2), 618-628.

[62] Gernaey K.V., Rosen C., Batstone D.J. and Alex J. (2006). Efficient modelling necessitates standards for model documentation and exchange. Water Science and Technology, 53(1), 277-285.

[63] Sargent, R. G. (2013). Verification and validation of simulation models. Journal of Simulation, 7(1), 12-24.

[64] Oberkampf, W. L., Trucano, T. G., & Hirsch, C. (2004). Verification, validation, and predictive capability in computational engineering and physics. Applied Mechanics Reviews, 57(5), 345-384.

[65] Koehler, E., Brown, E., & Haneuse, S. J.-P. A. (2009). On the Assessment of Monte Carlo Error in Simulation-Based Statistical Analyses. The American Statistician, 63(2), 155–162. doi:10.1198/tast.2009.0030

[66] Walter, M., Storch, M., & Wartzack, S. (2014). On uncertainties in simulations in engineering design: A statistical tolerance analysis application. Simulation, 90(5). doi: 10.1177/0037549714529834

[67] McCullough, B. D. (1998). Assessing the reliability of statistical software: Part I. The American Statistician, 52(4), 358-366.

[68] Prudhomme, S., & Oden, J. T. (2003). Computable error estimators and adaptive techniques for fluid flow problems. In Error estimation and adaptive discretization methods in computational fluid dynamics (pp. 207-268). Springer Berlin Heidelberg.

[69] McCullough, B. D. (1999). Assessing the reliability of statistical software: Part II. The American Statistician, 53(2), 149-159.

[70] McCullough, B. D. (1999). Econometric software reliability: EViews, LIMDEP, SHAZAM and TSP. Journal of Applied Econometrics, 14(2), 191-202.

[71] McCullough, B. D., & Vinod, H. D. (1999). The numerical reliability of econometric software. Journal of Economic Literature, 633-665.

[72] McCullough, B. D., & Wilson, B. (1999). On the accuracy of statistical procedures in Microsoft Excel 97. Computational Statistics & Data Analysis, 31(1), 27-37.

[73] McCullough, B. D., & Wilson, B. (2005). On the accuracy of statistical procedures in Microsoft Excel 2003. Computational Statistics & Data Analysis, 49, 1244-1252.

[74] Knüsel, L. (2005). On the accuracy of statistical distributions in Microsoft Excel 2003. Computational Statistics & Data Analysis, 48(3), 445-449.

[75] Keeling, K. B., & Pavur, R. J. (2007). A comparative study of the reliability of nine statistical software packages. Computational Statistics & Data Analysis, 51(8), 3811-3831.

[76] Simonoff, J. S. (2008). Statistical analysis using Microsoft Excel. Statistics and Data Analysis Handout, available online at http://pages.stern.nyu.edu/~jsimonof/classes/1305/pdf/excelreg.pdf.

[77] McCullough, B. D. (2008). Microsoft Excel's 'not the Wichmann–Hill' random number generators. Computational Statistics & Data Analysis, 52(10), 4587-4593.

[78] L'Ecuyer, P., & Simard, R. (2007). TestU01: A C library for empirical testing of random number generators. ACM Transactions on Mathematical Software (TOMS), 33(4), 22.

[79] McCullough, B. D., & Heiser, D. A. (2008). On the accuracy of statistical procedures in Microsoft Excel 2007. Computational Statistics & Data Analysis, 52(10), 4570-4578.

[80] Yalta, A. T. (2008). The accuracy of statistical distributions in Microsoft® Excel 2007. Computational Statistics & Data Analysis, 52(10), 4579-4586.

[81] Using Excel for Statistical Analysis. (2010). Social Science Data and Software at Stanford University. http://web.stanford.edu/group/ssds/cgi-bin/drupal/files/Guides/Using%20Excel.pdf

[82] Cryer, J. D., Should, G. G., Axes, L., & Marks, T. (2001, August). Problems with using Microsoft Excel for statistics. In Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001.

[83] Odeh, O. O., Featherstone, A. M., & Bergtold, J. S. (2010). Reliability of statistical software. American Journal of Agricultural Economics, 92(5), 1472–1489. doi: 10.1093/ajae/aaq068.

[84] Keeling, K. B., & Pavur, R. J. (2011). Statistical Accuracy of Spreadsheet Software. The American Statistician, 65(4), 265-273. DOI: 10.1198/tas.2011.09076.

[85] Hargreaves, B. R., & McWilliams, T. P. (2010). Polynomial Trendline function flaws in Microsoft Excel. Computational Statistics & Data Analysis, 54(4), 1190-1196.

[86] Knüsel, L. (2010). On the accuracy of statistical distributions in Microsoft Excel 2010. Available at: http://www.csdassn.org/software_reports/Excel2011.pdf

[87] Mélard, G. (2014). On the accuracy of statistical procedures in Microsoft Excel 2010. Computational Statistics, 29(5), 1095-1128.

[88] Oppenheimer, D. M. (2009). Function improvements in Excel 2010. http://blogs.msdn.com/b/excel/archive/2009/09/10/function-improvements-in-excel-2010.aspx. Accessed 10 March 2015.

[89] Levy, S. M. (2014). Fooled by Pseudo-Randomness (October 7, 2014). Available at SSRN: http://ssrn.com/abstract=2507276 or http://dx.doi.org/10.2139/ssrn.2507276

[90] Microsoft (2011). Description of the RAND function in Excel. Knowledge Base Article KB828795. http://support.microsoft.com/kb/828795. Accessed March 11, 2015.

[91] Almiron, M. G., Lopes, B., Oliveira, A. L., Medeiros, A. C., & Frery, A. C. (2010). On the numerical accuracy of spreadsheets. Journal of Statistical Software, 34(4), 1-29.

[92] Click, T. H., Liu, A., & Kaminski, G. A. (2011). Quality of random number generators significantly affects results of Monte Carlo simulations for organic and biological systems. Journal of Computational Chemistry, 32(3), 513-524.

[93] McCullough, B. D., & Yalta, A. T. (2013). Spreadsheets in the cloud–not ready yet. Journal of Statistical Software, 52(7), 1-14.

[94] Lomont, C. (2008). Random Number Generation. http://www.lomont.org/Math/Papers/2008/Lomont_PRNG_2008.pdf

[95] Brown, R. G., Eddelbuettel, D., & Bauer, D. (2015). Dieharder: A random number test suite. Version 3.31.1. http://www.phy.duke.edu/~rgb/General/rand_rate.php

[96] Matsumoto, M., & Nishimura, T. (1998). Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation (TOMACS), 8(1), 3-30.

[97] Panneton, F., L'Ecuyer, P., & Matsumoto, M. (2006). Improved Long-Period Generators Based on Linear Recurrences Modulo 2. ACM Transactions on Mathematical Software, 32(1), 1-16. http://doi.acm.org/10.1145/1132973.1132974

[98] Microsoft. Excel 2013. Statistical functions (reference). https://support.office.com/en-us/article/Statistical-functions-reference-624dac86-a375-4435-bc25-76d659719ffd

[99] Microsoft. New functions in Excel 2013. https://support.office.com/en-us/article/New-functions-in-Excel-2013-075c82bd-15b9-4ad6-af31-55bb6b011cb9

[100] Mersenne Twister Home Page. http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html

[101] Sanders, T. J. (2013). Simulation Notes. Edited by Nelson A. Uhan. United States Naval Academy, Annapolis, Maryland. http://www.usna.edu/Users/math/dphillip/sa421.f13/chapter02.pdf

[102] Kleiber, C., & Zeileis, A. (2013). Reproducible econometric simulations. Journal of Econometric Methods, 2(1), 89-99.

[103] Ripley, B. D. (1990). Thoughts on pseudorandom number generators. Journal of Computational and Applied Mathematics, 31, 153-163.
