Combustion and Flame 183 (2017) 88–101
Contents lists available at ScienceDirect
Combustion and Flame
journal homepage: www.elsevier.com/locate/combustflame
A general probabilistic approach for the quantitative assessment of LES combustion models
∗ Ross Johnson, Hao Wu, Matthias Ihme
Department of Mechanical Engineering, Stanford University, Stanford, CA 94305, United States
a r t i c l e i n f o a b s t r a c t
Article history: The Wasserstein metric is introduced as a probabilistic method to enable quantitative evaluations of LES Received 9 March 2017 combustion models. The Wasserstein metric can directly be evaluated from scatter data or statistical re-
Revised 5 April 2017 sults through probabilistic reconstruction. The method is derived and generalized for turbulent reacting Accepted 3 May 2017 flows, and applied to validation tests involving a piloted turbulent jet flame with inhomogeneous inlets. It Available online 23 May 2017 is shown that the Wasserstein metric is an effective validation tool that extends to multiple scalar quan- Keywords: tities, providing an objective and quantitative evaluation of model deficiencies and boundary conditions Wasserstein metric on the simulation accuracy. Several test cases are considered, beginning with a comparison of mixture- Model validation fraction results, and the subsequent extension to reactive scalars, including temperature and species mass Statistical analysis fractions of CO and CO 2 . To demonstrate the versatility of the proposed method in application to multi-
Quantitative model comparison ple datasets, the Wasserstein metric is applied to a series of different simulations that were contributed Large-eddy simulation to the TNF-workshop. Analysis of the results allows to identify competing contributions to model devi- ations, arising from uncertainties in the boundary conditions and model deficiencies. These applications demonstrate that the Wasserstein metric constitutes an easily applicable mathematical tool that reduce multiscalar combustion data and large datasets into a scalar-valued quantitative measure. ©2017 The Combustion Institute. Published by Elsevier Inc. All rights reserved.
1. Introduction sults from Favre and Reynolds averaging, conditional data, proba- bility density functions, and scatter data. These data provide im- A main challenge in the development of models for turbu- portant information for model evaluation. lent reacting flows is the objective and quantitative evaluation of Significant progress has been made in simulating turbulent the agreement between experiments and simulations. This chal- flows. This progress can largely be attributed to adopting high- lenge arises from the complexity of the flow-field data, involv- fidelity large-eddy simulation (LES) for the prediction of unsteady ing different thermo-chemical and hydrodynamic quantities that turbulent flows, and establishing forums for collaborative compar- are typically provided from measurements of temperature, speci- isons of benchmark flames that are supported by comprehensive ation, and velocity. This data is obtained from various diagnos- databases [7] . tics techniques, including nonintrusive methods such as laser spec- In spite of the increasing popularity of LES, the valida- troscopy and particle image velocimetry, or intrusive techniques tion of combustion models follows previous steady-state RANS- such as exhaust-gas sampling or thermocouple measurements [1– approaches, comparing statistical moments (typically mean and 4] . Instead of directly measuring physical quantities, they are typ- root-mean-square) and conditional data along axial and radial lo- ically inferred from measured signals, introducing several correc- cations in the flame. Qualitative comparisons of scatter data are tion factors and uncertainties [5] . Depending on the experimen- commonly employed to examine whether a particular combustion tal technique, these measurements are generated from single-point model is able to represent certain combustion-physical processes data, line measurements, line-of-sight absorption, or multidimen- in composition space that, for instance, are associated with extinc- sional imaging at acquisition rates ranging from single-shot to tion, equilibrium composition, or mixing conditions. In other in- high-repetition rate measurements to resolve turbulent dynamics vestigations, error measures were constructed by weighting mo- [6] . This data is commonly processed in the form of statistical re- ments and scalar quantities for evaluating the sensitivity of model coefficients in subgrid models and for uncertainty quantification [8,9] . It is not uncommon that models match particular measure- ∗ Corresponding author. ment quantities (such as temperature or major species products) E-mail addresses: [email protected] (R. Johnson), [email protected] (H. in certain regions of the flame, while mispredicting the same Wu), [email protected] (M. Ihme). http://dx.doi.org/10.1016/j.combustflame.2017.05.004 0010-2180/© 2017 The Combustion Institute. Published by Elsevier Inc. All rights reserved. R. Johnson et al. / Combustion and Flame 183 (2017) 88–101 89 quantities in other regions or showing disagreements for other Since multivariate distributions are commonly used to represent flow-field data at the same location. Further, the comparison of in- the thermo-chemical states in turbulent flows, this method is use- dividual scalar quantities makes it difficult to consider dependen- ful for the quantitative assessment of differences between a nu- cies and identify correlations between flow-field quantities. Faced merical simulation and experimental data of turbulent flames. with this dilemma, such comparisons often only provide an incon- The dissimilarity between two points in the thermo-chemical clusive assessment of the model performance, and limit a quantita- state space can typically be quantified by its Euclidean distance. tive comparison among different modeling approaches. Therefore, a In the case of a 1D state space, e.g., temperature, the Euclidean need arises to develop a metric that enables a quantitative assess- distance is simply the absolute value of the difference. This met- ment of combustion models, fulfilling the following requirements: ric serves the purpose well for the comparison among two de- terministic measurements, which can indeed be fully represented 1. provide a single metric for quantitative model evaluation; by points in the state space. However, the Euclidean distance falls 2. combine single and multiple scalar quantities in the validation short when measurements are made on random data, e.g., states in metric, including temperature, mixture fraction, species mass a turbulent flow, in which case a single data point can no longer fractions, and velocity; represent the measurement and shall be replaced by a distribution 3. incorporate scatter data from single-shot, high-speed, and si- of possible outcomes. Consequently, the dissimilarity between two multaneous measurements, and enable the utilization of statis- random variables shall be quantified by the “distance” between the tical data; corresponding probability distributions. 4. enable the consideration of dependencies between measure- Among many definitions of the “distance” between distributions ment quantities; and [11,12] , the Wasserstein metric (also known as the Mallows met- 5. ensure that the metric satisfies conditions on non-negativity, ric) is of interest in this study. Several other methods for quan- identity, symmetry, and triangular inequality. tifying the difference between two distributions have been pro- By addressing these requirements, the objective of this work is posed. However, those methods are either limited to evaluating the to introduce the probabilistic Wasserstein distance [10] as a met- equality of marginal distributions (e.g., Kolmogorov–Smirnov test) ric for the quantitative evaluation of combustion models. Com- or are incomplete in that they fail to satisfy essential metric prop- plementing currently employed comparison techniques, this met- erties (e.g., Kullback–Leibler divergence). The interested reader is ric possesses the following attractive properties. First, this metric referred to the survey by Gibbs and Su [11] for a detailed com- directly utilizes the abundance of data from unsteady simulation parison between the Wasserstein metric and other candidates. The techniques and high repetition rate measurement methods. Rather Wasserstein metric represents a natural extension of the Euclidean than considering low-order statistical moments, this metric is for- distance, which can be recovered as the distribution reduces to a mulated in distribution space. As such, it is thereby directly ap- Dirac delta function. As such, this metric provides a measure of the plicable to scatter data that are obtained from transient simula- difference between distributions that resembles the classical idea tions and high-speed measurements without the need for data re- of “distance”. The resulting value can be judged and interpreted in duction to low-order statistical moment information. Second, this a similar fashion as the arithmetic difference between two deter- metric is able to synthesize multidimensional data into a scalar- ministic quantities. valued quantity, thereby aggregating model discrepancies for indi- The Wasserstein metric has been used as a tool for measur- vidual quantities. Third, the resulting metric utilizes a normaliza- ing the “distance” between distributions or histograms in the con- tion, thereby enabling the objective comparison of different sim- text of content-based image retrieval [12] , hand-gesture recogni- ulation approaches. Fourth, this method is directly applicable to tion [13] , and analysis of 3D surfaces [14] , etc. These applications sample data that are generated from scatter plots, instantaneous were first proposed by Rubner et al. [12] , under the name of the simulation results or reconstructed from statistical results. Fifth, Earth Mover’s Distance (EMD), which is in fact the Wasserstein the Wasserstein metric enables the consideration of conditional metric of order one, W1 . More recent applications favor the 2nd and multiscalar data, and is equipped with essential properties of Wasserstein metric by way of Brenier’s theorem [14,15] , which metric spaces. Finally, this metric can be regarded as a companion shows the uniqueness of the optimal solution for the squared- tool to previously established methods for validation of LES and distance and its connection to differential geometry through which instantaneous CFD-simulations. more efficient algorithms have been constructed. The remainder of this paper is structured as follows. This section is primarily concerned with the Wasserstein metric Section 2 formally introduces the Wasserstein metric and builds a for discrete distributions in the Euclidean space. This allows us to mathematical foundation for this method. This method is demon- focus on the concepts that are directly related to the practical cal- strated in application to the Sydney piloted jet flame with inho- culation and estimation of this metric, and avoid the usage of mea- mogeneous inlets, and experimental configuration and simulation sure theory without creating much ambiguity. The definition of the setup are described in Section 3 . Results are presented in Section 4 . Wasserstein metric for general probability measures with more for- To introduce the metric, we first consider a scalar distribution of mal treatment of the probability theory is discussed in Appendix A . mixture fraction and establish a quantitative comparison of ex- The interested reader is also referred to books by Villani [16] , sur- periments and simulations against presumed closure models. This veys by Urbas [17] , and the more recent lectures by McCann and is then extended to multiscalar data, involving temperature and Guillen [18] for further details. species mass fractions. Subsequently, we examine the utility of applying the Wasserstein metric to statistical results, rather than 2.1. Preliminaries: Monge–Kantorovich transportation problem scatter data, while Section 5 explores how the Wasserstein met- ric could add value for directly comparing multiple simulations The Wasserstein metric is motivated by the classical optimal to assess different LES modeling approaches. This work finishes in transportation problem first proposed by Monge in 1781 [19] . The Section 6 with conclusions. optimal transportation problem of Monge considers the most ef- ficient transportation of ore from n mines to n factories, each of 2. Methodology which produces and consumes one unit of ore, respectively. These mines and factories form two finite point sets in the Euclidean In this section, the methodology of quantifying the dissimilarity space, which are denoted by M and F. The cost of transporting ∈ M ∈ F between two multivariate distributions (joint PDFs) is discussed. one unit of ore from mine xi to factory y j is denoted by 90 R. Johnson et al. / Combustion and Flame 183 (2017) 88–101
cij , which is chosen by Monge to be the Euclidean distance. The The pth Wasserstein metric between discrete distributions f and transport plan can be expressed in the form of an n × n matrix g is defined to be the p th root of the optimal transport cost for the γ , of which the element ij represent the amount of ore trans- corresponding Monge–Kantorovich problem: ported from xi to yj . The central problem in optimal transporta- / n n 1 p tion is to find the transport plan that minimizes the total cost of p W ( f, g) = min γ | x − y | transportation, which is the sum of the costs on n × n available p ij i j = = n n γ i 1 j 1 routes between factories and mines, i.e., = = ij cij . The orig- i 1 j 1 (5) inal problem formulated by Monge requires transporting the ore n n γ = , γ = , γ ≥ . in its entirety. Therefore, is constrained to be a permutation ma- subject to ij fi ij g j i, j 0 trix, and can only take binary values of 0 or 1. Formally, the Monge j=1 i =1 transport problem can be expressed as The constraints in Eq. (5) ensure that the total mass transported n n from xi and the total mass transported to yj match fi and gj , re- min γ c ij ij spectively. i =1 j=1 The Wasserstein metric for continuous distributions is pre- (1) n n sented in Appendix A . The estimation and calculation for the con- γ = γ = , γ ∈ { , } . subject to ij ij 1 ij 0 1 tinuous case and for higher-dimensional problems are discussed in i =1 i = j the next section. The transport problem in Monge’s formulation can be solved rel- atively efficient by the Hungarian algorithm proposed by Kuhn 2.3. Non-parametric estimation of W through empirical distributions [20] . p In 1942, Kantorovich reformulated Monge’s problem by relaxing Two major difficulties arise in obtaining the Wasserstein metric the requirement that all the ore from a given mine goes to a single of two multivariate distributions of thermo-chemical states. One is factory [21,22] . As a result, the transport plan is modified from be- that the multivariate distributions are not readily available from ei- ing a permutation matrix to a doubly stochastic matrix. With this, ther experiments or simulations. Instead, a series of samples drawn Monge’s problem is written in the following form: from these distributions is provided. The other is that there is no