
Simulation Optimization: New Approaches to Gradient-Based Search and MLE

PIs: Michael Fu & Steve Marcus
Funding Period: August 1, 2020 - August 1, 2023
Robert H. Smith School of Business / Electrical & Computer Engineering / Institute for Systems Research
University of Maryland
https://scholar.rhsmith.umd.edu/mfu
AFOSR Mathematical Optimization Program Review, August 19, 2020

Project Team
- PIs: Michael Fu & Steve Marcus
- Current PhD students: Yunchuan Li (ECE), Peng Wan (math), Yi Zhou (math)
- Previous AFOSR project ended Jan. 2018; DARPA Lagrange: Mar. 2018 - Sep. 2019; this project: Aug. 1, 2020 - Aug. 1, 2023
- Today's talk includes work with many other collaborators.

Introduction: Stochastic Optimization Setting
- Maximize or minimize
    L(θ) = E[ℓ(Y)],
  where θ is the controllable parameter (decision variables), Y is a random variable (r.v.), and ℓ is measurable (possibly discontinuous).
- Simple example: single-server queue (what is ℓ?)
    min_θ L(θ) = E[Y(θ)] + c/θ,
  where Y is the waiting/system time, θ the mean service time, and c a service "cost" (a simulation sketch of this example follows the search-methods slide below).
- MLE (what is ℓ?):
    max_θ L(θ) = ln f_Y(Y_1, Y_2, ...; θ),
  where the Y_i are observed data with joint density f_Y.

Introduction (cont.)
- Our usual setting: Y is an output performance measure represented as
    Y = g(X(θ); θ),
  where X can be viewed as the input r.v.s, e.g., interarrival and service times in queueing; activity times in stochastic activity networks (SANs).
- Queueing network example: [figure: tandem queueing network with service parameters θ_1, θ_2, θ_3, θ_4]
- Think of it as a simplified/stylized MVA/DMV.

Outline
1. Motivations
2. GLR Stochastic Gradient Estimation
   - Distribution Sensitivities
3. MLE
   - Gradient-Based Simulated MLE
   - Non-convex MLE for Neuroimaging Data
4. STAR-SA & DiGARSM

MLE for Two Types of Data-Driven Models
- Likelihood function not readily available, e.g., system times from a queueing network or SAN:
  - gradient-based simulated MLE (GSMLE)
  - GLR estimator for distribution sensitivities
  - "Maximum Likelihood Estimation by Monte Carlo Simulation: Towards Data-Driven Stochastic Modeling" (w/ Y. Peng, B. Heidergott, H. Lam), Operations Research (accepted Dec. 2019)
- Likelihood function complex & non-convex, e.g., data from numerous neuroimaging sources:
  - MCEM (well-known approach used for MLE)
  - MRAS (global optimization method)
  - to be submitted soon

New Gradient-Based Search Methods
- Combining direct gradient information with function evaluations to improve search:
  - STAR-SPSA (Operations Research, under revision)
  - DiGARSM: "Direct Gradient Augmented Response Surface Methodology as Stochastic Approximation" (w/ Y. Li), Operations Research (R&R, to be resubmitted this month)
- Stochastic gradient estimators: generalized likelihood ratio (GLR) method
  - discontinuous sample performance
  - structural parameters
- Various ongoing research collaborations, including a paper submitted to Operations Research in May.
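As referenced on the stochastic-optimization slide, here is a minimal Monte Carlo sketch of the single-server-queue example, assuming exponential interarrival and service times and the Lindley recursion for system times. The function names, the cost coefficient c = 2, and the unit arrival rate are illustrative assumptions, not values from the talk.

```python
import numpy as np

def simulate_system_times(theta, n_customers, arrival_rate=1.0, seed=None):
    """Simulate system times Y_t of an FCFS single-server queue via the
    Lindley recursion Y_t = max(0, Y_{t-1} - A_t) + X_t, with mean
    service time theta (exponential here, purely for illustration)."""
    rng = np.random.default_rng(seed)
    A = rng.exponential(1.0 / arrival_rate, n_customers)  # interarrival times
    X = rng.exponential(theta, n_customers)               # service times
    Y = np.empty(n_customers)
    y_prev = 0.0
    for t in range(n_customers):
        y_prev = max(0.0, y_prev - A[t]) + X[t]
        Y[t] = y_prev
    return Y

def objective_estimate(theta, c=2.0, n_customers=100_000, seed=0):
    """Monte Carlo estimate of L(theta) = E[Y(theta)] + c/theta.
    A fixed seed gives common random numbers across theta values."""
    Y = simulate_system_times(theta, n_customers, seed=seed)
    return Y.mean() + c / theta

for theta in (0.3, 0.5, 0.7):
    print(f"theta={theta}: L_hat={objective_estimate(theta):.3f}")
```

The printed values exhibit the trade-off the slide has in mind: a small θ inflates the c/θ service-cost term, while a large θ inflates congestion and hence the mean system time.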
Quick Overview (Gradient-Based Search)
- Main idea (sign depending on max/min):
    θ_{k+1} = θ_k ± a_k ∇̂_θ L(θ_k)
- Main challenge: the (stochastic) gradient estimate ∇̂_θ L(θ).
- When applicable, use infinitesimal perturbation analysis (IPA) and the likelihood ratio (LR) method (e.g., the later DiGARSM example): single-run techniques (NO additional simulation required).
- However, these are NOT applicable for MLE → generalized LR (GLR).

A Different View of a Density Function (Estimation)
- A density function (a.k.a. p.d.f.) is the derivative of the c.d.f. F of the r.v. Y, which is the expectation of an indicator function, i.e., a 1st-order distribution sensitivity:
    ∂F(y)/∂y = ∂/∂y E[1{Y ≤ y}]
- (i) discontinuous sample performance 1{Y ≤ y} (jumps from 0 to 1 at y);
- (ii) structural parameter y (as opposed to a distributional parameter, as in MLE).
- KEY ISSUES: discontinuity of the indicator 1{·}; structural parameters (a numerical sketch of this view follows the Big Picture slide below).

Relevant Methods (Simulation-Based Density Estimation)
- IPA-based methods (Hong 2009, Hong and Liu 2010): (i) slow convergence rate; (ii) only derivatives w.r.t. parameters (not the argument).
- CMC/SPA & push-out LR: problem dependent.
- Kernel-based methods (Liu and Hong 2009): (i) biased; (ii) choice of bandwidth parameters; (iii) slow convergence rate.
- WD (in Heidergott and Volk-Makarewicz 2016): no structural parameters.
- GLR: Generalized Likelihood Ratio method (Peng et al.), "A New Unbiased Stochastic Derivative Estimator for Discontinuous Sample Performances with Structural Parameters," Operations Research, Vol. 66, No. 2, 487-499, 2018.

GLR Comparison with Existing Methods
Advantages:
- handles general discontinuous sample performances
- unbiased, with desirable convergence properties
- analytical form
- derivatives w.r.t. both parameters and the argument
- handles any distribution sensitivity in a unified form

MLE: Review & Two Scenarios
- Data observed: Y_1, Y_2, ...
- Goal: estimate parameter(s) θ:
    arg max_θ L(θ; Y_1, Y_2, ..., Y_n),  where L(θ; y_1, y_2, ..., y_n) = ln f_Y(y_1, y_2, ..., y_n; θ)
- Two (different) scenarios: f_Y not explicitly available; f_Y non-convex.

Big Picture: Data & Model Fitting
- Illustrative (motivating) example: queueing system.
- Data observed: waiting (system) times Y_1, Y_2, ...
- Goal: fit a stochastic model and estimate input parameter θ, e.g., the mean service time at one of the stations.
- MLE:
    arg max_θ L(θ; Y_1, Y_2, ..., Y_n),  where L(θ; y_1, y_2, ..., y_n) = ln f_Y(y_1, y_2, ...; θ)
- Main assumption: f_Y not explicitly available, e.g., a complex simulation model or real system generates Y_1, Y_2, ...
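As referenced on the density-function slide above, here is a small numerical illustration (not from the talk) of the "density = 1st-order distribution sensitivity" view: estimate ∂F(y)/∂y by a finite difference of the Monte Carlo estimate of E[1{Y ≤ y}]. The lognormal choice and all values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # i.i.d. samples of Y

def cdf_estimate(y):
    """Monte Carlo estimate of F(y) = E[1{Y <= y}]."""
    return np.mean(Y <= y)

def density_fd(y, h):
    """Central finite difference of the estimated c.d.f.: biased, and its
    quality hinges on the bandwidth h -- the indicator's jump at y is the
    culprit."""
    return (cdf_estimate(y + h) - cdf_estimate(y - h)) / (2 * h)

y = 1.0
true_pdf = np.exp(-np.log(y) ** 2 / 2) / (y * np.sqrt(2 * np.pi))  # lognormal pdf
for h in (1.0, 0.1, 0.01, 0.001):
    print(f"h={h:7.3f}  estimate={density_fd(y, h):.4f}  true={true_pdf:.4f}")
```

Large h oversmooths (bias), while small h lets the noise from the discontinuous indicator dominate; this is exactly the bias/bandwidth trade-off of the kernel-type methods listed above, which the unbiased GLR estimator avoids.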
Models: Statistics vs. Stochastics (e.g., Regression vs. Queueing)
- Machine learning (ML) models are black-box: they fit data statistically.
- Stochastic models are causal/explanatory.
- Example: FCFS G/G/1 queue, Lindley equation:
    Y_t(θ) = max(0, Y_{t-1}(θ) − A_t) + X_t(θ)
- IPA:
    dY_t/dθ = dX_t/dθ + (dY_{t-1}/dθ) 1{Y_{t-1} > A_t}
  (an IPA sketch follows the misspecification slide below).

Usual Stochastic Model Approach: Input-Fitting MLE
- Find θ by MLE of the input data X_t, t = 1, ..., n:
    arg max_θ ln f_X(X_1, X_2, ..., X_n; θ),
  where f_X is the (joint) density of the service times.
- What happens if the service times X_1, X_2, ... are NOT observed? Or...
- What happens if the model is misspecified? E.g., the arrival process is assumed stationary Poisson, but in reality is very time-varying.

Gradient-Based Simulated MLE (GSMLE)
- Main idea: a simulation (causal) model is available, and MLE is carried out by gradient-based search:
    θ_{k+1} = θ_k + a_k ∇̂_θ L(θ_k; Y),
  i.e., simulation is used to get the gradient estimate (as a function of the fixed output).
- Main challenge: the gradient estimate ∇̂_θ L(θ).

What Is the Likelihood Function? (Let's Add θ)
- The density function is the derivative of the c.d.f., which is the expectation of an indicator function, i.e., a 1st-order distribution sensitivity:
    L(θ) = ∂F(y; θ)/∂y = ∂/∂y E[1{Y(θ) ≤ y}]
- AND we actually need the derivative of this w.r.t. θ, i.e.,
    ∇_θ L(θ) = ∂²F(y; θ)/∂y∂θ = ∂²/∂y∂θ E[1{Y(θ) ≤ y}]

GLR for Distribution Sensitivities
- Under appropriate regularity conditions (Peng et al. 2018),
    ∂F(y; θ)/∂θ = E[φ_1(X; y, θ)],   ∂F(y; θ)/∂y = E[φ_2(X; y, θ)],   ∂²F(y; θ)/∂y∂θ = E[φ_3(X; y, θ)],   ...
  where
    φ_j(x; y, θ) = 1{Y(θ) ≤ y} ∂ln f_X(x; θ)/∂θ + d_j(x; θ);
  the first term is the usual LR estimator, and d_j(x; θ) is an analytically available correction built from first and second partial derivatives of g(x; θ) and from ∂ln f_X(x; θ)/∂x_i (exact expressions for d_1, d_2, ... in Peng et al. 2018).
- A "simple" rescaling of the observed data by simulation: Y(θ) = g(X; θ), where g is the causal/stochastic model, e.g., Lindley's equation.

Input Fitting vs. Output Fitting (LN/LN/1 Queue)
- Service times i.i.d. lognormal with true parameter θ = 0; interarrival times also lognormal.
- Both MLE using the input data (service times) and GSMLE using the output data (system times) give the true value: θ̂ ≈ 0.
- [Figure: "Estimation Based on True Model (100 Observations)" — estimates of θ for GSMLE(100000), GSMLE(1000000), and MLE-Input, all centered near 0.]

Model Misspecification Example (LN/LN/1 Queue)
- "All models are wrong, but some are useful." — George Box
- Service times i.i.d. lognormal with true parameter θ = 0; interarrival times also lognormal.
- True arrival process: 2-state Markov-modulated process (MMP).
- What happens if the arrival process is misspecified as stationary?
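As referenced on the Lindley-equation slide above, here is a minimal single-run IPA sketch for the G/G/1 queue. The inverse-transform construction X_t = θ·Z_t with Z_t ~ Exp(1) (so dX_t/dθ = Z_t) and all parameter values are illustrative assumptions, not the talk's exact setup.

```python
import numpy as np

def lindley_ipa(theta, n_customers, arrival_rate=1.0, seed=0):
    """Simulate Y_t(θ) = max(0, Y_{t-1}(θ) - A_t) + X_t(θ) and propagate
    the IPA derivative  dY_t/dθ = dX_t/dθ + (dY_{t-1}/dθ) 1{Y_{t-1} > A_t}.
    Service times are X_t = θ·Z_t with Z_t ~ Exp(1), so dX_t/dθ = Z_t."""
    rng = np.random.default_rng(seed)
    A = rng.exponential(1.0 / arrival_rate, n_customers)  # interarrival times
    Z = rng.exponential(1.0, n_customers)                 # standard exponentials
    y, dy = 0.0, 0.0
    Y, dY = np.empty(n_customers), np.empty(n_customers)
    for t in range(n_customers):
        carry = 1.0 if y > A[t] else 0.0   # indicator 1{Y_{t-1} > A_t}
        y = max(0.0, y - A[t]) + theta * Z[t]
        dy = Z[t] + dy * carry             # IPA recursion from the slide
        Y[t], dY[t] = y, dy
    return Y, dY

# One run yields both the performance estimate and its gradient estimate.
Y, dY = lindley_ipa(theta=0.5, n_customers=100_000)
print("mean system time:", Y.mean(), "  IPA d/dθ estimate:", dY.mean())
```

A single simulation pass produces both E[Y(θ)] and its derivative estimate, which is the "single-run, NO additional simulation required" property of IPA noted on the Quick Overview slide.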
Input Fitting vs. Output Fitting (LN/LN/1 Queue)
- Arrival process misspecified; true service parameter θ = 0.
- MLE using the input data (service times) gives the true value θ̂ = 0.
- GSMLE using the output data (system times) gives θ̂ ≈ 0.4.
- Which is better?
- [Figure: "Estimation Under Model Misspecification (100 Observations)" — estimates of θ for GSMLE(100000), GSMLE(1000000), and MLE-Input.]

Towards Data-Driven Modeling (LN/LN/1 Queue)

  Average system time of the first 10 customers (10K reps):

    True Model    GSMLE Fitting    MLE-Input Fitting
    4.3 ± 0.04    4.8 ± 0.04       2.5 ± 0.02

- [Figure: expected system time vs. customer number (1-10) for the true model, GSMLE fitting, and MLE-input fitting; consistent with the table, the GSMLE curve lies much closer to the true model.]
- Takeaway: if the model is misspecified, it might be better to fit the stochastic model using the output data than input MLE!

Noninvasive Neural Data
- [Figure: from sensory stimulus and neural response ("the neural code," animal and human) to MEG/EEG electrophysiology measurements — neurons/dipoles as sources, sensors as measurements.]
- ~10-10² sensors, ~10²-10⁵ sources, hours of data.
- Highly dynamic, sparse structure → optimization.

Problem Formulation and Modeling: State-Space Model
- Original state-space model (equations not recovered here): some of the matrices are known; the remaining matrices and noise covariances are unknown but fixed parameters.
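The state-space slide's equations did not survive extraction, but the kind of likelihood that methods such as MCEM and MRAS maximize can be illustrated on a generic linear-Gaussian state-space model. The following is a minimal sketch under that assumption; the symbols A, C, Q, R and all values are chosen here for illustration, and the log-likelihood is evaluated exactly via the Kalman filter's prediction-error decomposition.

```python
import numpy as np

def kalman_loglik(y, A, C, Q, R, x0, P0):
    """Exact log-likelihood of observations y_1..y_T under the illustrative
    linear-Gaussian state-space model (not the talk's exact model):
        x_t = A x_{t-1} + w_t,  w_t ~ N(0, Q)
        y_t = C x_t     + v_t,  v_t ~ N(0, R)."""
    x, P, ll = x0, P0, 0.0
    for yt in y:
        x = A @ x                          # predict state
        P = A @ P @ A.T + Q
        e = yt - C @ x                     # innovation
        S = C @ P @ C.T + R                # innovation covariance
        ll += -0.5 * (e @ np.linalg.solve(S, e)
                      + np.linalg.slogdet(S)[1]
                      + len(e) * np.log(2 * np.pi))
        K = P @ C.T @ np.linalg.inv(S)     # Kalman gain
        x = x + K @ e                      # update
        P = P - K @ C @ P
    return ll

# Illustrative 2-state, 1-sensor example: simulate data with a_true = 0.9,
# then score candidate values of the unknown dynamics coefficient a.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.5]])
Q, R = 0.1 * np.eye(2), 0.2 * np.eye(1)
x, ys = np.zeros(2), []
for _ in range(500):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
    ys.append(C @ x + rng.multivariate_normal(np.zeros(1), R))

for a in (0.5, 0.7, 0.9):
    A_cand = np.array([[a, 0.1], [0.0, 0.8]])
    print(a, kalman_loglik(ys, A_cand, C, Q, R, np.zeros(2), np.eye(2)))
```

In the project's neuroimaging setting the likelihood surface in the unknown parameters is non-convex, which is what motivates pairing MCEM with a global optimization method such as MRAS rather than relying on local ascent alone.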