Tutorial Software for

David L. Woodruff
Graduate School of Management
UC Davis
Davis CA USA 95616
DLWoodruff@UCDavis.edu

Introduction

- I received an order to talk about software for stochastic programming in general, with an overview of the issues and of what exists.
- I intend to deliver, but with a bias toward creating software for release.

Software Tools Area of IJOC

Papers describing software and data made available to the research community are particularly sought. These papers need not provide novel mathematics or algorithms themselves, but must be novel and important with respect to computing and must describe or illustrate how the software and data can be used to advance research and how it fits in the research literature.

https://pubsonline.informs.org/page/ijoc/editorial-statement#Software%20Tools

cORe

Database of instances and data. Contact Suvrajeet Sen for more information or see the cORe site: https://core.isrd.isi.edu

Outline

Introduction

Features Lists

Crowd-source features

Parallelism

Input Creation Software

Parting Thoughts

Encore

Optimization Under Uncertainty

Vague, yet general, formulation assuming a function f parameterized by ξ that maps to a space for which min has meaning:

\[ \min_x \; V_{\xi \in \Xi}\big(f(x; \xi)\big) \quad \text{s.t. } x \in \Omega \tag{P} \]

where Ξ and Ω may have uncertain components.

Takeaways

- How to think about SP software
- A (partial?) list of what is out there
- Opportunities for research

There will be takeaways for me as well, based on some crowd-sourcing.

Audience

- Who are you?
- Writing software?
- Using software?
- General-purpose?
- Applications area?
  - Energy?
  - Other natural resources?
  - Transportation or Logistics?
  - Finance?
  - Other?

Expressing things

- "reference" models
- stage structure (e.g., scenario trees)
- scenario data, or distributions, or uncertainty sets

Features Lists

Take a look at the features lists in the software surveys published by OR/MS Today, e.g., https://pubsonline.informs.org/magazine/orms-today/2019-linear-programming-software-survey

Let's start walking through an extended feature list for SP software.

Open and Free?

- Open-source versus compiled image
- Gratis/kostenlos/free versus commercial

Oh wow, man

Remember that all software is data but not all data is software.

Type of "language" for models, data, algorithms

- Data and Structure
  - Standard Format with implied model form (e.g., SMPS)
  - Code Library (e.g., SMI)
  - Creator/Oracle (e.g., mape maker)
- Models
  - AML (e.g., AMPL), perhaps with annotations
  - Object Library (e.g., CPLEX Python)
- Algorithms
  - Programming Language (e.g., C++)
  - Custom Component Library

WLOG

Many things that are "without loss of generality" from an algorithmic perspective can be misleading from the perspective of a software user or writer.

Underlying Model

- linear
- quadratic
- convex
- general non-linear
- binaries/integers

Two Biggies

- Decision versus Policy
- Two stage versus multi-stage

Software for two-stage is often not-so-easy to extend to multi-stage.

https://www.youtube.com/watch?v=_JZom_gVfuw

Assumptions about uncertainty

- RHS only, etc.
- Stage independence
- Exogenous/Endogenous uncertainty
- Given a distribution
- Given scenarios
- Just given data

Simplified Scenario-based EF Expected Value Statement

\[ \min \sum_{s \in S} \Pr(s)\, f(X(s); s) \;:\; X(s) \in Q_s,\; X(s) \in \mathcal{N},\; s \in S \]

Stochasticity

- Will your software provide stochastic bounds?
- Solve EF only, or progressively sample?
- Sample from a distribution?
- Use a sample oracle?

Decomposition

- Stage-wise
- Scenario
- Other

Risk

- Risk in the objective
- Chance constraints
- Optimize for worst-case(s)
- Stochastic dominance

Parallelism

- What aspects are parallelized?
- What are the limits of parallel efficiency?

Crowd-source features

What features are missing? To whom should we send the survey?

- AIMMS; https://www.aimms.com/english/developers/resources/examples/functional-examples/stochastic-programming/
- bnbs; https://neos-server.org/neos/solvers/slp:bnbs/SMPS.html
- DDSIP; https://github.com/RalfGollmer/ddsip
- DSP; https://www.mcs.anl.gov/research/projects/DSP/
- fast; https://github.com/leopoldcambier/FAST
- FICO Xpress; (Mosel) Hard to find?
- flopc++; https://projects.coin-or.org/FlopC++
- Frontline Risk; https://www.solver.com/risk-solver-stochastic-libraries?gclid=Cj0KCQjwpavpBRDQARIsAPfTwixWl_zBumRDV9XoBItadbCslgeGfvHaaxWpvQBqUIRjgCdPVhypuq4aAjqmEALw_wcB more
- Lindo; https://www.lindo.com/index.php/products/lingo-and-optimization-modeling?catid=90&id=98:stochastic-programming-features
- maximal; http://www.maximalsoftware.com/maximal/news/stochastic.html
- MSLiP
- msppy; github.com/lingquant/msppy
- OsiL/SE; https://www.coin-or.org/OS/OSiL_SE.html
- PySP; https://github.com/Pyomo/pyomo
- SD; https://github.com/USC3DLAB/
- sddip; https://github.com/lkapelevich/SDDiP.jl
- SMI; https://projects.coin-or.org/Smi
- SMPS
- StructDualDynProg; https://github.com/JuliaStochOpt/StructDualDynProg.jl
- structjump; https://github.com/StructJuMP/StructJuMP.jl
- sddp.jl; https://www.psr-inc.com/softwares-en/
- ; https://www.mosek.com/documentation/

Parallelism

- What aspects are parallelized?
- What are the limits of parallel efficiency?

Parallel Efficiency

\[ \frac{1}{P} \, \frac{T_{\mathrm{seq}}(N)}{T(N, P)} \]

often approximated (in a cheating-sort-of-way) by

\[ \frac{1}{P} \, \frac{T(N, 1)}{T(N, P)} \]

You might also want to know the lowest wall-clock time, regardless of efficiency; i.e., the asymptote of T(N, P) as a function of P for a given N.

MPI vocabulary
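The parallel efficiency ratio defined on the Parallel Efficiency slide can be computed directly from measured wall-clock times. The following is a minimal sketch; the function name and the timing values are hypothetical, made up for illustration only. It uses the single-process time T(N, 1) in place of the true sequential time, which is exactly the "cheating" approximation.

```python
def parallel_efficiency(t_seq, t_par, p):
    """Return T_seq(N) / (P * T(N, P)) for measured wall-clock times."""
    if p < 1 or t_seq <= 0 or t_par <= 0:
        raise ValueError("need p >= 1 and positive times")
    return t_seq / (p * t_par)


# Hypothetical wall-clock times in seconds for one fixed problem size N,
# keyed by the number of processes P.
timings = {1: 120.0, 2: 64.0, 4: 36.0, 8: 24.0}

# T(N, 1) stands in for T_seq(N): the approximation, not the real thing.
t_one = timings[1]
efficiencies = {p: parallel_efficiency(t_one, t, p) for p, t in timings.items()}

# Lowest wall-clock time observed, regardless of efficiency.
best_wall_clock = min(timings.values())

for p in sorted(efficiencies):
    print(f"P={p}: efficiency={efficiencies[p]:.3f}")
print(f"lowest wall-clock time: {best_wall_clock} s")
```

With only a handful of P values the minimum is a rough stand-in for the asymptote; more runs at larger P would be needed to see where T(N, P) levels off.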

rank: a process identified by an integer (think of it as an instance of your program).

    mpiexec -np 27 myprog

comm: a collection of ranks that communicate.

reduce, broadcast, send: some ways to communicate.

A Little Bit of C

    #include ...
    int main(int argc, char** argv) {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) ...

Use PH to illustrate

- Data is probably more important for software right now, but algorithms and parallel implementation are more fun.
- We will look at PH because its software issues are shared with many other algorithms.

Simplified Problem Statement

\[ \min \sum_{s \in S} \Pr(s)\, f(X(s); s) \;:\; X(s) \in Q_s,\; X(s) \in \mathcal{N},\; s \in S \]

This can be formulated to include CVaR.

Progressive Hedging

- For a more general description, see Rockafellar and Wets, Math of OR 1991. See also Watson and Woodruff, Computational Management Science 2011.

- For each scenario s, (approximate) solutions are obtained for the problem of minimizing, subject to the constraints, the deterministic $f_s$ plus terms that penalize lack of implementability. The penalty terms make use of a system of row vectors, w, that have the same dimension as the column vector system X.

Notation

- t(A) is the time index for node A (i.e., node A corresponds to time t).
- Pr(A) denotes the sum of Pr(s) over all scenarios s emanating from node A (i.e., those s that are the leaves of the sub-tree having A as a root, also referred to as s ∈ A). Consequently,

\[ \Pr(A) = \sum_{s \in A} \Pr(s). \]

- $\bar{X}(t; A)$ on the left-hand side of a statement indicates assignment to the vector $(\bar{x}_1(s, t), \ldots, \bar{x}_{N(t)}(s, t))$ for each s ∈ A.
- X(s) means (x(s, 1), ..., x(s, T)) for the system of variables (likewise $\bar{X}(s)$ for the averaged values).

Algorithm

Taking ρ > 0 as a (vector) parameter and starting with k := 0:

1. For all scenario indexes s ∈ S,

\[ X^{(0)}(s) := \operatorname{argmin} \; f(X(s); s) \;:\; X(s) \in Q_s \tag{1} \]

   and $w^{(0)}(s) := 0$.

2. k := k + 1

3. For each node A in the scenario tree, and for t = t(A),

\[ \bar{X}^{(k-1)}(t; A) := \sum_{s \in A} \Pr(s)\, X^{(k-1)}(t; s) \,/\, \Pr(A) \]

4. For all scenario indexes s ∈ S,

\[ w^{(k)}(s) := w^{(k-1)}(s) + \rho \left( X^{(k-1)}(s) - \bar{X}^{(k-1)}(s) \right) \]

   and

\[ X^{(k)}(s) := \operatorname{argmin} \; f_s(X(s)) + w^{(k)}(s)\, X(s) + \frac{\rho}{2} \left\| X(s) - \bar{X}^{(k-1)}(s) \right\|^2 \;:\; X(s) \in Q_s. \]

5. If the termination criteria are not met, then go to step 2.

Load Balancing and speed up
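Before turning to speed-ups, it may help to see the PH steps 1 to 5 run on a toy problem. The sketch below is an illustration under strong assumptions, not the implementation from any of the packages mentioned in this talk: it is two-stage with a single scalar first-stage variable, each scenario cost is assumed to be f_s(x) = (x - a_s)^2 with no further constraints, and the subproblem in step 4 is solved in closed form. The function name, the target values a_s, and the probabilities are hypothetical.

```python
def progressive_hedging(targets, probs, rho=1.0, tol=1e-9, max_iter=1000):
    """Toy PH for min sum_s Pr(s) * (x_s - a_s)^2 with all x_s forced equal."""
    # Step 1: solve each scenario on its own; multipliers start at zero.
    x = list(targets)            # argmin of (x - a_s)^2 is a_s
    w = [0.0] * len(targets)
    xbar = sum(p * xs for p, xs in zip(probs, x))
    iterations = 0
    for k in range(1, max_iter + 1):          # Step 2: k := k + 1
        iterations = k
        # Step 3: probability-weighted average at the (single) root node,
        # where Pr(A) = 1.
        xbar = sum(p * xs for p, xs in zip(probs, x))
        # Step 5 (checked here for convenience): stop once every scenario
        # solution agrees with the average.
        if max(abs(xs - xbar) for xs in x) < tol:
            break
        # Step 4a: multiplier update.
        w = [ws + rho * (xs - xbar) for ws, xs in zip(w, x)]
        # Step 4b: closed-form argmin of
        #   (x - a_s)^2 + w_s * x + (rho / 2) * (x - xbar)^2
        x = [(2.0 * a - ws + rho * xbar) / (2.0 + rho)
             for a, ws in zip(targets, w)]
    return xbar, x, w, iterations


# Hypothetical data: three scenarios with unequal probabilities.
targets = [1.0, 2.0, 6.0]
probs = [0.5, 0.3, 0.2]
xbar, x, w, iterations = progressive_hedging(targets, probs)
print(xbar, iterations)
```

In a real application step 4b is a call to a solver for each scenario, which is where almost all of the time goes and why the scenario loop is the natural thing to parallelize.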

- Have the solution quality improve as the algorithm converges.
- Use a decreasing mipgap.
- For most iterations, terminate sub-problems on max time.
- Fix some variables heuristically.
- Use only n log(n) communication.

MPI for PH

- Have a comm for every non-leaf node of the scenario tree.
  - (So for a two-stage problem, you only need the global comm.)
- Have a rank for groups of scenarios.
  - (The simplest to think about is one scenario per rank.)
- Communication can be accomplished via reductions.
  - (See, e.g., Step (3).)

Code snippet

    comm.Allreduce([synchronizer.local_data[redname][cname], mpi.DOUBLE],
                   [synchronizer.global_data[redname][cname], mpi.DOUBLE],
                   op=mpi.SUM)

Some Reduction Operations
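To make concrete what a SUM Allreduce delivers for PH, here is a small sketch that mimics the result in plain Python, with list entries standing in for MPI ranks. It does not use MPI at all; `simulated_allreduce_sum` and the data are hypothetical names and values chosen for illustration. It assumes one scenario per rank and a two-stage problem, so the only comm is the global one and Pr(A) = 1 at the root.

```python
def simulated_allreduce_sum(local_buffers):
    """Mimic Allreduce with SUM: every rank gets the element-wise total."""
    total = [sum(values) for values in zip(*local_buffers)]
    # Each "rank" receives its own copy of the reduced buffer.
    return [list(total) for _ in local_buffers]


# One scenario per rank (hypothetical data): scenario probability and the
# first-stage solution vector that rank computed for its scenario.
probs = [0.5, 0.3, 0.2]
x_by_rank = [[1.0, 4.0], [2.0, 0.0], [6.0, 5.0]]

# Each rank fills its local buffer with Pr(s) * x(s) ...
local = [[p * xi for xi in x] for p, x in zip(probs, x_by_rank)]

# ... and after the reduction every rank holds the node average (Step 3).
global_copies = simulated_allreduce_sum(local)
xbar = global_copies[0]
print(xbar)
```

With real MPI, each rank would hold only its own row of `local`, and the summation would happen inside the library rather than in a Python loop.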

SUM, MIN, MAX, AND, OR, ...

Reductions take n log n time.

Scenario Creator Issues

- Is it going to be built into a solution algorithm or stand-alone?
- Give a set or consult an oracle piecemeal?
- Are there forecasts or not?
- How general can you be?
- Do you want to adhere to a standard?

Oracle

For our purposes here, an oracle is a piece of software that returns a scenario when one is requested.

- Typically, the scenarios are equally likely.
- The oracle could simulate or draw from a distribution.
- When the oracle operates in parallel (or a priori), it (or something) can create a pool.

Quick Example - mape maker

- Given a time series of forecasts, we create a set of scenarios for power generation (that we often refer to as actuals to distinguish these values from forecasts) that, based on a forecast system with a specified accuracy, could reasonably correspond to the forecasts.
- We can also create a scenario set of forecasts that could reasonably correspond to a given time series of actuals.
- The correspondence between forecasts and actuals is based on analysis of historic forecast error distributions.

[Figure: mape maker output]

Parting Thoughts

- There is a growing body of software available to solve Stochastic Programs.
- Parallelism and massive parallelism are important.
- Input creation is a little behind (maybe because it is less satisfying).
- It is becoming increasingly important to have software to go with your algorithm paper.
- You can even publish papers about software.

Benchmarking

- A desired feature of standards is the ability to benchmark solution methods.
- But that doesn't address the question: are you solving the right problem and are the stochastics modeled in the best way?
- In some situations one can use Counterfactual Reenactment.
  - Can only be used where there is a history of data and the decisions do not affect the realizations (very much); e.g., unit commitment.
  - Basic horse race idea: roll forward through time giving each horse in the race the data available up to that time, then compute costs that would have resulted from each horse's decision.
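The horse race idea can be sketched in a few lines. This is only an illustration under assumptions: the demand history, the two decision rules ("horses"), and the newsvendor-style cost function are all hypothetical, and the realizations are taken to be unaffected by the decisions, as the method requires.

```python
def horse_race(history, horses, cost):
    """Roll forward through time; each horse sees only data before period t."""
    totals = {name: 0.0 for name in horses}
    for t in range(1, len(history)):
        past, realized = history[:t], history[t]
        for name, decide in horses.items():
            # Cost that would have resulted from this horse's decision.
            totals[name] += cost(decide(past), realized)
    return totals


def newsvendor_cost(order, realized):
    """Hypothetical cost: 1 per unit of excess, 3 per unit of shortfall."""
    return 1.0 * max(order - realized, 0.0) + 3.0 * max(realized - order, 0.0)


# Hypothetical history of realizations (e.g., demand per period).
demand = [10, 12, 9, 14, 11, 13]

# Two hypothetical decision rules competing in the race.
horses = {
    "last_value": lambda past: past[-1],
    "running_mean": lambda past: sum(past) / len(past),
}

totals = horse_race(demand, horses, newsvendor_cost)
for name, total in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{name}: {total:.2f}")
```

In a real study each horse would be a full pipeline (scenario creation plus a stochastic program solve) rather than a one-line rule, but the roll-forward loop has the same shape.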