ECE 592 Topics in Data Science

Dror Baron, Associate Professor
Dept. of Electrical and Computer Engineering
North Carolina State University, NC, USA

Optimization

Keywords: linear programming, dynamic programming, convex optimization, non-convex optimization

What is optimization?

. Wikipedia: In mathematics, computer science and operations research, mathematical optimization (alternatively, mathematical programming or simply, optimization) is the selection of a best element (with regard to some criterion) from some set of available alternatives.

Application #1 – Classroom scheduling

. Real story: NCSU has classes on multiple campuses, dozens of buildings, etc.
. We want a “good” schedule

. What’s good? – Availability of rooms – Proximity of classroom to department – Instructors have day/time preferences – Match sizes of rooms and anticipated class enrollment – Avoid conflicts between course pairs of interest to students

Application #2 – ℓ1 recovery

. Among infinitely many solutions, seek one with smallest ℓ1 norm (sum of absolute values)

. Relation to compressed sensing recovery (later in course)

. Can express x = xp − xn, so ||x||1 = sum_{i=1..N} (xp_i + xn_i)
. min over (xp, xn) of sum_{i=1..N} (xp_i + xn_i) subject to (s.t.) y = Φxp − Φxn
– Also need xp, xn to be non-negative
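This reformulation maps directly to a standard LP solver. A minimal sketch in Python, assuming SciPy's linprog and a small randomly generated Φ (illustrative only, not the course's code):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
M, N = 20, 50                      # underdetermined system: M < N
Phi = rng.standard_normal((M, N))
x_true = np.zeros(N)
x_true[rng.choice(N, 3, replace=False)] = rng.standard_normal(3)  # sparse signal
y = Phi @ x_true

# Variables are [xp; xn], both non-negative; x = xp - xn.
# Objective: sum(xp) + sum(xn) equals ||x||_1 at the optimum.
c = np.ones(2 * N)
A_eq = np.hstack([Phi, -Phi])      # y = Phi*xp - Phi*xn
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_hat = res.x[:N] - res.x[N:]
print("recovery error:", np.linalg.norm(x_hat - x_true))
```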

Application #3 – Reducing fuel consumption

. Suppose gas prices increase a lot
. Truck fleet company wants to save $ by reducing fuel consumption
. Things are simple on flat highways

. Challenges:
1) You see a hill; you can push the engine up the hill and coast down, or accelerate before the hill, then reduce speed while climbing
2) You see a red light; should you coast, accelerate, or slam the brakes?
. Main point: dynamic behavior links past, present, future

Application #4 – Process design in factories

. Consider factory with a complicated process
– Want to buy fewer inputs (chemicals)
– Want to use less energy
– Want product to be produced quickly (time)
– Want robustness to surprises (e.g., power shortages)

. Goal: tune production process to minimize costs
– “Costs” involve inputs, energy, time, robustness, …
– Known as multi-objective optimization

Dynamic Programming (DP)

Keywords: Bellman equations, dynamic programming

What is dynamic programming (DP)?

. Wikipedia: In mathematics, management science, economics, computer science, and bioinformatics, dynamic programming (also known as dynamic optimization) is a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions – ideally, using a memory-based data structure.

Resembles divide and conquer

. Have a large problem
. Partition into parts

. Dynamic nature of problem links past, present, future
. Want the decision whose combined “costs” (current plus future) are best

. Whereas brute force optimization is computationally intense, DP is fast

Problem setting

. t – time
. T – time horizon (maximal time)

. xt – state at time t
. at ∈ Γ(xt) – possible actions
. T(x,a) – next state upon choosing action a
. F(x,a) – payoff from action a

. Want to maximize our payoff up to horizon T

Solution approach

. Basis case: t=T-1, have one time step left for an action

. Maximize payoff by maximizing F(xt,a): a* = arg max_{a∈Γ(xt)} F(xt,a)

. At time T (end of problem) arrive at state xT = T(xT-1, a*)
– Don’t care about final state, only about payoff

Solution continued

. Recursive case: t < T-1

. Let’s keep it simple with t=T-2

. Based on the basis case, for each possible next state xt+1 = xT-1 we can calculate a* for the last decision (taken in the next time step, t=T-1)

. Want optimal cost to account for current payoff and payoff in next step:
at* = arg max_{a∈Γ(xt)} { F(xt,a) + next_payoff(T(xt,a)) }

Recursive solution

. Let’s simplify recursive case for t=T-2 using notation for optimal actions / payoffs at time t

– a*(xt) – optimal action at time t given state xt
– Ψ(xt) – optimal payoff starting from time t

. Basis case provides a*(xT-1), Ψ(xT-1), ∀xT-1

. Recursive case for t=T-2:
at* = arg max_{a∈Γ(xt)} { F(xt,a) + next_payoff(T(xt,a)) }
    = arg max_{a∈Γ(xt)} { F(xt,a) + Ψ(xT-1 = T(xt,a)) }
. Repeat recursively for smaller t

Computationally efficient DP solution

. Instead of processing from t up to T, reverse order:

– t=T-1: compute a*(xt), Ψ(xt) for all possible xt
– t=T-2: at* = arg max_{a∈Γ(xt)} { F(xt,a) + Ψ(xT-1 = T(xT-2,a)) }
– t=T-3: at* = arg max_{a∈Γ(xt)} { F(xt,a) + Ψ(xT-2 = T(xT-3,a)) }

. General case: Bellman’s optimality equations

. Each time step, store optimal actions and payoffs
. Lookup table (LUT) for Ψ instead of recomputing
. Can construct sequence of optimal actions with LUT
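A minimal backward-induction sketch of this table-filling scheme; the state space, action set Γ, transition T, and payoff F below are toy assumptions chosen only to show the lookup-table structure:

```python
# Backward induction: fill lookup tables psi (optimal payoff-to-go) and
# a_star (optimal action) from t = T-1 down to t = 0.
T_horizon = 5
states = range(10)                      # toy state space (assumption)
actions = [-1, 0, +1]                   # toy action set Gamma (assumption)

def transition(x, a):                   # T(x, a): next state (toy dynamics)
    return max(0, min(9, x + a))

def payoff(x, a):                       # F(x, a): immediate payoff (toy)
    return -(x - 5) ** 2 - abs(a)

psi = [dict() for _ in range(T_horizon + 1)]
a_star = [dict() for _ in range(T_horizon)]
for x in states:
    psi[T_horizon][x] = 0.0             # no payoff after the horizon

for t in reversed(range(T_horizon)):    # Bellman recursion, one LUT per t
    for x in states:
        best_a, best_val = None, float("-inf")
        for a in actions:
            val = payoff(x, a) + psi[t + 1][transition(x, a)]
            if val > best_val:
                best_a, best_val = a, val
        a_star[t][x], psi[t][x] = best_a, best_val

# Reconstruct an optimal action sequence from the LUTs, starting at x0 = 0.
x = 0
for t in range(T_horizon):
    a = a_star[t][x]
    print(f"t={t}, state={x}, action={a}")
    x = transition(x, a)
```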

Why computationally efficient?

. Let’s contrast computational complexities

. Brute force optimization:
– |Γ| actions per time step and T time steps
– Must evaluate |Γ|^T trajectories of actions
– Θ(|Γ|^T)

. DP:
– Compute F(xt,a) + Ψ(xt+1 = T(xt,a)) for |Γ| actions over T time steps
– Θ(|Γ|·T)

. Whereas brute force optimization is computationally intense, DP is fast

Variations

. Deterministic / random
– Next state and payoff could be random
– Example: there could be more users than expected; adjust server (action) to account for future trajectory of software

. Finite / infinite horizon
– Infinite horizon decision problems require discount factor β to give future payoffs at time t weight β^t
– Payoffs in far future matter less → β<1

. Discrete / continuous time

Example [Cormen et al.]

. Rod cutting problem
. Have rod of integer length n

. Have table of prices pi charged for length-i cuts
. Cutting is free
. Want to cut rod into parts (or not cut at all) to maximize profit

Example continued

. Length n=4

. Can charge prices p1=1, p2=5, p3=8, p4=9
. Could look at all possible sets of cuts (2^(n-1) = 8 configurations for n=4)

Example using DP

. Unrealistic to consider all cutting configurations for large n; use DP instead

. Basis: n=1, Ψ(1)=p1=1

. Recursion: n=2, Ψ(2)=max{2Ψ(1),p2}=5

. n=3, Ψ(3)=max{Ψ(1)+Ψ(2),p3}=max{5+1,8}=8

. At each stage, maximize over Ψ(k)+Ψ(n-k) for k=1,2,…,n-1; and for k=n use pn

Ex: Ψ(7)=max{Ψ(1)+Ψ(6),Ψ(2)+Ψ(5),Ψ(3)+Ψ(4),…,p7}
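A short bottom-up sketch of this recursion in Python, using the prices from the slide (not from Cormen et al.'s code):

```python
def rod_cut(prices):
    """Bottom-up DP for rod cutting.
    prices[i] = price p_{i+1} charged for a piece of length i+1."""
    n = len(prices)
    psi = [0] * (n + 1)                 # psi[k] = best revenue for length k
    for length in range(1, n + 1):
        best = prices[length - 1]       # sell the whole piece uncut, price p_length
        for k in range(1, length):      # or split into lengths k and length-k
            best = max(best, psi[k] + psi[length - k])
        psi[length] = best
    return psi[n]

print(rod_cut([1, 5, 8, 9]))            # n=4 example from the slides -> 10
```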

Real-world application

. Viterbi algorithm
. Decodes convolutional codes in CDMA
– Also used in speech recognition
– Text is “hidden” and (noisy) speech observations help estimate text

. Relies on DP
. Finds shortest path

Linear Programming

Keywords: linear programming, simplex method

Formulation

. Canonical form: max_x c^T x s.t. Ax ≤ b, x ≥ 0
– Note: s.t. = subject to

. Matrix manipulations/tricks create variations:
– Ax=b by enforcing ≤ and ≥
– We’ve minimized ||x||1 (instead of c^T x)
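A tiny numeric sketch of the canonical form and the equality trick, assuming SciPy's linprog (the data is made up; note linprog minimizes, so max c^T x becomes min −c^T x):

```python
import numpy as np
from scipy.optimize import linprog

# Canonical form: max c^T x  s.t.  A x <= b, x >= 0
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
b = np.array([4.0, 5.0])

# linprog minimizes, so negate c; bounds=(0, None) enforces x >= 0.
res = linprog(-c, A_ub=A, b_ub=b, bounds=(0, None))
print("x* =", res.x, "max value =", -res.fun)

# Equality trick: Ax = b can be written as Ax <= b and -Ax <= -b.
res_eq = linprog(-c, A_ub=np.vstack([A, -A]), b_ub=np.hstack([b, -b]),
                 bounds=(0, None))
print("x* with Ax = b:", res_eq.x)
```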

What’s it good for?

. Transportation - pose airline costs and revenue as linear model, maximize profit (revenue-costs) w/LP

. Manufacturing – minimize costs by posing them as linear model

. Common theme: many real-world problems are approximately linear, or can be linearized around working point (Taylor series)

History

. Early formulations date back to early 20th century (rudimentary forms even earlier)

. Dantzig invented the simplex method (solver) in the 1940s
– Polynomial average runtime; exponential worst case

. Interior point methods – much faster worst case

Simplex algorithm

. Linear constraints Ax ≤ b, x ≥ 0
– Correspond to a convex polytope
. Linear objective being optimized, c^T x
– Optimum attained at a corner point of the polytope
– Simplex = outer shell of convex polytope

. Start at some corner point (vertex)
. Examine neighboring vertices
. Either c^T x already optimal, or it’s better at a neighbor
. Move to best neighboring vertex; iterate until done
. Specific steps correspond to linear algebra

Convex Optimization

Keywords: convex optimization

What are convex/concave functions?

. Consider convex real-valued function f: 𝒳 → ℝ defined on space 𝒳

. Convex: f(λx+(1−λ)y) ≤ λf(x)+(1−λ)f(y), ∀x,y∈𝒳, λ∈(0,1)
. Concave: same inequality with ≥
. Note: f convex if and only if −f concave; for twice-differentiable f, convex/concave correspond to non-negative/non-positive second derivatives

. Any local optimum is global optimum
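A quick numeric sanity check of the convexity inequality, using f(x) = x² as an assumed example:

```python
import random

f = lambda x: x ** 2          # convex example function
for _ in range(10000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    lam = random.uniform(0, 1)
    # Convexity: f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
    assert f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-9
print("convexity inequality held on all random samples")
```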

What is convex optimization?

. Basic convex problem: x* = arg min_{x∈𝒳} f(x)
– Set 𝒳 and function f(x) must both be convex
. Alternate form: min f(x) s.t. g_i(x) ≤ 0, ∀i
– Functions f, g_1, …, g_m all convex

Applications (Why is this interesting?)

. Many problems can be posed as convex

. Least squares

. Entropy maximization

. Linear programming

Newton’s method

. Newton’s method finds roots of equations, f(x)=0
. Here, apply it to the derivative f′=0 or gradient ∇f=0

. Taylor expansion: f(x) = f(xt) + f′(xt)·Δx + ½ f′′(xt)·Δx^2 + … (where Δx = x − xt)
. Root of derivative: f′(xt) + f′′(xt)·Δx = 0, so Δx = −f′(xt)/f′′(xt)
. Iterate with xt+1 = xt + Δx

. Newton’s method is simple but O(1/t) convergence
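A minimal sketch of this iteration for a 1-D function, assuming the first and second derivatives are available in closed form (the test function is made up):

```python
def newton_minimize(fprime, fsecond, x0, iters=20):
    """Find a stationary point of a 1-D function by applying Newton's method
    to its derivative: x_{t+1} = x_t - f'(x_t) / f''(x_t)."""
    x = x0
    for _ in range(iters):
        x = x - fprime(x) / fsecond(x)
    return x

# Example: f(x) = x^4 - 3x^2 + x has f'(x) = 4x^3 - 6x + 1, f''(x) = 12x^2 - 6.
x_min = newton_minimize(lambda x: 4 * x ** 3 - 6 * x + 1,
                        lambda x: 12 * x ** 2 - 6,
                        x0=2.0)
print("stationary point near x =", x_min)
```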

Second order methods

. Challenge: first order approximation to derivative slows down Newton’s method
. Solution: use higher order approximation
– Instead of f′(xt) + f′′(xt)·Δx, use the third derivative too

. Multi-dimensional function? Use gradient, Hessians…

. Second order methods more complicated but faster

Gradient descent

Keywords: gradient descent, line search, golden section search

Gradient descent

. In each iteration, select direction to pursue
– Coordinate descent – move along one of the coordinates
– Gradient descent – move in the direction that decreases the cost function fastest (negative gradient)

. How far should we move along that direction?
. Undershooting or overshooting is bad for convergence

Line search

. Key sub-method is to move along direction just enough to minimize the function along that line

. Line search = optimization along line

. Many variations – binary search, golden section search

. Let’s make up an example for this and code it! – Check course webpage for Matlab script
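The course webpage has the Matlab script; below is an illustrative Python stand-in under my own toy assumptions: gradient descent on a 2-D quadratic, with a golden section line search picking the step size along the negative gradient.

```python
import numpy as np

def golden_section(phi, lo, hi, iters=50):
    """Minimize a 1-D function phi on [lo, hi] via golden section search."""
    gr = (np.sqrt(5) - 1) / 2                 # inverse golden ratio ~ 0.618
    a, b = lo, hi
    c, d = b - gr * (b - a), a + gr * (b - a)
    for _ in range(iters):
        if phi(c) < phi(d):                   # minimum lies in [a, d]
            b, d = d, c
            c = b - gr * (b - a)
        else:                                 # minimum lies in [c, b]
            a, c = c, d
            d = a + gr * (b - a)
    return (a + b) / 2

# Toy cost: f(x) = 0.5 x^T Q x - b^T x (convex quadratic, made up for illustration)
Q = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b

x = np.zeros(2)
for it in range(30):
    direction = -grad(x)                      # steepest descent direction
    step = golden_section(lambda s: f(x + s * direction), 0.0, 2.0)
    x = x + step * direction                  # move just enough along the line

print("minimizer ~", x, "  exact:", np.linalg.solve(Q, b))
```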

Integer Programming

Keywords: integer programming, integer linear programs, relaxation

What is integer programming?

. Integer program = optimization problem where some/all variables must be integers

. Integer linear programs (ILP): x̂ = arg max_x c^T x s.t. Ax + s = b, s ≥ 0, x ∈ ℤ^n
– Slack variables s
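A tiny numeric sketch, assuming SciPy ≥ 1.9 (which provides scipy.optimize.milp); the slack form Ax + s = b above is equivalent to the Ax ≤ b constraint passed to the solver, and the data is made up:

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Tiny ILP: max c^T x  s.t.  A x <= b, x >= 0, x integer.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 3.0],
              [3.0, 1.0]])
b = np.array([6.0, 6.0])

res = milp(c=-c,                                    # milp minimizes, so negate c
           constraints=LinearConstraint(A, ub=b),   # A x <= b
           integrality=np.ones(2),                  # all variables integer
           bounds=Bounds(lb=0))                     # x >= 0
print("x* =", res.x, "max value =", -res.fun)       # here x* = (0, 2), value 4
```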

Example

. Support set detection – y=Ax+z
– Sparse x
– Want to identify support set where x≠0

. Can we do “perfect” support set detection?
– Are there tiny non-zeros? (yes → difficult)
– What’s the SNR? (low → difficult)

Example continued

. Support set detection, y=Ax+z, want support set

. Algorithm (sketched in code below):
– Consider candidate support set s ∈ {0,1}^N
– Create matrix As containing column i of A iff si=1
– Run least squares using As (find low-energy solution to y=As x)
– Iterate over all s, select solution with smallest residual

. Algorithm is optimal & slow
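A minimal brute-force sketch of this exhaustive search; the problem sizes are toy assumptions (and the loop is restricted to supports of size ≤ K to keep it short):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 8, 10, 2                      # small N keeps the exhaustive loop feasible
A = rng.standard_normal((M, N))
x = np.zeros(N)
support_true = rng.choice(N, K, replace=False)
x[support_true] = rng.standard_normal(K)
y = A @ x + 0.01 * rng.standard_normal(M)   # y = Ax + z

best_support, best_residual = None, np.inf
for k in range(1, K + 1):               # iterate over candidate supports
    for support in itertools.combinations(range(N), k):
        A_s = A[:, list(support)]       # keep column i iff i is in the support
        coeffs, *_ = np.linalg.lstsq(A_s, y, rcond=None)   # least squares
        residual = np.linalg.norm(y - A_s @ coeffs)
        if residual < best_residual:
            best_support, best_residual = set(support), residual

print("true support:", sorted(support_true), " estimated:", sorted(best_support))
```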

More about ILP

. Integer linear programs can be shown to be NP-hard
– This means they are slow to solve exactly

. Another algorithmic approach – relaxation
– First ignore integer constraints, solve standard LP
– Next, “round” (sort of!) to nearby (not necessarily nearest) integer solution

. Various applications require integer solutions; we’re just skimming the surface

Non-Convex Optimization

Keywords: non-convex optimization

What’s the challenge?

. Many functions are non-convex
. Convex → one local min (it’s the global min)

. Non-convex – local min need not be global min
. Various algorithms could get stuck in a local min

Is it hopeless?

. Maybe initialize an algorithm many different ways; runs could get stuck in different local mins, choose the best
– But could be tons of local mins (especially in higher dimensions)

Markov chain Monte Carlo

. Markov chain Monte Carlo (MCMC) can solve some non-convex problems
. Form expression E(x) for energy (analogous to statistical physics)

. Distribution for signal: Pr(x) = (1/Z)·exp{−s·E(x)}
– s analogous to inverse temperature; Z is the normalization term
– Sample next version of x from this Gibbs distribution
– High temperature → small s → weak pull toward low energy
– Low temperature → large s → strong pull toward low energy
– Gradual cooling
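This gradual-cooling recipe is commonly implemented as simulated annealing; a minimal 1-D sketch with a made-up energy and cooling schedule:

```python
import math
import random

def energy(x):
    """Non-convex toy energy with several local minima (made up)."""
    return 0.1 * x ** 2 + math.sin(2 * x) + 1.5 * math.cos(3 * x)

random.seed(0)
x = 5.0                                   # arbitrary starting point
s = 0.1                                   # inverse temperature (starts "hot")
best_x, best_E = x, energy(x)
for step in range(20000):
    x_new = x + random.gauss(0, 0.5)      # propose a local move
    dE = energy(x_new) - energy(x)
    # Metropolis rule: always accept downhill, accept uphill with prob exp(-s*dE)
    if dE <= 0 or random.random() < math.exp(-s * dE):
        x = x_new
    if energy(x) < best_E:
        best_x, best_E = x, energy(x)
    s *= 1.0005                           # gradual cooling: slowly raise s
print("best x found:", best_x, "energy:", best_E)
```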

MCMC continued

1) Do we sample entire sequence x?

Not necessarily. Can consider re-generating one xi at a time; only need the conditional distribution of xi given the other entries

2) Is MCMC guaranteed to converge to global min? Maybe. If you cool down very slowly

3) So is it any good? Depends. MCMC is very slow but can converge to global min; some techniques to accelerate it

EM Algorithm

Keywords: expectation maximization algorithm, Gaussian mixture models, latent variables

Main ideas

. Iterate over expectation (E) & maximization (M)

. Expectation – create function for computing expected log likelihood (based on current parameters)

. Maximization – update parameters to maximize expected log likelihood from E step

. Details coming up

Statistical model & motivation

. Model generates data X
. Z – latent / missing values
. θ – parameter

. Likelihood: L(θ;X,Z)=Pr(X,Z|θ)

. Marginal likelihood: L(θ;X)=Pr(X|θ)=∫L(θ;X,Z)dZ – Might be intractable (e.g., due to many possible Z sequences)

. Want to compute L(θ;X), then optimize parameter – Computationally intractable – Motivates EM

Statistical model & motivation

. Expectation – compute expected value of log likelihood for parameter θ(t) in current iteration t

– Q(θ | θ(t)) = E_{Z|X,θ(t)} [ log L(θ; X, Z) ]
– Z typically discrete latent variables
– Given parameter θ(t), sequence Z can be found; typically via fast algorithm, e.g., dynamic programming

. Maximization: θ(t+1) = arg max_θ Q(θ | θ(t))

Example – Gaussian mixture models

. What’s a Gaussian mixture model (GMM)?
– X ~ Σi αi·𝒩(μi, σi)
– Component i has probability αi, mean μi, standard deviation σi
– Could be multi-dimensional data → covariance matrix Σi

. Useful? – Many distributions well-approximated by GMM – In principle can model almost everything as GMM – Trade-off between # components and model accuracy

Example continued

. Challenge – parameters (αi, μi, σi) often unavailable

. Must estimate from data X

. To keep it simple: N scalar samples X ∈ ℝ^N
. Latent variable Z ∈ ℤ^N; zn corresponds to the Gaussian component that xn belongs to
. E step: compute sequence Z given parameters θ = (αi, μi, σi)
. Optimize θ given Z
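A compact sketch of EM for a two-component scalar GMM; it uses soft responsibilities in the E step (rather than a hard Z sequence), and the data and initialization are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data from a 2-component scalar GMM (made-up parameters)
X = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])

# Initial guesses for theta = (alpha_i, mu_i, sigma_i)
alpha = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for it in range(100):
    # E step: responsibility of component i for each sample (posterior over Z)
    w = np.vstack([a * gauss(X, m, s) for a, m, s in zip(alpha, mu, sigma)])
    w /= w.sum(axis=0, keepdims=True)
    # M step: update theta to maximize the expected log likelihood
    Nk = w.sum(axis=1)
    alpha = Nk / len(X)
    mu = (w @ X) / Nk
    sigma = np.sqrt((w * (X - mu[:, None]) ** 2).sum(axis=1) / Nk)

print("alpha:", alpha, "mu:", mu, "sigma:", sigma)
```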
