Optimization
ECE 592 Topics in Data Science
Dror Baron, Associate Professor
Dept. of Electrical and Computer Engr.
North Carolina State University, NC, USA

Optimization
Keywords: linear programming, dynamic programming, convex optimization, non-convex optimization

What is Optimization?

What is optimization?
- Wikipedia: In mathematics, computer science and operations research, mathematical optimization (alternatively, mathematical programming or simply, optimization) is the selection of a best element (with regard to some criterion) from some set of available alternatives.

Application #1: classroom scheduling
- Real story: NCSU has classes on multiple campuses, in dozens of buildings, etc.
- We want a "good" schedule. What's good?
  – Availability of rooms
  – Proximity of classrooms to departments
  – Instructors have day/time preferences
  – Match sizes of rooms to anticipated class enrollment
  – Avoid conflicts between course pairs of interest to students

Application #2: ℓ1 recovery
- Among infinitely many solutions, seek the one with smallest ℓ1 norm (sum of absolute values)
- Related to compressed sensing recovery (later in course)
- Can express x = x_p - x_n with x_p, x_n non-negative, so that ||x||_1 = Σ_{i=1}^N (x_{p,i} + x_{n,i})
- min Σ_{i=1}^N (x_{p,i} + x_{n,i}) subject to (s.t.) y = Φx_p - Φx_n
  – Also need x_p, x_n to be non-negative

Application #3: reducing fuel consumption
- Suppose gas prices increase a lot
- A truck fleet company wants to save money by reducing fuel consumption
- Things are simple on flat highways
- Challenges:
  1) You see a hill; you can push the engine up the hill and coast down, or accelerate before the hill and then reduce speed while climbing
  2) You see a red light; should you coast, accelerate, or slam the brakes?
- Main point: dynamic behavior links past, present, and future

Application #4: process design in factories
- Consider a factory with a complicated process
  – Want to buy fewer inputs (chemicals)
  – Want to use less energy
  – Want the product to be produced quickly (time)
  – Want robustness to surprises (e.g., power shortages)
- Goal: tune the production process to minimize costs
  – "Costs" involve inputs, energy, time, robustness, ...
  – Known as multi-objective optimization

Dynamic Programming (DP)
Keywords: Bellman equations, dynamic programming

What is dynamic programming (DP)?
- Wikipedia: In mathematics, management science, economics, computer science, and bioinformatics, dynamic programming (also known as dynamic optimization) is a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions - ideally, using a memory-based data structure.

Resembles divide and conquer
- Have a large problem
- Partition it into parts
- The dynamic nature of the problem links past, present, and future
- Want the decision whose combined "costs" (current plus future) are best
- Whereas brute-force optimization is computationally intense, DP is fast

Problem setting
- t - time
- T - time horizon (maximal time)
- x_t - state at time t
- a ∈ Γ(x_t) - possible actions at state x_t
- T(x,a) - next state upon choosing action a
- F(x,a) - payoff from action a
- Want to maximize our payoff up to horizon T

Solution approach
- Basis case: t = T-1, one time step left for an action
- Maximize the payoff by maximizing F(x_t,a):
  a* = arg max_{a∈Γ(x_t)} F(x_t,a)
- At time T (end of problem) we arrive at state x_T = T(x_t,a*)
  – Don't care about the final state, only about the payoff

Solution continued
- Recursive case: t < T-1, multiple decisions left
- Let's keep it simple with t = T-2
- Based on the basis case, for each next state x_{T-1} we can calculate a* for the last decision (at the next time step, t = T-1)
- Want the optimal action to account for the current payoff and the payoff in the next step:
  a* = arg max_{a∈Γ(x_t)} {F(x_t,a) + next_payoff(T(x_t,a))}

Recursive solution
- Let's simplify the recursive case for t = T-2 using notation for optimal actions/payoffs at time t:
  – a*_t(x_t) - optimal action at time t given state x_t
  – Ψ_t(x_t) - optimal payoff starting from time t
- The basis case provides a*(x_{T-1}) and Ψ_{T-1}(x_{T-1}), ∀x_{T-1}
- Recursive case for t = T-2:
  a* = arg max_{a∈Γ(x_t)} {F(x_t,a) + next_payoff(T(x_t,a))}
     = arg max_{a∈Γ(x_t)} {F(x_t,a) + Ψ_{T-1}(x_{T-1} = T(x_t,a))}
- Repeat recursively for smaller t

Computationally efficient DP solution
- Instead of processing from t up to T, reverse the order:
  – t = T-1: compute a*_{T-1}(x_t), Ψ_{T-1}(x_t) for all possible x_t
  – t = T-2: a*_{T-2} = arg max_{a∈Γ(x_t)} {F(x_t,a) + Ψ_{T-1}(x_{T-1} = T(x_t,a))}
  – t = T-3: a*_{T-3} = arg max_{a∈Γ(x_t)} {F(x_t,a) + Ψ_{T-2}(x_{T-2} = T(x_t,a))}
- General case: Bellman's optimality equations
- At each time step, store the optimal actions and payoffs
- Lookup table (LUT) for Ψ instead of recomputing
- Can construct the sequence of optimal actions with the LUT

Why computationally efficient?
- Let's contrast computational complexities
- Brute-force optimization:
  – |Γ| actions per time step and T time steps
  – Must evaluate |Γ|^T trajectories of actions
  – Θ(|Γ|^T)
- DP:
  – Compute F(x_t,a) + Ψ_{t+1}(x_{t+1} = T(x_t,a)) for |Γ| actions at each of T time steps
  – Θ(|Γ|·T)
- Whereas brute-force optimization is computationally intense, DP is fast

Variations
- Deterministic / random
  – The next state and payoff could be random
  – Example: there could be more users than expected; adjust the server (action) to account for the future trajectory of the software
- Finite / infinite horizon
  – Infinite-horizon decision problems require a discount factor β that gives the payoff at future time t weight β^t
  – With β < 1, payoffs in the far future matter less
- Discrete / continuous time

Example [Cormen et al.]
- Rod cutting problem
- Have a rod of integer length n
- Have a table of prices p_i charged for length-i cuts
- Cutting is free
- Want to cut the rod into parts (or not cut at all) to maximize profit

Example continued
- Length n = 4
- Can charge prices p1 = 1, p2 = 5, p3 = 8, p4 = 9
- Could look at all possible sets of cuts (there are 2^(n-1) = 8 cutting configurations for n = 4)

Example using DP
- Unrealistic to consider all cutting configurations for large n; use DP instead
- Basis: n = 1, Ψ(1) = p1 = 1
- Recursion: n = 2, Ψ(2) = max{2Ψ(1), p2} = 5
- n = 3, Ψ(3) = max{Ψ(1)+Ψ(2), p3} = max{1+5, 8} = 8
- At each stage, maximize over Ψ(k)+Ψ(n-k) for k = 1,2,...,n-1; and for k = n use p_n
- Ex: Ψ(7) = max{Ψ(1)+Ψ(6), Ψ(2)+Ψ(5), Ψ(3)+Ψ(4), ..., p7}
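To make the lookup-table recursion concrete, here is a minimal bottom-up Python sketch of the rod-cutting example above. The function name and structure are illustrative (the course scripts are in Matlab); only the prices p1 = 1, p2 = 5, p3 = 8, p4 = 9 come from the slides.

    # Bottom-up DP for rod cutting: fill the lookup table psi from small
    # lengths up, so each subproblem is solved exactly once.
    def rod_cutting(prices, n):
        """Return the optimal payoff psi(n) for a rod of integer length n."""
        psi = [0] * (n + 1)            # lookup table: psi[k] = optimal payoff for length k
        for length in range(1, n + 1):
            best = prices[length]      # k = n case: sell the piece uncut at p_n
            for k in range(1, length): # otherwise maximize psi(k) + psi(length - k)
                best = max(best, psi[k] + psi[length - k])
            psi[length] = best
        return psi[n]

    prices = [0, 1, 5, 8, 9]           # index 0 unused; p1..p4 from the slides
    print(rod_cutting(prices, 4))      # 10: two length-2 pieces earn 5 + 5

Running it reproduces the slide values Ψ(1) = 1, Ψ(2) = 5, Ψ(3) = 8, and shows Ψ(4) = 10.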
Real-world application
- Viterbi algorithm
- Decodes convolutional codes in CDMA
  – Also used in speech recognition
  – The text is "hidden," and (noisy) speech observations help estimate the text
- Relies on DP
- Finds a shortest path

Linear Programming
Keywords: linear programming, simplex method

Formulation
- Canonical form:
  max c^T x s.t. Ax ≤ b, x ≥ 0
  – Note: s.t. = subject to
- Matrix manipulations/tricks create variations:
  – Ax = b by enforcing Ax ≤ b and Ax ≥ b
  – We've minimized ||x||_1 (instead of maximizing c^T x) in ℓ1 recovery

What's it good for?
- Transportation: pose airline costs and revenue as a linear model, maximize profit (revenue - costs) with LP
- Manufacturing: minimize costs by posing them as a linear model
- Common theme: many real-world problems are approximately linear, or can be linearized around a working point (Taylor series)

History
- Early formulations date back to the early 20th century (rudimentary forms even earlier)
- Dantzig invented the simplex method (solver) in the 1940s
  – Polynomial average runtime; slow worst case
- Interior point methods: much faster worst case

Simplex algorithm
- Linear constraints Ax ≤ b, x ≥ 0
  – Correspond to a convex polytope
- Linear function being optimized, c^T x
  – The optimum is attained at a corner point of the polytope
  – Simplex = outer shell of the convex polytope
- Start at some corner point (vertex)
- Examine neighboring vertices
- Either c^T x is already optimal, or it is better at a neighbor
- Move to the best neighboring vertex; iterate until done
- The specific steps correspond to linear algebra
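As a worked tie-in between Application #2 and the canonical LP form above, here is a sketch that poses ℓ1 recovery as an LP via the split x = x_p - x_n and solves it with SciPy's linprog. SciPy is an assumed dependency, and Phi, y, and the dimensions are made-up toy data, not course material.

    # l1 recovery as a linear program: minimize sum(x_p) + sum(x_n)
    # subject to Phi x_p - Phi x_n = y, with x_p, x_n >= 0.
    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    M, N = 10, 20                          # underdetermined: y = Phi x has many solutions
    Phi = rng.standard_normal((M, N))
    x_true = np.zeros(N)
    x_true[[2, 7, 11]] = [1.5, -2.0, 0.7]  # sparse signal to recover
    y = Phi @ x_true

    # Stack the variables as z = [x_p; x_n]; linprog's default bounds keep z >= 0.
    c = np.ones(2 * N)                     # objective: sum(x_p) + sum(x_n) = ||x||_1
    A_eq = np.hstack([Phi, -Phi])          # constraint: Phi x_p - Phi x_n = y
    res = linprog(c, A_eq=A_eq, b_eq=y)

    x_hat = res.x[:N] - res.x[N:]          # recombine x = x_p - x_n
    print(np.round(x_hat, 3))              # often close to x_true when M is large enough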
Convex Optimization
Keywords: convex optimization

What are convex/concave functions?
- Consider a real-valued function f defined on a convex set X, f: X → ℝ
- Convex: f(λx + (1-λ)y) ≤ λf(x) + (1-λ)f(y), ∀x,y ∈ X, λ ∈ (0,1)
- Concave: f(λx + (1-λ)y) ≥ λf(x) + (1-λ)f(y), ∀x,y ∈ X, λ ∈ (0,1)
- Note: f is convex if and only if -f is concave; convex/concave correspond to non-negative/non-positive second derivatives
- Any local optimum is a global optimum

What is convex optimization?
- Basic convex problem: x* = arg min_{x∈X} f(x)
  – The set X and the function f(x) must both be convex
- Alternate form: min f(x) s.t. g_i(x) ≤ 0 ∀i
  – Functions f, g_1, ..., g_m all convex

Applications (Why is this interesting?)
- Many problems can be posed as convex
- Least squares
- Entropy maximization
- Linear programming

Newton's method
- Newton's method finds roots of equations, f(x) = 0
- Here we instead seek a root of the derivative, f'(x) = 0, or of the gradient, ∇f = 0
- Taylor expansion: f_T(x) = f(x_t) + f'(x_t)Δ + (1/2)f''(x_t)Δ² + ...
- Root of the derivative: f'(x_t) + f''(x_t)Δ = 0, giving Δ = -f'(x_t)/f''(x_t)
- Iterate with x_{t+1} = x_t + Δ
- Newton's method is simple but has O(1/t) convergence

Second order methods
- Challenge: the first-order approximation to the derivative slows down Newton's method
- Solution: use a higher-order approximation
  – Instead of f'(x_t) + f''(x_t)Δ, use the third derivative too
- Multi-dimensional function? Use gradients, Hessians, ...
- Second-order methods are more complicated but faster

Gradient descent
Keywords: gradient descent, line search, golden section search

Gradient descent
- In each iteration, select a direction to pursue
  – Coordinate descent: move along one of the coordinates
  – Gradient descent: move in the direction that decreases the cost function fastest
- How far should we move along that direction?
- Undershooting or overshooting is bad for convergence

Line search
- The key sub-method is to move along the chosen direction just enough to minimize the function along that line
- Line search = optimization along a line
- Many variations: binary search, golden section search
- Let's make up an example for this and code it! (a sketch appears at the end of this section)
  – Check the course webpage for a Matlab script

Integer Programming
Keywords: integer programming, integer linear programs, relaxation

What is integer programming?
- Integer program = optimization problem where some/all variables must be integers
- Integer linear programs (ILP):
  x* = arg max c^T x s.t. Ax + s = b, s ≥ 0, x ∈ ℤ^N
  – Slack variables s turn the inequality constraints into equalities

Example
- Support set detection
  – y = Ax + z
  – Sparse x
  – Want to identify the support set where x ≠ 0
- Can we do "perfect" support set detection?
  – Are there tiny non-zeros? (if yes, difficult)
  – What's the SNR? (if low, difficult)

Example continued
- Support set detection, y = Ax + z; want the support set
- Algorithm:
  – Consider a candidate support set, s ∈ {0,1}^N
  – Create a matrix A_s that contains column i iff s_i = 1
  – Run least squares using A_s (find the low-energy solution to y = A_s x)
  – Iterate over all s; select the solution with the smallest residual
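A minimal Python sketch of the exhaustive search just described, assuming NumPy. Restricting to supports of a known size k is an added assumption: with noise, the full support would otherwise trivially minimize the residual. All names and data below are illustrative.

    # Brute-force support set detection: enumerate size-k supports, run
    # least squares on the selected columns, keep the smallest residual.
    import itertools
    import numpy as np

    def detect_support(y, A, k):
        """Try every size-k support; return the one with the smallest residual."""
        N = A.shape[1]
        best_idx, best_resid = None, np.inf
        for idx in itertools.combinations(range(N), k):
            A_s = A[:, list(idx)]                      # keep column i iff s_i = 1
            x_s, *_ = np.linalg.lstsq(A_s, y, rcond=None)
            resid = np.linalg.norm(y - A_s @ x_s)      # least-squares residual
            if resid < best_resid:
                best_idx, best_resid = idx, resid
        return best_idx, best_resid

    # Toy usage with made-up data; N is kept tiny since the search is exponential in N.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 8))
    x = np.zeros(8)
    x[[1, 4]] = [1.0, -2.0]
    y = A @ x + 0.01 * rng.standard_normal(6)
    print(detect_support(y, A, k=2))                   # expect support (1, 4)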
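Returning to the line search slide above, which invites coding an example: here is a minimal golden-section search sketch in Python, an illustrative stand-in for the Matlab script on the course webpage. It assumes f is unimodal on [a, b].

    # Golden-section search: shrink the bracket [a, b] by the golden ratio
    # each iteration until the minimizer is located to within tol.
    import math

    def golden_section_search(f, a, b, tol=1e-8):
        """Return an approximate minimizer of a unimodal f on [a, b]."""
        invphi = (math.sqrt(5.0) - 1.0) / 2.0   # 1/phi ~ 0.618
        while b - a > tol:
            c = b - invphi * (b - a)            # interior probe points: a < c < d < b
            d = a + invphi * (b - a)
            if f(c) < f(d):                     # minimizer lies in [a, d]
                b = d
            else:                               # minimizer lies in [c, b]
                a = c
        return (a + b) / 2.0

    # Toy usage: minimize a cost function along one line (here a 1-D quadratic).
    print(golden_section_search(lambda t: (t - 1.3) ** 2, 0.0, 4.0))   # ~1.3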