Risk-Averse Decision Making and Control
Marek Petrik, University of New Hampshire
Mohammad Ghavamzadeh, Adobe Research
February 4, 2017

Outline
Introduction to Risk Averse Modeling
(Average) Value at Risk
Coherent Measures of Risk
Risk Measures in Reinforcement Learning
Time consistency in reinforcement learning
Summary
Schedule

9:00–9:20   Introduction to risk-averse modeling
9:20–9:40   Value at Risk and Average Value at Risk
9:40–9:50   Break
9:50–10:30  Coherent Measures of Risk: Properties and methods
10:30–11:00 Coffee break
11:00–12:30 Risk-averse reinforcement learning
12:30–12:40 Break
12:40–12:55 Time consistent measures of risk
Risk Aversion

Risk (Wikipedia): Risk is the potential of gaining or losing something of value. ... Uncertainty is a potential, unpredictable, and uncontrollable outcome; risk is a consequence of action taken in spite of uncertainty.

Risk aversion (Wikipedia): ... risk aversion is the behavior of humans, when exposed to uncertainty, to attempt to reduce that uncertainty. ...

Tutorial: Modern methods for risk-averse decision making
Desire for Risk Aversion

- Empirical evidence:
  1. People buy insurance
  2. Diversifying financial portfolios
  3. Experimental results
- Other reasons:
  - Reduce contingency planning
Where Risk Aversion Matters

- Financial portfolios
- Health-care decisions
- Agriculture
- Public infrastructure
- Self-driving cars?
When Risks Are Ignored ...

- Housing bubble leads to a financial collapse
- Seawalls overflow in a tsunami
Need to Quantify Risk

- Mitigating risk is expensive; how much is it worth?
- Expected utility theory: E[u(X)] = E[utility(X)]
- Exponential utility function (Bernoulli functions):

  u(x) = (1 − e^(−a·x)) / a

[Figure: utility function u(x) over rewards $0 to $10 000]
Example: Buying Car Insurance

Car value: $10 000

Insurance options:

Option | Deductible | Cost
X1     | $10 000    | $0
X2     | $2 000     | $112
X3     | $100       | $322

Expected utility:

Event          | P    | X1       | X2       | X3
No accident    | 92%  | $0       | −$112    | −$322
Minor accident | 7.5% | −$2 500  | −$2 112  | −$422
Major accident | 0.5% | −$10 000 | −$2 112  | −$422
E              |      | −$237.50 | −$272.00 | −$330.00

Risk-neutral choice: no insurance
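The expected-value row of the table can be checked directly; a minimal sketch in Python (the probabilities and payoffs are taken from the table above, the helper name is my own):

```python
# Payoffs (rewards in $) for each insurance option under each event.
probs = [0.92, 0.075, 0.005]            # no / minor / major accident
X1 = [0.0, -2500.0, -10000.0]           # no insurance
X2 = [-112.0, -2112.0, -2112.0]         # high deductible
X3 = [-322.0, -422.0, -422.0]           # low deductible

def expectation(xs, ps):
    """Risk-neutral value E[X] of a discrete random variable."""
    return sum(x * p for x, p in zip(xs, ps))

for name, X in [("X1", X1), ("X2", X2), ("X3", X3)]:
    print(name, expectation(X, probs))   # -237.5, -272.0, -330.0
```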
Risk Averse Utility Functions

- Exponential utility function:

  u(x) = (1 − exp(−10^(−6) · (x + 10^5))) / 10^(−6)

- X1: no insurance
- X2: high deductible insurance

Event          | P    | X1       | u(X1) | X2       | u(X2)
No accident    | 92%  | $0       | 1 111 | −$112    | 1 111
Minor accident | 7.5% | −$2 500  | 1 109 | −$2 112  | 1 110
Major accident | 0.5% | −$10 000 | 0     | −$2 112  | 1 110
E              |      | −$237.50 | 1 105 | −$272.00 | 1 111

Prefer insurance, but difficult to interpret and elicit
Drawbacks of Expected Utility Theory (Schoemaker 1980)

1. Does not explain human behavior
2. Difficult to elicit utilities
3. Complicates optimization (Friedman et al. 2014)
Major Alternatives for Measuring Risk

1. Markowitz portfolios: penalize dispersion risk

   min_{c ≥ 0}  Var[ Σ_i c_i · X_i ]
   s.t.  E[ Σ_i c_i · X_i ] = μ,   Σ_i c_i = 1

   Limited modeling capability, and also penalizes upside

2. Risk measures (Artzner et al. 1999):
   - Value at risk (V@R)
   - Conditional value at risk (CV@R)
   - Coherent measures of risk
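The Markowitz objective in item 1 can be illustrated with a toy grid search; the two-asset scenario returns below are invented for illustration, and the target-mean constraint is dropped for brevity, so this finds the pure minimum-variance portfolio:

```python
# Two hypothetical assets described by equally likely joint return scenarios.
scenarios = [(0.10, -0.02), (0.03, 0.04), (-0.05, 0.06), (0.08, 0.00)]
p = 1.0 / len(scenarios)

def portfolio_var(c):
    """Variance of the portfolio return c*X1 + (1 - c)*X2 under the scenarios."""
    rets = [c * a + (1.0 - c) * b for a, b in scenarios]
    mean = sum(rets) * p
    return sum((r - mean) ** 2 for r in rets) * p

# Grid search over the single portfolio weight c in [0, 1].
best_c = min((i / 1000.0 for i in range(1001)), key=portfolio_var)
print(best_c, portfolio_var(best_c))
```

Diversifying (an interior `best_c`) gives lower variance than holding either asset alone, which is exactly the dispersion penalty the formulation minimizes.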
Coherent Measures of Risk

Topic of this tutorial

- Alternative to expected utility theory
  + Flexible modeling framework
  + Convenient to use with optimization and decision making
  + Easier to elicit than utilities
  − Difficulties in sequential decision making
(Average) Value at Risk
Risk Measure

Risk measure: a function ρ that maps a random variable to a real number.

- Expectation is a risk measure (risk neutral):

  ρ(X) = E[X] = Σ_{ω ∈ Ω} X(ω) · P(ω)

- Worst case is a risk measure (very risk averse):

  ρ(X) = min[X] = min_{ω ∈ Ω} X(ω)
V@R: Value at Risk

ρ(X) = V@R_α(X) = sup { t : P[X ≤ t] < α }

Rewards are smaller than V@R_α(X) with probability at most α.

Example α values:

α = 0.5    Median
α = 0.3    More conservative
α = 0.05   Conservative
α = 0      Worst case
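For a discrete distribution the sup can be evaluated by scanning the sorted outcomes; a minimal sketch (the scan-based helper is my own implementation of the definition above; the three-outcome distribution is the no-insurance example used later):

```python
def value_at_risk(xs, ps, alpha):
    """V@R_a(X) = sup{t : P[X <= t] < a} for a discrete distribution:
    the smallest outcome whose cumulative probability reaches alpha."""
    cum = 0.0
    for x, p in sorted(zip(xs, ps)):
        cum += p
        if cum >= alpha - 1e-12:
            return x
    return max(xs)

xs = [0.0, -2500.0, -10000.0]   # rewards
ps = [0.92, 0.075, 0.005]       # probabilities

print(value_at_risk(xs, ps, 0.25))   # 0.0
print(value_at_risk(xs, ps, 0.05))   # -2500.0
print(value_at_risk(xs, ps, 0.0))    # -10000.0 (worst case)
```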
V@R Example 1: Cumulative Distribution Function

[email protected](X) = −1.7

[Figure: CDF of X over rewards −$6 to $6]
V@R Example 2: Cumulative Distribution Function

[email protected](X) = −0.5

[Figure: CDF of X over rewards −$6 to $6]
Car Insurance And V@R: 25%

Event          | P    | X1
No accident    | 92%  | $0
Minor accident | 7.5% | −$2 500
Major accident | 0.5% | −$10 000

[Figure: CDF of X1 over rewards −$15 000 to $5 000]

V@R_α(X) = sup { t : P[X ≤ t] < α },   α = 0.25

t       | P[X ≤ t] | α
−$2 600 | 0.005    | 0.25
−$2 500 | 0.08     | 0.25
$0      | 1.000    | 0.25
Car Insurance And V@R: 8%

Event          | P    | X1
No accident    | 92%  | $0
Minor accident | 7.5% | −$2 500
Major accident | 0.5% | −$10 000

[Figure: CDF of X1 over rewards −$15 000 to $5 000]

V@R_α(X) = sup { t : P[X ≤ t] < α },   α = 0.08

t       | P[X ≤ t] | α
−$2 600 | 0.005    | 0.08
−$2 500 | 0.08     | 0.08
Car Insurance And V@R

- X1: no insurance (high risk)
- X2: high deductible insurance (medium risk)
- X3: low deductible insurance (low risk)

Event          | P    | X1       | X2      | X3
No accident    | 92%  | $0       | −$112   | −$322
Minor accident | 7.5% | −$2 500  | −$2 112 | −$422
Major accident | 0.5% | −$10 000 | −$2 112 | −$422
E              |      | −$238    | −$272   | −$330
[email protected]       |      | $0       | −$112   | −$322
[email protected]      |      | −$2 500  | −$2 112 | −$422
Properties of V@R

+ Preserves affine transformations:

  V@R_α(τ·X + c) = τ·V@R_α(X) + c

+ Simple and intuitive to model and understand
+ Compelling meaning in finance
− Ignores heavy tails
− Not convex

Coherent measures of risk: preserve the V@R positives and improve on the negatives (Artzner et al. 1999)
Average Value at Risk

- AKA Conditional Value at Risk and Expected Shortfall
- Popular coherent risk measure ρ
- Simple definition for atomless distributions:

  CV@R_α(X) = E[ X | X ≤ V@R_α(X) ]

- Recall: V@R_α(X) = sup { t : P[X ≤ t] < α }
- Convex extension of V@R (Rockafellar and Uryasev 2000)
V@R vs CV@R: Cumulative Distribution Function

[email protected](X) = −0.5       [email protected](X) = −1.1

[Figure: two CDF plots of X, over rewards −$6 to $6 and −$6 to −$1]
CV@R vs V@R: Heavy Tails

A more expensive car?

Event          | P    | X1       | X1 (expensive car)
No accident    | 92%  | $0       | $0
Minor accident | 7.5% | −$2 500  | −$2 500
Major accident | 0.5% | −$10 000 | −$1 000 000
[email protected]      |      | −$2 500  | −$2 500
[email protected]     |      | −$3 250  | −$102 250
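The tail sensitivity can be reproduced numerically; a sketch using the sup-based coherent CV@R formula (defined later in the tutorial), with helper names of my own:

```python
def value_at_risk(xs, ps, alpha):
    """Smallest outcome whose cumulative probability reaches alpha."""
    cum = 0.0
    for x, p in sorted(zip(xs, ps)):
        cum += p
        if cum >= alpha - 1e-12:
            return x
    return max(xs)

def cvar(xs, ps, alpha):
    """CV@R_a(X) = sup_t {t + E[(X - t)^-]/a}; the sup is attained at an atom."""
    obj = lambda t: t + sum(p * min(x - t, 0.0) for x, p in zip(xs, ps)) / alpha
    return max(obj(t) for t in xs)

ps = [0.92, 0.075, 0.005]
cheap = [0.0, -2500.0, -10000.0]       # major accident costs $10 000
pricey = [0.0, -2500.0, -1000000.0]    # major accident costs $1 000 000

# V@R is blind to how bad the 0.5% tail is; CV@R is not.
print(value_at_risk(cheap, ps, 0.05), value_at_risk(pricey, ps, 0.05))  # -2500.0 -2500.0
print(round(cvar(cheap, ps, 0.05)), round(cvar(pricey, ps, 0.05)))      # -3250 -102250
```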
V@R: Heavy Tails and Financial Crisis
CV@R vs V@R: Continuity

[Figure: V@R and AV@R (in $, from $0 down to −$10 000) as functions of the risk level α ∈ [0, 1]; AV@R varies continuously in α while V@R jumps]
Coherent Measures of Risk
Coherent Measures of Risk

- Generalize CV@R to allow more general models
- Framework introduced in (Artzner et al. 1999)
- Coherence: requirements for a risk measure ρ to satisfy
- Our treatment is based on (Shapiro, Dentcheva, and Ruszczynski 2009) and (Föllmer and Schied 2011)
Coherence Requirements of Risk Measures

1. Convexity (really concavity for maximization!):

   ρ(t·X + (1 − t)·Y) ≥ t·ρ(X) + (1 − t)·ρ(Y)

2. Monotonicity:

   If X ≥ Y, then ρ(X) ≥ ρ(Y)

3. Translation equivariance: for a constant a:

   ρ(X + a) = ρ(X) + a

4. Positive homogeneity: for t > 0:

   ρ(t·X) = t·ρ(X)
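The four requirements are easy to spot-check numerically for a concrete measure; a small sketch testing CV@R (via the sup-based formula from this tutorial) on the insurance payoffs:

```python
def cvar(xs, ps, alpha):
    """Coherent CV@R for a discrete distribution (sup attained at an atom)."""
    obj = lambda t: t + sum(p * min(x - t, 0.0) for x, p in zip(xs, ps)) / alpha
    return max(obj(t) for t in xs)

ps = [0.92, 0.075, 0.005]
X = [0.0, -2500.0, -10000.0]      # no insurance
Y = [-112.0, -2112.0, -2112.0]    # high deductible insurance
a = 0.25

# 1. Convexity (concavity for rewards): mixing cannot increase risk.
mix = [0.5 * x + 0.5 * y for x, y in zip(X, Y)]
assert cvar(mix, ps, a) >= 0.5 * cvar(X, ps, a) + 0.5 * cvar(Y, ps, a) - 1e-9
# 2. Monotonicity: pointwise-better rewards get a better risk value.
assert cvar([x + 10.0 for x in X], ps, a) >= cvar(X, ps, a)
# 3. Translation equivariance: adding a constant shifts the risk value.
assert abs(cvar([x + 100.0 for x in X], ps, a) - (cvar(X, ps, a) + 100.0)) < 1e-9
# 4. Positive homogeneity: rescaling rewards rescales the risk value.
assert abs(cvar([2.0 * x for x in X], ps, a) - 2.0 * cvar(X, ps, a)) < 1e-9
print("all four coherence requirements hold on this example")
```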
Convexity

Why: diversification should decrease risk (and it helps with optimization)

ρ(t·X + (1 − t)·Y) ≥ t·ρ(X) + (1 − t)·ρ(Y)

Event          | P    | X1       | X2      | ½·X1 + ½·X2
No accident    | 92%  | $0       | −$112   | −$56
Minor accident | 7.5% | −$2 500  | −$2 112 | −$2 306
Major accident | 0.5% | −$10 000 | −$2 112 | −$6 056
CV@R           |      | −$238    | −$272   | −$240

−$240 ≥ (−238 − 272)/2 = −$255
Monotonicity

Why: do not prefer an outcome that is always worse

If X ≥ Y, then ρ(X) ≥ ρ(Y)

X2′: insurance with a deductible of $10 000

Event          | P    | X1       | X2′
No accident    | 92%  | $0       | −$112
Minor accident | 7.5% | −$2 500  | −$2 500
Major accident | 0.5% | −$10 000 | −$10 000
ρ              |      | −$238    | −$320

−$320 ≤ −$238
Translation equivariance

Why: risk is measured in the same units as the reward

ρ(X + a) = ρ(X) + a

More expensive insurance, by $100:

Event          | P    | X2      | X2′
No accident    | 92%  | −$112   | −$212
Minor accident | 7.5% | −$2 112 | −$2 212
Major accident | 0.5% | −$2 112 | −$2 212
ρ              |      | −$272   | −$372

−$372 = −$272 − $100
Positive homogeneity

Why: risk is measured in the same units as the reward

ρ(t·X) = t·ρ(X)

What if the prices are in €: $1 = €0.94

Event          | P    | X2      | X2 (€)
No accident    | 92%  | −$112   | −€105
Minor accident | 7.5% | −$2 112 | −€1 985
Major accident | 0.5% | −$2 112 | −€1 985
ρ              |      | −$272   | −€256

−€256 = 0.94 · (−$272)
Convex Risk Measures

A weaker definition than coherent risk measures:

1. Convexity:

   ρ(t·X + (1 − t)·Y) ≥ t·ρ(X) + (1 − t)·ρ(Y)

2. Monotonicity:

   If X ≥ Y, then ρ(X) ≥ ρ(Y)

3. Translation equivariance: for a constant a:

   ρ(X + a) = ρ(X) + a

4. Positive homogeneity: dropped
Additional Property: Law Invariance

The value of a risk measure is independent of the names of the events.

Consider a coin flip:

Event | P   | X | Y
Heads | 1/2 | 1 | 0
Tails | 1/2 | 0 | 1

Require that ρ(X) = ρ(Y); violated by some coherent risk measures.

Distortion risk measures: coherence & law invariance & comonotonicity
Simple Coherent Measures of Risk

- Expectation:

  ρ(X) = E[X] = Σ_{ω ∈ Ω} X(ω) · P(ω)

  1. Convexity: E[X] is linear
  2. Monotonicity: E[X] ≥ E[Y] if X ≥ Y
  3. Translation equivariance: E[X + a] = E[X] + a
  4. Positive homogeneity: E[t·X] = t·E[X] for t > 0

- Worst case:

  ρ(X) = min[X] = min_{ω ∈ Ω} X(ω)

  1. Convexity: min[X] is convex
  2. Monotonicity: min[X] ≥ min[Y] if X ≥ Y
  3. Translation equivariance: min[X + a] = min[X] + a
  4. Positive homogeneity: min[t·X] = t·min[X] for t > 0
CV@R for Discrete Distributions

- The simple definition is not coherent:

  CV@R_α(X) = E[ X | X ≤ V@R_α(X) ]

- It violates convexity when the distribution has atoms (discrete distributions)
- Coherent definition of CV@R:

  CV@R_α(X) = sup_t { t + (1/α)·E[(X − t)⁻] }

- t⋆ = V@R_α(X) when the distribution is atomless
- The two definitions coincide for continuous distributions
Computing CV@R

- Discrete distributions: solve a linear program

  max_{t, y}  t + (1/α)·pᵀy
  s.t.  y ≤ X − t,
        y ≤ 0

- Continuous distributions: closed form for many distributions (Nadarajah, Zhang, and Chan 2014; Andreev, Kanto, and Malo 2005)
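For small discrete distributions the linear program can also be solved by a direct scan: for fixed t the optimal y is y_i = min(x_i − t, 0), and the optimum over t is attained at one of the outcomes. A hand-rolled sketch (no LP solver; helper name is my own):

```python
def cvar_lp_by_scan(xs, ps, alpha):
    """Maximize t + p^T y / alpha subject to y <= X - t, y <= 0.
    For fixed t the best feasible y is y_i = min(x_i - t, 0); the
    objective is piecewise linear and concave in t with kinks at the
    outcomes, so scanning t over the outcomes finds the optimum."""
    best = float("-inf")
    for t in xs:
        y = [min(x - t, 0.0) for x in xs]
        best = max(best, t + sum(p * yi for p, yi in zip(ps, y)) / alpha)
    return best

ps = [0.92, 0.075, 0.005]
X1 = [0.0, -2500.0, -10000.0]   # no-insurance payoffs from the running example
print(round(cvar_lp_by_scan(X1, ps, 0.25)))   # -950
print(round(cvar_lp_by_scan(X1, ps, 0.05)))   # -3250
```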
Car Insurance and CV@R

- X1: no insurance
- X2: high deductible insurance
- X3: low deductible insurance

Event          | P    | X1       | X2      | X3
No accident    | 92%  | $0       | −$112   | −$322
Minor accident | 7.5% | −$2 500  | −$2 112 | −$422
Major accident | 0.5% | −$10 000 | −$2 112 | −$422
E              |      | −$238    | −$272   | −$330
[email protected]       |      | $0       | −$112   | −$322
[email protected]      |      | −$950    | −$752   | −$354
[email protected]      |      | −$2 500  | −$2 112 | −$422
[email protected]     |      | −$3 250  | −$2 112 | −$422
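The CV@R rows of this table can be regenerated with the sup formula; a sketch (helper name is my own, payoffs are the table's):

```python
def cvar(xs, ps, alpha):
    """Coherent CV@R: sup_t { t + E[(X - t)^-]/alpha }, scanned over atoms."""
    obj = lambda t: t + sum(p * min(x - t, 0.0) for x, p in zip(xs, ps)) / alpha
    return max(obj(t) for t in xs)

ps = [0.92, 0.075, 0.005]
options = {"X1": [0.0, -2500.0, -10000.0],
           "X2": [-112.0, -2112.0, -2112.0],
           "X3": [-322.0, -422.0, -422.0]}

for alpha in (0.25, 0.05):
    # Rounded values match the [email protected] and [email protected] rows above.
    print(alpha, {k: round(cvar(v, ps, alpha)) for k, v in options.items()})
```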
Robust Representation of Coherent Risk Measures

- Important representation for analysis and optimization
- For any coherent risk measure ρ:

  ρ(X) = min_{ξ ∈ A} E_ξ[X] = inf_{ξ ∈ A} ξᵀX

- A is a set of measures that is:
  1. convex
  2. bounded
  3. closed
- Proof: double convex conjugate
  - Convex conjugate: ρ⋆(y) = sup_x ( xᵀy − ρ(x) )
  - Fenchel–Moreau theorem: ρ⋆⋆(x) = ρ(x)
Robust Set for CV@R

CV@R_α(X) = sup_t { t + (1/α)·E[(X − t)⁻] }

- Robust representation: ρ(X) = inf_{ξ ∈ A} E_ξ[X]
- Robust set for a probability distribution P:

  A = { ξ ≥ 0 : ξ ≤ (1/α)·P,  1ᵀξ = 1 }
Robust Set for CV@R

- Robust representation:

  ρ(X) = min_{ξ ∈ A} E_ξ[X],   A = { ξ ≥ 0 : ξ ≤ (1/α)·P,  1ᵀξ = 1 }

- Random variable: X = [10, 5, 2]
- Probability distribution: p = [1/3, 1/3, 1/3]
- CV@R_{1/2}(X) =

  min_{ξ ≥ 0}  10·ξ1 + 5·ξ2 + 2·ξ3
  s.t.  ξ_i ≤ (1/α)·p_i = (1/(1/2))·(1/3) = 2/3,   ξ1 + ξ2 + ξ3 = 1
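This little linear program can be solved greedily: push the maximum allowed weight p_i/α onto the worst outcomes until the weights sum to one. A sketch (helper name is my own):

```python
def cvar_robust(xs, ps, alpha):
    """min { xi^T X : 0 <= xi <= p/alpha, sum(xi) = 1 }: the adversary
    loads the worst (smallest) rewards first, up to the cap p_i/alpha."""
    remaining, total = 1.0, 0.0
    for x, p in sorted(zip(xs, ps)):     # worst rewards first
        w = min(p / alpha, remaining)    # cap from the robust set A
        total += w * x
        remaining -= w
        if remaining <= 1e-12:
            break
    return total

# xi = [0, 1/3, 2/3] on outcomes [10, 5, 2] gives 5/3 + 4/3 = 3
print(round(cvar_robust([10.0, 5.0, 2.0], [1/3, 1/3, 1/3], 0.5), 9))   # 3.0
```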
Other Coherent Risk Measures

1. Combination of expectation and CV@R
2. Entropic risk measure (convex, but not coherent)
3. Coherent entropic risk measure
4. Risk measures from utility functions
5. ...
Convex Combination of Expectation and CV@R

- CV@R ignores the mean return
- Risk-averse solutions can be bad in expectation
- Practical trade-off: combine mean and risk

  ρ(X) = c·E[X] + (1 − c)·CV@R_α(X)
Entropic Risk Measure

ρ(X) = −(1/τ) · ln E[ e^(−τ·X) ],   τ > 0

- Convex risk measure
- Not coherent (violates positive homogeneity)
- No robust representation
- Coherent entropic risk measure (Föllmer and Knispel 2011):

  ρ(X) = min_{ξ ≥ 0} { E_ξ[X] : KL(ξ ‖ P) ≤ c,  1ᵀξ = 1 }
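The entropic risk measure is a one-liner to evaluate; a sketch checking that it stays between the worst case and the mean, and that doubling the rewards does not double the risk value (the coherence failure noted above; the rewards are the insurance example rescaled to $1000s to avoid overflow):

```python
import math

def entropic(xs, ps, tau):
    """rho(X) = -(1/tau) * ln E[exp(-tau * X)]."""
    return -math.log(sum(p * math.exp(-tau * x) for x, p in zip(xs, ps))) / tau

xs = [0.0, -2.5, -10.0]     # rewards in $1000s
ps = [0.92, 0.075, 0.005]
mean = sum(p * x for x, p in zip(xs, ps))

for tau in (0.001, 0.1, 1.0, 5.0):
    r = entropic(xs, ps, tau)
    assert min(xs) - 1e-9 <= r <= mean + 1e-9   # between worst case and mean
    print(tau, r)                               # decreases as tau grows

# Not positively homogeneous: scaling rewards by 2 does not scale rho by 2.
assert abs(entropic([2 * x for x in xs], ps, 1.0) - 2 * entropic(xs, ps, 1.0)) > 1e-6
```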
Risk Measure From Utility Function

- Concave utility function u(·)
- Can we construct a coherent risk measure from u?
- Direct construction:

  ρ(X) = E[u(X)]

  Not coherent or convex.
- Optimized Certainty Equivalent (Ben-Tal and Teboulle 2007):

  ρ(X) = sup_t ( t + E[u(X − t)] )
Coherent Measures of Risk: Optimized Certainty Equivalent

ρ(X) = sup_t ( t + E[u(X − t)] )

I How much to consume now, given an uncertain future
I Convex risk measure for any concave u
I Coherent risk measure for positively homogeneous u
I Exponential u: OCE = entropic risk measure
I Piecewise linear u: OCE = CV@R
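The last correspondence can be checked numerically. A minimal sketch (grid maximization over t; the utility u(z) = min(z, 0)/α is the standard piecewise linear choice, the six-outcome distribution is illustrative, and CV@R is the reward-convention version, i.e. the mean of the worst α-fraction of outcomes):

```python
import numpy as np

def oce(x, p, u, t_grid):
    """Optimized certainty equivalent: sup_t ( t + E[u(X - t)] ), over a grid of t."""
    return max(t + np.dot(p, u(x - t)) for t in t_grid)

alpha = 2.0 / 3.0
u = lambda z: np.minimum(z, 0.0) / alpha   # piecewise linear utility

# Six equally likely reward outcomes
x = np.array([5.0, 5.0, -20.0, -5.0, -5.0, -10.0])
p = np.full(6, 1.0 / 6.0)

t_grid = np.linspace(-25.0, 10.0, 3501)
print(oce(x, p, u, t_grid))       # about -10.0
print(np.sort(x)[:4].mean())      # -10.0: mean of the worst 2/3 of outcomes
```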
Coherent Measures of Risk: Recommended References

I Lectures on Stochastic Programming: Modeling and Theory (Shapiro, Dentcheva, and Ruszczynski 2014)
I Stochastic Finance: An Introduction in Discrete Time (Föllmer and Schied 2011)
Remainder of Tutorial: Multistage Optimization

I How to apply risk measures when optimizing over multiple time steps
I Results in machine learning and reinforcement learning
I Time (dynamic) consistency across multiple time steps
Risk Measures in Reinforcement Learning
Please see the other slide deck
Time Consistency in Reinforcement Learning
Example: Driving Test Discount

Risk measure: ρ = CV@R_{2/3}

Option 1: Plain Insurance
I Cost: $9.00
I No deductible
I Certain outcome:
  E[X1] = −9.00
  ρ(X1) = E[X1] = −9.00

Option 2: Custom Insurance
I Take a safety exam
I Pass with probability 1/2
  I OK [P = 2/3]: +$5.00
  I Not [P = 1/3]: −$20.00
I Fail with probability 1/2
  I OK [P = 2/3]: −$5.00
  I Not [P = 1/3]: −$10.00
Risk Measure of Option 2

Risk measure: ρ(X2) = CV@R_{2/3}(X2)

Scenario tree for the safety exam (six leaves, each with probability 1/6):
Pass: +5, +5, −20;  Fail: −5, −5, −10

ρ(X2) = (−5 − 5 − 10 − 20)/4 = −10.0 < −9.0 = ρ(X1)

ρ(X2) < ρ(X1): prefer option 1
Optimal Solution of Subproblems

Recall we prefer option 1: ρ(X1) = −9

Pass test (outcomes 5, 5, −20, each with probability 1/3):
ρ(X2 | Pass) = (−20 + 5)/2 = −7.5

Fail test (outcomes −5, −5, −10, each with probability 1/3):
ρ(X2 | Fail) = (−10 − 5)/2 = −7.5

If pass, prefer option 2. If fail, prefer option 2.
Time inconsistent behavior (Roorda, Schumacher, and Engwerda 2005; Iancu, Petrik, and Subramanian 2015)
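The preference reversal above is easy to reproduce. A minimal sketch, using the same reward-convention CV@R as the example (mean of the worst α-fraction of equally likely outcomes):

```python
import numpy as np

def cvar(outcomes, alpha):
    """Mean of the worst alpha-fraction of equally likely outcomes (reward convention)."""
    x = np.sort(np.asarray(outcomes, dtype=float))
    k = int(round(alpha * len(x)))
    return x[:k].mean()

alpha = 2.0 / 3.0
# Option 1: certain -9. Option 2: six equally likely outcomes.
x1 = [-9.0] * 6
x2 = [5.0, 5.0, -20.0,    # pass the test (prob 1/2)
      -5.0, -5.0, -10.0]  # fail the test (prob 1/2)

print(cvar(x1, alpha))     # -9.0  -> option 1 preferred up front
print(cvar(x2, alpha))     # -10.0
print(cvar(x2[:3], alpha)) # -7.5  -> option 2 preferred after passing
print(cvar(x2[3:], alpha)) # -7.5  -> option 2 preferred after failing
```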
Time Consistent Risk Measures

I Filtration (scenario tree) of rewards with T levels: X1, X2, X3, ..., XT
I Dynamic risk measure at time t: ρt(Xt + ··· + XT)
I Time consistent (also called dynamically consistent): for all X, Y,

  ρt+1(Xt + ···) ≥ ρt+1(Yt + ···)  ⇒  ρt(Xt + ···) ≥ ρt(Yt + ···)

I Similar to subproblem optimality in dynamic programming
Time Consistency via Iterated Risk Mappings

I Time consistent risk measures must be composed of iterated risk mappings (Roorda, Schumacher, and Engwerda 2005): µ1, µ2, ..., µT
I Dynamic risk measure:

  ρt(Xt + ··· + XT) = µt(Xt + µt+1(Xt+1 + µt+2(Xt+2 + ···)))

I Each µt: a coherent risk measure applied to a subtree of the filtration
I Markov risk measures for MDPs (Ruszczynski 2010)
Computing Time Consistent Risk Measure

Apply the risk measure recursively, one level of the scenario tree at a time:

ρ(X2 | Pass) = (−20 + 5)/2 = −7.5
ρ(X2 | Fail) = (−10 − 5)/2 = −7.5

Take test: Pass → −7.5 (prob 1/2), Fail → −7.5 (prob 1/2)

ρ(X2) = ρ(−7.5) = −7.5 > −9 = ρ(X1)

The time-consistent measure now prefers option 2 both up front and in every subproblem: the preference no longer reverses during the execution.
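The recursion above can be written as a short backward pass over the scenario tree. A minimal sketch (the nested-list tree encoding is ad hoc; `cvar` is the same worst-α-fraction, reward-convention CV@R used throughout the example):

```python
import numpy as np

def cvar(outcomes, alpha):
    """Mean of the worst alpha-fraction of equally likely outcomes (reward convention)."""
    x = np.sort(np.asarray(outcomes, dtype=float))
    k = max(1, int(round(alpha * len(x))))
    return x[:k].mean()

def nested_cvar(tree, alpha):
    """Time-consistent evaluation: apply CV@R to each node's children, bottom-up."""
    if not isinstance(tree, list):   # leaf: a terminal reward
        return float(tree)
    return cvar([nested_cvar(child, alpha) for child in tree], alpha)

alpha = 2.0 / 3.0
# Option 2: first branch = pass the test, second = fail; leaves are rewards
option2 = [[5.0, 5.0, -20.0], [-5.0, -5.0, -10.0]]

print(nested_cvar(option2, alpha))   # -7.5, vs. the static CV@R of -10.0
```

The nested value is −7.5 for both subtrees and for the root; since the conditional and root evaluations come from the same recursion, the ranking of the two insurance options cannot reverse between stages.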
Approximating Inconsistent Risk Measures

I Time consistent risk measures are difficult to specify
I Approximate an inconsistent risk measure by a consistent one?
I Best lower bound: e.g., what are the best α1, α2 such that
  CV@Rα1(CV@Rα2(X)) ≤ CV@Rα(X) for all X
I Best upper bound: e.g., what are the best α1, α2 such that
  CV@Rα1(CV@Rα2(X)) ≥ CV@Rα(X) for all X
(Iancu, Petrik, and Subramanian 2015)
Best Time Consistent Bounds

I Compare robust sets of consistent and inconsistent measures
I Main insight: need to compare down-monotone closures of robust sets
Time Consistent Bounds: Main Results

I Lower consistent bound:
  I Uniformly tightest bound can be constructed in polynomial time
  I Method: rectangularization
I Upper consistent bound:
  I NP-hard to even evaluate how tight the approximation is
  I Approximation can be tighter than the lower bound
Planning with Time Consistent Risk Measures

I Stochastic dual dynamic programming (Shapiro 2012)
I Applied in reinforcement learning (Petrik and Subramanian 2012)
I Only entropic dynamically consistent risk measures are law invariant (Kupper and Schachermayer 2006)
Summary
Risk Measures: Many Other Topics
1. Elicitation of risk measures
2. Estimation of risk measure from samples
3. Relationship to acceptance sets
4. Relationship to robust optimization
Take Home Messages

I Coherent risk measures are a convenient and established risk aversion framework
I Computations with coherent risk measures are more efficient than with utility functions
I Risk measures (V@R, CV@R) are more intuitive than utility functions
I Time consistency is important in dynamic settings, but can be difficult to achieve (open research problems)
I Risk measures are making inroads in reinforcement learning and artificial intelligence
Thank you!

Bibliography
Andreev, Andriy, Antti Kanto, and Pekka Malo (2005). "Closed-Form Calculation of CVaR". In: Swedish School of Economics.
Artzner, Philippe et al. (1999). "Coherent Measures of Risk". In: Mathematical Finance 9, pp. 203–228. doi: 10.1111/1467-9965.00068.
Ben-Tal, Aharon and Marc Teboulle (2007). "An Old-New Concept of Convex Risk Measures: The Optimized Certainty Equivalent". In: Mathematical Finance 17, pp. 449–476. url: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9965.2007.00311.x/full.
Föllmer, Hans and Thomas Knispel (2011). "Entropic Risk Measures: Coherence vs. Convexity, Model Ambiguity and Robust Large Deviations". In: Stochastics and Dynamics 11.02n03, pp. 333–351. doi: 10.1142/S0219493711003334.
Föllmer, Hans and Alexander Schied (2011). Stochastic Finance: An Introduction in Discrete Time. 3rd ed. Walter de Gruyter.
Friedman, Daniel et al. (2014). Risky Curves: On the Empirical Failure of Expected Utility.
Iancu, Dan A., Marek Petrik, and Dharmashankar Subramanian (2015). "Tight Approximations of Dynamic Risk Measures". In: Mathematics of Operations Research 40.3, pp. 655–682. doi: 10.1287/moor.2014.0689.
Kupper, Michael and Walter Schachermayer (2006). "Representation results for law invariant time consistent functions". In: Mathematics and Financial Economics 16.2, pp. 419–441. url: http://link.springer.com/article/10.1007/s11579-009-0019-9.
Nadarajah, Saralees, Bo Zhang, and Stephen Chan (2014). "Estimation methods for expected shortfall". In: Quantitative Finance 14.2, pp. 271–291. doi: 10.1080/14697688.2013.816767.
Petrik, Marek and Dharmashankar Subramanian (2012). "An approximate solution method for large risk-averse Markov decision processes". In: Uncertainty in Artificial Intelligence (UAI). url: http://arxiv.org/abs/1210.4901.
Rockafellar, R. Tyrrell and S. Uryasev (2000). "Optimization of conditional value-at-risk". In: Journal of Risk 2, pp. 21–41.
Roorda, Berend, Hans Schumacher, and Jacob Engwerda (2005). "Coherent acceptability measures in multiperiod models". In: Mathematical Finance, pp. 1–29. url: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9965.2005.00252.x/full.
Ruszczynski, Andrzej (2010). "Risk-averse dynamic programming for Markov decision processes". In: Mathematical Programming B 125.2, pp. 235–261. doi: 10.1007/s10107-010-0393-3.
Schoemaker, P.J.H. (1980). Experiments on Decisions under Risk: The Expected Utility Hypothesis.
Shapiro, A., D. Dentcheva, and A. Ruszczynski (2014). Lectures on Stochastic Programming: Modeling and Theory. doi: 10.1137/1.9780898718751.
Shapiro, Alexander (2012). "Minimax and risk averse multistage stochastic programming". In: European Journal of Operational Research 219.3, pp. 719–726. doi: 10.1016/j.ejor.2011.11.005.
Shapiro, Alexander, Darinka Dentcheva, and Andrzej Ruszczynski (2009). Lectures on Stochastic Programming. SIAM.