Risk-Averse Decision Making and Control
Marek Petrik, University of New Hampshire
Mohammad Ghavamzadeh, Adobe Research
February 4, 2017

Outline
Introduction to Risk Averse Modeling
(Average) Value at Risk
Coherent Measures of Risk
Risk Measures in Reinforcement Learning
Time consistency in reinforcement learning
Summary
Schedule

9:00–9:20   Introduction to risk-averse modeling
9:20–9:40   Value at Risk and Average Value at Risk
9:40–9:50   Break
9:50–10:30  Coherent Measures of Risk: Properties and methods
10:30–11:00 Coffee break
11:00–12:30 Risk-averse reinforcement learning
12:30–12:40 Break
12:40–12:55 Time consistent measures of risk
Risk Aversion

Risk (Wikipedia): Risk is the potential of gaining or losing something of value. ... Uncertainty is a potential, unpredictable, and uncontrollable outcome; risk is a consequence of action taken in spite of uncertainty.

Risk aversion (Wikipedia): ... risk aversion is the behavior of humans, when exposed to uncertainty, to attempt to reduce that uncertainty. ...

Tutorial: Modern methods for risk-averse decision making
Desire for Risk Aversion

- Empirical evidence:
  1. People buy insurance
  2. Diversifying financial portfolios
  3. Experimental results
- Other reasons:
  - Reduce contingency planning
Where Risk Aversion Matters

- Financial portfolios
- Health-care decisions
- Agriculture
- Public infrastructure
- Self-driving cars?
When Risks Are Ignored ...

- Housing bubble leads to a financial collapse
- Seawalls overflow in a tsunami
Need to Quantify Risk

- Mitigating risk is expensive; how much is it worth?
- Expected utility theory: E[u(X)] = E[utility(X)]
- Exponential utility function (Bernoulli functions):

  u(x) = (1 − e^(−a·x)) / a

[Figure: utility function u(x) over rewards $0 to $10 000]
Example: Buying Car Insurance

Car value: $10 000

Insurance options:

Option | Deductible | Cost
X1     | $10 000    | $0
X2     | $2 000     | $112
X3     | $100       | $322

Expected utility:

Event          | P    | X1       | X2       | X3
No accident    | 92%  | $0       | −$112    | −$322
Minor accident | 7.5% | −$2 500  | −$2 112  | −$422
Major accident | 0.5% | −$10 000 | −$2 112  | −$422
E              |      | −$237.50 | −$272.00 | −$330.00

Risk-neutral choice: no insurance
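The expected-value row of the table can be checked directly; a minimal sketch in Python (the probabilities and payoffs are taken from the table above, the helper name is my own):

```python
# Payoffs (rewards in $) for each insurance option under each event.
probs = [0.92, 0.075, 0.005]            # no / minor / major accident
X1 = [0.0, -2500.0, -10000.0]           # no insurance
X2 = [-112.0, -2112.0, -2112.0]         # high deductible
X3 = [-322.0, -422.0, -422.0]           # low deductible

def expectation(xs, ps):
    """Risk-neutral value E[X] of a discrete random variable."""
    return sum(x * p for x, p in zip(xs, ps))

for name, X in [("X1", X1), ("X2", X2), ("X3", X3)]:
    print(name, expectation(X, probs))   # -237.5, -272.0, -330.0
```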
Risk Averse Utility Functions

- Exponential utility function:

  u(x) = (1 − exp(−10^(−6) · (x + 10^5))) / 10^(−6)

- X1: no insurance
- X2: high deductible insurance

Event          | P    | X1       | u(X1) | X2       | u(X2)
No accident    | 92%  | $0       | 1 111 | −$112    | 1 111
Minor accident | 7.5% | −$2 500  | 1 109 | −$2 112  | 1 110
Major accident | 0.5% | −$10 000 | 0     | −$2 112  | 1 110
E              |      | −$237.50 | 1 105 | −$272.00 | 1 111

Prefer insurance, but difficult to interpret and elicit
Drawbacks of Expected Utility Theory (Schoemaker 1980)

1. Does not explain human behavior
2. Difficult to elicit utilities
3. Complicates optimization (Friedman et al. 2014)
Major Alternatives for Measuring Risk

1. Markowitz portfolios: penalize dispersion risk

   min_{c ≥ 0}  Var[ Σ_i c_i · X_i ]
   s.t.  E[ Σ_i c_i · X_i ] = μ,   Σ_i c_i = 1

   Limited modeling capability, and also penalizes upside

2. Risk measures (Artzner et al. 1999):
   - Value at risk (V@R)
   - Conditional value at risk (CV@R)
   - Coherent measures of risk
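The Markowitz objective in item 1 can be illustrated with a toy grid search; the two-asset scenario returns below are invented for illustration, and the target-mean constraint is dropped for brevity, so this finds the pure minimum-variance portfolio:

```python
# Two hypothetical assets described by equally likely joint return scenarios.
scenarios = [(0.10, -0.02), (0.03, 0.04), (-0.05, 0.06), (0.08, 0.00)]
p = 1.0 / len(scenarios)

def portfolio_var(c):
    """Variance of the portfolio return c*X1 + (1 - c)*X2 under the scenarios."""
    rets = [c * a + (1.0 - c) * b for a, b in scenarios]
    mean = sum(rets) * p
    return sum((r - mean) ** 2 for r in rets) * p

# Grid search over the single portfolio weight c in [0, 1].
best_c = min((i / 1000.0 for i in range(1001)), key=portfolio_var)
print(best_c, portfolio_var(best_c))
```

Diversifying (an interior `best_c`) gives lower variance than holding either asset alone, which is exactly the dispersion penalty the formulation minimizes.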
Coherent Measures of Risk

Topic of this tutorial

- Alternative to expected utility theory
  + Flexible modeling framework
  + Convenient to use with optimization and decision making
  + Easier to elicit than utilities
  − Difficulties in sequential decision making
(Average) Value at Risk
Risk Measure

Risk measure: a function ρ that maps a random variable to a real number.

- Expectation is a risk measure (risk neutral):

  ρ(X) = E[X] = Σ_{ω ∈ Ω} X(ω) · P(ω)

- Worst case is a risk measure (very risk averse):

  ρ(X) = min[X] = min_{ω ∈ Ω} X(ω)
V@R: Value at Risk

ρ(X) = V@R_α(X) = sup { t : P[X ≤ t] < α }

Rewards are smaller than V@R_α(X) with probability at most α.

Example α values:

α = 0.5    Median
α = 0.3    More conservative
α = 0.05   Conservative
α = 0      Worst case
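For a discrete distribution the sup can be evaluated by scanning the sorted outcomes; a minimal sketch (the scan-based helper is my own implementation of the definition above; the three-outcome distribution is the no-insurance example used later):

```python
def value_at_risk(xs, ps, alpha):
    """V@R_a(X) = sup{t : P[X <= t] < a} for a discrete distribution:
    the smallest outcome whose cumulative probability reaches alpha."""
    cum = 0.0
    for x, p in sorted(zip(xs, ps)):
        cum += p
        if cum >= alpha - 1e-12:
            return x
    return max(xs)

xs = [0.0, -2500.0, -10000.0]   # rewards
ps = [0.92, 0.075, 0.005]       # probabilities

print(value_at_risk(xs, ps, 0.25))   # 0.0
print(value_at_risk(xs, ps, 0.05))   # -2500.0
print(value_at_risk(xs, ps, 0.0))    # -10000.0 (worst case)
```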
V@R Example 1: Cumulative Distribution Function

[email protected](X) = −1.7

[Figure: CDF of X over rewards −$6 to $6]
V@R Example 2: Cumulative Distribution Function

[email protected](X) = −0.5

[Figure: CDF of X over rewards −$6 to $6]
Car Insurance And V@R: 25%

Event          | P    | X1
No accident    | 92%  | $0
Minor accident | 7.5% | −$2 500
Major accident | 0.5% | −$10 000

[Figure: CDF of X1 over rewards −$15 000 to $5 000]

V@R_α(X) = sup { t : P[X ≤ t] < α },   α = 0.25

t       | P[X ≤ t] | α
−$2 600 | 0.005    | 0.25
−$2 500 | 0.08     | 0.25
$0      | 1.000    | 0.25
Car Insurance And V@R: 8%

Event          | P    | X1
No accident    | 92%  | $0
Minor accident | 7.5% | −$2 500
Major accident | 0.5% | −$10 000

[Figure: CDF of X1 over rewards −$15 000 to $5 000]

V@R_α(X) = sup { t : P[X ≤ t] < α },   α = 0.08

t       | P[X ≤ t] | α
−$2 600 | 0.005    | 0.08
−$2 500 | 0.08     | 0.08
Car Insurance And V@R

- X1: no insurance (high risk)
- X2: high deductible insurance (medium risk)
- X3: low deductible insurance (low risk)

Event          | P    | X1       | X2      | X3
No accident    | 92%  | $0       | −$112   | −$322
Minor accident | 7.5% | −$2 500  | −$2 112 | −$422
Major accident | 0.5% | −$10 000 | −$2 112 | −$422
E              |      | −$238    | −$272   | −$330
[email protected]       |      | $0       | −$112   | −$322
[email protected]      |      | −$2 500  | −$2 112 | −$422
Properties of V@R

+ Preserves affine transformations:

  V@R_α(τ·X + c) = τ·V@R_α(X) + c

+ Simple and intuitive to model and understand
+ Compelling meaning in finance
− Ignores heavy tails
− Not convex

Coherent measures of risk: preserve the V@R positives and improve on the negatives (Artzner et al. 1999)
Average Value at Risk

- AKA Conditional Value at Risk and Expected Shortfall
- Popular coherent risk measure ρ
- Simple definition for atomless distributions:

  CV@R_α(X) = E[ X | X ≤ V@R_α(X) ]

- Recall: V@R_α(X) = sup { t : P[X ≤ t] < α }
- Convex extension of V@R (Rockafellar and Uryasev 2000)
V@R vs CV@R: Cumulative Distribution Function

[email protected](X) = −0.5       [email protected](X) = −1.1

[Figure: two CDF plots of X, over rewards −$6 to $6 and −$6 to −$1]
CV@R vs V@R: Heavy Tails

A more expensive car?

Event          | P    | X1       | X1 (expensive car)
No accident    | 92%  | $0       | $0
Minor accident | 7.5% | −$2 500  | −$2 500
Major accident | 0.5% | −$10 000 | −$1 000 000
[email protected]      |      | −$2 500  | −$2 500
[email protected]     |      | −$3 250  | −$102 250
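The tail sensitivity can be reproduced numerically; a sketch using the sup-based coherent CV@R formula (defined later in the tutorial), with helper names of my own:

```python
def value_at_risk(xs, ps, alpha):
    """Smallest outcome whose cumulative probability reaches alpha."""
    cum = 0.0
    for x, p in sorted(zip(xs, ps)):
        cum += p
        if cum >= alpha - 1e-12:
            return x
    return max(xs)

def cvar(xs, ps, alpha):
    """CV@R_a(X) = sup_t {t + E[(X - t)^-]/a}; the sup is attained at an atom."""
    obj = lambda t: t + sum(p * min(x - t, 0.0) for x, p in zip(xs, ps)) / alpha
    return max(obj(t) for t in xs)

ps = [0.92, 0.075, 0.005]
cheap = [0.0, -2500.0, -10000.0]       # major accident costs $10 000
pricey = [0.0, -2500.0, -1000000.0]    # major accident costs $1 000 000

# V@R is blind to how bad the 0.5% tail is; CV@R is not.
print(value_at_risk(cheap, ps, 0.05), value_at_risk(pricey, ps, 0.05))  # -2500.0 -2500.0
print(round(cvar(cheap, ps, 0.05)), round(cvar(pricey, ps, 0.05)))      # -3250 -102250
```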
V@R: Heavy Tails and Financial Crisis
CV@R vs V@R: Continuity

[Figure: V@R and AV@R (in $, from $0 down to −$10 000) as functions of the risk level α ∈ [0, 1]; AV@R varies continuously in α while V@R jumps]
Coherent Measures of Risk
Coherent Measures of Risk

- Generalize CV@R to allow more general models
- Framework introduced in (Artzner et al. 1999)
- Coherence: requirements for a risk measure ρ to satisfy
- Our treatment is based on (Shapiro, Dentcheva, and Ruszczynski 2009) and (Föllmer and Schied 2011)
Coherence Requirements of Risk Measures

1. Convexity (really concavity for maximization!):

   ρ(t·X + (1 − t)·Y) ≥ t·ρ(X) + (1 − t)·ρ(Y)

2. Monotonicity:

   If X ≥ Y, then ρ(X) ≥ ρ(Y)

3. Translation equivariance: for a constant a:

   ρ(X + a) = ρ(X) + a

4. Positive homogeneity: for t > 0:

   ρ(t·X) = t·ρ(X)
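The four requirements are easy to spot-check numerically for a concrete measure; a small sketch testing CV@R (via the sup-based formula from this tutorial) on the insurance payoffs:

```python
def cvar(xs, ps, alpha):
    """Coherent CV@R for a discrete distribution (sup attained at an atom)."""
    obj = lambda t: t + sum(p * min(x - t, 0.0) for x, p in zip(xs, ps)) / alpha
    return max(obj(t) for t in xs)

ps = [0.92, 0.075, 0.005]
X = [0.0, -2500.0, -10000.0]      # no insurance
Y = [-112.0, -2112.0, -2112.0]    # high deductible insurance
a = 0.25

# 1. Convexity (concavity for rewards): mixing cannot increase risk.
mix = [0.5 * x + 0.5 * y for x, y in zip(X, Y)]
assert cvar(mix, ps, a) >= 0.5 * cvar(X, ps, a) + 0.5 * cvar(Y, ps, a) - 1e-9
# 2. Monotonicity: pointwise-better rewards get a better risk value.
assert cvar([x + 10.0 for x in X], ps, a) >= cvar(X, ps, a)
# 3. Translation equivariance: adding a constant shifts the risk value.
assert abs(cvar([x + 100.0 for x in X], ps, a) - (cvar(X, ps, a) + 100.0)) < 1e-9
# 4. Positive homogeneity: rescaling rewards rescales the risk value.
assert abs(cvar([2.0 * x for x in X], ps, a) - 2.0 * cvar(X, ps, a)) < 1e-9
print("all four coherence requirements hold on this example")
```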
Convexity

Why: diversification should decrease risk (and it helps with optimization)

ρ(t·X + (1 − t)·Y) ≥ t·ρ(X) + (1 − t)·ρ(Y)

Event          | P    | X1       | X2      | ½·X1 + ½·X2
No accident    | 92%  | $0       | −$112   | −$56
Minor accident | 7.5% | −$2 500  | −$2 112 | −$2 306
Major accident | 0.5% | −$10 000 | −$2 112 | −$6 056
CV@R           |      | −$238    | −$272   | −$240

−$240 ≥ (−238 − 272)/2 = −$255
Monotonicity

Why: do not prefer an outcome that is always worse

If X ≥ Y, then ρ(X) ≥ ρ(Y)

X2′: insurance with a deductible of $10 000

Event          | P    | X1       | X2′
No accident    | 92%  | $0       | −$112
Minor accident | 7.5% | −$2 500  | −$2 500
Major accident | 0.5% | −$10 000 | −$10 000
ρ              |      | −$238    | −$320

−$320 ≤ −$238
Translation equivariance

Why: risk is measured in the same units as the reward

ρ(X + a) = ρ(X) + a

More expensive insurance, by $100:

Event          | P    | X2      | X2′
No accident    | 92%  | −$112   | −$212
Minor accident | 7.5% | −$2 112 | −$2 212
Major accident | 0.5% | −$2 112 | −$2 212
ρ              |      | −$272   | −$372

−$372 = −$272 − $100
Positive homogeneity

Why: risk is measured in the same units as the reward

ρ(t·X) = t·ρ(X)

What if the prices are in €: $1 = €0.94

Event          | P    | X2      | X2 (€)
No accident    | 92%  | −$112   | −€105
Minor accident | 7.5% | −$2 112 | −€1 985
Major accident | 0.5% | −$2 112 | −€1 985
ρ              |      | −$272   | −€256

−€256 = 0.94 · (−$272)
Convex Risk Measures

A weaker definition than coherent risk measures:

1. Convexity:

   ρ(t·X + (1 − t)·Y) ≥ t·ρ(X) + (1 − t)·ρ(Y)

2. Monotonicity:

   If X ≥ Y, then ρ(X) ≥ ρ(Y)

3. Translation equivariance: for a constant a:

   ρ(X + a) = ρ(X) + a

4. Positive homogeneity: dropped
Additional Property: Law Invariance

The value of a risk measure is independent of the names of the events.

Consider a coin flip:

Event | P   | X | Y
Heads | 1/2 | 1 | 0
Tails | 1/2 | 0 | 1

Require that ρ(X) = ρ(Y); violated by some coherent risk measures.

Distortion risk measures: coherence & law invariance & comonotonicity
Simple Coherent Measures of Risk

- Expectation:

  ρ(X) = E[X] = Σ_{ω ∈ Ω} X(ω) · P(ω)

  1. Convexity: E[X] is linear
  2. Monotonicity: E[X] ≥ E[Y] if X ≥ Y
  3. Translation equivariance: E[X + a] = E[X] + a
  4. Positive homogeneity: E[t·X] = t·E[X] for t > 0

- Worst case:

  ρ(X) = min[X] = min_{ω ∈ Ω} X(ω)

  1. Convexity: min[X] is convex
  2. Monotonicity: min[X] ≥ min[Y] if X ≥ Y
  3. Translation equivariance: min[X + a] = min[X] + a
  4. Positive homogeneity: min[t·X] = t·min[X] for t > 0
CV@R for Discrete Distributions

- The simple definition is not coherent:

  CV@R_α(X) = E[ X | X ≤ V@R_α(X) ]

- It violates convexity when the distribution has atoms (discrete distributions)
- Coherent definition of CV@R:

  CV@R_α(X) = sup_t { t + (1/α)·E[(X − t)⁻] }

- t⋆ = V@R_α(X) when the distribution is atomless
- The two definitions coincide for continuous distributions
Computing CV@R

- Discrete distributions: solve a linear program

  max_{t, y}  t + (1/α)·pᵀy
  s.t.  y ≤ X − t,
        y ≤ 0

- Continuous distributions: closed form for many distributions (Nadarajah, Zhang, and Chan 2014; Andreev, Kanto, and Malo 2005)
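For small discrete distributions the linear program can also be solved by a direct scan: for fixed t the optimal y is y_i = min(x_i − t, 0), and the optimum over t is attained at one of the outcomes. A hand-rolled sketch (no LP solver; helper name is my own):

```python
def cvar_lp_by_scan(xs, ps, alpha):
    """Maximize t + p^T y / alpha subject to y <= X - t, y <= 0.
    For fixed t the best feasible y is y_i = min(x_i - t, 0); the
    objective is piecewise linear and concave in t with kinks at the
    outcomes, so scanning t over the outcomes finds the optimum."""
    best = float("-inf")
    for t in xs:
        y = [min(x - t, 0.0) for x in xs]
        best = max(best, t + sum(p * yi for p, yi in zip(ps, y)) / alpha)
    return best

ps = [0.92, 0.075, 0.005]
X1 = [0.0, -2500.0, -10000.0]   # no-insurance payoffs from the running example
print(round(cvar_lp_by_scan(X1, ps, 0.25)))   # -950
print(round(cvar_lp_by_scan(X1, ps, 0.05)))   # -3250
```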
Car Insurance and CV@R

- X1: no insurance
- X2: high deductible insurance
- X3: low deductible insurance

Event          | P    | X1       | X2      | X3
No accident    | 92%  | $0       | −$112   | −$322
Minor accident | 7.5% | −$2 500  | −$2 112 | −$422
Major accident | 0.5% | −$10 000 | −$2 112 | −$422
E              |      | −$238    | −$272   | −$330
[email protected]       |      | $0       | −$112   | −$322
[email protected]      |      | −$950    | −$752   | −$354
[email protected]      |      | −$2 500  | −$2 112 | −$422
[email protected]     |      | −$3 250  | −$2 112 | −$422
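The CV@R rows of this table can be regenerated with the sup formula; a sketch (helper name is my own, payoffs are the table's):

```python
def cvar(xs, ps, alpha):
    """Coherent CV@R: sup_t { t + E[(X - t)^-]/alpha }, scanned over atoms."""
    obj = lambda t: t + sum(p * min(x - t, 0.0) for x, p in zip(xs, ps)) / alpha
    return max(obj(t) for t in xs)

ps = [0.92, 0.075, 0.005]
options = {"X1": [0.0, -2500.0, -10000.0],
           "X2": [-112.0, -2112.0, -2112.0],
           "X3": [-322.0, -422.0, -422.0]}

for alpha in (0.25, 0.05):
    # Rounded values match the [email protected] and [email protected] rows above.
    print(alpha, {k: round(cvar(v, ps, alpha)) for k, v in options.items()})
```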
Robust Representation of Coherent Risk Measures

- Important representation for analysis and optimization
- For any coherent risk measure ρ:

  ρ(X) = min_{ξ ∈ A} E_ξ[X] = inf_{ξ ∈ A} ξᵀX

- A is a set of measures that is:
  1. convex
  2. bounded
  3. closed
- Proof: double convex conjugate
  - Convex conjugate: ρ⋆(y) = sup_x ( xᵀy − ρ(x) )
  - Fenchel–Moreau theorem: ρ⋆⋆(x) = ρ(x)
Robust Set for CV@R

CV@R_α(X) = sup_t { t + (1/α)·E[(X − t)⁻] }

- Robust representation: ρ(X) = inf_{ξ ∈ A} E_ξ[X]
- Robust set for a probability distribution P:

  A = { ξ ≥ 0 : ξ ≤ (1/α)·P,  1ᵀξ = 1 }
Robust Set for CV@R

- Robust representation:

  ρ(X) = min_{ξ ∈ A} E_ξ[X],   A = { ξ ≥ 0 : ξ ≤ (1/α)·P,  1ᵀξ = 1 }

- Random variable: X = [10, 5, 2]
- Probability distribution: p = [1/3, 1/3, 1/3]
- CV@R_{1/2}(X) =

  min_{ξ ≥ 0}  10·ξ1 + 5·ξ2 + 2·ξ3
  s.t.  ξ_i ≤ (1/α)·p_i = (1/(1/2))·(1/3) = 2/3,   ξ1 + ξ2 + ξ3 = 1
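This little linear program can be solved greedily: push the maximum allowed weight p_i/α onto the worst outcomes until the weights sum to one. A sketch (helper name is my own):

```python
def cvar_robust(xs, ps, alpha):
    """min { xi^T X : 0 <= xi <= p/alpha, sum(xi) = 1 }: the adversary
    loads the worst (smallest) rewards first, up to the cap p_i/alpha."""
    remaining, total = 1.0, 0.0
    for x, p in sorted(zip(xs, ps)):     # worst rewards first
        w = min(p / alpha, remaining)    # cap from the robust set A
        total += w * x
        remaining -= w
        if remaining <= 1e-12:
            break
    return total

# xi = [0, 1/3, 2/3] on outcomes [10, 5, 2] gives 5/3 + 4/3 = 3
print(round(cvar_robust([10.0, 5.0, 2.0], [1/3, 1/3, 1/3], 0.5), 9))   # 3.0
```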
Other Coherent Risk Measures

1. Combination of expectation and CV@R
2. Entropic risk measure (convex, but not coherent)
3. Coherent entropic risk measure
4. Risk measures from utility functions
5. ...
Convex Combination of Expectation and CV@R

- CV@R ignores the mean return
- Risk-averse solutions can be bad in expectation
- Practical trade-off: combine mean and risk

  ρ(X) = c·E[X] + (1 − c)·CV@R_α(X)
Entropic Risk Measure

ρ(X) = −(1/τ) · ln E[ e^(−τ·X) ],   τ > 0

- Convex risk measure
- Not coherent (violates positive homogeneity)
- No robust representation
- Coherent entropic risk measure (Föllmer and Knispel 2011):

  ρ(X) = min_{ξ ≥ 0} { E_ξ[X] : KL(ξ ‖ P) ≤ c,  1ᵀξ = 1 }
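The entropic risk measure is a one-liner to evaluate; a sketch checking that it stays between the worst case and the mean, and that doubling the rewards does not double the risk value (the coherence failure noted above; the rewards are the insurance example rescaled to $1000s to avoid overflow):

```python
import math

def entropic(xs, ps, tau):
    """rho(X) = -(1/tau) * ln E[exp(-tau * X)]."""
    return -math.log(sum(p * math.exp(-tau * x) for x, p in zip(xs, ps))) / tau

xs = [0.0, -2.5, -10.0]     # rewards in $1000s
ps = [0.92, 0.075, 0.005]
mean = sum(p * x for x, p in zip(xs, ps))

for tau in (0.001, 0.1, 1.0, 5.0):
    r = entropic(xs, ps, tau)
    assert min(xs) - 1e-9 <= r <= mean + 1e-9   # between worst case and mean
    print(tau, r)                               # decreases as tau grows

# Not positively homogeneous: scaling rewards by 2 does not scale rho by 2.
assert abs(entropic([2 * x for x in xs], ps, 1.0) - 2 * entropic(xs, ps, 1.0)) > 1e-6
```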
Risk Measure From Utility Function

- Concave utility function u(·)
- Can we construct a coherent risk measure from u?
- Direct construction:

  ρ(X) = E[u(X)]

  Not coherent or convex.
- Optimized Certainty Equivalent (Ben-Tal and Teboulle 2007):

  ρ(X) = sup_t ( t + E[u(X − t)] )
Coherent Measures of Risk: Optimized Certainty Equivalent

ρ(X) = sup_t ( t + E[u(X − t)] )

I How much to consume now, given an uncertain future
I Convex risk measure for any concave u
I Coherent risk measure for positively homogeneous u
I Exponential u: OCE = entropic risk measure
I Piecewise linear u: OCE = CV@R
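The last correspondence can be checked numerically. A minimal sketch (grid maximization over t; the utility u(z) = min(z, 0)/α is the standard piecewise linear choice, the six-outcome distribution is illustrative, and CV@R is the reward-convention version, i.e. the mean of the worst α-fraction of outcomes):

```python
import numpy as np

def oce(x, p, u, t_grid):
    """Optimized certainty equivalent: sup_t ( t + E[u(X - t)] ), over a grid of t."""
    return max(t + np.dot(p, u(x - t)) for t in t_grid)

alpha = 2.0 / 3.0
u = lambda z: np.minimum(z, 0.0) / alpha   # piecewise linear utility

# Six equally likely reward outcomes
x = np.array([5.0, 5.0, -20.0, -5.0, -5.0, -10.0])
p = np.full(6, 1.0 / 6.0)

t_grid = np.linspace(-25.0, 10.0, 3501)
print(oce(x, p, u, t_grid))       # about -10.0
print(np.sort(x)[:4].mean())      # -10.0: mean of the worst 2/3 of outcomes
```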
Coherent Measures of Risk: Recommended References

I Lectures on Stochastic Programming: Modeling and Theory (Shapiro, Dentcheva, and Ruszczynski 2014)
I Stochastic Finance: An Introduction in Discrete Time (Föllmer and Schied 2011)
Remainder of Tutorial: Multistage Optimization

I How to apply risk measures when optimizing over multiple time steps
I Results in machine learning and reinforcement learning
I Time (dynamic) consistency across multiple time steps
Risk Measures in Reinforcement Learning
Please see the other slide deck
Time Consistency in Reinforcement Learning
Example: Driving Test Discount

Risk measure: ρ = CV@R_{2/3}

Option 1: Plain Insurance
I Cost: $9.00
I No deductible
I Certain outcome:
  E[X1] = −9.00
  ρ(X1) = E[X1] = −9.00

Option 2: Custom Insurance
I Take a safety exam
I Pass with probability 1/2
  I OK [P = 2/3]: +$5.00
  I Not [P = 1/3]: −$20.00
I Fail with probability 1/2
  I OK [P = 2/3]: −$5.00
  I Not [P = 1/3]: −$10.00
Risk Measure of Option 2

Risk measure: ρ(X2) = CV@R_{2/3}(X2)

Scenario tree for the safety exam (six leaves, each with probability 1/6):
Pass: +5, +5, −20;  Fail: −5, −5, −10

ρ(X2) = (−5 − 5 − 10 − 20)/4 = −10.0 < −9.0 = ρ(X1)

ρ(X2) < ρ(X1): prefer option 1
Optimal Solution of Subproblems

Recall we prefer option 1: ρ(X1) = −9

Pass test (outcomes 5, 5, −20, each with probability 1/3):
ρ(X2 | Pass) = (−20 + 5)/2 = −7.5

Fail test (outcomes −5, −5, −10, each with probability 1/3):
ρ(X2 | Fail) = (−10 − 5)/2 = −7.5

If pass, prefer option 2. If fail, prefer option 2.
Time inconsistent behavior (Roorda, Schumacher, and Engwerda 2005; Iancu, Petrik, and Subramanian 2015)
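The preference reversal above is easy to reproduce. A minimal sketch, using the same reward-convention CV@R as the example (mean of the worst α-fraction of equally likely outcomes):

```python
import numpy as np

def cvar(outcomes, alpha):
    """Mean of the worst alpha-fraction of equally likely outcomes (reward convention)."""
    x = np.sort(np.asarray(outcomes, dtype=float))
    k = int(round(alpha * len(x)))
    return x[:k].mean()

alpha = 2.0 / 3.0
# Option 1: certain -9. Option 2: six equally likely outcomes.
x1 = [-9.0] * 6
x2 = [5.0, 5.0, -20.0,    # pass the test (prob 1/2)
      -5.0, -5.0, -10.0]  # fail the test (prob 1/2)

print(cvar(x1, alpha))     # -9.0  -> option 1 preferred up front
print(cvar(x2, alpha))     # -10.0
print(cvar(x2[:3], alpha)) # -7.5  -> option 2 preferred after passing
print(cvar(x2[3:], alpha)) # -7.5  -> option 2 preferred after failing
```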
Time Consistent Risk Measures

I Filtration (scenario tree) of rewards with T levels: X1, X2, X3, ..., XT
I Dynamic risk measure at time t: ρt(Xt + ··· + XT)
I Time consistent (also called dynamically consistent): for all X, Y,

  ρt+1(Xt + ···) ≥ ρt+1(Yt + ···)  ⇒  ρt(Xt + ···) ≥ ρt(Yt + ···)

I Similar to subproblem optimality in dynamic programming
Time Consistency via Iterated Risk Mappings

I Time consistent risk measures must be composed of iterated risk mappings (Roorda, Schumacher, and Engwerda 2005): µ1, µ2, ..., µT
I Dynamic risk measure:

  ρt(Xt + ··· + XT) = µt(Xt + µt+1(Xt+1 + µt+2(Xt+2 + ···)))

I Each µt: a coherent risk measure applied to a subtree of the filtration
I Markov risk measures for MDPs (Ruszczynski 2010)
Computing Time Consistent Risk Measure

Apply the risk measure recursively, one level of the scenario tree at a time:

ρ(X2 | Pass) = (−20 + 5)/2 = −7.5
ρ(X2 | Fail) = (−10 − 5)/2 = −7.5

Take test: Pass → −7.5 (prob 1/2), Fail → −7.5 (prob 1/2)

ρ(X2) = ρ(−7.5) = −7.5 > −9 = ρ(X1)

The time-consistent measure now prefers option 2 both up front and in every subproblem: the preference no longer reverses during the execution.
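The recursion above can be written as a short backward pass over the scenario tree. A minimal sketch (the nested-list tree encoding is ad hoc; `cvar` is the same worst-α-fraction, reward-convention CV@R used throughout the example):

```python
import numpy as np

def cvar(outcomes, alpha):
    """Mean of the worst alpha-fraction of equally likely outcomes (reward convention)."""
    x = np.sort(np.asarray(outcomes, dtype=float))
    k = max(1, int(round(alpha * len(x))))
    return x[:k].mean()

def nested_cvar(tree, alpha):
    """Time-consistent evaluation: apply CV@R to each node's children, bottom-up."""
    if not isinstance(tree, list):   # leaf: a terminal reward
        return float(tree)
    return cvar([nested_cvar(child, alpha) for child in tree], alpha)

alpha = 2.0 / 3.0
# Option 2: first branch = pass the test, second = fail; leaves are rewards
option2 = [[5.0, 5.0, -20.0], [-5.0, -5.0, -10.0]]

print(nested_cvar(option2, alpha))   # -7.5, vs. the static CV@R of -10.0
```

The nested value is −7.5 for both subtrees and for the root; since the conditional and root evaluations come from the same recursion, the ranking of the two insurance options cannot reverse between stages.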
Approximating Inconsistent Risk Measures

I Time consistent risk measures are difficult to specify
I Approximate an inconsistent risk measure by a consistent one?
I Best lower bound: e.g., what are the best α1, α2 such that
  CV@Rα1(CV@Rα2(X)) ≤ CV@Rα(X) for all X
I Best upper bound: e.g., what are the best α1, α2 such that
  CV@Rα1(CV@Rα2(X)) ≥ CV@Rα(X) for all X
(Iancu, Petrik, and Subramanian 2015)
Best Time Consistent Bounds

I Compare robust sets of consistent and inconsistent measures
I Main insight: need to compare down-monotone closures of robust sets
Time Consistent Bounds: Main Results

I Lower consistent bound:
  I Uniformly tightest bound can be constructed in polynomial time
  I Method: rectangularization
I Upper consistent bound:
  I NP-hard to even evaluate how tight the approximation is
  I Approximation can be tighter than the lower bound
Planning with Time Consistent Risk Measures

I Stochastic dual dynamic programming (Shapiro 2012)
I Applied in reinforcement learning (Petrik and Subramanian 2012)
I Only entropic dynamically consistent risk measures are law invariant (Kupper and Schachermayer 2006)
Summary
Risk Measures: Many Other Topics
1. Elicitation of risk measures
2. Estimation of risk measure from samples
3. Relationship to acceptance sets
4. Relationship to robust optimization
Take Home Messages

I Coherent risk measures are a convenient and established risk aversion framework
I Computations with coherent risk measures are more efficient than with utility functions
I Risk measures (V@R, CV@R) are more intuitive than utility functions
I Time consistency is important in dynamic settings, but can be difficult to achieve (open research problems)
I Risk measures are making inroads in reinforcement learning and artificial intelligence
Thank you!

Bibliography
Andreev, Andriy, Antti Kanto, and Pekka Malo (2005). "Closed-Form Calculation of CVaR". In: Swedish School of Economics.
Artzner, Philippe et al. (1999). "Coherent Measures of Risk". In: Mathematical Finance 9, pp. 203–228. doi: 10.1111/1467-9965.00068.
Ben-Tal, Aharon and Marc Teboulle (2007). "An Old-New Concept of Convex Risk Measures: The Optimized Certainty Equivalent". In: Mathematical Finance 17, pp. 449–476. url: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9965.2007.00311.x/full.
Föllmer, Hans and Thomas Knispel (2011). "Entropic Risk Measures: Coherence vs. Convexity, Model Ambiguity and Robust Large Deviations". In: Stochastics and Dynamics 11.02n03, pp. 333–351. doi: 10.1142/S0219493711003334.
Föllmer, Hans and Alexander Schied (2011). Stochastic Finance: An Introduction in Discrete Time. 3rd ed. Walter de Gruyter.
Friedman, Daniel et al. (2014). Risky Curves: On the Empirical Failure of Expected Utility.
Iancu, Dan A., Marek Petrik, and Dharmashankar Subramanian (2015). "Tight Approximations of Dynamic Risk Measures". In: Mathematics of Operations Research 40.3, pp. 655–682. doi: 10.1287/moor.2014.0689.
Kupper, Michael and Walter Schachermayer (2006). "Representation results for law invariant time consistent functions". In: Mathematics and Financial Economics 16.2, pp. 419–441. url: http://link.springer.com/article/10.1007/s11579-009-0019-9.
Nadarajah, Saralees, Bo Zhang, and Stephen Chan (2014). "Estimation methods for expected shortfall". In: Quantitative Finance 14.2, pp. 271–291. doi: 10.1080/14697688.2013.816767.
Petrik, Marek and Dharmashankar Subramanian (2012). "An approximate solution method for large risk-averse Markov decision processes". In: Uncertainty in Artificial Intelligence (UAI). url: http://arxiv.org/abs/1210.4901.
Rockafellar, R. Tyrrell and S. Uryasev (2000). "Optimization of conditional value-at-risk". In: Journal of Risk 2, pp. 21–41.
Roorda, Berend, Hans Schumacher, and Jacob Engwerda (2005). "Coherent acceptability measures in multiperiod models". In: Mathematical Finance, pp. 1–29. url: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9965.2005.00252.x/full.
Ruszczynski, Andrzej (2010). "Risk-averse dynamic programming for Markov decision processes". In: Mathematical Programming B 125.2, pp. 235–261. doi: 10.1007/s10107-010-0393-3.
Schoemaker, P.J.H. (1980). Experiments on Decisions under Risk: The Expected Utility Hypothesis.
Shapiro, A., D. Dentcheva, and A. Ruszczynski (2014). Lectures on Stochastic Programming: Modeling and Theory. doi: 10.1137/1.9780898718751.
Shapiro, Alexander (2012). "Minimax and risk averse multistage stochastic programming". In: European Journal of Operational Research 219.3, pp. 719–726. doi: 10.1016/j.ejor.2011.11.005.
Shapiro, Alexander, Darinka Dentcheva, and Andrzej Ruszczynski (2009). Lectures on Stochastic Programming. SIAM.