Risk Analysis of Wind Energy Company Stocks

Xin Jiang

July 2020

Faculty of Technology

Supervisor: Astrid Hilbert

Abstract

In this thesis, probability theory and risk analysis are used to assess the risk of wind energy stocks. Three stocks of wind energy companies and three stocks of technology companies are selected and their risks are compared. Three risk measures are used: variance, value at risk, and conditional value at risk.

The conclusions drawn are that the risks of wind energy company stocks are not significantly lower than those of the other companies' stocks. Furthermore, for the studied time period and under the different risk measures, optimal portfolios should include short positions in one or two of the energy companies.

Key words: Probability theory, risk analysis, variance, value at risk, conditional value at risk.

Acknowledgements

I would like to thank my supervisor Astrid Hilbert at Linnaeus University for answering my questions concerning the thesis and for introducing me to appropriate literature. I would also like to thank Roger Pettersson for teaching me simulation and for showing me how to work with real stock price data.

Contents

1 Introduction
  1.1 Background
  1.2 Basic Concepts and Tools
    1.2.1 Concepts from Probability Theory
    1.2.2 Tools from Multivariate Statistics
    1.2.3 Concepts from Finance

2 Risk Measures
  2.1 Variance
  2.2 Value at Risk (VaR)
  2.3 Conditional Value at Risk (CVaR)

3 Empirical Result of Risk Comparison
  3.1 Empirical Analysis of Data
  3.2 Empirical Results

4 Methodology of Portfolio Optimization
  4.1 Mean-Variance Optimization
  4.2 Value at Risk Optimization
  4.3 Conditional Value at Risk Optimization

5 Empirical Results of Portfolio Optimization
  5.1 Mean-Variance Optimization
    5.1.1 Algorithm Analysis
    5.1.2 Results of MV Optimization:
  5.2 VaR Optimization
    5.2.1 Algorithm Analysis:
    5.2.2 Results of VaR Optimization:
  5.3 CVaR Optimization
    5.3.1 Algorithm Analysis:
    5.3.2 Results of CVaR Optimization:

6 Conclusion and Discussion

7 References

8 Appendix

1 Introduction

1.1 Background

Wind energy is a lucrative field for investors. According to the American Wind Energy Association in 2018, the wind power industry in the US grew about 9% in 2017, increasing its power generation capacity by more than 7,000 megawatts. In 2017, wind energy accounted for 6.3% of the total US electricity supply. This 6.3% is more significant than it looks: it is enough to power nearly 27 million homes. There is still more room for growth at the global level. The International Energy Agency estimates that wind power will account for 18% of worldwide energy production by 2050 (Olson, 2019).

Will the stocks of new energy companies be significantly less risky than those of other companies? If so, and given rigid demand, such companies may be inclined simply to raise product prices at the expense of the public interest. Hence, this thesis investigates the stocks of new energy companies to determine whether they are less risky than other stocks.

This thesis examines stocks of three large wind energy companies: General Electric Company (GE) (Boston, Massachusetts; USA), Vestas Wind Systems A/S (VWDRY) (Aarhus; Denmark), and Pattern Energy Group Inc. (PEGI) (San Francisco, California; USA). For comparison, three stocks of non-energy companies were chosen: Apple (technology company), Amazon (online marketplace), and Facebook (social media). The data for 6 years of daily closing stock prices were obtained from the Yahoo Finance website.

The thesis compares the risks of these 6 stocks with three risk measures: variance, value at risk (VaR), and conditional value at risk (CVaR). Furthermore, investment recommendations are given based on the results of the portfolio optimisation.

The remainder of this thesis is organised as follows: Section 1 introduces basic concepts and tools from probability theory, statistics, and finance which will be used in this thesis. Section 2 reviews preliminaries on risk measures. Section 3 calculates and compares the risk of each stock using different risk measures. Section 4 introduces the sample-based portfolio optimisation model with the given risk measures. Section 5 presents selected statistical results of the sample-based portfolio optimisation with the different risk measures. Finally, Section 6 presents the conclusions and investment advice.

1.2 Basic Concepts and Tools

Here, we introduce some basic concepts and tools from probability theory, statistics and finance which will be used in this thesis.

1.2.1 Concepts from Probability Theory

Definition 1.1. Let $(\Omega, \mathcal{F}, P)$ be a probability space. A random variable $X$ is a Borel measurable function from the sample space $\Omega$ to $\mathbb{R}$:

\[ X : \Omega \to \mathbb{R} \]

(Gut, 2005, p.25).

Theorem 1. A Borel measurable function of a random variable is a random variable, viz., if $g$ is a real, Borel measurable function and $X$ is a random variable, then $Y = g(X)$ is a random variable.

Proof. See Gut, 2005, p.28.

Proposition 1.1. Suppose that $X_1, X_2, \ldots$ are random variables. The following quantities are random variables:

(a) $\max\{X_1, X_2\}$ and $\min\{X_1, X_2\}$;

(b) $\sup_n X_n$ and $\inf_n X_n$;

(c) $\limsup_{n\to\infty} X_n$ and $\liminf_{n\to\infty} X_n$.

Proof. See Gut, 2005, p.28.

Definition 1.2. Let $X$ be a real-valued random variable. The distribution function of $X$ is

\[ F(x) = P(X \le x), \quad x \in \mathbb{R}. \]

The continuity set of $F$ is

\[ C(F) = \{x : F \text{ is continuous at } x\}. \]

Whenever convenient we index a distribution function by the random variable it refers to, $F_X$, $F_Y$, and so on (Gut, 2005, p.30).

Definition 1.3. Random vectors are elements in the Euclidean space $\mathbb{R}^n$ for some $n \in \mathbb{N}$. An $n$-dimensional random vector $\mathbf{X}$ is a Borel measurable function from the sample space $\Omega$ to $\mathbb{R}^n$:

\[ \mathbf{X} : \Omega \to \mathbb{R}^n. \]

Random vectors are here considered as column vectors:

\[ \mathbf{X} = (X_1, X_2, \ldots, X_n)', \]

where $'$ denotes transpose (i.e., $\mathbf{X}'$ is a row vector). The joint distribution function of $\mathbf{X}$ is

\[ F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n) \]

for $x_k \in \mathbb{R}$, $k = 1, 2, \ldots, n$ (Gut, 2005, p.43).

Definition 1.4. Let $X$ be a one-dimensional random variable. The

(a) moments are $E X^n$, $n = 1, 2, \ldots$;

(b) central moments are $E(X - EX)^n$, $n = 1, 2, \ldots$.

The first moment, $EX$, is the mean. The second central moment is called the variance:

\[ \operatorname{Var} X = E(X - EX)^2 \ (= EX^2 - (EX)^2) \]

(Gut, 2005, p.62).
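The identity in parentheses follows from linearity of expectation. A short derivation, writing $\mu = EX$ purely as shorthand (this symbol is introduced here only for brevity):

```latex
\begin{aligned}
\operatorname{Var} X &= E(X - \mu)^2 = E\left(X^2 - 2\mu X + \mu^2\right) \\
&= EX^2 - 2\mu\,EX + \mu^2 = EX^2 - 2\mu^2 + \mu^2 = EX^2 - (EX)^2 .
\end{aligned}
```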

1.2.2 Tools from Multivariate Statistics

In statistics, the complexities of most phenomena require collecting observations on many variables. In this thesis, I consider the daily returns of six stocks. These observation data are regarded as a 6-variate population.

Definition 1.5. A single multivariate observation is the collection of measurements on $p$ variables taken on the same item or trial. If $n$ observations have been obtained, the entire data set can be placed in an $n \times p$ array (matrix):

\[
\mathbf{X}_{n \times p} =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}
\]

Each row of $\mathbf{X}$ represents a multivariate observation. Since the entire set of measurements is often one particular realization of what might have been observed, we say that the data are a 'sample' of size $n$ from a $p$-variate 'population'. The sample then consists of $n$ measurements, each of which has $p$ components (Johnson and Wichern, 2014, pp. 111-112).

Remark. For a $p$-dimensional scatter plot, the rows of $\mathbf{X}$ represent $n$ points in a $p$-dimensional space. We can write:

\[
\mathbf{X}_{n \times p} =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}
=
\begin{pmatrix}
\mathbf{x}_1' \\
\mathbf{x}_2' \\
\vdots \\
\mathbf{x}_n'
\end{pmatrix}
\]

The $j$th row vector $\mathbf{x}_j'$, representing the $j$th observation, contains the coordinates of a point (Johnson and Wichern, 2014, p.112).

Definition 1.6. Suppose the data have not been observed, but we intend to collect $n$ sets of measurements on $p$ variables. Before the measurement, their values cannot, in general, be predicted exactly. Consequently, we treat them as random variables. In this context, let the $(j, k)$-th entry in the data matrix be the random variable $X_{jk}$. Each set of measurements $\mathbf{X}_j$ on $p$ variables is a random vector, and we have the random matrix

\[
\mathbf{X}_{n \times p} =
\begin{pmatrix}
X_{11} & X_{12} & \cdots & X_{1p} \\
X_{21} & X_{22} & \cdots & X_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{np}
\end{pmatrix}
=
\begin{pmatrix}
\mathbf{X}_1' \\
\mathbf{X}_2' \\
\vdots \\
\mathbf{X}_n'
\end{pmatrix}
\]

A random sample can now be defined (Johnson and Wichern, 2014, p.119).

Remark. If the row vectors $\mathbf{X}_1', \mathbf{X}_2', \ldots, \mathbf{X}_n'$ represent independent observations from a common joint distribution with density function $f(\mathbf{x}) = f(x_1, x_2, \ldots, x_p)$, then $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ are said to form a random sample from $f(\mathbf{x})$. Mathematically, $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ form a random sample if their joint density function is given by the product $f(\mathbf{x}_1) f(\mathbf{x}_2) \cdots f(\mathbf{x}_n)$, where $f(\mathbf{x}_j) = f(x_{j1}, x_{j2}, \ldots, x_{jp})$ is the density function for the $j$th row vector (Johnson and Wichern, 2014, p.119).

In this thesis, we use moment estimation to determine the unknown parameters of the population distribution.

Definition 1.7. In statistics, moment estimation is a method of estimating population parameters. It was introduced by Pafnuty Chebyshev in 1887 (Chebyshev, 1887, pp. 805-815).

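Suppose we have to estimate unknown parameters $\theta_1, \theta_2, \ldots, \theta_k$ characterizing the distribution $f_X(x; \theta)$ of a random variable $X$, and that $X_1, X_2, \ldots, X_n$ is a sample of $X$. The first $k$ moments of the population $X$ can be expressed as functions of the $\theta$'s:

\[
\begin{aligned}
E[X] &= g_1(\theta_1, \theta_2, \ldots, \theta_k), \\
E[X^2] &= g_2(\theta_1, \theta_2, \ldots, \theta_k), \\
&\ \ \vdots \\
E[X^k] &= g_k(\theta_1, \theta_2, \ldots, \theta_k).
\end{aligned}
\]

Since $X_1, X_2, \ldots, X_n$ is a sample of size $n$, for $j = 1, \ldots, k$, let

\[ \hat{\mu}_j = \frac{1}{n} \sum_{i=1}^{n} X_i^j \]

be the $j$-th sample moment, an estimate of $E[X^j]$. The method of moments estimator for $\theta_1, \theta_2, \ldots, \theta_k$, denoted by $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k$, is defined as the solution (if there is one) to the equations (Bowman and Shenton, 1998, pp. 2092-2098):

\[
\begin{aligned}
\hat{\mu}_1 &= g_1(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k), \\
\hat{\mu}_2 &= g_2(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k), \\
&\ \ \vdots \\
\hat{\mu}_k &= g_k(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k).
\end{aligned}
\]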
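The system of moment equations can be illustrated with a small sketch. It assumes, purely for illustration, a normal population with parameters θ1 = µ and θ2 = σ², for which E[X] = µ and E[X²] = σ² + µ²; the function names and the five sample values are hypothetical, and the code is written in Python rather than the MATLAB used for the thesis calculations.

```python
def sample_moment(xs, j):
    """j-th sample moment: (1/n) * sum of x_i ** j."""
    return sum(x ** j for x in xs) / len(xs)


def normal_method_of_moments(xs):
    """Solve m1 = mu and m2 = sigma^2 + mu^2 for (mu, sigma^2).

    Assumes a normal population; this choice is illustrative only.
    """
    m1 = sample_moment(xs, 1)
    m2 = sample_moment(xs, 2)
    return m1, m2 - m1 ** 2


# Hypothetical daily returns, not data from the thesis.
sample = [0.01, -0.02, 0.015, 0.0, -0.005]
mu_hat, sigma2_hat = normal_method_of_moments(sample)
print(mu_hat, sigma2_hat)
```

Note that the resulting variance estimate divides by n, not n − 1, so it is the biased estimator discussed in Theorem 2.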

In statistics, the bias (or bias function) of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. Bias is an objective property of an estimator.

Definition 1.8. Suppose the distribution of a random variable $X$ has an unknown parameter $\theta$, and $X_1, X_2, \ldots, X_n$ is a sample from this population. Assume $\hat{\theta}$ is an estimator of $\theta$. If $E[\hat{\theta}] = \theta$, then the estimator $\hat{\theta}$ is said to be unbiased (Keener, 2010, pp. 61-62).

Theorem 2. Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ be a random sample from a joint distribution that has mean vector $\mu$ and covariance matrix $\Sigma$. Then $\bar{\mathbf{X}}$ is an unbiased estimator of $\mu$, and its covariance matrix is $\frac{1}{n}\Sigma$. That is,

\[ E(\bar{\mathbf{X}}) = \mu, \qquad \operatorname{Cov}(\bar{\mathbf{X}}) = \frac{1}{n}\Sigma, \]

where $\Sigma$ is the population variance-covariance matrix.

For the sample variance-covariance matrix $S_n$,

\[ E(S_n) = \frac{n-1}{n}\Sigma = \Sigma - \frac{1}{n}\Sigma. \]

Thus,

\[ E\left(\frac{n}{n-1} S_n\right) = \Sigma, \]

so $[n/(n-1)]S_n$ is an unbiased estimator of $\Sigma$, while $S_n$ is a biased estimator with bias $= E(S_n) - \Sigma = -(1/n)\Sigma$ (Johnson and Wichern, 2014, p.121).

Proof. See Johnson and Wichern, 2014, pp.121-122.
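The relation between the two normalisations in Theorem 2 can be checked numerically. The sketch below is a plain Python illustration (the thesis itself used MATLAB); the helper names and the four two-dimensional observations are hypothetical.

```python
def mean_vector(rows):
    """Column-wise sample mean of an n x p data matrix."""
    n = len(rows)
    return [sum(row[k] for row in rows) / n for k in range(len(rows[0]))]


def covariance_matrix(rows, unbiased=True):
    """Sample covariance matrix; divisor n - 1 (unbiased) or n (S_n)."""
    n, p = len(rows), len(rows[0])
    xbar = mean_vector(rows)
    divisor = n - 1 if unbiased else n
    return [
        [sum((r[a] - xbar[a]) * (r[b] - xbar[b]) for r in rows) / divisor
         for b in range(p)]
        for a in range(p)
    ]


# Hypothetical observations: n = 4 rows, p = 2 variables.
data = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
S = covariance_matrix(data, unbiased=True)     # divisor n - 1
S_n = covariance_matrix(data, unbiased=False)  # divisor n
print(S, S_n)
```

For this data S_n equals (n − 1)/n times S entry by entry, which is the scaling that produces the bias −(1/n)Σ in expectation.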

Remark. An unbiased sample covariance matrix for $\Sigma$ is

\[ S = \frac{n}{n-1} S_n = \frac{1}{n-1} \sum_{j=1}^{n} (\mathbf{X}_j - \bar{\mathbf{X}})(\mathbf{X}_j - \bar{\mathbf{X}})' \]

(Johnson and Wichern, 2014, p.123).

In statistics, one parameter may have two or more unbiased estimators. "Efficiency" is another objective property of unbiased estimators.

Definition 1.9. If $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of a parameter $\theta$ and $\operatorname{Var}(\hat{\theta}_1) \le \operatorname{Var}(\hat{\theta}_2)$, then the estimator $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$.

Moreover, if $\operatorname{Var}(\hat{\theta}_0) \le \operatorname{Var}(\hat{\theta}^*)$ for any competing unbiased estimator $\hat{\theta}^*$, then the estimator $\hat{\theta}_0$ is the minimum variance unbiased estimator of $\theta$ (Keener, 2010, pp. 61-62).

1.2.3 Concepts from Finance

This thesis is based on the assumption of strong-form efficient markets, which are defined as follows.

Definition 1.10. In strong-form efficient markets, share prices reflect all information, public and private, and no one can earn excess returns. Strong-form efficiency requires a market in which investors cannot consistently earn excess returns over a long period of time (Fama, 1970).

To eliminate the influence of volume, we use daily return of stock prices. The definition of return is given below:

Definition 1.11. The return at time $t$, denoted by $R_t$, is defined by

\[ R_t = \frac{P_t - P_{t-1}}{P_{t-1}}, \]

where $P_t$ is the stock price at time $t$. The returns can be daily, weekly, monthly etc. (West, 2006, p.3).
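As a minimal sketch of Definition 1.11, the function below turns a list of closing prices into simple returns. It is written in Python for illustration (the thesis calculations were done in MATLAB), and the three prices are hypothetical values, not data from the thesis.

```python
def simple_returns(prices):
    """R_t = (P_t - P_{t-1}) / P_{t-1} for consecutive prices."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


# Hypothetical closing prices on three consecutive trading days.
prices = [100.0, 102.0, 99.96]
returns = simple_returns(prices)
print(returns)
```

A series of n prices always yields n − 1 returns, because the first price has no predecessor.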

Definition 1.12. A portfolio consists of a number of different stocks $X_1, X_2, \ldots, X_N$. The return $R_i$ of a stock $X_i$ can be considered as a random variable. For a given expected return $\mu_0$, it is desirable to find the linear combination $\omega' R$ such that risk is minimized. The vector $\omega = [\omega_1, \omega_2, \ldots, \omega_N]$, where $\omega_1 + \omega_2 + \cdots + \omega_N = 1$, represents the weights and gives the proportion of the value of the portfolio invested in the individual stocks.

Figure 1: The minimum variance line and efficient frontier.

Definition 1.13. The negative weights represent short-selling. In finance, a short sale (also known as a short, shorting, or going short) is the sale of an asset (securities or other financial instruments) that the seller borrows to profit from a subsequent fall in the price of the asset. After borrowing the asset, the short seller sells it to a buyer at the market price at that time. Subsequently, the resulting short position is ’covered’ when the seller repurchases the same asset (i.e. an instrument of the same type) in a market transaction and delivers the purchased asset back to the lender to replace the asset that was initially borrowed. In the event of an interim price decline, the short seller profits, since the cost of (re)purchase is less than the proceeds received from the initial (short) sale. Conversely, the short position will result in a loss if the price of a shorted asset rises prior to repurchase (Fleckner, 2015).

Remark. Selecting an optimal portfolio is based on the assumption that rational investors will be willing to accept a higher risk only if the expected return is higher, or equivalently, given a certain risk a rational investor would want the expected return to be as high as possible. Risk is undesirable, but expected return is desirable (Markowitz, 1952, p.77).

Since it is necessary to quantify risk, it is desirable that the risk measure considers the spread of returns and the probabilities of attaining those values. Therefore, we introduce several risk measures in the next section.

Take quadratic optimization as an example. We take different mean returns µ0, calculate the minimum risk σ and the corresponding weights ω = [ω1, ω2, ..., ωN ].

Then we plot each given expected return $\mu_0$ against the corresponding risk $\sigma$ as a point. The series of points traces out the minimum variance curve. From Figure 1 we see that, for a given risk, a rational investor will always choose the highest return. Hence the upper half of the curve is the efficient frontier (Capinski and Zastawniak, 2011, pp. 74-78).

2 Risk Measures

In financial mathematics and stochastic optimization, the concept of risk measure is used to quantify the risk involved in a random outcome or risk position. Many risk measures have been proposed, each having certain characteristics.

Definition 2.1. Consider a linear vector space X of random variables X representing the values at time 1 of portfolios. We denote ρ to be a function that assigns a real number (or +∞) to each X in X, representing the measure of the risk of X. The number ρ(X) is interpreted as the minimum capital that needs to be added to the portfolio at time 0 and invested in the reference instrument to make the position acceptable. If ρ(X) ≤ 0, then X is the value at time 1 of an acceptable portfolio and no capital needs to be added. In principle, a risk measure ρ could assign different values to two equally distributed future portfolio values X1 and X2 (Hult, et al., 2010, p.160).

Many mathematicians, economists, and financiers have specified general properties for suitable risk measures. The most important properties are translation invariance, monotonicity, subadditivity, and positive homogeneity. The risk measures that have all four properties are axiomatically defined by Artzner et al. (1999) as coherent risk measures:

1) Translation invariance: ρ(X + c) = ρ(X) + c

2) Subadditivity: ρ(X1 + X2) ≤ ρ(X1) + ρ(X2)

3) Monotonicity: If X1 ≤ X2, then ρ(X1) ≤ ρ(X2)

4) Positive homogeneity: ρ(λX) = λρ(X), for λ > 0

2.1 Variance

In classical mean-variance risk analysis of returns, introduced by Markowitz (1952), the most basic risk measure is the standard deviation, since this measures how far a set of (random) numbers are spread from the average value.

Although the standard deviation is a risk measure, it has some drawbacks. The variance or standard deviation performs well in describing the risk of portfolios with normally distributed returns, but it is less convincing in the non-normal situation. That is, it symmetrically penalizes profit and loss while risk is an asymmetric phenomenon (Ahmadi-Javid, 2012, p.2). Mean-variance utility does not distinguish between gains and losses. In fact, investors often find the pain of losses to be much greater than the satisfaction that they get from gains. In other words, investors typically exhibit loss aversion. This suggests, for example, that they should be getting different marginal utilities from another dollar of loss versus another dollar of gain. The following two risk measures, value at risk and conditional value at risk, differ from the mean-variance risk measure in that they emphasize losses.

2.2 Value at Risk (VaR)

Definition 2.2. The value-at-risk $VaR_\beta(X)$ at level $\beta \in (0, 1)$ of a portfolio with value $X$ at time 1 is defined as

\[ VaR_\beta(X) = \min\{m : P(X \le m) \ge 1 - \beta\}. \]

With $F_X(m)$ denoting the distribution function of a continuous random variable $X$, we can rewrite $VaR_\beta(X)$ as $VaR_\beta(X) = F_X^{-1}(1 - \beta)$.

This means that $VaR_\beta(X)$ is nothing but the $(1 - \beta)$-quantile of a continuous random variable $X$ (Hult, et al., 2010, pp. 165-166).

Although $VaR_\beta(X)$ is translation invariant, monotone and positive homogeneous, it is not sub-additive and therefore not a coherent measure of risk (Hult, et al., 2010, pp. 176-177).

Furthermore, VaR is numerically unstable, and optimization models using VaR are awkward (intractable) in high dimensions (Sarykalin et al., 2008).

2.3 Conditional Value at Risk (CVaR)

Definition 2.3. Conditional value at risk (CVaR) is also called expected shortfall (ES), average value at risk (AVaR), or expected tail loss (ETL). It is defined by

\[ CVaR_\beta(X) = \frac{1}{1 - \beta} \int_{\beta}^{1} VaR_u(X)\, du. \]

Conditional value at risk is often considered to be a better measure of risk than value at risk since it considers the entire left tail of the distribution of $X$, while VaR is just a single quantile, which means that it ignores the left tail of the distribution of $X$ beyond that level. In particular, VaR allows a careless or dishonest risk manager to miss or hide unlikely but catastrophic risks in the left tail.

Conditional value at risk inherits the properties of VaR and is also sub-additive, and, therefore, a coherent measure of risk (Hult, et al., 2010, pps.189-190).

3 Empirical Result of Risk Comparison

3.1 Empirical Analysis of Data

The historical data of stock prices were collected from Yahoo Finance. The data consist of daily closing prices of six stocks: Amazon, Apple, Facebook, General Electric Company (GE), Vestas Wind Systems A/S (VWDRY), and Pattern Energy Group Inc. (PEGI), from 25 September 2013 to 25 February 2019. The sample covers 1631 working days. Definition 1.11 was used to obtain 1630 daily returns of each stock for the optimizations. MATLAB was used for all the calculations.

Since the marginal data seem to be quite symmetric and close to bell-shaped, like the normal distribution, but with heavier tails, the marginal data were fitted with the Student t distribution in MATLAB (with the code fitdist(Returns,'tlocationscale')). The marginal data fitted the Student t distribution very well, with degrees of freedom $\nu_1 = 2.86$, $\nu_2 = 3.27$, $\nu_3 = 3.15$, $\nu_4 = 2.55$, $\nu_5 = 3.83$, $\nu_6 = 3.18$, respectively.

Figure 2: Kernel densities and fitted Student t densities of the six stock returns.

Then the data were fitted with the normal distribution in MATLAB. In Figure 3 we can see that the return distributions of all six stocks have sharper peaks than the normal distribution.

Figure 3: Kernel densities and fitted normal densities of returns.

The skewness and kurtosis for all six returns are tabulated in Table 1.

| Stocks | Amazon | Apple | Facebook | GE | PEGI | VWDRY |
|---|---|---|---|---|---|---|
| Skewness | 0.5315 | -0.1725 | -0.0446 | 0.4184 | -0.1289 | 0.2059 |
| Kurtosis | 12.0185 | 7.6993 | 16.2051 | 10.5705 | 5.8056 | 9.8115 |

Table 1: The skewness and kurtosis of the returns.

Figure 4: QQplots of the six stock returns.

Since the skewness and kurtosis of a Gaussian distribution are 0 and 3, respectively, the values in Table 1 show that none of the return distributions meets the condition for being Gaussian. The QQ-plots lead to the same conclusion.
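The comparison against the Gaussian reference values 0 and 3 can be sketched in a few lines. The code below uses the moment-based definitions (third and fourth central moments scaled by powers of the second); that this matches the exact estimator behind Table 1 is an assumption, and both samples are hypothetical. It is written in Python for illustration rather than the MATLAB used in the thesis.

```python
def central_moment(xs, k):
    """k-th central sample moment with divisor n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Third central moment divided by the variance to the power 3/2."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs):
    """Fourth central moment divided by the squared variance (not excess)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


# Hypothetical samples: one symmetric, one with a single large positive value.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_tailed = [-1.0, -1.0, -1.0, -1.0, 4.0]
print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tailed), kurtosis(right_tailed))
```

A kurtosis above 3, as reported for all six stocks, indicates heavier tails than the normal distribution under this definition.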

3.2 Empirical Results

To compare the riskiness of wind energy stocks and non-energy company stocks directly, the variance, value at risk and conditional value at risk of the daily returns of each stock are shown in Table 2. Under the sample-based setting, the daily returns generate a collection of six vectors $R_1, R_2, \ldots, R_6$. Then the corresponding approximation to $VaR_\beta$ can be written as follows:

\[ \widehat{VaR}_\beta = \operatorname{quantile}(R_i; 1 - \beta), \quad \text{where } i = 1, 2, 3, \ldots, 6. \]

The approximation of $CVaR_\beta$ can be written as:

\[ \hat{F}_\beta(c) = c + \frac{1}{1 - \beta} \cdot \frac{1}{n} \cdot \sum \left[ (R_i - c)_+ \right], \]

where $c$ is the $VaR_\beta$ of the stock, $n$ is the sample size, and the sum runs over the $n$ observed daily returns of the stock (Rockafellar and Uryasev, 2000, p.21).
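The three sample-based risk measures can be sketched for a single return series as follows. The sketch makes its conventions explicit, and they are assumptions rather than the thesis's exact procedure: the measures are applied to losses (negative returns), VaR is taken as the empirical β-quantile of the loss as in Rockafellar and Uryasev (2000), and CVaR is obtained by evaluating the function F̂ at c = VaR. All function names and the ten returns are hypothetical, and the code is Python rather than the MATLAB used in the thesis.

```python
import math


def sample_variance(xs):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def empirical_quantile(xs, p):
    """Smallest sample value whose empirical CDF is at least p."""
    ordered = sorted(xs)
    index = max(math.ceil(p * len(ordered)) - 1, 0)
    return ordered[index]


def f_hat(losses, beta, c):
    """c + 1 / ((1 - beta) * n) * sum of (loss - c)_+ over the sample."""
    excess = sum(max(x - c, 0.0) for x in losses)
    return c + excess / ((1.0 - beta) * len(losses))


def risk_measures(returns, beta):
    """Return (variance, VaR, CVaR) under the loss convention above."""
    losses = [-r for r in returns]
    var_beta = empirical_quantile(losses, beta)
    return sample_variance(returns), var_beta, f_hat(losses, beta, var_beta)


# Hypothetical daily returns, not data from the thesis.
returns = [0.03, 0.02, 0.01, 0.0, -0.005, -0.01, -0.015, -0.02, -0.03, -0.05]
variance, var_75, cvar_75 = risk_measures(returns, beta=0.75)
print(variance, var_75, cvar_75)
```

By construction the CVaR estimate is never smaller than the VaR estimate, since the excess term is non-negative.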

| Stocks | Amazon | Apple | Facebook | GE | PEGI | VWDRY |
|---|---|---|---|---|---|---|
| Var (×10^-3) | 0.3837 | 0.2334 | 0.3761 | 0.2542 | 0.3758 | 0.5595 |
| VaRβ | 0.0178 | 0.0159 | 0.0190 | 0.0167 | 0.0236 | 0.0229 |
| CVaRβ | 0.0358 | 0.0279 | 0.0351 | 0.0283 | 0.0346 | 0.0454 |

Table 2: The various risk measurement results of the returns. Here β = 0.9.

According to Table 2, the risks of the wind energy company stocks do not seem to be lower than those of the other companies' stocks. In particular, the stock of Vestas Wind Systems has the highest variance and the highest CVaR of the six stocks.

4 Methodology of Portfolio Optimization

4.1 Mean-Variance Optimization

Let $R = (R_1, R_2, \ldots, R_n)$ denote the return vector of a portfolio consisting of $n$ assets at a future time 1. We now turn to two versions of a portfolio investment problem.

In the first version, the maximization-of-expectation problem, one seeks the optimal weight vector $\omega$ to maximize the expected portfolio value given the risk (variance) constraint $\sigma_0^2$:

\[
\begin{aligned}
&\text{maximize} && \omega' R \\
&\text{subject to} && \omega' \Sigma \omega \le \sigma_0^2, \quad \omega' \mathbf{1} = 1.
\end{aligned}
\]

The second version of the investment problem, the minimization-of-variance problem, seeks the optimal weight vector $\omega$ that has minimum risk (variance) given a lower bound $\mu_0$ on the expected value:

\[
\begin{aligned}
&\text{minimize} && \omega' \Sigma \omega \\
&\text{subject to} && \omega' R \ge \mu_0, \quad \omega' \mathbf{1} = 1,
\end{aligned}
\]

where $\Sigma$ is the covariance matrix and $\omega$ is the weight vector of the portfolio, that is, the proportions invested in each stock.

In this thesis only the second version of the investment optimization is used.

4.2 Value at Risk Optimization

The loss of a portfolio is given by

\[ L = L(\omega, R) = -\sum_{k=1}^{n} \omega_k R_k = -\omega' R. \]

The value at risk optimization can be formulated as follows.

The minimization-of-VaR problem seeks the optimal weight vector $\omega$ that has minimum risk (value at risk) given a lower bound $\mu_0$ on the expected value:

\[
\begin{aligned}
&\text{minimize} && VaR_\beta(\omega) \\
&\text{subject to} && \omega' R \ge \mu_0, \quad \omega' \mathbf{1} = 1,
\end{aligned}
\]

where $VaR_\beta(\omega)$ is the $(1 - \beta)$-quantile of the loss $L$ when the weight vector is $\omega$.

4.3 Conditional Value at Risk Optimization

Theorem 3. Assume $L(\omega, R)$ has a continuous distribution function. Then we have

\[ CVaR_\beta(\omega) = E[L(\omega, R) \mid L(\omega, R) \ge VaR_\beta(\omega)]. \tag{1} \]

Proof. See Hult, et al., 2010, pp.179-181.

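For $c \ge 0$, we define the function $F_\beta(\omega, c)$ by:

\[ F_\beta(\omega, c) = c + \frac{1}{1 - \beta} E[(L(\omega, R) - c)_+], \tag{2} \]

where

\[
(L(\omega, R) - c)_+ =
\begin{cases}
L(\omega, R) - c, & L(\omega, R) \ge c, \\
0, & L(\omega, R) < c
\end{cases}
\]

(Rockafellar and Uryasev, 2000, p.21).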

Theorem 4. Let $L(\omega, R)$ be a continuous random variable. Then $CVaR_\beta(\omega)$ and $VaR_\beta(\omega)$ are given by the equations

\[ CVaR_\beta(\omega) = \min_{c} F_\beta(\omega, c), \tag{3} \]

\[ VaR_\beta(\omega) = \arg\min_{c} F_\beta(\omega, c), \tag{4} \]

where argmin stands for argument of the minimum. In mathematics, the arguments of the minima are the points, or elements, of the domain of some function at which the function values are minimized. In contrast to the global minima, which refer to the smallest outputs of a function, argmin refers to the inputs or arguments at which the function outputs are as small as possible.

A proof of Theorem 4 is given in Xue (2016, p.13), but with the aim of presenting a self-contained thesis a proof is included.

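Proof. To prove equation (4), we compute

\[
\frac{\partial}{\partial c} F_\beta(\omega, c)
= \frac{\partial}{\partial c} \left\{ c + \frac{1}{1 - \beta} E[(L(\omega, R) - c)\, I\{L(\omega, R) \ge c\}] \right\}
= 1 - \frac{1}{1 - \beta} E[I\{L(\omega, R) \ge c\}] = A.
\]

According to the definition of expectation, we have:

\[ E[I\{L(\omega, R) \ge c\}] = P(L(\omega, R) \ge c). \]

According to the definition of value at risk, for a continuous loss distribution we have:

\[ 1 - \beta = P(L(\omega, R) \ge VaR_\beta(\omega)). \]

Then we have:

\[ A = 1 - \frac{P(L(\omega, R) \ge c)}{P(L(\omega, R) \ge VaR_\beta(\omega))}. \]

If we set $c = VaR_\beta(\omega)$, we have:

\[ A = 1 - \frac{P(L(\omega, R) \ge VaR_\beta(\omega))}{P(L(\omega, R) \ge VaR_\beta(\omega))} = 0, \]

so we get that $\frac{\partial}{\partial c} F_\beta(\omega, c) = 0$ when $c = VaR_\beta(\omega)$.

Since for any constant $l(\omega, R)$, $c \mapsto (l(\omega, R) - c)_+$ is convex, $c \mapsto F_\beta(\omega, c)$ is convex, which means that the minimum of $F_\beta(\omega, c)$ is attained when $\frac{\partial}{\partial c} F_\beta(\omega, c) = 0$.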

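To prove equation (3), notice that

\[
\begin{aligned}
F_\beta(\omega, VaR_\beta(\omega))
&= VaR_\beta(\omega) + \frac{1}{1 - \beta} E[(L(\omega, R) - VaR_\beta(\omega))_+] \\
&= \frac{VaR_\beta(\omega)(1 - \beta)}{1 - \beta} + \frac{1}{1 - \beta} E[(L(\omega, R) - VaR_\beta(\omega))_+] \\
&= \frac{VaR_\beta(\omega)\, P(L(\omega, R) \ge VaR_\beta(\omega))}{1 - \beta} + \frac{1}{1 - \beta} E[(L(\omega, R) - VaR_\beta(\omega))\, I\{L(\omega, R) \ge VaR_\beta(\omega)\}] \\
&= \frac{VaR_\beta(\omega)\, E[I\{L(\omega, R) \ge VaR_\beta(\omega)\}]}{1 - \beta} + \frac{1}{1 - \beta} E[(L(\omega, R) - VaR_\beta(\omega))\, I\{L(\omega, R) \ge VaR_\beta(\omega)\}] \\
&= \frac{VaR_\beta(\omega)\, E[I\{L(\omega, R) \ge VaR_\beta(\omega)\}]}{1 - \beta} + \frac{E[L(\omega, R)\, I\{L(\omega, R) \ge VaR_\beta(\omega)\}]}{1 - \beta} - \frac{VaR_\beta(\omega)\, E[I\{L(\omega, R) \ge VaR_\beta(\omega)\}]}{1 - \beta} \\
&= \frac{E[L(\omega, R)\, I\{L(\omega, R) \ge VaR_\beta(\omega)\}]}{1 - \beta} \\
&= \frac{E[L(\omega, R)\, I\{L(\omega, R) \ge VaR_\beta(\omega)\}]}{P(L(\omega, R) \ge VaR_\beta(\omega))} \\
&= E[L(\omega, R) \mid L(\omega, R) \ge VaR_\beta(\omega)],
\end{aligned}
\]

which is the same as $CVaR_\beta(\omega)$ according to Theorem 3.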
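Equations (3) and (4) can be explored numerically on a finite sample. The sketch below replaces the expectation by a sample average, so the function is piecewise linear and convex in c and its minimum is attained at one of the sample points; it therefore searches only over those points. The theorem itself assumes a continuous distribution, so this is an illustration under a discrete approximation, not a verification. The loss values and names are hypothetical, and the code is Python rather than the MATLAB used in the thesis.

```python
def f_hat(losses, beta, c):
    """Sample version of F_beta: c + sum of (loss - c)_+ / ((1 - beta) * n)."""
    excess = sum(max(x - c, 0.0) for x in losses)
    return c + excess / ((1.0 - beta) * len(losses))


# Hypothetical portfolio losses, not data from the thesis.
losses = [-0.03, -0.02, -0.01, 0.0, 0.005, 0.01, 0.015, 0.02, 0.03, 0.05]
beta = 0.75

# The minimizer over c plays the role of VaR (equation (4)),
# and the minimum value plays the role of CVaR (equation (3)).
c_star = min(sorted(losses), key=lambda c: f_hat(losses, beta, c))
cvar_estimate = f_hat(losses, beta, c_star)
print(c_star, cvar_estimate)
```

For this sample the minimizer coincides with the empirical 0.75-quantile of the losses, which is the behaviour equation (4) describes.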

Theorem 5. Minimizing $CVaR_\beta(\omega)$ over $\omega$ is equivalent to minimizing $F_\beta(\omega, c)$ over all of $(\omega, c)$, in the sense that

\[ \min_{\omega} CVaR_\beta(\omega) = \min_{\omega, c} F_\beta(\omega, c). \]

Proof. From Theorem 4 we know

\[ CVaR_\beta(\omega) = \min_{c} F_\beta(\omega, c). \]

$F_\beta(\omega, c)$ can be minimized with respect to $(\omega, c)$ by first minimizing over $c$ for fixed $\omega$ and then minimizing the result over $\omega$.

Recall that $F_\beta(\omega, c)$ is convex with respect to $(\omega, c)$ whenever the expectation of $(L(\omega, R) - c)_+$ in the formula

\[ F_\beta(\omega, c) = c + \frac{1}{1 - \beta} E[(L(\omega, R) - c)_+] \]

is convex with respect to $(\omega, c)$.

This holds since, for fixed $R$, $L(\omega, R)$ is convex with respect to $\omega$. Furthermore, for fixed $c$, $\omega \mapsto (l(\omega, R) - c)_+$ is convex. On the other hand, for any constant $l(\omega, R)$, $c \mapsto (l(\omega, R) - c)_+$ is convex. Further, $c \mapsto F_\beta(\omega, c)$ is convex. Thus, $E[(L(\omega, R) - c)_+]$ is convex with respect to $(\omega, c)$.

The convexity of $CVaR_\beta(\omega)$ follows from the definition since $L(\omega, R)$ is convex with respect to $\omega$.

Also in Xue (2016, p.13), a similar proof can be found.

Finally we have the conditional value at risk optimization as a minimization-of-CVaR problem, which seeks the optimal ω vector that has minimum risk (conditional value at risk) given a lower bound on the expected value µ0.

\[
\begin{aligned}
&\text{minimize} && CVaR_\beta(\omega) \\
&\text{subject to} && \omega' R \ge \mu_0, \quad \omega' \mathbf{1} = 1.
\end{aligned}
\]

5 Empirical Results of Portfolio Optimization

To give financial advice to potential investors in new energy companies, we combine the six stocks into one portfolio and compare the weights of the different types of companies in the optimal portfolio.

In this empirical analysis, all the optimizations are run under the assumption that short selling is allowed in the stock market, which means that in the constraints of the optimizations the domain of each $\omega_i$ is set to $(-\infty, +\infty)$.

5.1 Mean-Variance Optimization

5.1.1 Algorithm Analysis

In this thesis, the sample mean vector and sample covariance matrix of the data are obtained with mean and cov in MATLAB. Both are unbiased estimators. The unbiasedness of cov follows from Theorem 2 and from the fact that the normalising constant of cov is n − 1.

Further, the sample mean and sample covariance are the most efficient estimators (minimum variance unbiased estimators) of µ and Σ if the observations are normally distributed.

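In this thesis, we assume that both long and short positions are allowed. Then the investment problem for quadratic optimization can be formulated as the following optimization problem:

\[
\begin{aligned}
&\text{minimize} && \omega' \Sigma \omega \\
&\text{subject to} && \omega' R = \mu_0, \quad \omega' \mathbf{1} = 1.
\end{aligned}
\]

The optimal solution $\omega$ to this investment problem is

\[ \omega = \mu_0 \frac{\Sigma^{-1} \mu}{\mu' \Sigma^{-1} \mu}. \]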

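Proof. Here the equations for quadratic optimization are applied:

\[ \Sigma \omega - \lambda_1 \mu = 0, \qquad \omega' \mu = \mu_0, \qquad \omega' \mathbf{1} = 1. \]

The first equation yields $\omega = \lambda_1 \Sigma^{-1} \mu$. Inserting the expression for $\omega$ in the second equation gives

\[ \lambda_1 = \frac{\mu_0}{\mu' \Sigma^{-1} \mu}. \]

Thus,

\[ \omega = \mu_0 \frac{\Sigma^{-1} \mu}{\mu' \Sigma^{-1} \mu} \]

(Hult, et al., 2010, p.98).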

Suppose the expected return is $\mu_0 = 0.002$. To compute $\omega$ using the previous formula, we calculate the various components as:

\[
\Sigma^{-1} =
\begin{pmatrix}
3.9899 & -1.1975 & -1.5992 & -0.2991 & -0.0838 & -0.3042 \\
-1.1975 & 5.6683 & -1.0003 & -0.7401 & 0.0611 & -0.0499 \\
-1.5992 & -1.0003 & 4.0321 & -0.4525 & 0.0004 & -0.3375 \\
-0.2991 & -0.7401 & -0.4525 & 4.4506 & 0.2074 & -0.3202 \\
-0.0838 & 0.0611 & 0.0004 & 0.2074 & 2.6747 & -0.0690 \\
-0.3042 & -0.0499 & -0.3375 & -0.3202 & -0.0690 & 1.9600
\end{pmatrix}
\cdot 10^{3}
\]

\[ \Sigma^{-1} \cdot \mu = \begin{pmatrix} 2.7284 & 2.0610 & 1.0283 & -4.1770 & 0.0427 & 1.7967 \end{pmatrix}' \]

\[ \mu' \cdot \Sigma^{-1} \cdot \mu = 0.0110 \]

\[ \omega = \begin{pmatrix} 0.4961 & 0.3747 & 0.1870 & -0.7594 & 0.0078 & 0.3267 \end{pmatrix}'. \]
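The last step of this calculation is plain arithmetic and can be retraced from the printed components. The sketch below, in Python rather than the MATLAB used in the thesis, takes the rounded values of Σ⁻¹µ and µ'Σ⁻¹µ exactly as printed above; the variable names are hypothetical.

```python
# Rounded components as printed in the text.
sigma_inv_mu = [2.7284, 2.0610, 1.0283, -4.1770, 0.0427, 1.7967]
quad_form = 0.0110   # mu' * Sigma^{-1} * mu
mu_0 = 0.002         # target expected return

# omega = mu_0 * Sigma^{-1} mu / (mu' Sigma^{-1} mu), component by component.
weights = [mu_0 * v / quad_form for v in sigma_inv_mu]
weight_sum = sum(weights)
print([round(w, 4) for w in weights], round(weight_sum, 4))
```

The recomputed weights agree with the printed ω up to the rounding of the inputs. With these inputs the weights sum to roughly 0.63 rather than 1, which reflects that this closed form uses only the multiplier of the return constraint; enforcing the budget constraint ω'1 = 1 as well would need its own multiplier or a numerical solver.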

In this way, I choose 200 values of µ0 between -0.01 and 0.01 and plot the resulting points as a smooth curve, which is the minimum variance curve. The same method is used in the other two optimizations.

As an alternative, I use quadprog, a solver for quadratic objective functions with linear constraints, to find the weight vectors ω that make the variance of the portfolio smallest, and plot the minimum variance curve of the optimal portfolio in MATLAB. The plot is shown in Section 5.1.2. For the different risk measures, I also select three values of µ0 and report the corresponding weights ω in tables.

5.1.2 Results of MV Optimization:

Three points on the efficient frontier are selected by setting µ1 = 0.001, µ2 = 0.003, µ3 = 0.005. The corresponding three weight vectors are:

| Stocks | Amazon | Apple | Facebook | GE | PEGI | VWDRY |
|---|---|---|---|---|---|---|
| Weights1 | 0.2225 | 0.3412 | 0.1170 | -0.0749 | 0.2077 | 0.1866 |
| Weights2 | 0.7448 | 0.5747 | 0.2830 | -1.1217 | 0.0259 | 0.4933 |
| Weights3 | 1.2672 | 0.8082 | 0.4490 | -2.1685 | -0.1559 | 0.8000 |

Table 3: The weights of MV optimization for given µ.

Figure 5 is a picture of the minimum variance line for MV optimization.

Figure 5: The minimum variance line for MV optimization.

From Table 3, we can see that it is optimal to take a short position in GE for all three target returns, while the other stocks are held long, except for a small short position in PEGI at µ3 = 0.005.
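The reading of Table 3 can be retraced mechanically. The sketch below, in Python rather than the MATLAB used in the thesis, copies the weights from Table 3, checks the budget constraint for each target return, and lists the stocks with negative weights; the variable and function names are hypothetical.

```python
stocks = ["Amazon", "Apple", "Facebook", "GE", "PEGI", "VWDRY"]

# Weights copied from Table 3, keyed by the target return mu.
table_3 = {
    0.001: [0.2225, 0.3412, 0.1170, -0.0749, 0.2077, 0.1866],
    0.003: [0.7448, 0.5747, 0.2830, -1.1217, 0.0259, 0.4933],
    0.005: [1.2672, 0.8082, 0.4490, -2.1685, -0.1559, 0.8000],
}


def short_positions(weights):
    """Names of the stocks held with a negative weight."""
    return [name for name, w in zip(stocks, weights) if w < 0]


budget = {mu: sum(w) for mu, w in table_3.items()}
shorts = {mu: short_positions(w) for mu, w in table_3.items()}
print(budget)
print(shorts)
```

Each weight vector sums to one up to rounding, and the size of the GE short position grows with the target return.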

5.2 VaR Optimization

5.2.1 Algorithm Analysis:

After calculating the mean vector and covariance matrix, we find the $(1 - \beta)$-quantile of the loss $L$. The function x = fmincon(fun, ω0, A, b) is a nonlinear programming solver which starts at $\omega_0$ and attempts to find a minimizer $\omega$ of the function $F_L^{-1}(1 - \beta)$ described in fun, subject to the linear constraints $\omega' R \ge \mu_0$ and $\omega' \mathbf{1} = 1$.

5.2.2 Results of VaR Optimization:

For this optimization, I select three values β = 0.9, β = 0.95 and β = 0.99, as well as three points from each efficient frontier by setting µ1 = 0.001, µ2 = 0.003, µ3 = 0.005. Then I get the corresponding three groups of tables and pictures:

If we set β = 0.9, µ1 = 0.001, µ2 = 0.003, µ3 = 0.005, then the weights are:

Stocks     Amazon    Apple     Facebook  GE        PEGI      VWDRY
Weights1    0.3072    0.1781    0.1855   -0.0104   -0.1729    0.1668
Weights2    0.7804    0.2021    0.4906   -0.8296   -0.3088    0.6653
Weights3    2.1872    0.6577   -1.0436   -2.1499    0.2595    1.0890

Table 4: The weights of VaR optimization for given µ and β = 0.9.

Figure 6: The minimum variance line for VaR optimization when β = 0.9.

If we set β = 0.95, µ1 = 0.001, µ2 = 0.003, µ3 = 0.005, then the weights are:

Stocks     Amazon    Apple     Facebook  GE        PEGI      VWDRY
Weights1    0.1813    0.1553    0.2050   -0.0412   -0.2106    0.2891
Weights2    0.8008    0.3506    0.4831   -0.8640   -0.3121    0.5417
Weights3    1.1133    0.4199    0.8547   -1.9187   -0.4776    1.0085

Table 5: The weights of VaR optimization for given µ and β = 0.95.

Figure 7: The minimum variance line for VaR optimization when β = 0.95.

If we set β = 0.99, µ1 = 0.001, µ2 = 0.003, µ3 = 0.005, then the weights are:

Stocks     Amazon    Apple     Facebook  GE        PEGI      VWDRY
Weights1    0.2466    0.4153   -0.0545   -0.0538    0.1779    0.2685
Weights2    0.9764    0.1890    0.1651   -0.9546   -0.0257    0.6498
Weights3    1.7632   -0.1691    0.9555   -1.9450   -0.1296    0.5248

Table 6: The weights of VaR optimization for given µ and β = 0.99.

Figure 8: The minimum variance line for VaR optimization when β = 0.99.

From Tables 4, 5, and 6, we can see that it is optimal to take a short position in GE in every case and, in most cases, also in PEGI.

5.3 CVaR Optimization

5.3.1 Algorithm Analysis:

In section 3.4, the function Fβ(ω, c) was introduced; it is recalled here:

\[
F_\beta(\omega, c) = c + \frac{1}{1 - \beta}\, E\big[(L(\omega, R) - c)_+\big],
\]

where

\[
(L(\omega, R) - c)_+ =
\begin{cases}
L(\omega, R) - c, & L(\omega, R) \ge c, \\
0, & L(\omega, R) < c.
\end{cases}
\]

In this optimization problem, the following estimate of Fβ(ω, c) is used:

\[
\widehat{F}_\beta(\omega, c) = c + \frac{1}{1 - \beta} \cdot \frac{1}{n} \sum_{i=1}^{n} (L(\omega, R_i) - c)_+,
\]

where n is the sample size.

The estimator $\frac{1}{n} \sum_{i=1}^{n} (L(\omega, R_i) - c)_+$ is an unbiased estimator of $E[(L(\omega, R) - c)_+]$ and hence $\widehat{F}_\beta(\omega, c)$ is an unbiased estimator of $F_\beta(\omega, c)$ (Rockafellar and Uryasev, 2000, p.25).
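The sample estimate of Fβ(ω, c) is straightforward to evaluate. The sketch below assumes the loss L(ω, R_i) = −R_i ω, which is the form used in the appendix code, and uses a made-up one-asset sample whose losses are exactly 1, 2, ..., 100, so that the result can be checked by hand. The names Fhat, cGrid and tailMean are introduced here for illustration.

```matlab
% Sample estimate of F_beta(w, c) for losses L(w, R_i) = -R_i * w.
% R is an n-by-k matrix of returns, w a k-by-1 weight vector, c a scalar.
Fhat = @(w, c, R, beta) c + sum(max(-R * w - c, 0)) / ((1 - beta) * size(R, 1));

% Hypothetical one-asset sample whose losses are exactly 1, 2, ..., 100.
R    = -(1:100)';
w    = 1;
beta = 0.9;

% Evaluate the estimate on a grid of candidate thresholds c.
cGrid = 80:100;
vals  = arrayfun(@(c) Fhat(w, c, R, beta), cGrid);

[cvarEst, idx] = min(vals);     % minimum over c: the CVaR estimate
cStar    = cGrid(idx);          % a minimizing threshold c
tailMean = mean(91:100);        % average of the ten largest losses
```

For this sample the minimum over c coincides, up to floating-point rounding, with the average of the worst 10% of losses, and it is attained for c between 90 and 91. In the portfolio problem, the minimization runs jointly over ω and c.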

5.3.2 Results of CVaR Optimization:

For this optimization, three values β = 0.9, β = 0.95 and β = 0.99, as well as three points from each efficient frontier by setting µ1 = 0.001, µ2 = 0.003, µ3 = 0.005 are selected. Then we get the corresponding three groups of tables and pictures.

If we set β = 0.9, µ1 = 0.001, µ2 = 0.003, µ3 = 0.005, then the weights are:

Stocks     Amazon    Apple     Facebook  GE        PEGI      VWDRY
Weights1    0.2298    0.3635    0.0579   -0.0841   -0.2243    0.2086
Weights2    0.7796    0.6673    0.1215   -1.1010   -0.0114    0.5439
Weights3    1.4238    0.8887    0.1813   -2.1053   -0.2198    0.8314

Table 7: The weights of CVaR optimization for given µ and β = 0.9.

Figure 9: The minimum variance line for CVaR optimization when β = 0.9.

If we set β = 0.95, µ1 = 0.001, µ2 = 0.003, µ3 = 0.005, then the weights are:

Stocks     Amazon    Apple     Facebook  GE        PEGI      VWDRY
Weights1    0.2221    0.3202    0.0904   -0.0745    0.2203    0.2215
Weights2    0.7373    0.6934    0.1792   -1.0626   -0.0974    0.5502
Weights3    1.2454    0.9346    0.4135   -2.0010   -0.4652    0.8728

Table 8: The weights of CVaR optimization for given µ and β = 0.95.

Figure 10: The minimum variance line for CVaR optimization when β = 0.95.

If we set β = 0.99, µ1 = 0.001, µ2 = 0.003, µ3 = 0.005, then the weights are:

Stocks     Amazon    Apple     Facebook  GE        PEGI      VWDRY
Weights1    0.2713    0.3785   -0.0257   -0.0796    0.2312    0.2243
Weights2    0.8409    0.4657    0.2957   -0.9008   -0.2636    0.5620
Weights3    1.4497    0.6631    0.4790   -1.5999   -0.9643    0.9724

Table 9: The weights of CVaR optimization for given µ and β = 0.99.

Figure 11: The minimum variance line for CVaR optimization when β = 0.99.

From Tables 7, 8, and 9, we can see that here, too, the tendency is that it is optimal to take a short position in GE and PEGI.

6 Conclusion and Discussion

This thesis compares the risks of energy company stocks with those of technology company stocks. The risks of wind energy company stocks are not significantly lower than those of the other companies' stocks. In particular, the stock of Vestas Wind Systems is riskier than the stocks of the other companies.

The fact that the risks for the energy companies are not lower than the risks of the technology companies indicates that the electricity market seems to be a functioning market.

Interestingly, for the risk measures we consider, it is optimal to take a negative position in one or two of the energy companies, indicating that those energy companies developed in the opposite direction to the technology companies.

This thesis has the following limitations:

1) The choice of data. The data chosen is from 2013 to 2019. There might be other financial sectors that influenced the stock prices.

2) The choice of companies. The business model of a company influences its stock price. Individual companies may not represent all companies of that type.

3) The choice of distribution. Obviously, the observations fit the Student's t distribution better than the normal distribution. The biases caused by an inappropriate choice of distribution are beyond the scope of my knowledge. They could be explored in future studies.

The value-at-risk $VaR_\beta(X)$ at level β ∈ (0, 1) of a portfolio is defined as

\[
VaR_\beta(X) = \min\{m : P(X \le m) \ge 1 - \beta\}.
\]

With $F_X(m)$ denoting the distribution function of a continuous random variable X, we can rewrite $VaR_\beta(X)$ as $VaR_\beta(X) = F_X^{-1}(1 - \beta)$.

This means that $VaR_\beta(X)$ is nothing but the (1 − β)-quantile of a continuous random variable X.

Conditional value at risk (CVaR) is also called expected shortfall (ES), average value at risk (AVaR), or expected tail loss (ETL). It is defined by

\[
CVaR_\beta(X) = \frac{1}{1 - \beta} \int_\beta^1 VaR_u(X)\, du.
\]
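To see what this integral averages, one can substitute the quantile form of VaR given above. The following short derivation is a sketch that assumes X is integrable and has a continuous, strictly increasing distribution function $F_X$; the substitution variable p is introduced here for illustration.

```latex
\begin{align*}
CVaR_\beta(X)
  &= \frac{1}{1-\beta} \int_\beta^1 F_X^{-1}(1-u)\, du
     && \text{since } VaR_u(X) = F_X^{-1}(1-u) \\
  &= \frac{1}{1-\beta} \int_0^{1-\beta} F_X^{-1}(p)\, dp
     && \text{substituting } p = 1-u \\
  &= E\big[\, X \mid X \le VaR_\beta(X) \,\big]
     && \text{since } P\big(X \le VaR_\beta(X)\big) = 1-\beta .
\end{align*}
```

Under these assumptions, CVaR at level β is the mean of X over the tail of probability 1 − β that is cut off by VaR, which is why it is also called expected shortfall.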

7 References

Ahmadi-Javid, A., 2012. Entropic value-at-risk: A new coherent risk measure. Journal of Optimization Theory and Applications, 155 (3), pp.1105-1123.

Artzner, P., Delbaen, F., Eber, J.M., and Heath, D., 1999. Coherent measures of risk. Mathematical Finance, 9(3), pp.203–227.

Bowman, K.O., and Shenton, L.R., 1998. Estimator: Method of Moments. Encyclopedia of statistical sciences, Wiley, pp.2092–2098.

Capinski, M. and Zastawniak, T., 2011. Mathematics for Finance: An Introduction to Financial Engineering. 2nd ed. London: Springer.

Chebyshev, P.L., 1887. A French translation, Sur deux theoremes relatifs aux probabilites, is given in Acta Mathematica, Vol. XIV, pp. 805-815.

Fama, E., 1970. Efficient Capital Markets: A Review of Theory and Empirical Work. Journal of Finance, 25 (2), pp.383–417.

Fleckner, A.M., 2015. Regulating Trading Practices. Oxford Handbook of Financial Regulation, Oxford: Oxford University Press, ISBN 978-0-19-968720-6.

Gut, A., 2005. Probability: A Graduate Course. Sweden: Springer.

Hult, H., Lindskog, F., Hammarlid, O. and Rehn, C.J., 2010. Risk and Portfolio Analysis, Principles and Methods. 2nd ed. Sweden: Springer.

Johnson, R.A. and Wichern, D.W., 2014. Applied Multivariate statistical analysis. 6th ed. USA: Pearson Education. Inc.

Keener, R.W., 2010. Theoretical Statistics. New York: Springer. pp.61-62.

Markowitz, H., 1952. Portfolio Selection. The Journal of Finance, 7(1), pp.77-91.

Olson, S., 2019. [online] Available at: < https://www.investopedia.com/investing/wind-stocks/ > [Accessed 15 September 2019].

Rockafellar, R.T., and Uryasev, S., 2000. Optimization of Conditional Value-at-Risk. Journal of Risk, 2(3), pp.21-41.

Sarykalin, S., Serraino, G., and Uryasev, S., 2008. Value-at-risk VS conditional value-at-risk in risk management and optimization. Tutorials in Operations Research, Informs, pp.270-294.

Stock prices of Apple, Amazon, Facebook, Sinopec, Tesla, 2012/05/18-2018/10/30. [online] Available at: < https://finance.yahoo.com > [Accessed 31 October 2018].

West, G., 2006. An introduction to Modern Portfolio Theory: Markowitz, CAP-M, APT and Black-Litterman, [online] Available at: < https://janroman.dhis.org/finance/Portfolio/Modern%20Portfolio%20Theory.pdf > [Accessed 12 September 2019].

Xue, M., 2016. Comparative analysis of portfolio optimization. Linnaeus University.

8 Appendix

MATLAB code:

% MA: matrix of stock prices, one column per stock
R = diff(MA) ./ MA(1:end-1, :);    % simple returns

% To run on simulated data instead, uncomment the next line
% R = X;

% Find mean vector and covariance matrix
M   = mean(R)';
Cov = cov(R);

% Find the smallest variance portfolio (only constraint: weights sum to one)
f   = zeros(6, 1);
A   = []; b = [];
Aeq = ones(1, 6);
beq = 1;
lb  = []; ub = [];
w   = quadprog(Cov, f, A, b, Aeq, beq, lb, ub);

M1 = w' * M;                % expected return of the smallest variance portfolio
d1 = sqrt(w' * Cov * w);    % standard deviation of the smallest variance portfolio

plot(d1, M1, 'x')
hold on

% Find the MV curve: minimize the variance for each target mean r(i)
r  = -0.01:0.0001:0.01;
N  = length(r);
w2 = zeros(6, N);
d3 = zeros(1, N);
for i = 1:N
    Aeq = [ones(1, 6); M'];
    beq = [1; r(i)];
    w2(:, i) = quadprog(Cov, f, A, b, Aeq, beq, lb, ub);
    d3(i) = sqrt(w2(:, i)' * Cov * w2(:, i));
end

M3 = w2' * M;

plot(d3, M3, 'k--')
hold on

% Find the VaR curve
alpha = 0.95;
n     = size(R, 1);
j     = 1:6;
w0    = (1/6) * ones(1, 6);
r     = -0.005:0.0004:0.0051;
N     = length(r);
w4    = zeros(N, 6);
risk2 = zeros(1, N);
lb    = -Inf(1, 6);
for i = 1:N
    % Objective: empirical (1 - alpha)-quantile of the portfolio return R*w'
    objfun = @(w) quantile(R(:, j) * w(j)', 1 - alpha);
    Aeq = [-M'; ones(1, 6)];
    beq = [-r(i); 1];
    w4(i, :) = fmincon(objfun, w0, [], [], Aeq, beq, lb);
    risk2(i) = objfun(w4(i, :));    % objective value at the weights of row i
end

d4 = zeros(1, N);
for i = 1:N
    d4(i) = sqrt(w4(i, :) * Cov * w4(i, :)');
end
plot(d4, r, 'g')
hold on

% Find the CVaR curve
beta  = 0.95;
n     = size(R, 1);
j     = 1:6;
w0    = (1/6) * ones(1, 6);
VaR0  = abs(quantile(R * w0', 1 - beta));
w0    = [w0 VaR0];                  % start: equal weights and c = VaR0
r     = -0.01:0.0001:0.01;
N     = length(r);
w4    = zeros(N, 7);
risk2 = zeros(1, N);
lb    = [-100 * ones(1, 6), -Inf];
for i = 1:N
    % x(1:6) are the weights, x(7) is the threshold c
    objfun = @(x) x(7) + (1/n) * (1/(1 - beta)) * sum(max(-x(j) * R(:, j)' - x(7), 0));
    Aeq = [-M' 0; ones(1, 6) 0];
    beq = [-r(i); 1];
    w4(i, :) = fmincon(objfun, w0, [], [], Aeq, beq, lb);
    risk2(i) = objfun(w4(i, :));    % objective value at the solution of row i
end

w3 = w4(:, 1:6)';                   % weights of the CVaR curve
d4 = zeros(1, N);
for i = 1:N
    d4(i) = sqrt(w3(:, i)' * Cov * w3(:, i));
end
plot(d4, r, 'r')
hold on

xlabel('standard deviation')
ylabel('mean')
title('Mixture of MV, VaR and CVaR curves')
legend('Smallest variance portfolio', 'MV curve', 'VaR curve, beta=0.95', 'CVaR curve, beta=0.95')