Quantitative Analysis
READING 12 FUNDAMENTALS OF PROBABILITY
❑ BASICS OF PROBABILITY

KEY POINTS

1.) The probability of any event lies between 0 and 1: 0 ≤ P(A) ≤ 1

5.) The probability that both A and B will occur is written P(AB) and referred to as the Joint Probability: P(AB) = P(A|B) × P(B)
❑ EVENT AND EVENT SPACES

EVENT An event is a single outcome or a combination of outcomes for a random variable.

EVENT SPACE The event space E is the set of possible distinct outcomes of the random process; it is sometimes called the sample space. When rolling a normal six-sided die and recording the uppermost face, the event space is E = {1, 2, 3, 4, 5, 6}.
❑ INDEPENDENT EVENTS

Two events are independent events if knowing the outcome of one does not affect the probability of the other. When two events are independent, the following must hold:

1.) P(A) × P(B) = P(AB)

2.) P(A|B) = P(A)

3.) If A1, A2, …, An are independent events, their joint probability P(A1 and A2 … and An) is equal to P(A1) × P(A2) × … × P(An)
❑ MUTUALLY EXCLUSIVE EVENTS

Two events are mutually exclusive events if they cannot both happen.
❑ CONDITIONALLY INDEPENDENT EVENTS
Two conditional probabilities, P(A|C) and P(B|C), may be independent or dependent regardless of whether the unconditional probabilities, P(A) and P(B), are independent or not. When two events are conditionally independent:

P(A|C) × P(B|C) = P(AB|C)
❑ TOTAL PROBABILITY RULE
The Total Probability Rule states that if the conditional events Bi are mutually exclusive and exhaustive, then:

P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + … + P(A|Bn)P(Bn)

❑ JOINT PROBABILITY Given a conditional probability and the unconditional probability of the conditioning event, we can calculate the joint probability of both events using P(AB) = P(A|B) × P(B).
❑ BAYES' RULE Bayes' Rule allows us to use information about the outcome of one event to improve our estimates of the probability of another event:
P(A|B) = P(B|A) × P(A) / P(B)
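A minimal Python sketch of the total probability rule and a Bayes' rule update; the events and all probabilities below are hypothetical, chosen only for illustration.

```python
# Suppose B1 = "economy expands", B2 = "economy contracts" (mutually
# exclusive and exhaustive), and A = "stock outperforms".
p_b = {"B1": 0.60, "B2": 0.40}          # unconditional P(Bi)
p_a_given_b = {"B1": 0.70, "B2": 0.20}  # conditional P(A|Bi)

# Total probability rule: P(A) = sum over i of P(A|Bi) * P(Bi)
p_a = sum(p_a_given_b[b] * p_b[b] for b in p_b)
print(f"P(A) = {p_a:.2f}")              # 0.70*0.60 + 0.20*0.40 = 0.50

# Bayes' rule: P(B1|A) = P(A|B1) * P(B1) / P(A)
p_b1_given_a = p_a_given_b["B1"] * p_b["B1"] / p_a
print(f"P(B1|A) = {p_b1_given_a:.2f}")  # 0.42 / 0.50 = 0.84
```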

Questions
1.) A dealer in a casino has rolled a five on a single die three times in a row. What is the probability of her rolling another five on the next roll, assuming it is a fair die? A. 0.200. B. 0.001. C. 0.167. D. 0.500.

Explanation (C) The probability of any given value being rolled is 1/6 ≈ 0.167, regardless of previous rolls.
2.) If X and Y are independent events, which of the following is most accurate? A. P(X or Y) = P(X) + P(Y). B. P(X | Y) = P(X). C. P(X or Y) = P(X) × P(Y). D. X and Y cannot occur together.

Explanation (B) Independent events have no influence on each other; this does not necessarily mean they are mutually exclusive. Accordingly, P(X or Y) = P(X) + P(Y) − P(X and Y). By the definition of independent events, P(X|Y) = P(X).
READING 13

RANDOM VARIABLES
❑ RANDOM VARIABLES

DISCRETE RANDOM VARIABLES A Discrete Random Variable is one that can take on only a countable number of possible outcomes. If it can take on only two possible values, zero and one, it is referred to as a Bernoulli Random Variable.

CONTINUOUS RANDOM VARIABLES A Continuous Random Variable has an uncountable number of possible outcomes. Because there are an infinite number of possible outcomes, the probability of any single value is zero. For continuous random variables, we measure probability over some positive interval.
❑ PROBABILITY FUNCTIONS
PROBABILITY MASS FUNCTION A Probability Mass Function (PMF), f(x) = P(X = x), gives us the probability that the outcome of a discrete random variable, X, will be equal to a given number, x.

For any PMF, the sum of the probabilities of all possible outcomes is 100%, a requirement for a PMF. For a Bernoulli Random Variable with P(X = 1) = p, the PMF is f(x) = p^x (1 − p)^(1−x). This yields P(X = 1) = p and P(X = 0) = 1 − p.

CUMULATIVE DISTRIBUTION FUNCTION A Cumulative Distribution Function (CDF) gives us the probability that a random variable will take on a value less than or equal to x [i.e., F(x) = P(X ≤ x)]. For a Bernoulli Random Variable, the CDF is:
F(x) = 0 for x < 0; F(x) = 1 − p for 0 ≤ x < 1; F(x) = 1 for x ≥ 1
❑ EXPECTATION

The Expected Value is the weighted average of the possible outcomes of a random variable, where the weights are the probabilities that each outcome will occur.

E(X) = Σ P(xi) xi = P(x1)x1 + P(x2)x2 + … + P(xn)xn

Statistically speaking, our best guess of the outcome of a random variable is the Expected Value.
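A minimal Python sketch of the expected-value formula above; the outcomes and probabilities are made up for illustration.

```python
# E(X) = sum of P(xi) * xi over all possible outcomes
outcomes      = [10.0, 20.0, 30.0]   # possible values of X
probabilities = [0.25, 0.50, 0.25]   # P(X = x), must sum to 1

expected_value = sum(p * x for p, x in zip(probabilities, outcomes))
print(f"E(X) = {expected_value}")    # 2.5 + 10.0 + 7.5 = 20.0
```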

PROPERTIES OF EXPECTED VALUE: 1. If c is any constant, then E(cX) = cE(X). 2. If X and Y are any random variables, then E(X + Y) = E(X) + E(Y).
SKEWNESS, a measure of a distribution's symmetry, is the standardized third central moment; we standardize by dividing by the cubed standard deviation:

Skewness = E[(X − μ)^3] / σ^3
❑ FOUR COMMON POPULATION MOMENTS

The first moment, the MEAN of a random variable, is its expected value E(X), represented by the Greek letter μ (mu). The second central moment of a random variable is its VARIANCE, σ²; variance gives us information about how widely dispersed the values of the random variable are around the mean. The third standardized moment is SKEWNESS, defined above. KURTOSIS is the standardized fourth moment. It is a measure of the shape of the distribution, in particular the total probability in the tails of the distribution relative to the probability in the rest of the distribution. The higher the kurtosis, the greater the probability in the tails:
Kurtosis = E[(X − μ)^4] / σ^4
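A minimal Python sketch computing the four moments from a sample with plain numpy; the data array is hypothetical.

```python
import numpy as np

x = np.array([1.2, -0.4, 0.8, 2.1, -1.5, 0.3, 0.9, -0.7])

mu    = x.mean()                              # first moment: mean
var   = ((x - mu) ** 2).mean()                # second central moment: variance
sigma = np.sqrt(var)
skew  = ((x - mu) ** 3).mean() / sigma ** 3   # standardized 3rd moment
kurt  = ((x - mu) ** 4).mean() / sigma ** 4   # standardized 4th moment

print(f"mean={mu:.4f} var={var:.4f} skew={skew:.4f} kurt={kurt:.4f}")
```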

Questions
1.) There is a 30% chance that the economy will be good and a 70% chance that it will be bad. If the economy is good, your returns will be 20% and if the economy is bad, your returns will be 10%. What is your expected return? A. 13%. B. 15%. C. 17%. D. 18%.
Explanation (A) Expected value is the probability-weighted average of the possible outcomes of the random variable. The expected return is: (0.3 × 0.2) + (0.7 × 0.1) = 0.06 + 0.07 = 0.13.
2.) An analyst is currently considering a portfolio consisting of two stocks. The first stock, Remba Co., has an expected return of 12% and a standard deviation of 16%. The second stock, Labs, Inc., has an expected return of 18% and a standard deviation of 25%. The correlation of returns between the two securities is 0.25. If the analyst forms a portfolio with 30% in Remba and 70% in Labs, what is the portfolio's expected return? A. 15.0%. B. 17.3%. C. 21.5%. D. 16.2%.
Explanation (D) ERp = Σ(ERi)(wi) = w1ER1 + w2ER2, where ER = expected return and w = % invested in each stock. ERp = (0.3 × 12) + (0.7 × 18) = 3.6 + 12.6 = 16.2%.
READING 14

COMMON UNIVARIATE RANDOM VARIABLES
❑ THE UNIFORM DISTRIBUTION

The Continuous Uniform Distribution is defined over a range that spans between some lower limit, a, and some upper limit, b, which serve as the parameters of the distribution. Outcomes can occur only between a and b, and because we are dealing with a continuous distribution, the probability of any single value is zero.

PDF: f(x) = 1/(b − a) for a ≤ x ≤ b, else f(x) = 0
E(X) = (a + b)/2
Var(X) = (b − a)^2 / 12
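A minimal Python sketch of the uniform distribution's moments and an interval probability; the limits a, b and the interval [c, d] are hypothetical.

```python
a, b = 2.0, 10.0

pdf_height = 1.0 / (b - a)            # f(x) for a <= x <= b
mean       = (a + b) / 2.0            # E(X)
variance   = (b - a) ** 2 / 12.0      # Var(X)

# P(c <= X <= d) is the area under the flat PDF between c and d
c, d = 3.0, 7.0
prob = (d - c) * pdf_height
print(f"E(X)={mean} Var(X)={variance:.4f} P({c}<=X<={d})={prob}")
```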

❑ BERNOULLI DISTRIBUTION
A Bernoulli Random Variable has only two possible outcomes: success (x = 1), with probability of success p, and failure (x = 0), with probability of failure 1 − p.
MEAN: p
VAR(X): p(1 − p)

PMF: f(x) = p^x (1 − p)^(1−x)

CDF: F(x) = 0 for x < 0; F(x) = 1 − p for 0 ≤ x < 1; F(x) = 1 for x ≥ 1
❑ BINOMIAL DISTRIBUTION

The Binomial Random Variable may be defined as the number of successes in a given number of Bernoulli trials, where the outcome of each INDEPENDENT TRIAL can be either success or failure.

P(x) = P(X = x) = [n! / ((n − x)! x!)] × p^x (1 − p)^(n−x)

Expected Value of X = E(X) = np
Variance of X = np(1 − p) = npq
Where: n = number of trials; p = probability of success in each independent trial; q = probability of failure in each independent trial = (1 − p)

The binomial distribution model allows us to compute the probability of observing a specified number of successes when the process is repeated a specific number of times and the outcome of a given trial is either a success or a failure.
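A minimal Python sketch of the binomial PMF built directly from the formula above; the values of n, p, and x are hypothetical.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for x successes in n independent Bernoulli trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3
print(f"P(X=3) = {binomial_pmf(3, n, p):.4f}")
print(f"E(X)   = {n * p}")            # np
print(f"Var(X) = {n * p * (1 - p)}")  # np(1-p)
```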

❑ POISSON DISTRIBUTION
A Poisson Random Variable X refers to the NUMBER OF SUCCESSES PER UNIT. The parameter λ refers to the average or EXPECTED NUMBER OF SUCCESSES PER UNIT. The probability of obtaining x successes, given that λ successes are expected, is:
P(X = x) = (λ^x × e^(−λ)) / x!
MEAN: λ
VARIANCE: λ

The POISSON DISTRIBUTION can be used to model, for example, the number of defects per batch in production or the number of calls per minute to a toll-free number.
❑ NORMAL DISTRIBUTION

KEY PROPERTIES OF A NORMAL DISTRIBUTION:
▪ It is completely described by its mean, μ, and variance, σ², stated as X ~ N(μ, σ²)
▪ Skewness = 0, meaning the normal distribution is symmetric about its mean (MEAN = MEDIAN = MODE)
▪ Excess kurtosis = (kurtosis − 3) = 0 for a normal distribution
❑ STANDARD NORMAL DISTRIBUTION AND CONFIDENCE INTERVALS
A Standard Normal Distribution (z-distribution) is a normal distribution that has been standardized so it has a mean of zero and a standard deviation of 1 [i.e., N(0, 1)].

z = (x − μ) / σ

A Confidence Interval is a range of values around the expected outcome within which we expect the actual outcome to fall some specified percentage of the time. A 95% confidence interval is a range within which we expect the random variable to fall 95% of the time.

X̄ ± zs
Where: X̄ = sample mean, and s = sample standard deviation
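A minimal Python sketch of standardizing an observation and building a 95% confidence interval around a sample mean; the inputs reuse the £50m/£25m portfolio numbers from the question later in this reading.

```python
sample_mean = 50.0   # X-bar
sample_sd   = 25.0   # s
x           = 91.13

z = (x - sample_mean) / sample_sd     # z = (x - mean) / sd
print(f"z-score for {x}: {z:.3f}")    # 1.645

# 95% confidence interval: X-bar +/- 1.96 * s (1.96 leaves 2.5% in
# each tail of the standard normal distribution)
z95 = 1.96
lo, hi = sample_mean - z95 * sample_sd, sample_mean + z95 * sample_sd
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
```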

❑ LOGNORMAL DISTRIBUTION

The Lognormal Distribution is generated by the function e^x, where x is normally distributed. The PDF of the lognormal distribution is:
f(x) = [1 / (xσ√(2π))] × e^(−(ln x − μ)² / (2σ²)) for x > 0

KEY PROPERTIES OF A LOGNORMAL DISTRIBUTION:
▪ The lognormal distribution is skewed to the right.
▪ The lognormal distribution is bounded from below by zero.
❑ STUDENT'S T-DISTRIBUTION

KEY PROPERTIES OF STUDENT'S T-DISTRIBUTION:
▪ It is symmetrical
▪ It is defined by a single parameter, the degrees of freedom (df = n − 1)
▪ It has greater probability in the tails (fatter tails) than the normal distribution
▪ As the df (the sample size) gets larger, the shape of the t-distribution approaches the standard normal distribution

It is the appropriate distribution to use when constructing confidence intervals based on small samples (n < 30) from a population with unknown variance and a normal or approximately normal distribution.

It may also be appropriate when the sample size is large and the population variance is unknown.
❑ CHI-SQUARED DISTRIBUTION

The Chi-Squared Distribution is ASYMMETRICAL, BOUNDED BELOW BY ZERO, and approaches the normal distribution in shape as the degrees of freedom increase. The chi-squared test statistic, with n − 1 df, is computed as:
χ² = (n − 1)s² / σ₀²
Where: n = sample size; s² = sample variance; σ₀² = hypothesized value of the population variance
❑ F-DISTRIBUTION
The F-Distribution is right-skewed and is bounded at zero on the left-hand side. The shape of the distribution is determined by two different degrees of freedom: the numerator df₁ and the denominator df₂. Hypotheses concerning the equality of the variances of two populations are tested with an F-distributed test statistic:
F = s₁² / s₂²
Where: s₁² = variance of the sample of n₁ observations drawn from population 1

s₂² = variance of the sample of n₂ observations drawn from population 2

An F-distributed test statistic is used when the populations from which the samples are drawn are normally distributed and the samples are independent.
Questions
1.) The annual returns for a portfolio are normally distributed with an expected value of £50 million and a standard deviation of £25 million. What is the probability that the value of the portfolio one year from today will be between £91.13 million and £108.25 million? A. 0.040. B. 0.075. C. 0.025. D. 0.090.
Explanation (A) Calculate the standardized variables corresponding to the outcomes: z = (91.13 − 50) / 25 = 1.645, and z = (108.25 − 50) / 25 = 2.33. The cumulative normal distribution gives cumulative probabilities of F(1.645) = 0.95 and F(2.33) = 0.99. The probability that the outcome will lie between the two z-values is the difference: 0.99 − 0.95 = 0.04. Note that even though you will not have a z-table on the exam, the probability values for 1.645 and 2.33 are commonly used values that you should have memorized.
2.) A food retailer has determined that the mean household income of her customers is $47,500 with a standard deviation of $12,500. She is trying to justify carrying a line of luxury food items that would appeal to households with incomes greater than $60,000. Based on her information and assuming that household incomes are normally distributed, what percentage of households in her customer base has incomes of $60,000 or more? A. 15.87%. B. 2.50%. C. 34.13%. D. 5.00%.
Explanation (A) z = ($60,000 − $47,500) / $12,500 = 1.0. From the table of areas under the normal curve, 84.13% of observations lie to the left of +1 standard deviation of the mean, so 100% − 84.13% = 15.87% of households have incomes of $60,000 or more.
Reading 15 Multivariate Random Variables
Probability Matrices

• A probability matrix of a discrete bivariate random variable describes the outcome probabilities as a function of the coordinates X1 and X2.
• All probabilities in the matrix are positive or zero, are less than or equal to 1, and the sum across all possible outcomes for X1 and X2 equals 1.
• P3 = 1 − P1 − P2
Marginal & Conditional Distributions

• A marginal distribution defines the distribution of a single component of a bivariate random variable.
• A conditional distribution gives the probabilities of the outcomes for one component conditional on the other component taking a specific value. The numerator in the conditional distribution equation is the joint probability of the two events occurring, and the denominator is the marginal probability that X2 = x2.
Expectation of a Bivariate Random Variable

• The expectation of a bivariate random function g(X1, X2) is a probability-weighted average of the function of the outcomes, g(x1, x2).
Covariance

• Covariance is the expected value of the product of the deviations of the two random variables from their respective expected values. It measures how two variables move with each other.
• Covariance between X1 and X2: Cov(X1, X2) = E[(X1 − E(X1))(X2 − E(X2))], or Cov[X1, X2] = E[X1X2] − E[X1]E[X2]
Correlation

• The correlation coefficient is a statistical measure that standardizes the covariance as follows: Corr(X1, X2) = Cov[X1, X2] / ((Var(X1))^0.5 × (Var(X2))^0.5)
• Corr ranges from −1 to +1.
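A minimal Python sketch of covariance and correlation from paired samples; the two return series below are hypothetical.

```python
import numpy as np

x1 = np.array([0.02, -0.01, 0.03, 0.00, 0.01])
x2 = np.array([0.01,  0.00, 0.04, -0.01, 0.02])

# Cov(X1, X2) = E[(X1 - E[X1])(X2 - E[X2])]
cov = ((x1 - x1.mean()) * (x2 - x2.mean())).mean()

# Corr = Cov / (sd(X1) * sd(X2)); population (1/n) moments throughout
corr = cov / (x1.std() * x2.std())
print(f"cov={cov:.6f} corr={corr:.4f}")
```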

Behavior of Moments for Bivariate Random Variables
There are four effects of a linear transformation X2 = a + b·X1 of bivariate random variables.
• The first effect of a linear transformation on the moments of the two random variables is that b determines the correlation between the components: the Corr between X1 and X2 will be 1 if b > 0, 0 if b = 0, and −1 if b < 0.
Effects of linear transformations

• A second effect of linear transformations is that the location shift a has no effect on the variance, while the scale b changes the variance by a factor of b².
• A third effect of linear transformations on covariance is that the scale of the covariance is determined by the two scale terms, b and d, as follows: Cov[a + bX1, c + dX2] = bd·Cov[X1, X2]
• The fourth effect of linear transformations on the moments of random variables relates to coskewness and cokurtosis.
Variance of a Two-Asset Portfolio

• The variance of a two-asset portfolio of X1 and X2, with weights a and b respectively, is:
Var[aX1 + bX2] = a²Var(X1) + b²Var(X2) + 2ab·Cov(X1, X2)
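A minimal Python sketch of the two-asset portfolio variance formula; the weights, standard deviations, and correlation reuse the Remba/Labs numbers from the Reading 13 question.

```python
import math

a, b     = 0.30, 0.70      # portfolio weights
sd1, sd2 = 0.16, 0.25      # standard deviations of the two assets
corr     = 0.25            # correlation between the assets

cov = corr * sd1 * sd2     # Cov(X1, X2) = corr * sd1 * sd2
var_p = a**2 * sd1**2 + b**2 * sd2**2 + 2 * a * b * cov
print(f"portfolio variance = {var_p:.6f}")
print(f"portfolio sd       = {math.sqrt(var_p):.4f}")
```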

Portfolio Risk Management
• In the context of portfolio risk management, a conditional expectation of a random variable is computed based on a specific event occurring.
• A conditional distribution is defined based on the conditional probability for a bivariate random variable X1 given X2.
Independent and Identically Distributed Random Variables

Components of an i.i.d. sequence:
• are independent of all other components;
• are all from a single univariate distribution; and
• all have the same moments.
Features of the sum of n i.i.d. random variables

• The expected value of the sum of n i.i.d. random variables is equal to n × (mean).
• The variance of the sum of n i.i.d. random variables is equal to n × (variance), i.e., nσ².
• The variance of the sum of i.i.d. random variables grows linearly in n.
• The variance of the average of multiple i.i.d. random variables decreases as n increases.
Reading 16 Sample Moments
Mean and Variance

• The sample mean and sample variance for a sample of n independent and identically distributed random variables Xi are computed as:
Mean = ΣXi / n
Var = Σ(Xi − Mean)² / n
• The sample mean is an estimator based on a known data set where all data points are observable. It is only an estimate of the true population mean.
Properties of the Mean

• All interval and ratio data sets have an arithmetic mean.
• All data values are considered and included in the computation.
• A data set has only one mean.
Variance & Standard Deviation

• The variance of the mean estimator depends on the variance of the sample data and the number of observations.
• If more data is available, the true mean can be estimated more precisely: the variance of the mean estimator decreases as the number of observations increases. The variance of the mean estimator is: Variance / n
Point Estimates and Estimators

• Point estimates are single (sample) values used to estimate population parameters; the formula used to compute a point estimate is known as an estimator.
Biased Estimators

• The bias of an estimator measures the difference between the expected value of the estimator and the true population value: Bias = E(Estimator) − Population value
• When X consists of i.i.d. random variables, the expected value of the mean estimator is equal to the true population mean.
• The sample mean is an unbiased estimator.
• The sample variance (computed with 1/n) is a biased estimator.
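A minimal Python simulation sketch illustrating the bias of the 1/n sample variance versus the unbiased 1/(n − 1) version; the true variance, sample size, and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
true_var, n, trials = 4.0, 10, 100_000

x = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))
dev2 = (x - x.mean(axis=1, keepdims=True)) ** 2

biased   = dev2.sum(axis=1) / n        # 1/n estimator
unbiased = dev2.sum(axis=1) / (n - 1)  # 1/(n-1) estimator

# E[biased] is about (n-1)/n * true_var = 3.6; E[unbiased] about 4.0
print(f"mean of 1/n     estimator: {biased.mean():.3f}")
print(f"mean of 1/(n-1) estimator: {unbiased.mean():.3f}")
```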

Best Linear Unbiased Estimator (BLUE)
• The sample mean is the best estimator of the population mean available because it has the minimum variance of any linear unbiased estimator.
• Unbiased => E(Estimator) − Population value = 0
• Efficient => its variance is minimum
• Consistent => as the sample size increases, the estimator converges to the true population value
Law of Large Numbers

• The Law of Large Numbers implies that estimators converge to the true population value: an average over many observations converges to its expected value.
• The Central Limit Theorem states that when the sample size is large, the (standardized) sum of i.i.d. random variables is approximately normally distributed.
Calculation of the Median

• With an odd sample size n: Median(x) = x((n+1)/2), the middle ordered observation.
• With an even sample size n: Median(x) = (1/2)(x(n/2) + x(n/2 + 1)), the average of the two middle ordered observations.
Cross Moments

• 1st cross moment => cross mean
• 2nd cross moment => covariance
• 3rd cross moment => coskewness
• 4th cross moment => cokurtosis
Covariance & Correlation

• Covariance measures the extent to which two random variables tend to be above or below their respective means together for each joint realization: Cov(x, y) = E{[x − E(x)][y − E(y)]}
• Correlation is a standardized measure of association between two random variables; it ranges from −1 to +1 and equals Corr(x, y) = Cov(x, y) / ([Var(x)]^0.5 [Var(y)]^0.5), where [Var(x)]^0.5 is SD(x)
Coskewness

• Coskewness measures the likelihood of large directional movements occurring in one variable when the other variable is large.
• Coskewness is zero when there is no relationship between the sign of one variable and large moves in the other variable.
Calculation of Coskewness
Cokurtosis

• Cokurtosis depends on the correlation between the variables.
• For the symmetric case, cokurtosis ranges between +1 and +3, with the smallest value of 1 occurring when the correlation equals 0; cokurtosis increases as the correlation moves away from zero.
• Cokurtosis for the asymmetric cases ranges from −3 to +3 and is a linear, upward-sloping relationship as the correlation increases from −1 to +1.
Calculation of Cokurtosis
Reading 17 Hypothesis Testing
Introduction

• Hypothesis testing is the statistical assessment of a statement or idea regarding a population. For example: the mean return for a US security is greater than zero.
• Hypotheses are stated in terms of the population parameter to be tested, such as the population mean.
Null and Alternative Hypotheses

• Null (H0): the hypothesis the researcher wants to reject. It is stated with an equality, for example H0: μ = μ0.
• Alternative hypothesis (Ha): what is concluded if there is sufficient evidence to reject the null hypothesis. It is usually the alternative hypothesis that the researcher is really trying to assess.
Test Statistic

• The test statistic is calculated from the sample data.
• It is compared with the critical value to make the decision.
One-Tailed Test

• A one-sided test is referred to as a one-tailed test. For example, if the researcher wants to test whether the return on a stock is greater than zero, it is a one-tailed test.
• A one-tailed test may be structured as: H0: μ ≤ μ0, Ha: μ > μ0
• Decision rule: Reject H0 if t-stat > critical value; fail to reject H0 if t-stat ≤ critical value.
Two-Tailed Test

• A two-sided test is referred to as a two-tailed test. Two-sided tests allow for deviation on both sides of the hypothesized value.
• A two-tailed test may be structured as: H0: μ = μ0, Ha: μ ≠ μ0
• Decision rule: Reject H0 if t-stat > upper critical value, or t-stat < lower critical value.
Type I and Type II Errors

• Type I error: rejecting the null hypothesis when it is actually true.
• Type II error: failing to reject the null hypothesis when it is actually false.
• The significance level, alpha (α), is the probability of making a Type I error.
• The power of a test is the probability of correctly rejecting the null hypothesis when it is false.
Relation Between Confidence Intervals and Hypothesis Tests

• Confidence interval is a range of values within which the researcher believes the true population parameter may lie.

• Similarly, a hypothesis test compares a test statistic to a critical value; we fail to reject the null when: −critical value ≤ test statistic ≤ +critical value
Statistical Significance vs. Practical Significance

• Statistical significance does not necessarily imply practical significance.
• Reasons include transaction costs, taxes, and economic significance.
P-Value

• The p-value is the probability of obtaining a test statistic that would lead to a rejection of the null hypothesis, assuming the null hypothesis is true. It is the smallest level of significance for which the null hypothesis can be rejected.
• For a one-tailed test, it is the probability that lies above the computed test statistic (or below it, for a lower-tail test).
• For a two-tailed test, the p-value is the probability that lies above the positive value of the test statistic plus the probability that lies below the negative value of the test statistic.
• Decision rule: if p-value < significance level, reject; if p-value > significance level, fail to reject.
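A minimal Python sketch of a one-sample z-test and its one-tailed p-value, using the house-price numbers from question 1 below; the normal CDF is built from the standard error function.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

x_bar, mu0 = 149_750.0, 145_000.0   # sample mean, hypothesized mean
sigma, n   = 24_000.0, 36           # population sd, sample size

z = (x_bar - mu0) / (sigma / sqrt(n))   # test statistic
p_value = 1.0 - normal_cdf(z)           # one-tailed (upper) p-value
print(f"z = {z:.4f}, p-value = {p_value:.4f}")  # z = 1.1875, p = 0.117
```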

T-Test and Z-Test
Questions
1. Austin Roberts believes the mean price of houses in the area is greater than $145,000. A random sample of 36 houses in the area has a mean price of $149,750. The population standard deviation is $24,000, and Roberts wants to conduct a hypothesis test at a 1% level of significance. The value of the calculated test statistic is closest to:
• z = 0.67
• z = 1.19
• z = 4
• z = 8.13
2. Which of the following statements about hypothesis testing is most accurate?
• The power of a test is one minus the probability of a Type I error.
• The probability of a Type I error is equal to the significance level of the test.
• To test the claim that X is greater than zero, the null hypothesis would be H0: X > 0.
• If you can disprove the null hypothesis, then you have proven the alternative hypothesis.
Answers

1. B (z = (149,750 − 145,000) / (24,000 / √36) = 4,750 / 4,000 ≈ 1.19)
2. B
Reading 18 Linear Regression
LINEAR REGRESSION: BASIC LAYOUT

Population: Y = B0 + B1x + ε

Sample: y = b0 + b1x + e

where Y and y are the dependent variables, x is the independent variable, B0 and b0 are the population and sample intercepts, B1 and b1 are the respective slopes, and ε and e are the error terms.
PROPERTIES OF REGRESSION

1) The relationship should be linear
2) Linearity should be in the parameters
3) The error term should be additive
OLS

Ordinary Least Squares (OLS) is an estimation process that chooses the parameters B0 and B1 so as to minimize the error terms.

AIM: to reduce error
HOMOSCEDASTICITY

The data should be homoscedastic in order for the model's errors and standard errors to be reliable.

In a homoscedastic scatterplot, the data points lie close to the best-fit line with a constant spread, reducing error.
HETEROSCEDASTICITY

Data should not be heteroscedastic, as this increases the error in the model; in a heteroscedastic scatterplot, the spread of the data around the best-fit line is not constant.
Y = B0 + B1x + e
e = Y − (B0 + B1x)

where B0 is the intercept and B1 is the slope

Slope = β1 = Cov(x, y) / Var(x) (the OLS slope estimator)
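A minimal Python sketch of the OLS slope and intercept computed from the formula above; the x and y data are hypothetical.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# b1 = Cov(x, y) / Var(x); b0 = mean(y) - b1 * mean(x)
b1 = ((x - x.mean()) * (y - y.mean())).mean() / x.var()
b0 = y.mean() - b1 * x.mean()
print(f"intercept b0 = {b0:.4f}, slope b1 = {b1:.4f}")
```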

ASSUMPTIONS UNDERLYING LINEAR REGRESSION
1) The expected value of the error term conditional on the x variable should be 0; equivalently, the correlation of the error and the variable should be 0.
2) The observations should be i.i.d. (independently and identically distributed)

• Also,

- The relationship between the dependent and independent variables is linear
- No relevant variable is omitted
- The error is not correlated with the independent variable
SUM OF SQUARES

Total sum of squares = explained sum of squares + sum of squared residuals

TSS = ESS + SSR
R² (coefficient of determination)

• R² measures the goodness of fit of a model.

• The higher the R², the better the model's goodness of fit.

R² = ESS/TSS = (TSS − SSR)/TSS = 1 − SSR/TSS, or R² = r²
STANDARD ERROR OF REGRESSION

- Gauges the fit of the model (the lower the SER, the better the model)
- Measures the degree of variability of the actual y values around the estimated y values

SER = √( Σe² / (n − k − 1) )

where e = residual and k = number of slope coefficients (independent variables)
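A minimal Python sketch of the sum-of-squares decomposition, R², and SER, continuing the hypothetical x, y data and OLS estimates from the sketch above.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1 = ((x - x.mean()) * (y - y.mean())).mean() / x.var()
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x                    # fitted values
e     = y - y_hat                      # residuals

tss = ((y - y.mean()) ** 2).sum()      # total sum of squares
ess = ((y_hat - y.mean()) ** 2).sum()  # explained sum of squares
ssr = (e ** 2).sum()                   # sum of squared residuals

n, k = len(y), 1
r2  = 1.0 - ssr / tss                  # = ESS/TSS
ser = np.sqrt(ssr / (n - k - 1))       # standard error of regression
print(f"TSS={tss:.3f} ESS={ess:.3f} SSR={ssr:.3f} R2={r2:.4f} SER={ser:.4f}")
```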

STEPS OF HYPOTHESIS TESTING OF A SINGLE REGRESSION VARIABLE
1) Specify the hypothesis to be tested
2) Calculate the test statistic
3) Reject the null, or fail to reject the null, on the basis of the test statistic
THE P-VALUE

• The p-value is the smallest level of significance for which we can reject the null hypothesis. For example:

- If p-value > significance level, we cannot reject the null
- If p-value < significance level, we can reject the null hypothesis
Reading 19 Regression with Multiple Variables
What does Regression with Multiple Explanatory Variables mean?

To include more than one independent variable in our regression analysis. So far we have done regression analysis with one variable; to differentiate between the two, let's work through an example.
CASE 1: Determine the impact of the GDP growth rate on the share price of RIL.

RIL = A + X1(GDP)
CASE 2: Determine the impact of GDP growth rate, interest rates, and inflation on the share price of RIL.

RIL = A +X1(GDP) + X2(Interest rates) + X3(Inflation)

In Case 1 we have only one independent variable, GDP [LINEAR REGRESSION], but in Case 2 we have three explanatory variables [such regression models are known as Multiple Regression models].
ASSUMPTIONS OF MULTIPLE REGRESSION

• The expected value of the error term, conditional on the independent variables, is zero: E(εi | Xi's) = 0.
• All (X and Y) observations are i.i.d.
• The variance of X is positive (otherwise estimation of β would not be possible).
• The variance of the errors is constant (i.e., homoscedasticity).
• There are no outliers observed in the data.
• The X variables are not perfectly correlated (i.e., they are not perfectly linearly dependent): each X variable in the model should have some variation that is not fully explained by the other X variables.
PARTIAL SLOPE COEFFICIENTS

• MEANING: For a multiple regression, the interpretation of a slope coefficient is that it captures the change in the dependent variable for a one-unit change in that independent variable, holding the other independent variables constant. As a result, the slope coefficients in a multiple regression are sometimes called partial slope coefficients.

• Taking the same RIL example:
⮚ RIL = 6 + 2.5(GDP) + 1.3(Interest Rates) + 0.34(Inflation)
⮚ In this case, the 2.5 on GDP means that for every one-unit change in GDP, the RIL share price will increase by 2.5, keeping interest rates and inflation constant.
INTERPRETING MULTIPLE REGRESSION RESULTS

• Now we will take two models to interpret the regression results:
⮚ Linear regression: RIL = 3 + 1.8(GDP). Interpretation: when GDP is 0, RIL = 3; when GDP increases by 1 unit, RIL increases by 1.8 units.
⮚ Regression with 2 independent variables: RIL = 2 + 0.7(GDP) + 1.25(Interest rates). NOTE: the estimated slope coefficient of GDP changed from 1.8 to 0.7 when we added interest rates. This happens because GDP and interest rates are also correlated (not perfectly). Interpretation: when GDP is 0, RIL = 2; when GDP increases by 1 unit, RIL increases by 0.7 units; when interest rates increase by 1 unit, RIL increases by 1.25 units.
COEFFICIENT OF DETERMINATION

• Recall that for a single regression, R² = r²(x, y) and R² = ESS/TSS, which measures goodness of fit.
• However, in multiple regression R² may not be a reliable measure of explanatory power. REASONS:
⮚ R² increases as independent variables are added to the model, even if the marginal contribution of the new variables is not statistically significant.
⮚ A high R² may reflect the impact of a large set of independent variables rather than how well the set explains the dependent variable (overestimating the regression).
ADJUSTED R²

• To overcome these problems, many researchers recommend using adjusted R² (R²a) rather than R².

FORMULA

R²a = 1 − [((n − 1)/(n − k − 1)) × (1 − R²)]

Where: n = number of observations, k = number of independent variables. NOTE: R²a will be less than or equal to R². So, while adding a new independent variable to the model will increase R², it may either increase or decrease R²a.
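A minimal Python sketch of R² and adjusted R², using the TSS = 460, SSR = 170, n = 60, k = 5 figures from question 2 later in this reading.

```python
tss, ssr = 460.0, 170.0
n, k = 60, 5

r2 = 1.0 - ssr / tss                                  # = 1 - SSR/TSS
adj_r2 = 1.0 - ((n - 1) / (n - k - 1)) * (1.0 - r2)   # adjusted R^2
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}") # 0.630, 0.596
```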

THE F-TEST
• An F-test is useful to evaluate a model against competing partial models. For example, a model with three independent variables (GDP, interest rates, inflation) can be compared against a model with only one independent variable (GDP). We are trying to see whether the two additional variables (interest rates and inflation) in the full model contribute meaningfully to explaining the variation in RIL.

• The calculated F-statistic is compared to the critical F-value [with q degrees of freedom in the numerator and (n − kF − 1) degrees of freedom in the denominator]. If the calculated F-stat is greater than the critical F-value, the full model contributes meaningfully to explaining the variation in Y.
QUESTIONS

1) A researcher estimated the following three-factor model to explain the return on different portfolios: RP,i = 1.70 + 1.03·Rm,i − 0.23·Rz,i + 0.32·Rv,i. Calculate the following:

- The return on a portfolio when Rm = 8%, Rz = 2% and Rv = 3%

- The impact on portfolio return if Rz declines by 1%

- The expected return on the portfolio when Rm = Rz = Rv = 0

2) An analyst runs a regression of monthly value-stock returns on 5 independent variables over 60 months. The total sum of squares for the regression is 460, and the residual sum of squares is 170. Calculate the R² and adjusted R².
ANSWERS

1)
• 10.44%
• The portfolio return increases by 0.23%
• 1.70%

2)
• R² = 63%
• R²a = 59.6%
Reading 20 Regression Diagnostics
HOMOSKEDASTICITY AND HETEROSKEDASTICITY
- When the variance of the residuals is constant across all observations, the regression is said to be homoskedastic.
- When the variance of the residuals is not constant, it is said to be heteroskedastic.
- There are two types of heteroskedastic datasets:

Conditional: related to the level of (conditional on) the independent variables. It is problematic.
Unconditional: not related to the level of the independent variables. It is not problematic for regression results.


EFFECTS:
- Standard errors are unreliable estimates
- Coefficient estimates are still consistent and unbiased
- Hypothesis testing is unreliable
DETECTING HETEROSKEDASTICITY

A scatterplot can reveal patterns among observations; a chi-squared test statistic can also be used:
• Estimate the regression using standard ordinary least squares (OLS) procedures, estimate the residuals, and square them (εi²).
• Use the squared estimated residuals from step 1 as the dependent variable in a new regression on the original explanatory variables.
• Calculate the R² for the model in step 2 and use it to calculate the chi-squared test statistic: χ² = nR². The chi-squared statistic is compared to its critical value with [k × (k + 3) / 2] degrees of freedom, where k = number of independent variables.
• If the calculated χ² > critical χ², we reject the null hypothesis of no conditional heteroskedasticity.
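A minimal numpy-only sketch of the chi-squared procedure just described, for a single regressor; the data-generating process, seed, and the choice to include x² in the auxiliary regression (in the spirit of White's test) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 5.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, x)   # error spread grows with x

def ols_r2(X, y):
    """R^2 and residuals from an OLS fit of y on X (with intercept)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return r2, resid

# Step 1: OLS of y on x, keep squared residuals
_, e = ols_r2(x, y)
# Step 2: regress e^2 on the explanatory variables (here x and x^2)
r2_aux, _ = ols_r2(np.column_stack([x, x ** 2]), e ** 2)
# Step 3: chi-squared statistic = n * R^2; df = k(k+3)/2 = 2 for k = 1
chi2_stat = n * r2_aux
print(f"chi^2 = {chi2_stat:.2f} vs ~5.99 critical value at 5%, df=2")
```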

CORRECTING FOR HETEROSKEDASTICITY
• If conditional heteroskedasticity is detected, we can conclude that the coefficient estimates are unaffected but the standard errors are unreliable. In such a case, revised, White standard errors should be used in hypothesis testing instead of the standard errors from OLS estimation procedures.
• White standard errors are heteroskedasticity-consistent standard errors.
MULTICOLLINEARITY

• Multicollinearity refers to the condition when two or more of the independent variables, or linear combinations of the independent variables, in a multiple regression are highly correlated with each other. While multicollinearity does not represent a violation of regression assumptions, its existence compromises the reliability of parameter estimates.
EFFECTS

- There is a greater probability that we will incorrectly conclude that a variable is not statistically significant [Type II error]
DETECTING MULTICOLLINEARITY

• The most common sign of multicollinearity is the situation where t-tests indicate that none of the individual coefficients is significantly different from zero, while the R² is high (and the F-test rejects the null hypothesis).
• [Individual p-values > .05, while R² is high]

• Another approach is to calculate the variance inflation factor (VIF) for each explanatory variable. To do that, we calculate the R² of a regression that uses the subject explanatory variable (Xj) as the dependent variable and the other X variables as independent variables.

• This R² is then used to calculate the VIF: VIF = 1/(1 − R²)
• A VIF > 10 (i.e., R² > 90%) should be considered problematic for that variable.
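A minimal Python sketch of the VIF calculation described above; x1 and x2 are hypothetical, nearly collinear regressors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1

# Regress x1 on the other regressor(s) and compute R^2
X = np.column_stack([np.ones(n), x2])
beta, *_ = np.linalg.lstsq(X, x1, rcond=None)
resid = x1 - X @ beta
r2 = 1.0 - (resid ** 2).sum() / ((x1 - x1.mean()) ** 2).sum()

vif = 1.0 / (1.0 - r2)
print(f"R^2 = {r2:.3f}, VIF = {vif:.1f}")   # VIF > 10 flags a problem
```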

CORRECTING MULTICOLLINEARITY
1. Omit one or more of the correlated independent variables. Unfortunately, it is not always easy to identify the variable(s) that are the source of the multicollinearity.
2. Statistical procedures may help in this effort, such as stepwise regression, which systematically removes variables from the regression until multicollinearity is minimized.
MODEL SPECIFICATION

• Meaning: include all relevant explanatory variables that might explain variation in the dependent variable, and exclude all irrelevant variables.
OMITTED VARIABLE BIAS

Omitting relevant factors from an ordinary least squares (OLS) regression can produce misleading or biased results. Omitted variable bias is present when two conditions are met:
• the omitted variable is correlated with other independent variables in the model; and
• the omitted variable is a determinant of the dependent variable.
BIAS-VARIANCE TRADEOFF

• Models with too many explanatory variables (i.e., overfit models) may explain the variation in the dependent variable well in-sample but perform poorly out-of-sample: larger, overfit models have high variance error due to the inclusion of too many independent variables. Smaller models, on the other hand, have higher bias error (i.e., lower in-sample R²).

Ways to deal with the bias-variance tradeoff:
1. General-to-specific modeling: start with the largest model and successively drop the independent variable with the smallest absolute t-statistic.
2. m-fold cross-validation: divide the sample into m parts, then use (m − 1) parts (the training set) to fit the model and the remaining part (the validation block) for out-of-sample validation.
IDENTIFYING OUTLIERS

• Outliers skew the estimated regression parameters; therefore it is necessary to ensure that there are no outliers in the data.
METRIC TO IDENTIFY OUTLIERS: COOK'S MEASURE

• Large values of Cook's measure (i.e., Dj > 1) indicate that the dropped observation was indeed an outlier.
Reading 21 Stationary Time Series
What's a Time Series?

• A time series is an ordered sequence of values of a variable at equally spaced time intervals. It may also be defined as a collection of readings belonging to different time periods for some economic or composite variable.

• One variable is time (independent variable) and the second is “Data” (dependent variable) - Example: a graph of US GDP growth in the last 50 years.

• Applications of time series analysis include: economic forecasting; sales forecasting; stock market analysis; budgetary analysis; etc.
What's a Time Series?

• A time series can be decomposed into:
I. A trend component: a general systematic linear or (most often) nonlinear component that changes over time and does not repeat
II. A seasonal component: any regular variation (fluctuation) with a period of less than one year
III. A cyclical component: variations with a period greater than one year
IV. A random component: irregular variation or erratic fluctuations
In this chapter, we dwell on various terminologies and aspects under cyclicality.
Autoregression

• An autoregressive model is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step. o Values from previous time steps (past values) are also referred to as "lagged" values.

o "We can predict the value for the next time step (t) given the observations at the last p time steps."
Example: US GDP this year may be a function of GDP figures over the last 10 years.
Covariance Stationary

Formal definition A time series is covariance stationary if all the terms of the sequence have the same mean, and if the covariance between any two terms of the sequence depends only on the relative positions of the two terms, (i.e. on how far apart they are from each other), and not on their absolute position, (i.e., where they are located in the sequence).

• In order to conduct valid statistical inferences based on autoregressive models, we must assume that the time series is covariance stationary.
Covariance Stationary

• There are three basic requirements for a time series to be covariance stationary:

❑ The expected value/mean of the time series must be constant and finite in all periods
❑ The variance of the time series must be constant and finite in all periods
❑ The covariance of the time series with itself for a fixed number of periods in the past or future must be constant and finite
• If the time series has a trend line, such as GDP growth or the labor participation rate, by definition that time series is not mean stationary.
Other Terminologies

• Autocovariance function - Refers to the tool used to quantify stability of the covariance structure. - Helps summarize cyclical dynamics in a covariance stationary time series. • Autocorrelation function - Refers to the degree of correlation and interdependency between data points in a time series. - Has much to do with the correlation of time series observations with previous time steps (lags). - At lag k, the autocorrelation function gives the correlation between series values that are k intervals apart

Other Terminologies

• Partial Autocorrelation
- Refers to a summary of the relationship between an observation in a time series and observations at prior time steps, with the relationships of intervening observations removed.
- The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags. Example: the partial autocorrelation at lag 20 is the correlation between observations that are 20 time steps apart after removing the effect of any correlations due to terms at lags 19, 18, …, etc.
- Positive correlation: large current values correspond with large values at the specified lag.
- Negative correlation: large current values correspond with small values at the specified lag.
White Noise

• White noise is a time series process with a zero mean, constant variance, and no serial correlation.

• Expressed mathematically, a time series y_t is a white noise process if:

- E(y_t) = 0 for all t
- Var(y_t) = σ² for all t, with σ² < ∞

- Cov(y_t, y_s) = 0 if t ≠ s

• The error term of a time series model is assumed to be white noise, i.e., ε_t satisfies the properties above. o The ε_t is a set of noise terms, one for each t, i.e., ε_t: (ε₁, ε₂, …, ε_n), generated by each time series event.

White Noise

• White noise is also called zero-mean white noise and is denoted y_t ~ WN(0, σ²).
• If y is both uncorrelated and independent, then it is independent white noise, denoted y_t ~ iid(0, σ²).

• If y is normally distributed as well as uncorrelated, then y is also serially independent and is called Gaussian white noise or normal white noise, denoted:

y_t ~ iid N(0, σ²)
White Noise

The dynamic structure of a white noise process includes the following characteristics:
• The unconditional mean and variance must be constant for any covariance stationary process.
• The lack of any correlation in white noise means that all autocovariances and autocorrelations are zero beyond displacement zero.
• Both conditional and unconditional means and variances are the same for an independent white noise process.
• Events in a white noise process exhibit no correlation between the past and present.
The Lag Operator

• A lag operator quantifies how a time series evolves by lagging a data series. • It enables a model to express how past data links to the present and how present data links to the future. • The lag operator L shifts a time series back by one time increment.

• L(y_t) = y_{t−1}
• The lag value at t − 2 in terms of the lag operator: L²(y_t) = L(L y_t) = L(y_{t−1}) = y_{t−2}
• In general, L^k(y_t) = y_{t−k}
MOVING AVERAGE
• The moving average is a popular method investors use to model time series.
• A moving-average model establishes a linear relationship between the output variable and the current and various past values of a stochastic (imperfectly predictable) term.
• In basic terms, a moving average model is a linear regression of the current value of the series against current and previous (observed) white noise error terms.
• The First-Order Moving Average (MA(1)) process is defined as:
y_t = ε_t + θε_{t−1}
• where:

o y_t = the time series variable being estimated

o ε_t = current random white noise shock

o ε_{t−1} = one-period lagged random white noise shock
o θ = coefficient for the lagged random shock
• The number 1 in MA(1) refers to a lag of just one step, where the process depends on one lagged error term to fit previous data and predict what may happen in the future.
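A minimal Python sketch simulating the MA(1) process just defined and checking the simulated mean and variance against the unconditional moments listed in the properties below; θ, σ, and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)
theta, sigma, n = 0.5, 1.0, 100_000

e = rng.normal(0.0, sigma, n + 1)     # white noise shocks
y = e[1:] + theta * e[:-1]            # y_t = e_t + theta * e_{t-1}

print(f"sample mean     = {y.mean():.4f}  (theory: 0)")
print(f"sample variance = {y.var():.4f}  (theory: {(1 + theta**2) * sigma**2})")
```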

MA(1) Properties
• The unconditional mean is zero.

• The unconditional variance is equal to σ²(1 + θ²).

• The conditional variance is σ².

• An increase in the absolute value of θ causes the unconditional variance to increase, given that the value of σ is constant.
Moving Average (MA) Models

• The MA(1) process can be inverted and the current value of the series expressed in terms of a current shock and the lagged values of the series, instead of a current and a lagged shock.

• This can be done by making ε_t the subject of the formula: ε_t = y_t − θε_{t−1}

• This is referred to as the autoregressive representation. o This process of inversion enables the forecaster to express current observables in terms of past observables.

• For inversion to occur, the absolute value of θ must be less than 1, i.e., |θ| < 1.
Properties of an MA(q) Process

• The following is the general form of an MA(q) process:

y_t = ε_t + θ₁ε_{t−1} + θ₂ε_{t−2} + … + θ_q ε_{t−q}, with ε_t ~ WN(0, σ²)
• The MA(q) process is a generalized representation of the MA(1) process. This means that the MA(1) process is a special case of the MA(q) process, with q equal to 1.

• Therefore, the MA(q) and the MA(1) processes have properties that are similar in all aspects.

• When q > 1, the MA(q) lag operator polynomial has q roots, and some of them may be complex.
Properties of an MA(q) Process

1) Covariance Stationary: Both MA (q) & MA (1) are covariance stationary, irrespective of the value of parameters.

2) Invertibility: Both MA (q) & MA (1) are invertible at | θ | < 1.

3) Conditional Mean: the conditional mean of MA(1) depends on only the first lag of the innovation, whereas that of MA(q) depends on q lags of the innovation (an MA(q) process has the potential for longer memory).

4) Autocorrelation Function: in the MA(1) case, all autocorrelations beyond displacement 1 are zero, while for MA(q), all autocorrelations beyond displacement q are zero.
Autoregressive (AR) Models

• From the previous chapter, recall that an autoregressive model is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step.

• An autoregressive model of order 1, i.e., AR(1), is an autoregressive process where the current value is based on the immediately preceding value, plus the shock:
y_t = φy_{t−1} + ε_t

o where ε_t is white noise and φ is the parameter/coefficient on the value in time t − 1.
• For the GDP growth series, for instance, an autoregressive model of order one uses only the information on GDP growth observed in the last quarter to predict a future growth rate.
Properties of an AR(1) Process

1) Unlike the MA(1) process, which is always covariance stationary, the AR(1) process is covariance stationary only if |φ| < 1.
2) The autoregressive process is always invertible, unlike the MA(1), which is invertible only when |θ| < 1.

3) The unconditional mean of AR(1) is zero.

4) The unconditional variance of AR(1) is σ² / (1 − φ²).
5) The conditional mean and variance are φy_{t−1} and σ², respectively.
Yule-Walker Equation

• To estimate the autoregressive parameters, such as the coefficient φ, forecasters need accurate estimates of the autocovariances of the data series. This is possible thanks to the Yule-Walker equation.
• When using the Yule-Walker concept to solve for the autocorrelations of an AR(1) process, we use the relationship: ρ_τ = φ^τ for τ = 1, 2, 3, …
Given an AR(1) process such as:

y_t = ε_t + 0.70·y_{t−1}
• the coefficient φ is equal to 0.70, and using the concept derived from the Yule-Walker equation, the first-period autocorrelation is 0.70, the second-period autocorrelation is 0.49 (0.70²), and so on for the remaining autocorrelations.
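A minimal Python sketch of the Yule-Walker autocorrelations ρ_τ = φ^τ for the φ = 0.70 example above.

```python
phi = 0.70
autocorrs = {tau: phi ** tau for tau in range(1, 5)}
for tau, rho in autocorrs.items():
    print(f"rho({tau}) = {rho:.4f}")   # 0.70, 0.49, 0.343, 0.2401
```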

AR(p) Process
• The following is the equation of a general pth-order autoregressive process, AR(p):
y_t = φ₁y_{t−1} + φ₂y_{t−2} + … + φ_p y_{t−p} + ε_t

• where:
o y_t = the time series variable being estimated;
o y_{t−1} = one-period lagged observation of the variable being estimated;
o y_{t−p} = pth-period lagged observation of the variable being estimated;
o ε_t = current random white noise shock; and
o φᵢ = coefficients for the lagged observations of the variable being estimated

• The AR(p) process is also covariance stationary and exhibits the same decay in autocorrelations that was found in the AR(1) process.
Autoregressive Moving Average (ARMA) Models

• MA and AR models can be combined to obtain a better approximation to the Wold representation.
• The result is the autoregressive moving average, ARMA(p, q), process.
• The ARMA(1,1) is the simplest ARMA process that is neither a pure autoregression nor a pure moving average. That is:

y_t = φy_{t−1} + ε_t + θε_{t−1}
• The ARMA formula merges the concepts of an AR process and an MA process.

• Both autoregressive (AR) and autoregressive moving average (ARMA) models can be applied to time series data that show signs of seasonality.

• It's important to bear in mind that seasonality is most apparent when the autocorrelations for a data series do not abruptly cut off, but rather decay gradually with periodic spikes.
Reading 22 Non-Stationary Time Series
What is Trend?

• Trend is a general systematic linear or (most often) nonlinear component that changes over time and does not repeat. It is a pattern of gradual change in a condition, output, or process, or an average or general tendency of a series of data points to move in a certain direction over time.
• Trend forecasting is quantitative: it is based on tangible, concrete numbers from the past.
• It uses time series data: data whose numerical value is known over different points in time.
• Data is usually plotted on a graph: the horizontal x-axis is used to plot time, such as the year, and the vertical y-axis is used to plot the quantity you are trying to predict.
Linear Trend

• A trend is said to be linear when the variable changes at a constant rate, as in a straight line. The value of the trend at time period t (T_t) can be represented by the equation of a straight line (of the form y = c + mx) as:

T_t = β₀ + β₁·Time_t
Where:

• Time_t = t = (1, 2, 3, …, n − 1, n) for a sample of size n. o The variable TIME is called a "time trend" or "time dummy." β₀ is called the intercept; it is the value of the trend at time t = 0. β₁ is the slope of the line, which is positive (negative) if the trend is increasing (decreasing).
Linear Trend

• On a graph, a linear trend appears as a straight line angled diagonally up or down.
• If we looked at sales of Nike products, for example, we might see a diagonal line angled upward, indicating that sales are increasing steadily over time.
Nonlinear Trend

• Nonlinear trends are those in which the variable changes at an increasing or decreasing rate rather than at a constant rate as in linear trends. They can be expressed as quadratic functions of time:
T_t = β₀ + β₁·Time_t + β₂·Time_t²
Forecasting Trend

• In forecasting, trend models are used to predict the values of a series y into the future.
• At a time period T, for instance, we use a trend model to forecast the h-step-ahead value of a series y.
Forecasting Trend

• Assuming ε is just independent zero-mean random noise, the optimal forecast of ε for any future period is 0.
• Thus, using the same trend model as before, i.e., y_t = β₀ + β₁·Time_t + ε_t,

the forecast of y_{T+h} made at time T is given by: y_{T+h} = β₀ + β₁(T + h)
Forecasting Trend

• However, the parameters β₀ and β₁ are unknown.

• Therefore, the feasible forecast of y_{T+h} at time T is given by the estimated parameters: ŷ_{T+h} = β̂₀ + β̂₁(T + h)
What is Seasonality?

• Seasonality is a characteristic of a time series in which the data experiences regular and predictable changes that recur every calendar year.

• Seasonality may be due to weather patterns, holiday patterns, school calendar patterns, etc.

• There's a difference between seasonality and cyclicality: o seasonal effects are observed within one calendar year, e.g., spikes in sales over Christmas; while o cyclical effects span time periods shorter or longer than one calendar year, e.g., spikes in sales due to low unemployment rates.
Seasonality

• Seasonality can have a big effect on investment returns o A business may experience high sales during certain seasons and low sales during off-peak seasons. o Failure to take these fluctuations into account may result in buy/sell decisions based only on short-term trading activity.

• Seasonal patterns can be eliminated from a time series to study the effect of other components such as cyclical variations. o Seasonal variations contribute to forecasting and the prediction of future trends.
Modelling Seasonality

• Seasonality can be modeled by using regression analysis on seasonal dummies. To create seasonal dummies, let s be the number of seasons in a year, which also equals the yearly number of observations on the series: s = 4 for quarterly data, s = 12 for monthly data, s = 52 for weekly data, and so forth.
• If we have four seasons, we create four dummies such that:

D1t = 1 if t = quarter 1 = 0 if not

D2t = 1 if t = quarter 2 = 0 if not

D3t = 1 if t = quarter 3 = 0 if not

D4t = 1 if t = quarter 4, = 0 if not
Modelling Seasonality

• This can be represented as:

D1t = (1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0)

D2t = (0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0)

D3t = (0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0)

D4t = (0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1)

The seasonality can then be modeled as: y_t = γ₁D1_t + γ₂D2_t + γ₃D3_t + γ₄D4_t + ε_t
Modelling Seasonality

• Since we have the four-dummy model above,

• we can expand it season by season: y_t = γ₁ if t is in quarter 1, y_t = γ₂ if t is in quarter 2, y_t = γ₃ if t is in quarter 3, and y_t = γ₄ if t is in quarter 4.

• Here, γᵢ is the intercept: the model regresses on a set of intercepts, with a different intercept in each season, i.e., γ₁ ≠ γ₂ ≠ γ₃ ≠ γ₄.
• These intercepts, which summarize the seasonal pattern over the year, are known as the seasonal factors.
Modelling Seasonality

• Can we work with s − 1 seasonal dummies instead? o Yes. o We could also work with s − 1 seasonal dummies plus an intercept, as: y_t = γ₁ + γ₂D2_t + γ₃D3_t + γ₄D4_t + ε_t

• The constant term is the intercept for the omitted season, and the coefficients on the seasonal dummies give the seasonal increase or decrease relative to the omitted season. o Thus, y_t = γ₁ if t is in quarter 1, o y_t = γ₁ + γ₂ if t is in quarter 2, …, o y_t = γ₁ + γ₄ if t is in quarter 4
Modelling Seasonality

• When forecasting, we would normally also include a linear trend alongside seasonality. The model would appear as follows (see the sketch below):
y_t = β₁·Time_t + Σᵢ γᵢ Dᵢ_t + ε_t
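A minimal Python sketch of fitting trend plus quarterly seasonal dummies by OLS; the quarterly series, true trend slope, and seasonal factors are all synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40                                   # 10 years of quarterly data
t = np.arange(1, n + 1)
quarter = (t - 1) % 4                    # season index 0..3
seasonal = np.array([5.0, -2.0, 1.0, -4.0])
y = 0.5 * t + seasonal[quarter] + rng.normal(0.0, 0.5, n)

# Design matrix: time trend plus four seasonal dummies (no intercept)
D = np.zeros((n, 4))
D[np.arange(n), quarter] = 1.0
X = np.column_stack([t, D])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"trend slope ~ {beta[0]:.3f}, seasonal factors ~ {beta[1:].round(2)}")
```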

Other Types of Calendar Effects
1) Holiday variation
- The idea is that, over time, some holidays' dates can change. For example, the Easter holiday's date differs from year to year despite arriving at approximately the same time each year.
- The timing of such holidays partly affects the behavior of many series, and our forecasting models should track them.
- Dummy variables are used to handle holiday effects; an Easter dummy is a good example.

2) Trading-day variation
- Different months contain different numbers of trading days, which influences the model and the forecast. For example, the number of trading days in February will be less than in August in any given year.
- To handle this condition, a trading-day variable can be incorporated as a dummy variable along with the normal seasonal dummies.
Complete Model

• The complete model, which takes into account the possibility of holiday or trading-day variation, is written as:
y_t = β₁·Time_t + Σᵢ γᵢ Dᵢ_t + Σⱼ δⱼ HDVⱼ_t + Σₖ φₖ TDVₖ_t + ε_t

• In the above equation, there are V1 holiday variables, denoted HDV, and V2 trading-day variables, denoted TDV.
• Ordinary least squares provides a good estimator for this standard regression equation.
H-Step-Ahead Forecast

• The full model is the trend + seasonal + holiday + trading-day regression given above.

• At time T + h: y_{T+h} = β₁·Time_{T+h} + Σᵢ γᵢ Dᵢ,{T+h} + Σⱼ δⱼ HDVⱼ,{T+h} + Σₖ φₖ TDVₖ,{T+h} + ε_{T+h}

• An h-step-ahead forecast based on information known at time T sets the future shock to its expected value of zero: ŷ_{T+h} = β₁·Time_{T+h} + Σᵢ γᵢ Dᵢ,{T+h} + Σⱼ δⱼ HDVⱼ,{T+h} + Σₖ φₖ TDVₖ,{T+h}

Note: since the parameters are unknown, they are replaced by their estimates, as before.
Reading 23 Measuring Return, Volatility, and Correlation
Simple and Continuously Compounded Returns

A simple return can be expressed over various periods of time, spanning from a single hour to a full year.

Continuously compounded returns can be calculated using the formula: r_t = ln(P_t / P_{t−1})
Volatility, Variance, and Implied Volatility

• The volatility of a variable is expressed as the standard deviation of its returns.
• The variance of an asset is the square of this standard deviation.
• Options are used to calculate implied volatility: an annual volatility number that can be measured by backing into it using option prices.
Normal and Non-Normal Distributions

Normal distribution: thin tails, no skewness, no excess kurtosis.
Non-normal distribution: fat tails, skewness, excess kurtosis.
Jarque-Bera Test

• The Jarque-Bera test statistic can be used to test whether a distribution is normal, meaning that there is zero skewness and no excess kurtosis:
JB = (n/6) × [S² + (K − 3)²/4]
where n is the sample size, S is the sample skewness, and K is the sample kurtosis.

• Decision rule.

If the calculated JB > critical value (chi-squared with 2 degrees of freedom), reject the null hypothesis of normality.
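A minimal Python sketch of the Jarque-Bera statistic computed from sample skewness and kurtosis; the fat-tailed return series is simulated from a Student's t distribution as an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(11)
r = rng.standard_t(df=4, size=5_000)      # fat-tailed returns

n = len(r)
mu, sd = r.mean(), r.std()
skew = ((r - mu) ** 3).mean() / sd ** 3
kurt = ((r - mu) ** 4).mean() / sd ** 4

jb = (n / 6.0) * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
print(f"JB = {jb:.1f} vs chi-squared(2) critical value 5.99 at 5%")
```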

The Power Law
• In a normal distribution with kurtosis of three, the tails are thin. For other distributions the tails don't decline as quickly. Some of these distributions have power tails, implying that the probability of seeing a return larger than a given value x is: P(X > x) = k·x^(−α)
Correlation and Dependence

• Correlation represents the linear relationship between two variables.
• Covariance represents the directional relationship between two variables.
• Pearson's correlation serves as a method of measuring linear dependence.
• For nonlinear dependence, Spearman's rank correlation and Kendall's tau can be used. The values of both must lie between −1 and +1; a value of zero is what we expect when the variables are independent.
Spearman's Rank Correlation

Spearman's rank correlation is a linear correlation estimator applied to the ranks of the observations. In a situation where two random variables (x and y) have n associated observations, rank(x) and rank(y) serve as the ranks of the variables.

The equation for the correlation estimator is: ρ_s = 1 − [6 Σ dᵢ²] / [n(n² − 1)], where

dᵢ = rank(xᵢ) − rank(yᵢ)
Kendall's Tau

• Kendall's tau measures the relative frequency of concordant and discordant pairs.
• The equation for calculating Kendall's tau is: τ = (n_c − n_d) / [n(n − 1)/2], where n_c and n_d are the numbers of concordant and discordant pairs.

• Random variables with many concordant pairs have strong positive dependence, whereas variables with many discordant pairs have strong negative dependence.
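A minimal Python sketch computing Spearman's rho and Kendall's tau from first principles; the paired data is hypothetical and assumed to have no ties.

```python
from itertools import combinations

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.5, 3.1, 5.4]

def ranks(v):
    """Rank of each value, 1 = smallest (assumes no ties)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

n = len(x)
rx, ry = ranks(x), ranks(y)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))      # Spearman's rho

conc = disc = 0
for i, j in combinations(range(n), 2):     # all pairs of observations
    s = (x[i] - x[j]) * (y[i] - y[j])
    conc += s > 0                          # concordant pair
    disc += s < 0                          # discordant pair
tau = (conc - disc) / (n * (n - 1) / 2)    # Kendall's tau
print(f"spearman rho = {rho:.3f}, kendall tau = {tau:.3f}")
```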

Positive Definiteness
• A covariance matrix is positive definite if every non-trivial weighted average (linear combination) of its components has positive variance.
• In order to ensure that correlation matrices are positive definite, two structural approaches are used: the first is equicorrelation, which sets all correlations equal to the same value;

the second assumes correlation arises from a common exposure.
Questions

1. An analyst calculates a Spearman's rank correlation of 0.48. This output is indicative of:
• Positive linear correlation
• Negative linear correlation
• Positive nonlinear dependence

• Negative nonlinear dependence
2. Relative to a normal distribution, financial returns tend to have a non-normal distribution, which will have:
• Thin tails
• Kurtosis greater than 3
• No skewness
• A symmetrical distribution
Answers

1. C
2. B
Reading 24 Simulation and Bootstrapping
MONTE CARLO SIMULATION

• Monte Carlo simulation involves creating a model into which all the variability and correlations of the input criteria are entered. The model then runs a large number of simulations to obtain a spread of results.
• It is a numerical method of statistical simulation which utilizes sequences of random numbers to perform the simulation.
Analyzing the Price of a Stock

A stock (e.g., Reliance) can be analyzed in three ways:
1) Technical analysis (indicators/patterns)
2) Fundamental analysis (income statement, balance sheet, cash flows), which yields an intrinsic value
3) Stochastic calculation (quantitative finance)
How Can a Monte Carlo Simulation Be Conducted?

1) Generate data according to the desired data-generating process (DGP), with the errors drawn from some specified distribution.
2) Run the regression and compute the test statistic.
3) Save the test statistic or whatever parameter is of interest.
4) Go back to step 1 and repeat N times (N should be large).
Reducing Monte Carlo Sampling Error

Sampling error
▪ The larger the number of simulations run prior to estimating the parameter value, the lower the sampling error, which declines at the rate σ/√n.
Antithetic Variates

• This technique involves taking the complement of a set of random numbers and running a parallel simulation on those.

• Consider two draws X1 and X2. The variance of their average is: Scenario 1) Var[(X1 + X2)/2] = (1/4)[Var(X1) + Var(X2) + 2Cov(X1, X2)]

Scenario 2) If X2 is the negatively correlated (antithetic) complement, the covariance term is negative, so the variance becomes: (1/4)[Var(X1) + Var(X2) − 2|Cov(X1, X2)|], which is smaller.
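A minimal Python sketch of antithetic variates: estimating E[u²] for u ~ Uniform(0,1) with plain draws versus antithetic pairs (u, 1 − u); the target function and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

u = rng.uniform(size=n)
plain = (u ** 2).mean()                  # ordinary Monte Carlo

v = rng.uniform(size=n // 2)
anti = (v ** 2 + (1.0 - v) ** 2) / 2.0   # pair each draw with 1 - v

print(f"plain estimate = {plain:.4f}")
print(f"antithetic     = {anti.mean():.4f} (true value 1/3)")
print(f"variance per draw: plain {(u ** 2).var():.4f} "
      f"vs antithetic {anti.var():.4f}")
```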

Control Variates
• The control variate technique is a widely used method to reduce sampling error in Monte Carlo simulations. A control variate involves replacing a variable x (under simulation) that has unknown properties with a similar variable y that has known properties.
Bootstrapping Method

• The bootstrapping approach draws random return data from a sample of historical data.
• Unlike Monte Carlo simulation, bootstrapping does not directly model the observed data, nor does it make assumptions about the distribution of the data; rather, the observed data is sampled directly from its unknown distribution.
Independent and Identically Distributed (i.i.d.) Bootstrap

• In this methodology, samples are simply drawn one by one from the observed data, with replacement.

• If we require a simulated sample of size three from a data set with a total of 10 observations, the i.i.d. bootstrap generates observation indices by randomly sampling three times, with replacement, from the values {1, 2, …, 10}. These indices indicate which observed data points to include in the simulated (i.e., bootstrap) sample.
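A minimal Python sketch of the i.i.d. bootstrap just described: draw indices with replacement and collect the corresponding observations; the data array is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(9)
data = np.array([0.5, -1.2, 0.3, 2.1, -0.4, 0.9, 1.5, -0.8, 0.1, 0.7])

sample_size = 3
idx = rng.integers(0, len(data), size=sample_size)  # indices 0..9
bootstrap_sample = data[idx]
print(f"indices: {idx}, bootstrap sample: {bootstrap_sample}")
```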

Circular Block Bootstrap (CBB)
• The CBB method produces bootstrap samples by sampling blocks, with replacement, until the required bootstrap sample size is reached.

• E.g., suppose 10 observations are available and they are sampled in blocks of size three. Ten blocks are constructed, starting with {x1, x2, x3}, {x2, x3, x4}, …, {x8, x9, x10}, {x9, x10, x1}: the first eight blocks use three consecutive observations, but the final two blocks wrap around.

• For a sample size of n, a block size of √n is generally appropriate.
Questions

1] Which of the following statements regarding Monte Carlo simulation is least accurate? When using Monte Carlo simulation:
A. Simulated data is used to numerically approximate the expected value of the function.
B. The user specifies a complete data-generating process (DGP) that is used to produce the simulated data.
C. The observed data are used directly to generate a simulated data set.
D. A full statistical model is used that includes an assumption about the distribution of the shocks.
2] The bootstrapping method is most likely to be effective when the:
A. Data contains outliers.
B. Present is different from the past.
C. Data is independent.
D. Markets have experienced structural changes.
Answers

1] C

2] C