STA 6104 Financial Time Series

Moving Averages and Exponential Smoothing

Smoothing

• Our objective is to predict some future value $Y_{n+k}$ given a past history $\{Y_1, Y_2, \ldots, Y_n\}$ of observations up to time $n$.

• Smoothing always involves some form of local averaging of data such that the non-systematic components of individual observations cancel each other out.

• All time-series smoothing techniques use some form of weighted average of past observations to smooth up-and-down movements, i.e., they are statistical methods for suppressing short-term fluctuations.

• The underlying assumption of these methods is that the fluctuations in past values represent random departures from some smooth curve that can plausibly be extrapolated into the future to produce a forecast or series of forecasts.

Moving Averages

• The most common technique is moving average smoothing, which replaces each element of the series by either the simple or weighted average of the k most recent values, where k is the width of the smoothing “window” (see Box & Jenkins, 1976; Velleman & Hoaglin, 1981).

• The simple moving average (SMA) is most often used to describe simple patterns in a time-series variable and is a very simple procedure for forecasting a time-series variable.

• The calculation of the simple moving average is given by

$$\mathrm{SMA}_t = \frac{1}{k}\left(Y_t + Y_{t-1} + \cdots + Y_{t-(k-1)}\right),$$

where $\mathrm{SMA}_t$ is the simple $k$-period moving average in time period $t$.

• This calculation uses the k most recent periods (the current period and the k − 1 periods before it) to generate the moving average for the current time period.

• We can use the SMA to generate a one-period ahead forecast by

$$\hat{Y}_{t+1} = \mathrm{SMA}_t.$$

• The naive method that assumes the next period will be identical to the present is a special case of SMA with k = 1.

• The term “moving average” refers to an average that is updated each time period by deleting the oldest observation at the beginning of the window and adding the newest observation at the end of the window.
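
The SMA and its one-period-ahead forecast can be sketched in a few lines of Python. This is only an illustration: the function names and the short data series are made up for the example and do not come from the course files.

```python
def simple_moving_average(y, k):
    """Return SMA_t for every t that has k observations available.

    Entry i of the result is the mean of y[i], ..., y[i + k - 1],
    i.e. the k-period SMA at (0-based) time i + k - 1.
    """
    if k < 1 or k > len(y):
        raise ValueError("k must satisfy 1 <= k <= len(y)")
    return [sum(y[i:i + k]) / k for i in range(len(y) - k + 1)]


def sma_forecast(y, k):
    """One-period-ahead forecast: the SMA of the k most recent values."""
    return simple_moving_average(y, k)[-1]


series = [10.0, 12.0, 11.0, 13.0, 14.0]   # made-up illustrative data
print(simple_moving_average(series, 3))    # three-period SMA
print(sma_forecast(series, 3))             # forecast for the next period
print(sma_forecast(series, 1))             # naive forecast (k = 1)
```

With k = 1 the forecast is simply the last observation, which is the naive method described above.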

• The number of most recent time periods k used in computing the moving average is either chosen by the researcher (subjectively) or set to the value of k that generates the smallest error (objectively).

• The accuracy of the forecast using a simple moving average is, in most cases, dependent upon the choice of k.

• The larger the choice of k, the smoother the series will be.
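
One way to make the "objective" choice of k concrete is to compare the mean squared error (MSE) of one-step-ahead SMA forecasts over a few candidate windows. The sketch below assumes MSE as the error measure, which the notes do not specify, and uses made-up data. Note that a larger k leaves fewer forecasts to score, so the comparison is only approximate.

```python
def one_step_mse(y, k):
    """Mean squared error of one-step-ahead SMA forecasts with window k.

    Requires k < len(y) so that at least one forecast can be scored.
    """
    errors = []
    for t in range(k, len(y)):
        forecast = sum(y[t - k:t]) / k      # SMA of the k values before y[t]
        errors.append((y[t] - forecast) ** 2)
    return sum(errors) / len(errors)


def best_window(y, candidates):
    """Return the candidate k with the smallest one-step MSE."""
    return min(candidates, key=lambda k: one_step_mse(y, k))


series = [10.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0, 16.0]  # made-up data
for k in (1, 2, 3):
    print(k, round(one_step_mse(series, k), 3))
print("chosen k:", best_window(series, (1, 2, 3)))
```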

• Example: Exchange rate with Japan (c3t1.xls)

Centered Moving Averages

• The major difference between the simple moving average and the centered moving average is the selection of observations used.

• The simple moving average uses the current observation plus previous observations, e.g., a five-period SMA uses the current time period observation and the four previous time period observations.

• The centered moving average (CMA) “centers” its average on the current period using both the previous time period observations and the forward time period observations, e.g., a five-period CMA uses the current time period observation, the two previous time period observations and the two subsequent time period observations.

• The specific method of calculating a centered moving average depends upon whether the period of averaging, k, is even or odd.

• If k is odd, then the centered moving average is calculated by

$$\mathrm{CMA}_t = \frac{1}{k}\left(Y_{t-(k-1)/2} + \cdots + Y_t + \cdots + Y_{t+(k-1)/2}\right),$$

where $Y_t$ is the mid-point in the range of k data observations.

• Note that (k − 1)/2 observations are lost both at the beginning and the end of the series.
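
A minimal sketch of the odd-k centered moving average follows. Marking the lost end positions with None is a choice made for this example, and the data are made up.

```python
def centered_moving_average_odd(y, k):
    """Centered moving average for odd k.

    Returns a list the same length as y. The first and last (k - 1) / 2
    positions have no full window and are set to None.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd")
    half = (k - 1) // 2
    out = [None] * len(y)
    for t in range(half, len(y) - half):
        out[t] = sum(y[t - half:t + half + 1]) / k   # window centered on t
    return out


series = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]     # made-up data
print(centered_moving_average_odd(series, 5))
```

For k = 5 two values are lost at each end, in line with the (k − 1)/2 rule.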

• With most time-series data, the centered moving average is most commonly used with an even number of observations.

• For example, we are often analyzing either quarterly or monthly data, where the choice of k would be 4 for quarterly or 12 for monthly data.

• In these cases, the average of an even number of observations falls midway between two time periods, so the moving average does not correspond to any time period in the original data series.

• To obtain an average that does correspond to the original time periods, CMAs are calculated using two-period moving averages of the initial moving averages.

• If k is even, the CMA is calculated in a two-step process.

1. Calculate two moving averages that bound the time period:

$$\mathrm{MA}_t^{(1)} = \frac{1}{k}\left(Y_{t-k/2} + \cdots + Y_t + \cdots + Y_{t+k/2-1}\right)$$

$$\mathrm{MA}_t^{(2)} = \frac{1}{k}\left(Y_{t-k/2+1} + \cdots + Y_t + \cdots + Y_{t+k/2}\right)$$

2. Take the simple average of these two values in order to “center” the average on a corresponding time period of the original series:

$$\mathrm{CMA}_t = \frac{1}{2}\left(\mathrm{MA}_t^{(1)} + \mathrm{MA}_t^{(2)}\right)$$

• The number of observations lost at each end of the series is k/2.
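
The two-step procedure for even k can be sketched as follows. The sketch assumes the two windows run from t − k/2 to t + k/2 − 1 and from t − k/2 + 1 to t + k/2, so that their average is centered on t and k/2 observations are lost at each end. The data are made up.

```python
def centered_moving_average_even(y, k):
    """Centered moving average for even k (average of two k-period MAs).

    Returns a list the same length as y; the first and last k / 2
    positions are None because a full pair of windows is unavailable.
    """
    if k % 2 != 0:
        raise ValueError("k must be even")
    half = k // 2
    out = [None] * len(y)
    for t in range(half, len(y) - half):
        ma1 = sum(y[t - half:t + half]) / k          # Y_{t-k/2} .. Y_{t+k/2-1}
        ma2 = sum(y[t - half + 1:t + half + 1]) / k  # Y_{t-k/2+1} .. Y_{t+k/2}
        out[t] = (ma1 + ma2) / 2                     # centered on period t
    return out


quarterly = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # made-up data
print(centered_moving_average_even(quarterly, 4))
```

For a perfectly linear series the centered average reproduces the original values at the interior points, which is a quick check that the average really is centered on t.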

Exponential Smoothing

• Exponential smoothing is a method for continually revising an estimate or forecast by accounting for more recent changes or for fluctuations in the data.

• Here, we assume there is no systematic trend or seasonal effects in the process, or that these have been identified and removed.

• The mean of the process can change from one time step to the next, but we have no information about the likely direction of these changes.

• The model is

$$Y_t = \mu_t + e_t,$$

where $\mu_t$ is the non-stationary mean of the process at time $t$ and $e_t$ are independent random deviations with a mean of 0 and a standard deviation $\sigma$.

• A typical application is forecasting sales of a well-established product in a stable market.

• Exponential smoothing assigns exponentially decreasing weights as the observations get older.

• Recent observations are given relatively more weight in forecasting than the older observations.

• In exponential smoothing, there are one or more smoothing parameters to be determined (or estimated), and these choices determine the weights assigned to the observations.

• Let $a_t$ be our estimate of $\mu_t$.

• Given that there is no systematic trend, an intuitively reasonable estimate of the mean for time $t + 1$ is given by a weighted average of our observation at time $t$ and our estimate of the mean at time $t$:

$$a_{t+1} = \alpha Y_t + (1 - \alpha)a_t, \quad 0 < \alpha < 1,$$

where $a_t$ is the exponentially weighted moving average at time $t$.

• The value of α determines the amount of smoothing, and it is referred to as the smoothing parameter.

• If α is near 1, there is little smoothing and $a_{t+1}$ is approximately $Y_t$. This would only be appropriate if the changes in the mean level were expected to be large by comparison with σ.

• At the other extreme, a value of α near 0 gives highly smoothed estimates of the mean level and takes little account of the most recent observation.

• This would only be appropriate if the changes in the mean level were expected to be small compared with σ.

• A typical compromise figure for α is 0.2 since in practice we usually expect that the change in the mean between time t − 1 and time t is likely to be smaller than σ.
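
The effect of α can be seen by running the recursion $a_{t+1} = \alpha Y_t + (1 - \alpha)a_t$ on a short series. In the sketch below the starting value $a_1 = Y_1$ is an assumption (the notes do not fix an initial value), and the data, which contain a jump in level, are made up.

```python
def exponential_smoothing(y, alpha, a1=None):
    """Return [a_1, a_2, ..., a_{n+1}] from a_{t+1} = alpha*Y_t + (1-alpha)*a_t.

    a1 defaults to the first observation (an assumed initialization).
    The last element, a_{n+1}, is the estimate for the next period.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    a = [y[0] if a1 is None else a1]
    for obs in y:
        a.append(alpha * obs + (1 - alpha) * a[-1])
    return a


series = [20.0, 21.0, 19.0, 20.0, 30.0, 31.0, 29.0, 30.0]  # made-up data
heavy = exponential_smoothing(series, alpha=0.2)   # strong smoothing
light = exponential_smoothing(series, alpha=0.9)   # little smoothing
print([round(v, 2) for v in heavy])
print([round(v, 2) for v in light])
```

With α = 0.9 the estimates follow the jump almost immediately, while with α = 0.2 they adjust only gradually.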

• As time passes, the smoothed statistic becomes the weighted average of a greater and greater number of the past observations $Y_{t-n}$, and the weights assigned to previous observations are in general proportional to the terms of the geometric progression $\{1, (1 - \alpha), (1 - \alpha)^2, (1 - \alpha)^3, \ldots\}$.

• A geometric progression is the discrete version of an exponential function, so this is where the name for this smoothing method originated:

$$w(t) = w(0)\exp(kt), \quad k = \ln(1 - \alpha), \quad w(0) = 1.$$
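
The geometric weights can be made explicit by substituting the recursion $a_{t+1} = \alpha Y_t + (1 - \alpha)a_t$ into itself repeatedly. The expansion below is a worked step added for illustration; $a_1$ denotes whatever starting value is chosen.

```latex
\begin{aligned}
a_{t+1} &= \alpha Y_t + (1-\alpha)a_t \\
        &= \alpha Y_t + \alpha(1-\alpha)Y_{t-1} + (1-\alpha)^2 a_{t-1} \\
        &\;\;\vdots \\
        &= \alpha \sum_{j=0}^{t-1} (1-\alpha)^j \, Y_{t-j} + (1-\alpha)^t a_1 .
\end{aligned}
```

The observation that is $j$ periods old therefore receives weight $\alpha(1-\alpha)^j = \alpha\exp\{j\ln(1-\alpha)\}$, and the influence of the starting value $a_1$ decays as $(1-\alpha)^t$.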

Holt’s Exponential Smoothing

• We usually have more information about the market than exponential smoothing can take into account. Sales are often seasonal, and we may expect trends to be sustained for short periods at least. But trends will change.

• The first extension of the simple exponential smoothing is to adjust the smoothing model for any trend in the data.

• When a trend exists, the forecast may then be improved by adjusting for this trend using a two-parameter exponential smoothing (originated by C. C. Holt, 1957, “Forecasting trends and seasonals by exponentially weighted moving averages”, ONR Research Memorandum, Carnegie Institute of Technology 52).

• The model adds a growth factor (or trend factor) to the smoothing equation as a way to adjust for the trend.

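• Holt’s exponential smoothing is given by

$$a_{t+1} = \alpha Y_t + (1 - \alpha)(a_t + b_t)$$

$$b_{t+1} = \beta(a_{t+1} - a_t) + (1 - \beta)b_t$$

where $a_t$ is the smoothed value for time period $t$, $b_t$ is the trend estimate for time period $t$, $Y_t$ is the actual value in time period $t$, $\alpha$ is the smoothing constant for the level ($0 < \alpha < 1$), and $\beta$ is the smoothing constant for the trend estimate ($0 < \beta < 1$).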
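
A minimal sketch of Holt's two recursions, $a_{t+1} = \alpha Y_t + (1-\alpha)(a_t + b_t)$ and $b_{t+1} = \beta(a_{t+1} - a_t) + (1-\beta)b_t$, is given below. The default starting values ($a_1 = Y_1$ and $b_1 = Y_2 - Y_1$) are assumptions made for the example, not part of the notes, and the data are made up.

```python
def holt_smoothing(y, alpha, beta, a1=None, b1=None):
    """Run Holt's level and trend recursions over the series y.

    Returns two lists, [a_1, ..., a_{n+1}] and [b_1, ..., b_{n+1}].
    Defaults a1 = y[0] and b1 = y[1] - y[0] are assumed initial values.
    """
    a = [y[0] if a1 is None else a1]
    b = [y[1] - y[0] if b1 is None else b1]
    for obs in y:
        a_next = alpha * obs + (1 - alpha) * (a[-1] + b[-1])   # level update
        b_next = beta * (a_next - a[-1]) + (1 - beta) * b[-1]  # trend update
        a.append(a_next)
        b.append(b_next)
    return a, b


series = [10.0, 12.0, 13.5, 15.0, 17.2, 18.9]   # made-up trending data
levels, trends = holt_smoothing(series, alpha=0.5, beta=0.3)
print([round(v, 2) for v in levels])
print([round(v, 2) for v in trends])
```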

Additive and Multiplicative Models

• Time-series models can basically be classified into two types: additive models and multiplicative models.

• For an additive model, we assume the data are the sum of the time-series components (trend, seasonal, cyclical and irregular), i.e.,

$$Y_t = Tr_t + Sn_t + Cl_t + e_t.$$

• If the data do not contain one of the components, the value of that component is equal to zero.

• The seasonal (or cyclical) component of an additive model is independent of the trend, and thus the magnitude of the seasonal swing (movement) is constant over time.

• For a multiplicative model, the data are the product of the various components, i.e.,

$$Y_t = Tr_t \times Sn_t \times Cl_t \times e_t.$$

• If the data do not contain one of the components, the value of that component is equal to 1.

• The seasonal (or cyclical) component of a multiplicative model is proportional to (a ratio of) the trend, and thus the magnitude of the seasonal swing increases or decreases according to the behavior of the trend.

• Although most data that possess seasonal (cyclical) variations cannot be precisely classified as additive or multiplicative in nature, we usually look at forecasts obtained from both models and choose the model that yields the smallest error.
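
The difference in seasonal swing can be illustrated with two synthetic series that share the same linear trend. Everything in this sketch (the trend, the seasonal factors, and the helper name `seasonal_swing`) is invented for the illustration, and the cyclical and irregular components are left out.

```python
def seasonal_swing(series, p):
    """Range (max minus min) within each consecutive block of p observations."""
    return [max(series[i:i + p]) - min(series[i:i + p])
            for i in range(0, len(series) - p + 1, p)]


p = 4                                         # quarterly pattern
trend = [100 + 5 * t for t in range(12)]      # made-up linear trend
add_season = [10, -5, -10, 5]                 # additive seasonal effects
mult_season = [1.10, 0.95, 0.90, 1.05]        # multiplicative seasonal factors

additive = [trend[t] + add_season[t % p] for t in range(12)]
multiplicative = [trend[t] * mult_season[t % p] for t in range(12)]

print(seasonal_swing(additive, p))            # same swing in every year
print(seasonal_swing(multiplicative, p))      # swing grows with the trend
```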

Winters’ Exponential Smoothing

• Winters’ exponential smoothing model is another extension of the simple exponential smoothing model; it is used for data that exhibit both trend and seasonality.

• It is a three-parameter exponential smoothing model, which has an additional equation to adjust for the seasonal component.

• Winters’ exponential smoothing is due to P. R. Winters (1960, “Forecasting sales by exponentially weighted moving averages”, Management Science 6, 324–342).

• The additive Holt-Winters prediction function (for time series with period length $p$) is

$$\hat{Y}_{t+h} = a_t + h\,b_t + s_{t-p+1+(h-1) \bmod p},$$

where $a_t$, $b_t$ and $s_t$ are given by

$$a_t = \alpha(Y_t - s_{t-p}) + (1 - \alpha)(a_{t-1} + b_{t-1})$$

$$b_t = \beta(a_t - a_{t-1}) + (1 - \beta)b_{t-1}$$

$$s_t = \gamma(Y_t - a_t) + (1 - \gamma)s_{t-p}$$

where

– $Y_t$ = actual value in time period $t$

– $a_t$ = smoothed value for time period $t$

– $b_t$ = trend estimate for time period $t$

– $s_t$ = seasonality estimate for time period $t$

– $\alpha$ = smoothing constant for the level ($0 < \alpha < 1$)

– $\beta$ = smoothing constant for the trend estimate ($0 < \beta < 1$)

– $\gamma$ = smoothing constant for the seasonality estimate ($0 < \gamma < 1$)

– $h$ = number of periods in the forecast lead period

– $p$ = number of periods in the seasonal cycle
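
The additive updates can be sketched as below. Two points are assumptions of this sketch rather than statements from the notes: the initial level, trend and seasonal values are simply passed in by the caller, and the forecast uses the most recent available estimate for the matching season, $s_{t-p+1+(h-1) \bmod p}$. The data are synthetic.

```python
def holt_winters_additive(y, p, alpha, beta, gamma, a0, b0, s0):
    """Run the additive Holt-Winters updates over y.

    a0, b0: initial level and trend; s0: list of p initial seasonal
    estimates for the p periods before the data start (all assumed inputs).
    Returns the final level, the final trend and the list of seasonal
    estimates, whose last element is s_t.
    """
    a, b = a0, b0
    s = list(s0)
    for obs in y:
        s_old = s[-p]                                   # s_{t-p}
        a_new = alpha * (obs - s_old) + (1 - alpha) * (a + b)
        b_new = beta * (a_new - a) + (1 - beta) * b
        s.append(gamma * (obs - a_new) + (1 - gamma) * s_old)
        a, b = a_new, b_new
    return a, b, s


def forecast_additive(a, b, s, p, h):
    """h-step-ahead forecast using the latest estimate for that season."""
    return a + h * b + s[-p + (h - 1) % p]


y = [13.0, 10.0, 15.0, 12.0]     # synthetic: level 10, slope 1, season +2/-2
a, b, s = holt_winters_additive(y, p=2, alpha=0.5, beta=0.5, gamma=0.5,
                                a0=10.0, b0=1.0, s0=[2.0, -2.0])
print([forecast_additive(a, b, s, 2, h) for h in (1, 2, 3)])
```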

• The multiplicative Holt-Winters prediction function (for time series with period length $p$) is

$$\hat{Y}_{t+h} = (a_t + h\,b_t)\,s_{t-p+1+(h-1) \bmod p},$$

where $a_t$, $b_t$ and $s_t$ are given by

$$a_t = \alpha(Y_t / s_{t-p}) + (1 - \alpha)(a_{t-1} + b_{t-1})$$

$$b_t = \beta(a_t - a_{t-1}) + (1 - \beta)b_{t-1}$$

$$s_t = \gamma(Y_t / a_t) + (1 - \gamma)s_{t-p}$$

• As with simple and Holt’s exponential smoothing, initial values must be selected to initialize or warm up the model.

• Over a long time period, the particular values selected have little effect on the forecast.
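
The multiplicative version differs from the additive one only in dividing by, rather than subtracting, the seasonal estimate. As a hedged sketch: the initial values are supplied by the caller, the forecast is assumed to use the latest available estimate for the matching season, and the data are synthetic.

```python
def holt_winters_multiplicative(y, p, alpha, beta, gamma, a0, b0, s0):
    """Run the multiplicative Holt-Winters updates over y.

    a0, b0 and the p seasonal factors in s0 are assumed initial values.
    Returns the final level, the final trend and the list of seasonal
    factors, whose last element is s_t.
    """
    a, b = a0, b0
    s = list(s0)
    for obs in y:
        s_old = s[-p]                                   # s_{t-p}
        a_new = alpha * (obs / s_old) + (1 - alpha) * (a + b)
        b_new = beta * (a_new - a) + (1 - beta) * b
        s.append(gamma * (obs / a_new) + (1 - gamma) * s_old)
        a, b = a_new, b_new
    return a, b, s


def forecast_multiplicative(a, b, s, p, h):
    """h-step-ahead forecast using the latest factor for that season."""
    return (a + h * b) * s[-p + (h - 1) % p]


y = [16.5, 6.0, 19.5, 7.0]   # synthetic: level 10, slope 1, factors 1.5/0.5
a, b, s = holt_winters_multiplicative(y, p=2, alpha=0.5, beta=0.5, gamma=0.5,
                                      a0=10.0, b0=1.0, s0=[1.5, 0.5])
print([forecast_multiplicative(a, b, s, 2, h) for h in (1, 2)])
```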

New Product Forecasting

• For new products, because they typically lack historical data, most forecasting techniques cannot produce satisfying results.

• It is typically impossible for Holt’s exponential smoothing to determine the trend since the dataset is too small.

• To overcome this difficulty, forecasters use a number of models that generally fall in the category called diffusion models.

• These models are alternatively called S-curves, growth models, saturation models, or substitution curves.

• These models are most commonly used to forecast the sales of new products and technology life cycles.

• Life cycles usually follow a common pattern:

1. A period of slow growth just after introduction during an embryonic stage.

2. A period of rapid growth.

3. Slowing growth in a mature phase.

4. Decline.

• The forecaster’s task is to identify and estimate the parameters of such a pattern of growth.

• Each new-product model has its own lower and upper limit and expert opinion is needed to determine these limits.

• In most cases, the lower limit is 0 and the determination of the upper limit is a more complicated task.

Gompertz Curve

• The two most common forms of S-curves used in forecasting are the Gompertz curve and the logistics curve (also known as the Pearl curve).

• The Gompertz curve is named after its developer, Benjamin Gompertz, an English actuary.

• In 1825, Benjamin Gompertz proposed an exponential increase in death rates with age.

• Gompertz’s Law of Mortality states that the death rate increases with age in a geometric progression.

• The Gompertz function is given as

$$Y_t = L\,e^{-a e^{-bt}},$$

where $a$ and $b$ are the parameters that describe the curve and $L$ is the upper limit of $Y$.

• The Gompertz curve will range in value from zero to $L$ as $t$ varies from $-\infty$ to $\infty$.

• The Gompertz curve is best used in situations where it becomes more difficult to achieve an increase in the growth rate as the maximum value is approached.
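
A small sketch of the Gompertz function follows, assuming the form $Y_t = L\,e^{-a e^{-bt}}$ with positive $a$ and $b$. The parameter values are hypothetical and chosen only to show the shape of the curve.

```python
import math


def gompertz(t, L, a, b):
    """Gompertz curve value at time t with upper limit L (a, b > 0)."""
    return L * math.exp(-a * math.exp(-b * t))


L, a, b = 100.0, 5.0, 0.3           # hypothetical parameters
for t in (0, 5, 10, 20, 40):
    print(t, round(gompertz(t, L, a, b), 2))

# The point of inflection is at t = ln(a) / b, where the curve equals L / e.
t_star = math.log(a) / b
print(round(t_star, 2), round(gompertz(t_star, L, a, b), 2))
```

The curve passes its point of inflection at about 37% of the upper limit, so more than half of the total growth takes place in the slower, later phase.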

• Example: Color TV adoptions (c3t8)

Logistics Curve

• The logistics curve is another way of forecasting with sparse data and is also used frequently to forecast new-product sales.

• The logistics curve has the following form:

$$Y_t = \frac{L}{1 + a e^{-bt}},$$

where $a$ and $b$ are the parameters that describe the curve and $L$ is the upper limit of $Y$.

• The logistics curve is symmetric about its point of inflection, i.e., the upper half of the curve is a reflection of the lower half.

• Note that the Gompertz curve is not necessarily symmetric about its point of inflection.
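
The symmetry of the logistics curve can be checked numerically. The sketch assumes the form $Y_t = L/(1 + a e^{-bt})$, whose point of inflection lies at $t = \ln(a)/b$ with $Y = L/2$. The parameter values are hypothetical.

```python
import math


def logistic(t, L, a, b):
    """Logistics curve value at time t with upper limit L (a, b > 0)."""
    return L / (1 + a * math.exp(-b * t))


L, a, b = 100.0, 50.0, 0.4          # hypothetical parameters
t_star = math.log(a) / b            # point of inflection
print(round(logistic(t_star, L, a, b), 2))      # half of the upper limit

# Symmetry: values equally far before and after t_star add up to L.
for d in (1.0, 3.0, 6.0):
    below = logistic(t_star - d, L, a, b)
    above = logistic(t_star + d, L, a, b)
    print(d, round(below, 2), round(above, 2), round(below + above, 2))
```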

Logistics Curve vs. Gompertz Curve

• The choice between the logistics curve and the Gompertz curve depends on whether it becomes easier or more difficult to attain the maximum value the closer you get to it.

• If there is an offsetting factor such that growth is more difficult to maintain as the maximum is approached, then the Gompertz curve will be the best choice.

• If there are no such offsetting factors hindering the attainment of the maximum value, the logistics curve will be the best choice.

• Example: Cellular phone adoption (c3t9)

Bass Model

• Professor Frank M. Bass published a paper describing his mathematical model, which quantified the theory of adoption and diffusion of a new product by society, in Management Science nearly fifty years ago (Bass, 1969).

• The mathematics is straightforward, and the model has been influential in marketing, and on a variety of biological, medical and scientific forecasts.

• An entrepreneur with a new invention will often use the Bass model when making a case for funding.

• The Bass formula for the number of people, $N_t$, who have bought a product at time $t$ depends on three parameters:

– p: the coefficient of innovation (the probability of initial purchase of a new product independent of the influence of previous buyers)

– q: the coefficient of imitation (the pressure on those who have not yet purchased to imitate previous purchasers)

– m: the total number of people who eventually buy the product

• The Bass formula (discrete-time version) is

$$N_{t+1} = N_t + p(m - N_t) + q N_t (m - N_t)/m$$

• Rationale for the model is that initial sales will be to people who are interested in the novelty of the product, whereas later sales will be to people who are drawn to the product after seeing their friends and acquaintances use it.
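
The discrete-time recursion $N_{t+1} = N_t + p(m - N_t) + qN_t(m - N_t)/m$ is easy to iterate directly. In the sketch below the starting value $N_0 = 0$ and the parameter values are assumptions chosen for illustration, not estimates from any data set.

```python
def bass_discrete(p, q, m, periods, n0=0.0):
    """Iterate the discrete-time Bass recursion for a number of periods.

    Returns [N_0, N_1, ..., N_periods], the cumulative number of buyers.
    """
    n = [n0]
    for _ in range(periods):
        nt = n[-1]
        innovators = p * (m - nt)             # buy independently of others
        imitators = q * nt * (m - nt) / m     # influenced by previous buyers
        n.append(nt + innovators + imitators)
    return n


cumulative = bass_discrete(p=0.1, q=0.5, m=100.0, periods=15)
sales = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
print([round(v, 1) for v in cumulative])
print([round(v, 1) for v in sales])           # per-period sales rise, then fall
```

In the first period all sales come from the innovation term, since there are no previous buyers to imitate.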

• The above formula is a difference equation and its solution is

$$N_t = m\,\frac{1 - e^{-(p+q)t}}{1 + (q/p)e^{-(p+q)t}}.$$

(We will verify this result for the continuous-time version of the model.)

• One interpretation of the Bass model is that the time from product launch until purchase is assumed to have a probability distribution that can be parameterized in terms of p and q.

• Let $f(t)$, $F(t)$ and $h(t) = f(t)/(1 - F(t))$ denote the pdf, CDF and hazard of the time until purchase. The interpretation of the hazard is that if it is multiplied by a small time increment, it gives the probability that a random purchaser who has not yet made the purchase will do so in the next small time increment.

• The continuous-time model of the Bass formula can be expressed in terms of the hazard function and the CDF:

$$h(t) = p + qF(t).$$

• This differential equation gives the solution for $F(t)$ as

$$F(t) = \frac{1 - e^{-(p+q)t}}{1 + (q/p)e^{-(p+q)t}}.$$

• Therefore, the pdf is

$$f(t) = \frac{(p+q)^2 e^{-(p+q)t}}{p\left[1 + (q/p)e^{-(p+q)t}\right]^2}.$$

• Cumulative sales are given by m × F (t).

• The time to peak is $t = \dfrac{\ln(q) - \ln(p)}{p + q}$.
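
The continuous-time quantities can be evaluated and checked against each other numerically. The sketch below codes $F(t)$, $f(t)$ and the time to peak as written above, then confirms that $f(t)/(1 - F(t))$ equals $p + qF(t)$. The values of p, q and m are hypothetical.

```python
import math


def bass_cdf(t, p, q):
    """F(t): fraction of eventual buyers who have bought by time t."""
    e = math.exp(-(p + q) * t)
    return (1 - e) / (1 + (q / p) * e)


def bass_pdf(t, p, q):
    """f(t): density of the time until purchase."""
    e = math.exp(-(p + q) * t)
    return (p + q) ** 2 * e / (p * (1 + (q / p) * e) ** 2)


def bass_peak_time(p, q):
    """Time at which f(t), and hence the sales rate, is largest (q > p)."""
    return (math.log(q) - math.log(p)) / (p + q)


p, q, m = 0.03, 0.38, 1000.0                  # hypothetical parameters
t_peak = bass_peak_time(p, q)
print(round(t_peak, 2))                        # time to peak
print(round(m * bass_cdf(t_peak, p, q), 1))    # cumulative sales at the peak
hazard = bass_pdf(5.0, p, q) / (1 - bass_cdf(5.0, p, q))
print(round(hazard, 6), round(p + q * bass_cdf(5.0, p, q), 6))
```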

• Example: Adoption of Telephone-answering devices in the United States (c3t12)