
STA 6104 Financial Time

Moving and Exponential Smoothing

• Our objective is to predict some future value $Y_{n+k}$ given a past history $\{Y_1, Y_2, \ldots, Y_n\}$ of observations up to time $n$.

• Smoothing always involves some form of local averaging of the data such that the non-systematic components of individual observations cancel each other out.

• All the time-series smoothing techniques use a form of weighted average of past observations to smooth up-and-down movements, i.e., they are statistical methods for suppressing short-term fluctuations.

• The underlying assumption of these methods is that the fluctuations in past values represent random departures from some smooth curve that can plausibly be extrapolated into the future to produce a forecast or series of forecasts.

1 Moving Averages

• The most common technique is moving average smoothing, which replaces each element of the series by either the simple or weighted average of the $n$ most recent values, where $n$ is the width of the smoothing “window” (see Box & Jenkins, 1976; Velleman & Hoaglin, 1981).

• The simple moving average (SMA) is most often used to describe simple patterns in a time-series variable, and it is a very simple procedure to apply.

• The calculation of the simple moving average is given by

$SMA_t = \frac{1}{k}\left(Y_t + Y_{t-1} + \cdots + Y_{t-(k-1)}\right)$,

where $SMA_t$ is the simple $k$-period moving average in time period $t$.

2 Moving Averages

• This calculation uses the $k$ most recent observations (the current period and the $k - 1$ periods before it) to generate the moving average for time period $t$.

• We can use the SMA to generate a one-period-ahead forecast by

$\hat{Y}_{t+1} = SMA_t$.

• The naive method that assumes the next period will be identical to the present is a special case of SMA with k = 1.

• The term “moving average” refers to an average that is updated each time period by deleting one observation at the beginning of the period and replacing it with another at the end of the period.
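The SMA and its one-period-ahead forecast can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the data are made up for the example, and the series is held in a plain list.

```python
def simple_moving_average(y, k):
    """Return SMA_t for every t that has k observations available.

    Entry i is the average of y[i], ..., y[i + k - 1], i.e. the SMA for
    the period of the last observation in that window.
    """
    if k < 1 or k > len(y):
        raise ValueError("k must be between 1 and len(y)")
    return [sum(y[i:i + k]) / k for i in range(len(y) - k + 1)]


def sma_forecast(y, k):
    """One-period-ahead forecast: the SMA of the last k observations."""
    return sum(y[-k:]) / k


series = [10.0, 12.0, 11.0, 13.0, 15.0]  # made-up illustrative data
print(simple_moving_average(series, 3))  # [11.0, 12.0, 13.0]
print(sma_forecast(series, 3))           # 13.0
print(sma_forecast(series, 1))           # 15.0, the naive forecast (k = 1)
```

Each new window drops the oldest observation and adds the newest one, which is the "moving" part of the name.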

3 Moving Averages

• The number of most recent time periods $k$ used in computing the moving average is either chosen by the researcher (subjectively) or set to the value of $k$ that generates the smallest forecast error (objectively).

• The accuracy of the forecast using a simple moving average is, in most cases, dependent upon the choice of k.

• The larger the choice of k, the smoother the series will be.

• Example: Exchange rate with Japan (c3t1.xls)
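One way to make the objective choice of $k$ concrete is to compare the one-step-ahead mean squared error over a few candidate window widths. The sketch below assumes mean squared error as the error measure (the notes do not name one), uses made-up data rather than the exchange-rate file, and the function names are hypothetical.

```python
def one_step_mse(y, k):
    """Mean squared error of the forecasts Yhat_{t+1} = SMA_t."""
    errors = []
    for t in range(k, len(y)):
        forecast = sum(y[t - k:t]) / k  # SMA of the k values before y[t]
        errors.append((y[t] - forecast) ** 2)
    return sum(errors) / len(errors)


def best_window(y, candidates):
    """Return the candidate k with the smallest one-step-ahead MSE."""
    return min(candidates, key=lambda k: one_step_mse(y, k))


series = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0, 18.0]  # made-up data
for k in (1, 2, 3):
    print(k, round(one_step_mse(series, k), 3))
print("best k:", best_window(series, (1, 2, 3)))
```

Note that a larger $k$ leaves fewer forecasts to score, so with short series the comparison is only indicative.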

4 Centered Moving Averages

• The major difference between the simple moving average and the centered moving average is the selection of observations used.

• The simple moving average uses the current observation plus previous observations, e.g., a five-period SMA uses the current time period observation and the four previous time period observations.

• The centered moving average (CMA) “centers” its average on the current period using both the previous time period observations and the forward time period observations, e.g., a five-period CMA uses the current time period observation, the two previous time period observations and the two subsequent time period observations.

5 Centered Moving Averages

• The specific method of calculating a centered moving average depends upon whether the period of averaging, k, is even or odd.

• If $k$ is odd, then the centered moving average is calculated by

$CMA_t = \frac{1}{k}\left(Y_{t-(k-1)/2} + \cdots + Y_t + \cdots + Y_{t+(k-1)/2}\right)$,

where $Y_t$ is the mid-point in the range of $k$ data observations.

• Note that (k − 1)/2 observations are lost both at the beginning and the end of the series.

6 Centered Moving Averages

• In practice, the centered moving average is most commonly used with an even number of observations.

• For example, we are often analyzing either quarterly or monthly data, where the choice of k would be 4 for quarterly or 12 for monthly data.

• In these cases, the mid-point of a $k$-period moving average falls between two time periods, so it does not correspond to any time period in the original data series.

• To obtain an average that does correspond to the original time periods, the CMA is calculated as a two-period moving average of the initial moving averages.

7 Centered Moving Averages

• If k is even, the CMA is calculated in a two-step process.

1. Calculate two $k$-period moving averages that bound time period $t$, the first centered at $t - 1/2$ and the second at $t + 1/2$:

$MA_t^{(1)} = \frac{1}{k}\left(Y_{t-(k/2)} + \cdots + Y_t + \cdots + Y_{t+(k/2)-1}\right)$

$MA_t^{(2)} = \frac{1}{k}\left(Y_{t-(k/2)+1} + \cdots + Y_t + \cdots + Y_{t+(k/2)}\right)$

2. Take the simple average of these two values in order to “center” the average on a corresponding time period of the original series:

$CMA_t = \frac{1}{2}\left(MA_t^{(1)} + MA_t^{(2)}\right)$

• The number of observations lost at each end of the series is $k/2$.
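The odd and even cases can be sketched together as follows. The sketch assumes that, for even $k$, the two $k$-period averages are the ones centered at $t - 1/2$ and $t + 1/2$, so that $k/2$ observations are lost at each end. The function name and data are made up for illustration.

```python
def centered_moving_average(y, k):
    """Centered moving average; entries that cannot be computed are None."""
    n = len(y)
    cma = [None] * n
    half = k // 2
    if k % 2 == 1:
        # Odd k: average Y_{t-half}, ..., Y_t, ..., Y_{t+half}.
        for t in range(half, n - half):
            cma[t] = sum(y[t - half:t + half + 1]) / k
    else:
        # Even k: average two k-period means centered at t - 1/2 and t + 1/2.
        for t in range(half, n - half):
            ma1 = sum(y[t - half:t + half]) / k          # Y_{t-k/2} .. Y_{t+k/2-1}
            ma2 = sum(y[t - half + 1:t + half + 1]) / k  # Y_{t-k/2+1} .. Y_{t+k/2}
            cma[t] = (ma1 + ma2) / 2
    return cma


series = [float(v) for v in range(1, 9)]  # made-up linear series 1, ..., 8
print(centered_moving_average(series, 3))
print(centered_moving_average(series, 4))
```

On a purely linear series the centered average reproduces the original values wherever it is defined, which is a quick check that the window really is centered on $t$.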

8 Exponential Smoothing

• Exponential smoothing is a method for continually revising an estimate or forecast by accounting for more recent changes or for fluctuations in the data.

• Here, we assume there is no systematic trend or seasonal effects in the process, or that these have been identified and removed.

• The mean of the process can change from one time step to the next, but we have no information about the likely direction of these changes.

• The model is

Yt = µt + et,

where µt is the non-stationary mean of the process at time t and et are independent random deviations with a mean of 0 and a standard deviation σ.

9 Exponential Smoothing

• A typical application is forecasting sales of a well-established product in a stable market.

• Exponential smoothing assigns exponentially decreasing weights as the observations get older.

• Recent observations are given relatively more weight in forecasting than the older observations.

• In exponential smoothing, however, there are one or more smoothing parameters to be determined (or estimated) and these choices determine the weights assigned to the observations.

10 Exponential Smoothing

• Let at be our estimate of µt.

• Given that there is no systematic trend, an intuitively reasonable estimate of the mean at time t is given by a weighted average of our observation at time t and our estimate of the mean at time t - 1:

$a_t = \alpha Y_t + (1 - \alpha) a_{t-1}, \quad 0 < \alpha < 1$,

where at is the exponentially weighted moving average at time t.

• The value of α determines the amount of smoothing, and it is referred to as the smoothing parameter.

11 Exponential Smoothing

• If α is near 1, there is little smoothing and $a_t$ is approximately $Y_t$. This would only be appropriate if the changes in the mean level were expected to be large by comparison with σ.

• At the other extreme, a value of α near 0 gives highly smoothed estimates of the mean level and takes little account of the most recent observation.

• This would only be appropriate if the changes in the mean level were expected to be small compared with σ.

• A typical compromise figure for α is 0.2 since in practice we usually expect that the change in the mean between time t − 1 and time t is likely to be smaller than σ.
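The effect of α can be seen by running the recursion directly. The sketch below assumes the update $a_t = \alpha Y_t + (1 - \alpha) a_{t-1}$ described in the text and, as a further assumption, starts the recursion from the first observation when no initial estimate is supplied. The function name and data are illustrative only.

```python
def exponential_smoothing(y, alpha, a0=None):
    """Return the smoothed values a_t = alpha*Y_t + (1 - alpha)*a_{t-1}."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    level = y[0] if a0 is None else a0  # assumed starting estimate
    smoothed = []
    for obs in y:
        level = alpha * obs + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


series = [10.0, 12.0, 11.0, 15.0, 9.0, 13.0]  # made-up data
print([round(a, 2) for a in exponential_smoothing(series, 0.2)])  # heavy smoothing
print([round(a, 2) for a in exponential_smoothing(series, 0.9)])  # follows the data
```

With α = 0.9 the smoothed values stay close to the observations, while with α = 0.2 they react slowly to each new value.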

12 Exponential Smoothing

• As time passes, the smoothed value $a_t$ becomes the weighted average of a greater and greater number of the past observations $Y_{t-n}$, and the weights assigned to previous observations are proportional to the terms of the geometric progression $\{1, (1 - \alpha), (1 - \alpha)^2, (1 - \alpha)^3, \ldots\}$.

• A geometric progression is the discrete version of an exponential function, so this is where the name for this smoothing method originated:

$w(t) = w(0) \exp(kt), \quad k = \ln(1 - \alpha), \quad w(0) = 1$.
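The geometric weights can be seen by unrolling the recursion. The short derivation below assumes the update $a_t = \alpha Y_t + (1 - \alpha) a_{t-1}$ and an initial estimate $a_0$ (the symbol $a_0$ is introduced here for illustration).

```latex
\begin{aligned}
a_t &= \alpha Y_t + (1-\alpha)\,a_{t-1} \\
    &= \alpha Y_t + \alpha(1-\alpha)\,Y_{t-1} + (1-\alpha)^2 a_{t-2} \\
    &\;\;\vdots \\
    &= \alpha \sum_{j=0}^{t-1} (1-\alpha)^j\, Y_{t-j} + (1-\alpha)^t a_0 ,
\qquad (1-\alpha)^j = \exp\bigl(j \ln(1-\alpha)\bigr).
\end{aligned}
```

The observation $j$ periods old therefore receives weight $\alpha(1-\alpha)^j$, and the influence of the starting value $a_0$ decays at the same geometric rate.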

13 Holt’s Exponential Smoothing

• We usually have more information about the market than exponential smoothing can take into account. Sales are often seasonal, and we may expect trends to be sustained for short periods at least. But trends will change.

• The first extension of the simple exponential smoothing is to adjust the smoothing model for any trend in the data.

• When a trend exists, the forecast may then be improved by adjusting for this trend using a two-parameter exponential smoothing (originated by C. C. Holt, 1957, “Forecasting trends and seasonals by exponentially weighted moving averages”, ONR Research Memorandum, Carnegie Institute of Technology 52).

14 Holt’s Exponential Smoothing

• The model adds a growth factor (or trend factor) to the smoothing equation as a way to adjust for the trend.

• Holt’s exponential smoothing is given by

$a_{t+1} = \alpha Y_t + (1 - \alpha)(a_t + b_t)$

$b_{t+1} = \beta(a_{t+1} - a_t) + (1 - \beta) b_t$

where

$a_{t+1}$ = smoothed value for time period $t + 1$
$\alpha$ = smoothing constant for the level ($0 < \alpha < 1$)
$Y_t$ = actual value in time period $t$
$a_t$ = smoothed value for time period $t$
$b_{t+1}$ = trend estimate for time period $t + 1$
$b_t$ = trend estimate for time period $t$
$\beta$ = smoothing constant for the trend estimate ($0 < \beta < 1$)
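A minimal sketch of the two Holt recursions follows. The initial level and trend are passed in as arguments because the notes do not specify how to choose them, and the forecast function (latest level plus $h$ times the latest trend) is the usual convention rather than something stated above. All names and data are illustrative.

```python
def holt_smoothing(y, alpha, beta, a0, b0):
    """Apply the level and trend recursions to each observation in turn.

    levels[i] and trends[i] are the estimates obtained after processing y[i].
    """
    level, trend = a0, b0
    levels, trends = [], []
    for obs in y:
        new_level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
        levels.append(level)
        trends.append(trend)
    return levels, trends


def holt_forecast(level, trend, h):
    """Assumed forecast rule: extrapolate the latest level along the trend."""
    return level + h * trend


series = [2.0, 4.0, 6.0, 8.0]  # made-up series with a constant trend of 2
levels, trends = holt_smoothing(series, alpha=0.5, beta=0.5, a0=0.0, b0=2.0)
print(levels, trends)
print(holt_forecast(levels[-1], trends[-1], h=3))
```

On an exactly linear series started from consistent initial values, the level tracks the data and the trend estimate stays constant, so the forecast simply continues the line.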

15 Additive and Multiplicative Models

• Time-series models can basically be classified into two types: additive models and multiplicative models.

• For an additive model, we assume the data are the sum of the time-series components, i.e.,

Yt = Trt + Snt + Clt + et.

• If the data do not contain one of the components, the value of that component is equal to zero.

• The seasonal (or cyclical) component of an additive model is independent of the trend, and thus the magnitude of the seasonal swing (movement) is constant over time.

16 Additive and Multiplicative Models

• For a multiplicative model, the data are the product of the various components, i.e.,

Yt = Trt × Snt × Clt × et.

• If the data do not contain one of the components, the value of that component is equal to 1.

• The seasonal (or cyclical) component of a multiplicative model is proportional to (a ratio of) the trend, and thus the magnitude of the seasonal swing increases or decreases according to the behavior of the trend.

• Although most data that possess seasonal (cyclical) variations cannot be precisely classified as additive or multiplicative in nature, we usually look at forecasts obtained from both models and choose the model that yields the smallest error.
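The difference in seasonal swing can be illustrated with two synthetic series built from the same trend. Everything in this sketch is made up for illustration: the trend, the seasonal terms and factors, and the helper that measures the swing as the range within each complete seasonal cycle.

```python
def seasonal_swings(y, p):
    """Range (max - min) within each complete cycle of p observations."""
    return [max(y[i:i + p]) - min(y[i:i + p])
            for i in range(0, len(y) - p + 1, p)]


trend = [100 + 10 * t for t in range(12)]         # made-up linear trend
additive_season = [5, -5, -10, 10]                # made-up seasonal terms, p = 4
multiplicative_season = [1.05, 0.95, 0.90, 1.10]  # made-up seasonal factors

additive = [tr + additive_season[t % 4] for t, tr in enumerate(trend)]
multiplicative = [tr * multiplicative_season[t % 4]
                  for t, tr in enumerate(trend)]

print(seasonal_swings(additive, 4))        # same swing in every cycle
print(seasonal_swings(multiplicative, 4))  # swing grows with the trend
```

The range within a cycle also contains the trend growth over that cycle, which is the same in every cycle here because the trend is linear.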

17 Winters’ Exponential Smoothing

• Winters’ exponential smoothing model is another extension of the simple exponential smoothing model; it is used for data that exhibit both trend and seasonality.

• It is a three-parameter exponential smoothing model, which has an additional equation to adjust for the seasonal component.

• Winters’ exponential smoothing is due to P. R. Winters (1960, “Forecasting sales by exponentially weighted moving averages”, Management Science 6, 324–342).

18 Winters’ Exponential Smoothing

• The additive Holt-Winters prediction function (for time series with period length $p$) is

$\hat{Y}_{t+h} = a_t + h\,b_t + s_{t-p+1+(h-1) \bmod p}$,

where the seasonal term is the most recent estimate available for the season being forecast, and $a_t$, $b_t$ and $s_t$ are given by

$a_t = \alpha(Y_t - s_{t-p}) + (1 - \alpha)(a_{t-1} + b_{t-1})$

$b_t = \beta(a_t - a_{t-1}) + (1 - \beta) b_{t-1}$

$s_t = \gamma(Y_t - a_t) + (1 - \gamma) s_{t-p}$

where

$a_t$ = smoothed value (level) for time period $t$
$\alpha$ = smoothing constant for the level ($0 < \alpha < 1$)
$Y_t$ = actual value in time period $t$
$b_t$ = trend estimate for time period $t$
$\beta$ = smoothing constant for the trend estimate ($0 < \beta < 1$)
$\gamma$ = smoothing constant for the seasonality estimate ($0 < \gamma < 1$)
$h$ = number of periods in the forecast lead period
$p$ = number of periods in the seasonal cycle
$s_t$ = seasonality estimate for time period $t$
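The additive recursions can be sketched as below. Two assumptions are worth flagging: the initial level, trend and $p$ seasonal terms are supplied by the caller, and the forecast uses the most recent available estimate for the season being forecast, which is how the seasonal index in the prediction function is read here. Function names and data are illustrative.

```python
def holt_winters_additive(y, alpha, beta, gamma, a0, b0, s0):
    """Run the additive recursions; s0 holds the p initial seasonal terms.

    Returns the final level, the final trend, and the p most recent
    seasonal estimates (oldest first).
    """
    level, trend = a0, b0
    seasonals = list(s0)  # seasonals[0] plays the role of s_{t-p}
    for obs in y:
        s_old = seasonals.pop(0)
        new_level = alpha * (obs - s_old) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
        seasonals.append(gamma * (obs - level) + (1 - gamma) * s_old)
    return level, trend, seasonals


def forecast_additive(level, trend, seasonals, h):
    """h-step-ahead forecast with the latest estimate for that season."""
    return level + h * trend + seasonals[(h - 1) % len(seasonals)]


# Made-up series: level 10 + t for t = 1..4, seasonal terms +1, -1 (p = 2).
series = [12.0, 11.0, 14.0, 13.0]
level, trend, seasonals = holt_winters_additive(
    series, 0.5, 0.5, 0.5, a0=10.0, b0=1.0, s0=[1.0, -1.0])
print(level, trend, seasonals)
print([forecast_additive(level, trend, seasonals, h) for h in (1, 2, 3)])
```

Because the made-up series is exactly additive and the initial values are consistent with it, the estimates stay fixed and the forecasts continue the pattern.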

19 Winters’ Exponential Smoothing

• The multiplicative Holt-Winters prediction function (for time series with period length $p$) is

$\hat{Y}_{t+h} = (a_t + h\,b_t)\, s_{t-p+1+(h-1) \bmod p}$,

where $a_t$, $b_t$ and $s_t$ are given by

$a_t = \alpha(Y_t / s_{t-p}) + (1 - \alpha)(a_{t-1} + b_{t-1})$

$b_t = \beta(a_t - a_{t-1}) + (1 - \beta) b_{t-1}$

$s_t = \gamma(Y_t / a_t) + (1 - \gamma) s_{t-p}$

• As with simple and Holt’s exponential smoothing, initial values must be selected to initialize or warm up the model.

• Over a long time period, the particular values selected have little effect on the forecast.
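The multiplicative version differs only in dividing by, rather than subtracting, the seasonal estimate. In this sketch the initial level, trend and seasonal factors are again arguments supplied by the caller (the notes do not prescribe how to pick them), and the forecast uses the latest available factor for the season being forecast. Names and data are illustrative.

```python
def holt_winters_multiplicative(y, alpha, beta, gamma, a0, b0, s0):
    """Run the multiplicative recursions; s0 holds p initial seasonal factors."""
    level, trend = a0, b0
    seasonals = list(s0)  # seasonals[0] plays the role of s_{t-p}
    for obs in y:
        s_old = seasonals.pop(0)
        new_level = alpha * (obs / s_old) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
        seasonals.append(gamma * (obs / level) + (1 - gamma) * s_old)
    return level, trend, seasonals


def forecast_multiplicative(level, trend, seasonals, h):
    """h-step-ahead forecast: extrapolated level times the seasonal factor."""
    return (level + h * trend) * seasonals[(h - 1) % len(seasonals)]


# Made-up series: level 10 + t for t = 1..4, seasonal factors 2 and 0.5 (p = 2).
series = [22.0, 6.0, 26.0, 7.0]
level, trend, seasonals = holt_winters_multiplicative(
    series, 0.5, 0.5, 0.5, a0=10.0, b0=1.0, s0=[2.0, 0.5])
print(level, trend, seasonals)
print([forecast_multiplicative(level, trend, seasonals, h) for h in (1, 2)])
```

Trying different initial values on a longer series is a simple way to see how quickly their influence fades.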

20 New Product Forecasting

• For new products, because they typically lack historical data, most forecasting techniques cannot produce satisfying results.

• It is typically impossible for Holt’s exponential smoothing to determine the trend since the dataset is too small.

• To overcome this difficulty, forecasters use a number of models that generally fall in the category called diffusion models.

• These models are alternatively called S-curves, growth models, saturation models, or substitution curves.

• These models are most commonly used to forecast the sales of new products and technology life cycles.

21 New Product Forecasting

• Life cycles usually follow a common pattern:

1. A period of slow growth just after introduction during an embryonic stage.

2. A period of rapid growth.

3. Slowing growth in a mature phase.

4. Decline.

• The forecaster’s task is to identify and estimate the parameters of such a pattern of growth.

• Each new-product model has its own lower and upper limit and expert opinion is needed to determine these limits.

• In most cases, the lower limit is 0, and the determination of the upper limit is a more complicated task.

22 Gompertz Curve

• The two most common forms of S-curves used in forecasting are the Gompertz curve and the logistics curve (also known as the Pearl curve).

• The Gompertz curve is named after its developer, Benjamin Gompertz, an English actuary.

• In 1825, Benjamin Gompertz proposed an exponential increase in death rates with age.

• Gompertz’s Law of Mortality states that the death rate increases with age in a geometric progression.

23 Gompertz Curve

• The Gompertz function is given as

$Y_t = L e^{-a e^{-bt}}$,

where $a$ and $b$ are positive parameters that describe the curve and $L$ is the upper limit of $Y$.

• The Gompertz curve will range in value from zero to $L$ as $t$ varies from $-\infty$ to $\infty$.

• The Gompertz curve is best used in situations where it becomes more difficult to achieve an increase in the growth rate as the maximum value is approached.

• Example: Color TV adoptions (c3t8)
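The shape of the Gompertz curve can be checked numerically. The sketch below assumes the form $Y_t = L e^{-a e^{-bt}}$ with $a, b > 0$, which is the sign convention under which the curve rises from zero to $L$. The parameter values are hypothetical and are not taken from the color TV data.

```python
import math


def gompertz(t, L, a, b):
    """Gompertz curve Y_t = L * exp(-a * exp(-b * t)), assuming a, b > 0."""
    return L * math.exp(-a * math.exp(-b * t))


L, a, b = 100.0, 5.0, 0.3  # hypothetical upper limit and shape parameters
for t in (0, 5, 10, 20, 40):
    print(t, round(gompertz(t, L, a, b), 2))

# The point of inflection is at t = ln(a) / b, where the curve equals L / e.
t_inflection = math.log(a) / b
print(round(t_inflection, 2), round(gompertz(t_inflection, L, a, b), 2))
```

The inflection occurs at roughly 37% of the upper limit, well below the halfway point, so most of the approach to $L$ happens while growth is already slowing.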

24 Logistics Curve

• The logistics curve is another way of forecasting with sparse data and is also used frequently to forecast new-product sales.

• The logistics curve has the following form:

$Y_t = \frac{L}{1 + a e^{-bt}}$,

where $a$ and $b$ are the parameters that describe the curve and $L$ is the upper limit of $Y$.

• The logistics curve is symmetric about its point of inflection, i.e., the upper half of the curve is a reflection of the lower half.

• Note that the Gompertz curve is not necessarily symmetric about its point of inflection.
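The symmetry of the logistics curve can be verified numerically: values at equal distances on either side of the inflection point sum to $L$. The sketch assumes the form $Y_t = L / (1 + a e^{-bt})$ with $a, b > 0$, and the parameter values are hypothetical.

```python
import math


def logistic(t, L, a, b):
    """Logistics curve Y_t = L / (1 + a * exp(-b * t))."""
    return L / (1 + a * math.exp(-b * t))


L, a, b = 100.0, 20.0, 0.4    # hypothetical parameters
t_mid = math.log(a) / b       # inflection point, where Y = L / 2
print(round(logistic(t_mid, L, a, b), 6))

for d in (1.0, 3.0, 6.0):
    above = logistic(t_mid + d, L, a, b)
    below = logistic(t_mid - d, L, a, b)
    print(d, round(above + below, 6))  # sums to L for every distance d
```

Unlike the Gompertz curve, the inflection here sits exactly halfway to the upper limit.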

25 Logistics Curve vs. Gompertz Curve

• The choice between the logistics curve and the Gompertz curve lies in whether it is easier to achieve the maximum value the closer you get to it, or whether it becomes more difficult to attain the maximum value the closer you get to it.

• If there is an offsetting factor such that growth is more difficult to maintain as the maximum is approached, then the Gompertz curve will be the best choice.

• If there are no such offsetting factors hindering the attainment of the maximum value, the logistics curve will be the best choice.

• Example: Cellular phone adoption (c3t9)

26 Bass Model

• Professor Frank M. Bass published a paper describing his mathematical model, which quantified the theory of adoption and diffusion of a new product by society, in Management Science nearly fifty years ago (Bass, 1969).

• The mathematics is straightforward, and the model has been influential in marketing, and on a variety of biological, medical and scientific forecasts.

• An entrepreneur with a new invention will often use the Bass model when making a case for funding.

27 Bass Model

• The Bass formula for the number of people, $N_t$, who have bought a product at time $t$ depends on three parameters:

– p: the coefficient of innovation (the probability of initial purchase of a new product independent of the influence of previous buyers)

– q: the coefficient of imitation (the pressure on potential buyers to imitate previous purchasers)

– m: the total number of people who eventually buy the product

• The Bass formula (discrete-time version) is

Nt+1 = Nt + p(m − Nt) + qNt(m − Nt)/m

• The rationale for the model is that initial sales will be to people who are interested in the novelty of the product, whereas later sales will be to people who are drawn to the product after seeing their friends and acquaintances use it.
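The discrete-time formula can be iterated directly. In this sketch the parameter values are hypothetical, the starting value of zero buyers is an assumption, and the function name is made up for illustration.

```python
def bass_discrete(p, q, m, periods, n0=0.0):
    """Iterate N_{t+1} = N_t + p*(m - N_t) + q*N_t*(m - N_t)/m."""
    path = [n0]
    for _ in range(periods):
        n = path[-1]
        innovators = p * (m - n)         # buy regardless of previous buyers
        imitators = q * n * (m - n) / m  # buy under the influence of adopters
        path.append(n + innovators + imitators)
    return path


path = bass_discrete(p=0.03, q=0.38, m=1000.0, periods=20)  # hypothetical values
new_buyers = [b - a for a, b in zip(path, path[1:])]
print([round(n) for n in path])
print([round(s, 1) for s in new_buyers])
```

Splitting each step into the two terms makes the rationale visible: the innovation term dominates early, and the imitation term takes over as the number of previous buyers grows.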

28 Bass Model

• The above formula is a difference equation and its solution is

$N_t = m\,\frac{1 - e^{-(p+q)t}}{1 + (q/p)\,e^{-(p+q)t}}$.

(We will verify this result for the continuous-time version of the model.)

• One interpretation of the Bass model is that the time from product launch until purchase is assumed to have a probability distribution that can be parameterized in terms of p and q.

• The interpretation of the hazard is that if it is multiplied by a small time increment, it gives the probability that a random purchaser who has not yet made the purchase will do so in the next small time increment.

29 Bass Model

• The continuous-time model of the Bass formula can be expressed in terms of the hazard function and the CDF:

$h(t) = p + qF(t)$, where $h(t) = f(t) / (1 - F(t))$ and $f(t) = F'(t)$.

• This differential equation, with $F(0) = 0$, gives the solution for $F(t)$ as

$F(t) = \frac{1 - e^{-(p+q)t}}{1 + (q/p)\,e^{-(p+q)t}}$.

• Therefore, the pdf is

$f(t) = \frac{(p+q)^2 e^{-(p+q)t}}{p\left[1 + (q/p)\,e^{-(p+q)t}\right]^2}$.

• Cumulative sales are given by m × F (t).

• The time to peak is $t = \frac{\ln(q) - \ln(p)}{p + q}$.

• Example: Adoption of Telephone-answering devices in the United States (c3t12)
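The continuous-time formulas can be checked against each other numerically. The sketch below evaluates the CDF, the pdf and the time to peak for hypothetical values of p and q (not estimated from the telephone-answering data) and confirms that the hazard $f(t)/(1 - F(t))$ equals $p + qF(t)$. Function names are made up for illustration.

```python
import math


def bass_cdf(t, p, q):
    """F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t})."""
    e = math.exp(-(p + q) * t)
    return (1 - e) / (1 + (q / p) * e)


def bass_pdf(t, p, q):
    """f(t) = (p+q)^2 e^{-(p+q)t} / (p [1 + (q/p) e^{-(p+q)t}]^2)."""
    e = math.exp(-(p + q) * t)
    return (p + q) ** 2 * e / (p * (1 + (q / p) * e) ** 2)


def bass_time_to_peak(p, q):
    """Time at which the pdf (the sales rate) is largest; needs q > p."""
    return (math.log(q) - math.log(p)) / (p + q)


p, q, m = 0.03, 0.38, 1000.0  # hypothetical parameter values
t_peak = bass_time_to_peak(p, q)
print("time to peak:", round(t_peak, 2))
print("cumulative sales at peak:", round(m * bass_cdf(t_peak, p, q), 1))

t = 4.0
hazard = bass_pdf(t, p, q) / (1 - bass_cdf(t, p, q))
print(round(hazard, 6), round(p + q * bass_cdf(t, p, q), 6))  # the two agree
```

Multiplying `bass_cdf` by `m` gives cumulative sales, and multiplying `bass_pdf` by `m` gives the sales rate at time `t`.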
