
An Online Vector Error Correction Model for Exchange Rates Forecasting

Paola Arce1, Jonathan Antognini1, Werner Kristjanpoller2 and Luis Salinas1,3
1Departamento de Informática, Universidad Técnica Federico Santa María, Valparaíso, Chile
2Departamento de Industrias, Universidad Técnica Federico Santa María, Valparaíso, Chile
3CCTVal, Universidad Técnica Federico Santa María, Valparaíso, Chile

Keywords: Vector Error Correction Model, Online Learning, Financial Forecasting, Foreign Exchange Market.

Abstract: Financial time series are known for their non-stationary behaviour. However, they sometimes exhibit stationary linear combinations; when this happens, those time series are said to be cointegrated. The Vector Error Correction Model (VECM) is an econometric model which characterizes the joint dynamic behaviour of a set of cointegrated variables in terms of forces pulling towards equilibrium. In this study, we propose an Online VEC model (OVECM) which optimizes how the model parameters are obtained, using a sliding window of the most recent data. Our proposal also takes advantage of the long-run relationship between the time series in order to obtain improved execution times. Our proposed method is tested using four foreign exchange rates at 1-minute frequency, all quoted against the USD base currency. OVECM is compared with VECM and ARIMA models in terms of forecasting accuracy and execution time. We show that OVECM outperforms ARIMA forecasting and reduces execution time considerably while maintaining accuracy levels close to those of VECM.

1 INTRODUCTION

In finance, it is common to find variables with long-run equilibrium relationships. This is called cointegration, and it reflects the idea that some sets of variables cannot wander too far from each other. Cointegration means that one or more linear combinations of these variables are stationary even though individually they are not (Engle and Granger, 1987). Furthermore, the number of cointegration vectors reflects how many of these linear combinations exist. Some models, such as the Vector Error Correction Model (VECM), take advantage of this property and describe the joint behaviour of several cointegrated variables.

VECM introduces this long-run relationship among a set of cointegrated variables as an error correction term. VECM is a special case of the vector autoregressive (VAR) model. The VAR model expresses future values as a linear combination of the variables' past values. However, the VAR model cannot be used with non-stationary variables. VECM is also a linear model, but in terms of variable differences. If cointegration exists, the variable differences are stationary, and the error correction term adjusts the coefficients to bring the variables back to equilibrium. In finance, many economic time series are revealed to be stationary when differenced, and cointegration restrictions often improve forecasting (Duy and Thoma, 1998). Therefore, VECM has been widely adopted.

In finance, pair trading is a very common application of cointegration (Herlemont, 2003), but cointegration can also be extended to a larger set of variables (Mukherjee and Naka, 1995; Engle and Patton, 2004).

Both VECM and VAR model parameters are obtained using the ordinary least squares (OLS) method. Since OLS involves many calculations, parameter estimation becomes computationally expensive as the number of past values and observations increases. Moreover, obtaining the cointegration vectors is also an expensive routine.

Recently, online learning algorithms have been proposed to solve problems with large data sets because of their simplicity and their ability to update the model when new data is available. The study presented by (Arce and Salinas, 2012) applied this idea using ridge regression. There are several other popular online methods, such as the perceptron (Rosenblatt, 1958), passive-aggressive algorithms (Crammer et al., 2006), stochastic gradient descent (Zhang, 2004), the aggregating algorithm (Vovk, 2001) and the second-order perceptron (Cesa-Bianchi et al., 2005). In (Cesa-Bianchi and Lugosi, 2006), an in-depth analysis of online learning is provided.

In this paper, we propose an online formulation of the VECM called Online VECM (OVECM). OVECM is a lighter version of VECM which considers only a sliding window of the most recent data and introduces matrix optimizations in order to reduce the number of operations and therefore the execution time. OVECM also exploits the fact that the cointegration vector space does not change much under small changes in the input data.

OVECM is later compared against VECM and ARIMA models using four currency rates from the foreign exchange market at 1-minute frequency. VECM and ARIMA were used in an iterative way in order to allow a fair comparison. Execution times and the forecast performance measures MAPE, MAE and RMSE were used to compare all methods. Model effectiveness focuses on out-of-sample forecasting rather than in-sample fitting. This criterion lets the OVECM prediction capability be assessed, rather than just explaining the data history.

The next sections are organized as follows: section 2 presents the VAR and VECM models; the proposed OVECM algorithm is presented in section 3; section 4 describes the data used and the tests carried out to compare the accuracy and execution time of our proposal against the traditional VECM; and section 5 presents conclusions and proposals for future study.

[Arce, P., Antognini, J., Kristjanpoller, W. and Salinas, L. An Online Vector Error Correction Model for Exchange Rates Forecasting. DOI: 10.5220/0005205901930200. In Proceedings of the International Conference on Pattern Recognition Applications and Methods (ICPRAM 2015), pages 193-200. ISBN: 978-989-758-077-2. Copyright © 2015 SCITEPRESS (Science and Technology Publications, Lda.)]

2 BACKGROUND

2.1 Integration and Cointegration

A time series y is said to be integrated of order d if, after differencing the variable d times, we get an I(0) process; more precisely:

    (1 - L)^d y ~ I(0),

where I(0) denotes a stationary time series and L is the lag operator:

    (1 - L)y_t = Δy_t = y_t - y_{t-1}  ∀t.

Let y_t = {y^1, ..., y^l} be a set of l I(1) time series. They are said to be cointegrated if a vector β = [β(1), ..., β(l)]^T ∈ R^l exists such that the time series

    Z_t := β^T y_t = β(1) y^1 + ··· + β(l) y^l ~ I(0).   (1)

In other words, a set of I(1) variables is said to be cointegrated if a linear combination of them exists which is I(0).

2.2 Vector Autoregressive Models

VECM is a special case of the VAR model, and both describe the joint behaviour of a set of variables. The VAR(p) model is a general framework describing the behaviour of a set of l endogenous variables as a linear combination of their last p values. These l variables at time t are represented by the vector y_t as follows:

    y_t = [ y_{1,t}  y_{2,t}  ...  y_{l,t} ]^T,

where y_{j,t} corresponds to time series j evaluated at time t.

The VAR(p) model describes the behaviour of a dependent variable in terms of its own lagged values and the lags of the other variables in the system. The model with p lags is formulated as follows:

    y_t = φ_1 y_{t-1} + ··· + φ_p y_{t-p} + c + ε_t,   (2)

where t = p+1, ..., N, the φ_1, ..., φ_p are l × l matrices of real coefficients, ε_{p+1}, ..., ε_N are error terms, c is a constant vector and N is the total number of samples.

The VAR matrix form of equation (2) is:

    B = AX + E,   (3)

where:

    A = [ y_p      ···  y_{N-1}
          y_{p-1}  ···  y_{N-2}
            ⋮       ⋱     ⋮
          y_1      ···  y_{N-p}
          1        ···  1       ]^T,

    B = [ y_{p+1}  ...  y_N ]^T,

    X = [ φ_1  ···  φ_p  c ]^T,

    E = [ ε_{p+1}  ...  ε_N ]^T.

Equation (3) can be solved using ordinary least squares estimation.
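As a concrete illustration of equation (3), the matrices A and B can be assembled from the data and the parameters estimated by least squares. The following is a minimal NumPy sketch of our own (the function names and synthetic data are ours, not the authors' implementation):

```python
import numpy as np

def var_matrices(y, p):
    """Build A and B of equation (3) for a VAR(p).

    y : (N, l) array, one row per time step.
    Each row of A stacks [y_{t-1}, ..., y_{t-p}, 1] for t = p+1..N.
    """
    N, l = y.shape
    rows = []
    for t in range(p, N):          # t is the 0-based index of y_t
        lagged = [y[t - i] for i in range(1, p + 1)]
        rows.append(np.concatenate(lagged + [np.ones(1)]))
    A = np.array(rows)             # shape (N - p, l*p + 1)
    B = y[p:]                      # shape (N - p, l)
    return A, B

def ols(A, B):
    """Least-squares solution X of B = A X (equation (3))."""
    X, *_ = np.linalg.lstsq(A, B, rcond=None)
    return X

# Synthetic example: 2 random-walk series, 200 observations.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=(200, 2)), axis=0)
A, B = var_matrices(y, p=2)
X = ols(A, B)
one_step = A[-1] @ X               # fitted value for the last row
```

Each row of A corresponds to one equation of system (2); solving all of them at once is what makes the OLS step the dominant per-iteration cost discussed later.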

2.3 VECM

VECM is a special form of the VAR model for I(1) variables that are also cointegrated (Banerjee, 1993). It is obtained by substituting Δy_t = y_t - y_{t-1} into equation (2). VECM is expressed in terms of differences, has an error correction term, and takes the following form:

    Δy_t = Ω y_{t-1} + Σ_{i=1}^{p-1} φ*_i Δy_{t-i} + c + ε_t,   (4)

where Ω y_{t-1} is the error correction term and the coefficient matrices Ω and φ*_i are functions of the matrices φ_i of equation (2), as follows:

    φ*_i := - Σ_{j=i+1}^{p} φ_j,
    Ω := -(I - φ_1 - ··· - φ_p).

The matrix Ω has the following properties (Johansen, 1995):

• If Ω = 0, there is no cointegration.
• If rank(Ω) = l, i.e. full rank, then the time series are not I(1) but stationary.
• If rank(Ω) = r with 0 < r < l, then there is cointegration and the matrix Ω can be expressed as Ω = αβ^T, where α and β are (l × r) matrices and rank(α) = rank(β) = r. The columns of β contain the cointegration vectors and the rows of α correspond to the adjustment vectors. β is obtained by the Johansen procedure (Johansen, 1988), whereas α has to be determined as a variable in the VECM.

It is worth noticing that the factorization of the matrix Ω is not unique, since for any r × r nonsingular matrix H we have:

    αβ^T = αHH^{-1}β^T = (αH)(β(H^{-1})^T)^T = α*(β*)^T,

with α* = αH and β* = β(H^{-1})^T.

If cointegration exists, then equation (4) can be written as follows:

    Δy_t = αβ^T y_{t-1} + Σ_{i=1}^{p-1} φ*_i Δy_{t-i} + c + ε_t,

which is a VAR model, but for the time series differences. VECM has the same form shown in equation (3) but with different matrices:

    A = [ β^T y_p  ···  β^T y_{N-1}
          Δy_p     ···  Δy_{N-1}
            ⋮       ⋱     ⋮
          Δy_2     ···  Δy_{N-p+1}
          1        ···  1           ]^T,   (5)

    B = [ Δy_{p+1}  ...  Δy_N ]^T,   (6)

    X = [ α  φ*_1  ···  φ*_{p-1}  c ]^T,   (7)

    E = [ ε_{p+1}  ...  ε_N ]^T.   (8)

The VAR and VECM parameters of equation (3) can be estimated using standard regression techniques, such as ordinary least squares (OLS).

2.4 Ordinary Least Squares Method

The solution to equation (3) is given by the ordinary least squares (OLS) method. OLS consists of minimizing the sum of squared errors, or equivalently minimizing the following expression:

    min_X ||AX - B||²₂,

for which the solution X̂ is well known:

    X̂ = A⁺ B,

where A⁺ is the Moore-Penrose pseudo-inverse, which can be written as follows:

    A⁺ = (A^T A)^{-1} A^T.   (9)

However, when A is not of full rank, i.e. rank(A) = k < n ≤ m, the matrix A^T A is singular and equation (9) cannot be used. More generally, the pseudo-inverse is best computed using the compact singular value decomposition (SVD) of A:

    A = U₁ Σ₁ V₁^T,

where A is m × n, U₁ is m × k, Σ₁ is k × k and V₁^T is k × n, as follows:

    A⁺ = V₁ Σ₁^{-1} U₁^T.
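The compact-SVD route to the pseudo-inverse can be checked numerically. The following is our own sketch (`pinv_svd` is a hypothetical helper name), built on a deliberately rank-deficient matrix for which the normal-equations formula (9) would fail:

```python
import numpy as np

def pinv_svd(A):
    """Moore-Penrose pseudo-inverse via the compact SVD:
    A = U1 Sigma1 V1^T with k = rank(A), so A+ = V1 Sigma1^{-1} U1^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # numerical rank: singular values above a scale-aware tolerance
    tol = max(A.shape) * np.finfo(s.dtype).eps * s[0]
    k = int(np.sum(s > tol))
    return Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T

# Rank-deficient example: the third column is the sum of the first two,
# so A^T A is singular and (A^T A)^{-1} A^T of equation (9) breaks down.
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 2))
A = np.hstack([A, (A[:, 0] + A[:, 1])[:, None]])   # rank 2, shape (6, 3)
P = pinv_svd(A)
```

Truncating at the numerical rank k is what makes this route robust for the ill-conditioned regression matrices that arise with highly correlated exchange rates.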

3 METHODOLOGY

3.1 Online VECM

Since VECM parameter estimation is computationally expensive, we propose an online version of VECM (OVECM). OVECM considers only a sliding window of the most recent data. Moreover, since cointegration vectors represent long-run relationships which vary little in time, OVECM first determines whether they need to be recalculated. OVECM also implements matrix optimizations in order to reduce execution time, such as updating the matrices with new data, removing old data and introducing new cointegration vectors.

Algorithm 1: OVECM: Online VECM.
Input:
    y: matrix with N input vectors and l time series
    p: number of past values
    L: sliding window size (L < N)
    mean_error: MAPE threshold
    n: interpolation points to obtain MAPE
Output:
    {y_pred[L+1], ..., y_pred[N]}: model predictions
 1: for i = 0 to N - L do
 2:   y_i ← y[i : i + L]
 3:   if i = 0 then
 4:     v ← getJohansen(y_i, p)
 5:     [A B] ← vecMatrix(y_i, p, v)
 6:   else
 7:     [A B] ← vecMatrixOnline(y_i, p, v, A, B)
 8:     ΔY_pred[i] ← A[-1, :] × X
 9:   end if
10:   X ← OLS(A, B)
11:   e ← mape(B[-n, :], A[-n, :] × X)
12:   if mean(e) > mean_error then
13:     v ← getJohansen(y_i, p)
14:     A ← vecMatrixUpdate(y_i, p, v, A)
15:     X ← OLS(A, B)
16:   end if
17: end for
18: Y_true ← Y[L + 1 : N]
19: Y_pred ← Y[L : N - 1] + ΔY_pred

Algorithm 1 shows our OVECM proposal, which considers the following:

• The function getJohansen returns the cointegration vectors given by the Johansen method, considering the trace statistic test at the 95% significance level.
• The function vecMatrix returns the matrices (5) and (6) that allow VECM to be solved.
• The function vecMatrixOnline returns the matrices (5) and (6) by aggregating new data and removing the old data, avoiding recalculation of the matrix A from scratch.
• Out-of-sample outputs are saved in the variables Y_true and Y_pred.
• The model is solved using OLS.
• In-sample outputs are saved in the variables Δy_true and Δy_pred.
• The function mape gets the in-sample MAPE for the l time series.
• Cointegration vectors are obtained at the beginning and whenever they need to be updated. This updating decision is based on the in-sample MAPE of the last n inputs. The average of all MAPEs is stored in the variable e. If the average of the MAPEs (mean(e)) is above a certain error given by the mean_error threshold, the cointegration vectors are updated.
• If new cointegration vectors are required, the function vecMatrixUpdate only updates the columns of matrix A affected by those vectors (see equation (5)).

Our proposal was compared against VECM and ARIMA. Both algorithms were adapted to an online context in order to allow a reasonable comparison with our proposal (see algorithms 2 and 3). VECM and ARIMA are called with a sliding window of the most recent data, whereby the models are updated at every time step.

Algorithm 2: SLVECM: Sliding window VECM.
Input:
    y: matrix with N input vectors and l time series
    p: number of past values
    L: sliding window size (L < N)
Output:
    {y_pred[L+1], ..., y_pred[N]}: model predictions
 1: for i = 0 to N - L do
 2:   y_i ← y[i : i + L + 1]
 3:   model = VECM(y_i, p)
 4:   Y_pred[i] = model.predict(y[i + L])
 5: end for
 6: Y_true = y[i + L + 1 : N]

Since we know our time series are I(1), SLARIMA is called with d = 1. ARIMA is executed separately for every time series.

The time complexity of both OVECM and SLVECM is dominated by the Johansen method, which is O(n³). Thus, the order of both algorithms is O(Cn³), where C is the number of iterations.
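The saving behind vecMatrixOnline can be illustrated with a simplified VAR-style sketch of our own (the paper's actual routine additionally carries the β^T y_{t-1} column of equation (5)): instead of rebuilding A and B from scratch for every window, only one row is appended and the oldest one dropped:

```python
import numpy as np

def build(y, p):
    """Matrices A, B of equation (3), built from scratch over a window."""
    N = len(y)
    A = np.array([np.concatenate([y[t - i] for i in range(1, p + 1)]
                                 + [np.ones(1)]) for t in range(p, N)])
    return A, y[p:]

def slide(A, B, y_win, p):
    """One online step: drop the oldest row, append the newest one.

    y_win holds the previous window plus the newly arrived observation,
    so only one new row of A and B has to be computed."""
    new_row = np.concatenate([y_win[-1 - i] for i in range(1, p + 1)]
                             + [np.ones(1)])
    return np.vstack([A[1:], new_row]), np.vstack([B[1:], y_win[-1:]])

# Check the incremental update against a from-scratch rebuild.
rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(size=(120, 3)), axis=0)
L, p = 100, 2
A, B = build(y[:L], p)                 # initial window
A, B = slide(A, B, y[:L + 1], p)       # window moves one step forward
A_ref, B_ref = build(y[1:L + 1], p)    # same window built from scratch
```

The incremental step costs O(lp) per iteration instead of O(Llp) for a full rebuild, which is where OVECM's per-step saving on matrix assembly comes from.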

Algorithm 3: SLARIMA: Sliding window ARIMA.
Input:
    y: matrix with N input vectors and l time series
    p: autoregressive order
    d: integrated order
    q: moving average order
    L: sliding window size (L < N)
Output:
    {y_pred[L+1], ..., y_pred[N]}: model predictions
 1: for i = 0 to N - L do
 2:   for j = 0 to l - 1 do
 3:     y_i ← y[i : i + L + 1, j]
 4:     model = ARIMA(y_i, (p, d, q))
 5:     Y_pred[i, j] = model.predict(y[i + L, j])
 6:   end for
 7: end for
 8: Y_true = y[i + L + 1 : N, :]

3.2 Evaluation Methods

Forecast performance is evaluated using three commonly used measures:

MAPE. Mean Absolute Percentage Error, which presents forecast errors as a percentage:

    MAPE = (1/N) Σ_{t=1}^{N} |y_t - ŷ_t| / |y_t| × 100.   (10)

MAE. Mean Absolute Error, which measures the distance between the forecasts and the true values:

    MAE = (1/N) Σ_{t=1}^{N} |y_t - ŷ_t|.   (11)

RMSE. Root Mean Square Error, which also measures the distance between the forecasts and the true values but, unlike MAE, penalizes large deviations from the true value heavily due to the squaring of the forecast error:

    RMSE = sqrt( (1/N) Σ_{t=1}^{N} (y_t - ŷ_t)² ).   (12)

3.3 Model Selection

The Akaike Information Criterion (AIC) is often used in model selection, where smaller AIC values are preferred since they represent a trade-off between bias and variance. AIC is obtained as follows:

    AIC = -2l/N + 2k/N,   (13)

where the first term accounts for bias and the second for variance, and

    l is the log-likelihood function,
    k is the number of estimated parameters,
    N is the number of observations.

The log-likelihood function is obtained from the Residual Sum of Squares (RSS):

    l = -(N/2) (1 + ln(2π) + ln(RSS/N)).   (14)

4 RESULTS

4.1 Data

Tests of SLVECM, SLARIMA and our proposal OVECM were carried out using four foreign exchange rates: EURUSD, GBPUSD, USDCHF and USDJPY. The data was collected from the free Dukascopy database, which gives access to the Swiss Foreign Exchange Marketplace (Dukascopy, 2014). The reciprocals of the last two rates (CHFUSD, JPYUSD) were used in order to obtain the same base currency for all rates. The tests were done using 1-minute frequency ask prices, which corresponds to 1,440 data points per day, from the 11th to the 15th of August 2014.

4.2 Tests

Before running the tests, we first checked that the series were I(1) using the Augmented Dickey-Fuller (ADF) test.

Table 1: Unit root tests.

    Series     Statistic   Critical value   Result
    EURUSD     -0.64       -1.94            True
    ΔEURUSD    -70.45      -1.94            False
    GBPUSD     -0.63       -1.94            True
    ΔGBPUSD    -54.53      -1.94            False
    CHFUSD     -0.88       -1.94            True
    ΔCHFUSD    -98.98      -1.94            False
    JPYUSD     -0.65       -1.94            True
    ΔJPYUSD    -85.78      -1.94            False

Table 1 shows that none of the currency rates can reject the unit root hypothesis, but all of them reject it for their first differences. This means that all of them are I(1) time series, so we are allowed to use VECM and therefore OVECM.
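The three error measures of equations (10)-(12) translate directly into code. A brief sketch of our own (function names and the toy data are ours):

```python
import numpy as np

def mape(y, y_hat):
    """Mean Absolute Percentage Error, equation (10)."""
    return np.mean(np.abs(y - y_hat) / np.abs(y)) * 100

def mae(y, y_hat):
    """Mean Absolute Error, equation (11)."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root Mean Square Error, equation (12); squaring penalizes
    large deviations more heavily than MAE does."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

# Toy example: errors of 10, 10 and 0 on values of different scales.
y = np.array([100.0, 200.0, 400.0])
y_hat = np.array([110.0, 190.0, 400.0])
```

Note that MAPE is scale-free, which is why the paper uses its in-sample average across the l series as the trigger for recomputing cointegration vectors.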

4.3 Parameter Selection

In order to set the OVECM parameters, window size L and lag order p, we propose to use several window sizes: L = 100, 400, 700, 1000. For every window size L we chose the lag order with the minimum AIC. The ARIMA parameters were also obtained using AIC. The parameter optimization is presented in table 2.

Table 2: Parameter optimization. The VECM order and the ARIMA parameters were selected using AIC.

    Window size L   VECM order (p)   ARIMA order (p, d, q)
    100             2                p=2, d=1, q=1
    400             5                p=1, d=1, q=1
    700             3                p=2, d=1, q=1
    1000            3                p=2, d=1, q=1

In OVECM we also define a mean_error variable, which is set based on the in-sample MAPEs. Figure 1 shows how the MAPE moves and how the mean_error variable helps us decide whether new cointegration vectors are needed.

[Figure 1: In-sample MAPE example over 50 minutes, one panel per rate (EURUSD, GBPUSD, USDCHF, USDJPY). The average of these MAPEs is used to decide whether to obtain new cointegration vectors.]

4.4 Execution Times

We ran OVECM and SLVECM for 400 iterations. The SLARIMA execution time is excluded because it is not comparable with OVECM and SLVECM; SLARIMA was built on the statsmodels library routine ARIMA. The execution times are shown in table 3.

Table 3: Execution times.

    Method   L      order   e        Time [s]
    OVECM    100    p=2     0        2.492
    OVECM    100    p=2     0.0026   1.606
    SLVECM   100    p=2     -        2.100
    OVECM    400    p=5     0        3.513
    OVECM    400    p=5     0.0041   2.569
    SLVECM   400    p=5     -        3.222
    OVECM    700    p=3     0        3.296
    OVECM    700    p=3     0.0032   2.856
    SLVECM   700    p=3     -        3.581
    OVECM    1000   p=3     0        4.387
    OVECM    1000   p=3     0.0022   2.408
    SLVECM   1000   p=3     -        3.609

It is clear that the execution time depends directly on L and p, since they determine the size of the matrix A and therefore affect the execution time of the OLS function. It is worth noting that the execution time also depends on mean_error, because it determines how many times OVECM recalculates the cointegration vectors, which is an expensive routine.

Figure 1 shows an example of the in-sample MAPE for 50 iterations. When the average of the in-sample MAPEs is above mean_error, new cointegration vectors are obtained. In consequence, OVECM performance increases when mean_error increases. This could affect accuracy, but table 4 shows that using an appropriate mean_error does not affect accuracy considerably.

4.5 Performance Accuracy

Table 4 shows the in-sample and out-of-sample performance measures MAPE, MAE and RMSE for OVECM, SLVECM and SLARIMA. The tests were done using the parameters defined in table 2. We can see that OVECM performs very similarly to SLVECM, which supports the theory that cointegration vectors vary little in time. Moreover, OVECM also outperformed SLARIMA on all three performance measures.

We can also notice that the in-sample performance of OVECM and SLVECM is related to their out-of-sample performance.
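The AIC-driven lag selection behind table 2 can be sketched by scanning p and keeping the minimum of the normalized AIC of equations (13) and (14) for an OLS-fitted VAR. This is our own illustration, not the authors' code (`select_lag` and the synthetic data are ours):

```python
import numpy as np

def aic(rss, N, k):
    """Normalized AIC of equation (13), with the log-likelihood
    of equation (14) computed from the residual sum of squares."""
    ll = -(N / 2.0) * (1.0 + np.log(2.0 * np.pi) + np.log(rss / N))
    return -2.0 * ll / N + 2.0 * k / N

def select_lag(y, p_max):
    """Fit VAR(p) by OLS for p = 1..p_max; return the p with minimum AIC."""
    best = None
    for p in range(1, p_max + 1):
        N, l = y.shape
        # regression matrices of equation (3) for this candidate p
        A = np.array([np.concatenate([y[t - i] for i in range(1, p + 1)]
                                     + [np.ones(1)]) for t in range(p, N)])
        B = y[p:]
        X, *_ = np.linalg.lstsq(A, B, rcond=None)
        rss = float(np.sum((B - A @ X) ** 2))
        k = X.size                     # number of estimated parameters
        score = aic(rss, len(B), k)
        if best is None or score < best[1]:
            best = (p, score)
    return best[0]

rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(size=(150, 2)), axis=0)
p_best = select_lag(y, p_max=4)
```

The extra-parameter penalty 2k/N is what keeps the scan from always choosing the largest p, mirroring the bias-variance trade-off described in section 3.3.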

Table 4: Model measures.

    Method    L      Parameters      e        In-sample                     Out-of-sample
                                              MAPE      MAE      RMSE       MAPE      MAE      RMSE
    OVECM     100    p=2             0.0026   0.00263   0.00085  0.00114    0.00309   0.00094  0.00131
    OVECM     400    p=5             0.0041   0.00378   0.00095  0.00127    0.00419   0.00103  0.00143
    OVECM     700    p=3             0.0032   0.00323   0.00099  0.00130    0.00322   0.00097  0.00132
    OVECM     1000   p=3             0.0022   0.00175   0.00062  0.00087    0.00172   0.00061  0.00090
    SLVECM    100    p=2             -        0.00262   0.00085  0.00113    0.00310   0.00095  0.00132
    SLVECM    400    p=5             -        0.00375   0.00095  0.00126    0.00419   0.00103  0.00143
    SLVECM    700    p=3             -        0.00324   0.00099  0.00130    0.00322   0.00098  0.00132
    SLVECM    1000   p=3             -        0.00174   0.00061  0.00087    0.00172   0.00061  0.00090
    SLARIMA   100    p=2, d=1, q=1   -        0.00285   0.00110  0.00308    0.00312   0.00098  0.00144
    SLARIMA   400    p=1, d=1, q=1   -        0.00377   0.00101  0.00128    0.00418   0.00106  0.00145
    SLARIMA   700    p=2, d=1, q=1   -        0.00329   0.00102  0.00136    0.00324   0.00097  0.00133
    SLARIMA   1000   p=2, d=1, q=1   -        0.00281   0.00074  0.00105    0.00177   0.00063  0.00092

This differs from SLARIMA, where models with good in-sample performance are not necessarily good out-of-sample models. Moreover, OVECM outperformed SLARIMA using the same window size.

Figure 2 shows the out-of-sample forecasts made by our proposal OVECM with the best parameters found based on table 4; the forecasts follow the time series very well.

[Figure 2: OVECM forecasting accuracy example for 50 minutes using L = 1000 and p = 3, showing true and predicted values for EURUSD, GBPUSD, USDCHF and USDJPY.]

5 CONCLUSIONS

A new online vector error correction method was presented. We have shown that our proposed OVECM considerably reduces execution times without compromising solution accuracy. OVECM was compared with VECM and ARIMA using the same sliding window sizes, and OVECM outperformed both in terms of execution time. Traditional VECM slightly outperformed our proposal in accuracy, but the OVECM execution time is lower. This reduction of execution time is mainly because OVECM avoids recalculating the cointegration vectors with the Johansen method at every step. The condition for obtaining new vectors is given by the mean_error variable, which controls how many times the Johansen method is called. Additionally, OVECM introduces matrix optimizations in order to obtain the new model in an iterative way. We could see that our algorithm took much less than a minute at every step. This means that it could also be used with higher-frequency data and would still provide responses before new data arrives.

For future study, it would be interesting to improve the out-of-sample forecast by considering more explanatory variables, increasing the window sizes, or trying new conditions for obtaining new cointegration vectors.

Since OVECM is an online algorithm which optimizes processing time, it could be used by investors as an input for strategy robots. Moreover, some technical analysis methods could be based on its output.

REFERENCES

Arce, P. and Salinas, L. (2012). Online ridge regression method using sliding windows. 2011 30th International Conference of the Chilean Computer Science Society, 0:87-90.

Banerjee, A. (1993). Co-integration, Error Correction, and the Econometric Analysis of Non-stationary Data. Advanced Texts in Econometrics. Oxford University Press.

Cesa-Bianchi, N., Conconi, A., and Gentile, C. (2005). A second-order perceptron algorithm. SIAM J. Comput., 34(3):640-668.

Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA.

Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., and Singer, Y. (2006). Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551-585.

Dukascopy (2014). Dukascopy historical data feed.

Duy, T. A. and Thoma, M. A. (1998). Modeling and forecasting cointegrated variables: some practical experience. Journal of Economics and Business, 50(3):291-307.

Engle, R. F. and Granger, C. W. J. (1987). Co-integration and error correction: Representation, estimation, and testing. Econometrica, 55(2):251-276.

Engle, R. F. and Patton, A. J. (2004). Impacts of trades in an error-correction model of quote prices. Journal of Financial Markets, 7(1):1-25.

Herlemont, D. (2003). Pairs trading, convergence trading, cointegration. YATS Finances & Technologies, 5.

Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control, 12(2-3):231-254.

Johansen, S. (1995). Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press.

Mukherjee, T. K. and Naka, A. (1995). Dynamic relations between macroeconomic variables and the Japanese stock market: an application of a vector error correction model. Journal of Financial Research, 18(2):223-237.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386-408.

Vovk, V. (2001). Competitive on-line statistics. International Statistical Review, 69:2001.

Zhang, T. (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. In ICML 2004: Proceedings of the Twenty-First International Conference on Machine Learning, pages 919-926. OMNIPRESS.