A Randomness Test for Financial Time Series

W. Adrián Risso*

* Università degli Studi di Siena. e-mail address: [email protected] I would like to acknowledge Doyle Farmer, Roberto Renò, and Lionello Punzo for their very helpful comments and suggestions.

Abstract

A randomness test is constructed using tools from symbolic dynamics and the theory of communication. The novelty is that no assumption of normality, of a symmetric probability distribution, or of a particular variance process is required. Moreover, the traditional independent and identically distributed normal white noise is nested as a special case. Asymmetric probability distributions and more complex processes such as GARCH can also be tested. The test is applied to returns on different financial assets from the NYSE. The S&P 500 and Dow Jones indexes and the returns on 10-year Treasury notes are also tested. The null hypothesis of randomness is rejected for daily data. In addition, some series reject the null hypothesis when weekly and monthly data are used. When an asymmetric probability distribution is assumed as the model for the returns, the hypothesis of randomness is rejected for daily returns, and also in some cases for weekly and monthly data. This could suggest that asset returns follow complex nonlinear dynamics, or that some kind of anomaly exists in the financial market. We also believe the test is useful more generally, for instance in checking whether the residuals of an ARIMA model follow a white noise process, whether or not that process is normally distributed.

Introduction

Bachelier (1900) was the first to propose that stock prices follow a Brownian motion. According to the efficient market hypothesis suggested by Fama (1965), these stock prices reflect all available information. Therefore the best prediction of future prices is the present price. Since then, differences in stock prices have been modelled as a white noise process, implying that stock prices are random walk processes. This assumption has been criticized because of some well-known stylized facts. Mandelbrot (1963) considers that financial returns have long memory, so that stock prices should be modelled using a fractional Brownian motion. Peters (1994, 1996) gives evidence of fractality in financial markets using the Hurst exponent as a measure of persistence. Moreover, the hypothesis of constant variance in financial returns does not seem to be supported by the empirical evidence. In fact, Mandelbrot (1963) proposes a stable Paretian distribution to model asset returns, which implies an infinite variance. Since then, different models that permit the variance to change have been proposed; examples are the ARCH model (see Engle (1982)) and Markov switching models (see Hamilton (1989)). Lo and MacKinlay (1988), using the variance ratio test, found that the behaviour of financial returns is not random. This finding suggests the existence of anomalies; Singal (2004) surveys the anomalies found to date.

In this context, the present work has two principal objectives. First, we try to show that stock market asset returns do not behave as any type of white noise process, and hence that Brownian motion theory does not apply in these cases. We do this by taking the daily decreases and increases and examining the behaviour of their combinations over 2, 3, 4, and 5 days. Since the theory says that returns are completely random, all combinations of decreases and increases over a given number of days should be equally probable. That is, we would not expect to find some combinations of decreases and increases to be more probable than others. For instance, imagine that 0 means a decrease in one-day returns and 1 means an increase.
Imagine also that we have the following daily time series of coded returns:

01001101000101101010010001110110010000101101101101101101110110011011001100100010

We have 80 days of decreases and increases. If the process is completely random, the probability of a decrease on any one day should be 1/2 = 40/80, and the same for an increase. Moreover, if the process is random, over 2 days there are 4 possible combinations and the probability of each should be 1/4. Reasoning in this manner, the probability of an event composed of a combination of n days should be $2^{-n}$ for a random process. To check this we develop a test of randomness based on symbolic dynamics and the concept of entropy.

The second objective is to develop a test that applies not only to asset returns but is also a general and powerful test of randomness. In fact, the test does not need the assumption of normality, and it permits the variance to follow different processes, such as a GARCH process, or even to be infinite, as in the case of the Paretian distributions suggested by Mandelbrot (1963).

Section 2 explains what symbolic dynamics and symbolic time series analysis are, and section 3 gives a definition of Shannon entropy as a measure of uncertainty. Readers familiar with these two concepts may skip those two sections. Section 4 describes our randomness test, and in section 5 real data from the NYSE are tested with the new statistic. In section 6 we assume a more general probability distribution for the returns, where possible asymmetry is considered. Finally, section 7 concludes.

2) Symbolic Dynamics and Symbolic Time Series Analysis (STSA)

The earliest developments in symbolic dynamics began with the study of the complex behaviour of dynamical systems. In 1898, Hadamard developed a symbolic description of sequences in geodesic flows on surfaces of negative curvature. Morse and Hedlund (1938) were the first to use the term symbolic dynamics.
Their idea was that a symbolic trajectory T is formed by symbols taken from a finite set of generating symbols, subject to rules of admissibility which depend on the underlying Poincaré fundamental group. The trajectory taken with one of its symbols is called a symbolic element. Symbolic elements are analogues of line elements on an ordinary trajectory. A metric is assigned to the space of symbolic elements, and this space turns out to be perfect, compact, and totally disconnected.

As Piccardi (2004) remarks, symbolic dynamics is a term which denotes theoretical investigation, and it is quite distinct from the experiment-oriented approach known as STSA. In recent years the STSA methodology has been applied to the qualitative analysis of time series in different fields of science. Nonetheless, few applications in economics have been made using this approach. Symbolization involves the transformation of raw time-series measurements into a series of discrete symbols that are processed to extract information about the generating process. As Keller and Wittfeld (2004) note, the idea behind this method is simple: instead of considering the exact state of a system at some time, one is interested in a coarse-grained description. The state space is decomposed into a small number of pieces, and states contained in the same piece are identified with each other. STSA is still developing in different directions; however, we will focus on the approach given by Finney and Daw (1998).

2.1) Symbolization

We now describe the STSA methodology based on Finney and Daw (1998), Finney, Daw, and Halow (1998), and Daw, Finney, and Tracy (2003). Given a continuous or discrete time series (for instance, a series of daily asset returns), it is necessary to define the number of symbols n.
The transformation consists of partitioning the range of the original time series at one or more selected thresholds; values of the original series that fall in the same part of the partition are coded with the same symbol, as shown in Figure 1.

Fig. 1: Process of symbolizing a time series. Taken from Daw, Finney, and Tracy (2003).

Typically, the discretization partition is defined in such a way that the individual occurrence of each symbol is equiprobable with all others. In the dynamic transformation, arithmetic differences of adjacent data points define the symbolic values. We symbolize a positive difference as 1 and a negative difference as 0. Such a differenced symbolization scheme is relatively insensitive to extreme noise spikes in the data.

After symbolization, the next step in the identification of temporal patterns is the construction of symbol sequences (words, in the terminology of symbolic dynamics). If each possible sequence is represented by a unique identifier, the end result is a new time series, often referred to as a symbolic-sequence series (or code series). Symbolic-sequence construction has at least outward similarities to time-delay embedding. Again, if the process is random, all possible "words" of the same length will have the same probability.

Symbol-sequence construction has also been described in terms of symbolic trees. For a fixed sequence length of L successive symbols, the total number of branches is $n^L$, and thus the number of possible sequences increases exponentially with tree depth. There is no established method for choosing the correct value of L. Daw, Finney, and Tracy (2003) note that a recent improvement to the fixed-sequence-length approach is the implementation of context trees by Kennel and Mees (2000, 2001). This approach allows some of the possible sequences (i.e. branches) to be shortened to reflect reduced predictability over long times.
Modification of the length of individual branches depends on information-theoretic measures that indicate how efficiently the observed dynamics are predicted (i.e., how well the symbolization scheme compresses the available information).

For a given dynamical system, not all sequences are realizable. Such non-occurring sequences are called forbidden words. Various statistics can be used to characterize the symbol-sequence histograms. For example, we can define a modified Shannon entropy as:

$$H_S = -\frac{1}{\log_2 N} \sum_i p_{i,L} \log_2 p_{i,L}$$

where N is the total number of sequences in the time series, i is a symbol-sequence index of sequence vector length L, and $p_{i,L}$ is the relative frequency of symbol sequence i.
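
As an illustration, the following Python sketch applies the differenced symbolization, the construction of words of length L, and the modified Shannon entropy to the 80-day coded series quoted in the introduction. It is a minimal sketch under stated assumptions, not the implementation used in this paper: the function names (`symbolize`, `word_frequencies`, `modified_entropy`) are hypothetical, zero differences are coded as 0, words are taken with overlap, and N is assumed to be the number of possible words, n^L, so that equiprobable words give a value of 1. Other readings of N (for example, the number of words actually observed) would change the normalization.

```python
import math
from collections import Counter

# 80-day coded series from the introduction (0 = decrease, 1 = increase).
CODED = (
    "0100110100010110101001000111011001000010"
    "1101101101101101110110011011001100100010"
)


def symbolize(series):
    """Code differences of adjacent points: '1' if positive, else '0'.

    Assumption: a zero difference is coded as '0'; the text only
    specifies the positive and negative cases.
    """
    return "".join(
        "1" if later - earlier > 0 else "0"
        for earlier, later in zip(series, series[1:])
    )


def word_frequencies(symbols, L):
    """Relative frequency of each word of length L.

    Assumption: words are overlapping windows shifted by one symbol.
    Words that never occur (forbidden words) are simply absent.
    """
    words = [symbols[i:i + L] for i in range(len(symbols) - L + 1)]
    total = len(words)
    return {word: count / total for word, count in Counter(words).items()}


def modified_entropy(symbols, L, n_symbols=2):
    """Shannon entropy of the word histogram divided by log2(N).

    Assumption: N = n_symbols ** L, the number of possible words.
    """
    freqs = word_frequencies(symbols, L)
    shannon = -sum(p * math.log2(p) for p in freqs.values())
    return shannon / math.log2(n_symbols ** L)


if __name__ == "__main__":
    for L in range(1, 6):
        print(L, round(modified_entropy(CODED, L), 4))
```

For L = 1 the coded series gives a value of 1, since decreases and increases each occur 40 times. Under the assumptions above, a value below 1 for longer words means that some words occur more often than others in the sample; this is only a descriptive statistic, and the test itself is described in section 4.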
