
A TEST FOR FINANCIAL SERIES

W. Adrián Risso*

Abstract

A randomness test is constructed using tools from symbolic dynamics and the theory of communication. Its novelty is that it requires assuming neither a normal distribution, nor a symmetric distribution, nor a particular process for the variance. Moreover, the traditional case of independent, identically and normally distributed white noise is nested. Asymmetric probability distributions and more complex processes like GARCH can also be tested. The test is applied to the returns of different financial assets taken from the NYSE. The S&P 500 and Dow Jones indexes and the 10-year Treasury note returns are also tested. We reject the null hypothesis of randomness for daily data. In addition, some series reject the null hypothesis when using weekly and monthly data. When an asymmetric probability distribution is assumed as the model for the returns, the hypothesis of randomness is rejected in the case of daily returns, and also for some cases using weekly and monthly data. This could suggest that asset returns follow complex nonlinear dynamics, or that some kind of anomaly exists in the financial market. We also believe the test is important, for instance, for checking whether the residuals of an ARIMA model follow a white noise process, regardless of whether this process is normally distributed.

Introduction

Bachelier (1900) was the first to propose that stock prices follow a Brownian motion. According to the efficient market hypothesis suggested by Fama (1965), these stock prices reflect all available information. Therefore the best predictors of future prices are the present prices. Since then, differences in stock prices have been modelled as a white noise process, implying that stock prices are random walk processes.

This assumption was criticized due to the existence of some well-known stylized facts. Mandelbrot (1963) considers that financial returns have a long memory, so that stock prices should be modelled using a fractional Brownian motion. Peters (1994, 1996) gives evidence of fractality in financial markets using a measure of persistence.

Moreover, the constant variance hypothesis for financial returns does not seem to be supported by empirical evidence. In fact, Mandelbrot (1963) proposes a stable Paretian distribution to model asset returns, which implies an infinite variance. Since then, different models have been proposed that permit the variance to change; examples are the ARCH model (see Engle (1982)) and the Markov switching models (see Hamilton (1989)).

Lo and MacKinlay (1988), using the variance ratio test, found that the behaviour of financial returns is not random. This fact would suggest the existence of different anomalies; Singal (2004) surveys the anomalies found so far.

* Università degli Studi di Siena. e-mail address: [email protected]. I would like to acknowledge Doyle Farmer, Roberto Renò, and Lionello Punzo for their very helpful comments and suggestions.

In this context, the present work has two principal objectives. First, we try to demonstrate that stock market asset returns do not behave as any type of white noise process, so that Brownian motion theory does not apply in these cases. We do this by taking the daily decreases and increases and examining the behaviour of the combinations of 2, 3, 4, and 5 days of decreases and increases. Since theory says that returns are completely random, the different combinations of daily decreases and increases should all have the same probability. We would not expect to find some combinations of decreases and increases to be more probable than others. For instance, suppose that 0 means a decrease in one-day returns and 1 means an increase, and that we have the following daily time series of codified returns:

01001101000101101010010001110110010000101101101101101101110110011011001100100010

We have 80 days of decreases and increases. If the process is completely random, the probability of a decrease in 1 day should be 1/2 = 40/80, and the same for an increase. Moreover, if the process is random, in 2 days we have 4 possibilities, each with probability 1/4. Reasoning in this manner, the probability of an event composed of a combination of n days should be $2^{-n}$ for a random process. To check this, we develop a test of randomness based on symbolic dynamics and the concept of entropy. Developing a test of randomness that is applicable not only to asset returns but is also a general, powerful test of randomness is the second objective. In fact, the test does not need the assumption of normality, and it permits the variance to follow different processes, like a GARCH process, or even to be infinite, as in the case of the Paretian distributions suggested by Mandelbrot (1963).

Section 2 explains what symbolic time series and symbolic dynamics are, and section 3 gives a definition of Shannon entropy as a measure of uncertainty. Readers familiar with these two concepts can skip these two sections. Section 4 describes our randomness test, and in section 5 real data from the NYSE are taken and confronted with the new test. In section 6 we assume a more general probability distribution for the returns, where possible asymmetry is considered. Finally, section 7 concludes.
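The counting argument for the 80-day example above can be sketched in Python. This is only an illustration: the function name `word_frequencies` is ours, and the use of overlapping n-day windows is an assumption, since the text does not say whether words overlap.

```python
from collections import Counter

# The 80-day codified series from the text (0 = decrease, 1 = increase).
SERIES = "01001101000101101010010001110110010000101101101101101101110110011011001100100010"


def word_frequencies(symbols, n):
    """Relative frequency of every n-day word found in `symbols`.

    Assumption: words are read from overlapping windows of length n.
    """
    words = [symbols[i:i + n] for i in range(len(symbols) - n + 1)]
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in sorted(counts.items())}


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"n={n}: benchmark probability 2^-n = {2 ** -n:.4f}")
        for word, freq in word_frequencies(SERIES, n).items():
            print(f"  {word}: {freq:.4f}")
```

For n=1 the two frequencies equal 40/80, as stated in the text; for longer words the printed frequencies can be compared with the benchmark $2^{-n}$.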

2) Symbolic Dynamics and Symbolic Time Series Analysis (STSA)

The earliest developments in symbolic dynamics began with the study of the complex behaviour of dynamical systems. In 1898, Hadamard developed a symbolic description of geodesic flows on surfaces of negative curvature. Morse and Hedlund (1938) were the first to use the term symbolic dynamics. Their idea was that a symbolic trajectory T is formed by symbols taken from a finite set of generating symbols, subject to rules of admissibility which depend on the underlying Poincaré fundamental group. The trajectory taken with one of its symbols is called a symbolic element. Symbolic elements are analogues of line elements on an ordinary trajectory. A metric is assigned to the space of symbolic elements, and this space turns out to be perfect, compact, and totally disconnected.

As Piccardi (2004) remarks, symbolic dynamics is a term which denotes theoretical investigation and is quite distinct from the applied approach known as STSA. In recent years the STSA methodology has been applied in the qualitative analysis of time series in different fields of the sciences. Nonetheless, few applications in economics have used this type of series. Symbolization involves the transformation of raw time-series measurements into a series of discrete symbols that are processed to extract information about the generating process. As Keller and Wittfeld (2004) note, the idea behind this method is simple: instead of considering the exact state of a system at some time, one is interested in a coarse-grained description. The state space is decomposed into a small number of pieces, and states contained in the same piece are identified. The STSA is still developing in different directions; however, we will focus on the approach given by Finney and Daw (1998).

2.1) Symbolization

We now describe the STSA methodology based on Finney and Daw (1998), Finney, Daw, and Halow (1998), and Daw, Finney, and Tracy (2003). Given a continuous or discrete time series (for instance, a series of daily asset returns), it is necessary to define the number of symbols n. The transformation consists in partitioning the range of the original time series at one or more selected thresholds; values of the original series that fall in the same part of the partition are codified with the same symbol, as shown in figure 1.


Fig.1: Process of symbolizing a time series. Taken from Daw, Finney, and Tracy (2003)

Typically, the discretization partition is defined in such a way that the individual occurrence of each symbol is equiprobable with all others. In the dynamic transformation, arithmetic differences of adjacent data points define the symbolic values. We symbolize a positive difference as 1 and a negative difference as 0. Such a differenced symbolization scheme is relatively insensitive to extreme noise spikes in the data.
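The two symbolization schemes just described can be sketched as follows. This is a minimal illustration, not the procedure of the cited authors: the function names, the quantile convention used for the thresholds, and the decision to drop zero differences are all assumptions made here.

```python
def symbolize_static(series, n_symbols=2):
    """Static partition: thresholds at empirical quantiles, so that each
    symbol occurs (roughly) equally often.

    Assumption: threshold k is the element at position k*len/n_symbols of
    the sorted data; ties in the data can make the symbols unequal in count.
    """
    ordered = sorted(series)
    thresholds = [ordered[(k * len(ordered)) // n_symbols]
                  for k in range(1, n_symbols)]
    # The symbol is the number of thresholds that the value reaches.
    return [sum(x >= t for t in thresholds) for x in series]


def symbolize_differenced(series):
    """Dynamic transformation: 1 for a positive difference between adjacent
    points, 0 for a negative one.

    Assumption: zero differences are dropped, since the text does not say
    how they should be coded.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    return [1 if d > 0 else 0 for d in diffs if d != 0]


if __name__ == "__main__":
    data = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
    print(symbolize_static(data))       # three low values, three high values
    print(symbolize_differenced(data))  # up, down, up, down, up
```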

After symbolization, the next step in the identification of temporal patterns is the construction of symbol sequences (words in the symbolic-dynamics terminology). If each possible sequence is represented in terms of a unique identifier, the end result will be a new time series often referred to as a symbolic-sequence series (or code series). Symbolic-sequence construction has at least outward similarities to time-delay embedding.

Again, if the process is random, all the possible “words” of the same length will have the same probability. Symbol-sequence construction has also been described in terms of symbolic trees. For a fixed sequence length of L successive symbols, the total number of branches is $n^L$, and thus the number of possible sequences increases exponentially with tree depth. There is no established methodology for choosing the correct value of L.
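A code series of the kind described above can be sketched by reading each window of L symbols as a number in base n. The function name `code_series` and the use of overlapping windows are assumptions of this sketch.

```python
def code_series(symbols, L, n_symbols=2):
    """Map every window of L successive symbols to a unique integer
    identifier in the range [0, n_symbols**L).

    Assumption: windows overlap, moving one symbol at a time.
    """
    codes = []
    for i in range(len(symbols) - L + 1):
        code = 0
        for s in symbols[i:i + L]:
            code = code * n_symbols + s  # read the word as a base-n number
        codes.append(code)
    return codes


if __name__ == "__main__":
    # Words 011, 110, 101 become the identifiers 3, 6, 5.
    print(code_series([0, 1, 1, 0, 1], L=3))
```

With n symbols and words of length L there are $n^L$ possible identifiers, which is the exponential growth with tree depth mentioned in the text.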

Daw, Finney, and Tracy (2003) note that a recent improvement to the fixed-sequence-length approach is the implementation of context trees by Kennel and Mees (2000, 2001). This approach allows some of the possible sequences (i.e. branches) to be shortened. Modification of the length of individual branches depends on information-theoretic measures that indicate how efficiently the observed dynamics are predicted (i.e., how well the symbolization scheme compresses the available information). For a given dynamical system, not all sequences are realizable. Such nonoccurring sequences are called forbidden words.

Various measures can be used to characterize the symbol sequences. For example, we can define a modified Shannon entropy as:

$$ H_S = -\frac{1}{\log_2 N} \sum_i p_{i,L} \log_2 p_{i,L} $$

where N is the total number of sequences in the time series, i is the index of a symbol sequence of length L, and $p_{i,L}$ is the relative frequency of symbol sequence i. For random data $H_S$ should be equal to 1, whereas for nonrandom data it should be between 0 and 1.
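The modified Shannon entropy can be computed from a list of observed words as in the sketch below. The function name and the default choice of N (the number of distinct words actually observed) are assumptions of this illustration; N can be passed explicitly, for instance as $2^L$ for binary words of length L.

```python
import math
from collections import Counter


def modified_shannon_entropy(words, n_sequences=None):
    """H_S = -(1 / log2 N) * sum_i p_i log2 p_i over the observed words.

    Assumption: when `n_sequences` (N) is not given, it is taken to be the
    number of distinct words observed.
    """
    counts = Counter(words)
    total = sum(counts.values())
    if n_sequences is None:
        n_sequences = len(counts)
    if n_sequences < 2:
        return 0.0  # a single possible word carries no uncertainty
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(n_sequences)


if __name__ == "__main__":
    print(modified_shannon_entropy(["00", "01", "10", "11"]))  # equiprobable
    print(modified_shannon_entropy(["00", "00", "00", "01"]))  # below 1
```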

There exist many works in the sciences¹ using this special form of symbolization; see Daw, Finney, and Tracy (2003) for a review covering Astrophysics, Biology, Medicine, Fluid Flow, Chemistry, Mechanical Systems, Artificial Intelligence, Control, and Communication. However, this approach has scarcely been applied in economics. Brida (2000) considers an application of STSA to economic time series, and Brida and Punzo (2003) apply symbolic time series using dynamic regimes. Nevertheless, the approach has not been applied in finance.

Other types of symbolic time series analysis have also been carried out. Keller and Lauffer (2002), Keller and Wittfeld (2004), and Keller and Sinn (2005) use symbolic time series from the EEG (electroencephalogram). The last of these papers develops the ordinal analysis of time series, which is still a very incipient methodology. They assert that this method makes it possible to recognize structure in a system and to discriminate and classify different states. In order to obtain reliable statements, it is necessary to develop models and ‘ordinal’ statistical characteristics beyond the permutation entropy.

Tiňo et al., in Abu-Mostafa et al. (2000), use a different approach in order to predict the volatility of the Dow Jones Industrial Average. They find that there does appear to be some potential in using a symbolic dynamics approach to volatility prediction, in which continuous values are replaced by symbols through quantization. They remark that there have not been many applications of symbolic methods to the modelling of financial time series. However, it was found that discretization of financial time series can potentially filter the data and reduce the noise. Moreover, the symbolic nature of the pre-processed data enables one to interpret the predictive models as lists of clear rules.

However, there are shortcomings: 1) the determination of the number of quantization intervals (symbols) and their cut values is ad hoc; 2) in such a situation, due to nonstationarity issues, many symbols make the statistics poor. In fact, Giles et al. (1997) indicate that the predictive model achieved its best performance with binary input streams. They use a hybrid neural network for 5 different foreign exchange rates, predicting the direction of change for the next day with a 47.1% error rate.

3) Shannon Entropy as a measure of uncertainty

Clausius (1865) introduced the concept of entropy as a measure of the amount of energy in a thermodynamic system. Here, however, we focus on the meaning given by Information Theory. Shannon (1948) considers entropy a useful measure of uncertainty in the context of communication theory, where the maximum value is attained by a completely random event. For instance, in an English book it is obvious that certain combinations of letters are more probable than others, because the English language has its own probability distribution. If we take a page of an English book, we will see that three-letter events like THE appear more frequently than combinations like XCV. The combination THE has a larger probability in the English language than XCV, whereas the two combinations would have the same probability in a random process of letters. Knowing this, we can code more efficiently by assigning simple codes to the more probable events and more complex codes to the less probable ones. Therefore the Shannon measure will give a value for English that is less than the value for a completely random process. This idea is fundamental in our analysis: if a time series of increases and decreases in the stock market is completely random, as theory asserts, its entropy should be larger than that of a non-random process. The normalized Shannon entropy will therefore be the measure of randomness that we apply to stock market asset returns. The entropy of a random variable is a measure of the uncertainty of the random variable; it is a measure of the amount of information required on average to describe the random variable.

We will first introduce the concept of entropy showing its required properties:

1) It should be a function of P=(p1,p2,...,pn), so that we can write H=H(p1,p2,...,pn)=H(P), where P is the probability distribution of the events.
2) It should be a continuous function of p1,p2,...,pn. Small changes in p1,p2,...,pn should cause a small change in Hn.
3) It should not change when the outcomes are rearranged among themselves.
4) It should not change if an impossible outcome is added to the probability scheme.
5) It should be minimum, and possibly zero, when there is no uncertainty.
6) It should be maximum when there is maximum uncertainty, which arises when the outcomes are equally likely, so that Hn should be maximum when p1=p2=…=pn=1/n.
7) The maximum value of Hn should increase as n increases.

¹ Kurths, Schwarz, Witt, Krampe, and Abel (1996) use this approach. Their application in cardiology, detecting high-risk patients for sudden cardiac death, is very interesting.

Shannon (1948) suggested the following measure:

$$ H_n(p_1, p_2, \dots, p_n) = -\sum_{i=1}^{n} p_i \log_2 p_i $$

Logarithms to base 2 are used, so entropy is measured in bits. This measure satisfies all the properties mentioned above and takes its maximum when all events are equally likely. The latter is easily seen by maximizing the following Lagrangian:

$$ -\sum_{i=1}^{n} p_i \log_2 p_i - \lambda \left( \sum_{i=1}^{n} p_i - 1 \right) $$

Since the function is concave, its local maximum is also a global maximum. Note also that when pi=0 we take 0log0=0, which is justified by continuity since xlogx→0 as x→0. Thus adding terms of zero probability does not change the entropy.
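For completeness, the maximization can be written out step by step. This is the standard textbook argument, sketched here in our own notation ($\mathcal{L}$ for the Lagrangian is not used in the original):

```latex
\begin{aligned}
\mathcal{L}(p_1,\dots,p_n,\lambda)
  &= -\sum_{i=1}^{n} p_i \log_2 p_i
     - \lambda\Big(\sum_{i=1}^{n} p_i - 1\Big), \\
\frac{\partial \mathcal{L}}{\partial p_i}
  &= -\log_2 p_i - \frac{1}{\ln 2} - \lambda = 0
  \;\Longrightarrow\; p_i = 2^{-\lambda - 1/\ln 2}
  \quad \text{for every } i, \\
\sum_{i=1}^{n} p_i = 1
  &\;\Longrightarrow\; p_i = \frac{1}{n}, \qquad
  H_n\Big(\tfrac{1}{n},\dots,\tfrac{1}{n}\Big)
  = -\sum_{i=1}^{n} \frac{1}{n}\log_2\frac{1}{n} = \log_2 n .
\end{aligned}
```

The first-order condition gives the same value for every $p_i$, so the constraint forces $p_i = 1/n$, and the maximum value $\log_2 n$ grows with n, in line with property 7.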

In order to clarify the concept of Shannon entropy, let us take an event with two possible outcomes with respective probabilities p and q=1-p. The Shannon entropy is then defined by

$$ H = -\left( p \log_2 p + q \log_2 q \right) $$

Fig. 2 shows the shape of the function graphically. Note that the maximum is attained when the probability is 0.5 for each outcome. This case corresponds to a random event (like flipping a coin). By contrast, a certain event (when the probability of one outcome is 1) produces an entropy equal to 0.

Fig.2: Shape of the Shannon entropy function. Note that maximum happens when the process is random (p=0.5)
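The two-outcome entropy plotted in Fig. 2 can be evaluated directly. The sketch below assumes base-2 logarithms, as stated earlier in this section, and uses the convention 0 log 0 = 0; the function name is ours.

```python
import math


def binary_entropy(p):
    """H(p) = -(p log2 p + q log2 q) with q = 1 - p and 0 log 0 taken as 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Terms with zero probability are skipped (0 log 0 = 0 by continuity).
    return -sum(x * math.log2(x) for x in (p, 1.0 - p) if x > 0.0)


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
        print(f"p={p:.1f}  H={binary_entropy(p):.4f}")
```

The printed values rise from 0 at p=0 to their maximum at p=0.5 and fall back to 0 at p=1, which is the shape shown in the figure.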

This idea of maximum randomness when entropy is maximum, and certainty when entropy is zero, is fundamental to the rest of this work.

In general, Khinchin (1957) showed that any measure satisfying all the properties must take the following form:

$$ -k \sum_{i=1}^{n} p_i \log_2 p_i $$

where k is an arbitrary constant. In particular, we are interested in the normalized Shannon entropy, so we take $k = 1/\log_2 n$. This is used when comparing events of different lengths. We said that Shannon entropy is maximum when all outcomes are equally probable. This is consistent with Laplace's principle of insufficient reason: unless there is information to the contrary, all outcomes should be considered equally probable.

4) Test of Randomness based on the normalized Shannon Entropy

As mentioned before, we are interested in a powerful test of whether stock market asset returns are random events, as the theory asserts. We therefore use the normalized Shannon entropy, which gives a continuous measure from 0 (a completely certain event) to 1 (a completely random event), as the fundamental input of our test.

We can obtain different data series from the NYSE for more than 10,500 days. We are interested in daily increases and decreases in financial returns; in the symbolic dynamics framework this means that we define the partition at 0. Thus we have 2 possible events in one day: the return of an asset increases or decreases. If this is a random event, as theory asserts, the probability of each event is 0.5, giving a maximum entropy equal to 1. Furthermore, if we take 2 consecutive days (a word of 2 letters according to STSA) we have 4 equally likely events (2 days of increases, 2 days of decreases, an increase followed by a decrease, and the converse), and the entropy should not differ from 1. More generally, if the event is random, the “words” formed from any number of consecutive days should all have the same probability. If we take n consecutive days of these random events, the number of possibilities grows as $2^n$, and for a random event the probability of finding any given word of n days should be $2^{-n}$.

The idea of the test is very simple. We simulate daily random increases and decreases in the stock market as flips of a coin, assuming, as theory says, that the process is random. To do this we used the MATLAB random number generator to produce 0s and 1s, where “0” denotes a decrease and “1” an increase. Note that no probability distribution is assumed, symmetry is not required, and no assumption about the variance is made. This gives the test its generality: it can be used not only for asset returns but whenever we are interested in checking whether a time series is white noise, for instance the residuals of an ARIMA model.

The only requirement here is that 0 and 1 be equally probable. We simulated 10,000 series of 10,500 return changes and computed the corresponding entropy (H), always obtaining a number near 1, as expected. In this manner we obtain 10,000 values of H for 1, 2, 3, 4, and 5 consecutive days. Theoretically, a random event shows no deviation from the maximum entropy value whatever the number of consecutive days, so no difference in H should be found between 1, 2, 3, 4, or 5 consecutive days. Since the sample is limited, however, H is expected to decrease as the number of days increases.

Using the 10,000 simulated entropy values of a random process we construct our statistic, R=1-H, so that 0 corresponds to a random process and 1 to a completely certain event. We then construct the corresponding simulated distribution for a sample of 10,500 data points. Figures 1, 2, 3, 4, and 5 show these simulated distributions.
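The simulation just described can be sketched in Python instead of MATLAB. This is a hedged illustration rather than the author's code: the function names, the fixed seed, the use of overlapping words, and the simple order-statistic rule for the quantile are all assumptions made here.

```python
import math
import random
from collections import Counter


def r_statistic(symbols, L):
    """R = 1 - H, where H is the Shannon entropy of the L-symbol words
    normalized by log2(2**L) = L.

    Assumption: words are read from overlapping windows.
    """
    if len(symbols) < L:
        raise ValueError("series shorter than the word length")
    words = [tuple(symbols[i:i + L]) for i in range(len(symbols) - L + 1)]
    total = len(words)
    h = -sum((c / total) * math.log2(c / total)
             for c in Counter(words).values())
    return 1.0 - h / L


def simulated_critical_value(sample_size, L, n_sims=1000, level=0.95, seed=0):
    """Empirical `level` quantile of R when symbols are fair coin flips."""
    rng = random.Random(seed)
    stats = sorted(
        r_statistic([rng.getrandbits(1) for _ in range(sample_size)], L)
        for _ in range(n_sims)
    )
    # Order statistic used as a simple quantile estimate (an assumption).
    return stats[min(n_sims - 1, math.ceil(level * n_sims) - 1)]


if __name__ == "__main__":
    # Small settings so the sketch runs quickly; the text uses 10,000
    # simulations of 10,500 observations, which is slow in pure Python.
    for L in range(1, 6):
        print(L, simulated_critical_value(sample_size=500, L=L, n_sims=200))
```

Because the generator and the window convention differ from the original study, the printed values are not expected to reproduce the tabulated critical values exactly.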

[Figures: simulated distributions (vertical axis: Percent) of the statistics R1day, R2days, R3days, R4days, and R5days for a sample of 10,500 observations.]

The next step is to take real data from the stock market, codify the asset returns as increases and decreases according to the STSA, and compute the entropy and then R. Comparing R with the tabulated critical value at a significance level of 5%, we reject the null hypothesis of randomness when the statistic lies to the right of the critical value. The null hypothesis $H_0: R = 0$ states that the process is random.

Table 1: Critical Values at 95% for statistic R=1-H (where H is Entropy), Sample=10,500

R1day     R2days    R3days    R4days    R5days
0.00026   0.00032   0.00040   0.00054   0.00075

Source: Based on the obtained results
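The decision rule can be expressed in a few lines. The dictionary below copies the critical values of Table 1; the names `CRITICAL_VALUES_10500` and `rejects_randomness` are hypothetical helpers introduced only for this sketch.

```python
# 95% critical values of R for a sample of 10,500 observations (Table 1),
# indexed by the number of consecutive days in a word.
CRITICAL_VALUES_10500 = {1: 0.00026, 2: 0.00032, 3: 0.00040,
                         4: 0.00054, 5: 0.00075}


def rejects_randomness(r_value, word_length,
                       critical_values=CRITICAL_VALUES_10500):
    """True when the observed R lies to the right of the critical value."""
    return r_value > critical_values[word_length]


if __name__ == "__main__":
    # An observed one-day statistic of 0.0010 exceeds 0.00026.
    print(rejects_randomness(0.0010, 1))
```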

5) Standard & Poor's 500, Dow Jones and 10-year Treasury note interest rate

5.1) Daily data

We obtained data series for the S&P 500, the Dow Jones, and the 10-year Treasury note interest rate. Taking 10,500 daily observations for 11 asset returns, the 10-year Treasury note interest rate differences, and the Dow Jones and S&P 500 index differences, and codifying the series according to increases and decreases, we computed the corresponding entropies for events of up to 5 consecutive days. We describe the procedure because it is a useful way to show how to compute the statistic.

First, the returns are computed as

$$ r(t) = \ln P(t) - \ln P(t-1) $$

where r(t) is the financial return at time t and P(t) is the respective price. We then codify the data according to the partition, assigning “1” to positive values and “0” to negative ones:

$$ s(t) = \begin{cases} 0 & \text{if } r(t) < 0 \\ 1 & \text{if } r(t) > 0 \end{cases} $$
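These two steps can be sketched as follows. The function names are ours, and dropping returns exactly equal to zero is an assumption, since the definition of s(t) leaves that case unassigned.

```python
import math


def log_returns(prices):
    """r(t) = ln P(t) - ln P(t-1) for consecutive prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]


def codify(returns, threshold=0.0):
    """s(t) = 1 if r(t) > threshold, 0 if r(t) < threshold.

    Assumption: returns exactly equal to the threshold are dropped.
    """
    return [1 if r > threshold else 0 for r in returns if r != threshold]


if __name__ == "__main__":
    prices = [100.0, 101.0, 100.5, 100.5, 102.0]  # made-up illustrative prices
    print(codify(log_returns(prices)))  # up, down, (unchanged dropped), up
```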

Once we have the daily sequence of 0s and 1s (decreases and increases), we are interested in the combinations of these events over 2, 3, 4, and 5 days. Of course, the number of combinations of 0s and 1s increases according to $2^L$, where L is the number of days. Thus L=3 (3 days) means that we have $2^3=8$ possible cases; if r(t) is white noise then all these combinations should be equally probable, producing an entropy equal to 1.

Having the codified data, we compute the indicator R for 1, 2, 3, 4, and 5 days. For instance, in 3 days we have 8 possibilities: (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), and (1,1,1). We count the occurrences of each case and calculate R in the following way:

$$ R(3) \equiv 1 - H(3) = 1 + \frac{1}{\log_2(2^L)} \left[ \frac{n_1}{n}\log_2\frac{n_1}{n} + \dots + \frac{n_8}{n}\log_2\frac{n_8}{n} \right] $$

where $n_i$ is the number of occurrences of the i-th event in the time series², n is the total number of events in the time series, and L=3. Table 2 shows the results.
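The formula for R can be evaluated directly from the event counts. In this sketch the function name is ours, the word length L is inferred from the number of counts supplied (assumed to be $2^L$), and events that never occur contribute zero, following the convention 0 log 0 = 0.

```python
import math


def r_from_counts(counts):
    """R(L) = 1 + (1 / log2(2**L)) * sum_i (n_i / n) log2(n_i / n).

    `counts` holds the number of occurrences n_i of each of the 2**L
    possible words; zero counts contribute nothing to the sum.
    """
    if len(counts) < 2:
        raise ValueError("need the counts of at least two possible words")
    n = sum(counts)
    L = math.log2(len(counts))
    return 1.0 + sum((c / n) * math.log2(c / n) for c in counts if c > 0) / L


if __name__ == "__main__":
    print(r_from_counts([10] * 8))                   # eight equally frequent words
    print(r_from_counts([80, 0, 0, 0, 0, 0, 0, 0]))  # a single word only
    print(r_from_counts([14, 9, 11, 8, 10, 12, 7, 9]))  # made-up counts
```

Equal counts give R=0 (maximum entropy) and a single observed word gives R=1, matching the interpretation of the statistic given in section 4.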

Table 2: Randomness test based on Normalized Shannon Entropy (R=1-H) (10,500 days)

Financial Return          R1-day    R2-days   R3-days   R4-days   R5-days
Alcoa Inc.                0.0047    0.0064    0.0070    0.0074    0.0079
Boeing Co.                0.0063    0.0076    0.0087    0.0094    0.0100
Caterpillar Inc.          0.0038    0.0058    0.0065    0.0069    0.0073
Coca Cola Co.             0.0025    0.0030    0.0031    0.0043    0.0033
Du Pont EI                0.0044    0.0046    0.0046    0.0047    0.0048
Eastman Kodak Co.         0.0037    0.0039    0.0041    0.0043    0.0045
General Electric Co.      0.0021    0.0023    0.0025    0.0028    0.0031
General Motors Co.        0.0051    0.0055    0.0059    0.0064    0.0068
Hewlett Packard Co.       0.0017    0.0022    0.0027    0.0030    0.0034
IBM                       0.0010    0.0010    0.0011    0.0012    0.0014
Walt Disney Co.           0.0028    0.0044    0.0053    0.0061    0.0066
S&P 500                   0.0019    0.0071    0.0093    0.0105    0.0114
Dow Jones                 0.0015    0.0035    0.0049    0.0056    0.0063
10-year Treasury notes    0.0133    0.0183    0.0200    0.0208    0.0215

Source: Based on the obtained results

Note that all the R values are greater than the critical values at 95%, so we can reject the null hypothesis that financial returns are completely random; that is, they do not behave as a white noise process. Therefore there are combinations of events that are more probable than others. This could suggest either that the process has complex nonlinear dynamics that merely appear to be completely random, or that some kind of anomaly exists. It is important to bear in mind that we selected a particular partition; another partition or partitions may exist that highlight more important facts about the dynamics of asset returns. Since a methodology for selecting the correct partition is not available, more work is necessary to discover the possible existence of simpler dynamics in the returns.

5.2) Weekly data

In order to check whether randomness is also rejected over other time intervals, we also consider weekly data for the same assets, obtaining more than 2,000 observations. As with the daily data, we run 10,000 simulations of 2,000 weeks, obtaining the critical values in Table 3.

Table 3: Critical Values at 95% for statistic R=1-H (where H is Entropy), Sample=2,000

R1-week   R2-weeks  R3-weeks  R4-weeks  R5-weeks
0.00133   0.00167   0.00214   0.00283   0.00397

Source: Based on the obtained results

Computing the R statistic for the assets as in section 5.1, we find that the null hypothesis of randomness is rejected for the indexes (S&P 500 and Dow Jones) and for the 10-year Treasury notes. The case of individual assets is less clear-cut: the null hypothesis is rejected only for Coca Cola Co., General Electric Co., and IBM. The results are shown in Table 4.

² For instance, n1 is the number of times that event (0,0,0) is seen in the codified time series.

Table 4: Randomness test based on Normalized Shannon Entropy (R=1-H) (2,000 weeks)

Financial Return          R1-week   R2-weeks  R3-weeks  R4-weeks  R5-weeks
Alcoa Inc.                0.0000    0.0000    0.0003    0.0009    0.0015
Boeing Co.                0.0000    0.0005    0.0007    0.0011    0.0015
Caterpillar Inc.          0.0001    0.0008    0.0014    0.0021    0.0031
Coca Cola Co.             0.0001    0.0020*   0.0027*   0.0030*   0.0034
Du Pont EI                0.0005    0.0009    0.0012    0.0016    0.0023
Eastman Kodak Co.         0.0000    0.0008    0.0015    0.0021    0.0027
General Electric Co.      0.0001    0.0006    0.0011    0.0027    0.0042*
General Motors Co.        0.0004    0.0006    0.0010    0.0016    0.0025
Hewlett Packard Co.       0.0003    0.0005    0.0007    0.0015    0.0023
IBM                       0.0003    0.0011    0.0034*   0.0049*   0.0063*
Walt Disney Co.           0.0005    0.0006    0.0016    0.0026    0.0035
S&P 500                   0.0124*   0.0136*   0.0140*   0.0145*   0.0153*
Dow Jones                 0.0083*   0.0082*   0.0085*   0.0092*   0.0102*
10-year Treasury notes    0.0004    0.0024*   0.0041*   0.0055*   0.0069*

* indicates rejection of the null hypothesis
Source: Based on the obtained results

5.3) Monthly data

We also tested the behaviour of the financial returns using monthly data. We simulated 10,000 series of 500 points. The critical values at 95% are shown in Table 5.

Table 5: Critical Values at 95% for statistic R=1-H (where H is Entropy), Sample=500

R1-month  R2-months R3-months R4-months R5-months
0.0051    0.00674   0.00859   0.01148   0.01623

Source: Based on the obtained results

Table 6 shows the results for the R statistics. Randomness is rejected for the S&P 500 and Dow Jones indexes. The null hypothesis is also rejected for Coca Cola Co. and Walt Disney Co., while it is not possible to reject randomness for the other financial returns.

Table 6: Randomness test based on Normalized Shannon Entropy (R=1-H) (500 months)

Financial Return          R1-month  R2-months R3-months R4-months R5-months
Alcoa Inc.                0.0017    0.0018    0.0020    0.0031    0.0050
Boeing Co.                0.0020    0.0020    0.0034    0.0078    0.0141
Caterpillar Inc.          0.0037    0.0038    0.0039    0.0062    0.0089
Coca Cola Co.             0.0119*   0.0126*   0.0133*   0.0183*   0.0242*
Du Pont EI                0.0000    0.0008    0.0013    0.0044    0.0087
Eastman Kodak Co.         0.0002    0.0002    0.0005    0.0043    0.0093
General Electric Co.      0.0000    0.0001    0.0007    0.0017    0.0038
General Motors Co.        0.0017    0.0025    0.0032    0.0054    0.0094
Hewlett Packard Co.       0.0009    0.0011    0.0022    0.0035    0.0060
IBM                       0.0001    0.0001    0.0007    0.0011    0.0025
Walt Disney Co.           0.0104*   0.0116*   0.0124*   0.0139*   0.0157*
S&P 500                   0.0159*   0.0173*   0.0181*   0.0209*   0.0246*
Dow Jones                 0.0205*   0.0205*   0.0214*   0.0230*   0.0280*
10-year Treasury notes    0.0006    0.0020    0.0035    0.0045    0.0077

* indicates rejection of the null hypothesis
Source: Based on the obtained results

6) Changing a traditional assumption about returns

Let us now assume that returns have an asymmetric probability distribution; for instance, suppose that positive returns are more likely than negative ones. In that case we should change the symbolization in order to use our test. It is well known that the median is a good measure of the centre of a probability distribution when the distribution is asymmetric. We will therefore test whether returns follow a random process with an asymmetric distribution. If this is true, a symbolization that puts 50% of the probability on one side and 50% on the other would produce equally probable combinations. This is possible using the median (which we call µ) as a threshold. We know that we are departing from the traditional assumption, but we want to show that even in this case the test is useful if we choose the correct partition. This means taking a threshold (in this case µ) that defines two regions with 50% of the probability each, as shown in Fig. 3.

Fig.3: Asymmetric probability distribution for r. The partition is defined at µ; if the process is random, the different combinations of 0s and 1s should be equally probable

Therefore we can do the symbolization using the function defined below.

$$ s(t) = \begin{cases} 0 & \text{if } r(t) < \mu \\ 1 & \text{if } r(t) > \mu \end{cases} $$

If the process is random, the combinations of these symbols at different moments in time should be equally probable, as explained before, and the test can be used in the same way as before. Table 7 summarizes the results that change when we use this symbolization.
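The median-based symbolization can be sketched with the standard library. The function name is ours, and dropping returns exactly equal to the median is an assumption, since s(t) is not defined for r(t) = µ.

```python
import statistics


def codify_at_median(returns):
    """s(t) = 1 if r(t) > mu, 0 if r(t) < mu, where mu is the sample median.

    Assumption: returns exactly equal to the median are dropped.
    """
    mu = statistics.median(returns)
    return [1 if r > mu else 0 for r in returns if r != mu]


if __name__ == "__main__":
    # Made-up returns, mostly positive, to mimic an asymmetric distribution.
    returns = [0.03, -0.01, 0.02, 0.01, 0.05, 0.04]
    symbols = codify_at_median(returns)
    print(symbols, sum(symbols), len(symbols) - sum(symbols))
```

By construction the two symbols occur equally often in the codified series, so the one-period statistic is zero; this is consistent with the R1 column of Table 7, and any departure from randomness can only show up in words of two or more periods.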

Table 7: Randomness test assuming an asymmetric distribution

Daily                     R1         R2         R3         R4         R5
S&P 500                   0          0.0051*    0.0074*    0.0087*    0.0097*
Dow Jones                 0          0.0021*    0.0037*    0.0045*    0.0053*

Weekly                    R1         R2         R3         R4         R5
Coca Cola Co.             0          0.0021*    0.0029*    0.0033*    0.0037
General Electric Co.      0          0.0009     0.0015     0.0030     0.0046*
Hewlett Packard Co.       0          0.0001     0.0002     0.0009     0.0015
IBM                       0          0.0012     0.0034*    0.0048*    0.0064*
Walt Disney Co.           0          0.0001     0.0011     0.0023     0.0031
S&P 500                   0          0.0005     0.0009     0.0016     0.0022
Dow Jones                 0          0.0001     0.0002     0.0009     0.0019
10-year Treasury notes    0          0.0024*    0.0041*    0.0055*    0.0069*

Monthly                   R1         R2         R3         R4         R5
Alcoa Inc.                0          0.0011     0.0028     0.0056     0.0088
Boeing Co.                0          0.0009     0.0055     0.0109     0.0164*
Caterpillar Inc.          0          0          0.0016     0.0056     0.0101
Coca Cola Co.             0          0.0005     0.0024     0.0046     0.0086
Du Pont EI                0          0.0008     0.0012     0.0047     0.0093
Eastman Kodak Co.         0          0          0.0001     0.0039     0.0095
General Electric Co.      0          0.0001     0.0007     0.0017     0.0038
General Motors Co.        0          0.0008     0.0015     0.0035     0.0078
Hewlett Packard Co.       0          0          0.0010     0.0018     0.0051
Walt Disney Co.           0          0.0011     0.0017     0.0034     0.0065
S&P 500                   0          0.0001     0.0011     0.0034     0.0084
Dow Jones                 0          0.0002     0.0009     0.0019     0.0038
10-year Treasury notes    0          0.0014     0.0027     0.0040     0.0074

* indicates rejection of the null hypothesis (remember that the null hypothesis is H0: R = 0)
Source: Based on the obtained results

As we can see, using daily data we can reject the null hypothesis of randomness (R = 0) in all cases, just as in the symmetric case. Using weekly data we can reject randomness for Coca Cola Co., General Electric Co., IBM, and the 10-year Treasury notes, as we did in the symmetric case. But now we cannot reject randomness for the S&P 500 and Dow Jones. With monthly data, Boeing Co. seems to be the only series that does not follow a random process. In all the other cases we cannot reject the randomness hypothesis.

7) Conclusion

Asset returns are still commonly modelled as a white noise process. This idea comes from Bachelier (1900), who modelled asset prices as a Brownian motion. Fama (1965) connects this random process with the efficient market hypothesis: if the market is perfect, there is no room for forecasting.

However, there is empirical evidence that asset returns are not random. Peters (1996) uses the Hurst coefficient to show this. Mandelbrot (1963) asserts that the normality assumption is not appropriate for modelling financial time series and proposes the stable Paretian distribution instead. Lo and MacKinlay (1988) show, with their variance ratio test, that asset returns would not be a random process, and Singal (2004) surveys the known anomalies in the stock market.

We tried to show that financial series are not well modelled by assuming a white noise process. To do this we used a new approach that comes from symbolic dynamics and information theory. The method is powerful since it requires neither the assumption of normality nor that of constant variance. Moreover, the traditional normal white noise with constant variance is nested in our test, which also recognizes other processes, for instance a GARCH process or an asymmetric distribution. The only requirement for this randomness test is a probability distribution with 0 mean. In this sense the test recognizes a general white noise.

It was shown that the null hypothesis of randomness was rejected for daily data in all cases. Both cases were tested: assuming a symmetric 0 mean distribution, and assuming an asymmetric distribution. This suggests that some events are more probable than others, and therefore that the efficient market hypothesis may not hold. Moreover, when we test randomness using different periods (weekly and monthly data), randomness is also rejected for the S&P 500, Dow Jones, and Coca Cola Co. when a symmetric 0 mean distribution is assumed. The latter suggests that a random walk would not be a good model for this kind of financial returns.

A possible next step would be, for instance, to run the test using 4 symbols (big decrease, small decrease, small increase, and big increase in returns).

We also believe that the test could be applied as a general randomness test, for instance to the residuals of an ARIMA process. The test could be important for detecting whether the residuals are white noise, whether or not they are normally distributed.

References

Bachelier, L. (1900): “Theory of Speculation”, in P.Cootner, ed., The Random Character of Stock Market Prices. Cambridge, MA: MIT Press, 1964.

Brida, J.G., (2000): “Symbolic Dynamics in Multiple Regime Economic Models”, Tesi di Dottorato di Ricerca in Economia Politica, Dipartimento di Economia Politica. Università degli Studi di Siena.

Brida, J.G., Punzo, L.F. (2003): “Symbolic time series analysis and dynamic regimes”, Structural Change and Economic Dynamics, 14, pp. 159-183.

Clausius, R. (1865): “The nature of the motion we call heat”, translated in Stephen G. Brush (ed.), Kinetic Theory, 1965.

Daw, C.S., Finney, C.E.A., Tracy, E.R. (2003): “A review of symbolic analysis of experimental data”, Review of Scientific Instruments, Vol. 74, No. 2 (Feb., 2003), pp. 915-930.

Engle, Robert F. (1982): “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation”, Econometrica, Vol. 50, No. 4. (Jul., 1982), pp. 987-1008.

Fama (1965): “The Behavior of Stock Market Prices”, Journal of Business, Vol. 38, pp. 34-105.

Finney, Daw, Green (1998): “Symbolic Time-Series Analysis of Engine Combustion measurements”, SAE paper No. 980624.

Finney, Daw, Nguyen, Hallow (1998): “Symbolic Sequences Statistics for Monitoring Fluidization”, 1998 International Mechanical Congress & Exposition.

Hamilton (1989): “A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle”, Econometrica, 57, pp. 357-384.

Keller, Lauffar, Heinz (2002): “Symbolic Analysis of High-Dimensional Time Series”. International Journal of Bifurcation and Chaos 13, pp. 2657-2668.

Keller, Wittfeld, Hatharina (2004): “Distances of Time Series Components by Means of Symbolic Dynamics”. International Journal of Bifurcation and Chaos 14, pp. 693-704

Keller, Sinn, (2005): “Ordinal Analysis of Time Series”. Physica A 356, pp. 114-120.

Khinchin (1957): Mathematical Foundations of Information Theory. Courier Dover Publications

Kurths, Schwartz, Krampe, Abel (1996): “Measures of Complexity in Signal Analysis”, in Chaotic, Fractal, and Nonlinear Signal Processing. Woodbury, New York 1996. pp. 33-54.

Lo, Andrew W., MacKinlay, A. Craig (1988): “Stock Market Prices do not Follow Random Walks: Evidence from a Simple Specification Test”, The Review of Financial Studies, Vol. 1, No. 1. (Spring, 1988), pp. 41-66.

Mandelbrot (1963): “The Variation of Certain Speculative Prices”. The Journal of Business, Vol. 36, No. 4. (Oct. 1963), pp. 394- 419.

Morse, M, Hedlund, G.A. (1938): “Symbolic Dynamics”, American Journal of Mathematics, Vol. 60, No. 4. (Oct., 1938), pp. 815-866

Peters, Edgard, E. (1994): Fractal Market Analysis, first edition, Wiley Finances Edition.

Peters, Edgard, E. (1996): Chaos and order in the capital markets, second edition, Wiley Finances Edition.

Piccardi, Carlo (2004): “On the Control of Chaotic System via Symbolic time series analysis”, Chaos, Vol. 14, No. 4, pp. 1026- 1034.

Shannon, Claude (1948): “A Mathematical Theory of Communication”, Bell System Technical Journal, 27, pp. 379-423, 623-656.

Singal, V. (2004): Beyond the Random Walk: A Guide to Stock Market Anomalies and Low-Risk Investing. Oxford University Press.

Tiňo, Peter, Schittenkopf, C., Dorffner, G, Dockner, E.J. (2000): “A Symbolic Dynamics Approach to Volatility Prediction”, in Abu-Mostafa, et al. (2000) (1999) MIT.
