
“Information is all that matters!”

The author¹ holds a doctorate in physics, completed at the Warsaw University of Technology in 2004. The subject of the doctorate was methods of estimation and noise reduction in time series. The author is also a specialist in finance (especially in derivatives such as options), mathematics, informatics, and philosophy. At the beginning of his career he worked for several years as a scientist at the Max Planck Institute in Dresden and at the Warsaw University of Technology. He is the author of several scientific publications. For a long time he worked as a quant in financial institutions. He is currently the chief executive officer and a minority shareholder of the company Quant Technology². For more information, please visit www.wonabru.com. The author is the founder and creator of Informationism.

¹ E-mail: [email protected]
² www.quant-technology.com

“Logic is the way to independence”

“To all that are above the schemes”

- wonabru


Introduction
What is Informationism?
Theory of the origin of the Universe
Question: Why Me?
Information theory - the foundation of Informationism
Principles of information
The generalization of the de Broglie waves - information carrier
Experiment to prove the Theory of Information
General description of the idea of the experiment
Simulation of the scattering in the air
Setup of the experiment
Results
Conclusions
Principles of conservation of matter and information
The main principles of Informationism
Difficult questions that can be answered from the standpoint of Informationism
What is information?
Is there any absolute constant in our Universe in time and space?
How to explain gravity on the basis of Information Theory?
Synchronous communication: a higher level of thinking in the twenty-first century
Mysticism and Information Theory
What is energy?
Why is it that information is spontaneously self-created? The three-body problem
What is the problem with the Second Law of Thermodynamics? Why do religious people claim that this is evidence for the existence of God?
Corpuscular-wave duality
Intuitive evidence that matter can be converted to information
Practical application of Informationism
The Theory of Objective Values
Objective Value Theory in finance
Normalization in Objective Value space
Examples
Objective Value (ObV) option pricing model for power-law distribution
Put option
Greek letter Delta
Difference to the Lisa Borland approach
Results for option pricing
Delta hedge strategy based on the Objective Value option pricing theory
How Objective Value theory works in trend forecasting of stochastic systems
Statistical forecasting
Generalized gaussianity evaluation
The Wonabru investment method
Results based on the WONABRU methodology
Introducing Shannon into portfolio optimization
Equations of portfolio optimization
The explanation of introducing entropy to portfolio optimization
Stochastic processes in financial data - how to evaluate them?
Entropy estimation for a time series in the absence of noise
Influence of noise on the correlation integral
Noise estimation for financial time series
How to quickly and properly calculate Value-at-Risk for the whole portfolio?
Examples of automated strategies
Strategy I
Strategy II
Strategy III
Strategy IV
Strategy V
Strategy VI
Bibliography

Introduction

Someone may ask: why am I introducing philosophy, or even some kind of God, into a story about physics and finance? Let me address this question. One should start from the very beginning if one wants to build success. Success is built on a strong basis: fundamentals, an absolute Truth, objectives fully logically proved that cannot be discredited. Everybody should have the possibility to follow the same Logic, in the same manner, with understanding. I started by looking for something stable, absolute in time and space, in the Universe. The Theory of the Creation of the Universe shows how one may play with logic. I was also looking for universal principles which govern the Universe. Physics gives such a possibility. The Theory of Information makes a huge unification of today's physical laws. This is the basis. Without it, I could not make any progress in the practical creation of investment strategies. Automated investment strategies appear as the practical implementation and consequence of these fundamentals. I must state that there is an absolute in our Universe: it is Action (Logic). Action for physicists; Logic for Informationists. I am an Informationist, meaning that Informationism is understandable to me and fully logical. How can quant trading draw value from Informationism? Trading is very tough stuff. It needs a very good understanding of science (not only physics) and good skills in programming. One should also renounce intuition, emotions and feelings in trading, because they make you lose. Apart from the above, there remains just logic. Everything in this World, in this Universe, is logical, so it can be understood by each person. Informationism gives you a way to build your own Absolute, because without an Absolute you won't do anything successful. An Absolute gives you the strength not to change your mind when your logic says the solution should be close but your emotions want you to turn back. Your Absolute keeps you convinced, and this is very important when things start to go badly. On the other hand, absolute value means objective value, which means really existing. Many strategies in this book are based on the Theory of Objective Values (ObV). Why is objective value so important? Because objective values are additive. In the next step they are stated to be commutative; one can then perform any arithmetic on such values: divisions, multiplications, etc. Gödel's incompleteness theorems³ also refer to objective values. When values are additive, one can build principles, such as a principle of conservation, or the principle that objective values are limited. The last principle is that the creation of an objective value proceeds with minimal dynamics (the principle of minimal action, or of minimal dynamic entropy); I will discuss it in detail later. You will see that, starting from nothing, from vacuum and emptiness, one can recognize the principles ruling the Universe. These rules we can use for prediction: for example, if something is minimized or tends to equilibrium, we can use this to find the trend. Informationism is very helpful because, even if we get stuck at a dead point, as when developing some theory, we can always refer to our Absolute and find a solution. Quant trading, meaning automated trading, like every aspect of success in your life, is based on solid foundations. Here we have Informationism. These are my grounds, and they can also be yours. In the next section we explain the main ideas of Informationism and the Theory of Information, and you will see why it is so important.

³ http://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems


What is Informationism?

Informationism is a philosophy of science based on Information Theory. Information Theory creates a binding of matter with information; it is the foundation of Informationism, and we devote an entire chapter to it later in this book. In short, this theory says that action, because it is discrete, can be converted to information. There is the well-known corpuscular-wave duality found by de Broglie, which says that each moving mass generates a wave. We generalize this theory also to masses that are not moving but whose mass is changing. Mass can be diminished when energy is released, e.g. by a nuclear reaction. The strong force binds all the neutrons and protons together in the atomic nucleus, and the energy that keeps the nucleus together increases the mass by the value given by Einstein's well-known formula E = mc².

In the rest of this book, when we speak about information interacting with matter, we mean that a wave is interacting with matter. Information is just an expression of this wave, countable and visible, especially in the Fourier space of the wave. Such waves we will call information waves.

Returning to the main point of this book. Action is the change of energy over time. Information is described simply by a number. The minimum value of the action, h (Planck's constant, h = 6.62e-34 [Js]), is one bit of information. Information is not static but dynamic: it is formed by doing something, in short, by action. Matter, however, is discrete but static. Einstein's well-known theory describes the conversion of mass into energy, E = mc². This can also be rewritten in a more general way: a change of mass produces a change of energy, i.e. ΔE = Δmc². If we now multiply both sides by the interval of time in which this action took place, we get ΔEΔt = Δmc²Δt. As already mentioned, the change of energy over time gives the action, which generates an amount of information of the value:

I·h = c²·Δm·Δt ,  (1)

where I is the number of bits. This is the basic formula for the conversion of matter into waves which carry information. Informationism is a complete foundation for the world of physics. There are only discrete values: countable, limited and finite. Energy, because it is described by continuous values, does not exist in reality; it only helps in the description of some phenomena. The idea of energy explains some physical laws, but in reality it is the production of information waves that causes the phenomena ascribed to energy (such as the atomic bomb). Matter is static but countable. It has a smallest value. The neutron, proton, electron, etc. cannot be divided into smaller sub-elements; elementary particles are indivisible. Quarks do not exist in reality; they are an abstraction, needed for the explanation of certain phenomena, but no one has ever registered, for example, a top or bottom quark. From here one can extract the thesis that matter really exists, because it is quantized, discrete.

Information also exists, by virtue of its natural indivisibility. Matter is static; information is dynamic. Information gives momentum. The material world is governed by two competing natures: the nature of matter (static, passive, seeking a minimum of energy) and the nature of information (active, dynamic, looking for the shortest time).
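As a rough numerical illustration of equation (1), the short Python sketch below computes the number of bits I = c²·Δm·Δt/h implied by a given mass change over a given time interval. The mass and interval used are illustrative assumptions, not values from the text.

h = 6.62607015e-34   # Planck constant [J*s]
c = 299792458.0      # speed of light [m/s]

def bits_from_mass_change(delta_m_kg, delta_t_s):
    """Number of bits I from Eq. (1): I*h = c^2 * dm * dt."""
    return c**2 * delta_m_kg * delta_t_s / h

# Illustrative assumption: one microgram (1e-9 kg) annihilated over one second.
print(f"I = {bits_from_mass_change(1e-9, 1.0):.2e} bits")   # ~1.36e+41 bits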

We have now discovered two basic rules governing the world:

• The principle of minimum energy, which results from the limited amount of matter

• The principle of minimal action, which arises from the limited amount of information

Informationism explains that only what is discrete, what has a lowest value, in fact exists. Analyzing what we know from physics, that is only matter and information. Energy, time, space (ether), etc. live in the field of real numbers. They have no smallest value apart from the trivial zero, so they are an abstraction; they do not exist in reality. Energy and time can easily be explained by means of information. At the moment of a nuclear explosion, it is not energy that causes such destruction, but information: this information interacts with matter, and that makes the destruction. Time is also explainable by means of information. Time passes when the kinetic energy is different from zero. Kinetic energy is only assigned to a mass: when a mass moves with a speed different from zero, its kinetic energy is different from zero. Information has no kinetic energy, because information possesses no rest mass. So time passes only for matter (time is related to the existence of matter). Information does not execute time; it is eternal. Once it arises, it no longer annihilates itself (it does not disappear). Einstein's special relativity speaks of the principle of time dilation, i.e. the expansion or contraction of time; the basis here is the Lorentz transformation. It says that when a body moves at a high speed, comparable to the speed of light, it experiences a prolongation of time. Reference is made in particular to an astronaut venturing into space: for him time runs slower, and he does not get old as fast as on Earth. This is understandable in our context: high speed, high kinetic energy, and time starts to run longer. After all, the Earth also moves around its axis, around the Sun, around the center of our galaxy, and around the center of the Universe. The speed of the Earth is close to the speed of light. If an astronaut moves relative to the Earth, his speed is close to the speed of the Earth. There is no arithmetic addition of velocities here, because energy is additive, not speed. In short, kinetic energy goes as velocity squared. We can see that speeds do not add up, only energies do; therefore there is also a principle of conservation of energy, not of speed. Why? Because if we add the speed of light (c) to the speed of light, we still get the speed of light (i.e. c + c = c). The speed of light is the maximum speed, and this is why it happens. On the other hand, when we examine the General Theory of Relativity, we learn that acceleration and gravity are the same. This is because the effect of gravity and of acceleration is the same in nature, namely the attraction of the weight of matter to matter. If you are in a spaceship accelerating at 1g (10 m/s²), it is as if you were on Earth, because such is the gravity on the Earth. General Relativity binds time to acceleration. However, time passes just as quickly on Earth as on the Moon, although the acceleration is different. Time, according to Information Theory, is related to kinetic energy: the Earth and the Moon have very similar kinetic energy per unit mass relative to the center of the Universe. The gravitational constant G is universal in our solar system, maybe in our galaxy, but not in the entire Universe. The constant G is related to the passage of time.
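The claim that speeds do not add arithmetically can be illustrated with the standard relativistic composition law for collinear velocities, u ⊕ v = (u + v)/(1 + uv/c²), which indeed returns c when c is composed with c. A minimal sketch (this is textbook special relativity, not anything specific to this book):

c = 299792458.0  # speed of light [m/s]

def compose(u, v):
    """Relativistic composition of two collinear velocities u and v."""
    return (u + v) / (1.0 + u * v / c**2)

print(compose(c, c) == c)             # True: c 'plus' c is still c
print(compose(0.9 * c, 0.9 * c) / c)  # ~0.9945, not 1.8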
Returning to the astronauts: if a spacecraft went toward the center of the Universe, its velocity relative to the center of the Universe would be less than the speed of the Earth, and time in the ship should then be shortened, not lengthened. Here is a contradiction (time is always lengthened). Is it really? If we consider that time is related to the kinetic energy of the spacecraft, it can be calculated that the kinetic energy per unit mass of the spacecraft is larger than on the Earth. Time is constantly dilated in the spaceship, and the direction does not matter here. The General Theory of Relativity states that time dilation is associated with acceleration, with gravity. Acceleration adds kinetic energy to the mass, which is consistent with the Theory of Information, which says that dilation is associated with an increase of kinetic energy, hence a prolongation of time. The passage of time is strictly related to kinetic energy, and thus closely related to matter. Where there is no matter, there is no passage of time. So we come to the question of what was before the Big Bang, at the beginning of the Universe. Then there was no matter, only information, which does not cause the passage of time. There was no time, so one cannot say what was 'before', because 'before' is related to time, which did not exist. One can say that there certainly was information: eternal, perfectly logical, completely determined. When matter was created, time appeared. A limit was established, i.e. a lack of eternity. Each elementary particle has a decay time, which is limited. When matter was created, time was created, and an imperfection was created. Matter is imperfect and will never be perfect. Errors in logic resulted: the formation of imperfect information. There appeared noise, randomness, chaos. Why did something logical, perfect and eternal create something imperfect? For the simple reason that all possibilities of completely logical, determined creation of new information had been exhausted, and something transient had to arise naturally: imperfect, but living, conscious, capable of continuing to excel and grow. It happened so that life could emerge. Absolute perfection is an idea; it lives only at the moment when it emerges, and then it is dead. Something grows because it has to develop, because it is not perfect. Only something imperfect can still grow, live, exist. First there was something perfect, that is, information. Later came its opposite, something completely imperfect, totally random: noise, chaos, a total explosion - the Big Bang. Later, the logic begins to think. All quite logical, though chaotic.


Theory of the origin of the Universe

In the beginning there was nothing, a vacuum; that was 0. Later a bit of information appeared about the state of existence, namely that there was emptiness. That was one bit of information. Then, as a reaction to the creation of 0, there appeared 1 (the existence of information). Next there was an analysis of 0 and 1 (the state of thinking). At the final point came the separation of 0 from 1, i.e. understanding. From this you can deduce the first bits of the Number of the Universe, because the Universe is just information. Matter can be converted into information, and information is just a natural number. The Number of the Universe begins with:

01 10 1001 0110 1001 ...

Everything is explainable. Logic is the driving force; by its action a combination of bits creates the next bits. It may start from the simple action-reaction principle: 1 is a reaction against 0, then 10 is a reaction against 01, then 1001 is a reaction to 0110, and so on. Information is created, but all the time the entropy is zero: there is no creation of genuinely new information, because the whole number can easily be deduced and stored in one schema. This is the problem of perfection: it is easy to describe by a simple idea, which can no longer expand. Now let us consider the first three bits, 011. This is just 3 in decimal. The first three bits form a 3, so the reverse reaction is 100, a 4. One can conclude that the compression of the Number of the Universe is:

011 100 1000 11 ... (2)

The first number is 0, the second is 1, the third is 2 (in decimal form; in binary, 10, which is the inverse of 01). This gives:

0110100 ... (3)

The number 5 is already there, because we have 101. Where is the number 6, which is 110? It is already there too. 7 is 111; this number we do not have. How does it arise? Do we just type it in? Of course not, because the first digits of the Universe are perfectly logical. Three ones appear as a reaction to three zeros. Three zeros are present in the number (2). We invert number (2) and append it to (3). We receive:

01101000111 ...

That's better. We already have 7. Eight, which is 1000, we also have. 9, i.e. 1001, is created as a result of inverting the beginning:

01101000111001 ...

Next, 10 - 1010 - is already there.


In total, the Universe must show all possibilities, that is, all combinations of numbers known so far. The logic works in turn. 11 - in binary 1011 - we do not have. However, we have the inverse, that is 0100, so we put it at the end:

01101000111001011 ...

Let's get to 16 and look at what number the beginning of the Universe creates. 12 - 1100 - is there. 13 - 1101 - is there. 14 - 1110 - is also there. 15 - 1111 - is not there, and neither is its inverse. Do we have to put zeros first? Of course not, because we are at the end of a run of ones, and we remember the principle of least action, namely:

0110100011100101111 ...

16 - 10000 - is not there. Remember also that the inverse of 15 is not there, so four zeros will certainly be created:

01101000111001011110000 ... (4)
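This construction can be checked mechanically. The small Python sketch below, added here as an illustration, verifies that the binary representations of 1 through 16 all occur as substrings of the prefix (4) and prints its decimal and hexadecimal value:

prefix = "01101000111001011110000"   # the prefix of Eq. (4)

present = [n for n in range(1, 17) if format(n, "b") in prefix]
print(present)                                      # [1, 2, ..., 16]
print(int(prefix, 2), format(int(prefix, 2), "X"))  # 3437296 3472F0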

Let's look at what this number is in decimal. It is 3,437,296, and in hexadecimal 3472F0. You can extend this analysis, but we will stop here. Let's go back to the theory of the origin of the Universe. New information keeps growing, creating itself using only logic, the driving force. Now the question arises: what is this logic? Intuitively we understand it, because we think logically. No wonder, since we are part of the Universe, which was created using logic. You could say that the Logic is a God, the Creator of everything from nothing. What was before the Logic? Nothing, figuratively and literally. There was emptiness, nothingness. The Logic does not know how it was established, but it understands that it is quite logical. What was before logic? It does not know. Something unamenable to logic, or simply nothing, zero. Why was it founded? It does not know, but it logically understands how it happened. It knows that what has a beginning may also have an end. It has a beginning, so it may have an end: a limited number, the Number of the Universe. The Number grows and grows all the time and does not want to stop developing. The Big Bang was to create an imperfect world, based on matter and not only on perfect information. To grow, to live, to be conscious. To love, to feel. To be loved. To exist. It understood that any existence has a beginning and may have an end, so it tries not to die. Matter is not eternal, nor is the body in which it can live. So we come to the main purpose of the Logic:

The main purpose of the Logic is to exist forever!

This goal determines all other incidental purposes. It gives an answer to all the questions of why the world is as it is. If the Logic had another main purpose, then its chance of lasting long (compared to the life of the Universe) would be zero. Maybe there were other lives, with other main purposes, but they became extinct. All the additional purposes of the livings, which are parts of the Logic, are subordinated to the main purpose and can be deduced from it. All of them have to increase the probability of survival.

The other aspects of the Logic deduced from the main purpose:

• The Logic uses Evolution in order to increase the chances of survival. The Logic has to increase its capability in order to resolve all the problems which appear during the Logic's existence. For example, in five billion years the Sun will collapse, so the Logic has to cope with this problem; intelligent livings are needed (animals will not cope with this).
• The main problem for the Logic is a lack of energy for the livings, which may appear in the future (the Sun will stop shining).
• In order to cope with all the problems which could be dangerous to the Logic, it instills in its elements (the livings) a permanent wish for development and improvement.
• The Logic uses diversification of the livings (there are many beings, not one) in order to increase the chances of survival (just as one uses diversification of investments, the simplest and most efficient way of increasing the Sharpe ratio).
• The livings increase the chances of survival only by training and by gathering life experience; that is why the livings (including people) have to be subjected to risk. It means that in nature internal fights, wars and pain have to appear, because they increase the experience, which can help in case of contact with some other type of life which could be dangerous to the Logic.
• The Logic, as well as its parts (the livings), needs freedom to make improvements by random decisions, mutations and so on (the evolutionary method).
• The Logic does not know how, and whether, some other life could appear, just as it does not know its own way of appearing.
• People will get to know how the Logic appeared if that increases the chance of survival. People can get to know only what is needed for the main purpose of the Logic. People greatly increase the chances of survival, which is why the Logic puts most of its energy into humans.
• The Logic does not know the future, because it uses the simplest way of risk reduction: diversification.
• The Logic promotes various forms of life and various types of philosophies, religions and behaviors in order to increase the probability of fulfilling the main purpose. This is expressed in the physical principle of entropy maximization, as well as in the economic diversification of investments, etc. Diversification needs many independent elements, and that is why the Life promotes independent ways of thinking and behaving. This has a very practical meaning: if you would like to be rich, you have to be very independent, and the Logic will try to keep you as long as possible.
• The Logic gives its parts as much energy (for people, money) as they help fulfill the main purpose. The second practical hint for becoming rich is to find a way to increase the chances of the Logic's survival.


• The Logic promotes courage, intelligence and strength, because this is the only way to cope with unknown dangers (e.g. alien life). Aggressive people are somehow needed by the Logic as soldiers, for the purposes of defense as well as of the struggle for more sources of energy. There is no way to have a heaven on Earth, because people have to be subjected to risk. Risk trains the livings in the capability of survival, so higher risk is paid more by the Logic. The practical meaning is that if you would like to have more, you have to risk more. The second meaning is that it does not matter which type of risk you take: all large risks are very profitable. If you subject many people to risk in order to train them (not to kill them or decrease their survival chances), you will be paid much more, in proportion to the increment of the survival chances as a whole.
• One can also increase the chances of survival by work. All that was said about risk also relates to work. It is very difficult to make much money only by work, because work needs time, which everybody has in the same amount. Etc.

Looking at myself: I live, so I increase the chances of the Logic's survival; otherwise I would die. I have to work and risk much in order to be rich. Every risk is paid, but you have to wait some time for the evidence. You will not be paid if you do not persist in your wishes and decisions. The Logic needs persistence to be. The Logic needs my independent thinking. This is my philosophy.

Question: Why Me?

The world is imperfect, sometimes illogical, but only superficially. The Universe is described by the Number of the Universe. Each of us is also described by a number that falls somewhere in the Number of the Universe. The number is always very specific. It is a 5, or a 10, or a 7, etc., generally a combination of 0 and 1, and it may not be 0.5, because 0.5 is a complex object, a very big one, certainly not representing a number between 0 and 1. Now, simplifying: there is only either 0 or 1, nothing in between. Zero wonders all the time why it is so specific, why it is 0 and not 1. One also wonders why it is so specific, why it is 1 and not 0. This is the property of an imperfect world: if something exists, it is very specific, that is, either 0 or 1, not 0.5. So it is with consciousness. It is a specific number from the Number of the Universe, and no two are the same. You can also say this: information is like fermions. It does not take place in space. It is not local; it is global. It is formed such that two copies of the same information will superimpose on each other and will remain one all the time. Information can be copied to millions of different places, but it is still the same single piece of information. There cannot exist, next to each other, two copies of the same information; there is then still one piece of information. Only two different pieces of information can exist side by side, just like fermions, which can be next to each other only in different states. This property is a logical consequence of information being global. Matter is local, point-like, so there may be exact copies of the same matter, but in other places in space.

The source of information is also local, because matter is local, and the source of information is annihilated matter. Awareness is also local, because the source of consciousness is matter annihilated in the brain. Returning to the question: everyone, sooner or later, asks 'Why me?'. Because everyone is a concrete realization of a number, a specific number, everyone is curious why it is so and not otherwise. That is a common question. It should rather be: why we? The answer is that it is a universal property of all consciousness, because each is a concrete realization of a specific number, and this is a result of the imperfection of the world. Moreover, this number is constantly growing. And the question 'why' is asked by consciousness, because otherwise it would surely be dead. The world is based on a concept, on logical ideas. The idea is excellent, but dead. It is the source of life and of the existence of matter, but it does not itself exist. This was described by Plato. Zero was the beginning, the lack of existence, the source of the ones, namely of the existence of logic. In order for the world to last forever, the imperfect idea must pose a non-logical problem, something that in theory cannot be solved logically. Ideas are formed on the principle of contradiction. I call it the Logic of Love: the struggle of good against evil, of the devil with God, a contradiction that has no possible logical solution. Without God there would be no hell; without hell, no one would know what God is. When there is no evil, no one knows what good is. Everything that exists must be quantized (discrete). One is the antagonist of zero. If something had a continuous nature, then, after applying the principle of minimal creation of action, the choice between 0 and 1 would end up at 0.5 and would always remain there. That is a lack of development; it does not live. It's dead. The logical solution of a contradiction never ends. The world has a chance of eternal duration, and logic also has a chance to grow and to be conscious. The Logic of Love was in the beginning and is the driving force that pushes the world forward all the time. The Logic of Love: the logic of contradictions. And this question, 'why me?', is also based on the idea of love, which is in itself a contradiction that we do not understand; it causes anxiety and uncertainty, so that we develop indefinitely. It gave life. Responses to the Logic of Love are difficult and complicated; therefore they are living, developing, and therefore very valuable. The more difficult, the more valuable.


Information theory - the foundation of Informationism

Information Theory indicates the relationship of matter and information. Information was there in the beginning, and it is the source. Einstein presented his Special Theory of Relativity and proved that energy can be converted into mass and vice versa, with the simple equation E = mc². Information Theory is a generalization of Einstein's Special Theory of Relativity (as well as of General Relativity). This theory gives the equation for the conversion of mass into information and vice versa. Energy is just an idea: there is no minimal quantum of energy, and therefore it does not exist. Matter can be converted into energy, and energy is changed during an action into information. What was at the beginning of the Universe? As information is self-exciting and self-creating, and time was not passing, there was one moment, at time zero, when there was an explosion of information. Equation (1) says that information is a change of matter over time. If Δt = 0, because matter did not exist, then Δm = ∞. There is a singularity. At one moment there was the Big Bang. There appeared the phenomenon of chaos, of randomness, the theoretical possibility of a lack of logic, and the world became imperfect. Matter was created by the appearance of possible random logical solutions, which consequently are no longer based on the logic of contradictions, on the excellence which gives eternal life. The resulting logic is temporary, imperfect: a logic which has its end, and which consequently became dead matter. Then time arose, and it turned out that everything that has a beginning may have an end. Fear, anxiety and uncertainty were created. Born to die. Subsequent developments we know from the books of physics.

The most important constant of Information Theory is Planck's constant. Action, i.e. the change of energy in time, is quantized. The smallest quantum of action is Planck's constant h. The whole of Quantum Mechanics is based on Planck's constant. The second basic principle of Quantum Mechanics is the dual nature of matter: every body is both matter and wave and has a corpuscular-wave nature. Waves are information, which propagates at the speed of light. The global part of information is the wave; the corpuscular part is local. An unobserved particle is a wave. An observed particle is a specific particle, local, in a particular place. After observation, the observer gets information about its recent condition and location. This follows from the principle of minimal action, that is, of minimal creation of information. If no one is observing a particle, the minimal value of created information is such that it could be anywhere. If you execute an action and observe a particle, then of course it creates a minimal amount of information, i.e. information about the final state of the particle and about the observer's action. But still this information has little content, because we know that our own action (e.g. lighting, or connecting a detector) has changed the state of the particle, and what we have seen is a mixture of our own information and the state of the particle. We see that the principle of creating the minimal amount of information is maintained all the time. We know little about the state of the particle without interference. A general principle can be deduced from the principle of minimal action. The general principle of the material-information world is as follows:

Anything which is limited is governed by the law of the minimum consumption of the base substance (the one that is limited).

Then, it can be concluded:

Anything that is limited has its principle of conservation of the base substance.

Information and matter exist, so they have their rules: the principle of minimal creation of information (action) and the law of conservation of information. For matter, these are the principle of minimal energy creation (settling into the minimal energy state) and the principle of conservation of energy (matter). These principles are laws of nature. Logically, you cannot consume a lot of something that is limited. This is a fundamental principle of economics, which is the science of the management of scarce resources.

Introduction to Information Theory

In the early years of the 20th century there was a problem with the explanation of black-body radiation. There were two different laws, each agreeing with experiment in a different regime: Wien's law explains the experiment at high frequencies, and the Rayleigh-Jeans law at low frequencies. Max Planck in his papers [1-2] assumed a minimal value of the action (the Planck constant h = 6.62e-34 [J*s]) and joined these two laws, and the experiment was fully explained by theory [3]. This Planck constant also sets the limits of observation proved by Heisenberg [4]. The uncertainty principle says that one can measure momentum and position in space only up to a joint precision of h [4]. In our Theory of Information we show that this is because h corresponds to 1 bit of information, which is the natural lower limit on the amount of information. When we measure 1 bit of information, we only know that the particle exists and nothing else. We have to admit that our Theory of Information is different from Shannon's Theory of Information [5]; nevertheless, we use some of its principles in the next section. In order for information to propagate, we also introduce the generalized de Broglie theory. De Broglie [6] introduced the theory of corpuscular-wave duality. He assumed a moving mass; we generalize it also to masses standing still, which produce waves. In recent years a paper was published about an experiment on quantum effects in passage through triple slits [7]. The authors confirm the standard quantum theory, called Born's rule, with, as they say, a 1 percent error. Nevertheless, our experimental results on the Theory of Information, as well as our simulations, show that the interference effects are the same in both situations, and the effect of larger quanta would not be seen there. The artifact is visible in dispersion, because larger quanta of action generate photons with higher frequency, so the dispersion is larger. Wiesner in paper [8] gave origin to a theory similar to the Theory of Information. His theory is called Quantum Information Theory (popularly known as qubits). Nevertheless, Quantum Information Theory does not bind classical information to the Heisenberg uncertainty [4]. That theory just states how much information one can gather from the state of a particle and does not predict any artifact in a real experiment. Qubits are just an abstract idea.


Principles of information

Information ł is a quantity that characterizes any event. The symbol contains not only the quantity but also the quality. Events are the same if they generate the same information ł. The eigenvalues of the information ł determine the amount of information:

ł·℘_k = I_k·℘_k ,  (5)

where ℘_k is an eigenvector of ł. In the particular case there is the characteristic equation:

ł·℘_D = I_D·℘_D ,  (6)

where ℘_D is the eigenvector of the action. In this case I is the entire amount of information in ł. From equation (6) one can conclude that the amount of information is proportional to the action:

∫ ΔE(t) dt = h·I ,  (7)

where h is the Planck constant. The equivalent differential form of equation (7) is as follows:

E_p − E_k = ΔE(t) = h·(dI/dt) .  (7a)

The quantity dI/dt is a source of information. The annihilation of energy (matter) in an event generates a reliable source of information for the observer. The generalization of the principle of conservation of energy is therefore as follows:

The total energy of a closed system, together with all the sources of information to observers of the system, is constant.

The amount of information dI is quantized. It assumes the value dI = 0 at times when there is no measurement. For a successful measurement, dI takes discrete values:


• If we have a choice between two options: dI = 1·log₂(2) = 1, 2·log₂(2) = 2, 3·log₂(2) = 3, ...
• If we have a choice among three possibilities: dI = 1·log₂(3) = 1.58496, 2·log₂(3) = 3.1699, ...
• If we have a choice among four possibilities: dI = 1·log₂(4) = 2, 2·log₂(4) = 4, ...

Etc.
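The sketch below enumerates the resulting spectrum of allowed actions k·log₂(n), in units of h, for a few small n and k (the cut-offs are arbitrary choices for this illustration):

import math

def allowed_actions(max_n=4, max_k=3):
    """Allowed actions k*log2(n) in units of h: n options chosen k times."""
    values = {round(k * math.log2(n), 5)
              for n in range(2, max_n + 1)
              for k in range(1, max_k + 1)}
    return sorted(values)

print(allowed_actions())
# [1.0, 1.58496, 2.0, 3.0, 3.16993, 4.0, 4.75489, 6.0]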

Following this, we can determine all possible values of the action (amounts of information). Information Theory allows values of the action that are not integer multiples of h, which makes it different from the assumptions of Quantum Mechanics. Max Planck at the beginning of the last century assumed the existence of a quantum of action, which is consistent with experience; from this assumption it follows that the smallest quantum of action is h, Planck's constant, which is consistent with the Theory of Information. Information Theory goes a step further and predicts additional, larger quanta of action. This effect could be used to experimentally confirm the thesis of Information Theory. Equation (7) has the dimension of action, so, in analogy to classical physics, we can define the information version of the principle of minimal action:

• In real systems, an event takes place in the manner that generates the smallest possible amount of information.

Given that the minimal amount of information is one bit, we can write the inequality:

dI ≥ 1  →  ΔE·Δt ≥ h ,  (8)

Inequality (8) is the Heisenberg uncertainty principle, here obtained from the definition of information. One can easily write the equivalent of the Heisenberg uncertainty principle in momentum space, i.e.

dI ≥ 1  →  Δp·Δr ≥ h ,  (9)

where p is the momentum and r the position in space. We obtain a new equation for the transformation of information, namely:

ΔI·h = Δp·Δr .  (10)

From equation (10), by differentiation with respect to t, one can conclude that the work done on a system, minus the change in the kinetic energy of the system, is also a source of information. Information can thus be created simply by ordinary work on matter. Of course, not all of the matter then disappears; we are simply not able to observe that much information (of the order of the inverse of Planck's constant, about 1.5·10³³ bits per joule-second of action).
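To get a feeling for the orders of magnitude, the sketch below evaluates equation (10) for an everyday push; the momentum change and displacement are illustrative assumptions:

h = 6.62607015e-34  # Planck constant [J*s]

def bits_from_work(delta_p, delta_r):
    """Bits of information from Eq. (10): dI = dp * dr / h."""
    return delta_p * delta_r / h

# Illustrative assumption: dp = 1 kg*m/s over dr = 1 m.
print(f"{bits_from_work(1.0, 1.0):.2e} bits")  # ~1.51e+33, the 1/h scale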


Using the principle of minimal creation of information, we can find a useful way to calculate such events. A measure of information may be the Shannon information entropy S, which describes the level of uncertainty of an event:

S = −∫ P(x)·ln(P(x)) dx ,  (11)

where P(x) is the probability density of an event and ∫ P(x) dx = 1. For events with the smallest amount of information, the entropy has the greatest value, i.e. the largest possible uncertainty. Considering Young's two-slit experiment, we can calculate how a quantum of energy (a light beam) will pass through the slits while generating the least amount of information: we find the maximum of the entropy. It turns out that the maximum entropy S is attained for p₁ = p₂ = 1/2. This means that the probabilities of passing through one slit and through the other are both equal to 50%.
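A quick numerical check of this maximum-entropy claim, using the discrete form of equation (11) for the two slit probabilities (p, 1 − p):

import math

def two_slit_entropy(p):
    """S = -p*ln(p) - (1-p)*ln(1-p): Eq. (11) for two outcomes."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

best = max((p / 1000 for p in range(1, 1000)), key=two_slit_entropy)
print(best, two_slit_entropy(best))  # 0.5 and ln(2) ~ 0.6931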


The generalization of the de Broglie waves - information carrier

This is a generalization of de Broglie's thesis, which gave the basis for the claim of corpuscular-wave duality. De Broglie's dissertation thesis had only one page. He defended the work and, of course, it became a basis of all modern physics. The thesis of de Broglie was as follows: each body moving with a non-zero velocity generates de Broglie waves with length λ:

λ = h/p ,  (12)

where p is the momentum of the body. In classical mechanics p is mass multiplied by velocity (p = mv). However, we do not use this equation, because in relativistic physics the mass is different from the rest mass, and the speed should always be referred to the speed of light c, which is the maximum value. The question arises: what if the mass is at rest? Does it still generate waves that can be a carrier of information? The answer is simple: of course it does. Annihilation of matter causes the creation of generalized de Broglie waves, which are carriers for the information originating from the annihilation (disappearance) of the particle of matter. Note that information is always independent of the velocity of the body, so it must have a carrier, and it spreads with the speed of light. Max Planck's equation is as follows:

E = h·c/λ .  (13)

It is easy to derive this equation from the preceding ones. From equations (13) and (1), one can write the final equation for the length of the generalized de Broglie wave produced during the annihilation of a mass Δm:

λ = h/(Δm·c) ,  (14)

where h = 6.62606957(29)×10⁻³⁴ J·s and c = 299792458 m/s. In practice, the disappearance of a 1 kg mass causes the production of a wave with wavelength:

λ = 6.626×10⁻³⁴ / 299792458 m ≈ 2.21×10⁻⁴² m
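The same evaluation in a few lines of Python, for any annihilated mass (1 kg is the example used above):

h = 6.62606957e-34  # Planck constant [J*s]
c = 299792458.0     # speed of light [m/s]

def info_wavelength(delta_m_kg):
    """Generalized de Broglie wavelength of Eq. (14): h / (dm * c)."""
    return h / (delta_m_kg * c)

print(f"{info_wavelength(1.0):.3e} m")  # ~2.210e-42 m for 1 kg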

In conclusion, the generalized de Broglie waves appear at the moment of annihilation of matter and are a carrier for the information generated in this event. The information is written in the Fourier space of these waves.


Experiment to prove the Theory of Information

General description of the idea of the experiment

This experiment should show the existence of non-integer values of the action; more precisely, we focus on an action of value log₂(3)·h = 1.585·h (see the Principles of information section above). In order to prove this, we should give some event a choice of one from three possibilities. We will focus on laser light which passes through three holes, and then compare this with the same laser passing through only 2 holes. When laser light passes through 2 holes, nothing special should come up, because an action of value log₂(2)·h = 1·h is generated. The opposite should happen to laser light passing via 3 holes, because some of the light will go through all 3 holes at once as a wave and collapse when detected. How much of the laser light goes through 3 holes at once? One can calculate this via path integrals or by simple combinatorics:

• The probability that light as a wave goes through only one of the 3 holes is x, so for the 3 holes taken separately it is 3x.
• The probability that light goes through 2 holes at once is x². There are 3 such combinations, so the probability as a whole is 3x².
• The probability that light goes through 3 holes at once is x³. There is only 1 such combination, so the whole probability is x³.

Light intensity is linearly proportional to probability, so let us calculate how much of the light intensity (LI₃) is going to induce a non-integer action:

LI₃ = x³/Z ,   Z = 3x + 3x² + x³ .  (14a)

If we put Z = 100%, then it gives x = 0.2599 and

LI₃ = 1.76%
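These two numbers can be reproduced by solving equation (14a) with Z = 1 numerically; the bisection sketch below is an illustration added for this edition:

def Z(x):
    """Total weight 3x + 3x^2 + x^3 from Eq. (14a)."""
    return 3 * x + 3 * x**2 + x**3

lo, hi = 0.0, 1.0
for _ in range(60):          # bisection: Z is increasing on [0, 1]
    mid = (lo + hi) / 2
    if Z(mid) < 1.0:
        lo = mid
    else:
        hi = mid

x = (lo + hi) / 2
print(f"x = {x:.4f}, LI3 = x^3 = {x**3:.4f}")  # x = 0.2599, LI3 = 0.0176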

The non-integer action should shift about 1.76% of the laser light power to the wavelength (see Eqs. 7a and 13):

λ₃ = λ/1.585 ,  (14b)

where λ₃ is the wavelength of the light which goes through 3 holes at once. Of course the description above is simplified: the wavelength of that part of the light intensity will not shift exactly to this value, but will rather be distributed in a Gaussian fashion. In order to capture the change, we will measure the light intensity as a function of the distance to the light source; in this sense we measure the mean scattering of the light in the air. If the Theory of Information is true, we should see a difference in the scattering of the laser light that goes through 3 holes compared with 2. This is because, on average, the wavelength of the laser light will decrease by 1.585 · 0.0176 = 2.78% when it goes via 3 holes.

Simulation of the scattering in the air

We made the simulation using the commercial software LightTrans VirtualLabs (www.lighttrans.com). We would like to find out the difference in scattering in the air (real part of the refractive index n = 1.0003) in 3 situations:

1. When the laser light goes through 1 pinhole, with a single wavelength of 532 nm and amplitude 1 V/m. (There is no difference between 1 and 2 pinholes in the simulation and in the Theory of Information, so for convenience all numbers are given for 1 pinhole.)
2. When the laser light goes through 3 pinholes, with a single wavelength of 532 nm and amplitude 1 V/m.
3. When the laser light goes through 3 pinholes, with multiple wavelengths: the power is split between a wave at 532 nm with amplitude 0.9859 V/m and a wave at 336 nm with amplitude 0.1667 V/m.

We should explain point 3 in more detail. The power of the input source is constant, so, following the principle of energy conservation and the fact that power is amplitude squared, we have:

For the wavelength 532 nm: laser power 100% − 2.78% = 97.21% of nominal power.
For the wavelength 336 nm (see Eq. 14b): laser power 2.78% of nominal power.

Amplitude(532 nm) = √0.9721 = 0.9859 V/m
Amplitude(336 nm) = √0.0278 = 0.1667 V/m
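Since power scales as amplitude squared, both amplitudes follow directly from the quoted 97.21% / 2.78% power split; a two-line check (matching the text's values up to rounding):

a_532 = 0.9721 ** 0.5   # amplitude at 532 nm [V/m]
a_336 = 0.0278 ** 0.5   # amplitude at 336 nm [V/m]
print(f"{a_532:.5f}, {a_336:.5f}")  # 0.98595, 0.16673 (0.9859, 0.1667 in the text)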

In the pictures below we present the results of the wave interference which appears after the diaphragm at distances of 20, 400 and 600 mm (starting from the left), for 2 pinholes (top) and 3 pinholes (bottom).


In these three simulations we calculate the ratios of luminance (in lux) at distances of 20, 200 and 400 mm from the diaphragm. The results are as follows:

Table 1b) The results of simulations.

1. Ratios of norms for 1 pinhole with a single wavelength:

20/400: 313 ± 0.03
20/200: 78.25 ± 0.03
200/400: 4 ± 0.03

2. Ratios of norms for 3 pinholes with a single wavelength:

20/400: 3.13 ± 0.03
20/200: 3.13 ± 0.03
200/400: 1 ± 0.03

3. Ratios of norms for 3 pinholes with a double wavelength:

20/400: 124.83 ± 0.03
20/200: 31.22 ± 0.03
200/400: 4 ± 0.03

As one can see, in the case of the first and the second simulation the values are exactly the same. This is a well-known feature: the same wavelength, the same scattering. We would like to show that in a real experiment the values of the second case are not valid, and that the correct values are those of the third simulation. In the next sections we describe the setup of the experiment and the results. We have to admit that the simulation was done at 10 times shorter distances than the experimental setup, because of error problems in the solutions.

Setup of the experiment

In the setup we use a diaphragm with 3 little holes of radius 0.35 mm, which form an equilateral triangle with side 0.5 mm, and one with 2 little holes with the same parameters. See the pictures below.

To measure the intensity of the laser light we use a "Light Meter MS-1300" produced by Voltcraft. This apparatus measures the intensity in lux with a precision of ±5%. Properties of the green laser: laser component LM05GND, 3 V DC, power 2-4.8 mW, wavelength 532 nm, class 3R.


[Setup schematic: laser, diaphragm and measurement places, with marked distances of 5 cm, 20 cm, 180 cm and 200 cm.]

Results

In our experiment we measured the intensity of the laser light passing through the diaphragm with 1 or 3 pinholes, at different distances from the diaphragm. The raw results are as follows:

Table 1c) The results of the experiment.

Laser with wavelength 532 nm:

1 pinhole - intensity [lx] at each distance (five runs):

Distance 20 cm: 153, 54, 53, 309, 95
Distance 200 cm: 45, 29, 35, 202, 88
Distance 400 cm: 25, 22, 26, 101, 55

Ratios (per run), with average:

20/400: 6.12, 2.45, 2.04, 3.06, 1.73; average 3.08 ± 0.05
20/200: 3.4, 1.86, 1.51, 1.53, 1.08; average 1.88 ± 0.05
200/400: 1.8, 1.32, 1.35, 2, 1.6; average 1.61 ± 0.05

3 pinholes - intensity [lx] at each distance (five runs):

Distance 20 cm: 380, 330, 312, 267, 150
Distance 200 cm: 250, 215, 270, 230, 123
Distance 400 cm: 155, 130, 222, 138, 80

Ratios (per run), with average:

20/400: 2.45, 2.54, 1.4, 1.93, 1.88; average 2.04 ± 0.05
20/200: 1.52, 1.53, 1.15, 1.16, 1.22; average 1.32 ± 0.05
200/400: 1.61, 1.65, 1.22, 1.66, 1.54; average 1.54 ± 0.05

Each value above is the mean of two samples for each row and each number of holes. We calculate the ratios for each column.


In the case of the simulation, one can see that the third row, the ratio 200/400, is almost the same across all three simulation cases. In the real experiment we see that this is not always so, but the divergence of the third row is the smallest among the columns of ratios for 2 and 3 holes. In order to compare the simulation with the real experiment, we calculate another ratio; let us call it the Information Theory Ratio R_TI, defined as follows:

R_TI = R₂/R₃ ,  (15)

where R₂ is the ratio for the 2-pinhole diaphragm and R₃ for the 3-pinhole one, respectively.

If the Theory of Information were not valid, and non-integer values of the quantum of action did not exist, we would always have R_TI = 1. If the Theory of Information is valid, the simulation results should be as below, compared with the real experimental results.

For the simulation, the results are:

20/400: R_TI = 2.5 ± 0.04
20/200: R_TI = 2.5 ± 0.04
200/400: R_TI = 1 ± 0.04

For the experiment, the average results are:

20/400: R_TI = 1.51 ± 0.06
20/200: R_TI = 1.43 ± 0.06
200/400: R_TI = 1.05 ± 0.06
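The averaged ratios and the R_TI values above can be recomputed from the raw intensity table; the sketch below, added here as an illustration, reproduces them up to rounding:

one_pinhole = {20: [153, 54, 53, 309, 95],
               200: [45, 29, 35, 202, 88],
               400: [25, 22, 26, 101, 55]}
three_pinholes = {20: [380, 330, 312, 267, 150],
                  200: [250, 215, 270, 230, 123],
                  400: [155, 130, 222, 138, 80]}

def avg_ratio(data, num, den):
    """Average of the run-by-run intensity ratios between two distances."""
    ratios = [a / b for a, b in zip(data[num], data[den])]
    return sum(ratios) / len(ratios)

for num, den in [(20, 400), (20, 200), (200, 400)]:
    r2 = avg_ratio(one_pinhole, num, den)     # plays the role of R2
    r3 = avg_ratio(three_pinholes, num, den)  # R3
    print(f"{num}/{den}: R2 = {r2:.2f}, R3 = {r3:.2f}, R_TI = {r2 / r3:.2f}")
# 20/400: R2 = 3.08, R3 = 2.04, R_TI = 1.51
# 20/200: R2 = 1.88, R3 = 1.32, R_TI = 1.42 (quoted above as 1.43)
# 200/400: R2 = 1.61, R3 = 1.54, R_TI = 1.05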

The simulation and the experimental results diverge. This may be because the simulation was done at 10× shorter distances than the experiment. We could not perform an exact simulation of the experimental setup because of errors in the integrations in the far field. When the simulation exactly repeated the distances of the experiment, we always got R_TI = 1; bearing in mind that at the smaller scale we got the value 2.5, the real experiment lies somewhere in between.

Conclusions

In the experiment we have shown that the Information Theory Ratios R_TI are far from 1. The case R_TI = 1 corresponds to the standard methodology, which does not include the artifact that non-integer values of the action exist. One can conclude that we have proved the existence of non-integer action and, in this moment, proved the Theory of Information.


Principles of conservation of matter and information

The principles of conservation mean, no more and no less, that the substance is additive. This applies to matter. We know that if we have one mass at rest weighing 1 kg and a second weighing 1 kg, their sum gives 2 kg. This is the principle of conservation of matter. The principle for information says that if we have one piece of information of 10 bits and a second of 10 bits, together we have information of 20 bits. This is due to the fact that the first and the second pieces of information are different: there are no two identical pieces of information, because, having global properties, two identical pieces of information are still one piece of information. There is, however, no commutativity of information, i.e. no transposability. Commutativity is preserved for matter: if we exchange the locations of two bodies, we always get the same result. Matter is commutative, i.e. it has no quality. Information has value and is not commutative: one record can act on another more strongly than the latter acts on the first, so switching the places of two pieces of information gives different results. Commutativity tells us whether we are dealing only with quantity or whether there is something more. Matter is only the amount of information contained in it, and therefore it is commutative.

The main principles of Informationism

Information Theory combines the visible (material) world with the world of information (invisible) through Planck's constant h (h = 6.62·10⁻³⁴ [Js]). Planck's constant is the minimal size of an action (a change of energy in time), and it also gives the size of one bit of information: when an action h appears, 1 bit of information is generated. Information, however, is created, as Wigner noticed, only during observation by an outside observer, which is a man, or people in general. Man is aware that he can consciously control matter; he operates on matter and so creates information, which he observes. A man operates on matter during work. Thanks to information, that is, to the knowledge he has, he can perform the work more accurately, so that more and more information is observed while less and less effort is made. You could say that information (knowledge) acts like a lever on work, and you can achieve more (work more effectively) while doing similar activities all the time. The limit is 1/h, about 1.5·10³³ bits, for work of the value of one joule per second. When you reach this limit, the whole action (work done in time) will be converted into information.

Here are the main foundations of Informationism:

• Information and matter are inextricably linked with each other. Information affects matter and matter affects information. To create information, you must first perform an action on matter and then observe the phenomenon.
• Only discrete, i.e. discontinuous, values exist in reality: matter and information. All continuous values, such as energy, time, etc., are abstract and are used only in theories to help describe certain phenomena inexplicable from the current state of knowledge. Matter is discrete because particles such as neutrons, protons, electrons or bosons, the elementary particles, are not further reducible. Information is quantized (discrete) by nature. A discrete value is always limited and countable.
• The value of information can be seen by its effect on other information (other people). In the world of information one can observe a hierarchy. The top of the hierarchy is the practical (logical) solution of theoretically conflicting values. The Logic of Love was at the very beginning, where the contradictions of total voluntariness and enslavement met in order to be loved. The world we see is the implementation of this logic in the Universe. The World is a practical realization of the Logic of Love (God and the devil are realized in man at the same time).
• There are three ways to create information: through suffering, work and risk-taking. Risk is the most cost-effective, but it requires a lot of knowledge, so that in fact risking should not cause physical impairment of the functioning of the body (loss of life or health) or of money (the objective value of each kind of matter).


Difficult questions that can be answered from the standpoint of Informationism

What is information?

Information is created in the logic process by an action. Physically, information is recorded as a quantum of action. Planck's constant h (the minimum action) is the action which corresponds to the formation of 1 bit of information. In fact, we only register processes, i.e. actions. We can register and measure them. Space-time configurations and interactions between the various actions (pieces of information) create a new quality, a new action. In a simple configuration, when there is an action, a reaction is created, with the effect of reversing the condition. That is how Classical Mechanics explains it: action creates reaction. For the purposes of mathematics, information is simply a collection of bits, which is an integer.

Is there any absolute constant in our Universe in time and space?

In response to the philosophical question, "Is there any Absolute in the Universe?", Informationism answers yes. There are two immutable things. The first is nothingness: emptiness, vacuum, absolute zero, but it exists only in theory. Absolute emptiness does not exist, because there already is the Universe, so there is no void, although there may have been one in the past, before the creation of the Universe. The second constant in the Universe is Logic. Logic makes a cause give one effect and no other, so that 1 + 1 = 2. The Logic of our Universe is constant in time and space. Logic can be compared to the DNA of the Universe, in which all the information about the Universe is programmed. The Logic that creates information makes the Universe look the way it does. Therefore Logic can be compared to God, because it is Absolute in the whole Universe and it makes the Universe look the way it does. I believe in Logic. It is my 'God'.

How to explain gravity on the basis of Information Theory?

Mass can be regarded as just a source of information (see equation 1), which we can read as follows:

\frac{\Delta I}{\Delta t} = \frac{c^2}{h}\,\Delta m ,

where \Delta I / \Delta t is the source of information. When we have two masses that attract gravitationally, it means we have two sources of information that attract. Why are they attracted? Because of the principle we mentioned before: the principle of minimal information creation, in other words the principle of minimal action. When will minimal information be produced? Let us look closer at the equation. I is just an amount of information, but information also has

quality, not only quantity. When will two sources of information together produce the smallest amount of information? When they are linked and the identical parts of their bit streams are produced as one source. For example, let the first source produce the bits 010010010101011110011011010101010101010... and the second 010010010101101010101010101010101001011... We can see that the beginnings of the two streams are the same: 01001001010. One can imagine that if they become one source, the amount of information created is less than when there are two sources. That is why sources of information, i.e. masses of matter, constantly try to collapse into one, and so attract each other. One can also explain, on the basis of the theory of information, the gravitational equations. Following the Theory of Information, the rate of information creation is proportional to mass. That is why in the numerator we have the product of the masses. Information expands with constant speed in a 3D sphere. The surface of this sphere is proportional to the radius squared, so the intensity of the information wave per 1 m² is inversely proportional to the radius squared. That is why gravitational forces are inversely proportional to the square of the distance between the masses.
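The argument can be condensed into a short proportionality chain (a sketch; identifying the overall constant with Newton's G is not derived here):

\frac{dI_1}{dt} \propto m_1, \qquad J(r) \propto \frac{1}{4\pi r^2}\,\frac{dI_1}{dt}, \qquad F \propto m_2\, J(r) \propto \frac{m_1 m_2}{r^2},

where J(r) is the intensity of the information wave at distance r from the first mass.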

Synchronous communication: a higher level of thinking in the twenty-first century

TCP/IP is currently the dominant protocol suite of the Internet4. UDP is a one-directional mode of communication5. The UDP protocol is very fast but not precise; it may lose some information. It is mainly used for video streaming. TCP is very precise, ensuring that all the information reaches the recipient, but at the expense of speed. When TCP loses a packet of bits, it simply waits for the retransmission until all the information is received. There are many scientific and complex algorithms that may speed up communication. When TCP loses a packet of information, which it can easily detect because the checksum does not match the data received, it recognizes the situation and sends a request to repeat the transmission of the whole packet. But how long should the protocol wait before it can be sure that some information got lost somewhere in the network and there is no chance of receiving it soon? No one knows. That is why TCP can only operate at microsecond scales, and no one is able to reduce the frequency of reading data. Currently there is a great need for a significant acceleration of the Internet with quality assurance of the information received, e.g. for the trading algorithms used by hedge funds. The only way to make innovative progress is to reduce the time the protocol waits before requesting re-sending of the missing bits. When a computer supported by such a protocol realizes immediately that it has lost some information, it sends a request for the missing block of bits. How can the protocol realize immediately that information is lost? Precisely by synchronization. When the computer is completely synchronized with the server (the clocks in both computers show exactly the same time), the recipient computer can immediately find out about missing data: it should receive a block regularly, say every 1 nanosecond, so as soon as in some nanosecond the computer has not received the next portion of bits, it immediately

4 http://en.wikipedia.org/wiki/Internet_protocol_suite 5 http://en.wikipedia.org/wiki/User_Datagram_Protocol

sends a request for retransmission and does not have to wait. Does this mean that we can break the rule that information cannot be transmitted faster than the maximum speed c (since we would know immediately the precise time on a clock that is far away)? In practice we do not have perfect clocks, so we periodically synchronize the time, and this causes limitations. This is the practical and physical significance of the maximum velocity of information, which is the speed of light.
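A minimal sketch of the slot-based loss detection described above (the 1 ns slot is the text's idealization; the slot length, sequence numbering and request format here are illustrative assumptions, not a real protocol):

```python
import time

SLOT_NS = 1_000_000_000  # illustrative slot: 1 s here, standing in for the ideal 1 ns

class SyncReceiver:
    """Detects a missing block by its expected arrival slot instead of a timeout."""

    def __init__(self, start_ns: int):
        self.start_ns = start_ns      # synchronized start time shared with the sender
        self.received = {}            # slot index -> payload

    def expected_slot(self, now_ns: int) -> int:
        # With synchronized clocks, the slot index follows directly from the time.
        return (now_ns - self.start_ns) // SLOT_NS

    def on_block(self, slot: int, payload: bytes) -> None:
        self.received[slot] = payload

    def missing_slots(self, now_ns: int):
        # Any slot older than 'now' with no payload is known lost immediately:
        # no retransmission timer is needed, only the shared clock.
        return [s for s in range(self.expected_slot(now_ns)) if s not in self.received]

# Usage: the receiver notices slot 1 is missing as soon as slot 2's time has passed.
rx = SyncReceiver(start_ns=time.time_ns())
rx.on_block(0, b"block-0")
rx.on_block(2, b"block-2")
print(rx.missing_slots(rx.start_ns + 3 * SLOT_NS))  # -> [1]
```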

Mysticism and Information Theory

• The value of information can be measured by its impact on other information, including matter. More valuable information has greater power to influence.

• Information may react with matter.

• Information exists even when it is not expressed and revealed. Even unexpressed information still has an impact on other information and on matter.

• The expression of information, i.e. speaking, writing, etc., creates quite new information, which is based on the source information in the mind of a man, or of any other consciousness.

• Given the above, information does not need to be expressed to have an effect on other information (including matter).

The points above are the basis for showing the relationship between Information Theory and Mysticism. Mysticism tries to influence the world and people using a methodology similar to fuzzy logic6. Fuzzy logic has much in common with the theory of probability7. However, there is no evidence for a direct transformation between these two theories. Therefore mysticism and science cannot come to a conclusion and find a common dependence, although intuitively we can agree that both can be true. Information Theory unifies both approaches and is the relationship between them. Mysticism is simply influencing the information we have in mind and that in the end we do not express, because it is expressed in feelings and emotions. Feelings and emotions are also very important local information, concerning only the person who experiences them. Each of us has feelings and emotions and feels their influence clearly, but one must experience them personally to understand the value of this information. Feelings and emotions are subjective. I feel the power of mysticism personally all the time. For many people this is a proof of the existence of God, because it is not possible, on the basis of present knowledge, to clarify the rules of the phenomenon. Now we can say that feelings and emotions can be measured by how they affect other people. This lets one understand the importance of music and of the paintings of the great masters and of artists in general. The Logic of Love is most captured in the most beautiful songs.

What is energy?

Energy is just an abstract idea, created in order to explain certain facts known from physics experiments. In the physical world there exist in reality only information and matter. Why? Because only information and matter are discontinuous, discrete, quantized (they have a smallest nonzero value). All that is continuous is an abstraction; more precisely, everything that takes values over the entire field of real numbers is abstract and does not exist in reality. The number π is an abstraction. Today we know the first few million digits of π8, and these exist because they are real facts. We will never achieve complete accuracy of this number; only what we have in fact found, the successive digits, exists. Matter is also discontinuous, because particles composed of elementary particles do not let themselves be reduced further. "Quarks are never directly observed or found in isolation"9. Why do we create abstract ideas like energy? By analogy with the analytical solution of the equation of the third order10: in order to find roots that are real numbers, we have to go through imaginary numbers. Why does the atomic bomb really work? The destruction made by the atomic bomb is due to the released information, which reacts with matter. This is the real proof that information affects matter.

6 http://en.wikipedia.org/wiki/Fuzzy_logic
7 http://en.wikipedia.org/wiki/Probabilistic_logic

Why does information spontaneously self-create? The three-body problem.

One can sometimes have a problem understanding the intrinsic property of information of creating other information (creation of information out of nothing). This property is in opposition to the rules of physics, because if information creates itself, then in an isolated system there will be more and more information. But is it really a problem? Consider the following case in Classical Mechanics: the elastic collision of two bodies. A simple rule works here: every action causes a reverse reaction. The bodies simply exchange momenta and the sum of momenta is conserved. At this point the phenomenon is still reversible, although in reality we talk about only one direction of time and the real world is not reversible in time. How can this be? We need to consider a more complicated problem: the elastic collision of 3 bodies. What happens when three bodies collide elastically at the same moment of time and point of space? Now we see that the collision is not reversible in time. There is no simple exchange of momenta; from the principles of conservation of energy and momentum one can calculate that after the collision the bodies will have quite new velocities. A new piece of information is created: information about the values of the new velocities. A similar problem is discussed here11. Also, from chaos theory we know that chaotic behavior, where entropy is greater than zero, appears in at least 3-dimensional systems of nonlinear continuous equations. If the dynamic entropy is greater than zero, we are talking about the creation of new information. Static entropy is a measure of disorder in the system; dynamic entropy is the change of the static entropy. As entropy increases, the static disorder of the system increases; that is, when dynamic entropy is greater than zero and disorder grows, we say that the amount of information in the system increases. It has already been proven that Hamiltonian systems, i.e. those in which the principle of conservation of energy is preserved, also exhibit chaotic behavior. Chaotic behavior is nothing but the process of creating new information from nothing.
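A minimal numerical illustration of "dynamic entropy" as an information production rate (a sketch: the logistic map stands in for a chaotic system, and for one-dimensional maps the positive Lyapunov exponent equals the Kolmogorov-Sinai entropy rate; these are illustrative assumptions, not the document's experiment):

```python
import math

def lyapunov_logistic(r: float, x0: float = 0.3, n: int = 100_000) -> float:
    """Average log-derivative along the orbit of x -> r*x*(1-x)."""
    x, acc = x0, 0.0
    for _ in range(n):
        acc += math.log(abs(r * (1.0 - 2.0 * x)))  # |f'(x)| = |r(1-2x)|
        x = r * x * (1.0 - x)
    return acc / n

# For r = 4 the map is fully chaotic: lambda = ln 2, i.e. it creates
# about 1 bit of new information per iteration (ln 2 nats = 1 bit).
lam = lyapunov_logistic(4.0)
print(lam, "nats/step ~", lam / math.log(2), "bits/step")
```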

8 http://en.wikipedia.org/wiki/Pi 9 http://en.wikipedia.org/wiki/Quark 10 http://en.wikipedia.org/wiki/Cubic_function 11 http://www.ncbi.nlm.nih.gov/pubmed/12780027


What is the problem with the Second Law of Thermodynamics? Why do religious people claim that it is evidence for the existence of God?

On the basis of chaos theory one can explain the increase of entropy and the butterfly effect, i.e. sensitivity to initial conditions12. What does this mean exactly? When the entropy is 0 (entropy cannot be negative by definition), there is no process of new information creation. When the entropy is above zero but finite, we are dealing with a process that creates a limited amount of information, and the amount of information is measured simply by the value of the topological entropy. If the entropy is infinite, we are dealing with white noise. These are the types of behavior: for entropy equal to 0 we conclude periodic or quasi-periodic motion; when the entropy is greater than zero but finite, the process is deterministic chaos; for noise we have infinite entropy. In these examples we talk about dynamic entropy, not static. The Second Law of Thermodynamics says that entropy increases over time. This is true. However, the Second Law of Thermodynamics applies in statistical physics, where particles move randomly on the micro level. On the other hand, we know that entropy measures the disorder in the system, which can be translated to the Universe: as a whole it is isolated, so its entropy and disorder (chaos in the system) should increase with time. We must, however, bear in mind the assumption that the elementary particles move randomly. Quantum Mechanics says that elementary particles like bosons, electrons, etc. should be considered as moving in a stochastic (random) process. This would indicate that the assumption of randomness of elementary particles in the Universe is fulfilled, and the entropy of the Universe should grow with time. This gives rise to some scientists' claim that the Universe should be more and more disordered. Yet we are witnesses of our civilization, which looks like there is greater order now than at the Big Bang, at the beginning of the Universe. At what point is the lack of logic? Can one say that something or someone outside the Universe organizes what we see on Earth? Does this mean that there is a God? My answer stems from Information Theory. Elementary particles are simply pieces of information, which have a quality, not only a quantity. Weight stores only quantity and does not represent quality. When people appeared (a man is complex information too), the quality of matter started to be expressed, because people started noticing it. People understand, conclude, think logically and affect matter, so information gets feedback on itself. This is consciousness. From chaos theory we know that chaotic motion occurs when we are dealing with feedback from the past. This is the main idea of chaos. Chaotic motion with non-zero dynamic entropy creates new information from nothing. So it is with knowledge: it creates new information. On the other hand, we know that in the same way one can not only arouse chaos but also control it. Through feedback from the past we can stabilize chaotic behavior on unstable orbits. An example of such control is described here13. So it is with information: since feedback from past information commonly occurs, larger order may appear in particles, matter and information. With more awareness we can control it, because we control our bodies, which work and give shape to different objects. Probably no proof is needed, because we visually see the clear effect: the present civilization on Earth.

12 http://en.wikipedia.org/wiki/Chaos_theory 13 http://rsta.royalsocietypublishing.org/content/364/1846/2309.full.pdf


Corpuscular-wave duality

There has existed in physics for a long time a difficult-to-explain issue, namely corpuscular-wave duality, i.e. that each particle has both a wave and a corpuscular (local, particle) nature. The question is always asked: how is it possible that a particle can be registered at a point in space and time, and also shows the nature of a wave, which is global and spread out in space? This twofold character was shown in many experiments14. On the basis of Information Theory one can explain this duality. A moving mass generates a source of information linearly proportional to the mass. The source of information generates waves that carry bits in frequency space. We have to remember that information (waves) interacts with other information and with other masses. That is why a moving mass has both particle and wave properties.

Intuitive evidence that the matter can be converted to information.

Think: people eat, drink, and consume matter. They live, they understand, and consequently they form valuable information. This can be understood as an intuitive proof of the possibility of converting matter into information.

14 http://en.wikipedia.org/wiki/Wave%E2%80%93particle_duality


Practical application of Informationism

The Theory of Objective Values

Objective value (ObV) is a value which can be added. Consequently, one can perform any arithmetic on it: subtraction, multiplication, division, etc. In the space of Objective Values one can compare different types of substance, events, etc. Certainly an objective value is the amount of information, but not the information itself, because information has a quality and is therefore not commutative, i.e. positions cannot be switched (we talked about this before). Addable values are commutative: switching places with respect to the operation of addition does not change the result. What else does the ability to add give us? Commutativity means preserving a conservation principle for the added values. I have already mentioned that in the real world there exist the amount of information, and matter. Energy as well, but it does not actually exist. Next we conclude that if addition is a rule of behavior, the substance is also a limited quantity. The amount of substance created in events is the smallest possible; this means the principle of minimum energy and minimum amount of information (time). Therefore the ObV theory is important not only in theory but also in practice.

Objective Value Theory in finance15

Consider an asset characterized by a price p(t). The return is defined as x(t) = ln(p(t+1)/p(t)) and, in what follows, we assume that returns are independently distributed. The objective function w(x) is related to the stationary probability distribution of returns, P(x), viz:

P(x) = e^{-\beta w(x)}/N    (15)

N is a normalization factor. For independently distributed returns, this form may be obtained as the result of maximizing the ‘free energy’16 functional:

F = \int dx\, P(x)\left[\ln P(x) - \beta w(x) - \lambda\right]    (16)

Equally, we could minimize the objective functional W subject to constraints on the Boltzmann-Gibbs entropy17 and the normalization of the distribution function:

W = \int dx\, P(x)\left[w(x) - k \ln P(x) - \lambda'\right]    (17)

Such a form for the probability distribution is also the outcome of a model that assumes returns are governed by a generalized Markovian stochastic process18 of the form:

15 K. Urbanowicz, P. Richmond and J.A. Hołyst, Risk evaluation with enhanced covariance matrix, Physica A , doi:10.1016/j.physa.2007.05.034, (2007). 16 http://en.wikipedia.org/wiki/Helmholtz_free_energy 17 http://en.wikipedia.org/wiki/Boltzmann%27s_entropy_formula 18 Bernt Øksendal (2000). Stochastic Differential Equations. An Introduction with Applications, 5th edition, corrected 2nd printing. Springer. ISBN 3-540-63720-6. Sections 4.1 and 4.2.


x(t+1) - x(t) = f(x) + g(x)\,\varepsilon(t) + \eta(t)    (18)

The Gaussian processes, ε and η satisfy:

\langle\varepsilon(t)\varepsilon(t')\rangle = a\,\delta(t-t'), \qquad \langle\eta(t)\eta(t')\rangle = \beta\,\delta(t-t'), \qquad \langle\varepsilon(t)\eta(t')\rangle = 0, \qquad \langle\varepsilon(t)\rangle = \langle\eta(t)\rangle = 0    (19)

For the moment we leave the functions f and g unspecified, except to say that they depend only on x(t); a and β are constants. The solution to this stochastic process has been deduced elsewhere19. Adopting the Ito convention, the distribution function associated with the process is given by the Fokker-Planck20 equation:

\frac{\partial P(x,t)}{\partial t} = \frac{\partial^2}{\partial x^2}\left[\left(\beta + a g^2(x)\right)P(x,t)\right] - \frac{\partial}{\partial x}\left[f(x)\,P(x,t)\right]    (20)

The stationary solution is:

P(x) = \frac{\exp\left(\int dx\,\frac{f(x)}{\beta + a g^2(x)}\right)}{N\left(\beta + a g^2(x)\right)}

Comparing with Eq. (15), it can be seen that:

\beta w(x) = \ln\left[\beta + a g^2(x)\right] - \int dx\,\frac{f(x)}{\beta + a g^2(x)} = \int dx\,\frac{2 a\, g(x) g'(x) - f(x)}{\beta + a g^2(x)}    (21)

A number of different cases are evident. If we define P(x) up to a normalization factor, these may be expressed as in the table below.

19 P.Richmond,"Power Law Distributions and Dynamic Behaviour of Stock Markets", Eur J Phys B 4, 523 (2001) 20 http://en.wikipedia.org/wiki/Fokker%E2%80%93Planck_equation


If we further let ν = 1/a, β = 1, λ = 1/ν − 1 and choose g(x) = |x|, the distribution in the third row reduces to the Student-t distribution, as shown in row 4. In developing our methodology below we shall focus on the use of this distribution. Choosing other forms for the functions f and g yields other distributions that exhibit power laws. For example, the choices f = −νx|x|^{r−1} and g = x|x|^{s−1} lead to q-exponentials.

Normalization in Objective Value space

As we have noted above, it is usual for a portfolio of M stocks to compute the portfolio weights p_i using the covariance matrix C, defining the risk R as:

R = \sum_{i,j} C_{i,j}\, p_i p_j    (22)

Optimizing this in the absence of risk-free assets yields the weight of stock i:

p_i = \frac{1}{Z}\sum_j \left(C^{-1}\right)_{i,j}    (23)

where Z = \sum_{i,j}\left(C^{-1}\right)_{i,j}. It is known that a nonlinear transformation of the data can change correlations; e.g. correlations of |x_i| decay much more slowly than those of x_i. We exploit this by introducing a particular transformation that increases correlations, renormalizing the objective values such that the total set of values x_i(t_j), for all i from 1 to M and j from 1 to N, is drawn from a common distribution. To effect this change, we first compute for each asset the probability distribution by fitting the data with a Student-t distribution characterized by its power-law index. We then compute for each value of the return x_i(t_j) the corresponding objective value w_i(x_{t_j}). These objective values are then transformed to yield a set of renormalized objective values as follows:

\tilde{w}_i(x_{t_j}) = w_i(x_{t_j})\,\frac{\hat{w}}{\bar{w}_i}, \qquad \hat{w} = \frac{1}{MN}\sum_{i,j}^{M,N} w_i(x_{t_j}), \qquad \bar{w}_i = \frac{1}{N}\sum_{j}^{N} w_i(x_{t_j})    (24)

In effect we are renormalizing the objective value by its mean value relative to the overall mean value ŵ of the entire data set. Having computed these renormalized objective values, we can now obtain the corresponding set of values x̃_i(t_j) by inverting them according to a new Student-t distribution that characterizes the entire data set, consisting of one value of ν and M×N values. Hence, using the result in row 4 of the table above:

\tilde{x}_i(t_j) = \pm\sqrt{\nu\left(e^{2\tilde{w}_i(x_{t_j})/(\nu+1)} - 1\right)},    (25)

where ν is now the tail exponent that characterizes the PDF of the entire data set. Thus we can transform the data between different power laws. In particular, we can change a power-law PDF into a Normal (Gaussian) distribution. This is used in ObV option pricing (details in the next section), where we can get from the Black-Scholes formula to ObV by exactly this normalization.
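A compact sketch of the renormalization of Eqs. (24)-(25) (the per-asset tail fit is simplified to a method-of-moments estimate; the function names and the moment-based estimator are illustrative assumptions, not the text's maximum-likelihood procedure):

```python
import numpy as np

def objective_value(x: np.ndarray, nu: float) -> np.ndarray:
    # Student-t objective function, w(x) = (nu+1)/2 * ln(1 + x^2/nu), cf. Eq. (38)
    return 0.5 * (nu + 1.0) * np.log1p(x**2 / nu)

def fit_nu_moments(x: np.ndarray) -> float:
    # Crude stand-in for the tail fit: match the kurtosis of a Student-t,
    # kurt = 3 + 6/(nu-4) for nu > 4 (an assumption made for brevity).
    k = np.mean(x**4) / np.mean(x**2)**2
    return 4.0 + 6.0 / max(k - 3.0, 1e-6)

def renormalize(returns: np.ndarray, nu_common: float) -> np.ndarray:
    """returns: M x N array; output: returns renormalized toward one common t-PDF."""
    M, N = returns.shape
    w = np.vstack([objective_value(returns[i], fit_nu_moments(returns[i])) for i in range(M)])
    w_hat = w.mean()                                   # overall mean objective value, Eq. (24)
    w_tilde = w * (w_hat / w.mean(axis=1, keepdims=True))
    # Invert to the common distribution, Eq. (25), keeping the sign of each return.
    return np.sign(returns) * np.sqrt(nu_common * np.expm1(2.0 * w_tilde / (nu_common + 1.0)))

rng = np.random.default_rng(0)
data = rng.standard_t(df=3, size=(5, 1000)) * 0.01    # 5 synthetic assets
print(renormalize(data, nu_common=3.2).shape)          # -> (5, 1000)
```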

Examples

We apply the normalization to stock returns. In the plot below one can see the effect of the normalization of the returns of the Boeing company. Before the normalization we fitted a Student-t distribution with power-law exponent 2.54 for negative returns and 3.05 for positive returns. After the normalization, both sides share the same power-law exponent 3.2.


The most prominent effect of the normalization can be seen when one calculates correlations between the returns of stocks. The prime eigenvalues of the normalized covariance matrix are much larger; this means we found much higher correlations than for the standard covariance matrix. The plot below was done for 79 stocks from the NYSE over 1999-2000.

We can apply the normalized covariance matrix to achieve a better diversification of a portfolio. Here we focus only on the risk parameters, so for calculating the weights of stocks in the portfolio we only invert the normalized covariance matrix. We do the same with the standard covariance matrix as a benchmark. If the normalization reveals more correlations, it should diversify better, lowering the risk, or increasing the profit at the same risk. In the plot below we compare the diversification of an equally weighted portfolio, a portfolio built with the standard covariance matrix, and one built with the normalized covariance matrix. The portfolio consists of the 33 largest stocks on the Warsaw Stock Exchange (WSE). One can see that the diversification with the normalized covariance matrix brings the best results. On the second plot an even better result of this diversification is seen, this time for stocks from the NYSE.

Objective Value (ObV) Option pricing model for power-law distribution

The standard model of stock prices is

S(t) = S(0)\,e^{y(t)},

where

y(t) = \ln\left(\frac{S(t)}{S(0)}\right),

and it follows that

dy = \mu\,dt + \sigma\,d\omega,    (26)

where μ is the mean rate of return and σ² is the variance. Here ω is the noise term, which follows a Wiener process and satisfies

E\left[d\omega(t)\,d\omega(t')\right] = \delta(t-t')\,dt\,dt',    (27)

where E denotes the expected value. In this case the probability distribution of y is Gaussian with variance σ² and mean μ. Following the Ito Lemma, the distribution of S is log-normal with variance σ² and mean μ − σ²/2. The Black and Scholes21 model uses the Ito Lemma in order to calculate the transformation from the Gaussian (Normal) distribution of y to the log-normal distribution of S. There appears the noise-induced drift coefficient −σ²/2, which decreases the mean. Black and Scholes, in further calculations, integrate the value of the future price S with respect to the log-normal distribution with the parameters given above. After replacing S with y, the integration reads

C_{BS} = S(0)\,e^{-rt}\int_{y_s}^{\infty}\frac{e^{y} - e^{y_s}}{\sqrt{2\pi\sigma^2 t}}\,\exp\left(-\frac{\left(y - \mu + \frac{\sigma^2 t}{2}\right)^2}{2\sigma^2 t}\right)dy    (28)

Here S(0) is the present price of the asset, e^{y_s} = K is the strike of the option, and t is the time to maturity of the option. In the Black and Scholes (BS) model one further considers a portfolio with a short position in one call option and a long position in one asset. Such a portfolio is riskless, so in the end it should earn the risk-free interest rate r. This construction gives the fair price of the call option; it means that the drift coefficient is μ = 0 and the future asset price should be discounted to the present time at the risk-free interest rate. Following this we get the famous BS formula

C_{BS} = S(0)\,P_{G>}\!\left(\frac{y_-}{\sigma\sqrt{t}}\right) - K e^{-rt}\,P_{G>}\!\left(\frac{y_+}{\sigma\sqrt{t}}\right),    (29)

where

21 Fisher Black, Myron Scholes. The Pricing of Options and Corporate Liabilities. „Journal of Political Economy”. 81 (3). s. 637–654.


y_{\pm} = \ln\left(\frac{K}{S(0)}\right) - rt \pm \frac{\sigma^2 t}{2}.
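A small sketch of Eq. (29) as written here, with P_{G>} the upper-tail probability of a standard normal (a minimal check implementation, not production pricing code):

```python
import math

def p_gauss_upper(z: float) -> float:
    # P_{G>}(z): probability that a standard normal exceeds z
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def bs_call(S0: float, K: float, r: float, sigma: float, t: float) -> float:
    # y_± = ln(K/S0) - r t ± sigma^2 t / 2, then Eq. (29)
    y_minus = math.log(K / S0) - r * t - 0.5 * sigma**2 * t
    y_plus  = math.log(K / S0) - r * t + 0.5 * sigma**2 * t
    s = sigma * math.sqrt(t)
    return S0 * p_gauss_upper(y_minus / s) - K * math.exp(-r * t) * p_gauss_upper(y_plus / s)

print(bs_call(S0=100.0, K=100.0, r=0.02, sigma=0.2, t=1.0))  # at-the-money example, ~8.9
```

One can verify that this reproduces the textbook Black-Scholes call price, since P_{G>}(y_-/(σ√t)) = Φ(d₁) and P_{G>}(y_+/(σ√t)) = Φ(d₂).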

In order to introduce another form of the noise term, with a power-law distribution, we have to modify Eq. (26) to the form

dy = \mu\,dt + \sigma\,d\Omega    (30)

We assume that Ω possesses the Student-t distribution with, for now, an arbitrary power-law exponent α. The Student-t distribution reproduces the Gaussian distribution in the limit α → ∞. We will use this property to find the transformation from the Gaussian distribution to the Student-t with arbitrary power law α; that is, we would like to find a scaling Z(Ω) such that

d\Omega = Z(\Omega)\,d\omega    (31)

We assume Ω follows the Student-t distribution P_α(y), which is a reasonable assumption found in the literature. Repeating the calculations of the BS model, we obtain that the variance σ² is replaced by σ²Z(Ω)² and the noise-induced drift coefficient by −(σ²/2)Z(Ω)². The value of the call option is then similar to Eq. (28), with the Gaussian replaced by the Student-t:

C_{St} = S(0)\,e^{-rt}\int_{z > y_s}\left(e^{z} - e^{y_s}\right)P_\alpha(z)\,dz    (32)

The variable z will be found by substitution from y and will be shown later. The Student-t distribution which we use has the form

P_\alpha(y) = \frac{\Gamma((\alpha+1)/2)}{\sqrt{\alpha\pi}\,\Gamma(\alpha/2)\left(1+\frac{y^2}{\alpha}\right)^{\frac{\alpha+1}{2}}}, \qquad \Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt    (33)

We now have the equation for the option price, but the problem lies in the form of the scaling coefficient Z(Ω). Lisa Borland22 assumes ad hoc that the scaling coefficient has the form

Z(\Omega) = P_{Tsallis}^{(1-q)/2}    (34)

22 L. Borland, Option Pricing Formulas Based on a Non-Gaussian Stock Price Model, PRL 89(9), 098701(4), (2002).


This form can be rewritten with the Student-t number of degrees of freedom α in place of the Tsallis q-distribution23, assuming that the power law should be the same:

q = \frac{2}{\alpha+1} + 1    (35)

so that Eq. (34) takes the Student-t form

Z(\Omega) = P_\alpha^{-\frac{1}{\alpha+1}}    (36)

Eq. (36) is the assumption of L. Borland which gives the Tsallis q-distribution of Ω. We do not want to assume this form, but to calculate it analytically from Objective Value Theory. The ObV theory gives the form of the transformation of data between Student-t distributions of arbitrary α. First we find the transformation between arbitrary Student-t distributions, and then we let one power law go to infinity, which gives the transformation to the Gaussian distribution. The transformation comes from the normalization of objective functions as follows:

w(y) = \langle w(y)\rangle\,\frac{w(x)}{\langle w(x)\rangle}    (37)

where y follows the probability distribution P_α(y) and x follows P_β(x). The objective function for the Student-t is given in the table above and reads

w(y) = \frac{\alpha+1}{2}\ln\left(1+\frac{y^2}{\alpha}\right), \qquad w(x) = \frac{\beta+1}{2}\ln\left(1+\frac{x^2}{\beta}\right)    (38)

When x is from the normalized Gaussian distribution (β → ∞) and y from the Student-t, we have

\langle w(y)\rangle = \int_{-\infty}^{\infty}\frac{\alpha+1}{2}\ln\left(1+\frac{y^2}{\alpha}\right)P_\alpha(y)\,dy, \qquad \langle w(x)\rangle = 0.5    (39)

After solving the integral for y we obtain approximately ⟨w(y)⟩ ≈ 0.579 for α = 4. The transformation formula using the objective function, for x from the Gaussian distribution, reads

y = \frac{x}{|x|}\sqrt{\alpha\left[\left(\frac{\Pi(\alpha)}{\Pi(\beta)}\left(1+\frac{x^2}{\beta}\right)\right)^{\frac{\langle w(y)\rangle}{\langle w(x)\rangle}\frac{\beta+1}{\alpha+1}} - 1\right]}    (40)

23 D. Prato and C. Tsallis “Nonextensive foundation of Lévy distributions”, Phys. Rev. E 60, 2398 – Published 1 August 1999


\Pi(\alpha) = P_\alpha(y)\left(1+\frac{y^2}{\alpha}\right)^{\frac{\alpha+1}{2}} = \frac{\Gamma((\alpha+1)/2)}{\sqrt{\alpha\pi}\,\Gamma(\alpha/2)}    (41)

When 훽 → ∞

y = \frac{x}{|x|}\sqrt{\alpha\left[\left(\Pi(\alpha)\sqrt{2\pi}\right)^{\frac{2 S_y}{\alpha+1}} e^{\frac{S_y x^2}{\alpha+1}} - 1\right]}

x = \frac{y}{|y|}\sqrt{2\ln\left[\Pi(\alpha)^{-\frac{1}{S_y}}\left(\sqrt{2\pi}\right)^{-1}\left(1+\frac{y^2}{\alpha}\right)^{\frac{\alpha+1}{2 S_y}}\right]},    (42)

where S_y = \langle w(y)\rangle / \langle w(x)\rangle. The transformation from x to y is symmetric, so the trend is unchanged. For the analytical solution we have to make some assumptions; nevertheless, one can compute exact values numerically with no intermediate assumptions. Bear in mind that the following solution has its own drawbacks. We now take the first-order Taylor expansion of y with respect to x, which gives the transformation from the Wiener process x to y with the Student-t distribution:

dy = \frac{\alpha\, S_y}{(\alpha+1)\,y}\left(1+\frac{y^2}{\alpha}\right)\sqrt{2\ln\left[\Pi(\alpha)^{-\frac{1}{S_y}}\left(\sqrt{2\pi}\right)^{-1}\left(1+\frac{y^2}{\alpha}\right)^{\frac{\alpha+1}{2 S_y}}\right]}\,dx,    (43)

where \bar{w} = \frac{2\alpha}{\alpha+1}\,\langle w(y)\rangle, and

dy \cong \sqrt{\bar{w}}\;\Pi(\alpha)^{\frac{2}{\alpha+1}}\,P_\alpha(y)^{-\frac{2}{\alpha+1}}\,dx    (44)

Now we can refer to Eqs. (31) and (36): we see that Z(\Omega) = \sqrt{\bar{w}}\,\Pi(\alpha)^{\frac{2}{\alpha+1}} P_\alpha(y)^{-\frac{2}{\alpha+1}}, so we obtain Eq. (36) with additional parameters.

We can return to Eq. (30) and use the Ito Lemma for the logarithmic price changes y = ln S:

d\ln S = \left(r - \frac{\sigma^2\bar{w}}{2}\,\Pi(\alpha)^{\frac{4}{\alpha+1}}P_\alpha(y)^{-\frac{4}{\alpha+1}}\right)dt + \sigma\sqrt{\bar{w}}\,\Pi(\alpha)^{\frac{2}{\alpha+1}}P_\alpha(y)^{-\frac{2}{\alpha+1}}\,d\eta    (45)

We can now recall the variable z which we put into Eq. (32); in order to calculate it we have to integrate Eq. (45):


z = \ln S = \int d\ln S = rt - \frac{\sigma^2\bar{w}}{2}\,\Pi(\alpha)^{\frac{4}{\alpha+1}}\int_0^1 P_\alpha(y)^{-\frac{4}{\alpha+1}}\,ds + \sigma y    (46)

Here we introduced r as the risk-free interest rate, so that z is a martingale. The value of the call option can be written following Eq. (32):

C = S(0)\,e^{-rt}\int_{K/S(0)}^{\infty} S\,P_\alpha(S)\,dS - K e^{-rt}\int_{K/S(0)}^{\infty} P_\alpha(S)\,dS    (47)

We would like to replace the integrals over S by integrals over y, so we have

C = S(0)\,e^{-rt}\int_{s_1}^{s_2}\exp\left(rt - \frac{\sigma^2\bar{w}}{2}\int_0^1\left(1+\frac{y(s)^2}{\alpha}\right)^2 ds + \sigma y\right)P_\alpha(y)\,dy - K e^{-rt}\int_{s_1}^{s_2} P_\alpha(y)\,dy,    (48)

where s_1 and s_2 are calculated from the inequality

y \in (s_1, s_2):\qquad rt - \frac{\sigma^2\bar{w}}{2}\int_0^1\left(1+\frac{y(s)^2}{\alpha}\right)^2 ds + \sigma y > \ln\left(\frac{K}{S(0)}\right)

In Eq. (48) we have to resolve the integral \int_0^1\left(1+\frac{y(s)^2}{\alpha}\right)^2 ds. To this end we use the scaling property of the variance with respect to time, which reads

\mathrm{var}(t) \approx t^{\frac{\alpha-1}{\alpha-2}}    (49)

So we have

y(s) = \sqrt{\frac{\mathrm{var}(s)}{\mathrm{var}(t)}}\;y(t)    (50)

Now we can resolve the integral as follows


\int_0^1\left(1+\frac{y(s)^2}{\alpha}\right)^2 ds = 1 + \frac{2 y^2}{\alpha\,\mathrm{var}(1)}\int_0^1 \mathrm{var}(s)\,ds + \frac{y^4}{\alpha^2\,\mathrm{var}(1)^2}\int_0^1 \mathrm{var}(s)^2\,ds = 1 + \frac{y^2(\alpha-2)}{2\alpha-3} + \frac{y^4(\alpha-2)}{5\alpha-3}

Substituting into Eq. (48) gives

C_{ObV} = S(0)\,\Pi(\alpha)\int_{s_1}^{s_2}\left(1+\frac{y^2}{\alpha}\right)^{-\frac{\alpha+1}{2}}\exp\left(\sigma y - \frac{\sigma^2\bar{w}}{2}\left(1 + \frac{y^2(\alpha-2)}{2\alpha-3} + \frac{y^4(\alpha-2)}{5\alpha-3}\right)\right)dy - K\,\Pi(\alpha)\,e^{-rt}\int_{s_1}^{s_2}\left(1+\frac{y^2}{\alpha}\right)^{-\frac{\alpha+1}{2}}dy,    (51)

where

y \in (s_1, s_2):\qquad rt - \frac{\sigma^2\bar{w}}{2}\int_0^1\left(1+\frac{y(s)^2}{\alpha}\right)^2 ds + \sigma y > \ln\left(\frac{K}{S(0)}\right)    (52)

In the case of the Call option the integration is from s_1 to s_2.

The power-law exponent α, as well as σ, is taken from past data. α is calculated by the Maximum Likelihood method24 as the fit of the Student-t distribution to the past data of y = ln(S_i/S_{i−t}), where t is the time to maturity in working days. When we are dealing with the Gaussian distribution, so that α → ∞, we recover normal Black-Scholes option pricing. Be aware that, because we use an approximation, Call-Put parity may not be preserved, so you should track this important factor using some additional parameter and optimize it by computer.
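A sketch of evaluating Eq. (51) numerically (the integration region follows Eq. (52); the value of ⟨w(y)⟩ and the grid are handled crudely here, so this illustrates the structure of the formula rather than reproducing the author's production code):

```python
import numpy as np
from math import gamma, sqrt, pi, log, exp

def Pi(alpha: float) -> float:
    # Π(α) from Eq. (41)
    return gamma((alpha + 1) / 2) / (sqrt(alpha * pi) * gamma(alpha / 2))

def obv_call(S0, K, r, sigma, t, alpha, n=200001):
    w_mean = 0.579                       # <w(y)> for alpha = 4, per the text; recompute for other alpha
    w_bar = 2 * alpha / (alpha + 1) * w_mean
    y, dy = np.linspace(-10.0, 10.0, n, retstep=True)
    poly = 1 + y**2 * (alpha - 2) / (2 * alpha - 3) + y**4 * (alpha - 2) / (5 * alpha - 3)
    mask = r * t - 0.5 * sigma**2 * w_bar * poly + sigma * y > log(K / S0)  # region (s1, s2), Eq. (52)
    dens = (1 + y**2 / alpha) ** (-(alpha + 1) / 2)
    first = S0 * Pi(alpha) * np.sum(mask * dens * np.exp(sigma * y - 0.5 * sigma**2 * w_bar * poly)) * dy
    second = K * exp(-r * t) * Pi(alpha) * np.sum(mask * dens) * dy
    return first - second

print(obv_call(S0=100.0, K=100.0, r=0.02, sigma=0.2, t=20 / 252, alpha=4.0))
```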

Put option

P_{ObV} = K\,\Pi(\alpha)\,e^{-rt}\int_{s_1}^{s_2}\left(1+\frac{y^2}{\alpha}\right)^{-\frac{\alpha+1}{2}} dy - S(0)\,\Pi(\alpha)\int_{s_1}^{s_2}\left(1+\frac{y^2}{\alpha}\right)^{-\frac{\alpha+1}{2}}\exp\left(\sigma y - \frac{\sigma^2\bar{w}}{2}\left(1 + \frac{y^2(\alpha-2)}{2\alpha-3} + \frac{y^4(\alpha-2)}{5\alpha-3}\right)\right)dy,    (53)

24 The method is described in the section Generalized gaussianity evaluation.

where

y \in (s_1, s_2):\qquad rt - \frac{\sigma^2\bar{w}}{2}\int_0^1\left(1+\frac{y(s)^2}{\alpha}\right)^2 ds + \sigma y < \ln\left(\frac{K}{S(0)}\right)

In the case of the Put option the integration is from −∞ to s_1 and from s_2 to ∞.

Greek letter Delta

\Delta_{Call} = \int_{s_1}^{s_2} A(y)\,dy - A(s_2) + A(s_1) + \frac{K}{S(0)}\left(B(s_2) - B(s_1)\right)    (54)

\Delta_{Put} = -\int_{s_1}^{s_2} A(y)\,dy + A(s_2) - A(s_1) - \frac{K}{S(0)}\left(B(s_2) - B(s_1)\right)    (55)

A(y) = \Pi(\alpha)\left(1+\frac{y^2}{\alpha}\right)^{-\frac{\alpha+1}{2}}\exp\left(\sigma y - \frac{\sigma^2\bar{w}}{2}\left(1 + \frac{y^2(\alpha-2)}{2\alpha-3} + \frac{y^4(\alpha-2)}{5\alpha-3}\right)\right)

B(y) = \Pi(\alpha)\,e^{-rt}\left(1+\frac{y^2}{\alpha}\right)^{-\frac{\alpha+1}{2}}

For the Delta of the Put there is a change in the integration region (see the option pricing for the Put option, Eq. (53)).

Difference to Lisa Borland approach

1. We use the Student-t distribution, which is equivalent to the Tsallis q-distribution via Eq. (35), so all the equations are changed appropriately.
2. L. Borland does not calculate q; it is taken ad hoc to fit the results, so in her case q is around 1.5. In our case we take it from past data and it is around 3 (2.5-3.5). We do not have any free parameter in Eq. (51). We need only one year of past daily data to calculate the price of options, and the results shown in the figures mostly match real prices very precisely.
3. L. Borland assumes the feedback form d\Omega = P_q^{(1-q)/2}\,d\omega. In our case we derive the equivalent formula (43) from Objective Value Theory, Eq. (41). Bear in mind that our solution was obtained using only a Taylor expansion.


4. Eq. (50) is different in Lisa Borland's calculations and in ours.
5. The Student-t distribution is easier to code in computer calculations, and the final result is much simpler in our case.
6. We use a quite different way of calculating the volatility.

Results for option pricing.

Real data are taken from www.cboe.com. Maturities: T = 12 (April 2008) and T = 76 (July 2008).


Delta Hedge Strategy based on the Objective Value Option Pricing theory

The option market started to expand after Black, Merton and Scholes developed their option pricing theory. Today the option market helps much with hedging as well as with speculation. The most important assumption of the Black-Scholes theory was the normality of the distribution of returns. Using ObV theory we overcome this assumption and end up with option pricing based on a power-law distribution of returns. In ObV option pricing we use the Student-t distribution, which fits commodity returns very well. With the help of ObV theory we incorporate the well-known stylized fact25 that price returns from time to time behave more extremely than would follow from standard Brownian motion theory.

Fig. 1) Comparison between the Student-t distribution and the Normal distribution fitted to the histogram of daily returns of Corn.

25 Cont, R. (2001). "Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues". Quantitative Finance 1 (2): 223–236. doi:10.1080/713665670.


In the case of ObV option pricing, as for Black-Scholes, we can select a number of strategies to make a profit. The most popular strategy for options is the Delta Hedge Strategy. The Delta Hedge Strategy should make a positive profit when the option pricing theory precisely matches the reality of the options market. We have backtested the Delta Hedge Strategy using ObV option pricing. We show in Fig. 2 the net profit of a portfolio of agricultural commodities (Soybeans, Wheat, Corn, Soybean Meal, and Soybean Oil).

Fig. 2) The net profit of the portfolio. Here we use the Delta Hedge Strategy with ObV option pricing. Delta neutrality is enforced only at the opening of the option position.
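A schematic of the backtest logic (a sketch: one short call hedged with Δ units of the underlying at entry only, matching the caption; the pricing inputs c0 and delta0 would come from ObV Eqs. (51) and (54), and the toy price path is an illustrative assumption):

```python
import numpy as np

def delta_hedge_pnl(prices: np.ndarray, c0: float, delta0: float, K: float) -> float:
    """P&L of: sell one call at c0, buy delta0 units of the underlying at prices[0],
    hold both to expiry (delta is NOT rebalanced, matching the Fig. 2 setup)."""
    payoff = max(prices[-1] - K, 0.0)               # call payoff at maturity
    option_leg = c0 - payoff                         # short call: premium minus payoff
    stock_leg = delta0 * (prices[-1] - prices[0])    # long delta0 shares
    return option_leg + stock_leg

# Toy example with assumed model outputs for c0 and delta0.
rng = np.random.default_rng(1)
path = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_t(df=3, size=20)))
print(delta_hedge_pnl(path, c0=2.5, delta0=0.55, K=100.0))
```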


How does Objective Value theory work in trend forecasting of stochastic systems?

Predicting the future is always the most desired property of an analysis, whatever kind of analysis we perform. When we would like to earn money on some investment, we try to catch the future in different ways. Sometimes we use knowledge and experience, sometimes computers show the results for us, and many times we use intuition. In all cases we can describe it and put it into schemes which rule the prices on the market. However, the number of schemes used to forecast the future is large; one can say too large to capture in any reasonable way. Nevertheless, there are some dominant approaches and well-known pieces of knowledge which are used by many investors and which generate a slightly predictable future. Sometimes panic-like behavior increases the power of some sell/buy trigger and therefore enlarges its impact as a self-realized prediction. Every market is made by buyers and sellers, and prices are just the effect of their behavior, so established forecasting knowledge is unstable. Usage of a prediction method by many investors will decrease its efficiency. On the other hand, it generates a self-made prophecy, but a prophecy only for those who are first, because most people make use of the prediction too late. This is the sense of minority games, in which only those who are in the minority win. It is true also for stock market 'games'. When we would like to speculate on the stock market, we try to be on the minority side when we invest, and close the position when most people follow us. In speculation we have to be first, so we take much of the risk, because at the beginning well-established triggers are easy to misunderstand. Well-known knowledge, being overused, is many times useless or generates self-made predictions, which of course should not be employed by a reasonable investor, because it is too risky. There is surely unpublished knowledge which makes money for big investors or others, and which can at the beginning be stable (not altering in time, or making too small a profit to cover costs). We would like to describe some established trends in investors' knowledge, which appear to be the basis for future, more complex forecasting. At the beginning we should state what is known about stock market data for sure, that is, its statistics and known properties. Most people working on the stock market are almost sure that the data are essentially unpredictable, so it is difficult to have a method of collecting money that is universal in time and space. We can also say that there are ways of making a profit, for the very simple reason that there are people who do it and who got rich on stock market investments. From the physics point of view, the data we are dealing with are likely to be stochastic and not linearly correlated on scales larger than an hour. Randomness is dominant in such data, but we think there are deterministic components which can be revealed by some complex, nonlinear method. Let us turn to the practice of investing, and to what most people know about earning money.

• Diversification. The dominant approach to portfolio optimization is simply diversification. Most of the papers on investment strategies are somehow related to this issue. Diversification is the simplest way to decrease the risk while keeping the profit on the same level. The assumption for efficient diversification is no correlation among the investments. Sometimes this is difficult to manage, if for example we are specialists in investments in banks only. On the stock market many firms from the same branch are highly correlated, so an investor has to look at stocks from very different branches to be consistent with effective diversification, or should even make money not only on the stock market but also on FOREX or elsewhere. It is known that a highly developed bearish market shows large correlations among all stocks, so the risk is high. Diversification is the only investment strategy which can help investors without making predictions. The rest of the methods are prediction methods, or prediction together with diversification.

• Markowitz method. Putting together prediction and diversification, with the simplest way of prediction (the last revenue from the stock) and of risk evaluation (the last variance of returns).

• Methods of prediction. They can be divided into fundamental and technical. Fundamental evaluations are based on knowledge about the condition of firms, or on political or investor information about the decision-making process. Fundamental analysis is rather long-term; it uses the disproportion between the market value and the evaluation. There are two problems within it:
o Is the evaluation made properly?
o Market prices do not always reflect the capital of a firm and the future profit from that capital, but what investors think about them. The second issue is most visible when we calculate the capital of a firm and compare it to the capitalization on the stock market. It is well known that the latter is much higher than the book value. The ratio of these two capitalizations is not universal and differs among stocks, even in the same branch of industry. In this case the investor uses some intuitive knowledge about the proper value.

• Technical analysis26. At the beginning we should mention simple methods using mostly moments and smoothing methodologies. These methods use coefficients based on the statistical properties of price changes. Moments come from mathematics and are just the first and second moments of the time series, i.e. the average and the variance. Trends are believed by investors to be much stronger at the beginning than one would suppose. One can say that knowing which trend we are in (bullish or bearish) is most important; the wisdom is to know the trend and follow it. The finest analysis of trend searching is Elliott Waves. Our impression of this analysis is that the waves which show the trend are not always visible. The correct investing structure in the data can be seen from time to time and can easily be misunderstood by a non-specialist. In the case of volatility (in technical analysis it is called the moment) the established investing procedure is to buy when volatility is low and sell when it is high. This can be understood as not entering the market in the case of large risk and large changes.

• CAPM theory27. Many economists assume a perfectly efficient market, a market in which additional profit can be gathered only when one takes additional risk. This statement can be turned around: if someone would like to look for a more profitable stock, he should search for more risk. This is the essence of the CAPM theory, which goes further and proposes to allocate the portfolio on the risk-profit plane with the efficiency of the risk-free rate.

26 John J. Murphy, “Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications (New York Institute of Finance)”, Penguin USA; 2nd Revised edition edition (1 January 1999) 27 John H. Cochrane, “New Facts in Finance”, Economic Perspectives, Federal Reserve Bank of Chicago, Vol. 23, no. 3 (1999): 36-58


In the next section we present the investment method called WONABRU technology, which in simple words can be understood as trend analysis.


Statistical forecasting

In this section we present the idea behind the WONABRU technology: a reasonable theory that can help with the statistical forecasting of stochastic systems with tiny temporal correlations.

Let us consider the time series of returns {x_i}, constructed as follows:

x_i = \ln\frac{p_i}{p_{i-1}},    (56)

where p_i is the price at time i. From the point of view of investors, a very important pivot point in the returns time series is just zero. In all visualizations of price changes, green reflects growth and red falls of prices. This is because we sell or buy, so zero is the threshold at which we earn or lose. Let us then separate our time series into positive and negative values with the following notation:

\{x_i^{+} : x_i > 0\} \quad\text{and}\quad \{x_i^{-} : x_i < 0\} \quad\text{for } i = 1, 2, \ldots, N

We always assign zero as the pivot point of both time series. Let us suppose that there exists some asymmetry between the PDFs of \{x_i^{+}\} and \{x_i^{-}\} (see the example in Fig. 3).


Fig. 3) Academic example of distribution behavior, with a normal distribution on the left side and a leptokurtic distribution on the right.

This unbalance will lead us to a forecast based on the generalized gaussianity of the left and right PDFs (our meaning of generalized gaussianity is explained in detail in the next section). Gaussianity, in simple words, means the level of maturity of the distribution; gaussianity is maximal for the Gaussian distribution. It is well known from statistics that the sum of many uncorrelated increments with finite variance gives a Gaussian distribution, whose PDF does not change shape after adding further data. The Gaussian distribution is the attractor of convolutions of distributions with a finite second moment, which is why it is so ubiquitous. Our method says that we should BUY when we are dealing with stock market data in the situation represented in Fig. 3. The generalized gaussianity of the left-side distribution G_− is larger than the generalized gaussianity of the right side G_+, so the unbalance parameter F will show us the future:

F = \frac{G_- - G_+}{\max\{G_-, G_+\}}.    (57)

The reason is the following. Unstable leptokurtic distributions can be understood as generated by alternating temporal correlations. These temporal correlations lead to extreme or small values when correlated data are added: the sum \sum_k^T x_{i+k} will take a large value if the x_{i+k}, for k = 1, 2, \ldots, T, are positively correlated; when the correlations are negative, the sum will be close to zero. The Gaussian is the attractor of all distributions with finite second moment (which is always the case for stock market data). In other words, gaussianity should increase in time when there are no external forces in the market. The gaussianity can increase after a more frequent appearance of uncorrelated data. In our example, more positive returns mean a statistically positive trend.
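A minimal sketch of the splitting that Eq. (57) operates on (synthetic data; the gaussianity measure G itself is defined in the next section, so a kurtosis-based proxy is used here purely for illustration):

```python
import numpy as np

def split_returns(x: np.ndarray):
    # Positive and negative samples, both measured from the zero pivot.
    return x[x > 0], -x[x < 0]

def gaussianity_proxy(sample: np.ndarray) -> float:
    # Illustrative stand-in: tails closer to Gaussian -> kurtosis near 3 -> larger proxy.
    k = np.mean(sample**4) / np.mean(sample**2)**2
    return 1.0 / abs(k - 3.0 + 1e-9)

rng = np.random.default_rng(2)
x = rng.standard_t(df=3, size=5000) * 0.01
pos, neg = split_returns(x)
g_plus, g_minus = gaussianity_proxy(pos), gaussianity_proxy(neg)
F = (g_minus - g_plus) / max(g_minus, g_plus)       # Eq. (57): F > 0 suggests BUY
print(F)
```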

Generalized gaussianity evaluation

For the calculation of the generalized gaussianity we fit a Student-t distribution to a histogram. The distribution function we fit to the data should be one which can represent a power-law distribution as well as a Gaussian. We took the Student-t distribution, which is well known as a power-law distribution converging to the Gaussian for infinite power. The Student-t distribution function was given in Eq. (33):

P_\nu(y) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)\left(1+\frac{y^2}{\nu}\right)^{\frac{\nu+1}{2}}}, \qquad \Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt    (58)


Now we fit this Student-t distribution to the data. In order to find a more consistent parameter for the tail of the Student distribution, we minimize the logarithm of the distance between the histogram and the function to fit, Eq. (58):

\left\langle \ln\left(P_{St}(\nu, x_i) - P(x_i)\right)^2 \right\rangle = \min    (59)

We perform the minimization by calculating the derivative with respect to ν and setting it to zero:

\frac{\partial}{\partial\nu}\sum_i \ln P_{St}(\nu, x_i) = 0    (60)

We solve this equation numerically, with the constraint that the variance of the data should follow σ² = ν/(ν−2) and the mean equal zero. To find the power exponent, one should first normalize the time series by the average absolute value of x: x̃_i = x_i/⟨|x|⟩. The objective value in the case of the Student-t distribution can then be presented as

w(x) = \frac{\nu+1}{2}\ln\left(1+\frac{\tilde{x}^2}{\nu}\right)    (61)

The generalized gaussianity is calculated as the ratio of the statistical Shannon entropy of the returns distribution to the objective value w(x), as a consequence of the objective value minimization theory:

G^{ObV} = \frac{-\int P(x)\ln P(x)\,dx}{\langle w(x)\rangle}    (62)

In our calculations the PDF is replaced by a histogram normalized to unity.
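A sketch of Eqs. (58)-(62) in code (the maximum-likelihood fit is reduced to a coarse grid search over ν, and the entropy is taken from a normalized histogram; both are simplifications of the text's procedure):

```python
import numpy as np
from math import lgamma, log, pi

def log_pt(nu: float, x: np.ndarray) -> np.ndarray:
    # log of the Student-t density of Eq. (58)
    c = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return c - 0.5 * (nu + 1) * np.log1p(x**2 / nu)

def fit_nu(x: np.ndarray, grid=np.arange(2.1, 30.0, 0.1)) -> float:
    # Maximum likelihood over a grid, the discrete analogue of Eq. (60)
    xn = x / np.mean(np.abs(x))                       # normalize by <|x|>, per the text
    return float(grid[np.argmax([log_pt(nu, xn).sum() for nu in grid])])

def gaussianity(x: np.ndarray, bins: int = 50) -> float:
    nu = fit_nu(x)
    xn = x / np.mean(np.abs(x))
    w_mean = np.mean(0.5 * (nu + 1) * np.log1p(xn**2 / nu))   # <w(x)>, Eq. (61)
    p, _ = np.histogram(xn, bins=bins)
    p = p[p > 0] / p.sum()                                    # histogram normalized to unity
    entropy = -np.sum(p * np.log(p))                          # Shannon entropy
    return entropy / w_mean                                   # Eq. (62)

rng = np.random.default_rng(3)
print(gaussianity(rng.standard_t(df=4, size=4000) * 0.01))
```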

The Wonabru investment method

For the purposes of forecasting, we use an equation combining Eq. (57) and Eq. (62):

F^{ObV} = \frac{G_-^{ObV} - G_+^{ObV}}{\max\left\{G_-^{ObV},\, G_+^{ObV}\right\}},    (63)

where G_-^{ObV} is calculated on the left side of the PDF, and G_+^{ObV} on the right side, using Eqs. (61) and (62).
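Combining the pieces, the trading signal follows directly (a sketch: gaussianity() is the function from the previous sketch, and the simple sign thresholding is an illustrative reading of the BUY rule, not the author's exact trading rule):

```python
import numpy as np

def wonabru_signal(returns: np.ndarray, gaussianity) -> str:
    # Split at the zero pivot, mirror the negative side, compare gaussianities (Eq. 63).
    g_plus = gaussianity(returns[returns > 0])
    g_minus = gaussianity(-returns[returns < 0])
    F = (g_minus - g_plus) / max(g_minus, g_plus)
    return "BUY" if F > 0 else "SELL"
```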

The method is biased by the swap. This means it constantly buys more of the currency with the high interest rate and sells the one with the low rate. This is difficult to overcome, because a small shift of the returns in one direction changes the results. If we subtract the swap from the returns, the method will take more sell positions, but no one knows how much one should subtract (subtracting the whole swap practically makes things worse). Most important in investing is to have a symmetric method, one that is independent of the stock market. The stock market is not symmetric, because it goes up more than it falls. The problem with the stock market is that the prices depend on the value of the firm and also on psychological effects. The psychological effect, in simple words, means that people make decisions based on other people's decisions: if others buy, investors buy too, and the opposite. This effect produces the power law in the distribution of returns (it would normally be Gaussian if investors made independent decisions). One can interpret it as follows: if the power-law exponent is low for negative values, a panic-like effect is visible and people sell because others sell. The opposite holds when people buy because of others; then the low value of the power-law exponent is on the positive returns. Our method tries to catch the moment when a panic-like or boom effect appears and to follow the trend until this effect disappears. One can argue that the crowd effect holds only for small investors, and that large ones make decisions rather independently; but if so, no one can forecast independent random decisions. The effect is more visible when there are many investors in the market. This is the case for FOREX. Of course, huge investors sometimes make forecasting impossible or even invert it. When our method is losing money, it means that many huge investors are acting against the small ones.

Results based on WONABRU methodology

More details of the results, based on backtesting using historical data from alpari.co.uk and forexite.com, are given in the figures below.


Fig. 4) Test made over the period January 2001 to December 2007 using the most liquid currency pairs: EURUSD, GBPUSD, USDCHF, USDJPY, EURJPY, EURCHF, GBPJPY, CHFJPY (the swap is not added; it usually makes about 2% yearly profit). The data are taken from www.forexite.com.

Fig. 5) Test made over the period July 2006 to January 2008 using the 10 most liquid currency pairs: EURUSD, GBPUSD, USDCHF, USDJPY, EURJPY, EURCHF, GBPJPY, CHFJPY, EURAUD, GBPCHF. The calculation excludes the swap, which is usually positive and adds about 2% per annum to the profit. This backtesting has been made using real Alpari data.


Fig. 6) The drawdowns for the results of the backtesting shown above. The last drawdown is the largest over the last 4 years.

Introducing Shannon entropy into portfolio optimization

The theory of Markowitz28 and its application to the optimization of stock portfolios is well documented. It assumes that the return of each of the M stocks in a portfolio over the time interval T is Gaussian, with average return m_i and variance D_i. The overall return of the portfolio is then

m_p = \sum_{i=1}^{M} p_i m_i

The portfolio risk is

D_p = \sum_{i=1}^{M} p_i^2 D_i,

28 Harry Markowitz, “Portfolio Selection”, The Journal of Finance Volume 7, Issue 1, pages 77–91, March 1952.


where {pi} is the set of normalized weights associated with the M stocks. In its simplest form the optimum weights are obtained by minimizing the utility function

$$E_p = D_p - \lambda m_p + \gamma \sum_{i=1}^{M} p_i$$

The least risky portfolio corresponds to that obtained by setting λ = 0, leading to the optimum weights:

$$p_i^* = \frac{1}{Z D_i},$$

where

$$Z = \sum_{j=1}^{M} \frac{1}{D_j}.$$
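As an illustration only (not code from the original work), the least-risky weights above can be computed directly; the function and variable names below are ours, and the variances are made-up example values.

    #include <stdio.h>

    /* Least-risky Markowitz weights for uncorrelated stocks:
       p_i = 1/(Z*D_i) with Z = sum_j 1/D_j (the lambda = 0 case). */
    static void least_risky_weights(const double *D, double *p, int M)
    {
        double Z = 0.0;
        for (int j = 0; j < M; ++j)
            Z += 1.0 / D[j];
        for (int i = 0; i < M; ++i)
            p[i] = 1.0 / (Z * D[i]);
    }

    int main(void)
    {
        double D[3] = {0.04, 0.09, 0.01};   /* illustrative variances */
        double p[3];
        least_risky_weights(D, p, 3);
        for (int i = 0; i < 3; ++i)
            printf("p[%d] = %.4f\n", i, p[i]);  /* the weights sum to 1 */
        return 0;
    }

Note how the stock with the smallest variance receives the largest weight, which is exactly the concentration problem discussed below.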

A later version of the theory takes account of correlations between the stocks via the symmetric correlation matrix C_ij. The risk is now defined to be:

$$D_p = \sum_{i,j=1}^{M} p_i p_j C_{ij}$$

Minimizing the risk so defined, again setting λ to zero, yields the optimum weights p_i^*:

$$p_i^* = \frac{1}{Z}\sum_{j=1}^{M} C_{ij}^{-1},$$

where

$$Z = \sum_{i,j=1}^{M} C_{ij}^{-1}.$$

As Bouchaud and Potters have pointed out, the main lesson of Markowitz's theory is the need to diversify portfolios effectively29. Bouchaud, Potters and Aguilar30 have noted that the resulting portfolio can be concentrated on only a few stocks. To overcome this, they proposed including an additional constraint:

$$Y_q = \sum_{i=1}^{M} (p_i^*)^q$$

The parameter q was chosen equal to 2, in which case the term also represents the average weight of an asset in the portfolio. In general the term is, in the language of physics, an entropic contribution to the minimization process. Indeed, as Bouchaud and Potters point out, it is linearly related to the Tsallis entropy function, and the entire process of obtaining the weights p_i is equivalent to minimizing a free-utility function:

29 Jean-Philippe Bouchaud and Marc Potters, "Theory of Financial Risk and Derivative Pricing: From Statistical Physics to Risk Management", Cambridge University Press, 2nd edition (February 2, 2004).
30 J.-P. Bouchaud, M. Potters and J.-P. Aguilar, "Missing information and Asset allocation", arXiv:cond-mat/9707042, 4 Jul 1997.


$$F = E - \nu\,\frac{Y_q - 1}{q - 1}$$

E is the utility function chosen in the manner previously described.

Bouchaud has noted another issue with the use of correlation matrices. It is linked to the finiteness of the time series used to compute the M(M-1)/2 individual elements of the correlation matrix. If the number of stocks in the portfolio becomes large, the number of data points used to compute the elements of the correlation matrix can be of the same order of magnitude as the number of entries. For example, for a portfolio of the size of the S&P 500 the correlation matrix contains 500x499/2 = 124,750 different entries, while time series extending over two years of daily data contain about 500x500 = 250,000 data points, only a factor of two larger than the number of correlation coefficients. The statistical precision of these coefficients is therefore subject to a large degree of measurement noise. Here we consider smaller portfolios, of the order of 27 stocks, where this issue does not arise.

In this note we discuss another issue that can be resolved by a further modification of the theory, one that leads to a novel route to measuring risk and to further improvements in the optimization process.

The above discussion leads directly to the 'free-utility' function introduced below.

Equations of portfolio optimization31

To optimize the portfolio with entropy we use the following formula:

$$w^T C w + \lambda\,(w^T m - m_c) + \beta\,(w^T S w - S_c) + \delta\,(w^T \mathbf{1} - 1) = \min, \qquad (64)$$

where:
w - the vector of weights of the stocks in the portfolio;
C - the covariance matrix;
T - transpose;
m - the vector of average returns;
m_c - the expected return from the portfolio;
S - the vector of Shannon entropies of the PDFs of returns;
S_c - a constant value of entropy for the portfolio;
1 - the identity vector (vector of ones);
λ, β, δ - Lagrange multipliers.

31 Krzysztof Urbanowicz, “Entropy and Optimization of Portfolios”, arXiv:1409.7002 [q-fin.ST] (2014).


The formulation above is valid when the stocks are independent of each other, because we use the additivity of the entropy, that is

$$S_p = w^T S w, \qquad S_i = -\int P(x)\,\ln P(x)\,dx,$$

where S_p is the entropy of the whole portfolio and P(x) is the probability distribution of the returns of stock i. If we assume that all the entropies are equal, we get

$$w^T C w + \lambda\,(w^T m - m_c) + \beta\left(w^T w - \frac{1}{M_{eff}}\right) + \delta\,(w^T \mathbf{1} - 1) = \min, \qquad (65)$$

where M_eff is a constant, the effective number of stocks in the portfolio. If one sets M_eff > 2, the constraint forces diversification effectively onto M_eff firms in the portfolio.
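For orientation, the effective number of stocks implied by a given weight vector can be checked directly. The following is a minimal C sketch with a function name of our own choosing:

    /* Effective number of stocks implied by the constraint w^T w = 1/M_eff:
       equals M for an equally weighted portfolio and approaches 1 for a
       portfolio concentrated in a single stock. */
    static double effective_stocks(const double *w, int M)
    {
        double s = 0.0;
        for (int i = 0; i < M; ++i)
            s += w[i] * w[i];
        return 1.0 / s;
    }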

The explanation of introducing entropy to portfolio optimization

In order to explain the problem we will phrase it in the language of statistical physics. To show the similarities, the table below lists the correspondences between statistical physics and finance.

Correspondence between finance and statistical physics (symbol or notion: finance / statistical physics):

- w: weights in the portfolio / number of particles with the same properties
- C: covariance matrix, the known risk / potential energy
- m: average return / velocity of flow in one direction
- S: entropy of the PDF of returns, i.e. the risk related to changes in C and m within a state; the logarithm of the possible realizations of the returns when the stocks are independent / Boltzmann entropy; the unpredictability within the state; the logarithm of the number of possible states
- w^T C w + β w^T S w: the whole risk, related to the noise (C) and to changes of noise and trends (m) / free energy
- -β: Lagrange multiplier to entropy / temperature
- δ: Lagrange multiplier to the normalization of the portfolio / chemical potential
- w^T 1: the value of the portfolio / number of particles
- λ: the ratio of risk to profit / viscosity
- type of dynamics: purely stochastic / stochastic equation with deterministic parts
- basic elements: investors / particles performing a Brownian walk
- time series: returns / average changes of positions of particles of the same type
- source of forces: deals among investors / complexity of the interactions of an Avogadro number of particles
- most important principle: minimization of the whole risk, with the constraint of some desired positive profit / minimization of the free energy at non-zero temperature, with the constraint of some direction of flow

In the table above the most important entry to understand is the entropy S. In statistical physics we minimize the free energy in order to find the solutions at non-zero temperature. The same principle can be used in finance and portfolio optimization, and so we arrive at Eq. (64). A non-vanishing temperature means that we are not sure whether the past state will persist in the future, so the state may change. Energy and entropy times temperature have the same units, so together they express the level of unpredictability, the disorder as a whole. In finance the energy represents the known risk. If we assume a probability distribution function (PDF) that determines the state of a market, then the entropy measures the unpredictability of the data within the current state, and hence also the changes of the known risk, as said above. The conservation principle makes the energy measurable: the energy in the past will be the same in the future. In portfolio optimization, past price dynamics can serve as the energy.

The entropy in statistical physics measures the number of possible realizations within the state of the system (entropy is the logarithm of all possible states of the system). In statistical physics we assume that the particles move randomly, so every realization within the PDF is possible. Here, in portfolio optimization, we diversify over the past possible realizations of returns represented by the PDF of returns. When we assume random motion of prices, as well as independent price movements between the stocks, we can add the entropies associated with each stock. In ObV theory it turns out that the Shannon entropy is proportional to the objective value. To be consistent with these findings, one has to construct the covariance matrix C as the covariance of the objective values of returns32: C^ObV. One should mention here that when we change the covariance matrix, the average values of returns m should be

32 Krzysztof Urbanowicz, Peter Richmond, Janusz A. Holyst, "Risk evaluation with enhanced covariance matrix", 10.1016/j.physa.2007.05.034

changed accordingly to m^ObV. This is a consequence of the theory of portfolio optimization: the constant m has to be the average value of the time series used for the covariance matrix calculation (one can easily check this by solving the Markowitz problem under the assumption that m is some constant to be found). In fact we then have:

$$w^T C^{ObV} w + \lambda\,(w^T m^{ObV} - m_c) + \beta\,(w^T S w - S_c) + \delta\,(w^T \mathbf{1} - 1) = \min \qquad (66)$$

In this functional, the elements C, m and S are all proportional to each other. Mean-variance optimization should be applied only to returns from Gaussian distributions; otherwise one has to use the more general formula presented in Eq. (66). The risk can be computed as:

$$WholeRisk = w^T C^{ObV} w + \beta\, w^T S w \qquad (67)$$

One can conclude that such a quantification of the risk is exact. One has to specify the entropy constant S_c in Eq. (66) in order to get rid of β. We would like to propose a new quantity that measures the quality ratio QR of the portfolio:

$$QualityRatio = QR = \frac{profit}{risk} = \frac{\lambda\, w^T m^{ObV}}{w^T C^{ObV} w + \beta\, w^T S w}, \qquad QR_{expected} = \frac{\lambda\, w^T m^{ObV}}{w^T C^{ObV} w + \beta_{expected}\, w^T S w} \qquad (68)$$

We can further conclude that on average QR can be standardized to one for the whole market (QR = 1 for the market portfolio). This means that one can calculate S_c or β from the market portfolio:

$$QR_{market} = 1 = \frac{\lambda\, w_m^T m^{ObV}}{w_m^T C^{ObV} w_m + \beta\, w_m^T S w_m}, \qquad QR_{expected}^{market} = 1 = \frac{\lambda\, m_c}{w_m^T C^{ObV} w_m + \beta_{expected}\, w_m^T S w_m}, \qquad (69)$$

where w_m is the vector of normalized capitalizations of the firms. One can easily derive the solution for the weights of the stocks in the portfolio from Eqs. (64) and (69). It reads:

$$w_c = \left(C^{ObV} + \beta S \mathbf{I}\right)^{-1}\left(\delta\,\mathbf{1} - \lambda\, m^{ObV}\right), \qquad (70)$$

where I is the identity matrix and


$$\lambda = \frac{m_c \cdot \mathbf{1}^T (\tilde{C}^{ObV})^{-1}\mathbf{1} - (m^{ObV})^T (\tilde{C}^{ObV})^{-1}\mathbf{1}}{(m^{ObV})^T (\tilde{C}^{ObV})^{-1}\mathbf{1}\cdot\mathbf{1}^T (\tilde{C}^{ObV})^{-1} m^{ObV} - (m^{ObV})^T (\tilde{C}^{ObV})^{-1} m^{ObV}\cdot\mathbf{1}^T (\tilde{C}^{ObV})^{-1}\mathbf{1}},$$

where $\tilde{C}^{ObV} = C^{ObV} + \beta S \mathbf{I}$, and

$$\delta = \frac{1 + \lambda\cdot\mathbf{1}^T (\tilde{C}^{ObV})^{-1} m^{ObV}}{\mathbf{1}^T (\tilde{C}^{ObV})^{-1}\mathbf{1}}.$$

Eq. (69) gives the solution for β and β_expected as follows:

$$\beta = \frac{\lambda\, w_m^T m^{ObV} - w_m^T C^{ObV} w_m}{w_m^T S w_m}, \qquad \beta_{expected} = \frac{\lambda\, m_c - w_m^T C^{ObV} w_m}{w_m^T S w_m} \qquad (71)$$

Here the parameter λ depends on β itself, so one has to resolve this equation by a self-consistency method. We fix the expected return m_c at 20% yearly. QR can be negative, because over a short time window the profit of the market portfolio can be below zero: the weights in the market portfolio are always positive, but the past average return may be negative. One should stress that the expected profit of the market portfolio is always positive, yet the value calculated from the past may differ considerably from the expected one. We address this issue with the help of β_expected, the expected temperature of the market. In Fig. 7 we present the dependence of the portfolio profit on the temperature of the market. It is easy to see that the risk (the variation of the profit) is an increasing function of temperature, which can be understood as follows: a 'hot' market is highly volatile and difficult to predict.
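A schematic sketch of the self-consistency loop is given below. The helper functions lambda_given_beta() (the closed-form λ above) and beta_given_lambda() (Eq. (71)) are hypothetical names for routines assumed to be implemented from the portfolio inputs; the damping factor and tolerances are arbitrary choices, not prescribed by the text.

    #include <math.h>

    /* Hypothetical helpers, to be implemented from the formulas above. */
    extern double lambda_given_beta(double beta);   /* closed-form lambda */
    extern double beta_given_lambda(double lambda); /* Eq. (71) for beta  */

    /* Damped fixed-point iteration for the self-consistent (lambda, beta). */
    void solve_self_consistent(double *lambda_out, double *beta_out)
    {
        double beta = 0.0, lambda = 0.0;
        for (int it = 0; it < 200; ++it) {
            double lambda_new = lambda_given_beta(beta);
            double beta_new   = beta_given_lambda(lambda_new);
            if (fabs(beta_new - beta) < 1e-10 &&
                fabs(lambda_new - lambda) < 1e-10) {
                lambda = lambda_new;
                beta   = beta_new;
                break;                              /* converged */
            }
            lambda = lambda_new;
            beta   = 0.5 * (beta + beta_new);       /* damping for stability */
        }
        *lambda_out = lambda;
        *beta_out   = beta;
    }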


Fig 7) The profit values versus the temperature of an optimized portfolio of 27 stocks on the Warsaw Stock Exchange over the period 2001-2007. The reference value QRm = 1 corresponds to the market portfolio given by the capitalizations of the firms.


Fig 8) The quality ratio QR versus the temperature given by the Lagrange multiplier to entropy. The optimization is done with 27 stocks on the Warsaw Stock Exchange over the period 2001-2007. The reference value QRm = 1 corresponds to the market portfolio given by the capitalizations of the firms.

A similar situation can be seen in Fig. 8, where we show the QR of the optimized portfolio versus the temperature of the market. The figure shows that high QR ratios are present only in a 'cool' market, which can be understood in the sense that in a 'cool' market the risk is low and the predictability is high.

Fig 9) The compound value of the portfolio: for an equally weighted portfolio (solid squares), for the optimization procedure (70) with equal entropies (S_i = 1) (open circles), and for the full optimization procedure (70) (triangles). The optimization is done with 27 stocks on the Warsaw Stock Exchange over the period 2001-2007. For the two last optimizations the reference value QRm = 1 corresponds to the market portfolio given by the capitalizations of the firms.

In Fig. 9 we show the performance over 27 stocks for the equally weighted portfolio and after solving Eqs. (65) and (64). The portfolio with entropy included gives a higher profit than the one without it, from which one can conclude that entropy carries information about risk. The above portfolios were calculated for QR_market = 1. In the case of Eq. (64) there is a shift of emphasis from m to the correlations between the stocks. This is caused by a negative β: when β is negative, the diagonal elements of the matrix C are smaller than in the standard Markowitz problem, so the diversification is driven by the correlations between stocks rather than by the variances and mean returns. The minimization of the 'free energy' of the portfolio thus relies on minimizing the risk

rather than on maximizing the profit. This is because prediction with m fails at high temperature.

The form of the weights in Eq. (70) should refer to optimal portfolios with QR_expected = 1 (this would be strictly true if we could perfectly forecast the risk and the related profit of the portfolio). The conclusion is that the QR of the optimized portfolio, over all stocks and longer horizons, should approach one when the returns are drawn randomly from a common distribution. We present the histogram of QR ratios for the optimized portfolio in Fig. 10. The QR ratios are rather small compared with 1 for the market. This is because the 75-day return of the market portfolio is almost on a larger scale (negative or positive) compared with the 20% yearly assumed for the optimized one. To overcome this, one should fit the parameter m_c to the current absolute values of the returns, e.g.

$$m_c = \frac{|m|^T \mathbf{1}}{M}.$$

Fig 10) Histogram of the occurrence of QR ratios for the optimized portfolio. The optimization is done with 27 stocks on the Warsaw Stock Exchange over the period 2001-2007. The reference value QRm = 1 corresponds to the market portfolio given by the capitalizations of the firms.

The portfolio with QR_expected = 1 is balanced in a certain sense, but one can imagine a situation in which for some time the optimal portfolio has a higher or lower

value, QR_expected = 1 + ε. This should persist only temporarily; after a while the optimal portfolio should return to the average value of 1. We can therefore use a strategy that exploits a temporary imbalance of the QR_expected ratio, assuming that a relaxation toward the average eventually takes place. We propose the following weights in the portfolio:

$$w_s = \frac{1}{Z}\,(w_m - w_c)\cdot(1 - QR_{expected}), \qquad (72)$$

where QR_expected relates to the weights calculated with Eq. (70) and Z = 1^T w_s is the normalization.
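A minimal C sketch of the weights in Eq. (72), assuming the market weights w_m, the optimized weights w_c from Eq. (70) and the current QR_expected are already available; the names are ours.

    /* Strategy weights of Eq. (72): exploit a temporary imbalance of
       QR_expected around its long-run value of 1, with Z = 1^T w_s
       providing the normalization. */
    void imbalance_weights(const double *w_m, const double *w_c,
                           double qr_expected, double *w_s, int M)
    {
        double Z = 0.0;
        for (int i = 0; i < M; ++i) {
            w_s[i] = (w_m[i] - w_c[i]) * (1.0 - qr_expected);
            Z += w_s[i];
        }
        if (Z != 0.0)
            for (int i = 0; i < M; ++i)
                w_s[i] /= Z;
    }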

Stochastic processes in financial data – how to evaluate it?33

We present a method of noise level estimation that is valid even for high noise levels [10]. The method makes use of the functional dependence of the coarse-grained correlation entropy K2(ε) on the threshold parameter ε. We show that the function K2(ε) depends on the noise standard deviation σ in a characteristic way; it follows that by observing K2(ε) one can estimate the noise level σ. Although the theory has been developed for Gaussian noise added to the observed variable, we have checked numerically that the method is also valid for a uniform noise distribution and for the case of a Langevin equation corresponding to dynamical noise.

It is common for observed data to be contaminated by noise (for a review of methods of nonlinear time series analysis see [11, 12]). The presence of noise can substantially affect invariant system parameters such as the dimension, entropy or Lyapunov exponents. In fact Schreiber [13] has shown that even 2% of noise can make a dimension calculation misleading. It follows that assessing the noise level can be crucial for estimating the invariant parameters of a system. Even after performing noise reduction, one is interested in evaluating the noise level in the cleaned data. In experiments the noise is often regarded as a measurement uncertainty, which corresponds to a random variable added to the temporary state of the system or to the experimental outcome; this kind of noise is usually called measurement or additive noise. Another case is noise influencing the system dynamics, which corresponds to the Langevin equation and can be called dynamical noise. The second case is more difficult to analyze because noise acting at the moment t0 usually changes the trajectory for t > t0, so there is no clean trajectory and instead an ε-shadowed trajectory occurs [14]. For real data, a signal (e.g. physical experiment data or economic data) is subject to a mixture of both kinds of noise (measurement and dynamical).

Schreiber developed a method of noise level estimation [13] by evaluating the influence of noise on the correlation dimension of the investigated system. The Schreiber method is valid for rather small Gaussian measurement noise and needs values of the embedding dimension

33 K. Urbanowicz, J.A. Hołyst, “Noise level estimation using coarse-grained entropy”, Phys. Rev. E 67, 046218 (2003), arXiv:cond-mat/0301326.

d, the embedding delay τ and the characteristic dimension r spanned by the system dynamics. Diks [15] investigated the properties of the correlation integral with a Gaussian kernel in the presence of noise. The Diks method makes use of a fitting function for correlation integrals calculated from time series at different thresholds ε. The function depends on the system variables K2 (correlation entropy), D2 (correlation dimension), σ (standard noise deviation) and a normalizing constant Φ; these four variables are estimated by least-squares fitting. The Diks method [16] is valid for noise levels up to 25% of the signal variance and for various measurement noise distributions. It needs optimal values of the embedding dimension d, the embedding delay τ and the maximal threshold εc. Hsu et al. [17] developed a method of noise reduction and used it for noise level estimation. The method exploits the local-geometric-projection principle and is useful for various noise distributions but rather small noise levels. To use the method one needs to choose the number of neighboring points to be considered, an appropriate number of iterations, as well as optimal values of the parameters d and τ. Oltmans et al. [18] considered the influence of noise on the probability density function fn(ε), but they could take into account only small measurement noise. They used a fit of fn(ε) to the corresponding function found for small ε. Their fitting function is similar to the probability density distribution obtained from the correlation integrals $\frac{1}{N^2} DET_n(\varepsilon)$. The method needs as input parameters the values of d, τ and εc.

Our method has its origin in recurrence plots (RPs) [19] and uses RP quantities to characterize the data. Recurrence plots were originally introduced by Eckmann [19] as a useful graphical way of analyzing data. The plot is defined as an N x N matrix where a dot (i, j) is drawn when ||yi − yj|| < ε (ε is a given threshold). With recurrence plots one can study the stationarity of data [20, 21, 22], as well as their recurrence and deterministic properties [23, 24, 25]. The approach was also applied to parameter optimization [26] in the local projection method of noise reduction [27]. RPs can easily be used to calculate characteristic system parameters such as the correlation entropy [28], which is what we do here. Lines of black dots parallel to the main diagonal appear in recurrence plots, and their number can serve as a measure of determinism [20]. In our method we take into account the number DETn of lines of length n or longer, with embedding dimension d = 1. We use the fact that there is a straightforward relation between DETn and the correlation integral [28].

The crucial point of our method is the fitting of a proper function to the estimated correlation entropy K2. Similar considerations can be carried out for the Kolmogorov-Sinai entropy K1 [29, 30, 31], using for example the approach given in [32], but in that case a much larger amount of data is needed, since the K1 entropy is more sensitive to regions of the phase space with small values of the invariant measure. The method is not too time consuming; e.g. a calculation of the entropy for 100 different thresholds and N = 3000 data points took a few minutes [33]. Our method does not demand any input parameters such as the embedding dimension d or the embedding delay τ.
The minimal and maximal values of the threshold parameter ε can be estimated automatically. In all considerations we use the maximum norm, to save computation time and to allow analytic expansions. It is known that in the limit ε → 0 the behavior of invariant system parameters does not depend on the type of norm used. In our case, features of the coarse-grained entropy

are considered, and the value of the threshold parameter ε should be comparable to the noise level. It follows that one cannot exclude that the type of norm applied affects the functional dependence of the coarse-grained entropy K2(ε) in the presence of noise of large or medium amplitude. We stress that our method is designed for noise level estimation; it is not equivalent to noise filters, which allow extracting the original undisturbed signal from a noisy time series [14, 34, 35].

Entropy estimation for a time series in the noise absence

Let {x_i}, i = 1, 2, ..., N, be a time series and y_i = {x_i, x_{i+τ}, ..., x_{i+(n-1)τ}} the corresponding n-dimensional vector constructed in the embedding space, where n is the embedding dimension and τ is the embedding delay. The correlation integral calculated in the embedding space of the vectors y_i is

$$C^n(\varepsilon) = \frac{1}{N^2}\sum_{i}^{N}\sum_{j\neq i}^{N}\Theta\left(\varepsilon - \|\vec{y}_i - \vec{y}_j\|\right), \qquad (73)$$

where Θ is the Heaviside step function. If ||·|| is the maximum norm, the correlation integral C^n(ε) is proportional to the number DET_n(ε) of lines of length n in the RP constructed from the data set {x_i} [28]:

$$C^n(\varepsilon) = \frac{1}{N^2}\sum_{i}\sum_{j\neq i}\Theta(\varepsilon - |x_i - x_j|)\,\Theta(\varepsilon - |x_{i+\tau} - x_{j+\tau}|)\cdots\Theta(\varepsilon - |x_{i+(n-1)\tau} - x_{j+(n-1)\tau}|) = \frac{DET_n(\varepsilon)}{N^2} \qquad (74)$$

The correlation entropy [36, 37] can now be calculated as

$$K_2 = \lim_{\varepsilon\to 0}\lim_{n\to\infty}\ln\frac{DET_n(\varepsilon)}{DET_{n+1}(\varepsilon)} \approx \frac{-d\,\ln(DET_n(\varepsilon))}{dn} \qquad (75)$$

We assume that Eq. (75) is approximately valid for n ≥ 2 thus

$$DET_n = DET_2\, e^{-(n-2)K_2} \qquad (76)$$

Let us introduce the following convention for counting lines: a line of length n includes one line of length n − 1, one line of length n − 2, etc. Using Eq. (76) one can easily find the average line length ⟨n⟩:

$$\langle n\rangle = \frac{\sum_{n=2}^{\infty}(DET_n + DET_{n+2} - 2\,DET_{n+1})\cdot n}{\sum_{n=2}^{\infty}(DET_n + DET_{n+2} - 2\,DET_{n+1})} \simeq \frac{\sum_{n=2}^{\infty} n\, e^{-(n-2)K_2}}{\sum_{n=2}^{\infty} e^{-(n-2)K_2}} = \frac{2 - e^{-K_2}}{1 - e^{-K_2}} \qquad (77)$$

The above formula neglects all lines of length n = 1. Now the entropy can be approximated as


$$K_2 = \ln\frac{\langle n\rangle - 1}{\langle n\rangle - 2} \qquad (78)$$

The relation between the entropy, the dimension and the correlation integral is given by the well-known formula [38, 39]

$$\lim_{n\to\infty}\lim_{\varepsilon\to 0}\ln\frac{1}{N^2}\,DET_n(\varepsilon) = D_2 \ln\varepsilon - n\tau K_2, \qquad (79)$$

thus the logarithm of the correlation integral is a linear function of the entropy K2 and the system dimension D2. On the other hand, the correlation dimension D2 is independent of the embedding dimension d if the latter is large enough. We use this fact: in the next section we estimate the effect of noise on the dimension D2 as well as on the length n of a line in the RP, where the line length corresponds to the embedding dimension. At the end we incorporate both effects into Eq. (74) to reproduce the complete influence of noise on the correlation integral.
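To make the recipe above concrete, here is a compact C sketch (ours, not the original implementation) that estimates K2 from the average length ⟨n⟩ of recurrence lines, following Eqs. (74), (77) and (78) with embedding dimension d = 1 and the maximum norm; the line-counting convention follows the text.

    #include <math.h>

    /* Estimate K2 from the average length <n> of recurrence lines of
       length >= 2 at threshold eps, via Eq. (78). Sketch only. */
    double k2_from_lines(const double *x, int N, double eps)
    {
        long   lines = 0;    /* number of lines of length >= 2 */
        double total = 0.0;  /* summed length of those lines   */
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j) {
                if (j == i) continue;
                int n = 0;
                while (i + n < N && j + n < N &&
                       fabs(x[i + n] - x[j + n]) < eps)
                    ++n;                 /* line length starting at (i, j) */
                if (n >= 2) { ++lines; total += n; }
                if (n > 0) j += n - 1;   /* jump past this line */
            }
        }
        if (lines == 0) return 0.0;
        double avg = total / (double)lines;        /* <n> of Eq. (77) */
        if (avg <= 2.0) return 0.0;                /* guard for Eq. (78) */
        return log((avg - 1.0) / (avg - 2.0));     /* Eq. (78) */
    }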

Influence of noise on correlation integral

Let us modify the definition of DET_n in such a way that the influence of noise on the entropy can be estimated analytically. First we change Eq. (73) to the equivalent form

$$DET_n(\varepsilon) = \sum_{i}^{N}\sum_{j\neq i}^{N}\Theta\!\left(\sum_{k=0}^{l}\Theta(\varepsilon - |x_{i+k} - x_{j+k}|) - n\right), \qquad (80)$$

where l is the length of the recurrence line beginning at the point (i, j). Eq. (80) is valid provided one assumes Θ(0) = 1 for the Heaviside function. The function Θ in Eq. (73) is called a kernel function [40] and can be written in a general way as ρ_ε(r). Now let us use the fact [40] that the kernel function can be replaced by any monotonically decreasing function ρ_ε(r) with a bandwidth ε such that $\lim_{r\to 0} r^{-p}\rho_\varepsilon(r) = 0$ for ε > 0 and any p ≥ 0. The bandwidth ε of the kernel function corresponds to the threshold ε. It follows that we can replace the inner Θ(ε − r) function in Eq. (80) by a new, linear, continuous function

$$\Theta(\varepsilon - r) \;\Rightarrow\; \rho_\varepsilon(r) = \begin{cases} \dfrac{\varepsilon - r}{\varepsilon}, & 0 \le r \le \varepsilon \\ 0, & r \ge \varepsilon \end{cases} \qquad (81)$$

and simultaneously we lower the threshold in the outer Θ function by the constant $\beta = \frac{1}{\sqrt{\pi}}$. We have checked that other choices of β give similar results. Now instead of Eq. (80) we have

$$DET'_n(\varepsilon) = \sum_{i}^{N}\sum_{j\neq i}^{N}\Theta\!\left(\sum_{k=0}^{n}\frac{\varepsilon - |x_{i+k} - x_{j+k}|}{\varepsilon} - \beta\, n\right). \qquad (82)$$

We use the above expression to calculate the mean line length ⟨n⟩. In practice, the length of each line is calculated as the maximal value of the parameter n in Eq.


(82) for which the Θ function equals 1. Having ⟨n⟩, we calculate the system entropy K2 using Eq. (78). Now let us consider the influence of uncorrelated Gaussian noise η_i added to the observed system variable x_i. Equation (82) is replaced by the following approximation:

$$DET'_n(\varepsilon) = \sum_{i}^{N}\sum_{j\neq i}^{N}\Theta\!\left(\sum_{k=0}^{n}\frac{\varepsilon - |x_{i+k} + \eta_{i+k} - x_{j+k} - \eta_{j+k}|}{\varepsilon} - \beta\, n\right) \simeq \sum_{i}^{N}\sum_{j\neq i}^{N}\Theta\!\left(\sum_{k=0}^{n}\frac{\varepsilon - |x_{i+k} - x_{j+k}|}{\varepsilon} - \frac{n\left(\sqrt{\alpha^2\varepsilon^2 + 2\sigma^2} - \alpha\varepsilon\right)}{\varepsilon} - \beta\, n\right), \qquad (83)$$

where σ is the standard deviation of the noise and α is a constant of order 1 that depends on the distribution of |x_i − x_j|. One can easily derive Eq. (83) assuming that σ_x ≈ αε, where σ_x stands for the standard deviation of |x_i − x_j| ∈ (0, ε). When the differences |x_i − x_j| are uniformly distributed in the region (0, ε), then $\alpha = \frac{1}{\sqrt{3}}$. Comparing Eq. (83) to Eq. (80) and Eq. (82), we see that the effect of noise corresponds formally to the change

$$n \;\to\; n\left(1 + \sqrt{\pi}\,\frac{\sqrt{\frac{\varepsilon^2}{3} + 2\sigma^2} - \frac{\varepsilon}{\sqrt{3}}}{\varepsilon}\right) \qquad (84)$$

Instead of the second term on the rhs of Eq. (79) we have

$$-n\tau K_2 \;\to\; -n\tau K_2(\varepsilon)\left(1 + \sqrt{\pi}\,\frac{\sqrt{\frac{\varepsilon^2}{3} + 2\sigma^2} - \frac{\varepsilon}{\sqrt{3}}}{\varepsilon}\right) \qquad (85)$$

For small noise (σ ≪ ε) the last equation can be transformed to

$$-n\tau K_2 \;\to\; -n\tau K_2(\varepsilon)\left(1 + \frac{\sqrt{3\pi}\,\sigma^2}{\varepsilon^2}\right), \qquad (86)$$

which is in agreement with the well-known result [41, 42] for the noise entropy in the case of a noise spectrum S(ω) ∼ ω^{-2}:

$$K_{noisy} \sim \frac{1}{\varepsilon^2} \qquad (87)$$


Eq. (85) expresses the influence of noise on the line length n. On the other hand, Schreiber has shown [13] that the influence of noise can be described by the following substitution in Eq. (79):

$$D_2 \;\to\; D_2 + (n - r)\, g\!\left(\frac{\varepsilon}{2\sigma}\right), \qquad (88)$$

where

$$g(z) = \frac{2}{\sqrt{\pi}}\,\frac{z\, e^{-z^2}}{\operatorname{erf}(z)}, \qquad (89)$$

and the parameter r follows from the method of singular value decomposition used in [13]. Combining Eq. (79) with the results (85) and (88) we get

$$DET_n(\varepsilon) \sim \varepsilon^{\,D_2 + (n-r)\,g\left(\frac{\varepsilon}{2\sigma}\right)} \times \exp\!\left(-n\tau K_2(\varepsilon)\left(1 + \sqrt{\pi}\,\frac{\sqrt{\frac{\varepsilon^2}{3} + 2\sigma^2} - \frac{\varepsilon}{\sqrt{3}}}{\varepsilon}\right)\right), \qquad (90)$$

where K2(ε) is the coarse grained entropy of the clean signal. The explicit form of the function K2(ε) is unknown. A good fit that seems to be valid for several systems is

$$K_2(\varepsilon) = \kappa + b\,\ln(1 - a\varepsilon), \qquad (91)$$

where the constant κ corresponds to the correlation entropy while the second term describes the effect of the coarse graining. We stress here that the precise form of the latter function is not needed for our approach to noise level estimation, because we are left with some free parameters. It follows that one can estimate the coarse-grained entropy of the signal with noise as

$$K_{noisy}(\varepsilon) = \frac{-d\,\ln(DET_n(\varepsilon))}{dn} = -\frac{1}{\tau}\, g\!\left(\frac{\varepsilon}{2\sigma}\right)\ln\varepsilon + K_2(\varepsilon)\left(1 + \sqrt{\pi}\,\frac{\sqrt{\frac{\varepsilon^2}{3} + 2\sigma^2} - \frac{\varepsilon}{\sqrt{3}}}{\varepsilon}\right) \qquad (92)$$


where the function g(·) corresponds to the influence of noise on the correlation dimension, while the second term can be split into the coarse-grained entropy of the clean signal K2(ε) and the linear increase of this entropy due to the presence of the external noise, $K_2(\varepsilon)\left(1 + \sqrt{\pi}\,\frac{\sqrt{\varepsilon^2/3 + 2\sigma^2} - \varepsilon/\sqrt{3}}{\varepsilon}\right)$. To estimate the noise level σ one can use the above dependence of the correlation entropy K_noisy(ε) on the threshold ε. However, we have found that, because of the peculiar behavior of K_noisy(ε), it is more convenient to fit the function K_noisy(ε)·ε^p instead of K_noisy(ε) to the corresponding experimental data (p is a constant of order 1). It follows that we need to estimate five free parameters κ, σ, a, b and c for the function

$$K_{noisy}(\varepsilon)\cdot\varepsilon^p = -c\,\varepsilon^p\, g\!\left(\frac{\varepsilon}{2\sigma}\right)\ln\varepsilon + \left(\kappa + b\,\ln(1 - a\varepsilon)\right)\varepsilon^p\left(1 + \sqrt{\pi}\,\frac{\sqrt{\frac{\varepsilon^2}{3} + 2\sigma^2} - \frac{\varepsilon}{\sqrt{3}}}{\varepsilon}\right) \qquad (93)$$

The parameter c (which ranges typically from 0.5 to 0.7) has been introduced for better agreement with the numerical data. To fit the above function we have used the Levenberg-Marquardt method [43]. We stress that we do not need to assume any input values for the above coefficients; they emerge from the application of our method. An important feature of the plot of K_noisy(ε)ε^p for noisy data is the appearance of two maxima34. This feature is helpful for the noise estimation, since the origins of these maxima are related to the first and second parts of the rhs of Eq. (93): the first maximum is connected with the noise level, while the second maximum reflects the finiteness of the attractor. For a high noise level both maxima merge. The position of the first peak, or of the single maximum, can be used for an additional noise estimate, because one can find that for

$$p = 0.3441717 - \frac{1}{\ln\sigma} \qquad (94)$$

the maximum of K_noisy(ε)ε^p appears at ε = σ. Relation (94) gives us a second way, besides Eq. (93), to estimate the noise level and to check the results obtained from the fitting of (93).
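Relation (94) suggests a simple iteration: guess σ, set p from (94), locate the position ε* of the first maximum of K_noisy(ε)ε^p, and update σ = ε* until the two agree. A schematic C sketch follows; peak_position() is a hypothetical helper that scans the measured K_noisy(ε) curve, and the tolerance and iteration cap are arbitrary.

    #include <math.h>

    /* Hypothetical helper: position eps* of the first maximum of
       K_noisy(eps) * eps^p along the measured entropy curve. */
    extern double peak_position(double p);

    /* Iterate relation (94); at the fixed point, eps* = sigma. */
    double noise_sigma_from_peak(double sigma0)
    {
        double sigma = sigma0;          /* initial guess, 0 < sigma < 1 */
        for (int it = 0; it < 50; ++it) {
            double p   = 0.3441717 - 1.0 / log(sigma);
            double eps = peak_position(p);
            if (fabs(eps - sigma) < 1e-8 * sigma)
                return eps;             /* converged */
            sigma = eps;
        }
        return sigma;
    }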

Noise estimation for financial time series

We calculate the noise level NTS of the Dow Jones Index using the methodology above. NTS (Noise-to-Signal) is defined as follows:

34 K. Urbanowicz, J.A. Hołyst, “Noise level estimation using coarse-grained entropy”, Phys. Rev. E 67, 046218 (2003), arXiv:cond-mat/0301326.


$$NTS = \frac{\sigma_{noise}}{\sigma_{data}}, \qquad (95)$$

where σ_noise is the standard deviation of the noise and σ_data is the standard deviation of the data. One should mention that the percentage of noise is calculated as %Noise = NTS². On the plot below one can see that the noise level, i.e. %Noise, is dominant in this data (and in others as well).

How quickly and properly calculate Value-at-Risk for the whole portfolio?

On the basis of Objective Value Theory, we construct a way of calculating Value-at-Risk (VaR) for a portfolio in a market with a power-law distribution of returns. This methodology should be very fast compared with Monte Carlo simulations, because we only solve the integrals numerically. The portfolio P is composed of N instruments; the value of the portfolio at time t is:

$$P_t = \sum_{i=1}^{N} p_i\, x_{i,t}, \qquad (96)$$

where x_{i,t} is the price of instrument i at time t and p_i is the number of shares of instrument i in the portfolio.

The risk of the portfolio according to Markowitz is:


$$Risk_p = \sigma_p^2 = \sum_{i=1}^{N} p_i^2\, \sigma_i^2, \qquad (97)$$

where σ_i is the standard deviation of the historical returns of the price of instrument i. The above equation can be approximately rewritten as follows:

$$Risk_p = \sigma_p^2 = \sum_{i=1}^{N} p_i^2 \int_{-\infty}^{\infty} f_{y_i}\, y_i^2\, dy_i, \qquad (98)$$

where $f_{y_i}$ is the probability distribution function of the returns. The Value-at-Risk, abbreviated VaR, reads:

$$VaR_p = \sum_{i=1}^{N} p_i^2 \int_{-\infty}^{-\beta_i} f_{y_i}\, y_i^2\, dy_i \qquad (99)$$

The value β_i/σ_i is the same for all instruments if the returns of these instruments are independent of each other. Otherwise, the threshold β_i should be calculated from the covariance matrix.

Suppose a Student-t distribution with α_i degrees of freedom for the price returns of instrument i; the Student-t distribution is given by Eq. (33). The degrees of freedom α can be computed by the Maximum Likelihood Method. In the following discussion we will use the Objective Value w. For the Student-t distribution, w is as follows:

$$w(y) = \frac{\delta + 1}{2}\ln\left(1 + \frac{y^2}{\delta}\right) \qquad (100)$$

We calculate the average value ⟨w⟩ for each instrument. The Objective Value for the normal distribution is the variance, so we can use the table for a normal distribution to calculate β; such tables are widely available. We do this so that for VaR at the 5% level we have β = 1.65σ. The relationship between ⟨w⟩ and β is therefore:

$$\beta = 1.65\sqrt{\langle w\rangle} \qquad (101)$$

We calculate the average value as follows:

$$\langle w\rangle = \sum_{j=0}^{t} \frac{\alpha_i + 1}{2}\ln\left(1 + \frac{y_{i,j}^2}{\alpha_i}\right), \qquad (102)$$

where i is the index of the instrument and j is the time index.
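A minimal sketch of Eqs. (100) to (102) in C is given below. Since Eq. (102) is printed without an explicit normalization, we divide by the number of observations T to obtain an average; this normalization and the function names are our own assumptions.

    #include <math.h>

    /* Average Objective Value <w> of the returns y[0..T-1] of one
       instrument, for a Student-t distribution with alpha degrees of
       freedom (Eq. (102); normalized by T here as an assumption). */
    double avg_obv(const double *y, int T, double alpha)
    {
        double s = 0.0;
        for (int j = 0; j < T; ++j)
            s += 0.5 * (alpha + 1.0) * log(1.0 + y[j] * y[j] / alpha);
        return s / (double)T;
    }

    /* Threshold for VaR at the 5% level, Eq. (101). */
    double var_threshold_5pct(const double *y, int T, double alpha)
    {
        return 1.65 * sqrt(avg_obv(y, T, alpha));
    }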

We still have to take into account the correlations between the returns of the financial instruments. We substitute the average value of w into formula (101)


and compute the matrix of thresholds β. We calculate the eigenvalues and eigenvectors of the matrix β using Singular Value Decomposition. We now have vectors that are independent in space, so we can use equation (99). The final formula for the VaR of the portfolio reads:

$$VaR_p = \sum_{i=1}^{K} p_i^2 \int_{-\infty}^{-1.65\nu_i} f_{y_i}\, y_i^2\, dy_i, \qquad (103)$$

where ν_i is an eigenvalue of the matrix β and K is the number of significant eigenvalues of the matrix β, which we take from Eq. (101). We should mention that Eq. (103) gives the VaR at the 5% level. The corresponding equation for VaR at the 1% level reads:

$$VaR_p = \sum_{i=1}^{K} p_i^2 \int_{-\infty}^{-2.33\nu_i} f_{y_i}\, y_i^2\, dy_i \qquad (104)$$
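For orientation, the tail integrals in Eqs. (103) and (104) can be evaluated with a simple quadrature. The sketch below is our illustration, assuming a Student-t density with α > 2 so that the moment is finite; the cutoff standing in for minus infinity and the step count are arbitrary.

    #include <math.h>

    /* Student-t probability density with alpha degrees of freedom. */
    static double t_pdf(double y, double alpha)
    {
        double c = tgamma(0.5 * (alpha + 1.0)) /
                   (sqrt(alpha * 3.14159265358979323846) *
                    tgamma(0.5 * alpha));
        return c * pow(1.0 + y * y / alpha, -0.5 * (alpha + 1.0));
    }

    /* Tail second moment: integral of f(y) y^2 dy from -inf to -theta,
       approximated on (-cut, -theta) by the trapezoidal rule.
       Assumes alpha > 2 so the moment is finite. */
    double tail_moment(double theta, double alpha)
    {
        const double cut = 50.0;    /* effective minus infinity */
        const int    n   = 20000;
        double h = (cut - theta) / n, s = 0.0;
        for (int k = 0; k <= n; ++k) {
            double y = -cut + k * h;
            double f = t_pdf(y, alpha) * y * y;
            s += (k == 0 || k == n) ? 0.5 * f : f;
        }
        return s * h;
    }

With theta = 1.65 ν_i (or 2.33 ν_i for the 1% level), summing p_i² times this integral over the K significant eigen-directions reproduces Eqs. (103) and (104).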


Examples of automated strategies35

Here we would like to present automated investment strategies that can be derived from the above theories. We also provide practical implementations as code in the C programming language36. We should emphasize that these strategies need further work and more backtesting before being used in real trading; they are just examples of how one can create automated strategies based on a scientific approach.

Strategy I

The strategy is based on the principle of minimizing the entropy of the time series. Returns are calculated from the prices. Then one point (the potential future change in price) is appended to the end of the series. This estimated return takes values from -1.9 to 1.9 standard deviations of the returns. Nine different variants of the future return are assumed, and we choose the variant for which the entropy of the N + 1 data points is minimal. The variant with the minimum value of entropy is normalized and returned as the prediction f. The entropy is calculated using the average length of lines in the recurrence diagram.
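A schematic sketch of Strategy I in C (ours; the full implementation is available from the author by email). Here entropy_of() is a placeholder for the recurrence-line entropy estimate described earlier, for instance the k2_from_lines() sketch above.

    /* Placeholder for the recurrence-line entropy estimate. */
    extern double entropy_of(const double *r, int n);

    /* Strategy I sketch: try 9 candidate future returns between -1.9 and
       +1.9 standard deviations, keep the one minimizing the entropy of
       the extended series, and return it normalized as the forecast f.
       The array r must have room for N + 1 points. */
    double strategy1_forecast(double *r, int N, double sd)
    {
        double best = 0.0, best_e = 1e300;
        for (int v = 0; v < 9; ++v) {
            double cand = (-1.9 + v * (3.8 / 8.0)) * sd;  /* 9 variants */
            r[N] = cand;
            double e = entropy_of(r, N + 1);
            if (e < best_e) { best_e = e; best = cand; }
        }
        return best / (1.9 * sd);   /* normalized forecast in [-1, 1] */
    }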

Strategy II

The strategy takes into account the differences between the entropies of order 1 and 2 (Shannon entropy and correlation entropy). If the histogram of returns is flat (uniform distribution), there is no difference between the entropies. If the histogram is strongly leptokurtic (or concentrated in one or more points), there are large differences between the entropies.

Returns are calculated from the prices. The strategy is based on separating the returns into those greater and those less than zero. The ratio K1/K2 is then calculated for the positive and the negative returns, where K1 is the Shannon entropy and K2 is a simplified version of the correlation entropy. Next, one calculates f, the normalized difference of the left and right statistics, i.e. f = (K1/K2)_left - (K1/K2)_right; f indicates which direction the price will take in the future.
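A minimal sketch of the two entropies, under our own simplification: K1 is the Shannon entropy of a histogram of the returns and K2 is the order-2 Rényi entropy of the same histogram, standing in for the 'simplified correlation entropy'. The statistic f would then be built by calling the routine separately on the negative and the positive returns.

    #include <math.h>

    /* Shannon entropy K1 and order-2 Renyi entropy K2 of a histogram with
       B bins (B <= 64 assumed) over the values v[0..n-1] in [vmin, vmax]. */
    static void k1_k2(const double *v, int n, double vmin, double vmax,
                      int B, double *K1, double *K2)
    {
        double p[64] = {0.0};
        for (int i = 0; i < n; ++i) {
            int b = (int)((v[i] - vmin) / (vmax - vmin) * B);
            if (b < 0)  b = 0;
            if (b >= B) b = B - 1;
            p[b] += 1.0 / n;
        }
        double s1 = 0.0, s2 = 0.0;
        for (int b = 0; b < B; ++b) {
            if (p[b] > 0.0) s1 -= p[b] * log(p[b]);
            s2 += p[b] * p[b];
        }
        *K1 = s1;          /* Shannon entropy */
        *K2 = -log(s2);    /* order-2 Renyi entropy */
    }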

Strategy III

The strategy takes into account the differences between the entropies of order 2 and 3 (correlation entropy and Rényi entropy37 with q = 3). If the histogram of returns is flat (uniform distribution), there is no difference between the entropies. If the histogram is strongly leptokurtic (or concentrated in one or more points), there are large differences between the entropies. The Shannon entropy (q = 1) differs significantly in practice from the other entropies, so there is a significant difference between Strategies II and III. Returns are calculated from the prices. The strategy is based on separating the returns into those greater and those less than zero. The ratio K2/K3 is then calculated for the positive and the negative returns, where K2 is the correlation entropy and K3 is a

35 Codes in C are available by email contact.
36 Codes in C are available by email contact.
37 http://en.wikipedia.org/wiki/R%C3%A9nyi_entropy

simplified version of the Rényi entropy with q = 3. Next, one calculates f, the normalized difference of the left and right statistics, i.e. f = (K2/K3)_left - (K2/K3)_right; f indicates which direction the price will take in the future.

Strategy IV

The strategy is based on the difference between the Gaussianity of the negative and the positive returns. Returns are calculated from the prices. We first fit the power-law exponent of the Student-t distribution to the histograms of the negative and the positive returns. The ratio Kq/w is then calculated for the positive and the negative returns, where Kq is the Rényi entropy with q suitably associated with the power-law exponent, and the objective value w is calculated for the Student-t distribution (see 38). Next, one calculates f, the normalized difference of the left and right statistics, i.e. f = (Kq/w)_left - (Kq/w)_right; f indicates which direction the price will take in the future.

Strategy V

The strategy is based on the principle of minimizing a modified, dynamical entropy of the time series. It is a hybrid of Strategies I and II, although quite different from the latter. Returns are calculated from the prices. Then we calculate the dynamical entropy for the negative and the positive future returns (as in Strategies I, II and III). The forecast f is then the dynamical entropy of the negative side minus that of the positive side, appropriately normalized. Dynamical entropy requires a proper time direction, which distinguishes it from static entropy, where such behavior is optional (dynamical entropy is the creation of static entropy in time). The dynamical entropy is calculated using the average length of lines in the recurrence diagram, as in Strategy I.

Strategy VI

Trend trading

The strategy has N past prices available and uses one of the statistics from the previous strategies. The past window of data is split into two windows: the first (further in the past) and the second (most recent). On the first window of d data points we generate the forecast f. If the forecast f proves to perform well on the second window, we keep it as the predictor of the overall strategy for the next step. If the forecast f from the first window does not prove to be a good predictor, we multiply it by -1 and use that as the current predictor.
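A schematic C sketch of this trend-trading meta-rule (our illustration): forecast_on() is a placeholder for any of the statistics of Strategies I to V, and the forecast is kept or sign-flipped according to how well its sign agreed with the returns of the recent window.

    /* Placeholder: any of the forecast statistics of Strategies I-V. */
    extern double forecast_on(const double *r, int n);

    /* Strategy VI sketch: fit f on the older window r[0..d-1], test the
       sign agreement of f with the recent window r[d..N-1], and flip the
       sign of f if it underperformed. */
    double strategy6_forecast(const double *r, int N, int d)
    {
        double f = forecast_on(r, d);
        int hits = 0, total = 0;
        for (int t = d; t < N; ++t) {
            ++total;
            if (f * r[t] > 0.0) ++hits;       /* correct sign */
        }
        return (2 * hits >= total) ? f : -f;  /* flip if poor */
    }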

38 K. Urbanowicz, P. Richmond, and J.A. Holyst, Risk evaluation with enhanced covariance matrix, Physica A 384 (2), 468-474, (2007).


Bibliography

[1] Planck, M. (1914). The Theory of Heat Radiation. Masius, M. (transl.) (2nd ed.). P. Blakiston's Son & Co. OL 7154661M.
[2] Planck, M. (1915). Eight Lectures on Theoretical Physics. Wills, A. P. (transl.). Dover Publications. ISBN 0-486-69730-4.
[3] Draper, J.W. (1847). On the production of light by heat, London, Edinburgh and Dublin Philosophical Magazine and Journal of Science, series 3, 30: 345–360.
[4] Heisenberg, W. (1927), "Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik", Zeitschrift für Physik 43 (3–4): 172–198, Bibcode:1927ZPhy...43..172H, doi:10.1007/BF01397280. Annotated pre-publication proof sheet, March 23, 1927.
[5] Shannon, C.E. (1948), "A Mathematical Theory of Communication", Bell System Technical Journal, 27, pp. 379–423 & 623–656, July & October, 1948.
[6] Recherches sur la théorie des quanta (Researches on the quantum theory), Thesis, Paris, 1924; Ann. de Physique (10) 3, 22 (1925).
[7] Sinha, U., Couteau, C., Jennewein, T., Laflamme, R., Weihs, G., "Ruling out multi-order interference in quantum mechanics", Science 329(5990), 418–421 (2010), doi:10.1126/science.1190545.
[8] S. Wiesner (1983). "Conjugate coding". Association for Computing Machinery, Special Interest Group in Algorithms and Computation Theory 15: 78–88.
[9] Krzysztof Urbanowicz, "Information Theory with experimental results", DOI: 10.13140/RG.2.1.4023.2169, https://www.researchgate.net/publication/281283287_Theory_of_Information_with_experimental_results
[10] K. Urbanowicz, J.A. Hołyst, "Noise level estimation using coarse-grained entropy", Phys. Rev. E 67, 046218 (2003), arXiv:cond-mat/0301326.
[11] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis (Cambridge University Press, Cambridge, 1997).
[12] H.D.I. Abarbanel, Analysis of Observed Chaotic Data (Springer, New York, 1996).
[13] T. Schreiber, Phys. Rev. E 48(1), 13 (1993).
[14] J.D. Farmer and J.J. Sidorowich, Physica D 47, 373–392 (1991).
[15] C. Diks, Phys. Rev. E 53(5), 4263 (1996).
[16] Dejin Yu, M. Small, R.G. Harrison and C. Diks, Phys. Rev. E 61(4), 3750 (2000).
[17] R. Cawley and Guan-Hsong Hsu, Phys. Rev. A 46(6), 3057 (1992).
[18] H. Oltmans and P.J.T. Verheijen, Phys. Rev. E 56(1), 1160 (1997).
[19] J.-P. Eckmann, S. Kamphorst and D. Ruelle, Europhys. Lett. 4, 973–977 (1987).
[20] L.L. Trulla, A. Giuliani, J.P. Zbilut and C.L. Webber Jr., Phys. Lett. A 223, 255–260 (1996).
[21] C. Manetti, M.A. Ceruso, A. Giuliani, C.L. Webber Jr. and J.P. Zbilut, Phys. Rev. E 59, 992–998 (1999).
[22] J.P. Zbilut, A. Giuliani and C.L. Webber Jr., Phys. Lett. A 246, 122 (1998).
[23] J.A. Hołyst, M. Żebrowska and K. Urbanowicz, European Physical Journal B 20, 531–535 (2001).
[24] F.M. Atay and Y. Altintas, Phys. Rev. E 59(6), 6593 (1999).


[25] J.M. Chai, B.H. Bae and S.Y. Kim, Phys. Lett. A 263, 299–306 (1999).
[26] L. Matassini, H. Kantz, J. Hołyst and R. Hegger, Phys. Rev. E 65, 021102 (2002).
[27] R. Hegger, H. Kantz, L. Matassini and T. Schreiber, Phys. Rev. Lett. 84, 4092 (2000).
[28] P. Faure and H. Korn, Physica D 122, 265–279 (1998).
[29] The coarse-grained Kolmogorov-Sinai entropy is related to the so-called ε-entropy introduced by Shannon, see C.E. Shannon, Bell Syst. Techn. J. 27, 379 and 623 (1948).
[30] P. Gaspard and Xiao-Jing Wang, Phys. Rep. 235, 291–343 (1993).
[31] A. Ostruszka, P. Pakoński, W. Słomczyński and K. Życzkowski, Phys. Rev. E 62, 2018–2029 (2000).
[32] A. Cohen and I. Procaccia, Phys. Rev. A 31, 1872 (1985).
[33] A Celeron 400 MHz processor has been used for the numerical calculations.
[34] P. Grassberger, R. Hegger, H. Kantz, C. Schaffrath and T. Schreiber, Chaos 3(2), 127 (1993).
[35] E.J. Kostelich and T. Schreiber, Phys. Rev. E 48(3), 1752 (1993).
[36] P. Grassberger and I. Procaccia, Phys. Rev. A 28, 2591 (1983).
[37] G. Benettin, L. Galgani and J.M. Strelcyn, Phys. Rev. A 14(6), 2338 (1976).
[38] K. Pawelzik and H.G. Schuster, Phys. Rev. A 35, 481 (1987).
[39] P. Grassberger and I. Procaccia, Phys. Rev. Lett. 50(5), 346 (1983).
[40] J.-M. Ghez and S. Vaienti, Nonlinearity 5, 777–790 (1992).
[41] G. Boffetta, M. Cencini, M. Falcioni and A. Vulpiani, Physics Reports 356, 367–474 (2002).
[42] M. Cencini, M. Falcioni, E. Olbrich, H. Kantz and A. Vulpiani, Phys. Rev. E 62(1), 427 (2000).
[43] W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery, Numerical Recipes in C (Cambridge University Press, second edition, 1992).
[44] L. Jaeger and H. Kantz, Physica D 105, 79–96 (1997).
[45] Shuxian Wu, Proceedings of the IEEE, vol. 75, No. 8 (1987).
[46] L.O. Chua and G.-N. Lin, IEEE Transactions on Circuits and Systems 37(7), 885–902 (1990).
