Arxiv:1405.0126V1 [Cs.IT] 1 May 2014 Aint Eitgae Ihpeiu Eois Ec Gener Information

arXiv:1405.0126v1 [cs.IT] 1 May 2014 aint eitgae ihpeiu eois ec gener information. hence integrated memories, ating previous with integrated information. be obse conscious to generate a vation for to requirement require- the observation establishes the second conscious The establishes a experiment for slightly thought adapt ment first we The which here. experiments, thought two through unified, a of actions system. the singular reflects which brain information, human behaviour integrates the producing it that that say is we consciousness when produces mean Accord we structure, what information. organizational Tononi, integrate to systems’ to be capacity a thus its of specifically can terms and in phenomenon processing quantified consciousnes information mean- that an proposes be theory is to The issue addressed. this ingfully allows which framework theoretical sub- to rise give to combine experiences. correlates jective neural these color, understan how is ing remains of sha that question of experience the hard perception The the the of 2008). with (Tononi, impair interfere areas can to lesions certain other shown while to been damage has example, cortex to For consciousness of precise aspects uncovered. allowing different of are correlates neuroscience neural in advances Continuing aacmrsin ouaiyo mind. of modularity compression; data Keywords: mod be cannot it exists, computationally. consciousness unitary lossles resul if algo- This that complete using plies that functions. idea noncomputable prove requires We this integration of integrati theory. formalization lossless information a a rithmic provide as and consciousness i necessi frame process it propose to would we natural integration memories, more existing lossy to Since damage continuous tate de loss. 2014) Griffith, information (e.g. on form information previous integrated that of consciou argue izations of We theory information. integrated (2008) Tononi’s as review ness we article this In ooi(08 xlistefudtoso i theory his of foundations the explains (2008) Tononi a provides theory information integrated (2008) Tononi’s sCncosesCmual?QatfigItgae Info Integrated Quantifying Computable? Consciousness Is osiuns;itgae nomto;synergy; information; integrated Consciousness; Introduction Abstract eec aur ([email protected]) Maguire Rebecca loihi nomto Theory Information Algorithmic colo uies ainlCleeo Ireland of College National Business, of School hlMgie([email protected]) Maguire Phil hlpeMsr([email protected]) Moser Philippe iglGifih([email protected]) Griffith Virgil optto n erlSystems, Neural and Computation eateto optrScience Computer of Department atc,Psdn,California Pasadena, Caltech, U anoh Ireland Maynooth, NUI FC uln1 Ireland 1, Dublin IFSC, elled im- t pend al- ve s- thus s s - ing be pe d- r- s - ca ml eetrs htnwi a itnus between distinguish can it now that so detector smell ficial nomto ln sntsfcetfrcncosexperienc conscious for sufficient not tha is establishes alone experiment information thought second (2008) Tononi’s Information Integrated Generating 2: Requirement information). among generate discriminate must they they (i.e. where alternatives context many a within m situated experiences of be Descriptions its colours. on other depends with would ‘red’ contrast ‘red’ of as informativeness object The an the meaning. labeling no coloured of hold act was the world red, whole of po shade the alternative same if of example, range For a sibilities. experience to an relative of quality expressed the necessarily that is experiment thought tial information. of we 000psil tts iliglog yielding states, possible 10,000 tween as smell a identifies human a late When 2008). ( al., nose detected et the berts smells lining neurons different humans receptor olfactory contrast, 10,000 specialized In by than bit. more single a distinguish one, by binary can encoded a be thus is can T lavender and aspects. and molecule other chocolate single all between be a ignoring distinction may onto scents, two it latching the example, is separates For detector that the do. that humans case arom as the full manner the same experienced but has the lavender, it in from that guarantee chocolate not distinguish does Clearl this to chocolate? managed of has smell it the experienced actually has tor flashes sensor the and etrol ed odsigihbtentopsil smell de- possible the two between case distinguish this to In needs only lavender. tector and o flavors chocolate two candle: producing currently scented Let’ is boxes. factory conveyor appropriate the the the that to suppose on them fo passing directing used candles and below is the belt detector of The in- aroma candles detector. the smell sampling scented artificial producing an factory in vests a that imagine Let’s Information Generating 1: Requirement mgn httesetdcnl atr nacstearti- the enhances factory candle scented the that Imagine h motn on htTnn 20)rie ihhsini- his with raises (2008) Tononi that point important The ac fcooaesetdcnlsi asdunderneath passed is candles scented chocolate of batch A hyaegnrtn epnewihdsigihsbe- distinguishes which response a generating are they chocolate mto Using rmation a esyta h detec- the that say we Can . 2 10 , 000 = 13 choco- . bits 3 Al- ust e. he s- is s. y a s f r t 1 million different smells, even more than the human nose. The same reasoning can be used to explain why a video Can we now say that the detector is truly smelling chocolate camera, which generates plenty of information, remains un- when it outputs chocolate, given that it is producing more in- conscious, in contrast to a person viewing the same scene. formation than a human? What is the difference between the The memories generated by the video camera can be easily detector’s experience and the human experience? edited independently of each other. For example, I can de- Like the human nose, the artificial smell detector uses spe- cide to delete all of the footage recorded yesterday between cialized olfactory receptors to diagnose the signature of the 2pm and 4pm. In contrast, a person viewing the same scenes scent and then looks it up in a database to identify the appro- encodes information in an integrated fashion. I cannot delete priate response. However, each smell is responded to in iso- Amy’s memories from yesterday because all of her memo- lation of every other. The exact same response to a chocolate ries from today have already been influenced by them. The scent occurs even if the other 999,999 entries in the database two sets of memories cannot easily be disentangled. When it are deleted. The factory might as well have purchased a mil- comes to human consciousness it is not possible to identify lion independent smell detectors and placed them together any simple divisions or disjoint components. in the same room, each unit independently recording and re- What Tononi’s (2008) theory proposes is that when peo- sponding to its own data. ple use the term ‘consciousness’ to describe the behaviour of According to Tononi (2008), the information generated by an entity they have the notion of integrated information in such a system differs from that generated by a human insofar mind. We attribute the property of being conscious to sys- as it is not integrated. Because it may as well be composed tems whose responses cannot easily be decomposed or disin- of individual units, each with the most limited repertoire, an tegrated into a set of causally independent parts. In contrast, unintegrated set of responses cannot yield a subjective expe- when we say that a video camera is unconscious, what we rience. To bind the repertoire, a system must generated in- mean is that the manner in which it responds to visual stimuli tegrated information. Somehow, the response to the smell of is unaffected by the information it has previously recorded. chocolate must be encoded in terms of its relationship with other experiences. Quantifying Integrated Information Tononi (2008) seeks to formalize the measurement of inte- Consciousness as Integrated Information grated information. His central idea is to quantify the infor- Inside the human nose there are different neurons which are mation generated by the system as a whole above and beyond specialized to respond to particular smells. The process of the information generated independently by its parts. For in- detection is not itself integrated. For example, with selective tegrated information to be high, a system must be connected damage to certain olfactory receptors a person could conceiv- in such a way that information is generated by causal interac- ably lose their ability to smell chocolate while retaining their tions among rather than within its parts. ability to smell lavender. However, the human experience Assuming that the brain generates high levels of integrated of smell is integrated as regards the type of information it information, this implies that the encoding of a stimulus must records in response. be deeply connected with other existing information in the According to Tononi’s (2008) theory, when somebody brain. We now address the question of what form of process- smells chocolate the effect that it has on their brain is inte- ing might enable such integrated information to arise. grated across many aspects of their memory. Let’s consider, Griffith (2014) rebrands the informational difference be- for example, a human observer named Amy who has just ex- tween a whole and the union of its parts as ‘synergy’. He perienced the smell of chocolate. A neurosurgeon would find presents the XOR gate as the canonicalexample of synergistic it very difficult to operate on Amy’s brain and eliminate this (i.e. integrated) information. Consider, for example, a XOR recent memory without affecting anything else. According gate with two inputs, X1 and X2, which can be interpreted as to the integrated information theory, the changes caused by representing a stimulus and an original brain state. They com- her olfactory experience are not localised to any one part of bine integratively to yield Y , the resultant brain state which her brain, but are instead widely dispersed and inextricably encodes the stimulus. Given X1 and X2 in isolation we have intertwined with all the rest of her memories, making them no information about Y . The resultant brain state Y can only difficult to reverse. This unique integration of a stimulus with be predicted when both components are taken into account at existing memories is what gives experiences their subjective the same time. Given that the components X1 and X2 do not (i.e. observer specific) flavour. This is integrated information. have any independent causal influence on Y , all of the infor- In contrast, deleting the same experience in the case of an mation about Y here is integrated. artificial smell detector would be easy. Somewhere inside the One issue with presenting the XOR gate as the canonical system is a database with discrete variables used to maintain example of synergistic information is that it is lossy. A two the detection history. These variables can simply be edited to bit input is reduced to a single bit output, meaning that half erase a particular memory. The information generated by the the entropy has been irretrievably lost. If the brain integrated artificial smell detector is not integrated. It does not influence information in this manner, the inevitable cost would be the the subsequent information that is generated. It lies isolated, destruction of existing information. While it seems intuitive detached and dormant. for the brain to discard irrelevant details from sensory input, it seems undesirable for it to also hemorrhage meaningful achieves. content. In particular, memory functions must be vastly non- We begin with a brief description of algorithmic informa- lossy, otherwise retrieving them repeatedly would cause them tion theory (see Li and Vityányi, 2008, for more details). We to gradually decay. use strings to refer to finite binary sequence, i.e. an element ω We propose that the information integration evident in cog- of set 2< . Any finite object can be encoded into a string nition is not lossy. In the following sections we define a form in some natural way. We are interested in effective descrip- of synergy, based on data compression, which does not rely tions of strings (i.e. computable by a universal computer i.e. on the destruction of information, and subsequently explore Turing machine) . For a string x, its (plain) Kolmogorovcom- its implications. plexity C(x) is the length of the shortest effective description of x. More formally, fix a universal Turing machine U. C(x) Data Compression as Integration is the length of the shortest program x∗ such that U on input x∗ outputs x. It can be shown that the value of C(x) does not Data compression is the process by which an observation is depend on the choice of U up to an additive constant. C(x) is reduced by identifying patterns within it. For example the the amount of algorithmic information contained in x. A ran- sequence 4,6,8,12,14,18,20,24... can be simplified as the dom string is a string x that cannot be compressed, e.g. such description “odd prime numbers +1”. The latter representa- that C(x) is at least the length of x. tion is shorter than the original sequence, hence it evidences For two strings x,y the conditional Kolmogorov complex- data compression. ity C(x|y) of x given y is the size of the shortest program A close link exists between data compression and predic- q such that U on input p and provided y as an extra in- tion. Levin’s (1974) Coding Theorem demonstrates that, with put, outputs x. The information x has about y is defined as high probability, the most likely model that explains a set of I(x : y)= C(x) −C(x|y)=+ C(y) −C(y|x), where =+ means observations is the most compressed one. In addition, for equal up to a O(1) (constant) term. The idea of C-based syn- any predictable sequence of data, the optimal prediction of ergy (Griffith, 2014) is to define four intuitive slices of the the next item converges quickly with the prediction made by C-information of the function m : (x1,x2) 7→ y. the model which has the simplest description (Solomonoff, 1964). As per Occam’s razor, concise models make fewer 1. R: the amount of the C-information strings x1 and x2 con- assumptions and are thus more likely to be correct. vey redundantly about y, or, equivalently, the amount of These insights lay the foundation for a deep connection data compression that y achieves assuming statistically in- between data compression, prediction and understanding, a dependent inputs theoretical perspective on intelligence and cognisance which 2. U1: the amount of C-information that only string x1 con- we refer to as ‘compressionism’. Adopting this perspec- veys about y. tive, Maguire and Maguire (2010) propose that the binding of information we associate with consciousness is achieved 3. U2: the amount of C-information that only string x2 con- through sophisticated data compression carried out in the veys about y. brain, suggesting a link between this form of processing and 4. S: the amount of C-information the concatenation string, Tononi’s (2008) notion of information integration. (x1,x2) conveys about y not conveyed by either x1 or x2. In the case of an uncompressed string, every bit carries in- From the Partial Information Decomposition framework dependent information about the string. In contrast, when a (Williams & Beer, 2010), we have the following equalities text file is compressed to the limit, each bit in the final rep- relating the nonnegative scalars R, U , U , and S: resentation is fully dependent on every other bit for its sig- 1 2 nificance. No bit carries independent information about the R +U1 = I(x1 : y) original text file. For an uncompressed file, damaging the R +U2 = I(x2 : y) first bit leaves you with a 50% chance of getting the first bit R +U1 +U2 + S = I(x1,x2 : y) . right and 100% chance of getting the rest of the bits right. For an optimally compressed file, damaging the first bit cor- First, using the three equalities above we can define an easy rupts everything and leaves you with only a 50% chance of expression for the synergy minus the redundancy, getting all the bits right and a 50% chance of getting them I(x1,x2 : y) − I(x1 : y) − I(x2 : y)= S − R . all wrong. Clearly, the information encoded by the bits in the compressed file is more than the sum of its parts, highlighting Theorem 1 Given C(x1,x2|y)= 0, then S ≤ R with equality a link between data compression and Tononi’s (2008) concept when I(x1 : x2)= 0. of integrated information. Proof. Using the prior expression we expand the three C- In the following section we formally prove that, given the information slices into their respective C-entropies.

Partial Information Decomposition (Williams & Beer, 2010) S − R = I(x1,x2 : y) − I(x1 : y) − I(x2 : y) formulation of synergy, the amount of integrated informa- = C(x ,x ) −C(x ) −C(x ) −C(x ,x |y)+C(x |y)+ tion an information-lossless process produces on statistically 1 2 1 2 1 2 1 independent inputs is equivalent to the data compression it C(x2|y). Given that C(x1,x2|y)= 0, we know likewise that C(x1|y)= Quantifying Integration Using Edit Distance C(x |y)= 0; we simplify the above, 2 If data is optimally compressed then it becomes extremely difficult to edit in its compressed state. For example, imagine S − R = C(x1,x2) −C(x1) −C(x2) a compressed encoding of a Wikipedia page. You want to edit = C(x1)+C(x2|x1) −C(x1) −C(x2) the first word on the page. But where is this word encoded in = −I(x1 : x2). the compressed file? There is no easily delineated set of bits which corresponds to the first word and nothing else. Instead, From the above we have, the whole set of data has been integrated, with every bit from the original file depending on all the others. To discern the S = R − I(x : x ) . 1 2 impact that the first word has had on the compressed encoding you have to understand the compression. There are no Which entails S ≤ R with equality when I(x1 : x2)= 0. ⊓⊔ shortcuts. The above result shows that synergy (i.e. integrated infor- To formalize integrated information as data compression mation) is equivalent to redundancy (i.e. data compression) we consider a stimulus, first in its raw unintegrated state, and for lossless functions operating on statistically independent second, encoded in its integrated state within the brain. The inputs. However, an obstacle remains to expressing synergy level of integration is equivalent to the difficulty of identify- in this format. Although Griffith’s (2014) formulation of syn- ing the raw information and editing it within its integrated ergy identifies the link with data compression, giving a def- state. inition of the C-information slices R,U1,U2,S based on C- In the following definition z andz ¯ are the raw stimulus complexity is not trivial. and the brain encoded stimulus. We consider the difficulty ′ To quantify synergy for lossless functions using C- of editing z into z , for example, editing the smell of choco- complexity, Tononi’s (2008) definition of integrated infroma- late to turn it into the smell of lavender. If this operation tion must be somehow translated from its original operational is performed on a raw, unintegrated dataset then the task is framework of Shannon information theory to that of algorith- straight-forward: the bits that differ are simply altered. Con- mic information theory. We now show that the most natural sider, however, the challenge for the neurosurgeon operating way of performing this translation does not succeed. on Amy’s brain. If the stimulus has not been widely inte- Suppose the synergy of function (x,y) 7→ z is defined as grated then the neurosurgeon can concentrate on a single localised area of her brain and hopefully the encoding will be S0(x,y : z)= C(z|x)+C(z|y) −C(z|xy) −C(z|x ∩ y) overt, reflecting the original unintegrated format in which the information was originally transmitted. However, if the stim- where C(z|x ∩ y) is the shortest program that outputs z given ulus has been successfully integrated (i.e. compressed) then advice x or y, (i.e. the program outputs z on any of the two its encoding will reflect the overlap of patterns between it and advices x or y). The following result shows that, using this the entire contents of Amy’s brain. Its representation will be definition, the concatenation function turns out to have high widely distributed, with effects on all kinds of other memo- S0 synergy, which is anomalous. ries, making it impossible to isolate and edit. Theorem 2 Consider the concatenation function z(x,y) = We quantify the integration of an encoding process oper- xy. Then z is a lossless function of S0 synergy |z|/2. ating on a stimulus as the minimum informational distance between the original state of the encoded stimulus and any Proof. Pick two independent n/2-bit random strings x,y start- possible edited state. If every state is completely different to ing with 0 resp. 1 i.e. x = 0..., y = 1... and C(x|y)= n/2 the original, then the integration is 1; if there exists an edited and C(y|x)= n/2. state which is only trivially removed, the integration is 0. By definition of synergy For example, when an image on a digital camera is altered,

S0(x,y : z)= C(z|x)+C(z|y) −C(z|xy) −C(z|x ∩ y) the informational distance between the camera’s original and edited state is small. In contrast, the neurosurgeon strug- where the first two terms are n/2, the third is O(1), and the gles to edit the memories in Amy’s brain: changing even the last is n/2 because of the following program p. p is an O(1) slightest detail requires the contents of her brain to be com- instructions part followed by the bitwise XOR of x,y denoted pletely reconstructed. The edit distance is so great that her w, i.e. n/2 + O(1) bits total. Instructions: Given advice a, original brain state is largely useless for identifying a target XOR a with w to obtain d. If d starts with 0 output da, else edited brain state. output ad. So when a = x, d = y and we output ad = xy. Formally, the edit distance of m at point z is a number be- Similarly when a = y then d = x and we output da = xy, i.e. tween 0 and 1 that measures the level of integration of m(z). C(z|x ∩ y)= n/2. ⊓⊔ It is measured by looking at all strings z′ similar to z, and finding the one that minimizes the ratio of length of the short- In the following section, we outline an alternative strategy est description of m(z) given m(z′) to the length of shortest for defining integrated information using C-complexity. description of m(z). The smallest ratio obtained is the edit distance. Since the numerator is always positive and less or Amy’s brain and directly edit her conscious memories, be- equal to the denominator, the edit distance is between 0 and cause the process of integration is irreversibly complex. 1. This edit distance quantifies information integration for Yet Amy’s brain is a physical causal system which fol- lossless functions. lows the laws of physics. Information flows into Amy’s brain Definition 1 The edit distance of m at point z is given by conducted by nerve impulses and gets processed by neurons through biochemical signalling. Whatever information- C(m(z)|m(z′)) lossless changes result should theoretically be reversible. To min { }. z′6=z:C(z|z′)≤log|z| C(m(z)) argue otherwise seems to suggest that a form of magic is going on in the brain, which is beyond computationalmodelling. On the Computability of Integration McGinn (1991) points out that intractable complexity of In this section we prove an interesting result using the above the mind does not necessarily require the brain to transcend definition, namely that lossless information integration can- the laws of physics: instead, the intractability can have an ob- not be achieved by a computable process. server specific source. He argues that the mind-body problem According to the integrated information theory, when we is cognitively closed to humans in the same way that quantum think of another person as conscious we are viewing them as mechanics is closed to a zebra. This perspective, known as a completely integrated and unified information processing ‘new mysterianism’, maintains that the hard problem of con- system, with no feasible means of disintegrating their consciousness stems, not from a supernatural process, but from scious cognition into disjoint components. We assume that natural limits in how humans form concepts. their behaviour calls into play all of their memories and re- Similarly, the apparent unitary nature of consciousness flects full coordination of their sensory input. We now prove does not require a mystical process of integration which tran- that this form of complete integration cannot be modelled scends physical computability. Our result merely establishes computationally. a link between integration and irreversibility, the cause of An integrating function’s output is such that the informa- which can be due to limitations in the observer’s perspec- tion of its two (or more)inputs is completelyintegrated. More tive. While we intuitively assume that consciousness must formally, be a fundamental property as defined from a God’s eye perspective, the attribution of this property always takes place in Definition 2 A 1-1 function m : z = (z1,z2) 7→ z¯ is integrating a social context. When people attribute consciousness to a ′ ′ ′ ′ if for any strings z 6= z , C(z¯ | z¯) ≥ C(z¯ ) −C(z | z). system they are acknowledging a subjective inability to break i.e, the knowledge of m(z) does not help to describe m(z′), it down into a set of independent components, forcing them when z and z′ are close. to treat its actions as the behaviour of a unified, integrated whole. The irreversibilty here is observer-centric, as opposed Theorem 3 No integrating function is computable. to absolute. Rather than establishing a new property of con- Proof. Suppose m is a computable integrating function. Let sciousness, our result can therefore be interpreted as merely z be a random string, i.e. such that C(z) ≥ |z|. Let z′ be clarifying what is meant by the use of this concept. Specif- the string obtained by flipping the first bit of z. We have ically, conscious behaviour is that which is resistant to our C(z′ | z)= O(1). Consider the following program forz ¯′ given best attempts at decomposition. z¯: Cycle through all strings until the unique z is found such that m(z)= z¯. Compute z′ by flipping the first bit of z. Com- Neuroscientific Modelling putez ¯′ = m(z′). An alternative account is simply that consciousness does not Since m is computable, the program above is of constant exist: the unitary appearance of people’s behaviour is recog- size i.e., C(z¯′ | z¯)= O(1). Also C(z¯′)=+ C(z′)=+ C(z) ≥ |z| nizable as an illusion. Dennett (1991) adopts this perspective because m is computable, 1-1 and by choice of z. with his multiple drafts model. He views consciousness as Because m is integrating, we have C(z¯′ | z¯) ≥ C(z¯′) − being inherently decomposable, criticizing the idea of what C(z′ | z)= |z|− O(1), a contradiction. ⊓⊔ he calls the ‘Cartesian theatre’, a point where all of the information processing in the brain is integrated. Dennett presents The implications of this proof are that we have to abandon consciousness as a succession of multiple drafts, a process either the idea that people enjoy genuinely unitary conscious- in constant flux, without central organization or irreversible ness or that brain processes can be modelled computationally. binding. If a person’s behaviour is totally resistant to disintegration Could neuroscience provide us with a mechanical model (i.e. we cannot analyse it independently from the rest of their of human behaviour that supersedes the value of attributing cognition), then it implies that something is going on in their consciousness, as Dennett (1991) suggests? It sometimes brain that is so complex it cannot feasibly be reversed. In line arises that a system to which we have previously attributed with this view, Bringsjord and Zenzen (1997) specifically ar- unitary consciousness is subsequently recognized as follow- gue that the difference between cognition and computation is ing mechanical rules. For example, when conversing with that computation is reversible whereas cognition is not. For a chatbot, we might suddenly notice that its responses can instance, it is impossible for the neurosurgeon to operate on be predicted solely on the basis on the preceding sentence. We then adopt the superior rule-based model and cease to at- computational modelling can disentangle. tribute consciousness. Fodor (2001) summarizes as follows: “Local mental pro- Ultimately, for consciousness to be revealed as an illusion, cesses appear to accommodate pretty well to Turing’s theory people would have to agree that neuroscientific modelling that thinking is computation; they appear to be largely modu- succeeds in disintegrating every aspect of behaviour. Note lar...By contrast, what we’ve found out about global cognition that the key word here is ‘people’: people would have to is mainly that it is different from the local kind...we deeply agree. Arguably, the ultimate standard that we have for mea- do not understand it”. While neuroscience might shed light surement depends on the notion of other observers, which on the input and output functions of the brain, the quantifi- are themselves integrated, unified wholes. For this reason, cation for integrated information we have presented here im- Maguire and Maguire (2011) speculate that future develop- plies that it will be unable to shed light on the complex tangle ments in information theory will recognize the intractable that is core consciousness. complexity of the mind as a key concept supporting the notion of objectivity in measurement, a shift which would un- References dermine the meaningfulness of the goal to ‘understand’ the Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., & mind. Walter, P. (1997). Molecular biology of the cell. Bringsjord, S., & Zenzen, M. (1997). Cognition is not Scramble In, Scramble Out computation: The argument from irreversibility. Synthese, Assuming integrated consciousness is a genuine phe- 113(2), 285–320. nomenon, its noncomputability has interesting implications Dennett, D. C. (1991). Consciousness explained. Little, for what has to happen in the brain. When stimuli are picked Brown. up by the brain they enter at disintegrated locations. For ex- Fodor, J. A. (2001). The mind doesn’t work that way: The ample, visual stimuli enter through the optic nerve and are scope and limits of computational psychology. MIT press. processed initially by the primary visual cortex. When a vi- Griffith, V. (2014). A principled infotheoretic ψ-like measure. sual stimulus is encoded in the occipital lobe it clearly has not arXiv preprint arXiv:1401.0978. yet been integrated with the rest of cognition. For instance, Legg, S., & Hutter, M. (2007). Tests of machine intelli- Stanely, Li and Dan (1999) analysed an array of electrodes gence. In 50 years of artificial intelligence (pp. 232–242). embedded in the thalamus lateral geniculate nucleus area of a Springer. cat and were able to decode the signals to generate watchable Levin, L. A. (1974). Laws of information conservation (non- movies of what the cat was observing. growth) and aspects of the foundation of probability theory. Similarly, the initiation of action must be localised in par- Problems Information Transmission, 10(3), 206–210. ticular areas of the brain which control the relevant mus- Li, M., & Vitányi, P. (2008). An introduction to kolmogorov cles. This readiness potential must detach from the rest of complexity and its applications. Springer. the brain’s processing and hence is no longer integrated. For Maguire, P., & Maguire, R. (2010). Consciousness is data example, following up on Libet’s original experiments, Siong compression. In Proceedings of the thirty-second confer- Soon et al. (2008) demonstrated that, by monitoring activ- ence of the cognitive science society (pp. 748–753). ity in the frontopolar prefrontal cortex they could predict a Maguire, P., & Maguire, R. (2011). Understanding the com- participant’s decision to move their right or left hand several plexity of the mind. European Perspectives on Cognitive seconds before the participant became aware of it. Science. However, if integration is necessary for consciousness, McGinn, C. (1991). The problem of consciousness: Essays then somewhere between the stimulus entering the brain and towards a resolution. Blackwell Oxford, UK. the decision leaving the brain, there is a point where the in- Solomonoff, R. J. (1964). A formal theory of inductive infer- formation cannot be fully disentangled from the rest of cog- ence. part i. Information and Control, 7(1), 1–22. nition. This integrated processing cannot be localised to any Soon, C. S., Brass, M., Heinze, H.-J., & Haynes, J.-D. (2008). part of the brain or any specific point in time. The contents of Unconscious determinants of free decisions in the human cognition are effectively unified. We label this idea ‘scramble brain. Nature Neuroscience, 11(5), 543–545. in, scramble out’ to reflect the irreversible integration and dis- Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction integration that must occur between observation and action. of natural scenes from ensemble responses in the lateral The aspects of cognition that have been clarified by neu- geniculate nucleus. The Journal of Neuroscience, 19(18), roscience so far tend to involve processing before scram- 8036–8042. ble in or after scramble out. For example, it is well estab- Tononi, G. (2008). Consciousness as integrated information: lished that the occipital lobe is involved in visual process- a provisional manifesto. The Biological Bulletin, 215(3), ing or that the prefrontal cortex encodes future actions before 216–242. they are performed. These components are modular in that Williams, P. L., & Beer, R. D. (2010). Nonnegative de- they have specialised, encapsulated, evolutionarily developed composition of multivariate information. arXiv preprint functions. However, somewhere between input and output arXiv:1004.2515. there must also be a binding process of integration that no