Complex Social Systems: a guided exploration of concepts and methods

Information Theory

Martin Hilbert (Dr., PhD)

Today's questions

I. What are the formal notions of data, information, & knowledge?

II. What is the role of information in complex systems? …the 2nd law and life…

III. What is the relation between information and growth?

Complex Systems handle information

"Although they [complex systems] differ widely in their physical attributes, they resemble one another in the way they handle information. That common feature is perhaps the best starting point for exploring how they operate.” (Gell-Mann, 1995, p. 21)

Source: Gell-Mann, M. (1995). The Quark and the Jaguar: Adventures in the Simple and the Complex. New York: St. Martin's Griffin.

Complex Systems Science as computer science

“What we are witnessing now is … actually the beginning of an amazing convergence of theoretical physics and theoretical computer science” Chaitin, G. J. (2002). Meta-Mathematics and the Foundations of Mathematics. EATCS Bulletin, 77(June), 167–179.

Society computes

Numerical: agent-based models. Analytical: information theory.

Why now? [Analogy slides, images omitted: Thermodynamics, Aerodynamics, Hydrodynamics, Electricity, etc.]

Claude Shannon, Alan Turing, John von Neumann

Bell, D. (1973). The Coming of Post-Industrial Society: A Venture in Social Forecasting. New York, NY: Basic Books. "industrial organization of goods" => "industrial organization of information"

(15,000 citations)

Beniger, J. (1986). The Control Revolution: Technological and Economic Origins of the Information Society. Harvard University Press. "exploitation of information" => "control social & economic processes" (35,000 citations)

Castells, M. (1999). The Information Age, Volumes 1–3: Economy, Society and Culture. Cambridge (Mass.); Oxford: Wiley-Blackwell. "society lives in networks" => "information as well" (3,000 citations)

Shannon, C. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27, 379–423, 623–656. Shannon, C. E., & Weaver, W. (1949). The Mathematical Theory of Communication. University of Illinois Press.

U.S. Congress, Office of Technology Assessment. (1995). Global Communications: Opportunities for Trade and Aid (OTA-ITC-642). U.S. Government Printing Office.

From the Source Coding Theorem to the Channel Coding Theorem: Information => Communication

Claude Shannon (1948)

Information Theory Primer: information counts differences

 Syntactic and semantic information
o In information theory and computer science, the definition of information consists in its quantification (just like the definitions of "heat" (°C; °F), "speed" (mph; km/h), or "water" (g, l) are independent from being good, bad, or useless for you).
o To assign a value to the meaning of information, we first require its quantity: [value / unit of information], [US$ / unit of information], or [fitness increase / unit of information], etc.

 Ralph Hartley (1888–1970) & "the difference"
o Information consists in the differentiation among alternative possibilities: no information without differences!
o Information only exists if there are different choices.
o Information can be understood as the opposite of uncertainty (the more uncertainty we have regarding different possibilities, the less information we have; the more information we have, the less uncertainty).

 What's the most basic difference?
o Binary: [black, white]; [up, down]; [left, right]; [head, tail]; [42, non-42]; [no-current, current]; [0, 1] (!)
o How much information is revealed with a binary decision (coin flip)?
o To make sense, each additional unit should provide as much information as the previous one: an additive measure, where each additional symbol multiplies the number of available choices.

Source: Hartley, R. V. L. (1928). Transmission of Information. Bell System Technical Journal, 7(3), 535–563. Presented at the International Congress of Telegraphy and Telephony, Lake Como, Italy, 1927.

Information Theory Primer: finding the right measure

 How many binary symbols (coin flips) are needed to describe 8 differences?

Ralph Hartley (1888–1970)

[Decision tree, images omitted: 1st coin flip => 2 branches; 2nd coin flip => [h,h] [h,t] [t,h] [t,t]; 3rd coin flip => [111] [110] [101] [100] [011] [010] [001] [000]; 4th coin flip…?]

 Answer: …through the "2-ary uncertainty" revealed at each step:
o 2 choices = 1 symbol (coin flip): 2^1 = 2 choices
o 4 choices = 2 symbols: 2^2 = 4 choices
o 8 choices = 3 symbols: 2^3 = 8 choices, since 2 ∗ 2 ∗ 2 = 2^3 = 8 and log2(2 ∗ 2 ∗ 2) = log2(8) = 3
o 16 choices = 4 symbols: 2^4 = 16 choices
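Hartley's counting argument can be sketched in a few lines of Python (an illustrative helper, not from the slides):

```python
def symbols_needed(choices, base=2):
    """Smallest number of base-ary symbols that can distinguish `choices` alternatives."""
    n, capacity = 0, 1
    while capacity < choices:
        capacity *= base  # each extra symbol multiplies the number of choices by `base`
        n += 1
    return n

# Hartley's coin-flip ladder: doubling the choices costs one more binary symbol.
assert [symbols_needed(c) for c in (2, 4, 8, 16)] == [1, 2, 3, 4]
```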

Information Theory Primer: base of log defines metric system

 How many senary (6-ary) symbols are needed to describe 36 choices?

[Decision tree, images omitted: 1st roll of dice => 6 branches; 2nd roll of dice => [1,3] [2,4] [4,4] …]

The base of the logarithm defines the metric system, just like: feet vs. meters, liters vs. gallons.

 6-ary code:
o 6 choices = 1 symbol (dice roll): 6^1 = 6 choices
o 36 choices = 2 symbols: 6^2 = 36 choices
o 216 choices = 3 symbols: 6^3 = 216 choices, since 6 ∗ 6 ∗ 6 = 6^3 = 216 and log6(6 ∗ 6 ∗ 6) = log6(216) = 3
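The unit-conversion point can be checked directly; `math.log` with an explicit base is all that is needed (an illustrative sketch):

```python
import math

# 36 equally likely choices, measured in two different "unit systems":
bits = math.log2(36)          # binary symbols (coin flips)
dice_rolls = math.log(36, 6)  # senary symbols (dice rolls)

# Changing the log base only rescales the unit, like feet vs. meters:
assert abs(dice_rolls * math.log2(6) - bits) < 1e-9
assert abs(dice_rolls - 2.0) < 1e-9  # exactly 2 dice rolls, about 5.17 bits
```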

Information Theory Primer: Shannon's (1948) idea of the bit

Shannon's game of twenty questions: 2^20 = 1,048,576 => …down to 5 square miles!

Claude Shannon (1916 – 2001)

Claude E. Shannon (1948). A Mathematical Theory of Communication. Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656.

Information Theory Primer: coding

[Slide series, images omitted: a yes/no decision tree (annotated "probability distribution") over the 32-letter alphabet A–Z plus Ñ, Á, É, Í, Ó, Ú assigns each letter a 5-bit codeword; the word "genius" is transmitted letter by letter.]

Transmit "g": yes-yes-no-no-yes => "g" = 11001
Transmit "i": yes-no-yes-yes-yes => "i" = 10111

=> consider REDUNDANCY

Source Coding Theorem: What's the purest form of information? ! ENTROPY !

COMPRESSION: Transmit less data, so you can transmit more information.

"I do not ne_d to comm______ ev_ry det_il, _u kn_w what I mean, and _u rec_ive all the info____ anywa_s, right?" (…with the same amount of data symbols…)
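As a rough sketch (an illustrative enumeration, not the slide's particular yes/no tree, which assigns e.g. "g" = 11001), any fixed-length binary code for a 32-symbol alphabet needs exactly 5 bits per letter:

```python
# An illustrative fixed-length code for a 32-symbol alphabet (26 letters plus
# Ñ, Á, É, Í, Ó, Ú). Note: this simple enumeration is NOT the slide's yes/no
# tree; it only shows that 5 binary questions suffice for 32 choices.
alphabet = list("abcdefghijklmnopqrstuvwxyz") + ["ñ", "á", "é", "í", "ó", "ú"]
code = {letter: format(i, "05b") for i, letter in enumerate(alphabet)}

assert len(alphabet) == 32                      # 2**5 symbols in total...
assert all(len(c) == 5 for c in code.values())  # ...so each needs exactly 5 bits
```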

Source Coding Theorem: What's the purest form of information? ! ENTROPY !

Probability of letters in the English alphabet: E: 13%; T: 10%; A: 8%; O: 7%; … Q: 0.121%; Z: 0.077%

Probability of words in the English language: "the": 10%; "of": 5.1%; "in": 2.7%; … "with": 0.66%; "from": 0.64%; …

COMPRESSION: Transmit less, so you can transmit more!
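The skew in these frequencies is what compression exploits, and Shannon entropy quantifies it. A minimal sketch (with simple made-up distributions, not the exact English frequencies above):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform 8-way choice carries the full log2(8) = 3 bits...
assert entropy([1 / 8] * 8) == 3.0
# ...while a skewed distribution carries less; the gap is what compression exploits.
assert entropy([0.5, 0.25, 0.125, 0.125]) == 1.75
```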

Source: Purandare, A. (2008). JPEG vs JPEG2000 Comparison.

P.S. There is also "lossy compression", but this is simply the elimination of some information (reducing quality / detail / information). Real "data compression" of information is "lossless".

[Figures, images omitted: growth of the world's technological capacity to telecommunicate and store information since 1993.

Telecommunication: number of telecom devices × performance of devices (sum up their product over all devices: infrastructure and hardware) × software (compression) = number of bits. Growth rates shown: 7%, 28%, and 8% per year for the components; 10.4% per year overall.

Storage: number of storage devices × performance of devices (infrastructure and hardware) × software (compression) = number of bits. Growth rates shown: 8%, 25%, and 5% per year for the components; 11% per year overall.]

The 2nd law of thermodynamics: how structure and information relate

"The law that entropy always increases holds, I think, the supreme position among the laws of Nature." (Eddington, 1927)

[Figure, images omitted: three arrangements of particles over six slots.]

=> only 1 way to create this structure; => 20 ways to create this structure; => 6! = 720 ways to create this structure

t1: no uncertainty = min. entropy … t3: max. uncertainty = max. entropy

Things decay as time passes = things become more uniformly distributed as everything interacts with everything
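The multiplicity counts above can be verified with standard-library combinatorics (a sketch; the "20 ways" reading assumes choosing which 3 of 6 slots are occupied):

```python
import math

# Multiplicity = number of microstates consistent with a macrostate.
assert math.factorial(6) == 720  # 6 distinguishable items in any order: 6! = 720 ways
assert math.comb(6, 3) == 20     # which 3 of 6 slots are occupied: C(6,3) = 20 ways
# One fully specified arrangement has multiplicity 1, i.e. minimum entropy.
# Entropy grows with the log of the multiplicity, so more ways = more entropy:
assert math.log(720) > math.log(20) > math.log(1)
```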

= as time goes by it is most likely (almost certain) that the most likely thing will happen…

Source: Eddington, A. S. (1927). The Nature of the Physical World. Kessinger Publishing.

As time goes by entropy increases…

…BUT what about the last question ever asked, in Isaac Asimov's "The Last Question": "Can entropy ever be reversed?" Life reverses entropy all the time!

Erwin Schrödinger (1887–1961)

Source: Asimov, Isaac. (1956). The Last Question. Science Fiction Quarterly, November 1956.

How does evolution "communicate"?

[Figures, images omitted: population shares updating generation by generation when one type is selected 3/4 of the time and the other 1/4 of the time: 50%/50% => 33%/67% => 20%/80% => 11%/89% => 6%/94%.]

How does evolution "communicate"?

t1 => t2 => … => tn

25 % of the time

75 % of the time

The eco-"system" communicates an environmental structure to the system, which stores this information in its own structure.

How much information does an evolutionary dynamic communicate?

[Slide, image omitted: the 5-level yes/no tree over the 32-letter alphabet again.]

"Survival of the fittest" among 32 choices communicates 5 bits.

t = 1976: p(Betamax) = 0.5; p(VHS) = 0.5

t = 1984: w(Bm) = 0.08; w(VHS) = 1.92 => p(Bm) = 0.04; p(VHS) = 0.96

t = 2002: w(Bm) = 0; w(VHS) = 1.0416 => p(VHS) = 1

…and now for the intuitively challenged among us:

t = 0: p1 = 0.5, p2 = 0.5; w1 = 4, w2 = 1; average fitness w̄ = 2.5

t = 1: p1 = 0.8, p2 = 0.2; w1 = 4, w2 = 1; w̄ = 0.8 ∗ 4 + 0.2 ∗ 1 = 3.4

Reduction of uncertainty per update: − Σ p(n_t+1) log p(n_t+1) + Σ p(n_t) log p(n_t)

t = …
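Both numeric examples follow the discrete replicator equation p_i' = p_i·w_i / w̄; a minimal sketch (assuming that update rule, which reproduces the slides' numbers):

```python
def replicator_step(shares, fitness):
    """One generation of the discrete replicator equation: p_i' = p_i * w_i / w_bar,
    where w_bar is the population's average fitness."""
    w_bar = sum(p * w for p, w in zip(shares, fitness))
    return [p * w / w_bar for p, w in zip(shares, fitness)]

# The "intuitively challenged" example: p = (0.5, 0.5), w = (4, 1), w_bar = 2.5:
assert replicator_step([0.5, 0.5], [4, 1]) == [0.8, 0.2]

# Betamax vs. VHS with the slide's fitness values:
p_updated = replicator_step([0.5, 0.5], [0.08, 1.92])
assert abs(p_updated[0] - 0.04) < 1e-9 and abs(p_updated[1] - 0.96) < 1e-9
```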

T = final

Observing the 2nd Law

There are more ways particles can be mixed than there are ways to order them …just like with dice…

Source: Wilensky, U. (1997). NetLogo GasLab Two Gas model. Center for Connected Learning and Computer-Based Modeling, Northwestern University.

Deep down the rabbit hole: Maxwell's demon (alive and well for 120 years!)

“…conceive of a being whose faculties are so sharpened that…”

“Intelligence” can reduce entropy! Arising structure can be used to do work! …but what does “intelligence” mean?

Sources:
Maxwell, J. C. (1872). Theory of Heat. Westport, Conn.: Greenwood Press. http://www.archive.org/details/theoryheat02maxwgoog
Feynman, R. P. (1963). The Feynman Lectures on Physics, Vol. 1. Massachusetts, USA: Addison-Wesley. Chapter 46.
Leff, H. S., & Rex, A. F. (Eds.). (1990). Maxwell's Demon: Entropy, Information, Computing. Bristol: Adam Hilger. ISBN 0-7503-0057-4.
Leff, H. S., & Rex, A. F. (Eds.). (2002). Maxwell's Demon 2: Entropy, Classical and Quantum Information, Computing.

James Clerk Maxwell (1831–1879)

Maxwell's demon

Particle speeds: blue < 5 < green < 15 < red

Source: Wilensky, U. (1997). NetLogo GasLab Maxwell's Demon model. Center for Connected Learning and Computer-Based Modeling, Northwestern University.

"Mind-less" Maxwell's demon in an environment in equilibrium  particle is uniformly distributed 50% – 50%

Szilard Engine [images omitted]

Leo Szilard (1898–1964)

"Mind-full" Maxwell's demon

Szilard Engine:
1. Store 1 bit: 0 = left & 1 = right (0 or 1)
2. Act by allocating the Szilard engine left or right
3. Gain energy equivalent to "1 bit"

Leo Szilard (1929); Charles Bennett (1982) & Rolf Landauer

Sources: Szilard, L. (1929). Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen [On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings]. Zeitschrift für Physik, 53(11), 840–856. Bennett, C. H. (1982). The thermodynamics of computation—a review. International Journal of Theoretical Physics, 21(12), 905–940.

"Mind-full" Maxwell's demon

Szilard Engine: Store 1 bit: 0 = left & 1 = right (0 or 1). The demon gains energy equivalent to "1 bit"… but erasing its stored bit loses energy equivalent to "1 bit": the books balance.

Charles Bennett (1982) & Rolf Landauer
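Landauer's principle puts a number on this erasure cost: at least kT·ln 2 joules per erased bit. A quick sketch, at an assumed room temperature of 300 K:

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K (exact by the 2019 SI definition)

def landauer_limit(temperature_kelvin):
    """Minimum energy dissipated to erase one bit: kT * ln(2) (Landauer's principle)."""
    return K_BOLTZMANN * temperature_kelvin * math.log(2)

# At room temperature (~300 K), erasing one bit costs at least ~2.9e-21 joules;
# this erasure cost is what balances the books for Maxwell's demon.
energy_per_bit = landauer_limit(300)
assert 2.8e-21 < energy_per_bit < 2.9e-21
```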

Maxwell's demon in an environment out of equilibrium

Szilard Engine: Store 1 bit: 0 = left & 1 = right (0 or 1). Gain energy equivalent to the (erasure cost of) 1 bit, discovering the pattern of the "right particle series"!

Szilard Engine ? ? Store 1 bit: 0 = left & 1 = right

0 or 1 Maxwell’s demon in environment out of equilibrium

Szilard Engine ? ? Store 1 bit: 0 = left & 1 = right

0 or 1 Maxwell’s demon in environment out of equilibrium

Szilard Engine ? ? Store 1 bit: 0 = left & 1 = right

0 or 1 Maxwell’s demon in environment out of equilibrium

Szilard Engine ? ? Store 1 bit: 0 = left & 1 = right

0 or 1 Summing up 3-step exorcism by the hands of:

1. "Intelligence" is required to make observations  the demon needs to store information! Leo Szilard (1929)

2. The deletion of information costs energy  every observation (its eventual deletion) costs as much energy as it gains through observation. Charles Bennett (1982) & Rolf Landauer

3. If the observed reality has structure ("out of equilibrium"), observations can be compressed  store fewer bits than the energy gained  we can exploit the structure around us …once we discover / learn the structure! Wojciech Zurek (1989)

Exploiting structure through knowledge & information

t1

t2

t3

Andrei Kolmogorov (1903–1987); Claude Shannon (1916–2001)

Exploiting structure through knowledge & information

t4

t5

[Images omitted: Rain, Sun] Rain / Sun, repeating: 0 1 0 1

t6

t7

t8

Algorithm: "an ordered set of unambiguous, executable steps that defines a terminating process" (Brookshear, 2009. Computer Science: An Overview (10th ed.). Addison Wesley; p. 205). Observe and repeat [011]: store n bits.

Exploiting structure through knowledge & information: the danger of overfitting

Observed bit sequence, t1–t17: 0 1 1 0 1 1 0 1 1 0 0 1 1 0 1 1 1

Candidate models (# of bits vs. exactitude):
o y = a∗x + k
o y = a∗x^3 + b∗x^2 + c∗x^1 + k
o y = a∗x^6 + b∗x^5 + a∗x^4 + c∗x^3 + d∗x^2 + e∗x^1 + k

Descriptions of the sequence:
o 17-bit brute force: [01101101100110111]
o 13-bit algorithm (requires 2 bits to codify the number 3): ([011]∗3)[00110111]
o Two-step Minimum Description Length (MDL): 11-bit algorithm: ([011]∗3)[0]([011]∗2)[1] = rule + exceptions (ε)

Overfitting? …maybe just accidental noise without predictive power? (e.g., y = a∗x^3 + b∗x^2 + c∗x^1 + k + ε)
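The two-step MDL description can be checked mechanically: the "rule + exceptions" program really does regenerate the 17-bit observation string:

```python
# The 17-bit observation sequence, stored literally ("brute force"):
brute_force = "01101101100110111"

# Two-step MDL, "rule + exceptions": repeat [011], with two single-bit exceptions.
reconstructed = "011" * 3 + "0" + "011" * 2 + "1"

assert reconstructed == brute_force
assert len(brute_force) == 17
```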

Practical statistics applications: Bayesian Information Criterion (BIC); Akaike Information Criterion (AIC). (Grünwald, 2007. The Minimum Description Length Principle. The MIT Press.)

Sources: Rissanen, J. (1978). Modeling by Shortest Data Description. Automatica, 14, 465–471. Rissanen, J. (2010). Information and Complexity in Statistical Modeling. Springer.

Exploiting structure through knowledge & information

Andrei Kolmogorov (1903–1987); Claude Shannon (1916–2001)

Li and Vitanyi (1997, p. 187): "It is a beautiful fact that these two notions turn out to be much the same" (theorem of equality between stochastic entropy and expected algorithmic complexity).

Cover and Thomas (2006, p. 463): "It is an amazing fact that the expected length of the shortest binary computer description of a random variable is approximately equal to its entropy."

H(s|t) + c ≥ E[K(s|t)] ≥ H(s|t)

Knowledge & Information

o Identify a city in the U.S.: Shannon's game of 20 questions = probabilistic; an algorithm describing the city = deterministic.
o Regarding an elephant: identify it among all non-elephants = probabilistic; describe the elephant bit by bit = deterministic.

Zurek's learning demon & effective complexity

 Distinguishes between things:
o known (Kolmogorov complexity)
o and unknown (Shannon uncertainty)
o …and in between are The Doors of perception…

 Requires that:
o the system has the ability to identify patterns & to compress them (smart us!)
o the environment has structure (lucky us!)

[Figure, images omitted: random microstate vs. highly ordered microstate]

Sources: Zurek, W. H. (1989). Algorithmic randomness and physical entropy. Physical Review A, 40(8), 4731. Caves, C. (1990). Entropy and Information: How much information is needed to assign a probability? In W. H. Zurek (Ed.), Complexity, Entropy and the Physics of Information (pp. 91–115). Oxford: Westview Press. Gell-Mann, M., & Lloyd, S. (1996). Information measures, effective complexity, and total information. Complexity, 2(1), 44–52.

H(Rate|State) = − Σ_{over all s} p(s) ∗ Σ_{over all r} p(r|s) ∗ log p(r|s)
= − Σ_{over all s and r} p(r,s) ∗ log p(r|s) = − E_{s and r}[log p(r|s)]
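This conditional entropy can be computed directly from p(s) and p(r|s); a sketch with hypothetical numbers in the spirit of the slide's two-state example (the exact pairing of the slide's figure values is not recoverable here):

```python
import math

def conditional_entropy(p_s, p_r_given_s):
    """H(R|S) = -sum over s of p(s) * sum over r of p(r|s) * log2(p(r|s))."""
    return -sum(
        ps * sum(prs * math.log2(prs) for prs in row if prs > 0)
        for ps, row in zip(p_s, p_r_given_s)
    )

# Hypothetical two-state example (assumed values, for illustration only):
p_s = [0.4, 0.6]                          # p(s=A), p(s=B)
p_r_given_s = [[0.8, 0.2], [0.25, 0.75]]  # rows: p(r|s=A), p(r|s=B)
h = conditional_entropy(p_s, p_r_given_s)
assert 0.0 < h < 1.0  # some remaining uncertainty, but less than a fair coin
```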

[Figure, images omitted: a probability tree over states s ∈ {A, B} with p(s=A), p(s=B) and conditional response probabilities p(r=A|s=A), p(r=B|s=A), p(r=A|s=B), p(r=B|s=B); values shown include 0.2, 0.8, 0.4, 0.75, 0.25, 0.6, 0.5/0.5 ("like" / "dislike"), and 1.0.]

Life reverses entropy thanks to the information it has about the environment!

Erwin Schrödinger (1887–1961)

Sources: Adami, C. (1997). Introduction to Artificial Life (Corrected ed.). New York: Springer. Adami, C. (2011). The use of information theory in evolutionary biology. Binder, P. M., & Danchin, A. (2011). Life's demons: information and order in biology. EMBO Reports, 12(6), 495–499.

Optimal fitness as perfect absorption of signals (Kelly 1956; Donaldson-Matasci, Lachmann & Bergstrom 2008; 2010; Rivoire & Leibler 2011)

G⁺

1 bit of information = reduction of uncertainty by half

Claude Shannon (1948)

½ × uncertainty = 1 bit of information = 2 × growth. I(E;C): the more you know, the more you can grow.
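Kelly's (1956) result makes "1 bit = 2 × growth" concrete: for a repeated even-odds binary bet, the optimal doubling rate is G = 1 − H(p), so removing one full bit of uncertainty adds exactly one doubling (×2) per round. A sketch under those assumptions:

```python
import math

def h2(p):
    """Binary entropy H(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def kelly_growth(p):
    """Optimal per-round log2 growth rate of a repeated even-odds binary bet
    when the bettor knows the outcome probability p (Kelly 1956): G = 1 - H(p)."""
    return 1.0 - h2(p)

assert kelly_growth(0.5) == 0.0  # no side information: no growth
assert kelly_growth(1.0) == 1.0  # 1 full bit of side information: wealth doubles each round
```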

d = E_e[log W] − D_KL(P(e|g) ‖ M(e|g)) − H(E|G) − D_KL(P⁺(e,g) ‖ P(e,g))

Hilbert, M. (2015). Fitness as Information Fit: Communication Channel… http://ssrn.com/author=1827058

Information & Growth

[Figure, images omitted: environment => updated population type, with fitness W and w]

Growth: d = E_e[log W] − H(E|G) − D_KL(P(e|g) ‖ P(e|m)) − I(E;G)

Hilbert, M. (2015). An Information Theoretic Decomposition of Fitness: Engineering the Communication Channels of Nature and Society (SSRN Scholarly Paper No. ID 2588146). Social Science Research Network. http://papers.ssrn.com/abstract=2588146

Different levels of coarse-graining

"…strictly speaking, neither genes, nor cells, nor organisms, nor ideas evolve. Only populations can evolve." Nowak (2006), p. 14

Beak cells / non-beak cells:

t: p(Beak) = 0.1; p(nonB) ≈ 0.9
w(Beak) = 3; w(nonB) = 1; W_population = 0.1 ∗ 3 + 0.9 ∗ 1 = 1.2
t+1: p(Beak) = 0.25; p(nonB) = 0.75

Finch level:

t => t+1: W_population = 1.2; Malthusian fitness: log(W) = log(1.2)
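The beak-cell arithmetic can be checked in a few lines (a sketch reproducing the slide's numbers):

```python
import math

# Cell-level coarse-graining: shares p and fitness w of the two cell types.
p = {"beak": 0.1, "non-beak": 0.9}
w = {"beak": 3.0, "non-beak": 1.0}

W_pop = sum(p[k] * w[k] for k in p)  # average population fitness
assert abs(W_pop - 1.2) < 1e-9

p_next = {k: p[k] * w[k] / W_pop for k in p}  # updated shares after one generation
assert abs(p_next["beak"] - 0.25) < 1e-9
assert abs(p_next["non-beak"] - 0.75) < 1e-9

# At the coarser, whole-finch level only the average survives: Malthusian fitness.
malthusian = math.log(W_pop)  # log(1.2)
assert 0.18 < malthusian < 0.19
```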

Law of requisite variety: "Variety absorbs variety" or "Variety begets variety"?

[Figures, images omitted: a two-level population (Level 1: types A, C; Level 2|Level 1: subtypes a, b, c, d) updating from t = 0 to t = 2, with shares such as 30%, 40%, 70%, and 100% at t = 0; 50%/50% and 0% at t = 1; and 60%, 50%, 0%, 2%, and 8% at t = 2.]

Requisite variety: "the value of information is bound by the mutual information between the system and its observer" (Touchette & Lloyd 2000; 2004; Rivoire & Leibler 2011)

Here: "the value of information is bound by the mutual information between the environment and the (avg. updated) evolving population"

http://youtu.be/z7bVw7lMtUg (6:55)

http://youtu.be/sBHGzRxfeJY
https://www.khanacademy.org/computing/computer-science/informationtheory

Gleick (2011). The Information: A History, a Theory, a Flood