
Probability, algorithmic complexity, and subjective randomness

Thomas L. Griffiths ([email protected])
Department of Psychology, Stanford University

Joshua B. Tenenbaum ([email protected])
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology

Abstract

We present a statistical account of human randomness judgments that uses the idea of algorithmic complexity. We show that an existing measure of the randomness of a sequence corresponds to the assumption that non-random sequences are generated by a particular probabilistic finite state automaton, and use this as the basis for an account that evaluates randomness in terms of the length of programs for machines at different levels of the Chomsky hierarchy. This approach results in a model that predicts human judgments better than the responses of other participants in the same experiment.

The development of information theory prompted cognitive scientists to formally examine how humans encode experience, with a variety of schemes being used to predict subjective complexity (Leeuwenberg, 1969), memorization difficulty (Restle, 1970), and sequence completion (Simon & Kotovsky, 1963). This proliferation of similar, seemingly arbitrary theories was curtailed by Simon's (1972) observation that the inevitable high correlation between measures of information content renders them essentially equivalent. The development of algorithmic information theory (see Li & Vitanyi, 1997, for a detailed account) has revived some of these ideas, with code lengths playing a central role in recent accounts of human concept learning (Feldman, 2000), subjective randomness (Falk & Konold, 1997), and the role of simplicity in cognition (Chater, 1999). Algorithmic information theory avoids the arbitrariness of earlier approaches by using a single universal code: the complexity of an object (called the Kolmogorov complexity after Kolmogorov, 1965) is the length of the shortest computer program that can reproduce it.

Chater and Vitanyi (2003) argue that a preference for simplicity can be seen throughout cognition, from perception to language learning. Their argument is based upon the important constraints that simplicity provides for solving problems of induction, which are central to cognition. Kolmogorov complexity gives a formal means of addressing "asymptotic" questions about induction, such as why anything is learnable at all, but the constraints it imposes are too weak to support the rapid inferences that characterize human cognition. In order to explain how human beings learn so much from so little, we need to consider accounts that can express the strong prior knowledge that contributes to our inferences. The structures that people find simple form a strict (and flexible) subset of those easily expressed in a computer program. For example, the sequence of heads and tails TTHTTTHHTH appears quite complex to us, even though, as the parity of the first 10 digits of π, it is easily generated by a computer. Identifying the kinds of regularities that contribute to our sense of simplicity will be an important part of any cognitive theory, and is in fact necessary since Kolmogorov complexity is not computable (Kolmogorov, 1965).

There is a crucial middle ground between Kolmogorov complexity and the arbitrary encoding schemes to which Simon (1972) objected. We will explore this middle ground using an approach that combines rational statistical inference with algorithmic information theory. This approach gives an intuitive transparency to measures of complexity by expressing them in terms of probabilities, and uses computability to establish meaningful differences between them. We will test this approach on judgments of the randomness of binary sequences, since randomness is one of the key applications of Kolmogorov complexity: Kolmogorov (1965) suggested that random sequences are irreducibly complex, a notion that has inspired several psychological theories (e.g., Falk & Konold, 1997). We will analyze subjective randomness as an inference about the source of a sequence X, comparing its probability of being generated by a random source, P(X|random), with its probability of generation by a more regular process, P(X|regular). Since probabilities map directly to code lengths, P(X|regular) uniquely identifies a measure of complexity. This formulation allows us to identify the properties of an existing complexity measure (Falk & Konold, 1997), and extend it to capture more of the statistical structure detected by people. While Kolmogorov complexity is expressed in terms of programs for a universal Turing machine, many of the regularities people detect are computable by simpler devices. We will use Chomsky's (1956) hierarchy of formal languages to organize our analysis, testing a set of nested models that can be interpreted in terms of the length of programs for automata at different levels of the hierarchy.

Complexity and randomness

The idea of using a code based upon the length of computer programs was independently proposed by Solomonoff (1964), Kolmogorov (1965), and Chaitin (1969), although it has come to be associated with Kolmogorov. A sequence X has Kolmogorov complexity K(X) equal to the length of the shortest program p for a (prefix) universal Turing machine U that produces X and then halts,

K(X) = min_{p: U(p) = X} l(p),    (1)

where l(p) is the length of p in bits. Kolmogorov complexity can be used to define algorithmic probability, with the probability of X being

R(X) = 2^{-K(X)} = max_{p: U(p) = X} 2^{-l(p)}.    (2)

There is no requirement that R(X) sum to one over all sequences; many probability distributions that correspond to codes are unnormalized, assigning the missing probability to an undefined sequence.

Kolmogorov complexity can be used to mathematically define the randomness of sequences, identifying a sequence X as random if l(X) - K(X) is small (Kolmogorov, 1965). While not necessarily following the form of this definition, psychologists have preserved its spirit in proposing that the perceived randomness of a sequence increases with its complexity. Falk and Konold (1997) consider a particular measure of complexity they call the "difficulty predictor" (DP), calculated by counting the number of runs (sub-sequences containing only heads or tails), and adding twice the number of alternating sub-sequences. For example, the sequence TTTHHHTHTH is a run of tails, a run of heads, and an alternating sub-sequence, DP = 4. If there are several partitions into runs and alternations, DP is calculated on the partition that results in the lowest score.[1]

[1] We modify DP slightly from the definition of Falk and Konold (1997), who seem to require alternating sub-sequences to be of even length. The equivalence results shown below also hold for their original version, but it makes the counter-intuitive interpretation of HTHTH as a run of a single head, followed by an alternating sub-sequence, DP = 3. Under our formulation it would be parsed as an alternating sequence, DP = 2.

Figure 1: Mean randomness ratings from Falk and Konold (1997, Experiment 1), shown with the predictions of DP and the finite state model (subjective randomness as a function of the number of alternations).

Falk and Konold (1997) showed that DP correlates remarkably well with subjective randomness judgments. Figure 1 shows the results of Falk and Konold (1997, Experiment 1), in which 97 participants each rated the apparent randomness of ten binary sequences of length 21, with each sequence containing between 2 and 20 alternations (transitions from heads to tails or vice versa). The mean ratings show the classic preference for overalternating sequences: the sequences perceived as most random are those with 14 alternations, while a truly random process would be most likely to produce sequences with 10 alternations. The mean DP has a similar profile, achieving a maximum at 12 alternations and giving a correlation of r = 0.93.
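
Stated procedurally, DP amounts to minimizing a cost over partitions of the sequence into constant and alternating blocks. The sketch below is ours (the paper gives no code): the function names are hypothetical, and it implements the footnoted variant in which an alternating block may have any length.

```python
from functools import lru_cache

def dp_score(seq):
    """Difficulty predictor (DP): each run costs 1, each alternating
    sub-sequence costs 2, minimized over partitions into such blocks."""
    n = len(seq)

    def block_cost(segment):
        if len(set(segment)) == 1:                              # a run of heads or tails
            return 1
        if all(a != b for a, b in zip(segment, segment[1:])):   # an alternating block
            return 2
        return None                                             # not a legal block

    @lru_cache(maxsize=None)
    def best(i):
        if i == n:
            return 0
        options = []
        for j in range(i + 1, n + 1):
            c = block_cost(seq[i:j])
            if c is not None:
                options.append(c + best(j))
        return min(options)

    return best(0)

print(dp_score("TTTHHHTHTH"))   # 4, as in the example above
print(dp_score("HHTHT"))        # 3
print(dp_score("HTHTH"))        # 2, the parsing described in the footnote
```
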
Subjective randomness as a statistical inference

Psychologists have claimed that the way we think about chance is inconsistent with probability theory (e.g., Kahneman & Tversky, 1972). For example, people are willing to say that X1 = HHTHT is more random than X2 = HHHHH, while they are equally likely to arise by chance: P(X1|random) = P(X2|random) = (1/2)^5. However, many of the apparently irrational aspects of human judgments can be understood by considering the possibility that people are assessing a different kind of probability – instead of P(X|random), we evaluate P(random|X) (Griffiths & Tenenbaum, 2001).

The statistical basis of subjective randomness becomes clear if we view randomness judgments in terms of a signal detection task (cf. Lopes, 1982; Lopes & Oden, 1987). On seeing a stimulus X, we consider two hypotheses: X was produced by a random process, or X was produced by a regular process. Finding regularities is an important part of identifying predictable processes, a fundamental component of induction (Lopes, 1982). The decision about the source of X can be formalized as a Bayesian inference,

P(random|X) / P(regular|X) = [P(X|random) / P(X|regular)] [P(random) / P(regular)],    (3)

in which the posterior odds in favor of a random generating process are obtained from the likelihood ratio and the prior odds. The only part of the right hand side of the equation affected by X is the likelihood ratio, which led Griffiths and Tenenbaum (2001) to define the subjective randomness of X as

random(X) = log [P(X|random) / P(X|regular)],    (4)

being the evidence that X provides towards the conclusion that it was produced by a random process.

When evaluating binary sequences, it is natural to set P(X|random) = (1/2)^{l(X)}. Taking the logarithm in base 2, random(X) is -l(X) - log2 P(X|regular), depending entirely on P(X|regular). We obtain random(X) = K(X) - l(X), the difference between the complexity of a sequence and its length, if we choose P(X|regular) = R(X), the algorithmic probability defined in Equation 2. This is identical to the mathematical definitions of randomness discussed above. However, the key point of this statistical approach is that we are not restricted to using R(X): we have a measure of the randomness of X for any choice of P(X|regular), reducing the problem of specifying a measure of complexity to the more intuitive task of determining the probability with which sequences are produced by a regular process.
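
The definition in Equation 4 is easy to state in code for any candidate P(X|regular). The sketch below is ours: the function names are hypothetical, and the repetition model used here is only a toy stand-in, not the paper's regular process (that model, a motif HMM, is developed in the next section).

```python
import math

def randomness(x, p_regular):
    """random(X) = -l(X) - log2 P(X|regular), using P(X|random) = (1/2)^l(X)."""
    return -len(x) - math.log2(p_regular(x))

def p_repeat(x, delta=0.6):
    """Toy stand-in for P(X|regular): the first symbol is fair, and each later
    symbol repeats its predecessor with probability delta."""
    p = 0.5
    for prev, cur in zip(x, x[1:]):
        p *= delta if cur == prev else 1 - delta
    return p

for seq in ["HHHHH", "HHTHT"]:
    print(seq, round(randomness(seq, p_repeat), 2))
# HHHHH is better explained by the regular process, so its randomness is lower.
```
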
A statistical model of randomness perception

DP does a good job of predicting the results of Falk and Konold (1997), but it has some counter-intuitive properties, such as independence of length: HHTHT and HHHHHHHHHHHHHHHHHTHT both have DP = 3, but the former seems more random. In this section we will show that using DP is equivalent to specifying P(X|regular) with a hidden Markov model (HMM), providing the basis for a statistical model of randomness perception.

An HMM is a probabilistic finite state automaton, associating each symbol x_i in the sequence X = x_1 x_2 ... x_n with a "hidden" state z_i. The probability of X under this model is obtained by summing over all possible hidden states, P(X) = sum_Z P(X, Z), where Z = z_1 z_2 ... z_n. The model assumes that each x_i is chosen based only on z_i and each z_i is chosen based only on z_{i-1}, allowing us to write P(X, Z) = P(z_1) prod_{i=2}^{n} P(z_i|z_{i-1}) prod_{i=1}^{n} P(x_i|z_i). An HMM can thus be specified by a transition matrix P(z_i|z_{i-1}), a prior P(z_1), and an emission matrix P(x_i|z_i). Hidden Markov models are widely used in statistical approaches to speech recognition.

Using DP as a measure of randomness is equivalent to specifying P(X|regular) with an HMM corresponding to the finite state automaton shown in Figure 2. This HMM has six states, and we can define a transition matrix

P(z_i | z_{i-1}) =
    [ δ    Cα   Cα²  0    0    Cα²  ]
    [ Cα   δ    Cα²  0    0    Cα²  ]
    [ Cα   Cα   0    δ    0    Cα²  ]    (5)
    [ Cα   Cα   δ    0    0    Cα²  ]
    [ Cα   Cα   Cα²  0    0    δ    ]
    [ Cα   Cα   Cα²  0    δ    0    ]

where each row is a vector of (unnormalized) transition probabilities (i.e., the first row is P(z_i|z_{i-1} = 1)), and a prior P(z_1) = (Cα, Cα, Cα², 0, 0, Cα²), with C = (1-δ)/(2α+2α²). If we then take P(x_i = H|z_i) to be 1 for z_i = 1, 3, 5 and 0 for z_i = 2, 4, 6, we have a regular generating process based on repeating four "motifs": state 1 repeats H, state 2 repeats T, states 3 and 4 repeat HT, and states 5 and 6 repeat TH. δ is the probability of continuing with a motif, while α defines a prior over motifs, with the probability of producing a motif of length k proportional to α^k.

Figure 2: The finite state automaton corresponding to the HMM described in the text. Solid lines indicate motif continuation, dotted lines are permitted state changes, and numbers label the states.

Having defined this HMM, the equivalence to DP is straightforward. For a choice of Z indicating n_1 runs and n_2 alternating sub-sequences, P(X, Z) = δ^{n - n_1 - n_2} [(1-δ)/(2α+2α²)]^{n_1 + n_2} α^{n_1 + 2 n_2}. Taking P(X|regular) to be max_Z P(X, Z), it is easy to show that random(X) = -DP log α when δ = 0.5 and α = (√3 - 1)/2. By varying δ and α, we obtain a more general model: as shown in Figure 1, taking δ = 0.525, α = 0.107 gives a better fit to the data of Falk and Konold (1997), r = 0.99. This also addresses some of the counter-intuitive predictions of DP: if δ > 0.5, increasing the length of a sequence but not changing the number of runs or alternating sub-sequences reduces its randomness, since P(X|regular) decreases more slowly than P(X|random). With the choices of δ and α given above, random(HHTHT) = 3.33, while random(HHHHHHHHHHHHHHHHHTHT) = 2.61. The effect is greater with larger values of δ.

Just as the algorithmic probability R(X) is a probability distribution defined by the length of programs for a universal Turing machine, this choice of P(X|regular) can be seen as specifying the length of "programs" for a particular finite state automaton. The output of an automaton is determined by its state sequence, just as the output of a universal Turing machine is determined by its program. However, since the state sequence is the same length as the sequence itself, this alone does not provide a meaningful measure of complexity. In our model, probability imposes a metric on state sequences, dictating a greater cost for moves between certain states. Since we find the state sequence Z most likely to have produced X, we have an analogue of Kolmogorov complexity defined on a finite state automaton.
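
To make the construction concrete, here is a sketch (our code, not part of the paper) that builds the HMM of Equation 5, computes P(X|regular) = max_Z P(X, Z) with the Viterbi recursion, and numerically checks the equivalence stated above at δ = 0.5, α = (√3 - 1)/2, reading the log as base 2 to match the earlier definition. The function names and the 0-based state indexing are ours; the DP values in the final loop are the worked examples given earlier.

```python
import numpy as np

def finite_state_model(delta, alpha):
    """Transition matrix (Equation 5), prior, and emissions for the 6-state motif HMM.

    States (0-indexed here): 0 repeats H, 1 repeats T, 2-3 repeat HT, 4-5 repeat TH.
    """
    C = (1 - delta) / (2 * alpha + 2 * alpha ** 2)
    a, a2 = C * alpha, C * alpha ** 2
    T = np.array([
        [delta, a,     a2,    0,     0,     a2],
        [a,     delta, a2,    0,     0,     a2],
        [a,     a,     0,     delta, 0,     a2],
        [a,     a,     delta, 0,     0,     a2],
        [a,     a,     a2,    0,     0,     delta],
        [a,     a,     a2,    0,     delta, 0],
    ])
    prior = np.array([a, a, a2, 0, 0, a2])
    emit_H = np.array([1., 0., 1., 0., 1., 0.])   # P(x = H | z)
    return T, prior, emit_H

def p_regular(x, delta, alpha):
    """max_Z P(X, Z) under the HMM, found with the Viterbi recursion."""
    T, prior, emit_H = finite_state_model(delta, alpha)
    emit = lambda c: emit_H if c == 'H' else 1 - emit_H
    v = prior * emit(x[0])
    for c in x[1:]:
        v = (v[:, None] * T).max(axis=0) * emit(c)
    return v.max()

def subjective_randomness(x, delta, alpha):
    """random(X) = -l(X) - log2 P(X|regular)."""
    return -len(x) - np.log2(p_regular(x, delta, alpha))

# At delta = 0.5 and alpha = (sqrt(3) - 1) / 2, random(X) should equal -DP * log2(alpha).
delta, alpha = 0.5, (np.sqrt(3) - 1) / 2
for seq, dp in [("TTTHHHTHTH", 4), ("HHTHT", 3), ("HTHTH", 2)]:
    print(seq, round(subjective_randomness(seq, delta, alpha), 3),
          round(-dp * np.log2(alpha), 3))
```
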

Ascending the Chomsky hierarchy

Solomonoff's (1964) contemplation of codes based upon computer programs was initiated by Noam Chomsky's talk at the 1956 Dartmouth Summer Study Group on Artificial Intelligence (Li & Vitanyi, 1997, p. 308). Chomsky was presenting a formal hierarchy of languages based upon the kinds of computing machines required to recognize them (Chomsky, 1956). Chomsky identified four types of languages, falling into a strict hierarchy: Type 3, the simplest, are regular languages, recognizable by a finite state automaton; Type 2 are context-free languages, recognizable by a push-down automaton; Type 1 are context-sensitive languages, recognizable by a Turing machine with a tape of length linearly bounded by the length of the input; Type 0 are recursively enumerable languages, recognizable by a standard Turing machine. Kolmogorov complexity is defined with respect to a universal Turing machine, capable of recognizing Type 0 languages.

There are features of regular sequences that cannot be recognized by a finite state automaton, belonging to languages on higher levels of the Chomsky hierarchy. One such feature is symmetry: the set of symmetric sequences is a classic example of a context-free language, and symmetry is known to affect subjective randomness judgments (Lopes & Oden, 1987). Here we will develop "context-free" (Type 2) and "context-sensitive" (Type 1) models that incorporate these regularities.

Lopes and Oden (1987, Experiment 1) illustrated the effects of symmetry on subjective randomness using a signal detection task, in which participants classified sequences of length 8 as being either random or non-random. Half of the sequences were generated at random, but the other half were generated by a process biased towards either repetition or alternation, depending on condition. The proportion of sequences correctly classified was examined as a function of the number of alternations and whether or not a sequence was symmetric, including both mirror symmetry (all sequences for which x1x2x3x4 = x8x7x6x5) and "cyclic" symmetry (HTHTHTHT, HHTTHHTT, HHHHTTTT and their complements). Their results are shown in Figure 3, together with the theoretical optimal performance that could be obtained with perfect knowledge of the processes generating the sequences. Deviations from optimal performance reflect a difference between the P(X|regular) implicitly used by participants and the distribution used to generate the sequences.

Figure 3: Results of Lopes and Oden (1987, Experiment 1) together with predictions of finite state and context-free models, illustrating effects of symmetry (proportion correct as a function of the number of alternations, for symmetric and asymmetric sequences in the repetition and alternation conditions, with theoretical optimal performance).

Our Bayesian model naturally addresses this decision problem. By the relationship between log odds and probabilities, we have

P(random|X) = 1 / (1 + exp{-λ random(X) - ψ})

where λ scales the effect of random(X), with λ = 1 for a correctly weighted Bayesian inference, and ψ = log [P(random)/P(regular)] is the log prior odds. Fitting the finite state model outlined in the previous section to this data gives δ = 0.638, α = 0.659, λ = 0.128, ψ = -2.75, and a correlation of r = 0.90. However, as shown in Figure 3, the model does not predict the effect of symmetry.

Accommodating symmetry requires a "context-free" model for P(X|regular). This model allows sequences to be generated by three methods: repetition, producing sequences with probabilities determined by the HMM introduced above; symmetry, where half of the sequence is produced by the HMM and the second half is produced by reflection; and complement symmetry, where the second half is produced by reflection and exchanging H and T. We then take P(X|regular) = max_{Z,M} P(X, Z|M) P(M), where M is the method of production. Since the two new methods of producing a sequence go beyond the capacities of a finite state automaton, this can be viewed as imposing a metric on the programs for a push-down automaton. Applying this model to the data from Lopes and Oden (1987), we obtain δ = 0.688, α = 0.756, λ = 0.125, ψ = -2.99, and probabilities of 0.437, 0.491, 0.072, for repetition, symmetry, and complement symmetry respectively, with a fit of r = 0.98. As shown in Figure 3, the model captures the effect of symmetry.
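
The context-free extension can be sketched in the same style: P(X|regular) becomes the best of the three production methods, and the logistic link above maps random(X) onto a classification probability. The helper names below are ours, and the fair-coin stand-in for the repetition term is only to keep the example self-contained; the full model would plug in the Viterbi probability from the previous sketch. The method probabilities and the λ, ψ values are the fitted numbers reported above.

```python
import math

def p_regular_cf(x, p_hmm, p_method):
    """'Context-free' P(X|regular): the best of repetition, symmetry, and
    complement symmetry, each weighted by the probability of that method."""
    swap = str.maketrans("HT", "TH")
    half = len(x) // 2
    best = p_method["repetition"] * p_hmm(x)
    if len(x) % 2 == 0:
        first, second = x[:half], x[half:]
        if second == first[::-1]:                    # mirror symmetry
            best = max(best, p_method["symmetry"] * p_hmm(first))
        if second == first[::-1].translate(swap):    # complement symmetry
            best = max(best, p_method["complement"] * p_hmm(first))
    return best

def p_random_given_x(rand_x, lam, psi):
    """Posterior probability of a random source, P(random|X)."""
    return 1.0 / (1.0 + math.exp(-lam * rand_x - psi))

# Illustration with a crude stand-in for the repetition term (a fair-coin model).
p_hmm = lambda s: 0.5 ** len(s)
methods = {"repetition": 0.437, "symmetry": 0.491, "complement": 0.072}
for seq in ["THHTHHTH", "HHTTTTHH", "HHTTHHTT"]:
    p_reg = p_regular_cf(seq, p_hmm, methods)
    rand_x = -len(seq) - math.log2(p_reg)
    print(seq, round(rand_x, 2), round(p_random_given_x(rand_x, 0.125, -2.99), 3))
```
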
set of all sequences generated by repeating a sub- Lopes and Oden (1987, Experiment 1) illustrated sequence exactly twice forms a context-sensitive lan- the effects of symmetry on subjective randomness guage. This kind of regularity can be incorporated using a signal detection task, in which participants into a “context-sensitive” model in the same way classified sequences of length 8 as being either ran- as symmetry, but the results of Lopes and Oden dom or non-random. Half of the sequences were (1987) are at too coarse a grain to evaluate such generated at random, but the other half were gen- a model. Likewise, these results do not allow us to erated by a process biased towards either repeti- identify whether our simple finite state model cap- tion or alternation, depending on condition. The tures enough regularities: since only motifs of length proportion of sequences correctly classified was ex- 2 are included, random(THHTHHTH) is quite large. Table 1: Log likelihood (correlation) for models Finite state model Data CS model Model 4 motifs 22 motifs r=0.69 Finite state -1617.95 (0.65) -1597.47 (0.69) Context-free -1577.41 (0.74) -1553.59 (0.79) Context-sensitive -1555.47 (0.79) -1531.05 (0.83) Repetition To evaluate the contribution of these factors, we Symmetry Complement conducted an experiment testing two sets of nested Duplication models. The experiment was based on Lopes and Oden (1987, Experiment 1), asking participants to Context−free model classify sequences as regular or random. One set r=0.79 of models used the HMM equivalent to DP , with 4 motifs and 6 states. The second used an HMM ex- tended to allow 22 motifs (all motifs up to length 4 that were not repetitions of a smaller motif), with a total of 72 states. In each set we evaluated three models, at different levels in the Chomsky hierarchy. The finite state (Type 3) model was simplest, with four free parameters: δ, α, λ and ψ. The context-free Context−sensitive model (Type 2) model adds two parameters for the proba- 1 bilities of symmetry and complement symmetry, and r=0.83 the context-sensitive (Type 1) model adds one more parameter for the probability of duplication. Be- cause the three models are nested, the simpler being 0.5 a special case of the more complex, we can use likeli- hood ratio tests to determine whether the additional P(random|X) parameters significantly improve the fit of the model. 0 0 0.5 1 Method Data Participants Figure 4: Scatterplots show the relationship between Participants were 20 MIT undergraduates. model predictions and data, with markers according Stimuli to the process the context-sensitive model identified as generating the sequence. The arrays on the right Stimuli were sequences of heads (H) and tails (T) presented in 130 point fixed width sans-serif font on show the sequences used in the experiment, ordered a 19” monitor at 1280 1024 pixel resolution. from most to least random by both human responses × and the context-sensitive model. Procedure Participants were instructed that they were about gave a further significant improvement, χ2(1) = to see sequences which had either been produced by 45.08, p < 0.0001. Scatterplots for these three a random process (flipping a fair coin) or by other models are shown in Figure 4, together with se- processes in which the choice of heads and tails was quences ordered by randomness based on the data not random, and had to classify these sequences ac- and the context-sensitive model. The parameters cording to their source. 
Method

Participants
Participants were 20 MIT undergraduates.

Stimuli
Stimuli were sequences of heads (H) and tails (T) presented in 130 point fixed width sans-serif font on a 19" monitor at 1280 x 1024 pixel resolution.

Procedure
Participants were instructed that they were about to see sequences which had either been produced by a random process (flipping a fair coin) or by other processes in which the choice of heads and tails was not random, and had to classify these sequences according to their source. After a practice session, each participant classified all 128 sequences of length 8, in random order, with each sequence randomly starting with either a head or a tail. Participants took breaks at intervals of 32 sequences.

Results

We optimized the log-likelihood for all six models, with the results shown in Table 1. The model with 4 motifs consistently performed worse than the model with 22 motifs, so we will focus on the results of the latter. The context-free model gave a significant improvement over the finite state model, χ²(2) = 87.76, p < 0.0001, and the context-sensitive model gave a further significant improvement, χ²(1) = 45.08, p < 0.0001. Scatterplots for these three models are shown in Figure 4, together with sequences ordered by randomness based on the data and the context-sensitive model. The parameters of the context-sensitive model were δ = 0.706, α = 0.102, λ = 0.532, ψ = -1.99, with probabilities of 0.429, 0.497, 0.007, 0.067 for repetition, symmetry, complement, and duplication. This model also accounted well for the data sets discussed earlier in the paper, obtaining correlations of r = 0.94 on Falk and Konold (1997) and r = 0.91 on Lopes and Oden (1987) using exactly the same parameter values, showing that it can give a good account of randomness judgments in general and is not just overfitting this particular data set.

Figure 4: Scatterplots show the relationship between model predictions and data, with markers according to the process the context-sensitive model identified as generating the sequence (finite state r = 0.69, context-free r = 0.79, context-sensitive r = 0.83). The arrays on the right show the sequences used in the experiment, ordered from most to least random by both human responses and the context-sensitive model.

The likelihood ratio tests suggest that the context-sensitive model gives the best account of human randomness judgments. Since this model has several free parameters, we evaluated its generalization performance using a split-half procedure. We randomly split the participants in half ten times, computed the correlation between halves, and fit the context-sensitive model to one half, computing its correlation with both that half and the unseen data. The mean correlation (obtained via a Fisher z transformation) between halves was r = 0.73, while the model gave a mean correlation of r = 0.77 on the fit half and r = 0.76 on the unseen half. The fact that the correlation with the unseen data is higher than the correlation between halves suggests that the model is accurately extracting information about the statistical structure underlying subjective randomness.
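
The split-half procedure itself is easy to sketch. The code below is ours and runs on synthetic stand-in data (the participant-level responses are not reproduced here); the function names, the toy data, and the noisy stand-in for model fitting are all hypothetical. Correlations are averaged through the Fisher z transformation, as in the text.

```python
import numpy as np

def fisher_mean(rs):
    """Average correlations via the Fisher z transformation (arctanh / tanh)."""
    return np.tanh(np.mean(np.arctanh(rs)))

def split_half(ratings, fit_and_predict, n_splits=10, seed=0):
    """Split participants in half repeatedly; correlate the halves, and
    correlate a model fit to one half with both that half and the unseen half.

    ratings: participants x sequences array of responses.
    fit_and_predict: callable that 'fits' to one half's mean ratings and
    returns predictions for every sequence (abstracted here)."""
    rng = np.random.default_rng(seed)
    n = ratings.shape[0]
    between, fit_r, unseen_r = [], [], []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        a = ratings[perm[: n // 2]].mean(axis=0)
        b = ratings[perm[n // 2:]].mean(axis=0)
        preds = fit_and_predict(a)
        between.append(np.corrcoef(a, b)[0, 1])
        fit_r.append(np.corrcoef(preds, a)[0, 1])
        unseen_r.append(np.corrcoef(preds, b)[0, 1])
    return fisher_mean(between), fisher_mean(fit_r), fisher_mean(unseen_r)

# Toy demonstration (the real analysis used 20 participants x 128 sequences).
rng = np.random.default_rng(1)
true_scores = rng.random(128)
ratings = (rng.random((20, 128)) < true_scores).astype(float)
noisy_model = lambda half_mean: half_mean + rng.normal(0, 0.1, size=half_mean.size)
print(split_half(ratings, noisy_model))
```
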
Discussion

The results of our experiment and model-fitting suggest that the perceived randomness of binary sequences is sensitive to motif repetition, symmetry and symmetry in the complement, and duplication of a sub-sequence. In fact, these regularities can be incorporated into a statistical model that predicts human judgments better than the responses of other participants in the same experiment. These regularities can be recognized by a Turing machine with a tape of length linearly bounded by the length of the sequence, corresponding to Type 1 in the Chomsky hierarchy. Our statistical model provides a computable measure of the randomness of a sequence in the spirit of Kolmogorov complexity, but defined on a simpler computing machine.

The probabilistic approach presented in this paper provides an intuitive method for developing measures of complexity. However, we differ from existing accounts of randomness that make claims about complexity (Chater, 1999; Falk & Konold, 1997) in viewing probability as primary, and the relationship between randomness and complexity as a secondary consequence of a statistical inference comparing random generation with generation by a more regular process. This approach emphasizes the interpretation of subjective randomness in terms of a rational statistical inference, and explains why complex sequences should seem more random in terms of P(X|regular) being biased towards simple outcomes: random sequences are those that seem too complex to have been produced by a simple process.

Chater and Vitanyi (2003) argue that simplicity may provide a unifying principle for cognitive science. While simplicity undoubtedly plays an important role in guiding induction, being able to use these ideas in cognitive science requires developing a means of quantifying simplicity that can accommodate the kind of strong prior knowledge that human beings bring to bear on inductive problems. Kolmogorov complexity provides a universal, objective measure, and a firm foundation for this endeavour, but is very permissive in the kinds of structures it identifies as simple. We have described an approach that uses the framework of rational statistical inference to explore measures of complexity that are more restrictive than Kolmogorov complexity, while retaining the principles of algorithmic information theory by organizing these measures in terms of computability. This approach provides us with a good account of subjective randomness, and suggests that it may be possible to develop restricted measures of complexity applicable elsewhere in cognitive science.

Acknowledgments

We thank Liz Baraff for her devotion to data collection, Tania Lombrozo, Charles Kemp, and Tevya Krynski for significantly reducing P(random|this paper), and Ruma Falk, Cliff Konold, Lola Lopes, and Gregg Oden for answering questions, providing data and sending challenging postcards. TLG was supported by a Stanford Graduate Fellowship.

References

Chaitin, G. J. (1969). On the length of programs for computing finite binary sequences: statistical considerations. Journal of the ACM, 16:145-159.

Chater, N. (1999). The search for simplicity: A fundamental cognitive principle? Quarterly Journal of Experimental Psychology, 52A:273-302.

Chater, N. and Vitanyi, P. (2003). Simplicity: a unifying principle in cognitive science. Trends in Cognitive Sciences, 7:19-22.

Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory, 2:113-124.

Falk, R. and Konold, C. (1997). Making sense of randomness: Implicit encoding as a basis for judgment. Psychological Review, 104:301-318.

Feldman, J. (2000). Minimization of Boolean complexity in human concept learning. Nature, 407:630-633.

Kahneman, D. and Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3:430-454.

Kolmogorov, A. N. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission, 1:1-7.

Leeuwenberg, E. L. L. (1969). Quantitative specification of information in sequential patterns. Psychological Review, 76:216-220.

Li, M. and Vitanyi, P. (1997). An introduction to Kolmogorov complexity and its applications. Springer Verlag, London.

Lopes, L. L. (1982). Doing the impossible: A note on induction and the experience of randomness. Journal of Experimental Psychology, 8:626-636.

Lopes, L. L. and Oden, G. C. (1987). Distinguishing between random and nonrandom events. Journal of Experimental Psychology: Learning, Memory and Cognition, 13:392-400.

Restle, F. (1970). Theory of serial pattern learning. Psychological Review, 77:481-495.

Simon, H. A. (1972). Complexity and the representation of patterned sequences of symbols. Psychological Review, 79:369-382.

Simon, H. A. and Kotovsky, K. (1963). Human acquisition of concepts for sequential patterns. Psychological Review, 70:534-546.

Solomonoff, R. J. (1964). A formal theory of inductive inference. Part I. Information and Control, 7:1-22.