Research Article

A Unified Model of Complex Specified Information
George D. Montañez*
Department of Computer Science, Harvey Mudd College, Claremont, CA, USA
Abstract

A mathematical theory of complex specified information is introduced which unifies several prior methods of computing specified complexity. Similar to how members of the exponential family of probability distributions have dissimilar surface forms yet share a common underlying mathematical identity, we define a model that allows us to cast Dembski's semiotic specified complexity, Ewert et al.'s algorithmic specified complexity, Hazen et al.'s functional information, and Behe's irreducible complexity into a common mathematical form. Adding additional constraints, we introduce canonical specified complexity models, for which one-sided conservation bounds are given, showing that large specified complexity values are unlikely under any given continuous or discrete distribution. Because canonical models bound tail probabilities for arbitrary distributions, they can also be used to form statistical hypothesis tests.
Cite As: Montañez GD (2018) A Unified Model of Complex Specified Information. BIO-Complexity 2018(4):1-26. doi:10.5048/BIO-C.2018.4.
Editor: Robert J. Marks II
Received: June 29, 2018; Accepted: November 30, 2018; Published: December 14, 2018
Copyright: © 2018 George D. Montañez. This open-access article is published under the terms of the Creative Commons Attribution License, which permits free distribution and reuse in derivative works provided the original author(s) and source are credited.
Notes: A Critique of this paper, when available, will be assigned doi:10.5048/BIO-C.2018.4.c.
*Email: [email protected]
“Life must depend on a higher level of complexity, structure without predictable repetition, he argued.” - James Gleick, “The Information: A History, a Theory, a Flood”, ch. 9
INTRODUCTION

Specified complexity, the property of an object being both unlikely and structurally organized, has been proposed as a signifier of design [1–3]. Objects exhibiting specified complexity must be both complex (e.g., unlikely under the relevant probability distribution) and specified (e.g., conform to an independent or detached specification). While mathematical methods for measuring specified complexity have changed over the decades [1, 4–6], the underlying components of specification and complexity have remained constant. Possible applications of specified complexity include differentiating malicious network behavior from non-intentional anomalies, identifying genetically-modified material in agricultural products (such as artificially engineered gene sequences), identifying intelligently engineered extra-terrestrial signals, and testing for artificial general intelligence [7]. At its core, it has been proposed as a method of sorting artificial or teleological structures from naturally occurring unintentional ones. Specified complexity has been severely criticized, both for its underlying logic and its various formulations [8–11].

Recent work in specified complexity has proposed using algorithmic information theory as a way of computationally grounding specified complexity, under the name of algorithmic specified complexity (ASC) [5, 6]. Ewert and collaborators have shown that under any proposed discrete distribution, sequences exhibiting high levels of ASC must be unlikely [6]. We extend this body of work by generalizing the result of Ewert et al. [6], showing that all specified complexity models of a certain class, which we call canonical models, share this property. We re-derive the existing result of Ewert et al. [6] as a corollary to our general theorem, and prove a number of previously unknown results, including a conservation result for a variant of semiotic specified complexity, introduced in Dembski (2005) [4]. These results add to a body of work generally known as conservation of information theorems, which prove that in many contexts it takes as much information to find an object as you might gain by using that object, rendering explanations that rely on such specialized objects cases of special pleading. Viewed as a conservation theorem, if one characterizes specified complexity as a form of information signal measurable in bits, the probability under the proposed distribution of encountering a structure exhibiting b or more bits of complex specified information is no greater than 2^−b, thus incurring an information cost of at least b bits.

More importantly, this work formally reduces specified complexity testing to a form of statistical hypothesis testing, a view inspired by Dembski (2005) [4] (and preceded by Milosavljević [12]). Accordingly, we demonstrate the connection for canonical specified complexity models, where the model core (called a kardis) plays the same role as a p-value under a null distribution. We prove that for such models the null-distribution probability mass of values as extreme as the one observed is upper bounded by the significance level α, which is the crucial property for p-value significance tests. Thus, we formally establish the canonical model as a particular statistical hypothesis test statistic, one which would prove useful in cases where only a likelihood and a specification value are available and the behavior of the distribution outside of the given observation is otherwise unknown.

We begin by reviewing the general structure of specified complexity models and defining their common and canonical model forms, along with providing a general recipe for constructing new specified complexity models. We then review statistical hypothesis testing and the level-α property, formally reducing p-value testing to a form of specified complexity model hypothesis testing. Conservation and level-α test properties of canonical specified complexity models are proven, and instructions for conducting specified complexity model hypothesis tests are also given. Algorithmic specified complexity, semiotic specified complexity, and functional information are reduced to common form, canonical variants are derived, and conservation theorems are proven for each as corollaries to our results for the general canonical specified complexity models. While criticisms of specified complexity [8–11] (and their rebuttals [2, 13–16]) are duly acknowledged, we do not attempt to address them here, as the purpose of this manuscript is to prove the mathematical commonality of several complexity metrics, not to defend them individually.

1. SPECIFIED COMPLEXITY MODELS

A specified complexity model is a function of an observation X taking values in some space X, where X is distributed according to probability distribution p(x) (denoted as X ∼ p). It contains two components, one measuring the likelihood of the observation under p (referred to as the complexity term for historical reasons), and another component that measures the specificity of the observation, according to a specification function ν(x). This function should be computable or capable of being estimated given x, and should be definable independently of the particular observation x, to avoid cherry-picking and fabrications. This can be done by defining the function prior to observing x, but need not be, as long as the function itself is not conditioned on x when computing the value for x. In other words, specification functions of the form ν_x(x) or ν(x|x) are invalid. This danger of fabrication is a general vulnerability of statistical hypothesis testing, since one could inappropriately choose the significance level α (acting as the specification function) in response to the observed test statistic value, or even define a test statistic T_x(X) = c · 1{X = x} (where 1{·} denotes the indicator function, which equals 1 when the argument evaluates to true, and 0 otherwise) that always assigns a large value c to the particular x observed and zero to all other possible observations, leading to the smallest possible critical region, a singleton set containing only the observation itself. If the danger of fabrications forces us to reject specified complexity hypothesis testing then we must also reject many other statistical hypothesis tests, which all suffer from the same vulnerability.

1.1 Common Form and Canonical Models

We begin with a few definitions.

Definition 1 (ν(X)). For any integrable nonnegative specification function ν : X → R≥0, define ν(X) as follows:

    ν(X) := ∫_X ν(x) dx    if X is continuous,
            Σ_{x∈X} ν(x)   if X is discrete,
            ∫_X dν(x)      in general.    (1)

Definition 2 (Common Form and Kardis). For any probability distribution p(x) on space X, any strictly positive scaling constant r ∈ R>0, and any nonnegative function ν : X → R≥0, we define a common form model as

    SC(x) := −log2 ( r · p(x)/ν(x) )    (2)

with specified complexity kardis κ(x) = r(p(x)/ν(x)).

Definition 3 (Canonical Specified Complexity Model). Any common form model constrained such that ν(X) ≤ r is a canonical specified complexity model.

It should be stated explicitly that the constraint for canonical models is what allows us to bound the probability of observing extreme specified complexity values, and that ideally ν(X) = r, which allows us to obtain tight bounds. In addition, the scaling constant r should not depend on x. To define a canonical model, it suffices to define the kardis κ(x)¹ and ensure that the constraints are satisfied for r, p, and ν.

¹From Greek kardiá (heart) and Ilocano kardis (pigeon pea, which is a small component contained in a larger pod).
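To make Definitions 1 through 3 concrete, here is a minimal sketch in Python (illustrative code of ours, not part of the original development; the toy distribution and specification values are arbitrary choices) that computes ν(X), the kardis, and the resulting model over a small finite space:

```python
import math

def nu_total(nu, space):
    """nu(X) for a finite discrete space: the sum of nu over X (Definition 1)."""
    return sum(nu(x) for x in space)

def make_common_form(p, nu, r):
    """Common form model of Definition 2: SC(x) = -log2 kappa(x),
    with kardis kappa(x) = r * p(x) / nu(x)."""
    def sc(x):
        return -math.log2(r * p(x) / nu(x))
    return sc

# Toy example: uniform null over 8 outcomes, one highly specified outcome.
space = list(range(8))
p = lambda x: 1.0 / 8.0
nu = lambda x: 0.9 if x == 0 else 0.1 / 7.0   # nu(X) sums to 1.0
r = nu_total(nu, space)                       # taking r = nu(X) gives tight bounds
assert nu_total(nu, space) <= r               # canonical constraint (Definition 3)

sc = make_common_form(p, nu, r)
print(round(sc(0), 2))   # ~2.85 bits: unlikely AND highly specified
print(round(sc(1), 2))   # negative: unlikely but unspecified
```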
As a useful example, given any kardis κ(x) from a canonical specified complexity model, we can define an augmented kardis (and thus augmented model) that incorporates a significance level term within the model. We do so as follows.

Definition 4 (rα, κα(x), and SCα(x)). Given κ(x) = r(p(x)/ν(x)) from any canonical specified complexity model and any significance level α ∈ (0, 1], define

    rα := r/α,
    κα(x) := rα · p(x)/ν(x), and
    SCα(x) := −log2 κα(x) = SC(x) + log2(α).

It is easy to verify that any SCα(x) thus defined is also a canonical specified complexity model, since rα ≥ r by the condition on α, which implies ν(X) ≤ rα. Such significance level models will allow us to take arbitrary canonical form specified complexity models and use them directly for α-level hypothesis testing (Theorem 3).
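Continuing the sketch above (again an illustration of ours rather than code from the paper), the augmented model of Definition 4 requires only an inflated scaling constant; positive values of the resulting model will license rejection at level α (Corollary 3, below):

```python
import math

def make_augmented_sc(p, nu, r, alpha):
    """Augmented model of Definition 4: SC_alpha(x) = SC(x) + log2(alpha),
    obtained by inflating the scaling constant to r_alpha = r / alpha."""
    assert 0.0 < alpha <= 1.0
    r_alpha = r / alpha      # r_alpha >= r, so nu(X) <= r_alpha still holds
    def sc_alpha(x):
        return -math.log2(r_alpha * p(x) / nu(x))
    return sc_alpha
```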
Much like exponential family distributions have diverse surface forms yet can all be reduced to a single common form, existing specified complexity models, such as algorithmic specified complexity and semiotic specified complexity, can also be reduced to the common form of Definition 2. With little (and sometimes no) additional work, we can define canonical variants of these models. Having one canonical form is important for many reasons, not least of which is that it allows us to prove general properties of many specified complexity models simultaneously, such as

Corollary 1 (Level-α Property for Specified Complexity I). Given X ∼ p and significance level α ∈ (0, 1] and any canonical specified complexity model with kardis κ(X),

    Pr(κ(X) ≤ α) ≤ α.

We will discuss this result in Section 3, and give a proof in the Appendix.

1.2 Specified Complexity Model Recipe

We now give a recipe for constructing new specified complexity models for any finite discrete X. For x ∈ X, define D as the set of all x such that x is the product of some process of interest. Let γ(x) be any bounded function on X quantifying some aspect of objects in D (such as functional coherence, organization, irreducible complexity, meaning, relative compressibility, etc.), where larger values give stronger evidence of being produced by the process of interest. Thus defined, γ acts as a featurizer for x, outputting a scalar-valued feature correlated to its true label, the label being 1{x ∈ D}. We have assumed that larger values of γ indicate stronger correlation with membership, but if anti-correlated, we can define γ(x) := −f(x) or γ(x) := (f(x) + ε)^−1, where f(x) is the original function and ε ≥ 0 is some small constant used to prevent undefined values. If needed, we can then rescale γ(x) into the range [0, 1] by using zero-one scaling:

    ν(x) := (γ(x) − γmin) / (γmax − γmin)    (3)

where we have defined γmin := min_{x′∈X} γ(x′) and γmax analogously. If γ already has the desired range and does not need rescaling, simply define ν(x) := γ(x).

Define r := ν(X) as a normalizing constant (or set r to any real-valued number exceeding this quantity). Given any proposed probability distribution p defined on X and following Equation 2, we define the generic specified complexity kardis as

    κ(x) := r · p(x)/ν(x)    (4)

and the new specified complexity model as the negative log of the kardis, namely

    SC(x) := −log2 κ(x).    (5)

Having verified that p is a probability distribution on X and that ν is a nonnegative function on X where ν(X) ≤ r, we see that this recipe produces canonical specified complexity models.
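The recipe can be followed mechanically. The sketch below (hypothetical; the toy featurizer and null distribution are our own choices) applies the zero-one scaling of Equation 3 and builds the kardis and model of Equations 4 and 5 for a finite space:

```python
import math

def recipe_sc_model(space, p, gamma):
    """Construct a canonical specified complexity model from a bounded
    featurizer gamma on a finite discrete space (Equations 3-5)."""
    g_min = min(gamma(x) for x in space)
    g_max = max(gamma(x) for x in space)
    # Zero-one scaling (Equation 3); assumes gamma is non-constant on the space.
    def nu(x):
        return (gamma(x) - g_min) / (g_max - g_min)
    r = sum(nu(x) for x in space)   # r := nu(X), the normalizing constant
    def sc(x):
        # Undefined where nu(x) = 0 (the minimizer of gamma), since the
        # kardis r * p(x) / nu(x) diverges there.
        return -math.log2(r * p(x) / nu(x))
    return sc

# Toy usage: gamma counts zeros in a bit-string; p is a uniform null model.
space = ["00", "01", "10", "11"]
sc = recipe_sc_model(space, p=lambda x: 0.25, gamma=lambda x: x.count("0"))
print(sc("00"))   # 1.0 bit for the most 'organized' string
```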
2. STATISTICAL HYPOTHESIS TESTING

Statistical hypothesis testing [17–19] allows one to reject a proposed probability distribution (null distribution) as an explanation for a particular observation, whenever observations of that kind have low probability under the distribution. There are two broad classes of statistical tests [20, 21], those based on Neyman-Pearson hypothesis testing and those based on Fisherian p-values. These classes of tests have different forms, origins, and even statistical philosophies, but underlying both is the common idea that when collections of observations have low probability under a null distribution, rejecting that distribution as an explanation for the data leads to beneficial outcomes. Neyman-Pearson tests establish a procedure with good long-term statistical properties, controlling both false-positive and false-negative errors (Type-I and Type-II), whereas Fisher's p-values allow for the rejection of a null distribution without having to consider an alternative hypothesis. Hybrid tests [19, 20] can be formed by first choosing a significance level α, then computing a p-value and rejecting the null distribution if it is below α. Such tests, if performed consistently and repeatedly, will reject the null hypothesis with probability not exceeding α in the cases when it actually holds, so they are often used as a "best-of-both-worlds" approach in practice.² In general, a p-value is the smallest α at which we would reject the null hypothesis [22]. Reporting a test result using a p-value allows the reader to compare that value to any significance level they deem appropriate, leaving to them the decision to reject or not reject based on the observed data [19]. Because of their popularity and utility, we will focus our discussion of statistical hypothesis testing on p-values and the level-α tests derived from them, as described in Casella and Berger (Section 8.3.4) [19].

The reader should be aware, however, of the statistical controversy surrounding p-values (and statistical hypothesis testing more generally [23]), such as abuses like "p-hacking" and the potential for high false discovery rates even under ideal conditions [24]. The false discovery rate measures the proportion of times one is wrong when rejecting the null hypothesis, even if one rarely chooses to reject it. While the α-level controls how often you reject the null hypothesis when it actually holds (thus controlling Type-I error), it could be the case that you almost never reject when it does not hold (and you should), and thus the ratio comparing the number of times you incorrectly reject it to the total number of times you reject it overall could be quite large even for small α-levels. P-values are therefore imperfect if ubiquitous tools, with the potential for abuse or mishandling; they should be handled with the care and attention required for tool use in general.

²However, others view this as a "worst-of-both-worlds" method. See, for example, the discussion in Lew [20].

2.1 P-Values

Given a set of observations X^n = {X1 = x1, ..., Xn = xn}, we need some way of measuring how likely observations of this type are under the proposed null distribution. If the outcome is probable under the distribution, we have no reason to reject it as an explanation for the data; if the observed data (and outcomes like it) are sufficiently unlikely under the null distribution, then we reject it as an explanation for the data. To do so, one needs to first define a test statistic T, which is a real-valued function of X^n, and a significance level α, which defines the level of improbability at which we will reject the null hypothesis. The test statistic is typically some statistic of the observed data (hence the name), such as the average of the observations, the number of heads in a series of coin flips, or some other property of the data. Although typically n > 1 (i.e., the test statistic is a function of more than just one observation), for simplicity we will consider the special case of n = 1, which will also allow for easier comparison with specified complexity models later.

Different hypothesis tests lead to different definitions of "extreme" test statistic values. For some tests, large T(X) are considered extreme and will lead to rejection, whereas in other cases small T(X) will do so. Other tests, such as two-tailed tests, reject large magnitude |T(X)| values. A p-value, denoted pval(x), is defined as the probability (under the null distribution) of the set of all observations that have test statistics at least as extreme as the value T(x) observed. Assuming that large test statistic values are extreme, p-values can be defined formally as follows:

Definition 5 (P-Value for Large Extreme Test Statistics). For X ∼ Pθ,

    pval(x) = Pθ(T(X) ≥ T(x)).

Similar definitions can be given for small extreme statistic values, or large magnitude extreme values.

Under p-value statistical testing, the null distribution is rejected as an explanation for the data whenever the p-value is less than the specified significance level. The following property holds for p-value statistical tests, whenever testing for large or small extreme test statistic values, since such p-values are distributed according to a Uniform-(0, 1) distribution whenever the test statistic has a continuous distribution [22]:

Definition 6 (Level-α Property). Given X ∼ Pθ, we say the level-α property holds for random variable Y := f(X) if

    Pr(Y < α) ≤ α.

We note that p-values are themselves random variables, being functions of the random variable X. By the level-α property, the probability under the null distribution of seeing a p-value of level α is therefore no greater than α. P-values for which the level-α property holds are called valid [19]. Because small p-values are by definition unlikely under the proposed distribution, observation of them may justify rejecting the distribution in question as an explanation for the data. Surprisingly, we can show this level-α property also holds for canonical specified complexity models, and so the same set of implications follows for such models.

2.2 P-Value Specified Complexity Models

A p-value statistical test can be cast into specified complexity model form by simple algebraic manipulation. Begin with any such test that rejects a null distribution when the p-value is smaller than α, namely,

    pval(X) < α.    (6)

If we rearrange the sides of the inequality and take negative logs, we obtain

    −log2 ( pval(X)/α ) > 0,    (7)

which is a common form specified complexity model test (letting r = 1), though not a canonical one (since α is not normalized over X).
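The equivalence of Equations 6 and 7 is pure algebra, as the following sketch (ours, for illustration; the numeric p-values are arbitrary) verifies:

```python
import math

def pvalue_sc_test(p_val, alpha):
    """Equations 6-7: rejecting when -log2(p_val / alpha) > 0 is the same
    decision as the ordinary p-value test p_val < alpha."""
    return -math.log2(p_val / alpha) > 0

assert pvalue_sc_test(0.03, 0.05) == (0.03 < 0.05)   # both reject
assert pvalue_sc_test(0.20, 0.05) == (0.20 < 0.05)   # both retain the null
```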
However, for every test of this form there exists an equivalent canonical specified complexity model test on a new two-element space, X̃, defined as follows.³ Let X̃ := {1, 0}, where the first element is identified with the extreme region of X, and the second, with the complement of that region. The null distribution, test statistic, and original observation induce the following probability distribution on X̃:

    p̃(x̃) := pval(X)       if x̃ = 1,
             1 − pval(X)   if x̃ = 0.    (8)

Since the p-value is the probability of observing such extreme values under the null distribution, the event of observing X̃ = 1 has this same probability under p̃. Thus, the original test is transformed into an equivalent test of a weighted coin flip (for a Bernoulli random variable X̃) to see if the outcome of heads is likely given its weight. For this new test the null distribution being directly tested is no longer the original distribution, but is now the induced Bernoulli distribution, p̃, with parameter pval(X). This new distribution serves as the complexity component of the specified complexity model. It is a proxy for the original null distribution, and rejecting it results in rejecting the original distribution as well (and vice versa).

The α term can take one of two interpretations: it can be interpreted as either a specification function, which assigns probability mass α to X̃ = 1, or as a number of replicational pseudo-trials r = 1/α, which are additional chances of seeing the event in question [4]. We will discuss both interpretations shortly.

Returning to the original non-canonical p-value specified complexity model on X, it follows directly from the level-α property that

    Pr( −log2(pval(X)/α) > 0 ) ≤ α.    (9)

Furthermore, we can prove a conservation result for these models, for any desired α-level:

Theorem 1 (Conservation of P-Value Specified Complexity). Under the same conditions as Definition 6, for X ∼ Pθ and α ∈ (0, 1],

    Pr( −log2(pval(X)/α) > b ) ≤ α·2^−b.

Two-mass Interpretation. Consider the ratio pval(X)/α. As we have seen, the pval(X) term is the probability mass of the extreme region determined by the test statistic and null distribution. The α term can be interpreted as a probability mass, being the mass under a different distribution (a specification distribution) for the same observation of "heads," using the weighted coin analogy. We can associate α with a default level of "fit" under the specification distribution. When the heads outcome has a higher probability under this alternative distribution than under the null distribution, we reject the null distribution in favor of the alternative. The test condition pval(X) < α is equivalent to

    pval(X)/α < 1,    (10)

so we are essentially performing something like a modified likelihood ratio test under this interpretation, with the caveats that we are not considering maximization over a constrained and an unconstrained class of parameters, nor are we allowing for general c cutoff values.

Pseudo-trials Interpretation. A second interpretation of the α term involves moving it from underneath the p-value, to give a quantity r = 1/α ≥ 1. Returning again to the weighted coin analogy, we can view r as the total number of coin flips we have, which gives us more chances of seeing the event in question (namely, a flip that lands on heads). It is a number of pseudo-trials, which can be thought of as hallucinated coin flips. We have observed heads, and we want to know if that outcome was likely given the number of flips r and the weight on heads (probability pval(X)). Whenever r is an integer, we have

    pval(X)/α = r · pval(X) = E[Y]    (11)

where Y ∼ Binom(r, pval(X)) (i.e., Y is a binomially-distributed random variable, with parameters N = r and p = pval(X), and with expected number of heads equal to Np = r · pval(X)). Since we reject the null hypothesis whenever pval(X)/α < 1, we see that the test is equivalent to testing if the expected number of heads is fewer than one under the proposed distribution. If it is, we reject the null distribution as an explanation for the data, since we observed a heads outcome.

Under either interpretation, we can see that p-value tests are equivalent to specified complexity model hypothesis tests. Furthermore, for any specific p-value test there exists an equivalent two-element canonical specified complexity model test. Taking the two-mass interpretation, define ν̃(X̃ = 1) := α, and we see that testing

    pval(X) < α    (12)

is equivalent to testing

    −log2 ( p̃(X̃ = 1) / ν̃(X̃ = 1) ) > 0,    (13)

on the two-element space X̃, with p̃(X̃ = 1) implicitly a random variable of X. Under the pseudo-trials interpretation, define ν̃(X̃ = 1) := 1 and r := 1/α, so that we obtain the equivalent test

    −log2 ( r·p̃(X̃ = 1) ) > 0,    (14)

again defined in terms of a random variable X̃ ∼ p̃. For either definition of ν̃, letting ν̃(X̃ = 0) = 1 − ν̃(X̃ = 1), we have ν̃(X̃) = 1 ≤ r. Because of this, we see that both specified complexity models in the new tests are canonical specified complexity models. Thus, p-value testing is reducible to canonical specified complexity model testing on a new space X̃.

³I have been told this formulation formally corresponds to a coupling [25].
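A minimal sketch (ours; the numeric p-values are arbitrary) of the two-element reduction under both interpretations, confirming that each agrees with the plain p-value test:

```python
import math

def two_element_sc_test(p_val, alpha, interpretation="two-mass"):
    """Canonical SC test on the two-element space (Equations 12-14), where
    outcome 1 ('heads') is the extreme region, with induced probability p_val."""
    p_tilde = {1: p_val, 0: 1.0 - p_val}               # Equation 8
    if interpretation == "two-mass":
        nu_tilde, r = {1: alpha, 0: 1.0 - alpha}, 1.0  # specification masses
    else:                                              # pseudo-trials reading
        nu_tilde, r = {1: 1.0, 0: 0.0}, 1.0 / alpha
    # Either way nu_tilde sums to 1 <= r, so the model is canonical.
    sc_heads = -math.log2(r * p_tilde[1] / nu_tilde[1])
    return sc_heads > 0                                # reject the null iff SC > 0

for mode in ("two-mass", "pseudo-trials"):
    assert two_element_sc_test(0.03, 0.05, mode) == (0.03 < 0.05)
    assert two_element_sc_test(0.20, 0.05, mode) == (0.20 < 0.05)
```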
3. MAIN RESULTS

Having defined canonical specified complexity models and shown their relationship to p-value statistical tests, we next state a series of results for this family of models, with full proofs given in the Appendix. We demonstrate that one can upper bound the probability of observing extreme specified complexity values for arbitrary probability distributions, without needing the distribution in question to have a tractable analytical or parametric form, in contrast to traditional p-values. This feature is a consequence of simultaneously observing a low probability value under the distribution and a high specification value under the specification function: since the specification values are normalized, few elements can have large values. Although many elements can have low probability values (thus making the occurrence of observing any such low-probability event probable, given enough of them), few can have low probability while being highly specified. Because specified complexity is an information quantity (namely, it is logarithmic in form), the resulting bound takes the form of a one-sided conservation result.

3.1 Canonical Conservation Bound

Theorem 2 (Conservation of Canonical Specified Complexity). Let p(x) be any discrete or continuous probability measure on space X, let r ∈ R>0 be a scaling constant, and let ν : X → R≥0 be any nonnegative integrable function where ν(X) ≤ r. Then

    Pr( −log2 ( r·p(X)/ν(X) ) ≥ b ) ≤ 2^−b,    (15)

where X ∼ p.

Remark. Any canonical specified complexity model will satisfy the conditions of the above theorem by definition, so it holds for all such models.

Let us pause to understand the import of Theorem 2. Given an observed specified complexity value SC(x), the key question becomes:

    What is the probability of observing a specified complexity value at least as extreme ('special') as SC(x)?

Theorem 2 allows us to upper bound this probability by 2^−SC(x). This result is somewhat surprising, since it holds for arbitrary discrete or continuous distributions with arbitrary tail behavior, without making any strong parametric assumptions on the form of p. To gain intuition as to why specification functions bound tail behavior, consider what it means to have a specified complexity value at least as extreme as SC(x). By Definition 3, we have

    SC(x) = −log2 ( r·p(x)/ν(x) ).    (16)

Thus, for x′ ∈ X, SC(x′) ≥ SC(x) implies p(x′) ≤ p(x) or ν(x′) ≥ ν(x) (or both). While many x′ elements in X can have p(x′) ≤ p(x), which could even make the occurrence of some small probability event all but certain (as is true for uniform distributions on large spaces), it cannot be the case that an arbitrarily large proportion may have ν(x′) ≥ ν(x), since ν(X) ≤ r. Thus, increasing the proportion of large ν values increases the scaling constraint r, which implies a larger kardis r·p(x)/ν(x) and smaller SC(x) value, which controls the tightness of the bound.

If we try to adversarially devise p and ν functions so as to violate the bound, we can see that the integration constraint still protects us, through the following informal argument. For any observed SC(x) value from a canonical specified complexity model, we can attempt to choose p and ν such that the probability

    Pr( SC(X) ≥ SC(x) )    (17)

is larger than the bound given by Theorem 2. To accomplish this, we must make this probability large (i.e., a large proportion of p's mass must reside on regions where the SC values exceed SC(x)) and the bound tight (i.e., the observed SC(x) must be large). This gives us the best chance of violating the bound. To address the first goal, we can make the probability of the set of extreme values as large as possible by assuming that x = argmax_{x′∈X} p(x′) and that such an argmax exists. Then, p(x′) ≤ p(x) for all x′ ∈ X. Because of this, ν can remain constant or decay with p, as long as p(x′)/ν(x′) ≤ p(x)/ν(x), which will guarantee SC(x′) ≥ SC(x) for all x′ ∈ X, thus maximizing the probability Pr(SC(X) ≥ SC(x)) by making it equal to 1.
To address the second goal, we assume the observed SC(x) is as large as possible under the conditions already established. In such a case, ν will have contours that decay at exactly the same rates as the contours of p, by the following argument. If ν decayed any slower (i.e., there were some ν(x′) larger than absolutely necessary), r would be larger than absolutely necessary, making SC(x) smaller than need be, and thus violating our assumption that SC(x) is as large as possible. If ν decayed any faster than p, there would exist some x′ ∈ X with positive measure such that p(x′)/ν(x′) > p(x)/ν(x), which would contradict our assumption that the probability Pr(SC(X) ≥ SC(x)) is maximized. Because the contours of ν follow those of p exactly at all points, this makes ν a scaled copy of p, namely, ν(x) = c·p(x). Maximizing SC(x) requires making r·p(x)/ν(x) as small as possible. Either we can minimize r or p(x)/ν(x) or both. To minimize r, set r = ν(X), its smallest possible value. Minimizing the ratio p(x)/ν(x) requires maximizing the separation between values of p and ν, which means maximizing the scalar multiplier c. The multiplier must be nonnegative, since ν is a nonnegative function (as is p). The multiplier must also not equal 0, since SC(x) would be undefined in that case. For any positive multiplier c less than r,

    r·p(x)/ν(x) = r·p(x)/(c·p(x)) = r/c > 1.    (18)

This would imply a negative SC(x) for all x ∈ X, making the bound from Theorem 2 exceed 1, thus trivially satisfying it. Therefore, we can assume c ≥ r. Furthermore, our integration constraint on ν implies the scalar multiplier must not exceed r, since p integrates to one, which implies

    r ≥ ν(X) = c·p(X) = c.    (19)

Thus, the scalar multiplier must equal r. This implies SC(x) = −log2(r/r) = 0. By Theorem 2, the bound corresponding to SC(x) = 0 is Pr(SC(X) ≥ 0) ≤ 2^0 = 1, so our bound is again satisfied even under this adversarial case.

The addition of specification is what allows us to control these probabilities. Considering probabilities in isolation is not enough. While unlikely events can happen often (given enough elements with low probability), specified unlikely events rarely occur. This explains why, even though every sequence of one thousand coin flips is equally likely given a fair coin, the sequence of all zeroes is a surprising and unexpected outcome whereas an equally long random sequence of heads and tails is not. Specification provides the key to unlocking this riddle.

3.2 Level-α Property for Specified Complexity Tests

Several results follow from Theorem 2. One of these we have already encountered, namely that canonical kardii, like p-values, possess the level-α property that allows them to bound the mass of extreme values under distribution p. The following two corollaries show how this applies to kardii and specified complexity values, respectively, with proofs given in the Appendix. The second result gives us a simple way to construct hypothesis tests from canonical specified complexity models.

Corollary 1 (Level-α Property for Specified Complexity I). Given X ∼ p and significance level α ∈ (0, 1], let κ(x) be the kardis from any canonical specified complexity model. Then

    Pr(κ(X) ≤ α) ≤ α.

Corollary 2 (Level-α Property for Specified Complexity II). Let SC(x) be any canonical specified complexity model. Then for X ∼ p and significance level α ∈ (0, 1],

    Pr(SC(X) ≥ −log2 α) ≤ α.

Corollary 2 gives us a practical hypothesis test for all canonical specified complexity models, where we reject the null hypothesis whenever test statistic T(x) (defined as SC(x)) exceeds −log2 α. Table 1 gives cutoff values for common α levels. As can be seen from the table, observing specified complexity values exceeding ten bits corresponds to an α-level of less than 0.001.

Table 1: Test statistic cutoff by α-level.

    α       | T(x)
    .1      | 3.33 bits
    .05     | 4.33 bits
    .01     | 6.65 bits
    .001    | 9.97 bits
    .0001   | 13.29 bits
    α       | −log2 α bits

Augmented canonical specified complexity models also possess the level-α property, and thus can also be used to define hypothesis tests. We first state a more general result (with proof given in the Appendix) that allows us to demonstrate the property of such models as a direct corollary.

Theorem 3 (α-Exponential Bound). Given X ∼ p, significance level α ∈ (0, 1], and any augmented canonical specified complexity model SCα(x) as defined in Definition 4,

    Pr(SCα(X) ≥ b) ≤ α·2^−b.

Corollary 3 (Level-α Property for Specified Complexity III). Let SCα(x) be any augmented canonical specified complexity model as defined in Definition 4. Then for X ∼ p and significance level α ∈ (0, 1],

    Pr(SCα(X) ≥ 0) ≤ α.

Proof. Let b = 0 and invoke Theorem 3.

In other words, the probability (under p) of observing positive values for an augmented canonical specified complexity model with significance level α is no greater than α.
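Both the conservation bound of Theorem 2 and the rejection rule of Corollary 2 are easy to exercise in code. The sketch below (ours; the skewed null distribution and specification values are arbitrary choices) estimates Pr(SC(X) ≥ b) by Monte Carlo sampling, checks it against the 2^−b bound, and prints the cutoffs of Table 1:

```python
import math, random

random.seed(0)
space = list(range(100))
w = [1.0 / (i + 1) for i in space]                  # arbitrary skewed null
p = {x: wi / sum(w) for x, wi in zip(space, w)}
nu = {x: 2.0 ** -x for x in space}                  # arbitrary specification values
r = sum(nu.values())                                # canonical: r = nu(X)

def sc(x):
    return -math.log2(r * p[x] / nu[x])

# Empirical check of the conservation bound (Theorem 2).
draws = random.choices(space, weights=[p[x] for x in space], k=200_000)
for b in (0.5, 1.0, 1.25):
    tail = sum(sc(x) >= b for x in draws) / len(draws)
    print(f"b={b}: empirical Pr(SC >= b) = {tail:.4f} <= 2^-b = {2.0 ** -b:.4f}")

# Rejection cutoffs from Corollary 2, reproducing Table 1.
for alpha in (0.1, 0.05, 0.01, 0.001, 0.0001):
    print(f"alpha = {alpha}: reject when SC(x) >= {-math.log2(alpha):.2f} bits")
```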
3.3 Common Form Model Bound

Although by Definition 3 all canonical specified complexity models are normalized such that ν(X) ≤ r, we can easily obtain a tail-bound for unnormalized common form models using Theorem 2, with a full proof given in the Appendix.

Corollary 4 (Common Form Model Bound). For any common form model SC, where r and ν(X) are as in Definitions 1 and 2, we have

    Pr(SC(X) ≥ b) ≤ 2^−b · r^−1 · ν(X).

In the special case of r = 1, this simplifies to

    Pr(SC(X) ≥ b) ≤ 2^−b · ν(X).

With the above corollary one can make use of non-canonical specified complexity models, since the size of the specification function can be controlled directly in the bound. This gives a result that may be applied to previously published specified complexity models without needing to first define canonical form variants.

3.4 Combining Models

Given that multiple specified complexity models exist, and others may be defined using the recipe in Section 1.2, one naturally wonders if two or more canonical models can be combined to produce yet another canonical model. Namely, given specified complexity models SC1(x), ..., SCm(x), can we form a combined model SC′(x) subject to the bounds given previously? Assume all models share a common p(x) function, so that there is a single random variable X ∼ p for all SCi. There are two simple ways to create a combined model: 1) combine specification functions νi(x) into a hybrid specification function ν′(x) (with which a canonical kardis can be defined), or 2) average the specified complexity models together. We give bounds for both types of combined models, where the first results in a new canonical model and the second is no longer a canonical model, since the product probability function on the single random variable X no longer integrates to 1 in general. For the second type of combined model, we give a looser bound on the probability of observing extreme combined specified complexity values under distribution p.

Theorem 4 (Combined Models I). Let SC1(x), ..., SCm(x) be canonical specified complexity models with corresponding specification functions ν1(x), ..., νm(x), scaling constants r1, ..., rm, and common probability function p(x). Define a set of mixture variables

    Λ = {λi : i = 1, ..., m, 0 ≤ λi ≤ 1},

such that Σ_{i=1}^{m} λi = 1. Then for any such Λ and defining r′ := maxi ri,

    SC′(x) := −log2 ( r′·p(x) / Σ_{i=1}^{m} λi·νi(x) )

is a canonical specified complexity model.

Theorem 5 (Combined Models II). Let SC1(x), ..., SCm(x) be canonical specified complexity models sharing a common probability function p(x). Define a set of mixture variables

    Λ = {λi : i = 1, ..., m, 0 ≤ λi ≤ 1},

such that Σ_{i=1}^{m} λi = 1. Then for any such Λ,

    Pr ( Σ_{i=1}^{m} λi·SCi(X) ≥ b ) ≤ m·2^−b.
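A sketch (ours, illustrative only) of the first combination strategy of Theorem 4; the docstring records why the hybrid remains canonical:

```python
import math

def combine_canonical_models(p, nus, rs, lambdas):
    """Theorem 4: mix specification functions with convex weights and take
    r' = max_i r_i. The hybrid stays canonical because
    sum_i lambda_i * nu_i(X) <= sum_i lambda_i * r_i <= max_i r_i = r'."""
    assert abs(sum(lambdas) - 1.0) < 1e-9
    r_prime = max(rs)
    def sc_combined(x):
        nu_mix = sum(lam * nu(x) for lam, nu in zip(lambdas, nus))
        return -math.log2(r_prime * p(x) / nu_mix)
    return sc_combined
```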
4. ADDITIONAL RESULTS

In this section we show how several existing specified complexity models, such as algorithmic specified complexity [5], irreducible complexity [26], and functional information [27], can be recast into common form, with close variants defined as canonical specified complexity models. Doing so allows us to prove conservation results bounding the probability of observing extreme values under those models. As canonical models, they can also be used to form statistical hypothesis tests as in Section 3.2. Demonstrating the commonality of such a diverse group of models highlights the utility of the canonical model abstraction, and suggests that any model seeking to solve similar problems will converge to a form similar to canonical specified complexity.

4.1 Semiotic Specified Complexity

We begin with a form of specified complexity outlined in Dembski (2005) [4], which was proposed as an improvement and mathematical refinement to the model suggested in Dembski (1998) [1]. Although recent work in algorithmic specified complexity seems aimed at superseding this model [5, 6], we include it here for historical purposes. As shown in the Appendix, semiotic specified complexity is a common form model (though not canonical), so we can invoke Corollary 4 and obtain the following bound, along with a conservation bound on a closely related canonical variant.

Corollary 5 (Bounds on Semiotic Specified Complexity (SSC)). Let p(x) be any probability distribution on a finite discrete space X and define

    SSC(x, s) := −log2 ( 10^120 · ϕs(x) · p(x) ),

where ϕs(x) is the number of patterns available to semiotic agent s which have descriptive complexity no greater than that of x. Then for X ∼ p, we have

    Pr(SSC(X, s) ≥ b) ≤ 2^−b · 10^−120 · Σ_{x∈X} ϕs(x)^−1.

Additionally, whenever |X| ≤ 10^120, the following conservation bound holds:

    Pr(SSC(X, s) ≥ b) ≤ 2^−b.
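For concreteness, a direct transcription of the SSC quantity of Corollary 5 (our sketch; a model of ϕs for the semiotic agent is an assumption the user must supply):

```python
import math

def ssc(x, p, phi_s):
    """Semiotic specified complexity (Corollary 5):
    SSC(x, s) = -log2(10^120 * phi_s(x) * p(x)), where phi_s(x) counts the
    patterns available to agent s with descriptive complexity at most that of x."""
    return -math.log2((10.0 ** 120) * phi_s(x) * p(x))
```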
4.2 Algorithmic Specified Complexity

As mentioned previously, the conservation result from Ewert et al. [6], which initially inspired the work here, follows as a corollary of the general result given in Theorem 2. Algorithmic specified complexity is a canonical specified complexity model, allowing us to apply Theorem 2 directly.

Corollary 6 (Conservation of Algorithmic Specified Complexity (ASC)). For discrete space X, define

    ASC(x, c, p) := −log2 p(x) − K(x|c),

where K(x|c) is the conditional Kolmogorov complexity of x given context c. Then for X ∼ p,

    Pr(ASC(X, c, p) ≥ b) ≤ 2^−b.

Proof. Let r = 1 and ν(x) = 2^−K(x|c), so that

    ASC(x, c, p) = −log2 ( r·p(x)/ν(x) ).    (20)

Because ν is a probability measure on X (being a universal algorithmic probability distribution), this implies ν(X) = 1, making it less than or equal to r. The result then follows from Theorem 2.

Remark. Under the conditions of Corollary 6, given any test statistic T(x) = ASC(x, c, p), we obtain

    Pr(ASC(X, c, p) ≥ T(x)) ≤ 2^−T(x).    (21)

Thus, one can reject the null hypothesis at all significance levels α ≥ 2^−T(x).
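Since K(x|c) is uncomputable, any implementation must substitute a computable stand-in. A common choice, though not one prescribed by this paper, is the length of a standard compression of x; the sketch below (ours) ignores the context c and uses zlib, which upper bounds K and therefore can only underestimate ASC, leaving the bound of Corollary 6 intact for the estimated values:

```python
import math, zlib

def asc_lower_estimate(x_bytes, p_x):
    """Estimate ASC(x, c, p) = -log2 p(x) - K(x|c), substituting the zlib
    compressed length of x (in bits) for the uncomputable K(x|c). Compressed
    length upper-bounds K (up to a constant), so this can only underestimate
    ASC, keeping the tail bound of Corollary 6 safe for the estimate."""
    k_upper_bits = 8 * len(zlib.compress(x_bytes))
    return -math.log2(p_x) - k_upper_bits

# A highly repetitive 1000-bit string under a uniform null on 1000-bit strings:
print(asc_lower_estimate(b"0" * 125, 2.0 ** -1000))   # large positive estimate
```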
4.3 Defining Quantitative Irreducible Complexity

Irreducible complexity as defined by Behe [26] describes

    "...a single system which is composed of several well-matched, interacting parts that contribute to the basic function, and where the removal of any one of the parts causes the system to effectively cease functioning."

In his book [26], Behe proposes irreducible complexity as an indicator of intentional design by intelligent agents. Behe further clarified the definition, adding [28]:

    "An irreducibly complex evolutionary pathway is one that contains one or more unselected steps (that is, one or more necessary-but-unselected mutations). The degree of irreducible complexity is the number of unselected steps in the pathway."

and

    "Demonstration that a system is irreducibly complex is not a proof that there is absolutely no gradual route to its production. Although an irreducibly complex system can't be produced directly, one can't definitively rule out the possibility of an indirect, circuitous route. However, as the complexity of an interacting system increases, the likelihood of such an indirect route drops precipitously."

The somewhat qualitative nature of the original definition has typically led to an 'all-or-nothing' view of irreducible complexity [29, 30]. It can be argued, however, that the original formulation more naturally suggests a continuum of 'irreducibility.' We attempt to provide such a quantitative, non-binary formulation of irreducible complexity here. We make no claim that this is an optimal or otherwise definitive model; it is offered simply as a relevant application of the canonical view of specified complexity.

Within the definitions and clarifications above, a few zero-one indicators of irreducible complexity are present, such as the need for 'several' (more than two) parts, the parts being 'well-matched' and 'interacting', and all parts contributing necessarily to the basic (core) function of the system such that the removal of any of them also removes that functionality. In addition to these indicator features, there are scalar-valued features, such as the complexity of the system (e.g., the number of 'unselected steps', length of 'circuitous route[s]'), the count of interacting components, and the functional specificity of such 'well-matched' interacting parts. Thus, irreducible complexity contains a complexity component (complex system with several interacting parts, all routes must be circuitous with unselected steps) and a specification component (parts contribute to an identifiable function, the core is irreducible, all parts are well-matched). The first aspect ensures irreducibly complex systems are unlikely under proposed distributions governing the emergence of biological systems, and the second captures aspects of human engineered systems, correlating irreducible complexity with intentional design.

Following Behe, let us say a system is irreducible if it contains at least three well-matched components that contribute necessarily to a core function, such that removal or mutilation of any of the components causes the system to cease functioning with respect to the core function.
More formally, given a finite discrete space X, a modular system x ∈ X containing components y1, ..., yℓ (with each yi being drawn from a shared space Y of possible options), and a core function g : X → [0, 1], let

    Cg(x) = {y1, ..., yk}    (22)

be defined as the irreducible core of the system relative to function g, such that g(Cg(x)) > 0 and g(Cg(x)\{yi}) = 0 for all yi ∈ Cg(x) (i.e., the core can perform the function and removing any component from the irreducible core causes the function to cease, which is identified here by a functional value of zero).⁴ If neither x nor any subset of x can perform the function, the irreducible core is simply the empty set. Our definition considers the components of x as they are; we do not allow for modifications to the components (such as fusing two components together or altering one so that it performs the role of two components while eliminating the second). We implicitly assume interactions among core components remain as they were in x, but will consider changes in interaction patterns that might preserve core function when later measuring interaction specificity. We assume function g is scaled so that maximum function occurs when g(x) = 1 and g(x) = 0 corresponds to no detectable core function activity. To make estimation easier, the space X might be restricted to some manifold of the true space, such as the subspace of systems containing k components or fewer (where k corresponds to the number of core components in x).

⁴Note, this implicitly assumes closure of the irreducible core operation within the space X, such that X also contains any system that could be constructed from x's components or their alternatives.
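A sketch (ours; the component representation and core function g are assumptions supplied by the modeler) of the irreducible-core check implied by Equation 22:

```python
def is_irreducible_core(core, g):
    """Check the defining property of Equation 22 for a candidate core (a list
    of components): g(core) > 0, and deleting any single component drives the
    core function g to zero. Per the text, we also require at least three parts."""
    if len(core) < 3 or g(core) <= 0:
        return False
    return all(g(core[:i] + core[i + 1:]) == 0 for i in range(len(core)))
```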
For a configuration x with irreducible core Cg(x), we can derive the following features from Cg(x) to form specification function ν. Let k be the number of components in the core, which will act as a scalar multiplier for our function, so that when there are many necessary components, the specification function will take larger values. Let m be the number of functionally relevant interactions between the components of Cg(x), where an interaction is functionally relevant if removing it causes a change in the value of g(x). Interactions that do not contribute to or otherwise cause detectable differences in the core function are functionally irrelevant, and are thus excluded.

In addition to the number of interacting components, our specification function will also include terms for part specificity and interaction specificity, corresponding to the 'well-matched' qualifier of Behe's original definition. Well-matched components of a system are often an indication of intentional purpose, allowing humans to perceive a coherent whole from many separate pieces, where the wholeness suggests a higher-level intentional cohesion or concept. Figure 1 shows how strong interface matching is suggestive of cohesion. In the figure, the left hand side contains two blue-tinted sub-figures. At the top, we perceive two separate pieces, whereas the image on the bottom is perceived as a unified whole, even though it is also comprised of two separate pieces, albeit with well-matched interfaces connecting them. On the right, there is another contrasting pair of peach-tinted images, where the pieces without matching interfaces are quickly perceived to be separate entities (Many), whereas the pieces sharing matching interfaces are perceived as a single ring (Whole), rather than as the two separate pieces that comprise it. In both cases, the well-matched interfaces strongly suggest cohesion, which is one of the ways humans identify functional wholes rather than collections of individual pieces. At minimum, they signal a correlation that might be the result of a common cause or top-down, intentional coordination (especially when resulting in functional systems) [31]. Furthermore, well-matched components contribute to the improbability of a configuration by limiting the number of viable alternative options. Thus, they contribute simultaneously to the complexity and specificity of a system.

Figure 1: Whole vs. Many. Tight interface coupling suggests connection and coherent wholeness. When interfaces are not tightly matched, we perceive parts as individual items rather than a unified, coherent whole. doi:10.5048/BIO-C.2018.4.f1

For each component yi in core Cg(x), let w(yi) be the proportion of components from the space Y of possible options that maintain core function when replacing yi in the system, namely,

    w(yi) := |{a : a ∈ Y, g(Cg(x_{yi,a})) > 0}| / |Y|,    (23)

where x_{yi,a} denotes the configuration that replaces component yi with component a, and Y denotes the subspace of possible alternate components (e.g., a component library, lexicon, or alphabet). Define

    γ1(x) := Π_i 1/w(yi)    (24)

as the product of the reciprocals of all w(yi) proportions, so that when parts are very specific within the system (not many alternatives maintain function), the w(yi) proportions are small and thus the γ1 value is large. Under the conditions that yi ∈ Y for all yi and assuming a finite