A Unified Model of Complex Specified Information
Total Page:16
File Type:pdf, Size:1020Kb
Research Article A Unified Model of Complex Specified Information George D. Montanez*˜ Department of Computer Science, Harvey Mudd College, Claremont, CA, USA Abstract A mathematical theory of complex specified information is introduced which unifies several prior methods of computing specified complexity. Similar to how the exponential family of probability distributions have dissimilar surface forms yet share a common underlying mathematical identity, we define a model that allows us to cast Dembski’s semiotic specified complexity, Ewert et al.’s algorithmic specified complexity, Hazen et al.’s functional information, and Behe’s irreducible complexity into a common mathematical form. Adding additional constraints, we introduce canonical specified complexity models, for which one-sided conservation bounds are given, showing that large specified complexity values are unlikely under any given continuous or discrete distribution and that canonical models can be used to form statistical hypothesis tests, by bounding tail probabilities for arbitrary distributions. Cite As: Montanez˜ GD (2018) A Unified Model of Complex Specified Information. BIO-Complexity 2018 (4):1-26. doi:10.5048/BIO-C.2018.4. Editor: Robert J. Marks II Received: June 29, 2018; Accepted: November 30, 2018; Published: December 14, 2018 Copyright: © 2018 George D. Montanez.˜ This open-access article is published under the terms of the Creative Commons Attribution License, which permits free distribution and reuse in derivative works provided the original author(s) and source are credited. Notes: A Critique of this paper, when available, will be assigned doi:10.5048/BIO-C.2018.4.c. *Email: [email protected] \Life must depend on a higher level of complexity, structure without predictable repetition, he argued." - James Gleick, \The Information: A History, a Theory, a Flood", ch. 9 INTRODUCTION Recent work in specified complexity has proposed using algorithmic information theory as a way of com- Specified complexity, the property of an object being both putationally grounding specified complexity, under the unlikely and structurally organized, has been proposed name of algorithmic specified complexity (ASC) [5, 6]. as a signifier of design [1{3]. Objects exhibiting specified Ewert and collaborators have shown that under any pro- complexity must be both complex (e.g., unlikely under posed discrete distribution, sequences exhibiting high the relevant probability distribution) and specified (e.g., levels of ASC must be unlikely [6]. We extend this body conform to an independent or detached specification). of work by generalizing the result of Ewert et al. [6], While mathematical methods for measuring specified showing that all specified complexity models of a certain complexity have changed over the decades [1, 4{6], the class, which we call canonical models, share this property. underlying components of specification and complex- We re-derive the existing result of Ewert et al. [6] as a ity have remained constant. Possible applications of corollary to our general theorem, and prove a number specified complexity include differentiating malicious net- of previously unknown results, including a conservation work behavior from non-intentional anomalies, identify- result for a variant of semiotic specified complexity, in- ing genetically-modified material in agricultural products troduced in Dembski (2005) [4]. These results add to (such as artificially engineered gene sequences), identify- a body of work generally known as conservation of in- ing intelligently engineered extra-terrestrial signals, and formation theorems, which prove that in many contexts testing for artificial general intelligence [7]. At its core, it takes as much information to find an object as you it has been proposed as a method of sorting artificial might gain by using that object, rendering explanations or teleological structures from naturally occurring unin- that rely on such specialized objects as cases of special tentional ones. Specified complexity has been severely pleading. Viewed as a conservation theorem, if one char- criticized, both for its underlying logic and its various acterizes specified complexity as a form of information formulations [8{11]. signal measurable in bits, the probability under the pro- Volume 2018 j Issue 4 j Page 1 A Unified Model of Complex Specified Information posed distribution of encountering a structure exhibiting the function prior to observing x, but need not be, as b or more bits of complex specified information is no long as the function itself is not conditioned on x when greater than 2−b, thus incurring an information cost of computing the value for x. In other words, specification at least b bits. functions of the form νx(x) or ν(xjx) are invalid. This More importantly, this work formally reduces speci- danger of fabrication is a general vulnerability of statis- fied complexity testing to a form of statistical hypothe- tical hypothesis testing, since one could inappropriately sis testing, a view inspired by Dembski (2005) [4] (and choose the significance level α (acting as the specification preceded by Milosavljevi´c[12]). Accordingly, we demon- function) in response to the observed test statistic value, strate the connection for canonical specified complexity or even define a test statistic Tx(X) = c · 1fX = xg models, where the model core (called a kardis) plays the (where 1{·} denotes the indicator function, which equals same role as a p-value under a null distribution. We 1 when the argument evaluates to true, and 0 otherwise) prove that for such models the null-distribution probabil- that always assigns a large value c to the particular x ity mass of values as extreme as the one observed is upper observed and zero to all other possible observations, lead- bounded by the significance level α, which is the crucial ing to the smallest possible critical region, a singleton property for p-value significance tests. Thus, we formally set containing only the observation itself. If the danger establish the canonical model as a particular statistical of fabrications forces us to reject specified complexity hypothesis test statistic, one which would prove useful hypothesis testing then we must also reject many other in cases where only a likelihood and a specification value statistical hypothesis tests, which all suffer from the same are available and the behavior of the distribution outside vulnerability. of the given observation is otherwise unknown. We begin by reviewing the general structure of speci- 1.1 Common Form and Canonical Models fied complexity models and defining their common and We begin with a few definitions. canonical model forms, along with providing a general Definition 1 (ν(X )). For any integrable nonnegative recipe for constructing new specified complexity models. specification function ν : X! , define ν(X ) as We then review statistical hypothesis testing and the R≥0 follows: level-α property, formally reducing p-value testing to a 8 form of specified complexity model hypothesis testing. R ν(x)dx if continuous, Conservation and level-α test properties of canonical <> X ν(X ) := P ν(x) if discrete, (1) specified complexity models are proven, and instructions x2X :>R for conducting specified complexity model hypothesis X dν(x) in general. tests are also given. Algorithmic specified complexity, Definition 2 (Common Form and Kardis). For any semiotic specified complexity, and functional informa- probability distribution p(x) on space X , any strictly tion are reduced to common form, canonical variants positive scaling constant r 2 and any nonnegative are derived, and conservation theorems are proven for R>0 function ν : X! , we define a common form model each as corollaries to our results for the general canonical R≥0 as specified complexity models. While criticisms of speci- fied complexity [8{11] (and their rebuttals [2, 13{16]) are p(x) SC(x) := − log r (2) duly acknowledged, we do not attempt to address them 2 ν(x) here, as the purpose of this manuscript is to prove the mathematical commonality of several complexity metrics, with specified complexity kardis κ(x) = r(p(x)/ν(x)). not to defend them individually. Definition 3 (Canonical Specified Complexity Model). Any common form model constrained such that ν(X ) ≤ r 1. SPECIFIED COMPLEXITY MODELS is a canonical specified complexity model. A specified complexity model is a function of an obser- vation X taking values in some space X , where X is It should be stated explicitly that the constraint for distributed according to probability distribution p(x) canonical models is what allows us to bound the proba- (denoted as X ∼ p). It contains two components, one bility of observing extreme specified complexity values, measuring the likelihood of the observation under p (re- and that ideally ν(X ) = r, which allows us to obtain ferred to as the complexity term for historical reasons), tight bounds. In addition, the scaling constant r should and another component that measures the specificity not depend on x. To define a canonical model, it suffices 1 of the observation, according to a specification function to define the kardis κ(x) and ensure that the constraints ν(x). This function should be computable or capable of are satisfied for r, p, and ν. As a useful example, given being estimated given x and should be definable indepen- any kardis κ(x) from a canonical specified complexity dently of the particular observation x, to avoid cherry- 1From Greek kardi´a (heart) and Ilocano kardis (pigeon pea, picking and fabrications. This can be done by defining which is a small component contained in a larger pod). Volume 2018 j Issue 4 j Page 2 A Unified Model of Complex Specified Information model, we can define an augmented kardis (and thus f(x) is the original function and ≥ 0 is some small augmented model) that incorporates a significance level constant used to prevent undefined values. If needed, term within the model. We do so as follows. we can then rescale γ(x) into the range [0; 1] by using zero-one scaling: Definition 4 (rα, κα(x), and SCα(x)).