Predictability, Complexity, and Learning

ARTICLE Communicated by Jean-Pierre Nadal

William Bialek
NEC Research Institute, Princeton, NJ 08540, U.S.A.

Ilya Nemenman
NEC Research Institute, Princeton, NJ 08540, U.S.A., and Department of Physics, Princeton University, Princeton, NJ 08544, U.S.A.

Naftali Tishby
NEC Research Institute, Princeton, NJ 08540, U.S.A., and School of Computer Science and Engineering and Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel

Neural Computation 13, 2409–2463 (2001). © 2001 Massachusetts Institute of Technology.

We define predictive information I_pred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: I_pred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then I_pred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and in the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of I_pred(T) provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.

1 Introduction

There is an obvious interest in having practical algorithms for predicting the future, and there is a correspondingly large literature on the problem of time-series extrapolation.¹ But prediction is both more and less than extrapolation. We might be able to predict, for example, the chance of rain in the coming week even if we cannot extrapolate the trajectory of temperature fluctuations. In the spirit of its thermodynamic origins, information theory (Shannon, 1948) characterizes the potentialities and limitations of all possible prediction algorithms, as well as unifying the analysis of extrapolation with the more general notion of predictability. Specifically, we can define a quantity, the predictive information, that measures how much our observations of the past can tell us about the future. The predictive information characterizes the world we are observing, and we shall see that this characterization is close to our intuition about the complexity of the underlying dynamics.

¹ The classic papers are by Kolmogoroff (1939, 1941) and Wiener (1949), who essentially solved all the extrapolation problems that could be solved by linear methods. Our understanding of predictability was changed by developments in dynamical systems, which showed that apparently random (chaotic) time series could arise from simple deterministic rules, and this led to vigorous exploration of nonlinear extrapolation algorithms (Abarbanel et al., 1993). For a review comparing different approaches, see the conference proceedings edited by Weigend and Gershenfeld (1994).

Prediction is one of the fundamental problems in neural computation. Much of what we admire in expert human performance is predictive in character: the point guard who passes the basketball to a place where his teammate will arrive in a split second, the chess master who knows how moves made now will influence the end game two hours hence, the investor who buys a stock in anticipation that it will grow in the year to come.
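To make the definition concrete, the following minimal sketch estimates I_pred numerically for the simplest nontrivial case, a binary Markov chain. Everything here is illustrative rather than part of the paper's analysis: the chain, the window length w, and the plug-in entropy estimator are our own choices, and the plug-in estimate is biased upward when the window is long relative to the amount of data.

```python
import numpy as np
from collections import Counter

def sample_markov_chain(n, p_flip=0.1, seed=0):
    """Binary Markov chain: flip the current state with probability p_flip."""
    rng = np.random.default_rng(seed)
    flips = (rng.random(n) < p_flip).astype(int)
    flips[0] = rng.integers(2)                 # random initial state
    return np.bitwise_xor.accumulate(flips)    # x[t] = x[t-1] XOR flips[t]

def entropy_bits(counts):
    """Shannon entropy (bits) of an empirical distribution (a Counter)."""
    p = np.fromiter(counts.values(), dtype=float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

def predictive_information(x, w):
    """Plug-in estimate of I(past; future) = H(past) + H(future) - H(past, future),
    with 'past' and 'future' taken as length-w windows on either side of t."""
    past, future, joint = Counter(), Counter(), Counter()
    for t in range(w, len(x) - w + 1):
        a, b = tuple(x[t - w:t]), tuple(x[t:t + w])
        past[a] += 1
        future[b] += 1
        joint[a, b] += 1
    return entropy_bits(past) + entropy_bits(future) - entropy_bits(joint)

x = sample_markov_chain(100_000)
for w in (1, 2, 4, 6):
    print(f"w = {w}:  I_pred ~ {predictive_information(x, w):.3f} bits")
```

For a Markov chain the estimates should plateau near 1 - H_2(0.1) ≈ 0.53 bits as w grows, since the past beyond the most recent symbol adds no predictive power: this is the first of the three behaviors above, I_pred(T) remaining finite.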
More generally, we gather sensory information not for its own sake but in the hope that this information will guide our actions (including our verbal actions). But acting takes time, and sense data can guide us only to the extent that those data inform us about the state of the world at the time of our actions, so the only components of the incoming data that have a chance of being useful are those that are predictive. Put bluntly, nonpredictive information is useless to the organism, and it therefore makes sense to isolate the predictive information. It will turn out that most of the information we collect over a long period of time is nonpredictive, so that isolating the predictive information must go a long way toward separating out those features of the sensory world that are relevant for behavior.

One of the most important examples of prediction is the phenomenon of generalization in learning. Learning is formalized as finding a model that explains or describes a set of observations, but again this is useful only because we expect this model will continue to be valid. In the language of learning theory (see, for example, Vapnik, 1998), an animal can gain selective advantage not from its performance on the training data but only from its performance at generalization. Generalizing, and not "overfitting" the training data, is precisely the problem of isolating those features of the data that have predictive value (see also Bialek and Tishby, in preparation). Furthermore, we know that the success of generalization hinges on controlling the complexity of the models that we are willing to consider as possibilities.

Finally, learning a model to describe a data set can be seen as an encoding of those data, as emphasized by Rissanen (1989), and the quality of this encoding can be measured using the ideas of information theory. Thus, the exploration of learning problems should provide us with explicit links among the concepts of entropy, predictability, and complexity.

The notion of complexity arises not only in learning theory, but also in several other contexts. Some physical systems exhibit more complex dynamics than others (turbulent versus laminar flows in fluids), and some systems evolve toward more complex states than others (spin glasses versus ferromagnets). The problem of characterizing complexity in physical systems has a substantial literature of its own (for an overview, see Bennett, 1990). In this context several authors have considered complexity measures based on entropy or mutual information, although, as far as we know, no clear connections have been drawn among the measures of complexity that arise in learning theory and those that arise in dynamical systems and statistical mechanics.

An essential difficulty in quantifying complexity is to distinguish complexity from randomness. A true random string cannot be compressed and hence requires a long description; it thus is complex in the sense defined by Kolmogorov (1965; Li & Vitanyi, 1993; Vitanyi & Li, 2000), yet the physical process that generates this string may have a very simple description.
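This gap between a complex-looking string and a simple generating process can be made tangible with an off-the-shelf compressor. The sketch below is ours, not the authors'; compressed length under zlib is only a crude, computable stand-in for Kolmogorov complexity, which is uncomputable. A pseudorandom string is incompressible even though the process that produced it has a one-line description:

```python
import random
import zlib

random.seed(0)

# Looks random: zlib cannot compress it, so its description length stays ~10,000 bytes...
rand_bytes = bytes(random.getrandbits(8) for _ in range(10_000))

# ...yet the process generating it (a seeded PRNG) has a very short description.
# A string from an even simpler deterministic rule compresses to almost nothing:
simple_bytes = bytes((7 * t) % 256 for t in range(10_000))

print(len(zlib.compress(rand_bytes)))    # close to 10,000 bytes: incompressible
print(len(zlib.compress(simple_bytes)))  # a few dozen bytes: highly compressible
```

The pseudorandom case is exactly the situation described above: the string itself requires a long description, but the (here, computational) process behind it is simple.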
Both in statistical mechanics and in learning theory, our intuitive notions of complexity correspond to statements about the complexity of the underlying process, and not directly to the description length or Kolmogorov complexity. Our central result is that the predictive information provides a general measure of complexity, which includes as special cases the relevant concepts from learning theory and dynamical systems. While work on complexity in learning theory rests specifically on the idea that one is trying to infer a model from data, the predictive information is a property of the data (or, more precisely, of an ensemble of data) themselves, without reference to a specific class of underlying models. If the data are generated by a process in a known class but with unknown parameters, then we can calculate the predictive information explicitly and show that this information diverges logarithmically with the size of the data set we have observed; the coefficient of this divergence counts the number of parameters in the model or, more precisely, the effective dimension of the model class, and this provides a link to known results of Rissanen and others (see the numerical sketch at the end of this excerpt for the simplest one-parameter case). We also can quantify the complexity of processes that fall outside the conventional finite dimensional models, and we show that these more complex processes are characterized by a power law rather than a logarithmic divergence of the predictive information.

By analogy with the analysis of critical phenomena in statistical physics, the separation of logarithmic from power-law divergences, together with the measurement of coefficients and exponents for these divergences, allows us to define "universality classes" for the complexity of data streams. The power law or nonparametric class of processes may be crucial in real-world learning tasks, where the effective number of parameters becomes so large that asymptotic results for finitely parameterizable models are inaccessible in practice. There is empirical evidence that simple physical systems can generate dynamics in this complexity class, and there are hints that language also may fall in this class.

Finally, we argue that the divergent components of the predictive information provide a unique measure of complexity that is consistent with certain simple requirements. This argument is in the spirit of Shannon's original derivation of entropy as the unique measure of available information. We believe that this uniqueness argument provides a conclusive answer to the question of how one should quantify the complexity of a process generating a time series.

With the evident cost of lengthening our discussion, we have tried to give a self-contained presentation that develops our point of view, uses simple examples to connect with known results, and then generalizes and goes beyond these results.² Even in cases
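As a numerical check of the logarithmic divergence discussed above, take the simplest one-parameter model (K = 1): a coin whose bias is unknown. The sketch below assumes a uniform prior over the bias, and the sample sizes are arbitrary; both are our illustrative choices. It computes the sequence entropy S(N) exactly and forms I_pred(N) = 2 S(N) - S(2N), the mutual information between the two halves of a record of length 2N, which should grow by (1/2) log2(10) ≈ 1.66 bits per decade in N.

```python
import numpy as np
from scipy.special import gammaln

def seq_entropy(N):
    """Exact entropy (bits) of N coin flips whose bias is drawn from a
    uniform prior: every head-count k is then equally likely, and each of
    the C(N, k) orderings has probability 1 / ((N+1) * C(N, k))."""
    k = np.arange(N + 1)
    log2_binom = (gammaln(N + 1) - gammaln(k + 1) - gammaln(N - k + 1)) / np.log(2)
    return np.log2(N + 1) + log2_binom.mean()

prev = None
for N in (10, 100, 1_000, 10_000):
    # Predictive information between the two halves of a record of length 2N.
    i_pred = 2 * seq_entropy(N) - seq_entropy(2 * N)
    growth = "" if prev is None else f"   (+{i_pred - prev:.3f}; (1/2)log2(10) = {0.5 * np.log2(10):.3f})"
    print(f"N = {N:>6}:  I_pred = {i_pred:.3f} bits{growth}")
    prev = i_pred
```

The absolute value of I_pred(N) carries a constant offset, but the increment per decade approaches (1/2) log2(10) bits, the (K/2) log N behavior claimed above for K = 1; a K-parameter model would multiply this slope by K.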