Predictability of Extreme Events in Time Series

Total Pages: 16

File Type: pdf, Size: 1020 KB

Predictability of Extreme Events in Time Series

Dissertation for the award of the doctoral degree Doctor rerum naturalium (Dr. rer. nat.) in the Department of Mathematics and Natural Sciences (Physics Group) of the Bergische Universität Wuppertal, WUB-DIS 2008-05. Submitted by Sarah Hallerberg, born on 28.12.1979 in Bad Oeynhausen. June 2008. Prepared at the Max Planck Institute for the Physics of Complex Systems, Dresden.

This dissertation can be cited as: urn:nbn:de:hbz:468-20080438 [http://nbn-resolving.de/urn/resolver.pl?urn=urn%3Anbn%3Ade%3Ahbz%3A468-20080438]

Abstract

In this thesis we address the prediction of extreme events by observing precursory structures, which are identified using a maximum likelihood approach. The main goal of this thesis is to investigate how the quality of a prediction depends on the magnitude of the events under study. Until now, this dependence was reported only sporadically for individual phenomena, without being understood as a general feature of predictions. We propose the magnitude dependence as a property of a prediction, indicating whether larger events can be predicted better, worse, or equally well compared with smaller events. Furthermore, we specify a condition which characterizes the magnitude dependence of a distinguished measure of prediction quality, the Receiver Operator Characteristic curve (ROC). This test condition allows us to relate the magnitude dependence of a prediction task to the joint PDF of events and precursory variables. If we can describe the numerical estimate of this joint PDF by an analytic expression, we can not only characterize the magnitude dependence of events observed so far, but also infer the magnitude dependence of events larger than the observed ones.
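The ROC curve mentioned above can be sketched numerically. The following is a hedged illustration with purely synthetic data (all variable names and parameter values are assumptions, not taken from the thesis): binary events and a correlated precursor score, with an alarm threshold swept over the score.

```python
import numpy as np

# Synthetic sketch: binary events y and a continuous precursor score s.
# Sweeping an alarm threshold over s yields the ROC curve: hit rate
# (alarms issued before events) against false alarm rate.
rng = np.random.default_rng(0)
n = 10_000
y = rng.random(n) < 0.1                          # ~10% of steps are events
s = rng.normal(loc=y.astype(float))              # precursor correlates with events

thresholds = np.linspace(s.min(), s.max(), 200)
hit_rate = np.array([(s[y] > t).mean() for t in thresholds])
false_rate = np.array([(s[~y] > t).mean() for t in thresholds])

# Area under the ROC curve via the rank (Mann-Whitney) statistic:
# the probability that an event's precursor exceeds a non-event's.
auc = (s[y][:, None] > s[~y][None, :]).mean()
print(f"AUC = {auc:.3f}")                        # 0.5 = no skill, 1.0 = perfect
```

Repeating this computation while varying the event magnitude is the kind of experiment the thesis formalizes.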
Having specified the test condition, we study the magnitude dependence of the prediction of increments and threshold crossings in sequences of random variables and in short- and long-range correlated stochastic processes. Depending on the distribution of the process under study, we obtain different magnitude dependences for the prediction of increments in Gaussian, exponentially, symmetrized exponentially, power-law and symmetrized power-law distributed processes. For threshold crossings we obtain the same magnitude dependence for all distributions studied. Furthermore, we study the dependence on the event magnitude for the prediction of increments and threshold crossings in velocity increments, measured in a free-jet flow and in wind-speed measurements. Additionally, we introduce a method of post-processing the output of ensemble weather forecast models in order to identify precursory behavior which could indicate failures of weather forecasts. We then study not only the success of this method, but also its magnitude dependence.

keywords: extreme events, statistical inference, prediction via precursors, ROC curves, likelihood ratio, magnitude dependence

Zusammenfassung

In this thesis we investigate predictions of extreme events in time series which are made by observing precursory structures. We select suitable precursory structures by means of conditional probability distributions (a maximum likelihood approach). The main concern of this thesis is to study the dependence of the predictions on the event magnitude. So far it has only sporadically been reported for individual phenomena that larger events are better predictable than smaller ones, without interpreting this as a general property of predictions. We therefore introduce the magnitude dependence as a property of a prediction.
The magnitude dependence can be positive, negative or zero, and thus indicates whether larger, and hence more extreme, events are better, worse or equally well predictable than smaller ones. Furthermore, we specify a condition which characterizes the magnitude dependence with respect to a particular measure of prediction quality, the Receiver Operator Characteristic. This condition makes it possible to trace the magnitude dependence back to the joint distribution of event and precursory structure. Provided we can describe this distribution analytically, we can use previous observations to estimate the magnitude dependence of events that are larger than all events observed so far. In doing so we assume that extreme and non-extreme events follow the same probability distribution. Using the test condition, we then study the magnitude dependence of predictions of increments and threshold crossings in sequences of random numbers as well as in short- and long-range correlated stochastic processes. Depending on the distribution of the process under study, we obtain different results for the magnitude dependence of increment predictions. Predictions of threshold crossings, however, showed qualitatively the same magnitude dependence for all distributions studied. Furthermore, we investigated predictions of increments and threshold crossings in time series of velocity increments, measured in a free-jet experiment and in near-surface wind. Additionally, we developed a method for predicting failures of weather forecasts. This method constructs, from ensemble weather forecasts, indicators for a large deviation between the predicted and the observed temperature.
We also examined the magnitude dependence for these predictions of deviating temperature forecasts.

Stichwörter (keywords): extreme events, prediction via precursory structures, Receiver Operator Characteristic, maximum likelihood approach, magnitude dependence

Acknowledgments

First of all I would like to thank Holger Kantz for suggesting the idea to study the magnitude dependence, and for innumerable discussions, advice, support and encouragement during his supervision of my thesis. Additionally I have benefited from various discussions about the contents of this thesis and other topics. I would therefore like to thank Eduardo Goldani Altmann, Anja Riegert, Alexander Croy, Detlef Holstein, Sara Pinto Garcia and Friedrich Lenz. I am especially grateful to Jochen Bröcker for his suggestions concerning the precursors for large failures of weather forecasts and for pointing out the importance of the Neyman-Pearson Lemma. I am also grateful to the group of Joachim Peinke at the University of Oldenburg, especially to Julia Gotschall, for providing us with their excellent data from the free-jet experiment and their wind speed measurements. The weather predictions and the corresponding verifications were provided by the European Centre for Medium-Range Weather Forecasts and were made accessible to me through a collaboration with Leonard A. Smith and his group in the Center for the Analysis of Time Series at the London School of Economics. I would therefore like to use the opportunity to also thank the members of this group for their support. Furthermore I would like to thank Astrid S. de Wijn, Anja Riegert and Sara Pinto Garcia for their moral support, encouragement and friendship. Many thanks also to all the colleagues and staff of the Max Planck Institute for the Physics of Complex Systems for providing a friendly working environment and for their help in many practical issues.
I am also extremely grateful to my parents for their help and support in many practical issues, especially for taking care of moving my possessions and refurbishing my flat these days, so that I could concentrate on completing this thesis. Finally I would like to thank the Max Planck Society for funding this thesis and the DAAD for funding the collaboration with the Center for the Analysis of Time Series in London.

I hereby declare that I have produced the present work without inadmissible help from third parties and without aids other than those stated; ideas taken directly or indirectly from external sources are identified as such. The work has not previously been submitted, in Germany or abroad, in the same or a similar form to any other examination authority. The present dissertation was prepared at the Max Planck Institute for the Physics of Complex Systems under the scientific supervision of Prof. Dr. Holger Kantz. I have not undertaken any previous doctoral examination procedures. I hereby acknowledge the doctoral regulations of the Department of Mathematics and Natural Sciences of the Bergische Universität Wuppertal.

Contents

1 Introduction
2 Theoretical Background
2.1 Extreme Events in Time Series
2.1.1 What are extreme events?
2.1.2 Extreme Events in Time Series
2.1.3 Assigning a magnitude to the events under study
2.2 Predictions via Precursory Structures
2.2.1 Describing stochastic processes with conditional probabilities via Markov chains
2.2.2 Applying finite conditioning to general (non-Markovian) processes
2.2.3 From Hypothesis Testing to the Bayes Classifier
2.3 Measuring the Quality of a Prediction
2.3.1 Predictability, Kullback-Leibler distance and Mutual Information
2.3.2 Forecast Verification
2.3.3 Brier Score and Ignorance
Recommended publications
  • Kalman and Particle Filtering
    Abstract: The Kalman and Particle filters are algorithms that recursively update an estimate of the state and find the innovations driving a stochastic process, given a sequence of observations. The Kalman filter accomplishes this goal by linear projections, while the Particle filter does so by a sequential Monte Carlo method. With the state estimates, we can forecast and smooth the stochastic process. With the innovations, we can estimate the parameters of the model. The article discusses how to set a dynamic model in a state-space form, derives the Kalman and Particle filters, and explains how to use them for estimation.

    Since both filters start with a state-space representation of the stochastic processes of interest, section 1 presents the state-space form of a dynamic model. Then, section 2 introduces the Kalman filter and section 3 develops the Particle filter. For extended expositions of this material, see Doucet, de Freitas, and Gordon (2001), Durbin and Koopman (2001), and Ljungqvist and Sargent (2004).

    1. The state-space representation of a dynamic model

    A large class of dynamic models can be represented by a state-space form:

        X_{t+1} = φ(X_t, W_{t+1}; γ)    (1)
        Y_t = g(X_t, V_t; γ).           (2)

    This representation handles a stochastic process by finding three objects: a vector that describes the position of the system (a state, X_t ∈ X ⊂ R^l) and two functions, one mapping the state today into the state tomorrow (the transition equation, (1)) and one mapping the state into observables, Y_t (the measurement equation, (2)).
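The recursion described above can be made concrete for the scalar linear-Gaussian special case of equations (1)-(2). This is a minimal sketch under assumed parameter values, not the article's own code: x_{t+1} = a·x_t + w_{t+1} and y_t = x_t + v_t, with the filter alternating predict and update (linear projection) steps and emitting the innovations.

```python
import numpy as np

# Minimal scalar Kalman filter for the linear-Gaussian special case of
# the state-space form: x_{t+1} = a*x_t + w_{t+1},  y_t = x_t + v_t.
# All names and parameter values are illustrative assumptions.
def kalman_filter(y, a=0.9, q=1.0, r=1.0, x0=0.0, p0=1.0):
    """Return filtered state means and innovations y_t - E[y_t | past]."""
    x_hat, p = x0, p0
    means, innovations = [], []
    for obs in y:
        # Predict step (transition equation).
        x_pred = a * x_hat
        p_pred = a * a * p + q
        # Innovation: the part of the observation not explained by the past.
        innov = obs - x_pred
        innovations.append(innov)
        # Update step (linear projection onto the observation).
        k = p_pred / (p_pred + r)          # Kalman gain
        x_hat = x_pred + k * innov
        p = (1.0 - k) * p_pred
        means.append(x_hat)
    return np.array(means), np.array(innovations)

# Simulate the model, then filter it back.
rng = np.random.default_rng(1)
x, ys = 0.0, []
for _ in range(500):
    x = 0.9 * x + rng.normal()
    ys.append(x + rng.normal())
means, innov = kalman_filter(np.array(ys))
# For a correctly specified filter the innovations are ~uncorrelated.
print(np.corrcoef(innov[:-1], innov[1:])[0, 1])
```

The near-zero lag-1 correlation of the innovations is the diagnostic the article exploits for parameter estimation.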
  • Lecture 19: Wavelet Compression of Time Series and Images
    Lecture 19: Wavelet compression of time series and images
    © Christopher S. Bretherton, Winter 2014
    Ref: Matlab Wavelet Toolbox help.

    19.1 Wavelet compression of a time series
    The last section of wavelet leleccum notoolbox.m demonstrates the use of wavelet compression on a time series. The idea is to keep the wavelet coefficients of largest amplitude and zero out the small ones.

    19.2 Wavelet analysis/compression of an image
    Wavelet analysis is easily extended to two-dimensional images or datasets (data matrices) by first doing a wavelet transform of each column of the matrix, then transforming each row of the result (see wavelet image). The wavelet coefficient matrix has the highest-level (largest-scale) averages in the first rows/columns, then successively smaller detail scales further down the rows/columns. The example also shows fine results with 50-fold data compression.

    19.3 Continuous Wavelet Transform (CWT)
    Given a continuous signal u(t) and an analyzing wavelet ψ(x), the CWT has the form

        W(λ, t) = λ^(−1/2) ∫_{−∞}^{∞} ψ((s − t)/λ) u(s) ds    (19.3.1)

    Here λ, the scale, is a continuous variable. We insist that ψ have mean zero and that its square integrates to 1. The continuous Haar wavelet is defined:

        ψ(t) = 1 for 0 < t < 1/2;  −1 for 1/2 < t < 1;  0 otherwise.    (19.3.2)

    W(λ, t) is proportional to the difference of running means of u over successive intervals of length λ/2. In practice, for a discrete time series, the integral is evaluated as a Riemann sum using the Matlab wavelet toolbox function cwt.
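The keep-the-largest-coefficients idea from section 19.1 can be sketched without the Matlab toolbox. This is a minimal one-level Haar version in NumPy (an assumed stand-in, not the lecture's own code): transform, zero out the smaller detail coefficients, and reconstruct.

```python
import numpy as np

def haar_step(u):
    """One level of the orthonormal Haar transform: averages and details."""
    u = np.asarray(u, float)
    avg = (u[0::2] + u[1::2]) / np.sqrt(2.0)
    det = (u[0::2] - u[1::2]) / np.sqrt(2.0)
    return avg, det

def inverse_haar_step(avg, det):
    """Invert haar_step exactly."""
    out = np.empty(2 * len(avg))
    out[0::2] = (avg + det) / np.sqrt(2.0)
    out[1::2] = (avg - det) / np.sqrt(2.0)
    return out

# Compression: zero out the smaller half of the detail coefficients
# (keep the largest-amplitude ones), then reconstruct.
rng = np.random.default_rng(2)
t = np.linspace(0, 1, 256)
signal = np.sin(2 * np.pi * 4 * t) + 0.05 * rng.normal(size=256)
avg, det = haar_step(signal)
det_kept = np.where(np.abs(det) >= np.quantile(np.abs(det), 0.5), det, 0.0)
recon = inverse_haar_step(avg, det_kept)
err = np.max(np.abs(recon - signal))
print(f"max reconstruction error = {err:.3f}")
```

Because the smooth signal concentrates its energy in few coefficients, discarding half the details barely changes the reconstruction; deeper recursion on `avg` gives the multi-level compression the lecture describes.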
  • Wavelet Entropy Energy Measure (WEEM): a Multiscale Measure to Grade a Geophysical System's Predictability
    EGU21-703, updated on 28 Sep 2021, https://doi.org/10.5194/egusphere-egu21-703, EGU General Assembly 2021. © Author(s) 2021. This work is distributed under the Creative Commons Attribution 4.0 License.

    Wavelet Entropy Energy Measure (WEEM): A multiscale measure to grade a geophysical system's predictability
    Ravi Kumar Guntu and Ankit Agarwal, Indian Institute of Technology Roorkee, Hydrology, Roorkee, India ([email protected])

    Model-free gradation of the predictability of a geophysical system is essential to quantify how much inherent information is contained within the system and to evaluate different forecasting methods' performance to get the best possible prediction. We conjecture that the multiscale information enclosed in a given geophysical time series is the only input source for any forecast model. In the literature, established entropic measures for grading the predictability of a time series at multiple time scales are limited. Therefore, we need an additional measure to quantify the information at multiple time scales, thereby grading the predictability level. This study introduces a novel measure, the Wavelet Entropy Energy Measure (WEEM), based on wavelet entropy, to investigate a time series's energy distribution. From the WEEM analysis, predictability can be graded from low to high. The difference between the entropy of the wavelet energy distribution of a time series and the entropy of the wavelet energy of white noise is the basis for the gradation. The metric quantifies the proportion of the deterministic component of a time series in terms of energy concentration, and its range varies from zero to one. One corresponds to highly predictable, due to high energy concentration, and zero represents a process similar to white noise, with a scattered energy distribution.
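The gradation idea can be sketched numerically. This is a hedged illustration of the concept only (the paper's exact definition of WEEM may differ): spread a series' energy across Haar wavelet scales, take the Shannon entropy of that distribution, and normalize against the near-uniform spread of white noise.

```python
import numpy as np

# Sketch under stated assumptions: per-coefficient Haar detail energy at
# each dyadic scale, Shannon entropy of the relative energy distribution,
# and a 0 -> 1 score against the maximum (white-noise-like) entropy.
def scale_energies(u):
    """Mean squared Haar detail coefficient at each dyadic scale."""
    u = np.asarray(u, float)
    energies = []
    while len(u) > 1:
        avg = (u[0::2] + u[1::2]) / np.sqrt(2.0)
        det = (u[0::2] - u[1::2]) / np.sqrt(2.0)
        energies.append(np.mean(det ** 2))  # per-coefficient energy at this scale
        u = avg
    return np.array(energies)

def weem_sketch(u):
    e = scale_energies(u)
    p = e / e.sum()                       # relative wavelet energy distribution
    h = -np.sum(p[p > 0] * np.log(p[p > 0]))
    h_white = np.log(len(e))              # entropy of a uniform spread over scales
    return 1.0 - h / h_white              # ~0 for white noise, -> 1 if concentrated

rng = np.random.default_rng(3)
periodic = np.sin(2 * np.pi * np.arange(1024) / 64)   # deterministic signal
noise = rng.normal(size=1024)                          # unpredictable process
print(weem_sketch(periodic), weem_sketch(noise))
```

The deterministic signal concentrates energy in one scale and scores high; white noise spreads it and scores near zero, matching the gradation described in the abstract.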
  • Alternative Tests for Time Series Dependence Based on Autocorrelation Coefficients
    Alternative Tests for Time Series Dependence Based on Autocorrelation Coefficients
    Richard M. Levich and Rosario C. Rizzo*
    Current Draft: December 1998

    Abstract: When autocorrelation is small, existing statistical techniques may not be powerful enough to reject the hypothesis that a series is free of autocorrelation. We propose two new and simple statistical tests (RHO and PHI) based on the unweighted sum of autocorrelation and partial autocorrelation coefficients. We analyze a set of simulated data to show the higher power of RHO and PHI in comparison to conventional tests for autocorrelation, especially in the presence of small but persistent autocorrelation. We show an application of our tests to data on currency futures to demonstrate their practical use. Finally, we indicate how our methodology could be used for a new class of time series models (the Generalized Autoregressive, or GAR models) that take into account the presence of small but persistent autocorrelation.

    An earlier version of this paper was presented at the Symposium on Global Integration and Competition, sponsored by the Center for Japan-U.S. Business and Economic Studies, Stern School of Business, New York University, March 27-28, 1997. We thank Paul Samuelson and the other participants at that conference for useful comments.
    * Stern School of Business, New York University and Research Department, Bank of Italy, respectively.

    1. Introduction
    Economic time series are often characterized by positive
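The RHO statistic described in the abstract can be sketched as follows. This is an assumed illustration, not the paper's exact construction or critical values: sum the first K sample autocorrelations, and standardize using the null approximation that each coefficient is roughly N(0, 1/n), so the sum is roughly N(0, K/n).

```python
import numpy as np

# Sketch of an unweighted-sum-of-autocorrelations test statistic.
def sample_acf(x, k):
    """Sample autocorrelation at lag k."""
    x = np.asarray(x, float) - np.mean(x)
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

def rho_stat(x, K=10):
    """Unweighted sum of the first K autocorrelations, plus a z-score
    under the no-autocorrelation null (sum ~ N(0, K/n))."""
    rho = sum(sample_acf(x, k) for k in range(1, K + 1))
    z = rho / np.sqrt(K / len(x))
    return rho, z

rng = np.random.default_rng(4)
ar = np.zeros(5000)                      # AR(1): small but persistent correlation
for t in range(1, 5000):
    ar[t] = 0.2 * ar[t - 1] + rng.normal()
white = rng.normal(size=5000)

print(rho_stat(ar))      # clearly non-zero standardized statistic
print(rho_stat(white))   # close to zero
```

Because the small positive autocorrelations accumulate across lags, the summed statistic detects the AR(1) dependence that any single-lag test would find marginal.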
  • Time Series: Co-Integration
    Time Series: Co-integration

    See also: Economic Forecasting; Time Series: General; Time Series: Nonstationary Distributions and Unit Roots; Time Series: Seasonal Adjustment

    N. H. Chan

    Bibliography
    Banerjee A, Dolado J J, Galbraith J W, Hendry D F 1993 Co-Integration, Error Correction, and the Econometric Analysis of Non-stationary Data. Oxford University Press, Oxford, UK
    Box G E P, Tiao G C 1977 A canonical analysis of multiple time series. Biometrika 64: 355–65
    Chan N H, Tsay R S 1996 On the use of canonical correlation analysis in testing common trends. In: Lee J C, Johnson W O, Zellner A (eds.) Modelling and Prediction: Honoring S. Geisser. Springer-Verlag, New York, pp. 364–77
    Chan N H, Wei C Z 1988 Limiting distributions of least squares estimates of unstable autoregressive processes. Annals of Statistics 16: 367–401
    Engle R F, Granger C W J 1987 Cointegration and error correction: Representation, estimation, and testing. Econometrica 55: 251–76
    Watson M W 1994 Vector autoregressions and co-integration. In: Engle R F, McFadden D L (eds.) Handbook of Econometrics Vol. IV. Elsevier, The Netherlands

    Time Series: Cycles

    Time series data in economics and other fields of social science often exhibit cyclical behavior. For example, aggregate retail sales are high in November and December and follow a seasonal cycle; voter registrations are high before each presidential election and follow an election cycle; and aggregate macroeconomic activity falls into recession every six to eight years and follows a business cycle. In spite of this cyclicality, these series are not perfectly predictable, and the cycles are not strictly periodic.
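The cointegration idea behind the Engle and Granger (1987) entry in the bibliography can be illustrated with simulated data. This is a minimal sketch of the first (regression) step only, under an assumed data-generating process: two series share a common random-walk trend, so each is nonstationary, yet the regression residuals revert to their mean.

```python
import numpy as np

# Two series sharing a common stochastic trend: each wanders like a
# random walk, but a linear combination of them is stationary.
rng = np.random.default_rng(5)
trend = np.cumsum(rng.normal(size=5000))        # common random-walk trend
x = trend + rng.normal(size=5000)
y = 2.0 * trend + rng.normal(size=5000)

# Engle-Granger first step: regress y on x and inspect the residuals.
beta = np.polyfit(x, y, 1)                      # beta[0] is the slope
resid = y - np.polyval(beta, x)

# The residuals' variance stays bounded, while the random walks'
# variance grows with time.
print(np.var(resid), np.var(y))
```

A full test would apply a unit-root test (e.g. Dickey-Fuller) to `resid`; the variance comparison above just shows why the residuals look stationary while the levels do not.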
  • Biostatistics for Oral Healthcare
    Biostatistics for Oral Healthcare
    Jay S. Kim, Ph.D., Loma Linda University School of Dentistry, Loma Linda, California 92350
    Ronald J. Dailey, Ph.D., Loma Linda University School of Dentistry, Loma Linda, California 92350

    Jay S. Kim, PhD, is Professor of Biostatistics at Loma Linda University, CA. A specialist in this area, he has been teaching biostatistics since 1997 to students in public health, medical school, and dental school. Currently his primary responsibility is teaching biostatistics courses to hygiene students, predoctoral dental students, and dental residents. He also collaborates with faculty members on a variety of research projects. Ronald J. Dailey is the Associate Dean for Academic Affairs at Loma Linda and an active member of the American Dental Education Association.

    © 2008 by Blackwell Munksgaard, a Blackwell Publishing Company. Editorial Offices: Blackwell Publishing Professional, 2121 State Avenue, Ames, Iowa 50014-8300, USA, Tel: +1 515 292 0140; 9600 Garsington Road, Oxford OX4 2DQ, Tel: 01865 776868; Blackwell Publishing Asia Pty Ltd, 550 Swanston Street, Carlton South, Victoria 3053, Australia, Tel: +61 (0)3 9347 0300; Blackwell Wissenschafts Verlag, Kurfürstendamm 57, 10707 Berlin, Germany, Tel: +49 (0)30 32 79 060.

    The right of the Author to be identified as the Author of this Work has been asserted in accordance with the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
  • Big Data for Reliability Engineering: Threat and Opportunity
    Reliability, February 2016
    Big Data for Reliability Engineering: Threat and Opportunity
    Vitali Volovoi, Independent Consultant, [email protected]

    Abstract - The confluence of several technologies promises stormy waters ahead for reliability engineering. News reports are full of buzzwords relevant to the future of the field—Big Data, the Internet of Things, predictive and prescriptive analytics—the sexier sisters of reliability engineering, both exciting and threatening. Can we reliability engineers join the party and suddenly become popular (and better paid), or are we at risk of being superseded and driven into obsolescence? This article argues that "big-picture" thinking, which is at the core of the concept of the System of Systems, is key for a bright future for reliability engineering.

    Keywords - System of Systems, complex systems, Big Data, Internet of Things, industrial internet, predictive analytics, prescriptive analytics

    more recently, analytics). It shares with the rest of the fields under this umbrella the need to abstract away most domain-specific information, and to use tools that are mainly domain-independent. As a result, it increasingly shares the lingua franca of modern systems engineering—probability and statistics that are required to balance the otherwise orderly and deterministic engineering world. And yet, reliability engineering does not wear the fancy clothes of its sisters. There is nothing privileged about it. It is rarely studied in engineering schools, and it is definitely not studied in business schools! Instead, it is perceived as a necessary evil (especially if the reliability issues in question are safety-related). The community of reliability engineers consists of engineers from other fields who were mainly trained on the job (instead of receiving formal degrees in the field).
  • Volatility Estimation of Stock Prices Using Garch Method
    European Journal of Business and Management, www.iiste.org, ISSN 2222-1905 (Paper), ISSN 2222-2839 (Online), Vol.7, No.19, 2015

    Volatility Estimation of Stock Prices using GARCH Method
    Koima, J.K., Mwita, P.N., Nassiuma, D.K. (Kabarak University)

    Abstract
    Economic decisions are modeled based on the perceived distribution of the random variables in the future, and on the assessment and measurement of the variance, which has a significant impact on the future profit or losses of a particular portfolio. The ability to accurately measure and predict stock market volatility has widespread implications. Volatility plays a very significant role in many financial decisions. The main purpose of this study is to examine the nature and the characteristics of stock market volatility of Kenyan stock markets and its stylized facts using GARCH models. A symmetric volatility model, namely the GARCH model, was used to estimate the volatility of stock returns. GARCH (1, 1) explains the volatility of Kenyan stock markets and its stylized facts, including volatility clustering, fat tails and mean reversion, satisfactorily. The results indicate evidence of time-varying stock return volatility over the sampled period. In conclusion, it follows that in a financial crisis, negative return shocks have higher volatility than positive return shocks.
    Keywords: GARCH, Stylized facts, Volatility clustering

    INTRODUCTION
    Volatility forecasting in financial markets is very significant, particularly in investment, financial risk management and monetary policy making (Poon and Granger, 2003). Because of the link between volatility and risk, volatility can form a basis for efficient price discovery. Volatility, implying predictability, is a very important phenomenon for traders and medium-term investors.
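The GARCH(1,1) recursion the study fits can be sketched by simulation. Parameter values below are illustrative assumptions, not the paper's Kenyan-market estimates: the conditional variance follows sigma2_t = omega + alpha·r_{t-1}² + beta·sigma2_{t-1}, which produces the volatility clustering and fat tails listed among the stylized facts.

```python
import numpy as np

# Minimal GARCH(1,1) simulation with illustrative parameters.
def simulate_garch(n, omega=0.05, alpha=0.1, beta=0.85, seed=6):
    rng = np.random.default_rng(seed)
    sigma2 = omega / (1.0 - alpha - beta)     # start at unconditional variance
    r = np.empty(n)
    for t in range(n):
        r[t] = np.sqrt(sigma2) * rng.normal()
        sigma2 = omega + alpha * r[t] ** 2 + beta * sigma2
    return r

def acf1(x):
    """Lag-1 sample autocorrelation."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

r = simulate_garch(20_000)
# Volatility clustering: squared returns are positively autocorrelated
# even though the returns themselves are (approximately) uncorrelated.
print(acf1(r), acf1(r ** 2))
```

Estimation on real returns would maximize the Gaussian likelihood over (omega, alpha, beta); the simulation only demonstrates why the model reproduces the stylized facts.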
  • Generating Time Series with Diverse and Controllable Characteristics
    GRATIS: GeneRAting TIme Series with diverse and controllable characteristics
    Yanfei Kang, Rob J Hyndman, and Feng Li

    Abstract
    The explosion of time series data in recent years has brought a flourish of new time series analysis methods, for forecasting, clustering, classification and other tasks. The evaluation of these new methods requires either collecting or simulating a diverse set of time series benchmarking data to enable reliable comparisons against alternative approaches. We propose GeneRAting TIme Series with diverse and controllable characteristics, named GRATIS, with the use of mixture autoregressive (MAR) models. We simulate sets of time series using MAR models and investigate the diversity and coverage of the generated time series in a time series feature space. By tuning the parameters of the MAR models, GRATIS is also able to efficiently generate new time series with controllable features. In general, as a costless surrogate to the traditional data collection approach, GRATIS can be used as an evaluation tool for tasks such as time series forecasting and classification. We illustrate the usefulness of our time series generation process through a time series forecasting application.
    Keywords: Time series features; Time series generation; Mixture autoregressive models; Time series forecasting; Simulation.

    1 Introduction
    With the widespread collection of time series data via scanners, monitors and other automated data collection devices, there has been an explosion of time series analysis methods developed in the past decade or two. Paradoxically, the large datasets are often also relatively homogeneous in the industry domain, which limits their use for evaluation of general time series analysis methods (Keogh and Kasetty, 2003; Muñoz et al., 2018; Kang et al., 2017).
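Simulating from the MAR building block can be sketched as follows. This is an assumed minimal illustration (a two-component AR(1) mixture with made-up parameters, not GRATIS itself): at each step one component is drawn at random, so the mixture can produce features, such as regime changes and heavy tails, that a single AR model cannot.

```python
import numpy as np

# Simulate a mixture autoregressive (MAR) process: at each time step a
# component k is drawn with probability weights[k], and the next value
# follows that component's AR(1) dynamics.
def simulate_mar(n, weights, phis, sigmas, seed=7):
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        k = rng.choice(len(weights), p=weights)      # pick a mixture component
        x[t] = phis[k] * x[t - 1] + sigmas[k] * rng.normal()
    return x

# Two components: persistent low-noise dynamics vs. oscillatory high-noise.
x = simulate_mar(2000, weights=[0.7, 0.3], phis=[0.9, -0.5], sigmas=[1.0, 3.0])
print(x.mean(), x.std())
```

Tuning `weights`, `phis` and `sigmas` moves the generated series around a feature space, which is the controllability GRATIS exploits.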
  • A Python Library for Neuroimaging Based Machine Learning
    Brain Predictability toolbox: a Python library for neuroimaging based machine learning
    Hahn, S. (1), Yuan, D.K. (1), Thompson, W.K. (2), Owens, M. (1), Allgaier, N. (1) and Garavan, H. (1)
    1. Departments of Psychiatry and Complex Systems, University of Vermont, Burlington, VT 05401, USA
    2. Division of Biostatistics, Department of Family Medicine and Public Health, University of California, San Diego, La Jolla, CA 92093, USA

    Abstract

    Summary: Brain Predictability toolbox (BPt) represents a unified framework of machine learning (ML) tools designed to work with both tabulated data (in particular brain, psychiatric, behavioral, and physiological variables) and neuroimaging-specific derived data (e.g., brain volumes and surfaces). This package is suitable for investigating a wide range of different neuroimaging-based ML questions, in particular those queried from large human datasets.

    Availability and Implementation: BPt has been developed as an open-source Python 3.6+ package hosted at https://github.com/sahahn/BPt under the MIT License, with documentation provided at https://bpt.readthedocs.io/en/latest/, and continues to be actively developed. The project can be downloaded through the GitHub link provided. A web GUI interface based on the same code is currently under development and can be set up through docker with instructions at https://github.com/sahahn/BPt_app.

    Contact: Please contact Sage Hahn at [email protected]

    Main Text

    1 Introduction
    Large datasets in all domains are becoming increasingly prevalent as data from smaller existing studies are pooled and larger studies are funded. This increase in available data offers an unprecedented opportunity for researchers interested in applying machine learning (ML) based methodologies, especially those working in domains such as neuroimaging where data collection is quite expensive.
  • BIOSTATS Documentation
    BIOSTATS Documentation

    BIOSTATS is a collection of R functions written to aid in the statistical analysis of ecological data sets using both univariate and multivariate procedures. All of these functions make use of existing R functions, and many are simply convenience wrappers for functions contained in other popular libraries, such as VEGAN and LABDSV for multivariate statistics; others are custom functions written using functions in the base or stats libraries.

    Author: Kevin McGarigal, Professor, Department of Environmental Conservation, University of Massachusetts, Amherst
    Date Last Updated: 9 November 2016

    Functions: all.subsets.gam, all.subsets.glm, box.plots, by.names, ci.lines, class.monte, clus.composite, clus.stats, cohen.kappa, col.summary, commission.mc, concordance, contrast.matrix, cov.test, data.dist, data.stand, data.trans, dist.plots, distributions, drop.var, edf.plots, ecdf.plots, foa.plots, hclus.cophenetic, hclus.scree, hclus.table, hist.plots, intrasetcor, kappa.mc, lda.structure, mantel2, mantel.part, mrpp2, mv.outliers, nhclus.scree, nmds.monte, nmds.scree, norm.test, ordi.monte, ordi.overlay, ordi.part, ordi.scree, pca.communality, pca.eigenval, pca.eigenvec, pca.structure, plot.anosim, plot.mantel, plot.mrpp, plot.ordi.part, qqnorm.plots, ran.split, ...
  • Public Health & Intelligence
    Public Health & Intelligence
    PHI Trend Analysis Guidance

    Document Control
    Version: 1.5
    Date Issued: March 2017
    Author: Róisín Farrell, David Walker / HI Team
    Comments to: [email protected]

    Version history:
    - 1.0, July 2016: 1st version of paper (David Walker, Róisín Farrell)
    - 1.1, August 2016: Amending format; adding sections (Róisín Farrell)
    - 1.2, November 2016: Adding content to sections 1 and 2 (Róisín Farrell, Lee Barnsdale)
    - 1.3, November 2016: Amending as per suggestions from SAG (Róisín Farrell)
    - 1.4, December 2016: Structural changes (Róisín Farrell, Lee Barnsdale)
    - 1.5, March 2017: Amending as per suggestions from SAG (Róisín Farrell, Lee Barnsdale)

    Contents
    1. Introduction
    2. Understanding trend data for descriptive analysis
    3. Descriptive analysis of time trends
    3.1 Smoothing
    3.1.1 Simple smoothing
    3.1.2 Median smoothing
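The two smoothers named in the contents (simple smoothing and median smoothing) can be sketched in a few lines. The guidance's own examples are not shown here, so this is a generic illustration with made-up count data: a moving average and a moving median, the latter being more robust to a single outlying value in a trend series.

```python
import numpy as np

# Simple smoothing: centered moving average over a fixed window.
def moving_average(x, window=3):
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# Median smoothing: moving median over the same window.
def median_smooth(x, window=3):
    x = np.asarray(x, float)
    return np.array([np.median(x[i:i + window])
                     for i in range(len(x) - window + 1)])

counts = np.array([10.0, 12, 11, 95, 13, 12, 14])   # one outlying value
print(moving_average(counts))   # the outlier leaks into neighbouring means
print(median_smooth(counts))    # the median smoother suppresses it
```

This robustness to isolated spikes is the usual reason guidance documents recommend median smoothing alongside simple smoothing for descriptive trend analysis.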