Daniel Zelterman Applied Multivariate Statistics with R Statistics for Biology and Health

Total Page:16

File Type:pdf, Size:1020Kb

Daniel Zelterman Applied Multivariate Statistics with R Statistics for Biology and Health Statistics for Biology and Health Daniel Zelterman Applied Multivariate Statistics with R Statistics for Biology and Health Series Editors Mitchell Gail Jonathan M. Samet Anastasios Tsiatis Wing Wong More information about this series at http://www.springer.com/series/2848 Daniel Zelterman Applied Multivariate Statistics with R 123 Daniel Zelterman School of Public Health Yale University New Haven, CT, USA ISSN 1431-8776 ISSN 2197-5671 (electronic) Statistics for Biology and Health ISBN 978-3-319-14092-6 ISBN 978-3-319-14093-3 (eBook) DOI 10.1007/978-3-319-14093-3 Library of Congress Control Number: 2015942244 Springer Cham Heidelberg New York Dordrecht London © Springer International Publishing Switzerland 2015 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publica- tion does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. Printed on acid-free paper Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www. springer.com) Barry H Margolin 1943–2009 Teacher, mentor, and friend Permissions R is Copyright c 2014 The R Foundation for Statistical Computing. R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing Vienna, Austria. URL: http://www. R-project.org/. We are grateful for the use of data sets, used with permission as follows: Public domain data in Table 1.4 was obtained from the U.S. Cancer Statistics Working Group. United States Cancer Statistics: 1999–2007 Inci- dence and Mortality Web-based Report. Atlanta: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention and Na- tional Cancer Institute; 2010. Available online at www.cdc.gov/uscs. The data of Table 1.5 is used with permission of S&P Dow Jones. The data of Table 1.6 was collected from www.fatsecret.com and used with permission. The idea for this table was suggested by the article originally appearing in The Daily Beast, available online at http://www.thedailybeast.com/galleries/2010/06/17/ 40-unhealthiest-burgers.html. The data in Table 7.1 is used with permission of S&P Dow Jones Indices. The data in Table 7.4 was gathered from www.fatsectret.com and used with permission. The idea for this table was suggested by the article originally appearing in The Daily Beast, available online at http://www.thedailybeast.com/galleries/2010/10/18/ halloween-candy.html. The data in Table 7.5 was obtained from https://health.data.ny.gov/. The data in Table 8.2 is used with permission of Dr. Deepak Nayaran, Yale University School of Medicine. The data in Table 8.3 was reported by the National Vital Statistics Sys- tem (NVSS): http://www.cdc.gov/nchs/deaths.htm. The data in Table 9.3 was generated as part of theMedicareCurrentBen- eficiary Survey: http://www.cms.gov/mcbs/ andavailableonthecdc.gov website. viii The data in Table 9.4 was collected by the United States Department of Labor, Mine Safety and Health Administration. This data is available online at http://www.msha.gov/stats/centurystats/coalstats.asp. The data of Table 9.6 is used with permission of Street Authority and David Sterman. The data in Table 11.1 is adapted from Statistics Canada (2009) “Health Care Professionals and Official-Language Minorities in Canada 2001 and 2006,” Catalog no. 91-550-X, Text Table 1.1, appearing on page 13. The data source in Table 11.5 is copyright 2010, Morningstar, Inc., Morn- ingstar Bond Market Commentary September, 2010. All Rights Reserved. Used with permission. The data in Table 11.6 iscitedfromUSDepartment of Health and Human Services, Administration on Aging. The data is available online at http://www.aoa.gov/AoARoot/Aging\_Statistics/index.aspx. The website http://www.gastonsanchez.com provided some of the code appearing in Output 11.1. Global climate data appearing in Table 12.2 is used with permission: Climate Charts & Graphs, courtesy of Kelly O’Day. Tables 13.1 and 13.6 are reprinted from Stigler (1994) and used with permission of the Institute of Mathematical Statistics. In Addition Several data sets referenced and used as examples were obtained from the UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science (Bache and Lichman 2013). Grant Support The author acknowledges support from grants from the National Institute of Mental Health, National Cancer Institute, and the National Institute of Environmental Health Sciences. The content is solely the responsibility of the author and does not necessarily represent the official views of the National Institutes of Health. Preface ULTIVARIATE STATISTICS is a mature field with many different M methods. Many of these are mathematical. Fortunately, these meth- ods have been programmed so you should be able to run these on your com- puter without much difficulty. This book is targeted to a graduate-level practitioner who may need to use these methods but does not necessarily know about the mathematical derivations. For example, we use the sample average of the multivariate distribution to estimate the population mean but do not need to prove op- timal properties of such an estimator when sampled from a normal parent population. Readers may want to analyze their data, motivated by discipline- specific questions. They will discover ways to get at some important results without a degree in statistics. Similarly, those well trained in statistics will likely be familiar with many of the univariate topics covered here, but now can learn about new methods. The reader should have taken at least one course in statistics previously and have some familiarity with such topics at t-test, degrees of freedom (df), p-values, statistical significance, and the chi-squared test of independence in a2× 2 table. He or she should also know the basic rules of probability such as independence and conditional probability. The reader should have some basic computing skills including data editing. It is not necessary to have experience with R or with programming languages although these are good skills to develop. We will assume that the reader has a rudimentary acquaintance of the univariate normal distribution. We begin a discussion of multivariate models with an introduction of the bivariate normal distribution in Chap. 6.These are used to make the leap from the scalar notation to the use of vectors and matrices used in Chap. 7 on the multivariate normal distribution. A brief review of linear algebra appears in Chap. 4, including the correspond- ing computations in R. Other multivariate distributions include models for extremes, described in Sect. 13.3. We frequently include the necessary software to run the programs in R because we need to be able to perform these methods with real data. In some cases we need to manipulate the data in order to get it to fit into the proper format. Readers may want to produce some of the graphical displays ix Preface x given in Chap. 3 for their own data. For these readers, the full programs to produce the figures are listed in that chapter. The field of statistics has developed many useful methods for analyzing data, and many of these methods are already programmed for you and readily available in R.What’smore,R is free, widely available, open source, flexible, and the current fashion in statistical computing. Authors of new statistical methods are regularly contributing to the many libraries in R so many new results are included as well. As befitting the Springer series in Life Sciences, Medicine & Health, a large portion of the examples given here are health related or biologically oriented. There are also a large number of examples from other disciplines. There are several reasons for this, including the abundance of good exam- ples that are available. Examples from other disciplines do a good job of illustrating the method without a great deal of background knowledge of the data. For example, Chap. 9 on multivariable linear regression methods begins with an example of data for different car models. Because the measurements on the cars are readily understood by the reader with little or no additional explanation, we can concentrate on the statistical methods rather than spend- ing time on the example details. In contrast, the second example, presented in Sect. 9.3, is about a large health survey and requires a longer introduction for the reader to appreciate the data. But at that point in Chap. 9,weare already familiar with the statistical tools and can address issues raised by the survey data. New Haven, CT, USA Daniel Zelterman Acknowledgments Special thanks are due to Michael Kane and Forrest Crawford, who together taught me enough R to fill a book, and Rob Muirhead, who taught multi- variate statistics out of TW Anderson’s text.
Recommended publications
  • Arxiv:1910.08883V3 [Stat.ML] 2 Apr 2021 in Real Data [8,9]
    Nonpar MANOVA via Independence Testing Sambit Panda1;2, Cencheng Shen3, Ronan Perry1, Jelle Zorn4, Antoine Lutz4, Carey E. Priebe5 and Joshua T. Vogelstein1;2;6∗ Abstract. The k-sample testing problem tests whether or not k groups of data points are sampled from the same distri- bution. Multivariate analysis of variance (Manova) is currently the gold standard for k-sample testing but makes strong, often inappropriate, parametric assumptions. Moreover, independence testing and k-sample testing are tightly related, and there are many nonparametric multivariate independence tests with strong theoretical and em- pirical properties, including distance correlation (Dcorr) and Hilbert-Schmidt-Independence-Criterion (Hsic). We prove that universally consistent independence tests achieve universally consistent k-sample testing, and that k- sample statistics like Energy and Maximum Mean Discrepancy (MMD) are exactly equivalent to Dcorr. Empirically evaluating these tests for k-sample-scenarios demonstrates that these nonparametric independence tests typically outperform Manova, even for Gaussian distributed settings. Finally, we extend these non-parametric k-sample- testing procedures to perform multiway and multilevel tests. Thus, we illustrate the existence of many theoretically motivated and empirically performant k-sample-tests. A Python package with all independence and k-sample tests called hyppo is available from https://hyppo.neurodata.io/. 1 Introduction A fundamental problem in statistics is the k-sample testing problem. Consider the p p two-sample problem: we obtain two datasets ui 2 R for i = 1; : : : ; n and vj 2 R for j = 1; : : : ; m. Assume each ui is sampled independently and identically (i.i.d.) from FU and that each vj is sampled i.i.d.
    [Show full text]
  • Generalized Principal Component Analysis
    Generalized Principal Component Analysis Karo Solat Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Economics Aris Spanos, Chair Esfandiar Maasoumi Kwok Ping Tsang Richard A. Ashley Eric A. Bahel April 30, 2018 Blacksburg, Virginia Keywords: Factor Model, PCA, Elliptically Contoured Distributions, Exchange Rate, Forecasting, Monte Carlo Simulation, Statistical Adequacy Copyright 2018, Karo Solat Generalized Principal Component Analysis Karo Solat (ABSTRACT) The primary objective of this dissertation is to extend the classical Principal Components Analysis (PCA), aiming to reduce the dimensionality of a large number of Normal interre- lated variables, in two directions. The first is to go beyond the static (contemporaneous or synchronous) covariance matrix among these interrelated variables to include certain forms of temporal (over time) dependence. The second direction takes the form of extending the PCA model beyond the Normal multivariate distribution to the Elliptically Symmetric fam- ily of distributions, which includes the Normal, the Student's t, the Laplace and the Pearson type II distributions as special cases. The result of these extensions is called the Generalized principal component analysis (GPCA). The GPCA is illustrated using both Monte Carlo simulations as well as an empirical study, in an attempt to demonstrate the enhanced reliability of these more general factor models in the context of out-of-sample forecasting. The empirical study examines the predictive capacity of the GPCA method in the context of Exchange Rate Forecasting, showing how the GPCA method dominates forecasts based on existing standard methods, including the random walk models, with or without including macroeconomic fundamentals.
    [Show full text]
  • On the Meaning and Use of Kurtosis
    Psychological Methods Copyright 1997 by the American Psychological Association, Inc. 1997, Vol. 2, No. 3,292-307 1082-989X/97/$3.00 On the Meaning and Use of Kurtosis Lawrence T. DeCarlo Fordham University For symmetric unimodal distributions, positive kurtosis indicates heavy tails and peakedness relative to the normal distribution, whereas negative kurtosis indicates light tails and flatness. Many textbooks, however, describe or illustrate kurtosis incompletely or incorrectly. In this article, kurtosis is illustrated with well-known distributions, and aspects of its interpretation and misinterpretation are discussed. The role of kurtosis in testing univariate and multivariate normality; as a measure of departures from normality; in issues of robustness, outliers, and bimodality; in generalized tests and estimators, as well as limitations of and alternatives to the kurtosis measure [32, are discussed. It is typically noted in introductory statistics standard deviation. The normal distribution has a kur- courses that distributions can be characterized in tosis of 3, and 132 - 3 is often used so that the refer- terms of central tendency, variability, and shape. With ence normal distribution has a kurtosis of zero (132 - respect to shape, virtually every textbook defines and 3 is sometimes denoted as Y2)- A sample counterpart illustrates skewness. On the other hand, another as- to 132 can be obtained by replacing the population pect of shape, which is kurtosis, is either not discussed moments with the sample moments, which gives or, worse yet, is often described or illustrated incor- rectly. Kurtosis is also frequently not reported in re- ~(X i -- S)4/n search articles, in spite of the fact that virtually every b2 (•(X i - ~')2/n)2' statistical package provides a measure of kurtosis.
    [Show full text]
  • 13Th International Workshop on Matrices and Statistics Program And
    13th International Workshop on Matrices and Statistics B¸edlewo, Poland August 18–21, 2004 Program and abstracts Organizers • Stefan Banach International Mathematical Center, Warsaw • Committee of Mathematics of the Polish Academy of Sciences, Warsaw • Faculty of Mathematics and Computer Science, Adam Mickiewicz Uni- versity, Pozna´n • Institute of Socio-Economic Geography and Spatial Management, Faculty of Geography and Geology, Adam Mickiewicz University, Pozna´n • Department of Mathematical and Statistical Methods, Agricultural Uni- versity, Pozna´n This meeting has been endorsed by the International Linear Algebra Society. II Sponsors of the 13th International Workshop on Matrices and Statistics III Mathematical Research and Conference Center in B¸edlewo IV edited by A. Markiewicz Department of Mathematical and Statistical Methods, Agricultural University, Pozna´n,Poland and W. Wo ly´nski Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Pozna´n,Poland Contents Part I. Introduction Poetical Licence The New Intellectual Aristocracy ............ 3 Richard William Farebrother Part II. Local Information Part III. Program Part IV. Ingram Olkin Ingram Olkin, Statistical Statesman .......................... 21 Yadolah Dodge Why is matrix analysis part of the statistics curriculum ...... 25 Ingram Olkin A brief biography and appreciation of Ingram Olkin .......... 31 A conversation with Ingram Olkin ............................ 34 Bibliography .................................................. 58 Part V. Abstracts Asymptotic
    [Show full text]
  • Univariate and Multivariate Skewness and Kurtosis 1
    Running head: UNIVARIATE AND MULTIVARIATE SKEWNESS AND KURTOSIS 1 Univariate and Multivariate Skewness and Kurtosis for Measuring Nonnormality: Prevalence, Influence and Estimation Meghan K. Cain, Zhiyong Zhang, and Ke-Hai Yuan University of Notre Dame Author Note This research is supported by a grant from the U.S. Department of Education (R305D140037). However, the contents of the paper do not necessarily represent the policy of the Department of Education, and you should not assume endorsement by the Federal Government. Correspondence concerning this article can be addressed to Meghan Cain ([email protected]), Ke-Hai Yuan ([email protected]), or Zhiyong Zhang ([email protected]), Department of Psychology, University of Notre Dame, 118 Haggar Hall, Notre Dame, IN 46556. UNIVARIATE AND MULTIVARIATE SKEWNESS AND KURTOSIS 2 Abstract Nonnormality of univariate data has been extensively examined previously (Blanca et al., 2013; Micceri, 1989). However, less is known of the potential nonnormality of multivariate data although multivariate analysis is commonly used in psychological and educational research. Using univariate and multivariate skewness and kurtosis as measures of nonnormality, this study examined 1,567 univariate distriubtions and 254 multivariate distributions collected from authors of articles published in Psychological Science and the American Education Research Journal. We found that 74% of univariate distributions and 68% multivariate distributions deviated from normal distributions. In a simulation study using typical values of skewness and kurtosis that we collected, we found that the resulting type I error rates were 17% in a t-test and 30% in a factor analysis under some conditions. Hence, we argue that it is time to routinely report skewness and kurtosis along with other summary statistics such as means and variances.
    [Show full text]
  • Multivariate Chemometrics As a Strategy to Predict the Allergenic Nature of Food Proteins
    S S symmetry Article Multivariate Chemometrics as a Strategy to Predict the Allergenic Nature of Food Proteins Miroslava Nedyalkova 1 and Vasil Simeonov 2,* 1 Department of Inorganic Chemistry, Faculty of Chemistry and Pharmacy, University of Sofia, 1 James Bourchier Blvd., 1164 Sofia, Bulgaria; [email protected]fia.bg 2 Department of Analytical Chemistry, Faculty of Chemistry and Pharmacy, University of Sofia, 1 James Bourchier Blvd., 1164 Sofia, Bulgaria * Correspondence: [email protected]fia.bg Received: 3 September 2020; Accepted: 21 September 2020; Published: 29 September 2020 Abstract: The purpose of the present study is to develop a simple method for the classification of food proteins with respect to their allerginicity. The methods applied to solve the problem are well-known multivariate statistical approaches (hierarchical and non-hierarchical cluster analysis, two-way clustering, principal components and factor analysis) being a substantial part of modern exploratory data analysis (chemometrics). The methods were applied to a data set consisting of 18 food proteins (allergenic and non-allergenic). The results obtained convincingly showed that a successful separation of the two types of food proteins could be easily achieved with the selection of simple and accessible physicochemical and structural descriptors. The results from the present study could be of significant importance for distinguishing allergenic from non-allergenic food proteins without engaging complicated software methods and resources. The present study corresponds entirely to the concept of the journal and of the Special issue for searching of advanced chemometric strategies in solving structural problems of biomolecules. Keywords: food proteins; allergenicity; multivariate statistics; structural and physicochemical descriptors; classification 1.
    [Show full text]
  • If I Were Doing a Quick ANOVA Analysis
    ANOVA Note: If I were doing a quick ANOVA analysis (without the diagnostic checks, etc.), I’d do the following: 1) load the packages (#1); 2) do the prep work (#2); and 3) run the ggstatsplot::ggbetweenstats analysis (in the #6 section). 1. Packages needed. Here are the recommended packages to load prior to working. library(ggplot2) # for graphing library(ggstatsplot) # for graphing and statistical analyses (one-stop shop) library(GGally) # This package offers the ggpairs() function. library(moments) # This package allows for skewness and kurtosis functions library(Rmisc) # Package for calculating stats and bar graphs of means library(ggpubr) # normality related commands 2. Prep Work Declare factor variables as such. class(mydata$catvar) # this tells you how R currently sees the variable (e.g., double, factor) mydata$catvar <- factor(mydata$catvar) #Will declare the specified variable as a factor variable 3. Checking data for violations of assumptions: a) relatively equal group sizes; b) equal variances; and c) normal distribution. a. Group Sizes Group counts (to check group frequencies): table(mydata$catvar) b. Checking Equal Variances Group means and standard deviations (Note: the aggregate and by commands give you the same results): aggregate(mydata$intvar, by = list(mydata$catvar), FUN = mean, na.rm = TRUE) aggregate(mydata$intvar, by = list(mydata$catvar), FUN = sd, na.rm = TRUE) by(mydata$intvar, mydata$catvar, mean, na.rm = TRUE) by(mydata$intvar, mydata$catvar, sd, na.rm = TRUE) A simple bar graph of group means and CIs (using Rmisc package). This command is repeated further below in the graphing section. The ggplot command will vary depending on the number of categories in the grouping variable.
    [Show full text]
  • Healthy Volunteers (Retrospective Study) Urolithiasis Healthy Volunteers Group P-Value 5 (N = 110) (N = 157)
    Supplementary Table 1. The result of Gal3C-S-OPN between urolithiasis and healthy volunteers (retrospective study) Urolithiasis Healthy volunteers Group p-Value 5 (n = 110) (n = 157) median (IQR 1) median (IQR 1) Gal3C-S-OPN 2 515 1118 (810–1543) <0.001 (MFI 3) (292–786) uFL-OPN 4 14 56392 <0.001 (ng/mL/mg protein) (10–151) (30270-115516) Gal3C-S-OPN 2 52 0.007 /uFL-OPN 4 <0.001 (5.2–113.0) (0.003–0.020) (MFI 3/uFL-OPN 4) 1 IQR, Interquartile range; 2 Gal3C-S-OPN, Gal3C-S lectin reactive osteopontin; 3 MFI, mean fluorescence intensity; 4 uFL-OPN, Urinary full-length-osteopontin; 5 p-Value, Mann–Whitney U-test. Supplementary Table 2. Sex-related difference between Gal3C-S-OPN and Gal3C-S-OPN normalized to uFL- OPN (retrospective study) Group Urolithiasis Healthy volunteers p-Value 5 Male a Female b Male c Female d a vs. b c vs. d (n = 61) (n = 49) (n = 57) (n = 100) median (IQR 1) median (IQR 1) Gal3C-S-OPN 2 1216 972 518 516 0.15 0.28 (MFI 3) (888-1581) (604-1529) (301-854) (278-781) Gal3C-S-OPN 2 67 42 0.012 0.006 /uFL-OPN 4 0.62 0.56 (9-120) (4-103) (0.003-0.042) (0.002-0.014) (MFI 3/uFL-OPN 4) 1 IQR, Interquartile range; 2 Gal3C-S-OPN, Gal3C-S lectin reactive osteopontin; 3MFI, mean fluorescence intensity; 4 uFL-OPN, Urinary full-length-osteopontin; 5 p-Value, Mann–Whitney U-test.
    [Show full text]
  • Violin Plots: a Box Plot-Density Trace Synergism Author(S): Jerry L
    Violin Plots: A Box Plot-Density Trace Synergism Author(s): Jerry L. Hintze and Ray D. Nelson Source: The American Statistician, Vol. 52, No. 2 (May, 1998), pp. 181-184 Published by: American Statistical Association Stable URL: http://www.jstor.org/stable/2685478 Accessed: 02/09/2010 11:01 Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=astata. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to The American Statistician. http://www.jstor.org StatisticalComputing and Graphics ViolinPlots: A Box Plot-DensityTrace Synergism JerryL.
    [Show full text]
  • Statistical Analysis in JASP
    Copyright © 2018 by Mark A Goss-Sampson. All rights reserved. This book or any portion thereof may not be reproduced or used in any manner whatsoever without the express written permission of the author except for the purposes of research, education or private study. CONTENTS PREFACE .................................................................................................................................................. 1 USING THE JASP INTERFACE .................................................................................................................... 2 DESCRIPTIVE STATISTICS ......................................................................................................................... 8 EXPLORING DATA INTEGRITY ................................................................................................................ 15 ONE SAMPLE T-TEST ............................................................................................................................. 22 BINOMIAL TEST ..................................................................................................................................... 25 MULTINOMIAL TEST .............................................................................................................................. 28 CHI-SQUARE ‘GOODNESS-OF-FIT’ TEST............................................................................................. 30 MULTINOMIAL AND Χ2 ‘GOODNESS-OF-FIT’ TEST. ..........................................................................
    [Show full text]
  • Recent Outlier Detection Methods with Illustrations Loss Reserving Context
    Recent outlier detection methods with illustrations in loss reserving Benjamin Avanzi, Mark Lavender, Greg Taylor, Bernard Wong School of Risk and Actuarial Studies, UNSW Sydney Insights, 18 September 2017 Recent outlier detection methods with illustrations loss reserving Context Context Reserving Robustness and Outliers Robust Statistical Techniques Robustness criteria Heuristic Tools Robust M-estimation Outlier Detection Techniques Robust Reserving Overview Illustration - Robust Bivariate Chain Ladder Robust N-Dimensional Chain-Ladder Summary and Conclusions References 1/46 Recent outlier detection methods with illustrations loss reserving Context Reserving Context Reserving Robustness and Outliers Robust Statistical Techniques Robustness criteria Heuristic Tools Robust M-estimation Outlier Detection Techniques Robust Reserving Overview Illustration - Robust Bivariate Chain Ladder Robust N-Dimensional Chain-Ladder Summary and Conclusions References 1/46 Recent outlier detection methods with illustrations loss reserving Context Reserving The Reserving Problem i/j 1 2 ··· j ··· I 1 X1;1 X1;2 ··· X1;j ··· X1;J 2 X2;1 X2;2 ··· X2;j ··· . i Xi;1 Xi;2 ··· Xi;j . I XI ;1 Figure: Aggregate claims run-off triangle I Complete the square (or rectangle) I Also - multivariate extensions. 1/46 Recent outlier detection methods with illustrations loss reserving Context Reserving Common Reserving Techniques I Deterministic Chain-Ladder I Stochastic Chain-Ladder (Hachmeister and Stanard, 1975; England and Verrall, 2002) I Mack’s Model (Mack, 1993) I GLMs
    [Show full text]
  • Download from the Resource and Environment Data Cloud Platform (
    sustainability Article Quantitative Assessment for the Dynamics of the Main Ecosystem Services and their Interactions in the Northwestern Arid Area, China Tingting Kang 1,2, Shuai Yang 1,2, Jingyi Bu 1,2 , Jiahao Chen 1,2 and Yanchun Gao 1,* 1 Key Laboratory of Water Cycle and Related Land Surface Processes, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, P.O. Box 9719, Beijing 100101, China; [email protected] (T.K.); [email protected] (S.Y.); [email protected] (J.B.); [email protected] (J.C.) 2 University of the Chinese Academy of Sciences, Beijing 100049, China * Correspondence: [email protected] Received: 21 November 2019; Accepted: 13 January 2020; Published: 21 January 2020 Abstract: Evaluating changes in the spatial–temporal patterns of ecosystem services (ESs) and their interrelationships provide an opportunity to understand the links among them, as well as to inform ecosystem-service-based decision making and governance. To research the development trajectory of ecosystem services over time and space in the northwestern arid area of China, three main ecosystem services (carbon sequestration, soil retention, and sand fixation) and their relationships were analyzed using Pearson’s coefficient and a time-lagged cross-correlation analysis based on the mountain–oasis–desert zonal scale. The results of this study were as follows: (1) The carbon sequestration of most subregions improved, and that of the oasis continuously increased from 1990 to 2015. Sand fixation decreased in the whole region except for in the Alxa Plateau–Hexi Corridor area, which experienced an “increase–decrease” trend before 2000 and after 2000; (2) the synergies and trade-off relationships of the ES pairs varied with time and space, but carbon sequestration and soil retention in the most arid area had a synergistic relationship, and most oases retained their synergies between carbon sequestration and sand fixation over time.
    [Show full text]