Applied and Computational Statistics • Sorana D

Total Page:16

File Type:pdf, Size:1020Kb

Applied and Computational Statistics • Sorana D Applied and Computational Statistics Computational and Applied • Sorana D. Bolboacă D. Sorana • Applied and Computational Statistics Edited by Sorana D. Bolboacă Printed Edition of the Special Issue Published in Mathematics www.mdpi.com/journal/mathematics Applied and Computational Statistics Applied and Computational Statistics Special Issue Editor Sorana D. Bolboac˘a MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade Special Issue Editor Sorana D. Bolboaca˘ Iuliu Hat¸ieganu University of Medicine and Pharmacy Romania Editorial Office MDPI St. Alban-Anlage 66 4052 Basel, Switzerland This is a reprint of articles from the Special Issue published online in the open access journal Mathematics (ISSN 2227-7390) from 2018 to 2019 (available at: https://www.mdpi.com/journal/ mathematics/special issues/applied computational statistics). For citation purposes, cite each article independently as indicated on the article page online and as indicated below: LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. Journal Name Year, Article Number, Page Range. ISBN 978-3-03928-176-3 (Pbk) ISBN 978-3-03928-177-0 (PDF) c 2020 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications. The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND. Contents About the Special Issue Editor ...................................... vii Preface to ”Applied and Computational Statistics” .......................... ix Ibrahim Elbatal, Farrukh Jamal, Christophe Chesneau, Mohammed Elgarhy and Sharifah Alrajhi The Modified Beta Gompertz Distribution: Theory and Applications Reprinted from: Mathematics 2019, 7, 3, doi:10.3390/math7010003 .................. 1 Lorentz J¨antschi and Sorana D. Bolboac˘a Computation of Probability Associated with Anderson–Darling Statistic Reprinted from: Mathematics 2018, 6, 88, doi:10.3390/math6060088 .................. 18 Miltiadis S. Chalikias Optimal Repeated Measurements for Two Treatment Designs with Dependent Observations: The Case of Compound Symmetry Reprinted from: Mathematics 2019, 7, 378, doi:10.3390/math7040378 ................. 35 Lili Tan, Yunzhan Gong and Yawen Wang A Model for Predicting Statement Mutation Scores Reprinted from: Mathematics 2019, 7, 778, doi:10.3390/math7090778 ................. 41 Dan-Marian Joit¸a and Lorentz J¨antschi Extending the Characteristic Polynomial for Characterization of C20 Fullerene Congeners Reprinted from: Mathematics 2017, 5, 84, doi:10.3390/math5040084 .................. 80 v About the Special Issue Editor Sorana D. Bolboac˘a is a professor of medical informatics and biostatistics at the ”Iuliu Hat, ieganu” University of Medicine and Pharmacy Cluj-Napoca, Romania. She earned her Ph.D. in Medicine (2006) from the Iuliu Hat¸ieganu University of Medicine and Pharmacy (title of the thesis Evidence-Based Medicine: Logistics and Implementation) and a Ph.D. in Horticulture (2010) from the University of Agriculture Sciences and Veterinary Medicine Cluj-Napoca (title of the thesis Statistical Models for Analysis of Genetic Variability). Her research interests are multidisciplinary, and include applied and computational statistics, molecular modeling, genetic analysis, statistical modeling in medicine, integrated health informatics systems and the application of new technologies in medicine, medical diagnostics research, medical imaging analysis, assisted decision systems, research ethics, social media and health information, and evidence-based medicine. She is the author of more than 200 papers and 19 monographs in medicine, computational chemistry, computer science, mathematics, environmental sciences, biomedical engineering, nanoscience nanotechnology, and medical informatics. vii Preface to ”Applied and Computational Statistics” The research on statistical populations, samples, or models have applications in all research areas and are conducted to gain knowledge for real-world problems. Increased calculation power opens the path to computational statistics, algorithm translation, and the implementation of statistical methods and computer simulations. These areas are developing rapidly, providing solutions to multidisciplinary, interdisciplinary, and transdisciplinary topics. An excellent theoretical statistics method is worthless in the absence of real-data applicability. Furthermore, any excellent theoretical statistics method will find its ending sooner or later without proper implementation. Statistical methods find their application is understanding phenomenon from all fields, including medicine, biology, biochemistry, agriculture, horticulture, engineering, and more. The Special Issue of Mathematics entitled “Applied and Computational Statistics” provides new methods and their applicability to the prediction of the mutation score, repeated measurements bi-treatment cross-over design, modified beta Gompertz distribution, extending the characteristic polynomial, and computation of the probability associated with Anderson–Darling statistics. In addition to giving a detailed presentation of the implemented method that assures the reproducibility, the collection of articles also includes specific applicability examples. Thanks to all contributors for their involvement in finding statistical solutions to real-life problems; keep staying on the path of knowledge gain. Dear reviewers, thank you very much for the time spent in reviewing the articles and for the constructive comments that have certainly contributed to the quality of the papers. I warmly invite readers to enjoy reading this collection of articles, and I hope that new ideas will come to life for the benefit of science by reading these manuscripts. Sorana D. Bolboac˘a Special Issue Editor ix mathematics Article The Modified Beta Gompertz Distribution: Theory and Applications Ibrahim Elbatal 1, Farrukh Jamal 2, Christophe Chesneau 3,*, Mohammed Elgarhy 4 and Sharifah Alrajhi 5 1 Institute of Statistical Studies and Research (ISSR), Department of Mathematical Statistics, Cairo University, Giza 12613, Egypt; [email protected] 2 Department of Statistics, Govt. S.A Postgraduate College Dera Nawab Sahib, Bahawalpur, Punjab 63360, Pakistan; [email protected] 3 Department of Mathematics, LMNO, University of Caen, 14032 Caen, France 4 Department of Statistics, University of Jeddah, Jeddah 21589, Saudi Arabia; [email protected] 5 Department of Statistics, Faculty of Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia; [email protected] * Correspondence: [email protected]; Tel.: +33-02-3156-7424 Received: 7 November 2018; Accepted: 17 December 2018; Published: 20 December 2018 Abstract: In this paper, we introduce a new continuous probability distribution with five parameters called the modified beta Gompertz distribution. It is derived from the modified beta generator proposed by Nadarajah, Teimouri and Shih (2014) and the Gompertz distribution. By investigating its mathematical and practical aspects, we prove that it is quite flexible and can be used effectively in modeling a wide variety of real phenomena. Among others, we provide useful expansions of crucial functions, quantile function, moments, incomplete moments, moment generating function, entropies and order statistics. We explore the estimation of the model parameters by the obtained maximum likelihood method. We also present a simulation study testing the validity of maximum likelihood estimators. Finally, we illustrate the flexibility of the distribution by the consideration of two real datasets. Keywords: modified beta generator; gompertz distribution; maximum likelihood estimation MSC: 60E05; 62E15; 62F10 1. Introduction The Gompertz distribution is a continuous probability distribution introduced by Gompertz [1]. The literature about the use of the Gompertz distribution in applied areas is enormous. A nice review can be found in [2], and the references therein. From a mathematical point of view, the cumulative probability density function (cdf) of the Gompertz distribution with parameters λ > 0 and α > 0is given by − λ ( αx− ) G(x)=1 − e α e 1 , x > 0. The related probability density function (pdf) is given by α − λ ( αx− ) g(x)=λe xe α e 1 , x > 0. It can be viewed as a generalization of the exponential distribution (obtained with α → 0) and thus an alternative to the gamma or Weibull distribution. A feature of the Gompertz distribution is that g(x) is unimodal and has positive skewness, whereas the related hazard rate function (hrf) given by h(x)=g(x)/(1 − G(x)) is increasing. To increase the flexibility of the Gompertz distribution, further Mathematics 2019, 7, 3; doi:10.3390/math70100031 www.mdpi.com/journal/mathematics Mathematics 2019, 7,3 extensions have been proposed. A natural one is the generalized Gompertz distribution introduced by El-Gohary et al. [3]. By introducing an exponent parameter a > 0, the related cdf is given by − λ ( αx− ) a F(x)= 1 − e α e 1 , x > 0. The related applications show that a plays an important role in term of model flexibility. This idea was then extended by Jafari et al. [4] who used the so-called beta generator introduced by Eugene et al. [5]. The related cdf is given by λ α − − α (e x−1) 1 1 e − − F(x)= ta 1(1 − t)b 1dt B(a, b) 0 = ( ) > I − λ ( αx− ) a, b
Recommended publications
  • Applications of Computational Statistics with Multiple Regressions
    International Journal of Computational and Applied Mathematics. ISSN 1819-4966 Volume 12, Number 3 (2017), pp. 923-934 © Research India Publications http://www.ripublication.com Applications of Computational Statistics with Multiple Regressions S MD Riyaz Ahmed Research Scholar, Department of Mathematics, Rayalaseema University, A.P, India Abstract Applications of Computational Statistics with Multiple Regression through the use of Excel, MATLAB and SPSS. It has been widely used for a long time in various fields, such as operations research and analysis, automatic control, probability statistics, statistics, digital signal processing, dynamic system mutilation, etc. The regress function command in MATLAB toolbox provides a helpful tool for multiple linear regression analysis and model checking. Multiple regression analysis is additionally extremely helpful in experimental scenario wherever the experimenter will management the predictor variables. Keywords - Multiple Linear Regression, Excel, MATLAB, SPSS, least square, Matrix notations 1. INTRODUCTION Multiple linear regressions are one of the foremost wide used of all applied mathematics strategies. Multiple regression analysis is additionally extremely helpful in experimental scenario wherever the experimenter will management the predictor variables [1]. A single variable quantity within the model would have provided an inadequate description since a number of key variables have an effect on the response variable in necessary and distinctive ways that [2]. It makes an attempt to model the link between two or more variables and a response variable by fitting a equation to determined information. Each value of the independent variable x is related to a value of the dependent variable y. The population regression line for p informative variables 924 S MD Riyaz Ahmed x1, x 2 , x 3 ......
    [Show full text]
  • Isye 6416: Computational Statistics Lecture 1: Introduction
    ISyE 6416: Computational Statistics Lecture 1: Introduction Prof. Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology What this course is about I Interface between statistics and computer science I Closely related to data mining, machine learning and data analytics I Aim at the design of algorithm for implementing statistical methods on computers I computationally intensive statistical methods including resampling methods, Markov chain Monte Carlo methods, local regression, kernel density estimation, artificial neural networks and generalized additive models. Statistics data: images, video, audio, text, etc. sensor networks, social networks, internet, genome. statistics provide tools to I model data e.g. distributions, Gaussian mixture models, hidden Markov models I formulate problems or ask questions e.g. maximum likelihood, Bayesian methods, point estimators, hypothesis tests, how to design experiments Computing I how do we solve these problems? I computing: find efficient algorithms to solve them e.g. maximum likelihood requires finding maximum of a cost function I \Before there were computers, there were algorithms. But now that there are computers, there are even more algorithms, and algorithms lie at the heart of computing. " I an algorithm: a tool for solving a well-specified computational problem I examples: I The Internet enables people all around the world to quickly access and retrieve large amounts of information. With the aid of clever algorithms, sites on the Internet are able to manage and manipulate this large volume of data. I The Human Genome Project has made great progress toward the goals of identifying all the 100,000 genes in human DNA, determining the sequences of the 3 billion chemical base pairs that make up human DNA, storing this information in databases, and developing tools for data analysis.
    [Show full text]
  • Visualizing Research Collaboration in Statistical Science: a Scientometric
    University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Library Philosophy and Practice (e-journal) Libraries at University of Nebraska-Lincoln 9-13-2019 Visualizing Research Collaboration in Statistical Science: A Scientometric Perspective Prabir Kumar Das Library, Documentation & Information Science Division, Indian Statistical Institute, 203 B T Road, Kolkata - 700108, [email protected] Follow this and additional works at: https://digitalcommons.unl.edu/libphilprac Part of the Library and Information Science Commons Das, Prabir Kumar, "Visualizing Research Collaboration in Statistical Science: A Scientometric Perspective" (2019). Library Philosophy and Practice (e-journal). 3039. https://digitalcommons.unl.edu/libphilprac/3039 Viisualliiziing Research Collllaborattiion iin Sttattiisttiicall Sciience:: A Sciienttomettriic Perspecttiive Prrabiirr Kumarr Das Sciienttiiffiic Assssiissttantt – A Liibrrarry,, Documenttattiion & IInfforrmattiion Sciience Diiviisiion,, IIndiian Sttattiisttiicall IInsttiittutte,, 203,, B.. T.. Road,, Kollkatta – 700108,, IIndiia,, Emaiill:: [email protected] Abstract Using Sankhyā – The Indian Journal of Statistics as a case, present study aims to identify scholarly collaboration pattern of statistical science based on research articles appeared during 2008 to 2017. This is an attempt to visualize and quantify statistical science research collaboration in multiple dimensions by exploring the co-authorship data. It investigates chronological variations of collaboration pattern, nodes and links established among the affiliated institutions and countries of all contributing authors. The study also examines the impact of research collaboration on citation scores. Findings reveal steady influx of statistical publications with clear tendency towards collaborative ventures, of which double-authored publications dominate. Small team of 2 to 3 authors is responsible for production of majority of collaborative research, whereas mega-authored communications are quite low.
    [Show full text]
  • Department of Statistics and Data Science Promotion and Tenure Guidelines
    Approved by Department 12/05/2013 Approved by Faculty Relations May 20, 2014 UFF Notified May 21, 2014 Effective Spring 2016 2016-17 Promotion Cycle Department of Statistics and Data Science Promotion and Tenure Guidelines The purpose of these guidelines is to give explicit definitions of what constitutes excellence in teaching, research and service for tenure-earning and tenured faculty. Research: The most common outlet for scholarly research in statistics is in journal articles appearing in refereed publications. Based on the five-year Impact Factor (IF) from the ISI Web of Knowledge Journal Citation Reports, the top 50 journals in Probability and Statistics are: 1. Journal of Statistical Software 26. Journal of Computational Biology 2. Econometrica 27. Annals of Probability 3. Journal of the Royal Statistical Society 28. Statistical Applications in Genetics and Series B – Statistical Methodology Molecular Biology 4. Annals of Statistics 29. Biometrical Journal 5. Statistical Science 30. Journal of Computational and Graphical Statistics 6. Stata Journal 31. Journal of Quality Technology 7. Biostatistics 32. Finance and Stochastics 8. Multivariate Behavioral Research 33. Probability Theory and Related Fields 9. Statistical Methods in Medical Research 34. British Journal of Mathematical & Statistical Psychology 10. Journal of the American Statistical 35. Econometric Theory Association 11. Annals of Applied Statistics 36. Environmental and Ecological Statistics 12. Statistics in Medicine 37. Journal of the Royal Statistical Society Series C – Applied Statistics 13. Statistics and Computing 38. Annals of Applied Probability 14. Biometrika 39. Computational Statistics & Data Analysis 15. Chemometrics and Intelligent Laboratory 40. Probabilistic Engineering Mechanics Systems 16. Journal of Business & Economic Statistics 41.
    [Show full text]
  • Factor Dynamic Analysis with STATA
    Dynamic Factor Analysis with STATA Alessandro Federici Department of Economic Sciences University of Rome La Sapienza [email protected] Abstract The aim of the paper is to develop a procedure able to implement Dynamic Factor Analysis (DFA henceforth) in STATA. DFA is a statistical multiway analysis technique1, where quantitative “units x variables x times” arrays are considered: X(I,J,T) = {xijt}, i=1…I, j=1…J, t=1…T, where i is the unit, j the variable and t the time. Broadly speaking, this kind of methodology manages to combine, from a descriptive point of view (not probabilistic), the cross-section analysis through Principal Component Analysis (PCA henceforth) and the time series dimension of data through linear regression model. The paper is organized as follows: firstly, a theoretical overview of the methodology will be provided in order to show the statistical and econometric framework that the procedure of STATA will implement. The DFA framework has been introduced and developed by Coppi and Zannella (1978), and then re-examined by Coppi et al. (1986) and Corazziari (1997): in this paper the original approach will be followed. The goal of the methodology is to decompose the variance and covariance matrix S relative to X(IT,J), where the role of the units is played by the pair “units-times”. The matrix S, concerning the JxT observations over the I units, may be decomposed into the sum of three distinct variance and covariance matrices: S = *SI + *ST + SIT, (1) where: 1 That is a methodology where three or more indexes are simultaneously analysed.
    [Show full text]
  • The Stata Journal Publishes Reviewed Papers Together with Shorter Notes Or Comments, Regular Columns, Book Reviews, and Other Material of Interest to Stata Users
    Editors H. Joseph Newton Nicholas J. Cox Department of Statistics Department of Geography Texas A&M University Durham University College Station, Texas Durham, UK [email protected] [email protected] Associate Editors Christopher F. Baum, Boston College Frauke Kreuter, Univ. of Maryland–College Park Nathaniel Beck, New York University Peter A. Lachenbruch, Oregon State University Rino Bellocco, Karolinska Institutet, Sweden, and Jens Lauritsen, Odense University Hospital University of Milano-Bicocca, Italy Stanley Lemeshow, Ohio State University Maarten L. Buis, University of Konstanz, Germany J. Scott Long, Indiana University A. Colin Cameron, University of California–Davis Roger Newson, Imperial College, London Mario A. Cleves, University of Arkansas for Austin Nichols, Urban Institute, Washington DC Medical Sciences Marcello Pagano, Harvard School of Public Health William D. Dupont , Vanderbilt University Sophia Rabe-Hesketh, Univ. of California–Berkeley Philip Ender , University of California–Los Angeles J. Patrick Royston, MRC Clinical Trials Unit, David Epstein, Columbia University London Allan Gregory, Queen’s University Philip Ryan, University of Adelaide James Hardin, University of South Carolina Mark E. Schaffer, Heriot-Watt Univ., Edinburgh Ben Jann, University of Bern, Switzerland Jeroen Weesie, Utrecht University Stephen Jenkins, London School of Economics and Ian White, MRC Biostatistics Unit, Cambridge Political Science Nicholas J. G. Winter, University of Virginia Ulrich Kohler , University of Potsdam, Germany Jeffrey Wooldridge, Michigan State University Stata Press Editorial Manager Stata Press Copy Editors Lisa Gilmore David Culwell, Shelbi Seiner, and Deirdre Skaggs The Stata Journal publishes reviewed papers together with shorter notes or comments, regular columns, book reviews, and other material of interest to Stata users.
    [Show full text]
  • Rank Full Journal Title Journal Impact Factor 1 Journal of Statistical
    Journal Data Filtered By: Selected JCR Year: 2019 Selected Editions: SCIE Selected Categories: 'STATISTICS & PROBABILITY' Selected Category Scheme: WoS Rank Full Journal Title Journal Impact Eigenfactor Total Cites Factor Score Journal of Statistical Software 1 25,372 13.642 0.053040 Annual Review of Statistics and Its Application 2 515 5.095 0.004250 ECONOMETRICA 3 35,846 3.992 0.040750 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 4 36,843 3.989 0.032370 JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY 5 25,492 3.965 0.018040 STATISTICAL SCIENCE 6 6,545 3.583 0.007500 R Journal 7 1,811 3.312 0.007320 FUZZY SETS AND SYSTEMS 8 17,605 3.305 0.008740 BIOSTATISTICS 9 4,048 3.098 0.006780 STATISTICS AND COMPUTING 10 4,519 3.035 0.011050 IEEE-ACM Transactions on Computational Biology and Bioinformatics 11 3,542 3.015 0.006930 JOURNAL OF BUSINESS & ECONOMIC STATISTICS 12 5,921 2.935 0.008680 CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS 13 9,421 2.895 0.007790 MULTIVARIATE BEHAVIORAL RESEARCH 14 7,112 2.750 0.007880 INTERNATIONAL STATISTICAL REVIEW 15 1,807 2.740 0.002560 Bayesian Analysis 16 2,000 2.696 0.006600 ANNALS OF STATISTICS 17 21,466 2.650 0.027080 PROBABILISTIC ENGINEERING MECHANICS 18 2,689 2.411 0.002430 BRITISH JOURNAL OF MATHEMATICAL & STATISTICAL PSYCHOLOGY 19 1,965 2.388 0.003480 ANNALS OF PROBABILITY 20 5,892 2.377 0.017230 STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT 21 4,272 2.351 0.006810 JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS 22 4,369 2.319 0.008900 STATISTICAL METHODS IN
    [Show full text]
  • Abbreviations of Names of Serials
    Abbreviations of Names of Serials This list gives the form of references used in Mathematical Reviews (MR). ∗ not previously listed The abbreviation is followed by the complete title, the place of publication x journal indexed cover-to-cover and other pertinent information. y monographic series Update date: January 30, 2018 4OR 4OR. A Quarterly Journal of Operations Research. Springer, Berlin. ISSN xActa Math. Appl. Sin. Engl. Ser. Acta Mathematicae Applicatae Sinica. English 1619-4500. Series. Springer, Heidelberg. ISSN 0168-9673. y 30o Col´oq.Bras. Mat. 30o Col´oquioBrasileiro de Matem´atica. [30th Brazilian xActa Math. Hungar. Acta Mathematica Hungarica. Akad. Kiad´o,Budapest. Mathematics Colloquium] Inst. Nac. Mat. Pura Apl. (IMPA), Rio de Janeiro. ISSN 0236-5294. y Aastaraam. Eesti Mat. Selts Aastaraamat. Eesti Matemaatika Selts. [Annual. xActa Math. Sci. Ser. A Chin. Ed. Acta Mathematica Scientia. Series A. Shuxue Estonian Mathematical Society] Eesti Mat. Selts, Tartu. ISSN 1406-4316. Wuli Xuebao. Chinese Edition. Kexue Chubanshe (Science Press), Beijing. ISSN y Abel Symp. Abel Symposia. Springer, Heidelberg. ISSN 2193-2808. 1003-3998. y Abh. Akad. Wiss. G¨ottingenNeue Folge Abhandlungen der Akademie der xActa Math. Sci. Ser. B Engl. Ed. Acta Mathematica Scientia. Series B. English Wissenschaften zu G¨ottingen.Neue Folge. [Papers of the Academy of Sciences Edition. Sci. Press Beijing, Beijing. ISSN 0252-9602. in G¨ottingen.New Series] De Gruyter/Akademie Forschung, Berlin. ISSN 0930- xActa Math. Sin. (Engl. Ser.) Acta Mathematica Sinica (English Series). 4304. Springer, Berlin. ISSN 1439-8516. y Abh. Akad. Wiss. Hamburg Abhandlungen der Akademie der Wissenschaften xActa Math. Sinica (Chin. Ser.) Acta Mathematica Sinica.
    [Show full text]
  • The Importance of Endogenous Peer Group Formation
    Econometrica, Vol. 81, No. 3 (May, 2013), 855–882 FROM NATURAL VARIATION TO OPTIMAL POLICY? THE IMPORTANCE OF ENDOGENOUS PEER GROUP FORMATION BY SCOTT E. CARRELL,BRUCE I. SACERDOTE, AND JAMES E. WEST1 We take cohorts of entering freshmen at the United States Air Force Academy and assign half to peer groups designed to maximize the academic performance of the low- est ability students. Our assignment algorithm uses nonlinear peer effects estimates from the historical pre-treatment data, in which students were randomly assigned to peer groups. We find a negative and significant treatment effect for the students we in- tended to help. We provide evidence that within our “optimally” designed peer groups, students avoided the peers with whom we intended them to interact and instead formed more homogeneous subgroups. These results illustrate how policies that manipulate peer groups for a desired social outcome can be confounded by changes in the endoge- nous patterns of social interactions within the group. KEYWORDS: Peer effects, social network formation, homophily. 0. INTRODUCTION PEER EFFECTS HAVE BEEN widely studied in the economics literature due to the perceived importance peers play in workplace, educational, and behavioral outcomes. Previous studies in the economics literature have focused almost ex- clusively on the identification of peer effects and have only hinted at the poten- tial policy implications of the results.2 Recent econometric studies on assorta- tive matching by Graham, Imbens, and Ridder (2009) and Bhattacharya (2009) have theorized that individuals could be sorted into peer groups to maximize productivity.3 This study takes a first step in determining whether student academic per- formance can be improved through the systematic sorting of students into peer groups.
    [Show full text]
  • Computational Tools for Economics Using R
    Student Workshop Student Workshop Computational Tools for Economics Using R Matteo Sostero [email protected] International Doctoral Program in Economics Sant’Anna School of Advanced Studies, Pisa October 2013 Maeo Sostero| Sant’Anna School of Advanced Studies, Pisa 1 / 86 Student Workshop Outline 1 About the workshop 2 Getting to know R 3 Array Manipulation and Linear Algebra 4 Functions, Plotting and Optimisation 5 Computational Statistics 6 Data Analysis 7 Graphics with ggplot2 8 Econometrics 9 Procedural programming Maeo Sostero| Sant’Anna School of Advanced Studies, Pisa 2 / 86 Student Workshop| About the workshop Outline 1 About the workshop 2 Getting to know R 3 Array Manipulation and Linear Algebra 4 Functions, Plotting and Optimisation 5 Computational Statistics 6 Data Analysis 7 Graphics with ggplot2 8 Econometrics 9 Procedural programming Maeo Sostero| Sant’Anna School of Advanced Studies, Pisa 3 / 86 Student Workshop| About the workshop About this Workshop Computational Tools Part of the ‘toolkit’ of computational and empirical research. Direct applications to many subjects (Maths, Stats, Econometrics...). Develop a problem-solving mindset. The R language I Free, open source, cross-platform, with nice ide. I Very wide range of applications (Matlab + Stata + spss + ...). I Easy coding and plenty of (free) packages suited to particular needs. I Large (and growing) user base and online community. I Steep learning curve (occasionally cryptic syntax). I Scattered references. Maeo Sostero| Sant’Anna School of Advanced Studies, Pisa 4 / 86 Student Workshop| About the workshop Workshop material The online resource for this workshop is: http://bit.ly/IDPEctw It will contain the following material: The syllabus for the workshop.
    [Show full text]
  • Robust Statistics Part 1: Introduction and Univariate Data General References
    Robust Statistics Part 1: Introduction and univariate data Peter Rousseeuw LARS-IASC School, May 2019 Peter Rousseeuw Robust Statistics, Part 1: Univariate data LARS-IASC School, May 2019 p. 1 General references General references Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A. Robust Statistics: the Approach based on Influence Functions. Wiley Series in Probability and Mathematical Statistics. Wiley, John Wiley and Sons, New York, 1986. Rousseeuw, P.J., Leroy, A. Robust Regression and Outlier Detection. Wiley Series in Probability and Mathematical Statistics. John Wiley and Sons, New York, 1987. Maronna, R.A., Martin, R.D., Yohai, V.J. Robust Statistics: Theory and Methods. Wiley Series in Probability and Statistics. John Wiley and Sons, Chichester, 2006. Hubert, M., Rousseeuw, P.J., Van Aelst, S. (2008), High-breakdown robust multivariate methods, Statistical Science, 23, 92–119. wis.kuleuven.be/stat/robust Peter Rousseeuw Robust Statistics, Part 1: Univariate data LARS-IASC School, May 2019 p. 2 General references Outline of the course General notions of robustness Robustness for univariate data Multivariate location and scatter Linear regression Principal component analysis Advanced topics Peter Rousseeuw Robust Statistics, Part 1: Univariate data LARS-IASC School, May 2019 p. 3 General notions of robustness General notions of robustness: Outline 1 Introduction: outliers and their effect on classical estimators 2 Measures of robustness: breakdown value, sensitivity curve, influence function, gross-error sensitivity, maxbias curve. Peter Rousseeuw Robust Statistics, Part 1: Univariate data LARS-IASC School, May 2019 p. 4 General notions of robustness Introduction What is robust statistics? Real data often contain outliers.
    [Show full text]
  • V1603419 Discriminant Analysis When the Initial
    TECHNOMETRICSO, VOL. 16, NO. 3, AUGUST 1974 Discriminant Analysis When the Initial Samples are Misclassified II: Non-Random Misclassification Models Peter A. Lachenbruch Department of Biostatistics, University of North Carolina Chapel Hill, North Carolina Two models of non-random initial misclassifications are studied. In these models, observations which are closer to the mean of the “wrong” population have a greater chance of being misclassified than others. Sampling st,udies show that (a) the actual error rates of the rules from samples with initial misclassification are only slightly affected; (b) the apparent error rates, obtained by resubstituting the observations into the calculated discriminant function, are drastically affected, and cannot be used; and (c) the Mahalanobis 02 is greatly inflated. KEYWORDS misclassified. This clearly is unrealistic. The present Discriminant Analysis study attempts to present two other models which Robustness Non-Random Initial Misclassification it is hoped are more realistic than the “equal chance” model. To do this, we sacrifice some mathematical rigor, as the equations involved are 1. INTR~DUOTI~N very difficult to set down. Thus, Monte Carlo Lachenbruch (1966) studied the effects of mis- experiments are used to evaluat’e the behavior of classification of the initial samples on the per- the LDF. formance of the linear discriminant function (LDF). 2. hlODELSAND EXPERIMENTS More recently, Mclachlan (1972) developed the asymptotic results for this model and found gen- We assume that observations x, from the ith erally good agreement with the earlier results. It population, 7~; , i = 1, 2 are normally distributed was found that if the total proportion was not with mean p, and covariance matrix I.
    [Show full text]