Univariate Analyses Can Be Used for Which of the Following
Recommended publications
-
Efficient Estimation of Parameters of the Negative Binomial Distribution
Efficient Estimation of Parameters of the Negative Binomial Distribution V. SAVANI AND A. A. ZHIGLJAVSKY Department of Mathematics, Cardiff University, Cardiff, CF24 4AG, U.K. e-mail: SavaniV@cardiff.ac.uk, ZhigljavskyAA@cardiff.ac.uk (Corresponding author) Abstract In this paper we investigate a class of moment based estimators, called power method estimators, which can be almost as efficient as maximum likelihood estimators and achieve a lower asymptotic variance than the standard zero term method and method of moments estimators. We investigate different methods of implementing the power method in practice and examine the robustness and efficiency of the power method estimators. Key Words: Negative binomial distribution; estimating parameters; maximum likelihood method; efficiency of estimators; method of moments. 1. The Negative Binomial Distribution 1.1. Introduction The negative binomial distribution (NBD) has appeal in the modelling of many practical applications. A large amount of literature exists, for example, on using the NBD to model: animal populations (see e.g. Anscombe (1949), Kendall (1948a)); accident proneness (see e.g. Greenwood and Yule (1920), Arbous and Kerrich (1951)) and consumer buying behaviour (see e.g. Ehrenberg (1988)). The appeal of the NBD lies in the fact that it is a simple two parameter distribution that arises in various different ways (see e.g. Anscombe (1950), Johnson, Kotz, and Kemp (1992), Chapter 5) often allowing the parameters to have a natural interpretation (see Section 1.2). Furthermore, the NBD can be implemented as a distribution within stationary processes (see e.g. Anscombe (1950), Kendall (1948b)) thereby increasing the modelling potential of the distribution. -
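The method of moments estimator named in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' power method: it assumes the common (m, k) parameterization in which the mean is m and the variance is m + m²/k, and the function name and sample counts are hypothetical.

```python
from statistics import mean, variance


def nbd_method_of_moments(counts):
    """Method-of-moments estimates (m, k) for a negative binomial sample.

    Assumes the (m, k) parameterization with E[X] = m and
    Var[X] = m + m**2 / k; this choice is an assumption of the sketch.
    """
    m = mean(counts)
    s2 = variance(counts)  # unbiased sample variance (n - 1 denominator)
    if s2 <= m:
        # Without overdispersion the moment equation has no positive solution.
        raise ValueError("sample is not overdispersed; k is undefined")
    k = m * m / (s2 - m)  # solve s2 = m + m**2 / k for k
    return m, k


# Made-up, overdispersed counts used only to exercise the function.
sample = [0, 0, 0, 1, 1, 2, 3, 5, 8, 10]
m_hat, k_hat = nbd_method_of_moments(sample)
print(f"m = {m_hat:.3f}, k = {k_hat:.3f}")
```

The guard matters in practice: when the sample variance does not exceed the sample mean, the moment equation gives a non-positive or undefined k, so the sketch raises an error rather than returning a meaningless value.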
T-TEST Outline Hypothesis Testing Steps Bivariate Analysis
Bivariate Analysis: T-TEST

Choice of test by variable type:
Variable 2 \ Variable 1 | 2 levels              | >2 levels             | Continuous
2 levels                | chi-square test (χ²)  | chi-square test (χ²)  | t-test
>2 levels               | chi-square test (χ²)  | chi-square test (χ²)  | ANOVA (F-test)
Continuous              | t-test                | ANOVA (F-test)        | Correlation; simple linear regression

Outline: Hypothesis testing steps; T-test; Anova.

Comparison of means: t-test. The t-test is used when one variable is of a continuous nature and the other is dichotomous. The t-test is used to compare the means of two groups on a given variable. Examples: Difference in average blood pressure among males & females. Difference in average BMI among those who exercise and those who do not.

Hypothesis testing steps: Identify the study objective. State the null & alternative hypothesis. Select the proper test statistic. Calculate the test statistic. Take a statistical decision based on the p-value. Reject or accept the null hypothesis.

Comparison of means: t-test. Example 1: Research question: Among university students, is there a difference between the average weight for males versus females? Null hypothesis (Ho): μ weight males = μ weight females. Alternative hypothesis (Ha): μ weight males ≠ μ weight females. Statistical test: t-test.

Comparison of means: t-test. T-Test (SPSS output), Group Statistics:
weight by gender | N    | Mean  | Std. Deviation | Std. Error Mean
male             | 804  | 75.92 | 12.843         | .453
female           | 1135 | 56.47 | 8.923          | .265

If this p-value is < 0.05 then reject null hypothesis and conclude that the variances are different (accept alternative) and hence check this p-value for the t-test. If this p-value is > 0.05 then accept null hypothesis and conclude that the variances are equal and hence check this p-value for the t-test.
-
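The comparison of mean weight between males and females can be reproduced from the group statistics quoted above. The sketch below computes the unequal-variance (Welch) t statistic and its approximate degrees of freedom from summary statistics only; the function name is hypothetical, and the choice of the Welch form rather than the pooled form is an assumption, since the excerpt does not show which row of the SPSS output applies.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Group statistics from the excerpt: weight for males and females.
t_stat, dof = welch_t_from_summary(75.92, 12.843, 804, 56.47, 8.923, 1135)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")

# The standard errors in the table can be checked the same way.
print(f"SE male = {12.843 / math.sqrt(804):.3f}")
print(f"SE female = {8.923 / math.sqrt(1135):.3f}")
```

Turning the statistic into a p-value needs the cumulative distribution function of the t distribution, which the Python standard library does not provide; a statistics library would normally supply it.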
Chapter Five Multivariate Statistics
Chapter Five Multivariate Statistics Jorge Luis Romeu IIT Research Institute Rome, NY 13440 April 23, 1999 Introduction In this chapter we treat the multivariate analysis problem, which occurs when there is more than one piece of information from each subject, and present and discuss several real data sets from materials analysis. We first discuss several statistical procedures for the bivariate case: contingency tables, covariance, correlation and linear regression. They occur when both variables are either qualitative or quantitative. Then, we discuss the case when one variable is qualitative and the other quantitative, via the one way ANOVA. We then overview the general multivariate regression problem. Finally, the non-parametric case for comparison of several groups is discussed. We emphasize the assessment of all model assumptions prior to model acceptance and use, and we present some methods of detection and correction of several types of assumption violation problems. The Case of Bivariate Data Up to now, we have dealt with data sets where each observation consists of a single measurement (e.g. each observation consists of a tensile strength measurement). These are called univariate observations and the related statistical problem is known as univariate analysis. In many cases, however, each observation yields more than one piece of information (e.g. tensile strength, material thickness, surface damage). These are called multivariate observations and the statistical problem is now called multivariate analysis. Multivariate analysis is of great importance and can help us enhance our data analysis in several ways. For, coming from the same subject, multivariate measurements are often associated with each other. If we are able to model this association then we can take advantage of the situation to obtain one from the other. -
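The idea of modelling the association between two measurements so that one can be obtained from the other can be sketched with a Pearson correlation and a least-squares line. The function name and the thickness and strength values below are made up for illustration and do not come from the chapter.

```python
import math
from statistics import mean


def pearson_and_line(x, y):
    """Pearson correlation r plus least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept


# Hypothetical bivariate observations: material thickness and tensile strength.
thickness = [1.0, 1.5, 2.0, 2.5, 3.0]
strength = [10.2, 14.8, 20.1, 24.9, 30.3]

r, slope, intercept = pearson_and_line(thickness, strength)
print(f"r = {r:.3f}")

# Use the fitted line to predict strength for a new thickness value.
predicted = intercept + slope * 2.2
print(f"predicted strength at thickness 2.2: {predicted:.1f}")
```

A correlation close to 1 or -1 suggests the fitted line is a useful predictor; as the chapter stresses, the model assumptions should still be checked before relying on it.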
Univariate and Multivariate Skewness and Kurtosis 1
Running head: UNIVARIATE AND MULTIVARIATE SKEWNESS AND KURTOSIS Univariate and Multivariate Skewness and Kurtosis for Measuring Nonnormality: Prevalence, Influence and Estimation Meghan K. Cain, Zhiyong Zhang, and Ke-Hai Yuan University of Notre Dame Author Note This research is supported by a grant from the U.S. Department of Education (R305D140037). However, the contents of the paper do not necessarily represent the policy of the Department of Education, and you should not assume endorsement by the Federal Government. Correspondence concerning this article can be addressed to Meghan Cain ([email protected]), Ke-Hai Yuan ([email protected]), or Zhiyong Zhang ([email protected]), Department of Psychology, University of Notre Dame, 118 Haggar Hall, Notre Dame, IN 46556. Abstract Nonnormality of univariate data has been extensively examined previously (Blanca et al., 2013; Micceri, 1989). However, less is known of the potential nonnormality of multivariate data although multivariate analysis is commonly used in psychological and educational research. Using univariate and multivariate skewness and kurtosis as measures of nonnormality, this study examined 1,567 univariate distributions and 254 multivariate distributions collected from authors of articles published in Psychological Science and the American Education Research Journal. We found that 74% of univariate distributions and 68% of multivariate distributions deviated from normal distributions. In a simulation study using typical values of skewness and kurtosis that we collected, we found that the resulting type I error rates were 17% in a t-test and 30% in a factor analysis under some conditions. Hence, we argue that it is time to routinely report skewness and kurtosis along with other summary statistics such as means and variances. -
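Reporting skewness and kurtosis alongside means and variances takes little code. The sketch below uses the simple moment-based definitions (third and fourth central moments scaled by powers of the second); the paper may use bias-corrected estimators instead, so treat the formulas, the function name and the sample values as assumptions.

```python
def skewness_and_excess_kurtosis(data):
    """Moment-based sample skewness (g1) and excess kurtosis (g2)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth central moment
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3  # subtract 3 so a normal distribution scores 0
    return g1, g2


# Made-up scores with a long right tail, used only for illustration.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]
g1, g2 = skewness_and_excess_kurtosis(scores)
print(f"skewness = {g1:.2f}, excess kurtosis = {g2:.2f}")
```

Values near zero for both statistics are consistent with a normal distribution; a clearly positive skewness, as in this right-tailed sample, indicates the kind of nonnormality the abstract describes.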
Package 'Biwavelet'
Package ‘biwavelet’ May 26, 2021
Type Package
Title Conduct Univariate and Bivariate Wavelet Analyses
Version 0.20.21
Date 2021-05-24
Author Tarik C. Gouhier, Aslak Grinsted, Viliam Simko
Maintainer Tarik C. Gouhier <[email protected]>
Description This is a port of the WTC MATLAB package written by Aslak Grinsted and the wavelet program written by Christopher Torrence and Gilbert P. Compo. This package can be used to perform univariate and bivariate (cross-wavelet, wavelet coherence, wavelet clustering) analyses.
License GPL (>= 2)
URL https://github.com/tgouhier/biwavelet
BugReports https://github.com/tgouhier/biwavelet/issues
LazyData yes
LinkingTo Rcpp
Imports fields, foreach, Rcpp (>= 0.12.2)
Suggests testthat, knitr, rmarkdown, devtools
RoxygenNote 7.1.1
NeedsCompilation yes
Repository CRAN
Date/Publication 2021-05-26 05:10:10 UTC
R topics documented: biwavelet-package, ar1.spectrum, ar1_ma0_sim, arrow, arrow2, check.data, check.datum, convolve2D, convolve2D_typeopen, enviro.data, get_minroots, MOTHERS, phase.plot, plot.biwavelet, pwtc, rcpp_row_quantile, rcpp_wt_bases_dog, rcpp_wt_bases_morlet, rcpp_wt_bases_paul, smooth.wavelet, wclust, wdist, wt, wt.bases, wt.bases.dog, wt.bases.morlet, wt.bases.paul, wt.sig, wtc, wtc.sig, wtc_sig_parallel, xwt, Index.
biwavelet-package: Conduct Univariate and Bivariate Wavelet Analyses
Description This is a port of the WTC MATLAB package written by Aslak Grinsted and the wavelet program written by Christopher Torrence and Gilbert P. Compo. This package can be used to perform univariate and bivariate (cross-wavelet, wavelet coherence, wavelet clustering) wavelet analyses.
-
Characterization of the Bivariate Negative Binomial Distribution James E
Journal of the Arkansas Academy of Science, Volume 21, Article 17, 1967. Characterization of the Bivariate Negative Binomial Distribution. James E. Dunn, University of Arkansas, Fayetteville. Available at: http://scholarworks.uark.edu/jaas/vol21/iss1/17 Part of the Other Applied Mathematics Commons. Recommended Citation: Dunn, James E. (1967) "Characterization of the Bivariate Negative Binomial Distribution," Journal of the Arkansas Academy of Science: Vol. 21, Article 17. This article is available for use under the Creative Commons license: Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0). Users are able to read, download, copy, print, distribute, search, link to the full texts of these articles, or use them for any other lawful purpose, without asking prior permission from the publisher or the author. This Article is brought to you for free and open access by ScholarWorks@UARK. It has been accepted for inclusion in Journal of the Arkansas Academy of Science by an authorized editor of ScholarWorks@UARK. For more information, please contact [email protected], [email protected]. Arkansas Academy of Science Proceedings, Vol. 21, 1967 CHARACTERIZATION OF THE BIVARIATE NEGATIVE BINOMIAL DISTRIBUTION James E. Dunn INTRODUCTION The univariate negative binomial distribution (also known as Pascal's distribution and the Polya-Eggenberger distribution under various reparameterizations) has recently been characterized by Bartko (1962). Its broad acceptance and applicability in such diverse areas as medicine, ecology, and engineering is evident from the references listed there. -
Meta4diag: Bayesian Bivariate Meta-Analysis of Diagnostic Test Studies for Routine Practice
meta4diag: Bayesian Bivariate Meta-analysis of Diagnostic Test Studies for Routine Practice J. Guo and A. Riebler Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, PO 7491, Norway. July 8, 2016 Abstract This paper introduces the R package meta4diag for implementing Bayesian bivariate meta-analyses of diagnostic test studies. Our package meta4diag is a purpose-built front end of the R package INLA. While INLA offers full Bayesian inference for the large set of latent Gaussian models using integrated nested Laplace approximations, meta4diag extracts the features needed for bivariate meta-analysis and presents them in an intuitive way. It allows the user a straightforward model specification and offers user-specific prior distributions. Further, the newly proposed penalised complexity prior framework is supported, which builds on prior intuitions about the behaviours of the variance and correlation parameters. Accurate posterior marginal distributions for sensitivity and specificity as well as all hyperparameters, and covariates are directly obtained without Markov chain Monte Carlo sampling. Further, univariate estimates of interest, such as odds ratios, as well as the SROC curve and other common graphics are directly available for interpretation. An interactive graphical user interface provides the user with the full functionality of the package without requiring any R programming. The package is available through CRAN https://cran.r-project.org/web/packages/meta4diag/ and its usage will be illustrated using three real data examples. arXiv:1512.06220v2 [stat.AP] 7 Jul 2016. 1 Introduction A meta-analysis summarises the results from multiple studies with the purpose of finding a general trend across the studies. -
UNIT 1 INTRODUCTION to STATISTICS Introduction to Statistics
UNIT 1 INTRODUCTION TO STATISTICS
Structure
1.0 Introduction
1.1 Objectives
1.2 Meaning of Statistics
1.2.1 Statistics in Singular Sense
1.2.2 Statistics in Plural Sense
1.2.3 Definition of Statistics
1.3 Types of Statistics
1.3.1 On the Basis of Function
1.3.2 On the Basis of Distribution of Data
1.4 Scope and Use of Statistics
1.5 Limitations of Statistics
1.6 Distrust and Misuse of Statistics
1.7 Let Us Sum Up
1.8 Unit End Questions
1.9 Glossary
1.10 Suggested Readings
1.0 INTRODUCTION
The word statistics has different meanings for different people. Knowledge of statistics is applicable in day to day life in different ways. In daily life it means a general count of items; in railways, statistics means the number of trains operating, the number of passengers, freight and so on. Thus statistics is used by people to take decisions about problems on the basis of the different types of quantitative and qualitative information available to them. However, in behavioural sciences, the word ‘statistics’ means something different from its common usage. The prime function of statistics is to draw statistical inferences about a population on the basis of the available quantitative information. Overall, statistical methods deal with the reduction of data to convenient descriptive terms and the drawing of inferences from them. This unit focuses on the above aspects of statistics.
1.1 OBJECTIVES
After going through this unit, you will be able to: Define the term statistics; Explain the status of statistics; Describe the nature of statistics; State basic concepts used in statistics; and Analyse the uses and misuses of statistics.
-
Correlation and Regression Analysis
OIC ACCREDITATION CERTIFICATION PROGRAMME FOR OFFICIAL STATISTICS Correlation and Regression Analysis TEXTBOOK Dr. Mohamed Ahmed Zaid ORGANISATION OF ISLAMIC COOPERATION STATISTICAL ECONOMIC AND SOCIAL RESEARCH AND TRAINING CENTRE FOR ISLAMIC COUNTRIES © 2015 The Statistical, Economic and Social Research and Training Centre for Islamic Countries (SESRIC) Kudüs Cad. No: 9, Diplomatik Site, 06450 Oran, Ankara – Turkey Telephone +90 – 312 – 468 6172 Internet www.sesric.org E-mail [email protected] The material presented in this publication is copyrighted. The authors give permission to view, copy, download, and print the material presented, provided that these materials are not going to be reused, on any condition whatsoever, for commercial purposes. For permission to reproduce or reprint any part of this publication, please send a request with complete information to the Publication Department of SESRIC. All queries on rights and licenses should be addressed to the Statistics Department, SESRIC, at the aforementioned address. DISCLAIMER: Any views or opinions presented in this document are solely those of the author(s) and do not reflect the views of SESRIC. ISBN: xxx-xxx-xxxx-xx-x Cover design by Publication Department, SESRIC. For additional information, contact Statistics Department, SESRIC. CONTENTS Acronyms -
Univariate Statistics Summary
Further Maths Univariate Statistics Summary Types of Data Data can be classified as categorical or numerical. Categorical data are observations or records that are arranged according to category. For example: the favourite colour of a class of students; the mode of transport that each student uses to get to school; the rating of a TV program, either “a great program”, “average program” or “poor program”; postal codes such as “3011”, “3015” etc. Numerical data are observations based on counting or measurement. Calculations can be performed on numerical data. There are two main types of numerical data: discrete and continuous. Discrete data takes only fixed values, usually whole numbers. Discrete data often arises as the result of counting items. For example: the number of siblings each student has, the number of pets a set of randomly chosen people have or the number of passengers in cars that pass an intersection. Continuous data can take any value in a given range. It is usually a measurement. For example: the weights of students in a class. The weight of each student could be measured to the nearest tenth of a kg. Weights of 84.8 kg and 67.5 kg would be recorded. Other examples of continuous data include the time taken to complete a task or the heights of a group of people. Exercise 1 Decide whether the following data is categorical or numerical. If numerical, decide if the data is discrete or continuous. 1. 2. 3. 4. Solutions 1a. Numerical-discrete b. Categorical c. Categorical d. -
Multiple Linear Regression
Multiple Linear Regression Mark Tranmer Mark Elliot
Contents
Section 1: Introduction
Exam16
Exam11
Predicted values
Residuals
Scatterplot of exam performance at 16 against exam performance at 11
1.3 Theory for multiple linear regression
Section 2: Worked Example using SPSS
Section 3: Further topics
Stepwise
Section 4: BHPS assignment
Reading list
-
The Landscape of R Packages for Automated Exploratory Data Analysis by Mateusz Staniak and Przemysław Biecek
CONTRIBUTED RESEARCH ARTICLE The Landscape of R Packages for Automated Exploratory Data Analysis by Mateusz Staniak and Przemysław Biecek Abstract The increasing availability of large but noisy data sets with a large number of heterogeneous variables leads to increasing interest in the automation of common tasks for data analysis. The most time-consuming part of this process is the Exploratory Data Analysis, crucial for better domain understanding, data cleaning, data validation, and feature engineering. There is a growing number of libraries that attempt to automate some of the typical Exploratory Data Analysis tasks to make the search for new insights easier and faster. In this paper, we present a systematic review of existing tools for Automated Exploratory Data Analysis (autoEDA). We explore the features of fifteen popular R packages to identify the parts of the analysis that can be effectively automated with the current tools and to point out new directions for further autoEDA development. Introduction With the advent of tools for automated model training (autoML), building predictive models is becoming easier, more accessible and faster than ever. Tools for R such as mlrMBO (Bischl et al., 2017) and parsnip (Kuhn and Vaughan, 2019); tools for Python such as TPOT (Olson et al., 2016), auto-sklearn (Feurer et al., 2015) and autoKeras (Jin et al., 2018); and tools for other languages such as H2O Driverless AI (H2O.ai, 2019; Cook, 2016) and autoWeka (Kotthoff et al., 2017) support fully- or semi-automated feature engineering and selection, model tuning and training of predictive models. Yet, model building is always preceded by a phase of understanding the problem, understanding the domain and exploring the data set.