Descriptive Methods

707.031: Evaluation Methodology, Winter 2015/16
Eduardo Veas

What we do with the data depends on the scales.

Measurement Scales

The complexity of measurements
• Nominal (crude)
• Ordinal
• Interval
• Ratio (sophisticated)

Nominal data
• arbitrarily assigning a code to a category or attribute: postal codes, job classifications, military ranks, gender
• mathematical manipulations are meaningless
• categories are mutually exclusive
• each category is a level
• use: frequencies, counts

Ordinal data
• ranking of an attribute
• the intervals between points on the scale are not intrinsically equal
• comparisons < or > are possible

Interval data
• equal distances between adjacent values, but no absolute zero
• temperature in °C or °F
• the mean can be computed
• Likert scale data? (strictly ordinal, though often treated as interval)

Ratio data
• absolute zero
• can be manipulated mathematically
• time to complete a task, distance or velocity of a cursor
• counts, normalized counts (count per something)

Frequencies

Frequency tables
• tab.courses <- as.data.frame(freq(ordered(courses)), plot=FALSE)  # freq() from the descr package
• CumFreq <- cumsum(tab.courses[-dim(tab.courses)[1],]$Frequency)   # drop the Total row before accumulating
• tab.courses$CumFreq <- c(CumFreq, NA)
• tab.courses

Interpreting frequency tables

         Frequency  Percent  CumPercent  CumFreq
  1          2         20        20         2
  2          3         30        50         5
  3          4         40        90         9
  4          1         10       100        10
  Total     10        100        NA        NA

Contingency tables

            Right-handed  Left-handed  Total
  Males          43            9         52
  Females        44            4         48
  Totals         87           13        100

Modelling

Statistical models
• a model has to accurately represent the real-world phenomenon
• a model can be used to predict things about the real world
• the degree to which a statistical model represents the data collected is called the fit of the model

Frequency distributions
• plot observations on the x-axis and a bar showing the count per observation
• ideally, observations fall symmetrically around the center
• skew and kurtosis describe abnormalities in the distribution

Histogram / Frequency distributions
[Figure: histogram of a frequency distribution]

Center of a distribution
• Mode: the score that occurs most frequently in the dataset
  • it may take several values
  • it may change dramatically with a single added score
• Median: the middle score (after ranking all scores)
  • for an even number of scores, add the two central values and divide by 2
  • good for ordinal, interval and ratio data
• Mean: the average score
  • can be influenced by extreme scores

Dispersion of a distribution
• range: the difference between the lowest and highest score; it is sensitive to extreme scores (for the Facebook data used later: 252 − 22 = 230, but 121 − 22 = 99 once the top score is dropped)
• interquartile range: the difference between the upper and lower quartiles, i.e., the spread of the middle 50% of scores

Fit of the mean
• deviance: the difference between a score and the mean (x − mean)
• sum of squared errors (SS): the sum of the squared deviances
• variance = SS / (N − 1)
• stddev = sqrt(variance)

Assumptions

Assumptions of parametric data
• normally distributed: the sample, or the error in the model
• homogeneity of variance:
  • correlational designs: the variance of one variable should be stable at all levels of the other variable
  • group comparisons: each sample comes from a population with the same variance
• interval data: at least interval-level data
• independence: the behaviour of one participant does not influence that of another

Distributions for DLF
[Figure: density histograms of the hygiene scores on days 1–3 of the Download Festival, with the corresponding Q-Q plots of sample against theoretical quantiles]

Quantify normality

Different groups

Exam histogram
[Figure: density histogram of the R exam scores for the whole sample (bimodal, scores roughly 25–100), and separate histograms per university (roughly 10–70 and 60–100), each much closer to normal]
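A sketch of how such density histograms can be produced with ggplot2; the file name rexam.dat and the 0/1 coding of uni are assumptions (the same dataset is used in the Shapiro-Wilk examples below):

```r
library(ggplot2)

# R exam data; file name and uni coding (0 = KFU, 1 = TUG) are assumed
rexam <- read.delim("rexam.dat", header = TRUE)
rexam$uni <- factor(rexam$uni, levels = c(0, 1), labels = c("KFU", "TUG"))

# density histogram of all exam scores
ggplot(rexam, aes(x = exam)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5) +
  labs(x = "exam", y = "density")

# one panel per university: each group on its own looks far more normal
ggplot(rexam, aes(x = exam)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5) +
  facet_wrap(~ uni)
```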
Shapiro-Wilk test
• shapiro.test(rexam$exam)
• # if we are comparing groups, what matters is normality within each group:
• by(rexam$exam, rexam$uni, shapiro.test)

Reporting Shapiro-Wilk
A Shapiro-Wilk test on the R exam scores indicated a significant deviation from normality, W = 0.96, p < 0.05.

Homogeneity of variance
• Levene's test:
• leveneTest(rexam$exam, rexam$uni, center=mean)  # from the car package
• Reporting: for the percentage on the R exam, the variances were similar for KFU and TUG students, F(1, 98) = 2.09

Homogeneity of variance II
• on large datasets, Levene's test may come out significant even for small deviations from homogeneity
• double-check with the variance ratio (Hartley's Fmax)

Correlations

"Everything is hard to begin with, but the more you practise the easier it gets."

Relationships
• everything is hard to begin with, but the more you practise the easier it gets:
  • increase in practice, increase in skill
  • increase in practice, but skill remains unchanged
  • increase in practice, decrease in skill

Correlations
• Bivariate: correlation between two variables
• Partial: correlation between two variables while controlling for the effect of one or more additional variables

Covariance
• are changes in one variable met with similar changes in the other variable?
• cross-product deviations (CPD): multiply the deviations of the two variables
• covariance = CPD / (N − 1)

Covariance II
• positive: both variables vary in the same direction
• negative: the variables vary in opposite directions
• covariance is scale dependent and cannot be generalized

Pearson correlation coefficient
• r = cov / (sx · sy)
• data must be at least interval
• value between −1 and 1
  • 1 → variables perfectly positively correlated
  • 0 → no linear relationship
  • −1 → variables perfectly negatively correlated

Dataset: Exams and Anxiety
• effects of exam stress and revision on exam performance
• questionnaire to assess anxiety relating to exams (EAQ)

Enter data
• examData <- read.delim("ExamAnxiety.dat", header=TRUE)
• examData2 <- examData[, c("Exam", "Anxiety", "Revise")]
• cor(examData2)

Pearson correlation

              Exam     Anxiety     Revise
  Exam     1.0000000  -0.4409934   0.3967207
  Anxiety -0.4409934   1.0000000  -0.7092493
  Revise   0.3967207  -0.7092493   1.0000000

Confidence values
• rcorr(as.matrix(examData[, c("Exam", "Anxiety", "Revise")]))  # from the Hmisc package
• the p-value matrix (0 means p rounds to zero at four decimals):

           Exam  Anxiety  Revise
  Exam              0        0
  Anxiety     0              0
  Revise      0     0

Reporting Pearson's CC
A Pearson correlation coefficient indicated a significant correlation between exam anxiety and exam performance, r = −.44, p < 0.01.

Spearman's correlation coefficient
• non-parametric test
• first rank the data, then apply the Pearson correlation coefficient to the ranks

Liar Dataset
• a contest for telling the biggest lie
• 68 participants, their ranking in the contest, and a creativity questionnaire

Spearman test
• liarData <- read.delim("biggestLiar.dat", header=TRUE)
• rcorr(as.matrix(liarData[, c("Position", "Creativity")]), type="spearman")  # rcorr defaults to Pearson

             Position  Creativity
  Position      1.00      -0.37
  Creativity   -0.37       1.00

Reporting Spearman
A Spearman non-parametric correlation indicated a significant relationship between creativity and ranking in the World's Biggest Liar contest, rs = −.37, p < 0.001.

Kendall's tau (non-parametric)
• used for small datasets with many tied ranks
• cor.test(liarData$Position, liarData$Creativity, alternative="less", method="kendall")
• z = -3.2252, p-value = 0.0006294
• alternative hypothesis: true tau is less than 0
• sample estimate: tau = -0.3002413

Reporting Kendall's test
A Kendall tau correlation coefficient indicated a significant correlation between creativity and performance in the World's Biggest Liar contest, τ = −.30, p < 0.001.
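As a quick illustration of the Spearman slide above (rank first, then apply Pearson), the two computations below give the same value; a sketch using the liar data already loaded:

```r
# Spearman = Pearson applied to ranks (ties get midranks in both cases)
r.by.hand  <- cor(rank(liarData$Position), rank(liarData$Creativity))
r.spearman <- cor(liarData$Position, liarData$Creativity, method = "spearman")
all.equal(r.by.hand, r.spearman)  # TRUE
```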
Biserial and point-biserial correlations
• one variable is dichotomous (categorical with 2 categories)
• point-biserial: for a discrete dichotomy (e.g., dead vs. alive)
• biserial: for a continuous dichotomy (e.g., passing vs. failing an exam, where the underlying ability is continuous)

Readings
• Discovering Statistics Using R (Andy Field, Jeremy Miles, Zoë Field)

R

Set work directory
• setwd("/new/work/directory")
• getwd()
• ls()  # list the objects in the current workspace

Packages
• install.packages("package.name")  # installing packages
• library(package.name)  # loading a package
• package::function()  # disambiguating functions

Nominal and ordinal data
• mydata$v1 <- factor(mydata$v1, levels = c(1,2,3), labels = c("red", "blue", "green"))
• mydata$v1 <- ordered(mydata$y, levels = c(1,3,5), labels = c("Low", "Medium", "High"))

Missing data
• is.na(var)  # tests for missing values, also on rows
• mydata$v1[mydata$v1==99] <- NA  # select rows where v1 is 99 and recode column v1
• x <- c(1,2,NA,3); mean(x)  # returns NA
• mean(x, na.rm=TRUE)  # ignores the NA
• newdata <- na.omit(mydata)  # spawn a dataset without missing data

Install and load packages
• install.packages("car"); install.packages("ggplot2"); install.packages("pastecs"); install.packages("psych"); install.packages("descr")
• library(car); library(ggplot2); library(pastecs); library(psych); library(Rcmdr); library(descr)

Enter data
• id <- c(1,2,3,4,5,6,7,8,9,10)
• sex <- c(1,1,1,1,1,2,2,2,2,2)
• courses <- c(2.0,1.0,1.0,2.0,3.0,3.0,3.0,2.0,4.0,3.0)
• sex <- factor(sex, levels=c(1:2), labels=c("M", "F"))
• example <- data.frame(ID=id, Gender=sex, Courses=courses)

Frequency distributions
• facebook <- c(22,40,53,57,93,98,103,108,116,121,252)
• library(modeest)
• mfv(facebook)  # most frequent value (mode)
• mean(facebook)
• median(facebook)

Dispersion
• quantile(facebook)
• IQR(facebook)
• var(facebook)
• sd(facebook)

Describing your data
• # load meaningful data
• lecturerData <- read.csv("lecturerData.csv", header=TRUE)
• # get statistics
• stat.desc(lecturerData[, c("friends", "income")], basic=FALSE, norm=TRUE)  # from the pastecs package

Describing your data II
• # print a frequency table
• tab.friends <- as.data.frame(freq(ordered(lecturerData$friends)), plot=FALSE)
• tab.friends.cumsum <- cumsum(tab.friends[-dim(tab.friends)[1],]$Frequency)
• tab.friends$CumFreq <- c(tab.friends.cumsum, NA)
• tab.friends

Testing normally distributed data
• load the DLF data:
• dlf <- read.delim("DownloadFestival.dat", header=TRUE)
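Continuing from here, a minimal sketch of how the normality of the DLF hygiene scores can be quantified; the column names day1–day3 are assumptions about DownloadFestival.dat:

```r
library(pastecs)

# DLF = Download Festival hygiene data; columns day1, day2, day3 assumed
dlf <- read.delim("DownloadFestival.dat", header = TRUE)

# skew, kurtosis and normality statistics for the three days
round(stat.desc(dlf[, c("day1", "day2", "day3")], basic = FALSE, norm = TRUE), digits = 3)

# Shapiro-Wilk test per day; a significant result indicates deviation from normality
shapiro.test(dlf$day1)
```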