BHS 307 – Statistics for the Behavioral Sciences

PSY 307 – Statistics for the Behavioral Sciences
Chapter 14 – t-Test for Two Independent Samples

Independent Samples
Observations in one sample are not paired on a one-to-one basis with observations in the other sample.
Effect: any difference between two population means.
Hypotheses:
Null H0: $\mu_1 - \mu_2 = 0$ (two-tailed) or $\mu_1 - \mu_2 \le 0$ (one-tailed)
Alternative H1: $\mu_1 - \mu_2 \ne 0$ (two-tailed) or $\mu_1 - \mu_2 > 0$ (one-tailed)

The Difference Between Two Sample Means
Effect size: $\bar{X}_1$ minus $\bar{X}_2$.
The null hypothesis (H0) is that these two means come from underlying populations with the same mean, so the difference between the population means is 0 ($\mu_1 - \mu_2 = 0$).

Sampling Distribution of Differences in Sample Means
[Figure: distribution of all possible $\bar{X}_1 - \bar{X}_2$ difference scores that could occur by chance, centered on $\mu_1 - \mu_2$, with a critical value marked in each tail.]
Does our $\bar{X}_1 - \bar{X}_2$ exceed the critical value? YES: reject the null (H0).

What if the Difference is Smaller?
[Figure: the same sampling distribution, with the observed $\bar{X}_1 - \bar{X}_2$ falling between the two critical values.]
Does our $\bar{X}_1 - \bar{X}_2$ exceed the critical value? NO: retain the null (H0).

Distribution of the Differences
In a one-sample case, the mean of the sampling distribution is the population mean.
In a two-sample case, the mean of the sampling distribution is the difference between the two population means.
The standard deviation of this distribution of differences is its standard error.

Formulas for t-test (independent)
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_{hyp}}{s_{\bar{x}_1 - \bar{x}_2}}$$
Estimated standard error:
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$
Pooled variance:
$$s_p^2 = \frac{SS_1 + SS_2}{df} = \frac{SS_1 + SS_2}{n_1 + n_2 - 2}$$
Sums of squares:
$$SS_1 = \sum X_1^2 - \frac{(\sum X_1)^2}{n_1}, \qquad SS_2 = \sum X_2^2 - \frac{(\sum X_2)^2}{n_2}$$

Estimated Standard Error
Pooled variance: the variance common to both populations is estimated by combining the two sample variances.
The pooled estimate is a weighted average: each group variance is weighted by its degrees of freedom (df), and the total is divided by the combined df.
Df for pooled variance: $n_1 + n_2 - 2$

Confidence Intervals for t
The confidence interval for two independent samples is:
$$(\bar{X}_1 - \bar{X}_2) \pm (t_{conf})(s_{\bar{x}_1 - \bar{x}_2})$$
Find the appropriate value of $t_{conf}$ in the t table using the formula for df ($n_1 + n_2 - 2$).
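As an illustration of the formulas above, here is a minimal Python sketch of the pooled-variance t ratio and the matching confidence interval. The function names (`sum_of_squares`, `pooled_t`, `confidence_interval`) and the two small data sets are made up for this example and do not come from the course materials. The critical value 2.306 is the standard t-table entry for df = 8 at the two-tailed .05 level.

```python
from math import sqrt


def sum_of_squares(scores):
    """SS = sum(X^2) - (sum X)^2 / n, the computational formula from the slides."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


def pooled_t(sample1, sample2, hyp_diff=0.0):
    """Return (t, df, standard_error) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    df = n1 + n2 - 2
    # Pooled variance: combined sums of squares divided by combined df.
    pooled_var = (sum_of_squares(sample1) + sum_of_squares(sample2)) / df
    # Estimated standard error of the difference between means.
    std_err = sqrt(pooled_var / n1 + pooled_var / n2)
    t = ((mean1 - mean2) - hyp_diff) / std_err
    return t, df, std_err


def confidence_interval(sample1, sample2, t_conf):
    """Return (lower, upper) limits: (mean1 - mean2) +/- t_conf * standard error."""
    _, _, std_err = pooled_t(sample1, sample2)
    diff = sum(sample1) / len(sample1) - sum(sample2) / len(sample2)
    return diff - t_conf * std_err, diff + t_conf * std_err


# Hypothetical scores for two unpaired groups (illustration only).
group1 = [2, 4, 6, 8, 10]
group2 = [1, 2, 3, 4, 5]

t, df, std_err = pooled_t(group1, group2)
T_CRITICAL = 2.306  # t table, df = 8, two-tailed, alpha = .05
decision = "reject H0" if abs(t) > T_CRITICAL else "retain H0"
lower, upper = confidence_interval(group1, group2, T_CRITICAL)
print(f"t({df}) = {t:.3f}, SE = {std_err:.3f}: {decision}")
print(f"95% CI for the difference: ({lower:.3f}, {upper:.3f})")
```

With these made-up scores the observed t falls short of the critical value, so the null hypothesis is retained, and the 95% interval includes 0, which is the same conclusion expressed as a range.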
Intervals built this way capture the true difference in population means a stated percentage of the time (the confidence level, such as 95%).

Assumptions
Both populations are normally distributed with equal variance.
With equal sample sizes > 10, valid results will occur even with non-normal populations.
Equate sample sizes to minimize effects of unequal variance.
Increase sample size to minimize the effects of non-normality.

Population Correlation Coefficient
Two correlated variables are similar to a matched sample because in both cases, observations are paired.
A population correlation coefficient ($\rho$) would represent the mean of the r's for all possible pairs of samples.
Hypotheses:
H0: $\rho = 0$
H1: $\rho \ne 0$

t-Test for Rho ($\rho$)
Similar to a t-test for a single group.
Tests whether the value of r is significantly different from what might occur by chance.
Do the two variables vary together by accident or due to an underlying relationship?

Formula for t
$$t = \frac{r - \rho_{hyp}}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
The denominator is the standard error of prediction.

Calculating t for Correlated Variables
Except that r is used in place of $\bar{X}$, the formula for calculating the t statistic is the same.
The standard error of prediction is used in the denominator, where it plays the role of the standard error in the t ratio.
Compare against the critical value for t with df = n – 2 (n = pairs).

Importance of Sample Size
Lower values of r become significant with greater sample sizes:
As n increases, the standard error shrinks and the critical value of t decreases, so it is easier to obtain a significant result.
Cohen's rule of thumb:
.10 = weak relationship
.30 = moderate relationship
.50 = strong relationship
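To make the sample-size point concrete, the following Python sketch computes the t ratio for a correlation under H0: rho = 0. The function name `t_for_r` and the example values (r = .30 with 27 and then 102 pairs) are made up for illustration. The critical values 2.060 (df = 25) and 1.984 (df = 100) are standard two-tailed .05 entries from a t table.

```python
from math import sqrt


def t_for_r(r, n_pairs, rho_hyp=0.0):
    """Return (t, df) for testing a sample correlation r against rho_hyp."""
    df = n_pairs - 2
    # Denominator of the t ratio: sqrt((1 - r^2) / (n - 2)).
    std_err = sqrt((1 - r * r) / df)
    return (r - rho_hyp) / std_err, df


# Same moderate correlation, two hypothetical sample sizes.
t_small, df_small = t_for_r(0.30, 27)    # compare with 2.060 (df = 25)
t_large, df_large = t_for_r(0.30, 102)   # compare with 1.984 (df = 100)

print(f"n = 27:  t({df_small}) = {t_small:.3f}, significant: {abs(t_small) > 2.060}")
print(f"n = 102: t({df_large}) = {t_large:.3f}, significant: {abs(t_large) > 1.984}")
```

In this sketch the same r of .30 is not significant with 27 pairs but is significant with 102 pairs. Significance therefore says little about the strength of the relationship, which is what Cohen's benchmarks describe.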
