Estimation and Testing for Association with Multiple-Response Categorical Variables from Complex Surveys

Section on Survey Research Methods

Christopher R. Bilder (1), Thomas M. Loughin (2)
(1) Department of Statistics, University of Nebraska-Lincoln, [email protected], http://www.chrisbilder.com
(2) Department of Statistics and Actuarial Science, Simon Fraser University Surrey, Surrey, BC, V3T0A3, [email protected]

KEY WORDS: Correlated binary data; Loglinear model; NHANES; Pearson statistic; Pick any/c; Rao-Scott adjustments

1. Introduction

Many surveys include questions that invite respondents to “choose all that apply” or “pick any” from a series of items. A recent Yahoo! internet search on the phrases “survey” and “choose all that apply” yielded over 2000 hits, many from online surveys. Issues of statistical validity of voluntary-response surveys notwithstanding, this points to the fact that these questions are ubiquitous in modern survey methodology. Furthermore, the United States Office of Management and Budget has mandated that federal surveys ask questions about race and ethnicity in a “choose all that apply” format (Federal Register, 1997, p. 58781), allowing members of the increasingly multiracial population to acknowledge their complex ethnicity. Given the frequency with which these questions occur, it is vital to have good methods for statistical analysis of the data they provide.

Variables that summarize survey data arising from “choose all that apply” questions are referred to as multiple-response categorical variables (MRCVs). They present a challenge because they cannot be handled in the same manner as the usual single-response categorical variables (SRCVs), although this has only recently been recognized in the literature (Umesh, 1995; Loughin and Scherer, 1998). When two or more categorical variables are measured, questions naturally arise regarding the associations among them. The difficulty with the analysis of associations involving MRCVs comes from the fact that individual subjects may respond positively to more than one item from a list, and these responses are likely to be correlated. The result is that tests for independence between categorical variables involving MRCVs cannot be performed in the “usual” ways. For example, the Pearson test statistic for independence is not invariant to the arbitrary choice of whether the positive or the negative responses are tabulated (Agresti and Liu, 1999; Bilder, Loughin, and Nettleton, 2000; Bilder and Loughin, 2001). Versions of the Pearson statistic that are modified to be invariant to this choice of coding have distributions that are not, in general, chi-square (Agresti and Liu, 1999; Bilder et al., 2000; Bilder and Loughin, 2004).

Ignoring these problems and simply using the usual Pearson statistic and its associated chi-square distribution provides a test with very poor properties (Loughin and Scherer, 1998). Only recently has work been done to address these problems. For tests of independence between one SRCV and one MRCV, Agresti and Liu (1999), Bilder et al. (2000), Decady and Thomas (2000), and Bilder and Loughin (2001) describe various adjustments to the Pearson statistic and methods to approximate the resulting sampling distributions. Agresti and Liu (1999) point out that MRCVs can be expressed as binary vectors wherein each element of the vector indicates whether the corresponding item is chosen as one of the responses. Decady and Thomas (2000) cleverly note the parallel between an application of an adjusted Pearson statistic to MRCVs and the use of the Pearson statistic in non-multinomial sampling structures as studied by Rao and Scott (1981), although the form of the Pearson statistic used by Decady and Thomas is not invariant to the 0/1 coding of the binary vectors (a different value of the statistic results if the elements indicate non-selection of that item). Thomas and Decady (2004) and Bilder and Loughin (2004) discuss tests of independence between two MRCVs.

Beyond testing for association, there have been a few efforts to model associations involving MRCVs. Agresti and Liu (1999, 2001) propose marginal logit models to describe the association between a single MRCV and a single SRCV. They suggest, but do not explore, extensions to multiple MRCVs. Bilder and Loughin (2003, 2007) examine these suggestions and propose their own modeling procedure. They conclude that a generalized loglinear model fit using a marginal estimation procedure is the preferred model due to computational ease, flexibility, and overall performance. Inference using this model makes use of work by Rao and Scott (1984) and Haber (1985). In particular, model fitting uses a “pseudo” maximum likelihood approach based upon an incorrect assumption of a multinomial distribution for the observed marginal counts, and then adjustments are applied to the sampling distributions of the various estimators and test statistics.

All previous work on MRCVs has been conducted under the assumption of simple random sampling with replacement, so that measurements on sampled subjects can be viewed as a set of independent, identically-distributed random variables from some infinite population. No methods are currently available for the common situation of testing and modeling association among MRCVs based on data arising from complex survey sampling involving, for example, probability proportional to size, stratification, and/or clustering. The present research combines the results of Rao and Scott (1984) and Bilder and Loughin (2007) to extend existing modeling and analysis techniques in order to provide valid analysis of data from complex survey designs. Beyond the measured responses, the only additional information that the proposed methods require is a set of survey weights that permit unbiased estimation of population totals, and a method of calculating the variance of these estimates. General tests for association, like those developed by Loughin and Scherer (1998), Bilder et al. (2000), Thomas and Decady (2004), and Bilder and Loughin (2004), are obtained as goodness-of-fit tests from the proposed models.

The National Health and Nutrition Examination Survey (NHANES) provides an excellent opportunity to apply the proposed methods. This survey is a large, nationwide study of the health and diet of people in the United States and is conducted periodically using a multistage, complex survey sampling design. The 1999-2000 survey contains numerous questions that can be treated as “choose all that apply” questions. The focus here will be on questions asking the respondent about lifetime tobacco use (>100 cigarettes, >20 pipes, >20 cigars, >20 snuff, and >20 chewing tobacco) and types of respiratory problems experienced (cough on most days, bring up phlegm on most days, experienced wheezing in chest, dry cough at night). The exact survey questions and data are available on the NHANES website at www.cdc.gov/nchs/about/major/nhanes/NHANES99_00.htm. Table 1 displays survey-design adjusted proportions for the number of individuals who responded positively to each item. Note that individual survey respondents may be represented in more than one table cell because they could use more than one type of tobacco and/or have more than one type of respiratory problem so that common Pearson chi-square tests for independence or loglinear models accounting for the survey design can not be used. The purpose of this paper is to show how one can simultaneously model and estimate the association structure between MRCVs in this setting.

This paper is organized as follows. Notation and preliminary details are given in Section 2, followed by a detailed description of the model and associated inference procedures.

2. Background

2.1 General notation

Let $U$ denote a population of $N$ units, and let $s$ be a sample of $n$ units selected from $U$ according to some probability sampling plan with known first-order inclusion probabilities. Let $w_u$, $u = 1, \ldots, N$, represent survey weights. These may be simply the inverse of the first-order inclusion probabilities, or they may be more complicated to account for nonresponse, post-stratification, and so forth. Assume that these weights are constructed to lead to unbiased estimates of population totals.

To simplify the exposition, consider the case of two MRCVs, $Y = (Y_1, \ldots, Y_I)$ and $Z = (Z_1, \ldots, Z_J)$, where $Y_i$ is the binary response to item $i$ of $Y$ and $Z_j$ is the binary response to item $j$ of $Z$. Let the corresponding observed values in the population be $y_u = (y_{u1}, \ldots, y_{uI})$ and $z_u = (z_{u1}, \ldots, z_{uJ})$ for $u = 1, \ldots, N$. Extensions to more than two MRCVs are discussed in Section 5. Consider subpopulations of $U$ corresponding to different $(y, z)$ combinations with $U_{(y,z)} = \{u : (y_u, z_u) = (y, z)\}$. The population total count for combination $(y, z)$ is $N(y, z) = \sum_{u \in U} \delta(u \in U_{(y,z)})$, where $\delta(\cdot)$ is an indicator function. The sample-weighted estimate of the population total count for combination $(y, z)$ is $\hat{N}(y, z) = \sum_{u \in s} w_u \, \delta(u \in U_{(y,z)})$. These counts can be represented as $2^{I+J} \times 1$ vectors, $\mathbf{N}$ and $\hat{\mathbf{N}}$. Without loss of generality, assume that the elements of each vector are arranged in lexicographic order according to the binary numerical value of $(y, z)$. Thus, element $k$ corresponds to the combination $(y, z)$ that is the binary equivalent of the decimal value $k - 1$.

Next, consider the population marginal count for $(y_i = a, z_j = b)$, where $a, b \in \{0, 1\}$, $M_{ab(ij)} = \sum_{u \in U} \delta(y_{ui} = a, z_{uj} = b)$, which is estimated by $\hat{M}_{ab(ij)} = \sum_{u \in s} w_u \, \delta(y_{ui} = a, z_{uj} = b)$. The corresponding population and estimated proportions are $P_{ab(ij)} = M_{ab(ij)}/N$ and $\hat{P}_{ab(ij)} = \hat{M}_{ab(ij)}/\hat{N}$, respectively, where $\hat{N} = \sum_{u \in s} w_u$. Table 2 shows the estimated proportions for the respiratory symptoms and tobacco use data from NHANES. Note that $M_{ab(ij)} = \mathbf{b}'_{ab(ij)} \mathbf{N}$ and $\hat{M}_{ab(ij)} = \mathbf{b}'_{ab(ij)} \hat{\mathbf{N}}$ for a suitably-chosen $1 \times 2^{I+J}$ row vector $\mathbf{b}'_{ab(ij)}$. The elements of $\mathbf{b}_{ab(ij)}$ are $\delta(y_i = a, z_j = b) = 0$ or $1$ with order corresponding to the $2^{I+J}$ ordered $(y, z)$ values. Thus, we have the representations $\mathbf{M} = \mathbf{B}\mathbf{N}$ and
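The notation of Section 2.1 can be illustrated with a small numerical sketch. The Python code below is not from the paper: the toy responses, the weights, and the helper names (`joint_index`, `weighted_joint_counts`, `marginal_selector`) are hypothetical and chosen only for illustration. It builds the weighted joint-count vector in the lexicographic order described above, forms one selector row vector of 0s and 1s, and checks that the resulting marginal estimate agrees with the direct weighted sum.

```python
import numpy as np

def joint_index(y, z):
    """0-based position of (y, z) when combinations are ordered as binary numbers."""
    bits = "".join(str(int(v)) for v in list(y) + list(z))
    return int(bits, 2)

def weighted_joint_counts(Y, Z, w):
    """Weighted estimate of the total for each of the 2^(I+J) (y, z) combinations."""
    n, I = Y.shape
    J = Z.shape[1]
    N_hat = np.zeros(2 ** (I + J))
    for u in range(n):
        N_hat[joint_index(Y[u], Z[u])] += w[u]
    return N_hat

def marginal_selector(I, J, i, j, a, b):
    """Row vector with 1 where y_i = a and z_j = b, else 0 (i and j are 0-based here)."""
    K = 2 ** (I + J)
    sel = np.zeros(K)
    for k in range(K):
        bits = [int(c) for c in format(k, f"0{I + J}b")]
        if bits[i] == a and bits[I + j] == b:
            sel[k] = 1.0
    return sel

# Hypothetical toy sample: I = 2 items for Y, J = 2 items for Z, n = 6 units.
Y = np.array([[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 0]])
Z = np.array([[0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [1, 0]])
w = np.array([10.0, 20.0, 30.0, 15.0, 5.0, 20.0])  # assumed survey weights

N_hat = weighted_joint_counts(Y, Z, w)   # 16 x 1 vector of weighted joint counts
N_hat_total = w.sum()                    # scalar estimate of the population size

# Marginal estimate for (y_1 = 1, z_2 = 1), computed two ways.
sel = marginal_selector(I=2, J=2, i=0, j=1, a=1, b=1)
M_hat = sel @ N_hat
M_hat_direct = w[(Y[:, 0] == 1) & (Z[:, 1] == 1)].sum()
P_hat = M_hat / N_hat_total

print(M_hat, M_hat_direct, P_hat)
```

The explicit loops are used for readability rather than speed. Because the code is 0-based, element k of the paper's vector corresponds to array index k - 1.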