Univariate Analyses Can Be Used for Which of the Following
Chapter Coverage and Supplementary Lecture Material: Chapter 17

Please replace the hand-out in class with a print-out or on-line reading of this material. There were typos and a few errors in that hand-out. This covers (and then some) the material in pages 275-278 of Chapter 17, which is all you are responsible for at this time.

There are a number of important concepts in Chapter 17 which need to be understood in order to proceed further with the exercises and to prepare for the data analysis we will do on data collected in our group research projects. I want to go over a number of them, and will be sure to cover those items that will be on the quiz.

First, at the outset of the chapter, the authors explain the difference between quantitative analysis which is descriptive and that which is inferential. They then explain the difference between univariate analysis, bivariate analysis and multivariate analysis.

Descriptive data analysis, the authors point out on page 275, involves focusing on the data collected per se, which describes what Ragin and Zaret refer to as the object of the research: what has been studied (objectively, one hopes). This distinction between the object of research and the subject of research is an important one, and will help us understand the distinction between descriptive and inferential data analysis. As I argue in the paper I have written: “In thinking about the topic of a research project, it is helpful to distinguish between the object and subject of research. Ragin and Zaret distinguished between the object of research, which are the observational units, and the subject of research, such as relationships among variables (Ragin and Zaret 1983), the nature of a social mechanism, or some other subject.” (Dover, Michael. 2006. “Teaching Yourself How to Write a Thesis: Several Easy Steps,” article submitted to Teaching Sociology, citing Ragin, Charles and David Zaret. 1983. "Theory and Method in Comparative Research: Two Strategies."
Social Forces 61:731-53.)

                            Univariate        Bivariate               Multivariate
                            (1 variable)      (relationship between   (relationship between
                                              2 variables)            3 or more variables)

Descriptive                 Frequencies       To Be Discussed         To Be Discussed
(characteristics of a       Mode
sample itself or of         Median
a population)               Mean

Inferential                 To Be Discussed   To Be Discussed         To Be Discussed
(inference from a sample
to a larger population)

As the text points out (p. 275), “Descriptive analysis does not provide a basis for generalizing beyond our particular study or sample.” In other words, descriptive analysis doesn’t permit coming to conclusions based upon the empirical results about anything other than the object of the research; it doesn’t permit inferring anything about a larger population, even if the sample studied was a random sample of that larger population. Doing so requires the use of inferential data analysis, which is covered in the second half of this chapter, after it covers descriptive data analysis.

The authors evade the question of whether relationships between variables can be studied in descriptive data analysis, although they imply an answer. They do point out, “Even when we describe relationships between variables in our study, that alone does not provide sufficient grounds for inferring those relationships exist in general or have any theoretical meaning.” Implied here is that one can in fact discuss relationships among variables, but because most social science research focuses on efforts to generalize to larger populations, mere descriptive research is often considered of lesser scientific importance. In social work research, however, it is often the case that the population of interest is, for instance, the clients of an agency, or the workers within an agency, or a defined neighborhood area. Important research can be done that is descriptive about such a population, even if one can’t infer to larger populations.
In other words, even in descriptive research, the subject of a research project can be relationships between variables among a sample studied, if the sample studied is the actual population of interest. This is an important point not covered by the text, and it is one which is relevant to our research this term. If one is studying an entire population, say all SWK 100 students or all SWK 250 students, one can do data analysis that is descriptive but which permits studying characteristics of that entire population.

One assumption behind inferential statistics is that one is starting with a random sample of a population, such that one can analyze relationships among variables found in the sample and project with some degree of confidence that the same findings would apply to the larger population. It is generally agreed that the larger a random sample, the better, resources permitting. In fact, the larger the random sample, the smaller the sampling error associated with research.

But think about that for a moment. What if one took a 99% random sample, i.e. one drew from a population a sample that included all but 1% of the population? Could one infer results from a “descriptive” data analysis of that sample that applied to the entire population? For instance, say a strong relationship was found between gender and belief in the importance of HIV testing, and that this was found at the .01 level of statistical significance. This means that, if there were no such relationship in the population, a relationship this strong would turn up in fewer than 1% of random samples drawn from it. Such an interpretation is made irrespective of the size of the random sample! But, thinking logically, one could communicate such a finding in, say, a report to respondents or a press release with greater confidence if there was a large random sample, correct? So what if the random sample were 99% of the population? Still true, presumably.
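The claim that sampling error shrinks as a random sample takes in more of the population can be illustrated with a small sketch. It uses the standard textbook formula for the standard error of a sample mean under simple random sampling without replacement, including the finite population correction. That formula is not part of the Chapter 17 material, and the function name, population size and standard deviation below are hypothetical values chosen only for illustration.

```python
import math


def standard_error_of_mean(sd, n, population_size):
    """Standard error of a sample mean for a simple random sample drawn
    without replacement: (sd / sqrt(n)) * sqrt((N - n) / (N - 1)).

    sd is the population standard deviation, n the sample size and
    population_size (N) the number of members in the population.
    """
    if not 1 <= n <= population_size:
        raise ValueError("n must be between 1 and population_size")
    if population_size == 1:
        return 0.0  # a population of one has no sampling error
    finite_population_correction = math.sqrt(
        (population_size - n) / (population_size - 1)
    )
    return sd / math.sqrt(n) * finite_population_correction


# Hypothetical population: 1,000 members, standard deviation of 10.
POPULATION_SIZE = 1000
POPULATION_SD = 10.0

for sample_size in (50, 500, 990, 1000):
    se = standard_error_of_mean(POPULATION_SD, sample_size, POPULATION_SIZE)
    print(f"n = {sample_size:>4}: standard error = {se:.4f}")
```

Under these assumptions the standard error falls as the sample grows and reaches exactly zero when the sample is the whole population, which is one way of seeing why a complete count leaves no sampling error to make inferences about.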
But what if the entire population was studied, with a 100% response rate, in other words, all possible members of that population responded? Still, the relationship between gender and belief in the importance of HIV testing could be reported. Therefore, at the level of a population being studied as a population, the distinction between descriptive and inferential data analysis disappears. Accordingly, there is nothing magical about the distinction between descriptive and inferential data analysis. And there is nothing “inferior” about descriptive data analysis compared to inferential data analysis.

I might add, I disagree with the authors that descriptive statistics can’t be of theoretical relevance. As I argue: “As such, my dissertation was a theoretically relevant exploratory and descriptive study. One lesson here is that theory is relevant not only to explanatory studies but to descriptive studies as well.” (Dover, 2006).

Descriptive Univariate Analysis

First, let’s discuss univariate analysis, the analysis of characteristics of a single variable, such as the distribution of values, the dispersion of values, etc. What is univariate analysis used for? Let’s take the example of age in our dataset. Univariate analysis of age can help us understand the age distribution of the sample. It can’t tell us anything about the relationship between age and another variable, say education; that would be bivariate analysis. It can’t tell us anything about the relationship between age and education, controlling for health; that would be multivariate analysis.

Univariate analysis - a single variable
Bivariate analysis - two variables
Multivariate analysis - more than 2 variables

What are some of the key types of univariate analysis? Three types of univariate analysis are referred to as ways of measuring the central tendency, and these include mode, median and mean. But before discussing these, let’s be sure that we understand the concept of a frequency distribution.
When one does a Histogram in Excel, as is done in Exercise 2, one is producing a frequency distribution that is expressed in the table that accompanies the Histogram chart. For each value of the variable (to be more specific, for each value which is a code assigned to an attribute of the variable, as well as for each value representing missing data of some kind, if there is such missing data), the frequency of that value is found. For an interval variable (definition below), there may be dozens or even hundreds of values found in a frequency distribution, but for ordinal or nominal variables, there are typically only a few or up to around a dozen such values. For them, a table such as that accompanying the histogram is a good way to display a frequency distribution. A frequency distribution, then, is an excellent way to do a univariate analysis of the characteristics of a single variable.

Another way to engage in a univariate analysis is to use a statistic or a parameter to characterize that variable. Note that the authors don’t use the term statistic! In fact, one can read entire textbooks on statistics and find the term statistic isn’t defined. Indeed, of the leading four dictionaries of sociology, only one defines the term statistic: “A mathematical value that summarizes the characteristics of a sample.” (George A. Theodorson and Achilles G. Theodorson, 1969, Modern Dictionary of Sociology). The mode, median and mean are three types of statistics (when applied to a sample) and are three forms of what are known as population parameters when applied to a population. Seen this way, statistics are hopefully less intimidating: they are merely ways of understanding something about a sample of a population or about a population itself.

There are three common measures of central tendency: the mode, the median and the mean.
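As a rough illustration of a frequency distribution and of statistics that summarize a single variable, here is a minimal Python sketch. The course exercises use Excel, so this is only a parallel example, not the method of Exercise 2. The ages, the missing-data code 999 and the function name are all hypothetical.

```python
from collections import Counter
import statistics

MISSING_CODE = 999  # hypothetical code standing for "age not reported"

# Hypothetical ages for ten respondents, one of whom did not report an age.
ages = [19, 20, 20, 21, 22, 20, 35, 999, 21, 19]


def frequency_distribution(values):
    """Return (value, frequency, percent) rows, one per distinct value,
    including any value that is a missing-data code."""
    counts = Counter(values)
    total = len(values)
    return [
        (value, counts[value], 100 * counts[value] / total)
        for value in sorted(counts)
    ]


# Print the frequency table, labelling the missing-data code.
for value, frequency, percent in frequency_distribution(ages):
    label = "missing" if value == MISSING_CODE else str(value)
    print(f"{label:>7}  {frequency:>2}  {percent:5.1f}%")

# Drop missing data before computing the measures of central tendency.
valid_ages = [age for age in ages if age != MISSING_CODE]
summary = {
    "mode": statistics.mode(valid_ages),
    "median": statistics.median(valid_ages),
    "mean": statistics.mean(valid_ages),
}
print(summary)
```

The missing-data code is counted in the frequency table, as described above, but is excluded before the mode, median and mean are computed. Otherwise the value 999 would be treated as a real age and would distort the mean.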
The Mode: The mode is the most important measure of central tendency for nominal variables, where mean and median are meaningless, because nominal variables aren’t rank ordered. The mode value, or modal value, of a single variable is simply the most frequent attribute of that variable.
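Finding the modal value of a nominal variable can be sketched in a few lines of Python. The variable (a student's major) and the responses are hypothetical, and the sketch only counts attributes, since nothing else is meaningful for data without a rank order.

```python
from collections import Counter
import statistics

# Hypothetical nominal variable: each respondent's major.
majors = [
    "social work", "psychology", "social work",
    "nursing", "social work", "psychology",
]

# Count how often each attribute occurs and take the most frequent one.
counts = Counter(majors)
modal_value, modal_count = counts.most_common(1)[0]
print(f"Mode: {modal_value} ({modal_count} of {len(majors)} respondents)")

# statistics.multimode returns every attribute tied for most frequent,
# which matters when a variable has more than one mode.
all_modes = statistics.multimode(majors)
print(all_modes)
```

When two or more attributes are tied for most frequent, the variable has more than one mode, and reporting only the first one found would hide the tie. That is why the sketch also lists every mode.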