Statistical Knowledge for Teaching Categorical Association

Excerpt of draft Categorical Association mini-Module. Copyright 2017 Casey and Ross

INTRODUCTION

This draft material was meant to be supplemental to introductory statistics textbooks, whether they are algebra-based or calculus-based. So, it does not attempt to teach the mechanics of formal hypothesis testing (e.g., Chi-Squared computations). Please consult the many free online introductory statistics books for that material (see the References for links). Future versions of this material will include more intro-stats material rather than being supplemental.

This material is designed to address Statistical Knowledge for Teaching regarding categorical association. This includes two main components: Subject Matter Knowledge and Pedagogical Content Knowledge. The first involves material that is typically contained in ordinary statistics courses, but algebra-based courses tend to skip some parts, and calculus-based courses tend to skip other parts, so here we present a unified (but non-calculus) view appropriate for future teachers. Pedagogical content knowledge is what teachers specifically need to know. In this material, you will be discussing common student misconceptions, responding to students’ work, constructing lessons that use key pedagogical ideas, and considering standards from the Common Core State Standards for Mathematics (CCSS-M) and the Advanced Placement Statistics (college-level introductory statistics course offered at high schools) curriculum (see page 31 for the standards).
TABLE OF CONTENTS (for entire mini-module; those in this excerpt bolded)

Overview
Activity 1: Table Construction
Two Big Questions
Activity 2: Joint, Marginal, and Conditional Relative Frequencies
Joint Relative Frequencies
Marginal Relative Frequencies
Conditional Relative Frequencies
Independence
Activity 3: Analyzing student thinking
Activity 4: Graphing and EDA
Comparative Bar Charts
Segmented Bar Graphs
Other plots you might see: Mosaic/Ribbon Charts
Other plots you might see: Pie Charts
Activity 5: Association versus Causation
Activity 6: Chi-square analysis thoughts
A Simpler Explanation
Activity 7: Curriculum Standards: CCSS versus AP Statistics
Activity 8: Randomization
References
Free Online Statistics Books
Bibliography
Homework

Activity 3: Analyzing student thinking

Do the following two tasks:

TASK 1 (Smoking): In a medical center 250 people have been observed in order to determine whether the habit of smoking has some relationship with bronchial disease. The following results have been obtained:

             Bronchial disease   No bronchial disease   Total
Smoke               90                   60              150
Not smoke           60                   40              100
Total              150                  100              250

Question 3-a: For this sample of people, is bronchial disease associated with smoking? Explain your answer.

TASK 2 (Drug): We are interested in assessing if a certain drug produces digestive troubles in old people. For a sufficient period, 25 old people have been studied, and these results have been obtained:

             Digestive troubles   No digestive troubles   Total
Drug taken           9                     8                17
No drug              7                     1                 8
Total               16                     9                25

Question 3-b: Using the information in this table, for this sample of old people, is digestive trouble associated with taking the drug?

What follows are some sample student responses to these tasks that illustrate misconceptions students have when doing tasks like these.
These are real student responses to these tasks, or to ones like them, drawn from research studies done by statistics educators.

Set 1

David (Smoking task): Yes, there is an association. Although both the percentage of smokers with bronchial disease and the percentage of non-smokers with bronchial disease are 60%, there are more smokers with bronchial disease than nonsmokers.

Nathan (Smoking task): Ummm wouldn’t it make more sense though that they would have to interview the same amount of people because if they did then they would have more yeses and more nos for the not-smoking population.

Question 3-c: With your partner, discuss and document:
1) What misconceptions do these students have?
2) How are their misconceptions related? That is, what prerequisite knowledge do these students appear to be lacking?
3) Your response to this pair of students as their teacher. Be specific in your description of your response, writing exactly what you would say and/or drawing anything you would use in your response.

Set 2: [Deterministic misconception; removed for this excerpt]
Set 3: [Unidirectional misconception; removed for this excerpt]
Set 4: [Localist misconception; removed for this excerpt]
Set 5: [Ignoring Data, Using Only Preconceived Notions; removed for this excerpt]

In summary, these are the common misconceptions students have when analyzing categorical data for associations:
Lack of proportional reasoning
Deterministic
Unidirectional
Localist
Use of only intuition and ignoring the data

Learning about these common misconceptions regarding categorical association is meant to better prepare you to teach this topic: you can anticipate student issues and incorporate that knowledge into your planning for instruction.
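The proportional reasoning at issue in Set 1 can be made concrete by converting each row of the Task 1 and Task 2 tables into conditional relative frequencies. The following is a minimal sketch, not part of the original module; the helper name `row_conditionals` and the dictionary layout are illustrative choices, and the counts are copied from the two task tables.

```python
from fractions import Fraction


def row_conditionals(table):
    """Divide each count by its row total (hypothetical helper for illustration)."""
    return {
        row: {col: Fraction(count, sum(cells.values())) for col, count in cells.items()}
        for row, cells in table.items()
    }


# Counts copied from the Task 1 (Smoking) and Task 2 (Drug) tables.
smoking = {
    "Smoke": {"Bronchial disease": 90, "No bronchial disease": 60},
    "Not smoke": {"Bronchial disease": 60, "No bronchial disease": 40},
}
drug = {
    "Drug taken": {"Digestive troubles": 9, "No digestive troubles": 8},
    "No drug": {"Digestive troubles": 7, "No digestive troubles": 1},
}

for name, table in (("Smoking", smoking), ("Drug", drug)):
    print(name)
    for row, cells in row_conditionals(table).items():
        shares = ", ".join(f"{col}: {float(p):.1%}" for col, p in cells.items())
        print(f"  {row}: {shares}")
```

For the Smoking table, both rows come out at the same 60% that David cites; the raw counts differ only because the two groups have different sizes. For the Drug table, the two rows give different percentages, which is the comparison Question 3-b asks you to interpret.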
Other pedagogical practices that research has found help students learn categorical association are:
Presenting all of the data simultaneously rather than one case at a time
Working with data from meaningful contexts
Using a small number of data points
Telling students that inverse association is possible, and it’s still association
Including examples of inverse association and no association

Activity 4: Graphing and EDA

Now we turn to the various ways to graph contingency tables, and ways of doing exploratory data analysis (EDA).

Comparative Bar Charts

Consider a generic survey with two questions; each person answers either Pro (in favor), Undecided, or Con (against) to question 1 and question 2 [note: we will be updating this example to make it less generic, more interesting]. Suppose we get the following frequency table:

Table: Two polling questions

          Pro2   Und2   Con2   Totals
Pro1       105     15     30      150
Und1        42      6     12       60
Con1        63      9     18       90
Totals     210     30     60      300

In a previous activity, we found that it shows perfect independence. Now let’s consider how to graph it. A graph that many people think to start with uses bars to show each cell’s count. This is called a side-by-side bar graph or a comparative bar chart. There are two ways to group the bars:

[Figures 4-A and 4-B: two charts, each titled “Two Survey Questions: Comparative Bar Chart”, with Frequency (0 to 120) on the vertical axis. The left chart groups the bars by Pro2, Und2, Con2, with one bar each for Pro1, Und1, Con1 in every group. The right chart groups the bars by Pro1, Und1, Con1, with one bar each for Pro2, Und2, Con2 in every group.]

Both of these suffer the same problem, though: since there were far more people in the Pro1 category than Und1 or Con1 (150 versus 60 or 90), we expect to see taller bars for Pro1 regardless of any association between the variables.
With these graphs of frequency, we can’t compare bar heights without also mentally accounting for the number of people in each category.

Question 4-a: Give at least three ways you can tell that the graphs above aren’t showing relative frequencies.

Question 4-b: If a student said “the Con1 bar for Pro2 is much taller than the Con1 bar for Und2, so the variables are associated because Con1 response varies with their position on question 2”, how would you respond?

Instead of graphing the simple frequencies, we must compute relative frequencies based on the number of people in each row (or each column) to get graphs we can easily interpret. That is, we need to graph the conditional relative frequencies instead of the joint relative frequencies. We can use either row- or column-based conditionals, and we can group them in two ways, which means we have 4 possible plots. First, consider these two:

[Figures 4-C and 4-D: two charts, each titled “Two Survey Questions: Comparative Bar Chart”, with Relative Frequency (0% to 80%) on the vertical axis. The left chart groups the bars by Pro2, Und2, Con2, with one bar each for Pro1, Und1, Con1 in every group. The right chart groups the bars by Pro1, Und1, Con1, with one bar each for Pro2, Und2, Con2 in every group.]

Question 4-c: Are these showing us the conditional relative frequencies of Question 1 given Question 2, or vice versa? Explain. Hint: what adds to 100%?

The graph on the left shows perfect independence between the questions because the chance of Pro2 does not change (70%) regardless of whether the person was Pro1, Und1, or Con1, and similarly for the other positions (Und2, Con2) on question 2. That is, for independence, we are looking for the bars in each group to be the same height in this type of graph. If the bar heights in a group are not the same, we have evidence of statistical association.
Notice that this might feel backwards: in non-technical language, people who are associated with each other have something in common (for example, labor unions or professional organizations are often called Associations), but here association is shown by bar heights that differ depending on how people answered question 1 in the poll.

The graph on the right shows perfect independence between the questions because the relative frequencies of answers to question 2 do not change from one group of bars to another. That is, the relative frequencies do not depend on what someone answered on question 1. Here we are looking at the height of the leftmost bar in each group and noting that they are all the same; similarly, the middle bars in each group all have the same height, and so do the rightmost bars.
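The bar heights discussed for these two graphs can be reproduced directly from the "Two polling questions" table. The sketch below is an illustration rather than part of the original module: the variable names are assumptions, and it computes only the conditional relative frequencies of question 2 given question 1 (counts divided by row totals), then checks whether every row gives the same distribution.

```python
from fractions import Fraction

# Counts copied from the "Two polling questions" table.
rows = ["Pro1", "Und1", "Con1"]
cols = ["Pro2", "Und2", "Con2"]
counts = [
    [105, 15, 30],
    [42, 6, 12],
    [63, 9, 18],
]

row_totals = [sum(r) for r in counts]
col_totals = [sum(c) for c in zip(*counts)]

# Conditional relative frequencies of question 2 given question 1:
# each count divided by its row total, so each row sums to 1.
q2_given_q1 = [
    [Fraction(n, row_totals[i]) for n in row] for i, row in enumerate(counts)
]

for label, dist in zip(rows, q2_given_q1):
    shares = ", ".join(f"{c}: {float(p):.0%}" for c, p in zip(cols, dist))
    print(f"{label}: {shares}")

# Perfect independence in the sample: every row has the same distribution.
perfectly_independent = all(dist == q2_given_q1[0] for dist in q2_given_q1)
print("Same distribution in every row:", perfectly_independent)
```

Each row works out to 70%, 10%, and 20%, so bars for the same question 2 answer have equal heights across the question 1 groups. Exact fractions are used so that the equality check is not affected by floating-point rounding; with real survey data, exact equality would be rare, and the comparison would be a matter of degree.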
