
Student’s Guide

Volume 2: Quantitative Analysis and Statistics Classes

Prepared by S. Clayton Palmer, Instructor, Weber State University

J. Lon Carlson, Associate Professor, Department of Economics

Illinois State University

www.HarperAcademic.com

Preface

Introduction

Welcome to the Student’s Guide to SuperFreakonomics! SuperFreakonomics drives home the point that there is no argument without a statistical analysis of the data. The conclusions in the book have been drawn from the data—using the statistical methods you are being introduced to in your classes.

Many students view statistical methods and statistical inference as difficult, if not impossible, to master. However, this perception is often based on the notion that statistics is not applicable to anything that real people deal with. As an old college song says, “We’re working our way through college, to get a lot of knowledge that we’ll probably never, ever, use again.”1 SuperFreakonomics dispels this myth. What we learn from Levitt and Dubner is that when you understand the data, you understand the world!

As you read SuperFreakonomics, you will see the real-world applications of statistical concepts in specific situations. But there are no explanations of the statistical methods used. The purpose of this guide is to help you understand the analyses presented in SuperFreakonomics by providing a “bridge” between the material covered in an introductory course in statistics and the engaging topics covered in what we consider to be one of the most fascinating books in economics.

There is one other task we attempt to accomplish in this guide. The authors of SuperFreakonomics show you what it’s like to think like a statistician. The material we present here is intended to make the job even easier for you.

Organization of the Student’s Guide

We organized the material in this guide to help you identify the key points in each chapter and to ensure that you have a firm grasp of the key concepts presented in the book. The first section of each chapter in this guide consists of an overview that highlights the major topics and points presented in the book. The overview is designed to alert you to the major topics and is not intended to serve, in any way, as a substitute for the material in the text.

The second section of each chapter highlights key statistical concepts and methods that are addressed in the corresponding chapter of SuperFreakonomics. The purpose of this section is to alert you to the major factors that affect the relationships being illustrated. These concepts and methods are not organized the same way your statistics text is organized; simple concepts and more complex methods show up in the same chapters of SuperFreakonomics, and you will learn about simple statistical techniques and more complex methods in all of the chapters. We’ve labeled them according to the subjects in your statistics textbook so that you can slide more easily from SuperFreakonomics to your textbook and vice versa.

1. “We’re Working Our Way Through College.” Music and lyrics by Johnny Mercer and Richard Whiting. From the movie Varsity Show (Warner Bros., 1937).

The third section of each chapter consists of a list of what we have termed “core competencies.” How well you are able to respond to each of the questions listed in this section will be a strong indicator of the extent to which you understand the material presented in the book.

Using the Student Guide

When using the student guide, remember that the overview and discussions of terms are not substitutes for reading SuperFreakonomics. Instead they are designed to flag key topics and alert you to specific items you may have missed. When completing the “core competency” questions, we recommend that you avoid using your book or notes to answer the questions on your initial run through. Instead, use the questions as a way to flag topics you have not yet mastered. When you cannot answer a specific question, or you answer it incorrectly, you should take that as a clue: go back and devote more time to the topic.

One More Helpful Thought

This guide was prepared for students taking a class in quantitative analysis or statistics. Another guide has been prepared for students enrolled in an economics class. It contains descriptions of how the topics of SuperFreakonomics relate to standard economic concepts and provides a “bridge” between this book and a traditionally organized economics text. Depending on the class you are taking and the way your instructor has organized the curriculum, you may find the economics-oriented student guide helpful.

Chapter 1

How Is a Street Prostitute Like a Department-Store Santa?

Summary

Levitt and Dubner begin Chapter 1 by describing some of the ways women have been abused and discriminated against over time. Then, they point out that although conditions in certain countries have improved dramatically over the last few decades, women still suffer the effects of discrimination. This is especially true in the labor market, where a significant wage differential between men and women continues to persist despite increased educational opportunities and legislative initiatives such as Title IX. This leads to consideration of the labor market in which women clearly dominate the supply side—the market for prostitution.

We know that the topic of prostitution may make some readers uncomfortable. The discomfort factor notwithstanding, however, this topic demonstrates how economic analysis can provide an objective assessment of at least some of the benefits and costs of a profession that provokes a vast range of responses from outside observers. Remember to focus on the data!

The authors begin their analysis of the market for prostitution by describing what the market was like at the turn of the century in Chicago, and how the subsequent criminalization of this activity affected wages and working conditions over time. Information on relative wages in absolute and current dollars suggests why so many women would consider working as a prostitute. Focusing on the upscale Everleigh Club, the authors also show how manipulation of supply and demand through product differentiation—supply a service that entails greater marginal costs and target that segment of consumers with greater willingness to pay—could result in a price much higher than the broader market equilibrium. They then provide useful insights regarding how society’s objectives might be best met when it comes to enforcement of certain laws, i.e., it is more effective to focus on demand than supply if certain activities are going to be effectively curtailed.

The next part of the chapter focuses on information gleaned from data collection efforts by Sudhir Venkatesh that shed considerable light on the current conditions in the market for prostitution in Chicago, ranging from wages and prices for various services rendered to the effects of relying on marketers (e.g., pimps) to sell one’s products or services. At this point, the authors also demonstrate that what may be true in one market—pimps tend to improve market outcomes for prostitutes—may not hold elsewhere, e.g., procuring the services of a realtor does not necessarily leave the seller of a house better off.
This portion of the chapter concludes with an examination of the role the police play in controlling prostitution and how the solutions they devise can be at odds with the objectives of policymakers, i.e., the principal-agent problem.

Attention then turns to consideration of how the increase in the range of opportunities for educated women to work outside of teaching has simultaneously increased their average pay and disadvantaged school children. The latter is simply a result of a decrease in the average level of ability of people entering the labor market for teachers. The authors then consider several possible explanations for the wage gap between men and women that continues to exist even after the increase in the availability of higher paying jobs for women. The final section tells the story of a woman who decided to become a prostitute on her own terms, and how she used sound marketing strategies and economic principles to achieve a considerable level of financial success.

Basic Statistical Concepts

1. Descriptive Statistics. This chapter provides numerous descriptors of data sets. Some examples are: the wages earned by street prostitutes; the wages earned among Americans generally, accounting for differences in gender; and the prices charged for various sexual acts performed by street prostitutes. You should use this chapter as a guide to the various data types, how surveys are designed, how data are gathered, and how the data are analyzed and presented.
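The descriptive measures this chapter relies on (means, ranges, frequencies) can all be computed with Python’s standard library. The wage figures below are invented purely for illustration; they are not the book’s data.

```python
# Basic descriptive statistics on a made-up sample of hourly wages.
import statistics

wages = [25, 30, 30, 35, 40, 45, 50, 50, 50, 75]

print("mean:  ", statistics.mean(wages))            # arithmetic average
print("median:", statistics.median(wages))          # middle value
print("mode:  ", statistics.mode(wages))            # most frequent value
print("stdev: ", round(statistics.stdev(wages), 2)) # sample standard deviation
print("range: ", max(wages) - min(wages))           # spread of the data
```

Note how the mean (43.0) sits above the median (42.5) because the single high value pulls the average upward; comparing the two is itself a useful descriptive exercise.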

Types of Data. Most of the examples cited above are cross-sectional data. This is in contrast to time-series data. The description of how the mean wage of a prostitute has changed over time is an example of time-series data. Make certain to note the difference between these two types of data. In addition, pay attention to what information is qualitative. Among the examples of qualitative data, you will find nominal data—the street names of prostitutes and the locations where sexual services are rendered are examples.

Data Gathering Techniques. How were the data on prostitution gathered? Sudhir Venkatesh collected data from prostitutes regarding sexual services rendered as well as data on a number of variables associated with the sex trade in South Chicago. Recognizing that traditional methods of gathering data regarding prostitution have not produced reliable results, he opted for a non-conventional method. He used “trackers.” These trackers were people who stood on street corners or sat in brothels with the prostitutes themselves. The trackers collected data from the prostitutes as soon as the customer was gone. The prostitutes were paid to provide the data to the tracker. Most of the trackers were themselves former prostitutes.

Note that the collection of these data began with designing a survey: a careful consideration of what questions would be asked and how they would be asked. The next task was to identify the desired sample size and how the participants of the survey would be chosen. Among other considerations, Venkatesh chose two main areas. These two areas differed by one variable: the use of pimps. This way, even though Venkatesh was engaged in an observational study, he had the rough equivalent of a natural experiment in which one data set differed from the other on the basis of this one variable.

Data Gathering Complications. Gathering data related to a sensitive and morally provocative market like the sex trade can be difficult.
It’s a tricky business, and the data gathered won’t always be reliable. Notice that Venkatesh decided that, in order to elicit cooperation and honest responses from the prostitutes surveyed, a properly credentialed tracker would be a former prostitute. Moreover, it was often necessary to pay for the prostitutes’ cooperation in answering the survey questions. The data gathering difficulties encountered by Venkatesh indicate that analyses of an illicit trade should be viewed with suspicion. The gathering of credible and reliable data will be difficult, expensive, and time consuming.

Experimental vs. Observational Data. The data set on the South Chicago sex trade collected by Venkatesh is an example of observational data gathering. An experimental approach to gathering data involves carefully controlling the variables of interest. In observational data gathering, there is no attempt to control the variables. A survey is an oft-used method of gathering observational or non-experimental data.

Sampling Techniques: Simple Random Sampling vs. Convenience Sampling. The data on the South Chicago sex trade collected by Venkatesh came from 160 prostitutes in three South Chicago locations over a two-year period. He collected data on 13 variables from 2,200 sexual transactions.
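Drawing a simple random sample is mechanical once the population can be enumerated. The sketch below uses a hypothetical numbered population of 1,000; everything here is invented for illustration.

```python
# Simple random sampling: number every member of a (hypothetical) population,
# then let a random number generator choose which members enter the sample.
import random

population = list(range(1, 1001))  # pretend we could number all 1,000 members
random.seed(42)                    # fixed seed so the sketch is reproducible

# random.sample draws without replacement; every size-30 subset is equally likely.
sample = random.sample(population, 30)

print(len(sample), "distinct members drawn")
```

For an illicit market, the practical obstacle is not this mechanical step but the roster itself: there is no enumerated population to number in the first place.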

Techniques regarding simple random samples described in statistics textbooks state that, within any population of N elements, any possible sample of size n has an equal probability of being selected. There are a variety of ways to ensure a random sample. One is to assign a number to each member of a population, generate a set of n random numbers, and include in the sample only those members selected by the random number generator. Given the illicit nature of prostitution, it would be extremely difficult, if not impossible, to determine the number of prostitutes in South Chicago, i.e., the population size, let alone locate and survey only those prostitutes selected by a random number generator.

You might say that the data gathered are an example of convenience sampling. Your statistics text describes the different varieties of sampling; convenience sampling is a nonprobability sampling technique. It is often said that, with convenience sampling, it is impossible to evaluate the goodness of the sample in terms of representing a population, and that the calculation of sample statistics, e.g., the sample mean and standard deviation, is therefore a useless exercise. The other side of this argument is that information from a convenience sample can be used to represent population parameters so long as those who are not sampled do not differ in any systematic way from those who are the subject of the sample. Therefore, the relevant question becomes: Are these 160 prostitutes, and the services they provide, systematically different from the population of prostitutes in South Chicago? If not, the information gathered can be used just as the information gathered from a simple random sample or a census of prostitutes in two main areas of South Chicago would be.

2. Normative vs. Positive Analysis. Positive analysis relates to what is.
“Prostitutes who ply their trade under the management of a pimp typically earn higher hourly wages” is an example of a positive statement and can be the target of positive statistical analysis. Normative analysis relates to what ought to be; it is not statistical analysis.

If your statistics text includes any information at all on the difference between positive and normative analysis, it’s probably in the first chapter. Yet the temptation, even for a statistician, to engage in moralizing is strong. Statistical analysis is, at its heart, heartless. It does not account for or consider the moral aspects of the subject matter or consider how life should be. It merely analyzes how life is. This chapter, about prostitution, draws conclusions from the data without judgment. This is how statisticians must practice their craft.

3. Hypothesis Testing. In this chapter, Levitt and Dubner compare the average wage of prostitutes who use a pimp to those who market and manage themselves. In addition to a comparison of wages, the analysis includes a comparison of the number of incidents of assault by a client and the number of times a prostitute was forced to give a “freebie” to a gang member.

These are cases in which a statistician gathers data on important variables and compares the mean of one sample against the mean of another. He or she will attempt to infer whether the differences in the point estimates from these samples can be used to infer a difference in the respective populations. In other words, is the difference in mean hourly wage noted from the sample data evidence of the existence of “pimpact,” or is it merely a coincidence associated with random sampling? The statistician will make this decision based on statistical significance. This is accomplished through hypothesis testing, which begins by stating a null and alternative hypothesis. In the prostitution case, the hypotheses are as follows:

H0: µ1 = µ2
Ha: µ1 ≠ µ2

where µ1 is the population mean hourly wage of prostitutes without pimp managers, and µ2 is the population mean hourly wage of prostitutes with pimp managers.

The next steps are in your textbook in the chapter on hypothesis testing: 1) specify the level of significance, 2) compute the value of the test statistic, 3) determine the “rejection rule,” and 4) use the value of the test statistic and the rejection rule to decide whether to reject H0. Hypothesis testing requires some practice; there are simple examples in every statistics text.

More Hypothesis Testing. In addition to the question of prostitutes and pimps, the value of real estate agents is examined. Two samples are examined: the first consists of houses made available for sale by the homeowner using FSBOMadison.com; the second consists of homes sold using a real estate agent. The mean sales prices of the houses in the two samples are not statistically different. Testing the null hypothesis, the conclusion can be reached that the mean sales price of a house in one sample doesn’t differ from the mean sales price in the other sample. For the sake of practice, pull out a piece of paper and write the null and alternative hypotheses for yourself.

4. Sampling: Mendacity among Respondents. Another problem related to conducting surveys is the truthfulness of the answers provided by survey participants. It is now clearly understood that—in some cases—survey respondents will exaggerate claims, distort information, and outright lie to surveyors. This is particularly the case for surveys regarding sensitive subjects such as drug abuse and sex. If there is some reason to believe that individuals responding to a survey are not entirely truthful, the survey results fall under a cloud of suspicion and provide less-than-reliable results. This chapter of SuperFreakonomics describes a Mexican welfare program entitled Oportunidades. Here, economists studying the clients of this program found that the value of personal assets was routinely underreported.
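The four hypothesis-testing steps for comparing two sample means can be sketched end to end in Python. The wage numbers below are invented (the book’s actual figures are not reproduced), and the large-sample critical value 1.96 stands in for a textbook t table.

```python
# Two-sample test of H0: mu1 = mu2 on invented hourly-wage data.
import math
import statistics

no_pimp = [22, 25, 27, 24, 26, 23, 28, 25, 24, 26] * 4    # hypothetical $/hr
with_pimp = [28, 30, 29, 31, 27, 32, 30, 29, 31, 28] * 4  # hypothetical $/hr

# Step 1: specify the level of significance.
alpha = 0.05                 # two-tailed; large-sample critical value ~1.96

# Step 2: compute the value of the test statistic (unequal-variance form).
m1, m2 = statistics.mean(no_pimp), statistics.mean(with_pimp)
v1, v2 = statistics.variance(no_pimp), statistics.variance(with_pimp)
n1, n2 = len(no_pimp), len(with_pimp)
t = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

# Step 3: state the rejection rule: reject H0 if |t| > 1.96.
# Step 4: apply the rule.
reject = abs(t) > 1.96
print("t =", round(t, 2), "reject H0:", reject)
```

With these made-up numbers, the difference in means is large relative to the sampling variation, so H0 is rejected; the same four steps apply to the real data.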

Core Competencies

If you have read and carefully studied this chapter, you should be able to complete the following tasks. Note: these competencies do not begin with simpler statistical concepts and work toward more difficult ones. Instead, they follow the organization of the book.

1. Describe how men and women have compared over time with respect to such factors as life expectancy, discrimination, and abuse. How has the comparison between the two changed over the past 100 years?

2. Referring to the study by Bertrand, Goldin, and Katz, list and explain the three main variables they identify as the primary causes of the observed wage gap between men and women.

3. How would you use multivariate regression analysis to determine whether the observed wage gap between men and women is a result of sexual discrimination and not of other factors that may differentiate men and women in the labor market?

4. What are the steps in completing a hypothesis test of whether there is a statistically significant difference between two sample means—such as the mean wage of women as compared to men?

5. Compare the average wages earned by women in legal professions, e.g., seamstress, cleaning woman, to those earned by prostitutes in Chicago at the turn of the century.

6. Describe how the revenues earned by the proprietors and the employees of the Everleigh Club compared to the average for prostitutes in Chicago. What factors accounted for this difference?

7. A recognized problem with surveys and questionnaires that gather data about sensitive or morally ambiguous subjects is that the responses by those being questioned are often unreliable. Explain why the method Sudhir Venkatesh used to collect data on prostitution in Chicago is superior to more traditional survey methods, particularly with respect to reliability.

8. Using the example of the data collected on prostitution in Chicago, explain the difference between time-series and cross-sectional data.

9. Explain the difference between normative and positive analysis. How is this distinction useful in studying the sex trade?

10. How does the illicit and illegal nature of prostitution make the collection of data for statistical analysis difficult, expensive and potentially unreliable?

11. Explain the use of “trackers” by Sudhir Venkatesh in gathering data about the sex trade in Chicago. What do you think would be useful qualifications for the job of “tracker”?

12. Using the table presented in SuperFreakonomics on the menu of sex acts and the mean price of each, what graphical device would be useful to visually illustrate this information? Could you create one that would array the information in an informative and visually compelling way?

13. Using the data collected on the sex trade in Chicago, explain the difference between an experimental study and an observational study. Can you think of a way to gather basic data on the sex trade using an experimental approach?

14. Explain the difference between simple random sampling and convenience sampling. Which was used by Sudhir Venkatesh to collect data on the sex trade?

15. How are data from convenience sampling to be used? Under what conditions can standard methods of statistical inference be used with convenience sampling?

16. Explain what Levitt and Dubner mean by pimpact. In addition, describe a method for statistically measuring pimpact.

17. Explain what Levitt and Dubner mean by rimpact. Describe a method for statistically measuring rimpact.

18. How might one use multiple regression analysis to explain what variables determine the price of a house? What independent variables do you think should be included in the regression analysis?

19. Sudhir Venkatesh concludes that the mean hourly wage for a street prostitute who uses a pimp as a manager is higher than for one who doesn’t. What statistical method can be used to determine if this difference is statistically meaningful?

20. What information is needed to complete a hypothesis test of two sample means—such as the mean hourly wage of prostitutes with a pimp and those without?

21. Answer the question posed by the chapter’s title, i.e., how is a street prostitute like a department-store Santa?

Chapter 2

Why Should Suicide Bombers Buy Life Insurance?

Summary

This chapter begins with a description of how certain outcomes a person experiences, e.g., susceptibility to certain illnesses and diseases, or the propensity for success in academics and the sports world, can be tied to characteristics of one’s parents, e.g., their religion, name, or success on the field of play, as well as the individual’s date of conception. The authors then use the fact that what makes a particular boy 800 times more likely to play in the major leagues than a randomly selected one—the boy’s father played in the majors—to segue to the question: Who produces terrorists?

The next part of the chapter focuses on various characteristics of a terrorist, e.g., the family background of the typical terrorist; the distinction between a terrorist and a revolutionary; how the act of terrorism works to achieve the terrorist’s goals, including the public goods aspect of terrorism; the direct and indirect costs of terrorism; and the unintended benefits a terrorist threat can yield, e.g., a reduction in other types of crime due to increased policing activities and reduced illness because people don’t fly as much. This discussion once again highlights how the conventional wisdom—terrorists are poor and uneducated—can be at odds with reality—terrorists tend to

come from well-educated, higher-income families. The discussion of terrorism in turn serves as a springboard to examine the importance of information. The authors focus on how the September 11 attacks in particular, and terrorist acts in general, could easily overwhelm most emergency rooms around the country. The reasons include the design of most emergency rooms and information constraints.

Levitt and Dubner’s discussion of Craig Feied’s efforts to transform emergency care at a Washington area hospital provides an interesting snapshot of the evolution of hospital emergency care over the past 50 years and an excellent illustration of the value of information. Adequate information about a patient’s background is critical to successfully treating that patient in an emergency situation. The problem for Feied and his colleague was how to make that information readily available. The answer was object-oriented programming—and the result was Azyxxi, which Microsoft subsequently purchased from its creators and renamed Amalga. The adoption of Amalga in turn has resulted in substantially improved medical care for patients. It also has produced a massive amount of data which can be used to, among other things, evaluate the relative effectiveness of doctors.

In the next part of the chapter, the authors emphasize the many pitfalls that can be encountered in attempting to answer a question such as: Who are the best and worst doctors in the E.R.? One of the most pressing problems is the lack of correlation between the quality of health care and the death rate of a particular doctor’s patients. After identifying the type of data that are most likely to reflect a doctor’s relative skill, the authors note that, in fact, there is relatively little variation among E.R. doctors with respect to skill. That being said, they also point out that some doctors, e.g., females from top-rated medical schools, appear to perform marginally better than their peers.
Factors that are actually more important from the patient’s perspective include his or her specific ailment, gender, income level, and ability to avoid going to the hospital if at all possible. This leads to a brief discussion of some of the factors that are correlated with longevity, including professional success, religious behavior, and inheritance taxes. This is followed by an examination of the relative ineffectiveness of chemotherapy as a treatment for many forms of cancer and a brief exploration of possible explanations for why chemotherapy nonetheless continues to be used so heavily.

The final section of the chapter explains the chapter’s title by examining those factors that tend to distinguish a terrorist from the broader population. For example, the typical terrorist owns a mobile phone, is a student, and rents his home. Equally important, the typical terrorist does not have a savings account, does not withdraw money from an ATM on Friday afternoon, and does not buy life insurance. Based on these characteristics, it would be in the terrorist’s interest to purchase life insurance to throw the authorities off of his track. While these characteristics substantially reduce the pool of potential terrorists, there are an unacceptably large number of people who, based on this information, would be falsely accused of being a terrorist. The authors then explain that the relative intensity of an unspecified banking activity (which cannot be described for security reasons) substantially reduces the pool of suspected terrorists and may greatly enhance our ability to thwart future attacks.

Basic Statistical Concepts

1. Descriptive Statistics. As with the previous chapter, this chapter is replete with examples of descriptive statistics. Honestly, it’s a statistical feast. You can use this chapter to see illustrations of data presentations, data analysis, and data gathering techniques. You can see from the data on terrorists and from the data regarding patients and ER doctors that there is significant information in producing simple averages, ranges, and relative frequencies, and in creating data tables.

Survey Design. The preparation of a survey, the choice of participants, and the development of a survey instrument are to be approached with caution. As a student of statistics, you should treat the example of Craig Feied and the Washington Hospital Center as a case study in the problems encountered in designing a survey. While the WHC possessed a significant data set, the ER suffered from “datapenia” (a word invented by Feied). Doctors were spending significant amounts of time managing information. The survey questions, given to doctors by medical assistants, facilitated the development of a medical information system that allowed doctors to spend more time treating patients. The information system—named Azyxxi—began with an astutely prepared survey.

2. Correlation versus Causation. Many principles texts like to address certain fallacies that must be avoided when conducting scientific inquiry. One is the problem of confusing correlation with causation. For example, the number of people who drown and ice cream sales are positively correlated: both tend to increase in the summer. However, we should not then conclude that the increase in ice cream sales causes an increase in drownings (or vice versa).

The chapter’s opening discussion of the correlation between health problems and the timing of certain individuals’ conception/birth highlights the importance of digging deep enough to find the actual cause of such correlations: the “hidden” variable that is the causal variable, e.g., parents’ religious practices. The same is true of success in sports. It is not the fact that someone is born earlier in the year that makes them better at sports; rather, boys and girls born earlier in the recruiting year for such things as Little League baseball tend to be more physically mature during a key time period. Given the statistic that a ball player is 50% more likely to be selected by a major league team if his birthday is in August instead of July, if you didn’t separate correlation from causation, you’d be tempted to believe that one’s astrological sign conferred baseball talents.

More Correlation vs. Causation. This same distinction is also critically important when evaluating the relative effectiveness of doctors. Focusing on patient outcomes could be extremely misleading depending on how patients are assigned to specific doctors. As such, the observation that a particular doctor’s patients have a higher mortality rate than another doctor’s does not allow us to conclude that the first doctor is not as good as the second. As described in the next paragraph, this could be a problem of the dreaded selection bias.

3. Selection Bias. Clearly selection bias is an ill to be avoided and is a subject covered by every basic statistics text. Yet, in practice, it’s often difficult to avoid. College instructors and researchers often recruit students to participate in studies of various things: a note is placed in the student newspaper or on a bulletin board asking for volunteers. Medical clinics will often advertise on the radio or in the newspaper for people having a certain condition in order to test a new drug or therapy. In hospitals and universities alike, the reliance on volunteers introduces a selection bias.

As described in this chapter, measuring a doctor’s skill also points out that selection bias can be hard to avoid. This problem occurs because patients aren’t randomly assigned to doctors. Better doctors may even have a higher death rate because the sicker patients may seek out the doctors with the best reputations. This example—attempting to measure a doctor’s skills—is a good case study on encountering and avoiding sample bias. It’s important for the student to know that most of the statistical methods that he or she learns from the statistics textbook are based on the premise that the data gathered from a sample are representative of the population.

4. Multivariate Regression Analysis. What makes a good terrorist? In the example of Ian Horsley, the characteristics of people who commit bank fraud and those of terrorists are revealed. This example can be used to describe how such characteristics can be revealed through regression analysis. Think for a minute about how a regression equation would be developed that could be used as a model to predict a terrorist or a person who is planning on committing bank fraud—at least a model that would be evidence enough to cast suspicion on a person. What would the dependent variable be? It might be a dichotomous variable: terrorist or not a terrorist. Some of the independent variables could be: Did a person open his bank account with cash?
Is a person's address a P.O. Box? Did the person open a savings account as well? Other independent variables would be numeric or indices: the frequency of large withdrawals or the number of overseas transactions. So, once you'd gathered historical data on the banking habits of terrorists and non-terrorists, you would calculate the regression coefficients and test the regression equation for statistical significance.

Model Specification. Starting with a regression equation and testing the equation and each variable for statistical significance begins the process of model specification. You may decide some variables are not statistically significant and drop them from your equation. You could use a process called stepwise regression to specify your model. Once you have a regression equation that passes muster, you can use this equation as a predictive tool—a model—to assist in predicting who might be a terrorist or bank criminal.

5. Lagged Variables. A statistical technique used in developing a regression model is the lagged variable: a variable whose value is taken from one or more time periods before that of the dependent variable. The theory is that some variables take time to produce results. The analyses done by Almond and Mazumder described at the beginning of this chapter regarding the "accident of birth" require that one or more of the independent variables be cast as lagged variables. Another example is the case of the "Spanish flu." One way to reveal that the cause of a person's relatively lower lifetime income is the 1918 flu pandemic is to correlate the person's current income with a lagged time variable. The same could be done for the oddities of horse racing in 2004. The prequel to this book, Freakonomics, included a rather notable example of a lagged variable: the reduction in the U.S. crime rate was revealed to have likely been caused by an increase in abortions 18 to 25 years earlier.

6. Proper Choice of Dependent Variable. At times, when concentrating on collecting and developing data to populate a data set in order to estimate the coefficients of a regression equation, it is not clear what the dependent variable should be. This chapter considers the risks that individuals face when choosing between driving and flying to a destination. What is it that people are really trying to avoid? Is it death? That is unlikely because, though many people avoid thinking about it, death is inevitable. Levitt and Dubner describe how people are more interested in avoiding risks that they themselves don't control. For example, when one is flying in a plane, one's safety is determined by someone else. In a car, one's safety is largely controlled by oneself—or so we like to think.
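The mechanics of the lagged variable described in item 5 can be sketched in a few lines of code. The numbers below are invented purely for illustration: here x affects y with a two-period delay, so the contemporaneous correlation is only moderate while the correlation with the lagged series is perfect.

```python
# Illustration of a lagged variable (all numbers invented).
# Here y responds to x with a two-period delay: y[t] = 3 * x[t-2].

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

x = [5, 1, 4, 2, 8, 3, 9, 2]        # explanatory series
y = [0, 0, 15, 3, 12, 6, 24, 9]     # y[t] = 3 * x[t-2] for t >= 2

k = 2
x_lagged = x[:-k]                   # x[t-2], paired with ...
y_current = y[k:]                   # ... y[t]

print(f"contemporaneous r = {pearson(x, y):.2f}")                 # weaker
print(f"lagged (k=2) r    = {pearson(x_lagged, y_current):.2f}")  # perfect
```

In a real regression, the lagged series would simply be included as one of the independent variables; the slicing above is all that "lagging" amounts to.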

Core Competencies

Once you have read and carefully studied this chapter you should be able to complete the following tasks. Note: these competencies do not begin with simpler statistical concepts and work toward more difficult ones. Instead, they follow the organization of the book.

1. Describe how the month in which someone is born can influence his success in sports in his adolescent and early teen years. Use this to explain the difference between correlation and causation.

2. What visual techniques would you use to illustrate the "relative age effect"? Could you create this visual with the information given? Remember, the purpose of creating a data visualization is to make it compelling and informative.

3. Explain how you would use multivariate regression analysis to determine how likely it is for a person to play major-league baseball. What independent variables would you use?

4. What is a dummy variable? How would a dummy variable be used in a regression analysis to determine whether there is a “relative-age effect” in professional sports?

5. What is a "lagged variable"? How would a lagged variable be used to determine the lifetime earnings of someone whose mother suffered from the "Spanish flu" during her pregnancy?

6. Explain the statistical idea that “correlation is not causation” as it relates to the “relative age effect.” Wouldn’t certain astrological signs be correlated with great baseball players?

7. How could multivariate regression analysis be used to determine the characteristics of the typical terrorist? Could this technique be used by law enforcement agencies to profile likely terrorists?

8. What is risk analysis? How is it illustrated in describing the likelihood of “death by terrorist?”

9. In comparing the risks of driving a car versus the risk of flying an airplane, what risk factor should be used (i.e. if using a regression equation, what is the dependent variable)? Should we be measuring the risk of serious injury? The risk of death? The risk of death per mile? Or per trip?

10. Discuss the importance of information in the provision of treatment in a hospital emergency room. In particular, focus on how the amount of information readily available to doctors can affect their ability to successfully treat patients.

11. List the four variables Craig Feied identified as critical for an information system to possess in order for the system to be truly effective. What are the methods a statistician would use for gathering data on these variables?

12. What is “datapenia?” Describe how Craig Feied designed a survey that would collect the critical information needed to develop an information system for hospital staff that was truly effective.

13. Summarize the available data on the relative effectiveness of chemotherapy as a treatment for cancer. Suppose you were to gather data to confirm or refute the results described. How would you gather these data? What method(s) would you use to analyze the data you have gathered?

14. According to Levitt and Dubner, the fact that the age-adjusted mortality rate for cancer is essentially unchanged over the past half century hides some good news. Describe this good news and explain what variable is added to the age-adjusted cancer mortality rate to be able to discern this good news.

15. The mortality rate among active duty military was almost the same during the 1980s—a time of peace—as during the years in which the military was fighting wars in Iraq and Afghanistan. On what variables could information be gathered to help explain this phenomenon?

16. If you were interested in determining whether there is a difference between the active duty military mortality rate during a time of peace and the rate during a time of war, how would you state the null hypothesis? What are the steps involved in testing the null hypothesis you have described?

17. Explain the concept of selection bias. How might this be a relevant factor in determining the skill of doctors?

18. How could multivariate regression analysis be used to determine the variables correlated with the development of lung cancer? Describe what independent variables you would include in the regression analysis.

19. Suppose that one of the independent variables included in the regression is whether a person smokes cigarettes. Suppose that this variable turns out to be statistically significant. Would you have proved that cigarette smoking causes lung cancer?

20. Explain why an algorithm designed to identify the terrorists in a population is nonetheless of limited value even if it is 99 percent accurate, especially as the size of the relevant population increases or the number of terrorists in the population declines.
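The arithmetic behind this last competency is worth working through once. The numbers below are hypothetical (they are not from the book): a screened population of 50 million containing 500 terrorists, and an algorithm that is 99 percent accurate in both directions.

```python
# Hypothetical base-rate arithmetic (numbers are illustrative, not from the book).
population = 50_000_000   # people screened
terrorists = 500          # actual terrorists among them
accuracy = 0.99           # true-positive rate and true-negative rate

true_positives = terrorists * accuracy                         # terrorists caught
false_positives = (population - terrorists) * (1 - accuracy)   # innocents flagged

flagged = true_positives + false_positives
precision = true_positives / flagged   # chance a flagged person is a terrorist

print(f"innocents flagged: {false_positives:,.0f}")   # about 500,000
print(f"P(terrorist | flagged) = {precision:.4%}")    # under 0.1%
```

Even at 99 percent accuracy, roughly a thousand innocent people are flagged for every terrorist caught, and the ratio only worsens as the population grows or the number of terrorists shrinks.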

C h a p t e r 3

Unbelievable Stories About Apathy and Altruism

Summary

In this chapter, Levitt and Dubner examine the evolution of experimental economics and the subfield of behavioral economics. They set up the discussion by describing one of the more shocking stories from the 1960s: the brutal murder of Kitty Genovese and the apparent apathy displayed by a large number of bystanders. As they discuss the rise in crime that occurred between 1950 and 1970, they explore a variety of possible explanations for the observed increase, including reduced arrest and imprisonment rates and the baby boom. The most interesting explanation, which also has empirical support, is the advent of television. (Why increased television viewing leads to increased crime is still somewhat of a mystery.) The authors then shift back to the Genovese murder to ask why no one exhibited altruism on that particular night.

The next part of the chapter, which examines such questions as why and to what extent altruism exists, begins by considering the distinction between altruism and self-serving behavior. As has been argued, many seemingly altruistic actions, e.g., visiting someone in a retirement home, frequently include a strategic element. Subsequent empirical analysis has confirmed this. To better understand altruism, a group of economists moved their work to the laboratory, i.e., experimental economics, to gain additional insights. Building on the work of John Nash and the Prisoner's Dilemma, two new games, Ultimatum and Dictator, were developed and refined. The overwhelming conclusion that emerged from experiments involving these games was that the vast majority of humans are in fact altruistic. This, in turn, cast considerable doubt on the traditional assumption of the rational, self-interested economic agent. It also raised the possibility that most, if not all, of society's problems could be solved simply by relying on the altruistic nature of human beings. Take, for example, the case of organ transplants.
If humans are indeed altruistic, it seems there would never be a shortage of kidneys for transplant, considering that a person is born with two but really needs only one. Attention then shifts to the subfield of behavioral economics and the work of John List. After chronicling List's career path, including his work in mainstream behavioral economics, the authors describe how List began to explore the relationship between laboratory results and the real world.

In short, what List found was that the participants in an experiment behave much differently when they don't know they are part of an experiment. Moreover, as the experiment more closely resembles the real world, results approach what traditional theory would predict, i.e., rational, self-interested behavior. The explanations for why previous laboratory experiments produced the results they did include selection bias, the effects of scrutiny, and context and its corresponding incentives. The last part of the chapter considers what constitutes pure altruism, as opposed to impure altruism, and then returns to the case of the Genovese murder. What is really interesting here is that all of the shock waves felt throughout the country in the wake of this story may have been more the result of fiction than fact.

Basic Statistical Concepts

1. Descriptive Statistics. This chapter provides a number of descriptions of rising crime rates. Unlike many of the examples of cross-sectional data in previous chapters, these are examples of time-series data. Just for practice, as you read through this chapter, identify the type of data you're looking at: Is it time-series, cross-sectional, or both? Also, try this: What visual device would you use to present the crime rate data over time? Make it something that would be immediately informative. One of the studies referred to in this chapter uses data from the U.S. government in what is referred to as a longitudinal study. How are these data different from time-series data?

2. Correlation vs. Causation. Levitt and Dubner note in this chapter that the rise of TV watching is correlated with the increase in the crime rate. This appears, without further examination, to be a spurious correlation. After all, how can watching TV lead to criminal behavior? There must be a third—causal—variable at work here. However, Levitt and Dubner postulate some hypotheses suggesting a causal link between youthful TV watching and adult crime. They are compelled to this by finding additional varieties of correlation between youthful TV watching and adult crime. Levitt and Dubner extend the ways that they statistically examine the correlation of these two variables. For example, they look at crime rate differentials for different age groups within a city. Just for practice, how would you test some of these hypotheses? Did the Andy Griffith Show make people think that the law was incompetent, so that they could easily get away with criminal activity?

3. Experimental vs. Observational Studies and a "Natural Experiment." This chapter is largely about an experimental approach to gathering data on human behavior. Featured prominently is the experimental work of people like John List.
This is unusual because economists and those who practice statistics in the social sciences are often relegated to gathering observational data. As a student of statistics, you should be able to distinguish between the experimental approach to gathering and analyzing data and the observational approach.

Entertainingly included in this chapter is an example of a "natural experiment": a case where circumstances happen to prevail such that observational data resemble the data one would get by setting up an experiment. The natural experiment was the large and sudden release of people from prison as a result of lawsuits related to prison overcrowding. We will encounter another serendipitously arranged set of circumstances that forms a natural experiment in the next chapter: a study of germs and childbirth by Dr. Ignaz Semmelweis.

The laboratory experiments by John List and other behavioral economists are—as we have said—not the usual fare of economists or statisticians in the social sciences. Therefore, the experiments are a useful contrast to observational studies. From these examples you should learn that the selection of the participants in these experiments is important. Also, how you set up the experiment is important. We describe this further in the following paragraphs.

4. Multiple Regression Analysis. Explaining the increase in U.S. crime rates during the decades of the 1950s, 60s, and 70s is a statistical problem that lends itself to a multiple regression analysis technique. There are a variety of variables that are possible candidates as causes of the increased crime rate: population increase, a growing anti-authoritarian attitude, and reductions in prison population are examples used in this chapter. While the specific data are not given, enough information is presented in this chapter to understand how a regression model that explains an increase in crime rates might look. As practice, use this chapter to make a list of the significant independent variables. See if you can determine the degree of influence of these variables on an increasing crime rate. Now, describe the multivariate regression analysis that you would develop. What would the independent variables be? How would you gather the appropriate data?

5. Multicollinearity.
The OLS method of calculating the coefficients for regression equations makes assumptions about the variables included. One of these is that the independent variables are not correlated with each other. In examining the causes of the increase in the crime rate through the 1950s, 60s, and 70s, an astute observer would include independent variables such as the incidence of poverty, the number of single-parent families, the number of female heads of household, and the incarceration rate. If you think about it, there may be an approximate linear relationship among some of these independent variables. If this condition exists, it's known as multicollinearity. One of the possible "cures" for multicollinearity is to respecify the model to eliminate or reduce the relationship among the variables. As you learn about regression analysis, you will learn how to test for this problem as well as other conditions that may violate the assumptions of OLS estimators.

6. Selection Bias. In examining the behavior of those who played the game Dictator, List concluded something about the behavior of the population. Were the subjects of the observations a representative sampling of the population? Likely not. As pointed out by Levitt and Dubner, college freshmen who volunteer for a laboratory experiment differ from the general population in important ways.

This is a good example of the importance of obtaining a random sample. A properly selected random sample is likely to be representative of the population. At least, with a random sample, the statistical techniques you learn from your textbook will allow you to create confidence intervals and perform tests of significance. As we described in our review of the first chapter, there are samples described in SuperFreakonomics that are not, strictly speaking, random samples (the prostitute data, for instance). The analysis performed using the data from these samples can be generalized to the population only if the subjects that are observed do not differ in any systematic way from the population.

7. Behavior Modified by Observation? Do you modify a person's behavior by observing him or her? This question is not relevant in conducting observational studies. However, laboratory experimentation using humans as subjects potentially affects the behavior of the subjects. This means that the behavior observed by the researcher during an experiment may not translate to the population as a whole. The behavior observed by John List through laboratory experiments seems to contradict the behavior observed in the field, or when the rules were less strict and no one was observing the participants during the experiment. One explanation for this is described in the paragraph above. This is also referred to as the effect of scrutiny. Do people act differently when they think they are being observed? Levitt and Dubner conclude that people may act with a greater degree of altruism in the laboratory when they know they are being observed because there is a social stigma to acting greedily.

8. Mendacity, Artificial Context, and the Reliability of Responses. Levitt and Dubner describe another reason why the work done by List suggests that the results of many laboratory experiments should be considered unreliable: context. A laboratory setting is artificial.
The initial conclusions derived from the Ultimatum and Dictator games, and the results of John List's subsequent experiments based on slight modifications of those games, clearly demonstrate that a model must sufficiently reflect the relevant features of the world it is trying to emulate in order for the results that flow from it to be valid.

A related context problem involves the use of college volunteers. While playing Dictator, these students may be more interested in pleasing the investigator than in reaching inside themselves and showing their true selves. Levitt and Dubner refer to college volunteers as "scientific do-gooders." So, as a student of statistics, what do you conclude? In order to avoid unreliable results and conclusions, laboratory experiments must be set up to avoid selection bias and the behavior-changing effects of scrutiny, and must have a context that doesn't feel artificial.
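A small simulation can make the selection-bias point concrete. Everything here is invented: each person gets a "generosity" score, and more generous people are assumed to be more likely to volunteer for a Dictator-style experiment, so the volunteer sample's average overstates the population average while a random sample of the same size does not.

```python
import random

random.seed(42)

# Invented population: generosity scores, roughly centered at 5.
population = [random.gauss(5.0, 2.0) for _ in range(100_000)]

# Assumed mechanism: probability of volunteering rises with generosity.
volunteer_sample = [g for g in population
                    if random.random() < min(1.0, max(0.0, g / 10))]

# A properly drawn random sample of the same size, for comparison.
random_sample = random.sample(population, len(volunteer_sample))

def mean(xs):
    return sum(xs) / len(xs)

print(f"population mean       : {mean(population):.2f}")
print(f"volunteer sample mean : {mean(volunteer_sample):.2f}")  # biased upward
print(f"random sample mean    : {mean(random_sample):.2f}")     # close to truth
```

The point is not the particular numbers but the mechanism: when the chance of ending up in the sample depends on the very trait being measured, the sample statistic systematically misses the population value.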

Core Competencies

Once you have read and carefully studied this chapter you should be able to complete the following tasks. Note: these competencies do not begin with simpler statistical concepts and work toward more difficult ones. Instead, they follow the organization of the book.

1. Explain why the brutal murder of Kitty Genovese raised such serious questions about the seemingly limitless apathy of people.

2. Explain the trends related to U.S. crime rates through the decades of the 1950s, 60s and 70s.

3. Describe the trend of arrests per crime and the incarceration rate over these same decades.

4. How could multivariate regression analysis be used to determine the influence of changes in arrest rates, incarceration rates, and the postwar baby boom on the crime rate?

5. SuperFreakonomics describes growing crime rates during the 1950s, 60s, and 70s. At the same time, the U.S. population was growing. State the null hypothesis you would test if you wanted to determine whether the per capita crime rate in this time period was statistically significantly different from that of the 1940s. What steps are needed to conduct a hypothesis test on this question?

6. Variables that might be used to develop a model for the crime rate include the incidence of poverty, the number of female heads of households, and the incarceration rate. Explain why there might be a correlation among these variables.

7. A number of assumptions are required for the least squares method of estimating coefficients of a multivariate regression analysis to get unbiased estimators. What are these assumptions? In reference to the previous core competency, which of these assumptions might be violated? What is this condition known as? How can it be addressed?

8. Describe the correlation between television viewing and the rate of crime in individual cities. Does this correlation indicate that television viewing increases crime?

9. How was the statistical technique referred to as lagged variables used in establishing this correlation?

10. What explanations do the authors give as to why television viewing at a young age, specifically, The Andy Griffith Show, is correlated with criminal behavior later in life? How, using statistical methods, could you further investigate this explanation?

11. What is a natural experiment? Explain how a natural experiment was used to determine the relationship between incarceration and crime rates. How is it different from observational studies?

12. This chapter describes the work by several behavioral economists using laboratory experiments. Explain how the approach of experimental economics and the resulting subfield of behavioral economics differs from most economic research.

13. What are the differences between John List’s experiments involving baseball trading cards in a laboratory setting and his experiments on a real trading floor? What did List hypothesize might be the source of the observed differences?

14. Considering the experiments using the various forms of the Dictator game, how do different rules of the same game affect the conclusions of the researcher?

15. Given that seemingly minor modifications of rules affect the outcomes of laboratory experiments, and thus the conclusions about human behavior drawn by the researcher, how can you know whether an experiment isn't missing some subtle rule? In other words, can research done in the context of a laboratory experiment ever be considered reliable?

16. Surveys and questionnaires are used extensively to gather information that a statistician summarizes, analyzes, and draws conclusions from. Explain whether the context of these surveys can cause responses to differ from behaviors one might observe in the real world. What does this say about the reliability of information gathered through surveys and questionnaires?

17. Summarize, in general terms, the modifications John List made to the Dictator game, and explain why the results of his experiments ended up casting extreme doubt on the conclusions regarding human beings and altruistic behavior that were based on the initial version of the game.

18. Explain the role selection bias has in gathering reliable information from experimental studies. As described by the authors, how was the selection of experiment participants biased?

19. What is the role of scrutiny in gathering reliable data from a laboratory experiment such as Dictator? How can it change the outcomes of these experiments?

20. How does a further examination of the facts and data related to the Kitty Genovese murder compare with the story told by the article published shortly after the incident?
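The hypothesis test asked about in competency 5 can be sketched as a two-proportion z-test. The crime and population counts below are invented for illustration; only the mechanics matter.

```python
from math import erf, sqrt

def two_proportion_z_test(x1, n1, x2, n2):
    """z statistic and two-sided p-value for H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)              # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))     # standard normal CDF
    return z, 2 * (1 - phi)                     # two-sided p-value

# Invented per capita crime data: crimes and population in two periods.
z, p = two_proportion_z_test(x1=1_200, n1=100_000,   # earlier decade
                             x2=1_450, n2=100_000)   # later decade
print(f"z = {z:.2f}, p = {p:.6f}")
# If p is below the chosen significance level (say 0.05), reject H0
# that the per capita crime rates in the two periods are equal.
```

The steps mirror those in any introductory text: state H0 and Ha, compute the test statistic, compare the p-value to a pre-chosen alpha, and reject or fail to reject H0.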

C h a p t e r 4

The Fix Is In—and It's Cheap and Simple

Summary

This chapter is all about problems, solutions to those problems, and unintended consequences that can arise. The specific problems addressed range from health-related issues (including deaths among mothers of newborns and polio), to deaths due to car accidents, to population growth and species extinction. A recurring theme is that in all of the cases considered, the solution to the problem is invariably simple and relatively cheap, once the problem and its cause are well understood. The chapter begins by describing the puerperal fever epidemic that struck top European hospitals in the mid-1800s. As Levitt and Dubner point out, many of the suspected causes, in one way or another, held women responsible for the outbreak. And it wasn't until an enterprising administrator who relied on statistical observations began to examine the problem, and a leading doctor died in an accident, that the problem was solved. After all was said and done, the disease was linked to exposure to germs delivered by doctors who did not adequately clean their hands after performing autopsies and just before they delivered babies. And so it was that the solution to the problem was as cheap and simple as one could possibly imagine, i.e., wash your hands, Doctor!

The authors discuss a number of solutions to problems that have proven to have unintended, and costly, consequences. The Americans with Disabilities Act has made it easier for disabled individuals to move about in society, but it also has left them with fewer job opportunities because potential employers became afraid of running afoul of the law and simply chose not to hire the disabled. And then there's the Endangered Species Act, which has led many property owners to engage in habitat destruction, which is the opposite of the law's intent. The same is true of such policies as per unit pricing of trash disposal, which has encouraged illegal dumping and resulted in more burn victims because people have an increased incentive to burn their trash, and a mandatory debt relief law that ended up limiting the availability of credit to those who need it most.

The next part of the chapter focuses on a variety of situations in which major problems have been solved using relatively cheap and simple fixes. Ammonium nitrate, for example, proved to be a rather inexpensive solution to the problem of how to feed a rapidly growing world population. The discovery of oil averted what otherwise would have been the extinction of whales. And a set of vaccines ultimately eliminated the burgeoning costs polio imposed in the United States and abroad.
At this point, the authors also stress the difference between treating a problem after it arises and effectively eliminating the problem. This is the nature of vaccines and other preventive drugs in medicine. There is no doubt that considerable costs are incurred in developing new vaccines and drugs. But it is also true that once the vaccine or drug has been developed, its costs are dwarfed by the benefits in the form of adverse health effects avoided.

Levitt and Dubner then return to a topic addressed at some length in their previous book, i.e., deaths associated with automobile accidents. However, this time, they address the problem more broadly, considering deaths of both children and adults. Beginning with death rates in general, they emphasize the distinction between mitigating the adverse effects of an accident ex post and avoiding the harm altogether. This was indeed the motivation for putting seat belts in cars. If someone is prevented from being thrown about the interior of a car when an accident occurs, the injuries suffered will be much less severe than they would be without such a restraint, regardless of how hard or soft objects in the car are. Levitt and Dubner then turn to the issue of child safety seats. This time around, the conclusions are more specific, i.e., children under the age of 2 are clearly better off in car seats, while children in the 2-6 age category are equally well off using either seat belts or car seats, especially when it comes to serious injury. They wrap up this part of the chapter by suggesting a possible solution that would be both simple and cheap—design seat belts specifically for children.

The last section of the chapter describes a possible solution to one of the more destructive problems created by Mother Nature: hurricanes. After explaining the process by which hurricanes form (you can't identify a solution if you don't first understand the problem), they describe a relatively low-cost solution that is currently being tested. The ultimate question, of course, is whether governments can be persuaded to try something that is both cheap and simple.

Basic Statistical Concepts

1. Descriptive Statistics. The statistics in this chapter are, like those in the other chapters of this book, abundant. This chapter, however, contains some rather astonishing statistics: the death rates from puerperal fever among mothers giving birth, the reduction in child fatalities through the use of child safety seats, and the whale harvest by the Captain Ahabs of the 19th century, among others. As a student of statistics, you might like the fact that much of the data in this chapter relate to government policies and the unintended consequences of these policies. Examples include data on how the Americans with Disabilities Act might have resulted in less employment of the disabled and how the Endangered Species Act might result in a reduction in the habitat available to endangered species. This is also a good chapter to illustrate both cross-sectional and time-series data since there are several examples of both. There are also a number of cases of data that are combined time-series and cross-sectional. A good exercise for you as a statistics student would be to read through this chapter and identify the type of data being described.

2. Correlation versus Causation. As anyone who has had basic statistics knows, correlation does not imply causation. This is a lesson that we have now come across in every chapter of SuperFreakonomics. That being said, correlation and causation are often treated synonymously. Thus, it should not be surprising that such factors as personal predisposition (no doubt many mothers about to deliver their babies are quite anxious), foul air in the delivery wards (remember, this was happening in the 1800s), the presence of male doctors, and catching a chill or leaving the delivery room too soon were suggested as possible causes of puerperal fever.
While there may have been a high correlation between the presumed causes and the fever, hindsight clearly shows that the cause was related only to the male doctors, because they were performing autopsies and not washing up before entering the delivery room. This correlation-as-causation problem is not relegated to the 19th century, however. In the middle of the 20th century, some researchers thought that polio was caused by the consumption of ice cream because they both tended to spike in the summer. Such a connection is no more defensible than the suggestion that the number of deaths by drowning is caused by the consumption of ice cream given that both are observed to increase in the summer months—a famous case of misapplied logic.

This chapter also includes at least one example of correlation having meaningful causation: the discovery and use of oil and the salvation of the whale. Here, the increased use of oil is correlated with reductions in the harvesting of whales. The correlation, in this case, is not spurious but is, in fact, causal. So, are you confused? Are you thinking that we've changed our minds here in the penultimate chapter and now say it's okay to draw a causal link when only correlation is known? Heaven forbid! Correlation is used as evidence to support a hypothesis. There must also be an explanation, a theory, to support the causal link (in this case: oil from crude is a perfect substitute for oil from whale blubber). That doesn't mean that all theories describing a causal link between two variables that are supported by the data are good theories. A grade-A student is always open to new theories.

3. Experimental vs. Observational Studies. Levitt and Dubner tackle the question of the efficacy of child-restraint systems in automobiles. The first thing a statistician might do is take an observational approach, that is, look at the data already gathered by the alphabet soup of government agencies. However, it turns out that the available data relate only to mortality and injury rates for kids in children's car seats or for kids who are completely unrestrained in the car. Comparisons of restraint systems are not possible with the existing data. So, a lesson for the student: when data regarding the variables of interest are not available, a researcher has to conduct his or her own experiment. Since Levitt and Dubner found no adequate source of data to test the safety efficacy of seat belts as compared to child safety seats, they conducted their own experiment comparing the child safety aspects of child seats versus standard seat belts. Another interesting note for the student of statistics is that Levitt and Dubner had a difficult time finding a crash test laboratory that would allow them to purchase the services of the lab!
This is because the laboratory owners didn’t want to be involved in a test that would draw conclusions that would offend the car-seat manufacturers.

4. Selection Bias. A major portion of this chapter is about child restraints in automobiles. The statistic cited by child-safety researchers is that booster seats reduce significant injury by roughly 60 percent. The data used by the researchers to arrive at this conclusion are garnered from insurance records that describe interviews with parents. It may well be that these records are fraught with distorted information, for two reasons. First, an automobile accident involving one’s child is a traumatic event for the parent; he or she may misremember the details of such an event. Second, the parents interviewed by their insurance company may be intimidated or embarrassed, causing them to misrepresent the truth about how restrained their children may have been at the time of an accident. This is known as response bias. For both of these reasons, the information given by parents may not be considered reliable. We have now described several examples of how selection bias distorts the goal of collecting reliable information. Selection bias seems to be difficult to avoid.

5. Multiple Regression Analysis. The advent of the agricultural revolution has been the subject of a considerable amount of analysis. The development of a model to assist in explaining the agricultural revolution can be accomplished through multivariate regression analysis. As a student

of statistics, as you read the part of the chapter on the agricultural revolution, you may wonder what variables would be considered significant independent variables. For practice, construct a regression equation using the variables mentioned. Which of these, do you think, would be considered the most consequential?

6. Heteroskedasticity. The efficiency of OLS estimators, and the reliability of their standard errors, relies on the assumption of homoskedasticity. This funny word means that the variance of the error term is the same for every observation of the independent variable. In this chapter, there is an example that could be used to illustrate heteroskedasticity: the powerful damage wrought by hurricanes. Suppose a hurricane researcher is attempting to develop a regression equation in which damage done by hurricanes is the dependent variable, and suppose the size of the hurricane is one of the independent variables, whether measured by category or by energy output. It is likely that the error term would increase in size as the size of the hurricane increases, resulting in heteroskedasticity.
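The hurricane regression just described can be simulated to see heteroskedasticity directly. The sketch below uses made-up storm sizes and damage figures (not data from the book) in which the standard deviation of the error term grows with storm size; after fitting an ordinary least-squares line, the residuals spread out more for larger storms.

```python
import random
import statistics

random.seed(42)

# Hypothetical hurricane "sizes" (e.g., energy output) and damage figures,
# where the spread of the error term grows with storm size.
sizes = [random.uniform(1, 10) for _ in range(500)]
damage = [2.0 * s + random.gauss(0, 0.5 * s) for s in sizes]  # error sd grows with s

# Fit a simple least-squares line: damage = a + b * size
mean_x = statistics.fmean(sizes)
mean_y = statistics.fmean(damage)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, damage)) / \
    sum((x - mean_x) ** 2 for x in sizes)
a = mean_y - b * mean_x

residuals = [y - (a + b * x) for x, y in zip(sizes, damage)]

# Split residuals by storm size: the spread is larger for the big storms,
# which is exactly what heteroskedasticity means.
small = [r for x, r in zip(sizes, residuals) if x < 5]
large = [r for x, r in zip(sizes, residuals) if x >= 5]
print(statistics.stdev(small), statistics.stdev(large))
```

Plotting the residuals against storm size would show the characteristic fan shape; the two standard deviations printed here capture the same pattern numerically.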

Core Competencies

Once you have read and carefully studied this chapter, you should be able to complete the following tasks. Note: these competencies do not begin with simpler statistical concepts and work toward more difficult ones. Instead, they follow the organization of the book.

1. Describe the various (incorrect) proposed explanations for the dramatic increase in puerperal fever that occurred at General Hospital in Vienna in the 1840s. For each one, explain how you think these explanations came about.

2. What type of data was collected by Semmelweis to study maternal mortality rates?

3. Describe whether the mortality data for the Vienna hospital represent a randomized sample, a convenience sample, a census, or something else. What difference does this make in terms of drawing conclusions about the mortality rate of the doctors’ ward versus the midwives’ ward?

4. Considering the data collected on mortality rates at the Vienna hospital in the various wards, what are the values of the point estimates? How are point estimates different from population parameters?

5. Describe the process Semmelweis went through to develop a hypothesis for the cause of the higher mortality rates in the doctors’ ward at the Vienna hospital.

6. Identify the variable that Semmelweis determined to be correlated with the incidence of puerperal fever at General Hospital in Vienna. Did he decide that this correlation indicated causality? What did he do to “test” his hypothesis? What did he confirm?

7. Let’s pretend to start over and test Semmelweis’ hypothesis according to standard statistical methods. What is the dependent variable in this case? What are the independent variables?

8. When testing a medical procedure, the method most often used is to have a random selection of patients treated with the medical procedure and a control group that is not treated. How would you set up this type of experiment to test Semmelweis’ hypothesis? What is the null hypothesis?

9. Summarize the costs, both personal and economic, that resulted from the polio epidemic that struck the United States in the early to mid 1900s.

10. Explain the principle of unintended consequences. Use examples from this chapter to illustrate the principle.

11. How would you test the hypothesis that the use of forceps during child delivery reduces the incidence of death and injury?

12. Using multivariate regression analysis, how would you create a model that explains the agricultural revolution?

13. Explain the process of stepwise regression. How might you use this method to develop the model of the agricultural revolution mentioned above?

14. Consider how the risk of travel might be examined. What variable would you choose as the dependent variable in an analysis of variables that are correlated with risk? How does using mortality per mile driven versus mortality per trip taken affect your conclusions?

15. Explain why Levitt and Dubner conducted their own experiments on child safety seats versus adult seat belts rather than just resorting to the F.A.R.S. data set.

16. Using the child safety seat examples described by Levitt and Dubner, explain the difference between observational studies and experimental studies.

17. Explain selection bias. Describe how it becomes a problem in gathering reliable responses from the parent interview data collected by insurance companies.

18. Explain response bias. Why does response bias materialize in interviewing parents about the circumstances related to automobile accidents in which young children are involved?

19. Considering child restraint systems in automobiles, how could a researcher design a study that minimizes the chance that his or her research will be tainted by either selection bias or by response bias?

20. Focusing on children over the age of two, what do the data tell us about the relative effectiveness of seat belts and child safety seats with respect to deaths resulting from car crashes?

21. The least squares method of estimating the coefficients of a regression analysis makes certain assumptions about the behavior of the data in order to produce reliable estimators of these

coefficients. Suppose the dependent variable is the economic damage done by a hurricane and the independent variable is the size of the hurricane. It seems likely that the error term would increase in size as the size of the hurricane increases. What is this condition known as? How can the model be specified to reduce or eliminate this effect?
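The randomized experiment called for in competency 8 can be sketched as a two-sample test of proportions. The mortality counts below are hypothetical numbers chosen for illustration, not Semmelweis’ actual figures; the null hypothesis is that the two wards’ mortality rates are equal.

```python
import math

# Hypothetical counts: deaths in a treatment (hand-washing) ward and a
# control ward of equal size.
deaths_treat, n_treat = 20, 1000   # 2% mortality with hand washing
deaths_ctrl, n_ctrl = 100, 1000    # 10% mortality without

p_treat = deaths_treat / n_treat
p_ctrl = deaths_ctrl / n_ctrl

# Pooled proportion under the null hypothesis of equal mortality rates.
p_pool = (deaths_treat + deaths_ctrl) / (n_treat + n_ctrl)

# Standard error of the difference in proportions, then the z statistic.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_treat + 1 / n_ctrl))
z = (p_ctrl - p_treat) / se
print(round(z, 2))
```

A z statistic larger than 1.96 rejects the null hypothesis at the 5 percent level, so counts like these would be strong evidence that hand washing lowers mortality.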

Chapter 5

What Do Al Gore and Mount Pinatubo Have in Common?

Summary

What if much of what’s been proposed to address the problem of global warming is wrong or otherwise off the mark? This chapter takes a refreshingly objective look at the question. The first part of the chapter points out that as recently as the 1970s, global cooling, not global warming, was the major concern of many climatologists. The subsequent increase in average global temperatures, however, moved global warming to the forefront. The authors explain many of the suspected causes of global warming, ranging from carbon emissions, to methane from cows and other ruminants, to changes in agricultural production. Levitt and Dubner also take a close look at the unique character of the global warming dilemma, emphasizing the considerable challenges scientists face as they try to predict what will happen over the longer term. They wrap up the introduction by considering the near-religious dimension of the movement to stop global warming. In the next part of the chapter, the authors present a very accessible discussion of the concept of an externality, beginning with negative externalities such as pollution, and the idea of using a tax to induce parties responsible for negative externalities to internalize them. They also are careful to point out the considerable difficulties that would be encountered in any attempt to use a tax to internalize the externalities that result in global warming. The discussion then moves to the concept of positive externalities, with a fair amount of attention devoted to specific examples. The first is the adoption of the LoJack by many automobile owners as a means to thwart the efforts of would-be car thieves. This generates external benefits in the form of an increased ability of police to locate and take down the chop shops where most stolen cars end up, and a reduction in auto thefts both for people who install a LoJack and for those who do not.
The second example considered is the eruption of Mount Pinatubo in 1991 and the decrease in the average global temperature that followed. This serves as a springboard to the work that goes on at Intellectual Ventures (I.V.). It also answers the question: What do Al Gore and Mount Pinatubo have in common? They both suggest a way to cool the planet. We were actually introduced to Intellectual Ventures in the previous chapter, when we learned about the approach that is being proposed to mitigate the adverse effects of hurricanes. The

authors take the time to acquaint the reader with many of the gifted scientists who work for I.V. and, in particular, those who are working on solutions to the global warming problem. One of the novel aspects of the solutions being proposed is also a carryover from the previous chapter: they are, by and large, simple and relatively cheap. To better appreciate the solutions being proposed by I.V., we are first provided with I.V.’s assessment of the global warming problem, its causes, and the likely effectiveness (little, if any) of the solutions that have been proposed by others, e.g., Al Gore’s focus on reducing the amount of CO2 being emitted into the atmosphere. In contrast, I.V. prefers to take its cue from the effects of “big-ass volcanoes” and reduce global warming by reducing the amount of the sun’s rays that reach the earth. While Budyko’s Blanket, a garden hose to the sky that would extend 22 miles into the stratosphere and spew out sulfur dioxide in much the same manner as would a large volcanic eruption, is currently the focus of much of their attention, the folks at I.V. are also considering other approaches, including extending the smokestacks of specific coal-fired electric plants into the stratosphere and stimulating cloud formation over the oceans to reflect back more of the sun’s rays. Levitt and Dubner then point out the major obstacle to all of these solutions, which collectively fall under the heading of “geoengineering.” The problem, quite simply, is the often-encountered inability to change people’s behavior, especially when the proposed change is considered repugnant by some. The last part of the chapter chronicles the obstacles Ignatz Semmelweis encountered in his efforts to get doctors to wash their hands before seeing each patient. Levitt and Dubner then point out that this problem has persisted into modern times.
In fact, only recently have certain hospitals been able, with the help of a computer screen saver featuring bacteria-laden hand prints, to induce doctors to achieve near 100 percent compliance with guidelines for hand washing. A similar challenge was confronted in Africa, where the AIDS epidemic continued unabated until intervention in the form of circumcision substantially reduced the incidence of HIV infections among the affected population.

Basic Statistical Concepts

1. Descriptive Statistics. This last chapter, like the four before it, is replete with statistical information. The main focus of this chapter, global warming, provides examples of time-series data. The atmospheric temperature of the Earth over a series of years and over a series of decades, the change in ocean elevations over time, and the change in global temperatures after the eruption of “big-ass” volcanoes are all examples of time-series data. These stand in contrast to much of the data presented in the previous chapters of SuperFreakonomics: the prices of sex acts and the wages of prostitutes presented in chapter one are cross-sectional data. Data can be presented in a variety of visualizations. Can you imagine how the global warming data could be presented in an informative and creative way? This visualization could be a presentation of mean global temperature over time, or it could be a scattergram that presents two variables: the mean global temperature and the amount of greenhouse gases produced by anthropogenic sources.

2. Normative versus Positive Analysis. Some of the efforts to reduce the pollutants that are believed to contribute to global warming appeal to people’s altruistic motives. This is the approach being taken in Al Gore’s massive PR campaign. While this is not meant to deny that Gore’s position is prompted by scientific findings, there is, nonetheless, a normative aspect to the appeal for the adoption of specific policies. The argument goes like this: There is growing evidence of global warming. Carbon dioxide is one of the greenhouse gases believed to contribute to global warming. Many activities people engage in cause an increase in atmospheric carbon

dioxide. Therefore, humans are morally obligated to reduce the amount of CO2 they are emitting into the atmosphere. Compare this to the proposal by the I.V. brainiacs that sulfur dioxide should be spewed into the stratosphere to help cool the earth. This proposal is based on observations of how a few “big-ass” volcanoes have induced a cooling of the earth for a few years after they erupted. This policy suggestion lacks normative considerations. It is, however, likely to be met by a normative backlash: humans should not fight the effects of existing pollution by adding more pollution.

3. Correlation vs. Causation. More than any other contemporary topic, the issue of global warming evokes the question of whether the correlation of human economic production and

atmospheric warming is simply a correlation or whether the increasing production of CO2 by humans is the cause of global warming. This chapter presents several examples of the confusion regarding this question. In the mid-1970s, worldwide cooling was predicted. However, over the past 100 years, global ground temperatures have risen. Thus, the subject of this chapter serves to illustrate the logical fallacy of equating the mere correlation of two variables with causation. This is a case in which a more robust model must be specified to include variables that may play a significant role in determining the fate of the global atmosphere.

4. Multivariate Regression Analysis. As you now know, the development of a properly specified multivariate regression equation is a precursor to model building. The development of a predictive model of global climate change is facilitated by the creation of one or more multivariate regression equations. Each variable included in the equation(s) would be tested for statistical significance. After reading this chapter, consider what independent variables might be included in a multivariate regression analysis where the dependent variable is mean global temperature. Levitt and Dubner suggest several variables that may be important determinants in such a regression equation. Just to get you started, here are two variables Levitt and Dubner suggest: the number of airplane flights and the amount of bovine flatulence.

5. Model Building. Model building consists of two main tasks: selecting the proper functional form for the model in order to properly describe the relationships among the variables, and selecting the independent variables that should be included in the model. In this chapter, the question of proper model building is paramount. According to the authors: “To predict global surface temperatures, one must take into account these [airplane flight information] and many other factors, including evaporation, rainfall and . . .
animal emissions.”

6. Forecasting. Forecasting is the use of models or other statistical techniques to predict the values of specific variables into the future. In order to accomplish this, a model needs to be developed. In this chapter, several models of climate change are described. Predictions of atmospheric temperature, sea level, and worldwide agricultural production are made by these models. Atmospheric temperature forecasts are the major thrust of this chapter. In your statistics text, you may have come across methods of developing forecasts. The simplest of these is trend analysis: drawing a trend line through a set of values of two variables. Why is a simple trend analysis insufficient to forecast global atmospheric change? You should now be able to explain how to prepare forecasts using regression analysis. You begin with theory and choose independent variables based on climate theory. You collect data, estimate coefficients, and go through the process of properly specifying a model. You then use this model to predict values of the dependent variable, in this case mean global temperature or the increase in sea levels.

7. Scientific Uncertainty. Statistics textbooks assume that data can be used to produce sample statistics, to test hypotheses, to create regression equations, or to develop models. In this chapter, Levitt and Dubner cast doubt on this viewpoint. They note that it is not clear what data are relevant. They also cast doubt on the idea that a model created from the best data available is sufficiently complex to handle the complicated case of global climate change. They even describe a situation of scientific Darwinism: only those scientists whose models predict catastrophic global climate change will receive funding to conduct research and fully develop atmospheric models.

8. As a student of statistics, how do you determine the reliability of the data you collect?
How do you deal with cases in which scientists disagree as to the appropriate data to use?

Core Competencies

Once you have read and carefully studied this chapter, you should be able to complete the following tasks. Note: these competencies do not begin with simpler statistical concepts and work toward more difficult ones. Instead, they follow the organization of the book.

1. Even without the use of sophisticated statistical methods, much can be learned by simply examining the relevant data. Describe the sources of greenhouse gases.

2. To what degree would greenhouse gases be eliminated by “buying local,” in terms of food production and delivery?

3. What visual device could be used to clearly illustrate the relationship between anthropogenic greenhouse gas emissions and mean global temperature? What type of data is referred to in this chapter?

4. List and explain the three primary science-related factors Levitt and Dubner cite as reasons why global warming is a uniquely thorny problem.

5. Explain why existing climate models have a difficult time predicting the future climate.

6. What independent variables would be included in a multivariate regression analysis that would be used as a model to predict future global mean temperature?

7. Suppose you include the eruption frequency of “big-ass” volcanoes as one of the independent variables in the regression analysis described above. How would the value of the estimated coefficient of this variable compare with the values of the estimated coefficients of the other independent variables?

8. What does model specification mean? How is it relevant to building models that are used to forecast future global temperatures?

9. What statistical technique could be used to determine the deterrent effect of “the club” versus the LoJack? State the null hypothesis for this experiment.

10. Describe the process of stepwise regression. What role do t-statistics and F scores play in stepwise regression? What would the process be for using this technique in developing a model for global warming?

11. Describe the effect the eruption of Mount Pinatubo had on the earth’s average temperature in the two years following the eruption.

12. How does the existence of scientific uncertainty complicate statistical analysis?

13. Summarize Intellectual Ventures’ assessment of the current generation of climate-assessment models and what this assessment means for our ability to rely on those models in formulating sound global-warming policy.

14. What is normative analysis? How is it used by some environmental activists to determine the appropriate solution to global warming? How does this differ from positive analysis?

15. Describe the primary driver behind rising sea levels and contrast this with the argument made by many environmental activists regarding the assumed cause of this phenomenon.

16. What is “geoengineering?”

17. Drawing on the observed effects big-ass volcanoes have on the earth’s temperature, explain how Budyko’s Blanket, i.e., the “garden hose to the sky,” would work to reduce or eliminate the process of global warming.

18. Describe the expected costs of implementing the Budyko’s Blanket strategy. How does this compare with the cost of Al Gore’s PR campaign regarding global warming?

Economics students will find a student guide to Superfreakonomics and Freakonomics at www.HarperAcademic.com
