The Following Exercises Are All Based on Using PDQ-Explore to Access the 2006 ACS Data

EXERCISES FOR THE ACS2006 DATA

The following exercises are all based on using PDQ-Explore to access the 2006 ACS data in the IPACS_ALL data set. Unless otherwise noted, include the selection Y2K=2006 as part of the Universe/Selection specification in the setup. Note that the default sample weight assigned for this data set automatically selects only the first data set available for each year in the concatenated 1850-2006 file of decennial and ACS data.

Setup entries in PDQ-Explore are not case-sensitive. Variable names in the following exercises are in upper-case, but they may be entered in the setup windows in upper, lower, or mixed case.

Suggestion: Keep a window open to the IPUMS-USA documentation at http://usa.ipums.org/usa. Click on “Variables” under “Documentation.”

Exercise 1: Tabulate poverty by variables of interest such as state (STATEFIP), race (RACE), employment status (EMPSTAT), education (EDUCREC), type of household (HHTYPE), etc. Percentage the results, sort them, and experiment with the various display options for the results.

Suggestion: Try POVERTY<100 as the column variable and then repeat the tabulation using POVERTY/100 as the column variable.
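The two column expressions in the suggestion group people differently. The Python sketch below illustrates the difference on a few made-up POVERTY values. It assumes that POVERTY<100 is treated as a true/false (1/0) result and that POVERTY/100 truncates to a whole number; both assumptions should be checked against what PDQ-Explore actually displays. The function names and sample values are illustrative only.

```python
from collections import Counter

# Made-up values of POVERTY (family income as a percent of the poverty threshold).
sample_poverty = [45, 99, 100, 150, 230, 310, 501]

def below_poverty_flag(poverty):
    """Mimic POVERTY<100: 1 if below the poverty line, else 0 (assumed 1/0 coding)."""
    return 1 if poverty < 100 else 0

def poverty_band(poverty):
    """Mimic POVERTY/100 under the assumption of truncating integer division."""
    return poverty // 100

# Count how many sample persons fall into each category of each expression.
flag_counts = Counter(below_poverty_flag(p) for p in sample_poverty)
band_counts = Counter(poverty_band(p) for p in sample_poverty)

print(dict(flag_counts))
print(dict(sorted(band_counts.items())))
```

Under these assumptions, the first expression yields a two-column table (below poverty or not), while the second yields one column per 100-point band of the poverty ratio.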

Exercise 2: Repeat Exercise 1, but use Summary Statistics rather than Tabulation for the Query Type. Enter POVERTY as the “Describe Expression.” Again, experiment with the various display options for the results.

Exercise 3: Compare the characteristics of persons whose only source of income is social security to those who have other sources of income, perhaps including social security income. Will using INCTOT=INCSS be sufficient to identify these persons? Look at the characteristics of these persons—where they live, household structure, age, sex, education, etc.

Exercise 4: We have looked at tables on poverty on the census.gov website. Reproduce one of the tables using PDQ-Explore with the 2006 ACS data in the IPACS_ALL file. Then further disaggregate the data by adding a third dimension to the table.

For example, see if you can reproduce the states portion of Table R1703 that ranks the states by the percent of the population that is 65 and over who are below the poverty level. Then add RACE as another dimension to the table.

How well do our percentages compare with the Census figures? Why do they differ?

Exercise 5: Using the 2006 ACS data, tabulate occupation by sex for persons who are employed. Percentage the results by column and then sort the results by column. Then restore the original order, percentage the results by row, and sort the results again by column. What does looking at the percentages by column tell us? What do the percentages by row tell us?

Note that the data set has several occupation variables. Begin with OCC, but you may wish to look at some of the others.

Suggestion: Set occupation as the row variable. Tables may be sorted by clicking on the head of the row or column.
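As a reminder of the arithmetic behind the two percentaging options, here is a minimal Python sketch. The counts and occupation names are hypothetical, not ACS results, and the function names are illustrative; PDQ-Explore performs the equivalent calculation from its display options.

```python
# Hypothetical counts: rows are occupations, columns are sexes.
counts = {
    "Occupation A": {"Male": 30, "Female": 70},
    "Occupation B": {"Male": 90, "Female": 10},
}

def percent_by_row(table):
    """Express each cell as a percent of its row total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100.0 * n / total for col, n in cells.items()}
    return result

def percent_by_column(table):
    """Express each cell as a percent of its column total."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: 100.0 * n / col_totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }

row_pct = percent_by_row(counts)
col_pct = percent_by_column(counts)
print(row_pct["Occupation A"])
print(col_pct["Occupation A"])
```

The same cell produces different percentages depending on which total serves as the base, which is why the two displays answer different questions.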

Exercise 6: Using summary statistics, determine average wage income for occupations. Then add SEX as a second dimension. Rearrange the display of the results by setting SEX as the row variable and OCC as the foreach variable. Find an occupation where the difference in wages is striking. Then create a new tabulation that selects persons in that occupation and see if you can identify other variables that may be related to the difference in wages—age, education, race, area of the country, etc.

Exercise 7: Find the occupation that is associated with the highest level of poverty.

Exercise 8: Examine modes of transportation to work in terms of characteristics of the place of residence such as location (state, metro-area) and type of housing as well as person characteristics such as age, education, race, and sex.

Exercise 9: Which variables in the data set could be considered to be directly related to the quality of life of persons? Could they be combined into an index? Are some parts of what you would consider quality of life not represented in the census questions?

Exercise 10: Intelligent use of census and survey data requires that a user of the data be sensitive to factors that influence the quality of the responses and reported data. Look at two tabulations from the 2006 ACS data. Tabulate TRANTIME for people who report that they are employed using EMPSTAT=1 to select people who are employed. Does anything look peculiar about the results? Then do the same for INCTOT over the range of incomes from $40,000-41,000. What is different about this tabulation? What might account for the difference?

Note and compare where the counts tend to heap in the two tabulations.
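One simple way to put a number on heaping is to compute the share of reported values that fall exactly on multiples of some round number. The Python sketch below does this for a short list of made-up values; the function name, the sample list, and the choice of base are all illustrative assumptions, and the actual counts should come from the PDQ-Explore tabulations.

```python
def share_at_multiples(values, base):
    """Return the fraction of values that are exact multiples of `base`."""
    if not values:
        return 0.0
    hits = sum(1 for v in values if v % base == 0)
    return hits / len(values)

# Made-up responses, not taken from the ACS file.
reported_values = [5, 10, 10, 12, 15, 15, 20, 20, 30, 33]
print(share_at_multiples(reported_values, 5))
```

If responses were spread evenly over whole numbers, roughly one value in `base` would be an exact multiple, so a much larger share suggests rounding by respondents.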

Exercise 11: What are the characteristics of married-couple households where the wife earns more than the husband or where the wife has higher education than the husband?

Suggestion: Use the SPOUSE custom item to select households where the wife earns more than the husband. Your Universe/Selection might be: Y2K=2006 & HHTYPE=1 & (RELATE=1 | RELATE=2) & SEX=1 & SPOUSE(INCWAGE)<999999 & INCWAGE

Why does the selection include (RELATE=1 | RELATE=2) and SPOUSE(INCWAGE)<999999? Whose occupations are we tabulating if we execute this query?

Note that in relatively complex setups such as this one, it is usually wise to run variations on the query to verify that we are getting what we think we are getting. What might we look at to confirm that our selection is correct?

Exercise 12: Tabulate POVERTY/100 by state (STATEFIP). Then “export” the result to save it to a file and subsequently load it into an Excel spreadsheet. Use the “Export” option under the File menu.

Hint: Set “Show Commas” off under the results option menu to suppress the display of commas in the results.

Exercise 13: Tabulate the first two RELATE categories (RELATE:1..2) by SEX for married-couple households (HHTYPE=1). We might expect the counts on the diagonal of this table to be equal, i.e., for each married male head, there should be a female spouse and for each married female head, there should be a male spouse. Why aren’t the counts equal?

Exercise 14: Use the CASE statement to collapse VEHICLES into categories of None, 1, 2, and 3 or more vehicles available. Then tabulate by state. Which states have the fewest vehicles per household? Which have the most?

Try the following: CASE(VEHICLES=9,0,VEHICLES=1,1,VEHICLES=2,2,3) as the column expression. To assign labels to the numeric codes, use the “Construct Custom Item/Recode” option under the “Query” tab.
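To make the logic of the CASE expression explicit, here is a Python analogue. It assumes that CASE tests its condition/value pairs from left to right and returns the final argument when no condition matches; this reading is inferred from the expression above rather than from PDQ-Explore documentation. The function name and the LABELS dictionary are illustrative.

```python
def collapse_vehicles(vehicles):
    """Python analogue of CASE(VEHICLES=9,0,VEHICLES=1,1,VEHICLES=2,2,3)."""
    if vehicles == 9:   # code 9 is recoded to category 0 ("None")
        return 0
    if vehicles == 1:
        return 1
    if vehicles == 2:
        return 2
    return 3            # every remaining code falls into the default category

# Labels corresponding to the four categories named in Exercise 14.
LABELS = {0: "None", 1: "1", 2: "2", 3: "3 or more"}

for code in (9, 1, 2, 4):
    print(code, "->", LABELS[collapse_vehicles(code)])
```

Note that any code not matched by one of the three conditions lands in the default category. If the file carries a not-applicable code for VEHICLES, it would be counted as "3 or more" unless the Universe/Selection excludes it, so it is worth checking the VEHICLES codes in the IPUMS documentation.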