Proceedings of Statistics Symposium 2010 Social Statistics: The Interplay among Censuses, Surveys and Administrative Data

1991- Mortality and Cancer Follow-Up Study

Paul A. Peters and Michael Tjepkema¹

Abstract

Census mortality linkages have proven to be powerful tools for analysing mortality differences across numerous population groups. In a recently approved record linkage, the 1991 Census of Population, Canadian Mortality Database, and Canadian Cancer Database will be linked in order to examine cancer incidence and causes of death in conjunction with socio-demographic and neighbourhood characteristics. The linkage of the 1991 Census cohort to these databases will allow for the analysis of mortality using the CMDB in conjunction with the extensive information from the 1991 Census of Population long-forms (2B and 2D), the recording of individual mobility over time using postal codes of tax filers from the Tax Summary Files, and the analysis of cancer morbidity via the CCDB. This presentation reviews the previous census mortality linkage, describes the new linkage, outlines the linkage process, and presents some initial linkage results.

Key Words: Census; Health; Mortality; Cancer; Data Linkage.

1. Introduction

This paper describes the development of and preliminary results from the linkage of the 1991 Census 2B (long form) with the Canadian Mortality Database (CMDB), Canadian Cancer Database (CCDB), and annual Tax Summary Files (TSF) (record linkage number 052-2009). This linkage is an expansion and extension of the earlier 1991- Mortality Follow-Up Study (Wilkins et al., 2008). The linkage of the 1991 Census cohort to these three databases allows for the analysis of mortality using the CMDB in conjunction with the extensive information from the 1991 Census, the recording of individual mobility over time, and the analysis of cancer morbidity.

The primary purpose of this expansion and extension is to assess the impact of long-term exposure to air pollution on human health, with the objective of informing the development of Canada-wide standards for key criteria air pollutants. The expanded linkage has three specific objectives. The first is to determine whether mortality from all causes combined, from ischaemic heart disease, from cardiopulmonary disease, from respiratory cancer, and from all cancers combined is associated with long-term exposure to ambient air pollutants. The second is to determine whether air pollution is associated with the risk of cancer incidence overall and with the risk of specific types of cancer. The third is to extend the 10-year follow-up of the 1991 Census cohort in order to examine cancer incidence and causes of death in conjunction with socio-demographic, disability, and neighbourhood characteristics over an additional 10-year period.

There is a recognised need for more environmental data related to human exposure, with linkages of separate sources of information identified as an important way to fill identified data gaps (Statistics Canada, 2008). In particular, findings from a recent health-environment expert panel report noted that it is critical to investigate the health of those who live, work, or are educated near sources of pollution. This linkage request addresses these key recommendations with the development of a significant baseline cohort that could be used to evaluate the effects of environmental exposure on human health outcomes.

¹ Paul A. Peters, Health Analysis Division, RHC-24M, 100 Tunney’s Pasture Driveway, Ottawa, ON, Canada, K1A 0T6; Michael Tjepkema, Health Analysis Division, RHC-24Q, 100 Tunney’s Pasture Driveway, Ottawa, ON, Canada, K1A 0T6

2. Background and Purpose

2.1 Background

The 1991-2001 Census Mortality Follow-Up Study forms the basis for the current expanded linkage. The 1991-2001 linkage created a cohort of individuals with the goal of producing indicators of mortality for monitoring health disparities across regions and among socio-economic groups, Aboriginal peoples, and immigrant groups in Canada (record linkage number 012-2001). The linkage included the 1991 Census of Population (long forms), the 1991 Health and Activity Limitation Survey (HALS), the 1990 and 1991 TSF, and mortality data between 1991 and 2001 from the CMDB. The 1991-2001 linkage yielded a cohort of approximately 15% of the Canadian non-institutional resident population aged 25 and older in 1991.

This sample of 2,735,152 Census respondents was first linked via probabilistic record linkage to the 1990 and 1991 T1 personal tax files, matching on date of birth, sex, marital status and postal code. This step obtained a name from the TSF for use in the linkage to death registrations in the CMDB from 1991 to 2001. To preserve respondent anonymity, names were encrypted on the T1 and the CMDB prior to linkage. The Social Insurance Number (SIN) from the tax file was not used in the linkage, and only deaths and emigration recorded on the TSF were put into the analysis file and used to calculate person-years at risk.
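The scoring step in a probabilistic linkage of this kind can be sketched as follows. This is a minimal illustration in the Fellegi-Sunter style, not the procedure actually used for the study: the m- and u-probabilities, the threshold, and the function names are all hypothetical.

```python
import math

# Hypothetical agreement probabilities for each linkage field (not from the study):
# m = P(field agrees | the two records belong to the same person)
# u = P(field agrees | the two records belong to different people)
FIELD_PROBS = {
    "date_of_birth": (0.97, 0.0001),
    "sex": (0.99, 0.5),
    "marital_status": (0.90, 0.3),
    "postal_code": (0.85, 0.001),
}


def match_weight(census_rec, tax_rec, field_probs=FIELD_PROBS):
    """Sum the log2 agreement or disagreement weight over all linkage fields."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        if census_rec[field] == tax_rec[field]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


def best_link(census_rec, tax_candidates, threshold=10.0):
    """Return the highest-scoring tax record at or above the threshold, else None."""
    scored = [(match_weight(census_rec, t), t) for t in tax_candidates]
    if not scored:
        return None
    weight, best = max(scored, key=lambda pair: pair[0])
    return best if weight >= threshold else None
```

Fields that rarely agree by chance, such as date of birth and postal code, carry most of the weight, while agreement on sex alone adds very little.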

The intention of the original linkage of the 1991 Census cohort to mortality was to create a set of baseline indicators of mortality, in order to monitor health disparities in the Canadian population. Initial work on the health effects of ambient air pollution using the existing linked analysis file is underway via a joint project of Statistics Canada and Health Canada researchers. However, there are several limitations to the 1991 Census cohort linked file. First, it does not include place of residence information for respondents on an annual basis after the 1991 Census. This information is required in studies of air pollution to measure the residential mobility of the population, as air pollution is known to vary significantly at the local level (Jerrett et al., 2009).

Second, the analysis file is limited in scope given the number of deaths which occurred in this cohort (260,820 deaths from 1991 to 2001). A larger number of deaths is required in order to obtain sufficient statistical power to accurately measure the long-term effects of air pollution (Krewski et al., 2009). As such, a longer follow-up period is required in order to examine the effect of air pollution on the risk of mortality and cancer incidence.

Third, the existing linkage does not include the CCDB, so no information is available on cancer incidence in this cohort. Few studies have examined the risks of long-term exposure to ambient air pollution on cancer incidence and mortality. In particular, the proposed linkage to the CCDB will allow for analysis of cancer incidence rather than only deaths due to cancer.

2.2 Purpose

The primary purpose of the expanded linkage is to provide a cohesive dataset for analyzing the health effects of long-term exposure to air pollution on Canadians. A secondary use of the linked database will be to examine the relationship of socio-demographic and neighbourhood characteristics to cancer incidence and causes of death.

Extension of the 1991 Census cohort follow-up linkage will allow for a substantial increase in the number of deaths over the longer time period, and the expansion of the linkage to annual tax data will result in a reduction in potential misclassification of exposure to air pollution. Since air pollution mortality risks are small, on the order of 10% increased risk over the exposure range seen in Canada, a large cohort size is required to obtain sufficient statistical power to detect such effects. The mortality follow-up on the 1991 Census cohort to 2011 is expected to more than double the number of deaths included in the estimates of years of life lost. This additional power will allow for the identification and analysis of susceptible population sub-groups who may have different levels of risk. Regional risk estimates can also be calculated and potential factors identified that correspond to different levels of risk between geographic regions.

In addition, there has been a considerable improvement in the monitoring of air pollution since 2000. New techniques of measuring exposure using satellite imagery or land-use regression modelling allow for significantly improved estimates of exposure. The combination of these methods with the large cohort included in the linked database allows for a wider geographic coverage for exposure estimates and the ability to link information on a greater number of individuals to air pollution estimates.

A key component of this linkage is the inclusion of place of residence from the Tax Summary Files, permitting the recording of Census respondents’ post-censal mobility. This is important because exposure to ambient air pollution is assigned based on the respondent’s place of residence. Research in Canada has shown that air pollution risks vary at the local level (Burnett et al., 1998). Thus, detailed place of residence information is required to accurately attribute exposure values to individuals. Research in the US has shown that air pollution-related mortality risks vary widely across that country (Krewski et al., 2009). Thus, estimates of regional risk in Canada are also expected to vary substantially. The TSF provide a means of obtaining the six-character postal code of Census respondents on an annual basis after 1991.
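To illustrate how annual postal codes could feed exposure assignment, the sketch below averages a pollutant concentration over the years in which a respondent's place of residence is known. It is an assumption-laden illustration only: the function name, the data layout, and the simple unweighted mean are hypothetical and are not the study's exposure model.

```python
def mean_exposure(postal_by_year, pollution_by_postal_year):
    """Average annual exposure over years with a known postal code and estimate.

    postal_by_year: {year: postal code, or None when no tax return was filed}
    pollution_by_postal_year: {(postal code, year): pollutant concentration}
    Returns (mean concentration, number of years used), or (None, 0).
    """
    values = []
    for year, postal in postal_by_year.items():
        if postal is None:
            continue  # place of residence unknown for this year
        concentration = pollution_by_postal_year.get((postal, year))
        if concentration is not None:
            values.append(concentration)
    if not values:
        return None, 0
    return sum(values) / len(values), len(values)
```

Returning the number of contributing years alongside the mean lets an analyst restrict attention to respondents with sufficiently complete residential histories.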

2.3 Benefits / Public Good

It is now widely recognized that exposure to outdoor air pollution generated from combustion sources poses a public health risk to Canadians (Koranteng et al., 2007; Samoli et al., 2008). However, much of the scientific evidence is focussed on linking short-term or acute exposure to the exacerbation of health problems. For example, exposure to elevated daily outdoor concentrations of combustion-related pollution, such as ground-level ozone and fine particulate matter, has been linked to increased asthma symptoms (Burra et al., 2009), visits to emergency departments for heart, lung, and circulatory problems (Szyszkowicz, 2008; Szyszkowicz et al., 2009), hospital admissions (Stieb et al., 2002; Villeneuve et al., 2006) and mortality (Burnett et al., 1998; 2000; 2003). Much of the evidence of acute health effects on Canadians has been obtained through studies of hospital admission and mortality data conducted in a continuing collaborative effort between Health Canada and Statistics Canada since 1992. However, detailed cohort studies in Canada have been limited by the data sources available and have often relied on extrapolation to the Canadian context from US sources.

Several cohort studies of mortality and air pollution have been conducted in the United States. The largest American study is the Cancer Prevention Study II, conducted by the American Cancer Society (ACS) (Pope et al., 1995). In that study, 1.1 million subjects were interviewed in 1982 and followed for vital status to the present. The results of that study and the extended re-analysis show links between long-term fine particulate exposure and deaths from ischaemic heart disease and lung cancer (Krewski et al., 2009). Importantly, other cohort studies in the US show that long-term exposure to fine particulate matter is associated with mortality risks ten times greater than those of short-term exposures estimated from time-series studies. A paper published recently in the New England Journal of Medicine using the ACS study demonstrates for the first time that long-term exposure to ozone is linked to increased deaths attributed to respiratory causes (Jerrett et al., 2009). The results of studies using the ACS study have led the US Environmental Protection Agency to promulgate a national annual average standard, in addition to its 24-hour standard, which safeguards against extreme short-term exposures (U.S. EPA, 2004).

To date, Canada-wide standards for annual averages of either fine particulate matter or ozone have not been developed, largely due to a lack of evidence from the Canadian population and uncertainties about transferring risks observed in the US to Canada (Raisenne, 2003). A recent multi-national collaborative study between Canada, the US, and Europe, which examined the association between short-term air pollution exposure and mortality, supports this concern about transferability (Samoli et al., 2008). The results of that study indicate that risk estimates based on Canadian populations are twice as large as those based on US and European studies, highlighting the importance of obtaining direct estimates of risk for Canadians.

In addition, the risk of developing cancer in the general population due to ambient exposure to carcinogens from fossil fuel combustion and other sources is not well defined. While fine particulate air pollution from combustion can be linked to deaths from lung cancer, the relationship to incidence of cancer is not well understood (Krewski et al., 2009). This expanded cohort linkage study provides important information on whether exposure to combustion-related ambient pollution plays a role in developing cancer in the Canadian population. Linkage of the cohort to the CCDB will provide the means to assess the risk of cancer incidence.

3. Linkage Methodology

The basis for the linkage is the 1991 Census of Population 2B and 2D. In the 1991 Census of Population, one in five households (20%) in non-remote areas of Canada received a Census 2B (long form) questionnaire. The Census 2D (long form) “Canvasser questionnaire” was used to enumerate all households in remote northern areas of Canada and on Indian reserves. The 2B and 2D questionnaires contain all the questions from the Census 2A (short form) plus additional questions on topics such as education, ethnicity, mobility, income, employment and dwelling characteristics. In 1991, the total population who completed the long-form questionnaires was about 3.6 million individuals. The 1991 Census cohort used for this linkage study includes only individuals aged 25 and older.

To facilitate the linkage, the TSF are used as a “name file” that bridges individuals on the Census, the CMDB, and the CCDB (Figure 3-1). The TSF from 1991 to 2007 were also used to create a mobility and migration component. The TSF is an annual file derived from personal tax returns filed for the reference year. For this study, the TSF were used to improve data linkage, to confirm that cohort members were alive during follow-up, and to record mobility, where available, using the postal code.

Figure 3-1 Linkage methodology flowchart for 1991-2011 Canadian Census Mortality and Cancer Follow-Up Study

Mortality information was drawn from the CMDB, which contains demographic and medical (cause of death) information on deaths registered by all provincial and territorial vital statistics registries in Canada. The CMDB comprises all deaths that occur within Canada, as well as deaths of Canadian residents reported by some US states. Cancer incidence was taken from the CCDB, which contains diagnosed cases of cancer reported for all individuals whose usual place of residence is Canada or who are non-permanent residents. For cancer analysis, it is important to begin the study period (1991 to 2011) with a group of disease-free (no cancer diagnosis) individuals. To accomplish this, record linkage of the cohort to the cancer incidence data from 1969 to 2006 will be used to flag the records of Census respondents who were diagnosed with cancer before 1991.

On the 1991 Census cohort file, Statistics Canada assigned a randomly-generated Statistics Canada number to each respondent, and the linkage variables were pre-processed so that they are compatible with variables on the tax, cancer, and mortality databases. The cohort file was then linked to the TSF from 1990 to 2007, and variables from the tax files were appended to the cohort file for use in the mortality linkage and for retention in the migration output file. Each year of the TSF was used to identify additional deaths in the cohort not ascertained by the CMDB. Income data were not retained on this file.

The cohort file will be linked to the 1969 to 2006 CCDB using probabilistic linkage techniques to select the best of several linkages. A cancer output file will be prepared, containing the randomly-generated Statistics Canada respondent number and cancer incidence information for those in the 1991 Census cohort whose records linked to the CCDB. Names and other identifiers, such as cancer registry number, will not be included in this file.
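The flagging of respondents diagnosed before 1991 can be sketched as below. This is only an illustration under stated assumptions: the function names, the use of diagnosis year as the comparison field, and the `(respondent number, year)` record layout are hypothetical and do not describe the structure of the CCDB.

```python
def flag_prevalent_cases(cohort_ids, cancer_records, baseline_year=1991):
    """Map each cohort ID to True if any linked diagnosis predates the baseline year.

    cohort_ids: iterable of randomly-generated respondent numbers
    cancer_records: iterable of (respondent number, diagnosis year) pairs
    """
    prevalent = {pid: False for pid in cohort_ids}
    for pid, diagnosis_year in cancer_records:
        # Records that did not link to a cohort member are ignored.
        if pid in prevalent and diagnosis_year < baseline_year:
            prevalent[pid] = True
    return prevalent


def disease_free_at_baseline(cohort_ids, cancer_records, baseline_year=1991):
    """Return cohort IDs with no linked cancer diagnosis before the baseline year."""
    cohort_ids = list(cohort_ids)
    flags = flag_prevalent_cases(cohort_ids, cancer_records, baseline_year)
    return [pid for pid in cohort_ids if not flags[pid]]
```

Respondents diagnosed during follow-up remain in the disease-free group at baseline and contribute incident cases to the analysis.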

3.2 Analytic Methodology

The methodology for analysis of ambient air pollution will be similar to that being used in an on-going project, in which the 1991 Census cohort linked to the 1991 to 2001 mortality data is analysed using ambient air pollution concentrations from various geographic locations across Canada. The analysis planned for the expanded linkage to the 1991 Census cohort will develop time-varying estimates of outdoor pollution concentrations for sulphur dioxide, nitrogen dioxide, and fine and coarse particulate matter across the study period (1991-2011).

The models used to estimate exposure to air pollution will draw on information such as fixed-site ambient monitoring data, land-use characteristics, satellite information, visibility at airports, proximity to major roads, and traffic counts. Fixed-site measurements from the National Air Pollution Surveillance (NAPS) Network are not available for every pollutant for every year between 1991 and 2001, but are available for most areas beyond 2001. Where necessary, exposure values will be estimated via a combination of land-use regression modelling, satellite interpolation, and kriging surfaces (Lamsal et al., 2010; Ontario Medical Association, 2005). Sensitivity analysis will be conducted on each exposure model to evaluate the model strength (Krewski et al., 2009).

Relating mortality risk factors to longevity requires the use of survival models. Analysis of the 1991 Census cohort will use newly-developed computer software which implements a Cox proportional-hazards survival model with spatially autocorrelated random effects at several levels of geographic nesting, such as province, community, and neighbourhood, with each level incorporating a spatial autoregressive error process. A detailed explanation of the methodology can be found in Krewski et al. (2009).
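As a point of reference, a Cox proportional-hazards model with a single level of area random effects is commonly written in the generic form below. The notation is an assumption made here for illustration and is not taken from Krewski et al. (2009), whose specification should be consulted for the model actually used.

```latex
% Generic random-effects Cox model (illustrative notation only):
%   h_{ij}(t)  hazard at time t for individual i living in area j
%   h_0(t)     unspecified baseline hazard
%   x_{ij}     covariates, including the estimated pollutant exposure
%   \beta      log hazard ratios to be estimated
%   U_j        random effect for area j, with Cov(U_j, U_k) \neq 0 for nearby areas
h_{ij}(t) = h_0(t)\,\exp\!\left(\beta^{\top} x_{ij} + U_j\right)
```

Under this form, several levels of geographic nesting would correspond to one such random-effect term per level, and a risk of about 10% per unit of exposure corresponds to a hazard ratio of $\exp(\beta) \approx 1.10$.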

4. Discussion

Census respondents were eligible for the cohort if they were usual, non-institutional residents of Canada on the day of the Census, were in the long-form Census records, and had reached age 25 by Census day. These individuals are those considered “in-scope” in Table 4-1. However, only those linked to a name file could be reliably followed for mortality, and approximately 20% of in-scope records could not be linked. As shown in Table 4-1, 2,860,244 individuals were linked to a name file and were thus eligible to be part of the cohort. To reduce the final cohort to 15% of the Canadian population aged 25 and older, a random sample of 4.4% was removed, leaving a final cohort size of 2,734,835 persons.

Table 4-1
Derivation of the cohort from 1991 Census records, Canada

                                                                          Number
Derivation of cohort
  In-scope census records¹                                             3,576,487
  Not linked to name file                                                716,243
  Linked to name file (1990 and 1991 TSF)                              2,860,244
  Linked to name file but not followed for deaths                        125,409
  Linked to name file and followed for deaths (the cohort)²            2,734,835
  Died during the follow-up period                                       426,979
  Deaths ascertained by CMDB                                             409,711
  Deaths ascertained by tax only                                          17,268
  Followed for mobility from tax summary files (1991-2007)                 96.7%

Percentage of population
  1991 mid-year population estimate for population aged 25 and older³ 18,225,349
  Cohort as a percentage of the population aged 25 and older               15.0%

¹ Non-institutional residents of Canada aged 25 or older with long-form questionnaire
² Random sample of 4.4% of those linked to name file
³ CANSIM table 051-0001/3604
Source: Canadian Census Mortality and Cancer Follow-Up Study, 1991-2006

Of the cohort, 426,979 died during the follow-up period. Of these, 17,268 deaths were ascertained only from tax records and thus have no associated information on the cause, location, or nature of the death. From the TSF, 96.7% of the cohort was followed for mobility between 1990 and 2007; however, for any year in which an individual did not file a tax return, place of residence is missing.
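The counts reported in Table 4-1 can be cross-checked with simple arithmetic. The short script below uses only the figures from the table; the variable and function names are chosen here for illustration.

```python
# Counts as reported in Table 4-1.
IN_SCOPE = 3_576_487
NOT_LINKED = 716_243
LINKED = 2_860_244
NOT_FOLLOWED = 125_409
COHORT = 2_734_835
DEATHS_TOTAL = 426_979
DEATHS_CMDB = 409_711
DEATHS_TAX_ONLY = 17_268
POPULATION_25_PLUS = 18_225_349


def table_shares():
    """Check that the counts add up, then return the derived percentages."""
    assert IN_SCOPE - NOT_LINKED == LINKED
    assert LINKED - NOT_FOLLOWED == COHORT
    assert DEATHS_CMDB + DEATHS_TAX_ONLY == DEATHS_TOTAL
    return {
        "not_linked_pct": round(100 * NOT_LINKED / IN_SCOPE, 1),
        "removed_pct": round(100 * NOT_FOLLOWED / LINKED, 1),
        "cohort_pct_of_population": round(100 * COHORT / POPULATION_25_PLUS, 1),
    }


print(table_shares())
```

The derived shares reproduce the approximately 20% of in-scope records not linked, the 4.4% random sample removed, and the 15.0% population coverage stated in the text.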

In the absence of large-scale nationally-representative longitudinal health surveys, cohort datasets such as that described here allow for the study of numerous aspects of population-health inequalities and outcomes, particularly as related to environmental exposures. The first census mortality linkage has resulted in over a dozen major publications on a range of topics. The expanded cohort created by the addition of cancer incidence, mobility, and additional years of mortality will facilitate numerous additional analyses of key health outcomes by population-based socio-economic indicators and individual characteristics.

References

Burnett, R.T., Brook, J., Dann, T., et al. (2000). Association between particulate- and gas-phase components of urban air pollution and daily mortality in eight Canadian cities. Inhalation Toxicology, 12(suppl 4), pp. 15–39.

Burnett, R.T., Cakmak, S., and Brook, J.R. (1998). The effect of the urban ambient air pollution mix on daily mortality rates in 11 Canadian cities. Canadian Journal of Public Health, 89, pp. 152–156.

Burnett, R.T., and Goldberg, M.S. (2003). Size-fractioned particulate mass and daily mortality in eight Canadian cities. Revised analysis of time-series studies of air pollution and health, Boston, MA: Health Effects Institute, pp. 85-89.

Burra, T.A., Moineddin, R., Agha, M.M., and Glazier, R.H. (2009). Social disadvantage, air pollution, and asthma physician visits in , Canada. Environmental Research, 109, pp. 998-1003.

Statistics Canada (2008), Health environment linkages expert panel report to Statistics Canada, Environment Accounts and Statistics Division, December 19, 2008.

Jerrett, M., Burnett, R.T., Pope, C.A. III, et al. (2009). Long-term ozone exposure and mortality. New England Journal of Medicine, 360, pp. 1085-1095.

Jerrett, M., Finkelstein, M., Brook, J.R., et al. (2009). A cohort study of traffic-related air pollution and mortality in Toronto, Ontario, Canada. Environmental Health Perspectives, 117(5), pp. 772-777.

Koranteng, S., Osornio Vargas, A.R., and Buka, I. (2007). Ambient air pollution and children’s health: A systematic review of Canadian epidemiological studies. Paediatric and Child Health, 12, pp. 225-233.

Krewski, D., Jerrett, M., Burnett, R.T., et al. (2009). Extended follow-up and spatial analysis of the American Cancer Society study linking particulate air pollution and mortality. Boston, MA: Health Effects Institute.

Lamsal, L.N., Martin, R.V., van Donkelaar, A., et al. (2010). Indirect validation of tropospheric nitrogen dioxide retrieved from the OMI satellite instrument: insight into the seasonal variation of nitrogen oxides at northern midlatitudes. Journal of Geophysical Research, 115.

Ontario Medical Association (2005). The Illness Costs of Air Pollution: 2005-2026, Health and Economic Damage Estimates, Toronto: Ontario Medical Association.

Pope, C.A. III, Thun, M.J., Namboodiri, M.M., et al. (1995). Particulate air pollution as a predictor of mortality in a prospective study of US adults. American Journal of Respiratory and Critical Care Medicine, 151, pp. 669-674.

Raisenne, M. (2003). Science and regulation – U.S. and Canadian overview. Journal of Toxicology and Environmental Health, Part 4, 66, pp. 1503-1506.

Samoli, E., Peng, R., Ramsay, T., et al. (2008). Acute effects of ambient particulate matter on mortality in Europe and North America: Results from the APHENA study. Environmental Health Perspectives, 116, pp. 1480-1486.

Stieb, D.M., Smith-Doiron, M., Brook, J.R., et al. (2002). Air pollution and disability days in Toronto: results from the National Population Health Survey. Environmental Research Section A, 89, pp. 210-219.

Szyszkowicz, M., Rowe, B.H., and Kaplan, G.G. (2009). Ambient sulphur dioxide exposure and emergency department visits for migraine in Vancouver, Canada. International Journal of Occupational Medicine and Environmental Health, 22, pp. 7-12.

Szyszkowicz, M. (2008). Ambient air pollution and daily emergency department visits for ischemic stroke in , Canada. International Journal of Occupational Medicine and Environmental Health, 21, pp. 295-300.

U.S. Environmental Protection Agency (2004). Air quality criteria for particulate matter EPA 600/P-99/002aF-bF. Washington DC: U.S. Environmental Protection Agency.

Villeneuve, P.J., Chen, L., Stieb, D., and Rowe, B.H. (2006). Associations between outdoor air pollution and emergency department visits for stroke in Edmonton, Canada. European Journal of Epidemiology, 21, pp. 689-700.

Wilkins, R., Tjepkema, M., Mustard, C., and Choinière, R. (2008). The Canadian census mortality follow-up study, 1991 through 2001. Health Reports, 19, pp. 25-43.
