The Human Mortality Database: Current status and on-going developments

Magali Barbieri (Univ. California, Berkeley; INED, Paris)
Dmitri Jdanov (Max Planck Institute for Demographic Research, Rostock)

Acknowledgement: this presentation is based on the work conducted by many members of the HMD team over the years, at both the University of California, Berkeley, and the Max Planck Institute for Demographic Research (MPIDR), Rostock.

House of Finance Days, Université Paris-Dauphine, HMD users meeting, March 8, 2019

Background

• Goal of the HMD: To provide detailed high-quality mortality and population data free of charge to all persons interested in the history of human longevity
• 50,000+ registered users:
  – Academics (5,000 publications)
  – Students
  – Actuaries
  – Policy analysts
  – Journalists
  – Other corporations (pension funds, banks, etc.)

What is in the HMD?

• Detailed historical data and supporting documentation for 40 national populations:
  – Death counts and estimated population exposures (person-years lived) at the finest detail possible
  – Original estimates of age-specific death rates and life tables in various formats (age x time)
• Computed using various forms of input data:
  – Death counts from national statistical offices
  – Census counts
  – Birth counts
  – Official population estimates

Additional information in HMD funders and sponsors

Grants and donations:

Support provided by the U.S. National Institute on Aging (grants R01-AG011552 and R01-AG040245), the U.K. Institute and Faculty of Actuaries, the Canadian Institute of Actuaries, the Dutch Royal Actuarial Association, AXA, Hannover-Re, Milliman, RGA, SCOR and the Society of Actuaries.

Disclaimer: The author is solely responsible for the content of this presentation, which does not necessarily represent the official views of the National Institutes of Health and other sponsors.

Who is responsible for the HMD?

Two teams of researchers:
• Max Planck Institute for Demographic Research (in Rostock) led by Vladimir Shkolnikov, Director
• UC Berkeley (Dept of Demography) led by Magali Barbieri, Associate Director (previously John Wilmoth, Founding Director)

John R. Wilmoth: Founding Director, UCB in 2000, now UN
Vladimir M. Shkolnikov: Director, MPIDR
Magali Barbieri: Associate Director, Head of the UCB Team, UCB & INED
Dmitry Jdanov: Head of the MPIDR Team, MPIDR

Max Planck Team and Berkeley Team (members present and some former):
Gabriel Borges, Domantas Jasilionis, Dana Glei, Evgeny Andreev, Sebastian Kluesener, Kirill Andreev, Carl Boe, Tim Riffe, Vladimir Canudas-Romo, Pavel Grigoriev, Eva Kibele, Sigrid Gellers, Celeste Winant, Monica Alexander, Rembrandt Scholz, Lisa Yang

HMD Project Staff (March 2019)

• Directors (1 + 1 = 2)
• Country specialists (4 + 5 = 9)
• Administrative assistants (2)
• Others providing technical support (4 + 1 = 5)

Guiding principles

• Comparability
  – Over time (from 1751 to 2017)
  – Across countries (40 mostly high-income)
• Accessibility
  – Free and easy access to data and metadata
• Flexibility
  – Data files in multiple formats
• Reproducibility
  – Access to all initial (input) data
  – Full documentation
  – HMD scripts are freely available
• Quality control
  – Standardized rigorous data quality checks for regular updates
  – Intensive data check procedures and research work for new countries
  – Work with external experts

Core activities: HMD country updates (1)

• Update of all HMD countries on a rotating basis
• Priority countries:
  – The US
  – The UK and components
  – Germany
  – France
  – Russia
  – Canada

Core activities: HMD country updates (2)

• Steps involved in country updates:
  1. Collect data (births, deaths, and populations from NSOs – publicly available or customized tables)
  2. Format as standard input files
  3. Prepare cocktail script (from a palette of standard HMD computer routines)
  4. Run script and adjust if need be
  5. Check internal and external consistency of output (automatic diagnostic charts and other standard verifications) => exchange with in-country experts
  6. Update all documentation files (internal and public)
  7. Submit to HMD Directors for verification
  8. Publish on HMD website
=> 1 to 3 weeks per country (sometimes more if particular issues arise)

Core activities: Investigate new countries

• Recently added countries:
  – Greece
  – Croatia
  – South Korea
• Other countries investigated
  – Costa Rica
  – Moldova
• In progress / plans
  – Serbia
  – Romania
  – EU28
  – Hong Kong

Core activities: Improve the HMD methods

• Motivation: increase accuracy of mortality estimates
• Current Methods Protocol = Version 6 (Dec. 2017 => all countries updated to this new version in 2018)
• Version 7: work in progress
  – New inter-censal method
  – Old age mortality

Why HMD?

How HMD is used by actuaries

Three major applications
1. Standard for relational models (to link client pools to national population)
2. Analyses of variability in risks over time and across populations
3. Mortality improvement models (mainly for model development and experimentation)

An example: life expectancy in Moldova

All correct figures are highlighted in yellow

Numerator-denominator bias: an example of Moldova

The problem: systematic bias leads to an underestimation of mortality and fertility. Deaths and births refer to the de facto population (i.e. they occurred within the country), while population estimates also include long-term emigrants (Moldovan citizens living abroad).

* Since 1998 official population counts do not include Transnistria region

The solution: population estimates were corrected using data on border crossings and additional data collected in the 2004 census.
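The numerator-denominator bias described above can be illustrated with a minimal Python sketch. The figures are hypothetical and are not Moldovan data; the function names are introduced here for illustration only. The point is simply that a denominator inflated by emigrants lowers the computed rate in proportion to the inflation.

```python
def crude_death_rate(deaths: float, population: float) -> float:
    """Deaths per 1,000 person-years, using the population as the exposure."""
    return 1000.0 * deaths / population


def relative_bias(official_population: float, resident_population: float) -> float:
    """Relative error of a rate computed with the official denominator
    instead of the de facto (resident) one. Negative = underestimation."""
    return resident_population / official_population - 1.0


# Hypothetical figures, for illustration only (assumption, not source data).
deaths = 40_000              # deaths occurring within the country
official_pop = 3_500_000     # official estimate, includes long-term emigrants
resident_pop = 2_900_000     # de facto population actually at risk

rate_official = crude_death_rate(deaths, official_pop)
rate_corrected = crude_death_rate(deaths, resident_pop)
bias = relative_bias(official_pop, resident_pop)

print(f"Rate with official denominator : {rate_official:.2f} per 1,000")
print(f"Rate with corrected denominator: {rate_corrected:.2f} per 1,000")
print(f"Relative bias of official rate : {bias:.1%}")
```

The same proportional argument applies to age-specific rates, which is why correcting the population estimates (rather than the death counts) removes the bias.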

Source: Penina, Jdanov, Grigoriev (2015)

Data challenges

Censuses and assessment of the population denominator

Bulgaria: correction of population data

The standard HMD inter-censal method is not applicable to the period 1985-1992 because of an irregular pattern of out-migration. In 1985-88, international migration was very restricted in Bulgaria. The collapse of communism in 1989 was followed by mass emigration (mostly of the Turkish minority) over the next several years.

Trends in the total number of males and females. Bulgaria, 1961-2003. Official population estimates (left) and HMD data (right). The census years 1985, 1992 and 2001 are marked on the plots.
Source: Jasilionis D., Jdanov D.A. Human Mortality Database: Background and Documentation for Bulgaria

HMD solution: official population estimates were used for 1985-88, but new population estimates were calculated for the later period. The year 1988 was treated as a “pseudo-census point” marking the beginning of the inter-censal interval.

The HMD inter-censal estimates for Germany

1) Using additional migration data and cubic spline interpolation for migration trends across cohorts, we removed the population changes due to the earlier “cleaning” by the statistical offices.
2) We distributed the accumulated error (not the net migration!) uniformly over the adjustment period of 24 years (30 years for East German lands).

Data challenges

Changeable population definitions across time

Changes in the definition of population: Poland

In the 2000s, Poland faced massive out-migration following the EU enlargement of 2004. It was expected that the population counts would be corrected downward after the next population census, in 2011. But Statistics Poland unexpectedly decided to change the official definition of the population from permanently resident (in force in 2010 and earlier) to usually resident (from 2011 onward). Statistics Poland did not re-estimate age-specific population counts back to the previous census. Because of the irregular migration pattern, the standard HMD inter-censal method for reconstructing annual population estimates is not applicable.

Figure: Official and adjusted (Tymicki et al., 2015) estimates of population of Poland, males and females, 1960-2014. Series shown: post-censal population estimates calculated according to the 1960, 1970 and 1988 censuses; pre- and post-censal population estimates according to the 2002 census; post-censal population estimates according to the 2011 census; unofficial inter-censal estimates based on the 2011 census.

Data challenges

Mortality at advanced ages

Growing problems at advanced ages

Figure: Russia, life expectancy at age 90, by year, 1980-2015. Series shown: males and females (standard HMD); males and females (SR80).

Free & open access to all data

Open Data

Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.

Re-use and Redistribution: the data must be provided under terms that permit re-use and redistribution including the intermixing with other datasets.

Universal Participation: everyone must be able to use, re-use and redistribute - there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.

Thorough documentation of data, data sources, and computational methods

Data Sources:
- Computed from microdata
- Computed from tabulated data from government
- Computed from tabulated data from report
- Computed from tabulated population and death data
- Final report
- Others
- Preliminary report
- Unknown

Challenge

How can high-quality data repositories compete with the “quick” data solutions covering the entire world and easily available online?

Ongoing work and future plans

• Adding cause-of-death data
• Sub-national data
• Less (statistically) developed countries
  – Latin America
  – China
  – India

Ongoing work and future plans: developing countries

Coverage of death registration (December 2014)

Source: UN Population Division (http://unstats.un.org/unsd/demographic/CRVS/CR_coverage.htm)

Data availability in the HMD (March, 2019)

HMD countries

Principal data sources on mortality in China

CENSUSES OR SURVEYS BY NATIONAL BUREAU OF STATISTICS (NBS)
• Population censuses: 1982, 1990, 2000, and 2010.
  - Enumeration of people who died in a household one year or 18 months before the census or survey.
• Inter-censal 1% sample surveys: 1987, 1995, and 2005.
• Annual Population Change Surveys: smaller surveys for inter-censal years.

HOUSEHOLD REGISTRY (“HUKOU”) BY MINISTRY OF PUBLIC SECURITY - each resident is legally required to register in the household registration system, registration to be cancelled within a month after death. Serves as basis for census.

VITAL REGISTRATION / SURVEILLANCE SYSTEMS BY HEALTH MINISTRY
• Nationwide Vital Registration System: 8% of the national population, ca. 110 million people (2005 est.), mostly urban, Eastern China (Rao et al. 2005).
• Disease Surveillance Points (DSP): 161 surveillance points, ~10 million people.
• National Child and Maternal Mortality Surveillance Points: 336 counties / urban districts covering 140 million people, child and maternal mortality.

China and Sweden (1950+), Male

Underestimated Infant Mortality

No accidental mortality hump

Death underregistration at the oldest-old ages

Overestimated mortality for ages 1-15

Source: Human Mortality Database and China Census Data

Life Expectancy for Chinese Males

Life Expectancy for Chinese Females

Harmonized series of mortality estimates for India and its major states using Sample Registration System (SRS) and survey data

Infant mortality rates from NFHS III (2001-05) and SRS (2002-06) for 16 Indian states

SRS: a nationwide system for collecting vital statistics based on a dual record system for a sample of villages/urban blocks.
Key features:
• Set up in the late 1960s, age-specific mortality estimates / life tables from 1970-75 onwards
• Covers major states by urban/rural breakdown
• Coverage: ~7.6 million pop. (2014)
Problems: age heaping, over-estimation of old age mortality…

Temporary life expectancy between exact ages 0 and 60 in India, 1970-75 to 2000-04.

Ongoing and future studies in cooperation with Usha Ram (IIPS, Mumbai, India):
- Data quality and coverage (NFHS, DLHS, SRS), with a focus on the effects of age heaping.
- Estimation of adult mortality by social status using District Level Household Survey (DLHS) & National Family Health Survey (NFHS4).
- Examining validity of cause of death data.
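For readers unfamiliar with the indicator used in the India series, temporary life expectancy between exact ages 0 and 60 is the average number of years lived between those ages by a newborn, conventionally written e(0,60) = (T_0 - T_60) / l_0, i.e. the person-years lived in the age intervals below 60 divided by the life table radix. The sketch below is a minimal Python illustration under that standard definition; the abridged life table values are hypothetical (not SRS or HMD data) and the function name is introduced here for illustration.

```python
from typing import Sequence


def temporary_life_expectancy(
    ages: Sequence[int],
    lx: Sequence[float],
    Lx: Sequence[float],
    start: int,
    end: int,
) -> float:
    """Average years lived between exact ages `start` and `end`
    by survivors to age `start`.

    ages[k] is the lower bound of age interval k, lx[k] the survivors to
    that age, and Lx[k] the person-years lived in the interval.
    """
    ages = list(ages)
    if not (len(ages) == len(lx) == len(Lx)):
        raise ValueError("ages, lx and Lx must have the same length")
    i, j = ages.index(start), ages.index(end)  # ValueError if age is absent
    if j <= i:
        raise ValueError("end age must be greater than start age")
    # Sum person-years in intervals [start, end) and divide by survivors at start.
    return sum(Lx[i:j]) / lx[i]


# Hypothetical abridged life table with 20-year intervals (illustration only).
ages = [0, 20, 40, 60, 80]
lx = [100_000, 95_000, 90_000, 80_000, 40_000]
Lx = [1_950_000, 1_850_000, 1_700_000, 1_200_000, 300_000]

e_0_60 = temporary_life_expectancy(ages, lx, Lx, start=0, end=60)
print(f"Temporary life expectancy between ages 0 and 60: {e_0_60:.1f} years")
```

With no deaths before age 60 the indicator would equal 60, so the shortfall from 60 measures years lost to mortality below that age; restricting the age range this way avoids relying on old-age data, where the slide notes over-estimation problems.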

Graphs from: Saikia, Jasilionis, Ram, Shkolnikov, 2011.

General goal of the AXA project

• To construct HMD-like life table series for Hong Kong and for Mexico and assess their accuracy for monitoring actuarial longevity risks
• Collaboration between
  – the University of California, Berkeley team of the HMD and
  – the AXA group and local entities (AXA China Region and AXA Mexico)
  – with technical support from the Mortality Branch of the United Nations Population Division

Specific aims

1. Construct a time series of life tables at the national level for Hong Kong and for Mexico using the HMD approach and methods protocol
2. Establish a standard set of data quality indicators to evaluate the reliability of the life tables (building from the demographic literature)
3. Measure the impact of data quality issues on the assessment of variations in biometrical risks and future longevity trends
4. Propose adjustments to the series to improve accuracy, using indirect estimation techniques and/or statistical methods

Motivations

• Joint interest in assessing mortality trends and their accuracy in a growing number of countries
• For Academics:
  – Pressure to include additional countries into the HMD while preserving the high quality of the database mortality series
  – Opportunity provided by increased investments by international organizations and private sponsors to improve national demographic data collection systems => need to monitor the international Millennium/Sustainable Development Goals
• For AXA:
  – Regulatory need to constantly improve data quality in countries of operation

Longevity and mortality risks are of paramount importance at Life level, not only to calculate the Solvency Capital Required, but also to define the Best Estimates. Both are based on historical data, and the more relevant the data, the more accurate the monitoring of mortality and longevity risks.
=> need to better assess variations in biometrical risks and future longevity trends in historical mortality series

Result for Hong Kong promising

Mexico

The perfect case study: complete but clearly imperfect demographic information

Figure: log(20q60) plotted against log(5q0), with separate panels for females and males; the years 1990 and 2016 are labelled. Series shown: HMD countries; Mexico (MEX).

Ongoing work and future plans: CoD data

Adding cause-of-death data

• For all HMD countries with cause-of-death data following the International Classification of Diseases (ICD)
• Back to 1950 or earliest year available
• Respectful of privacy issues
  – No access to input data for some countries
  – Five-year age groups
• Three sets of data series consistent with all-cause series:
  – Cause-of-death fractions
  – Death counts
  – Age-specific death rates
• Shortlist of <100 exclusive cause-of-death categories (mostly compatible with EUROSTAT and NCHS)
• Emphasis on disruptions arising from revisions of the ICD

Prepared COD series

• The (1959-2016)
• England and Wales (1950-2013)
• France (1958-2015)
• Canada (1950-2009)
• Sweden (1952-2012)
• Norway (1951-2012)
• Japan (1950-2013)
• The Czech Republic (1950-2013)
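One way to read the requirement that the three cause-of-death series (fractions, death counts, age-specific death rates) be "consistent with all-cause series" is that, within each age group and year, the cause-specific counts and rates add up to the all-cause figures. The Python sketch below illustrates that reading with proportional scaling. This is an assumption made for illustration, not the documented HMD procedure; the cause labels, numbers and function name are hypothetical.

```python
from typing import Dict, Tuple


def cause_specific_series(
    all_cause_deaths: float,
    exposure: float,
    raw_cause_deaths: Dict[str, float],
) -> Tuple[Dict[str, float], Dict[str, float], Dict[str, float]]:
    """Return (fractions, counts, rates) by cause for one age group and year.

    Raw cause-specific counts are converted to fractions and applied to the
    all-cause death count, so counts and rates sum to the all-cause series.
    """
    total_raw = sum(raw_cause_deaths.values())
    if total_raw <= 0 or exposure <= 0:
        raise ValueError("raw deaths and exposure must be positive")
    fractions = {c: d / total_raw for c, d in raw_cause_deaths.items()}
    counts = {c: f * all_cause_deaths for c, f in fractions.items()}
    rates = {c: n / exposure for c, n in counts.items()}
    return fractions, counts, rates


# Hypothetical inputs for a single five-year age group (illustration only).
all_cause_deaths = 1_000.0   # from the all-cause series
exposure = 50_000.0          # person-years lived
raw = {"circulatory": 380.0, "neoplasms": 285.0, "other": 285.0}  # sums to 950

fractions, counts, rates = cause_specific_series(all_cause_deaths, exposure, raw)
for cause in raw:
    print(f"{cause:12s} fraction={fractions[cause]:.3f} "
          f"deaths={counts[cause]:.1f} rate={rates[cause]:.5f}")
```

Working from fractions rather than raw counts means that a mismatch between the cause-of-death file and the all-cause series (here 950 versus 1,000 deaths) does not propagate into the published rates.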

COD project stalled

• Competition issue with the Human Cause-of-Death Database (HCD)
• Same research teams (MPIDR, INED, and UC Berkeley)
• Conceptual differences: same idea but with adjustments for changes in the International Classification of Diseases + HCD includes non-HMD countries (with indirect estimation methods)
• On-going discussions to combine both databases and host them on the HMD website

Ongoing work and future plans: sub-national data

Sub-national databases à la HMD

• Canadian Human Mortality Database (Université de Montréal, Canada => UC Berkeley – thanks to the CIA)
• Japan Mortality Database (Institute for Population and Social Security Research, Tokyo – Former PhD student at UC Berkeley)
• United States Mortality DataBase (USMDB, usa.mortality.org) at UC Berkeley (state series published, county series in the works)
• Germany Mortality Database (MPIDR)
• Australia Mortality Database (new project)
• France Mortality Database (looking for funding)

Monitoring mortality at sub-national level: data, methods, and evidence

Call for papers

A two-and-a-half-day international workshop

To be held at ANU, Canberra, Australia, October 15-17, 2019.

Organized by: The Australian National University, the Human Mortality Database team.