Report on Exact and Statistical Matching Techniques

Statistical Policy Working Papers are a series of technical documents prepared under the auspices of the Office of Federal Statistical Policy and Standards. These documents are the product of working groups or task forces, as noted in the Preface to each report. These Statistical Policy Working Papers are published for the purpose of encouraging further discussion of the technical issues and to stimulate policy actions which flow from the technical findings and recommendations. Readers of Statistical Policy Working Papers are encouraged to communicate directly with the Office of Federal Statistical Policy and Standards with additional views, suggestions, or technical concerns.

Joseph W. Duncan, Director
Office of Federal Statistical Policy and Standards

For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402

Statistical Policy Working Paper 5
Report on Exact and Statistical Matching Techniques

Prepared by
Subcommittee on Matching Techniques
Federal Committee on Statistical Methodology

U.S. DEPARTMENT OF COMMERCE
Philip M. Klutznick
Courtenay M. Slater, Chief Economist

Office of Federal Statistical Policy and Standards
Joseph W. Duncan, Director

Issued: June 1980

Office of Federal Statistical Policy and Standards
Joseph W. Duncan, Director
Katherine K. Wallman, Deputy Director, Social Statistics
Gaylord E. Worden, Deputy Director, Economic Statistics
Maria E. Gonzalez, Chairperson, Federal Committee on Statistical Methodology

Preface

This working paper was prepared by the Subcommittee on Matching Techniques, Federal Committee on Statistical Methodology. The Subcommittee was chaired by Daniel B. Radner, Office of Research and Statistics, Social Security Administration, Department of Health and Human Services. Members of the Subcommittee include Rich Allen, Economics, Statistics, and Cooperatives Service (USDA); Thomas B. Jabine, Energy Information Administration (DOE); and Hans J. Muller, Bureau of the Census (DOC).

The Subcommittee report describes and contrasts exact and statistical matching techniques. Applications of both exact and statistical matches are discussed. The report is intended to be useful to statisticians in various Federal agencies in determining when it is appropriate to use exact matching techniques or when it may be appropriate to use statistical matching techniques. The recommendations of the report also include suggestions for further research.

Members of the Subcommittee on Matching Techniques

Daniel B. Radner, Chairperson
    Office of Research and Statistics, Social Security Administration
    Department of Health and Human Services
Rich Allen
    Economics, Statistics, and Cooperatives Service
    Department of Agriculture
Maria E. Gonzalez (ex officio)*
    Chairperson, Federal Committee on Statistical Methodology
    Office of Federal Statistical Policy and Standards
    Department of Commerce
Thomas B. Jabine*
    Energy Information Administration
    Department of Energy
Hans J. Muller
    Bureau of the Census
    Department of Commerce

*Member, Federal Committee on Statistical Methodology

Acknowledgements

The body of this report represents the collective effort of the Subcommittee on Matching Techniques. Although all members of the Subcommittee reviewed and commented on all parts of the report, specific members were responsible for writing different sections. The authors of the respective chapters and appendices appear below:

Chapter     Author(s)
I           Daniel Radner, Thomas Jabine, Rich Allen
II          Hans Muller, Rich Allen
III         Daniel Radner
IV          Daniel Radner, Thomas Jabine

Appendix    Author(s)
I           Rich Allen
II          Daniel Radner
III         Hans Muller, Rich Allen

Maria E. Gonzalez and Thomas B. Jabine provided indispensable guidance and encouragement throughout the Subcommittee's work. Tore Dalenius, an ex officio member of the Subcommittee when the work began, provided important insights in the early stages of the work and helpful comments on drafts of the report.

Others who contributed to the work as members of the Subcommittee in its earlier stages include: Richard Barr, Richard Coulter, David Hirschberg, Matthew Huxley, Benjamin Klugh, Stanley Kulpinski, Robert Penn, and Scott Turner. Members of the Federal Committee on Statistical Methodology and the Office of Federal Statistical Policy and Standards reviewed and commented on drafts of the report. Also, we are grateful to Benjamin Tepping, Ivan Fellegi, Horst Alter, and Michael Colledge for their helpful comments on drafts of the report, and to all those who supplied examples of matching.

Members of the Federal Committee on Statistical Methodology (February 1979)

Maria Elena Gonzalez (Chair), Office of Federal Statistical Policy and Standards (Commerce)
Barbara A. Bailar, Bureau of the Census (Commerce)
Norman D. Beller, Economics, Statistics, and Cooperatives Service (Agriculture)
Barbara A. Boyes, Bureau of Labor Statistics (Labor)
Edwin J. Coleman, Bureau of Economic Analysis (Commerce)
John E. Cremeans, Bureau of Economic Analysis (Commerce)
Marie D. Eldridge, National Center for Education Statistics (Education)
Daniel H. Garnick, Bureau of Economic Analysis (Commerce)
Thomas B. Jabine, Energy Information Administration (Energy)
Charles D. Jones, Bureau of the Census (Commerce)
William E. Kibler, Economics, Statistics, and Cooperatives Service (Agriculture)
Frank de Leeuw, Bureau of Economic Analysis (Commerce)
Alfred D. McKeon, Bureau of Labor Statistics (Labor)
Lincoln E. Moses, Energy Information Administration (Energy)
Monroe G. Sirken, National Center for Health Statistics (HHS)
Wray Smith, Office of the Assistant Secretary for Planning and Evaluation (HHS)
Thomas G. Staples, Social Security Administration (HHS)

Table of Contents

Preface
Acknowledgements

CHAPTER I-INTRODUCTION AND OVERVIEW
    A. Scope of Study
        1. Definitions and Uses of Matching
        2. Matching Applications and Examples
        3. Confidentiality Issues
        4. The Role of Computers
    B. Auspices
    C. Dissemination of Report
    D. Organization of Report

CHAPTER II-EXACT MATCHING
    A. Nature and History
    B. Types of Matching Error
    C. Procedures
        1. Preliminary Steps
        2. Selection of Match Characteristics and Definition of "Agreement" and "Disagreement" for Each Characteristic
        3. Blocking and Searching
        4. Weighting of Characteristics of Comparison Pairs
        5. Determination of Thresholds
        6. Validation of Decisions
    D. Practical Problems
        1. Source Data
        2. Matching Procedures
        3. Matching Mode
        4. Follow-up
    E. Reliability
    F. Elimination of Duplication in One File

CHAPTER III-STATISTICAL MATCHING
    A. Introduction
    B. A Suggested Framework for the Analysis of Statistical Matching Methods
        1. Universe
        2. Two Data Sets
        3. Hypothetical Exact Match
        4. Estimate of Hypothetical Exact Match
        5. Statistical Match Result
    C. Applications of Statistical Matching
        1. Matching Steps
        2. Two Basic Types of Methods
        3. History and Development of Matching Methods
            a. Bureau of Economic Analysis, U.S. Department of Commerce, CPS-TM Match
            b. Bureau of Economic Analysis, U.S. Department of Commerce, SFCC Match
            c. Brookings Institution MERGE-66
            d. Christopher Sims' Comments
            e. Statistics Canada SCF-FEX Match
            f. Yale University (and National Bureau of Economic Research)
            g. Office of Tax Analysis, U.S. Department of the Treasury
            h. Brookings Institution MERGE-70
            i. Office of Research and Statistics, Social Security Administration
            j. Statistics Canada COC and MCF Matches
            k. Mathematica Policy Research
            l. Other Statistical Matches
    D. Criticisms of Statistical Matching
    E. Types of Errors in Statistically Matched Data
    F. Summary and Conclusions

CHAPTER IV-FINDINGS AND RECOMMENDATIONS
    A. Findings
        1. Definitions of Exact and Statistical Matching
        2. Usefulness of Matching
        3. Applications of Exact and Statistical Matching
        4. Comparison of Errors
        5. Comparison of Relative Risk of Disclosure and Potential for Harm to Individuals
        6. Legal Obstacles to Exact Matching
    B. Recommendations
        1. General
            a. When Should Matching be Used
            b. Choice between Exact and Statistical Matching
            c. Documentation of Matches
            d. Public Release of Matched Data
            e. Confidentiality Restrictions on Matching
        2. Research
            a. Exact Matching
            b. Statistical Matching

APPENDICES

Appendix I. Economics, Statistics, and Cooperatives Service Example of Exact Matching
    A. Exact Matching Considerations
    B. Selected Match Rules
    C. Practical Problems
    D. Technical Papers

Appendix II. Office of Research and Statistics Example of Statistical Matching
    A. Introduction and Input Files
    B. Matching Method
    C. Correspondence of Values of Matching Variables
    D. Tables

Appendix III. Selected Examples of Exact Matching
    A. Record Check Studies of Population Coverage
    B. Matching of Probation Department and Census Records
    C. Computer Linkage of Health and Vital Records: Death Clearance
    D. Use of Census Matching for Study of Psychiatric Admission Rates
    E. June 1975 Retired Uniformed Services Study
    F. Federal Annuitants-Unemployment Compensation Benefits Study
    G. Office of Education Income Validation Study
    H. Department of Defense Study of Military Compensation
    I. Department of the Treasury-Social
