Statistical Society of Australia Incorporated Newsletter
November 2005, Number 113
Registered by Australia Post, Publication No. NBH3503, ISSN 0314-6820

75 years of statisticians in CSIRO

Photo: Dinner speakers Terry Speed (left) and John Field (right) with Murray Cameron (centre), Chief of CSIRO Mathematical and Information Sciences.

At a dinner in Canberra on 27 September 2005, CSIRO paid tribute to the seventy-fifth anniversary of the appointment of its first statistician, Frances Elizabeth “Betty” Allan. The dinner featured talks by John Field of BiometricsSA on the life of Betty Allan, and Terry Speed of WEHI and UC Berkeley on the expansive topic “Mathematics, Statistics, Biology: Past, Present and Future”.

Born in 1905, Betty Allan studied pure and mixed mathematics at Melbourne University, completing both bachelor's and master's degrees and winning scholarships and honours throughout. In 1928 she took up a CSIR studentship for “the study of statistical methods applied to agriculture”, studying at Newnham College, Cambridge and then working at Rothamsted Experimental Station with RA Fisher. Fisher described her as having “a rare gift for first-class mathematics”.

On 29 September 1930, Betty Allan was appointed to the then-CSIR Division of Plant Industry in Canberra to apply statistical methods to agricultural research. Her projects included control of oriental peach moths and blowflies, work on plant diseases and noxious weeds, and studies of the effects of supplements on sheep.

“Betty Allan’s statistical work pushed CSIR’s agricultural science to new heights – training scientists, designing reliable experiments and extracting important information,” said Dr Murray Cameron, chief of CSIRO Mathematical and Information Sciences. “After she retired in 1940 following her marriage, CSIRO formally established a Biometrics Section, forerunner to the present-day CSIRO Mathematical and Information Sciences.”

More information: www.cmis.csiro.au/stats75

Andrea Mettenmeyer

In this issue
Data Linkage Symposium
SBI Conference
Editorial
President’s Corner
Bayes for Beginners Workshop
Named Lectures
Three Doors
Martin Esteban Donadio 1973-2005
Branch Reports

Symposium on data linkage

On Tuesday 9 September 2005 a ‘Symposium on Data Linkage’ was held at the Australian National University (ANU) in Canberra. The event was organised by the Canberra Branch of the SSAI and the School of Finance and Applied Statistics, ANU. About 75 persons from all over Australia, and one from New Zealand, came together to participate in the symposium, which ran all day and involved eight speakers.

Data linkage is a rapidly growing area in statistics and involves combining two or more sets of data so as to maximise the use of the available information.

After an introduction by Glenys Bishop, the first speaker, Agnes Walker of the National Centre for Epidemiology and Population Health (NCEPH) at ANU, gave a talk titled ‘Building datasets using many sources: What are researchers allowed – and not allowed – to do in Australia and overseas?’. Agnes noted that a single data source is rarely sufficient to address a range of policy-relevant issues in research projects. However, use of multiple data sources, while often fraught with difficulty, can lead to superior results if researchers persevere with imputation or statistical matching. The difficulty in Australia has arisen, in part, from the inability to use a unique person identifier to link various data collections, for example the Medicare card number to link various administrative health datasets. In some other countries, such as in Scandinavia, datasets in which all government information has been directly linked across individuals through a unique identifier are available to researchers. Agnes presented examples from her own research in which the use of imputation and statistical matching across several data sources has led to superior results. One involved the use of Australian Bureau of Statistics (ABS) survey data in combination with administrative Pharmaceutical Benefits Scheme (PBS) data to show that PBS medicine costs were a considerably greater financial burden for Australia’s poorer families than for the rest of the population.

The second presentation, given by Di Rosman of the WA Department of Health and Chris Kelman of NCEPH and ANU, was on the topic of the Western Australian Data Linkage System (WADLS). This System links seven core WA population health datasets with more than 12 clinical and health service data sets. Recently, Commonwealth Medicare Benefits Scheme (MBS), PBS [...] dynamic linked chain of health events, ordered by date. Each linked chain can be broken at any point and new links can be inserted or existing links deleted as new information becomes available. Chains can be extended further into the future or the past as new health events are discovered. The WADLS now covers health services for around 1.7 million individuals, making it one of the world’s largest and most comprehensive linked data resources. Around 250 research papers have been published based on the WADLS. Di, whose talk was titled ‘WA Data Linkage System’, presented a summary of the system design and development, contributing data sources and a summary of the research outputs generated over the WADLS’s 10-year history. Chris, whose talk was titled ‘National data linkage with privacy protection’, then elaborated on some aspects of the WADLS, in particular privacy issues, comparison with similar systems abroad, and the additional value provided by incorporating links to the Commonwealth MBS, PBS and Aged Care datasets. At the end of the Symposium, Di also presented a short informational DVD on the WADLS.

Peter Christen of the Department of Computer Science at ANU gave an impressive talk on ‘Recent developments in data linkage technologies’. He began with a brief historical tour of data linkage, from the early seminal thinking of Fellegi and Sunter nearly 40 years ago, to the modern-day approaches of machine learning and attribute recognition models. Hidden Markov models, originally developed for speech pattern recognition, are among the techniques being used for segmenting names and addresses into fields for data cleaning and standardisation. Blocking, an old technique used in data linkage for reducing the number of candidate matches, is being replaced by more sophisticated techniques, such as ‘fuzzy’ blocking and sorted neighbourhood approaches. Linking records using probabilistic matching typically classifies potential matches into three categories: matches, non-matches and possible matches. Resolution of the possible matches is typically done through clerical review; however, this can be time-consuming and error-prone. More details can be found at http://datamining.anu.edu.au/linkage.html

How well can one expect to match records when names and addresses are not [...] chance?’, Rose Karmel of the Australian Institute of Health and Welfare (AIHW) explored this problem in the context of linking hospital records for persons 65 and over with residential aged care records. The interface between acute hospital care and residential aged care has long been recognised as an important issue in aged care services research. If records can be matched using only date of birth, gender, region and relevant event dates, social researchers can begin to piece together patterns of movement between hospital and aged care services. Rose used Poisson models and some simplifying assumptions on birthday distributions to estimate the probability of a chance match, and the likely false match rate. She showed that, under reasonable assumptions, when records are blocked by region, date of birth and sex, the overall Australian false match rate is likely to be very low, so that a no-name matching strategy is feasible. In a joint project with the Western Australian Data Linkage Unit (WA DLU), Rose is currently engaged in comparing links established between hospital and residential aged care records using her no-name strategy with name-based links obtained by the WA DLU, where names and addresses were available in both sets of records for the linkage process.

After lunch, Bruce Fraser of the ABS presented a talk titled ‘Gaining community acceptance for data linking projects’, in which he gave a compelling case for making privacy and confidentiality management visible to key stakeholders, and also to the community. He has witnessed a certain level of mistrust expressed in the media and echoed in the public regarding initiatives that might make it easier to link records and build integrated information systems. He outlined information privacy principles that should be considered before undertaking research involving linking records. Bruce urged all of us as researchers to carefully weigh the potential benefits of public health and welfare research against the potential risks to privacy, to identify clear research objectives likely to serve the public interest, to engage at an early stage with key stakeholders, and to be prepared to adopt practices which ensure that the privacy of individuals is not breached.

Christine O’Keefe of Mathematical and Information Sciences at CSIRO next presented a talk titled the ‘Queensland Linked Data Set (QLDS) project: A data