A Statistical Method for Estimating Under-Reported Incidence Rates with Application to Child Diarrhea in Thai Provinces Bordering Cambodia
Total Page:16
File Type:pdf, Size:1020Kb
A STATISTICAL METHOD FOR UNDER-REPORTED CHILD DIARRHEA A STATISTICAL METHOD FOR ESTIMATING UNDER-REPORTED INCIDENCE RATES WITH APPLICATION TO CHILD DIARRHEA IN THAI PROVINCES BORDERING CAMBODIA Sulawan Yotthanoo1 and Chamnein Choonpradub2 1Department of Statistics, School of Science and Technology, Naresuan University, Phayao; 2Department of Mathematics and Computer Science, Faculty of Science and Technology, Prince of Songkla University, Pattani Campus, Pattani, Thailand Abstract. Diarrhea is a major health problem in Thailand, but reported data of disease incidence are known or suspected to be under-reported. This study aimed to develop a statistical model for estimating the annual incidence of hospital diarrhea cases among children under five years. Data regarding diarrhea patients 0-4 years old were col- lected for the National Notifiable Disease Surveillance (Report 506) about Thai prov- inces bordering Cambodia during 1999-2004 by the Ministry of Public Health. A log- linear regression model based on the prevailing seasonal-trend pattern was used for diarrhea incidence as a function of quarter, year and district, after imputing rates where under-reporting was evident, using populations obtained from the 2000 population census. The model also takes any spatial correlation between districts into account, using the generalized estimating equation (GEE) method. Diarrhea incidence had sea- sonal peaks in the first quarter (January to March) and the trend steadily increased from 1999 to 2004. Results from such studies can help health authorities develop pre- vention policies. Key words: child diarrhea, statistical model, BEE method, linear regression INTRODUCTION third of all deaths in this age group are as- sociated with diarrhea. Approximately 1.5 Diarrhea is one of the world’s top five billion diarrhea episodes and 4 million infectious disease causes of death (Brownlie deaths occur annually among children age et al, 2006) and remains a major cause of less than five years (Vargas et al, 2004). morbidity and mortality among children in As in other developing countries, diar- developing countries (Carlos and Saniel, rhea in Thailand is a major health problem 1990; Parashar et al, 2003). Children under and accounts for approximately 50% of all five years of age have an average of 3.3 diar- hospital-reported infectious diseases (Thai rhea episodes per year, and more than one- Working Group on Burden of Disease and Injuries, 2002). The Bureau of Epidemiology Correspondence: Chamnein Choonpradub, De- (2002) reported that while diarrhea-related partment of Mathematics and Computer Science, Faculty of Science and Technology, Prince of mortality declined from 1.11 per 100,000 in Songkla University, Pattani Campus, Pattani 1988 to 0.23 per 100,000 in 2002, morbidity 94000, Thailand. increased from 1,488 cases per 100,000 in Tel: 66 (0) 89 4660 803; Fax: 66 (0) 7331 2729 1993 to 1,687 cases per 100,000 in 2002. In E-mail: [email protected] 2002, there were 1,055,393 cases of diarrhea in Vol 41 No. 1 January 2010 203 SOUTHEAST ASIAN J TROP MED PUBLIC HEALTH Thailand, of which one-third occurred among Data analysis children under five years of age and 12% re- Although the registry included the vil- quired hospitalization (Jiraphongsa et al, 2005). lage of residence and date of hospitalization Given that diarrhea morbidity remains of the patient, we used districts (statistical high in Thailand, there is a need to improve regions containing up to hundreds of vil- treatment to prevent the disease, especially lages ranging in population from 795 to in provinces bordering Cambodia where 21,409). We did this to substantially reduce cross-border migration may be a factor correlations between annual incidence out- (Thimasarn et al, 1995; Konchom et al, 2003). comes in successive periods of time and in We investigated a statistical model neighboring locations, while still enabling based on linear regression for estimating the trends for place and time to be identified. extent of under-reporting. We then classified Data from the Thai infectious disease regis- patterns of child diarrhea in Thai provinces try are known, or suspected, to be seriously bordering Cambodia using the GEE model. under-reported (Lumbiganon et al, 1990; Saengwonloey et al, 2003; Intusoma et al, 2008). MATERIALS AND METHODS We first calculated disease incidence in cells defined by district (i) and month (j) of Study area and data source year (t) as the ratio of the number of reported The border between Thailand and Cam- cases (n ) to the district population in 1,000s bodia is approximately 800 km long, stretch- ijt (P ). For reasons evident from a detailed ing along the provinces of the lower north- i study of monthly disease counts (Table 2), east area of Thailand from a point known as any occurrence of zero cases in a cell was “Chong Bog” in Ubon Ratchathani Province considered as a possible instance of under- and ending in the Had Lek Subdistrict of reporting, and an additive linear model was Khlong Yai in Trat Province (Fig 1). Due to fitted to the logarithms of the remaining in- its stronger economic growth, Thailand at- cidence rates, namely, tracts many migrant workers from Cambo- dia. Cross-border migration has been con- nijt nected with health problems, including in- ln = y = µ+ + α + β γ (1) P ijt i j t . fectious diseases (Thimasarn et al, 1995; i Konchom et al, 2003). In this model µ is a constant and αi, βj Data regarding diarrhea cases from 1999 and γt are the effects of district i=1,2,...,106, to 2004 in the border provinces of interest month j=1,2,...,12 and year t=1,2,...,6, respec- were taken from the National Notifiable Dis- tively, with zero means. After examining the ease Surveillance Report (506), Bureau of plot of standardized residuals, this model Epidemiology, Ministry of Public Health. was refitted after further omission of cells Each record contains the type of infectious corresponding to residuals below a specified disease, age, gender, subdistrict of residence, cut-off value. Having thus obtained an ac- date of hospitalization, and disease severity ceptable fit, the omitted occurrences were of the patient. The resident population de- then imputed using the model. Next, we nominator used to compute the annual inci- aggregated the monthly data into quarterly dence was obtained from the Population and incidence rates and fitted a model similar to Housing Census of 2000, performed by the (1) with j now representing quarter instead National Statistics Office of Thailand. of month, and the residuals from this model 204 Vol 41 No. 1 January 2010 A STATISTICAL METHOD FOR UNDER-REPORTED CHILD DIARRHEA were used to compute correlation coeffi- above. These estimated proportions for lev- cients between different districts. The aver- els of each factor after adjusting for the other ages of the correlation coefficients within factors in the model were again computed by each province were then used to fit a gener- requiring the weighted average of the ad- alized estimating equation (GEE) model justed proportions match the overall propor- (Liang and Zeger, 1986; Yan and Fine, 2004) tion, using a Newton-Raphson iteration pro- having a fixed block-diagonal correlation cedure with Marquardt damping, and stan- structure with the blocks corresponding to dard errors for differences between indi- provinces. Residuals from this model were vidual proportions and the overall mean. examined using normal quantile plots after All statistical analysis was carried out filtering to remove their estimated correla- using the R program (R Development Core tion. Team, 2007). To obtain unbiased estimates of inci- dence rates in cells we used the formula RESULTS Preliminary analysis rˆ = exp ( yˆ + c) , ijt ijt (2) Preliminary analysis indicated gender was not a major factor influencing diarrhea where y ˆ ijt is the fitted value of yijt and c is a constant chosen to match the total num- incidence (Ardkeaw and Tongkumchum, ber of observed cases with the total given 2009), therefore it was not included in the by the model. After fitting the model, unbi- model. Children less than five years old had ased incidence rates for levels of each factor the highest age-specific annual incidence adjusted for other factors were calculated rate (Bureau of Epidemiology, 2007), so this similarly. Standard errors for these adjusted group was selected for study. incidence rates were obtained using sum Data from Trat Province were not avail- contrasts (Venables and Ripley 2002) to com- able for 2002 and so were excluded from the pare the incidence rates for each level of a analysis. factor with the overall mean incidence rate. During the study period from January Since the confidence intervals for fac- 1999 to December 2004, 260,522 cases of di- tor-specific incidence rates and proportions arrhea were reported from district hospitals obtained from this model (using the sum in the Thai-Cambodia border provinces contrasts) divide naturally into three groups among children less than 5 years old. The according to their location entirely above the number of cases reported in a month for each mean, around the mean, or entirely below district varied from zero to 578 (average an- the mean, we used this trichotomy to create nual incidence rate 69.4 per 1,000). Among schematic maps of districts according to their the six provinces, Surin (78.8 per 1,000) and estimated diarrhea annual incidence rates Buri Ram (78.2 per 1,000) had relatively high and under-reporting percentages. rates (Table 1). We also estimated the extent of under- Under-reporting reporting data by district, quarter and year, Table 2 shows the number of monthly by fitting a simple logistic regression model reported diarrhea cases in ten selected dis- (Hosmer and Lemeshow, 2000; Kleinbaum tricts.