Some Practical Guidelines for Effective Sample-Size Determination in Observational Studies

Aceh International Journal of Science and Technology, 1 (2): 51-53 August 2 12 ISSN: 2 88-986 SHORT COMMUNICATION Some Practical Guidelines for Effective Sample-Size Determination in Observational Studies 1Wan Muhamad Amir W Ahmad, 2Wan Abdul Aziz Wan Mohd Amin, 3Nor Azlida Aleng, and 4Norizan Mohamed 1, 3, 4Department of Mathematics, Faculty of Science and Technology, 2Department of Language and ommunication, Universiti Malaysia Terengganu (UMT) 1wmamir&umt.edu.my, 2(i(a&umt.edu.my, 4nori(an&umt.edu.my, 3a(lida_aleng&umt.edu.my Abstract- The purpose of this present study was to determine the right sample si(e in observational study which is focusing on medical or health sciences field. Sample si(e calculation is actually depends on the type of how the study is designed. For example different formulas are used to calculate the sample si(e in different type of the study. ,n this article, the formula of single proportion and two proportions is discussed and an example from medical research, which may contribute to the understanding of this problem, is presented. Keywords- Sample si(e determination, single proportion, and two proportions. ,n a research study, most of researchers design experiments with do(ens of treatment combinations. .n experiment that they had design can involve several of measurement include of dichotomous variables. Sample si(e calculations for dichotomous variables do not re/uire 0nowledge of any standard deviation. Statistical analysis is based on the 0ey idea that observation on a sample of sub1ects is made and then draws inferences about the population from which the sample is drawn (2aing, 2003). ,f the study sample is not representative of the population, it will well mislead and statistical procedure cannot help. 4owever, even a well5designed study can give only an idea of the answer sought because of random variation in the sample. Thus results from a single sample are sub1ect to statistical uncertainty which is strongly related to the si(e of the sample. 6uality or strength of statistical inference depends largely on the si(e of the sample selected (2aing, 20037 8ardner and .ltman, 1989). ,n this regard, there are some concerns about determining ade/uate sample si(e for studies where particular parameters of populations are estimated. ,n this study, we emphasi(e the parameter which is based on proportion of a population. For examples we want to estimate the prevalence of 4,V5,nfected Tuberculosis disease in a population. .ccording to 2aing (2003) an appropriate inclusion and exclusion criteria are also important to obtain participants who have particular characteristics which are under research interest. The other area, which cannot be simply ignored, is testing validity and reliability of measurement tools before actual data collection. pilot study or a pretest is re/uired to test the validity and reliability of measurement tools. ,n addition, there are a number of potential biases which can be avoided while designing study and during planning of data collection. The following is the brief guide to approach on how to determine a sample si(e based on single proportion and two proportions. For example, we want to conduct a study on “.ssociated Factors of 4,V5,nfected Tuberculosis in 4ospital Universiti Sains Malaysia, 4USM”. The aim of this study is to determine the prevalence of TB among 4,V patients and associated factors of TB among 4,V positive patients. The following steps are the brief guide to determine a sample si(e based on parameter which is ?proportion’ (Dupont and Alummer, 19907 Lwanga and Lemeshow, 1991). To illustrate this case, we fixed ob1ectives as shown below. From the fixed ob1ectives, we can determine the right sample si(e. Specific ob1ectives were- 1. To determine the prevalence of TB among people living with the 4,V from 2001 to 2010 in 4USM. 2. To identify associated factors of TB among 4,V positive patients. For ob1ective 1, we used single proportion method. To determine the sample si(e re/uired to estimate the proportion with the desired level of precision, some idea was re/uired beforehand about the possible magnitude of the proportion. ,t can be obtained from previous studies in literature or from a pilot study if there was no similar study conducted before. This formula is usually used in prevalence studies. 2 z ’ Sample si(e, n = p ()1 − p ∆ alculation was done on proportion of TB in people living with 4,V patients. 2 .961 ’ Sample si(e, n = 0 .072 ()1 − 0 .072 .050 n = 103 Aatients 51 Aceh International Journal of Science and Technology, 1 (2): 51-53 August 2 12 ISSN: 2 88-986 where % B 1.9C Level of Significance, D B 0.05 .bsolute precision (E) B 5F p is expected proportion of individual in the sample with the characteristic of interest at the 100(15D) F confidence interval. .nticipated population proportion (p) B 7.2F (8ao et al., 2010) .fter adding 20F estimated missing data, we finally decided that total patients to be sampled is n B 123.C ≈ 124 patients. So, total sample si(e needed n B 124 of TB in people living with 4,V patients. For ob1ective 2, we used two proportions method. The calculation of sample si(e can be done using Aower and Sample Si(e alculation (AS) software or manual calculation, with the significance level (D) 0.05 and the power of study (15 H) of 80F. The detectable ha(ard ratio of the presence of prognostic factor relative to absence of prognostic factors was decided by the researcher and expert opinion by clinicians (Dupont and Alummer, 1997, Mugusi et al., 2009). Below is the Two Aroportions formula- 1 1 2 p 0 ( − p 0 ) + p 1 ( − p 1 ) n = 2 ()z α + z β ()p 0 − p 1 where- A0 B Based on literature review A1 B Based on expert opinion 1 9600 zα = z .050 = . (one tailed) 0 8416 zβ = z .200 = . (one tailed) Table 1. Sample si(e calculation 2o Ib1ective JA1 A0 Type 1 error Aower Sample Si(e 1 D4 ell count (2issapatorn et al., 2003) 0.52 0.3C 5F 80F 142 patients Calculation 0 36 1 0 36 0 52 1 0 52 . ( − . ) + . ( − . ) 1 96 0 84 2 142 patients n = 2 (). + . = ()0.36 − 0.52 2 4eterosexual (2issapatorn et al., 2003) 0.44 0.28 5F 80F 137 patients Calculation 280 1 280 440 1 440 . ( − . ) + . ( − . ) 961 840 2 137 patients n = 2 (). + . = ().280 − .440 3 ,ntravenous drug user (,VDU) 0.5C 0.4 5F 80F 149 patients (2issapatorn et al., 2003) Calculation 140 40 0 56 1 0 56 . ( − . ) + . ( − . ) 1 96 0 84 2 149 patients n = 2 (). + . = ().40 − 0 .56 4 Blood transfusion (2issapatorn et al., 2003) 0.41 0.25 5F 80F 132 patients Calculation 0 41 1 0 41 0 25 1 0 25 . ( − . ) + . ( − . ) 1 96 0 84 2 132 patients n = 2 (). + . = ()0 .25 − 0 .41 5 .ge (2issapatorn et al., 2003) 0.53 0.37 5F 80F 148 patients Calculation 0 37 1 0 37 0 53 1 0 53 . ( − . ) + . ( − . ) 1 96 0 84 2 148 patients n = 2 (). + . = ()0.37 − 0.53 JDetermined by expert opinion From Table 1 we can see that, the largest sample si(e is ta0ing into the account is n B149 patients. ,n this case, we consider 149 patients (because we choose the biggest sample si(e). .fter adding 20F estimated missing data, we get n = 149 + ( .20 × 149 ) = 178 .8 ≈ 179 per group, which can be obtained as follows- i. Aatients with TB among 4,V patients B 179 patients ii. Aatients with 2on5TB among 4,V patients B 179 patients Therefore, a total patient to be sampled is (179 × 2 ) = 358 patients. The application of the proposed formula for the sample si(e determination has been discussed in this paper with approaching to the case of determination associated factors of 4,V5infected Tuberculosis. Before we proceed the study, we 52 Aceh International Journal of Science and Technology, 1 (2): 51-53 August 2 12 ISSN: 2 88-986 have to design the study which is including the sample si(e determination. This 0ind of study has been conducted in many parts of the world. ,n this paper, two different methods have been used- (i) Single proportion and (ii) Two proportions. 2aing (2003) pointed out that the statistical importance of re/uired sample si(e, to be earmar0ed as a standard value for a particular population, was rarely emphasi(ed though values have been recogni(ed as references for ages. 4ealth personnel need to be aware of the fact that central to the planning of any such study is the decision on how large a sample to select from the population under study. This article is discussed a certain extension of important issue of sample si(e determination for observational studies. .hmad et al. (2011) pointed out that the larger your sample, the more sure you can be that their answers truly reflect the population. 4owever, there are a few guidelines that can have to be addressed in this particular area of health sciences. References Dupont, W. D., and Alummer, W.D. (1990). Aower and sample si(e calculation, . review and computer program. ontrolled linical Trails, 11- 11C5128. Mugusi, F. M., Mehta, S., Villamor, E., Urassa, W., Saathoff, E., Bosch, R. N., and Faw(i, W. W. (2009). Factors associated with mortality in 4,V5infected and uninfected patients with pulmonary tuberculosis. BM Aublic 4ealth, 9- 409. 8ardner, M. N., and .ltman D.8. (1989). Statistics with confidence5confidence intervals and statistical guidelines. British Medical Nournal- 1513.

Some Practical Guidelines for Effective Sample-Size Determination in Observational Studies

Economic Analysis of Optimal Sample Size in Legal Proceedings

8.3 Estimating Population Proportions ! We Use the Sample Proportion P ˆ As Our Estimate of the Population Proportion P

Solutions to Homework 3 Statistics 302 Professor Larget

8 Statistical Inference 8.1 Estimates for Population Parameters In

Workbook 05-Lab 4- Hypothesis Testing for a Population Proportion

Principles of Statistics Xuming He [email protected]

Making Sense of Statistical Studies

Confidence Intervals for a Population Proportion P

Tests for One Proportion

Sampling Distribution and Central Limit Theorem

Inference for Proportions IPS Chapter 8

Understanding Inference