European Journal of Clinical Nutrition (2007) 61, 1064–1071 © 2007 Nature Publishing Group All rights reserved 0954-3007/07 $30.00 www.nature.com/ejcn

ORIGINAL ARTICLE Design effects associated with dietary nutrient intakes from a clustered design of 1 to 14-year-old children

PA Metcalf1,2, RKR Scragg2, AW Stewart2 and AJ Scott1

1Department of Statistics, University of Auckland, Auckland 1, New Zealand and 2Department of Epidemiology and Biostatistics, University of Auckland, Auckland 1, New Zealand

Objective: To calculate intra-cluster and intra-household design effects and intra-class correlation coefficients for dietary nutrients obtained from a 24 h record-assisted recall.
Design: Children were recruited using clustered probability sampling. Randomly selected starting-point addresses were obtained with probability proportional to mesh block size.
Setting: Children aged 1–14 years in New Zealand.
Subjects: There were 125 children in 50 clusters, giving an average of 2.498 children per cluster. In 15 homes, there were two children for the calculation of intra-household statistics.
Results: Intra-cluster design effects ranged from 1.0 for cholesterol, β-carotene, vitamin A, vitamin D, vitamin E, selenium, fructose and both carbohydrate and protein expressed as their contribution to total energy intakes, to 1.552 for saturated fat, with a median design effect of 1.148. The corresponding intra-cluster correlations ranged from 0 to 0.37. Intra-household design effects ranged from 1.0 for height to 1.839 for vitamin B6, corresponding to intra-household correlations of 0 and 0.839. The median intra-household design effect was 1.550. Using a sampling design of two to three households per cluster for estimating dietary nutrient intakes would need, on average, a 15% increase in sample size compared with simple random sampling, with a maximum increase of 55% to cover all nutrients.
Conclusions: These data enable sample sizes for dietary nutrients to be estimated for both cluster and non-cluster sampling designs for children aged 1–14 years. The larger design effects found within households suggest that little extra information may be obtained by sampling more than one child per household.
Sponsorship: The New Zealand Ministry of Health contracted this study.
European Journal of Clinical Nutrition (2007) 61, 1064–1071; doi:10.1038/sj.ejcn.1602618; published online 31 January 2007

Keywords: sample size; cluster sampling; between-cluster variation; dietary nutrients; children; adolescents

Correspondence: Dr PA Metcalf, Department of Community Health, University of Auckland, Private Bag 92019, Auckland 1, New Zealand. E-mail: [email protected]
Guarantor: PA Metcalf.
Contributors: RKRS, AWS and PAM contributed to the design of this study. PAM, AJS and AWS contributed to the statistical analysis and interpretation of the data. All authors contributed to the conduct of the study, data collection and writing of this manuscript.
Received 1 November 2005; revised 16 November 2006; accepted 16 November 2006; published online 31 January 2007

Introduction

Cluster randomized trials or group sampling in which groups of subjects are allocated to different treatments are becoming increasingly popular (Campbell and Grimshaw, 1998). Cluster sampling is an alternative to random sampling that retains the methodological strength of randomization (Donner et al., 1990; Koepsell et al., 1992). Examples of clusters are schools, neighbourhoods, suburbs, towns or cities. However, people in clusters cannot be treated as independent and the effect of this on an outcome leads to the need to increase the sample size (Campbell and Grimshaw, 1998; Kerry and Bland, 1998).

This principle can also be applied to the recruitment of participants in epidemiological surveys. We applied this method for the collection of dietary information (24 h record-assisted recall and a qualitative food frequency questionnaire (FFQ)) in a pilot survey for the Children's Nutrition Survey in New Zealand. This method was considered appropriate as the study protocol required that an adequate sample of Pacific children be included by oversampling, and it is known that many Pacific people are not on local electoral rolls.

Simple random sampling assumes that the data from each subject are independent of other subjects. Independence of data is a prerequisite for simple tests of significance such as Student's t-tests and regression. However, when cluster sampling is used, individuals within each cluster are unlikely to be independent of each other because people in a cluster (e.g. a suburb) may share similar dietary attitudes, socio-economic status and environmental factors, such as shopping in the same food shops. The consequence of these influences is that dietary nutrient intakes from individuals within clusters tend to be more similar to each other than dietary nutrient intakes from individuals from different clusters. As a result of this, the variance (or standard error) of between-group differences is usually larger than for the same number of randomly selected individuals, and its precision is usually less than that for the within-group variance. This reduces the statistical power and means that the number of subjects needs to be increased. Investigators need good estimates of the intra-class correlation coefficients for variables of interest, which together with the number of observations per cluster determine the size of the extra variation in the nested design.

The design effect (DEF) (Kerry and Bland, 1998) or inflation factor (Reading et al., 2000) is the ratio of the variance using cluster randomization to the variance using individual randomization. It can be expressed in terms of the intra-cluster correlation ($\rho$) and the number in a single cluster, $m$: $\mathrm{DEF} = 1 + (m - 1)\rho$. If there is only one observation per cluster ($m = 1$), then the DEF is 1.0, and the two designs are the same. Otherwise, the larger the intra-cluster correlation, that is, the larger the variation between clusters is, the bigger the DEF and the more subjects will be needed to get the same power as a study which uses simple random sampling. For example, a DEF of 2.0 means that the sample size needs to be twice as large to give the same power as a simple random sample.

The analysis of cluster randomized trials must also take into account the clustered nature of the data. Standard statistical techniques are not appropriate as they require data to be independent, unless aggregated analysis is performed at the level of the cluster. If the clustering effect is ignored, P-values will be artificially small, and confidence intervals will be too narrow, increasing the chances of spuriously significant findings and misleading conclusions. Techniques have now been developed to analyse data arising from a clustered design, which allow the hierarchical nature of the data to be modelled appropriately (Rice and Leyland, 1996). They allow variation to be modelled at each level of the data, e.g., at both the cluster and the household levels.

The main difficulty in calculating sample sizes for cluster sample studies is in obtaining an estimate of the within-cluster variation or intra-cluster correlation, as these do not appear to have been published widely in the past (Campbell and Grimshaw, 1998). We conducted a pilot study using cluster sampling of children to assess their dietary nutrient intakes. Here, we report both intra-cluster and intra-household DEFs, and intra-class correlation coefficients for a number of nutrient intakes, height, weight and haemoglobin associated with randomizing street addresses rather than individuals. This information is needed for calculating required sample sizes of children's nutrition surveys using clustered sampling.

Methods

The aim of this study was to pilot sampling methods for a national Children's Nutrition Survey by collecting dietary and other information from a clustered random sample of children in Auckland, Shannon and Feilding. All aspects of the study had ethical approval from the Auckland Ethics Committee.

Recruitment of children
A total of 125 children in Auckland and Shannon were recruited in the pilot testing of methods (response rate 70%). The pilot testing was to assess the acceptability of the 24-h record-assisted recall computerized programme, anthropometric measurements, socio-demographic, physical activity, food security and medical history questions, and the repeatability of the FFQ (Metcalf et al., 2003). Starting points (addresses) were randomly selected with probability proportional to mesh block size in prescribed areas by Statistics New Zealand. Recruiters went to the starting point address and then visited the next 19 houses to the left of the starting point address. From these houses, 125 children were selected according to the following probabilities: Maori, 0.32; Pacific, 0.79; and Other, 0.32. These probabilities came from the proportion of children expected in each ethnic group in the population under study.

Nutrients were calculated using the New Zealand Food Composition Database (New Zealand Food Composition Database, OCNZ93).

Statistical analysis
Statistical analyses were performed with the Statistical Analysis System (SAS Institute Inc., 2004). Median nutrient intakes are reported, and standard deviations were calculated using a very robust scale estimator, the median absolute deviation (MAD) about the median (Hampel, 1974), which is an option in the SAS PROC UNIVARIATE procedure (SAS Institute Inc., 2004). SAS PROC MIXED was used to estimate these variance components using restricted maximum likelihood (REML) using the following SAS code: PROC MIXED; CLASS cluster; MODEL nutrient = ; RANDOM cluster; which results in a cluster variance component, the between-cluster

Table 1 Median (MAD) daily dietary nutrient intakes, weight, height and haemoglobin levels in 40 children aged 1–4 years, 44 children aged 5–9 years and 41 children aged 10–14 years

| Variable | 1–4 years | 5–9 years | 10–14 years |
| --- | --- | --- | --- |
| Energy (MJ) | 5.766 (2.266) | 7.901 (2.487) | 8.674 (2.963) |
| Carbohydrate (g) | 195 (71.8) | 243 (73.2) | 247 (96.2) |
| Carbohydrate as % energy | 54.5 (9.95) | 50.3 (7.34) | 52.1 (7.35) |
| Protein (g) | 49.7 (23.0) | 62.4 (27.0) | 73.6 (33.5) |
| Protein as % energy | 13.4 (4.10) | 12.6 (4.24) | 14.2 (3.77) |
| Fat (g) | 47.6 (25.3) | 82.8 (29.3) | 86.1 (27.2) |
| Fat as % energy | 31.7 (5.98) | 32.1 (7.98) | 35.6 (6.07) |
| Saturated fat (g) | 20.7 (12.37) | 32.1 (18.97) | 37.0 (17.75) |
| Saturated fat as % energy | 14.4 (4.46) | 13.8 (4.82) | 15.0 (3.55) |
| Monounsaturated fat (g) | 16.1 (9.28) | 26.3 (13.33) | 28.8 (10.50) |
| Monounsaturated fat as % energy | 9.1 (3.18) | 11.1 (3.34) | 11.5 (3.02) |
| Polyunsaturated fat (g) | 6.5 (3.77) | 8.0 (4.70) | 10.0 (6.24) |
| Polyunsaturated fat as % energy | 4.0 (1.84) | 4.1 (1.56) | 3.6 (1.76) |
| Water (g) | 1285 (432.4) | 1347 (507.5) | 1470 (402.3) |
| Fibre (g) | 11.6 (5.33) | 14.3 (4.08) | 17.5 (7.88) |
| Sugars (g) | 113.7 (75.7) | 116.0 (55.5) | 118.6 (60.5) |
| Glucose (g) | 15.6 (10.0) | 15.5 (9.0) | 14.2 (12.5) |
| Fructose (g) | 17.5 (12.6) | 17.1 (15.1) | 18.1 (15.5) |
| Sucrose (g) | 47.4 (41.4) | 62.7 (45.9) | 59.4 (34.5) |
| Lactose (g) | 14.3 (9.46) | 13.9 (7.74) | 14.0 (11.43) |
| Maltose (g) | 2.46 (1.68) | 3.20 (1.98) | 3.95 (3.63) |
| Starch (g) | 88.1 (39.0) | 116.5 (39.1) | 129.5 (58.9) |
| Cholesterol (mg) | 123.0 (70.7) | 188.4 (110.6) | 256.4 (171.4) |
| Thiamine (mg) | 1.39 (0.65) | 1.42 (0.83) | 1.79 (1.08) |
| Riboflavin (mg) | 1.55 (0.67) | 1.63 (0.66) | 1.75 (0.77) |
| Niacin total (mg) | 9.5 (4.04) | 12.7 (4.03) | 14.7 (6.92) |
| Niacin equivalents from tryptophan (mg) | 20.1 (7.34) | 27.8 (11.69) | 29.3 (12.24) |
| Vitamin A (μg) | 486.4 (305.8) | 582.5 (409.6) | 627.4 (375.4) |
| Vitamin C (mg) | 81.7 (77.2) | 104.2 (105.9) | 114.1 (100.7) |
| Vitamin D (μg) | 1.70 (1.68) | 1.73 (1.31) | 2.71 (2.32) |
| Vitamin E (mg) | 4.56 (2.32) | 6.37 (3.05) | 7.37 (4.21) |
| Vitamin B6 (mg) | 1.00 (0.74) | 1.19 (0.65) | 1.21 (0.61) |
| Vitamin B12 (μg) | 2.65 (1.82) | 2.61 (1.63) | 4.05 (2.66) |
| Folate (μg) | 149.2 (56.4) | 170.7 (68.2) | 184.9 (67.6) |
| Beta carotene (μg) | 668.7 (658.0) | 1055.6 (996.6) | 1015.0 (835.9) |
| Retinol (μg) | 264.8 (137.0) | 329.4 (225.0) | 257.2 (207.3) |
| Sodium (mg) | 1615 (560.5) | 2520 (1257.1) | 2722 (1052.3) |
| Potassium (mg) | 2127 (992.6) | 2164 (670.6) | 2501 (961.0) |
| Magnesium (mg) | 198.4 (77.5) | 221.2 (57.8) | 226.9 (83.8) |
| Calcium (mg) | 627.2 (326.4) | 599.5 (347.9) | 649.6 (457.7) |
| Phosphorus (mg) | 851.8 (331.5) | 1109.9 (414.9) | 1177.5 (458.6) |
| Iron (mg) | 7.7 (2.94) | 9.8 (2.66) | 11.2 (3.97) |
| Zinc (mg) | 6.4 (2.42) | 8.2 (3.10) | 10.7 (6.43) |
| Manganese (μg) | 1892 (667.7) | 2423 (851.2) | 2688 (1140.1) |
| Copper (mg) | 0.75 (0.45) | 1.01 (0.42) | 1.01 (0.47) |
| Selenium (μg) | 22.7 (12.23) | 24.3 (9.29) | 38.6 (23.25) |
| Weight (kg) | 15.5 (2.52) | 23.9 (7.27) | 50.0 (13.64) |
| Height (mm) | 960.7 (108.5) | 1248.0 (106.0) | 1543.5 (127.6) |
| Haemoglobin (g/l) | 125.0 (4.45) | 129.0 (5.93) | 133.0 (5.93) |

Abbreviation: MAD, median absolute deviation, a robust measure of the standard deviation.

variance $\sigma_c^2$, and a residual variance component, the within-cluster variance $\sigma_r^2$ (Gulliford et al., 1999), from the same children. For each variable, model assumptions were checked by plotting the residuals against their predicted values for the fixed and random effects, examining a normal plot of the residuals, and examining a normal plot of the cluster estimates of the random effects where the cluster variance was >0 (as the predicted values are a constant). Outliers were removed, if necessary, and the models refitted to see if the results changed substantially. However, it has previously been noted that the PROC MIXED estimates of the cluster and residual variance derived under normality assumptions are reasonable estimators even when their distributions are unspecified (Harville, 1977). The intra-class correlation coefficient ($\rho_i$) was then calculated as $\rho_i = \sigma_c^2 / (\sigma_c^2 + \sigma_r^2)$ (Gulliford et al., 1999). DEFs were calculated using the following formula:

Table 2 Within-cluster design effects, intraclass correlation coefficients and variance components for dietary nutrients, weight, height and haemoglobin from 125 children

| Nutrient | Between cluster variance | Within cluster variance | Intraclass correlation coefficient | Design effect |
| --- | --- | --- | --- | --- |
| Energy (MJ) | 3.813 | 8.077 | 0.321 | 1.480 |
| Carbohydrate (g) | 2753.51 | 11 013.00 | 0.200 | 1.300 |
| Carbohydrate as % energy | 0.00 | 90.44 | 0.000 | 1.000 |
| Protein (g) | 211.22 | 629.96 | 0.251 | 1.376 |
| Protein as % energy | 0.00 | 17.02 | 0.000 | 1.000 |
| Fat (g) | 467.37 | 1031.64 | 0.312 | 1.467 |
| Fat as % energy | 2.39 | 58.88 | 0.039 | 1.059 |
| Saturated fat (g) | 118.48 | 203.23 | 0.368 | 1.552 |
| Saturated fat as % energy | 2.27 | 16.56 | 0.120 | 1.180 |
| Monounsaturated fat (g) | 51.81 | 140.22 | 0.270 | 1.404 |
| Monounsaturated fat as % energy | 0.17 | 10.40 | 0.016 | 1.024 |
| Polyunsaturated fat (g) | 0.70 | 48.24 | 0.014 | 1.022 |
| Polyunsaturated fat as % energy | 0.62 | 6.08 | 0.092 | 1.138 |
| Water (g) | 28 469.00 | 221 417.00 | 0.114 | 1.171 |
| Fibre (g) | 4.12 | 122.37 | 0.033 | 1.049 |
| Sugars (g) | 759.96 | 5660.00 | 0.118 | 1.177 |
| Glucose (g) | 9.12 | 139.77 | 0.061 | 1.092 |
| Fructose (g) | 0.00 | 178.36 | 0.000 | 1.000 |
| Sucrose (g) | 236.78 | 4698.69 | 0.048 | 1.072 |
| Lactose (g) | 5.29 | 136.77 | 0.037 | 1.056 |
| Maltose (g) | 0.79 | 8.99 | 0.081 | 1.121 |
| Starch (g) | 80.85 | 4935.27 | 0.016 | 1.024 |
| Cholesterol (mg) | 0.00 | 28 124.00 | 0.000 | 1.000 |
| Thiamine (mg) | 0.22 | 1.27 | 0.149 | 1.223 |
| Riboflavin (mg) | 0.14 | 0.85 | 0.142 | 1.212 |
| Niacin total (mg) | 6.17 | 31.58 | 0.163 | 1.245 |
| Niacin equivalents from tryptophan (mg) | 68.00 | 126.09 | 0.350 | 1.525 |
| Vitamin A (μg) | 0.00 | 11 106 419.00 | 0.000 | 1.000 |
| Vitamin C (mg) | 650.26 | 14 690.00 | 0.042 | 1.064 |
| Vitamin D (μg) | 0.00 | 5.10 | 0.000 | 1.000 |
| Vitamin E (mg) | 0.00 | 18.45 | 0.000 | 1.000 |
| Vitamin B6 (mg) | 0.16 | 1.97 | 0.077 | 1.115 |
| Vitamin B12 (μg) | 1.09 | 4.62 | 0.191 | 1.286 |
| Folate (μg) | 1260.40 | 8746.57 | 0.126 | 1.189 |
| Beta carotene (μg) | 0.00 | 402 990 000.00 | 0.000 | 1.000 |
| Retinol (μg) | 2176.86 | 56 721.00 | 0.037 | 1.055 |
| Sodium (mg) | 214 497.00 | 1 495 629.00 | 0.125 | 1.188 |
| Potassium (mg) | 215 649.00 | 1 044 789.00 | 0.171 | 1.256 |
| Magnesium (mg) | 3338.64 | 10 464.00 | 0.242 | 1.362 |
| Calcium (mg) | 15 845.00 | 133 872.00 | 0.106 | 1.159 |
| Phosphorus (mg) | 66 850.00 | 142 527.00 | 0.319 | 1.478 |
| Iron (mg) | 9.49 | 18.53 | 0.339 | 1.507 |
| Zinc (mg) | 4.39 | 21.23 | 0.171 | 1.257 |
| Manganese (μg) | 388 561.00 | 1 764 106.00 | 0.181 | 1.270 |
| Copper (mg) | 4.60 | 112.19 | 0.039 | 1.059 |
| Selenium (μg) | 0.00 | 579.29 | 0.000 | 1.000 |
| Weight (kg) | 12.77 | 295.83 | 0.041 | 1.062 |
| Height (mm) | 10 542.00 | 666 592.00 | 0.016 | 1.023 |
| Haemoglobin (g/l) | 14.91 | 66.64 | 0.183 | 1.274 |
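
As a cross-check on Table 2, the sketch below recomputes the intraclass correlation and design effect columns from the two variance components, using $\rho_i = \sigma_c^2 / (\sigma_c^2 + \sigma_r^2)$ and $\mathrm{DEF} = 1 + (m_0 - 1)\rho_i$ with the reported mean cluster size of 2.498. The function names and the choice of four rows are illustrative assumptions, not part of the original analysis, which was run in SAS.

```python
def intraclass_correlation(var_between: float, var_within: float) -> float:
    """Share of total variance that lies between clusters."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total


def design_effect(icc: float, mean_cluster_size: float) -> float:
    """DEF = 1 + (m0 - 1) * icc."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


# Mean cluster size reported for the 125 children in 50 clusters.
M0 = 2.498

# (between-cluster variance, within-cluster variance) copied from Table 2.
TABLE2_ROWS = {
    "Energy (MJ)": (3.813, 8.077),
    "Protein (g)": (211.22, 629.96),
    "Iron (mg)": (9.49, 18.53),
    "Fructose (g)": (0.00, 178.36),
}

for name, (var_b, var_w) in TABLE2_ROWS.items():
    icc = intraclass_correlation(var_b, var_w)
    print(f"{name}: ICC = {icc:.3f}, DEF = {design_effect(icc, M0):.3f}")
```

For these rows the recomputed values agree with the printed columns to three decimals, for example 0.321 and 1.480 for energy.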

$\mathrm{DEF} = 1 + (m_0 - 1) \times \rho_i$ (Gulliford et al., 1999). The mean cluster size was calculated as (Armitage and Berry, 1994):

$$m_0 = \frac{1}{c - 1}\left[\, n - \frac{\sum_{i=1}^{c} m_i^2}{n} \,\right]$$

where $c$ is the total number of clusters, $m_i$ is the number of children in the $i$th cluster and $n$ is the total number of children in the sample. In the 15 households with two children sampled, the second child was excluded from the within-cluster calculations.

Results

Children were aged 1–14 years with a median of 7 years. There were 49 girls and 76 boys. Table 1 shows medians and MADs (a robust measure of the standard deviation) for daily nutrient intakes, weight, height and haemoglobin levels by age groups 1–4, 5–9 and 10–14 years. As expected, in general, median nutrient intakes, weight and height increased with age. Exceptions were carbohydrate, protein, saturated fat, all expressed as their percentage contribution to total energy intakes, glucose, fructose, sucrose, lactose, beta carotene, retinol, calcium and copper.

DEFs were calculated for sampling by cluster (intra-cluster) and for sampling more than one child per household (intra-household).

Intra-cluster design effects
There were 125 children in the random sample of 50 clusters, giving an average of 2.498 children per cluster, with a range of 1–6. The variance components, DEF and intra-cluster correlation for individual nutrients, weight, height and haemoglobin are reported in Table 2. Dietary nutrient DEFs ranged from 1.0 for cholesterol, beta carotene, vitamin A, vitamin D, vitamin E, fructose, and both carbohydrate and protein when expressed by their contribution to total energy intakes, to 1.552 for saturated fat, with a median DEF of 1.148. The corresponding intra-cluster correlation coefficients ranged from 0.00 to 0.37. The DEFs (and corresponding intra-cluster correlations, $\rho_i$) for weight (1.062; $\rho_i = 0.041$), height (1.023; $\rho_i = 0.016$) and haemoglobin (1.274; $\rho_i = 0.183$) were all relatively small.

DEFs for nutrients were categorized into groups of width 0.1 to determine whether the number of DEFs in each category decreased with increasing size of the effect, which could suggest random outliers. Figure 1 shows the frequency distribution of the DEFs across the 41 crude nutrients. Nutrients expressed as their contribution to total energy intakes were not included in this figure because they are a derived quantity and because some nutrients would have been included twice. As the frequency of the size of the DEFs did not decrease much as the DEFs got larger, this suggested that there were important DEFs within clusters.

Note that the size of the between-cluster and within-cluster variance components is dependent on the magnitude of the nutrient intakes.

Intra-household design effects
There were 30 children from 15 households, with two children per household. The DEF and intra-household correlation for individual nutrients, weight, height and haemoglobin are given in Table 3. Intra-household DEFs ranged from 1 for height to 1.839 for vitamin B6, corresponding to intra-household correlations of 0 and 0.839, respectively. The median within-household DEF was 1.550. In general, the DEFs were much larger within households than within clusters. Exceptions were for fat expressed by its contribution to total energy intake, manganese, height and haemoglobin.

Example
The following example shows how the data in Tables 1 and 2 can be used for calculating sample sizes. To detect a difference in total energy of 0.500 MJ between two groups of 1–4-year-old children at the 5% significance level and 80% power, then

$$n = \frac{2\,(Z_{\alpha/2} + Z_{1-\beta})^2\,\sigma^2}{\Delta^2}$$

where $n$ is the number of people required per group, $Z$ is from the normal distribution, $\alpha$ is the significance level, $1 - \beta$ is the power, $\sigma^2$ is the standard deviation squared, and $\Delta$ is the difference between groups to be detected. If we use a random sample, $n = 2 \times (1.96 + 0.84)^2 \times (2.266)^2 / 0.500^2 = 324$ children per group, and allowing for the DEF for a clustered sample, we have $n = 1.492 \times 324 = 484$ children per group.
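
The calculation in this example can be sketched in Python as follows. This is a minimal illustration rather than the authors' procedure: the function names are hypothetical, the normal quantiles come from the standard library, and the standard deviation is the MAD for energy in 1–4-year-olds from Table 1. With unrounded quantiles the simple-random-sample size evaluates to about 322 per group rather than the printed 324, presumably because of rounding in the worked example (an assumption), so the inflation step below uses the printed figures directly.

```python
import math
from statistics import NormalDist


def n_per_group(sd: float, delta: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Two-group sample size per group under simple random sampling.

    n = 2 * (Z_{alpha/2} + Z_{1-beta})^2 * sd^2 / delta^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # power = 1 - beta
    return 2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2


def inflate_for_clustering(n_srs: float, deff: float) -> int:
    """Multiply a simple-random-sample size by the design effect, rounding up."""
    return math.ceil(n_srs * deff)


# Energy in 1-4-year-olds: MAD 2.266 MJ (Table 1), difference to detect 0.500 MJ.
n_srs = n_per_group(sd=2.266, delta=0.500)
print(f"simple random sample, unrounded quantiles: {n_srs:.1f} per group")

# Figures as printed in the example: 324 per group and a DEF of 1.492.
print("clustered sample:", inflate_for_clustering(324, 1.492), "per group")
```

Keeping the two steps separate makes it easy to swap in a different design effect, for instance one computed for a larger planned cluster size.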

Discussion

We have reported both intra-cluster and intra-household DEFs, and intra-class correlation coefficients calculated from 24 h record-assisted recall data for a number of nutrient intakes, height, weight and haemoglobin, associated with randomizing street addresses rather than individuals.
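
To make the reported correlations concrete, the sketch below applies $\mathrm{DEF} = 1 + (m - 1)\rho$ to a few cluster and household sizes, using correlations taken from Tables 2 and 3. The helper names are illustrative, and the effective sample size ($n/\mathrm{DEF}$) is a standard survey-sampling interpretation added here as an assumption; it is not a quantity reported in this study.

```python
def design_effect(icc: float, cluster_size: float) -> float:
    """DEF = 1 + (m - 1) * rho for clusters of size m."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n: float, deff: float) -> float:
    """Number of independent observations that n clustered ones are worth."""
    return n / deff


# Correlations as reported in Table 2 (intra-cluster) and Table 3 (intra-household).
ICC_ENERGY_CLUSTER = 0.321
ICC_ENERGY_HOUSEHOLD = 0.492
ICC_COPPER_HOUSEHOLD = 0.789

scenarios = [
    ("energy, 20 children per cluster", ICC_ENERGY_CLUSTER, 20),
    ("energy, 3 children per household", ICC_ENERGY_HOUSEHOLD, 3),
    ("copper, 3 children per household", ICC_COPPER_HOUSEHOLD, 3),
]

for label, icc, m in scenarios:
    deff = design_effect(icc, m)
    print(f"{label}: DEF = {deff:.3f}, "
          f"each group of {m} is worth about {effective_sample_size(m, deff):.2f} independent children")
```

Under these assumptions, three children from one household contribute little more than one child's worth of information on copper intake.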

Figure 1 Frequency of intra-cluster DEFs for nutrients categorized into groups of width 0.1.

Recruitment method
The sampling scheme in the current study used probabilities of selection for the different ethnic groups in the area under

Table 3 Within-household design effects, intraclass correlation coefficients and variance components for nutrients, weight, height and haemoglobin from 30 children

| Nutrient | Between household variance | Within household variance | Intraclass correlation coefficient | Design effect |
| --- | --- | --- | --- | --- |
| Energy (MJ) | 9.44 | 9.77 | 0.492 | 1.492 |
| Carbohydrate (g) | 6679.28 | 10 412.00 | 0.391 | 1.391 |
| Carbohydrate as % energy | 14.07 | 59.53 | 0.191 | 1.191 |
| Protein (g) | 528.96 | 430.55 | 0.551 | 1.551 |
| Protein as % energy | 4.03 | 10.81 | 0.271 | 1.271 |
| Fat (g) | 1310.27 | 1223.09 | 0.517 | 1.517 |
| Fat as % energy | 16.91 | 31.43 | 0.350 | 1.350 |
| Saturated fat (g) | 335.66 | 326.00 | 0.507 | 1.507 |
| Saturated fat as % energy | 8.92 | 10.84 | 0.451 | 1.451 |
| Monounsaturated fat (g) | 192.05 | 123.04 | 0.610 | 1.610 |
| Monounsaturated fat as % energy | 4.63 | 5.02 | 0.479 | 1.479 |
| Polyunsaturated fat (g) | 21.95 | 14.07 | 0.609 | 1.609 |
| Polyunsaturated fat as % energy | 7.25 | 2.49 | 0.744 | 1.744 |
| Water (g) | 153 284.00 | 107 859.00 | 0.587 | 1.587 |
| Fibre (g) | 36.67 | 40.89 | 0.473 | 1.473 |
| Sugars (g) | 3274.66 | 2393.33 | 0.578 | 1.578 |
| Glucose (g) | 175.60 | 45.63 | 0.794 | 1.794 |
| Fructose (g) | 155.65 | 66.13 | 0.702 | 1.702 |
| Sucrose (g) | 567.81 | 1969.88 | 0.224 | 1.224 |
| Lactose (g) | 73.19 | 157.78 | 0.317 | 1.317 |
| Maltose (g) | 7.94 | 6.18 | 0.562 | 1.562 |
| Starch (g) | 1035.05 | 3604.03 | 0.223 | 1.223 |
| Cholesterol (mg) | 11 374.00 | 7653.16 | 0.598 | 1.598 |
| Thiamine (mg) | 1.39 | 0.71 | 0.661 | 1.661 |
| Riboflavin (mg) | 0.70 | 0.41 | 0.630 | 1.630 |
| Niacin total (mg) | 19.57 | 20.81 | 0.485 | 1.485 |
| Niacin equivalents from tryptophan (mg) | 108.08 | 74.95 | 0.591 | 1.591 |
| Vitamin A (μg) | 98 029.00 | 164 901.00 | 0.373 | 1.373 |
| Vitamin C (mg) | 10 290.00 | 4517.86 | 0.695 | 1.695 |
| Vitamin D (μg) | 1.68 | 3.93 | 0.299 | 1.299 |
| Vitamin E (mg) | 6.74 | 8.77 | 0.435 | 1.435 |
| Vitamin B6 (mg) | 0.64 | 0.12 | 0.839 | 1.839 |
| Vitamin B12 (μg) | 4.79 | 3.82 | 0.556 | 1.556 |
| Folate (μg) | 4151.76 | 7207.92 | 0.365 | 1.365 |
| Beta carotene (μg) | 921 447.00 | 2 773 847.00 | 0.249 | 1.249 |
| Retinol (μg) | 43 001.00 | 58 937.00 | 0.422 | 1.422 |
| Sodium (mg) | 1 078 200.00 | 786 317.00 | 0.578 | 1.578 |
| Potassium (mg) | 1 233 911.00 | 405 616.00 | 0.753 | 1.753 |
| Magnesium (mg) | 7821.54 | 3605.88 | 0.684 | 1.684 |
| Calcium (mg) | 112 323.00 | 84 900.00 | 0.570 | 1.570 |
| Phosphorus (mg) | 152 321.00 | 124 910.00 | 0.549 | 1.549 |
| Iron (mg) | 31.75 | 11.05 | 0.742 | 1.742 |
| Zinc (mg) | 15.69 | 9.67 | 0.619 | 1.619 |
| Manganese (μg) | 376 605.00 | 1 138 182.00 | 0.249 | 1.249 |
| Copper (mg) | 0.21 | 0.06 | 0.789 | 1.789 |
| Selenium (μg) | 191.74 | 149.46 | 0.562 | 1.562 |
| Weight (kg) | 16.94 | 241.92 | 0.065 | 1.065 |
| Height (mm) | 0.00 | 93 868.00 | 0.000 | 1.000 |
| Haemoglobin (g/l) | 19.24 | 105.75 | 0.155 | 1.155 |

study, which meant that there was only a probability of 0.24 of two children being recruited from any one household, and a very small probability (<0.008) of more than two children being selected from a single household. It was also possible that no children from a household with eligible children were selected. The advantage of this selection process was that, within the three ethnic groups, all individuals were selected with the same probability and hence no weighting would be required for within-group analyses. In addition, the selection probabilities were such that all three ethnic groups would have approximately the same number of children selected in the final sample.

Intra-cluster DEFs
The DEFs and intra-cluster correlations reported in Table 2 suggest that the method of cluster sampling used in this pilot survey would increase the required sample size by 15% on average for most nutrients, with a maximum required increase of 55% to cover all nutrients, compared to using simple random sampling.

For example, the intra-cluster correlation of 0.32 for total energy means that for a sample size of 20 per cluster, the DEF is $1 + 19 \times 0.321 = 7.10$. Thus, there is a cost associated with larger-sized clusters. This 'cost' depends on the relative costs of recruiting and interviewing. Connelly (2003) describes how to select economically efficient combination(s) of clusters and cluster size.

Intra-household design effects
Table 3 shows that the DEFs were generally larger within a household than within clusters. This shows that the diets of children living in the same household were much more similar than those of children living in other houses.

The main reason for estimating the within-household DEFs was to determine the degree to which the DEFs were inflated. The two main factors that influence this are (i) the intra-household correlation and (ii) variability in the household sizes. If the intra-household correlation is high, it means the characteristics (e.g. nutrient intakes) of children within the same household are very similar. This suggests that only one child should be sampled per household for a study similar to the current one (in effect, sampling the other children adds little further information to the survey results because of the close similarities of dietary intakes within a household). However, if the analysis of the data focuses on examining age groups separately, any DEFs introduced by sampling more than one per household would be reduced in such an analysis because the children within a household are likely to be spread across different age groups. As for household size, sampling only one child per household leads to disproportionately sampling children who belong to households with fewer children. This does not necessarily lead to any bias (any potential bias is removed by the use of weights inversely proportional to the household size), but it is not necessarily very efficient (especially as the Pacific Island population was a key subdomain of interest). This population group has more variably sized households (with regard to the number of children) in New Zealand.

An additional factor that needs to be taken into account relates to the practical issues surrounding the fieldwork for the survey. Reducing the costs of interviewers' travel and time is a prime factor in undertaking a clustered approach. Sampling more children per household would mean less travel, less work making contact with households and less work convincing parents to participate. Once one child has been surveyed, the marginal costs of including further children from the same household would most likely be considerably smaller than moving to a different household to survey the next child.

The decision of whether to select more than one child in a household depends on the relative costs of interviewing more than one child in a household versus the costs of going to single households. For example, the intra-household correlation of 0.492 for total energy intake means that collecting information from up to three children in a household, the DEF is $1 + 2 \times 0.492 = 1.984$. In contrast, the intra-household correlation for copper of 0.789 gives a DEF of 2.578, giving very little additional information for the extra two children interviewed. Thus, in general, within-household variation is smaller than within-cluster variation. An inverse relationship between cluster size and the degree of within-cluster variation has been described previously (Gulliford et al., 1999).

Although there is a need for the publication of variance components and intra-class correlation coefficients to aid the design of complex surveys, these do not appear to have been published widely in the past (Campbell and Grimshaw, 1998). A selection of studies that have done so can be found in the book by Donner and Klar (2000). The intra-class correlation coefficient is more generalizable than the DEF, as the latter is dependent on cluster size.

These data enable sample sizes to be estimated for cluster sampling of dietary nutrient intakes in children aged 1–14 years. The larger DEFs found within households suggest that little extra information may be obtained by sampling more than one child per household, but this decision should be governed by the relative costs of interviewing more than one child per household versus the costs of going to single households.

Acknowledgements

Other collaborators on the Children's Nutrition Pilot Survey were Professor Boyd Swinburn, Dr Cameron Grant and Dr David Schaaf from the University of Auckland, Professor Mason Durie and Eljon Fitzgerald (Massey University, Palmerston North), Dr Elaine Rush (Auckland University of Technology) and Dr Clare Wall, Kate Sladden and Patsy Watson (Massey University, Albany). Dr Patricia Metcalf was supported by the Health Research Council of New Zealand.

References

Armitage P, Berry G (1994). Statistical methods in medical research, 3rd ed. Blackwell Scientific Publications: Oxford.
Campbell MK, Grimshaw JM (1998). Cluster randomised trials: time for improvement. The implications of adopting a cluster design are still largely being ignored. BMJ 317, 1171–1172.
Connelly LB (2003). Balancing the number and size of sites: an economic approach to the optimal design of cluster samples. Control Clin Trials 24, 544–559.
Donner A, Brown KS, Brasher P (1990). A methodological review of non-therapeutic intervention trials employing cluster randomisation, 1979–1989. Int J Epidemiol 19, 795–800.
Donner A, Klar N (2000). Design and analysis of cluster randomization trials in health research. Arnold: London.
Food Files (2004). Data Files of the New Zealand Food Composition Database. Palmerston North, New Zealand: New Zealand Institute of Crop & Food Research.
Gulliford MC, Ukoumunne OC, Chinn S (1999). Components of variance and intraclass correlations for the design of community-based surveys and intervention studies: data from the Health Survey for England 1994. Am J Epidemiol 149, 876–883.

Hampel FR (1974). The influence curve and its role in robust estimation. J Am Stat Assoc 69, 383–393.
Harville DA (1977). Maximum likelihood approaches to variance component estimations and to related problems. J Am Stat Assoc 72, 320–338.
Kerry SM, Bland JM (1998). The intracluster correlation coefficient in cluster randomisation. BMJ 316, 1455–1460.
Koepsell TD, Wagner EH, Cheadle AC, Patrick DL, Martin DC, Diehr PH et al. (1992). Selected methodological issues in evaluating community-based health promotion and disease prevention programs. Annu Rev Public Health 13, 31–57.
Metcalf PA, Scragg RK, Sharpe S, Fitzgerald ED, Schaaf D, Watts C (2003). Short-term repeatability of a food frequency questionnaire in New Zealand children aged 1–14 years. Eur J Clin Nutr 57, 1498–1503.
Reading R, Harvey I, Mclean M (2000). Cluster randomised trials in maternal and child health: implications for power and sample size. Arch Dis Child 82, 79–83.
Rice N, Leyland A (1996). Multilevel models: applications to health data. J Health Services Res Policy 1, 154–164.
SAS Institute Inc (2004). SAS/STAT User's Guide Version 9.1. SAS Institute Inc.: Cary, NC.
