APPENDIX C: THE DESIGN OF THE SYPE SAMPLE AND THE CALCULATION OF SAMPLING WEIGHTS

The sample of the 2009 Survey of Young People in Egypt (SYPE) was designed to be representative at both the national and regional levels. The sample size of approximately 17,000 young people aged 10 to 29 was chosen to provide estimates of key indicators for adolescents and youth for the country as a whole and for four administrative regions (the Urban governorates, Lower Egypt governorates, Upper Egypt governorates and Frontier governorates) and, where relevant, for the urban and rural segments of these regions. These indicators include never-enrollment rates, dropout rates, the incidence of child labor and unemployment rates. Based on previous statistics on the share of young people in the relevant age and sex groups, we determined that a nationally representative sample of 11,000 households would be sufficient. To obtain accurate estimates for the Frontier governorates, these governorates had to be oversampled. As a result, the SYPE sample is not self-weighting, and weights are needed to obtain correct estimates. The weights are derived in Section C.2 below.

C.1 Sample Design

Sample Frame

The SYPE sample was designed as a multi-stage stratified cluster sample. The primary sampling units (PSUs) were selected from a CAPMAS master sample. The master sample is a stratified cluster sample of 2,400 PSUs, divided into 1,080 urban and 1,320 rural PSUs. These PSUs are drawn from a frame of enumeration areas (EAs) covering the entire country, prepared by CAPMAS from the 2006 Population Census. Each EA is drawn up to contain roughly 1,500 dwelling units. The sample is stratified by governorate, and each governorate is further stratified into urban and rural segments where relevant. The distribution of PSUs across strata in the master sample reflects the distribution of the population, so that the master sample is self-weighting.

To achieve a fairly wide geographic dispersion in the SYPE sample and thus minimize the design effect, we set the number of households per cluster to 25. To obtain these 25 households, 25 dwelling units were systematically selected from the roughly 1,500 listed in each EA.⁶⁴ To reach the required sample size, we set the number of PSUs to 455, for a total sample size of 455 × 25 = 11,375 households. The distribution of PSUs across governorates and urban and rural areas in both the master sample and the SYPE sample is shown in Table 1. The final sample of interviewed households comprised 11,372 households, which yielded a total of 15,029 young people aged 10-29.
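The systematic selection of 25 dwelling units from a listing of roughly 1,500 can be illustrated with a short Python sketch. It assumes a textbook fixed-interval design with a random start (interval = listing size / 25); the function name `systematic_sample` is hypothetical, and the sketch does not reproduce the exact field procedure, including how the extra replacement units mentioned in footnote 64 were drawn.

```python
import random


def systematic_sample(frame_size: int, n: int, rng: random.Random) -> list[int]:
    """Pick n distinct 0-based positions from a listing of frame_size units
    using a fixed sampling interval and a random start (hypothetical helper)."""
    if not 0 < n <= frame_size:
        raise ValueError("n must be between 1 and frame_size")
    interval = frame_size / n            # e.g. 1500 / 25 = 60.0
    start = rng.random() * interval      # random start in [0, interval)
    # Every pick is start + k * interval, rounded down to a listing position.
    return [int(start + k * interval) for k in range(n)]


picks = systematic_sample(1500, 25, random.Random(2009))
print(len(picks), picks[:3])
```

Because the interval is at least one unit, the selected positions are always distinct and evenly spread across the whole listing, which is what gives systematic selection its implicit geographic stratification within the EA.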

As shown in Table 1, the PSUs in the SYPE sample were drawn from the EAs in the master sample at an overall rate of roughly 19%-20%. With the exception of the Frontier governorates and the Luxor administrative area, the sampling rate varies within a relatively narrow range, from 14% to 27%. To get good representation from the sparsely populated Frontier governorates, we increased the sampling rate significantly, in some cases retaining all the PSUs in the master sample. Weights are derived at the level of the administrative region to account for these varying sampling rates.
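The effect of the oversampling can be made concrete by computing region-level PSU sampling rates from Table 1. The regional totals in this Python sketch are sums of the Total columns of Table 1; the "relative weight" is simply the overall rate (455/2,400) divided by the regional rate, an illustrative design weight that ignores the non-response adjustments introduced in Section C.2. Variable and function names are hypothetical.

```python
# Regional totals, summed from the "Total" columns of Table 1.
MASTER_EAS = {"Urban": 471, "Lower Egypt": 1049, "Upper Egypt": 844, "Frontier": 36}
SYPE_PSUS = {"Urban": 88, "Lower Egypt": 181, "Upper Egypt": 154, "Frontier": 32}


def relative_design_weights(master: dict, sample: dict):
    """Return (rates, weights): the PSU sampling rate per region and its
    inverse scaled so that a region sampled at the overall rate gets 1.0.
    Illustrative only: no household or individual non-response adjustment."""
    overall_rate = sum(sample.values()) / sum(master.values())
    rates = {r: sample[r] / master[r] for r in master}
    weights = {r: overall_rate / rates[r] for r in master}
    return rates, weights


rates, weights = relative_design_weights(MASTER_EAS, SYPE_PSUS)
for region in MASTER_EAS:
    print(f"{region:12s} rate={rates[region]:.1%} relative weight={weights[region]:.3f}")
```

The Urban, Lower Egypt and Upper Egypt regions come out with relative weights between about 1.01 and 1.10, while the Frontier governorates, sampled at about 89%, get roughly 0.21: each Frontier PSU stands for about one fifth as many master-sample EAs as a PSU elsewhere.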

64 An additional 5 to 10 dwelling units per PSU were selected to allow for replacement in case the dwelling could not be located or was found vacant.

Survey of Young People in Egypt / January 2011

Selecting the Urban Slums Sub-Sample

One of the objectives of SYPE is to obtain separate estimates for young people living in urban slums (referred to in the report chapters as informal urban areas). To ensure adequate representation of urban slums, we used a study conducted by the Information and Decision Support Center of the Egyptian Cabinet of Ministers (IDSC) to classify the urban PSUs in the CAPMAS master sample into slum and non-slum areas. Deciding how to allocate urban PSUs between slum and non-slum areas was not straightforward, given the unreliability of the data on the population of slum areas.

Table 1: Distribution of primary sampling units across governorates and urban/rural areas in the CAPMAS master sample and the SYPE sample

                      EAs in CAPMAS        PSUs in SYPE         Sampling rate from
                      master sample        sample               master sample
Governorate           Total  Urban  Rural  Total  Urban  Rural  Total  Urban  Rural
Urban governorates
  Cairo                 285    285      0     55     55      0    19%    19%     --
  Alexandria            149    149      0     25     25      0    17%    17%     --
  Port Said              20     20      0      4      4      0    20%    20%     --
  Suez                   17     17      0      4      4      0    24%    24%     --
Lower Egypt
  Damietta               39     15     24      8      3      5    21%    20%    21%
  Dakahlia              176     50    126     29      8     21    16%    16%    17%
  Sharkia               175     42    133     29      7     22    17%    17%    17%
  Qalioubia             145     56     89     21      8     13    14%    14%    15%
  Kafr El Sheikh         85     20     65     17      4     13    20%    20%    20%
  Gharbia               139     44     95     25      8     17    18%    18%    18%
  Menoufia              107     23     84     21      5     16    20%    22%    19%
  Behira                152     31    121     23      4     17    15%    13%    14%
  Ismailia               31     15     16      8      4      4    26%    27%    25%
Upper Egypt
  Giza                  215    130     85     34     21     13    16%    16%    15%
  Beni Suef              69     17     52     13      3     10    19%    18%    19%
  Fayoum                 78     19     59     13      3     10    17%    16%    17%
  Minya                 128     26    102     23      4     17    18%    15%    17%
  Assiut                101     28     73     17      5     12    17%    18%    16%
  Souhag                114     25     89     21      5     16    18%    20%    18%
  Qena                   88     20     68     17      4     13    19%    20%    19%
  Aswan                  37     16     21      8      3      5    22%    19%    24%
  Luxor                  14      7      7      8      4      4    57%    57%    57%
Frontier governorates
  Matrouh                 8      6      2      8      6      2   100%   100%   100%
  New Valley              6      3      3      6      3      3   100%   100%   100%
  Red Sea                 9      8      1      6      5      1    67%    63%   100%
  North Sinai            10      6      4      9      5      4    90%    83%   100%
  South Sinai             3      2      1      3      2      1   100%   100%   100%
Total                  2400   1080   1320    455    212    239    19%    20%    18%


Table 2: Distribution of urban slum areas nationally and in the SYPE sample

                    National        All urban       Urban slum      Urban non-slum
                    slum areas      SYPE PSUs       SYPE PSUs       SYPE PSUs
Governorate         Number Percent  Number Percent  Number Percent  Number Percent
Cairo                   74      8%      55     26%       4      9%      51     30%
Alexandria              29      3%      25     12%       2      5%      23     14%
Port Said                3      0%       4      2%       0      0%       4      2%
Suez                     3      0%       4      2%       0      0%       4      2%
Damietta                37      4%       3      1%       2      5%       1      1%
Dakahlia               119     12%       8      4%       6     14%       2      1%
Sharkia                 82      9%       7      3%       4      9%       3      2%
Qalioubia               58      6%       8      4%       3      7%       5      3%
Kafr El Sheikh          49      5%       4      2%       3      7%       1      1%
Gharbia                 48      5%       8      4%       3      7%       5      3%
Menoufia                45      5%       5      2%       2      5%       3      2%
Behira                  71      7%       4      2%       4      9%       0      0%
Ismailia                12      1%       4      2%       1      2%       3      2%
Giza                    29      3%      21     10%       2      5%      19     11%
Beni Suef               38      4%       3      1%       2      5%       1      1%
Fayoum                  28      3%       3      1%       1      2%       2      1%
Minya                   75      8%       4      2%       4      9%       0      0%
Assiut                   0      0%       5      2%       0      0%       5      3%
Souhag                  22      2%       5      2%       1      2%       4      2%
Qena                     0      0%       4      2%       0      0%       4      2%
Aswan                    7      1%       3      1%       0      0%       3      2%
Luxor                   12      1%       4      2%       0      0%       4      2%
Matrouh                 30      3%       6      3%       0      0%       6      4%
El Wadi El Gedid         5      1%       3      1%       0      0%       3      2%
Red Sea                 47      5%       5      2%       0      0%       5      3%
North Sinai             36      4%       5      2%       0      0%       5      3%
South Sinai              0      0%       2      1%       0      0%       2      1%
Total                  959    100%     212    100%      44    100%     168    100%

First, we had to decide how to allocate the 212 urban PSUs between slum and non-slum areas. The most reasonable estimate of the share of slums in the urban population was close to 20%, which led us to allocate 44 of the 212 urban PSUs in the sample to slum areas. Second, we had to allocate these 44 slum PSUs across governorates. This allocation was done so as to match as closely as possible the distribution of the number of slum areas across governorates, as shown in Table 2. Ideally, we would have allocated slum PSUs across governorates according to each governorate's share of the slum population rather than its share of the number of slum areas. However, given the unreliable information about the population of slum areas, it was impossible to do the allocation in terms of population. This allocation decision is likely to understate the true share of slums in governorates such as Cairo, Giza and Alexandria, where slums are likely to be larger than average, and to overstate slum populations in governorates such as Damietta, Dakahlia and Sharkia, where slums are probably smaller than average. Without reliable data on slum populations, it is unfortunately not possible to use weights to correct for this possible bias in the geographic distribution of slums.

C.2 Sampling Weights and Expansion Factors

Three sampling weights are included in the SYPE database: (i) the household sampling weight, (ii) the roster individual sampling weight and (iii) the interviewed individual sampling weight. Three corresponding expansion factors scale the sample up to the projected population in mid-2009. It should be kept in mind, however, that both the weights and the expansion factors are designed to reproduce the structure of the population as measured in the 2006 Population Census, since no information is available on changes in the structure of the population between November 2006, when the census was taken, and mid-2009.

The household weight and expansion factor

The household sampling weight ($hw$) is the normalized inverse probability of selection at the household level. This weight takes into account the sampling probability for each stratum and possible differences in household response rates at the PSU level. The strata are again defined as the four administrative regions mentioned above, namely the Urban governorates, Lower Egypt governorates, Upper Egypt governorates and Frontier governorates.⁶⁵ The last three regions were not separated into urban and rural strata, on the assumption that the PSUs within them are allocated to urban and rural components in a self-weighting way. Similarly, the contribution of the slum PSUs within the urban segment was assumed to reflect the true contribution of slums within each of the four strata. We start by defining the following terms:

$h_s$ is the number of households in the sample in stratum $s$

$h$ is the total number of households in the sample

$H_s$ is the number of households in the population in stratum $s$ in the 2006 Population Census

$H$ is the total number of households in the population in the 2006 Population Census

$h_{ps}$ is the number of households in PSU $p$ in stratum $s$ actually interviewed out of the initially selected 25 households

$G$ is the population growth factor from the 2006 Population Census to mid-2009

$n_s$ is the number of individuals in the sample in stratum $s$

$n$ is the total number of individuals in the sample

$N_s$ is the size of the population in stratum $s$ in the 2006 Population Census

$N$ is the total size of the population of Egypt in the 2006 Population Census

$PSU_s$ is the number of PSUs in stratum $s$

65 An administrative reorganization occurred at the end of 2009 that created Helwan governorate out of parts of two existing governorates, and Sixth of October governorate out of part of Giza governorate. For our purposes, Helwan governorate was considered to be part of the Lower Egypt governorates and Sixth of October governorate part of the Upper Egypt governorates.

If every household in the stratum had an equal probability of being selected, the probability of selection in stratum $s$ would be $h_s/H_s$. However, non-response at the PSU level may reduce the probability of being selected. Adjusting the stratum-level probability by the ratio of the number of households interviewed in PSU $p$ to the average number interviewed per PSU in the stratum, $h_s/PSU_s$, gives the probability of being selected in PSU $p$ in stratum $s$:

$$\pi_{ps} = \frac{h_s}{H_s}\left(\frac{h_{ps}}{h_s/PSU_s}\right) = \frac{PSU_s\,h_{ps}}{H_s}$$

Thus the household sampling weight is given by

$$hw_{ps} = \frac{1/\pi_{ps}}{H/h} = \frac{h\,H_s}{H\,PSU_s\,h_{ps}}$$

This expression is obtained by normalizing the inverse probability of selection by its average over the entire sample, $H/h$.⁶⁶

The household expansion factor for each household in PSU $p$ in stratum $s$ is given by:

$$hef_{ps} = G\,\frac{H}{h}\,hw_{ps} = \frac{G\,H_s}{PSU_s\,h_{ps}}$$

The roster individual weight and expansion factor

The roster individual sampling weight ($iw$) is constructed in a similar fashion to the household weight, but takes into account possible differences in average household size between the SYPE sample and the 2006 Population Census. The roster individual weights in PSU $p$ in stratum $s$ are given by:

$$iw_{ps} = hw_{ps}\,\frac{(N_s/H_s)\,/\,(n_s/h_s)}{(N/H)\,/\,(n/h)} = \frac{n\,h_s\,N_s}{N\,n_s\,PSU_s\,h_{ps}}$$

The roster individual expansion factors are given by:

$$ief_{ps} = G\,\frac{N}{n}\,iw_{ps} = \frac{G\,N_s\,h_s}{n_s\,PSU_s\,h_{ps}}$$
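The household and roster individual weights and expansion factors can be checked numerically. The Python sketch below is a minimal illustration that assumes the formulas as written in this section; the `Stratum` class, its field names and all input numbers (including the growth factor G = 1.06) are hypothetical, not SYPE data. It confirms the property behind footnote 66: the household weights average one over the sample households, and the household expansion factors add up to G times the census household total.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One stratum s: census totals and sample counts (hypothetical toy data)."""
    H_s: int          # households in stratum s, 2006 census
    N_s: int          # persons in stratum s, 2006 census
    h_ps: list[int]   # households interviewed in each PSU p (out of 25)
    n_ps: list[int]   # roster individuals listed in each PSU p


def household_and_roster_weights(strata: list[Stratum], G: float):
    """Return one tuple (s, h_ps, hw, hef, iw, ief) per PSU."""
    h = sum(sum(st.h_ps) for st in strata)   # sample households
    n = sum(sum(st.n_ps) for st in strata)   # sample individuals
    H = sum(st.H_s for st in strata)         # census households
    N = sum(st.N_s for st in strata)         # census persons
    rows = []
    for s, st in enumerate(strata):
        psu_s = len(st.h_ps)
        h_s, n_s = sum(st.h_ps), sum(st.n_ps)
        for h_ps in st.h_ps:
            pi_ps = psu_s * h_ps / st.H_s                 # selection probability
            hw = (1.0 / pi_ps) / (H / h)                  # H/h = mean inverse probability
            hef = G * (H / h) * hw                        # household expansion factor
            size_adj = (st.N_s / st.H_s) / (n_s / h_s)    # census vs sample household size
            iw = hw * size_adj / ((N / H) / (n / h))      # roster individual weight
            ief = G * (N / n) * iw                        # roster expansion factor
            rows.append((s, h_ps, hw, hef, iw, ief))
    return rows


# Made-up numbers, not SYPE data: two strata with 3 and 2 PSUs.
strata = [
    Stratum(H_s=40_000, N_s=170_000, h_ps=[25, 24, 25], n_ps=[110, 100, 105]),
    Stratum(H_s=2_000, N_s=9_000, h_ps=[25, 23], n_ps=[120, 112]),
]
rows = household_and_roster_weights(strata, G=1.06)
h_total = sum(r[1] for r in rows)
mean_hw = sum(r[1] * r[2] for r in rows) / h_total   # weight each PSU by its households
total_hef = sum(r[1] * r[3] for r in rows)
print(f"mean hw = {mean_hw:.6f}, sum of hef = {total_hef:,.0f}")
```

With these made-up inputs the script prints `mean hw = 1.000000, sum of hef = 44,520`, that is, 1.06 × 42,000. Because the within-stratum product of $h_{ps}$ and the inverse probability is constant, the smaller stratum's PSUs, where fewer households were interviewed, receive proportionally larger weights.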

The interviewed individual weights

The interviewed individual weights take into account the fact that only one individual per household in each of the five targeted age-sex groups is randomly selected for interviewing, as well as any individual non-response among the selected individuals due to either absence or refusal. These targeted age-sex groups are indexed by $g$ below.⁶⁷ To undertake this calculation, we define some additional terms:

$n_{ghps}$ is the number of individuals in group $g$ in household $h$ in PSU $p$ and stratum $s$

$n_g$ is the total number of individuals in the sample in group $g$

$N_g$ is the estimated total number of individuals in group $g$ in the population in mid-2009

66 To obtain the average, the inverse probabilities for each household are summed over each PSU, then across PSUs in each stratum, then across strata. Since the probabilities are constant within PSUs, they are first multiplied by $h_{ps}$, the number of interviewed households per PSU. Since the resulting figure is constant across PSUs within a stratum, its sum in a stratum is obtained by multiplying it by the number of PSUs, $PSU_s$, and the results are then summed across strata. Dividing this total by $h$, the number of sample households, gives the average.

67 The targeted age-sex groups ($g$) are any child aged 10-14, males 15-21, females 15-21, males 22-29 and females 22-29.

The weights are calculated as the inverse of the product of three probabilities: the probability that the household is selected, the probability that the individual is selected from among the eligible individuals in the household in the targeted age-sex group, and the probability of responding once selected. The probability of the household being selected is the PSU-level probability $\pi_{ps} = PSU_s\,h_{ps}/H_s$ given above. The probability of the individual being selected among the eligible individuals in the household is $1/n_{ghps}$. The final probability, the probability of response $\hat{r}_i$, is the predicted probability from a probit model that an individual in group $g$ listed in the roster who is eligible and selected for interviewing will actually respond to the individual questionnaire. The regressors used in the probit model are individual characteristics obtained from the roster, such as age-sex group, region of residence, urban slum/urban non-slum/rural residence, household wealth quintile and education level. The probit model estimates are shown in Table 3. They indicate that the lowest response rates are among males aged 18 to 29 residing in the governorates of Upper Egypt and belonging to the bottom three wealth quintiles. The coefficients shown in Table 3 were used to predict the individual-specific response probabilities $\hat{r}_i$. The interviewed individual weight for an individual $i$ of group $g$ in household $h$ in PSU $p$ and stratum $s$ is therefore given by:

$$iiw_i = n_g\,\frac{w_i^{*}}{\sum_{j \in g} w_j^{*}}, \qquad w_i^{*} = \frac{1}{\pi_{ps}\cdot\frac{1}{n_{ghps}}\cdot\hat{r}_i} = \frac{n_{ghps}\,H_s}{PSU_s\,h_{ps}\,\hat{r}_i}$$

where the sum runs over the sampled individuals in group $g$, so that, as with the household weight, the inverse probability is normalized by its average over the sample.

The interviewed individual expansion factor is given by:

$$iief_i = \frac{N_g}{n_g}\,iiw_i$$

The population of each group $g$ in mid-2009, $N_g$, was calculated by applying the age-sex distribution from the SYPE survey to the total projected population in mid-2009 as provided by CAPMAS.
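The interviewed individual weights can be sketched the same way. This Python example assumes the within-group normalization written above; the dictionary keys, the function name `interviewed_weights` and all inputs (household selection probabilities, numbers of eligible members, predicted response probabilities and group totals) are hypothetical.

```python
from collections import defaultdict


def interviewed_weights(people, N_g):
    """Interviewed individual weights and expansion factors (sketch).

    people: list of dicts with keys
        'g'      - targeted age-sex group
        'pi_ps'  - selection probability of the household's PSU
        'n_ghps' - eligible members of group g in the household
        'r_hat'  - predicted response probability from the probit model
    N_g: projected mid-2009 population of each group g.
    """
    # Inverse of the product of the three probabilities.
    raw = [1.0 / (p["pi_ps"] * (1.0 / p["n_ghps"]) * p["r_hat"]) for p in people]
    raw_sum, n_g = defaultdict(float), defaultdict(int)
    for p, w in zip(people, raw):
        raw_sum[p["g"]] += w
        n_g[p["g"]] += 1
    # Normalize within each group so that the weights average 1 ...
    iiw = [n_g[p["g"]] * w / raw_sum[p["g"]] for p, w in zip(people, raw)]
    # ... then scale each group up to its projected population N_g.
    iief = [N_g[p["g"]] / n_g[p["g"]] * w for p, w in zip(people, iiw)]
    return iiw, iief


# Made-up inputs for illustration only.
people = [
    {"g": "males 22-29", "pi_ps": 0.002, "n_ghps": 2, "r_hat": 0.60},
    {"g": "males 22-29", "pi_ps": 0.002, "n_ghps": 1, "r_hat": 0.75},
    {"g": "females 22-29", "pi_ps": 0.002, "n_ghps": 1, "r_hat": 0.95},
    {"g": "females 22-29", "pi_ps": 0.010, "n_ghps": 3, "r_hat": 0.97},
]
iiw, iief = interviewed_weights(people, {"males 22-29": 1000, "females 22-29": 900})
print([round(w, 3) for w in iiw], [round(e, 1) for e in iief])
```

In the first group, the man who shares his household with another eligible man and has the lower predicted response probability receives the larger weight (10/7 against 4/7), and the expansion factors in each group add up to the assumed group total.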

Table 3: Probit model estimates of the probability of response for eligible individuals

Explanatory variable         Coef.  Std. Err.  z-score   P>|z|
Female 10-14                 0.443      0.117    3.800   0.000
Male 15-17                  -0.514      0.093   -5.510   0.000
Female 15-17                 0.172      0.118    1.450   0.146
Male 18-21                  -1.157      0.083  -13.890   0.000
Female 18-21                -0.113      0.101   -1.120   0.264
Male 22-29                  -1.112      0.081  -13.790   0.000
Female 22-29                 0.190      0.097    1.960   0.050
Lower urban                 -0.433      0.117   -3.700   0.000
Lower rural                 -0.560      0.085   -6.560   0.000
Lower slum                  -0.696      0.105   -6.620   0.000
Upper urban                 -0.855      0.093   -9.240   0.000
Upper rural                 -1.071      0.086  -12.460   0.000
Upper slum                  -1.327      0.116  -11.450   0.000
Frontier urban              -0.430      0.129   -3.340   0.001
Frontier rural              -0.295      0.145   -2.040   0.041
2nd wealth quintile          0.075      0.053    1.420   0.155
3rd wealth quintile          0.065      0.054    1.210   0.227
4th wealth quintile          0.276      0.064    4.310   0.000
Highest wealth quintile      0.262      0.079    3.340   0.001
Inc. primary educ.           0.050      0.095    0.530   0.599
Primary educ.               -0.122      0.084   -1.460   0.144
Preparatory educ.            0.067      0.091    0.730   0.466
Secondary educ.             -0.131      0.078   -1.680   0.094
Interm. diploma             -0.062      0.132   -0.470   0.636
University education        -0.154      0.088   -1.750   0.080
Intercept                    2.727      0.133   20.450   0.000

Note: The reference category is an illiterate male aged 10-14 residing in an urban governorate whose household is in the lowest wealth quintile.
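The individual-specific response probabilities are probit predictions, that is, the standard normal CDF evaluated at the linear index formed from the Table 3 coefficients. A minimal Python sketch, using the coefficients exactly as reported in Table 3; the dictionary keys and the function `response_probability` are hypothetical names, and the example person is made up.

```python
import math

# Probit coefficients from Table 3. Reference category: illiterate male aged
# 10-14 in an urban governorate, household in the lowest wealth quintile.
COEF = {
    "intercept": 2.727,
    "female 10-14": 0.443, "male 15-17": -0.514, "female 15-17": 0.172,
    "male 18-21": -1.157, "female 18-21": -0.113, "male 22-29": -1.112,
    "female 22-29": 0.190,
    "lower urban": -0.433, "lower rural": -0.560, "lower slum": -0.696,
    "upper urban": -0.855, "upper rural": -1.071, "upper slum": -1.327,
    "frontier urban": -0.430, "frontier rural": -0.295,
    "wealth q2": 0.075, "wealth q3": 0.065, "wealth q4": 0.276, "wealth q5": 0.262,
    "inc. primary": 0.050, "primary": -0.122, "preparatory": 0.067,
    "secondary": -0.131, "interm. diploma": -0.062, "university": -0.154,
}


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def response_probability(dummies: list[str]) -> float:
    """Predicted response probability for a person whose listed dummy
    variables equal 1 (all other categories at the reference level)."""
    index = COEF["intercept"] + sum(COEF[d] for d in dummies)
    return std_normal_cdf(index)


r_ref = response_probability([])
r_young_man = response_probability(["male 22-29", "upper rural", "secondary"])
print(f"{r_ref:.3f} {r_young_man:.3f}")
```

For the reference person the index is just the intercept, 2.727, giving a predicted response probability above 0.99. A man aged 22-29 with secondary education living in rural Upper Egypt in a lowest-quintile household has an index of 2.727 - 1.112 - 1.071 - 0.131 = 0.413 and a predicted probability of about 0.66, so the response adjustment alone raises his interviewed weight by a factor of roughly 1/0.66 ≈ 1.5.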
