
Grid sampling for a mixed-mode human survey and adjustment for non-response

Seppo Laaksonen¹
¹University of , e-mail: [email protected]

Acknowledgements: The study is a methodological part of an ongoing project initiated by Professors Mari Vaattovaara and Matti Kortteinen of the University of Helsinki. I also thank Henrik Lönnqvist and Teemu Kempainen, who are working on the project as well.

NTTS 2013 _ Seppo Laaksonen

Table 1. Statistics of grids where one or more adults are living

Type of area                                     Number of grids   Population of 25-74 years
Stratum of ’poor’ grids                                     1058                      232416
Stratum of ’rich’ grids                                     1187                       70382
Municipality strata without
  confidentiality exclusion                                 5020                      390142
Excluded due to confidentiality from the
  grid-based sample but not from the
  municipality sampling                                     1616                        6785
All                                                         8881                      699725

Figure 1. Grids for ‘rich’ people (red) vs. ‘poor’ people (blue) in the municipalities of the survey. The remaining grids lie between these two groups or are empty of people.

Legend: median income < 32092 (‘poor’ grids); median income > 73206 (‘rich’ grids)

Table 2. Distribution of the gross sample to strata. The group ‘Others’ in the above scheme is equal to the municipality gross sample size.

Municipality                              Poor grids   Rich grids   Municipality   Total   25-74 year
                                          ‘poor, h’    ‘rich, h’    ‘all, h’               population
Helsinki, most urbanised southern area         110           46          1000       1156       27465
Helsinki, most urbanised northern area        1142            8          1000       2150       40206
Helsinki, suburb                              2501         1324          2500       6325      147098
[...] and [...]                                546         3127          2000       5673      131840
Hyvinkää                                       248           64           600        912       24944
Järvenpää                                      115           38           600        753       21717
[...]                                          124           48           600        772       18874
[...]                                           89          173           600        862       20065
[...]                                            0            0          1000       1000       57059
[...]                                            0            0           600        600       22613
Mäntsälä and [...]                              49           22           600        671       13850
Nurmijärvi                                      85          120           600        805       21924
[...]                                           48          134           600        782       10269
[...]                                          118          201           600        919       20948
[...]                                          746          574          1500       2820      104930
[...]                                           81          121           600        802       15923
All                                           6000         6000         15000      27000      699725

Inclusion probabilities

Single municipality strata

$$\pi_k = \frac{n_h}{N_h}$$

Strata with grid sampling and thus with post-strata, shown for ’rich’ grids (and similarly for ’poor’ grids and ’all’ grids)

$$\pi_k = \frac{n_{rich,h}}{N_{rich,h}}$$

in which

$$N_{all,h} = N_h - (N_{poor,h} + N_{rich,h})$$
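The inclusion probabilities above can be illustrated with a minimal sketch. The function names are illustrative only. The first example uses the Table 2 row with no grid sample (1000 sampled out of 57059 people aged 25-74); the post-stratum sizes in the second example are hypothetical, because the source does not report them by municipality.

```python
def inclusion_probability(n, N):
    """pi_k = n / N for simple random sampling within a stratum or post-stratum."""
    if not 0 < n <= N:
        raise ValueError("sample size must be positive and at most the population size")
    return n / N


def remainder_population(N_h, N_poor_h, N_rich_h):
    """N_all,h = N_h - (N_poor,h + N_rich,h): people outside 'poor' and 'rich' grids."""
    return N_h - (N_poor_h + N_rich_h)


# Single-municipality stratum from Table 2: n_h = 1000, N_h = 57059.
pi_single = inclusion_probability(1000, 57059)
design_weight = 1 / pi_single  # gross design weight, about 57.1

# Hypothetical post-stratum sizes (NOT from the source), for illustration only.
N_all = remainder_population(N_h=20000, N_poor_h=3000, N_rich_h=2000)
pi_all = inclusion_probability(600, N_all)

print(round(design_weight, 1), N_all, round(pi_all, 4))
```

The design weight is the inverse of the inclusion probability, so every sampled person in a stratum carries the same gross weight.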

Table 3. Some statistics of the gross/net sample design weights

Statistics      Grid part   Grid part   Municipality part   Municipality part
                Gross       Net         Gross               Net
Observations      12000        4387        15000                5231
Population       302357      302357       397368              397368
Mean               25.8        70.6         27.1                77.8
Minimum             8.3        18.2         13.1                39.0
Maximum            45.6       164.2         57.1               167.8
CV (%)             54.6        61.4         36.4                39.9
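Summary statistics of this kind can be computed from a list of weights as in the sketch below. It assumes the coefficient of variation is the population standard deviation divided by the mean, in per cent; the source does not say which variant was used, and the example weights are hypothetical.

```python
from statistics import mean, pstdev


def weight_summary(weights):
    """Return count, sum, mean, minimum, maximum and CV (%) of a list of weights."""
    avg = mean(weights)
    return {
        "observations": len(weights),
        "population": sum(weights),  # weights sum to the estimated population size
        "mean": avg,
        "minimum": min(weights),
        "maximum": max(weights),
        "cv_percent": 100 * pstdev(weights) / avg,  # assumption: population std dev
    }


# Hypothetical weights, for illustration only.
summary = weight_summary([10.0, 20.0, 30.0])
print(summary)
```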

Our strategy for the weight adjustments is as follows:

(i) We take the initial weights $w_k$ and divide them by the estimated response probabilities (also called response propensities) of each respondent, obtained from the probit model and denoted by $p_k$.

(ii) Before going further, it is good to check that the probabilities $p_k$ are realistic, for instance that they are not too small. Naturally, all probabilities are below one.

(iii) Since the sums of the weights from step (i) do not match the known population statistics by strata h or by post-strata ‘rich, h’, ‘poor, h’ or ‘all, h’, they should be calibrated so that their sums are equal to the sums of the initial weights in each stratum. This is done by multiplying the weights from step (i) by the ratio

$$q_h = \frac{\sum_{k \in h} w_k}{\sum_{k \in h} w_k / p_k}$$

in which h may refer to post-strata as well.

(iv) It is also good to check these weights against basic statistics, for example as presented in Table 3. If the weights are not plausible, the model should be revised.

Table 4. Outcomes from the response propensity modelling by probit regression

Auxiliary variable                 Category                             Probit     Standard   p-value
                                                                        estimate   error
Type of grid                       Intermediate                         -0.064     0.006      <.0001
(ref=Rich)                         Poor                                 -0.148     0.006      <.0001
Gender                             Male                                 -0.292     0.003      <.0001
(ref=Female)                       Female                                0.000     0          .
Age group                          25-34                                -0.618     0.006      <.0001
(ref=65-74 years)                  35-44                                -0.575     0.006      <.0001
                                   45-54                                -0.439     0.006      <.0001
                                   55-64                                -0.161     0.005      <.0001
Mother tongue                      Finnish                              -0.009     0.007      0.208 (not significant)
(ref=Swedish)                      Swedish                               0.000     0          .
Number of people                   1                                     0.179     0.013      <.0001
(ref=6+)                           2                                     0.359     0.013      <.0001
                                   3                                     0.272     0.013      <.0001
                                   4                                     0.289     0.013      <.0001
                                   5                                     0.216     0.014      <.0001
Moved to the current house         Before 1995                           0.013     0.004      0.0008
(ref=After 2006)                   Between 1995-2006                    -0.049     0.005      <.0001
                                   After 2006                            0.000     0          .
Current and previous living area   Moved to the southern                 0.019     0.007      0.0113
(ref=Moved within the same         Moved within the southern Finland     0.032     0.004      <.0001
zip code area)
Labour market status               Unemployed                           -0.051     0.008      <.0001
(ref=Not unemployed)
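To show how probit estimates translate into response propensities, the sketch below sums a few of the Table 4 coefficients and applies the standard normal distribution function. It is a simplification under loud assumptions: the intercept is not reported in the source, so `INTERCEPT = 0.0` is purely hypothetical, only three of the auxiliary variables are included, and interactions are ignored. The resulting numbers are therefore illustrative, not estimates from the study.

```python
from statistics import NormalDist

# Main-effect estimates from Table 4; reference categories have coefficient 0.
COEF = {
    ("grid", "Rich"): 0.0,
    ("grid", "Intermediate"): -0.064,
    ("grid", "Poor"): -0.148,
    ("gender", "Female"): 0.0,
    ("gender", "Male"): -0.292,
    ("age", "65-74"): 0.0,
    ("age", "55-64"): -0.161,
    ("age", "45-54"): -0.439,
    ("age", "35-44"): -0.575,
    ("age", "25-34"): -0.618,
}

INTERCEPT = 0.0  # hypothetical: the source does not report the intercept


def response_propensity(categories, intercept=INTERCEPT):
    """p_k = Phi(intercept + sum of coefficients), Phi = standard normal CDF."""
    linear_predictor = intercept + sum(COEF[c] for c in categories)
    return NormalDist().cdf(linear_predictor)


reference = [("grid", "Rich"), ("gender", "Female"), ("age", "65-74")]
young_male_poor = [("grid", "Poor"), ("gender", "Male"), ("age", "25-34")]
print(response_propensity(reference), response_propensity(young_male_poor))
```

Because every listed estimate is negative relative to its reference category, any profile other than the reference one gets a lower propensity in this simplified setting.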

Interaction by gender in the probit model

Females participate better in all age groups. The relationship with age is fairly linear from 35 years onwards.

[Figure: probit estimates by age group (25-34, 35-44, 45-54, 55-64, 65-74), shown separately for males and females; vertical axis from 0 to -1.2]

NTTS 2013 _ Seppo Laaksonen 9 Probit estimates by income (earning plus capital) Fairly linear relationship Not that we could not get education. This replaces it.

[Figure: probit estimates by income group (Highest, 2nd highest, 3rd highest, 3rd lowest, 2nd lowest, Lowest); vertical axis from 0.4 to -0.4]

NTTS 2013 _ Seppo Laaksonen 10 Probit estimates by current and previous house size If removed to a larger house, the response propensity is higher. If a smaller, not so motivated to participate

[Figure: probit estimates by house size category (Current house smaller, Current house size about as earlier, Current house larger); vertical axis from 0 to 0.05]

NTTS 2013 _ Seppo Laaksonen 11 100

80

60 Web Paper

40

20 Propensity, % 0 0 20 40 60 80

Figure 2. Example of the cumulative response propensities for the respondents via ‘web’ and via ‘paper’, respectively. We see that the propensities are lower for web respondents. A web option is nevertheless good for survey participation. More effort is required to motivate people to use the web.

NTTS 2013 _ Seppo Laaksonen 12 Adjustment leads to a new weight with a slightly higher variation as expected

Statistics      Grid part   Grid part   Municipality part   Municipality part   Adjusted weights for all
                Gross       Net         Gross               Net                 Net
Observations      12000        4387        15000                5231                 9618
Population       302357      302357       397368              397368               699725
Mean               25.8        70.6         27.1                77.8                 72.8
Minimum             8.3        18.2         13.1                39.0                 12.4
Maximum            45.6       164.2         57.1               167.8                754.3
CV (%)             54.6        61.4         36.4                39.9                 67.8

NTTS 2013 _ Seppo Laaksonen 13 Results on people’s opinion on their living area by the type of grid; the means and standard errors in parenthesis. Indicators are scaled so that 0 = lowest, 100= highest.

Indicator                        Intermediate grids   Poor grids     Rich grids
General assessment               74.6 (0.48)          62.3 (0.55)    83.3 (0.44)
Quality of environment           74.5 (0.31)          65.4 (0.37)    79.6 (0.32)
Quality of housing conditions    72.7 (0.34)          65.4 (0.36)    77.4 (0.33)
Quality of services              68.9 (0.37)          73.2 (0.37)    68.8 (0.42)
Assessment of living area        74.8 (0.35)          67.6 (0.40)    80.1 (0.31)
Amount of problems               44.2 (0.60)          66.7 (0.58)    34.9 (0.58)

NTTS 2013 _ Seppo Laaksonen 14 Conclusion:

Neither administrative areas nor postal zip code areas are ideal for designing and analysing survey data. Grids offer a flexible tool since they can in principle be of any size, but confidentiality should be taken carefully into account. Results based on small grids are also more interesting compared with those of ordinary areas. Basically, people living in a small grid know each other, but this is not true of administrative and similar areas.
