G Detection of housing market segments:

(1) SOM analysis

Enlarged feature map illustrating price levels in Helsinki subareas (dark colour = cheap area; light colour = expensive area).

The dwelling format (dark colour = single-family houses; light colour = multi-storey houses).

The number of rooms (dark colour = 1-2 rooms; light colour = 3+ rooms).

Age (dark colour = new buildings; light colour = old buildings).

Status (dark colour = low status; light colour = high status).

Negative social externalities (dark colour = low levels; light colour = high levels).

The ‘urban’ indicator (dark colour = most urban areas; light colour = least urban areas).

Commercial services (dark colour = low levels of services; light colour = high levels of services).

Public services (dark colour = low levels of services; light colour = high levels of services).

The ‘open space’ indicator (dark colour = least undeveloped land in the area; light colour = most undeveloped land in the area).

(2) LVQ analysis

| No. of labels and criterion | Exact definition of labelling criterion | Classification accuracy, validation sample (training sample in brackets) | Success of supervised training with the LVQ (training 10000, 20000, 30000 etc. iterations; the best map before over-training occurs; test sample) |
|---|---|---|---|
| 2, open space indicator | Amount of undeveloped land in the vicinity within a 2 km range: 0-4.99 km² / 5.00+ km² | 99.32% (99.37%) | - (what is the added value with trying to improve this accuracy?) |
| 2, location in relation to CBD | Area urbanisation indicator (good proxy for accessibility): <-2 / >-2 | 97.98% (98.30%) | - |
| 2, age | 0-49 yrs / 50+ yrs | 96.71% (96.50%) | - |
| 2, location combined with factors | A posteriori clustering based on the organised maps: certain suburbs / the rest of the data | 95.64% (95.17%) | - |
| 2, public services | Nr of public services in the area: 0-49 / 50+ | 94.95% (95.35%) | - |
| 2, commercial services | Nr of commercial services in the area: 0-39 / 40+ | 91.38% (91.76%) | - |
| 2, location | Municipality: Helsinki / else | 88.50% (90.33%) | Improvement => 91.58%, but no clear overtraining |

(Continued)

| No. of labels and criterion | Exact definition of labelling criterion | Classification accuracy, validation sample (training sample in brackets) | Success of supervised training with the LVQ (training 10000, 20000, 30000 etc. iterations; the best map before over-training occurs; test sample) |
|---|---|---|---|
| 2, house type | Dwelling format: multi-storey apartment / other | 88.25% (89.26%) | Improvement => 93.11%, but no clear overtraining |
| 2, negative social externalities | Area sos.ext -indicator: positive / negative | 87.51% (88.21%) | Improvement => 91.95% |
| 3, location combined with other factors | A posteriori clustering based on the organised maps: 2 separate groups and rest of the data | 87.47% (93.75%) | Marginal improvement => 87.62%, but no clear overtraining |
| 2, price per sq.m. | FIM 7369 or less / FIM 7370+ | 87.32% (86.82%) | Improvement => 4.22%, but no clear overtraining |
| 2, status | Area status -indicator: positive / negative | 86.43% (88.05%) | Improvement => 5.14%, but no clear overtraining |
| 3 (4), location | Municipality: Helsinki / Espoo / Vantaa (/ Kauniainen minor segment) | 85.39% (87.43%) | Improvement => 1.86%, but no clear overtraining |
| 3, price per sq.m. | FIM 4869 or less / FIM 4870 - 9869 / FIM 9870+ | 81.54% (83.20%) | Improvement => 90.30% |
| 3, age | 0-24 yrs / 25-74 yrs / 75+ yrs | 81.27% (83.61%) | Improvement => 1.67%, but no clear overtraining |
| 2, size (rooms) | 1-2 rooms / 3+ rooms | 70.47% (72.08%) | Improvement => 88.21% |
| 3, size (rooms) | 1 room / 2 rooms / 3+ rooms | 54.92% (57.96%) | Improvement => 73.65% |
| ~400, micro-location | Subareas | 30.56% (35.55%) | Too difficult |
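The accuracy columns can be read with the following minimal sketch in mind. It assumes one common way of scoring a labelled map: each map unit receives the majority label of the training observations it wins, and an observation counts as correctly classified when its best-matching unit carries its true label. The appendix does not spell out the exact procedure, so the function names (`best_matching_unit`, `label_units`, `accuracy`) and the toy data are purely illustrative.

```python
from collections import Counter

def best_matching_unit(codebook, x):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )

def label_units(codebook, samples, labels):
    """Give each map unit the majority label of the samples it wins."""
    votes = {}
    for x, y in zip(samples, labels):
        votes.setdefault(best_matching_unit(codebook, x), Counter())[y] += 1
    return {unit: counts.most_common(1)[0][0] for unit, counts in votes.items()}

def accuracy(codebook, unit_labels, samples, labels):
    """Share of samples whose best-matching unit carries their true label."""
    hits = sum(
        unit_labels.get(best_matching_unit(codebook, x)) == y
        for x, y in zip(samples, labels)
    )
    return hits / len(samples)

# Hypothetical toy data: two features, two age classes, a four-unit "map".
codebook = [[0.0, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8]]
train_x = [[0.1, 0.0], [0.2, 0.2], [0.9, 0.9], [1.0, 0.7], [0.8, 1.0]]
train_y = ["old", "old", "new", "new", "new"]
valid_x = [[0.0, 0.1], [1.0, 0.9], [0.3, 0.1]]
valid_y = ["old", "new", "new"]

unit_labels = label_units(codebook, train_x, train_y)
train_acc = accuracy(codebook, unit_labels, train_x, train_y)
valid_acc = accuracy(codebook, unit_labels, valid_x, valid_y)
print(f"validation {valid_acc:.2%} (training {train_acc:.2%})")
```

The printed line follows the layout of the accuracy column: validation accuracy first, training accuracy in brackets.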

Some comments:

The map size is 12*8. A bigger map (e.g. 24*16) would give a better classification accuracy. For example, compare the following:

- with micro-location, a 12*8 map gives 30.56%, and a 24*16 map gives 51.31%
- with macro-location, a 12*8 map gives 85.39%, and a 24*16 map gives 93.36%.
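The size of these gains can be worked out directly from the quoted figures. The short sketch below only repeats the arithmetic; the dictionary layout and variable names are illustrative.

```python
# Accuracies quoted in the text (validation sample, per cent).
results = {
    "micro-location": {"12*8": 30.56, "24*16": 51.31},
    "macro-location": {"12*8": 85.39, "24*16": 93.36},
}

# Gain in percentage points when moving from the 12*8 map to the 24*16 map.
gains = {
    name: round(acc["24*16"] - acc["12*8"], 2)
    for name, acc in results.items()
}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} percentage points")
```

The gain is about 20.75 percentage points for micro-location and about 7.97 for macro-location, so the larger map helps most where the number of classes is largest.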

In general, defining more classes gives a lower classification accuracy if the criterion is the same.

The classification is based either on an a priori chosen criterion or on an a posteriori chosen clustering. In general, labels based on the a posteriori clustering give a better accuracy than the a priori chosen labels.

A dichotomous classification in which one class contains far more observations than the other (tenfold) was excluded, even if its result was superior.
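One way to see why such a split is uninformative: with a tenfold imbalance, always predicting the larger class already yields about 91% accuracy. The sketch below illustrates this reasoning; the helper names, the `max_ratio` parameter and the class sizes are hypothetical and not taken from the analysis.

```python
def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the largest class."""
    return max(class_counts) / sum(class_counts)

def is_too_imbalanced(class_counts, max_ratio=10):
    """True when the largest class is at least max_ratio times the smallest."""
    return max(class_counts) >= max_ratio * min(class_counts)

counts = [1000, 100]  # hypothetical class sizes with a tenfold imbalance
print(is_too_imbalanced(counts))
print(f"majority baseline: {majority_baseline(counts):.1%}")
```

A high accuracy on such labels therefore says little about segmentation, which is consistent with leaving these classifications out.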

On balance:

The very high levels of classification accuracy imply segmentation within the data set. Perhaps surprisingly, the best classification result is obtained with the dichotomous ‘open space’ indicator. As expected, the urbanisation indicator, which proxies CBD accessibility, gives a good result. House type also matters, as expected. The age of the building serves as a proxy for location (and possibly also has an independent effect as a proxy for aesthetic values attached to the architecture/design?), and is as such very important. Both types of services are important, and the macro-location matters too: either the municipality (Helsinki, Vantaa or Espoo) or a more specific grouping based on the combined effect of age, price and number of rooms, which separates certain suburbs located far away from the centre into their own groups. These are two groups comprising dwellings in northern, eastern and north-western Helsinki, Espoo and Vantaa¹, in areas with poor services and low density, located relatively far away from the centre of Helsinki. These dwellings have an average or low price per sq.m., are situated in average or new building stock, and comprise three or more rooms.

The conclusion is that a segmentation based on house type, location and other factors (age?) seems more appropriate in Helsinki than a segmentation based on price levels.

Supervised training was also quite successful in most labelling solutions. The accuracy percentages improved in most cases, but clear overtraining did not occur, perhaps due to the already high levels of accuracy. The two best classification results for the unsupervised map already had a classification accuracy above 95%. Therefore, supervised training would probably not have added any value in those cases.
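The idea of training in stages and keeping the best map before overtraining can be sketched as follows. This is only an illustration under stated assumptions: it uses the basic LVQ1 update rule with a linearly decaying learning rate, whereas the appendix does not say which LVQ variant or schedule was used. The function `lvq1_with_checkpoints`, its parameters and the toy data are hypothetical, and the step counts are scaled down from the 10000, 20000, 30000 iterations mentioned in the table.

```python
import random

def nearest(codebook, x):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )

def accuracy(codebook, unit_labels, samples, labels):
    """Share of samples whose nearest codebook vector carries their label."""
    hits = sum(unit_labels[nearest(codebook, x)] == y for x, y in zip(samples, labels))
    return hits / len(samples)

def lvq1_with_checkpoints(codebook, unit_labels, train, test,
                          steps, check_every, alpha0=0.05, seed=0):
    """LVQ1 fine-tuning that keeps the checkpoint with the best test accuracy."""
    rng = random.Random(seed)
    codebook = [list(c) for c in codebook]  # work on a copy
    best_acc = accuracy(codebook, unit_labels, *test)
    best_codebook = [list(c) for c in codebook]
    history = [(0, best_acc)]
    xs, ys = train
    for t in range(1, steps + 1):
        alpha = alpha0 * (1 - (t - 1) / steps)  # linearly decaying learning rate
        k = rng.randrange(len(xs))
        x, y = xs[k], ys[k]
        i = nearest(codebook, x)
        # Pull the winner towards x if its label matches, push it away otherwise.
        sign = 1.0 if unit_labels[i] == y else -1.0
        codebook[i] = [c + sign * alpha * (v - c) for c, v in zip(codebook[i], x)]
        if t % check_every == 0:
            acc = accuracy(codebook, unit_labels, *test)
            history.append((t, acc))
            if acc > best_acc:  # a later drop in accuracy would hint at overtraining
                best_acc, best_codebook = acc, [list(c) for c in codebook]
    return best_codebook, best_acc, history

# Hypothetical toy data: two classes scattered around (0, 0) and (1, 1).
data_rng = random.Random(1)

def cloud(cx, cy, n):
    return [[cx + data_rng.uniform(-0.4, 0.4), cy + data_rng.uniform(-0.4, 0.4)]
            for _ in range(n)]

train = (cloud(0, 0, 40) + cloud(1, 1, 40), ["a"] * 40 + ["b"] * 40)
test = (cloud(0, 0, 20) + cloud(1, 1, 20), ["a"] * 20 + ["b"] * 20)
codebook0 = [[0.4, 0.4], [0.6, 0.6]]
unit_labels = ["a", "b"]

best_codebook, best_acc, history = lvq1_with_checkpoints(
    codebook0, unit_labels, train, test, steps=300, check_every=100
)
print(history)
```

When the starting accuracy is already close to its ceiling, as with the two best labellings here, the checkpoint history has little room to improve, which matches the remark that supervised training adds little in those cases.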

1 Group (I) Vantaa: Länsimäki, Koivukylä, Rajakylä, Simonkylä, Hämevaara-Hämeenkylä; Espoo: Bemböle. Group (II) Helsinki/North: Paloheinä-Torpparinmäki, Kumpula-Toukola; Helsinki/North-West: Malminkartano; Helsinki-East: Puotila; Vantaa: Havukoski-Rekola; Espoo: , - Westend, , Kilo-, Nöykkiö-Latokaski, Viherlaakso-Lippajärvi.
