Analysis of the Main Determinants of Price in Housing Buildings in

Parishes of Arroios, Beato, Marvila, Olivais, Penha de França, Santa Clara and São Vicente

Carolina Pais Correia

Instituto Superior Técnico

May 2019

Abstract

The purpose of this research is to identify the determinants of the sale price of residential properties in the city of Lisbon, particularly in the parishes of Arroios, Beato, Marvila, Olivais, Penha de França, Santa Clara and São Vicente. The work is based on a database of real estate sold in the municipality of Lisbon from 2008 to 2017, with a sample of 1986 properties sold in the parishes under study. For each parish, two property price prediction models were developed using two statistical tools: multiple linear regression and the generalized linear model. The most recurrent predictive variables were the year of construction and the state of conservation of the property, and these variables tend to have a high weight. The quarter, the existence of an extra floor in the property and the existence of charges are not predictive variables in any of the parishes. The area, which normally has a great weight in the definition of real estate prices, is predictive in only three parishes and presents a very low weight. This is because the dependent variable, the price of real estate, is given as a price per square meter, which removes most of the contribution of the area to the property value. Comparing the two models of each parish, the generalized linear model always has the better fit. The parish whose model has the best fit is Beato, and the one with the worst fit is São Vicente.

Keywords: Housing market, price determinants, hedonic model

1. Introduction

The price of residential real estate is currently of great importance for various sectors of society, namely governments, construction companies, real estate agents and the population in general. This is because the real estate market and changes in sales values can affect the growth of the economy, inflation and the banking sector, as well as social equity and accessibility. Given its importance and its instability, the housing market has been extensively studied in recent years from different perspectives. In this work the focus is on price formation, exploring predictive models of real estate prices.

The hedonic price method [1] allows modelling the price of a product based on quantitative attributes associated with it, typically through a traditional linear regression. This model was later applied to the real estate market [2], with the integration of physical, environmental and accessibility characteristics into econometric models that sought to justify differences in real estate values.

Over the last decades the hedonic price model has been studied using different analysis tools and with different objectives, which resulted in the use of a wide range of explanatory variables. Hedonic models have been used to model housing markets and housing price indices [3] [4] and to identify housing submarkets [5]. Other authors opted for different approaches, focusing their studies on the effect of specific determinants of the property price, such as proximity to railway stations [6], the theft index in residential areas [7], or the distance to the city centre [8].

To better understand the expected behaviour of the variables and which statistical tools are most used, a few studies carried out in scopes similar to this work were analysed. Table 1 marks the variables and the statistical tools used in each of the studies. For each variable used, its sign and its magnitude in the model are shown in parentheses. The magnitude is indicated by how many times the sign is repeated ([+][-]: low weight; [+++][---]: high weight).

______

Table 1 - Variables (signal and magnitude) and Statistic Tools observed in other studies

Variables considered: Age, Area, Floor, Garage, Typology, State of Conservation, Distance to the City Centre, Accessibility, Transportation, Storage Room, No. Bathrooms, Air Conditioning

Reference   Sign and magnitude of the variables used (in column order)   Statistic Tools
[9]         (+) (+++) (+) (+) (+) (-) (++) (++) (++)                     QR
[10]        (+++) (-) (+) (++) (++)                                      LR, GWR
[11]        (--) (+++) (--)                                              SLR
[7]         (--) (+++) (+)                                               LR, RR
[12]        (+) (+) (-) (++)                                             MR
[13]        (--)                                                         GWR
[14]        (---) (+++) (---)                                            LR

QR - Quantile Regression; LR - Linear Regression; GWR - Geographically Weighted Regression; SLR - Semi-Log Regression; RR - Robust Regression (control of outliers); MR - Multiple Regression

______

The most used explanatory variable to estimate housing prices is the area, and it is also the one that generally presents the greatest weight in the final sale price of the property. This weight is positive, which means that the price of the property tends to increase with its area. The age of the property is also relevant in the definition of the price of a property, so it appears in many research efforts. The impact of this variable tends to have a negative sign, since real estate deteriorates as its age increases. The sign of each variable may present some variation, but its strength is even more inconstant: a variable might have a high impact in some studies and be less relevant in others. An analysis of models used in several studies concluded that studies often disagreed about the sign and magnitude of a characteristic's weight in the final price [15]. This reveals that the determinants underlying real estate prices depend on the particular context, which may vary in time and space.

The methodologies based on simple and multiple linear regressions are the ones with the greatest presence in the literature, but in recent years there has been a trend towards the exploration of increasingly complex simulation models of the behaviour of the real estate market. In the older studies, linear regression was used because it was the tool available. Still, even with the development of new modelling tools, in particular artificial intelligence tools such as artificial neural networks, linear regressions are used to establish a baseline to benchmark the performance of the models developed with the more sophisticated tools [16]; [17]. Another evolution in terms of modelling tools is the use of approaches that allow the incorporation into the hedonic models of spatial information related to the environment in which the property is located. With the increasing use of GIS platforms to integrate spatial with non-spatial information, tools such as the Geographically Weighted Regression (GWR), or the Spatial Autocorrelation Model (SAC), among others, have been used to address real estate prices.

2. Case Study and Methodology

The parishes of Olivais, Beato, Marvila and Santa Clara considered in the present work are classified by the real estate market in Lisbon as the peripheral zones. These are the zones in the eastern periphery of Lisbon (excluding the parish of Parque das Nações), further away from the centre of the city. They are areas marked by an industrial past, of which there are still several vestiges, and with some presence of social housing. Despite being characterized by aged and degraded buildings, a great development is expected in the coming years, through the appearance of new leisure spaces, construction of new enterprises and reformulation of thoroughfares. These areas have an extremely privileged location in terms of views to the Tejo River.

The parishes of Arroios, Penha de França and São Vicente are designated as other historical zones. They are located between the peripheral areas and the historic centre of the city, and are also very attractive for their proximity to the business centre of Lisbon. Being older zones, they possess more historical patrimony than the peripheral zones, which is why they have more points of interest. Closer to the city centre, they benefit from better access and a greater offer of means of transport. Unlike the outlying areas, they are not limited to residential areas, being abundantly served by local commerce, jobs and miscellaneous services, so they are busier during the day and have a larger number of inhabitants, thus having a high population density. They are also older areas, which increases the number of degraded buildings, but implies that, in general, many of the buildings have historical value.

This study is based on a sample of 8188 residential properties in the municipality of Lisbon, sold in the period from 2008 to 2017. From the database, only data referring to the study areas are used, for a total of 1986 properties. The database that supports this study presents information regarding the price of the properties and their characteristics. These characteristics constitute the set of variables to be incorporated in the models: i) Semester of Sale (SEM); ii) Quarter of Sale (TRI); iii) Parish (FRE); iv) Construction Year (ACON); v) Building Floors (PEDIF); vi) Extra Building Floors (PEDIF_extra); vii) Typology (TIP); viii) Floor (PISO); ix) Extra Floor (PISO_extra); x) Area (AREA); xi) State of Conservation of Building (CEDIF); xii) State of Conservation of Property (CIM); xiii) Abandoned (ABAND); xiv) Onus (ONUS); and xv) Buyer (COMP). Extra Building Floors reflects the case of buildings with a semi-attic on the last floor. Extra Floor reflects the cases of duplex or triplex configuration of the apartments. The dependent variable (PVNA) is the selling price of each property reduced to the area unit and normalized to the average price of the year in which the transaction occurred. By doing so, the influence of time on the housing prices is removed.

The methodological approach of this work is to test different models that simulate the behaviour underlying the formation of the price of a property based on its characteristics. For this, two tools were tested: i) Linear Regression; and ii) Generalized Linear Modelling, the second of which integrates the interaction of variables and non-linearity. Firstly, the descriptive statistics of the database and the correlation matrix of the variables under study were analysed. Then, through a first linear regression, the statistically significant predictive variables were obtained for each parish and, with the improvement of the categories of the variables and of the models and with the removal of the outliers identified, a linear regression model was obtained for each parish. Subsequently, several generalized linear models were tested, evaluating different combinations of variables in interaction. In the parishes that had AREA as an explanatory variable, it was included as non-linear through the application of a power. Finally, the best models of each type of tool per parish were selected and compared.
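The construction of the PVNA variable described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the field names `price`, `area` and `year` are hypothetical, and the yearly average is assumed to be the mean price per square metre of all sales in that year.

```python
from collections import defaultdict


def compute_pvna(sales):
    """Return one normalised price per unit area (PVNA) for each sale.

    Each sale is a dict with the hypothetical keys 'price', 'area', 'year'.
    """
    # Price per square metre of each sale.
    unit_prices = [s["price"] / s["area"] for s in sales]

    # Average price per square metre of each transaction year (assumption).
    by_year = defaultdict(list)
    for sale, unit_price in zip(sales, unit_prices):
        by_year[sale["year"]].append(unit_price)
    year_mean = {year: sum(v) / len(v) for year, v in by_year.items()}

    # Normalise each unit price by the average of its own year.
    return [u / year_mean[s["year"]] for s, u in zip(sales, unit_prices)]


# Made-up sales, for illustration only.
sales = [
    {"price": 100000.0, "area": 100.0, "year": 2008},
    {"price": 150000.0, "area": 50.0, "year": 2008},
    {"price": 240000.0, "area": 60.0, "year": 2017},
]
pvna = compute_pvna(sales)
```

With this construction a value of 1 means the property sold at the average price per square metre of its year, which is why PVNA is dimensionless.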

3. Results and Discussion

3.1. Descriptive Statistics and Correlation of Variables

The results presented in this work are divided by parish. This option was chosen because the average price is statistically different between parishes. This was verified through an ANOVA supported by several statistical tests, such as the Tukey HSD test.

From the study of the statistical behaviour of the PVNA variable, it is observable that the Peripheral Zones have lower average values than the Other Historical Zones, with the Olivais, Marvila and Beato areas having a similar average price. On the other hand, the Other Historical Zones have very different average values, with Arroios being the parish with the highest average values.

Regarding the independent variables, it should be noted that the variables SEM, PEDIF_extra, PISO_extra, ABAND, ONUS and COMP behave similarly in all parishes, while the behaviour of the remaining variables changes from zone to zone, particularly ACON and AREA. This diversity of behaviours is largely explained by the fact that these are variables with a greater number of categories, and thus with greater dispersion of data, while the other variables have a smaller number of categories, which reduces the variation of the data.

Although the variables under study are considered independent, it is very difficult to guarantee that one characteristic of a given property has no influence on another. In order to analyse the level at which two variables are correlated, a Pearson correlation matrix was calculated. From the analysis of the matrix it is noticeable that some variables have a high degree of correlation, while in others the degree of correlation is insignificant. Of the possible pairs of variables, those with the highest correlation values are SEMxTRIM (0.893), ACONxCEDIF (0.398), ACONxCIM (0.330), PEDIFxPISO (0.516), TIPxAREA (0.620) and CEDIFxCIM (0.584).

3.2. Predictive Variables

With the initial Linear Regression, using the Stepwise method, the explanatory variables of each parish were defined. Table 2 summarizes the predictive variables for each parish.

The most common variables are the year of construction (ACON) and the condition of the property (CIM), appearing in six of the seven parishes. These two characteristics tend to be very important in the definition of the selling price of a property, and are usually related, since the condition of the property depends a lot on the year in which it was built. Another common variable in this study is the buyer (COMP), which is present in the models of five parishes.

______

Table 2 - Predictive Variables

Variables Parish SEM ACON PEDIF PEDIF_extra TIP PISO AREA CEDIF CIM ABAND COMP

Santa Clara X X X X X

Olivais X X X X X X X

Marvila X X X

Beato X X X

Penha de França X X X X X X

São Vicente X X X X X

Arroios X X X X X X X X

______
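The Pearson correlation coefficients reported in Section 3.1 can be computed along the following lines. This is only a sketch: the two coded variables below are made-up values chosen to mimic a semester/quarter pair, not data from the study, and the function name `pearson` is illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical coded values: semester (1-2) and quarter (1-4) of six sales.
semester = [1, 1, 2, 2, 1, 2]
quarter = [1, 2, 3, 4, 2, 3]
r = pearson(semester, quarter)
```

A pair such as semester and quarter is strongly correlated by construction, since the quarter determines the semester, which is consistent with that pair showing the highest value in the correlation matrix.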

The variables referring to the quarter (TRI), the extra floor (PISO_extra) and the existence of onus (ONUS) are not predictive variables in any of the parishes, and are therefore not considered significant for the dependent variable (PVNA).

Although there was a wide variation in the prices of real estate in Lisbon in the years of the study, the dependent variable (PVNA) contains prices normalized by the average selling price of the respective year, which removes the influence of the time of sale. The variables PISO_extra and ONUS had a very constant behaviour, since most of the properties do not have more than one floor and have no charge; their standard deviation values are low, so these variables are not significant.

3.3. Linear Regression

Several new categorizations of variables were tested, and then applied to linear regression models with the various combinations of these categorizations. These models were compared and, for each parish, the model that presented the best fit, that is, the highest R2 value, was selected. The selected models are summarized in Table 3.

The variable CIM, referring to the state of conservation of the property, is a variable of great importance, with a significant and positive weight in the sale price. The positive value of the multiplicative parameter indicates that the price of the property increases as the state of conservation improves. A rehabilitated or new property in good condition will be valued more than an aged property or one presenting pathologies (structural or not). The comfort and conditions offered by the property are of great importance, which is reflected in the relatively high values of the parameters referring to CIM, demonstrating that this variable has a great weight in determining the final sale price.

The year of construction (ACON) indicates the age of the property, which may be indicative of the use that the property has had and of the state of degradation that it may be in. The variable has a considerable weight, being less influential only in Beato's model, and is always positive, which describes an increase in price as the year of construction increases. That is, the more recent the property, the higher the component of the final price referring to the year of construction, so more recent properties are valued more than older ones.

______

Table 3 - Linear Regression

Columns: Parish; Const.; Variables (β): SEM, ACON, PEDIF, PEDIF_extra, TIP, PISO, AREA, CEDIF, CIM, ABAND, COMP; R2

Values of each row in column order (constant, coefficients of the predictive variables of the parish, R2):

Santa Clara       0.187 0.076 0.021 -0.193 0.006 0.121 0.432
Olivais           0.215 0.095 -0.202 -0.045 0.087 0.061 0.048 0.057 0.408
Marvila           0.033 0.135 0.098 0.088 0.417
Beato             -11.484 0.006 -0.002 0.137 0.371
Penha de França   -0.002 0.078 0.084 -0.115 0.214 0.075 0.066 0.372
São Vicente       0.309 0.078 0.103 -0.001 0.246 0.081 0.215
Arroios           0.049 0.090 -0.213 -0.037 0.046 0.174 0.233 0.061 0.084 0.305

______

The area (AREA) is the variable with the lowest weight. This is common to the three parishes that have the area as a predictive variable. In other studies, this variable tends to have a preponderant weight in the sale price. However, in those studies the price is given in absolute value, while in the present study the PVNA variable already represents the price per unit area. The presence of the area as a predictive variable may represent a slight variation in price/m2 depending on the size of the property.

3.4. Generalized Linear Model

Before applying the generalized linear model, the AREA variable was transformed in the Beato, Santa Clara and São Vicente parishes, which contained it as an explanatory variable. The applied transformation is non-linear, and consisted in raising the variable to a power obtained through non-linear regression. The powers obtained for these zones are presented in Table 4.

Table 4 - AREA powers

     Beato   Santa Clara   São Vicente
p    3.114   -0.002        0.000

The power obtained for São Vicente is zero, which turns all the inputs of the variable into the unit (value 1). This implies that, when changing the analysis tool, the variable AREA is no longer explanatory for the parish of São Vicente. The variable, which even in linear regression already had a very low weight, is therefore on the threshold of being explanatory or not.

For each parish, several combinations of variables were tested in interaction, and the model with the best fit was chosen. This fit is determined by the Akaike Information Criterion (AIC), since it is the fit parameter provided for this methodology. The lower the Akaike Information Criterion, the better the fit.

For the parish of Arroios, the chosen model contained the interaction between the year of construction and the state of conservation of the building, since it was the one that presented the lowest AIC value. For Beato, the model with the interaction between the year of construction and the area was selected, and for Marvila, the model with the interaction between the sale semester and the year of construction. For the parish of Olivais, a model with the interaction between the year of construction and the variable referring to the abandonment of the property was chosen, and for the parish of Penha de França the chosen model included the interaction between the year of construction and the state of conservation of the property. For Santa Clara, the model with the interaction between the sale semester and the state of conservation of the property was chosen, since it presents a better fit, and for the parish of São Vicente the chosen model included the interaction of the state of conservation of the property with the buyer.

The Generalized Linear Model assigns a multiplicative factor to each category of a variable, unlike Linear Regression, where each variable has a single factor. This decomposition of the factor allows the model to adapt better to the data, since the numerical order assigned to the categories does not represent (unless the variable is ordinal) an ordered increase in magnitude; the values only serve to facilitate modelling. Thus, the Generalized Linear Model makes it possible to conclude, for each variable, which categories have higher and lower weight in the model. For each variable, the model assumes one (or more) base category, to which it assigns a value of zero. The weight of the other categories is then given by reference to this base category.

3.5. Model Comparison

As already mentioned, the Generalized Linear Model in SPSS Statistics uses the AIC as the fit parameter. Thus, in order to be able to compare the two models, the results obtained through the statistical tools were transferred to an MS Excel database. Subsequently, comparison graphs were drawn with two sets of data: the set of linear regression results and the set of GLM results. For each set of data, the real value of the price was placed on the axis of the ordinates and the price obtained through the model on the axis of the abscissas. The respective trend lines were plotted and the coefficient of determination R2 was obtained for each model. Figure 1 contains the graphs obtained for the seven parishes. The data in blue refer to the linear regression and the data in green to the generalized linear model. Table 5 contains the values of R2 of both models of each parish.

The Generalized Linear Model (GLM) obtained coefficients of determination (R2) superior to the ones obtained by Linear Regression, which means that the GLM has a better fit to the actual behaviour of the sales prices. It can be observed in the graphs that the points of the generalized linear models are much more condensed along the trend line than the points of the linear regressions, which are more dispersed.
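For reference, the quantities used in Sections 3.4 and 3.5 can be written compactly. These are the standard textbook definitions, given here as an aid to the reader. The symbols are introduced only for this illustration: $p$ is the power from Table 4, $k$ the number of estimated parameters, $\hat{L}$ the maximised likelihood, $y_i$ the real PVNA values, $\hat{y}_i$ the values on the fitted line and $\bar{y}$ their mean. The exact AIC variant reported by the software is not stated in the text.

```latex
\[
\mathrm{AREA}^{*} = \mathrm{AREA}^{\,p},
\qquad
p = 0 \;\Rightarrow\; \mathrm{AREA}^{*} = 1 \ \text{for every property}
\]
\[
\mathrm{AIC} = 2k - 2\ln\hat{L}
\]
\[
R^{2} = 1 - \frac{\sum_{i}\left(y_{i} - \hat{y}_{i}\right)^{2}}{\sum_{i}\left(y_{i} - \bar{y}\right)^{2}}
\]
```

The first line shows why a zero power removes AREA from the São Vicente model: a column that equals 1 for every property carries no information beyond the constant term.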

[Figure 1: one scatter plot per parish (Arroios, Beato, Marvila, Olivais, Penha de França, Santa Clara, São Vicente). Horizontal axis: Modelled Price [-]; vertical axis: Real (PVNA) Price [-].]

Figure 1 - Model Comparison Graphs
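The R2 of a trend line through a scatter plot such as those of Figure 1 can be reproduced with an ordinary least-squares fit of the real price on the modelled price. The sketch below assumes that the spreadsheet trend line is a simple linear fit with intercept. The function name `trend_line_r2` and the price values are made up for illustration.

```python
def trend_line_r2(modelled, real):
    """Fit real = intercept + slope * modelled and return (slope, intercept, R2)."""
    n = len(modelled)
    mean_x = sum(modelled) / n
    mean_y = sum(real) / n
    sxx = sum((x - mean_x) ** 2 for x in modelled)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(modelled, real))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares of the real prices.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(modelled, real))
    ss_tot = sum((y - mean_y) ** 2 for y in real)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical PVNA values: one set of real prices, two sets of model outputs.
real = [0.6, 0.9, 1.0, 1.2, 1.5]
tight_model = [0.65, 0.85, 1.05, 1.15, 1.5]   # points close to the trend line
loose_model = [0.9, 0.8, 1.1, 0.9, 1.3]       # points more dispersed

_, _, r2_tight = trend_line_r2(tight_model, real)
_, _, r2_loose = trend_line_r2(loose_model, real)
```

Points that lie closer to the trend line give the higher R2, which is the visual pattern described for the generalized linear models in Figure 1.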


______

Table 5 - Comparison of R2 values

Parish             Linear Regression R2   Generalized Linear Model R2
Santa Clara        0.274                  0.602
Olivais            0.408                  0.543
Marvila            0.416                  0.515
Beato              0.353                  0.896
Penha de França    0.355                  0.396
São Vicente        0.215                  0.303
Arroios            0.306                  0.397
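The comparison in Table 5 can be checked with a few lines of code. The values below are copied from Table 5, assuming the column order Santa Clara, Olivais, Marvila, Beato, Penha de França, São Vicente, Arroios. The variable names are illustrative.

```python
# R2 values transcribed from Table 5 (column order is an assumption).
r2_linear = {
    "Santa Clara": 0.274, "Olivais": 0.408, "Marvila": 0.416, "Beato": 0.353,
    "Penha de França": 0.355, "São Vicente": 0.215, "Arroios": 0.306,
}
r2_glm = {
    "Santa Clara": 0.602, "Olivais": 0.543, "Marvila": 0.515, "Beato": 0.896,
    "Penha de França": 0.396, "São Vicente": 0.303, "Arroios": 0.397,
}

# Improvement in R2 obtained by the generalized linear model in each parish.
gain = {parish: round(r2_glm[parish] - r2_linear[parish], 3) for parish in r2_linear}

largest_gain = max(gain, key=gain.get)       # parish where the GLM helps most
best_fit = max(r2_glm, key=r2_glm.get)       # parish with the highest GLM R2
worst_fit = min(r2_glm, key=r2_glm.get)      # parish with the lowest GLM R2

for parish, value in sorted(gain.items(), key=lambda item: -item[1]):
    print(f"{parish}: +{value:.3f}")
```

Sorting by gain makes it easy to see in which parishes the change of tool matters most and in which the two models perform almost alike.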

______

As seen in Table 5, the Generalized Linear Model behaved better than the Linear Regression in all zones, with a greater difference between models in some parishes than in others. The parish of Beato is the one with the greatest difference between the two models, where the coefficient of determination of the GLM (0.896) is much higher than that of Linear Regression (0.353), and it is also the one with the best model fit. By contrast, São Vicente was the parish that obtained the models with the worst fit. The fit of the Generalized Linear Model varies according to the amount of data of each parish.

3.6. Outliers

The properties belonging to the Other Historical Zones tend to have more outliers, since the sample of these zones is larger. Many of the abnormally high-priced properties belonging to these areas are housed in older buildings, many of which have been rehabilitated and have some historical value. Others are located in new buildings, which stand out for having far superior conditions to the surrounding buildings. In addition, the location of these properties, closer to the historic centre and the business centre of the city, is a factor of appreciation when compared to similar properties located in more remote areas.

In the Peripheral Zones the number of outliers is smaller, and these properties tend to stand out more at very low values. The Olivais parish is one of the four that has a more relevant number of outliers, and many of these correspond to very low sale prices, usually associated with buildings and/or properties that are very degraded, in poor condition or in a less attractive location, since there are several neighbourhoods of social housing in this parish.

In general, the abnormally low values correspond to degraded or poorly maintained properties, and/or properties located in cellars, that is, on buried floors, cases that occur more often in Arroios and São Vicente.

4. Conclusion

The first important conclusion drawn from this dissertation is that the variables have, in general, very different behaviours and impacts from parish to parish. The decision to opt for separate models for the different zones is supported by the theory of spatial heterogeneity, which states that different areas of the city affect property demand and price differently, and the comparison of the different models makes it possible to understand, to a certain extent, the effect that different zones have on the various variables.

The implied average price of real estate varies between parishes, including between parishes belonging to the same type of zone. This variation is more noticeable in the other historical zones, which implies that these zones have greater spatial heterogeneity than the peripheral zones. In addition, they also have, in general, higher average values, indicating that proximity to the historical and business centres of the city has a great impact on the price of real estate, as would be expected from the results of other studies.

The year of construction and the states of conservation of the property and of the building are variables with significant levels of correlation, and are the variables with the greatest weight in the determination of real estate prices in the several parishes. Although there are several factors that affect the price of a property, its age and the state of its materials and finishes end up being the most determinant characteristics.

Comparing the two models of each parish, it is concluded that the price values calculated through the Generalized Linear Model fit the behaviour of real prices better. Although both models have a linear behaviour, the Generalized Linear Model assigns each category of each variable a multiplicative parameter of its own. Thus, the various categories of a variable may have different signs and magnitudes, depending on the influence of that category on the price of the property, contrary to what is considered in the linear regression models. In addition, these models include a variable interaction component, which improves the fit of the model, since it takes into account the fact that the predictive variables are not completely independent of each other.

The models with the best fit are the generalized linear models of the parishes of Beato and Santa Clara. These present coefficient of determination values of 0.896 and 0.602, respectively, which is equivalent to the models explaining 89.6% and 60.2% of the dependent variable. These are high and significant values, especially the value of Beato. The remaining values are lower and vary between 0.303 and 0.543, and the lowest value (0.303) corresponds to the model of the parish of São Vicente.

The area, which in the analysed literature tends to be one of the variables with the greatest weight, seems in this study to have little or no significance. This happens because the price used is not absolute, but a value given per square meter. Removing the impact of the area on the price also removes a factor that explains a great part of the price value, which explains why some of the obtained models have such low fits.

5. References

[1] S. Rosen, Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition, Journal of Political Economy, 1974.

[2] A. C. Goodman, Hedonic Prices, Price Indices and Housing Markets, Journal of Urban Economics, 5, 471-484, 1978.

[3] M. Fleming and J. G. Nellis, The Halifax House Price Index Technical Detail, Halifax, Halifax Building Society, 1984.

[4] H. Mark and M. A. Goldberg, Alternative Housing Price Indices: An Evaluation, Journal of American Real Estate and Urban Economics Association, 12, 30-49, 1984.

[5] A. S. Adair, J. N. Berry and W. S. McGreal, Hedonic Modeling, Housing Submarkets and Residential Valuation, Journal of Property Research, 13, 67-83, 1996.

[6] G. Debrezion, E. Pels and P. Rietveld, The Impact of Rail Transport on Real Estate Prices: An Empirical Analysis of the Dutch Housing Market, Urban Studies, 48(5), 997-1015, 2011.

[7] M. Wilhelmsson and V. Ceccato, Does Burglary Affect Property Prices in a Nonmetropolitan Municipality?, Journal of Rural Studies, Vol. 39, 210-218, 2015.

[8] B. Soderberg and C. Janssen, Estimating Distance Gradients for Apartment Properties, Urban Studies, Vol. 38, No. 1, 61-79, 2001.

[9] J. Zietz, E. N. Zietz and G. S. Sirmans, Determinants of House Prices: A Quantile Regression Approach, The Journal of Real Estate Finance and Economics, 37(4), 317-333, 2008.

[10] M. McCord, P. Davis, M. Haran, S. McGreal and D. McIlhatton, Spatial Variation as a Determinant of House Price, Journal of Financial Management of Property and Construction, Vol. 17, No. 1, 49-72, 2012.

[11] J. Melichar and K. Kaprová, Revealing Preferences of Prague's Homebuyers Toward Greenery Amenities: The Empirical Evidence of Distance-Size Effect, Landscape and Urban Planning, 109(1), 56-66, 2013.

[12] P. Davis, J. McCord, M. McCord and M. Haran, Modelling the Effect of Energy Performance Certificate Rating on Property Value in the Belfast Housing Market, International Journal of Housing Markets and Analysis, Vol. 8(3), 292-317, 2015.

[13] J. Yao and A. Fotheringham, Local Spatiotemporal Modelling of House Prices: A Mixed Model Approach, The Professional Geographer, 68:2, 189-201, 2016.

[14] A. Nistor and D. Reianu, Determinants of Housing Prices: Evidence from Ontario Cities, 2001-2011, International Journal of Housing Markets and Analysis, 2018.

[15] G. S. Sirmans, D. A. Macpherson and E. N. Zietz, The Composition of Hedonic Pricing Models, Journal of Real Estate Literature, 13(1), 3-46, 2005.

[16] M. Stamou, A. Mimis and A. Rovolis, House price determinants in Athens: a spatial econometric approach, Journal of Property Research, 34:4, 269-284, 2017.

[17] V. Reichel and P. Zimčík, Determinants of Real Estate Prices in the Statutory City of Brno, Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, 66(4), 991-999, 2018.