USEFULNESS OF METADATA FOR THE CORRECTION OF TIME SERIES INHOMOGENEITY

Maria Carmen Beltrano1, Simona Sorrenti1

1CRA-CMA National Council for Agricultural Research –Unit for Climatology and Agrometeorology Via del Caravita 7a, I-00186 Rome; ph. +39066195311; fax +390669531215 maria [email protected], [email protected]

Abstract

Inhomogeneities in meteorological time series, which appear as abrupt discontinuities, gradual changes, or changes in variability, often result from changes in the observing system: changes in instrumentation or exposure, station relocation, replacement of sensors or, further, the application of new calibration corrections. While improvements in measuring techniques are important, it is equally important to have metadata (that is, information about the data) that record the station history and can often explain the occurrence, kind and timing of discontinuities. Only after clearing up the inhomogeneities caused by operational changes in the observing system can we apply appropriate statistical procedures to link the previous dataset with the new one in a homogeneous database with a high degree of confidence. We present a case study on the usefulness of metadata in the preliminary detection of discontinuities in the temperature series of the urban weather observatory of the Roman College, founded at the end of the 18th century and located in a historical building in the heart of Rome. This study is being carried out within the research program “AGROSCENARI”, financed by the Ministry of Agriculture, Food and Forestry (D.M. 8608/7303/2008).

Key words: breakpoints in time series, metadata.

Introduction

The headquarters of the Unit for Climatology and Agro-Meteorology Research (CRA-CMA) house the oldest meteorological observatory in Rome, known as the "Roman College Observatory". The institution, founded by the Jesuits who then occupied the premises of the College, dates from the mid-1500s and was known at the time throughout Europe for its scientific value. Regular observations started in 1782 and have continued at the same site ever since, using increasingly modern methods of observation that have always remained comparable with the previous ones. The scientific importance of the Roman College Observatory relies on the length and consistency of its series, which is a very valuable database for studies of the climate of Rome and of the urban climatic changes that depend on urbanization. Studies of long historical data series are being carried out to identify climatic trends, the expression of a probable climate change. However, a correct climatic analysis also requires knowledge of the history of the weather station, in order to clean the data set of errors linked to external factors. In the last decade the scientific community has become more aware that the real climate signal in original series of meteorological data is generally hidden behind non-climatic noise caused by station relocation, changes in instruments and instrument screens, changes in observation times, observers and observing routines, algorithms for the calculation of means, and so on. At present there is therefore very broad agreement with the statement that time series of meteorological data cannot be used for climate research without a clear knowledge of the state of the data in terms of homogeneity (http://www.isac.cnr.it/~climstor/hom_training.html).

In the context of climate change, it is very important to interpret correctly the inhomogeneities that appear in meteorological time series as abrupt discontinuities, gradual changes, or changes in variability. There are different ways of solving homogeneity problems, and the choice of the most suitable one is not treated in this study; we try to interpret the characteristics of the data set (metadata availability) before applying any homogenization method. Having the diagrams and original records makes it possible to obtain much important information about the station history, which is useful for reconstructing the life of the observatory and for reading unclear data. For the Roman College Observatory, the paper archive holds all records and original diagrams since the start of the 20th century. In this work we present a case study in which metadata are used to explain and understand the causes of discontinuities in the historical series of minimum and maximum temperature of the Roman College Observatory.

Fig. 1. Calandrelli Tower, headquarters of Roman College Observatory in Rome

Materials and methods

We analyzed the daily minimum and maximum temperature series of the Roman College Observatory for the period 1901-1999, stored in the National Agrometeorological Data Base (BDAN), the computerized archive of the Unit for Climatology and Agrometeorology Research (CMA). CMA manages the agro-meteorological monitoring network (RAN), which consists of about 130 automatic weather stations located throughout the national territory; CMA also manages several traditional monitoring stations, including the Roman College Observatory. Measurements from both the automatic and the traditional stations are stored in BDAN. The first step was to analyze the discontinuities in the daily minimum and maximum temperature series extracted from BDAN.

The specific advantage of discontinuity analysis is that it draws attention to the climatic and non-climatic causes underlying the evolution of the series. We adopted discontinuity analysis by flat-step regression. The aim of flat-step regression is to detect breakpoints in a time series, defining a partition into stationary climatic sub-periods, each described by a different mean value of temperature. This approach makes it possible to define the climatic normal levels before and after each breakpoint, with the sub-period means chosen so as to minimize the deviations of the data from them. For each parameter and each year the annual mean was calculated. Breakpoints were detected with the “strucchange” package for the R software (Zeileis et al., 2003). Consider the standard regression model

$$y_i = x_i^{T} \beta_i + u_i, \qquad i = 1, \dots, n$$

where $y_i$ is the observation of the dependent variable at time $i$, $x_i$ is a vector of regressors, $\beta_i$ is the $k$-dimensional vector of regression coefficients and $u_i$ is an error term.

Assuming the existence of $m$ breakpoints, that is, of $m+1$ partitions within each of which the regression coefficients are constant, the optimal positions of the breakpoints are determined by minimizing the residual sum of squares

$$RSS(i_1, \dots, i_m) = \sum_{j=1}^{m+1} rss(i_{j-1}+1, i_j)$$

where $rss(i_{j-1}+1, i_j)$ is the residual sum of squares of the $j$-th partition, with the convention $i_0 = 0$ and $i_{m+1} = n$.
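As an illustration of how the general formula specialises to the flat-step case, assume that the only regressor is a constant, $x_i = 1$ (our reading of "flat step"; the symbols $\bar{y}_j$ and $n_j$ are introduced here only for this sketch). The least-squares coefficient of each partition is then simply its mean, and the partition's residual sum of squares is the sum of squared deviations from that mean:

```latex
\hat{\beta}_j = \bar{y}_j = \frac{1}{n_j} \sum_{i = i_{j-1}+1}^{i_j} y_i ,
\qquad n_j = i_j - i_{j-1},
\qquad
rss(i_{j-1}+1,\, i_j) = \sum_{i = i_{j-1}+1}^{i_j} \left( y_i - \bar{y}_j \right)^2 .
```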

The optimal number of breaks can be determined by minimizing an information criterion. Bai and Perron (2003) argue that the Bayesian Information Criterion (BIC) is a suitable selection criterion (Zeileis et al., 2002, 2003).
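The procedure described above can be sketched in a few lines of Python. This is not the strucchange implementation used in the study but a minimal stand-in under stated assumptions: the regressor is a constant (flat steps), every sub-period has at least `h` observations, and the BIC is taken in the common form n·log(RSS/n) + p·log(n) with p = 2(m+1). The function names, the minimum length `h` and the exact penalty are illustrative choices, not taken from the paper.

```python
import math


def make_rss(y):
    """Return rss(a, b): sum of squared deviations of y[a:b] from its own mean."""
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def rss(a, b):
        total = s1[b] - s1[a]
        return max((s2[b] - s2[a]) - total * total / (b - a), 0.0)

    return rss


def best_partition(y, m, h):
    """Minimise the total RSS over all partitions of y into m+1 flat steps.

    Each step has at least h observations. Returns (rss, breaks), where breaks
    holds the 0-based index of the last observation of every step but the final one.
    """
    n = len(y)
    if (m + 1) * h > n:
        raise ValueError("series too short for m breaks with minimum length h")
    rss = make_rss(y)
    inf = float("inf")
    # cost[k][b]: minimal RSS of y[0:b] split into k+1 steps
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    prev = [[-1] * (n + 1) for _ in range(m + 1)]
    for b in range(h, n + 1):
        cost[0][b] = rss(0, b)
    for k in range(1, m + 1):
        for b in range((k + 1) * h, n + 1):
            for a in range(k * h, b - h + 1):
                c = cost[k - 1][a] + rss(a, b)
                if c < cost[k][b]:
                    cost[k][b], prev[k][b] = c, a
    # Walk back through the stored split points to recover the breaks.
    breaks, b = [], n
    for k in range(m, 0, -1):
        b = prev[k][b]
        breaks.append(b - 1)
    breaks.reverse()
    return cost[m][n], breaks


def select_breaks(y, max_m, h):
    """Choose the number of breaks (0..max_m) with the lowest BIC (assumed form)."""
    n = len(y)
    best = None
    for m in range(max_m + 1):
        total, breaks = best_partition(y, m, h)
        # Assumed penalty: one mean and one variance per step, p = 2 * (m + 1).
        bic = n * math.log(max(total, 1e-12) / n) + 2 * (m + 1) * math.log(n)
        if best is None or bic < best[0]:
            best = (bic, m, breaks)
    return best[1], best[2]
```

For an annual series starting in 1901, a returned index `i` corresponds to the year 1901 + i, taken here as the last year of its sub-period.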

Once the breakpoint years had been identified, we searched the literature for studies against which to compare our results. We also searched the archive for news and information about changes at the station, to understand whether the changes shown by the statistical analysis depended on errors in station management or were due to particular meteorological events. The results of this preliminary analysis will make it possible to clean the series of errors due to station management, so that the homogenization of the series can then proceed in a rigorous way.

Results

To identify change points we analyzed three time series: minimum (Tn), maximum (Tx) and mean (T) temperature at the Roman College Observatory from 1901 to 1999. Breakpoints were detected in several years (Table 1).

Table 1.

Parameters                    Discontinuity years
Minimum temperature (Tn)      1922, 1949, 1984
Maximum temperature (Tx)      1919, 1937, 1967, 1981
Mean temperature (T)          1919, 1967, 1984

In Figures 2-4 the breakpoints of the temperature series are shown as dotted lines. The green horizontal lines represent the whole-period means, and the blue horizontal lines represent the means of the detected sub-periods. The red horizontal lines at the bottom show the 90% confidence range.

Fig. 2. The annual mean temperature during the period 1901-1999 at Roman College Observatory
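The whole-period and sub-period means drawn in the figures can be computed directly once the breakpoint years are known. The sketch below is illustrative only: the function names are hypothetical, and each breakpoint year is assumed to be the last year of its sub-period, a convention the paper does not state.

```python
def subperiod_means(years, values, break_years):
    """Return (first_year, last_year, mean) for each sub-period.

    Assumption: every breakpoint year is the LAST year of its sub-period.
    """
    cuts = [years.index(b) + 1 for b in sorted(break_years)]
    edges = [0] + cuts + [len(years)]
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        chunk = values[lo:hi]
        result.append((years[lo], years[hi - 1], sum(chunk) / len(chunk)))
    return result


def departures_from_whole_mean(years, values, break_years):
    """Return (first_year, last_year, sub-period mean minus whole-period mean)."""
    whole = sum(values) / len(values)
    return [
        (first, last, mean - whole)
        for first, last, mean in subperiod_means(years, values, break_years)
    ]
```

Applied to an annual series with the breakpoint years of Table 1, the second function gives the departures of each sub-period from the whole-period mean.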

Figure 3 and Figure 4 show the discontinuity points for the annual minimum and maximum data. The mean and minimum temperatures show the largest differences between the whole-period mean and the sub-period means at the beginning and at the end of the series, with change points detected in the early 1920s and the mid-1980s, while the most evident difference for the maximum series occurs during the 1970s, when the mean is 17.8 °C against a whole-period mean of 20.5 °C. For minimum temperatures, each sub-period identified by the breakpoints shows an increase in the mean value; in the last one there is a marked increase of 0.87 °C. For the rise observed in the third sub-period, it is possible to conjecture a link with the economic boom that Italy experienced from the end of the Second World War up to the second half of the 1960s, when heating systems began to spread in the city, giving rise to the phenomenon known as the "heat island".

Fig. 3. The annual minimum temperature during the period 1901-1999 at Roman College Observatory

Fig. 4. The annual maximum temperature during the period 1901-1999 at Roman College Observatory

For maximum temperatures we note a fluctuating behavior, with sub-period means above or below the whole-period mean. The lowest mean values of the maximum temperature, detected in the 1970s, seem to be structural because they coincide with the evidence reported in scientific studies as a “cold period” (IPCC, 2001; Esposito, 2008). The change point dated 1984 can probably be explained both by the structural change reported in many studies (Werner et al. 2000, IPCC, 2007) and by the change of observer.

The search for the Observatory metadata was difficult: in the paper archive we did not find useful information about possible changes in instrumentation or in observing routines. The analysis of the dates of observer change (Table 2) shows an interesting correspondence between the breakpoint in minimum and mean temperature in 1984 and the observer change at the end of 1983, which should be investigated further by comparison with the signal recorded in the same period at other stations.

Table 2. Dates of observer change

----
May 1937
September 1964
November 1983
January 1991
June 1997
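The comparison between breakpoint years and metadata events can be made systematic with a small script. The sketch below uses the values of Table 1 and Table 2; the function name and the one-year tolerance are illustrative assumptions, not part of the original analysis.

```python
# Breakpoint years from Table 1.
BREAKPOINTS = {
    "Tn": [1922, 1949, 1984],
    "Tx": [1919, 1937, 1967, 1981],
    "T": [1919, 1967, 1984],
}

# Dated observer changes from Table 2, as (year, month).
OBSERVER_CHANGES = [(1937, 5), (1964, 9), (1983, 11), (1991, 1), (1997, 6)]


def match_breakpoints(breakpoints, events, tolerance_years=1):
    """List (series, breakpoint_year, event) for events close to a breakpoint.

    tolerance_years is an assumed window; it absorbs the ambiguity between
    the last year of one sub-period and the first year of the next.
    """
    matches = []
    for series, years in breakpoints.items():
        for break_year in years:
            for event in events:
                if abs(break_year - event[0]) <= tolerance_years:
                    matches.append((series, break_year, event))
    return matches
```

With a one-year tolerance, the tabulated values pair the 1984 breakpoints of Tn and T with the November 1983 change, and also pair the 1937 breakpoint of Tx with the May 1937 change; whether the latter is meaningful would need the same kind of verification against other stations.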

Conclusions

Identifying existing trends in temperature series, as the expression of a possible climate change, always requires data homogenization. We think that rigorous homogenization must be preceded by a preliminary study of each historical series, supported by metadata. In particular, we think it is important to clean the observation series of all errors arising from causes that do not depend on meteorological behavior. We present the case study of the minimum and maximum temperature series recorded at the Roman College Observatory in Rome from 1901 to 1999. We studied the data series using discontinuity analysis by flat-step regression, which detects breakpoints in a time series and divides the series into stationary climatic sub-periods described by different mean values of temperature. We then tried to interpret the breakpoints by looking for metadata about the Roman College Observatory. Unfortunately we did not find useful information, except for the dates of observer change. The biggest problem in analyzing a single historical data set for homogenization is finding metadata, that is, useful information about station management: even in a principal observatory like the Roman College, which is also the CMA observatory and has a long historical tradition, it is difficult to recover the real climate signal in the original series of meteorological data. In general, non-climatic noise is an important factor in assessing the quality of the data in terms of homogeneity. So far the metadata have not provided indications about the fluctuations of the series, but the future aim of this work is to investigate the detected discontinuities more deeply. This work can be considered a first step towards the detection and correction of inhomogeneities in the historical series of the Roman College Observatory. Future developments will include the study of other paper archives and the use of other proxy variables that could help explain the identified discontinuities.

References

Bai J., Perron P. (2003). Computation and Analysis of Multiple Structural Change Models. Journal of Applied Econometrics, 18, 1-22.

Brunetti M., Maugeri M., Monti F., Nanni T. (2006). Temperature and precipitation variability in Italy in the last two centuries from homogenized instrumental time series. International Journal of Climatology, 26, 345–381.

Chiaudani A., Borin M., Berti A., Mariani L. (2008). Due metodi di analisi di discontinuità a confronto: piecewise e strucchange in funzione agro climatic. Italian Journal of Agrometeorology, Anno 13 - N.1 - febbraio 2008.

CNR-ISAC (2005). Training session on homogenization methods, Bologna 17-18 May. www.isac.cnr.it/~climstor/hom_training.html

Esposito S. (2008). Relazione tecnico-scientifica I° anno, Individuazione delle anomalie climatiche di temperatura e pioggia in Italia finalizzate alle previsioni a scala temporale mensile. Relazione tecnica progetto “TEMPIO” - Previsioni mensili di temperatura e precipitazioni a supporto della pianificazione dell’attività agricola. http://www.cra-cma.it/Tempio/risultati/Relazione_1anno_Progetto_TEMPIO_CRA-CMA.pdf

Giavante S., Quaresima S., Di Giuseppe E., Esposito S., Beltrano M.C. (2007). Discontinuità di serie storiche termopluviometriche italiane nel periodo 1961-2007. Italian Journal of Agrometeorology, Quaderno degli Abstract del 12° Convegno nazionale di Agrometeorologia - AIAM 2009, Sassari 15-17 giugno 2009, Anno 14 - N.2 - giugno 2010.

IPCC (2001). Climate Change 2001: Synthesis Report. A Contribution of Working Groups I, II and III to the Third Assessment Report of the Intergovernmental Panel on Climate Change [Watson, R.T. and the Core Writing Team (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, 398 pp.

Leroy M. (2008). Metadata to document surface observation. TECO-2008 - WMO Technical Conference on Meteorological and Environmental Instruments and Methods of Observation, IOM No.96 (TD 1462), St. Petersburg, Russian Federation, 27-29 November 2008.

Mariani L. (2006). Some methods for time series analysis in Agrometeorology. Rivista Italiana di Agrometeorologia, 48-56 (2).

Zeileis A., Leisch F., Hornik K., Kleiber C. (2002). Strucchange: An R Package for Testing for Structural Change in Linear Regression Models. Journal of Statistical Software, 7(2), 1–38. URL http://www.jstatsoft.org/v07/i02/.

Zeileis A., Kleiber C., Krämer W., Hornik K. (2003). Testing and dating of structural changes in practice. Computational Statistics and Data Analysis, 44, 109-123.
