<<

Nat. Hazards Earth Syst. Sci., 17, 305–314, 2017 www.nat-hazards-earth-syst-sci.net/17/305/2017/ doi:10.5194/nhess-17-305-2017 © Author(s) 2017. CC Attribution 3.0 License.

Spatio-temporal modelling of lightning climatologies for complex terrain

Thorsten Simon1,2, Nikolaus Umlauf2, Achim Zeileis2, Georg J. Mayr1, Wolfgang Schulz3, and Gerhard Diendorfer3 1Institute of Atmospheric and Cryospheric Sciences, University of Innsbruck, Innsbruck, 2Department of Statistics, University of Innsbruck, Innsbruck, Austria 3OVE-ALDIS, Vienna, Austria

Correspondence to: Thorsten Simon ([email protected])

Received: 1 June 2016 – Discussion started: 4 July 2016 Revised: 16 January 2017 – Accepted: 8 February 2017 – Published: 6 March 2017

Abstract. This study develops methods for estimating light- 2000). Thus, the private and the insurance sector have a ning climatologies on the day−1 km−2 scale for regions with strong interest in reliable climatologies for such events, i.e. complex terrain and applies them to summertime observa- for risk assessment or as a benchmark forecast of a warning tions (2010–2015) of the lightning location system ALDIS system. in the Austrian state of in the Eastern . Lightning is a transient, high-current (typically tens of Generalized additive models (GAMs) are used to model kiloamperes) electric discharge in the air with a typical both the probability of occurrence and the intensity of light- length in kilometres. The lightning discharge in its entirety is ning. Additive effects are set up for altitude, day of the usually termed a lightning flash or just a flash. Each flash typ- year (season) and geographical location (longitude/latitude). ically contains several strokes , which are the basic elements The performance of the models is verified by 6-fold cross- of a lightning discharge (Rakov, 2016). Lightning flashes are validation. rare events. Around 0.5–4 km−2 yr−1 occur in the Austrian The altitude effect of the occurrence model suggests Alps (Schulz et al., 2005). Lightning location data that are higher probabilities of lightning for locations on higher el- homogeneously detected – with the same network and se- evations. The seasonal effect peaks in mid-July. The spatial lection algorithm – typically cover on the order of 10 years. effect models several local features, but there is a pronounced Consequently not enough data are available to compute a minimum in the north-west and a clear maximum in the east- spatially resolved climatology on the km−2 scale by simply ern part of Carinthia. The estimated effects of the intensity taking the mean of each cell – even less so if a finer temporal model reveal similar features, though they are not equal. The resolution than yr−1 is desired, which is the case for lighting main difference is that the spatial effect varies more strongly following a prominent annual cycle. Thus there is the need to than the analogous effect of the occurrence model. develop methods to robustly estimate lightning climatologies A major asset of the introduced method is that the resulting by exploiting information contained not only in each analy- climatological information varies smoothly over space, time sis cell and maybe its neighbouring cells but in the complete and altitude. Thus, the climatology is capable of serving as data set. Lightning might, e.g. depend on the altitude of the a useful tool in quantitative applications, i.e. risk assessment grid cells so that combining the information from all cells at and weather prediction. a particular altitude band increases the sample size and leads to a more robust estimate. Other common effects that might be exploited are geographical location, day of the year, time of day, slope orientation, distance from the nearest mountain 1 Introduction ridge, etc. One possibility of harnessing the complete data set to pro- Severe weather, associated with thunderstorms and lightning, duce a lightning climatology in such a manner is using gen- causes fatalities, injuries and financial losses (Curran et al.,

Published by Copernicus Publications on behalf of the European Geosciences Union. 306 T. Simon et al.: Spatio-temporal modelling of lightning climatologies eralized additive models (GAMs; see Hastie and Tibshirani, 2 Data 1990; Wood, 2006). They can include these common effects as additive terms. Each of these terms might be of arbitrary In this study 6 years of data (2010–2015) from the ALDIS complexity and represent the potentially non-linear relation- detection network (Schulz et al., 2005) are included. The data ship between lightning and the covariate. An additional ad- within this period are processed in real time by the same vantage of GAMs is the ability for inference. It can be tested lightning location algorithm ensuring stationarity with re- whether each of the included effects is actually an effect – or spect to data processing. The summer months May to Au- not significant and thus not exploiting common information gust are selected, as this is the dominant lightning season in the whole data set. GAMs allow the inclusion of expert in the . The original ALDIS data contain sin- knowledge through the choice of the covariates. GAMs have gle strokes and information on which strokes belong to a been used to compile climatologies of extremes (Chavez- flash. In order to analyse flashes, only the very first stroke Demoulin and Davison, 2005; Yang et al., 2016) and full of a flash is taken into account. Both cloud-to-ground and precipitation distributions (Rust et al., 2013; Stauffer et al., intra-cloud lightning strikes are considered, as both typically 2016). A further benefit of GAMs is that all the parameters indicate thunderstorms which are of interest in this study. of a distribution, e.g. scale and shape, can be modelled and ALDIS is part of the European cooperation for light- not only the expected value (Stauffer et al., 2016). ning detection (EUCLID) (Pohjola and Mäkelä, 2013; Schulz In this study GAMs are applied to estimate a climatology et al., 2016), which bundles European efforts in lightning of the probability of lightning and a climatology of the ex- detection. Schulz et al.(2016) present an evaluation of the pected numbers of flashes with a spatial resolution of 1 km2 performance of lightning detection in Europe and the Alps and a temporal resolution of 1 day. They also serve as prox- by comparison against direct tower observations with respect ies for climatologies of the occurrence of thunderstorms (e.g. to detection efficiency, peak current estimation and location Gladich et al., 2011; Poelman, 2014; Mona et al., 2016) and accuracy. The median location error was found to be in the thunderstorm intensity. The climatologies will be compiled order of 100 m. Furthermore, they show that the flash detec- for a region in the south-eastern Alps. tion efficiency is greater than 96 % (100 %) if one of the re- A study investigating lightning data for the period 1992 turn strokes in a flash had a peak current greater than 2 kA to 2001 (Schulz et al., 2005) found that flash densities over (10 kA). However, it is impossible to determine the detec- the complex topography of Austria vary between 0.5 and tion efficiency of intra-cloud flashes without a locally in- 4 flashes km−2 yr−1 depending on the terrain. Data from the stalled very high frequency (VHF) network. Thus no attempt same lightning location system (LLS) were also analysed is made in Schulz et al.(2016) to characterize the detection to obtain thunderstorm tracks (Bertram and Mayr, 2004). efficiency of intra-cloud flashes. The key finding is that thunderstorms are often initialized at The region of interest is the state of Carinthia in the south mountains of moderate altitude and propagate towards flat of Austria at the border with and Slovenia. Carinthia areas afterwards. extends 180 km in an east–west direction and 80 km in a Other studies focus on lightning detected in the vicinity of north–south direction. The elevation varies between 339 m the Alps: a 6-year analysis of lightning detection data over and 3798 m a.m.s.l. (above mean sea level). For invoking Germany reveals that the highest activity is in the north- elevation as a covariate into the statistical model (Sect.3) ern foothills of the Alps and during the summer months, digital elevation model (DEM) data (Kärnten, 2015), which when the number of thunderstorm days goes up to 7.5 yr−1 are on hand on a 10 m × 10 m resolution, are averaged over (Wapler, 2013). Lightning activity is also high along the 1 km × 1 km cells (Fig.1), which leads to a maximum eleva- southern rim of the Alps. Feudale et al.(2013) found ground tion of 3419 m a.m.s.l. with respect to this resolution. As the flash densities of up to 11 flashes km−2 yr−1 in north-eastern distribution of the altitude is highly skewed, the logarithm Italy, which is south of our region of interest, Carinthia. of the altitude serves as covariate, which is distributed more The paper is structured as follows: the lightning detection uniformly in the range 6 to 8. data, the region of interest, Carinthia, and the pre-processing The lightning data, for May to August of the 6 years, are of the data are described in Sect.2. The methods for estimat- transferred to the same 1 km × 1 km raster by counting the ing the lightning climatologies from the data are based on flashes within one spatial cell and per day. This procedure generalized additive models (Sect.3). The non-linear effects yields 9904 cells and 738 days for a total of n = 7 309 152 estimated for occurrence and intensity model and exemplary data points, from which 157 440 (2.15 %) show lightning ac- climatologies are presented afterwards (Sect.4). Some spe- tivity. The number of cells in which a specific number of cial aspects of interest for end-users are discussed in Sect.5. flashes was detected decreases rapidly for increasing count The study is summarized and concluded at the end of the pa- numbers. The most extreme data point has 37 flashes per cell per (Sect.6). and day (Fig.2). The mean number of detected flashes in the cells given lightning activity is 1.75. Figure3 shows an example for a climatology based on empirical estimates for July. Here the number of days with

Nat. Hazards Earth Syst. Sci., 17, 305–314, 2017 www.nat-hazards-earth-syst-sci.net/17/305/2017/ T. Simon et al.: Spatio-temporal modelling of lightning climatologies 307

12.6° E 13° E 13.4° E 13.8° E 14.2° E 14.6° E 15° E 12.6° E 13° E 13.4° E 13.8° E 14.2° E 14.6° E 15° E 47.2° N 47.2° N 40 km

●A 47° N 47° N High ●E 46.8° N 46.8° N

46.6° N 46.6° N ●C ●D ●B

46.4° N 46.4° N 40 km

0 % 2 % 4 % 6 % 8 % 10 % 0 500 1000 1500 2000 2500 3000 3500 Figure 3. Empirical climatological probability of lightning for a Figure 1. Altitude of Carinthia (m a.m.s.l.) averaged over day in July in Carinthia on the 1 km × 1 km scale computed from 1 km × 1 km cells. Attributes of sample locations are listed in Ta- counting the days with lightning over all July days in the 6-year ble1. The location map shows the location of Carinthia (black) in period and dividing by the number of all July days. Austria (light grey).

1e+05 (a) 350 (b) currence of lightning Pr(Y > 0) within a cell and a day; sec- 300 8e+04 ond, a truncated count model to assess the expected number 250 [ | 6e+04 200 of flashes within a cell given lightning activity E Y Y > 0].

4e+04 150 This procedure refers to a hurdle model (Mullahy, 1986; 100 Zeileis et al., 2008) which has the further benefit to be able to 2e+04 50 handle the large amount of zero-valued data points (97.85 %, 0e+00 0 1 3 5 7 9 10 15 20 25 30 35 in our case). The occurrence model and the intensity model No. of flashes (d−1 km−2) No . of flashes (d−1 km−2) are specified in Sects. 3.2 and 3.3. In Sect. 3.4 a short overview over the applied verification Figure 2. Daily frequency of 1 km × 1 km grid cells with counts of techniques is given, i.e. cross-validation, scoring rules and flashes (excluding zeros). The right panel (b) zooms into the tail of bootstrapping. the distribution. The percentage of boxes with no flashes detected is 97.85 %. 3.1 Generalized additive model lightning is divided by the total number of days for every The main motivation for using a GAM is the possibility to es- single grid cell. While some patterns emerge, a large amount timate (potentially) non-linear relationships between the re- of noise is visibly superimposed. sponse and the covariates. In the following, the basic concept of GAMs is introduced for an arbitrary parameter θ of some probability density function d(·; θ) (PDF). A GAM aiming 3 Methods at modelling a spatio-temporal climatology over complex ter- rain would have the form, This section introduces the statistical models for estimating the climatologies for lightning occurrence and lightning in- g(θ) = β0 + f1(logalt) + f2(doy) + f3(lon,lat), (1) tensity. The aim of the statistical model is to explain the re- sponse, i.e. the probability of occurrence or counts of flashes, where g(·) is a link function that maps the scale of the pa- by appropriate spatio-temporal covariates, i.e. logarithm of rameter θ to the real line. The right-hand side is called the the altitude (logalt), day of the year (doy) and geographical additive predictor, where β0 is the intercept term and fj are location (lon,lat). Since the response might non-linearly de- unspecified (potentially) non-linear smooth functions that are pend on the covariates, we choose generalized additive mod- modelled using regression splines (Wood, 2006; Fahrmeir els (GAMs) as a statistical framework, for which a brief in- et al., 2013). For each fj a design matrix Xj containing troduction is presented in Sect. 3.1. spline basis functions is constructed. Thus the GAM can be It is assumed that the number of flashes detected within a written as generalized linear model (GLM), cell and day are generated by a random process Y . Realiza- ∈ { } tions of the random process are denoted by yi 0, 1, 2, . . . , 3 where i = 1, 2, . . . , n indicates the observation. Two distinct X g(θ) = β0 + Xj βj . (2) models are set up: first, a model for the probability of the oc- j=1 www.nat-hazards-earth-syst-sci.net/17/305/2017/ Nat. Hazards Earth Syst. Sci., 17, 305–314, 2017 308 T. Simon et al.: Spatio-temporal modelling of lightning climatologies

> > > The coefficients β = (β0, β1 , β2 , β3 ) are estimated by max- that the positive counts of flashes within a spatial cell and day imizing the penalized log-likelihood, follow a zero-truncated Poisson distribution with the PDF, d (y,µ) n 3 !!! ; = Pois X −1 X dZTP(y µ) , (5) l(β) = log d yi;g β0 + Xj βj 1 − dPois(0,µ) i=1 j=1 where y ∈ {1, 2, . . . }, and dPois(·;µ) is the PDF of the Poisson 3 1 X > distribution with expectation µ. The conditional expectation − λj β Sj β , (3) 2 j j is E[Y |Y > 0] = µ/(1 − e−µ). The GAM for this component j=1 has the form of Eq. (1) with µ replacing θ. The logarithm where the first term on the right-hand side is the unpenalized serves as link function g(µ) = log(µ). log-likelihood. The second term is added to prevent overfit- The family for modelling the zero-truncated Poisson dis- ting by penalizing too-abrupt jumps of the functional forms. tribution ztpoisson() within a GLM or GAM frame- work is implemented in the R-package countreg (Zeileis and λj are the smoothing parameters corresponding to the func- Kleiber, 2016). For more information on and a formal defi- tions fj . For λj = 0 the log-likelihood is unpenalized with nition of hurdle models the reader is referred to Zeileis et al. respect to fj . When λj → ∞ the fitting procedure will select (2008). a linear effect for fj . The selection of the smoothing param- eters λ is performed by cross-validation (Sect. 3.4). j 3.4 Verification The value of λj determines the degrees of freedom of the associated effect. Lower values of λj , e.g. small penaliza- In this section the verification procedures are briefly intro- tion, lead to an effect with more degrees of freedom, which duced, namely the cross-validation, the applied scores and might explain more features but is also prone to overfitting. the block-bootstrapping. High values of λj , e.g. strong penalization, result in an effect In order to ensure the verification of the model along in- with fewer degrees of freedom. Thus fewer features can be dependent data, we applied a 6-fold cross-validation (Hastie explained. The balance between small and strong penaliza- et al., 2009). Six years of data are available. The parameters tion and thus the corresponding degrees of freedom is found of the model are estimated based on 5 years of the data and by performing cross-validation (Sect. 3.4). validated on the remaining year. This is done 6 times with Sj in Eq. (3) are prespecified penalty matrices, which de- every single year serving as a validation period once. pend on the choice of spline basis for the single terms. The The log-likelihood is applied as scoring function, which reader is referred to Wood(2006) for more details. is also called logarithmic score in the literature on proper Estimation of a GAM for such a large data set, i.e. scoring rules (Gneiting and Raftery, 2007). bam 7 309 152 data points, is feasible, e.g. via function () To assess confidence intervals of the estimated parameters (for big additive models) implemented in the mgcv package and effects, day-wise block-bootstrapping was performed. (Wood et al., 2015; Wood, 2016) of the statistical software R By day-wise block-bootstrapping we mean the following: we (R Core Team, 2016). resample the 738 dates of all available days with repetition and pick all the data observed on these days spatially. This 3.2 Occurrence model procedure is executed 1000 times in order to assess confi- The first component models the probability of lightning dence intervals. Pr(Y > 0) = π occurring within a 1 km × 1 km cell and a day. The Bernoulli distribution with the parameter π of the prob- 4 Results ability density function (PDF), This section presenting the results of the statistical models is y 1−y dBe(y;π) = π (1 − π) , (4) structured as follows: first, the non-linear effects of the occur- rence model are described in Sect. 4.1; second, the effects of will be fitted. Since the data are binomial, y ∈ {0, 1} indicates the intensity model are presented in Sect. 4.2. Finally, exem- no lightning and lightning within the cells. The model for π plary applications illustrate in Sect. 4.3 how climatological takes the form of Eq. (1) with π replacing θ. The comple- information can be drawn from the models. mentary log-log g(π) = log(−log(1 − π)) is implemented as link function. 4.1 Occurrence model

3.3 Intensity model The estimates of the effect of the occurrence model (Sect. 3.2) are depicted in Fig.4. The values are on the scale The second part is the truncated count component for the ex- of the additive predictor, i.e. the right-hand side of Eq. (1). pected number of flashes given lightning activity. We will The additive predictor takes the value of the intercept term β0 refer to this component as the intensity model. It is assumed if the sum of all other effects is equal to zero. Its estimate is

Nat. Hazards Earth Syst. Sci., 17, 305–314, 2017 www.nat-hazards-earth-syst-sci.net/17/305/2017/ T. Simon et al.: Spatio-temporal modelling of lightning climatologies 309

Table 1. Coordinates of the sample locations in Fig.1.

Id Name Lon Lat Alt (◦ E) (◦ N) (m a.m.s.l.) A Heiligenblut 12.84 47.04 1315 B Nassfeld 13.28 46.56 1525 C Dobratsch 13.67 46.60 2166 D Klagenfurt 14.31 46.62 447 E Rosennock 13.71 46.88 2440

suggests that a linear term for the altitude might be suffi- cient. For altitudes above 2000 m, however, the non-linear term leads to larger values than a linear term β1alt would do. The temporal or seasonal effect f2(doy), i.e. the depen- dence of the target on the day of the year (doy), shows a steep increase during May, reaches its maximum in mid-July and decreases slowly during August (Fig.4b). This result in- dicates that the main lightning season in Carinthia lasts from mid-June until end of August. The estimated degrees of free- dom are 2.5, which leads to the simple shape of the seasonal effect with one clear maximum. Figure 4. The effects of the occurrence model on the scale of the ad- The spatial effect f3(lon,lat), which explains the spatial ditive predictor. (a) The altitude (logalt) effect. Ticks on the x axis variations of the linear predictor that cannot be explained by are set in 100 m intervals. The grey lines show 1000 estimates from the altitude term f1(logalt), requires 138 degrees of freedom. day-wise block-bootstrapping. The solid red line is the median of (Fig.4c). Most prominent features are the minimum near the the 1000 estimates, the dashed red lines are the 95 % confidence in- north-west and the maximum in the middle to eastern part of tervals. (b) The seasonal (doy) effect. (c) The spatial (lon,lat) effect. Carinthia. In the north-west the highest mountains, the High The plot shows the median of 1000 estimates from day-wise block- Tauern, of Carinthia are located. The minimum with values bootstrapping. The difference between two contour lines is 0.1. less than −0.3 on the complementary log-log scale suggests Dashed contour lines indicate negative values. that lightning activity is less pronounced in this region. This finding is in line with former analyses of the lightning ac- tivity in Austria (Troger, 1998; Schulz et al., 2005), which β0 = −3.97 (−4.15, −3.80) on the complementary log-log stated that the main alpine crest is an area with a minimum in scale, which is 1.87 % (1.56, 2.21 %) in terms of the proba- flash density. The maximum zone with values exceeding 0.3 bility of lightning. The numbers in parentheses are the 95 % in the middle to eastern part of Carinthia covers the so-called confidence intervals computed from 1000 day-wise block- Gurktal Alps. In comparison with the , the - bootstrapping estimates. tal Alps have a lower average elevation and the mountains How the effects in Fig.4 can be interpreted to obtain the are not as steep. Such maxima at moderate or low altitude probability of lightning at a particular location and day is are mostly modelled by the spatial effect, not by the altitude shown for the location E (Rosennock in Fig.1) and 20 July. effect. The location is at an altitude of 2440 m (Table1), for which As the altitude is a function of longitude and latitude, the altitude effect is roughly 0.34 (Fig.4a). The contribution one could ask whether it would be sufficient to only take of the seasonal effect (Fig.4b) for 20 July is about 0.64. The spatial effect into account that implicitly contains the alti- spatial effect (Fig.4c) has a value of −0.03 at this geograph- tude and skip the explicit altitude effect. In general the pre- ◦ ◦ ical location (13.71 E, 46.88 N). Adding these values to the sented method would be capable of modelling the influence intercept β0 = −3.97 yields −3.02, which is on the scale of of the altitude within the spatial effect implicitly. However, the additive predictor. It needs to be transferred with the in- the shape of the altitude in the region of interest is very com- verse of the complementary log-log function to obtain the plex. Thus, a spatial effect with a large degree of freedom value in probability space, which is 4.76 %. would be required in order to account for the complex alti- The second term f1(logalt) of the additive predictor mod- tude shape. As we know the shape of the altitude we can pass els the effect of the logarithm of the altitude. f1(logalt) varies it to the GAM as an isolated effect. The altitude effect con- from roughly −0.2 for low altitudes to values greater than 0.5 tains only information associated with the altitude while the for altitudes above 2800 m (Fig.4a). This function takes 7.9 remaining effects are captured by the spatial term. degrees of freedom. Its shape is close to exponential, which www.nat-hazards-earth-syst-sci.net/17/305/2017/ Nat. Hazards Earth Syst. Sci., 17, 305–314, 2017 310 T. Simon et al.: Spatio-temporal modelling of lightning climatologies

12.6° E 13° E 13.4° E 13.8° E 14.2° E 14.6° E 15° E 47.2° N

47° N

46.8° N

46.6° N

46.4° N

40 km

2 % 3 % 4 % 5 % 6 %

Figure 6. Climatological probability (expected values) of lightning in Carinthia on the 1 km × 1 km scale for 20 July computed with a generalized additive model (GAM).

ing effect of the occurrence model. However, there are some features in common for both effects. For instance, the promi- nent maximum visible in Fig.5c in the Gurktal Alps also appears in the spatial effect of the count model (Fig.4c). The Figure 5. The effects of the intensity model on the scale of the ad- strong minimum in the western part of the domain is also ditive predictor. Labelling is analogue to Fig.4. (a) The altitude (lo- common. The most pronounced new feature is the strong galt) effect. (b) The seasonal (doy) effect. (c) The spatial (lon,lat) local maximum with values exceeding 0.9 in the south of effect. Carinthia. A 165 m radio tower is installed on the peak of the Dobratsch mountain (location C in Fig.1), which trig- gers lightning strokes under suitable conditions, i.e. occur- The introduced model (Eq. 1) could also be extended rence of a thunderstorm. Other maxima of this effect could by potentially non-linear functions of other covariates that also be attributed to sites of radio towers, which suggests that are meaningful for a climatological assessment, e.g. surface the number of flashes is more sensitive to local constructions roughness, slope and aspect of topography. However, in the than the probability of lightning. present case, adding these covariates was not improving the model. 4.3 Applications 4.2 Intensity model In order to illustrate how climatological information can The non-linear effects of the intensity model (Sect. 3.3) are be drawn from the GAMs, two different kinds of applica- depicted in Fig.5. The estimate of the intercept term takes tions are presented. First, maps show spatial climatologies the value β0 = −0.01 (−0.19, 0.14) which leads to a expected (Figs.6 and7). Here, the occurrence model and/or the in- number of flashes given lightning activity of 1.57 (1.47, 1.68) tensity model are evaluated for one specific day. Second, the when the sum of all other effects is equal to zero. seasonal climatology for selected 1 km × 1 km grid cells are The altitude effect f1(logalt) (Fig.5a), with 5.4 degrees discussed (Fig.8). Here, the models are evaluated with re- of freedom, reveals a similar functional form as the altitude spect to the geographical location of the point of interest and effect of the occurrence model (Fig.4a). However, it has a its altitude. flatter shape for the terrain between 600 and 1200 m and a The spatial distribution of climatological probabilities of steeper increase for high altitudes above 2000 m a.m.s.l. lightning occurring in a cell for 20 July (close to the seasonal The seasonal effect f2(doy) is −0.5 in early May, reaches peak) varies from 1.8 to 6.5 % (Fig.6). In the western part of a maximum of 0.3 in early July and decreases to values the domain, local valleys and mountain ridges become visi- around −0.3 until the end of August (Fig.5b). Thus the am- ble through the altitude effect (Fig.4a). However, the high- plitude of this effect is not as strong as the seasonal effect of est probabilities do not occur over the highest terrain in the the occurrence model (Fig.4b) and the location of the maxi- north-west, where the spatial effect counteracts the altitude mum is earlier. It has 2.1 degrees of freedom. effect, leading to moderate probabilities around 2 to 3 %. The The spatial effect f3(lon,lat) varies strongly and requires spatial effect (Fig.4c) is responsible for the maximum over 166 degrees of freedom which is more than the correspond- the moderate altitude region of the Gurktal Alps. Such a map

Nat. Hazards Earth Syst. Sci., 17, 305–314, 2017 www.nat-hazards-earth-syst-sci.net/17/305/2017/ T. Simon et al.: Spatio-temporal modelling of lightning climatologies 311

12.6° E 13° E 13.4° E 13.8° E 14.2° E 14.6° E 15° E Table 2. Relative frequencies (%) of number of flashes (columns) 47.2° N for 20 July of the sample locations (rows) in Fig.1 derived from the occurrence and intensity GAM. 47° N 0 1 2 3 > 4 46.8° N A 98.16 1.12 0.52 0.16 0.04 B 95.37 1.74 1.49 0.86 0.54 46.6° N C 95.54 0.76 1.10 1.06 1.54 D 96.94 1.81 0.88 0.28 0.09 46.4° N E 95.24 2.24 1.52 0.69 0.31 40 km

0.04 0.07 0.1 0.13 0.16 5 % 0.12 A (a) 2.1 (b) 4 % B 0.10 7 Figure 7. Climatological number of flashes (expected values) in C 7.6 D 0.08 3.4 Carinthia on the 1 km × 1 km scale for 20 July. 3 % E 5.5 0.06 2 %  0.04 1 % 0.02 0.00 can also serve as thunderstorm climatology when lightning May Jun Jul Aug May Jun Jul Aug is taken as a proxy for thunderstorms. A comparison of Figs.6 and3 illustrates some of the ben- Figure 8. Seasonal climatologies for sample locations, which are efits of using GAMs instead of taking averages in each grid highlighted in Fig.1. (a) Occurrence model. (b) Expected number cell for computing expected values of lightning occurrence. of flashes. The legend shows expected number of flashes accumu- lated over the lightning season. Harnessing the information from the complete data set in- stead of using only information contained in each grid cell removes the noise and makes the overall pattern visible, e.g. eastern part of Carinthia shows a moderate chance of light- the difference between lightning over valleys and ridges. ning with values around 3 % during the peak of the season. For the same day, 20 July, the expected number of flashes The climatologies of the expected number of flashes are is depicted in Fig.7. This is the product of probabilities of depicted in Fig.8b. The order of location has changed. In lightning π from the occurrence model and the expected particular, the highest number of flashes are expected for lo- number of flashes given lightning activity, which is derived cation C which is the Dobratsch mountain. This is caused from the intensity model. Values range from 0.028 to 0.166. by the strong local maximum in the spatial effect of the in- The lowest values can be found in the north-western part of tensity model (Fig.5c). The legend shows expected num- Carinthia where the spatial effects of both models reveal a ber of flashes accumulated over the lightning season, which minimum. Next to the maximum in the Gurktal Alps, where leads to values between 2.1 for location A and 7.6 for loca- maxima in the spatial effects of both models can also be tion C. These values are in good agreement with the analysis found, a second peak appears at the Dobratsch mountain (lo- by Schulz et al.(2005, Fig. 5 therein). cation C in Fig.1), which is due to the local maximum in the Finally, it is also possible to derive relative frequencies of spatial effect of the intensity model (Fig.5c). the number of flashes of a specific location and day of the Next to the spatial information one can extract seasonal year from the GAM. The relative frequencies have been de- climatologies for different locations (Fig.8). These are com- rived for the five sample locations (Table2). The first column puted exemplary for five sites (Table1). Figure8a shows the of the table with the probabilities of no flashes occurring only climatologies of lightning probability. Differences between contains information from the occurrence model (cf. Fig.8a). the annual cycles of the probabilities are due to the altitude All other probabilities for one or more flashes are derived effect and the spatial effect of the occurrence model (Fig.4). from both the occurrence and the intensity model. The in- The highest probabilities between 4 and 5 % are modelled fluence of the intensity model is especially dominant in the in July for location B (dashed line), which is located at the relative frequencies for location C, where the probability of south-western border of Carinthia in the vicinity of a local having four or more flashes on 20 July is 1.54 %. maximum of the spatial effect (Fig.4c). This climatology ex- hibits a strong seasonality, as probabilities fall below the 1 % level. Though located at a similar altitude, the climatology of 5 Discussion location A (solid line) reveals maximum values less than 2 %. This difference is due to the spatial effect, which exhibits a This section addresses two helpful points for end users. The clear minimum in north-western Carinthia. The climatology first one is on how to choose the cross-validation score in of location D (dashed dotted line) in the lower plains in the order to avoid overfitting of the seasonal effect (technically www.nat-hazards-earth-syst-sci.net/17/305/2017/ Nat. Hazards Earth Syst. Sci., 17, 305–314, 2017 312 T. Simon et al.: Spatio-temporal modelling of lightning climatologies

16 % ● ● 16 % ● ● speaking the selection of its smoothing parameter λ). The (a) (b)

● ● second point is a discussion on how the introduced model ● ● 12 % ● ●● ● 12 % ● ●● ● ● ● ● ● ● ● (Eq.1) can be extended towards a weather prediction tool, ● ● ● ●● ● ●● ● ● ● ● 8 % ● 8 % ● i.e. for warning purposes. ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● To illustrate the first point a subset of the large 4 % ● ● ● ● 4 % ● ● ● ● ● ● ● ●● ●● ● ● ● ●● ●● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● data set is selected. We pick all data points in a ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● 0 % ●●●●●●●● ●●●●●●●●●●●● ●● ●●●●●●●●●●●●●● ● ● ●● ● ●● ●●●●●●● ●●●● 0 % ●●●●●●●● ●●●●●●●●●●●● ●● ●●●●●●●●●●●●●● ● ● ●● ● ●● ●●●●●●● ●●●● 5 × 5 neighbourhood around the location E. Thus, only May Jun Jul Aug May Jun Jul Aug 6 years × 123 days × 25 cells = 18 450 data points remain. The probability of lightning π is the target variable. Further- Figure 9. Local fits for the location E. Circles show empirical esti- more, altitude and spatial effects are omitted for simplicity mates. For comparison the estimate of the full occurrence model is for such a small region (cf. the smooth spatial effect of the added (dashed line). (a) Solid line is the GAM evaluated by cross- occurrence model in Fig.4c). Thus the GAM has the form, validation with day-wise blocks. (b) Solid line is the GAM evalu- ated by cross-validation without day-wise blocks. g(π) = β0 + f (doy). (6) The model is fitted twice: first, with the selection of non-linear effects and interactions of these predictors can be the smoothing parameter λ by 6-fold cross-validation modelled. Another major benefit of this procedure is that the where the observations made on a single day are kept climatology is nested within the additive predictor. Thus the together in a block, e.g. the cross-validation splits the performance of the prediction tool would be at least on the 6 years × 123 days = 738 days into six parts; second, λ is quality level of the climatology, but would not fall below. determined by 6-fold cross-validation without the day-wise blocks, e.g. the cross-validation splits the 18 450 data points randomly into six parts. In both cases the maximal number 6 Conclusions of degrees of freedom is set to 30. Figure9 shows the estimates of the two models. The This study presented how generalized linear models (GAMs) estimate resulting from the cross-validation with day-wise (e.g. Hastie and Tibshirani, 1990; Wood, 2006) provide a blocks is much smoother (1.9 degrees of freedom) than the useful tool for building a lightning climatology or a clima- estimate resulting from the cross-validation without daily tology for the occurrence of thunderstorms. The main con- blocks (29.9 degrees of freedom). Thus the latter estimate cept is to decompose the signal into different effects: an al- takes roughly the maximal degree of freedom and is obvi- titude effect, a seasonal effect and a spatial effect. The most ously overfitted. beneficial aspect of this method is that smooth estimates for The reason for the distinct estimates lies in the dependence these effects are obtained on such a fine spatial and tempo- structure of the data. For one cell the probability of detecting ral scale as 1 km2 and 1 day. This makes the resulting cli- lightning on 1 day given that lightning was detected on the matology a valuable tool for quantitative purposes, e.g. risk previous day is 6.7 %. Spatial dependence is much stronger. assessment or benchmarking in weather prediction. In order Provided that lightning occurs in one cell, the probability of to provide smooth effects, the method harnesses information lightning occurring in the adjacent cell is 41 %. This strong from the complete data set and not just separately in each spatial dependence comes with a physical meaning. First, the cell as would be the case by simply averaging the data. Even preconditions for thunderstorms and lightning to take place more effects than demonstrated in this paper can be included, vary much stronger from day to day than in the course of e.g. slope and aspect of topography or parameters associated a single day. Second, thunderstorm systems, i.e. multi-cell with land use. The choice of common effects allows expert thunderstorms or super-cell thunderstorms, cover a large area knowledge to be included. Additionally and importantly, ap- or even travel over a larger area (Markowski and Richardson, plying GAMs will also show which of these proposed effects 2011). is significant. For this reason we recommend exploring the dependence A hurdle approach was employed to compute a climatol- structure of the data first and defining the cross-validation ogy of the intensity of lightning in order to properly handle score according to this dependence structure. the large number of zeros in the data. Thus, two aspects of Finally, we discuss how the introduced model (Eq.1) can lightning are captured by the models: the probability of light- be extended in order to serve as a weather prediction tool. It ning occurring and the number of flashes detected within a is possible to add further predictors from a numerical weather grid cell. The effects of the two models are similar but not prediction system to the right-hand side of Eq. (1). In the case equal. In particular, the spatial effect of the intensity model of lightning and thunderstorms suitable predictors could be varies more strongly than the corresponding effect of the oc- convective inhibition energy (CIN), convective available po- currence model. For instance, local intensity maxima are trig- tential energy (CAPE), vertical shear of horizontal winds or gered in the vicinity of radio towers. large scale circulation patterns (e.g. Bertram and Mayr, 2004; In sum, the occurrence model and the count model took Chaudhuri and Middey, 2012). Within the GAM framework roughly 150 and 180 degrees of freedom, respectively. This

Nat. Hazards Earth Syst. Sci., 17, 305–314, 2017 www.nat-hazards-earth-syst-sci.net/17/305/2017/ T. Simon et al.: Spatio-temporal modelling of lightning climatologies 313 is a relatively small number compared to the degrees of free- Fahrmeir, L., Kneib, T., Lang, S., and Marx, B.: Regression: dom required by other methods. Counting and averaging Models, methods and applications, Springer, Berlin, Heidelberg, flashes with respect to a resolution of km−2 yr−1 would lead doi:10.1007/978-3-642-34333-9, 2013. to 9904 degrees of freedom in the introduced case without Feudale, L., Manzato, A., and Micheletti, S.: A cloud-to-ground capturing the seasonal cycle. Thus the GAM approach leads lightning climatology for north-eastern Italy, Adv. Sci. Res., 10, to a smooth, non-linear and sparse quantification of the cli- 77–84, doi:10.5194/asr-10-77-2013, 2013. Gladich, I., Gallai, I., Giaiotti, D., and Stel, F.: On the diurnal cycle matologies. of deep moist convection in the southern side of the Alps anal- ysed through cloud-to-ground lightning activity, Atmos. Res., 100, 371–376, doi:10.1016/j.atmosres.2010.08.026, 2011. 7 Data availability Gneiting, T. and Raftery, A. E.: Strictly proper scoring rules, pre- diction, and estimation, J. Am. Stat. Assoc., 102, 359–378, ALDIS data are available on request from ALDIS doi:10.1198/016214506000001437, 2007. ([email protected]) – fees may be charged. Hastie, T. and Tibshirani, R.: Generalized additive models, vol. 43, CRC Press, Boca Raton, 1990. Hastie, T., Tibshirani, R., and Friedman, J.: The elements of statis- tical learning: data mining, inference and prediction, in: Springer Author contributions. Thorsten Simon, Georg J. Mayr and Series in Statistics, 2nd Edn., Springer, Berlin, doi:10.1007/978- Achim Zeileis defined the scientific scope of this study. Thorsten Si- 0-387-84858-7, 2009. mon performed the statistical modelling, evaluation of the results Kärnten, L.: Digital terrain model (DTM) Carinthia, CC-BY- and wrote the paper. Georg J. Mayr provided support on the meteo- 3.0: Land Kärnten – data.ktn.gv.at, ALS airborne mea- rological analysis. Nikolaus Umlauf and Achim Zeileis contributed surements, Land Kärnten, http://www.data.gv.at/katalog/dataset/ to the development of the statistical methods. Wolfgang Schulz 25ab688b-5b8b-4363-a1a8-17662dc05db8 (last access: 16 Jan- and Gerhard Diendorfer were in charge of data quality and gave uary 2016), 2015. advice on the lightning-related references. All authors discussed Markowski, P. and Richardson, Y.: Mesoscale meteorology in mid- the results and commented on the paper. latitudes, vol. 2, John Wiley & Sons, 2011. Mona, T., Horváth, Á., and Ács, F.: A thunderstorm cell-lightning activity analysis: The new concept of air mass catchment, Atmos. Competing interests. The authors declare that they have no conflict Res., 169, 340–344, doi:10.1016/j.atmosres.2015.10.017, 2016. of interest. Mullahy, J.: Specification and testing of some modified count data models, J. Econometr., 33, 341–365, doi:10.1016/0304- 4076(86)90002-3, 1986. Acknowledgements. We acknowledge the funding of this work Poelman, D. R.: A 10-year study on the characteristics of thun- by the Austrian Research Promotion Agency (FFG) project derstorms in Belgium based on cloud-to-ground lightning data, LightningPredict (Grant No. 846620). The computational results Mon. Weather Rev., 142, 4839–4849, doi:10.1175/MWR-D-14- presented have been achieved using the HPC infrastructure LEO of 00202.1, 2014. the University of Innsbruck. Furthermore we are deeply grateful to Pohjola, H. and Mäkelä, A.: The comparison of GLD360 and EU- the editor and reviewers for their valuable comments. CLID lightning location systems in Europe, Atmos. Res., 123, 117–128, doi:10.1016/j.atmosres.2012.10.019, 2013. Edited by: A. Günther Rakov, V. A.: Fundamentals of Lightning, Cambridge University Reviewed by: two anonymous referees Press, Cambridge, doi:10.1017/CBO9781139680370, 2016. R Core Team: R: A language and environment for statistical com- puting, R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/, last access: 16 January 2016. Rust, H. W., Vrac, M., Sultan, B., and Lengaigne, M.: Map- References ping weather-type influence on senegal precipitation based on a spatial-temporal statistical model, J. Climate, 26, 8189–8209, Bertram, I. and Mayr, G. J.: Lightning in the Eastern Alps 1993– doi:10.1175/JCLI-D-12-00302.1, 2013. 1999, Part I: Thunderstorm tracks, Nat. Hazards Earth Syst. Sci., Schulz, W., Cummins, K., Diendorfer, G., and Dorninger, M.: 4, 501–511, doi:10.5194/nhess-4-501-2004, 2004. Cloud-to-ground lightning in Austria: A 10-year study using data Chaudhuri, S. and Middey, A.: A composite stability index for di- from a lightning location system, J. Geophys. Res.-Atmos., 110, chotomous forecast of thunderstorms, Theor. Appl. Climatol., D09101, doi:10.1029/2004JD005332, 2005. 110, 457–469, doi:10.1007/s00704-012-0640-z, 2012. Schulz, W., Diendorfer, G., Pedeboy, S., and Poelman, D.: The Chavez-Demoulin, V. and Davison, A. C.: Generalized additive European lightning location system EUCLID – Part 1: Perfor- modelling of sample extremes, J. Roy. Stat. Soc. Ser. C, 54, 207– mance validation, Nat. Hazards Earth Syst. Sci., 3, 5325–5355, 222, doi:10.1111/j.1467-9876.2005.00479.x, 2005. doi:10.5194/nhess-16-595-2016, 2016. Curran, E. B., Holle, R. L., and López, R. E.: Lightning Stauffer, R., Messner, J. W., Mayr, G. J., Umlauf, N., and Zeileis, casualties and damages in the United States from 1959 A.: Spatio-temporal precipitation climatology over complex ter- to 1994, J. Climate, 13, 3448–3464, doi:10.1175/1520- 0442(2000)013<3448:LCADIT>2.0.CO;2, 2000. www.nat-hazards-earth-syst-sci.net/17/305/2017/ Nat. Hazards Earth Syst. Sci., 17, 305–314, 2017 314 T. Simon et al.: Spatio-temporal modelling of lightning climatologies

rain using a censored additive regression model, Int. J. Climatol., Wood, S. N., Goude, Y., and Shaw, S.: Generalized additive mod- doi:10.1002/joc.4913, in press, 2016. els for large data sets, J. Roy. Stat. Soc. Ser. C, 64, 139–155, Troger, W.: Die Einbeziehung des Österreichischen Blitzor- doi:10.1111/rssc.12068, 2015. tungssystems ALDIS in die Meteorologische Analyse von Ge- Yang, C., Xu, J., and Li, Y.: Bayesian geoadditive modelling of wittern, MS thesis, Institut für Meteorologie und Geophysik der climate extremes with nonparametric spatially varying temporal Universität Innsbruck, Innsbruck, 1998. effects, Int. J. Climatol., 36, 3975–3987, doi:10.1002/joc.4607, Wapler, K.: High-resolution climatology of lightning characteristics 2016. within central Europe, Meteorol. Atmos. Phys., 122, 175–184, Zeileis, A. and Kleiber, C.: countreg: Count data regression, doi:10.1007/s00703-013-0285-1, 2013. R package version 0.2-0, http://R-Forge.R-project.org/projects/ Wood, S. N.: Generalized additive models: An introduction with R, countreg, last access: 16 January 2016. CRC Press, Boca Raton, 2006. Zeileis, A., Kleiber, C., and Jackman, S.: Regression mod- Wood, S. N.: mgcv: GAMs with GCV/AIC/REML smoothness els for count data in R, J. Stat. Softw., 27, 1–25, estimation and GAMMs by PQL, R package version 1.8-16, doi:10.18637/jss.v027.i08, 2008. http://CRAN.R-project.org/package=mgcv, last access: 16 Jan- uary 2016.

Nat. Hazards Earth Syst. Sci., 17, 305–314, 2017 www.nat-hazards-earth-syst-sci.net/17/305/2017/