Application of a principal component analysis based on routinely measured data for the assessment of air quality

Eberhard Schaller, Sergiy Vorogushyn
Chair for Environmental Meteorology, Brandenburg Technical University, Cottbus

Objectives

A set of European Union framework and daughter directives defines the basic principles of a strategy to assess and improve air quality in Europe. This strategy includes a repeated documentation of the air quality and, for the first time, the use of models for the estimation of pollution levels. To date, measures have been established for sulfur dioxide, nitrogen dioxide and total reactive nitrogen oxides, particulate matter and lead in the 1st daughter directive, for benzene and carbon monoxide in the 2nd daughter directive, and for ozone in the 3rd daughter directive. These measures include short-term and long-term averages, limit exceedances and model quality objectives.

The aim of this contribution to the joint project VALIUM is to use principal component analysis techniques for the classification of high-concentration episodes and for the construction of maps of the spatial distribution of near-surface concentrations based on routine measurements. Ozone, nitrogen dioxide and reactive nitrogen oxides, carbon monoxide and particulate matter (PM10) are currently considered. The investigations are primarily carried out for the German federal state . In the future this technique will also be applied to simulated fields from the model system M-SYS developed in VALIUM subproject 2 (TP2, Schlünzen).

Methodology Example: PM10 – Lower Saxony – Year 2000

Principal Component Analysis

INPUT: time series (concentrations, meteorological parameters)
Time averaging interval (according to EU directive)
Covariance matrix of phase-space centered time series from S stations (including missing values)
Eigenvalues and eigenvectors (in descending order): the eigenvectors (principal components) are an ortho-normal set of linearly independent vectors in the S-dimensional phase space
Time series of weights (S-dimensional phase space) for reconstruction/filtering of original measurements

Map Construction

Grid definition (lower left corner, domain size, grid spacing)
Weighting function for each grid point, determined from a weighted linear combination of all relevant eigenvectors
Exceedance limit (according to EU directive)
MAP(s): (time averaged) concentrations, limit exceedances

[Figure 1, left panel: data availability (0.0 to 1.0) per station, PM10, year 2000. Right panel: correlation coefficient (0.0 to 1.0) versus distance (0 to 320 km), 15 stations. Stations: BRAUNLAGE-Wurmberg, BRAUNSCHWEIG-BROITZEM, CLOPPENBURG, CUXHAVEN, DUDERSTADT, EMDEN, GOETTINGEN, HANNOVER-LINDEN, JORK, LINGEN, LUECHOW, LUENEBURG, NORDENHAM-CITY, NORDERNEY, OKER, OSNABRUECK, RINTELN, SALZGITTER, SOLLING, WALSRODE, WILHELMSHAVEN-VOSLAPP, WOLFSBURG.]

Figure 1: Data availability in 2000 for 22 stations in Lower Saxony; stations with more than 75 % missing data are excluded from the principal component analysis (left); correlation coefficient for simultaneous measurements at two stations as a function of the distance between these stations (right).

[Figure 2, two map panels: PM10, year 2000, Exp=2; abscissa 3350 to 3650 km, ordinate 5700 to 5950 km. Stations shown: NORDERNEY, NORDENHAM-CITY, JORK, EMDEN, LUECHOW, WALSRODE, LINGEN, WOLFSBURG, HANNOVER-LINDEN, LUENEBURG, OSNABRUECK, BRAUNSCHWEIG-BROITZEM, OKER, SOLLING, GOETTINGEN.]

Figure 2: Spatial distribution of the annual PM10 concentration (upper left) and number of exceedances of the future (effective from 01-Jan-2005) daily limit value of 50 µg/m³ (lower right); Gauss-Krüger coordinates are used on both axes, i.e. the abscissa shows the distance in km from 9° E (located at 3500 km) and the ordinate shows the distance in km from the equator.
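The map-construction steps (grid definition, a weighting function for each grid point, exceedance counting) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `make_grid`, `idw_weights` and `count_exceedances` are hypothetical. The inverse-distance form of the weights is an assumption, with the exponent 2 taken from the "Exp=2" annotation in the Figure 2 panels. The 10 km grid spacing in the usage note is likewise only illustrative.

```python
import numpy as np


def make_grid(x0, y0, width, height, spacing):
    """Grid points from lower left corner, domain size and grid spacing (km)."""
    xs = np.arange(x0, x0 + width + 0.5 * spacing, spacing)
    ys = np.arange(y0, y0 + height + 0.5 * spacing, spacing)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])  # shape (P, 2)


def idw_weights(grid_xy, station_xy, exponent=2.0):
    """Assumed weighting function: inverse distance to the given power.

    Returns a (P, S) matrix whose rows sum to 1, so that W @ v maps a
    station vector v (e.g. one eigenvector or an annual mean) to the grid.
    """
    diff = grid_xy[:, None, :] - station_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    dist = np.maximum(dist, 1e-9)  # avoid division by zero at a station
    w = dist ** (-exponent)
    return w / w.sum(axis=1, keepdims=True)


def count_exceedances(daily_means, limit=50.0):
    """Number of days above the limit per column; NaN days are not counted."""
    return np.sum(daily_means > limit, axis=0)
```

For example, `make_grid(3350.0, 5700.0, 300.0, 250.0, 10.0)` covers the Gauss-Krüger domain of Figure 2, and `idw_weights(grid, station_xy) @ eigenvector` spreads one station-space eigenvector over that grid. A grid point that coincides with a station reproduces that station's value.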

Theoretical Aspects

The basic input data matrix for the principal components technique consists of up to 96432 rows (time series of eleven years, 1990-2000, i.e. 4018 days with hourly observations) and up to 45 columns defined by the number of stations. As a first step the covariance and the correlation matrices are determined. Their non-zero off-diagonal elements result from the mutual dependence of the measurements at two stations. These internal correlations have to be removed. This is achieved by a rotation of the coordinate system in such a way that the eigenvectors form a new set of unit vectors. The eigenvalues are a measure of the contribution of each eigenvector to the total variance. Figure 3 shows the eigenvalues (symbols) and their cumulative contribution to the total variance (solid line) for two sets of stations. The set of 15 stations has also been used in the construction of the two maps shown in Figure 2.

[Figure 3 plot: PM10, Niedersachsen; data sets 2000 / 15 and 2000 / 22. Left axis: eigenvalue (0 to 1750); right axis: cumulative contribution to variance (0.30 to 1.00); abscissa: principal component no. (0 to 22).]

Figure 3: Eigenvalues (symbols) and cumulative contribution to the variance (solid lines) for two data sets (year 2000, hourly values): red lines/symbols: all 22 stations, blue lines/symbols: 15 stations with highest data availability (see Figure 1).
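The chain described here (station selection, covariance matrix, rotation onto the eigenvectors, eigenvalues as variance contributions) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' code. The function names are hypothetical. Missing values are assumed to be coded as NaN, and the covariance is assumed to be estimated from pairwise complete observations. With many gaps, such an estimate is not guaranteed to be positive semidefinite.

```python
import numpy as np


def select_stations(X, max_missing=0.75):
    """Column indices of stations with at most `max_missing` missing data."""
    missing_fraction = np.isnan(X).mean(axis=0)
    return np.where(missing_fraction <= max_missing)[0]


def pairwise_covariance(X):
    """Covariance of centered series (rows: times, columns: stations).

    Assumption: each station is centered with its own mean over all
    available values, and each matrix element uses only the time steps
    at which both stations report a value.
    """
    Xc = X - np.nanmean(X, axis=0)
    valid = ~np.isnan(Xc)
    Xz = np.where(valid, Xc, 0.0)
    counts = valid.astype(float).T @ valid.astype(float)
    return (Xz.T @ Xz) / np.maximum(counts - 1.0, 1.0)


def principal_components(C):
    """Eigenvalues (descending), eigenvectors (columns), cumulative share."""
    eigvals, eigvecs = np.linalg.eigh(C)  # C is symmetric
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    return eigvals, eigvecs, cumulative
```

For gap-free data the time series of weights follows as `(X - X.mean(axis=0)) @ eigvecs`. Its covariance matrix is diagonal with the eigenvalues on the diagonal, which is the sense in which the rotation removes the internal correlations. The row count quoted in the text corresponds to 4018 days times 24 hourly values, i.e. 96432.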