FEBRUARY 1998 KAUFMANN AND WEBER 89

Directional Correlation Coefficient for Channeled Flow and Application to Wind Data over Complex Terrain

PIRMIN KAUFMANN AND RUDOLF O. WEBER

Paul Scherrer Institute, Villigen, Switzerland

(Manuscript received 23 April 1996, in final form 26 May 1997)

ABSTRACT

Analysis of vector quantities or directional data, such as the variables characterizing flow, is of significant interest to geophysical fluid dynamicists. For flows with strong channeling, a new simple correlation coefficient is defined. It is demonstrated by application to a model of channeled flow that the new correlation captures the flow features in the case of channeling better than other correlations taken from the literature. The new correlation coefficient is applied to wind data from a mesoscale network of anemometers in complex terrain. A cluster analysis based on the correlation matrix is used to group observation sites into classes with similar behavior of the channeled flow. Sites within the same class are not necessarily geographically close. A similar behavior of the wind directions indicated by these classes seems to be more closely related to the orographic features and to the altitude of the sites than to the horizontal distance between them.

1. Introduction

Meteorologists and oceanographers have dealt with the problem of correlating vector quantities for at least 80 years (Dietzius 1916; Sverdrup 1917; Breckling 1989; Hanson et al. 1992), and statisticians also tackled the problem (Fisher 1993). A vector quantity requires both magnitude and direction for its unique characterization. When the vector is represented by its components in a coordinate system like the Cartesian system or spherical coordinates, correlation coefficients can be defined using these components. Many definitions of a single scalar value describing the correlation of vector quantities have appeared in the literature (see the reviews in Breckling 1989; Hanson et al. 1992; Crosby et al. 1993). When the magnitude of the vector is ignored and its direction alone is studied (which is equivalent to considering vectors of unit length), problems arise because the direction is a circular variable (Mardia 1972; Essenwanger 1986; Fisher 1993). Several definitions of correlation coefficients for circular variables have been published (a review is given in Hanson et al. 1992).

In the present paper, the highly channeled near-surface flow of the atmosphere in a region with many valleys is the focus of attention. The determination of a correlation coefficient between the wind measurements at different station locations allows us to compare the flow in the different valleys. The examination of these correlations enables one to see which valleys or areas experience the same forcing of the wind. A correlation coefficient for wind direction of such channeled flows should have some special properties. When winds in two different valleys are directed down-valley, the correlation between the winds in the valleys should be high. When the flow in one valley is in the down-valley direction and in the up-valley direction in another valley, the correlation between the winds of the two valleys should be negative. If no simultaneous channeling in the two valleys is observed, the correlation should become close to zero. A simple correlation coefficient satisfying these requirements is proposed here.

In section 2, several definitions of directional and vector correlation coefficients are reviewed. Section 3 introduces a new definition of correlation for channeled flows, which makes use of the specific properties of channeled flows. The different correlation coefficients discussed in sections 2 and 3 are compared in section 4 by application to an idealized situation of two wind vectors showing pronounced channeling. In section 5, the new correlation coefficient for channeled flows is applied to wind observations from a mesoscale field experiment, with the objective of identifying groups of measurement sites with similar behavior of wind directions.

2. Review of some directional and vector correlation coefficients

In the analysis of channeled wind, as discussed in section 5, we are only interested in wind direction but not in wind speed or the magnitude of the wind vector.

Corresponding author address: Dr. Rudolf O. Weber, Paul Scherrer Institute, CH-5232 Villigen PSI, Switzerland. E-mail: [email protected]

© 1998 American Meteorological Society

JOURNAL OF ATMOSPHERIC AND OCEANIC TECHNOLOGY VOLUME 15

Therefore, both directional and vector correlation coefficients (applied to vectors of unit length) are suitable in our case.

Among the variety of proposed directional and vector correlation coefficients (Hanson et al. 1992 list 17 definitions) we chose four different definitions (Sverdrup 1917; Fisher and Lee 1983; Breckling 1989; Crosby et al. 1993). Let $W_1 = (u_1, v_1)$ and $W_2 = (u_2, v_2)$ be two two-dimensional vectors representing the horizontal wind vector at two measurement sites. The covariance $\sigma(u_1, u_1)$ and the cross-covariance $\sigma(u_1, u_2)$ are defined in the standard way as

$$\sigma(x, y) = E[xy] - E[x]\,E[y], \tag{1}$$

where $E[x]$ is the expectation value of the random variable $x$. To simplify the equations later, the following covariance matrices are introduced, according to the notation of Crosby et al. (1993):

$$\Sigma_{11} = \begin{pmatrix} \sigma(u_1, u_1) & \sigma(u_1, v_1) \\ \sigma(v_1, u_1) & \sigma(v_1, v_1) \end{pmatrix}; \qquad \Sigma_{12} = \begin{pmatrix} \sigma(u_1, u_2) & \sigma(u_1, v_2) \\ \sigma(v_1, u_2) & \sigma(v_1, v_2) \end{pmatrix}, \text{ etc.} \tag{2}$$

In the same way the product matrices $\Pi_{11}$, etc., are defined as

$$\Pi_{11} = \begin{pmatrix} \pi(u_1, u_1) & \pi(u_1, v_1) \\ \pi(v_1, u_1) & \pi(v_1, v_1) \end{pmatrix}, \text{ etc.,} \tag{3}$$

where the uncentered product moments $\pi(x, y)$ of two random variables $x$ and $y$ are defined by

$$\pi(x, y) = E[xy]. \tag{4}$$

One of the oldest definitions of a vector correlation was given by Sverdrup (1917). He defined a correlation by

$$\rho_S = \left\{ \frac{[\mathrm{tr}(\Pi_{12})]^2 + [\pi(u_1, v_2) - \pi(v_1, u_2)]^2}{\mathrm{tr}(\Pi_{11})\,\mathrm{tr}(\Pi_{22})} \right\}^{1/2}, \tag{5}$$

where $\mathrm{tr}(A)$ denotes the trace of matrix $A$. Sverdrup (1917) stresses that the uncentered product moments $\pi(x, y)$ in (4) must be taken and not the centered covariances $\sigma(x, y)$ in (1).

Fisher and Lee (1983) developed a directional correlation coefficient ranging from −1 to 1. A representation of their correlation coefficient in terms of the matrices defined above is given in Fisher and Lee (1986) and Breckling (1989):

$$\rho_{FL} = \frac{\det(\Pi_{12})}{[\det(\Pi_{11})\,\det(\Pi_{22})]^{1/2}}, \tag{6}$$

where $\det(A)$ denotes the determinant of matrix $A$.

Breckling (1989) proposed the following correlation coefficient:

$$\rho_B = \frac{[\mathrm{tr}(\Sigma_{12}\Sigma_{21})]^{1/2}}{[\mathrm{tr}(\Sigma_{11})\,\mathrm{tr}(\Sigma_{22})]^{1/2}}. \tag{7}$$

This correlation coefficient takes values from 0 to 1. Hanson et al. (1992) define a variant of this vector correlation coefficient with a sign factor, $\det(\Sigma_{12})/|\det(\Sigma_{12})|$. If the products $\pi(x, y)$ in (4) in Sverdrup's definition (5) are replaced by the centered covariances $\sigma(x, y)$ in (1), the expression (7) of Breckling (1989) is recovered.

Crosby et al. (1993) discuss a vector correlation coefficient defined by

$$\rho_{CBG} = \left\{ \mathrm{tr}\left( [\Sigma_{11}]^{-1} \Sigma_{12} [\Sigma_{22}]^{-1} \Sigma_{21} \right) \right\}^{1/2}, \tag{8}$$

which is essentially the definition given by Hooper (1959) and which was further developed by Jupp and Mardia (1980). The squared correlation coefficient $\rho_{CBG}^2$ is the sum of squares of the canonical correlations (Crosby et al. 1993) and ranges from 0 to 2. We use in the following a normalized form,

$$\rho_C = \rho_{CBG} / \sqrt{2}, \tag{9}$$

which takes values from 0 to 1. Breaker et al. (1994) studied significance tests of this vector correlation. Hanson et al. (1992) summarize and discuss in detail the invariance properties of these and many other correlation coefficients.

3. Directional correlation coefficient for channeled flow

All of the definitions in the last section apply to any type of flow and do not make use of any specific properties of the flow. In contrast, the correlation for channeled flow as defined in this section incorporates the properties of channeled flow into the definition of the correlation coefficient. This specific correlation coefficient becomes better suited for channeled flows but is not applicable to other types of flow.

The near-surface winds over complex terrain are often channeled by valleys, even showing countercurrents to the geostrophic flow (Wippermann and Gross 1981; Wippermann 1984; Whiteman and Doran 1993). In smaller valleys, thermally induced flows often develop (for a review see Whiteman 1990). To compare wind-direction data from different valleys, it is desirable to have a correlation coefficient that indicates whether up- or down-valley flow prevails in both valleys. Figure 1 shows the wind rose of a station in the Rhein Valley east of Basel (station E1, see section 5 for more details). Two preferred directions (southeasterly and westerly winds) are evident. This is a good example of a distribution of directions in a flow that is usually referred to as channeled. We term one of them the main wind direction (southeasterly in this case) and the other the secondary wind direction (westerly in this case). It should be noted that these two dominant wind directions are, however, not 180° apart. For the definition of a correlation coefficient, each wind direction is assigned to the main or the secondary wind direction, or to the class of all other wind directions; thus, to one of three classes. All wind directions of the main class are assigned the value +1, all wind directions of the secondary class are assigned the value −1, and all other wind directions obtain the value 0. This assignment of values allows the calculation of a standard Pearson cross-correlation coefficient between wind directions at different stations (denoted by $i$ and $j$) by

$$\rho^{\mathrm{chf}}_{ij} = \frac{E[a_i a_j] - E[a_i]\,E[a_j]}{\left[ \left( E[a_i^2] - E[a_i]^2 \right) \left( E[a_j^2] - E[a_j]^2 \right) \right]^{1/2}}, \tag{10}$$

where $a_i$ denotes the values −1, 0, +1 assigned to the wind directions at station $i$.

FIG. 1. Wind rose of station E1 (see Table 1 and Fig. 9). The length of each 10° sector is proportional to its frequency of occurrence. The circle indicates the 5% level. A uniform angular distribution would have constant lengths of 2.8% for all 36 sectors.

The choice of the three values −1, 0, +1 seems somewhat arbitrary. However, two of the three values can be chosen arbitrarily without changing the value of the correlation coefficient (10), as in the definition of $\rho^{\mathrm{chf}}$ the variables are standardized. One of the three values must be fixed, and its value influences the resulting value of the correlation coefficient. We used a variable quantity $C$ instead of the fixed value 0 for the wind directions not belonging to one of the two dominant classes. The parameter $C$ can then be varied to make $|\rho^{\mathrm{chf}}|$ a maximum. We did this maximization with the data described in section 5 and obtained values of $C$ close to zero (≈0.02). Therefore, $C = 0$ was chosen throughout this paper, giving values of $|\rho^{\mathrm{chf}}|$ very close to its maximum.

As the three wind direction classes are nominal variables, Pearson's contingency coefficient (Sachs 1982) could be used to describe dependencies between them. However, it has, for our intended application, two drawbacks. First, it has no sign; thus, it cannot be distinguished whether two stations simultaneously have up-valley winds or one has up-valley and the other down-valley winds. Second, as the contingency coefficient is based on a $\chi^2$ test of the contingency table (Sachs 1982), it gives all classes the same weight, whereas with our correlation measure the two preferred directions have more weight than the other directions. However, section 5 shows that a cluster analysis based on the correlation matrix gives essentially the same results for our correlation and for the contingency coefficient.

As the correlation coefficient (10) is defined as a standard product-moment correlation, although with discretized variables, Fisher's z-transform (Stuart and Ord 1987) can be used to get an estimate of significance levels. The correlations are calculated from a sample of size $N$. Let $n_{11}$ be the observed frequency of simultaneous occurrence of the main wind direction at both stations, $n_{22}$ the frequency of secondary direction at both stations, etc. The 3 × 3 values $n_{kl}$ are distributed following a multinomial distribution with nine classes. For large $N$ (section 5 shows that for our dataset $N > 3800$) the multinomial distribution is well approximated by a normal distribution (Johnson and Kotz 1969). Hence, for large $N$ a standard test of the correlation coefficient can be used.

4. Test of correlation coefficients with a model for channeled flow

An idealized situation of flow with strong channeling at two different locations is considered. The two wind roses are shown in Fig. 2. The wind direction $\varphi_1$ at the first location has a probability of 42% to be in the main direction (75°, 105°) and of 26% to be in the secondary direction (255°, 285°). These probabilities are similar to the observed ones of station E1, whose wind rose is shown in Fig. 1. The wind direction $\varphi_2$ at the second site is determined from the wind direction $\varphi_1$ at the first site by compressing the wind directions with a northern component ($\varphi_1 < 90°$ or $\varphi_1 \ge 270°$) and stretching the wind directions with a southern component ($90° \le \varphi_1 < 270°$). The angle $\gamma$ (Fig. 2) indicates how much the main and secondary directions at the second site deviate from 180°. For $\gamma = 0$, the transformation is just the identity transformation, and the directions $\varphi_1$ and $\varphi_2$ are perfectly correlated. If $\gamma \ne 0$, the angles are still correlated in the sense that they simultaneously occur in the main direction class, in the secondary direction class, or in the remaining group.

FIG. 2. Wind roses of idealized channeled flows as described in text. The left chart represents the angular distribution of the angle $\varphi_1$ of the independent flow. The right chart shows the distribution of the coupled wind direction $\varphi_2$.

Figure 3 shows the different correlation coefficients discussed in sections 2 and 3 as a function of the angle $\gamma$. The new correlation coefficient $\rho^{\mathrm{chf}}$ (10) for channeled flow is the only one among the five coefficients that gives a value of 1, independent of the angle $\gamma$. The other four correlation coefficients, which do not use any specific properties of the flow and may be used for all types of flow, decrease with increasing $\gamma$. The new coefficient (10), which is only applicable to channeled flow, indicates a perfect correlation even when the main direction and the secondary direction of the wind direction distribution are not 180° apart. This is a desirable property for the analysis of channeled flow, as the class of the direction (for example up- or down-valley) is more important than the direction per se. The coefficient meets our requirement that the correlation be high if both observations are in the same class of directions. We conclude that the new coefficient (10) is best suited for our purpose of characterizing channeled flow.

FIG. 3. Correlation coefficient as a function of the angle $\gamma$, where $180° - \gamma$ is the angle between the main and secondary direction for the model described in section 4. Five different correlations are shown: (6) of Fisher and Lee (1986), (7) of Breckling (1989), (5) of Sverdrup (1917), (8) of Crosby et al. (1993), and the correlation (10) for channeled flow (Coeff-chf in the graph).

If noise (white noise with a triangular amplitude distribution was used) is added to the wind direction $\varphi_2$, the two angles become decorrelated to some extent, and a properly defined correlation coefficient should become smaller with increasing noise intensity. In fact, all five correlation coefficients discussed before get smaller with increasing values of noise intensity. Our correlation $\rho^{\mathrm{chf}}$ decays fastest for small and moderate values of noise intensity and thus shows more accurately the presence of an uncorrelated part of the wind directions.

5. Application to wind data from the MISTRAL area and use for a cluster analysis of sites

In 1991–92 the field experiment "Modell für Immissions-Schutz bei Transport und Ausbreitung von Luftfremdstoffen" (model for impact prevention during transport and diffusion of air pollutants; MISTRAL) took place in a region of about 55 km × 55 km around Basel, Switzerland (Kamber and Kaufmann 1992). The MISTRAL experiment was part of an international climatological project called "Regio-Klima-Projekt" (Regio climate project; REKLIP), which takes place in the upper Rhein Valley (Parlow 1992, 1996). The MISTRAL area, shown in Fig. 4, has quite complex topography. The Rhein flows from the east through Basel, where it turns sharply northward, forming the wide upper Rhein Valley bordered by the Schwarzwald (Black Forest) to the east and the Vosges to the west (not shown in Fig. 4). To the south the Jura Mountains separate the Rhein Valley from the Swiss Middleland. Several smaller tributary rivers run through the mountain ranges.

FIG. 4. The observation area of the MISTRAL project (55 km × 55 km) around the city of Basel showing the 50 measurement sites (black and white triangles). Contour lines and shading give the height above sea level. White areas are below 300 m MSL, light shaded areas are 300–500 m MSL, medium shaded areas are 500–700 m MSL, dark shaded areas are 700–900 m MSL, and black areas are higher than 900 m MSL. Station locations higher than 700 m MSL are marked with a white triangle. The station labeled by a C is St. Chrischona with its anemometer placed on a tower 262 m above ground.

In the MISTRAL area, 50 meteorological stations (triangles in Fig. 4) were operated with the goal of measuring in detail the near-surface winds and finding the typical flow patterns of that region with its complex topography. By means of an automated classification method, 12 typical regional flow patterns could be identified among the 8784 1-h mean wind fields of a 1-yr period (Weber and Kaufmann 1995; Kaufmann and Weber 1996; Kaufmann 1996).

For climatological studies of the flow patterns in this region and for applications such as air-pollution control or on-line emergency response planning, it would be desirable to identify the flow patterns from a few stations only, instead of being forced to operate and analyze all 50 stations. We try, therefore, to identify groups of stations with similar behavior of wind direction. Provided such groups exist, it may then be sufficient to select only one station per group and still get all necessary information to identify the regional flow patterns.

A list of the MISTRAL stations is given in Table 1 [more detailed information about the stations can be found in Kaufmann (1996)]. The first column gives the station label used in Fig. 7. The second column of Table 1 indicates the orographic situations in which the stations are located. We distinguish among stations in a valley (V), stations on a valley slope (S), stations in hilly terrain without pronounced orographic features (H), stations on a pass (P), and stations on isolated mountain tops (M). As can be seen from the sensor heights in Table 1 (fourth column), the anemometers were placed at nonstandard heights, ranging from 6 to 15 m for masts on open space. Stations were also mounted on buildings with sensor heights up to 70 m above ground. Station C (operated by the Swiss Meteorological Institute) is even located on a telecommunication tower 262 m above ground.

TABLE 1. Location and altitude of the 50 anemometers used in the MISTRAL field experiment. The labels are the ones shown in Fig. 7. The type indicates whether the station is in a valley (V), on the slope of a valley (S), in unspecific hilly terrain (H), on a pass (P), or on a mountain top (M). The third and fourth columns give the altitude of the station above mean sea level (MSL) and the sensor height above ground (AGL). The cluster number in the last column refers to the nine-cluster solution described in section 5.

Label  Type  Altitude (m MSL)  Sensor (m AGL)  Cluster number
B1     V     266               68              4
B2     V     257               64              4
B3     V     270               34              1
B6     V     258               54              1
B7     V     273               43              5
B8     V     282               25              1
B9     V     289               53              1
E1     V     350               13              1
E3     V     297               53              1
N1     V     237               28              4
N2     V     230               30              4
N3     V     294               53              3
N5     V     280               69              1
N7     V     335               13              8
V2     V     293               43              3
V3     V     310               50              3
V4     V     335               48              3
V8     V     422               21              3
W6     V     350               25              6
W8     V     450               14              3
B5     S     350               24              1
S1     S     370               13              1
V1     S     355               13              5
V5     S     470               13              9
V6     S     480               6               2
V7     S     490               7               3
V9     S     475               5               8
B4     H     293               29              6
E2     H     568               13              2
E4     H     557               13              2
E5     H     587               12              2
E6     H     623               13              2
N4     H     460               10              1
N6     H     480               10              1
S3     H     494               15              3
S4     H     598               13              1
S6     H     620               13              2
W1     H     395               13              2
W2     H     440               15              2
W3     H     440               14              5
W4     H     320               54              1
W7     H     300               26              5
S8     P     750               10              7
S9     P     870               13              7
C      M     490               262             2
N8     M     802               10              2
S2     M     712               10              2
S5     M     1175              13              2
S7     M     1001              35              2
W5     M     765               50              4

The classification of the 50 stations into groups is done by a cluster analysis (Anderberg 1973). In a similar way, climatic regions were identified by use of cluster analysis (e.g., Stooksbury and Michaels 1991; Jackson and Weinland 1995). Based on our experience with the classification of wind fields (Weber and Kaufmann 1995; Kaufmann and Weber 1996) we used a hierarchical cluster analysis, the complete linkage method (Anderberg 1973). All hierarchical clustering methods need a measure describing the dissimilarity (or distance) between the objects to be grouped. The complete linkage method is invariant under monotonic transformation of the distance (Jain and Dubes 1988), whereas other clustering methods are sensitive to details of the distance definition. We define a dissimilarity, or distance, measure between two stations $i$ and $j$ by

$$d^{\mathrm{chf}}_{ij} = 1 - |\rho^{\mathrm{chf}}_{ij}|, \tag{11}$$

where $\rho^{\mathrm{chf}}_{ij}$ is the correlation coefficient (10) for channeled flow between the wind directions at stations $i$ and $j$. Because the sign of $\rho^{\mathrm{chf}}$ depends on the arbitrary definition of main and secondary wind direction, we use the absolute value of the correlation coefficient in the distance definition.

The definition of the main and secondary wind directions, which are necessary for the calculation of $\rho^{\mathrm{chf}}_{ij}$, is to a certain extent subjective. Figure 5 shows the wind roses of six stations for the selected 1-yr period from 1 September 1991 to 31 August 1992. The main direction is indicated by dark shading, the secondary direction by light shading. Most (34) of the 50 stations have a distinct bimodal distribution of wind direction, indicating a strong channeling of the flow. For some stations, the two preferred directions are about 180° apart (e.g., station V4 in Fig. 5). For many others like S9 (see Fig. 5) the angle between the preferred directions is less than 180°, and for station B7 (Fig. 5), it is only about 110°. Some stations (B1, B2, B4, N1, N2, N3, S4, S5, S6, S7, W1, W6, and W7) have a trimodal distribution of wind direction (B2 and N2 are shown in Fig. 5). In these cases, the third mode is caused by a small tributary valley whose outflow to the main valley gives the third peak in wind direction. Therefore, the wind directions along the main valley axis are taken as main and secondary directions. A few stations have a quite uniform distribution of wind direction (S3 and V5, see station S3 in Fig. 5) or only one evident mode (N7), which makes the assignment of the preferred directions subjective. However, since only three stations are involved, this should not have great influence on the results of the cluster analysis.

FIG. 5. Wind roses of six MISTRAL stations for the period from 1 September 1991 through 31 August 1992. The circles indicate the 5% level. The main direction has dark shading, the secondary direction has light shading. The wind roses of V4, S9, and B7 are three of 34 bimodal distributions; B7 has the smallest angle between the preferred directions. The wind roses N2 and B2 are 2 of 13 trimodal wind roses; B2 has the largest third mode relative to the second. Site S3 is one of the two sites with nearly uniform wind roses.

With the distance measure (11) the matrix of all distances between pairs of stations can be calculated. To see whether there are pairwise correlations significantly different from zero, Fisher's z transform was used. At least $N = 3800$ pairs of valid data enter the calculation of the correlation coefficients. By means of the z transform (Stuart and Ord 1987) it can be estimated that all correlations with $|r| > 0.04$ are significantly (at the 1% level) different from zero for this $N$. As the hourly means are autocorrelated in time, the effective degrees of freedom (Bayley and Hammersley 1946) are reduced. Still, even if a reduction by a factor of 10 takes place, at least 380 effective degrees of freedom remain in the worst case, and all correlations with $|r| > 0.13$ are significantly (at the 1% level) different from zero. For each station the maximum correlation (in absolute value) to any of the other stations was searched. Considering only the maximum correlation corresponds to a multiple testing of the data (Sneyers 1990). Maintaining the same overall significance level of 0.01, the binomial test described in Sneyers (1990) gives a corrected significance level of 0.0002 for the maximum correlation. The critical value of the correlation is increased for $N = 3800$ to $|r| > 0.06$ and for $N = 380$ to $|r| > 0.19$. The observed maxima range from 0.28 to 0.84 and are all significantly different from zero.

The 50-by-50 correlation matrix is then used for the cluster analysis. The hierarchical cluster analysis successively merges stations to groups (or clusters) until only a single cluster remains that consists of all stations. The number of clusters to be retained can be inferred from a plot of the distance at which clusters are merged versus the number of clusters (Fig. 6). Moving from 50 clusters (each station forming a group of its own) to smaller numbers of clusters, the distance increases, showing abrupt changes at several places. Strong increases can be seen for 30, 27, 9, and 7 clusters in Fig. 6. To select 27 or 30 clusters does not make sense since for these choices fewer than two stations would belong on average to a cluster. We chose nine clusters because the largest increase of distance takes place between 9 and 8 clusters. For a given number of clusters the mean distance within all clusters and the mean distance between all clusters can be calculated. In the case of 9 clusters these distances become 0.25 and 0.56, respectively. Hence, there is a clear separation of the stations into the 9 clusters. The cluster membership of each station is given in the last column of Table 1. The clusters are ordered according to their size: cluster 1 includes 13 stations, cluster 9 only 1 station.

FIG. 6. Distance level at which two clusters are merged in the hierarchical cluster analysis as a function of the number of clusters.

In Fig. 7 the cluster membership of the 50 MISTRAL stations is shown by different symbols in a map of the area. Often, stations that are very close geographically, like B3 and B6 or B8 and S1, belong to the same cluster. However, cluster 3, for example, includes station W8 in the Birs Valley and station N3 in the Wiese Valley, which are not close geographically, but whose orographical location (in a valley, see Table 1) is similar.

FIG. 7. Cluster membership of the stations for the nine clusters as described in text. The area is divided into six regions indicated by a gray letter. Each station has a label consisting of the letter of its region and a digit (see Table 1). The station on the St. Chrischona tower, near the center of the area, is denoted by a C.

Cluster 1 includes stations in the Rhein Valley around Basel, B3, B6, B8, B9, E1, E3, and N5 (all V), but also several stations located in the hilly terrain between the valleys, B5 (S), S1 (S), S4 (H), N4 (H), N6 (H), and W4 (H). [The type of geographical location (V, S, H, P, or M, see Table 1) of the stations is given in parentheses.] Most stations of cluster 1 are strongly influenced by the flow in the Upper Rhein Valley. Only station S4 shows a southerly flow during nighttime, when the Upper Rhein Valley stations show down-valley flow. In contrast to the stations in other valleys, the stations of cluster 1 are located in places that are open to the west, such that strong westerly winds can suppress the valley flows.

Cluster 2 contains stations from the whole MISTRAL area. Besides the two stations in the western part, W1 and W2 (both H), the stations at higher altitudes, V6 (S), S2 (M), S5 (M), S6 (H), S7 (M), E2 (H), E4 (H), E5 (H), E6 (H), and N8 (M), and the very exposed location C (M) belong to this cluster. These stations represent larger-scale flow features in contrast to the valley stations of cluster 3, which have locally generated thermal winds.

Cluster 3 includes stations in the smaller valleys (compared to the Rhein Valley) of the Wiese, N3 (V), the Birs, W8 (V), the Ergolz, V2, V3, and V4 (all V), a smaller tributary river, V7 (S), and V8 (V), and one station, S3 (H), in the hilly terrain near the Ergolz Valley. The main direction at these eight stations was chosen as down-valley and the secondary direction as up-valley. The correlations between the eight stations are all positive, showing that down-valley flow or up-valley flow occurs simultaneously in the smaller valleys. This confirms that these flows are thermally driven (Kaufmann and Weber 1996; Kaufmann 1996).

The stations in the Rhein Valley north of Basel, B1, B2, N1, N2 (all V), and W5 (M) form cluster 4. The flow in this part of the Rhein Valley is mainly affected by the channeling through the surrounding mountain ranges of the Schwarzwald and the Vosges (see also Wippermann and Gross 1981; Wippermann 1984). Station W5 south of Basel is located on a mountain and may therefore be influenced by the flow through the Rhein Valley extending to greater height than the local, thermally induced flows in the Birs Valley.

Cluster 5 consists of stations in the Birs Valley, B7 (V) and W7 (H), the Ergolz Valley, V1 (S), and station W3 (H) in the hilly terrain between the valleys. These stations are not located directly on the valley axis, but rather on the valley slopes, which may explain why these stations do not belong with the valley stations to cluster 3.
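The chain from class assignment to clusters, that is, Eq. (10), the distance (11), and complete linkage, can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names (`classify`, `rho_chf`, `complete_linkage`), the four synthetic direction series, and the reuse of the section 4 sectors (75° to 105° and 255° to 285°) for every station are assumptions made for the example. With real data each station would have its own main and secondary sectors.

```python
import math

MAIN = (75.0, 105.0)        # assumed main sector (degrees), borrowed from section 4
SECONDARY = (255.0, 285.0)  # assumed secondary sector (degrees)

def classify(direction_deg, main=MAIN, secondary=SECONDARY):
    """Return +1 for the main sector, -1 for the secondary sector, 0 otherwise.
    Simplification: sectors are assumed not to wrap around 0 degrees."""
    d = direction_deg % 360.0
    if main[0] <= d < main[1]:
        return 1
    if secondary[0] <= d < secondary[1]:
        return -1
    return 0

def rho_chf(a_i, a_j):
    """Pearson correlation of two class series, following Eq. (10)."""
    n = len(a_i)
    mean_i = sum(a_i) / n
    mean_j = sum(a_j) / n
    cov = sum(x * y for x, y in zip(a_i, a_j)) / n - mean_i * mean_j
    var_i = sum(x * x for x in a_i) / n - mean_i ** 2
    var_j = sum(y * y for y in a_j) / n - mean_j ** 2
    return cov / math.sqrt(var_i * var_j)

def complete_linkage(dist, n_clusters):
    """Merge the two closest clusters until n_clusters remain.
    The distance between clusters is the largest pairwise member distance."""
    clusters = [[k] for k in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]

# Synthetic hourly directions (degrees) for four hypothetical stations.
directions = [
    [90, 80, 270, 260, 0, 180, 100, 280],   # station 0
    [95, 76, 256, 284, 10, 200, 90, 270],   # station 1: same classes as station 0
    [270, 260, 90, 100, 30, 150, 280, 80],  # station 2: main and secondary swapped
    [90, 0, 90, 0, 270, 270, 180, 180],     # station 3: unrelated to station 0
]
classes = [[classify(d) for d in series] for series in directions]
rho = [[rho_chf(a, b) for b in classes] for a in classes]
dist = [[1.0 - abs(r) for r in row] for row in rho]  # Eq. (11)
groups = complete_linkage(dist, n_clusters=2)
print(groups)
```

In this toy case station 2 has a correlation of −1 with station 0 because its main and secondary roles are swapped; the absolute value in (11) still gives a distance of zero, so the two end up in the same group, which is the reason the text gives for using $|\rho^{\mathrm{chf}}_{ij}|$.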

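The critical correlations quoted in section 5 (0.04 and 0.13 at the 1% level, 0.06 and 0.19 at the corrected level) can be checked with Fisher's z-transform. The sketch below assumes a two-sided test and the usual standard error $1/\sqrt{N-3}$ for $\operatorname{artanh}(r)$; the paper does not spell out either choice, and the function name `critical_r` is hypothetical. The last lines show one reading that reproduces the corrected level of 0.0002, namely 49 independent comparisons per station; this is an assumption and not necessarily the binomial test of Sneyers (1990).

```python
import math
from statistics import NormalDist

def critical_r(n, alpha):
    """Smallest |r| that differs from zero at the two-sided level alpha,
    using Fisher's z-transform with standard error 1/sqrt(n - 3)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.tanh(z_crit / math.sqrt(n - 3))

for n in (3800, 380):
    for alpha in (0.01, 0.0002):
        print(n, alpha, round(critical_r(n, alpha), 2))

# Assumed correction for taking the maximum over the 49 other stations:
# per-test level alpha' such that 1 - (1 - alpha')**49 equals 0.01.
alpha_corrected = 1.0 - (1.0 - 0.01) ** (1.0 / 49.0)
print(round(alpha_corrected, 4))
```

Under these assumptions the rounded thresholds agree with the values given in the text for both $N = 3800$ and the worst-case $N = 380$.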

The largest clusters built by a cluster analysis are generally those with small distances (highest correlations) within the clusters. The elements with larger distances are left to build clusters with other outliers. These outliers are not necessarily clustered with the nearest element, but with the nearest element not already part of another cluster. The identification of outliers depends to some extent on the choice of the clustering method (Anderberg 1973).

The two stations B4 (H) and W6 (V) form cluster 6. Both stations have a trimodal wind rose (Fig. 5), and the choice of the preferred directions is not obvious. The correlation between these two stations is relatively low ($\rho_{\mathrm{chf}} = 0.39$). Each is slightly more highly correlated with one other site, each of which is a member of a larger group. This cluster therefore seems to result from the effect described above.

The two stations S8 and S9 (both P) on passes of the Jura Mountains form cluster 7. They are moderately correlated ($\rho_{\mathrm{chf}} = 0.60$). Stations S8 and S9 have strongly channeled flow in south–north and southeast–northwest directions, respectively (Fig. 5). These directions are perpendicular to the Jura Mountain range (Fig. 4). Hence, the wind direction at these pass stations represents the flow from the Swiss Middleland to the Rhein Valley across the Jura Mountains. This flow seems to be governed by mechanisms other than the flow in the MISTRAL area north of the Jura Mountains.

Cluster 8 also consists of only two stations: N7 (V) and V9 (S). Station N7 in the Valley is special since it has mostly northerly, down-valley winds (see Fig. 5). These two stations are only weakly correlated ($\rho_{\mathrm{chf}} = -0.28$) and are presumably outliers that are by chance in the same group. Station N7 actually shows slightly higher correlations with five other stations, all but one of them with north–south channeled flow.

Cluster 9 consists of a single station V5 (S) located on the south-facing slope of the Ergolz Valley. This station is greatly affected by local influences like sheltering and slope winds and does not follow larger-scale flow patterns (see also Figs. 8 and 9 of Kaufmann and Weber 1996). It is most highly correlated ($\rho_{\mathrm{chf}} = 0.34$) with the nearby station V4.

If only seven clusters are chosen, the cluster analysis procedure merges clusters 1 and 7 and clusters 5 and 6. If only five clusters are retained, the cluster analysis procedure merges in addition clusters 8 and 9 and clusters 3 and 4.

For comparison, the same cluster analysis as described above was performed with a correlation matrix based on Pearson's contingency coefficient (Sachs 1982). In this case 11 clusters emerge. Still, the large clusters 1, 2, 3, and 4 remain essentially the same. The most important difference is that the two pass stations S8 and S9 do not form their own cluster but now belong to cluster 1. Some stations (V6, V9, W4, and N7) form clusters of their own and are not merged into any of the larger clusters.

We can compare the station clusters obtained in the present paper (Fig. 7) with the mean winds at the stations for the 12 typical regional flow patterns obtained by the classification of the wind fields (Kaufmann and Weber 1996; Kaufmann 1996). It can be seen from Figs. 8 and 9 of Kaufmann and Weber (1996) that stations belonging to the same cluster show winds consistent with the typical regional flow patterns.

From the clusters obtained, we conclude that the correlation of two stations not only depends on the horizontal distance between the stations but is also, and sometimes even much more, influenced by the vertical distance of the stations (see also Palomino and Martín 1995) and the orographic features of the station locations. Hence, interpolation of winds using only the horizontal distance as weights of the interpolation scheme will not be sufficient in complex terrain. More important than the horizontal distance is whether the stations and the grid points at which winds should be interpolated have similar exposures, orographic features, and altitude. For example, one suitably chosen station in the Birs, Ergolz, or Wiese Valley may well represent the flow in all smaller valleys, as all these flows are thermally forced and the whole MISTRAL area usually experiences about the same incoming solar radiation during a day.

6. Summary and conclusions

For flow patterns with strong channeling, a simple correlation coefficient is defined. It is based on the existence of two preferred directions for a channeled flow. Three classes of directions can then be formed: the main direction class, a secondary direction class, and the remaining directions. Assigning a numeric value to each of these three classes, one can calculate a linear correlation coefficient. In this way a correlation coefficient is defined that takes into account the channeling of the flow. A simple model for channeled flow at two locations was considered. The new correlation coefficient, which makes use of the specific properties of channeled flow but is only applicable to such flows, is the only one that gives maximum correlation even if the two preferred directions are not opposite each other and is, therefore, well suited for the analysis of channeled flow. Other correlations taken from the literature, which are applicable to all types of flows and do not make use of any specific flow properties, show a decrease of correlation if the two preferred directions are not opposite each other.

The correlation for channeled flow was applied to a dataset of atmospheric wind measurements. In a mesoscale region over complex terrain, 50 ground stations with anemometers captured the near-surface wind field during the MISTRAL field experiment. The 1-h means over an entire year were used to calculate the correlations between all pairs of stations. Based on this correlation matrix the stations were grouped by a cluster analysis. Nine groups of stations were identified showing similar channeling behavior. The stations within a group are not necessarily close in space, but instead may be located throughout the whole region. More important for a similar behavior of the wind than spatial distance is the topographic location and the altitude of the stations. These results cast some doubt on the widely used interpolation schemes for near-surface wind fields that only use spatial distance for the determination of the weights.

Because many flows in geophysics are influenced by topography, they may experience channeling by these boundaries. Our new correlation coefficient provides a simple tool for the analysis of such channeled flows from a wide range of fields.

Acknowledgments. The project REKLIP/MISTRAL is partly funded by the two cantons of Basel. Additional data were kindly provided by the Swiss Meteorological Institute, Zürich and the Geographical Institute of the University of Basel.

REFERENCES

Anderberg, M. R., 1973: Cluster Analysis for Applications. Academic Press, 359 pp.
Bayley, G. V., and J. M. Hammersley, 1946: The "effective" number of independent observations in an autocorrelated time series. J. Roy. Stat. Soc., 8 (Suppl.), 184–197.
Breaker, L. C., W. H. Gemmill, and D. S. Crosby, 1994: The application of a technique for vector correlation to problems in meteorology and oceanography. J. Appl. Meteor., 33, 1354–1365.
Breckling, J., 1989: The Analysis of Directional Time Series: Applications to Wind Speed and Direction. Vol. 61, Lecture Notes in Statistics, Springer-Verlag, 238 pp.
Crosby, D. S., L. C. Breaker, and W. H. Gemmill, 1993: A proposed definition for vector correlation in geophysics: Theory and application. J. Atmos. Oceanic Technol., 10, 355–367.
Dietzius, R., 1916: Ausdehnung der Korrelationsmethode und der Methode der kleinsten Quadrate auf Vektoren. Sitzungsber. Akad. Wiss. Wien Math. Naturwiss. Kl. Abt 2a, 125, 3–20.
Essenwanger, O. M., 1986: General Climatology, 1B: Elements of Statistical Analysis. Elsevier, 424 pp.
Fisher, N. I., 1993: Statistical Analysis of Circular Data. Cambridge University Press, 277 pp.
Fisher, N. I., and A. J. Lee, 1983: A correlation coefficient for circular data. Biometrika, 70, 327–332.
Fisher, N. I., and A. J. Lee, 1986: Correlation coefficients for random variables on a unit sphere or hypersphere. Biometrika, 73, 159–164.
Hanson, B., K. Klink, K. Matsuura, S. M. Robeson, and C. J. Willmott, 1992: Vector correlation: Review, exposition, and geographic application. Ann. Assoc. Amer. Geogr., 82, 103–116.
Hooper, J. W., 1959: Simultaneous equations and canonical correlation theory. Econometrica, 27, 245–256.
Jackson, I. J., and H. Weinland, 1995: Classification of tropical rainfall stations: A comparison of clustering techniques. Int. J. Climatol., 15, 985–994.
Jain, A. K., and R. C. Dubes, 1988: Algorithms for Clustering Data. Prentice-Hall, 320 pp.
Johnson, N. L., and S. Kotz, 1969: Distribution in Statistics: Discrete Distributions. John Wiley & Sons, 328 pp.
Jupp, P. E., and K. V. Mardia, 1980: A general correlation coefficient for directional data and related regression problems. Biometrika, 67, 163–173.
Kamber, K., and P. Kaufmann, 1992: Das Mistral-Messnetz, Konzeption, Aufbau und Betrieb. Regio Basiliensis, 33, 107–114.
Kaufmann, P., 1996: Regionale Windfelder über komplexer Topographie. Ph.D. thesis No. 11565, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland, 147 pp. [Available from Dr. Rudolf O. Weber, Paul Scherrer Institute, CH-5232 Villigen PSI, Switzerland.]
Kaufmann, P., and R. O. Weber, 1996: Classification of mesoscale wind fields in the MISTRAL field experiment. J. Appl. Meteor., 35, 1963–1979.
Mardia, K. V., 1972: Statistics of Directional Data. Academic Press, 357 pp.
Palomino, I., and F. Martín, 1995: A simple method for spatial interpolation of the wind in complex terrain. J. Appl. Meteor., 34, 1678–1693.
Parlow, E., 1992: REKLIP – Klimaforschung statt Meinungsmache am Oberrhein. Regio Basiliensis, 33, 71–80.
Parlow, E., 1996: The regional climate project REKLIP – An overview. Theor. Appl. Climatol., 53, 3–7.
Sachs, L., 1982: Applied Statistics: A Handbook of Techniques. Springer-Verlag, 706 pp.
Sneyers, R., 1990: On the statistical analysis of series of observations. World Meteorological Organization Tech. Note 143, 192 pp. [Available from WMO, Case Postale 2300, CH-1211 Geneva 2, Switzerland.]
Stooksbury, D. E., and P. J. Michaels, 1991: Cluster analysis of southeastern U.S. climate stations. Theor. Appl. Climatol., 44, 143–150.
Stuart, A., and J. K. Ord, 1987: Kendall's Advanced Theory of Statistics, Vol. 1: Distribution Theory. Charles Griffin & Company Ltd., 604 pp.
Sverdrup, H. U., 1917: Über die Korrelation zwischen Vektoren mit Anwendungen auf Meteorologische Aufgaben. Meteor. Z., 34, 285–291.
Weber, R. O., and P. Kaufmann, 1995: Automated classification scheme for wind fields. J. Appl. Meteor., 34, 1133–1141.
Whiteman, C. D., 1990: Observations of thermally developed wind systems in mountainous terrain. Atmospheric Processes over Complex Terrain, Meteor. Monogr., No. 45, Amer. Meteor. Soc., 5–42.
Whiteman, C. D., and J. C. Doran, 1993: The relationship between overlying synoptic-scale flows and winds within a valley. J. Appl. Meteor., 32, 1669–1682.
Wippermann, F., 1984: Air flow over and in broad valleys: Channeling and counter-current. Beitr. Phys. Atmos., 57, 92–105.
Wippermann, F., and G. Gross, 1981: On the construction of orographically influenced wind roses for given distributions of the large-scale wind. Beitr. Phys. Atmos., 54, 492–501.
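
As an illustration of the correlation coefficient summarized in section 6, the following is a minimal Python sketch, not the authors' implementation. It assumes that the main direction class is coded as +1, the secondary class as -1, and all remaining directions as 0; these values, the sector half-width `half_width_deg`, and all function names are assumptions made for this example only. The coefficient is then the ordinary linear (Pearson) correlation of the two coded series.

```python
import math


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees (0 to 180)."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def direction_class(wind_dir_deg, main_dir_deg, secondary_dir_deg, half_width_deg=45.0):
    """Code one wind direction as +1 (main), -1 (secondary) or 0 (remaining).

    The codes +1/-1/0 and the sector half-width are assumptions of this sketch.
    If the two sectors overlap, the main class takes precedence.
    """
    if angular_difference(wind_dir_deg, main_dir_deg) <= half_width_deg:
        return 1
    if angular_difference(wind_dir_deg, secondary_dir_deg) <= half_width_deg:
        return -1
    return 0


def pearson(x, y):
    """Ordinary linear (Pearson) correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")  # undefined when one series never changes class
    return sxy / math.sqrt(sxx * syy)


def channeled_flow_correlation(dirs_a, preferred_a, dirs_b, preferred_b, half_width_deg=45.0):
    """Correlate two simultaneous direction series after coding them into classes.

    preferred_a and preferred_b are (main, secondary) direction pairs in degrees.
    """
    codes_a = [direction_class(d, preferred_a[0], preferred_a[1], half_width_deg) for d in dirs_a]
    codes_b = [direction_class(d, preferred_b[0], preferred_b[1], half_width_deg) for d in dirs_b]
    return pearson(codes_a, codes_b)


# Made-up simultaneous wind directions (degrees) at two hypothetical stations.
dirs_a = [5, 355, 175, 185, 90, 10]      # channel with preferred directions 0 and 180
dirs_b = [310, 290, 80, 100, 200, 300]   # preferred directions 300 and 90 (not opposite)
rho = channeled_flow_correlation(dirs_a, (0, 180), dirs_b, (300, 90))
print(rho)
```

In this made-up example the preferred directions at the second station are not opposite each other, yet both series fall into the same classes at every time step, so the coefficient is 1 up to rounding. Swapping the main and secondary directions at one station reverses the sign.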
