<<

------NRC Form 390A U.S. NUCLEAR REGULATORY COMMISSION (12 2003) NRCMD 3 9 RELEASE TO PUBLISH UNCLASSIFIED NRC CONTRACTOR SPEECHES, PAPERS, AND JOURNAL ARTICLES (Please type or prrnt)

1 TITLE (State in lull as it rlppeers on the speech, paper, or journal article) I Aaalication of Gener#slizedPareto Distribution to Constrain Uncertainty in Low-Probability Peak Ground Accelerations

2. AUTHOR(s) Luc Huyse, Rui Cheri, and John A. Stamatakos

3. NAME OF CONFERENCE. LOCATION. AND DATE@) NIA (Journal Article)

4. NAME OF PUBLICATION Bulletin of Seismological Society of America

5. NAME AND ADDRESS 01: THE PUBLISHER TELEPHONE NUMBER OF THE PUBLISHER Bulletin of Seismological Society of America PO BOX1268 (541) 549-2203 221 West Cascade - Suite #C Sister, Oregon 97759 - - 6 CONTRACTOR NAME AND COMPLETE MAILING ADDRESS (Include ZIP code) TELEPHONE NUMBER OF THE CONTRACTOR James R Wlnterle Center for Nuclear Waste Regulatory Analyses (210) 522-5249 6220 Culebra Road San Antonlo. TX 78238

7. CERTlFlCATlON (ANSWER ALL QUESTIONS)

A. COPYRIGHTED MATERIAL - Does this speech, paper, or journal article contain copyrighted material? If ves, attach a letter of release from the source that holds the copvriaht.

B. PATENT CLEARANCE - Does this speech, paper, or journal art~clerequire patent clearance? If yes, the NRC Patent Counsel must signify clearance by signing below

COUNSEL (Type or SIGNATURE DATE

C. REFERENCE AVAILABILITY - Is all material referenced in this speech, paper, or journal article available to the public either through a public library, the Government Printing Office, the National Technical Information Service, or the NRC Public Document Room? If no, list below the specific availability of each referenced document. 1 SPECIFIC AVAILABILITY I D. METRIC UNIT CONVERSION - Does this speech, paper, or journal article contain measurement and weight values? If yes, all must be converted to the International System of Units, followed by the English units in brackets, pursuant to the NRC Policy Statement implementing the Omnibus Trade and Competitiveness Act of 1988, Executive Order 12770, July 25. 1991.

8. AUTHORIZATION The signatures of the NRC project manager and the contractor official certify that the NRC contractor speech, paper, or journal article is authorized by NRC, that it has undergone appropriate peer review for technical content and for material that might compromise commercial proprietary rights, and that it does not contain classified, sensitive unclassified, or nonpublic information. (NRC MD 3.9. Part II(A)(l)(d))

A CONTRACTOR AUTHORIZING OFFICIAL (Type or print name) I Sltakanta Mohantv. Asst Director 1 I B. NRC RESPONSIBLE PROJECT MANAGER (Type or print name) I'OFFICE~DIVISION 'JAIL STOP I E MAIL I D Did you place the speech paper or jourral art~clen ADAl.4S7 YES 3 .lo C if so prowde the ADAMS Accesston No

(Use Tevp ate OClO 099) -- -. -

SIGNATURE

I I I I tlRC FORM 390A (12-2003) PRINTED ON RECYCLED PAPER Application of Generalized to Constrain Uncertainty in Low-Probability Peak Ground Accelerations Luc Huyse,1 Rui Chen,2 John A. Stamatakos3 Abstract Probabilistic seismic hazard analysis (PSHA) has now become standard practice to characterize earthquake ground motion hazard and to develop ground motion inputs for seismic design and seismic performance analyses. One emerging issue is the application of PSHA at low annual exceedance probabilities, particularly the characterization of scatter (aleatory variability) in the recorded ground motion parameters, including peak ground acceleration (PGA). Lognormal distributions are commonly used to model ground motion variability. However, a lognormal distribution, when unbounded, can yield a nonzero probability for unrealistically high ground motion values. Bounded lognormal distributions, on the other hand, may distort the data, render the shape of the hazard curve artificial, and produce erroneous hazard results at low annual exceedance probabilities. In this paper, we evaluate the appropriateness of the lognormal assumption for low-probability ground motions by examining the tail behavior of the PGA recordings from the Pacific Earthquake Engineering Research (PEER) Next Generation Attenuation (NGA) database. We also illustrate the tail behavior of PGA residuals using Abrahamson and Silva NGA ground motion relations and the NGA database. Our analyses show that the tail portion of the PGA data does not follow a lognormal distribution. Instead, it is better characterized by the Generalized Pareto Distribution (GPD). We propose using a composite distribution model that consists of a lognormal distribution (up to a threshold value of ground motion residual) combined with GPD for the tail region. We further demonstrate implications of this composite distribution model in PSHA using a simple example and GPD parameters derived from the residual fit of the PEER NGA Database. Our results show that at low exceedance probabilities, the composite distribution model yields considerably lower PGA values than the unbounded lognormal distribution. The composite distribution also produces hazard curves that are more reasonable in shape than truncated lognormal distributions, because the PGA increases asymptotically with a decreasing probability level rather than remain fixed at a predefined maximum value. The presented approach is readily adapted to spectral accelerations and other ground motion parameters.

Keywords: extreme events, tail modeling, peak-over-threshold analysis, Generalized Pareto Distribution, low probability seismic ground motions, and probability seismic hazard analysis

Introduction The high-level waste repository proposed for Yucca Mountain would be subject to vibratory ground motions from earthquakes because of its location in the tectonically active Basin and Range province of western North America. Potential seismic hazards at Yucca Mountain should be considered in safety assessments for the preclosure period as well as performance assessment calculations after closure of the proposed repository. During the preclosure period, structures, systems, and components of the geologic repository operations area must meet radiological

1 Principal Engineer, Reliability and Materials Integrity Section, Mechanical and Materials Engineering Division, Southwest Research Institute®, San Antonio, Texas 2 Consultant and Adjunct Professor, Department of Civil Engineering, California State University, Chico, California 3 Assistant Director, Center for Nuclear Waste Regulatory Analyses, Geosciences and Engineering Division, Southwest Research Institute®, San Antonio, Texas

1 safety objectives. The U.S. Nuclear Regulatory Commission (NRC) regulations require that the preclosure safety analysis identifies and systematically analyzes naturally occurring hazard [code of federal regulations (CFR), chapter 10, part 63.112(b)]. Seismicity is also identified as an event class that may be relevant to evaluations of repository postclosure performance [10 CFR 63.102(j)].

To assess the seismic hazard at Yucca Mountain, the DOE conducted a probabilistic seismic hazard assessment (PSHA) via expert elicitation (CRWMS M&O, 1998) that resulted in probabilistic hazard curves relating vibratory ground motion to annual exceedance probability. Application of the PSHA to the small annual exceedance probabilities (e.g., below 10-6 per year) necessary to conduct a performance assessment for the potential repository at Yucca Mountain results in ground motions that have been questioned as unrealistically large (e.g., Corradini, 2003; Kokajko, 2005; NRC, 2005; Schlueter, 2000).

The 2003 Nuclear Waste Technical Review Board (NWTRB) joint meeting on natural and engineered systems on seismic issues focused on the very large vibratory ground motions estimated by the DOE probabilistic seismic hazard assessment at annual exceedance probabilities below 10-6. The NWTRB sent a letter to DOE expressing concern that the extrapolation of the probabilistic seismic hazard curves to very low probabilities resulted in ground motion estimates that were physically unrealistic (Corradini, 2003). The NWTRB noted that use of unrealistically large ground motion, even if acknowledged as a conservative approach, could lead to a number of problems including (i) a skewed understanding of repository behavior and the significance of different events; (ii) a consideration of events for which there is little or no understanding or engineering practice; and (iii) an undermined confidence in the scientific basis of the process under consideration (Corradini, 2003). The regulations at §63.102(j) explain that the seismic event class includes the range of credible earthquakes. DOE transmitted Technical Basis Document No. 14 to the NRC (DOE, 2004). In that document (Bechtel SAIC Company, LLC, 2004), it was acknowledged that the large ground motions estimated by the probabilistic seismic hazard assessment at small annual exceedance probabilities (below 10!6) overestimated the severity of low-probability ground motion at Yucca Mountain.

The DOE PSHA followed the common practice for PSHA studies in that lognormal distributions were used to model the ground motion scatter. Because the lognormal distribution belongs to the Gumbel domain of attraction (Castillo, 1988; Singpurwalla, 1972), decreasing the exceedance probability results in progressively larger ground motions that are in part due to extrapolation into the extreme tails of the lognormal distribution. In this paper, we evaluate the appropriateness of the lognormal assumption for low-probability ground motions. As a first example the PGA records from two well-recorded aftershocks of the 1999 Chi-Chi earthquake are analyzed. This data set was chosen because it provided a sufficiently large sample associated with a single magnitude (Mw) to which peak-over-threshold analysis techniques could be directly applied. In a second step, a peak-over-threshold analysis is performed on the PGA residuals, using the Abrahamson-Silva NGA ground motion relations (Abrahamson and Silva, 2007) as an example. This two-step approach allows us to demonstrate the universal applicability of peak- over-threshold models to either the raw PGA records of specific events or to the residuals in a regression model with broader applicability (the regression model covers a wider range of earthquakes, distances and fault mechanisms).

2

We will demonstrate that a threshold value marks the tail portion of the PGA or residual data that is better characterized by the Generalized Pareto Distribution (GPD). Therefore, ground motions are modeled by a composite distribution model (CDM) that consists of a lognormal distribution below the threshold value and GPD for the tail region. The application of the CDM in PSHA will be illustrated using a simple example and GPD parameters derived from the residual fit of the PEER NGA Database. The hazard results using CDM will be compared with those obtained using unbounded and truncated lognormal distributions.

Tail Behavior and Peak-Over-Threshold Analysis For large return periods or very low frequencies, an accurate estimation of the tails of the distribution is of primary interest. Classical distribution parameter estimation methods such as moment or maximum likelihood estimation are of limited use in such an application because they are heavily weighted toward the bulk of the data.

Generally speaking, two types of statistical models of extremes can be distinguished on the basis of independently and identically distributed data:

• Classical extreme values analysis: only the maxima within a given reference period are included in the analysis

• Peak-over-threshold analyses: all data that exceed a given threshold value are included

Arguably the biggest drawback of the classical extreme value analysis is that only a single data point per reference period can be used. For instance, in an application where the maximum PGA due to an earthquake is to be estimated, only the largest PGA value for a specific year or decade could be used. Such an approach severely limits the amount of data available to estimate the distribution parameters and therefore introduces significant statistical uncertainty on the distribution parameters. In addition, the method discards some large PGA values if they were to occur within a decade with several large earthquakes and may include some smaller PGA values, which are maxima of relatively calm decades.

Peak-over-threshold analyses, on the other hand, use only data (xi) that exceed a given threshold value (λ) in the distribution parameter estimation. Pickands (1975) identified the GPD as the limiting distribution for the excesses (X – λ) for a sufficiently large threshold value λ. Throughout this paper, upper case notation X will be used to denote the X, whereas a lower case x refers to specific sample data or realizations of the random variable X.

Generalized Extreme Value Distribution

Distribution Parameters It can be shown that – under some mild conditions (e.g., Castillo, 1988; Coles, 2001) – the distribution of maxima converges to one of three limiting distributions, which are more widely known as the Gumbel (Type I), Frechet (Type II), and Weibull (Type III) distribution. All three distribution types have a location (λ) and scale (δ) parameter; the Frechet and Weibull

3 distribution also have a (c), where c > 0 for the Frechet and c < 0 for the Weibull.

These three distribution types can be further grouped into a Generalized Extreme Value Distribution or GEVD (also referred to as the von Mises form of the extreme value distribution—see Castillo, 1988):

−1/ c ⎪⎧ ⎡ x − λ ⎤ ⎪⎫ F (x) = exp − 1+ c ,c ≠ 0 (1) GEVD ⎨ ⎢ δ ⎥ ⎬ ⎩⎪ ⎣ ⎦ + ⎭⎪ where the + denotes that the GEVD is defined only when the term inside the square brackets is positive. When the parameter c is zero, the is retrieved as a limiting case of the GEVD: − λ = ⎧− ⎡− x ⎤⎫ = FGumbel (x) exp⎨ exp⎢ ⎥⎬, c 0 (2) ⎩ ⎣ δ ⎦⎭

Statistical uncertainty is associated with each of the distribution parameters λ, δ, and c. An advantage of using the generalized extreme value distribution over the individual Gumbel, Frechet, or Weibull distributions is that the statistical confidence bounds (i.e., uncertainty) on the shape parameter (c) reflect the relative likelihood of the data following either a Gumbel, Frechet or . Note that even small amounts of curvature (“c” small but non-zero) can have a significant effect on the PGA estimates when extrapolation far beyond the available data records is necessary.

Parameter Estimation Methods Several methods are available for parameter estimation in the statistical literature: moment estimation, maximum likelihood estimation, and minimum- best linear unbiased , among others. A practical, -based method of estimating the parameters is described in Maes and Breitung (1993): a weighted least-square fit is performed on the empirical distribution of the data in the Gumbel plot. A Gumbel plot is achieved by plotting –ln (-ln[Femp(xi)]), where Femp is the empirical distribution, achieved by assigning an equal weight 1/n to all n data xi. In the Gumbel plot, the Gumbel distribution (c = 0) becomes a straight line, whereas a Frechet and Weibull distribution are concave and convex curves, respectively:

1 ⎡ x − λ ⎤ − ln()− ln[]F (x) = ln 1+ c , c ≠ 0 GEVD ⎢ δ ⎥ c ⎣ ⎦ + (3) − λ − ()− []= −⎡ x ⎤ = ln ln FGumbel (x) , c 0 ⎣⎢ δ ⎦⎥

Because least-squares estimation can be interpreted as a maximum likelihood estimator, the estimator is unbiased. If the data set is not too small, the covariance matrix Σ of the parameters can most readily be obtained using the asymptotic normality assumption of maximum likelihood and is given by the minus inverse of the of the matrix (Castillo, 1988):

4

−1 ⎡ ∂ 2l ⎤ []Σ = −E (4) ⎢∂θ ∂θ ⎥ ⎣⎢ i j ⎦⎥ where l represents the log-likelihood and θi and θj represent any combination of the three GEVD parameters λ, δ, and c.

Generalized Pareto Distribution

Distribution Parameters Pickands (1975) showed that the GPD arises as the limiting distribution of the excesses, X – λ, for a sufficiently large threshold λ. The GPD has the following form:

−1/ c ⎧ ⎡ x − λ ⎤ ⎪1− 1+ c ,c ≠ 0 ⎪ ⎢ δ ⎥ F (x) = ⎣ ⎦ + (5) GPD ⎨ − λ ⎪ − ⎡− x ⎤ = 1 exp⎢ ⎥,c 0 ⎩⎪ ⎣ δ ⎦ and is characterized by a location (λ), scale (δ), and shape (c) parameter.

The tail heaviness index H (Boos, 1984) is a measure of the curvature of the log-exceedance function L(x) and is defined as:

− L"(x) H (x) = with L(x) = − ln[]1− F(x) (6) []L'(x) 2 where F(x) is the cumulative distribution function (CDF) of the random variable X.

For a GPD, the tail heaviness index is readily shown to be constant and equal to the shape parameter: H = c. In general, the tail heaviness index H benchmarks the tail behavior against an exponential tail. Depending on the value of H (or c), three types of tails can be discerned: light, neutral, and heavy (Figure 1). Special cases for the GPD are the (c = 0), the uniform distribution (c = -1), and the (c = -1/2). The latter two are light-tailed distributions; examples of heavy-tailed distributions are the Cauchy and Pareto distribution.

Among many other applications, the GPD distribution has been used successfully in the estimation of extreme order of maximum wave crest heights in offshore applications (Castillo et al., 1995) and the characterization of the frequency of extreme earthquake events (Pisarenko and Sornette, 2003).

5 When the tail heaviness index is negative, the distribution is bounded and the maximum attainable value for the random variable X is reached when:

− λ δ ⎡ + xmax ⎤ = ⇔ = λ − 1 c 0 xmax (7) ⎣⎢ δ ⎦⎥ c

Statistical uncertainty is associated with each of the distribution parameters λ, δ, and c, which implies that this upper bound is itself uncertain. When -1 < c < 0, the probability density function (PDF) vanishes when the finite end point is reached; for c = -1, the PDF is non-zero but finite; for c < -1, the PDF becomes infinite at the asymptotic upper bound (Caers and Maes, 1998).

Parameter Estimation Methods Various estimators for the tail heaviness index and GPD parameters exist in the statistical literature (e.g., Hill, 1975; Hosking and Wallis, 1987; Hsieh, 1999; Drees et al., 2000). For general use, the method of moment and probability weighted moment estimators (Hosking and Wallis, 1987) are problematic because the variance of X is infinite when c > 1. In the elemental percentile method (Castillo et al., 1995), the initial estimate of the tail heaviness index is estimated directly from the CDF at three observed order statistics. The final estimate of the tail heaviness index is given by the of all possible distinct triplets of order statistics within the n observations. The other GPD parameters are subsequently estimated through linear regression.

Maes (1995) proposed a practical, risk-based method of estimating the parameters similar to the one developed for the GEVD in Maes and Breitung (1993). In a first step the threshold is selected. An important property of the GPD in this regard is that the shape parameter is invariant for a shift to a higher threshold λ. The threshold is selected as the beginning of the last stable, linear part of the mean excess E(X – λ | x > λ) plot. Above threshold levels for which the GPD is appropriate, the expected value E(X – λ | x > λ) of the conditional excesses X – λ will vary linearly with λ (Coles, 2001). In some cases, only very little data may be present in the last stable linear part of the mean excess curve; in this case it is recommended to perform the GPD estimation for multiple threshold levels. This approach will be illustrated in the application to both the PGA and residual data.

In the second step, a weighted least-square fit with respect to the scale and shape parameters (δ and c) is performed on the empirical distribution of the data in the minus log-exceedance plot:

[ ()λ δ − 2 ] min wi LGPD (xi | , ,c) Lemp (xi ) δ ,c − λ ⎧ = − []− = 1 ⎡ + c(x )⎤ (8) ⎪LGPD (x) ln 1 FGPD (x) ln 1 where ⎨ c ⎣⎢ δ ⎦⎥ ⎪ = − []− ⎩Lemp (x) ln 1 Femp (x)

Because the estimate is obtained using a similar least-squares estimation procedure for GPD as for the GEVD, the estimator is unbiased. If the data set is not too small, the covariance matrix Σ of the parameters can most readily be obtained using the asymptotic normality assumption of

6 maximum likelihood estimators and is given by the minus inverse of the Fisher information matrix (Castillo, 1988) given by Equation (4), in which θi and θj represent any combination of the scale and shape parameters δ and c in the GPD. These parameter uncertainties can in turn be used to estimate the confidence bounds on any percentile of interest. Confidence bounds can be obtained using analytical approximations or using bootstrap resampling methods (Efron and Tibshirani, 1993). For instance, using a assumption, the first-order second moment estimate of the 90% confidence bounds for the maximum attainable value of a light-tailed (c < 0) random variable X is given by:

⎡ δ δ ⎤ λ − −1.64σ ,λ − +1.64σ ⎣⎢ c c ⎦⎥ (9) Var(δ ) + Var(c)(δ / c)2 − 2Covar(δ ,c)δ / c where σ = c and Var(δ) and Var(c) denote the variance of the scale and shape parameter, respectively, and Covar(δ, c) is the covariance of the scale and shape parameter.

Sensitivity to the threshold level can be assessed in two different ways. The first method uses the invariance of c for increasing threshold level λ. When λ is selected sufficiently high such that only points in the tail are included in the GPD estimation, the estimates for c will be invariant with respect to the threshold λ. It should be pointed out, however, that statistical instability will adversely affect the accuracy of the tail heaviness and scaling parameter estimates when the threshold λ is chosen so high that only very few points remain in the tail and can be used for parameter estimation. An excellent illustration of this property is given in Castillo et al. (1995) where peak values of the wave heights are analyzed. Alternatively, Caers and Maes (1998) proposed an approach where the optimal choice for the threshold λ is selected such that the total mean square error (consisting of the sum of the squared bias and variance) of the estimator is minimized.

Tail Equivalence Tail equivalence is an important concept in the characterization of tail behavior. It provides a criterion for the quality of an approximation of a CDF F1(x) by another distribution F2(x). For instance, F1(x) could be the empirical distribution of the raw data, whereas F2(x) is a parametric distribution such as a Gumbel, lognormal, or GPD. Two distributions, F1(x) and F2(x), are considered to be right-tail equivalent if for large values of the variable x:

1− F (x) lim 1 = 1 (10) x→∞ − 1 F2 (x)

In other words, two distributions are right-tail equivalent if they ultimately have the same tail behavior, irrespective of what the distribution of the bulk of the data looks like. An example of such tail equivalence is given in Figure 2, which compares the pore size distribution associated with two different aluminum casting processes. Although the CDF FX(x) at x or the exceedance probability (or complementary CDF) 1 – FX(x) is quite different for the two processes (left in

7 Figure 2), the tail plot on the right of Figure 2 indicates that the two distributions are essentially tail equivalent. This is conceptually illustrated in Figure 3, which shows that the ratio of the exceedance probabilities for the probability distributions F1(x) and F2(x) of the two processes converges to unity for large pore sizes. Note that the largest data points are excluded from Figure 3 because the empirical CDF value of the upper order statistics is necessarily highly uncertain. In practical applications, tail equivalence is not so much identified on the basis of the raw data but rather by comparing the shape parameters of the two distributions. If considerable overlap exists between the confidence bounds of the two shape parameters, it can be concluded that the distributions are tail equivalent.

The tail equivalence between the GPD and the GEVD distribution families follows immediately from evaluating the definitions for x → ∞.

−1/ c ⎡ x − λ ⎤ 1+ c − − ⎢ δ ⎥ 1 FGPD (x) = 1 FGPD (x) = ⎣ ⎦+ = lim − 1 (11) x→∞ 1− F (x) − ln[]F (x) − λ 1/ c GEVD GEVD ⎡ + x ⎤ ⎢1 c ⎥ ⎣ δ ⎦+

Consequently, although the GPD distribution is used to model all large values whereas the GEVD models all extremes (regardless whether they exceed a particular threshold or not), each of the GPD types is tail equivalent with an extreme value distribution (Table 1).

Application to the Chi-Chi Aftershock PGA Data As a first example, the PGA records from two well-recorded aftershocks of the 1999 Chi-Chi earthquake are analyzed. These data are part of the PEER NGA database (non-public version 7.3, 02-08-06), which is discussed in detail in Chiou and Youngs (2006) and shown in Figure 4. Both aftershocks have moment magnitude of 6.2 and reverse faulting mechanism. Together, there are 564 records.

This data set was chosen because it provided a sufficiently large sample associated with aftershocks of the same magnitude and similar source characteristics and the recording sites have similar crustal characteristics. Peak-over-threshold analysis techniques could be directly applied to data grouped by distance bins without the use of any type of regression modeling, which might influence the tail behavior. However, it should be noted that other factors, such as site conditions and path characteristics, can introduce sample bias. Therefore, the peak-over- threshold analysis on PGA raw data is exploratory in nature.

Some exploratory modeling is performed first on the data set to become acquainted with its tail characteristics. The empirical distribution of the PGA was analyzed and the appropriateness of the traditional lognormal distribution model was assessed. To this extent, the Chi-Chi data set was partitioned into bins according to their closest-distance to ruptured area (r) and the empirical distribution of the peak ground acceleration was plotted on lognormal probability paper. Figure 5 shows that the data generally follow a lognormal distribution, i.e. a straight line. Because the slope in the lognormal plot is essentially identical for all closest distance bins, it can be concluded that the coefficient of variation is independent of the distance r.

8

The log-exceedance or tail plot in Figure 6 shows the behavior of the extreme upper tails. With the exception of the 35-45 km (22-28 mi) distance bin, the PGA distribution seems to have a finite upper bound. Although insufficient data exist to draw firm conclusions, it does not seem unreasonable to assume that the lack of upper bound in the 35-45 km (22-28 ) series is perhaps caused by statistical fluctuations. Figure 6 also reveals that the maximum PGA decreases with increasing distance.

Because all data were included in Figure 6 (equivalent to a threshold value λ = 0 g), the exploratory modeling gives only a general indication of the tail behavior which will now be quantified. Since it is generally accepted that log(PGA) may be proportional to log(distance), equal log(r) bins were created and the number of data within each bin is recorded in Figure 7. Given that the tail region typically contains only a few percent of all data, many of the distance range bins contain insufficient data for a tail analysis to be performed. The tail fitting procedure will be illustrated using the 130 data points in the 50 to 70 km (31-44 mi) range.

In a first step, the threshold λ must be determined. Several procedures can be used for this (Coles, 2001). Here we demonstrate a straightforward graphical procedure, based on the conditional mean excess E(PGA – λ | PGA > λ), where E(.) denotes the expectation operator. If the random variable follows a generalized Pareto distribution, then the conditional mean excess is a linear function of the threshold level λ. This leads to a selection procedure for λ, as the start point of the last linear segment of the conditional mean excess curve. Figure 8 shows the conditional mean excess as function of the threshold level λ. The resulting threshold level is indicated in Figure 8; λ = 0.1275g. Unfortunately, only 9 data points exceed this level; a sample size that is perhaps too small to draw a meaningful inference. It therefore makes sense to also consider λ = 0.085g and even λ = 0.065g as a possible threshold level.

The GPD parameters and other tail statistics that are associated with each of the threshold levels are summarized in Table 2. The shape parameter is essentially constant for all three threshold levels. This illustrates the invariance of the tail heaviness index with respect to a shift of the threshold deeper into the tail region. Since the tail heaviness is negative, the GPD is bounded in this case and an upper bound limit for the PGA can be estimated. Figure 9 shows the raw Chi- Chi PGA data as well as the GPD fit and 95th percentile of the upper limit; the figure also shows where the general lognormal model (with parameters calculated from the statistical moments of all the PGA data) would fall. It can be concluded that – in the tail region – the lognormal model significantly overestimates the PGA for a given reliability level 1 – FX(x) or log-exceedance value. The different shapes of the GPD and the lognormal illustrate that the two distributions are not tail equivalent; the lognormal distribution belongs to the unbounded Gumbel domain of attraction (Castillo, 1988) whereas the data clearly suggest convergence to a bounded Weibull extreme value distribution.

It must therefore concluded that, while the unbounded lognormal PGA gives increasingly large values for PGA as a function of the return period, the GPD model for the 50-70 km (31-44 mi) distance bin has a finite upper bound on the PGA. For exceedance probabilities less than 7%, i.e. the extent of the tail region, the lognormal model assumption overestimates the maximum PGA. The tail of the lognormal distribution is much longer than for the GPD model.

9

Because δ and c are subject to statistical uncertainty, the maximum PGA value will be uncertain as well. First-order approximations to the 95th percentile of the upper limit are listed in Table 2. It can be seen that the confidence intervals grow progressively larger with increasing threshold level λ.

Application to PGA Residuals As a second example, a peak-over-threshold analysis is performed on the total residuals of the PGA. The total residual includes both the inter-event and intra-event residuals and is defined as:

ε = ()− ( ) ln PGAobserved ln PGApredicted (12) where PGA predicted is the peak ground acceleration calculated using a ground motion model for the chosen earthquake and recording site.

In the literature, numerous empirical ground motion models, often known as ground motion attenuation models, have been proposed to describe the dependence of ground motion parameters on earthquake magnitude, source-to-site distance, and other source, travel path, and local site soil and rock characteristics. The most comprehensive and integrated effort to develop ground models is the recent “Next Generation of Ground Motion Attenuation Models” (NGA) project organized by Pacific Earthquake Engineering Research Center (Chiou et al., 2006). Five NGA ground motion models have been developed, including ground motion relations of Abrahamson and Silva (2007). The Abrahamson and Silva NGA relations are used in this paper to calculate the predicted PGA and the PGA residuals.

Conducting a peak-over-threshold analysis on PGA residuals of an attenuation model instead of the PGA records themselves allows combining data from multiple events with different parameters that affect ground motion characteristics, including among others, the earthquake magnitude, site-to-source distance, site condition, style of faulting. Although the tail analysis of the residuals implicitly assumes that the statistics of the residuals are unaffected by the specific values of the attenuation model parameters (i.e. the statistics associated with one type of faulting are no different than for another type of faulting), a much larger size sample is obtained in a residual analysis than in a direct assessment of the PGA, and this, in turn, results in narrower confidence bounds on the tail behavior.

The PEER NGA database (http://peer.berkeley.edu/products/rep_nga_models.html) was used to calculate the PGA residuals using the NGA ground motion relations developed by Abrahamson and Silva (2007). The peak ground acceleration values are equal to the geometric average of the two orthogonal horizontal components orientated randomly, as defined in NGA flatfile documentation (http://peer.berkeley.edu/products/nga_flatfiles_dev.html).

In residual calculations using the Abrahamson and Silva (2007) ground motion relations, we excluded the earthquake recordings that were considered not applicable and were excluded by Abrahamson and Silva (2007) based on their data selection criteria. These excluded earthquake recordings and bases for exclusion are given in Table D-1 of Abrahamson and Silva (2007). In

10 addition, we excluded the four Chi-Chi aftershocks that require a separate distance attenuation model. These earthquakes are identified by earthquake identification numbers (EQID) 172, 173, 174, and 175 in the NGA flatfile. These earthquakes were excluded mainly because the calculated residuals show considerable bias by distance, and we were not able to determine the cause. Also, the use of different regression models within the same data set may introduce additional uncertainty in terms of residual distribution. Instead, selected Chi-Chi aftershock recordings were used for peak-over-threshold analysis on the raw PGA data as documented in a previous section of this paper.

A considerable fraction (110 earthquakes) of the NGA flatfile does not have finite-fault models and, therefore, is missing important source parameters such as rupture distance, Joyner-Boore distance, depth to top of fault rupture, fault rupture width, source-to-site azimuth, hanging wall and footwall indicators, etc. The majority of the earthquakes without finite-fault models have magnitudes less than 6.0. Youngs (2005) developed finite-fault models for these earthquakes based on simulation of 101 possible rupture planes for each earthquake and supplemented the NGA database with simulated source parameters. These simulated parameters were used by the NGA developers in their ground motion model development. Therefore, two sets of residuals were considered in peak-over-threshold analysis, one with the simulated finite-fault data provided by Youngs (2005) and one without.

A third set of residuals was extracted from the PGA residual data calculated by Boore and Atkinson (2007) using their NGA equations and selective NGA earthquake recordings. This data is posted on NGA website (http://peer.berkeley.edu/products/Boore-Atkinson-NGA.html, file named ba_gm_resids.s1_resids_02apr07.pga). Boore and Atkinson (2007) defined their residual as the ratio of observed ground motion to predicted ground motion. We converted their residual values to the residual values defined by equation (12).

The summary statistics of the regression residuals of both the Abrahamson-Silva and Boore-Atkinson models are listed in Table 3. Their residuals generally follow a lognormal distribution, as shown in Figure 10. A complete GPD analysis was performed on the two separate sets of Abrahamson-Silva residuals: a set with simulated finite fault data and a set without simulated finite fault data. Complete summary statistics for the tail region of the Abrahamson-Silva model residuals are given in Table 4.

The threshold selection plot for the Abrahamson-Silva model with finite fault is given in Figure 11; the tail plot is given in Figure 12 for λ = 0.9. The lognormal distribution fitted to all residuals is also indicated on Figure 12. Although there is no overwhelming difference between the GPD and lognormal distribution assumption within the range of data, considerable differences appear when predictions need to be made beyond the data range. The GPD model is able to better capture the curvature in the data. In addition, theoretical considerations the use of the GPD model beyond the data range.

The threshold selection plot for the Abrahamson-Silva model without finite fault is given in Figure 13; the tail plot along with the lognormal distribution fitted to all residuals is given in Figure 14 for λ = 0.7. Figure 14 shows that the GPD model captures the true tail behavior much better than the lognormal distribution (which was fitted to all residual data) does. Although the

11 best estimate of the GPD shape parameter is close to zero (see Table 4), it is significantly different from zero (its coefficient of variation is 12%) and the residual distribution is therefore bounded.

The threshold selection plots (Figure 11 and Figure 13) indicate that the identification of the threshold level λ where the tail begins is not always straightforward. If the data follow a GPD distribution, the shape parameter of the GPD will be invariant with respect to a shift of the threshold λ to higher values. Figure 15 illustrates this property for the Abrahamson-Silva model with finite fault data. If the threshold is selected too low, e.g. λ < -1.0, a large number of data points which clearly do not belong to the tail (i.e. they are not part of the last linear part of the mean excess curve in Figure 11) will be included in the estimation of the shape parameter and this will result in a biased estimate for the parameter c (see Figure 15). Figure 15 shows that the shape parameter c converges as the threshold is increased. On the other hand, if the threshold is moved too far into the tail, very few data will remain to estimate the c-value, which results in numerically unstable estimates with considerable statistical uncertainty (wide confidence bounds in Figure 15). In practice, the optimal threshold value achieves a compromise between bias and variance of the estimate (Caers and Maes, 1998).

Implication for Seismic Hazard Analyses Thousands of strong motion recordings have shown a considerable degree of scatter or uncertainty (aleatory variability) in ground motion parameters, including peak ground acceleration. As mentioned previously, numerous predictive models (attenuation equations) have been developed to characterize the ground motion distribution and to quantify the uncertainty, for example the NGA models (http://peer.berkeley.edu/products/rep_nga_models.html) and those published in the 1997 special issue of Seismological Research Letters (volume 68, number 1).

Aleatory variability in ground motion prediction and the way it is integrated in seismic hazard analyses have significant effects on the calculated hazard. For decades, the existence of ground motion variability has been recognized (Bender, 1984) and the formulation of hazard calculation has included integration across the scatter in ground motion (Cornell, 1971; Marz and Cornell, 1973; McGuire, 1976). However, in practice, as indicated by Bommer and Abrahamson (2006), ground motion variability was often neglected or its effect was artificially reduced by various means in studies carried out in the 1970s and 1980s. Truncating the lognormal distribution at a predefined level of ground motion, often specified as the median ground motion plus a number of standard deviations, is an example of various attempts to reduce the effect of ground motion variability. Bommer and Abrahamson (2006) conducted extensive review and analysis and emphasized the importance of including ground motion variability in PSHA. They concluded that neglecting ground motion variability in earlier studies is the main reason that these studies yielded much lower ground motion hazard than the probability hazard studies performed in recent years.

Probability seismic hazard calculations should be carried out in the way that integrates across the full range of variability in ground motion. However, such integration should be performed over a probability density function that better characterizes the ground motion distribution than the widely used lognormal distribution. A lognormal distribution, when unbounded, gives a nonzero

12 probability to ground motion values so high that they are not physically possible. Bounded lognormal distributions, on the other hand, may distort the data, render the shape of the hazard curve artificial, and produce erroneous hazard results, particularly at low annual exceedance probabilities. In addition, the selection of the truncation level is somewhat arbitrary. This section demonstrates that probabilistic hazard integration should be performed over a density that is composite: a lognormal distribution supplemented by a GPD in the tail region. In the GPD, as detailed in the previous sections, PGA would gradually approach an upper bound.

The Assumption of Lognormal Distribution and Probability Calculation To discuss the effects of ground motion variability and the way it is integrated in PSHA on the calculated hazard, we need to briefly review the basics of PSHA calculations, starting with the PDF of the lognormal distribution for PGA. This PDF can be written as: 1 ⎧ − ( y−μ )2 1 σ 2 Y ⎪ e 2 Y PGA> 0 f (y; μ ,σ ) = πσ (13) Y Y ⎨ 2 Y (PGA) ⎪ ⎩0 PGA= 0 where Y = ln (PGA) is a random variable with normal distribution that has a mean, μY, and , σY. Please note that an uppercase symbol is used for a random variable, whereas the lowercase denotes a specific sample value of this random variable. This notation is consistent with standard practice in stochastic mechanics. The median and standard deviation are obtained from ground motion models such as Abrahamson and Silva (2007). For a given earthquake with magnitude m, the probability that the ground motion at distance r exceeds a particular value a0 is the complementary of the CDF and can be written as: − 1 −μ 2 ( y Y ) a0 1 σ 2 ≥ = − 2 Y = > F(Y lna0 | m,r) 1 e d(PGA) Y ln(PGA) 0 (14) ∫0 πσ 2 Y (PGA)

The integration is performed numerically. Because the CDF of a lognormal distribution can be easily obtained from the CDF of a standard normal distribution, often denoted as Φ(Z), by means of transformation, Equation (14) can be written as:

⎛ ln a − μ ⎞ F(Y ≥ ln a | m, r) = 1− Φ()Z = 1− Φ⎜ 0 Y ⎟ Y = ln(PGA) > 0 (15) 0 ⎜ σ ⎟ ⎝ Y ⎠ lna − μ where Z = 0 Y is a standard normal random variable. Φ(Z) has been extensively tabulated σ Y and is readily available in many statistics handbooks.

Suppose the site of interest is affected by a total of N earthquake sources, each of which has an associated magnitude Mi, distance Ri, and annual rate of occurrence vi. An earthquake source can produce multiple earthquakes that can occur at various locations of the source. Therefore, Mi and Ri are random variables, each having its own distribution. If fMi(m) and fRi(r) are used to denote the PDFs of magnitude and distance variables, respectively, for the ith source, then the total annual probability that the ground motion exceeds a0 is (total probability theorem):

13 N P[Y ≥ lna ] = v F(Y ≥ lna | m,r) f (m) f (r)dmdr (16) 0 ∑ i ∫∫ 0 Mi Ri i=1

The effect of ground motion variability and its integration is manifested in the calculation of ≥ F(Y lna0 | m,r).

For demonstration purposes, we use a simple case that is similar to the hazard calculation example used by Field (2006). This example assumes a rock site condition. It further assumes that there are two potential sources, both are vertical strike-slip faults, and both are located at a rupture distance of 15 km from the site (r1 = r2 = 15 km). The first produces an M 5.0 earthquake once every 20 years (m1 = 5.0, v1 = 1/20), and the second produces an M 7.0 earthquake once every 300 years (m2 = 7.0, v2 = 1/300). For the chosen magnitudes, distances, and rates of occurrence, Equation (16) is simplified to give the total annual probability that the ground motion exceeds a0 as:

≥ = ≥ + ≥ P[Y lna0 ] v1F1(Y1 lna0 | m1,r1) v2F2 (Y2 lna0 | m2 ,r2 ) (17)

The NGA ground motion relations developed by Abrahamson and Silva (2007) are used to obtain the median PGAs of 0.0794 g ( μ = -2.533) and 0.1636 g ( μ = -1.810), respectively, for Y1 Y2 the M 5.0 and M 7.0 earthquakes. The total standard deviation for ln(PGA) is calculated to be ≥ 0.7449 and 0.5336, respectively for the M 5.0 and M 7.0 earthquakes. F1(Y1 lna0 | m1,r1 ) and ≥ F2 (Y2 lna0 | m2,r2 ) are calculated for various values of a0 by numerically integrating over the entire lognormal distribution without truncation. The total annual probabilities of exceeding various PGA levels are then calculated using Equation (17) and plotted as hazard curves shown in Figure 16. The results show that at an annual exceedance probability of 10-7, the PGA exceeds 2.0 g, and at an annual exceedance probability of 10-8, the PGA reaches 3.5 g.

Truncated Lognormal Distribution If the lognormal distribution is truncated at a PGA value of atrun, the PDF needs to be lna − μ renormalized [divided by Φ(z ), where z = trun Y ] so that it integrates to one when the trun trun σ Y maximum PGA (i.e., atrun) is reached:

1 ⎧ − ( y−μ )2 1 1 σ 2 Y ⎪ e 2 Y PGA≤ a f (y; μ ,σ ) = Φ πσ trun (18) Y Y ⎨ (ztrun) 2 Y (PGA) ⎪ ⎩0 Otherwise

The probability that a given earthquake with magnitude m produces PGA at distance r exceeding a particular value a0 is then:

14 − 1 −μ 2 ⎧ ( y Y ) 1 a0 1 σ 2 ⎪ − 2 Y ≤ ≥ = 1 e d(PGA) PGA atrun F(Y lna0 | m,r) ⎨ Φ ∫0 πσ (19) (ztrun) 2 Y (PGA) ⎪ > ⎩ 0 PGA atrun

Again, denoting the integration of lognormal distribution as Φ(Z) by means of transformation, Equation (19) can be simplified as:

⎧ 1 []Φ(z ) − Φ()Z PGA≤ a ≥ = ⎪Φ trun trun F(Y lna0 | m,r) ⎨ (ztrun ) (20) ⎪ > ⎩0 PGA atrun

Using the same simple two-source example used in the case of unbounded lognormal distribution, the total annual probabilities of exceeding various PGA levels are calculated with the lognormal distribution truncated at various levels. Figure 17 compares hazard curves with no truncation and with truncations at a PGA of median plus 1, 2, 3, and 4 times standard deviation, respectively.

The maximum PGA for the various truncation levels is calculated using: μ + σ = Y n Y atrunc e (21) where n is the number of standard deviations and μY and σY are the mean and standard deviation of ln(PGA), as defined in Equation (13). Because the maximum PGA is limited to the value determined by Equation (21) for each earthquake scenario, the total PGA is also limited. Hazard curves are bent toward the maximum PGA. More severe bending of the calculated hazard curve corresponds to the more severe truncation (i.e., smaller n value).

Note that the hazard curves show an abrupt change in curvature that is more pronounced in some cases (e.g., truncation at one standard deviation) than others. This is the result of combining hazard curves from two earthquakes that are significantly different in magnitude; each has its limited maximum PGA. Figure 18 shows the individual and total hazard curves for truncation at three standard deviations. The shape of the total hazard curve is dominated by the smaller earthquake at the higher probability level and by the larger earthquake at a very low probability level. This effect would not be as significant if a range of earthquake magnitudes is considered, in which case the total hazard curve would smooth out due to the more continuous contributions from earthquakes of varying magnitude. However, it is still an artificial effect due to truncation of ground motion.

Although truncating the lognormal distribution of ground motion at a specified value of uncertainty is a common practice in PSHA, truncation below 3 standard deviations is not supported by the analyses of empirical strong-motion data in this study or other studies (Abrahamson, 2000; Bommer et al., 2004; Bommer and Abrahamson, 2006). As Bommer and Abrahamson (2006) indicated, truncating at a level that is above 3 standard deviations has little effect on the hazard curves for the return periods generally used in engineering design or even at

15 the 10,000-year level employed in the nuclear industry in some cases. This observation is further confirmed by Figure 17.

Truncation will, however, affect the calculated hazard if longer return periods are of interest, (e.g., above 106 years in the case of postclosure performance assessment for geological disposal of nuclear wastes). Because the selection of truncation levels is often arbitrary, the calculated ground motions at these longer return periods are also arbitrary. Furthermore, truncating ground motion uncertainty potentially means ignoring (i.e., simply throwing away) a portion of the recorded strong motion data that has low probability of occurring, whereas it is precisely these low probability occurrences that control the hazard at very long return periods.

Figure 19 shows deaggregation results with respect to ground motion uncertainty (ε) for a hypothetical site. At lower ground motion (higher probability level or shorter return period), the PDF of ε is skewed to the left, whereas at higher ground motion (lower probability level or longer return period), the PDF of ε is skewed to the right. Figure 19a (upper figure) suggests that modeling the left tail of the residual is important for low return periods, whereas it is the right tail that is of interest in the prediction of low probability events (Figure 19b, lower figure). In this example, the ground motion at low annual exceedance probability levels is controlled by uncertainty up to 3 or 4 standard deviations. Therefore, correctly modeling the uncertainty in the high-density region of the deaggregation diagram in Figure 19b is critical if ground motions associated with very low annual exceedance probabilities are of interest. In the current state of practice, all of the existing PSHA codes assume lognormal distribution of ground motion and therefore the integration for hazard is all based on lognormal distribution. Figures 9, 12, and 14 indicate that significant differences between the recorded data and the assumed lognormal distribution model may exist in the tail region and when extrapolated beyond the data range. Using a composite distribution model (lognormal supplemented by the GPD in the tail region) may result in significantly more accurate estimates of the hazard.

Combined Lognormal and Generalized Pareto Distribution In this section, we demonstrate hazard calculation using a composite distribution model that combines lognormal distribution and GPD for peak ground acceleration. Hazard integration is over the lognormal distribution until a threshold PGA value, aλ, is reached. For the portion with PGA greater than aλ (i.e., tail region), the integration is performed over the GPD. This is achieved by combining Equation (20) and the CDF for GPD [Equation (5)] and renormalizing the distribution, which yields the probability that a given earthquake with magnitude m producing PGA at distance r exceeding a particular value a0 as:

⎧ Φ(Z) 1− (1− p ) ln(PGA) − μ ≤ λ ≥ = ⎪ tail Φ ln(PGA) F(Y lna0 | m,r) ⎨ (zλ ) (22) ⎪ []− − μ > λ ⎩ptail 1 FGPD (PGA) ln(PGA) ln(PGA)

lnaλ + μ where z = ln y , with a = exp(λ + μ ) , corresponding to PGA when λ σ λ ln y y

16 −1/ c ⎡ ()ln(PGA) − μ − λ ⎤ − μ = λ = − + ln(PGA) ln(PGA) ln(PGA) ; FGPD (PGA) 1 ⎢1 c ⎥ ⎣ δ ⎦

λ, δ, and c are Generalized Pareto Density parameters defined in Equation (5) and given in Table 4 for PGA residuals; ptail is the fraction of recorded data in the tail region (see Table 4).

Again, using the same two-source example as before, the total annual probabilities of exceeding various PGA levels are calculated using the combined lognormal distribution and GPD. Hazard examples are calculated using two separate sets of GPD parameters: the one with simulated finite fault data and the one without simulated finite fault data. Figures 20 and 21 show total hazard curves with lognormal distribution without truncation, lognormal distribution with truncations at the threshold and upper bound PGAs (threshold and upper bound values mark the beginning and end of the GPD tail model, respectively), and the composite distribution model for each case, respectively. The threshold PGA is defined in Equation (22). The upper bound PGA is calculated similarly, but using the best estimate of upper bound value in Table 4 in the place of tail threshold value λ. The threshold and upper bound PGAs are found to be equal to the median PGA plus 1.12 and 2.83 times standard deviation, respectively, for the case with finite fault data. The threshold and upper bound PGAs are calculated to be the median PGA plus 0.94 and 4.55 times standard deviation, respectively, for the case without simulated finite fault data.

These figures show that, in general, hazard estimated using the composite distribution model is lower than hazard estimated using the lognormal distribution without truncation for all annual probabilities below about 10-2. The difference becomes increasingly more significant as the probability level decreases, particularly for the set of residuals calculated with simulated finite fault data (Figure 20). The composite distribution model predicts higher hazard than the lognormal distribution truncated at the threshold PGA and, for most annual exceedance probability, it predicts lower hazard than the lognormal distribution truncated at the upper bound PGA. For the set of residuals calculated with the simulated finite fault data, the difference between predicted hazard level using the composite distribution model and lognormal distribution truncated at upper bound PGA is insignificant. However, the former predicts a smoother hazard curve because the characteristics of the hazard curve are controlled by the smooth transition from the lognormal distribution to GPD and by GPD parameters in the tail region rather than being bent artificially toward an arbitrary cutoff value, as is the case of truncated lognormal distribution.

It is to be note that the GPD parameter values used in our PSHA example are the best-estimate results from the peak-over-threshold analysis. These estimates are associated with uncertainties as indicated by the coefficient of variation and confidence intervals for scale and shape parameters (see Figure 15 for an illustration of the confidence bounds on the shape parameter). Variations in these parameters will affect both the predicted hazard value for a specific probability and the shape of the hazard curve. This is demonstrated by the differences between Figures 20 and 21. Although not quantitatively defined, there is also uncertainty associated with threshold λ. This parameter is important because it defines the probability level at which the composite distribution model will begin to differ from the traditional lognormal distribution. Because the confidence associated with λ is not estimated, the value obtained for this

17 parameter can be subject to debate and, therefore, is considered preliminary. It is likely that the threshold value can be increased with increasing sample size. These uncertainties, however, will not affect our general observations, the applicability of peak-over-threshold analysis to PGA residuals, and the calculation of probability hazard using combined distribution models. To gain confidence in this regard, we performed tail analysis for a set of residuals obtained from an alternative ground motion model, the Boore and Atkinson NGA equations and their selective NGA earthquake recordings (Boore and Atkinson, 2007). The tail analysis shows very similar distribution characteristics to the residuals obtained from the Abrahamson and Silva NGA model without finite fault data, as demonstrated by the GPD parameters in Table 4. There is very good agreement between the upper bound on the maximum residual predicted by the Boore- Atkinson and Abrahamson-Silva model without the finite-fault data.

Conclusion Probabilistic seismic hazard analysis (PSHA) has been used to characterize earthquake ground motion hazard and to establish design input for nuclear facilities and for other engineering structures. However, the characterization of scatter (aleatory variability) in the recorded ground motion parameters, including peak ground acceleration (PGA), is subject to considerable debate. The state of practice is to use a lognormal distribution to model ground motion variability. Although the lognormal distribution assumption works quite well for moderate levels of the exceedance probability, it leads to unrealistically high values of PGA when predictions are made for extremely small exceedance probabilities (i.e. long return periods). In this paper, the validity of this lognormal distribution assumption for the PGA is assessed using statistical peak-over-threshold analysis techniques.

Both raw ground motion recordings from the PEER NGA database and the attenuation model residuals were used to demonstrate the approach. The statistical analysis indicated that although the PGA data or the attenuation residuals generally follow a lognormal distribution, significant deviations from the lognormal model may exist near the upper end of the distribution, where they are better characterized by the Generalized Pareto Distribution (GPD). The GPD distribution is also justified on the basis of theoretical considerations since it arises as the limiting distribution in peak-over-threshold analyses. Although the GPD pertains to only a small part of the distribution, the difference between the lognormal and the GPD is important because it is precisely this upper end that controls the estimate of ground motions at low probabilities. The implications of peak-over-threshold analysis in PSHA are demonstrated using a composite distribution model that consists of a lognormal distribution supplemented by the GPD in the upper tail. Results show that – for the large return periods in the examples considered – the composite distribution model yields considerably lower PGA values or residuals than the unbounded and physically unrealistic lognormal distribution. The peak-over-threshold modeling approach provides a rational basis for determining the PGA or residual upper bound, wherever it exists. In addition, due to the absence of an arbitrary truncation, the resulting hazard curves transition gradually toward that upper bound.

Realistic modeling of low probability ground motions will gain increasing significance with renewed interests in nuclear energy production. The GPD parameters derived in this study are only for PGA and are considered preliminary and specific to the PEER NGA data and the ground motion models employed. Our approach, however, can be applied to more extensive ground

18 motion data and other ground motion models so that the GPD parameters can be derived for more general applications. The approach can also be readily adapted to spectral accelerations and other ground motion parameters. Furthermore, the composite distribution model can be implemented in existing or future PSHA software for practical applications.

Acknowledgments This report was prepared to document work performed by the Center for Nuclear Waste Regulatory Analyses (CNWRA) for the U.S. Nuclear Regulatory Commission (NRC) under Contract No. NRC–02–02–012. The activities reported here were performed on behalf of the NRC Office of Nuclear Material Safety and Safeguards, Division of High-Level Waste Repository Safety. The report is an independent product of the CNWRA and does not necessarily reflect the views or regulatory position of the NRC. The manuscript benefited from discussions with Abou-Bakr Ibrahim, Tianqing Cao, Jon Ake, and Mahendra Shah. The authors would also like to thank Bob Youngs for providing the simulated finite fault data and Sarah Gonzalez (formerly at CNWRA, now NRC) for her contributions to the data collection efforts. We thank Beverly Street and Lauren Mulverhill for editorial support and report preparation. We are also grateful for the technical review by Alan Morris and Budhi Sagar and programmatic review by Sitakanta Mohanty, which greatly improved the quality and clarity of the information in this paper.

References Abrahamson, N. A. and W. J. Silva (2007). Abrahamson and Silva NGA Ground Motion Relations for the Horizontal Component of Peak and Spectral Ground Motion Parameters, draft version 2, July 9, http://peer.berkeley.edu/products/Abrahamson-Silva_NGA.html Abrahamson, N. A. (2000). State of the practice of seismic hazard assessment. Proceedings of GeoEng 2000, Melboume, Australia, November, 19–24, 2000. 1, 659–685. Bechtel SAIC Company, LLC, (2004). Technical Basis Document No. 14: Low Probability Seismic Events, Rev. 01. Bechtel SAIC Company, LLC, Las Vegas, Nevada. Accession Number: ML041880094. (http://www.nrc.gov/reading-rm/adams.html). Bender, B. (1984). Incorporating acceleration variability into seismic hazard analysis. Bull. Seism. Soc. Am., 74, no. 4, 1451–1462. Bommer, J. J., N. A. Abrahamson, F. O. Strasser, A. Pecker, P-Y. Bard, H. Bungum, F. Cottom, D. Faeh, F. Sabetta, F. Scherbaum, and J. Studer (2004). The challenge of defining the upper limits on earthquake ground motions, Seism. Res. Lett. 75, no. 1, 82–95. Bommer, J. J. and N. A. Abrahamson (2006). Why do modern probabilistic seismic-hazard analyses often lead to increased hazard estimates?, Bull. Seism. Soc. Am., 96, no. 6, 1967–1977. Boos, D. D. (1984). Using to estimate large percentiles, Technometrics, 2 no. 1, 33–39. Boore, D. M and G. M. Atkinson (2007). Boore-Atkinson NGA Ground Motion Relations for the Geometric Mean Horizontal Component of Peak and Spectral Ground Motion Parameters. May, http://peer.berkeley.edu/products/Boore-Atkinson-NGA.html Caers, J., M. A. Maes (1998). Identifying tails, bounds and end-points of random variables, Structural Safety, 20 no. 1, 1–23.

19 Castillo, E. (1988). Extreme Value Theory in Engineering, Statistical Modeling and Decision , Academic Press Inc., Boston, MA. Castillo, E., A. S. Hadi, and J. M. Sarabia (1995). Statistical analysis of extreme waves, Proceedings of the 14th ASME Offshore Mechanics and Arctic Engineering Conference, Copenhagen, Denmark, 2, 33–40. Chiou, B., M. Power, N. Abrahamson, and C. Roblee. 2006. An Overview of the Project of Next Generation of Ground Motion Attenuation Models for Shallow Crustal Earthquakes in Active Tectonic Regions, Paper No. B11, Fifth National Seismic Conference on Bridges & Highways, San Francisco, CA, September 18-20 Chiou, B. S. and R. R. Youngs (2006). Chiou and Young’s PEER-NGA empirical ground motion model for the average horizontal component of peak acceleration and pseudo-spectral acceleration for spectral periods of 0.01 to 10 seconds, Interim Report for USGS Review. Coles, S. (2001). An introduction to statistical modeling of extreme values, Springer Series in Statistics, Springer, London, United Kingdom. Cornell, C. A. (1971). Probabilistic Analysis of Damage To Structures Under Seismic Loads– Dynamic Waves in Civil Engineering, D. A. Howells, I. P. Haigh, and C. Taylor (Editors), John Wiley& Sons, New York, 473–488. Corradini, M. L. (2003). Board comments on February 24 panel meeting on seismic issues. Letter (June 27) to Dr. Margaret S.Y. Chu, DOE, Office of Civilian Radioactive Waste Management. U. S. Nuclear Waste Technical Review Board, Washington, DC:. . CRWMS M&O (1998). Probabilistic Seismic Hazard Analyses for Fault Displacement and Vibratory Ground Motion at Yucca Mountain, Nevada, Final Report. I. G. Wong and J. C. Stepp, coords. Report SP32IM3, CRWMS M&O, Oakland, California. DOE (2004). Transmittal of Report Technical Basis Document No. 14 Low Probability Seismic Events, Revision 1, Addressing Key Technical Issue (KTI) Agreements Related to Earthquake Effects. DOE, Las Vegas, Nevada. Accession Number: ML041880091. (http://www.nrc.gov/reading-rm/adams.html) Drees, H., L. de Haan, and S. Resnick (200). How to make a Hill plot, The Annals of Statistics, 28 no.1, 254–274. Efron, B. and R .J. Tibshirani (1993). An Introduction to the Bootstrap, Chapman & Hall, New York City, New York. Field, N. H. (2006). Probabilistic Seismic Hazard Analysis, A Primer, (http://epicenter.usc.edu/docs/PSHA_Primer_v2.pdf). Hill, B. M. (1975). A simple and general approach to ingerence about the tail of a distribution, Annals of Statistics, 3, 1163–1174. Hosking, J. R. M. and J. R. Wallis (1987). Parameter and quantile estimation for the generalized Pareto distribution, Technometrics, 29, no 3, 339–346. Hsieh, P.-H. (1999). Robustness of tail estimation, Journal of Computational and Graphical Statistics, 8, no. 2, 318–332. Kokajko, L. E. (2005). Pre-Licensing Evaluation of Key Technical Issue Agreements: Structural Deformation and Seismicity 2.01, 2.01 Additional Information Needed-1, 2.02, 2.04, 2.04 Additional Information Needed-1; Repository Design and Thermal Mechanical Effects .01, 2.02, 3.03; Container Life and Source Term 3.10; and Total System Performance Assessment and Integration 3.06. Technical Basis Document 14, Low Probability

20 Seismic Events. Letter (April 13) to J.D. Ziegler, U.S. Department of Energy, NRC, Washington, DC. ADAMS ML043450673. Maes, M. A., and K. W. Breitung (1993). Reliability-Based Tail Estimation in Probabilistic Structural Mechanics: Advances in Structural Reliability Methods, P. D. Spanos, Y.-T. Wu (Eds.), Springer, Berlin, 335–346. Maes, M. A. (1995). Tail heaviness in structural reliability, Proceedings of ICASP 7, Paris, France. Marz, H. A. and C. A. Cornell (1973). Seismic risk based on a quadratic magnitude-frequency law, Bull. Seism. Soc. Am. 63, no. 6, 1999-2006. McGuire, R. K. (1976). FORTRAN computer program for seismic risk analysis, U.S. Geol. Surv. Open-File Rept. NRC (April 2005). NUREG–1762, Integrated Issue Resolution Status Report, Rev. 1, NRC, Washington, DC. Pickands, J. (1975). Statistical inference using extreme order statistics, The Annals of Statistics, 3, 119–131. Pisarenko, V.F. and D. Sornette (2003). Characterization of the Frequency of Extreme Earthquake Events by the Generalized Pareto Distribution, Pure and Applied Geophysics 160, 2343-2364. Schlueter, J. (2000). U.S. Nuclear Regulatory Commission/U.S. Department of Energy Technical Exchange and Management Meeting on Igneous Activity (August 29–31, 2000). Letter to S. Brocoum, NRC, Washington, DC: NRC. ADAMS ML003763285. www.nrc.gov/waste/hlw-disposal/publi-involvement/mtg-archive.html#KTI. Singpurwalla, N.D. (1972). Extreme values from a lognormal law with applications to air pollution problems, Technometrics, 19, no. 3), 703-711. Youngs, R. R. (2007). Estimation of distance and geometry measures for earthquakes without finite-fault rupture models, Note to NGA Developers, December 8, 2005 (obtain from Dr. Youngs by e-mail).

21 Figure Captions

Figure 1: Three types of tail distributions

Figure 2: Empirical distributions of the surface pores obtained with two aluminum casting processes

Figure 3: Tail equivalence of pore distributions in two cast aluminum microstructures (source: private data)

Figure 4: Plot of Peak Ground Acceleration data in PEER NGA database

Figure 5: Lognormal probability plot for Chi-Chi Peak Ground Acceleration data

Figure 6: Log-exceedance plot for Chi-Chi Peak Ground Acceleration data (for exploratory purposes only because assumed threshold λ = 0)

Figure 7: Number of Peak Ground Acceleration records in each distance bin

Figure 8: Mean conditional excess plot for 50 < r < 70 km (31 < r < 44 mi). A threshold λ = 0.1275 can be clearly identified as the beginning of the last stable, linear part of this plot.

Figure 9: Comparison of Generalized Pareto Distribution fit with 95% confidence upper bound and lognormal distribution for 50 < r < 70 km (31 < r < 44 mi).

Figure 10: Assessing the normality of the attenuation residuals ε = ln(PGAoberserved) – ln(PGAregression)

Figure 11: Conditional mean excess plot of the residual in the Abrahamson-Silva attenuation model with the finite fault data.

Figure 12: Comparison of Generalized Pareto Distribution fit with 95th percentile upper bound and lognormal distribution for the residual in the Abrahamson-Silva attenuation model with the finite fault data.

Figure 13: Conditional mean excess plot of the residual in the Abrahamson-Silva attenuation model without the finite fault data.

Figure 14: Comparison of Generalized Pareto Distribution fit with 95th percentile upper bound and lognormal distribution for the residual in the Abrahamson-Silva attenuation model without the finite fault data.

Figure 15: Best estimate and 95th percentile intervals of the Generalized Pareto Distribution shape parameters as function of the threshold λ for the residual in the Abrahamson-Silva attenuation model with the finite fault data.

22

Figure 16: The total Peak Ground Acceleration exceedance rate (hazard curves) for the two earthquake sources described in the text using the full range lognormal distribution

Figure 17: The total Peak Ground Acceleration exceedance rate for the two earthquake sources described in the text using truncated lognormal distribution

Figure 18: Peak Ground Acceleration exceedance rate (hazard curves) for the two earthquake sources described in the text and the total using lognormal distribution truncated at median plus three standard deviations

Figure 19: Deaggregation of Peak Ground Acceleration hazard with respect to ground motion uncertainty (ε) for a hypothetical site

Figure 20: Comparison of the total Peak Ground Acceleration exceedance rate for the two earthquake sources described in the text using full range lognormal distribution, truncated lognormal distribution, and combined lognormal distribution and Generalized Pareto Distribution (Abrahamson-Silva with finite fault data)

Figure 21: Comparison of the total Peak Ground Acceleration exceedance rate for the two earthquake sources described in the text using full range lognormal distribution, truncated lognormal distribution, and combined lognormal distribution and Generalized Pareto Distribution (Abrahamson-Silva without finite fault data)

23

0.1

0.1 H < 0; subexponential light tail 0.1

0.1

0.1

0.1 H = 0; exponential neutral tail 0.0

L(x) = -ln[1-F(x)] L(x) 0.0

0.0 H > 0; superexponential heavy tail 0.0

0.0 0.00.20.40.60.81.0 X

Figure 1

24

1.0 8 0.0003

7 0.0009

0.8 6 0.0025 (x) X (x)]

X 5 0.0067 0.6

4 0.0183

0.4 3 0.0498 Tail ordinate: -ln[(1-F Empirical CDF:F(i) = (i-0.5)/n

2 0.1353 Exceedance probability- F 1 = 0.2

1 0.3679

0.0 0 1.0000 101 102 103 104 105 106 104 105 Pore surface [micron2] Pore surface [micron2]

Figure 2

25 (x)] 1.1 2

1.0 (x)]/[1 - F 1 0.9

0.8 ties [1 - F ties [1 -

0.7

0.6

dance probabili 0.5

0.4

0.3

Ratio of excee Ratio 0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14

2 Pore size [mm ]

Figure 3

26

1

Magnitude 7.8 7.6 7.4 7.2 0.1 7 6.8 6.6 6.4 6.2 6 5.8 PGA [g] 5.6 5.4 0.01 5.2 5 4.8 4.6 4.4

0.001 1 10 100 1000 Closest distance [km]

Figure 4

27

Chi-Chi aftershocks, M = 6.2 (1999) 3 5 < r < 15 15 < r < 25 25 < r < 35 35 < r < 45 45 < r < 55 2 55 < r < 65 65 < r < 75 75 < r < 85 85 < r < 95 95 < r < 105 1 105 < r < 115 115 < r < 125 125 < r < 135 135 < r < 145 145 < r < 155 0

-1 Normal probability scale [sigma levels] Normal probability

-2

-3 0.001 0.01 0.1 1 PGA [g]

Figure 5

28

Chi-Chi aftershocks, M = 6.2 (1999) 5 r < 5 5 < r < 15 15 < r < 25 25 < r < 35 35 < r < 45 45 < r < 55 4 55 < r < 65 65 < r < 75 75 < r < 85 85 < r < 95 95 < r < 105 105 < r < 115 3 115 < r < 125 125 < r < 135 135 < r < 145 145 < r < 155

2 Log exceedance scale: -ln[1-F(x)] Log exceedance

1

0 0.001 0.01 0.1 1 PGA [g]

Figure 6

29

180

160

140

120

s in distance bin distance in s 100

80

60

40

20 Number of PGA record

0 .8 .1 .5 .1 .8 0 .3 .5 17 25 35 50 70 10 41 99 to to to to to to 1 1 0 .8 .1 .5 .1 .8 to to 17 25 35 50 70 00 1.3 1 14

Closest distance range [km]

Figure 7

30

0.055 )

λ 0.050

0.045 | PGA > λ

0.040

0.035

0.030

0.025 Mean Excess = E(PGA Excess Mean -

0.020 0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 λ Threshold PGA level [g]

Figure 8

31

3.0 0.950 Chi-chi data GPD fitted curve 2.5 95th percentile GPD upper bound 0.918

General lognormal model (x) X (x)] X 2.0 0.865

1.5 0.777

1.0 0.632 log-exceedance: -ln[1 - F log-exceedance: 0.5 0.393 F probability tail Cumulative

0.0 0.000 0.12 0.14 0.16 0.18 0.20 0.22 PGA [g]

Figure 9

32

4 0.99997

Abrahamson-Silva with finite fault 3 Abrahamson-Silva without finite fault 0.99865 Boore-Atkinson 2 0.97725

1 0.84134

0 0.50000

-1 0.15866 Cumulative probability Cumulative

Normal probability scale probability Normal -2 0.02275

-3 0.00135

-4 0.00003 -3 -2 -1 0 1 2 3 ε Residual = ln(PGAobserved) - ln(PGApredicted)

Figure 10

33

2.5

) 2.0 λ > ε | λ

- 1.5 ε

1.0

Mean Excess = E( 0.5

0.0 -3 -2 -1 0 1 2 λ Threshold residual level

Figure 11

34

8 0.9997 Tail of Abrahamson-Silva residuals 7 GPD fitted curve 0.9991 95th percentile GPD upper bound

General lognormal model (x) X

(x)] 6 0.9975 X

5 0.9933

4 0.9817

3 0.9502

2 0.8647 log-exceedance: -ln[1 - F - -ln[1 log-exceedance: Cumulative tail probability F tail probability Cumulative 1 0.6321

0 0.0000 0.81.01.21.41.61.82.02.2 ε residual = ln(PGAobserved) - ln(PGApredicted)

Figure 12

35

2.2

) 2.0 λ >

ε 1.8 | λ 1.6 - ε 1.4

1.2

1.0

0.8

0.6

0.4

Mean Excess Residual = E( Mean Excess 0.2

0.0 -3 -2 -1 0 1 2 λ Threshold residual level

Figure 13

36

8 0.9991 Tail of Abrahamson-Silva residuals GPD fitted curve 7 0.9975 95th percentile GPD upper bound

General lognormal model (x) X

(x)] 6 X 0.9933

5 0.9817 4 0.9502 3

0.8647 2 log-exceedance: -ln[1 - F - -ln[1 log-exceedance: Cumulative tail probability F tail probability Cumulative 1 0.6321

0 0.0000 0.60.81.01.21.41.61.82.0 ε residual = ln(PGAobserved) - ln(PGApredicted)

Figure 14

37

0.4

Best estimate of shape parameter 0.2 95% confidence bounds on shape parameter

0.0

-0.2

-0.4

Shape parameter c Shape parameter -0.6

-0.8

-1.0 -1.0 -0.5 0.0 0.5 1.0 1.5 λ ε Threshold level of residual

Figure 15

38

10-1

10-2

10-3

10-4 ce Probability

10-5

10-6 Annual Exceedan Annual 10-7

10-8 0.00.51.01.52.02.53.03.5

PGA [g]

Figure 16

39

10-1

Without truncation -2 10 Truncate at 4 sigma Truncate at 3 sigma Truncate at 2 sigma 10-3 Truncate at 1 sigma

10-4

10-5

10-6 Annual Exceedance ProbabilityAnnual Exceedance 10-7

10-8 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5

PGA [g]

Figure 17

40

10-1

M = 5.0 10-2 M = 7.0 Total

10-3

10-4

10-5

10-6 Annual Exceedance ProbabilityAnnual Exceedance 10-7

10-8 0.00.20.40.60.81.0

PGA [g]

Figure 18

41

1.2 Figure 19a 1 PGA amplitude: 0.20 g Hazard: 6.880×10-4 0.8 Mean ε: -0.83

0.6

0.4

Probability Density Probability 0.2

0 -3 -2 -1 0 1 2 3 4 Epsilon

3

2.5 Figure 19b PGA amplitude: 2.0 g Hazard: 1.567×10-7 2 Mean ε: 3.38 1.5

1

Probability Density Probability 0.5

0 -3 -2 -1 0 1 2 3 4 Epsilon

Figure 19

42

Abrahamson-Silva NGA model with finite-fault data 10-1 Lognormal, without truncation Lognormal, truncated at upper bound PGA 10-2 Lognormal, truncated at threshold PGA Combined lognormal and GPD 10-3

10-4 ce Probability

10-5

10-6 Annual Exceedan 10-7

10-8 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5

PGA [g]

Figure 20

43

Abrahamson-Silva NGA model without finite-fault data 10-1 Lognormal, without truncation Lognormal, truncated at upper bound PGA 10-2 Lognormal, truncated at threshold PGA Combined lognormal and GPD 10-3

10-4 ce Probability

10-5

10-6 Annual Exceedan 10-7

10-8 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5

PGA [g]

Figure 21

44

Table 1: Tail equivalence between Generalized Pareto and Extreme Value distribution types Tail H(x) PDF of X - λ EV domain Light - Beta Weibull (c < 0) Neutral 0 Exponential Gumbel (c = 0) Heavy + Pareto Frechet (c > 0)

Table 2: Tail statistics, Generalized Pareto Distribution parameters, and predicted upper bound limit of the Peak Ground Acceleration and its 95% upper confidence limit for the distance range 50-70km of the 2 Chi-Chi Mw = 6.2 aftershocks. Peak Ground Acceleration upper Threshold Scale Shape Ntail bound 95% C.L. upper bound 0.065 0.064 -0.376 36 0.235 0.254 0.085 0.056 -0.351 25 0.243 0.279 0.127 0.044 -0.351 9 0.252 0.278

Table 3: Bulk statistics of the regression residuals of the Abrahamson-Silva (A-S) and Boore-Atkinson models.

Coefficient Model description Ndata Mean StDev Median of Variation A-S with finite fault 1919 -0.057 0.553 1.101 0.598 A-S without finite fault 1190 -0.061 0.543 1.090 0.586 Boore-Atkinson 1574 -0.110 0.596 1.070 0.654

Table 4: Tail statistics of the regression residuals of the Abrahamson-Silva (A-S) and Boore-Atkinson models.

Residual Model description % Tail Threshold Scale Shape Upper Bound A-S with finite fault 4.3% 0.9 0.35 -0.29 2.19 A-S without finite fault 8.1% 0.7 0.31 -0.12 3.84 Boore-Atkinson 8.8% 0.7 0.29 -0.11 3.55

45