
International Journal of Forecasting 11 (1995) 503-538

Forecasting and risk

David Vere-Jones Institute of Statistics and Operations Research, Victoria University of Wellington, Wellington, New Zealand

Abstract

This paper reviews issues, models, and methodologies arising out of the problems of predicting earthquakes and forecasting earthquake risk. The emphasis is on statistical methods which attempt to quantify the probability of an earthquake occurring within specified time, space, and magnitude windows. One recurring theme is that such probabilities are best developed from models which specify a time-varying conditional intensity (conditional probability per unit time, area or volume, and magnitude interval) for every point in the region under study. The paper comprises three introductory sections and three substantive sections. The former outline the current state of earthquake prediction, earthquakes and their parameters, and the point process background. The latter cover the estimation of background risk, the estimation of time-varying risk, and some specific examples of models and prediction algorithms. The paper concludes with some brief comments on the links between forecasting earthquakes and other forecasting problems.

Keywords: Earthquake prediction; Seismic hazards; Point process modelling; Conditional intensity models; Probability forecasting

1. Introduction

Natural hazards are assuming ever greater economic importance, not only on a regional but also on a global scale. The growth of major cities in hazard prone areas, and the public anxiety associated with risks to critical facilities such as nuclear reactors, has focused attention on the problems of insurance against natural hazards, disaster mitigation, and disaster prevention.

Within this general area, the earthquake hazard stands out as one of the most intractable. While basic knowledge is available about levels of background risk, nothing yet exists for earthquakes comparable to the warning systems (for all their defects) currently available in respect of floods, hurricanes, and volcanic eruptions.

Twenty years ago there was considerable optimism among scientists concerning the future of earthquake prediction. Many observational studies showed that the occurrence of major earthquakes was preceded, at least on some occasions, by anomalous behaviour in a wide variety of physical variables - some less surprising, such as increases or fluctuations in the frequency of smaller events, but others more surprising, such as anomalous fluctuations of magnetic and electric fields, not to mention anomalous animal behaviour. Out of so many possibilities, there seemed a good chance of


developing reliable precursory indicators which, Acceptance of this viewpoint has not come like rising river levels for flooding, would pro- easily to the geophysicists, who, in common with vide the basis for early warning procedures many other physical scientists, have a predisposi- against earthquakes. tion to view the world in deterministic terms. A The intervening twenty years have brought a notable exception was Sir Harold Jeffreys who salutary reminder that nature does not reveal her warned, more than fifty years ago, that secrets easily. Rikitake (1976) and Mogi (1985) 'A physical law is not an exact prediction, but a give systematic accounts of the physical phenom- statement of the relative probabilities of varia- ena which may be associated with earthquake tions of different amounts; ... the most that the occurrence, and report on a case-by-case basis laws do is to predict a variation that accounts for some Japanese efforts to use such phenomena in the greater part of the observed variation. The developing earthquake predictions; Chinese ef- balance is called 'error' and is usually quickly forts are reported by Ma et al. (1990). A recent forgotten or altogether disregarded.' (Jeffreys, and somewhat iconoclastic overview is given by 1939) Lomnitz (1994). While it is possible to retain, as Early exPectations that earthquake predictions the present writer certainly does, some measure could be made deterministically, or with an error of optimism in the long-term outcome of such which could be "quickly forgotten or altogether efforts, three major obstacles have become very disregarded", have proved unfounded. The apparent. First, the process of earthquake for- probabilities have come to take a central role. mation is complex; as indeed is true of most The distinguished contemporary seismologist, other forms of fracture. Second, extensive and Keiiti Aki, wrote recently that reliable data on possible precursors of major 'I would like to define the task of the physical earthquakes are expensive to collect, and take a scientist in earthquake prediction as estimating long time to accumulate. Third, processes which objectively the probability of occurrence of an have their origin some 5, 10 or even 20 km below earthquake within a specified time, place and time the earth's surface are singularly resistant to window, under the condition that a particular set direct observation. Very accurate instruments for of precursory phenomena was observed.' (Aki, measuring changes of level and tilt of the ground 1989) have been put in place, but the fluctuations they As early as 1984, a similar sentiment was built reveal are not easily related to the occurrence of into a protocol of UNESCO and IASPEI (the major events. Changes may appear before some International Association of and earthquakes but not before others; or they may Physics of the Earth's Interior) (IUGG, 1984). It appear and stabilize without any major event recommends, "that predictions should be formu- occurring. Much the same can be said about the lated in terms of probability, i.e. the expectation other proposed predictors. An impression is in the time-space-magnitude domain, of the beginning to emerge that in seismically active occurrence of an earthquake". 
Later, it elabo- areas, large segments of the earth's crust are in rates, some sense close to a critical state; small per- 'As prediction develops in the course of con- turbations can produce unexpectedly large ef- tinuous monitoring of the region, the expectation fects, even at a distance; energy, stored gradual- may be progressively modified on the basis of ly in a volume of the crust, is released sporadi- further precursory data, or other relevant occur- cally through a complex network of interacting rences so that the prediction process itself be- faults, in patterns that have a fractal rather than comes continuous. Likewise the performance of a regular character. the hypothesis can be continuously reassessed. The growing recognition of these difficulties The region being monitored will in general has highlighted the need to formulate models for include areas of reduced expectation as well as earthquake occurrence in probabilistic terms. areas of increased expectation, and the proce- D, Vere-Jones / International Journal of Forecasting 11 (1995) 503-538 505 dure as a whole can, if appropriate, be readily potheses? To be "fully statistically substan- adopted as a basis for designing or modifying tiated" means here that the proposed prediction earthquake countermeasures in the region.' procedure should be backed by a sufficient Even if deterministic prediction is no longer number of directly observed successes and fail- regarded as a realistic goal, it still casts its ures to establish its performance at some agreed shadow over the formulation of prediction and level. This is more stringent than having general forecasting algorithms. There is often an implicit evidence in favour of the theory or model on assumption that the alternative to a deterministic which the procedure is based, or extrapolating prediction is a deterministic prediction with from small events to large ones, or from a error. One of the recurrent themes of the pres- retrospective analysis to a prospective one. ent paper is that even this concession may not be There are analogies with the problems facing sufficient. Rather, somewhat as in quantum the medical profession in regard to the use of a theory, one needs to think of an evolving field of promising new drug which still awaits clinical probabilities, within which the concept of "the trialing. The rigorous procedures surrounding predicted event" loses any precise meaning. The clinical trials can be compared to the procedures nearest equivalent to a predicted event is a for testing earthquake prediction algorithms pro- region with very high concentration of probabili- posed in a series of papers by Evison and ty, but even here the patterns are constantly Rhoades-see, for example, Rhoades and Evison being modified as new predictive information (1979, 1989), Rhoades (1989), Evison and comes to hand. Rhoades (1993). In the clinical trials context, the The protocol quoted earlier goes on to discuss need for rigorously designed and conducted the special responsibilities which devolve upon statistical tests has been found essential to safe- authors and journal editors intending to publish guard the public interest. Evison and Rhoades a paper which incorporates material equivalent argue that something similar should be required to an earthquake prediction. This is a reminder for prospective schemes of earthquake predic- of the difficulty of divorcing the scientific aspects tion. 
One difficulty with earthquakes however, is of research in this area from its social, economic, that it could take decades or even centuries to and political implications. The seismologists had accumulate the information necessary to fully their fingers badly burned in this regard in a statistically substantiate a proposed method of well-known incident where a scientific journal prediction for great earthquakes. Whether, and published a paper forecasting a major earth- under what circumstances, any lower level of quake in the close vicinity of a major city. The verification could be acceptable is one of the key local press became aware of this prediction, and points at issue. drew attention to it, causing some degree of An important recent development has been panic. No earthquake in fact occurred, and the the decision by the USGS (United States scientific basis of the prediction was subsequently Geological Survey) to publish official assess- discredited, but the incident highlighted the ments of the probabilities of damaging earth- importance of preliminary consultation and quakes affecting different parts of scrupulous refereeing before material of this over specified horizons (see Working Group on kind is published, even in a scientific journal. Californian Earthquake Probabilities, 1988, For a full account of the event and its implica- 1990). These assessments are intended to repre- tions see Olson et al. (1989). sent the best available scientific opinion at the The incident also highlights a central problem time they are published. However, the basis of currently facing both scientists and adminis- their calculations has already been seriously trators concerned with earthquake prediction: challenged (see Kagan and Jackson, 1991, how to proceed in the light of plausible, but not 1994b). While such challenges may be seen as a fully statistically substantiated, models or hy- necessary and natural part of the evolution of 506 D. Vere-Jones / International Journal of Forecasting 11 (1995) 503-538

improved forecasting methods, they highlight the ing and reporting data, and more eclectic in the dangers of basing operational decisions on only forms of information they are willing to consider, partially tested hypotheses. than their Western colleagues). It appears that It seems difficult to argue, however, that such the process of quantifying the risk was relatively calculations should not be made, and that they informal, but took place within a environment should not be made publicly available. The that, in this case at least, enabled a bold decision problem is rather in finding the proper adminis- to be taken and acted upon (see Bolt (1988); trative response to estimates which are admitted- Lomnitz (1994) takes a more sceptical view of ly based on provisional hypotheses. the proceedings). Several authors have recently treated the Decisions to act upon information provided by problem of calling an earthquake alert from a scientists have been taken in countries other than decision-theoretic point of view (see Ellis (1985), China. In both Japan and the US, areas have Molchan (1991), and the discussion in Section 5 been declared temporary high risk zones where of the present article). They conclude that under particular precautions should be taken (for ex- quite wide conditions the optimal strategy will ample, Mogi, 1985; Bakun et al., 1986). The have the form "call an alert when the conditional overall record of such efforts, however, is not probability intensity [see Section 3 for defini- impressive. The form of alert has been rather tions] exceeds a certain threshold level". Such generalised and, in the cases known to me, the considerations reinforce the view that the pri- predicted event has failed to materialise. In mary task of the scientist concerned with earth- other cases, major events have occurred without quake prediction should be the development of any form of alert being issued. theories and models which allow such condition- In summary, it must be admitted that attempts al probabilities to be explicitly calculated from to estimate time-varying risks, or to issue earth- past information. On the other hand the de- quake predictions, have so far yielded less of velopment of an alarm strategy, including the practical value than the long-term/static esti- setting of suitable threshold levels and the ac- mates of average risk used in engineering and tions which should follow an alarm, should not insurance applications. The development of be the responsibility of the scientist, but of a building codes and associated zoning schemes, group representing appropriate local and nation- the use of long-term risk estimates as essential al authorities, with the scientist (not to mention input to the design of risk-sensitive structures, the statistician) in an advisory ~ole. and the development of risk estimates for earth- In the sense of a decision to call an earthquake quake insurance and reinsurance, have contribu- alert, there have been several examples of suc- ted substantially to reducing both human and cessful earthquake predictions, notably in China economic losses from earthquakes. Despite their (see for example, Cao and Aki, 1983). The most great potential, earthquake prediction methods famous of these was the Haicheng earthquake of have still to reach the stage where they make a 1975, when the decision to evacuate substantial comparable contribution. 
numbers of people from the city was made less In the remainder of this article I shall look in than 24 hours before the occurrence of a major more detail at some of the statistical models and earthquake. Such examples illustrate, from a methods which have been suggested for estimat- different point of view, the fact that making an ing both static and time-varying risks. The next earthquake prediction is a different problem two sections give a brief summary of the techni- from quantifying the risk. The information avail- cal material which is required, first from the able to the Chinese seismologists in Haicheng in seismological and then from the statistical side. 1975 was not very different in kind to what is The substantive sections are Section 4, dealing available to most other seismologists (although with the static risk, Section 5 looking in general the Chinese tend to be more assiduous in collect- at the problem of estimating a time-varying risk, D. Vere-Jones / International Journal of Forecasting 11 (1995) 503-538 507 and Section 6, containing an account of some are essential for any statistical analysis of an specific algorithms for earthquake predictions earthquake catalogue. If any one of these is and risk forecasts. absent it is not possible to define meaningfully the scope of the catalogue or to assess its completeness or reliability. However, earth- 2. Earthquakes and earthquake risk quakes are complex events, and in many respects these five coordinates represent only the tip of 2.1. Earthquake parameters the iceberg. We shall also have occasion to refer to the following additional aspects. We provide first a brief dictionary of terminol- and : Smaller events, ogy used in discussing earthquakes. Bolt (1988) usually with closely related focal mechanisms, gives an excellent more general introduction to and falling within rather tightly defined regions, seismology. occurring respectively before and after a major Epicentre: The latitude and longitude of the event, and thought to be directly associated with projection onto the earth's surface of the point it, for example with the release of residual stress of first motion of the earthquake. Also used, induced by the main event. particularly with historical events, as a more Focal mechanism: A description of the general reference to the region of maximum orientation of the plane and the direction of ground motion. first motion in that plane, derived from studies of Depth: The depth below the surface of the the wave patterns radiated from the focus. point of first motion, usually at least 2-3 km, Seismic moment: An alternative measure of and for damaging earthquakes commonly in the the size of the event, proportional roughly to the range 5-20km. Depths up to 600 km have been product of the average stress drop over the fault recorded. The three-dimensional coordinates of and the area of fault which moves in the earth- the point of first motion are also referred to as quake. Its theoretical properties are somewhat the hypocentre or focus of the earthquake. better understood than those of the magnitude, Origin time: The instant at which motion and it or the "moment magnitude" derived from started. it is tending to replace magnitude in theoretical Magnitude: A measure of the size of the event, studies, and to some extent also in observational commonly quoted in the Richter scale, derived work. In a full treatment (Aki and Richards, from distance-adjusted indications of the am- 1980) the seismic moment is defined as a tensor. 
plitude of ground motion as recorded on a Energy release: Determined indirectly from standard . As a logarithmic scale is estimates of the fault motion and stress drop, used in this definition, the magnitude is roughly themselves derived from measurements on the proportional to the logarithm of most of the amplitude and character of the wave motion. physical variables related to the size of the event: Roughly proportional to 1015M÷c, where M is the energy release, length of fault movement, the magnitude. duration of shaking, etc. One unit of magnitude Earthquake intensity: A characteristic not of corresponds to an increase of energy by a factor the earthquake but of the ground motion it of about 30. As a unit for scientific work the induces at a particular site. It is most commonly magnitude has many weaknesses, and in recent measured on the modified Mercalli scale, which years there has been a proliferation of alter- for each level of intensity (I-XII) specifies native scales and definitions. In one form or characteristics of the type of motion and the type another, however, it remains the basic measure of damage likely to occur. It is related closely to of earthquake size. the maximum ground acceleration (which is used These five coordinates (epicentral latitude and as such in engineering and design studies) but longitude, depth, origin time, and magnitude) incorporates also elements of duration of shak- 508 D. Vere-Jones / International Journal of Forecasting 11 (1995) 503-538 ing, etc. The seismologists' intensity, as de- 2.2. Earthquake statistics scribed here is not to be confused with the point process intensity, as introduced in Section 3, A wide variety of statistical analyses may be representing a rate of occurrence. carried out on the data listed in a typical regional In fact, full entries in an earthquake catalogue catalogue. We comment briefly on the distribu- will list a great deal of further information about tions of the main parameters (see also Figs. 1-3). each event, including the numbers and locations The distribution of epicentres over a geo- of the stations recording the event; the quality of graphical region is usually very irregular. On the the estimates of the epicentre, depth, and origin largest scale this is seen in the clustering of time; any special features or observations; de- .earthquakes along plate boundaries and similar scriptions of observed ground movements and global features, but persists also on smaller fault displacements; and brief summaries of scales (Fig. 1). In this visual sense the spatial damage, deaths, and casualties. patterns exhibit self-similarity. This is confirmed

Fig. 1. (a) Earthquakes with M ≥ 3 in the main seismic region of New Zealand, 1987-1992. (b) Microearthquakes with M ≥ 2 in the Wellington region, 1978-1992.

Fig. 1. Continued.

by estimates of the fractal dimension of the source region, which are typically around 1.5 for epicentral maps (for example, Kagan, 1991a; Harte, 1994). In three dimensions the dimension estimates increase to around 2.2.

Frequency of occurrence decreases rapidly below the crust, that is at depths beyond 20-30 km or so, although where the earthquakes are associated with a subducting plate, they may persist within or along the margins of the subducting plate to very great depths (Fig. 2).

Fig. 2. The depth distribution of microearthquakes in the Wellington region. Same data as in Fig. 1(b), for a cross-section looking 23.5° E of N (corresponding to the transformation t = latitude × sin(−30°) + longitude × cos(−30°)).

The distribution of magnitudes is associated with one of the most striking regularities in earthquake occurrence, the Gutenberg-Richter frequency-magnitude law. This states that magnitudes follow an exponential distribution, and is usually quoted in the form

Frequency of events with magnitude > M = 10^(a − bM)   (2.1)

where a is related to the frequency at some reference magnitude and b, the so-called "b-value", is usually in the interval 0.7-1.2 and often close to 1. This distribution corresponds to a Pareto distribution for earthquake energies, and has been the subject of many explanations, models, and variations. For one version, and further discussion, see Vere-Jones (1976, 1977) and Bebbington et al. (1990).
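To make relation (2.1) concrete, the sketch below fits a and b to a list of catalogue magnitudes and converts the fitted law into expected exceedance rates, using the standard maximum-likelihood estimate of b from the mean magnitude (the same estimate reappears in Section 4.2). The catalogue values and threshold magnitude are hypothetical, and the code is an illustration rather than any calculation from the paper.

```python
import math

def fit_gutenberg_richter(mags, m0, t_years):
    """Fit the Gutenberg-Richter relation (2.1) to magnitudes >= m0.

    Returns (a, b) such that the expected number of events per year
    with magnitude > M is 10**(a - b*M).
    """
    mags = [m for m in mags if m >= m0]
    n = len(mags)
    mean_mag = sum(mags) / n
    b = math.log10(math.e) / (mean_mag - m0)   # maximum-likelihood estimate of the b-value
    a = math.log10(n / t_years) + b * m0       # anchors 10**(a - b*m0) at the observed rate above m0
    return a, b

def exceedance_rate(a, b, m):
    """Expected number of events per year with magnitude > m, from (2.1)."""
    return 10 ** (a - b * m)

# Hypothetical catalogue: magnitudes observed over a 20-year period.
catalogue = [3.1, 3.4, 3.2, 4.0, 3.6, 5.1, 3.3, 3.8, 4.4, 3.5, 3.9, 4.7, 3.2, 3.6, 6.0]
a_hat, b_hat = fit_gutenberg_richter(catalogue, m0=3.0, t_years=20.0)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}")
print(f"expected rate of M > 6 events: {exceedance_rate(a_hat, b_hat, 6.0):.4f} per year")
```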

It is important for risk purposes to know whether the form of the distribution is maintained for very high magnitudes. The short answer is that it is not. Magnitudes much above 9 are not observed, and would imply a catastrophic rupturing of the earth's crust. More controversial is the existence of smaller, local, maximum magnitudes associated with the limited energy-storage capacity of local geological structures. The existence of such upper bounds can be critical to the determination of risk estimates at a particular site. Unfortunately, since one is operating in the extreme tail of the distribution, the data is usually insufficient to distinguish clearly between models for the upper tail, or to estimate at all reliably the value of a possible upper bound. One form with some theoretical backing (see Vere-Jones, 1977; Kagan, 1991b) corresponds to the energy distribution with tail probabilities of the form

P(energy > E) = c E^(−α) e^(−βE)   (2.2)

One other pronounced statistical regularity concerns the decrease of frequency with time along an aftershock sequence. This again follows a power law form, and is usually well modelled by a non-homogeneous Poisson process with rate function (intensity) of the form

λ(t) = (c + t)^(−(1+δ))   (2.3)

where c and δ are constants specific to the sequence but usually small, so that the simplest form λ(t) ∝ t^(−1), first discovered by Omori (1894), is often an adequate approximation. Again there are many explanations and variations of the model. In a general sense the power law distributions associated with both the Gutenberg-Richter and Omori laws bear witness to self-similar properties of the earthquake process.
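Because (2.3) defines the rate of a non-homogeneous Poisson process, expected aftershock counts, and hence the probability of observing no aftershocks in a given window, follow by integrating the rate function. The sketch below does this in closed form for a rate of the form K(c + t)^(−(1+δ)); the amplitude K and all numerical values are illustrative assumptions, not quantities estimated in the paper.

```python
import math

def omori_rate(t, k, c, delta):
    """Modified Omori rate (2.3), with an amplitude factor K added for illustration."""
    return k * (c + t) ** (-(1.0 + delta))

def expected_aftershocks(t1, t2, k, c, delta):
    """Expected number of aftershocks between elapsed times t1 and t2 (closed-form integral of the rate)."""
    if abs(delta) < 1e-9:                      # delta = 0: the integral is logarithmic
        return k * (math.log(c + t2) - math.log(c + t1))
    return (k / delta) * ((c + t1) ** (-delta) - (c + t2) ** (-delta))

def prob_no_aftershock(t1, t2, k, c, delta):
    """For a non-homogeneous Poisson process, P(no events in (t1, t2)) = exp(-integral of the rate)."""
    return math.exp(-expected_aftershocks(t1, t2, k, c, delta))

# Illustrative values: K = 100 events/day, c = 0.05 day, delta = 0.1.
print(expected_aftershocks(0.0, 1.0, k=100.0, c=0.05, delta=0.1))   # expected count in the first day
print(prob_no_aftershock(30.0, 31.0, k=100.0, c=0.05, delta=0.1))   # chance of a quiet day, one month later
```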

Fig. 3. The Gutenberg-Richter relation, in the standard form of a log-survivor plot. The horizontal axis represents magnitude, the vertical axis log10 P(M ≥ m). The slope of the line gives the b-value.

2.3. Varieties of risk

The term "earthquake risk" is used loosely in a variety of senses, and we shall attempt to distinguish between these. The principal distinction we shall make depends on whether (like a geophysicist) one is concerned with the occurrence of earthquakes per se, or (like an engineer) with the effects of ground motion, or (like an economic planner or insurance specialist) with the costs of damage. This leads us to make the following distinctions.

(i) The geophysical risk (or earthquake hazard) is the expected rate of occurrence (number of events per unit time) within a prescribed region and over a certain magnitude level.

(ii) The engineering risk is the expected rate of occurrence of exceedances of a specified level of ground motion (measured either in terms of Mercalli intensity or ground acceleration) at a particular site or within a prescribed region.

(iii) The economic risk is the expected losses (in dollar terms), per unit time, from earthquakes affecting a particular city or other specified region.

The reader should be warned that the above is not standard terminology. Terms such as "hazard", "intensity", and "risk" are used differently in the seismological, engineering, insurance, and point process contexts. We shall use "risk" as the general term for a probability or expected rate of occurrence of a damaging event. In the seismological context, following the recommendations of UNEP (1980), the geophysical risk would be called the "earthquake hazard" or possibly the "hazard rate", while the term "risk" itself would be restricted to the product of hazard rate by vulnerability (i.e. amount at risk), this corresponding to an expected loss as in (iii) above.

The above definitions are intended in the first instance as descriptions of the long-term average, static, or background risk. However, they can also be applied to time-dependent risks, but in this case must be thought of as instantaneous values, predicted or interpolated using some appropriate model structure, and conditional on the relevant prior information.

In all cases we have chosen to make the definitions in terms of rates (expected numbers
This stage is fraught with even greater tions, but even here, statements in terms of uncertainties than the other two. The usual expected rates rather than probabilities facili- approach is to try and estimate "damage ratios" tates comparisons. for a given type of structure subjected to a given The ratio between a local short-term rate and level of shaking. These represent the cost of the long-term background rate we shall call the damage as a proportion of the value of the risk enhancement factor (Kagan and Knopoff, structure. Expected losses can then be estimated 1977; Vere-Jones, 1978). It is analogous to the by multiplying values by damage ratios and probability gain of Utsu (1979, 1983) and Aki summing over the different structures at risk. (1981), which is a ratio of the probabilities for Dowrick (1991) and Dowrick and Rhoades some agreed region and observation interval; the (1993) present detailed analyses of damage ratios probability gain reduces to the risk enhancement arising in a recent New Zealand earthquake. factor when the intervals are small. Other terms Even when these primary losses have been used are the risk refinement factor (Rhoades and estimated, there remains the problem of losses Evison, 1979) and hazard refinement factor due to secondary effects, particularly fires. These (Evison, 1984). Depending on circumstances, the are yet more difficult to assess, and more vari- risk enhancement factor may be greater or less able, than those associated with the direct effects than one; in the latter case we may also talk of the ground motion. In practice, simulations about the risk reduction factor. For risk forecasts generally offer the best way of combining models to have significant practical consequences, such for the geophysical and engineering risk with the factors need to reach an order of magnitude or particularities of a given portfolio, and are al- SO. ready widely used in the insurance and reinsur- There is an important logical progression be- ance industries (for example, Rauch and Smolka, tween the three types of risk defined above. 1992). The geophysical risk is the province of the For the remainder of this article we shall scientist and the main focus of the present paper. concentrate on the problems of forecasting the It relates directly to the physical processes caus- geophysical risk. It is the centre of scientific ing earthquakes. effort in this field, and is where the modelling Estimates of the engineering risk can be de- component originates. In reality, however, the rived from estimates of the geophysical risk. three aspects cannot be fully separated, but Granted the occurrence of an earthquake of a together form part of a tangled web of scientific, D. Vere-Jones / International Journal of Forecasting I1 (1995) 503-538 513 engineering, economic, political, administrative, instants, with which the other variables can be and social issues. associated. One then has a choice of treating the process as a point process in time (one dimen- sion), in time and space (three or four dimen- 3. The point process framework sions), or quite generally as a marked point process in time, the mark for each event con- To develop and compare the performances of taining information about all the other parame- particular models of earthquake risk, it is essen- ters it is desired to include in the study. tial to have available a sufficiently extensive and To develop a point process model it is neces- flexible theoretical framework. 
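The damage-ratio calculation outlined above reduces, in its simplest form, to a weighted sum over the structures at risk: value times damage ratio times the annual rate of the shaking level considered. The fragment below shows only this arithmetic; the portfolio, damage ratios and exceedance rates are invented for illustration and carry no empirical content.

```python
# Minimal sketch of the expected-loss calculation described in the text:
# expected annual loss = sum over structures of
#   (value) x (damage ratio at a given shaking level) x (annual rate of that shaking level).
# All numbers below are hypothetical.

portfolio = [
    # (value in $, damage ratio given MM VIII shaking, annual rate of MM VIII at the site)
    (2_000_000, 0.05, 0.01),
    (5_000_000, 0.02, 0.01),
    (  800_000, 0.15, 0.02),
]

expected_annual_loss = sum(value * ratio * rate for value, ratio, rate in portfolio)
print(f"expected annual loss: ${expected_annual_loss:,.0f}")
```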
For the geophysi- sary in principle to specify consistent joint dis- cal risk in particular, the natural framework is tributions for the random variables N(A) count- that provided by the theory of random point ing the numbers of events in subsets A of the processes. This is a point of view I have long chosen phase space (time, time × space, time x advocated (see for exampte, Vere-Jones, 1970, mark). The process is stationary in time if these 1978), but it has yet to be fully exploited in the joint distributions remain invariant under field of earthquake prediction. One reason may simultaneous and equal shifts of the sets A along be that not only are probabilistic methods in the time axis. general relatively new to earthquake prediction, The first and second moments of the random but point processes in particular require different variables N(A), are given by the set functions forecasting procedures to those commonly used re(A) = E[N(A)] in other branches of time series. Traditional time m2(A x B) = E[(N(A )N(B))], series methods are based on the linear predic- tions which are optimal for Gaussian processes. and play a specially important role. In the special Forecasting the time to the next event is essen- case of a stationary point process in time alone, tially a non-linear problem, however, and the there exists a mean rate m (assumed finite and distributions involved are often far from Gaus- non-zero) such that sian. None the less quite general and flexible re(A) = mlA I methods of forecasting point processes are now available, closely related to the "Cox regression or, in infinitesimal notation methods" (Cox, 1972) used in reliability and E[dN(t)] = m.dt other contexts for modelling the effects of exter- nal factors on life distributions. where IAI is the length (Lebesgue measure) of In this section we provide an introduction to the set A. The second order properties of such a the most important ideas needed to develop process are most conveniently stated in terms of forecasting procedures in the point process con- a covariance density c(r) defined loosely by text. There are many texts on point processes Cov(dN(t), dN(t + ~')) = [c(r) + m6 (r)] dt dr. which can be referred to for further details - see, for example, Cox and Lewis (1966), Snyder Here the delta-function term m6(r) arises (1975), and Cox and Isham (1980) for modelling from the point character of the process, which aspects, and Bremaud (1981) or Daley and Vere- implies that E[dN(t) 2] is of order dt, whereas Jones (1988) for the background theory. Snyder E(dN(t) dN(s)] is of order dt ds for t ~ s. and Bremaud as well as Karr (1986), Kutoyants It is possible in principle to develop a second (1984) and Daley and Vere-Jones, all give ac- order prediction theory based on these concepts, counts of the conditional intensity concept, (see, for example, chapter 11 of Daley and Vere- which is central to the development both of Jones, 1988) analogous in many ways to the forecasts and of simulations. second order prediction theory for continuous The point process context is relevant so long processes. It is useful, however, only for prob- as the origin times can be treated as time lems in which the time scale is large compared 514 D. Vere-Jones / International Journal of Forecasting 11 (1995) 503-538 with the mean interval between events. Dis- ;t(tl:~t)dt tributions of the number of events then N(A) = E[dN(t)lpast history ~t up to time t]. 
become more closely Gaussian, and the linear predictions which derive from the second order More generally, for a marked point process in theory become closer to optimal. This is em- time phatically not the case for predicting the time to A(t, M[~)dtdM the next event. Here it is more natural to focus on the sequence of time intervals between = E[N(dt x dM)lpast history ~], events, which will also be stationary, and to where the past history may now include infor- attempt to forecast the length of the current mation about the marks {Mi} (each of which may interval. be a vector), as well as the times of past events. An immediate difficulty is the need to take In fact it is possible to define the past history into account the time which has already elapsed even more generally, to include information since the occurrence of the last event. If the about external variables, or processes evolving in intervals are presumed independent (i.e. form a time in parallel with the point process being renewal process) then one can work from the studied. The crucial problem is to determine how conditional distribution of the remaining life- the conditional intensity depends on such past time, given the time already elapsed. This dis- variables. This requires both physical and mathe- tribution is most conveniently represented in matical insights, and might be considered a terms of the hazard function h(t), defined for crystallisation of Aki's concern that the scientist lifetimes (i.e. interval lengths) by endeavour to find objective means of calculating h(t)dt = Prob(life ends in (t, t + dt)llife > t). probabilities of occurrence conditional on given (3.1) precursory information. Recognition of the crucial role of the con- If the life distribution has density f(t), and ditional intensity function was a major turning distribution function F(t), then point in the study of point processes, and may well be so for earthquake prediction also. h(t) = f (t)/[l - F(t)]. (3.2) Among its key properties are the following, One then has where for simplicity we return to the one-dimen- sional case and contract the notation to A(t). (i) A knowledge of the conditional intensity P(life time > t + ~'llife > t) = exp - h(u)du function, as an explicit function of the past, completely defines the probability structure of the process. from which the expected residual lifetime, or any (ii) The log likelihood of a realisation other characteristic of the distribution of the (t 1...tN) in the time interval [0,T] is given by time to the next event, can be computed. For a process of independent intervals, the T hazard function embraces all the information logL(t 1, t2,...,tN) = ~ log A(tl)- A(u) du. about the structure of the process needed for i f 0 predictions. In the general case this hazard (3.3) function has to be conditioned, not only by the time since the last event, but by any additional This representation is of great importance information concerning the past history that may because (under suitable regularity conditions: see affect the distribution of the remaining time. It Ogata (1978)) it allows the full machinery of then becomes nothing other than the conditional likelihood methods to be used with conditional probability intensity, or conditional intensity intensity models, including not only the standard function estimation and hypothesis testing methods, but D. 
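Expression (3.3) is what makes conditional intensity models usable in practice: the log likelihood of an observed catalogue is a sum of log intensities at the event times minus the integral of the intensity over the observation period. The sketch below evaluates it for one simple choice of conditional intensity, a constant background plus an exponentially decaying response to each past event; the model, parameter values and event times are illustrative assumptions only.

```python
import math

def conditional_intensity(t, events, mu, alpha, beta):
    """lambda(t | history): constant background mu plus an exponentially
    decaying contribution alpha*exp(-beta*(t - t_i)) from each earlier event."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)

def log_likelihood(events, t_end, mu, alpha, beta):
    """Log likelihood (3.3): sum of log lambda(t_i) minus the integral of lambda over (0, t_end)."""
    log_sum = sum(math.log(conditional_intensity(ti, events, mu, alpha, beta)) for ti in events)
    integral = mu * t_end + sum(
        (alpha / beta) * (1.0 - math.exp(-beta * (t_end - ti))) for ti in events
    )
    return log_sum - integral

# Hypothetical event times (days) in a 100-day observation period.
times = [2.1, 2.3, 2.4, 15.0, 40.2, 40.9, 41.5, 77.0]
print(log_likelihood(times, t_end=100.0, mu=0.05, alpha=0.8, beta=1.5))
```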
Vere-Jones / International Journal of Forecasting 11 (1995) 503-538 515 also likelihood-based model selection criteria The forecasting and simulation properties also such as AIC (Akaike, 1974; Sakamoto et al., generalise in an obvious way, at least if the total 1983). intensity, fm A(t, m)dm = A(t) remains finite. To (iii) If T* denotes the time to the next event, simulate the process, for example, one may one can write generate first the time T* to the next event of any kind, using A(t) in place of A(t) in (iii) and (iv) above. A value of M, say M*, can then be P(r* >~-)=exp (;)- A*(u)du (3.4) generated from the density f*(m)=A(t+ 0 T*,rn)/A(t+ T*). Adding the generated pair where A*(u) is obtained by extrapolating A(t[~) (t + T*, M*) to the history, one can then repeat forward from t to t + u under the assumption the exercise, and so continue. Practical difficul- ties can arise near the boundaries, where in- that no further event occurs in (t,t + u). formation may be missing, or in the extension to (iv) The above result allows A(t) to be used as a basis for simulations, in particular for simula- models with infinite total rates, but even in such tion forward given a past history ~. One then cases the simulation can generally proceed with some approximations. uses (iii) to generate a value of T*, adds the Such simulations provide a comprehensive occurrence of the event at t + T* to an updated view of the future of the process, from which it is history ~+r*, and repeats the simulation. For further details and alternative methods see possible to estimate the distributions for any quantities of interest, such as the time to the Ogata (1981) and Wang et al. (1991). next event with a mark in some specified subset, (v) The differences dN(t) - Aft) dt integrate to form a continuous time Martingale. The con- the expected numbers of events within specified ditional intensity therefore provides a link to the time and mark boundaries, etc. Martingale properties which have been of key A final extension, also of considerable impor- importance in other branches of stochastic pro- tance, is to situations where the conditional cess theory. See the references cited earlier, intensity depends on external variables as well as especially Bremaud (1981) or Karr (1986), for the past history of the process itself. In the further details and a more rigorous treatment. earthquake context such external variables could (vi) In the special case of a stationary Poisson include the coordinates of any precursor events, process, A(t) reduces to a constant A(=rn) or the values of physical processes evolving independent of the past. This is the "lack of concurrently in time. Within a comprehensive memory" property of the Poisson process: it has model, such variables would be modelled jointly a constant risk, irrespective of the past occur- with the point process. When a comprehensive rence or non-occurrence of events. model is not available, it may still be possible to It is very important for the applications to develop explicit expressions conditional on the earthquakes that these results extend in a natural values of the external variables, much as is done way to marked point processes (the background in the Cox regression models referred to earlier theory is in Jacod, 1975). In particular, the (see Cox, 1972, 1975). 
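Properties (iii) and (iv) translate into a forecasting recipe: extrapolate the conditional intensity forward on the assumption that no further event occurs, integrate it to obtain the survival probability (3.4) of the waiting time, and simulate the time of the next event by thinning. The sketch below assumes the same illustrative form of intensity as in the previous example; the dominating rate, horizon and numerical values are likewise assumptions.

```python
import math, random

def prob_no_event(intensity_ahead, tau, steps=1000):
    """P(T* > tau) from (3.4): exponential of minus the integral of the extrapolated
    intensity lambda*(u) over (0, tau), approximated here by a simple Riemann sum."""
    du = tau / steps
    integral = sum(intensity_ahead(i * du) * du for i in range(steps))
    return math.exp(-integral)

def next_event_time(intensity_ahead, lambda_max, horizon):
    """Simulate the waiting time to the next event by thinning:
    lambda_max must dominate the extrapolated intensity over the horizon."""
    t = 0.0
    while True:
        t += random.expovariate(lambda_max)            # candidate point from a dominating Poisson process
        if t >= horizon:
            return None                                # no event within the horizon
        if random.random() <= intensity_ahead(t) / lambda_max:
            return t                                   # accepted candidate: time to the next event

# Illustrative extrapolated intensity: background plus a decaying response to an event one day ago.
intensity = lambda u: 0.05 + 0.8 * math.exp(-1.5 * (u + 1.0))
print(prob_no_event(intensity, tau=10.0))
print(next_event_time(intensity, lambda_max=0.05 + 0.8 * math.exp(-1.5), horizon=365.0))
```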
likelihood for a set of observed occurrences The formal simplicity of these extensions should not be allowed to disguise the fact that (tl,M0, (t2,M2), ..., (tN,MN) over the time interval (0,T) and a region of marks m becomes developing and fitting models in higher dimen- sions is never a simple task. log L[(t~, M~), (t 2, M2)...(tN, MN) ] =

Σ_i log λ(t_i, M_i) − ∫_0^T ∫_M λ(t, m) dm dt   (3.5)

4. Estimating the background risk

1 ,J d 0 m As we indicated in Section 2, most engineering (3.5) and insurance applications are based on esti- 516 D. Vere-Jones / International Journal of Forecasting 11 (1995) 503-538 mates of the long-term or background risk, how the motion is divided when it is spread over assuming implicitly that the underlying processes a network of faults and a long period of time. are stationary. In this section we examine in The fundamental dispute, which has persisted somewhat greater detail the questions posed by now for over half a century, and casts its shadow estimating such rates. Even if one's concern is also over methods for earthquake prediction, is more with the time-varying rates, the back- between the geological and seismological evi- ground rates are important as a reference against dence; whether the major risk is associated with which short-term fluctuations can be gauged. events on major, identified faults, or is more widely distributed as the seismological records of smaller events tends to suggest. The methodolo- 4.1. Sources of data gies appropriate to these two situations are widely different, and we consider each context A fundamental question which arises in this briefly below, starting with the seismological context concerns the most appropriate source of approach. A final subsection looks at the role information. There are at least four as listed played by the Poisson model. below. (i) Information from current instrumental 4.2. Estimating background rates from the catalogues. catalogue data (ii) Records of historical earthquakes. (iii) "Paleoseismological" evidence of large fault This topic is of importance in its own right. It movements preserved in peat bogs, stream encompasses not only questions about time offsets, uplifted terraces, etc. rates, but also about estimating the parameters (iv) Measurements of the relative motions of in the frequency magnitude law, estimating and plate boundaries, from both geological and displaying the spatial distribution of seismicity, geodetic sources. and investigating possible spatial variations in Each of these sources has its own weaknesses the distribution of magnitudes. In most of this as a basis for estimating the occurrence rates of work it is tacitly assumed that we are dealing major damaging earthquakes. The catalogue with a model of "distributed seismicity"-that records are generally too short to allow for direct magnitude and space distributions have smoothly estimates of the rates of large events, while varying density functions which can be estimated extrapolation from the rates for smaller events by suitable averaging procedures. begs the question of whether the relative rates To focus ideas, consider first a general station- remain constant. The records of historical earth- ary, marked, point process in time, with finite quakes are rarely complete, being subject to the overall rate m say. In our context, m would be vicissitudes of war, changing regimes, shifting interpreted as the overall rate of events lying trade routes and population centres, and to within a defined observation region and over considerable uncertainties concerning mag- some threshold magnitude M 0. For any measur- nitudes. The data from "fossil" earthquakes are able subset C of the mark space (i.e. earth- restricted to major events causing large, perma- quakes falling within a given space and mag- nent ground movements. 
This leaves unanswered nitude window) one can define also the average any questions concerning smaller events, which rate re(C)~ m of events with marks falling into may be highly damaging on a human scale but C. It is clear that re(C) is additive over disjoint not yet large enough to produce fault breaks. It subsets of the mark space, so that the normalised is also difficult to establish whether the move- set function ment really occurred in a single large event. II(C) = m(C)/m Finally, use of the rates of plate motion raises unresolved issues concerning the proportion of defines a probability distribution over the set of that motion that is taken up by earthquakes, and marks. It may be called the stationary mark D. Vere-Jones / International Journal of Forecasting 11 (1995) 503-538 517 distribution. In our context it will be a joint which form the spatial component of the mark. distribution of magnitude and epicentre. The aim Here, however, a parametric form is unlikely to is to estimate, not merely the overall rate m, but be useful, and some form of non,parametric or the set function m(C) or II(C). semi-parametric estimate will generally be neces- For example, if the epicentral coordinates are sary. A range of non-parametric density estima- ignored (data is pooled for the whole region), tion methods is now available (see, for example, the resulting Gutenberg-Richter relation may be texts such as Silverman, 1986; Stoyan et al., regarded as defining the (marginal) stationary 1987; Ripley, 1988). The methods have been distribution of magnitudes. As a probability reviewed from the perspective of earthquake distribution it is usually written in the form catalogues in Vere-Jones (1992). With one caveat (see below) they represent standard 2- or 3- P(Mag > MlMag/> Mo) = 10 b(g-Mo) dimensional procedures. A simple, and generally where M o is the threshold magnitude. Alter- quite adequate, procedure, is that of kernel natively, multiplying both sides by the overall estimation. Here the spatial intensity (rate per rate gives unit time and area) at a point x is estimated by an expression of the form Frequency of events with Mag/> M = lOa b(M MI)). X(x) - i kh(x - xi) -Ti where lO a = m is the overall rate of events with where T is the time span of the catalogue, the x i Mag >/M 0. The parameters a (or m) and b can are coordinates of the events in the catalogue be estimated very straightforwardly from (now lumping together the magnitudes) and k h is = loglo (N/T) a 2-dimensional, usually isotropic, probability density with a variable scale parameter h (such /) = (-M - M0) log10 e as the standard deviation if k is taken to be where N is the total number of events with isotropic normal). The corresponding probability M > M0, T is the time interval covered by the density is obtained by dividing by the overall rate catalogue, M is the average magnitude of events m = N/T. In either case the crucial decision is with M >M 0. Both estimates are method of the choice of the bandwidth h, to give an optimal moment estimates, and are also maximum likeli- compromise between bias (caused by over- hood estimates under the assumption that events smoothing, h too large) and variability (caused are occurring randomly and independently in by under-smoothing, h too small). time. Some care has to be taken over details The caveat concerning the earthquake applica- such as rounding of magnitude values (see for tions relates to the assumption of independent example, Vere-Jones, 1988). 
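Written out explicitly, the kernel estimate of the spatial intensity is λ̂(x) = (1/T) Σ_i k_h(x − x_i), with T the time span of the catalogue and k_h a two-dimensional density of bandwidth h. The sketch below implements this with an isotropic Gaussian kernel; it treats epicentral coordinates as planar and uses hypothetical data, so it illustrates the formula rather than any particular cited procedure.

```python
import math

def gaussian_kernel_2d(dx, dy, h):
    """Isotropic two-dimensional Gaussian density with standard deviation h."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * h * h)) / (2.0 * math.pi * h * h)

def spatial_intensity(x, y, epicentres, t_span, h):
    """Kernel estimate of the spatial intensity (events per unit time and area) at (x, y):
    lambda_hat(x) = (1/T) * sum_i k_h(x - x_i)."""
    return sum(gaussian_kernel_2d(x - xi, y - yi, h) for xi, yi in epicentres) / t_span

# Hypothetical epicentres (planar coordinates in km) from a 15-year catalogue.
epicentres = [(0.0, 0.0), (1.2, 0.4), (0.8, -0.5), (10.0, 12.0), (9.5, 11.2)]
print(spatial_intensity(0.5, 0.0, epicentres, t_span=15.0, h=2.0))
```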
observations which is incorporated into the stan- As mentioned in Section 2 there is a consider- dard recommendations for the choice of h. In the able literature devoted to variants on the simple present context, time and space clustering repre- Gutenberg-Richter relation and the closely re- sents a departure from this assumption which lated question of whether or not there exist local needs to be examined with some care. The aim upper bounds for the magnitudes. Space pre- should be to smooth over ephemeral clusters cludes entering into details, but if the simple (which die away with time and are not repeated Gutenberg-Richter form appeared inadequate, at the same location) but to preserve local one might use a truncated (censored) exponen- regions of high intensity which persist over time. tial if the truncation point had been determined A rather crude approach to this problem, sug- a priori on geophysical grounds, otherwise a gested in Vere-Jones (1992), is to choose h by a 2-parameter form such as (2.2). "learning and evaluation" technique, smoothing Somewhat analogous questions arise in es- the first part of the catalogue with a given h, and timating the spatial distribution of epicentres, scoring the pattern so obtained with the data in 518 D. Vere-Jones / International Journal of Forecasting 11 (1995) 503-538

the second part, finally choosing that h which so that tracking the variation of/~ as a function gives the best score. See Kagan and Jackson of x is equivalent to tracking the behaviour of (1994a) for a further application. the mean magnitude (-M - M0) as a function of x. Alternatives to kernel estimation are the use Once again there is a family of procedures of orthogonal expansions, and the use of splines. available, including versions of the kernel and We refer to Vere-Jones (1992) for further details. spline procedures. The kernel estimate, for ex- An example of the application of more sophisti- ample, takes the form cated techniques (Bayesian spline fitting proce- dures) is given by Ogata and Katsura (1988). Ekh(X -- Xi) Whichever method is used, once estimates ~(x) = E(Mi - Mo)kh(x - xi) ,~(x) are available for a given location x, stan- dard routines can be used to evaluate )~ (x) on a representing the reciprocal of a weighted average grid of points over the observation region, and to of the magnitudes, with the weights decreasing produce 3-D or contour plots from these. In- as a function of distance from x. Again Vere- deed, ad hoc methods of this kind have long Jones (1992) gives further details, while Ogata been used by seismologists and earthquake en- and Katsura (1988) give an example using a gineers to produce contour plots, not so much of Bayesian spline approach. the geophysical risk, but of the engineering risk. These are most commonly displayed as contour plots of the intensity or ground acceleration 4.3. Rate estimates from the fault model likely to be exceeded with a given low probabili- ty, in a given period, or conversely as contour l'he main alternative to a spatially distributed plots of the expected return time for a given model is to assume that events are restricted to a level of shaking. See, for example, Algermissen network of more or less well-identified faults. et al. (1982) and Basham et al. (1985). A series There are two lines of evidence supporting this of more recent examples, which illustrate differ- point of view. The first is the abundant evidence ent approaches and technical points, is contained associating fault movement with the occurrence in Berry (1989). of large earthquakes. The second is the more An important issue of principle is whether the recent evidence quoted in Section 2 that the spatial distributions for low magnitude events are fractal dimension of the source region of earth- similar to those of high magnitudes, or, equiva- quakes, even when calculated from catalogue lently, whether b-values vary with space. For data, is significantly below the nominal dimen- these purposes the joint distribution of epicentral sion of earthquake epicentre maps. Such results location and magnitude is needed. From a might be expected if events were concentrated computational viewpoint, the easiest approach is on a network of fault planes, but the interpreta- probably to write the joint rate in the form tion is not completely transparent and other explanations may also be possible. A(x, M) = A(x)f(MIx) Many procedures for estimating the engineer- ing risk are based on identifying major active where f(Mlx ) is the conditional density for faults and estimating the mean return times for magnitudes, given a location x, and A(x) is the major events on each. Recently this view has marginal epicentral intensity. 
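The local b-value estimate quoted above is, up to the factor log10(e) appearing in the global estimate of Section 4.2, the reciprocal of a kernel-weighted mean of the magnitude excesses M_i − M_0. A minimal sketch, reusing the Gaussian kernel of the previous example, with hypothetical coordinates and magnitudes:

```python
import math

def gaussian_kernel_2d(dx, dy, h):
    return math.exp(-(dx * dx + dy * dy) / (2.0 * h * h)) / (2.0 * math.pi * h * h)

def local_b_value(x, y, events, m0, h):
    """Kernel-weighted b-value at (x, y): log10(e) divided by the weighted mean of (M_i - m0).
    `events` is a list of (x_i, y_i, M_i) tuples with M_i >= m0."""
    weights = [gaussian_kernel_2d(x - xi, y - yi, h) for xi, yi, _ in events]
    weighted_excess = sum(w * (mi - m0) for w, (_, _, mi) in zip(weights, events))
    return math.log10(math.e) * sum(weights) / weighted_excess

# Hypothetical catalogue entries: (x km, y km, magnitude).
events = [(0.0, 0.0, 3.2), (1.0, 0.5, 4.1), (0.5, -0.2, 3.6), (12.0, 10.0, 3.1), (11.5, 9.0, 5.0)]
print(local_b_value(0.0, 0.0, events, m0=3.0, h=3.0))
```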
Assuming f(MIx ) been narrowed by supposing that any given fault retains the exponential form, estimating f(Mlx ) or fault segment is characterised by events of a reduces to the problem of estimating spatial particular magnitude (or slip). The main evi- fluctuations in the parameter b, which can be dence for this "characteristic earthquake" hy- regarded as a problem of non-parametric regres- pothesis comes from the records of "fossil" sion. This is the more obvious in that the earthquakes, as outlined in Section 4.1. It has estimate/~ is the reciprocal of mean magnitude, several times been suggested that the events D. Vere-Jones / International Journal of Forecasting 11 (1995) 503-538 519 recorded in this way appear to be roughly workers, and that it figures extensively in nearly constant in both size and return time (see for all discussions of the engineering and economic example, Sieh, 1981; Schwarz and Coppersmith, risks. 1984). The direct seismological evidence for such The reason is related to a comment made a viewpoint is scanty, but reports have been earlier, that a large part of the discussion can be given of similar sequences of events in particular carried out directly in terms of mean rates, or locations (Bakun and McEvilly, 1984). mean return times, implying that under the An apparent objection to such a model is that assumption of stationary, the exact choice of it contradicts the immense amount of data sup- model is immaterial. The Poisson assumption porting the usual frequency-magnitude law. One enters only when it is desired to compute prob- possible response is that the frequency-mag- abilities from the mean rate m, which is then nitude law may not reflect the size distribution of done according to the exponential formula earthquakes on a given fault, but the size dis- Prob(no events in (t, t + x)) = e-mX tribution of faults. This interpretation has the advantage of linking the self-similarity observed In fact the probabilities as such rarely play an at the geographical level to the power law form important operational role. They have become of the frequency-magnitude law. It is an idea accepted in decisions concerning design at least that has been part of the folklore for some time partly as a matter of convention. Provided the (see for example, Lomnitz' contribution to Vere- Poisson formula is always used to compute the Jones (1970) and the author's reply). Since the probability, there is (for a given design period x) surface break of a fault may not indicate its real a 1:1 monotonic relationship between rates and size, while most of the smaller faults would not probabilities so that in effect the decisions are even reach the surface, the two hypotheses of based on an assessment of rates. distributed and fault-linked seismicity may be The fact that the Poisson model plays the harder to separate than one might at first im- default role is to be expected. It is the "maxi- agine. mum entropy" model (i.e. making least assump- If the characteristic earthquake hypothesis is tions in some sense) for a point process with accepted, then estimating the background rate of given rates, and arises as the result of many events on a given major fault is reduced to "information-reducing" transformations of a counting the number of events on the fault, and point process (see Daley and Vere-Jones (1988) dividing by the time interval. Where the number chapters 9 and 11 for more discussion of these of events is insufficient, evidence from the plate ideas). 
motion, coupled with some estimate of the likely In particular, the engineering risk is usually size of the characteristic earthquake on the fault, the result of summing the contributions to the provides a second more indirect route. Neither risk from a number of near-independent sources. method is very reliable. The major review by It is well known that the superposition of se- Kagan and Jackson (1994b) suggests that current quences of events from several independent estimates based on these procedures may be sources approximates a Poisson process (see for exaggerated, and calls into question the whole example, Daley and Vere-Jones, 1988). Hence, basis of the characteristic earthquake hypothesis. the combined point process of exceedances of a given level of ground motion at a given site is likely to be closer to Poisson than the component 4.4. A note on the Poisson model from any individual source region. Although the effects of departures from the The reader may be surprised to find this topic Poisson model would tend to become more left until the end of the section, given that the pronounced at very small probability levels, the Poisson model is widely treated as the default author has argued elsewhere (Vere-Jones, 1983; option by seismologists, engineers, and insurance see also McGuire and Shedlock, 1981) that even 520 D. Vere-Jones / International Journal of Forecasting 11 (1995) 503-538
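The 1:1 correspondence between rates and exceedance probabilities under the Poisson assumption is easily made concrete; in the sketch below the rate, the design period, and the 475-year return period are illustrative values only.

```python
import math

def prob_at_least_one(m, x):
    """Poisson probability of at least one event in a design period of length x: 1 - exp(-m*x)."""
    return 1.0 - math.exp(-m * x)

def rate_from_prob(p, x):
    """Invert the relation: the mean rate whose exceedance probability over period x equals p."""
    return -math.log(1.0 - p) / x

m = 1.0 / 475.0                        # e.g. a level exceeded on average once per 475 years
print(prob_at_least_one(m, 50.0))      # ~0.10 over a 50-year design period
print(1.0 / rate_from_prob(0.10, 50.0))  # back to the ~475-year return period
```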

The fact that the Poisson model plays the default role is to be expected. It is the "maximum entropy" model (i.e. making least assumptions in some sense) for a point process with given rates, and arises as the result of many "information-reducing" transformations of a point process (see Daley and Vere-Jones (1988), chapters 9 and 11, for more discussion of these ideas). In particular, the engineering risk is usually the result of summing the contributions to the risk from a number of near-independent sources. It is well known that the superposition of sequences of events from several independent sources approximates a Poisson process (see, for example, Daley and Vere-Jones, 1988). Hence the combined point process of exceedances of a given level of ground motion at a given site is likely to be closer to Poisson than the component from any individual source region.

Although the effects of departures from the Poisson model would tend to become more pronounced at very small probability levels, the author has argued elsewhere (Vere-Jones, 1983; see also McGuire and Shedlock, 1981) that even here other errors and uncertainties are likely to swamp those deriving from departures from the Poisson model.

The net result is that while the Poisson model plays an indispensable role in guiding the discussions of risk into sensible channels, it is not usually a critical assumption for operational decisions based on the background risk: if anything it tends to provide a somewhat conservative source of estimates.

This situation, however, alters drastically when we approach the realm of time-dependent risks. The constant risk or lack-of-memory property of the Poisson model is contradicted, albeit in different directions, both by the prevalence of clustering, and by the characteristic earthquake model. Any proposal for earthquake prediction will be measured in the first instance by the extent of its improvement over the constant-risk Poisson model.

5. Time-varying risks - general issues

5.1. An overview

As soon as we enter the domain of time-varying risks, new and difficult issues arise. Some of these are as follows.

(i) Because of the relative paucity of data relating to large damaging earthquakes, direct estimation of the relevant conditional intensities, or conditional probabilities, for such events becomes very difficult. To a much greater extent than for the background risk, estimates of any form of time-varying risk are likely to be strongly model-dependent. The model-based approach takes advantage of model structure to reduce the number of parameters that have to be estimated, thus allowing the existing data to be used more effectively. The corresponding disadvantage is that model-validation becomes a major concern in its own right. Different models, leading to different prediction scenarios, may be supported to a similar extent by the data. At the present time, there is no general agreement as to the most appropriate model to be used for time-varying forecasts of the risk from large earthquakes.

(ii) As we would see it, in an ideal situation, the proper sequence of events would be: first, model development and model validation; second, development of time-varying risk forecasts based on the selected model; third, development of earthquake alerts based on the risk forecasts. The actual situation presents a considerable distortion of this sequence. The main emphasis has immediately jumped to the stage of developing and testing prediction algorithms, often without the development of proper physical or statistical models, and without realistic discussion of the operational use of the predictions. Two main reasons may be suggested for this premature (as it seems to me) emphasis on predictions: the great difficulty of developing physical models; and the social and political pressures on scientists to develop successful prediction procedures.

(iii) The discussion of earthquake prediction is further muddied by ambiguities in just what a prediction means. In some discussions it may mean nothing more than a calculation of probabilities, a forecast of the time-varying risk. By contrast, a formally announced prediction inevitably takes on some of the character of an earthquake alert. Calling an alert, however, is essentially an economic and political decision. The emphasis on developing formal predictions has therefore placed the scientist in the awkward position of trying to carry some of the obligations and responsibilities that more properly belong to local and national authorities.

(iv) The emphasis on validating prediction algorithms, as against the more general procedure of model validation, produces the further danger that the former process may become equated to the latter. However, the optimal procedure for testing a particular algorithm may not be optimal for testing the model on which it is based. Moreover, by highlighting the problem of validating particular prediction algorithms, the possibility may be overlooked that existing models, despite their imperfections, can provide sufficiently reliable guidelines to initiate worthwhile risk-reduction activities.

(v) Some approaches to prediction, in particular the pattern-recognition approach, eschew the idea of models as part of a basic philosophical stance. The aim is to be purely empirical, using only the data and mechanical search procedures to identify and calibrate the thresholds for a prediction algorithm. In so far as they are assessed in terms of frequencies of success or failure (false alarms and failures to predict), such methods are still inherently probabilistic in character. Rather than accepting the prediction algorithm so obtained as a final result, it would seem more sensible to view the variables incorporated in the algorithm as promising ingredients for a more explicit model, and examine them carefully for their possible physical meaning.

(vi) As in the estimation of background risk, there is a need to distinguish between the geophysical and the engineering risk. Earthquake alerts would seem more appropriately based on the latter; the scientist's task, on the other hand, is to produce models for a time-varying geophysical risk. Earthquake predictions run the danger of confusing the two aspects.

(vii) Both the information needed in the prediction, and the kinds of actions to which the prediction may lead, depend strongly on the time horizon of the prediction. It is common in the seismological literature to distinguish at least three main time-scales: short-term and imminent predictions (anything from a few seconds to about a day); intermediate-term predictions (a few days to a few years); and long-term predictions (a few years or longer). The implicit focus in this paper is on intermediate- and long-term predictions, although some methodological points may apply more generally.

It will be evident even from this incomplete list that moving to time-varying risks not only sharply increases the model-building difficulties, but has administrative and political overtones which cannot be ignored. In the remaining parts of this section, we shall look at three particular aspects in somewhat greater detail: the problem of calling an earthquake alert; the problem of validating models and prediction algorithms; and the problem of dealing with model uncertainty.

5.2. Calling an earthquake alert as a problem in optimal decision making

Several authors have recently considered earthquake prediction from this point of view, and pointed out that here also the conditional intensity, as defined in Section 3, plays a key role. The ideas were first raised in Ellis (1985), but have recently been considerably generalised, sharpened and extended in a series of papers by Molchan and others (Molchan, 1990, 1991, 1994; Molchan and Kagan, 1992). We present a brief outline of the basic argument.

The class of situations considered is restricted in the first instance to those where only three quantities are needed to describe the cost structure, namely

a - the cost per unit time of maintaining the alert;
C₁ - the cost of a predicted event (i.e. one falling within the alert period);
C₂ - the cost of an unpredicted event (falling outside the alert period).

Then over some time interval of length T the total costs have the form

S = a l(A) + C₁ N(A) + C₂ N(Aᶜ) = a l(A) + C₂ N(0, T) − (C₂ − C₁) N(A),

where A is the interval over which the alert is called, l(A) is its length, N(A) and N(Aᶜ) are the numbers of predicted and unpredicted events respectively, N(0, T) = N(A) + N(Aᶜ) is the total number of events, and we assume C₂ > C₁. The aim is to reduce the expected cost to a minimum.

We consider first the special case that the intensity function λ(t) is continuous, and non-random, as in the case of a non-homogeneous Poisson process (see Fig. 4). Suppose first that the period of the alert is to be restricted to a set A with given length l(A). Given λ(t) and l(A), how best should A be chosen? We claim that the optimal choice of A is obtained by gradually increasing the crossing level k (see the figure), thus decreasing the length of the interval on which λ(t) ≥ k, until this length is just equal to the prescribed value l(A).

Fig. 4. Schematic diagram for calling an alert. The curved line is the intensity λ(t); for a given length of the alert period, the area under the curve is maximised by lifting the horizontal line until the intercepted length is just equal to l(A).

The fact that this gives the optimal choice hinges on the equation

E[N(A)] = ∫_A λ(t) dt.

As for the optimal value of l(A), or equivalently of the critical threshold k, this is easily discovered by taking expectations of the cost equation and rewriting it in the form

E[S] = C₂ ∫₀ᵀ λ(u) du + ∫_A [a − (C₂ − C₁) λ(u)] du.

The first term on the right-hand side is fixed, and the expected cost is minimised by taking A to be the set on which the integrand of the second term is negative,

A = {u : (C₂ − C₁) λ(u) − a > 0}.

This corresponds to taking k = a/(C₂ − C₁). In other words, gains are to be expected from calling an alert as soon as the expected saving per unit time from calling the alert exceeds the expected cost per unit time of maintaining the alert. Since the expected saving per event (C₂ − C₁) is typically several orders of magnitude greater than the cost per unit time of maintaining the alert, the critical value of the intensity k can be quite small.

The same type of consideration carries over to the case where the intensity λ depends on space as well as time, or is even random (i.e. a conditional intensity). In the latter case a sample-path regularity condition (ensuring "predictability" of the set A where λ(t) > k) is needed to maintain the validity of the discussion. For these and further ramifications we refer the reader to Molchan (1994).
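The threshold rule k = a/(C₂ − C₁) translates directly into a simple computation. In the sketch below the intensity trace and the three costs are invented for illustration; the alert set is simply the set of times at which λ(t) exceeds k.

```python
import numpy as np

def alert_set(lam, a, c1, c2):
    """Call the alert wherever the expected saving per unit time, (C2 - C1)*lam(t),
    exceeds the cost a per unit time of maintaining the alert."""
    k = a / (c2 - c1)                        # critical intensity k = a / (C2 - C1)
    return lam > k, k

def expected_cost(t, lam, on, a, c1, c2):
    """Expected total cost a*l(A) + C1*E[N(A)] + C2*E[N(A^c)], on a time grid."""
    dt = t[1] - t[0]
    return (a * on.sum() * dt
            + c1 * lam[on].sum() * dt        # expected number of predicted events
            + c2 * lam[~on].sum() * dt)      # expected number of unpredicted events

t = np.linspace(0.0, 365.0, 10_000)                          # one year, in days
lam = 0.01 + 0.2 * np.exp(-0.5 * ((t - 200.0) / 5.0) ** 2)   # a transient rise in risk
on, k = alert_set(lam, a=1.0, c1=5.0, c2=30.0)
print(k, on.mean(), expected_cost(t, lam, on, 1.0, 5.0, 30.0))
```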

5.3. The validation of models and predictions

Model testing is not difficult once the models have been specified in terms of conditional intensities. As we saw in Section 3, the likelihood of such a model can be written out explicitly, using either (3.3) or its extension to marked point processes (3.5). The usual range of likelihood-based techniques then becomes available. Two fully-specified models can be compared by using the likelihood ratio as the test-statistic, an excellent illustration being given in Evison and Rhoades (1993). Significance levels may not be easy to write down exactly, but at the worst they can be found approximately by simulation. Where the models are nested, the usual likelihood ratio test will often be applicable. Likelihood methods can even be applied when the processes depend on external variables which are not incorporated as part of the model; taking these values as given then yields a partial or conditional likelihood which can still be used as a basis for inference (Cox, 1975). However, this approach may create problems for predictions, since the model does not include a description of how to predict forward the external variables.

Even where models are not nested, the likelihood-based AIC procedure can still prove useful in model selection; Vere-Jones and Ozaki (1982), Ogata (1983, 1988), and Zheng and Vere-Jones (1991) are among many papers making use of this approach to compare models in the seismological context. This procedure is not intended for model-testing in the Neyman-Pearson sense, but for selecting the model most likely to prove practically effective in a prediction or decision-making context. In fact this is a situation closer to the reality of earthquake prediction than the idealised context envisaged in the Neyman-Pearson scheme.
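For a model specified through its conditional intensity, the log likelihood takes the standard form log L = Σᵢ log λ(tᵢ) − ∫₀ᵀ λ(t) dt (presumably the form given as (3.3) in Section 3). The sketch below evaluates it numerically and compares two fully specified, non-nested intensities by log likelihood ratio and AIC; the two candidate intensities and the simulated record are illustrative only.

```python
import numpy as np

def log_likelihood(times, lam, T, n_grid=20_000):
    """log L = sum_i log lam(t_i) - integral_0^T lam(t) dt (numerical integral)."""
    grid = np.linspace(0.0, T, n_grid)
    return np.sum(np.log(lam(np.asarray(times)))) - np.trapz(lam(grid), grid)

def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params

# Illustrative comparison: constant-rate Poisson versus a seasonally modulated rate
T = 1000.0
events = np.sort(np.random.default_rng(1).uniform(0.0, T, 120))
lam0 = lambda t: np.full_like(t, 120.0 / T)
lam1 = lambda t: (120.0 / T) * (1.0 + 0.3 * np.sin(2.0 * np.pi * t / 365.0))

ll0, ll1 = log_likelihood(events, lam0, T), log_likelihood(events, lam1, T)
print("log likelihood ratio:", ll1 - ll0, "AIC:", aic(ll0, 1), aic(ll1, 2))
```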
More commonly, in the context of earthquake predictions, it is required that a particular model be developed to the stage of an explicit algorithm for calling a prediction or earthquake alert.

The algorithm itself is then tested by recording, for a data set not used in developing the model, the numbers of successful predictions, failures to predict, and false alarms, and comparing these with the numbers that might be expected on the basis of a simple null hypothesis, usually that of a stationary Poisson process.

As mentioned earlier, exclusive reliance on testing prediction algorithms has several weaknesses as a general prescription. It leads too easily to the validity of the model being judged by the success or otherwise of the particular prediction algorithm. The prediction, however, is based on specific thresholds - ideally, as discussed in the previous section, on thresholds stated in terms of the conditional intensity. It may be quite possible for a model to be valid, but for the prediction to fail through a wrong choice of threshold, or to be suspended because, for a given threshold, there is insufficient data to give significant results - a probable outcome when the predictions are tested on large events. This approach also introduces, to some extent gratuitously, the technical problem of developing statistical procedures for testing prediction algorithms. A common problem is that successive events may not be independent, so that the simplest analysis, assuming successes and failures form independent Bernoulli trials, may be misleading. Rhoades and Evison have explored this issue in some depth (see, for example, Rhoades, 1989; Rhoades and Evison, 1979, 1993).

In their analyses they generally suppose that specific precursors are available in the form of signals or alerts which occur as associated point processes. A more general approach may be needed if the physical variables incorporating the predictive information are themselves continuous. In such a case it is not obvious that first converting these signals into discrete forms, by setting thresholds, will generate optimal procedures, nor is it obvious, if such conversions are used, how optimal thresholds should be chosen. Moreover, it would seem difficult to discuss such questions without the existence of a joint model for both the earthquakes and the associated processes, in which case the joint model itself should be tested, before any algorithms based on its use are developed.

As against this, it can be claimed that in any application of such importance, the performance of the final prediction algorithm needs to be checked directly, and independently of any earlier validations of the model. Kagan and Jackson (1991, 1994b) have rightly pointed to the difficulty of ensuring that the initial choice of data, especially threshold levels and geographical boundaries, does not have the effect of favouring (not necessarily wittingly) the data that best supports the model, thereby introducing a bias in favour of the model. Rhoades and Evison (1989) point to the need for a careful distinction between subjective and objective stages of developing a scientific model, and demand that the model verification stage be free of subjective elements. Even with the best intentions, however, it is hard to avoid all forms of bias, so that continuing evaluation of the model and associated algorithms, on new data, in actual use or in conditions as close as possible to actual use, is ultimately essential.

5.4. Dealing with model uncertainty

Even in the most optimistic scenario it seems unrealistic to assume that the model could ever be known exactly. Even if its general structure were known, parameter values would need to be estimated. The number of parameters required to define the model to an adequate degree of precision might also be uncertain. Finally, there might be uncertainty about the model structure. Competing model structures might be supported to a similar degree by the available data, but might lead to different forecasts of the time-varying hazard. What can be said about prediction in the face of such uncertainties?

First of all, it should be emphasised that in this respect earthquake prediction is not different in kind from other forms of statistical prediction. Even in the simplest example of prediction based on a regression, it is recognised that the prediction will be subject to error from two sources: the uncertainty inherent in the randomness of the predicted event itself (deriving ultimately from factors which have not been captured in the model) and that deriving from uncertainty in the parameters. When predicting a characteristic of the distribution of the predicted event, such as its expected value, or the probability of the event exceeding a certain threshold, the former component drops away, but the latter remains.

Exactly this situation applies to predicting expected values or probabilities associated with earthquakes. The expected values and probabilities inherit the uncertainties implicit in the model formulation. Confidence intervals can be developed for such characteristics, just as for the model parameters. A Bayesian approach is likely to be the most flexible, in which the confidence interval would be derived from the posterior distribution of the characteristic in question. Nor is the Bayesian approach limited to situations where the only uncertainties relate to the parameter values within a well-defined model. It can be extended to situations where the model itself is uncertain, by giving prior probabilities to each possible model, developing each model's posterior probability, and using these values to weight the posterior distributions arising from each model separately. The resulting mixture distribution can be used to define both a point estimate for the characteristic of interest, and an associated confidence interval. Indeed, almost exactly this procedure has been used by the Working Group on Californian Earthquake Probabilities (1988, 1990), although the models they included were restricted to versions of the time-predictable model (see Section 6.2 below).

This raises an important issue. It is implicit in the guidelines for issuing predictions set out by NEPEC and other groups that the resolution of the problem of model uncertainty is seen in terms of model selection. Out of the many uncertain candidates presently offering themselves, one or two will ultimately show superior performance, and will be selected as the basis of real-life prediction schemes.

In fact, this may not be the best approach in practice. The alternative is to think not in terms of model selection, but of model combination, probably using a Bayesian approach as suggested above. Rather as in the definition of economic-type indices, a "basket" of models could be selected, and weighted conditional intensity estimates (or probabilities) prepared on the basis of these. Each model's likelihood would contribute to its posterior weighting, and from time to time the contents of the basket could be reviewed, dropping out old models that had not performed well, and replacing them with new ones which might do better. At any time, the resulting probability intensity would be available for use as an input to decision-making and disaster mitigation procedures. For further discussion of the problem of combining predictions see, for example, Clemen (1989).
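A minimal sketch of such a "basket" approach is given below: posterior model weights proportional to prior times likelihood, and a combined conditional intensity formed as the weighted mixture. The three candidate intensities, the prior weights, and the log likelihood values are all hypothetical.

```python
import numpy as np

def posterior_weights(log_likelihoods, priors):
    """Posterior model probabilities proportional to prior * likelihood,
    computed on the log scale for numerical stability."""
    log_post = np.log(priors) + np.asarray(log_likelihoods, dtype=float)
    log_post -= log_post.max()
    w = np.exp(log_post)
    return w / w.sum()

def combined_intensity(t, models, weights):
    """Mixture forecast: the weighted average of the candidate intensities."""
    return sum(w * m(t) for w, m in zip(weights, models))

# A 'basket' of three hypothetical intensity forecasts (events per day)
models = [lambda t: np.full_like(t, 0.020),
          lambda t: 0.020 + 0.015 * np.exp(-t / 200.0),
          lambda t: 0.010 + 0.000025 * t]
w = posterior_weights([-210.4, -207.9, -209.2], priors=[1.0 / 3] * 3)  # hypothetical fits
t = np.linspace(0.0, 400.0, 5)
print(w)
print(combined_intensity(t, models, w))
```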
6. Time-varying risks: some specific algorithms

Despite the massive literature on earthquake prediction, there are very few candidate procedures which are based on models from which conditional intensities or probabilities can be calculated, and which at the same time have been tested on data, enjoy some semblance of credibility, and yield a significant sharpening of the risk over the Poisson model. Indeed, we know of none which satisfy these criteria fully. In this section we describe some of the best-known and more successful attempts.

6.1. The Hawkes process and Ogata's ETAS model

In the early 1970s, Hawkes (1971) introduced a family of what he called "self-exciting" or "mutually exciting" models, which became both pioneering examples of the conditional intensity methodology and models of general utility for the description of seismic catalogues. Early applications to such data are in Hawkes and Adamopoulos (1973), and in the discussion of the "Klondyke model" in Lomnitz (1974). The models have been greatly improved and extended by Japanese authors, especially Ogata, whose "ETAS" model was recently adapted for the IASPEI software programme (see Ogata, 1993).

In his hands it has been successfully used to elucidate the detailed structure of aftershock sequences (Ogata and Shimazaki, 1984), the causal influence of events in one region on those in another (Ogata et al., 1982; De Natale et al., 1988), the presence of trends and seasonal effects (Ogata and Katsura, 1986), and the detection of periods of relative quiescence masked by a background of decaying aftershock sequences (Ogata, 1992). A general introduction and re-

The basic model has a conditional intensity of the form

λ(t) = a + Σ_{n: t_n < t} g(t − t_n),

where a > 0 is a background immigration term; g(x) > 0 represents the contribution to the conditional intensity after a lag of length x, and satisfies ∫₀^∞ g(u) du < 1; and the sum is taken over all events {t_n} occurring before the current time t. The process has both epidemic and branching process interpretations. Any single event can be thought of as the "parent" of a family of later events, its "offspring", which ultimately die out but are replenished by an immigration component. Its motivation is essentially pragmatic, although Lomnitz (1974) points out the resemblance to Boltzmann's original representation of "elastic after-effects" (Boltzmann, 1876) as a linear combination of time-decaying terms. In practice, g(x) is usually given a simple parametric form, such as a finite sum of Laguerre polynomials (Vere-Jones and Ozaki, 1982), or the Pareto-type form used in Ogata's ETAS model, by analogy with Omori's law (see Section 2.1).

The model is readily extended to include additional variables. Magnitudes can be incorporated in a marked point process intensity of the form

λ(t, M) = f(M) [a + Σ_{n: t_n < t} h(M_n) g(t − t_n)],

where h(M) is a function which increases rapidly with magnitude (e.g. as exp(γM) for some γ > 0). The extension to "mutually exciting" processes takes the form

λ_i(t) = a_i + Σ_j Σ_{n: t_n^(j) < t} g_ij(t − t_n^(j)),

in which events of type j contribute to the conditional intensity for events of type i through the transfer functions g_ij. The final extension is to replace the discrete index j by a continuously variable spatial coordinate x, which leads to a space-time-magnitude intensity of the form

λ(t, M, x) = f(M) [a(x) + Σ_{n: t_n < t} h(M_n) g(t − t_n, x − x_n)],

where a(x) is a spatially distributed immigration or background noise term, and g(t, x) is now a space-time kernel such as a Gaussian (diffusion) kernel. Further details and examples are given by Musmeci and Vere-Jones (1992) and Rathbun (1993). It is one of the very few examples of a full space-time model. Recently, however, Kagan and Jackson (1994a) have taken such ideas one step further by incorporating not only the location but also the orientation of the fault movement. In other respects their model is similar in spirit to, but different in detail from, the self-exciting models described here.

At present, models of this kind offer the best general purpose description for catalogues of moderate size (M ≥ 4 or M ≥ 5) earthquakes. They are amply attested by applications to many different seismic regions. The time-magnitude
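As an illustration of how such an intensity is evaluated in practice, the sketch below implements a time-magnitude intensity of the ETAS type, with an Omori-law (Pareto) kernel and an exponential productivity term h(M). All parameter values, and the tiny catalogue, are invented rather than fitted.

```python
import numpy as np

def etas_intensity(t, hist_t, hist_m, m0, a=0.05, K=0.8, c=0.01, p=1.1, gamma=1.5):
    """lambda(t) = a + sum over past events of h(M_n) g(t - t_n), with
    h(M) = K exp(gamma (M - m0)) and a normalised Omori-law kernel
    g(u) = (p - 1) c^(p-1) (c + u)^(-p)."""
    past = hist_t < t
    u = t - hist_t[past]
    h = K * np.exp(gamma * (hist_m[past] - m0))
    g = (p - 1.0) * c ** (p - 1.0) * (c + u) ** (-p)
    return a + np.sum(h * g)

def marked_intensity(t, M, hist_t, hist_m, m0, beta=2.3, **kw):
    """lambda(t, M) = f(M) [a + sum h(M_n) g(t - t_n)], with an exponential
    (Gutenberg-Richter) magnitude density f(M) = beta exp(-beta (M - m0))."""
    f_M = beta * np.exp(-beta * (M - m0))
    return f_M * etas_intensity(t, hist_t, hist_m, m0, **kw)

# Illustrative history: the intensity one day after the last of three events
hist_t = np.array([0.0, 3.0, 10.0])          # days
hist_m = np.array([4.5, 6.0, 4.2])
print(etas_intensity(11.0, hist_t, hist_m, m0=4.0))
print(marked_intensity(11.0, 5.0, hist_t, hist_m, m0=4.0))
```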

Fig. 5. A realisation of the conditional intensity from Ogata's ETAS model. The uppermost graph represents the cumulative curve ∫₀ᵗ λ(u) du, the middle graph is a plot of the conditional intensity λ(t) itself, and the bottom plot shows the times and magnitudes of the largest events (horizontal axis: time in days). The plot is produced from software developed by Y. Ogata and T. Utsu for the International Association for Seismology and Physics of the Earth's Interior (IASPEI).

or fault segment, prior to the current time t. Then if h(u|M) denotes the hazard function of the time between events, supposing the previous event had magnitude M, the conditional intensity function for events on that fault has the form

λ(t) = h(t − t*|M*).    (6.2)

A detailed working through of this methodology to events affecting California in general, and the San Francisco Bay region in particular, is contained in the two USGS working group reports (Working Group on Californian Earthquake Probabilities, 1988, 1990). The "engineering risk" for a given site is then calculated by summing the risk contributions from each fault segment close enough to affect the site in question: thus the engineering risk (for a specified intensity of shaking) appears as a sum of terms

λ_E(t, x) = Σᵢ aᵢ(x − xᵢ, Mᵢ) hᵢ(t − tᵢ*|Mᵢ*),

where the aᵢ are attenuation factors relating the site to the source i and characteristic magnitude Mᵢ of the ith source fault; tᵢ* and Mᵢ* are the time and magnitude of the previous event on that fault, and hᵢ(u|M) is the conditional hazard function for the ith fault.

In the specific formulation preferred by the Working Group, the role of the magnitude is played by the slip (displacement) on the fault and the inter-event distributions are taken to be lognormal, with median proportional to the slip of the previous event on the fault. The justification for the lognormal is indirect, deriving from analogies with inter-event distributions from a variety of seismic sources. The hazard function for the lognormal distribution increases from zero to a maximum around the median and then decreases back to zero.
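A minimal sketch of these two ingredients - the lognormal hazard for a single fault, and the attenuated sum over faults giving the engineering risk at a site - is given below. The attenuation factors, medians, and standard deviations are placeholders, not values from the Working Group reports.

```python
import math

def lognormal_hazard(u, median, sigma):
    """Hazard function of a lognormal inter-event time: rises from zero to a
    maximum near the median and then falls back towards zero."""
    if u <= 0.0:
        return 0.0
    z = (math.log(u) - math.log(median)) / sigma
    pdf = math.exp(-0.5 * z * z) / (u * sigma * math.sqrt(2.0 * math.pi))
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))
    return pdf / survival

def engineering_risk(t, faults):
    """lambda_E(t, x) = sum_i a_i h_i(t - t_i* | M_i*): each entry of 'faults'
    holds (attenuation a_i, time of last event t_i*, median recurrence, sigma)."""
    return sum(a * lognormal_hazard(t - t_star, med, sig)
               for a, t_star, med, sig in faults)

faults = [(0.6, -80.0, 150.0, 0.4),    # hypothetical segments affecting the site
          (0.3, -20.0, 100.0, 0.5),
          (0.8, -210.0, 250.0, 0.3)]
print(engineering_risk(0.0, faults))   # present-day engineering risk at the site
```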
Possible histories for the conditional intensity on a single fault, and for the engineering risk at a site affected by several faults, are shown in Fig. 6. Both seem to me somewhat counter-intuitive. Note that the latter is closer to constant than the former, as a consequence of adding the contributions from different sources (approximation to a Poisson process).

As mentioned in Section 4, the most serious difficulty posed by this model lies in the fact that it is not rooted in the catalogue of current earthquakes. The evidence in favour of the characteristic earthquake model is largely based on paleoseismological records. Even incorporating historical events, there are rarely more than two or three direct records of events which have affected substantial segments of the major faults. Not only is the general evidence for the model indirect, but its parameters, such as the slip rate and the standard deviation in the lognormal distribution, also have to be inferred indirectly, usually from a consideration of geological and tectonic evidence.

This lack of connection with the seismic catalogue is illustrative of the seismologist's current dilemma. Is it reasonable to assume that the major events follow patterns and processes which cannot be directly confirmed from the catalogues? If so, what is the best method of combining such information with the catalogue data?

One reason for caution with models of this kind, illustrated all too vividly by the recent Northridge earthquake in Los Angeles, is that earthquakes quite large enough to pose a major hazard in human terms may be insignificant in relation to the still greater events which take up the motion of the plates on a geological time scale. Indeed the third Working Group report, for Southern California, for which drafts are currently in preparation, adopts a mixed philosophy, looking at the risk both from large events along the major faults, and from smaller events in other locations.

6.3. Stress release models

The time-predictable model considered briefly in the previous subsection represents an attempt to capture, in terms of a stochastic model, the underlying idea of Reid's (1910) elastic rebound theory of major earthquakes. The "stress release model" developed in a series of papers by the author and colleagues (see Ogata and Vere-Jones, 1984; Vere-Jones and Deng, 1988; Zheng and Vere-Jones, 1991, 1994) attempts to provide a more general formulation of the same idea.


Fig. 6. (a) and (b) Schematic representations of the conditional intensity from the time-predictable model; (c) the result of superposing the risks from both sources (horizontal axes: time).

This version was developed primarily for analysing regional historical catalogues, for events with M ≥ 6 or 6.5, that could be considered reasonably complete over a period of 3-4 centuries. Historical catalogues approaching this quality are available from Northern China, Japan, Italy, Iran, and possibly other countries. Here we shall outline a general formulation of the model and contrast the assumptions required to specialise it first to the time-predictable model of the previous section, and then to the model used in the analyses of historical data mentioned above.

The first general assumption is that the probabilities of events occurring within the region (or on the fault) are determined by an unobserved state variable. In the versions of the model which have been developed so far it is supposed that this state variable can be represented by a scalar quantity X(t) which increases linearly between events and decreases instantaneously when events occur. It is not necessary that this quantity be interpreted literally as stress, but this is its general character. At any time, X(t) then represents the balance between the accumulated tectonic stress in a region and the amount of stress released through earthquakes:

X(t) − X(0) = at − β Σ_{n: t_n < t} S_n.    (6.3)

We are now in a position to examine the differences between the model variants. In the time-predictable model the risk function Ψ(x) - the conditional intensity expressed as a function of the current level x of the state variable - has a singularity at the strength of the fault, say x₀: Ψ(x) = 0 for x < x₀ and Ψ(x) = ∞ for x ≥ x₀. By contrast, in the regional stress release model it is assumed that Ψ(x) ∝ eˣ, i.e. that there is a "soft" upper limit to the strength of the crust over the region or along the fault.

The more crucial difference, however, is in the form of the frequency-magnitude distribution. In the model for historical earthquakes, it is assumed that a version of the Gutenberg-Richter law is valid. This is converted to a stress release distribution by taking f(s|x) = c s^(−β), with β of the order 1.5-2. The time-predictable model, however, was not intended to model catalogue data, but a sequence of events of roughly constant size (slip). Consequently, although magnitudes as such are not explicitly modelled in the Working Group reports, it is implicitly assumed that they are either equal to the magnitude of the characteristic earthquake, or normally distributed about it, leading to the lognormal distribution for the slips or the stress release.

A feature of the regional stress release model is that, although the smaller events make rela-
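The following sketch shows how a conditional intensity of this general type can be evaluated: the state variable of Eq. (6.3) is tracked through an event history and passed through an exponential ("soft-limit") risk function. The loading rate, the stress drops, and the parameters μ and ν of the risk function are all invented for illustration.

```python
import numpy as np

def stress_level(t, x0, a, event_times, releases, beta=1.0):
    """X(t) = X(0) + a t - beta * (sum of the releases S_n of events before t),
    as in Eq. (6.3)."""
    past = np.asarray(event_times) < t
    return x0 + a * t - beta * np.asarray(releases)[past].sum()

def conditional_intensity(t, x0, a, event_times, releases, mu=-4.0, nu=1.0):
    """Regional stress release variant: a 'soft' exponential risk function
    Psi(x) = exp(mu + nu * x) applied to the current stress level X(t)."""
    return np.exp(mu + nu * stress_level(t, x0, a, event_times, releases))

# Illustrative history: stress builds at rate a and drops at each event
times = [30.0, 95.0, 160.0]            # years
drops = [1.2, 0.6, 2.0]                # stress released, arbitrary units
for t in (20.0, 100.0, 200.0):
    print(t, conditional_intensity(t, x0=0.0, a=0.02, event_times=times, releases=drops))
```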


Fig. 7. Realisations of the conditional intensity for the regional stress release model, applied to events in Northern China (from Zheng and Vere-Jones, 1991). The examples show diversity in the ability of the model to fit the data (panels: China E1, E2, W1, W2; horizontal axis: year, 1500-2000; dotted line: Poisson; solid line: stress release).

systematically exploring the possibility of using pattern recognition methods to develop algorithms for detecting space or time-space windows with heightened probabilities of earthquake occurrence. For example, an extensive series of papers has been published on identifying earthquake-prone locations in different regions of the world; see, inter alia, Gel'fand et al. (1973), Cisternas et al. (1985).

Our concern is rather with the use of such methods to predict "TIPs", or "times of increased probability". The best known of their algorithms of this type is the so-called M8 algorithm.

This was the subject of a considerable furore in 1988, when the Soviet president indicated to his American counterpart that Soviet scientists had predicted a period of heightened risk for Southern California. The US National Earthquake Prediction Evaluation Committee was called into action to meet with the Soviet scientists and consider their predictions (see NEPEC, 1989). No immediate action was taken, but further evaluation of the algorithm continued and a number of reports have since appeared in which the algorithm is described and applied to different regions and with different data sets (see, for example, Brown et al., 1989).

In essence, pattern recognition procedures are a means of automatically sifting through large data sets to discover particular data combinations which occur repeatedly before events of interest, and are therefore to be considered as potential precursors of such events. The procedures generally involve two stages: a learning stage, in which patterns of potential interest are identified, and a selection stage, in which those patterns are selected which "work" with an independent set of data.

The M8 algorithm makes use of four indicators which have been developed and selected over a considerable period. They comprise:
(i) the numbers of main shocks (retained after removing aftershocks by a standard windowing procedure) in a moving time interval of fixed length;
(ii) the rate of change of such numbers;
(iii) a weighted sum of the mainshocks within such a sliding window, where the weight associated with a given event is proportional to a fractional power of the energy release;
(iv) a numerical characteristic of aftershock activity within the moving window, namely, the maximal count of aftershocks from any aftershock sequence within the year preceding the end of the moving time window.
The first three of these indicators are repeated for events with a lower magnitude threshold, giving seven indicators in all.

All seven of these characteristics are then evaluated for a series of circles with fixed diameter (related to the magnitude threshold of the catalogue) and centred on a suitable grid. A TIP is declared for one of these circles when, for three years in succession, at least one indicator from each of the four types (i)-(iv) exceeds its upper p% percentile (where p = 10% for types (i)-(iii) and p = 25% for case (iv)), and not more than one indicator fails to exceed the corresponding threshold. The TIP is declared for a 5-year period. For further details see, for example, Keilis-Borok and Kossobokov (1990a,b).

The authors of the algorithm emphasise its purely empirical character, and disavow the possibility of linking it to a model-based approach, or even to explicit risk forecasts. However, as Ellsworth suggested in the NEPEC (1989) discussion, one can obtain a rough indication of the risk enhancement factor associated with a TIP by comparing its predicted number of events (one) with the expected number within the same region and time interval from the background rate. According to Ellsworth this suggests a risk enhancement factor in the range 3-5, which is similar to that given by the models considered in the two preceding subsections. Moreover it might be argued that one of the main values of such an approach is in the clues it gives as to features which may indicate local changes in the risk environment, and which may be worthy of further study. Even as they stand, any or all of the indicators (i)-(iv) could be incorporated as regressors in a Cox-regression type model for the conditional intensity, and this type of modelling could be used to give a more direct evaluation of their predictive power.
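A schematic rendering of the TIP rule quoted above is sketched below; it is not the published M8 code, and the indicator histories, their grouping into types, and the contrived burst of activity are invented for illustration.

```python
import numpy as np

def tip_declared(history, types, pct_main=90, pct_aftershock=75):
    """Schematic TIP rule: for each of the last three years, at least one
    indicator of each type (i)-(iv) must exceed its long-run percentile
    (90th for types i-iii, 75th for type iv), with at most one failure."""
    history = np.asarray(history, dtype=float)         # shape (years, indicators)
    thresholds = np.array([np.percentile(history[:, j],
                                         pct_aftershock if types[j] == 4 else pct_main)
                           for j in range(history.shape[1])])
    for year in history[-3:]:                          # three years in succession
        exceed = year > thresholds
        types_hit = {types[j] for j in range(len(types)) if exceed[j]}
        if len(types_hit) < 4 or (~exceed).sum() > 1:
            return False
    return True

# Seven indicators: types (i)-(iii) at two magnitude thresholds, plus type (iv)
types = [1, 2, 3, 1, 2, 3, 4]
rng = np.random.default_rng(2)
history = rng.gamma(2.0, 1.0, size=(25, 7))
history[-3:] *= 3.0                                    # a contrived burst of activity
print(tip_declared(history, types))
```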
6.5. The precursory swarm hypothesis

One particular pattern, of a similar general character to those above, has been investigated in considerable detail. This is the so-called "precursory swarm": a group of intermediate size events, closely grouped in time, space, and magnitude, and followed after some interval by a larger event whose time, space, and magnitude coordinates can be roughly inferred from the characteristics of the preceding swarm. This idea has been investigated in a series of papers of increasing sophistication and complexity by Evison and Rhoades: see, in particular, Evison (1977, 1982) and Evison and Rhoades (1993).

Moreover they have used their experience with this idea to formulate and discuss more general aspects of hypothesis testing as it relates to earthquake prediction; see, inter alia, Rhoades and Evison (1979, 1993), Rhoades (1989).

The following points arise, and are typical of those associated with the testing of precursory phenomena.
(a) An exact definition of the precursory event is needed. In the present case this requires explicit statements of just what may or may not be considered a precursory swarm.
(b) Likewise an exact statement is needed of just what events are being predicted.
(c) The description of the model should include an expression for the probability distribution (e.g. in the form of the hazard function) of the time to the predicted event, after the occurrence of the precursor.
(d) From the information in (a), (b), and (c), a likelihood ratio can then be developed for testing the given hypothesis against the Poisson as a null hypothesis. Note that the likelihood ratio subsumes the information concerning the numbers of successful predictions (precursors followed by appropriate events), false alarms (precursors not followed by an appropriate event), and failures to predict (events not preceded by precursors).
(e) Since the authors advocate a Bayesian approach, updating formulae need to be developed both for estimating the parameters within the model, and for evaluating the likelihood ratio referred to in (d). In general this will include procedures for updating the probabilities that the observed precursors are "valid", i.e. will be followed by an appropriate event.

None of these points is entirely straightforward. The definitions required for (a) and (b) are to some extent arbitrary, particularly insofar as they involve magnitude or distance thresholds. As pointed out earlier, it is difficult, in such a context, to separate out the problem of testing the model from the problem of testing the thresholds. Here, as in the definition of a "valid" precursor, there may be no clear physical motivation for the thresholds chosen, so that one is back into the pattern recognition mode. In more complex settings, where several types of precursor may be involved, one becomes entangled in the impenetrable logical thicket surrounding the independence relations among the precursors, and between them and the predicted event. The assumptions made here can drastically modify the risk enhancement factor, but may be virtually impossible to verify directly because of the lack of data on the joint occurrence of several of the precursors.

In the simple form of the precursory swarm model described in Rhoades and Evison (1979), the magnitude M of the predicted event, and the time T to its occurrence from the occurrence of the swarm, are given by logarithmic regressions of the form

log T = a₀ + a₁M̄ + ε₁,
M = b₀ + b₁ log T + b₂ log M̄ + ε₂,

respectively, where M̄ is the average magnitude of the three largest events in the swarm, and ε₁ and ε₂ are normally distributed error terms. As in Section 6.2 the lead times are lognormally distributed, which according to Rikitake (1976) is typical of a range of longer-term precursory phenomena.
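To make the use of these regressions concrete, the sketch below converts the mean magnitude of a swarm into a predicted lead time and mainshock magnitude. The regression coefficients, the error standard deviations, and the choice of base-10 logarithms are assumptions for illustration, not the fitted values of Rhoades and Evison.

```python
import math
import random

def predict_from_swarm(m_bar, a0=-1.0, a1=0.7, b0=3.0, b1=0.4, b2=3.5,
                       sd1=0.15, sd2=0.2):
    """Illustrative use of the two regressions:
    log10 T = a0 + a1*m_bar + e1                       (lead time T, lognormal)
    M = b0 + b1*log10 T + b2*log10(m_bar) + e2         (predicted mainshock magnitude)."""
    log_T = a0 + a1 * m_bar + random.gauss(0.0, sd1)
    M = b0 + b1 * log_T + b2 * math.log10(m_bar) + random.gauss(0.0, sd2)
    return 10.0 ** log_T, M

random.seed(0)
m_bar = 5.1     # mean magnitude of the three largest events in the swarm
T, M = predict_from_swarm(m_bar)
print(f"predicted lead time ~{T:.0f} days, predicted mainshock magnitude ~{M:.1f}")
```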
This simple form of the hypothesis was tested over some twelve years, and ultimately rejected when its likelihood ratio, relative to the Poisson, fell below a 5% level. According to Evison and Rhoades (1993) the reasons for its failure hinge on the narrow definitions required in (a) and (b), in particular on the inability of the model to cope satisfactorily with clusters of swarms and clusters of events. A more sophisticated version of the model, described in Evison (1982), is still being tested.

Despite the difficulties in formulating and testing a precise hypothesis, there is evidence from many regions that something akin to a precursory swarm occurs before many major events. Such features may indicate a change in the underlying stress regime, and it is possible that something like a hidden Markov model (for the stress regime) could provide a more flexible modelling framework within which such precursory features could be explored.

6.6. The Chinese experience

The one widely accepted example of a successful prediction, which established earthquake prediction as something more than a seismologist's pipe-dream, occurred in Haicheng, in Northern China on February 4, 1975. According to Bolt (1988), on the day before the earthquake, 'the Shihpengyu Seismographic Station suggested that the (recent) small shocks were the foreshocks of a large earthquake. As a result, on February 4, the party committee and the revolutionary committee of Liaoning province alerted the entire province. Special meetings were held at once to ensure that precautionary measures would be taken. The people were mobilised to build temporary living huts, move patients from hospitals, concentrate transportation facilities and important objects, organise medical teams, induce people to evacuate their homes, and move the old and weak to safety'. Then, at 7.36 pm, an earthquake of magnitude 7.3 struck the Haicheng-Yingkow region. Western as well as Chinese reports confirmed that the damage had been extensive, but that as a consequence of the prediction the losses of human and animal lives had been greatly reduced.

Subsequent analysis of this dramatic episode suggests that cultural and political factors played as big a role in this success as the information available to the Chinese seismologists. At that time in China, earthquake surveillance was a high priority among the lay population as well as among the specialists, leading to a high level of local involvement. Some years previously a period of relatively high activity had been noted, followed by a quiescence. Then, not long before the earthquake itself, activity began to recover, and to migrate towards the future epicentre. Finally the larger shocks occurred which were identified as foreshocks. Supplementary reports of anomalous animal behaviour, changes in groundwater level, and direction of ground tilting, etc., supported the conclusions, and reinforced the decision to call the alert. Indeed, Lomnitz (1994) suggested that the public may have been leading the seismologists, rather than vice versa. In any case it seems that the decision was taken following the "step-by-step" procedure - initial information confirmed at each step by further evidence - favoured by the Chinese, and descriptive of a mode of decision-making rather than a method for computing risks, geophysical or otherwise. Such an interpretation is supported by the fact that, although the Chinese have claimed other successful predictions (see Cao and Aki, 1983; Ma et al., 1990), none have had the same impact. There have also been some tragic failures, notably in respect of the even greater Tangshan earthquake, which occurred the following year with a reported death toll of 250,000.

An attempt to quantify retrospectively the risks which might have been calculated from the information available at the time of the Haicheng earthquake was made by Cao and Aki (1983). They first identified four types of precursory information: long term (periods of generally high activity); intermediate term (enhancement of moderate earthquake activity); short term (variation of radon content of water from a well); imminent (reports of anomalous animal behaviour). Then they calculated a rough risk enhancement factor (probability gain) for each, and multiplied the results to obtain a total risk enhancement of about 14,000. The largest contributions came from the terms corresponding to animal behaviour and radon content.

It is difficult to give much credence to this value, which vastly exceeds any of the estimates considered in the previous subsections. The estimates for the individual risk enhancement factors are very rough and the evidence supporting them is not supplied in detail. Also the multiplication of these factors implies independent precursors, and can lead to a gross inflation of the risk enhancement if in fact the effects are not independent (see Rhoades, 1989). The calculations do, however, suggest the important moral that continued monitoring of a wide range of

potential precursors may be an essential pre- tional reactions to any precise prediction, sci- requisite to successful earthquake prediction. entific or otherwise, and increases the difficulty of utilizing predictive information in a rational and objective manner. Nor are the media slow to 7. Concluding remarks exploit the public interest in earthquake predic- tion, and if the episode ends up, as it often does, To what extent does earthquake forecasting in making the scientists look foolish, then so raise unique problems and to what extent does it much the better; they are becoming an increas- share features in common with other forecasting ingly popular target for public frustration of contexts? I hope that, after reading thus far in many kinds. Gori (1993) documents a recent the article, readers with expertise in other fields example of the expenses, anxiety, and general of forecasting might be better able to answer this confusion caused by a prediction that had no question than I am. By way of concluding endorsement from any reputable seismological remarks, however, let me give a personal impre- body. Administrators are hardly to be blamed ssion of the situation. for not knowing which way to jump in such As with any forecasting task there are two situations. Their primary responsibility is to the aspects to consider: what forecasting information constituencies they serve, and it is to their is really wanted, and what technical tools are perception of the situation, rather than to the needed to supply it from the available data. As a distant voice of the scientist, that they are likely proper statistician should, I put the question of to listen in moments of stress. Such incidents use first: until the real uses have been clarified, illustrate that earthquake prediction is caught up no statistical investigation is likely to be effec- with social and political issues which in practice tive, and any work undertaken may be wasted. obscure and even override considerations of cost Who, then, actually wants forecasts of earth- and the rational use of resources. quake risk, and why? I doubt if earthquake forecasting is unique in Here I have to admit that, if genuine users facing problems of this kind, but it is surely an exist, they have been slow to identify them- extreme case. selves. Indeed, looking at earthquake prediction Similar reasons may lie behind the "stop-go" from this point of view, we quickly encounter character of earthquake research funding, at two of the main problems which bedevil the least in the United States. Bruce Bolt remarks in subject: the high stakes for which the game is a recent review (Bolt, 1991) that played, and the long-term character of the in- 'In terms of national welfare, it might be ex- vestments that have to be made. Earthquake pected that the risk involved in earthquakes countermeasures are not cheap to put into place would give special force to the claims for funds and it may be many years before they show their and resources for earth scientists, engineers, value. There will always be vested interests planners and others involved in enhancing seis- wanting to opt out of expenses which may be in mic safety. Seismological history tells otherwise. the long-term public interest, but show little in Risk reduction is characterised by bursts of the way of direct benefits. 
Perhaps for this activity and political support following major reason, I know of little stated interest in risk earthquakes, and delay curves that have a half- forecasts which provide time-dependent risk en- life of a year or so before public interest re- hancement factors in the range 1-10, although cedes.' this might seem a realistic goal in the present My impression is that the most satisfying state of knowledge, and, used appropriately, applications of statistical forecasting procedures might lead to significant reductions in the losses in seismology are related to engineering situa- caused by earthquakes. tions where a particular need has been more At the same time the huge cost and high tightly identified. The task of forecasting the drama of a major earthquake encourages emo- amplitude and other characteristics of the ground D. Vere-Jones / International Journal of Forecasting 11 (1995) 503-538 535 motion at a particular site is the most familiar of known. In essence, they fall into the same these, and is incorporated in many important category as forecasting lifetimes, for which there engineering and design procedures. A recent exist substantial literatures in the engineering local example relates to the analysis of lifelines (reliability), actuarial, and medical contexts. In in the Wellington region and the location of any such situation, the central concept is likely to particular points of vulnerability in case of a be the hazard function (conditional intensity major earthquake affecting the region. Such function, age-specific mortality rate) conditioned studies usually involve no more than a careful by relevant supplementory informatian. evaluation of background rates, but there is A more significant technical difficulty, in my surely scope for extending them by identifying opinion, is posed by the spatial dimension of also the facilities that could best make use of earthquake occurrence. Then there is no clearly advance warnings of increased levels of local risk defined "next event", as this will depend on the at different time scales. Looking at the forecast- region the user has in view (not to mention also ing problem from such practical perspective the magnitude threshold and possible constraints helps to identify the predictive information that on other earthquake characteristics). One is will be most genuinely useful. As well as direct- forced into forecasting a space-time intensity, ing scientific and statistical research into prod- which stretches both models and algorithms to uctive channels, such a programme can also help the limit. Indeed, full space-time models, for to strengthen the working links between sci- earthquakes or anything else, are rare. Only in entists, statisticians, and administrators. epidemiology, to my knowledge, do similar By contrast, looking for the ultimate earth- problems arise, with somewhat similar models quake prediction algorithm may prove to be and procedures currently being developed. something akin to the search for the holy grail, Even here, therefore, the technical difficulties rich in virtue and prestige, but somewhat lacking in developing space-time earthquake forecasts in useful outcomes. are not unique, although near the edge of Now let me turn to technical aspects of earth- current statistical and computing capabilities. quake forecasting. 
By far the greatest technical To summarise, there is little doubt that earth- difficulty is simply the absence of any identified quake prediction remains one of the most chal- source of reliable predictive information. This is lenging problems in contemporary science. It is primarily the scientist's problem, but the statisti- still conceivable that a breakthrough will occur- cian may be able to help through improved that some precursory phenomenon will be dis- methods of condensing and displaying the in- covered which will allow alerts to be called with formation in multidimensional data sets, and in a high level of confidence and precision. It seems developing more effective procedures for more likely, however, that the emerging picture evaluating models and prediction algorithms. will be one of gradually tightening precautionary The exaggerated swing from excessive optimism measures-low-level alerts which can be locally concerning the possibility of earthquake predic- shifted to a higher level in response to sharpened tion to the current state of general pessimism is estimates of the earthquake risk. certainly due in part to a failure to appreciate the statistical nature of the subject; neither extreme is really an appropriate response. Acknowledgements Leaving the fundamental scientific problem aside, forecasting earthquakes differs from most I am grateful to the Editor for suggesting the routine forecasting problems in that it deals not preparation of this article, and for his encourage- with a discrete or continuous time series, but ment and helpful criticisms along the way. I am with sudden random events. But, although prob- also indebted to a number of colleagues who lems of this kind may be relatively uncommon in read and commented on an early version of the forecasting practice, they are certainly not un- article, in particular to Frank Evison and David 536 D. Vere-Jones / International Journal of Forecasting 11 (1995) 503-538

Jackson, whose detailed criticisms have forced Cao, T. and K. Aki, 1983, Assigning probability gain to precursors of four large Chinese earthquakes, Journal of me to tighten up the discussion in almost every Geophysical Research 88, 2185-2190. section of the paper. Cisternas, A., P. Godefroy, A. Grishiani et al., 1985, A dual Work on this paper was supported in part approach to recognition of earthquake prone areas in the through the New Zealand Foundation for Sci- Western Alps, Annals of Geophysics 3, 249-270. ence, Research and Technology Grant No. 93- Clemen, R.T., 1989, Combining forecasts, International VIC-36-039. Journal of Forecasting 5, 559-583. Cornell, C.A., 1968, Engineering seismic risk analysis, Bul- letin of the Seismological Society of America 67, 1173- 1194. References Cornell, C.A., 1980, Probabilistic analysis: A 1980 reassessment, Proceedings of Conference on Earth- quake Engineering, Skopje, Yugoslavia. Akaike, H., 1974, A new look at the statistical model Cox, D.R., 1972, Regression models and life tables (with identification, IEEE Transactions: Automation and Con- discussion), Journal of the Royal Statistical Society B 34, trol AC-19, 716-723. Aki, K., 1981, A probabilistic synthesis of precursory phe- 187-220. Cox, D.R., 1975, Partial likelihood, Biometrika 62, 269-276. nomena, in: D.W. Simpson and P.G. Richards, eds., Cox, D.R. and V. Isham, 1980, Point processes (Chapman Earthquake prediction: An international review (Maurice Ewing Series 4, A.G.U., Washington D.C.). and Hall, London). Cox, D.R. and P.A.W. Lewis, 1966, The statistical analysis of Aki, K., 1989, Ideal probabilistic earthquake prediction, series of events (Methuen, London). Tectonophysics 169, 197-198. Daley, D.J. and D. Vere-Jones, 1988, An introduction to the Aki, K. and P.G. Richards, 1980, Quantitative seismology, Vols. 1 and 2 (W.H. Freeman, San Francisco). theory of point processes (Springer, Berlin). Algermissen, S.T., D.M. Perkins, P. Denham, S.L. Henson De Natale, G., F. Musmeci and A. Zollo, 1988, A linear and B.L. Bender, 1982, Probabilistic estimates of maxi- intensity model to investigate the causal relation between mum acceleration and velocity in rock in the contiguous Calabrian and North Aegean earthquake sequences, Jour- United States, USGS Open File Report 82-1033. nal of the Royal Astronomical Society 95, 285-293. Bakun, W.H. and T.V. McEvilly, 1984, Recurrence models Dowrick, D.J., 1991, Damage costs for houses and farms as a and Parkfield, California, earthquakes, Journal of Geo- function of intensity in the 1987 Edgecumbe earthquake, physical Research 89, 3051-3058. Engineering and Structural Dynamics 20, 455-469. Bakun, W.H., J. Bredehoeft, R.O. Burford, W.W.J. Dowrick, D.J. and D.A. Rhoades, 1993, Damage costs for Ellsworth, M.J.S. Johnson, L. Jones, A.G. Lindh, C. commercial and industrial property as a function of intensi- Mortensen, E. Roeloffs, S. Shulz, P. Segall and W. ty in the 1987 Edgecumbe earthquake, Engineering and Thatcher, 1986, prediction scenarios Structural Dynamics 22, 869-884. and response plans, US Geological Survey Open-File Ellis, S.P., 1985, An optimal statistical decision rule for Report 86-365. calling earthquake alerts, Earthquake Prediction Research Basham, P.W., D.H. Weichert, F.M. Anglia and M.J. Berry, 3, 1-10. 1985, New probabilistic strong seismic motion maps of Evison, F.F., 1977, The precursory , Canada, Bulletin of the Seismological Society of America Physics of the Earth and Planetary Interiors 15, 19-23. 75, 563-595. 