UNDERDETERMINATION AND INDIRECT MEASUREMENT

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF PHILOSOPHY AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Teru Miyake
June 2011

© 2011 by Teru Miyake. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/cs884mb1574

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Michael Friedman, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Helen Longino

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Patrick Suppes

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

George Smith

Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


Abstract

We have been astonishingly successful in gathering knowledge about certain objects or systems to which we seemingly have extremely limited access. Perhaps the most difficult problem in the investigation of such systems is that theories about them are severely underdetermined by the available observations. What are the methods through which these cases of underdetermination are resolved?

I argue in Chapter 1 that these methods are best understood by thinking of scientists as gaining access to the previously inaccessible parts of these systems through a series of indirect measurements. I then discuss two central problems with such indirect measurements, theory mediation and the combining of effects, and ways in which these difficulties can be dealt with.

In Chapter 2, I examine the indirect measurement of planetary distances in the solar system in the sixteenth and seventeenth centuries by Copernicus and Kepler. In this case, there was an underdetermination between three different theories about the motions of the planets, which can be partly resolved by the measurement of distances between the planets. The measurement of these distances was enabled by making certain assumptions about the motions of the planets. I argue that part of the justification for making these assumptions comes from decompositional success in playing off measurements of the earth's orbit and the orbit of Mars against each other.

In Chapter 3, I examine the indirect measurement of mechanical properties such as masses and forces in the solar system by Newton. In this case, there were two underdeterminations: the first between two theories about the true motions of the sun and the earth, and the second between various theories for calculating planetary orbits. Newton resolves these two problems of underdetermination through a research program in which the various sources of force are identified and accounted for. This program crucially requires the third law of motion to apply between celestial objects, an issue for which Newton was criticized by his contemporaries. I examine the justification for the application of the third law of motion through its successful use for the decomposition of forces in the solar system in a long-term research program. I further discuss comments by Kant on the role of the third law of motion for Newton, in which Kant recognizes its indispensability for a long-term program for determining the center of mass of the solar system and thus defining a reference point relative to which forces can be identified.

Chapter 4 covers the indirect measurement of density in the earth's interior using observations of seismic waves. One of the difficult problems in this case is that the interior density of the earth must be treated as a continuous function of radius; in determining this density function, you are in effect making a measurement of an infinite number of points. The natural question to ask here is how much resolution the observations give you. I will focus on the work of geophysicists who were concerned with this problem, out of which a standard model for the earth's density was developed.


Acknowledgments

I am incredibly lucky to have been able to take two extraordinary seminars in which the seeds for the ideas set forth in this dissertation were sown. The first is a seminar on Newton's Principia that George Smith taught at Tufts University, which I took when I was an MA student. George's unwavering attention to the details that make a difference, his way of identifying and trying to answer truly deep and interesting questions about science, and above all his kindness and dedication to his students, all made a deep impression on me. I sat in on this seminar again when George taught a version of it while visiting Stanford University a few years later. I would like to sit in on it many more times if I could—I'm sure I would get more out of it every time.

The one other seminar that made a similarly deep impression on me was Michael Friedman's seminar on Kant's Metaphysical Foundations of Natural Science that I took at Stanford. I found Michael to be a thinker of a completely different sort from George, but I also saw a very similar uncompromising attitude with regard to the study of Kant and the sciences of his time, and Michael's warm personality made it easy for me to work with him as my advisor at Stanford. George and Michael are a pair of mentors who, each in his own unique way, set the highest standards in their areas of research. I can only hope that my own work will approach those standards someday.


The rest of the dissertation committee is no less distinguished. Pat Suppes is, of course, in a league of his own. When I first talked to Pat, I have to admit that it was with a mixture of awe and apprehension, but I grew to really enjoy walking out to visit him at Ventura Hall. Helen Longino was always very helpful and encouraging, even during a very busy stint as department chair. Tom Ryckman was not an official member of the committee, but he was certainly a committee member in my eyes. I have had countless discussions with him about the topics covered in this dissertation, and he was the most dependable source of advice and support during my years at Stanford.

As I have already mentioned, I got my MA in philosophy at Tufts University, and besides George I would like to thank Dan Dennett, Jody Azzouni, Kathrin Koslicki, David Denby, and the members of my cohort. At Stanford, I would like to thank the following faculty: Brian Skyrms, David Hills, Krista Lawlor, Lanier Anderson, Chris Bobonich, Mark Crimmins, Nadeem Hussain, Marc Pauly, John Perry, and Dagfinn Follesdal. Grad students and visiting scholars who have contributed to the development of the ideas in this dissertation include Quayshawn Spencer, Angela Potochnik, Joel Velasco, Alistair Isaac, Johanna Wolff, Tomohiro Hoshi, Sally Riordan, Ben Wolfson, Dan Halliday, Danny Elstein, Shawn Burns, Micah Lewin, and Samuel Kahn. Part of this dissertation was given as a talk at the UC Irvine LPS department, and I thank the audience for their comments, and Jeff Barrett and Kyle Stanford in particular for their hospitality.

I wrote much of this dissertation at the Max Planck Institute for the History of Science in Berlin, where I was a Predoctoral Fellow. The Max Planck Institute provided a perfect environment for writing this dissertation, and I would especially like to thank Raine Daston and the scholars in Department II. Financial support for the years during which I was working on this dissertation was provided by the Whiting Foundation and the Ric Weiland Fellowship. In addition, I am proud to say that I was the very first Pat Suppes Fellow at Stanford, for which I would like to thank Pat a second time.

I could not have had better preparation for the work I had to do for this dissertation than my undergraduate experience at Caltech. I want to thank all of my friends throughout those four very tough but ultimately rewarding years.

Finally, all the members of my family know that the roots of my philosophical education lie in long arguments over pretty much anything with my twin brother Kay. I would like to thank Dad, Mom, Yochan, June, and Kay for their support.


Table of Contents

Chapter 1: Underdetermination and Indirect Measurement

Chapter 2: Copernicus, Kepler, and Decomposition

Chapter 3: Newton and Kant on the Third Law of Motion

Chapter 4: Underdetermination in the Indirect Measurement of the Density Distribution of the Earth's Interior

Epilogue

Bibliography


Chapter 1: Underdetermination and Indirect Measurement

1 Prelude

Suppose one day archeologists unearth a mysterious artifact—a perfect black cube, 10 centimeters on a side, cool to the touch, made of what looks like the blackest possible steel. They decide, rather unimaginatively, to call the artifact "Cube". It's a mere curiosity at first, but scientists soon find that it has some mystifying features. The material it is made out of is incredibly hard—it cannot be broken, cut, pierced, drilled, or dynamited. It cannot even be scraped in order to take samples of the material. All attempts to take CAT scans or MRI images of the inside of Cube have failed. On one face are several white dots that look as if they are projected onto the face from within Cube. The dots move across the face of Cube, tracing out trajectories over time.

Now suppose we are scientists trying to figure out what is going on inside Cube. We will find, unfortunately, that our options are severely limited, since we have found no way of accessing the interior of Cube. What do we do? Perhaps the only thing to do is simply to assume that there are certain lawful connections between the internal and external states of Cube, that is, that the dynamics of the external states depends somehow upon the dynamics of the internal states. We then make hypotheses about (a) the dynamics of the internal states of Cube, and (b) the laws that connect the internal states to the external states of Cube. From these hypotheses, we deduce predictions about the dynamics of the external states. If those predictions match our observations of the external states, we say that those hypotheses have been confirmed. This method, the hypothetico-deductive method, was described by Pierre Duhem in The Aim and Structure of Physical Theory (1954) as the method of physics, and it has been widely adopted by philosophers, most notably Quine.

There is a problem with this method, though, as Duhem recognized. Since we have no antecedent knowledge whatsoever about the internal states of Cube, there is enormous leeway in the hypotheses we can come up with. For any given dynamics of the external states of Cube, there will be many different sets of hypotheses that are consistent with those dynamics. In philosophical parlance, our theory of the internal states of Cube is massively underdetermined by our observation of the external states.

Because of this underdetermination, the mere agreement of predictions about the dynamics of the external states of Cube with actual observations gives us little reason to think that the hypotheses from which those predictions were deduced have, in any way, characterized the true internal states of Cube. Faced with this predicament, we might give up on the idea that we can gain any knowledge at all about the internal states of Cube, and instead become instrumentalists. We change our aim to simply predicting the dynamics of the external states of Cube without making any claim to having any knowledge about the internal states.


2 Resolving underdetermination

According to one way of thinking about the methodology of planetary astronomy in the sixteenth century, planetary astronomers were in a position very much like that of the scientists studying Cube. All of our knowledge about the solar system came from the observation of the motions of the planets as they moved across the night sky. More specifically, we can think of ourselves as being located inside an immense, hollow, black sphere, on the inner surface of which the constellations are painted. We can then determine the positions of the planets on this sphere, as seen from the earth, and thus record their apparent motions over time. We cannot, however, know how far away a planet is from us merely by looking at it. So we are, in effect, looking at the two-dimensional projection, onto the celestial sphere, of the actual three-dimensional motions of the planets through space. Moreover, although this was not known for certain in the sixteenth century, we are observing these motions from a platform, the earth, that is itself moving.

Drawing out the analogy with the story of Cube, we can think of the apparent motions of the planets as corresponding to the external states of Cube, while the actual three-dimensional motions correspond to the internal states. Like the scientists studying Cube, astronomers in the sixteenth century faced a problem of radical underdetermination. Famously, the apparent motions of the planets across the night sky were compatible with three different theories of the actual motions of the planets—the Ptolemaic, the Copernican, and the Tychonic theories[1]—in which the actual three-dimensional motions of the planets are radically different from each other. This is a classic situation of underdetermination. There were three radically different theories that could all be made to fit the observations then available to about the same degree of precision. At the end of the sixteenth century, some astronomers, such as Kepler's contemporary Ursus, came to conclusions similar to those I discussed above about Cube.[2] They decided that the aim of planetary astronomy should not be to acquire knowledge about the actual motions of the planets at all. Instead, the aim of planetary astronomy should simply be to provide a convenient way of calculating the apparent motions of the planets.

[1] I will describe these theories in more detail in chapter 2.
[2] I will discuss Ursus in chapter 2.

How was this state of underdetermination eventually resolved? Well, suppose the method of astronomy is, as with Cube, hypothetico-deductive. You make hypotheses, deduce the observable consequences of these hypotheses, and then you compare these consequences with actual observations. Since the problem is that there were three theories that could fit the observations to the same degree of precision, we might think that one way of resolving the underdetermination is through increasing the precision of the actual observations. As we shall see in chapter 2, however, Johannes Kepler shows in the Astronomia Nova that, with minor modifications, the Ptolemaic, Copernican, and Tychonic systems can be made to give exactly the same predictions for the apparent two-dimensional motions of the planets—they can be made empirically equivalent. Thus, a mere increase in the precision of the observations of the apparent motions could not resolve the underdetermination. What actually happened is that Galileo turned his telescope to the skies and observed, in 1610, that Venus has phases, just like the moon. This observation is inconsistent with the Ptolemaic theory, so it was eliminated from contention.[3] A new kind of technology, the telescope, allowed us to bring a new kind of evidence to bear on the question of what the actual motions of the planets are.

[3] It was not until Newton that the Tychonic theory was conclusively laid to rest, as we will see in chapter 3.

I think, however, that there is a third way in which the underdetermination could have been resolved. In fact, Kepler had a good argument, prior to 1610, that the Ptolemaic theory is not the correct theory of planetary motion. I have just said that Kepler showed all three theories of the planetary motions could be made empirically equivalent to each other, and so could not be distinguished on the basis of observations of the apparent two-dimensional motions of the planets. We might note, however, that the three theories predict very different motions for the planets through three-dimensional space. If we could somehow measure the actual distances between the planets with confidence, we could eliminate one or more of the theories. As I said, we cannot get planetary distances simply by direct observation of the two-dimensional motions, but they can be inferred from these two-dimensional motions by indirect measurement.

3 Indirect measurement

So we might be able to resolve underdetermination in some cases by using indirect measurement. As we shall see, however, there is a problem. In order to carry out indirect measurement, you have to presuppose certain facts about the system you are investigating. The central question of this dissertation will be: How can we know with confidence that indirect measurements are correct or approximately correct, given that we must presuppose certain facts about the system? Let me sharpen this question further by explaining what I mean by an indirect measurement, and giving some idea of what the assumptions are that you have to make about the system.

Suppose there is a complicated, partially inaccessible system that I want to acquire knowledge about. A complicated system is one that consists of many parts, those parts having various properties and relations with each other. I say an object is partially inaccessible if we can confidently measure only a proper subset of the properties of, and relations between, the parts of that object. I call the properties that we can confidently measure the accessible properties. I will also sometimes speak of accessible parts, by which I simply mean the parts of the system that have properties that we can confidently measure. In order to determine the properties and relations of the inaccessible parts, we must make inferences based upon what we know about the accessible parts. Indirect measurement, then, is the measurement of inaccessible properties or relations of a complicated, partially inaccessible system, through inference based upon observations of the accessible properties.

We can think of the solar system, as viewed by astronomers in the sixteenth century, as a complicated, partially inaccessible system. It is complicated because it consists of many parts, namely the planets, the sun, and the moon, each having properties such as mass and size, and distance relations between them. It is partially inaccessible because we have access to the two-dimensional motions of the planets, but we do not have access to distances in three-dimensional space. So the measurement of planetary distances based upon observations of the apparent two-dimensional motions of the planets is indirect measurement.


Now let us go back to the question I asked a few paragraphs back. Could we have used the observations of the apparent two-dimensional motions of the planets to break out of the state of underdetermination prior to 1610? The answer to this question depends on whether we could have made indirect measurements of planetary distances with confidence prior to 1610. I think we could, as I will argue in chapter 2. But here, I simply want to examine what might make us lack confidence in indirect measurements.

Before I go on with my discussion of indirect measurement, I want to distinguish indirect measurement from a somewhat similar kind of problem. Suppose there is a system that is partially inaccessible but not complicated. For example, say we have found a huge underground lake, and we want to know the mineral content in the various parts of the lake, but we only have access to parts of it. We might then take samples of the water from the parts we can access, measure the mineral content in these samples, and then extrapolate to the entire lake. We are making the assumption here, of course, that the mineral content in the parts of the lake that are inaccessible to us is going to be similar to the mineral content in the parts that are accessible. If this assumption turns out to be wrong, we will be wrong about the mineral content in the inaccessible parts.

There can be interesting epistemological problems with this kind of extrapolation, but it will not be a central topic of this dissertation. I will stick to complicated systems, for which I believe there are particular problems and ways of dealing with these problems.

I will now explain what I take to be the central problems with indirect measurement. First, note that we can be very confident about the results of some indirect measurements. I do not have direct access to the amount of electric current flowing through a wire, but I can have great confidence in the value I measure using a galvanometer. At least part of the reason for this confidence has to do with what I call antecedent familiarity. If an object is of a type that is familiar to me, I can safely assume certain facts about that object. I know that if I drop a shot put from a height of 10 meters, it will reliably hit the ground in approximately 1.4 seconds, barring any extraordinary circumstances. I know this because I know that objects like shot puts fall with a uniform acceleration of approximately 9.8 m/s² at the surface of the Earth. There have been some cases in the history of science, however, where we have wanted to know facts about an object that is utterly unlike anything else we knew of at the time. The solar system is a good example of such an object. For all astronomers knew in the sixteenth century, the solar system could have been radically different from anything else they knew of, so it was hard to know what would count as a reasonable assumption about it.
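As a quick check of that figure, the standard free-fall relation (ordinary kinematics, not anything specific to this dissertation) gives

t = \sqrt{\frac{2h}{g}} = \sqrt{\frac{2 \times 10\,\mathrm{m}}{9.8\,\mathrm{m/s^2}}} \approx 1.43\,\mathrm{s},

which agrees with the approximately 1.4 seconds cited above.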

I think there are two main difficulties when carrying out an indirect measurement that would make us lack confidence in such a measurement, particularly if the system we are making the measurement on is antecedently unfamiliar. The first difficulty is theory-mediation. You have to make measurements of the inaccessible properties based upon observations of the properties that are accessible. In order to make such measurements, you need to presuppose that a particular relation applies between the accessible properties and the inaccessible properties. If the relation you use to make the measurement is not known antecedently, then the question naturally arises as to how you can know that the measurement is correct.

The second difficulty is the combining of effects. Again, the root of this difficulty is that you have to make measurements of inaccessible properties based upon observation of accessible properties. Suppose the system you are making a measurement upon is complicated. If so, there could be more than one part of the system that has an effect on the accessible parts. If you want to measure a property of one of those parts, you might have to separate out, or decompose, the effects of the various parts on the accessible part. If you do not antecedently know the composition of the system, however, you might not know exactly how to carry out such a decomposition. If so, you might not be confident that the measurement you make using such a decomposition is correct.

I will discuss these difficulties in more detail in the following sections of this chapter, but now let me return to the notion of underdetermination. Suppose that there is a system that we are interested in acquiring knowledge about, but there are two or more theories that can account for all observations equally well. As I mentioned, there are a couple of ways in which we might try to resolve this situation of underdetermination. One way is simply to improve on the observations we already have, by increasing the precision of these observations. The other way is to come up with an entirely new set of observations, like Galileo observing the phases of Venus.

What I am arguing in this dissertation is that there is a third way to resolve the underdetermination. This is to make indirect measurements by inference from the observations that are available to us. In order to make these indirect measurements, however, we must make certain assumptions about what the system is like. Because of the problem of theory-mediation, you have to make assumptions about the relation between the inaccessible properties and the accessible properties of the system. Because of the problem of combining of effects, you have to make assumptions about the composition of the system, that is, the relation between the parts of the system. Since these assumptions enable indirect measurements to be made, I will sometimes refer to them as enabling assumptions.

So the now sharpened-up central question of this dissertation is the following: Given that, in order to carry out an indirect measurement, you must make inferences from the accessible properties of a system to the inaccessible properties, and that in order to make these inferences, you need to make the assumptions that (1) certain relations between accessible and inaccessible properties apply, and (2) effects from various inaccessible parts on the accessible parts can be decomposed in a certain way, how do you ensure that the indirect measurement you made is correct, or approximately correct? I will lay out a preliminary answer to this question in the rest of this chapter.

4 Theory mediation

If I want to find out how wide my window is, I simply take out a tape measure and measure it. Sometimes, however, I do not have the right kind of access to an object on which I want to make a measurement. As I write this, the Tokyo Skytree, which will become the tallest freestanding structure in Japan when completed, is being built. Suppose I want to figure out how tall it is at this point during its construction. I could not very well take out a tape measure to measure its height. Instead, I might improvise a device with which I measure the angle from the horizon to the top of the Skytree. I then find out the distance from my position to the Skytree construction site. Simple geometry tells me that the height of the Skytree should then be approximately this distance times the tangent of the angle I measured (or, when the angle is small, approximately its sine). With the help of geometry, I have made a measurement of something that is physically inaccessible to me.
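As a minimal illustration of this triangulation, here is the calculation in code; the distance and the angle are invented numbers, not actual measurements of the Skytree:

    import math

    # Estimate the height of a distant tower from (1) the distance to its
    # base and (2) the measured elevation angle from the horizon to its top.
    distance_m = 2500.0      # distance to the construction site (invented)
    elevation_deg = 9.0      # measured angle above the horizon (invented)

    height_m = distance_m * math.tan(math.radians(elevation_deg))
    print(f"estimated height: {height_m:.0f} m")  # about 396 m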

In a particularly philosophical moment, I might realize that I have made the assumption here that the Skytree is the kind of thing to which Euclidean geometry applies. We would never call this assumption into question in our day-to-day dealings. But what if, instead of the Skytree, I were trying to calculate distances to something that is utterly unfamiliar to me? Astronomers in the sixteenth century, for example, used geometry in determining the orbits of the planets. If they had known of other geometries, they might well have raised the question of whether Euclidean geometry really applies to the planets. After all, those planets were known to be unimaginably distant, and nobody had the faintest clue what kind of material they could be made out of. Why should we believe Euclidean geometry applies to them?

For almost all practical purposes, when we make such a measurement, we are on safe ground assuming that mathematics and geometry will apply to the objects that we are investigating. But sometimes, in order to make a measurement, we need to assume more than mathematics and geometry. Sometimes we have to assume that a system on which we are trying to make a measurement has certain physical properties, and behaves in accordance with certain mathematical relations. Because I make use of a bit of physical theory in order to make this kind of measurement, I say that such measurements are theory-mediated.

Now, when we make measurements using bits of physical theory, the way in which the theory is used in the measurement can be surprisingly complicated. For example, consider the problem of trying to measure the muzzle velocity of a cannon.


One way we might make this measurement is to fire the cannon and measure how far the cannonballs fly. The following equation allows you to calculate, given the angle θ at which a cannon is fired, the muzzle velocity v, and the gravitational acceleration g at the surface of the Earth, the horizontal distance D at which a cannonball lands:

D = 2v² cos θ sin θ / g.    (1)

This equation assumes no air resistance, a perfectly flat Earth, and a constant acceleration due to gravity. In order to calculate the distance D, all you need to do is plug in the values of the muzzle velocity and the angle of the cannon.
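In code, the forward calculation is a one-liner. A minimal sketch (the function name and the values are mine, for illustration):

    import math

    def cannonball_range(v, theta_deg, g=9.8):
        # Equation 1: D = 2 v^2 cos(theta) sin(theta) / g
        theta = math.radians(theta_deg)
        return 2 * v**2 * math.cos(theta) * math.sin(theta) / g

    print(cannonball_range(100.0, 45.0))  # about 1020 m for v = 100 m/s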

Now, suppose we want to determine the muzzle velocity of a particular cannon, but we do not have any means of directly measuring the velocity of the cannonballs as they come shooting out of the muzzle. There is a way of using the equation given above for making a measurement of this muzzle velocity. We can think of this method as a way of measuring a property of something that is not directly accessible, much like our determination of the height of the Tokyo Skytree.

First, we fire the cannon several times, at a predetermined angle, and measure the distances at which the cannonballs land. We then might guess various values of v, for which we calculate the distances D at which we predict the cannonball ought to land. We take the value of v that gives us a predicted value for D that is nearest to the actually observed values. Then we might refine our value of v further by taking a cluster of values around this best value for v, and calculating the distances at which we predict the cannonball ought to land given these values for v. We then compare these distances with the distances we have actually measured, and take the value of v whose predicted distances are closest to the measured ones. We can keep repeating this until we home in on a value for v. Using this procedure, we hopefully will have measured the muzzle velocity.
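Here is a sketch of the homing procedure just described: scan a grid of guesses for v, keep the best, and narrow the search interval around it. Everything here (names, ranges, the observed distance) is illustrative:

    import math

    def cannonball_range(v, theta_deg, g=9.8):
        # Equation 1: D = 2 v^2 cos(theta) sin(theta) / g
        theta = math.radians(theta_deg)
        return 2 * v**2 * math.cos(theta) * math.sin(theta) / g

    def estimate_muzzle_velocity(observed_d, theta_deg, lo=1.0, hi=2000.0, rounds=20):
        # Repeatedly pick the guess whose predicted distance best fits the
        # observed distance, then shrink the interval around that guess.
        for _ in range(rounds):
            guesses = [lo + (hi - lo) * i / 10 for i in range(11)]
            best = min(guesses,
                       key=lambda v: abs(cannonball_range(v, theta_deg) - observed_d))
            step = (hi - lo) / 10
            lo, hi = best - step, best + step
        return best

    # Average of the measured landing distances, fired at 30 degrees:
    print(estimate_muzzle_velocity(883.7, 30.0))  # close to 100 m/s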

Note that this procedure involves using a mathematical equation where v and θ are independent variables, and D is a dependent variable. If the aim were to determine D given values for v and θ, one could simply plug in the values and use the equation to calculate D. In this case, however, we are using measured values of D in order to try to determine the value of v—that is, we are trying to determine the value of an independent variable using measured values for the dependent variable. The way in which we do this is to vary the value of v until we find one that fits the value of D that we have observed.

Often, the independent variables such as v are called parameters, and this kind of problem is called a parameter estimation problem, or a bit more colloquially, curve-fitting. This kind of problem is also often called an inverse problem, particularly in cases where instead of trying to estimate discrete parameters, you are trying to estimate a continuous function.

Suppose we take the mathematical equation to be correct, and that θ is known. Then the logical relation between v and D is in the form of an if-statement: if v has such-and-such a value, then D has such-and-such a value. Note that this relation does not uniquely determine v given D. What we would really want, to guarantee uniqueness for the value of v, is a logical relation in the form of an if-and-only-if statement.

There is also a further problem having to do with the logic. We used a kind of homing procedure to find the value of v, where we first guess a value and then adjust v until we get a value for D that best fits our measured value. Note that this homing procedure works because we know that D is going to be smooth over small variations in v. But if the equation we were using were such that the dependent variable is sensitive to small fluctuations in the independent variables, we would not be able to use such a homing procedure. For the homing procedure to work, the logic has to be of the form if v has very nearly such-and-such a value, then D has very nearly such-and-such a value. In some cases of indirect measurement, the use of this "very nearly" relation is crucial, as we shall see in chapter 3.

In some cases, due to the mathematical relation between the independent and dependent variables, there are problems having to do with the nonuniqueness of solutions. Methods for addressing these nonuniqueness problems have recently become important in geophysics, computer imaging, and other fields, under the rubric of "inverse problem theory". I will postpone discussion of this problem until chapter 4.

5 When a measurement is theory-mediated, how do we know it’s correct?

As we did with the measurement of the height of the Tokyo Skytree, we might think about the assumptions we are making when we carry out this measurement. How do we know that these assumptions will result in correct measurements? For example, what needs to be the case in order for us to come up with the correct value for v, the muzzle velocity of the cannon?

Our initial impulse might be to say that the equation we are using, and the assumptions we are making about this system, must be true of the system. But we should immediately realize that the equation we are using, and the assumptions we are making about the system, such as no air resistance and a constant acceleration due to gravity, are, strictly speaking, false with respect to this system. Now, one might think that we ought to try to make the measurement procedure as realistic as possible, by including as many details as we can. We could, for example, try to include air resistance, include known details of the terrain, even allow for things like wind and atmospheric pressure. The problem is that, in many cases, adding too many details to the measurement procedure complicates the procedure enormously, and in some cases makes the determination of a value impossible.

On the other hand, we would be in trouble if the assumptions we make are too unrealistic. In that case, we could perhaps carry out the measurement procedure and determine values for the properties of the system. But if the assumptions we make are too unrealistic, the values we calculate would give us properties of some imaginary cannon, not the real cannon we are interested in. There is a tradeoff here. If the assumptions we use are too unrealistic, then we would get the wrong answer for our measurement. But if we are too realistic, then we won't be able to carry out the measurement procedure. The trick is to find assumptions that are realistic enough so that they will let us calculate a value for the muzzle velocity that is close enough, for our purposes, to the correct value for the real cannon.

How, then, do we know that we are making the right assumptions, and using the right equation, to calculate the correct value for v? With regard to the cannon example, the answer to this question is ultimately going to be an appeal to our everyday experience, and our experience with cannons in particular (hopefully, we are experienced artillery engineers). Our familiarity with the type of thing that cannons are, and the conditions under which they are fired, allows us to justify the assumptions we make about the system.


There was also the further problem of the logic of the relation between v and D. Even if I have found a value for v that is consistent with the value for D that I have measured, the logic does not guarantee that the value for v that I found is unique. Here again, though, we make the assumption that the value for v is unique because of our familiarity with the situation. We know that, given a constant value for θ, and the conditions under which the cannon is fired, Equation 1 ought to apply at least approximately, and there should be a unique positive value of v for each value of D.
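To see why, note that Equation 1 can be inverted in closed form (a simple rearrangement, not an addition to the argument):

v = \sqrt{\frac{gD}{2\cos\theta\sin\theta}} = \sqrt{\frac{gD}{\sin 2\theta}},

which yields exactly one positive v for each D > 0, so long as θ is fixed strictly between 0 and 90 degrees.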

In this example, there is a part of the system to which we do not have direct access—we cannot directly measure the muzzle velocity of the cannon. In order to make an indirect measurement of this muzzle velocity, we must make a large number of assumptions about the cannon. Fortunately, the cannon is a type of system that is familiar to us, so we can have confidence in the assumptions we make. We might say that the muzzle velocity of the cannon is an inaccessible property of a familiar system, and our familiarity with systems of this type allows us to set up a procedure through which we can measure this inaccessible property.

What do we do, though, if we want to measure inaccessible properties of unfamiliar systems? Let us hold that thought until after I discuss the second of the two main difficulties of indirect measurement, the combining of effects.

6 Representing partially inaccessible systems

Before I discuss the combining of effects, however, I first want to introduce the following way of representing partially inaccessible systems. This will facilitate the discussion by giving us an intuitive grasp of what is going on in cases of the combining of effects.

Figure 1. [Diagram: two nodes, X and Y, with an arrow labeled a from X to Y.]

We might represent our measurement of the muzzle velocity of the cannon as in Figure 1. This diagram is in the form of a directed graph.[4] The reason it is a directed graph should become clear in the next few sections, but let's just take a look at the figure first. There are two nodes, labeled X and Y. There is an arrow, labeled a, going from X to Y. Here is how to interpret this picture. Y stands for the distance the cannonballs travel, X stands for the muzzle velocity of the cannon, and the arrow a stands for the relation between X and Y, namely Equation 1 given above. The relation a uniquely determines Y, given X. That is, as I have mentioned, it is a logical relation of the form if X = v, then Y = w. We have access to Y, that is, we have the means for confidently measuring its value. What we want is to find the value of X. Ignoring, for now, the difficulties I mentioned involving nonuniqueness, we can say that the value of X can be determined if we know the value of Y, because we know the relation a.

[4] I should say that some inspiration for these diagrams comes from Jim Woodward's work on causation. The idea of these diagrams, however, is not to try to infer causes from observation. In fact, it is almost the opposite—this sort of structure is assumed in order to enable measurements of properties. I was also greatly influenced by George Smith's work, particularly his paper "Closing the Loop", encapsulated in the idea of trying to find the "details that make a difference, and the differences they make".

We can think of the arrow a, in this case, as representing a causal relation. But in other cases, the arrow could stand for other relations. For example, the measurement of the height of the Tokyo Skytree can also be represented by Figure 1. Think of X as standing for the height of the Skytree, and Y as standing for the angle from the top of the Skytree to the horizon and the distance from my position to the Skytree site. We are now interpreting Y as standing for two variables. The arrow a now stands for a geometrical relation between X and Y, which uniquely determines Y, given X. As in the cannon example, we can determine the value of X, given Y, because we know the relation a.

Note, though, that these graphs should not be taken to be faithful representations of these systems. For example, as I mentioned with regard to the cannon example, the relation represented by a, Equation 1, is not actually true of the system. We might further take issue with the structure of the diagram itself. There are factors, such as the wind, that will influence the distance that the cannonball travels. Shouldn't there, then, be other arrows that point towards the node Y? In fact, if we wanted to come up with a complete picture of what is happening with the cannon, we would have to have a very complicated graph, with nodes standing, say, for the wind, details of the terrain, variations in the gravitational constant, Coriolis forces, and so on. As experienced artillery engineers, however, we might decide not to consider any of those things. We assume that those other things will not have much of an effect on the outcome, and we feel entitled to this assumption in virtue of our experience as artillery engineers. In this case, this very simple picture involving just X, Y, and a is sufficient for us to get a reasonably accurate value for X, which is what we wanted. We assume that the relation a holds for this system well enough for us to make this measurement.

One further remark: the diagram looks like the kind of thing that is often called a model, both by scientists and philosophers. Because the word is used for many different kinds of things in the philosophical literature, however, I have thought it best to avoid it. The role of these diagrams is simply to represent the elements that are necessary for the measurement to be carried out, and their relation with each other. I am using them as conceptual tools for thinking about particular cases of measurement, and to facilitate discussion about what is going on in such measurements. It should not be assumed that a scientist carrying out a measurement has such a diagram explicitly in mind.

7 Combining of effects and decomposition

Now the discussion in the previous section raises an obvious question. What if the system I am investigating is more complicated, having various different parts that have significant effects on the accessible parts? This is the problem I discussed earlier in this chapter as the problem of the combining of effects.


Figure 2. [Diagram: three nodes, X, Y, and Z, with an arrow a from X to Z and an arrow b from Y to Z.]

We can now discuss this problem using the diagrams I have just introduced. What if I can't reduce a system to a very simple one like Figure 1, but it is more like Figure 2? In Figure 2, there are now three nodes, X, Y, and Z, and two arrows—one from X to Z, labeled a, and the other from Y to Z, labeled b. We can take a to be a relation that licenses an inference of the following form: given that there are no other factors affecting Z, then if X = v, then Z = w. Similarly, we can take b to be a relation that licenses an inference of the form given that there are no other factors affecting Z, then if Y = v, then Z = w. Now, suppose we have access to Z, and we want to measure either Y or X. Since Z is affected by both Y and X, we need some way of separating out their effects on Z. If we could somehow successfully separate out their effects, we would be able to measure X or Y.

Let me illustrate this situation with the cannon example again. Let Z be the distance the cannonball travels, and let X be the muzzle velocity of the cannon. The arrow a going from X to Z again represents Equation 1. But now we have another factor, represented by Y, that has an effect on the distance the cannonball travels. Say Y is the speed of the headwind or tailwind in the direction the cannonball is shot. Then in order to measure X given observations of Z, we would somehow have to compensate for the effect of Y on Z.

How do we compensate? Perhaps the easiest way to do it is to wait to fire the cannon at times when there is no wind. Since at such times there will be no effect of Y on Z, we can effectively reduce Figure 2 to Figure 1. In this case, we are isolating the effect of X on Z from the effect of Y on Z, in order to measure X. Now, it just happens that in this example, this sort of measurement using isolation can be done. But what if there is never a time when the wind dies down, and there is always a headwind, for example?

More generally, what do you do in a situation like that in Figure 2, where you have access to Z, and you want to measure X, but there is always a significant effect of Y on Z? You would have to find some way to separate out the effects of X and Y on Z. I call the process of separating out the effects decomposition. How might you carry out this decomposition? One way to do it would be to try to model what the effect of Y on Z would be, and then subtract that out in order to measure X. Of course, we are making the assumption here that the effects of X and Y on Z add linearly, which will not always be the case. At this point, however, I don't want to make things too complicated. Let me simply note that we are indeed making this assumption about how the effects add together.
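A toy sketch of this subtract-and-model strategy, under the linear-additivity assumption just flagged (the wind model and all the numbers are invented):

    import math

    def cannonball_range(v, theta_deg, g=9.8):
        # Equation 1: the modeled effect of X (muzzle velocity) on Z
        theta = math.radians(theta_deg)
        return 2 * v**2 * math.cos(theta) * math.sin(theta) / g

    def wind_effect(wind_speed):
        # Invented linear model of the effect of Y (wind) on Z:
        # each m/s of tailwind adds 15 m to the landing distance.
        return 15.0 * wind_speed

    observed_d = 913.7   # measured landing distance (Z)
    wind_speed = 2.0     # measured tailwind in m/s (Y)
    theta = math.radians(30.0)

    # Decomposition: subtract the modeled effect of Y from Z, then invert
    # Equation 1 to measure X.
    d_no_wind = observed_d - wind_effect(wind_speed)
    v = math.sqrt(9.8 * d_no_wind / (2 * math.cos(theta) * math.sin(theta)))
    print(f"estimated muzzle velocity: {v:.1f} m/s")  # about 100 m/s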


Figure 3. [Diagram: an arrow from node X to node Y, and an arrow from node Y to node Z.]

Now, there are other arrangements we can think of as well. In Figure 3, we have an arrow going from X to Y, and an arrow from Y to Z. Suppose we have access to Z, and we want to determine the value of X. In this case, X has a causal effect on Y, and Y has a causal effect on Z, and we need somehow to measure X via its effect on Y.

Another possible arrangement is in Figure 4, where now in addition there are arrows going between X and Y. We might think of this as a case where there is now some kind of causal interaction. I call the various different ways in which we can arrange the arrows and the nodes the relational structure. If all the relations are causal relations, then we can think of it as a kind of causal structure. In all of these cases, if we want to measure X or Y based on our observations of Z, we must somehow separate out the effects of X and Y on Z—that is, we must carry out a decomposition.

Figure 4. [Diagram: as in Figure 2, but with arrows running between X and Y as well.]

All of this might seem complicated, but when we are trying to measure properties of something that is familiar to us, isolating and decomposing the various effects comes rather naturally. For example, suppose I am in a moving car and I have a radar gun with me. I want to measure the speed of an oncoming car. I point the radar gun at the car, then I look at the speed that the radar gun gives, and then I compensate for my own speed by looking at my speedometer and subtracting my own estimated speed. This is a form of decomposition that comes naturally because this is a system that is made up of parts that are familiar to us. Of course, as the directed graph gets more complicated and you have to decompose more effects, measurement can become immensely more difficult.

I think decomposition is an aspect of indirect measurement, and of scientific methodology in general, that has been overlooked by philosophers. In individual cases, scientists are certainly aware of the difficulties involved with separating out various effects when carrying out measurements. But there has been very little philosophical literature on the problems of decomposition.[5]

[5] A few philosophers who have addressed this problem or related problems are George Smith (2002a, 2002b), Hasok Chang (2004), and William Wimsatt (2007).

8 Antecedently unfamiliar systems

Up to this point, we have been talking about the measurement of inaccessible properties of familiar systems. What if, instead of a familiar system such as a cannon, I were trying to make a measurement of an inaccessible property of a system that is antecedently unfamiliar to me? Is this even possible? Don't we have to know certain things about the system antecedently in order to measure such inaccessible properties? In the case of the cannon, we have to know the laws of physics, facts about the environment of the cannon such as properties of the air and terrain, and less quantifiable facts about cannons in general—how they are manufactured, how they are fired, and so on. How could we possibly set up a measurement of an inaccessible property of an antecedently unfamiliar system?

History seems to show, however, that successful measurements have been made of inaccessible properties of antecedently unfamiliar systems. Consider planetary astronomy again. The solar system—not to be confused with the traces we observe of planets across the night sky, but the planetary system itself—was surely about as inaccessible and antecedently unfamiliar as a system could be. We might now laugh at the idea that the planets are carried around the heavens in crystalline spheres, but the solar system was utterly unlike anything astronomers at the time knew about. There was simply no way to know in advance what a reasonable assumption about the planets would be.

Yet, as I show in chapters 2 and 3, the work of Kepler and Newton provides examples of how measurements of antecedently unfamiliar systems can be carried out successfully.

Let us think carefully about what makes the measurement of inaccessible properties of antecedently unfamiliar systems difficult. As I have discussed earlier, there are two basic problems—theory-mediation and the combining of effects. First, to illustrate the problem of theory-mediation, let us return to the cannonball example.

Recall that the diagram for that example is given in Figure 1. There are two nodes, X and Y, with an arrow, a, pointing from X to Y. Now, suppose we didn't know the laws of physics, so we couldn't derive the relation, Equation 1, which relates X to Y and thereby allows us to measure X by observing Y. If we have access only to Y, we will not be able to measure X without knowing this equation. Let me represent this situation in Figure 5. I have X and Y, but now only a dotted arrow from X to Y, with a question mark next to it. This is an indication that we think there is a relation between X and Y, but we don't know exactly what it is.

Figure 5. [Diagram: nodes X and Y, with only a dotted arrow from X to Y, marked with a question mark.]


How would we measure X in this case? One thing we might think of doing is simply guessing the relation. But how could we be at all sure that we have measured X correctly, using a guessed relation? If I were really a cannon maker, here's what I would think of doing. I would try to build something like the cannon, a device that launches a heavy object like a cannonball but for which I know the initial velocity—perhaps a catapult of some kind. By launching the object at different velocities, I might find some kind of relation between the initial velocity and the distance traveled. Then, by induction, I assume that the same relation holds for the cannon. Since I now have a relation between X and Y, I can make the measurement. So in this case I do not have to derive something like Equation 1 from fundamental theory—I can determine it empirically. Still, one might ask whether the inductive move is justified—how do I know that the relation I found for the catapult applies to cannons as well? Let us hold this thought for a while.
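A sketch of this calibrate-then-transfer move. The catapult data, the assumed form D = c v², and the least-squares fit are all invented for illustration:

    import math

    # Catapult trials: (launch velocity in m/s, measured distance in m).
    trials = [(20.0, 35.1), (30.0, 79.8), (40.0, 141.9), (50.0, 220.6)]

    # Fit the one-parameter relation D = c * v^2 by least squares:
    # minimizing sum (D - c v^2)^2 gives c = sum(v^2 D) / sum(v^4).
    c = sum(v**2 * d for v, d in trials) / sum(v**4 for v, _ in trials)

    # Inductive step: assume the same relation holds for the cannon,
    # then invert it to measure the cannon's muzzle velocity.
    cannon_distance = 883.7
    muzzle_v = math.sqrt(cannon_distance / c)
    print(f"c = {c:.4f}, estimated muzzle velocity = {muzzle_v:.0f} m/s")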

Figure 6. [Diagram: nodes X and Y with an arrow a from X to Y, plus dotted arrows with question marks pointing towards Y.]

Now let us think about the problem of the combining of effects. Think once again about the cannonball example. Suppose we do know Equation 1, so we know of the relation a relating X to Y. But perhaps we are inexperienced as artillery engineers. We don't know whether there could be other influences on the distance traveled, such as the wind. Without knowing whether there could be such other influences, we would not be able to measure X with confidence. Let me represent this situation in Figure 6. I have X and Y, and an arrow going from X to Y as in Figure 1, but now I have a couple of dotted arrows going towards Y with question marks beside them, indicating possible effects on Y. Now, again, if we were really cannon makers, there would be ways of determining whether, say, wind is a factor. We could, for example, fire the cannon using the same amount of powder under various conditions of wind to make sure that the distance the cannonball travels is not affected too much by the wind. But, of course, there could be further unforeseen conditions that affect the distance the cannonball flies. Without being able to anticipate such unforeseen conditions, we have no way of correcting for them.

9 Indirect measurement and evidence

Let me now return to what I said is the central question of this dissertation: Given that, in order to carry out an indirect measurement, you must make inferences from the accessible properties of a system to the inaccessible properties, and that in order to make these inferences, you need to make assumptions that (1) certain relations between accessible and inaccessible properties apply, and (2) effects from various inaccessible parts on the accessible parts can be decomposed in a certain way, how do you ensure that the indirect measurement that you made is correct, or approximately correct?

If the system we are making an indirect measurement on is antecedently familiar, we can often give plausibility arguments for assumptions (1) and (2). For example, going back to the cannon example again, we take it as given that the laws of physics apply to cannonballs, and that under the right conditions, Equation 1 will apply. And I can give an argument based on past experience to say that the actual conditions are indeed close enough to those conditions for us to be able to apply Equation 1 to this particular situation—that, say, the wind is not going to be a factor. But what do we do if the system is antecedently unfamiliar?

If we look at cases from the history of science, there is not a simple answer, because the situations tend to be very complicated. Even in cases where the system you are investigating is antecedently unfamiliar, you can give plausibility arguments for the assumptions. For example, as we shall see in chapter 3, Newton referred to experiments done in his laboratory to justify the applicability of the laws of motion in the Principia. This is a reasonable assumption to make as a working hypothesis, but it could not have been known at the time that the laws are in fact applicable to celestial objects. Plausibility arguments are much weaker without the weight of experience behind them.

I think there is a different way of gaining confidence that an indirect measurement is correct or approximately correct, which does not involve trying to come up with a straight justification for the assumptions (1) and (2): let the indirect measurements themselves be evidence that the assumptions were correct.


Figure 7. [Diagram: node X with two arrows out from it, a to node Y and b to node Z.]

I think there are at least two strategies through which this can be done. The first strategy is converging measurement.[6] Suppose there is some system that we can represent by Figure 7. There is a node X with two arrows out from it, arrow a to node Y, and arrow b to node Z. Suppose both Y and Z are accessible properties, that is, we have a way of measuring their values confidently. Suppose we don't have too much confidence in the relations a and b. In this situation, there are two different ways of measuring X: through observation of Y using relation a, and through observation of Z using relation b. If we carry out both measurements, and we get approximately the same result, that is, they converge, then this is good reason to think that the measurements are good, and that the measurement of X is correct. We can, of course, have more than two such converging measurements. The more the results converge, the better reason we have to believe that the measurement of X is indeed correct. Note, however, that we can get converging results even if the relations a and b are not strictly true of the system—it could be the case that, say, relation a simply holds to a good approximation under the circumstances of the measurement. It turns out that we have more reason to believe that the measurement itself is correct than the assumptions we made in order to make the measurement.

[6] The term, and the idea, are George Smith's. See (Smith 2002a) and his unpublished manuscript "Closing the Loop".
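A toy numerical sketch of convergence as evidence. The relations a and b, their inverses, and all the numbers are invented; the point is only the agreement check:

    # Two independent routes to the inaccessible X (as in Figure 7).
    def x_from_y(y):
        return y / 3.0      # inverse of assumed relation a: Y = 3 * X

    def x_from_z(z):
        return z - 42.0     # inverse of assumed relation b: Z = X + 42

    y_measured, z_measured = 30.3, 52.2  # confident direct measurements

    x1, x2 = x_from_y(y_measured), x_from_z(z_measured)
    disagreement = abs(x1 - x2) / max(abs(x1), abs(x2))
    print(x1, x2, f"relative disagreement: {disagreement:.1%}")
    # Near-agreement (here about 1%) is evidence for the measured value of
    # X, and derivatively for the applicability of a and b themselves.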

This has implications, by the way, for the way in which we view the "flow" of evidence in science. In Figure 7, I have confidence in my measurements of Y. I have low confidence in the relation a. Since I am using the relation a to measure X, one might think that I should have low confidence in my measurement of X. This would indeed be the case if I only measured X one way, but if I also measure X through the other relation b, and the results converge, then this will increase my confidence in X even if I have low confidence in b. In fact, this might be reason to raise my confidence in the applicability of the relations a and b. To put it in a loose but picturesque way, evidential power does not flow monotonically from Y and Z towards X. Rather, under certain circumstances such as converging measurements, X can be a new source of evidence, and the evidential power can actually "flow outward" from X. Of course, we have to be careful about what such converging measurements actually show about the relations a and b. The conclusion we can draw from such convergent measurement is that the relations a and b are applicable under the conditions of the measurements, but we would not know whether they would be applicable in other conditions.

There are other strategies besides converging measurement by which we can let the indirect measurements themselves be evidence that the assumptions were correct. They involve more complicated relational structures. The following strategy is what I call decompositional success. For example, take a look again at Figure 2. Here, the accessible property Z is affected by both the inaccessible property X, via the relation a, and the inaccessible property Y, via the relation b. Suppose we don't have too much confidence in the relations a or b, and we want to measure X. We might first try guessing the effect of Y on Z, subtracting that effect out, and then measuring X using the relation a. We now have a way of modeling the effect of X on Z using the relation a. Now subtracting that effect out, we measure the value of Y using the relation b. Using this new value of Y, we model the effect of Y on Z. We subtract out that effect and measure a more refined value for X. Using this new, refined value for X, we model the effect on Z, and we now come up with a new, refined measurement for Y.

If my measurements of X and Y seem to be converging on certain values, then this is good evidence that this relational structure is approximately correct and the relations a and b are also at least approximately applicable. Why? Suppose the relation a is not approximately applicable. Then when we model the effect of X on Z and subtract out this modeled effect in order to measure Y, we do not expect to get a good value for Y. Then, when we model the effect of Y and subtract it out to measure X, we should expect this measurement not to give a good value for X, and thus it should not agree with the previous value for X. Thus, if the sequence of values for X is converging, this is evidence that the values for X and Y are correct. To put it loosely, we are "playing the measurements of X and Y off of each other"—the measurement of X presupposes that the measurement of Y is approximately correct, and the measurement of Y presupposes that the measurement of X is correct. If either one is not approximately correct, then in all probability the procedure will not work.
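
Here is a toy numerical sketch of this back-and-forth (the model Z(t) = X sin t + Y cos t and all numbers are invented for illustration):

    # Toy illustration of decompositional success: the accessible Z combines
    # two inaccessible properties X and Y through two known-in-form relations.
    import math

    X_true, Y_true = 3.0, -1.5
    ts = [0.1 * k for k in range(100)]
    Z = [X_true * math.sin(t) + Y_true * math.cos(t) for t in ts]

    def fit(basis, target):
        # least-squares coefficient c for target ~ c * basis
        return sum(b * z for b, z in zip(basis, target)) / sum(b * b for b in basis)

    X_est, Y_est = 0.0, 0.0   # crude initial guesses
    for step in range(5):
        # subtract the modeled effect of Y, then measure X via its relation
        resid = [z - Y_est * math.cos(t) for z, t in zip(Z, ts)]
        X_est = fit([math.sin(t) for t in ts], resid)
        # subtract the modeled effect of X, then measure Y via its relation
        resid = [z - X_est * math.sin(t) for z, t in zip(Z, ts)]
        Y_est = fit([math.cos(t) for t in ts], resid)
        print(step, round(X_est, 4), round(Y_est, 4))
    # The printed sequence converges rapidly on (3.0, -1.5). If either assumed
    # relation had the wrong form, the successive values would not settle down.

The convergence of the successive values is exactly the decompositional success at issue: each refined value of X presupposes the previous value of Y, and vice versa.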

In actuality, these relational structures often turn out to be even more complicated. But it is the very fact that these structures are so complicated that they can, in some cases, confer very high confidence that some indirect measurements are correct. The more complicated a structure, the more ways in which one can play measurements off of one another, or try to measure one property in more than one way.

Now I want to discuss some limitations of these methods. First, as we shall see when we start looking at actual cases of indirect measurement, most indirect measurements are far from easy to do, especially when they involve systems that are partially inaccessible. They often involve observations that are limited and hard to get, and the calculations themselves can be laborious, especially in sciences such as planetary astronomy in the sixteenth and seventeenth centuries. Thus, indirect measurements will often be made with the hope that the assumptions made in carrying them out will be shown, down the road, to be true. We will see in chapter 3, for example, that this is the best way to view what Newton was doing in the Principia.7

7 This is George Smith's view of the methodology of the Principia. This dissertation is largely the result of trying to understand Smith's views of methodology, particularly as they relate to the problem of underdetermination.

The second limitation also has to do with the temporal dimension. These methods all involve comparing the results of different indirect measurements. In most cases, the indirect measurements will be made at different times. If the property you are measuring changes over time, then you will not be able to get converging measurements. Thus, a fundamental presupposition in using these methods is that the property you are measuring will not be changing its value significantly over time—that the value will be stable. This is an issue that I will discuss in more detail in chapter 3.


8 Case studies

Now that I have laid out my general view of indirect measurement, the rest of this dissertation is devoted to case studies of indirect measurement of complicated, partially inaccessible systems. Each case begins with a difficult problem of underdetermination—the available observations are not good enough to uniquely determine the inaccessible properties of the system. Indirect measurement through the use of enabling assumptions will resolve at least part of that underdetermination. I will, for the most part, focus on understanding the justification for the enabling assumptions.

In chapter 2, I examine the indirect measurement of planetary distances in the solar system in the sixteenth and seventeenth centuries by Copernicus and Kepler. In this case, there was an underdetermination between three different theories about the motions of the planets, which can be partly resolved by the measurement of distances between the planets. The measurement of these distances was enabled by making certain assumptions about the motions of the planets. I argue that part of the justification for making these assumptions comes from decompositional success in playing off measurements of the earth's orbit and the Mars orbit against each other.

In chapter 3, I examine the indirect measurement of mechanical properties such as mass and forces in the solar system by Newton. In this case, there were two underdeterminations, the first an underdetermination between two theories about the true motion of the sun and the earth, and the second an underdetermination between various theories for calculating planetary orbits. Newton resolves these two problems of underdetermination through a research program where the various sources of force are identified and accounted for. This program crucially requires the third law of motion to apply between celestial objects, a point on which Newton was criticized. I examine the justification for the application of the third law of motion through its successful use for decomposition of forces in the solar system in a long-term research program. I further discuss comments by Kant on the role of the third law of motion for Newton, in which Kant recognizes its indispensability for a long-term program for determining the center of mass of the solar system and thus defining a reference point relative to which forces can be identified.

Chapter 4 covers the indirect measurement of density in the earth's interior using observations of seismic waves. One of the difficult problems in this case is that the interior density of the earth is a continuous function of radius—in order to determine this function, you are in effect making a measurement of an infinite number of points. The natural question to ask here is how much resolution the observations give you. I will focus on the work of geophysicists who were concerned with this problem, out of which a standard model for the earth's density eventually grew.


-2-

Copernicus, Kepler, and Decomposition

1 Planetary Astronomy

The most difficult problem of planetary astronomy in the sixteenth century was that the observed two-dimensional motions of the planets across the night sky are consistent with three different theories of the actual three-dimensional motions of the planets through space—the Ptolemaic theory, the Copernican theory, and the Tychonic theory. In other words, the theory of the actual motions of the planets was underdetermined by the available observations. In fact, by making minor modifications, you could make the theories empirically indistinguishable from each other, given the kinds of observations that were available at the time. It seemed to some astronomers in the sixteenth century that this underdetermination was unresolvable, and that, in fact, trying to determine the actual motions of the planets should not even be an aim of planetary astronomy.

This problem could be solved, however, if you could find a way of indirectly measuring the distances between the planets, for the motions of the earth, the sun, and the planets through space are different for each of these theories. Copernicus and Kepler both use the method of triangulation to attempt to measure planetary distances—setting up a triangle with the sun, the earth, and a planet at the corners and using geometrical relations to determine distances. In order to carry out this procedure, however, it is very important to know the angles of the triangle accurately. But as I will explain, in order to determine these angles, you must perform what I called a decomposition in chapter 1—you have to separate out the effects due to two different features of the planetary motions. These features are called the first inequality and the second inequality.

Thinking about this problem in terms of the picture of indirect measurement I provided in chapter 1, we can take the solar system to be a complicated, partially inaccessible system, with the apparent motions of the planets being the accessible properties of the system, and the actual three-dimensional motions of the planets being the inaccessible properties. In chapter 1, I explained that in order to carry out an indirect measurement, you need to assume that (1) certain relations between the accessible and the inaccessible properties apply, and (2) effects from various inaccessible parts on the accessible parts can be decomposed in a certain way. The central question was how you ensure that the indirect measurement is correct or approximately correct, given that (1) and (2) are assumptions that might be doubted.

With regard to assumption (1), the fundamental theory from which the relations between the accessible and inaccessible properties—that is, the relations between the apparent motions of the planets and the actual three-dimensional motions of the planets—are derived is Euclidean geometry. That Euclidean geometry is applicable to the planets was never called into question by astronomers—they could not have questioned it, of course, since they did not know of any geometry other than that of Euclid.

Assumption (2), however, involves exactly how you break down the apparent motions of the planets. All astronomers at the time, following Ptolemy, separated out two motions, the first inequality and the second inequality. There were disagreements, however, as to how to characterize each of these motions, and as to what actual motions of the planets the first and second inequality corresponded. Since this separation of motions had to be done in order to determine planetary distances, how could an astronomer know whether a measurement of planetary distances involving such a decomposition was correct?

2 Planetary Astronomy in the sixteenth century

Although a more thorough treatment of planetary astronomy from the mid-sixteenth to the early seventeenth century would certainly require a section on Tycho Brahe, I will focus on the work of Copernicus and Kepler, thinking about it in terms of the framework I described in chapter 1. We have access to the two-dimensional motions of the planets across the night sky, that is, angular distances of the planets relative to the constellations, over time. What we want to know are the actual motions of the planets in three dimensions, that is, relative distances and directions of the planets over time. We will find that the measurement of planetary distances crucially involves separating out two different features of the motions of the planets—the first inequality and the second inequality.

I will explain what the first and second inequalities are shortly, but let us first consider the apparent two-dimensional motions of the planets. We can think of the night sky as a vast, hollow sphere, onto the inner surface of which are painted the stars that are visible from the earth, some forming the familiar shapes of the constellations. The sun appears to make one entire circuit around this sphere every year, and the great circle along which it travels is called the ecliptic. The planets appear to move roughly along the ecliptic, but their motions are somewhat complicated. Movement along the direction of the ecliptic is called longitudinal motion, while movement perpendicular to the ecliptic is called latitudinal motion. Since it is the longitudinal motions that ultimately yield information about planetary distances, I will talk almost exclusively of the longitudinal motions in what follows.

Now, let us consider these longitudinal motions. The motions of the planets are fairly regular, but they display two significant irregularities. One irregularity is the famous retrograde motion. At some points in their journey along the ecliptic, the planets will appear to stop and reverse direction for a while, going the opposite way along the ecliptic. This irregularity was called the second inequality (or the second anomaly) by astronomers from the time of Ptolemy through Kepler. We now know that the second inequality arises because we are viewing the motions of the planets from a platform that is itself moving, namely the earth.

The other irregularity is that the planets appear to speed up and slow down at various points as they travel along the ecliptic. This variation in apparent angular velocity was called the first inequality (or the first anomaly). We now know that this variation occurs for two reasons. About half of the maximum variation in the apparent angular velocity is because the planets actually do speed up and slow down relative to the sun in accordance with Kepler's area rule, while the remaining half comes from the sun not being at the center of the planet's orbit, but at a focus.

Figure 1

Figure 1 (from Swerdlow and Neugebauer 1984, 615) is a representation of the Ptolemaic theory. The earth is labeled O, and the position of a planet is labeled P. In this theory, the second inequality is accounted for by the use of epicycles. The planet moves in an epicycle, which is a circular orbit, while the center of the epicycle itself moves in a circular orbit, called the deferent, around the earth. In Figure 1, the center of the epicycle is labeled C, while the center of the deferent is labeled M. The first inequality is accounted for by offsetting the earth O from the center M of the deferent, and having another point called the equant point, labeled E, located on the opposite side of the center from the earth, at the same distance from the center as the earth. The planet travels at constant angular velocity as seen from the equant point, and thus when seen from the earth it will appear to speed up and slow down at various points along its orbit. Since the equant point does not coincide with the center of the deferent, the planet's actual motion along the deferent will not be uniform circular motion.8

8 See Evans 1984 for an excellent exposition of the role that the equant plays in Ptolemaic astronomy, and why this innovation allowed Ptolemaic astronomy to be so empirically successful.
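
In modern notation (my gloss, not anything in the historical texts), the equant condition says that the center of the epicycle C sweeps out equal angles about E in equal times,

\[
\angle(\text{apsidal line},\, EC)(t) = \omega\,(t - t_0),
\]

so that, seen from the earth O, which lies a distance 2e from E (where e = OM = ME), the same motion appears slowest near apogee and fastest near perigee, and C's linear speed along the deferent must itself vary to keep the sweep about E uniform.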

Ptolemaic astronomy was enormously successful—since its development in the second century, it was not superseded in accuracy for over a thousand years, until the work of Kepler. There were some aspects of Ptolemaic astronomy that were unsatisfactory, however, if one tried to think about how it could be physically implemented. Almost all astronomers before the time of Kepler believed the planets were carried along in their circular orbits by being embedded in rotating crystalline spheres. As I just mentioned, according to Ptolemaic theory, the speed of the planet along the deferent is not uniform—thus if it is being carried along by a crystalline sphere, the sphere must somehow slow down and speed up in such a way that the planet has constant angular velocity as seen from the equant point. It was difficult to see how this speeding up and slowing down could be physically implemented. In response to this difficulty, there was a school of Arabic astronomers connected to the Maragha observatory in modern-day Iran who, in the thirteenth and fourteenth centuries, developed planetary models that used only uniform circular motion, with epicycles accounting for the first inequality.

Famously, Copernicus came up with a theory in which the second inequality is accounted for by putting the sun at the center of the solar system and having the earth go around the sun. The second inequality is then seen to be the effect of observing the planets from a point that is itself moving. Although popular accounts of Copernicus have him rejecting the Ptolemaic theory because of its epicycles, he actually objected to it for the same reason as the Maragha astronomers—because it departed from uniform circular motion (Swerdlow and Neugebauer 1984, 293-294).

In fact, Copernicus accounts for the first inequality using the same principles as the Maragha astronomers did, with an epicycle.9 In order to get the theory to capture the motions that Ptolemy could capture using the equant, this epicyclic theory for the first inequality had to be rather complicated. Figure 2 (from Swerdlow and Neugebauer 1984, 616) shows the Copernican theory for the first inequality.

Figure 2

9 Swerdlow and Neugebauer go so far as to say that Copernicus "can be looked upon as, if not the last, surely the most noted follower of the Maragha school" (295).


3 Triangulation

Since both Copernicus and Kepler use fundamentally the same method to get planetary distances, I will first explain the basic method, so that the explication will be easier when we look specifically at what Kepler and Copernicus do. At root, the method is very simple. First take a look at Figure 3. It shows the sun, the earth, and a planet, surrounded by constellations. The constellations are taken to be fixed permanently in their positions, and thus they provide reference points for recording observations of the planets. From the earth, I can observe the position of the sun S and the planet P along the ecliptic. The position along the ecliptic is called the longitude. The longitude as seen from the earth is called the geocentric longitude.

In Figure 3, it just so happens that the earth, the sun, and the planet are lined up so that the sun and the planet are on exactly opposite sides of the earth. Notice here that when I observe the planet from the earth, I see it along exactly the same line of sight as I would from the sun. When the sun, the earth, and a planet are in this configuration, this is called opposition. Call the longitude as seen from the sun the heliocentric longitude. Then at opposition, the geocentric longitude and the heliocentric longitude coincide.


Figure 3

Now suppose the planet, the earth, and the sun are in the configuration shown in Figure 4. In this configuration, the planet would have a different longitude, that is, it would appear to be moving through different constellations, depending on whether I observe it from the earth or from the sun. We can see that when not at opposition, the geocentric longitude and the heliocentric longitude will be different.

Figure 4


Now suppose when the earth, a planet, and the sun are in the configuration of Figure 4, we want to find the distance from the earth to the planet, the distance EP, as a ratio of the distance from the earth to the sun, the distance ES. Suppose we already have a theory of the motion of the earth around the sun, so that we know, at any given time, the longitude of the earth as seen from the sun. And suppose we have in addition a theory of the motion of the planet P around the sun as well, so we know, at any given time, the heliocentric longitude of the planet P. The theory of the earth's motion will give us the direction of the line ES, while the theory of the motion of P will give us the direction of the line SP. Making one observation from the earth will give us the direction of the line EP, thus allowing us to find all the angles in the triangle EPS. This will then allow us to find, by simple geometry, the ratio of the length of the line EP to the length of the line ES, which is what we wanted. Thus, given that this is the actual configuration of the earth, the sun, and the planet, and that I have the proper theories for the earth's motion and the planet's motion, I can find the distance from the earth to the planet, relative to the size of the earth's orbit.
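
In modern notation (my gloss, not anything in Copernicus or Kepler), this is just the law of sines in the triangle EPS. The angle at E comes from the observation, the angle at S from the difference of the two heliocentric longitudes, and the angle at P from the requirement that the three angles sum to 180 degrees; then, since side EP lies opposite the angle at S and side ES lies opposite the angle at P,

\[
\frac{EP}{ES} \;=\; \frac{\sin \angle ESP}{\sin \angle EPS}.
\]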

4 Copernicus's measurement of planetary distances

We will now move on to the specific method that Copernicus uses to measure distances to the planets, which he does in Book 5 of De Revolutionibus. Since the method he uses is basically the same for all five planets, with minor differences depending upon whether the planet is an inner planet or an outer planet, I will only describe his procedure for one of the planets, Saturn. The basic method is triangulation, just as I described above. We can think of Figure 5 (from Swerdlow and Neugebauer 1984, 635) as a much more detailed and complicated version of Figure 4. Saturn is labeled P, the earth is labeled O, and the sun10 is labeled S. The reason this figure is so much more complicated than Figure 4 is that the theory of Copernicus does not consist of the simple circles I used above. The theory of motion for Saturn involves an epicycle to account for the first inequality, and there are further complications because the sun is not located at the center of the orbit of Saturn. But if we strip away some of these complications, the basic method is the same. The idea is to determine the angles in the triangle formed by the earth, the sun, and Saturn.

Figure 5

10 One detail that I will discuss in a later section is that the sun here is the mean sun, not the true sun.


The first leg of the triangle, the direction of the line from the earth to the sun, is given by the Copernican solar theory, which is really the theory of the earth's motion for Copernicus. The solar theory is worked out in Book 3 of De Revolutionibus. The second leg of the triangle, the direction of the line from the sun to Saturn, is determined by the Copernican theory of the first inequality. As I have mentioned, this involves a rather complicated mechanism involving an eccentric and an epicycle. Thus, the parameters that control the first inequality, namely, the direction of the line of apsides, the eccentricity, and the radius of the epicycle, must be determined from observations. Now, a complication is that the observations to determine the parameters for the first inequality must be made from an earth that is moving. Another way to say this, in keeping with the way I discussed measurement in chapter 1, is that the observations we make from the earth are the combined effect of the first and second inequalities.

But there is a handy way of controlling for the effect of the second inequality. In order to determine the parameters for the first inequality, Copernicus used only observations at opposition. Recall that the effect of the second inequality is due to the motion of the earth for Copernicus, so observations at opposition, when you are in effect seeing the planet as it would be seen from the sun, allow you to isolate the effects of the first inequality from the effects of the second inequality.11 Using three observations of Saturn at opposition, Copernicus used a complicated iterative method, taken from Ptolemy, to determine the parameters of the first inequality. He also checked this measurement against three observations made at opposition by Ptolemy.

11 Note that Ptolemy, too, uses observations at opposition to eliminate the effects of the second inequality, but of course the second inequality is taken to be an effect of motion along an epicycle.

Once the parameters for the first inequality were determined, Copernicus could then determine the heliocentric longitude of Saturn, giving the direction of the line from the sun to Saturn. The third leg of the triangle, the direction of the line from the earth to Saturn, was given by a single observation made by Copernicus when Saturn was not at opposition.

Using his theory, the three observations of Saturn at opposition from Ptolemy, his own three observations of Saturn at opposition, and one observation of Saturn not at opposition, Copernicus finds that the distance of Saturn from the mean sun, if we take the radius of the earth's orbit to be one unit, is 9.70 when it is farthest from the sun and 8.65 when it is closest.12

12 Rosen 1978, 254; Swerdlow and Neugebauer 1984, 335-336. I have converted the units from sexagesimal to decimal.
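
As a quick check (my arithmetic, not in the text), these two extremes correspond to a mean distance and an eccentricity of

\[
a = \frac{9.70 + 8.65}{2} \approx 9.18, \qquad e = \frac{9.70 - 8.65}{9.70 + 8.65} \approx 0.057,
\]

both within a few percent of the modern values for Saturn's orbit.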

Let me now make some comments on the overall procedure. It is actually a very simple trigonometric measurement, but it is complicated by having to account for the first and second inequalities. The Copernican theory takes the second inequality to be the result of the earth's own motion, while the first inequality is taken to be the result of motion on an epicycle and the eccentricity of the planet's orbit.

In terms of the way of viewing measurement that I gave in chapter 1, we have to separate out the effects of these two factors. This is done in two steps in which isolation is carried out. In the first step, the effects due to the first inequality are isolated by using observations at opposition, which eliminate the effect of the second inequality. Then the theory for the first inequality is fitted to these observations by varying the parameters for the first inequality, and finding values for the parameters that are consistent with the observations. The fit of the parameters is checked against Ptolemy's own measurements at opposition. Now, once the first inequality is fitted to the observations, Copernicus uses a single observation when not at opposition. This allows him to use the now-fitted theory for the first inequality to subtract out the effect of the first inequality, and isolate the effect of the second inequality. He can then determine the one parameter for the second inequality that is important for observations of Saturn, the ratio of the Saturn-sun distance to the earth-sun distance.

Note that the equivalent measurement can be done in the Ptolemaic theory. There, the equivalent measurement would be one in which the radius of the deferent of Saturn's orbit is compared to the radius of the epicycle. Of course, there would be differences in calculation due to differences in the theory for the first inequality, but the equivalent calculation can be, and is, done in the Ptolemaic theory. Measurements at opposition would correspond, in the Ptolemaic case, to measurements of the position of Saturn when the actual motion of Saturn coincides with what its motion would be if it did not move along the epicycle and simply moved along the deferent. In other words, measurements at opposition perform the same function for Ptolemy as for Copernicus—they eliminate the effect due to the second inequality.

The one advantage of the Copernican theory is that the second inequality is accounted for by the same motion—that of the earth—for all the planets. This gives the Copernican astronomer a common measure, the radius of the earth's orbit, with which to compare the distances of all the planets. In the Ptolemaic theory, the relative size of the deferent and the epicycle can be measured for each planet, but there is no common measure for the distances of all the planets. Therefore, you get the relative distances of the planets "for free" if you are a Copernican. If you are a Ptolemaic astronomer, there is a way of determining the relative distances to each of the planets—you assume that the spheres that contain the planets are contiguous to each other. This will give planetary distances, but the assumption that the spheres that hold the planets are touching is rather arbitrary.13

13 See Swerdlow and Neugebauer 1984, 58.

I have described the method by which Copernicus did his indirect measurement of the planetary distances. Now, let us think about how confident we could be that these measurements are correct. If we take a look at the procedure, there are a couple of worries. First, the procedure is to fit the parameters of the first inequality theory using three observations at opposition. The fit of the parameters is checked against three further observations at opposition. And then, in order to isolate the effect due to the second inequality, a measurement is made of Saturn when not at opposition, and the first inequality is subtracted out using this first inequality theory. We might worry here that although the first inequality theory is tested at opposition, we now rely on it to predict the effect of the first inequality when not at opposition. Is this a good interpolation to make? We will find that Kepler asks this question and comes up with a negative answer.

A bigger worry is that the whole procedure presupposes that the Copernican theory is the correct theory of planetary motion. The observations are used not to confirm whether the theory is true, but in order to fit the parameters. The Ptolemaic theory did just as well in capturing the planetary observations—in fact, Copernicus checked his results against Ptolemaic ratios of the epicycle radius to deferent radius. The procedure provides no way of safeguarding against the possibility that the Copernican theory might be wrong.

There is some reason for preferring the Copernican theory over the Ptolemaic theory, because it gives the planetary distances, and their order, unambiguously, but this is nothing like a knockdown argument against the Ptolemaic theory. And there were other arguments for taking the earth not to be moving relative to the fixed stars, such as the failure to observe stellar parallax.

5 Underdetermination

I have now ended the section on Copernicus with the conclusion that the indirect measurements of planetary distances by Copernicus simply presuppose that the Copernican theory is correct. This leads us to the great problem underlying planetary astronomy at the time—observations at the end of the sixteenth century could not tell the difference between the Ptolemaic theory, the Copernican theory, and a third competitor, the Tychonic theory, which I will describe shortly. In fact, it wasn't just that the observations could not tell the difference. By means of minor modifications to each of the theories, they could be made empirically equivalent to each other, as Kepler shows in Chapter 1 of the Astronomia Nova.

As I have already mentioned, any theory of planetary motion had to account for two major components of planetary motion, the first inequality and the second inequality. The first inequality could be accounted for with the use of the equant, as in Ptolemaic theory, or with the use of an epicycle, as by the Maragha astronomers and Copernicus. There are three ways of accounting for the second inequality. The first is shown in Figure 6a (taken from Swerdlow and Neugebauer 1984, 613), with the earth O at the center, and the planet P moving on an epicycle, the center of the epicycle C moving around the earth. This is how it is done in the Ptolemaic theory. Note here that the direction of the sun as seen from the earth is parallel to the direction of the planet as seen from the center of the epicycle C.

Now, one transformation you can do on this theory that will keep all observations exactly the same would be effectively to switch the deferent and the epicycle. In order to avoid confusion, let me just call these two circles the big circle and the small circle. In the Ptolemaic theory, you have the center of the small circle moving along the circumference of the big circle. But you can imagine swapping the big circle and the small circle, so that now you have the small circle centered on the earth O, and the center of the big circle moving along the circumference of the small circle, as in Figure 6b. It just so happens you can now place the sun S at the center of the big circle, and the systems in Figure 6a and Figure 6b will give you exactly the same observations. Figure 6b shows how the second inequality is accounted for in the Tychonic theory.
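
Why the swap leaves every observation unchanged can be put in one line (my gloss, not in the original texts): as seen from the earth O, the planet's position is the vector sum of a motion around the big circle and a motion around the small circle, and vector addition commutes,

\[
\mathbf{r}_{OP}(t) \;=\; \mathbf{R}(t) + \mathbf{r}(t) \;=\; \mathbf{r}(t) + \mathbf{R}(t).
\]

Interchanging which circle is traversed first changes the intermediate point—the center of the moving circle—but not the endpoint P, so every line of sight from O is preserved.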

Now you can think of making a further transformation of Figure 6b. Instead of the sun S moving around the earth O, you take the earth to be moving around the sun, as in Figure 6c. The planet P moves in an orbit centered on the sun. This is how the second inequality is accounted for in the Copernican theory. All of these arrangements will give rise to exactly the same apparent motions when observed from the earth.

Figure 6a, b, c (from left to right)

We thus might worry that there will never be a way to tell the difference between these three arrangements for the second inequality. And similar arguments might be made for the first inequality as well—one could account for exactly the same motions using either an equant or an epicycle.14 If so, then trying to determine which of these three-dimensional motions of the planets is the correct one would be a waste of time. If we are astronomers in the sixteenth century, we might then decide that the aim of planetary astronomy should not be to try to determine the actual three-dimensional motions of the planets at all, but to provide a method of calculating the motions that we can actually observe. This is, in fact, the view expressed in the famous preface to De Revolutionibus, written by Andreas Osiander:

For this art, it is quite clear, is completely and absolutely ignorant of the causes of the apparent nonuniform motions. And if any causes are devised by the imagination, as indeed very many are, they are not put forward to convince anyone that they are true, but merely to provide a reliable basis for computation. However, since different hypotheses are sometimes offered for one and the same motion (for example, eccentricity and an epicycle for the sun's motion), the astronomer will take as his first choice the hypothesis that is easiest to grasp. The philosopher will perhaps rather seek the semblance of the truth. But neither of them will understand or state anything certain, unless it has been divinely revealed to him. (Rosen 1978, xx)

14 The first inequality theory of Copernicus in fact preserves uniform angular motion with respect to the equant. See Swerdlow and Neugebauer 1984, 296.

As I have already mentioned, Copernicus himself actually had physical reasons for believing that his own theory was correct—namely that it was more consistent with the idea that the planets were being carried along by crystalline spheres. This consideration is rather weak, however, especially in the face of various competing arguments for the Ptolemaic theory that were taken seriously at the time, such as those based on Biblical passages.

In any case, the underdetermination here seems insurmountable, because of what we might term the "anything you can do, I can do just as well" argument. Given a theory of planetary motions using any of the three physical arrangements, Ptolemaic, Copernican, or Tychonic, you could turn it into an empirically equivalent theory using a different physical arrangement by means of mathematical transformations. Is there a way out? We will now turn to the work of Kepler, who believed he had found a method of dealing with this underdetermination, and put it to work in his Astronomia Nova.

6 The Ursus-Kepler Dispute

Before we examine Kepler's work in astronomy, let us take a look at his views on methodology. The issue of underdetermination was at the core of a nasty dispute between Tycho Brahe and his contemporary, an astronomer by the name of Ursus, also known as Nicolaus Reimers Baer. In 1588, Ursus published a tract on astronomy called the Fundamentum Astronomicum, featuring a theory similar to Tycho's, having the earth at the center, the sun and the moon revolving around the earth, and the other planets revolving around the sun. In 1596, Tycho publicly accused Ursus of plagiarism. In 1597, Ursus responded with the Tractatus, a nasty, rambling essay that contains a few serious points but is also full of snide remarks and egregious personal attacks.

Kepler got pulled into this dispute against his will. He had once written to Ursus and politely commented in the letter that "I admire your hypotheses." Unbeknownst to Kepler, Ursus put Kepler's letter into the text of the Tractatus, making it seem as if Kepler supported Ursus against Tycho. As it happens, Kepler was in dire need of a job in 1600, having been ordered to leave Graz after having refused to convert to Catholicism. He had been corresponding with Tycho, and a job under Tycho would allow him to continue pursuing astronomy. He managed to secure this position, but only under the condition that he write a detailed response defending Tycho against Ursus.

This response is known as the Apologia. Ursus died soon after Kepler wrote the Apologia, so it was not published until the nineteenth century. It therefore played no role in the development of astronomy, physics, or the philosophy of science, but it is of great interest to us because we can think of it as one of the first clear statements of scientific methodology at the dawn of the modern era, made by one of its most important practitioners.15

15 In fact, the title of Nicholas Jardine's (1984) book on this controversy is The Birth of History and Philosophy of Science.

Ursus has views very similar to those expressed by Osiander in his preface to De Revolutionibus, and in fact quotes it approvingly. For Ursus, hypotheses are "fabrications which we imagine and use to portray the world-system," and it is not in the least necessary "that these hypotheses correspond altogether … to the world system itself" (Jardine 1984, 41). Ursus gives several arguments for this anti-realist view of astronomical hypotheses, but the one that is most significant from our point of view is what Jardine calls the "argument from evidential insufficiency" (Jardine 1984, 212)—since, as Ursus argues, one can reach true conclusions from false premises, the mere fact that planetary latitudes and longitudes can be accurately predicted does not mean that the theory that makes such accurate predictions is correct.

What Ursus says, on the face of it, is true. As I have mentioned in chapter 1, the crux of the argument from underdetermination is that two radically different theories, such as the Copernican and the Ptolemaic theories, can give the same predictions. This means that both theories can give accurate predictions, but one of them must be false. Kepler gives the following response, which gives us some insight into the method he uses in the Astronomia Nova:

Well, then, isn't it necessary for one of the two hypotheses about the primary motion (to take an example) to be false—either the one which says the earth is moved within the heavens, or the one which holds that the heavens are turned about the earth? Certainly if contradictory propositions cannot both be true at once, these two will not both be true at once: rather one of them will be altogether false. … Does what is true follow equally from what is false and what is true then? Far from it! For the occurrences listed above, and a thousand others, happen neither because of the motion of the heavens, nor because of the motion of the earth, insofar as it is a motion of the heavens or of the earth. Rather, they happen insofar as there occurs a degree of separation between the earth and the heaven along a path which is regularly curved with respect to the path of the sun, by whichever of the two bodies that separation is brought about. So the above-mentioned things are demonstrated from two hypotheses insofar as they fall under a single genus, not insofar as they differ. Since, therefore, they are one for the purpose of the demonstration, for the purpose of the demonstration they certainly are not contradictory propositions. And even though a physical contradiction inheres in them, that is still entirely irrelevant to the demonstration. (Jardine 1984, 142)

Two hypotheses that are different in physical terms, like the Copernican and Ptolemaic theories, can give the same predictions. But they do so only because they "fall under a single genus, not insofar as they differ". That is, there is something about the two hypotheses that is similar—in this case the relative motions of the sun and the earth with respect to each other—that makes the two hypotheses have the same predictions. But two different hypotheses will also have differences, which can be found, because "every hypothesis whatsoever, if we examine it minutely, yields some consequence which is entirely its own and not shared with any other hypothesis" (143).

Two hypotheses might give the same predictions under a certain set of conditions, but there must be some conditions under which they will give rise to different predictions. The passage also suggests a further insight. Suppose we know exactly what it is about two different hypotheses that makes them give the same predictions. Then we can know under what conditions the predictions of the two hypotheses coincide, and thus use the predictions of one hypothesis to stand in for the predictions of the other. We will see that in the Astronomia Nova, Kepler at times uses the predictions of a theory that he knows to be wrong (for example, the theory of heliocentric Mars longitudes that Kepler called his "vicarious theory", to be discussed below), because he knows that the theory will yield predictions that are accurate under certain conditions.

7 True sun and mean sun

We will now examine three instances where Kepler measures planetary distances in the Astronomia Nova. This work is a radical reformulation of planetary astronomy, focusing on the motion of Mars. As he states in the introduction, its main aim is to improve the accuracy of astronomical tables, and in this aim its results are supposed to be acceptable to proponents of the Ptolemaic, Copernican, and Tychonic theories. It also has a secondary aim of showing that the Copernican theory is the true theory of the planetary motions. In the first few chapters, Kepler shows that, as I have already mentioned, the Ptolemaic, Copernican, and Tychonic theories can all be made empirically equivalent to each other with minor modifications. This allows him to use his own preferred Copernican theory for most of the book, with the assurance that with the appropriate transformations the same procedures can be done using the other theories as well.

In Chapters 5 and 6, Kepler makes a point that is very significant with regard to his methodology. For the determination of the parameters of the Ptolemaic, Copernican, and Tychonic theories, the position of the sun is important. For example, we saw that observations of the planets were made at opposition. This was done by calculating the time at which the earth would be exactly between the sun and the planet, and observing the position of the planet at that time. The easiest way to do this is to assume that the sun moves with constant angular speed across the ecliptic, even though we know that its speed actually varies. The position the sun would be in if it moved with constant angular speed is called the mean sun, while its actual position is called the true sun.
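
In modern terms (my formulation, not Kepler's), the mean sun advances uniformly in longitude,

\[
\lambda_{\text{mean}}(t) \;=\; \lambda_{0} + \frac{360^{\circ}}{365.25\ \text{days}}\,(t - t_{0}),
\]

while the true sun's longitude differs from this by the equation of center, which for the earth's orbital eccentricity of about 0.017 can reach nearly 2 degrees—small, but far from negligible at observational accuracies of a few minutes of arc.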

Ptolemy, Copernicus, and Tycho all used the mean sun in their calculations of planetary positions. Kepler believed that the motions of the planets were caused by the sun, and thus planetary positions ought to be measured from the true sun, not the mean sun—the mean sun just being a point out in empty space with doubtful physical significance. In particular, this would make a difference in determining the position of the line of apsides, the line that goes through the points on the orbit that are closest and farthest from the sun. The position of the line of apsides plays a very important role in determining features of the orbit.

Kepler asks what difference the use of the mean sun in place of the true sun would make to the observations. See Figure 7 (from Stephenson 1987, 37). In the figure, A is the true sun and B is the mean sun. Determining the line of apsides based on the true sun would give the solid orbit, while determining it based on the mean sun would give the dotted orbit. What Kepler finds is that the two orbits can give planetary distances that are very different, such as at points X and Y. But consider the procedure that was used to check the orbits for the first inequality by Copernicus. The observations were all made at opposition, in order to cancel the effects of the second inequality. This means you are effectively looking at the position of the planet from point A or point B, depending upon whether you use the true sun or the mean sun. But the angular distance between X and Y is very small from either point. In order to be able to see the distance between X and Y, you have to view X and Y from the side—that is, you have to make observations when not at opposition, when the earth is not located on the line from the planet to the sun.

Figure 7

What this shows is that even if two theories cannot be distinguished with respect to observations of the two-dimensional motions of the planets that are directly observable from the earth, they might give significantly different predictions for planetary distances, so if you could somehow measure planetary distances, then this would provide a way of rejecting one or the other theory.

In the next part of the Astronomia Nova, Kepler develops a theory for the first inequality of Mars, using the true sun instead of the mean sun. This theory, like the Ptolemaic theory, assumes that the orbit of Mars is circular, and it uses an equant for the first inequality. It is, however, unlike the Ptolemaic theory in that the distance from the center of the orbit to the equant and the distance from the center of the orbit to the eccenter are not equal, and they are treated as independent parameters. Kepler fits the parameters to four observations at opposition made by Tycho, checks them against all the observations Tycho made at opposition and two further observations made by Kepler himself, and shows that this theory gives heliocentric longitudes of Mars to within two minutes of arc. Kepler, having already shown that two significantly different theories can give very nearly the same longitudes, checks this theory in two different ways, and shows that the theory cannot be right. Kepler calls this theory his "vicarious hypothesis" because it has to be false, yet it gives good heliocentric longitudes, a fact that he makes use of later.

What all this painstaking work showed is that in order to come up with a theory for the first inequality, observations at opposition would not suffice. In order to put further constraints on the theory, you have to be able to determine planetary distances. But in order to determine planetary distances, you can't use observations at opposition—you have to set up a triangle like that of Figure 4. In order to set up the triangle, however, you need a good theory of the earth's motion. This is what Kepler turns to in the next part of the Astronomia Nova.

8 The earth's orbit

Seen from the modern perspective, there are actually two features of Ptolemaic theory that correspond to the earth's motion. The first feature is Ptolemy's solar theory, which is modeled using an eccentric circular orbit with no equant. The second feature is the second inequality of the planets, which is modeled using an epicycle in Ptolemaic theory. The epicycle is a circular orbit in which the planet moves uniformly around the center of the epicycle. The center of the epicycle is the point that attaches it to the deferent. As Stephenson (1987, 50) points out, this corresponds to the simplest imaginable theory of the earth's motion, one where the earth is moving uniformly in a circle with the sun at its center. In the Copernican theory, this model for the second inequality is duly preserved, so Copernicus has the earth revolving at a uniform speed in a circle with the mean sun at its center. But one could think of modeling the earth's orbit using an eccentric and an equant, as with the other planets.

Kepler measures the orbit of the earth with respect to the true sun16 in Chapter 26.17 He ingeniously exploits the fact that the orbital period of Mars is known to be 687 days, meaning that Mars returns to the same point in space relative to the fixed stars every 687 days. By using four observations of Mars spaced 687 days apart, you can in effect see the earth from the same position in space. In Figure 8 (from Stephenson 1987, 52), this position in space is labeled M, the positions of the earth during the four observations are labeled E1, E2, E3, and E4, and the true sun is labeled C. The direction of the line from the earth to Mars is known from the observations. The direction of the line from the true sun to Mars is given by the vicarious hypothesis, which Kepler knew to give accurate heliocentric longitudes. The directions of the lines from the true sun to the earth come from Tycho's solar theory, which Kepler shows later in chapter 31 to provide accurate longitudes. This determines all the angles in the triangles, and they all share one leg, line CM, which allows Kepler to know the relative distances of the points E1, E2, E3, and E4 from C. Since three points determine a circle, he could determine the earth's orbit under the assumption that it is circular.

16 He also does it for the mean sun in Chapter 24.
17 Accounts of this measurement can be found in Wilson 1968, 1972, and Stephenson 1987.
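
A sketch of the computation in modern terms may help (my reconstruction; the longitudes below are invented placeholders, not Tycho's data):

    # One observation in Kepler's earth-orbit triangulation: solve triangle CEM
    # for the earth-sun distance CE in units of the shared leg CM (sun-Mars).
    import math

    def earth_sun_distance(lam_earth, lam_mars, lam_geo):
        # lam_earth: heliocentric longitude of the earth (solar theory)
        # lam_mars:  heliocentric longitude of Mars (vicarious hypothesis)
        # lam_geo:   observed geocentric longitude of Mars
        angle_C = abs(lam_mars - lam_earth) % 360          # angle at the sun C
        angle_E = abs(lam_earth + 180 - lam_geo) % 360     # angle at the earth E
        angle_C = min(angle_C, 360 - angle_C)
        angle_E = min(angle_E, 360 - angle_E)
        angle_M = 180 - angle_C - angle_E                  # angle at Mars M
        # law of sines: side CE is opposite M, side CM is opposite E
        return math.sin(math.radians(angle_M)) / math.sin(math.radians(angle_E))

    # Observations 687 days apart see Mars at the same point M, so the triangles
    # share the leg CM and the resulting values of CE are directly comparable.
    for lam_e, lam_m, lam_g in [(100.0, 140.0, 180.0), (205.0, 140.0, 100.0)]:
        print(earth_sun_distance(lam_e, lam_m, lam_g))     # both about 0.65-0.67

Three such values of CE, together with the directions of the lines CE, fix a circle, which is why the assumption of a circular orbit closes the computation.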

Figure 8

This procedure showed that the center of the earth's orbit lies very nearly halfway between the equant point of the earth's orbit and the position of the true sun—that is, it can be shown that the earth's orbit has very nearly bisected eccentricity, just like the other planets. As Stephenson (1987, 55) points out, these measurements of the earth's orbit are done not by observing the position of the sun, but by observing the position of Mars, controlling for the effect of the first inequality by using observations spaced by one orbital period of Mars. This means that these results, when interpreted according to the Ptolemaic theory, do not apply to the sun's motion, but to the second inequality of Mars. But this now renders Ptolemaic theory much more implausible, because one could no longer accept the fairly neat picture where each planet moves on an epicycle with its center attached to the deferent.

Now, instead of the epicycle, each planet has a transformed version of the earth's orbit, attached to the deferent at an eccentric point corresponding to the true sun, and having an equant point with bisected eccentricity! Kepler remarks that Ptolemy has been "refuted in passing":

And finally, when a comparison of hypotheses has been made, and it has appeared that four theories of the sun can be generated from a single theory of the earth, like many images from one substantial face, the sun itself, the clearest of truth, will melt all this Ptolemaic apparatus like butter, and will disperse the followers of Ptolemy, some to Copernicus's camp, and some to Brahe's. (Donahue 1992, 337)

After this point, Kepler does not seriously entertain the Ptolemaic theory as a possibility in the Astronomia Nova. This is not a knockdown argument against the Ptolemaic theory in the way Galileo's observation of the phases of Venus was, but it made the Ptolemaic theory seriously implausible.

9 Other uses of triangulation in the Astronomia Nova

Kepler also uses triangulation in later sections of the Astronomia Nova, when he is working out the orbit of Mars. He first has a theory where Mars moves on an epicycle and traces out an eccentric circular orbit. In order to simplify calculations of the trajectory of the planet, he uses the area rule, which he at that time regarded as simply a helpful shortcut. By comparing the trajectory calculated using the area rule with the vicarious theory, which he knows gives accurate heliocentric longitudes, he finds that his epicyclic theory makes Mars go too fast at certain points in the orbit, and too slow at others. This implied either that the orbit is not circular but oval, or that the area rule is wrong. Kepler then uses triangulation in order to figure out the shape of the orbit, using a modified version of Tycho's solar theory with bisected eccentricity to get more accurate earth-sun distances. These triangulation measurements show that the orbit must be oval.

Now having concluded that the orbit is oval, Kepler comes up with a modified epicyclic theory in which the epicycle rotates uniformly, but he finds the calculations required for this theory to be horrendous, so he approximates it with an ellipse, which he calls an "auxiliary ellipse". The auxiliary ellipse, however, gives a trajectory that is off by about the same amount as the eccentric circular orbit. He then decides to measure the distances to Mars in order to confirm whether the orbit is an oval. He chooses several sets of points that are located symmetrically about the line of apsides, and finds that the pairs of points agree to within the precision of his calculations. As Wilson (1968, 13) notes, these measurements of the distances were good enough to show that the orbit is oval, but not that it is an ellipse. One further result that Kepler gets out of this measurement, however, is that it showed that the line of apsides passes through the true sun, not the mean sun, so it provided clear evidence that planetary positions should be referenced to the true sun.

10 Decomposition

According to a hypothetico-deductive view, like that held by Ursus, the evidence for a theory comes entirely from predictions of accessible properties. And because of the problem of underdetermination, we should not take those theories to correspond to something real. Kepler's approach flies in the face of the hypothetico-deductive method—he makes the assumption that the theories of planetary motion correspond to actual motions of the planets. But he is also very careful. He realizes that simply because a theory gives good predictions does not mean it is true. It could turn out that the theory gives good predictions only under certain conditions. Thus, you need to understand the conditions under which the theory gives you good predictions, and why. With that understanding, you can have confidence in the theory.

This is related to the problem of decomposition I have discussed in chapter 1.

The motions of the planets are decomposed into two main components, the first inequality and the second inequality—or, from the point of view of a Copernican astronomer, the motion of the earth and the motion of the planet. In order to determine the motion of Mars accurately, I need to separate it out from the motion of the earth. And in order to determine the motion of the earth, I need to know the Mars orbit accurately. A false theory can be used for this job of decomposition, if you know under what conditions it yields good predictions. For example, Kepler uses two false theories, Tycho's solar theory and the vicarious theory for Mars, in a triangulation to determine the center of the earth's orbit, and concludes that the eccentricity of the earth's orbit is bisected. He could use the false theories because he knew that they yield good predictions for longitudes, even though he knew their distances must be false. This gave him the modified version of Tycho's solar theory, which gave better distances. Later, using triangulation with this modified theory gave him good enough measurements of the Mars orbit to show that the orbit is an oval. Thus, by playing the earth's orbit and the Mars orbit off of each other, he was able to get better and better estimates of what the true orbit is like.

The success of this decomposition is a strong indication that the assumptions that went into the decomposition were correct. This is because if the assumptions are wrong, the decomposition is very unlikely to work. For example, consider Kepler's use of the true sun instead of the mean sun as a reference point for the planetary orbits. As we saw, Kepler showed that the true sun and the mean sun would yield very similar results at opposition, but they would be significantly different away from opposition. Kepler goes with the true sun because of his beliefs about the physical causes of the planetary motions. However, this would have been a rather weak justification at the time, because most astronomers did not share his beliefs about the physical causes. But then Kepler used the true sun in his modification of Tycho's solar theory to a bisected eccentric orbit. And the adjustments in distances due to Tycho's solar theory allowed the determination of the Mars orbit to better accuracy, and particularly the measurement of the position of the line of apsides. This measurement showed that the line of apsides goes through the true sun, not the mean sun. It's true that the justification seems circular here, since at the beginning Kepler simply assumed that the true sun is the proper point of reference. But if the true sun were not the proper reference point, you would not expect to get symmetric measurements along the line of apsides, nor would there be any reason to think that the line of apsides would pass through the true sun.


We might compare the method of Kepler with that of Copernicus, particularly in the way each of them uses triangulation. Copernicus assumes at the outset that his theory is correct, and uses triangulation in order to determine a parameter of his theory, namely the size of a planet's orbit relative to the earth's orbit. Kepler, on the other hand, carries out triangulation, but does not assume that the theories he uses to do the triangulation are correct. In fact, he uses theories that he knows are false, and tries to give reasons why the triangulation is nevertheless correct. The measurements are then used in further determinations of planetary orbits using decomposition, and ultimately success in decomposition provides evidence that the use of those theories for the measurements was correct.


-3-

Newton and Kant on the Third Law of Motion

1 Underdetermination once again

In the previous chapter, we saw that there was an underdetermination between the three theories of planetary motion, the Ptolemaic, Copernican, and Tychonic theories, with respect to a certain set of observations, namely planetary latitudes and longitudes over time as seen from the earth. The Ptolemaic theory was definitively rejected by Galileo's observation of the phases of Venus in 1610, but I discussed another way in which the underdetermination might have been resolved—by the indirect measurement of planetary distances. In particular, I focused on the work of Kepler. By decomposing the apparent motions of the planets into a part due to the motion of the earth and a part due to the motion of the planets, he was able to make progress on the question of what the actual three-dimensional motions of the planets are. Going back to my picture of indirect measurement from chapter 1, we can think of the solar system as a complicated, partially inaccessible system. The work of Kepler showed how you can make indirect measurements of planetary distances, and how to justify the presuppositions that must be made in order to carry out the measurements.


This chapter continues my examination of indirect measurement in planetary astronomy, but now focuses on the work of Newton. At the time of publication of Newton's Principia in 1687, the underdetermination between the remaining two theories of planetary motion, the Copernican and Tychonic theories, was still unresolved. To be more precise, Kepler's theory and other subsequent theories of planetary motion had superseded the original Copernican and Tychonic theories, but it was still unresolved whether the earth stood still with respect to the fixed stars and the sun moved around the earth, or whether the sun stood still and the earth moved around the sun. There was one way this underdetermination could be resolved that had been known since at least the time of Tycho. This was to observe stellar parallax, the difference in the apparent positions of stars due to their being observed from different positions along the earth's orbit. Stellar observations, however, were not accurate enough at the time of Newton, and in fact stellar parallax was not actually measured until 1838.

There was also an entirely different underdetermination problem with regard to the motions of the planets at the time Newton began his work on planetary motion.

Kepler had, in 1627, published the Rudolphine Tables, a set of astronomical tables based on his own theory of the planetary motions. We now know that the predicted positions of the Rudolphine Tables were thirty times more accurate than those of any prior astronomical tables,18 and the subsequent success of these tables showed that Kepler's theory was approximately correct. But in the following decades, astronomers such as Ismael Boulliau, Vincent Wing, Thomas Streete, and Nicolaus Mercator developed different astronomical tables, using their own theories for calculating the planetary motions. George Smith points out that Newton knew of at least seven different such ways of calculating planetary motions, all within more or less the same level of accuracy.19 By now, there was fairly good agreement about what the approximate three-dimensional motions of the planets had to be, setting aside the issue of Tychonic versus Copernican. But was there any way of deciding, once and for all, which theory of the planetary motions about the sun is the correct one?

18 Gingerich, "Johannes Kepler", 77, from Taton and Wilson.

Newton resolved these two issues of underdetermination by introducing the indirect measurement of a new kind of property of the solar system—mechanical properties such as forces and mass. From the three-dimensional motions of the planets—now known with confidence to be approximately correct—these mechanical properties can be measured. But as I have discussed in chapter 1, indirect measurement of inaccessible properties of complicated systems often requires us to make assumptions about what the system is like. In this case, you have to assume that certain mechanical laws that are known to apply to objects here on earth can be extended to celestial objects as well. One question is how such an extension of these mechanical laws to celestial objects can be justified, especially considering that they are antecedently unfamiliar objects. Further, in order to indirectly measure forces from motions, we must know whether the motions are the combined effect of different sources of force, and how to decompose these effects. That is, decomposition of forces must be carried out, much like Kepler had to decompose motions.

19 From "Closing the Loop: Testing Newtonian Gravity, Then and Now", Lecture 1, Isaac Newton lectures at Stanford University. See also Wilson, "Predictive astronomy in the century after Kepler", in Taton and Wilson 1989.


Newton's third law of motion turns out to play an indispensable role in identifying sources of force from motions. But, as we shall see, Newton's application of the third law of motion to attraction between celestial bodies was an unjustified assumption for many of Newton's contemporaries, because we would not expect it to be applicable to the motions of the planets if they are mediated by an aether, which was thought likely at the time. I will examine the objections lodged against Newton, first by Newton's editor, Roger Cotes, and then later by Kant. We will see that ultimately the application of the third law was justified by the long-term success of the research programs that were carried out using the third law of motion as if the attraction between the planets were not mediated by an aether.

2 From motions to forces

Newton's work leading up to the Principia was initiated by a series of letters exchanged with Robert Hooke in late 1679 and early 1680,20 where they discuss what the motion would be if an object were dropped from the surface of the earth towards the center, imagining that there is nothing to impede the motion of the object as it falls. Newton first says it would spiral in towards the center, but Hooke replies that the motion would be in the form of an ellipsoid. Newton then writes back, saying that under the assumption that gravity is the same at all distances from the center, you get a cloverleaf-like pattern as the object falls towards the center. Hooke then replies that he is assuming that the attraction is proportional to the inverse of the square of the distance to the center, and wonders what the properties of the curved line traced out by an object moving under such an inverse square attraction would be.

20 Isaac Newton, The Correspondence of Isaac Newton, Vol. 2, 1676-1687, 304-313.

A pair of problems is under discussion here: the question of what the motion of an object under a particular law of attraction would be, and the question of what the law of attraction would be for an object moving on a particular trajectory, when we constrain the attraction such that it is always directed towards a central point. Motion under inverse square attraction was discussed in the years leading up to the publication of the Principia by Hooke and others who were interested in astronomy. In a letter to Newton written in 1686,21 Edmund Halley states that in January 1684 he came to the conclusion from Kepler's 3/2 power rule22 that the centripetal force governing the motions of the planets is inverse square, and he discussed the idea with Hooke and Christopher Wren. In this discussion, Hooke claimed that he could account for all the motions of the heavenly bodies with an inverse square law. Halley himself had failed to do so.

It was, in fact, Newton who was first able to give a detailed answer to the question of what the motions of bodies under inverse square forces would be, in the short tract "De Motu", written in 1684, out of which the Principia grew. An early version of "De Motu" is an analysis of the motion of bodies under centripetal forces, starting off with three definitions and four hypotheses, and deriving four theorems.

21 Ibid., 441-442. When the Principia was being prepared for publication, there was a dispute between Newton and Hooke. Hooke wanted to be acknowledged in the Principia for providing the idea of the inverse square law to Newton, but Newton claimed he had the idea much earlier. Halley, being the editor of the Principia, was the go-between in this dispute.
22 Perhaps by making use of the Huygens result that I mention below, that the centripetal force goes as v^2/r. The 3/2 power rule says that the square of the orbital period of a planet goes as the cube of the length of the semi-major axis. Let the ellipse be a circle to a first approximation, so that the foci are very near the center of the circle, the planet moves at a constant speed to a first approximation, and the semi-major axis can be treated as the radius of the circle. Let the speed of the planet be v, and the length of the semi-major axis be r. The period of the planet is proportional to r/v, so the 3/2 power rule says that r^3 is proportional to r^2/v^2. We rearrange to get v^2 proportional to 1/r. The Huygens result then says the force should be proportional to v^2/r, so by substitution we get the force being proportional to 1/r^2.
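The reasoning compressed into footnote 22 can be set out in two lines (my reconstruction, treating the orbit as a circle of radius r traversed at constant speed v, so that the period T is proportional to r/v):

    T^2 \propto r^3 \;\Rightarrow\; r^2/v^2 \propto r^3 \;\Rightarrow\; v^2 \propto 1/r,
    F \propto v^2/r \;\Rightarrow\; F \propto 1/r^2.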

Throughout the tract, Newton inserts scholia in which he discusses the application of these theorems to planetary astronomy. Theorem 1 says that bodies moving under centripetal forces obey Kepler's area rule. Theorem 2 says that, for bodies revolving uniformly in circles, the centripetal force is proportional to the square of the arc traveled per unit time divided by the radius. This relationship, which had already been shown by Christiaan Huygens in the Horologium Oscillatorium in 1673, is better known to physics students in the form of Corollary 1, which states that the centripetal force goes as v^2/r, where v is the velocity of the body in the arc of the circle, and r is the radius of the circle. In Corollary 5, Newton shows that if bodies moving in circular orbits obey the 3/2 power rule, then the centripetal forces are inverse square, and vice versa.

Theorem 3 shows how to determine the force law for a body moving in an arbitrary trajectory under a centripetal force. Theorem 4 shows that for bodies moving in ellipses with centripetal forces directed toward a focus of the ellipse, the square of the period is proportional to the cube of the semi-major axis—a generalization of Kepler's 3/2 power rule to elliptical orbits.

Newton discusses in depth two ways of applying these results to planetary astronomy. First, in a scholium to Theorem 4, he shows that you can use the 3/2 power rule result to determine the length of the semi-major axis of a planet's orbit very precisely, since the periods of the planets are known precisely. Then, given the length of the semi-major axis, Newton describes a procedure for determining the position of the focus not occupied by the sun, and thus determining the orbit of the planet very precisely.
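A minimal sketch of the first procedure, under the usual simplifying choice of units (periods in sidereal years, distances in astronomical units, so that the 3/2 power rule reads T^2 = a^3); the Mars figures are a familiar modern check, not Newton's numbers:

    def semi_major_axis_au(period_years):
        # 3/2 power rule: T^2 = a^3, hence a = T^(2/3)
        return period_years ** (2.0 / 3.0)

    print(semi_major_axis_au(1.881))  # Mars: about 1.524 AU

The precision of the result is inherited from the precision of the period, which is exactly the point of the scholium.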

Second, in Problem 4, he shows how, given an inverse square force, you can determine the ellipse in which a body moves, given an initial position and an initial velocity. In a scholium to Problem 4, he discusses the application of this problem to finding the orbits of comets. There had been much debate about the motions of comets at that time, and being able to show that their motions could be the result of an inverse square force was a real breakthrough.

This suggests that Newton was on his way towards a new way of determining the motions of objects in the solar system, where forces give us a means of determining the orbits more precisely. There was, however, one problem. If we think about familiar cases where a body is held in a circular orbit by a centripetal force, such as in a sling, there is an equal and opposite force back on the force center. We can feel this force in the case of the sling. So any general theory of bodies in motion around centers of force needs to take this mutual interaction into account.

In a later version of "De Motu", Newton adds several new features that account for this mutual interaction. Newton adds a new definition, where he says that "the representatives of quantities are any other quantities proportional to those under consideration". Notably, this definition explicitly allows for indirect measurements of particular quantities. He also adds three new laws, one of which, Law 4, says that "by the mutual actions between bodies the common center of gravity does not change its state of motion." This is equivalent to the third law of motion in the Principia—when you have two bodies, and one body exerts a force on a second body, then in order for this law to hold, the second body must exert an equal and opposite force on the first body.
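The equivalence asserted here is worth making explicit. In modern notation (mine, not Newton's), if the only forces at work are the mutual ones, F on the first body and -F on the second, then

    m_1 \ddot{r}_1 = F, \quad m_2 \ddot{r}_2 = -F \;\Rightarrow\; \frac{d^2}{dt^2}\,(m_1 r_1 + m_2 r_2) = 0,

so the center of gravity (m_1 r_1 + m_2 r_2)/(m_1 + m_2) moves uniformly exactly when the mutual forces are equal and opposite.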


The new law stating that the common center of gravity does not change its state of motion can be used to show that the center of gravity of the solar system must lie near the sun, providing a way of resolving the underdetermination between the Copernican and Tychonic theories. But it also means that the orbits of the planets cannot be determined precisely, since the planets never move in the same orbit twice:

Hence truly the Copernican system is proved a priori. For if the common center of gravity is calculated for any position of the planets it either falls in the body of the sun or will always be very close to it. By reason of this deviation of the sun from the center of gravity the centripetal force does not always tend to that immobile center, and hence the planets neither move exactly in ellipses nor revolve twice in the same orbit. So there are as many orbits to a planet as it has revolutions, as in the motion of the moon, and the orbit of any one planet depends on the combined motion of all the planets, not to mention the action of all these on each other. But to consider simultaneously all these causes of motion and to define these motions by exact laws allowing of convenient calculation exceeds, unless I am mistaken, the force of the entire human intellect. (Herivel 1965, 301)

This famous scholium is known as the "Copernican Scholium". We should note that at this point Newton makes no mention of universal gravity. From our modern point of view, we tend to think that the complexity in the motions of the planets is because all of the planets exert forces directly on each other, and so they are in constant interaction.

But here, the most straightforward way to read Newton is that the complexity is because each planet exerts a pull on the sun, which moves the sun, but then this motion of the sun affects the motions of the other planets, and then this in turn affects the motion of the sun, and so on. If there are mutual interactions between each of the planets and the sun, then the determination of the orbits of the planets becomes much more difficult.

The Copernican Scholium is important because it points toward a new way of thinking about the problem of planetary motion. The natural way to think about what Newton is doing in "De Motu" is that he is thinking about using inverse square forces in order to put constraints on what the trajectories of the planets would be. If they are governed purely by an inverse square force, they would be elliptical with the sun at one focus, and obey the 3/2 power rule, for example. That would put constraints on the trajectories and make the determination of the planetary orbits by indirect measurement much simpler. Now, it could have turned out that the orbits of the planets are perfectly stable and well-behaved, in which case we might have slowly refined our measurements of the planetary orbits over decades of observations. But Newton found that there was a distinct possibility that the orbits of the planets are not stable, and that they therefore cannot be used for such a series of indirect measurements.

Here is another way of thinking about this. Suppose we wanted to characterize the motions of the planets in some way. One way in which we might think to characterize these motions is by determining the shapes of the planetary orbits and the trajectories of the planets over time, as was the aim of Kepler's project. These will do to a certain level of accuracy, but if we want to be even more accurate, the trajectories will not be the best set of parameters for characterizing the solar system, because they are constantly changing—unstable, to use a term from chapter 1. The parameters need to be stable because a great part of the ultimate justification of the measurements comes from agreement between measurements—whether in converging measurements, or in successive approximations using decomposition as Kepler did for the orbit of Mars in the Astronomia Nova.

If the picture of the planetary motions that Newton presents in the Copernican Scholium is right—ungraspable complexity in these motions, but governed by simple inverse square forces—then the best parameters for characterizing the planetary motions would be whatever stable forces are at work. But the existence and strength of these forces must be inferred from the motions. And since there are many objects in motion in the solar system, there is now the familiar question of to what object one ought to refer the motions, in order properly to determine the values of the attractive forces. This issue is explicitly addressed in the scholium that follows the Copernican Scholium, where Newton writes that "the motion of projectiles in our air are to be referred to the immense and truly immobile space of the heavens, not to the movable space which turns round with the earth and our air and is commonly regarded as at rest." (Herivel, 302)

Newton develops this idea further in the Principia. In the Scholium to the Definitions, Newton postulates the existence of an object, absolute space, to which one can refer all motions. In modern terms, absolute space is a frame of reference relative to which all departures from uniform motion in a straight line are indicative of a force at work. But absolute space cannot be detected—what is one to do? It turns out that for any given set of massive bodies interacting through the force of gravity, their center of mass provides such a reference point. In order to find the proper reference point for the solar system, then, all you need to do is find the center of mass of the solar system. One problem is that you don't know what all the sources of force in the solar system are to begin with. But one way of dealing with this problem is as follows. You start by making a first approximation—from the acceleration of bodies surrounding the sun, the earth, and the planets, you can get a rough idea of the strength of the acceleration field around these bodies. You then calculate the motions we would expect to see given these fields. Deviations from these expected motions are taken to be the result of some other force at work. You then try to locate the source of this other force. If you find this new source, you incorporate it into your calculations. Further deviations are taken to be indications of other forces at work, and so on.
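The logic of this bootstrapping can be caricatured in a few lines of code; the toy signal and numbers below are entirely my own invention, standing in for motions that are the combined effect of two sources of force, only one of which is modeled at the start:

    import numpy as np

    t = np.linspace(0.0, 10.0, 200)
    observed = 3.0 * np.sin(t) + 0.4 * np.sin(3.0 * t)  # combined effect of two sources

    model = 3.0 * np.sin(t)        # first approximation: the one known source
    residual = observed - model    # deviation from the expected motions

    # The residual is systematic rather than random, which is taken as the
    # signature of a further source at work; once identified, it is folded
    # into the model, and the process repeats on what remains.
    model = model + 0.4 * np.sin(3.0 * t)
    print(np.max(np.abs(observed - model)))  # ~0: nothing left to account for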

Recall that the program of Kepler involved decomposing the motions of the planets and using successive approximation to get better and better estimates for the orbits of the planets. The Newtonian program can be thought of as a program involving a new kind of decomposition—trying to identify, and measure, the forces at work in the solar system. There are some new issues that must be dealt with, however. I focus on one problem in this chapter. As I have mentioned, the Newtonian program involves finding the center of mass of the bodies in the solar system. The center of mass is found by the application of the third law of motion between the bodies. But there is some reason to think, as we shall see, that the third law of motion cannot be applied between the celestial bodies.

3 Newton's application of the third law of motion in the Principia

We have seen that certain assumptions, the third law of motion in particular, would allow, if true, the determination of the proper point of reference for motions in the solar system. But was Newton in fact justified in making this assumption? The application of the third law of motion to planetary interactions, in particular, was a point in the Principia on which Newton has been criticized, going all the way back to Newton's contemporary and editor of the second edition of the Principia, Roger Cotes.

At the time the Principia was published, the most popular theory for explaining the motions of the planets among natural philosophers was the vortex theory, in which it was supposed that the planets are carried around by a kind of matter, the aether, that fills the spaces between the planets. If this theory is true, there is no reason to think that the third law of motion would be applicable to planetary attraction. This was famously pointed out to Newton by Cotes when he was preparing the second edition of the Principia. In a letter to Newton, he wrote:

But in the first corollary of this 5th proposition I meet with a difficulty, it lies in these words: "and since, every attraction is mutual". I am persuaded they are true when the attraction may be properly so called, otherwise they may be false. (Isaac Newton, The Correspondence of Isaac Newton, Vol. 5, 1709-1713, 392)

Cotes is referring to Proposition 5 in Book 3, a key part of the argument for universal gravity. In Book 3, Newton starts off with six "Phenomena"—that Kepler's area rule and 3/2 power rule apply to the satellites of Jupiter and Saturn, that the planets obey the 3/2 power rule, that the five planets, leaving aside the earth, obey the area rule, and that the moon also obeys the area rule to a first approximation. Then, using various mathematical propositions derived in Book 1, he concludes in Proposition 1 that Jupiter's satellites are subject to an inverse square centripetal force towards Jupiter (and the same for Saturn in the second and third editions), in Proposition 2 that the same holds for the planets and the sun, and in Proposition 3 that the same holds for the moon and the earth. Proposition 4 claims that the force holding the moon in place is terrestrial gravity, arguing through the famous Moon Test. Proposition 5 claims that the centripetal forces that account for the motions of the satellites of Jupiter and Saturn, and of the planets, are also gravity.

In Corollary 1, which is the problematic one for Cotes, Newton simply states, "and since, by the third law of motion, every attraction is mutual, Jupiter will gravitate toward all its satellites, Saturn toward all its satellites, and the earth will gravitate toward the moon, and the sun toward all the primary planets." This corollary in fact does its main work in Proposition 7, which states that gravity is universal and proportional to the quantity of matter in each body—that is, in the familiar equation F = Gm1m2/r^2, where m1 is the mass of the attracting body and m2 is the mass of the attracted body, the fact that m1 is a factor in the equation.
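A schematic reconstruction of how the corollary does this work (the notation is mine): Propositions 1 through 5 give each body an inverse square acceleration field, so the force on body 2 due to body 1 can be written F = m_2 k_1/r^2, which by itself shows only that the force is proportional to the attracted mass m_2. The third law then requires

    m_2\,\frac{k_1}{r^2} = m_1\,\frac{k_2}{r^2} \;\Rightarrow\; \frac{k_1}{m_1} = \frac{k_2}{m_2} \equiv G \;\Rightarrow\; F = \frac{G\,m_1 m_2}{r^2},

so the field strength k_i of each body must be proportional to its own mass, which is just the proportionality to the attracting mass that Proposition 7 asserts.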

Cotes objects to Newton simply stating outright that the third law of motion applies to the attraction between Jupiter and its satellites, Saturn and its satellites, the earth and the moon, and the sun and the planets. He then gives an example to clarify the problem. He asks Newton to imagine that there are two globes on a table, and globe A is at rest while globe B is brought towards globe A by an invisible hand. An observer who was not aware of the invisible hand would find that it appears just as if globe B is being attracted by a "proper and real attraction of A". It would be a mistake, however, for the observer to apply the third law of motion between globe A and globe B and "conclude contrary to his sense and observation that the globe A does also move towards globe B and will meet it at the common center of gravity of both bodies".

If gravity is mediated by some causal mechanism such as an aether, as the prevailing theory at the time held, then there is no warrant for applying the third law of motion to the planetary interaction. Rather, we would have to apply the third law of motion not between the planets, but between the planets and whatever mediating mechanism there is. To put this in more modern terms, if there is a mediating mechanism such as an aether, two planets that seem to be interacting with each other are not, in fact, a closed system, because the aether is interacting with both bodies.


4 The justification for the application of the third law of motion

How did Newton defend the application of the third law of motion for interactions between the planets? Newton drafted two responses to Cotes, a short response that he sent to Cotes and a longer one that was never sent. In both of these responses, Newton strongly emphasizes the status of the laws of motion. The laws of motion, he argues, are not to be seen as "hypotheses", but as "axioms":

The difficulty you mention ... is removed by considering that as in geometry the word "hypothesis" is not taken in so large a sense as to include the axioms and postulates, so in experimental philosophy it is not to be taken in so large a sense as to include the first principles or axioms which I call the laws of motion. These principles are deduced from phenomena and made general by induction: which is the highest evidence that a proposition can have in this philosophy. (Newton 2004, 118)

For the deduction from the phenomena of the third law of motion, Newton directs Cotes to look at the scholium to the laws of motion. There, Newton refers to various experiments having been conducted with pendulums, but he acknowledges that these only show the third law of motion to hold for collisions. So Newton further provides a thought experiment where he has us imagine two objects, A and B, that attract each other, with an obstacle sandwiched in between. Now suppose the attraction of A to B and the attraction of B to A are not equivalent. For example, suppose that A is attracted more to B than B is to A. Then the excess force on A would mean there is a net force on the obstacle in the direction from A towards B—thus, the whole system of objects A and B and the obstacle would be accelerated indefinitely, which would contradict the first law of motion. Newton says he has tested this with a lodestone and iron, floating them side by side in water, and could observe no net acceleration.
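In outline (my formalization, not Newton's wording): let F_AB be the magnitude of the force drawing A toward B, and F_BA that drawing B toward A, with the obstacle held between them. The contact forces between the bodies and the obstacle are internal to the composite system, so the net force on the whole assembly is

    F_{\text{net}} = F_{AB} - F_{BA}, \quad \text{directed from A toward B},

and if F_AB > F_BA the assembly accelerates itself indefinitely with no external agent, contrary to the first law; rest is possible only if F_AB = F_BA.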


Newton further gives a similar argument for the earth in the corollaries, but he makes the argument a bit more dramatic in his unsent response to Cotes:

If a great mountain upon either pole of the earth gravitated towards the rest of the earth more than the rest of the earth gravitated towards the mountain, the weight of the mountain would drive the earth from the plane of the ecliptic and cause it, so soon as it could extricate itself from the system of the sun and planets, to go away in infinitum with a motion perpetually accelerated. (Newton 2004, 121)

As I mentioned above, the basic justification that Newton is offering here for his application of the third law of motion is inductive. We are familiar with the behavior of objects here on earth, and all the objects we know of on earth obey the third law of motion. Newton gives further reasons to think that the same holds true of celestial motions—the earth itself, for example, does not go shooting off into deep space due to some imbalance in the mutual forces of its parts—although this argument about the earth requires that parts of the earth attract each other, something aether theorists clearly would not have accepted.

The problem with Newton's response is that the status of the third law of motion is not what would have been at issue for aether theorists. They could grant to Newton that the third law of motion holds for all interactions between bodies. But suppose the attraction between body A and body B is mediated by an aether. Then you would not look merely at the interactions between body A and body B, but at the interaction between body A, body B, and the intervening aether. A system of body A, body B, and the intervening aether could have body A and body B exerting different attractions on each other, and not go shooting off into infinity, as long as the difference in apparent attractions is compensated for by the intervening aether.


And this is exactly Cotes's objection: that the third law of motion applies only when the attraction is "properly so called", that is, when it is a direct, unmediated attraction.

Another way of thinking about Cotes's objection is that it is not about an enabling assumption, the third law of motion, after all. It is, instead, about decomposition. We are trying to infer the existence of forces, and their magnitudes, from the motions of the planets. But if there is an intervening aether, don't we have to account for that as well?

Recent work on the Principia, starting with the influential work of Howard Stein (1967), emphasizes that the argument for universal gravity should not be taken to end at Proposition 7 of Book 3, but rather "the remainder of Book 3 can be seen as devoted to the 'proof by phenomena' of the law of gravitation". (Stein 1990, 220) That is, the rest of Book 3, of which the greater part remains after Proposition 7, is devoted to various ways in which we could produce more evidence for universal gravity. Under this picture, not even Newton himself would have taken the law of universal gravity to have been established once and for all at the time the Principia was published.

The status of the third law of motion might be similar. Under this sort of view, the third law of motion has not been established to be applicable to interplanetary interactions by Proposition 7 of Book 3, but the arguments in the scholium to the laws of motion are supposed to be enough to warrant provisionally taking the third law of motion to apply between the planets. That is, the true justification for the application of the third law of motion is not simply an appeal to the legitimacy of induction as an inferential move. We can accept induction as an inferential move in our day-to-day dealings. But the planets are antecedently unfamiliar objects—they are completely unlike anything else we knew of at the time. Newton has done experiments in his laboratory to show that the third law of motion holds, but how could he possibly know that these results will generalize to interactions between the planets? The suggestion here is that the ultimate justification for the inductive leap comes from a research program involving indirect measurements which the application of the third law enables, as we will see in the following sections.

5 Kant on the immediate attraction of matter

I now want to examine a rather interesting comment by Kant on this very issue in Newton. Newton's application of the third law of motion as if there is a direct attraction between the planets makes it seem as if he is not allowing for the possibility of a mediating aether, but in fact Newton makes statements elsewhere that make it seem as if he thinks an aether theory could somehow be compatible with the theory laid out in the Principia. Kant believes this is an inconsistent view:

It is commonly supposed that Newton did not at all find it necessary for his system to assume an immediate attraction of matter, but, with the most rigorous abstinence of pure mathematics, allowed the physicists full freedom to explain the possibility of attraction as they might see fit, without mixing his propositions with their play of hypotheses. But how could he ground the proposition that the universal attraction of bodies, which they exert at equal distances around them, is proportional to the quantity of their matter, if he did not assume that all matter, merely as matter, therefore, and through its essential property, exerts this living force? For although between two bodies, when one attracts the other, whether their matter be similar or not, the mutual approach (in accordance with the law of equality of interaction) must always occur in inverse ratio to the quantity of matter, this law still constitutes only a principle of mechanics, but not of dynamics. That is, it is a law of the motions that follow from attracting forces, not of the proportion of the attractive forces themselves, and it holds for all moving forces in general. (Kant 2004, 514-515)

Kant is saying here that it is often thought that Newton left it open whether the attraction due to universal gravity is a direct, unmediated attraction between planets, or whether this attraction is mediated by an aether. Newton himself says as much in the Optics, as Kant points out a few paragraphs later:

And, even though he says in the advertisement to the second edition of his Optics, 'to show that I do not take gravity for an essential property of bodies, I have added one question concerning its cause', it is clear that the offense taken by his contemporaries, and perhaps even by Newton himself, at the concept of an original attraction set him at variance with himself. (Kant 2004, 515)

But Kant believes that leaving this question open is inconsistent with what Newton actually does in the Principia.

This comment might be taken basically to be making the same point that Cotes was making, that the inductive leap made by assuming that the third law of motion can be applied as if there is a direct, unmediated force acting between the planets is unjustified. But let us look carefully at the context of this comment. Kant first asks how Newton could show that the attractive forces of planets at equal distances around them, that is, what we now think of as the strength of the gravitational field, and what Newton calls the "absolute quantity of attractive force", is proportional to their quantity of matter.

Newton shows this in Proposition 7 of Book 3, and the argument depends upon the problematic Corollary 1 of Proposition 5 pointed out by Cotes, where Newton applies the third law of motion as if the attraction between planets is unmediated. Kant next states something a bit cryptic. The "mutual approach ... must always occur in inverse ratio to the quantity of matter", but, Kant explains, this is only a principle of mechanics, not of dynamics. Here Kant is jumping ahead to his chapter on Mechanics, and particularly the section where he presents his version of the third law of motion: "Third mechanical law. In all communication of motion, action and reaction are always equal to one another." (Kant 2004, 544) This principle, he says, is a law of the motions that follow from attracting forces, not of the proportion of the attracting forces themselves.

As Michael Friedman points out in his commentary on the Metaphysical Foundations (Friedman 2004, 53), Kant is distinguishing here between what we would now call gravitational mass and inertial mass. Inertial mass is a quantity that comes into play in all mechanical interactions when there is a force on a body—it is the m in F = ma. On the other hand, gravitational mass is a quantity that is a measure of the absolute strength of the gravitational field of some body, that is, the mass of the attracting body. Gravitational mass is inferred from the strength of the field. As Kant points out, inertial mass comes into play in non-gravitational interactions, such as those involving magnetic forces, but gravitational mass does not.

But then there is a problem. Kant claims in the chapter on Mechanics that quantity of matter can only be measured mechanically, that is, by the comparison of quantity of motion. Now, this is not a problem when we are talking about things that we have access to mechanically—things like billiard balls, rocks, pieces of wood, and so on. But now suppose we want to measure the quantity of matter in the planets. We can't simply bang them together and watch what happens to determine their relative masses, as we can with billiard balls and rocks. The only way to measure the quantity of matter in the planets is dynamically—through the forces they exert. Kant claims, however, that the measurement of the quantity of matter in the planets is ultimately mechanical:

Nevertheless, original attraction, as the cause of universal gravitation, can still yield a measure of the quantity of matter, and of its substance (as actually happens in the comparison of matters by weighing), even though a dynamical measure—namely, attractive force—seems here to be the basis, rather than the attracting matter's own inherent motion. But since, in the case of this force, the action of a matter with all its parts is exerted immediately on all parts of one another, and hence (at equal distances) is obviously proportional to the aggregate of the parts, the attracting body also thereby imparts to itself a speed of its own inherent motion (by the resistance of the attracted body), which, in like external circumstances, is exactly proportional to the aggregate of its parts; so the estimation here is still in fact mechanical, although only indirectly so. (Kant 2004, 541)

Gravitational attraction yields an indirect measure of the masses of the planets—how? You have to make the assumption that all parts of one body (such as Jupiter) exert a direct force on all parts of another body (such as the sun). Now, all parts of Jupiter are roughly equidistant from the sun, so the total force of Jupiter on the sun is proportional to the mass of Jupiter. But there is an equal and opposite force of the sun on Jupiter, so the attractive force between Jupiter and the sun must be proportional to the mass of Jupiter. Crucially, to make this work, you have to assume that all of the parts of Jupiter act directly and simultaneously on all the parts of the sun. That is, it must be direct, particle-to-particle universal gravity. It is hard to see how one could make an indirect measurement of the mass of a planet through the mediation of some aether. As Kant recognized, the question of whether gravitational mass and inertial mass are equivalent seems to be a serious lacuna in Newton's argument in the Principia.
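In modern notation (my gloss), the "mutual approach in inverse ratio to the quantity of matter" is the mechanical fact that the mutual forces are equal and opposite, so that

    m_J a_J = m_S a_S \;\Rightarrow\; \frac{m_J}{m_S} = \frac{a_S}{a_J}:

the observable accelerations deliver the mass ratio, and this holds for any force whatever. The dynamical claim is the further one that the force itself is proportional to the attracting mass, F = G m_J m_S / r^2, and it is this that requires every part of Jupiter to act immediately on every part of the sun.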

6 Universal versus celestial inverse square gravity

Before I continue with my discussion of Kant, I want to discuss further the issue of particle-to-particle universal gravity. Newton's theory of gravity says that there is an inverse square attractive force between every bit of matter in the universe. An alternative theory of gravity would be for celestial bodies, e.g. the sun, the planets, and their satellites, to have inverse square attractive fields around them, but for there not to be such an attractive force between every single bit of matter. Under this alternative theory of gravity, a bowling ball would be attracted to the earth in accordance with the inverse square law, but there would be no attractive force between two bowling balls.

Let us call this alternative theory of gravity "celestial inverse square gravity".

How can we tell which is the correct theory of gravity? It turns out that since almost all of the results of the Principia concern the attraction between celestial bodies, there is only one part of Book 3 in which Newton considers observations that could potentially tell the difference. This is in Propositions 19 and 20, which concern the shape of the earth and other planets. Centrifugal force due to the rotation of the earth makes the earth bulge out slightly at the equator. This is because the centrifugal force at the equator will have the effect of slightly canceling the surface gravity, while there will be no centrifugal force at the poles. In Proposition 19, Newton uses a method where he imagines there is an L-shaped canal filled with water that goes from a pole of the earth straight down to the center, and then perpendicularly outwards to the equator, and considers what the relative height of the water would be in each leg of the canal, to calculate how much the earth bulges at the equator. He calculates that the earth should be about 1/229 higher at the equator than at the poles, which is about 17 miles (Newton 1999, 823-824).
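A rough modern check on the Proposition 19 figure (a back-of-the-envelope of mine, not Newton's own numbers): the centrifugal acceleration at the equator is a small fraction of surface gravity, and on the standard reconstruction of the canal argument a homogeneous earth bulges by 5/4 of that fraction.

    import math

    R = 6.371e6   # mean radius of the earth, meters
    T = 86164.0   # sidereal day, seconds
    g = 9.81      # surface gravity, m/s^2

    omega = 2.0 * math.pi / T
    m = omega ** 2 * R / g       # centrifugal-to-gravity ratio at the equator
    print(1.0 / m)               # about 289
    print(1.0 / (1.25 * m))      # about 232, near Newton's 1/229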

This method crucially uses results from Book 1, Sections 12 and 13, in which Newton assumes various different force rules for particle-to-particle universal gravity, particularly a linear rule and an inverse square rule, and calculates what the force rule would be if you aggregate the attractive forces due to all the particles in bodies of various shapes. In Section 12, he considers spheres. Among his results in this section, he shows that in the case of inverse square particle-to-particle gravity, for a particle inside a solid sphere of uniform density, the force on the particle varies linearly with distance from the center of the sphere (Proposition 73), and for a particle outside such a sphere, the force will be inverse square with respect to distance from the center of the sphere (Proposition 74). In Section 13, he considers other shapes, and of particular interest for us are Corollaries 2 and 3 to Proposition 91, in which he considers the force of attraction for a particle located externally on the axis of rotation of a spheroid (Corollary 2) and the force on a particle located inside a uniformly dense spheroid (Corollary 3). Newton uses Corollary 3 to get the result for Proposition 19.
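In modern form (my summary), Propositions 73 and 74 together say that under inverse square particle-to-particle attraction, a uniform sphere of mass M and radius R attracts a test particle of mass m with

    F(r) = \frac{G M m}{R^3}\, r \quad (r \le R), \qquad F(r) = \frac{G M m}{r^2} \quad (r > R),

so that, seen from outside, the sphere attracts exactly as if its whole mass were concentrated at its center. It is this aggregation result that licenses moving back and forth between particle-to-particle forces and the body-to-body forces of planetary astronomy.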

This is significant because his contemporary Christiaan Huygens had also been interested in the question of the shape of the earth. In the "Discourse on the Cause of Gravity", published in 1690, two and a half years after the publication of the Principia, he makes his own calculation, but assumes that the force of gravity inside the earth is uniform, and comes up with the earth being 1/578 higher at the equator than at the poles. He comments that Newton came up with a different result, but Newton assumes particle-to-particle attraction, which Huygens cannot accept:

I am not especially in agreement with a Principle that he supposes in this calculation and others, namely, that all the small parts that we can imagine in two or more different bodies attract one another or tend to approach each other mutually. This I could not concede, because I believe I see clearly that the cause of such an attraction is not explicable either by any principle of mechanics or by the laws of motion. Nor am I at all persuaded of the necessity of the mutual attraction of whole bodies… (Huygens 1690, 159 [Bailey and Smith translation, unpublished])

The measurement of the oblateness of the earth could potentially provide a way of determining whether gravity is universal or merely celestial. In Proposition 20, Newton provides a way of measuring the oblateness of the earth. The surface gravity should be slightly weaker at the equator than at the poles, because the surface at the equator is farther away from the center of the earth. Since the period of a pendulum is related to its length and the strength of surface gravity, a pendulum with a period of one second at the North Pole would have to be shortened slightly in order to have a period of one second at the equator. By measuring the comparative lengths of a seconds-pendulum at various latitudes, the strength of surface gravity at these latitudes can be found, and thereby the oblateness of the earth.
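The relation being exploited is the standard pendulum formula (not Newton's own notation): a simple pendulum of length ℓ in surface gravity g has period

    T = 2\pi \sqrt{\ell/g},

so fixing the time of swing, as with a seconds-pendulum, makes the required length directly proportional to the local g; comparing the lengths at two latitudes therefore measures the ratio of the surface gravities there.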

In the first edition of the Principia, Newton calculates in Proposition 20 how much shorter a seconds-pendulum would have to be on the island of Gorée, in Cayenne, and at the equator, in comparison to Paris, and gives the results of 81/1000, 89/1000, and 90/1000 inches respectively. But he further comments that these results assume that the density of the interior of the earth is uniform. If the matter at the center is denser, he comments, the pendulums nearer the equator would have to be shortened a bit more. In fact, he goes on, the French had made such determinations of the lengths of a seconds-pendulum, and since the measured shortenings are slightly greater, he concludes that "the earth will be somewhat higher at the equator than according to the above calculations and denser at the center than in mines near the surface". Notably, he is saying that discrepancies between what his theory predicts and what observations show are not an indication that his theory is wrong, but that one of his assumptions, uniform density inside the earth, is wrong.

In the 1730s, the French natural philosophers Maupertuis and Clairaut sent expeditions to Peru and Lapland in order to determine the shape of the earth. The data from these expeditions appeared to show that the earth is higher at the equator by a ratio of 1/215 or 1/223 according to one set of numbers, or between 1/132 and 1/303 according to another set.23 If the numbers were correct, then the oblateness of the earth was closer to what Newton predicted, but it was not conclusive. Since the oblateness of the earth depends crucially on the density within the earth, however, the question of the actual density distribution within the earth became important. In 1743, Clairaut published his theory of the figure of the earth, in which he derives an equation with which one could test, independently of the earth's density distribution, whether particle-to-particle inverse square gravity is correct, given that the earth is spheroidally symmetric. At the end of the eighteenth century, Laplace worked on the problem of trying to reconcile the data on surface gravity with Newtonian particle-to-particle gravity, and came to the conclusion that Newtonian universal gravity indeed holds, and that the earth's interior density probably decreases from the center to the surface.

Kant was familiar with the work of Maupertuis and Euler, and we in fact know that his library contained a book on the shape of the earth by Maupertuis.24 He was thus aware of the research that was going on at the time on the figure of the earth, and probably of its relevance for establishing universal gravity. The density distribution inside the earth, however, was still an open question. This is a topic to which I will return in chapter 4.

23 Smith 2006 (Leiden lecture), "The Question of Mass in Newton's Law of Gravity".
24 The title of the book was "The Degree of the Meridian Between Paris and Amiens . . . on Which Basis the Shape of the Earth is Established by Comparison of this Degree with That Measured at the Polar Circle". See Kant 1992, 509.


7 Was Newton at variance with himself?

According to Michael Friedman, the application of the third law of motion plays a crucial role in the project of the Metaphysical Foundations, which is a "critical analysis of the conceptual foundations of Newtonian physics." (Friedman 1990, 186)

One fundamental concept that Newton uses in the Principia is the concept of absolute space. This concept is used to set up a distinction between true and apparent motions, which allows Newton to set out the purpose of the Principia: "to determine true motions from their causes, effects, and apparent differences, and, conversely, of how to determine from motions, whether true or apparent, their causes and effects." (Newton 1999, 415)

Kant, in the chapter on Phoronomy in the Metaphysical Foundations, states that he cannot accept this conception of absolute space, because it cannot be an object of experience (Kant 2004, 481). But according to Friedman, Kant has a surrogate conception for Newtonian absolute motion, which is determined by "constructing" it from apparent motions using the methods in the Principia:

My suggestion is that we view Kant, in the Phenomenology in particular, as attempting to turn Newton‘s argument of Book 3 of the Principia on its head. Newton begins with the ideas of absolute space and absolute motion, formulates his laws of motion with respect to this pre-existing spatio-temporal framework, and finally uses the laws of motion to determine the true motions in the solar system from the observable, so far merely relative or apparent, motions in the solar system. Kant, on the other hand, conceives this very same Newtonian argument as a constructive procedure for first defining the concept of true motions. This procedure does not find, discover, or infer the true motions; rather, it alone makes an objective concept of true motion possible in the first place. (Friedman 1992, 142-143)


Exactly how does this "constructive procedure" work? Friedman goes on to explain that for Kant, the laws of motion are "conditions under which alone the concept of true motion have meaning: that is, the true motions are just those that satisfy the laws of motion," and in particular that "one can use Newton's third law of motion, the equality of action and reaction, to characterize a privileged frame of reference for describing the true motions in a system of interacting bodies". This privileged frame of reference is that of the center of gravity of such a system.25 There will thus be a privileged frame of reference for the solar system, which will define the true motions of the solar system, and now if we think even bigger and include the entire Milky Way, there will be a privileged frame of reference for the Milky Way, and so on. Absolute space, then, for Kant, is, in Friedman's words, "the ideal end-point of this constructive procedure".

Friedman goes on to claim that for Kant, the immediacy and universality of gravity are a priori because they are necessary in order to carry out this constructive procedure:

We need to presuppose the immediacy and universality of gravitational attraction in order to develop a rigorous method for comparing the masses of the primary bodies in the solar system. We need such a method, in turn, in order rigorously to determine the center of mass of the solar system. This, in turn, is necessary for rigorously determining a privileged frame of reference and thus for giving objective meaning, in experience, to the distinction between true and apparent (absolute and relative) motion. This, finally, is necessary if matter, as the movable in space (Definition 1 of the Phoronomy: 480.5-10), is to be itself possible as an object of experience. Hence an essential—that is, immediate and universal—attraction is necessary to matter as an object of experience. It follows, for Kant, that the immediacy and universality of gravitational attraction must be viewed—like the laws of motion themselves—as in an important sense a priori. These two properties cannot be straightforwardly obtained from our experience of matter and its motions—by some sort of inductive argument, say—for they are necessarily presupposed in making an objective experience of matter and its motions possible in the first place. (Friedman 1992, 157-158)

25 For example, for a system in which there are two bodies with equal mass, one at rest and the other traveling toward it at the uniform velocity v, there will be one privileged frame of reference, in this case the one defined by the midpoint between the two bodies, in which the two bodies will be traveling with respective velocities of v/2 and -v/2.
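Friedman's footnote example is easily checked (my arithmetic): for two bodies of equal mass m, one at rest and one approaching at velocity v, the center of mass moves at

    v_{\text{cm}} = \frac{m \cdot v + m \cdot 0}{2m} = \frac{v}{2},

so in the frame moving at v/2 the bodies have velocities v/2 and -v/2, and by the third law this frame suffers no acceleration through the interaction.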

Friedman goes on to press this point in the face of the objection that the immediacy and universality of gravitational attraction cannot be a priori, since it is "subject to inductive confirmation of the most straightforward kind: namely, by observations of the planetary perturbations". That is, if gravity acts immediately and universally, we ought to see gravitational interactions between, say, Jupiter and Saturn that would show up as a perturbation of the motions of these planets. Friedman accepts that such observations would provide support for the universality of gravitational attraction. But what is important is how this property of gravity is used to determine the motions of the planetary perturbations:

Kant's point is that the notion of true or absolute motion does not even have objective meaning or content unless we employ Newton's procedure for determining the center of mass of the solar system and hence presuppose that absolute gravitational acceleration is in fact universal. To be sure, observations of planetary perturbations turn out, fortunately, to provide corroboration for this whole scheme; but consider what our situation would have been if such perturbations had not been observed. We would not then merely be in the position of having disconfirmed an empirical hypothesis, the hypothesis that gravitational accelerations are universal; rather, we would be left with no coherent notion of true or absolute motion at all. For the spatio-temporal framework of Newtonian theory—which, for Kant, can alone make such an objective notion of true or absolute motion first possible—would itself lack all objective meaning. (Friedman 1992, 159)

Friedman is claiming that in order for the constructive procedure to work in the first place, we must presuppose the immediacy and universality of gravity. If, for example, it turned out that we did not see perturbations in the motions of Jupiter and Saturn, this would presumably show that gravity does not act immediately and universally after all. If so, then we would not be able to carry out the constructive procedure, and we would not be able to construct a surrogate for absolute space in which we could have an objective notion of true motion.

I accept Friedman's account of the construction of absolute space, but I am not so sure that, if we failed to find planetary perturbations between Jupiter and Saturn, this would render the entire notion of absolute space incoherent. Granted, if we failed to see any perturbations at all between Jupiter and Saturn, we would take this to be showing us that there is something seriously wrong with Newtonian theory. But Kant's comment on Newton says that Newton must assume an immediate gravitational attraction. I think there is an opening for Newton, though. Suppose gravity is mediated by an aether, but it turns out that, given the way the aether mediates the gravitational attraction, we can treat most planetary interactions as if there is an immediate attraction there. Nevertheless, there is the possibility of some leakage of momentum into the aether, and thus that the third law of motion applies only approximately to planet-to-planet interactions. Now, as long as the third law of motion holds approximately between the planets, I do not see why we cannot still have a constructive procedure.

Moreover, the recent work of George Smith (Smith 2002b) has shown how sensitive Newton was to the idea of approximation, since, after all, the Principia was written after Newton realized, when composing the Augmented Version of "De Motu", that the motions of the planets could be ungraspably complex. Smith believes that Newton came up with two ways of dealing with this ungraspable complexity. The first is that "in every case in which he deduces some feature of celestial gravitational forces, he has taken the trouble in Book 1 to prove that the consequent of the 'if-then' proposition licensing the deduction still holds quam proxime so long as the antecedent holds quam proxime" (Smith 2002b, 156). That is, since Newton knows that the planetary motions could never be described exactly, he has made sure that his argument still works even if there is only an approximate match between the deduced phenomena and observation. And so, "thanks to this restriction, unless his laws of motion are seriously wrong, Newton's law of gravity is definitely true at least quam proxime of celestial motions over the century of observations from Tycho to the Principia".

The second method is that "in every case in which Newton deduces some feature of celestial gravitational forces, mathematical results established in Book 1 allow him to identify specific conditions under which the phenomenon from which the deduction is made would hold not merely quam proxime, but exactly" (Smith 2002b, 157). By doing so, the complexities of the true motions may be addressed "in a sequence of progressively more complex idealizations, with systematic deviations from the idealizations at any stage providing the 'phenomena' serving as evidence for the refinement achieved in the next. Such systematic deviations are appropriately called 'second-order phenomena' in so far as they are not observable in their own right, but presuppose the theory".

Let us think about the perturbations between Jupiter and Saturn again. Although Newton does not actually attempt to work out a theory for these interactions in the Principia, they are an example of second-order phenomena that could be used to gather more and more evidence in Newton's ongoing program. Now, let us consider two possibilities. The first possibility is that the interaction between the planets is mediated by an aether and there really is no momentum exchange at all between the planets (such as in the invisible hand example). But the fact that the third law of motion is applied so often in predicting phenomena would mean that we would see huge discrepancies, so we would know very early on that there is something wrong with Newton's program. The second possibility is that the interaction between the planets is immediate only to a very good approximation. This could be due to some "leakage" of momentum into an aether in cases of interplanetary attraction, and in this case, the leakage would probably begin to show up in many different second-order phenomena, not just those between Jupiter and Saturn. We would then have to modify Newton's theory so as to take the effects of the aether into account.

What was actually the case, of course, is that there was no such leakage into the aether. So we got better and better idealizations as we looked at various second-order phenomena for a very long time, over the course of an immensely successful research program, but we eventually found a discrepancy that just did not go away, namely the precession of the perihelion of Mercury. Newton thought his program would guarantee that only the second possibility could happen, since in the first case, his program would presumably not even get off the ground. And this is why he could apply the third law of motion as if gravity were unmediated, while still leaving open the possibility of gravity being mediated by the aether. The application of the third law of motion would still have been approximately correct, and as I explained above, Newton's program was designed with such cases of quam proxime agreement between deduced phenomena and observation in mind. We would still have a coherent notion of the true motions, one as coherent as any other, since no deduction of the phenomena, and no calculation of the center of mass of the solar system, could ever be exact, in the face of its awesome complexity.

So what was Kant complaining about? Going back to Kant's terminology, it seems that one could still carry out the constructive procedure, and as long as we can treat the interactions between the planets as if there were an immediate and universal attraction, we would still be able to construct a surrogate for absolute space. I want to suggest that Kant's complaint is not that a constructive procedure could not be carried out without assuming the immediacy and universality of the gravitational attraction. I think, rather, that Kant's complaint is about what I have called stability in chapter 1. In order for the constructive procedure to result in a surrogate for absolute space, one that is objective, it must have an endpoint. Of course, the endpoint of the procedure might never be reached. But the constructive procedure must be such that if it were carried out indefinitely, it would converge on some point which would be the center of gravity of the universe. We are not guaranteed to have such an endpoint, however, without assuming the immediacy of gravitational attraction.

Why not? Well, let me explain why we are guaranteed to have an endpoint if we assume the immediacy and universality of gravitational attraction. That assumption gives us a way of indirectly measuring the masses of the planets dynamically, as Kant mentions in the chapter on Mechanics. Immediately after the passage on measuring the masses of the planets, Kant has Proposition 2: "First Law of Mechanics. In all changes of corporeal nature the total quantity of matter remains the same, neither increased nor diminished" (Kant 2004, 541). Mass, being the quantity of matter in the planets, is a stable quantity. It cannot be reduced or enlarged without adding or taking away some portion of the planets themselves. In the Remark to Proposition 2, Kant contrasts this with consciousness, which has a "degree, which can be greater or smaller, without any substance at all needing to arise or perish for this purpose" (Kant 2004, 542). Thus a soul could gradually wane away and expire. The masses of the planets do not do that. In fact, they necessarily cannot do that, since they fall under the concept of the movable in space. The quantity of matter in the planets can only be a "plurality of the real external to one another" (Kant 2004, 543, emphasis in the original). Thus, as long as there is no transfer of material between the planets themselves, the center of mass of the universe is guaranteed to be stable, and we are guaranteed to have an endpoint to the constructive procedure.

Now suppose the gravitational attraction is mediated by an aether, and there is some slight leakage of momentum into the aether, but the third law of motion can still be applied between the planets approximately. To be sure, we could still measure the masses of the planets, and calculate the center of mass, although it would only be an approximation. So we could still carry out a constructive procedure. But we cannot guarantee the stability of such a procedure—since we have no idea what is happening to the leaked momentum, it could be that every time we carry out the procedure, it gives us a slightly different value for the center of mass of the planets. This might be fine for practical purposes. But there is no guarantee at all that there will be a stable endpoint.

Of course, the entire story about the constructive procedure is merely Kant's way of reconstructing the metaphysical foundations of Newton's theory. But the point about stability still holds. Newton's program depended on making indirect measurements of the attractive forces of the planets. Now, granted, for almost all practical purposes, as long as the third law of motion applies at least approximately, we can do just fine and come up with approximate calculations of the motions of the planets. But Newton's program would eventually have been in trouble if there were no procedure that would give us a stable measurement of the masses of the planets, and such a procedure would likely have been impossible if the gravitational attraction were mediated by an aether.

8 Conclusion

This ends one portion of this dissertation, where I have considered the history of planetary astronomy, starting from Copernicus, with a particular focus on the role of indirect measurement. I argued that the use of indirect measurement allowed spectacular progress to be made in planetary astronomy. At the same time, we must realize that in order for the use of indirect measurement to be possible, the world must cooperate.

Thus we need to look very carefully at what the conditions are that make indirect measurements possible in the first place, and whether the world actually does satisfy these conditions.

I want to end with an extremely famous passage from the B Preface to Kant's Critique, in which Kant compares his critical project to the "first thoughts of Copernicus", who:

…when he did not make good progress in the explanation of the celestial motions if he assumed that the entire celestial host revolves around the observer, tried to see if he might not have greater success if he made the observer revolve and left the stars at rest. (Bxvi)

This quote is famous because of Kant's comparison of the motion of the earth to the active role of our own minds in the construction of experience. But it also shows that Kant knew of the progress that could be made in planetary astronomy through the assumption that the earth moves. That progress does not come for free, however. It is merely hypothetical until we understand the conditions that make such progress possible, and make sure the world actually fulfills those conditions, as Kant declares a few pages later in a footnote:

The central laws of the motion of the heavenly bodies established with certainty what Copernicus assumed at the beginning only as a hypothesis, and at the same time they proved the invisible force (of Newtonian attraction) that binds the universe, which would have remained forever undiscovered if Copernicus had not ventured, in a manner contradictory to the senses yet true, to seek for the observed movements not in the objects of the heavens but in their observer. In this Preface I propose the transformation in our way of thinking presented in criticism merely as a hypothesis, analogous to that other hypothesis, only in order to draw our notice to the first attempts at such a transformation, which are always hypothetical, even though in the treatise itself it will be proved not hypothetically but rather apodictically from the constitution of our representations of space and time and from the elementary concepts of the understanding. (Bxxii)

I take Kant‘s point here to be that for some problems, namely those problems dealing with systems that are antecedently unfamiliar, hypotheses are not enough—we need at least to attempt to understand the conditions that make the solving of such problems possible in the first place.

-4-

Underdetermination in the Indirect Measurement of the Density Distribution of the Earth’s Interior

1 Introduction

This chapter will focus on the indirect measurement of the distribution of density in the interior of the earth, particularly with an eye towards understanding how to deal with problems of underdetermination. Modern attempts to determine the density distribution of the earth start with Newton‘s suggestion, in Propositions 19 and 20 of the Principia, that measurements of the shape of the earth can give us information about the distribution of density inside the earth, as I recounted briefly in chapter 3. In the subsequent two centuries, work on determining the mean density and the shape of the earth allowed some constraints to be put on the density distribution, but radically different density distributions could fit those constraints.

The invention of the seismograph in 1892 allowed an entirely new way of making indirect measurements of the earth's interior. Since the velocity of seismic waves depends upon the properties of the medium through which they are traveling, you can make inferences about the properties of the interior of the earth through the measurement of travel times of seismic waves. This allowed much more detailed models of the earth's interior to be constructed, but there were still questions as to how uniquely determined such models were. In the 1960's, a new kind of seismic observation, normal mode oscillations, provided further constraints on earth models, but the question still remained as to how different earth models could be and still agree with observations. In the late 1960's, the geophysicists Backus and Gilbert developed a way to answer this question, and this work led to the development of further refined earth models.

We will see that the problem of determining the earth‘s density distribution is a decomposition problem, similar to the one that Kepler had to solve to determine the orbits of Mars and the earth, or, for that matter, the one that Newton had to solve to determine the proper point to which to refer all motions in the solar system. The decomposition problem in this case, however, is much more complicated. The structure itself is more complicated, and the indirect measurement involves a very complex interplay of assumptions about the structure and properties of the earth.

Underdetermination was a constant problem, and we will examine the ways in which geophysicists dealt with this underdetermination.

2 Early estimates of the density distribution

We will first take a look at early estimates of the density distribution, prior to the use of seismological data. These estimates grew out of work on the shape of the earth in the eighteenth century using gravitational potential theory. If we take the earth to be hydrostatic, rotating, and symmetrical about the polar axis, we can use gravitational potential theory to find the shapes of the surfaces of constant density within the earth. In 1743, Clairaut derived a second-order differential equation, now called Clairaut's equation, that gives the ellipticity of a surface of constant density within the earth as a function of radius. In attempting to solve Clairaut's equation, Legendre, in 1793, found a law for the density as a function of radius, ρ(r), that made the equation tractable:

ρ(r) = ρ₀ sin(Ar)/(Ar)    (1)

In 1825, Laplace, expressly trying to find a density law for the earth using plausible physical assumptions, also found the same equation, now known as the Legendre-Laplace density law. Other density laws were proposed in the nineteenth century, including Roche's law (2) in 1848, and Darwin's law (3) in 1884:

ρ(r) = ρ₀(1 − Kr²)    (2)

ρ(r) = A r^(−n)    (3)

In 1879, the British physicists Thomson and Tait proposed a model with two layers of uniform density, an inner layer with a density of ρ1 and a radius of a1, and an outer shell with a density of ρ2.

We can see that all of these density laws are simple and involve at most two independent parameters. The reason is that there was very little data to bring to bear on the problem of determining the internal density—the varying radius of the earth's spheroidal surface, its mean density, and its moment of inertia. In order to be able to come up with unique values for the parameters, the density laws had to contain very few parameters. There were some constraints that came from potential theory, hydrostatics, and thermodynamics, but there were not nearly enough to narrow down the choices, so there were density models that were radically different from each other. In fact, it was shown in a paper in 1949 by Georg Kreisel that radically different density distributions can give rise to the same potential function, and thus it is in principle impossible to uniquely determine the earth's density distribution based on measurements of surface gravity alone.
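To see how tightly just two global data pin down a two-parameter law—and how little more they can do—here is a minimal numerical sketch of my own (not from the historical literature), fitting both parameters of Roche's law to modern values of the earth's mass and moment of inertia; all numerical values are assumptions for illustration:

```python
import numpy as np

# Fit Roche's law, rho(r) = rho0 * (1 - K r^2), to the only two global
# data of the kind available at the time: the earth's mass M and moment
# of inertia I (modern values used here).  Both are linear in (rho0, rho0*K):
#   M = 4*pi * (rho0 * a^3/3 - rho0*K * a^5/5)
#   I = (8*pi/3) * (rho0 * a^5/5 - rho0*K * a^7/7)
a = 6.371e6                      # mean radius of the earth, m
M_obs = 5.972e24                 # mass, kg
I_obs = 8.034e37                 # moment of inertia, kg m^2

A = np.array([
    [4*np.pi*a**3/3,    -4*np.pi*a**5/5],
    [8*np.pi/3*a**5/5,  -8*np.pi/3*a**7/7],
])
rho0, rho0K = np.linalg.solve(A, [M_obs, I_obs])
K = rho0K / rho0
print(f"central density rho0 = {rho0:.0f} kg/m^3")
print(f"surface density      = {rho0*(1 - K*a**2):.0f} kg/m^3")
```

With two data and two free parameters the fit is unique; the underdetermination enters only once we admit that the functional form itself—Roche's, Legendre-Laplace's, Darwin's, or something else entirely—is up for grabs.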

3 Travel times

The invention of the seismograph in 1892 gave us an entirely new way of making indirect measurements of the interior of the earth. Whenever an earthquake occurs, seismic waves are generated at the earthquake source, travel through the interior of the earth, and are detected at seismographic receiving stations located all over the globe. Since the velocity of a seismic wave depends upon the properties of the medium through which it passes, we can determine properties of the interior of the earth as a function of radius from travel times of seismic waves.

I will first explain seismic waves by analogy with waves on a string. By considering the relation between the force due to the tension on a very small segment of a string, and the displacement of this segment, we can derive an equation for the motion of an elastic wave on this string. For a string with uniform density ρ and tension τ, this equation gives a wave velocity of v = (τ/ρ)^(1/2). We can do something similar for elastic waves in a linearly elastic continuum in three dimensions. The relation between the forces, or the stress, on a volume element, and the displacement, or the strain, of that element, is given by the constitutive equation of the medium, which is the equivalent of Hooke's law in three dimensions:

σ_ij = c_ijkl ε_kl    (4)

Both the stress, σ_ij, and the strain, ε_kl, are tensors with 9 components. The constants c_ijkl, called the elastic moduli, make up a tensor with 81 components. Symmetry considerations reduce the number of independent components in the tensor to 21. If it is further assumed that the medium has the same properties in any orientation, in other words, that it is isotropic, we can reduce the number of independent elastic moduli all the way down to two. There are various ways to define the two independent moduli, but one way is in terms of the shear modulus μ, also called the rigidity, and the bulk modulus k, also called the incompressibility.

For a linearly elastic, isotropic medium, we can derive wave equations like we could with the string. Two kinds of waves will in fact be generated. For one kind of wave, the volume elements are displaced in a direction perpendicular to the direction of propagation of the wave, that is, they are transverse. In seismology these kinds of waves are called S waves. Another kind of wave is longitudinal, the volume elements being displaced in a direction parallel to the direction of propagation of the wave. In seismology, these waves are called P waves.

The velocity of a P wave is usually represented by α, while the velocity of an S wave is usually represented by β. The velocities in an isotropic elastic medium are given by the following equations:

α² = (k + 4μ/3)/ρ,    (5)

β² = μ/ρ.    (6)

The velocities of P and S waves will thus depend on the density, rigidity, and incompressibility in the medium through which they are passing. Much like light waves, if seismic waves come upon a boundary where the density of the medium changes abruptly, they will be refracted and reflected according to Snell‘s Law. If the medium has a continuous change in density, then seismic waves will follow a curved path in accordance with the principle of least time.
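As a quick illustrative check on equations (5) and (6)—a sketch of my own, with hypothetical but mantle-like values for the density and moduli:

```python
import numpy as np

# P and S wave speeds from equations (5) and (6), with hypothetical but
# mantle-like values for the density and elastic moduli.
rho = 3300.0    # density, kg/m^3
k = 1.2e11      # bulk modulus (incompressibility), Pa
mu = 7.0e10     # shear modulus (rigidity), Pa

alpha = np.sqrt((k + 4.0*mu/3.0) / rho)   # P velocity, eq. (5)
beta = np.sqrt(mu / rho)                  # S velocity, eq. (6)
print(f"alpha = {alpha/1e3:.1f} km/s, beta = {beta/1e3:.1f} km/s")
# setting mu = 0 (a liquid) makes beta vanish: S waves cannot propagate
```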

When an earthquake occurs, P and S waves are generated and these waves, along with reflected waves, are picked up at receiving stations located all over the globe. Some waves go directly from the source to the receiver, while others are waves that have reflected off the surface of the earth or reflected or refracted at the core-mantle boundary. Since these various waves go along different paths, they are detected at the receiver at different times. Different arrivals are called phases. For example, the P phase is a P wave that goes directly from the source to the receiver along a curved path, the S phase is an S wave that goes directly from the source to the receiver, the PP phase is a P wave that is reflected once at the surface, and the PcS phase is a wave that started out as a P wave and was converted to an S wave upon reflection at the core-mantle boundary.

Some of the first results on the earth's internal structure from the observation of S and P waves had to do with the discovery of discontinuities within the earth, such as between the liquid core and the solid mantle. In 1906, Oldham took a delay in the arrival of P waves at an angular distance of greater than 120 degrees to be an indication that at the center of the earth is a zone where P wave velocities abruptly decrease. This central layer was called the core, while the layer surrounding the core was called the mantle. Due to the abrupt decrease in the velocity of P waves in the core, P waves are bent downwards, creating a "shadow zone" between angular distances of about 100 and 142 degrees. In 1926, the core was shown to be at least partly liquid, based upon evidence on the rigidity of the interior of the earth from the variation of latitude and tidal motions.26 In 1909, the discovery of a sharp increase in P velocity at about 50 km depth was taken to be evidence for another boundary, the Mohorovicic discontinuity (often called the Moho), which separates the crust from the mantle. Later, the detection of P waves in the shadow zone was taken to indicate the presence of an inner core with higher velocity.

If we assume that the earth is spherically symmetric, so that the density, incompressibility, and rigidity are functions only of depth within the earth, then the travel times for various seismic phases between any two points on the earth separated by the same angular distance will be the same. From seismographic observations, global travel time tables can be constructed, although there are some complications.

The exact location and time of each earthquake is not known, so these must be estimated. The earth is not exactly spherical, and contains lateral inhomogeneities.

Conditions at the source of the earthquake and errors in reading seismograms must also be taken into account. The geophysicist Harold Jeffreys, known to philosophers for his work on statistics, developed his theory of statistical inference in response to the problem of determining accurate travel times and the earth's density distribution.

By 1940, Jeffreys and Bullen created travel time tables, the J. B. tables, accurate to within two seconds. Travel time tables developed independently by Gutenberg and Richter between 1934 and 1939 are in good agreement with the J. B. tables, and we can take this to be an indication of the reliability of the tables.

26 A common perception is that the liquidity of the core was established by the inability to detect S waves passing through the core, since S waves cannot propagate through a liquid medium. Brush (1980) shows that this perception is wrong—the liquidity of the core came to be accepted only after the work of Harold Jeffreys on rigidity.

Travel times can then be used to estimate the velocities of S and P waves at various depths within the earth. Since travel times are essentially the integral of the "slowness", or the reciprocal of the velocity, we can determine velocities at various depths by solving an integral equation. In 1939, making certain assumptions such as smoothness, and extrapolating at depths where various complications or uncertainties prevented the determination of velocities, Jeffreys and Bullen came up with distributions for the velocities of P and S waves within the earth. Working independently, from a different set of travel time tables, Gutenberg and Richter came up with their own distribution in 1939. The general reliability of the Jeffreys-Bullen and Gutenberg-Richter time tables is indicated by the agreement between these distributions.
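The forward version of this relation is easy to state. The following sketch—my own illustration, with hypothetical layer velocities and thicknesses, and flat layers rather than a sphere—computes the travel time of a single ray through a layered model using Snell's law; travel-time inversion runs this relation backwards from many such rays:

```python
import numpy as np

# Travel time and distance for a down-going ray in a stack of flat,
# homogeneous layers (hypothetical velocities and thicknesses).  Snell's
# law says the ray parameter p = sin(theta_i)/v_i is constant on the ray.
v = np.array([5.8, 6.5, 8.0, 8.5])       # layer P velocities, km/s
h = np.array([15.0, 20.0, 50.0, 100.0])  # layer thicknesses, km
p = 0.11                                  # ray parameter, s/km

X = T = 0.0
for vi, hi in zip(v, h):
    s = p * vi                # sine of the incidence angle in this layer
    if s >= 1.0:              # the ray turns: it cannot enter this layer
        break
    c = np.sqrt(1.0 - s**2)
    X += hi * s / c           # horizontal distance crossed in this layer
    T += hi / (vi * c)        # time in this layer (integral of slowness)
print(f"one-way distance {X:.1f} km, time {T:.2f} s")
# doubling X and T gives the surface-to-surface values for a ray that is
# reflected at the bottom of the stack
```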

As I have mentioned, the S and P wave velocities are related to the density, rigidity, and incompressibility through equations (5) and (6). With independent evidence about the values of the rigidity and incompressibility, the internal density distribution can be known to within certain bounds.

4 Earth models based on travel times

In the decades following, the first detailed models of density distribution within the earth were developed. Consideration of thermodynamics and chemistry leads to the conclusion that in relatively homogeneous regions of the earth's interior, the main contribution to changes in the earth's density comes from changes in pressure, and can be approximated by a relation known as the Williamson-Adams equation, which relates the density gradient to the density and incompressibility:

dρ/dz = Gmρ/(r²Φ),    (7)

where Φ = k/ρ = α² − 4β²/3,    (8)

z being the depth, m the mass enclosed within radius r, and G the gravitational constant.

A family of earth models, called Type A models, was developed in the 1940's and 50's. These models made the assumption that the earth consists of seven relatively homogeneous layers: region A being the crust, regions B, C, and D comprising the mantle, and regions E, F, and G comprising the core. Making use of the Williamson-Adams equation inside regions B, D, and E, each of which was assumed to be relatively homogeneous, and using certain distribution laws for the other layers, the density distribution was calculated from the velocity distribution inferred from travel time data, with the earth's mass and moment of inertia as constraints.

Another set of earth models, called Type B models, was developed starting in the 1950's. These models grew out of Type A models, but incorporated the k-p hypothesis, which postulates that the incompressibility k and the change in k with pressure, dk/dp, vary smoothly and continuously with pressure at 1000 km depth and below. These models used a generalization of the Williamson-Adams equation, derived from the Murnaghan-Birch equations of state, which applies in inhomogeneous regions. A consideration of equation (5), along with a sharp increase in the travel times for P waves that go through the core, showed that in order to comply with the k-p hypothesis, there had to be a sharp jump in the rigidity there, thus indicating that the inner core is solid. This was the biggest difference between Model A and Model B. There were also differences in the density gradient in the lower part of region D, and in the density in the upper mantle. In 1963, substantial changes in the known value of the moment of inertia of the earth due to satellite measurements allowed some transfer of mass in Type B models from the mantle to the core, and allowed Type A and Type B models to be brought into close agreement.

Let us consider now the assumptions that went into Type A and Type B models. Bullen states that the biggest assumptions in the Type A models were (a) the value of the density at the top of the mantle, (b) the S and P wave velocities, and (c) the applicability of the Williamson-Adams equation in regions B, D, and E (Bullen 1975, 179). Let me briefly consider each of these assumptions in turn. With regard to (a), in order to fit these models, the density at one point within each of the layers had to be known. The value for the density at the top of the mantle was estimated by Williamson and Adams in 1923 to be around 3.3 g/cm³, based upon the known values of the moment of inertia and mass of the earth, as well as supporting evidence from the inferred composition of rock at the top of the mantle. If the inferred composition of the rock is wrong, the density at the top of the mantle could be off, and this would affect the density in regions B and C.

The issues of the S and P wave velocities and the applicability of the Williamson-Adams equation are related to each other, so I will discuss them together. Also, in order to keep this discussion brief, I will talk about these issues only as they pertain to regions B and C. For S and P waves throughout much of the mantle, and for P waves throughout much of the core, the velocity slowly increases as the depth increases, with the rate of increase slowly decreasing. We can call this "normal" velocity variation. In the 1930's and early 40's, normal velocity variation was taken to be an indication of homogeneity of the region. This is important for Model A because the Williamson-Adams equation assumes homogeneity.

In the 1930‘s, the regions B, C, and D (the mantle) were assumed to be homogeneous and the Williamson-Adams equation was applied to them to find their density distributions. Then the moment of inertia of the core was determined by subtracting out the contribution from the mantle. The moment of inertia for the core thus calculated, however, implied that the density in the core would have to decrease significantly as you go deeper, which was implausible. This was taken to mean that some region of the mantle is inhomogeneous, so that the Williamson-Adams equation is not applicable there.

Separately from this, travel time curves prior to 1939 indicated the presence of a sharp change in velocity or velocity gradient at some particular depth within the upper mantle, and thus abnormal velocity variation. Region B was originally defined provisionally by Jeffreys as a 380 km thick region extending from the bottom of the crust to where these sharp changes were thought to occur. Region C, in which velocities were abnormal, was taken to be the region of the mantle that is inhomogeneous. Region B was taken to be a homogeneous region where velocity variation is normal. Evidence from after 1939, using travel time data from different geographical areas, seemed to indicate that the sharp change occurs at depths significantly shallower than the bottom of region B, and that there is significant lateral variation in region B, making the velocity variation actually abnormal throughout region B. There is also some evidence, going back to Gutenberg in 1926, that there is a low velocity layer in region B. Evidence after 1939 also supports the view that region C is inhomogeneous.

Thus, the division of the major layers such as the mantle into sublayers, as well as the locations of the boundaries between them, was somewhat arbitrary, starting with a best guess as to how many layers there are and where their boundaries lie, and with simple assumptions such as homogeneity, and then adjusting the models against new data. Now, let us think of this problem in terms of the picture of complicated, partially inaccessible systems that I presented in chapter 1. We might think of the problem of finding the locations and the properties of each of the layers of the earth as a kind of decomposition problem, much like Kepler's problem of breaking down the motions of the planets into a component due to the motion of the earth and a component due to the motion of Mars. In the case of Kepler, our ability to determine these two motions accurately is interlocked—we saw that better accuracy in the theory of the motion of the earth gave better accuracy for determining the motion of Mars, for example.

In the case of determining the earth's density distribution, too, the problems of determining the properties of the various layers to better accuracy are interlocked, but the interlocking is much more complicated, since there are more layers, and the interlocking is between seismic velocities, homogeneity, locations of the layer boundaries, moment of inertia, and so on. When Kepler adjusted his earth orbit to one with bisected eccentricity, it forced an adjustment onto the distances of the Mars orbit. But in this case, one can make adjustments in many places, by changing the locations of the layer boundaries, by rejecting homogeneity for a layer, adjusting travel times, and so on.

So although by the 1960‘s, the convergence of Type A and Type B models seemed to indicate that we had the internal density distribution correct to a first approximation, one might well have wondered whether a significantly different model could be made to fit all of the observational data available. That is the main question driving a paper from 1968 (Press 1968) that I will briefly touch upon later in this chapter. But before we come to that, we must first examine a completely new way of bringing seismological evidence to bear on the question of the internal density distribution: normal modes of the earth.

5 Normal modes

In the 1960's, it became possible to bring a new type of seismological observation to bear on the problem of determining the internal structure of the earth—observations of the normal modes, or free oscillations, of the earth. As I described above, the motion of an idealized mathematical string with tension τ and density ρ can be approximated by a wave equation, and with the addition of certain boundary conditions (such as fixing the ends of the string), only certain functions are possible solutions to this wave equation. These functions are called normal modes, and the frequencies corresponding to the normal modes are called the normal mode frequencies. Arbitrarily shaped waves traveling along the string can be represented as linear combinations of these normal modes using the theory of Fourier analysis.
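For the string case this is easy to make concrete. A short sketch, with hypothetical tension and density:

```python
import numpy as np

# Normal modes of a string fixed at both ends: f_n = (n/2L)*sqrt(tau/rho),
# with hypothetical tension and linear density.  Any initial shape expands
# in these modes as a Fourier sine series.
L, tau, rho = 1.0, 100.0, 0.01      # length (m), tension (N), density (kg/m)
x = np.linspace(0.0, L, 1001)
dx = x[1] - x[0]

for n in (1, 2, 3):
    print(f"mode {n}: {(n/(2*L))*np.sqrt(tau/rho):.1f} Hz")

u0 = np.minimum(x, L - x)           # a triangular "plucked" initial shape
for n in (1, 2, 3):
    c_n = (2.0/L) * np.sum(u0 * np.sin(n*np.pi*x/L)) * dx
    print(f"amplitude of mode {n}: {c_n:.4f}")
```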

Although the mathematics is a bit messier, essentially the same thing can be done for the vibrations of an idealized three-dimensional object. In particular, we can think of an idealized earth as a spherically symmetric, isotropic, perfectly elastic, non-rotating body. The free vibrations of such a body are a linear combination of the three-dimensional equivalent of the normal modes of the string—in this case, they are complicated three-dimensional functions involving Legendre polynomials. It turns out there are two types of normal modes—torsional modes, which involve no radial movements, and spheroidal modes, which combine radial and transverse motions.

For an idealized earth that is spherically symmetric, isotropic, perfectly elastic, and non-rotating, and given its rigidity, incompressibility, and density as functions of radius, the frequencies of the normal modes can be calculated. This problem is in principle solvable uniquely and exactly, although it may require integration by computational techniques. Thus, observation of the earth‘s normal modes would give us a new set of constraints on the distribution of the density, rigidity, and incompressibility inside the earth, in addition to the travel time observations.

The earth is analogous to a bell—when you strike a bell, it rings at particular frequencies. We can think of making inferences about the shape and density of the bell from the frequencies at which it rings. Likewise, if we can make observations of the normal modes of the earth, we can make inferences about its interior. In order to excite the normal modes of the earth, however, it takes quite a bit of energy. It so happens that a couple of the largest earthquakes ever recorded occurred in the early 1960's—the 1960 Chilean earthquake and the 1964 Alaska earthquake. The normal modes recorded from these earthquakes allowed the development of the first earth models to incorporate normal mode observations. One of the first such models, HB1, was developed by starting with Model A″, a Type A model with modifications taking into account the 1963 satellite measurements of moments of inertia, and then adjusting the parameters to fit normal mode data taken from the Chilean and Alaskan earthquakes.

6 Underdetermination of earth models

The method of construction of earth models I have been discussing involved starting with a first approximation model based on travel times, and then adjusting that model against new data such as later observations of travel times or the satellite measurement of moment of inertia. These models were then further adjusted to take account of normal mode data. This method, in which you start with a model and then tweak it, is fundamentally like the hypothetico-deductive method. You make hypotheses—that the earth has several layers with boundaries at certain depths, and that certain assumptions such as homogeneity are applicable in these layers. You then start with an initial model, and adjust the parameters of the model until the model agrees with observations.

As I have discussed in chapter 1, the major problem with the hypothetico-deductive method is underdetermination. Even if you find a model that fits the data, there is no guarantee that it is uniquely determined. Now, I suggested one way out in chapter 1—what I called decompositional success. And I said above that we can think of the separation of the interior of the earth into layers with specific properties as being a kind of decomposition. In the case of Kepler, we saw that an adjustment in the earth's motions forced adjustments on the motions of Mars, which showed that the Mars orbit must be an oval, and that the line of apsides of the Mars orbit must go through the true sun. Piece by piece, these adjustments forced themselves onto each other. For the earth models, however, there are simply too many places where one can make adjustments, so we do not see the adjustments forcing themselves onto each other in the same way. To be sure, the models seemed to be improving in response to new observations, but not in a way that ruled out the possibility of other models that fit the data just as well.

So there was a real worry among geophysicists in the 1960's that significantly different models might be compatible with the observations. In 1968, Frank Press published a paper in which the aim was to explore the possibility that the assumptions of the earth models at the time were wrong, and that there could be radically different earth models that fit the observations just as well. His method was brute force—using a Monte Carlo procedure to produce, on a computer, 5 million different earth models, each with a different distribution of density, and testing each against the mass of the earth, the moment of inertia, travel times, and normal mode frequencies. The aim was to get an idea of how different earth models could be and still agree with observations:

The problem of establishing uniqueness of earth models derived from geophysical data is still unsolved; for this reason all models proposed for earth have been somewhat in doubt. One advantage of the Monte Carlo method is that it finds a smaller number of solutions, satisfying the geophysical data, from a very large number of possible models. The degree to which the successful solutions agree (or disagree) is a rough measure of the precision to which earth models can be specified with the currently available data. Furthermore, Monte Carlo procedures find models lacking bias stemming from "initial" models or other preconceived notions of earth's structure. (Press 1968, 1218)

Among the 5 million models generated, Press found six that agreed with observations but rejected three of them as being implausible.
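The flavor of this procedure can be conveyed with a toy version—my own sketch, using two-layer models and only two gross earth data rather than Press's multi-parameter models and full data set; all parameter ranges are assumptions:

```python
import numpy as np

# Toy version of Press's Monte Carlo search: draw random two-layer earth
# models (core density and radius, mantle density -- ranges are my own
# assumptions) and keep those matching the observed mass and moment of
# inertia to 1%.  The spread of the survivors is a rough measure of how
# well these two data alone pin the structure down.
rng = np.random.default_rng(0)
a = 6.371e6
M_obs, I_obs = 5.972e24, 8.034e37

n = 500_000
rho_c = rng.uniform(8000.0, 13000.0, n)    # core density, kg/m^3
rho_m = rng.uniform(3500.0, 5500.0, n)     # mantle density, kg/m^3
r_c = rng.uniform(2.5e6, 4.5e6, n)         # core radius, m

M = 4*np.pi/3 * (rho_c*r_c**3 + rho_m*(a**3 - r_c**3))
I = 8*np.pi/15 * (rho_c*r_c**5 + rho_m*(a**5 - r_c**5))
ok = (np.abs(M - M_obs)/M_obs < 0.01) & (np.abs(I - I_obs)/I_obs < 0.01)

print(f"{ok.sum()} of {n} random models fit both data")
if ok.any():
    print(f"their core radii range from {r_c[ok].min()/1e3:.0f} "
          f"to {r_c[ok].max()/1e3:.0f} km")
```

Even in this toy setting, the surviving models spread over a range of core radii—a crude analogue of the disagreement among successful solutions that Press used as his measure of precision.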

Figure 1 (from Press 1968, 1219) shows the three that he found, along with one that he rejected, and a standard model proposed by Birch. Let us refer to the models that agree with observations as observationally acceptable models. We can see that the observationally acceptable models can be quite different from each other, especially in the upper mantle—the models seem to have an unexpectedly complicated structure.27 Press suggests that we can learn things about what the actual density distribution is like by looking at common features of the observationally acceptable models. But, as Press himself points out, we must take care in making such an inductive move, since if he had tested 5 million more, a set of models with radically different features might have been found.

The problem is that the Monte Carlo method that Press uses is a speeded-up, brute-force version of the hypothetico-deductive method. No matter how many observationally acceptable models are found, the possibility remains that there could be more that are radically different. The hypothetico-deductive method does not give us the means for dealing with this underdetermination. What we would really like to know is this: how different can the models be and still be observationally acceptable?

27 According to Bullen (1975, 332), this is because Press's procedure favored overparametrized models.

Figure 1 (Top: density distribution in mantle. Bottom: density distribution in the core.)

7 Backus and Gilbert

The geophysicists George Backus and Freeman Gilbert wrote a seminal series of papers (1967, 1968, 1970) that explicitly consider the underdetermination of the distribution of density in the interior of the earth. In this series of papers, Backus and Gilbert refer to data that are taken to correspond to features of the earth as a whole as "gross earth data". These include the mass of the earth, the moments of inertia, travel times of P and S waves, and frequencies of the normal modes.

One of the problems that the Press work really highlighted is that the number of possible earth models is incredibly large, even if you use plausibility constraints to reduce their number. Just how large? Backus and Gilbert (1970), discussing the Press (1968) work, mention that "the set of earth models under serious discussion in the current literature has at least 40 free parameters. If the selection problem consisted merely in picking the correct value of each parameter from among three possibilities, there would be 3⁴⁰, or 10¹⁹, different earth models to consider." Even with Monte Carlo methods, this is a huge number of earth models—much, much greater than the 5 million tested by Press. And even this number is artificially small, restricting the value of each parameter to three possibilities.

And since we can only observe a relatively small number of gross earth data such as the mass of the earth, moments of inertia, travel times, and normal mode frequencies, we are going to end up with a huge number of observationally acceptable earth models. In their series of papers, Backus and Gilbert were interested in the question of the extent to which that finite number of data can be used to pinpoint the internal structure of the earth:

Human limitations are such that at any given epoch only a finite number of gross earth data will have been measured. This paper is a discussion of the extent to which these finitely many gross data can be used to determine the earth‘s internal structure. (Backus and Gilbert 1967, 247)

If we simply consider the number of observations we can possibly make, and the number of degrees of freedom of the internal structure of the earth, it looks like the latter number is much greater than the former, so we will have a hopeless underdetermination. But just how hopeless is it? Might there be ways of dealing with this underdetermination?

The first step towards answering these questions is to make the underdetermination problem more precise. We must make explicit (1) what is being underdetermined, (2) what is doing the underdetermining, and (3) what, exactly, the relation of underdetermination is. Backus and Gilbert set the problem up as follows:

The problem we shall consider is the following: suppose a non-rotating, spherical, isotropic Earth of radius a has density ρ(r), bulk modulus k(r), and shear modulus μ(r),28 all functions only of r, the radial distance from the center. Suppose a finite number J of gross Earth data γ1, γ2, …, γJ have been measured, and that these data depend only on the functions ρ, k, μ. Given the observed values of γ1, …, γJ, what can be said about the unknown functions ρ, k, μ?29 (Backus and Gilbert 1967, 248)

In terms of the three questions given above: (1) they take the density, bulk modulus, and shear modulus of the Earth as a function of radius to be underdetermined; and (2) they take these functions to be underdetermined by some number of gross earth data, which is finite.

Backus and Gilbert then answer the third question (3) by coming up with a formal mathematical relation between the density, bulk modulus, and shear modulus on the one hand, and the gross Earth data on the other, using the theory of linear differential operators. According to the characterization of the problem given above, the earth is assumed to be completely specifiable by the three functions ρ(r), k(r), and μ(r). Thus, an earth model can be represented by an ordered triple of real-valued functions defined on [0, 1] (normalizing for the radius of the earth), that is, m = (ρ, k, μ). Linear combinations of earth models can be defined in a straightforward way: am1 + bm2 = (aρ1+bρ2, ak1+bk2, aμ1+bμ2). We can then think of earth models as points in an infinite dimensional linear space M, the space of all possible earth models (see Figure 2). A natural inner product30 can be defined on this space, and it can be completed to form a Hilbert space (the space L2[a,b]).

28 Recall that bulk modulus is a measure of incompressibility, and shear modulus is a measure of rigidity. 29 Some of the notation has been changed for consistency with other parts of this paper.

Figure 2

In order to remove an ambiguity in the term 'gross earth data', we distinguish between 'gross earth data' and 'gross earth functionals'. Gross earth functionals are simply real-valued functions g1, g2, …, gn on the space M of all possible earth models (see Figure 3). There are gross earth functionals corresponding to the earth's mass, its moment of inertia, its normal mode frequencies, and so on. The actually observed values of these gross earth functionals, which we designate by γ1, γ2, …, γn, and which are taken to be the values for the 'real' earth, are then called 'gross earth data'.

30 The inner product of m1 and m2 is defined to be the integral from r = 0 to 1 of ρ1ρ2 + k1k2 + μ1μ2. The choice of inner product is somewhat arbitrary; what is important at this point is that a Hilbert space can be constructed.

Recall, for example, that the normal mode frequencies depend entirely on the internal structure of the earth. So the frequency of each normal mode can be taken to be a function on the space M of possible earth models. That is, for each normal mode i, there is a function gi that associates, to each point in this space M, a real number representing the frequency of that particular normal mode. And each point in this space M of course represents one possible internal structure of the earth, or one possible earth model. The actually observed value of the frequency of the ith normal mode is given by γi.

Figure 3

We now have a precise way of stating the underdetermination. Given a certain set of observed gross earth data γ1, γ2, …, γn, can we pinpoint a single model mE which we take to correspond to the 'real' earth in the space M (see Figure 4)?

Figure 4

Here we make the assumption that the observed gross earth functionals are linear functions.31 This is not the case for the normal mode frequencies, and in fact, most gross earth functionals are nonlinear functions. There are some linear gross earth functionals, however, like the quality factors of the normal modes. I am doing the linear case first because the linear case must be understood in order to understand the nonlinear case.

According to the Riesz representation theorem, every bounded linear functional L(f) on a Hilbert space may be written as the inner product ⟨l, f⟩, where l is a point in the Hilbert space uniquely determined by the functional L32 (Parker 1994, 31-32). Thus, every gross earth functional gi(m) has associated with it a unique point Gi in the space M, such that

gi(m) = ⟨Gi, m⟩.    (9)

31 A function f(r) is linear if f(ar1 + br2) = af(r1) + bf(r2) for any real numbers a and b and any values r1 and r2 of the variable r. 32 My explanation here follows that given in Parker 1994, since I find it much cleaner, but I have made the notation and terminology consistent with Backus and Gilbert.

Backus and Gilbert call the point Gi associated with each gross earth functional the 'data kernel' of that gross earth functional (Parker (1994) calls this the 'representer').

Take a particular earth model m, which is a point in the space M. The value of a gross earth functional gi for that particular earth model m is given by the inner product of m and the data kernel Gi of that gross earth functional. Intuitively, what this means is that for each normal mode, there‘s a vector G in M such that the frequency of that normal mode for each earth model m is given by the length of the projection of m onto G (see Figure 5).

Figure 5

Now, suppose we are looking for a point mE in the space of possible models, which we take to represent the 'real' earth. Suppose we only have a finite number of gross earth data, say a thousand of them. We are in effect looking at the projection of mE onto the subspace A of M spanned by the data kernels G1, …, G1000 of the gross earth data (see Figure 6). But this space would only have a thousand dimensions, whereas the space M is infinite dimensional. The upshot is simply that there is an infinite number of models that will agree with the data just as well as mE, and hence are observationally indistinguishable from mE. In fact, the space of such observationally acceptable models is infinite dimensional. The underdetermination looks pretty bad, to say the least!

Figure 6
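This geometry can be made concrete in a discretized sketch—my own illustration, in which the "kernels" are hypothetical sine functions rather than real seismological functionals:

```python
import numpy as np

# Discretized sketch of the linear picture: an earth model is a vector of
# values on a radial grid, and each linear gross earth functional is an
# inner product with a data kernel (one row of G).  All kernels here are
# hypothetical sine functions; the point is the geometry, not the physics.
rng = np.random.default_rng(1)
n_grid, n_data = 200, 10
r = np.linspace(0.0, 1.0, n_grid)

G = np.vstack([np.sin((k + 1) * np.pi * r) for k in range(n_data)])
m_true = 13.0 - 10.6 * r**2          # a hypothetical 'real earth'
d = G @ m_true                       # the finitely many gross earth data

# minimum-norm model fitting the data: the projection of m_true onto span{G_i}
m_min = G.T @ np.linalg.solve(G @ G.T, d)

# anything orthogonal to every kernel can be added without changing the data
v = rng.standard_normal(n_grid)
v -= G.T @ np.linalg.solve(G @ G.T, G @ v)   # remove the part seen by the data
m_alt = m_min + 5.0 * v

print(np.allclose(G @ m_alt, d))     # True: observationally indistinguishable
print(np.max(np.abs(m_alt - m_min))) # yet the two models differ substantially
```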

8 The Resolution Method

But maybe it's not so bad. This is actually something we already knew about, although we did not have a precise characterization of this underdetermination. For the internal structure of the Earth is something that we postulated at the beginning to have infinite degrees of freedom—the way we defined ρ(r), k(r), and μ(r), these can be arbitrary functions of radius. And therefore we cannot hope to pin down a unique internal structure given a finite number, no matter how large, of gross Earth data.

A more pertinent question for the geophysicist is whether there are observationally equivalent models that are significantly different from a geophysical standpoint, as Backus and Gilbert point out:

With only finitely many gross data we cannot expect to resolve details of arbitrarily small vertical scale; our vertical resolution is finite. This remark is sufficiently tautological as to be without geophysical interest. A geophysically more interesting question is whether there is any other source of non-uniqueness besides the finite resolving power inherent in a finite set of gross Earth data. We shall see in general there is. (Backus and Gilbert 1967, 251)

An infinite number of models will be observationally acceptable. An infinite number of observationally acceptable models will differ from each other only in their fine-scale structure, but this is not important—if we are doing geophysics, we do not care about models that differ only on the scale of millimeters. The important question is, could there be observationally acceptable models that differ from each other in geophysically significant ways? And if there are, in what way do they differ?

A further thought is that if all of the observationally acceptable models have certain geophysically significant features in common, then we can conclude that the 'real' earth also has those features.

If we are fortunate or shrewd in our choice of which gross data to measure, then all the different [observationally acceptable] earth models may share some common properties. For example, they may all have a low-velocity zone in the upper mantle; or they may all become essentially the same when we take running averages of their ρ, k, and μ over some fixed depth interval H. In the first example, we can definitely assert that the earth has a low-velocity zone in the upper mantle. In the second example, we can claim to know ρ, k, and μ as functions of radius r, except for unresolved details whose vertical length scale is H or less. (Backus and Gilbert 1967, 249)

This will allow us to take what looks like a hopeless underdetermination problem, and draw conclusions about the real world based on it. But how could we go about doing this? Suppose we happen to find an observationally acceptable Earth model, and it has certain features, such as a low-velocity zone in the upper mantle. Is this a feature that is particular to this one model, or is it a feature that all observationally acceptable Earth models share? One way of answering this question is simply to generate a large number of observationally acceptable Earth models and see if they all have this feature, as with the Monte Carlo procedure in Press (1968). But the worry, as I have suggested, is that the simple fact that all the models you have generated so far have this feature does not mean that all observationally acceptable models have this feature.

Now, another way of answering this question is to find out how much resolving power you have. Suppose your model has a feature near a point r0. If that feature is smaller than the resolving power of the data, then you can conclude that this feature is an artifact of the model you happened to find. In their 1968 paper, Backus and Gilbert provide a method for finding the vertical resolving power of a given set of gross earth data. The method they developed is called the "Backus-Gilbert resolution method".

Suppose we are interested in how much we can know about the values of ρ(r), k(r), and μ(r) for the 'real' earth, given a set of gross earth data g1, …, gn. As I mentioned above, each gross earth functional gi can be written as the inner product of its data kernel Gi and a point in M representing an earth model. Thus, the gross earth data gi can be taken to be the lengths of the projections of the point representing the 'real' earth mE onto the data kernels Gi. Now we ask how much vertical resolution we can expect to get at some point r0. We will only get as much resolution as can be discriminated by the data kernels Gi. To get some idea of the resolution, then, we try to construct, out of the data kernels Gi, a function that gets as close as possible to a delta function at r0. This function is called the resolving kernel. The criterion for delta-ness is somewhat arbitrary—Backus and Gilbert choose one that is convenient numerically.

The aim of the procedure is, as I mentioned, to give the geophysicist some idea of what the vertical resolution around some point is, so the delta-ness criterion can be decided upon on a practical basis. Any features in an observationally acceptable earth model in the vicinity of r0 that are smaller than the width of the resolving kernel can be taken to be artifacts of the model. The resolving kernel can then be used to generate models in which these artifacts are smoothed out. Thus the Backus-Gilbert method can be used to determine a model that contains all the resolvable features but does not include model artifacts.
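A sketch of the construction, under the same discretization as before; the spread criterion below is a standard one, but the kernels and numbers are hypothetical, illustrative only:

```python
import numpy as np

# Sketch of a Backus-Gilbert resolving kernel at target radius r0: among
# unit-area linear combinations K(r) = sum_i a_i G_i(r) of the data kernels,
# pick the one minimizing the spread integral of (r - r0)^2 K(r)^2 -- one
# convenient "delta-ness" criterion among several.  Kernels are hypothetical.
n_grid, n_data, r0 = 400, 12, 0.55
r = np.linspace(0.0, 1.0, n_grid)
w = np.full(n_grid, 1.0 / n_grid)               # crude quadrature weights

G = np.vstack([np.sin((k + 1) * np.pi * r) for k in range(n_data)])

S = (G * ((r - r0)**2 * w)) @ G.T               # spread matrix S_ij
u = G @ w                                        # areas of the kernels
a = np.linalg.solve(S, u)
a /= u @ a                                       # normalize: area of K is 1
K = a @ G

spread = 12.0 * np.sum((r - r0)**2 * K**2 * w)   # Backus-Gilbert spread
print(f"resolving length near r0 = {r0}: about {spread:.3f} earth radii")
```

The spread measure is normalized so that for a boxcar-shaped kernel it equals the boxcar's width; features in a model narrower than this length near r0 are below the resolution of the data.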

9 The nonlinear case

I said above that most gross Earth functionals are actually nonlinear, including the frequencies of the normal modes. For nonlinear gross Earth functionals, you must first find an observationally acceptable model m0. You then make the assumption that all other observationally acceptable models are sufficiently similar to this model that a linear approximation holds (that is, we assume that the gross Earth functionals are Fréchet differentiable), namely, that fi = gi(m1) − gi(m0) can be approximated by a linear function on the space M. You can then proceed with the resolution method exactly as in the linear case, using fi instead of gi. You are, in effect, exploring the part of the space of possible models M that is near the reference model m0. There is no way, however, of telling whether there are radically different models that are so far away that a linear approximation won't work.

The upshot is that if the observational data consists of linear functionals, one can use the resolution method in order to tell whether there are observationally acceptable models that differ significantly from each other, or whether all observationally acceptable models differ only in fine-scale detail. If the observational data consists of nonlinear functionals, however, as is the case with normal mode frequencies, you have to make the assumption that there are no models that are so far away that a linear approximation won‘t work, in order to use the resolution method. The only way of determining whether there are such models is to attempt to construct such models and see if they agree with observations. Given the appropriate computational power, one can systematically construct models and test them using Monte Carlo methods, but since the space of possible models is infinite dimensional, there are limitations to such methods, and drawing epistemological conclusions from them can be rather risky.
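To make the linearized procedure concrete, here is a sketch of a single linearized step—my own illustration, with hypothetical quadratic functionals standing in for the genuinely nonlinear normal mode frequencies:

```python
import numpy as np

# One linearized step for nonlinear functionals: near a reference model m0,
# replace each g_i by a finite-difference (Frechet) derivative and work with
# the differences f_i = gamma_i - g_i(m0) as in the linear case.  The g_i
# here are hypothetical quadratic averages, stand-ins for mode frequencies.
n_grid, n_data = 100, 6
r = np.linspace(0.0, 1.0, n_grid)

def g(m):
    return np.array([np.mean(np.sin((k + 1)*np.pi*r) * m**2)
                     for k in range(n_data)])

m0 = 13.0 - 10.6*r**2                    # reference (acceptable) model
m_true = m0 + 0.3*np.sin(3*np.pi*r)      # 'real earth', close to m0
gamma = g(m_true)                        # the observed gross earth data
f = gamma - g(m0)

eps, J = 1e-6, np.zeros((n_data, n_grid))
for j in range(n_grid):                  # rows of J are the linearized kernels
    dm = np.zeros(n_grid)
    dm[j] = eps
    J[:, j] = (g(m0 + dm) - g(m0)) / eps

dm_min = J.T @ np.linalg.solve(J @ J.T, f)   # minimum-norm linearized update
resid = np.linalg.norm(g(m0 + dm_min) - gamma) / np.linalg.norm(f)
print(f"relative data misfit after one step: {resid:.3f}")  # well below 1
```

The step succeeds here only because the "real earth" was chosen close to m0; a model far from m0 would defeat the linear approximation, which is exactly the residual worry noted above.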

10 Models using normal mode data and underdetermination

One of the outcomes of Backus and Gilbert's work was the development of mathematical techniques for "inverting" the observational data, which were then used by geophysicists working on earth density models to develop new ones incorporating normal mode frequencies.33 The resolution method seemingly provided a way of determining the uncertainty in the models, and it looked like the uncertainty could be steadily decreased by increasing the number of gross earth data being inverted.

Meanwhile, it was found by them and others working at the time that earth models obtained by inverting travel time data and earth models obtained by inverting normal mode data had significant discrepancies. The difference was postulated to arise because the travel time data were biased, all receiving stations being located on continental platforms. The travel times for each of the phases were thought to be skewed by some set amount for each phase. This was called the "baseline" problem at the time.

Gilbert, Dziewonski, and Brune (1973) used the Backus-Gilbert resolution method in trying to determine the amount of bias in the travel times of certain phases. They inverted 376 gross earth data, consisting only of normal mode frequencies and the mass and moment of inertia of the earth, in two stages. From the model they ended up with, they calculated the expected travel times. The resolution method gave them a way of calculating the uncertainty in these travel times. They found, for example, that the travel time of the PKIKP phase is 1213 seconds, with an uncertainty of 0.2 seconds. Travel time tables showed the travel time as 1211 seconds, so the correction to the travel time for the PKIKP phase was taken to be 2 seconds. Later, Kanamori and Anderson (1977) showed that most of the discrepancy between the normal mode data and the travel time data was due not to the locations of the receiving stations, but to physical dispersion caused by the anelasticity of the medium.

33 At around the same time, scientists in other fields such as computational imaging and applied mathematics became interested in similar kinds of problems, and the increasing availability of computers that could be used to solve such problems led to the development of the cross-disciplinary field of inverse problem theory.

I want to make a remark about underdetermination in the light of this result.

Underdetermination is often treated in a very general manner in the philosophical literature. It is usually simply taken to be a situation where more than one set of hypotheses agrees with observations. But such a situation can arise in more than one way. The Backus and Gilbert work highlights this fact quite nicely. What Backus and Gilbert showed was that, given a certain method of inverting a set of gross earth data, more than one model—in fact, an infinite number of models—could be made compatible with the data. This was a simple consequence of the mathematics involved. We might call this mathematical underdetermination.
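The mathematical point can be stated in one line. With finitely many data and an infinite-dimensional model space, there always exist nonzero perturbations of the model that every data kernel annihilates; in the notation used above for the linear case:

\[
\int G_i(r)\,\delta m(r)\,dr = 0 \qquad (i = 1, \ldots, N),
\]

and adding any such perturbation to an observationally acceptable model produces another model with exactly the same predicted data.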

But there is a different way for underdetermination to occur—you might find a different set of models compatible with the data if you use a different set of assumptions. For example, you might make the assumption that the medium is elastic, or you might assume that it is anelastic. We might call this theoretical underdetermination.

The Backus-Gilbert resolution method gives you information about the uncertainty in the density distribution provided that you have made the correct assumptions in your theory about the model. If you haven't made the correct assumptions, then the resolution calculations might be wrong. Notable assumptions that might go into inversion calculations are, in addition to elasticity, the assumption that linear approximations will hold for the normal mode calculation, the isotropy of the medium, and the lateral homogeneity of the earth. So in fact, even though the inversion method gives you a good idea of the resolution you are getting on the density distribution, you still have to be careful that the assumptions you are making are correct. One way to make sure is to have cross-checks on the inversion method.

That is one of the ideas behind Gilbert and Dziewonski (1975), in which, starting with two initial models, inversion was done with 1064 normal mode data and the mass and moment of inertia of the earth, to get the models 1066A and 1066B.

Their aim was to use the models in two interdependent inverse problems. As we have been discussing, the normal mode frequencies can be inverted to determine the earth's density distribution. In addition, normal mode amplitudes can be inverted to determine the source mechanism of an earthquake—the particular motions that occur at its source.

Gilbert and Dziewonski suggest a "cyclic or iterative process of successive refinement of structure and mechanism", where the normal mode frequencies can be used to refine our estimate of the structure of the earth, and this structure can be used in the inversion of normal mode amplitudes to determine the source mechanism. The source mechanism can then be used to refine the normal mode data, from which better estimates of the structure of the earth can be determined, and so on.
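The logical structure of this cycle can be shown in a short schematic; the routine names are placeholders of mine for whatever concrete inversion procedures are in use, not references to actual code.

```python
def refine_structure_and_mechanism(structure, mechanism,
                                   invert_frequencies, invert_amplitudes,
                                   n_cycles=5):
    """Schematic of the cyclic refinement: frequencies constrain structure,
    amplitudes (given a structure) constrain the source mechanism, and the
    improved mechanism in turn refines the normal mode data."""
    for _ in range(n_cycles):
        structure = invert_frequencies(structure, mechanism)
        mechanism = invert_amplitudes(structure, mechanism)
    return structure, mechanism
```

Each pass tightens both unknowns at once, and, as the next paragraph argues, agreement under iteration is itself evidence that the decomposition is sound.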

This is like the way in which Kepler successively refined his measurements of the earth's orbit and the Mars orbit—an improved earth orbit allowed him to improve his Mars orbit. Success in playing these indirect measurements off one another showed that the particular decomposition was correct, and that the assumptions he was making were at least approximately correct. Likewise, success in this process of successive refinement of earth structure and earthquake mechanism could also show that the assumptions being made, such as the linear approximation for the normal modes, are at least approximately correct.

By 1981, earth models had been refined to a point where it was decided that a standard model for the structure of the earth should be constructed—the preliminary reference earth model, or PREM (Dziewonski and Anderson, 1981). The aim was to provide a standard earth model that could be used in the study of precession and nutation, the tides, and free oscillations. For this model, about 1000 normal mode frequencies, 500 travel time observations, 100 normal mode Q values,34 and the mass and moment of inertia of the earth were inverted. Notably, by then earth models had improved to the point where it was found that the assumption of isotropy had to be dropped for the upper mantle. Recall that the assumption of isotropy allowed the tensor of elastic moduli to be reduced to two moduli, the incompressibility k and the rigidity μ. In PREM, the assumption of isotropy was loosened, and five elastic constants were used.
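In standard continuum-mechanics notation (this is textbook material, e.g., Malvern 1969, not a quotation from PREM), isotropy reduces the stress-strain relation to

\[
\sigma_{ij} = \Bigl(k - \tfrac{2}{3}\mu\Bigr)\,\delta_{ij}\,\varepsilon_{kk} + 2\mu\,\varepsilon_{ij},
\]

so that only the incompressibility k and the rigidity μ appear; in the radially anisotropic (transversely isotropic) upper mantle of PREM, five independent elastic constants take their place.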

11 Conclusion: thinking about underdetermination in context

This look at the history of the indirect measurement of the earth's internal density distribution shows that we must re-think the usual way of thinking about underdetermination in philosophy. In the first place, as I have just mentioned, there can be more than one kind of underdetermination. Ways of dealing with one kind of underdetermination might not work for another kind. For example, the Backus-Gilbert resolution method gave geophysicists a way of determining just how bad the mathematical underdetermination might be, but gave no indication whatsoever of whether there might be theoretical underdetermination. Second, the hypothetico-deductive view is inadequate for thinking about the determination of the density distribution of the earth's interior. It is not simply a matter of coming up with hypotheses and testing them against data. Often, scientists are carrying out decomposition, and playing interdependent measurements off of one another.

34 Q values are a measure of the attenuation of the normal mode oscillations over time.

A final point: philosophers must do a better job of thinking about epistemological issues in science, such as underdetermination, in the context of ongoing research. Simply saying that multiple hypotheses are consistent with observations tells us one thing: that we don't know whether a theory is true. It tells us nothing about what we do know. In order to understand better what we do know, we need more studies of underdetermination in context, where we examine how scientists dealt with problems of underdetermination.

Another field in which underdetermination problems are important is neuroimaging. A study of the history and current practice of neuroimaging might give us more insight into ways of dealing with underdetermination. I will leave such a study for a future project.


Epilogue

The original plan for this dissertation was to have a total of six chapters. In addition to the four chapters that currently comprise the dissertation, I had planned a fifth chapter examining contemporary neuroimaging in terms of indirect measurement. A sixth chapter was to examine the relation between the views about underdetermination and indirect measurement developed in this dissertation and those in the philosophical literature. Research for this dissertation took much longer than I had anticipated, and my acceptance of an academic job compelled me to finish work on it rather abruptly. For this reason, I have been forced to leave out certain topics that I believe would belong in a more thorough treatment of the subjects covered here. I will briefly comment on these topics, and on avenues for future research, in this epilogue.

Since a major theme of this dissertation is indirect measurement, this project would benefit greatly from an examination of the large body of literature on the topic of measurement, in particular the classic treatments by Campbell (1957) and Ellis (1966), and the philosophical and mathematical work on the foundations of measurement by Krantz, Suppes, Luce, and Tversky (1971, 1989, 1990). When I began this dissertation, my main aim was to try to understand how scientists actually deal with cases of underdetermination, especially when they are trying to acquire knowledge about partially inaccessible systems. Indirect measurement became a central issue only later, as I began to realize that the process of separating out and measuring properties of such systems provides a way of dealing with this underdetermination. As a result, I have been working with a preliminary notion of indirect measurement, developed independently of the general literature on measurement. I believe a detailed survey of this literature could provide a set of sophisticated and powerful concepts and tools with which to think about measurement in the investigation of partially inaccessible systems. For example, indirect measurement as I have been describing it is similar to the classic notion of derived measurement, which can be found in Campbell (1957) and Ellis (1966). Locating my own ideas with respect to the existing literature on measurement is, I feel, likely to be fruitful for the further development of my project. In addition to the literature on measurement, there is a related literature on scientific methodology, written by practicing scientists, such as Wilson (1952), that I believe could also provide useful perspectives from which to think about indirect measurement.

In all actual cases where we try to make inferences about the inaccessible properties of particular systems, we ought to expect some amount of error in the observations of the accessible properties. Dealing with this error is an essential part of the procedure of indirect measurement, and a vast literature on statistics and error analysis developed in the twentieth century, much of it in response to problems of indirect measurement. As I mentioned in passing in Chapter 4, the famous work of Harold Jeffreys in statistics (Jeffreys 1961, 1973) was developed in conjunction with his work on travel times of body waves. Normal mode inversion is also immensely complicated by the possibility of error in the determination of the normal mode frequencies, which are inverted to obtain density distributions. Backus and Gilbert (1970) is a generalization of the resolution method developed in earlier papers, now taking into account the potential for observational error. Methods of inversion have since been developed that are Bayesian (e.g., Tarantola 2005) or frequentist (e.g., Parker 1994) in their theoretical orientation. An extensive examination of the use of statistical tools and error analysis in the investigation of partially inaccessible systems will thus be an important part of future work on this project.
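To indicate what the Bayesian formulation looks like in its simplest special case (a linear forward problem with Gaussian data errors and a Gaussian prior), here is a minimal sketch; the function name and the restriction to the linear-Gaussian case are my own simplifications, not a rendering of Tarantola's general theory.

```python
import numpy as np

def gaussian_linear_inverse(G, d_obs, Cd, m_prior, Cm):
    """Posterior mean and covariance for a linear forward problem d = G m,
    with data covariance Cd and a Gaussian prior (mean m_prior,
    covariance Cm) on the model."""
    Cd_inv = np.linalg.inv(Cd)
    # Posterior covariance: (G' Cd^-1 G + Cm^-1)^-1
    Cm_post = np.linalg.inv(G.T @ Cd_inv @ G + np.linalg.inv(Cm))
    # Posterior mean shifts the prior toward the observed data
    m_post = m_prior + Cm_post @ G.T @ Cd_inv @ (d_obs - G @ m_prior)
    return m_post, Cm_post
```

The posterior covariance here plays a role analogous to the Backus-Gilbert resolution estimate: it quantifies how much the data, given the assumed error structure, actually constrain the model.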

The notion of stability that I have developed in this dissertation plays an important role in the ultimate justification of assumptions that are made in order to enable indirect measurement. As I have defined it in this dissertation, stability is a feature of the properties of the objects we are investigating—a property is stable if it does not change significantly over time. A task that I leave for the future is spelling out exactly what the relation is between stability and a similar notion—well-behavedness of measurements. My notion of stability is somewhat worrisome in that it makes reference to features of objects that are "out there" in the world, and which cannot be directly accessed. Could it be replaced by the notion of well-behavedness, which is a feature of measurement outcomes and is thus accessible to us? As a preliminary task, we must of course first make clear exactly what well-behavedness of measurements is. Another notion that is related to both stability and well-behavedness is invariance. An advantage of utilizing the notion of invariance is that it is well-defined, since a formal theory of invariance already exists (see Suppes 2001). Thus, one direction in which I might take this project in the future would be to couch my ideas about indirect measurement, at least the theoretical parts, in more formal terms. This might allow me to define more clearly what stability is, or to replace it with a better-defined notion like invariance.

I would finally like to end with some comments on what I had originally planned to put in Chapters 5 and 6. Chapter 5 of my dissertation was to be about neuroimaging, which I think of as another example of the investigation of a partially inaccessible system. Toward that end, I have been doing research on two different methods of neuroimaging—functional magnetic resonance imaging (fMRI) and electroencephalography/magnetoencephalography (EEG/MEG). Both fMRI and EEG/MEG are often said to involve inverse problems, and neuroimaging has incorporated some techniques that were originally developed for geophysical applications.

There are, however, some major differences between neuroimaging and the problems I have covered in this dissertation. First, the brain is manipulable to an extent that is not possible with the solar system or the interior of the earth. The basic method of neuroimaging is to manipulate the brain in certain ways and then to measure its response.

Second, the human brain is not an individual, like the solar system or the interior of the earth, but a class of individuals. This fact has implications for justification. I have claimed in this dissertation that stability is important for the ultimate justification of indirect measurements, but can we get stable measurements when what we are measuring are properties of various individual brains that are significantly different from each other?

The plan for Chapter 6 was to compare my own views about underdetermination with those in the existing literature, starting with Duhem (1954) and going through more recent work such as Laudan and Leplin (1991) and Stanford (2006). My general views on underdetermination are that (1) we simply need more detailed studies of what scientists have actually done in response to situations of underdetermination before we really understand the epistemological implications, and (2) there are really several different kinds of underdetermination. With regard to (2), I mentioned two different kinds of underdetermination, mathematical and theoretical, in Chapter 4. I am inclined to think that we could make further distinctions between kinds of underdetermination, each of which must be dealt with in different ways.

This ends my look at some directions in which I want to take this project, but it by no means exhausts the possibilities for future research. The topics covered in this dissertation are, I think, incredibly rich and deserve much more attention than they have hitherto received from philosophers.


Bibliography

Backus, G. E. and Gilbert, J. F. (1967). Numerical Applications of a Formalism for Geophysical Inverse Problems. Geophysical Journal of the Royal Astronomical Society, 13: 247-276.

Backus, G. and Gilbert, F. (1968). The Resolving Power of Gross Earth Data. Geophysical Journal of the Royal Astronomical Society, 16: 169-205.

Backus, G. and Gilbert, F. (1970). Uniqueness in the Inversion of Inaccurate Gross Earth Data. Philosophical Transactions of the Royal Society of London, Series A, Mathematical and Physical Sciences, Vol. 266, No. 1173: 123-192.

Brush, S. G. (1980). Discovery of the Earth's core. American Journal of Physics, 48(9): 705-724.

Buland, R. (1981). Free Oscillations of the Earth. Annual Review of Earth and Planetary Sciences, 9: 385-413.

Bullen, K. E. (1975). The Earth’s Density. London: Halsted Press.

Campbell, N. (1920). Physics: The Elements. Cambridge University Press.

Chang, H. (2004). Inventing Temperature: Measurement and Scientific Progress. Oxford University Press.


Copernicus, N. (1978). On the Revolutions. (E. Rosen, Trans.). Baltimore: The Johns Hopkins University Press.

Courant, R. and Hilbert, D. (1953). Methods of Mathematical Physics, 1st English ed., Vol. 1 and 2. New York: Interscience Publishers.

Duhem, P. (1954). The Aim and Structure of Physical Theory. (P. Wiener, Trans.). Princeton: Princeton University Press.

Duhem, P. (1969). To Save the Phenomena. (E. Doland and C. Maschler, Trans.). Chicago: University of Chicago Press.

Dziewonski, A. M. and Anderson, D. L. (1981). Preliminary Reference Earth Model. Physics of the Earth and Planetary Interiors, 25: 297-356.

Evans, J. (1984). On the function and the probable origin of Ptolemy's equant. American Journal of Physics, 52(12): 1080-1089.

Friedman, M. (1990). Kant and Newton: Why Gravity Is Essential to Matter. In P. Bricker and R. I. G. Hughes, (Eds.), Philosophical Perspectives on Newtonian Science. Cambridge, MA: MIT Press.

Friedman, M. (1992). Kant and the Exact Sciences. Cambridge, MA: Harvard University Press.

Friedman, M. (forthcoming). Kant’s Construction of Nature.

Gilbert, F., Dziewonski, A., and Brune, J. (1973). An Informative Solution to a Seismological Inverse Problem. Proceedings of the National Academy of Sciences, Vol. 70, No. 5: 1410-1413.

Gilbert, F. and Dziewonski, A. M. (1975). An Application of Normal Mode Theory to the Retrieval of Structural Parameters and Source Mechanisms from Seismic Spectra. Philosophical Transactions of the Royal Society of London, Series A, Mathematical and Physical Sciences, Vol. 278, No. 1280: 187-269.


Gingerich, O. (1989). Johannes Kepler. In Taton and Wilson (1989).

Halmos, P. (1942). Finite-Dimensional Vector Spaces. Princeton: Princeton University Press.

Herivel, J. (1965). The Background to Newton’s Principia: A Study of Newton’s Dynamical Researches in the Years 1664-84. Oxford: The Clarendon Press.

Huygens, C. (1690). Discourse on the Cause of Gravity. (K. Bailey and G. E. Smith, Trans.)

Jardine, N. (1984). The Birth of History and Philosophy of Science: Kepler’s A Defence of Tycho Against Ursus. Cambridge: Cambridge University Press.

Jeffreys, H. (1961). Theory of Probability, 3rd ed. Oxford University Press.

Jeffreys, H. (1973). Scientific Inference, 3rd ed. Cambridge University Press.

Jeffreys, H. (1976). The Earth: Its Origin, History, and Physical Constitution, 6th ed. Cambridge: Cambridge University Press.

Kanamori, H. and Anderson, D. L. (1977). Importance of Physical Dispersion in Surface Wave and Free Oscillation Problems: Review. Reviews of Geophysics and Space Physics, Vol. 15, No. 1: 105-112.

Kant, I. (1992). Theoretical Philosophy, 1755-1770. (D. Walford and R. Meerbote, Trans.). Cambridge: Cambridge University Press.

Kant, I. (1998). Critique of Pure Reason. (P. Guyer and A. Wood, Trans.). Cambridge: Cambridge University Press.

Kant, I. (2004). Metaphysical Foundations of Natural Science. (M. Friedman, Trans.). Cambridge: Cambridge University Press.

Kepler, J. (1992). New Astronomy. (W. Donahue, Trans.). Cambridge University Press.


Krantz, D. H., Luce, R. D., Suppes, P., and Tversky, A. (1971). Foundations of Measurement, Volume I: Additive and Polynomial Representations. New York: Academic Press.

Kreisel, G. (1949). Some Remarks on Integral Equations with Kernels. Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences, Vol. 197, No. 1049: 160-183.

Lanczos, C. (1997). Linear Differential Operators. Mineola, NY: Dover.

Laudan, L. and Leplin, J. (1991). Empirical Equivalence and Underdetermination. The Journal of Philosophy, Vol. 88, No. 9: 449-472.

Luce, R. D., Krantz, D. H., Suppes, P., and Tversky, A. (1990). Foundations of Measurement, Volume III: Representation, Axiomatization, and Invariance. San Diego, CA: Academic Press.

Malvern, L. E. (1969). Introduction to the Mechanics of a Continuous Medium. Prentice-Hall.

Masters, G. and Gubbins, D. (2003). On the Resolution of Density within the Earth. Physics of the Earth and Planetary Interiors, 140: 159-167.

Newton, I. (1974). The Correspondence of Isaac Newton, Vol. II: 1676-1687. (H. W. Turnbull, Ed.). Cambridge University Press.

Newton, I. (1974). The Correspondence of Isaac Newton, Vol. V: 1709-1713. (A. Hall and L. Tilling, Eds.). Cambridge University Press.

Newton, I. (1999). Mathematical Principles of Natural Philosophy. (I. B. Cohen and A. Whitman, Trans.). Berkeley: University of California Press.

Newton, I. (2004). Isaac Newton: Philosophical Writings. (A. Janiak, Ed.). Cambridge University Press.


Parker, R. L. (1994). Geophysical Inverse Theory. Princeton: Princeton University Press.

Press, F. (1968). Density Distribution in Earth. Science, Vol. 160, No. 3833: 1218-1221.

Riesz, F. and Nagy, B. (1990). Functional Analysis. Mineola, NY: Dover.

Smith, G. E. (2002a). From the Phenomenon of the Ellipse to an Inverse-Square Force: Why Not? In D. Malament, (Ed.), Reading Natural Philosophy: Essays in the History and Philosophy of Science and Mathematics. Chicago: Open Court Press.

Smith, G. E. (2002b). The methodology of the Principia. In I. B. Cohen and G. E. Smith, (Eds.), The Cambridge Companion to Newton. Cambridge: Cambridge University Press.

Smith, G. E. (unpublished). Closing the Loop: Testing Newtonian Gravity, Then and Now. 2007 Isaac Newton Lectures at Stanford University. Lecture 1, February 22, 2007. Available for download at http://www.stanford.edu/dept/cisst/events0506.html.

Smith, G. E. (unpublished). The question of mass in Newton's law of gravity. Lecture given at Leiden in 2007.

Stanford, K. (2006). Exceeding Our Grasp. Oxford: Oxford University Press.

Stein, H. (1967). Newtonian Space-Time. Texas Quarterly, 10: 174-200.

Stein, H. (1990). From the Phenomena of Motions to the Forces of Nature: Hypothesis or Deduction? PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association, Vol. 1990, No. 2: 209-222.

Stein, S. and Wysession, M. (2003). An Introduction to Seismology, Earthquakes, and Earth Structure. Blackwell.

Stephenson, B. (1987). Kepler’s Physical Astronomy. New York: Springer-Verlag.


Suppes, P. (2001). Representation and Invariance of Scientific Structures. Stanford, CA: CSLI Publications.

Suppes, P., Krantz, D. H., Luce, R. D., and Tversky, A. (1989). Foundations of Measurement, Volume II: Geometrical, Threshold, and Probabilistic Representations. San Diego, CA: Academic Press.

Swerdlow, N. M. and Neugebauer, O. (1984). Mathematical Astronomy in Copernicus’s De Revolutionibus. New York: Springer-Verlag.

Tarantola, A. (2005). Inverse Problem Theory and Methods for Model Parameter Estimation. Philadelphia: SIAM.

Taton, R. and Wilson, C., (Eds). (1989). Planetary Astronomy from the Renaissance to the Rise of Astrophysics Part A: to Newton. Cambridge University Press.

Wilson, C. (1968). Kepler's derivation of the elliptical path. Isis, 59: 5-25.

Wilson, C. (1972). How did Kepler discover his first two laws? Scientific American, 226: 93-106.

Wilson, C. (1989). Astronomy from Kepler to Newton: Historical Studies. London: Variorum Reprints.

Wilson, E. B. (1952). An Introduction to Scientific Research. New York: McGraw-Hill.

Wimsatt, W. (2007). Re-Engineering Philosophy for Limited Beings: Piecewise Approximations to Reality. Harvard University Press.

Woodward, J. (2005). Making Things Happen: A Theory of Causal Explanation. Oxford University Press.
